xen-devel
Re: [Xen-devel] [xen-unstable test] 6947: regressions - trouble: broken/
On 02/05/2011 10:01, "Jan Beulich" <JBeulich@xxxxxxxxxx> wrote:
>>>> On 01.05.11 at 22:48, Keir Fraser <keir.xen@xxxxxxxxx> wrote:
>> On 01/05/2011 20:56, "Ian Jackson" <Ian.Jackson@xxxxxxxxxxxxx> wrote:
>>
>>> flight 6947 xen-unstable real [real]
>>> http://www.chiark.greenend.org.uk/~xensrcts/logs/6947/
>>>
>>> Regressions :-(
>>>
>>> Tests which did not succeed and are blocking:
>>> test-amd64-amd64-pair 8 xen-boot/dst_host fail REGR. vs. 6944
>>> test-amd64-amd64-pair 7 xen-boot/src_host fail REGR. vs. 6944
>>> test-amd64-amd64-pv 5 xen-boot fail REGR. vs. 6944
>>
>> Looks like your bug, Jan (changeset 23296):
>
> I'm afraid you'll have to revert 23295 and 23296 for the time being,
> as there's no obvious immediate solution: {set,clear}_domain_irq_pirq()
> must be called with the IRQ descriptor lock held (which implies disabling
> IRQs), and they must be able to call xmalloc() (both through
> radix_tree_insert() and pirq_get_info() -> alloc_pirq_struct()).

Okay, reverted.

> I have to admit that I find it bogus to not be allowed to call xmalloc()
> with interrupts disabled. There's no equivalent restriction on kmalloc()
> in Linux.

Well, the restriction on IRQ-disabled status at spinlock acquisition (a
given lock is taken with IRQs disabled *always*, or *never*) exists because
of the TSC synchronising rendezvous in x86/time.c:time_calibration().

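The "always or never" rule, which is what the BUG in check_lock() in the
quoted log is enforcing, can be illustrated with a small userspace model.
This is only a sketch and not the code in Xen's spinlock.c: irqs_enabled,
struct lock_debug_sketch and check_lock_sketch() are hypothetical names, and
the CPU interrupt flag is simulated with a plain variable. The first
acquisition records the IRQ state; any later acquisition in the opposite
state is reported as a violation.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the CPU's interrupt-enable flag. */
static bool irqs_enabled = true;

/* -1: not yet acquired; 0: taken with IRQs enabled; 1: taken with IRQs disabled. */
struct lock_debug_sketch {
    int irq_safe;
};

/*
 * Returns true if this acquisition is consistent with every earlier one.
 * A real hypervisor would treat an inconsistency as a fatal bug; the sketch
 * returns false so that the rule can be exercised without crashing.
 */
static bool check_lock_sketch(struct lock_debug_sketch *debug)
{
    int now = irqs_enabled ? 0 : 1;

    if (debug->irq_safe == -1) {
        /* First acquisition decides which kind of lock this is. */
        debug->irq_safe = now;
        return true;
    }
    return debug->irq_safe == now;
}
```

In this model, a lock first taken with IRQs enabled (as an allocator lock
would be on an ordinary path) fails the check when it is next taken with
IRQs disabled, which matches the situation described in this thread.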
A few options:

(1) Revert that rendezvous to using softirq or similar. It was turned into a
hardirq rendezvous because Dan Magenheimer measured that doing so reduced
TSC skew by an order of magnitude or more. Perhaps it matters less on modern
CPUs, or perhaps we could come up with some other smart workaround that
would once again let us acquire IRQ-unsafe spinlocks with IRQs disabled.
However, see (2) for why alloc_heap_pages() may still be unsafe to call with
IRQs disabled.

(2) Change the xmalloc lock to spin_lock_irqsave(). This would also have to
be transitively applied to at least the heap_lock in page_alloc.c. One issue
with this (and indeed with calling alloc_heap_pages() at all with IRQs
disabled) is that alloc_heap_pages() does actually assume IRQs are enabled
(for example, it calls flush_tlb_mask()). Actually, I think this limitation
probably predates the TSC rendezvous changes, and could be a source of
latent bugs in earlier Xen releases.

(3) Restructure the interrupt code to do less work in IRQ context. For
example, use a tasklet per IRQ, scheduled on the local CPU, and protect a
bunch of the PIRQ structures with a non-IRQ lock. This would increase
interrupt latency if the local CPU is interrupted in hypervisor context. I'm
not sure about this one -- I'm not that happy about the amount of work now
done in hardirq context, but I'm not sure about the performance impact of
deferring the work.

-- Keir
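
The save/restore pattern that option (2) relies on can be sketched in the
same userspace style. Again this is only an illustration under stated
assumptions, not Xen's spin_lock_irqsave(): irqs_enabled, struct sim_lock,
sim_lock_irqsave() and sim_unlock_irqrestore() are hypothetical names, the
interrupt flag is a plain variable, and the model is single-threaded, so
there is no actual spinning.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the CPU's interrupt-enable flag. */
static bool irqs_enabled = true;

struct sim_lock {
    bool held;
};

/*
 * Remember the caller's IRQ state, disable IRQs, then take the lock.
 * The saved state is returned so the caller can hand it back on unlock.
 */
static bool sim_lock_irqsave(struct sim_lock *lock)
{
    bool flags = irqs_enabled;

    irqs_enabled = false;
    lock->held = true;      /* single-threaded model: never contended */
    return flags;
}

/* Drop the lock, then restore whatever IRQ state the caller had before. */
static void sim_unlock_irqrestore(struct sim_lock *lock, bool flags)
{
    lock->held = false;
    irqs_enabled = flags;
}
```

Because IRQs are always off while such a lock is held, every acquisition
happens in the same IRQ state regardless of the caller's state, and the
caller's state is restored afterwards. The sketch says nothing about
whether the code run under the lock is itself safe with IRQs disabled,
which is the concern raised for alloc_heap_pages() in option (2).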
> If we really need to stay with this limitation, I'd have to replace the
> call to xmalloc() in alloc_pirq_struct() with one to xmem_pool_alloc(),
> disabling interrupts up front. Similarly I'd have to call the radix tree
> insertion/deletion functions with custom allocation routines. Both
> parts would feel like hacks to me though.
>
> An alternative (implementation-wise, i.e. not much less of a hack
> imo) might be to introduce something like xmalloc_irq() which always
> disables IRQs and does its allocations from a separate pool (thus
> using a distinct spin lock).
>
> Jan
>
>> May 1 17:03:45.335804 (XEN) Xen BUG at spinlock.c:47
>> May 1 17:03:45.734780 (XEN) ----[ Xen-4.2-unstable x86_64 debug=y Not tainted ]----
>> May 1 17:03:45.734819 (XEN) CPU: 0
>> May 1 17:03:45.743763 (XEN) RIP: e008:[<ffff82c480123cc4>] check_lock+0x44/0x50
>> May 1 17:03:45.743796 (XEN) RFLAGS: 0000000000010046 CONTEXT: hypervisor
>> May 1 17:03:45.755762 (XEN) rax: 0000000000000000 rbx: ffff8301a7ff9868 rcx: 0000000000000001
>> May 1 17:03:45.755797 (XEN) rdx: 0000000000000000 rsi: 0000000000000001 rdi: ffff8301a7ff986c
>> May 1 17:03:45.770774 (XEN) rbp: ffff82c48029fca0 rsp: ffff82c48029fca0 r8: 0000000000000000
>> May 1 17:03:45.782761 (XEN) r9: 00000000deadbeef r10: ffff82c48021ca20 r11: 0000000000000286
>> May 1 17:03:45.782796 (XEN) r12: ffff8301a7ff8000 r13: 0000000000000080 r14: 0000000000000000
>> May 1 17:03:45.787773 (XEN) r15: ffff8301a7ff9868 cr0: 000000008005003b cr4: 00000000000006f0
>> May 1 17:03:45.802762 (XEN) cr3: 000000021b001000 cr2: ffff88000191cfc0
>> May 1 17:03:45.802791 (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
>> May 1 17:03:45.814766 (XEN) Xen stack trace from rsp=ffff82c48029fca0:
>> May 1 17:03:45.814794 (XEN) ffff82c48029fcb8 ffff82c480123d01 0000000000000080 ffff82c48029fcf8
>> May 1 17:03:45.826766 (XEN) ffff82c48012a73a ffff82c48029fd08 0000000000000080 0000000000000080
>> May 1 17:03:45.826800 (XEN) 0000000000000090 0000000000000000 ffff8301a7e82000 ffff82c48029fd28
>> May 1 17:03:45.838781 (XEN) ffff82c48012abfa 0000000000000002 0000000000000010 ffff8301a7e82000
>> May 1 17:03:45.846770 (XEN) ffff8301a7e82000 ffff82c48029fd58 ffff82c4801615e1 ffff82c4802d9950
>> May 1 17:03:45.858762 (XEN) 0000000000000000 ffff8301a7e82000 ffff8301a7e821a8 ffff82c48029fd88
>> May 1 17:03:45.858797 (XEN) ffff82c4801043b9 ffff8301a7e82c18 0000000000000000 0000000000000000
>> May 1 17:03:45.870772 (XEN) 0000000000000000 ffff82c48029fdc8 ffff82c480160bdb 0000000000000000
>> May 1 17:03:45.882762 (XEN) 0000000000000286 0000000000000000 0000000000000000 ffff8301a7e82000
>> May 1 17:03:45.882796 (XEN) 0000000000000000 ffff82c48029fe48 ffff82c480161186 0000000000000000
>> May 1 17:03:45.894774 (XEN) 00000001801198ad 0000000000000000 ffff8301a7ffaed0 ffff82c48029fe48
>> May 1 17:03:45.899765 (XEN) ffff82c4801675a1 ffff8301a7f000b4 ffff8300d7afb000 ffff82c48029fe48
>> May 1 17:03:45.899805 (XEN) 0000000000000000 ffffffff817afea8 0000000000000000 0000000000000000
>> May 1 17:03:45.911776 (XEN) 0000000000000000 ffff82c48029fef8 ffff82c480174adb ffff82c4802d8c00
>> May 1 17:03:45.923767 (XEN) ffff82c4802d95a0 000000011fc37ff0 0000000000000000 ffffffff817afee8
>> May 1 17:03:45.923801 (XEN) ffffffff810565b5 ffffffff817aff18 0000000000000000 0000000000000000
>> May 1 17:03:45.938774 (XEN) ffff82c4802b8880 ffff82c48029ff18 ffffffffffffffff ffff8301a7e82000
>> May 1 17:03:45.947764 (XEN) 000000008012395f ffff82c480159df4 ffff8300d7afb000 0000000000000000
>> May 1 17:03:45.947800 (XEN) ffffffff817aff08 ffffffff818cc510 0000000000000000 00007d3b7fd600c7
>> May 1 17:03:45.959772 (XEN) ffff82c480213eb8 ffffffff8100942a 0000000000000021 0000000000000000
>> May 1 17:03:45.974784 (XEN) Xen call trace:
>> May 1 17:03:45.974811 (XEN) [<ffff82c480123cc4>] check_lock+0x44/0x50
>> May 1 17:03:45.974830 (XEN) [<ffff82c480123d01>] _spin_lock+0x11/0x5d
>> May 1 17:03:45.982768 (XEN) [<ffff82c48012a73a>] xmem_pool_alloc+0x138/0x4d4
>> May 1 17:03:45.982799 (XEN) [<ffff82c48012abfa>] _xmalloc+0x124/0x1ce
>> May 1 17:03:45.991767 (XEN) [<ffff82c4801615e1>] alloc_pirq_struct+0x36/0x7f
>> May 1 17:03:45.991804 (XEN) [<ffff82c4801043b9>] pirq_get_info+0x43/0x8f
>> May 1 17:03:46.003769 (XEN) [<ffff82c480160bdb>] set_domain_irq_pirq+0x71/0xae
>> May 1 17:03:46.003791 (XEN) [<ffff82c480161186>] map_domain_pirq+0x370/0x3bb
>> May 1 17:03:46.018770 (XEN) [<ffff82c480174adb>] do_physdev_op+0xa6b/0x1598
>> May 1 17:03:46.018802 (XEN) [<ffff82c480213eb8>] syscall_enter+0xc8/0x122
>> May 1 17:03:46.030766 (XEN)
>> May 1 17:03:46.030783 (XEN)
>> May 1 17:03:46.030798 (XEN) ****************************************
>> May 1 17:03:46.030825 (XEN) Panic on CPU 0:
>> May 1 17:03:46.038760 (XEN) Xen BUG at spinlock.c:47
>> May 1 17:03:46.038783 (XEN) ****************************************
>> May 1 17:03:46.038808 (XEN)
>