Hi, Alex,
It would be interesting to look at the point where xenlinux would resume
when you see the box halted there. For example, if you suspect an
infinite wait for serial tx space, what exactly is xenlinux trying to
write? Based on the content at that point, you can grep the kernel
source to see whether that message is normal output or an error
message.
It would be a troublesome case if some printks keep firing in the interrupt
handler path of the shared NIC device before that handler writes EOI to the
IOSAPIC. In most cases, printks in this path indicate a warning or an error.
So maybe you can first check whether your hang falls into this scenario.
If so, we can decide whether to address this issue directly or to fix
the underlying culprit that causes it.
As a robust solution even for such an error condition, maybe overwriting
old content in the serial tx buffer should be allowed, so that forward
progress is guaranteed.
Thanks,
Kevin
>From: Tian, Kevin
>Sent: July 7, 2006 10:09
>>From: Alex Williamson [mailto:alex.williamson@xxxxxx]
>>Sent: July 7, 2006 6:14
>>
>>On Tue, 2006-07-04 at 09:49 +0800, Tian, Kevin wrote:
>>> Hi, Alex,
>>> Could you try the attached patch to see whether it gets you a step
>>> further? It's made on top of the last patch, to address a bug:
>>> VEC_XEN_ALIAS is only meaningful when the enable bit is on. This bug
>>> may cause the guest to think the shared irq line is edge-triggered, so no
>>> EOI request is issued, which may stall subsequent instances. :-)
>>
>>Hi Kevin,
>>
>> Good catch with this patch, but it still hangs. Besides having xen
>>call end() in __do_IRQ(), I can also prevent the hang by booting with
>>sync_console. If I INIT the system when it's hung, the only CPU that's
>>not in the idle loop is sitting in do_console_io(), possibly in
>>guest_console_write() (which appears to be inlined). I'm
>>wondering if the problem is actually Xen spinning there waiting for tx
>>space and preventing the guest from calling end(). I added a loop
>>counter for debug, but I haven't been able to make it pop out yet.
>>Thanks,
>>
>> Alex
>>
>
>That's a possible cause. Actually, I had seldom considered the serial driver
>itself before :-). Does it spin on the tx buffer in the irq handler or somewhere
>else with irqs disabled? Which event could cause Xen to spin forever? If the
>spin can exit, xenlinux can resume and end() should then be triggered...
>
>Thanks,
>Kevin
_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel