I don't know if it gives any insight, but I tried running xentrace. The only
thing I don't know is how close to the actual freeze the data that made it to disk is...
In the last 2 seconds of the trace I sometimes see:
169.940118823 ||xl d1v0 hypercall 17 (iret) eip ffffffff810012eb
169.940119616 ||xl d1v0 hypercall 11 (xen_version) eip ffffffff8100122a
169.940120050 ||xl d1v0 hypercall 11 (xen_version) eip ffffffff8100122a
169.940120540 ||xl d1v0 hypercall 1d (sched_op) eip ffffffff810013aa
]169.940120843 ||xl d1v0 28006(2:8:6) 2 [ 1 0 ]
]169.940122066 ||xl d1v0 2800e(2:8:e) 2 [ 1 6db9 ]
]169.940122206 ||xl d1v0 2800f(2:8:f) 3 [ 0 6db9 1c9c380 ]
]169.940122393 ||xl d1v0 2800a(2:8:a) 4 [ 1 0 0 2 ]
169.940122586 ||xl d1v0 runstate_change d1v0 running->blocked
sched_runstate_process: 1 lost cpus, setting d1v0 runstate to RUNSTATE_LOST
169.940122820 ||xl d?v? runstate_change d0v2 runnable->running
169.940124900 |x|l d0v0 page_fault[ db3124a0 2b9e dc0d1000 2b9e 6 ]
169.940125350 ||xl d0v2 hypercall 11 (xen_version) eip ffffffff8100922a
169.940125986 ||xl d0v2 hypercall 11 (xen_version) eip ffffffff8100922a
169.940126983 |x|l d0v0 hypercall 11 (xen_version) eip ffffffff8100922a
169.940127210 ||xl d0v2 emulate privop[ 8167dc5e ffffffff ]
169.940127773 ||xl d0v2 emulate privop[ 8167dca6 ffffffff ]
But perhaps that sounds worse than it actually is.
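To see which vCPU last changed state before the hang, and where the "lost cpus" warnings show up, one could pull those records out of the formatted trace. The helper below is hypothetical and not part of the xentrace tools. It is a minimal sketch that assumes the text layout shown in the excerpt above: a timestamp, a marker column such as `||xl` (whose meaning it does not rely on), the current vCPU, and then the event name.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumed runstate line layout, copied from the excerpt above:
#   "169.940122586 ||xl d1v0 runstate_change d1v0 running->blocked"
# Fields: timestamp, marker column (skipped), current vCPU (may be "d?v?"),
# event name, affected vCPU, old->new state. A leading ']' is tolerated.
RUNSTATE_RE = re.compile(
    r"^\]?(?P<ts>\d+\.\d+)\s+\S+\s+\S+\s+runstate_change\s+"
    r"(?P<vcpu>d\d+v\d+)\s+(?P<old>\w+)->(?P<new>\w+)"
)

# Assumed layout of the lost-cpu warning, also taken from the excerpt.
LOST_RE = re.compile(
    r"(?P<n>\d+) lost cpus, setting (?P<vcpu>d\d+v\d+) runstate to (?P<state>\w+)"
)


def summarize(lines: Iterable[str]):
    """Return (last runstate change per vCPU, list of lost-cpu warnings)."""
    last: Dict[str, Tuple[float, str, str]] = {}
    lost: List[Tuple[str, int, str]] = []
    for line in lines:
        m = RUNSTATE_RE.match(line)
        if m:
            # Later records overwrite earlier ones, so the dict ends up
            # holding the final transition seen for each vCPU.
            last[m["vcpu"]] = (float(m["ts"]), m["old"], m["new"])
            continue
        m = LOST_RE.search(line)
        if m:
            lost.append((m["vcpu"], int(m["n"]), m["state"]))
    return last, lost


# A few lines from the excerpt, used as sample input.
SAMPLE = [
    "169.940122586 ||xl d1v0 runstate_change d1v0 running->blocked",
    "sched_runstate_process: 1 lost cpus, setting d1v0 runstate to RUNSTATE_LOST",
    "169.940122820 ||xl d?v? runstate_change d0v2 runnable->running",
    "169.940127210 ||xl d0v2 emulate privop[ 8167dc5e ffffffff ]",
]

if __name__ == "__main__":
    last, lost = summarize(SAMPLE)
    for vcpu, (ts, old, new) in sorted(last.items()):
        print(f"{vcpu}: last change at {ts:.9f}s {old}->{new}")
    for vcpu, n, state in lost:
        print(f"{n} lost cpu(s): {vcpu} forced to {state}")
```

On the real attachment one would pass the lines of the decompressed, formatted trace instead of SAMPLE; the regexes would need adjusting if the formatter's column layout differs.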
This trace was taken on:
- Intel quad-core
- only one domU started, doing video grabbing on a PCIe xHCI controller; the
device uses MSI-X interrupts
- xen_changeset : Fri Oct 08 11:41:57 2010 +0100 22230:a33886146b45
- dom0 kernel: Jeremy's pvops xen/next, latest commit
- domU kernel: Konrad's pcifront-0.8.1 tree, latest commit
- the last piece of the trace is attached, bzip2-compressed
Wednesday, October 13, 2010, 9:52:22 AM, you wrote:
> On 13/10/2010 08:00, "Sander Eikelenboom" <linux@xxxxxxxxxxxxxx> wrote:
>> Hello Keir,
>> OK, let me rephrase: in what cases would it be expected that the Xen serial
>> console freezes together with dom0?
>> For example, some deadlock causing cpu0 to stall on a heavily loaded system...
>> I think having the serial console available to dump the machine's state is
>> quite vital :-(
> Oh, there was a fix for serial interrupt routing: xen-unstable:22148 or
> xen-4.0-testing:21342. Are you running a more recent hypervisor than that?
> The fix prevents serial interrupt from being migrated away from pcpu0, which
> will not work as there is no vector allocated for it on other pcpus. This
> kind of fits with the bug you're seeing, which doesn't manifest if you leave
> pcpu0 unloaded (and hence presumably serial interrupt binding prefers to
> stay with unloaded pcpu0).
> -- Keir
>> I have tried max_cstate=1 together with the latest 2.6.32-xen-next-pvops
>> kernel as dom0 kernel (which includes Ian's fix to the event channels).
>> But with the compile test it freezes just as fast.
>> I will now try Xen from before changesets 20072/20073, probably with 2.6.31
>> pvops, since 2.6.32 would need a more recent hypervisor.
>> Wednesday, October 13, 2010, 1:34:58 AM, you wrote:
>>> On 12/10/2010 18:17, "Konrad Rzeszutek Wilk" <konrad.wilk@xxxxxxxxxx> wrote:
>>>> A couple of things that might fix the problems are:
>>>> 1). Ian's fix to the event channels:
>>>> 2). Disable IRQ balancing in Xen (and also in Linux kernel).
>>>> 3). Pin domains, but nothing to Domain 0.
>>> ITYM cpu 0. Not that this should rightly make any difference that I can see.
>>> My suspicion would be the per-CPU IDT patches introduced during 4.0
>>> development. Or changes to enable deep C-state sleeps by default. One or the
>>> other causing lost interrupts. I think the latter can be discounted by
>>> max_cstate=1 as a Xen boot parameter. The former would require trying a
>>> build of Xen before and after changesets 20072/20073 -- they are the ones
>>> that did the heavy lifting to implement per-CPU IDTs.
>>> -- Keir
>>>> But it might be worth trying them out?
Xen-devel mailing list