This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


[Xen-devel] Re: Xen 4.1 interrupts not delievered.

To: Keir Fraser <keir@xxxxxxx>
Subject: [Xen-devel] Re: Xen 4.1 interrupts not delievered.
From: Sander Eikelenboom <linux@xxxxxxxxxxxxxx>
Date: Sat, 16 Oct 2010 19:56:59 +0200
Cc: Jeremy Fitzhardinge <jeremy@xxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>, Ray.Lin@xxxxxxx, Ian Campbell <Ian.Campbell@xxxxxxxxxxxxx>, m.a.young@xxxxxxxxxxxx, Bruce Edge <bruce.edge@xxxxxxxxx>
Delivery-date: Wed, 20 Oct 2010 08:11:57 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <C8DB23C6.25D5B%keir@xxxxxxx>
Organization: Eikelenboom IT services
References: <1459310812.20101013090027@xxxxxxxxxxxxxx> <C8DB23C6.25D5B%keir@xxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Hi Keir,

I don't know if it gives any insight, but I tried running xentrace. The only
thing I don't know is how close to the actual freeze the trace made it to disk ...

In the last 2 seconds of the trace I sometimes see the following:

 169.940118823 ||xl d1v0 hypercall 17 (iret) eip ffffffff810012eb
 169.940119616 ||xl d1v0 hypercall 11 (xen_version) eip ffffffff8100122a
 169.940120050 ||xl d1v0 hypercall 11 (xen_version) eip ffffffff8100122a
 169.940120540 ||xl d1v0 hypercall 1d (sched_op) eip ffffffff810013aa
]169.940120843 ||xl d1v0   28006(2:8:6) 2 [ 1 0 ]
]169.940122066 ||xl d1v0   2800e(2:8:e) 2 [ 1 6db9 ]
]169.940122206 ||xl d1v0   2800f(2:8:f) 3 [ 0 6db9 1c9c380 ]
]169.940122393 ||xl d1v0   2800a(2:8:a) 4 [ 1 0 0 2 ]
 169.940122586 ||xl d1v0 runstate_change d1v0 running->blocked
sched_runstate_process: 1 lost cpus, setting d1v0 runstate to RUNSTATE_LOST
 169.940122820 ||xl d?v? runstate_change d0v2 runnable->running
 169.940124900 |x|l d0v0 page_fault[ db3124a0 2b9e dc0d1000 2b9e 6 ]
 169.940125350 ||xl d0v2 hypercall 11 (xen_version) eip ffffffff8100922a
 169.940125986 ||xl d0v2 hypercall 11 (xen_version) eip ffffffff8100922a
 169.940126983 |x|l d0v0 hypercall 11 (xen_version) eip ffffffff8100922a
 169.940127210 ||xl d0v2 emulate privop[ 8167dc5e ffffffff ]
 169.940127773 ||xl d0v2 emulate privop[ 8167dca6 ffffffff ]

 But perhaps that sounds worse than it actually is.
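
For anyone working through the full attached trace, a minimal Python sketch for locating every "lost cpus" message and counting events per vcpu might look like the following. The line layout (timestamp, per-cpu column, dNvM, event text) is assumed purely from the excerpt quoted above, and the names RECORD, LOST and summarize are hypothetical helpers, not part of any Xen tool.

```python
import re
from collections import Counter

# Assumed layout, inferred only from the excerpt in this mail:
#   [optional ']'] <timestamp> <cpu column> <dNvM> <event text>
RECORD = re.compile(
    r"^\]?\s*(?P<ts>\d+\.\d+)\s+(?P<cpus>\S+)\s+"
    r"(?P<vcpu>d[\d?]+v[\d?]+)\s+(?P<event>.+)$"
)
# Message printed between records when trace records were lost.
LOST = re.compile(
    r"(?P<n>\d+) lost cpus, setting (?P<vcpu>d\d+v\d+) "
    r"runstate to RUNSTATE_LOST"
)


def summarize(lines):
    """Count events per (vcpu, event name) and collect lost-record messages.

    Returns (events, lost), where lost is a list of
    (timestamp of the preceding record, vcpu, number of lost cpus).
    """
    events = Counter()
    lost = []
    last_ts = None
    for line in lines:
        m = LOST.search(line)
        if m:
            lost.append((last_ts, m.group("vcpu"), int(m.group("n"))))
            continue
        m = RECORD.match(line)
        if not m:
            continue  # skip anything that does not look like a record
        last_ts = m.group("ts")  # kept as text to avoid float rounding
        # First word of the event text, without a trailing '[' argument list.
        name = m.group("event").split()[0].split("[")[0]
        events[(m.group("vcpu"), name)] += 1
    return events, lost
```

Feeding it the decoded text of the trace (for example `summarize(open(path))`, where path is wherever the dump was saved) would show whether the RUNSTATE_LOST messages cluster just before the freeze or are spread through the whole run.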

 This trace was done on:

 - Intel Quad core
 - only 1 domU started, doing video grabbing on a PCIe xHCI controller; the device uses MSI-X interrupts
 - xen_changeset : Fri Oct 08 11:41:57 2010 +0100 22230:a33886146b45
 - dom0 kernel jeremy's pvops xen/next last commit 
 - domU kernel konrad's  pcifront-0.8.1 tree last commit 

 - the last piece of the trace is attached, bzip2'ed



Wednesday, October 13, 2010, 9:52:22 AM, you wrote:

> On 13/10/2010 08:00, "Sander Eikelenboom" <linux@xxxxxxxxxxxxxx> wrote:

>> Hello Keir,
>> OK let's rephrase, in what cases is it logical that the xen serial console
>> freezes together with dom0 ?
>> For example some deadlock causes cpu0 to stall on a heavily loaded system ..
>> I think having the serial console available to dump the machines state is
>> quite vital :-(

> Oh, there was a fix for serial interrupt routing: xen-unstable:22148 or
> xen-4.0-testing:21342. Are you running a more recent hypervisor than that?
> The fix prevents serial interrupt from being migrated away from pcpu0, which
> will not work as there is no vector allocated for it on other pcpus. This
> kind of fits with the bug you're seeing, which doesn't manifest if you leave
> pcpu0 unloaded (and hence presumably serial interrupt binding prefers to
> stay with unloaded pcpu0).

>  -- Keir
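
One rough way to answer the "more recent than that?" question is to compare the revision number in the xen_changeset string reported earlier in this mail against the xen-unstable fix revision Keir names. The sketch below is only that: hg_revision and FIX_REV_UNSTABLE are hypothetical names, and it assumes the running hypervisor was built from the mainline xen-unstable tree. Mercurial revision numbers are local to a clone, so a larger number is a hint rather than proof that the fix is an ancestor of the build.

```python
import re


def hg_revision(changeset_line):
    """Return the numeric revision from a 'REV:HASH' token, or None."""
    # Matches e.g. '22230:a33886146b45' (number, colon, 12 hex digits).
    m = re.search(r"\b(\d+):[0-9a-f]{12}\b", changeset_line)
    return int(m.group(1)) if m else None


# Revision of the serial interrupt routing fix in xen-unstable,
# as quoted in Keir's reply.
FIX_REV_UNSTABLE = 22148

running = hg_revision(
    "xen_changeset : Fri Oct 08 11:41:57 2010 +0100 22230:a33886146b45"
)
# Assumption: both numbers come from the same linear xen-unstable history.
includes_fix = running is not None and running >= FIX_REV_UNSTABLE
print(running, includes_fix)
```

Under that assumption, the build used for the trace is later than the fix, which would suggest the serial interrupt migration issue is not the whole story here.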

>> I have tried the max_cstate=1 together with the latest 2.6.32-xen-next-pvops
>> kernel as dom0 kernel (which includes Ian's fix to the event channels).
>> But with the compile test it freezes just as fast.
>> Will try xen before changesets 20072/20073 now, probably with 2.6.31 pvops,
>> since 2.6.32 would need a more recent hypervisor.
>> --
>> Sander
>> Wednesday, October 13, 2010, 1:34:58 AM, you wrote:
>>> On 12/10/2010 18:17, "Konrad Rzeszutek Wilk" <konrad.wilk@xxxxxxxxxx> wrote:
>>>> A couple of things that might fix the problems are:
>>>>  1). Ian's fix to the event channels:
>>>> http://xenbits.xen.org/gitweb?p=people/ianc/linux-2.6.git;a=commit;h=5d30cb2a85912ffb5f6556d55472c26801eef2ea
>>>>  2). Disable IRQ balancing in Xen (and also in Linux kernel). "noirqbalance"
>>>>  3). Pin domains, but nothing to Domain 0.
>>> ITYM cpu 0. Not that this should rightly make any difference that I can see.
>>> My suspicion would be the per-CPU IDT patches introduced during 4.0
>>> development. Or changes to enable deep C-state sleeps by default. One or the
>>> other causing lost interrupts. I think the latter can be discounted by
>>> max_cstate=1 as a Xen boot parameter. The former would require trying a
>>> build of Xen before and after changesets 20072/20073 -- they are the ones
>>> that did the heavy lifting to implement per-CPU IDTs.
>>>  -- Keir
>>>> But it might be worth trying them out?

Best regards,
 Sander                            mailto:linux@xxxxxxxxxxxxxx

Attachment: xen-trace.bz2
Description: Binary data
