This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


Re: [Xen-devel] domU and dom0 hung with Xen console interrupt binding sh

To: "Keir Fraser" <keir.fraser@xxxxxxxxxxxxx>, "Bruce Edge" <bruce.edge@xxxxxxxxx>
Subject: Re: [Xen-devel] domU and dom0 hung with Xen console interrupt binding showing in-flight=1, (---M)
From: "Jan Beulich" <JBeulich@xxxxxxxxxx>
Date: Wed, 18 Aug 2010 09:47:36 +0100
Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Wed, 18 Aug 2010 01:47:43 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <C8908CF3.1E2E0%keir.fraser@xxxxxxxxxxxxx>
References: <AANLkTimvCMc2_EBmFy4XsXLBa4m_T7LyHMn7Lea_qViY@xxxxxxxxxxxxxx> <C8908CF3.1E2E0%keir.fraser@xxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
>>> On 17.08.10 at 20:01, Keir Fraser <keir.fraser@xxxxxxxxxxxxx> wrote:
> On 17/08/2010 18:28, "Bruce Edge" <bruce.edge@xxxxxxxxx> wrote:
>> On Tue, Jun 29, 2010 at 1:42 AM, Jan Beulich <JBeulich@xxxxxxxxxx> wrote:
>>>>>> On 28.06.10 at 20:22, Dante Cinco <dantecinco@xxxxxxxxx> wrote:
>>>> I have an HP Proliant DL380-G6 (dual Xeon E5540 @ 2.53GHz) with Xen 4.0.0
>>>> and dom0 Linux x86_64 pvops and domU Linux kernel x86_64.
>>>> I'm using PCI passthrough (pci-stub) to pass my 4-port 8Gb PMC-Sierra Fibre
>>>> Channel HBA to domU. After running I/Os for several hours, both dom0 and
>>>> domU hang and the Xen console shows the interrupt binding below, where IRQ
>>>> 66 shows in-flight=1 and mask set (---M). What's the best way to debug this
>>>> problem?
>>> There are potentially two problems here: One is that the guest may
>>> fail to send the EOI notification. You would want to check whether
>>> pirq_guest_eoi() got run after that last occurrence of the interrupt.
>>> The more worrying part is that Xen should time out on a guest failing
>>> to send the EOI notification, and ack the interrupt nevertheless.
>>> Looking at the code I fail to see how the ack_APIC_irq() would get
>>> sent in this case: non-maskable MSIs get this issued from
>>> end_msi_irq(), but ->end doesn't get invoked from
>>> irq_guest_eoi_timer_fn() (only ->enable does). Keir, am I missing
>>> something?
> I don't think that timer logic is designed to handle non-maskable MSIs, only
> maskable ones. It ought not to be too hard to fix it up for non-maskable
> ones too by issuing the ->end() call from the timer handler?

Yes, that was what I was trying to hint at, but I wasn't sure whether
calling ->end() here has any unintended side effects and/or requires
any extra care (like preventing a subsequent guest-initiated EOI from
calling ->end() again).
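
To make the concern about calling ->end() twice concrete, here is a minimal sketch, assuming a simplified model rather than Xen's real irq_desc handling: an ->end() call becomes owed when the interrupt is delivered, and whichever of the EOI timer or the guest's EOI gets there first pays it off, while the other becomes a no-op. All names (struct model_irq, model_finish(), model_demo() and the rest) are hypothetical. Real Xen code may well be able to rely on the descriptor lock and its existing in-flight accounting instead of a separate atomic flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model of one guest-bound,
 * non-maskable MSI. None of these names exist in Xen. */
struct model_irq {
    atomic_bool end_pending;  /* true while an ->end() call is still owed */
    int apic_acks;            /* counts simulated ack_APIC_irq() calls */
};

/* Stand-in for the ->end() hook: for a non-maskable MSI this is
 * where the local APIC would be acked. */
static void model_end(struct model_irq *irq)
{
    irq->apic_acks++;
}

/* Interrupt delivered to the guest; an ->end() is now owed. In this
 * model a new delivery only happens after the previous one finished. */
static void model_deliver(struct model_irq *irq)
{
    assert(!atomic_load(&irq->end_pending));
    atomic_store(&irq->end_pending, true);
}

/* Whichever path runs first (EOI timer or guest EOI) issues ->end();
 * atomic_exchange() turns the second caller into a no-op. */
static void model_finish(struct model_irq *irq)
{
    if (atomic_exchange(&irq->end_pending, false))
        model_end(irq);
}

static void model_eoi_timer_fn(struct model_irq *irq) { model_finish(irq); }
static void model_guest_eoi(struct model_irq *irq)    { model_finish(irq); }

/* Timer fires first, then a late guest EOI arrives; returns the
 * number of simulated APIC acks (expected: exactly one). */
static int model_demo(void)
{
    struct model_irq irq;
    atomic_init(&irq.end_pending, false);
    irq.apic_acks = 0;

    model_deliver(&irq);
    model_eoi_timer_fn(&irq);  /* guest too slow: timer issues ->end() */
    model_guest_eoi(&irq);     /* late guest EOI must not ack again */
    return irq.apic_acks;
}
```

In this model model_demo() returns 1: the timer's ->end() is the only APIC ack and the guest's late EOI is absorbed. Swapping the order of the two calls gives the same count, which is the property the extra care would have to guarantee.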

While looking at this I came across another thing I don't understand:
__pirq_guest_eoi(), for the ACKTYPE_EOI case, calls __set_eoi_ready()
in a cpu_test_and_clear() conditional, but __set_eoi_ready() bails
out if it finds !cpu_test_and_clear() on the same bitmap - what's the
point of calling __set_eoi_ready() here then (or what am I missing)?
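
Taking the description above at face value, a small model shows why the inner check could never succeed on that path: the outer cpu_test_and_clear() has already cleared the bit, so the inner one always finds it clear. This is a minimal sketch, not Xen code. model_test_and_clear() and the other model_* helpers are hypothetical stand-ins that use a plain unsigned long bitmap instead of a cpumask_t, and the model ignores locking and any other code path that might set the bit again in between.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for cpu_test_and_clear(): clears bit `cpu`
 * in *mask and reports whether it was set beforehand. */
static bool model_test_and_clear(unsigned int cpu, unsigned long *mask)
{
    unsigned long bit = 1UL << cpu;
    bool was_set = (*mask & bit) != 0;
    *mask &= ~bit;
    return was_set;
}

/* Shape of __set_eoi_ready() as described in the mail: bail out
 * unless the bit is still set, then do the real work. */
static bool model_set_eoi_ready(unsigned int cpu, unsigned long *mask)
{
    if (!model_test_and_clear(cpu, mask))
        return false;          /* bails out: bit was already clear */
    /* ... the EOI-ready work would happen here ... */
    return true;
}

/* Shape of the ACKTYPE_EOI path in __pirq_guest_eoi() as described:
 * the caller clears the same bit before calling the helper. */
static bool model_pirq_guest_eoi(unsigned int cpu, unsigned long *mask)
{
    if (model_test_and_clear(cpu, mask))
        return model_set_eoi_ready(cpu, mask);
    return false;
}
```

With only bit 3 set, model_pirq_guest_eoi(3, &mask) returns false and leaves mask at 0, whereas calling model_set_eoi_ready() directly on a mask with bit 3 set returns true. In this model, then, the helper's own check only lets the work through for callers that have not already cleared the bit themselves.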

