


To: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Subject: Re: [Xen-devel] domU and dom0 hung with Xen console interrupt binding showing in-flight=1, (---M)
From: Bruce Edge <bruce.edge@xxxxxxxxx>
Date: Thu, 19 Aug 2010 06:42:36 -0700
Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxxxx>


On Wed, Aug 18, 2010 at 2:40 AM, Keir Fraser <keir.fraser@xxxxxxxxxxxxx> wrote:
On 18/08/2010 09:47, "Jan Beulich" <JBeulich@xxxxxxxxxx> wrote:

> Yes, that was what I was trying to hint at, but I wasn't sure whether
> calling ->end() here has any unintended side effects and/or requires
> any extra care (like preventing a subsequent guest initiated EOI to
> call ->end() again).

Oh you can't naively call ->end() from the time-out handler. You would need
to do something like this in irq_guest_eoi_timer_fn:
    if ( (desc->status & IRQ_GUEST) &&
         (action->ack_type == ACKTYPE_EOI) )
    {
        /* Take a private copy: action may go away once desc->lock is dropped. */
        cpu_eoi_map = action->cpu_eoi_map;
        on_selected_cpus(&cpu_eoi_map, set_eoi_ready, desc, 0);
    }

I don't think the IRQ_GUEST_EOI_PENDING flag or any of that stuff is needed
for the ACKTYPE_EOI case. I'd make the handling of that, calling of
->disable/->enable and so on, dependent on ACKTYPE_NONE.

> While looking at this I came across another thing I don't understand:
> __pirq_guest_eoi(), for the ACKTYPE_EOI case, calls __set_eoi_ready()
> in a cpu_test_and_clear() conditional, but __set_eoi_ready() bails
> out if it finds !cpu_test_and_clear() on the same bitmap - what's the
> point of calling __set_eoi_ready() here then (or what am I missing)?

__pirq_guest_eoi() acts on a private on-stack copy of cpu_eoi_map. This is
because on_selected_cpus() cannot be called with desc->lock held. But as
soon as desc->lock is released, the desc->action structure can be freed by
another CPU, so it would be invalid to reference action->cpu_eoi_map
directly after desc->lock is released.
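
The private-copy pattern described above can be sketched in standalone C. This is only an illustration under stated assumptions, not the Xen implementation: `cpumask_sketch_t`, `action_sketch`, `desc_sketch`, `mask_test_and_clear()` and `guest_eoi_sketch()` are hypothetical stand-ins for Xen's cpumask, `irq_desc`, the guest action structure, `cpu_test_and_clear()` and `__pirq_guest_eoi()`, and a C11 `atomic_flag` plays the role of `desc->lock`. It shows that the first test-and-clear acts on the on-stack copy while the second acts on the shared map, so both can succeed for the same CPU.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for Xen's types; one bit per CPU. */
typedef unsigned long cpumask_sketch_t;

struct action_sketch {
    cpumask_sketch_t cpu_eoi_map;   /* CPUs that still owe an EOI */
};

struct desc_sketch {
    atomic_flag lock;               /* plays the role of desc->lock */
    struct action_sketch *action;   /* may be freed once the lock is dropped */
};

/* Test-and-clear one bit; returns true if the bit was set beforehand. */
static bool mask_test_and_clear(unsigned int cpu, cpumask_sketch_t *mask)
{
    cpumask_sketch_t bit = 1UL << cpu;
    bool was_set = (*mask & bit) != 0;

    *mask &= ~bit;
    return was_set;
}

/*
 * Handle the EOI for this_cpu and return the set of other CPUs that still
 * need to be notified. The caller would pass the returned mask to a
 * cross-CPU call only after this function has dropped the lock.
 */
static cpumask_sketch_t guest_eoi_sketch(struct desc_sketch *desc,
                                         unsigned int this_cpu,
                                         bool *handled_locally)
{
    cpumask_sketch_t private_map;

    while (atomic_flag_test_and_set(&desc->lock))
        ;                                       /* spin until acquired */

    private_map = desc->action->cpu_eoi_map;    /* snapshot under the lock */

    /* Clears the bit in the private copy only; the shared map is untouched. */
    *handled_locally = mask_test_and_clear(this_cpu, &private_map);
    if (*handled_locally)
        /* Analogue of __set_eoi_ready(): clears the bit in the shared map. */
        mask_test_and_clear(this_cpu, &desc->action->cpu_eoi_map);

    atomic_flag_clear(&desc->lock);

    /* From here on only private_map is safe to use, not desc->action. */
    return private_map;
}
```

In this sketch the returned mask is the only state that remains valid after the lock is released, which mirrors the reason given above for not referencing action->cpu_eoi_map directly at that point.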

 -- Keir

Is there any more information I can provide that would help in diagnosing the direct cause and the appropriate fix?
For example, I could add instrumentation or trace code to detect the trigger conditions.
The hang is very repeatable on our target systems after a few hours of load.


