This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
Home Products Support Community News


Re: [Xen-devel] [GIT PULL] Fix lost interrupt race in Xen event channel

To: Daniel Stodden <daniel.stodden@xxxxxxxxxx>
Subject: Re: [Xen-devel] [GIT PULL] Fix lost interrupt race in Xen event channels
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Fri, 27 Aug 2010 16:43:44 -0700
Cc: "Xen-devel@xxxxxxxxxxxxxxxxxxx" <Xen-devel@xxxxxxxxxxxxxxxxxxx>, Tom Kopec <tek@xxxxxxx>, Jan Beulich <JBeulich@xxxxxxxxxx>
Delivery-date: Fri, 27 Aug 2010 16:44:42 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <1282941781.26797.386.camel@xxxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <4C743B2C.8070208@xxxxxxxx> <4C74E7C802000078000120C0@xxxxxxxxxxxxxxxxxx> <4C7558E0.1060806@xxxxxxxx> <4C7629D10200007800012387@xxxxxxxxxxxxxxxxxx> <4C769736.4050409@xxxxxxxx> <4C7799EB020000780001276F@xxxxxxxxxxxxxxxxxx> <1282941781.26797.386.camel@xxxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv: Gecko/20100806 Fedora/3.1.2-1.fc13 Lightning/1.0b2pre Thunderbird/3.1.2
 On 08/27/2010 01:43 PM, Daniel Stodden wrote:
> On Fri, 2010-08-27 at 04:56 -0400, Jan Beulich wrote:
>>>>> On 26.08.10 at 18:32, Jeremy Fitzhardinge <jeremy@xxxxxxxx> wrote:
>>> On 08/25/2010 11:46 PM, Jan Beulich wrote:
>>>>  >>> On 25.08.10 at 19:54, Jeremy Fitzhardinge <jeremy@xxxxxxxx> wrote:
>>>>> Note that this patch is specifically for upstream Xen, which doesn't
>>>>> have any pirq support in it at present.
>>>> I understand that, but saw that you had paralleling changes to the
>>>> pirq handling in your Dom0 tree.
>>>>> However,  I did consider using fasteoi, but I couldn't see how to make
>>>>> it work.  The problem is that it only does a single call into the
>>>>> irq_chip for EOI after calling the interrupt handler, but there is no
>>>>> call beforehand to ack the interrupt (which means clear the event flag
>>>>> in our case).  This leads to a race where an event can be lost after the
>>>>> interrupt handler has returned, but before the event flag has been
>>>>> cleared (because Xen won't set pending or call the upcall function if
>>>>> the event is already set).  I guess I could pre-clear the event in the
>>>>> upcall function, but I'm not sure that's any better.
>>>> That's precisely what we're doing.
>>> You mean pre-clearing the event?  OK.
>>> But aren't you still subject to the bug the switch to handle_edge_irq fixed?
>>> With handle_fasteoi_irq:
>>> cpu A                       cpu B
>>> get event
>> mask and clear event
> Argh. Right, I guess that's my fault, I was the one who came up with the
> PENDING theory, but indeed I failed to see the event masking bits.
> However, please read on.
>>> call action
>>>    :
>>>    :
>>> <migrate event channel to B>
>>>    :                        get event
>> Cannot happen, event is masked (i.e. all that would happen is
>> that the event occurrence would be logged evtchn_pending).
>>>    :                        INPROGRESS set? -> EOI, return
>>>    :
>>> action returns
>>> clear INPROGRESS
>>> EOI
>> unmask event, checking for whether the event got re-bound (and
>> doing the unmask through a hypercall if necessary), thus re-raising
>> the event in any case
> Yes. I agree. So let's come up with a new theory. Right now I'm still
> looking at xen/next. Correct me if I'm mistaken:
> mask_ack_pirq will:
>  1. chip->mask
>  2. chip->ack
> Where chip->ack will:
>  1. move_native_irq
>  2. clear_evtchn.
> Now if you look into move_native_irq, it will:
>  1. chip->mask (gratuitous)
>  2. move
>  3. chip->unmask (aiiiiiie).
> That explains why edge_irq still fixed the problem.

Good point.  I guess the simplest fix in that case would have been to
use move_masked_irq()...

The current fix is not wrong, so we can leave it as-is upstream for now.

But I think I will try Jan's idea about masking/clearing in the event
upcall then using fasteoi as the handler.


Xen-devel mailing list

<Prev in Thread] Current Thread [Next in Thread>