>>> On 26.08.10 at 18:32, Jeremy Fitzhardinge <jeremy@xxxxxxxx> wrote:
> On 08/25/2010 11:46 PM, Jan Beulich wrote:
>> >>> On 25.08.10 at 19:54, Jeremy Fitzhardinge <jeremy@xxxxxxxx> wrote:
>>> Note that this patch is specifically for upstream Xen, which doesn't
>>> have any pirq support in it at present.
>> I understand that, but saw that you had parallel changes to the
>> pirq handling in your Dom0 tree.
>>> However, I did consider using fasteoi, but I couldn't see how to make
>>> it work. The problem is that it only does a single call into the
>>> irq_chip for EOI after calling the interrupt handler, but there is no
>>> call beforehand to ack the interrupt (which means clear the event flag
>>> in our case). This leads to a race where an event can be lost after the
>>> interrupt handler has returned, but before the event flag has been
>>> cleared (because Xen won't set pending or call the upcall function if
>>> the event is already set). I guess I could pre-clear the event in the
>>> upcall function, but I'm not sure that's any better.
>> That's precisely what we're doing.
> You mean pre-clearing the event? OK.
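Yes. For concreteness, a minimal sketch of what the pre-clearing
amounts to (a sketch only, relying on <asm/sync_bitops.h> and
<linux/irq.h>; irq_from_evtchn() is an illustrative lookup helper,
and the per-vcpu selector-word scan and mask check are omitted for
brevity):

        static void evtchn_upcall_sketch(struct shared_info *s)
        {
                unsigned int port;

                for (port = 0; port < NR_EVENT_CHANNELS; port++) {
                        if (!sync_test_bit(port, s->evtchn_pending))
                                continue;
                        /* Clear *before* dispatching: a second occurrence
                         * while the handler runs re-sets the bit and
                         * re-raises the upcall, so it cannot be lost. */
                        sync_clear_bit(port, s->evtchn_pending);
                        generic_handle_irq(irq_from_evtchn(port));
                }
        }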
> But aren't you still subject to the bug the switch to handle_edge_irq fixed?
> With handle_fasteoi_irq:
> cpu A                          cpu B
> get event
mask and clear event
> set INPROGRESS
> call action
> <migrate event channel to B>
> :                              get event
Cannot happen, the event is masked (i.e. all that would happen is
that the event occurrence would be logged in evtchn_pending).
> :                              INPROGRESS set? -> EOI, return
> action returns
> clear INPROGRESS
unmask event, checking whether the event got re-bound (and doing the
unmask through a hypercall if necessary), thus re-raising the event
in any case (see the unmask sketch below)
> The event arriving on B is lost, and there's no record made of it ever
> having arrived. How do you avoid this?
> With handle_edge_irq, the second event will set PENDING in the irq_desc,
> and a loop will keep running on cpu A until PENDING is clear, so nothing
> is ever lost.
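(The loop being relied on, paraphrased from kernel/irq/chip.c of that
era rather than quoted verbatim, with the mask/unmask and disable
details elided:)

        static void handle_edge_core_sketch(unsigned int irq,
                                            struct irq_desc *desc)
        {
                /* ->ack() has already cleared the source; a second event
                 * while the handler runs merely sets IRQ_PENDING in
                 * desc->status, and this loop replays it on the original
                 * cpu until the flag stays clear. */
                do {
                        desc->status &= ~IRQ_PENDING;
                        raw_spin_unlock(&desc->lock);
                        handle_IRQ_event(irq, desc->action);
                        raw_spin_lock(&desc->lock);
                } while ((desc->status & (IRQ_PENDING | IRQ_DISABLED)) ==
                         IRQ_PENDING);
        }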
Actually, considering that you mask and unmask just like we do, I
cannot even see why you would have run into the above scenario
with handle_level_irq(). Where is the window that I'm missing?
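To make the annotated "unmask, checking for re-binding" step concrete,
this is roughly what our unmask amounts to, expressed in pv-ops terms
(a sketch; cpu_from_evtchn() stands for whatever maps a port to the
vcpu it is currently bound to):

        static void unmask_evtchn_sketch(int port)
        {
                struct shared_info *s = HYPERVISOR_shared_info;
                unsigned int cpu = get_cpu();

                if (unlikely(cpu != cpu_from_evtchn(port))) {
                        /* (Possibly) re-bound: have Xen do the unmask so
                         * a pending occurrence gets re-raised on the
                         * right vcpu. */
                        struct evtchn_unmask unmask = { .port = port };

                        HYPERVISOR_event_channel_op(EVTCHNOP_unmask,
                                                    &unmask);
                } else {
                        struct vcpu_info *v = &s->vcpu_info[cpu];

                        sync_clear_bit(port, s->evtchn_mask);
                        /* An occurrence logged while masked is replayed
                         * locally instead of waiting for a fresh upcall. */
                        if (sync_test_bit(port, s->evtchn_pending) &&
                            !sync_test_and_set_bit(port / BITS_PER_LONG,
                                                   &v->evtchn_pending_sel))
                                v->evtchn_upcall_pending = 1;
                }
                put_cpu();
        }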
>>> In the dom0 kernels, I followed the example of the IOAPIC irq_chip
>>> implementation, which does the hardware EOI in the ack call at the start
>>> of handle_edge_irq, and did the EOI hypercall (when necessary) there. I
>>> welcome a reviewer's eye on this though.
>> This I didn't actually notice so far.
>> That doesn't look right, at least in combination with ->mask() being
>> wired to disable_pirq(), which is empty (and btw., if the latter was
>> right, you should also wire ->mask_ack() to disable_pirq() to avoid
>> a pointless indirect call).
>> But even with ->mask() actually masking the IRQ I'm not certain
>> this is right. At the very least it'll make Xen see a potential
>> second instance of the same IRQ much earlier than you will
>> really be able to handle it. This is particularly bad for level
>> triggered ones, as Xen will see them right again after you
>> passed it the EOI notification - as a result there'll be twice as
>> many interrupts seen by Xen on the respective lines.
>> The native I/O APIC can validly do this as ->ack() only gets
>> called for edge triggered interrupts (which is why ->eoi() is
>> wired to ack_apic_level()).
> Yes. The code as-is works OK, but I haven't checked to see if Xen is
> taking many spurious interrupts. It probably helps that my test machine
> has MSI for almost everything.
> But does that mean the pirq code needs to have different ack/eoi
> behaviour depending on the trigger mode of the IOAPIC interrupt?
If you want to continue to use handle_edge_irq(), I think you will.
With handle_fasteoi_irq(), you would leverage Xen's handling of
edge/level, and wouldn't need to make any distinction.
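Roughly along these lines, i.e. with ->eoi() as the only chip call
handle_fasteoi_irq() makes, after the action has run (a sketch with
2.6.3x irq_chip signatures; pirq_needs_eoi() stands for a cached
PHYSDEVOP_irq_status_query / XENIRQSTAT_needs_eoi result, and the
mask/unmask wrappers are illustrative):

        static void pirq_eoi_sketch(unsigned int irq)
        {
                struct physdev_eoi eoi = { .irq = pirq_from_irq(irq) };

                unmask_evtchn(evtchn_from_irq(irq));
                if (pirq_needs_eoi(irq))
                        HYPERVISOR_physdev_op(PHYSDEVOP_eoi, &eoi);
        }

        static struct irq_chip xen_pirq_chip_sketch = {
                .name   = "xen-pirq",
                .mask   = mask_evtchn_for_irq, /* must really mask, cf. above */
                .unmask = unmask_evtchn_for_irq,
                .eoi    = pirq_eoi_sketch,
        };

bound with set_irq_chip_and_handler_name(irq, &xen_pirq_chip_sketch,
handle_fasteoi_irq, "fasteoi") at setup time.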