This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
Home Products Support Community News


Re: [Xen-devel] [GIT PULL] Fix lost interrupt race in Xen event channel

On Fri, 2010-08-27 at 04:56 -0400, Jan Beulich wrote:
> >>> On 26.08.10 at 18:32, Jeremy Fitzhardinge <jeremy@xxxxxxxx> wrote:
> > On 08/25/2010 11:46 PM, Jan Beulich wrote:
> >>  >>> On 25.08.10 at 19:54, Jeremy Fitzhardinge <jeremy@xxxxxxxx> wrote:
> >>> Note that this patch is specifically for upstream Xen, which doesn't
> >>> have any pirq support in it at present.
> >> I understand that, but saw that you had paralleling changes to the
> >> pirq handling in your Dom0 tree.
> >>
> >>> However,  I did consider using fasteoi, but I couldn't see how to make
> >>> it work.  The problem is that it only does a single call into the
> >>> irq_chip for EOI after calling the interrupt handler, but there is no
> >>> call beforehand to ack the interrupt (which means clear the event flag
> >>> in our case).  This leads to a race where an event can be lost after the
> >>> interrupt handler has returned, but before the event flag has been
> >>> cleared (because Xen won't set pending or call the upcall function if
> >>> the event is already set).  I guess I could pre-clear the event in the
> >>> upcall function, but I'm not sure that's any better.
> >> That's precisely what we're doing.
> > 
> > You mean pre-clearing the event?  OK.
> > 
> > But aren't you still subject to the bug the switch to handle_edge_irq fixed?
> > 
> > With handle_fasteoi_irq:
> > 
> > cpu A                       cpu B
> > get event
> mask and clear event

Argh. Right, I guess that's my fault, I was the one who came up with the
PENDING theory, but indeed I failed to see the event masking bits.

However, please read on.

> > call action
> >    :
> >    :
> > <migrate event channel to B>
> >    :                        get event
> Cannot happen, event is masked (i.e. all that would happen is
> that the event occurrence would be logged evtchn_pending).
> >    :                        INPROGRESS set? -> EOI, return
> >    :
> > action returns
> > clear INPROGRESS
> > EOI
> unmask event, checking for whether the event got re-bound (and
> doing the unmask through a hypercall if necessary), thus re-raising
> the event in any case

Yes. I agree. So let's come up with a new theory. Right now I'm still
looking at xen/next. Correct me if I'm mistaken:

mask_ack_pirq will:
 1. chip->mask
 2. chip->ack

Where chip->ack will:
 1. move_native_irq
 2. clear_evtchn.

Now if you look into move_native_irq, it will:
 1. chip->mask (gratuitous)
 2. move
 3. chip->unmask (aiiiiiie).

That explains why edge_irq still fixed the problem.

Price question is if that's the kind of fix we wanted then.


Xen-devel mailing list

<Prev in Thread] Current Thread [Next in Thread>