WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

Re: [Xen-devel] [GIT PULL] Fix lost interrupt race in Xen event channel

On Fri, 2010-08-27 at 13:43 -0700, Daniel Stodden wrote:
> On Fri, 2010-08-27 at 04:56 -0400, Jan Beulich wrote:
> > >>> On 26.08.10 at 18:32, Jeremy Fitzhardinge <jeremy@xxxxxxxx> wrote:
> > > On 08/25/2010 11:46 PM, Jan Beulich wrote:
> > >>  >>> On 25.08.10 at 19:54, Jeremy Fitzhardinge <jeremy@xxxxxxxx> wrote:
> > >>> Note that this patch is specifically for upstream Xen, which doesn't
> > >>> have any pirq support in it at present.
> > >> I understand that, but saw that you had paralleling changes to the
> > >> pirq handling in your Dom0 tree.
> > >>
> > >>> However,  I did consider using fasteoi, but I couldn't see how to make
> > >>> it work.  The problem is that it only does a single call into the
> > >>> irq_chip for EOI after calling the interrupt handler, but there is no
> > >>> call beforehand to ack the interrupt (which means clear the event flag
> > >>> in our case).  This leads to a race where an event can be lost after the
> > >>> interrupt handler has returned, but before the event flag has been
> > >>> cleared (because Xen won't set pending or call the upcall function if
> > >>> the event is already set).  I guess I could pre-clear the event in the
> > >>> upcall function, but I'm not sure that's any better.
> > >> That's precisely what we're doing.
> > > 
> > > You mean pre-clearing the event?  OK.
> > > 
> > > But aren't you still subject to the bug the switch to handle_edge_irq 
> > > fixed?
> > > 
> > > With handle_fasteoi_irq:
> > > 
> > > cpu A                     cpu B
> > > get event
> > 
> > mask and clear event
> 
> Argh. Right, I guess that's my fault, I was the one who came up with the
> PENDING theory, but indeed I failed to see the event masking bits.
> 
> However, please read on.
> 
> > > set INPROGRESS
> > > call action
> > >    :
> > >    :
> > > <migrate event channel to B>
> > >    :                      get event
> > 
> > Cannot happen, event is masked (i.e. all that would happen is
> > that the event occurrence would be logged evtchn_pending).
> > 
> > >    :                      INPROGRESS set? -> EOI, return
> > >    :
> > > action returns
> > > clear INPROGRESS
> > > EOI
> > 
> > unmask event, checking for whether the event got re-bound (and
> > doing the unmask through a hypercall if necessary), thus re-raising
> > the event in any case
> 
> Yes. I agree. So let's come up with a new theory. Right now I'm still
> looking at xen/next. Correct me if I'm mistaken:
> 
> mask_ack_pirq will:
>  1. chip->mask
>  2. chip->ack
> 
> Where chip->ack will:
>  1. move_native_irq
>  2. clear_evtchn.
> 
> Now if you look into move_native_irq, it will:
>  1. chip->mask (gratuitous)
>  2. move
>  3. chip->unmask (aiiiiiie).
> 
> That explains why edge_irq still fixed the problem.
> 
> Price question is if that's the kind of fix we wanted then.

XCP has, presumably older, mask_ack() and ack() handlers in
core/evtchn.c. Those
1. move
2. mask
3. ack

and therefore don't have that problem. So maybe this was caused by some
pvops specific patch a while ago?

Cheers,
Daniel


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

<Prev in Thread] Current Thread [Next in Thread>