WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

Re: [Xen-devel] [PATCH] x86/IRQ: prevent vector sharing within IO-APICs

>>> On 14.11.11 at 16:27, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
> On 14/11/11 14:20, Jan Beulich wrote:
>> I'm having some more fundamental problem with this original change
>> of yours: The assignment of the new vector happens in the context
>> of move_native_irq(), which gets called *after* doing the manual
>> EOI (and hence also *after* the vector != desc->arch.vector check).
>> How does that do what it is supposed to?
>>
>> (Below the [untested] patch that I would propose to do what I described
>> above, pending your clarification regarding the original change.)
>>
>> Jan
> 
> Are you sure?  I was under the impression that for safety, you have to
> move the IRQ before ack'ing it at the local APIC, to prevent another IRQ
> coming in while you are updating the structures.

Just take another look at end_level_ioapic_irq().

> The steps (as far as I remember from debugging this issue) are:
> 
> 1) Scheduler decides move a vcpu to a different pcpu.  This implies
> moving all bound IRQs.  It sets irq_desc->arch->pending_mask to the
> pcpu(s) we wish to move to, and sets irq_desc->static |= IRQ_MOVE_PENDING
> 2) When an irq appears (on the original pcpu) which has IRQ_MOVE_PENDING
> set, the ack handler rewrites the RTE to point to the new pcpu:vector,
> and updates the irq_desc a bit.  This rewriting necessarily happens
> before sending an EOI.

One would think so, but according to my reading of the code that's
not the case. And it really can't, as otherwise
io_apic_level_ack_pending() would always return true (and hence
move_native_irq() would never get called).

> 3) When the next irq appears (on the new pcpu), the code cleans up the
> old vector_irq reference for the old pcpu, and tweaks some of irq_desc.

But the crucial thing is that desc->arch.vector gets written during step 2,
and hence the condition around the EOI can never be true unless an
interrupt surfaces in the window between the EOI in step 2 and the
disabling of it in move_native_irq(). Was it just this special case that
your original change addressed (the change description doesn't hint in
that direction).

> An alternative to 3) can occur when using the IRQ_MOVE_CLEANUP_VECTOR
> IPI which looks through all pending cleanups on the current pcpu and
> cleans them up.
> 
> Anyway, it was certainly the case that I saw the RTE being updated
> before the EOI was sent, which is why I had to change the
> hw_irq_controler.end interface to include the vector from the pending
> EOI stack, so end_level_ioapic_irq could EOI the new vector if it was
> different from the vector the lapic saw.

That's another thing I would want to revert: There really shouldn't be
a need to pass in the old vector, as it is getting stored in
__assign_irq_vector(). If the value wasn't valid anymore at the point
we'd need it, we'd have a more severe problem - the vector gets
marked as not in use by smp_irq_move_cleanup_interrupt(), and
hence it wouldn't be valid anymore to play with it (read: EOI it) as
it may have got re-used already. As vector_irq[] gets updated there
too, that should result in "No irq handler for vector ...", but it would
not make it into end_level_ioapic_irq() in such a case.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel