>>> On 14.11.11 at 16:27, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
> On 14/11/11 14:20, Jan Beulich wrote:
>> I'm having some more fundamental problem with this original change
>> of yours: The assignment of the new vector happens in the context
>> of move_native_irq(), which gets called *after* doing the manual
>> EOI (and hence also *after* the vector != desc->arch.vector check).
>> How does that do what it is supposed to?
>>
>> (Below the [untested] patch that I would propose to do what I described
>> above, pending your clarification regarding the original change.)
>>
>> Jan
>
> Are you sure? I was under the impression that for safety, you have to
> move the IRQ before ack'ing it at the local APIC, to prevent another IRQ
> coming in while you are updating the structures.
Just take another look at end_level_ioapic_irq().
> The steps (as far as I remember from debugging this issue) are:
>
> 1) Scheduler decides move a vcpu to a different pcpu. This implies
> moving all bound IRQs. It sets irq_desc->arch->pending_mask to the
> pcpu(s) we wish to move to, and sets irq_desc->static |= IRQ_MOVE_PENDING
> 2) When an irq appears (on the original pcpu) which has IRQ_MOVE_PENDING
> set, the ack handler rewrites the RTE to point to the new pcpu:vector,
> and updates the irq_desc a bit. This rewriting necessarily happens
> before sending an EOI.
One would think so, but according to my reading of the code that's
not the case. And it really can't, as otherwise
io_apic_level_ack_pending() would always return true (and hence
move_native_irq() would never get called).
> 3) When the next irq appears (on the new pcpu), the code cleans up the
> old vector_irq reference for the old pcpu, and tweaks some of irq_desc.
But the crucial thing is that desc->arch.vector gets written during step 2,
and hence the condition around the EOI can never be true unless an
interrupt surfaces in the window between the EOI in step 2 and the
disabling of it in move_native_irq(). Was it just this special case that
your original change addressed (the change description doesn't hint in
that direction).
> An alternative to 3) can occur when using the IRQ_MOVE_CLEANUP_VECTOR
> IPI which looks through all pending cleanups on the current pcpu and
> cleans them up.
>
> Anyway, it was certainly the case that I saw the RTE being updated
> before the EOI was sent, which is why I had to change the
> hw_irq_controler.end interface to include the vector from the pending
> EOI stack, so end_level_ioapic_irq could EOI the new vector if it was
> different from the vector the lapic saw.
That's another thing I would want to revert: There really shouldn't be
a need to pass in the old vector, as it is getting stored in
__assign_irq_vector(). If the value wasn't valid anymore at the point
we'd need it, we'd have a more severe problem - the vector gets
marked as not in use by smp_irq_move_cleanup_interrupt(), and
hence it wouldn't be valid anymore to play with it (read: EOI it) as
it may have got re-used already. As vector_irq[] gets updated there
too, that should result in "No irq handler for vector ...", but it would
not make it into end_level_ioapic_irq() in such a case.
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
|