On 15/11/11 11:10, Jan Beulich wrote:
>>>> On 14.11.11 at 18:33, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 14/11/11 16:29, Jan Beulich wrote:
>>>>>> On 14.11.11 at 16:27, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>>>> On 14/11/11 14:20, Jan Beulich wrote:
>>>>> I'm having a more fundamental problem with this original change
>>>>> of yours: The assignment of the new vector happens in the context
>>>>> of move_native_irq(), which gets called *after* doing the manual
>>>>> EOI (and hence also *after* the vector != desc->arch.vector check).
>>>>> How does that do what it is supposed to?
>>>>>
>>>>> (Below is the [untested] patch that I would propose to do what I described
>>>>> above, pending your clarification regarding the original change.)
>>>>>
>>>>> Jan
>>>> Are you sure? I was under the impression that for safety, you have to
>>>> move the IRQ before ack'ing it at the local APIC, to prevent another IRQ
>>>> coming in while you are updating the structures.
>>> Just take another look at end_level_ioapic_irq().
>> If it makes a difference, I was debugging the issue on boxes without
>> x2APIC, so there was no direct EOI support. Given the number of times
> Directed EOI has nothing to do with x2apic.
The manual (http://www.naic.edu/~phil/software/intel/318148.pdf) implies
that only x2APIC hardware has DirectEOI support, even though it is
usable in both xAPIC and x2APIC mode if bit 24 of the LAPIC version
register is set.
Irrespective of the implication, I discovered when profiling the code
that DirectEOI was not supported on the box in question.
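For reference, the capability check itself is trivial; a minimal sketch
using the architectural constants and assuming the usual apic_read()
accessor (this is the shape of the check, not a verbatim quote of the
code in the tree):

    #define APIC_LVR               0x30        /* LAPIC version register    */
    #define APIC_LVR_DIRECTED_EOI  (1u << 24)  /* EOI-broadcast suppression */

    static bool_t directed_eoi_supported(void)
    {
        /* Bit 24 of the version register advertises DirectEOI support,
         * whether the LAPIC is in xAPIC or x2APIC mode. */
        return !!(apic_read(APIC_LVR) & APIC_LVR_DIRECTED_EOI);
    }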
>> the code conditionally returns depending on directed_eoi_enabled, I would
>> not be surprised if there are two separate codepaths.
>> ...
>>> But the crucial thing is that desc->arch.vector gets written during step 2,
>>> and hence the condition around the EOI can never be true unless an
>>> interrupt surfaces in the window between the EOI in step 2 and the
>>> disabling of it in move_native_irq(). Was it just this special case that
>>> your original change addressed? (The change description doesn't hint in
>>> that direction.)
>> Given that the bug was certainly a timing issue, and most of the time
>> the IRQ migrated and everything worked fine, it is quite possible that
>> it was a new IRQ appearing in that window. Having said that, even if a
>> new IRQ arrived, its remote IRR should not be set until the LAPIC has
>> acked it, at which point we should be in the IRQ handler for it, which
>> seems rather hard if we are still in the handler for the previous IRQ.
> No - the LAPIC acking the IO-APIC request doesn't mean the IRQ
> is already in the process of being handled by software. In particular,
> while the first IRQ is in progress, the IRR bit could get set by a
> second interrupt instance between the EOI and the disabling (in
> move_native_irq()), but EFLAGS.IF would still be zero. Once sti (or
> an equivalent thereof) would get executed, that second interrupt
> would get delivered from the LAPIC to the execution unit, and the
> mask-and-ack would invoke irq_complete_move(), which in turn
> might send the cleanup IPI (which in turn wouldn't get delivered
> until EFLAGS.IF became 1 again). Depending on whether the IPI
> arrives before or after end_level_ioapic_irq() runs for this second
> interrupt instance, the vectors may or may not appear to be out of
> sync (and here the description of your original change seems
> wrong - you're using the new vector, not the old one, as you
> want to make sure that the vector written actually matches the
> already updated RTE(s)).
If this is the case, we really should not be sending an EOI to line-level
interrupts until we have actually serviced them (and moved them if
relevant). The remote IRR should not be cleared until we are able to
service a new interrupt from the source.
Rereading my comment in the patch, I see it is wrong. The comment should
read

    /* Manually EOI the new vector if we have migrated while servicing */

or something equivalent.
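In context, the tail of end_level_ioapic_irq() would then look roughly
like the sketch below (shape only, not the exact code in the tree;
eoi_IO_APIC_irq() stands in for whatever helper writes a vector to the
IO-APIC EOI register):

    if ( vector != desc->arch.vector )
    {
        /* Manually EOI the new vector if we have migrated while servicing. */
        eoi_IO_APIC_irq(desc->arch.vector);
    }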
>> Perhaps there is a weird interaction whereby acking a masked level
>> interrupt doesn't reset the IRR.
>>
>> I have to admit that the logic seems a bit broken, and it certainly
>> seems to be more complicated than it needs to be.
>>
>>>> An alternative to 3) can occur when using the IRQ_MOVE_CLEANUP_VECTOR
>>>> IPI, which looks through all pending cleanups on the current pcpu and
>>>> cleans them up.
>>>>
>>>> Anyway, it was certainly the case that I saw the RTE being updated
>>>> before the EOI was sent, which is why I had to change the
>>>> hw_irq_controller.end interface to include the vector from the pending
>>>> EOI stack, so end_level_ioapic_irq could EOI the new vector if it was
>>>> different from the vector the lapic saw.
>>> That's another thing I would want to revert: There really shouldn't be
>>> a need to pass in the old vector, as it is getting stored in
>>> __assign_irq_vector(). If the value wasn't valid anymore at the point
>>> we'd need it, we'd have a more severe problem - the vector gets
>>> marked as not in use by smp_irq_move_cleanup_interrupt(), and
>>> hence it wouldn't be valid anymore to play with it (read: EOI it) as
>>> it may have got re-used already. As vector_irq[] gets updated there
>>> too, that should result in "No irq handler for vector ...", but it would
>>> not make it into end_level_ioapic_irq() in such a case.
>>>
>>> Jan
>> That is fair enough. When I submitted this patch, irq_cfg did not have
>> old_vector in it - I added old_vector in a later patch. I really should
>> have remembered that I could then revert the change to the interface.
> The point is that we need to be sure that ->arch.old_vector is still valid
> at this point. Yesterday I thought I had convinced myself that it would be,
> but today I'm again not certain anymore.
>
> Jan
If your description of the case causing the bug above is correct, then
->arch.old_vector may not be valid at this point, because another
processor could take a new interrupt and call the cleanup code. I am
struggling to work out whether that race condition is possible or not.
One option would be to try it and see whether the bug reoccurs - I still
have access to the server which reproduces the line-level EOI bug.
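To spell out the interleaving I am worried about (my reading of the flow;
the exact hand-off may well not match the tree, which is why I am unsure
whether it can really happen):

    CPU A (in the old cpu mask)         elsewhere
    ---------------------------         ---------
    servicing the last interrupt
    on the old vector, IF clear
                                        a new interrupt is taken on the
                                        new vector; irq_complete_move()
                                        sends IRQ_MOVE_CLEANUP_VECTOR
                                        another CPU in the old mask runs
                                        smp_irq_move_cleanup_interrupt()
                                        and releases the old vector
    end_level_ioapic_irq() runs;
    ->arch.old_vector is already stale

If that window is real, relying on ->arch.old_vector would reintroduce the
bug.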
--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com