On 14/11/11 14:20, Jan Beulich wrote:
>>>> On 14.11.11 at 14:59, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 14/11/11 08:57, Jan Beulich wrote:
>>>>>> On 11.11.11 at 18:13, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>>>> On 11/11/11 16:00, Jan Beulich wrote:
>>>>> Following the prevention of vector sharing for MSIs, this change
>>>>> enforces the same within IO-APICs: Pin based interrupts use the IO-APIC
>>>>> as their identifying device under the AMD IOMMU (and just like for
>>>>> MSIs, only the identifying device is used to remap interrupts here,
>>>>> with no regard to an interrupt's destination).
>>>>>
>>>>> Additionally, LAPIC initiated EOIs (for level triggered interrupts) too
>>>>> use only the vector for identifying which interrupts to end. While this
>>>>> generally causes no significant problem (at worst an interrupt would be
>>>>> re-raised without a new interrupt event actually having occurred)
>>>> At worst, hardware asserts a line interrupt, deasserts it later, and an
>>>> EOI broadcast gets rid of any record that the IRQ was ever raised.
>>>> While I would classify this as buggy behavior, I believe I have seen
>>>> some hardware doing this when investigating the line level IRQ migration
>>>> bug, as clearing the IRR did not immediately cause another interrupt to
>>>> be generated.
>>>>
>>>>> , it
>>>>> still seems better to avoid the situation.
>>>>>
>>>>> For this second aspect, a distinction is being made between the
>>>>> traditional and the directed-EOI cases: In the former, vectors should
>>>>> not be shared throughout all IO-APICs in the system, while in the
>>>>> latter case only individual IO-APICs need to be contrained (or, if the
>>>>> firmware indicates so, sub- groups of them having the same GSI appear
>>>>> at multiple pins).
>>>>>
>>>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>>>> Provisional nack because it is my understanding that under all
>>>> circumstances, you must maintain a vector exclusivity map across all
>>>> IO-APICs because of the broadcast problem. Or have I made a mistake in
>>>> my reasoning?
>>> With directed EOI there's no broadcasting involved, which is why
>>> global sharing prevention is not necessary.
>>>
>>> However, after some more thinking over the weekend I think we need
>>> to also/first adjust end_level_ioapic_irq()'s call to io_apic_eoi_vector():
>>> It shouldn't really iterate over all IO-APICs, but instead call
>>> eoi_IO_APIC_irq(). Thoughts? (That would also eliminate the need to
>>> look up pin or vector in __io_apic_eoi(), as all remaining call sites pass
>>> both.)
>>>
>>> Jan
>>>
>> I believe that should work. By the point end_level_ioapic_irq() is
>> called, all the irq_desc information should point to the new vector, so
>> eoi_IO_APIC_irq() should get it correct. At the time I made that patch,
>> I was not so familiar with the IO-APIC code so decided that calling
>> io_apic_eoi was the safer bet.
> I'm having some more fundamental problem with this original change
> of yours: The assignment of the new vector happens in the context
> of move_native_irq(), which gets called *after* doing the manual
> EOI (and hence also *after* the vector != desc->arch.vector check).
> How does that do what it is supposed to?
>
> (Below the [untested] patch that I would propose to do what I described
> above, pending your clarification regarding the original change.)
>
> Jan
Are you sure? I was under the impression that for safety, you have to
move the IRQ before ack'ing it at the local APIC, to prevent another IRQ
coming in while you are updating the structures.
The steps (as far as I remember from debugging this issue) are:
1) Scheduler decides move a vcpu to a different pcpu. This implies
moving all bound IRQs. It sets irq_desc->arch->pending_mask to the
pcpu(s) we wish to move to, and sets irq_desc->static |= IRQ_MOVE_PENDING
2) When an irq appears (on the original pcpu) which has IRQ_MOVE_PENDING
set, the ack handler rewrites the RTE to point to the new pcpu:vector,
and updates the irq_desc a bit. This rewriting necessarily happens
before sending an EOI.
3) When the next irq appears (on the new pcpu), the code cleans up the
old vector_irq reference for the old pcpu, and tweaks some of irq_desc.
An alternative to 3) can occur when using the IRQ_MOVE_CLEANUP_VECTOR
IPI which looks through all pending cleanups on the current pcpu and
cleans them up.
Anyway, it was certainly the case that I saw the RTE being updated
before the EOI was sent, which is why I had to change the
hw_irq_controler.end interface to include the vector from the pending
EOI stack, so end_level_ioapic_irq could EOI the new vector if it was
different from the vector the lapic saw.
I hope this makes sense - it has been a little while since I last looked
at the problem.
~Andrew
> --- a/xen/arch/x86/io_apic.c
> +++ b/xen/arch/x86/io_apic.c
> @@ -69,10 +69,6 @@ int __read_mostly nr_ioapics;
>
> #define ioapic_has_eoi_reg(apic) (mp_ioapics[(apic)].mpc_apicver >= 0x20)
>
> -#define io_apic_eoi_vector(apic, vector) io_apic_eoi((apic), (vector), -1)
> -#define io_apic_eoi_pin(apic, pin) io_apic_eoi((apic), -1, (pin))
> -
> -
> /*
> * This is performance-critical, we want to do it O(1)
> *
> @@ -220,14 +216,11 @@ static void ioapic_write_entry(
> * same redirection entry in the IO-APIC. */
> static void __io_apic_eoi(unsigned int apic, unsigned int vector, unsigned
> int pin)
> {
> - /* Ensure some useful information is passed in */
> - BUG_ON( (vector == -1 && pin == -1) );
> -
> /* Prefer the use of the EOI register if available */
> if ( ioapic_has_eoi_reg(apic) )
> {
> /* If vector is unknown, read it from the IO-APIC */
> - if ( vector == -1 )
> + if ( vector == IRQ_VECTOR_UNASSIGNED )
> vector = __ioapic_read_entry(apic, pin, TRUE).vector;
>
> *(IO_APIC_BASE(apic)+16) = vector;
> @@ -239,42 +232,6 @@ static void __io_apic_eoi(unsigned int a
> struct IO_APIC_route_entry entry;
> bool_t need_to_unmask = 0;
>
> - /* If pin is unknown, search for it */
> - if ( pin == -1 )
> - {
> - unsigned int p;
> - for ( p = 0; p < nr_ioapic_entries[apic]; ++p )
> - {
> - entry = __ioapic_read_entry(apic, p, TRUE);
> - if ( entry.vector == vector )
> - {
> - pin = p;
> - /* break; */
> -
> - /* Here should be a break out of the loop, but at the
> - * Xen code doesn't actually prevent multiple IO-APIC
> - * entries being assigned the same vector, so EOI all
> - * pins which have the correct vector.
> - *
> - * Remove the following code when the above assertion
> - * is fulfilled. */
> - __io_apic_eoi(apic, vector, p);
> - }
> - }
> -
> - /* If search fails, nothing to do */
> -
> - /* if ( pin == -1 ) */
> -
> - /* Because the loop wasn't broken out of (see comment above),
> - * all relevant pins have been EOI, so we can always return.
> - *
> - * Re-instate the if statement above when the Xen logic has been
> - * fixed.*/
> -
> - return;
> - }
> -
> entry = __ioapic_read_entry(apic, pin, TRUE);
>
> if ( ! entry.mask )
> @@ -1645,7 +1602,6 @@ static void mask_and_ack_level_ioapic_ir
> static void end_level_ioapic_irq(struct irq_desc *desc, u8 vector)
> {
> unsigned long v;
> - int i;
>
> if ( !ioapic_ack_new )
> {
> @@ -1689,15 +1645,9 @@ static void end_level_ioapic_irq(struct
> * operation to prevent an edge-triggered interrupt escaping meanwhile.
> * The idea is from Manfred Spraul. --macro
> */
> - i = desc->arch.vector;
> -
> /* Manually EOI the old vector if we are moving to the new */
> - if ( vector && i != vector )
> - {
> - int ioapic;
> - for (ioapic = 0; ioapic < nr_ioapics; ioapic++)
> - io_apic_eoi_vector(ioapic, i);
> - }
> + if ( vector && desc->arch.vector != vector )
> + eoi_IO_APIC_irq(desc);
>
> v = apic_read(APIC_TMR + ((i & ~0x1f) >> 1));
>
>
--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
|