WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

Re: [Xen-devel] [PATCH] IRQ: manually EOI migrating line interrupts

>>> On 05.09.11 at 14:29, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:

> 
> On 05/09/11 11:43, Jan Beulich wrote:
>>>>> On 30.08.11 at 16:17, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>>> When migrating IO-APIC line level interrupts between PCPUs, the
>>> migration code rewrites the IO-APIC entry to point to the new
>>> CPU/Vector before EOI'ing it.
>>>
>>> The EOI process says that EOI'ing the Local APIC will cause a
>>> broadcast with the vector number, which the IO-APIC must listen to to
>>> clear the IRR and Status bits.
>>>
>>> In the case of migrating, the IO-APIC has already been
>>> reprogrammed so the EOI broadcast with the old vector fails to match
>>> the new vector, leaving the IO-APIC with an outstanding vector,
>>> preventing any more use of that line interrupt.  This causes a lockup
>>> especially when your root device is using PCI INTA (megaraid_sas
>>> driver *ehem*)
>>>
>>> However, the problem is mostly hidden because send_cleanup_vector()
>>> causes a cleanup of all moving vectors on the current PCPU in such a
>>> way which does not cause the problem, and if the problem has occured,
>>> the writes it makes to the IO-APIC clears the IRR and Status bits
>>> which unlocks the problem.
>>>
>>>
>>> This fix is distinctly a temporary hack, waiting on a cleanup of the
>>> irq code.  It checks for the edge case where we have moved the irq,
>>> and manually EOI's the old vector with the IO-APIC which correctly
>>> clears the IRR and Status bits.  Also, it protects the code which
>>> updates irq_cfg by disabling interrupts.
>>>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>>
>>> diff -r 227130622561 -r a95fd5d03c20 xen/arch/x86/hpet.c
>>> --- a/xen/arch/x86/hpet.c   Thu Aug 25 12:03:14 2011 +0100
>>> +++ b/xen/arch/x86/hpet.c   Tue Aug 30 15:15:56 2011 +0100
>>> @@ -301,7 +301,7 @@ static void hpet_msi_ack(unsigned int ir
>>>      ack_APIC_irq();
>>>  }
>>>  
>>> -static void hpet_msi_end(unsigned int irq)
>>> +static void hpet_msi_end(unsigned int irq, u8 vector)
>>>  {
>>>  }
>>>  
>>> diff -r 227130622561 -r a95fd5d03c20 xen/arch/x86/i8259.c
>>> --- a/xen/arch/x86/i8259.c  Thu Aug 25 12:03:14 2011 +0100
>>> +++ b/xen/arch/x86/i8259.c  Tue Aug 30 15:15:56 2011 +0100
>>> @@ -93,7 +93,7 @@ static unsigned int startup_8259A_irq(un
>>>      return 0; /* never anything pending */
>>>  }
>>>  
>>> -static void end_8259A_irq(unsigned int irq)
>>> +static void end_8259A_irq(unsigned int irq, u8 vector)
>>>  {
>>>      if (!(irq_desc[irq].status & (IRQ_DISABLED|IRQ_INPROGRESS)))
>>>          enable_8259A_irq(irq);
>>> diff -r 227130622561 -r a95fd5d03c20 xen/arch/x86/io_apic.c
>>> --- a/xen/arch/x86/io_apic.c        Thu Aug 25 12:03:14 2011 +0100
>>> +++ b/xen/arch/x86/io_apic.c        Tue Aug 30 15:15:56 2011 +0100
>>> @@ -1690,7 +1690,7 @@ static void mask_and_ack_level_ioapic_ir
>>>      }
>>>  }
>>>  
>>> -static void end_level_ioapic_irq (unsigned int irq)
>>> +static void end_level_ioapic_irq (unsigned int irq, u8 vector)
>>>  {
>>>      unsigned long v;
>>>      int i;
>>> @@ -1739,6 +1739,14 @@ static void end_level_ioapic_irq (unsign
>>>   */
>>>      i = IO_APIC_VECTOR(irq);
>>>  
>>> +    /* Manually EOI the old vector if we are moving to the new */
>>> +    if ( vector && i != vector )
>>> +    {
>>> +        int ioapic;
>>> +        for (ioapic = 0; ioapic < nr_ioapics; ioapic++)
>>> +            io_apic_eoi(ioapic, i);
>> While I realize that the patch already went in, this still will need
>> adjustment for dealing with old IO-APICs that don't have an EOI
>> register (or if we want to stop supporting such, a clear panic()
>> rather than subtle and hard to debug failure).
>>
>> Jan
>>
> 
> This is a good point.  However, due to the use of io_apic_eoi elsewhere
> in the code, I don't think this is the only area susceptible to this issue.

The only other two uses that I could spot are in directed-EOI specific
code (where a new enough IO-APIC can be implied) and in the code
recently added to clear remoteIRR bits (protected by a version check).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

<Prev in Thread] Current Thread [Next in Thread>