
xen-devel

Re: [Xen-devel] Re: [PATCH] xen: fix interrupt routing

On 14.06.2011, at 15:27, Stefano Stabellini wrote:

> On Tue, 14 Jun 2011, Alexander Graf wrote:
>>>>>>> static int i440fx_load_old(QEMUFile* f, void *opaque, int version_id)
>>>>>>> {
>>>>>>>  PCII440FXState *d = opaque;
>>>>>>> @@ -267,8 +263,17 @@ static PCIBus *i440fx_common_init(const char *device_name,
>>>>>>>  d = pci_create_simple(b, 0, device_name);
>>>>>>>  *pi440fx_state = DO_UPCAST(PCII440FXState, dev, d);
>>>>>>> 
>>>>>>> -    piix3 = DO_UPCAST(PIIX3State, dev,
>>>>>>> -                      pci_create_simple_multifunction(b, -1, true, "PIIX3"));
>>>>>>> +    if (xen_enabled()) {
>>>>>>> +        piix3 = DO_UPCAST(PIIX3State, dev,
>>>>>>> +                pci_create_simple_multifunction(b, -1, true, "PIIX3-xen"));
>>>>>>> +        pci_bus_irqs(b, xen_piix3_set_irq, xen_pci_slot_get_pirq,
>>>>>>> +                piix3, XEN_PIIX_NUM_PIRQS);
>>>>>> 
>>>>>> But with XEN_PIIX_NUM_PIRQS it's not a piix3 anymore, no? What's the 
>>>>>> reason behind this change?
>>>>> 
>>>>> It is still a piix3, but also provides non-legacy interrupt links to the
>>>>> IO-APIC.
>>>>> The four pins of each PCI device on the bus are not only routed to the
>>>>> normal four PIRQs (programmed by writing to 0x60-0x63, see above) but
>>>>> are also connected directly to the IO-APIC.
>>>>> These additional routes can only be discovered through ACPI, so you need
>>>>> matching ACPI tables. We used to build the old ACPI tables like this:
>>>>> 
>>>>> /* PRTA: APIC routing table (via non-legacy IOAPIC GSIs). */
>>>>> printf("Name(PRTA, Package() {\n");
>>>>> for ( dev = 1; dev < 32; dev++ )
>>>>>  for ( intx = 0; intx < 4; intx++ ) /* INTA-D */
>>>>>      printf("Package(){0x%04xffff, %u, 0, %u},\n",
>>>>>             dev, intx, ((dev*4+dev/8+intx)&31)+16);
>>>>> printf("})\n");
>>>>> 
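For readers following the thread, the routing formula in the quoted table generator can be pulled out into a small standalone function. This is only a sketch: the names prta_gsi, print_prta and FIRST_NON_LEGACY_GSI are made up for illustration and do not come from the Xen or QEMU sources; only the arithmetic and the printf format are taken from the quoted snippet.

```c
#include <stdio.h>

/* GSIs 0-15 are the legacy ISA IRQs; the quoted formula starts at 16. */
#define FIRST_NON_LEGACY_GSI 16u

/*
 * Hypothetical helper: GSI for PCI device number `dev` and interrupt pin
 * `intx` (0 = INTA ... 3 = INTD), using the arithmetic from the quoted
 * generator. The "& 31" folds every device/pin pair onto 32 GSIs (16..47).
 */
unsigned prta_gsi(unsigned dev, unsigned intx)
{
    return ((dev * 4 + dev / 8 + intx) & 31) + FIRST_NON_LEGACY_GSI;
}

/* Emit the PRTA package in the same textual form as the quoted loop. */
void print_prta(void)
{
    unsigned dev, intx;

    printf("Name(PRTA, Package() {\n");
    for (dev = 1; dev < 32; dev++)
        for (intx = 0; intx < 4; intx++)
            printf("Package(){0x%04xffff, %u, 0, %u},\n",
                   dev, intx, prta_gsi(dev, intx));
    printf("})\n");
}
```

With this formula, device 1 INTA lands on GSI 20 and device 8 INTA on GSI 17. Because of the "& 31" mask, the 124 device/pin pairs share 32 GSIs, so sharing is reduced rather than eliminated.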
>>>> 
>>>> Interesting concept, but completely non-standard and very much
>>>> different from real hardware. Please at least add a comment there to
>>>> show readers that Xen is doing a hack which is not at all related to
>>>> how the PIIX really works.
>>> 
>>> Isn't this more a function of the "wires" on the motherboard than the
>>> PIIX specifically? i.e. this just encodes the permutation of the wires
>>> from the PCI slots into the IO-APIC input pins (bypassing the PIIX,
>>> which is only used for legacy ISA IRQs i.e. by non-APIC aware OSes)?
>> 
>> Interrupts with PCI work slightly differently. PCI devices can map (themselves 
>> or by software) to one of 4 interrupt lines: INTA, INTB, INTC, INTD. These 
>> get converted using PCI host controller specific logic to 4 interrupt lines 
>> which then go into the IO-APIC.
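As an illustration of the "4 lines in, 4 lines out" conversion described above, the conventional rotation used for devices behind a PCI-to-PCI bridge can be sketched as follows. This is an assumption-laden example: the function name pci_swizzle is hypothetical, and the mapping a given host controller or board actually uses may differ from this rotation.

```c
/*
 * Hypothetical helper: conventional PCI interrupt "swizzle".
 * `dev` is the device (slot) number, `pin` is 0 for INTA ... 3 for INTD.
 * Returns which of the four shared interrupt lines (0..3) the pin lands on.
 * Rotating by the device number spreads INTA of neighbouring slots across
 * different lines so that they do not all share one interrupt.
 */
unsigned pci_swizzle(unsigned dev, unsigned pin)
{
    return (dev + pin) % 4;
}
```

However many devices sit on the bus, the result is always one of four lines. That is the limit the Xen approach under discussion goes beyond.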
>> 
>> The IO-APIC is a chip with a limited number of pins. IIRC it was 24, could 
>> be 26 though.
> 
> The number of redirection entries in the IOAPIC can be discovered by
> reading the IOAPICVER register and it is a property of a specific
> model of IOAPIC. As a matter of fact Xen's emulated IOAPIC supports more
> pins than the most popular IOAPIC used with PIIX3.
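To make the IOAPICVER point concrete, here is a minimal sketch of decoding the pin count from a raw register value. The helper names are hypothetical. The field layout (version in bits 0-7, maximum redirection entry in bits 16-23) follows the Intel 82093AA datasheet, and the sketch assumes the 32-bit value has already been read through the IOAPIC's index/data window.

```c
#include <stdint.h>

/* Hypothetical helper: APIC version field, bits 0-7 of IOAPICVER. */
unsigned ioapic_version(uint32_t ioapicver)
{
    return ioapicver & 0xffu;
}

/*
 * Hypothetical helper: number of interrupt input pins.
 * Bits 16-23 hold the index of the highest redirection entry,
 * so the pin count is that value plus one.
 */
unsigned ioapic_pin_count(uint32_t ioapicver)
{
    return ((ioapicver >> 16) & 0xffu) + 1;
}
```

For the 82093AA the register reads 0x00170011, which decodes to version 0x11 and 24 pins. An emulated IOAPIC with more pins would report a larger value in bits 16-23.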

which means you're emulating hardware that never existed :).

> 
> 
>> I haven't seen a single case where PCI devices have a direct link to the 
>> IO-APIC. I also have not seen any PCI host controller that exports more than 
>> 4 interrupts. Giving each PCI device its own line, on top of that more than 
>> ever could be in real hardware, is a plain hack IMHO.
> 
> Actually this happens quite often: if I am not mistaken all the GSIs
> higher than 15 are actually the result of a direct connection between
> an interrupt source and the IOAPIC. I have several on my testboxes.

Yes. "Interrupt source" meaning a wire on the board. I haven't seen any 
situation so far where you get direct IO-APIC connections to PCI _device_ pins. 
You obviously get plenty of connections to PCI _bus_ pins.

> Also take a look at the Intel Multiprocessor Specification, section
> 3.6.2.3: as you can see from the diagram in "Symmetric I/O Mode" all the
> interrupts are routed through the IOAPIC directly.
> 
> 
>> Did this really give you actual performance/latency/scalability gains? I 
>> still think for devices that matter, we should go with MSI rather than 
>> deriving from real hw.
>> 
> 
> Not all operating systems support MSIs, so it is nice to be able to
> avoid interrupt sharing without resorting to MSIs.

Yes and no. It's a tradeoff. If no interrupt sharing means that we emulate 
hardware that simply never could have existed the way we model it, I think it's 
a bad idea.

> Also this is how Xen has been working for more than 5 years in HVM mode,
> so this configuration is well tested and supported by most operating
> systems (at least all the ones we tried so far).

I'm fine with Xen breaking its own neck, as long as it doesn't affect non-Xen 
code paths. Just be aware that I'm not a huge fan of this approach :).

> In any case I think it is a good idea to add a comment to better explain
> what we are doing, see below.
> 
> 
> 
> commit 973bb091a967fdec37a1bc8fe30d46a483d2903d
> Author: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
> Date:   Tue May 17 12:10:36 2011 +0000
> 
>    xen: fix interrupt routing
> 
>    - remove i440FX-xen and i440fx_write_config_xen
>    we don't need to intercept pci config writes to i440FX anymore;
> 
>    - introduce PIIX3-xen and piix3_write_config_xen
>    we do need to intercept pci config writes to the PCI-ISA bridge to update
>    the PCI link routing;
> 
>    - set the number of PIIX3-xen interrupt lines to 128;

I still find it unpretty and I'm pretty sure it's completely different from 
real hardware, but since Xen code is your call and this doesn't affect non-Xen 
workloads, I won't block it, unless someone else is very much opposed to it.

Please resend as proper patch.


Alex

