Here are some experiments I did after I discovered this issue.
The device was an 82575EB NIC; the driver I used was igb 1.0.8
(search http://sourceforge.net/project/showfiles.php?group_id=42302 for
it).
The LSC interrupt is the link status change interrupt. It can be raised
by a physical link event, or it can be triggered by software, as the
driver does in igb_open() at igb_main.c line 1496, which writes to a
special register (E1000_ICS) to trigger an interrupt event.
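To make the two trigger paths concrete, here is a minimal sketch in C. It is a toy register model, not code from the igb driver: the struct, the function names and the bit position are all assumptions made for illustration. It only shows that a physical link event and a software write to an ICS-style register end up latching the same cause bit and sending the same kind of interrupt message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: names and layout are illustrative, not the igb driver. */
#define TOY_CAUSE_LSC (1u << 2)   /* assumed bit for "link status change" */

struct toy_nic {
    uint32_t icr;     /* latched interrupt causes */
    uint32_t ims;     /* causes that are enabled */
    unsigned fired;   /* interrupt messages sent so far */
};

/* Latch the cause bits and send one interrupt message if any is enabled. */
static void toy_raise(struct toy_nic *nic, uint32_t cause)
{
    assert(nic != NULL);
    nic->icr |= cause;
    if (nic->icr & nic->ims)
        nic->fired++;
}

/* Physical path: the link really changed state. */
void toy_link_event(struct toy_nic *nic)
{
    toy_raise(nic, TOY_CAUSE_LSC);
}

/* Software path: writing cause bits to the "ICS" register. */
void toy_write_ics(struct toy_nic *nic, uint32_t bits)
{
    toy_raise(nic, bits);
}

/* Reading the cause register returns the latched bits and clears them. */
uint32_t toy_read_icr(struct toy_nic *nic)
{
    assert(nic != NULL);
    uint32_t v = nic->icr;
    nic->icr = 0;
    return v;
}
```

In this model, calling toy_write_ics(&nic, TOY_CAUSE_LSC) on a NIC with the LSC cause enabled bumps nic.fired exactly as toy_link_event(&nic) would. How the real hardware differs between the two paths is the open question here.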
I repeated the experiment on Linux 2.6.23 with this driver. I tried two
changes: a) change handle_edge_irq() so that, when an interrupt arrives
while the previous one is still in progress, it only acks the interrupt
instead of doing mask/ack (see the patch below); b) comment out line
1496 in the driver.
The results are:
1) With mask and ack, the interrupt happens 3 times. The last 2 are
masked because they arrive while the first one is still waiting for its
ISR handler. The system is OK.
2) With ack and no mask, the interrupt happens continuously and the
system hangs forever.
3) With ack and no mask, and with line 1496 removed (i.e. no
software-triggered interrupt), the interrupt happens twice and the
system is OK.
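The difference between the "mask and ack" and "ack only" cases can be sketched as a small state machine. This is a simplified model of the edge flow handler, not the kernel code: the struct, the flag names and the split into entry/exit functions are assumptions for illustration, and it does not try to reproduce the device behaviour that causes the hang in case 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified status flags, loosely modelled on IRQ_INPROGRESS etc. */
enum { ST_INPROGRESS = 1, ST_PENDING = 2, ST_MASKED = 4 };

struct toy_irq {
    unsigned status;
    bool use_mask_ack;   /* true: mask and ack; false: ack only */
    unsigned acks, masks, unmasks, handled;
};

/* An interrupt arrives. Returns true if the handler should start now. */
bool toy_edge_entry(struct toy_irq *d)
{
    assert(d != NULL);
    if (d->status & ST_INPROGRESS) {
        /* Nested arrival: remember it and let the running handler replay it. */
        d->status |= ST_PENDING | ST_MASKED;
        if (d->use_mask_ack)
            d->masks++;          /* mask at the source */
        d->acks++;               /* ack in both variants */
        return false;
    }
    d->acks++;
    d->status |= ST_INPROGRESS;
    return true;
}

/* One handler pass finished. Returns true if another pass is needed. */
bool toy_edge_exit(struct toy_irq *d)
{
    assert(d != NULL);
    d->handled++;
    if (d->status & ST_PENDING) {
        d->status &= ~ST_PENDING;
        if (d->status & ST_MASKED) {
            d->unmasks++;        /* re-enable before replaying */
            d->status &= ~ST_MASKED;
        }
        return true;
    }
    d->status &= ~ST_INPROGRESS;
    return false;
}
```

With use_mask_ack set, a nested arrival costs one mask and one unmask on top of the ack. With it cleared, the source stays unmasked for the whole time the first handler runs, which is the window in which the misbehaving device keeps firing.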
So I suppose the problem happens only when the interrupt is triggered by
software. I also consulted the HW engineer but didn't get confirmation;
the only answer I got is that PCI-E needs a rising edge before it sends
the 2nd interrupt :(
I'm not sure if there are any other BRAIN-DEAD devices like this. I only
have this device to test the MSI-X function with, but we may need to
make sure such a device cannot break the whole system.
The call-back from the guest into Xen is there because we are using the
ACK-new method to work around this issue. Yes, it is expensive. Also,
the ACK-new method may cause a deadlock, as Haitao described in his mail.
But if we move the config space to the HV, then we don't need the
ACK-new method. That should be OK, but admittedly it should be the last
resort, since config space should be owned by domain0.
Thanks
-- Yunhong Jiang
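The patch to ack without masking the MSI-X interrupt is below (note that
the diff is reversed: the '-' lines are my modified version and the '+'
lines are stock 2.6.23):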
--- kernel/irq/chip.c 2008-03-28 13:23:51.000000000 -0400
+++ ../linux-2.6.23/kernel/irq/chip.c 2007-10-09 16:31:38.000000000 -0400
@@ -439,9 +439,7 @@
* the handler was running. If all pending interrupts are handled, the
* loop is left.
*/
-
-extern struct irq_chip msi_chip ;
-void
+void fastcall
handle_edge_irq(unsigned int irq, struct irq_desc *desc)
{
const unsigned int cpu = smp_processor_id();
@@ -457,23 +455,11 @@
*/
if (unlikely((desc->status & (IRQ_INPROGRESS | IRQ_DISABLED)) ||
!desc->action)) {
-
- if (desc->chip == &msi_chip)
- printk("mask msi chip irq %x cpu %x desc->status %x desc->action %p tsc %lx\n", irq, cpu, desc->status, desc->action, tsc_this);
-
desc->status |= (IRQ_PENDING | IRQ_MASKED);
- if (desc->chip == &msi_chip)
- {
- desc->chip->ack(irq);
- }else
mask_ack_irq(desc, irq);
-
goto out_unlock;
}
Keir Fraser <mailto:keir.fraser@xxxxxxxxxxxxx> wrote:
> This requires the guest to call back into Xen to signal EOI (as we
> already do for legacy level-triggered interrupts). We shouldn't really
> need to do that for MSI and it's rather more expensive than a couple of
> accesses over the PCI bus!
>
> It's this callback into Xen, which we do not really understand why
> it's needed, which I'm railing against. Is there some fundamental
> aspect of MSI we do not understand, or are we working around one
> brain-dead or buggy device?
>
> -- Keir
>
> On 28/3/08 01:48, "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx> wrote:
>
>> We do not mask each time an interrupt happens. Instead, we do that
>> only when a second interrupt happens while the previous one is still
>> pending. It should be something like handle_edge_irq() in upstream
>> Linux.
>>
>> -- Yunhong Jiang
>>
>> Espen Skoglund <mailto:espen.skoglund@xxxxxxxxxxxxx> wrote:
>>> Preventing interrupt storms by masking the interrupt in the MSI/MSI-X
>>> capability structure or MSI-X table within the interrupt handler is
>>> insane. It requires accesses over the PCI/PCIe bus and is clearly
>>> something you want to avoid on the fast path.
>>>
>>> eSk
>>>
>>>
>>> [Haitao Shan]
>>>> There are not many changes compared with the original patches.
>>>> But there are some issues on which we need your kind comments.
>>>
>>>> 1> The ACK-NEW method is necessary to avoid an IRQ storm, but it
>>>> causes a deadlock. During my tests, I did find that a deadlock can
>>>> occur with the patches applied. When a NIC device is assigned to an
>>>> HVM domain, the scenario is: Dom0 is waiting for the IDE interrupt
>>>> (vector 0x21); the HVM domain is waiting for qemu's IDE emulation
>>>> and is thus blocked; the NIC interrupt (MSI vector 0x31) is waiting
>>>> for injection into the HVM domain, since it is blocked now; the IDE
>>>> interrupt is waiting for the NIC interrupt, since the NIC interrupt
>>>> is of higher priority but not yet ACKed by XEN. When the IDE
>>>> interrupt and the NIC interrupt are delivered to the same CPU, and
>>>> the guest OS is Vista, the phenomenon is easy to observe.
>>>
>>>> 2> Without ACK-NEW, some naughty NIC devices, as we observed, will
>>>> cause IRQ storms. On this phenomenon, I think Yunhong can comment
>>>> more. Basically, writing EOI without masking the source of the MSI
>>>> will cause an IRQ storm. Although the reason is under investigation,
>>>> XEN should anyhow handle such a bogus device, right?
>>>
>>>> 3> Using ACK-OLD and masking the MSI when writing EOI can be a
>>>> solution. However, XEN does not own the PCI configuration spaces.
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>> http://lists.xensource.com/xen-devel