
[Xen-devel] Re: [PATCH][FIX] Possible fix for spurious interrupts




On 15 Apr 2006, at 19:57, Arun Sharma wrote:

> I think you guys are running into the same problem that FreeBSD ran
> into on some Intel motherboards more than a year ago.
>
> This explanation seems to make the most sense.
>
> http://thread.gmane.org/gmane.os.freebsd.current/67490/focus=67490
>
> Because the problem happened on FreeBSD (which masks ioapic RTEs to
> implement interrupt threads) and not on Linux, it was hard to get
> attention from the hardware guys back then. I had suspected Xen
> would run into it sooner or later.

Thanks Arun, this is very informative although unfortunately not very helpful. Matt Dillon's suggested alternatives to masking do not really work as they all cause spurious interrupts. Do you know if they ever found a good fix, or do they live with the problem?

I'd not heard of boot interrupt mode before, but it sounds like many chipsets cannot disable it and, even when it can be disabled, the method is chipset specific. The Intel legacy INTx model is so unbelievably crap. At least source-vectored interrupts are becoming more common.

Anyway, this is the current status of my workaround for Xen:
1. I added a new ioapic ack method which delays EOI until after ISR processing in the driver domain. This mode is enabled by default but can be disabled with 'ioapic_ack=old' as a Xen boot parameter.
2. The code to safely manage deferred EOI is quite complicated and has some weaknesses:
     * Must EOI on the CPU that received the interrupt
     * Must EOI in 'reverse' order when interrupts have nested
     * Un-EOIed interrupts block other guest-bound interrupts which happen to have lower priority
     * Right now, disable_irq() in a driver domain may potentially lock up all interrupt sources as it may prevent EOI ever happening (until enable_irq() or the interrupt is unbound from the domain)
     * All Xen-bound interrupts have strictly higher priority than any guest-bound IO-APIC interrupt. This should avoid deadlock issues.
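To illustrate the 'reverse order' constraint above, here is a minimal sketch of a per-CPU stack of pending EOIs. It is not the code from the patch: the names (struct eoi_stack, eoi_push, eoi_guest_done, hw_eoi) and the fixed depth are made up for illustration, and hw_eoi() only records the order instead of touching the local APIC. The idea is that an EOI is issued only for the entry on top of the stack, so a guest that finishes an outer interrupt first has its EOI held back until the nested one is done.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PENDING 16  /* illustrative depth, not taken from the patch */

/* One guest-bound interrupt whose EOI has been deferred on this CPU. */
struct pending_eoi {
    int  vector;  /* vector that was delivered */
    bool ready;   /* set once the guest has finished its ISR */
};

/* Hypothetical per-CPU stack; the top is the innermost (latest) interrupt. */
struct eoi_stack {
    struct pending_eoi e[MAX_PENDING];
    int sp;
};

/*
 * Stand-in for the local APIC EOI write. A real EOI write takes no vector:
 * it clears the highest-priority in-service bit, which is why the order of
 * EOIs matters. Here we only log which entry each EOI is meant for.
 */
static int eoi_log[MAX_PENDING];
static int eoi_count;

static void hw_eoi(int vector)
{
    eoi_log[eoi_count++] = vector;
}

/* On delivery: defer the EOI by pushing the vector. Returns false if full. */
static bool eoi_push(struct eoi_stack *s, int vector)
{
    if (s->sp == MAX_PENDING)
        return false;
    s->e[s->sp].vector = vector;
    s->e[s->sp].ready = false;
    s->sp++;
    return true;
}

/*
 * When the guest reports that it has handled 'vector': mark the entry ready,
 * then EOI from the top of the stack for as long as the top entry is ready.
 * Entries below a not-yet-ready top stay pending.
 */
static void eoi_guest_done(struct eoi_stack *s, int vector)
{
    for (int i = 0; i < s->sp; i++)
        if (s->e[i].vector == vector)
            s->e[i].ready = true;

    while (s->sp > 0 && s->e[s->sp - 1].ready)
        hw_eoi(s->e[--s->sp].vector);
}
```

With 0x31 delivered first and 0x41 nested on top of it, calling eoi_guest_done() for 0x31 issues nothing, and a later call for 0x41 issues both EOIs, 0x41 first. This also shows the weakness noted above: an entry that never becomes ready holds up everything beneath it.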

Really it's a messy solution. I think having both old and new ack methods makes sense, but I'm not sure how we will end up picking which to use automatically. Maybe using the old method is best, and let users pick the new one if they see spurious interrupt problems. Or maybe the problems with the new method are mostly theoretical and we should use that by default. Or maybe we should have a DMI table to pick between them. I'm not sure.

Another question is whether to put this in 3.0.2. I think it definitely needs more testing before that, but it might not make sense to do so at all as the patch is quite invasive.

 -- Keir


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

