To: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-users] Fatal crash: ACPI assigning duplicate physical IRQ's to second DomU
From: Hilton Day <xlot@xxxxxxxxxxxxxxxxx>
Date: Tue, 10 Oct 2006 13:39:38 +1000
Delivery-date: Mon, 09 Oct 2006 20:40:52 -0700
User-agent: Thunderbird 1.5.0.7 (Windows/20060909)
Hi,

I've got a problem with ACPI assigning duplicate physical IRQs to one
of my DomUs that I'm passing a PCI NIC to.  Can anyone shed some light
on how I can avoid this problem with IRQ allocation?  The conflict is
visible in the IRQ allocations listed in /proc/interrupts.
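
As a rough way to spot such a conflict, the sketch below scans
/proc/interrupts-style text and reports IRQ lines that list more than one
handler.  The function name shared_irqs and the SAMPLE text are made up
for illustration (they are not output from the machine described here),
and the parsing assumes the usual "number: counters, chip type, handlers
separated by commas" layout, which can differ between kernels.

```python
import re
from typing import Dict, List

# Hypothetical sample in the general shape of /proc/interrupts.
# The numbers and names are invented for illustration only.
SAMPLE = """
           CPU0
  1:          8   Phys-irq  i8042
 11:      51234   Phys-irq  libata, tulip
 14:      20311   Phys-irq  ide0
"""


def shared_irqs(interrupts_text: str) -> Dict[int, List[str]]:
    """Return {irq: [handler names]} for IRQ lines with more than one handler."""
    shared: Dict[int, List[str]] = {}
    for line in interrupts_text.splitlines():
        match = re.match(r"\s*(\d+):\s+(.*)$", line)
        if not match:
            continue  # header row, NMI/ERR rows, blank lines
        irq = int(match.group(1))
        fields = match.group(2).split()
        # Skip the per-CPU counters (the leading all-digit fields).
        i = 0
        while i < len(fields) and fields[i].isdigit():
            i += 1
        tail = " ".join(fields[i:])
        if not tail:
            continue
        pieces = [piece.strip() for piece in tail.split(",")]
        # The first piece is "<chip type> <first handler>"; keep its last word.
        first_words = pieces[0].split()
        pieces[0] = first_words[-1] if first_words else ""
        names = [piece for piece in pieces if piece]
        if len(names) > 1:
            shared[irq] = names
    return shared


print(shared_irqs(SAMPLE))
```

On a live system the same function could be fed the contents of
/proc/interrupts, read with open("/proc/interrupts").read(), in both the
Dom0 and the DomU.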

In my Dom0 I have 3 network cards.  eth0 and eth1 are identical
tulip-based 100 Mbit cards, and eth2 is a Realtek gigabit card that I'm
using for the Xen bridge.  I have this problem with a variety of different
kernels - currently running kernel-xen-2.6.18-1 (Fedora Core 6
development tree) on all hosts, with xen-3.0.2.

Pass-through always works just fine for one of my DomUs, and ACPI
allocates an unused physical IRQ with no problems.  However, in a second
DomU, it consistently allocates the same IRQ as is used by my onboard
SATA controller (libata).

When the second DomU is running, I get a fatal crash that also destroys
my RAID volume info and severely damages the filesystem.  I've had to
manually rebuild the RAID each time this happens before I can try
another fix.  The crash typically happens within a few minutes of
booting.

After reading the archives of this list, as well as other lists, I've
tried putting "noirqdebug" as a kernel parameter in both the Dom0 and
DomU, and also made use of "noapic" and "acpi=off", as well as disabling
ACPI in my motherboard's BIOS (system is an Athlon XP running on an
nForce2 motherboard with 2 GB of RAM).  None of them resolves the
conflict.  Could this be a bug in how IRQs are allocated for
passed-through PCI devices?

I've also tried a variety of other Ethernet devices (the forcedeth
driver for the nForce2 onboard NIC, and also the natsemi driver for a
Netgear FA311) to pass through to the second DomU, with the same result.
Moving the PCI card to a different PCI bus address/slot doesn't resolve
the problem either.

I managed to grab dmesg output from the Dom0 and the problem DomU the
last time it crashed.

The message I'm getting on the console in the DomU is:
####
irq 11: nobody cared (try booting with the "irqpoll" option)
[<c040569e>] dump_trace+0x69/0x1af
[<c04057fc>] show_trace_log_lvl+0x18/0x2c
[<c0405d9c>] show_trace+0xf/0x11
[<c0405dcb>] dump_stack+0x15/0x17
[<c044636e>] __report_bad_irq+0x36/0x7d
[<c044655b>] note_interrupt+0x1a6/0x1e3
[<c0445bda>] __do_IRQ+0xba/0xf2
[<c0406c2c>] do_IRQ+0x9e/0xbc
=======================
handlers:
[<d10636e8>] (tulip_interrupt+0x0/0xdb8 [tulip])
Disabling IRQ #11
end_request: I/O error, dev xvda, sector 42806344
Buffer I/O error on device xvda3, logical block 5061623
lost page write due to I/O error on xvda3
####

The dmesg output from the Dom0 after booting the second DomU is:

####
PCI: Enabling device 0000:01:0a.0 (0000 -> 0003)
ACPI: PCI Interrupt 0000:01:0a.0[A] -> Link [LNK3] -> GSI 11 (level, low) -> IRQ 11
ADDRCONF(NETDEV_CHANGE): vif4.0: link becomes ready
xenbr0: port 4(vif4.0) entering learning state
xenbr0: topology change detected, propagating
xenbr0: port 4(vif4.0) entering forwarding state
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: (BMDMA stat 0x64)
ata1.00: tag 0 cmd 0x25 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata2.00: (BMDMA stat 0x64)
ata2.00: tag 0 cmd 0xca Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1: soft resetting port
ata2: soft resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1: failed to recover some devices, retrying in 5 secs
ata2.00: qc timeout (cmd 0xec)
ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata2.00: revalidation failed (errno=-5)
ata2: failed to recover some devices, retrying in 5 secs
ata1: hard resetting port
ata2: hard resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1: failed to recover some devices, retrying in 5 secs
ata2.00: qc timeout (cmd 0xec)
ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata2.00: revalidation failed (errno=-5)
ata2: failed to recover some devices, retrying in 5 secs
ata1: hard resetting port
ata2: hard resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1.00: disabled
ata2.00: qc timeout (cmd 0xec)
ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata2.00: revalidation failed (errno=-5)
ata2.00: disabled
ata1: EH complete
ata2: EH complete
sd 0:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sda, sector 58152585
raid5:md3: read error not correctable (sector 46682112 on sda5).
raid5: Disk failure on sda5, disabling device. Operation continuing on 1 devices
raid5:md3: read error not correctable (sector 46682120 on sda5).
####
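
To make the link between the two logs explicit, here is a small sketch
that pulls the "ACPI: PCI Interrupt ... -> IRQ N" routing out of the Dom0
log and the "irq N: nobody cared" report out of the DomU log, then lists
the PCI devices routed to an IRQ the DomU complained about.  The function
names are hypothetical, the regular expressions are only fitted to the
messages quoted above, and the two strings are short excerpts copied from
those logs.

```python
import re
from typing import Dict, List, Set

# Matches e.g. "ACPI: PCI Interrupt 0000:01:0a.0[A] -> ... -> IRQ 11".
# DOTALL lets the match continue if the mail client wrapped the log line.
ACPI_ROUTE = re.compile(
    r"ACPI: PCI Interrupt "
    r"(?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\[[A-D]\]"
    r".*?-> IRQ (?P<irq>\d+)",
    re.DOTALL,
)
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")


def routed_irqs(dom0_log: str) -> Dict[str, int]:
    """Map each PCI address to the IRQ that ACPI routed it to."""
    return {m.group("dev"): int(m.group("irq"))
            for m in ACPI_ROUTE.finditer(dom0_log)}


def unhandled_irqs(domu_log: str) -> Set[int]:
    """Collect the IRQ numbers the kernel reported as 'nobody cared'."""
    return {int(number) for number in NOBODY_CARED.findall(domu_log)}


def suspect_devices(dom0_log: str, domu_log: str) -> List[str]:
    """PCI devices routed to an IRQ that the DomU reported as unhandled."""
    bad = unhandled_irqs(domu_log)
    return sorted(dev for dev, irq in routed_irqs(dom0_log).items()
                  if irq in bad)


# Excerpts copied from the logs quoted in this message.
DOM0_EXCERPT = (
    "PCI: Enabling device 0000:01:0a.0 (0000 -> 0003)\n"
    "ACPI: PCI Interrupt 0000:01:0a.0[A] -> Link [LNK3] -> GSI 11 (level,\n"
    "low) -> IRQ 11\n"
)
DOMU_EXCERPT = (
    'irq 11: nobody cared (try booting with the "irqpoll" option)\n'
    "Disabling IRQ #11\n"
)

print(suspect_devices(DOM0_EXCERPT, DOMU_EXCERPT))
```

For these excerpts the only device reported is 0000:01:0a.0, the one
routed to IRQ 11, which is the IRQ the DomU disabled.  The sketch only
correlates the two messages; it does not show which other driver shares
that IRQ in the Dom0.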


Any help in resolving this would be appreciated - I'd like to get this
host up and running!

Thanks,

Hilton.
