[Xen-users] Fatal Trap 18 (convincing hardware engineer)

Subject: [Xen-users] Fatal Trap 18 (convincing hardware engineer)
From: Matthew Baker <matt.baker@xxxxxxxxxxxxx>
Date: Fri, 07 Dec 2007 14:00:26 +0000
Hi all,

I have 2 servers with identical hardware (lspci output is at the bottom of this
message).

An extra Intel PRO/1000 MT Dual Port Server Adapter[1] has been
installed in the second slot on a PCI-X capable riser (the first slot is
taken by the SAS RAID controller).

When this NIC *is* installed *and* the boxes boot a Xen kernel (Debian
4.0, 2.6.18-5-xen, using the Xen hypervisor (PAE) 3.0.3-0-4), after about 2
days I get this error on the console:

(XEN) ----[ Xen-3.0.3-1 x86_32p debug=n Not tainted ]----
(XEN) CPU: 1
(XEN) EIP: e008:[<ff1193be>]CPU: 3
(XEN) EIP: e008:[<ff1193be>] idle_loop+0x4e/0x60 idle_loop+0x4e/0x60
(XEN) EFLAGS: 00000246 CONTEXT: hypervisor
(XEN) eax: 00000000 ebx: ffbeffb4 ecx: 00000001 edx: 00000000
(XEN) esi: ffbeffb4 edi: ffbf6080 ebp: 000090dc esp: ffbeffa8
(XEN) cr0: 8005003b cr4: 000006f0 cr3: a3363000 cr2: b7f2c260
(XEN) EFLAGS: 00000246 CONTEXT: hypervisor
(XEN) ds: e010 es: e010 fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) eax: 00000000 ebx: ffbe3fb4 ecx: 096a03ba edx: ff18c080
(XEN) Xen stack trace from esp=ffbeffa8:
(XEN) esi: ffbf0080 edi: 07a0403a ebp: 000090dc esp: ffbe3fa8
(XEN) 00000001cr0: 8005003b cr4: 000006f0 cr3: a1b80000 cr2: b7edd260
(XEN) 00000001 00001000 00000001 00000000 00000000 00000001
00000001ds: e010 8(XEN)
(XEN) 00000000Xen stack trace from esp=ffbe3fa8:
(XEN) 00000000 00000001 00f90000 00000003 c01013a7 ffbf0080
00000061 00000001(XEN) 0000007b 0000007b 00000000 00000000 00000001
ffbf6080 00000003
(XEN) Xen call trace:
(XEN) [<ff1193be>]
(XEN) idle_loop+0x4e/0x60
(XEN) 00000000
(XEN) ************************************
(XEN) 00000000CPU1 FATAL TRAP 18 (machine check), ERROR_CODE 0000.
(XEN) System shutting down -- need manual reset.
(XEN) ************************************

The machine obviously hangs.
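
For anyone who wants to pull the key facts out of a dump like the one
above, here is a minimal Python sketch. It assumes the console output has
been captured as text (e.g. from a serial console). The function and
variable names are my own and purely illustrative, the sample is two lines
copied from the log above, and the CR4 bit names follow the standard IA-32
layout. It only reports which trap fired and which CR4 features were
enabled; it does not identify the failing component.

```python
import re

# Standard IA-32 CR4 bit positions (bits 0-10 only; enough for this dump).
CR4_FLAGS = {
    0: "VME", 1: "PVI", 2: "TSD", 3: "DE", 4: "PSE", 5: "PAE",
    6: "MCE", 7: "PGE", 8: "PCE", 9: "OSFXSR", 10: "OSXMMEXCPT",
}

# Matches e.g. "CPU1 FATAL TRAP 18 (machine check), ERROR_CODE 0000."
TRAP_RE = re.compile(
    r"CPU(\d+) FATAL TRAP (\d+) \(([^)]*)\), ERROR_CODE ([0-9a-fA-F]+)"
)
# Matches the CR4 value in a register line such as "cr4: 000006f0".
CR4_RE = re.compile(r"cr4: ([0-9a-fA-F]{8})")


def decode_cr4(value):
    """Return the names of the CR4 bits that are set in value."""
    return [name for bit, name in sorted(CR4_FLAGS.items()) if value & (1 << bit)]


def summarise(log_text):
    """Extract the fatal trap line and the distinct CR4 values from a dump."""
    trap = TRAP_RE.search(log_text)
    cr4_values = sorted({int(m, 16) for m in CR4_RE.findall(log_text)})
    return {
        "cpu": int(trap.group(1)) if trap else None,
        "trap": int(trap.group(2)) if trap else None,
        "name": trap.group(3) if trap else None,
        "error_code": trap.group(4) if trap else None,
        "cr4_flags": {hex(v): decode_cr4(v) for v in cr4_values},
    }


# Two lines taken from the console output above.
sample = """\
(XEN) cr0: 8005003b cr4: 000006f0 cr3: a3363000 cr2: b7f2c260
(XEN) 00000000CPU1 FATAL TRAP 18 (machine check), ERROR_CODE 0000.
"""

if __name__ == "__main__":
    print(summarise(sample))
```

On the sample lines this reports trap 18 on CPU 1 and shows that the MCE
and PAE bits are set in CR4, which matches the PAE hypervisor build and
the machine-check trap in the dump.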

If I remove the PCI NIC the machine stays up. If I boot into a vanilla
kernel with the NIC in the box it stays up.

I have NICs like these, bought in a batch, running in other machines that
are also running Xen. The machines aren't used a great deal at the moment
(although they need to be soon), and as far as I can tell there's no
other issue with the system that is failing (i.e. the obvious
stuff like disk space running out or resource-hungry cron jobs). Apart from
the console output, there are no logs suggesting a failure elsewhere.

Our hardware engineer is convinced it's either a Xen or driver issue.
I've seen the thread at
and have directed the engineer at this.

My questions to the list are:

1. Can this be caused by anything else (other than hardware)?
2. Is there anything I can do to debug this further to confirm what part
of the system is failing (e.g. either CPU/RAM or PCI/BUS timeout)?

Any help on this would be greatly appreciated.

Many thanks,


 Matthew Baker, UNIX Systems Administrator
 Institute for Learning and Research Technology (ILRT)
 A: University of Bristol,
    8-10 Berkeley Square,
    BS8 1HH
 W: http://www.ilrt.bristol.ac.uk
 E: matt.baker@xxxxxxxxxx
 T: +44 (0)117 928 7121

-- lspci

00:00.0 Host bridge: Intel Corporation E7320 Memory Controller Hub (rev 0c)
00:02.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A (rev)
00:03.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A1 (re)
00:1c.0 PCI bridge: Intel Corporation 6300ESB 64-bit PCI-X Bridge (rev 02)
00:1d.0 USB Controller: Intel Corporation 6300ESB USB Universal Host Controller)
00:1d.1 USB Controller: Intel Corporation 6300ESB USB Universal Host Controller)
00:1d.4 System peripheral: Intel Corporation 6300ESB Watchdog Timer (rev 02)
00:1d.5 PIC: Intel Corporation 6300ESB I/O Advanced Programmable Interrupt Cont)
00:1d.7 USB Controller: Intel Corporation 6300ESB USB2 Enhanced Host Controller)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 0a)
00:1f.0 ISA bridge: Intel Corporation 6300ESB LPC Interface Controller (rev 02)
00:1f.1 IDE interface: Intel Corporation 6300ESB PATA Storage Controller (rev 0)
00:1f.3 SMBus: Intel Corporation 6300ESB SMBus Controller (rev 02)
01:00.0 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge A (rev )
01:00.2 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge B (rev )
02:02.0 PCI bridge: Intel Corporation 80331 [Lindsay] I/O processor (PCI-X Brid)
03:0e.0 RAID bus controller: Adaptec AAC-RAID (rev 0a)
06:01.0 Ethernet controller: Intel Corporation 82541GI/PI Gigabit Ethernet Cont)
06:02.0 Ethernet controller: Intel Corporation 82541GI/PI Gigabit Ethernet Cont)
07:02.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
