xen-users

[Xen-users] Oops in Dom0 kernel when eth link fails

To: "xen-users@xxxxxxxxxxxxxxxxxxx" <xen-users@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-users] Oops in Dom0 kernel when eth link fails
From: "Ronald Moesbergen" <rmoesbergen@xxxxxxxxxxxxxx>
Date: Mon, 27 Nov 2006 11:24:28 +0100
Delivery-date: Tue, 28 Nov 2006 01:37:28 -0800
Hi,

While running two Xen machines with kernel 2.6.18-2 (the standard Xen kernel
supplied by Debian unstable), I get the following oops in the Dom0 kernel when
the Ethernet link goes from up to down:

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c02855ba
*pde = ma 00000000 pa fffff000
Oops: 0002 [#1]
SMP
Modules linked in: ip_vs_wrr ip_vs xt_physdev netconsole iptable_filter 
ip_tables x_tables bridge netloop drbd button ac battery loop shpchp 
pci_hotplug pcspkr serial_core serio_raw psmouse evdev tsdev ext3 jbd mbcache 
dm_mirror dm_snapshot dm_mod ide_cd cdrom generic usbhid cciss piix scsi_mod 
uhci_hcd ide_core bnx2 usbcore thermal processor fan
CPU:    0
EIP:    0061:[<c02855ba>]    Not tainted VLI
EFLAGS: 00010286   (2.6.18-2-xen-686 #1)
EIP is at iret_exc+0x883/0xbe6
eax: 00000000   ebx: 00000000   ecx: 00000007   edx: c0ca0000
esi: c0ca0018   edi: c06d1890   ebp: 0000004c   esp: c0315d0c
ds: 007b   es: 007b   ss: 0069
Process swapper (pid: 0, ti=c0314000 task=c02c9660 task.ti=c0314000)
Stack: 0000004c 000001d8 c0ca0000 c0227f6d c0ca0000 c06d1878 000001d8 00000000
       00000000 00000000 00000018 c06d1878 c71038ac 00000001 0000004c 000005dc
       c52fd53c 0000025f c02079fc 000001d8 c0315e38 00000224 c76fee80 0000022c
Call Trace:
 [<c0227f6d>] skb_copy_and_csum_bits+0x129/0x2a9
 [<c02079fc>] __alloc_skb+0x6c/0x70
 [<c02647a9>] icmp_glue_bits+0x1f/0x74
 [<c02496f8>] ip_append_data+0x5d1/0x942
 [<c026478a>] icmp_glue_bits+0x0/0x74
 [<c026467d>] icmp_push_reply+0x3d/0x14a
 [<c0243d86>] ip_route_output_flow+0x13/0x57
 [<c0264f6d>] icmp_send+0x2e7/0x350
 [<c012b60c>] run_posix_cpu_timers+0x1c/0x6bf
 [<c011495e>] rebalance_tick+0x116/0x2ae
 [<c0241b36>] ipv4_link_failure+0x14/0x3c
 [<c0262f1c>] arp_error_report+0x1c/0x24
 [<c0232c0d>] neigh_timer_handler+0x18e/0x24d
 [<c0232a7f>] neigh_timer_handler+0x0/0x24d
 [<c0121c28>] run_timer_softirq+0x101/0x15c
 [<c011de82>] __do_softirq+0x5e/0xc3
 [<c011df21>] do_softirq+0x3a/0x4a
 [<c01060c9>] do_IRQ+0x48/0x53
 [<c0206518>] evtchn_do_upcall+0x64/0x9b
 [<c01049d9>] hypervisor_callback+0x3d/0x48
 [<c01072c6>] raw_safe_halt+0x8c/0xaf
 [<c0102c63>] xen_idle+0x22/0x2e
 [<c0102d82>] cpu_idle+0x91/0xab
 [<c03196fe>] start_kernel+0x37a/0x381
Code: ff ff ff e9 a8 4f ef ff b8 f2 ff ff ff e9 c7 4f ef ff b8 f2 ff ff ff e9 e7 4f ef ff 8b 3d 20 0b 36 c0 e9 ef 93 ef ff 8b 5c 24 20 <c7> 03 f2 ff ff ff 8b 7c 24 14 8b 4c 24 18 31 c0 f3 aa e9 4b 0d
EIP: [<c02855ba>] iret_exc+0x883/0xbe6 SS:ESP 0069:c0315d0c
 <0>Kernel panic - not syncing: Fatal exception in interrupt
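
Read from the bottom up, the call trace suggests a path from the neighbour
timer (neigh_timer_handler) through arp_error_report and ipv4_link_failure
into icmp_send, ending in skb_copy_and_csum_bits. The following is a minimal
Python sketch for pulling the frames out of such an oops so they are easier to
compare between crashes. The helper names (parse_frames, symbol_start) are
made up for illustration. Entries at offset +0x0 are often function pointers
left on the stack rather than return addresses, so the sketch lists them
separately; treat that as a heuristic, not a fact about this trace.

```python
import re

# Matches trace lines such as " [<c0227f6d>] skb_copy_and_csum_bits+0x129/0x2a9"
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")

def parse_frames(oops_text):
    """Return (address, symbol, offset, size) tuples in the order printed."""
    frames = []
    for addr, sym, off, size in FRAME_RE.findall(oops_text):
        frames.append((int(addr, 16), sym, int(off, 16), int(size, 16)))
    return frames

def symbol_start(frame):
    """Start address of the symbol: printed address minus printed offset."""
    addr, _sym, off, _size = frame
    return addr - off

# A few lines copied from the oops above.
SAMPLE = """
 [<c0227f6d>] skb_copy_and_csum_bits+0x129/0x2a9
 [<c02647a9>] icmp_glue_bits+0x1f/0x74
 [<c026478a>] icmp_glue_bits+0x0/0x74
 [<c0264f6d>] icmp_send+0x2e7/0x350
 [<c0241b36>] ipv4_link_failure+0x14/0x3c
 [<c0262f1c>] arp_error_report+0x1c/0x24
 [<c0232c0d>] neigh_timer_handler+0x18e/0x24d
"""

frames = parse_frames(SAMPLE)
# Heuristic: offset 0 usually means a stored function pointer, not a return address.
offset_zero = [sym for (_addr, sym, off, _size) in frames if off == 0]
# Call order, oldest caller first (the trace is printed innermost first).
call_order = [sym for (_addr, sym, off, _size) in reversed(frames) if off != 0]
```

Here call_order starts at neigh_timer_handler and ends at
skb_copy_and_csum_bits, which matches the reading given above.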

Some details about the setup: the machines are connected by an Ethernet
crossover cable on eth1. eth0 on both machines connects to the LAN, where
clients connect to a virtual IP address managed by heartbeat. Each machine runs
one DomU providing services. Data replication is done with drbd over the eth1
link. This is what happens:

- Both machines are running fine, one DomU per physical machine, load balanced.
- One of the machines has a (simulated) problem (poweroff -f).
- The second machine takes over all DomUs. Then, seconds later, the above oops
occurs and the second machine is also down. Not quite as intended :)

My guess is that this is related to the eth1 link going down: with a crossover
cable, powering off the peer drops the link on the surviving machine. I could
be wrong, though. The network driver is bnx2 and the network card is a
'Broadcom NetXtreme II BCM5708 1000Base-T (B1) PCI-X 64-bit 133MHz'. I have
tried to reproduce this on a non-Xen kernel, but couldn't. Someone also
suggested that I disable TX checksumming in both DomUs, but that made no
difference.
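
For anyone repeating the checksumming test, TX checksum offload is normally
switched with ethtool -K. Below is a small Python sketch that builds and runs
that command. The interface name eth0 inside the DomU is an assumption, as are
the helper names. The commands need root and an installed ethtool.

```python
import subprocess

def tx_checksum_command(iface, enable):
    """Build the ethtool argv that switches TX checksum offload on or off."""
    return ["ethtool", "-K", iface, "tx", "on" if enable else "off"]

def show_offload_command(iface):
    """Build the ethtool argv that prints the current offload settings."""
    return ["ethtool", "-k", iface]

def set_tx_checksum(iface, enable, runner=subprocess.run):
    """Apply the setting, then print the resulting offload state.

    `runner` is injectable so the function can be exercised without
    touching a real interface.
    """
    runner(tx_checksum_command(iface, enable), check=True)
    runner(show_offload_command(iface), check=True)
```

Calling set_tx_checksum("eth0", enable=False) inside each DomU would then turn
the offload off, assuming eth0 is the right interface there.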

Below is the output of xm info and xm dmesg.

xm info:
host                   : kalium
release                : 2.6.18-2-xen-686
version                : #1 SMP Thu Nov 9 00:21:32 UTC 2006
machine                : i686
nr_cpus                : 4
nr_nodes               : 1
sockets_per_node       : 1
cores_per_socket       : 2
threads_per_core       : 2
cpu_mhz                : 3200
hw_caps                : bfebfbff:20100000:00000000:00000180:0000e43d:00000000:00000001
total_memory           : 2047
free_memory            : 1379
xen_major              : 3
xen_minor              : 0
xen_extra              : .3-1
xen_caps               : xen-3.0-x86_32 hvm-3.0-x86_32
xen_pagesize           : 4096
platform_params        : virt_start=0xfc000000
xen_changeset          : Tue Oct 17 22:09:52 2006 +0100
cc_compiler            : gcc version 4.1.2 20061028 (prerelease) (Debian 4.1.1-19)
cc_compile_by          : ultrotter
cc_compile_domain      : debian.org
cc_compile_date        : Thu Nov  2 20:28:13 CET 2006
xend_config_format     : 2
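
When comparing the two hosts, it can help to load this output into a
dictionary. Here is a minimal Python sketch, assuming the "key : value" layout
shown above. It also folds a wrapped continuation line back into the previous
value. The function name parse_xm_info is made up for illustration.

```python
import re

# A field line looks like "xen_major              : 3": a bare word,
# padding, a colon, then the value. Continuation lines do not match.
FIELD_RE = re.compile(r"^(\w+)\s+:\s*(.*)$")

def parse_xm_info(text):
    """Parse `xm info` style output into a dict of strings."""
    info = {}
    last_key = None
    for line in text.splitlines():
        m = FIELD_RE.match(line)
        if m:
            last_key = m.group(1)
            info[last_key] = m.group(2).strip()
        elif last_key is not None and line.strip():
            # Wrapped value: append it to the field it belongs to.
            info[last_key] = (info[last_key] + " " + line.strip()).strip()
    return info

# A few fields copied from the output above, including one wrapped line.
SAMPLE = """\
host                   : kalium
nr_cpus                : 4
hw_caps                : 
bfebfbff:20100000:00000000:00000180:0000e43d:00000000:00000001
xen_major              : 3
xen_minor              : 0
xen_extra              : .3-1
"""

info = parse_xm_info(SAMPLE)
# Assemble the full version string from its three fields.
xen_version = info["xen_major"] + "." + info["xen_minor"] + info["xen_extra"]
```

With the sample above, xen_version comes out as 3.0.3-1, the same version that
the xm dmesg banner reports.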

xm dmesg:

 Xen version 3.0.3-1 (Debian 3.0.3-0-2) (ultrotter@xxxxxxxxxx) (gcc version 4.1.2 20061028 (prerelease) (Debian 4.1.1-19)) Thu Nov  2 20:28:13 CET 2006
 Latest ChangeSet: Tue Oct 17 22:09:52 2006 +0100

(XEN) Command line: /boot/xen-3.0.3-1-i386.gz dom0_mem=128Mb
(XEN) Physical RAM map:
(XEN)  0000000000000000 - 000000000009f400 (usable)
(XEN)  000000000009f400 - 00000000000a0000 (reserved)
(XEN)  00000000000f0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 000000007ffc8000 (usable)
(XEN)  000000007ffc8000 - 000000007ffd0000 (ACPI data)
(XEN)  000000007ffd0000 - 0000000080000000 (reserved)
(XEN)  00000000fec00000 - 00000000fed00000 (reserved)
(XEN)  00000000fee00000 - 00000000fee10000 (reserved)
(XEN)  00000000ffc00000 - 0000000100000000 (reserved)
(XEN) System RAM: 2047MB (2096540kB)
(XEN) Xen heap: 10MB (10408kB)
(XEN) PAE disabled.
(XEN) found SMP MP-table at 000f4f80
(XEN) DMI 2.3 present.
(XEN) Using APIC driver default
(XEN) ACPI: RSDP (v002 HP                                    ) @ 0x000f4f00
(XEN) ACPI: XSDT (v001 HP     P58      0x00000002 Ò 0x0000162e) @ 0x7ffc8300
(XEN) ACPI: FADT (v003 HP     P58      0x00000002 Ò 0x0000162e) @ 0x7ffc8380
(XEN) ACPI: SPCR (v001 HP     SPCRRBSU 0x00000001 Ò 0x0000162e) @ 0x7ffc8100
(XEN) ACPI: MCFG (v001 HP     ProLiant 0x00000001  0x00000000) @ 0x7ffc8180
(XEN) ACPI: HPET (v001 HP     P58      0x00000002 Ò 0x0000162e) @ 0x7ffc81c0
(XEN) ACPI: SPMI (v005 HP     ProLiant 0x00000001 Ò 0x0000162e) @ 0x7ffc8200
(XEN) ACPI: MADT (v001 HP     00000083 0x00000002  0x00000000) @ 0x7ffc8240
(XEN) ACPI: DSDT (v001 HP         DSDT 0x00000001 INTL 0x20030228) @ 0x00000000
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
(XEN) Processor #0 15:6 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
(XEN) Processor #2 15:6 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
(XEN) Processor #1 15:6 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
(XEN) Processor #3 15:6 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] disabled)
(XEN) ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
(XEN) ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
(XEN) ACPI: IOAPIC (id[0x09] address[0xfec80000] gsi_base[24])
(XEN) IOAPIC[1]: apic_id 9, version 32, address 0xfec80000, GSI 24-47
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) ACPI: IRQ9 used by override.
(XEN) Enabling APIC mode:  Flat.  Using 2 I/O APICs
(XEN) ACPI: HPET id: 0x10228201 base: 0xfed00000
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Initializing CPU#0
(XEN) Detected 3200.281 MHz processor.
(XEN) CPU: Trace cache: 12K uops, L1 D cache: 16K
(XEN) CPU: L2 cache: 2048K
(XEN) CPU: Physical Processor ID: 0
(XEN) CPU: Processor Core ID: 0
(XEN) VMXON is done
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#0.
(XEN) CPU0: Intel P4/Xeon Extended MCE MSRs (24) available
(XEN) CPU0: Thermal monitoring enabled
(XEN) CPU0: Intel(R) Xeon(TM) CPU 3.20GHz stepping 04
(XEN) Booting processor 1/2 eip 90000
(XEN) Initializing CPU#1
(XEN) CPU: Trace cache: 12K uops, L1 D cache: 16K
(XEN) CPU: L2 cache: 2048K
(XEN) CPU: Physical Processor ID: 0
(XEN) CPU: Processor Core ID: 1
(XEN) VMXON is done
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#1.
(XEN) CPU1: Intel P4/Xeon Extended MCE MSRs (24) available
(XEN) CPU1: Thermal monitoring enabled
(XEN) CPU1: Intel(R) Xeon(TM) CPU 3.20GHz stepping 04
(XEN) Booting processor 2/1 eip 90000
(XEN) Initializing CPU#2
(XEN) CPU: Trace cache: 12K uops, L1 D cache: 16K
(XEN) CPU: L2 cache: 2048K
(XEN) CPU: Physical Processor ID: 0
(XEN) CPU: Processor Core ID: 0
(XEN) VMXON is done
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#2.
(XEN) CPU2: Intel P4/Xeon Extended MCE MSRs (24) available
(XEN) CPU2: Thermal monitoring enabled
(XEN) CPU2: Intel(R) Xeon(TM) CPU 3.20GHz stepping 04
(XEN) Booting processor 3/3 eip 90000
(XEN) Initializing CPU#3
(XEN) CPU: Trace cache: 12K uops, L1 D cache: 16K
(XEN) CPU: L2 cache: 2048K
(XEN) CPU: Physical Processor ID: 0
(XEN) CPU: Processor Core ID: 1
(XEN) VMXON is done
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#3.
(XEN) CPU3: Intel P4/Xeon Extended MCE MSRs (24) available
(XEN) CPU3: Thermal monitoring enabled
(XEN) CPU3: Intel(R) Xeon(TM) CPU 3.20GHz stepping 04
(XEN) Total of 4 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using new ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) checking TSC synchronization across 4 CPUs: passed.
(XEN) Platform timer is 14.318MHz HPET
(XEN) Brought up 4 CPUs
(XEN) Machine check exception polling timer started.
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Domain 0 kernel supports features = { 0000001f }.
(XEN) Domain 0 kernel requires features = { 00000000 }.
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.:   03000000->04000000 (28672 pages to be allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: c0100000->c038b874
(XEN)  Init. ramdisk: c038c000->c0eab200
(XEN)  Phys-Mach map: c0eac000->c0ecc000
(XEN)  Start info:    c0ecc000->c0ecc46c
(XEN)  Page tables:   c0ecd000->c0ed2000
(XEN)  Boot stack:    c0ed2000->c0ed3000
(XEN)  TOTAL:         c0000000->c1000000
(XEN)  ENTRY ADDRESS: c0100000
(XEN) Dom0 has maximum 4 VCPUs
(XEN) Initrd len 0xb1f200, start at 0xc038c000
(XEN) Scrubbing Free RAM: .....................done.
(XEN) Xen trace buffers: disabled
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen).
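
As a sanity check on the log, the "System RAM" figure can be rebuilt from the
"usable" ranges of the physical RAM map. The Python sketch below trims each
usable range inward to 4096-byte page boundaries (the xen_pagesize shown in xm
info) before summing. That trimming is an assumption about how Xen counts, but
it does reproduce the logged 2096540kB. The function name usable_kb is
invented for this example.

```python
import re

PAGE = 4096  # xen_pagesize from the xm info output

# Matches map lines such as
# "(XEN)  0000000000100000 - 000000007ffc8000 (usable)"
RANGE_RE = re.compile(r"\(XEN\)\s+([0-9a-f]{16}) - ([0-9a-f]{16}) \((.+)\)")

def usable_kb(log_text):
    """Sum the usable ranges in kB, counting whole pages only."""
    total = 0
    for start, end, kind in RANGE_RE.findall(log_text):
        if kind != "usable":
            continue
        first = (int(start, 16) + PAGE - 1) // PAGE * PAGE  # round start up
        last = int(end, 16) // PAGE * PAGE                  # round end down
        total += max(0, last - first)
    return total // 1024

# The map lines copied from the xm dmesg output above.
RAM_MAP = """\
(XEN)  0000000000000000 - 000000000009f400 (usable)
(XEN)  000000000009f400 - 00000000000a0000 (reserved)
(XEN)  00000000000f0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 000000007ffc8000 (usable)
(XEN)  000000007ffc8000 - 000000007ffd0000 (ACPI data)
(XEN)  000000007ffd0000 - 0000000080000000 (reserved)
"""

kb = usable_kb(RAM_MAP)
mb = kb // 1024
```

For this map, kb is 2096540 and mb is 2047, in line with the "System RAM:
2047MB (2096540kB)" line and with total_memory in xm info.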

Any clues as to what's wrong here?

If more info is needed, please ask.
Thanks in advance.
Regards,
Ronald.


