xen-users
[Xen-users] Oops in Dom0 kernel when eth link fails
Hi,
While running two Xen machines with kernel 2.6.18-2 (the standard Xen kernel
supplied by Debian unstable), I get the following oops in the Dom0 kernel when
the Ethernet link changes from up to down:
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
c02855ba
*pde = ma 00000000 pa fffff000
Oops: 0002 [#1]
SMP
Modules linked in: ip_vs_wrr ip_vs xt_physdev netconsole iptable_filter
ip_tables x_tables bridge netloop drbd button ac battery loop shpchp
pci_hotplug pcspkr serial_core serio_raw psmouse evdev tsdev ext3 jbd mbcache
dm_mirror dm_snapshot dm_mod ide_cd cdrom generic usbhid cciss piix scsi_mod
uhci_hcd ide_core bnx2 usbcore thermal processor fan
CPU: 0
EIP: 0061:[<c02855ba>] Not tainted VLI
EFLAGS: 00010286 (2.6.18-2-xen-686 #1)
EIP is at iret_exc+0x883/0xbe6
eax: 00000000 ebx: 00000000 ecx: 00000007 edx: c0ca0000
esi: c0ca0018 edi: c06d1890 ebp: 0000004c esp: c0315d0c
ds: 007b es: 007b ss: 0069
Process swapper (pid: 0, ti=c0314000 task=c02c9660 task.ti=c0314000)
Stack: 0000004c 000001d8 c0ca0000 c0227f6d c0ca0000 c06d1878 000001d8 00000000
00000000 00000000 00000018 c06d1878 c71038ac 00000001 0000004c 000005dc
c52fd53c 0000025f c02079fc 000001d8 c0315e38 00000224 c76fee80 0000022c
Call Trace:
[<c0227f6d>] skb_copy_and_csum_bits+0x129/0x2a9
[<c02079fc>] __alloc_skb+0x6c/0x70
[<c02647a9>] icmp_glue_bits+0x1f/0x74
[<c02496f8>] ip_append_data+0x5d1/0x942
[<c026478a>] icmp_glue_bits+0x0/0x74
[<c026467d>] icmp_push_reply+0x3d/0x14a
[<c0243d86>] ip_route_output_flow+0x13/0x57
[<c0264f6d>] icmp_send+0x2e7/0x350
[<c012b60c>] run_posix_cpu_timers+0x1c/0x6bf
[<c011495e>] rebalance_tick+0x116/0x2ae
[<c0241b36>] ipv4_link_failure+0x14/0x3c
[<c0262f1c>] arp_error_report+0x1c/0x24
[<c0232c0d>] neigh_timer_handler+0x18e/0x24d
[<c0232a7f>] neigh_timer_handler+0x0/0x24d
[<c0121c28>] run_timer_softirq+0x101/0x15c
[<c011de82>] __do_softirq+0x5e/0xc3
[<c011df21>] do_softirq+0x3a/0x4a
[<c01060c9>] do_IRQ+0x48/0x53
[<c0206518>] evtchn_do_upcall+0x64/0x9b
[<c01049d9>] hypervisor_callback+0x3d/0x48
[<c01072c6>] raw_safe_halt+0x8c/0xaf
[<c0102c63>] xen_idle+0x22/0x2e
[<c0102d82>] cpu_idle+0x91/0xab
[<c03196fe>] start_kernel+0x37a/0x381
Code: ff ff ff e9 a8 4f ef ff b8 f2 ff ff ff e9 c7 4f ef ff b8 f2 ff ff ff e9
e7 4f ef ff 8b 3d 20 0b 36 c0 e9 ef 93 ef ff 8b 5c 24 20 <c7> 03 f2 ff ff ff 8b
7c 24 14 8b 4c 24 18 31 c0 f3 aa e9 4b 0d
EIP: [<c02855ba>] iret_exc+0x883/0xbe6 SS:ESP 0069:c0315d0c
<0>Kernel panic - not syncing: Fatal exception in interrupt
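As a reading aid for the oops above, here is a minimal sketch that decodes two of its fields. It assumes the standard i386 page-fault error-code layout (bit 0: protection violation, bit 1: write, bit 2: user mode) and the usual encoding of the bytes at the marked position in the Code: line (c7 03 followed by a 32-bit immediate, which should be a store of that immediate to the address in ebx). The function name is made up for illustration; this is an interpretation of the dump, not a diagnosis.

```python
def decode_page_fault_error(code: int) -> dict:
    """Decode the low three bits of an i386 page-fault error code."""
    return {
        "cause": "protection violation" if code & 0x1 else "page not present",
        "access": "write" if code & 0x2 else "read",
        "mode": "user" if code & 0x4 else "kernel",
    }

# "Oops: 0002" from the trace above.
oops = decode_page_fault_error(0x0002)
print(oops)

# Immediate operand of the faulting instruction: the four bytes after "<c7> 03"
# in the Code: line, stored little-endian.
imm = int.from_bytes(bytes.fromhex("f2ffffff"), "little", signed=True)
print(imm)  # -14, which matches -EFAULT on Linux
```

Read this way, the fault is a kernel-mode write to a page that is not present, and with ebx = 00000000 in the register dump the store targets address 0, which agrees with the "NULL pointer dereference at virtual address 00000000" line.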
Some details about the setup: the machines are linked by an Ethernet
crossover cable via eth1. eth0 on both machines connects to the LAN, where
clients connect to a virtual IP address managed by heartbeat. Both machines run
one DomU providing services. Data replication is done with drbd over the eth1
link. This is what happens:
- Both machines are running fine, one DomU per physical machine, load balanced.
- One of the machines has a (simulated) failure (poweroff -f).
- The second machine takes over all DomUs. Then, seconds later, the above oops
occurs and the second machine is also down. Not quite as intended :)
My guess is that this has to do with the eth1 Ethernet link going down (with a
crossover cable, the link drops as soon as the other machine powers off), but I
could be wrong. The network driver used is bnx2; the network card is a
'Broadcom NetXtreme II BCM5708 1000Base-T (B1) PCI-X 64-bit 133MHz'. I have
tried to reproduce it on a non-Xen kernel, but couldn't. Someone also suggested
disabling TX checksumming in both DomUs, but that made no difference.
Below is some output of xm info and xm dmesg.
Xm info:
host : kalium
release : 2.6.18-2-xen-686
version : #1 SMP Thu Nov 9 00:21:32 UTC 2006
machine : i686
nr_cpus : 4
nr_nodes : 1
sockets_per_node : 1
cores_per_socket : 2
threads_per_core : 2
cpu_mhz : 3200
hw_caps : bfebfbff:20100000:00000000:00000180:0000e43d:00000000:00000001
total_memory : 2047
free_memory : 1379
xen_major : 3
xen_minor : 0
xen_extra : .3-1
xen_caps : xen-3.0-x86_32 hvm-3.0-x86_32
xen_pagesize : 4096
platform_params : virt_start=0xfc000000
xen_changeset : Tue Oct 17 22:09:52 2006 +0100
cc_compiler : gcc version 4.1.2 20061028 (prerelease) (Debian 4.1.1-19)
cc_compile_by : ultrotter
cc_compile_domain : debian.org
cc_compile_date : Thu Nov 2 20:28:13 CET 2006
xend_config_format : 2
Xm dmesg:
Xen version 3.0.3-1 (Debian 3.0.3-0-2) (ultrotter@xxxxxxxxxx) (gcc version 4.1.2 20061028 (prerelease) (Debian 4.1.1-19)) Thu Nov 2 20:28:13 CET 2006
Latest ChangeSet: Tue Oct 17 22:09:52 2006 +0100
(XEN) Command line: /boot/xen-3.0.3-1-i386.gz dom0_mem=128Mb
(XEN) Physical RAM map:
(XEN) 0000000000000000 - 000000000009f400 (usable)
(XEN) 000000000009f400 - 00000000000a0000 (reserved)
(XEN) 00000000000f0000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 000000007ffc8000 (usable)
(XEN) 000000007ffc8000 - 000000007ffd0000 (ACPI data)
(XEN) 000000007ffd0000 - 0000000080000000 (reserved)
(XEN) 00000000fec00000 - 00000000fed00000 (reserved)
(XEN) 00000000fee00000 - 00000000fee10000 (reserved)
(XEN) 00000000ffc00000 - 0000000100000000 (reserved)
(XEN) System RAM: 2047MB (2096540kB)
(XEN) Xen heap: 10MB (10408kB)
(XEN) PAE disabled.
(XEN) found SMP MP-table at 000f4f80
(XEN) DMI 2.3 present.
(XEN) Using APIC driver default
(XEN) ACPI: RSDP (v002 HP ) @ 0x000f4f00
(XEN) ACPI: XSDT (v001 HP P58 0x00000002 Ò 0x0000162e) @ 0x7ffc8300
(XEN) ACPI: FADT (v003 HP P58 0x00000002 Ò 0x0000162e) @ 0x7ffc8380
(XEN) ACPI: SPCR (v001 HP SPCRRBSU 0x00000001 Ò 0x0000162e) @ 0x7ffc8100
(XEN) ACPI: MCFG (v001 HP ProLiant 0x00000001 0x00000000) @ 0x7ffc8180
(XEN) ACPI: HPET (v001 HP P58 0x00000002 Ò 0x0000162e) @ 0x7ffc81c0
(XEN) ACPI: SPMI (v005 HP ProLiant 0x00000001 Ò 0x0000162e) @ 0x7ffc8200
(XEN) ACPI: MADT (v001 HP 00000083 0x00000002 0x00000000) @ 0x7ffc8240
(XEN) ACPI: DSDT (v001 HP DSDT 0x00000001 INTL 0x20030228) @ 0x00000000
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
(XEN) Processor #0 15:6 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
(XEN) Processor #2 15:6 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
(XEN) Processor #1 15:6 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
(XEN) Processor #3 15:6 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] disabled)
(XEN) ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
(XEN) ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
(XEN) ACPI: IOAPIC (id[0x09] address[0xfec80000] gsi_base[24])
(XEN) IOAPIC[1]: apic_id 9, version 32, address 0xfec80000, GSI 24-47
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) ACPI: IRQ9 used by override.
(XEN) Enabling APIC mode: Flat. Using 2 I/O APICs
(XEN) ACPI: HPET id: 0x10228201 base: 0xfed00000
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Initializing CPU#0
(XEN) Detected 3200.281 MHz processor.
(XEN) CPU: Trace cache: 12K uops, L1 D cache: 16K
(XEN) CPU: L2 cache: 2048K
(XEN) CPU: Physical Processor ID: 0
(XEN) CPU: Processor Core ID: 0
(XEN) VMXON is done
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#0.
(XEN) CPU0: Intel P4/Xeon Extended MCE MSRs (24) available
(XEN) CPU0: Thermal monitoring enabled
(XEN) CPU0: Intel(R) Xeon(TM) CPU 3.20GHz stepping 04
(XEN) Booting processor 1/2 eip 90000
(XEN) Initializing CPU#1
(XEN) CPU: Trace cache: 12K uops, L1 D cache: 16K
(XEN) CPU: L2 cache: 2048K
(XEN) CPU: Physical Processor ID: 0
(XEN) CPU: Processor Core ID: 1
(XEN) VMXON is done
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#1.
(XEN) CPU1: Intel P4/Xeon Extended MCE MSRs (24) available
(XEN) CPU1: Thermal monitoring enabled
(XEN) CPU1: Intel(R) Xeon(TM) CPU 3.20GHz stepping 04
(XEN) Booting processor 2/1 eip 90000
(XEN) Initializing CPU#2
(XEN) CPU: Trace cache: 12K uops, L1 D cache: 16K
(XEN) CPU: L2 cache: 2048K
(XEN) CPU: Physical Processor ID: 0
(XEN) CPU: Processor Core ID: 0
(XEN) VMXON is done
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#2.
(XEN) CPU2: Intel P4/Xeon Extended MCE MSRs (24) available
(XEN) CPU2: Thermal monitoring enabled
(XEN) CPU2: Intel(R) Xeon(TM) CPU 3.20GHz stepping 04
(XEN) Booting processor 3/3 eip 90000
(XEN) Initializing CPU#3
(XEN) CPU: Trace cache: 12K uops, L1 D cache: 16K
(XEN) CPU: L2 cache: 2048K
(XEN) CPU: Physical Processor ID: 0
(XEN) CPU: Processor Core ID: 1
(XEN) VMXON is done
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#3.
(XEN) CPU3: Intel P4/Xeon Extended MCE MSRs (24) available
(XEN) CPU3: Thermal monitoring enabled
(XEN) CPU3: Intel(R) Xeon(TM) CPU 3.20GHz stepping 04
(XEN) Total of 4 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN) -> Using new ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) checking TSC synchronization across 4 CPUs: passed.
(XEN) Platform timer is 14.318MHz HPET
(XEN) Brought up 4 CPUs
(XEN) Machine check exception polling timer started.
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Domain 0 kernel supports features = { 0000001f }.
(XEN) Domain 0 kernel requires features = { 00000000 }.
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN) Dom0 alloc.: 03000000->04000000 (28672 pages to be allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN) Loaded kernel: c0100000->c038b874
(XEN) Init. ramdisk: c038c000->c0eab200
(XEN) Phys-Mach map: c0eac000->c0ecc000
(XEN) Start info: c0ecc000->c0ecc46c
(XEN) Page tables: c0ecd000->c0ed2000
(XEN) Boot stack: c0ed2000->c0ed3000
(XEN) TOTAL: c0000000->c1000000
(XEN) ENTRY ADDRESS: c0100000
(XEN) Dom0 has maximum 4 VCPUs
(XEN) Initrd len 0xb1f200, start at 0xc038c000
(XEN) Scrubbing Free RAM: .....................done.
(XEN) Xen trace buffers: disabled
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen).
Any clues as to what's wrong here?
If more info is needed, please ask.
Thanks in advance.
Regards,
Ronald.
- [Xen-users] Oops in Dom0 kernel when eth link fails,
Ronald Moesbergen <=