Hi all,
[xen-users: Please CC me in replies, I am not subscribed to this list.]
First of all, please excuse this cross-posting, but I am, so to say, stumped.
I have a cluster of two Debian etch boxes (Intel CPUs with vmx support) running
drbd8 on the kernel packages linux-image-2.6.18-4-xen-686
(2.6.18.dfsg.1-12etch2) or linux-image-2.6.18-5-xen-686
(2.6.18.dfsg.1-13etch4). The latter sometimes exposes a Xen networking bug
similar to http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=451297 on one of
the systems (I haven't started debugging that one yet). I have compiled my
own drbd8 kernel modules, so the Debian package version is 8.0.7-1 for both
drbd8-utils and drbd8-2.6.18-*-xen-686. The Xen packages are from Debian etch,
i.e. xen-utils-3.0.3-1, xen-hypervisor-3.0.3-1-i386-pae, etc., version
3.0.3-0-4. The domU instances also run Debian etch; they are booted from dom0
with linux-image-2.6.18-4-xen-686 and have linux-modules-2.6.18-4-xen-686
installed inside the domUs.
Both boxes have a ca. 60GB LVM2 VG with >20 LVs of equal size and name on
both. There is one drbd resource for each LV (currently up to drbd28, but
using only every second one for now), and these are used as backend devices
for the Xen domUs. At any time, only one node switches the respective drbd
resource to primary and then starts the domU (live migration is not yet
used, so primary-primary does not happen right now). The drbd resources all
have a similar configuration:
resource "xen-xxx" {
    protocol C;
    startup {
        wfc-timeout 120; ## 2 minutes.
        degr-wfc-timeout 120; ## 2 minutes.
    }
    disk {
        on-io-error detach;
    }
    net {
        max-buffers 128;
        allow-two-primaries;
        cram-hmac-alg "sha256";
        shared-secret "xxxxxxxxxxxxxxxxxxxxxxxx";
        after-sb-0pri disconnect;
        after-sb-1pri disconnect;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
    }
    syncer {
        rate 50M;
        after "xen-yyy";
    }
    on aaaa {
        device /dev/drbd26;
        disk /dev/mapper/xendomains-xxx;
        address 192.168.255.30:7814;
        meta-disk internal;
    }
    on bbbb {
        device /dev/drbd26;
        disk /dev/mapper/xendomains-xxx;
        address 192.168.255.29:7814;
        meta-disk internal;
    }
}
That is, the syncer rate is moderate (1Gbps connection), max-buffers is low
so as not to run into out-of-memory issues (this used to be a problem), and
the syncer is serialized.
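To put "moderate" into numbers, here is a rough back-of-the-envelope sketch.
It assumes that the "M" in "rate 50M" means MiB per second and that the link
really delivers its nominal 1Gbps with no protocol overhead; both are
simplifications, and the names used are only for illustration.

```python
# Rough share of a 1 Gbps link consumed by "rate 50M" during a resync.
# Assumptions: "M" means MiB/s (2**20 bytes per second) and the link
# delivers its nominal 10**9 bits per second with no protocol overhead.

LINK_BITS_PER_SECOND = 10**9
SYNC_RATE_MIB_PER_SECOND = 50


def sync_share(rate_mib_per_s: int, link_bits_per_s: int) -> float:
    """Return the fraction of the raw link capacity used by the syncer."""
    rate_bytes_per_s = rate_mib_per_s * 2**20
    link_bytes_per_s = link_bits_per_s / 8
    return rate_bytes_per_s / link_bytes_per_s


share = sync_share(SYNC_RATE_MIB_PER_SECOND, LINK_BITS_PER_SECOND)
print(f"syncer uses about {share:.0%} of the raw link capacity")
```

Because the resources are chained with "after", only one of them should be
resyncing at any time, so this share should not multiply by the number of
resources.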
Sometimes, when one of the nodes is rebooted, the other node crashes. This
is reproducible, but not every time. Just an hour ago it happened again, and
we were able to capture the kernel output on the serial console of the
crashing node after the reboot was started on the other node (to be clear:
the output below is from the node that wasn't touched by any manual
intervention):
drbd0: meta connection shut down by peer.
drbd26: meta connection shut down by peer.
drbd0: tl_clear()
drbd26: tl_clear()
drbd28: meta connection shut down by peer.
drbd28: tl_clear()
drbd24: PingAck did not arrive in time.
drbd24: short read expecting header on sock: r=-512
drbd24: tl_clear()
drbd20: PingAck did not arrive in time.
drbd20: short read expecting header on sock: r=-512
drbd20: tl_clear()
drbd14: PingAck did not arrive in time.
drbd14: short read expecting header on sock: r=-512
drbd14: tl_clear()
drbd4: PingAck did not arrive in time.
drbd4: short read expecting header on sock: r=-512
drbd4: tl_clear()
drbd10: PingAck did not arrive in time.
drbd10: short read expecting header on sock: r=-512
drbd10: tl_clear()
drbd6: PingAck did not arrive in time.
drbd6: short read expecting header on sock: r=-512
drbd6: tl_clear()
drbd22: PingAck did not arrive in time.
drbd22: short read expecting header on sock: r=-512
drbd22: tl_clear()
BUG: unable to handle kernel paging request at virtual address c081e000
printing eip:
c01bb090
19529000 -> *pde = 00000001:19f48001
27948000 -> *pme = 00000000:07099067
01099000 -> *pte = 00000000:00000000
Oops: 0000 [#1]
SMP
Modules linked in: sha256 drbd cn xt_physdev button ac battery ocfs2_dlmfs
ocfs2_dlm ocfs2_nodemanager configfs bridge ip6t_LOG ipt_LOG xt_state
ipt_REJECT xt_tcpudp ip6table_filter ip6_tables ipv6 iptable_mangle
iptable_nat ip_nat ip_conntrack nfnetlink iptable_filter ip_tables x_tables
xfs dm_crypt loop serio_raw serial_core psmouse rtc pcspkr shpchp pci_hotplug
tsdev evdev ext3 jbd mbcache dm_mirror dm_snapshot dm_mod raid1 md_mod
ide_generic ide_cd cdrom sd_mod generic usbhid tg3 skge ahci libata piix
scsi_mod ide_core ehci_hcd uhci_hcd usbcore thermal processor fan
CPU: 0
EIP: 0061:[<c01bb090>] Not tainted VLI
EFLAGS: 00010202 (2.6.18-5-xen-686 #1)
EIP is at csum_partial+0x88/0x120
eax: 00000000 ebx: c01bb0f0 ecx: 00000007 edx: 00000400
esi: c081e080 edi: 00000408 ebp: 00000064 esp: c031fd54
ds: 007b es: 007b ss: 0069
Process swapper (pid: 0, ti=c031e000 task=c02cf660 task.ti=c031e000)
Stack: c081e000 00000064 c022d626 c081e000 00000400 00000000 00000010 e4f00d2c
00000000 00000050 00000464 c911cef4 c022e532 c911c800 00000400 e4f00d2c
e58258ec e5825900 c031fe54 c0231f33 92a6a1ff c83c4a00 00000003 c031fe54
Call Trace:
[<c022d626>] skb_checksum+0x112/0x27e
[<c022e532>] pskb_expand_head+0xce/0x112
[<c0231f33>] skb_checksum_help+0x5d/0xac
[<e93ef2ea>] ip_nat_fn+0x42/0x184 [iptable_nat]
[<e93f8092>] ipt_local_hook+0x76/0xcc [iptable_mangle]
[<e93ef61e>] ip_nat_local_fn+0x34/0xaa [iptable_nat]
[<c024def0>] dst_output+0x0/0x7
[<c0246e28>] nf_iterate+0x30/0x61
[<c024def0>] dst_output+0x0/0x7
[<c0246f4e>] nf_hook_slow+0x3a/0x90
[<c024def0>] dst_output+0x0/0x7
[<c02500e8>] ip_queue_xmit+0x35f/0x3b3
[<c024def0>] dst_output+0x0/0x7
[<c0115f49>] rebalance_tick+0x116/0x2ae
[<c025dab0>] tcp_transmit_skb+0x604/0x632
[<c025e80c>] tcp_retransmit_skb+0x4e2/0x5c7
[<c012e066>] hrtimer_run_queues+0x147/0x15f
[<c0257960>] tcp_enter_loss+0x1a1/0x1fd
[<c02608e3>] tcp_write_timer+0x0/0x5c9
[<c0260cdb>] tcp_write_timer+0x3f8/0x5c9
[<c0123440>] run_timer_softirq+0x101/0x15c
[<c011f41e>] __do_softirq+0x5e/0xc3
[<c011f4bd>] do_softirq+0x3a/0x4a
[<c0106131>] do_IRQ+0x48/0x53
[<c020c1cc>] evtchn_do_upcall+0x64/0x9b
[<c0104a51>] hypervisor_callback+0x3d/0x48
[<c0107342>] raw_safe_halt+0x8c/0xaf
[<c0102c5f>] xen_idle+0x22/0x2e
[<c0102d7e>] cpu_idle+0x91/0xab
[<c03236fc>] start_kernel+0x378/0x37f
Code: 00 74 b6 83 e9 02 77 cd 74 16 83 c1 02 0f 84 9f 00 00 00 0f b6 1e 01 d8
83 d0 00 e9 92 00 00 00 66 03 06 83 d0 00 e9 87 00 00 00 <03> 46 80 13 46 84
13 46 88 13 46 8c 13 46 90 13 46 94 13 46 98
EIP: [<c01bb090>] csum_partial+0x88/0x120 SS:ESP 0069:c031fd54
<0>Kernel panic - not syncing: Fatal exception in interrupt
BUG: warning at arch/i386/kernel/smp-xen.c:526/smp_call_function()
[<c010cddd>] smp_call_function+0x53/0xf8
[<c011b4be>] printk+0x14/0x18
[<c010ce95>] smp_send_stop+0x13/0x1e
[<c011aaeb>] panic+0x45/0xde
[<c0105235>] die+0x242/0x276
[<c0111d05>] do_page_fault+0xa53/0xb76
[<c01bb0f0>] csum_partial+0xe8/0x120
[<c01112b2>] do_page_fault+0x0/0xb76
[<c0104a0f>] error_code+0x2b/0x30
[<c01bb0f0>] csum_partial+0xe8/0x120
[<c01bb090>] csum_partial+0x88/0x120
[<c022d626>] skb_checksum+0x112/0x27e
[<c022e532>] pskb_expand_head+0xce/0x112
[<c0231f33>] skb_checksum_help+0x5d/0xac
[<e93ef2ea>] ip_nat_fn+0x42/0x184 [iptable_nat]
[<e93f8092>] ipt_local_hook+0x76/0xcc [iptable_mangle]
[<e93ef61e>] ip_nat_local_fn+0x34/0xaa [iptable_nat]
[<c024def0>] dst_output+0x0/0x7
[<c0246e28>] nf_iterate+0x30/0x61
[<c024def0>] dst_output+0x0/0x7
[<c0246f4e>] nf_hook_slow+0x3a/0x90
[<c024def0>] dst_output+0x0/0x7
[<c02500e8>] ip_queue_xmit+0x35f/0x3b3
[<c024def0>] dst_output+0x0/0x7
[<c0115f49>] rebalance_tick+0x116/0x2ae
[<c025dab0>] tcp_transmit_skb+0x604/0x632
[<c025e80c>] tcp_retransmit_skb+0x4e2/0x5c7
[<c012e066>] hrtimer_run_queues+0x147/0x15f
[<c0257960>] tcp_enter_loss+0x1a1/0x1fd
[<c02608e3>] tcp_write_timer+0x0/0x5c9
[<c0260cdb>] tcp_write_timer+0x3f8/0x5c9
[<c0123440>] run_timer_softirq+0x101/0x15c
[<c011f41e>] __do_softirq+0x5e/0xc3
[<c011f4bd>] do_softirq+0x3a/0x4a
[<c0106131>] do_IRQ+0x48/0x53
[<c020c1cc>] evtchn_do_upcall+0x64/0x9b
[<c0104a51>] hypervisor_callback+0x3d/0x48
[<c0107342>] raw_safe_halt+0x8c/0xaf
[<c0102c5f>] xen_idle+0x22/0x2e
[<c0102d7e>] cpu_idle+0x91/0xab
[<c03236fc>] start_kernel+0x378/0x37f
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.
Then a reboot. ksymoops doesn't want to read the symbols from /proc/kallsyms
(and /proc/ksyms doesn't exist), so the decoding isn't complete:
ksymoops 2.4.11 on i686 2.6.18-5-xen-686. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.6.18-5-xen-686/ (default)
-m /boot/System.map-2.6.18-5-xen-686 (default)
Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.
Error (regular_file): read_ksyms stat /proc/ksyms failed
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
BUG: unable to handle kernel paging request at virtual address c081e000
c01bb090
19529000 -> *pde = 00000001:19f48001
27948000 -> *pme = 00000000:07099067
01099000 -> *pte = 00000000:00000000
Oops: 0000 [#1]
CPU: 0
EIP: 0061:[<c01bb090>] Not tainted VLI
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202 (2.6.18-5-xen-686 #1)
eax: 00000000 ebx: c01bb0f0 ecx: 00000007 edx: 00000400
esi: c081e080 edi: 00000408 ebp: 00000064 esp: c031fd54
ds: 007b es: 007b ss: 0069
Stack: c081e000 00000064 c022d626 c081e000 00000400 00000000 00000010 e4f00d2c
00000000 00000050 00000464 c911cef4 c022e532 c911c800 00000400 e4f00d2c
e58258ec e5825900 c031fe54 c0231f33 92a6a1ff c83c4a00 00000003 c031fe54
Call Trace:
[<c022d626>] skb_checksum+0x112/0x27e
[<c022e532>] pskb_expand_head+0xce/0x112
[<c0231f33>] skb_checksum_help+0x5d/0xac
[<e93ef2ea>] ip_nat_fn+0x42/0x184 [iptable_nat]
[<e93f8092>] ipt_local_hook+0x76/0xcc [iptable_mangle]
[<e93ef61e>] ip_nat_local_fn+0x34/0xaa [iptable_nat]
[<c024def0>] dst_output+0x0/0x7
[<c0246e28>] nf_iterate+0x30/0x61
[<c024def0>] dst_output+0x0/0x7
[<c0246f4e>] nf_hook_slow+0x3a/0x90
[<c024def0>] dst_output+0x0/0x7
[<c02500e8>] ip_queue_xmit+0x35f/0x3b3
[<c024def0>] dst_output+0x0/0x7
[<c0115f49>] rebalance_tick+0x116/0x2ae
[<c025dab0>] tcp_transmit_skb+0x604/0x632
[<c025e80c>] tcp_retransmit_skb+0x4e2/0x5c7
[<c012e066>] hrtimer_run_queues+0x147/0x15f
[<c0257960>] tcp_enter_loss+0x1a1/0x1fd
[<c02608e3>] tcp_write_timer+0x0/0x5c9
[<c0260cdb>] tcp_write_timer+0x3f8/0x5c9
[<c0123440>] run_timer_softirq+0x101/0x15c
[<c011f41e>] __do_softirq+0x5e/0xc3
[<c011f4bd>] do_softirq+0x3a/0x4a
[<c0106131>] do_IRQ+0x48/0x53
[<c020c1cc>] evtchn_do_upcall+0x64/0x9b
[<c0104a51>] hypervisor_callback+0x3d/0x48
[<c0107342>] raw_safe_halt+0x8c/0xaf
[<c0102c5f>] xen_idle+0x22/0x2e
[<c0102d7e>] cpu_idle+0x91/0xab
[<c03236fc>] start_kernel+0x378/0x37f
Code: 00 74 b6 83 e9 02 77 cd 74 16 83 c1 02 0f 84 9f 00 00 00 0f b6 1e 01 d8
83 d0 00 e9 92 00 00 00 66 03 06 83 d0 00 e9 87 00 00 00 <03> 46 80 13 46 84
13 46 88 13 46 8c 13 46 90 13 46 94 13 46 98
>>EIP; c01bb090 <csum_partial+88/120> <=====
>>ebx; c01bb0f0 <csum_partial+e8/120>
>>esp; c031fd54 <init_thread_union+1d54/2000>
Trace; c022d626 <skb_checksum+112/27e>
Trace; c022e532 <pskb_expand_head+ce/112>
Trace; c0231f33 <skb_checksum_help+5d/ac>
Trace; e93ef2ea <END_OF_CODE+2905a2ea/????>
Trace; e93f8092 <END_OF_CODE+29063092/????>
Trace; e93ef61e <END_OF_CODE+2905a61e/????>
Trace; c024def0 <dst_output+0/7>
Trace; c0246e28 <nf_iterate+30/61>
Trace; c024def0 <dst_output+0/7>
Trace; c0246f4e <nf_hook_slow+3a/90>
Trace; c024def0 <dst_output+0/7>
Trace; c02500e8 <ip_queue_xmit+35f/3b3>
Trace; c024def0 <dst_output+0/7>
Trace; c0115f49 <rebalance_tick+116/2ae>
Trace; c025dab0 <tcp_transmit_skb+604/632>
Trace; c025e80c <tcp_retransmit_skb+4e2/5c7>
Trace; c012e066 <hrtimer_run_queues+147/15f>
Trace; c0257960 <tcp_enter_loss+1a1/1fd>
Trace; c02608e3 <tcp_write_timer+0/5c9>
Trace; c0260cdb <tcp_write_timer+3f8/5c9>
Trace; c0123440 <run_timer_softirq+101/15c>
Trace; c011f41e <__do_softirq+5e/c3>
Trace; c011f4bd <do_softirq+3a/4a>
Trace; c0106131 <do_IRQ+48/53>
Trace; c020c1cc <evtchn_do_upcall+64/9b>
Trace; c0104a51 <hypervisor_callback+3d/48>
Trace; c0107342 <raw_safe_halt+8c/af>
Trace; c0102c5f <xen_idle+22/2e>
Trace; c0102d7e <cpu_idle+91/ab>
Trace; c03236fc <start_kernel+378/37f>
This architecture has variable length instructions, decoding before eip
is unreliable, take these instructions with a pinch of salt.
Code; c01bb065 <csum_partial+5d/120>
00000000 <_EIP>:
Code; c01bb065 <csum_partial+5d/120>
0: 00 74 b6 83 add %dh,0xffffff83(%esi,%esi,4)
Code; c01bb069 <csum_partial+61/120>
4: e9 02 77 cd 74 jmp 74cd770b <_EIP+0x74cd770b>
Code; c01bb06e <csum_partial+66/120>
9: 16 push %ss
Code; c01bb06f <csum_partial+67/120>
a: 83 c1 02 add $0x2,%ecx
Code; c01bb072 <csum_partial+6a/120>
d: 0f 84 9f 00 00 00 je b2 <_EIP+0xb2>
Code; c01bb078 <csum_partial+70/120>
13: 0f b6 1e movzbl (%esi),%ebx
Code; c01bb07b <csum_partial+73/120>
16: 01 d8 add %ebx,%eax
Code; c01bb07d <csum_partial+75/120>
18: 83 d0 00 adc $0x0,%eax
Code; c01bb080 <csum_partial+78/120>
1b: e9 92 00 00 00 jmp b2 <_EIP+0xb2>
Code; c01bb085 <csum_partial+7d/120>
20: 66 03 06 add (%esi),%ax
Code; c01bb088 <csum_partial+80/120>
23: 83 d0 00 adc $0x0,%eax
Code; c01bb08b <csum_partial+83/120>
26: e9 87 00 00 00 jmp b2 <_EIP+0xb2>
This decode from eip onwards should be reliable
Code; c01bb090 <csum_partial+88/120>
00000000 <_EIP>:
Code; c01bb090 <csum_partial+88/120> <=====
0: 03 46 80 add 0xffffff80(%esi),%eax <=====
Code; c01bb093 <csum_partial+8b/120>
3: 13 46 84 adc 0xffffff84(%esi),%eax
Code; c01bb096 <csum_partial+8e/120>
6: 13 46 88 adc 0xffffff88(%esi),%eax
Code; c01bb099 <csum_partial+91/120>
9: 13 46 8c adc 0xffffff8c(%esi),%eax
Code; c01bb09c <csum_partial+94/120>
c: 13 46 90 adc 0xffffff90(%esi),%eax
Code; c01bb09f <csum_partial+97/120>
f: 13 46 94 adc 0xffffff94(%esi),%eax
Code; c01bb0a2 <csum_partial+9a/120>
12: 13 46 98 adc 0xffffff98(%esi),%eax
EIP: [<c01bb090>] csum_partial+0x88/0x120 SS:ESP 0069:c031fd54
<0>Kernel panic - not syncing: Fatal exception in interrupt
[<c010cddd>] smp_call_function+0x53/0xf8
[<c011b4be>] printk+0x14/0x18
[<c010ce95>] smp_send_stop+0x13/0x1e
[<c011aaeb>] panic+0x45/0xde
[<c0105235>] die+0x242/0x276
[<c0111d05>] do_page_fault+0xa53/0xb76
[<c01bb0f0>] csum_partial+0xe8/0x120
[<c01112b2>] do_page_fault+0x0/0xb76
[<c0104a0f>] error_code+0x2b/0x30
[<c01bb0f0>] csum_partial+0xe8/0x120
[<c01bb090>] csum_partial+0x88/0x120
[<c022d626>] skb_checksum+0x112/0x27e
[<c022e532>] pskb_expand_head+0xce/0x112
[<c0231f33>] skb_checksum_help+0x5d/0xac
[<e93ef2ea>] ip_nat_fn+0x42/0x184 [iptable_nat]
[<e93f8092>] ipt_local_hook+0x76/0xcc [iptable_mangle]
[<e93ef61e>] ip_nat_local_fn+0x34/0xaa [iptable_nat]
[<c024def0>] dst_output+0x0/0x7
[<c0246e28>] nf_iterate+0x30/0x61
[<c024def0>] dst_output+0x0/0x7
[<c0246f4e>] nf_hook_slow+0x3a/0x90
[<c024def0>] dst_output+0x0/0x7
[<c02500e8>] ip_queue_xmit+0x35f/0x3b3
[<c024def0>] dst_output+0x0/0x7
[<c0115f49>] rebalance_tick+0x116/0x2ae
[<c025dab0>] tcp_transmit_skb+0x604/0x632
[<c025e80c>] tcp_retransmit_skb+0x4e2/0x5c7
[<c012e066>] hrtimer_run_queues+0x147/0x15f
[<c0257960>] tcp_enter_loss+0x1a1/0x1fd
[<c02608e3>] tcp_write_timer+0x0/0x5c9
[<c0260cdb>] tcp_write_timer+0x3f8/0x5c9
[<c0123440>] run_timer_softirq+0x101/0x15c
[<c011f41e>] __do_softirq+0x5e/0xc3
[<c011f4bd>] do_softirq+0x3a/0x4a
[<c0106131>] do_IRQ+0x48/0x53
[<c020c1cc>] evtchn_do_upcall+0x64/0x9b
[<c0104a51>] hypervisor_callback+0x3d/0x48
[<c0107342>] raw_safe_halt+0x8c/0xaf
[<c0102c5f>] xen_idle+0x22/0x2e
[<c0102d7e>] cpu_idle+0x91/0xab
[<c03236fc>] start_kernel+0x378/0x37f
Warning (Oops_read): Code line not seen, dumping what data is available
>>EIP; c01bb090 <csum_partial+88/120> <=====
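The three END_OF_CODE entries above are the module addresses that ksymoops
could not resolve. A minimal sketch of resolving them by hand from
kallsyms-style text follows. The sample lines are not taken from a real
/proc/kallsyms: the start addresses are reconstructed from the oops (trace
address minus the printed offset), the type letter "t" is an assumption, and
the helper names load_symbols and resolve are made up for this example.

```python
import bisect

# Sample lines in /proc/kallsyms format ("address type name [module]").
# Start addresses are reconstructed from the oops itself (trace address
# minus the printed offset); the type letter "t" is assumed.
SAMPLE_KALLSYMS = """\
e93ef2a8 t ip_nat_fn\t[iptable_nat]
e93ef5ea t ip_nat_local_fn\t[iptable_nat]
e93f801c t ipt_local_hook\t[iptable_mangle]
"""


def load_symbols(text):
    """Parse kallsyms-style text into (address, name, module) tuples sorted by address."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        module = fields[3] if len(fields) > 3 else ""
        symbols.append((int(fields[0], 16), fields[2], module))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return 'name+0xoffset [module]' for the closest symbol at or below address."""
    starts = [start for start, _, _ in symbols]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None  # address lies below the first known symbol
    start, name, module = symbols[index]
    return f"{name}+{address - start:#x} {module}".strip()


symbols = load_symbols(SAMPLE_KALLSYMS)
for address in (0xE93EF2EA, 0xE93F8092, 0xE93EF61E):
    print(f"{address:08x} {resolve(symbols, address)}")
```

On a live system the sample text would be replaced by the contents of
/proc/kallsyms. Module load addresses can differ after a reboot, so this only
helps if the symbol list was saved from the same boot that produced the oops.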
These are production boxes, so I should avoid rebooting them too often for
debugging purposes. Any hints on how to approach and solve this issue would
be highly appreciated.
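For whoever looks at the oops, a small sanity check of the faulting
instruction may save some time. This is only a sketch: the register and
displacement values are copied from the dump above, and the 4 KiB page size
is an assumption for i386.

```python
# Values copied from the register dump and the ksymoops disassembly.
ESI = 0xC081E080            # esi at the time of the fault
DISPLACEMENT = 0xFFFFFF80   # operand of "add 0xffffff80(%esi),%eax"
FAULT_ADDRESS = 0xC081E000  # "unable to handle kernel paging request at ..."
PAGE_SIZE = 4096            # assumption: 4 KiB pages on i386


def effective_address(base: int, disp32: int) -> int:
    """Add a 32-bit displacement to a base register with 32-bit wraparound."""
    return (base + disp32) & 0xFFFFFFFF


address = effective_address(ESI, DISPLACEMENT)
print(hex(address))
print("matches fault address:", address == FAULT_ADDRESS)
print("page aligned:", address % PAGE_SIZE == 0)
```

So csum_partial faults while reading the very first word of the page at
c081e000, and the page-table walk in the oops shows
*pte = 00000000:00000000 for that address, i.e. the page is not mapped. If I
read the stack dump correctly, c081e000 is also the buffer pointer passed to
csum_partial with a length of 0x400, which would mean the whole buffer being
checksummed for the TCP retransmit lives on a page that is no longer there.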
with best regards,
Rene
--
-------------------------------------------------
Gibraltar firewall http://www.gibraltar.at/