[Xen-bugs] [Bug 1486] New: dom0 crashes under heavy network load
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1486
Summary: dom0 crashes under heavy network load
Product: Xen
Version: unstable
Platform: x86-64
OS/Version: Linux
Status: NEW
Severity: major
Priority: P2
Component: Hypervisor
AssignedTo: xen-bugs@xxxxxxxxxxxxxxxxxxx
ReportedBy: uk@xxxxxxxxxxxxx
CC: uk@xxxxxxxxxxxxx
On a Dell PE-R710 with bnx2 network drivers, dom0 crashes completely under
heavy, constant network and disk load (produced in dom0 and one domU). I also
tested an e1000 card with the onboard bnx2 disabled, and it crashes as well, so
I think this is not a NIC driver issue. The crash is reproducible faster with
an additional rsync, which also causes disk I/O.
In my testing scenario, 60 domUs were started, each with 6 disk and 2 network
devices, so 8 backend devices in use per domU.
Testing scenario, using netcat to produce constant load (only zero bytes in
this case):
my.dom0 #: nc -l -p 1234 | pv > /dev/null
external.host #: cat /dev/zero | pv | nc ip.of.my.dom0 1234
Then I ran an additional rsync loop to produce network and disk I/O:
my.dom0 #:
for i in $(seq 1 1000); do
  echo "============== run $i ============" >> rsync-runs.txt
  rm -rfv /var/spool/test/*
  rsync -avP --numeric-ids --password-file=/etc/rsyncd.secrets user@xxxxxxxxxxxxx::source/* /var/spool/test/
done
...which copies roughly 1 GB of data per run.
The crash occurs after a few minutes or after several hours; with the e1000 it
took 84 rsync runs (I do not know how long that was, as it crashed overnight).
I think I can crash the machine faster if I use the bnx2 card.
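Since the run markers above carry no timestamps, the time to crash could not be
read back from rsync-runs.txt. A minimal sketch of a timestamped marker follows.
The function name log_run and the LOGFILE variable are assumptions for
illustration and were not part of the original test; the file name matches the
loop above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original test): write a timestamped
# marker per run so the elapsed time before a crash can be read afterwards.
LOGFILE="${LOGFILE:-rsync-runs.txt}"

log_run() {
    # $1 = run number
    printf '%s ============== run %s ============\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$1" >> "$LOGFILE"
    # Flush buffers so the last marker is more likely to survive a hard crash.
    sync
}
```

In the loop, the echo line would then become "log_run $i". Note that the extra
sync adds a little disk I/O of its own on every run.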
Here, the unstable kernel 2.6.27.5 from xenbits was used, but this issue also
affects older versions.
Stacktrace:
Jul 9 19:34:20 xh132 kernel: ------------[ cut here ]------------
Jul 9 19:34:20 xh132 kernel: WARNING: at net/sched/sch_generic.c:219
dev_watchdog+0x13c/0x1e9()
Jul 9 19:34:20 xh132 kernel: NETDEV WATCHDOG: eth0 (bnx2): transmit timed out
Jul 9 19:34:20 xh132 kernel: Modules linked in: iptable_filter(N) ip_tables(N)
x_tables(N) bridge(N) stp(N) llc(N) loop(N) dm_mod(N) 8021q(N) bonding(N)
dcdbas(N)
Jul 9 19:34:20 xh132 kernel: Supported: No
Jul 9 19:34:20 xh132 kernel: Pid: 0, comm: swapper Tainted: G
2.6.27.5-xen0-he+4 #7
Jul 9 19:34:20 xh132 kernel:
Jul 9 19:34:20 xh132 kernel: Call Trace:
Jul 9 19:34:20 xh132 kernel: <IRQ> [<ffffffff8022b3d7>]
warn_slowpath+0xb4/0xde
Jul 9 19:34:20 xh132 kernel: [<ffffffff80552b00>] __down_read+0xb6/0x110
Jul 9 19:34:20 xh132 kernel: [<ffffffff804d6999>] neigh_lookup+0xb0/0xc0
Jul 9 19:34:20 xh132 kernel: [<ffffffff804cafd2>] skb_queue_tail+0x17/0x3e
Jul 9 19:34:20 xh132 kernel: [<ffffffff8020d6de>] get_nsec_offset+0x9/0x2c
Jul 9 19:34:20 xh132 kernel: [<ffffffff8020d7ff>] local_clock+0x48/0x99
Jul 9 19:34:20 xh132 kernel: [<ffffffff8020d6de>] get_nsec_offset+0x9/0x2c
Jul 9 19:34:20 xh132 kernel: [<ffffffff8020d7ff>] local_clock+0x48/0x99
Jul 9 19:34:20 xh132 kernel: [<ffffffff8020d96f>] sched_clock+0x15/0x36
Jul 9 19:34:20 xh132 kernel: [<ffffffff80241ef5>] sched_clock_cpu+0x290/0x2b9
Jul 9 19:34:20 xh132 kernel: [<ffffffff8020dfea>] timer_interrupt+0x409/0x41d
Jul 9 19:34:20 xh132 kernel: [<ffffffff804ded1f>] dev_watchdog+0x13c/0x1e9
Jul 9 19:34:20 xh132 kernel: [<ffffffffa0038b31>] br_fdb_cleanup+0x0/0xd5
[bridge]
Jul 9 19:34:20 xh132 kernel: [<ffffffff802347c8>] __mod_timer+0xc7/0xd5
Jul 9 19:34:20 xh132 kernel: [<ffffffff804debe3>] dev_watchdog+0x0/0x1e9
Jul 9 19:34:20 xh132 kernel: [<ffffffff80234131>]
run_timer_softirq+0x16c/0x211
Jul 9 19:34:20 xh132 kernel: [<ffffffff8024f132>] handle_percpu_irq+0x53/0x6f
Jul 9 19:34:20 xh132 kernel: [<ffffffff8022fee0>] __do_softirq+0x92/0x13b
Jul 9 19:34:20 xh132 kernel: [<ffffffff8020b37c>] call_softirq+0x1c/0x28
Jul 9 19:34:20 xh132 kernel: [<ffffffff8020d1c3>] do_softirq+0x55/0xbb
Jul 9 19:34:20 xh132 kernel: [<ffffffff8020ae3e>]
do_hypervisor_callback+0x1e/0x30
Jul 9 19:34:20 xh132 kernel: <EOI> [<ffffffff8020d6af>]
xen_safe_halt+0xb3/0xd9
Jul 9 19:34:20 xh132 kernel: [<ffffffff802105b3>] xen_idle+0x2e/0x67
Jul 9 19:34:20 xh132 kernel: [<ffffffff80208dfe>] cpu_idle+0x57/0x75
Jul 9 19:34:20 xh132 kernel:
Jul 9 19:34:20 xh132 kernel: ---[ end trace a04b8dccc5213f7d ]---
Jul 9 19:34:20 xh132 kernel: bnx2: eth0 NIC Copper Link is Down
Jul 9 19:34:20 xh132 kernel: bonding: bond0: link status down for active
interface eth0, disabling it in 200 ms.
Jul 9 19:34:20 xh132 kernel: bonding: bond0: link status definitely down for
interface eth0, disabling it
Jul 9 19:34:20 xh132 kernel: device eth0 left promiscuous mode
Jul 9 19:34:20 xh132 kernel: bonding: bond0: now running without any active
interface !
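To compare how often the transmit watchdog fires per interface and driver (for
example bnx2 versus e1000) across test runs, the syslog could be summarized as
sketched below. This is only a rough sketch: the function name watchdog_events
is hypothetical, and the log file path passed to it (for instance
/var/log/messages) is an assumption that depends on the syslog configuration.

```shell
#!/bin/sh
# Hypothetical helper: count "NETDEV WATCHDOG" messages per interface/driver.
# $1 = path to a syslog file (location is an assumption, e.g. /var/log/messages)
watchdog_events() {
    # Extract "<interface> <driver>" from lines such as
    #   ... kernel: NETDEV WATCHDOG: eth0 (bnx2): transmit timed out
    sed -n 's/.*NETDEV WATCHDOG: \([^ ]*\) (\([^)]*\)).*/\1 \2/p' "$1" \
        | sort | uniq -c
}
```

Each output line holds a count followed by the interface and driver name.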
Please let me know if you need further information.
So perhaps you can help.
Many thanks in advance,
best regards,
Ulf Kreutzberg