[Xen-bugs] [Bug 705] New: BUG: soft lockup detected on CPU
Summary: BUG: soft lockup detected on CPU
In bug report 697 (replicated below) we indicated that our file-backed DomainU
guests would be converted to physical drives. The three guest domains have now
been converted to use LVM-backed drives. The results are exactly the same as
with the file-backed guest systems.
Under heavy load, the entire Xen system crashes and the machine freezes. As it
stands now, Xen 3.0+ can only be used under low-capacity loads, so it may not be
ready for production environments. This conclusion is based on our own
experience with reproducible scenarios and on other bug reports found by
searching for "soft lockup". This issue has not been addressed, although it
should be considered critical.
+++ This bug was initially created as a clone of Bug #697 +++
This bug has been reported by many others using different configurations and
installations. I have seen at least one report suspecting that it occurs under
heavy load, and I can confirm that this is the case.
We have three HP DL360 G3 32-bit servers running FC4/Xen-3.0.0 and one running
FC5/Xen-3.0.2. Each of these machines hosts three DomainU guests, each
configured with Quagga's zebrad and ospfd plus net-snmp (nothing else), to
create three virtual OSPF edge routers per machine. Regardless of configuration
(FC4 or FC5, Xen-3.0.0 or Xen-3.0.2), the entire environment (Domain0 and the
whole machine, including keyboard, serial port, etc.) freezes at some point
after 3 or 4 hours of operation. However, this only happens when we send a
video feed through one of the virtual routers. The other three machines, which
do not carry this traffic load, continue to work and route traffic. We have
swapped machines and traffic patterns to confirm that the problem can be
reproduced and that it only affects the Xen environment hosting the router that
carries the heavy video feed.
We also have a supporting OSPF backbone using five HP DL360 G4 32-bit machines,
each configured with FC4, Xen-3.0.0 and Quagga's daemons and hosting two guest
domains, forming a test backbone of 10 OSPF virtual routers. These continue to
work properly, since the first router to be affected is the edge router that
supports delivery from the video source. The backbone is only subjected to
partial traffic. The edge router that gets affected routes to the backbone and
to another internal
In brief, we are running a testbed of 9 hardware servers emulating 22 OSPF
virtual routers with Xen. All routers have been running uninterrupted 24 hours a
day for over a month, except for the machine that handles routing for the video
feed source. Just yesterday we caught a BUG report on the console of the
DomainU machine handling the heavy traffic: "BUG: soft lockup detected on CPU0",
followed by a register dump. No further information was available, and a hard
reboot was required to recover from this
Since I have seen this same BUG reported by others, I figured that our
experience would provide some clarification. Our conclusion is that Xen runs
fine unless it is subjected to a heavy data load of some sort. Since we have Xen
configured to run with an image-file-based drive, we will attempt to reconfigure
it to use a physical hard drive instead. This may or may not resolve the issue.
Xen-bugs mailing list