Re: [Xen-users] Xen system hang or freeze
It is possible to reproduce the issue that I was seeing by running xentop continuously for a few minutes, like this:
# xentop -b -d 0.1 > /dev/null
From examination of the crashdump generated after the lockup yesterday, it appeared that xentop was the active process at the time of the crash.
Xentop was being used on the system for gathering some performance statistics.
It was possible to reproduce the same issue on CentOS 5.3 (kernel-xen-2.6.18-128.1.6.el5), and on different hardware.
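For anyone trying to reproduce this, a minimal sketch of a wrapper that writes a timestamp before each xentop run, so that after a hard hang the last line of the log shows roughly when the machine stopped. The function name, the log path and the run counts are hypothetical and not part of the original report. The example also assumes your xentop supports -i (iterations) alongside -b and -d.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): run a command
# repeatedly and write a timestamped heartbeat line before each run.

stress_with_heartbeat() {
    log=$1      # heartbeat log file
    runs=$2     # how many times to run the command
    shift 2     # remaining arguments are the command itself
    n=0
    while [ "$n" -lt "$runs" ]; do
        n=$((n + 1))
        echo "$(date '+%Y-%m-%d %H:%M:%S') run $n" >> "$log"
        sync    # flush to disk so the line survives a hard reset
        "$@" > /dev/null 2>&1
    done
}

# Example, as root in dom0 (assumed invocation; adjust to taste):
# stress_with_heartbeat /var/log/xentop-heartbeat.log 1000 xentop -b -d 0.1 -i 100
```

If the box locks up, the last "run N" line in the log narrows down when it happened, even if syslog shows nothing.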
2009/4/6 Paraic Gallagher <paraic.gallagher@xxxxxxxxx>
This problem occurred again this weekend on one of my servers. There was no response to input or to pings on any domain, just a blank screen when a keyboard and monitor were connected. It had been running for around 1 week. There was no load running on the system, which had a CentOS 5.2 Dom0, one CentOS 5.2 domU, and one RHEL 4.1 domU. There were no errors written to syslog around the time of the lockup.
It is a Dell PE 1950 and I had the console redirected to Serial Over LAN. I had SysRq enabled on the system and attempted to get some further debugging information using these keys. However, the system did not respond. I hit Ctrl-A to switch the input to Xen, got this screen, and triggered a crash dump.
(XEN) *** Serial input -> Xen (type 'CTRL-a' three times to switch input to DOM0).
(XEN) 'h' pressed -> showing installed handlers
(XEN) key '%' (ascii '25') => Trap to xendbg
(XEN) key 'C' (ascii '43') => trigger a crashdump
(XEN) key 'H' (ascii '48') => dump heap info
(XEN) key 'N' (ascii '4e') => NMI statistics
(XEN) key 'R' (ascii '52') => reboot machine
(XEN) key 'a' (ascii '61') => dump timer queues
(XEN) key 'd' (ascii '64') => dump registers
(XEN) key 'h' (ascii '68') => show this message
(XEN) key 'i' (ascii '69') => dump interrupt bindings
(XEN) key 'm' (ascii '6d') => memory info
(XEN) key 'n' (ascii '6e') => trigger an NMI
(XEN) key 'q' (ascii '71') => dump domain (and guest debug) info
(XEN) key 'r' (ascii '72') => dump run queues
(XEN) key 't' (ascii '74') => display multi-cpu clock info
(XEN) key 'u' (ascii '75') => dump numa info
(XEN) key 'z' (ascii '7a') => print ioapic info
Does this mean the hypervisor is still active but all guests, including Dom0, are hosed?
Is there something of value to look for in the Xen menu?
In this thread, three people have reported repeated system lockups on various
hardware, with no real warning or logging information, and no solution other than a hard
reset of the system.
Is anyone aware of a bug id for this problem or should a bug be raised?
Is there some other information I can provide from my setup which would be useful to diagnose the problem?
2009/4/6 Martin Fernau <m.fernau@xxxxxxxxxx>
By "stock xen 188.8.131.52 kernel", do you mean the original kernel from
"http://www.xen.org/download/" ? I currently use the xen-kernel 2.6.18-r12
from my distro. So I could give it a try...
How did you notice these kernel oops and/or soft IRQ lockups? I'm not
able to discover _any_ abnormal events on my system as all logfiles are clean.
There must be a way to debug this...
The only USB device I currently have attached to my dom0 is a Smart-UPS
System. I don't know if this really could kill the whole machine as the
communication between dom0 and this ups should be very very low.
We must find a way to track down these lockups. Is there any debug-log
functionality we could enable in xen to start investigating this problem?
I'm afraid that these lockups could become a knock-out criterion for xen in the future
for professional servers...
On Sunday, 5 April 2009 22:33:36, thomas morgan wrote:
> Over the last year, I've experienced a couple of sources of lockups.
> The first was resolved by going to the stock xen 184.108.40.206 kernel
> compiled from source (had been using the Debian etch kernel; found
> commentary online describing the same symptoms on Ubuntu, Redhat, and
> CentOS though, each with their distro-specific kernel).
> This one tended to result in kernel oops messages--soft IRQ lockups as
> I recall. Lockup would start with a domU and within a few minutes
> would kill the dom0 too. The fastest way to trigger this one was to
> create and shutdown domU's, although I don't recall that being the
> only way.
> The second, with the stock kernel, was an errant USB hub attached to a
> xen host. Removing the hub resolved the issue. These were complete,
> sudden lockups of the dom0 and all domUs -- basically everything.
> Higher traffic over the USB port would trigger this lockup.
> So, for those who haven't tried the stock xen kernel, and are able to
> try it (based on driver support, etc.), it might help.