Re: [Xen-users] Xen system hang or freeze
On Fri, Apr 03, 2009 at 03:56:28PM +0100, Paraic Gallagher wrote:
> I am running xen 3.0.3, with CentOS 5.2 based Dom0
> (kernel-xen-2.6.18-92.1.22.el5)
> Recently I have noticed some complete system lockups on a few different
> servers. Neither Dom0 nor any of the guests respond to pings; connecting a
> keyboard and monitor to the system only shows a blank screen. Nothing is
> written to logs at time of lockup.
I have seen similar issues with one of my servers. I have yet to nail
down the issue.
Specs:
Distro: Debian Etch
Kernel: 2.6.18-6-xen-amd64
CPU: 2x Quad-Core AMD Opteron(tm) Processor 2350
Memory: 16G
Disk: 3ware 9650LE with 8-drive RAID 6
Xen: 3.2 (from Debian repo)
All VMs are LVM-backed. Not running any HVM guests.
For a while I was seeing soft lockup messages for the CPUs scrolling on
the console and thought they might be the cause. Unfortunately, although
those errors went away after updating the kernel, I have had another
lockup since then.
I've found a fairly consistent pattern, though no predictable timing.
A VM typically goes unresponsive first. If left unchecked for long
enough the host will lock. If caught in time I have had limited
success running xm destroy on the domU. Most of the time, though, running
xm destroy on the domU causes the host to lock immediately, requiring a
hard reboot.
The most recent lockup was a bit different from what I had seen in the
past.
The domU locked up (no output on the domU console). xm destroy locked
dom0. I rebooted with a remote power strip. dom0 and all domUs came
back up. Nothing in the logs, as usual. 10 minutes later dom0 was locked
again. I drove to the datacenter, and about 30-45 minutes after the
lockup the machine became responsive again (according to the monitoring
server); I was able to load a website running on one of the VMs. Then
the machine went unresponsive again and did not respond to the physical
console either. After another hard reboot, things are OK.
That was the first time I had ever had so many lockups so close
together. Typically the lockups seem to be 1-2 weeks apart.
I have even tried setting up netconsole on dom0 to catch kernel
errors, with no success.
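For anyone trying the same thing: netconsole sends kernel log messages as UDP datagrams to another host, so the receiving side only needs to listen on a UDP port and write each packet out with a timestamp. The sketch below is a minimal receiver, not a tested production tool. It assumes the netconsole default target port of 6666, and the function names (format_record, receive) are made up for illustration. On dom0 the module would be loaded with something like netconsole=@/eth0,6666@192.0.2.10/ (the target IP here is a placeholder; substitute the receiving host's address).

```python
import socket
import sys
import time


def format_record(payload: bytes, sender: tuple, when: float) -> str:
    """Render one netconsole datagram as a timestamped log line."""
    text = payload.decode("utf-8", errors="replace").rstrip("\n")
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(when))
    return f"{stamp} {sender[0]} {text}"


def receive(sock: socket.socket, out, count=None) -> int:
    """Read datagrams from sock and write one line per datagram to out.

    Stops after `count` datagrams if given, otherwise runs forever.
    Flushes after every line so nothing is lost if the receiver is killed.
    """
    seen = 0
    while count is None or seen < count:
        payload, sender = sock.recvfrom(4096)
        out.write(format_record(payload, sender, time.time()) + "\n")
        out.flush()
        seen += 1
    return seen


def main(port: int = 6666) -> None:
    # Assumed default netconsole target port; pass another as argv[1].
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    receive(sock, sys.stdout)


if __name__ == "__main__":
    main(int(sys.argv[1]) if len(sys.argv) > 1 else 6666)
```

Running it on the monitoring server with output redirected to a file (e.g. python netconsole_rx.py >> dom0-console.log, a hypothetical filename) keeps a record of whatever dom0 manages to send before it hangs. If the kernel freezes without emitting anything, the log will of course stay empty.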
--
Nick Anderson <nick@xxxxxxxxxxxx>
http://www.cmdln.org
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users