   
 
 

xen-users

Re: [Xen-users] Xen system hang or freeze

To: Paraic Gallagher <paraic.gallagher@xxxxxxxxx>
Subject: Re: [Xen-users] Xen system hang or freeze
From: Nick Anderson <nick@xxxxxxxxxxxx>
Date: Fri, 3 Apr 2009 11:23:33 -0400
Cc: xen-users@xxxxxxxxxxxxxxxxxxx
Delivery-date: Fri, 03 Apr 2009 08:24:12 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <33b90e520904030756l3d2e2eb5s1b7e50535a9a44c7@xxxxxxxxxxxxxx>
References: <33b90e520904030756l3d2e2eb5s1b7e50535a9a44c7@xxxxxxxxxxxxxx>
Sender: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.5.18 (2008-05-17)
On Fri, Apr 03, 2009 at 03:56:28PM +0100, Paraic Gallagher wrote:
> I am running xen 3.0.3, with CentOS 5.2 based Dom0
> (kernel-xen-2.6.18-92.1.22.el5)
> Recently I have noticed some complete system lockups on a few different
> servers. Neither Dom0 or any of the guests respond to pings, connecting a
> keyboard and monitor to the system only shows a blank screen. Nothing is
> written to logs at time of lockup.

I have seen similar issues with one of my servers. I have yet to nail
down the issue. 

Specs:
Distro: Debian Etch
Kernel: 2.6.18-6-xen-amd64
CPU: 2x Quad-Core AMD Opteron(tm) Processor 2350
Memory: 16G
Disk: 3ware 9650LE with 8-drive RAID 6
Xen: 3.2 (from debian repo)

All vms are LVM backed. Not running any HVM guests.

For a while I was seeing soft lockup on CPU messages scrolling on the
console and thought that might be the cause. Unfortunately, after
updating the kernel the errors went away, but I have had another
lockup since then.

I've found a fairly consistent pattern, though the timing is not
predictable.

A VM typically goes unresponsive first. If left unchecked for long
enough, the host will lock. If caught in time, I have had limited
success running xm destroy on the domU. Most of the time, though,
running xm destroy on the domU causes the host to lock immediately,
requiring a hard reboot.
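
Since the window between the domU going unresponsive and the host
locking matters here, a small reachability poller run from a separate
machine can flag the unresponsive domU early. The following is only a
sketch under assumptions: the domU name and address are hypothetical
placeholders, a TCP connect to port 22 is assumed to be a fair proxy
for "responsive", and the threshold and interval are arbitrary.

```python
import socket
import time


def is_reachable(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False


def update_failures(failures, name, reachable):
    """Count consecutive failed checks per domU; reset on success."""
    failures[name] = 0 if reachable else failures.get(name, 0) + 1
    return failures[name]


def watch(domus, threshold=3, interval=30.0):
    """Poll every domU forever and report when one stops answering."""
    failures = {}
    while True:
        for name, host in domus.items():
            count = update_failures(failures, name, is_reachable(host))
            if count == threshold:
                stamp = time.strftime("%Y-%m-%d %H:%M:%S")
                print(f"{stamp} {name} ({host}) failed {count} checks")
        time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical domU name and address; replace with real ones.
    watch({"web1": "192.0.2.10"})
```

It has to run somewhere other than dom0, since dom0 itself goes down
in these lockups. It only reports the event; it does not explain it.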

The most recent lockup was a bit different from what I had seen in the
past.

The domU locked up (no output on the domU console), and xm destroy
locked dom0. I rebooted with a remote power strip, and dom0 and all
domUs came back up. Nothing in the logs, as usual. 10 minutes later
dom0 was locked again. I drove to the datacenter, and about 30-45
minutes after the lock the machine became responsive again (according
to the monitoring server); I was able to display a website running on
a VM. Then the machine went unresponsive again, not responding to
physical console access either. Another hard reboot and things are OK.

That was the first time I had ever had so many lockups so close
together. Typically the lockups seem to be 1-2 weeks apart.

I have even set up netconsole on dom0 to try to catch kernel errors,
with no success.
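
For anyone trying the same thing, the receiving side of netconsole can
be as simple as a UDP listener on another machine that timestamps
whatever arrives. This is a minimal sketch, not the setup used above:
port 6666 is assumed to match the target port configured for
netconsole on dom0, and the log file name is a hypothetical
placeholder.

```python
import socket
from datetime import datetime


def format_record(payload, sender, now):
    """Turn one received datagram into a timestamped log line."""
    text = payload.decode("utf-8", errors="replace").rstrip("\n")
    return f"{now.isoformat(timespec='seconds')} {sender} {text}"


def serve(bind_addr="0.0.0.0", port=6666, logfile="netconsole.log"):
    """Receive UDP datagrams forever and append them to logfile."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    # Line buffering pushes each line to the file as soon as it arrives.
    with open(logfile, "a", buffering=1) as log:
        while True:
            payload, (host, _port) = sock.recvfrom(4096)
            log.write(format_record(payload, host, datetime.now()) + "\n")


if __name__ == "__main__":
    serve()
```

The timestamps are added on the receiving side, so the last line
before a lockup can be lined up against the monitoring server's
record of when the host stopped answering.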


-- 
Nick Anderson <nick@xxxxxxxxxxxx>
http://www.cmdln.org

