xen-users

Re: [Xen-users] Xen system hang or freeze

To: Nick Anderson <nick@xxxxxxxxxxxx>
Subject: Re: [Xen-users] Xen system hang or freeze
From: Paraic Gallagher <paraic.gallagher@xxxxxxxxx>
Date: Fri, 3 Apr 2009 16:59:33 +0100
Cc: xen-users@xxxxxxxxxxxxxxxxxxx


2009/4/3 Nick Anderson <nick@xxxxxxxxxxxx>
On Fri, Apr 03, 2009 at 03:56:28PM +0100, Paraic Gallagher wrote:
> I am running Xen 3.0.3 with a CentOS 5.2 based Dom0
> (kernel-xen-2.6.18-92.1.22.el5).
> Recently I have noticed some complete system lockups on a few different
> servers. Neither Dom0 nor any of the guests responds to pings, and connecting a
> keyboard and monitor to the system shows only a blank screen. Nothing is
> written to the logs at the time of the lockup.

I have seen similar issues with one of my servers. I have yet to nail
down the issue.

Specs:
Distro: Debian Etch
Kernel: 2.6.18-6-xen-amd64
CPU: 2x Quad-Core AMD Opteron(tm) Processor 2350
Memory: 16G
Disk: 3ware 9650LE with 8-drive RAID 6
Xen: 3.2 (from Debian repo)

All vms are LVM backed. Not running any HVM guests.
 
Thanks for the response. After searching the net for a few weeks with no luck
finding similar issues, I was beginning to think I was going crazy!

Just to add some further details:
I have seen the issue on two types of servers, Dell PE 1950 and 2950.
CPU: 2x Quad-Core Intel Xeon E5410@xxxxxxx
Memory: 4G and 16G
Disk: PERC 6/i 1.11, 2x250 RAID 1, ST3250620NS Rev: 3BKT

All vms are LVM backed on this system except for Dom0.

For a while I was seeing "soft lockup on CPU" messages scrolling on the
console and thought that may have caused it. Unfortunately, after updating
the kernel the errors went away, but I have had another lockup since then.

I've found a fairly set pattern, though no time periods that would let me
predict the next lockup.

A VM typically goes unresponsive first. If left unchecked for long
enough the host will lock. If caught in time I have had limited
success running xm destroy on the domU. Most of the time running xm
destroy on the domU causes the host to lock immediately requiring a
hard reboot.
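
Since the order in which machines go silent seems to matter here, a small watcher on a separate box could timestamp the moment each domU and the dom0 stop answering pings. This is only a rough sketch, not something either of us has run: the function names are made up for illustration, and the `ping -c 1 -W` flags assume a Linux ping.

```python
import subprocess
import time
from datetime import datetime, timezone


def is_alive(host, timeout_s=2):
    """Return True if one ICMP echo to host is answered (assumes Linux ping flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def record_transitions(previous, current, now):
    """Compare two {host: alive} snapshots and return one log line per state change."""
    lines = []
    for host in sorted(current):
        was_alive = previous.get(host)
        alive = current[host]
        # Hosts seen for the first time have no previous state, so nothing is logged.
        if was_alive is not None and was_alive != alive:
            state = "UP" if alive else "DOWN"
            lines.append(f"{now.isoformat()} {host} {state}")
    return lines


def watch(hosts, interval_s=10, probe=is_alive):
    """Poll every host forever and print a timestamped line on each UP/DOWN change."""
    previous = {}
    while True:
        now = datetime.now(timezone.utc)
        current = {host: probe(host) for host in hosts}
        for line in record_transitions(previous, current, now):
            print(line, flush=True)
        previous = current
        time.sleep(interval_s)
```

Calling `watch(["dom0.example", "domu1.example"])` (placeholder hostnames) from a machine outside the Xen host would show whether a guest really drops off before the dom0 does, and by how long.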

The most recent lockup was a bit different than what I had seen in the
past.

The domU locked up (no output on domU console). xm destroy locked
dom0. I rebooted with a remote power strip. dom0 and all domUs came
back up. Nothing in logs as usual. 10 minutes later dom0 was locked
again. I drove to the datacenter, and about 30-45 minutes after the
lock the machine became responsive again (according to the monitoring
server). I was able to display a website running on a VM. Then the
machine went unresponsive again. It was not responding to physical
console access either. Another hard reboot and things are ok.

That was the first time I had ever had so many lockups so close
together. Typically the lockups seem to be 1-2 weeks apart.

I have even tried setting up netconsole on dom0 to try to catch kernel
errors with no success.
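
For anyone trying the same thing, netconsole sends kernel messages as plain UDP datagrams, so the receiving end can be very small. Below is a minimal sketch of a receiver that timestamps each message and flushes it to a file. The port (6666) and the log file name are assumptions and must match whatever target was configured on dom0; the function names are illustrative only.

```python
import socket
from datetime import datetime, timezone


def format_record(payload, sender, now):
    """Turn one received datagram into a single timestamped log line."""
    text = payload.decode("utf-8", errors="replace").rstrip("\n")
    return f"{now.isoformat()} {sender} {text}"


def listen(port=6666, logfile="netconsole.log"):
    """Receive UDP datagrams forever and append them to logfile.

    Assumption: dom0's netconsole target points at this machine and this port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    with open(logfile, "a", encoding="utf-8") as out:
        while True:
            payload, (addr, _src_port) = sock.recvfrom(65535)
            now = datetime.now(timezone.utc)
            out.write(format_record(payload, addr, now) + "\n")
            # Flush after every message so nothing is lost if this side is stopped.
            out.flush()
```

If the hang is hard enough that the kernel never gets to print anything, the log will simply stop, but the timestamp of the last message still narrows down when the lockup began.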

This seems to be quite a similar problem from the description, however I haven't
noticed the guest vms locking up prior to Dom0. Something to keep an eye on.

Are you running a particular load on the system at the time, or is it somewhat
idle? In my case the system seems to be idle before the lockup.
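
To answer the load question with data rather than memory, one option is to record the load average at a fixed interval and force each sample to disk, so the last samples before a hard reset survive. This is a sketch under assumptions: the file name and the 30-second interval are arbitrary, and it relies on `os.getloadavg()`, which is available on Linux and other Unix systems.

```python
import os
import time


def sample_line(now, load):
    """Format one sample as 'epoch_seconds load1 load5 load15'."""
    one, five, fifteen = load
    return f"{int(now)} {one:.2f} {five:.2f} {fifteen:.2f}\n"


def log_load(path="loadavg.log", interval_s=30):
    """Append a load-average sample to path every interval_s seconds, forever."""
    with open(path, "a") as out:
        while True:
            out.write(sample_line(time.time(), os.getloadavg()))
            out.flush()
            # fsync asks the OS to push the sample to disk before the next sleep,
            # so it is not sitting only in a buffer when the machine hangs.
            os.fsync(out.fileno())
            time.sleep(interval_s)
```

Running `log_load()` in dom0 and reading the tail of the file after a reboot would show whether the box was idle or busy in the minutes before it stopped.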

rgds,
Paraic.
 


--
Nick Anderson <nick@xxxxxxxxxxxx>
http://www.cmdln.org

