

RE: [Xen-devel] Dom0 hang problem

To: "Subrahmanian, Raj" <raj.subrahmanian@xxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: RE: [Xen-devel] Dom0 hang problem
From: "Ian Pratt" <m+Ian.Pratt@xxxxxxxxxxxx>
Date: Wed, 27 Sep 2006 23:44:13 +0100
Delivery-date: Wed, 27 Sep 2006 15:44:43 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <94C8C9E8B25F564F95185BDA64AB05F60438D83D@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: Acbifhee5BPl2WZzTjmuvQ4OpFVXSQABN5nwAACZwVA=
Thread-topic: [Xen-devel] Dom0 hang problem
> I have been running on an 8-way 32 GB ES7000 system.
> I am on the tip of the xen-unstable tree (changeset 11627).
> To test Xen, I ran 4 DomUs: 2 of them were 8-way with 10 GB of memory, and
> the others were 4-way with 1 GB of memory. The DomUs come up, run
> kernbench and LTP, and shut down. After about 3 hours of running, I tried
> to do an xm list and the machine locked up.

Hmm, this isn't going to be fun to find.

Dom0 is presumably suffering from some nasty lockup. Maybe its root
device has gone away, or there's been some locking issue.

If you boot with the console over the serial line (and don't start X for
good measure), do you see any messages from the dom0 kernel when the
lockup happens?

Can you get magic sysrq to work either on serial or on the console?
It would be great to see the task queues inside the kernel ('t' key, I
think).
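
For reference, magic sysrq has to be enabled in dom0 before the hang for the 't' key to do anything. A minimal sketch, assuming a Linux dom0 that exposes the standard /proc/sys/kernel/sysrq and /proc/sysrq-trigger files and that this is run as root; the helper names are hypothetical and only for illustration. The second helper requests the same task dump from software, which is a handy way to confirm beforehand that the output really reaches the serial console.

```python
from pathlib import Path

# Standard Linux procfs locations (assumed present in the dom0 kernel).
SYSRQ_ENABLE = Path("/proc/sys/kernel/sysrq")
SYSRQ_TRIGGER = Path("/proc/sysrq-trigger")


def enable_sysrq(path=SYSRQ_ENABLE):
    """Enable all magic sysrq functions by writing '1' to the control file."""
    Path(path).write_text("1\n")


def request_task_dump(path=SYSRQ_TRIGGER):
    """Ask the kernel for a task dump, equivalent to pressing sysrq + 't'."""
    Path(path).write_text("t")
```

Calling enable_sysrq() followed by request_task_dump() on a healthy dom0 should put the task list on the kernel console; once dom0 has hung, only the keyboard or serial break sequence is likely to be usable.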

Can you repro this with a dom0 booted with a single VCPU (max_cpus=1)?
This may give better performance anyhow, depending on workload.

The other thing, a real PITA to set up but very interesting, would be
to rerun the whole experiment with a PAE hypervisor and PAE guests.

> I could not ssh into the box, but I could ping it. I could get data from
> the serial machine. Leaving the machine untouched for a long time does
> not alleviate the problem.
> The 8-way DomUs had completed their tests at this point and the 4-way
> domUs had finished their kernbench tests and were running LTP.
> Has anyone else seen issues like this?
> How can I debug this problem?
> I have attached the before and after info from the serial machine: run
> queues, memory info, VM info etc.
> I discovered this problem when I was giving Ryan Harper's NUMA patches a
> spin last week.
> Further investigation revealed that the issue was *not* with the NUMA
> patches, but occurs in the mainline xen-unstable kernel.
> Thanks
> Raj
> Xen Development Team
> Unisys Corp.
