[Xen-devel] Re: [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre

To: Bruce Edge <bruce.edge@xxxxxxxxx>
Subject: [Xen-devel] Re: [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre
From: Andreas Kinzler <ml-xen-users@xxxxxx>
Date: Mon, 27 Sep 2010 16:22:39 +0200
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, xen-users@xxxxxxxxxxxxxxxxxxx
On 27.09.2010 16:06, Bruce Edge wrote:
>>> I saw reproducible hangs in dom0 when the system is under heavy load.
>>> Four dom0s share an NFS server for domU images, with a total of 24 domUs
>>> (6 domUs on each dom0). When the system is under heavy load, busy processing
>>> e-commerce requests, one or two of the dom0s hang. No input is
>>> accepted and a reboot is necessary.
>>> Has anyone had the same experience? The causes I can come up with are the following:
>> Please post your hardware (mainboard, chipset, CPU, RAID controller).
>> I have found a severe problem on Lynnfield systems.
> Does this affect all Nehalem chips or only the Lynnfields? The .21 kernel is
> causing grief for us too. I was wondering if this was related.

I am still researching this. For testing, I bought a test system with a Westmere-EP CPU (Xeon E5620), which has ARAT. This system ran stably even though Intel still lists it as having the C6 erratum. This leads me to conclude that the HPET timer migration code in Xen (called HPET broadcast) is the root cause. It affects all CPUs that use it, but mainly Nehalem because of turbo mode.
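
For anyone who wants to compare their own hardware against the ARAT observation above, here is a minimal sketch of one way to check for it. It assumes a Linux dom0 where the kernel reports the feature as the `arat` flag in /proc/cpuinfo; the helper name `has_cpu_flag` is made up for illustration, and the flags visible inside dom0 may be filtered by the hypervisor, so a negative result is not conclusive.

```python
def has_cpu_flag(cpuinfo_text, flag="arat"):
    """Return True if any 'flags' line in cpuinfo_text lists the given flag.

    cpuinfo_text is expected to look like the contents of /proc/cpuinfo.
    The flag name 'arat' is an assumption about how the kernel reports it.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only look at "flags" lines, and match whole flag names only.
        if sep and key.strip() == "flags" and flag in value.split():
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        # Not on Linux, or /proc is unavailable: treat as "no information".
        text = ""
    print("ARAT advertised:", has_cpu_flag(text))
```

This only reports what the kernel advertises. It says nothing about whether Xen actually uses HPET broadcast on that machine.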

Regards Andreas
