WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
[Xen-devel] cpuidle causing Dom0 soft lockups

To: <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-devel] cpuidle causing Dom0 soft lockups
From: "Jan Beulich" <JBeulich@xxxxxxxxxx>
Date: Thu, 21 Jan 2010 09:51:36 +0000
Delivery-date: Thu, 21 Jan 2010 01:51:57 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
On large systems where Dom0 boots with (significantly) more than
32 vCPU-s, we have received multiple reports that the C-state
management, now enabled by default, causes soft lockups, usually
preventing the boot from completing.

The observations are:

Reducing the number of vCPU-s (or pCPU-s) sufficiently far makes
the systems work.

max_cstate=0 makes the systems work.

max_cstate=1 makes the problem less severe on one (bigger) system,
and eliminates it completely on another (smaller) one.
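For reference, max_cstate is passed on the hypervisor command line. A sketch of a GRUB (legacy) entry follows; the paths, kernel version, and module lines are illustrative only and will differ per installation:

```shell
# Illustrative GRUB (legacy) entry -- paths/versions are examples only.
# max_cstate=1 limits the hypervisor to C1, matching the workaround above.
title Xen with limited C-states
    kernel /boot/xen.gz max_cstate=1
    module /boot/vmlinuz-2.6-xen console=tty0
    module /boot/initrd-2.6-xen.img
```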

When appearing to hang, all vCPU-s are in Dom0's timer_interrupt(),
and all (sometimes all but one) are attempting to acquire xtime_lock.
However, due to our use of ticket locks we can verify that this is not
a deadlock (repeatedly sending '0' shows forward progress, as the
tickets [visible on the stack] continue to increase). Additionally, there
is always one vCPU that has its polling event channel (used for
waking the next waiting vCPU when a lock becomes available)
signaled.
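The forward-progress argument above rests on how ticket locks work: acquirers take strictly increasing ticket numbers, so growing ticket values in stack dumps prove the lock keeps changing hands. A toy single-threaded Python model (not the actual Linux/Xen implementation) of that property:

```python
class TicketLock:
    """Toy model of a ticket spinlock (not the real Linux/Xen code).

    Acquirers take strictly increasing tickets and are served in
    order.  If repeated stack dumps show the ticket values growing,
    waiters are being served -- the system is slow, not deadlocked."""

    def __init__(self):
        self.next_ticket = 0   # ticket handed to the next acquirer
        self.now_serving = 0   # ticket currently owning the lock

    def acquire(self):
        my_ticket = self.next_ticket
        self.next_ticket += 1
        # A real implementation spins here (or, under Xen, polls an
        # event channel) until now_serving reaches my_ticket.
        while self.now_serving != my_ticket:
            pass  # single-threaded toy: never actually loops
        return my_ticket

    def release(self):
        self.now_serving += 1

lock = TicketLock()
tickets = []
for _ in range(3):
    tickets.append(lock.acquire())
    lock.release()
assert tickets == [0, 1, 2]  # strictly increasing => forward progress
```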

In one case (but not in the other) it is always the same vCPU that
is apparently taking very long to wake up from the polling request.
This may be coincidence, but the output after sending 'c' also
indicates a C2 usage value about three times higher than the
second highest one; the duration printed is roughly the same for
all CPUs.

While I don't know this code well, it would seem that we're suffering
from extremely long wakeup times. This suggests that there likely is
a (performance) problem even for smaller numbers of vCPU-s.
Hence, unless it can be fixed before 4.0 releases, I would suggest
disabling C-state management by default again.

I can provide full logs if needed.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel