RE: [Xen-devel] cpuidle causing Dom0 soft lockups

To:	Jan Beulich <JBeulich@xxxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Subject:	RE: [Xen-devel] cpuidle causing Dom0 soft lockups
From:	"Yu, Ke" <ke.yu@xxxxxxxxx>
Date:	Wed, 3 Feb 2010 15:32:48 +0800
Cc:	"xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
References:	<4B58402E020000780002B3FE@xxxxxxxxxxxxxxxxxx> <C77DE51B.6F89%keir.fraser@xxxxxxxxxxxxx> <4B67E85E020000780002D1A0@xxxxxxxxxxxxxxxxxx>

Hi Jan,

Could you please try the attached patch? It tries to avoid entering a deep
C state when a vCPU has local irqs disabled and is polling an event channel.
In testing on my 64-CPU box, the issue is gone with this patch.
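
As a rough illustration of the idea only (not the attached patch itself), the
decision can be modelled in a few lines of Python. The names VCpu,
pick_c_state, deepest and shallow are hypothetical, and the sketch assumes the
hypervisor can see, for each vCPU on the idle pCPU, whether local irqs are
disabled and whether it is blocked polling an event channel.

```python
from dataclasses import dataclass


@dataclass
class VCpu:
    # Both fields are assumptions made for this sketch, not Xen structures.
    irqs_disabled: bool   # guest has local interrupts disabled
    polling_evtchn: bool  # vCPU is blocked polling an event channel


def pick_c_state(vcpus_on_pcpu, deepest=3, shallow=1):
    """Return the C state an idle pCPU should enter (hypothetical policy).

    If any vCPU is polling an event channel with irqs disabled, it is most
    likely waiting for a lock hand-off, so stay in a shallow state to avoid
    the deep C state wake-up latency.
    """
    for vcpu in vcpus_on_pcpu:
        if vcpu.irqs_disabled and vcpu.polling_evtchn:
            return shallow
    return deepest
```

With this policy a pCPU hosting a lock waiter stays in the shallow state,
while a pCPU with no such vCPU is still free to enter the deepest state.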

Best Regards
Ke

>-----Original Message-----
>From: Yu, Ke
>Sent: Wednesday, February 03, 2010 1:07 AM
>To: Jan Beulich; Keir Fraser
>Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
>Subject: RE: [Xen-devel] cpuidle causing Dom0 soft lockups
>
>>-----Original Message-----
>>From: Jan Beulich [mailto:JBeulich@xxxxxxxxxx]
>>Sent: Tuesday, February 02, 2010 3:55 PM
>>To: Keir Fraser; Yu, Ke
>>Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
>>Subject: Re: [Xen-devel] cpuidle causing Dom0 soft lockups
>>
>>>>> Keir Fraser <keir.fraser@xxxxxxxxxxxxx> 21.01.10 12:03 >>>
>>>On 21/01/2010 10:53, "Jan Beulich" <JBeulich@xxxxxxxxxx> wrote:
>>>> I can see your point. But how can you consider shipping with something
>>>> apparently severely broken. As said before - the fact that this manifests
>>>> itself by hanging many-vCPU Dom0 has the very likely implication that
>>>> there are (so far unnoticed) problems with smaller Dom0-s. If I had a
>>>> machine at hand that supports C3, I'd try to do some measurements
>>>> with smaller domains...
>>>
>>>Well it's a fallback I guess. If we can't make progress on solving it then I
>>>suppose I agree.
>>
>>Just fyi, we now also have seen an issue on a 24-CPU system that went
>>away with cpuidle=0 (and static analysis of the hang hinted in that
>>direction). All I can judge so far is that this likely has something to do
>>with our kernel's intensive use of the poll hypercall (i.e. we see vCPU-s
>>not waking up from the call despite there being pending unmasked or
>>polled for events).
>>
>>Jan
>
>Hi Jan,
>
>We have just identified the cause of this issue and are trying to find an
>appropriate way to fix it.
>
>This issue is the result of the following sequence:
>1. Every dom0 vCPU has one 250 Hz timer (i.e. a 4ms period). The vCPU
>timer_interrupt handler acquires a global ticket spin lock, xtime_lock.
>When xtime_lock is held by another vCPU, the vCPU polls the event channel and
>becomes blocked. As a result, the pCPU where the vCPU runs becomes idle.
>Later, when the lock holder releases xtime_lock, it notifies the event channel
>to wake up the vCPU. As a result, the pCPU wakes up from the idle state and
>schedules the vCPU to run.
>
>From the above, we can see that the latency of the vCPU timer interrupt
>consists of the following items. The "latency" here means the time between
>starting to acquire the lock and finally acquiring it.
>T1 - CPU execution time (e.g. timer interrupt lock holding time, event channel
>notification time)
>T2 - CPU idle wake-up time, i.e. the time for the CPU to wake up from a deep C
>state (e.g. C3) to C0, usually on the order of tens to hundreds of
>microseconds
>
>2. Now consider the case of a large number of CPUs, e.g. 64 pCPUs and 64
>vCPUs in dom0, and assume the lock holding sequence is vCPU0 ->
>vCPU1 -> vCPU2 ... -> vCPU63.
>Then vCPU63 will spend 64*(T1 + T2) to acquire xtime_lock. If T1+T2 is
>100us, the total latency would be ~6ms.
>Since the timer is 250 Hz, i.e. a 4ms period, when the event channel
>notification is issued and the pCPU schedules vCPU63, the hypervisor will find
>that the timer is overdue and will send another TIMER_VIRQ to vCPU63 (see
>schedule()->vcpu_periodic_timer_work() for details). In this case, vCPU63 will
>always be busy handling timer interrupts and will not be able to update the
>watchdog, thus causing the soft lockup.
>
>So from the above sequence, we can see:
>- The cpuidle driver adds extra latency, making this issue more likely to
>occur.
>- A large number of CPUs multiplies the latency.
>- The ticket spin lock imposes a fixed lock acquisition order, so the latency
>is repeatedly 64*(T1+T2), making this issue more likely to occur.
>The fundamental cause of this issue is that the vCPU timer interrupt handler
>does not scale well, due to the global xtime_lock.
>
>From the cpuidle point of view, one thing we are trying to do is change the
>cpuidle driver so that it does not enter a deep C state when there is a vCPU
>with local irqs disabled that is polling an event channel. In this case, the
>T2 latency will be eliminated.
>
>Anyway, cpuidle is just one side. We can anticipate that if the number of
>CPUs is large enough that NR_CPU * T1 > 4ms, this issue will occur again. So
>another way is to make dom0 scale well by not using xtime_lock, although this
>is pretty hard currently. Yet another way is to limit the number of dom0 vCPUs
>to a certain reasonable level.
>
>Regards
>Ke
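
The latency arithmetic in the quoted analysis can be checked with a short
Python sketch. The function names are hypothetical, and the split of the
100us example into T1 = 20us and T2 = 80us is an assumption made only for
illustration; the analysis gives just the sum T1+T2.

```python
TIMER_HZ = 250                           # dom0 periodic timer rate from the analysis
TIMER_PERIOD_US = 1_000_000 // TIMER_HZ  # 4000 us, i.e. the 4ms period


def worst_case_wait_us(n_vcpus, t1_us, t2_us):
    """Wait of the last vCPU in a fixed ticket-lock order: n * (T1 + T2)."""
    return n_vcpus * (t1_us + t2_us)


def max_vcpus_within_period(t1_us, t2_us, period_us=TIMER_PERIOD_US):
    """Largest vCPU count whose worst-case wait does not exceed one period."""
    return period_us // (t1_us + t2_us)


if __name__ == "__main__":
    # Assumed split of the 100us example: T1 = 20us, T2 = 80us.
    print(worst_case_wait_us(64, 20, 80))   # 6400 us, longer than the 4000 us period
    print(max_vcpus_within_period(20, 80))  # 40 vCPUs with deep C states in use
    # With T2 eliminated, only T1 remains in the per-vCPU cost.
    print(worst_case_wait_us(64, 20, 0))    # 1280 us, within the period
    print(max_vcpus_within_period(20, 0))   # 200 vCPUs before NR_CPU * T1 > 4ms
```

Under these assumed numbers, removing T2 brings the 64-vCPU case back within
the timer period, but the limit only moves out; it does not disappear.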

Attachment: cpuidle-hint-v2.patch

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
