   
 
 
 
   
 

xen-devel

RE: [Xen-devel] cpuidle causing Dom0 soft lockups

>>> "Yu, Ke" <ke.yu@xxxxxxxxx> 02.02.10 18:07 >>>
>>Just fyi, we now also have seen an issue on a 24-CPU system that went
>>away with cpuidle=0 (and static analysis of the hang hinted in that
>>direction). All I can judge so far is that this likely has something to do
>>with our kernel's intensive use of the poll hypercall (i.e. we see vCPU-s
>>not waking up from the call despite there being pending unmasked or
>>polled for events).
>
>We just identified the cause of this issue, and are trying to find an
>appropriate way to fix it.

Hmm, while I agree that the scenario you describe can be a problem, I
don't think it can explain the behavior on the 24-CPU system pointed
out above, nor the one Juergen Gross pointed out yesterday.

Nor can it explain why this happens at boot time (when you can take it
for granted that several/most of the CPUs are idle [and hence would
have their periodic timer stopped]).

Also I would think that the rate at which xtime_lock is being acquired
may not be the highest one in the entire system, and hence problems
may continue to result even if we fixed timer_interrupt().

>Anyway, cpuidle is just one side; we can anticipate that if the number of
>CPUs is large enough that NR_CPU * T1 > 4ms, this issue will occur again. So
>another way is to make dom0 scale well by not using xtime_lock, although
>this is pretty hard currently. Or another way is to limit the number of dom0
>vCPUs to a certain reasonable level.
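
To make the quoted condition NR_CPU * T1 > 4ms concrete, here is a rough arithmetic sketch. T1 is not defined in this message; the sketch assumes it is the time each CPU spends serialized (for example on xtime_lock) per tick, and that 4ms is the tick period. The helper name and the sample values in the note below are hypothetical and not taken from the thread.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not from the thread: returns the smallest CPU count n
 * for which n * t1_us > tick_us, i.e. the point at which the quoted
 * condition NR_CPU * T1 > 4ms starts to hold.
 * t1_us is an assumed per-CPU serialized time, in microseconds.
 * Returns 0 if t1_us is 0 (the condition can then never hold).
 */
static unsigned int cpus_to_exceed_tick(uint32_t t1_us, uint32_t tick_us)
{
    if (t1_us == 0)
        return 0;
    /* floor(tick/t1) CPUs still fit within the tick; one more does not. */
    return tick_us / t1_us + 1;
}
```

With a made-up T1 of 100us and a 4000us tick, the function returns 41: 40 CPUs exactly fill the tick, and the 41st pushes the total past it. The real value of T1 would have to be measured.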

I would not think that dealing with the xtime_lock scalability issue in
timer_interrupt() should be *that* difficult. In particular it should be
possible to assign an on-duty CPU (permanent or on a round-robin
basis) that deals with updating jiffies/wallclock, and all other CPUs
just update their local clocks. I had thought about this before, but
never found a strong need to experiment with that.
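
The on-duty CPU idea above could be modelled roughly as follows. This is a single-threaded userspace sketch, not kernel code: all names (NR_CPUS_SIM, sim_timer_interrupt, duty_cpu_for) are hypothetical, the global counter merely stands in for jiffies/wallclock, and it assumes every CPU takes a tick in every period, so handing duty over from an idle CPU is not modelled.

```c
#include <stdint.h>

#define NR_CPUS_SIM 4                      /* hypothetical CPU count */

static uint64_t global_jiffies;            /* stand-in for jiffies/wallclock */
static uint64_t local_ticks[NR_CPUS_SIM];  /* per-CPU local clocks */

/* Pick the on-duty CPU for a tick period: CPU 0 permanently, or rotating. */
static unsigned int duty_cpu_for(uint64_t tick, int round_robin)
{
    return round_robin ? (unsigned int)(tick % NR_CPUS_SIM) : 0;
}

/*
 * Simulated per-CPU timer interrupt for tick period 'tick'.
 * Every CPU advances only its own local clock; just the on-duty CPU
 * touches the shared state. In a real kernel only that CPU would need
 * to take xtime_lock here.
 * Returns 1 if this CPU performed the global update, 0 otherwise.
 */
static int sim_timer_interrupt(unsigned int cpu, uint64_t tick, int round_robin)
{
    local_ticks[cpu]++;

    if (cpu != duty_cpu_for(tick, round_robin))
        return 0;

    global_jiffies++;
    return 1;
}
```

In this model the shared counter advances exactly once per tick period regardless of the number of CPUs, so contention on the shared state no longer grows with the CPU count.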

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel