[Oops, adding back the distro list; also adding Kevin Tian and Yu Ke,
who wrote c/s 19460.]
The functionality I was talking about, subtracting credits and
clearing BOOST, happens in csched_vcpu_acct() (which is different from
csched_acct()). csched_vcpu_acct() is called from csched_tick(), which
should still happen every 10ms on every cpu.
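Roughly, the path looks like this (a condensed sketch of
sched_credit.c from memory, not the literal source; details may
differ in your tree):

    static void csched_tick(void *_cpu)
    {
        unsigned int cpu = (unsigned long)_cpu;
        struct csched_pcpu *spc = CSCHED_PCPU(cpu);

        /* Account for whatever is running on this cpu right now. */
        if ( !is_idle_vcpu(current) )
            csched_vcpu_acct(cpu);

        /* (runq sorting etc. elided) */

        /* Re-arm so this fires again in ~10ms. */
        set_timer(&spc->ticker, NOW() + MILLISECS(CSCHED_MSECS_PER_TICK));
    }

    static void csched_vcpu_acct(unsigned int cpu)
    {
        struct csched_vcpu * const svc = CSCHED_VCPU(current);

        /* A vcpu caught running at tick time is burning real cpu,
         * so it no longer deserves BOOST. */
        if ( svc->pri == CSCHED_PRI_TS_BOOST )
            svc->pri = CSCHED_PRI_TS_UNDER;

        /* Debit one tick's worth of credit. */
        atomic_sub(CSCHED_CREDITS_PER_TICK, &svc->credit);

        /* If it was idling, put it back on the active list so that
         * csched_acct() starts charging its domain. */
        if ( list_empty(&svc->active_vcpu_elem) )
            __csched_vcpu_acct_start(svc);
    }

So if the ticker never fires, none of that accounting happens.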
The patch I referred to (c/s 19460) disables and re-enables tickers in
xen/arch/x86/acpi/cpu_idle.c:acpi_processor_idle() every time the
processor idles. I can't see anywhere else that tickers get disabled,
so the likely culprit is a path that fails to re-enable them.
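The shape of it is roughly this (a sketch, not the literal diff):

    static void acpi_processor_idle(void)
    {
        sched_tick_suspend();   /* stop this cpu's scheduler ticker */

        /* ... choose a C-state, hlt/mwait until an interrupt ... */

        sched_tick_resume();    /* must be reached on every exit path;
                                 * if it isn't, the ticker stays dead
                                 * and credit accounting stops */
    }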
Try applying the attached patch to see if that changes anything. (I'm
on the road, so I can't repro the lockup issue.) If that doesn't
work, try disabling c-states and see if that helps. Then at least
we'll know where the problem lies.
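(Depending on the tree, disabling c-states would be something like
booting Xen with cpuidle=off or max_cstate=1; check which options your
version supports.)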
-George
On Thu, Jul 2, 2009 at 10:10 PM, Mukesh Rathor<mukesh.rathor@xxxxxxxxxx> wrote:
> That seems to only suspend csched_pcpu.ticker, which is csched_tick,
> and that only sorts the local runq.
>
> Again, we are concerned about csched_priv.master_ticker, which calls
> csched_acct, correct? So I can trace that?
>
> thanks,
> mukesh
>
>
> George Dunlap wrote:
>>
>> Ah, I see that there have been some changes to the tick stuff with
>> the c-state work (e.g., c/s 19460). It looks like the tickers are
>> supposed to keep running, but perhaps tick_suspend() and
>> tick_resume() aren't being called properly. Let me take a closer
>> look.
>>
>> -George
>>
>> On Thu, Jul 2, 2009 at 8:14 PM, Mukesh Rathor<mukesh.rathor@xxxxxxxxxx>
>> wrote:
>>>
>>> George Dunlap wrote:
>>>>
>>>> On Thu, Jul 2, 2009 at 4:19 AM, Mukesh Rathor<mukesh.rathor@xxxxxxxxxx>
>>>> wrote:
>>>>>
>>>>> dom0 hang:
>>>>> vcpu0 is trying to wake up a task and, in try_to_wake_up(), calls
>>>>> task_rq_lock(). Since the task has its cpu set to 1, vcpu0 acquires
>>>>> the runq lock for vcpu1. Next it calls resched_task(), which results
>>>>> in sending an IPI to vcpu1. For that, vcpu0 enters the
>>>>> HYPERVISOR_event_channel_op hypercall and is waiting for it to
>>>>> return. Meanwhile, vcpu1 got to run and is spinning on its runq lock
>>>>> in "schedule():spin_lock_irq(&rq->lock);", the very lock that vcpu0
>>>>> is holding (while waiting to return from the hypercall).
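>>>>>
>>>>> In code terms, roughly (paraphrasing the guest's kernel/sched.c
>>>>> paths named above, not the exact source):
>>>>>
>>>>>     /* vcpu0, in try_to_wake_up(): */
>>>>>     rq = task_rq_lock(p, &flags);  /* takes vcpu1's rq->lock */
>>>>>     resched_task(p);               /* IPI to vcpu1, i.e. the
>>>>>                                     * HYPERVISOR_event_channel_op
>>>>>                                     * hypercall; vcpu0 is never
>>>>>                                     * scheduled again, so the lock
>>>>>                                     * is never released */
>>>>>
>>>>>     /* vcpu1, concurrently, in schedule(): */
>>>>>     spin_lock_irq(&rq->lock);      /* spins forever on the lock
>>>>>                                     * vcpu0 holds */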
>>>>>
>>>>> As I had noticed before, vcpu0 never gets scheduled in Xen. So,
>>>>> looking further into Xen:
>>>>>
>>>>> xen:
>>>>> Both vcpus are on the same runq, in this case cpu1's. But the
>>>>> priority of vcpu1 has been set to CSCHED_PRI_TS_BOOST. As a result,
>>>>> the scheduler always picks vcpu1, and vcpu0 is starved. Also, I see
>>>>> in kdb that the scheduler timer is not set on cpu0; that would have
>>>>> allowed csched_load_balance() to kick in on cpu0. [Also, on cpu1
>>>>> the accounting timer, csched_tick, is not set. Although csched_tick()
>>>>> is running on cpu0, it only checks the runq for cpu0.]
>>>>>
>>>>> Looks like c/s 19500 changed csched_schedule():
>>>>>
>>>>> -    ret.time = MILLISECS(CSCHED_MSECS_PER_TSLICE);
>>>>> +    ret.time = (is_idle_vcpu(snext->vcpu) ?
>>>>> +                -1 : MILLISECS(CSCHED_MSECS_PER_TSLICE));
>>>>>
>>>>> The quickest fix for us would be to just back that out.
>>>>>
>>>>>
>>>>> BTW, just a comment on the following (all in sched_credit.c):
>>>>>
>>>>>     if ( svc->pri == CSCHED_PRI_TS_UNDER &&
>>>>>          !(svc->flags & CSCHED_FLAG_VCPU_PARKED) )
>>>>>     {
>>>>>         svc->pri = CSCHED_PRI_TS_BOOST;
>>>>>     }
>>>>> combined with
>>>>>     if ( snext->pri > CSCHED_PRI_TS_OVER )
>>>>>         __runq_remove(snext);
>>>>>
>>>>> Setting a vcpu's pri to CSCHED_PRI_TS_BOOST seems dangerous. Since
>>>>> csched_schedule() never checks the time accumulated by a vcpu at
>>>>> CSCHED_PRI_TS_BOOST, that is the same as pinning a vcpu to a pcpu:
>>>>> if that vcpu never makes progress, the system has essentially lost
>>>>> a physical cpu. Optionally, csched_schedule() should always check
>>>>> the cpu time accumulated and reduce the priority over time; I can't
>>>>> tell right off if it already does that. Or something like that
>>>>> :)... my 2 cents.
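>>>>>
>>>>> To illustrate, something like this hypothetical check
>>>>> (boost_time_burned does not exist in sched_credit.c; it is made up
>>>>> just to show the idea):
>>>>>
>>>>>     /* Hypothetical: demote a vcpu that has burned more than one
>>>>>      * tick's worth of time at BOOST without blocking. */
>>>>>     if ( snext->pri == CSCHED_PRI_TS_BOOST &&
>>>>>          snext->boost_time_burned >= MILLISECS(CSCHED_MSECS_PER_TICK) )
>>>>>         snext->pri = CSCHED_PRI_TS_UNDER;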
>>>>
>>>> Hmm... what's supposed to happen is that eventually a timer tick
>>>> will interrupt vcpu1. If vcpu1 is marked "active", it will be
>>>> debited 10ms worth of credit; eventually it will go into OVER and
>>>> lose BOOST. If it's "inactive", then when the tick happens it will
>>>> be set to "active" and likewise debited 10ms, putting it directly
>>>> into OVER (and thus also losing BOOST).
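>>>>
>>>> (For reference, the transition into OVER happens in the periodic
>>>> accounting in csched_acct(). Roughly, and eliding the parking/cap
>>>> logic; a sketch, not the literal source:
>>>>
>>>>     /* for each vcpu on the active list: */
>>>>     if ( credit < 0 )
>>>>         svc->pri = CSCHED_PRI_TS_OVER;
>>>>     else
>>>>         svc->pri = CSCHED_PRI_TS_UNDER;
>>>>
>>>> Either way the vcpu leaves BOOST, but none of this runs if the
>>>> tickers are dead.)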
>>>>
>>>> Can you see if the timer ticks are still happening, and perhaps put
>>>> some tracing in to verify that what I described above is happening?
>>>>
>>>> -George
>>>
>>> George,
>>>
>>> Is that in csched_acct()? Looks like that's somehow gotten removed.
>>> If true, then maybe that's the fundamental problem to chase.
>>>
>>> Here's what the timer queue (dtrq) looks like when hung, taken while
>>> not in any scheduler function (note there is no csched_tick pending
>>> on cpu1):
>>>
>>> [0]xkdb> dtrq
>>> CPU[00]: NOW:0x00003f2db9af369e
>>> 1: exp=0x00003ee31cb32200 fn:csched_tick data:0000000000000000
>>> 2: exp=0x00003ee347ece164 fn:time_calibration data:0000000000000000
>>> 3: exp=0x00003ee69a28f04b fn:mce_work_fn data:0000000000000000
>>> 4: exp=0x00003f055895e25f fn:plt_overflow data:0000000000000000
>>> 5: exp=0x00003ee353810216 fn:rtc_update_second data:ffff83007f0226d8
>>>
>>> CPU[01]: NOW:0x00003f2db9af369e
>>> 1: exp=0x00003ee30b847988 fn:s_timer_fn data:0000000000000000
>>> 2: exp=0x00003f1b309ebd45 fn:pmt_timer_callback data:ffff83007f022a68
>>>
>>>
>>> thanks
>>> Mukesh
>>>
>
simplify-tick-resume.diff
Description: Text Data