Hi Mukesh,
Could you please try the following patch to see if it resolves the issue
you observed? Thanks.
Best Regards
Ke
diff -r d461c4d8af17 xen/arch/x86/acpi/cpu_idle.c
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -228,10 +228,10 @@ static void acpi_processor_idle(void)
     /*
      * sched_tick_suspend may raise TIMER_SOFTIRQ by __stop_timer,
      * which will break the later assumption of no sofirq pending,
-     * so add do_softirq
+     * so process the pending timers
      */
-    if ( softirq_pending(smp_processor_id()) )
-        do_softirq();
+
+    process_pending_timers();
     /*
      * Interrupts must be disabled during bus mastering calculations and
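
The intent is that process_pending_timers() consumes only a pending
TIMER_SOFTIRQ, so that do_softirq() -- and with it SCHEDULE_SOFTIRQ -- is
never run while the scheduler tick is suspended. A minimal sketch of what it
needs to do (not necessarily the exact implementation):

    /* xen/common/timer.c: handle a pending TIMER_SOFTIRQ directly,
     * without servicing any other softirq such as SCHEDULE_SOFTIRQ */
    void process_pending_timers(void)
    {
        unsigned int cpu = smp_processor_id();

        if ( test_and_clear_bit(TIMER_SOFTIRQ, &softirq_pending(cpu)) )
            timer_softirq_action();
    }
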
>-----Original Message-----
>From: Mukesh Rathor [mailto:mukesh.rathor@xxxxxxxxxx]
>Sent: Friday, July 03, 2009 9:19 AM
>To: mukesh.rathor@xxxxxxxxxx
>Cc: George Dunlap; Tian, Kevin; xen-devel@xxxxxxxxxxxxxxxxxxx; Yu, Ke; Kurt C.
>Hackel
>Subject: Re: [Xen-devel] dom0 hang
>
>
>Hi Kevin/Yu:
>
>acpi_processor_idle()
>{
> sched_tick_suspend();
> /*
> * sched_tick_suspend may raise TIMER_SOFTIRQ by __stop_timer,
> * which will break the later assumption of no sofirq pending,
> * so add do_softirq
> */
> if ( softirq_pending(smp_processor_id()) )
> do_softirq(); <===============
>
> local_irq_disable();
> if ( softirq_pending(smp_processor_id()) )
> {
> local_irq_enable();
> sched_tick_resume();
> cpufreq_dbs_timer_resume();
> return;
> }
>
>wouldn't the do_softirq() call the scheduler with the tick suspended, and
>wouldn't the scheduler then context switch to another vcpu (one with *_BOOST),
>which would result in the stuck vcpu I described?
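>
>For reference, the generic loop looks roughly like this (a simplified sketch
>of xen/common/softirq.c, details may differ):
>
>    /* simplified sketch: do_softirq() services *every* pending softirq,
>     * so a pending SCHEDULE_SOFTIRQ here would call into the scheduler
>     * even though sched_tick_suspend() was just done */
>    static void do_softirq_sketch(void)
>    {
>        unsigned int i, cpu = smp_processor_id();
>        unsigned long pending;
>
>        while ( (pending = softirq_pending(cpu)) != 0 )
>        {
>            i = find_first_set_bit(pending);
>            clear_bit(i, &softirq_pending(cpu));
>            (*softirq_handlers[i])();   /* may be the scheduler itself */
>        }
>    }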
>
>thanks
>Mukesh
>
>
>Mukesh Rathor wrote:
>> ah, i totally missed csched_tick():
>> if ( !is_idle_vcpu(current) )
>> csched_vcpu_acct(cpu);
>>
>> yeah, looks like that's what is going on. i'm still waiting to
>> reproduce it. at first glance, looking at c/s 19460, it seems like the
>> suspend/resume, well at least the resume, should happen in
>> csched_schedule()...
>>
>> thanks,
>> Mukesh
>>
>>
>> George Dunlap wrote:
>>> [Oops, adding back in the distro list; also adding Kevin Tian and Yu Ke,
>>> who wrote c/s 19460]
>>>
>>> The functionality I was talking about, subtracting credits and
>>> clearing BOOST, happens in csched_vcpu_acct() (which is different from
>>> csched_acct()). csched_vcpu_acct() is called from csched_tick(), which
>>> should still fire every 10ms on every cpu.
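>>>
>>> Roughly (paraphrasing sched_credit.c from memory, details may differ):
>>>
>>>    /* sketch of the per-tick accounting for the running vcpu */
>>>    static void csched_vcpu_acct_sketch(unsigned int cpu)
>>>    {
>>>        struct csched_vcpu * const svc = CSCHED_VCPU(current);
>>>
>>>        /* a vcpu caught running at tick time no longer deserves BOOST */
>>>        if ( svc->pri == CSCHED_PRI_TS_BOOST )
>>>            svc->pri = CSCHED_PRI_TS_UNDER;
>>>
>>>        /* debit one tick's worth of credit */
>>>        atomic_sub(CSCHED_CREDITS_PER_TICK, &svc->credit);
>>>    }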
>>>
>>> The patch I referred to (c/s 19460) disables and re-enables the tickers in
>>> xen/arch/x86/acpi/cpu_idle.c:acpi_processor_idle() every time the
>>> processor idles. I can't see anywhere else that the tickers are disabled,
>>> so the problem is probably that something is not properly re-enabling them.
>>>
>>> Try applying the attached patch to see if that changes anything. (I'm
>>> on the road, so I can't repro the lockup issue.) If that doesn't
>>> work, try disabling c-states and see if that helps. Then at least
>>> we'll know where the problem lies.
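>>>
>>> (To disable c-states, I believe you can boot Xen with "cpuidle=off", or
>>> cap them with "max_cstate=1" -- option names from memory, please
>>> double-check. E.g. on the Xen line in grub.conf:
>>>
>>>    kernel /boot/xen.gz ... cpuidle=off
>>> )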
>>>
>>> -George
>>>
>>> On Thu, Jul 2, 2009 at 10:10 PM, Mukesh
>>> Rathor<mukesh.rathor@xxxxxxxxxx> wrote:
>>>> that seems to only suspend csched_pcpu.ticker, which is csched_tick; that
>>>> only sorts the local runq.
>>>>
>>>> again, we are concerned about csched_priv.master_ticker, which calls
>>>> csched_acct, correct? so i can trace that?
>>>>
>>>> thanks,
>>>> mukesh
>>>>
>>>>
>>>> George Dunlap wrote:
>>>>> Ah, I see that there have been some changes to the tick stuff with the
>>>>> c-state work (e.g., c/s 19460). It looks like the ticks are supposed to
>>>>> keep running, but perhaps tick_suspend() and tick_resume() aren't being
>>>>> called properly. Let me take a closer look.
>>>>>
>>>>> -George
>>>>>
>>>>> On Thu, Jul 2, 2009 at 8:14 PM, Mukesh Rathor<mukesh.rathor@xxxxxxxxxx>
>>>>> wrote:
>>>>>> George Dunlap wrote:
>>>>>>> On Thu, Jul 2, 2009 at 4:19 AM, Mukesh
>>>>>>> Rathor<mukesh.rathor@xxxxxxxxxx>
>>>>>>> wrote:
>>>>>>>> dom0 hang:
>>>>>>>> vcpu0 is trying to wake up a task and in try_to_wake_up() calls
>>>>>>>> task_rq_lock(). since the task has its cpu set to 1, it gets the runq
>>>>>>>> lock for vcpu1. next it calls resched_task(), which results in sending
>>>>>>>> an IPI to vcpu1. for that, vcpu0 gets into the
>>>>>>>> HYPERVISOR_event_channel_op HCALL and is waiting to return. Meanwhile,
>>>>>>>> vcpu1 got to run and is spinning on its runq lock in
>>>>>>>> "schedule():spin_lock_irq(&rq->lock);", which vcpu0 is holding (while
>>>>>>>> waiting to return from the HCALL).
>>>>>>>>
>>>>>>>> As I had noticed before, vcpu0 never gets scheduled in xen. So
>>>>>>>> looking further into xen:
>>>>>>>>
>>>>>>>> xen:
>>>>>>>> Both vcpus are on the same runq, in this case cpu1's. But the
>>>>>>>> priority of vcpu1 has been set to CSCHED_PRI_TS_BOOST. As a result,
>>>>>>>> the scheduler always picks vcpu1, and vcpu0 is starved. Also, I see
>>>>>>>> in kdb that the scheduler timer is not set on cpu0. That would've
>>>>>>>> allowed csched_load_balance() to kick in on cpu0. [Also, on cpu1,
>>>>>>>> the accounting timer, csched_tick, is not set. Although csched_tick()
>>>>>>>> is running on cpu0, it only checks the runq for cpu0.]
>>>>>>>>
>>>>>>>> Looks like c/s 19500 changed csched_schedule():
>>>>>>>>
>>>>>>>> - ret.time = MILLISECS(CSCHED_MSECS_PER_TSLICE);
>>>>>>>> + ret.time = (is_idle_vcpu(snext->vcpu) ?
>>>>>>>> + -1 : MILLISECS(CSCHED_MSECS_PER_TSLICE));
>>>>>>>>
>>>>>>>> The quickest fix for us would be to just back that out.
>>>>>>>>
>>>>>>>>
>>>>>>>> BTW, just a comment on the following (all in sched_credit.c):
>>>>>>>>
>>>>>>>>    if ( svc->pri == CSCHED_PRI_TS_UNDER &&
>>>>>>>>         !(svc->flags & CSCHED_FLAG_VCPU_PARKED) )
>>>>>>>>    {
>>>>>>>>        svc->pri = CSCHED_PRI_TS_BOOST;
>>>>>>>>    }
>>>>>>>>
>>>>>>>> combined with
>>>>>>>>
>>>>>>>>    if ( snext->pri > CSCHED_PRI_TS_OVER )
>>>>>>>>        __runq_remove(snext);
>>>>>>>>
>>>>>>>> Setting CSCHED_PRI_TS_BOOST as the pri of a vcpu seems dangerous. To
>>>>>>>> me, since csched_schedule() never checks the time accumulated by a
>>>>>>>> vcpu at pri CSCHED_PRI_TS_BOOST, that is the same as pinning a vcpu
>>>>>>>> to a pcpu. if that vcpu never makes progress, essentially, the system
>>>>>>>> has lost a physical cpu. Optionally, csched_schedule() could always
>>>>>>>> check the cpu time accumulated and reduce the priority over time.
>>>>>>>> I can't tell right off if it already does that, or something like
>>>>>>>> that :)... my 2 cents.
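>>>>>>>>
>>>>>>>> something along these lines, purely illustrative (boost_time is a
>>>>>>>> hypothetical field, not in the current code):
>>>>>>>>
>>>>>>>>    /* hypothetical check in csched_schedule(): demote a vcpu that
>>>>>>>>     * has been running at BOOST for a full time slice */
>>>>>>>>    if ( scurr->pri == CSCHED_PRI_TS_BOOST &&
>>>>>>>>         now - scurr->boost_time >= MILLISECS(CSCHED_MSECS_PER_TSLICE) )
>>>>>>>>        scurr->pri = CSCHED_PRI_TS_UNDER;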
>>>>>>> Hmm... what's supposed to happen is that eventually a timer tick will
>>>>>>> interrupt vcpu1. If vcpu1 is set to be "active", then it will be
>>>>>>> debited 10ms worth of credit. Eventually, it will go into OVER, and
>>>>>>> lose BOOST. If it's "inactive", then when the tick happens, it will
>>>>>>> be set to "active" and be debited 10ms again, putting it directly into
>>>>>>> OVER (and thus also losing BOOST).
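>>>>>>>
>>>>>>> The OVER transition itself would be in the accounting pass; roughly
>>>>>>> (paraphrasing csched_acct() from memory):
>>>>>>>
>>>>>>>    /* once a vcpu's credit goes negative it drops to OVER, so it
>>>>>>>     * can no longer be picked ahead of an UNDER vcpu */
>>>>>>>    if ( atomic_read(&svc->credit) < 0 )
>>>>>>>        svc->pri = CSCHED_PRI_TS_OVER;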
>>>>>>>
>>>>>>> Can you see if the timer ticks are still happening, and perhaps put
>>>>>>> some tracing in to verify that what I described above is happening?
>>>>>>>
>>>>>>> -George
>>>>>> George,
>>>>>>
>>>>>> Is that in csched_acct()? Looks like that's somehow gotten removed. If
>>>>>> true, then maybe that's the fundamental problem to chase.
>>>>>>
>>>>>> Here's what the timer queue (kdb "dtrq") looks like when hung; we're
>>>>>> not in any scheduler function:
>>>>>>
>>>>>> [0]xkdb> dtrq
>>>>>> CPU[00]: NOW:0x00003f2db9af369e
>>>>>>   1: exp=0x00003ee31cb32200 fn:csched_tick        data:0000000000000000
>>>>>>   2: exp=0x00003ee347ece164 fn:time_calibration   data:0000000000000000
>>>>>>   3: exp=0x00003ee69a28f04b fn:mce_work_fn        data:0000000000000000
>>>>>>   4: exp=0x00003f055895e25f fn:plt_overflow       data:0000000000000000
>>>>>>   5: exp=0x00003ee353810216 fn:rtc_update_second  data:ffff83007f0226d8
>>>>>>
>>>>>> CPU[01]: NOW:0x00003f2db9af369e
>>>>>>   1: exp=0x00003ee30b847988 fn:s_timer_fn         data:0000000000000000
>>>>>>   2: exp=0x00003f1b309ebd45 fn:pmt_timer_callback data:ffff83007f022a68
>>>>>>
>>>>>>
>>>>>> thanks
>>>>>> Mukesh
>>>>>>
>>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel