>-----Original Message-----
>From: Mukesh Rathor [mailto:mukesh.rathor@xxxxxxxxxx]
>Sent: Tuesday, July 07, 2009 11:47 AM
>To: Yu, Ke
>Cc: George Dunlap; Tian, Kevin; xen-devel@xxxxxxxxxxxxxxxxxxx; Kurt C.
>Hackel
>Subject: Re: [Xen-devel] dom0 hang
>
>
>Well, the problem takes long to reproduce (only on certain boxes). And then it
>may not always happen. So I want to make sure I understand the fix, as it
>was pretty hard to debug.
OK, looking forward to your update.
>
>While the fix will still allow softirqs to be pending, I guess it's
>functionally OK, because after disabling irqs it'll check for pending
>softirqs and just return. I think the comment about expecting no softirq
>pending should be fixed.
Right, the comment will also be fixed.
>
>BTW, why can't the tick be suspended when csched_schedule() concludes
>it's the idle vcpu, before returning? Wouldn't that make it less intrusive?
The tick suspend could be put in csched_schedule, but the suspend/resume logic
is still needed in acpi_processor_idle anyway, because of the separate
dbs_timer suspend/resume. The intention here is to make acpi_processor_idle
the central place for timers that are stoppable during the idle period. If
another stoppable timer appears in the future, it can easily be added to
acpi_processor_idle. So it is cleaner to keep the current logic, and as long
as we are careful not to overdo the softirq handling, it is not so intrusive.
What do you think?
Best Regards
Ke
>
>thanks,
>Mukesh
>
>
>Yu, Ke wrote:
>> Hi Mukesh,
>>
>> Could you please try the following patch, to see if it can resolve the issue
>you observed? Thanks.
>>
>> Best Regards
>> Ke
>>
>> diff -r d461c4d8af17 xen/arch/x86/acpi/cpu_idle.c
>> --- a/xen/arch/x86/acpi/cpu_idle.c
>> +++ b/xen/arch/x86/acpi/cpu_idle.c
>> @@ -228,10 +228,10 @@ static void acpi_processor_idle(void)
>> /*
>> * sched_tick_suspend may raise TIMER_SOFTIRQ by __stop_timer,
>> * which will break the later assumption of no sofirq pending,
>> - * so add do_softirq
>> + * so process the pending timers
>> */
>> - if ( softirq_pending(smp_processor_id()) )
>> - do_softirq();
>> +
>> + process_pending_timers();
>>
>> /*
>> * Interrupts must be disabled during bus mastering calculations and
>>
>>> -----Original Message-----
>>> From: Mukesh Rathor [mailto:mukesh.rathor@xxxxxxxxxx]
>>> Sent: Friday, July 03, 2009 9:19 AM
>>> To: mukesh.rathor@xxxxxxxxxx
>>> Cc: George Dunlap; Tian, Kevin; xen-devel@xxxxxxxxxxxxxxxxxxx; Yu, Ke;
>Kurt C.
>>> Hackel
>>> Subject: Re: [Xen-devel] dom0 hang
>>>
>>>
>>> Hi Kevin/Yu:
>>>
>>> acpi_processor_idle()
>>> {
>>> sched_tick_suspend();
>>> /*
>>> * sched_tick_suspend may raise TIMER_SOFTIRQ by __stop_timer,
>>> * which will break the later assumption of no sofirq pending,
>>> * so add do_softirq
>>> */
>>> if ( softirq_pending(smp_processor_id()) )
>>> do_softirq(); <===============
>>>
>>> local_irq_disable();
>>> if ( softirq_pending(smp_processor_id()) )
>>> {
>>> local_irq_enable();
>>> sched_tick_resume();
>>> cpufreq_dbs_timer_resume();
>>> return;
>>> }
>>>
>>> wouldn't the do_softirq() call the scheduler with the tick suspended,
>>> and the scheduler then context switch to another vcpu (with *_BOOST),
>>> which would result in the stuck vcpu I described?
>>>
>>> thanks
>>> Mukesh
>>>
>>>
>>> Mukesh Rathor wrote:
>>>> ah, i totally missed csched_tick():
>>>> if ( !is_idle_vcpu(current) )
>>>> csched_vcpu_acct(cpu);
>>>>
>>>> yeah, looks like that's what is going on. i'm still waiting to
>>>> reproduce. at first glance, looking at c/s 19460, seems like
>>>> suspend/resume, well at least the resume, should happen in
>>>> csched_schedule().....
>>>>
>>>> thanks,
>>>> Mukesh
>>>>
>>>>
>>>> George Dunlap wrote:
>>>>> [Oops, adding back in distro list, also adding Kevin Tian and Yu Ke
>>>>> who wrote cs 19460]
>>>>>
>>>>> The functionality I was talking about, subtracting credits and
>>>>> clearing BOOST, happens in csched_vcpu_acct() (which is different than
>>>>> csched_acct()). vcpu_acct() is called from csched_tick(), which
>>>>> should still happen every 10ms on every cpu.
>>>>>
>>>>> The patch I referred to (cs 19460) disables and re-enables tickers in
>>>>> xen/arch/x86/acpi/cpu_idle.c:acpi_processor_idle() every time the
>>>>> processor idles. I can't see anywhere else that tickers are disabled,
>>>>> so it's probably something not properly re-enabling them again.
>>>>>
>>>>> Try applying the attached patch to see if that changes anything. (I'm
>>>>> on the road, so I can't repro the lockup issue.) If that doesn't
>>>>> work, try disabling c-states and see if that helps. Then at least
>>>>> we'll know where the problem lies.
>>>>>
>>>>> -George
>>>>>
>>>>> On Thu, Jul 2, 2009 at 10:10 PM, Mukesh
>>>>> Rathor<mukesh.rathor@xxxxxxxxxx> wrote:
>>>>>> that seems to only suspend csched_pcpu.ticker, which is csched_tick,
>>>>>> and that only sorts the local runq.
>>>>>>
>>>>>> again, we are concerned about csched_priv.master_ticker that calls
>>>>>> csched_acct? correct, so i can trace that?
>>>>>>
>>>>>> thanks,
>>>>>> mukesh
>>>>>>
>>>>>>
>>>>>> George Dunlap wrote:
>>>>>>> Ah, I see that there's been some changes to tick stuff with the
>>>>>>> c-state (e.g., cs 19460). It looks like they're supposed to be going
>>>>>>> still, but perhaps the tick_suspend() and tick_resume() aren't being
>>>>>>> called properly. Let me take a closer look.
>>>>>>>
>>>>>>> -George
>>>>>>>
>>>>>>> On Thu, Jul 2, 2009 at 8:14 PM, Mukesh Rathor<mukesh.rathor@xxxxxxxxxx>
>>>>>>> wrote:
>>>>>>>> George Dunlap wrote:
>>>>>>>>> On Thu, Jul 2, 2009 at 4:19 AM, Mukesh
>>>>>>>>> Rathor<mukesh.rathor@xxxxxxxxxx>
>>>>>>>>> wrote:
>>>>>>>>>> dom0 hang:
>>>>>>>>>> vcpu0 is trying to wake up a task and in try_to_wake_up() calls
>>>>>>>>>> task_rq_lock(). Since the task has cpu set to 1, it gets the runq
>>>>>>>>>> lock for vcpu1. Next it calls resched_task(), which results in
>>>>>>>>>> sending an IPI to vcpu1. For that, vcpu0 gets into the
>>>>>>>>>> HYPERVISOR_event_channel_op HCALL and is waiting to return.
>>>>>>>>>> Meanwhile, vcpu1 got running, and is spinning on its runq lock in
>>>>>>>>>> "schedule():spin_lock_irq(&rq->lock);", which vcpu0 is holding
>>>>>>>>>> (and is waiting to return from the HCALL).
>>>>>>>>>>
>>>>>>>>>> As I had noticed before, vcpu0 never gets scheduled in xen. So
>>>>>>>>>> looking further into xen:
>>>>>>>>>>
>>>>>>>>>> xen:
>>>>>>>>>> Both vcpus are on the same runq, in this case cpu1. But the
>>>>>>>>>> priority of vcpu1 has been set to CSCHED_PRI_TS_BOOST. As a
>>>>>>>>>> result, the scheduler always picks vcpu1, and vcpu0 is starved.
>>>>>>>>>> Also, I see in kdb that the scheduler timer is not set on cpu0.
>>>>>>>>>> That would've allowed csched_load_balance() to kick in on cpu0.
>>>>>>>>>> [Also, on cpu1, the accounting timer, csched_tick, is not set.
>>>>>>>>>> Although csched_tick() is running on cpu0, it only checks the
>>>>>>>>>> runq for cpu0.]
>>>>>>>>>>
>>>>>>>>>> Looks like c/s 19500 changed csched_schedule():
>>>>>>>>>>
>>>>>>>>>> - ret.time = MILLISECS(CSCHED_MSECS_PER_TSLICE);
>>>>>>>>>> + ret.time = (is_idle_vcpu(snext->vcpu) ?
>>>>>>>>>> + -1 : MILLISECS(CSCHED_MSECS_PER_TSLICE));
>>>>>>>>>>
>>>>>>>>>> The quickest fix for us would be to just back that out.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> BTW, just a comment on the following (all in sched_credit.c):
>>>>>>>>>>
>>>>>>>>>> if ( svc->pri == CSCHED_PRI_TS_UNDER &&
>>>>>>>>>> !(svc->flags & CSCHED_FLAG_VCPU_PARKED) )
>>>>>>>>>> {
>>>>>>>>>> svc->pri = CSCHED_PRI_TS_BOOST;
>>>>>>>>>> }
>>>>>>>>>> combined with
>>>>>>>>>> if ( snext->pri > CSCHED_PRI_TS_OVER )
>>>>>>>>>> __runq_remove(snext);
>>>>>>>>>>
>>>>>>>>>> Setting CSCHED_PRI_TS_BOOST as the pri of a vcpu seems dangerous.
>>>>>>>>>> To me, since csched_schedule() never checks the time accumulated
>>>>>>>>>> by a vcpu at pri CSCHED_PRI_TS_BOOST, that is the same as pinning
>>>>>>>>>> a vcpu to a pcpu. If that vcpu never makes progress, essentially,
>>>>>>>>>> the system has lost a physical cpu. Optionally, csched_schedule()
>>>>>>>>>> should always check the cpu time accumulated and reduce the
>>>>>>>>>> priority over time. I can't tell right off if it already does
>>>>>>>>>> that. Or something like that :)... my 2 cents.
>>>>>>>>> Hmm... what's supposed to happen is that eventually a timer tick
>>>>>>>>> will interrupt vcpu1. If cpu1 is set to be "active", then it will
>>>>>>>>> be debited 10ms worth of credit. Eventually, it will go into OVER,
>>>>>>>>> and lose BOOST. If it's "inactive", then when the tick happens, it
>>>>>>>>> will be set to "active" and be debited 10ms again, setting it
>>>>>>>>> directly into OVER (and thus also losing BOOST).
>>>>>>>>>
>>>>>>>>> Can you see if the timer ticks are still happening, and perhaps
>>>>>>>>> put some tracing in to verify that what I described above is
>>>>>>>>> happening?
>>>>>>>>>
>>>>>>>>> -George
>>>>>>>> George,
>>>>>>>>
>>>>>>>> Is that in csched_acct()? Looks like that's somehow gotten removed.
>>>>>>>> If true, then maybe that's the fundamental problem to chase.
>>>>>>>>
>>>>>>>> Here's what the trq looks like when hung, not in any schedule
>>>>>>>> function:
>>>>>>>>
>>>>>>>> [0]xkdb> dtrq
>>>>>>>> CPU[00]: NOW:0x00003f2db9af369e
>>>>>>>>   1: exp=0x00003ee31cb32200 fn:csched_tick       data:0000000000000000
>>>>>>>>   2: exp=0x00003ee347ece164 fn:time_calibration  data:0000000000000000
>>>>>>>>   3: exp=0x00003ee69a28f04b fn:mce_work_fn       data:0000000000000000
>>>>>>>>   4: exp=0x00003f055895e25f fn:plt_overflow      data:0000000000000000
>>>>>>>>   5: exp=0x00003ee353810216 fn:rtc_update_second data:ffff83007f0226d8
>>>>>>>> CPU[01]: NOW:0x00003f2db9af369e
>>>>>>>>   1: exp=0x00003ee30b847988 fn:s_timer_fn        data:0000000000000000
>>>>>>>>   2: exp=0x00003f1b309ebd45 fn:pmt_timer_callback data:ffff83007f022a68
>>>>>>>>
>>>>>>>> thanks
>>>>>>>> Mukesh
>>>>>>>>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel