RE: [Xen-devel] dom0 hang

To: "mukesh.rathor@xxxxxxxxxx" <mukesh.rathor@xxxxxxxxxx>
Subject: RE: [Xen-devel] dom0 hang
From: "Yu, Ke" <ke.yu@xxxxxxxxx>
Date: Tue, 7 Jul 2009 15:14:12 +0800
Accept-language: en-US
Cc: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, "Kurt C. Hackel" <kurt.hackel@xxxxxxxxxx>, "Tian, Kevin" <kevin.tian@xxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Tue, 07 Jul 2009 00:15:38 -0700
In-reply-to: <4A52C52C.9080409@xxxxxxxxxx>
References: <4A426D50.80401@xxxxxxxxxx> <4A4C2743.5030703@xxxxxxxxxx> <de76405a0907021050y10a8bea0kc9de92126b58a9e8@xxxxxxxxxxxxxx> <4A4D0710.10309@xxxxxxxxxx> <de76405a0907021349q20e47f5ave3cc86b74c511f0@xxxxxxxxxxxxxx> <4A4D2253.8070807@xxxxxxxxxx> <de76405a0907021437m52f1913au2cd0963dd99eada3@xxxxxxxxxxxxxx> <4A4D4D78.1060609@xxxxxxxxxx> <4A4D5C69.5020409@xxxxxxxxxx> <4D05DB80B95B23498C72C700BD6C2E0B2F9F599D@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> <4A52C52C.9080409@xxxxxxxxxx>
Thread-topic: [Xen-devel] dom0 hang
>-----Original Message-----
>From: Mukesh Rathor [mailto:mukesh.rathor@xxxxxxxxxx]
>Sent: Tuesday, July 07, 2009 11:47 AM
>To: Yu, Ke
>Cc: George Dunlap; Tian, Kevin; xen-devel@xxxxxxxxxxxxxxxxxxx; Kurt C.
>Hackel
>Subject: Re: [Xen-devel] dom0 hang
>
>
>Well, the problem takes a long time to reproduce (only on certain boxes), and
>even then it may not always happen. So I want to make sure I understand the
>fix, as it was pretty hard to debug.

OK, looking forward to your update.

>
>While the fix will still allow softirqs to be pending, I guess it is
>functionally OK, because after irqs are disabled it will check for pending
>softirqs and just return. I think the comment about expecting no softirq
>pending should be fixed.

Right, the comment will also be fixed.
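
For reference, process_pending_timers() only drains the timer softirq and
never enters the scheduler, so it cannot context switch away while the tick
is suspended. Roughly (a sketch of the logic, not the verbatim source):

    /* Sketch: unlike do_softirq(), which loops over every pending softirq
     * (including SCHEDULE_SOFTIRQ), this handles TIMER_SOFTIRQ only. */
    void process_pending_timers(void)
    {
        unsigned int cpu = smp_processor_id();

        ASSERT(!in_irq() && local_irq_is_enabled());

        if ( test_and_clear_bit(TIMER_SOFTIRQ, &softirq_pending(cpu)) )
            timer_softirq_action();  /* run expired timers, nothing else */
    }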

>
>BTW, why can't the tick be suspended when csched_schedule() concludes it has
>picked the idle vcpu, before returning? Wouldn't that make it less intrusive?

The tick suspend could be put in csched_schedule(), but the suspend/resume
logic is still needed in acpi_processor_idle() anyway, because of the
dbs_timer suspend/resume. The intention here is to make acpi_processor_idle()
the central place for timers that can be stopped during an idle period. If
other stoppable timers appear in the future, they can easily be added to
acpi_processor_idle(). So it is cleaner to keep the current logic, and as long
as we are careful not to overdo the softirq handling, it does not look too
intrusive. What do you think?
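
In other words, the idle path brackets every stoppable timer in one place,
roughly like this (a sketch of the intended structure, with the C-state
details elided):

    static void acpi_processor_idle(void)
    {
        /* Central place to stop per-cpu timers across the idle period. */
        sched_tick_suspend();
        cpufreq_dbs_timer_suspend();

        /* Drain the TIMER_SOFTIRQ the suspends may have raised, without
         * entering the scheduler. */
        process_pending_timers();

        local_irq_disable();
        if ( softirq_pending(smp_processor_id()) )
        {
            /* New work arrived: undo the suspends and bail out. */
            local_irq_enable();
            sched_tick_resume();
            cpufreq_dbs_timer_resume();
            return;
        }

        /* ... select and enter the C-state ... */

        local_irq_enable();
        sched_tick_resume();
        cpufreq_dbs_timer_resume();
    }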

Best Regards
Ke

>
>thanks,
>Mukesh
>
>
>Yu, Ke wrote:
>> Hi Mukesh,
>>
>> Could you please try the following patch, to see if it can resolve the
>> issue you observed? Thanks.
>>
>> Best Regards
>> Ke
>>
>> diff -r d461c4d8af17 xen/arch/x86/acpi/cpu_idle.c
>> --- a/xen/arch/x86/acpi/cpu_idle.c
>> +++ b/xen/arch/x86/acpi/cpu_idle.c
>> @@ -228,10 +228,10 @@ static void acpi_processor_idle(void)
>>      /*
>>       * sched_tick_suspend may raise TIMER_SOFTIRQ by __stop_timer,
>>       * which will break the later assumption of no sofirq pending,
>> -     * so add do_softirq
>> +     * so process the pending timers
>>       */
>> -    if ( softirq_pending(smp_processor_id()) )
>> -        do_softirq();
>> +
>> +    process_pending_timers();
>>
>>      /*
>>       * Interrupts must be disabled during bus mastering calculations and
>>
>>> -----Original Message-----
>>> From: Mukesh Rathor [mailto:mukesh.rathor@xxxxxxxxxx]
>>> Sent: Friday, July 03, 2009 9:19 AM
>>> To: mukesh.rathor@xxxxxxxxxx
>>> Cc: George Dunlap; Tian, Kevin; xen-devel@xxxxxxxxxxxxxxxxxxx; Yu, Ke;
>>> Kurt C. Hackel
>>> Subject: Re: [Xen-devel] dom0 hang
>>>
>>>
>>> Hi Kevin/Yu:
>>>
>>> acpi_processor_idle()
>>> {
>>>     sched_tick_suspend();
>>>      /*
>>>      * sched_tick_suspend may raise TIMER_SOFTIRQ by __stop_timer,
>>>      * which will break the later assumption of no sofirq pending,
>>>      * so add do_softirq
>>>      */
>>>     if ( softirq_pending(smp_processor_id()) )
>>>         do_softirq();             <===============
>>>
>>>     local_irq_disable();
>>>     if ( softirq_pending(smp_processor_id()) )
>>>     {
>>>         local_irq_enable();
>>>         sched_tick_resume();
>>>         cpufreq_dbs_timer_resume();
>>>         return;
>>>     }
>>>
>>> Wouldn't the do_softirq() call the scheduler with the tick suspended, and
>>> the scheduler then context-switch to another vcpu (with *_BOOST), which
>>> would result in the stuck vcpu I described?
>>>
>>> thanks
>>> Mukesh
>>>
>>>
>>> Mukesh Rathor wrote:
>>>> ah, i totally missed csched_tick():
>>>>     if ( !is_idle_vcpu(current) )
>>>>         csched_vcpu_acct(cpu);
>>>>
>>>> yeah, looks like that's what is going on. i'm still waiting to
>>>> reproduce. at first glance, looking at c/s 19460, it seems like the
>>>> suspend/resume, or at least the resume, should happen in
>>>> csched_schedule().....
>>>>
>>>> thanks,
>>>> Mukesh
>>>>
>>>>
>>>> George Dunlap wrote:
>>>>> [Oops, adding back in distro list, also adding Kevin Tian and Yu Ke
>>>>> who wrote cs 19460]
>>>>>
>>>>> The functionality I was talking about, subtracting credits and
>>>>> clearing BOOST, happens in csched_vcpu_acct() (which is different from
>>>>> csched_acct()).  csched_vcpu_acct() is called from csched_tick(), which
>>>>> should still happen every 10ms on every cpu.
>>>>>
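>>>>> Roughly, the relevant part of csched_vcpu_acct() looks like this (a
>>>>> sketch from sched_credit.c, not the exact source):
>>>>>
>>>>>     /* called from csched_tick() for the currently running,
>>>>>      * non-idle vcpu on this pcpu */
>>>>>     if ( svc->pri == CSCHED_PRI_TS_BOOST )
>>>>>         svc->pri = CSCHED_PRI_TS_UNDER;  /* consuming cpu: drop BOOST */
>>>>>
>>>>>     /* debit one tick's worth of credit; enough ticks push the
>>>>>      * vcpu into OVER when accounting runs */
>>>>>     atomic_sub(CSCHED_CREDITS_PER_TICK, &svc->credit);
>>>>>
>>>>>     /* an "inactive" vcpu is put back on the active list, so it
>>>>>      * is debited again on the next tick */
>>>>>     if ( list_empty(&svc->active_vcpu_elem) )
>>>>>         __csched_vcpu_acct_start(svc);
>>>>>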
>>>>> The patch I referred to (cs 19460) disables and re-enables tickers in
>>>>> xen/arch/x86/acpi/cpu_idle.c:acpi_processor_idle() every time the
>>>>> processor idles.  I can't see anywhere else that tickers are disabled,
>>>>> so it's probably something not properly re-enabling them again.
>>>>>
>>>>> Try applying the attached patch to see if that changes anything.  (I'm
>>>>> on the road, so I can't repro the lockup issue.)  If that doesn't
>>>>> work, try disabling c-states and see if that helps.  Then at least
>>>>> we'll know where the problem lies.
>>>>>
>>>>>  -George
>>>>>
>>>>> On Thu, Jul 2, 2009 at 10:10 PM, Mukesh
>>>>> Rathor<mukesh.rathor@xxxxxxxxxx> wrote:
>>>>>> that seems to only suspend csched_pcpu.ticker which is csched_tick
>>>>>> that is
>>>>>> only sorting local runq.
>>>>>>
>>>>>> again, we are concerned about csched_priv.master_ticker that calls
>>>>>> csched_acct? correct, so i can trace that?
>>>>>>
>>>>>> thanks,
>>>>>> mukesh
>>>>>>
>>>>>>
>>>>>> George Dunlap wrote:
>>>>>>> Ah, I see that there's been some changes to tick stuff with the
>>>>>>> c-state (e.g., cs 19460).  It looks like they're supposed to be going
>>>>>>> still, but perhaps the tick_suspend() and tick_resume() aren't being
>>>>>>> called properly.  Let me take a closer look.
>>>>>>>
>>>>>>>  -George
>>>>>>>
>>>>>>> On Thu, Jul 2, 2009 at 8:14 PM, Mukesh
>>>>>>> Rathor<mukesh.rathor@xxxxxxxxxx>
>>>>>>> wrote:
>>>>>>>> George Dunlap wrote:
>>>>>>>>> On Thu, Jul 2, 2009 at 4:19 AM, Mukesh
>>>>>>>>> Rathor<mukesh.rathor@xxxxxxxxxx>
>>>>>>>>> wrote:
>>>>>>>>>> dom0 hang:
>>>>>>>>>>  vcpu0 is trying to wake up a task and in try_to_wake_up() calls
>>>>>>>>>>  task_rq_lock(). since the task has cpu set to 1, it gets the runq
>>>>>>>>>>  lock for vcpu1. next it calls resched_task(), which results in
>>>>>>>>>>  sending an IPI to vcpu1. for that, vcpu0 gets into the
>>>>>>>>>>  HYPERVISOR_event_channel_op HCALL and is waiting to return.
>>>>>>>>>>  Meanwhile, vcpu1 got running, and is spinning on its runq lock in
>>>>>>>>>>  "schedule():spin_lock_irq(&rq->lock);", which vcpu0 is holding
>>>>>>>>>>  (while waiting to return from the HCALL).
>>>>>>>>>>
>>>>>>>>>>  As I had noticed before, vcpu0 never gets scheduled in xen. So
>>>>>>>>>>  looking further into xen:
>>>>>>>>>>
>>>>>>>>>> xen:
>>>>>>>>>>  Both vcpus are on the same runq, in this case cpu1's. But the
>>>>>>>>>>  priority of vcpu1 has been set to CSCHED_PRI_TS_BOOST. As a
>>>>>>>>>>  result, the scheduler always picks vcpu1, and vcpu0 is starved.
>>>>>>>>>>  Also, I see in kdb that the scheduler timer is not set on cpu0.
>>>>>>>>>>  That would've allowed csched_load_balance() to kick in on cpu0.
>>>>>>>>>>  [Also, on cpu1, the accounting timer, csched_tick, is not set.
>>>>>>>>>>  Although csched_tick() is running on cpu0, it only checks the
>>>>>>>>>>  runq for cpu0.]
>>>>>>>>>>
>>>>>>>>>>  Looks like c/s 19500 changed csched_schedule():
>>>>>>>>>>
>>>>>>>>>> -    ret.time = MILLISECS(CSCHED_MSECS_PER_TSLICE);
>>>>>>>>>> +    ret.time = (is_idle_vcpu(snext->vcpu) ?
>>>>>>>>>> +                -1 : MILLISECS(CSCHED_MSECS_PER_TSLICE));
>>>>>>>>>>
>>>>>>>>>>  The quickest fix for us would be to just back that out.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  BTW, just a comment on the following (all in sched_credit.c):
>>>>>>>>>>
>>>>>>>>>>    if ( svc->pri == CSCHED_PRI_TS_UNDER &&
>>>>>>>>>>       !(svc->flags & CSCHED_FLAG_VCPU_PARKED) )
>>>>>>>>>>    {
>>>>>>>>>>       svc->pri = CSCHED_PRI_TS_BOOST;
>>>>>>>>>>    }
>>>>>>>>>>  combined with
>>>>>>>>>>  if ( snext->pri > CSCHED_PRI_TS_OVER )
>>>>>>>>>>          __runq_remove(snext);
>>>>>>>>>>
>>>>>>>>>>    Setting CSCHED_PRI_TS_BOOST as the pri of a vcpu seems
>>>>>>>>>>    dangerous to me. Since csched_schedule() never checks the time
>>>>>>>>>>    accumulated by a vcpu at pri CSCHED_PRI_TS_BOOST, that is the
>>>>>>>>>>    same as pinning a vcpu to a pcpu. if that vcpu never makes
>>>>>>>>>>    progress, essentially, the system has lost a physical cpu.
>>>>>>>>>>    Optionally, csched_schedule() should always check the cpu time
>>>>>>>>>>    accumulated and reduce the priority over time. I can't tell
>>>>>>>>>>    right off if it already does that. or something like that :)...
>>>>>>>>>>    my 2 cents.
>>>>>>>>> Hmm... what's supposed to happen is that eventually a timer tick
>>>>>>>>> will interrupt vcpu1.  If vcpu1 is set to be "active", then it will
>>>>>>>>> be debited 10ms worth of credit.  Eventually, it will go into OVER,
>>>>>>>>> and lose BOOST.  If it's "inactive", then when the tick happens, it
>>>>>>>>> will be set to "active" and be debited 10ms again, setting it
>>>>>>>>> directly into OVER (and thus also losing BOOST).
>>>>>>>>>
>>>>>>>>> Can you see if the timer ticks are still happening, and perhaps put
>>>>>>>>> some tracing in to verify that what I described above is happening?
>>>>>>>>>
>>>>>>>>>  -George
>>>>>>>> George,
>>>>>>>>
>>>>>>>> Is that in csched_acct()? Looks like that's somehow gotten removed.
>>>>>>>> If true, then maybe that's the fundamental problem to chase.
>>>>>>>>
>>>>>>>> Here's what the timer queues look like when hung (dumped while not
>>>>>>>> in any scheduler function):
>>>>>>>>
>>>>>>>> [0]xkdb> dtrq
>>>>>>>> CPU[00]: NOW:0x00003f2db9af369e
>>>>>>>>  1: exp=0x00003ee31cb32200 fn:csched_tick        data:0000000000000000
>>>>>>>>  2: exp=0x00003ee347ece164 fn:time_calibration   data:0000000000000000
>>>>>>>>  3: exp=0x00003ee69a28f04b fn:mce_work_fn        data:0000000000000000
>>>>>>>>  4: exp=0x00003f055895e25f fn:plt_overflow       data:0000000000000000
>>>>>>>>  5: exp=0x00003ee353810216 fn:rtc_update_second  data:ffff83007f0226d8
>>>>>>>> CPU[01]: NOW:0x00003f2db9af369e
>>>>>>>>  1: exp=0x00003ee30b847988 fn:s_timer_fn         data:0000000000000000
>>>>>>>>  2: exp=0x00003f1b309ebd45 fn:pmt_timer_callback data:ffff83007f022a68
>>>>>>>>
>>>>>>>> thanks
>>>>>>>> Mukesh
>>>>>>>>