Here is the patch according to Kevin's analysis. Jan, could you give it a try?
Thanks
diff -r 55eb480d82ae xen/common/schedule.c
--- a/xen/common/schedule.c Wed Feb 24 13:24:54 2010 +0800
+++ b/xen/common/schedule.c Wed Feb 24 13:55:50 2010 +0800
@@ -107,7 +107,8 @@ static inline void vcpu_urgent_count_upd
if ( unlikely(v->is_urgent) )
{
- if ( !test_bit(v->vcpu_id, v->domain->poll_mask) )
+ if ( !test_bit(_VPF_blocked, &v->pause_flags) ||
+ !test_bit(v->vcpu_id, v->domain->poll_mask) )
{
v->is_urgent = 0;
atomic_dec(&per_cpu(schedule_data,v->processor).urgent_count);
@@ -115,7 +116,8 @@ static inline void vcpu_urgent_count_upd
}
else
{
- if ( unlikely(test_bit(v->vcpu_id, v->domain->poll_mask)) )
+ if ( unlikely(test_bit(_VPF_blocked, &v->pause_flags) &&
+ test_bit(v->vcpu_id, v->domain->poll_mask)) )
{
v->is_urgent = 1;
atomic_inc(&per_cpu(schedule_data,v->processor).urgent_count);
>-----Original Message-----
>From: Tian, Kevin
>Sent: Wednesday, February 24, 2010 11:09 AM
>To: Jan Beulich; Keir Fraser; Yu, Ke
>Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
>Subject: RE: [Xen-devel] cpuidle causing Dom0 soft lockups
>
>>From: Jan Beulich [mailto:JBeulich@xxxxxxxxxx]
>>Sent: 2010年2月24日 0:45
>>
>>>>> Keir Fraser <keir.fraser@xxxxxxxxxxxxx> 23.02.10 11:57 >>>
>>>On 23/02/2010 10:37, "Jan Beulich" <JBeulich@xxxxxxxxxx> wrote:
>>>
>>>>> Right. According to the code, there should be no way to
>>this BUG_ON.
>>>>> If it happens, that reveal either bugs of code or the necessity of
>>>>> adding code to migrate urgent vcpu count. Do you have more
>>>>> information on how this BUG_ON happens?
>>>>
>>>> Obviously there are vCPU-s that get inserted on a run queue with
>>>> is_urgent set (which according to my reading of Keir's description
>>>> shouldn't happen). In particular, this
>>>
>>>Is it possible for a polling VCPU to become runnable without it being
>>>cleared from poll_mask? I suspect maybe that is the problem,
>>and that needs
>>>dealing with, or the proper handling needs to be added to
>>sched_credit.c.
>>
>>I don't think that's the case, at least not exclusively. Using
>>
>>--- a/xen/common/sched_credit.c
>>+++ b/xen/common/sched_credit.c
>>@@ -201,6 +201,7 @@ __runq_insert(unsigned int cpu, struct c
>>
>> BUG_ON( __vcpu_on_runq(svc) );
>> BUG_ON( cpu != svc->vcpu->processor );
>>+WARN_ON(svc->vcpu->is_urgent);//temp
>>
>> list_for_each( iter, runq )
>> {
>>--- a/xen/common/schedule.c
>>+++ b/xen/common/schedule.c
>>@@ -139,6 +139,7 @@ static inline void vcpu_runstate_change(
>>
>>ASSERT(spin_is_locked(&per_cpu(schedule_data,v->processor).sche
>>dule_lock));
>>
>> vcpu_urgent_count_update(v);
>>+WARN_ON(v->is_urgent && new_state <= RUNSTATE_runnable);//temp
>>
>> trace_runstate_change(v, new_state);
>>
>>I get pairs of warnings (i.e. each for the same vCPU):
>>
>>(XEN) Xen WARN at schedule.c:142
>>(XEN) Xen call trace:
>>(XEN) [<ffff82c48011c8d5>] schedule+0x375/0x510
>>(XEN) [<ffff82c48011deb8>] __do_softirq+0x58/0x80
>>(XEN) [<ffff82c4801e61e6>] process_softirqs+0x6/0x10
>>
>>(XEN) Xen WARN at sched_credit.c:204
>>(XEN) Xen call trace:
>>(XEN) [<ffff82c4801186b9>] csched_vcpu_wake+0x169/0x1a0
>>(XEN) [<ffff82c4801497f2>] update_runstate_area+0x102/0x110
>>(XEN) [<ffff82c48011cdcf>] vcpu_wake+0x13f/0x390
>>(XEN) [<ffff82c48014b1a0>] context_switch+0x760/0xed0
>>(XEN) [<ffff82c48014913d>] vcpu_kick+0x1d/0x80
>>(XEN) [<ffff82c480107feb>] evtchn_set_pending+0xab/0x1b0
>>(XEN) [<ffff82c4801083a9>] evtchn_send+0x129/0x150
>>(XEN) [<ffff82c480108950>] do_event_channel_op+0x4c0/0xf50
>>(XEN) [<ffff82c4801461b5>] reprogram_timer+0x55/0x90
>>(XEN) [<ffff82c4801461b5>] reprogram_timer+0x55/0x90
>>(XEN) [<ffff82c48011fd44>] timer_softirq_action+0x1a4/0x360
>>(XEN) [<ffff82c4801e6169>] syscall_enter+0xa9/0xae
>>
>>In schedule() this is always "prev" transitioning to RUNSTATE_runnable
>>(i.e. _VPF_blocked not set), yet the second call trace shows that
>>_VPF_blocked must have been set at that point (otherwise
>>vcpu_unblock(), tail-called from vcpu_kick(), would not have called
>>vcpu_wake()). If the order wasn't always that shown, or if the two
>>traces got intermixed, this could hint at a race - but they are always
>>that way, which so far I cannot make sense of.
>>
>
>Such race surely exists, since two paths are each updating multiple
>fileds which however are not all protected with same scheduler lock.
>vcpu_unblock manipulates _VPF_blocked w/o acuquiring scheduler
>lock. Then below sequence is possible:
>
>enter schedule() with _VPF_blocked
>...
><-vcpu_unblock clears _VPF_blocked. poll_mask hasn't been cleared yet
>...
>vcpu_runstate_change(
> prev,
> (test_bit(_VPF_blocked, &prev->pause_flags) ? RUNSTATE_blocked :
> (vcpu_runnable(prev) ? RUNSTATE_runnable : RUNSTATE_offline)),
> now);
>
>Then RUNSTATE_runnable is chosen. Then vcpu_urgent_count_update
>will set is_urgent flag since poll_mask has bit set. Then vcpu_unblock
>clears poll_mask and then invoke vcpu_wake. vcpu_wake wait for
>scheduler lock and then will see is_urgent flag set for a runnable vcpu.
>
>Here one necessary check is missed in vcpu_urgent_count_update, as
>only polling vcpu in blocked state should be cared in cpuidle. If there's
>any runnable vcpu in temporary polling state, idle vcpu won't get
>scheduled. is_urgent can be only set when new runstate is blocked, and
>when vcpu is in poll_mask. As runstate change always happen with
>scheduler lock, this can effectively ensure set/clear of is_urgent.
>
>Thanks
>Kevin
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
|