Juergen,
What is supposed to happen if a domain is in cpupool0, and then all of
the cpus are taken out of cpupool0? Is that possible?
It looks like there's code in cpupool.c:cpupool_unassign_cpu() which
will move all VMs in a cpupool to cpupool0 before removing the last
cpu. But what happens if cpupool0 is the pool that's being emptied?
It seems like that would break a number of assumptions; e.g.,
sched_move_domain() seems to assume that the pool we're moving a VM to
actually has cpus.
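To make the failure mode concrete, here's a toy model of the path I'm
worried about (standalone C; all the names are made-up stand-ins, not
the actual cpupool code):

/*
 * Toy model of the emptying-cpupool0 hazard.  Struct and function
 * names are illustrative, not the real Xen interfaces.
 */
#include <assert.h>
#include <stdio.h>

struct cpupool {
    int id;
    int n_cpus;     /* cpus currently assigned to this pool */
    int n_domains;  /* domains currently running in this pool */
};

/* Stand-in for sched_move_domain(): note the implicit assumption that
 * the destination pool has at least one cpu to put the vcpus on. */
static void move_domain(struct cpupool *from, struct cpupool *to)
{
    assert(to->n_cpus > 0);   /* the assumption that looks unprotected */
    from->n_domains--;
    to->n_domains++;
}

/* Stand-in for the last-cpu path of cpupool_unassign_cpu(): evacuate
 * all domains to cpupool0, then drop the cpu. */
static void unassign_last_cpu(struct cpupool *pool, struct cpupool *pool0)
{
    if (pool != pool0)
        while (pool->n_domains > 0)
            move_domain(pool, pool0);
    /* If pool == pool0 there is nowhere to evacuate to: the domains
     * stay behind in a pool that is about to have zero cpus. */
    pool->n_cpus--;
}

int main(void)
{
    struct cpupool pool0 = { .id = 0, .n_cpus = 1, .n_domains = 1 };

    unassign_last_cpu(&pool0, &pool0);
    printf("pool0 now has %d cpus and %d domains\n",
           pool0.n_cpus, pool0.n_domains);   /* 0 cpus, 1 domain */
    return 0;
}

Running that leaves pool0 with zero cpus and a domain still in it,
which is the state I don't see handled anywhere.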
While we're at it, what's with the "(cpu != cpu_moving_cpu)" in the
first half of cpupool_unassign_cpu()? Under what conditions are you
anticipating cpupool_unassign_cpu() being called a second time before
the first completes? If you have to abort the move because
schedule_cpu_switch() failed, wouldn't it be better just to roll the
whole transaction back, rather than leaving it hanging in the middle?
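To be clear about what I mean by rolling it back, something shaped
like this (invented names again; simulate_switch_failure() just stands
in for a failing schedule_cpu_switch()):

/*
 * Sketch of undoing a failed cpu move instead of parking the cpu
 * behind cpu_moving_cpu for a later call to finish.
 */
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

static bool simulate_switch_failure(int cpu)
{
    (void)cpu;
    return true;   /* pretend schedule_cpu_switch() returned an error */
}

static int unassign_cpu(int cpu, int pool_id)
{
    /* Step 1: take the cpu out of the pool's cpumask. */
    printf("cpu %d: removed from pool %d's cpumask\n", cpu, pool_id);

    /* Step 2: hand the cpu over to the free pool's scheduler. */
    if (simulate_switch_failure(cpu)) {
        /* Undo step 1 and fail cleanly, rather than leaving the cpu
         * in a half-moved state. */
        printf("cpu %d: switch failed, restoring pool %d's cpumask\n",
               cpu, pool_id);
        return -EAGAIN;
    }
    return 0;
}

int main(void)
{
    printf("unassign_cpu returned %d\n", unassign_cpu(3, 0));
    return 0;
}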
Hmm, and why does RMCPU call cpupool_get_by_id() with exact==0? What
could possibly be the use of grabbing a random cpupool and then trying
to remove the specified cpu from it?
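For reference, this is my reading of what exact==0 does -- an
illustrative reimplementation, not the real cpupool_get_by_id():

/*
 * With exact==0 the caller gets the first pool whose id is >= the
 * requested one, so RMCPU can end up operating on a pool the caller
 * never named.  Standalone sketch with made-up names.
 */
#include <stddef.h>
#include <stdio.h>

struct cpupool {
    int id;
    struct cpupool *next;   /* list kept sorted by id */
};

static struct cpupool *pool_get_by_id(struct cpupool *list, int id, int exact)
{
    struct cpupool *c;

    for (c = list; c != NULL; c = c->next)
        if (c->id == id || (!exact && c->id > id))
            return c;
    return NULL;
}

int main(void)
{
    struct cpupool p9 = { .id = 9, .next = NULL };
    struct cpupool p0 = { .id = 0, .next = &p9 };

    /* Ask for pool 5 with exact==0: pool 9 comes back, and a RMCPU
     * built on this would remove the cpu from the wrong pool. */
    struct cpupool *c = pool_get_by_id(&p0, 5, 0);
    printf("asked for pool 5, got pool %d\n", c ? c->id : -1);
    return 0;
}

So a caller passing a stale pool id would silently operate on whatever
pool happens to sort next.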
Andre, you might think about folding the attached patch into your debug patch.
-George
On Mon, Feb 7, 2011 at 1:32 PM, Juergen Gross
<juergen.gross@xxxxxxxxxxxxxx> wrote:
> On 02/07/11 13:38, Andre Przywara wrote:
>>
>> Juergen,
>>
>> As promised, here is some more debug data. This is from c/s 22858 with
>> Stephan's debug patch (attached).
>> We get the following dump when the hypervisor crashes; note that the
>> first lock is different from the second and subsequent ones:
>>
>> (XEN) sched_credit.c, 572: prv: ffff831836df2970 &prv->lock: ffff831836df2970 prv->weight: 256 sdom->active_vcpu_count: 3 sdom->weight: 256
>> (XEN) sched_credit.c, 572: prv: ffff830437ffa5e0 &prv->lock: ffff830437ffa5e0 prv->weight: 768 sdom->active_vcpu_count: 4 sdom->weight: 256
>> (XEN) sched_credit.c, 572: prv: ffff830437ffa5e0 &prv->lock: ffff830437ffa5e0 prv->weight: 1024 sdom->active_vcpu_count: 5 sdom->weight: 256
>> (XEN) sched_credit.c, 572: prv: ffff830437ffa5e0 &prv->lock: ffff830437ffa5e0 prv->weight: 1280 sdom->active_vcpu_count: 6 sdom->weight: 256
>>
>> ....
>>
>> Hope that gives you an idea. I attach the whole log for your reference.
>
> Hmm, could it be your log wasn't created with the attached patch? I'm
> missing Dom-Id and VCPU from the printk() above, which would be
> interesting (at least I hope so)...
> Additionally printing the local pcpu number would help, too.
> And could you add a printk for the new prv address in csched_init()?
>
> It would be nice if you could enable cpupool diag output. Please use
> the attached patch (includes the previous patch for executing the cpu
> move on the cpu to be moved, plus some diag printk corrections).
>
>
> Juergen
>
> --
> Juergen Gross                 Principal Developer Operating Systems
> TSP ES&S SWE OS6              Telephone: +49 (0) 89 3222 2967
> Fujitsu Technology Solutions  e-mail: juergen.gross@xxxxxxxxxxxxxx
> Domagkstr. 28                 Internet: ts.fujitsu.com
> D-80807 Muenchen              Company details: ts.fujitsu.com/imprint.html
>
Attachment: cpupools-bug-on-move-to-self.diff