WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

Re: [Xen-devel] [xen-unstable test] 6374: regressions - FAIL

>>> On 14.03.11 at 11:52, Tim Deegan <Tim.Deegan@xxxxxxxxxx> wrote:
> At 10:39 +0000 on 14 Mar (1300099174), Jan Beulich wrote:
>> > I think this hang comes because although this code:
>> > 
>> >             cpu = cycle_cpu(CSCHED_PCPU(nxt)->idle_bias, nxt_idlers);
>> >             if ( commit )
>> >                CSCHED_PCPU(nxt)->idle_bias = cpu;
>> >             cpus_andnot(cpus, cpus, per_cpu(cpu_sibling_map, cpu));
>> > 
>> > removes the new cpu and its siblings from cpus, cpu isn't guaranteed to
>> > have been in cpus in the first place, and none of its siblings are
>> > either since nxt might not be its sibling.
>> 
>> I had originally spent quite a while to verify that the loop this is in
>> can't be infinite (i.e. there's going to be always at least one bit
>> removed from "cpus"), and did so again during the last half hour
>> or so.
> 
> I'm pretty sure there are possible passes through this loop that don't
> remove any cpus, though I haven't constructed the full history that gets
> you there.

Actually, while I don't think that this can happen, something else is
definitely broken here: The logic can select a CPU that's not in the
vCPU's affinity mask. How I managed to not note this when I
originally put this change together I can't tell. I'll send a patch in
a moment, and I think after that patch it's also easier to see that
each iteration will remove at least one bit.

>> > which guarantees that nxt will be removed from cpus, though I suspect
>> > this means that we might not pick the best HT pair in a particular core.
>> > Scheduler code is twisty and hurts my brain so I'd like George's
>> > opinion before checking anything in.
>> 
>> No - that was precisely done the opposite direction to get
>> better symmetry of load across all CPUs. With what you propose,
>> idle_bias would become meaningless.
> 
> I don't think see why it would.  As I said, having picked a core we
> might not iterate to pick the best cpu within that core, but the
> round-robining effect is still there.  And even if not I figured a
> hypervisor crash is worse than a suboptimal scheduling decision. :)

Sure. Just that this code has been there for quite a long time, and
it would be really strange to only now see it start producing hangs
(which apparently aren't that difficult to reproduce - iirc a similar
one was sent around by Ian a few days earlier).

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel