At 10:39 +0000 on 14 Mar (1300099174), Jan Beulich wrote:
> > I think this hang comes because although this code:
> >     cpu = cycle_cpu(CSCHED_PCPU(nxt)->idle_bias, nxt_idlers);
> >     if ( commit )
> >         CSCHED_PCPU(nxt)->idle_bias = cpu;
> >     cpus_andnot(cpus, cpus, per_cpu(cpu_sibling_map, cpu));
> > removes the new cpu and its siblings from cpus, cpu isn't guaranteed to
> > have been in cpus in the first place, and none of its siblings are
> > either since nxt might not be its sibling.
> I had originally spent quite a while to verify that the loop this is in
> can't be infinite (i.e. there's going to be always at least one bit
> removed from "cpus"), and did so again during the last half hour
> or so.
I'm pretty sure there are possible passes through this loop that don't
remove any cpus, though I haven't constructed the full history that gets
you there. But the cpupool patches you suggest in your other email look
like much stronger candidates for this hang.
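
To make the concern concrete, here is a toy model of the one step under
discussion. It is not the Xen code: the 32-bit mask type, the
two-threads-per-core topology and the helper names (bit, sibling_map,
remove_siblings) are all assumptions made up for illustration. It only
shows that clearing a cpu's sibling mask shrinks "cpus" when that mask
overlaps "cpus", and removes nothing otherwise. Whether the real loop can
ever reach the second case is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not Xen code: a cpumask is a 32-bit word, one bit per CPU. */
typedef uint32_t mask_t;

static mask_t bit(unsigned int cpu)
{
    assert(cpu < 32);
    return (mask_t)1 << cpu;
}

/* Assumed topology: two threads per core, so CPUs 2k and 2k+1 are siblings. */
static mask_t sibling_map(unsigned int cpu)
{
    return bit(cpu & ~1u) | bit(cpu | 1u);
}

/*
 * Models cpus_andnot(cpus, cpus, per_cpu(cpu_sibling_map, cpu)).
 * Returns true if at least one bit was cleared, i.e. this pass
 * through the loop made progress.
 */
static bool remove_siblings(mask_t *cpus, unsigned int cpu)
{
    mask_t before = *cpus;

    *cpus &= ~sibling_map(cpu);
    return *cpus != before;
}
```

With cpus = {0, 1, 4}, calling remove_siblings(&cpus, 4) clears bit 4 and
reports progress, whereas remove_siblings(&cpus, 2) leaves the mask
unchanged because neither CPU 2 nor CPU 3 was in it. A loop of the form
"while cpus is not empty" only terminates if every pass falls into the
first case.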
> > which guarantees that nxt will be removed from cpus, though I suspect
> > this means that we might not pick the best HT pair in a particular core.
> > Scheduler code is twisty and hurts my brain so I'd like George's
> > opinion before checking anything in.
> No - that was precisely done the opposite direction to get
> better symmetry of load across all CPUs. With what you propose,
> idle_bias would become meaningless.
I don't see why it would. As I said, having picked a core we
might not iterate to pick the best cpu within that core, but the
round-robining effect is still there. And even if not I figured a
hypervisor crash is worse than a suboptimal scheduling decision. :)
Tim Deegan <Tim.Deegan@xxxxxxxxxx>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd. (Company #02937203, SL9 0BG)
Xen-devel mailing list