George Dunlap wrote:
> Wow, I totally missed this thread.
> A couple of thoughts;
> Complicated solutions for the scheduler are a really bad idea. It's
> hard enough to predict and debug the side-effects of simple
> mechanisms; a complex mechanism is doomed to failure at the outset.
> I agree with Jeremy, that the guest shouldn't tell Xen to run a
> specific VCPU. At most it should be something along the lines of, "If
> you're going to run any vcpu from this domain, please run vcpu X."
> Jeremy, do you think that changes to the HV are necessary, or do you
> think that the existing solution is sufficient? It seems to me like
> hinting to the HV to do a directed yield makes more sense than making
> the same thing happen via blocking and event channels. OTOH, that
> gives the guest a lot more control over when and how things happen.
> Mukesh, did you see the patch by Xiantao Zhang a few days ago,
> regarding what to do on an HVM pause instruction? I thought the
> solution he had was interesting: when yielding due to a spinlock,
> rather than going to the back of the queue, just go behind one person.
> I think an implementation of "yield_to" that might make sense in the
> credit scheduler is:
> * Put the yielding vcpu behind one cpu
> * If the yield-to vcpu is not running, pull it to the front within its
> priority. (I.e., if it's UNDER, put it at the front so it runs next;
> if it's OVER, make it the first OVER cpu.)
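To make the two quoted steps concrete, here is a rough Python sketch of how such a "yield_to" could reorder a run queue. This is only a toy model, not Xen code: the Vcpu class, the list-based run queue and the function names are all hypothetical, and I am assuming that "behind one" means behind the first queued vcpu of the yielder's own priority, and that the target is moved before the yielder is re-queued.

```python
from dataclasses import dataclass

UNDER, OVER = 0, 1  # lower value is queued (and run) first


@dataclass
class Vcpu:
    name: str
    prio: int


def first_index_of_prio(runq, prio):
    """Index where vcpus of priority `prio` start (or would start) in runq."""
    for i, v in enumerate(runq):
        if v.prio >= prio:
            return i
    return len(runq)


def yield_to(runq, yielder, target):
    """Reorder runq (head runs next) when the running `yielder` yields to `target`.

    Assumption: runq is kept sorted by priority, and `yielder` is currently
    running, so it is not in runq yet.
    """
    # Step 2 of the proposal: if the target is not running (i.e. it is
    # queued), pull it to the front of its own priority class.
    if target in runq:
        runq.remove(target)
        runq.insert(first_index_of_prio(runq, target.prio), target)

    # Step 1 of the proposal: re-queue the yielder behind one vcpu of its
    # own priority class instead of at the tail of that class.
    start = first_index_of_prio(runq, yielder.prio)
    if start < len(runq) and runq[start].prio == yielder.prio:
        start += 1
    runq.insert(start, yielder)
    return runq
```

With a queue of [a (UNDER), t (UNDER), b (OVER)] and an UNDER yielder y yielding to t, this gives t, y, a, b: the target runs next and the yielder waits behind exactly one vcpu.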
What Xiantao (and I, internally) proposed is to implement temporary
coscheduling to solve spin-lock issues for both FIFO and ordinary spin-locks,
using the PLE exit (it can of course work with PV spin-locks as well).
Here is our thinking (please refer to Xiantao's mail as well):
There are 2 typical solutions to improve spin-lock efficiency in
virtualization: A) lock holder preemption avoidance (or co-scheduling), and B)
helping locks, which donate the spinning CPU's cycles to the overall system.
#A solves the spin-lock issue best. However, it requires either hardware
assistance to detect the lock holder, which is impractical, or coscheduling,
which is hard to implement efficiently and sacrifices a lot of scheduler
flexibility. Neither Xen nor KVM implements it.
#B (the current Xen policy of yielding on PLE) may help system performance,
but it may not help the performance of the spinning guest. In some cases the
guest may even get worse because of the long wait (yield) for the spin-lock. In
other cases it may get back additional CPU cycles (and performance) from the VMM
scheduler in return for its earlier donation of CPU cycles. In general, #B may
help system performance if the system is appropriately overcommitted, but it can
also hurt the "speed" of a single guest, depending on the case.
An additional issue with #B is that it may hurt FIFO spin-locks (the ticket
spin-lock in Linux, and the queued spin-lock in Windows since Windows 2000),
where by OS design only the first-in waiting VCPU is able to get the lock.
Current PLE cannot know which VCPU is the next (first-in) waiter and which one
is the lock holder.
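To illustrate why FIFO locks are sensitive to which VCPU gets scheduled, here is a toy Python model of a ticket lock. It is only a sketch of the general ticket-lock idea, not the Linux or Windows implementation; the class, the function names and the VCPU names are made up for illustration.

```python
class TicketLock:
    """Toy FIFO lock: waiters are served strictly in ticket order."""

    def __init__(self):
        self.next_ticket = 0  # next ticket to hand out
        self.now_serving = 0  # only this ticket may take the lock

    def take_ticket(self):
        ticket = self.next_ticket
        self.next_ticket += 1
        return ticket

    def try_enter(self, ticket):
        return ticket == self.now_serving

    def release(self):
        self.now_serving += 1


def who_can_acquire(lock, tickets, running):
    """Return the running vcpus whose ticket is currently being served."""
    return [v for v in running if lock.try_enter(tickets[v])]


lock = TicketLock()
tickets = {v: lock.take_ticket() for v in ("vcpu0", "vcpu1", "vcpu2")}
lock.release()  # vcpu0 (ticket 0) is done, so ticket 1 is served next

# If vcpu1 is preempted, nobody who is running can take the lock.
stuck = who_can_acquire(lock, tickets, ["vcpu0", "vcpu2"])
# Progress is only possible once vcpu1 itself is scheduled in.
progress = who_can_acquire(lock, tickets, ["vcpu1", "vcpu2"])
```

In this model `stuck` is empty and `progress` contains only vcpu1: yielding the spinning vcpu2 to an arbitrary other VCPU does not help unless the first-in waiter happens to be the one that runs.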
Lock holder preemption avoidance is the right way to fully utilize the
hardware PLE capability. The current solution simply hurts performance, and
we need to improve it with solution #A.
Current hardware is unable to tell which VCPU is the lock holder or which
one is the next (first-in) waiting VCPU, so coscheduling may be the choice.
However, coscheduling has many side effects as well (somebody said another
company using co-scheduling is going to give it up too). This proposal
is to do temporary coscheduling on top of existing VMM scheduling. The details:
When one or more VCPUs of a guest are waiting for a spin-lock, we can
temporarily increase the priority of all VCPUs of the same guest so that they
are scheduled in for a short period. The period is kept pretty small to limit
the impact of "coscheduling" on the overall VMM scheduler. The current Xen patch
simply "boosts" the VCPUs, which already shows a great gain, but there may be
more tuning to do on the parameters of this algorithm.
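As a rough illustration of the idea (not the actual patch), the temporary boost could be modelled like this in Python. Everything here is an assumption made for the sketch: the Vcpu class, the on_ple_exit and pick_order names, the tick-based clock and the BOOST_TICKS value are all hypothetical.

```python
from dataclasses import dataclass

BOOST_TICKS = 2  # hypothetical boost period; the real value needs tuning


@dataclass
class Vcpu:
    name: str
    domain: int
    boost_until: int = 0  # tick at which the temporary boost expires


def on_ple_exit(vcpus, spinner, now, boost_ticks=BOOST_TICKS):
    """A vcpu of some guest is spinning: boost every vcpu of that guest."""
    for v in vcpus:
        if v.domain == spinner.domain:
            v.boost_until = now + boost_ticks


def pick_order(vcpus, now):
    """Boosted vcpus first; otherwise keep the existing order (stable sort)."""
    return sorted(vcpus, key=lambda v: v.boost_until <= now)


vcpus = [Vcpu("a0", 1), Vcpu("b0", 2), Vcpu("a1", 1), Vcpu("b1", 2)]
on_ple_exit(vcpus, vcpus[1], now=10)  # b0 reports a PLE exit at tick 10
during_boost = [v.name for v in pick_order(vcpus, now=11)]
after_boost = [v.name for v in pick_order(vcpus, now=12)]
```

While the boost is active, both VCPUs of the spinning guest (b0 and b1) are ordered ahead of the others; once it expires, the order falls back to what the normal scheduler would have picked, which is the "temporary" part of the coscheduling.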
I believe this will be a perfect solution to the spin-lock issue with PLE
for now (when the number of VCPUs is not dramatically large). vConsolidate (a
mix of Linux and Windows guests) shows a 19% consolidation performance gain,
which is almost too good to believe, but it is true :) We are investing more
effort in different workloads, and will post a new patch soon.
Of course, if a PV guest is running in a PVM container, the PVed spin-lock
is still needed. But I doubt it is necessary if PVM is running on top of an
HVM container :)
Xen-devel mailing list