

RE: [Xen-devel] Linux spin lock enhancement on xen

To: George Dunlap <dunlapg@xxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Subject: RE: [Xen-devel] Linux spin lock enhancement on xen
From: "Dong, Eddie" <eddie.dong@xxxxxxxxx>
Date: Wed, 25 Aug 2010 09:03:56 +0800
Accept-language: en-US
Cc: Jeremy Fitzhardinge <jeremy@xxxxxxxx>, "Xen-devel@xxxxxxxxxxxxxxxxxxx" <Xen-devel@xxxxxxxxxxxxxxxxxxx>, "Dong, Eddie" <eddie.dong@xxxxxxxxx>
In-reply-to: <AANLkTin_HTtxL9wB9JcxDWFeGGYHKHfBxGW4dPrYKDGb@xxxxxxxxxxxxxx>
References: <4C6C0C3D.2070508@xxxxxxxx> <C891D252.1E4BD%keir.fraser@xxxxxxxxxxxxx> <AANLkTin_HTtxL9wB9JcxDWFeGGYHKHfBxGW4dPrYKDGb@xxxxxxxxxxxxxx>
George Dunlap wrote:
> Wow, I totally missed this thread.
> A couple of thoughts;
> Complicated solutions for the scheduler are a really bad idea.  It's
> hard enough to predict and debug the side-effects of simple
> mechanisms; a complex mechanism is doomed to failure at the outset.
> I agree with Jeremy, that the guest shouldn't tell Xen to run a
> specific VCPU.  At most it should be something along the lines of, "If
> you're going to run any vcpu from this domain, please run vcpu X."
> Jeremy, do you think that changes to the HV are necessary, or do you
> think that the existing solution is sufficient?  It seems to me like
> hinting to the HV to do a directed yield makes more sense than making
> the same thing happen via blocking and event channels.  OTOH, that
> gives the guest a lot more control over when and how things happen.
> Mukesh, did you see the patch by Xiantao Zhang a few days ago,
> regarding what to do on an HVM pause instruction?  I thought the
> solution he had was interesting: when yielding due to a spinlock,
> rather than going to the back of the queue, just go behind one person.
>  I think an implementation of "yield_to" that might make sense in the
> credit scheduler is:
> * Put the yielding vcpu behind one cpu
> * If the yield-to vcpu is not running, pull it to the front within its
> priority.  (I.e., if it's UNDER, put it at the front so it runs next;
> if it's OVER, make it the first OVER cpu.)
> Thoughts?
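
The two yield_to rules quoted above can be sketched as a toy Python model
(this is not Xen code). The names Vcpu, yield_to and _class_start, the
list-based run queue, and the choice to apply "behind one" within the
yielder's own priority class are all assumptions made for illustration.

```python
from dataclasses import dataclass

UNDER, OVER = 0, 1  # toy priorities: lower value runs first


@dataclass
class Vcpu:
    name: str
    prio: int
    running: bool = False


def _class_start(runq, prio):
    """Index of the first queued vcpu whose priority is `prio` or worse."""
    for i, v in enumerate(runq):
        if v.prio >= prio:
            return i
    return len(runq)


def yield_to(runq, yielder, target):
    """Model a directed yield on a queue ordered UNDER first, then OVER.

    `runq` holds runnable vcpus that are not currently running;
    `yielder` is the running vcpu that gives up its physical CPU.
    """
    # Rule 2: if the target is queued (not running), pull it to the
    # front of its own priority class.
    if target in runq:
        runq.remove(target)
        runq.insert(_class_start(runq, target.prio), target)

    # Rule 1: requeue the yielder behind one vcpu of its class
    # (assumption), instead of at the tail of the queue.
    yielder.running = False
    start = _class_start(runq, yielder.prio)
    end = _class_start(runq, yielder.prio + 1)
    runq.insert(min(start + 1, end), yielder)
    return runq
```

In this model an UNDER target ends up at the head of the queue with the
yielder right behind it, while an OVER target only becomes the first OVER
vcpu, so it never jumps ahead of vcpus that still have credit.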

        What Xiantao proposed (and what I proposed internally) is temporary
coscheduling, triggered by PLE exits, to solve spin-lock issues for both FIFO
and ordinary spin-locks (it can of course work with PV spin-locks as well).
Here is our thinking (please refer to Xiantao's mail as well):

        There are two typical solutions for improving spin-lock efficiency
under virtualization: A) lock-holder preemption avoidance (or coscheduling),
and B) helping locks, which donate the spinning VCPU's CPU cycles to the
overall system.

        #A solves the spin-lock issue best. However, it requires either
hardware assistance to detect the lock holder, which is impractical, or
coscheduling, which is hard to implement efficiently and sacrifices a lot of
scheduler flexibility. Neither Xen nor KVM has implemented it.

        #B (the current Xen policy, with PLE yielding) may help system
performance, but it may not help the performance of the spinning guest. In some
cases the guest may do even worse because of the long wait (yield) for the
spin-lock. In other cases it may get back additional CPU cycles (and
performance) from the VMM scheduler as compensation for its earlier CPU cycle
donation. In general, #B may help system performance if the system is
appropriately overcommitted, but it may also hurt the "speed" of a single
guest, depending on the case.
        An additional issue with #B is that it may hurt FIFO spin-locks (the
ticket spin-lock in Linux and the queued spin-lock in Windows, from Windows
2000 on), where by OS design only the first-in waiting VCPU is able to get the
lock. Current PLE cannot tell which VCPU is the next (first-in) waiter and
which one is the lock holder.
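
The FIFO constraint can be illustrated with a toy ticket lock in Python.
This is a single-threaded model for illustration only, not the Linux or
Windows implementation; TicketLock, can_progress and the vcpu names are
hypothetical.

```python
class TicketLock:
    """Toy ticket lock: waiters are served strictly in ticket order."""

    def __init__(self):
        self.next_ticket = 0   # next ticket to hand out
        self.now_serving = 0   # the only ticket allowed to take the lock

    def take_ticket(self):
        ticket = self.next_ticket
        self.next_ticket += 1
        return ticket

    def try_enter(self, ticket):
        # A waiter succeeds only when its own ticket is being served.
        return ticket == self.now_serving

    def release(self):
        self.now_serving += 1


def can_progress(lock, tickets, scheduled):
    """Names of the scheduled vcpus that could take the lock right now."""
    return sorted(n for n in scheduled if lock.try_enter(tickets[n]))


lock = TicketLock()
tickets = {name: lock.take_ticket() for name in ("vcpu0", "vcpu1", "vcpu2")}
lock.release()  # vcpu0 (ticket 0) has finished; ticket 1 is served next

# If the hypervisor runs only vcpu2, nobody makes progress, because the
# lock can only go to vcpu1.
stalled = can_progress(lock, tickets, {"vcpu2"})
# Once vcpu1 is also running, it (and only it) can take the lock.
moving = can_progress(lock, tickets, {"vcpu1", "vcpu2"})
```

A yield that hands the physical CPU to an arbitrary sibling VCPU therefore
helps only if that sibling happens to be the lock holder or the next
ticket owner.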

[Proposed optimization]
        Lock-holder preemption avoidance is the right way to fully utilize the
hardware PLE capability. The current solution simply hurts performance, and we
need to improve it with solution #A.

        Given that current hardware cannot tell which VCPU is the lock holder
or which one is the next (first-in) waiting VCPU, coscheduling may be the
choice. However, coscheduling has many side effects as well (somebody said that
another company using coscheduling is going to give it up too). This proposal
is to do temporary coscheduling on top of the existing VMM scheduling. The
details:
        When one or more VCPUs of a guest are waiting for a spin-lock, we can
temporarily raise the priority of all VCPUs of the same guest so that they are
scheduled in for a short period. The period is kept very small to limit the
impact of this "coscheduling" on the overall VMM scheduler. The current Xen
patch simply "boosts" the VCPUs, which already shows a great gain, but the
parameters of this algorithm can probably be tuned further.
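
A minimal sketch of the temporary boost described above, as a toy Python
model and not the actual Xen patch. The names GuestVcpu, on_spin_exit and
schedule_order, the three-level priority ordering, and the 500 us window
are assumptions for illustration; the real window length is a tuning
parameter.

```python
from dataclasses import dataclass

BOOST, UNDER, OVER = 0, 1, 2   # toy ordering: lower value is scheduled first
BOOST_WINDOW_US = 500          # hypothetical boost window in microseconds


@dataclass
class GuestVcpu:
    dom: int
    vid: int
    base_prio: int
    boost_until_us: int = 0    # boost is active while now < boost_until_us

    def effective_prio(self, now_us):
        return BOOST if now_us < self.boost_until_us else self.base_prio


def on_spin_exit(vcpus, spinner, now_us, window_us=BOOST_WINDOW_US):
    """On a spin-detection exit, boost every vcpu of the spinner's guest."""
    for v in vcpus:
        if v.dom == spinner.dom:
            v.boost_until_us = now_us + window_us


def schedule_order(vcpus, now_us):
    """Order in which the toy scheduler would pick vcpus at time now_us."""
    # sorted() is stable, so vcpus of equal priority keep their queue order.
    return sorted(vcpus, key=lambda v: v.effective_prio(now_us))
```

Because the boost simply expires with time, the sibling VCPUs fall back to
their normal priorities without any explicit "unboost" step, which keeps
the effect on other guests bounded by the window length.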
        I believe this will be a perfect solution to the spin-lock issue with
PLE for now (when the number of VCPUs is not dramatically large). vConsolidate
(a mix of Linux and Windows guests) shows a 19% consolidation performance gain,
which is almost too great to believe, but it is true :)  We are investigating
further with different workloads, and will post a new patch soon.
        Of course, if a PV guest is running in a PVM container, the PV
spin-lock is still needed. But I doubt it is necessary if PVM is running on
top of an HVM container :)

Thx, Eddie
