This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


Re: [Xen-devel] Linux spin lock enhancement on xen

To: Mukesh Rathor <mukesh.rathor@xxxxxxxxxx>
Subject: Re: [Xen-devel] Linux spin lock enhancement on xen
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Wed, 18 Aug 2010 09:37:17 -0700
Cc: "Xen-devel@xxxxxxxxxxxxxxxxxxx" <Xen-devel@xxxxxxxxxxxxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Delivery-date: Wed, 18 Aug 2010 09:38:03 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20100817185807.10628599@xxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <20100816183357.08623c4c@xxxxxxxxxxxxxxxxxxxx> <4C6ACA28.7030104@xxxxxxxx> <20100817185807.10628599@xxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv: Gecko/20100720 Fedora/3.1.1-1.fc13 Lightning/1.0b2pre Thunderbird/3.1.1
 On 08/17/2010 06:58 PM, Mukesh Rathor wrote:
>> How does this compare with Jeremy's existing paravirtualised spinlocks
>> in pv_ops? They required no hypervisor changes. Cc'ing Jeremy.
>> -- Keir
> Yeah, I looked at it today. What pv-ops is doing is forcing a yield
> via a fake irq/event channel poll, after storing the lock pointer in
> a per cpu area. The unlock'er then IPIs the vcpus waiting. The lock
> holder may not be running tho, and there is no hint to hypervisor
> to run it. So you may have many waiters come and leave for no
> reason.

(They don't leave for no reason; they leave when they're told they can
take the lock next.)
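
As a rough illustration of the bookkeeping described above (each waiter
records which lock it is blocked on in a per-cpu slot, and the unlocker
scans those slots to decide whom to kick), here is a minimal
single-threaded sketch. It is not the pv_ops code: NR_VCPUS, waiting_on,
kicked, note_waiting, done_waiting and kick_waiters are hypothetical
names, the kicked[] flag merely stands in for the IPI/event-channel
notification, and kicking every matching waiter is a simplifying
assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_VCPUS 4                      /* hypothetical guest size */

/* Per-vcpu slot: the lock this vcpu is blocked on, or NULL if none. */
static const void *waiting_on[NR_VCPUS];
/* Stand-in for the notification a real unlocker would send. */
static bool kicked[NR_VCPUS];

/* Waiter side: publish the lock pointer before going to sleep. */
static void note_waiting(int vcpu, const void *lock)
{
    assert(vcpu >= 0 && vcpu < NR_VCPUS);
    waiting_on[vcpu] = lock;
    kicked[vcpu] = false;
}

/* Waiter side: clear the slot once the vcpu stops waiting. */
static void done_waiting(int vcpu)
{
    assert(vcpu >= 0 && vcpu < NR_VCPUS);
    waiting_on[vcpu] = NULL;
}

/* Unlocker side: kick only vcpus registered on this lock.
 * Returns how many were kicked. */
static int kick_waiters(const void *lock)
{
    int n = 0;

    for (int v = 0; v < NR_VCPUS; v++) {
        if (waiting_on[v] == lock) {
            kicked[v] = true;
            n++;
        }
    }
    return n;
}
```

The point of the per-vcpu slot is that a waiter is only woken for the
lock it actually registered on, so it does not come and go for
unrelated unlocks.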

I don't see why the guest should micromanage Xen's scheduler decisions. 
If a VCPU is waiting for another VCPU and can put itself to sleep in the
meantime, then it's up to Xen to take advantage of that newly freed PCPU
to schedule something.  It may decide to run something in your domain
that's runnable, or it may decide to run something else.  There's no
reason why the spinlock holder is the best VCPU to run overall, or even
the best VCPU in your domain.

My view is you should just put any VCPU which has nothing to do to
sleep, and let Xen sort out the scheduling of the remainder.

> To me this is more of an overhead than needed in a guest. In my
> approach, the hypervisor is hinted exactly which vcpu is the 
> lock holder.

The slow path should be rare.  In general locks should be taken
uncontended, or with brief contention.  Locks should be held for a short
period of time, so risk of being preempted while holding the lock should
be low.  The effects of the preemption are pretty disastrous, so we need
to handle it, but the slow path will be rare, so the time spent handling
it is not a critical factor (and can be compensated for by tuning the
timeout before dropping into the slow path).
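
To make the fast-path/slow-path split concrete, here is a userspace
sketch using C11 atomics and pthreads. It is an analogy under stated
assumptions, not the kernel implementation: SPIN_THRESHOLD is a
hypothetical tunable playing the role of the timeout before dropping
into the slow path, the condition variable stands in for blocking on an
event channel, and all names (pv_lock, pv_trylock, pv_lock_acquire,
pv_lock_release, run_workers) are made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SPIN_THRESHOLD 1024     /* hypothetical: spins before blocking */
#define MAX_WORKERS    16

struct pv_lock {
    atomic_bool     locked;
    atomic_int      sleepers;   /* waiters currently in the slow path */
    pthread_mutex_t mtx;        /* protects the condition variable */
    pthread_cond_t  kick;       /* stand-in for the unlock notification */
};

static void pv_lock_init(struct pv_lock *l)
{
    atomic_init(&l->locked, false);
    atomic_init(&l->sleepers, 0);
    pthread_mutex_init(&l->mtx, NULL);
    pthread_cond_init(&l->kick, NULL);
}

static bool pv_trylock(struct pv_lock *l)
{
    return !atomic_exchange(&l->locked, true);
}

static void pv_lock_acquire(struct pv_lock *l)
{
    for (;;) {
        /* Fast path: uncontended or briefly contended. */
        for (int i = 0; i < SPIN_THRESHOLD; i++)
            if (pv_trylock(l))
                return;

        /* Slow path: stop burning CPU and sleep until kicked. */
        pthread_mutex_lock(&l->mtx);
        atomic_fetch_add(&l->sleepers, 1);
        while (atomic_load(&l->locked))
            pthread_cond_wait(&l->kick, &l->mtx);
        atomic_fetch_sub(&l->sleepers, 1);
        pthread_mutex_unlock(&l->mtx);
        /* Loop back and compete for the lock again. */
    }
}

static void pv_lock_release(struct pv_lock *l)
{
    assert(atomic_load(&l->locked));
    atomic_store(&l->locked, false);
    /* Only pay for the wakeup when someone is actually asleep. */
    if (atomic_load(&l->sleepers) > 0) {
        pthread_mutex_lock(&l->mtx);
        pthread_cond_broadcast(&l->kick);
        pthread_mutex_unlock(&l->mtx);
    }
}

struct job { struct pv_lock *lock; long *counter; int iters; };

static void *worker(void *arg)
{
    struct job *j = arg;

    for (int i = 0; i < j->iters; i++) {
        pv_lock_acquire(j->lock);
        ++*j->counter;          /* short critical section */
        pv_lock_release(j->lock);
    }
    return NULL;
}

/* Usage example: nthreads workers each add iters to a shared counter. */
static long run_workers(int nthreads, int iters)
{
    struct pv_lock lock;
    pthread_t tid[MAX_WORKERS];
    long counter = 0;
    struct job j = { &lock, &counter, iters };

    assert(nthreads > 0 && nthreads <= MAX_WORKERS);
    pv_lock_init(&lock);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tid[i], NULL, worker, &j);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return counter;
}
```

Raising SPIN_THRESHOLD makes the slow path rarer at the cost of more
spinning when the holder really has been preempted; lowering it does
the opposite. The uncontended acquire and release never touch the
mutex, which is why the cost of the slow path itself matters little.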

>  Often many VCPUs are pinned to a set of physical cpus
> due to licensing and other reasons. So this really helps a vcpu
> that is holding a spin lock, wanting to do some possibly real
> time work, get scheduled and move on.

I'm not sure I understand this point.  If you're pinning vcpus to pcpus,
then presumably you're not going to share a pcpu among many, or any,
vcpus, so the lock holder will be able to run any time it wants.  And a
directed yield will only help if the lock waiter is sharing the same
pcpu as the lock holder, so it can hand over its timeslice (since making
the directed yield preempt something already running in order to run
your target vcpu seems rude and ripe for abuse).

>  Moreover, number of vcpus is
> going up pretty fast.

Presumably the number of pcpus is also going up, so the amount of
per-pcpu overcommit is about the same.

