Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part

To:	Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Subject:	Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
From:	George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
Date:	Mon, 19 Jan 2009 17:15:16 +0000
Cc:	"xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Juergen Gross <juergen.gross@xxxxxxxxxxxxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Delivery-date:	Mon, 19 Jan 2009 09:15:43 -0800
In-reply-to:	<4970C6D3.2080206@xxxxxxxx>
References:	<C59609B4.21084%keir.fraser@xxxxxxxxxxxxx> <4970C6D3.2080206@xxxxxxxx>

On Fri, Jan 16, 2009 at 5:41 PM, Jeremy Fitzhardinge <jeremy@xxxxxxxx> wrote:
> Yes, that's more or less right.  Each lock has a count of how many cpus are
> waiting for the lock; if it's non-zero on unlock, the unlocker kicks all the
> waiting cpus via IPI.  There's a per-cpu variable of "lock I am waiting
> for"; the kicker looks at each cpu's entry and kicks it if it's waiting for
> the lock being unlocked.
>
> The locking side does the expected "spin for a while, then block on
> timeout".  The timeout is settable if you have the appropriate debugfs
> option enabled (which also produces quite a lot of detailed stats about
> locking behaviour).  The IPI is never delivered as an event BTW; the locker
> uses the event poll hypercall to block until the event is pending (this
> hypercall had some performance problems until relatively recent versions of
> Xen; I'm not sure which release versions have the fix).
>
> The lock itself is a simple byte spinlock, with no fairness guarantees; I'm
> assuming (hoping) that the pathological cases that ticket locks were
> introduced to solve will be mitigated by the timeout/blocking path (and/or
> less likely in a virtual environment anyway).
>
> I measured a small performance improvement within the domain with this patch
> (kernbench-type workload), but an overall 10% reduction in system-wide CPU
> use with multiple competing domains.
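To make the scheme described above concrete, here is a minimal C11 sketch, simulated in a single process rather than in a kernel. All names (`pv_spinlock`, `waiting_on`, `kicked_mask`, `SPIN_TIMEOUT`, `NR_CPUS`) are hypothetical and not taken from the pv-ops code. The poll hypercall and the kick IPI are replaced by stand-ins: the caller is assumed to block between `prepare_to_block()` and `finish_block()`, and a "kick" just sets a bit in `kicked_mask`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4          /* hypothetical number of vcpus */
#define SPIN_TIMEOUT 1024  /* hypothetical spin budget before blocking */

struct pv_spinlock {
    atomic_uchar locked;   /* simple byte lock: 0 = free, 1 = held; no fairness */
    atomic_int nr_waiters; /* how many cpus are blocked waiting for this lock */
};

/* Per-cpu "lock I am waiting for" (NULL when not waiting). */
static struct pv_spinlock *_Atomic waiting_on[NR_CPUS];

/* Stand-in for the kick IPI: bit N set means cpu N was kicked. */
static atomic_uint kicked_mask;

static void pv_spin_lock_init(struct pv_spinlock *l)
{
    atomic_init(&l->locked, 0);
    atomic_init(&l->nr_waiters, 0);
}

static bool try_byte_lock(struct pv_spinlock *l)
{
    /* Acquired if the byte was 0 before we set it to 1. */
    return atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0;
}

/* "Spin for a while": true if the lock was taken within the budget.
 * On false, the caller would fall through to the blocking slow path. */
static bool spin_phase(struct pv_spinlock *l)
{
    for (int i = 0; i < SPIN_TIMEOUT; i++)
        if (atomic_load_explicit(&l->locked, memory_order_relaxed) == 0 &&
            try_byte_lock(l))
            return true;
    return false;
}

/* Slow path, before blocking: publish which lock this cpu waits for.
 * A real implementation must re-check the lock after this (so an unlock
 * racing with registration is not missed), then block in the
 * event-channel poll hypercall until kicked or timed out. */
static void prepare_to_block(struct pv_spinlock *l, int cpu)
{
    atomic_store(&waiting_on[cpu], l);
    atomic_fetch_add(&l->nr_waiters, 1);
}

/* Slow path, after waking (kicked or timed out): deregister; the
 * caller then retries the lock. */
static void finish_block(struct pv_spinlock *l, int cpu)
{
    atomic_store(&waiting_on[cpu], NULL);
    atomic_fetch_sub(&l->nr_waiters, 1);
}

static void pv_spin_unlock(struct pv_spinlock *l)
{
    atomic_store(&l->locked, 0); /* seq_cst: ordered before the waiter check */
    if (atomic_load(&l->nr_waiters) == 0)
        return;                  /* fast path: nobody is blocked */
    /* Kick every cpu whose per-cpu entry names this lock. */
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (atomic_load(&waiting_on[cpu]) == l)
            atomic_fetch_or(&kicked_mask, 1u << cpu);
}
```

In this model, unlock stays cheap while `nr_waiters` is zero; the scan of the per-cpu entries only happens once some cpu has actually blocked. Because the lock is a plain byte, whichever cpu next wins the exchange gets it, with no FIFO ordering.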

This is in the pv-ops kernel; is it in the Xen 2.6.18 kernel yet?

The advantage of the blocking approach over yielding is that you don't
have these crazy priority problems.  The reason v0 (which is waiting for
the spinlock) is running right now and v1 (which holds the spinlock)
is not is usually that v1 is out of credits and v0 isn't; so calling
"schedule" often just results in v0 being chosen as the "best
candidate" all over again.  The solution in the patch I sent is to
temporarily reduce the priority of a vcpu on a yield, but that's
inherently a little unpredictable.  (Another option might be to
re-balance credits to vcpus on a yield.)
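The yield problem can be illustrated with a toy model, loosely after the credit scheduler's UNDER/OVER priorities. The enum, `struct vcpu`, and `pick_next()` are hypothetical illustrations, not Xen's scheduler code; the `demote_on_yield` flag stands in for the temporary priority reduction on yield.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical priorities; a larger value is preferred. */
enum prio { PRIO_YIELD = 0, PRIO_OVER = 1, PRIO_UNDER = 2 };

struct vcpu {
    int id;
    int credits;  /* > 0: UNDER its share; <= 0: OVER */
    bool yielded; /* set by a yield, consumed by the next decision */
};

static enum prio effective_prio(const struct vcpu *v, bool demote_on_yield)
{
    if (demote_on_yield && v->yielded)
        return PRIO_YIELD; /* temporarily below even OVER vcpus */
    return v->credits > 0 ? PRIO_UNDER : PRIO_OVER;
}

/* Return the id of the highest-priority vcpu (ties go to the lowest
 * index), then clear all yield flags: the demotion lasts one decision. */
static int pick_next(struct vcpu *vs, int n, bool demote_on_yield)
{
    int best = 0;
    for (int i = 1; i < n; i++)
        if (effective_prio(&vs[i], demote_on_yield) >
            effective_prio(&vs[best], demote_on_yield))
            best = i;
    for (int i = 0; i < n; i++)
        vs[i].yielded = false;
    return vs[best].id;
}
```

With v0 spinning (credits left, UNDER) and v1 holding the lock (out of credits, OVER), a plain yield by v0 makes `pick_next()` choose v0 again; with the one-shot demotion, v1 gets to run. How long such a demotion should last is a tuning choice this model does not address.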

The disadvantage of this approach is that it is rather complicated,
and would have to be re-implemented for each OS.  In theory it could
be implemented in Windows, but it may not be that simple.
And it has to be implemented all-or-nothing for each spinlock; i.e.,
if any caller of spin_lock() for a given lock may block, all callers
of spin_unlock() on that lock need to know to wake the blocker up.  I
don't expect that to be a problem in Windows, but it may be.

Another thing to consider is how the approach applies to a related
problem, that of "synchronous" IPI function calls: i.e., when v0 sends
an IPI to v1 to do something and spins waiting for it to be done,
expecting it to finish quickly.  But if v1 is out of credits, it
doesn't get to run, and v0 burns its credits waiting.

At any rate, I'm working on the scheduler now, and I'll be considering
the "don't-deschedule" option in due time. :-)

Peace,
 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
