Re: [Xen-devel] [RFC] Scheduler work, part 1: High-level goals a

To:	Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Subject:	Re: [Xen-devel] [RFC] Scheduler work, part 1: High-level goals and interface.
From:	George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
Date:	Wed, 15 Apr 2009 15:29:56 +0100
Cc:	"xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
In-reply-to:	<49DE415F.3060002@xxxxxxxx>
References:	<de76405a0904090858g145f07cja3bd7ccbd6b30ce9@xxxxxxxxxxxxxx> <49DE415F.3060002@xxxxxxxx>
Sender:	xen-devel-bounces@xxxxxxxxxxxxxxxxxxx

On Thu, Apr 9, 2009 at 7:41 PM, Jeremy Fitzhardinge <jeremy@xxxxxxxx> wrote:
> I don't remember if there's a proper term for this, but what about having
> multiple domains sharing the same scheduling context, so that a stub domain
> can be co-scheduled with its main domain, rather than having them treated
> separately?

I think it's been informally called "co-scheduling". :-)  Yes, I'm
going to be looking into that.  One of the things that makes it a
little less easy is that (as I understand it) there is only one stub
domain "vcpu" per VM, which is shared by all of the VM's vcpus.

> Also, a somewhat related point, some kind of directed schedule so that when
> one vcpu is synchronously waiting on another vcpu, have it directly hand
> over its pcpu to avoid any cross-cpu overhead (including the ability to take
> advantage of directly using hot cache lines).  That would be useful for
> intra-domain IPIs, etc, but also inter-domain context switches
> (domain<->stub, frontend<->backend, etc).

The only problem is if the "service" domain still has other work to do
after it has woken the waiting domain.  In my tests on a 2-core box doing scp to an HVM
guest, it's faster if I pin dom0 and domU to separate cores than if I
pin them to the same core.  Looking at the traces, it seems as though
after dom0 has woken up domU, it spends another 50K cycles or so
before blocking.  Stub domains may behave differently; in any case,
it's something that needs experimentation to decide.

>> For example, one could give dom0 a "reservation" of 50%, but leave the
>> weight at 256.  No matter how many other VMs run with a weight of 256,
>> dom0 will be guaranteed to get 50% of one cpu if it wants it.
>>
>
> How does the reservation interact with the credits?  Is the reservation in
> addition to its credits, or does using the reservation consume them?

I think your question is, how does the reservation interact with
weight?  (Credits is the mechanism to implement both.)  The idea is
that a VM would get either an amount of cpu proportional to its
weight, or the reservation, whichever is greater.

So suppose that VMs A, B, and C have weights of 256 on a system with 1
core, no reservations.

If A and B are burning as much cpu as they can and C is idle, then A
and B should get 50% each.

If all of them (A,B,C) are burning as much cpu as they can, they
should get 33% each.

Now suppose that we give B a reservation of 40%.

If A and B are burning as much as they can and C is idle, then A and B
should again get 50% each.

However, if all of them are burning as much as they can, then B should
get 40% (its reservation), and A and C should each get 30% (i.e., the
remaining 60% divided by weight).
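
As a rough illustration of the arithmetic above (not actual scheduler
code), here is a small Python sketch.  The function name `cpu_shares`,
the dict layout, and the "pin at reservation, then re-split the rest by
weight" loop are assumptions made for this example; it only considers
runnable VMs on one core and does not handle reservations that add up
to more than the available cpu.

```python
from fractions import Fraction


def cpu_shares(vms, capacity=100):
    """Hypothetical helper: vms maps name -> (weight, reservation_percent)
    for the VMs that are currently burning cpu.  Returns name -> percent
    of one core, where each VM gets the larger of its weight-proportional
    share and its reservation."""
    pinned = {}          # VMs held at their reservation
    free = dict(vms)     # VMs still sharing the remainder by weight
    while True:
        remaining = Fraction(capacity) - sum(pinned.values())
        total_weight = sum(w for w, _ in free.values())
        shares = {n: remaining * w / total_weight
                  for n, (w, _) in free.items()}
        # VMs whose proportional share falls below their reservation
        short = [n for n, (_, r) in free.items() if shares[n] < r]
        if not short:
            return {**pinned, **shares}
        for n in short:
            pinned[n] = Fraction(free.pop(n)[1])


# A and B busy, C idle: B's proportional share already exceeds 40%.
print(cpu_shares({"A": (256, 0), "B": (256, 40)}))
# A, B and C all busy: B is held at its reservation, A and C split the rest.
print(cpu_shares({"A": (256, 0), "B": (256, 40), "C": (256, 0)}))
```

With the weights and reservation from the example, the first call gives
A and B 50 each, and the second gives B 40 and A and C 30 each.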

Does that make sense?

> Is it worth taking into account the power cost of cache misses vs hits?

If we have a general framework for "goodness" and "badness", and we
have a way of measuring cache hits / misses, we should be able to
extend the scheduler to do so.

> Do vcpus running on pcpus running at less than 100% speed consume fewer
> credits?

Yes, we'll also need to account for cpu frequency states in our accounting.

> Is there any explicit interface to cpu power state management, or would that
> be decoupled?

I think we might be able to fold this in; it depends on how
complicated things get.  Just as one can imagine a "badness factor" to
powering up a second CPU, which we can weigh against the "badness" of
vcpus waiting on the runqueue, we can imagine a "badness factor" of
running at a higher cpu frequency that can be weighed against either
powering up extra cores / cpus or having to wait on the runqueue.

Let's start with a basic "badness factor" and see if we can get it
worked out properly, and then look at extending it to these sorts of
things.

