[Xen-devel] [RFC] Scheduler work, part 1: High-level goals and i

To:	"xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject:	[Xen-devel] [RFC] Scheduler work, part 1: High-level goals and interface.
From:	George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
Date:	Thu, 9 Apr 2009 16:58:35 +0100
Delivery-date:	Thu, 09 Apr 2009 08:59:06 -0700
Dkim-signature:	v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:received:date :x-google-sender-auth:message-id:subject:from:to:content-type :content-transfer-encoding; bh=vZzTVYasM5wNAsHtgU43IBREBfcI0PG4WIl+BvYdusg=; b=m1/lrfiz45A+KTNYMhIzJsyt6mjgt3US4RLXb7Hyxd3VqM8jktDtN6iWfw6kDb23cM VaMctUmMVjAH+RQ2xagXGrQDbrowU3UAatsFa35Tj70RJYsnm/Zb2E8ObmJDDXhnsfel klalKaRyTjSJzcxDkUGHvRZmNdq/pfPV5TpfM=
Domainkey-signature:	a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:date:x-google-sender-auth:message-id:subject :from:to:content-type:content-transfer-encoding; b=RV7dHXUFPMHb7hgT+tHgImbMqWXtAQkvuIikEJZhLbbvSfwu+IFi7ZlNxLg228WWJJ Nea/Iy/GdgGHSiHNeIRZaJ8N8TGDJmh8HtQO0qLeIXkxILLGq2ENt7zvqbX0FVsl5Vy3 sG/u8nWAjVRVuOb5aDkA/CX43ZX23yg1HkU98=
Envelope-to:	www-data@xxxxxxxxxxxxxxxxxxx
List-help:	<mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id:	Xen developer discussion <xen-devel.lists.xensource.com>
List-post:	<mailto:xen-devel@lists.xensource.com>
List-subscribe:	<http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe:	<http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender:	xen-devel-bounces@xxxxxxxxxxxxxxxxxxx

In the interest of openness (as well as in the interest of taking
advantage of all the smart people out there), I'm posting a very early
design prototype of the credit2 scheduler.  We've had a lot of
contributors to the scheduler recently, so I hope that those with
interest and knowledge will take a look and let me know what they
think at a high level.

This first e-mail will discuss the overall goals: the target "sweet
spot" use cases to consider, measurable goals for the scheduler, and
the target interface / features.  This is for general comment.

The subsequent e-mail(s?) will include some specific algorithms and
changes currently in consideration, as well as some bleeding-edge
patches.  This will be for people who have a specific interest in the
details of the scheduling algorithms.

Please feel free to comment / discuss / suggest improvements.

1. Design targets

We have three general use cases in mind: Server consolidation, virtual
desktop providers, and clients (e.g. XenClient).

For servers, our target "sweet spot" for which we will optimize is a
system with 2 sockets, 4 cores each socket, and SMT (16 logical cpus).
Ideal performance is expected to be reached at about 80% total system
cpu utilization; but the system should function reasonably well up to
a utilization of 800% (e.g., a load of 8).

For virtual desktop systems, we will have a large number of
interactive VMs with a lot of shared memory.  Most of these will be
single-vcpu, or at most 2 vcpus.

For client systems, we expect to have 3-4 VMs (including dom0).
Systems will probably ahve a single socket with 2 cores and SMT (4
logical cpus).  Many VMs will be using PCI pass-through to access
network, video, and audio cards.  They'll also be running video and
audio workloads, which are extremely latency-sensitive.

2. Design goals

For each of the target systems and workloads above, we have some
high-level goals for the scheduler:

* Fairness.  In this context, we define "fairness" as the ability to
get cpu time proportional to weight.

We want to try to make this true even for latency-sensitive workloads
such as networking, where long scheduling latency can reduce the
throughput, and thus the total amount of time the VM can effectively
use.

* Good scheduling for latency-sensitive workloads.

To the degree we are able, we want this to be true even those which
use a significant amount of cpu power: That is, my audio shouldn't
break up if I start a cpu hog process in the VM playing the audio.

* HT-aware.

Running on a logical processor with an idle peer thread is not the
same as running on a logical processor with a busy peer thread.  The
scheduler needs to take this into account when deciding "fairness".

* Power-aware.

Using as many sockets / cores as possible can increase the total cache
size avalable to VMs, and thus (in the absence of inter-VM sharing)
increase total computing power; but by keeping multiple sockets and
cores powered up, also increases the electrical power used by the
system.  We want a configurable way to balance between maximizing
processing power vs minimizing electrical power.

3. Target interface:

The target interface will be similar to credit1:

* The basic unit is the VM "weight".  When competing for cpu
resources, VMs will get a share of the resources proportional to their
weight.  (e.g., two cpu-hog workloads with weights of 256 and 512 will
get 33% and 67% of the cpu, respectively).

* Additionally, we will be introducing a "reservation" or "floor".
  (I'm open to name changes on this one.)  This will be a minimum
  amount of cpu time that a VM can get if it wants it.

For example, one could give dom0 a "reservation" of 50%, but leave the
weight at 256.  No matter how many other VMs run with a weight of 256,
dom0 will be guaranteed to get 50% of one cpu if it wants it.

* The "cap" functionality of credit1 will be retained.

This is a maximum amount of cpu time that a VM can get: i.e., a VM
with a cap of 50% will only get half of one cpu, even if the rest of
the system is completely idle.

* We will also have an interface to the cpu-vs-electrical power.

This is yet to be defined.  At the hypervisor level, it will probably
be a number representing the "badness" of powering up extra cpus /
cores.  At the tools level, there will probably be the option of
either specifying the number, or of using one of 2/3 pre-defined
values {power, balance, green/battery}.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

WARNING - OLD ARCHIVES

xen-devel

[Xen-devel] [RFC] Scheduler work, part 1: High-level goals and interface