This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


RE: [Xen-devel] [RFC] Scheduler work, part 1: High-level goals and inter

To: Jeremy Fitzhardinge <jeremy@xxxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
Subject: RE: [Xen-devel] [RFC] Scheduler work, part 1: High-level goals and interface.
From: "Tian, Kevin" <kevin.tian@xxxxxxxxx>
Date: Fri, 10 Apr 2009 08:33:08 +0800
Accept-language: en-US
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Thu, 09 Apr 2009 17:35:05 -0700
In-reply-to: <49DE415F.3060002@xxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <de76405a0904090858g145f07cja3bd7ccbd6b30ce9@xxxxxxxxxxxxxx> <49DE415F.3060002@xxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: Acm5QuM+GjG1PnDXQmavzYQ+62LukAALxb5w
Thread-topic: [Xen-devel] [RFC] Scheduler work, part 1: High-level goals and interface.
>From: Jeremy Fitzhardinge
>Sent: 10 April 2009 2:42
>George Dunlap wrote:
>> 1. Design targets
>> We have three general use cases in mind: Server consolidation, virtual
>> desktop providers, and clients (e.g. XenClient).
>> For servers, our target "sweet spot" for which we will optimize is a
>> system with 2 sockets, 4 cores each socket, and SMT (16 logical cpus).
>> Ideal performance is expected to be reached at about 80% total system
>> cpu utilization; but the system should function reasonably well up to
>> a utilization of 800% (e.g., a load of 8).
>Is that forward-looking enough?  That hardware is currently available; 
>what's going to be commonplace in 2-3 years?

Good point.

>> * HT-aware.
>> Running on a logical processor with an idle peer thread is not the
>> same as running on a logical processor with a busy peer thread.  The
>> scheduler needs to take this into account when deciding "fairness".
>Would it be worth just pair-scheduling HT threads so they're always 
>running in the same domain?

Pair-scheduling vcpus of the same domain on sibling threads doesn't help fairness; if anything, it makes it worse.

>> * Power-aware.
>> Using as many sockets / cores as possible can increase the total cache
>> size available to VMs, and thus (in the absence of inter-VM sharing)
>> increase total computing power; but by keeping multiple sockets and
>> cores powered up, also increases the electrical power used by the
>> system.  We want a configurable way to balance between maximizing
>> processing power vs minimizing electrical power.
>I don't remember if there's a proper term for this, but what about
>having multiple domains sharing the same scheduling context, so that a
>stub domain can be co-scheduled with its main domain, rather than having
>them treated separately?

This is really desired.

>Also, a somewhat related point, some kind of directed schedule so that 
>when one vcpu is synchronously waiting on another vcpu, have it directly
>hand over its pcpu to avoid any cross-cpu overhead (including the
>ability to take advantage of directly using hot cache lines).  That 
>would be useful for intra-domain IPIs, etc, but also inter-domain 
>context switches (domain<->stub, frontend<->backend, etc).

The hard part here is finding a hint about WHICH vcpu a given vcpu
is waiting on, which is not straightforward. Of course the stub
domain is the most likely example, but that case may already be
cleanly addressed if the co-scheduling above could be added? :-)

>> * We will also have an interface to the cpu-vs-electrical power.
>> This is yet to be defined.  At the hypervisor level, it will probably
>> be a number representing the "badness" of powering up extra cpus /
>> cores.  At the tools level, there will probably be the option of
>> either specifying the number, or of using one of 2/3 pre-defined
>> values {power, balance, green/battery}.
>Is it worth taking into account the power cost of cache misses vs hits?
>Do vcpus running on pcpus running at less than 100% speed consume fewer
>Is there any explicit interface to cpu power state management, or would
>that be decoupled?

CPU power management now has a sysctl interface exposed, and xenpm
is so far the only tool using that interface.
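For context, xenpm drives that sysctl interface from dom0. A typical session might look like the following (subcommand output depends on the hardware and the Xen version, so this is only a usage sketch):

```shell
# Show the current cpufreq governor and available P-states
xenpm get-cpufreq-para

# Bias toward saving power: let cores scale frequency down aggressively
xenpm set-scaling-governor powersave

# Inspect C-state residency to see how deeply idle cores are sleeping
xenpm get-cpuidle-states
```

Whether the scheduler's power/performance knob should be layered on top of this same interface, or kept decoupled as Jeremy asks, is exactly the open question.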

Xen-devel mailing list