

RE: [Xen-devel] [RFC] Scheduler work, part 1: High-level goals and inter

To: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
Subject: RE: [Xen-devel] [RFC] Scheduler work, part 1: High-level goals and interface.
From: "Tian, Kevin" <kevin.tian@xxxxxxxxx>
Date: Thu, 16 Apr 2009 12:58:50 +0800
Accept-language: en-US
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Wed, 15 Apr 2009 21:59:55 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <de76405a0904150807y7d1e22aeu3fdf6789e92177e0@xxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <de76405a0904090858g145f07cja3bd7ccbd6b30ce9@xxxxxxxxxxxxxx> <0A882F4D99BBF6449D58E61AAFD7EDD61036A5EC@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> <de76405a0904150807y7d1e22aeu3fdf6789e92177e0@xxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: Acm929knhaLpOUsESSyIRpriTDijKwAZ/V6w
Thread-topic: [Xen-devel] [RFC] Scheduler work, part 1: High-level goals and interface.
>From: George Dunlap
>Sent: April 15, 2009 23:07
>> Do you mean that same elapsed time in above two scenarios will be
>> translated into different credits?
>Yes.  Ideally, we want to give "processing power" based on weight.
>But the "processing power" of a thread whose sibling is idle is
>significantly more than the "processing power" of a thread whose
>sibling is running.  (Same thing possibly for cpu frequency scaling.)
>So we'd want to arrange the credits such that VMs with equal weight
>get equal "processing power", not just equal "time on a logical cpu".

Yup, this is one interesting part to be further explored. 
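
To make the quoted idea concrete, here is a minimal sketch of charging credits by "processing power" received rather than by elapsed time alone. The function name `credits_to_charge` and the 0.6 throughput factor for a thread whose sibling is busy are assumptions for illustration only; a real factor would have to be measured per platform.

```python
# Hypothetical sketch: debit credits in proportion to the processing
# power a vcpu actually received, not just its time on a logical cpu.

# Assumed relative throughput of a hyperthread while its sibling is busy.
# Illustrative value only, not a measurement.
SIBLING_BUSY_FACTOR = 0.6

def credits_to_charge(elapsed_us, sibling_busy_us):
    """Return credits to debit for `elapsed_us` microseconds of run time,
    of which `sibling_busy_us` were spent with the sibling thread running."""
    if not 0 <= sibling_busy_us <= elapsed_us:
        raise ValueError("sibling_busy_us must be within [0, elapsed_us]")
    sibling_idle_us = elapsed_us - sibling_busy_us
    # Full charge while the sibling is idle, discounted charge while shared.
    return sibling_idle_us + SIBLING_BUSY_FACTOR * sibling_busy_us
```

With this shape, the same elapsed time costs fewer credits when the sibling thread was running, which is the "different credits for the same elapsed time" behaviour discussed above. The same hook could in principle take a frequency-scaling factor as well.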

>> Xen 3.4 now supports "sched_smt_power_savings" (both as a boot
>> option and adjustable through xenpm) to change the
>> power/performance preference. It's a simple implementation that
>> just reverses the span order from the existing
>> package->core->thread to thread->core->package. More fine-grained
>> flexibility could be given in future if a hierarchical scheduling
>> concept could be more clearly constructed, like the domain
>> scheduler in Linux.
>I haven't looked at this code.  From your description here it sounds
>like a sort of a simple hack to get the effect we want (either
>spreading things out or pushing them together) -- is that correct?

yes, spread first vs. fill first.
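
A minimal sketch of the two placement orders follows. It is not the Xen 3.4 code; the `(package, core, thread)` tuples and the function `pick_idle_cpu` are assumptions made up for illustration.

```python
# Hypothetical sketch of "spread first" vs. "fill first" placement.
# Each logical cpu is described by a (package, core, thread) tuple.

def pick_idle_cpu(idle_cpus, busy_cpus, power_savings):
    """Pick an idle logical cpu, or None if there is none.

    power_savings=False (spread first): prefer the least loaded package,
    then the least loaded core.
    power_savings=True (fill first): prefer the busiest core, then the
    busiest package, so untouched packages can stay idle.
    """
    if not idle_cpus:
        return None

    def load(cpu):
        pkg, core, _ = cpu
        core_busy = sum(1 for p, c, _ in busy_cpus if (p, c) == (pkg, core))
        pkg_busy = sum(1 for p, _, _ in busy_cpus if p == pkg)
        # Order of the key mirrors the span order being searched.
        return (core_busy, pkg_busy) if power_savings else (pkg_busy, core_busy)

    # Fill first wants the highest load; spread first wants the lowest.
    chooser = max if power_savings else min
    return chooser(sorted(idle_cpus), key=load)
```

With one vcpu already running on package 0, core 0, thread 0, spread first picks a cpu on the other package, while fill first picks the sibling thread on the same core.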

>My general feeling is that hacks are good short-term solutions, but
>not long-term.  Things always get more complicated, and often have
>unexpected side-effects.  I think since we're doing scheduler work,
>it's worth it to try to see if we can actually solve the
>power/performance problem.

Agreed. Have you looked at the domain scheduler idea on the Linux
side? I'm not sure whether that topology-based multi-level scheduler
would help or over-complicate things here.

>> imo, weight does not strictly translate into care for latency. Any
>> elaboration on that? I remember that previously Nishiguchi-san
>> gave the idea of boosting credit, and Disheng proposed static
>> priority. Maybe you can write a summary to help people understand
>> exactly how latency would be ensured in your proposal.
>All of this needs to be run through experiments.  So far, I've had
>really good success with putting waking VMs in "boost" priority for
>1ms if they still have credits.  (And unlike the credit scheduler, I
>try to make sure that a VM rarely runs out of credits.)

btw, accurate accounting (at context switch instead of the current
tick-based sampling) should also be incorporated, if you do want to
manipulate credits at fine granularity.
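
A small sketch of why this matters: with tick-based sampling, whoever happens to be running when the tick fires is charged the whole tick. The 10 ms tick length and both function names are assumptions for illustration, not the credit scheduler's actual code.

```python
# Hypothetical sketch: tick-based sampling vs. exact accounting at
# context switch. Times are in microseconds.

TICK_US = 10_000  # assumed 10 ms accounting tick, for illustration

def tick_based_charge(intervals):
    """Charge a full tick to whoever is running when a tick fires.
    `intervals` is a list of (vcpu, start_us, end_us) run intervals."""
    charged = {}
    for vcpu, start, end in intervals:
        # Ticks fire at TICK_US, 2*TICK_US, ...; count those in (start, end].
        ticks = end // TICK_US - start // TICK_US
        charged[vcpu] = charged.get(vcpu, 0) + ticks * TICK_US
    return charged

def exact_charge(intervals):
    """Charge each vcpu the time it actually ran, measured at context switch."""
    charged = {}
    for vcpu, start, end in intervals:
        charged[vcpu] = charged.get(vcpu, 0) + (end - start)
    return charged
```

For example, if vcpu A runs for 9 ms of every 10 ms and vcpu B runs only the last 1 ms before each tick, tick-based sampling charges everything to B and nothing to A, while accounting at context switch charges each what it used.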

>> There should be some way to adjust or limit the use of
>> 'reservation' when multiple vcpus each claim one and the sum
>> exceeds the cpu's computing power. Or would that weaken your
>> general 'weight-as-basic-unit' idea?
>All "reservations" on the system must add up to less than the total
>processing power of the system.  So a system with 2 cores can't have a
>sum of reservations more than 200%.  Xen will check this when setting
>the reservation and return an appropriate error message if necessary.

Return an error, or scale the previous successful reservations down?
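
The two alternatives can be sketched side by side. Both functions, and the convention that a reservation is a percentage of one physical cpu, are assumptions for illustration; neither is an existing Xen interface.

```python
# Hypothetical sketch of the two policies for an over-committing request.
# Reservations are percentages of one physical cpu (100 == one full cpu).

def set_reservation_strict(reservations, vcpu, percent, ncpus):
    """Reject the request if the total would exceed system capacity."""
    others = sum(v for k, v in reservations.items() if k != vcpu)
    if others + percent > ncpus * 100:
        raise ValueError("reservations would exceed total processing power")
    updated = dict(reservations)
    updated[vcpu] = percent
    return updated

def set_reservation_scaled(reservations, vcpu, percent, ncpus):
    """Accept the request, then scale every reservation down
    proportionally so the total fits within system capacity."""
    updated = dict(reservations)
    updated[vcpu] = percent
    total = sum(updated.values())
    capacity = ncpus * 100
    if total <= capacity:
        return updated
    return {k: v * capacity / total for k, v in updated.items()}
```

The strict variant keeps earlier reservations meaningful as guarantees; the scaled variant never fails, but silently shrinks reservations that were previously granted.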

>>>* We will also have an interface to the cpu-vs-electrical power.
>>>This is yet to be defined.  At the hypervisor level, it will probably
>>>be a number representing the "badness" of powering up extra cpus /
>>>cores.  At the tools level, there will probably be the option of
>>>either specifying the number, or of using one of 2/3 pre-defined
>>>values {power, balance, green/battery}.
>> Not sure how that number will be defined. Maybe we can follow
>> current way to just add individual name-based options matching
>> its purpose (such as migration_cost and sched_smt_power_savings...)
>At the scheduler level, I was thinking along the lines of
>"core_power_up_cost".  This would be comparable to the cost of having
>things waiting on the runqueue.  So (for example) if the cost was 0.1,

Who decides what the cost should be? How would an end customer make
practical use of it?

>then when the load on the current processors reached 1.1, then it
>would power up another core.  You could set it to 0.5 or 1.0 to save

What do you mean by 'power up'? Boosting its frequency, or migrating
load to that core?

>more power (at the cost of some performance).  I think defining it
>that way is the closest to what you really want: a way to define the
>performance impact vs power consumption.

I'm still a bit confused here. What quantity (and in which situation)
is translated into a value comparable to "core_power_up_cost"?

>Obviously at the user interface level, we might have something more
>manageable: e.g., {power, balance, green} => {0, 0.2, 0.8} or
>something like that.

Then how is this triple mapped to the "core_power_up_cost" above?
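
One possible reading of the quoted proposal, as a sketch only: each named preset selects a value for "core_power_up_cost", and another core is brought into use once the load exceeds the number of active cores by that cost. The preset values are the example figures from the quoted text; the generalisation from one core to `active_cores`, and the names `PRESETS` and `should_power_up`, are assumptions.

```python
# Hypothetical sketch: map tool-level presets to a hypervisor-level
# "core_power_up_cost" and apply it as a load threshold.

# Example values taken from the quoted proposal ("or something like that").
PRESETS = {"power": 0.0, "balance": 0.2, "green": 0.8}

def should_power_up(load, active_cores, core_power_up_cost):
    """Return True when the load on the active cores exceeds their
    capacity by at least the configured cost.

    `load` is measured in units of one fully busy core, so a load of 1.1
    on one active core means 0.1 of a core's worth of work is waiting."""
    return load >= active_cores + core_power_up_cost
```

Under this reading, "green" tolerates up to 0.8 of a core's worth of queued work before using another core, while "power" uses another core as soon as the active ones are saturated. It does not answer what "power up" means mechanically, which is still the open question above.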

>But as I said, the *goal* is to have a useful configurable interface;
>the implementation will depend on what actually can be made to work in

I agree with this goal, but I'm not convinced by the above example. :-)
