RE: [Xen-devel] [RFC] Scheduler work, part 1: High-level goals a

To:	Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Subject:	RE: [Xen-devel] [RFC] Scheduler work, part 1: High-level goals and interface.
From:	"Tian, Kevin" <kevin.tian@xxxxxxxxx>
Date:	Sat, 11 Apr 2009 17:52:57 +0800
Accept-language:	en-US
Cc:	George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date:	Sat, 11 Apr 2009 02:53:32 -0700
Envelope-to:	www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to:	<49DF708F.6070102@xxxxxxxx>
List-help:	<mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id:	Xen developer discussion <xen-devel.lists.xensource.com>
List-post:	<mailto:xen-devel@lists.xensource.com>
List-subscribe:	<http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe:	<http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References:	<de76405a0904090858g145f07cja3bd7ccbd6b30ce9@xxxxxxxxxxxxxx> <49DE415F.3060002@xxxxxxxx> <0A882F4D99BBF6449D58E61AAFD7EDD61036A60D@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> <49DF708F.6070102@xxxxxxxx>
Sender:	xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index:	Acm595vy7Y4z2p75ROKpv3HpWG4qIAAjuhHg
Thread-topic:	[Xen-devel] [RFC] Scheduler work, part 1: High-level goals and interface.

>From: Jeremy Fitzhardinge [mailto:jeremy@xxxxxxxx] 
>Sent: April 11, 2009 0:15
>>>> * HT-aware.
>>>>
>>>> Running on a logical processor with an idle peer thread is not the
>>>> same as running on a logical processor with a busy peer thread.  The
>>>> scheduler needs to take this into account when deciding "fairness".
>>>>   
>>>>       
>>> Would it be worth just pair-scheduling HT threads so they're always 
>>> running in the same domain?
>>>     
>>
>> Running them in the same domain doesn't help fairness; instead, it makes it worse.
>>   
>
>I don't know what the performance characteristics of modern HT are, but
>in P4-HT the throughput of a given thread was very dependent on what the
>other thread was doing. If it's competing with some other arbitrary
>domain, then it's hard to make any estimates about what the throughput of
>a given vcpu's thread is.
>
>If we present them as sibling pairs to guests, then it becomes the guest
>OS's problem (ie, we don't try to hide the true nature of these pcpus).
>That's fairer for the guest, because they know what they're getting, and
>Xen can charge the guest for cpu use on a thread-pair, rather than
>trying to work out how the two threads compete. In other words, if only
>one thread is running, then it can charge max-thread-throughput; if both
>are running, it can charge max-core-throughput (possibly scaled by
>whatever performance mode the core is running in).

That rests on the assumption that workloads within one VM are more
HT-friendly than workloads across VMs. Maybe that's true in some cases,
but I don't think it's a strong point in most deployments.

My major worry is the complexity added by exposing such sibling
pairs to the guest. You then have to schedule that VM at core level,
since the HT relationship must always be maintained; otherwise the VM
could see the reverse effect when it tries to exploit that topology.
This brings trouble for the scheduler. Not all VMs are SMP guests, so
a VM that is exposed with HT is actually treated unfairly: it carries
one more limitation, that it can't use a partially idle core, while
other VMs are immune. Another tricky part is that you have to gang
schedule that VM, which is fancy in concept, but no one has come up
with a solid implementation in practice.

The above is why I said fairness could get worse in general.
It could be useful in some specific scenarios. One is on the client,
where however it's better to expose the full topology rather than just
HT. The other is some mission-critical usages where CPU resources are
partitioned, so exposing HT could also be useful.
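
To make the thread-pair charging quoted above concrete, here is a
minimal sketch of what I understand the rule to be. It is only an
illustration: the function name, the constants and the 1.3 ratio are
assumptions made up for this example, not Xen code or measured numbers.

```python
# Hypothetical sketch: names and throughput numbers are illustrative
# assumptions, not Xen code or measured values.
MAX_THREAD_THROUGHPUT = 1.0  # assumed rate when one thread runs alone
MAX_CORE_THROUGHPUT = 1.3    # assumed rate when both siblings are busy


def charge_for_pair(busy_threads, duration_ns, perf_scale=1.0):
    """Charge for one sibling pair over one accounting interval.

    busy_threads: how many of the two sibling threads ran guest code.
    perf_scale:   optional scaling for the core's performance mode.
    """
    if busy_threads not in (0, 1, 2):
        raise ValueError("a thread pair has 0, 1 or 2 busy threads")
    if busy_threads == 0:
        rate = 0.0
    elif busy_threads == 1:
        rate = MAX_THREAD_THROUGHPUT
    else:
        rate = MAX_CORE_THROUGHPUT
    return rate * perf_scale * duration_ns
```

Note that this only covers the accounting side. The scheduler still has
to keep both siblings in the same VM for the charge to mean anything,
which is the core-level and gang scheduling cost I mention above.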


>>> Also, a somewhat related point, some kind of directed schedule so that
>>> when one vcpu is synchronously waiting on another vcpu, have it directly
>>> hand over its pcpu to avoid any cross-cpu overhead (including the 
>>> ability to take advantage of directly using hot cache lines).  That 
>>> would be useful for intra-domain IPIs, etc, but also inter-domain 
>>> context switches (domain<->stub, frontend<->backend, etc).
>>>     
>>
>> The hard part here is finding a hint about WHICH vcpu a given
>> cpu is waiting on, which is not straightforward. Of course the stub
>> domain is the most likely example, but it may already be cleanly
>> addressed if the above co-scheduling could be added? :-)
>>   
>
>I'm being unclear by conflating two issues.
>
>One is that when dom0 (or driver domain) does some work on behalf of a
>guest, it seems like it would be useful for the time used to be credited
>against the guest rather than against dom0.
>
>My thought is that, rather than having the scheduler parameters be the
>implicit result of "vcpu A belongs to domain X, charge X", each vcpu has
>a charging domain which can be updated via (privileged) hypercall. When
>dom0 is about to do some work, it updates the charging domain
>accordingly (with some machinery to make that a per-task property within
>the kernel so that task context switches update the vcpu state
>appropriately).
>
>A further extension would be the idea of charging grants, where domain A
>could grant domain B charging rights, and B could set its vcpus to
>charge A as an unprivileged operation. As with grant tables, revocation
>poses some interesting problems.
>
>This is a generalization of coscheduled stub domains, because you could
>achieve the same effect by making the stub domain simply switch all its
>vcpus to charge its main domain.


Yup. This is a long-missing piece in Xen. The current accounting
mechanism, as seen in xentop, is raw and incomplete. In this area KVM
may have it easier by building on containers.
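
A rough sketch of the charging-domain idea, just to pin down the
semantics. Everything here (Domain, Vcpu, grant_charging,
set_charging_domain, account) is invented for illustration and is not
an existing Xen interface. Revocation and the credit questions raised
below are deliberately left out.

```python
# Hypothetical model of per-vcpu charging domains; not a Xen interface.
class Domain:
    def __init__(self, name, privileged=False):
        self.name = name
        self.privileged = privileged
        self.charged_ns = 0      # cpu time accounted against this domain
        self.grantees = set()    # domains allowed to charge this one


class Vcpu:
    def __init__(self, home):
        self.home = home
        self.charging = home     # default: "belongs to X, charge X"


def grant_charging(owner, grantee):
    """Charging grant: owner lets grantee bill cpu time to owner."""
    owner.grantees.add(grantee)


def set_charging_domain(caller, vcpu, target):
    """Stand-in for the hypercall that retargets a vcpu's charging domain."""
    own_vcpu = vcpu.home is caller
    has_grant = target is caller or caller in target.grantees
    if not (caller.privileged or (own_vcpu and has_grant)):
        raise PermissionError(f"{caller.name} may not charge {target.name}")
    vcpu.charging = target


def account(vcpu, ns):
    """Bill ns of run time to whichever domain the vcpu currently charges."""
    vcpu.charging.charged_ns += ns
```

With this, dom0 would call set_charging_domain() before doing work for
a guest, and a stub domain holding a grant from its main domain could
switch all its vcpus to charge that domain without any privilege.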


>
>How to schedule vcpus? They could either be scheduled as if they were
>part of the other domain; or be scheduled with their "home" domain, but
>their time spent is charged against the other domain. The former is
>effectively priority inheritance, and raises all the normal issues -
>but it would be appropriate for co-scheduled stub domains. The latter
>makes more sense for dom0, but it's less clear what it actually means:
>does it consume any home domain credits? What happens if the other
>domain's credits are all consumed? Could two domains collude to get more
>than their fair share of cpu?
>
>
>
>The second issue is trying to share pcpu resources between vcpus where
>appropriate. The obvious case is doing some kind of cross-domain copy
>operation, where the data could well be hot in cache, so if you use the
>same pcpu you can just get cache hits. Of course there's the tradeoff
>that you're necessarily serialising things which could be done in
>parallel, so perhaps it doesn't work well in practice.
>
>J
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
