This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


RE: [Xen-devel] [RFC] Scheduler work, part 1: High-level goals and interface.

To: Jeremy Fitzhardinge <jeremy@xxxxxxxx>, "Tian, Kevin" <kevin.tian@xxxxxxxxx>
Subject: RE: [Xen-devel] [RFC] Scheduler work, part 1: High-level goals and interface.
From: Ian Pratt <Ian.Pratt@xxxxxxxxxxxxx>
Date: Fri, 10 Apr 2009 18:16:20 +0100
Accept-language: en-US
Cc: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, Ian Pratt <Ian.Pratt@xxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Fri, 10 Apr 2009 10:16:50 -0700
In-reply-to: <49DF708F.6070102@xxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <de76405a0904090858g145f07cja3bd7ccbd6b30ce9@xxxxxxxxxxxxxx> <49DE415F.3060002@xxxxxxxx> <0A882F4D99BBF6449D58E61AAFD7EDD61036A60D@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> <49DF708F.6070102@xxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: Acm596nYrdHKNuOmRXW6VwLlP+rKwgABWn1A
Thread-topic: [Xen-devel] [RFC] Scheduler work, part 1: High-level goals and interface.
> I don't know what the performance characteristics of modern HT are, but
> in P4-HT the throughput of a given thread was very dependent on what the
> other thread was doing. If it's competing with some other arbitrary
> domain, then it's hard to make any estimates about what the throughput
> of a given vcpu's thread is.

The original Northwood P4s were fairly horrible as regards performance 
predictability, but things got considerably better with later steppings. 
Nehalem has some interesting features that ought to make it better yet.

Presenting sibling pairs to guests is probably preferable (it avoids any 
worries about side-channel crypto attacks), but I certainly wouldn't restrict 
it to just that: server-hosted desktop workloads often involve large numbers of 
single-VCPU guests, and you want every logical processor available.

Scaling the accounting when two threads share a core is a good way of ensuring 
things tend toward longer-term fairness.

Possibly having two modes of operation would be a good thing:

 1. explicitly present HT to guests and gang schedule threads

 2. normal free-for-all with HT aware accounting.

Of course, #1 isn't optimal if guests may migrate between HT and non-HT systems.


> If we present them as sibling pairs to guests, then it becomes the guest
> OS's problem (ie, we don't try to hide the true nature of these pcpus).
> That's fairer for the guest, because they know what they're getting, and
> Xen can charge the guest for cpu use on a thread-pair, rather than
> trying to work out how the two threads compete. In other words, if only
> one thread is running, then it can charge max-thread-throughput; if both
> are running, it can charge max-core-throughput (possibly scaled by
> whatever performance mode the core is running in).
> >>> * Power-aware.
> >>>
> >>> Using as many sockets / cores as possible can increase the total cache
> >>> size available to VMs, and thus (in the absence of inter-VM sharing)
> >>> increase total computing power; but by keeping multiple sockets and
> >>> cores powered up, also increases the electrical power used by the
> >>> system.  We want a configurable way to balance between maximizing
> >>> processing power vs minimizing electrical power.
> >>>
> >> I don't remember if there's a proper term for this, but what about
> >> having multiple domains sharing the same scheduling context, so that a
> >> stub domain can be co-scheduled with its main domain, rather than
> >> having them treated separately?
> >>
> >
> > This is really desirable.
> >
> >
> >> Also, a somewhat related point, some kind of directed schedule so that
> >> when one vcpu is synchronously waiting on another vcpu, have it directly
> >> hand over its pcpu to avoid any cross-cpu overhead (including the
> >> ability to take advantage of directly using hot cache lines).  That
> >> would be useful for intra-domain IPIs, etc, but also inter-domain
> >> context switches (domain<->stub, frontend<->backend, etc).
> >>
> >
> > The hard part here is finding a hint about WHICH vcpu a given
> > cpu is waiting on, which is not straightforward. Of course a stub
> > domain is the most likely example, but it may already be cleanly
> > addressed if the co-scheduling above could be added? :-)
> >
> I'm being unclear by conflating two issues.
> One is that when dom0 (or a driver domain) does some work on behalf of a
> guest, it seems like it would be useful for the time used to be credited
> against the guest rather than against dom0.
> My thought is that, rather than having the scheduler parameters be the
> implicit result of "vcpu A belongs to domain X, charge X", each vcpu has
> a charging domain which can be updated via (privileged) hypercall. When
> dom0 is about to do some work, it updates the charging domain
> accordingly (with some machinery to make that a per-task property within
> the kernel so that task context switches update the vcpu state
> appropriately).
> A further extension would be the idea of charging grants, where domain A
> could grant domain B charging rights, and B could set its vcpus to
> charge A as an unprivileged operation. As with grant tables, revocation
> poses some interesting problems.
> This is a generalization of coscheduled stub domains, because you could
> achieve the same effect by making the stub domain simply switch all its
> vcpus to charge its main domain.
> How to schedule vcpus? They could either be scheduled as if they were
> part of the other domain; or be scheduled with their "home" domain, but
> their time spent is charged against the other domain. The former is
> effectively priority inheritance, and raises all the normal issues -
> but it would be appropriate for co-scheduled stub domains. The latter
> makes more sense for dom0, but it's less clear what it actually means:
> does it consume any home domain credits? What happens if the other
> domain's credits are all consumed? Could two domains collude to get more
> than their fair share of cpu?
> The second issue is trying to share pcpu resources between vcpus where
> appropriate. The obvious case is doing some kind of cross-domain copy
> operation, where the data could well be hot in cache, so if you use the
> same pcpu you can just get cache hits. Of course there's the tradeoff
> that you're necessarily serialising things which could be done in
> parallel, so perhaps it doesn't work well in practice.
> J
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
