Tian, Kevin wrote:
>> From: Jeremy Fitzhardinge
>> Sent: April 10, 2009 2:42
>> George Dunlap wrote:
>>> 1. Design targets
>>> We have three general use cases in mind: Server consolidation, virtual
>>> desktop providers, and clients (e.g. XenClient).
>>> For servers, our target "sweet spot" for which we will optimize is a
>>> system with 2 sockets, 4 cores each socket, and SMT (16 logical cpus).
>>> Ideal performance is expected to be reached at about 80% total system
>>> cpu utilization; but the system should function reasonably well up to
>>> a utilization of 800% (e.g., a load of 8).
>> Is that forward-looking enough? That hardware is currently available;
>> what's going to be commonplace in 2-3 years?
> good point.
>>> * HT-aware.
>>> Running on a logical processor with an idle peer thread is not the
>>> same as running on a logical processor with a busy peer thread. The
>>> scheduler needs to take this into account when deciding "fairness".
>> Would it be worth just pair-scheduling HT threads so they're always
>> running in the same domain?
> Running in the same domain doesn't help fairness; instead, it makes it worse.
I don't know what the performance characteristics of modern HT are, but
in P4-HT the throughput of a given thread was very dependent on what the
other thread was doing. If it's competing with some other arbitrary
domain, then it's hard to make any estimates about what the throughput of
a given vcpu's thread is.
If we present them as sibling pairs to guests, then it becomes the guest
OS's problem (ie, we don't try to hide the true nature of these pcpus).
That's fairer for the guest, because they know what they're getting, and
Xen can charge the guest for cpu use on a thread-pair, rather than
trying to work out how the two threads compete. In other words, if only
one thread is running, then it can charge max-thread-throughput; if both
are running, it can charge max-core-throughput (possibly scaled by
whatever performance mode the core is running in).
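To make the charging rule concrete, here is a rough sketch of what I mean. Everything in it is made up for illustration: the names (CoreRates, charge_for_tick), the idea of a per-tick charge, and the calibration numbers are assumptions, not anything that exists in Xen. It also applies the performance-mode scaling to both cases, which is a guess on my part.

```python
from dataclasses import dataclass


@dataclass
class CoreRates:
    # Hypothetical calibration values, in arbitrary work units per tick.
    max_thread_throughput: float  # one sibling running, the other idle
    max_core_throughput: float    # both siblings running
    perf_scale: float = 1.0       # assumed factor for the core's performance mode


def charge_for_tick(rates: CoreRates, running_siblings: int) -> float:
    """Amount to charge the domain that owns the thread pair for one tick."""
    if running_siblings == 0:
        return 0.0  # idle core, nothing to charge
    if running_siblings == 1:
        base = rates.max_thread_throughput
    elif running_siblings == 2:
        base = rates.max_core_throughput
    else:
        raise ValueError("a two-thread core has at most 2 running siblings")
    return base * rates.perf_scale
```

The point is that Xen only needs to know how many siblings of the pair are running; it never has to estimate how two unrelated domains interfere with each other on the same core.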
>>> * Power-aware.
>>> Using as many sockets / cores as possible can increase the total cache
>>> size available to VMs, and thus (in the absence of inter-VM sharing)
>>> increase total computing power; but by keeping multiple sockets and
>>> cores powered up, also increases the electrical power used by the
>>> system. We want a configurable way to balance between maximizing
>>> processing power vs minimizing electrical power.
>> I don't remember if there's a proper term for this, but what about
>> having multiple domains sharing the same scheduling context, so that a
>> stub domain can be co-scheduled with its main domain, rather than having
>> them treated separately?
> This is really desired.
>> Also, a somewhat related point, some kind of directed schedule so that
>> when one vcpu is synchronously waiting on another vcpu, have it directly
>> hand over its pcpu to avoid any cross-cpu overhead (including the
>> ability to take advantage of directly using hot cache lines). That
>> would be useful for intra-domain IPIs, etc, but also inter-domain
>> context switches (domain<->stub, frontend<->backend, etc).
> The hard part here is finding a hint as to WHICH vcpu a given
> cpu is waiting on, which is not straightforward. Of course the stub
> domain is the most likely example, but it may already be cleanly
> addressed if the co-scheduling above could be added? :-)
I'm being unclear by conflating two issues.
One is that when dom0 (or driver domain) does some work on behalf of a
guest, it seems like it would be useful for the time used to be credited
against the guest rather than against dom0.
My thought is that, rather than having the scheduler parameters be the
implicit result of "vcpu A belongs to domain X, charge X", each vcpu has
a charging domain which can be updated via (privileged) hypercall. When
dom0 is about to do some work, it updates the charging domain
accordingly (with some machinery to make that a per-task property within
the kernel so that task context switches update the vcpu state).
A further extension would be the idea of charging grants, where domain A
could grant domain B charging rights, and B could set its vcpus to
charge A as an unprivileged operation. As with grant tables, revocation
poses some interesting problems.
This is a generalization of coscheduled stub domains, because you could
achieve the same effect by making the stub domain simply switch all its
vcpus to charge its main domain.
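A toy model of the bookkeeping, to show how the privileged update and the charging grants fit together. None of this is a real Xen interface: Domain, Vcpu, grant_charging and set_charging_domain are hypothetical names, the "hypercall" is just a method call, and revocation is deliberately left out because I don't have a good answer for it.

```python
class ChargeError(Exception):
    """Raised when a vcpu may not charge the requested domain."""


class Domain:
    def __init__(self, name, privileged=False):
        self.name = name
        self.privileged = privileged
        self.charged = 0            # total time charged to this domain
        self.charge_grants = set()  # names of domains allowed to charge us

    def grant_charging(self, other):
        # Hypothetical "charging grant": let `other` bill its vcpu time to us.
        self.charge_grants.add(other.name)


class Vcpu:
    def __init__(self, home):
        self.home = home
        self.charging = home  # default: charge the domain the vcpu belongs to

    def set_charging_domain(self, target):
        # Stand-in for the hypercall. Allowed if the caller is privileged,
        # is switching back to its own domain, or holds a grant from target.
        allowed = (
            target is self.home
            or self.home.privileged
            or self.home.name in target.charge_grants
        )
        if not allowed:
            raise ChargeError(f"{self.home.name} may not charge {target.name}")
        self.charging = target

    def account(self, ticks):
        # Scheduler accounting: bill the current charging domain.
        self.charging.charged += ticks
```

With this, dom0 would call set_charging_domain(guest) before doing backend work for that guest, and a stub domain holding a grant from its main domain would do the same for all of its vcpus, which gives the co-scheduled accounting described above.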
How to schedule vcpus? They could either be scheduled as if they were
part of the other domain; or be scheduled with their "home" domain, but
their time spent is charged against the other domain. The former is
effectively priority inheritance, and raises all the normal issues -
but it would be appropriate for co-scheduled stub domains. The latter
makes more sense for dom0, but it's less clear what it actually means:
does it consume any home domain credits? What happens if the other
domain's credits are all consumed? Could two domains collude to get more
than their fair share of cpu?
The second issue is trying to share pcpu resources between vcpus where
appropriate. The obvious case is doing some kind of cross-domain copy
operation, where the data could well be hot in cache, so if you use the
same pcpu you can just get cache hits. Of course there's the tradeoff
that you're necessarily serialising things which could be done in
parallel, so perhaps it doesn't work well in practice.
Xen-devel mailing list