This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
Home Products Support Community News


RE: [Xen-devel] [RFC] Scheduler work, part 1: High-level goals and inter

To: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, "Tian, Kevin" <kevin.tian@xxxxxxxxx>
Subject: RE: [Xen-devel] [RFC] Scheduler work, part 1: High-level goals and interface.
From: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
Date: Thu, 16 Apr 2009 14:10:46 +0000 (GMT)
Cc: Jeremy Fitzhardinge <jeremy@xxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date: Thu, 16 Apr 2009 07:57:41 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <de76405a0904160327h29d138d1pfe6e4c2b4f4eaaac@xxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
>From a resource utilization perspective, hyper-pairing may
make sense.  But what about the user perspective?  How would
an administrator specify hyper-pairing?  And more importantly
why?  When consolidating workloads from, say, a group
of dual-core or dual-processor servers onto some future
larger hyperthreaded server, why would anyone say
"please assign this to a hyper-pair", which is essentially
saying "give me less peak performance than I had before"?

Also, in the analysis below, the problem is greatly
simplified because today's (x86) processors are limited
to two hyperthreads.  How soon will we see more threads
per core, given that other non-x86 CPUs already support
four or more?

> -----Original Message-----
> From: George Dunlap [mailto:George.Dunlap@xxxxxxxxxxxxx]
> Sent: Thursday, April 16, 2009 4:28 AM
> To: Tian, Kevin
> Cc: Jeremy Fitzhardinge; xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] [RFC] Scheduler work, part 1: 
> High-level goals
> and interface.
> 2009/4/16 Tian, Kevin <kevin.tian@xxxxxxxxx>:
> > Could you elaborate more about what's being simplified compared to
> > generic gang scheduling? I used to be scared by complexity to have
> > multiple vcpus sync in and sync out, especially with other
> > heterogeneous VMs (w/o gang scheduling requirement). It's possibly
> > simpler if all VMs in that system are hyper-pair based... :-)
> (I've only done some thought experiments re gang scheduling, so take
> that into account as you read this description.)
> The general gang scheduling problem is that you have P processors, and
> N vms, each of which have up to M processors.  Each vm may have a
> different number of processors N.vcpu < P, and at any one time there
> may be a different number of processors runnable N.runnable < N.vcpu <
> P; this may change at any time.  The general problem is to maximize
> the number of used processors P (and thus maximize the throughput).
> If you have a VM with 4 vcpus, but it's only using 2, you have a
> choice: do I run it on 2 cores, and let another VM use the other 2
> cores?  Or do I "reserve" 4 cores for it, so that if the other 2 vcpus
> wake up they can run immediately?  If you do the former, then if one
> of the other vcpus wakes up you have to quickly preempt someone else;
> if not, you risk leaving the two cores idle for the entire timeslice,
> effectively throwing away the processing power.  The whole problem is
> likely to be NP-complete, and really obnoxious to have good heuristics
> for.
> In the case of HT gang-scheduling, the problem is 
> significantly constrained:
> * The pairs of processors to look at is constrianed: each logical cpu
> has a pre-defined sibling.
> * The quantity we're trying to gang-schedule is significantly
> constrained: only 2; and each gang-scheduled vcpu has its pre-defined
> HT sibling as well.
> * If only one sibling of a HT pair is active, the other one isn't
> wasted; the active thread will get more processing power.  So we don't
> have that risky choice.
> So there are really only a handful of new cases to consider:
> * A HT vcpu pair wakes up or comes to the front of the queue when
> another HT vcpu pair is running.
>  + Simple: order normally.  If this pair is a higher priority
> (whatever that means) than the running pair, preempt the running pair
> (i.e., preempt the vcpus on both logical cpus).
> * A non-HT vcpu becomes runnable (or comes to the front of the
> runqueue) when a HT vcpu pair is on a pair of threads
>  + If the non-HT vcpu priority is lower, wait until the HT vcpu pair
> is finished.
>  + If the non-HT vcpu priority is higher, we need to decide whether to
> wait longer or whether to preempt both threads.  This may depends on
> whether there are other non-HT vcpus waiting to run, and what their
> priority is.
> * A HT vcpu pair becomes runnable when non-HT vcpus are 
> running on the threads
>  + Deciding whether to wait or preempt both threads will depend on the
> relative weights of both.
> These decisions are a lot easier to deal with than the full
> gang-scheduling problem (AFAICT).  One can imagine, for instance, that
> each HT pair could share one runqueue.  A vcpu HT pair would be put on
> the runqueue as an individual entity.  When it reached the top of the
> queue, both threads would preempt what was on them and run the pair of
> threads (or idle if the vcpu was idle).  Otherwise, each thread would
> take a single non-pair vcpu and execute it.
> At any rate, that's a step further down the road.  First we need to
> address basic credits, latency, and load-balancing. (Implementation
> e-mails to come in a bit.)
>  -George
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel

Xen-devel mailing list

<Prev in Thread] Current Thread [Next in Thread>