This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


Re: [Xen-devel] New CPU scheduler w/ SMP load balancer

To: Anthony Liguori <aliguori@xxxxxxxxxx>
Subject: Re: [Xen-devel] New CPU scheduler w/ SMP load balancer
From: Emmanuel Ackaouy <ack@xxxxxxxxxxxxx>
Date: Fri, 26 May 2006 20:11:22 +0100
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
In-reply-to: <447721EF.6040401@xxxxxxxxxx>
Mail-followup-to: Anthony Liguori <aliguori@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx
References: <20060526130150.GA2756@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> <447721EF.6040401@xxxxxxxxxx>
User-agent: Mutt/1.4.1i
Hi Anthony.

Thanks for your feedback. I'll take a look at your comments
regarding the Xend Python code in the patch this weekend.

On Fri, May 26, 2006 at 10:42:39AM -0500, Anthony Liguori wrote:
> Can you provide some more details on any results you may have seen with 
> the new scheduler?  How does it affect common benchmarks?  How does the 
> "load balancer" scale?  How much penalty do you pay (if any at all) on UP?

It is not simple to define a set of performance benchmarks for
a VCPU scheduler. On an SMP host, the credit scheduler is a lot
better at enforcing fairness across multiple guests, some SMP
and some UP. Certainly, the VCPU scheduler has an effect on I/O
benchmarks because of the interaction between domUs and dom0.

I found that on a uniprocessor, running ttcp in a domU yielded
almost twice the network bandwidth with the credit scheduler
compared with SEDF. This probably has less to do with scheduling
algorithms than with implementation problems, though.

For SMP guests, the credit scheduler enforces that all VCPUs
make equal progress. This solves a number of serious performance
problems when you are time slicing some of your physical CPUs
between multiple SMP guests.

In terms of consolidating multiple guests on one SMP host, we
are now playing in a different ballpark with the credit scheduler.
When a CPU goes idle, it immediately picks up a runnable VCPU
waiting on the runqueue of another CPU. With SEDF and BVT, you
have to place all the VCPUs in the system manually, and there are
no dynamic adjustments when VCPUs go to sleep waiting for I/O.
The credit scheduler is work conserving: it will make use of any
idle CPU cycles whenever there is runnable work, and it does so
as soon as a CPU runs out of work. This is in contrast with other
load balancing algorithms that work in the background and move
things around on some type of clock tick. Being work conserving
on SMP hosts is a huge improvement over the previous scheduler.
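To illustrate the work-conserving idea described above, here is a
minimal Python sketch. It is not the Xen implementation (which is
written in C); the class CpuRunqueue, the function pick_next and the
VCPU names are hypothetical. Each physical CPU has its own runqueue,
and a CPU whose queue is empty steals a waiting VCPU from a peer at
the moment it runs out of work, instead of waiting for a periodic
balancer. Locking, VCPU affinity and priorities are deliberately left
out.

```python
from collections import deque


class CpuRunqueue:
    """Hypothetical per-physical-CPU runqueue holding runnable VCPU names."""

    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.queue = deque()


def pick_next(cpu, runqueues):
    """Pick the next VCPU for `cpu`, stealing from a peer if the local queue is empty.

    Sketch only: a real scheduler would take per-runqueue locks, respect
    VCPU affinity and priority, and choose its steal victim more carefully.
    """
    local = runqueues[cpu]
    if local.queue:
        return local.queue.popleft()
    # Local queue is empty: instead of idling until some background
    # balancer runs on a clock tick, scan the other CPUs right now.
    for other in runqueues:
        if other is not local and other.queue:
            # Take from the tail: the VCPU that would otherwise wait longest.
            return other.queue.pop()
    return None  # no runnable work anywhere, so this CPU really idles
```

For example, with three VCPUs queued on CPU 0 and nothing on CPU 1,
calling pick_next(1, runqueues) lets CPU 1 pick up "d3v0" immediately
rather than sitting idle while CPU 0 has a backlog.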

In terms of scaling, I have taken profiles on an 8-way system
and found lock contention to be reasonable. We'll need to do
some performance work, and perhaps pad some cachelines or change
a few things, to run on very large NUMA-type systems, but the
credit scheduler is designed to scale to very large systems.

The common code path (do_schedule) is designed to be extremely
fast on both UP and MP systems. Using the scientific method of
code inspection :-), these code paths are a lot shorter and faster
than the SEDF ones. The accounting work in the credit scheduler is
done every 30 milliseconds outside the common path, and its
complexity is linear in the number of running VCPUs in the
system. Making the accounting overhead independent of the
number of scheduling operations is good for I/O workloads, where
lots of context switches occur.
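The split between a cheap scheduling path and a periodic accounting
pass can be sketched in Python as follows. This is a simplified model
under stated assumptions, not the Xen code: the names Vcpu, account,
burn, build_queues and pick_next_vcpu, the per-VCPU weight, and the
"under"/"over" split by remaining credit are hypothetical. Only the
30 ms period and the linear cost of accounting come from the post.
The point is that account() walks the active VCPUs once per period,
so its cost does not grow with the number of context switches, while
pick_next_vcpu() does constant work per scheduling decision.

```python
from collections import deque
from dataclasses import dataclass

ACCOUNTING_PERIOD_MS = 30  # period stated in the post; unused by this sketch


@dataclass
class Vcpu:
    name: str
    weight: int  # hypothetical relative share of CPU time
    credit: int = 0


def account(vcpus, credits_per_period):
    """Periodic accounting: a single O(n) pass over the active VCPUs.

    Runs once per period, outside the scheduling fast path, so its cost
    is independent of how many scheduling decisions were made.
    """
    total_weight = sum(v.weight for v in vcpus)
    if total_weight == 0:
        return
    for v in vcpus:
        v.credit += credits_per_period * v.weight // total_weight


def burn(vcpu, ran_ms, credits_per_ms):
    """Charge a VCPU for the CPU time it consumed (constant time)."""
    vcpu.credit -= ran_ms * credits_per_ms


def build_queues(vcpus):
    """Assumed priority split: VCPUs with credit left run before those in debt."""
    under = deque(v for v in vcpus if v.credit > 0)
    over = deque(v for v in vcpus if v.credit <= 0)
    return under, over


def pick_next_vcpu(under, over):
    """Fast path: O(1), take the head of the higher-priority queue."""
    if under:
        return under.popleft()
    if over:
        return over.popleft()
    return None
```

With weights 256 and 512 and 300 credits per period, account() hands
out 100 and 200 credits respectively; if the second VCPU then burns
300 credits, it drops behind the first one at the next decision.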

> Better yet, if you have a paper you could share, that would be even 
> better :-)  If you cannot share because of conference restrictions, it 
> would be nice if you could share a condensed version (similar to what the L4ka
> group did for their afterburning work).

Writing a paper is something I'd like to do at some point once
we've had more experience in the field.

> Based on your description though, the new scheduler looks very promising!

I am eager to hear people's experiences with the new scheduler,
especially on SMP hosts.


Xen-devel mailing list