Re: [Xen-devel] atropos scheduler broken

To:	Diwaker Gupta <diwakergupta@xxxxxxxxx>
Subject:	Re: [Xen-devel] atropos scheduler broken
From:	Steven Hand <Steven.Hand@xxxxxxxxxxxx>
Date:	Tue, 26 Oct 2004 08:36:50 +0100
Cc:	xen-devel@xxxxxxxxxxxxxxxxxxxxx, Steven.Hand@xxxxxxxxxxxx
Delivery-date:	Tue, 26 Oct 2004 08:36:50 +0100
Envelope-to:	Steven.Hand@xxxxxxxxxxxx
In-reply-to:	Message from Diwaker Gupta <diwakergupta@xxxxxxxxx> of "Mon, 25 Oct 2004 11:08:18 PDT." <1b0b4557041025110861e4e9bb@xxxxxxxxxxxxxx>

>I've been playing around with the atropos scheduler last couple of
>days, and I'm quite convinced that it *does not* enforce the soft real
>time guarantees. 

It is quite possible our current implementation is buggy -- we haven't
done any extensive testing of it recently. 

> Maybe I'm using the wrong parameters or something, so let me describe 
> my experiment:
>
>o first I create 2 VMs -- VM1 and VM2
>o then I change their atropos params as follows:
>$ xm atropos 1 10 100 1 1
>$ xm atropos 2 70 100 1 1
>Ideally, this should guarantee that VM1 gets 10ns of CPU time every
>100ns, and VM2 gets 70ns every 100ns, and that any left over CPU time
>will be shared between the 2.

Well, your parameters are somewhat aggressive -- although times 
are specified in nanoseconds, this is for precision rather than for 
allowing 10ns slices and 100ns periods (which translates into at 
least 10 million context switches a second). x86 CPUs don't really
turn corners too fast, and so this is a considerable overhead. 
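
To put a number on that, here is a rough sketch of the arithmetic. It
assumes, as a lower bound only, one context switch per period; the
function name is made up for illustration and is not anything in Xen.

```python
NS_PER_SECOND = 1_000_000_000

def min_switches_per_second(period_ns):
    """Lower bound, assuming at least one context switch per period."""
    return NS_PER_SECOND // period_ns

# A 100ns period gives at least ten million switches every second.
print(min_switches_per_second(100))  # 10000000
```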

Atropos doesn't work if it's in overload (>= 100%), which includes 
both allocated slices and all overhead for context switching, running 
through the scheduler, and certain IRQ handling. 
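
As a sketch of that overload condition: the helper names and the
overhead figures below are hypothetical, and the (slice, period) pairs
are read the same way as in the commands quoted above. Exact fractions
are used so that the >= 100% comparison is not blurred by float rounding.

```python
from fractions import Fraction

def allocated_share(domains):
    """Sum of slice/period over all domains, as an exact fraction."""
    return sum(Fraction(slice_ns, period_ns) for slice_ns, period_ns in domains)

def is_overloaded(domains, overhead_share):
    """True if allocated slices plus overhead reach 100% of the CPU.

    overhead_share stands in for context switching, scheduler passes and
    IRQ handling; it has to be measured, the values used here are made up.
    """
    return allocated_share(domains) + overhead_share >= 1

domains = [(10, 100), (70, 100)]                # 10% + 70% = 80% allocated
print(allocated_share(domains))                 # 4/5
print(is_overloaded(domains, Fraction(1, 10)))  # False: 90% in total
print(is_overloaded(domains, Fraction(1, 5)))   # True: 100% in total
```

So an 80% allocation leaves only 20% for all overhead, which 100ns
periods are likely to use up.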

Your latency values are also rather aggressive -- 1ns means that if 
a domain blocks for any reason (e.g. to do I/O) then when it unblocks
its new period will start at most 1ns after the current pass through
the scheduler. There's a small modification in the current implementation
which means this may not bite quite as hard as it could, but even so 
any domain waiting more than 100ns for something could cause an immediate
reentry into the scheduler after unblocking due to this. 

One simple thing to try is to scale your scheduling parameters to 
something more reasonable; e.g. 

$ xm atropos 1 10000 100000 50000 1
$ xm atropos 2 70000 100000 50000 1
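
A quick sanity check that the scaled values keep the same CPU shares and
only change how often periods come around. This is a sketch: the
dictionaries and helper names are illustrative, and the pairs are read
as (slice, period) in nanoseconds, as elsewhere in this thread.

```python
from fractions import Fraction

NS_PER_SECOND = 1_000_000_000

# (slice_ns, period_ns) per domain, before and after scaling by 1000.
original = {"VM1": (10, 100), "VM2": (70, 100)}
scaled = {"VM1": (10_000, 100_000), "VM2": (70_000, 100_000)}

def shares(params):
    """CPU share per domain as an exact fraction of the period."""
    return {name: Fraction(s, p) for name, (s, p) in params.items()}

def periods_per_second(params):
    """How many periods each domain goes through every second."""
    return {name: NS_PER_SECOND // p for name, (_, p) in params.items()}

print(shares(original) == shares(scaled))   # True: still 10% and 70%
print(periods_per_second(original)["VM1"])  # 10000000
print(periods_per_second(scaled)["VM1"])    # 10000
```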

Let us know how well this works -- if this is also broken, then we
have a real bug. 

cheers,

S.


P.S. You're not running on SMP, are you? If so, the domains will be 
     on different CPUs and hence the x flag will cause each of them
     to get approximately the same allocation, just as you observed.
