WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
To: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
Subject: Re: [Xen-devel] Questions regarding Xen Credit Scheduler
From: Gaurav Dhiman <dimanuec@xxxxxxxxx>
Date: Thu, 15 Jul 2010 17:41:09 -0700
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date: Thu, 15 Jul 2010 17:42:02 -0700
In-reply-to: <AANLkTinFWVZwfPdVc_pbo6x77KYNqVYEa8xJCkbEAjKF@xxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <AANLkTiloe7jgMO49i72sF0MDFmuHJJeysEBb0oLVNono@xxxxxxxxxxxxxx> <AANLkTikh504vP27XP1SXtNANv2h1Z42RNDgEzRMjI-BK@xxxxxxxxxxxxxx> <AANLkTim2BYie1fZS00YO23XOZB3KRv8JFXmptbt9I-rp@xxxxxxxxxxxxxx> <AANLkTinFWVZwfPdVc_pbo6x77KYNqVYEa8xJCkbEAjKF@xxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
On Mon, Jul 12, 2010 at 4:05 AM, George Dunlap
<George.Dunlap@xxxxxxxxxxxxx> wrote:
>> 2. __runq_tickle: Tickle the CPU even if the new VCPU has the same
>> priority but more credits left. The current code just looks at the
>> priority.
> [snip]
>> 5. csched_schedule: Always call csched_load_balance. In the
>> csched_load_balance and csched_runq_steal functions, change the logic
>> to grab a VCPU with higher credit. The current code just works on
>> priority.
>
> I'm much more wary of these ideas.  The problem here is that doing
> runqueue tickling and load balancing isn't free -- IPIs can be
> expensive, especially if your VMs are running with hardware
> virtualization.  In fact, with the current scheduler, you get a sort
> of n^2 effect, where the time the system spends doing IPIs due to
> load balancing grows with the square of the number of schedulable
> entities.  In addition, frequent migration will reduce cache
> effectiveness and increase congestion on the memory bus.
>
> I presume you want to do this to decrease the latency?  Lee et al [1]
> actually found that *decreasing* the cpu migrations of their soft
> real-time workload led to an overall improvement in quality.  The
> paper doesn't delve deeply into why, but it seems reasonable to
> conclude that although the vcpus may have been able to start their
> task sooner (although even that's not guaranteed -- it may have taken
> longer to migrate than to get to the front of the runqueue), they
> ended their task later, presumably due to cpu stalls on cacheline
> misses and so on.
>

Thanks for this paper. It gives a very interesting analysis of what
can go wrong with applications that fall in the middle (they need CPU,
but are latency-sensitive as well). In my experiments, I see some
servers, like MySQL database servers, fall into this category. As
expected, they do not do well with CPU-intensive jobs running in the
background, even if I give them the highest possible weight (65535). I
guess very aggressive migrations might not be a good idea, but there
needs to be some way to guarantee that such apps get their fair share
at the right time.
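
For concreteness, the credit-aware tickle check I was proposing in
item 2 could look something like the sketch below. The struct and
helper names are made up, not the actual sched_credit.c code, and I
am assuming for the sketch that a larger prio value means higher
priority:

```c
#include <stdbool.h>

/* Hypothetical, stripped-down view of a runqueue entry; the real
 * struct csched_vcpu carries much more state. */
struct vcpu_ent {
    int prio;    /* assumed: larger value = higher priority */
    int credit;  /* credits remaining */
};

/* Tickle (IPI) the target pcpu not only when the waking vcpu has
 * strictly higher priority than the one currently running there,
 * but also on a priority tie when it holds more credit. */
static bool should_tickle(const struct vcpu_ent *waking,
                          const struct vcpu_ent *running)
{
    if (waking->prio > running->prio)
        return true;
    if (waking->prio == running->prio &&
        waking->credit > running->credit)
        return true;
    return false;
}
```

As you point out, the extra tickles this produces are not free, so the
IPI cost side would need to be measured as well.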

> I think a much better approach would be:
> * To have long-term effective placement, if possible: i.e., distribute
> latency-sensitive vcpus
> * If two latency-sensitive vcpus are sharing a cpu, do shorter time-slices.

These are very interesting ideas indeed.
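
If I understand the second idea correctly, the slice-length choice
would be something like this sketch (the constants and function name
are made up for illustration; I am taking 30ms as the default slice):

```c
/* Illustrative slice lengths in ms; SLICE_SHORT_MS is an assumed
 * value, not anything taken from the current scheduler. */
#define SLICE_DEFAULT_MS 30
#define SLICE_SHORT_MS    5

/* If two or more latency-sensitive vcpus share a pcpu, use a shorter
 * time-slice so each of them gets the cpu more often; otherwise keep
 * the default slice to preserve cache warmth. */
static int pick_timeslice_ms(int n_latency_sensitive_on_cpu)
{
    return (n_latency_sensitive_on_cpu >= 2) ? SLICE_SHORT_MS
                                             : SLICE_DEFAULT_MS;
}
```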

>> 4. csched_acct: If the credit of a VCPU crosses 300, then set it to
>> 300, not 0. I am still not sure why the VCPU is being marked as
>> inactive. Can't I just update the credit and let it stay active?

> So what credit1 does is assume that all workloads fall into two
> categories: "active" VMs, which consume as much cpu as they can, and
> "inactive" (or "I/O-bound") VMs, which use almost no cpu.  "Inactive"
> VMs essentially run at BOOST priority, and run whenever they want to.
> Then the credit for each timeslice is divided among the "active" VMs.
>  This way the ones that are consuming cpu don't get too far behind.
>
> The problem of course, is that most server workloads fall in the
> middle: they spend a significant time processing, but also a
> significant time waiting for more network packets.

This is precisely the problem we are facing.
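
Just to check my understanding of the accounting model you describe,
here is a toy version of the "split each period's credit among the
active domains by weight" step (illustrative only, nothing like the
real csched_acct() code; the per-period total is an assumed value):

```c
#define CREDITS_PER_PERIOD 300  /* assumed total, echoing the 300
                                   figure discussed above */

/* Toy domain record: inactive (I/O-bound) domains are skipped by
 * accounting and effectively run at BOOST priority when they wake. */
struct dom {
    int weight;
    int active;   /* 1 if on the active list */
    int credit;
};

static void toy_acct(struct dom *doms, int n)
{
    int total_weight = 0;
    for (int i = 0; i < n; i++)
        if (doms[i].active)
            total_weight += doms[i].weight;
    if (total_weight == 0)
        return;
    /* Each active domain gets a weight-proportional share of the
     * period's credit; inactive domains get nothing. */
    for (int i = 0; i < n; i++)
        if (doms[i].active)
            doms[i].credit += CREDITS_PER_PERIOD * doms[i].weight
                              / total_weight;
}
```

This makes the failure mode visible: a domain in the middle keeps
getting bounced between the two categories, and neither fits it well.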

> I looked at the idea of "capping" credit, as you say; but the
> steady-state when I worked out the algorithms by hand was that all the
> VMs were at their cap all the time, which screwed up other aspects of
> the algorithm.  Credits need to be thrown away; my proposal was to
> divide the credits by 2, rather than setting to 0.  This should be a
> good mid-way.

Sure, dividing by 2 could be a good middle ground. Could we
additionally avoid marking them inactive as well?
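
To illustrate the difference I mean, with the 300 figure from above
(toy functions, not real code):

```c
/* Hard cap: credit saturates at 300, so in steady state every active
 * vcpu just sits pinned at the cap, which is the problem you saw. */
static int clamp_credit(int credit)
{
    return credit > 300 ? 300 : credit;
}

/* Your proposal: past the cap, halve instead of zeroing, so excess
 * credit is still thrown away but the vcpus stay spread out. */
static int halve_credit(int credit)
{
    return credit > 300 ? credit / 2 : credit;
}
```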

> These things are actually really subtle.  I've spent hours and hours
> with pencil-and-paper, working out different algorithms by hand, to
> see exactly what effect the different changes would have.  I even
> wrote a discrete event simulator, to make the process a bit faster.
> (But of course, to understand why things look the way they do, you
> still have to trace through the algorithm manually).  If you're really
> keen, I can tar it up and send it to you. :-)

I am just figuring out how non-trivial these apparently small problems
are :-) It would be great if you could share your simulator!

I will keep you posted on my changes and tests.

Thanks,
-Gaurav

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel