WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
To: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
Subject: Re: [Xen-devel] Questions regarding Xen Credit Scheduler
From: Gaurav Dhiman <dimanuec@xxxxxxxxx>
Date: Thu, 15 Jul 2010 17:41:09 -0700
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date: Thu, 15 Jul 2010 17:42:02 -0700
In-reply-to: <AANLkTinFWVZwfPdVc_pbo6x77KYNqVYEa8xJCkbEAjKF@xxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <AANLkTiloe7jgMO49i72sF0MDFmuHJJeysEBb0oLVNono@xxxxxxxxxxxxxx> <AANLkTikh504vP27XP1SXtNANv2h1Z42RNDgEzRMjI-BK@xxxxxxxxxxxxxx> <AANLkTim2BYie1fZS00YO23XOZB3KRv8JFXmptbt9I-rp@xxxxxxxxxxxxxx> <AANLkTinFWVZwfPdVc_pbo6x77KYNqVYEa8xJCkbEAjKF@xxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
On Mon, Jul 12, 2010 at 4:05 AM, George Dunlap
<George.Dunlap@xxxxxxxxxxxxx> wrote:
>> 2. __runq_tickle: Tickle the CPU even if the new VCPU has the same
>> priority but more credits left. The current code just looks at the
>> priority.
> [snip]
>> 5. csched_schedule: Always call csched_load_balance. In the
>> csched_load_balance and csched_runq_steal functions, change the logic
>> to grab a VCPU with higher credit. The current code just works on
>> priority.
>
> I'm much more wary of these ideas.  The problem here is that doing
> runqueue tickling and load balancing isn't free -- IPIs can be
> expensive, especially if your VMs are running with hardware
> virtualization.  In fact, with the current scheduler, you get a sort
> of n^2 effect, where the time the system spends doing IPIs due to
> load balancing grows with the square of the number of schedulable
> entities.  In addition, frequent migration will reduce cache
> effectiveness and increase congestion on the memory bus.
>
> I presume you want to do this to decrease the latency?  Lee et al [1]
> actually found that *decreasing* the cpu migrations of their soft
> real-time workload led to an overall improvement in quality.  The
> paper doesn't delve deeply into why, but it seems reasonable to
> conclude that although the vcpus may have been able to start their
> task sooner (although even that's not guaranteed -- it may have taken
> longer to migrate than to get to the front of the runqueue), they
> ended their task later, presumably due to cpu stalls on cacheline
> misses and so on.
>

Thanks for this paper. It gives a very interesting analysis of what
can go wrong with applications that fall in the middle (they need CPU,
but are latency-sensitive as well). In my experiments, I see some
servers, like MySQL database servers, fall into this category. As
expected, they do not do well with CPU-intensive jobs running in the
background, even if I give them the highest possible weight (65535). I
guess very aggressive migrations might not be a good idea, but there
needs to be some way to guarantee that such apps get their fair share
at the right time.
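
For concreteness, the credit-aware tickle check I was proposing in
item 2 could look something like the sketch below. The struct and
helper names are made up, not the actual sched_credit.c code, and I
am assuming for the sketch that a larger prio value means higher
priority:

```c
#include <stdbool.h>

/* Hypothetical, stripped-down view of a runqueue entry; the real
 * struct csched_vcpu carries much more state. */
struct vcpu_ent {
    int prio;    /* assumed: larger value = higher priority */
    int credit;  /* credits remaining */
};

/* Tickle (IPI) the target pcpu not only when the waking vcpu has
 * strictly higher priority than the one currently running there,
 * but also on a priority tie when it holds more credit. */
static bool should_tickle(const struct vcpu_ent *waking,
                          const struct vcpu_ent *running)
{
    if (waking->prio > running->prio)
        return true;
    if (waking->prio == running->prio &&
        waking->credit > running->credit)
        return true;
    return false;
}
```

As you point out, the extra tickles this produces are not free, so the
IPI cost side would need to be measured as well.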

> I think a much better approach would be:
> * To have long-term effective placement, if possible: i.e., distribute
> latency-sensitive vcpus
> * If two latency-sensitive vcpus are sharing a cpu, do shorter time-slices.

These are very interesting ideas indeed.
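
If I understand the second idea correctly, the slice-length choice
would be something like this sketch (the constants and function name
are made up for illustration; I am taking 30ms as the default slice):

```c
/* Illustrative slice lengths in ms; SLICE_SHORT_MS is an assumed
 * value, not anything taken from the current scheduler. */
#define SLICE_DEFAULT_MS 30
#define SLICE_SHORT_MS    5

/* If two or more latency-sensitive vcpus share a pcpu, use a shorter
 * time-slice so each of them gets the cpu more often; otherwise keep
 * the default slice to preserve cache warmth. */
static int pick_timeslice_ms(int n_latency_sensitive_on_cpu)
{
    return (n_latency_sensitive_on_cpu >= 2) ? SLICE_SHORT_MS
                                             : SLICE_DEFAULT_MS;
}
```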

>> 4. csched_acct: If the credit of a VCPU crosses 300, then set it to
>> 300, not 0. I am still not sure why the VCPU is being marked as
>> inactive. Can't I just update the credit and let it stay active?

> So what credit1 does is assume that all workloads fall into two
> categories: "active" VMs, which consume as much cpu as they can, and
> "inactive" (or "I/O-bound") VMs, which use almost no cpu.  "Inactive"
> VMs essentially run at BOOST priority, and run whenever they want to.
> Then the credit for each timeslice is divided among the "active" VMs.
>  This way the ones that are consuming cpu don't get too far behind.
>
> The problem of course, is that most server workloads fall in the
> middle: they spend a significant time processing, but also a
> significant time waiting for more network packets.

This is precisely the problem we are facing.
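
Just to check my understanding of the accounting model you describe,
here is a toy version of the "split each period's credit among the
active domains by weight" step (illustrative only, nothing like the
real csched_acct() code; the per-period total is an assumed value):

```c
#define CREDITS_PER_PERIOD 300  /* assumed total, echoing the 300
                                   figure discussed above */

/* Toy domain record: inactive (I/O-bound) domains are skipped by
 * accounting and effectively run at BOOST priority when they wake. */
struct dom {
    int weight;
    int active;   /* 1 if on the active list */
    int credit;
};

static void toy_acct(struct dom *doms, int n)
{
    int total_weight = 0;
    for (int i = 0; i < n; i++)
        if (doms[i].active)
            total_weight += doms[i].weight;
    if (total_weight == 0)
        return;
    /* Each active domain gets a weight-proportional share of the
     * period's credit; inactive domains get nothing. */
    for (int i = 0; i < n; i++)
        if (doms[i].active)
            doms[i].credit += CREDITS_PER_PERIOD * doms[i].weight
                              / total_weight;
}
```

This makes the failure mode visible: a domain in the middle keeps
getting bounced between the two categories, and neither fits it well.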

> I looked at the idea of "capping" credit, as you say; but the
> steady-state when I worked out the algorithms by hand was that all the
> VMs were at their cap all the time, which screwed up other aspects of
> the algorithm.  Credits need to be thrown away; my proposal was to
> divide the credits by 2, rather than setting to 0.  This should be a
> good mid-way.

Sure, dividing by 2 could be a good middle ground. Could we
additionally avoid marking them inactive as well?
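
To illustrate the difference I mean, with the 300 figure from above
(toy functions, not real code):

```c
/* Hard cap: credit saturates at 300, so in steady state every active
 * vcpu just sits pinned at the cap, which is the problem you saw. */
static int clamp_credit(int credit)
{
    return credit > 300 ? 300 : credit;
}

/* Your proposal: past the cap, halve instead of zeroing, so excess
 * credit is still thrown away but the vcpus stay spread out. */
static int halve_credit(int credit)
{
    return credit > 300 ? credit / 2 : credit;
}
```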

> These things are actually really subtle.  I've spent hours and hours
> with pencil-and-paper, working out different algorithms by hand, to
> see exactly what effect the different changes would have.  I even
> wrote a discrete event simulator, to make the process a bit faster.
> (But of course, to understand why things look the way they do, you
> still have to trace through the algorithm manually).  If you're really
> keen, I can tar it up and send it to you. :-)

I am just figuring out how non-trivial these apparently small problems
are :-) It would be great if you could share your simulator!

I will keep you posted on my changes and tests.

Thanks,
-Gaurav

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel