   
 
 
 
   
 

xen-devel

Re: [Xen-devel] [PATCH] Accurate vcpu weighting for credit scheduler

To: NISHIGUCHI Naoki <nisiguti@xxxxxxxxxxxxxx>
Subject: Re: [Xen-devel] [PATCH] Accurate vcpu weighting for credit scheduler
From: Atsushi SAKAI <sakaia@xxxxxxxxxxxxxx>
Date: Thu, 18 Dec 2008 12:31:53 +0900
Cc: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, Emmanuel Ackaouy <ackaouy@xxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4949C17C.50403@xxxxxxxxxxxxxx>
References: <4949C17C.50403@xxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Hi, Naoki

Please ask Emmanuel and George first,
since I am not the scheduler maintainer.

By the way, I would like to see the xentrace data for the original code.
(Adding vcpu priority and credit to the trace output would be helpful.)
Your problem seems to be vcpu priority mishandling somewhere.

Thanks
Atsushi SAKAI


NISHIGUCHI Naoki <nisiguti@xxxxxxxxxxxxxx> wrote:

> Hi, Atsushi
> 
> After applying my patches, I ran a similar test.
> The CPU% is as follows.
> dom0  25
> dom1  25
> dom2  50
> dom3 100
> 
> What do you think about my patches?
> 
> Regards,
> Naoki Nishiguchi
> 
> Atsushi SAKAI wrote:
> > Hi, George
> > 
> > Sorry for the delay.
> > 
> > With this type of change,
> > the CPU% is as follows.
> > dom1  26
> > dom2  26
> > dom3  51
> > dom4  96
> > 
> > Thanks
> > Atsushi SAKAI
> > 
> > "George Dunlap" <George.Dunlap@xxxxxxxxxxxxx> wrote:
> > 
> > >> OK, I've grueled through an example by hand and think I see what's going on.
> >>
> >> So the idea of the credit scheduler is that we have a certain number
> >> of "credits" per accounting period, and each of these credits
> >> represents a certain amount of time.  The scheduler gives out credits
> >> according to weight, so theoretically each accounting period, if all
> >> vcpus are active, each should consume all of its credits.  Based on
> >> that assumption, if a vcpu has run and accumulated more than one full
> >> accounting period of credits, it's probably idle and we can leave it
> >> be.
> >>
> > >> The problem in this situation isn't so much with rounding errors, as
> > >> with *scheduling granularity*.  In the example given:
> >>
> >> d1: weight 128
> >> d2: weight 128
> >> d3: weight 256
> >> d4: weight 512
> >>
> >> If each domain has 2 vcpus, and there are 2 cores, then the credits
> >> will be divided thus:
> >>
> >> d1: 37 credits / vcpu
> >> d2: 37 credits / vcpu
> >> d3: 75 credits / vcpu
> >> d4: 150 credits / vcpu
> >>
> > >> But scheduling and accounting only happen every "tick", and
> > >> every "tick" is 100 credits.  So each vcpu of d{1,2}, instead of
> >> consuming 37 credits, consumes 100; same with each vcpu of d3.   At
> >> the end of the first accounting period, d{1,2,3} have gotten to run
> >> for 100 credits worth of time, but d4 hasn't gotten to run at all.
> >>
> >> In short, the fact that we have a 100-credit scheduling granularity
> >> breaks the assumption that every VM has had a chance to run each
> >> accounting period when there are really long runqueues.
> >>
> >> I can think of a couple of solutions: the simplest one might be to
> >> sort the runqueue by number of credits -- at least every accounting
> >> period.  In that case, d4 would always get to run every accounting
> > >> period; d{1,2} might not run for a given accounting period, but the
> >> next time it would have twice the number of credits, &c.
> >>
> >> Others might include extending accounting periods when we have long
> >> runqueues, or doing the credit limit during accounting only if it's
> >> not on the runqueue (Sakai-san's idea) *combined* with a check when
> >> the vcpu blocks.  That would catch vcpus that are only moderately
> >> active, but just happen to be on the runqueue for several accounting
> >> periods in a row.
> >>
> >> Sakai-san, would you be willing to try to implement a simple "runqueue
> >> sort" patch, and see if it also solves your scheduling issue?
> >>
> >>  -George
> >>
> > >> On Wed, Dec 10, 2008 at 2:45 AM, Atsushi SAKAI <sakaia@xxxxxxxxxxxxxx> wrote:
> >>> Hi, Emmanuel
> >>>
> > >>> 1) Rounding error for credit
> > >>>
> > >>> The effect this patch addresses is larger than rounding error,
> > >>> so I think the patch does not need to take rounding into account.
> > >>> If you think otherwise, would you suggest a patch?
> > >>> It seems changing CSCHED_TICKS_PER_ACCT is not enough.
> >>>
> > >>> 2) Effect on I/O-intensive jobs
> > >>>
> > >>> I have not changed the code for BOOST priority.
> > >>> I only changed the "credit reset" condition.
> > >>> It should have no effect on I/O-intensive workloads (but I have not measured it).
> > >>> If needed, I will test it.
> > >>> Which test is best for this change?
> > >>> (A simple I/O test is not enough for this case;
> > >>> I think a complex multi-domain I/O configuration is needed to show the
> > >>> effect of this patch.)
> >>>
> > >>> 3) vcpu allocation measurement
> > >>>
> > >>> At first, I used
> > >>>  http://weather.ou.edu/~apw/projects/stress/
> > >>>  stress --cpu xx --timeout xx --verbose
> > >>> Then I used a simple test (since each domain has 2 vcpus):
> > >>>  yes > /dev/null &
> > >>>  yes > /dev/null &
> > >>> Now I have tested with the suggested method, and the result (CPU%) is:
> > >>>         original   w/ patch
> > >>> dom1       27         25
> > >>> dom2       27         25
> > >>> dom3       53         50
> > >>> dom4       91         98
> >>>
> >>>
> >>> Thanks
> >>> Atsushi SAKAI
> >>>
> >>>
> >>>
> >>>
> >>> Emmanuel Ackaouy <ackaouy@xxxxxxxxx> wrote:
> >>>
> >>>> On Dec 9, 2008, at 2:25, George Dunlap wrote:
> >>>>> On Tue, Dec 9, 2008 at 7:33 AM, Atsushi SAKAI
> >>>>> <sakaia@xxxxxxxxxxxxxx> wrote:
> >>>>>> You mean it should get rid of "credit reset"?
> >>>>> Yes, that's exactly what I was thinking.  Removing the check for vcpus
> >>>>> on the runqueue may actually be functionally equivalent to removing
> >>>>> the check altogether.
> >>>> Essentially, this code is there as a safeguard against rounding errors
> >>>> and other oddball cases. In theory, a runnable VCPU should seldom
> >>>> accumulate more than one time slice's worth of credits.
> >>>>
> > >>>> The problem with your change is that a VCPU that is not a spinner
> > >>>> but instead runs and sleeps may not be removed from the accounting
> > >>>> list when it should be, because it will not always be running when
> > >>>> accounting and the check in question are performed. Potentially this will
> >>>> do very bad things for VCPUs that are I/O intensive or otherwise yield
> >>>> or sleep for a short time before consuming a full time slice.
> >>>>
> >>>> One thing that may help here is to make the credit calculations less
> >>>> prone to rounding errors. One thing I had wanted to do while at
> >>>> XenSource but never got around to was to change the arithmetic
> >>>> so that instead of 30 credits representing a time slice, we would
> >>>> make this a much bigger number.
> >>>>
> >>>> In this case for example, you would get credit allocations that had
> >>>> less significant rounding errors if you used 30000 instead of 30
> >>>> credits per time slice:
> >>>>
> >>>> dom1 vcpu0,1 w128 credit 3750
> >>>> dom2 vcpu0,1 w128 credit 3750
> >>>> dom3 vcpu0,1 w256 credit 7500
> >>>> dom4 vcpu0,1 w512 credit 15000
> >>>>
> >>>> I suspect this would get rid of a large number of cases such as the
> >>>> one you are reporting, where a runnable VCPU's credit exceeds
> >>>> one entire time slice. This type of change would improve accuracy
> >>>> and not screw up credit computation for I/O intensive and other
> >>>> non spinning domains.
> >>>>
> >>>> What do you think?
> >>>>
> >>>> Also please confirm that your VCPUs are indeed doing simple
> >>>> "while(1);" loops.
> >>>>
> >>>> Cheers,
> >>>> Emmanuel.
> >>>
> >>>
> >>> _______________________________________________
> >>> Xen-devel mailing list
> >>> Xen-devel@xxxxxxxxxxxxxxxxxxx
> >>> http://lists.xensource.com/xen-devel
> >>>
> >>>
> >>> ------------------------------------------------------------------------
> >>>
> >>> _______________________________________________
> >>> Xen-devel mailing list
> >>> Xen-devel@xxxxxxxxxxxxxxxxxxx
> >>> http://lists.xensource.com/xen-devel
> 
> 
> 


