Hi, George
I will make a simple runqueue-sort version.
By the way, sorting adds some overhead
(even when only a small number of vcpus is on the pcpu runqueue).
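
Roughly, I am thinking of something like the sketch below: sort each
pcpu runqueue in descending credit order, once per accounting period,
so that the most under-served vcpus run first. This is only an
illustration with placeholder names, not the actual patch against
sched_credit.c.

#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-in for a vcpu waiting on a pcpu runqueue. */
struct vcpu {
    const char *name;
    int credit;              /* credits left this accounting period */
};

/* Descending credit order: the vcpu with the most unspent credits
 * (i.e. the one that has run least relative to its weight) goes to
 * the front of the queue. */
static int cmp_credit(const void *a, const void *b)
{
    const struct vcpu *va = a, *vb = b;
    return vb->credit - va->credit;
}

int main(void)
{
    /* Per-vcpu credits from George's example quoted below. */
    struct vcpu runq[] = {
        { "d1v0",  37 }, { "d2v0",  37 },
        { "d3v0",  75 }, { "d4v0", 150 },
    };
    size_t n = sizeof(runq) / sizeof(runq[0]);

    /* Sorting only once per accounting period keeps the overhead
     * bounded, even though it is not free. */
    qsort(runq, n, sizeof(runq[0]), cmp_credit);

    for (size_t i = 0; i < n; i++)
        printf("%s: %d credits\n", runq[i].name, runq[i].credit);
    return 0;
}
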
Thanks
Atsushi SAKAI
"George Dunlap" <George.Dunlap@xxxxxxxxxxxxx> wrote:
> OK, I've worked through an example by hand and think I see what's going on.
>
> So the idea of the credit scheduler is that we have a certain number
> of "credits" per accounting period, and each of these credits
> represents a certain amount of time. The scheduler gives out credits
> according to weight, so theoretically each accounting period, if all
> vcpus are active, each should consume all of its credits. Based on
> that assumption, if a vcpu has run and accumulated more than one full
> accounting period of credits, it's probably idle and we can leave it
> be.
>
> The problem in this situation isn't so much rounding errors as
> *scheduling granularity*. In the example given:
>
> d1: weight 128
> d2: weight 128
> d3: weight 256
> d4: weight 512
>
> If each domain has 2 vcpus, and there are 2 cores, then the credits
> will be divided thus:
>
> d1: 37 credits / vcpu
> d2: 37 credits / vcpu
> d3: 75 credits / vcpu
> d4: 150 credits / vcpu
>
> But scheduling and accounting only happen every "tick", and every
> "tick" is worth 100 credits. So each vcpu of d{1,2}, instead of
> consuming 37 credits, consumes 100; the same goes for each vcpu of d3.
> At the end of the first accounting period, d{1,2,3} have each gotten to
> run for 100 credits' worth of time, but d4 hasn't gotten to run at all.
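>
> To spell the arithmetic out (the figures above imply 3 ticks of 100
> credits per pcpu per accounting period, i.e. 300 credits per pcpu and
> 600 credits total across the 2 cores):
>
>   d1: 600 * 128/1024 =  75 per domain -> ~37 per vcpu
>   d2: 600 * 128/1024 =  75 per domain -> ~37 per vcpu
>   d3: 600 * 256/1024 = 150 per domain ->  75 per vcpu
>   d4: 600 * 512/1024 = 300 per domain -> 150 per vcpu
>
> So a single 100-credit tick of run time is already more than a whole
> period's allocation for a vcpu of d1, d2 or d3.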
>
> In short, the fact that we have a 100-credit scheduling granularity
> breaks the assumption that every VM has had a chance to run each
> accounting period when there are really long runqueues.
>
> I can think of a couple of solutions: the simplest one might be to
> sort the runqueue by number of credits -- at least every accounting
> period. In that case, d4 would always get to run every accounting
> period; d{1,2} might not run in a given accounting period, but the
> next time around they would have twice the number of credits, &c.
>
> Others might include extending accounting periods when we have long
> runqueues, or applying the credit limit during accounting only if the
> vcpu is not on the runqueue (Sakai-san's idea) *combined* with a check when
> the vcpu blocks. That would catch vcpus that are only moderately
> active, but just happen to be on the runqueue for several accounting
> periods in a row.
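>
> As a rough sketch of that combined approach (illustrative only -- the
> constants and helper names below are placeholders, not the actual
> sched_credit.c identifiers):
>
> #include <stdbool.h>
> #include <stdio.h>
>
> #define CREDITS_PER_ACCT 300      /* placeholder: credits granted per period */
>
> struct vcpu_state {
>     int  credit;
>     bool on_runq;                 /* queued, waiting for a pcpu */
>     bool active;                  /* still on the accounting list */
> };
>
> /* The existing "credit reset": a vcpu holding more than a full
>  * period's worth of credits is treated as idle. */
> static void cap_credit(struct vcpu_state *v)
> {
>     if (v->credit > CREDITS_PER_ACCT) {
>         v->active = false;
>         v->credit = 0;
>     }
> }
>
> /* At accounting time, only cap vcpus that are not stuck on a
>  * runqueue (Sakai-san's condition)... */
> static void on_accounting(struct vcpu_state *v)
> {
>     if (!v->on_runq)
>         cap_credit(v);
> }
>
> /* ...and also cap when a vcpu blocks, to catch moderately active
>  * vcpus that happen to sit on the runqueue at every accounting point. */
> static void on_block(struct vcpu_state *v)
> {
>     v->on_runq = false;
>     cap_credit(v);
> }
>
> int main(void)
> {
>     struct vcpu_state v = { .credit = 400, .on_runq = true, .active = true };
>
>     on_accounting(&v);            /* skipped: still on the runqueue */
>     on_block(&v);                 /* caught here instead */
>     printf("active=%d credit=%d\n", v.active, v.credit);
>     return 0;
> }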
>
> Sakai-san, would you be willing to try to implement a simple "runqueue
> sort" patch, and see if it also solves your scheduling issue?
>
> -George
>
> On Wed, Dec 10, 2008 at 2:45 AM, Atsushi SAKAI <sakaia@xxxxxxxxxxxxxx> wrote:
> > Hi, Emmanuel
> >
> > 1) Rounding error for credit
> >
> > The effect this patch addresses is larger than a rounding error,
> > so I don't think that effect needs to be considered here.
> > If you think it does, would you suggest your patch?
> > It seems that changing CSCHED_TICKS_PER_ACCT alone is not enough.
> >
> > 2) Effect on I/O-intensive jobs
> >
> > I did not change the code for BOOST priority;
> > I only changed the "credit reset" condition.
> > There should be no effect on I/O-intensive workloads (but I have not
> > measured it). If needed, I will test it.
> > Which test is best for this change?
> > (A simple I/O test is not enough for this case;
> > I think a complex domain I/O configuration is needed to prove the
> > effect of this patch.)
> >
> > 3) Vcpu allocation measurement
> >
> > At first I used
> > http://weather.ou.edu/~apw/projects/stress/
> > stress --cpu xx --timeout xx --verbose
> > and then a simpler test (since each domain has 2 vcpus):
> > yes > /dev/null &
> > yes > /dev/null &
> > Now I have tested with the suggested method; the results are:
> >
> >          original   w/ patch
> > dom1        27         25
> > dom2        27         25
> > dom3        53         50
> > dom4        91         98
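> >
> > (For reference -- assuming these are %CPU per domain -- the ideal
> > split for weights 128:128:256:512 on 2 pcpus would be about
> > 25 / 25 / 50 / 100, so the patched numbers are closer to the target.)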
> >
> >
> > Thanks
> > Atsushi SAKAI
> >
> > Emmanuel Ackaouy <ackaouy@xxxxxxxxx> wrote:
> >
> >> On Dec 9, 2008, at 2:25, George Dunlap wrote:
> >> > On Tue, Dec 9, 2008 at 7:33 AM, Atsushi SAKAI
> >> > <sakaia@xxxxxxxxxxxxxx> wrote:
> >> >> You mean it should get rid of "credit reset"?
> >> >
> >> > Yes, that's exactly what I was thinking. Removing the check for vcpus
> >> > on the runqueue may actually be functionally equivalent to removing
> >> > the check altogether.
> >>
> >> Essentially, this code is there as a safeguard against rounding errors
> >> and other oddball cases. In theory, a runnable VCPU should seldom
> >> accumulate more than one time slice's worth of credits.
> >>
> >> The problem with your change is that a VCPU that is not a spinner
> >> but instead runs and sleeps may not be removed from the accounting
> >> list when it should be, because it will not always be running when
> >> accounting and the check in question are performed. Potentially this will
> >> do very bad things for VCPUs that are I/O intensive or otherwise yield
> >> or sleep for a short time before consuming a full time slice.
> >>
> >> One thing that may help here is to make the credit calculations less
> >> prone to rounding errors. One thing I had wanted to do while at
> >> XenSource but never got around to was to change the arithmetic
> >> so that instead of 30 credits representing a time slice, we would
> >> make this a much bigger number.
> >>
> >> In this case for example, you would get credit allocations that had
> >> less significant rounding errors if you used 30000 instead of 30
> >> credits per time slice:
> >>
> >> dom1 vcpu0,1 w128 credit 3750
> >> dom2 vcpu0,1 w128 credit 3750
> >> dom3 vcpu0,1 w256 credit 7500
> >> dom4 vcpu0,1 w512 credit 15000
> >>
> >> I suspect this would get rid of a large number of cases such as the
> >> one you are reporting, where a runnable VCPU's credit exceeds
> >> one entire time slice. This type of change would improve accuracy
> >> and not screw up credit computation for I/O intensive and other
> >> non-spinning domains.
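> >>
> >> To make the rounding point concrete: the table above hands out
> >> 2 x 30000 = 60000 credits per accounting period across the two
> >> cores. Domain 1's share is 60000 * 128/1024 = 7500, i.e. exactly
> >> 3750 per vcpu. With only 30 credits per time slice the same share
> >> is 60 * 128/1024 = 7.5 per domain, which truncates to 3 per vcpu
> >> in integer arithmetic -- an error of about 20% of the allocation.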
> >>
> >> What do you think?
> >>
> >> Also please confirm that your VCPUs are indeed doing simple
> >> "while(1);" loops.
> >>
> >> Cheers,
> >> Emmanuel.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel