Hi, Naoki
Ask Emmanuel and George first,
since I am not maintaining the scheduler.
By the way, I would like to see the xentrace data for the original one.
(Adding vcpu priority and credit to the trace output would be helpful.)
Your problem looks like vcpu priority mis-handling somewhere.
Thanks
Atsushi SAKAI
NISHIGUCHI Naoki <nisiguti@xxxxxxxxxxxxxx> wrote:
> Hi, Atsushi
>
> After applying my patches, I ran the same test.
> The CPU% shows the following:
> dom0 25
> dom1 25
> dom2 50
> dom3 100
>
> What do you think of my patches?
>
> Regards,
> Naoki Nishiguchi
>
> Atsushi SAKAI wrote:
> > Hi, George
> >
> > Sorry for delaying.
> >
> > With this type of change,
> > the CPU% shows the following:
> > dom1 26
> > dom2 26
> > dom3 51
> > dom4 96
> >
> > Thanks
> > Atsushi SAKAI
> >
> > "George Dunlap" <George.Dunlap@xxxxxxxxxxxxx> wrote:
> >
> >> OK, I've grueled through an example by hand and think I see what's going
> >> on.
> >>
> >> So the idea of the credit scheduler is that we have a certain number
> >> of "credits" per accounting period, and each of these credits
> >> represents a certain amount of time. The scheduler gives out credits
> >> according to weight, so theoretically each accounting period, if all
> >> vcpus are active, each should consume all of its credits. Based on
> >> that assumption, if a vcpu has run and accumulated more than one full
> >> accounting period of credits, it's probably idle and we can leave it
> >> be.
> >>
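> >> Roughly, the per-period accounting works like this (a simplified
> >> sketch only -- made-up names and constants, not the real
> >> csched_acct() code):
> >>
> >>   /* Illustrative sketch of the idea above, not actual Xen code. */
> >>   #define CREDITS_PER_ACCT  300        /* credits handed out per period */
> >>
> >>   struct sketch_vcpu {
> >>       int credit;                      /* current credit balance        */
> >>       int active;                      /* still being accounted for?    */
> >>   };
> >>
> >>   static void acct_one_vcpu(struct sketch_vcpu *v, int ncpus,
> >>                             int weight, int weight_total, int nr_vcpus)
> >>   {
> >>       /* Each accounting period, hand out credits in proportion
> >>        * to the domain's weight. */
> >>       v->credit += (CREDITS_PER_ACCT * ncpus * weight)
> >>                    / (weight_total * nr_vcpus);
> >>
> >>       /* A vcpu that has piled up more than a full period's worth
> >>        * of credits has evidently not been using them: assume it is
> >>        * idle, reset it and stop accounting for it ("credit reset"). */
> >>       if ( v->credit > CREDITS_PER_ACCT )
> >>       {
> >>           v->credit = 0;
> >>           v->active = 0;
> >>       }
> >>   }
> >>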
> >> The problem in this situation isn't so much with rounding errors as
> >> with *scheduling granularity*. In the example given:
> >>
> >> d1: weight 128
> >> d2: weight 128
> >> d3: weight 256
> >> d4: weight 512
> >>
> >> If each domain has 2 vcpus, and there are 2 cores, then the credits
> >> will be divided thus:
> >>
> >> d1: 37 credits / vcpu
> >> d2: 37 credits / vcpu
> >> d3: 75 credits / vcpu
> >> d4: 150 credits / vcpu
> >>
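> >> (Those per-vcpu numbers are just the total credits split by weight.
> >> A quick check, assuming 300 credits per cpu per accounting period --
> >> the exact constant may differ:)
> >>
> >>   /* 2 cpus * 300 = 600 credits per accounting period.
> >>    * weight_total = 128 + 128 + 256 + 512 = 1024, 2 vcpus/domain:
> >>    *   d1: 600 * 128/1024 / 2 = 37.5 -> 37 credits/vcpu
> >>    *   d2: 600 * 128/1024 / 2 = 37.5 -> 37 credits/vcpu
> >>    *   d3: 600 * 256/1024 / 2 = 75      credits/vcpu
> >>    *   d4: 600 * 512/1024 / 2 = 150     credits/vcpu
> >>    */
> >>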
> >> But scheduling and accounting only happen every "tick", and every
> >> "tick" is 100 credits. So each vcpu of d{1,2}, instead of
> >> consuming 37 credits, consumes 100; the same goes for each vcpu of d3. At
> >> the end of the first accounting period, d{1,2,3} have each gotten to run
> >> for 100 credits' worth of time, but d4 hasn't gotten to run at all.
> >>
> >> In short, the fact that we have a 100-credit scheduling granularity
> >> breaks the assumption that every VM has had a chance to run each
> >> accounting period when there are really long runqueues.
> >>
> >> I can think of a couple of solutions: the simplest one might be to
> >> sort the runqueue by number of credits -- at least every accounting
> >> period. In that case, d4 would always get to run every accounting
> >> period; d{1,2} might not run for a given accounting period, but the
> >> next time around they would have twice the number of credits, &c.
> >>
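> >> Very roughly, the sort could look something like this (a sketch
> >> with made-up types, not a real patch -- the actual code would walk
> >> the csched_vcpu runqueue list):
> >>
> >>   #include <stddef.h>
> >>
> >>   struct rq_vcpu {
> >>       int credit;
> >>       struct rq_vcpu *next;
> >>   };
> >>
> >>   /* Re-insert every element so the runqueue ends up ordered by
> >>    * remaining credit, highest first. */
> >>   static struct rq_vcpu *runq_sort_by_credit(struct rq_vcpu *runq)
> >>   {
> >>       struct rq_vcpu *sorted = NULL;
> >>
> >>       while ( runq != NULL )
> >>       {
> >>           struct rq_vcpu *v = runq;
> >>           struct rq_vcpu **pp = &sorted;
> >>
> >>           runq = runq->next;
> >>
> >>           /* Find the first entry with fewer credits than v. */
> >>           while ( *pp != NULL && (*pp)->credit >= v->credit )
> >>               pp = &(*pp)->next;
> >>
> >>           v->next = *pp;
> >>           *pp = v;
> >>       }
> >>
> >>       return sorted;
> >>   }
> >>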
> >> Other options might include extending accounting periods when we have long
> >> runqueues, or applying the credit limit during accounting only if the vcpu is
> >> not on the runqueue (Sakai-san's idea) *combined* with a check when
> >> the vcpu blocks. That would catch vcpus that are only moderately
> >> active, but just happen to be on the runqueue for several accounting
> >> periods in a row.
> >>
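> >> The latter idea might amount to something like this (again just a
> >> sketch, reusing the made-up names from the earlier snippet):
> >>
> >>   /* Apply the "idle" credit cap only when the vcpu is off the
> >>    * runqueue; call this both from accounting and from the path
> >>    * where a vcpu blocks, so moderately active vcpus that happen
> >>    * to stay queued are still caught eventually. */
> >>   static void maybe_cap_credit(struct sketch_vcpu *v, int on_runq)
> >>   {
> >>       if ( !on_runq && v->credit > CREDITS_PER_ACCT )
> >>       {
> >>           v->credit = 0;
> >>           v->active = 0;
> >>       }
> >>   }
> >>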
> >> Sakai-san, would you be willing to try to implement a simple "runqueue
> >> sort" patch, and see if it also solves your scheduling issue?
> >>
> >> -George
> >>
> >> On Wed, Dec 10, 2008 at 2:45 AM, Atsushi SAKAI <sakaia@xxxxxxxxxxxxxx>
> >> wrote:
> >>> Hi, Emmanuel
> >>>
> >>> 1) Rounding error for credit
> >>>
> >>> The problem this patch addresses goes beyond rounding error,
> >>> so I think we do not need to consider that effect here.
> >>> If you think otherwise, could you suggest your patch?
> >>> It seems that changing CSCHED_TICKS_PER_ACCT is not enough.
> >>>
> >>> 2) Effect on I/O-intensive jobs
> >>>
> >>> I did not change the code for BOOST priority;
> >>> I only changed the "credit reset" condition.
> >>> It should have no effect on I/O-intensive workloads (but I have not measured it).
> >>> If needed, I will test it.
> >>> Which test is best for this change?
> >>> (A simple I/O test is not enough for this case;
> >>> I think a complex domain I/O configuration is needed to demonstrate
> >>> the effect of this patch.)
> >>>
> >>> 3) vcpu allocation measurement
> >>>
> >>> At first I used
> >>> http://weather.ou.edu/~apw/projects/stress/
> >>> stress --cpu xx --timeout xx --verbose
> >>> and then a simple test (since there are 2 vcpus per domain):
> >>> yes > /dev/null &
> >>> yes > /dev/null &
> >>> Now I have tested with the suggested method; the results (CPU%) are:
> >>>            original   w/ patch
> >>> dom1          27         25
> >>> dom2          27         25
> >>> dom3          53         50
> >>> dom4          91         98
> >>>
> >>>
> >>> Thanks
> >>> Atsushi SAKAI
> >>>
> >>>
> >>>
> >>>
> >>> Emmanuel Ackaouy <ackaouy@xxxxxxxxx> wrote:
> >>>
> >>>> On Dec 9, 2008, at 2:25, George Dunlap wrote:
> >>>>> On Tue, Dec 9, 2008 at 7:33 AM, Atsushi SAKAI
> >>>>> <sakaia@xxxxxxxxxxxxxx> wrote:
> >>>>>> You mean it should get rid of "credit reset"?
> >>>>> Yes, that's exactly what I was thinking. Removing the check for vcpus
> >>>>> on the runqueue may actually be functionally equivalent to removing
> >>>>> the check altogether.
> >>>> Essentially, this code is there as a safeguard against rounding errors
> >>>> and other oddball cases. In theory, a runnable VCPU should seldom
> >>>> accumulate more than one time slice's worth of credits.
> >>>>
> >>>> The problem with your change is that a VCPU that is not a spinner
> >>>> but instead runs and sleeps may not be removed from the accounting
> >>>> list when it should be, because it will not always be running when
> >>>> accounting and the check in question are performed. Potentially this will
> >>>> do very bad things for VCPUs that are I/O intensive or otherwise yield
> >>>> or sleep for a short time before consuming a full time slice.
> >>>>
> >>>> One thing that may help here is to make the credit calculations less
> >>>> prone to rounding errors. Something I had wanted to do while at
> >>>> XenSource, but never got around to, was to change the arithmetic
> >>>> so that instead of 30 credits representing a time slice, we would
> >>>> use a much bigger number.
> >>>>
> >>>> In this case for example, you would get credit allocations that had
> >>>> less significant rounding errors if you used 30000 instead of 30
> >>>> credits per time slice:
> >>>>
> >>>> dom1 vcpu0,1 w128 credit 3750
> >>>> dom2 vcpu0,1 w128 credit 3750
> >>>> dom3 vcpu0,1 w256 credit 7500
> >>>> dom4 vcpu0,1 w512 credit 15000
> >>>>
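> >>>> For comparison, with only 30 credits per time slice the same split
> >>>> rounds much more coarsely (assuming 2 cpus and simple integer
> >>>> truncation; the real code's order of operations may differ):
> >>>>
> >>>>   /*   30 credits/slice:       60 * 128/1024 / 2 =    3.75 ->    3
> >>>>    *   30000 credits/slice: 60000 * 128/1024 / 2 = 3750    (exact)
> >>>>    *
> >>>>    * so a larger credit unit makes each weight's share far less
> >>>>    * sensitive to truncation. */
> >>>>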
> >>>> I suspect this would get rid of a large number of cases such as the
> >>>> one you are reporting, where a runnable VCPU's credit exceeds
> >>>> one entire time slice. This type of change would improve accuracy
> >>>> and not screw up credit computation for I/O-intensive and other
> >>>> non-spinning domains.
> >>>>
> >>>> What do you think?
> >>>>
> >>>> Also please confirm that your VCPUs are indeed doing simple
> >>>> "while(1);" loops.
> >>>>
> >>>> Cheers,
> >>>> Emmanuel.
> >>>
> >>>
>
>
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel