xen-devel

RE: [Xen-devel] Scheduling anomaly with 4.0.0 (rc6)

Hi George --

Thanks again for the reply.  I hope it's OK if I go back
on-list; I'm hoping others may be able to reproduce this,
as my ability to experiment is limited now (see below).

> From: George Dunlap [mailto:George.Dunlap@xxxxxxxxxxxxx]
>
> 1) Make a large ramdisk in each VM, big enough for the whole kernel
> tree and binaries.  Do the build there, and see if you have the same
> discrepancy.

My test domains have 384MB each.  Dom0 has 256MB.
(Total physical RAM is only 2GB.)  So this isn't
really an option.

> 2) Play with the dom0 io scheduler and see if it has an effect.  If
> your current one is "noop", that's suspicious; see if "cfq" works
> better.

On dom0, /sys/block/sda/queue/scheduler shows [cfq].
I don't know if this matters, but /sys/block/tapdev*/queue/scheduler
shows [noop].
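
A minimal sketch of how the active scheduler could be read out
programmatically, assuming the usual sysfs convention that the file
lists the available schedulers with the active one in square
brackets.  The function names and the sample string are illustrative
and not taken from this system.

```python
import re


def active_scheduler(sysfs_text):
    """Return the scheduler name enclosed in [brackets], or None if absent."""
    match = re.search(r"\[([^\]]+)\]", sysfs_text)
    return match.group(1) if match else None


def read_scheduler(device):
    """Read the active scheduler for a block device such as "sda".

    Only works where /sys/block/<device>/queue/scheduler exists (Linux).
    """
    with open(f"/sys/block/{device}/queue/scheduler") as f:
        return active_scheduler(f.read())


# Illustrative sample line, not captured from the machine discussed here.
print(active_scheduler("noop anticipatory deadline [cfq]"))
```

Calling read_scheduler("sda") on dom0 and on each tapdev device would
give one name per device, which makes the cfq/noop difference easy to
log next to each test run.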

> 3) Take a trace of just the scheduling events, using xentrace...

I lost about a week of test runs that I was preparing for
Xen Summit and have to redo them before I can experiment
much, but I will try some of your ideas once those reruns
are done.  In the meantime, I'm still monitoring the test
runs that are in progress now.
(I need a reliable set of non-tmem runs as a baseline
to compare the various tmem runs against.)

I reported two problems that we can call:
1) "racing ahead", where one of a pair of identical domains
   seems to get a lot more cycles than the other
2) "irreproducibility", where two seemingly identical
   and heavily overcommitted test runs have timing results
   that differ by an unreasonable amount (6-7%)

After reducing my test domains to a single vcpu, the
"irreproducibility" problem seems to be greatly reduced.
I made three runs and they differ by <0.3%.  So, as
best I can tell, this problem requires multi-vcpu domains.
(Actually, I also changed from "file" to "tap:aio", so
that could be the cause instead.)
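
One way to put a number on run-to-run variation is the spread between
the fastest and slowest run relative to the fastest.  This is only a
sketch of that calculation; the totals below are placeholders chosen
for illustration and are NOT measurements from these test runs.

```python
def relative_spread(totals):
    """Return (max - min) / min as a percentage for a list of run totals."""
    lowest, highest = min(totals), max(totals)
    return 100.0 * (highest - lowest) / lowest


# Placeholder totals in seconds (hypothetical, not measured data).
example_totals = [1000.0, 1002.0, 1001.0]
print(f"spread: {relative_spread(example_totals):.2f}%")
```

The same function can be applied either to elapsed time or to the sum
of CPU seconds across all domains, as long as every run is measured
the same way.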

However, with:

a) vcpus=1 for the test domains (see previous post) and
b) vcpus=1 for test domains and dom0_max_vcpus=1

I am still seeing the "racing ahead" problem.  On
a current run of (b):

142s dom0
479s 64-bit #1
454s 64-bit #2 <-- 6% less
536s 32-bit #1
447s 32-bit #2 <-- 16% less!

Again, this is a transitory oddity that may shed some
light: after the workload completes, the runtimes
are very similar, though #2 always seems to be the
slower of the two by a small amount (<0.5%).
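
For anyone who wants to recompute the mid-run gap within each pair, a
small sketch using the CPU seconds from the table above.  The helper
name is made up for illustration, and the percentages quoted in the
table are rounded, so the output will not match them digit for digit.

```python
def percent_less(first, second):
    """How much less CPU time `second` has than `first`, in percent."""
    return 100.0 * (first - second) / first


# CPU seconds for (#1, #2) of each pair, copied from the table above.
pairs = {"64-bit": (479, 454), "32-bit": (536, 447)}

for name, (cpu1, cpu2) in pairs.items():
    print(f"{name}: #2 has {percent_less(cpu1, cpu2):.1f}% less CPU time than #1")
```

Sampling these values at several points during a run would show
whether the gap grows steadily or appears in bursts.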

Thanks,
Dan

> -----Original Message-----
> From: George Dunlap [mailto:George.Dunlap@xxxxxxxxxxxxx]
> Sent: Tuesday, April 06, 2010 5:24 AM
> To: Dan Magenheimer
> Subject: Re: [Xen-devel] Scheduling anomaly with 4.0.0 (rc6)
> 
> How much memory does each VM have?  Another possibility is that this
> has to do with unfairness in the block driver servicing requests.
> Three ways you could test this hypothesis.
> 
> 1) Make a large ramdisk in each VM, big enough for the whole kernel
> tree and binaries.  Do the build there, and see if you have the same
> discrepancy.
> 
> 2) Play with the dom0 io scheduler and see if it has an effect.  If
> your current one is "noop", that's suspicious; see if "cfq" works
> better.
> 
> 3) Take a trace of just the scheduling events, using xentrace, and use
> xenalyze to see how much time each vcpu is spending running, runnable,
> and blocked (waiting for the cpu).  If the scheduler is being unfair,
> then some vcpus will spend more time "runnable" than others.  If it's
> something else (the dom0 disk scheduler being unfair, or the vm just
> using different amounts of memory) then "runnable" will not be
> considerably higher.
> 
> To do #3:
> 
> # xentrace -D -e 0x28000 -S 32 /tmp/filename.trace
> 
> Then download:
> http://xenbits.xensource.com/ext/xenalyze.hg
> 
> Make it, and run the following command:
> 
> $ xenalyze -s --cpu-hz [speed-in-gigahertz]G filename.trace > filename.summary
> 
> The summary file breaks information down by domain, then vcpu; look at
> the "runstates" for each vcpu (running, runnable, blocked) and compare
> them.
> 
>  -George
> 
> On Tue, Apr 6, 2010 at 12:17 AM, Dan Magenheimer
> <dan.magenheimer@xxxxxxxxxx> wrote:
> > For the record, I am seeing the same problem (first one,
> > haven't yet got multiple runs) with vcpus=1 for all domains.
> > Only on 32-bit this time and only 20%, but those may
> > be random scheduling factors.  This is also with
> > tap:aio instead of file so as to eliminate dom0 page
> > caching effects.
> >
> >  394s dom0
> > 2265s 64-bit #1
> > 2275s 64-bit #2
> > 2912s 32-bit #1
> > 2247s 32-bit #2 <-- 20% less!
> >
> > I'm going to try a dom0_max_vcpus=1 run next.
> >
> >> -----Original Message-----
> >> From: Dan Magenheimer
> >> Sent: Monday, April 05, 2010 2:18 PM
> >> To: George Dunlap
> >> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
> >> Subject: RE: [Xen-devel] Scheduling anomaly with 4.0.0 (rc6)
> >>
> >> Thanks for the reply!
> >>
> >> Well I'm now seeing something a little more alarming:  Running
> >> an identical but CPU-overcommitted workload (just normal PV domains,
> >> no tmem or ballooning or anything), what would you expect the
> >> variance to be between successive identical measured runs
> >> on identical hardware?
> >>
> >> I am seeing total runtimes, both measured by elapsed time and by
> >> sum-of-CPUsec across all domains (incl dom0), vary by 6-7% or more.
> >> This seems a bit unusual/excessive to me and makes it very hard
> >> to measure improvements (e.g. by tmem, for an upcoming Xen summit
> >> presentation) or benchmark anything complex.
> >>
> >> > Is it possible that Linux is just favoring one vcpu over the other
> >> > for some reason?  Did you try running the same test but with only
> >> > one VM?
> >>
> >> Well "make -j8" will likely be single-threaded part of the time,
> >> but I wouldn't expect that to make that big a difference between
> >> two identical workloads.
> >>
> >> I'm not sure I understand how I would run the same test with
> >> only one VM when the observation of the strangeness requires
> >> two VMs (and even then must be observed at random points during
> >> execution).
> >>
> >> > Another theory would be that most interrupts are delivered to
> >> > vcpu 0, so it may end up in "boost" priority more often.
> >>
> >> Hmmm... I'm not sure I get that, but what about _physical_ cpu 0
> >> for Xen?  If all physical cpu's are not the same and one VM
> >> has an affinity for vcpu0-on-pcpu0 and the other has an affinity
> >> for vcpu1-in-pcpu0, would that make a difference?
> >>
> >> But still, 40% seems very large and almost certainly a bug,
> >> especially given the new observations above.
> >>
> >> > -----Original Message-----
> >> > From: George Dunlap [mailto:George.Dunlap@xxxxxxxxxxxxx]
> >> > Sent: Monday, April 05, 2010 8:44 AM
> >> > To: Dan Magenheimer
> >> > Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
> >> > Subject: Re: [Xen-devel] Scheduling anomaly with 4.0.0 (rc6)
> >> >
> >> > Is it possible that Linux is just favoring one vcpu over the other
> >> > for some reason?  Did you try running the same test but with only
> >> > one VM?
> >> >
> >> > Another theory would be that most interrupts are delivered to
> >> > vcpu 0, so it may end up in "boost" priority more often.
> >> >
> >> > I'll re-post the credit2 series shortly; Keir said he'd accept it
> >> > post-4.0.  You could try it with that and see what the performance
> >> > is like.
> >> >
> >> >  -George
> >> >
> >> > On Fri, Apr 2, 2010 at 5:48 PM, Dan Magenheimer
> >> > <dan.magenheimer@xxxxxxxxxx> wrote:
> >> > > I've been running some heavy testing on a recent Xen 4.0
> >> > > snapshot and seeing a strange scheduling anomaly that
> >> > > I thought I should report.  I don't know if this is
> >> > > a regression... I suspect not.
> >> > >
> >> > > System is a Core 2 Duo (Conroe).  Load is four 2-VCPU
> >> > > EL5u4 guests, two of which are 64-bit and two of which
> >> > > are 32-bit.  Otherwise they are identical.  All four
> >> > > are running a sequence of three Linux compiles with
> >> > > (make -j8 clean; make -j8).  All are started approximately
> >> > > concurrently: I synchronize the start of the test after
> >> > > all domains are launched with an external NFS semaphore
> >> > > file that is checked every 30 seconds.
> >> > >
> >> > > What I am seeing is a rather large discrepancy in the
> >> > > amount of time consumed "underway" by the four domains
> >> > > as reported by xentop and xm list.  I have seen this
> >> > > repeatedly, but the numbers in front of me right now are:
> >> > >
> >> > > 1191s dom0
> >> > > 3182s 64-bit #1
> >> > > 2577s 64-bit #2 <-- 20% less!
> >> > > 4316s 32-bit #1
> >> > > 2667s 32-bit #2 <-- 40% less!
> >> > >
> >> > > Again these are identical workloads and the pairs
> >> > > are identical released kernels running from identical
> >> > > "file"-based virtual block devices containing released
> >> > > distros.  Much of my testing had been with tmem and
> >> > > self-ballooning so I had blamed them for a while,
> >> > > but I have reproduced it multiple times with both
> >> > > of those turned off.
> >> > >
> >> > > At start and after each kernel compile, I record
> >> > > a timestamp, so I know the same work is being done.
> >> > > Eventually the workload finishes on each domain and
> >> > > intentionally crashes the kernel so measurement is
> >> > > stopped.  At the conclusion, the 64-bit pair have
> >> > > very similar total CPU sec and the 32-bit pair have
> >> > > very similar total CPU sec so eventually (presumably
> >> > > when the #1's are done hogging CPU), the "slower"
> >> > > domains do finish the same amount of work.  As a
> >> > > result, it is hard to tell from just the final
> >> > > results that the four domains are getting scheduled
> >> > > at very different rates.
> >> > >
> >> > > Does this seem like a scheduler problem, or are there
> >> > > other explanations? Anybody care to try to reproduce it?
> >> > > Unfortunately, I have to use the machine now for other
> >> > > work.
> >> > >
> >> > > P.S. According to xentop, there is almost no network
> >> > > activity, so it is all CPU and VBD.  And the ratio
> >> > > of VBD activity looks to be approximately the same
> >> > > ratio as CPU(sec).
> >> > >
> >> > > _______________________________________________
> >> > > Xen-devel mailing list
> >> > > Xen-devel@xxxxxxxxxxxxxxxxxxx
> >> > > http://lists.xensource.com/xen-devel
> >> > >
> >
