WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

[Xen-devel] Re: credit scheduler error rates as reported by HP and UCSD

To: Lucy Cherkasova <lucy@xxxxxxxxxxxxxxxx>
Subject: [Xen-devel] Re: credit scheduler error rates as reported by HP and UCSD
From: "Mike D. Day" <ncmike@xxxxxxxxxx>
Date: Fri, 13 Apr 2007 09:06:46 -0400
Cc: m+Ian.Pratt@xxxxxxxxxxxx, ackaouy@xxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxx, lucy.cherkasova@xxxxxx
Delivery-date: Fri, 13 Apr 2007 06:05:33 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <200704121622.JAA13153@xxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Organization: IBM Linux Technology Center
References: <200704121622.JAA13153@xxxxxxxxxxxxxxxx>
Reply-to: ncmike@xxxxxxxxxx
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.5.13 (2006-08-11)
On 12/04/07 09:22 -0700, Lucy Cherkasova wrote:

> Hi Mike,
>
> Because this is a 1-CPU machine, the explanation of this phenomenon is
> different (it is not related to load balancing of VCPUs), and the
> Credit scheduler can/should be made more precise.

No, the opposite is true. Running on a 1-cpu machine will exaggerate
the over-allocation, because load balancing has no effect. Hence a
vcpu that has already exceeded its allocation will be selected to
run. See csched_schedule and csched_load_balance in the source file
sched_credit.c.

>> None of this explains the negative allocation errors, where the vcpus
>> received less than their pcpu allotments. I speculate that a couple of
>> circumstances may contribute to negative allocation errors:
>>
>> Very low weights attached to domains will cause the credit scheduler
>> to attempt to pause vcpus almost every accounting cycle. Vcpus may
>> therefore not have as many opportunities to run as frequently as
>> possible. If the ALERT measurement method is different, or has a
>> different interval, than the credit scheduler's 10ms tick and 30ms
>> accounting cycle, negative errors may result in the view of ALERT.

> The ALERT benchmark sets the allocation of a SINGLE domain (on a 1-CPU
> machine, with no other competing domains while running the benchmark) to
> a chosen target CPU allocation, e.g., 20%, in non-work-conserving mode.
> That means the CPU allocation is CAPPED at 20%. This single domain runs
> "slurp" (a tight CPU loop, 1 process) to consume the allocated CPU share.

Yes, again, this will cause the credit scheduler to pause the domU
very frequently, which might explain some of the under-allocation
errors.
> The monitoring part of ALERT just collects the measurements from the
> system using both XenMon and xentop with 1-second reporting granularity.
> Since 1 sec is so much larger than the 30 ms slices, it should be
> possible to get a very accurate CPU allocation for larger CPU allocation
> targets. However, for a 1% CPU allocation you have an immediate error,
> because Credit will allocate a 30 ms slice (that is 3% of 1 sec). If
> Credit used 10 ms slices, then the error would be (theoretically)
> bounded to 1%. The expectation is that each 1-sec measurement should
> show 20% CPU utilization for this domain.

It may be the case that the credit scheduler believes 1-sec
measurements should show *at least* a 20% CPU utilization for this
domain. That's the way it is coded, afaict. A simple patch to
sched_credit may be able to confirm this if you can run your tests
with a patched scheduler.
> We run ALERT for different CPU allocation targets from 1% to 90%.
> The reported error is the difference between the targeted CPU allocation
> and the measured CPU allocation at 1-sec granularity.


>> I/O activity: if ALERT performs I/O activity, the test, even though
>> it is "cpu intensive", may cause the domU to block on dom0 frequently,
>> meaning it will idle more, especially if dom0 has a low credit
>> allocation.

> There is no I/O activity; ALERT's functionality is exactly as described
> above: nothing else is happening in the system.



>> Questions: how does ALERT measure actual cpu allocation? Using XenMon?

> As I've mentioned above, we have measurements from both XenMon and
> xentop; they are very close for these experiments.

>> How does ALERT exercise the domain?

> ALERT runs "slurp", a cpu-hungry loop, which will "eat" as much CPU as
> you allocate to it. It is a single-process application.


>> The paper didn't mention the actual system calls and hypercalls the
>> domains are making when running ALERT.

> There are none: it is a pure user-space benchmark.

That's impossible; there must be hypercalls to load and start the
benchmark, if nothing else. In addition, there are many hypercalls to
the scheduler. What you mean is that the benchmark itself does not make
hypercalls.

The only reason I asked is that I wanted to know whether block or net I/O
was influencing the ALERT test, and it sounds as if this is not the case.

Mike

--
Mike D. Day
IBM LTC
Cell: 919 412-3900
Sametime: ncmike@xxxxxxxxxx AIM: ncmikeday  Yahoo: ultra.runner
PGP key: http://www.ncultra.org/ncmike/pubkey.asc

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
