xen-devel
Re: [Xen-devel] CPU Utilization
Dave Thompson (davetho) wrote:
-----Original Message-----
From: Andrew Theurer [mailto:habanero@xxxxxxxxxx]
Sent: Monday, December 12, 2005 9:24 PM
To: Dave Thompson (davetho)
Cc: Anthony Liguori; xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] CPU Utilization
But what else is running? In this case I only have dom0 configured,
there is no domU. The only other possibility would be the hypervisor
and I hope the hypervisor is not accounting for the other 30%.
If xend is started, you may have the software bridge running
which can use as much as 10% cpu.
But I would think that the bridge activity should be showing up
in the top CPU summary as well. It is running on domain 0 after all.
I know one person suggested that kernel activity is not represented
in the top CPU util output. But I don't see how that can be right.
If so, where else is that time accounted for? It seems to be all
there (in the sy, hi, and si values).
Also, I don't see soft ints in that top output.
That could also be another ~7% cpu.
Soft interrupt time is accounted for in the si field (15%) of the
summary. I believe that is where most (if not all) of the TCP
processing is performed. Here is the top CPU summary display again:
Cpu(s): 1.0% us, 7.3% sy, 0.0% ni, 73.3% id, 0.0% wa, 3.3% hi, 15.0% si
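For reference, the non-idle share implied by that summary line can be worked out mechanically. This is only a minimal sketch in Python, assuming the field layout shown in the line above; the helper name parse_top_cpu is made up for illustration and is not part of top or Xen.

```python
import re

def parse_top_cpu(line):
    """Map top's CPU field names (us, sy, ni, id, wa, hi, si) to percentages."""
    return {name: float(value)
            for value, name in re.findall(r"([\d.]+)%\s*(\w+)", line)}

# The summary line quoted in this thread.
summary = ("Cpu(s): 1.0% us, 7.3% sy, 0.0% ni, 73.3% id, "
           "0.0% wa, 3.3% hi, 15.0% si")

fields = parse_top_cpu(summary)

# Everything except the idle bucket counts as "busy" in this sketch
# (wa is 0.0 here, so treating it as busy or idle makes no difference).
busy = sum(value for name, value in fields.items() if name != "id")
print(f"non-idle according to top: {busy:.1f}%")
```

With these figures top itself sees roughly 27% of the CPU in use, and si and hi together make up most of it.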
Sorry, I overlooked the si.
Also xen is doing some work, receiving the real interrupts
and generating virtual interrupts to dom0, so with all this,
it is possible that you are using another 30% unseen
in top.
But aren't the hypervisor calls actually still being accounted for
by the domain, since clock ticks are not lost but made up for in the
timer_interrupt() function of arch/xen/i386/kernel/time.c? The
only real issue is when a domain is preempted by another domain
by the xen scheduler, and that is actually a problem in the other
direction. The swapped-out domain will still account for the
time in whichever time bucket it was using when it was
preempted (so the same time is accounted for by both domains).
Basically, the aggregated CPU time for all domains on a CPU could
add up to more than 100% because of this. If the domain is
re-scheduled because of a SCHEDOP_block in the idle loop, the time
will be properly accounted for as idle time.
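The over-counting described above can be illustrated with a toy model. This is only a sketch in Python of the behaviour as stated in this message, not of the actual timer_interrupt() code: it assumes one schedule entry per physical timer tick, and that every tick a domain misses while descheduled is later charged to the bucket it was in when it last ran. The names ticks_seen_by and busy_percent and the domain name "other" are hypothetical.

```python
from collections import Counter

def ticks_seen_by(domain, schedule):
    """Tick accounting as it would appear inside `domain`.

    schedule: list of (owner, mode) pairs, one per physical timer tick,
    where mode is the bucket the owner is in at the end of that tick.
    Missed ticks are made up later and charged to the domain's last mode.
    """
    buckets = Counter()
    mode = "id"  # assumption: the domain counts as idle before it first runs
    for owner, owner_mode in schedule:
        if owner == domain:
            mode = owner_mode
        buckets[mode] += 1
    return buckets

def busy_percent(buckets):
    """Share of all accounted ticks that are not in the idle bucket."""
    total = sum(buckets.values())
    return 100.0 * (total - buckets["id"]) / total

# dom0 is preempted while busy: it really runs 50% of the ticks.
preempted = [("dom0", "sy"), ("dom0", "sy"),
             ("other", "us"), ("other", "us")] * 25

# dom0 works for one tick, then blocks in its idle loop before losing the CPU.
blocked = [("dom0", "sy"), ("dom0", "id"),
           ("other", "us"), ("other", "us")] * 25

dom0_preempted = busy_percent(ticks_seen_by("dom0", preempted))
dom0_blocked = busy_percent(ticks_seen_by("dom0", blocked))
aggregate = dom0_preempted + busy_percent(ticks_seen_by("other", preempted))

print(f"dom0 busy when preempted: {dom0_preempted:.0f}%")
print(f"dom0 busy when it blocks: {dom0_blocked:.0f}%")
print(f"sum over both domains when preempted: {aggregate:.0f}%")
```

In the preempted case dom0 reports itself fully busy although it held the CPU only half the time, so the per-domain figures sum to well over 100%. In the blocking case the made-up ticks land in the idle bucket and dom0's figure matches what it actually used.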
I wonder if this is working in all situations. This problem seems
familiar. Before the kernel accounted for si and hi properly, we had a
very similar situation with this type of workload: lots of cpu time
unaccounted for because the interrupt processing happened mostly when the
system was idle, and the timer tick did not account for this properly.
I wonder if we have a similar problem in xen/linux. If lost ticks are
"queued up" but all accounted for under just one mode, then I think we
could be way off in some situations like this.
However, none of this really matters for my case since I am
only running domain 0, there is no guest domain. I just want
a good explanation why 'xm top' is reporting 30% more CPU utilization
than top in this case.
Best way to confirm this would be to use xenoprofile.
Xenoprof is great for seeing which kernel functions are taking
the majority of time, but does it really help with CPU utilization?
It counts (in the default case) unhalted clock cycles, and in the
xen idle loop the processor is halted (to save power), so those
clock cycles are not accounted for. Is this right, or am I
missing something?
I guess I was hoping to find a smoking gun in xen :). The only other
thing I think we could do is count the number of total samples we got
over x seconds and compare this with the number of samples we would get
in the same time period on a 100% busy system. We should then be able
to figure out how much % time the cpu was halted.
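The arithmetic behind that comparison is simple enough to write down. A minimal sketch in Python, assuming both sample counts come from runs of the same length with the same sampling period and a constant clock frequency; the function name halted_percent and the numbers in the usage line are hypothetical, not measurements from this thread.

```python
def halted_percent(samples_observed, samples_fully_busy):
    """Estimate the share of time the CPU was halted.

    samples_observed:   unhalted-cycle samples collected during the test run.
    samples_fully_busy: samples collected over the same wall-clock period
                        on a system that is 100% busy (never halted).
    """
    if samples_fully_busy <= 0:
        raise ValueError("the fully busy reference must have at least one sample")
    # Clamp at 1.0 so measurement noise cannot produce a negative halted share.
    unhalted_ratio = min(samples_observed / samples_fully_busy, 1.0)
    return 100.0 * (1.0 - unhalted_ratio)

# Hypothetical sample counts, for illustration only.
print(f"halted: {halted_percent(57_000, 100_000):.1f}%")
```

The result could then be compared against the idle figure from top and the utilization from 'xm top' to see which of the two is closer to what the hardware counter observed.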
-Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel