My first observation would be that I don't trust any self-measured performance values from a VM. There are tricky time-accounting issues, and while I've seen and heard the 8% claims, I didn't believe the folks knew how to measure the VM's behavior without trusting the VM itself.
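For what it's worth, the %steal figure that sar reports inside a guest comes straight from the steal column of /proc/stat, which the hypervisor itself populates. A minimal sketch of reading it directly, assuming a Linux 2.6.11 or later guest:

#!/bin/sh
# Sketch: compute %steal over a 5-second window straight from /proc/stat.
# The 8th value on the "cpu" line is cumulative stolen jiffies (2.6.11+);
# steal percent is the steal delta over the total delta across all fields.
read_cpu() { awk '/^cpu /{print $2,$3,$4,$5,$6,$7,$8,$9}' /proc/stat; }
before=$(read_cpu); sleep 5; after=$(read_cpu)
echo "$before $after" | awk '{
    tb = ta = 0
    for (i = 1; i <= 8; i++) { tb += $i; ta += $(i + 8) }
    printf "steal: %.2f%%\n", 100 * ($16 - $8) / (ta - tb)
}'

That only shows what the hypervisor credits back to the guest, though, which is exactly the trust problem: a proper cross-check has to come from dom0 (e.g. xentop) or from outside the box entirely.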
Luke, All
On Jun 1, 2009, at 8:43 PM, Luke S Crawford wrote:
Peter Booth <peter_booth@xxxxxxx> writes:
Here's more context. The VMs weren't page scanning. They did show non-trivial %steal (where non-trivial is > 1%).
These VMs are commercially hosted on five quad-core hosts with approximately 14 VMs per host and just under 1GB RAM per VM. That's not a lot of memory, but then the workload of one nginx and three mongrels per VM is comfortably under 512MB of RSS.
I guess I don't know much about mongrel, but if someone was complaining to me about the performance of a modern web application in an image with only 1GB of RAM, CPU would not be the first thing I'd look at.
I look at everything. Yes, 1GB is a limitation; the mongrels were configured taking that into account.
So steal was >1%? What was idle? What was iowait? If steal was only 10% and iowait was 50%, I'd still add more RAM before I added more CPU.
There's no need to discuss hypotheticals. Let's look at real numbers from a busy time:
sar -W -f
00:00:01     pswpin/s  pswpout/s
00:00:06         0.00       0.00
00:00:11         0.00       0.00
00:00:16         0.00       0.00
00:00:21         0.00       0.00
pswpin/s and pswpout/s are zero at all times; in other words, no swapping is occurring, so disk isn't a factor here.
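If you want to double-check sar on that point, the kernel exposes the same counters directly in /proc/vmstat. A quick sketch (the 5-second window is arbitrary):

#!/bin/sh
# Sketch: cross-check sar -W against the kernel's raw swap counters.
# pswpin/pswpout in /proc/vmstat count cumulative pages swapped in/out,
# so zero deltas over the sample window mean no swapping occurred.
snap() { awk '/^pswpin|^pswpout/ { printf "%s ", $2 }' /proc/vmstat; }
b=$(snap); sleep 5; a=$(snap)
echo "$b $a" | awk '{ printf "pswpin delta: %d, pswpout delta: %d\n", $3 - $1, $4 - $2 }'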
00:00:01    CPU    %user   %nice  %system  %iowait  %steal   %idle
00:00:06    all    84.42    0.00     6.92     3.08     0.96    4.62
00:00:11    all    92.46    0.00     6.15     0.00     1.19    0.20
00:00:16    all    90.24    0.00     6.37     0.40     2.00    1.00
00:00:21    all    88.42    0.00     8.98     0.00     1.80    0.80
We are clearly CPU-starved: %idle never exceeds 5% while %iowait is near zero, so the CPUs, not the disks, are the bottleneck.
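If you want to catch this condition live instead of digging through sar files after the fact, here's a rough sketch; the 1% and 5% thresholds are my own, and the column positions ($7 = %steal, $8 = %idle) assume the sar -u layout shown above:

#!/bin/sh
# Sketch: sample CPU stats every 5 seconds and flag high steal or low idle.
# Thresholds are illustrative; column positions may differ across sysstat
# versions, so check them against your own sar -u output first.
sar -u 5 | awk '
    $2 == "all" {
        if ($7 > 1.0) printf "%s  high steal: %s%%\n", $1, $7
        if ($8 < 5.0) printf "%s  low idle:   %s%%\n", $1, $8
    }'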
and his performance improved. Disk is orders of magnitude slower than just about anything else (besides maybe network), so whenever you can exchange disk access for RAM access, you see dramatic performance improvements.
That is not the case. You will only see an improvement if disk access is a bottleneck.
My point, however, is that Xen performance is not well understood in general, and there are situations where virtualization doesn't perform well.
These sar readings on DomU do not tell the whole picture, nor do the studies that show Xen throughput is at worst only 8% worse than native Linux.
There are scenarios where the impact of virtualization on user response time can be a factor of 3 or 4.
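Catching that kind of impact means measuring response time from outside the guest, so the numbers don't depend on the VM's own clock. A minimal sketch; the URL and the 100-request sample count are placeholders:

#!/bin/sh
# Sketch: sample end-to-end response times from a separate physical box.
# Measuring externally sidesteps the guest time-accounting problem entirely.
# http://example.com/ and the request count are placeholders.
for i in $(seq 1 100); do
    curl -s -o /dev/null -w '%{time_total}\n' http://example.com/
done | sort -n | awk '
    { t[NR] = $1 }
    END { printf "median: %.3fs  p95: %.3fs\n", t[int(NR * 0.5)], t[int(NR * 0.95)] }'

A factor-of-3 tail would show up immediately in the p95 figure even when the median looks healthy.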
This issue is poorly understood; it has been seen and described in the research literature, and until we get a handle on it, it will cause substantial problems.
With the increasing popularity of the cloud and virtualized environments, where there is less transparency than in a physical environment, we should expect performance problems to increase.