Luke, All
On Jun 1, 2009, at 8:43 PM, Luke S Crawford wrote:
Peter Booth <peter_booth@xxxxxxx> writes:
Here's more context. The VMs weren't page scanning. They did show
non-trivial %steal (where non-trivial is > 1%).
These VMs are commercially hosted on five quad core hosts with approx
14 VMs per host and just under 1GB RAM per VM. That's not a lot of
memory, but then the workload of one nginx and three mongrels per VM
is comfortably under 512MB of RSS.
I guess I don't know much about mongrel, but if someone was complaining to me about performance of a modern web application in an image with only 1GB ram, CPU would not be the first thing I'd look at.
I look at everything. Yes, 1GB is a limitation. The mongrel was configured taking that into account.
so steal was >1%? what was idle? what was iowait? if steal was only 10% and iowait was 50%, I'd still add more ram before I added more CPU.
There's no need to discuss hypotheticals. Let's look at real numbers at a busy time:
sar -W -f
00:00:01     pswpin/s pswpout/s
00:00:06         0.00      0.00
00:00:11         0.00      0.00
00:00:16         0.00      0.00
00:00:21         0.00      0.00
pswpin/s and pswpout/s are zero at all times; in other words, no swapping is occurring. So disk isn't a factor here.
00:00:01     CPU   %user   %nice %system %iowait  %steal   %idle
00:00:06     all   84.42    0.00    6.92    3.08    0.96    4.62
00:00:11     all   92.46    0.00    6.15    0.00    1.19    0.20
00:00:16     all   90.24    0.00    6.37    0.40    2.00    1.00
00:00:21     all   88.42    0.00    8.98    0.00    1.80    0.80
We are clearly CPU starved.
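As a rough cross-check of those four samples, here is a minimal Python sketch that averages each column and adds up the time spent in user, nice and system. The names SAR_CPU, parse_sar_cpu and column_means are made up for illustration and are not part of sysstat; the numbers are copied from the sar output above.

```python
# Four CPU samples copied from the sar output in this message.
# Column order is assumed to match the sar header:
# %user %nice %system %iowait %steal %idle
SAR_CPU = """\
00:00:06 all 84.42 0.00 6.92 3.08 0.96 4.62
00:00:11 all 92.46 0.00 6.15 0.00 1.19 0.20
00:00:16 all 90.24 0.00 6.37 0.40 2.00 1.00
00:00:21 all 88.42 0.00 8.98 0.00 1.80 0.80
"""

COLUMNS = ["user", "nice", "system", "iowait", "steal", "idle"]


def parse_sar_cpu(text):
    """Turn whitespace-separated sar CPU lines into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        # fields[0] is the timestamp, fields[1] is the CPU id ("all").
        values = [float(v) for v in fields[2:]]
        rows.append(dict(zip(COLUMNS, values)))
    return rows


def column_means(rows):
    """Arithmetic mean of every column across the samples."""
    return {c: sum(r[c] for r in rows) / len(rows) for c in COLUMNS}


means = column_means(parse_sar_cpu(SAR_CPU))
# Time the guest actually spent running code.
busy = means["user"] + means["nice"] + means["system"]

for name in COLUMNS:
    print(f"mean %{name}: {means[name]:.2f}")
print(f"mean busy (user+nice+system): {busy:.2f}")
```

Over these samples user plus system averages roughly 96%, with idle under 2% and iowait under 1%, which is the basis for the CPU-starved reading.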
and his performance improved. Disk is orders of magnitude slower than just about anything else (besides maybe network) so whenever you can exchange disk access for ram access, you see dramatic performance improvements.
That is not the case. You will only see an improvement if disk access is a bottleneck.
My point, however, is that Xen performance is not well understood in
general, and there are situations where virtualization doesn't perform
well.
These sar readings on DomU do not show the whole picture, nor do the studies that show Xen throughput is at worst only 8% worse than native Linux.
There are scenarios where the impact of virtualization on user response time can be a factor of 3 or 4.
This issue is poorly understood, although it has been seen and described in the research literature. Until we understand it, it will cause substantial problems.
With the increasing popularity of the cloud and virtualized environments, where there is less transparency than in a physical environment, we should expect that performance problems will increase.