On Thu, Oct 01, 2009 at 01:33:57AM +0200, Luca Lesinigo wrote:
> I'm getting problems whenever the load on the system increases, but IMHO
> it should be well within the hardware's capabilities.
>
> My configuration:
> - HP ProLiant DL160 G5 with a single quad-core E5405, 14GiB RAM, 2x1TB
> SATA disks (Hitachi 7K1000.B) on the onboard SATA controller (Intel
> chipset)
> - Xen-3.4.1 64bit hypervisor, compiled from gentoo portage, with
> default commandline settings (I just specify the serial console and
> nothing else)
> - Domain-0 with gentoo's xen-sources 2.6.21 (the xen 2.6.18 tarball
> didn't have working networking; I think its Tigon3 gigabit driver is
> too old for the HP NIC, but I haven't had time to look into that)
> - Domain-0 is using the CFQ I/O scheduler and runs from a software
> RAID-1; no tickless kernel, HZ=100. It holds all the free RAM
> (currently some 5.x GiB)
> - the rest of the disk space is also mirrored in a second RAID-1
> device, with LVM2 on top of that
> - 6x paravirt 64bit DomU with 2.6.29-gentoo-r5 kernel, with NOOP i/o
> scheduler, tickless kernel. 1 - 1.5GiB of ram each.
> - 1x HVM 32bit Windows XP DomU, without any paravirt driver, 512MiB RAM
> - I use logical volumes as storage space for DomU's, the linux ones
> also have 0.5GiB of swap space (unused, no DomU is swapping)
> - all the linux DomU are on ext3 (noatime), and all DomU are single-
> cpu (just one vcpu each)
> - network is bridged (one lan and one wan interface on the physical
> system and the same for each domU), no jumbo frames
>
> Usually load on the system is very low. But when there is some I/O
> related load (I can easily trigger it by rsync'ing lots of stuff
> between domU's or from a different system to one of the domU or to the
> dom0), load gets very high and I often see domU's spending all their
> cpu time in the "wait" [for I/O] state. When that happens, load on
> Domain-0 gets high (jumps from <1 to >5) and loads on the DomU's get
> high too, probably because of processes waiting for I/O to complete.
> Sometimes iostat will even show exactly 0.00 tps on all the dm-X
> devices (the domU storage backends) and some activity on the physical
> devices, as if all domU I/O activity froze while dom0 is busy flushing
> caches or doing something else.
>
> vmstat in Dom0 shows one or two cores (25% or 50% cpu time) busy in
> 'iowait' state, and context switches go into the thousands, but not
> into the hundreds of thousands that
> http://wiki.xensource.com/xenwiki/KnownIssues talks about.
>
You have only 2x 7200 rpm disks for 7 virtual machines and you're
wondering why there's a lot of iowait? :)
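To see what the disks are actually doing, run something along these
lines in dom0 (iostat -x is part of the sysstat package; the md device
name below is just a guess, use whichever md device the LVs live on):

    # extended per-device statistics, refreshed every 2 seconds
    iostat -x 2 sda sdb md1
    # r/s + w/s      = real IOPS hitting each device
    # rrqm/s, wrqm/s = requests merged by the elevator
    # %util          = how close the device is to saturation

If %util on sda/sdb sits near 100 during your rsync tests, the disks
are simply maxed out.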
> I tried pinning cpus: Domain-0 had its four VCPUs pinned to CPUs 0 and
> 1, some domU's pinned to CPU 2, and some domU's pinned to CPU 3. As
> far as I can tell it did not make any difference.
> I also (briefly) tested with all linux DomU's running the CFQ
> scheduler; it didn't seem to make any difference either, but the test
> was too short to trust much.
>
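For reference, pinning can be set and checked at runtime with xm,
roughly like this (the domU name below is just a placeholder):

    # pin all four dom0 vcpus onto physical cores 0 and 1
    xm vcpu-pin Domain-0 0 0-1
    xm vcpu-pin Domain-0 1 0-1
    xm vcpu-pin Domain-0 2 0-1
    xm vcpu-pin Domain-0 3 0-1
    # pin a domU's single vcpu onto core 2
    xm vcpu-pin somedomu 0 2
    # show the resulting placement
    xm vcpu-list

But I wouldn't expect pinning to make much difference here; the
bottleneck looks like the disks, not the CPUs.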
> What's worse, sometimes I get qemu-dm processes (for the HVM domU) in
> zombie state. It also happened that the HVM domU crashed and I wasn't
> able to restart it: I got the "hotplug scripts not working" error from
> xm create, and looking at xenstore-ls output I saw entries for the
> crashed domU with all its resources still listed (which probably was
> the cause of the error?). I had to reboot the whole system to be able
> to start that domain again.
>
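About the stale entries: instead of rebooting you can usually clean up
what the dead domain left behind in xenstore by hand. Roughly like this
(untested on 3.4.1, and be careful to only touch the dead domid):

    xm list                             # see which domids are actually alive
    xenstore-ls /local/domain           # stale domids show up here with their devices
    xenstore-rm /local/domain/<domid>   # remove the dead domain's subtree
    xenstore-rm /vm/<uuid>              # and its /vm entry, if one is left over

with <domid> and <uuid> being whatever the crashed domU left behind.
That may let xm create work again without rebooting the whole box.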
> Normally iostat in Domain-0 shows more or less high tps (200~300 under
> normal load, even higher if I play around with rsync to artificially
> trigger the problems) on the md device where all the DomU reside, and
> much less (usually just 10-20% of that value) on the two physical
> disks sda and sdb that make up the mirror. I guess I see fewer tps
> there because the scheduler/elevator in Dom0 is doing its job and
> merging requests.
>
> I don't know if the load problems and the HVM problem are linked or
> not, but I also don't know where to look to solve any one of them.
>
> Any help would be appreciated, thank you very much. Also, what are
> ideal/recommended settings in dom0 and domU regarding I/O schedulers
> and tickless vs. non-tickless kernels?
> Is there any reason to leave the hypervisor some extra free RAM, or is
> it OK to just let xend shrink dom0 when needed and leave only the
> minimum free? If I sum up the memory (currently) used by the domains,
> I get 14146MiB. xm info says 14335MiB total_memory and 10MiB
> free_memory.
>
A single 7200 rpm SATA disk can do around 120 random IOPS,
i.e. 120 IO operations per second.
120 IOPS / 7 VMs ≈ 17 IOPS available per VM.
That's not much.
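
About your other questions: with phy:/LVM backed domUs, noop (or
deadline) inside the domU plus a real elevator in dom0 is what most
people seem to run, which is more or less what you already have. The
scheduler can be checked and changed at runtime, e.g. inside a domU
(assuming the PV disk shows up as xvda):

    cat /sys/block/xvda/queue/scheduler
    echo noop > /sys/block/xvda/queue/scheduler

For dom0 memory I'd rather give dom0 a fixed amount on the hypervisor
command line instead of letting xend balloon it around, something like
(the amount is only an example):

    kernel /xen.gz dom0_mem=1024M    # keep your existing serial console options

and set (dom0-min-mem ...) in /etc/xen/xend-config.sxp to match. None
of that changes the basic IOPS math above, though.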
-- Pasi
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users