WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-users

Re: [Xen-users] lots of cycles in i/o wait state

Subject: Re: [Xen-users] lots of cycles in i/o wait state
From: Miles Fidelman <mfidelman@xxxxxxxxxxxxxxxx>
Date: Sun, 06 Jun 2010 22:17:40 -0400
Cc: "xen-users@xxxxxxxxxxxxxxxxxxx" <xen-users@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Sun, 06 Jun 2010 19:19:02 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4C0ADCFA.5040904@xxxxxx>
List-help: <mailto:xen-users-request@lists.xensource.com?subject=help>
List-id: Xen user discussion <xen-users.lists.xensource.com>
List-post: <mailto:xen-users@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=unsubscribe>
References: <4C0AD6E7.1000809@xxxxxxxxxxxxxxxx> <4C0ADCFA.5040904@xxxxxx>
Sender: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.9) Gecko/20100317 SeaMonkey/2.0.4
re. my previous messages on this topic:

It's absolutely amazing what mounting volumes with "noatime" set will do to reduce i/o wait times! It took a while to figure this out, though.
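[For anyone hitting the same problem, the fix amounts to adding noatime to the mount options. Device and mount point below are illustrative, not taken from this setup:]

```shell
# /etc/fstab -- example entry; device name and filesystem are illustrative
/dev/xvda1  /  ext3  defaults,noatime  0  1
```

The same option can be applied to a running system without a reboot via `mount -o remount,noatime /`. With noatime, the kernel stops writing an updated access timestamp on every file read, which removes a steady trickle of small writes from the i/o path.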

Miles

On 06.06.2010 00:59, Miles Fidelman wrote:
Hi Folks,

I've been doing some experimenting to see how far I can push some old
hardware into a virtualized environment - partially to see how much use
I can get out of the hardware, and partially to learn more about the
behavior of, and interactions between, software RAID, LVM, DRBD, and Xen.

Basic configuration:

- two machines, 4 disk drives each, two 1G ethernet ports (1 each to the
outside world, 1 each as a cross-connect)
- each machine runs Xen 3 on top of Debian Lenny (the basic install)
- very basic Dom0s - just running the hypervisor and i/o (including disk
management)
---- software RAID6 (md)
---- LVM
---- DRBD
---- heartbeat to provide some failover capability
- dom0, on each machine, runs directly on md RAID volumes (RAID1 for
boot, RAID6 for root and swap)
- each Xen VM uses 2 DRBD volumes - one for root, one for swap
- one of the VMs has a third volume, used for backup copies of files
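[An illustrative sketch of how a stack layered like the one above gets built; the device names, sizes, and resource names are made up for the example, not taken from the actual configuration:]

```shell
# md RAID6 across partitions on the four drives
mdadm --create /dev/md1 --level=6 --raid-devices=4 \
    /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2

# LVM on top of the array
pvcreate /dev/md1
vgcreate vg0 /dev/md1
lvcreate -L 10G -n vm1-root vg0
lvcreate -L 1G  -n vm1-swap vg0

# DRBD on top of each logical volume; the vm1-root resource
# would be defined in /etc/drbd.conf to mirror to the peer node
drbdadm create-md vm1-root
drbdadm up vm1-root
```

Each layer adds its own metadata updates and write ordering constraints, which is why the stack is worth keeping in mind when chasing i/o wait.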

One domU, on one machine, runs a medium volume mail/list server.  This
used to run non-virtualized on one of the machines, and I moved it into
a domU.  Before virtualization, everything just hummed along (98% idle
time as reported by top).  Virtualized, the machine is mostly idle, but
now top reports a lot of i/o wait time (usually in the 20-25% range).

As I've started experimenting with adding additional domUs, in various
configurations, I've found that my mail server can get into a state
where it's spending almost all of its cycles in an i/o wait state (95%
and higher as reported by top).  This is particularly noticeable when I
run a backup job (essentially a large tar job that reads from the root
volume and writes to the backup volume).  The domU grinds to a halt.
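[As an aside: a backup job like that can be run at reduced i/o priority so it competes less with the mail server. A hypothetical invocation, with illustrative paths, would look like:]

```shell
# Run the tar at idle i/o priority (CFQ scheduler) and low CPU priority,
# staying on one filesystem; paths are illustrative, not the actual job.
ionice -c3 nice -n19 \
    tar -czf /backup/root-$(date +%F).tar.gz --one-file-system /
```

This doesn't explain the i/o wait, but it can keep the domU responsive while the backup runs.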

So I've been trying to track down the bottlenecks.

At first, I thought this was probably a function of pushing my disk
stack beyond reasonable limits - what with multiple domUs on top of DRBD
volumes, on top of LVM volumes, on top of software RAID6 (md).  I
figured I was seeing a lot of disk churning.

But... after running some disk benchmarks, what I'm seeing is something
else:

- I took one machine, turned off all the domUs, and turned off DRBD
- I ran a disk benchmark (bonnie++) on dom0, which reported 50MB/sec to
90MB/sec of throughput depending on the test (not exactly sure what this
means, but it's a baseline)
- I then brought up DRBD and various combinations of domUs, and ran the
benchmark in various places
- the most interesting result, running in the same domU as the mail
server: 34M-60M depending on the test (not much degradation from running
directly on the RAID volume)
- but... while running the benchmark, the baseline i/o wait percentage
jumps from 25% to the 70-90% range
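[The exact benchmark flags aren't given in the mail; a bonnie++ run along these lines is a plausible reconstruction (the directory, size, and user are assumptions):]

```shell
# Benchmark a mounted volume; use a dataset roughly twice the size of RAM
# so the page cache doesn't mask raw disk throughput. -u is required
# when invoked as root.
bonnie++ -d /mnt/test -s 2g -u nobody
```

bonnie++ reports separate figures for sequential writes, rewrites, and reads, which would account for the range of throughput numbers quoted above.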

So... the question becomes, if it's not disk churning, what's causing
all those i/o wait cycles?  I'm starting to think it might involve
buffering or other interactions in the hypervisor.

Any thoughts or suggestions regarding diagnostics and/or tuning?  (Other
than "throw hardware at it" of course :-).
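[For anyone digging into the same question: running these side by side in dom0 and in the domU helps establish whether the waiting accrues at the physical devices or in the virtual block path. These are generic suggestions, not commands from the original thread:]

```shell
vmstat 5        # the 'wa' column is the i/o wait percentage
iostat -x 5     # per-device utilization and average wait (await) times
xentop          # from dom0: per-domain CPU and virtual block device activity
```

If dom0's iostat shows the physical disks far from saturated while the domU reports high wait, the bottleneck is more likely in the virtualized i/o path than in the disks themselves.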

Thanks very much,

Miles Fidelman





_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users



--
In theory, there is no difference between theory and practice.
In practice, there is.   .... Yogi Berra


