xen-devel

Re: [Xen-devel] [PATCH] Blktap: Userspace file-based image support. (RFC)

To: Andrew Warfield <andrew.warfield@xxxxxxxxxxxx>
Subject: Re: [Xen-devel] [PATCH] Blktap: Userspace file-based image support. (RFC)
From: Anthony Liguori <aliguori@xxxxxxxxxx>
Date: Mon, 19 Jun 2006 13:55:10 -0500
Cc: Xen Developers <xen-devel@xxxxxxxxxxxxxxxxxxx>, Julian Chesterfield <julian.chesterfield@xxxxxxxxxxxx>
Delivery-date: Mon, 19 Jun 2006 11:55:37 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <eacc82a40606190919x4bd4ef22m9d8431e650e85a67@xxxxxxxxxxxxxx>
References: <eacc82a40606190919x4bd4ef22m9d8431e650e85a67@xxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Thunderbird 1.5.0.4 (X11/20060612)
Hi Andy,


Performance is quite good, and we intend to focus on this a bit more
over the next few weeks, releasing updated patches as they are
available.  Bonnie results this morning are as follows (64-bit results
compare against linux blkback+loopback file, Julian can follow up with
loopback results for 32-bit later if anyone's interested):

                -------Sequential Output-------- ---Sequential Input-- --Random--
                -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine      MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU

64-bit:
xen0       4096 40115 93.4 41067 12.7 22757  1.2 32532 56.7 53724  0.4 121.4  0.0
img-sp     4096 20291 86.0 38091 18.1 19939  8.2 30854 69.0 47779  4.2  95.3  0.4
loop-sp    4096 33421 77.6 33663 13.1 18546  5.1 28606 59.2 46659  6.0  85.2  0.1

32-bit:
xen0       1024 33857 94.0 45804  9.0 23269  0.0 25825 52.0 55628    0 185.0  0.0
img-sp     1448 32743 92.0 40703  8.0 23281  0.0 31139 75.0 56585    0 208.1  0.0

What is img-sp? Is this blktap + a physical device or is this blktap with something like qcow?

The numbers are a tad worse than I'd expect them to be if it was a physical device. Theoretically, linux-aio is inserting requests directly into the backend. I expect there to be a certain amount of CPU overhead from context switching but since it's still zero-copy, I wouldn't expect less CPU usage and less throughput.

Any idea why this is, or am I just totally misunderstanding how things should behave? :-)
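
To put the gap in proportion, here is a minimal sketch that expresses the block-mode throughput of img-sp and loop-sp as a fraction of the xen0 baseline. The figures are copied from the 64-bit rows quoted above; the mapping of fields to columns assumes the standard bonnie header order (the second K/sec field is block output, the fifth is block input), and the dictionary keys are names made up for this example.

```python
# Block-mode K/sec figures copied from the 64-bit bonnie rows.
# Assumption: fields follow the bonnie header order, so the second
# K/sec value is block output and the fifth is block input.
results = {
    "xen0":    {"block_out": 41067, "block_in": 53724},
    "img-sp":  {"block_out": 38091, "block_in": 47779},
    "loop-sp": {"block_out": 33663, "block_in": 46659},
}


def relative_to(baseline, name, column):
    """Return name's throughput as a fraction of the baseline's."""
    return results[name][column] / results[baseline][column]


if __name__ == "__main__":
    for name in ("img-sp", "loop-sp"):
        for column in ("block_out", "block_in"):
            pct = 100 * relative_to("xen0", name, column)
            print(f"{name:8s} {column:9s} {pct:5.1f}% of xen0")
```

This only compares throughput; it says nothing about where the extra CPU time goes.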

Working in conjunction with the kernel blktap driver, all disk I/O
requests from VMs are passed to the userspace daemon (using a shared
memory interface) through a character device. Each active disk is
mapped to an individual device node, allowing per-disk processes to
implement individual block devices where desired.  The userspace
drivers are implemented using asynchronous (Linux libaio),
O_DIRECT-based calls to preserve the unbuffered, batched and
asynchronous request dispatch achieved with the existing blockback
code.  We provide a simple, asynchronous virtual disk interface that
makes it quite easy to add new disk implementations.
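
As a rough illustration of what an asynchronous virtual disk interface of this kind could look like, here is a minimal sketch: requests are queued with a completion callback and then dispatched as a batch. Everything in it (VirtualDisk, RamDisk, queue_read, queue_write, submit, and the 512-byte sector size) is hypothetical and invented for this example; it is not the blktap API and it does not use libaio or O_DIRECT.

```python
from abc import ABC, abstractmethod
from collections import deque

SECTOR_SIZE = 512  # assumed sector size for this sketch


class VirtualDisk(ABC):
    """Hypothetical driver interface: queue requests, complete them later."""

    @abstractmethod
    def queue_read(self, sector, count, callback):
        """Queue a read of `count` sectors starting at `sector`."""

    @abstractmethod
    def queue_write(self, sector, data, callback):
        """Queue a write of `data` starting at `sector`."""

    @abstractmethod
    def submit(self):
        """Dispatch all queued requests; return how many completed."""


class RamDisk(VirtualDisk):
    """Toy in-memory backend, used only to exercise the interface."""

    def __init__(self, sectors):
        self._data = bytearray(sectors * SECTOR_SIZE)
        self._pending = deque()

    def queue_read(self, sector, count, callback):
        self._pending.append(("r", sector, count, callback))

    def queue_write(self, sector, data, callback):
        self._pending.append(("w", sector, data, callback))

    def submit(self):
        completed = 0
        while self._pending:
            op, sector, arg, callback = self._pending.popleft()
            offset = sector * SECTOR_SIZE
            if op == "r":
                # Reads complete with the bytes that were read.
                callback(bytes(self._data[offset:offset + arg * SECTOR_SIZE]))
            else:
                # Writes complete with the number of bytes written.
                self._data[offset:offset + len(arg)] = arg
                callback(len(arg))
            completed += 1
        return completed
```

A new backend would, under these assumptions, only need to implement the three methods; the caller queues several requests and calls submit() once per batch.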


I very much like the idea of a userspace block device backend. Have you considered what it would take to completely replace blkback with a userspace backend? I'm also curious why you chose a character device to interact with the ring queue instead of just attaching to the ring queue directly in userspace.

I think the whole discussion of COW support is orthogonal to a userspace backend FWIW so I'll save that part of the discussion for another thread :-)

Regards,

Anthony Liguori


As of June 2006 the current supported disk formats are:

- Raw Images (both on partitions and in image files)
- File-backed Qcow disks (sparse qcow overlay on a raw image/partition).
- Standalone sparse Qcow disks (sparse disks, not backed by a parent image).
- Fast shareable RAM disk between VMs (requires some form of cluster-based
  filesystem support e.g. OCFS2 in the guest kernel)
- Some VMDK images - your mileage may vary

Raw and QCow images have asynchronous backends and so should perform
fairly well.  VMDK is based directly on the qemu vmdk driver, which is
synchronous (a.k.a. slow).

The qcow backends support existing qcow disks.  There are also a set
of tools to generate and convert qcow images.  With these tools (and
driver support), we maintain the qcow file format but adjust
parameters for higher performance with Xen -- using a larger segment
size (4096 instead of 512) and more coarsely allocating metadata
regions.  We are continuing to improve this work and expect qcow
performance to improve a great deal over the next few weeks.
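
To see why a larger segment size can help, a back-of-the-envelope sketch: it counts how many fixed-size segments are needed to cover a disk at 512 versus 4096 bytes per segment. The 4 GiB disk size is an arbitrary figure chosen for illustration, and the calculation does not model the actual qcow table layout.

```python
def segments_needed(disk_bytes, segment_bytes):
    """Number of fixed-size segments needed to cover disk_bytes (rounded up)."""
    return -(-disk_bytes // segment_bytes)


if __name__ == "__main__":
    disk_bytes = 4 * 1024**3  # illustrative 4 GiB disk, not from the post
    small = segments_needed(disk_bytes, 512)
    large = segments_needed(disk_bytes, 4096)
    # Fewer segments means fewer entries to track and look up.
    print(small, large, small // large)
```

Going from 512 to 4096 bytes per segment cuts the number of segments to track by a factor of eight for the same disk size.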

Build and Installation Instructions
===================================

You will need libaio >= 0.3.104 on your target system to build the
tools (if you are installing RPMs, this means libaio and
libaio-devel).

Make sure to configure the blktap backend driver in your dom0 kernel.  It
will cooperate fine with the existing backend driver, so you can
experiment with tap disks without breaking existing VM configs.

To build the tools separately, "make && make install" in
tools/blktap_user.


Using the Tools
===============

Prepare the image for booting. For qcow files use the qcow utilities
installed earlier. e.g. qcow-create generates a blank standalone image
or a file-backed CoW image. img2qcow takes an existing image or
partition and creates a sparse, standalone qcow-based file.

Start the userspace disk agent either on system boot (e.g. via an init
script) or manually => 'blktapctrl'

Customise the VM config file to use the 'tap' handler, followed by the
driver type. e.g. for a raw image such as a file or partition:

disk = ['tap:aio:<FILENAME>,sda1,w']

e.g. for a qcow image:

disk = ['tap:qcow:<FILENAME>,sda1,w']
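
Since the VM config file is evaluated as Python, the disk entry can also be built by a small helper. The sketch below is only an illustration: tap_disk is a hypothetical function name, the image path is a placeholder, and the check accepts only the two driver types shown above (aio and qcow), which is an assumption rather than a complete list.

```python
def tap_disk(driver, filename, device="sda1", mode="w"):
    """Build a 'tap:<driver>:<file>,<device>,<mode>' disk entry."""
    # Assumption: only the two driver names shown in the examples are accepted.
    if driver not in ("aio", "qcow"):
        raise ValueError(f"unsupported tap driver in this sketch: {driver!r}")
    return f"tap:{driver}:{filename},{device},{mode}"


# Placeholder path, substitute the real image file.
disk = [tap_disk("qcow", "/path/to/guest.qcow")]
```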
------------------------------------------------------------------------

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

