# HG changeset patch
# User Keir Fraser <keir.fraser@xxxxxxxxxx>
# Date 1244190683 -3600
# Node ID 9f4c5734e4aa014e062ddfdf7ed3763790b8e619
# Parent 21d1fcb0be4179a7122cbab1fd3aa4adff1c338a
blktap2: README updates
As promised, this brings the long out-of-sync documentation up to
date, and adds some getting started information about tapdisk driver
development - I get the occasional email on this latter subject.
Signed-off-by: Dutch Meyer <dmeyer@xxxxxxxxx>
---
tools/blktap2/README | 325 +++++++++++++++++++++++++++++++++++++++++----------
1 files changed, 262 insertions(+), 63 deletions(-)
diff -r 21d1fcb0be41 -r 9f4c5734e4aa tools/blktap2/README
--- a/tools/blktap2/README Fri Jun 05 09:30:36 2009 +0100
+++ b/tools/blktap2/README Fri Jun 05 09:31:23 2009 +0100
@@ -1,19 +1,21 @@ Blktap Userspace Tools + Library
-Blktap Userspace Tools + Library
+Blktap2 Userspace Tools + Library
================================
+
+Dutch Meyer
+4th June 2009
Andrew Warfield and Julian Chesterfield
16th June 2006
-{firstname.lastname}@cl.cam.ac.uk
-
-The blktap userspace toolkit provides a user-level disk I/O
-interface. The blktap mechanism involves a kernel driver that acts
+
+The blktap2 userspace toolkit provides a user-level disk I/O
+interface. The blktap2 mechanism involves a kernel driver that acts
similarly to the existing Xen/Linux blkback driver, and a set of
-associated user-level libraries. Using these tools, blktap allows
+associated user-level libraries. Using these tools, blktap2 allows
virtual block devices presented to VMs to be implemented in userspace
and to be backed by raw partitions, files, network, etc.
-The key benefit of blktap is that it makes it easy and fast to write
+The key benefit of blktap2 is that it makes it easy and fast to write
arbitrary block backends, and that these user-level backends actually
perform very well. Specifically:
@@ -38,7 +40,7 @@ perform very well. Specifically:
How it works (in one paragraph):
-Working in conjunction with the kernel blktap driver, all disk I/O
+Working in conjunction with the kernel blktap2 driver, all disk I/O
requests from VMs are passed to the userspace deamon (using a shared
memory interface) through a character device. Each active disk is
mapped to an individual device node, allowing per-disk processes to
@@ -49,74 +51,271 @@ code. We provide a simple, asynchronous
code. We provide a simple, asynchronous virtual disk interface that
makes it quite easy to add new disk implementations.
-As of June 2006 the current supported disk formats are:
+As of June 2009 the current supported disk formats are:
- Raw Images (both on partitions and in image files)
- - File-backed Qcow disks
- - Standalone sparse Qcow disks
- - Fast shareable RAM disk between VMs (requires some form of cluster-based
- filesystem support e.g. OCFS2 in the guest kernel)
- - Some VMDK images - your mileage may vary
-
-Raw and QCow images have asynchronous backends and so should perform
-fairly well. VMDK is based directly on the qemu vmdk driver, which is
-synchronous (a.k.a. slow).
+ - Fast sharable RAM disk between VMs (requires some form of
+ cluster-based filesystem support e.g. OCFS2 in the guest kernel)
+ - VHD, including snapshots and sparse images
+ - Qcow, including snapshots and sparse images
+
Build and Installation Instructions
===================================
-Make to configure the blktap backend driver in your dom0 kernel. It
-will cooperate fine with the existing backend driver, so you can
-experiment with tap disks without breaking existing VM configs.
-
-To build the tools separately, "make && make install" in
-tools/blktap.
+Make sure to configure the blktap2 backend driver in your dom0 kernel. It
+will inter-operate with the existing backend and frontend drivers. It
+will also cohabitate with the original blktap driver. However, some
+formats (currently aio and qcow) will default to their blktap2
+versions when specified in a vm configuration file.
+
+To build the tools separately, "make && make install" in
+tools/blktap2.
Using the Tools
===============
-Prepare the image for booting. For qcow files use the qcow utilities
-installed earlier. e.g. qcow-create generates a blank standalone image
-or a file-backed CoW image. img2qcow takes an existing image or
-partition and creates a sparse, standalone qcow-based file.
+Preparing an image for boot:
The userspace disk agent is configured to start automatically via xend
-(alternatively you can start it manually => 'blktapctrl')
-
-Customise the VM config file to use the 'tap' handler, followed by the
-driver type. e.g. for a raw image such as a file or partition:
-
-disk = ['tap:aio:<FILENAME>,sda1,w']
-
-e.g. for a qcow image:
-
-disk = ['tap:qcow:<FILENAME>,sda1,w']
-
-
-Mounting images in Dom0 using the blktap driver
+
+Customize the VM config file to use the 'tap:tapdisk' handler,
+followed by the driver type. e.g. for a raw image such as a file or
+partition:
+
+disk = ['tap:tapdisk:aio:<FILENAME>,sda1,w']
+
+Alternatively, the vhd-util tool (installed with make install, or in
+/blktap2/vhd) can be used to build sparse copy-on-write vhd images.
+
+For example, to build a sparse image -
+ vhd-util create -n MyVHDFile -s 1024
+
+This creates a sparse 1GB file named "MyVHDFile" that can be mounted
+and populated with data.
+
+One can also base the image on a raw file -
+ vhd-util snapshot -n MyVHDFile -p SomeRawFile -m
+
+This creates a sparse VHD file named "MyVHDFile" using "SomeRawFile"
+as a parent image. Copy-on-write semantics ensure that writes will be
+stored in "MyVHDFile" while reads will be directed to the most
+recently written version of the data, either in "MyVHDFile" or
+"SomeRawFile" as appropriate. Other options exist as well; consult
+the vhd-util application for the complete set of VHD tools.
+
+VHD files can be mounted automatically in a guest similarly to the
+above AIO example simply by specifying the vhd driver.
+
+disk = ['tap:tapdisk:vhd:<VHD FILENAME>,sda1,w']
+
+
+Snapshots:
+
+Pausing a guest will also plug the corresponding IO queue for blktap2
+devices and stop blktap2 drivers. This can be used to implement a
+safe live snapshot of qcow and vhd disks. An example script "xmsnap"
+is shown in the tools/blktap2/drivers directory. This script will
+perform a live snapshot of a qcow disk. VHD files can use the
+"vhd-util snapshot" tool discussed above. If this snapshot command is
+applied to a raw file mounted with tap:tapdisk:AIO, include the -m
+flag and the driver will be reloaded as VHD. If applied to an already
+mounted VHD file, omit the -m flag.
+
+
+Mounting images in Dom0 using the blktap2 driver
===============================================
Tap (and blkback) disks are also mountable in Dom0 without requiring an
-active VM to attach. You will need to build a xenlinux Dom0 kernel that
-includes the blkfront driver (e.g. the default 'make world' or
-'make kernels' build. Simply use the xm command-line tool to activate
-the backend disks, and blkfront will generate a virtual block device that
-can be accessed in the same way as a loop device or partition:
-
-e.g. for a raw image file <FILENAME> that would normally be mounted using
-the loopback driver (such as 'mount -o loop <FILENAME> /mnt/disk'), do the
-following:
-
-xm block-attach 0 tap:aio:<FILENAME> /dev/xvda1 w 0
-mount /dev/xvda1 /mnt/disk <--- don't use loop driver
-
-In this way, you can use any of the userspace device-type drivers built
-with the blktap userspace toolkit to open and mount disks such as qcow
-or vmdk images:
-
-xm block-attach 0 tap:qcow:<FILENAME> /dev/xvda1 w 0
-mount /dev/xvda1 /mnt/disk
-
-
-
+active VM to attach.
+
+The syntax is -
+ tapdisk2 -n <type>:<full path to file>
+
+For example -
+ tapdisk2 -n aio:/home/images/rawFile.img
+
+When successful the location of the new device will be provided by
+tapdisk2 to stdout and tapdisk2 will terminate. From that point
+forward control of the device is provided through sysfs in the
+directory-
+
+ /sys/class/blktap2/blktap#/
+
+Where # is a blktap2 device number present in the path that tapdisk2
+printed before terminating. The sysfs interface is largely intuitive.
+For example, to remove tap device 0 one would-
+
+ echo 1 > /sys/class/blktap2/blktap0/remove
+
+Similarly, a pause control is available, which can be used to plug
+the request queue of a running guest.
+
+Previous versions of blktap mounted devices in dom0 by using blkfront
+in dom0 and the xm block-attach command. This approach is still
+available, though slightly more cumbersome.
+
+
+Tapdisk Development
+===============================================
+
+People regularly ask how to develop their own tapdisk drivers, and
+while it has not yet been well documented, the process is relatively
+easy. Here I will provide a brief overview. The best reference, of
+course, comes from the existing drivers. Specifically,
+blktap2/drivers/block-ram.c and blktap2/drivers/block-aio.c provide
+the clearest examples of simple drivers.
+
+Setup:
+
+First you need to register your new driver with blktap. This is done
+in disktypes.h. There are five things that you must do. To
+demonstrate, I will create a disk called "mynewdisk"; you can name
+yours freely.
+
+1) Forward declare an instance of struct tap_disk.
+
+e.g. -
+ extern struct tap_disk tapdisk_mynewdisk;
+
+2) Claim one of the unused disk type numbers. Take care to observe the
+MAX_DISK_TYPES macro, increasing the number if necessary.
+
+e.g. -
+ #define DISK_TYPE_MYNEWDISK 10
+
+3) Create an instance of disk_info_t. The bulk of this file contains examples of these.
+
+e.g. -
+ static disk_info_t mynewdisk_disk = {
+ DISK_TYPE_MYNEWDISK,
+ "My New Disk (mynewdisk)",
+ "mynewdisk",
+ 0,
+ #ifdef TAPDISK
+ &tapdisk_mynewdisk,
+ #endif
+ };
+
+A few words about what these mean. The first field must be the disk
+type number you claimed in step (2). The second field is a string
+describing your disk, and may contain any relevant info. The third
+field is the name of your disk as will be used by the tapdisk2 utility
+and xend (for example tapdisk2 -n mynewdisk:/path/to/disk.image, or in
+your xm create config file). The fourth is binary and determines
+whether you will have one instance of your driver, or many. Here, a 1
+means that your driver is a singleton and will coordinate access to
+any number of tap devices. 0 is more common, meaning that you will
+have one driver for each device that is created. The final field
+should contain a reference to the struct tap_disk you created in step
+(1).
+
+4) Add a reference to your disk info structure (from step (3)) to the
+dtypes array. Take care here - you need to place it in the position
+corresponding to the device type number you claimed in step (2). So
+we would place &mynewdisk_disk in dtypes[10]. Look at the other
+devices in this array and pad with "&null_disk," as necessary.
+
+5) Modify the xend python scripts. You need to add your disk name to
+the list of disks that xend recognizes.
+
+edit:
+ tools/python/xen/xend/server/BlktapController.py
+
+And add your disk to the "blktap_disk_types" array near the top of
+your file. Use the same name you specified in the third field of step
+(3). The order of this list is not important.
+
+
+Now your driver is ready to be written. Create a block-mynewdisk.c in
+tools/blktap2/drivers and add it to the Makefile.
+
+
+Development:
+
+Copying block-aio.c and block-ram.c would be a good place to start.
+Read those files as you go through this; I will be assisting by
+commenting on a few useful functions and structures.
+
+struct tap_disk:
+
+Remember the forward declaration in step (1) of the setup phase above?
+Now is the time to make that structure a reality. This structure
+contains a list of function pointers for all the routines that will be
+asked of your driver. Currently the required functions are open,
+close, read, write, get_parent_id, validate_parent, and debug.
+
+e.g. -
+ struct tap_disk tapdisk_mynewdisk = {
+ .disk_type = "tapdisk_mynewdisk",
+ .flags = 0,
+ .private_data_size = sizeof(struct tdmynewdisk_state),
+ .td_open = tdmynewdisk_open,
+ ....
+
+The private_data_size field is used to provide a structure to store
+the state of your device. It is very likely that you will want
+something here, but you are free to design whatever structure you
+want. Blktap will allocate this space for you; you just need to tell
+it how much space you want.
+
+
+tdmynewdisk_open:
+
+This is the open routine. The first argument is a structure
+representing your driver. Two fields in this structure are
+interesting.
+
+driver->data will contain a block of memory of the size you requested
+in the .private_data_size field of your struct tap_disk (above).
+
+driver->info contains a structure that details information about your
+disk. You need to fill this out. By convention this is done with a
+_get_image_info() function. Assign a size (the total number of
+sectors), sector_size (the size of each sector in bytes), and set
+driver->info->info to 0.
+
+The second parameter contains the name that was specified in the
+creation of your device, either through xend, or on the command line
+with tapdisk2. Usually this specifies a file that you will open in
+this routine. The final parameter, flags, contains one of a number of
+flags specified in tapdisk.h that may change the way you treat the
+disk.
+
+
+_queue_read/write:
+
+These are your read and write operations. What you do here will
+depend on your disk, but you should do exactly one of-
+
+1) Call td_complete_request with either an error or a success code.
+
+2) Call td_forward_request, which will forward the request to the next
+driver in the stack.
+
+3) Queue the request for asynchronous processing with
+td_prep_read/write. In doing so, you will also register a callback
+for request completion. When the request completes you must do one of
+options (1) or (2) above. Finally, call td_queue_tiocb to submit the
+request to a wait queue.
+
+The above functions are defined in tapdisk-interface.c. If you don't
+use them as specified you will run into problems as your driver will
+fail to inform blktap of the state of requests that have been
+submitted. Blktap keeps track of all requests and does not like losing track.
+
+
+_close, _get_parent_id, _validate_parent:
+
+These last few tend to be very routine. _close is called when the
+device is closed, and also when it is paused (in this case, open will
+also be called later). The other functions are used in stacking
+drivers. Most often drivers will return TD_NO_PARENT and -EINVAL,
+respectively.
+
+
+
+
+
+