xen-devel

RE: [Xen-devel] Re: blktap: Sync with XCP, dropping zero-copy.

To: Daniel Stodden <Daniel.Stodden@xxxxxxxxxx>, Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Subject: RE: [Xen-devel] Re: blktap: Sync with XCP, dropping zero-copy.
From: Dave Scott <Dave.Scott@xxxxxxxxxxxxx>
Date: Tue, 16 Nov 2010 13:00:25 +0000
Accept-language: en-US
Acceptlanguage: en-US
Cc: "Xen-devel@xxxxxxxxxxxxxxxxxxx" <Xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Tue, 16 Nov 2010 05:01:31 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <1289898792.23890.214.camel@ramone>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <1289604707-13378-1-git-send-email-daniel.stodden@xxxxxxxxxx> <4CDDE0DA.2070303@xxxxxxxx> <1289620544.11102.373.camel@xxxxxxxxxxxxxxxxxxxxxxx> <4CE17B80.7080606@xxxxxxxx> <1289898792.23890.214.camel@ramone>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AcuFbp5eZxfiGXLMRYumH6pI2Q1T7wAGGp7w
Thread-topic: [Xen-devel] Re: blktap: Sync with XCP, dropping zero-copy.

Hi,

Re: XCP's use of blktap2:

> On Mon, 2010-11-15 at 13:27 -0500, Jeremy Fitzhardinge wrote:
> > On 11/12/2010 07:55 PM, Daniel Stodden wrote:
> > > The second issue I see is the XCP side of things. XenServer got a
> > > lot of benefit out of blktap2, and particularly because of the
> > > tapdevs. It promotes a fairly rigorous split between a blkback VBD,
> > > controlled by the agent, and tapdevs, controlled by XS's storage
> > > manager.
> > >
> > > That doesn't prevent blkback to go into userspace, but it better
> > > won't share a process with some libblktap, which in turn would
> > > better not be controlled under the same xenstore path.
> >
> > Could you elaborate on this?  What was the benefit?
> 
> It's been mainly a matter of who controls what. Blktap1 was basically a
> VBD, controlled by the agent. Blktap2 is a VDI represented as a block
> device. Leaving management of that to XCP's storage manager, which just
> hands that device node over to Xapi, simplified many things. Before, the
> agent had to understand a lot about the type of storage, then talk to
> the right backend accordingly. Worse, in order to have storage
> management control a couple datapath features, you'd basically have to
> talk to Xapi, which would talk through xenstore to blktap, which was a
> bit tedious. :)

As Daniel says, XCP currently separates domain management (setting up and 
rebooting VMs) from storage management (attaching disks, snapshots, coalesce). 
In the current design the storage layer handles the storage control-path 
(instigating snapshots, clones, coalesce, and in future dedup) through a 
storage API ("SMAPI") and provides a uniform data-path interface (currently 
in the form of a dom0 block device) to qemu and blkback. On a VM start, xapi 
will first ask the storage control-path to make a disk available, and then 
pass this information to blkback/qemu.
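
To make that ordering concrete, here's a rough Python sketch. It is not 
xapi's actual code: smapi_vdi_attach(), xenstore_write() and the xenstore 
path are just stand-ins for the real SMAPI call and xenstore plumbing.

def smapi_vdi_attach(vdi_uuid):
    # Stand-in: the storage control-path (SMAPI) activates the VDI and, with
    # blktap2, ends up handing back a dom0 block device backed by a tapdisk2.
    return "/dev/xen/blktap-2/tapdev0"

def xenstore_write(path, value):
    # Stand-in for writing a key into xenstore for blkback/qemu to pick up.
    print("xenstore-write %s %s" % (path, value))

def start_vm_disk(domid, devid, vdi_uuid):
    # 1. Control-path first: ask storage to make the disk available.
    block_device = smapi_vdi_attach(vdi_uuid)
    # 2. Data-path second: hand only the device node to blkback/qemu.
    #    (Illustrative xenstore layout; the real backend dir has more keys.)
    xenstore_write("/local/domain/0/backend/vbd/%d/%d/params" % (domid, devid),
                   block_device)
    return block_device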

One of the trickiest things XCP handles is vhd "coalesce": merging a vhd file 
into its "parent". This comes up because vhds are arranged in a tree structure 
where the leaves are separate, independent VM disks and the interior nodes 
represent shared common blocks, the result of (eg) cloning a single VM lots of 
times. When guest disks are deleted and the vhd leaves are removed, it 
sometimes becomes possible to save space by merging nodes together. The tricky 
bit is doing this while I/O is still being performed in parallel against 
logically separate (but related by parentage/history) disks on different 
hosts. The process doing the coalescing needs to know where all the I/O is 
going on (eg to be able to find the host and pid where the related tapdisks 
(or qemus) live) and needs to be able to signal those processes when they 
must re-read the vhd tree metadata.
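
As a toy model of the tree and the merge step (just an illustration in 
Python, not the real vhd code, and it ignores the on-disk format entirely):

class VhdNode:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent        # shared ancestor holding common blocks
        self.blocks = {}            # block number -> data written at this level
        self.children = []
        if parent is not None:
            parent.children.append(self)

def coalesce(node):
    """Merge an internal node (no VM attached) into its parent and splice it
    out of the tree.  In the real system this runs while tapdisks on other
    hosts keep writing to sibling leaves, which is why they have to be told
    to re-read the tree metadata afterwards."""
    parent = node.parent
    parent.blocks.update(node.blocks)   # the child's blocks override the parent's
    for child in node.children:
        child.parent = parent           # grandchildren re-parent onto the survivor
        parent.children.append(child)
    parent.children.remove(node)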

In the bad old blktap1 days, the storage control-path didn't know enough about 
the data-path to reliably signal the active tapdisks: IIRC the tapdisks were 
spawned by blktapctrl as a side-effect of the domain manager writing to 
xenstore. In the much better blktap2 days :) the storage control-path sets up 
(registers?) the data-path (currently via tap-ctl and a dom0 block device) and 
so it knows who to talk to in order to co-ordinate a coalesce.
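
For example, the control-path can set up and enumerate data-paths roughly 
like this (the tap-ctl flags are from memory, so treat them as illustrative 
rather than gospel):

import subprocess

def attach_vhd(vhd_path):
    # "tap-ctl create -a vhd:<path>" spawns a tapdisk2 and prints the dom0
    # block device node (something like /dev/xen/blktap-2/tapdevN).
    out = subprocess.check_output(["tap-ctl", "create", "-a", "vhd:" + vhd_path])
    return out.strip()

def list_tapdisks():
    # "tap-ctl list" reports pid/minor/state/args: exactly the
    # "who is doing I/O on what" information a coalescer needs.
    return subprocess.check_output(["tap-ctl", "list"])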

So I think the critical thing is for the storage control-path to be able to 
"register" a data-path, so that it can later find and signal any processes 
using that data-path. There are a bunch of different mechanisms the storage 
control-path could use instead of tap-ctl creating a block device, including:

1. directly spawn a tapdisk2 userspace process. Some identifier (pid, unix 
domain socket) could be passed to qemu allowing it to perform I/O. The block 
backend could be either in the tapdisk2 directly or in qemu?

2. return a (path to vhd file, callback unix domain socket) pair. This could 
be passed to qemu (or something else), and qemu could use the callback socket 
to register its intention to use the data-path (and hence that it needs to be 
signaled if something changes); a rough sketch of this is below.
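
A purely hypothetical sketch of option 2 in Python (the socket path, message 
format and "register"/"reload-metadata" operations are invented here for 
illustration; they are not an existing interface):

import json
import os
import socket

def register_datapath_user(callback_socket_path, vhd_path):
    # A data-path user (qemu, or something else) announces itself over the
    # callback socket handed out by the storage control-path.
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(callback_socket_path)
    s.sendall(json.dumps({"op": "register",
                          "pid": os.getpid(),
                          "vhd": vhd_path}).encode())
    # Keep the socket open: the control-path can later write something like a
    # "reload-metadata" message down it when a coalesce changes the tree.
    return s

Either way, the point is that registration gives the storage control-path a 
channel back to every data-path user, which is what coalesce needs.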

I'm sure there are lots of possibilities :-)

Cheers,
Dave
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel