xen-devel

RE: [Xen-devel] [PATCH] Blktap: Userspace file-based image support.(RFC)

To: "Dan Smith" <danms@xxxxxxxxxx>, "Andrew Warfield" <andrew.warfield@xxxxxxxxxxxx>
Subject: RE: [Xen-devel] [PATCH] Blktap: Userspace file-based image support.(RFC)
From: "Ian Pratt" <m+Ian.Pratt@xxxxxxxxxxxx>
Date: Tue, 20 Jun 2006 12:07:45 +0100
Cc: NAHieu <nahieu@xxxxxxxxx>, Xen Developers <xen-devel@xxxxxxxxxxxxxxxxxxx>, Julian Chesterfield <julian.chesterfield@xxxxxxxxxxxx>
> AW> This should be fixable though.  I'm also not sure how carefully
> AW> dm-u watches block completion responses to ensure safety of
> AW> metadata updates relative to data writes.  This too should be
> AW> fixable -- I just don't know if the user-level tools can currently
> AW> request completion notifications on requests that they've
> AW> processed.
> 
> So, right now, we're a little optimistic about metadata writing.  It
> will be relatively easy to hijack the callback routine for the disk
> request (a technique which is heavily used in the rest of the block
> layer) to get a completion trigger.  We can then notify userspace for
> the metadata write and then trigger the original callback routine for
> completion.

Yep, dm-userspace is certainly going to need to have a way of
intercepting IO completions and then choosing when it's actually going
to propagate the completion to the backend. That's quite a big change to
the current code (incidentally, the dm-snap code is pretty shocking in
this respect too).
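
To make the ordering concrete, here is a minimal userspace sketch of the
idea, not dm-userspace code. It wraps a request's completion callback so
that the metadata update is recorded before the completion is passed on.
The Request class, intercept_completion() and simulate() are hypothetical
names invented for this illustration; nothing here is a kernel API.

```python
from typing import Callable, List, Tuple


class Request:
    """Hypothetical stand-in for a block request with a completion callback."""

    def __init__(self, sector: int, end_io: Callable[["Request"], None]) -> None:
        self.sector = sector
        self.end_io = end_io


def intercept_completion(req: Request, write_metadata: Callable[[int], None]) -> None:
    """Replace req.end_io with a wrapper that updates metadata first."""
    original = req.end_io

    def deferred(r: Request) -> None:
        # The data write has finished; record the metadata update next.
        write_metadata(r.sector)
        # Only now propagate the completion to whoever issued the request.
        original(r)

    req.end_io = deferred


def simulate() -> List[Tuple[str, int]]:
    """Run one request through the wrapper and return the event order."""
    log: List[Tuple[str, int]] = []
    req = Request(42, lambda r: log.append(("ack", r.sector)))
    intercept_completion(req, lambda sector: log.append(("metadata", sector)))
    log.append(("data", req.sector))  # pretend the data write just completed
    req.end_io(req)                   # the lower layer signals completion
    return log
```

In this sketch the backend never sees the acknowledgement until the
metadata step has run, which is the property being asked for. A real
implementation would also have to wait for the metadata write itself to
complete, which this synchronous toy glosses over.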

> AW> A benefit to the dm-user patch is that it is more of a linux
> AW> approach than a xen+linux approach.  Dm-user will be generally
> AW> useful in the linux tree
> 
> Right, this is a huge advantage, I think.  Being able to mount images
> as if they were disks will be quite helpful.  Another benefit is the
> ability to easily convert between formats.  Converting a vmdk to a
> qcow is as easy as mounting both and doing a "cp -R" between them.

I think the blktap code should definitely export a kernel device at the
top so that the same property holds. Should be easy to add.

> AW> which has some bad failure characteristics which can result in
> AW> both data being acknowledged as written even though it hasn't
> AW> been, and the OOM killer going insane.  I think some fixes to loop
> AW> probably need to be applied in the near future given how much
> AW> people are generally depending on the code with VMs.
> 
> Can you elaborate about what specifically is wrong with the loop
> driver?

It doesn't bypass the buffer cache (so all bets are off for data
integrity) and can end up consuming all of dom0 memory with dirty
buffers -- just create a few loop devices and do a few parallel dd's to
them and watch the OOM killer go on the rampage. It's even worse if the
filesystem the file lives on is slow, e.g. NFS.
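
As a rough illustration of the integrity half of the problem, the sketch
below only acknowledges a write to a file-backed image after fsync() has
returned. Note this is a swapped-in technique: it flushes the buffer cache
rather than bypassing it (as O_DIRECT would), and it is not how the loop
driver or blktap work. write_then_ack() and demo() are hypothetical names
made up for the example.

```python
import os
import tempfile
from typing import Tuple


def write_then_ack(path: str, offset: int, data: bytes) -> int:
    """Write data at offset and return only after it has been fsync'ed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        total = 0
        while total < len(data):
            # pwrite may write fewer bytes than requested, so loop.
            total += os.pwrite(fd, data[total:], offset + total)
        # Ask the kernel to push the dirty pages to stable storage
        # before the caller is told the write is done.
        os.fsync(fd)
    finally:
        os.close(fd)
    return total


def demo() -> Tuple[int, bytes]:
    """Write one 512-byte sector to a scratch image and read it back."""
    with tempfile.TemporaryDirectory() as scratch:
        path = os.path.join(scratch, "disk.img")
        acked = write_then_ack(path, 512, b"A" * 512)
        with open(path, "rb") as image:
            image.seek(512)
            return acked, image.read(512)
```

Because every write waits for the flush, dirty data cannot pile up behind
an acknowledgement, at an obvious cost in throughput.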

> AW> Julian and I have talked about extending the tap driver to combine
> AW> it with blkback and allow block address translation without access
> AW> to request contents.
> 
> Since the kernel already has a block address translation solution
> (i.e. device-mapper), is there a benefit to adding another
> xen-specific one?

I think blktap and dm-userspace are quite complementary, so I don't see
a problem with having them both in the tree. Right now, blktap looks to
be the more mature solution, but dm-userspace could catch up. Blktap
will obviously still be preferable when it's necessary to actually touch
the data.
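
For readers unfamiliar with the distinction, here is a toy sketch of block
address translation that never looks at request contents: it only decides
where a block lives. RemapTable, the "base" and "cow" device labels, and
the whole-block-write assumption are all invented for this example and do
not describe the actual dm-userspace or blktap interfaces.

```python
from typing import Dict, Tuple


class RemapTable:
    """Hypothetical map from virtual blocks to (device, block) locations."""

    def __init__(self) -> None:
        self.mapping: Dict[int, int] = {}  # virtual block -> block in overlay
        self.next_free = 0                 # next unused overlay block

    def translate(self, vblock: int, is_write: bool) -> Tuple[str, int]:
        """Return where the request should go; the data is never inspected."""
        if vblock in self.mapping:
            return ("cow", self.mapping[vblock])
        if is_write:
            # Assumption: writes cover whole blocks, so nothing needs to be
            # copied from the base image before redirecting the write.
            self.mapping[vblock] = self.next_free
            self.next_free += 1
            return ("cow", self.mapping[vblock])
        # Unmodified blocks are still read from the base image.
        return ("base", vblock)
```

For example, a read of an untouched block 7 goes to ("base", 7); after a
write to block 7 has been redirected, later reads of it go to ("cow", 0).
Anything that must transform the payload itself (compression, encryption)
cannot be expressed as a table like this.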

Ian

