WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
 
   
 

xen-devel

Re: [Xen-devel] [PATCH] Blktap: Userspace file-based image support. (RFC)

To: "Andrew Warfield" <andrew.warfield@xxxxxxxxxxxx>
Subject: Re: [Xen-devel] [PATCH] Blktap: Userspace file-based image support. (RFC)
From: Dan Smith <danms@xxxxxxxxxx>
Date: Mon, 19 Jun 2006 14:16:40 -0700
Cc: NAHieu <nahieu@xxxxxxxxx>, Xen Developers <xen-devel@xxxxxxxxxxxxxxxxxxx>, Julian Chesterfield <julian.chesterfield@xxxxxxxxxxxx>
Delivery-date: Mon, 19 Jun 2006 14:16:57 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <eacc82a40606191022k4251eed0re68411a9f8fc974a@xxxxxxxxxxxxxx> (Andrew Warfield's message of "Mon, 19 Jun 2006 10:22:04 -0700")
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <eacc82a40606190919x4bd4ef22m9d8431e650e85a67@xxxxxxxxxxxxxx> <5d7aca950606190951q5f67d8aav5a2591a360edca4d@xxxxxxxxxxxxxx> <eacc82a40606191022k4251eed0re68411a9f8fc974a@xxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Gnus/5.110006 (No Gnus v0.6) Emacs/21.4 (gnu/linux)

AW> I'm sure that Dan can comment on this as well.  The main technical
AW> difference is that (as I understand it at least) dm-userspace
AW> doesn't bring block data through userspace, just the block request
AW> addresses, which may be redirected.  The current tap code maps the
AW> entire request up, so you can potentially change the data and you
AW> can issue block I/O using normal unix file access functions.

Yup, that's a correct assessment.

AW> My intuition is that an approach like dm-userspace can be made
AW> more efficient in the long run, but right now it's going to be
AW> slower as you need to do copies of guest data pages as requests go
AW> through the device mapper kernel code.

Why do you say that?  I would imagine that blkback provides the domU
pages as the target pages in the request, is that right?  In that
case, the data coming off of the disk should go directly into the domU
page.  Remember that dm-userspace does nothing other than rewrite the
destination device and sector of a request.  So however it works for
blkback now is how it works with dm-userspace in the mix.
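
To make the remapping point concrete, here is a minimal sketch in Python.
It is purely illustrative and is not dm-userspace code: BlockRequest,
remap, the table layout, and the device names are all hypothetical.  It
only shows that a remap changes where a request goes, not the buffer the
data lands in.

```python
from dataclasses import dataclass


@dataclass
class BlockRequest:
    device: str        # target block device (hypothetical name)
    sector: int        # starting sector on that device
    pages: bytearray   # stands in for the domU pages attached to the request


def remap(req, table):
    """Rewrite only the destination (device, sector) of a request.

    The data buffer is never read, copied, or replaced here.
    """
    new_device, new_sector = table[(req.device, req.sector)]
    req.device = new_device
    req.sector = new_sector
    return req


# Hypothetical mapping: one virtual sector redirected to a backing device.
pages = bytearray(4096)
req = BlockRequest("dm-virtual", 100, pages)
table = {("dm-virtual", 100): ("/dev/loop0", 2048)}
out = remap(req, table)
```

After the remap, out.pages is still the very same buffer object that was
attached to the request before, so no extra copy has been introduced.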

AW> This should be fixable though.  I'm also not sure how carefully
AW> dm-u watches block completion responses to ensure safety of
AW> metadata updates relative to data writes.  This too should be
AW> fixable -- I just don't know if the user-level tools can currently
AW> request completion notifications on requests that they've
AW> processed.  

So, right now, we're a little optimistic about metadata writing.  It
will be relatively easy to hijack the callback routine for the disk
request (a technique which is heavily used in the rest of the block
layer) to get a completion trigger.  We can then notify userspace for
the metadata write and then trigger the original callback routine for
completion. 
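
The callback hijack could look roughly like the sketch below.  This is
Python rather than kernel code, and every name in it (Request,
hijack_completion, notify_userspace) is hypothetical.  Notifying
userspace only when the data write succeeded is my assumption about how
this would be done, not something that exists today.

```python
class Request:
    def __init__(self, sector, on_complete):
        self.sector = sector
        self.on_complete = on_complete  # original completion callback


def hijack_completion(req, notify_userspace):
    """Replace the completion callback with a wrapper.

    The wrapper notifies userspace first (so it can write metadata) and
    then triggers the original callback.
    """
    original = req.on_complete

    def wrapped(r, status):
        # Assumption: metadata is only written after a successful data write.
        if status == 0:
            notify_userspace(r)
        original(r, status)

    req.on_complete = wrapped


# Record the order in which the two callbacks fire.
events = []
req = Request(42, lambda r, status: events.append(("original", r.sector, status)))
hijack_completion(req, lambda r: events.append(("metadata", r.sector)))
req.on_complete(req, 0)
```

The point of the ordering is that the original completion is not
reported until userspace has been told about the finished data write.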

AW> A benefit to the dm-user patch is that it is more of a linux
AW> approach than a xen+linux approach.  Dm-user will be generally
AW> useful in the linux tree

Right, this is a huge advantage, I think.  Being able to mount images
as if they were disks will be quite helpful.  Another benefit is the
ability to easily convert between formats.  Converting a vmdk to a
qcow is as easy as mounting both and doing a "cp -R" between them.

AW> Similarly though, one downside of dm-user, that is absolutely no
AW> fault of the developers, is the dependency on the linux loopback
AW> driver

Just a clarification, this is only if file images are used.  If using
LVMs or partitions or some other block device, we don't use the loop
driver.

AW> which has some bad failure characteristics which can result in
AW> both data being acknowledged as written even though it hasn't
AW> been, and the OOM killer going insane.  I think some fixes to loop
AW> probably need to be applied in the near future given how much
AW> people are generally depending on the code with VMs.

Can you elaborate about what specifically is wrong with the loop
driver?

AW> Julian and I have talked about extending the tap driver to combine
AW> it with blkback and allow block address translation without access
AW> to request contents.

Since the kernel already has a block address translation solution
(i.e. device-mapper), is there a benefit to adding another
xen-specific one?

Another question I have is this: doesn't the dependence on libaio
limit you to certain filesystems?  For example, the page for libaio
doesn't mention reiserfs as supported.  Does that mean that SLES
users won't be able to use ublkback?

Thanks for posting your code, Andrew!

-- 
Dan Smith
IBM Linux Technology Center
Open Hypervisor Team
email: danms@xxxxxxxxxx
