[Xen-devel] Re: [PATCH] Blktap: Userspace file-based image support.(RFC)

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] Re: [PATCH] Blktap: Userspace file-based image support.(RFC)
From: Anthony Liguori <aliguori@xxxxxxxxxx>
Date: Wed, 21 Jun 2006 09:45:50 -0500
References: <A95E2296287EAD4EB592B5DEEFCE0E9D4BAB1A@xxxxxxxxxxxxxxxxxxxxxxxxxxx> <m3psh3pm7t.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
User-agent: Pan/0.14.2 (This is not a psychotic episode. It's a cleansing moment of clarity.)

On Tue, 20 Jun 2006 14:10:30 -0700, Dan Smith wrote:

> IP> It doesn't bypass the buffer cache (so all bets are off for data
> IP> integrity) and can end up consuming all of dom0 memory with dirty
> IP> buffers -- just create a few loop devices and do a few parallel
> IP> dd's to them and watch the oomkiller go on the rampage. It's even
> IP> worse if the filesystem the file lives on is slow e.g. NFS.
> 
> Ok, it seems like this should be addressed in the upstream loop
> driver.  I imagine quite a few people are depending on the loop driver
> right now, expecting it to maintain data integrity.

It's probably worth spending some cycles trying to improve the loop driver
itself.

> Could the loop driver make use of the routines that do direct IO
> instead of the normal routines to solve this when it's an issue?

It appears that the loop driver is split between two threads using a
producer/consumer queue.  The main thread gets the bio requests and queues
them for the consumer thread.
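Roughly, the structure is the familiar producer/consumer pattern.  A userspace
sketch (made-up names and a pthread queue purely for illustration; the real
driver queues struct bio and uses kernel locking) looks like this:

/* Illustration only: the producer/consumer split described above. */
#include <pthread.h>
#include <stdlib.h>

struct request {
    struct request *next;
    /* offset, length, buffer, ... */
};

static struct request *queue_head;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  queue_wait = PTHREAD_COND_INITIALIZER;

/* "main thread": called for each incoming request, just queues it */
void submit_request(struct request *req)
{
    pthread_mutex_lock(&queue_lock);
    req->next = queue_head;
    queue_head = req;
    pthread_cond_signal(&queue_wait);
    pthread_mutex_unlock(&queue_lock);
}

/* "consumer thread": dequeues and performs the actual file I/O */
void *consumer(void *unused)
{
    (void)unused;
    for (;;) {
        struct request *req;

        pthread_mutex_lock(&queue_lock);
        while (queue_head == NULL)
            pthread_cond_wait(&queue_wait, &queue_lock);
        req = queue_head;
        queue_head = req->next;
        pthread_mutex_unlock(&queue_lock);

        /* do_file_io(req) would go here: address ops, write, or aio */
        free(req);
    }
    return NULL;
}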

The consumer thread can do a number of things depending on properties of
the fd.  It may use address ops, use fops->write, or do a transform of the
data.  If the fd is opened with O_DIRECT and fops has valid
aio_{read,write} operations, it should be possible to use proper aio calls
to queue the requests.  You'll probably have to get clever about how the
thread blocks (it has to wake up either on the queue mutex or when an aio
request completes).
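To make that concrete, a minimal userspace sketch of O_DIRECT plus aio using
libaio (link with -laio) is below.  The file name, block size, and the
single-request flow are made up for illustration; in the driver you'd be
batching queued requests and reaping completions while also sleeping on the
queue.

/* Sketch only: queue a direct write via Linux aio instead of buffered I/O. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <libaio.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("disk.img", O_RDWR | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    io_context_t ctx = 0;
    if (io_setup(32, &ctx) < 0) {            /* returns -errno on failure */
        fprintf(stderr, "io_setup failed\n");
        return 1;
    }

    /* O_DIRECT needs aligned buffers and aligned offsets/lengths */
    void *buf;
    if (posix_memalign(&buf, 4096, 4096)) return 1;
    memset(buf, 0xab, 4096);

    struct iocb cb, *cbs[1] = { &cb };
    io_prep_pwrite(&cb, fd, buf, 4096, 0);

    if (io_submit(ctx, 1, cbs) != 1) {
        fprintf(stderr, "io_submit failed\n");
        return 1;
    }

    /* the driver would go back to sleep on its queue here and reap
     * completions as they arrive rather than blocking on one request */
    struct io_event ev;
    io_getevents(ctx, 1, 1, &ev, NULL);
    printf("write completed: res=%ld\n", (long)ev.res);

    io_destroy(ctx);
    close(fd);
    return 0;
}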

I suspect this will yield a pretty noticeable performance improvement in
the loop driver (especially on SCSI/SATA storage).

The loop driver still has issues though.  It cannot grow and it has a
pretty odd hardcoded limit (256 devices) which quickly becomes a
scalability issue.

The former problem could possibly be addressed by having a parameter for
SET_STATUS that lets you set the size of the device to be greater than
the size of the underlying file.  If a bio comes in for an offset beyond
the end of the underlying file, the driver would have to be smart enough
to ftruncate the file.  The error handling is a bit tough (you'll have to
make sure that if ftruncate fails, you fail the read/write--extra points
if the failure is temporary such that later on, if space is freed up, the
request succeeds).
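A hedged sketch of what that check could look like on the consumer side
(userspace names, illustration only, not the driver's actual code path):

/* Illustration: grow the backing file before servicing a request past its
 * current end, and fail just that request if the grow fails. */
#include <sys/stat.h>
#include <unistd.h>
#include <errno.h>

/* returns 0 on success, -errno on failure; fd is the backing file */
int ensure_backing_size(int fd, off_t req_end)
{
    struct stat st;

    if (fstat(fd, &st) < 0)
        return -errno;

    if (st.st_size >= req_end)
        return 0;                 /* already big enough */

    if (ftruncate(fd, req_end) < 0)
        return -errno;            /* caller fails this bio only; a later
                                     request may succeed once space is
                                     freed (the "temporary failure" case
                                     mentioned above) */
    return 0;
}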

The hardcoded limit is a somewhat larger problem.  The driver would likely
need a bit of reworking.  Since the 256-device limit comes from minor
number allocation, you would either have to get some more device number
space for it or have the ability to allocate device numbers dynamically
and rely on udev/hotplug for folks who want more than 256.
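For the dynamic option, the usual pattern is to ask the kernel for a
dynamically allocated block major rather than hardcoding one; a bare-bones
module skeleton (the "myloop" name is hypothetical, and all the per-device
gendisk/queue setup is omitted) would look something like:

/* Sketch only: dynamic major allocation, device nodes left to udev/hotplug. */
#include <linux/init.h>
#include <linux/module.h>
#include <linux/fs.h>

static int major;

static int __init myloop_init(void)
{
    major = register_blkdev(0, "myloop");   /* 0 => dynamically allocated */
    if (major < 0)
        return major;
    /* per-device gendisk and request queue setup would follow here */
    return 0;
}

static void __exit myloop_exit(void)
{
    unregister_blkdev(major, "myloop");
}

module_init(myloop_init);
module_exit(myloop_exit);
MODULE_LICENSE("GPL");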

> This brings me to another question: Will people really be using
> file-based images for their VMs?  It seems to me that the performance
> of using a block device overshadows the convenience of a file image.

If the performance of the loop driver could be better (and fundamentally,
there's no reason it can't be pretty good), then I see no reason why using
file images wouldn't be the most common approach.

Files are quite a lot easier to manage than partitions.  Of course, I see
no reason why someone couldn't write a FUSE front-end to LVM :-)

Regards,

Anthony Liguori
