Re: [Xen-devel] PATCH: Enable QEMU booting of blktap disks

To:	"Daniel P. Berrange" <berrange@xxxxxxxxxx>
Subject:	Re: [Xen-devel] PATCH: Enable QEMU booting of blktap disks
From:	"Andrew Warfield" <andrew.warfield@xxxxxxxxxxxx>
Date:	Thu, 19 Jul 2007 15:46:17 -0700
Cc:	xen-devel@xxxxxxxxxxxxxxxxxxx
In-reply-to:	<20070719180855.GF26669@xxxxxxxxxx>
References:	<20070719170922.GE26669@xxxxxxxxxx> <eacc82a40707191034j4ae8eb3ch8077977e39c92bce@xxxxxxxxxxxxxx> <20070719180855.GF26669@xxxxxxxxxx>

>> In the other thread that's currently going on this topic, it sounds
>> like others are quite successfully using the phantom code.  Why is it
>> broken for you?
>
> I really can't see how it works for anybody in 3.1.0 since the code which
> sets up phantom devices simply doesn't work


Well let's fix it then. ;)

>> As I've said before, I dislike the idea of having separate
>> implementations of disks -- one in qemu and one in tapdisk.  We'd
>> quite like to encourage people to be able to extend virtual block
>> devices in the future, and it seems like your approach is going to
>> force them to do two independent implementations of things.  It also
>> leads to complications if you want to add things like caching, shared
>> ramdisks, etc.  If phantom is broken, why don't we just fix that?
>
> AFAICT with or without my change you need to have two separate impls
> of every disk format, since the phantom device stuff is only ever used
> by blktap - non blktap disks still get processed directly by QEMU.


My concern is that it should be possible to run the VM so that it only
has to depend on a single implementation of a virtual disk.  If you
don't use PV drivers, the qemu block drivers do this nicely.  If you
do, the phantom code lets you do this by ensuring that emulated block
requests are redirected to tapdisk until the PV drivers come up.  That
path is admittedly inefficient, but it doesn't really matter for the
short time that it is in use.
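
To make the "single implementation" point concrete, here is a minimal
sketch of the routing idea only.  Every name in it (SingleDiskBackend,
GuestDisk, pv_drivers_up) is hypothetical and is not blktap, tapdisk or
qemu code; it just assumes that both the emulated path and the PV path
end up at the same backend object.

```python
class SingleDiskBackend:
    """Hypothetical stand-in for the one disk implementation (tapdisk's role)."""

    def __init__(self, size):
        self.data = bytearray(size)

    def read(self, offset, length):
        return bytes(self.data[offset:offset + length])

    def write(self, offset, payload):
        self.data[offset:offset + len(payload)] = payload


class GuestDisk:
    """Routes guest I/O to the same backend whichever driver issued it."""

    def __init__(self, backend):
        self.backend = backend
        self.pv_ready = False   # flips once the guest's PV drivers come up
        self.paths_used = []    # which path carried each request

    def pv_drivers_up(self):
        self.pv_ready = True

    def _note_path(self):
        # Before the PV drivers load, requests take the slow redirected path.
        self.paths_used.append("pv" if self.pv_ready else "emulated-redirect")

    def write(self, offset, payload):
        self._note_path()
        self.backend.write(offset, payload)

    def read(self, offset, length):
        self._note_path()
        return self.backend.read(offset, length)


backend = SingleDiskBackend(size=4096)
disk = GuestDisk(backend)
disk.write(0, b"boot")      # early boot: emulated request, redirected
disk.pv_drivers_up()
early = disk.read(0, 4)     # PV path reads what the emulated path wrote
```

Because only one object ever interprets the image, anything added to it
(caching, a new format) applies to both paths without a second
implementation.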

> IMHO the entire design & impl of blktap userspace was broken from the
> start because it is duplicating functionality already in the QEMU
> codebase.


Blktap was written before there were device-emulated guests and before
qemu was capable of processing more than a single outstanding block
request at a time.  So the only functionality that it duplicated was
to use e.g. the vmdk and qcow code as a basis for some of the image
file implementations.  Vmdk is largely unchanged and I don't know of
anyone who actively uses it; qcow evolved considerably in order to do
asynchronous access and batched request processing.

> With the benefit of hindsight, I would suggest that it would
> be better to have QEMU able to speak the native blktap protocol straight
> to the blktap kernel driver. Keep HVM using QEMU for all file backed
> disks, since it already handles all the formats just fine, and have a
> new machine type in QEMU for paravirt VMs which provided the tap daemon
> replacement and also a PVFB daemon replacement. Then you could kill the
> entire blktap userspace codebase & most of the PVFB userspace codebase
> and the libvncserver requirement.


I think a patch that pulled a lot of the tapdisk processing into qemu
would be a very interesting thing to compare overheads for against the
current model.

> So there'd only be 1 single daemon in Dom0 per VM, it would be the same
> daemon for PV and HVM, and all the open source virt platforms (Xen, KVM,
> QEMU, VirtualBox) would all be reaping the benefit of each other's code
> improvements to QEMU driver model, in particular for disk format code &
> VNC server code, rather than forking & reimplementing private copies.
>
> Of course this isn't a quick job, but if the motivation is reducing
> code duplication & alternative I/O paths, then focusing on QEMU for
> everything seems like a much more viable idea than more Xen specific
> code.


Absolutely.  Dan, I completely agree that it would be very good to
have a unified way to implement virtual block devices -- image
formats, interposition, and otherwise.  I think that the qemu and
blktap disk interfaces both shared this as an initial design goal.  I
agree it's a lot of work and I agree that it would be a very nice
thing -- in the same spirit as Rusty's virtio efforts -- to be able to
share these implementations across hypervisors/emulators/etc.  I also
know of some grad students who would be very happy to see virtual
block devices that they are building for blktap apply against
everything else.

The thing is that doing everything in qemu doesn't currently
achieve this -- because PV drivers can't talk directly to qemu and
going through the emulated path results in suckful performance.  So
rather than taking a patch that means PV-based HVM domains have to
depend on multiple implementations of disks, I'd much prefer to see us
go in the direction of what you propose.

a.


On 7/19/07, Daniel P. Berrange <berrange@xxxxxxxxxx> wrote:

On Thu, Jul 19, 2007 at 10:34:12AM -0700, Andrew Warfield wrote:
> So two comments on this:
>
> In the other thread that's currently going on this topic, it sounds
> like others are quite successfully using the phantom code.  Why is it
> broken for you?

I really can't see how it works for anybody in 3.1.0 since the code which
sets up phantom devices simply doesn't work

        try:
            imagetype = self.vm.info['image']['type']
        except:
            imagetype = ""

        if imagetype == 'hvm':

The body of that try: statement is trying to read hash keys which don't
exist, since 'vm.info' isn't a hash. So imagetype is always "" and so
none of the phantom setup code ever gets run.  Even after fixing that,
I never get any devices appearing and the VM just immediately shuts
down. It seems to be looking for the /dev/xvd* device nodes in Dom0
rather than DomU, which seems rather wrong.
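
The failure mode described above can be reproduced in isolation.  This
is a minimal sketch, not xend code: VmInfo is a hypothetical stand-in
for an info object that does not support ['image']['type'] lookups, and
the two helper functions are named here purely for illustration.

```python
class VmInfo:
    """Hypothetical stand-in for an info object that is not a dict."""

    def __init__(self, image_type):
        self.image_type = image_type


def imagetype_swallowing(info):
    # Same shape as the quoted snippet: any failure silently becomes "".
    try:
        return info['image']['type']
    except:
        return ""


def imagetype_narrow(info):
    # Treat only a missing key as "no type"; a wrong container type
    # propagates instead of being hidden.
    try:
        return info['image']['type']
    except KeyError:
        return ""
```

With VmInfo('hvm'), the first helper returns "" and the 'hvm' branch is
skipped without any error, while the second raises TypeError and points
straight at the bad lookup.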

> As I've said before, I dislike the idea of having separate
> implementations of disks -- one in qemu and one in tapdisk.  We'd
> quite like to encourage people to be able to extend virtual block
> devices in the future, and it seems like your approach is going to
> force them to do two independent implementations of things.  It also
> leads to complications if you want to add things like caching, shared
> ramdisks, etc.  If phantom is broken, why don't we just fix that?

AFAICT with or without  my change you need to have two separate impls
of every disk format, since the phantom device stuff is only ever used
by blktap - non blktap disks still get processed directly by QEMU. Now
if we intend to remove all support for file: entirely, and make blktap
compulsory for file backed VMs then I can see the benefit in having
everything go via one codepath. Though now having 2 userspace daemons
in Dom0 per HVM guest seems like it's going in the wrong direction to me.

IMHO the entire design & impl of blktap userspace was broken from the
start because it is duplicating functionality already in the QEMU
codebase. With the benefit of hindsight, I would suggest that it would
be better to have QEMU able to speak the native blktap protocol straight
to the blktap kernel driver. Keep HVM using QEMU for all file backed
disks, since it already handles all the formats just fine, and have a
new machine type in QEMU for paravirt VMs which provided the tap daemon
replacement and also a PVFB daemon replacement. Then you could kill the
entire blktap userspace codebase & most of the PVFB userspace codebase
and the libvncserver requirement.

So there'd only be 1 single daemon in Dom0 per VM, it would be the same
daemon for PV and HVM, and all the open source virt platforms (Xen, KVM,
QEMU, VirtualBox) would all be reaping the benefit of each other's code
improvements to QEMU driver model, in particular for disk format code &
VNC server code, rather than forking & reimplementing private copies.

Of course this isn't a quick job, but if the motivation is reducing
code duplication & alternative I/O paths, then focusing on QEMU for
everything seems like a much more viable idea than more Xen specific
code.

Dan.
--
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-           Perl modules: http://search.cpan.org/~danberr/              -=|
|=-               Projects: http://freshmeat.net/~danielpb/               -=|
|=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=|

