xen-devel
Re: [Xen-devel] [4 Patches] New blktap implementation, 2nd try
Hi Andrew,
Andrew Warfield wrote:
> Hi Kevin,
>
> Thanks very much for your thoughts on this so far.
>
>> The big evil about what we currently have is code duplication. For each
>> image format we support we have two implementations: One in ioemu and
>> one in tapdisk. This is why it was agreed that we want to get rid of the
>> tapdisk implementations and move everything into ioemu (the other way
>> round wouldn't work because of HVM).
>
> I didn't realize that this had been universally agreed. Why do you
> think that implementing virtual block devices inside qemu is the right
> way to go -- it seems to me like a direction that's just going to
> bloat qemu and make it more difficult to maintain.
>
> The patches that Dutch has posted do a great thing from a design
> perspective: they remove almost all dependencies on Xen from blktap.
Don't get me wrong: I'm not saying that the design is complete crap.
Actually I can't tell, because I haven't looked at it yet. There is just
one point, albeit an important one, that I don't agree with.
The reason why I'd like to see the block drivers inside qemu is simple:
They are already there and we need them there at least for HVM machines
(and upstream qemu needs them always, so they certainly won't go away).
This patch series creates a third copy of these original
implementations, and it will probably modify them in yet another
way. I don't think we need to discuss that code duplication is bad,
especially if each copy carries changes which make merging hard.
This doesn't mean that you need to implement your stuff in qemu. I think
I get your point, it wouldn't really fit in the design. Of course it
would be nice to have only one copy (the qemu one), but if you really
need a second copy, okay. That still doesn't explain why they should be
changed beyond recognition instead of being kept as a more or less
literal copy, or even compiled directly from the qemu source.
> There is no longer a blktapctrl that watches Xenstore, and the blktap
> kernel driver no longer duplicates functionality with blkback.
> Instead, the new code lets you instantiate a tapdisk on the linux
> command line, which results in the creation of a new block device.
> Requests to this block device are forwarded through tapdisk. This
> means that virtual block device code can be implemented in exactly one
> place (tapdisk), and that consumers (like blkback and qemu) can just
> use raw read/write requests to talk to that virtual device.
>
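To illustrate what "raw read/write requests" could look like from a consumer's side, here is a minimal sketch in Python. It assumes the tapdisk-backed image shows up as an ordinary block device node; the function names and the 512-byte sector size are assumptions made for this example and are not taken from the patches.

```python
import os

SECTOR_SIZE = 512  # assumed sector size, for illustration only


def read_sectors(path, first_sector, count):
    """Read `count` sectors starting at `first_sector` from a device node or file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # pread takes an explicit offset, so no separate seek is needed.
        # It may return fewer bytes than requested near the end of the device.
        return os.pread(fd, count * SECTOR_SIZE, first_sector * SECTOR_SIZE)
    finally:
        os.close(fd)


def write_sectors(path, first_sector, data):
    """Write sector-aligned `data` starting at `first_sector`."""
    if len(data) % SECTOR_SIZE != 0:
        raise ValueError("data length must be a multiple of the sector size")
    fd = os.open(path, os.O_WRONLY)
    try:
        return os.pwrite(fd, data, first_sector * SECTOR_SIZE)
    finally:
        os.close(fd)
```

The point of the sketch is that the consumer needs no knowledge of the image format: it only sees offsets and bytes, and whatever sits behind the device node does the translation.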
> The new structure is similar to FUSE, but at the block device
> layer.
>
> The ioemu patches take a bunch of the ugliest parts of the old
> blktap implementation -- especially the parts that talk over the
> blktap-internal character device for user/kernel request forwarding --
> and duplicate them in qemu. I'd argue that this is taking things in
> exactly the wrong direction, as it forces a bunch of internal xen-gore
> into qemu, and means that the blktap linux driver and qemu share an
> interface that must be maintained over time.
>
> The new blktap code has a pile of cool stuff: it does smart
> request dependency tracking which is incredibly helpful when
> implementing chained image formats efficiently. It has a request
> merging layer that eliminates an enormous amount of overhead.
> Finally, it has a quite neat interface for specifying block devices as
> connected graphs of block components.
>
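As a rough illustration of why a request merging layer saves overhead, the following Python sketch coalesces requests that cover adjacent sector ranges into a single larger request. This is not the algorithm from the blktap patches; the `Request` type and `merge_requests` function are hypothetical, and the sketch assumes all requests go in the same direction and do not overlap.

```python
from dataclasses import dataclass


@dataclass
class Request:
    sector: int  # first sector of the request
    count: int   # number of sectors covered


def merge_requests(requests):
    """Coalesce requests whose sector ranges are directly adjacent.

    Assumes same-direction, non-overlapping requests (illustration only).
    """
    merged = []
    for req in sorted(requests, key=lambda r: r.sector):
        last = merged[-1] if merged else None
        if last is not None and last.sector + last.count == req.sector:
            # The new request starts exactly where the previous one ends.
            last.count += req.count
        else:
            # Copy so the caller's objects are never modified.
            merged.append(Request(req.sector, req.count))
    return merged
```

For example, two 8-sector requests at sectors 0 and 8 become a single 16-sector request, so the backing store sees one I/O instead of two.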
> The only remaining Xen code on the data path is an optimization to
> preserve zero-copy request forwarding on Xen, and this won't matter in
> qemu environments.
>
> Having isolated tapdisks, and presenting the associated images as
> Linux block devices means that (a) you can use tools like ionice to
> prioritize individual block devices rather than having to set priority
> for *all* of qemu, (b) individual tapdisks can serve block devices for
> multiple VMs -- this is useful if you want to implement a cache for
> many VMs booting from a common image, it's also good for complicated
> distributed block devices like Parallax (which Dutch can tell you more
> about if you are interested) -- again, this is a case where having the
> code in qemu is bad, you want multiple VMs to share the tapdisk-based
> implementation. Finally, (c) tapdisk can be used to directly loopback
> the image into linux, which allows people to more easily work with the
> image's contents. Again, here I don't think you want a whole qemu.
>
> Sorry for the long email. The whole point behind blktap in the
> first place was to make it easy for people to do fun things on the
> block interface. I don't want to see repeated work on this
> front any more than you do -- but I don't think that hoovering the
> code into qemu solves any problems.
Thank you for this explanation, I see now what you're aiming at. I think
at least some parts of it should have gone into the patch comment in the
first place.
Kevin
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel