On Mon, 2006-08-07 at 12:18 -0400, Michael LeMay wrote:
> This is precisely the sort of problem that Keir's proposal seems to
> address. I've copied my comments on the proposal below; perhaps we can
> discuss them further now since nobody was interested when I originally
> posted them. :-)
>
> ---
>
> Here's another general comment for discussion...
>
> The bottom of page 18 in the Xen Roadmap proposal recommends considering
> how to "export byte stream
> (TCP) data between domains in a high performance fashion." For
> communications that occur between domains on a single physical machine,
> it would seem logical to set up a new address and protocol family within
> Linux that could be used to create and manipulate stream sockets via the
> standard interfaces (I'm focusing on Linux at this point, although
> similar adaptations could be made to other kernels). Then, behind the
> scenes, the Xen grant tables could be used to efficiently transfer
> socket buffers between the domains. This should involve much less
> overhead than directly connecting two network frontends or performing
> other optimizations at lower layers, since it would truncate the
> protocol stack and avoid unnecessary TCP-style flow control protocols.
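
just to make that paragraph concrete, a rough userspace sketch of what
such a family could look like. PF_XEN, AF_XEN, struct sockaddr_xen and
the family number below are made up for illustration; none of this
exists in linux today, and a real interface might well look different:

/* hypothetical sketch: PF_XEN / AF_XEN / struct sockaddr_xen do not
 * exist; names, numbers and layout are illustrative only. */
#include <sys/socket.h>
#include <stdint.h>
#include <stdio.h>

#define AF_XEN 0xF00            /* placeholder family number */
#define PF_XEN AF_XEN

struct sockaddr_xen {
    sa_family_t sxen_family;    /* AF_XEN */
    uint16_t    sxen_port;      /* tcp-style port (see addressing below) */
    uint32_t    sxen_domid;     /* target domain id */
};

int main(void)
{
    struct sockaddr_xen peer = {
        .sxen_family = AF_XEN,
        .sxen_port   = 8005,    /* e.g. a xend-like management service */
        .sxen_domid  = 0,       /* talk to dom0 */
    };

    /* SOCK_STREAM over the new family; behind the scenes the data would
     * move via grant tables instead of through the tcp/ip stack. */
    int fd = socket(PF_XEN, SOCK_STREAM, 0);
    if (fd < 0 || connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0) {
        perror("xen stream socket");
        return 1;
    }
    /* ... read()/write() as on any other stream socket ... */
    return 0;
}
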
>
> An enhancement such as this could help to eliminate the network
> dependence of some Xen management applications, particularly those that
> rely on XML-RPC to communicate. For example, xm currently uses a UNIX
> domain socket to communicate with Xend, which introduces an artificial
> requirement that xend and xm be running in the same domain. Once XenSE
> gains traction and management utilities are scattered across multiple
> domains, UNIX domain sockets will no longer be adequate. Under this
> scheme, stream sockets to specific domains could easily be constructed,
> without regard for the network configuration on the system.
>
> One important detail that I haven't yet resolved is how to address
> inter-domain sockets. Of course, the most important component in the
> address for each socket would be the domain ID. However, some sort of
> port specification or pathname would also be necessary. I'm not sure
> which of those options would be appropriate in this case. Port numbers
> would be consistent with TCP and would probably ease the task of porting
> applications based on TCP, but pathnames are more consistent with the
> UNIX domain sockets used by xm and xend. Perhaps we could provide both,
> using two address families associated with the same protocol family?
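
one way to picture the 'two address families, one protocol family'
idea; again purely hypothetical names and layouts, only meant to show
how a port-style and a pathname-style variant could sit side by side:

#include <sys/socket.h>
#include <stdint.h>

/* tcp-style addressing: (domid, port) */
struct sockaddr_xen_port {
    sa_family_t family;         /* hypothetical AF_XEN_PORT */
    uint32_t    domid;          /* target domain id */
    uint16_t    port;           /* numeric port, as in tcp */
};

/* unix-domain-style addressing: (domid, pathname) */
struct sockaddr_xen_path {
    sa_family_t family;         /* hypothetical AF_XEN_PATH */
    uint32_t    domid;          /* target domain id */
    char        path[108];      /* same length convention as sockaddr_un */
};
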
>
> What other ideas have been floating around on how to accomplish
> byte-stream transport between domains? Are any concrete efforts to
> provide this functionality currently underway? Thanks!
hi all.
since you've explicitly asked for comments, here's mine.
from a performance point of view, it is all obviously correct. get rid
of the tcp congestion/flow/reliability overhead. in a synchronous,
reliable environment like host-local inter-domain communication, as you
propose, it is nothing but overhead, and dropping it should speed
things up a lot. plus it saves a whole bunch of memory.
but there's a different point of view, which i would like to point out.
if you think about the whole 'virtualization' thing, some of the
relevant literature is correct to point out that a simple unix process
is nothing but a virtual machine. a 'process vm', in many respects
quite different from a system vm on top of a hypervisor like xen, but
one which already has a number of the features that make up a virtual
machine, resource control and abstraction being the most prominent
ones. such comparisons are especially daunting if you look at a
paravirtualizing, microkernel-style hypervisor design like xen.
so, if operating systems and hypervisors are already so similar,
where's the merit? one of the major features which makes many system
VMMs, carrying whole systems, so interesting and different from a
unix-style operating system, carrying a number of simple processes, is
proper isolation. 'isolation' here means separation of an entity, here
the guest os instance, from its environment. currently, there are only
a few communication primitives connecting a guest to its supporting
environment. apart from vcpu state, it's block I/O, network I/O, and
memory acquisition. a small number of interfaces, each of them at a
sufficiently abstract level to enable one of the most distinguishing
features (compared with conventional os-level processing models) a
system vm has to offer: migratability. if the communication primitives
remain simple, and more importantly, location-independent enough, you
can just freeze the whole thing in one place, move it around, and thaw
the processing state at whatever different location you see fit. for
xen as of today, the complexity of implementing that varies somewhere
between 'trivial' and 'easy enough'.
now try the same thing with your regular unix process. let's see what we
need to carry: system v ipc. shared memory. ip sockets, ok, but then
unix domain sockets, netlink sockets. pipes. named pipes. device special
files. for starters, just migrate open files terminating in block
storage. then try maintaining the original process identifiers; your
application may have inquired about them, so computational state
depends on their consistency as well. save all that, migrate, now try
to restore elsewhere. the bottom line is: unix processes are anything
but isolated. for good reason, a lot of useful applications depend on
inter-process communication. but that lack of isolation has its cost.
what the proposal above means is basically the addition of dedicated
ipc to the domain model. good for performance, but also a good step
towards breaking isolation. dom3 may call connect(socket(PF_XEN),
"dom7") in the future. does that mean that if i move dom3 to a backup
node, dom7 has to move as well? no, not desirable; ok, then let's
write a proxy service redirecting the once-so-efficient channel back
over tcp to maintain transparency. that's just a few additional lines
of code. then don't forget the few additional lines telling the domain
controller to automatically reroute that proxy as well, in case either
domain needs to migrate again at a later point, because the machine
maintainer clearly doesn't want to have to care.
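
to show what that proxy amounts to, here is its core in a few lines:
one fd accepted on the (hypothetical) local fast channel, one tcp
connection to wherever the peer now lives, and a loop shovelling bytes
between them. accept() and the outgoing tcp connect() are left out;
this is only meant to illustrate the extra machinery that suddenly has
to exist and follow the domains around:

#include <sys/select.h>
#include <unistd.h>

static void relay(int local_fd, int tcp_fd)
{
    char buf[4096];
    for (;;) {
        fd_set rd;
        FD_ZERO(&rd);
        FD_SET(local_fd, &rd);
        FD_SET(tcp_fd, &rd);
        int maxfd = local_fd > tcp_fd ? local_fd : tcp_fd;
        if (select(maxfd + 1, &rd, NULL, NULL, NULL) <= 0)
            return;
        int from = FD_ISSET(local_fd, &rd) ? local_fd : tcp_fd;
        int to   = (from == local_fd) ? tcp_fd : local_fd;
        ssize_t n = read(from, buf, sizeof(buf));
        if (n <= 0 || write(to, buf, (size_t)n) != n)
            return;             /* either end closed: tear the tunnel down */
    }
}
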
yes, it's all still 'virtually' possible. after all, computers are
state machines, and state can always be captured. it just turns out to
be a whole lot of work if the number of state machines connecting your
vm to its environment keeps morphing and multiplying. operating
systems are a moving target, and will probably always stay that way.
that's why hypervisors make sense, as long as they stay simple and
reasonably nonintrusive to the guest. that is also why os-level
virtualization like openvz may be doomed: if they don't make it into
the stock kernel, so that others help to maintain their code, they
will keep maintaining it on a pretty regular basis, forever. xen
carries that burden as well, due to paravirtualization, but not as
much as the vmm/os-integrated approach. i even suggest a term for this
class of proposal: "overparavirtualization", the point where you have
modified the guest so deeply that you need someone else to maintain
the patches.
apart from that, i'm all for performance.
there's a compromise: add those features, but take good care to
separate them from the isolated, network-transparent, preferably
IP-based, regular guest state, and never make such a thing a
dependency of anything. most users of vm technology are better off
rejecting it if they wish to keep the features that distinguish the
result from a standard operating system environment.
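
a minimal sketch of what 'never a dependency' could mean in practice,
reusing the made-up AF_XEN / PF_XEN / struct sockaddr_xen names from
the sketch further up: the application tries the fast local family
first and silently falls back to plain tcp, so a migration of either
domain never breaks it.

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <stdint.h>

static int connect_peer(uint32_t domid, const char *ip, uint16_t port)
{
    /* optional fast path, only meaningful while both domains share a host */
    int fd = socket(PF_XEN, SOCK_STREAM, 0);
    if (fd >= 0) {
        struct sockaddr_xen sx = { .sxen_family = AF_XEN,
                                   .sxen_port   = port,
                                   .sxen_domid  = domid };
        if (connect(fd, (struct sockaddr *)&sx, sizeof(sx)) == 0)
            return fd;
        close(fd);
    }

    /* the regular, migration-safe path over ip */
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_in sin = { .sin_family = AF_INET,
                               .sin_port   = htons(port) };
    if (inet_pton(AF_INET, ip, &sin.sin_addr) != 1 ||
        connect(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
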
make it absolutely clear to users that if they wish to configure fast
host-local inter-domain communication, they get exactly what they
asked for: fast, but host-local and domain-interdependent
communication. if your customer asks for lightweight, optimized
inter-domain communication, ask her whether that specific application
would not rather call for regular inter-process communication on a
standard operating system, because that is effectively what they will
get.
kind regards,
daniel
--
Daniel Stodden
LRR - Lehrstuhl für Rechnertechnik und Rechnerorganisation
Institut für Informatik der TU München D-85748 Garching
http://www.lrr.in.tum.de/~stodden mailto:stodden@xxxxxxxxxx
PGP Fingerprint: F5A4 1575 4C56 E26A 0B33 3D80 457E 82AE B0D8 735B