After many delays and false starts, here's a preliminary netchannel2
implementation:
http://xenbits.xensource.com/ext/netchannel2/xen-unstable.hg
http://xenbits.xensource.com/ext/netchannel2/linux-2.6.18.hg
These trees are up-to-date with the mainline trees of the same name as
of a couple of days ago.
Here are the main features of the current scheme:
-- Obviously, it implements a fully functional network interface.
LRO, TSO, and checksum offload are all supported. Hot-add and
hot-remove work as expected.
-- The copy-to-receiver-buffers is now performed in the receiving
domain, rather than in dom0. This helps to prevent dom0 from
becoming a bottleneck, and also has some cache locality advantages.
-- Inter-domain traffic can be configured to bypass dom0 completely.
Once a bypass is established, the domains communicate on their own
private ring, without indirecting via dom0. This significantly
increases inter-domain bandwidth, reduces latency, and reduces dom0
load.
(This is currently somewhat rough around the edges, and each bypass
needs to be configured manually. It'll (hopefully) eventually be
automatic, but that hasn't been implemented yet.)
-- A new, and hopefully far more extensible, ring protocol, supporting
variable size messages, multi-page rings, and out-of-order message
return. This is intended to make VMDQ support straightforward,
although that hasn't been implemented yet.
-- Packet headers are sent in-line in the ring, rather than
out-of-band in fragment descriptors. Small packets (e.g. TCP ACKs)
are sent entirely in-line.
-- There's an asymmetry limiter, intended to protect dom0 against
denial of service attacks by malicious domUs.
-- Sub-page grant support. The grant table interface is extended so a
domain can grant another domain access to a range of bytes within a
page, and Xen will then prevent the grantee domain accessing
outside that range. For obvious reasons, it isn't possible to map
these grant references, and domains are expected to use the grant
copy hypercalls instead.
-- Transitive grant support. It's now possible for a domain to create
a grant reference which indirects to another grant reference, so
that any attempt to access the first grant reference will be
redirected to the second one. This is used to implement
receiver-side copy on inter-domain traffic: rather than copying the
packet in dom0, dom0 creates a transitive grant referencing the
original transmit buffer, and passes that to the receiving domain.
For implementation reasons, only a single level of transitive
granting is supported, and transitive grants cannot be mapped
(i.e. they can only be used in grant copy operations). Multi-level
transitive grants could be added pretty much as soon as anybody
needs them, but mapping transitive grants would be more tricky.
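
To make the sub-page and transitive grant semantics above a bit more
concrete, here is a toy Python model. It is only a sketch of the rules
described in this mail, not the Xen interface: the names (Grant,
ToyGrantTables, copy_from), the use of page-relative offsets, and the
PermissionError failures are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

PAGE_SIZE = 4096


@dataclass
class Grant:
    """One toy grant entry; field names are illustrative, not Xen's."""
    grantee: int                             # domain allowed to use the grant
    offset: int = 0                          # first granted byte in the page
    length: int = PAGE_SIZE                  # number of granted bytes
    page: Optional[bytearray] = None         # backing page (ordinary grant)
    trans: Optional[Tuple[int, int]] = None  # (domid, gref) if transitive


class ToyGrantTables:
    """Per-domain grant tables, keyed by granting domain, then by gref."""

    def __init__(self) -> None:
        self.tables: Dict[int, Dict[int, Grant]] = {}

    def grant(self, granter: int, gref: int, entry: Grant) -> None:
        self.tables.setdefault(granter, {})[gref] = entry

    def copy_from(self, caller: int, granter: int, gref: int,
                  offset: int, length: int) -> bytes:
        """Model a grant copy of page bytes [offset, offset + length)."""
        entry = self.tables[granter][gref]
        if entry.grantee != caller:
            raise PermissionError("grant is not for this domain")
        # Sub-page check: the access must stay inside the granted range.
        if (length < 0 or offset < entry.offset
                or offset + length > entry.offset + entry.length):
            raise PermissionError("access outside the granted byte range")
        if entry.trans is None:
            return bytes(entry.page[offset:offset + length])
        # Transitive grant: redirect to the target grant, with the
        # intermediate domain (the granter) acting as the caller.
        target_dom, target_ref = entry.trans
        if self.tables[target_dom][target_ref].trans is not None:
            raise PermissionError("only one level of transitive grant")
        return self.copy_from(granter, target_dom, target_ref, offset, length)


# Example: domain 1 transmits a 4-byte "packet" held at bytes 100..103
# of one of its pages, and domain 2 receives it via dom0 (domain 0).
tables = ToyGrantTables()
tx_page = bytearray(PAGE_SIZE)
tx_page[100:104] = b"data"

# The sender grants dom0 access to just those four bytes.
tables.grant(1, 5, Grant(grantee=0, offset=100, length=4, page=tx_page))
# dom0 does not copy; it hands the receiver a transitive grant instead.
tables.grant(0, 9, Grant(grantee=2, offset=100, length=4, trans=(1, 5)))

# The receiver performs the copy itself, through dom0's grant.
packet = tables.copy_from(caller=2, granter=0, gref=9, offset=100, length=4)
```

In this model the receiving domain only ever does copies, which matches
the restriction that neither sub-page nor transitive grants can be
mapped. A copy outside bytes 100..103, or a transitive grant pointing at
another transitive grant, is refused.
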
It does still have a few rough edges:
-- Suspend/resume and migration don't work with dom0 bypass.
-- Ignoring the bypass support, performance isn't that much better
than netchannel1 for many tests. Dom0 CPU load is usually lower,
so it should scale better when you have many NICs, but in terms of
raw throughput there's not much in it either way. Earlier versions
were marginally ahead, but there seems to have been a bit of a
regression while I was bringing it up to date with current
Xen/Linux.
-- The hotplug scripts and tool integration aren't nearly as complete
as their netchannel1 equivalents. It's not clear to me how much of
the netchannel1 stuff actually gets used, though, so I'm going to
leave this as-is unless somebody complains.
-- The code quality needs some attention. It's been hacked around by
a number of people over the course of several months, and generally
has a bit less conceptual integrity than I'd like in new code.
(It's not horrific, by any means, but it is a bit harder to follow
than the old netfront/netback drivers were.)
-- There's no unmodified-drivers support, so you won't be able to use
it in HVM domains. Adding support is unlikely to be terribly
difficult, with the possible exception of the dom0 bypass
functionality, but I've not looked at it at all yet.
If you want to try this out, you'll need to rebuild Xen, the dom0
kernel, and the domU kernels, in addition to building the module.
You'll also need to install xend and the userspace tools from the
netchannel2 xen-unstable repository. To create an interface, either
use the ``xm network2-attach'' command or specify a vif2= list in your
xm config file.
The current implementation is broadly functional, in that it doesn't
have any known crippling bugs, but it hasn't had a great deal of testing.
It should work, for the most part, but it certainly isn't ready for
production use. If you find any problems, please report them.
Patches would be even better. :)
A couple of people have asked about using the basic ring protocol in
other PV device classes (e.g. pvSCSI, pvUSB). I'll follow up in a
second with a summary of how all that works.
Steven.