[Xen-devel] Re: [PATCH] xen: core dom0 support

Nick Piggin wrote:

Those would be pertinent questions if I were suddenly popping up and
saying "hey, let's add Xen support to the kernel!"  But Xen support has
been in the kernel for well over a year now, and is widely used, enabled
in distros, etc.  The patches I'm proposing here are not a whole new
thing, they're part of the last 10% to fill out the kernel's support to
make it actually useful.


As a guest, I guess it has been agreed that guest support for all
different hypervisors is "a good thing". dom0 is more like a piece
of the hypervisor itself, right?

Hm, I wouldn't put it like that. dom0 is no more part of the hypervisor
than the hypervisor is part of dom0. The hypervisor provides one set of
services (domain isolation and multiplexing). Domains with direct
hardware access and drivers provide arbitration for virtualized device
access. They provide orthogonal sets of functionality which are both
required to get a working system.

Also, the machinery needed to allow a kernel to operate as dom0 is more
than that: it allows direct access to hardware in general. An otherwise
unprivileged domU can be given access to a specific PCI device via
PCI-passthrough so that it can drive it directly. This is often used
for direct access to 3D hardware, or high-performance networking (esp
with multi-context hardware that's designed for virtualization use).

Because Xen is dedicated to just running virtual machines, its internal
architecture can be more heavily oriented towards that task, which
affects things from how its scheduler works to its use and multiplexing
of physical memory.  For example, Xen manages to use new hardware
virtualization features pretty quickly, partly because it doesn't need
to trade off against normal kernel functions.  The clear distinction
between the privileged hypervisor and the rest of the domains makes the
security people happy as well.  Also, because Xen is small and fairly
self-contained, there's quite a few hardware vendors shipping it burned
into the firmware so that it really is the first thing to boot (many of
the instant-on features that laptops have are based on Xen).  Both HP and
Dell, at least, are selling servers with Xen pre-installed in the firmware.


That would kind of seem like Xen has a better design to me, OTOH if it
needs this dom0 for most device drivers and things, then how much
difference is it really? Is KVM really disadvantaged by being a part of
the kernel?

Well, you can lump everything together in dom0 if you want, and that is
a common way to run a Xen system. But there's no reason you can't
disaggregate drivers into their own domains, each with the
responsibility for a particular device or set of devices (or indeed, any
other service you want provided). Xen can use hardware features like
VT-d to really enforce the partitioning so that the domains can't
program their hardware to touch anything except what they're allowed to
touch, so nothing is trusted beyond its actual area of responsibility.
It also means that killing off and restarting a driver domain is a
fairly lightweight and straightforward operation because the state is
isolated and self-contained; guests using a device have to be able to
deal with a disconnect/reconnect anyway (for migration), so it doesn't
affect them much. Part of the reason there's a lot of academic interest
in Xen is because it has the architectural flexibility to try out lots
of different configurations.

I wouldn't say that KVM is necessarily disadvantaged by its design; it's
just a particular set of tradeoffs made up-front. It loses Xen's
flexibility, but the result is very familiar to Linux people. A guest
domain just looks like a qemu process that happens to run in a strange
processor mode a lot of the time. The qemu process provides virtual
device access to its domain, and accesses the normal device drivers like
any other usermode process would. The domains are as isolated from each
other as processes normally are, but they're all floating around in the
same kernel; whether that provides enough isolation for whatever
technical, billing, security, compliance/regulatory or other
requirements you have is up to the user to judge.

One important area of paravirtualization is that Xen guests directly
use the processor's pagetables; there is no shadow pagetable or use of
hardware pagetable nesting.  This means that a tlb miss is just a tlb
miss, and happens at full processor performance.  This is possible
because 1) pagetables are always read-only to the guest, and 2) the
guest is responsible for looking up in a table to map guest-local pfns
into machine-wide mfns before installing them in a pte.  Xen will check
that any new mapping or pagetable satisfies all the rules, by checking
that the writable reference count is 0, and that the domain owns (or has
been allowed access to) any mfn it tries to install in a pagetable.
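
The two rules above can be sketched as a toy model.  This is only an
illustration of the checks as described in this mail, not Xen's actual
code or hypercall interface: ToyHypervisor, give_page, pin_pagetable,
install_pte and guest_map are all made-up names, the p2m table is a plain
dict, and the "has been allowed access to" case (pages granted by another
domain) is left out for brevity.

```python
class ToyHypervisor:
    """Toy model of the pagetable checks described above (hypothetical API)."""

    def __init__(self):
        self.owner = {}              # mfn -> id of the domain that owns it
        self.writable_refs = {}      # mfn -> number of writable mappings
        self.pagetable_mfns = set()  # mfns currently in use as pagetables

    def give_page(self, domid, mfn):
        """Hand a machine frame to a domain."""
        self.owner[mfn] = domid
        self.writable_refs[mfn] = 0

    def pin_pagetable(self, domid, mfn):
        """Accept a frame as a pagetable only if nobody can write to it."""
        if self.owner.get(mfn) != domid:
            raise PermissionError("domain does not own mfn %d" % mfn)
        if self.writable_refs[mfn] != 0:
            raise PermissionError("mfn %d still has writable mappings" % mfn)
        self.pagetable_mfns.add(mfn)

    def install_pte(self, domid, target_mfn, writable):
        """Validate a new mapping before it goes into a pagetable."""
        if self.owner.get(target_mfn) != domid:
            raise PermissionError("domain does not own mfn %d" % target_mfn)
        if writable and target_mfn in self.pagetable_mfns:
            raise PermissionError("pagetables are read-only to the guest")
        if writable:
            self.writable_refs[target_mfn] += 1


def guest_map(hv, domid, p2m, pfn, writable):
    """Guest side: translate the guest-local pfn to an mfn, then ask the
    hypervisor to validate and install the mapping."""
    mfn = p2m[pfn]
    hv.install_pte(domid, mfn, writable)
    return mfn
```

In this model a frame that is mapped writable cannot be pinned as a
pagetable, a pinned pagetable can only be mapped read-only, and a domain
cannot map a frame it does not own; each of those attempts raises
PermissionError.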


Xen's memory virtualization is pretty neat, I'll give it that. Is it
faster than KVM on a modern CPU?

It really depends on the workload. There's three cases to consider:
software shadow pagetables, hardware nested pagetables, and Xen direct
pagetables. Even now, Xen's (highly optimised) shadow pagetable code
generally out-performs modern nested pagetables, at least when running
Windows (for which that code was most heavily tuned). Shadow pagetables
and nested pagetables will generally outperform direct pagetables when
the workload does lots of pagetable updates compared to accesses. (I
don't know what the current state of kvm's shadow pagetable performance
is, but it seems OK.)

But if you're mostly accessing the pagetable, direct pagetables still
win. On a tlb miss, it gets 4 memory accesses, whereas a nested
pagetable tlb miss needs 24 memory accesses; and a nested tlb hit means
that you have 24 tlb entries being tied up to service the hit, vs 4.
(Though the chip vendors are fairly secretive about exactly how they
structure their tlbs to deal with nested lookups, so I may be off
here.) (It also depends on whether you arrange to put the guest, host
or both memory into large pages; doing so helps a lot.)
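
For reference, the 4 vs 24 figures fall out of a simple count.  The
sketch below assumes 4-level pagetables for both guest and host, 4k pages
throughout, and no help from any intermediate walk caches; the function
names are just for illustration.

```python
def native_walk_refs(levels=4):
    """Memory references for a pagetable walk on a tlb miss with direct
    (or native) pagetables: one per pagetable level."""
    return levels


def nested_walk_refs(guest_levels=4, host_levels=4):
    """Worst-case memory references for a nested pagetable walk.

    Each guest pagetable level is located by a guest-physical address,
    which needs a full host walk (host_levels references) before the
    guest entry itself can be read (1 more reference).  The final
    guest-physical address of the data then needs one last host walk.
    """
    per_guest_level = host_levels + 1
    return guest_levels * per_guest_level + host_levels


print(native_walk_refs(), nested_walk_refs())
```

With large pages on either side the corresponding walk has fewer levels,
which is why the count (and the cost) drops so much in that case.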

 Would it be possible I wonder to make
a MMU virtualization layer for CPUs without support, using Xen's page
table protection methods, and have KVM use that? Or does that amount
to putting a significant amount of Xen hypervisor into the kernel..?

At one point Avi was considering doing it, but I don't think he ever
made any real effort in that direction. KVM is pretty wedded to having
hardware support anyway, so there's not much point in removing it in
this one area.

The Xen technique gets its performance from collapsing a level of
indirection, but that has a cost in terms of flexibility; the hypervisor
can't do as much mucking around behind the guest's back (for example,
the guest sees real hardware memory addresses in the form of mfns, so
Xen can't move pages around, at least not without some form of explicit
synchronisation).


   J

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
