[Xen-devel] Re: [PATCH] xen: core dom0 support
 
Jeremy Fitzhardinge wrote:
OK, fair point, it's probably time for another Xen architecture refresher 
post.
There are two big architectural differences between Xen and KVM:
 Firstly, Xen has a separate hypervisor whose primary role is to context 
switch between the guest domains (virtual machines).   The hypervisor is 
relatively small and single purpose.  It doesn't, for example, contain 
any device drivers or even much knowledge of things like pci buses and 
their structure.  The domains themselves are more or less peers; some 
are more privileged than others, but from Xen's perspective they are 
more or less equivalent.  The first domain, dom0, is special because it's 
started by Xen itself, and has some inherent initial privileges; its 
main job is to start other domains, and it also typically provides 
virtualized/multiplexed device services to other domains via a 
frontend/backend split driver structure.
 KVM, on the other hand, builds all the hypervisor stuff into the kernel 
itself, so you end up with a kernel which does all the normal kernel 
stuff, and can run virtual machines by making them look like slightly 
strange processes.
 Because Xen is dedicated to just running virtual machines, its internal 
architecture can be more heavily oriented towards that task, which 
affects everything from how its scheduler works to how it uses and multiplexes 
physical memory.  For example, Xen manages to use new hardware 
virtualization features pretty quickly, partly because it doesn't need 
to trade off against normal kernel functions.  The clear distinction 
between the privileged hypervisor and the rest of the domains makes the 
security people happy as well.  Also, because Xen is small and fairly 
self-contained, there are quite a few hardware vendors shipping it burned 
into the firmware so that it really is the first thing to boot (many of the 
instant-on features that laptops have are based on Xen).  Both HP and 
Dell, at least, are selling servers with Xen pre-installed in the firmware.
 
 I think this is a bit misleading.  I think you can understand the true 
differences between Xen and KVM by s/hypervisor/Operating System/. 
Fundamentally, a hypervisor is just an operating system that provides a 
hardware-like interface to its processes.
 Today, the Xen operating system does not have that many features, so it 
requires a special process (domain-0) to drive hardware.  It uses Linux 
for this, and it happens that the Linux domain-0 has full access to all 
system resources, so there is absolutely no isolation between Xen and 
domain-0.  The domain-0 guest is like a Linux userspace process with 
access to an old-style /dev/mem.
 You can argue that in theory, one could build a small, decoupled 
domain-0, but you could also do this, in theory, with Linux and KVM.  It 
is not necessary to have all of your device drivers in your Linux 
kernel.  You could build an initramfs that passed all PCI devices 
through (via VT-d) to a single guest, and then provided an interface to 
allow that guest to create more guests.  This is essentially what dom0 
support is.
 The real difference between KVM and Xen is that Xen is a separate 
Operating System dedicated to virtualization.  In many ways, it's a fork 
of Linux since it uses quite a lot of Linux code.
 The argument for Xen as a separate OS is no different than the argument 
for a dedicated Real Time Operating System, a dedicated OS for embedded 
systems, or a dedicated OS for a very large system.
 Having the distros ship Xen was a really odd thing from a Linux 
perspective.  It's as if Red Hat started shipping VxWorks with a Linux 
emulation layer as Real Time Linux.
 The arguments for dedicated OSes are well-known.  You can do a better 
scheduler for embedded/real-time/large systems.  You can do a better 
memory allocator for embedded/real-time/large systems.  These are the 
arguments that are made for Xen.
 In theory, Xen, the hypervisor, could be merged with upstream Linux, but 
there are certainly no parties interested in that currently.
 My point is not to rail on Xen, but to point out that there isn't really 
a choice to be made here from a Linux perspective.  It's like saying, do 
we really need both FreeBSD and Linux; maybe those FreeBSD guys should 
just merge with Linux.  It's not going to happen.
 KVM turns Linux into a hypervisor by adding virtualization support.  Xen 
is a separate hypervisor.
 So the real discussion shouldn't be whether KVM and Xen should converge, 
because that really doesn't make sense.  It's whether it makes sense for upstream 
Linux to support being a domain-0 guest under the Xen hypervisor.
Regards,
Anthony Liguori
 
 The second big difference is the use of paravirtualization.  Xen can 
securely virtualize a machine without needing any particular hardware 
support.  Xen works well on any post-P6 or any ia64 machine, without 
needing any virtualization hardware support.  When Xen runs a kernel in 
paravirtualized mode, it runs the kernel in an unprivileged processor 
state.  This allows the hypervisor to vet all the guest kernel's 
privileged operations, which are carried out either via hypercalls 
or via memory shared between each guest and Xen.
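 To make the hypercall path concrete, here's a minimal sketch of what a 
pte update looks like from the guest's side.  The types and the stubbed 
hypercall below are simplified stand-ins modelled on Xen's mmu_update 
interface, not the actual kernel code:

#include <stdint.h>

/* Simplified stand-in for Xen's mmu_update request: the machine address
 * of the pte to change, and the value to put there. */
struct mmu_update {
    uint64_t ptr;
    uint64_t val;
};

#define DOMID_SELF 0x7FF0   /* "this domain" in the real interface */

/* In a real PV guest this traps into the hypervisor; stubbed here. */
static int HYPERVISOR_mmu_update(struct mmu_update *req, unsigned count,
                                 unsigned *done, uint16_t domid)
{
    (void)req; (void)domid;
    if (done)
        *done = count;
    return 0;
}

/* The unprivileged guest kernel can't write a live pte directly, so it
 * hands the request to Xen, which validates it before applying it. */
static int set_pte_via_hypercall(uint64_t pte_machine_addr, uint64_t new_pte)
{
    struct mmu_update u = { .ptr = pte_machine_addr, .val = new_pte };
    unsigned done = 0;

    return HYPERVISOR_mmu_update(&u, 1, &done, DOMID_SELF);
}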
 By contrast, KVM relies on at least VT/SVM (and whatever the ia64 equiv 
is called) being available in the CPUs, and needs the most modern of 
hardware to get the best performance.
 One important area of paravirtualization is that Xen guests directly 
use the processor's pagetables; there is no shadow pagetable or use of 
hardware pagetable nesting.  This means that a tlb miss is just a tlb 
miss, and happens at full processor performance.  This is possible 
because 1) pagetables are always read-only to the guest, and 2) the 
guest is responsible for using a lookup table to map guest-local pfns 
into machine-wide mfns before installing them in a pte.  Xen will check 
that any new mapping or pagetable satisfies all the rules, by checking 
that the writable reference count is 0, and that the domain owns (or has 
been allowed access to) any mfn it tries to install in a pagetable.
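 As an illustration of the pfn->mfn step (using simplified assumed names, 
not the kernel's actual implementation):

#include <stdint.h>

#define PAGE_SHIFT      12
#define PTE_FLAGS_MASK  ((1ULL << PAGE_SHIFT) - 1)

/* The pfn->mfn table ("phys-to-machine" map) is provided by Xen at
 * boot; here it is just declared for illustration. */
extern uint64_t phys_to_machine_mapping[];

static uint64_t pfn_to_mfn(uint64_t pfn)
{
    return phys_to_machine_mapping[pfn];
}

/* Build a pte value Xen will accept: the frame number must be a machine
 * frame the domain owns, never a guest-local pfn. */
static uint64_t make_guest_pte(uint64_t pfn, uint64_t flags)
{
    return (pfn_to_mfn(pfn) << PAGE_SHIFT) | (flags & PTE_FLAGS_MASK);
}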
 The other interesting part of paravirtualization is the abstraction of 
interrupts into event channels.  Each domain has a bit-array of 1024 
bits which correspond to 1024 possible event channels.  An event channel 
can have one of several sources, such as a timer virtual interrupt, an 
inter-domain event, an inter-vcpu IPI, or mapped from a hardware 
interrupt.  We end up mapping the event channels back to irqs and they 
are delivered as normal interrupts as far as the rest of the kernel is 
concerned.
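 A rough sketch of the delivery side (the data layout here is a 
simplified assumption, not the real shared_info structure):

#include <stdint.h>

#define NR_EVENT_CHANNELS 1024
#define BITS_PER_WORD     64

/* Pending bits are shared with Xen; one bit per event channel. */
static uint64_t evtchn_pending[NR_EVENT_CHANNELS / BITS_PER_WORD];

/* Guest-side mapping from event channel back to a normal irq number. */
static int evtchn_to_irq[NR_EVENT_CHANNELS];

extern void do_IRQ(int irq);    /* ordinary kernel interrupt entry */

static void process_pending_events(void)
{
    for (unsigned w = 0; w < NR_EVENT_CHANNELS / BITS_PER_WORD; w++) {
        uint64_t pending = __atomic_exchange_n(&evtchn_pending[w], 0,
                                               __ATOMIC_ACQ_REL);
        while (pending) {
            unsigned bit = __builtin_ctzll(pending);
            int irq = evtchn_to_irq[w * BITS_PER_WORD + bit];

            pending &= pending - 1;           /* clear lowest set bit */
            if (irq >= 0)
                do_IRQ(irq);   /* delivered like any other interrupt */
        }
    }
}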
 The net result is that a paravirtualized Xen guest runs very close to 
full speed.  Workloads which modify live pagetables a lot take a bit of 
a performance hit (since the pte updates have to trap to the hypervisor 
for validation), but in general this is not a huge deal.  Hardware 
support for nested pagetables is only just beginning to get close to 
getting performance parity, but with different tradeoffs (pagetable 
updates are cheap, but tlb misses are much more expensive, and hits 
consume more tlb entries).
 Xen can also make full use of whatever hardware virtualization features 
are available when running an "hvm" domain.  This is typically how you'd 
run Windows or other unmodified operating systems.
 All of this is stuff that's necessary to support any PV Xen domain, and 
has been in the kernel for a long time now.
 The additions I'm proposing now are those needed for a Xen domain to 
control the physical hardware, in order to provide virtual device 
support for other less-privileged domains.  These changes affect a few 
areas:
   * interrupts: mapping a device interrupt into an event channel for
     delivery to the domain with the device driver for that interrupt
   * mappings: allowing direct hardware mapping of device memory into a
     domain
   * dma: making sure that hardware gets programmed with machine memory
     addresses, not virtual ones, and that pages are machine-contiguous
     when expected
Interrupts require a few hooks into the x86 APIC code, but the end 
result is that hardware interrupts are delivered via event channels and 
then mapped back to irqs and delivered normally (they even end 
up with the same irq number as they'd usually have).
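 The setup side looks roughly like this; the request structure and 
hypercall below are simplified placeholders for Xen's bind_pirq 
event-channel operation, so treat the names as assumptions:

#include <stdint.h>

/* Simplified placeholder for the real bind_pirq request. */
struct bind_pirq_request {
    uint32_t pirq;   /* physical interrupt to route */
    uint32_t port;   /* event channel chosen by Xen (returned) */
};

/* Stub standing in for the event-channel hypercall. */
static int hypercall_bind_pirq(struct bind_pirq_request *req)
{
    req->port = req->pirq;   /* placeholder: Xen picks the real port */
    return 0;
}

extern int evtchn_to_irq[];  /* channel -> irq table from the earlier sketch */

/* Route a hardware interrupt through an event channel, then record the
 * mapping so delivery lands on the irq number the device would
 * normally have had. */
static int bind_hardware_irq(int irq, uint32_t pirq)
{
    struct bind_pirq_request req = { .pirq = pirq };
    int err = hypercall_bind_pirq(&req);

    if (err)
        return err;
    evtchn_to_irq[req.port] = irq;
    return 0;
}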
 Device mappings are fairly easy to arrange.  I'm using a software pte 
bit, _PAGE_IOMAP, to indicate that a mapping is a device mapping.  This 
bit is set by things like ioremap() and remap_pfn_range, and the Xen mmu 
code just uses the pfn in the pte as-is, rather than doing the normal 
pfn->mfn translation.
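 In outline (the bit position and helper names are illustrative 
assumptions, not the actual implementation):

#include <stdint.h>

#define PAGE_SHIFT   12
#define _PAGE_IOMAP  (1ULL << 10)   /* illustrative software pte bit */

extern uint64_t pfn_to_mfn(uint64_t pfn);   /* as in the earlier sketch */

/* Convert a guest pte to the value Xen should install.  For device
 * mappings the frame number is already a machine frame, so the usual
 * pfn->mfn translation is skipped. */
static uint64_t pte_to_machine(uint64_t pte)
{
    uint64_t frame = pte >> PAGE_SHIFT;
    uint64_t flags = pte & ((1ULL << PAGE_SHIFT) - 1);

    if (flags & _PAGE_IOMAP)
        return (frame << PAGE_SHIFT) | flags;          /* already an mfn */

    return (pfn_to_mfn(frame) << PAGE_SHIFT) | flags;  /* translate pfn */
}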
 DMA is handled via the normal DMA API, with some hooks to swiotlb to 
make sure that the memory underlying its pools is really DMA-ready (ie, 
is contiguous and low enough in machine memory).
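 The core of the dma concern can be sketched like this (helper names are 
assumptions for illustration):

#include <stdint.h>

#define PAGE_SHIFT 12
typedef uint64_t dma_addr_t;

extern uint64_t pfn_to_mfn(uint64_t pfn);        /* as above */
extern uint64_t virt_to_pfn(const void *vaddr);  /* assumed helper */

/* The device sees machine memory, so the address it is programmed with
 * must be mfn-based, never the guest-local pfn-based one. */
static dma_addr_t virt_to_machine_dma(const void *vaddr)
{
    uint64_t pfn    = virt_to_pfn(vaddr);
    uint64_t offset = (uintptr_t)vaddr & ((1ULL << PAGE_SHIFT) - 1);

    return (pfn_to_mfn(pfn) << PAGE_SHIFT) | offset;
}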
 The changes I'm proposing may look a bit strange from a purely x86 
perspective, but they fit in relatively well because they're not all 
that different from what other architectures require, and so the 
kernel-wide infrastructure is mostly already in place.
 I hope that helps clarify what I'm trying to do here, and why Xen and 
KVM do have distinct roles to play.
   J
 
 
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
 