xen-devel
Re: [Xen-devel] architecture for backend domains
>I'm going through all of the latest Xen 2.0 documentation, and I had a
>couple of questions:
>
>o it seems from the docs that it's possible to assign io privileges and
>administrative privileges to *any* domain (apart from dom0, which has
>these privileges built in IIRC). is this correct?
Sort of.
There are two 'capabilities' a domain may have: full administrative
privilege (DF_PRIVILEGED) and 'physical device' privilege (DF_PHYSDEV).
DF_PRIVILEGED allows full access to all hypervisor operations (i.e. to
create, inspect or destroy other domains) plus access to all memory,
including all PCI space. DF_PHYSDEV allows more restricted access.
The intention is that DF_PRIVILEGED is given to only dom0 or, perhaps,
a dom0-replacement (for doing live upgrades of dom0) although we've
never done this. DF_PHYSDEV on the other hand is intended for any
backend domain.
In the current implementation, however:
- dom0 is given DF_PRIVILEGED as expected when creating the first domain
- there is no hypercall whose purpose is to add DF_PRIVILEGED to a
new domain (nor can one specify this in create domain) -- as such
doing the live upgrade thing is not really cleanly possible
- the privileged pcidev_access hypercall (used to configure a backend
domain) actually sets both DF_PRIVILEGED and DF_PHYSDEV in the
backend domain.
This is a temporary measure; the architectural intention is that
DF_PHYSDEV alone will suffice for backend domains, and that playing
around with DF_PRIVILEGED will be handled in a cleaner fashion.
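
To make the difference between the current and the intended behaviour
concrete, here is a minimal sketch in C. It is not the Xen source: the
bit values, struct domain_sketch and the two grant_backend_* helpers are
made-up names for illustration, and only the DF_PRIVILEGED / DF_PHYSDEV
names come from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; the real Xen definitions may differ. */
#define DF_PHYSDEV    (1u << 0)  /* 'physical device' privilege */
#define DF_PRIVILEGED (1u << 1)  /* full administrative privilege */

/* Hypothetical stand-in for the hypervisor's per-domain state. */
struct domain_sketch {
    uint32_t id;
    uint32_t flags;
};

static bool is_privileged(const struct domain_sketch *d)
{
    return (d->flags & DF_PRIVILEGED) != 0;
}

static bool has_physdev(const struct domain_sketch *d)
{
    return (d->flags & DF_PHYSDEV) != 0;
}

/* Current behaviour as described above: configuring a backend domain
 * ends up setting both flags on it. Only a privileged caller may do so. */
static int grant_backend_current(const struct domain_sketch *caller,
                                 struct domain_sketch *target)
{
    assert(caller != 0 && target != 0);
    if (!is_privileged(caller))
        return -1;
    target->flags |= DF_PRIVILEGED | DF_PHYSDEV;
    return 0;
}

/* Architectural intention: DF_PHYSDEV alone suffices for a backend. */
static int grant_backend_intended(const struct domain_sketch *caller,
                                  struct domain_sketch *target)
{
    assert(caller != 0 && target != 0);
    if (!is_privileged(caller))
        return -1;
    target->flags |= DF_PHYSDEV;
    return 0;
}
```

With the intended version a backend domain can drive its hardware but
fails the is_privileged() check, so it could not in turn configure
further domains.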
>o can there be multiple backend domains for a single physical device
>(like a network interface)? if so, then there is scheduling involved
>at multiple levels -- first Xen will have to schedule backends across
>the physdev, and then each backend will have to schedule across the
>domains that use it as backend. Further, what mechanism does Xen use
>to determine which backend to direct pkts to and from the backend
>which client domain to forward them to?
There's nothing to stop someone from configuring the system to have
two backend domains with access to the same physical device.
However trying to run two copies of a device driver against a single
physical device will lead to tears -- device drivers tend to assume
they're the only ones driving the hardware, and so things will likely
get completely moulinexed.
This is not a 'scheduling' issue; there's no way to get two device
drivers to share a device without (a) hacking the crap out of the
device drivers and (b) inserting a whole bunch of synchronization
and communication between the drivers.
>o if there is just one backend, how exactly does access to the devices
>take place? From the docs, I gather that each domain using the device
>has 2 rings -- one for sends and one for receives (very generally
>speaking). Also, the docs say that the backend can directly map
>buffers of the virtual domains in Xen to enable DMA to them. But at
>other places in the docs, I got the impression that client domains
>(and not just backends) have these descriptor rings as well. So
>basically I'm asking if all communication happens through the backend,
>or do client domains talk directly to Xen.
The "2 rings" referred to are basically an inter-domain communication
mechanism -- that is, they allow the transfer of information between
e.g. a client domain and a backend domain. Some of the confusion may
arise from the fact that we refer to the part of the client domain
that does this as "the frontend device driver" and the part of the
backend domain that does this as "the backend device driver". However
both of these are *virtual* device drivers and don't actually speak
to physical devices at all. They are just two ends of a communication
mechanism which allows e.g. a client to request "read block 1000 from
device sda3 into a buffer at 0x5ca000".
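
As a rough illustration of such a ring, here is a single-threaded
sketch in C of a request ring with a producer index (frontend) and a
consumer index (backend). The layout, field names and RING_SIZE are
assumptions made for the example, not the actual Xen ring format; a real
shared-memory ring would also need memory barriers and some way of
notifying the other domain, both of which are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u  /* power of two, so indices wrap with a mask */

/* Hypothetical request message, modelled on the example
 * "read block 1000 from device sda3 into a buffer at 0x5ca000". */
struct blk_request {
    uint64_t id;           /* lets the frontend match up the response */
    char     device[8];    /* e.g. "sda3" */
    uint64_t block;        /* e.g. 1000 */
    uint64_t buffer_addr;  /* e.g. 0x5ca000 */
};

struct request_ring {
    struct blk_request slot[RING_SIZE];
    uint32_t prod;  /* advanced only by the frontend */
    uint32_t cons;  /* advanced only by the backend */
};

/* Frontend side: queue a request; returns false if the ring is full. */
static bool ring_put(struct request_ring *r, const struct blk_request *req)
{
    assert(r != 0 && req != 0);
    if (r->prod - r->cons == RING_SIZE)
        return false;
    r->slot[r->prod & (RING_SIZE - 1u)] = *req;
    r->prod++;
    return true;
}

/* Backend side: take the oldest request; returns false if the ring is empty. */
static bool ring_get(struct request_ring *r, struct blk_request *out)
{
    assert(r != 0 && out != 0);
    if (r->prod == r->cons)
        return false;
    *out = r->slot[r->cons & (RING_SIZE - 1u)];
    r->cons++;
    return true;
}
```

A second ring of the same shape, running in the opposite direction,
would carry the responses back from the backend to the frontend.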
The actual hardware is accessed by regular device drivers running
in the backend domain -- today that means any linux 2.4.27 or
linux 2.6.8.1 device drivers. These access the hardware in almost
exactly the same way as they do normally - via memory mapping bits
of the PCI address space, reading and writing to that, and receiving
interrupts. Some small modifications are required in xenlinux to
ensure this is done in a safe way (i.e. indirecting through xen
for privilege checks etc), but otherwise the driver is the same.
Overall then if we consider e.g. a process reading a file in a
client (non backend) domain the control flow is:
1. client process does read() syscall -> client kernel
2. client kernel (VFS layer) invokes frontend driver for actual access
3. frontend driver uses I/O rings to send a message to backend domain
4. backend driver receives message on I/O rings
5. backend driver forwards the request to the 'real' physical device
driver which in turn forwards the request to the actual device
6. -- time passes --
7. device returns data to real device driver
8. real device driver returns data to backend driver
9. backend driver puts response onto the I/O rings
10. frontend driver receives response and passes up to VFS layer
11. data returned to client process
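
The layering in the list above can be sketched in C as a chain of
calls. This is a toy model under stated assumptions: every function and
type name is invented for the example, a plain function call stands in
for the I/O rings, and an in-memory array (fake_disk) stands in for the
physical device, so nothing here is asynchronous.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16u
#define NUM_BLOCKS 4u

/* Stands in for the physical device. */
static uint8_t fake_disk[NUM_BLOCKS][BLOCK_SIZE];

struct io_request  { uint32_t block; uint8_t *buffer; };
struct io_response { int status; };

/* The 'real' device driver in the backend domain: talks to the device
 * and hands the data back (steps 5 to 8). */
static int real_driver_read(uint32_t block, uint8_t *buffer)
{
    if (block >= NUM_BLOCKS)
        return -1;
    memcpy(buffer, fake_disk[block], BLOCK_SIZE);
    return 0;
}

/* Backend driver: receives the message, forwards it to the real
 * driver, and produces the response (steps 4 and 9). */
static struct io_response backend_handle(const struct io_request *req)
{
    struct io_response rsp;
    rsp.status = real_driver_read(req->block, req->buffer);
    return rsp;
}

/* Frontend driver: sends the request and receives the response.
 * The direct call here is where the I/O rings would sit (step 3 and
 * the receive step near the end). */
static int frontend_read(uint32_t block, uint8_t *buffer)
{
    struct io_request req = { block, buffer };
    struct io_response rsp = backend_handle(&req);
    return rsp.status;
}

/* Client kernel: the read() path ends up in the frontend driver
 * (steps 1 and 2), and the result goes back to the process (last step). */
static int client_read(uint32_t block, uint8_t *user_buf)
{
    assert(user_buf != NULL);
    return frontend_read(block, user_buf);
}
```

Note that only real_driver_read() touches the device; everything above
it just passes messages along.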
Hope this makes things a little clearer.
We're working on updating the documentation for 2.0, but it'll
likely be an ongoing process.
cheers,
S.