xen-devel

Re: [Xen-devel] architecture for backend domains

To: Diwaker Gupta <diwakergupta@xxxxxxxxx>
Subject: Re: [Xen-devel] architecture for backend domains
From: Steven Hand <Steven.Hand@xxxxxxxxxxxx>
Date: Sun, 24 Oct 2004 09:25:32 +0100
Cc: xen-devel@xxxxxxxxxxxxxxxxxxxxx, Steven.Hand@xxxxxxxxxxxx
Delivery-date: Sun, 24 Oct 2004 09:27:35 +0100
Envelope-to: Steven.Hand@xxxxxxxxxxxx
In-reply-to: Message from Diwaker Gupta <diwakergupta@xxxxxxxxx> of "Sat, 23 Oct 2004 12:30:22 PDT." <1b0b455704102312306f268bff@xxxxxxxxxxxxxx>
>I'm going through all of the latest Xen 2.0 documentation, and I had a
>couple of questions:
>
>o it seems from the docs that its possible to assign io privileges and
>administrative privileges to *any* domain (apart from dom0, which has
>these privileges built in IIRC). is this correct?

Sort of. 

There are two 'capabilities' a domain may have: full administrative 
privilege (DF_PRIVILEGED) and 'physical device' privilege (DF_PHYSDEV).  
DF_PRIVILEGED allows full access to all hypervisor operations (i.e. to 
create, inspect or destroy other domains, and to access all memory
including all PCI space). DF_PHYSDEV allows more restricted access. 

The intention is that DF_PRIVILEGED is given to only dom0 or, perhaps, 
a dom0-replacement (for doing live upgrades of dom0) although we've 
never done this. DF_PHYSDEV on the other hand is intended for any 
backend domain. 

In the current implementation, however: 

  - dom0 is given DF_PRIVILEGED as expected when creating the first domain 

  - there is no hypercall whose purpose is to add DF_PRIVILEGED to a 
    new domain (nor can one specify this in create domain) -- as such 
    doing the live upgrade thing is not really cleanly possible 

  - the privileged pcidev_access hypercall (used to configure a backend 
    domain) actually sets both DF_PRIVILEGED and DF_PHYSDEV in the 
    backend domain. 


This is a temporary measure; the architectural intention is that 
DF_PHYSDEV alone will suffice for backend domains, and that playing
around with DF_PRIVILEGED will be handled in a cleaner fashion. 
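
To make the difference between the intended and the current behaviour
concrete, here is a minimal sketch. It is not Xen code: the numeric flag
values, the helper function names and the rule that DF_PRIVILEGED implies
device access are assumptions made for illustration. Only the names
DF_PRIVILEGED and DF_PHYSDEV come from the description above.

```python
from enum import Flag


class DomainFlags(Flag):
    # Illustrative bit values only; not the values used inside Xen.
    NONE = 0
    DF_PRIVILEGED = 1
    DF_PHYSDEV = 2


def intended_backend_flags():
    """Architectural intention: DF_PHYSDEV alone suffices for a backend."""
    return DomainFlags.DF_PHYSDEV


def current_backend_flags():
    """Current behaviour as described: the backend gets both flags."""
    return DomainFlags.DF_PRIVILEGED | DomainFlags.DF_PHYSDEV


def may_manage_domains(flags):
    """Creating, inspecting or destroying domains needs DF_PRIVILEGED."""
    return bool(flags & DomainFlags.DF_PRIVILEGED)


def may_access_physical_devices(flags):
    """Assumed here: either flag is enough to reach physical devices."""
    return bool(flags & (DomainFlags.DF_PRIVILEGED | DomainFlags.DF_PHYSDEV))
```

Under these assumptions a backend configured today could also manage other
domains, whereas a backend holding only DF_PHYSDEV could not.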


>o can there be multiple backend domains for a single physical device
>(like a network interface)? if so, then there is a scheduling involved
>at multiple levels -- first Xen will have to schedule backends across
>the physdev, and then each backend will have to schedule across the
>domains that use it as backend. Further, what mechanism does Xen use
>to determine which backend to direct pkts to and from the backend
>which client domain to forward them to?

There's nothing to stop someone from configuring the system to have
two backend domains with access to the same physical device. 

However trying to run two copies of a device driver against a single
physical device will lead to tears -- device drivers tend to assume
they're the only ones driving the hardware, and so things will likely
get completely moulinexed. 

This is not a 'scheduling' issue; there's no way to get two device
drivers to share a device without (a) hacking the crap out of the 
device drivers and (b) inserting a whole bunch of synchronization 
and communication between the drivers.  


>o if there is just one backend, how exactly does access to the devices
>take place? From the docs, I gather that each domain using the device
>has 2 rings -- one for sends and one for receivs (very generally
>speaking). Also, the docs say that the backend can directly map
>buffers of the virtual domains in Xen to enable DMA to them. But at
>other places in the docs, I got the impression that client domains
>(and not just backends) have these descriptor rings as well. So
>basically I'm asking if all communication happens through the backend,
>or do client domains talk directly to Xen.

The "2 rings" referred to are basically an inter-domain communication
mechanism -- that is, they allow the transfer of information between
e.g. a client domain and a backend domain. Some of the confusion may
arise from the fact that we refer to the part of the client domain 
that does this as "the frontend device driver" and the part of the
backend domain that does this as "the backend device driver". However
both of these are *virtual* device drivers and don't actually speak
to physical devices at all. They are just two ends of a communication
mechanism which allows e.g. a client to request "read block 1000 from 
device sda3 into a buffer at 0x5ca000". 
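
As a rough illustration of a ring used purely as a message channel, the
sketch below models one ring as a bounded queue carrying the example
request above. The class names, the field layout and the ring size are
hypothetical; the real rings live in memory shared between the two domains
and have a different entry format.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class BlockRequest:
    """Hypothetical message layout; the real ring entries differ."""
    req_id: int
    device: str
    block: int
    buffer_addr: int


class Ring:
    """Bounded FIFO standing in for a shared-memory descriptor ring."""

    def __init__(self, size):
        self.size = size
        self._slots = deque()

    def put(self, item):
        """Queue an item; return False if the ring is full."""
        if len(self._slots) >= self.size:
            return False
        self._slots.append(item)
        return True

    def get(self):
        """Remove and return the oldest item, or None if the ring is empty."""
        return self._slots.popleft() if self._slots else None


# The frontend places a request on the ring; the backend picks it up.
request_ring = Ring(size=8)
request_ring.put(BlockRequest(req_id=1, device="sda3", block=1000,
                              buffer_addr=0x5CA000))
received = request_ring.get()
```

Neither end touches hardware here; the ring only carries the description
of what the client wants done.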

The actual hardware is accessed by regular device drivers running 
in the backend domain -- today that means any Linux 2.4.27 or 
Linux 2.6.8.1 device drivers. These access the hardware in almost 
exactly the same way as they do normally: by memory mapping bits
of the PCI address space, reading and writing to that, and receiving 
interrupts. Some small modifications are required in XenLinux to 
ensure this is done in a safe way (i.e. indirecting through Xen 
for privilege checks etc), but otherwise the driver is the same. 

Overall then if we consider e.g. a process reading a file in a 
client (non backend) domain the control flow is: 

   1. client process does read() syscall -> client kernel 
   2. client kernel (VFS layer) invokes frontend driver for actual access
   3. frontend driver uses I/O rings to send a message to backend domain
   4. backend driver receives message on I/O rings 
   5. backend driver forwards the request to the 'real' physical device 
      driver which in turn forwards the request to the actual device 
   6. -- time passes -- 
   7. device returns data to real device driver 
   8. real device driver returns data to backend driver 
   9. backend driver puts response onto the I/O rings 
  10. frontend driver receives response and passes up to VFS layer
  11. data returned to client process
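
The control flow above can be sketched as a toy simulation. Everything in
it is an assumption made for illustration: the class names, the tuple
message format and the dictionary used as a stand-in disk do not come from
Xen, and the two deques merely play the role of the I/O rings.

```python
from collections import deque


class RealDriver:
    """Stand-in for the 'real' physical device driver in the backend domain."""

    def __init__(self, blocks):
        self.blocks = blocks  # block number -> data, a pretend disk

    def read_block(self, block):
        return self.blocks[block]


class Backend:
    """Backend virtual driver: takes requests off the ring, asks the real driver."""

    def __init__(self, req_ring, resp_ring, driver):
        self.req_ring = req_ring
        self.resp_ring = resp_ring
        self.driver = driver

    def service(self):
        while self.req_ring:
            req_id, block = self.req_ring.popleft()
            data = self.driver.read_block(block)
            self.resp_ring.append((req_id, data))


class Frontend:
    """Frontend virtual driver: queues requests, later collects responses."""

    def __init__(self, req_ring, resp_ring):
        self.req_ring = req_ring
        self.resp_ring = resp_ring
        self.next_id = 0

    def submit_read(self, block):
        req_id = self.next_id
        self.next_id += 1
        self.req_ring.append((req_id, block))
        return req_id

    def collect(self):
        done = {}
        while self.resp_ring:
            req_id, data = self.resp_ring.popleft()
            done[req_id] = data
        return done


def demo():
    """Run one read through frontend, backend and the pretend device."""
    req_ring, resp_ring = deque(), deque()
    frontend = Frontend(req_ring, resp_ring)
    backend = Backend(req_ring, resp_ring, RealDriver({1000: b"hello"}))
    req_id = frontend.submit_read(1000)
    backend.service()
    return frontend.collect()[req_id]
```

In this sketch the frontend never calls the real driver; it only ever sees
the two queues, which is the point of the split.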

Hope this makes things a little clearer. 

We're working on updating the documentation for 2.0, but it'll 
likely be an ongoing process.

cheers,

S.