I am open to considering a design change that exposes a
physical-to-machine translation table (PMT) shared
between domain0 and Xen. Domain0 is:
- started once by Xen
- essentially in the same trust domain as Xen
- unlikely (outside of research projects) ever to be safely
rebootable without a system/Xen reboot
- rarely going to run real customer apps, so need not use
a large portion of a system's physical memory
- not migratable
However, I agree with Matt that a PMT for other domains
(domU) is a bad idea as it creates many problems for migration,
save/restore, ballooning, and adding new domains to an already
loaded system. Further, the grant table abstraction is the primary
mechanism for page sharing for domU in Xen (on Xen/x86).
I think if domU has any knowledge of actual machine addresses,
the Xen team would consider this a bug that should be fixed.
Some of the email discussion in this thread has referred to
a PMT for dom0 and others refer to a PMT for both dom0 and domU.
At this time, I am willing to consider a PMT for dom0 only.
If you would like to start proposing a design (and patches)
for dom0 PMT, please start a new thread and describe:
- what is the structure/size of the PMT and how is it allocated
(e.g. is it a linear table; a strawman sketch follows this list)?
Does the table have other attributes (e.g. r/w permissions) or is
it just a one-to-one map of physical-to-machine pages?
- how do you deal with different page sizes? (does dom0 need
to be compiled with PAGE_SIZE=4K?)
- how is dom0 I/O handled (differently than it is now)?
- what is the impact on handling virtual translations (e.g.
vcpu_translate())?
- what code in the Xen virtual drivers that is now ia64-specific
would become the same as x86?**
- what code in the Xen virtual drivers will still be different
between ia64 and x86?**
- what code (outside of Xen drivers) in xenlinux/ia64 would
need to be changed and is it still possible to make the
changes transparent?
- can dom0 and domU still use the same binary?
- what code in grant_table.c changes (can we merge back to
using common/grant_table.c instead of a separate file?)
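
To make the first question concrete, here is a strawman of the
simplest layout I can imagine; all names below are made up, and a
real proposal would need to pin them down (including where any
permission bits would live):

  /* Strawman: a linear PMT, shared read-only with dom0, one
   * entry per physical page frame. */
  #define PMT_INVALID (~0UL)

  typedef unsigned long pmt_entry_t;    /* an mfn, or PMT_INVALID */

  struct pmt {
          unsigned long max_pfn;        /* covers pfns 0..max_pfn-1 */
          pmt_entry_t entry[];          /* indexed by physical pfn */
  };

  /* Physical-to-machine lookup as dom0 might do it. */
  static inline unsigned long pmt_lookup(struct pmt *p, unsigned long pfn)
  {
          return (pfn < p->max_pfn) ? p->entry[pfn] : PMT_INVALID;
  }
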
HOWEVER, unless there is a general consensus that this change
will be easy to implement and debug, and will make fixing
multiple domains and/or implementing virtual networking
much easier for 3.0, I see this as a post-3.0 item.
Thanks,
Dan
** it would be good to see the patches for the drivers, as
I think the whole point of this proposal is to make the code
closer to Xen/x86 to minimize differences/maintenance. If
"before" we have 100 lines different, and "after" we have
90 lines different, and there are other disadvantages,
adding a PMT might not be a very good tradeoff.
> -----Original Message-----
> From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
> [mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf
> Of Dong, Eddie
> Sent: Tuesday, November 01, 2005 12:09 AM
> To: Matt Chapman; Tian, Kevin
> Cc: Ling, Xiaofeng; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> Subject: RE: [Xen-ia64-devel] Re: PMT table for XEN/IA64
> (was: RE:Transparentparavirtualization vs. xen paravirtualization)
>
> Matt:
> Yes, as you mentioned, having domU or a VTI domain do
> page flipping under the assumption that the service domain owns
> all system pages (i.e. all other domains' pages come from the
> service domain) works. But it is ultimately impossible for driver
> domains, since only one domain can own all system pages. So either
> we start with what you proposed and roll back to what x86 is doing
> now at some later point, for example Xen 3.1, or we start by
> aligning with Xen/x86 and save all of that maintenance and rework
> effort. I suggest we go with the design that is right in the long run.
> Yes, supporting a PMT may require modifications to
> xenlinux/ia64, but as you pointed out, domU has to maintain a PMT
> table in any case (for migration, memory location, etc.), so why
> not let dom0 work the same way? Letting dom0 and domU share as
> much code as possible is the right way to go, IMO, right?
> The modifications to xenlinux/ia64 are not so big:
> probably only PMT setup for now, and then the VBD/VNIF work may
> reference and modify it. It should be almost the same as the x86
> approach.
> What specific questions do you have about x86
> shadow_translate? I can consult the experts here too if you need :-)
>
> So, now it may be time for us to dig into the details
> of how to do PMTs... :-) And Dan?
> Eddie
>
> Matt Chapman wrote:
> > I'm still not clear about the details. Could you outline
> > the changes that you want to make to Xen/ia64?
> >
> > Would DomU have a PMT? Surely DomU should not know about real
> > machine addresses; that should be hidden behind the grant table
> > interface. Otherwise migration, save/restore, etc. become
> > difficult (as they have found on x86).
> >
> > Do you know how x86 shadow_translate mode works? Perhaps we should
> > use that as an example.
> >
> > Matt
> >
> >
> > On Mon, Oct 31, 2005 at 05:11:09PM +0800, Tian, Kevin wrote:
> >> Matt Chapman wrote:
> >>> 1. Packet arrives in a Dom0 SKB. Of course the buffer needs
> >>> to be page sized/aligned (this is true on x86 too).
> >>> 2. netback steals the buffer
> >>> 3. netback donates it to DomU *without freeing it*
> >>> 4. DomU receives the frame and passes it up its network stack
> >>> 5. DomU gives away other frame(s) to restore balance
> >>> 6. Dom0 eventually receives extra frames via its balloon driver
> >>>
> >>> 5 and 6 can be done lazily in batches. Alternatively, 4 and 5
> >>> could be a single "flip" operation.
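> >>>
> >>> A compressed sketch of the Dom0 side of steps 2-3 (every helper
> >>> name below is invented, just to pin down the ordering):
> >>>
> >>>   static void netback_flip_to_domu(struct sk_buff *skb, domid_t domu)
> >>>   {
> >>>           struct page *page;
> >>>
> >>>           page = steal_rx_buffer(skb);       /* 2: detach buffer  */
> >>>           donate_page_to_domain(page, domu); /* 3: transfer it,
> >>>                                               *    do NOT free it */
> >>>           notify_frontend(domu);             /* 4 happens in DomU */
> >>>           /* 5/6: lazily, in batches, DomU returns frames and
> >>>            * Dom0 reabsorbs them via its balloon driver. */
> >>>   }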
> >>
> >> The solution will work with some tweaks. But is there any obvious
> >> benefit over the PMT approach used on x86? (If yes, you should
> >> suggest it to xen-devel ;-) Usually we want a different approach
> >> only when we either can't do it on this architecture or get far
> >> better performance than the existing one. Otherwise, why diverge
> >> from the Xen design at the cost of extra maintenance effort? That
> >> extra effort has already cost us 2+ weeks getting VBD up to
> >> support DomU over the last 2 upstream merges.
> >>
> >>>
> >>> I think this is not significantly different from x86.
> >>>
> >>> I'm not saying this is necessarily better than a PMT solution,
> >>> but I want to discuss the differences and trade-offs. By PMT
> >>> I assume you mean to make Dom0 not 1:1 mapped, and then give
> >>> it access to the translation table? Can you describe how the
> >>> above works differently with a PMT?
> >>
> >>
> >> In terms of work flow, the PMT approach is similar: the
> >> backend/frontend need to touch the PMT table on ownership changes.
> >> However, have you evaluated how many tricky changes are required
> >> to support Domain0 with gpn=mfn on top of the existing code? For
> >> example:
> >> - Backend drivers are not bound to dom0; they can also be used by
> >> a domU acting as a driver domain, where a 1:1 mapping makes no
> >> sense. There is already some talk of domU servers doing driver I/O.
> >> - You need to ensure all available pages are granted to dom0. That
> >> means you need to change the current dom0 allocation code.
> >> - You need to change the current vnif code with an unknown number
> >> of #ifdefs and workarounds, since you would be implementing new
> >> behavior on top of a different approach.
> >> - ... (maintenance!)
> >>
> >> So if you were implementing a VM from scratch, your approach
> >> would definitely be worth trying, since there would be no
> >> constraints. But since we work on Xen, we should take advantage
> >> of the current Xen design as much as possible, right? ;-)
> >>
> >>>
> >>> One disadvantage I see of having Dom0 not 1:1 is that superpages
> >>> are more difficult; we can't just use the guest's superpages.
> >>
> >>
> >> Superpages are an optimization option, and we still need to
> >> support non-contiguous pages as a basic requirement. You can still
> >> add an option to allocate contiguous pages for a guest even with a
> >> PMT table, since para-virtualization is cooperative.
> >>
> >>>
> >>> Also, are there paravirtualisation changes needed to support a
> >>> PMT? I'm concerned about not making the paravirtualisation
> >>> changes too complex (I think x86 Xen changes the OS too much).
> >>> Also, it should be possible to load Xen frontend drivers into
> >>> unmodified OSs (on VT).
> >>
> >>
> >> We need to balance new designs against maintenance effort.
> >> Currently Xiaofeng Lin from Intel is working on para-drivers for
> >> unmodified domains, and both VBD & VNIF are already working for
> >> x86 VT domains and are being reviewed by Cambridge. This work is
> >> based on the PMT table.
> >>
> >> Kevin
> >>>
> >>> On Mon, Oct 31, 2005 at 01:28:43PM +0800, Tian, Kevin wrote:
> >>>> Hi, Matt,
> >>>>
> >>>> The point here is how to tell when a donated frame is done, and
> >>>> where the "free" actually happens in domU. Currently the Linux
> >>>> network driver uses zero-copy to pass a received packet up the
> >>>> stack without any copying. In this case, the receive pages are
> >>>> allocated as skbuffs, which are freed by the upper layers rather
> >>>> than by the vnif driver itself. To let dom0 know when the
> >>>> donated page is done, you may either:
> >>>> - Copy the content from the donated page into a local skbuff
> >>>> page and notify dom0 immediately, at the cost of performance, or
> >>>> - Modify the upper-layer code to register a "free" hook that
> >>>> notifies dom0 when done, at the cost of more modification to
> >>>> common code and divergence from x86 (a sketch follows).
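> >>>>
> >>>> A rough sketch of the second option, in the domU frontend
> >>>> receive path. The skb->destructor field is real Linux;
> >>>> notify_dom0_page_done() is a hypothetical helper:
> >>>>
> >>>>   /* Called by the upper layers when the skb is finally freed. */
> >>>>   static void vnif_skb_destructor(struct sk_buff *skb)
> >>>>   {
> >>>>           /* Hypothetical: tell dom0 the donated frame backing
> >>>>            * this skb can be reclaimed. */
> >>>>           notify_dom0_page_done(virt_to_page(skb->head));
> >>>>   }
> >>>>
> >>>>   /* Before passing a received skb up the stack: */
> >>>>   skb->destructor = vnif_skb_destructor;
> >>>>   netif_rx(skb);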
> >>>>
> >>>> There are certainly other ways to make this approach work, and
> >>>> even more alternatives. However, the point we really want to
> >>>> emphasize here is that we can move towards the x86 solution by
> >>>> adding a PMT, with the best performance and less maintenance
> >>>> effort. That would also minimize our future re-base effort as
> >>>> the para-drivers keep evolving. ;-)
> >>>>
> >>>> Thanks,
> >>>> Kevin
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Matt Chapman [mailto:matthewc@xxxxxxxxxxxxxxx]
> >>>>> Sent: October 31, 2005 13:09
> >>>>> To: Tian, Kevin
> >>>>> Cc: Dong, Eddie; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >>>>> Subject: Re: [Xen-ia64-devel] Re: PMT table for XEN/IA64 (was: RE:
> >>>>> Transparentparavirtualization vs. xen paravirtualization)
> >>>>>
> >>>>> Yes, I think I understand the problem now.
> >>>>>
> >>>>> The way I imagine this could work is that Dom0 would know about
> >>>>> all of the memory in the machine (i.e. it would be passed the
> >>>>> original EFI memmap, minus memory used by Xen).
> >>>>>
> >>>>> Then Dom0 would donate memory for other domains (=ballooning).
> >>>>> Dom0 can donate data frames to DomU in the same way: by granting
> >>>>> the frame and not freeing it. When DomU donates a data frame to
> >>>>> Dom0, Dom0 frees it when it is done, and now the kernel can use
> >>>>> it.
> >>>>>
> >>>>> What do you think of this approach?
> >>>>>
> >>>>> Matt
> >>>>>
> >>>>>
> >>>>> On Mon, Oct 31, 2005 at 11:09:04AM +0800, Tian, Kevin wrote:
> >>>>>> Hi, Matt,
> >>>>>> It's not related to the mapped virtual address, but only to
> >>>>>> the physical/machine pfn. The current vnif backend (on x86)
> >>>>>> works as:
> >>>>>>
> >>>>>> 1. Allocate a set of physical pfns from the kernel
> >>>>>> 2. Chop up the mapping between the physical pfn and the old
> >>>>>> machine pfn
> >>>>>> 3. Transfer ownership of the old machine pfn to the frontend
> >>>>>> 4. Allocate a new machine pfn and bind it to that physical pfn
> >>>>>> (In this case there is no ownership return from the frontend,
> >>>>>> for performance reasons; a rough sketch follows.)
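> >>>>>>
> >>>>>> As a rough sketch of steps 2-4 with the x86 xenlinux helpers
> >>>>>> (pfn_to_mfn()/set_phys_to_machine()/INVALID_P2M_ENTRY exist
> >>>>>> there; the transfer and allocation calls are simplified
> >>>>>> placeholders):
> >>>>>>
> >>>>>>   unsigned long mfn = pfn_to_mfn(pfn);
> >>>>>>
> >>>>>>   /* 2. Chop the physical-to-machine mapping. */
> >>>>>>   set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
> >>>>>>   /* 3. Hand the old machine frame to the frontend
> >>>>>>    *    (placeholder for the real grant-transfer sequence). */
> >>>>>>   transfer_frame_to_frontend(mfn, frontend_domid);
> >>>>>>   /* 4. Get a fresh machine frame and rebind it. */
> >>>>>>   set_phys_to_machine(pfn, alloc_new_machine_frame());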
> >>>>>>
> >>>>>> Without a PMT table (assuming guest==machine for dom0), that
> >>>>>> means you have to hotplug physical pfns in the guest (at page
> >>>>>> granularity) under the current vnif model. Or maybe you have a
> >>>>>> better alternative without a PMT that also avoids big changes
> >>>>>> to the existing vnif driver?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Kevin
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
> >>>>>>> [mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of
> >>>>>>> Matt Chapman
> >>>>>>> Sent: October 31, 2005 10:59
> >>>>>>> To: Dong, Eddie
> >>>>>>> Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >>>>>>> Subject: [Xen-ia64-devel] Re: PMT table for XEN/IA64 (was: RE:
> >>>>>>> Transparentparavirtualization vs. xen paravirtualization)
> >>>>>>>
> >>>>>>> Hi Eddie,
> >>>>>>>
> >>>>>>> The way I did it was to make the address argument to grant
> >>>>>>> hypercalls in/out; that is, the hypervisor might possibly return
> >>>>>>> a different address than the one requested, like mmap on UNIX.
> >>>>>>>
> >>>>>>> For DomU, the hypervisor would map the page at the requested
> >>>>>>> address. For Dom0, the hypervisor would instead return the
> >>>>>>> existing address of that page, since Dom0 already has access
> >>>>>>> to the whole address space.
> >>>>>>>
> >>>>>>> (N.B. I'm referring to physical/machine mappings here; unlike
> >>>>>>> the x86 implementation where the grant table ops map pages
> >>>>>>> directly into virtual address space).
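> >>>>>>>
> >>>>>>> As a concrete sketch, the argument structure might look like
> >>>>>>> this (field names invented; domid_t and grant_ref_t are real
> >>>>>>> Xen types):
> >>>>>>>
> >>>>>>>   struct gnttab_map_phys {
> >>>>>>>       /* IN */
> >>>>>>>       domid_t       dom;     /* granting domain             */
> >>>>>>>       grant_ref_t   ref;     /* grant reference             */
> >>>>>>>       /* IN/OUT */
> >>>>>>>       unsigned long paddr;   /* requested physical address in,
> >>>>>>>                               * actual address out; may differ,
> >>>>>>>                               * like mmap() on UNIX          */
> >>>>>>>       /* OUT */
> >>>>>>>       int16_t       status;
> >>>>>>>   };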
> >>>>>>>
> >>>>>>> Matt
> >>>>>>>
> >>>>>>>
> >>>>>>> On Fri, Oct 28, 2005 at 10:28:08PM +0800, Dong, Eddie wrote:
> >>>>>>>>> Page flipping should work just fine
> >>>>>>>>> in the current design; Matt had it almost working (out of
> >>>>>>>>> tree) before he went back to school.
> >>>>>>>>>
> >>>>>>>> Matt:
> >>>>>>>> Dan mentioned that you had the VNIF work almost done without
> >>>>>>>> PMT table support for dom0. Can you share the idea with us?
> >>>>>>>> Usually VNIF swaps pages between dom0 and domU so that the
> >>>>>>>> network packet copy (between the dom0 native driver and the
> >>>>>>>> domU frontend driver) can be avoided, achieving high
> >>>>>>>> performance. With this swap, we can no longer assume dom0
> >>>>>>>> gpn=mfn. So how did you propose to port VNIF without a PMT
> >>>>>>>> table?
> >>>>>>>> Thanks a lot,
> >>>>>>>> Eddie
> >>>>>>>
_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel