Many Xen hypercalls pass mlocked pointers as parameters for both input and
output. For example, xc_get_pfn_list() is a nice one with multiple levels of
structures/mlocking.
Considering just the tools for the moment, those pointers are userspace
addresses. Ultimately the hypervisor ends up with that userspace address, from
which it reads and writes data. This is OK for x86, since userspace, kernel,
and hypervisor all share the same virtual address space (and userspace has
carefully mlocked the relevant memory).
On PowerPC though, the hypervisor runs in real mode (no MMU translation).
Unlike x86, PowerPC exceptions arrive in real mode, and also PowerPC does not
force a TLB flush when switching between real and virtual modes. So a virtual
address is pretty much worthless as a hypervisor parameter; performing the
MMU translation in software is infeasible.
Although it rarely passes parameters by pointer, the way the pSeries
hypervisor handles this is having the kernel always pass a "pseudo-physical"
address (to borrow Xen terminology), which is trivially translatable to a
"machine" address in the hypervisor. The processor has some notion of a large
(e.g. 64M) chunk of contiguous machine memory, so the hypervisor keeps a
table of chunks which can be used to translate pseudo-physical addresses.
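To make the chunk-table idea concrete, here is a rough sketch of such a lookup. The names (chunk_table, pseudo_to_machine) and the flat array layout are my own assumptions for illustration; this is not how the pSeries hypervisor actually stores its table.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SHIFT  26                          /* 64M chunks, the example size above */
#define CHUNK_SIZE   ((uint64_t)1 << CHUNK_SHIFT)
#define INVALID_ADDR UINT64_MAX

/* Hypothetical per-domain table: entry i holds the machine base of chunk i. */
struct chunk_table {
    size_t          nr_chunks;
    const uint64_t *machine_base;
};

/*
 * Translate a pseudo-physical address to a machine address.
 * Returns INVALID_ADDR if the address lies outside the domain's chunks.
 */
static uint64_t pseudo_to_machine(const struct chunk_table *t, uint64_t paddr)
{
    uint64_t index = paddr >> CHUNK_SHIFT;

    if (index >= t->nr_chunks)
        return INVALID_ADDR;
    /* Chunk base plus the offset within the chunk. */
    return t->machine_base[index] + (paddr & (CHUNK_SIZE - 1));
}
```

The point is only that the translation is one shift, one bounds check and one add, which is why it is cheap enough to do in real mode.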
Of course, userspace doesn't know pseudo-physical addresses, only the kernel
does. So one way or another, to pass parameters by pointer to the PPC
hypervisor, the kernel is going to need to translate them. That also means
userspace memory areas will be limited to one page (since virtually
consecutive pages may not be representable by a single pseudo-physical
address).
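The one-page limit could be enforced with a check along these lines. This is only a sketch: the 4096-byte page size and the helper name fits_in_one_page are assumptions of mine, not anything that exists in libxc or the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size */

/*
 * Return true if the buffer [vaddr, vaddr + len) lies entirely within a
 * single page, so that one translated address can describe all of it.
 */
static bool fits_in_one_page(uintptr_t vaddr, size_t len)
{
    size_t offset = (size_t)(vaddr % SKETCH_PAGE_SIZE);

    if (len == 0 || len > SKETCH_PAGE_SIZE)
        return false;
    /* offset < page size, so this sum cannot overflow. */
    return offset + len <= SKETCH_PAGE_SIZE;
}
```

A buffer that fails this check would have to be split or copied before it could be handed to the hypervisor.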
If we're stuck with structure addresses in hypercalls, one possible solution
is to modify libxc so that all parameter addresses are physical pointers
within the same page, then pass that page's physical address into the
hypercall. Something like this:

    ulong magicpage_vaddr;
    ulong magicpage_paddr;
    ulong offset;    /* next free byte within the magic page */

    libxc_init() {
    #ifdef __powerpc__
        posix_memalign((void **)&magicpage_vaddr, PAGE_SIZE, PAGE_SIZE);
        mlock((void *)magicpage_vaddr, PAGE_SIZE);
        magicpage_paddr = new_translate_syscall(magicpage_vaddr);
    #endif
        ...
    }

    xc_get_pfn_list() {
        dom0_op_t *op;
        ulong op_paddr;

        magicalloc((ulong *)&op, &op_paddr, sizeof(dom0_op_t));
        ...
    }

    #ifdef __powerpc__
    magicalloc(ulong *usable_addr, ulong *hcall_addr, int bytes) {
        *usable_addr = magicpage_vaddr + offset;
        *hcall_addr = magicpage_paddr + offset;
        offset += bytes;
    }

    do_xen_hypercall(ptr) {
        /* convert the magic-page virtual address to its physical address */
        ptr -= magicpage_vaddr - magicpage_paddr;
        do_privcmd(..., ptr);
    }
    #endif

(Note that this is for discussion only, not a proposed interface.)
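The magicalloc above never checks that an allocation still fits in the page. A slightly more careful version might look like the sketch below. The struct magic_page wrapper, the 8-byte alignment, the 4096-byte page size and magicreset are all assumptions I've added for illustration, and paddr stands in for whatever the translate call would return.

```c
#include <stddef.h>
#include <stdint.h>

#define MAGIC_PAGE_SIZE 4096u    /* assumed page size */

/* Hypothetical bookkeeping for the single mlocked parameter page. */
struct magic_page {
    uintptr_t vaddr;    /* address userspace reads and writes through */
    uintptr_t paddr;    /* address handed to the hypercall */
    size_t    offset;   /* next free byte within the page */
};

/*
 * Bump-allocate 'bytes' from the page, 8-byte aligned.
 * Returns 0 on success, -1 if the request does not fit.
 */
static int magicalloc(struct magic_page *mp, void **usable_addr,
                      uintptr_t *hcall_addr, size_t bytes)
{
    size_t start = (mp->offset + 7u) & ~(size_t)7u;

    if (bytes == 0 || start > MAGIC_PAGE_SIZE ||
        bytes > MAGIC_PAGE_SIZE - start)
        return -1;

    *usable_addr = (void *)(mp->vaddr + start);
    *hcall_addr = mp->paddr + start;
    mp->offset = start + bytes;
    return 0;
}

/* Release everything at once, e.g. after the hypercall returns. */
static void magicreset(struct magic_page *mp)
{
    mp->offset = 0;
}
```

Since both addresses are derived from the same offset, the fixed vaddr/paddr difference that do_xen_hypercall relies on still holds.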
Each architecture would provide its own magicalloc and do_xen_hypercall. For
x86, magicalloc would be malloc+mlock, and both pointers would be the same; x86
do_xen_hypercall would remain unchanged. Basically, any current use of mlock
in libxc would be replaced with calls to magicalloc.
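For comparison, the x86 flavour described above could be as simple as the following sketch. The names magicalloc_x86 and magicfree_x86 and the error handling are assumptions of mine; only malloc, mlock, munlock and free are real calls.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

/*
 * x86 variant: allocate and lock the memory, then report the same address
 * twice, since the hypervisor can use the userspace virtual address as-is.
 * Returns 0 on success, -1 on failure.
 */
static int magicalloc_x86(void **usable_addr, uintptr_t *hcall_addr,
                          size_t bytes)
{
    void *p = malloc(bytes);

    if (p == NULL)
        return -1;
    if (mlock(p, bytes) != 0) {
        free(p);
        return -1;
    }
    *usable_addr = p;
    *hcall_addr = (uintptr_t)p;   /* identical to the usable address */
    return 0;
}

/* Undo magicalloc_x86: unlock first, then free. */
static void magicfree_x86(void *p, size_t bytes)
{
    munlock(p, bytes);
    free(p);
}
```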
Alternatively, if we're willing to change the embedded pointers in dom0_ops to
offsets, we do not need to invent a new "translate" system call.
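To illustrate the offset idea: if an op names its buffer by offset into the parameter page, whoever receives the page only needs the page's base address to find the buffer. The struct layout, field names and 4096-byte page size below are made up for the sketch; they are not the real dom0_op_t.

```c
#include <stddef.h>
#include <stdint.h>

#define ARG_PAGE_SIZE 4096u      /* assumed page size */

/* Hypothetical op: the buffer is named by page offset, not by pointer. */
struct sketch_op {
    uint32_t cmd;
    uint32_t buf_offset;   /* byte offset of the buffer within the page */
    uint32_t buf_len;      /* length of the buffer in bytes */
};

/*
 * Resolve an (offset, len) pair against the base of the parameter page.
 * Returns NULL if the range would run past the end of the page.
 */
static void *resolve_offset(void *page_base, uint32_t offset, uint32_t len)
{
    if (offset > ARG_PAGE_SIZE || len > ARG_PAGE_SIZE - offset)
        return NULL;
    return (unsigned char *)page_base + offset;
}
```

With this layout, userspace never needs to learn a physical address; only the page itself has to be translated, and the kernel can do that when it forwards the hypercall.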
Other suggestions are welcome.
--
Hollis Blanchard
IBM Linux Technology Center