Hi Ingo,
This series implements a sequence of optimisations to reduce the
impact of enabling CONFIG_PARAVIRT while running native.
They are:
(0. Move some Xen code around to make later changes work better.)
1. For a number of the pvops, the native implementation is a simple
identity function which returns its argument. Add a specific
paravirt identity function which the patcher can treat specially,
by directly inlining either nops (32-bit) or a mov (64-bit)
into the instruction stream (sketch 1 after this list).
2. When a pvop is called from asm code, the callsite also provides a
hint about which registers are available to be clobbered by the
called code. Until now, that information was ignored, and all
caller-save registers were saved. Now, don't bother
saving/restoring registers which are clobberable (sketch 2 below).
3. The C calling convention lists which registers the caller can
expect to survive a function call, and which the callee is
allowed to clobber. The latter set is quite large, especially on
64-bit. This means that converting a pile of simple inline
functions into function calls caused a lot more register
pressure, making the generated code much worse.
I introduce a new "callee-save" calling convention which makes
only the return register (eax:edx on 32-bit, rax on 64)
callee-clobberable; the callee must preserve all other registers,
including the argument registers.
This makes the callsites for these functions clobber many fewer
registers, giving the compiler a chance to generate better code.
Small asm functions, which generally only use one or two
registers anyway, can be called directly. C code can also be
called via a thunk which does the necessary register
saving/restoring, generated by PV_CALLEE_SAVE_REGS_THUNK(func)
(sketch 3 below).
The irq enable/disable/save/restore functions are the first to
make use of this calling convention, since they are the most
commonly used in the kernel, and are also called from asm code.
4. Convert the pte_val/make_pte identity functions to use the
callee-save convention; they're only identity functions anyway,
so they have no need to trash lots of registers (sketch 4 below).
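To make these concrete, here are a few sketches. They are
simplified, and the exact identifiers may not match what's in the
patches themselves.

Sketch for 1 -- the patchable identity function:

    /* Native pte_val and friends just hand back their argument. */
    u64 _paravirt_ident_64(u64 x)
    {
        return x;
    }

    /* The patcher compares the op's function pointer against this
       and, on a match, patches the callsite in place rather than
       emitting a call: nops on 32-bit (the value is already in
       eax:edx), or a single "mov %rdi, %rax" on 64-bit. */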
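Sketch for 2 -- honouring the clobber hint at an asm callsite
(register choices and symbol names here are purely illustrative):

    /* Before: all caller-save registers saved regardless */
    push %eax; push %ecx; push %edx
    call *pv_irq_ops+PV_IRQ_irq_disable
    pop %edx; pop %ecx; pop %eax

    /* After: this callsite said %eax and %ecx are clobberable,
       so only %edx needs to survive the call */
    push %edx
    call *pv_irq_ops+PV_IRQ_irq_disable
    pop %edx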
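Sketch for 3 -- the rough shape of a 64-bit thunk generated by
PV_CALLEE_SAVE_REGS_THUNK(func) (the real macro emits this via
asm() into the kernel text; simplified here, and the label name is
illustrative):

    __raw_callee_save_func:
        /* preserve every caller-save register except %rax,
           which carries the return value */
        push %rdi; push %rsi; push %rdx; push %rcx
        push %r8;  push %r9;  push %r10; push %r11
        call func
        pop %r11; pop %r10; pop %r9;  pop %r8
        pop %rcx; pop %rdx; pop %rsi; pop %rdi
        ret

Callsites can then treat func as clobbering only %rax.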
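Sketch for 4 -- on native, the pte ops become the identity
function from 1, wrapped in the callee-save convention from 3
(again, the exact macro spellings may differ):

    /* in the native pv_mmu_ops initialiser */
    .pte_val  = PTE_IDENT,  /* callee-save wrapper around the
                               64-bit identity function */
    .make_pte = PTE_IDENT,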
I had to make some adjustments to VSMP and lguest to match the new
calling conventions. I wasn't sure how I should change VMI, so I'm
waiting for Zach's input on that (VMI doesn't compile at the moment).
In testing, the net result was that the overhead dropped by about 75%,
though I found it hard to really get stable results. The most obvious
improvement was a reduction in L2 references, presumably meaning that
L1 was getting a better hit rate. Each of these transforms is an
unambiguous improvement in generated code for the native case, so I'm
curious to see what other people see.
Thanks,
J