[Xen-devel] Re: [PATCH] xen: core dom0 support
Ingo Molnar wrote:
Have I missed a mail of yours perhaps? I don't have any record of
you having posted mmap-perf perfcounters results. I grepped my
mbox and the last mail I saw from you containing the string
"mmap-perf" is from January 20, and it only includes my numbers.
Yes, I think you must have missed a mail. I've attached it for
reference, along with a more complete set of measurements I made
regarding the series of patches applied (series ending at
1f4f931501e9270c156d05ee76b7b872de486304) to improve pvops performance.
My results showed a dramatic drop in cache references (from about 300%
pvops vs non-pvops, down to 125% with the full set of patches applied),
but it didn't seem to have much of an effect on the overall wallclock
time. I'm a bit sceptical of the numbers here because, while each run's
passes are fairly consistent, booting and remeasuring seemed to cause
larger variations than the differences we're looking at. It would be easy
to handwave it away with "cache effects", but it's not very satisfying.
I also didn't find the measurements very convincing because the CPU
cycle and instruction counts are effectively unchanged (i.e., the
baseline non-pvops and original pvops kernels apparently execute exactly
the same number of instructions, but we know that there's a lot more
going on), and they stay unchanged even though each added patch
definitely removes some amount of pvops overhead in terms of
instructions in the instruction stream. Is it just measuring usermode
stats? I ran it as root, with the command line you suggested
("./perfstat -e -5,-4,-3,0,1,2,3 ./mmap-perf 1"). Cache misses wandered
up and down in a fairly non-intuitive way as well.
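One direct way to answer the usermode question is to count the same
workload twice, once with kernel-mode events excluded and once with them
included, and compare the two counts. A minimal sketch of that check,
assuming the modern perf_event_open(2) syscall interface rather than the
perfstat prototype used in this thread:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* Count retired instructions for a small workload, optionally
 * excluding kernel-mode events. If the two counts come out identical,
 * the counter is only seeing usermode. */
static long long count_instructions(int exclude_kernel)
{
        struct perf_event_attr attr;
        long long count;
        int fd, i;

        memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_HARDWARE;
        attr.size = sizeof(attr);
        attr.config = PERF_COUNT_HW_INSTRUCTIONS;
        attr.disabled = 1;
        attr.exclude_kernel = exclude_kernel;

        /* measure this process (pid 0) on any CPU (-1) */
        fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
        if (fd < 0) {
                perror("perf_event_open");
                exit(1);
        }

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

        /* stand-in workload: force repeated kernel entry/exit */
        for (i = 0; i < 100000; i++)
                syscall(SYS_getpid);

        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
        read(fd, &count, sizeof(count));
        close(fd);
        return count;
}

int main(void)
{
        printf("user only:   %lld\n", count_instructions(1));
        printf("user+kernel: %lld\n", count_instructions(0));
        return 0;
}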
I'll do a rerun comparing current tip.git pvops vs non-pvops to see if I
can get some better results.
J
pvops-mmap-measurements.ods
Description: application/vnd.oasis.opendocument.spreadsheet
--- Begin Message ---
To: Ingo Molnar <mingo@xxxxxxx>
Subject: Re: lmbench lat_mmap slowdown with CONFIG_PARAVIRT
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Tue, 27 Jan 2009 02:17:39 -0800
Cc: Zachary Amsden <zach@xxxxxxxxxx>, Nick Piggin <npiggin@xxxxxxx>, Linux Kernel Mailing List <linux-kernel@xxxxxxxxxxxxxxx>, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>, "hpa@xxxxxxxxx" <hpa@xxxxxxxxx>, "jeremy@xxxxxxxxxxxxx" <jeremy@xxxxxxxxxxxxx>, "chrisw@xxxxxxxxxxxx" <chrisw@xxxxxxxxxxxx>, "rusty@xxxxxxxxxxxxxxx" <rusty@xxxxxxxxxxxxxxx>, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <20090127075912.GA6551@xxxxxxx>
References: <20090120112634.GA20858@xxxxxxx> <20090120140324.GA26424@xxxxxxx> <49763806.5090009@xxxxxxxx> <20090120205653.GA19710@xxxxxxx> <20090121072718.GN24891@xxxxxxxxxxxxx> <4977A051.8050203@xxxxxxxx> <1232663311.16317.176.camel@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <4978F6C6.3090003@xxxxxxxx> <1232664907.16317.182.camel@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <49790BCC.1040807@xxxxxxxx> <20090127075912.GA6551@xxxxxxx>
User-agent: Thunderbird 2.0.0.19 (X11/20090105)
Ingo Molnar wrote:
ping?
This is a very serious paravirt_ops slowdown affecting the native kernel's
performance to the tune of 5-10% in certain workloads.
It's been about two years since paravirt_ops went upstream, when you told
us that something like this would never happen, that paravirt_ops is
designed so flexibly that it will never hinder the native kernel - and if
it does, it will be easy to fix. Now is the time to fulfill that
promise.
I couldn't exactly reproduce your results, but I guess they're similar
in shape. Comparing 2.6.29-rc2-nopv with -pvops, I saw this ratio (passes
1-5). Interestingly, I'm seeing identical instruction counts for pvops
vs non-pvops, and a lower cycle count. The cache references are way up
and the miss rate is up a bit, which I guess is the source of the slowdown.
With the attached patch, I get a clear improvement; it replaces the
do-nothing pte_val/make_pte functions with inlined movs that move the
argument to the return register, overpatching the 6-byte indirect call
(on i386 it would just be all nopped out). CPU cycles and cache misses
are way down, and the tick count is down from ~5% worse to ~2%. But the
cache reference rate is even higher, which really doesn't make sense to
me. Still, the patch is a clear improvement, and it's hard to see how it
could make anything worse (it always replaces an indirect call with
simple inlined code).
(Full numbers in spreadsheet.)
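To make the overpatching concrete, here is a minimal sketch (a
hypothetical helper, not the kernel's actual patching code) of what
replacing the call site amounts to: copy the replacement instructions
over the indirect call if they fit, and pad the leftover bytes with
nops. For the 32-bit identity on i386, the replacement is zero bytes
long, so the whole site becomes nops.

#include <string.h>

/* Hypothetical illustration only: overwrite an indirect call site with
 * shorter replacement instructions, padding the remainder with nops. */
static unsigned patch_site(unsigned char *site, unsigned site_len,
                           const unsigned char *repl, unsigned repl_len)
{
        if (repl_len > site_len)
                return 0;               /* doesn't fit; keep the call */
        memcpy(site, repl, repl_len);   /* e.g. 48 89 f8 = mov %rdi,%rax */
        memset(site + repl_len, 0x90,   /* 0x90 = single-byte nop */
               site_len - repl_len);
        return site_len;
}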
I have a couple of other patches to reduce the register pressure of the
pvops calls, but I'm trying to make sure it's not all too complex and/or
fragile.
J
pvops-mmap-measurements.ods
Description: application/vnd.oasis.opendocument.spreadsheet
Subject: x86/pvops: add paravirt_ident functions to allow special patching
Several paravirt ops implementations simply return their arguments,
the most obvious being the make_pte/pte_val class of operations on
native.
On 32-bit, the identity function is literally a no-op, as the calling
convention uses the same registers for the first argument and return.
On 64-bit, it can be implemented with a single "mov".
This patch adds special identity functions for 32- and 64-bit arguments,
and machinery to recognize them and replace them with either nops or a
mov as appropriate.
At the moment, the only users for the identity functions are the
pagetable entry conversion functions.
The result is a measurable improvement on pagetable-heavy benchmarks
(2-3%, reducing the pvops overhead from 5% to 2%).
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
---
arch/x86/include/asm/paravirt.h | 5 ++
arch/x86/kernel/paravirt.c | 75 ++++++++++++++++++++++++++++++-----
arch/x86/kernel/paravirt_patch_32.c | 12 +++++
arch/x86/kernel/paravirt_patch_64.c | 15 +++++++
4 files changed, 98 insertions(+), 9 deletions(-)
===================================================================
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -390,6 +390,8 @@
asm("start_" #ops "_" #name ": " code "; end_" #ops "_" #name ":")
unsigned paravirt_patch_nop(void);
+unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len);
+unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len);
unsigned paravirt_patch_ignore(unsigned len);
unsigned paravirt_patch_call(void *insnbuf,
const void *target, u16 tgt_clobbers,
@@ -1378,6 +1380,9 @@
}
void _paravirt_nop(void);
+u32 _paravirt_ident_32(u32);
+u64 _paravirt_ident_64(u64);
+
#define paravirt_nop ((void *)_paravirt_nop)
void paravirt_use_bytelocks(void);
===================================================================
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -44,6 +44,17 @@
{
}
+/* identity function, which can be inlined */
+u32 _paravirt_ident_32(u32 x)
+{
+ return x;
+}
+
+u64 _paravirt_ident_64(u64 x)
+{
+ return x;
+}
+
static void __init default_banner(void)
{
printk(KERN_INFO "Booting paravirtualized kernel on %s\n",
@@ -138,9 +149,16 @@
if (opfunc == NULL)
/* If there's no function, patch it with a ud2a (BUG) */
ret = paravirt_patch_insns(insnbuf, len, ud2a,
ud2a+sizeof(ud2a));
- else if (opfunc == paravirt_nop)
+ else if (opfunc == _paravirt_nop)
/* If the operation is a nop, then nop the callsite */
ret = paravirt_patch_nop();
+
+ /* identity functions just return their single argument */
+ else if (opfunc == _paravirt_ident_32)
+ ret = paravirt_patch_ident_32(insnbuf, len);
+ else if (opfunc == _paravirt_ident_64)
+ ret = paravirt_patch_ident_64(insnbuf, len);
+
else if (type == PARAVIRT_PATCH(pv_cpu_ops.iret) ||
type == PARAVIRT_PATCH(pv_cpu_ops.irq_enable_sysexit) ||
type == PARAVIRT_PATCH(pv_cpu_ops.usergs_sysret32) ||
@@ -373,6 +391,45 @@
#endif
};
+typedef pte_t make_pte_t(pteval_t);
+typedef pmd_t make_pmd_t(pmdval_t);
+typedef pud_t make_pud_t(pudval_t);
+typedef pgd_t make_pgd_t(pgdval_t);
+
+typedef pteval_t pte_val_t(pte_t);
+typedef pmdval_t pmd_val_t(pmd_t);
+typedef pudval_t pud_val_t(pud_t);
+typedef pgdval_t pgd_val_t(pgd_t);
+
+
+#if defined(CONFIG_X86_32) && !defined(CONFIG_X86_PAE)
+/* 32-bit pagetable entries */
+#define paravirt_native_make_pte (make_pte_t *)_paravirt_ident_32
+#define paravirt_native_pte_val (pte_val_t *)_paravirt_ident_32
+
+#define paravirt_native_make_pmd (make_pmd_t *)_paravirt_ident_32
+#define paravirt_native_pmd_val (pmd_val_t *)_paravirt_ident_32
+
+#define paravirt_native_make_pud (make_pud_t *)_paravirt_ident_32
+#define paravirt_native_pud_val (pud_val_t *)_paravirt_ident_32
+
+#define paravirt_native_make_pgd (make_pgd_t *)_paravirt_ident_32
+#define paravirt_native_pgd_val (pgd_val_t *)_paravirt_ident_32
+#else
+/* 64-bit pagetable entries */
+#define paravirt_native_make_pte (make_pte_t *)_paravirt_ident_64
+#define paravirt_native_pte_val (pte_val_t *)_paravirt_ident_64
+
+#define paravirt_native_make_pmd (make_pmd_t *)_paravirt_ident_64
+#define paravirt_native_pmd_val (pmd_val_t *)_paravirt_ident_64
+
+#define paravirt_native_make_pud (make_pud_t *)_paravirt_ident_64
+#define paravirt_native_pud_val (pud_val_t *)_paravirt_ident_64
+
+#define paravirt_native_make_pgd (make_pgd_t *)_paravirt_ident_64
+#define paravirt_native_pgd_val (pgd_val_t *)_paravirt_ident_64
+#endif
+
struct pv_mmu_ops pv_mmu_ops = {
#ifndef CONFIG_X86_64
.pagetable_setup_start = native_pagetable_setup_start,
@@ -424,21 +481,21 @@
.pmd_clear = native_pmd_clear,
#endif
.set_pud = native_set_pud,
- .pmd_val = native_pmd_val,
- .make_pmd = native_make_pmd,
+ .pmd_val = paravirt_native_pmd_val,
+ .make_pmd = paravirt_native_make_pmd,
#if PAGETABLE_LEVELS == 4
- .pud_val = native_pud_val,
- .make_pud = native_make_pud,
+ .pud_val = paravirt_native_pud_val,
+ .make_pud = paravirt_native_make_pud,
.set_pgd = native_set_pgd,
#endif
#endif /* PAGETABLE_LEVELS >= 3 */
- .pte_val = native_pte_val,
- .pgd_val = native_pgd_val,
+ .pte_val = paravirt_native_pte_val,
+ .pgd_val = paravirt_native_pgd_val,
- .make_pte = native_make_pte,
- .make_pgd = native_make_pgd,
+ .make_pte = paravirt_native_make_pte,
+ .make_pgd = paravirt_native_make_pgd,
.dup_mmap = paravirt_nop,
.exit_mmap = paravirt_nop,
===================================================================
--- a/arch/x86/kernel/paravirt_patch_32.c
+++ b/arch/x86/kernel/paravirt_patch_32.c
@@ -12,6 +12,18 @@
DEF_NATIVE(pv_cpu_ops, clts, "clts");
DEF_NATIVE(pv_cpu_ops, read_tsc, "rdtsc");
+unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len)
+{
+ /* arg in %eax, return in %eax */
+ return 0;
+}
+
+unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len)
+{
+ /* arg in %edx:%eax, return in %edx:%eax */
+ return 0;
+}
+
unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
unsigned long addr, unsigned len)
{
===================================================================
--- a/arch/x86/kernel/paravirt_patch_64.c
+++ b/arch/x86/kernel/paravirt_patch_64.c
@@ -19,6 +19,21 @@
DEF_NATIVE(pv_cpu_ops, usergs_sysret32, "swapgs; sysretl");
DEF_NATIVE(pv_cpu_ops, swapgs, "swapgs");
+DEF_NATIVE(, mov32, "mov %edi, %eax");
+DEF_NATIVE(, mov64, "mov %rdi, %rax");
+
+unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len)
+{
+ return paravirt_patch_insns(insnbuf, len,
+ start__mov32, end__mov32);
+}
+
+unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len)
+{
+ return paravirt_patch_insns(insnbuf, len,
+ start__mov64, end__mov64);
+}
+
unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
unsigned long addr, unsigned len)
{
--- End Message ---
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel