WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

[Xen-devel] Re: lmbench lat_mmap slowdown with CONFIG_PARAVIRT

To: Ingo Molnar <mingo@xxxxxxx>
Subject: [Xen-devel] Re: lmbench lat_mmap slowdown with CONFIG_PARAVIRT
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Tue, 27 Jan 2009 02:17:39 -0800
Cc: Zachary Amsden <zach@xxxxxxxxxx>, Nick Piggin <npiggin@xxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, "jeremy@xxxxxxxxxxxxx" <jeremy@xxxxxxxxxxxxx>, "rusty@xxxxxxxxxxxxxxx" <rusty@xxxxxxxxxxxxxxx>, Linux Kernel Mailing List <linux-kernel@xxxxxxxxxxxxxxx>, "chrisw@xxxxxxxxxxxx" <chrisw@xxxxxxxxxxxx>, "hpa@xxxxxxxxx" <hpa@xxxxxxxxx>, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Delivery-date: Tue, 27 Jan 2009 02:18:30 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20090127075912.GA6551@xxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <20090120112634.GA20858@xxxxxxx> <20090120140324.GA26424@xxxxxxx> <49763806.5090009@xxxxxxxx> <20090120205653.GA19710@xxxxxxx> <20090121072718.GN24891@xxxxxxxxxxxxx> <4977A051.8050203@xxxxxxxx> <1232663311.16317.176.camel@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <4978F6C6.3090003@xxxxxxxx> <1232664907.16317.182.camel@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <49790BCC.1040807@xxxxxxxx> <20090127075912.GA6551@xxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Thunderbird 2.0.0.19 (X11/20090105)
Ingo Molnar wrote:
ping?

This is a very serious paravirt_ops slowdown affecting the native kernel's performance to the tune of 5-10% in certain workloads.

It's been about 2 years ago that paravirt_ops went upstream, when you told us that something like this would never happen, that paravirt_ops is designed so flexibly that it will never hinder the native kernel - and if it does it will be easy to fix it. Now is the time to fulfill that promise.

I couldn't exactly reproduce your results, but I guess they're similar in shape. Comparing 2.6.29-rc2-nopv with -pvops, I saw this ratio (pass 1-5). Interestingly I'm seeing identical instruction counts for pvops vs non-pvops, and a lower cycle count. The cache references are way up and the miss rate is up a bit, which I guess is the source of the slowdown.

With the attached patch, I get a clear improvement; it replaces the do-nothing pte_val/make_pte functions with inlined movs to move the argument to return, overpatching the 6-byte indirect call (on i386 it would just be all nopped out). CPU cycles and cache misses are way down, and the tick count is down from ~5% worse to ~2%. But the cache reference rate is even higher, which really doesn't make sense to me. But the patch is a clear improvement, and its hard to see how it could make anything worse (its always going to replace an indirect call with simple inlined code).

(Full numbers in spreadsheet.)

I have a couple of other patches to reduce the register pressure of the pvops calls, but I'm trying to work out how to make sure its not all to complex and/or fragile.

   J

Attachment: pvops-mmap-measurements.ods
Description: application/vnd.oasis.opendocument.spreadsheet

Subject: x86/pvops: add a paravirt_indent functions to allow special patching

Several paravirt ops implementations simply return their arguments,
the most obvious being the make_pte/pte_val class of operations on
native.

On 32-bit, the identity function is literally a no-op, as the calling
convention uses the same registers for the first argument and return.
On 64-bit, it can be implemented with a single "mov".

This patch adds special identity functions for 32 and 64 bit argument,
and machinery to recognize them and replace them with either nops or a
mov as appropriate.

At the moment, the only users for the identity functions are the
pagetable entry conversion functions.

The result is a measureable improvement on pagetable-heavy benchmarks
(2-3%, reducing the pvops overhead from 5 to 2%).

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
---
 arch/x86/include/asm/paravirt.h     |    5 ++
 arch/x86/kernel/paravirt.c          |   75 ++++++++++++++++++++++++++++++-----
 arch/x86/kernel/paravirt_patch_32.c |   12 +++++
 arch/x86/kernel/paravirt_patch_64.c |   15 +++++++
 4 files changed, 98 insertions(+), 9 deletions(-)

===================================================================
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -390,6 +390,8 @@
        asm("start_" #ops "_" #name ": " code "; end_" #ops "_" #name ":")
 
 unsigned paravirt_patch_nop(void);
+unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len);
+unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len);
 unsigned paravirt_patch_ignore(unsigned len);
 unsigned paravirt_patch_call(void *insnbuf,
                             const void *target, u16 tgt_clobbers,
@@ -1378,6 +1380,9 @@
 }
 
 void _paravirt_nop(void);
+u32 _paravirt_ident_32(u32);
+u64 _paravirt_ident_64(u64);
+
 #define paravirt_nop   ((void *)_paravirt_nop)
 
 void paravirt_use_bytelocks(void);
===================================================================
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -44,6 +44,17 @@
 {
 }
 
+/* identity function, which can be inlined */
+u32 _paravirt_ident_32(u32 x)
+{
+       return x;
+}
+
+u64 _paravirt_ident_64(u64 x)
+{
+       return x;
+}
+
 static void __init default_banner(void)
 {
        printk(KERN_INFO "Booting paravirtualized kernel on %s\n",
@@ -138,9 +149,16 @@
        if (opfunc == NULL)
                /* If there's no function, patch it with a ud2a (BUG) */
                ret = paravirt_patch_insns(insnbuf, len, ud2a, 
ud2a+sizeof(ud2a));
-       else if (opfunc == paravirt_nop)
+       else if (opfunc == _paravirt_nop)
                /* If the operation is a nop, then nop the callsite */
                ret = paravirt_patch_nop();
+
+       /* identity functions just return their single argument */
+       else if (opfunc == _paravirt_ident_32)
+               ret = paravirt_patch_ident_32(insnbuf, len);
+       else if (opfunc == _paravirt_ident_64)
+               ret = paravirt_patch_ident_64(insnbuf, len);
+
        else if (type == PARAVIRT_PATCH(pv_cpu_ops.iret) ||
                 type == PARAVIRT_PATCH(pv_cpu_ops.irq_enable_sysexit) ||
                 type == PARAVIRT_PATCH(pv_cpu_ops.usergs_sysret32) ||
@@ -373,6 +391,45 @@
 #endif
 };
 
+typedef pte_t make_pte_t(pteval_t);
+typedef pmd_t make_pmd_t(pmdval_t);
+typedef pud_t make_pud_t(pudval_t);
+typedef pgd_t make_pgd_t(pgdval_t);
+
+typedef pteval_t pte_val_t(pte_t);
+typedef pmdval_t pmd_val_t(pmd_t);
+typedef pudval_t pud_val_t(pud_t);
+typedef pgdval_t pgd_val_t(pgd_t);
+
+
+#if defined(CONFIG_X86_32) && !defined(CONFIG_X86_PAE)
+/* 32-bit pagetable entries */
+#define paravirt_native_make_pte       (make_pte_t *)_paravirt_ident_32
+#define paravirt_native_pte_val                (pte_val_t *)_paravirt_ident_32
+
+#define paravirt_native_make_pmd       (make_pmd_t *)_paravirt_ident_32
+#define paravirt_native_pmd_val                (pmd_val_t *)_paravirt_ident_32
+
+#define paravirt_native_make_pud       (make_pud_t *)_paravirt_ident_32
+#define paravirt_native_pud_val                (pud_val_t *)_paravirt_ident_32
+
+#define paravirt_native_make_pgd       (make_pgd_t *)_paravirt_ident_32
+#define paravirt_native_pgd_val                (pgd_val_t *)_paravirt_ident_32
+#else
+/* 64-bit pagetable entries */
+#define paravirt_native_make_pte       (make_pte_t *)_paravirt_ident_64
+#define paravirt_native_pte_val                (pte_val_t *)_paravirt_ident_64
+
+#define paravirt_native_make_pmd       (make_pmd_t *)_paravirt_ident_64
+#define paravirt_native_pmd_val                (pmd_val_t *)_paravirt_ident_64
+
+#define paravirt_native_make_pud       (make_pud_t *)_paravirt_ident_64
+#define paravirt_native_pud_val                (pud_val_t *)_paravirt_ident_64
+
+#define paravirt_native_make_pgd       (make_pgd_t *)_paravirt_ident_64
+#define paravirt_native_pgd_val                (pgd_val_t *)_paravirt_ident_64
+#endif
+
 struct pv_mmu_ops pv_mmu_ops = {
 #ifndef CONFIG_X86_64
        .pagetable_setup_start = native_pagetable_setup_start,
@@ -424,21 +481,21 @@
        .pmd_clear = native_pmd_clear,
 #endif
        .set_pud = native_set_pud,
-       .pmd_val = native_pmd_val,
-       .make_pmd = native_make_pmd,
+       .pmd_val = paravirt_native_pmd_val,
+       .make_pmd = paravirt_native_make_pmd,
 
 #if PAGETABLE_LEVELS == 4
-       .pud_val = native_pud_val,
-       .make_pud = native_make_pud,
+       .pud_val = paravirt_native_pud_val,
+       .make_pud = paravirt_native_make_pud,
        .set_pgd = native_set_pgd,
 #endif
 #endif /* PAGETABLE_LEVELS >= 3 */
 
-       .pte_val = native_pte_val,
-       .pgd_val = native_pgd_val,
+       .pte_val = paravirt_native_pte_val,
+       .pgd_val = paravirt_native_pgd_val,
 
-       .make_pte = native_make_pte,
-       .make_pgd = native_make_pgd,
+       .make_pte = paravirt_native_make_pte,
+       .make_pgd = paravirt_native_make_pgd,
 
        .dup_mmap = paravirt_nop,
        .exit_mmap = paravirt_nop,
===================================================================
--- a/arch/x86/kernel/paravirt_patch_32.c
+++ b/arch/x86/kernel/paravirt_patch_32.c
@@ -12,6 +12,18 @@
 DEF_NATIVE(pv_cpu_ops, clts, "clts");
 DEF_NATIVE(pv_cpu_ops, read_tsc, "rdtsc");
 
+unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len)
+{
+       /* arg in %eax, return in %eax */
+       return 0;
+}
+
+unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len)
+{
+       /* arg in %edx:%eax, return in %edx:%eax */
+       return 0;
+}
+
 unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
                      unsigned long addr, unsigned len)
 {
===================================================================
--- a/arch/x86/kernel/paravirt_patch_64.c
+++ b/arch/x86/kernel/paravirt_patch_64.c
@@ -19,6 +19,21 @@
 DEF_NATIVE(pv_cpu_ops, usergs_sysret32, "swapgs; sysretl");
 DEF_NATIVE(pv_cpu_ops, swapgs, "swapgs");
 
+DEF_NATIVE(, mov32, "mov %edi, %eax");
+DEF_NATIVE(, mov64, "mov %rdi, %rax");
+
+unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len)
+{
+       return paravirt_patch_insns(insnbuf, len,
+                                   start__mov32, end__mov32);
+}
+
+unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len)
+{
+       return paravirt_patch_insns(insnbuf, len,
+                                   start__mov64, end__mov64);
+}
+
 unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
                      unsigned long addr, unsigned len)
 {
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
<Prev in Thread] Current Thread [Next in Thread>