[Xen-devel] Re: lmbench lat_mmap slowdown with CONFIG_PARAVIRT 
To: Ingo Molnar <mingo@xxxxxxx>
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Cc: Nick Piggin <npiggin@xxxxxxx>, zach@xxxxxxxxxx, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, jeremy@xxxxxxxxxxxxx, rusty@xxxxxxxxxxxxxxx, Linux Kernel Mailing List <linux-kernel@xxxxxxxxxxxxxxx>, chrisw@xxxxxxxxxxxx, hpa@xxxxxxxxx, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Subject: [Xen-devel] Re: lmbench lat_mmap slowdown with CONFIG_PARAVIRT
Date: Tue, 20 Jan 2009 12:45:58 -0800
In-reply-to: <20090120140324.GA26424@xxxxxxx>
References: <20090120110542.GE19505@xxxxxxxxxxxxx> <20090120112634.GA20858@xxxxxxx> <20090120140324.GA26424@xxxxxxx>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
User-agent: Thunderbird 2.0.0.19 (X11/20090105)

Ingo Molnar wrote:
> * Ingo Molnar <mingo@xxxxxxx> wrote:
>
>>> Times I believe are in nanoseconds for lmbench, anyway lower is
>>> better.
>>
>> Ouch, that looks unacceptably expensive. All the major distros turn
>> CONFIG_PARAVIRT on. paravirt_ops was introduced in x86 with the
>> express promise to have no measurable runtime overhead.
>>
>>> non pv   AVG=464.22 STD=5.56
>>> paravirt AVG=502.87 STD=7.36
>>
>> Nearly 10% performance drop here, which is quite a bit... hopefully
>> people are testing the speed of their PV implementations against
>> non-PV bare metal :)
>
> Here are some more precise stats, done via hw counters on a
> perfcounters kernel using 'timec', running a modified version of the
> 'mmap performance stress-test' app I made years ago.
>
> The MM benchmark app can be downloaded from:
>
>    http://redhat.com/~mingo/misc/mmap-perf.c
>
> timec.c can be picked up from:
>
>    http://redhat.com/~mingo/perfcounters/timec.c
>
> mmap-perf conducts 1 million mmap()/munmap()/mremap() calls, and
> touches the mapped area as well with a certain chance. The patterns
> are pseudo-random, and the random seed is initialized to the same
> value, so repeated runs produce exactly the same mmap sequence.
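
(As an aside, the core of such a seeded, reproducible stress loop looks
roughly like the C sketch below. This is not the actual mmap-perf.c -
the real test also exercises mremap(), omitted here for brevity - it
just illustrates how a fixed seed makes every run replay the identical
mmap()/munmap() sequence:)

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define ITERATIONS 1000000
#define MAX_PAGES  64
#define PAGE_SZ    4096

int main(void)
{
        srand(42);  /* fixed seed: every run issues the same call sequence */

        for (long i = 0; i < ITERATIONS; i++) {
                size_t len = ((size_t)(rand() % MAX_PAGES) + 1) * PAGE_SZ;
                char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

                if (p == MAP_FAILED) {
                        perror("mmap");
                        return 1;
                }

                /* touch the mapped area "with a certain chance" */
                if (rand() % 2)
                        for (size_t off = 0; off < len; off += PAGE_SZ)
                                p[off] = 1;

                munmap(p, len);
        }
        return 0;
}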
> I ran the test with a single thread, bound to a single core:
>
>   # taskset 2 timec -e -5,-4,-3,0,1,2,3 ./mmap-perf 1
>
> [ I ran it as root - so that kernel-space hardware-counter statistics
>   are included as well. ]
>
> The results are surprisingly candid about the true cost of
> paravirt_ops overhead in a native kernel (CONFIG_PARAVIRT=y):
> -----------------------------------------------
> | Performance counter stats for './mmap-perf' |
> -----------------------------------------------
> |                |
> |  x86-defconfig |   PARAVIRT=y
> |------------------------------------------------------------------
> |
> |    1311.554526 |  1360.624932  task clock ticks (msecs)    +3.74%
> |                |
> |              1 |            1  CPU migrations
> |             91 |           79  context switches
> |          55945 |        55943  pagefaults
> |    ............................................
> |     3781392474 |   3918777174  CPU cycles                  +3.63%
> |     1957153827 |   2161280486  instructions               +10.43%
 
!!
 
> |       50234816 |     51303520  cache references            +2.12%
> |        5428258 |      5583728  cache misses                +2.86%
 
Is this I or D, or combined?
 
> |                |
> |    1314.782469 |  1363.694447  time elapsed (msecs)        +3.72%
> |                |
> -----------------------------------
>
> The most surprising element is that in the paravirt_ops case we run
> 204 million more instructions - out of the ~2000 million instructions
> total. That's an increase of over 10%!
 
Yow!  That's pretty awful.  We knew that the static instruction count
was up, but wouldn't have thought that it would hit the dynamic
instruction count so much...

I think there are some immediate tweaks we can make to the code
generated for each call site, which will help to an extent.
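
(To make that concrete: a native build inlines a privileged operation
as a single instruction, while a CONFIG_PARAVIRT build turns each use
into an indirect call through an ops structure, paying call/return
overhead and register clobbers at every call site. Below is a
simplified sketch of that pattern - not the kernel's actual PVOP_*
patching machinery or pv_ops layout - using write_cr3 as the example:)

/* Native build: the operation compiles to one inline instruction. */
static inline void native_write_cr3(unsigned long val)
{
        asm volatile("mov %0, %%cr3" : : "r" (val) : "memory");
}

/* CONFIG_PARAVIRT-style build: the same operation dispatches through
 * a function pointer, so every call site pays for an indirect call, a
 * return, and clobbered scratch registers.  (Hypothetical names, for
 * illustration only.) */
struct pv_mmu_ops_sketch {
        void (*write_cr3)(unsigned long val);
};

static struct pv_mmu_ops_sketch pv_mmu_ops_sketch = {
        .write_cr3 = native_write_cr3,  /* bare metal: points at native */
};

static inline void write_cr3_paravirt(unsigned long val)
{
        /* One obvious per-site tweak is to patch this indirect call
         * back into the single native instruction at boot time. */
        pv_mmu_ops_sketch.write_cr3(val);
}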
   J
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel