Re: [Xen-devel] Poor SMP performance pv_ops domU

To: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Subject: Re: [Xen-devel] Poor SMP performance pv_ops domU
From: John Morrison <john@xxxxxxxxxxxxx>
Date: Wed, 19 May 2010 17:24:20 +0100
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
I've tried various kernels today; pv_ops seems to use only 1 core out of 8.

PV spinlocks make no difference.

What stands out most is that I cannot get dom0 (xen-3.4.2) to show more
than about 99.7% CPU usage for any pv_ops kernel.

#!/usr/bin/perl
# Busy loop: keeps one CPU at 100% until the process is killed.
while (1) {}

Running 8 of these loads a 2.6.18.8-xenU domU to nearly 800% CPU as shown in dom0.
Running the same 8 in any pv_ops kernel only gets as high as about 99.7%.
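The eight-busy-loop test can be taken one step further by counting how much work the loops actually complete, since CPU percentages alone do not show whether throughput scales. The following is a minimal sketch, written in Python rather than Perl, and not part of the original test: the helper names total_iterations and speedup are hypothetical. It assumes that a speedup near the worker count means the vCPUs run in parallel, and that a speedup near 1 means they are being serialized.

```python
import os
import subprocess
import sys

# Child program: spin until the deadline, then report the iteration count.
SPIN = """
import sys, time
deadline = time.monotonic() + float(sys.argv[1])
count = 0
while time.monotonic() < deadline:
    count += 1
print(count)
"""


def total_iterations(workers, seconds=1.0):
    """Run `workers` busy loops in parallel; return their combined iteration count."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", SPIN, str(seconds)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in range(workers)
    ]
    # communicate() waits for each child and returns its printed count.
    return sum(int(p.communicate()[0]) for p in procs)


def speedup(workers, seconds=1.0):
    """Combined throughput of `workers` loops relative to a single loop."""
    single = total_iterations(1, seconds)
    return total_iterations(workers, seconds) / single


if __name__ == "__main__":
    n = os.cpu_count() or 1
    print(f"{n} workers: {speedup(n):.2f}x the single-worker throughput")
```

Running it in both the 2.6.18.8-xenU and pv_ops guests and comparing the printed speedup would give a number to set beside the dom0 figures.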

Inside both the pv_ops and xenU kernels, top -s shows all 8 cores being used.
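The mismatch between top inside the guest and the dom0 view can be cross-checked from inside the domU using the steal column of /proc/stat, which counts time spent in other operating systems when running in a virtualized environment. The sketch below (Python; the function names are hypothetical and nothing here comes from the original thread) samples /proc/stat twice and prints the steal percentage per vCPU. Whether the pv_ops kernels under test report steal time accurately on Xen 3.4.2 is an assumption that would need checking.

```python
import os
import time

# First eight per-CPU columns of /proc/stat, in order.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_proc_stat(text):
    """Return {cpu_name: {field: ticks}} for the per-CPU lines of /proc/stat."""
    cpus = {}
    for line in text.splitlines():
        parts = line.split()
        # Per-CPU lines look like "cpu0 ..."; skip the aggregate "cpu" line.
        if parts and parts[0].startswith("cpu") and parts[0] != "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            cpus[parts[0]] = dict(zip(FIELDS, values))
    return cpus


def steal_percent(before, after):
    """Share of each vCPU's elapsed ticks that was accounted as steal."""
    result = {}
    for cpu, new in after.items():
        old = before.get(cpu)
        if old is None:
            continue
        delta = {k: new[k] - old.get(k, 0) for k in new}
        total = sum(delta.values())
        result[cpu] = 100.0 * delta.get("steal", 0) / total if total else 0.0
    return result


def read_proc_stat():
    with open("/proc/stat") as f:
        return parse_proc_stat(f.read())


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = read_proc_stat()
    time.sleep(1)
    for cpu, pct in sorted(steal_percent(first, read_proc_stat()).items()):
        print(f"{cpu}: {pct:.1f}% steal")
```

If the eight vCPUs were being multiplexed onto a single physical core, one would expect high steal on most of them while the busy loops run; low steal alongside a 99.7% dom0 figure would point elsewhere.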


John

On 18 May 2010, at 19:38, Jeremy Fitzhardinge wrote:

> On 05/18/2010 10:34 AM, John Morrison wrote:
>> Hi,
>> 
>> Over the last year we have tried many times to get acceptable performance 
>> from pv_ops kernels.
>> 
>> Tests were done with 1, 2, 4 and 8 cores. The more cores, the lower the score.
>> 
>> Inside the domU all cores are visible, and top -s shows all of them in use.
>> xentop in dom0 never shows over 99% CPU.
>> 
>> The 2.6.18.8-xenU kernel shows over 700% CPU and its scores are about 8x the
>> pv_ops score.
>> 
>> Any ideas?
>> 
> 
> Well, I guess some kind of bad serialization is going on in there, and
> it should be fairly obvious with a bit of examination.
> 
> Have you tried building your own pvops domu kernels?  Does enabling PV
> spinlocks make any difference?  Also enabling some of the lock
> debugging/profiling/contention monitoring stuff may give useful results.
> 
> Can you post the corresponding 2.6.18 results?  Are there specific
> sub-tests which show the effect more strongly than the others?
> 
> How does the 2.6.32 kernel fare when booted native?
> 
> Thanks,
>    J
> 
>> 
>> John
>> 
>> 
>> 1-core
>> 
>> BYTE UNIX Benchmarks (Version 4.1-wht.2)
>> System -- Linux test 2.6.32-21-server #32-Ubuntu SMP Fri Apr 16 09:17:34 UTC 2010 x86_64 GNU/Linux
>> /dev/xvda1           141110136   1066476 132875660   1% /
>> 
>> Start Benchmark Run: Tue May 18 13:54:54 BST 2010
>> 13:54:54 up 0 min,  1 user,  load average: 0.00, 0.00, 0.00
>> 
>> End Benchmark Run: Tue May 18 14:06:12 BST 2010
>> 14:06:12 up 11 min,  2 users,  load average: 11.48, 5.20, 2.43
>> 
>> 
>>                     INDEX VALUES
>> TEST                                        BASELINE     RESULT      INDEX
>> 
>> Dhrystone 2 using register variables        376783.7  8950813.0      237.6
>> Double-Precision Whetstone                      83.1     2103.7      253.2
>> Execl Throughput                               188.3     1568.4       83.3
>> File Copy 1024 bufsize 2000 maxblocks         2672.0    64198.0      240.3
>> File Copy 256 bufsize 500 maxblocks           1077.0    17781.0      165.1
>> File Read 4096 bufsize 8000 maxblocks        15382.0   643717.0      418.5
>> Pipe-based Context Switching                 15448.6    85379.4       55.3
>> Pipe Throughput                             111814.6   478490.1       42.8
>> Process Creation                               569.3     3329.6       58.5
>> Shell Scripts (8 concurrent)                    44.8      380.7       85.0
>> System Call Overhead                        114433.5   498712.3       43.6
>>                                                                 =========
>>     FINAL SCORE                                                     114.1
>> 
>> 2-cores
>> 
>> ==============================================================
>> BYTE UNIX Benchmarks (Version 4.1-wht.2)
>> System -- Linux test 2.6.32-21-server #32-Ubuntu SMP Fri Apr 16 09:17:34 UTC 2010 x86_64 GNU/Linux
>> /dev/xvda1           141110136   1066548 132875588   1% /
>> 
>> Start Benchmark Run: Tue May 18 14:07:27 BST 2010
>> 14:07:27 up 0 min,  1 user,  load average: 0.00, 0.00, 0.00
>> 
>> End Benchmark Run: Tue May 18 14:18:04 BST 2010
>> 14:18:04 up 10 min,  1 user,  load average: 12.78, 5.53, 2.49
>> 
>> 
>>                     INDEX VALUES
>> TEST                                        BASELINE     RESULT      INDEX
>> 
>> Dhrystone 2 using register variables        376783.7 10124838.6      268.7
>> Double-Precision Whetstone                      83.1     1188.7      143.0
>> Execl Throughput                               188.3     1596.2       84.8
>> File Copy 1024 bufsize 2000 maxblocks         2672.0    58323.0      218.3
>> File Copy 256 bufsize 500 maxblocks           1077.0    17776.0      165.1
>> File Read 4096 bufsize 8000 maxblocks        15382.0   568217.0      369.4
>> Pipe-based Context Switching                 15448.6    86111.3       55.7
>> Pipe Throughput                             111814.6   469957.8       42.0
>> Process Creation                               569.3     3298.1       57.9
>> Shell Scripts (8 concurrent)                    44.8      378.9       84.6
>> System Call Overhead                        114433.5   532828.4       46.6
>>                                                                 =========
>>     FINAL SCORE                                                     107.9
>> 
>> 4-cores
>> 
>> ==============================================================
>> BYTE UNIX Benchmarks (Version 4.1-wht.2)
>> System -- Linux test 2.6.32-21-server #32-Ubuntu SMP Fri Apr 16 09:17:34 UTC 2010 x86_64 GNU/Linux
>> /dev/xvda1           141110136   1066628 132875508   1% /
>> 
>> Start Benchmark Run: Tue May 18 14:19:17 BST 2010
>> 14:19:17 up 0 min,  1 user,  load average: 0.00, 0.00, 0.00
>> 
>> End Benchmark Run: Tue May 18 14:29:53 BST 2010
>> 14:29:53 up 10 min,  1 user,  load average: 13.59, 6.35, 2.97
>> 
>> 
>>                     INDEX VALUES
>> TEST                                        BASELINE     RESULT      INDEX
>> 
>> Dhrystone 2 using register variables        376783.7 10185429.8      270.3
>> Double-Precision Whetstone                      83.1      759.8       91.4
>> Execl Throughput                               188.3     1386.2       73.6
>> File Copy 1024 bufsize 2000 maxblocks         2672.0    62331.0      233.3
>> File Copy 256 bufsize 500 maxblocks           1077.0    16492.0      153.1
>> File Read 4096 bufsize 8000 maxblocks        15382.0   563402.0      366.3
>> Pipe-based Context Switching                 15448.6    87176.0       56.4
>> Pipe Throughput                             111814.6   481068.1       43.0
>> Process Creation                               569.3     3128.9       55.0
>> Shell Scripts (8 concurrent)                    44.8      394.9       88.1
>> System Call Overhead                        114433.5   539996.1       47.2
>>                                                                 =========
>>     FINAL SCORE                                                     102.6
>> 
>> 8-cores
>> 
>> ==============================================================
>> BYTE UNIX Benchmarks (Version 4.1-wht.2, 8 threads)
>> System -- Linux test 2.6.32-21-server #32-Ubuntu SMP Fri Apr 16 09:17:34 UTC 2010 x86_64 GNU/Linux
>> /dev/xvda1           141110136   1066680 132875456   1% /
>> 
>> Start Benchmark Run: Tue May 18 14:30:59 BST 2010
>> 14:30:59 up 0 min,  1 user,  load average: 0.07, 0.02, 0.00
>> 
>> End Benchmark Run: Tue May 18 14:42:52 BST 2010
>> 14:42:52 up 12 min,  1 user,  load average: 25.56, 10.84, 4.96
>> 
>> 
>>                     INDEX VALUES
>> TEST                                        BASELINE     RESULT      INDEX
>> 
>> Dhrystone 2 using register variables        376783.7  9972130.3      264.7
>> Double-Precision Whetstone                      83.1      755.2       90.9
>> Execl Throughput                               188.3     1584.7       84.2
>> File Copy 1024 bufsize 2000 maxblocks         2672.0    58981.0      220.7
>> File Copy 256 bufsize 500 maxblocks           1077.0    16904.0      157.0
>> File Read 4096 bufsize 8000 maxblocks        15382.0   557735.0      362.6
>> Pipe-based Context Switching                 15448.6    80738.2       52.3
>> Pipe Throughput                             111814.6   450891.2       40.3
>> Process Creation                               569.3     2948.5       51.8
>> Shell Scripts (8 concurrent)                    44.8      378.1       84.4
>> System Call Overhead                        114433.5   537443.2       47.0
>>                                                                 =========
>>     FINAL SCORE                                                     100.9
> 
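The UnixBench tables quoted above can be reduced to a few ratios. Each INDEX value is consistent with RESULT / BASELINE * 10, and the 1-core FINAL SCORE matches the geometric mean of the eleven index values; both relationships are inferred from the numbers rather than stated in the thread. The sketch below (Python, with hypothetical helper names) recomputes them and shows how the final score and the Whetstone result change with core count, using only figures copied from the tables.

```python
import math


def index(baseline, result):
    """UnixBench-style index: the result relative to the baseline, times 10."""
    return result / baseline * 10


def geometric_mean(values):
    """Geometric mean computed via the mean of the logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# INDEX column of the 1-core run, copied from the quoted table.
ONE_CORE_INDICES = [237.6, 253.2, 83.3, 240.3, 165.1, 418.5,
                    55.3, 42.8, 58.5, 85.0, 43.6]

# FINAL SCORE and Double-Precision Whetstone RESULT, keyed by core count.
FINAL_SCORE = {1: 114.1, 2: 107.9, 4: 102.6, 8: 100.9}
WHETSTONE = {1: 2103.7, 2: 1188.7, 4: 759.8, 8: 755.2}


def relative_to_one_core(by_cores):
    """Express each entry as a fraction of the 1-core value."""
    base = by_cores[1]
    return {cores: value / base for cores, value in by_cores.items()}


if __name__ == "__main__":
    print(f"Dhrystone index (1 core): {index(376783.7, 8950813.0):.1f}")
    print(f"Geometric mean of 1-core indices: {geometric_mean(ONE_CORE_INDICES):.1f}")
    for name, table in (("FINAL SCORE", FINAL_SCORE), ("Whetstone", WHETSTONE)):
        for cores, ratio in relative_to_one_core(table).items():
            print(f"{name:12s} {cores} core(s): {ratio:.2f}x of 1-core")
```

Whetstone is the sub-test that degrades most as cores are added; by contrast, the other index values in the tables stay within roughly 15% of their 1-core values.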


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel