[Xen-devel] Some initial measurements comparing spinlock algorithms

To: Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>, Thomas Friebel <thomas.friebel@xxxxxxx>
Subject: [Xen-devel] Some initial measurements comparing spinlock algorithms
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Fri, 04 Jul 2008 15:17:09 -0700
Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Fri, 04 Jul 2008 15:17:41 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Thunderbird 2.0.0.14 (X11/20080501)
I did some kernbench tests with various spinlock algorithms.

I tried the default ticket locks, the old lock-byte spinlock, and a Xen-specific spin-then-block lock algorithm.

The test VM is a 4 vcpu guest with 1GB of memory, running on a 2 cpu host. The idea is to provoke self-stealing due to over-committed CPUs, exacerbating any bad preemption behaviours the various lock algorithms may have. The kernel is my current pvops development tree, so 2.6.26-rc8+patches, running 32-bit.

I ran "kernbench -M", which avoids the "make -j" saturation test.

The first test was with ticket locks:

   Fri Jul  4 13:25:54 BST 2008
   2.6.26-rc8-tip - ticket locks
   Average Half load -j 3 Run (std deviation):
   Elapsed Time 503.002 (19.3737)
   User Time 563.494 (0.562699)
   System Time 146.404 (6.94372)
   Percent CPU 141 (4.63681)
   Context Switches 54069.4 (458.201)
   Sleeps 49098.4 (367.281)

   (Aborted optimal run after many hours.  EIP sampled to __ticket_spin_lock+16)
The first half-load test finished in a reasonable time, but the "optimal load" (make -j16) test never terminated. After around 6 hours of running, it hadn't got past the first of its 5 runs. Sampling the EIP showed it was always in __ticket_spin_lock on all processors. This is a pretty dramatic confirmation of Thomas's results.
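
For reference, here's a rough C sketch of what a ticket lock does (not the kernel's actual asm implementation), which shows why a preempted waiter stalls everyone queued behind it:

   /* Rough sketch of a ticket lock (not the kernel's actual asm
    * implementation).  Each waiter takes a ticket and spins until
    * "owner" reaches it, so lock handoff is strictly FIFO: if the vcpu
    * holding the next ticket has been preempted by the hypervisor,
    * every other waiter keeps spinning even though the lock is free. */

   #include <stdint.h>

   struct ticket_lock {
       volatile uint16_t next;   /* next ticket to hand out */
       volatile uint16_t owner;  /* ticket currently allowed in */
   };

   static void ticket_lock(struct ticket_lock *lk)
   {
       uint16_t me = __sync_fetch_and_add(&lk->next, 1);  /* take a ticket */

       /* progress now depends on the specific vcpu holding ticket
        * "owner" actually being scheduled */
       while (lk->owner != me)
           __asm__ __volatile__("pause" ::: "memory");
   }

   static void ticket_unlock(struct ticket_lock *lk)
   {
       __sync_fetch_and_add(&lk->owner, 1);   /* hand off to the next ticket */
   }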

The second test was with lock-byte spinlocks:

   2.6.26-rc8-tip - bytelocks
   Average Half load -j 3 Run (std deviation):
   Elapsed Time 410.686 (2.49314)
   User Time 564.596 (0.710408)
   System Time 130.2 (0.519856)
   Percent CPU 168.6 (1.34164)
   Context Switches 53195.8 (599.579)
   Sleeps 49026 (568.152)

   Average Optimal load -j 16 Run (std deviation):
   Elapsed Time 326.226 (0.158367)
   User Time 552.268 (13.0477)
   System Time 117.686 (13.2014)
   Percent CPU 182.9 (15.103)
   Context Switches 68198.8 (15849.9)
   Sleeps 51708.1 (2857.7)

   vcpu use:
   fedora9-x86_32 246 0 1 -b- 2050.1 any cpu
   fedora9-x86_32 246 1 0 -b- 2044.4 any cpu
   fedora9-x86_32 246 2 1 -b- 2032.3 any cpu
   fedora9-x86_32 246 3 0 -b- 2024.1 any cpu

This shows that the old spinlock algorithm performs better. For one thing, the test completed properly under load. The half-load test shows about the same amount of user time, but less system time and better CPU utilisation. "xm vcpu-list" shows about 2020-2050 seconds of overall cpu use.
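
The old lock-byte behaviour is roughly the following (again just a sketch, not the actual kernel code). Because any spinner that sees the byte clear can take the lock, whichever vcpu happens to be running can make progress, which is why it degrades more gracefully under preemption:

   /* Sketch of the old lock-byte spinlock: unfair, but any runnable
    * vcpu that sees the byte clear can take the lock, so a preempted
    * waiter doesn't block everyone else the way a ticket holder does. */

   #include <stdint.h>

   static void byte_lock(volatile uint8_t *lock)
   {
       /* xchg-style test-and-set: old value 0 means we got the lock */
       while (__sync_lock_test_and_set(lock, 1) != 0) {
           while (*lock)                          /* spin until it looks free */
               __asm__ __volatile__("pause" ::: "memory");
       }
   }

   static void byte_unlock(volatile uint8_t *lock)
   {
       __sync_lock_release(lock);                 /* store 0 to release */
   }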

And with the xen-pv locks:

   Fri Jul  4 18:37:36 BST 2008
   2.6.26-rc8-tip - xenpv locks
   Average Half load -j 3 Run (std deviation):
   Elapsed Time 338.98 (0.932121)
   User Time 567.326 (0.416569)
   System Time 132.802 (1.56383)
   Percent CPU 206 (0)
   Context Switches 50225 (499.58)
   Sleeps 48687.6 (542.278)

   Average Optimal load -j 16 Run (std deviation):
   Elapsed Time 323.176 (0.251555)
   User Time 555.099 (12.898)
   System Time 117.882 (15.7619)
   Percent CPU 202.7 (3.49761)
   Context Switches 67133.4 (17837.1)
   Sleeps 51669.8 (3210.78)

   fedora9-x86_32                       4     0     1   -b-    1857.3   any cpu
   fedora9-x86_32                       4     1     1   -b-    1821.7   any cpu
   fedora9-x86_32                       4     2     0   -b-    1821.0   any cpu
   fedora9-x86_32                       4     3     0   r--    1787.3   any cpu

The pv locks show a marked improvement again: CPU utilisation is up to the ideal 200%, and elapsed time is lower (at least for the half load). System time and user time are about the same or slightly worse. But the most significant result is the overall reduced CPU usage shown by xm vcpu-list: even if the guest's own performance is more or less unchanged, it improves overall system scaling.

The pv-spinlock algorithm sets up an event channel for each vcpu. After spinning for 2^10 iterations, it then falls into a poll hypercall waiting for an event. When the lock holder releases the lock, it checks to see if anyone is waiting and kicks them with an IPI event to unblock them.
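
In outline it looks something like the code below. This is only a sketch: xen_poll_event() and xen_send_event() are stand-ins for the real poll hypercall and event-channel send, and it glosses over the wakeup races the real code has to handle (the actual poll returns immediately if the event is already pending, which is what makes blocking safe):

   /* Sketch of the spin-then-block lock.  xen_poll_event() and
    * xen_send_event() are stand-ins for the real poll hypercall and
    * event-channel send; the per-vcpu bookkeeping is illustrative. */

   #include <stdint.h>
   #include <stdbool.h>

   #define SPIN_THRESHOLD (1 << 10)        /* spin ~2^10 times before blocking */
   #define MAX_VCPUS      32

   extern void xen_poll_event(int port);   /* block until event is pending */
   extern void xen_send_event(int port);   /* kick a vcpu via its event channel */
   extern int  this_vcpu(void);            /* id of the current vcpu */

   static int vcpu_event_port[MAX_VCPUS];  /* one event channel per vcpu */

   struct pv_spinlock {
       volatile uint8_t lock;              /* 0 = free, 1 = held */
       volatile bool waiting[MAX_VCPUS];   /* which vcpus are blocked in poll */
   };

   static void pv_spin_lock(struct pv_spinlock *lk)
   {
       for (;;) {
           /* fast path: plain spinning for a bounded number of iterations */
           for (int i = 0; i < SPIN_THRESHOLD; i++) {
               if (__sync_lock_test_and_set(&lk->lock, 1) == 0)
                   return;
               __asm__ __volatile__("pause" ::: "memory");
           }

           /* slow path: advertise that we're waiting, try once more,
            * then block in the poll hypercall until we're kicked */
           int me = this_vcpu();
           lk->waiting[me] = true;
           if (__sync_lock_test_and_set(&lk->lock, 1) == 0) {
               lk->waiting[me] = false;
               return;
           }
           xen_poll_event(vcpu_event_port[me]);
           lk->waiting[me] = false;
       }
   }

   static void pv_spin_unlock(struct pv_spinlock *lk)
   {
       __sync_lock_release(&lk->lock);     /* drop the lock */

       /* if anyone blocked waiting for it, kick one of them with an IPI event */
       for (int i = 0; i < MAX_VCPUS; i++) {
           if (lk->waiting[i]) {
               xen_send_event(vcpu_event_port[i]);
               break;
           }
       }
   }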

   J

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
