[Xen-devel] Some initial measurements comparing spinlock algorithms
I did some kernbench tests with various spinlock algorithms.
I tried the default ticket locks, the old lock-byte spinlock, and a
Xen-specific spin-then-block lock algorithm.
The test VM is a 4 vcpu guest with 1GB of memory, running on a 2 cpu
host. The idea is to provoke self-stealing due to over-committed CPUs,
exacerbating any bad preemption behaviours the various lock algorithms
may have. The kernel is my current pvops development tree, so
2.6.26-rc8+patches, running 32-bit.
I ran "kernbench -M", which avoids the "make -j" saturation test.
The first test was with ticket locks:
Fri Jul 4 13:25:54 BST 2008
2.6.26-rc8-tip - ticket locks
Average Half load -j 3 Run (std deviation):
Elapsed Time 503.002 (19.3737)
User Time 563.494 (0.562699)
System Time 146.404 (6.94372)
Percent CPU 141 (4.63681)
Context Switches 54069.4 (458.201)
Sleeps 49098.4 (367.281)
(Aborted optimal run after many hours. EIP sampled to __ticket_spin_lock+16)
The first half-load test finished in a reasonable time, but the
"optimal load" (make -j16) test never terminated. After around 6 hours
of running, it still hadn't got past the first of its 5 passes. Sampling
the EIP showed it was always in __ticket_spin_lock on all processors.
This is a pretty dramatic confirmation of Thomas's results.
The second test was with lock-byte spinlocks:
2.6.26-rc8-tip - bytelocks
Average Half load -j 3 Run (std deviation):
Elapsed Time 410.686 (2.49314)
User Time 564.596 (0.710408)
System Time 130.2 (0.519856)
Percent CPU 168.6 (1.34164)
Context Switches 53195.8 (599.579)
Sleeps 49026 (568.152)
Average Optimal load -j 16 Run (std deviation):
Elapsed Time 326.226 (0.158367)
User Time 552.268 (13.0477)
System Time 117.686 (13.2014)
Percent CPU 182.9 (15.103)
Context Switches 68198.8 (15849.9)
Sleeps 51708.1 (2857.7)
vcpu use:
fedora9-x86_32 246 0 1 -b- 2050.1 any cpu
fedora9-x86_32 246 1 0 -b- 2044.4 any cpu
fedora9-x86_32 246 2 1 -b- 2032.3 any cpu
fedora9-x86_32 246 3 0 -b- 2024.1 any cpu
This shows that the old spinlock behaviour performs better here. For
one thing, the test completed properly under load. The half-load test
shows about the same amount of user time, but less system time and
better cpu utilisation. "xm vcpu-list" shows about 2020-2050 seconds of
overall cpu use.
And with the xen-pv locks:
Fri Jul 4 18:37:36 BST 2008
2.6.26-rc8-tip - xenpv locks
Average Half load -j 3 Run (std deviation):
Elapsed Time 338.98 (0.932121)
User Time 567.326 (0.416569)
System Time 132.802 (1.56383)
Percent CPU 206 (0)
Context Switches 50225 (499.58)
Sleeps 48687.6 (542.278)
Average Optimal load -j 16 Run (std deviation):
Elapsed Time 323.176 (0.251555)
User Time 555.099 (12.898)
System Time 117.882 (15.7619)
Percent CPU 202.7 (3.49761)
Context Switches 67133.4 (17837.1)
Sleeps 51669.8 (3210.78)
fedora9-x86_32 4 0 1 -b- 1857.3 any cpu
fedora9-x86_32 4 1 1 -b- 1821.7 any cpu
fedora9-x86_32 4 2 0 -b- 1821.0 any cpu
fedora9-x86_32 4 3 0 r-- 1787.3 any cpu
The pv locks show a marked improvement again: the cpu utilisation is up
to the ideal 200%, and elapsed time is lower (at least for the half
load). System time and user time are about the same or slightly worse.
But the most significant result is the overall reduced CPU usage shown
by xm vcpu-list. This shows that even if guest performance is more or
less unchanged, the pv locks improve overall system scaling.
The pv-spinlock algorithm sets up an event channel for each vcpu. After
spinning for 2^10 iterations, it then falls into a poll hypercall
waiting for an event. When the lock holder releases the lock, it checks
to see if anyone is waiting and kicks them with an IPI event to unblock
them.
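For concreteness, the lock/unlock paths look roughly like the sketch
below. This is only an illustration of the scheme described above, not
the actual patch; the helper names (mark_this_vcpu_waiting,
clear_this_vcpu_waiting, this_vcpu_lock_irq, kick_waiting_vcpus) are
placeholders for the real per-vcpu bookkeeping and event-channel
plumbing.

    /* Illustrative sketch of the spin-then-block scheme described above;
     * not the actual patch.  Each vcpu binds a private event channel at
     * boot, used only as a wakeup target for the poll hypercall. */

    #define SPIN_THRESHOLD  (1 << 10)   /* spins before giving up and blocking */

    struct xen_spinlock {
        unsigned char lock;             /* 0 -> free, 1 -> held */
        unsigned short spinners;        /* vcpus blocked in the poll hypercall */
    };

    static void xen_spin_lock(struct xen_spinlock *xl)
    {
        for (;;) {
            unsigned count = SPIN_THRESHOLD;

            /* Fast path: ordinary spinning, hoping the holder isn't preempted. */
            while (count--) {
                if (xchg(&xl->lock, 1) == 0)
                    return;                         /* got the lock */
                cpu_relax();
            }

            /* Slow path: record which lock we're waiting for (so the
             * unlocker knows whom to kick), re-check, then block in Xen
             * until our event channel is signalled. */
            mark_this_vcpu_waiting(xl);             /* placeholder helper */
            xl->spinners++;

            if (xchg(&xl->lock, 1) == 0) {          /* last check before blocking */
                xl->spinners--;
                clear_this_vcpu_waiting();          /* placeholder helper */
                return;
            }

            xen_poll_irq(this_vcpu_lock_irq());     /* SCHEDOP_poll under the hood */

            xl->spinners--;
            clear_this_vcpu_waiting();              /* placeholder helper */
            /* ...and loop around to try for the lock again. */
        }
    }

    static void xen_spin_unlock(struct xen_spinlock *xl)
    {
        xl->lock = 0;                               /* release the lock */
        smp_mb();                                   /* release visible before check */

        /* If anyone gave up spinning and blocked, kick them with an event
         * (IPI) on their lock event channel to get them out of the poll. */
        if (unlikely(xl->spinners))
            kick_waiting_vcpus(xl);                 /* placeholder helper */
    }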
J