Re: [Xen-devel] [PATCH] Network Checksum Removal
Tests for domU->dom0, domU->host, and domU->domU are completed:
3.2 GHz Xeon with Hyperthreading, 2GB memory (correction)
Benchmark: netperf2 -t TCP_STREAM
dom0, dom1, and dom2 on cpu0 (first SMT thread on first core)
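For reference, here is a minimal sketch of how runs like these can be driven; it is not the actual harness used, and the receiver address and exact netperf flags are assumptions:

    #!/usr/bin/env python
    # Hypothetical sketch of the benchmark loop: run netperf TCP_STREAM
    # toward a receiver, varying the send message size, and print the
    # reported throughput and CPU utilisation.  The peer address is an
    # assumption, not the setup used for the numbers below.
    import subprocess

    RECEIVER = "192.168.0.1"          # assumed address of the netserver peer
    MSG_SIZES = [64, 1500, 16384, 32768]

    for size in MSG_SIZES:
        # -t TCP_STREAM selects the bulk-transfer test, -c/-C ask netperf
        # to report local/remote CPU utilisation, and the test-specific
        # -m option sets the send message size.
        cmd = ["netperf", "-H", RECEIVER, "-t", "TCP_STREAM", "-c", "-C",
               "--", "-m", str(size)]
        out = subprocess.check_output(cmd).decode()
        print("msg-size: %05d" % size)
        print(out)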
domU to host
hw tx csum
msg-size: 00064 Mbps: 0186 d0-cpu: 49.38 d1-cpu: 44.35
msg-size: 01500 Mbps: 0917 d0-cpu: 62.13 d1-cpu: 37.87
msg-size: 16384 Mbps: 0933 d0-cpu: 66.63 d1-cpu: 33.37
msg-size: 32768 Mbps: 0928 d0-cpu: 66.96 d1-cpu: 32.66
sw tx csum
msg-size: 00064 Mbps: 0187 d0-cpu: 49.50 d1-cpu: 44.52
msg-size: 01500 Mbps: 0904 d0-cpu: 60.63 d1-cpu: 39.36
msg-size: 16384 Mbps: 0924 d0-cpu: 63.98 d1-cpu: 35.98
msg-size: 32768 Mbps: 0926 d0-cpu: 64.18 d1-cpu: 35.68
^^about 2% reduction in cpu util on dom1^^
domU to dom0
hw tx csum
msg-size: 00064 Mbps: 0014 d0-cpu: 64.02 d1-cpu: 31.71
msg-size: 01500 Mbps: 1087 d0-cpu: 63.34 d1-cpu: 36.67
msg-size: 16384 Mbps: 1204 d0-cpu: 67.30 d1-cpu: 32.71
msg-size: 32768 Mbps: 1148 d0-cpu: 68.08 d1-cpu: 31.93
sw tx csum
msg-size: 00064 Mbps: 0014 d0-cpu: 64.88 d1-cpu: 32.39
msg-size: 01500 Mbps: 0948 d0-cpu: 62.20 d1-cpu: 37.80
msg-size: 16384 Mbps: 1063 d0-cpu: 64.73 d1-cpu: 35.27
msg-size: 32768 Mbps: 1012 d0-cpu: 65.71 d1-cpu: 34.30
^^up to 13% throughput increase, with cpu util down ~2% on dom1^^
Note the dismal performance for very small msg sizes
domU to domU
hw tx csum
msg-size:00064 Mbps: 0359 d0-cpu: 27.85 d1-cpu: 53.68 d2-cpu: 18.48
msg-size:01500 Mbps: 0594 d0-cpu: 47.42 d1-cpu: 21.77 d2-cpu: 30.78
msg-size:16384 Mbps: 0619 d0-cpu: 49.66 d1-cpu: 18.81 d2-cpu: 31.53
msg-size:32768 Mbps: 0616 d0-cpu: 49.58 d1-cpu: 18.68 d2-cpu: 31.74
sw tx csum
msg-size:00064 Mbps: 0361 d0-cpu: 27.81 d1-cpu: 53.58 d2-cpu: 18.62
msg-size:01500 Mbps: 0584 d0-cpu: 46.22 d1-cpu: 23.18 d2-cpu: 30.60
msg-size:16384 Mbps: 0602 d0-cpu: 47.99 d1-cpu: 20.33 d2-cpu: 31.69
msg-size:32768 Mbps: 0603 d0-cpu: 47.67 d1-cpu: 20.59 d2-cpu: 31.74
^^About a 2% throughput increase, and cpu down on d1^^
The cpu wasted on dom1 should be justification enough for a
point-to-point front end driver path for domU<->domU
communication.
dom0 on cpu0, dom1 on cpu2, and dom2 on cpu3 (dom1 and dom2 on same
core)
domU to host
hw tx csum
msg-size: 00064 Mbps: 0540 d0-cpu: 92.98 d1-cpu: 100.00
msg-size: 01500 Mbps: 0941 d0-cpu: 99.74 d1-cpu: 48.62
msg-size: 16384 Mbps: 0941 d0-cpu: 99.71 d1-cpu: 43.32
msg-size: 32768 Mbps: 0941 d0-cpu: 99.72 d1-cpu: 43.21
sw tx csum
msg-size: 00064 Mbps: 0545 d0-cpu: 93.47 d1-cpu: 100.00
msg-size: 01500 Mbps: 0941 d0-cpu: 99.76 d1-cpu: 51.43
msg-size: 16384 Mbps: 0941 d0-cpu: 99.69 d1-cpu: 46.58
msg-size: 32768 Mbps: 0941 d0-cpu: 99.72 d1-cpu: 45.39
^^Finally at wire speed, but at the cost of ~100% cpu on dom0.
This cpu util seems excessive; maybe oprofile will show
some problems. Notice dom1 has ~2% lower cpu with hw tx csum.^^
domU to dom0
hw tx csum
msg-size: 00064 Mbps: 0390 d0-cpu: 97.92 d1-cpu: 100.00
msg-size: 01500 Mbps: 1571 d0-cpu: 97.36 d1-cpu: 54.83
msg-size: 16384 Mbps: 1582 d0-cpu: 96.20 d1-cpu: 49.93
msg-size: 32768 Mbps: 1596 d0-cpu: 96.32 d1-cpu: 49.63
sw tx csum
msg-size: 00064 Mbps: 0375 d0-cpu: 97.65 d1-cpu: 100.00
msg-size: 01500 Mbps: 1546 d0-cpu: 96.36 d1-cpu: 52.99
msg-size: 16384 Mbps: 1598 d0-cpu: 95.88 d1-cpu: 47.48
msg-size: 32768 Mbps: 1641 d0-cpu: 95.89 d1-cpu: 46.37
^^Very slightly better avg throughput, and lower cpu on dom1^^
domU to domU
hw tx csum
msg-size:00064 Mbps: 0287 d0-cpu: 84.97 d1-cpu: 100.0 d2-cpu: 75.46
msg-size:01500 Mbps: 1004 d0-cpu: 90.98 d1-cpu: 68.29 d2-cpu: 76.94
msg-size:16384 Mbps: 1018 d0-cpu: 89.78 d1-cpu: 60.82 d2-cpu: 78.12
msg-size:32768 Mbps: 1010 d0-cpu: 89.30 d1-cpu: 59.83 d2-cpu: 77.99
sw tx csum
msg-size:00064 Mbps: 0286 d0-cpu: 84.81 d1-cpu: 99.93 d2-cpu: 76.28
msg-size:01500 Mbps: 1018 d0-cpu: 91.30 d1-cpu: 67.27 d2-cpu: 75.08
msg-size:16384 Mbps: 1012 d0-cpu: 88.46 d1-cpu: 55.56 d2-cpu: 71.37
msg-size:32768 Mbps: 1017 d0-cpu: 88.33 d1-cpu: 54.96 d2-cpu: 70.96
^^About the same throughput, but ~4% lower cpu on d1^^
Again, point-to-point front end communication would be great here.
IMO, the patch is a good thing. There are other very major
issues with networking, like the massive cpu overhead for dom0. I
wonder if we could have a layer 2 networking model like:
-Xen hosts front end ethernet drivers only
-dom0 has a Xen bridge front end driver, just to put eth0 (or whatever
phys dev) on it
-no domain-hosted bridge device or backend ethernet drivers
With this, Xen acts as an ethernet "switch", switching ethernet traffic
in Xen itself, without the help of a domain-hosted bridge. Packets are
forwarded either to a domain's front end driver or to the front end
bridge interface in dom0 (or any other driver domain). With this we
may have better control over emulating offload functions, and we should
avoid some hops (in many cases, hops through dom0) for the network
traffic. Comments?
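To make the switching idea concrete, here is a purely illustrative sketch of the forwarding decision such an in-Xen switch would make; none of these names exist in Xen, and this is pseudo-code only:

    # Illustrative-only model of the proposed "Xen as layer 2 switch".
    # Learn the source MAC of each frame from a front end, then switch
    # subsequent frames either to the owning domain's front end or to the
    # front end that fronts the physical device (dom0's bridge front end,
    # or any other driver domain's).
    class XenL2Switch:
        def __init__(self, uplink_frontend):
            self.fdb = {}                 # MAC address -> owning front end
            self.uplink = uplink_frontend # front end attached to the phys dev

        def rx_from_frontend(self, frontend, frame):
            self.fdb[frame.src_mac] = frontend               # learn the sender
            target = self.fdb.get(frame.dst_mac, self.uplink)  # unknown -> uplink
            target.deliver(frame)         # single hop, no domain-hosted bridge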
-Andrew