Hi everyone,
First, let me establish the baseline here:
All default settings, no modifications to any sysctls. Same amount of
RAM for dom0 and the VM. (Note that TCP BIC is on by default.) Tests
run across a low-latency cluster, with everything on the same gigabit
switch. I'm using Xen 2.0.3 and netperf for all my tests.
Between dom0s on two machines in the cluster, I consistently get
~930Mbps. Between VMs on the same two machines, I get between 730 and
850 Mbps, but there's a lot more variation.
So far so good.
Now, I modify the TCP buffer sizes (both on dom0 and VM) thus:
net.ipv4.tcp_rmem = 4096 87380 8388608
net.ipv4.tcp_wmem = 4096 65536 8388608
net.ipv4.tcp_mem = 24576 32768 49152
net.core.rmem_default = 112640
net.core.wmem_default = 112640
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
net.ipv4.tcp_bic_low_window = 14
net.ipv4.tcp_bic_fast_convergence = 1
net.ipv4.tcp_bic = 1
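(For reference: a minimal sketch of how to apply these, assuming a
stock sysctl setup; either set values at runtime or load them from
/etc/sysctl.conf:)
# set individual values at runtime
sysctl -w net.core.rmem_max=8388608
sysctl -w net.core.wmem_max=8388608
# or append the block above to /etc/sysctl.conf and reload everything
sysctl -p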
Now, between dom0s on the two machines, I consistently get 880Mbps,
and between VMs I get around 850Mbps. So far so good.
But now comes the really interesting part. So far, these machines were
talking over the switch directly. Now I direct all traffic through a
dummynet router (on the same switch). The pipe connecting the two is
set to 500Mbps with an RTT of 80ms.
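(The dummynet pipe is configured with ipfw; roughly along these lines,
assuming the 80ms RTT is split as 40ms of one-way delay in each
direction, and matching all traffic for simplicity:)
# one 500Mbit/s pipe with 40ms of delay per direction
ipfw pipe 1 config bw 500Mbit/s delay 40ms
ipfw pipe 2 config bw 500Mbit/s delay 40ms
# push inbound and outbound traffic through the pipes
ipfw add 100 pipe 1 ip from any to any in
ipfw add 200 pipe 2 ip from any to any out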
Here are the results for dom0 to dom0 tests:
== Single flow, 10 seconds ==
[dgupta@sysnet03]$ netperf -H sysnet08
TCP STREAM TEST to sysnet08
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 65536 65536 10.11 158.55
== Single flow, 80 seconds ==
[dgupta@sysnet03]$ netperf -H sysnet08 -l 80
TCP STREAM TEST to sysnet08
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 65536 65536 80.72 344.20
== 50 flows, 80 seconds ==
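(The 50 flows come from a shell loop of the same form as the VM-to-VM
run further down, writing into the dom0-dom0-50.dat file used below;
roughly:)
for ((i=0; i<50; i++)); do netperf -P 0 -H sysnet08 -l 80 & done | tee dom0-dom0-50.dat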
87380 65536 65536 80.14 4.93
87380 65536 65536 80.18 9.37
87380 65536 65536 80.21 10.13
87380 65536 65536 80.22 9.11
87380 65536 65536 80.19 9.45
87380 65536 65536 80.22 5.06
87380 65536 65536 80.15 9.38
87380 65536 65536 80.20 9.98
87380 65536 65536 80.23 3.70
87380 65536 65536 80.20 9.14
87380 65536 65536 80.18 8.85
87380 65536 65536 80.16 8.96
87380 65536 65536 80.21 9.91
87380 65536 65536 80.18 9.46
87380 65536 65536 80.17 9.38
87380 65536 65536 80.18 9.82
87380 65536 65536 80.15 7.22
87380 65536 65536 80.16 8.64
87380 65536 65536 80.26 10.60
87380 65536 65536 80.22 9.33
87380 65536 65536 80.24 8.88
87380 65536 65536 80.22 9.54
87380 65536 65536 80.19 9.65
87380 65536 65536 80.20 9.70
87380 65536 65536 80.24 9.43
87380 65536 65536 80.19 8.10
87380 65536 65536 80.21 9.31
87380 65536 65536 80.18 9.08
87380 65536 65536 80.19 9.24
87380 65536 65536 80.27 9.91
87380 65536 65536 80.28 9.67
87380 65536 65536 80.24 9.50
87380 65536 65536 80.28 9.70
87380 65536 65536 80.24 10.09
87380 65536 65536 80.31 4.55
87380 65536 65536 80.28 5.93
87380 65536 65536 80.25 9.55
87380 65536 65536 80.32 5.60
87380 65536 65536 80.35 6.29
87380 65536 65536 80.27 4.75
87380 65536 65536 80.40 6.51
87380 65536 65536 80.39 6.38
87380 65536 65536 80.40 10.12
87380 65536 65536 80.53 4.62
87380 65536 65536 80.67 16.53
87380 65536 65536 81.10 4.53
87380 65536 65536 82.21 1.93
87380 65536 65536 80.09 9.43
87380 65536 65536 80.10 9.14
87380 65536 65536 80.13 9.88
[~]
[dgupta@sysnet03]$ awk '{sum+=$5} END {print sum,NR,sum/NR}' dom0-dom0-50.dat
419.96 50 8.3992
This gives the aggregate and the average per flow. Now I run the same test from VM to VM:
== Single flow, 10 seconds ==
root@tg3:~# netperf -H 172.19.222.101
TCP STREAM TEST to 172.19.222.101
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 65536 65536 10.15 22.30
== Single flow, 80 seconds ==
root@tg3:~# netperf -H 172.19.222.101 -l 80
TCP STREAM TEST to 172.19.222.101
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 65536 65536 80.17 76.96
== 50 flows, 80 seconds ==
root@tg3:~# for ((i=0;i<50;i++)); do netperf -P 0 -H 172.19.222.101 -l 80 & done | tee vm-vm-50.dat
87380 65536 65536 80.09 8.50
87380 65536 65536 80.08 6.46
87380 65536 65536 80.19 7.33
87380 65536 65536 80.20 7.29
87380 65536 65536 80.20 5.86
87380 65536 65536 80.23 8.40
87380 65536 65536 80.22 8.55
87380 65536 65536 80.22 7.34
87380 65536 65536 80.29 6.28
87380 65536 65536 80.28 7.23
87380 65536 65536 80.23 8.56
87380 65536 65536 80.25 6.60
87380 65536 65536 80.31 6.99
87380 65536 65536 80.27 8.22
87380 65536 65536 80.30 7.41
87380 65536 65536 80.33 8.21
87380 65536 65536 80.27 7.94
87380 65536 65536 80.32 6.54
87380 65536 65536 80.29 8.58
87380 65536 65536 80.35 7.37
87380 65536 65536 80.35 7.09
87380 65536 65536 80.37 7.23
87380 65536 65536 80.38 8.31
87380 65536 65536 80.38 8.18
87380 65536 65536 80.44 9.11
87380 65536 65536 80.43 4.95
87380 65536 65536 80.43 6.48
87380 65536 65536 80.42 8.11
87380 65536 65536 80.44 6.74
87380 65536 65536 80.47 8.76
87380 65536 65536 80.42 7.68
87380 65536 65536 80.45 6.10
87380 65536 65536 80.46 7.47
87380 65536 65536 80.51 7.37
87380 65536 65536 80.52 6.78
87380 65536 65536 80.48 7.31
87380 65536 65536 80.56 7.55
87380 65536 65536 80.57 6.85
87380 65536 65536 80.59 7.53
87380 65536 65536 80.63 7.01
87380 65536 65536 80.64 6.78
87380 65536 65536 80.60 5.76
87380 65536 65536 80.79 6.63
87380 65536 65536 80.79 6.29
87380 65536 65536 80.81 7.54
87380 65536 65536 80.81 7.22
87380 65536 65536 80.94 6.54
87380 65536 65536 80.90 8.02
87380 65536 65536 81.15 4.22
root@tg3:~# awk '{sum+=$5} END {print sum,NR,sum/NR}' vm-vm-50.dat
361.74 50 7.2348
Note the terrible performance with single flows. With 50 flows, the
aggregate improves, but it is still much worse than the dom0-to-dom0
results.
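For context, the bandwidth-delay product of that pipe works out to
roughly 5MB, so a single flow has to ramp its window up to several
megabytes before it can fill the link; a quick back-of-the-envelope
check:
# BDP = bandwidth * RTT = 500 Mbit/s * 80 ms, expressed in bytes
echo $(( 500 * 1000000 / 8 * 80 / 1000 ))    # prints 5000000 (~5MB)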
Any ideas why I'm getting such bad performance from the VMs on high
BDP links? I'm willing and interested to help in debugging and fixing
this issue, but I need some leads :)
TIA
--
Diwaker Gupta
http://resolute.ucsd.edu/diwaker
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users