Re: [Xen-devel] New MPI benchmark performance results (update)
Ian,
Thanks for the response.
In the graphs presented on the webpage, we take the results
of native Linux as the reference and normalize the other 3
scenarios to it. We observe a general pattern: dom0 usually
performs better than domU with SMP, which in turn performs
better than domU without SMP (here "better performance" means
lower latency and higher throughput). However, we also notice
a very large performance gap between domU (w/o SMP) and native
Linux (or dom0, since dom0 generally performs very similarly
to native Linux). Some distinct examples are: 8-node SendRecv
latency (max domU/linux score ~ 18), 8-node Allgather latency
(max domU/linux score ~ 17), and 8-node Alltoall latency (max
domU/linux > 60). The performance difference in the last
example is HUGE, and we cannot think of a reasonable
explanation for why a 512B message size behaves so differently
from the other sizes. We would appreciate any insight you can
provide into such a large performance problem in these benchmarks.
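For concreteness, each plotted point is produced roughly as in the sketch
below (a hypothetical helper, not our actual post-processing script):
average the repeated runs of a scenario, then divide by the averaged
native Linux result for the same benchmark and message size.

    # Rough sketch of the normalization behind the graphs (hypothetical
    # helper; the real numbers come from the MPI benchmark output).
    from statistics import mean

    def normalized_score(scenario_runs, native_runs):
        """Latency of a scenario relative to native Linux.

        Both arguments are the raw latencies from the repeated runs of one
        benchmark at one message size. A score of ~18 means the scenario's
        average latency is ~18x the native Linux average.
        """
        return mean(scenario_runs) / mean(native_runs)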
> I still don't quite understand your experimental setup. What version of
> Xen are you using? How many CPUs does each node have? How many domU's do
> you run on a single node?
The Xen version is 2.0. Each node has 2 CPUs. "domU with SMP" in my previous email means that Xen
is booted with SMP support (no "nosmp" option) and I pin dom0 to the 1st CPU and domU to the 2nd
CPU; "domU with no SMP" means Xen is booted without SMP support (with the "nosmp" option) and both
dom0 and domU share the same single CPU. There is only one domU running on a single node for each
experiment.
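To make the pinning concrete, the relevant fragment of the domU config file
(the Python-syntax file passed to "xm create") looks roughly like the sketch
below; the kernel path, memory size, and domain name are placeholders rather
than our exact values.

    # Fragment of a Xen 2.0-style domU configuration (sketch, placeholder values).
    kernel = "/boot/vmlinuz-2.6-xenU"   # placeholder guest kernel path
    memory = 256                        # placeholder memory size in MB
    name   = "mpi-domU"                 # placeholder domain name

    # Pin this domU to the 2nd physical CPU (CPU 1); dom0 stays on CPU 0.
    # When the hypervisor is booted with "nosmp", only one CPU is available,
    # so dom0 and the domU share it regardless of this setting.
    cpu = 1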
> As regards the anomalous result for 512B AlltoAll performance, the best
> way to track this down would be to use xen-oprofile.
I am not very familiar with xen-oprofile. I notice there have been some discussions about it on the
mailing list. I wonder whether there is any other documentation I can refer to. Thanks.
> Is it reliably repeatable?
Yes, the anomaly is reliably repeatable. Each data point reported in the graph is the average of 10
different runs of the same experiment at different times.
> Really bad results are usually due to packets being dropped
> somewhere -- there hasn't been a whole lot of effort put into UDP
> performance because so few applications use it.
To clarify: do you mean that benchmarks like AlltoAll might use UDP rather than TCP as the
transport protocol?
Thanks again for the help.
Xuehai