RE: [Xen-devel] MPI benchmark performance gap between native linux anddomU
To: "Nivedita Singhvi" <niv@xxxxxxxxxx>, "Bin Ren" <bin.ren@xxxxxxxxx>, "Andrew Theurer" <habanero@xxxxxxxxxx>
Subject: RE: [Xen-devel] MPI benchmark performance gap between native linux anddomU
From: "Santos, Jose Renato G (Jose Renato Santos)" <joserenato.santos@xxxxxx>
Date: Tue, 5 Apr 2005 17:17:51 -0700
Cc: "Turner, Yoshio" <yoshio_turner@xxxxxx>, Aravind Menon <aravind.menon@xxxxxxx>, Xen-devel@xxxxxxxxxxxxxxxxxxx, G John Janakiraman <john@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Wed, 06 Apr 2005 00:17:55 +0000
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AcU6LfZxUXcb0dHaQO2H6qEYW9ST+gADZVtA
Thread-topic: [Xen-devel] MPI benchmark performance gap between native linux anddomU

Nivedita, Bin, Andrew, and all interested in Xenoprof,

We should be posting the Xenoprof patches in a few days. We are doing
some final cleanup of the code, so please be a little more patient.

Thanks,
Renato
>> -----Original Message-----
>> From: Nivedita Singhvi [mailto:niv@xxxxxxxxxx]
>> Sent: Tuesday, April 05, 2005 3:23 PM
>> To: Santos, Jose Renato G (Jose Renato Santos)
>> Cc: xuehai zhang; Xen-devel@xxxxxxxxxxxxxxxxxxx; Turner, Yoshio; Aravind Menon; G John Janakiraman
>> Subject: Re: [Xen-devel] MPI benchmark performance gap between native linux anddomU
>>
>>
>> Santos, Jose Renato G (Jose Renato Santos) wrote:
>>
>> > Hi,
>> >
>> > We had a similar network problem in the past. We were using a TCP
>> > benchmark instead of MPI, but I believe your problem is probably
>> > the same as the one we encountered. It took us a while to get to
>> > the bottom of this, and we only identified the reason for this
>> > behavior after we ported oprofile to Xen and did some performance
>> > profiling experiments.
>>
>> Hello! Was this on the 2.6 kernel? Would you be able to share the
>> oprofile port? It would be very handy indeed right now. (I was told
>> by a few people that someone was porting oprofile, and I believe
>> some status on that went by on the list, but I haven't seen it
>> yet...)
>>
>> > Here is a brief explanation of the problem we found and the
>> > solution that worked for us.
>> > Xenolinux allocates a full page (4KB) to store each socket buffer,
>> > instead of using just MTU bytes as in traditional Linux. This is
>> > necessary to enable page exchanges between the guest and the I/O
>> > domains. The side effect is that memory space is not used very
>> > efficiently for socket buffers. Even if packets have the maximum
>> > MTU size (typically 1500 bytes for Ethernet), the total buffer
>> > utilization is very low (at most just slightly higher than 35%).
>> > If packets arrive faster than they are processed at the receiver
>> > side, they will exhaust the receive buffer
>>
>> Most small connections (say, up to 3-4KB) involve only 3 to 5
>> segments, and so the TCP window never really opens fully.
>> On longer-lived connections, it does help very much to have
>> a large buffer.
>>
>> > before the TCP advertised window is reached (by default Linux uses
>> > a TCP advertised window equal to 75% of the receive buffer size;
>> > in standard Linux, this is typically sufficient to stop packet
>> > transmission at the sender before running out of receive buffers.
>> > The same is not true in Xen, due to inefficient use of socket
>> > buffers). When a packet arrives and there is no receive buffer
>> > available, TCP tries to free socket buffer space by eliminating
>> > socket buffer fragmentation (i.e. eliminating wasted buffer space).
>> > This is done at the cost of an extra copy of all receive buffers
>> > into new, compacted socket buffers. This introduces overhead and
>> > reduces throughput when the CPU is the bottleneck, which seems to
>> > be your case.
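>> >
>> > (To put a rough number on the utilization figure above: a full-MTU
>> > 1500-byte Ethernet frame stored in its own 4096-byte page uses
>> > 1500/4096, i.e. about 36.6% of the buffer, and smaller packets
>> > waste proportionally more space.)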
>>
>> /proc/net/netstat will show a counter of just how many times
>> this happens (RcvPruned). Would be interesting if that was
>> significant.
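>> (For reference, one rough way to pull that counter straight out of
>> /proc/net/netstat, assuming the usual paired TcpExt header/value
>> lines; the awk below is only a sketch:)
>>
>>   awk '/^TcpExt/ { n++;
>>                    if (n == 1) split($0, h); else
>>                      for (i = 2; i <= NF; i++)
>>                        if (h[i] == "RcvPruned") print h[i] ": " $i
>>                  }' /proc/net/netstat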
>>
>> > This problem is not very frequent, because modern CPUs are fast
>> > enough to receive packets at Gigabit speeds and the receive buffer
>> > does not fill up. However, the problem may arise when using slower
>> > machines and/or when the workload consumes a lot of CPU cycles,
>> > such as, for example, scientific MPI applications. In your case
>> > you have both factors against you.
>>
>>
>> > The solution to this problem is trivial. You just have to change
>> > the TCP advertised window of your guest to a lower value. In our
>> > case, we used 25% of the receive buffer size and that was
>> > sufficient to eliminate the problem. This can be done using the
>> > following command:
>> >
>> >   echo -2 > /proc/sys/net/ipv4/tcp_adv_win_scale
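>> >
>> > (If it helps, the same setting can also be applied with sysctl,
>> > and made persistent via /etc/sysctl.conf; something like:
>> >
>> >   sysctl -w net.ipv4.tcp_adv_win_scale=-2
>> >
>> > should be equivalent.)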
>>
>> How much did this improve your results by? And wouldn't making the
>> default and max socket buffers larger by, say, 5 times be more
>> effective (other than for those applications already using
>> setsockopt() to set their buffers to some size, but not one large
>> enough)?
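>> (For anyone wanting to try that comparison, the receive-buffer
>> limits also live under /proc; something along these lines, with the
>> numbers here being purely illustrative, would raise the per-socket
>> TCP limits and the SO_RCVBUF cap:)
>>
>>   # min / default / max bytes used for TCP receive buffers
>>   echo "4096 436900 873800" > /proc/sys/net/ipv4/tcp_rmem
>>   # upper bound on what applications can request via SO_RCVBUF
>>   echo 873800 > /proc/sys/net/core/rmem_max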
>>
>> > (The default value of 2 corresponds to 75% of the receive buffer,
>> > and -2 corresponds to 25%.)
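>> >
>> > (For anyone wanting to work out other values: as far as I recall,
>> > the kernel reserves bytes/2^scale of the buffer as overhead when
>> > the scale is positive, and bytes - bytes/2^(-scale) when it is
>> > zero or negative, so:
>> >
>> >   scale =  2: window = space - space/4   = 75% of the receive buffer
>> >   scale = -2: window = space - 3*space/4 = 25% of the receive buffer)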
>> >
>> > Please let me know if this improves your results. You should still
>> > see a degradation in throughput when comparing Xen to traditional
>> > Linux, but hopefully you should be able to see better throughputs.
>> > You should also try running your experiments in domain 0. This
>> > will give better throughput, although still lower than traditional
>> > Linux. I am curious to know if this has any effect on your
>> > experiments. Please post the new results if it does.
>>
>> Yep, me too..
>>
>> thanks,
>> Nivedita
>>
>>
>>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel