Re: [Xen-devel] a question about popen() performance on domU
Mats,
Thanks a lot for the response.
I did have a look at popen, and essentially it does the following [the
real code is MUCH more complicated, doing lots of open/dup/close on
pipes and stuff]:
if (!fork())
    execl("/bin/sh", "sh", "-c", cmd, (char *) NULL);
I took a look at the popen source code too yesterday and the above lines are the essential part. A
thread on the gnu list (http://lists.gnu.org/archive/html/bug-global/2005-06/msg00001.html) suggests
that popen() performance might depend on how fast /bin/sh is executed. On both my VM and the physical
machine, the kernel version is 2.6.11, the glibc version is 2.3.2.ds1-21, and /bin/sh is linked to
/bin/bash. I also checked for any difference in the shared libraries used by /bin/sh on the two
machines and found that /bin/sh on the physical machine uses libraries from /lib/tls, while on the VM
this directory is disabled.
VM$ ldd /bin/sh
libncurses.so.5 => /lib/libncurses.so.5 (0xb7fa7000)
libdl.so.2 => /lib/libdl.so.2 (0xb7fa3000)
libc.so.6 => /lib/libc.so.6 (0xb7e70000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0xb7fea000)
PHYSICAL$ ldd /bin/sh
libncurses.so.5 => /lib/libncurses.so.5 (0xb7fa6000)
libdl.so.2 => /lib/tls/libdl.so.2 (0xb7fa2000)
libc.so.6 => /lib/tls/libc.so.6 (0xb7e6d000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0xb7fea000)
The fork creates another process, which then executes the /bin/sh, which
again causes another fork/exec to take place in the effort of executing
the actual command given.
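For reference, here is a minimal popen()-style sketch in C (read mode only). It illustrates the
pipe/fork/exec sequence described above; it is not the actual glibc code, and the helper name
my_popen is made up:

    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Sketch of what popen("cmd", "r") does: create a pipe, fork, and
     * have the child run the command via /bin/sh with its stdout tied
     * to the write end of the pipe.  The real glibc code also records
     * the child pid so that pclose() can waitpid() for it later. */
    FILE *my_popen(const char *cmd)
    {
        int fds[2];
        pid_t pid;

        if (pipe(fds) == -1)
            return NULL;
        pid = fork();
        if (pid == -1)
            return NULL;
        if (pid == 0) {                      /* child */
            close(fds[0]);                   /* close unused read end */
            dup2(fds[1], STDOUT_FILENO);     /* stdout -> pipe */
            close(fds[1]);
            execl("/bin/sh", "sh", "-c", cmd, (char *) NULL);
            _exit(127);                      /* reached only if exec fails */
        }
        close(fds[1]);                       /* parent keeps the read end */
        return fdopen(fds[0], "r");
    }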
So the major component of popen would be fork() and execl(), both of
which cause, amongst other things, a lot of page-table work and
task-switching.
Note that popen is implemented in glibc [I took the 2.3.6 source code
from www.gnu.org for my look at this], so there's no difference in the
implementation of popen itself - the difference lies in how the Linux
kernel handles fork() and exec(), but maybe more importantly, how
task-switches and page-tables are handled in Linux native and Xen-Linux.
Because Xen keeps track of the page-tables on top of Linux's handling of
page-tables, you get some extra work here. So, it should really be
slower on Xen than on native Linux.
[In fact, the question came up not so long ago why Xen was SLOWER than
native Linux on popen (and some others) in a particular benchmark, and
the result of that investigation was that it mainly comes down to
task-switching taking longer under Xen.]
I agree with your explanation that Xen should be SLOWER than native Linux on popen because of the
longer task-switching in Xen. The problem I met (popen runs faster on the Xen VM than on the physical
machine) looks abnormal. I ran several home-made benchmarking programs and used the "strace" tool to
trace the system call performance. The first program tests the performance of both popen and pclose
(a loop of popen calls, each followed by a pclose call); the source of the program and the strace
results are available at http://people.cs.uchicago.edu/~hai/tmp/gt2gram/strace-popen/strace.txt. The
results show that the waitpid syscall costs more time on the physical machine than on the VM (see the
usecs/call value in the following table).
                   % time     seconds  usecs/call     calls    errors syscall
                   ------ ----------- ----------- --------- --------- ----------------
VM:                 63.43    0.127900        6395        20           waitpid
PHYSICAL MACHINE:   93.87    0.532498       26625        20           waitpid
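For reference, the first benchmark is essentially the following loop (a sketch matching the
description above, not the exact program posted at the URL; the command "/bin/true" is just a
placeholder):

    #include <stdio.h>

    /* popen/pclose micro-benchmark sketch: run a trivial command 20
     * times; pclose() reaps the shell via waitpid(), which is where
     * most of the time shows up in the strace summary above. */
    int main(void)
    {
        int i;
        for (i = 0; i < 20; i++) {
            FILE *fp = popen("/bin/true", "r");
            if (fp == NULL) {
                perror("popen");
                return 1;
            }
            pclose(fp);
        }
        return 0;
    }

The per-syscall summary table comes from running the program under something like "strace -c", which
prints totals and usecs/call for each syscall.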
waitpid is called by pclose, as shown in the glibc source code. So my original post questioning the
performance of popen should take pclose into consideration too. A more accurate statement of the
question is that popen+pclose executes faster on my VM than on my physical machine. The popen/pclose
benchmark narrows the problem down to waitpid, which somehow suffers on the physical machine.
So I did a followup experiment to test fork and waitpid performance on both machines. The program is
a loop of fork calls, each followed by a waitpid call. The source of the program and the strace
results are available at
http://people.cs.uchicago.edu/~hai/tmp/gt2gram/strace-fork/strace.txt. The strace results confirm
that waitpid costs more time on the physical machine (154 usec/call) than on the VM (56 usec/call).
However, unlike the popen/pclose program, this program runs faster on the physical machine, and the
results suggest that the fork syscall used on the VM costs more time than the clone syscall used on
the physical machine. I have a question here: why does the physical machine use the clone syscall
instead of the fork syscall for the same program?
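For reference, the second benchmark is essentially the following loop (again a sketch, not the exact
code at the URL):

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* fork/waitpid micro-benchmark sketch: fork a child that exits
     * immediately and reap it, 20 times.  Traced with strace -c, this
     * isolates the cost of fork/clone and waitpid from the exec and
     * shell startup that popen/pclose also involve. */
    int main(void)
    {
        int i;
        for (i = 0; i < 20; i++) {
            pid_t pid = fork();
            if (pid == -1)
                return 1;
            if (pid == 0)
                _exit(0);              /* child exits immediately */
            waitpid(pid, NULL, 0);     /* parent reaps the child */
        }
        return 0;
    }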
The reason it is not would probably have something to do with the
differences in hardware on Linux vs. Xen platforms, perhaps the fact
that your file-system is a virtual block-device and thus lives inside a
file that is perhaps better cached or otherwise handled in a different
way on the Xen-system.
Let me describe the hardware context of my VM and the physical machine. The host of my VM and the
physical machine I tested against are two nodes of a physical cluster with the same hardware
configuration (dual Intel PIII 498.799 MHz CPUs, 512MB memory, a 4GB HD with the same partitions).
The physical machine is rebooted with "nosmp". The VM host is rebooted into Xen with "nosmp" (Xen
version information is "Latest ChangeSet: 2005/05/03 17:30:40 1.1846
4277a730mvnFSFXrxJpVRNk8hjD4Vg"). Xen dom0 is assigned 96MB of memory and the VM is the only user
domain running on the VM host, with 395MB of memory. Both dom0 and the VM are pinned to CPU 0.
Yes, the backends of the VM's VBDs are loopback files in dom0. Three loopback files are used to map
to three partitions inside the VM. I actually thought about the possible caching effect of the VM's
VBD backends, but I am not sure how to verify it and compare it with the physical machine. Is it
possible that Xen has different write-back guarantees than the physical machine, that is, that the
data is kept in memory longer before it is actually written to disk?
Now, I'm not saying that there isn't a possibility that something is
managed differently in Xen that makes this run faster - I just don't
really see how that would be likely, since everything that happens in
the system is going to be MORE complicated by the extra layer of Xen
involved.
If anyone else has some thoughts on this subject, it would be
interesting to hear.
I agree. But given that the VM has the same hardware/software configuration as the physical machine,
the fact that it runs faster still looks abnormal to me. I wonder if there are other, more effective
debugging strategies I can use to investigate it. I would appreciate any further suggestions.
Thanks again.
Xuehai
-----Original Message-----
From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
[mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of
xuehai zhang
Sent: 23 November 2005 20:26
To: xen-devel@xxxxxxxxxxxxxxxxxxx
Cc: Tim Freeman; Kate Keahey
Subject: [Xen-devel] a question about popen() performance on domU
Dear all,
When I compared the performance of some application on both a
Xen domU and a standard Linux machine (where the domU runs on a
similar physical machine), I noticed the application runs
faster on the domU than on the physical machine.
Instrumenting the application code shows the application
spends more time on popen() calls on the domU than on the
physical machine. I wonder if xenlinux makes some special
modification to the popen code to improve its performance
compared to the original Linux popen code?
Thanks in advance for your help.
Xuehai
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel