Re: [Xen-devel] a question about popen() performance on domU
See comments below.
Thanks Mats. I have more questions about your comments below.
Xuehai
-----Original Message-----
From: xuehai zhang [mailto:hai@xxxxxxxxxxxxxxx]
Sent: 24 November 2005 14:02
To: Petersson, Mats
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; Tim Freeman; Kate Keahey
Subject: Re: [Xen-devel] a question about popen() performance on domU
Mats,
Thanks a lot for the response.
I did have a look at popen, and essentially it does the following [the
real code is MUCH more complicated, doing lots of open/dup/close on
pipes and stuff]:
if (!fork())
        execl("/bin/sh", "sh", "-c", cmd, (char *) NULL);
I took a look at the popen source code too yesterday and the above
lines are the essential part. A thread on the gnu list
(http://lists.gnu.org/archive/html/bug-global/2005-06/msg00001.html)
suggests popen() might depend on how fast /bin/sh is executed. On both
my VM and the physical machine, the kernel version is 2.6.11, glibc
version is 2.3.2.ds1-21, and /bin/sh is linked to /bin/bash. I also
tried to see any difference in the shared libraries used by /bin/sh on
both machines and found that /bin/sh on the physical machine uses
libraries from /lib/tls, while for the VM this directory is disabled.
VM$ ldd /bin/sh
libncurses.so.5 => /lib/libncurses.so.5 (0xb7fa7000)
libdl.so.2 => /lib/libdl.so.2 (0xb7fa3000)
libc.so.6 => /lib/libc.so.6 (0xb7e70000)
/lib/ld-.so.2 => /lib/ld-.so.2 (0xb7fea000)
PHYSICAL$ ldd /bin/sh
libncurses.so.5 => /lib/libncurses.so.5 (0xb7fa6000)
libdl.so.2 => /lib/tls/libdl.so.2 (0xb7fa2000)
libc.so.6 => /lib/tls/libc.so.6 (0xb7e6d000)
/lib/ld-.so.2 => /lib/ld-.so.2 (0xb7fea000)
In this particular case, I would think that /lib/tls is not a factor,
but it may be worth disabling the tls libraries on the physical machine
too, just to make sure... [just "mv /lib/tls /lib/tls.disabled" should
do it].
I don't think /lib/tls is the factor either. I did rerun the tests with
tls disabled on the physical machine and it gave even worse performance,
so I switched it back.
The fork creates another process, which then executes /bin/sh, which
again causes another fork/exec to take place in the effort of executing
the actual command given.
So the major components of popen would be fork() and execl(), both of
which cause, amongst other things, a lot of page-table work and
task-switching.
Note that popen is implemented in glibc [I took the 2.3.6 source code
from www.gnu.org for my look at this], so there's no difference in the
implementation of popen itself - the difference lies in how the Linux
kernel handles fork() and exec(), but maybe more importantly, how
task-switches and page-tables are handled in native Linux and Xen-Linux.
Because Xen keeps track of the page-tables on top of Linux's handling of
page-tables, you get some extra work here. So, it should really be
slower on Xen than on native Linux.
[In fact, the question came up not so long ago of why Xen was SLOWER
than native Linux on popen (and some others) in a particular benchmark,
and the result of that investigation was that it comes down, mainly, to
task-switching taking longer in Xen.]
I agree with your explanation of why Xen should be SLOWER than native
Linux on popen because of the longer task-switching in Xen. The problem
I met (popen runs faster on a Xen VM than on the physical machine) looks
abnormal. I ran several home-made benchmark programs and used the
"strace" tool to trace the system call performance. The first program
tests the performance of both popen and pclose (a loop of popen calls,
each followed by a pclose call); the source of the program and the
strace results are available at
http://people.cs.uchicago.edu/~hai/tmp/gt2gram/strace-popen/strace.txt.
The results show that the waitpid syscall costs more time on the
physical machine than on the VM (see the usecs/call value in the
following table).
                   % time     seconds  usecs/call     calls    errors  syscall
                   ------ ----------- ----------- --------- --------- --------
VM:                 63.43    0.127900        6395        20            waitpid
PHYSICAL MACHINE:   93.87    0.532498       26625        20            waitpid
waitpid is called by pclose, as shown in the glibc source code. So my
original post questioning the performance of popen should take pclose
into consideration too. A more accurate statement of the question is
that popen+pclose executes faster on my VM than on my physical machine.
The popen/pclose benchmark narrows the problem down to waitpid: waitpid
somehow is suffering on the physical machine.
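For reference, here is a minimal sketch of the kind of popen/pclose loop
described above. The actual benchmark source is at the URL, so the
command ("/bin/true") and the iteration count (20, matching the 20
waitpid calls in the strace summary) are assumptions on my part:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int i;
    for (i = 0; i < 20; i++) {
        /* popen forks and execs "/bin/sh -c <cmd>" (assumed command) */
        FILE *p = popen("/bin/true", "r");
        if (p == NULL) {
            perror("popen");
            return 1;
        }
        /* pclose calls waitpid on the child shell */
        pclose(p);
    }
    return 0;
}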
So, I did a follow-up experiment to test the fork and waitpid
performance on both machines. The program is a loop of fork calls, each
followed by a waitpid call. The source of the program and the strace
results are available at
http://people.cs.uchicago.edu/~hai/tmp/gt2gram/strace-fork/strace.txt.
The strace results confirm that waitpid costs more time on the physical
machine (154 usec/call) than on the VM (56 usec/call).
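A minimal sketch of the fork/waitpid loop described above; again, the
real source is at the URL, so the iteration count and the
immediately-exiting child are assumptions. The fork() call here is
exactly what shows up in strace as either the fork or the clone syscall,
depending on how glibc implements it:

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int i, status;
    for (i = 0; i < 20; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            return 1;
        }
        if (pid == 0)
            _exit(0);               /* child exits immediately */
        waitpid(pid, &status, 0);   /* parent waits for the child */
    }
    return 0;
}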
However, the program runs faster on the physical machine (unlike the
popen/pclose program), and the results suggest that the fork syscall
used on the VM costs more time than the clone syscall used on the
physical machine. I have a question here: why does the physical machine
use the clone syscall rather than the fork syscall for the same program?
Because it's using the same source for glibc! glibc says to use
_IO_fork(), which calls the fork syscall. Clone would probably do the
same thing, but for whatever good or bad reason, the author(s) of this
code chose to use fork. There may be good reasons, or no reason at all,
to do it this way. I couldn't say. I don't think it makes a whole lot of
difference if the actual command executed by popen is actually "doing
something", rather than just an empty "return".
Do you have any suggestion as to why the same code uses different
syscalls on two machines which have the same kernel and glibc?
The reason it is not would probably have something to do with the
differences in hardware on Linux vs. Xen platforms, perhaps the fact
that your file-system is a virtual block-device and thus lives inside a
file that is perhaps better cached or otherwise handled in a different
way on the Xen system.
Let me describe the hardware context of my VM and the physical machine.
The host of my VM and the physical machine I tested against are two
nodes of a physical cluster with the same hardware configuration (dual
Intel PIII 498.799 MHz CPUs, 512MB memory, a 4GB HD with the same
partitions). The physical machine is rebooted with "nosmp". The VM host
is rebooted into Xen with "nosmp" (Xen version information is "Latest
ChangeSet: 2005/05/03 17:30:40 1.1846 4277a730mvnFSFXrxJpVRNk8hjD4Vg").
Xen dom0 is assigned 96MB memory and the VM is the only user domain
running on the VM host, with 395MB memory. Both dom0 and the VM are
pinned to CPU 0.
Yes, the backends of the VM's VBDs are loopback files in dom0. Three
loopback files are used to map to three partitions inside the VM. I
actually thought about the possible caching effect of the VM's VBD
backends, but I am not sure how to verify it and compare it with the
physical machine. Is it possible that Xen has different write-back
guarantees than the physical machine, that is, that data is kept in
memory longer before it is actually written to disk?
Xen itself doesn't know ANYTHING about the disk/file where the data for
Dom0 or DomU comes from, so no, Xen would not do that. However, the
loopback file-system that is involved in VBDs would potentially do
things that are different from the actual hardware.
So, there is a possibility that the loopback file-system does something
tricky like caching, resulting in better performance for applications
running inside the VM?
I think you should be able to mount the virtual disk as a "device" on
your system.
What does "your system" here refer to? Does it mean dom0 or inside of domU?
I don't know off the top of my head how to do that, but essentially
something like this:
mount -o loop -t ext3 myimage.hdd loop/ [additional parameters may be needed].
You could then do "chroot loop/" and perform your tests there. This
should execute the same thing from the same place on native Linux as you
would in DomU.
Now, this may not run faster on native than your original setup, but I
wouldn't be surprised if it does...
This is interesting. I will try to run the same tests if I can mount
the virtual disk as a "device" successfully.
Thanks.
Xuehai
Now, I'm not saying that there isn't a possibility that something is
managed differently in Xen that makes this run faster - I just don't
really see how that would be likely, since everything that happens in
the system is going to be MORE complicated by the extra layer of Xen
involved.
If anyone else has some thoughts on this subject, it would be
interesting to hear.
I agree. But given that the VM has the same hardware/software
configuration as the physical machine, it still looks abnormal to me
that it runs faster. I wonder if there are any other, more efficient
debugging strategies I can use to investigate this. I would appreciate
it if anyone has any more suggestions.
Thanks again.
Xuehai
-----Original Message-----
From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
[mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of xuehai
zhang
Sent: 23 November 2005 20:26
To: xen-devel@xxxxxxxxxxxxxxxxxxx
Cc: Tim Freeman; Kate Keahey
Subject: [Xen-devel] a question about popen() performance on domU
Dear all,
When I compared the performance of some application on both a Xen domU
and a standard Linux machine (where the domU runs on a similar physical
machine), I noticed the application runs faster on the domU than on the
physical machine.
Instrumenting the application code shows the application spends less
time on popen() calls on domU than on the physical machine. I wonder if
xenlinux does some special modification of the popen code to improve its
performance over the original Linux popen code?
Thanks in advance for your help.
Xuehai
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel