xen-users
[Xen-devel] Re: Using Xen Virtualization Environment for Development and
Dear All,
I have solved the problem.
Based on http://lists.xensource.com/archives/html/xen-devel/2009-05/msg01327.html and http://hightechsorcery.com/2008/03/virtualization-tip-always-disable-checksumming-virtual-ethernet-devices , I ran the command below as root on all 6 of my compute nodes (each compute node is a Fedora 11 Linux 64-bit PV virtual machine).
# ethtool -K eth0 tx off gso on
Now I can successfully run mpiexec to execute MPI and non-MPI jobs on my Virtual HPC Compute Cluster.
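For anyone who wants to apply the same setting to several nodes at once, here is a minimal Python sketch. It only builds and prints one ssh command per node; it does not run anything. The node names follow the naming pattern seen in this thread, and root ssh access to each node is an assumption, not something described in my setup. Note that ethtool settings do not survive a reboot, so the command would have to be re-applied at boot.

```python
import shlex

# Hypothetical node list, following the host naming pattern used in this thread.
NODES = [f"enming-f11-pv-hpc-node{i:04d}" for i in range(1, 7)]


def offload_commands(nodes, iface="eth0"):
    """Return one ssh argv per node that applies the ethtool setting quoted above."""
    remote = ["ethtool", "-K", iface, "tx", "off", "gso", "on"]
    # Assumption: root login over ssh is permitted on every node.
    return [["ssh", f"root@{node}", shlex.join(remote)] for node in nodes]


if __name__ == "__main__":
    for argv in offload_commands(NODES):
        # Printed only; pass argv to subprocess.run(argv, check=True) to apply it.
        print(shlex.join(argv))
```

Running `ethtool -k eth0` (lowercase k) on a node afterwards lists the current offload settings, which is one way to confirm the change took effect.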
-- Mr. Teo En Ming (Zhang Enming) Dip(Mechatronics) BEng(Hons)(Mechanical Engineering)
Alma Maters: (1) Singapore Polytechnic (2) National University of Singapore My blog URL: http://teo-en-ming-aka-zhang-enming.blogspot.com My Youtube videos: http://www.youtube.com/user/enmingteo
Email: space.time.universe@xxxxxxxxx MSN: teoenming@xxxxxxxxxxx Mobile Phone (SingTel): +65-9648-9798 Mobile Phone (Starhub Prepaid): +65-8369-2618
Age: 31 (as at 30 Oct 2009) Height: 1.78 meters Race: Chinese Dialect: Hokkien Street: Bedok Reservoir Road Country: Singapore
On Fri, Oct 30, 2009 at 4:12 PM, Mr. Teo En Ming (Zhang Enming) <space.time.universe@xxxxxxxxx> wrote:
On Fri, Oct 30, 2009 at 3:53 PM, Mr. Teo En Ming (Zhang Enming) <space.time.universe@xxxxxxxxx> wrote:
Hi,
I have reverted to the 2-node troubleshooting scenario. I have started node 1 and node 2.
On node 1, I will try to bring up the ring of mpd for the 2 nodes using mpdboot and then run mpiexec. On node 2, I will capture traffic with tcpdump on the virtual network interface eth0.
Please see attached PNG screenshots. They are numbered in sequence.
Please check if there are any problems.
Thank you.
On Fri, Oct 30, 2009 at 2:53 PM, Mr. Teo En Ming (Zhang Enming) <space.time.universe@xxxxxxxxx> wrote:
Dear All,
Here are more kernel messages for the virtual network interface eth0. Notice the "net eth0: rx->offset: 0" messages. Are they significant?
Node 1
Oct 30 22:40:34 enming-f11-pv-hpc-node0001 mountd[1304]: authenticated mount request from 192.168.1.253:1009 for /home/enming/mpich2-install/bin (/home/enming/mpich2-install/bin)
Oct 30 22:40:56 enming-f11-pv-hpc-node0001 mountd[1304]: authenticated mount request from 192.168.1.252:877 for /home/enming/mpich2-install/bin (/home/enming/mpich2-install/bin)
Oct 30 22:41:19 enming-f11-pv-hpc-node0001 mountd[1304]: authenticated mount request from 192.168.1.251:1000 for /home/enming/mpich2-install/bin (/home/enming/mpich2-install/bin)
Oct 30 22:41:41 enming-f11-pv-hpc-node0001 mountd[1304]: authenticated mount request from 192.168.1.250:882 for /home/enming/mpich2-install/bin (/home/enming/mpich2-install/bin)
Oct 30 22:42:04 enming-f11-pv-hpc-node0001 mountd[1304]: authenticated mount request from 192.168.1.249:953 for /home/enming/mpich2-install/bin (/home/enming/mpich2-install/bin)
Oct 30 22:42:34 enming-f11-pv-hpc-node0001 mpd: mpd starting; no mpdid yet
Oct 30 22:42:34 enming-f11-pv-hpc-node0001 mpd: mpd has mpdid=enming-f11-pv-hpc-node0001_48545 (port=48545)
Oct 30 22:42:37 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:37 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:37 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:38 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:38 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:38 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:39 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:39 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:39 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:40 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: __ratelimit: 12 callbacks suppressed
Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:47 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:47 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Node 6
Oct 30 22:42:44 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:44 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:44 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:44 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:48 enming-f11-pv-hpc-node0006 mpd: mpd starting; no mpdid yet
Oct 30 22:42:48 enming-f11-pv-hpc-node0006 mpd: mpd has mpdid=enming-f11-pv-hpc-node0006_52805 (port=52805)
Oct 30 22:46:00 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:46:00 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:46:00 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: 0, size: 4294967295
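To get a rough per-node tally of the messages quoted above, something like the following Python sketch could be used. The function names are made up for illustration. One observation that is plain arithmetic: 4294967295 is 2^32 - 1, which is what -1 looks like when printed as an unsigned 32-bit number. Whether the driver is really reporting a negative status here is only my guess, not something confirmed in this thread.

```python
import re
from collections import Counter

# Matches kernel messages of the form quoted above.
RX_PATTERN = re.compile(r"(\S+) kernel: net (\w+): rx->offset: (\d+), size: (\d+)")


def count_rx_warnings(log_text):
    """Count 'rx->offset' kernel messages per (host, interface)."""
    counts = Counter()
    for host, iface, _offset, _size in RX_PATTERN.findall(log_text):
        counts[(host, iface)] += 1
    return counts


def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return value - (1 << 32) if value >= (1 << 31) else value


if __name__ == "__main__":
    # Two lines copied from the logs above, used as sample input.
    sample = (
        "Oct 30 22:42:37 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295\n"
        "Oct 30 22:42:44 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: 0, size: 4294967295\n"
    )
    print(count_rx_warnings(sample))
    print(as_signed32(4294967295))
```

Because the pattern is applied with findall, it also works on text where several log entries ended up on one line.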
Node 1 NFS Server Configuration
[root@enming-f11-pv-hpc-node0001 ~]# cat /etc/exports
/home/enming/mpich2-install/bin 192.168.1.0/24(ro)
Node 2 /etc/fstab Configuration Entry for NFS Client
192.168.1.254:/home/enming/mpich2-install/bin /home/enming/mpich2-install/bin nfs rsize=8192,wsize=8192,timeo=14,intr
On Fri, Oct 30, 2009 at 2:37 PM, Mr. Teo En Ming (Zhang Enming) <space.time.universe@xxxxxxxxx> wrote:
Dear All,
I have created a virtual high performance computing (HPC) cluster of 6 compute nodes with MPICH2 using Xen-based Fedora 11 Linux 64-bit paravirtualized (PV) domU guests. Dom0 is Fedora 11 Linux 64-bit. My Intel Desktop Board DQ45CB has a single onboard Gigabit LAN network adapter.
I am able to bring up the ring of mpd on the set of 6 compute nodes. However, I am consistently encountering the "(mpiexec 392): no msg recvd from mpd when expecting ack of request" error.
After much troubleshooting, I have found that there are receive errors (RX-ERR) on the virtual network interface eth0 of all 6 compute nodes. All 6 compute nodes are identical Fedora 11 Linux 64-bit PV virtual machines.
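As a way to watch the receive error counters on each node, here is a minimal Python sketch that reads them from /proc/net/dev, which on Linux holds the per-interface counters that tools such as netstat -i show in their RX-ERR column. The function name is made up for illustration, and this is not how the errors were originally spotted; it is just one possible way to check them.

```python
def rx_errors(proc_net_dev_text):
    """Map interface name -> receive error count from /proc/net/dev content."""
    errors = {}
    # The first two lines of /proc/net/dev are column headers.
    for line in proc_net_dev_text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, stats = line.split(":", 1)
        fields = stats.split()
        # Receive columns are: bytes, packets, errs, drop, ...
        errors[name.strip()] = int(fields[2])
    return errors


if __name__ == "__main__":
    with open("/proc/net/dev") as f:
        for iface, errs in rx_errors(f.read()).items():
            print(f"{iface}: RX-ERR={errs}")
```

Running it before and after an mpiexec attempt would show whether the eth0 counter is still increasing.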
Here is my PV guest configuration for node 1:
[enming@fedora11-x86-64-host xen]$ cat enming-f11-pv-hpc-node0001
name="enming-f11-pv-hpc-node0001"
memory=512
disk = ['phy:/dev/virtualmachines/f11-pv-hpc-node0001,xvda,w' ]
vif = [ 'mac=00:16:3E:69:E9:11,bridge=eth0' ]
vfb = [ 'vnc=1,vncunused=1,vncdisplay=0,vnclisten=127.0.0.1,vncpasswd=' ]
vncconsole=1
bootloader = "/usr/bin/pygrub"
#kernel = "/home/enming/fedora11/vmlinuz"
#ramdisk = "/home/enming/fedora11/initrd.img"
vcpus=2
Will there be any problems with Xen networking for MPICH2 applications? Or is it just a matter of fine-tuning Xen networking? I am using PV guests because they perform much better than HVM guests.
Here are my mpich-discuss mailing list threads:
http://lists.mcs.anl.gov/pipermail/mpich-discuss/2009-October/005883.html
http://lists.mcs.anl.gov/pipermail/mpich-discuss/2009-October/005887.html
http://lists.mcs.anl.gov/pipermail/mpich-discuss/2009-October/005889.html
http://lists.mcs.anl.gov/pipermail/mpich-discuss/2009-October/005890.html
http://lists.mcs.anl.gov/pipermail/mpich-discuss/2009-October/005891.html
Please advise on the RX-ERR.
Thank you very much.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel