WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

[Xen-devel] Re: Using Xen Virtualization Environment for Development and

To: xen-devel@xxxxxxxxxxxxxxxxxxx, xen-users@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] Re: Using Xen Virtualization Environment for Development and Testing of Supercomputing and High Performance Computing (HPC) Cluster MPICH2 MPI-2 Applications
From: "Mr. Teo En Ming (Zhang Enming)" <space.time.universe@xxxxxxxxx>
Date: Fri, 30 Oct 2009 16:35:00 +0800
Cc: space.time.universe@xxxxxxxxx
Delivery-date: Fri, 30 Oct 2009 01:35:45 -0700
Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:message-id:subject:from:to:cc:content-type; bh=sWmFQGU4vuiKe7j6S1C81wf5yOh5LDNuYTDTLD8MnuY=; b=jBAviVABuFiqFcotSMrvvFSXROEDa79LPgFinco3Be2LGVE0xnG/1RkRZLw8VcN8ag wMs0QeNQYoCbOomXXlvuMDdZ2XtR/Yoe9pNbtu3Yx8X46Hn7I8Nrn11eSa/KpSELC98T rbspsepjJE+mSsnd+Bnt0TH64i+Um3P24caf0=
Domainkey-signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=EKAet/MdK+vAO/lHZxgraX+THhYLDHts8A4EaOnDwHQ+4p+Z4dUW4U/MhQSwaTuudx JurzryVq3pA0+Tqk/TrYYULYHo0vfTE6vivV/vj4xsVhoyoMg8FLCSYR1lgaV9pCyBG4 e+eYJ0fT9pNF1qdCNJSs2JOET7ZCCWv/hi7d8=
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <f712b9cf0910300112g528ce37ar41e9fa4710f1571@xxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <f712b9cf0910292337k5247e737pea271b244c340e16@xxxxxxxxxxxxxx> <f712b9cf0910292353u1239ed74rf6012620519dc17e@xxxxxxxxxxxxxx> <f712b9cf0910300053j782151f7y6f7084b3b3a59c62@xxxxxxxxxxxxxx> <f712b9cf0910300112g528ce37ar41e9fa4710f1571@xxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Dear All,

I have solved the problem.

With reference to http://lists.xensource.com/archives/html/xen-devel/2009-05/msg01327.html and http://hightechsorcery.com/2008/03/virtualization-tip-always-disable-checksumming-virtual-ethernet-devices , I have executed the following command as root on all my 6 compute nodes (each compute node is a F11 linux 64-bit PV virtual machine).

# ethtool -K eth0 tx off gso on

Now I can successfully run mpiexec to execute MPI and non-MPI jobs on my Virtual HPC Compute Cluster.

--
Mr. Teo En Ming (Zhang Enming) Dip(Mechatronics) BEng(Hons)(Mechanical Engineering)
Alma Maters:
(1) Singapore Polytechnic
(2) National University of Singapore
My blog URL: http://teo-en-ming-aka-zhang-enming.blogspot.com
My Youtube videos: http://www.youtube.com/user/enmingteo
Email: space.time.universe@xxxxxxxxx
MSN: teoenming@xxxxxxxxxxx
Mobile Phone (SingTel): +65-9648-9798
Mobile Phone (Starhub Prepaid): +65-8369-2618
Age: 31 (as at 30 Oct 2009)
Height: 1.78 meters
Race: Chinese
Dialect: Hokkien
Street: Bedok Reservoir Road
Country: Singapore



On Fri, Oct 30, 2009 at 4:12 PM, Mr. Teo En Ming (Zhang Enming) <space.time.universe@xxxxxxxxx> wrote:
Dear All,

I have googled something which may help to solve my problem.

[Xen-devel] Network drop on domU (netfront: rx->offset: 0, size: 4294967295)

http://lists.xensource.com/archives/html/xen-devel/2009-05/msg01274.html

Virtualization Tip: Always disable checksumming on virtual ethernet devices


http://hightechsorcery.com/2008/03/virtualization-tip-always-disable-checksumming-virtual-ethernet-devices


Let me try to work on it first.

--
Mr. Teo En Ming (Zhang Enming) Dip(Mechatronics) BEng(Hons)(Mechanical Engineering)
Alma Maters:
(1) Singapore Polytechnic
(2) National University of Singapore
My blog URL: http://teo-en-ming-aka-zhang-enming.blogspot.com
My Youtube videos: http://www.youtube.com/user/enmingteo
Email: space.time.universe@xxxxxxxxx
MSN: teoenming@xxxxxxxxxxx
Mobile Phone (SingTel): +65-9648-9798
Mobile Phone (Starhub Prepaid): +65-8369-2618
Age: 31 (as at 30 Oct 2009)
Height: 1.78 meters
Race: Chinese
Dialect: Hokkien
Street: Bedok Reservoir Road
Country: Singapore

On Fri, Oct 30, 2009 at 3:53 PM, Mr. Teo En Ming (Zhang Enming) <space.time.universe@xxxxxxxxx> wrote:
Hi,

I have reverted to the 2-node troubleshooting scenario. I have started node 1 and node 2.

On node 1, I will try to bring up the ring of mpd for the 2 nodes using mpdboot and try to execute mpiexec. On node 2, I will capture the tcpdump messages on virtual network interface eth0.

Please see attached PNG screenshots. They are numbered in sequence.

Please check if there are any problems.

Thank you.

--
Mr. Teo En Ming (Zhang Enming) Dip(Mechatronics) BEng(Hons)(Mechanical Engineering)
Alma Maters:
(1) Singapore Polytechnic
(2) National University of Singapore
My blog URL: http://teo-en-ming-aka-zhang-enming.blogspot.com
My Youtube videos: http://www.youtube.com/user/enmingteo
Email: space.time.universe@xxxxxxxxx
MSN: teoenming@xxxxxxxxxxx
Mobile Phone (SingTel): +65-9648-9798
Mobile Phone (Starhub Prepaid): +65-8369-2618
Age: 31 (as at 30 Oct 2009)
Height: 1.78 meters
Race: Chinese
Dialect: Hokkien
Street: Bedok Reservoir Road
Country: Singapore

On Fri, Oct 30, 2009 at 2:53 PM, Mr. Teo En Ming (Zhang Enming) <space.time.universe@xxxxxxxxx> wrote:
Dear All,

Here are more virtual network interface eth0 kernel messages. Notice the "net eth0: rx->offset: 0" messages. Are they of significance?

Node 1

Oct 30 22:40:34 enming-f11-pv-hpc-node0001 mountd[1304]: authenticated mount request from 192.168.1.253:1009 for /home/enming/mpich2-install/bin (/home/enming/mpich2-install/bin)
Oct 30 22:40:56 enming-f11-pv-hpc-node0001 mountd[1304]: authenticated mount request from 192.168.1.252:877 for /home/enming/mpich2-install/bin (/home/enming/mpich2-install/bin)
Oct 30 22:41:19 enming-f11-pv-hpc-node0001 mountd[1304]: authenticated mount request from 192.168.1.251:1000 for /home/enming/mpich2-install/bin (/home/enming/mpich2-install/bin)
Oct 30 22:41:41 enming-f11-pv-hpc-node0001 mountd[1304]: authenticated mount request from 192.168.1.250:882 for /home/enming/mpich2-install/bin (/home/enming/mpich2-install/bin)
Oct 30 22:42:04 enming-f11-pv-hpc-node0001 mountd[1304]: authenticated mount request from 192.168.1.249:953 for /home/enming/mpich2-install/bin (/home/enming/mpich2-install/bin)
Oct 30 22:42:34 enming-f11-pv-hpc-node0001 mpd: mpd starting; no mpdid yet
Oct 30 22:42:34 enming-f11-pv-hpc-node0001 mpd: mpd has mpdid=enming-f11-pv-hpc-node0001_48545 (port=48545)
Oct 30 22:42:37 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:37 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:37 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:38 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:38 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:38 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:39 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:39 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:39 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:40 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: __ratelimit: 12 callbacks suppressed
Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:47 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:47 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295

Node 6

Oct 30 22:42:44 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:44 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:44 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:44 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:48 enming-f11-pv-hpc-node0006 mpd: mpd starting; no mpdid yet
Oct 30 22:42:48 enming-f11-pv-hpc-node0006 mpd: mpd has mpdid=enming-f11-pv-hpc-node0006_52805 (port=52805)
Oct 30 22:46:00 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:46:00 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:46:00 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: 0, size: 4294967295

Node 1 NFS Server Configuration

[root@enming-f11-pv-hpc-node0001 ~]# cat /etc/exports
/home/enming/mpich2-install/bin        192.168.1.0/24(ro)

Node 2 /etc/fstab Configuration Entry for NFS Client

192.168.1.254:/home/enming/mpich2-install/bin    /home/enming/mpich2-install/bin    nfs    rsize=8192,wsize=8192,timeo=14,intr


--
Mr. Teo En Ming (Zhang Enming) Dip(Mechatronics) BEng(Hons)(Mechanical Engineering)
Alma Maters:
(1) Singapore Polytechnic
(2) National University of Singapore
My blog URL: http://teo-en-ming-aka-zhang-enming.blogspot.com
My Youtube videos: http://www.youtube.com/user/enmingteo
Email: space.time.universe@xxxxxxxxx
MSN: teoenming@xxxxxxxxxxx
Mobile Phone (SingTel): +65-9648-9798
Mobile Phone (Starhub Prepaid): +65-8369-2618
Age: 31 (as at 30 Oct 2009)
Height: 1.78 meters
Race: Chinese
Dialect: Hokkien
Street: Bedok Reservoir Road
Country: Singapore

On Fri, Oct 30, 2009 at 2:37 PM, Mr. Teo En Ming (Zhang Enming) <space.time.universe@xxxxxxxxx> wrote:
Dear All,

I have created a virtual high performance computing (HPC) cluster of 6 compute nodes with MPICH2 using Xen-based Fedora 11 Linux 64-bit paravirtualized (PV) domU guests. Dom0 is Fedora 11 Linux 64-bit. My Intel Desktop Board DQ45CB has a single onboard Gigabit LAN network adapter.

I am able to bring up the ring of mpd on the set of 6 compute nodes. However, I am consistently encountering the "(mpiexec 392): no msg recvd from mpd when expecting ack of request" error.

After much troubleshooting, I have found that there are Receive Errors (RX-ERR) in the virtual network interface eth0 of all the six compute nodes. All the 6 compute nodes are identical F11 linux 64-bit PV virtual machines.

Here is my PV guest configuration for node 1:

[enming@fedora11-x86-64-host xen]$ cat enming-f11-pv-hpc-node0001
name="enming-f11-pv-hpc-node0001"
memory=512
disk = ['phy:/dev/virtualmachines/f11-pv-hpc-node0001,xvda,w' ]
vif = [ 'mac=00:16:3E:69:E9:11,bridge=eth0' ]
vfb = [ 'vnc=1,vncunused=1,vncdisplay=0,vnclisten=127.0.0.1,vncpasswd=' ]
vncconsole=1
bootloader = "/usr/bin/pygrub"
#kernel = "/home/enming/fedora11/vmlinuz"
#ramdisk = "/home/enming/fedora11/initrd.img"
vcpus=2
>>
Will there be any problems with Xen networking for MPICH2 applications? Or it's just a fine-tuning exercise for Xen networking? I am using PV guests because PV guests have much higher performance than HVM guests.

Here are my mpich-discuss mailing list threads:

http://lists.mcs.anl.gov/pipermail/mpich-discuss/2009-October/005883.html

http://lists.mcs.anl.gov/pipermail/mpich-discuss/2009-October/005887.html

http://lists.mcs.anl.gov/pipermail/mpich-discuss/2009-October/005889.html

http://lists.mcs.anl.gov/pipermail/mpich-discuss/2009-October/005890.html

http://lists.mcs.anl.gov/pipermail/mpich-discuss/2009-October/005891.html

Please advise on the RX-ERR.

Thank you very much.

--
Mr. Teo En Ming (Zhang Enming) Dip(Mechatronics) BEng(Hons)(Mechanical Engineering)
Alma Maters:
(1) Singapore Polytechnic
(2) National University of Singapore
My blog URL: http://teo-en-ming-aka-zhang-enming.blogspot.com
My Youtube videos: http://www.youtube.com/user/enmingteo
Email: space.time.universe@xxxxxxxxx
MSN: teoenming@xxxxxxxxxxx
Mobile Phone (SingTel): +65-9648-9798
Mobile Phone (Starhub Prepaid): +65-8369-2618
Age: 31 (as at 30 Oct 2009)
Height: 1.78 meters
Race: Chinese
Dialect: Hokkien
Street: Bedok Reservoir Road
Country: Singapore
















_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
<Prev in Thread] Current Thread [Next in Thread>