[Xen-devel] Live migration with MPI

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] Live migration with MPI
From: "Arun Babu" <arunbabu.n@xxxxxxxxx>
Date: Fri, 4 Aug 2006 13:36:15 -0400
Hi,
We have currently set up a 16-node cluster running xen3.0.2-3 / Linux 2.6.16-13, with MPI set up and running across the cluster. I construct a ring of 4 machines (3 physical nodes and 1 virtual one) and run an MPI benchmark (smg2000), which completes fine. Very nice.
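For reference, the job is launched MPICH-style, roughly as follows. The host names below are placeholders (vm1 stands for the virtual node), and the smg2000 grid arguments are only an example:

   $ cat machines     # placeholder host names; vm1 is the Xen guest
   node1
   node2
   node3
   vm1
   $ mpirun -np 4 -machinefile machines ./smg2000 -n 35 35 35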

Now, while the MPI benchmark is running on the ring, I try to live-migrate the virtual machine. This produces a kernel BUG in the virtual machine; the dump is pasted below, along with the error thrown by the MPI benchmark application. (It looks like some kind of memory corruption during migration...)
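The migration itself is issued from dom0 on the source host while the benchmark is still running, roughly:

   $ xm migrate --live vm1 node4   # vm1 = guest domain, node4 = destination host (placeholder names)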

Has anyone successfully done a live migration while an MPI application is running?
Could you suggest how I should approach this? (On seeing the glibc errors, I moved /lib64/tls to /lib64/tls.disabled, but it made no difference.)

Thank you,
Arun

1. Error message from the console of the virtual machine running the MPI benchmark:

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at mm/mmap.c:1961
invalid opcode: 0000 [3] SMP
CPU 0
Modules linked in: ipv6 autofs4 i2c_dev i2c_core dm_mirror dm_mod lp  parport_pc parport
Pid: 4790, comm: smg2000 Not tainted 2.6.16.13-xen #7
RIP: e030:[<ffffffff8016a42b>] <ffffffff8016a42b>{exit_mmap+235}
RSP: e02b:ffff880012cddcd8  EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff8800021a42c0 RCX: 000000000000011d
RDX: ffffffffff578000 RSI: ffff88002b33e6b8 RDI: ffff88000116da80
RBP: 0000000000000000 R08: ffff8800395445b0 R09: 0000000000000000
R10: 0000000000000537 R11: ffffffff801dac20 R12: ffff880002700880
R13: 0000000000000001 R14: 0000000000000006 R15: ffffffff8010b45d
FS:  00002b1d2333e6f0(0000) GS:ffffffff80514000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process smg2000 (pid: 4790, threadinfo ffff880012cdc000, task ffff88003f9828b0)
Stack: 0000000000002181 ffff8800021a42c0 ffff880002700880 ffff880002700900
       ffff88003f982f1c ffffffff8012ef94 0000000000000006 0000000000000006
       ffff88003f9828b0 ffffffff80135479
Call Trace: <ffffffff8012ef94>{mmput+52} <ffffffff80135479>{do_exit+521}
       <ffffffff8013deae>{__dequeue_signal+478} <ffffffff8010b45d>{sysret_signal+56}
       <ffffffff80135c28>{do_group_exit+264} <ffffffff8014062c>{get_signal_to_deliver+1708}
       <ffffffff8010b45d>{sysret_signal+56} <ffffffff8010a5ed>{do_signal+157}
       <ffffffff801378cb>{current_fs_time+59} <ffffffff803a3c62>{__down_read+18}
       <ffffffff80129eec>{try_to_wake_up+924} <ffffffff80196864>{dput+84}
       <ffffffff8013d62c>{sigprocmask+220} <ffffffff8013ef23>{sys_rt_sigprocmask+99}
       <ffffffff8017b768>{filp_close+104} <ffffffff8013d62c>{sigprocmask+220}
       <ffffffff8010b45d>{sysret_signal+56} <ffffffff8010b735>{ptregscall_common+61}

Code: 0f 0b 68 95 2b 3d 80 c2 a9 07 48 83 c4 10 5b 5d 41 5c c3 66
RIP <ffffffff8016a42b>{exit_mmap+235} RSP <ffff880012cddcd8>
 <1>Fixing recursive fault but reboot is needed!


2. Error thrown by the MPI benchmark application:
*** glibc detected *** smg2000: free(): invalid pointer: 0x00000000017ef1a0 ***
======= Backtrace: =========
/lib64/libc.so.6[0x2b1d23162c43]
/lib64/libc.so.6(__libc_free+0x84)[0x2b1d23162dc4]
smg2000[0x42b632]
smg2000[0x4289ee]
smg2000[0x41d261]
smg2000[0x405dee]
smg2000[0x4056a8]
smg2000[0x408aad]
smg2000[0x403c05]
smg2000[0x403730]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x2b1d23111e54]
smg2000[0x402269]
======= Memory map: ========
00400000-004bd000 r-xp 00000000 00:15 33425271                           /nfsroot/home/abnagara/code/bm/smg2000/test/smg2000
005bc000-005be000 rw-p 000bc000 00:15 33425271                           /nfsroot/home/abnagara/code/bm/smg2000/test/smg2000
005be000-0180f000 rw-p 005be000 00:00 0                                  [heap]
36f8e00000-36f8e0d000 r-xp 00000000 00:0c 37044253                       /nfsroot/lib64/libgcc_s-4.1.0-20060304.so.1
36f8e0d000-36f8f0d000 ---p 0000d000 00:0c 37044253                       /nfsroot/lib64/libgcc_s-4.1.0-20060304.so.1
36f8f0d000-36f8f0e000 rw-p 0000d000 00:0c 37044253                       /nfsroot/lib64/libgcc_s-4.1.0-20060304.so.1
2b1d22c37000-2b1d22c51000 r-xp 00000000 00:0c 37044225                   /nfsroot/lib64/ld-2.4.so
2b1d22c51000-2b1d22c52000 rw-p 2b1d22c51000 00:00 0
2b1d22c73000-2b1d22c74000 rw-p 2b1d22c73000 00:00 0
2b1d22d50000-2b1d22d51000 r--p 00019000 00:0c 37044225                   /nfsroot/lib64/ld-2.4.so
2b1d22d51000-2b1d22d52000 rw-p 0001a000 00:0c 37044225                   /nfsroot/lib64/ld-2.4.so
2b1d22d52000-2b1d22dd2000 r-xp 00000000 00:0c 37044256                   /nfsroot/lib64/libm-2.4.so
2b1d22dd2000-2b1d22ed2000 ---p 00080000 00:0c 37044256                   /nfsroot/lib64/libm-2.4.so
2b1d22ed2000-2b1d22ed3000 r--p 00080000 00:0c 37044256                   /nfsroot/lib64/libm-2.4.so
2b1d22ed3000-2b1d22ed4000 rw-p 00081000 00:0c 37044256                   /nfsroot/lib64/libm-2.4.so
2b1d22ed4000-2b1d22ee6000 r-xp 00000000 00:0c 37044273                   /nfsroot/lib64/libpthread-2.4.so
2b1d22ee6000-2b1d22fe6000 ---p 00012000 00:0c 37044273                   /nfsroot/lib64/libpthread-2.4.so
2b1d22fe6000-2b1d22fe7000 r--p 00012000 00:0c 37044273                   /nfsroot/lib64/libpthread-2.4.so
2b1d22fe7000-2b1d22fe8000 rw-p 00013000 00:0c 37044273                   /nfsroot/lib64/libpthread-2.4.so
2b1d22fe8000-2b1d22fec000 rw-p 2b1d22fe8000 00:00 0
2b1d22fec000-2b1d22ff3000 r-xp 00000000 00:0c 37044275                   /nfsroot/lib64/librt-2.4.so
2b1d22ff3000-2b1d230f2000 ---p 00007000 00:0c 37044275                   /nfsroot/lib64/librt-2.4.so
2b1d230f2000-2b1d230f3000 r--p 00006000 00:0c 37044275                   /nfsroot/lib64/librt-2.4.so
2b1d230f3000-2b1d230f4000 rw-p 00007000 00:0c 37044275                   /nfsroot/lib64/librt-2.4.so
2b1d230f4000-2b1d230f5000 rw-p 2b1d230f4000 00:00 0
2b1d230f5000-2b1d23234000 r-xp 00000000 00:0c 37044234                   /nfsroot/lib64/libc-2.4.so
2b1d23234000-2b1d23334000 ---p 0013f000 00:0c 37044234                   /nfsroot/lib64/libc-2.4.so
2b1d23334000-2b1d23338000 r--p 0013f000 00:0c 37044234                   /nfsroot/lib64/libc-2.4.so
2b1d23338000-2b1d23339000 rw-p 00143000 00:0c 37044234                   /nfsroot/lib64/libc-2.4.so
2b1d23339000-2b1d23458000 rw-p 2b1d23339000 00:00 0
2b1d2348d000-2b1d2350f000 rw-p 2b1d2348d000 00:00 0
2b1d2352b000-2b1d2426d000 rw-p 2b1d2352b000 00:00 0
2b1d24300000-2b1d24321000 rw-p 2b1d24300000 00:00 0
2b1d24321000-2b1d24400000 ---p 2b1d24321000 00:00 0
7fffffd72000-7fffffd87000 rw-p 7fffffd72000 00:00 0                      [stack]
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0                  [vdso]