xen-devel
[Xen-devel] Live migration with MPI
Hi, We have set up a 16-node cluster with Xen 3.0.2-3 / Linux 2.6.16-13. We also have MPI set up and running on the cluster. I construct a ring of 4 machines, with 3 real nodes and 1 virtual one, and run an MPI application (the smg2000 benchmark), and it completes fine. Very nice.
Now, while the MPI benchmark is running on the ring, I try to live migrate the virtual machine. This produces a 'Kernel BUG' in the virtual machine, with the dump pasted below. I am also pasting the error thrown by the MPI benchmark application. (It looks like some kind of memory corruption during migration...)
Has anyone successfully done a live migration while running an MPI application? Could you please advise how to approach this? (On seeing the glibc errors, I moved /lib64/tls to /lib64/tls.disabled, but it made no difference.)
Thank you, Arun
1. Error message on the console of the virtual machine running the MPI application:
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at mm/mmap.c:1961
invalid opcode: 0000 [3] SMP
CPU 0
Modules linked in: ipv6 autofs4 i2c_dev i2c_core dm_mirror dm_mod lp parport_pc parport
Pid: 4790, comm: smg2000 Not tainted 2.6.16.13-xen #7
RIP: e030:[<ffffffff8016a42b>] <ffffffff8016a42b>{exit_mmap+235}
RSP: e02b:ffff880012cddcd8 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff8800021a42c0 RCX: 000000000000011d
RDX: ffffffffff578000 RSI: ffff88002b33e6b8 RDI: ffff88000116da80
RBP: 0000000000000000 R08: ffff8800395445b0 R09: 0000000000000000
R10: 0000000000000537 R11: ffffffff801dac20 R12: ffff880002700880
R13: 0000000000000001 R14: 0000000000000006 R15: ffffffff8010b45d
FS: 00002b1d2333e6f0(0000) GS:ffffffff80514000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000
Process smg2000 (pid: 4790, threadinfo ffff880012cdc000, task ffff88003f9828b0)
Stack: 0000000000002181 ffff8800021a42c0 ffff880002700880 ffff880002700900
       ffff88003f982f1c ffffffff8012ef94 0000000000000006 0000000000000006
       ffff88003f9828b0 ffffffff80135479
Call Trace:
<ffffffff8012ef94>{mmput+52}
<ffffffff80135479>{do_exit+521}
<ffffffff8013deae>{__dequeue_signal+478}
<ffffffff8010b45d>{sysret_signal+56}
<ffffffff80135c28>{do_group_exit+264}
<ffffffff8014062c>{get_signal_to_deliver+1708}
<ffffffff8010b45d>{sysret_signal+56}
<ffffffff8010a5ed>{do_signal+157}
<ffffffff801378cb>{current_fs_time+59}
<ffffffff803a3c62>{__down_read+18}
<ffffffff80129eec>{try_to_wake_up+924}
<ffffffff80196864>{dput+84}
<ffffffff8013d62c>{sigprocmask+220}
<ffffffff8013ef23>{sys_rt_sigprocmask+99}
<ffffffff8017b768>{filp_close+104}
<ffffffff8013d62c>{sigprocmask+220}
<ffffffff8010b45d>{sysret_signal+56}
<ffffffff8010b735>{ptregscall_common+61}
Code: 0f 0b 68 95 2b 3d 80 c2 a9 07 48 83 c4 10 5b 5d 41 5c c3 66
RIP <ffffffff8016a42b>{exit_mmap+235} RSP <ffff880012cddcd8>
<1>Fixing recursive fault but reboot is needed!
2. Error thrown by the MPI benchmark application:
*** glibc detected *** smg2000: free(): invalid pointer: 0x00000000017ef1a0 ***
======= Backtrace: =========
/lib64/libc.so.6[0x2b1d23162c43]
/lib64/libc.so.6(__libc_free+0x84)[0x2b1d23162dc4]
smg2000[0x42b632]
smg2000[0x4289ee]
smg2000[0x41d261]
smg2000[0x405dee]
smg2000[0x4056a8]
smg2000[0x408aad]
smg2000[0x403c05]
smg2000[0x403730]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x2b1d23111e54]
smg2000[0x402269]
======= Memory map: ========
00400000-004bd000 r-xp 00000000 00:15 33425271 /nfsroot/home/abnagara/code/bm/smg2000/test/smg2000
005bc000-005be000 rw-p 000bc000 00:15 33425271 /nfsroot/home/abnagara/code/bm/smg2000/test/smg2000
005be000-0180f000 rw-p 005be000 00:00 0 [heap]
36f8e00000-36f8e0d000 r-xp 00000000 00:0c 37044253 /nfsroot/lib64/libgcc_s-4.1.0-20060304.so.1
36f8e0d000-36f8f0d000 ---p 0000d000 00:0c 37044253 /nfsroot/lib64/libgcc_s-4.1.0-20060304.so.1
36f8f0d000-36f8f0e000 rw-p 0000d000 00:0c 37044253 /nfsroot/lib64/libgcc_s-4.1.0-20060304.so.1
2b1d22c37000-2b1d22c51000 r-xp 00000000 00:0c 37044225 /nfsroot/lib64/ld-2.4.so
2b1d22c51000-2b1d22c52000 rw-p 2b1d22c51000 00:00 0
2b1d22c73000-2b1d22c74000 rw-p 2b1d22c73000 00:00 0
2b1d22d50000-2b1d22d51000 r--p 00019000 00:0c 37044225 /nfsroot/lib64/ld-2.4.so
2b1d22d51000-2b1d22d52000 rw-p 0001a000 00:0c 37044225 /nfsroot/lib64/ld-2.4.so
2b1d22d52000-2b1d22dd2000 r-xp 00000000 00:0c 37044256 /nfsroot/lib64/libm-2.4.so
2b1d22dd2000-2b1d22ed2000 ---p 00080000 00:0c 37044256 /nfsroot/lib64/libm-2.4.so
2b1d22ed2000-2b1d22ed3000 r--p 00080000 00:0c 37044256 /nfsroot/lib64/libm-2.4.so
2b1d22ed3000-2b1d22ed4000 rw-p 00081000 00:0c 37044256 /nfsroot/lib64/libm-2.4.so
2b1d22ed4000-2b1d22ee6000 r-xp 00000000 00:0c 37044273 /nfsroot/lib64/libpthread-2.4.so
2b1d22ee6000-2b1d22fe6000 ---p 00012000 00:0c 37044273 /nfsroot/lib64/libpthread-2.4.so
2b1d22fe6000-2b1d22fe7000 r--p 00012000 00:0c 37044273 /nfsroot/lib64/libpthread-2.4.so
2b1d22fe7000-2b1d22fe8000 rw-p 00013000 00:0c 37044273 /nfsroot/lib64/libpthread-2.4.so
2b1d22fe8000-2b1d22fec000 rw-p 2b1d22fe8000 00:00 0
2b1d22fec000-2b1d22ff3000 r-xp 00000000 00:0c 37044275 /nfsroot/lib64/librt-2.4.so
2b1d22ff3000-2b1d230f2000 ---p 00007000 00:0c 37044275 /nfsroot/lib64/librt-2.4.so
2b1d230f2000-2b1d230f3000 r--p 00006000 00:0c 37044275 /nfsroot/lib64/librt-2.4.so
2b1d230f3000-2b1d230f4000 rw-p 00007000 00:0c 37044275 /nfsroot/lib64/librt-2.4.so
2b1d230f4000-2b1d230f5000 rw-p 2b1d230f4000 00:00 0
2b1d230f5000-2b1d23234000 r-xp 00000000 00:0c 37044234 /nfsroot/lib64/libc-2.4.so
2b1d23234000-2b1d23334000 ---p 0013f000 00:0c 37044234 /nfsroot/lib64/libc-2.4.so
2b1d23334000-2b1d23338000 r--p 0013f000 00:0c 37044234 /nfsroot/lib64/libc-2.4.so
2b1d23338000-2b1d23339000 rw-p 00143000 00:0c 37044234 /nfsroot/lib64/libc-2.4.so
2b1d23339000-2b1d23458000 rw-p 2b1d23339000 00:00 0
2b1d2348d000-2b1d2350f000 rw-p 2b1d2348d000 00:00 0
2b1d2352b000-2b1d2426d000 rw-p 2b1d2352b000 00:00 0
2b1d24300000-2b1d24321000 rw-p 2b1d24300000 00:00 0
2b1d24321000-2b1d24400000 ---p 2b1d24321000 00:00 0
7fffffd72000-7fffffd87000 rw-p 7fffffd72000 00:00 0 [stack]
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0 [vdso]
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
- [Xen-devel] Live migration with MPI,
Arun Babu <=