All,
I am currently running a xen-2.0-testing snapshot from April 20. I'm
having sporadic problems with migration.
I have two xen machines, 10.130.2.35 and 10.130.2.36, booting from a
read-only iscsi target (a loopback-mounted iso image) served from a
third machine. I'm using the Cisco iscsi-initiator and iscsi-init module
for the boot. The iscsi has been solid so far.
The target ends up attached as /dev/sda in Dom 0 on both machines. As
the following xenU config file shows, that same read-only device is
then exported as /dev/hda when a xenU gets created:
kernel = "/boot/kernel-2.6.11-xen-2.0.5-domU"
ramdisk = "/boot/initrd"
memory = 64
name = "test"
vif = [ 'mac=00:55:4F:44:00:01' ]
disk = [ 'phy:sda,hda,r' ]
dhcp="dhcp"
root = "/dev/ram0 ro init=/linuxrc cdroot"
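For reference, the disk line above packs three comma-separated fields
into one string: the backend device in Dom 0, the device name the domU
sees, and the access mode. A minimal sketch of how it breaks down
(parse_disk_spec is a hypothetical helper written for illustration, not
part of the Xen tools):

```python
def parse_disk_spec(spec):
    """Split a disk entry such as 'phy:sda,hda,r' into its fields."""
    # Three comma-separated fields: backend uname, frontend device, mode.
    uname, frontend, mode = spec.split(",")
    # The uname is '<type>:<device>', e.g. 'phy:sda'.
    backend_type, _, backend_dev = uname.partition(":")
    if mode not in ("r", "w"):
        raise ValueError("mode must be 'r' or 'w', got %r" % mode)
    return {
        "type": backend_type,      # 'phy' = physical block device in Dom 0
        "backend": backend_dev,    # device as seen in Dom 0
        "frontend": frontend,      # device as seen inside the domU
        "read_only": mode == "r",
    }


if __name__ == "__main__":
    print(parse_disk_spec("phy:sda,hda,r"))
```

For the config above this reports backend sda, frontend hda, and
read_only set, which matches the (vbd ...) entry in the source log.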
Everything boots just fine. The "test" xenU runs flawlessly; I can ssh
into it, run whatever. No problems there. And it's surprisingly fast
over iscsi, even though I've only got 100 Mbit Ethernet adapters.
BUT...
I've been migrating between the machines, both live and non-live, with
mixed success. Roughly 1 in 10 migrations fails with the errors shown
in the attached xfrd.log files. The .1 file is from the source of the
migration and the .2 file is from the destination. The other 9 times
out of 10, it migrates just fine.
I don't seem to get these problems when I do not export /dev/sda to a
domU. For example, if I use just a simple domU (using the same kernel)
with no mounts and an initrd file system, I don't have these problems.
I saw mailing list messages a while back dealing with migration and the
possibility of a crash under heavy network load. Further, I saw a patch
that had been applied:
<QUOTE>
[PATCH] stream fixes for migration
I've attached a patch for libxutil/libxc. This fixes one of the hangs
I've seen during migrations. It applies against 2.0 and 2.0-testing.
Changes:
* Encountering EOF or error when xfrd reads from stream could cause an
infinite loop.
* Cleaned up the closing of streams.
* Fixed several memory leaks.
Signed-off-by: Charles Coffing <ccoffing@xxxxxxxxxx>
</QUOTE>
The version of 2.0-testing I'm using has this patch applied. But the
comments in this patch imply that there are still more "hangs" during
migration. Have I stumbled on another one of these?
I believe this patch fixed a previous problem: I would get a looping
hang under 2.0.5 stable, and I haven't seen that since going to
2.0-testing.
Am I wrong to assume that an iscsi target can be mounted read-only
twice?
Or could hardware be a factor? For testing, I'm just running cheap-o VIA
Rhine 100-TX controllers. I thought I would post this before shelling
out for some Intel gig nics and gig switches though.
Thank you very much for your help.
-James Henderson
2605 [INF] XFRD> Accepted connection from 127.0.0.1:1145 on 2
2759 [INF] XFRD> Xfr service for 127.0.0.1:1145
[DEBUG] Conn_init> flags=1
[DEBUG] Conn_init> write stream...
[DEBUG] stream_init>mode=w flags=1 compress=0
[DEBUG] stream_init> unbuffer...
[DEBUG] stream_init< err=0
[DEBUG] Conn_init> read stream...
[DEBUG] stream_init>mode=r flags=1 compress=0
[DEBUG] stream_init> unbuffer...
[DEBUG] stream_init< err=0
[DEBUG] Conn_sxpr>
(xfr.hello 1 0)[DEBUG] Conn_sxpr< err=0
[DEBUG] Conn_sxpr>
(xfr.migrate 5 "(domain (id 5) (name test) (memory 63) (maxmem 65536) (state
-b---) (cpu 0) (cpu_time 0.137634952) (up_time 15.0879249573) (start_time
1114545755.39) (console (status listening) (id 11) (domain 5) (local_port 11)
(remote_port 1) (console_port 9605)) (devices (vif (idx 0) (vif 0) (mac
00:55:4f:44:00:01)(vifname vif5.0) (evtchn 12 3) (index 0)) (vbd (idx 0) (vdev
768) (device 2048)(mode r) (dev hda) (uname phy:sda) (node sda) (index 0)))
(config (vm (name test) (memory 64) (image (linux (kernel
/boot/kernel-2.6.11-xen-2.0.5-domU) (ramdisk /boot/initrd) (ip
:1.2.3.4::::eth0:dhcp) (root '/dev/ram0 ro init=/linuxrc cdroot'))) (device
(vbd (uname phy:sda) (dev hda) (mode r))) (device (vif (mac
00:55:4F:44:00:01))))))" 10.130.2.36 8002 1 0)[DEBUG] Conn_sxpr< err=0
[DEBUG] Conn_connect> addr=10.130.2.36:8002
[DEBUG] Conn_init> flags=1
[DEBUG] Conn_init> write stream...
[DEBUG] stream_init>mode=w flags=1 compress=0
[DEBUG] stream_init> unbuffer...
[DEBUG] stream_init< err=0
[DEBUG] Conn_init> read stream...
[DEBUG] stream_init>mode=r flags=1 compress=0
[DEBUG] stream_init> unbuffer...
[DEBUG] stream_init< err=0
[DEBUG] Conn_sxpr>
(xfr.err 0)[DEBUG] Conn_sxpr< err=0
[1114545770.483473] xc_linux_save start 5
xc_linux_save start 5
[1114545770.485161] Saving memory pages: iter 1 0%
Saving memory pages: iter 1 0%FNI 189 : [1000000c,1020] pte=00be4063,
mfn=00000be4, pfn=ffffffff [mfn]=deadbeef
6%
12%
18%
25%
31%
38%
44%
50%
56%
63%
69%
75%
82%
88%
95%
1: sent 16165, skipped 219,
1: sent 16165, skipped 219, delta 6695ms, dom0 21%, target 73%, sent 79Mb/s,
dirtied 1Mb/s 260 pages
[1114545777.180435] Saving memory pages: iter 2 0%
2: sent 242, skipped 12, 2 0%
2: sent 242, skipped 12, delta 102ms, dom0 20%, target 79%, sent 77Mb/s,
dirtied 3Mb/s 12 pages
[1114545777.283396] Saving memory pages: iter 3 0%
3: sent 0, skipped 12, r 3 0%
3: sent 0, skipped 12, [DEBUG] Conn_sxpr>
(xfr.err 22)[DEBUG] Conn_sxpr< err=0
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Retry suspend domain (0)
Unable to suspend domain. (0)
Unable to suspend domain. (0)
Domain appears not to have suspended: 0
Domain appears not to have suspended: 0
2759 [WRN] XFRD> Transfer errors:
2759 [WRN] XFRD> state=XFR_STATE err=1
2759 [INF] XFRD> Xfr service err=1
2515 [INF] XFRD> Accepted connection from 10.130.2.35:4227 on 2
2656 [INF] XFRD> Xfr service for 10.130.2.35:4227
[DEBUG] Conn_init> flags=1
[DEBUG] Conn_init> write stream...
[DEBUG] stream_init>mode=w flags=1 compress=0
[DEBUG] stream_init> unbuffer...
[DEBUG] stream_init< err=0
[DEBUG] Conn_init> read stream...
[DEBUG] stream_init>mode=r flags=1 compress=0
[DEBUG] stream_init> unbuffer...
[DEBUG] stream_init< err=0
[DEBUG] Conn_sxpr>
(xfr.hello 1 0)[DEBUG] Conn_sxpr< err=0
[DEBUG] Conn_sxpr>
(xfr.xfr 5)[DEBUG] Conn_sxpr< err=0
[1114545766.260913] xc_linux_restore start
xc_linux_restore start
[1114545766.265957] Created domain 5
Created domain 5
(Domain-0 Domain-5)'domain id=5 name=test memory=64 console=9605
image=/boot/kernel-2.6.11-xen-2.0.5-domU'[1114545766.340293] Reloading memory
pages: 0%
Reloading memory pages: 6%
12%
18%
25%
31%
37%
43%
50%
56%
62%
68%
75%
81%
87%
93%
98%
98%Error when reading from state file
Error when reading from state file
2656 [INF] XFRD> Xfr service err=1
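To compare a failed run against a good one, log excerpts like the two
above can be boiled down with a small script. This is only a sketch:
summarize_xfrd_log is a hypothetical helper, and its regular
expressions assume the line formats seen in these two logs.

```python
import re


def summarize_xfrd_log(text):
    """Pull the key facts out of an xfrd log given as one string."""
    # Completed per-iteration stats lines, e.g.
    #   "1: sent 16165, skipped 219, delta 6695ms, dom0 21%, ..."
    iter_re = re.compile(
        r"^\s*(\d+): sent (\d+), skipped (\d+), delta (\d+)ms", re.M
    )
    iterations = [
        {"iter": int(i), "sent": int(s), "skipped": int(k), "delta_ms": int(d)}
        for i, s, k, d in iter_re.findall(text)
    ]
    # Number of times the save side retried the suspend.
    retries = len(re.findall(r"^Retry suspend domain", text, re.M))
    # Non-zero codes carried in (xfr.err N) messages.
    err_codes = [
        int(c) for c in re.findall(r"\(xfr\.err (\d+)\)", text) if int(c) != 0
    ]
    # xfrd reports the overall result as "Xfr service err=N".
    failed = re.search(r"Xfr service err=[1-9]", text) is not None
    return {
        "iterations": iterations,
        "suspend_retries": retries,
        "xfr_err_codes": err_codes,
        "failed": failed,
    }


if __name__ == "__main__":
    import sys
    print(summarize_xfrd_log(sys.stdin.read()))
```

Run it with a log file on standard input. It prints the completed save
iterations, the number of suspend retries, any non-zero xfr.err codes,
and whether the transfer ended with an error.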
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel