It looks like a live migration bug may have been introduced in 22.214.171.124...
I've been experiencing issues where, upon live migration, the domU simply
hangs once it is resumed on the target dom0. I've been unable to get
any crash information out of the domU, and nothing comes up in xm dmesg.
There could be a kernel panic happening, but since I can't connect to the
console during the migration I haven't been able to get anything useful.
Comparing a successful migration to a failed one in the xend.log and
xen-debug.log, nothing stands out as being different.
After testing a wide variety of VMs to see why some worked and some didn't,
I've narrowed it down to the domU kernel version, and to 126.96.36.199
specifically, by trying these versions:
All are the stock kernel off kernel.org.
Note that this isn't consistent at all. I've got 6 dom0s and this only
happens when migrating in certain directions between certain dom0s:
Previously, xen6->5 worked but xen5->6 didn't. After a few reboots
(of the dom0s), however, the problem between them resolved itself and now I
can go xen5->6 and back all day on 188.8.131.52 without issues. If I then
migrate it to xen1 it's fine, but back to xen5 and it locks up on resume.
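
Since the direction matters, every ordered pair of dom0s probably needs
to be tested separately. A rough sketch of how that test matrix could be
generated is below. The host names other than xen1, xen5 and xen6, the
domain name "testvm", and passwordless root ssh to each dom0 are all
assumptions for illustration; the script only prints the commands
rather than running them.

```python
from itertools import permutations

# Assumed host names; only xen1, xen5 and xen6 are named in this report.
DOM0S = ["xen1", "xen2", "xen3", "xen4", "xen5", "xen6"]


def migration_pairs(hosts):
    """Return every ordered (source, target) pair, since xen5->6 and
    xen6->5 can behave differently."""
    return list(permutations(hosts, 2))


def migrate_command(domain, source, target):
    """Build the command that live-migrates `domain` from source to target.

    Assumes passwordless ssh as root to the source dom0 (hypothetical setup).
    """
    return ["ssh", f"root@{source}", "xm", "migrate", "--live", domain, target]


if __name__ == "__main__":
    # "testvm" is a placeholder domU name.
    for src, dst in migration_pairs(DOM0S):
        print(" ".join(migrate_command("testvm", src, dst)))
```

Recording pass/fail per ordered pair over several runs would show
whether the failures follow the source host, the target host, or the
specific combination.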
All 6 Xen dom0s are identical:
xen5 ~ # xm info
host : xen5
release : 184.108.40.206
version : #11 SMP Wed Jan 26 10:55:28 PST 2011
machine : x86_64
nr_cpus : 12
nr_nodes : 2
cores_per_socket : 6
threads_per_core : 1
cpu_mhz : 2266
virt_caps : hvm hvm_directio
total_memory : 40950
free_memory : 38380
node_to_cpu : node0:0-5
node_to_memory : node0:23388
node_to_dma32_mem : node0:2994
max_node_id : 1
xen_major : 4
xen_minor : 0
xen_extra : .1-rc6-pre
xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32
xen_scheduler : credit
xen_pagesize : 4096
platform_params : virt_start=0xffff800000000000
xen_changeset : unavailable
xen_commandline : console=com1,com2,vga com1=115200,8n1 com2=115200,8n1 dom0_mem=1024M dom0_max_vcpus=1 dom0_vcpus_pin=true
cc_compiler : gcc version 4.3.4 (Gentoo 4.3.4 p1.1, pie-10.1.5)
cc_compile_by : root
cc_compile_date : Tue Jan 25 17:05:03 PST 2011
xend_config_format : 4
I've tried updating to a newer dom0 release but ran into linking issues
due to as-needed, so I haven't managed to get them up yet.
Looking at the changelog for 220.127.116.11,
there were two Xen patches made, both involving resuming.
Diffs for the two patches:
I've tried reversing them together and one at a time, yet the problem
still happens. I then took 18.104.22.168 and applied those patches and it's
completely stable. So whatever is causing this was apparently not a
Does anyone have any ideas on what might be going on here or how I can
debug it further? I'm completely stumped at this point, and I don't want to
just try applying every patch in 22.214.171.124 to see which one is doing it.
Compiling + testing all these kernels is time consuming =)
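
One way to cut down the number of builds might be to bisect the ordered
patch list instead of walking it linearly, which needs roughly log2(N)
kernels rather than N. A minimal sketch of the idea follows. It assumes
the patches apply cleanly in order and that a single patch flips the
result from good to bad; `is_bad` is a hypothetical callback standing in
for "build a kernel with the first n patches and test migration". Since
the hang is intermittent, a "good" verdict would need several migrations
before being trusted.

```python
def first_bad(patches, is_bad):
    """Return the index of the first patch whose inclusion causes the hang.

    `is_bad(n)` reports whether a kernel with the first n patches applied
    hangs on resume. Assumes is_bad(0) is False, is_bad(len(patches)) is
    True, and the result flips exactly once in between.
    """
    lo, hi = 0, len(patches)  # lo: known good count, hi: known bad count
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(mid):
            hi = mid  # the culprit is among the first `mid` patches
        else:
            lo = mid  # the first `mid` patches are fine
    return hi - 1
```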