WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
 
   
 

xen-users

[Xen-users] Migration stalls with 2.6.26.5 kernel

To: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-users] Migration stalls with 2.6.26.5 kernel
From: Trevor Bentley <trevor.bentley@xxxxxxxxxxxxxxxxxxx>
Date: Thu, 18 Sep 2008 15:55:56 -0400
Hello,

I have been struggling through the task of moving our infrastructure over to Xen VMs. We initially used Ubuntu packages for both dom0 and our domUs, but experienced extreme instability, so we moved dom0 to CentOS, which has been much more reliable. Since we already had a number of Ubuntu VMs, we left them on the Ubuntu 2.6.24-19-xen kernel, but this has turned out to be a mistake: we get frequent kernel oopses during heavy disk I/O. We modified the kernel to add NFS-root support, but that is the only change we made to the original config. All of our domUs mount their root file systems over NFS.

My problem started when I upgraded the domU kernels to the latest kernel.org stable release (2.6.26.5). I did manage to get it working after some initial trouble (TCP checksum offloading was breaking NFS), but the new kernel will not live migrate. When I execute the live migrate command:

# xm migrate --live testvm 192.168.1.20

The migration hangs forever. The VM is renamed to "migrate-testvm" and keeps running normally on the source machine, while on the destination machine it appears as "testvm" with state "-br---" and 0 CPU time. I left tcpdump running on the destination machine and captured an 84MB pcap file, which looked normal right up until all traffic stopped completely. If I change only the "kernel=" line in the config script back to the Ubuntu kernel, migration works again.
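For anyone reading the state string above, here is a minimal sketch that decodes the STATE column of `xm list` by letter. The letter meanings follow the xm documentation; the function name `decode_state` is just an illustrative choice, and decoding by letter rather than by column position is an assumption made so that the string is read exactly as quoted.

```python
# Letter meanings as documented for the STATE column of `xm list`.
STATE_FLAGS = {
    "r": "running",
    "b": "blocked",
    "p": "paused",
    "s": "shutdown",
    "c": "crashed",
    "d": "dying",
}


def decode_state(state):
    """Return the names of the flags set in an `xm list` STATE string."""
    # Decode by letter, not by column position; '-' means "flag not set".
    return [STATE_FLAGS[ch] for ch in state if ch in STATE_FLAGS]


print(decode_state("-br---"))
```

A domain reported as blocked with 0 CPU time is consistent with a guest that has been created on the destination but has not yet been given any work to do.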

Here's my VM configuration:
-------------------
name        = 'testvm'
kernel      = '/xen_vm/global/kernels/vmlinuz-2.6.26.5'
ramdisk     = '/xen_vm/global/kernels/initrd.img-xen-latest'
memory      = '256'
disk        = ['tap:aio:/xen_vm/global/swaps/testvm.img,xvda1,w']
vif         = [
               'mac=00:16:3e:5b:8d:5d,bridge=xenbr0',
               'mac=00:16:3e:99:9b:e7,bridge=xenbr1'
             ]
on_poweroff = 'destroy'
on_reboot   = 'restart'
on_crash    = 'restart'
extra       = '2 console=hvc0 root=/dev/nfs ip=:192.168.1.12::::eth1:'
nfs_server  = '192.168.1.12'
nfs_root    = '/xen_vm/testvm'
-------------------
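Since the only difference between the working and the failing setup is the "kernel=" line, a small helper can produce otherwise identical config variants for back-to-back tests. This is only a sketch: the function name `with_kernel` and the replacement path are hypothetical, and the post does not say where the Ubuntu kernel image is stored.

```python
import re


def with_kernel(config_text, kernel_path):
    """Return config_text with its kernel= assignment pointing at kernel_path."""
    pattern = re.compile(r"^kernel\s*=.*$", flags=re.MULTILINE)
    # A function replacement avoids backslash-escape surprises in the path.
    new_text, count = pattern.subn(
        lambda m: "kernel      = %r" % kernel_path, config_text
    )
    if count != 1:
        raise ValueError("expected exactly one kernel= line, found %d" % count)
    return new_text


config = """name        = 'testvm'
kernel      = '/xen_vm/global/kernels/vmlinuz-2.6.26.5'
memory      = '256'
"""

# Hypothetical path: substitute the real location of the Ubuntu kernel image.
print(with_kernel(config, "/path/to/ubuntu-xen-kernel"))
```

Keeping every other setting byte-for-byte identical makes it easier to argue that the kernel alone is responsible for the stall.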


xend.log on source:
-------------------
[2008-09-18 15:51:11 xend 3751] DEBUG (balloon:127) Balloon: 786956 KiB free; need 2048; done.
[2008-09-18 15:51:11 xend 3751] DEBUG (XendCheckpoint:89) [xc_save]: /usr/lib/xen/bin/xc_save 33 38 0 0 1
-------------------

xend.log on destination:
-------------------
...
[2008-09-18 15:51:11 xend.XendDomainInfo 3331] DEBUG (XendDomainInfo:1350) XendDomainInfo.construct: None
[2008-09-18 15:51:11 xend 3331] DEBUG (balloon:127) Balloon: 262832 KiB free; need 2048; done.
...
[2008-09-18 15:51:11 xend 3331] DEBUG (blkif:24) exception looking up device number for xvda1: [Errno 2] No such file or directory: '/dev/xvda1'
[2008-09-18 15:51:11 xend 3331] DEBUG (DevController:110) DevController: writing {'backend-id': '0', 'virtual-device': '51713', 'device-type': 'disk', 'state': '1', 'backend': '/local/domain/0/backend/tap/10/51713'} to /local/domain/10/device/vbd/51713.
...
[2008-09-18 15:51:12 xend 3331] DEBUG (XendCheckpoint:198) restore:shadow=0x0, _static_max=0x100, _static_min=0x100,
[2008-09-18 15:51:12 xend 3331] DEBUG (balloon:127) Balloon: 262832 KiB free; need 262144; done.
[2008-09-18 15:51:12 xend 3331] DEBUG (XendCheckpoint:215) [xc_restore]: /usr/lib/xen/bin/xc_restore 24 10 1 2 0 0 0
-------------------
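When comparing xend.log on both hosts, it can help to break the log into individual entries and pull out the xc_save and xc_restore invocations. The following is a rough sketch that assumes every entry starts with a bracketed timestamp in the format seen in the excerpts above; the function names are illustrative, and it needs Python 3.7 or later because it splits on a zero-width match.

```python
import re

# Assumed entry prefix: "[YYYY-MM-DD HH:MM:SS <logger> <pid>]".
# The lookahead splits in front of each timestamp without consuming it.
ENTRY_START = re.compile(r"(?=\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} )")

# Helper command lines in the form "[xc_save]: <path> <args...>".
HELPER = re.compile(r"\[(xc_save|xc_restore)\]: (\S+) (.*)")


def split_entries(text):
    """Split raw log text into one string per log entry."""
    return [part.strip() for part in ENTRY_START.split(text) if part.strip()]


def helper_invocations(text):
    """Return (helper, path, args) for every xc_save/xc_restore entry."""
    found = []
    for entry in split_entries(text):
        match = HELPER.search(entry)
        if match:
            found.append((match.group(1), match.group(2), match.group(3).split()))
    return found


sample = (
    "[2008-09-18 15:51:11 xend 3751] DEBUG (balloon:127) Balloon: 786956 KiB "
    "free; need 2048; done. [2008-09-18 15:51:11 xend 3751] DEBUG "
    "(XendCheckpoint:89) [xc_save]: /usr/lib/xen/bin/xc_save 33 38 0 0 1"
)

for entry in split_entries(sample):
    print(entry)
print(helper_invocations(sample))
```

In both excerpts the helper invocation is the last entry shown, so listing those lines with their timestamps shows when each side handed off to its helper.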


Xen version: xen-3.0-x86_32p
dom0: 2.6.18-92.1.10.el5xen

Anybody know what would cause this, or have any suggestions for tracking down the problem? I did find a post from someone who was seeing identical behavior who claimed he fixed it by enabling CPU Hotplug support, but I already have that enabled in the kernel.

Thanks,

Trevor

