WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-users

[Xen-users] Live checkpointing not working in 3.4.x?

To: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-users] Live checkpointing not working in 3.4.x?
From: Tom Verbiscer <xen@xxxxxxxxxxxxx>
Date: Thu, 04 Mar 2010 00:12:56 -0600
I've been banging my head against a wall for a couple of days now. Does anyone know whether live checkpointing ('xm save -c') currently works in 3.4.x? I've now tried 3.4.0 on OracleVM, 3.4.1 on CentOS 5.4 and 3.4.2 on OpenSolaris, and each platform gives me the same results.

It seems like the suspend works but does not release the devices, so when the resume runs it freaks out because the devices are already attached. I don't know enough about Xen to know whether the devices are supposed to remain attached (because the domain isn't destroyed) or not. Every time I try to live checkpoint, the VM winds up suspended, and the only way to bring it back to life is to run 'xm destroy' on it and then 'xm resume'.

I'll be happy to provide more logs if I'm leaving something out. The following is from an OracleVM hypervisor (yes, OracleVM doesn't support checkpointing, but the results are the same with vanilla Xen). It also doesn't matter whether I use a file-backed device for the disk, a physical device, or a file on an NFS share: same result.

Thanks,
Tom

[root@compute-01 ~]# rpm -qa | grep xen
xen-devel-3.4.0-0.0.23.el5
xen-tools-3.4.0-0.0.23.el5
xen-debugger-3.4.0-0.0.23.el5
xen-3.4.0-0.0.23.el5
xen-64-3.4.0-0.0.23.el5
[root@compute-01 ~]# uname -a
Linux compute-01.example.com 2.6.18-128.2.1.4.9.el5xen #1 SMP Fri Oct 9 14:57:31 EDT 2009 i686 i686 i386 GNU/Linux

[root@compute-01 ~]# cat /OVS/running_pool/1_ovm_pv_01_example_com/vm.cfg
bootargs = 'bridge=xenbr0,mac=00:16:3E:AA:EB:08,type=netfront'
bootloader = '/usr/bin/pypxeboot'
disk = ['file:/tmp/System.img,xvda,w']
maxmem = 512
memory = 512
name = '1_ovm_pv_01_example_com'
on_crash = 'restart'
on_reboot = 'restart'
uuid = '7408c627-3232-4c1d-b5e3-1cf05cb015c8'
vcpus = 1
vfb = ['type=vnc,vncunused=1,vnclisten=0.0.0.0,vncpasswd=<removed>']
vif = ['bridge=xenbr0,mac=00:16:3E:AA:EB:08,type=netfront']
vif_other_config = []
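
For reference, the disk entry in the config above and the device id that shows up in xend.log (vbd/51712) refer to the same device. A minimal sketch of the mapping, assuming the usual `backend:path,frontend,mode` layout of an xm disk string and the Linux xvd block major of 202 with 16 minors reserved per disk. The function names are made up for illustration and are not part of xend:

```python
XVD_MAJOR = 202        # assumed Linux block major for xvd devices
MINORS_PER_DISK = 16   # assumed: each xvd disk reserves 16 minors for partitions


def parse_disk_spec(spec):
    """Split an xm disk string such as 'file:/tmp/System.img,xvda,w'."""
    uname, dev, mode = spec.split(",")
    backend, path = uname.split(":", 1)
    return {"backend": backend, "path": path, "dev": dev, "mode": mode}


def xvd_devid(dev):
    """Return the numeric device id for a whole-disk name, xvda through xvdp."""
    if not (len(dev) == 4 and dev.startswith("xvd") and "a" <= dev[3] <= "p"):
        raise ValueError("only whole-disk names xvda..xvdp are handled: %r" % dev)
    disk_index = ord(dev[3]) - ord("a")
    # Device id is (major << 8) | minor; the minor of disk N is N * 16.
    return (XVD_MAJOR << 8) | (disk_index * MINORS_PER_DISK)


disk = parse_disk_spec("file:/tmp/System.img,xvda,w")
print(disk["backend"], disk["dev"], xvd_devid(disk["dev"]))  # file xvda 51712
```

Under those assumptions xvda works out to 202 * 256 + 0 = 51712, which matches the 'devid': 51712 reported for 'dev': 'xvda' in the log.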



xend.log

[2010-03-02 17:22:38 2840] DEBUG (XendCheckpoint:110) [xc_save]: /usr/lib/xen/bin/xc_save 43 6 0 0 0
[2010-03-02 17:22:38 2840] INFO (XendCheckpoint:418) xc_save: failed to get the suspend evtchn port
[2010-03-02 17:22:38 2840] INFO (XendCheckpoint:418)
[2010-03-02 17:22:38 2840] DEBUG (XendCheckpoint:389) suspend
[2010-03-02 17:22:38 2840] DEBUG (XendCheckpoint:113) In saveInputHandler suspend
[2010-03-02 17:22:38 2840] DEBUG (XendCheckpoint:115) Suspending 6 ...
[2010-03-02 17:22:38 2840] DEBUG (XendDomainInfo:520) XendDomainInfo.shutdown(suspend)
[2010-03-02 17:22:38 2840] DEBUG (XendDomainInfo:1727) XendDomainInfo.handleShutdownWatch
[2010-03-02 17:22:38 2840] DEBUG (XendDomainInfo:1727) XendDomainInfo.handleShutdownWatch
[2010-03-02 17:22:38 2840] INFO (XendDomainInfo:1915) Domain has shutdown: name=migrating-1_ovm_pv_01_example_com id=6 reason=suspend.
[2010-03-02 17:22:38 2840] INFO (XendCheckpoint:121) Domain 6 suspended.
[2010-03-02 17:22:38 2840] DEBUG (XendCheckpoint:130) Written done
[2010-03-02 17:22:38 2840] INFO (XendCheckpoint:418) Had 0 unexplained entries in p2m table
[2010-03-02 17:22:46 2840] INFO (XendCheckpoint:418) Saving memory pages: iter 1 0%^H^H^H^H 5%^H^H^H^H 10%^H^H^H^H 15%^H^H^H^H 20%^H^H^H^H 25%^H^H^H^H 30%^H^H^H^H 35%^H^H^H^H 40%^H^H^H^H 45%^H^H^H^H 50%^H^H^H^H 55%^H^H^H^H 60%^H^H^H^H 65%^H^H^H^H 70%^H^H^H^H 75%^H^H^H^H 80%^H^H^H^H 85%^H^H^H^H 90%^H^H^H^H 95%^M 1: sent 131072, skipped 0, delta 8194ms, dom0 17%, target 0%, sent 524Mb/s, dirtied 0Mb/s 0 pages
[2010-03-02 17:22:46 2840] INFO (XendCheckpoint:418) Total pages sent= 131072 (0.98x)
[2010-03-02 17:22:46 2840] INFO (XendCheckpoint:418) (of which 0 were fixups)
[2010-03-02 17:22:46 2840] INFO (XendCheckpoint:418) All memory is saved
[2010-03-02 17:22:47 2840] INFO (XendCheckpoint:418) Save exit rc=0
[2010-03-02 17:22:47 2840] DEBUG (XendDomainInfo:2804) XendDomainInfo.resumeDomain(6)
[2010-03-02 17:22:47 2840] DEBUG (XendDomainInfo:2221) Destroying device model
[2010-03-02 17:22:47 2840] INFO (image:553) migrating-1_ovm_pv_01_example_com device model terminated
[2010-03-02 17:22:47 2840] DEBUG (XendDomainInfo:2228) Releasing devices
[2010-03-02 17:22:47 2840] DEBUG (XendDomainInfo:2241) Removing vif/0
[2010-03-02 17:22:47 2840] DEBUG (XendDomainInfo:1144) XendDomainInfo.destroyDevice: deviceClass = vif, device = vif/0
[2010-03-02 17:22:47 2840] DEBUG (XendDomainInfo:2241) Removing vbd/51712
[2010-03-02 17:22:47 2840] DEBUG (XendDomainInfo:1144) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/51712
[2010-03-02 17:22:47 2840] DEBUG (XendDomainInfo:2241) Removing vkbd/0
[2010-03-02 17:22:47 2840] DEBUG (XendDomainInfo:1144) XendDomainInfo.destroyDevice: deviceClass = vkbd, device = vkbd/0
[2010-03-02 17:22:47 2840] DEBUG (XendDomainInfo:2241) Removing vfb/0
[2010-03-02 17:22:47 2840] DEBUG (XendDomainInfo:1144) XendDomainInfo.destroyDevice: deviceClass = vfb, device = vfb/0
[2010-03-02 17:22:47 2840] DEBUG (XendDomainInfo:2241) Removing console/0
[2010-03-02 17:22:47 2840] DEBUG (XendDomainInfo:1144) XendDomainInfo.destroyDevice: deviceClass = console, device = console/0
[2010-03-02 17:22:47 2840] INFO (XendDomainInfo:3028) Dev 51712 still active, looping...
[2010-03-02 17:22:47 2840] INFO (XendDomainInfo:3028) Dev 51712 still active, looping...
<many>
[2010-03-02 17:23:17 2840] INFO (XendDomainInfo:3028) Dev 51712 still active, looping...
[2010-03-02 17:23:17 2840] INFO (XendDomainInfo:3034) Dev still active but hit max loop timeout
[2010-03-02 17:23:17 2840] INFO (XendDomainInfo:3047) Dev 0 still active, looping...
[2010-03-02 17:23:17 2840] INFO (XendDomainInfo:3047) Dev 0 still active, looping...
<many>
[2010-03-02 17:23:47 2840] INFO (XendDomainInfo:3047) Dev 0 still active, looping...
[2010-03-02 17:23:47 2840] INFO (XendDomainInfo:3053) Dev still active but hit max loop timeout
[2010-03-02 17:23:47 2840] DEBUG (XendDomainInfo:2826) XendDomainInfo.resumeDomain: devices released
[2010-03-02 17:23:47 2840] DEBUG (XendDomainInfo:1727) XendDomainInfo.handleShutdownWatch
[2010-03-02 17:23:47 2840] DEBUG (XendDomainInfo:1640) Storing domain details: {'console/ring-ref': '1211263', 'image/entry': '2149580800', 'console/port': '2', 'store/ring-ref': '1211264', 'image/loader': 'generic', 'vm': '/vm/b9efadc3-3dc5-4c8b-bb32-27e3f6217ff3', 'control/platform-feature-multiprocessor-suspend': '1', 'image/guest-os': 'linux', 'image/features/writable-descriptor-tables': '1', 'image/virt-base': '2147483648', 'memory/target': '524288', 'image/guest-version': '2.6', 'image/features/supervisor-mode-kernel': '1', 'console/limit': '1048576', 'image/paddr-offset': '2147483648', 'image/hypercall-page': '2149605376', 'cpu/0/availability': 'online', 'image/features/pae-pgdir-above-4gb': '1', 'image/features/writable-page-tables': '1', 'console/type': 'ioemu', 'image/features/auto-translated-physmap': '1', 'name': 'migrating-1_ovm_pv_01_example_com', 'domid': '6', 'image/xen-version': 'xen-3.0', 'store/port': '1'}
[2010-03-02 17:23:47 2840] INFO (XendDomainInfo:2180) createDevice: vkbd : {'devid': 0, 'uuid': '89b96740-8d56-e9a6-4a3b-cbddf1810bf1'}
[2010-03-02 17:23:47 2840] DEBUG (DevController:95) DevController: writing {'protocol': 'x86_64-abi', 'state': '1', 'backend-id': '0', 'backend': '/local/domain/0/backend/vkbd/6/0'} to /local/domain/6/device/vkbd/0.
[2010-03-02 17:23:47 2840] DEBUG (DevController:97) DevController: writing {'frontend-id': '6', 'domain': 'migrating-1_ovm_pv_01_example_com', 'frontend': '/local/domain/6/device/vkbd/0', 'state': '1', 'online': '1'} to /local/domain/0/backend/vkbd/6/0.
[2010-03-02 17:23:47 2840] INFO (XendDomainInfo:2180) createDevice: vfb : {'vncunused': '1', 'other_config': {'vncunused': '1', 'vncpasswd': 'XXXXXXXX', 'vnclisten': '0.0.0.0', 'vnc': '1', 'xauthority': '/root/.Xauthority'}, 'vnc': '1', 'xauthority': '/root/.Xauthority', 'vnclisten': '0.0.0.0', 'vncpasswd': 'XXXXXXXX', 'location': '0.0.0.0:5900', 'devid': 0, 'uuid': '3f989332-a2f2-5a41-1688-b460d3ac8192'}
[2010-03-02 17:23:47 2840] DEBUG (DevController:95) DevController: writing {'protocol': 'x86_64-abi', 'state': '1', 'backend-id': '0', 'backend': '/local/domain/0/backend/vfb/6/0'} to /local/domain/6/device/vfb/0.
[2010-03-02 17:23:47 2840] DEBUG (DevController:97) DevController: writing {'vncunused': '1', 'domain': 'migrating-1_ovm_pv_01_example_com', 'frontend': '/local/domain/6/device/vfb/0', 'xauthority': '/root/.Xauthority', 'frontend-id': '6', 'vnclisten': '0.0.0.0', 'vncpasswd': 'XXXXXXXX', 'state': '1', 'location': '0.0.0.0:5900', 'online': '1', 'vnc': '1', 'uuid': '3f989332-a2f2-5a41-1688-b460d3ac8192'} to /local/domain/0/backend/vfb/6/0.
[2010-03-02 17:23:47 2840] INFO (XendDomainInfo:2180) createDevice: console : {'location': '2', 'devid': 0, 'protocol': 'vt100', 'uuid': '6ce7f874-5cdf-d038-3a1d-27ad1baa3497', 'other_config': {}}
[2010-03-02 17:23:47 2840] DEBUG (DevController:95) DevController: writing {'protocol': 'x86_64-abi', 'state': '1', 'backend-id': '0', 'backend': '/local/domain/0/backend/console/6/1'} to /local/domain/6/device/console/1.
[2010-03-02 17:23:47 2840] DEBUG (DevController:97) DevController: writing {'domain': 'migrating-1_ovm_pv_01_example_com', 'frontend': '/local/domain/6/device/console/1', 'uuid': '6ce7f874-5cdf-d038-3a1d-27ad1baa3497', 'frontend-id': '6', 'state': '1', 'location': '2', 'online': '1', 'protocol': 'vt100'} to /local/domain/0/backend/console/6/1.
[2010-03-02 17:23:47 2840] INFO (XendDomainInfo:2180) createDevice: vbd : {'uuid': '56336c23-4848-c780-3dcd-fa0305797f25', 'bootable': 1, 'devid': 51712, 'driver': 'paravirtualised', 'dev': 'xvda', 'uname': 'file:/tmp/System.img', 'mode': 'w'}
[2010-03-02 17:23:47 2840] ERROR (XendDomainInfo:2843) XendDomainInfo.resume: xc.domain_resume failed on domain 6.
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2837, in resumeDomain
    self._createDevices()
  File "/usr/lib/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2182, in _createDevices
    devid = self._createDevice(devclass, config)
  File "/usr/lib/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2149, in _createDevice
    return self.getDeviceController(deviceClass).createDevice(devConfig)
  File "/usr/lib/python2.4/site-packages/xen/xend/server/DevController.py", line 91, in createDevice
    raise VmError("Device %s is already connected." % dev_str)
VmError: Device xvda (51712, vbd) is already connected.
[2010-03-02 17:23:47 2840] DEBUG (XendDomainInfo:2845) XendDomainInfo.resumeDomain: completed
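
To see how long xend waited on each device before giving up, the "still active, looping" messages can be tallied per device id. A rough sketch, assuming the xend.log entry layout shown above (`[timestamp pid] LEVEL (module:line) message`). `summarize_waits` is a hypothetical helper, not part of xend, and the sample holds only the first and last looping entry for each device copied from the log:

```python
import re
from datetime import datetime

# Matches one "still active, looping" entry and captures its timestamp and device id.
LOOPING = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \d+\] INFO \(XendDomainInfo:\d+\) "
    r"Dev (\d+) still active, looping"
)


def summarize_waits(log_text):
    """Map device id -> (first seen, last seen, seconds between them)."""
    seen = {}
    for stamp, dev in LOOPING.findall(log_text):
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        first, _ = seen.get(dev, (when, when))
        seen[dev] = (first, when)
    return {
        dev: (first, last, (last - first).total_seconds())
        for dev, (first, last) in seen.items()
    }


sample = (
    "[2010-03-02 17:22:47 2840] INFO (XendDomainInfo:3028) Dev 51712 still active, looping...\n"
    "[2010-03-02 17:23:17 2840] INFO (XendDomainInfo:3028) Dev 51712 still active, looping...\n"
    "[2010-03-02 17:23:17 2840] INFO (XendDomainInfo:3047) Dev 0 still active, looping...\n"
    "[2010-03-02 17:23:47 2840] INFO (XendDomainInfo:3047) Dev 0 still active, looping...\n"
)
for dev, (first, last, seconds) in summarize_waits(sample).items():
    print("Dev %s: waited about %.0f s" % (dev, seconds))
```

On this excerpt the result is about 30 seconds each for Dev 51712 and Dev 0, which lines up with the two "hit max loop timeout" messages. That is consistent with the vbd backend still being present when resumeDomain tries to recreate it, though the log alone does not show why it was never released.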

