I've been experimenting with using save/restore as a poor-mans migration on 3.0.2-2 and have experienced some real problems with the guests hanging when they are restored. In the end, I was able to repro this simply executing 'xm save' and then 'xm restore' on the same system -- the symptom is that the the restored guest is apparently running (for example, I can ping it) but anything that requires access to the disk hangs up.
Eventually, I dumped the guest and used crash to look at what was going on and I can see that the VBD front end driver is hung waiting for disk I/O to complete -- it seems to reconnect to the backend OK and re-issues I/O that was outstanding when the guest was saved (in all cases, there seems to be at least a single I/O outstanding when the save is done - a syslog write) and even carries on and issues more disk I/O requests _but_ none of them complete.
I should say that both Dom0 and DomU are 3.0.2-2 on top of vanilla 2.6.16 kernels on a dual proc Dell 1850 server. The DomU disk is actually implemented on top of DRBD 0.7 in Dom0 (but we’ve reproduced the same problem without DRBD as well).
Anyone know if this is a known issue with the release? Or suggest further investigation required?
Simon
--- more details ---
I instrumented blkfront and blkback to print the outstanding requests at suspend and resume and also to log state changes in the xenbus connection – here’s an example of the output in the dump of the guest:
Disabling SMP...
xen_blk: blkfront_suspend: device/vbd/769, outstanding requests:
xen_blk: req cf66c5b8: cmd cf66c62c, sec 829ec0, (8/8) buffer:cdd57000 [write]
xen_blk: req cf66c120: cmd cf66c194, sec 1b18008, (8/8) buffer:cfc6c000 [write]
xen_blk: req cf66c510: cmd cf66c584, sec 82c020, (8/16) buffer:cf631000 [write]
xen_blk: req cf66cee8: cmd cf66cf5c, sec 814010, (8/8) buffer:cae2b000 [write]
xen_blk: req cf66cd98: cmd cf66ce0c, sec 1b3b0a8, (8/8) buffer:cfbd3000 [read]
xen_blk: req ca980a50: cmd ca980ac4, sec 908000, (8/8) buffer:ca954000 [write]
xen_blk: req ca980af8: cmd ca980b6c, sec 814018, (8/8) buffer:ca9f1000 [write]
xen_blk: blkfront_resume: device/vbd/769
xenbus: vbd [device/vbd/769->/local/domain/0/backend/vbd/3/769] changed state 1 -> 3
xen_blk: resuming req cf66c5b8: cmd cf66c62c, sec 829ec0, (8/8) buffer:cdd57000 [write]
xen_blk: resuming req cf66c120: cmd cf66c194, sec 1b18008, (8/8) buffer:cfc6c000 [write]
xen_blk: resuming req cf66c510: cmd cf66c584, sec 82c020, (8/16) buffer:cf631000 [write]
xen_blk: resuming req cf66cee8: cmd cf66cf5c, sec 814010, (8/8) buffer:cae2b000 [write]
xen_blk: resuming req cf66cd98: cmd cf66ce0c, sec 1b3b0a8, (8/8) buffer:cfbd3000 [read]
xen_blk: resuming req ca980a50: cmd ca980ac4, sec 908000, (8/8) buffer:ca954000 [write]
xen_blk: resuming req ca980af8: cmd ca980b6c, sec 814018, (8/8) buffer:ca9f1000 [write]
xenbus: vbd [device/vbd/769->/local/domain/0/backend/vbd/3/769] changed state 3 -> 4
xen_blk: backend vbd [device/vbd/769->/local/domain/0/backend/vbd/3/769] changed state to 4 (my state 4).
xen_blk: backend vbd [device/vbd/769->/local/domain/0/backend/vbd/3/769] changed state to 4 (my state 4).
xenbus: vif [device/vif/0->/local/domain/0/backend/vif/3/0] changed state 1 -> 4
Enabling SMP...
Initializing CPU#1
Initializing CPU#2
Initializing CPU#3
You can see the state of the connection goes through more or less the normal sequence, ending up at 4 (connected) for both front and back ends.
xend.log from the suspend:
[2006-05-09 16:43:09 xend.XendDomainInfo] DEBUG (XendDomainInfo:877) XendDomainInfo.handleShutdownWatch
[2006-05-09 16:43:09 xend] DEBUG (DevController:132) Waiting for devices vif.
[2006-05-09 16:43:09 xend] DEBUG (DevController:138) Waiting for 0.
[2006-05-09 16:43:09 xend] DEBUG (DevController:403) hotplugStatusCallback /local/domain/0/backend/vif/4/0/hotplug-status.
[2006-05-09 16:43:10 xend] DEBUG (DevController:403) hotplugStatusCallback /local/domain/0/backend/vif/4/0/hotplug-status.
[2006-05-09 16:43:10 xend] DEBUG (DevController:417) hotplugStatusCallback 1.
[2006-05-09 16:43:10 xend] DEBUG (DevController:132) Waiting for devices usb.
[2006-05-09 16:43:10 xend] DEBUG (DevController:132) Waiting for devices vbd.
[2006-05-09 16:43:10 xend] DEBUG (DevController:138) Waiting for 769.
[2006-05-09 16:43:10 xend] DEBUG (DevController:403) hotplugStatusCallback /local/domain/0/backend/vbd/4/769/hotplug-status.
[2006-05-09 16:43:10 xend] DEBUG (DevController:417) hotplugStatusCallback 1.
[2006-05-09 16:43:10 xend] DEBUG (DevController:132) Waiting for devices irq.
[2006-05-09 16:43:10 xend] DEBUG (DevController:132) Waiting for devices pci.
[2006-05-09 16:43:10 xend] DEBUG (DevController:132) Waiting for devices ioports.
[2006-05-09 16:43:10 xend] DEBUG (DevController:132) Waiting for devices vtpm.
[2006-05-09 16:43:10 xend] INFO (XendDomain:363) Domain sonny1 (4) unpaused.
[2006-05-09 16:44:23 xend] DEBUG (XendCheckpoint:80) [xc_save]: /usr/lib/xen/bin/xc_save 10 19 4 0 0 0
[2006-05-09 16:44:23 xend] DEBUG (XendCheckpoint:200) suspend
[2006-05-09 16:44:23 xend] DEBUG (XendCheckpoint:83) In saveInputHandler suspend
[2006-05-09 16:44:23 xend] DEBUG (XendCheckpoint:85) Suspending 4 ...
[2006-05-09 16:44:23 xend.XendDomainInfo] DEBUG (XendDomainInfo:877) XendDomainInfo.handleShutdownWatch
[2006-05-09 16:44:23 xend.XendDomainInfo] DEBUG (XendDomainInfo:877) XendDomainInfo.handleShutdownWatch
[2006-05-09 16:44:25 xend.XendDomainInfo] INFO (XendDomainInfo:836) Domain has shutdown: name=migrating-sonny1 id=4 reason=suspend.
[2006-05-09 16:44:25 xend] INFO (XendCheckpoint:88) Domain 4 suspended.
[2006-05-09 16:44:25 xend] DEBUG (XendCheckpoint:91) Written done
[2006-05-09 16:44:25 xend] ERROR (XendCheckpoint:227) Had 0 unexplained entries in p2m table
[2006-05-09 16:44:26 xend] ERROR (XendCheckpoint:227) Saving memory pages: iter 1 95% 1: sent 65479, skipped 0, delta 997ms, dom0 97%, target 0%, sent 2152Mb/s, dirtied 0Mb/s 0 pages
[2006-05-09 16:44:26 xend] ERROR (XendCheckpoint:227) Total pages sent= 65479 (0.97x)
[2006-05-09 16:44:26 xend] ERROR (XendCheckpoint:227) (of which 0 were fixups)
[2006-05-09 16:44:26 xend] ERROR (XendCheckpoint:227) All memory is saved
[2006-05-09 16:44:26 xend] ERROR (XendCheckpoint:227) Save exit rc=0
[2006-05-09 16:44:26 xend.XendDomainInfo] DEBUG (XendDomainInfo:1335) XendDomainInfo.destroyDomain(4)
And for the resume:
[2006-05-09 16:44:31 xend.XendDomainInfo] DEBUG (XendDomainInfo:253) XendDomainInfo.restore(['domain', ['domid', '4'], ['uuid', 'bddc3e1e-1287-5620-293c-adfef462fea9'], ['ssidref', '0'], ['vcpus', '4'], ['vcpu_avail', '15'], ['cpu_weight', '1.0'], ['memory', '256'], ['maxmem', '256'], ['name', 'sonny1'], ['on_poweroff', 'destroy'], ['on_reboot', 'restart'], ['on_crash', 'restart'], ['image', ['linux', ['kernel', '/boot/vmlinuz-2.6.16-xenU'], ['root', '/dev/hda1 ro selinux=0 enforcing=0'], ['args', '3']]], ['device', ['vif', ['backend', '0'], ['script', 'vif-bridge'], ['mac', '00:00:6c:03:00:01']]], ['device', ['vbd', ['backend', '0'], ['dev', 'hda1'], ['uname', 'phy:drbd0'], ['mode', 'w']]], ['state', '-b----'], ['shutdown_reason', 'poweroff'], ['cpu_time', '5.035287936'], ['online_vcpus', '4'], ['up_time', '74.7920620441'], ['start_time', '1147207388.96'], ['store_mfn', '891415'], ['console_mfn', '891414']])
[2006-05-09 16:44:31 xend.XendDomainInfo] DEBUG (XendDomainInfo:284) parseConfig: config is ['domain', ['domid', '4'], ['uuid', 'bddc3e1e-1287-5620-293c-adfef462fea9'], ['ssidref', '0'], ['vcpus', '4'], ['vcpu_avail', '15'], ['cpu_weight', '1.0'], ['memory', '256'], ['maxmem', '256'], ['name', 'sonny1'], ['on_poweroff', 'destroy'], ['on_reboot', 'restart'], ['on_crash', 'restart'], ['image', ['linux', ['kernel', '/boot/vmlinuz-2.6.16-xenU'], ['root', '/dev/hda1 ro selinux=0 enforcing=0'], ['args', '3']]], ['device', ['vif', ['backend', '0'], ['script', 'vif-bridge'], ['mac', '00:00:6c:03:00:01']]], ['device', ['vbd', ['backend', '0'], ['dev', 'hda1'], ['uname', 'phy:drbd0'], ['mode', 'w']]], ['state', '-b----'], ['shutdown_reason', 'poweroff'], ['cpu_time', '5.035287936'], ['online_vcpus', '4'], ['up_time', '74.7920620441'], ['start_time', '1147207388.96'], ['store_mfn', '891415'], ['console_mfn', '891414']]
[2006-05-09 16:44:31 xend.XendDomainInfo] DEBUG (XendDomainInfo:380) parseConfig: result is {'ssidref': 0, 'uuid': 'bddc3e1e-1287-5620-293c-adfef462fea9', 'on_crash': 'restart', 'on_reboot': 'restart', 'image': ['linux', ['kernel', '/boot/vmlinuz-2.6.16-xenU'], ['root', '/dev/hda1 ro selinux=0 enforcing=0'], ['args', '3']], 'on_poweroff': 'destroy', 'cpus': None, 'name': 'sonny1', 'backend': [], 'vcpus': 4, 'cpu_weight': 1.0, 'vcpu_avail': 15, 'memory': 256, 'device': [('vif', ['vif', ['backend', '0'], ['script', 'vif-bridge'], ['mac', '00:00:6c:03:00:01']]), ('vbd', ['vbd', ['backend', '0'], ['dev', 'hda1'], ['uname', 'phy:drbd0'], ['mode', 'w']])], 'bootloader': None, 'cpu': None, 'maxmem': 256}
[2006-05-09 16:44:31 xend.XendDomainInfo] DEBUG (XendDomainInfo:1164) XendDomainInfo.construct: None 0
[2006-05-09 16:44:31 xend.XendDomainInfo] DEBUG (XendDomainInfo:671) Storing VM details: {'ssidref': '0', 'uuid': 'bddc3e1e-1287-5620-293c-adfef462fea9', 'on_reboot': 'restart', 'image': "(linux (kernel /boot/vmlinuz-2.6.16-xenU) (root '/dev/hda1 ro selinux=0 enforcing=0') (args 3))", 'on_poweroff': 'destroy', 'name': 'sonny1', 'vcpus': '4', 'vcpu_avail': '15', 'memory': '256', 'on_crash': 'restart', 'maxmem': '256'}
[2006-05-09 16:44:31 xend] DEBUG (DevController:103) DevController: writing {'backend-id': '0', 'mac': '00:00:6c:03:00:01', 'handle': '0', 'state': '1', 'backend': '/local/domain/0/backend/vif/5/0'} to /local/domain/5/device/vif/0.
[2006-05-09 16:44:31 xend] DEBUG (DevController:105) DevController: writing {'mac': '00:00:6c:03:00:01', 'state': '1', 'handle': '0', 'script': '/etc/xen/scripts/vif-bridge', 'frontend-id': '5', 'domain': 'sonny1', 'frontend': '/local/domain/5/device/vif/0'} to /local/domain/0/backend/vif/5/0.
[2006-05-09 16:44:31 xend] DEBUG (blkif:24) exception looking up device number for hda1: [Errno 2] No such file or directory: '/dev/hda1'
[2006-05-09 16:44:31 xend] DEBUG (DevController:103) DevController: writing {'virtual-device': '769', 'backend-id': '0', 'state': '1', 'backend': '/local/domain/0/backend/vbd/5/769'} to /local/domain/5/device/vbd/769.
[2006-05-09 16:44:31 xend] DEBUG (DevController:105) DevController: writing {'domain': 'sonny1', 'frontend': '/local/domain/5/device/vbd/769', 'dev': 'hda1', 'state': '1', 'params': 'drbd0', 'mode': 'w', 'frontend-id': '5', 'type': 'phy'} to /local/domain/0/backend/vbd/5/769.
[2006-05-09 16:44:31 xend.XendDomainInfo] DEBUG (XendDomainInfo:696) Storing domain details: {'console/port': '2', 'cpu/3/availability': 'online', 'name': 'sonny1', 'console/limit': '1048576', 'cpu/2/availability': 'online', 'vm': '/vm/bddc3e1e-1287-5620-293c-adfef462fea9', 'domid': '5', 'cpu/0/availability': 'online', 'memory/target': '262144', 'cpu/1/availability': 'online', 'store/port': '1'}
[2006-05-09 16:44:31 xend] DEBUG (balloon:126) Balloon: free 265; need 265; done.
[2006-05-09 16:44:31 xend] DEBUG (XendCheckpoint:148) [xc_restore]: /usr/lib/xen/bin/xc_restore 10 19 5 67584 1 2
[2006-05-09 16:44:31 xend] ERROR (XendCheckpoint:227) xc_linux_restore start: max_pfn = 10800
[2006-05-09 16:44:31 xend] ERROR (XendCheckpoint:227) Increased domain reservation by 42000 KB
[2006-05-09 16:44:31 xend] ERROR (XendCheckpoint:227) Reloading memory pages: 0%
[2006-05-09 16:44:32 xend] DEBUG (XendCheckpoint:200) store-mfn 807620
[2006-05-09 16:44:32 xend] DEBUG (XendCheckpoint:200) console-mfn 809031
[2006-05-09 16:44:32 xend] ERROR (XendCheckpoint:227) Received all pages (0 races)
[2006-05-09 16:44:32 xend] ERROR (XendCheckpoint:227) 100%
[2006-05-09 16:44:32 xend] ERROR (XendCheckpoint:227) Memory reloaded.
[2006-05-09 16:44:32 xend] ERROR (XendCheckpoint:227) Decreased reservation by 2105 pages
[2006-05-09 16:44:32 xend] ERROR (XendCheckpoint:227) Domain ready to be built.
[2006-05-09 16:44:32 xend] ERROR (XendCheckpoint:227) Restore exit with rc=0
[2006-05-09 16:44:32 xend.XendDomainInfo] DEBUG (XendDomainInfo:648) XendDomainInfo.completeRestore
[2006-05-09 16:44:32 xend.XendDomainInfo] DEBUG (XendDomainInfo:696) Storing domain details: {'console/ring-ref': '809031', 'console/port': '2', 'cpu/3/availability': 'online', 'name': 'sonny1', 'console/limit': '1048576', 'cpu/2/availability': 'online', 'vm': '/vm/bddc3e1e-1287-5620-293c-adfef462fea9', 'domid': '5', 'cpu/0/availability': 'online', 'memory/target': '262144', 'store/ring-ref': '807620', 'cpu/1/availability': 'online', 'store/port': '1'}
[2006-05-09 16:44:32 xend.XendDomainInfo] DEBUG (XendDomainInfo:877) XendDomainInfo.handleShutdownWatch
[2006-05-09 16:44:32 xend.XendDomainInfo] DEBUG (XendDomainInfo:658) XendDomainInfo.completeRestore done
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
|