xen-users
[Xen-users] HVM Live Migrations Failing 90% Of The Time
I'm deploying a 2-node Pacemaker/DRBD backed Xen cluster to run a
mixture of Linux PVM and Windows HVM VMs. I have this up and running on
a pair of development machines, with both automatic and manual failover
working perfectly. Live migrations work every time for both the PVM and
HVM-based VMs.
I've replicated the setup onto a pair of high-end production machines,
but there the live migrations of the HVM VMs succeed only around 10% of
the time. PVM live migrations complete every time. The configurations on
the development and production machines are identical in every way,
except for the physical hardware.
When a migration fails, the source host (the one the VM is migrating
away from) logs the following:
[2010-04-07 14:42:45 6211] DEBUG (XendCheckpoint:103) [xc_save]: /usr/lib64/xen/bin/xc_save 30 18 0 0 5
[2010-04-07 14:42:45 6211] INFO (XendCheckpoint:403) xc_save: could not read suspend event channel
[2010-04-07 14:42:45 6211] WARNING (XendDomainInfo:1617) Domain has crashed: name=migrating-web id=18.
[2010-04-07 14:42:45 6211] DEBUG (XendDomainInfo:2389) XendDomainInfo.destroy: domid=18
[2010-04-07 14:42:45 6211] DEBUG (XendDomainInfo:2406) XendDomainInfo.destroyDomain(18)
[2010-04-07 14:42:48 6211] DEBUG (XendDomainInfo:1939) Destroying device model
[2010-04-07 14:42:48 6211] INFO (XendCheckpoint:403) Saving memory pages: iter 1 10%ERROR Internal error: Error peeking shadow bitmap
[2010-04-07 14:42:48 6211] INFO (XendCheckpoint:403) Warning - couldn't disable shadow modeSave exit rc=1
[2010-04-07 14:42:48 6211] ERROR (XendCheckpoint:157) Save failed on domain web (18) - resuming.
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/xen/xend/XendCheckpoint.py", line 125, in save
    forkHelper(cmd, fd, saveInputHandler, False)
  File "/usr/lib/python2.5/site-packages/xen/xend/XendCheckpoint.py", line 391, in forkHelper
    raise XendError("%s failed" % string.join(cmd))
XendError: /usr/lib64/xen/bin/xc_save 30 18 0 0 5 failed
The following is also logged in /var/log/xen/qemu-dm-web.log:
xenstore_process_logdirty_event: key=000000006b8b4567 size=335816
Log-dirty: mapped segment at 0x7fb56c136000
Triggered log-dirty buffer switch
The destination host logs the following:
[2010-04-07 14:42:45 6227] INFO (XendCheckpoint:403) Reloading memory pages: 0%
[2010-04-07 14:42:48 6227] INFO (XendCheckpoint:403) ERROR Internal error: Error when reading batch size
[2010-04-07 14:42:48 6227] INFO (XendCheckpoint:403) Restore exit with rc=1
[2010-04-07 14:42:48 6227] DEBUG (XendDomainInfo:2389) XendDomainInfo.destroy: domid=26
[2010-04-07 14:42:48 6227] DEBUG (XendDomainInfo:2406) XendDomainInfo.destroyDomain(26)
[2010-04-07 14:42:48 6227] ERROR (XendDomainInfo:2418) XendDomainInfo.destroy: xc.domain_destroy failed.
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/xen/xend/XendDomainInfo.py", line 2413, in destroyDomain
    xc.domain_destroy(self.domid)
Error: (3, 'No such process')
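For anyone comparing logs across the two hosts, the excerpts above can be
reduced to their warnings and errors with a short filter. This is only a
sketch: it assumes the entries come from xend's log and follow the
"[timestamp pid] LEVEL (Module:line) message" layout seen above, and the
function names (parse_xend_log, problems) are made up for illustration.
Lines that do not start with a bracketed timestamp, such as wrapped text
or traceback lines, are folded into the preceding entry.

```python
import re

# Header of a log entry as it appears in the excerpts above, e.g.
# "[2010-04-07 14:42:45 6211] ERROR (XendCheckpoint:157) Save failed ..."
ENTRY = re.compile(
    r"^\[(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<pid>\d+)\] "
    r"(?P<level>[A-Z]+) \((?P<module>\w+):(?P<line>\d+)\)\s?(?P<msg>.*)$"
)


def parse_xend_log(lines):
    """Group raw log lines into entries (dicts with time, pid, level, ...)."""
    entries = []
    for raw in lines:
        text = raw.rstrip("\n")
        match = ENTRY.match(text)
        if match:
            entries.append(match.groupdict())
        elif entries and text.strip():
            # Continuation line: append it to the message of the last entry.
            joined = entries[-1]["msg"] + " " + text.strip()
            entries[-1]["msg"] = joined.strip()
    return entries


def problems(entries):
    """Keep WARNING/ERROR entries, plus any entry whose message embeds
    the word ERROR (xc_save/xc_restore output is relayed at INFO level)."""
    return [
        e for e in entries
        if e["level"] in ("WARNING", "ERROR") or "ERROR" in e["msg"]
    ]
```

Typical use would be problems(parse_xend_log(open(path))) on each host,
where path is wherever that host keeps its xend log, followed by a
side-by-side comparison of the timestamps.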
Some basic config details:
Xen version: 3.3.0
Kernel: 2.6.24-27-xen
dom0 OS: Ubuntu 8.04 64-bit
domU OS: Windows 2008 64-bit
VM config for the above example:
name = "web"
kernel = "/usr/lib/xen/boot/hvmloader"
builder='hvm'
memory = 10240
shadow_memory = 8
vif = [ 'bridge=eth1' ]
acpi = 1
apic = 1
disk = [ 'phy:/dev/drbd0,hda,w', 'phy:/dev/drbd1,hdb,w' ]
device_model = '/usr/lib64/xen/bin/qemu-dm'
boot="dc"
sdl=0
vnc=1
vncconsole=1
vncpasswd='XXXXXXXXXXXX'
serial='pty'
usbdevice='tablet'
vcpus=8
on_poweroff = 'destroy'
on_reboot = 'restart'
on_crash = 'destroy'
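Since the save fails while peeking the shadow bitmap, the shadow_memory
figure in the config above may be worth a second look. The sketch below
loads a config of this form and sets the configured value beside a
commonly documented default. Two assumptions are built in. First, the
config file is plain Python, so it can be evaluated directly. Second, the
sizing rule quoted in the later xl.cfg documentation (1 MB per vCPU plus
8 KB per MB of guest RAM) also applies to this xend 3.3 setup, which has
not been verified. The function names are illustrative only.

```python
def load_domain_config(text):
    """Evaluate config text as Python and return its top-level settings.

    exec() runs whatever the file contains, so use it on trusted
    configs only.
    """
    namespace = {}
    exec(text, namespace)
    namespace.pop("__builtins__", None)  # added by exec, not a setting
    return namespace


def documented_default_shadow_mb(memory_mb, vcpus):
    """1 MB per vCPU plus 8 KB per MB of guest RAM.

    This is the default described in the xl.cfg documentation; applying
    it to xend 3.3 is an assumption.
    """
    return vcpus * 1.0 + memory_mb * 8.0 / 1024.0


def shadow_report(cfg):
    """Return the configured shadow_memory next to the documented default."""
    default = documented_default_shadow_mb(cfg["memory"], cfg.get("vcpus", 1))
    return {
        "configured_mb": cfg.get("shadow_memory"),
        "documented_default_mb": default,
    }
```

Calling shadow_report(load_domain_config(open(path).read())) on the
config file returns the two figures for comparison. It does not establish
that shadow memory is the cause of the failed migrations.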
The DRBD resources are handled by Jefferson Ogata's qemu-dm.drbd wrapper
(http://www.antibozo.net/xen/qemu-dm.drbd) and a slightly modified
version of DRBD's block-drbd script.
The dom0 machines are allocated 1GB of memory each and are identical, in
both software and hardware configurations. Each machine has a total of
24GB of memory.
Thanks
- [Xen-users] HVM Live Migrations Failing 90% Of The Time,
Tim O'Donovan <=