WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-devel

Re: [Xen-devel] Re: Xen-unstable save error

To: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Subject: Re: [Xen-devel] Re: Xen-unstable save error
From: Michal Novotny <minovotn@xxxxxxxxxx>
Date: Mon, 21 Jun 2010 16:02:18 +0200
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Mon, 21 Jun 2010 07:03:42 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <C8452B88.1810E%keir.fraser@xxxxxxxxxxxxx>
References: <C8452B88.1810E%keir.fraser@xxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100430 Fedora/3.0.4-3.fc13 Thunderbird/3.0.4
On 06/21/2010 03:45 PM, Keir Fraser wrote:
> On 21/06/2010 14:37, "Michal Novotny"<minovotn@xxxxxxxxxx>  wrote:
>
>> My guest is RHEL-5 i386 guest but this seems that the suspend port is
>> missing. AFAIK, you started using the SUSPEND_CANCEL some time ago which
>> requires the modified kernel.
>>
>> Isn't it possible that's the issue or how is it with the SUSPEND_CANCEL
>> functionality?
>
> SUSPEND_CANCEL is a different thing. The suspend port is simply a quicker
> way for suspend notifications to be passed back and forth between the guest
> and the dom0 toolstack. We fall back okay if the guest kernel does not
> support the new faster method.
>
> I'm not sure why the domain restore operation fails. Unfortunately some
> error messages are now expected in the logs, since Remus functionality went
> into the tree. So it's hard to work out what the first error is.
>
>   -- Keir
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel

OK Keir, but what I don't understand is why there's nothing in `/local/domain/%d/device/suspend/event-channel`. Is that expected?
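
For reference, a minimal sketch of how that key could be formed for a given domain ID and how an empty value might be interpreted. The helper names `suspend_evtchn_path` and `parse_suspend_port` are hypothetical, made up for illustration; they are not libxc or xend functions, and the real tools may treat the value differently.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format the xenstore key quoted above for one domain.
 * Returns 0 on success, -1 if the buffer is too small. */
int suspend_evtchn_path(int domid, char *buf, size_t len)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, len, "/local/domain/%d/device/suspend/event-channel", domid);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: interpret the value read from that key.
 * A missing (NULL) or empty value means the guest registered no port.
 * Returns the port number, or -1 if there is no usable port. */
int parse_suspend_port(const char *value)
{
    char *end;
    long port;

    if (value == NULL || *value == '\0')
        return -1;
    errno = 0;
    port = strtol(value, &end, 10);
    if (errno != 0 || *end != '\0' || port < 0 || port > INT_MAX)
        return -1;
    return (int)port;
}
```

With domid 5 this yields `/local/domain/5/device/suspend/event-channel`, which can then be looked up by hand in dom0, for example with `xenstore-read`.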

For the restore functionality:

# ls -ahl rhel5-32fv.sav
-rwxr-xr-x 1 root root 53M Jun 21  2010 rhel5-32fv.sav

As you can see, the save file is only 53M, but the guest had 1G of memory, and I think this is why the restore is failing.
The restore log also shows that the guest should have 1G of memory (see the `['maxmem', '1024']` and `['memory', '1024']` entries):
...
[2010-06-21 17:29:20 4305] DEBUG (XendDomainInfo:237) XendDomainInfo.restore(['domain', ['domid', '1'], ['cpu_weight', '256'], ['cpu_cap', '0'], ['on_crash', 'restart'], ['uuid', 'c91ec802-2015-cb49-80e5-810c808bf725'], ['bootloader_args'], ['pool_name', 'Pool-0'], ['vcpus', '1'], ['name', 'rhel5-32fv-stubdom'], ['on_poweroff', 'destroy'], ['on_reboot', 'restart'], ['cpus', [[]]], ['description'], ['bootloader'], ['maxmem', '1024'], ['memory', '1024'], ['shadow_memory', '9'], ['vcpu_avail', '1'], ['features'], ['on_xend_start', 'ignore'], ['on_xend_stop', 'ignore'], ['start_time', '1277134046.11'], ['cpu_time', '1.550284835'], ['online_vcpus', '1'], ['image', ['hvm', ['kernel'], ['superpages', '0'], ['tsc_mode', '0'], ['videoram', '4'], ['hpet', '0'], ['boot', 'c'], ['loader', '/usr/lib/xen/boot/hvmloader'], ['serial', 'pty'], ['vpt_align', '1'], ['xen_platform_pci', '1'], ['opengl', '1'], ['vncunused', '1'], ['rtc_timeoffset', '0'], ['pci', []], ['pae', '1'], ['stdvga', '0'], ['hap', '1'], ['viridian', '0'], ['acpi', '1'], ['localtime', '0'], ['timer_mode', '1'], ['vnc', '1'], ['nographic', '0'], ['guest_os_type', 'default'], ['vncdisplay', '1'], ['pci_msitranslate', '1'], ['oos', '1'], ['apic', '1'], ['sdl', '0'], ['nomigrate', '0'], ['device_model', '/usr/lib/xen/bin/qemu-dm'], ['pci_power_mgmt', '0'], ['usb', '0'], ['xauthority', '/root/.Xauthority'], ['isa', '0'], ['display', 'localhost:10.0'], ['notes', ['SUSPEND_CANCEL', '1']]]], ['status', '2'], ['state', 'r-----'], ['store_mfn', '1044476'], ['device', ['vif', ['bridge', 'virbr0'], ['uuid', 'dcd99a20-2e8f-2692-8e56-dc4051579923'], ['script', '/etc/xen/scripts/vif-bridge'], ['mac', '00:16:3e:5b:bd:9c'], ['type', 'ioemu'], ['backend', '0']]], ['device', ['vbd', ['uuid', 'e7e07da9-c104-800d-ee3f-5fe9757167fd'], ['bootable', '1'], ['dev', 'hda:disk'], ['uname', 'file:/var/lib/xen/images/colossus/rhel5-32fv.img'], ['mode', 'w'], ['backend', '0'], ['VDI']]], ['device', ['vbd', ['uuid', '0180089b-8394-cbfa-0da4-b8c1fc688617'], ['bootable', '0'], ['dev', 'sda:disk'], ['uname', 'file:/home2/test.img'], ['mode', 'w'], ['backend', '0'], ['VDI']]], ['device', ['vfb', ['vncunused', '1'], ['location', '127.0.0.1:5901'], ['vnc', '1'], ['vncdisplay', '1'], ['uuid', '7fa1bcc0-797d-66ac-eb88-6ef15f1209f0']]], ['device', ['console', ['protocol', 'vt100'], ['location', '3'], ['uuid', 'd77b182b-4152-a4d2-f577-8b610b5cd6ff']]]])

The first error, "Error when reading batch size (0 = Success): Internal error", comes from the pagebuf_get_one() function in libxc/xc_domain_restore.c, which contains:
...
    if ( RDEXACT(fd, &count, sizeof(count)) )
    {
        PERROR("Error when reading batch size");
        return -1;
    }
...
so I guess the data were not written out completely for this guest (the file is smaller than the guest's memory), and that is why the error occurs. As you can see, there's nothing in xend.log for the save except the "failed to get the suspend evtchn port" message:

[2010-06-21 15:59:55 4305] DEBUG (XendCheckpoint:126) [xc_save]: /usr/lib64/xen/bin/xc_save 56 5 0 0 4
[2010-06-21 15:59:55 4305] INFO (XendCheckpoint:410) xc_save: failed to get the suspend evtchn port
[2010-06-21 15:59:55 4305] INFO (XendCheckpoint:410)
[2010-06-21 15:59:55 4305] DEBUG (XendCheckpoint:381) suspend
[2010-06-21 15:59:55 4305] DEBUG (XendCheckpoint:129) In saveInputHandler suspend
[2010-06-21 15:59:55 4305] DEBUG (XendCheckpoint:131) Suspending 5 ...
[2010-06-21 15:59:55 4305] DEBUG (XendDomainInfo:521) XendDomainInfo.shutdown(suspend)
[2010-06-21 15:59:55 4305] DEBUG (XendDomainInfo:1877) XendDomainInfo.handleShutdownWatch
[2010-06-21 15:59:55 4305] INFO (XendDomainInfo:538) HVM save:remote shutdown dom 5!
[2010-06-21 15:59:55 4305] INFO (XendDomainInfo:2074) Domain has shutdown: name=migrating-rhel5-32fv-stubdom id=5 reason=suspend.
[2010-06-21 15:59:55 4305] INFO (XendCheckpoint:137) Domain 5 suspended.
[2010-06-21 15:59:56 4305] INFO (image:538) signalDeviceModel:restore dm state to running
[2010-06-21 15:59:56 4305] DEBUG (XendCheckpoint:146) Written done
[2010-06-21 16:00:02 4305] DEBUG (XendDomainInfo:3067) XendDomainInfo.destroy: domid=5
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:2397) Destroying device model
[2010-06-21 16:00:03 4305] INFO (image:615) migrating-rhel5-32fv-stubdom device model terminated
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:2404) Releasing devices
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:2410) Removing vif/0
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:1272) XendDomainInfo.destroyDevice: deviceClass = vif, device = vif/0
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:2410) Removing vbd/768
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:1272) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/768
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:2410) Removing vbd/2048
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:1272) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/2048
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:2410) Removing vfb/0
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:1272) XendDomainInfo.destroyDevice: deviceClass = vfb, device = vfb/0
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:2410) Removing console/0
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:1272) XendDomainInfo.destroyDevice: deviceClass = console, device = console/0
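
Going back to the batch-size error: the "(0 = Success)" part would be consistent with the restore simply reaching the end of a truncated file. The sketch below assumes that RDEXACT behaves like a loop around read() that treats a short read as failure; `read_exact_sketch` is a hypothetical stand-in written for illustration, not the libxc source. In such a loop, end of file makes read() return 0 without touching errno, so a following error report would print errno 0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for an "exact read" helper; NOT the libxc code.
 * Returns 0 once `size` bytes have been read, -1 on error or early EOF. */
int read_exact_sketch(int fd, void *data, size_t size)
{
    size_t offset = 0;
    ssize_t len;

    assert(data != NULL);
    while (offset < size) {
        len = read(fd, (char *)data + offset, size - offset);
        if (len == -1 && errno == EINTR)
            continue;       /* interrupted by a signal: retry */
        if (len <= 0)
            return -1;      /* len == 0 is EOF; read() leaves errno untouched */
        offset += (size_t)len;
    }
    return 0;
}
```

If this assumption holds, asking for a 4-byte batch count when fewer than 4 bytes remain in the file fails with errno still 0, which matches the message seen here.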

Any ideas why the save file is that small? It should be at least 1024M, right?

Thanks,
Michal

--
Michal Novotny <minovotn@xxxxxxxxxx>, RHCE
Virtualization Team (xen userspace), Red Hat

