This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


Re: [Xen-devel] live saving of domU

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] live saving of domU
From: Andres Lagar Cavilla <andreslc@xxxxxxxxxxxxxx>
Date: Wed, 10 May 2006 15:14:26 -0400
Delivery-date: Wed, 10 May 2006 12:15:43 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <E1FdtIP-0000Id-VQ@host-192-168-0-1-bcn-london>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
References: <E1FdtIP-0000Id-VQ@host-192-168-0-1-bcn-london>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla Thunderbird 1.0.7-1.1.fc3 (X11/20050929)
Anthony Liguori wrote:

Moreover, you cannot dump the state of a domain after a pause and expect it to ever run again.

Guests are aware of the physical addresses of the memory that's been allocated to them. Because of this, to save a domain's state in a restorable way you need the guest to "canonicalize" itself. The only way to do this today is through a suspend operation which happens to be a subop of shutdown. Shutdowns are non-recoverable so you cannot use this as a snapshotting mechanism.

My understanding is that the guest only canonicalizes the store and console MFNs and places them in the shared info frame, which is passed to the suspend hypercall. The rest of the canonicalization is done by dom0 user-space code (xc_linux_save). The guest never really shuts down: it issues the suspend hypercall and waits for it to return. This could happen months later when the domain is resumed :) The suspend hypercall executing in Xen is the one that pauses all vcpus and kills the domain. Is it feasible to use a different hypercall that pauses the domain but doesn't kill it, and, once xc_linux_save is done checkpointing, have it issue a dom0_op that unpauses the domain?
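
To make the question concrete, here is a rough sketch of the flow I have in mind, written as a toy Python model rather than against libxc. Everything in it (ToyDomain, canonicalize, uncanonicalize, checkpoint) is a hypothetical name made up for illustration, and the "page table" is just a list of frame numbers. It only shows the idea: pause, rewrite machine frame numbers as pseudo-physical ones while copying out the image, then unpause instead of destroying the domain.

```python
from dataclasses import dataclass


@dataclass
class ToyDomain:
    """Hypothetical stand-in for a guest; not a real Xen interface."""
    p2m: dict          # pseudo-physical frame number -> machine frame number
    page_table: list   # entries hold machine frame numbers (MFNs)
    paused: bool = False

    def pause(self):
        self.paused = True

    def unpause(self):
        self.paused = False


def canonicalize(page_table, p2m):
    """Rewrite MFNs as PFNs so the image no longer depends on host frames."""
    m2p = {mfn: pfn for pfn, mfn in p2m.items()}
    return [m2p[mfn] for mfn in page_table]


def uncanonicalize(canonical_table, new_p2m):
    """On restore, translate PFNs back to whatever MFNs were allocated."""
    return [new_p2m[pfn] for pfn in canonical_table]


def checkpoint(dom):
    """Pause, copy out a canonical image, then unpause instead of killing."""
    dom.pause()
    try:
        image = canonicalize(dom.page_table, dom.p2m)
    finally:
        dom.unpause()
    return image


dom = ToyDomain(p2m={0: 0x1A0, 1: 0x2B7, 2: 0x0C3},
                page_table=[0x2B7, 0x0C3, 0x1A0])
image = checkpoint(dom)      # holds PFNs only; dom keeps running afterwards
restored = uncanonicalize(image, {0: 0x500, 1: 0x501, 2: 0x502})
```

The saved image contains only pseudo-physical frame numbers, so it can be restored onto a different set of machine frames, while the original domain is left unpaused and untouched.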

For filesystem corruption you're gonna have to hack up your own thing. Probably a CoW solution, where you begin a new "epoch" when resuming from the checkpoint.
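
As a very rough illustration of what I mean by epochs, here is a toy copy-on-write block store in Python. It is not tied to any Xen block backend, and the names (EpochStore, checkpoint, resume_from) are invented for this sketch. Writes only ever land in the current epoch's overlay; resuming from a checkpoint throws away the overlays written after it and begins a new epoch.

```python
class EpochStore:
    """Toy copy-on-write block store; every name here is hypothetical."""

    def __init__(self, base):
        self.base = dict(base)   # blocks as they were before any checkpoint
        self.epochs = [{}]       # one overlay of modified blocks per epoch

    def write(self, block, data):
        # Writes only touch the overlay of the current (newest) epoch.
        self.epochs[-1][block] = data

    def read(self, block):
        # Search overlays newest-first, then fall back to the base image.
        for overlay in reversed(self.epochs):
            if block in overlay:
                return overlay[block]
        return self.base[block]

    def checkpoint(self):
        """Freeze the current epoch, start a new one, return a checkpoint id."""
        self.epochs.append({})
        return len(self.epochs) - 1

    def resume_from(self, checkpoint_id):
        """Discard writes made after the checkpoint and begin a fresh epoch."""
        self.epochs = self.epochs[:checkpoint_id] + [{}]


store = EpochStore({0: "boot", 1: "data"})
store.write(1, "before")
cp = store.checkpoint()
store.write(1, "after")       # happens after the checkpoint
store.resume_from(cp)         # the "after" write is dropped
```

After resume_from, the filesystem state matches what the checkpointed memory image expects, which is the property needed to avoid corruption.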


The closest thing you can achieve is a localhost migration. There are some caveats to this, of course. The first is that you need to have as much memory as the domain has available since you'll have a copy of the domain created briefly while the migration takes place. Migrations are also quite intrusive since they involve tearing down and bringing up all the devices.

I've gotten a lot of requests for lightweight checkpointing. AFAIK, no one is actually working on it though.
