Re: [Xen-devel] [PATCH 07 of 10] Add new shutdown mode for checkpoint

  • To: Brendan Cully <brendan@xxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • From: Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>
  • Date: Thu, 28 Dec 2006 16:51:50 +0000
  • Delivery-date: Thu, 28 Dec 2006 08:51:31 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>
  • Thread-index: AccqoHgDtq96zpaTEdusoQANk04WTA==
  • Thread-topic: [Xen-devel] [PATCH 07 of 10] Add new shutdown mode for checkpoint

On 15/12/06 6:38 am, "Brendan Cully" <brendan@xxxxxxxxx> wrote:

> Add new shutdown mode for checkpoint.
> When control/shutdown = checkpoint, invoke an alternate suspend path
> that doesn't disconnect from back ends, and only reconnects when the
> image has been restored into a new domain.

I don't think a new type of 'checkpoint' handler is required in the guest
OS. We are already most of the way there in terms of doing as little as
possible on the suspend side of save/restore, so we should fix up what
little else there is to be done. Looking at the differences versus your new
checkpointing suspend:
 1. xenbus_suspend() needs to stay. Actually most drivers do not have a
suspend handler anyway (only tpmfront does). We should provide a
suspend_cancelled() callback so that drivers which *do* have a suspend
handler can distinguish between a proper resume and a checkpoint return.
 2. I don't think we really need to xs_unwatch() all our watches in
xs_suspend(). Probably that code can just go.
 3. Keep gnttab_suspend(). It isn't really that slow to execute and avoids
needing other hacks.
 4. Keep pre_suspend() and don't have special pre_checkpoint(). Again, it is
fairly cheap to clear/renew the shared_info mapping.
 5. It would be nice to have a backward-compatible way for the guest to tell
the tools that its suspension is cancellable. For this we could write an
informative string into xen_start_info->magic[]. Notifying the guest of
suspend-cancel versus restore can be done via the %eax return code. For
example, 0==suspend-cancel, positive==restore, negative==error. Old tools
will leave %eax==__HYPERVISOR_sched_op, which is positive and so will
correctly map to 'restore'.
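
To make the return-code convention in point 5 concrete, here is a minimal
sketch of how the guest side might classify the value it finds in %eax after
the suspend call returns. The enum and the function name are hypothetical,
made up for illustration. The value 29 is what I believe the public headers
define for __HYPERVISOR_sched_op; the only property that matters here is
that it is positive.

```c
/*
 * Assumption: __HYPERVISOR_sched_op is 29 in the Xen public headers.
 * Only its sign matters for this sketch.
 */
#define HYPERVISOR_SCHED_OP_NR 29

/* Hypothetical names, not taken from the patch series. */
enum suspend_outcome {
    SUSPEND_CANCELLED,  /* same domain carries on; back ends still connected */
    SUSPEND_RESTORED,   /* image was restored into a new domain */
    SUSPEND_ERROR       /* suspend failed */
};

/* Map the proposed return code onto an outcome:
 * 0 == suspend-cancel, positive == restore, negative == error. */
static enum suspend_outcome classify_suspend_return(long rc)
{
    if (rc == 0)
        return SUSPEND_CANCELLED;
    if (rc > 0)
        return SUSPEND_RESTORED;
    return SUSPEND_ERROR;
}
```

With old tools, which leave the hypercall number in %eax untouched,
classify_suspend_return(HYPERVISOR_SCHED_OP_NR) yields SUSPEND_RESTORED, so
the guest takes the full restore path exactly as it does today.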

This allows us to use this cheap checkpoint framework to also provide easy
cancellation of save/restore if something goes wrong (e.g., network
connectivity fails during live migration).
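
As a rough illustration of the suspend_cancelled() idea from point 1, the
post-suspend dispatch could look something like the sketch below. The struct
layout, the finish_suspend() helper and the demo driver are all hypothetical;
this is not the xenbus driver structure, just the shape of the decision.

```c
#include <stddef.h>

/* Hypothetical driver ops table; NOT the real xenbus_driver layout. */
struct frontend_driver {
    const char *name;
    int (*suspend)(void *dev);            /* optional */
    int (*resume)(void *dev);             /* optional: reconnect in new domain */
    int (*suspend_cancelled)(void *dev);  /* optional: undo suspend work only */
};

/*
 * Called once the suspend call has returned. 'cancelled' is non-zero when
 * the guest is continuing in the same domain (checkpoint or aborted save).
 */
static int finish_suspend(const struct frontend_driver *drv, void *dev,
                          int cancelled)
{
    if (cancelled) {
        /* Back ends are still connected, so there is nothing to reconnect.
         * Only drivers that did real work in suspend() need a callback. */
        return drv->suspend_cancelled ? drv->suspend_cancelled(dev) : 0;
    }
    /* Proper restore: take the normal resume path. */
    return drv->resume ? drv->resume(dev) : 0;
}

/* Tiny demo driver that just counts which path was taken. */
struct demo_state {
    int resumed;
    int cancel_handled;
};

static int demo_resume(void *dev)
{
    ((struct demo_state *)dev)->resumed++;
    return 0;
}

static int demo_suspend_cancelled(void *dev)
{
    ((struct demo_state *)dev)->cancel_handled++;
    return 0;
}
```

A driver with no suspend handler would leave both optional callbacks NULL
and so do nothing at all on the cancel path, which is the cheap case we
want for checkpointing.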

 -- Keir