WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] [PATCH 00 of 10] Teach xm save to checkpoint a running domain
From: Brendan Cully <brendan@xxxxxxxxx>
Date: Thu, 14 Dec 2006 23:38:36 -0700

Here's some code I've been playing with lately that teaches Xen to
checkpoint running domains. It adds a new flag (-c) to xm save that
returns the domain to the runnable state after the save instead of
destroying it. It also changes the guest side of the checkpoint to
simply lock external devices instead of disconnecting them, and to
attempt a reconnect only if the suspend hypercall returns in a new
domain (which the guest detects from the hypercall's return value).
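The guest-side change described above can be sketched roughly as
follows. This is a minimal C simulation, not the patch itself: the
device struct, guest_checkpoint_suspend(), and the hypercall function
pointer are all hypothetical, and it assumes the hypercall returns
nonzero when the guest resumes in its original domain and 0 when it
resumes in a new (restored) domain. The patch's real convention may
differ.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the suspend hypercall.
 * ASSUMPTION: nonzero = resumed in the original domain (checkpoint taken),
 *             0       = resumed in a new domain after restore. */
typedef int (*suspend_hypercall_fn)(void);

/* Hypothetical frontend device state, for illustration only. */
struct device {
    const char *name;   /* e.g. "vbd" or "vif" */
    bool locked;        /* I/O quiesced, backend connection kept */
    int reconnects;     /* number of times the frontend reconnected */
};

/* Quiesce I/O without tearing down the backend connection. */
static void lock_devices(struct device *devs, int n) {
    for (int i = 0; i < n; i++)
        devs[i].locked = true;
}

static void unlock_devices(struct device *devs, int n) {
    for (int i = 0; i < n; i++)
        devs[i].locked = false;
}

/* Stand-in for a full frontend reconnect to new backends. */
static void reconnect_devices(struct device *devs, int n) {
    for (int i = 0; i < n; i++) {
        devs[i].reconnects++;
        devs[i].locked = false;
    }
}

/* Checkpoint path: lock, suspend, then either unlock (same domain)
 * or reconnect (new domain). Returns true if a new domain was detected. */
static bool guest_checkpoint_suspend(struct device *devs, int n,
                                     suspend_hypercall_fn suspend) {
    lock_devices(devs, n);
    if (suspend() != 0) {
        /* Still the original domain: connections are intact. */
        unlock_devices(devs, n);
        return false;
    }
    /* Restored into a new domain: old backend connections are gone. */
    reconnect_devices(devs, n);
    return true;
}
```

The point of the design is that the common checkpoint case (resuming
in place) only pays for a lock/unlock, while the expensive reconnect
happens only when the domain really was restored elsewhere.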

This alternate suspend path is triggered by a new shutdown code,
'checkpoint'. If the -c flag is not specified, the existing 'suspend'
path runs, so this code shouldn't affect existing functionality.
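To make the dispatch concrete, here is a minimal C sketch of how the
-c flag could select the shutdown code and how the guest could map
that code to a path. It assumes the code arrives as a plain string,
like the existing 'suspend' request; the enum, the table, and both
function names are hypothetical and not taken from the patch.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical guest-side actions for each shutdown code. */
enum shutdown_action {
    ACT_UNKNOWN,
    ACT_POWEROFF,
    ACT_REBOOT,
    ACT_SUSPEND,     /* existing path: disconnect, suspend, reconnect */
    ACT_CHECKPOINT   /* new path: lock, suspend, reconnect only in a new domain */
};

/* Toolstack side: choose the code to send for "xm save" with or without -c. */
static const char *save_shutdown_code(bool checkpoint_flag) {
    return checkpoint_flag ? "checkpoint" : "suspend";
}

/* Guest side: map the received code to an action; unknown codes map to ACT_UNKNOWN. */
static enum shutdown_action parse_shutdown_code(const char *code) {
    static const struct {
        const char *name;
        enum shutdown_action action;
    } table[] = {
        { "poweroff",   ACT_POWEROFF },
        { "reboot",     ACT_REBOOT },
        { "suspend",    ACT_SUSPEND },
        { "checkpoint", ACT_CHECKPOINT },
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(code, table[i].name) == 0)
            return table[i].action;
    return ACT_UNKNOWN;
}
```

Because a save without -c still sends 'suspend', the existing path is
selected exactly as before; only the new code reaches the new path.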

I'm not too sure about the last couple of patches in this
series. Because the checkpointing domain doesn't disconnect before
calling suspend, it retains a few references to pages it doesn't
own. These trigger a page-table (PT) race detector in xc_linux_save,
which makes the save abort. So the last couple of patches explicitly
identify the references I've found so far (shared_info and some grant
table shared pages) and simply zero those PTEs during save, since
they'll be recreated on restore. Finding the grant table pages is a
bit fragile: I walk the page table loaded in CR3 at the time of
suspend, looking for the virtual address I've stowed in the suspend
record. I only have code for two-level page tables at the moment,
since I'm not convinced this is the right approach. Under what
circumstances would a non-live save have an unsafe PTE race? Maybe
it's fine to simply zero these PTEs without checking them. Or maybe
it would be less fragile to get the owners of the pages from Xen and
check whether the guest has legitimate mappings to them? Comments?
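For readers unfamiliar with the two-level walk mentioned above, here
is a minimal C sketch of finding and zeroing the PTE for a stowed
virtual address, starting from the page directory named by CR3. It
runs over a simulated array of page-sized tables rather than real
guest memory and is not the code in xc_linux_save; the function names
are hypothetical. It covers only 32-bit non-PAE paging (10-bit
directory index, 10-bit table index, 12-bit offset) and skips 4 MiB
superpage mappings.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT  12
#define ENTRIES     1024u          /* 32-bit entries per 4 KiB table */
#define PTE_PRESENT 0x001u
#define PDE_PSE     0x080u         /* 4 MiB superpage: no second-level table */
#define FRAME_MASK  0xFFFFF000u

/* Simulated memory: page-sized tables indexed by frame number.
 * Real save code would map the guest's frames instead. */
typedef uint32_t table_t[ENTRIES];

/* Return a pointer to the present PTE mapping va, or NULL if va is
 * unmapped, mapped by a superpage, or references an out-of-range frame. */
static uint32_t *find_pte_2level(table_t *mem, size_t nframes,
                                 uint32_t cr3, uint32_t va) {
    uint32_t pd_frame = cr3 >> PAGE_SHIFT;     /* CR3[31:12] = directory base */
    if (pd_frame >= nframes)
        return NULL;
    uint32_t pde = mem[pd_frame][va >> 22];    /* top 10 bits index the directory */
    if (!(pde & PTE_PRESENT) || (pde & PDE_PSE))
        return NULL;
    uint32_t pt_frame = (pde & FRAME_MASK) >> PAGE_SHIFT;
    if (pt_frame >= nframes)
        return NULL;
    uint32_t *pte = &mem[pt_frame][(va >> PAGE_SHIFT) & (ENTRIES - 1)];
    return (*pte & PTE_PRESENT) ? pte : NULL;
}

/* Zero the PTE for va so the save path never sees the foreign frame.
 * Returns 0 on success, -1 if no present PTE was found. */
static int zero_pte_2level(table_t *mem, size_t nframes,
                           uint32_t cr3, uint32_t va) {
    uint32_t *pte = find_pte_2level(mem, nframes, cr3, va);
    if (pte == NULL)
        return -1;
    *pte = 0;
    return 0;
}
```

Extending this to PAE or 64-bit tables would mean adding levels and
switching to 64-bit entries, which is part of why the walk is fragile
compared with asking Xen who owns each frame.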

I'll post some truly horrible proof-of-concept code to create LVM
snapshots at checkpoint time in a separate email.
