WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-devel


To: Brendan Cully <brendan@xxxxxxxxx>
Subject: Re: [Xen-devel] [PATCH 00 of 10] Teach xm save to checkpoint a running domain
From: Steven Hand <Steven.Hand@xxxxxxxxxxxx>
Date: Fri, 15 Dec 2006 08:07:55 +0000
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, Steven.Hand@xxxxxxxxxxxx
Delivery-date: Fri, 15 Dec 2006 00:08:02 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: Message from Brendan Cully <brendan@xxxxxxxxx> of "Thu, 14 Dec 2006 23:38:36 MST." <patchbomb.1166168316@xxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
>I'm not too sure about the last couple of patches in this
>series. Because the checkpointing domain doesn't disconnect before
>calling suspend, it retains a few references to pages it doesn't
>own. These trigger a PT race detector in xc_linux_save, which causes
>it to abort. So the last couple of patches explicitly identify the
>references I've found so far (shared_info and some grant table shared
>pages) and simply zero those PTEs during save, since they'll be
>recreated on restore. Finding the grant table pages is a bit fragile -
>I walk the page table loaded in CR3 at the time of suspend looking for
>the virtual address I've stowed in the suspend record. I've only got
>code for two-level page tables at the moment, since I'm not convinced
>this is the right approach. Under what circumstances would a non-live
>save have an unsafe PTE race? 
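
[Illustration, not part of the patch series: a minimal sketch of the
two-level walk described in the quoted text, assuming 32-bit non-PAE x86
paging (10-bit directory index, 10-bit table index, 4 KiB pages). The
struct layout and the names lookup_pte and clear_pte_if_frame are
hypothetical. A real PV guest page table holds machine frame numbers
rather than C pointers. The sketch clears a PTE only after checking
which frame it maps, rather than zeroing it blindly.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PT_ENTRIES  1024u   /* entries per table, 32-bit non-PAE x86 */
#define PAGE_SHIFT  12
#define PTE_PRESENT 0x1u

/* Hypothetical in-memory model: directory slots point at L1 tables. */
struct l1_table { uint32_t pte[PT_ENTRIES]; };
struct l2_table { struct l1_table *l1[PT_ENTRIES]; };

/* Top 10 bits of the virtual address select the directory slot. */
static unsigned l2_index(uint32_t va) { return va >> 22; }

/* Next 10 bits select the entry inside the L1 table. */
static unsigned l1_index(uint32_t va)
{
    return (va >> PAGE_SHIFT) & (PT_ENTRIES - 1);
}

/* Return a pointer to the present PTE mapping va, or NULL if unmapped. */
static uint32_t *lookup_pte(struct l2_table *pgd, uint32_t va)
{
    struct l1_table *l1 = pgd->l1[l2_index(va)];
    if (l1 == NULL)
        return NULL;
    uint32_t *pte = &l1->pte[l1_index(va)];
    return (*pte & PTE_PRESENT) ? pte : NULL;
}

/*
 * Zero the PTE for va only if it maps the frame we expect
 * (for example, a known grant-table or shared_info frame).
 * Returns 1 if the PTE was cleared, 0 otherwise.
 */
static int clear_pte_if_frame(struct l2_table *pgd, uint32_t va,
                              uint32_t expected_pfn)
{
    uint32_t *pte = lookup_pte(pgd, va);
    if (pte == NULL || (*pte >> PAGE_SHIFT) != expected_pfn)
        return 0;
    *pte = 0;
    return 1;
}
```

[Three- and four-level layouts would need more levels and different
index widths, which this sketch does not attempt.]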

Pretty much any PT race in a non-live save/migrate is a bug; the 
domain is (in theory) suspended at this point, and all of the 
devices are disconnected. Since you've chosen not to 'disconnect' 
the devices, you'll get random updates occurring to any shared 
pages (shared via grants or directly shared with Xen). 

> Maybe it's fine to simply zero these ptes without checking them. 

I'd think not. 

>Or maybe it'd be less fragile to get the owners of the pages from Xen 
>and see if the guest has legitimate mappings to them? Comments?

I think the ideal thing to do here is to mirror the live migrate case, 
i.e. do a full 'disconnect' of devices, xenbus, console, event channels
etc, and then bring them back up. It'll probably be possible to do this
in a slightly more efficient / less intrusive fashion by just cauterising
things in Xen (i.e. closing the event channel -> guest path but not 
unbinding the interdomain side). For grants, you basically have to 
follow the live migrate case and be prepared to re-issue, since otherwise
on resume (which is presumably desired at some point?) you'll have garbage
in flight and/or lost requests. 
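
[Illustration: one way to be "prepared to re-issue" is for the frontend
to keep a private shadow copy of every request it has placed on the
shared ring, and to replay the ones still outstanding after
reconnecting. The sketch below is a toy model under that assumption.
struct frontend, submit, complete and recover are hypothetical names,
not Xen or Linux APIs, and the "ring" is a plain array standing in for
the granted page.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 8u

struct request { uint64_t id; uint64_t sector; };

/* Private copy of a request, kept until its response arrives. */
struct shadow_slot { struct request req; bool in_flight; };

struct frontend {
    struct request ring[RING_SIZE];       /* stands in for the shared page */
    unsigned req_prod;                    /* requests pushed since connect */
    struct shadow_slot shadow[RING_SIZE]; /* survives suspend/resume */
};

/* Queue a request; returns its id, or -1 if everything is in flight. */
static int submit(struct frontend *fe, uint64_t sector)
{
    for (unsigned id = 0; id < RING_SIZE; id++) {
        if (!fe->shadow[id].in_flight) {
            struct request req = { .id = id, .sector = sector };
            fe->shadow[id].req = req;
            fe->shadow[id].in_flight = true;
            fe->ring[fe->req_prod++ % RING_SIZE] = req;
            return (int)id;
        }
    }
    return -1;
}

/* A response for request id arrived: its shadow slot can be reused. */
static void complete(struct frontend *fe, unsigned id)
{
    if (id < RING_SIZE)
        fe->shadow[id].in_flight = false;
}

/*
 * After resume the old ring contents are untrustworthy. Start from an
 * empty ring and re-issue every request that never got a response.
 * Returns the number of requests re-issued.
 */
static unsigned recover(struct frontend *fe)
{
    unsigned reissued = 0;
    memset(fe->ring, 0, sizeof fe->ring);
    fe->req_prod = 0;
    for (unsigned id = 0; id < RING_SIZE; id++) {
        if (fe->shadow[id].in_flight) {
            fe->ring[fe->req_prod++ % RING_SIZE] = fe->shadow[id].req;
            reissued++;
        }
    }
    return reissued;
}
```

[In a real frontend the recovery step would also have to re-grant the
data pages referenced by each replayed request. That is omitted here.]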


Anyway, looks like an interesting start, and would be a nice feature 
to get into -unstable sometime post 3.0.4. 



cheers,

S.

