WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
 
   
 

xen-devel

Re: [Xen-devel] Error restoring DomU when using GPLPV

To: "mukesh.rathor@xxxxxxxxxx" <mukesh.rathor@xxxxxxxxxx>
Subject: Re: [Xen-devel] Error restoring DomU when using GPLPV
From: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Date: Tue, 15 Sep 2009 08:39:20 +0100
Cc: Joshua West <jwest@xxxxxxxxxxxx>, Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>, James Harper <james.harper@xxxxxxxxxxxxxxxx>, "Kurt C. Hackel" <kurt.hackel@xxxxxxxxxx>, "annie.li@xxxxxxxxxx" <annie.li@xxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, "wayne.gong@xxxxxxxxxx" <wayne.gong@xxxxxxxxxx>
Delivery-date: Tue, 15 Sep 2009 00:41:10 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4AAEFB00.8000909@xxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: Aco1q9SpGvk8V6srSHGSFzfporJFgAAK84wa
Thread-topic: [Xen-devel] Error restoring DomU when using GPLPV
User-agent: Microsoft-Entourage/12.20.0.090605
On 15/09/2009 03:25, "Mukesh Rathor" <mukesh.rathor@xxxxxxxxxx> wrote:

> Ok, I've been looking at this and figured what's going on. Annie's problem
> lies in not remapping the grant frames post migration. Hence the leak,
> tot_pages goes up every time until migration fails. On linux, remapping
> is where the frames created by restore (for heap pfn's), get freed back to
> the dom heap, is what I found.  So that's a fix to be made on win
> pv driver side.

Although obviously that is a bug, I'm not sure why it would cause this
particular issue? The domheap pages do not get freed and replaced with
xenheap pages, but why does that affect the next save/restore cycle? After
all, xc_domain_save does not distinguish between Xenheap and domheap pages?

> 1. Always balloon down, shinfo+gnttab frames: This needs to be done just
>     once during load, right? I'm not sure how it would work tho if mem gets
>     ballooned up subsequently. I suppose the driver will have to intercept
>     every increase in reservation and balloon down everytime?

Well, it is the same driver that is doing the ballooning, so it's kind of
easy to intercept, right? Just need to track how many Xenheap pages are
mapped and maintain that amount of 'balloon down'.
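
As a rough sketch of that bookkeeping (a toy model only, not GPLPV code: the
struct and function names below are made up for illustration, and a real
driver would issue the actual reservation hypercalls such as
XENMEM_decrease_reservation instead of just computing a number):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical balloon state kept by the PV driver. */
struct balloon_state {
    size_t requested_pages; /* target most recently asked for */
    size_t xenheap_mapped;  /* shared_info + grant-table frames mapped */
};

/* Record that one more Xenheap frame was mapped into the physmap. */
static void note_xenheap_mapped(struct balloon_state *b)
{
    assert(b != NULL);
    b->xenheap_mapped++;
}

/* Record a new balloon target (balloon up or down). */
static void set_requested(struct balloon_state *b, size_t pages)
{
    assert(b != NULL);
    b->requested_pages = pages;
}

/*
 * Number of domheap pages the guest should actually hold: the requested
 * target minus the standing 'balloon down' for mapped Xenheap frames.
 */
static size_t effective_target(const struct balloon_state *b)
{
    assert(b != NULL);
    if (b->xenheap_mapped >= b->requested_pages)
        return 0;
    return b->requested_pages - b->xenheap_mapped;
}
```

Because the adjustment is applied inside effective_target(), a later balloon
up via set_requested() keeps the same offset without any extra interception.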

>     Also, balloon down during suspend call would prob be too late, right?

Indeed it would. Need to do it during boot. It's only a few pages though, so
no one will miss them.

> 2. libxc fix: I wonder how much work this will be. Good thing here is,
>     it'll take care of both linux and PV HVM guests avoiding driver
>     updates in many versions, and hence appealing to us. Can we somehow
>     mark the frames special to be skipped? Looking at biiig xc_domain_save
>     function, not sure in case of HVM, how pfn_type gets set. May be before
>     the outer loop, it could ask hyp for all xen heap page list, but then
>     what if a new page gets added to the list in between.....

It's a pain. pfn_type[] I think doesn't really get used. xc_domain_save()
just tries to map PFNs and saves all the ones it successfully maps. So the
problem is it is allowed to map Xenheap pages. But we can't always disallow
that because sometimes the tools have good reason to map Xenheap pages. So
we'd need a new hypercall, or a flag, or something, and that would need dom0
kernel changes as well as Xen and toolstack changes. So it's rather a pain.
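
To make the behaviour concrete, here is a toy model of the save pass (not the
real libxc code: the enum, the function name and the skip_xenheap flag are all
hypothetical, and the flag stands in for whatever new hypercall or flag would
be needed):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of what backs each guest PFN. */
enum page_owner { PAGE_NONE, PAGE_DOMHEAP, PAGE_XENHEAP };

/*
 * Count the pages a save pass would write out. A PFN that cannot be
 * mapped (PAGE_NONE) is skipped; everything that maps is saved, which
 * today includes Xenheap pages unless skip_xenheap is set.
 */
static size_t count_saved(const enum page_owner *physmap, size_t nr_pfns,
                          bool skip_xenheap)
{
    size_t saved = 0;

    assert(physmap != NULL || nr_pfns == 0);
    for (size_t pfn = 0; pfn < nr_pfns; pfn++) {
        if (physmap[pfn] == PAGE_NONE)
            continue; /* mapping failed, nothing to save */
        if (skip_xenheap && physmap[pfn] == PAGE_XENHEAP)
            continue; /* would need the new flag or hypercall */
        saved++;
    }
    return saved;
}
```

With skip_xenheap false this mirrors the current situation, where shared_info
and grant-table frames are saved alongside ordinary guest memory.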

> Also, unfortunately, the failure case is not handled properly sometimes.
> If migration fails after suspend, then no way to get the guest
> back. I even noticed, the guest disappeared totally from both source and
> target when failed, couple times of several dozen migrations I did.

That shouldn't happen since there is a mechanism to cancel the suspension of
a suspended guest. Possibly xend doesn't get it right every time, as its
error handling is pretty poor in general. I trust the underlying mechanisms
below xend pretty well however.

 -- Keir


