xen-devel

Re: [Xen-devel] Re: [PATCH] libxl: do slow resume after failed migration

To: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: Re: [Xen-devel] Re: [PATCH] libxl: do slow resume after failed migration attempt
From: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
Date: Wed, 16 Feb 2011 11:53:21 +0000
In-reply-to: <1297856957.21980.6310.camel@xxxxxxxxxxxxxxxxxxxxxx>
Organization: Citrix Systems, Inc.
References: <1728ed4bbec9e82ca13c.1297856876@xxxxxxxxxxxxxxxxxxxxx> <1297856957.21980.6310.camel@xxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
On Wed, 2011-02-16 at 11:49 +0000, Ian Campbell wrote:
> On Wed, 2011-02-16 at 11:47 +0000, Ian Campbell wrote:
> > # HG changeset patch
> > # User Ian Campbell <ian.campbell@xxxxxxxxxx>
> > # Date 1297856874 0
> > # Node ID 1728ed4bbec9e82ca13c2639c8e4ef8b4dc231b6
> > # Parent  aa466613328f5de78fdfc968473cb06e948c1f5d
> > libxl: do slow resume after failed migration attempt
> > 
> > Both of the current callers of libxl_domain_resume call it after a
> > migration has failed: one after a failure to suspend on the sender and
> > the other after a failure to start on the destination. Both lead to a
> > resume attempt on the sender.
> > 
> > However, in the first case (failure to suspend) there is no guarantee
> > that the guest has made it as far as the suspend hypercall. The fast
> > resume method, which frobs the hypercall return value to indicate a
> > cancelled suspend, therefore cannot safely be used since it will
> > corrupt %eax/%rax.
> > 
> > For the second case, failure to start on destination, I don't think it
> > really matters if the resume is fast or slow.
> > 
> > Therefore always use the slow/uncooperative version of xc_domain_resume from
> > libxl_domain_resume.
> > 
> > This makes a PV domain which failed to suspend (e.g. because the core
> > Linux PM infrastructure within the guest didn't allow it) recover
> > gracefully.
> 
> A PVHVM domain never suffered from this because libxl_domain_resume
> bails out due to a libxl__domain_is_hvm check. I'm not 100% clear whether
> this is correct but I didn't change it. My test with a PVHVM guest which
> acknowledges the suspend but doesn't go on to do anything seems to work.

Looking closer, even a PV guest which is hacked to not actually try to
suspend fails this new xc_domain_resume call, and it is actually the
original domain which continues.

I'm inclined to suggest that this is OK and that trying to do a slow
xc_domain_resume will save guests which have suffered certain types of
failure and be harmless for other types of failures, but I wouldn't
argue strongly against a suggestion that the right thing to do in the
"failed to suspend" case is to simply unpause the original domain and
let it try and continue...

Ian.
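
To illustrate the hazard described in the commit message, here is a
minimal toy model in C. It is only a sketch: struct toy_vcpu, the
toy_* functions and the value TOY_SUSPEND_CANCELLED are hypothetical
names invented for this example and are not part of libxc or libxl. The
model assumes that a fast resume simply overwrites the register that
holds the hypercall return value, which is only meaningful if the guest
has actually reached the suspend hypercall.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: none of these names exist in libxc or libxl. */
struct toy_vcpu {
    uint64_t rax;              /* also serves as the hypercall return register */
    bool in_suspend_hypercall; /* has the guest reached the suspend hypercall? */
};

/* Assumed marker for "suspend was cancelled" in this model. */
#define TOY_SUSPEND_CANCELLED 1u

/* Fast resume: rewrite the (supposedly pending) hypercall return value. */
static void toy_fast_resume(struct toy_vcpu *v)
{
    v->rax = TOY_SUSPEND_CANCELLED;
}

/* Slow resume: this model leaves the guest's registers untouched. */
static void toy_slow_resume(struct toy_vcpu *v)
{
    (void)v;
}

/* Fast resume is only safe if rax really is a pending return slot. */
static bool toy_fast_resume_is_safe(const struct toy_vcpu *v)
{
    return v->in_suspend_hypercall;
}
```

In this model, a guest that never reached the suspend hypercall keeps
whatever it had in rax across toy_slow_resume, whereas toy_fast_resume
clobbers it regardless of what the guest was doing.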

> 
> Ian.
> 
> > 
> > Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
> > 
> > diff -r aa466613328f -r 1728ed4bbec9 tools/libxl/libxl.c
> > --- a/tools/libxl/libxl.c   Tue Feb 15 13:40:50 2011 +0000
> > +++ b/tools/libxl/libxl.c   Wed Feb 16 11:47:54 2011 +0000
> > @@ -226,7 +226,7 @@ int libxl_domain_resume(libxl_ctx *ctx, 
> >          rc = ERROR_NI;
> >          goto out;
> >      }
> > -    if (xc_domain_resume(ctx->xch, domid, 1)) {
> > +    if (xc_domain_resume(ctx->xch, domid, 0)) {
> >          LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, 
> >                          "xc_domain_resume failed for domain %u", 
> >                          domid);
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel


