On Wed, 2011-02-16 at 11:49 +0000, Ian Campbell wrote:
> On Wed, 2011-02-16 at 11:47 +0000, Ian Campbell wrote:
> > # HG changeset patch
> > # User Ian Campbell <ian.campbell@xxxxxxxxxx>
> > # Date 1297856874 0
> > # Node ID 1728ed4bbec9e82ca13c2639c8e4ef8b4dc231b6
> > # Parent aa466613328f5de78fdfc968473cb06e948c1f5d
> > libxl: do slow resume after failed migration attempt
> >
> > Both of the current callers of libxl_domain_resume call it after
> > a migration has failed: one after a failure to suspend on the sender,
> > the other after a failure to start on the destination. Both lead to a
> > resume attempt on the sender.
> >
> > However in the first case, failure to suspend, there is no guarantee
> > that the guest has made it as far as the suspend hypercall and
> > therefore the fast resume method, which frobs the hypercall return to
> > indicate a cancelled suspend, cannot safely be used since it will
> > corrupt %eax/%rax.
> >
> > For the second case, failure to start on destination, I don't think it
> > really matters if the resume is fast or slow.
> >
> > Therefore always use the slow/uncooperative version of xc_domain_resume from
> > libxl_domain_resume.
> >
> > This makes a PV domain which failed to suspend (e.g. because the core
> > Linux PM infrastructure within the guest didn't allow it) recover
> > gracefully.
>
> A PVHVM domain never suffered from this because libxl_domain_resume
> bails due to a libxl__domain_is_hvm check. I'm not 100% clear whether
> this is correct but I didn't change it. My test with a PVHVM guest which
> acknowledges the suspend but doesn't go on to do anything seems to work.

Looking closer, even a PV guest which has been hacked not to actually try
to suspend fails this new xc_domain_resume call, and it is actually the
original domain which continues.

I'm inclined to suggest that this is OK: attempting a slow
xc_domain_resume will save guests which have suffered certain types of
failure and be harmless for other types of failure. That said, I wouldn't
argue strongly against a suggestion that the right thing to do in the
"failed to suspend" case is simply to unpause the original domain and
let it try to continue...
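
To make the two options concrete, here is a rough sketch in C of the
policy under discussion. It is a standalone model, not libxl code: the
enums and choose_recovery() are names invented for illustration and do
not exist in libxl or libxc. The only thing taken from the patch is that
the last argument to xc_domain_resume selects fast (1) or slow (0)
resume.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; not part of libxl or libxc. */
enum migrate_failure {
    FAIL_SUSPEND_ON_SENDER, /* guest may never have reached the suspend hypercall */
    FAIL_START_ON_DEST      /* guest suspended, but the destination could not start it */
};

enum recovery_action {
    RECOVER_SLOW_RESUME,    /* corresponds to xc_domain_resume(xch, domid, 0) */
    RECOVER_UNPAUSE_ONLY    /* leave the guest state alone and just unpause it */
};

/*
 * Fast resume is deliberately absent from the possible results: it rewrites
 * the suspend hypercall's return value, which is only safe when the guest is
 * known to be inside that hypercall.
 *
 * prefer_unpause selects the alternative raised in this mail: on a failed
 * suspend, simply unpause the original domain instead of resuming it.
 */
enum recovery_action choose_recovery(enum migrate_failure why, bool prefer_unpause)
{
    assert(why == FAIL_SUSPEND_ON_SENDER || why == FAIL_START_ON_DEST);

    if (why == FAIL_SUSPEND_ON_SENDER && prefer_unpause)
        return RECOVER_UNPAUSE_ONLY;

    /* The behaviour of the patch as posted: always take the slow path. */
    return RECOVER_SLOW_RESUME;
}
```

With prefer_unpause set to false this models the patch as posted (slow
resume in both failure cases); setting it to true models the alternative
for the "failed to suspend" case only.
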
Ian.
>
> Ian.
>
> >
> > Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
> >
> > diff -r aa466613328f -r 1728ed4bbec9 tools/libxl/libxl.c
> > --- a/tools/libxl/libxl.c Tue Feb 15 13:40:50 2011 +0000
> > +++ b/tools/libxl/libxl.c Wed Feb 16 11:47:54 2011 +0000
> > @@ -226,7 +226,7 @@ int libxl_domain_resume(libxl_ctx *ctx,
> >          rc = ERROR_NI;
> >          goto out;
> >      }
> > -    if (xc_domain_resume(ctx->xch, domid, 1)) {
> > +    if (xc_domain_resume(ctx->xch, domid, 0)) {
> >          LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR,
> >                          "xc_domain_resume failed for domain %u",
> >                          domid);
>
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel