
To: Ian Campbell <Ian.Campbell@xxxxxxxxxxxxx>
Subject: Re: LAST_CHECKPOINT and save/restore/migrate compatibility (was Re: [Xen-devel] [PATCH 3 of 4] libxc: provide notification of final checkpoint to restore end)
From: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
Date: Fri, 1 Apr 2011 14:49:10 +0100
Cc: Shriram Rajagopalan <rshriram@xxxxxxxxx>, Brendan Cully <brendan@xxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <1301659093.27123.227.camel@xxxxxxxxxxxxxxxxxxxxxx>
References: <AANLkTik=XfbVbDzEGb3KL-UyN3oWVBvkrj4t+G9Fj_hf@xxxxxxxxxxxxxx> <1301659093.27123.227.camel@xxxxxxxxxxxxxxxxxxxxxx>
On Fri, Apr 1, 2011 at 12:58 PM, Ian Campbell
<Ian.Campbell@xxxxxxxxxxxxx> wrote:
> On Fri, 2011-04-01 at 12:02 +0100, George Dunlap wrote:
>> On Mon, Sep 6, 2010 at 11:03 AM, Ian Campbell <ian.campbell@xxxxxxxxxx> 
>> wrote:
>> > # HG changeset patch
>> > # User Ian Campbell <ian.campbell@xxxxxxxxxx>
>> > # Date 1283766891 -3600
>> > # Node ID bdf8ce09160d715451e1204babe5f80886ea6183
>> > # Parent  5f96f36feebdb87eaadbbcab0399f32eda86f735
>> > libxc: provide notification of final checkpoint to restore end
>> >
>> > When the restore code sees this notification it will restore the
>> > currently in-progress checkpoint when it completes.
>> >
>> > This allows the restore end to finish up without waiting for a
>> > spurious timeout on the receive fd and thereby avoids unnecessary
>> > error logging in the case of a successful migration or restore.
>> >
>> > In the normal migration or restore case the first checkpoint is always
>> > the last. For a rolling checkpoint (such as Remus) the notification is
>> > currently unused but could be used in the future for example to
>> > provide a controlled failover for reasons other than error.
>>
>> Unfortunately, this introduces a backwards-compatibility problem,
>> since older versions of the tools never send LAST_CHECKPOINT.
>
> Out of interest what actually happens? (I think you mentioned something
> about failing when restoring from a file?)

Steps to reproduce:
* Install the latest release of XenServer (based on Xen 3.4)
* Install a VM; suspend it.
* Upgrade to a development version of XenServer (based on Xen 4.1)
* Attempt to resume the VM; the resume fails.

libxc output is below:

Apr  1 14:27:29 localhost xenguest: xc: detail: xc_domain_restore start: p2m_size = fefff
Apr  1 14:27:33 localhost xenguest: xc: error: Max batch size exceeded (1970103633). Giving up.: Internal error
Apr  1 14:27:33 localhost xenguest: xc: error: error when buffering batch, finishing (90 = Message too long): Internal error
Apr  1 14:27:33 localhost xenguest: xc: detail: Restore exit with rc=0
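
(As an aside: 1970103633 is 0x756d6551, i.e. the four ASCII bytes "Qemu"
read as a little-endian int -- a strong hint about where the stray data
comes from; see below.)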

I had added debugging to print out the "batch" values read, and
confirmed that this is *after* the page transfer and everything else
was done, and after the "0" entry indicating the end of a checkpoint.
So it's reading through one full checkpoint, and then reading some
garbage value and failing.  Initializing "ctx->last_checkpoint" to 1
makes the problem go away.
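
For reference, a minimal sketch of that workaround (placement
illustrative; the real context setup is near the top of
xc_domain_restore(), and a multi-checkpoint sender such as Remus would
then need a way to clear the flag again -- see the inverted-protocol
sketch further down):

    memset(ctx, 0, sizeof(*ctx));
    /* Legacy (pre-4.1) streams never send XC_SAVE_ID_LAST_CHECKPOINT,
     * so default to treating the first complete image as the last. */
    ctx->last_checkpoint = 1;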

Ah -- hmm.  You know what I bet -- xc_domain_save() doesn't write the
qemu checkpoint, but xc_domain_restore() reads it (again because of
REMUS).  Libxl handles this properly, but making xapi write in such a
way that xc_domain_restore() could read it properly turned out to be
non-trivial; making xc_domain_restore() not read the qemu file was the
simplest solution.

But of course, that means that there's still data in the file after
you've finished a "checkpoint", so the next read succeeds in reading
in the qemu save file.  Not surprising that it doesn't look right. :-)
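
Roughly, assuming the 3.4-era layout, the saved image looks like:

    [page batches & misc chunks][0 terminator][tailbuf][qemu device-model record]

so when the restore loop goes round again it reads the first four bytes
of the qemu record as a batch count -- consistent with the "Qemu" decode
of the bogus count above.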

> My intention at the time was that in the absence of a LAST_CHECKPOINT
> things would continue to behave as before e.g. you would see the
> spurious timeout on the receive fd but still restore correctly.
>
> Perhaps the associated changes to the O_NONBLOCK'ness of the fd at
> various stages are what broke this for the non-Remus migrations?
>
>> Would it make sense to invert the logic on this -- i.e., default to
>> only one checkpoint, and send "MORE_CHECKPOINTS" if it expects this
>> *not* to be the last (either by expecting the checkpoint() function to
>> do it, or by sending it from xc_domain_save)?
>
> I'd quite like to understand what I broke and investigate the
> trivialness (or otherwise) of a fix, but this change would probably
> independently make sense too.
>
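
To make that inversion concrete, a sketch mirroring the patch's idioms
(hypothetical chunk name and value, not committed code):

    #define XC_SAVE_ID_MORE_CHECKPOINTS   -10  /* illustrative value */

    /* Sender side, in xc_domain_save(), before the zero terminator:
     * announce further checkpoints only when a checkpoint callback is
     * registered (i.e. Remus-style saves). */
    if ( callbacks->checkpoint )
    {
        i = XC_SAVE_ID_MORE_CHECKPOINTS;
        if ( wrexact(io_fd, &i, sizeof(int)) )
        {
            PERROR("Error when writing more-checkpoints chunk");
            goto out;
        }
    }

    /* Receiver side, in pagebuf_get_one(), with ctx->last_checkpoint
     * initialized to 1 at context setup: */
    case XC_SAVE_ID_MORE_CHECKPOINTS:
        ctx->last_checkpoint = 0;
        return pagebuf_get_one(xch, ctx, buf, fd, dom);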
>> Now that 4.1 is out, we have to consider version compatibilities.  But
>> we should be able to make it so that there would only be a problem if:
>> 1) You're running REMUS
>
> Is it the case that Remus only cares about checkpointing between like
> versions of the toolstack?
>
>> 2) One of your toolstacks is 4.1.0, and one is not.
>>
>> That seems like it shouldn't be too bad.
>>
>> Thoughts?
>>
>>  -George
>>
>> >
>> > Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
>> > Acked-by: Brendan Cully <brendan@xxxxxxxxx>
>> >
>> > diff -r 5f96f36feebd -r bdf8ce09160d tools/libxc/xc_domain_restore.c
>> > --- a/tools/libxc/xc_domain_restore.c   Mon Sep 06 10:54:51 2010 +0100
>> > +++ b/tools/libxc/xc_domain_restore.c   Mon Sep 06 10:54:51 2010 +0100
>> > @@ -42,6 +42,7 @@ struct restore_ctx {
>> >     xen_pfn_t *p2m; /* A table mapping each PFN to its new MFN. */
>> >     xen_pfn_t *p2m_batch; /* A table of P2M mappings in the current region.  */
>> >     int completed; /* Set when a consistent image is available */
>> > +    int last_checkpoint; /* Set when we should commit to the current checkpoint when it completes. */
>> >     struct domain_info_context dinfo;
>> >  };
>> >
>> > @@ -765,6 +766,11 @@ static int pagebuf_get_one(xc_interface
>> >         // DPRINTF("console pfn location: %llx\n", buf->console_pfn);
>> >         return pagebuf_get_one(xch, ctx, buf, fd, dom);
>> >
>> > +    case XC_SAVE_ID_LAST_CHECKPOINT:
>> > +        ctx->last_checkpoint = 1;
>> > +        // DPRINTF("last checkpoint indication received");
>> > +        return pagebuf_get_one(xch, ctx, buf, fd, dom);
>> > +
>> >     default:
>> >         if ( (count > MAX_BATCH_SIZE) || (count < 0) ) {
>> >             ERROR("Max batch size exceeded (%d). Giving up.", count);
>> > @@ -1296,10 +1302,23 @@ int xc_domain_restore(xc_interface *xch,
>> >             goto out;
>> >         }
>> >         ctx->completed = 1;
>> > -        /* shift into nonblocking mode for the remainder */
>> > -        if ( (flags = fcntl(io_fd, F_GETFL,0)) < 0 )
>> > -            flags = 0;
>> > -        fcntl(io_fd, F_SETFL, flags | O_NONBLOCK);
>> > +
>> > +        /*
>> > +         * If more checkpoints are expected then shift into
>> > +         * nonblocking mode for the remainder.
>> > +         */
>> > +        if ( !ctx->last_checkpoint )
>> > +        {
>> > +            if ( (flags = fcntl(io_fd, F_GETFL,0)) < 0 )
>> > +                flags = 0;
>> > +            fcntl(io_fd, F_SETFL, flags | O_NONBLOCK);
>> > +        }
>> > +    }
>> > +
>> > +    if ( ctx->last_checkpoint )
>> > +    {
>> > +        // DPRINTF("Last checkpoint, finishing\n");
>> > +        goto finish;
>> >     }
>> >
>> >     // DPRINTF("Buffered checkpoint\n");
>> > diff -r 5f96f36feebd -r bdf8ce09160d tools/libxc/xc_domain_save.c
>> > --- a/tools/libxc/xc_domain_save.c      Mon Sep 06 10:54:51 2010 +0100
>> > +++ b/tools/libxc/xc_domain_save.c      Mon Sep 06 10:54:51 2010 +0100
>> > @@ -1616,6 +1616,20 @@ int xc_domain_save(xc_interface *xch, in
>> >         }
>> >     }
>> >
>> > +    if ( !callbacks->checkpoint )
>> > +    {
>> > +        /*
>> > +         * If this is not a checkpointed save then this must be the first and
>> > +         * last checkpoint.
>> > +         */
>> > +        i = XC_SAVE_ID_LAST_CHECKPOINT;
>> > +        if ( wrexact(io_fd, &i, sizeof(int)) )
>> > +        {
>> > +            PERROR("Error when writing last checkpoint chunk");
>> > +            goto out;
>> > +        }
>> > +    }
>> > +
>> >     /* Zero terminate */
>> >     i = 0;
>> >     if ( wrexact(io_fd, &i, sizeof(int)) )
>> > diff -r 5f96f36feebd -r bdf8ce09160d tools/libxc/xg_save_restore.h
>> > --- a/tools/libxc/xg_save_restore.h     Mon Sep 06 10:54:51 2010 +0100
>> > +++ b/tools/libxc/xg_save_restore.h     Mon Sep 06 10:54:51 2010 +0100
>> > @@ -131,6 +131,7 @@
>> >  #define XC_SAVE_ID_TMEM_EXTRA         -6
>> >  #define XC_SAVE_ID_TSC_INFO           -7
>> >  #define XC_SAVE_ID_HVM_CONSOLE_PFN    -8 /* (HVM-only) */
>> > +#define XC_SAVE_ID_LAST_CHECKPOINT    -9 /* Commit to restoring after completion of current iteration. */
>> >
>> >  /*
>> >  ** We process save/restore/migrate in batches of pages; the below
>> >

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel