Re: [Xen-devel] [PATCH] libxc: succeed silently on restore
On Thu, 2010-09-02 at 18:07 +0100, Ian Jackson wrote:
> Ian Campbell writes ("Re: [Xen-devel] [PATCH] libxc: succeed silently on
> restore"):
> > I'm not sure what can be done about this case: the way
> > xc_domain_restore is (currently) designed, it relies on the saving end
> > closing its FD when it is done in order to generate an EOF at the
> > receiver end, which signals the end of the migration.
>
> This was introduced in the Remus patches and is IMO not correct.
>
> > The xl migration protocol has a postamble which prevents us from closing
> > the FD, so instead the sender finishes the save and then sits waiting
> > for the ACK from the receiver; eventually the receiver hits the Remus
> > heartbeat timeout, which causes us to continue. This isn't ideal from a
> > downtime point of view, nor from a general design POV.
>
> The xl migration protocol postamble is needed to try to mitigate the
> consequences of network failure, where otherwise it is easy to get
> into situations where neither the sender nor the receiver can safely
> resume the domain.
Yes, I wasn't suggesting getting rid of the postamble, just commenting
on why we can't simply close the sending fd as xc_domain_restore
currently expects.
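To spell out what I mean by "as xc_domain_restore currently expects":
the restore side keeps reading until read() returns 0, so the only way
the sender can say "I'm finished" is to close its end. Roughly (a
simplified sketch, not the actual libxc loop; consume_chunk() is just a
hypothetical placeholder for the restore-side processing):

    #include <errno.h>
    #include <unistd.h>

    extern void consume_chunk(const char *buf, size_t len); /* hypothetical */

    /* Simplified sketch only: the loop terminates solely on EOF, i.e.
     * when the sending end closes its FD. */
    static int drain_stream(int io_fd)
    {
        char buf[4096];

        for (;;) {
            ssize_t n = read(io_fd, buf, sizeof(buf));

            if (n == 0)
                return 0;          /* EOF: sender closed its FD, "done" */
            if (n < 0) {
                if (errno == EINTR)
                    continue;
                return -1;         /* genuine I/O error */
            }
            consume_chunk(buf, n);
        }
    }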
> > Perhaps we should insert an explicit "done" marker into the xc save
> > protocol, to be appended in the non-checkpoint case? Only the save end
> > knows whether the migration is a checkpoint or not (and only implicitly,
> > via callbacks->checkpoint != NULL), but that is OK, I think.
>
> There _is_ an explicit done marker: the sender stops sending pages and
> sends a register dump. It's just that Remus then wants to continue
> anyway.
I was suggesting a second "alldone" marker to be sent after the register
dump and other tail bits when there are no more checkpoints to come.
But...
> The solution is that the interface to xc_domain_restore should be
> extended so that:
> * Callers specify whether they are expecting a series of checkpoints,
> or just one.
> * When it returns you find out whether the response was "we got
> exactly the one checkpoint you were expecting" or "the network
> connection failed too soon" or "we got some checkpoints and then
> the network connection failed".
... I like this idea more. I'll see what I can rustle up.
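For concreteness, I'm imagining something roughly along these lines
(names and shape purely illustrative, not a worked-out interface):

    /* Hypothetical sketch only -- not the current libxc interface. */

    typedef enum {
        XC_RESTORE_SINGLE,        /* caller expects exactly one checkpoint */
        XC_RESTORE_CHECKPOINTED,  /* caller expects a Remus-style series   */
    } xc_restore_mode;

    typedef enum {
        XC_RESTORE_GOT_EXPECTED,  /* got exactly the one checkpoint expected */
        XC_RESTORE_FAILED_EARLY,  /* connection failed before any complete
                                     checkpoint was received */
        XC_RESTORE_GOT_SOME,      /* got one or more checkpoints, then the
                                     connection failed */
    } xc_restore_result;

    /* 'mode' is the new input, 'result' the new output saying which of
     * the three cases above actually happened. */
    int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
                          xc_restore_mode mode,
                          xc_restore_result *result
                          /* ..., plus the existing arguments ... */);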
> A related problem is that it is very difficult for the caller to
> determine when the replication has been properly set up: i.e., to know
> when the receiver has got at least one whole checkpoint.
I think this actually does work with the code as it is -- the receiver
will return an error if it doesn't get at least one whole checkpoint, and
otherwise will return success and commit to the most recent complete
checkpoint.
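Roughly, in caller terms (illustrative only; do_restore() stands in for
the real xc_domain_restore() call with all its arguments, and the two
helpers are hypothetical):

    if (do_restore(xch, io_fd, dom) < 0) {
        /* No complete checkpoint arrived: replication was never
         * established, so only the sender can safely resume the domain. */
        report_failure_to_sender();
    } else {
        /* At least one complete checkpoint arrived; the most recent
         * complete one has been committed on the receiver. */
        unpause_restored_domain();
    }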
Ian.