WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops

To: keir.fraser@xxxxxxxxxxxxx, Ian.Jackson@xxxxxxxxxxxxx, andreas.olsowski@xxxxxxxxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops
From: Brendan Cully <Brendan@xxxxxxxxx>
Date: Wed, 2 Jun 2010 21:31:43 -0700
Cc:
Delivery-date: Wed, 02 Jun 2010 21:32:51 -0700
Dkim-signature: v=1; a=rsa-sha1; c=relaxed; d=quuxuum.com; h=date:to :subject:message-id:references:mime-version:content-type :in-reply-to:from; s=dk; bh=qX2Z6klxhKKA47NdLEV6igzlnaE=; b=IrHH /F6Pw/y0LsDnbwxnPZ/kirURUGFx8RVzS8Am2mp1gI37OQSRMypSsFZACd4QaAVz 4BE5kloegqhaC4LkEQpgIHdsR2BvDrnzG/kx3qZyTB54WRES6N4FOpxz2jgN6qlG rh1ZDG0mh24prDWLhMVd57ErtfSCimkkv0h3xk8=
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20100603010418.GB2028@xxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Mail-followup-to: keir.fraser@xxxxxxxxxxxxx, Ian.Jackson@xxxxxxxxxxxxx, andreas.olsowski@xxxxxxxxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxx
References: <19462.33905.936222.605434@xxxxxxxxxxxxxxxxxxxxxxxx> <C82C445E.167B0%keir.fraser@xxxxxxxxxxxxx> <20100603010418.GB2028@xxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.5.20 (2010-04-22)
On Wednesday, 02 June 2010 at 18:04, Brendan Cully wrote:
> On Wednesday, 02 June 2010 at 17:24, Keir Fraser wrote:
> > On 02/06/2010 17:18, "Ian Jackson" <Ian.Jackson@xxxxxxxxxxxxx> wrote:
> > 
> > > Andreas Olsowski writes ("[Xen-devel] slow live magration / xc_restore
> > > on xen4 pvops"):
> > >> [2010-06-01 21:20:57 5211] INFO (XendCheckpoint:423) ERROR Internal
> > >> error: Error when reading batch size
> > >> [2010-06-01 21:20:57 5211] INFO (XendCheckpoint:423) ERROR Internal
> > >> error: error when buffering batch, finishing
> > > 
> > > These errors, and the slowness of migrations, are caused by changes
> > > made to support Remus.  Previously, a migration would be regarded as
> > > complete as soon as the final information including CPU states was
> > > received at the migration target.  xc_domain_restore would return
> > > immediately at that point.
> > 
> > This probably needs someone with Remus knowledge to take a look, to keep all
> > cases working correctly. I'll Cc Brendan. It'd be good to get this fixed for
> > a 4.0.1 in a few weeks.
> 
> I've done a bit of profiling of the restore code and observed the
> slowness here too. It looks to me like it's probably related to
> superpage changes. The big hit appears to be at the front of the
> restore process during calls to allocate_mfn_list, under the
> normal_page case. It looks like we're calling
> xc_domain_memory_populate_physmap once per page here, instead of
> batching the allocation? I haven't had time to investigate further
> today, but I think this is the culprit.

By the way, this only seems to matter on pvops -- restore is still
pretty quick on 2.6.18. I'm somewhat surprised that there'd be any
significant difference in allocating guest memory between the two
kernels (isn't this almost entirely Xen's responsibility?), but it
does explain why this wasn't noticed until recently.

> > 
> > 
> > > Since the Remus patches, xc_domain_restore waits until it gets an IO
> > > error, and also has a very short timeout which induces IO errors if
> > > nothing is received within it.  This is correct in the Remus case but
> > > wrong in the normal case.
> > > 
> > > The code should be changed so that xc_domain_restore
> > >  (a) takes an explicit parameter for the IO timeout, which
> > >      should default to something much longer than the 100ms or so of
> > >      the Remus case, and
> > >  (b) gets told whether
> > >     (i) it should return immediately after receiving the "tail"
> > >         which contains the CPU state; or
> > >     (ii) it should attempt to keep reading after receiving the "tail"
> > >         and only return when the connection fails.
> > > 
> > > In the case (b)(i), which should be the usual case, the behaviour
> > > should be that which we would get if changeset 20406:0f893b8f7c15 was
> > > reverted.  The offending code is mostly this, from 20406:
> > > 
> > > +    // DPRINTF("Buffered checkpoint\n");
> > > +
> > > +    if ( pagebuf_get(&pagebuf, io_fd, xc_handle, dom) ) {
> > > +        ERROR("error when buffering batch, finishing\n");
> > > +        goto finish;
> > > +    }
> > > +    memset(&tmptail, 0, sizeof(tmptail));
> > > +    if ( buffer_tail(&tmptail, io_fd, max_vcpu_id, vcpumap,
> > > +                     ext_vcpucontext) < 0 ) {
> > > +        ERROR ("error buffering image tail, finishing");
> > > +        goto finish;
> > > +    }
> > > +    tailbuf_free(&tailbuf);
> > > +    memcpy(&tailbuf, &tmptail, sizeof(tailbuf));
> > > +
> > > +    goto loadpages;
> > > +
> > > +  finish:
> > > 
> > > Ian.
> > > 
> > > _______________________________________________
> > > Xen-devel mailing list
> > > Xen-devel@xxxxxxxxxxxxxxxxxxx
> > > http://lists.xensource.com/xen-devel
> > 
> > 
> > 
> > 
> 
> 

