

Re: [Xen-devel] [RFC] use event channel to improve suspend speed

To: "Daniel P. Berrange" <berrange@xxxxxxxxxx>
Subject: Re: [Xen-devel] [RFC] use event channel to improve suspend speed
From: Brendan Cully <brendan@xxxxxxxxx>
Date: Thu, 10 May 2007 17:06:41 -0700
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date: Thu, 10 May 2007 17:05:10 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <20070510230005.GA17705@xxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Mail-followup-to: berrange@xxxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxx
References: <20070509000110.GI19767@xxxxxxxxxxxxxxxxx> <20070510221310.GD9138@xxxxxxxxxxxxxxxxx> <20070510230005.GA17705@xxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.5.15 (2007-05-02)
On Friday, 11 May 2007 at 00:00, Daniel P. Berrange wrote:
> On Thu, May 10, 2007 at 03:13:10PM -0700, Brendan Cully wrote:
> > The posted patch was a fairly conservative approach (backward
> > compatible, equivalent to existing semantics). I've done some
> > more experimental work that reduces the time for the final round to
> > about 5ms. Here are the stats for 100 checkpoints:
> > 
> > avg: 5.62 ms, min: 3.96, max: 13.70, median: 4.86
> > 
> > It turns out the biggest remaining delay is (surprise!) xenstored. To
> > get the above numbers I unwired xenstored from VIRQ_DOM_EXC and let
> > xc_save bind to it.
> > 
> > Obviously this isn't a practical approach. I'd love to hear any ideas
> > about the right way to avoid the xenstore penalty though. My current
> > thought is that it might be possible to arrange to register a dynamic
> > virq from xc_save into xen for a target domain, and then have xen fire
> > it on suspend instead of DOM_EXC (iff it's installed, otherwise use
> > the normal path).
> It would be interesting to know what aspect of the xenstore interaction
> is responsible for the slowdown. In particular, whether it is a fundamental
> architectural constraint, or whether it is merely due to the poor performance
> of the current implementation. We already know from previous tests that
> the XenD implementation of transactions absolutely kills the performance
> of various XenD operations due to the vast amount of unnecessary I/O it
> does.
> If fixing the XenstoreD transaction code were to help suspend performance
> too, it might be a better option than rewriting all the code that touches
> xenstore. A quick test of putting /var/lib/xenstored on a ramdisk would
> be a way of testing whether it's the I/O that is hurting suspend time.

That's certainly part of it. If I rewrite xc_save to set up a watch on
@releaseDomain and then select on the xs handle (deferring actually
reading the watch until after the checkpoint), I get the following
timings:

/var/lib/xenstored on ext3:
avg: 29.41 ms, min: 27.65, max: 40.33, median: 29.30
on tmpfs:
avg: 17.58 ms, min: 7.05, max: 43.88, median: 16.57
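
For concreteness, a minimal sketch of that setup, assuming the Xen 3.x
libxenstore API from xs.h (error handling omitted; checkpoint() is a
hypothetical stand-in for the actual save logic in xc_save):

#include <stdlib.h>
#include <sys/select.h>
#include <xs.h>

/* Hypothetical stand-in for the real save logic in xc_save. */
static void checkpoint(void) { /* write out the domain's memory */ }

int main(void)
{
    struct xs_handle *xsh = xs_daemon_open();
    int fd = xs_fileno(xsh);
    fd_set rfds;
    unsigned int num;
    char **vec;

    /* @releaseDomain fires when a domain is suspended or dies. */
    xs_watch(xsh, "@releaseDomain", "suspend-token");

    /* ... request the suspend, then wait for the event ... */
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    select(fd + 1, &rfds, NULL, NULL, NULL);

    /* Checkpoint first; only read (and thereby acknowledge) the
     * watch afterwards, so the save path isn't serialized behind
     * xenstored. */
    checkpoint();
    vec = xs_read_watch(xsh, &num);
    free(vec);

    xs_daemon_close(xsh);
    return 0;
}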

It's still awfully jittery though, and significantly slower than the
direct event-channel binding. I'd guess that the watch mechanism is
the problem. I haven't looked very closely at its internals, but I
wonder if it's just delivering synchronous notifications to the
watcher list in order (in this case, making xc_save wait until xend
has handled the watch).
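
For comparison, the event-channel path from the start of the thread
(xenstored unwired from VIRQ_DOM_EXC so that xc_save can bind it
directly) would look something like the sketch below. This assumes the
Xen 3.x libxc event-channel calls; the exact signatures have varied
between releases, and as noted above this only works while xenstored
isn't also claiming the virq:

#include <xenctrl.h>

int main(void)
{
    /* Opens /dev/xen/evtchn; Xen 3.x returns a plain int handle. */
    int xce = xc_evtchn_open();
    evtchn_port_t virq_port, port;

    /* Bind VIRQ_DOM_EXC, which is normally claimed by xenstored. */
    virq_port = xc_evtchn_bind_virq(xce, VIRQ_DOM_EXC);

    /* ... request the suspend, then block until the virq fires ... */
    port = xc_evtchn_pending(xce);
    if (port == virq_port)
        xc_evtchn_unmask(xce, port);

    /* checkpoint the domain here, with no xenstored round trip */

    xc_evtchn_close(xce);
    return 0;
}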
