This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


Re: [Xen-devel] [RFC] use event channel to improve suspend speed

To: Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>
Subject: Re: [Xen-devel] [RFC] use event channel to improve suspend speed
From: Brendan Cully <brendan@xxxxxxxxx>
Date: Thu, 24 May 2007 17:06:11 -0700
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, "Daniel P. Berrange" <berrange@xxxxxxxxxx>
Delivery-date: Thu, 24 May 2007 17:04:34 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <C269D201.72B1%Keir.Fraser@xxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Mail-followup-to: Keir.Fraser@xxxxxxxxxxxx, berrange@xxxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxx
References: <20070510230005.GA17705@xxxxxxxxxx> <C269D201.72B1%Keir.Fraser@xxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.5.15 (2007-05-02)
On Friday, 11 May 2007 at 07:55, Keir Fraser wrote:
> On 11/5/07 00:00, "Daniel P. Berrange" <berrange@xxxxxxxxxx> wrote:
> > It would be interesting to know what aspect of the xenstore interaction
> > is responsible for the slowdown. In particular, whether it is a fundamental
> > architectural constraint, or whether it is merely due to the poor 
> > performance
> > of the current impl. We already know from previous tests that XenD impl of
> > transactions absolutely kills performance of various XenD operations due to
> > the vast amount of unnecessary I/O it does.
> > 
> > If fixing the XenstoreD transaction code were to help suspend performance
> > too, it might be a better option than re-writing all code which touches
> > xenstore. A quick test of putting /var/lib/xenstored on a ramdisk would
> > be a way of testing whether it's the I/O which is hurting suspend time.
> Yes. We could go either way -- it wouldn't be too bad to add support via
> dynamic VIRQ_DOM_EXC for example, or add other things to get xenstore off
> the critical path for save/restore. But if the problem is that xenstored
> sucks it probably is worth investing a bit of time to tackle the problem
> directly and see where the time is going. We could end up with optimisations
> which have benefits beyond just save/restore.

I'm sure xenstore could be made significantly faster, but barring a
redesign, maybe it's better just to use it for low-frequency
transactions with pretty loose latency expectations? Running the
suspend notification through xenstore, then to xend, and finally back
to xc_save (as the current code does) seems convoluted, and bound to
create opportunities for bad scheduling compared to notifying the
waiting process directly over an event channel.
In case there's interest, I'll attach the two patches I'm using to
speed up checkpointing (and reduce live migration downtime). As I
mentioned earlier, the first patch should be semantically equivalent
to the existing code, and cuts downtime to about 30-35 ms. The second
asynchronously notifies xend that the domain has been suspended, so
that the final round of memory copying can begin before stage 2 of
device migration. This is a semantic change, but I can't think of a
concrete drawback. It's a little rough-and-ready -- suggestions for
improvement are welcome.

Here are some stats on final round time (100 runs):

xen 3.1:
  avg: 93.40 ms, min: 72.59 ms, max: 432.46 ms, median: 85.10 ms
patch 1 (trigger suspend via event channel):
  avg: 43.69 ms, min: 35.21 ms, max: 409.50 ms, median: 37.21 ms
patch 1, /var/lib/xenstored on tmpfs:
  avg: 33.88 ms, min: 27.01 ms, max: 369.21 ms, median: 28.34 ms
patch 2 (receive suspended notification via event channel):
  avg: 4.95 ms, min: 3.46 ms, max: 14.73 ms, median: 4.63 ms

Attachment: suspend-evtchn.patch
Description: Text Data

Attachment: subscribe-suspend.patch
Description: Text Data

Xen-devel mailing list