On Friday, 11 May 2007 at 00:00, Daniel P. Berrange wrote:
> On Thu, May 10, 2007 at 03:13:10PM -0700, Brendan Cully wrote:
> > The posted patch was a fairly conservative approach (backward
> > compatible, equivalent to existing semantics). I've done some
> > more experimental work that reduces the time for the final round to
> > about 5ms. Here are the stats for 100 checkpoints:
> >
> > avg: 5.62 ms, min: 3.96, max: 13.70, median: 4.86
> >
> > It turns out the biggest remaining delay is (surprise!) xenstored. To
> > get the above numbers I unwired xenstored from VIRQ_DOM_EXC and let
> > xc_save bind to it.
> >
> > Obviously this isn't a practical approach. I'd love to hear any ideas
> > about the right way to avoid the xenstore penalty though. My current
> > thought is that it might be possible to arrange to register a dynamic
> > virq from xc_save into xen for a target domain, and then have xen fire
> > it on suspend instead of DOM_EXC (iff it's installed, otherwise use
> > the normal path).
>
> It would be interesting to know what aspect of the xenstore interaction
> is responsible for the slowdown. In particular, whether it is a fundamental
> architectural constraint, or whether it is merely due to the poor performance
> of the current impl. We already know from previous tests that the XenstoreD
> impl of transactions absolutely kills performance of various XenD operations
> due to the vast amount of unnecessary I/O it does.
>
> If fixing the XenstoreD transaction code were to help suspend performance
> too, it might be a better option than re-writing all code which touches
> xenstore. A quick test of putting /var/lib/xenstored on a ramdisk would
> be a way of testing whether it's the I/O that is hurting suspend time.

That's certainly part of it. If I rewrite xc_save to set up a watch on
@releaseDomain, then select on the xs handle (deferring actually
reading the watch until after the checkpoint), then I get the
following timings:

/var/lib/xenstored on ext3:
  avg: 29.41 ms, min: 27.65, max: 40.33, median: 29.30
/var/lib/xenstored on tmpfs:
  avg: 17.58 ms, min: 7.05, max: 43.88, median: 16.57

It's still awfully jittery, though, and significantly slower than letting
xc_save bind VIRQ_DOM_EXC directly (5.62 ms average). My guess is that the
watch mechanism is the problem. I haven't looked very closely at its
internals, but I wonder whether it delivers notifications synchronously
to the watcher list in order (which here would make xc_save wait until
xend has handled the watch).
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel