Hi,
I've been doing a little work on improving the latency of guest domain
suspends. I've added a couple of printfs into xc_domain_save around
the last round, and hooked up a harness to loop over the last round
code every couple of seconds. Here are some numbers for a run of 100
last rounds (from just before the suspend callback to just before it
would exit), on a 3.2 GHz P4 with 1 GB of RAM, 128 MB of which goes to
a guest. This approximates the best-case downtime for live migration,
I think.
current code:
avg: 133.57 ms, min: 82.53, max: 559.86, median: 135.63
with the attached patch:
avg: 36.05 ms, min: 33.99, max: 52.14, median: 35.51
The patch creates an event channel in the guest that fires the suspend
code. xc_save can use this to suspend the domain directly, instead of
calling back to xend, which writes a xenstore entry, which in turn
fires a watch in the guest. It seems the xenstore interaction is
fairly slow and very jittery.
This isn't intended for 3.1, but I thought I'd put it out just in case
anyone else finds it interesting. I'd appreciate comments about the
approach.
There's also a fair amount of latency involved in xend receiving the
notification that the domain has suspended and passing it back to
xc_save. A quick hack to let xc_save simply loop on xc_domain_getinfo
until the domain suspends indicates that it should be fairly easy to
cut the suspend latency in half again, to about 15ms. I'll see about
finding a clean equivalent of this...
Comments?
suspend-via-evtchn.patch
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel