
Re: [Xen-devel] Improving domU restore time

To: Rafal Wojtczuk <rafal@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [Xen-devel] Improving domU restore time
From: Joanna Rutkowska <joanna@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 25 May 2010 12:58:28 +0200
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20100525103557.GC23903@xxxxxxxxxxxxxxxxxxx>
References: <20100525103557.GC23903@xxxxxxxxxxxxxxxxxxx>
A bit of background to Rafal's post -- we plan to implement a feature
that we call "Disposable VMs" in Qubes, which would essentially allow
for super-fast creation of small, single-purpose VMs (DomUs), e.g. just
for opening a PDF or a Word document. The point is: the creation &
resume of such a VM must be really fast, i.e. well below 1s.

And this seems possible, especially if we use sparse files for storing
the VMs' save-images (the VMs we're talking about here would have
around 100-150MB of actual data recorded in a sparse savefile). But, as
Rafal pointed out, some of the operations Xen performs seem to be
implemented inefficiently, and we wanted to get your opinion before we
start optimizing them (i.e. the xc_restore and /etc/xen/scripts/block
optimizations that Rafal mentions below).
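
To make the sparse-savefile point concrete, here is a minimal sketch of
the write side (the save_page() helper is hypothetical, not Qubes or Xen
code): all-zero pages are skipped over with lseek(), so the filesystem
stores holes and the savefile occupies only the actual data.

#include <unistd.h>

#define PAGE_SIZE 4096

/* Hypothetical helper: write one page of the save-image, leaving a hole
 * for an all-zero page.  The file's apparent size grows by PAGE_SIZE
 * either way, but the filesystem allocates no blocks for the hole.  (A
 * final ftruncate() is needed so that a trailing hole extends the file.) */
static int save_page(int fd, const char *page, int is_zero)
{
    if (is_zero)
        return lseek(fd, PAGE_SIZE, SEEK_CUR) < 0 ? -1 : 0;
    return write(fd, page, PAGE_SIZE) == PAGE_SIZE ? 0 : -1;
}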

Thanks,
j.

On 05/25/2010 12:35 PM, Rafal Wojtczuk wrote:
> Hello,
> I would be grateful for comments on possible methods to improve domain
> restore performance. I am focusing on the PV case, if it matters.
> 1) xen-4.0.0
> I see a problem similar to the one reported in the thread at
> http://lists.xensource.com/archives/html/xen-devel/2010-05/msg00677.html
> 
> Dom0 is 2.6.32.9-7.pvops0 x86_64, xen-4.0.0 x86_64. 
> [user@qubes ~]$ xm create /dev/null
>       kernel=/boot/vmlinuz-2.6.32.9-7.pvops0.qubes.x86_64 
>       root=/dev/mapper/dmroot extra="rootdelay=1000" memory=400
> ...wait a second...
> [user@qubes ~]$ xm save null nullsave
> [user@qubes ~]$ time cat nullsave >/dev/null
> ...
> [user@qubes ~]$ time cat nullsave >/dev/null
> ...
> [user@qubes ~]$ time cat nullsave >/dev/null
> real    0m0.173s
> user    0m0.010s
> sys     0m0.164s
> /* sits nicely in the cache, let's restore... */
> [user@qubes ~]$ time xm restore nullsave
> real    0m9.189s
> user    0m0.151s
> sys     0m0.039s
> 
> According to systemtap, xc_restore uses 3.812s of CPU time; besides that
> being a lot, what uses the remaining 6s? Just as reported previously, there
> are some errors in xend.log:
> 
> [2010-05-25 10:49:02 2392] DEBUG (XendCheckpoint:286) restore:shadow=0x0,
> _static_max=0x19000000, _static_min=0x0, 
> [2010-05-25 10:49:02 2392] DEBUG (XendCheckpoint:305) [xc_restore]:
> /usr/lib64/xen/bin/xc_restore 39 3 1 2 0 0 0 0
> [2010-05-25 10:49:02 2392] INFO (XendCheckpoint:423) xc_domain_restore
> start: p2m_size = 19000
> [2010-05-25 10:49:02 2392] INFO (XendCheckpoint:423) Reloading memory pages:
> 0%
> [2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) ERROR Internal error:
> Error when reading batch size
> [2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) ERROR Internal error:
> error when buffering batch, finishing
> [2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) 100%
> [2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) Memory reloaded (0
> pages)
> [2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) read VCPU 0
> [2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) Completed checkpoint
> load
> [2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) Domain ready to be
> built.
> [2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) Restore exit with rc=0
> 
> Note, xc_restore on xen-3.4.3 works much faster (and with no warnings in the
> log), with the same dom0 pvops kernel.
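> 
> For context, my reading of the xen-4.0.0 code (an assumption on my part,
> so please correct me): the new restore path buffers checkpoint batches
> Remus-style, and hitting EOF while reading the next batch's size is how a
> plain, one-shot save image terminates - which would explain why the
> "errors" above still end with rc=0. Schematically (paraphrased, not the
> actual libxc source; read_full() stands in for libxc's read_exact()):
> 
> #include <stdint.h>
> #include <unistd.h>
> 
> /* read exactly len bytes, or fail; stand-in for libxc's read_exact() */
> static int read_full(int fd, void *buf, size_t len)
> {
>     char *p = buf;
>     while (len) {
>         ssize_t n = read(fd, p, len);
>         if (n <= 0)
>             return -1;
>         p += n;
>         len -= n;
>     }
>     return 0;
> }
> 
> /* consume batches until the end-of-pass marker or EOF */
> static int consume_batches(int io_fd)
> {
>     for (;;) {
>         int32_t count;
>         if (read_full(io_fd, &count, sizeof(count)))
>             return -1;  /* EOF here -> "Error when reading batch size" */
>         if (count == 0)
>             return 0;   /* end-of-pass marker */
>         /* ... read `count` pfns plus `count` pages of data here ... */
>     }
> }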
> 
> Ok, so there is some issue here. Some more generic thoughts below.
> 
> 2) xen-3.4.3
> Firstly, /etc/xen/scripts/block in xen-3.4.3 does something like
>       for i in /dev/loop* ; do
>               losetup $i
>       done
> i.e. it spawns one losetup process per existing /dev/loopX; this hogs the
> CPU, especially if your system comes with max_loop=255 :). So, let's
> replace it with the xen-4.0.0 version, where this problem is fixed (it
> uses losetup -a, hurray).
> Then, the restore time for a 400MB domain, with the restore file in the
> cache, with 4 vbds backed by /dev/loopX, and with one vif, is ca. 2.7s of
> real time. According to systemtap, the CPU time breakdown is:
> xend threads - 0.363s
> udevd (in dom0) - 0.007s
> /etc/xen/scripts/block and its children - 1.075s
> xc_restore - 1.368s
> /etc/xen/scripts/vif-bridge (in netvm) - 0.130s
> 
> The obvious way to improve the /etc/xen/scripts/block script's execution
> time is to recode it in some other language that will not spawn hundreds
> of processes to do its job; a sketch of that idea follows.
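> 
> As a proof of concept of how cheap this is in-process, a minimal C sketch
> of the losetup part (assuming the script mainly needs to know which loop
> devices are bound, and to what; LOOP_GET_STATUS64 is the ioctl behind
> losetup):
> 
> #include <fcntl.h>
> #include <stdio.h>
> #include <unistd.h>
> #include <sys/ioctl.h>
> #include <linux/loop.h>
> 
> /* list every /dev/loopN and its backing file from a single process,
>  * instead of forking one losetup per device */
> int main(void)
> {
>     char dev[32];
>     for (int i = 0; ; i++) {
>         struct loop_info64 info;
>         snprintf(dev, sizeof(dev), "/dev/loop%d", i);
>         int fd = open(dev, O_RDONLY);
>         if (fd < 0)
>             break;                      /* no more loop devices */
>         if (ioctl(fd, LOOP_GET_STATUS64, &info) < 0)
>             printf("%s: free\n", dev);  /* ENXIO: nothing bound */
>         else
>             printf("%s: %s\n", dev, (char *)info.lo_file_name);
>         close(fd);
>     }
>     return 0;
> }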
> 
> Now, xc_restore.
> a) Is it correct that when xc_restore runs, the target domain's memory is
> already zeroed (because the hypervisor scrubs free memory before it is
> assigned to a new domain)? If so, xc_save could check whether a given page
> contains only zeroes and, if so, omit it from the savefile. This could
> result in quite significant savings when:
> - we save a freshly booted domain, or can zero out free memory in the
>   domain before saving;
> - we plan to restore multiple times from the same savefile (yes, the vbds
>   must be restored in this case too).
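> 
> A sketch of the check itself (a hypothetical helper, not existing xc_save
> code; scanning as 64-bit words keeps the common non-zero case cheap):
> 
> #include <stdint.h>
> #include <stddef.h>
> 
> #define PAGE_SIZE 4096
> 
> /* return 1 if the 4k page contains only zero bytes */
> static int page_is_zero(const void *page)
> {
>     const uint64_t *p = page;
>     for (size_t i = 0; i < PAGE_SIZE / sizeof(*p); i++)
>         if (p[i])
>             return 0;
>     return 1;
> }
> 
> The save loop would then simply skip any page for which page_is_zero()
> returns 1, and the restore side would rely on the freshly scrubbed
> memory already being zero.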
> 
> b) xc_restore in xen-3.4.3 reads data from the savefile in 4k portions -
> so, one read syscall per page. Make it read in larger chunks (sketched
> below). It looks like this is fixed in xen-4.0.0 - is that correct?
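> 
> The fix is mechanical - something like this (a sketch, not the actual
> xen-4.0.0 code; the batch size is arbitrary):
> 
> #include <stddef.h>
> #include <unistd.h>
> 
> #define PAGE_SIZE   4096
> #define BATCH_PAGES 1024   /* 4MB per read() instead of 4k */
> 
> /* read nr_pages pages into buf with as few syscalls as possible;
>  * callers would ask for BATCH_PAGES at a time */
> static int read_pages(int fd, char *buf, int nr_pages)
> {
>     size_t want = (size_t)nr_pages * PAGE_SIZE, got = 0;
>     while (got < want) {
>         ssize_t n = read(fd, buf + got, want - got);
>         if (n <= 0)
>             return -1;     /* EOF or error mid-batch */
>         got += (size_t)n;
>     }
>     return 0;
> }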
> 
> Also, it looks really excessive that basically copying 400MB of memory
> takes over 1.3s of CPU time. Is IOCTL_PRIVCMD_MMAPBATCH the culprit (its
> dom0 kernel code? Xen mm code? hypercall overhead?), or is it something
> else?
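> 
> One way to answer that empirically (a hypothetical micro-benchmark;
> xc_map_foreign_batch() is the libxc wrapper around
> IOCTL_PRIVCMD_MMAPBATCH, here with the 4.0-era int-handle API):
> 
> #include <stdint.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/mman.h>
> #include <sys/time.h>
> #include <xenctrl.h>
> 
> #define NR 1024   /* frames per batch; arbitrary */
> 
> int main(int argc, char **argv)
> {
>     xen_pfn_t pfns[NR];
>     struct timeval t0, t1;
> 
>     if (argc < 2)
>         return 1;                     /* usage: ./mapbench <domid> */
>     uint32_t domid = atoi(argv[1]);
>     int xc = xc_interface_open();
> 
>     for (int i = 0; i < NR; i++)
>         pfns[i] = i;                  /* map the domain's first NR frames */
> 
>     gettimeofday(&t0, NULL);
>     void *m = xc_map_foreign_batch(xc, domid, PROT_READ, pfns, NR);
>     gettimeofday(&t1, NULL);
>     if (!m)
>         return 1;
> 
>     printf("mapping %d frames: %ld us\n", NR,
>            (t1.tv_sec - t0.tv_sec) * 1000000L + (t1.tv_usec - t0.tv_usec));
>     munmap(m, (size_t)NR * 4096);
>     xc_interface_close(xc);
>     return 0;
> }
> 
> Comparing that figure against a plain memcpy() of the same amount of
> memory would show how much of the 1.3s is mapping overhead rather than
> copying.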
> I am aware that in the usual case xc_restore is not the bottleneck
> (reading the savefile from disk or the network is), but in cases where we
> can fetch the savefile quickly, it matters.
> 
> Is the 3.4.3 branch still being developed, or is it in pure maintenance
> mode, so that new code should be prepared for 4.0.0?
> 
> Regards,
> Rafal Wojtczuk
> Principal Researcher
> Invisible Things Lab, Qubes-os project
> 


