xen-devel

Re: [Xen-devel] Poor HVM performance with 8 vcpus

To: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Subject: Re: [Xen-devel] Poor HVM performance with 8 vcpus
From: Gianluca Guida <gianluca.guida@xxxxxxxxxxxxx>
Date: Wed, 14 Oct 2009 12:16:59 +0200
Cc: Tim Deegan <Tim.Deegan@xxxxxxxxxxxxx>, Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Wed, 14 Oct 2009 03:19:34 -0700
In-reply-to: <C6FB4BF5.1764F%keir.fraser@xxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <4AD588D9.4040104@xxxxxxxxxxxxxx> <C6FB4BF5.1764F%keir.fraser@xxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Ah, those good old OOS talks. I fear I am going to fail on my attempt
to be laconic.

On Wed, Oct 14, 2009 at 10:35 AM, Keir Fraser <keir.fraser@xxxxxxxxxxxxx> wrote:
> On 14/10/2009 09:16, "Juergen Gross" <juergen.gross@xxxxxxxxxxxxxx> wrote:
>
>> as the performance of BS2000 seems to be hit by OOS optimization, I'm
>> thinking of making a patch to disable this feature by a domain parameter.
>>
>> Is there a way to do this without having to change all places where the
>> #if statements are placed?
>> I think there should be some central routines where adding an "if" could
>> be enough (setting oos_active to 0 seems not to be enough, I fear).
>>
>> Do you have any hint?
>
> How about disabling it for domains with more than four VCPUs? Have you
> measured performance with OOS for 1-4 VCPU guests? This is perhaps not
> something that needs to be baked into guest configs.

In general, shadow code loses performance as the number of vcpus
increases (>= 4) because of the single shadow lock (and getting rid
of the shadow lock, i.e. having per-vcpu shadows, wouldn't help,
since it would make the most common operation, removing writable
access to guest pages, much slower).
But the two algorithms (always in-sync vs. OOS) show their
performance penalties in different areas: in a scenario where guests
do lots of PTE writes (read: Windows in most of its operations) the
in-sync approach is more penalizing, because emulation is slow and
needs the shadow lock, while scenarios where guests tend to have
many dirty CR3 switches (that is, CR3 switches with freshly written
PTEs, as in Juergen's benchmark and the famous Windows parallel DDK
build) are penalized more by the OOS algorithm.

Disabling OOS for domains with more than 4 vcpus might be a good
idea, but not necessarily optimal. Furthermore, I have always
understood that good practice for VM performance is to have many
small VMs instead of one VM eating all of the host's CPUs, at least
when shadow code is in use. For big VMs, EPT/NPT has always been the
better approach, since even with lots of TLB misses the system is
essentially lock-free for most of the VM's life.
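
To make that suggestion concrete, a rough sketch of the kind of
central predicate Juergen was asking about might look like the
following. The names are only illustrative (I haven't checked that a
single choke point like this exists today), and as Juergen notes,
just forcing oos_active to 0 in the current code may not be enough
on its own:

    /* Sketch only -- not actual Xen code.  A single predicate that the
     * existing SHOPT_OUT_OF_SYNC call sites could consult, so the
     * policy lives in one place instead of in every #if block. */
    static inline int shadow_domain_wants_oos(const struct domain *d)
    {
        /* Keir's heuristic: only keep pages out of sync for small
         * guests, where the shadow lock is not yet the dominant cost. */
        return d->arch.paging.shadow.oos_active && (d->max_vcpus <= 4);
    }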

Creating a per-domain switch would be a good idea, but a more
generic (and correct) approach would be a dynamic policy for OOSing
pages, in which we stop putting pages out of sync when we realize
that we are resyncing too many pages on CR3 switches. This was taken
into consideration during the development of the OOS code, but it
was eventually discarded because performance was decent and big VMs
were not in the target range.
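
Such a dynamic policy could be as simple as a per-domain estimate of
how much resync work each CR3 switch causes, turning the unsync path
off when the average gets too high. A very rough sketch, with
made-up field names and an arbitrary threshold, just to illustrate
the idea (again assuming the unsync sites actually honour oos_active
at runtime):

    /* Sketch only -- oos_resync_avg and the threshold are invented
     * here, not fields in the current tree. */
    #define OOS_RESYNC_LIMIT 32  /* hypothetical tuning knob */

    static void shadow_oos_note_resync(struct domain *d, unsigned int n)
    {
        struct shadow_domain *sd = &d->arch.paging.shadow;

        /* Crude moving average of pages resynced per CR3 switch. */
        sd->oos_resync_avg = (3 * sd->oos_resync_avg + n) / 4;

        /* Too many freshly written PTEs per switch: stop unsyncing new
         * pages; pages already OOS drain out on later resyncs. */
        sd->oos_active = (sd->oos_resync_avg < OOS_RESYNC_LIMIT);
    }

The idea would be to call something like this with the number of
pages that had to be brought back in sync on each guest CR3 write.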

Yes, definitely away from spartan wit. But I hope this clarifies the issue.

Thanks,
Gianluca


-- 
It was a type of people I did not know, I found them very strange and
they did not inspire confidence at all. Later I learned that I had been
introduced to electronic engineers.
                                                  E. W. Dijkstra

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel