

Re: [Xen-devel] shadow OOS and fast path are incompatible

To: Frank van der Linden <Frank.Vanderlinden@xxxxxxx>
Subject: Re: [Xen-devel] shadow OOS and fast path are incompatible
From: Gianluca Guida <gianluca.guida@xxxxxxxxxxxxx>
Date: Thu, 2 Jul 2009 23:42:57 +0200
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>

On Thu, Jul 2, 2009 at 10:30 PM, Frank van der Linden
<Frank.Vanderlinden@xxxxxxx> wrote:
> We recently observed a problem with Solaris HVM domains. The bug was seen
> with a higher number of VCPUs (3 or more), and always had the same
> pattern: some memory was allocated in the guest, but the first reference
> caused it to crash with a fatal pagefault. However, on inspection of the
> page tables, the guest's view of the pagetables was consistent: the page was
> present.
> Disabling the out-of-sync optimization made this problem go away.
> Eventually, I tracked it down to the fault fast path and the OOS code in
> sh_page_fault(). Here's what happens:
> * CPU 0 has a page fault for a PTE in an OOS page that hasn't been synched
> yet
> * CPU 1 has the same page fault (or at least one involving the same L1 page)
> * CPU 1 enters the fast path
> * CPU 0 finds the L1 page OOS and starts a resync

CPU0 doesn't resync a whole L1 page just because it's accessing it. There
are other reasons for a resync here (especially if the guest is 64-bit),
but most probably the resync happens because CPU0 is unsyncing
another page. Anyway, yes: it's highly unlikely, but this race can
definitely happen. I don't think I ever saw it.

> * CPU 1 finds it's a "special" entry (mmio or gnp)
> * CPU 0 finishes resync, clears OOS flag for the L1 page
> * CPU 1 finds it's not an OOS L1 page
> * CPU 1 finds that the shadow L1 entry is GNP
> * CPU 1 bounces fault to guest (sh_page_fault returns 0)
> * guest sees an unexpected page fault
> There are certainly ways to rearrange the code to avoid this particular
> scenario, but it points to a bigger issue: the fast fault path and OOS pages
> are inherently incompatible. Since the fast path works outside of the shadow
> lock, there is nothing that prevents another CPU coming in and changing the
> OOS status, re-syncing the page, etc, right under your nose.
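The interleaving described above can be replayed in a small single-threaded model. This is only a sketch, not Xen code: the struct, the enum and every function name below are hypothetical, and the two "CPUs" are simply function calls made in the racy order. It assumes the fast path reads the shadow entry first and checks the OOS flag afterwards, both without the shadow lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in Xen. */
enum shadow_l1e { SHADOW_GNP, SHADOW_PRESENT };

struct l1_page {
    bool guest_present;      /* what the guest wrote into its own PTE */
    enum shadow_l1e shadow;  /* what the shadow L1 entry currently says */
    bool oos;                /* page is out of sync with the guest */
};

/* CPU 0: copy the guest's view into the shadow and clear the OOS flag. */
static void resync(struct l1_page *pg)
{
    assert(pg->oos);         /* in this model only OOS pages are resynced */
    pg->shadow = pg->guest_present ? SHADOW_PRESENT : SHADOW_GNP;
    pg->oos = false;
}

/* CPU 1, lock-free step 1: snapshot the shadow entry. */
static enum shadow_l1e fast_path_read(const struct l1_page *pg)
{
    return pg->shadow;
}

/* CPU 1, lock-free step 2: act on the snapshot.
 * Returns true if the fault is bounced to the guest. */
static bool fast_path_decide(const struct l1_page *pg, enum shadow_l1e snap)
{
    if (snap != SHADOW_GNP)
        return false;        /* not a special entry: take the slow path */
    if (pg->oos)
        return false;        /* OOS page: take the slow path */
    return true;             /* looks like "guest not present": bounce */
}

/* Racy order: the resync lands between CPU 1's two steps.
 * Returns true if the guest gets a fault for a page it mapped. */
bool replay_race(void)
{
    struct l1_page pg = { .guest_present = true, .shadow = SHADOW_GNP, .oos = true };
    enum shadow_l1e snap = fast_path_read(&pg);   /* CPU 1 sees stale GNP */
    resync(&pg);                                  /* CPU 0 clears OOS */
    bool bounced = fast_path_decide(&pg, snap);   /* CPU 1 sees "not OOS" */
    return bounced && pg.guest_present;
}

/* Serialized order: the resync completes before CPU 1 reads anything. */
bool replay_serial(void)
{
    struct l1_page pg = { .guest_present = true, .shadow = SHADOW_GNP, .oos = true };
    resync(&pg);
    enum shadow_l1e snap = fast_path_read(&pg);
    bool bounced = fast_path_decide(&pg, snap);
    return bounced && pg.guest_present;
}
```

In this model replay_race() reports the spurious fault and replay_serial() does not, which is the point of the report: each check is fine on its own, but nothing ties the stale entry read to the later OOS check.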

You're right about this. The gnp fast path has always made me
doubtful, and I already intended to remove it because it's also useless as
an optimization with OOS (it can actually slow things down in a classical
demand-paging scheme), but I never came to see what you just pointed out.

> I haven't checked the fast emulation path, but similar problems might be
> lurking there in combination with OOS.

I think it should be safe enough, but yes, another look at it
would be worthwhile.

> I can think of some ways to fix this, but they involve locking, which mostly
> defeats the purpose of the fast fault path.
> Ideas/suggestions?

Remove the fast GNP path and leave only fast mmio (it's safe and useful
for performance). I have this trivial patch somewhere, I'll post it

Thanks for tracking this down!
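As a rough illustration of the fix suggested above (drop the fast GNP path, keep fast mmio), the dispatch decision could be modeled like this. It is a sketch under assumptions, not the actual patch: the enum and the function name are hypothetical, and "slow path" here just stands for whatever runs under the shadow lock.

```c
#include <stdbool.h>

/* Hypothetical classification of a shadow L1 entry; not Xen's types. */
enum sl1e_kind { SL1E_NORMAL, SL1E_MAGIC_GNP, SL1E_MAGIC_MMIO };

/* Returns true if the fault may be handled without taking the shadow lock.
 * With the fast GNP path removed, only mmio entries qualify; a GNP entry
 * is handled like any other fault, on the locked slow path. */
bool fault_may_skip_lock(enum sl1e_kind entry)
{
    return entry == SL1E_MAGIC_MMIO;
}
```

With this shape, a GNP entry in a page that is being resynced is only ever examined under the lock, so the stale-read interleaving from the report cannot bounce a fault to the guest.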

It was a type of people I did not know, I found them very strange and
they did not inspire confidence at all. Later I learned that I had been
introduced to electronic engineers.
                                                  E. W. Dijkstra
