[Xen-devel] shadow OOS and fast path are incompatible

from [Frank van der Linden]

To:	"xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject:	[Xen-devel] shadow OOS and fast path are incompatible
From:	Frank van der Linden <Frank.Vanderlinden@xxxxxxx>
Date:	Thu, 02 Jul 2009 14:30:22 -0600

We recently observed a problem with Solaris HVM domains. The bug was seen with a higher number of VCPUs (3 or more), and always had the same pattern: some memory was allocated in the guest, but the first reference caused it to crash with a fatal page fault. However, on inspection of the page tables, the guests' view of the page tables was consistent: the page was present.


Disabling the out-of-sync optimization made this problem go away.

Eventually, I tracked it down to the fault fast path and the OOS code in sh_page_fault(). Here's what happens:

* CPU 0 has a page fault for a PTE in an OOS page that hasn't been synched yet

* CPU 1 has the same page fault (or at least one involving the same L1 page)
* CPU 1 enters the fast path
* CPU 0 finds the L1 page OOS and starts a resync
* CPU 1 finds it's a "special" entry (mmio or gnp)
* CPU 0 finishes resync, clears OOS flag for the L1 page
* CPU 1 finds it's not an OOS L1 page
* CPU 1 finds that the shadow L1 entry is GNP
* CPU 1 bounces fault to guest (sh_page_fault returns 0)
* guest sees an unexpected page fault
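
The interleaving above can be replayed deterministically in a small model. This is only a sketch, not Xen code: the struct, the enum and the function names are hypothetical, and it assumes that the fast path acts on a shadow entry value it read before the resync completed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model: one guest L1 entry and its shadow. */
enum shadow_entry { SH_GNP, SH_PRESENT };   /* GNP = guest not present */

struct l1_page {
    bool guest_present;         /* what the guest's page table says */
    enum shadow_entry shadow;   /* what the shadow page table says */
    bool oos;                   /* the L1 page is currently out of sync */
};

/* CPU 0, under the shadow lock: make the shadow match the guest again. */
static void resync(struct l1_page *p)
{
    p->shadow = p->guest_present ? SH_PRESENT : SH_GNP;
    p->oos = false;
}

/*
 * Replays the steps from the list in one thread, in the order given.
 * Returns true if CPU 1 bounces the fault to the guest although the
 * guest's own page table says the page is present.
 */
static bool replay_race(void)
{
    /* The guest has already written its PTE; the shadow is still stale. */
    struct l1_page p = { .guest_present = true, .shadow = SH_GNP, .oos = true };
    assert(p.oos && p.shadow == SH_GNP);

    /* CPU 1, fast path, no lock: reads the shadow entry, sees a special one. */
    enum shadow_entry seen_by_cpu1 = p.shadow;

    /* CPU 0: finds the L1 page OOS, resyncs it, clears the OOS flag. */
    if (p.oos)
        resync(&p);

    /* CPU 1: the OOS check now passes, so it trusts its stale snapshot. */
    bool bounced_to_guest = !p.oos && seen_by_cpu1 == SH_GNP;

    return bounced_to_guest && p.guest_present;
}
```

Calling replay_race() from a test harness returns true in this model: the OOS check passes only because CPU 0 has just cleared the flag, while the GNP value CPU 1 holds predates the resync.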

There are certainly ways to rearrange the code to avoid this particular scenario, but it points to a bigger issue: the fast fault path and OOS pages are inherently incompatible. Since the fast path works outside of the shadow lock, there is nothing that prevents another CPU coming in and changing the OOS status, re-syncing the page, etc., right under your nose.

Optimized operations without OOS (i.e. on a single L1 PTE) are safe in the fast path outside of the lock, since the guest will have the appropriate locking around the PTE writes. But with OOS, you're dealing with an entire L1 page.

I haven't checked the fast emulation path, but similar problems might be lurking there in combination with OOS.

I can think of some ways to fix this, but they involve locking, which mostly defeats the purpose of the fast fault path.


Ideas/suggestions?

- Frank

