This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


[Xen-devel] Race between ept_get_entry / ept_set_entry

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] Race between ept_get_entry / ept_set_entry
From: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
Date: Thu, 26 Aug 2010 11:35:58 +0100
Cc: "Li, Xin" <xin.li@xxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>, Tim Deegan <Tim.Deegan@xxxxxxxxxx>
In the course of doing some fixes for my populate-on-demand testing, I
found that a Windows Server 2008 VM with 30G static max and 24G of RAM
(i.e., booting ballooned) crashed 1-2 times out of ten during boot,
reporting MMIO errors.

I managed to get a trace of this crash.  Strangely enough, the trace
indicated that the page the NPF occurred on was populate-on-demand --
but that hvm_hap_nested_page_fault() injected a GP anyway.

The only way this would be possible is if the gfn_to_mfn_query() in
the trace function got a p2m type of p2m_populate_on_demand, but the
gfn_to_mfn_current() in hvm_hap_nested_page_fault() got a p2m type of

Looking at the trace (snippet attached), the failed NPF happened on
d1v1; but almost simultaneously on d1v0, an NPF fault happened that
triggered a populate-on-demand (PoD) demand populate.  That demand populate
happened to be of a superpage that was shared with the gpa fault on

So, the first query on d1v1 (correctly) got a PoD; but the second
query, instead of either causing the demand-populate, or successfully
getting the result of d1v0's demand populate, returned failure,
causing the guest to crash.

I looked in the p2m-ept.c code, and noticed (once again) that
ept_get_entry() can be called without the p2m lock held.  I added
conditional locks, and am running the test again. The guest has now
booted 20 times successfully without crashing (whereas before, the
average was about 2 in 10 crashing).
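
As a rough illustration of what "conditional locks" could look like, here
is a minimal sketch in standalone C11.  It is not the actual change: every
name in it (sketch_lock, sketch_lock_if_needed(), sketch_lookup(), the
integer cpu ids) is hypothetical, and it assumes that "conditional" means
"take the lock only if this CPU does not already hold it".

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical owner-tracking spinlock; not the real Xen p2m lock. */
#define SKETCH_NO_OWNER (-1)

struct sketch_lock {
    atomic_int owner;              /* cpu id of the holder, or SKETCH_NO_OWNER */
};

static bool sketch_locked_by_me(struct sketch_lock *l, int cpu)
{
    return atomic_load(&l->owner) == cpu;
}

static void sketch_lock_acquire(struct sketch_lock *l, int cpu)
{
    int expected = SKETCH_NO_OWNER;

    /* Spin until the owner field goes from "nobody" to this cpu. */
    while (!atomic_compare_exchange_weak(&l->owner, &expected, cpu))
        expected = SKETCH_NO_OWNER;
}

static void sketch_lock_release(struct sketch_lock *l)
{
    atomic_store(&l->owner, SKETCH_NO_OWNER);
}

/*
 * Take the lock only if this cpu does not hold it already.
 * Returns true if the lock was taken here, so the caller knows
 * whether it is responsible for releasing it.
 */
static bool sketch_lock_if_needed(struct sketch_lock *l, int cpu)
{
    if (sketch_locked_by_me(l, cpu))
        return false;
    sketch_lock_acquire(l, cpu);
    return true;
}

/* A lookup that is safe to call with or without the lock held. */
static int sketch_lookup(struct sketch_lock *l, int cpu, const int *entry)
{
    bool took_lock = sketch_lock_if_needed(l, cpu);
    int value = *entry;            /* stands in for the table walk */

    if (took_lock)
        sketch_lock_release(l);
    return value;
}
```

The point of returning a flag from sketch_lock_if_needed() is that callers
which already hold the lock (the set path, in this model) neither deadlock
nor drop the lock early.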

Looking closely at the code, I can see one potential race:
* entry starts out PoD, not-present.
* v0 finds the entry PoD, allocates a page, calls set_p2m_entry(),
which calls ept_set_entry().
* v1 begins to walk the pagetable; at some point, it calls
ept_next_level(), which finds the flags all clear (entry->epte & 7 ==
* v0 ept_set_entry() changes the p2m type from p2m_populate_on_demand
to p2m_ram_rw
* v1 ept_next_level() reads entry->avail1 and finds that it is not
p2m_populate_on_demand, so it returns GUEST_TABLE_MAP_FAILED
* v0 ept_set_entry() sets the flags to present.
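
The interleaving above can be replayed deterministically in a small
standalone C model.  This is only a sketch: the types and function names
are hypothetical stand-ins, the "type" field stands in for avail1, and the
real ept_next_level() / ept_set_entry() do considerably more than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the real Xen types. */
enum sketch_p2m_type { SKETCH_RAM_RW, SKETCH_POD };
enum sketch_walk_result { SKETCH_MAP_FAILED, SKETCH_POD_PAGE, SKETCH_NORMAL_PAGE };

struct sketch_epte {
    unsigned int flags;            /* low 3 bits: r/w/x; all clear = not present */
    enum sketch_p2m_type type;     /* stands in for the avail1 field */
};

/* Reader (v1), split in two so a writer step can be placed in between. */
static bool reader_sees_not_present(const struct sketch_epte *e)
{
    return (e->flags & 7) == 0;
}

static enum sketch_walk_result reader_check_type(const struct sketch_epte *e)
{
    return e->type == SKETCH_POD ? SKETCH_POD_PAGE : SKETCH_MAP_FAILED;
}

/* Writer (v0): the type is updated before the entry is marked present. */
static void writer_set_type(struct sketch_epte *e)    { e->type = SKETCH_RAM_RW; }
static void writer_set_present(struct sketch_epte *e) { e->flags = 7; }

/* Replays the unlocked interleaving from the list above, step by step. */
static enum sketch_walk_result replay_racy_interleaving(void)
{
    struct sketch_epte e = { 0, SKETCH_POD };         /* PoD, not present */
    enum sketch_walk_result result;
    bool not_present = reader_sees_not_present(&e);   /* v1: flags all clear */

    assert(not_present);
    writer_set_type(&e);                              /* v0: PoD -> ram_rw */
    result = reader_check_type(&e);                   /* v1: type is not PoD */
    writer_set_present(&e);                           /* v0: too late for v1 */
    return result;
}

/* Same steps, but each side runs to completion, as it would under a lock. */
static enum sketch_walk_result replay_locked_interleaving(void)
{
    struct sketch_epte e = { 0, SKETCH_POD };

    writer_set_type(&e);                              /* v0, lock held */
    writer_set_present(&e);
    if (reader_sees_not_present(&e))                  /* v1, lock held */
        return reader_check_type(&e);
    return SKETCH_NORMAL_PAGE;
}
```

In this model replay_racy_interleaving() ends in SKETCH_MAP_FAILED, while
replay_locked_interleaving() sees the fully populated entry.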

Is there a good reason not to just grab the p2m lock when walking the
ept tables?  We could conceivably do some cleverness to avoid this
kind of race, but unless there's a significant performance gain, I
think the simple approach is better.

