WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
 
   
 

xen-devel

[Xen-devel] x86_32: spurious page faults in guest GDT area

To: <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-devel] x86_32: spurious page faults in guest GDT area
From: "Jan Beulich" <jbeulich@xxxxxxxxxx>
Date: Mon, 16 Jun 2008 11:32:00 +0100

Under long-running stress I can reproduce this issue back to at least
c/s 16084; in older change sets it was apparently so rare that during
normal work/testing I either never noticed it or had to ignore it because
it could not be reproduced. However, on recent change sets (tested with
our 2.6.25-based kernels only so far) it happens much more frequently
(and occasionally even while the machine boots).

I inserted selector validation code in the context switch path to verify
that a vcpu's selectors are okay (or rather, that the guest-provided
part of the GDT is accessible). So far these checks have never indicated
a failure.
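
For illustration only, a check of this kind might look roughly like the
sketch below. It is not the instrumentation described above. The helper
names, the nr_guest_entries parameter, and the 14-page size of the
guest-provided GDT area are assumptions made for the example. Only the
selector bit layout (index in bits 15..3, TI in bit 2, RPL in bits 1..0)
is architectural.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the guest-provided part of the GDT spans the first
 * GUEST_GDT_PAGES pages; 14 is an illustrative value. */
#define PAGE_SIZE        4096u
#define DESC_SIZE        8u    /* one GDT descriptor occupies 8 bytes */
#define GUEST_GDT_PAGES  14u

/* Architectural selector layout: index 15..3, TI bit 2, RPL bits 1..0. */
static unsigned int sel_index(uint16_t sel) { return sel >> 3; }
static bool sel_is_ldt(uint16_t sel)        { return (sel & 4u) != 0; }

/*
 * Hypothetical helper: true if 'sel' is either a null selector or
 * refers to a descriptor inside the guest-provided GDT area.
 * 'nr_guest_entries' is the number of descriptors the guest registered.
 * LDT selectors are outside the scope of this sketch and yield false.
 */
static bool sel_in_guest_gdt(uint16_t sel, unsigned int nr_guest_entries)
{
    if ((sel & ~3u) == 0)          /* null selector: no descriptor fetch */
        return true;
    if (sel_is_ldt(sel))           /* TI set: refers to the LDT, not the GDT */
        return false;
    return sel_index(sel) < nr_guest_entries &&
           sel_index(sel) < GUEST_GDT_PAGES * PAGE_SIZE / DESC_SIZE;
}
```

A real check would additionally have to confirm that the page holding
the descriptor is actually mapped, which this sketch does not attempt.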

The faults may happen in various places (hypervisor exit path as well
as guest code), and always involve loading a selector register with a
guest-defined value (i.e. in the first page of the GDT). A page walk
in the (hypervisor) fault handler shows that all levels of the translation
exist (and are valid/consistent), and instrumentation of the selector
manipulation functions shows that none of them gets called spuriously.
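
As a rough sketch of what such a walk checks, the following assumes
3-level PAE paging with 4 KiB pages. The paging mode, the helper names,
and the idea of passing in the three entries read during the walk are
assumptions for the example, not taken from the actual fault handler.
Superpage (PSE) mappings at the L2 level are not handled.

```c
#include <stdint.h>

/* Assumption: 3-level PAE paging, 4 KiB pages, 64-bit table entries. */
#define PTE_PRESENT 0x1ull   /* bit 0 of every entry: present */

/* Split a 32-bit linear address into its PAE table indices. */
static unsigned int l3_index(uint32_t va) { return va >> 30; }            /* bits 31..30 */
static unsigned int l2_index(uint32_t va) { return (va >> 21) & 0x1ff; }  /* bits 29..21 */
static unsigned int l1_index(uint32_t va) { return (va >> 12) & 0x1ff; }  /* bits 20..12 */

/*
 * Hypothetical helper: given the entries read at each level of the walk,
 * return the first level (3, 2 or 1) whose entry is not present,
 * or 0 if every level is present.
 */
static int first_missing_level(uint64_t l3e, uint64_t l2e, uint64_t l1e)
{
    if (!(l3e & PTE_PRESENT))
        return 3;
    if (!(l2e & PTE_PRESENT))
        return 2;
    if (!(l1e & PTE_PRESENT))
        return 1;
    return 0;
}
```

A result of 0 from a walk done in software corresponds to the situation
described above: the in-memory tables look fine, so the fault would
have to come from stale or inconsistent cached translation state rather
than from the tables themselves.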

Hence I can only suspect some asynchronous page table manipulation
(but I'm not aware of anything like that) lacking proper TLB flushing, or
some very rare issue with the CR3 reloading code.

The same 32-bit kernel used with a 64-bit hypervisor has so far not
shown similar problems. I first thought this would help narrow down
the problem, but I'm pretty clueless at this point because none of the
candidate areas where 32-bit code differs from 64-bit look troublesome
to me (most notably, TLB flushing is identical between the two).

Any ideas on how to narrow down the problem would be appreciated.
Thanks, Jan


