WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

To: Gerd Hoffmann <kraxel@xxxxxxxxxx>
Subject: Re: [Xen-devel] Re: Next steps with pv_ops for Xen
From: Derek Murray <Derek.Murray@xxxxxxxxxxxx>
Date: Mon, 03 Dec 2007 14:51:19 +0000
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Eduardo Habkost <ehabkost@xxxxxxxxxx>, Juan Quintela <quintela@xxxxxxxxxx>, "Stephen C. Tweedie" <sct@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxxxx>, Glauber de Oliveira Costa <gcosta@xxxxxxxxxx>, Chris Wright <chrisw@xxxxxxxxxxxx>, "virtualization@xxxxxxxxxxxxxx" <virtualization@xxxxxxxxxxxxxx>
Delivery-date: Mon, 03 Dec 2007 06:51:59 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <47540FB8.8000106@xxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <1195682725.6726.48.camel@xxxxxxxxxxxxxxxxxxxxx> <4753FC6A.4020601@xxxxxxxxxx> <4754024C.7020905@xxxxxxxxxxxx> <47540FB8.8000106@xxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Thunderbird 2.0.0.9 (X11/20071115)
Gerd Hoffmann wrote:
Derek Murray wrote:
I take the blame for that one. I added the hook because, if a process
were to die whilst holding one or more grants, there were no hooks that
would make it possible to carry out the grant-unmap. All existing hooks
on either the device or the VMA were called *after* the PTEs were cleared.

Hmm.  What exactly is the issue here?

This is about *userspace* mappings, right?  As far as I can see from a
quick scan there of the code is an additional kernel space mapping for
the grants and the userspace mapping is optional.  I don't see any
problems with userspace mapping going away without *instant*
notification.  Cleaning up a bit later, called from the
file_ops->release callback maybe, should work ok.

If we let Linux zap the page tables before we unmap the grant reference, then it is not possible to unmap the grant reference. The unmap_grant_ref hypercall ultimately calls destroy_grant_pte_mapping in xen/arch/x86/mm.c, which ensures that the PTE does in fact point to the granted frame. Note also the comment further up in that file (in put_page_from_l1e):

    /*
* Check if this is a mapping that was established via a grant reference. * If it was then we should not be here: we require that such mappings are
     * explicitly destroyed via the grant-table interface.
     *
* The upshot of this is that the guest can end up with active grants that
     * it cannot destroy (because it no longer has a PTE to present to the
     * grant-table interface). This can lead to subtle hard-to-catch bugs,
* hence a special grant PTE flag can be enabled to catch the bug early.
     *
* (Note that the undestroyable active grants are not a security hole in * Xen. All active grants can safely be cleaned up when the domain dies.)
     */

Effectively, there is a debug option that sets a bit in PTEs that map granted pages, and this can be used to force a domain_crash in the event that a VM tries to zap the entries normally. The normal behaviour is to silently accept the zap operation, and leak granted pages until the grantee domain is killed.

The problem I see with the additional vm_ops callback is that I suspect
you'll have to come up with some *very* good arguments to get it
accepted by the VM (as in "virtual memory") folks and merged mainline.

On this point I completely agree with you! If anyone has any less radical suggestions, then I'd be delighted to refactor the gntdev code to use them. However, I'm not currently aware of any alternative that maintains robustness to process crashes.

It gets better, though. The same hook is used in the version of blktap
in linux-2.6.18-xen (not, as far as I can see, in the sparse tree for
xen-3.1-testing):

Oh, I'm thinking more in the direction of killing blktap altogether in
favor of a pure userspace implementation on top of gntdev.

I think this would represent good progress, though I wonder if there would be a performance penalty due to performing the mapping and unmapping in user-space (multiple syscalls per mapping versus a single hypercall).

Cheers,

Derek Murray.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel