WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

Re: [Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space

To: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Subject: Re: [Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Date: Sat, 5 Nov 2011 09:38:46 -0400
Cc: Jeremy Fitzhardinge <jeremy@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, "namhyung@xxxxxxxxx" <namhyung@xxxxxxxxx>, "linux-kernel@xxxxxxxxxxxxxxx" <linux-kernel@xxxxxxxxxxxxxxx>, "linux-mm@xxxxxxxxx" <linux-mm@xxxxxxxxx>, David Vrabel <david.vrabel@xxxxxxxxxx>, "rientjes@xxxxxxxxxx" <rientjes@xxxxxxxxxx>, "paulmck@xxxxxxxxxxxxxxxxxx" <paulmck@xxxxxxxxxxxxxxxxxx>
Delivery-date: Sat, 05 Nov 2011 06:39:58 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20110906163553.GA28971@xxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <1314877863-21977-1-git-send-email-david.vrabel@xxxxxxxxxx> <20110901161134.GA8979@xxxxxxxxxxxx> <4E5FED1A.1000300@xxxxxxxx> <20110901141754.76cef93b.akpm@xxxxxxxxxxxxxxxxxxxx> <4E60C067.4010600@xxxxxxxxxx> <20110902153204.59a928c1.akpm@xxxxxxxxxxxxxxxxxxxx> <20110906163553.GA28971@xxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.5.21 (2010-09-15)
On Tue, Sep 06, 2011 at 12:35:53PM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Sep 02, 2011 at 03:32:04PM -0700, Andrew Morton wrote:
> > On Fri, 2 Sep 2011 12:39:19 +0100
> > David Vrabel <david.vrabel@xxxxxxxxxx> wrote:
> > 
> > > Xen backend drivers (e.g., blkback and netback) would sometimes fail
> > > to map grant pages into the vmalloc address space allocated with
> > > alloc_vm_area().  The GNTTABOP_map_grant_ref would fail because Xen
> > > could not find the page (in the L2 table) containing the PTEs it
> > > needed to update.
> > > 
> > > (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000
> > > 
> > > netback and blkback were making the hypercall from a kernel thread
> > > where task->active_mm != &init_mm and alloc_vm_area() was only
> > > updating the page tables for init_mm.  The usual method of deferring
> > > the update to the page tables of other processes (i.e., after taking a
> > > fault) doesn't work as a fault cannot occur during the hypercall.
> > > 
> > > This would work on some systems depending on what else was using
> > > vmalloc.
> > > 
> > > Fix this by reverting ef691947d8a3d479e67652312783aedcf629320a
> > > (vmalloc: remove vmalloc_sync_all() from alloc_vm_area()) and add a
> > > comment to explain why it's needed.
> > 
> > oookay, I queued this for 3.1 and tagged it for a 3.0.x backport.  I
> > *think* that's the outcome of this discussion, for the short-term?
> 
> <nods> Yup. Thanks!

Hey Andrew,

The long term outcome is the patchset that David worked on. I've sent
a GIT PULL to Linus to pick up the Xen related patches that switch over
the users of the right API:

 (xen) stable/vmalloc-3.2 for Linux 3.2-rc0

(https://lkml.org/lkml/2011/10/29/82)

And then on top of that use this patch:
[Note, I am still waiting for Linus to pull that patchset above.. so not
sure on the outcome. perhaps a better way would be for you to pull
all patches in your tree?]

Also, not sure what you thought of this patch below?

>From b9acd3abc12972be0d938d7bc2466d899023e757 Mon Sep 17 00:00:00 2001
From: David Vrabel <david.vrabel@xxxxxxxxxx>
Date: Thu, 29 Sep 2011 16:53:32 +0100
Subject: [PATCH] xen: map foreign pages for shared rings by updating the PTEs
 directly

When mapping a foreign page with xenbus_map_ring_valloc() with the
GNTTABOP_map_grant_ref hypercall, set the GNTMAP_contains_pte flag and
pass a pointer to the PTE (in init_mm).

After the page is mapped, the usual fault mechanism can be used to
update additional MMs.  This allows the vmalloc_sync_all() to be
removed from alloc_vm_area().

Signed-off-by: David Vrabel <david.vrabel@xxxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
---
 arch/x86/xen/grant-table.c         |    2 +-
 drivers/xen/xenbus/xenbus_client.c |   11 ++++++++---
 include/linux/vmalloc.h            |    2 +-
 mm/vmalloc.c                       |   27 +++++++++++++--------------
 4 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/arch/x86/xen/grant-table.c b/arch/x86/xen/grant-table.c
index 6bbfd7a..5a40d24 100644
--- a/arch/x86/xen/grant-table.c
+++ b/arch/x86/xen/grant-table.c
@@ -71,7 +71,7 @@ int arch_gnttab_map_shared(unsigned long *frames, unsigned 
long nr_gframes,
 
        if (shared == NULL) {
                struct vm_struct *area =
-                       alloc_vm_area(PAGE_SIZE * max_nr_gframes);
+                       alloc_vm_area(PAGE_SIZE * max_nr_gframes, NULL);
                BUG_ON(area == NULL);
                shared = area->addr;
                *__shared = shared;
diff --git a/drivers/xen/xenbus/xenbus_client.c 
b/drivers/xen/xenbus/xenbus_client.c
index 229d3ad..52bc57f 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -34,6 +34,7 @@
 #include <linux/types.h>
 #include <linux/vmalloc.h>
 #include <asm/xen/hypervisor.h>
+#include <asm/xen/page.h>
 #include <xen/interface/xen.h>
 #include <xen/interface/event_channel.h>
 #include <xen/events.h>
@@ -435,19 +436,20 @@ EXPORT_SYMBOL_GPL(xenbus_free_evtchn);
 int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref, void 
**vaddr)
 {
        struct gnttab_map_grant_ref op = {
-               .flags = GNTMAP_host_map,
+               .flags = GNTMAP_host_map | GNTMAP_contains_pte,
                .ref   = gnt_ref,
                .dom   = dev->otherend_id,
        };
        struct vm_struct *area;
+       pte_t *pte;
 
        *vaddr = NULL;
 
-       area = alloc_vm_area(PAGE_SIZE);
+       area = alloc_vm_area(PAGE_SIZE, &pte);
        if (!area)
                return -ENOMEM;
 
-       op.host_addr = (unsigned long)area->addr;
+       op.host_addr = arbitrary_virt_to_machine(pte).maddr;
 
        if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
                BUG();
@@ -526,6 +528,7 @@ int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void 
*vaddr)
        struct gnttab_unmap_grant_ref op = {
                .host_addr = (unsigned long)vaddr,
        };
+       unsigned int level;
 
        /* It'd be nice if linux/vmalloc.h provided a find_vm_area(void *addr)
         * method so that we don't have to muck with vmalloc internals here.
@@ -547,6 +550,8 @@ int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void 
*vaddr)
        }
 
        op.handle = (grant_handle_t)area->phys_addr;
+       op.host_addr = arbitrary_virt_to_machine(
+               lookup_address((unsigned long)vaddr, &level)).maddr;
 
        if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, &op, 1))
                BUG();
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 9332e52..1a77252 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -118,7 +118,7 @@ unmap_kernel_range(unsigned long addr, unsigned long size)
 #endif
 
 /* Allocate/destroy a 'vmalloc' VM area. */
-extern struct vm_struct *alloc_vm_area(size_t size);
+extern struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes);
 extern void free_vm_area(struct vm_struct *area);
 
 /* for /dev/kmem */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 5016f19..b5deec6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2105,23 +2105,30 @@ void  __attribute__((weak)) vmalloc_sync_all(void)
 
 static int f(pte_t *pte, pgtable_t table, unsigned long addr, void *data)
 {
-       /* apply_to_page_range() does all the hard work. */
+       pte_t ***p = data;
+
+       if (p) {
+               *(*p) = pte;
+               (*p)++;
+       }
        return 0;
 }
 
 /**
  *     alloc_vm_area - allocate a range of kernel address space
  *     @size:          size of the area
+ *     @ptes:          returns the PTEs for the address space
  *
  *     Returns:        NULL on failure, vm_struct on success
  *
  *     This function reserves a range of kernel address space, and
  *     allocates pagetables to map that range.  No actual mappings
- *     are created.  If the kernel address space is not shared
- *     between processes, it syncs the pagetable across all
- *     processes.
+ *     are created.
+ *
+ *     If @ptes is non-NULL, pointers to the PTEs (in init_mm)
+ *     allocated for the VM area are returned.
  */
-struct vm_struct *alloc_vm_area(size_t size)
+struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes)
 {
        struct vm_struct *area;
 
@@ -2135,19 +2142,11 @@ struct vm_struct *alloc_vm_area(size_t size)
         * of kernel virtual address space and mapped into init_mm.
         */
        if (apply_to_page_range(&init_mm, (unsigned long)area->addr,
-                               area->size, f, NULL)) {
+                               size, f, ptes ? &ptes : NULL)) {
                free_vm_area(area);
                return NULL;
        }
 
-       /*
-        * If the allocated address space is passed to a hypercall
-        * before being used then we cannot rely on a page fault to
-        * trigger an update of the page tables.  So sync all the page
-        * tables here.
-        */
-       vmalloc_sync_all();
-
        return area;
 }
 EXPORT_SYMBOL_GPL(alloc_vm_area);
-- 
1.7.7.1


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel