WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

Re: [Xen-devel] Xen pvops kernel CONFIG_HIGHPTE race/crash

To: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
Subject: Re: [Xen-devel] Xen pvops kernel CONFIG_HIGHPTE race/crash
From: Pasi Kärkkäinen <pasik@xxxxxx>
Date: Mon, 8 Feb 2010 14:57:56 +0200
Cc: Jeremy Fitzhardinge <jeremy@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>, Keir Fraser <Keir.Fraser@xxxxxxxxxxxxx>, Daniel Stodden <Daniel.Stodden@xxxxxxxxxx>
Delivery-date: Mon, 08 Feb 2010 04:58:24 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20100208085716.GN2861@xxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <4B5DF8E5.9040906@xxxxxxxx> <1264613173.16526.97718.camel@xxxxxxxxxxxxxxxxxxxxxx> <20100207193517.GJ2861@xxxxxxxxxxx> <1265578978.2938.3.camel@xxxxxxxxxxxxxxxxxxxxx> <1265581335.658.2.camel@xxxxxxxxxxxxxxxxxxxxxxx> <20100208074140.GL2861@xxxxxxxxxxx> <1265615222.2938.4.camel@xxxxxxxxxxxxxxxxxxxxx> <20100208080617.GM2861@xxxxxxxxxxx> <1265619053.2938.6.camel@xxxxxxxxxxxxxxxxxxxxx> <20100208085716.GN2861@xxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.5.18 (2008-05-17)
On Mon, Feb 08, 2010 at 10:57:16AM +0200, Pasi Kärkkäinen wrote:
> On Mon, Feb 08, 2010 at 08:50:53AM +0000, Ian Campbell wrote:
> > On Mon, 2010-02-08 at 08:06 +0000, Pasi Kärkkäinen wrote: 
> > > On Mon, Feb 08, 2010 at 07:47:02AM +0000, Ian Campbell wrote:
> > > > On Mon, 2010-02-08 at 07:41 +0000, Pasi Kärkkäinen wrote: 
> > > > > On Sun, Feb 07, 2010 at 02:22:15PM -0800, Daniel Stodden wrote:
> > > > > > On Sun, 2010-02-07 at 16:42 -0500, Ian Campbell wrote:
> > > > > > > On Sun, 2010-02-07 at 19:35 +0000, Pasi Kärkkäinen wrote: 
> > > > > > > > On Wed, Jan 27, 2010 at 05:26:13PM +0000, Ian Campbell wrote:
> > > > > > > > > diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> > > > > > > > > index 65215ab..49f8e83 100644
> > > > > > > > > --- a/arch/x86/mm/pgtable.c
> > > > > > > > > +++ b/arch/x86/mm/pgtable.c
> > > > > > > > > @@ -28,7 +28,10 @@ pgtable_t pte_alloc_one(struct mm_struct 
> > > > > > > > > *mm, unsigned long address)
> > > > > > > > >       struct page *pte;
> > > > > > > > >  
> > > > > > > > >  #ifdef CONFIG_HIGHPTE
> > > > > > > > > -     pte = alloc_pages(PGALLOC_GFP | __GFP_HIGHMEM, 0);
> > > > > > > > > +     if (is_xen_domain())
> > > > > > > > > +             pte = alloc_pages(PGALLOC_GFP, 0);
> > > > > > > > > +     else
> > > > > > > > > +             pte = alloc_pages(PGALLOC_GFP | __GFP_HIGHMEM, 
> > > > > > > > > 0);
> > > > > > > > >  #else
> > > > > > > > >       pte = alloc_pages(PGALLOC_GFP, 0);
> > > > > > > > >  #endif
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > I just tried this patch, but it fails to compile:
> > > > > > > > 
> > > > > > > > arch/x86/mm/pgtable.c: In function 'pte_alloc_one':
> > > > > > > > arch/x86/mm/pgtable.c:19: error: implicit declaration of 
> > > > > > > > function 'is_xen_domain'
> > > > > > > > make[2]: *** [arch/x86/mm/pgtable.o] Error 1
> > > > > > > > make[1]: *** [arch/x86/mm] Error 2
> > > > > > > > make: *** [arch/x86] Error 2
> > > > > > > > 
> > > > > > > > I tried grepping around for that function but didn't find it 
> > > > > > > > from any header..
> > > > > > > 
> > > > > > > IIRC on some kernels it was called just xen_domain(), I'm not 
> > > > > > > sure but I
> > > > > > > think my patch was against plain 2.6.32. I think the function 
> > > > > > > (whatever
> > > > > > > it is called) also moved around in the headers recently.
> > > > > > 
> > > > > > is_running_on_xen()
> > > > > > 
> > > > > 
> > > > > I'm using kernel.org 2.6.32.7, and I can't find is_running_on_xen() 
> > > > > either.
> > > > 
> > > > 2.6.32.7 has xen_domain() defined in include/xen/xen.h. Probably
> > > > xen_pv_domain() is the correct check though.
> > > > 
> > > 
> > > # grep xen_domain include/xen/xen.h
> > > grep: include/xen/xen.h: No such file or directory
> > > 
> > > # grep xen_domain include/xen/interface/xen.h
> > > typedef uint8_t xen_domain_handle_t[16];
> > > 
> > > # ls include/xen
> > > events.h  features.h     hvc-console.h  Kbuild  xenbus.h   xen-ops.h
> > > evtchn.h  grant_table.h  interface      page.h  xencomm.h
> > > 
> > > # grep xen_domain include/xen/*
> > > #
> > > 
> > > # grep xen_domain include/xen/interface/*
> > > include/xen/interface/xen.h:typedef uint8_t xen_domain_handle_t[16];
> > > #
> > 
> > Ah, my 2.6.32.7 was actually 2.6.32.7 + Jeremy's next branch which has
> > the headfile move I mentioend earlier whereas plain 2.6.32.7 does not.
> > 
> > # git grep  xen_domain_type  v2.6.32.7 arch/x86/xen arch/x86/include/asm
> > v2.6.32.7:arch/x86/include/asm/xen/hypervisor.h:enum xen_domain_type {
> > v2.6.32.7:arch/x86/include/asm/xen/hypervisor.h:extern enum xen_domain_type 
> > xen_domain_type;
> > v2.6.32.7:arch/x86/include/asm/xen/hypervisor.h:#define xen_domain_type     
> >     XEN_NATIVE
> > v2.6.32.7:arch/x86/include/asm/xen/hypervisor.h:#define xen_domain()        
> >     (xen_domain_type != XEN_NATIVE)
> > v2.6.32.7:arch/x86/include/asm/xen/hypervisor.h:                            
> >      xen_domain_type == XEN_PV_DOMAIN)
> > v2.6.32.7:arch/x86/include/asm/xen/hypervisor.h:                            
> >      xen_domain_type == XEN_HVM_DOMAIN)
> > v2.6.32.7:arch/x86/xen/enlighten.c:enum xen_domain_type xen_domain_type = 
> > XEN_NATIVE;
> > v2.6.32.7:arch/x86/xen/enlighten.c:EXPORT_SYMBOL_GPL(xen_domain_type);
> > v2.6.32.7:arch/x86/xen/enlighten.c:     xen_domain_type = XEN_PV_DOMAIN;
> > 
> > so it looks like you need xen_(|pv_)domain() and asm/xen/hypervisor.h
> > 
> 
> # grep xen_pv_domain arch/x86/include/asm/xen/hypervisor.h
> #define xen_pv_domain()         (xen_domain() &&                        \
> #define xen_initial_domain()    (xen_pv_domain() && \
> 
> And now it compiles.. I'm using xen_pv_domain().
> I'll do some testing when it's built. 
> 
> I also noticed that the HIGHPTE race is much easier to reproduce
> on a 2vcpu guest.. where it takes less than 30 mins to happen.
> 
> 1vcpu guest survived kernel compilation loop for a couple hours OK.
> 

Ok, based on quick testing the patch seems to fix the problem.

details: 2 vcpu guest, 2 GB RAM, 32bit PAE, running 2.6.32.7 with and without 
the patch.

without the patch: crashes in around 30 minutes.
with the patch: survived over one hour without problems.

I'll continue the compilation loop to make sure it works for real..

-- Pasi

This is the patch I used:

--- linux-2.6.32.7/arch/x86/mm/pgtable.c.orig   2010-01-29 01:06:20.000000000 
+0200
+++ linux-2.6.32.7/arch/x86/mm/pgtable.c        2010-02-08 10:52:36.495142772 
+0200
@@ -3,6 +3,7 @@
 #include <asm/pgtable.h>
 #include <asm/tlb.h>
 #include <asm/fixmap.h>
+#include <asm/xen/hypervisor.h>

 #define PGALLOC_GFP GFP_KERNEL | __GFP_NOTRACK | __GFP_REPEAT | __GFP_ZERO

@@ -16,7 +17,10 @@
        struct page *pte;

 #ifdef CONFIG_HIGHPTE
-       pte = alloc_pages(PGALLOC_GFP | __GFP_HIGHMEM, 0);
+       if (xen_pv_domain())
+               pte = alloc_pages(PGALLOC_GFP, 0);
+       else
+               pte = alloc_pages(PGALLOC_GFP | __GFP_HIGHMEM, 0);
 #else
        pte = alloc_pages(PGALLOC_GFP, 0);
 #endif



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

<Prev in Thread] Current Thread [Next in Thread>