xen-devel
RE: [Xen-devel] RE: Kernel BUG at arch/x86/mm/tlb.c:61
 
To: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
From: "Tian, Kevin" <kevin.tian@xxxxxxxxx>
Date: Fri, 29 Apr 2011 08:19:44 +0800
Cc: MaoXiaoyun <tinnycloud@xxxxxxxxxxx>, xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, "giamteckchoon@xxxxxxxxx" <giamteckchoon@xxxxxxxxx>, "konrad.wilk@xxxxxxxxxx" <konrad.wilk@xxxxxxxxxx>
 
 
> From: Jeremy Fitzhardinge [mailto:jeremy@xxxxxxxx]
> Sent: Friday, April 29, 2011 7:29 AM
> 
> On 04/25/2011 10:52 PM, Tian, Kevin wrote:
> >> From: MaoXiaoyun
> >> Sent: Monday, April 25, 2011 11:15 AM
> >>> Date: Fri, 15 Apr 2011 14:22:29 -0700
> >>> From: jeremy@xxxxxxxx
> >>> To: tinnycloud@xxxxxxxxxxx
> >>> CC: giamteckchoon@xxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx;
> >>> konrad.wilk@xxxxxxxxxx
> >>> Subject: Re: Kernel BUG at arch/x86/mm/tlb.c:61
> >>>
> >>> On 04/15/2011 05:23 AM, MaoXiaoyun wrote:
> >>>> Hi:
> >>>>
> >>>> Could the crash be related to this patch?
> >>>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commitdiff;h=45bfd7bfc6cf32f8e60bb91b32349f0b5090eea3
> >>>>
> >>>> With this patch, the TLB state changes to TLBSTATE_OK
> >>>> (mmu_context.h:40) before cpumask_clear_cpu (line 49).
> >>>> Is it possible that right after executing line 40 of
> >>>> mmu_context.h, the CPU receives an IPI from another CPU to flush
> >>>> the mm, and the interrupt handler then finds the TLB state to be
> >>>> TLBSTATE_OK, which conflicts?
> >>> Does reverting it help?
> >>>
> >>> J
> >>
> >> Hi Jeremy:
> >>
> >>     The latest test result shows that reverting the patch didn't help.
> >>     The kernel panics at exactly the same place in tlb.c.
> >>
> >>     I have a question about the TLB state. From the stack:
> >>     xen_do_hypervisor_callback -> xen_evtchn_do_upcall -> ...
> >>     -> drop_other_mm_ref
> >>
> >>     What should cpu_tlbstate.state be here? Are both TLBSTATE_OK and
> >>     TLBSTATE_LAZY possible?
> >>     That is, after a hypercall from userspace the state will be
> >>     TLBSTATE_OK, and if from kernel space the state will be
> >>     TLBSTATE_LAZY?
> >>
> >>     Thanks.
> > It looks like a bug in the drop_other_mm_ref implementation: the current
> > TLB state should be checked before invoking leave_mm(). There's a window
> > between the lines of code below:
> >
> > <xen_drop_mm_ref>
> >         /* Get the "official" set of cpus referring to our pagetable. */
> >         if (!alloc_cpumask_var(&mask, GFP_ATOMIC)) {
> >                 for_each_online_cpu(cpu) {
> >                         if (!cpumask_test_cpu(cpu, mm_cpumask(mm))
> >                             && per_cpu(xen_current_cr3, cpu) != __pa(mm->pgd))
> >                                 continue;
> >                         smp_call_function_single(cpu, drop_other_mm_ref, mm, 1);
> >                 }
> >                 return;
> >         }
> >
> > There's a chance that by the time smp_call_function_single is invoked,
> > the actual TLB state has already been updated on the other CPU. The
> > upstream kernel patch you referred to earlier just exposes this bug more
> > easily. Even without that patch you may still hit the issue, which is
> > why reverting the patch doesn't help.
> >
> > Could you try adding a check in drop_other_mm_ref?
> >
> >         if (active_mm == mm && percpu_read(cpu_tlbstate.state) != TLBSTATE_OK)
> >                 leave_mm(smp_processor_id());
> >
> > If the interrupted context is in TLBSTATE_OK, that implies it will
> > handle the TLB flush itself later, so there is no need to call leave_mm
> > from the interrupt handler. leave_mm assumes it is never called in that
> > state.
> 
> That seems reasonable.  MaoXiaoyun, does it fix the bug for you?
> 
> Kevin, could you submit this as a proper patch?
> 
I'm waiting for Xiaoyun's test result before submitting a proper patch, since
this part of the logic is tricky and his test can make sure we don't overlook
any corner cases. :-)
Thanks
Kevin
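
For anyone following the thread, the check proposed above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the kernel code and not the eventual patch: struct cpu_model, model_leave_mm, drop_ref_unguarded and drop_ref_guarded are hypothetical names, the per-CPU state is a plain struct, and a false return stands in for the place where the real leave_mm() would hit BUG().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only. The enum names mirror the kernel's TLB states,
 * but everything else here is a hypothetical stand-in. */
enum tlb_state { TLBSTATE_OK = 1, TLBSTATE_LAZY = 2 };

struct mm { int id; };

struct cpu_model {
    enum tlb_state state;  /* stands in for cpu_tlbstate.state */
    struct mm *active_mm;  /* stands in for cpu_tlbstate.active_mm */
    bool left_mm;          /* set once the modelled leave_mm() has run */
};

/* Models leave_mm(): it is only legal when the CPU is not in TLBSTATE_OK.
 * Returns false where the real function would trigger BUG(). */
static bool model_leave_mm(struct cpu_model *cpu)
{
    assert(cpu != NULL);
    if (cpu->state == TLBSTATE_OK)
        return false;
    cpu->left_mm = true;
    return true;
}

/* IPI handler without the state check: calls leave_mm whenever the
 * CPU's active mm matches, whatever state the CPU is in by now. */
static bool drop_ref_unguarded(struct cpu_model *cpu, struct mm *mm)
{
    if (cpu->active_mm == mm)
        return model_leave_mm(cpu);
    return true;
}

/* IPI handler with the proposed check: skip leave_mm when the
 * interrupted context is already in TLBSTATE_OK. */
static bool drop_ref_guarded(struct cpu_model *cpu, struct mm *mm)
{
    if (cpu->active_mm == mm && cpu->state != TLBSTATE_OK)
        return model_leave_mm(cpu);
    return true;
}
```

In this model, a CPU that has moved to TLBSTATE_OK before the IPI arrives makes drop_ref_unguarded return false (the BUG case), while drop_ref_guarded returns true and leaves the CPU alone. A CPU still in TLBSTATE_LAZY goes through model_leave_mm in both variants.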