xen-devel

RE: [Xen-devel] RE: Kernel BUG at arch/x86/mm/tlb.c:61

To: <kevin.tian@xxxxxxxxx>, <jeremy@xxxxxxxx>
Subject: RE: [Xen-devel] RE: Kernel BUG at arch/x86/mm/tlb.c:61
From: MaoXiaoyun <tinnycloud@xxxxxxxxxxx>
Date: Fri, 29 Apr 2011 09:50:57 +0800
Cc: xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, giamteckchoon@xxxxxxxxx, konrad.wilk@xxxxxxxxxx
Delivery-date: Thu, 28 Apr 2011 18:52:19 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
Importance: Normal
In-reply-to: <625BA99ED14B2D499DC4E29D8138F1505C843BB27A@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <COL0-MC1-F14hmBzxHs00230882@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx>, <BLU157-w488E5FEBD5E2DBC0666EF1DAA70@xxxxxxx>, <BLU157-w5025BFBB4B1CDFA7AA0966DAA90@xxxxxxx>, <BLU157-w540B39FBA137B4D96278D2DAA90@xxxxxxx>, <BANLkTimgh_iip27zkDPNV9r7miwbxHmdVg@xxxxxxxxxxxxxx>, <BANLkTimkMgYNyANcKiZu5tJTL4==zdP3xg@xxxxxxxxxxxxxx>, <BLU157-w116F1BB57ABFDE535C7851DAA80@xxxxxxx>, <4DA3438A.6070503@xxxxxxxx>, <BLU157-w2C6CD57CEA345B8D115E8DAAB0@xxxxxxx>, <BLU157-w36F4E0A7503A357C9DE6A3DAAB0@xxxxxxx>, <20110412100000.GA15647@xxxxxxxxxxxx>, <BLU157-w14B84A51C80B41AB72B6CBDAAD0@xxxxxxx>, <BANLkTinNxLnJxtZD68ODLSJqafq0tDRPfw@xxxxxxxxxxxxxx>, <BLU157-w30A1A208238A9031F0D18EDAAD0@xxxxxxx>, <BLU157-w383D1A2536480BCD4C0E0EDAAD0@xxxxxxx>, <BLU157-w42DAD248C94153635E9749DAAC0@xxxxxxx>, <4DA8B715.9080508@xxxxxxxx>, <BLU157-w51A8A73D5A656542F9AB13DA960@xxxxxxx>, <625BA99ED14B2D499DC4E29D8138F1505C7F2C5185@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>, <4DB9F845.6020204@xxxxxxxx>, <625BA99ED14B2D499DC4E29D8138F1505C843BB27A@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
 
> From: kevin.tian@xxxxxxxxx
> To: jeremy@xxxxxxxx
> CC: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx; giamteckchoon@xxxxxxxxx; konrad.wilk@xxxxxxxxxx
> Date: Fri, 29 Apr 2011 08:19:44 +0800
> Subject: RE: [Xen-devel] RE: Kernel BUG at arch/x86/mm/tlb.c:61
>
> > From: Jeremy Fitzhardinge [mailto:jeremy@xxxxxxxx]
> > Sent: Friday, April 29, 2011 7:29 AM
> >
> > On 04/25/2011 10:52 PM, Tian, Kevin wrote:
> > >> From: MaoXiaoyun
> > >> Sent: Monday, April 25, 2011 11:15 AM
> > >>> Date: Fri, 15 Apr 2011 14:22:29 -0700
> > >>> From: jeremy@xxxxxxxx
> > >>> To: tinnycloud@xxxxxxxxxxx
> > >>> CC: giamteckchoon@xxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx;
> > >>> konrad.wilk@xxxxxxxxxx
> > >>> Subject: Re: Kernel BUG at arch/x86/mm/tlb.c:61
> > >>>
> > >>> On 04/15/2011 05:23 AM, MaoXiaoyun wrote:
> > >>>> Hi:
> > >>>>
> > >>>> Could the crash be related to this patch?
> > >>>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commitdiff;h=45bfd7bfc6cf32f8e60bb91b32349f0b5090eea3
> > >>>>
> > >>>> Since now the TLB state change to TLBSTATE_OK (mmu_context.h:40)
> > >>>> happens before cpumask_clear_cpu (line 49), could it be possible
> > >>>> that, right after executing line 40 of mmu_context.h, the CPU
> > >>>> receives an IPI from another CPU to flush the mm, and the interrupt
> > >>>> handler then finds the TLB state to be TLBSTATE_OK, which conflicts?
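
To illustrate the window being described (schematic only, following the quoted line numbers rather than the actual mmu_context.h source):

    /* CPU A, switching mm (ordering as quoted above):                    */
    percpu_write(cpu_tlbstate.state, TLBSTATE_OK);  /* ~mmu_context.h:40  */
    /* <-- window: a flush IPI delivered here already sees TLBSTATE_OK,   */
    /*     while CPU A is still present in mm_cpumask(prev)               */
    cpumask_clear_cpu(cpu, mm_cpumask(prev));       /* ~mmu_context.h:49  */
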
> > >>> Does reverting it help?
> > >>>
> > >>> J
> > >>
> > >> Hi Jeremy:
> > >>
> > >> The latest test result shows that reverting it didn't help.
> > >> The kernel panics at exactly the same place in tlb.c.
> > >>
> > >> I have a question about the TLB state. From the stack,
> > >> xen_do_hypervisor_callback -> xen_evtchn_do_upcall -> ...
> > >> -> drop_other_mm_ref,
> > >>
> > >> what should cpu_tlbstate.state be: could both TLBSTATE_OK and
> > >> TLBSTATE_LAZY be possible?
> > >> That is, after a hypercall from user space the state will be
> > >> TLBSTATE_OK, and if from kernel space the state will be TLBSTATE_LAZY?
> > >>
> > >> Thanks.
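
For reference, leave_mm() in kernels of that era looks roughly like the sketch below (reconstructed approximately, so the exact lines may differ); the BUG() is the check being hit at arch/x86/mm/tlb.c:61, and it fires whenever leave_mm() is called while this cpu still owns the mm, i.e. while cpu_tlbstate.state is TLBSTATE_OK:

    /* arch/x86/mm/tlb.c (approximate sketch) */
    void leave_mm(int cpu)
    {
            if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
                    BUG();          /* the BUG reported at tlb.c:61 */
            /* stop being counted among the users of this mm ... */
            cpumask_clear_cpu(cpu,
                              mm_cpumask(percpu_read(cpu_tlbstate.active_mm)));
            /* ... and switch to the kernel page tables */
            load_cr3(swapper_pg_dir);
    }
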
> > > It looks like a bug in the drop_other_mm_ref implementation: the current
> > > TLB state should be checked before invoking leave_mm(). There's a window
> > > in the lines of code below:
> > >
> > > <xen_drop_mm_ref>
> > >     /* Get the "official" set of cpus referring to our pagetable. */
> > >     if (!alloc_cpumask_var(&mask, GFP_ATOMIC)) {
> > >             for_each_online_cpu(cpu) {
> > >                     if (!cpumask_test_cpu(cpu, mm_cpumask(mm))
> > >                         && per_cpu(xen_current_cr3, cpu) != __pa(mm->pgd))
> > >                             continue;
> > >                     smp_call_function_single(cpu, drop_other_mm_ref, mm, 1);
> > >             }
> > >             return;
> > >     }
> > >
> > > There's a chance that by the time smp_call_function_single is invoked,
> > > the actual TLB state has already been updated on the other cpu. The
> > > upstream kernel patch you referred to earlier just makes this bug easier
> > > to expose; even without that patch you may still hit the same issue,
> > > which is why reverting the patch doesn't help.
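
In other words, an interleaving like the following (purely schematic) can occur:

    /*
     * CPU A: xen_drop_mm_ref()            CPU B: target of the IPI
     *
     * sees B in mm_cpumask(mm) (or a
     * stale xen_current_cr3), decides
     * to send the IPI
     *                                     switches to mm for real;
     *                                     cpu_tlbstate.state becomes
     *                                     TLBSTATE_OK
     * smp_call_function_single(B,
     *     drop_other_mm_ref, mm, 1)
     *                                     drop_other_mm_ref() runs in the
     *                                     IPI and calls leave_mm() while
     *                                     the state is TLBSTATE_OK -> BUG
     */
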
> > >
> > > Could you try adding a check in drop_other_mm_ref?
> > >
> > >     if (active_mm == mm &&
> > >         percpu_read(cpu_tlbstate.state) != TLBSTATE_OK)
> > >             leave_mm(smp_processor_id());
> > >
> > > Once the interrupted context has TLBSTATE_OK, that implies it will
> > > handle the TLB flush itself later, so there is no need to call leave_mm
> > > from the interrupt handler; that is exactly the assumption leave_mm makes.
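
A minimal sketch of drop_other_mm_ref() with that guard added (based on the arch/x86/xen/mmu.c of that era; the surrounding details are approximate):

    /* Runs in IPI context on the remote cpu. */
    static void drop_other_mm_ref(void *info)
    {
            struct mm_struct *mm = info;
            struct mm_struct *active_mm;

            active_mm = percpu_read(cpu_tlbstate.active_mm);

            /* Only drop the mm if this cpu holds it lazily; if the state is
             * TLBSTATE_OK the cpu is actively using the mm and will handle
             * the flush itself, so leave_mm() must not be called here. */
            if (active_mm == mm &&
                percpu_read(cpu_tlbstate.state) != TLBSTATE_OK)
                    leave_mm(smp_processor_id());

            /* If this cpu still has a stale cr3 reference, drop it too. */
            if (percpu_read(xen_current_cr3) == __pa(mm->pgd))
                    load_cr3(swapper_pg_dir);
    }
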
> >
> > That seems reasonable. MaoXiaoyun, does it fix the bug for you?
> >
> > Kevin, could you submit this as a proper patch?
> >
>
> I'm waiting for Xiaoyun's test result before submitting a proper patch, since this
> part of the logic is tricky and his test can make sure we don't overlook any corner
> cases. :-)
>
 
I think it works. The test has been running successfully for over 70 hours.
My plan is to run it for one week.
 
Thanks.
 
> Thanks
> Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel