|   xen-devel
RE: [Xen-devel] RE: Kernel BUG at arch/x86/mm/tlb.c:61 
| To: | MaoXiaoyun <tinnycloud@xxxxxxxxxxx>, "jeremy@xxxxxxxx" <jeremy@xxxxxxxx> |  
| Subject: | RE: [Xen-devel] RE: Kernel BUG at arch/x86/mm/tlb.c:61 |  
| From: | "Tian, Kevin" <kevin.tian@xxxxxxxxx> |  
| Date: | Fri, 29 Apr 2011 09:57:11 +0800 |  
| Cc: | xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>,	"giamteckchoon@xxxxxxxxx" <giamteckchoon@xxxxxxxxx>,	"konrad.wilk@xxxxxxxxxx" <konrad.wilk@xxxxxxxxxx> |  
| | OK, thanks for the update. I’ll send out the patch then.

 Thanks
 Kevin

 From: MaoXiaoyun [mailto:tinnycloud@xxxxxxxxxxx]
 Sent: Friday, April 29, 2011 9:51 AM
 To: Tian, Kevin; jeremy@xxxxxxxx
 Cc: xen devel; giamteckchoon@xxxxxxxxx; konrad.wilk@xxxxxxxxxx
 Subject: RE: [Xen-devel] RE: Kernel BUG at arch/x86/mm/tlb.c:61
   > From: kevin.tian@xxxxxxxxx
 > To: jeremy@xxxxxxxx
 > CC: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx; giamteckchoon@xxxxxxxxx; konrad.wilk@xxxxxxxxxx
 > Date: Fri, 29 Apr 2011 08:19:44 +0800
 > Subject: RE: [Xen-devel] RE: Kernel BUG at arch/x86/mm/tlb.c:61
 >
 > > From: Jeremy Fitzhardinge [mailto:jeremy@xxxxxxxx]
 > > Sent: Friday, April 29, 2011 7:29 AM
 > >
 > > On 04/25/2011 10:52 PM, Tian, Kevin wrote:
 > > >> From: MaoXiaoyun
 > > >> Sent: Monday, April 25, 2011 11:15 AM
 > > >>> Date: Fri, 15 Apr 2011 14:22:29 -0700
 > > >>> From: jeremy@xxxxxxxx
 > > >>> To: tinnycloud@xxxxxxxxxxx
 > > >>> CC: giamteckchoon@xxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx;
 > > >>> konrad.wilk@xxxxxxxxxx
 > > >>> Subject: Re: Kernel BUG at arch/x86/mm/tlb.c:61
 > > >>>
 > > >>> On 04/15/2011 05:23 AM, MaoXiaoyun wrote:
 > > >>>> Hi:
 > > >>>>
 > > >>>> Could the crash be related to this patch?
 > > >>>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commitdiff;h=45bfd7bfc6cf32f8e60bb91b32349f0b5090eea3
 > > >>>>
 > > >>>> With that patch, the TLB state change to TLBSTATE_OK (mmu_context.h:40)
 > > >>>> now happens before cpumask_clear_cpu (line 49).
 > > >>>> Could it be possible that, right after executing line 40 of
 > > >>>> mmu_context.h, the CPU receives an IPI from another CPU to flush the mm,
 > > >>>> and the interrupt handler finds that the TLB state is already TLBSTATE_OK?
 > > >>>> That would conflict.
 > > >>> Does reverting it help?
 > > >>>
 > > >>> J
 > > >>
 > > >> Hi Jeremy:
 > > >>
 > > >> The latest test result shows that reverting didn't help.
 > > >> The kernel panics at exactly the same place in tlb.c.
 > > >>
 > > >> I have a question about the TLB state. From the stack:
 > > >> xen_do_hypervisor_callback-> xen_evtchn_do_upcall->...
 > > >> ->drop_other_mm_ref
 > > >>
 > > >> What should cpu_tlbstate.state be here? Are both TLBSTATE_OK and
 > > >> TLBSTATE_LAZY possible?
 > > >> That is, after a hypercall from userspace the state will be TLBSTATE_OK,
 > > >> and if it comes from kernel space the state will be TLBSTATE_LAZY?
 > > >>
 > > >> thanks.
 > > > It looks like a bug in the drop_other_mm_ref implementation: the current TLB
 > > > state should be checked before invoking leave_mm(). There's a window
 > > > between the lines of code below:
 > > >
 > > > <xen_drop_mm_ref>
 > > >         /* Get the "official" set of cpus referring to our pagetable. */
 > > >         if (!alloc_cpumask_var(&mask, GFP_ATOMIC)) {
 > > >                 for_each_online_cpu(cpu) {
 > > >                         if (!cpumask_test_cpu(cpu, mm_cpumask(mm))
 > > >                             && per_cpu(xen_current_cr3, cpu) != __pa(mm->pgd))
 > > >                                 continue;
 > > >                         smp_call_function_single(cpu, drop_other_mm_ref, mm, 1);
 > > >                 }
 > > >                 return;
 > > >         }
 > > >
 > > > There's a chance that by the time smp_call_function_single is invoked, the
 > > > actual TLB state has already been updated on the other CPU. The upstream
 > > > kernel patch you referred to earlier just exposes this bug more easily. Even
 > > > without that patch you may still hit the issue, which is why reverting
 > > > the patch doesn't help.
 > > >
 > > > Could you try adding a check in drop_other_mm_ref?
 > > >
 > > >         if (active_mm == mm && percpu_read(cpu_tlbstate.state) != TLBSTATE_OK)
 > > >                 leave_mm(smp_processor_id());
 > > >
 > > > If the interrupted context has TLBSTATE_OK, that implies it will handle
 > > > the TLB flush itself later, so there is no need to call leave_mm from the
 > > > interrupt handler. leave_mm assumes the CPU is not in TLBSTATE_OK.
 > >
 > > That seems reasonable. MaoXiaoyun, does it fix the bug for you?
 > >
 > > Kevin, could you submit this as a proper patch?
 > >
 >
 > I'm waiting for Xiaoyun's test result before submitting a proper patch, since this
 > part of the logic is tricky and his test can make sure we don't overlook any corner
 > cases. :-)
 >
 
 I think it works. The test has been running successfully for over 70 hours.
 My plan is to run it for one week.
 
 Thanks.
 
 > Thanks
 > Kevin
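
For readers following the thread, the check proposed above for drop_other_mm_ref can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: `struct cpu_model`, `model_leave_mm`, `model_drop_unchecked` and `model_drop_checked` are hypothetical names invented here, and the `bug_hit` flag merely stands in for the BUG at tlb.c:61 that the thread reports when leave_mm runs while the state is TLBSTATE_OK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel's per-CPU TLB state (illustrative only). */
enum tlb_state { TLBSTATE_OK = 1, TLBSTATE_LAZY = 2 };

struct mm { int id; };              /* stands in for struct mm_struct */

struct cpu_model {
    enum tlb_state state;           /* models cpu_tlbstate.state */
    const struct mm *active_mm;     /* models cpu_tlbstate.active_mm */
    bool in_mm_cpumask;             /* models this CPU's bit in mm_cpumask(mm) */
    bool bug_hit;                   /* set where leave_mm() would hit BUG() */
};

/* Models leave_mm(): only legal when the CPU is not in TLBSTATE_OK. */
static void model_leave_mm(struct cpu_model *cpu)
{
    assert(cpu != NULL);
    if (cpu->state == TLBSTATE_OK) {
        cpu->bug_hit = true;        /* the crash discussed in this thread */
        return;
    }
    cpu->in_mm_cpumask = false;     /* lazy reference to the mm is dropped */
}

/* Models the IPI handler without the extra state check. */
static void model_drop_unchecked(struct cpu_model *cpu, const struct mm *mm)
{
    assert(cpu != NULL);
    if (cpu->active_mm == mm)
        model_leave_mm(cpu);
}

/* Models the IPI handler with the check suggested in the thread. */
static void model_drop_checked(struct cpu_model *cpu, const struct mm *mm)
{
    assert(cpu != NULL);
    if (cpu->active_mm == mm && cpu->state != TLBSTATE_OK)
        model_leave_mm(cpu);
}
```

In this model, a CPU that has already switched to TLBSTATE_OK when the IPI arrives sets `bug_hit` through `model_drop_unchecked`, while `model_drop_checked` leaves it untouched; a CPU still in TLBSTATE_LAZY drops its reference in both variants.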
 | 