xen-devel
RE: [Xen-devel] RE: Kernel BUG at arch/x86/mm/tlb.c:61
To: MaoXiaoyun <tinnycloud@xxxxxxxxxxx>, "jeremy@xxxxxxxx" <jeremy@xxxxxxxx>
Subject: RE: [Xen-devel] RE: Kernel BUG at arch/x86/mm/tlb.c:61
From: "Tian, Kevin" <kevin.tian@xxxxxxxxx>
Date: Tue, 26 Apr 2011 16:31:51 +0800
Accept-language: en-US
Cc: xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, "giamteckchoon@xxxxxxxxx" <giamteckchoon@xxxxxxxxx>, "konrad.wilk@xxxxxxxxxx" <konrad.wilk@xxxxxxxxxx>
Delivery-date: Tue, 26 Apr 2011 01:33:12 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <BLU157-w5697DD78D0AA06E69BA116DA990@xxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <COL0-MC1-F14hmBzxHs00230882@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx>, , <BLU157-w488E5FEBD5E2DBC0666EF1DAA70@xxxxxxx>, , <BLU157-w5025BFBB4B1CDFA7AA0966DAA90@xxxxxxx>, , <BLU157-w540B39FBA137B4D96278D2DAA90@xxxxxxx>, , <BANLkTimgh_iip27zkDPNV9r7miwbxHmdVg@xxxxxxxxxxxxxx>, , <BANLkTimkMgYNyANcKiZu5tJTL4==zdP3xg@xxxxxxxxxxxxxx>, , <BLU157-w116F1BB57ABFDE535C7851DAA80@xxxxxxx>, <4DA3438A.6070503@xxxxxxxx>, , <BLU157-w2C6CD57CEA345B8D115E8DAAB0@xxxxxxx>, , <BLU157-w36F4E0A7503A357C9DE6A3DAAB0@xxxxxxx>, , <20110412100000.GA15647@xxxxxxxxxxxx>, , <BLU157-w14B84A51C80B41AB72B6CBDAAD0@xxxxxxx>, , <BANLkTinNxLnJxtZD68ODLSJqafq0tDRPfw@xxxxxxxxxxxxxx>, , <BLU157-w30A1A208238A9031F0D18EDAAD0@xxxxxxx>, , <BLU157-w383D1A2536480BCD4C0E0EDAAD0@xxxxxxx>, <BLU157-w42DAD248C94153635E9749DAAC0@xxxxxxx>, <4DA8B715.9080508@xxxxxxxx>, <BLU157-w51A8A73D5A656542F9AB13DA960@xxxxxxx>, <625BA99ED14B2D499DC4E29D8138F1505C7F2C5185@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> <BLU157-w5697DD78D0AA06E69BA116DA990@xxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AcwD4DM+tz8sFpRaSWmdGxOlOAqAwAAC6nsQ
Thread-topic: [Xen-devel] RE: Kernel BUG at arch/x86/mm/tlb.c:61

I think that should be fine. Note a later check:

    /* If this cpu still has a stale cr3 reference, then make sure
       it has been flushed. */
    if (percpu_read(xen_current_cr3) == __pa(mm->pgd))
        load_cr3(swapper_pg_dir);

This should ensure that the stale TLB is flushed if this cpu is still in lazy mode.

Thanks
Kevin

From: MaoXiaoyun [mailto:tinnycloud@xxxxxxxxxxx]
Sent: Tuesday, April 26, 2011 3:05 PM
To: Tian, Kevin; jeremy@xxxxxxxx
Cc: xen devel; giamteckchoon@xxxxxxxxx; konrad.wilk@xxxxxxxxxx
Subject: RE: [Xen-devel] RE: Kernel BUG at arch/x86/mm/tlb.c:61

Many thanks, Kevin. I agree on the race window.

One more thing. In my understanding, the CPU that sends out the IPI will unpin the
pagetable after receiving ACKs from all the other CPUs. If a CPU that receives the IPI
enters drop_other_mm_ref with TLBSTATE_OK and does nothing, is it possible that it
ends up using a stale pagetable (one that has been unpinned by the sender CPU)?
So do we need to flush the TLB when its state is TLBSTATE_OK?

    if (active_mm == mm) {
        if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
            load_cr3(mm->pgd);
        else
            leave_mm(smp_processor_id());
    }

> From: kevin.tian@xxxxxxxxx
> To: tinnycloud@xxxxxxxxxxx; jeremy@xxxxxxxx
> CC: xen-devel@xxxxxxxxxxxxxxxxxxx; giamteckchoon@xxxxxxxxx; konrad.wilk@xxxxxxxxxx
> Date: Tue, 26 Apr 2011 13:52:11 +0800
> Subject: RE: [Xen-devel] RE: Kernel BUG at arch/x86/mm/tlb.c:61
>
> >From: MaoXiaoyun
> >Sent: Monday, April 25, 2011 11:15 AM
> >> Date: Fri, 15 Apr 2011 14:22:29 -0700
> >> From: jeremy@xxxxxxxx
> >> To: tinnycloud@xxxxxxxxxxx
> >> CC: giamteckchoon@xxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx; konrad.wilk@xxxxxxxxxx
> >> Subject: Re: Kernel BUG at arch/x86/mm/tlb.c:61
> >>
> >> On 04/15/2011 05:23 AM, MaoXiaoyun wrote:
> >> > Hi:
> >> >
> >> > Could the crash be related to this patch?
> >> > http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commitdiff;h=45bfd7bfc6cf32f8e60bb91b32349f0b5090eea3
> >> >
> >> > The TLB state change to TLBSTATE_OK (mmu_context.h:40) now happens before
> >> > cpumask_clear_cpu (line 49).
> >> > Could it be that right after executing line 40 of mmu_context.h, the
> >> > CPU receives an IPI from another CPU to flush the mm, and the interrupt
> >> > handler then finds the TLB state to be TLBSTATE_OK, which conflicts?
> >>
> >> Does reverting it help?
> >>
> >> J
> >
> >Hi Jeremy:
> >
> > The latest test result shows that reverting didn't help.
> > The kernel panics at exactly the same place in tlb.c.
> >
> > I have a question about the TLB state. From the stack:
> > xen_do_hypervisor_callback-> xen_evtchn_do_upcall->... ->drop_other_mm_ref
> >
> > What should cpu_tlbstate.state be? Are TLBSTATE_OK and TLBSTATE_LAZY both possible?
> > That is, after a hypercall from userspace the state will be TLBSTATE_OK, and
> > if from kernel space the state will be TLBSTATE_LAZY?
> >
> > thanks.
>
> It looks like a bug in the drop_other_mm_ref implementation: the current TLB state
> should be checked before invoking leave_mm(). There's a window in the code below:
>
> <xen_drop_mm_ref>
>     /* Get the "official" set of cpus referring to our pagetable. */
>     if (!alloc_cpumask_var(&mask, GFP_ATOMIC)) {
>         for_each_online_cpu(cpu) {
>             if (!cpumask_test_cpu(cpu, mm_cpumask(mm))
>                 && per_cpu(xen_current_cr3, cpu) != __pa(mm->pgd))
>                 continue;
>             smp_call_function_single(cpu, drop_other_mm_ref, mm, 1);
>         }
>         return;
>     }
>
> There's a chance that by the time smp_call_function_single is invoked, the actual TLB
> state has already been updated on the other cpu. The upstream kernel patch you referred
> to earlier just makes this bug easier to hit. But even without this patch you may still
> hit the issue, which is why reverting the patch doesn't help.
>
> Could you try adding a check in drop_other_mm_ref?
>
>     if (active_mm == mm && percpu_read(cpu_tlbstate.state) != TLBSTATE_OK)
>         leave_mm(smp_processor_id());
>
> Once the interrupted context has TLBSTATE_OK, it implies that it will handle the
> TLB flush later, so there is no need for leave_mm from the interrupt handler; that's
> the assumption behind calling leave_mm.
>
> Thanks
> Kevin
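
For readers following the thread, the proposed check can be tried out in a small userspace model. This is only a sketch under stated assumptions, not the kernel code: cpu_model, mm_model, model_leave_mm, model_drop_other_mm_ref, ipi_hits_bug and the SWAPPER_PGD value are all hypothetical stand-ins, the BUG() in leave_mm is modelled as a flag, and load_cr3(swapper_pg_dir) is modelled as a plain assignment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the physical address of swapper_pg_dir. */
#define SWAPPER_PGD 0x1000UL

enum tlb_state { TLBSTATE_OK = 1, TLBSTATE_LAZY = 2 };

/* Stand-in for struct mm_struct; only the pgd matters here. */
struct mm_model {
    unsigned long pgd;
};

/* Models the per-cpu state that the thread discusses. */
struct cpu_model {
    enum tlb_state state;             /* cpu_tlbstate.state */
    const struct mm_model *active_mm; /* cpu_tlbstate.active_mm */
    unsigned long current_cr3;        /* xen_current_cr3 */
    bool bug_hit;                     /* set where leave_mm would BUG() */
};

/* Model of leave_mm(): it must not be called in TLBSTATE_OK. */
void model_leave_mm(struct cpu_model *cpu)
{
    if (cpu->state == TLBSTATE_OK) {
        cpu->bug_hit = true;          /* the reported BUG in tlb.c */
        return;
    }
    cpu->current_cr3 = SWAPPER_PGD;   /* models load_cr3(swapper_pg_dir) */
}

/* Model of drop_other_mm_ref(); check_state selects the proposed check. */
void model_drop_other_mm_ref(struct cpu_model *cpu,
                             const struct mm_model *mm, bool check_state)
{
    if (cpu->active_mm == mm &&
        (!check_state || cpu->state != TLBSTATE_OK))
        model_leave_mm(cpu);

    /* The later check: drop any stale cr3 reference to this mm. */
    if (cpu->current_cr3 == mm->pgd)
        cpu->current_cr3 = SWAPPER_PGD;
}

/* Delivers the IPI to a cpu in the given state; true if BUG() would fire. */
bool ipi_hits_bug(enum tlb_state state, bool check_state)
{
    struct mm_model mm = { .pgd = 0x2000UL };
    struct cpu_model cpu = {
        .state = state,
        .active_mm = &mm,
        .current_cr3 = mm.pgd,
        .bug_hit = false,
    };

    model_drop_other_mm_ref(&cpu, &mm, check_state);

    /* In this model the stale cr3 reference is always gone afterwards. */
    assert(cpu.current_cr3 != mm.pgd);
    return cpu.bug_hit;
}
```

In this model, ipi_hits_bug(TLBSTATE_OK, false) reports the BUG path and ipi_hits_bug(TLBSTATE_OK, true) does not, while the lazy case is unaffected by the check. The model does not say whether reloading cr3 from the interrupt handler is safe on a cpu that is actively running in mm; that question is left to the thread.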
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel