[Xen-devel] RE: Kernel BUG at arch/x86/mm/tlb.c:61 
| To: | <giamteckchoon@xxxxxxxxx> |  
| Subject: | [Xen-devel] RE: Kernel BUG at arch/x86/mm/tlb.c:61 |  
| From: | MaoXiaoyun <tinnycloud@xxxxxxxxxxx> |  
| Date: | Thu, 14 Apr 2011 19:16:37 +0800 |  
| Cc: | jeremy@xxxxxxxx, xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>,	konrad.wilk@xxxxxxxxxx |  
 Hi:
 I went through the code.
 At tlb.c:60, it looks like cpu_tlbstate.state is TLBSTATE_OK,
 which indicates user space, but the caller at mmu.c:1512
 (active_mm == mm) indicates kernel space. That is the conflict.
 
 The panicking CPU was handling an IPI, so could something be wrong
 with the CPU mask?
 
 thanks.
 
 ======arch/x86/mm/tlb.c===
 58 void leave_mm(int cpu)
 59 {
 60         if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
 61                 BUG();
 62         cpumask_clear_cpu(cpu,
 63                           mm_cpumask(percpu_read(cpu_tlbstate.active_mm)));
 64         load_cr3(swapper_pg_dir);
 65 }
 66 EXPORT_SYMBOL_GPL(leave_mm);
 
 ======arch/x86/xen/mmu.c===
 
 1502 #ifdef CONFIG_SMP
 1503 /* Another cpu may still have their %cr3 pointing at the pagetable, so
 1504    we need to repoint it somewhere else before we can unpin it. */
 1505 static void drop_other_mm_ref(void *info)
 1506 {
 1507         struct mm_struct *mm = info;
 1508         struct mm_struct *active_mm;
 1509
 1510         active_mm = percpu_read(cpu_tlbstate.active_mm);
 1511
 1512         if (active_mm == mm)
 1513                 leave_mm(smp_processor_id());
 1514
 1515         /* If this cpu still has a stale cr3 reference, then make sure
 1516            it has been flushed. */
 1517         if (percpu_read(xen_current_cr3) == __pa(mm->pgd))
 1518                 load_cr3(swapper_pg_dir);
 1519 }
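
 The two listings above can be modelled in user space to make the failing
 path explicit. This is only a sketch, not kernel code: the struct and
 function names are hypothetical, and the values TLBSTATE_OK = 1 and
 TLBSTATE_LAZY = 2 are assumed to match the kernel headers. It shows that
 the BUG() is reached only when the CPU's active_mm equals the mm being
 dropped and the CPU's state is TLBSTATE_OK rather than lazy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed to mirror the kernel's TLB state values; treat as an assumption. */
enum tlb_state { TLBSTATE_OK = 1, TLBSTATE_LAZY = 2 };

/* Hypothetical stand-in for the per-CPU cpu_tlbstate variable. */
struct cpu_tlbstate_model {
    const void *active_mm;   /* mm this CPU currently references */
    enum tlb_state state;    /* OK: actively using the mm; LAZY: lazy TLB mode */
};

/* Models the check at tlb.c:60-61: leave_mm() hits BUG() when state is OK. */
static bool leave_mm_would_bug(const struct cpu_tlbstate_model *ts)
{
    assert(ts != NULL);
    return ts->state == TLBSTATE_OK;
}

/* Models mmu.c:1512-1513: leave_mm() is only called when active_mm == mm. */
static bool drop_other_mm_ref_would_bug(const struct cpu_tlbstate_model *ts,
                                        const void *mm)
{
    assert(ts != NULL);
    if (ts->active_mm == mm)
        return leave_mm_would_bug(ts);
    return false;
}
```

 In this model the crash corresponds to calling
 drop_other_mm_ref_would_bug() on a CPU whose active_mm is the target mm
 while its state is still TLBSTATE_OK; a CPU in TLBSTATE_LAZY, or one
 referencing a different mm, passes through without hitting the check.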
 
 
 
 > Date: Thu, 14 Apr 2011 15:26:14 +0800
 > Subject: Re: Kernel BUG at arch/x86/mm/tlb.c:61
 > From: giamteckchoon@xxxxxxxxx
 > To: tinnycloud@xxxxxxxxxxx
 > CC: xen-devel@xxxxxxxxxxxxxxxxxxx; jeremy@xxxxxxxx; konrad.wilk@xxxxxxxxxx
 >
 > 2011/4/14 MaoXiaoyun <tinnycloud@xxxxxxxxxxx>:
 > > Hi:
 > >
 > >       I ran the test with "cpuidle=0 cpufreq=none", and two machines crashed.
 > >
 > > blktap_sysfs_destroy
 > > blktap_sysfs_destroy
 > > blktap_sysfs_create: adding attributes for dev ffff8800ad581000
 > > blktap_sysfs_create: adding attributes for dev ffff8800a48e3e00
 > > ------------[ cut here ]------------
 > > kernel BUG at arch/x86/mm/tlb.c:61!
 > > invalid opcode: 0000 [#1] SMP
 > > last sysfs file: /sys/block/tapdeve/dev
 > > CPU 0
 > > Modules linked in: 8021q garp blktap xen_netback xen_blkback blkback_pagemap nbd bridge stp llc autofs4 ipmi_devintf ipmi_si ipmi_msghandler lockd sunrpc bonding ipv6 xenfs dm_multipath video output sbs sbshc parport_pc lp parport ses enclosure snd_seq_dummy bnx2 serio_raw snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm i2c_i801 snd_timer i2c_core snd iTCO_wdt pata_acpi soundcore iTCO_vendor_support ata_generic snd_page_alloc pcspkr ata_piix shpchp mptsas mptscsih mptbase [last unloaded: freq_table]
 > > Pid: 8022, comm: khelper Not tainted 2.6.32.36xen #1 Tecal RH2285
 > > RIP: e030:[<ffffffff8103a3cb>]  [<ffffffff8103a3cb>] leave_mm+0x15/0x46
 > > RSP: e02b:ffff88002803ee48  EFLAGS: 00010046
 > > RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff81675980
 > > RDX: ffff88002803ee78 RSI: 0000000000000000 RDI: 0000000000000000
 > > RBP: ffff88002803ee48 R08: ffff8800a4929000 R09: dead000000200200
 > > R10: dead000000100100 R11: ffffffff81447292 R12: ffff88012ba07b80
 > > R13: ffff880028046020 R14: 00000000000004fb R15: 0000000000000000
 > > FS:  00007f410af416e0(0000) GS:ffff88002803b000(0000) knlGS:0000000000000000
 > > CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
 > > CR2: 0000000000469000 CR3: 00000000ad639000 CR4: 0000000000002660
 > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 > > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 > > Process khelper (pid: 8022, threadinfo ffff8800a4846000, task ffff8800a9ed0000)
 > > Stack:
 > >  ffff88002803ee68 ffffffff8100e4a4 0000000000000001 ffff880097de3b88
 > > <0> ffff88002803ee98 ffffffff81087224 ffff88002803ee78 ffff88002803ee78
 > > <0> ffff88015f808180 00000000000004fb ffff88002803eea8 ffffffff810100e8
 > > Call Trace:
 > >  <IRQ>
 > >  [<ffffffff8100e4a4>] drop_other_mm_ref+0x2a/0x53
 > >  [<ffffffff81087224>] generic_smp_call_function_single_interrupt+0xd8/0xfc
 > >  [<ffffffff810100e8>] xen_call_function_single_interrupt+0x13/0x28
 > >  [<ffffffff810a936a>] handle_IRQ_event+0x66/0x120
 > >  [<ffffffff810aac5b>] handle_percpu_irq+0x41/0x6e
 > >  [<ffffffff8128c1a8>] __xen_evtchn_do_upcall+0x1ab/0x27d
 > >  [<ffffffff8128dcf9>] xen_evtchn_do_upcall+0x33/0x46
 > >  [<ffffffff81013efe>] xen_do_hypervisor_callback+0x1e/0x30
 > >  <EOI>
 > >  [<ffffffff81447292>] ? _spin_unlock_irqrestore+0x15/0x17
 > >  [<ffffffff8100f8af>] ? xen_restore_fl_direct_end+0x0/0x1
 > >  [<ffffffff81113f75>] ? flush_old_exec+0x3ac/0x500
 > >  [<ffffffff81150dc9>] ? load_elf_binary+0x0/0x17ef
 > >  [<ffffffff81150dc9>] ? load_elf_binary+0x0/0x17ef
 > >  [<ffffffff81151161>] ? load_elf_binary+0x398/0x17ef
 > >  [<ffffffff81042fcf>] ? need_resched+0x23/0x2d
 > >  [<ffffffff811f463c>] ? process_measurement+0xc0/0xd7
 > >  [<ffffffff81150dc9>] ? load_elf_binary+0x0/0x17ef
 > >  [<ffffffff81113098>] ? search_binary_handler+0xc8/0x255
 > >  [<ffffffff81114366>] ? do_execve+0x1c3/0x29e
 > >  [<ffffffff8101155d>] ? sys_execve+0x43/0x5d
 > >  [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f
 > >  [<ffffffff81013e28>] ? kernel_execve+0x68/0xd0
 > >  [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f
 > >  [<ffffffff8100f8af>] ? xen_restore_fl_direct_end+0x0/0x1
 > >  [<ffffffff8106fb64>] ? ____call_usermodehelper+0x113/0x11e
 > >  [<ffffffff81013daa>] ? child_rip+0xa/0x20
 > >  [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f
 > >  [<ffffffff81012f91>] ? int_ret_from_sys_call+0x7/0x1b
 > >  [<ffffffff8101371d>] ? retint_restore_args+0x5/0x6
 > >  [<ffffffff81013da0>] ? child_rip+0x0/0x20
 > > Code: 41 5e 41 5f c9 c3 55 48 89 e5 0f 1f 44 00 00 e8 17 ff ff ff c9 c3 55 48 89 e5 0f 1f 44 00 00 65 8b 04 25 c8 55 01 00 ff c8 75 04 <0f> 0b eb fe 65 48 8b 34 25 c0 55 01 00 48 81 c6 b8 02 00 00 e8
 > > RIP  [<ffffffff8103a3cb>] leave_mm+0x15/0x46
 > >  RSP <ffff88002803ee48>
 > > ---[ end trace 1522f17fdfc9162d ]---
 > > Kernel panic - not syncing: Fatal exception in interrupt
 > > Pid: 8022, comm: khelper Tainted: G      D    2.6.32.36xen #1
 > > Call Trace:
 > >  <IRQ>  [<ffffffff8105682e>] panic+0xe0/0x19a
 > >  [<ffffffff8144006a>] ? init_amd+0x296/0x37a
 > >  [<ffffffff8100f169>] ? xen_force_evtchn_callback+0xd/0xf
 > >  [<ffffffff8100f8c2>] ? check_events+0x12/0x20
 > >  [<ffffffff8100f8af>] ? xen_restore_fl_direct_end+0x0/0x1
 > >  [<ffffffff81056487>] ? print_oops_end_marker+0x23/0x25
 > >  [<ffffffff81448165>] oops_end+0xb6/0xc6
 > >  [<ffffffff810166e5>] die+0x5a/0x63
 > >  [<ffffffff81447a3c>] do_trap+0x115/0x124
 > >  [<ffffffff810148e6>] do_invalid_op+0x9c/0xa5
 > >  [<ffffffff8103a3cb>] ? leave_mm+0x15/0x46
 > >  [<ffffffff8100f6e6>] ? xen_clocksource_read+0x21/0x23
 > >  [<ffffffff8100f258>] ? HYPERVISOR_vcpu_op+0xf/0x11
 > >  [<ffffffff8100f753>] ? xen_vcpuop_set_next_event+0x52/0x67
 > >  [<ffffffff81013b3b>] invalid_op+0x1b/0x20
 > >  [<ffffffff81447292>] ? _spin_unlock_irqrestore+0x15/0x17
 > >  [<ffffffff8103a3cb>] ? leave_mm+0x15/0x46
 > >  [<ffffffff8100e4a4>] drop_other_mm_ref+0x2a/0x53
 > >  [<ffffffff81087224>] generic_smp_call_function_single_interrupt+0xd8/0xfc
 > >  [<ffffffff810100e8>] xen_call_function_single_interrupt+0x13/0x28
 > >  [<ffffffff810a936a>] handle_IRQ_event+0x66/0x120
 > >  [<ffffffff810aac5b>] handle_percpu_irq+0x41/0x6e
 > >  [<ffffffff8128c1a8>] __xen_evtchn_do_upcall+0x1ab/0x27d
 > >  [<ffffffff8128dcf9>] xen_evtchn_do_upcall+0x33/0x46
 > >  [<ffffffff81013efe>] xen_do_hypervisor_callback+0x1e/0x30
 > >  <EOI>  [<ffffffff81447292>] ? _spin_unlock_irqrestore+0x15/0x17
 > >  [<ffffffff8100f8af>] ? xen_restore_fl_direct_end+0x0/0x1
 > >  [<ffffffff81113f75>] ? flush_old_exec+0x3ac/0x500
 > >  [<ffffffff81150dc9>] ? load_elf_binary+0x0/0x17ef
 > >  [<ffffffff81150dc9>] ? load_elf_binary+0x0/0x17ef
 > >  [<ffffffff81151161>] ? load_elf_binary+0x398/0x17ef
 > >  [<ffffffff81042fcf>] ? need_resched+0x23/0x2d
 > >  [<ffffffff811f463c>] ? process_measurement+0xc0/0xd7
 > >  [<ffffffff81150dc9>] ? load_elf_binary+0x0/0x17ef
 > >  [<ffffffff81113098>] ? search_binary_handler+0xc8/0x255
 > >  [<ffffffff81114366>] ? do_execve+0x1c3/0x29e
 > >  [<ffffffff8101155d>] ? sys_execve+0x43/0x5d
 > >  [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f
 > >  [<ffffffff81013e28>] ? kernel_execve+0x68/0xd0
 > >  [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f
 > >  [<ffffffff8100f8af>] ? xen_restore_fl_direct_end+0x0/0x1
 > >  [<ffffffff8106fb64>] ? ____call_usermodehelper+0x113/0x11e
 > >  [<ffffffff81013daa>] ? child_rip+0xa/0x20
 > >  [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f
 > >  [<ffffffff81012f91>] ? int_ret_from_sys_call+0x7/0x1b
 > >  [<ffffffff8101371d>] ? retint_restore_args+0x5/0x6
 > >  [<ffffffff81013da0>] ? child_rip+0x0/0x20
 > > (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
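
 Each entry in the trace above has the form symbol+offset/size, so
 "leave_mm+0x15/0x46" means the fault is 0x15 bytes into a function that
 is 0x46 bytes long. As a small sketch for pulling those fields apart when
 comparing traces from several crashed machines (the struct and function
 names here are hypothetical, not part of any kernel tool):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one "symbol+0xOFF/0xSIZE" trace entry. */
struct trace_sym {
    char name[64];          /* symbol name, e.g. "leave_mm" */
    unsigned long offset;   /* byte offset of the address within the symbol */
    unsigned long size;     /* total size of the symbol in bytes */
};

/* Parses "name+0xOFF/0xSIZE"; returns 0 on success, -1 on malformed input. */
static int parse_trace_sym(const char *s, struct trace_sym *out)
{
    assert(s != NULL && out != NULL);
    /* %63[^+] reads the name up to '+'; %lx accepts the 0x prefix. */
    if (sscanf(s, "%63[^+]+%lx/%lx", out->name, &out->offset, &out->size) != 3)
        return -1;
    return 0;
}
```

 Applied to the RIP line, this gives the name leave_mm with offset 0x15
 and size 0x46, which is the same on every machine only if they run the
 same kernel build.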
 >
  >
 > >> Date: Tue, 12 Apr 2011 06:00:00 -0400
 > >> From: konrad.wilk@xxxxxxxxxx
 > >> To: tinnycloud@xxxxxxxxxxx
 > >> CC: xen-devel@xxxxxxxxxxxxxxxxxxx; giamteckchoon@xxxxxxxxx;
 > >> jeremy@xxxxxxxx
 > >> Subject: Re: Kernel BUG at arch/x86/mm/tlb.c:61
 > >>
 > >> On Tue, Apr 12, 2011 at 05:11:51PM +0800, MaoXiaoyun wrote:
 > >> >
 > >> > Hi :
 > >> >
 > >> > We are using pvops kernel 2.6.32.36 + xen 4.0.1, and have hit a kernel
 > >> > panic bug.
 > >> >
 > >> > 2.6.32.36 Kernel:
 > >> > http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=bb1a15e55ec665a64c8a9c6bd699b1f16ac01ff4
 > >> > Xen 4.0.1 http://xenbits.xen.org/hg/xen-4.0-testing.hg/rev/b536ebfba183
 > >> >
 > >> > Our test is simple: 24 HVMs (Win2003) on a single host, each HVM
 > >> > restarting in a loop every 15 minutes.
 > >>
 > >> What is the storage that you are using for your guests? AoE? Local disks?
 > >>
 > >> > About 17 machines are involved in the test; after a 10-hour run, one
 > >> > hit a crash at arch/x86/mm/tlb.c:61
 > >> >
 > >> > Currently I am trying "cpuidle=0 cpufreq=none" tests based on Teck's
 > >> > suggestion.
 > >> >
 > >> > Any comments, thanks.
 > >> >
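
 The "Code:" line of the quoted oops marks the faulting byte with angle
 brackets. As far as I can tell, the bytes "ff c8 75 04 <0f> 0b" decode as
 dec %eax; jne; ud2, which matches a compare of the state against
 TLBSTATE_OK followed by BUG(), and explains the "invalid opcode" report.
 A small sketch that checks whether the marked instruction in such a line
 is UD2 (0f 0b); the function name is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Returns true if the faulting instruction in an oops "Code:" string,
 * marked as <xx>, starts with the two-byte UD2 opcode 0f 0b.
 */
static bool oops_code_is_ud2(const char *code)
{
    assert(code != NULL);

    /* The faulting byte is the only one wrapped in angle brackets. */
    const char *mark = strchr(code, '<');
    if (mark == NULL)
        return false;

    char *end;
    unsigned long first = strtoul(mark + 1, &end, 16);
    if (end == mark + 1 || *end != '>')
        return false;

    /* The byte after the closing bracket is the second opcode byte. */
    unsigned long second = strtoul(end + 1, &end, 16);
    return first == 0x0f && second == 0x0b;
}
```

 For the oops quoted in this message the function returns true, which is
 consistent with the crash being the BUG() at tlb.c:61 and not a stray
 jump into bad memory.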