Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!

To:	Pasi Kärkkäinen <pasik@xxxxxx>
Subject:	Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
From:	Teck Choon Giam <giamteckchoon@xxxxxxxxx>
Date:	Wed, 5 Jan 2011 22:56:45 +0800
Cc:	xen-devel@xxxxxxxxxxxxxxxxxxx, Christophe Saout <christophe@xxxxxxxx>, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Delivery-date:	Wed, 05 Jan 2011 06:57:45 -0800
In-reply-to:	<20110105105124.GZ2754@xxxxxxxxxxx>
References:	<AANLkTi=Hwjooo43FiLPAAGzzOTG440ij_QsEqks6ECVv@xxxxxxxxxxxxxx> <20101227155314.GG3728@xxxxxxxxxxxx> <AANLkTikNvKGc78HQOMtVfi=Q+r8r92=svzZcMLQ2xojQ@xxxxxxxxxxxxxx> <20101228104256.GJ2754@xxxxxxxxxxx> <1294153817.24719.3.camel@xxxxxxxxxxxxxxxxxxxx> <20110105105124.GZ2754@xxxxxxxxxxx>

On Wed, Jan 5, 2011 at 6:51 PM, Pasi Kärkkäinen <pasik@xxxxxx> wrote:

On Tue, Jan 04, 2011 at 04:10:17PM +0100, Christophe Saout wrote:
> Hello thread,
>
> hijacking this thread since I am running into the same issue on a new
> machine.
>
> > > > While doing LVM snapshot for migration and get the following:
> > > >
> > > > Dec 26 15:58:29 xen01 kernel: ------------[ cut here ]------------
> > > > Dec 26 15:58:29 xen01 kernel: kernel BUG at arch/x86/xen/mmu.c:1860!
> > > > Dec 26 15:58:29 xen01 kernel: invalid opcode: 0000 [#1] SMP
> > > > Dec 26 15:58:29 xen01 kernel: last sysfs file: /sys/block/dm-26/dev
> > > > Dec 26 15:58:29 xen01 kernel: CPU 0
> > > > Dec 26 15:58:29 xen01 kernel: Modules linked in: ipt_MASQUERADE
> >
> > It would be very good to track this down and get it fixed..
> > hopefully you're able to help a bit and try some things to debug it.
> >
> > Konrad maybe has some ideas to try..
>
> I am seeing this with an lvcreate here, so I guess it's somehow related
> to device-mapper stuff in general.
>

Sorry if this was already stated earlier..
what are the exact steps to reproduce? I could try reproducing it at some point..

-- Pasi

Did you try the script I posted? It assumes you already have LVs for your domUs in your VG; if not, they are easy to create.

The idea is to create a snapshot, mount it, umount it, and remove the snapshot. Repeating this cycle in a loop will trigger this BUG.
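
For anyone who does not have the posted script at hand, the cycle can be sketched roughly as below. This is not the posted test.sh, only a minimal illustration of the same idea. The VG name, LV name, snapshot name, mount point and snapshot size are all placeholders, and the mount point is assumed to exist already. It defaults to a dry run that only lists the commands; running it for real needs root and a dom0 you can afford to crash.

```python
import subprocess


def snapshot_cycle(vg, lv, snap="bugtest-snap", mountpoint="/mnt/bugtest", size="1G"):
    """Return the commands for one create/mount/umount/remove cycle.

    All names and the snapshot size are hypothetical placeholders.
    """
    origin = f"/dev/{vg}/{lv}"
    snap_dev = f"/dev/{vg}/{snap}"
    return [
        ["lvcreate", "-s", "-L", size, "-n", snap, origin],  # create the snapshot
        ["mount", snap_dev, mountpoint],                      # mount it
        ["umount", mountpoint],                               # umount it
        ["lvremove", "-f", snap_dev],                         # remove the snapshot
    ]


def run_cycles(vg, lv, count, dry_run=True):
    """Repeat the cycle `count` times and return every command in order.

    With dry_run=True (the default) nothing is executed.
    """
    executed = []
    for _ in range(count):
        for cmd in snapshot_cycle(vg, lv):
            if not dry_run:
                subprocess.run(cmd, check=True)  # stop at the first failure
            executed.append(cmd)
    return executed


if __name__ == "__main__":
    # Dry run: print two cycles' worth of commands for a placeholder VG/LV.
    for cmd in run_cycles("vg0", "domu-disk", count=2):
        print(" ".join(cmd))
```

Calling run_cycles(..., count=100, dry_run=False) would correspond to the "loop 100" case, assuming the placeholder names are replaced with a real VG and LV.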

Here is the latest crash, with serial console output. It doesn't take long to reproduce with sh test.sh loop 100:

EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
(XEN) mm.c:2364:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 1f7e29 (pfn 25cb4)
(XEN) mm.c:2733:d0 Error while pinning mfn 1f7e29
------------[ cut here ]------------
kernel BUG at arch/x86/xen/mmu.c:1860!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:05:05.0/local_cpus
CPU 0
Modules linked in: ext4 jbd2 crc16 gfs2 dlm configfs xt_physdev iptable_filter ip_tables x_tables bridge stp be2iscsi iscsi_]
Pid: 6526, comm: dmsetup Not tainted 2.6.32.27-0.xen.pvops.choon.centos5 #1 PowerEdge 860
RIP: e030:[<ffffffff8100cb5b>] [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59
RSP: e02b:ffff88001d8dfdc8 EFLAGS: 00010282
RAX: 00000000ffffffea RBX: 0000000000025cb4 RCX: 000000000000012e
RDX: 00000000deadbeef RSI: 00000000deadbeef RDI: 00000000deadbeef
RBP: ffff88001d8dfde8 R08: 00000000000005a0 R09: ffff880000000000
R10: 00000000deadbeef R11: 0000003db6814e00 R12: 0000000000000003
R13: 0000000000025cb4 R14: ffff88002ffe8440 R15: 0000003db7616250
FS: 00007fb54068b710(0000) GS:ffff88002804f000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003db7616250 CR3: 000000002ff88000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process dmsetup (pid: 6526, threadinfo ffff88001d8de000, task ffff88002ffe8440)
Stack:
0000000000000000 00000000001f7e29 000000013f009e18 0000000000025cb4
<0> ffff88001d8dfe08 ffffffff8100e07c ffff88001dc3d040 ffff88001dc6cdd8
<0> ffff88001d8dfe18 ffffffff8100e0af ffff88001d8dfe58 ffffffff810a402f
Call Trace:
[<ffffffff8100e07c>] xen_alloc_ptpage+0x64/0x69
[<ffffffff8100e0af>] xen_alloc_pte+0xe/0x10
[<ffffffff810a402f>] __pte_alloc+0x70/0xce
[<ffffffff810a41cd>] handle_mm_fault+0x140/0x8b9
[<ffffffff8131be4d>] do_page_fault+0x252/0x2e2
[<ffffffff81319dd5>] page_fault+0x25/0x30
Code: 48 b8 ff ff ff ff ff ff ff 7f 48 21 c2 48 89 55 e8 48 8d 7d e0 be 01 00 00 00 31 d2 41 ba f0 7f 00 00 e8 e9 c7 ff ff 8
RIP [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59
RSP <ffff88001d8dfdc8>
---[ end trace b0a2643219f652eb ]---
BUG: soft lockup - CPU#0 stuck for 61s! [dmsetup:6526]
Modules linked in: ext4 jbd2 crc16 gfs2 dlm configfs xt_physdev iptable_filter ip_tables x_tables bridge stp be2iscsi iscsi_]
CPU 0:
Modules linked in: ext4 jbd2 crc16 gfs2 dlm configfs xt_physdev iptable_filter ip_tables x_tables bridge stp be2iscsi iscsi_]
Pid: 6526, comm: dmsetup Tainted: G D 2.6.32.27-0.xen.pvops.choon.centos5 #1 PowerEdge 860
RIP: e030:[<ffffffff813199d3>] [<ffffffff813199d3>] _spin_lock+0x19/0x20
RSP: e02b:ffff88001d8dfa68 EFLAGS: 00000297
RAX: 0000000000000023 RBX: 0000000025f91000 RCX: 0000000000000004
RDX: 0000000000000022 RSI: 0000000000000004 RDI: ffff88001dc3d0c0
RBP: ffff88001d8dfa68 R08: 0000000000000000 R09: ffffffff816dd100
R10: ffff88003e7424c8 R11: 0000000000000020 R12: ffff88001dc3d040
R13: 0000000000000004 R14: ffff88001dc3d0a0 R15: ffffffff816dd100
FS: 00007fb54068b710(0000) GS:ffff88002804f000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003db7616250 CR3: 0000000001001000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
[<ffffffff811660dd>] ? free_cpumask_var+0x9/0xb
[<ffffffff8100dde1>] xen_exit_mmap+0x199/0x1d7
[<ffffffff810a8137>] exit_mmap+0x5f/0x14b
[<ffffffff81048648>] mmput+0x46/0xb2
[<ffffffff8104c552>] exit_mm+0xfd/0x108
[<ffffffff8100f799>] ? xen_irq_enable_direct_end+0x0/0x7
[<ffffffff8104d7ee>] do_exit+0x1f3/0x67b
[<ffffffff8131a908>] oops_end+0xba/0xc2
[<ffffffff810163a1>] die+0x55/0x5e
[<ffffffff8131a192>] do_trap+0x110/0x11f
[<ffffffff810142c8>] do_invalid_op+0x97/0xa0
[<ffffffff8100cb5b>] ? pin_pagetable_pfn+0x53/0x59
[<ffffffff810138bb>] invalid_op+0x1b/0x20
[<ffffffff8100cb5b>] ? pin_pagetable_pfn+0x53/0x59
[<ffffffff8100cb57>] ? pin_pagetable_pfn+0x4f/0x59
[<ffffffff8100e07c>] xen_alloc_ptpage+0x64/0x69
[<ffffffff8100e0af>] xen_alloc_pte+0xe/0x10
[<ffffffff810a402f>] __pte_alloc+0x70/0xce
[<ffffffff810a41cd>] handle_mm_fault+0x140/0x8b9
[<ffffffff8131be4d>] do_page_fault+0x252/0x2e2
[<ffffffff81319dd5>] page_fault+0x25/0x30
Kernel panic - not syncing: softlockup: hung tasks
Pid: 6526, comm: dmsetup Tainted: G D 2.6.32.27-0.xen.pvops.choon.centos5 #1
Call Trace:
<IRQ> [<ffffffff8104aa97>] panic+0xa0/0x15f
[<ffffffff81319dd5>] ? page_fault+0x25/0x30
[<ffffffff8101640f>] ? show_trace_log_lvl+0x4c/0x58
[<ffffffff8101642b>] ? show_trace+0x10/0x12
[<ffffffff81011755>] ? show_regs+0x44/0x48
[<ffffffff8107f202>] softlockup_tick+0x173/0x182
[<ffffffff810539bf>] run_local_timers+0x18/0x1a
[<ffffffff81053bde>] update_process_times+0x30/0x54
[<ffffffff81068821>] tick_sched_timer+0x70/0x99
[<ffffffff8105f52e>] __run_hrtimer+0x53/0xb3
[<ffffffff8105f772>] hrtimer_interrupt+0xae/0x192
[<ffffffff8100f3a3>] xen_timer_interrupt+0x37/0x181
[<ffffffff81082898>] ? check_for_new_grace_period+0x97/0xa5
[<ffffffff811c870f>] ? unmask_evtchn+0x34/0xd6
[<ffffffff8108318c>] ? __rcu_process_callbacks+0xf2/0x2ae
[<ffffffff8107f708>] handle_IRQ_event+0x2d/0xb7
[<ffffffff81081079>] handle_percpu_irq+0x3c/0x69
[<ffffffff811c8640>] __xen_evtchn_do_upcall+0xe1/0x168
[<ffffffff811c92d1>] xen_evtchn_do_upcall+0x2e/0x41
[<ffffffff81013c7e>] xen_do_hypervisor_callback+0x1e/0x30
<EOI> [<ffffffff813199d3>] ? _spin_lock+0x19/0x20
[<ffffffff811660dd>] ? free_cpumask_var+0x9/0xb
[<ffffffff8100dde1>] ? xen_exit_mmap+0x199/0x1d7
[<ffffffff810a8137>] ? exit_mmap+0x5f/0x14b
[<ffffffff81048648>] ? mmput+0x46/0xb2
[<ffffffff8104c552>] ? exit_mm+0xfd/0x108
[<ffffffff8100f799>] ? xen_irq_enable_direct_end+0x0/0x7
[<ffffffff8104d7ee>] ? do_exit+0x1f3/0x67b
[<ffffffff8131a908>] ? oops_end+0xba/0xc2
[<ffffffff810163a1>] ? die+0x55/0x5e
[<ffffffff8131a192>] ? do_trap+0x110/0x11f
[<ffffffff810142c8>] ? do_invalid_op+0x97/0xa0
[<ffffffff8100cb5b>] ? pin_pagetable_pfn+0x53/0x59
[<ffffffff810138bb>] ? invalid_op+0x1b/0x20
[<ffffffff8100cb5b>] ? pin_pagetable_pfn+0x53/0x59
[<ffffffff8100cb57>] ? pin_pagetable_pfn+0x4f/0x59
[<ffffffff8100e07c>] ? xen_alloc_ptpage+0x64/0x69
[<ffffffff8100e0af>] ? xen_alloc_pte+0xe/0x10
[<ffffffff810a402f>] ? __pte_alloc+0x70/0xce
[<ffffffff810a41cd>] ? handle_mm_fault+0x140/0x8b9
[<ffffffff8131be4d>] ? do_page_fault+0x252/0x2e2
[<ffffffff81319dd5>] ? page_fault+0x25/0x30
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.
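
In case it helps with reading the two hex values in the "(XEN) mm.c ... Bad type" line and the RAX value in the first register dump, here is a small decoder. It is only a sketch: the bit positions are an assumption taken from the x86-64 type_info layout in Xen 4.x (xen/include/asm-x86/mm.h) and may not match other hypervisor versions, and the helper names are made up for illustration.

```python
# Assumed layout of the 64-bit type_info word (Xen 4.x, x86-64):
#   bits 63-60: page type, bit 59: pinned, bit 58: validated,
#   low 55 bits: type use count.
PGT_TYPES = {
    0: "none",
    1: "l1_page_table",
    2: "l2_page_table",
    3: "l3_page_table",
    4: "l4_page_table",
    5: "seg_desc_page",
    7: "writable_page",
}


def decode_type_info(value):
    """Split a type_info word into its fields under the assumed layout."""
    return {
        "type": PGT_TYPES.get((value >> 60) & 0xF, "unknown"),
        "pinned": bool((value >> 59) & 1),
        "validated": bool((value >> 58) & 1),
        "count": value & ((1 << 55) - 1),
    }


def to_signed32(value):
    """Interpret the low 32 bits of a register value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


if __name__ == "__main__":
    print("saw:", decode_type_info(0x7400000000000001))
    print("exp:", decode_type_info(0x1000000000000000))
    print("RAX:", to_signed32(0x00000000FFFFFFEA))  # 22 is EINVAL on Linux
```

Under that assumed layout, the "saw" value decodes as a validated writable page with a use count of 1, while the "exp" value is an L1 page table. That would be consistent with the page still having a writable mapping somewhere when dom0 asked to pin it. RAX read as a signed 32-bit value is -22.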

Thanks.

Kindest regards,
Giam Teck Choon

