WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
xen-devel

Re: [Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860! - idea

Hi guys,

This thread has gone quiet for a while and I was wondering if a solution
had been found?

I'm currently running the packaged version of Xen 4.0.1 in Debian
Squeeze and everything runs well, except for the random crashing when
using LVM.

I use LVM for the disk partitions, and use live snapshots as part of our
backup routine.  That is, create snapshot -> mount snapshot -> rsync ->
umount snapshot -> remove snapshot.
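
For anyone trying to reproduce this, the cycle above can be sketched roughly as follows. This is only an illustration, not our actual backup script: the volume group, snapshot name, snapshot size, mount point and rsync destination are all made-up placeholders, and by default the function only prints the commands instead of running them.

```python
import subprocess


def snapshot_backup_commands(vg, lv, dest, snap="backup_snap",
                             size="1G", mnt="/mnt/backup_snap"):
    """Build the five commands of one snapshot backup cycle.

    All names and sizes are hypothetical placeholders.
    """
    origin = f"/dev/{vg}/{lv}"
    snap_dev = f"/dev/{vg}/{snap}"
    return [
        # create snapshot
        ["lvcreate", "--snapshot", "--size", size, "--name", snap, origin],
        # mount snapshot (read-only)
        ["mount", "-o", "ro", snap_dev, mnt],
        # rsync the snapshot contents to the backup destination
        ["rsync", "-a", f"{mnt}/", dest],
        # umount snapshot
        ["umount", mnt],
        # remove snapshot
        ["lvremove", "-f", snap_dev],
    ]


def run_backup(vg, lv, dest, dry_run=True):
    """Print the commands (dry run) or execute them in order."""
    for argv in snapshot_backup_commands(vg, lv, dest):
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True aborts the cycle on the first failing command
            subprocess.run(argv, check=True)
```

Note that with check=True a failure part-way through leaves the snapshot in place (and possibly mounted), so it has to be cleaned up by hand.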

Cheers,

Dave Hunter.

On Mon, 2011-03-28 at 20:29 +0800, Teck Choon Giam wrote:
> On Mon, Mar 28, 2011 at 7:37 PM, Andreas Olsowski
> <andreas.olsowski@xxxxxxxxxxx> wrote:
> >
> >>  - turn on CONFIG_DEBUG_PAGEALLOC
> >>  - turn on CONFIG_DEBUG_LIST
> >>  - turn on CONFIG_DEBUG_KMEMLEAK
> >>  - turn on CONFIG_JBD_DEBUG, CONFIG_JBD2_DEBUG
> >>  - turn on CONFIG_SLUB_DEBUG_ON
> >
> > After I enabled those options (I don't use SLUB, I use SLAB) I no longer
> > encounter any errors.
> >
> > I completed 1000 loops of snapshot/mount/umount/remove snapshot.
> 
> Did you try with just CONFIG_DEBUG_PAGEALLOC=y, leaving the rest of
> your config unchanged?  All my testing narrows it down to
> CONFIG_DEBUG_PAGEALLOC=y preventing this BUG.
> 
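
To compare configs between machines, a small helper like the one below could report which of the debug options discussed above are actually enabled in a kernel config. This is a minimal sketch: it assumes the usual .config text format ("CONFIG_FOO=y" versus "# CONFIG_FOO is not set"), and where the config file lives on a given system is left to the caller.

```python
import re

# The options named in this thread.
DEBUG_OPTIONS = [
    "CONFIG_DEBUG_PAGEALLOC",
    "CONFIG_DEBUG_LIST",
    "CONFIG_DEBUG_KMEMLEAK",
    "CONFIG_JBD_DEBUG",
    "CONFIG_JBD2_DEBUG",
    "CONFIG_SLUB_DEBUG_ON",
]

# Matches lines such as "CONFIG_DEBUG_LIST=y" (built in) or "=m" (module).
ENABLED = re.compile(r"^(CONFIG_\w+)=(y|m)$")


def enabled_options(config_text, names=DEBUG_OPTIONS):
    """Return {option: bool} for the given kernel config text."""
    state = {name: False for name in names}
    for line in config_text.splitlines():
        m = ENABLED.match(line.strip())
        # "# CONFIG_FOO is not set" lines do not match and stay False
        if m and m.group(1) in state:
            state[m.group(1)] = True
    return state
```

Calling enabled_options() on the text of two configs and comparing the results shows at a glance whether only CONFIG_DEBUG_PAGEALLOC differs between them.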
> >
> >
> > Without those options in 2.6.32.35 I hit a different bug earlier today:
> >
> > But you really have to be patient to see some output, because lvremove will
> > hang for quite a while:
> > (a "while" being roughly the time it takes to: wait 5 min for
> > error, leave office, get coffee, smoke cigarette, go to restroom, return to
> > office, finally see error)
> >
> > kernel: BUG: unable to handle kernel paging request
> > ...
> > kernel: RIP  [<ffffffff8100f2bf>] xen_set_pmd+0x2f/0xb0
> > syslog/dmesg output is attached as crash.2.6.32.35-xen_01 or available at:
> > http://pastebin.com/Ad8MhUzD
> 
> I hit this before:
> 
> # grep 'xen_set_pmd' /var/log/messages*
> /var/log/messages:Mar 27 09:31:14 xen05 kernel: IP:
> [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
> /var/log/messages:Mar 27 09:31:14 xen05 kernel: RIP:
> e030:[<ffffffff8100e2d4>]  [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
> /var/log/messages:Mar 27 09:31:14 xen05 kernel: RIP
> [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
> /var/log/messages:Mar 27 09:06:10 xen05 kernel: IP:
> [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
> /var/log/messages:Mar 27 09:06:10 xen05 kernel: RIP:
> e030:[<ffffffff8100e2d4>]  [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
> /var/log/messages:Mar 27 09:06:10 xen05 kernel: RIP
> [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
> /var/log/messages:Mar 27 15:18:57 xen05 kernel: IP:
> [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
> /var/log/messages:Mar 27 15:18:57 xen05 kernel: RIP:
> e030:[<ffffffff8100e2d4>]  [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
> /var/log/messages:Mar 27 15:18:57 xen05 kernel: RIP
> [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
> /var/log/messages.1:Mar 23 11:00:16 xen05 kernel: IP:
> [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
> /var/log/messages.1:Mar 23 11:00:16 xen05 kernel: RIP:
> e030:[<ffffffff8100e2d4>]  [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
> /var/log/messages.1:Mar 23 11:00:17 xen05 kernel: RIP
> [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
> 
> But I am unable to reproduce it with CONFIG_DEBUG_PAGEALLOC=y.
> 
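
Since each oops logs the faulting symbol several times, counting distinct crashes from grep output like the above is easier with a little script. The sketch below assumes the standard syslog prefix ("Mar 27 09:31:14 host kernel:") on unwrapped lines, and treats all matching lines within the same minute as one crash; that grouping is my own assumption, not something the kernel guarantees.

```python
import re
from collections import Counter

# Captures "Mon DD HH:MM" from a syslog line written by the kernel.
STAMP = re.compile(r"([A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}):\d{2} \S+ kernel:")


def crashes_by_minute(lines, symbol="xen_set_pmd"):
    """Count log lines mentioning `symbol`, grouped by minute."""
    seen = Counter()
    for line in lines:
        if symbol not in line:
            continue
        m = STAMP.search(line)
        if m:
            seen[m.group(1)] += 1
    return seen
```

The number of keys in the result is then the number of separate incidents, for example len(crashes_by_minute(open(path))) for a given log file path.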
> >
> > After that happened I recompiled the kernel without rebooting the machine
> > first and encountered system_call_fastpath as the last call once more, as shown
> > in crash.2.6.32.35-xen_02 or http://pastebin.com/kB38W5mp
> 
> I hit this at least once, but was unable to with CONFIG_DEBUG_PAGEALLOC=y:
> 
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: ------------[ cut here
> ]------------
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: kernel BUG at
> arch/x86/xen/mmu.c:1872!
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: invalid opcode: 0000 [#1] SMP
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: last sysfs file:
> /sys/block/sdd/dev
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: CPU 2
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: Modules linked in:
> ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
> xt_state nf_conntrack ipt_REJECT xt_tcpudp xt_physdev iptable_filter
> ip_tables x_tables bridge stp be2iscsi iscsi_tcp bnx2i cnic uio ipv6
> cxgb3i cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi
> dm_multipath scsi_dh video backlight output sbs sbshc power_meter
> hwmon battery acpi_memhotplug xen_acpi_memhotplug ac parport_pc lp
> parport tg3 libphy sg ide_cd_mod cdrom serio_raw button tpm_tis tpm
> tpm_bios i2c_i801 i2c_core shpchp iTCO_wdt pcspkr dm_snapshot dm_zero
> dm_mirror dm_region_hash dm_log dm_mod ata_piix libata sd_mod scsi_mod
> raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: Pid: 5874, comm:
> lvcreate Not tainted 2.6.32.35-4.xen.pvops.choon.centos5 #1 PowerEdge
> 860
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: RIP:
> e030:[<ffffffff8100cb5b>]  [<ffffffff8100cb5b>]
> pin_pagetable_pfn+0x53/0x59
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: RSP:
> e02b:ffff8800303d1c28  EFLAGS: 00010282
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: RAX: 00000000ffffffea
> RBX: 000000000003032d RCX: 0000000000000181
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: RDX: 00000000deadbeef
> RSI: 00000000deadbeef RDI: 00000000deadbeef
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: RBP: ffff8800303d1c48
> R08: 0000000000000968 R09: ffff880000000000
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: R10: 00000000deadbeef
> R11: ffff8800303d1d08 R12: 0000000000000003
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: R13: 000000000003032d
> R14: ffff880030360000 R15: 00007fd324a00000
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: FS:
> 00007fd327d2e710(0000) GS:ffff880028089000(0000)
> knlGS:0000000000000000
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: CS:  e033 DS: 0000 ES:
> 0000 CR0: 000000008005003b
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: CR2: 00000000004612f0
> CR3: 000000003a025000 CR4: 0000000000002660
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: DR0: 0000000000000000
> DR1: 0000000000000000 DR2: 0000000000000000
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: DR3: 0000000000000000
> DR6: 00000000ffff0ff0 DR7: 0000000000000400
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: Process lvcreate (pid:
> 5874, threadinfo ffff8800303d0000, task ffff880030360000)
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: Stack:
> /var/log/messages-Mar 27 17:04:39 xen05 kernel:  0000000000000000
> 00000000002027a9 000000013eb43318 000000000003032d
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: <0> ffff8800303d1c68
> ffffffff8100e07c ffff880032be05c0 ffff880032aa9928
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: <0> ffff8800303d1c78
> ffffffff8100e0af ffff8800303d1cb8 ffffffff810a4433
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: Call Trace:
> /var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff8100e07c>]
> xen_alloc_ptpage+0x64/0x69
> /var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff8100e0af>]
> xen_alloc_pte+0xe/0x10
> /var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff810a4433>]
> __pte_alloc+0x70/0xce
> /var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff810a45d1>]
> handle_mm_fault+0x140/0x8b9
> /var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff810a50c9>]
> __get_user_pages+0x37f/0x479
> /var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff810a76ca>]
> __mlock_vma_pages_range+0xc0/0x16f
> /var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff8131c03f>]
> ? _spin_unlock_irqrestore+0x11/0x13
> /var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff810a78db>]
> mlock_fixup+0x162/0x199
> /var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff810a7989>]
> do_mlockall+0x77/0x8d
> /var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff81139016>]
> ? security_capable+0x27/0x29
> /var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff810a7ce2>]
> sys_mlockall+0x8f/0xb9
> /var/log/messages:Mar 27 17:04:39 xen05 kernel:  [<ffffffff81012ac2>]
> system_call_fastpath+0x16/0x1b
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: Code: 48 b8 ff ff ff
> ff ff ff ff 7f 48 21 c2 48 89 55 e8 48 8d 7d e0 be 01 00 00 00 31 d2
> 41 ba f0 7f 00 00 e8 e9 c7 ff ff 85 c0 74 04 <0f> 0b eb fe c9 c3 55 40
> f6 c7 01 48 89 e5 53 48 89 fb 74 5b 48
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: RIP
> [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59
> /var/log/messages-Mar 27 17:04:39 xen05 kernel:  RSP <ffff8800303d1c28>
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: ---[ end trace
> bf36c55d2ecd52e5 ]---
> 
> >
> >
> > Maybe this helps, but I think, if anything, this makes it worse, as the debug
> > options actually suppressed the problem that needs to be debugged.
> 
> True.  At least now we have narrowed it down to something related to
> CONFIG_DEBUG_PAGEALLOC.  Maybe Konrad or Jeremy can have a closer look
> at the related code...
> 
> Thanks.
> 
> Kindest regards,
> Giam Teck Choon
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel

