Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
I have been able to prevent that bug for 8 days (till now) by doing "sleep 5 seconds, then sync, then sleep 5 seconds, then sync", repeating this for 60 seconds while doing LVM snapshots for 10 domUs. I mean the steps listed below:
On Wed, Jan 5, 2011 at 2:40 AM, Christophe Saout <christophe@xxxxxxxx> wrote:
> Hi once more,
> > It doesn't look like this has been resolved yet. Somewhere I saw a
> > request for the hypervisor message related to the pinning failure.
> > Here it is:
> > (XEN) mm.c:2364:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 41114f (pfn d514f)
> > (XEN) mm.c:2733:d0 Error while pinning mfn 41114f
> > I have a bit of experience in debugging things, so if I can help someone
> > with more information...
> [<ffffffff810052e2>] pin_pagetable_pfn+0x52/0x60
> [<ffffffff81006f5c>] xen_alloc_ptpage+0x9c/0xa0
> [<ffffffff81006f8e>] xen_alloc_pte+0xe/0x10
> [<ffffffff810decde>] __pte_alloc+0x7e/0xf0
> [<ffffffff810e15c5>] handle_mm_fault+0x855/0x930
> [<ffffffff8102dd9e>] ? pvclock_clocksource_read+0x4e/0x100
> [<ffffffff810e734c>] ? do_mmap_pgoff+0x33c/0x380
> [<ffffffff81452b96>] do_page_fault+0x116/0x3e0
> [<ffffffff8144ff65>] page_fault+0x25/0x30
> Additional information: this has happened with a number of commands now.
> However, I am running a multipath setup, and every time the crash
> seemed to be caused in the process context of the multipath daemon.
> I think the daemon listens to events from the device-mapper subsystem
> to watch for changes, and the problem somehow arises from there, since
> on another machine with the same Xen/Dom0 version without such a
> daemon I never had any trouble with LVM.
> On further investigation it seems that most of the time the issue is not
> caused by the daemon, but by the "multipath" tool, which is used a lot
> by udev to identify properties of block devices.
> When I start stracing udevd (following forks), I am not able to reproduce
> the crash anymore. So I was hoping to find out what the process was
> doing before the crash occurred, but since my attempts to trace the
> process mask the bug, I can't. :(
> (Without strace, the bug is very common, about every third "lvcreate"
> command. Every lvcreate command triggers about 20 multipath
1. lvm snapshot domU (lvcreate)
2. mount lvm snapshot domU
3. rsync to backup domU
4. umount lvm snapshot domU
5. remove lvm snapshot domU (lvremove)
6. sync (start a 60-second countdown, syncing at every 5-second interval)
7. sleep 5
8. sync
9. sleep 5
10. sync
11. sleep 5
.... until the countdown hits 0 seconds.
Then the next domU repeats the cycle.
Doing the above, I have been able to prevent the crash/bug from popping up for 8 days (8 such daily LVM snapshot backups for all domUs), as I posted in this thread.
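For clarity, the cycle above can be sketched as a small shell script. The volume group name (vg0), snapshot size, mount point, backup path, and domU names are placeholders for illustration, not taken from this thread:

```shell
#!/bin/sh
# Sketch of the daily per-domU snapshot backup cycle described above.
# vg0, /mnt/snap, /backup and the 1G snapshot size are assumptions.

# Steps 6-11: alternate sync and sleep for $1 seconds in total,
# syncing every $2 seconds, with a final sync at the end.
settle() {
    total=$1; step=$2; t=0
    while [ "$t" -lt "$total" ]; do
        sync
        sleep "$step"
        t=$((t + step))
    done
    sync
}

backup_domu() {
    domu=$1
    lvcreate -L 1G -s -n "${domu}-snap" "/dev/vg0/${domu}"  # 1. snapshot
    mount "/dev/vg0/${domu}-snap" /mnt/snap                 # 2. mount snapshot
    rsync -a /mnt/snap/ "/backup/${domu}/"                  # 3. copy to backup
    umount /mnt/snap                                        # 4. umount snapshot
    lvremove -f "/dev/vg0/${domu}-snap"                     # 5. remove snapshot
    settle 60 5                                             # 6-11. sync/sleep cycle
}

# Repeat the cycle for each domU named on the command line.
for domu in "$@"; do
    backup_domu "$domu"
done
```

Invoked e.g. as `./backup.sh domu1 domu2`, it runs the full cycle per domU, with the 60-second sync/sleep countdown after each lvremove.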
Giam Teck Choon
Xen-devel mailing list