This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!

To: Christophe Saout <christophe@xxxxxxxx>
Subject: Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
From: Teck Choon Giam <giamteckchoon@xxxxxxxxx>
Date: Wed, 5 Jan 2011 03:32:58 +0800
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Delivery-date: Tue, 04 Jan 2011 11:34:28 -0800
In-reply-to: <1294166410.24719.11.camel@xxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <AANLkTi=Hwjooo43FiLPAAGzzOTG440ij_QsEqks6ECVv@xxxxxxxxxxxxxx> <20101227155314.GG3728@xxxxxxxxxxxx> <AANLkTikNvKGc78HQOMtVfi=Q+r8r92=svzZcMLQ2xojQ@xxxxxxxxxxxxxx> <20101228104256.GJ2754@xxxxxxxxxxx> <1294153817.24719.3.camel@xxxxxxxxxxxxxxxxxxxx> <1294154342.24719.6.camel@xxxxxxxxxxxxxxxxxxxx> <1294166410.24719.11.camel@xxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx

On Wed, Jan 5, 2011 at 2:40 AM, Christophe Saout <christophe@xxxxxxxx> wrote:
> Hi once more,

> > It doesn't look like this has been resolved yet.  Somewhere I saw a
> > request for the hypervisor message related to the pinning failure.
> >
> > Here it is:
> >
> > (XEN) mm.c:2364:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 41114f (pfn d514f)
> > (XEN) mm.c:2733:d0 Error while pinning mfn 41114f
> >
> > I have a bit of experience in debugging things, so if I can help someone
> > with more information...
>  [<ffffffff810052e2>] pin_pagetable_pfn+0x52/0x60
>  [<ffffffff81006f5c>] xen_alloc_ptpage+0x9c/0xa0
>  [<ffffffff81006f8e>] xen_alloc_pte+0xe/0x10
>  [<ffffffff810decde>] __pte_alloc+0x7e/0xf0
>  [<ffffffff810e15c5>] handle_mm_fault+0x855/0x930
>  [<ffffffff8102dd9e>] ? pvclock_clocksource_read+0x4e/0x100
>  [<ffffffff810e734c>] ? do_mmap_pgoff+0x33c/0x380
>  [<ffffffff81452b96>] do_page_fault+0x116/0x3e0
>  [<ffffffff8144ff65>] page_fault+0x25/0x30

> Additional information: This happened with a number of commands now.
> However, I am running a multipath setup and every time the crash
> seemed to be caused in the process context of the multipath daemon.
> I think the daemon listens to events from the device-mapper subsystem
> to watch for changes and the problem somehow arises from there, since
> on another machine with the same XEN/Dom0 version without such a
> daemon I never had any troubles with LVM.

> On further investigation it seems that most of the time the issue is not
> caused by the daemon, but by the "multipath" tool, which udev uses
> heavily to identify properties of block devices.

> When I start stracing udevd (following forks), I'm no longer able to
> reproduce the crash.  So I was hoping to find out what the process was
> doing before the crash occurs, but since my attempts to trace the
> process mask the bug, I can't. :(

> (Without strace, the bug is very common: about every third "lvcreate"
> command.  Every lvcreate command triggers about 20 multipath
> invocations.)

I have been able to prevent that bug for 8 days (so far) by sleeping 5 seconds, then running sync, and repeating that for 60 seconds after each LVM snapshot cycle, done for 10 domUs.  I mean:

1. lvm snapshot domU (lvcreate)
2. mount lvm snapshot domU
3. rsync to backup domU
4. umount lvm snapshot domU
5. remove lvm snapshot domU (lvremove)
6. sync (start countdown of 60 seconds and every 5 seconds interval doing sync)
7. sleep 5
8. sync
9. sleep 5
10. sync
11. sleep 5
12. sync
.... until the 60-second countdown reaches zero.
Then the next domU repeats the cycle.
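The cycle above could be sketched as a shell script like the following.  This is only an illustration of the workaround, not the exact script I run: the volume group (vg0), snapshot size, mount point, and backup target are all assumptions, and RUN=echo keeps it a harmless dry run that prints the privileged commands instead of executing them.

```shell
#!/bin/sh
# Dry-run sketch of the per-domU snapshot/backup cycle with the
# sync-every-5-seconds countdown workaround.  Set RUN= (empty) to
# actually execute the commands; all names/paths are illustrative.
RUN=echo
cycles=0

for domU in domU1 domU2; do
    $RUN lvcreate -s -L 1G -n "${domU}-snap" "/dev/vg0/${domU}"  # 1. snapshot
    $RUN mount "/dev/vg0/${domU}-snap" /mnt/snap                 # 2. mount
    $RUN rsync -a /mnt/snap/ "backup:/backup/${domU}/"           # 3. copy out
    $RUN umount /mnt/snap                                        # 4. umount
    $RUN lvremove -f "/dev/vg0/${domU}-snap"                     # 5. lvremove
    # 6-12. sync every 5 seconds over a 60-second countdown before
    # moving on to the next domU.
    t=60
    while [ "$t" -gt 0 ]; do
        $RUN sync
        $RUN sleep 5
        t=$((t - 5))
    done
    cycles=$((cycles + 1))
done
```

With RUN=echo the script just prints each lvcreate/mount/rsync/umount/lvremove line followed by twelve sync/sleep pairs per domU, which makes the ordering of the countdown relative to lvremove easy to verify before running it for real.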

Doing the above, I have been able to keep the crash I posted in this thread from appearing for 8 days (8 daily LVM snapshot backups for all domUs).


Kindest regards,
Giam Teck Choon
Xen-devel mailing list