Hi,
I use the same workaround to start domUs, since the issue mentioned below
also occurred when starting domUs via the xendomains script.
Another scenario that triggers this issue (at least for me) is cron scripts. I was
unable to find out which one is responsible (a crash every 2-3 days), but the
issue went away once I disabled the cron scripts.
Roman
On Wed, Jan 05, 2011 at 03:32:58AM +0800, Teck Choon Giam wrote:
> On Wed, Jan 5, 2011 at 2:40 AM, Christophe Saout <christophe@xxxxxxxx> wrote:
>
> > Hi once more,
> >
> >
> > > > It doesn't look like this has been resolved yet. Somewhere I saw a
> > > > request for the hypervisor message related to the pinning failure.
> > > >
> > > > Here it is:
> > > >
> > > > (XEN) mm.c:2364:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 41114f (pfn d514f)
> > > > (XEN) mm.c:2733:d0 Error while pinning mfn 41114f
> > > >
> > > > I have a bit of experience in debugging things, so if I can help
> > > > someone with more information...
> > > [<ffffffff810052e2>] pin_pagetable_pfn+0x52/0x60
> > > [<ffffffff81006f5c>] xen_alloc_ptpage+0x9c/0xa0
> > > [<ffffffff81006f8e>] xen_alloc_pte+0xe/0x10
> > > [<ffffffff810decde>] __pte_alloc+0x7e/0xf0
> > > [<ffffffff810e15c5>] handle_mm_fault+0x855/0x930
> > > [<ffffffff8102dd9e>] ? pvclock_clocksource_read+0x4e/0x100
> > > [<ffffffff810e734c>] ? do_mmap_pgoff+0x33c/0x380
> > > [<ffffffff81452b96>] do_page_fault+0x116/0x3e0
> > > [<ffffffff8144ff65>] page_fault+0x25/0x30
> >
> > > Additional information: This happened with a number of commands now.
> > > However, I am running a multipath setup and every time the crash
> > > seemed to be caused in the process context of the multipath daemon.
> > > I think the daemon listens to events from the device-mapper subsystem
> > > to watch for changes and the problem somehow arises from there, since
> > > on another machine with the same XEN/Dom0 version without such a
> > > daemon I never had any troubles with LVM.
> >
> > On further investigation it seems that most of the time the issue is not
> > caused by the daemon, but by the "multipath" tool, which is used a lot
> > by udev to identify properties of block devices.
> >
> > When I start stracing udevd (following forks), I'm not able to reproduce
> > the crash anymore. So I was hoping to find out what the process was
> > doing before the crash occurs, but since my attempts to trace the
> > process mask the bug, I can't. :(
> >
> > (without strace, the bug is very common, about every third "lvcreate"
> > command. Every lvcreate command triggers about 20 multipath
> > invocations)
> >
> >
> I have been able to avoid that bug for 8 days (so far) by running sync,
> sleeping 5 seconds, running sync again, and repeating this for 60
> seconds after each LVM snapshot backup of my 10 domUs. I mean:
>
> 1. lvm snapshot domU (lvcreate)
> 2. mount lvm snapshot domU
> 3. rsync to backup domU
> 4. umount lvm snapshot domU
> 5. remove lvm snapshot domU (lvremove)
> 6. sync (start countdown of 60 seconds and every 5 seconds interval doing
> sync)
> 7. sleep 5
> 8. sync
> 9. sleep 5
> 10. sync
> 11. sleep 5
> 12. sync
> .... until the countdown reaches 0 seconds
> Then the next domU repeats the cycle.
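
The sync countdown in steps 6 to 12 could be scripted roughly as follows.
This is only a sketch, not the script used in this thread: sync_countdown and
its parameters are hypothetical names, and it assumes Python 3.3+ on a
Unix-like dom0, where os.sync() is available.

```python
import os
import time


def sync_countdown(total_seconds=60, interval=5, sync=os.sync, sleep=time.sleep):
    """Run sync once, then alternate sleep/sync until the countdown hits 0.

    Hypothetical helper mirroring steps 6-12 of the quoted list.
    Returns the number of sync calls made.
    """
    calls = 0
    remaining = total_seconds

    sync()  # step 6: first sync, countdown starts here
    calls += 1

    while remaining > 0:
        step = min(interval, remaining)
        sleep(step)  # steps 7, 9, 11, ...
        remaining -= step
        sync()       # steps 8, 10, 12, ...
        calls += 1

    return calls
```

Calling sync_countdown() after the lvremove for each domU would give the
60-second flush window before the next domU is processed. The sync and sleep
parameters are only there so the loop can be exercised without touching disks.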
>
> By doing the above, I have been able to keep the crash I posted about in
> this thread from recurring for 8 days (8 daily LVM snapshot backups of
> all domUs).
>
> Thanks.
>
> Kindest regards,
> Giam Teck Choon
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
--
----------------------------------------------------------------------
,''`. [benco] | mailto: benco@xxxxxxx | silc: /msg benco
: :' : -------------------------------------------------------------
`. `' GPG publickey: http://www.acid.sk/pubkey.asc
`- KF = 0DF6 0592 74D2 F17A DACF A5C3 1720 CB7C F54C F429