WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
xen-devel

To: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Subject: Re: [Xen-devel] Making snapshot of logical volumes handling HVM domU causes OOPS and instability
From: Daniel Stodden <daniel.stodden@xxxxxxxxxx>
Date: Mon, 30 Aug 2010 12:13:59 -0700
Cc: "Xu, Dongxiao" <dongxiao.xu@xxxxxxxxx>, Scott Garron <xen-devel@xxxxxxxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Mon, 30 Aug 2010 12:14:55 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4C7BE1C6.5030602@xxxxxxxx>
References: <4C7864BB.1010808@xxxxxxxxxxxxxxxxxx> <4C7BE1C6.5030602@xxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
On Mon, 2010-08-30 at 12:52 -0400, Jeremy Fitzhardinge wrote:

> > After that transition, here's the problem I've been wrestling with and
> > can't seem to find a solution for:  It seems like any time I start
> > manipulating a volume group to add or remove a snapshot of a logical
> > volume that's used as a disk for a running HVM guest, new calls to LVM2
> > and/or Xen's storage lock up and spin forever.

Are you sure it's spinning, or is it just freezing?

> >  The first time I ran
> > across the problem, there was no indication of a problem other than
> > any command I ran that handled anything to do with LVM would freeze and
> > be completely unable to be signaled to do anything.  In other words, no
> > error messages, nothing in dmesg, nothing in syslog...  The commands
> > would just freeze and not return.  That was with the 2.6.31.14 kernel
> > that is what's currently retrieved if you checkout xen-4.0-testing.hg
> > and just do a make dist.

Can you try to find the minimum number of steps necessary to get into that
state, and then try something like

  $ ps -eH -owchan,nwchan,cmd
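
For reference, roughly the same information can be pulled straight from /proc. The following is only a minimal sketch, assuming a Linux dom0 that exposes /proc/<pid>/stat and /proc/<pid>/wchan. The helper names (parse_stat, read_text, blocked_tasks) are made up for illustration and are not part of any Xen or LVM tool. It lists tasks in uninterruptible sleep (state D), which is what a command that cannot be signaled usually looks like, together with the kernel symbol each one is waiting in. A task that is really spinning would show up in state R instead.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or
    # parentheses, so split on the first '(' and the last ')'.
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left].strip())
    comm = stat_text[left + 1:right]
    state = stat_text[right + 1:].split()[0]
    return pid, comm, state


def read_text(path):
    """Read a small /proc file; return '' if the task is gone or unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


def blocked_tasks(proc_root="/proc"):
    """Collect (pid, comm, state, wchan) for every task in state D."""
    tasks = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        stat_text = read_text(os.path.join(proc_root, entry, "stat"))
        if not stat_text:
            continue  # process exited while we were scanning
        pid, comm, state = parse_stat(stat_text)
        if state == "D":
            wchan = read_text(os.path.join(proc_root, entry, "wchan"))
            tasks.append((pid, comm, state, wchan))
    return tasks


if __name__ == "__main__":
    for pid, comm, state, wchan in blocked_tasks():
        print(f"{pid:>7} {state} {wchan or '-':<30} {comm}")
```

Running this once right after the hang and again a little later, then comparing the wchan column, should show whether the same tasks stay parked in the same place.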

Also, is that sequence completely reproducible, or does the behaviour
change every time? I'm just trying to work out whether there's some point
where the deadlock ends and corruption like the one quoted below starts.

Daniel

> > I have since checked out and compiled 2.6.32.18 that comes from doing
> > git checkout -b xen/stable-2.6.32.x origin/xen/stable-2.6.32.x, as
> > described on the Wiki page here:
> > http://wiki.xensource.com/xenwiki/XenParavirtOps
> >
> > If I run that kernel for dom0, but continue to use 2.6.31.14 for the
> > paravirtualized domUs, everything works fine until I try to manipulate
> > the snapshots of the HVM volumes.  Today, I got this kernel OOPS:
> 
> That's definitely bad.  Something is causing udevd to end up with bad
> pagetables which are causing a kernel crash on exit.  I'm not sure if
> it's *the* udevd or some transient child, but either way it's bad.
> 
> Any thoughts on this, Daniel?
> 
> >
> > ---------------------------
> >
> > [78084.004530] BUG: unable to handle kernel paging request at
> > ffff8800267c9010
> > [78084.004710] IP: [<ffffffff810382ff>] xen_set_pmd+0x24/0x44
> > [78084.004886] PGD 1002067 PUD 1006067 PMD 217067 PTE 80100000267c9065
> > [78084.005065] Oops: 0003 [#1] SMP
> > [78084.005234] last sysfs file:
> > /sys/devices/virtual/block/dm-32/removable
> > [78084.005256] CPU 1
> > [78084.005256] Modules linked in: tun xt_multiport fuse dm_snapshot
> > nf_nat_tftp nf_conntrack_tftp nf_nat_pptp nf_conntrack_pptp
> > nf_conntrack_proto_gre nf_nat_proto_gre ntfs parport_pc parport k8temp
> > floppy forcedeth [last unloaded: scsi_wait_scan]
> > [78084.005256] Pid: 22814, comm: udevd Tainted: G        W  2.6.32.18 #1
> > H8SMI
> > [78084.005256] RIP: e030:[<ffffffff810382ff>]  [<ffffffff810382ff>]
> > xen_set_pmd+0x24/0x44
> > [78084.005256] RSP: e02b:ffff88002e2e1d18  EFLAGS: 00010246
> > [78084.005256] RAX: 0000000000000000 RBX: ffff8800267c9010 RCX:
> > ffff880000000000
> > [78084.005256] RDX: dead000000100100 RSI: 0000000000000000 RDI:
> > 0000000000000004
> > [78084.005256] RBP: ffff88002e2e1d28 R08: 0000000001993000 R09:
> > dead000000100100
> > [78084.005256] R10: 800000016e90e165 R11: 0000000000000000 R12:
> > 0000000000000000
> > [78084.005256] R13: ffff880002d8f580 R14: 0000000000400000 R15:
> > ffff880029248000
> > [78084.005256] FS:  00007fa07d87f7a0(0000) GS:ffff880002d81000(0000)
> > knlGS:0000000000000000
> > [78084.005256] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [78084.005256] CR2: ffff8800267c9010 CR3: 0000000001001000 CR4:
> > 0000000000000660
> > [78084.005256] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > 0000000000000000
> > [78084.005256] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> > 0000000000000400
> > [78084.005256] Process udevd (pid: 22814, threadinfo ffff88002e2e0000,
> > task ffff880019491e80)
> > [78084.005256] Stack:
> > [78084.005256]  0000000000600000 000000000061e000 ffff88002e2e1de8
> > ffffffff810fb8a5
> > [78084.005256] <0> 00007fff13ffffff 0000000100000206 ffff880003158003
> > 0000000000000000
> > [78084.005256] <0> 0000000000000000 000000000061dfff 000000000061dfff
> > 000000000061dfff
> > [78084.005256] Call Trace:
> > [78084.005256]  [<ffffffff810fb8a5>] free_pgd_range+0x27c/0x45e
> > [78084.005256]  [<ffffffff810fbb2b>] free_pgtables+0xa4/0xc7
> > [78084.005256]  [<ffffffff810ff1fd>] exit_mmap+0x107/0x13f
> > [78084.005256]  [<ffffffff8107714b>] mmput+0x39/0xda
> > [78084.005256]  [<ffffffff8107adff>] exit_mm+0xfb/0x106
> > [78084.005256]  [<ffffffff8107c86d>] do_exit+0x1e8/0x6ff
> > [78084.005256]  [<ffffffff815c228b>] ? do_page_fault+0x2cd/0x2fd
> > [78084.005256]  [<ffffffff8107ce0d>] do_group_exit+0x89/0xb3
> > [78084.005256]  [<ffffffff8107ce49>] sys_exit_group+0x12/0x16
> > [78084.005256]  [<ffffffff8103cc82>] system_call_fastpath+0x16/0x1b
> > [78084.005256] Code: 48 83 c4 28 5b c9 c3 55 48 89 e5 41 54 49 89 f4 53
> > 48 89 fb e8 fc ee ff ff 48 89 df ff 05 52 8f 9e 00 e8 78 e4 ff ff 84 c0
> > 75 05 <4c> 89 23 eb 16 e8 e0 ee ff ff 4c 89 e6 48 89 df ff 05 37 8f 9e
> > [78084.005256] RIP  [<ffffffff810382ff>] xen_set_pmd+0x24/0x44
> > [78084.005256]  RSP <ffff88002e2e1d18>
> > [78084.005256] CR2: ffff8800267c9010
> > [78084.005256] ---[ end trace 4eaa2a86a8e2da24 ]---
> > [78084.005256] Fixing recursive fault but reboot is needed!
> >


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel