This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


Re: [Xen-devel] Making snapshot of logical volumes handling HVM domU causes OOPS and instability

To: Scott Garron <xen-devel@xxxxxxxxxxxxxxxxxx>
Subject: Re: [Xen-devel] Making snapshot of logical volumes handling HVM domU causes OOPS and instability
From: Daniel Stodden <daniel.stodden@xxxxxxxxxx>
Date: Tue, 31 Aug 2010 02:20:28 -0700
Cc: Jeremy Fitzhardinge <jeremy@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, "Xu, Dongxiao" <dongxiao.xu@xxxxxxxxx>
In-reply-to: <4C7C14F7.9090308@xxxxxxxxxxxxxxxxxx>
Organization: Citrix VMD
References: <4C7864BB.1010808@xxxxxxxxxxxxxxxxxx> <4C7BE1C6.5030602@xxxxxxxx> <1283195639.26797.451.camel@xxxxxxxxxxxxxxxxxxxxxxx> <4C7C14F7.9090308@xxxxxxxxxxxxxxxxxx>
On Mon, 2010-08-30 at 16:30 -0400, Scott Garron wrote:
> On 08/30/2010 03:13 PM, Daniel Stodden wrote:
> > Are you sure it's spinning or just freezing?
>       I'm not sure that I understand the difference between those two
> terms, so I'm going to guess "freezing" is probably a more accurate
> description.  The best way to describe what I was seeing was that my
> scripted backup procedure would get to a certain point and freeze, then
> I wouldn't be able to break out of it or issue a kill from another SSH
> session on its PID.  The kill command freezes the same way (never
> returns to a shell prompt and pressing CTRL-C just shows ^C on the
> display without breaking out).

If it were just one or more tasks hanging initially, caught in some
wait state, then identifying the point where things broke could
sometimes be quite straightforward. That doesn't seem to be the case here.
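For reference, a task that ignores both kill and CTRL-C, as described above, is typically blocked in the kernel in uninterruptible sleep (state D in ps); a "spinning" task instead stays runnable (state R) and burns CPU. A minimal Python sketch of that check, assuming procps ps and Python 3.7+; the helper names blocked_tasks and show_blocked_tasks are hypothetical, and it needs to be started from a login that was already open before the hang:

```python
import subprocess


def blocked_tasks(ps_output):
    """Return (pid, wchan, cmd) for every task in uninterruptible sleep.

    Expects the text printed by `ps -eo pid,stat,wchan:32,cmd`.
    """
    tasks = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue  # defensive: skip malformed or truncated rows
        pid, stat, wchan, cmd = parts
        if stat.startswith("D"):  # D = uninterruptible sleep ("frozen")
            tasks.append((int(pid), wchan, cmd))
    return tasks


def show_blocked_tasks():
    """Run ps once and print each blocked task with its wait channel."""
    out = subprocess.run(
        ["ps", "-eo", "pid,stat,wchan:32,cmd"],
        capture_output=True, text=True, check=True,
    ).stdout
    for pid, wchan, cmd in blocked_tasks(out):
        print(f"{pid:>7}  {wchan:<32}  {cmd}")
```

Calling show_blocked_tasks() lists each stuck task together with the kernel function it is sleeping in, which is the same information the wchan column of ps reports.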

> > Can you try to find the minimum number of steps necessary to get into
> > that state and try something like $ ps -eH -owchan,nwchan,cmd
>       The minimum number of steps that I took, just now, to make it
> happen was as follows:
>       There's an HVM domU that's active and running Windows 2008 Server,
> called "scrappy", with the following Xen configuration:
> kernel = "hvmloader"
> builder='hvm'
> memory = 768
> name = "scrappy"
> vcpus=1
> vif = [ 'type=ioemu, mac=00:16:3e:00:00:18, bridge=eth0',
>         'type=ioemu, mac=00:16:3e:00:00:19, bridge=xenbr1',
>         'type=ioemu, mac=00:16:3e:00:00:1A, bridge=xenbr2' ]
> disk = [ 'phy:hurricanevg1/scrappy-primarymaster,xvda,w',
> 'file:/mnt/scratch/WindowsServerStd2008OEM_x86-64.iso,xvdb:cdrom,r',
> 'phy:hurricanevg1/scrappy-secondarymaster,xvdc,w' ]
> on_reboot   = 'restart'
> device_model = 'qemu-dm'
> sdl=0
> opengl=1
> vnc=1
> vnclisten=""
> vncdisplay=3
> vncunused=1
> stdvga=0
> serial='pty'
> tsc_mode=0
> localtime=1
> rtc_timeoffset=-3600
>       While that's running, I created a snapshot of the primarymaster
> volume, then removed it, created one for the secondarymaster, removed
> it, and created another one for the primarymaster, tried to remove it,
> and the lvremove command froze.  A minute or two later, I got a similar
> kernel OOPS message on my console to the one that I posted before.
> These are the commands that I used to create and remove the volumes:
> lvcreate -L 2G -n scrappy-primarymaster-backupsnap -s hurricanevg1/scrappy-primarymaster
> lvremove hurricanevg1/scrappy-primarymaster-backupsnap
> lvcreate -L 2G -n scrappy-secondarymaster-backupsnap -s hurricanevg1/scrappy-secondarymaster
> lvremove hurricanevg1/scrappy-secondarymaster-backupsnap
> lvcreate -L 2G -n scrappy-primarymaster-backupsnap -s hurricanevg1/scrappy-primarymaster
> lvremove hurricanevg1/scrappy-primarymaster-backupsnap
>       This time, the console froze completely and I couldn't open any new
> SSH sessions into the machine, and couldn't run the ps -eH command that
> you asked for in your previous message.  If I go for another attempt,
> I'll try to have a few logins already going so I can try to get that
> output for you.  This is a somewhat critical production server, though,
> so I didn't want to keep bouncing it in the middle of the day.
> > Also, is that sequence completely reproducible, or does the behaviour
> > change every time? Just trying to see if there's some point where the
> > deadlock ends and corruption like the one quoted below would start.
>       It seems to be 3 for 3 at this point.

Okay. I guess that won't be simple to reproduce. I wonder what you are
running in dom0: distro and version, what you upgraded and what not, any
customized software builds, etc.

Given the rate at which you reproduce this, and because only the
snapshots seem to trigger the problem, this looks to me more like an
LVM/DM issue than a pvops-specific one.
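One way to probe that would be to replay the same create/remove sequence, including against a volume no domU has open, and see whether it still hangs. A minimal Python sketch, assuming Python 3.8+ (for shlex.join) and root privileges; the function name snapshot_cycle_commands is hypothetical, and -f is added to lvremove (it is not in the commands quoted above) so that it does not stop at its confirmation prompt. By default it only prints the commands; pass --run to execute them. Each command is printed before it starts, so the last line on screen names the one that hung.

```python
import shlex
import subprocess
import sys


def snapshot_cycle_commands(vg, lvs, size="2G", suffix="-backupsnap"):
    """Build lvcreate/lvremove pairs, one pair per origin LV, in order.

    Repeats in `lvs` are allowed, so the reported
    primary -> secondary -> primary sequence can be replayed exactly.
    """
    cmds = []
    for lv in lvs:
        snap = lv + suffix
        # Create a copy-on-write snapshot of the origin volume...
        cmds.append(["lvcreate", "-L", size, "-n", snap, "-s", f"{vg}/{lv}"])
        # ...then remove it again; -f skips lvremove's confirmation prompt.
        cmds.append(["lvremove", "-f", f"{vg}/{snap}"])
    return cmds


if __name__ == "__main__":
    run = "--run" in sys.argv[1:]  # dry run unless --run is given
    sequence = [
        "scrappy-primarymaster",
        "scrappy-secondarymaster",
        "scrappy-primarymaster",
    ]
    for cmd in snapshot_cycle_commands("hurricanevg1", sequence):
        print(shlex.join(cmd), flush=True)
        if run:
            subprocess.run(cmd, check=True)
```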

Also, it might be worth turning off udev to see whether that changes
anything.


Xen-devel mailing list