On 08/30/2010 03:13 PM, Daniel Stodden wrote:
> Are you sure it's spinning or just freezing?
I'm not sure that I understand the difference between those two
terms, so I'm going to guess "freezing" is probably a more accurate
description. The best way to describe what I was seeing was that my
scripted backup procedure would get to a certain point and freeze, then
I wouldn't be able to break out of it or issue a kill from another SSH
session on its PID. The kill command freezes the same way (never
returns to a shell prompt and pressing CTRL-C just shows ^C on the
display without breaking out).
> Can you try to find the minimum number of steps necessary to get into
> that state and try something like
>   $ ps -eH -owchan,nwchan,cmd
The minimum number of steps that I took, just now, to make it
happen was as follows:
There's an HVM domU that's active and running Windows 2008 Server,
called "scrappy", with the following Xen configuration:
kernel = "hvmloader"
builder='hvm'
memory = 768
name = "scrappy"
vcpus=1
vif = [ 'type=ioemu, mac=00:16:3e:00:00:18, bridge=eth0',
        'type=ioemu, mac=00:16:3e:00:00:19, bridge=xenbr1',
        'type=ioemu, mac=00:16:3e:00:00:1A, bridge=xenbr2' ]
disk = [ 'phy:hurricanevg1/scrappy-primarymaster,xvda,w',
'file:/mnt/scratch/WindowsServerStd2008OEM_x86-64.iso,xvdb:cdrom,r',
'phy:hurricanevg1/scrappy-secondarymaster,xvdc,w' ]
on_reboot = 'restart'
device_model = 'qemu-dm'
sdl=0
opengl=1
vnc=1
vnclisten="192.168.0.90"
vncdisplay=3
vncunused=1
stdvga=0
serial='pty'
tsc_mode=0
localtime=1
rtc_timeoffset=-3600
While that was running, I created a snapshot of the primarymaster
volume, then removed it, created one for the secondarymaster, removed
it, and created another one for the primarymaster. When I tried to remove
that one, the lvremove command froze. A minute or two later, I got a
kernel OOPS message on my console similar to the one that I posted before.
These are the commands that I used to create and remove the volumes:
lvcreate -L 2G -n scrappy-primarymaster-backupsnap -s hurricanevg1/scrappy-primarymaster
lvremove hurricanevg1/scrappy-primarymaster-backupsnap
lvcreate -L 2G -n scrappy-secondarymaster-backupsnap -s hurricanevg1/scrappy-secondarymaster
lvremove hurricanevg1/scrappy-secondarymaster-backupsnap
lvcreate -L 2G -n scrappy-primarymaster-backupsnap -s hurricanevg1/scrappy-primarymaster
lvremove hurricanevg1/scrappy-primarymaster-backupsnap
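For anyone who wants to replay the same sequence, the commands above could be wrapped roughly as follows. This is only a sketch: the DRY_RUN variable and the helper names (run, snap_cycle, repro) are made up for illustration and are not part of my backup script. By default it only prints the commands; nothing touches LVM unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: replays the create/remove cycle from this message.
# DRY_RUN and the helper names (run, snap_cycle, repro) are hypothetical.
VG=hurricanevg1
DRY_RUN=${DRY_RUN:-1}   # 1 = print commands, 0 = actually run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create a 2G snapshot of the given LV, then remove it again.
snap_cycle() {
    lv=$1
    run lvcreate -L 2G -n "${lv}-backupsnap" -s "${VG}/${lv}"
    run lvremove "${VG}/${lv}-backupsnap"
}

repro() {
    snap_cycle scrappy-primarymaster
    snap_cycle scrappy-secondarymaster
    snap_cycle scrappy-primarymaster   # this third lvremove is the one that froze
}

repro
```

With DRY_RUN=0 the lvremove calls run exactly as typed above, so lvremove may still ask for confirmation before removing an active snapshot.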
This time, the console froze completely and I couldn't open any new
SSH sessions into the machine, and couldn't run the ps -eH command that
you asked for in your previous message. If I go for another attempt,
I'll try to have a few logins already going so I can try to get that
output for you. This is a somewhat critical, production server, though,
so I didn't want to keep bouncing it in the middle of the day.
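A rough sketch of what could be run from one of those already-open logins is below. It assumes the procps version of ps, and the helper names (ps_wchan, d_state_only) are hypothetical. The idea is that a process that ignores kill is usually blocked in uninterruptible sleep (state D) rather than spinning on a CPU (state R), so filtering for D-state entries and their wait channels may help tell the two apart.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; assumes procps ps.

# Print the process tree with wait channels, as requested in the thread.
ps_wchan() {
    ps -eH -owchan,nwchan,cmd
}

# Keep the header line plus any process whose state starts with D
# (uninterruptible sleep). Expects input whose first column is STAT.
d_state_only() {
    awk 'NR == 1 || $1 ~ /^D/'
}

# Usage, from a login opened before starting the test:
#   ps_wchan
#   ps -eo stat,pid,wchan:32,cmd | d_state_only
```

If the machine locks up as completely as it did this time, even these may not return, so this is best treated as something to try, not a guaranteed way to get the output.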
> Also, is that sequence completely reproducible or does the behaviour
> change every time? Just trying to see if there's some point where deadlock
> ends and corruption like the one quoted below would start.
It seems to be 3 for 3 at this point.
--
Scott Garron