Hello folks,
before I write this one off as a bug and hand it in to [xen-devel] or
[linux-lvm], I'm trying to gather some info.
So I am looking for anyone running a current Xen environment using LVM
as guest storage.
** environment **
I am running xen/stable-2.6.32.x (2.6.32.45 atm, kernel.org is down)
with Xen 4.1.2-rc1 on Debian 6.0.2 (stable/squeeze).
For each guest I have one logical volume for the root filesystem and one
for swap.
** circumstances **
Each morning a cronjob runs a backup script for all running guests on my
Xen hosts.
Apart from checking volume and mountpoint availability before and after,
this backup script does the following:
- create snapshot
- mount snapshot
- tar the mounted volume
- umount the snapshot
- remove the snapshot
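
As a rough sketch, the cycle above could look like this in Python. Only
the order of the steps comes from the list; the snapshot size, the
"-snap" suffix, the mountpoint, and the archive path are placeholders
chosen for illustration and are not taken from the actual script.

```python
import subprocess


def backup_commands(vg, lv, snap_size="1G", mountpoint="/mnt/backup",
                    archive="/backup/guest-root.tar"):
    """Return the five commands of one snapshot backup cycle, in order.

    All default values are hypothetical placeholders.
    """
    snap = f"{lv}-snap"
    return [
        # create snapshot
        ["lvcreate", "--snapshot", "--size", snap_size,
         "--name", snap, f"{vg}/{lv}"],
        # mount snapshot
        ["mount", f"/dev/{vg}/{snap}", mountpoint],
        # tar the mounted volume
        ["tar", "-cf", archive, "-C", mountpoint, "."],
        # umount the snapshot
        ["umount", mountpoint],
        # remove the snapshot
        ["lvremove", "-f", f"{vg}/{snap}"],
    ]


def run_backup(vg, lv, runner=subprocess.check_call, **kwargs):
    """Run one cycle; umount and lvremove are attempted even if a step fails."""
    create, mount, tar, umount, remove = backup_commands(vg, lv, **kwargs)
    runner(create)
    try:
        runner(mount)
        try:
            runner(tar)
        finally:
            runner(umount)
    finally:
        runner(remove)
```

Passing runner=print instead of the default gives a dry run that only
prints the commands.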
** problem prelude **
In the past this worked fine, and by past I mean a 2.6.31.x dom0 with
Xen 4.0.x on Debian 5.0 (stable/lenny?).
Ever since I upgraded my environment I've been having trouble with LVM
hanging on snapshot creation for the FIRST guest in the list.
I have yet to catch this in the act: I only added -vvvv to the command
this week on one server, and the only occurrence of the bug since then
was on a server where I did not add -vvvv ... go figure.
So at the moment I owe you (and myself) the real output of an
lvcreate -vvvv that triggers this block.
Since I get one or two hangs each week, it's only a question of time
until it happens again.
(Maybe I can provoke it by running 1000s of backup rotations without
mounting the volume or tarballing it.)
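
A minimal sketch of such a rotation loop, assuming a hypothetical
"-stress" snapshot name, a 1G snapshot size, a 120 second deadline, and
a log file name picked for illustration. It keeps the -vvvv output of
every run and stops at the first command that does not return in time.

```python
import subprocess


def run_with_deadline(cmd, log_path, timeout, popen=subprocess.Popen):
    """Run cmd, appending its output to log_path.

    Returns True if it finished in time, False if the deadline passed.
    """
    with open(log_path, "ab") as log:
        proc = popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            # The child is left alone on purpose: signals reportedly do
            # not get rid of the hung process, and waiting for it after
            # a kill could block this script as well.
            return False
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return True


def rotate_snapshots(vg, lv, rounds=1000, timeout=120,
                     log_path="lvcreate-stress.log", popen=subprocess.Popen):
    """Create and remove a snapshot repeatedly, without mount or tar.

    Returns the number of the round in which a command hung, or None
    if all rounds completed.
    """
    snap = f"{lv}-stress"  # hypothetical snapshot name
    create = ["lvcreate", "-vvvv", "--snapshot", "--size", "1G",
              "--name", snap, f"{vg}/{lv}"]
    remove = ["lvremove", "-f", f"{vg}/{snap}"]
    for i in range(1, rounds + 1):
        for cmd in (create, remove):
            if not run_with_deadline(cmd, log_path, timeout, popen):
                return i
    return None
```

If the loop returns a round number, the tail of the log file should hold
the verbose output of the command that blocked.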
** problem **
For now I'm going to stick with the aftermath:
After the initial process has run into the block, no LVM command can be
run successfully anymore. (I did remove the /var/lock/lvm/ files.)
Sending signals to the blocking process does not get rid of it.
Every command that does the same init stuff as lvcreate and lvs hangs
once it reaches the device (for example /dev/dm-14):
http://pastebin.com/3f7Q3ALb
The output in that paste shows pretty much the default steps that are
run by every LVM command.
No entries in any of the system logfiles point towards an obvious
problem.
At that point the guest is still fine; it can still do I/O to that
device.
When I try to shut down the domain, it does not "power off" because Xen
runs into the same block.
When I destroy the guest, xl list shows its state as "(null) .... ---p-s".
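
One thing that might be worth recording while the hang is present is the
device-mapper state of the affected device. The sketch below reads the
"State:" line from "dmsetup info". That a stuck device would show up as
SUSPENDED there is only an assumption, not something observed in this
report, and the 10 second timeout is an arbitrary choice.

```python
import subprocess


def parse_dm_state(info_text):
    """Return the value of the 'State:' line of `dmsetup info` output.

    Returns None if the output contains no such line.
    """
    for line in info_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "State":
            return value.strip()
    return None


def dm_state(name, timeout=10):
    """Query one device-mapper device by name and return its state string."""
    result = subprocess.run(["dmsetup", "info", name],
                            capture_output=True, text=True,
                            check=True, timeout=timeout)
    return parse_dm_state(result.stdout)
```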
** recovery **
I can recover by forcefully removing the block device with "dmsetup
--force remove".
After that I can kill the hung processes, and the guest disappears from
"xl list".
"lvchange -aey xen-data/myguest-root" then works.
Now I can create a snapshot, and my backup script can successfully back
up the volume again.
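
The recovery steps above, written down as a small sketch. Which
device-mapper device has to be removed is not stated in this mail, so
the helper below simply assumes it is the one belonging to the guest's
root volume and derives its name from LVM's usual naming scheme (hyphens
inside the VG and LV names are doubled, and the two parts are joined by
a single hyphen). Note that, per the dmsetup man page, a forced remove
first replaces the table with one that fails all I/O.

```python
def dm_name(vg, lv):
    """Device-mapper name LVM uses for vg/lv (inner hyphens are doubled)."""
    return vg.replace("-", "--") + "-" + lv.replace("-", "--")


def recovery_commands(vg, lv):
    """Return the two recovery commands in order, as argument lists.

    Assumes the blocked device is the guest's root volume itself.
    """
    return [
        ["dmsetup", "remove", "--force", dm_name(vg, lv)],
        ["lvchange", "-aey", f"{vg}/{lv}"],
    ]


if __name__ == "__main__":
    # Dry run: only print the commands, do not execute anything.
    for cmd in recovery_commands("xen-data", "myguest-root"):
        print(" ".join(cmd))
```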
** questions **
This may very well be a problem with Debian's LVM version, a problem
with the old device-mapper modules of 2.6.32, or a combination of both.
OR it's a problem with the Xen hypervisor, the I/O handling of the Xen
kernel code, or a combination of those.
Has any of you ever encountered this or a similar problem before?
Did I miss related mails on [xen-devel] and [xen-users] that could help
me fix this issue?
Where do you think the problem lies (hypervisor, kernel, or LVM
userland utils)?
If you are successfully running Xen 4.1 with 2.6.32 and LVM2, doing
pretty much the same backup procedure as I do, and have never
encountered this, please let me know.
Any input is greatly appreciated.
with best regards
Andreas
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users