On 08/31/2010 05:20 AM, Daniel Stodden wrote:
If it were just one or more tasks hanging initially, caught in some
wait state, then identifying the point where things broke can
sometimes be quite straightforward. Doesn't seem to be the case
True. It's at least narrowed down to something with the way LVM/DM
and udev interact during creation and removal of snapshots since the
machine can run for days without incident until I start adding and
removing snapshots (of running HVM volumes).
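A minimal sketch of the kind of add/remove cycle described above, in case it helps anyone trying to reproduce this. The volume group and volume names (`vg0`, `hvm-disk`), the snapshot size, and the helper names are all hypothetical and are not taken from the affected machine. It defaults to a dry run that only builds the `lvcreate`/`lvremove` command lines without executing them.

```python
import subprocess


def snapshot_cycle_commands(vg, origin, snap, size="1G"):
    """Build the lvcreate/lvremove pair for one snapshot add/remove cycle.

    vg, origin, snap and size are placeholders chosen by the caller.
    """
    create = ["lvcreate", "-s", "-L", size, "-n", snap, f"{vg}/{origin}"]
    remove = ["lvremove", "-f", f"{vg}/{snap}"]
    return create, remove


def run_cycles(vg, origin, count, dry_run=True):
    """Create and remove `count` snapshots of vg/origin, one after another.

    With dry_run=True (the default) nothing is executed; the commands
    that would have been run are simply returned for inspection.
    """
    executed = []
    for i in range(count):
        snap = f"{origin}-snap{i}"
        for cmd in snapshot_cycle_commands(vg, origin, snap):
            executed.append(cmd)
            if not dry_run:
                # Needs root and a real volume group; raises on failure.
                subprocess.run(cmd, check=True)
    return executed


if __name__ == "__main__":
    for cmd in run_cycles("vg0", "hvm-disk", 3):
        print(" ".join(cmd))
```

Pointing it at the volume of a running HVM guest versus any other volume would be one way to compare the two cases; only pass `dry_run=False` on a machine that is safe to hang.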
Okay. I guess that won't be simple to repro. I wonder what you are
running in dom0. Distro and version, what you upgraded and what not,
any customized software builds etc.
I'm running Debian Squeeze (testing) and have included a full list
of installed packages (dpkg -l) in the text file referenced in some of
my previous e-mails, here:
I've also included the output of "ps -eH -owchan,nwchan,cmd" during
normal operations (not yet in the "crashed" state).
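To compare a capture like that against one taken in the hung state, the output can be summarized by wait channel. The following is only a sketch: it assumes the three-column layout produced by `ps -eH -owchan,nwchan,cmd` with a header line, and the function names are made up for illustration.

```python
from collections import Counter


def parse_wchan_lines(ps_output):
    """Parse `ps -eH -owchan,nwchan,cmd` text into (wchan, nwchan, cmd) tuples.

    The first line is assumed to be the column header and is skipped.
    """
    rows = []
    for line in ps_output.splitlines()[1:]:
        parts = line.split(None, 2)  # cmd may itself contain spaces
        if len(parts) < 3:
            continue
        wchan, nwchan, cmd = parts
        rows.append((wchan, nwchan, cmd.strip()))
    return rows


def wchan_histogram(rows):
    """Count tasks per wait channel, ignoring tasks with no channel ('-')."""
    return Counter(wchan for wchan, _, _ in rows if wchan != "-")


if __name__ == "__main__":
    import subprocess

    out = subprocess.run(
        ["ps", "-eH", "-owchan,nwchan,cmd"],
        capture_output=True, text=True, check=True,
    ).stdout
    for wchan, n in wchan_histogram(parse_wchan_lines(out)).most_common(10):
        print(f"{n:5d}  {wchan}")
```

A wait channel whose count jumps between the normal capture and the hung one would be a reasonable first place to look.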
I don't recall running any customized software builds on dom0.
It's a fairly bog-standard Debian installation. If I'm going to do
anything customized, I usually do it on a domU.
Given the rate at which you reproduce this and because only the
snapshots seem to trigger the problem, to me this looks more like an
LVM/DM issue than pvops specific.
That has crossed my mind. The only reason that I suspected
anything to do with Xen or pvops was that it only seems to happen when
creating/removing a snapshot of an active, running HVM. I can create
and remove snapshots of other volumes all day and not trigger the bug
(tested yesterday). It would probably be impossible to trigger the bug
on a bare-metal machine that's not running a hypervisor.
Also, it might be worth trying to turn off udev and see whether that
I'm going to try to reproduce it on another, less critical machine
today, so I can poke at it a little more. I'll let you know what I find.
Xen-devel mailing list