Dear all,
I am having problems with tapdisk devices:
- When shutting down the virtual machine, the tapdisk process
continues running, and the device is still present at
/sys/class/blktap2. It can be removed, though, by issuing echo 1 >
/sys/class/blktap2/blktap<id>/remove.
- I tried to duplicate the snapshot process implemented in
tools/blktap2/drivers/xmsnap, but using vhd snapshot instead of
qcow. The process seemed to work, but changes continue to be written
to the renamed disk, not to the snapshot. It seems that the tapdisk
process keeps its association with the opened file, even after the
file is moved.
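The second symptom is at least consistent with ordinary POSIX file
semantics: an open file descriptor refers to the inode, not to the
path, so renaming a file does not redirect a process that already has
it open. Below is a minimal sketch of that behavior using plain files,
not tapdisk or vhd images. The names disk.img and disk-parent.img are
illustrative only and are not taken from xmsnap.

```shell
#!/bin/sh
# Sketch: a descriptor opened before a rename keeps writing to the
# renamed file, not to a new file created at the original path.
set -eu

dir=$(mktemp -d)
cd "$dir"

: > disk.img                  # stand-in for the original disk image
exec 3>>disk.img              # hold it open, as a long-running process would

mv disk.img disk-parent.img   # "snapshot": move the original aside
: > disk.img                  # new, empty file at the original path

echo "written after rename" >&3   # write through the old descriptor
exec 3>&-                          # close it

parent_size=$(wc -c < disk-parent.img | tr -d ' ')
new_size=$(wc -c < disk.img | tr -d ' ')
echo "renamed file: $parent_size bytes, new file: $new_size bytes"

cd /
rm -rf "$dir"
```

If tapdisk holds the image open in the same way, the rename alone
would not redirect its writes. That is an assumption about the cause,
not a confirmed diagnosis.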
I'm using xen on a CentOS 5 distro, with xen and kernel compiled
from xen's own baselines. I noticed the same behavior in xen
4.0.2.rc3 / kernel 2.6.32.36+fix and in xen 4.1.2.rc1-pre / kernel
2.6.32.43.
Info from an xl log file:
cat /var/log/xen/xl-teste020.log.2
Waiting for domain teste020 (domid 11) to die [pid 7352]
Domain 11 is dead
Unknown shutdown reason code 255. Destroying domain.
Action for shutdown reason code 255 is destroy
Domain 11 needs to be cleaned up: destroying the domain
libxl: error: libxl.c:734:libxl_domain_destroy xc_domain_pause failed for 11
libxl: error: libxl_dm.c:747:libxl__destroy_device_model Couldn't find device model's pid: No such file or directory
libxl: error: libxl.c:738:libxl_domain_destroy libxl__destroy_device_model failed for 11
libxl: error: libxl_dom.c:603:userdata_path unable to find domain info for domain 11: No such file or directory
libxl: error: libxl.c:755:libxl_domain_destroy xc_domain_destroy failed for 11
Done. Exiting now
As a hint, some months ago I posted a bug report to xen-devel
related to tapdisk failures, which was solved with a fix related to
spinlocks, recently delivered to the 2.6.32 pvops kernel baseline. At
that point, Daniel Stodden, who identified the needed fix, wrote:
"It's the only pending bugfix, quite an obvious one actually. It's
been rare enough unless provoked like Gerd did, but we found
it first in XCP so it actually tends to happen."
Actually, I'm not sure how I could be provoking any different
behavior from tapdisk, but it seems that some configuration I'm
using is leading tapdisk to some unexpected behavior.
The whole message exchange:
On Thu, 2011-04-14 at 12:38 -0400, Daniel Stodden wrote:
> On Thu, 2011-04-14 at 09:15 -0400, Konrad Rzeszutek Wilk wrote:
> > On Wed, Apr 13, 2011 at 06:02:13PM -0300, Gerd Jakobovitsch wrote:
> > > I'm trying to run several VMs (linux hvm, with tapdisk:aio disks at
> > > a storage over nfs) on a CentOS system, using the up-to-date version
> > > of xen 4.0 / kernel pvops 2.6.32.x stable. With a configuration
> > > without (most of) debug activated, I can start several instances -
> > > I'm running 7 of them - but shortly afterwards the system stops
> > > responding. I can't find any information on this.
> >
> > First time I see it.
> > >
> > > Activating several debug configuration items, among them
> > > DEBUG_PAGEALLOC, I get an exception as soon as I try to start up a
> > > VM. The system reboots.
> >
> > Oooh, and is the log below from that situation?
> >
> > Daniel, any thoughts?
>
> ---
> Unmap pages from the kernel linear mapping after free_pages().
> This results in a large slowdown, but helps to find certain types
> of memory corruption.
>
> Stunning. Our I/O page allocator is a sort of twisted mempool. Unless
> the allocation is explicitly modified in sysfs/, everything should stay
> pinned. We might be just tripping over debug code alone, but I didn't
> figure it out yet.
Ah, that's just missing Dominic's spinlock fix.
http://xenbits.xen.org/gitweb/?p=people/dstodden/linux.git;a=commit;h=a765257af7e28c41bd776c3e03615539597eb592
Daniel