xen-devel

Re: [Xen-devel] Xend problems through CS 15250

To: Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>
Subject: Re: [Xen-devel] Xend problems through CS 15250
From: Stefan Berger <stefanb@xxxxxxxxxx>
Date: Sat, 16 Jun 2007 13:43:14 -0400
Cc: jfehlig@xxxxxxxxxx, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, John Levon <john.levon@xxxxxxx>, xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
In-reply-to: <C2997896.9429%Keir.Fraser@xxxxxxxxxxxx>

I have reapplied 15252 to see what's going on.

This is from 15252:


--- a/tools/python/xen/xend/XendDomainInfo.py                 Mon Jun 11 10:16:54 2007 +0100
+++ b/tools/python/xen/xend/XendDomainInfo.py                 Mon Jun 11 10:21:11 2007 +0100
@@ -557,7 +557,23 @@ class XendDomainInfo:
                return None

        log.debug("dev = %s", dev)
-        return self.getDeviceController(deviceClass).destroyDevice(dev, force)
+
+        dev_control = self.getDeviceController(deviceClass)
+        dev_uuid = dev_control.readBackend(dev, 'uuid')


The 'dev' passed to readBackend above must be an integer, but in this case it is a string. For some reason a device name such as 'xvda1' also comes through here (now?). I intercept this case with the following 'hack'.

diff -r 1feb91894e11 tools/python/xen/xend/XendDomainInfo.py
--- a/tools/python/xen/xend/XendDomainInfo.py                 Fri Jun 15 16:51:08 2007 +0100
+++ b/tools/python/xen/xend/XendDomainInfo.py                 Sat Jun 16 11:44:25 2007 -0400
@@ -555,6 +555,11 @@ class XendDomainInfo:
            if dev == None:
                log.debug("Could not find the device %s", devid)
                return None
+            try:
+                dev = int(dev)
+            except ValueError:
+                log.info("Not destroying device '%s'" % dev)
+                return
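
To illustrate why the string device ID trips the "%s/%d" formatting shown in the traceback, and what the int() guard does, here is a minimal standalone sketch. It is not xend code: frontend_path, normalize_devid and the root path are hypothetical stand-ins, and it is written for Python 3, whereas xend at the time ran on Python 2.

```python
def frontend_path(root, devid):
    # Same "%s/%d" formatting as in the traceback; %d rejects a str argument.
    return "%s/%d" % (root, devid)


def normalize_devid(dev):
    """Return dev as an int, or None if it is a name such as 'xvda1'."""
    try:
        return int(dev)
    except (TypeError, ValueError):
        return None


root = "/local/domain/4/device/vbd"  # hypothetical frontend root

# A numeric string converts cleanly and formats fine.
print(frontend_path(root, normalize_devid("51713")))

# A device name passed straight through raises TypeError, as in the traceback.
try:
    frontend_path(root, "xvda1")
except TypeError as exc:
    print("TypeError:", exc)

# The guard turns the name into None so the caller can skip the device.
print(normalize_devid("xvda1"))
```

Note that the guard only avoids the exception; it does not resolve a name like 'xvda1' to its numeric device ID.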



The next problem that's occurring is in DevController:

+                 # Wait till both frontpath and backpath are removed from
+                 # xenstore, or timed out
+                 if frontpath:
+                     status = self.waitUntilDestroyed(frontpath)
+                     if status == Timeout:
+                         # Exception will be caught by destroyDevice in XendDomainInfo.py
+                         raise EnvironmentError
+                 if backpath:
+                     status = self.waitUntilDestroyed(backpath)
+                     if status == Timeout:
+                         # Exception will be caught by destroyDevice in XendDomainInfo.py
+                         raise EnvironmentError

 
        self.vm._removeVm("device/%s/%d" % (self.deviceClass, devid))

The EnvironmentError is raised because of the Timeout status, and the propagated exception indicates that the device has not been removed. I am not sure whether that is actually true.
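
The wait-then-raise pattern can be sketched in isolation as follows. This is only an illustration under assumptions: the status constants, wait_until_destroyed and destroy_device are hypothetical names, and a threading.Event stands in for the xenstore watch that xend actually uses.

```python
import threading

DESTROYED = "destroyed"  # hypothetical status values
TIMEOUT = "timeout"


def wait_until_destroyed(removed_event, timeout):
    """Block until removed_event is set or timeout seconds have elapsed."""
    if removed_event.wait(timeout):
        return DESTROYED
    return TIMEOUT


def destroy_device(removed_event, timeout):
    status = wait_until_destroyed(removed_event, timeout)
    if status == TIMEOUT:
        # The caller only learns that removal was not observed in time.
        raise EnvironmentError("device removal timed out")
    return status


# Case 1: the removal callback fires within the timeout.
ev = threading.Event()
threading.Timer(0.05, ev.set).start()
print(destroy_device(ev, timeout=1.0))

# Case 2: the callback never fires, so the caller sees the exception.
never = threading.Event()
try:
    destroy_device(never, timeout=0.1)
except EnvironmentError as exc:
    print("caller saw:", exc)
```

In this sketch a timeout says only that no removal was seen within the window. It cannot tell a device that is still present from one whose removal is reported later.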


I added some debugging output and no longer raise the EnvironmentErrors.

[2007-06-16 12:58:48 4939] DEBUG (XendDomainInfo:1362) Removing vbd/51713
[2007-06-16 12:58:48 4939] DEBUG (XendDomainInfo:564) dev = 51713
[2007-06-16 12:59:03 4939] INFO (DevController:235) ----> status = 4
[2007-06-16 12:59:03 4939] INFO (DevController:239) ---> Would raise an EnvironmentError for frontpath
[2007-06-16 12:59:18 4939] INFO (DevController:246) ---> Would raise an EnvironmentError for backpath
[2007-06-16 12:59:18 4939] DEBUG (XendDomainInfo:1362) Removing console/0
[2007-06-16 12:59:18 4939] DEBUG (XendDomainInfo:564) dev = 0
[2007-06-16 12:59:18 4939] DEBUG (DevController:578) destroyCallback /local/domain/4/device/vbd/51713.
[2007-06-16 12:59:18 4939] DEBUG (DevController:586) destroyCallback /local/domain/4/device/vbd/51713 is destroyed
[2007-06-16 12:59:18 4939] DEBUG (DevController:578) destroyCallback /local/domain/0/backend/vbd/4/51713.
[2007-06-16 12:59:18 4939] DEBUG (DevController:578) destroyCallback /local/domain/0/backend/vbd/4/51713/state.
[2007-06-16 12:59:18 4939] DEBUG (DevController:578) destroyCallback /local/domain/0/backend/vbd/4/51713.
[2007-06-16 12:59:18 4939] DEBUG (DevController:586) destroyCallback /local/domain/0/backend/vbd/4/51713 is destroyed

I raised the timeout for waiting on the destruction of the device to 15 seconds. The callbacks appear to fire because of the destruction of the domain; at least that's my interpretation. The xm-test cases block-create 04 and 09 are good candidates for testing this.
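
As a quick check of the log excerpt above, the gaps between the three relevant timestamps can be computed directly. The snippet is a small sketch that hard-codes those three log lines and assumes the "[YYYY-MM-DD HH:MM:SS pid]" prefix seen in them.

```python
from datetime import datetime

log_lines = [
    "[2007-06-16 12:58:48 4939] DEBUG (XendDomainInfo:564) dev = 51713",
    "[2007-06-16 12:59:03 4939] INFO (DevController:239) ---> Would raise an EnvironmentError for frontpath",
    "[2007-06-16 12:59:18 4939] INFO (DevController:246) ---> Would raise an EnvironmentError for backpath",
]


def timestamp(line):
    # The date and time occupy characters 1..19, right after the opening "[".
    return datetime.strptime(line[1:20], "%Y-%m-%d %H:%M:%S")


stamps = [timestamp(line) for line in log_lines]
gaps = [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]
print(gaps)  # [15, 15]
```

Both the frontpath wait and the backpath wait ran for the full 15 seconds, and the destroyCallback entries appear only after the second wait had expired.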

   Stefan



xen-devel-bounces@xxxxxxxxxxxxxxxxxxx wrote on 06/16/2007 06:23:02 AM:

> Yes, I can repro this quite easily. However, the exception is coming
> from code added by changeset 15252 (Sun’s addition to make device
> detach wait for completion). But it’s probably a bad interaction
> between 15250 and 15252.
>
> Since I don’t know which of these changesets is actually bogus, I’ve
> reverted the later one (15252) for now. There were also complaints
> about ‘xm save’ not completing (which I’ve been unable to reproduce
> myself) and for which 15252 is one of the suspects, although this is
> not confirmed!
>
> So... Either 15250 needs fixing or reverting to allow 15252 to be
> correctly re-applied, or 15252 needs fixing in light of 15250. Take
> your pick. :-)
>
>  -- Keir
>
> On 16/6/07 05:28, "Stefan Berger" <stefanb@xxxxxxxxxx> wrote:

>
> Changeset 15250 introduces some problems of this kind here:
>
> [2007-06-15 23:44:58 3440] DEBUG (XendDomainInfo:559) dev = 0
> [2007-06-15 23:44:58 3440] ERROR (XendDomainInfo:1363) Device
> release failed: 01_domu_proc-1181965480; console; console/0
> Traceback (most recent call last):
>   File "//usr/lib/python/xen/xend/XendDomainInfo.py", line 1358, in
> _releaseDevices
>     self.destroyDevice(devclass, dev, False);
>   File "//usr/lib/python/xen/xend/XendDomainInfo.py", line 562, in
> destroyDevice
>     dev_uuid = dev_control.readBackend(dev, 'uuid')
>   File "//usr/lib/python/xen/xend/server/DevController.py", line
> 407, in readBackend
>     frontpath = self.frontendPath(devid)
>   File "//usr/lib/python/xen/xend/server/DevController.py", line
> 550, in frontendPath
>     return "%s/%d" % (self.frontendRoot(), devid)
> TypeError: int argument required
>
> In this case the devid is a string and not an integer.
>
> I see that some code that does a search has been removed. I wonder
> whether that was the right thing to do...
>
> http://xenbits.xensource.com/xen-unstable.hg?rev/a43a03d53781
>
>   Stefan
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
