On Wed, 2010-10-06 at 11:39 +0100, Stefano Stabellini wrote:
> On Tue, 5 Oct 2010, Sergey Tovpeko wrote:
> > Hello, list!
> >
> > xl reports the error on passthrough-ed pci device removing.
> >
> > do_pci_remove device 01:00.0
> > libxl: error: libxl_device.c:448:libxl__wait_for_device_model Device
> > Model not ready
> > libxl: error: libxl_pci.c:858:do_pci_remove Device Model didn't respond
> > in time
> > libxl: error: libxl.c:944:libxl_domain_destroy pci shutdown failed for
> > domid 1
> > libxl: error: libxl.c:896:libxl_destroy_device_model Couldn't find
> > device model's pid: No such file or directory
> > libxl: error: libxl.c:956:libxl_domain_destroy
> > libxl_destroy_device_model failed for 1
> > libxl: error: libxl_device.c:307:libxl__devices_destroy
> > /local/domain/1/device is empty
> >
> >
> >
> > It seems that libxl_pci didn't get the 'pci-removed' status from
> > qemu-dm. Please, have a look who should set this status in qemu-dm. As
> > for me I added xenstore_record_dm_state("pci-removed");
> > after do_pci_del(par);
> > in xenstore_process_dm_command_event function.
> >
> > It fixed up my issue of removing pci devices.
>
> What guest OS are you using?
> Currently "pci-removed" is only written in response of an eject command
> from the guest OS, that means that if the guest doesn't support pci
> hotplug the value won't be written.
> If you are using Linux you should make sure that the acpiphp module is
> loaded, if you are using Windows, I think the only version that supports
> pci hotplug is Windows Server 2008 but I might be wrong.
I am pretty sure I tested this with such a guest operating system and
got the same failure but have not had time to confirm and track it down.
There is another issue here which is that when xl bails after such a
missing device removal we are in an inconsistent state where the device
has been removed from xenstore but the guest still clings to it.
If you recall, Stefano, we discussed this issue a few weeks ago. I was
arguing in favour of a force-removal option which I still think is
viable. However, a minimal, and more safety conscious fix would be to
keep the device assigned until the domain shuts down in this instance.
Following patch implements the 'force' method but if we keep a
RETURN_FAIL after the libxl__wait_for_device_model it would be the safe
variant.
Does this solve the problem for you Sergey?
---
diff -r 36420e35c65a tools/libxl/libxl_pci.c
--- a/tools/libxl/libxl_pci.c Wed Sep 22 16:57:12 2010 +0100
+++ b/tools/libxl/libxl_pci.c Wed Sep 22 18:03:40 2010 +0100
@@ -858,8 +858,6 @@ static int do_pci_remove(libxl__gc *gc,
}
}
- libxl_device_pci_remove_xenstore(gc, domid, pcidev);
-
hvm = libxl__domain_is_hvm(ctx, domid);
if (hvm) {
if (libxl__wait_for_device_model(ctx, domid, "running", NULL, NULL) <
0) {
@@ -878,7 +876,9 @@ static int do_pci_remove(libxl__gc *gc,
xs_write(ctx->xsh, XBT_NULL, path, "pci-rem", strlen("pci-rem"));
if (libxl__wait_for_device_model(ctx, domid, "pci-removed", NULL,
NULL) < 0) {
LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "Device Model didn't respond
in time");
- return ERROR_FAIL;
+ /* This depends on guest operating system acknowledging the
SCI, if it doesn't
+ * respond in time then force the remove.
+ */
}
}
path = libxl__sprintf(gc, "/local/domain/0/device-model/%d/state",
domid);
@@ -924,7 +924,7 @@ skip1:
if ((fscanf(f, "%u", &irq) == 1) && irq) {
rc = xc_physdev_unmap_pirq(ctx->xch, domid, irq);
if (rc < 0) {
- LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc,
"xc_physdev_map_pirq irq=%d", irq);
+ LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc,
"xc_physdev_unmap_pirq irq=%d", irq);
}
rc = xc_domain_irq_permission(ctx->xch, domid, irq, 0);
if (rc < 0) {
@@ -951,6 +951,8 @@ out:
libxl_device_pci_remove(ctx, stubdomid, &pcidev_s);
}
+ libxl_device_pci_remove_xenstore(gc, domid, pcidev);
+
return 0;
}
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
|