After changeset 19679: ec2bc4b9fa32 (xend: hot-plug PCI devices at boot-time)
and the related ioemu commits, I notice there is a race condition that could
break the device assignment of hvm guest.
The scenario is (assuming we statically assign 1 NIC to hvm guest):
1) ioemu starts up; in xenstore_parse_domain_config(), ioemu adds a watch of
"device-model/<dom_id>/command" -- note: xs_watch() would fire the watch
immediately and a watch event is added into xsh->watch_list -- see
tools/xenstore/xenstored_watch.c:do_watch() -> add_event() and
tools/xenstore/xs.c:xs_watch();
2) in ioemu's main_loop(), ioemu invokes xenstore_record_dm_state("running");
3) xend invokes signalDeviceModel('pci-ins', 'pci-inserted', bdf_str) thats
writes "pci-ins" into "device-model/<dom_id>/command" -- note: it's the second
time the watch is fired and a new watch event is added into xsh->watch_list;
4) xend is scheduled out and ioemu is scheduled in (or, both processes are
running on different vcpus and ioemu runs a little faster at this time);
5) in main_loop(), ioemu invokes qemu_set_fd_handler(xenstore_fd(),
xenstore_process_event, NULL, NULL) and select();
For the command 'pci-ins', in xenstore_process_event(), ioemu
5.1) fetchs the first watch event from xsh->watch_list; in
xenstore_process_dm_command_event(), ioemu eventually writes the vslot into
/local/domain/0/device-model/<dom_id>/parameter on success;
5.2) fetchs the second watch event and invokes
xenstore_process_dm_command_event() again -- this invocation would fail due to
/local/domain/0/device-model/<dom_id>/parameter is not a valid bdf string so
ioemu would xenstore_record_dm("parameter", "no free hotplug slots");
6) now xend is scheduled in; in hvm_pci_device_insert_dev(), xend raises
VmError since vslot is "no free hotplug slots":
raise VmError(("Cannot pass-through PCI function '%s'. " +
"Device model reported an error: %s") %
(bdf_str, vslot))
The issue can be reproduced on some environments every time.
In ioemu's main_loop(), if we add a "sleep(3)" between
xenstore_record_dm_state("running") and qemu_set_fd_handler(xenstore_fd(),
xenstore_process_event, NULL, NULL), we can reproduce the issue every time.
A straightforward thought is: we can ignore the first watch fire. Please see
the below patch.
However I'm not sure if this is the best fix. And for the other watches(like
"logdirty", "hdc") set in xenstore_parse_domain_config(), I think we should
also double check them even if no actual bug has been observed yet.
Please comment.
Signed-off-by: Dexuan Cui <dexuan.cui@xxxxxxxxx>
diff --git a/xenstore.c b/xenstore.c
index d2f38d2..508f1c1 100644
--- a/xenstore.c
+++ b/xenstore.c
@@ -747,6 +747,18 @@ static void xenstore_process_dm_command_event(void)
char *path = NULL, *command = NULL, *par = NULL;
unsigned int len;
+ /* xs_watch() would fire the watch immediately -- see
+ * tools/xenstore/xenstored_watch.c:do_watch() -> add_event().
+ * To avoid race condition here, we should ignore the first fire and
+ * only handle the subsequent fire(s) triggered by the actual changes to
+ * the xenstore node.
+ */
+ static int first_time = 1;
+ if (first_time) {
+ first_time = 0;
+ goto out;
+ };
+
if (pasprintf(&path,
"/local/domain/0/device-model/%u/command", domid) == -1) {
fprintf(logfile, "out of memory reading dm command\n"); _______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
|