The current way the hotplug scripts report errors is sub-optimal, rarely
giving the user enough information to diagnose the problem. This is
particularly true of the networking scripts, whose typical failure mode
is very obscure. E.g., if I configure a guest to use bridge7, which does
not exist, I will be told:
# xm create demo
...60 seconds pass...
Error: Device 0 (vif) could not be connected. Hotplug scripts not working.
Having looked at the hotplug scripts and the way things tie into XenD, it
looks like we can do much better. Currently one of three things happens:
- The script explicitly calls 'fatal', which sets the 'hotplug-status'
  field in xenstore to 'error'. The real error message is sent to
  syslog, which, if you're lucky, may end up in /var/log/messages.
- A command exits with non-zero exit status, resulting in the signal
  trap being executed. This in turn calls 'fatal', with no useful
  error message.
- For block devices where a sharing mode violation is detected, the
  'hotplug-status' is set to 'busy' and a real error message is also
  set in xenstore under the 'hotplug-error' field.
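
To make the three paths above concrete, here is a rough sketch of their
shape. It is a simplified illustration, not the actual hotplug code:
'xs_write' is a hypothetical stand-in for the xenstore write helper which
just prints what would be stored, 'report_busy' is an illustrative name,
and syslog is represented by stderr.

```shell
#!/bin/sh
# Simplified sketch of the three failure paths; not the real hotplug scripts.

# Hypothetical stand-in for the xenstore write helper: prints key=value
# instead of touching xenstore.
xs_write() {
    echo "$1=$2"
}

# Case 1: explicit failure. Only the status reaches xenstore; the message
# itself goes to syslog (stderr here), so the caller never sees it.
fatal() {
    xs_write "hotplug-status" "error"
    echo "syslog: $*" >&2
    exit 1
}

# Case 2: handler for the error trap. It calls 'fatal' with a generic
# message that says nothing about which command failed or why.
# (Assumption: a real script would register this handler with a trap.)
sigerr() {
    fatal "$0 failed; error detected."
}

# Case 3: sharing violation on a block device. Both the message and the
# status are written, so the message can be read back later.
report_busy() {
    xs_write "hotplug-error" "$*"
    xs_write "hotplug-status" "busy"
    exit 1
}
```

Only the third function leaves anything under 'hotplug-error', which is
why it is the only path that can currently produce a real message for
the user.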
The last case is the interesting one, because when it sees a 'busy' status
code, XenD will extract the actual error message from 'hotplug-error' and
feed it all the way back to the end user/tool.
Basically I want the first two cases to also behave like the third, so that
whenever anything goes wrong in the hotplug scripts, the user always gets
a useful error message back. It also avoids having to wait for the 60 second
hotplug timeout.
The first case was easy to fix in this way: simply change XenD so that
it always looks for 'hotplug-error' for 'error' codes (as set by 'fatal')
as well as 'busy' codes.
The second case is a little trickier because we have to identify commands
in the hotplug scripts which are likely to fail, and add explicit handling
to enable meaningful error feedback. I've done such analysis for the
vif-bridge script and realized that the most likely cause of failure for
any command in this script is a missing bridge device. It is trivial to
do an upfront check for existence of the bridge device and immediately
feed an error back to the user.
I'm using 'ip link show $bridge' to detect whether the bridge exists or
not. I expect this is reasonably portable, but as always it's worth people
double-checking it works on their particular distro.
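
As a minimal sketch of that upfront check (not the patch itself): the
function names 'bridge_exists' and 'check_bridge' are made up for
illustration, and the sketch assumes 'ip link show DEVICE' exits non-zero
when the device is unknown.

```shell
#!/bin/sh
# Sketch of an upfront bridge check; function names are hypothetical.

# Succeeds if the named network device exists. Assumes 'ip link show DEV'
# returns a non-zero exit status for a device it cannot find.
bridge_exists() {
    ip link show "$1" >/dev/null 2>&1
}

# Prints the message to report and returns 1 when the bridge is missing.
# In a hotplug script the message would be handed to 'fatal' instead.
check_bridge() {
    if ! bridge_exists "$1"; then
        echo "Could not find bridge device $1"
        return 1
    fi
}
```

Running 'check_bridge "$bridge"' before any other command touches the
bridge means the failure is reported with a specific message rather than
through the generic error trap.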
I'm attaching a patch which implements both of these fixups. The net
result is that now when creating a guest with a config referring to a
non-existent bridge device I see:
# xm create demo
Error: Device 0 (vif) could not be connected. Could not find bridge device xenbr7
There may well be other places in the hotplug scripts which need fixing
up. We can address these as they turn up if we still see people getting
the generic 'Hotplug scripts not working' message.
Signed-off-by: Daniel P. Berrange <berrange@xxxxxxxxxx>
Regards,
Dan.
--
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|
xen-hotplug-error-reporting.patch
Description: Text document