We have noticed occasional exceptions being reported by xend in 3.0.4 at the
end of save/live migration. It appears to be caused by a lack of
synchronization between the destroyDomain() call done at the end of save and
the code in XendDomain.py that watches for events on '@releaseDomain'. Here's
a sample of xend.log:
[2007-04-20 03:47:05 xend 8525] DEBUG (XendCheckpoint:95) Written done
[2007-04-20 03:47:05 xend 8527] INFO (XendCheckpoint:271) SUSPEND shinfo
0000022e eip c01013a7 edx 00079894
[2007-04-20 03:47:05 xend 8527] INFO (XendCheckpoint:271) delta 9ms, dom0 33%,
target 0%, sent 167Mb/s, dirtied 233Mb/s 64 pages
[2007-04-20 03:47:05 xend 8527] INFO (XendCheckpoint:271) Saving memory pages:
iter 5 0%
5: sent 64, skipped 0, delta 3ms, dom0 0%, target 0%, sent 699Mb/s, dirtied
699Mb/s 64 pages
[2007-04-20 03:47:05 xend 8527] INFO (XendCheckpoint:271) Total pages sent=
108424 (0.98x)
[2007-04-20 03:47:05 xend 8527] INFO (XendCheckpoint:271) (of which 0 were
fixups)
[2007-04-20 03:47:05 xend 8527] INFO (XendCheckpoint:271) All memory is saved
[2007-04-20 03:47:05 xend 8527] INFO (XendCheckpoint:271) Save exit rc=0
[2007-04-20 03:47:05 xend.XendDomainInfo 8525] DEBUG (XendDomainInfo:1481)
XendDomainInfo.destroyDomain(1)
[2007-04-20 03:47:05 xend 5720] ERROR (xswatch:79) read_watch failed
Traceback (most recent call last):
File
"/test_logs/builds/SuperNova/trunk/070420/platform/xen/vendor/dist/install/usr/lib/python/xen/xend/xenstore/xswatch.py",
line 67, in watchMain
File
"/test_logs/builds/SuperNova/trunk/070420/platform/xen/vendor/dist/install/usr/lib/python/xen/xend/XendDomain.py",
line 146, in _on_domains_changed
File
"/test_logs/builds/SuperNova/trunk/070420/platform/xen/vendor/dist/install/usr/lib/python/xen/xend/XendDomain.py",
line 395, in _refresh
File
"/test_logs/builds/SuperNova/trunk/070420/platform/xen/vendor/dist/install/usr/lib/python/xen/xend/XendDomainInfo.py",
line 1832, in update
File
"/test_logs/builds/SuperNova/trunk/070420/platform/xen/vendor/dist/install/usr/lib/python/xen/xend/XendDomainInfo.py",
line 957, in refreshShutdown
File "/usr/lib/python2.3/logging/__init__.py", line 893, in info
apply(self._log, (INFO, msg, args), kwargs)
File "/usr/lib/python2.3/logging/__init__.py", line 994, in _log
self.handle(record)
File "/usr/lib/python2.3/logging/__init__.py", line 1004, in handle
self.callHandlers(record)
File "/usr/lib/python2.3/logging/__init__.py", line 1037, in callHandlers
hdlr.handle(record)
File "/usr/lib/python2.3/logging/__init__.py", line 592, in handle
self.emit(record)
File "/usr/lib/python2.3/logging/handlers.py", line 102, in emit
msg = "%s\n" % self.format(record)
File "/usr/lib/python2.3/logging/__init__.py", line 567, in format
return fmt.format(record)
File "/usr/lib/python2.3/logging/__init__.py", line 362, in format
record.message = record.getMessage()
File "/usr/lib/python2.3/logging/__init__.py", line 233, in getMessage
msg = msg % self.args
TypeError: int argument required
I think this can be explained by the fact that destroyDomain sets self.domId to
None immediately after calling xc.domain_destroy - there is a window here in
which the _on_domains_changed processing in XendDomain.py can enter the
refreshShutdown routine at the same moment that self.domId is being set to
None. As far as I can tell, this code has not changed in unstable, so the
problem still exists. I guess it's mostly cosmetic, since the
_on_domains_changed callback wraps the refresh call in an exception handler,
but the error is disconcerting: it appears to come from destroyDomain until you
look closely.
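The race can be sketched as follows. This is a minimal toy illustration, not
the actual xend code - the class, attribute, and message names here are
invented for the example - assuming the fix is to snapshot the id (or hold a
lock) before using it in a %d format:

```python
import threading

class DomainInfo:
    """Toy stand-in for XendDomainInfo; names are illustrative only."""

    def __init__(self, domid):
        self.domid = domid
        self.lock = threading.Lock()

    def destroyDomain(self):
        # In xend, xc.domain_destroy() runs here and the id is cleared
        # right afterwards - that clearing opens the race window.
        with self.lock:
            self.domid = None

    def refreshShutdown(self):
        # Snapshot the id once, under the lock.  Without this, the
        # watcher thread can read an int for the None-check and then
        # see None by the time the %d formatting runs, which raises
        # the "int argument required" TypeError seen in the log.
        with self.lock:
            domid = self.domid
        if domid is None:
            return "domain already destroyed"
        return "Domain has shutdown: id=%d" % domid

# The failure mode in the traceback: %d applied to None.
try:
    "id=%d" % None
except TypeError:
    pass  # this is the exception the logging call surfaces
```

With the snapshot in place, the watcher either formats a valid integer or
takes the early-return path; it can no longer hand None to the logger.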
Simon
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel