On Thu, 2005-08-04 at 14:09 +0100, Ian Pratt wrote:
> > FYI, I just pushed the accumulated xenstore daemon updates to
> > the xen-devel list. Some bug fixes, some cleaning, some
> > simplification, much more testing. The only API change is
> > that watches no longer have a priority (they all fire
> > simultaneously now), but no one used that arg anyway.
> The priority stuff looked like it was going to be useful to implement
> simple 'work flow' e.g. when a domain dies, enable the core dump daemon
> to hook in and run to completion before the domain reaper daemon, before
> the reboot daemon etc
This was the original intention, but it's been problematic in practice.
There's no simple way to implement timeouts, since, e.g., a core dump
daemon would be expected to take minutes. I then implemented "block the
writer until all acks come in", which can also be used to sync things,
and (as is obvious in retrospect) it immediately deadlocked as a watcher
wrote a value the original writer was watching. Removing this implementation
simplified a lot of code.
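To make the deadlock concrete, here is a rough sketch (not the xenstored
code) that models "block the writer until all acks come in" as a wait-for
graph and looks for a cycle. The process names and the find_wait_cycle
helper are hypothetical, and each process is assumed to wait on at most
one other, which is enough for the two-party case described above.

```python
# Illustration only: a tiny wait-for graph, not the real daemon.
# Key = blocked process, value = the process whose ack it is waiting for.

def find_wait_cycle(waits_for, start):
    """Follow wait-for edges from start; return the cycle found, or None."""
    path = []
    seen = set()
    node = start
    while node is not None and node not in seen:
        seen.add(node)
        path.append(node)
        node = waits_for.get(node)
    if node is None:
        return None  # the chain ended: somebody is free to ack
    # We came back to a process already on the path: that is the cycle.
    return path[path.index(node):] + [node]


waits_for = {
    # The writer wrote a value and blocks until the watcher acks.
    "writer": "watcher",
    # Before acking, the watcher wrote a value the writer is watching,
    # so it now blocks until the writer acks.
    "watcher": "writer",
}

cycle = find_wait_cycle(waits_for, "writer")
```

With the two edges above the helper returns the loop writer, watcher,
writer: neither side can ever ack. Drop the second edge and it returns
None.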
> What's the story for doing the above without support for priorities? Are
> we going to have to create some other convention that enables the
> various daemons to serialize themselves into the right (partial) order?
Yes. In the watch-based model, someone (xend?) was going to spot that
the domain had died, write a "dead" entry somewhere, which other daemons
would be watching for. It's possible to use the store like that, but
it's contrived: it's not really data, it's an event. You can tell this,
because if the reaper daemon isn't running (yet?), or doesn't get the
watch event for some other reason, the process fails. Fragile.
Consider this alternate strategy which is more robust (although it's
quite possible that xend execing known paths is a better model in
practice, and this generality is way overkill):
postmortemd registers interest by creating directory:
xend writes uuid to each dir under notification/death/
postmortemd does work, then deletes node.
xend notices deletion, notifies next daemon/cleans up etc.
In this case, the data contains the information, and it's fairly easy to
ensure that any of the daemons can be restarted at any time and see what
work there is to do.
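The steps above can be sketched roughly as follows. This is a toy
in-memory stand-in for the store, not the xenstore API: the Store class,
the helper names and the per-daemon directory layout
(notification/death/<daemon>/<uuid>) are all assumptions made for
illustration. Only the notification/death/ prefix comes from the
description above.

```python
# Toy model: the "store" is just a set of path strings.
# Every name here is hypothetical; none of this is the real xenstore API.

class Store:
    def __init__(self):
        self.nodes = set()

    def mkdir(self, path):
        self.nodes.add(path)

    def write(self, path):
        self.nodes.add(path)

    def rm(self, path):
        self.nodes.discard(path)

    def ls(self, path):
        """Return the names of the direct children of path, sorted."""
        prefix = path.rstrip("/") + "/"
        children = set()
        for node in self.nodes:
            if node.startswith(prefix):
                children.add(node[len(prefix):].split("/")[0])
        return sorted(children)


DEATH = "notification/death"


def register(store, daemon):
    # A daemon registers interest by creating its own directory.
    store.mkdir(f"{DEATH}/{daemon}")


def announce_death(store, uuid):
    # xend writes the uuid into each directory under notification/death/.
    for daemon in store.ls(DEATH):
        store.write(f"{DEATH}/{daemon}/{uuid}")


def pending(store, daemon):
    # What a daemon scans on (re)start: the work is data, not an event.
    return store.ls(f"{DEATH}/{daemon}")


def finish(store, daemon, uuid):
    # The daemon deletes the node once its work is done.
    store.rm(f"{DEATH}/{daemon}/{uuid}")


def all_done(store, uuid):
    # xend can clean up once no daemon directory still holds the uuid.
    return all(uuid not in store.ls(f"{DEATH}/{d}") for d in store.ls(DEATH))
```

Because the outstanding work lives in the store, a daemon that was down
when announce_death() ran still finds the uuid via pending() when it
comes back up, which is the property a pure watch event lacks.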
It's also fairly easy for xend to implement a priority scheme, filtering
or whatever turns out to make sense. A variant is that the event
("death" in this example) is actually written into the <uuid> node, rather
than being implied by the directory.
Hope that clarifies,
A bad analogy is like a leaky screwdriver -- Richard Braakman
Xen-tools mailing list