|
|
|
|
|
|
|
|
|
|
xen-devel
Re: [Xen-devel] [PATCH] Use timeout on xenstore read_reply to avoid task
(Chris isn't involved with Xen maintenance any more, moving him to BCC).
On Wed, 2011-03-02 at 10:20 +0000, Frank Pan wrote:
> Recent pvops kernel uses wait_event on waiting reply from xenstored.
> This may cause xenstore clients hung inside kernel when xenstored does
> not response correctly. The hung even happens when xenstored is not
> running, and may confuse the developer.
>
> The patch attached uses wait_event_timeout instead, and return -EIO to
> userspace if xenstored does not response in 5 seconds.
>
> Simply change wait_event to wait_event_timeout is not correct. Right
> after the xenstored starts working, the requests abandoned before will
> be processed by xenstored, and responses sent to the reply_list queue
> will confuse the requests later. The patch also makes use of the
> req_id section in struct xsd_sockmsg, as a sequence id. This avoids
> the confusing of responses by abandoned requests.
>
> Any suggestions? Thanks.
It all sounds very plausible to me but you've forgotten the patch ;-)
Why wait_event_timeout and not wait_event_interruptible to allow users
to interrupt? In particular I'm concerned about the arbitrarily chosen
5s timeout not being sufficient on a busy system.
There was a recent discussion of the general issue you are solving on
xen-devel which it might be worth looking over. It was a thread titled
"libxl: error handling before xenstored runs" starting on 9 February.
Once specific pitfall which I remember was that userspace clients are at
liberty to make use of the req_id themselves (and some do). Fixing this
might involve shadowing the user provided req_id with a kernel generated
ID on the ring and unshadowing in the responses...
Ian.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
|
|
|
|
|