This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
Home Products Support Community News


Re: [Xen-devel] [PATCH] Use timeout on xenstore read_reply to avoid task

To: Frank Pan <frankpzh@xxxxxxxxx>
Subject: Re: [Xen-devel] [PATCH] Use timeout on xenstore read_reply to avoid task hunging
From: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
Date: Wed, 2 Mar 2011 10:34:22 +0000
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Jeremy, Fitzhardinge <Jeremy.Fitzhardinge@xxxxxxxxxx>
Delivery-date: Wed, 02 Mar 2011 02:35:35 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <AANLkTimsaSKxCxRqBkpgVs30_jgdHrgw2ibFsU_gKCo6@xxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Organization: Citrix Systems, Inc.
References: <AANLkTimsaSKxCxRqBkpgVs30_jgdHrgw2ibFsU_gKCo6@xxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
(Chris isn't involved with Xen maintenance any more, moving him to BCC).

On Wed, 2011-03-02 at 10:20 +0000, Frank Pan wrote:
> Recent pvops kernel uses wait_event on waiting reply from xenstored.
> This may cause xenstore clients hung inside kernel when xenstored does
> not response correctly. The hung even happens when xenstored is not
> running, and may confuse the developer.
> The patch attached uses wait_event_timeout instead, and return -EIO to
> userspace if xenstored does not response in 5 seconds.
> Simply change wait_event to wait_event_timeout is not correct. Right
> after the xenstored starts working, the requests abandoned before will
> be processed by xenstored, and responses sent to the reply_list queue
> will confuse the requests later. The patch also makes use of the
> req_id section in struct xsd_sockmsg, as a sequence id. This avoids
> the confusing of responses by abandoned requests.
> Any suggestions? Thanks.

It all sounds very plausible to me but you've forgotten the patch ;-)

Why wait_event_timeout and not wait_event_interruptible to allow users
to interrupt? In particular I'm concerned about the arbitrarily chosen
5s timeout not being sufficient on a busy system.

There was a recent discussion of the general issue you are solving on
xen-devel which it might be worth looking over. It was a thread titled
"libxl: error handling before xenstored runs" starting on 9 February.

Once specific pitfall which I remember was that userspace clients are at
liberty to make use of the req_id themselves (and some do). Fixing this
might involve shadowing the user provided req_id with a kernel generated
ID on the ring and unshadowing in the responses...


Xen-devel mailing list