WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-devel

Re: [Xen-devel] [PATCH] Avoid triggering the softlockup BUG when offline

To: Keir Fraser <keir@xxxxxxxxxxxxx>
Subject: Re: [Xen-devel] [PATCH] Avoid triggering the softlockup BUG when offline for too long.
From: Glauber de Oliveira Costa <gcosta@xxxxxxxxxx>
Date: Mon, 27 Nov 2006 13:31:12 -0200
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date: Mon, 27 Nov 2006 07:31:15 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <C1906CC2.519F%keir@xxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <20061124131022.GB7171@xxxxxxxxxx> <C1906CC2.519F%keir@xxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.5.11
On Mon, Nov 27, 2006 at 10:21:54AM +0000, Keir Fraser wrote:
> 
> 
> 
> On 24/11/06 13:10, "Glauber de Oliveira Costa" <gcosta@xxxxxxxxxx> wrote:
> 
> > After being offline for a long time, the softlockup watchdog triggers
> > a BUG() on our faces. This is expected, as in fact, we spent more than
> > a fixed 10*HZ amount of time without touching the watchdog.
> > 
> > However, by inspecting the contents of RUNSTATE_offline, we can gain
> > awareness of the fact, and do better than that. This patch fixes it.
> > 
> > Signed-off-by: Glauber de Oliveira Costa <gcosta@xxxxxxxxxx>
> 
> Would 'stolen' not be a good enough thing to test? Presumably this is really
> just dealing with xm pause/unpause (a single long offline) so this simpler
> fix would work just as well?

I thought about it, but I'm not 100% sure. My reasons for not using
stolen were basically:

* Conceptually (maybe not in practice), stolen could grow due to
runnable time only.
* Stolen time, like blocked time, does not have its corresponding
per-processor variable updated all at once, but in chunks that are
multiples of NS_PER_TICK. If we're out for too long, we could detect
stolen being too large multiple times, leading to far more calls to the
softlockup watchdog than we want.
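
To make the second point concrete, here is a toy model in C. It is not
the actual kernel code. It assumes that a backlog of unaccounted stolen
time is consumed in fixed-size chunks and that a "too large" test runs
before each chunk is consumed. The function names, the threshold and the
chunk size are all hypothetical. For contrast, the second function does a
single comparison against an offline-time counter, roughly in the spirit
of checking RUNSTATE_offline.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model (assumption, not kernel code): count how many times a
 * "stolen time is too large" test fires while a backlog of unaccounted
 * stolen time is consumed chunk_ns nanoseconds at a time.
 */
unsigned int stolen_check_hits(uint64_t backlog_ns, uint64_t chunk_ns,
                               uint64_t threshold_ns)
{
    unsigned int hits = 0;

    assert(chunk_ns > 0); /* a zero chunk would never drain the backlog */

    while (backlog_ns > 0) {
        if (backlog_ns > threshold_ns)
            hits++; /* the watchdog would be touched again here */
        /* consume one chunk, or whatever is left of the backlog */
        backlog_ns -= (backlog_ns < chunk_ns) ? backlog_ns : chunk_ns;
    }
    return hits;
}

/*
 * Toy model (assumption): compare an offline-time counter against the
 * value seen last time. The whole delta is consumed in one step, so a
 * single long offline period is reported at most once.
 */
unsigned int offline_check_hits(uint64_t offline_now_ns,
                                uint64_t *offline_seen_ns,
                                uint64_t threshold_ns)
{
    uint64_t delta = offline_now_ns - *offline_seen_ns;

    *offline_seen_ns = offline_now_ns; /* remember what we accounted for */
    return delta > threshold_ns ? 1 : 0;
}
```

With a 30 s backlog, a 10 s threshold and 5 s chunks,
stolen_check_hits() returns 4. offline_check_hits() returns 1 on the
first call for the same 30 s offline period and 0 on the next call.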

Waiting for your comments on this,

-- 
Glauber de Oliveira Costa
Red Hat Inc.
"Free as in Freedom"

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel