This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


Re: [Xen-devel] should vcpu_pause()/vcpu_sleep_nosync() give up?

To: Jimi Xenidis <jimix@xxxxxxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: Re: [Xen-devel] should vcpu_pause()/vcpu_sleep_nosync() give up?
From: Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>
Date: Wed, 06 Sep 2006 17:02:49 +0100
Cc: xen-ppc-devel <xen-ppc-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Wed, 06 Sep 2006 09:02:57 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <73C91A43-2403-41A1-9DD3-7B8835CEFCDF@xxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AcbRzeZdJSgIhj3BEdu01wAKle7CWA==
Thread-topic: [Xen-devel] should vcpu_pause()/vcpu_sleep_nosync() give up?
User-agent: Microsoft-Entourage/
On 6/9/06 2:18 pm, "Jimi Xenidis" <jimix@xxxxxxxxxxxxxx> wrote:

> First off, I realize I have an SMP bug where my second processor is
> hung somewhere. I'm not sure where, but for the sake of this argument
> let's assume it has suffered an unrecoverable fault.
> My primary CPU is fine and is hung in vcpu_sleep_nosync() because the
> secondary will not clear its _VCPUF_running bit.

ITYM vcpu_sleep_sync(). The hint is in the name. ;-) The nosync variant does
not spin on the _VCPUF_running flag.

> While I have this error I would like to give up and try to recover
> from it.
> How long is long enough?
> Thoughts?

Holy crap!

Are you assuming that the offline CPU was not running anything other than
the idle loop or guest code, and that you'll simply destroy the guest if one
was running (since you can't really continue it)? Given that this is a
software bug, these assumptions likely do not hold, and the CPU has probably
gone down taking some locks with it. However, being optimistic, I suppose a
few hundred milliseconds would be plenty to know that something is probably up.
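
For illustration only, here is a minimal user-space sketch of the idea
discussed above: a nosync-style request that returns immediately, and a
sync-style wait that spins on a running flag but gives up after a deadline.
Everything in it is hypothetical (struct toy_vcpu, toy_sleep_nosync(),
toy_sleep_sync_timeout(), the use of C11 atomics and timespec_get()); it is
not Xen's implementation, and it says nothing about how to recover any locks
the stuck CPU may be holding.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a vcpu; this is NOT Xen's struct vcpu. */
struct toy_vcpu {
    atomic_bool running;        /* plays the role of _VCPUF_running */
    atomic_bool sleep_request;  /* set when someone asks the vcpu to sleep */
};

/* Wall-clock time in milliseconds. A real hypervisor would use its own
 * monotonic time source; timespec_get() is used here for portability. */
static long long now_ms(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* "nosync" flavour: post the request and return without waiting. */
static void toy_sleep_nosync(struct toy_vcpu *v)
{
    atomic_store(&v->sleep_request, true);
}

/* "sync" flavour with a deadline: post the request, then spin until the
 * running flag clears. Returns true if it cleared, false if we gave up. */
static bool toy_sleep_sync_timeout(struct toy_vcpu *v, long long timeout_ms)
{
    assert(timeout_ms >= 0);
    toy_sleep_nosync(v);

    long long deadline = now_ms() + timeout_ms;
    while (atomic_load(&v->running)) {
        if (now_ms() >= deadline)
            return false;   /* the other CPU never cleared the flag */
    }
    return true;
}
```

A caller would pass a budget in the "few hundred milliseconds" range and
treat a false return as "something is probably up", not as proof that it is
safe to carry on.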

 -- Keir

