Re: [Xen-devel] should vcpu_pause()/vcpu_sleep_nosync() give up?

To:	Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>
Subject:	Re: [Xen-devel] should vcpu_pause()/vcpu_sleep_nosync() give up?
From:	Jimi Xenidis <jimix@xxxxxxxxxxxxxx>
Date:	Thu, 7 Sep 2006 09:37:59 -0400
Cc:	xen-devel@xxxxxxxxxxxxxxxxxxx, xen-ppc-devel <xen-ppc-devel@xxxxxxxxxxxxxxxxxxx>


On Sep 6, 2006, at 12:02 PM, Keir Fraser wrote:

> On 6/9/06 2:18 pm, "Jimi Xenidis" <jimix@xxxxxxxxxxxxxx> wrote:
>
>> First off, I realize I have an SMP bug where my second processor is
>> hung somewhere, I'm not sure where, but for the sake of this argument
>> let's assume it has suffered an unrecoverable fault.
>>
>> My primary CPU is fine and is hung in vcpu_sleep_nosync() because the
>> secondary will not clear its _VCPUF_running bit.
>
> ITYM vcpu_sleep_sync(). Hint is in the name. ;-) The nosync variant does not
> spin on the _running flag.


Correct.

>> While I have this error I would like to give up and try and recover
>> from it.
>> How long is long enough?
>> thoughts?
>
> Holy crap!


I find these things to be rather UN-holy :)

> Are you assuming that the offline CPU was not running anything other than
> the idle loop or guest code, and that you'll simply destroy the guest if one
> was running (since you can't really continue it)?

Not sure how far I'd go here, but right now, I'd be happy with one
CPU not causing all CPUs (or the one servicing a xend command) to sit
in an infinite loop, even if it's my fault.

> Given that this is a
> software bug,


and there is always at least one :)

> these assumptions are likely not true and the CPU has gone
> down taking some locks with it.

Hypervisors should increase the availability of the machine as a
whole. PPC machines tend to have many HA features that, when unhandled
(mostly ECC), can cause a CPU to go down.

> However, being optimistic, I suppose a few
> 100ms would be plenty to know that something is probably up.


ok.. I'll work with that, thanks

-JX
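
For illustration only, here is a minimal sketch of the bounded wait discussed in this thread: spin on a "running" flag, but give up after a deadline instead of looping forever. It is not Xen code. The names demo_vcpu, now_ns() and wait_not_running() are hypothetical, the flag is a plain C11 atomic standing in for _VCPUF_running, and timespec_get() stands in for whatever monotonic time source a hypervisor would really use.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for one vcpu's _VCPUF_running bit. */
typedef struct {
    atomic_bool running;
} demo_vcpu;

/* Current time in nanoseconds (C11 timespec_get; wall clock, which is
 * an assumption made for portability of this sketch). */
static int64_t now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Spin until the running flag clears or timeout_ms elapses.
 * Returns true if the flag cleared, false if the wait was abandoned.
 */
static bool wait_not_running(demo_vcpu *v, int64_t timeout_ms)
{
    int64_t deadline = now_ns() + timeout_ms * 1000000LL;

    while (atomic_load_explicit(&v->running, memory_order_acquire)) {
        if (now_ns() >= deadline)
            return false;   /* remote CPU has probably stopped responding */
    }
    return true;
}
```

A caller might pass a timeout of a few hundred milliseconds, as suggested above, and treat a false return as "the other CPU is probably down". What to do next (fail the request, destroy the guest) is a separate decision, and, as noted in the thread, the dead CPU may still hold locks that this sketch does nothing about.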
