Re: [Xen-devel] Sles9.3 HVM guest block

To:	"Woller, Thomas" <thomas.woller@xxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject:	Re: [Xen-devel] Sles9.3 HVM guest block
From:	Keir Fraser <keir@xxxxxxxxxxxxx>
Date:	Tue, 20 Feb 2007 17:35:22 +0000
Cc:	"Wilson, Stephen" <Stephen.Wilson@xxxxxxx>
Delivery-date:	Tue, 20 Feb 2007 09:34:47 -0800
Envelope-to:	www-data@xxxxxxxxxxxxxxxxxx
In-reply-to:	<683860AD674C7348A0BF0DE3918482F6045DB4A9@xxxxxxxxxxxxxxxxx>
List-help:	<mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id:	Xen developer discussion <xen-devel.lists.xensource.com>
List-post:	<mailto:xen-devel@lists.xensource.com>
List-subscribe:	<http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe:	<http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender:	xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index:	AcdVEJN9twCKDJwESkePMcO9nEpUbAABOu2q
Thread-topic:	[Xen-devel] Sles9.3 HVM guest block
User-agent:	Microsoft-Entourage/11.2.5.060620

On 20/2/07 17:00, "Woller, Thomas" <thomas.woller@xxxxxxx> wrote:

> During regression testing of 32-bit UP SLES9.3/SUSE10 HVM guests on a
> 64-bit hypervisor, we are seeing a problem with the guest becoming
> permanently blocked (b state).  Blockage occurs at fairly random
> times (booting, fsck, ltp/cerberus) on both AMD-V and VT, and takes
> from 5 minutes to many hours to fail.  The last c/s tested in which
> we saw the problem was 13947.
> We've traced it back to changeset 13320.  If we boot the guest with
> hpet=disabled, then the guest runs without problem (tested 48 hours w/o
> failure).  Re-adding the "vcpu_kick" line removed by c/s 13320 also
> alleviates the problem (24 hours w/o failure).
> Let me know if you need any more details concerning the guest
> configuration or host machine, or if you believe alternate test
> parameters would be useful, and we can run additional tests.

Thanks for tracking this one down to the HPET logic. However, reinstating
the vcpu_kick() call that changeset 13320 removed is not really the correct
fix. A vcpu_kick() may rescue otherwise-lost VCPUs, I suppose, but there's no
logical reason that it should be necessary. Any necessary wakeup should occur
via an interrupt delivery from hpet_route_interrupt().

After all, there's no point in waking up a VCPU unless it has work to do,
which will usually mean that you are in the process of delivering it an
interrupt (hence the vcpu_kick() invocations in vpic.c, vioapic.c and
vlapic.c). The invocation in vpt.c is actually correct because it is tied up
in the pending_intr_nr logic which gets checked in the exit-to-guest path of
a woken VCPU.
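
To illustrate the point, here is a toy model in C of why a wakeup only helps
when it is paired with pending work. It is a sketch only: the names
(toy_vcpu, toy_kick, toy_deliver_interrupt, toy_exit_to_guest) are made up
for illustration and are not the real Xen structures or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in Xen. */
struct toy_vcpu {
    bool blocked;              /* true while the VCPU waits for an event */
    unsigned int pending_intr; /* interrupts recorded but not yet injected */
};

/* Wake the VCPU. On its own this gives the VCPU nothing to do. */
void toy_kick(struct toy_vcpu *v)
{
    assert(v != NULL);
    v->blocked = false;
}

/* Deliver an interrupt: record the work first, then wake the target. */
void toy_deliver_interrupt(struct toy_vcpu *v)
{
    assert(v != NULL);
    v->pending_intr++;
    toy_kick(v);
}

/*
 * Exit-to-guest path of a woken VCPU: inject one pending interrupt if
 * there is one, otherwise go back to the blocked state.
 * Returns true if an interrupt was injected.
 */
bool toy_exit_to_guest(struct toy_vcpu *v)
{
    assert(v != NULL);
    if (v->pending_intr == 0) {
        v->blocked = true;
        return false;
    }
    v->pending_intr--;
    return true;
}
```

In this model a bare toy_kick() followed by toy_exit_to_guest() simply
blocks again, whereas toy_deliver_interrupt() leaves something for the
exit path to inject.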

It's worth trying to grab some more info about a guest when it hangs: How
are the HPET timers configured? In particular, how should interrupts be
delivered? Does it look like an interrupt has been delivered but not
notified? Etc.
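
As a starting point for answering those questions, something like the
following could decode a dumped per-timer configuration value. It is a
minimal sketch: the bit positions are those of the Timer N Configuration
and Capability Register in the HPET 1.0a specification, but the struct and
function names are hypothetical, and how you obtain the raw register value
from the hung guest is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Timer N Configuration and Capability Register bits (HPET 1.0a spec). */
#define HPET_TN_INT_TYPE_LEVEL  (1ULL << 1)  /* 1 = level, 0 = edge */
#define HPET_TN_INT_ENB         (1ULL << 2)  /* timer may raise interrupts */
#define HPET_TN_TYPE_PERIODIC   (1ULL << 3)  /* 1 = periodic, 0 = one-shot */
#define HPET_TN_32MODE          (1ULL << 8)  /* force 32-bit operation */
#define HPET_TN_INT_ROUTE_SHIFT 9            /* I/O APIC input, bits 9..13 */
#define HPET_TN_INT_ROUTE_MASK  0x1FULL
#define HPET_TN_FSB_EN          (1ULL << 14) /* deliver via FSB message */

/* General Configuration Register bits (HPET 1.0a spec). */
#define HPET_CFG_ENABLE         (1ULL << 0)  /* main counter running */
#define HPET_CFG_LEGACY         (1ULL << 1)  /* legacy replacement route */

/* Hypothetical helper type: decoded view of one timer's configuration. */
struct hpet_timer_info {
    bool level_triggered;
    bool irq_enabled;
    bool periodic;
    bool force_32bit;
    bool fsb_delivery;
    unsigned int ioapic_route; /* ignored for timers 0/1 in legacy mode */
};

/* Decode a raw Timer N configuration value into its fields. */
struct hpet_timer_info hpet_decode_timer_config(uint64_t cfg)
{
    struct hpet_timer_info info;

    info.level_triggered = (cfg & HPET_TN_INT_TYPE_LEVEL) != 0;
    info.irq_enabled     = (cfg & HPET_TN_INT_ENB) != 0;
    info.periodic        = (cfg & HPET_TN_TYPE_PERIODIC) != 0;
    info.force_32bit     = (cfg & HPET_TN_32MODE) != 0;
    info.fsb_delivery    = (cfg & HPET_TN_FSB_EN) != 0;
    info.ioapic_route    = (unsigned int)((cfg >> HPET_TN_INT_ROUTE_SHIFT)
                                          & HPET_TN_INT_ROUTE_MASK);
    return info;
}

/* True if the general configuration selects legacy replacement routing. */
bool hpet_legacy_routing(uint64_t general_cfg)
{
    return (general_cfg & HPET_CFG_LEGACY) != 0;
}
```

If legacy replacement routing is enabled, the spec says the route field of
timers 0 and 1 is ignored, so both values are worth recording.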

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
