________________________________________
From: Keir Fraser [keir.xen@xxxxxxxxx]
Sent: 28 July 2011 21:42
To: Andrew Cooper; xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] Debugging a weird hardware fault.
On 28/07/2011 20:53, "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx> wrote:
> My questions to the Xen community are:
>
> what (if any) new tasks get scheduled when a XENPF_enter_acpi_sleep is
> in action, and more generally, how can I go about debugging which tasks
> are being run.
By the time you get to time_suspend(), you are running on CPU0, all other
CPUs are offline, all domUs are suspended, and IRQs are disabled. There's
not much scope for unexpected interruptions unless it's an NMI or SMI.
By that point the serial subsystem is in synchronous mode, rather than
interrupt-driven, so it's no wonder it continues to work.
-- Keir
Initially I suspected an SMI, but the triple fault occurs whether or not you
start bringing down CPUs. Even while waiting 10 seconds in the platform_op
select statement, the fault still occurs with all CPUs still up, all IRQs still
enabled and potentially domUs still running. (Also, from studying the Xen 3.4
code, I believe that interrupts are actually still enabled during
time_suspend(), and are only disabled later by lapic_suspend() in
device_power_down().)
Conversely, in the hacked-up case where I ditched most of the shared S3/S5
code path and just wrote to the PM1A control register, the server shut down
correctly and stayed shut down, implying that the fault was caused by software
(be it BIOS or OS) rather than hardware. From what I understand of the ACPI
spec (and I claim very little knowledge), there are a multitude of hardware
events which could bring the server out of S5 and appear as a triple fault, and
none of them would be affected by whether or not you had written the PM1A
register.
In this specific example, dom0's regular shutdown code had already brought down
the domUs (of which there were none, because we never started any), and we were
running with only 1 CPU, so no others were up. That leaves a whole host of other
possibilities which could be having an effect between the
XENPF_enter_acpi_sleep hypercall and Xen actually shutting itself down.
~Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel