This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


[Xen-devel] RE: How to generate a HW NMI

> BTW, "rmmod processor thermal" (should be equivalent to your Xen
> parameter) did not make a difference here.

I am not familiar with the thermal module, but my guess is that it does
not deal with the C3 state, which can be entered when the kernel
becomes idle.  I believe the thermal module plays with another type of
state (P?) where it alters the voltage and frequency of the CPU to keep
the CPU running, but at a particular % of its top speed.  The C3 state
shuts the CPU clocks down entirely, and the CPU is then woken by an
external event.
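
To make the distinction above concrete: on a native Linux host the idle
(C) states and the current frequency (P-state side) are exposed through
separate sysfs directories.  The following is only a minimal sketch,
assuming the standard cpuidle and cpufreq sysfs layout; the helper names
are made up for illustration.  Under Xen, where the hypervisor manages
idle states itself, these files may not reflect what the hardware is
actually doing.

```python
from pathlib import Path


def read_idle_states(cpu_dir):
    """Return [(name, usage_count), ...] for each cpuidle state of one CPU.

    cpu_dir is assumed to look like /sys/devices/system/cpu/cpu0.
    An empty list is returned if the cpuidle directory is absent.
    """
    idle_dir = Path(cpu_dir) / "cpuidle"
    if not idle_dir.is_dir():
        return []
    states = []
    # Directories are named state0, state1, ...; sort them numerically.
    for state in sorted(idle_dir.glob("state[0-9]*"),
                        key=lambda p: int(p.name[len("state"):])):
        name = (state / "name").read_text().strip()
        usage = int((state / "usage").read_text())
        states.append((name, usage))
    return states


def read_cur_freq_khz(cpu_dir):
    """Return the current CPU frequency in kHz, or None if cpufreq is absent."""
    freq_file = Path(cpu_dir) / "cpufreq" / "scaling_cur_freq"
    if not freq_file.is_file():
        return None
    return int(freq_file.read_text())
```

Calling read_idle_states("/sys/devices/system/cpu/cpu0") twice, a few
seconds apart, and comparing the usage count of the C3 entry would show
whether that state is being entered at all.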


-----Original Message-----
From: Jan Kiszka [mailto:jan.kiszka@xxxxxxxxxxx] 
Sent: Monday, October 04, 2010 11:23 AM
To: Roger Cruz
Cc: Konrad Rzeszutek Wilk; xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: How to generate a HW NMI

On 04.10.2010 16:19, Roger Cruz wrote:
> Until Friday, all hard hangs that we and our customers had experienced
> were on Lenovo T500 and X200, even with their latest BIOSes.

Yeah, the T500 was reported as problematic here as well. My Fujitsu
Celsius H700 also crashes.

In contrast, we have positive results from a Dell server with an Asus
P6T Deluxe V2 board and a Core i7 920.

>  The Lenovo
> T400 has never hung for me and I don't have any reports on them from
> field.  On Friday, I had an HP i5 hard hang with similar footprint as

i5? Mmh, we only have reports from i7 so far. Which BIOS vendor?

> the Lenovos.  When this hard hang happens, the Xen watchdog (which is
> driven by the NMI handler) will not do its job and cause a crash/stack
> trace.
>  This is why we have started to suspect something with the BIOS
> and SMIs as they are the only thing that can block an NMI.  I am
> certain that this is somehow related to entering C3 power states and
> possibly at the same time an SMI comes in.

I tried various stuff under Linux as well: nmi_watchdog=1, tracing to
VGA buffer right before/after guest-host switch (it always hangs after
entry here), verified guest interruptibility before entry (though
hypervisors usually do not play with the critical bits), read-out of
host RAM (including kernel log buffer) via Firewire - it all points to a
crash outside the scope of the host OS.
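
As a side note on the nmi_watchdog=1 step: one way to confirm that
watchdog NMIs are actually being delivered on a Linux host is to sample
the "NMI:" row of /proc/interrupts twice and compare.  This is only a
minimal sketch, assuming the usual x86 layout of that file (a label
followed by one counter per CPU and a description); the function names
are hypothetical.

```python
def parse_nmi_counts(text):
    """Return the per-CPU NMI counters found in /proc/interrupts content.

    Returns an empty list if there is no "NMI:" row.
    """
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for token in fields[1:]:
                # Counters come first; the trailing description is not numeric.
                if not token.isdigit():
                    break
                counts.append(int(token))
            return counts
    return []


def nmi_deltas(before, after):
    """Return how many NMIs each CPU received between two samples."""
    return [b - a for a, b in zip(before, after)]
```

Reading /proc/interrupts, sleeping for a few seconds, and reading it
again gives two samples for nmi_deltas(); a CPU whose delta stays at
zero while the watchdog is enabled is not receiving NMIs.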

>  The time it takes to hang
> varies from 30mins to 24 hrs.

We are a bit luckier, maybe due to our special guest (an old RTOS in
16-bit mode): I can reproduce the hang after a few minutes.

BTW, "rmmod processor thermal" (should be equivalent to your Xen
parameter) did not make a difference here.


Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
