xen-devel

Re: [Xen-devel] Debugging a weird hardware fault.

To: Keir Fraser <keir@xxxxxxx>
Subject: Re: [Xen-devel] Debugging a weird hardware fault.
From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
Date: Tue, 2 Aug 2011 15:56:54 +0100
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, "winston.l.wang@xxxxxxxxx" <winston.l.wang@xxxxxxxxx>, "gang.wei@xxxxxxxxx" <gang.wei@xxxxxxxxx>
Delivery-date: Tue, 02 Aug 2011 08:01:22 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <CA5D5710.18DF4%keir@xxxxxxx>
References: <CA5D5710.18DF4%keir@xxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.18) Gecko/20110617 Lightning/1.0b2 Thunderbird/3.1.11

On 02/08/11 15:26, Keir Fraser wrote:
> On 02/08/2011 07:14, "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx> wrote:
>
>> Just for information, this turned out to be a BIOS bug.  It was setting
>> a 6 second timer when executing _PTS, which hit the system reset if
>> PM1{a,b} had not been hit when the timer expired.  As Xen does all of
>> its shutdown after the call to _PTS and before PM1{a,b}, there is a
>> significant time gap, which was falling foul of the timer in most cases.
> Six seconds though, that's quite a long time! Is it a big box?

It is a NetScaler SDX box, designed to have 24 logical pcpus, 96GB RAM, and
320 PCI-passed-through ixgbe virtual functions (claiming 3 irqs per VF).

It seems that Xen spends a fair amount of time doing freeze_domains
(even though dom0 has already shut down all domUs, albeit forcibly if
they haven't shut down nicely within 15 seconds), and bringing down the
other CPUs (in particular, it spends ages fiddling around with irq
affinities).

Overall, there is probably quite a bit of optimization which could be
done, but that still doesn't excuse a BIOS deciding that "a long time"
as per the ACPI spec is "less than 6 seconds".
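
To make the race concrete, here is a minimal sketch of the timing budget described above: the BIOS arms a 6 second timer at _PTS, and the box resets unless the PM1{a,b} write lands first. The step durations are made-up placeholders, not measurements from this machine, and `ShutdownStep` and `survives_watchdog` are names invented for the illustration.

```python
from dataclasses import dataclass

# From the thread: the BIOS timer armed when _PTS executes.
BIOS_TIMEOUT_S = 6.0


@dataclass
class ShutdownStep:
    name: str
    seconds: float  # hypothetical duration, not a measurement


def survives_watchdog(steps, timeout_s=BIOS_TIMEOUT_S):
    """Return (ok, elapsed).

    ok is True only if all work between _PTS and the PM1{a,b} write
    finishes before the BIOS timer expires.
    """
    elapsed = sum(step.seconds for step in steps)
    return elapsed < timeout_s, elapsed


# Illustrative numbers only: the two phases named above.
steps = [
    ShutdownStep("freeze_domains", 2.5),
    ShutdownStep("bring down other CPUs (irq affinities)", 4.0),
]

ok, elapsed = survives_watchdog(steps)
print(f"elapsed={elapsed:.1f}s, PM1 write in time: {ok}")
```

With these placeholder values the work takes 6.5 seconds, so the timer would fire before the PM1{a,b} write; shaving any phase below the budget flips the result.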

~Andrew

>> In this case, it seems likely that a BIOS fix can be done, as Supermicro
>> do provide a custom BIOS for the NetScaler box in question.
>>
>> However, if anyone else comes across this issue, we did make a software
>> solution.  You can replace /etc/init.d/halt (or equivalent for your
>> chosen dom0 distro) to KEXEC reboot into a native kernel which listens
>> for a special command line parameter and calls pm_power_off_prepare()
>> and pm_power_off() after the ACPI module has initialized[1].
>>
>> This issue does however show that Xen itself is in breach of the ACPI
>> spec, which is a dangerous situation to be in given the fragility of
>> ACPI at the best of times.  In due course, I will put my mind to solving
>> the dom0-Xen ACPI interaction problems if the question is still open.
> Yes, this is ultimately the issue. It's going to be a pain to fix properly,
> unfortunately.
>
>  -- Keir
>
>
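
For anyone wanting to try the kexec workaround quoted above, the replacement halt script essentially has to load a native kernel with an extra command-line flag and then jump into it. The sketch below only builds the two `kexec` invocations (`-l` to load, `-e` to execute) and prints them. The flag name `acpi_poweroff_now` and the file paths are hypothetical: the thread does not say what the real parameter was called, and the native kernel must be patched separately to act on it.

```python
import shlex

# Hypothetical parameter name; the real one is not given in the thread.
POWEROFF_PARAM = "acpi_poweroff_now"


def kexec_poweroff_commands(kernel, initrd=None, base_cmdline=""):
    """Build the kexec invocations: load the native kernel, then jump to it."""
    cmdline = f"{base_cmdline} {POWEROFF_PARAM}".strip()
    load = ["kexec", "-l", kernel, f"--append={cmdline}"]
    if initrd:
        load.append(f"--initrd={initrd}")
    return [load, ["kexec", "-e"]]


# Placeholder paths; substitute whatever native kernel the dom0 distro ships.
for argv in kexec_poweroff_commands(
    "/boot/vmlinuz-native", "/boot/initrd-native.img", "console=ttyS0"
):
    print(shlex.join(argv))
```

Nothing here is executed; a real halt script would run the printed commands only after dom0 has finished shutting down its services.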

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

