   
 
 
xen-devel

Re: [Xen-devel] AMD Magny-Cours and HPET

To: Jan Beulich <JBeulich@xxxxxxxxxx>
Subject: Re: [Xen-devel] AMD Magny-Cours and HPET
From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
Date: Tue, 16 Aug 2011 13:32:17 +0100
Cc: ChristophEgger <christoph.egger@xxxxxxx>, Wei Huang <wei.huang2@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>

On 16/08/11 11:09, Jan Beulich wrote:
>>>> On 16.08.11 at 11:47, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>> We have had a bug raised against Xen-3.4 that the kexec path fails, on
>> HP BL465c G7 blades.  The problem does not reproduce on any other AMD
>> machines I have to hand.
>>
>> On further investigation, it appears that if the crashing cpu is #0,
>> then the kexec path hangs forever trying to grab the already locked
>> legacy_hpet_event.lock in hpet_disable_legacy_broadcast().  Removing the
>> lock/unlock pair causes the kexec crash path to work as expected.
> Are you sure it is locked (rather than never initialized)? The problem
> could be that hpet_broadcast_is_available() returns true because of
> num_hpets_used > 0, yet hpet_broadcast_init() didn't make it down
> to spin_lock_init(&legacy_hpet_event.lock).

That is a very good point.  I had not considered it, and it turns out
that legacy broadcast is never set up:

(XEN) HPET: starting hpet_broadcast_init()
(XEN) HPET: hpet_setup() successful
(XEN) HPET: 4 timers in total, 3 timers will be used for broadcast

hpet_broadcast_init() exits inside the "if ( num_hpets_used > 0 )"
clause (as the boot dmesg doesn't printk the line immediately following
the if clause), meaning that legacy broadcasts are never set up.

Therefore, the logic

if ( hpet_broadcast_is_available() )
    hpet_disable_legacy_broadcast();

in several places is wrong, and should be "if hpet_legacy broadcast
used".  Judging by the similarities in this regard between Xen-3.4 and
Xen-4.x, I am now not certain that Xen-4.x is immune and will now
proceed to investigate this.
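
As a rough sketch of the distinction, here is a toy model in C.  It is
not the Xen source: the names mirror the ones used in this thread, but
every body below is an assumption, and legacy_broadcast_used is a
hypothetical flag standing in for "legacy broadcast was actually set
up".  It only illustrates why a check based on num_hpets_used > 0 can
reach a lock that was never initialised, while a check on the legacy
path itself cannot.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of this is copied from Xen. */
struct model_lock { bool initialised; };

static struct model_lock legacy_lock;   /* stands in for legacy_hpet_event.lock */
static unsigned int num_hpets_used;
static bool legacy_broadcast_used;      /* hypothetical flag */

/* Models the early exit suggested by the boot log: when some timers are
 * used for broadcast, the legacy setup (and its lock init) is skipped. */
void model_hpet_broadcast_init(unsigned int timers_for_broadcast)
{
    num_hpets_used = timers_for_broadcast;
    legacy_broadcast_used = false;
    legacy_lock.initialised = false;

    if ( num_hpets_used > 0 )
        return;                         /* legacy path never reached */

    legacy_lock.initialised = true;     /* models spin_lock_init() */
    legacy_broadcast_used = true;
}

/* Simplified assumption of what the availability check reports. */
bool model_broadcast_is_available(void)
{
    return legacy_broadcast_used || num_hpets_used > 0;
}

/* True if the existing guard would lead to an uninitialised lock. */
bool model_current_check_is_unsafe(void)
{
    return model_broadcast_is_available() && !legacy_lock.initialised;
}

/* True if the proposed guard would lead to an uninitialised lock. */
bool model_proposed_check_is_unsafe(void)
{
    return legacy_broadcast_used && !legacy_lock.initialised;
}

/* Walks through both boot configurations of the model. */
void model_self_check(void)
{
    model_hpet_broadcast_init(3);       /* "3 timers will be used for broadcast" */
    assert(model_current_check_is_unsafe());
    assert(!model_proposed_check_is_unsafe());

    model_hpet_broadcast_init(0);       /* legacy broadcast gets set up */
    assert(!model_current_check_is_unsafe());
    assert(!model_proposed_check_is_unsafe());
}
```

In this model the existing guard is only unsafe in the configuration
seen on this machine, which would fit the problem not reproducing on
the other AMD boxes.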

>> If the crashing cpu is not #0, then local_time_calibration() gets
>> worried and dumps the calibration data, and hangs at some later point
>> which I have yet to find.  This hang happens while performing the NMI
>> shootdown of other cpus.
>>
>> The support engineer who raised the bug says that it doesn't occur with
>> Xen-4.1.  Is there anything architecturally new in the Magny-Cours
>> processors which might explain this behavior?
> Possibly more a question of the surrounding platform, namely whether
> there are HPETs in the system, and whether they get used for the
> C-state broadcasting.
>
> Jan
>

Why would C-state broadcasting make a difference at this point?  I have
narrowed the crash down a bit, and local_time_calibration() is dumping
its state after one_cpu_only() and before the shootdown actually
occurs.  However, I can't see any code between these two points which
alters the state of the other CPU, which should still be running
normally at this point.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

