This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/

[Xen-devel] Re: Large system boot problems

To: Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>
Subject: [Xen-devel] Re: Large system boot problems
From: Bill Burns <bburns@xxxxxxxxxx>
Date: Fri, 08 Feb 2008 10:22:10 -0500
Cc: Ian Pratt <Ian.Pratt@xxxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx, "Carb, Brian A" <Brian.Carb@xxxxxxxxxx>
Delivery-date: Fri, 08 Feb 2008 07:22:36 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <C3D2224F.1C216%Keir.Fraser@xxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <C3D2224F.1C216%Keir.Fraser@xxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Thunderbird (X11/20071129)
Keir Fraser wrote:
> On 8/2/08 15:10, "Bill Burns" <bburns@xxxxxxxxxx> wrote:
>> The message from early_time_init (the caller of
>> init_pit_and_calibrate_tsc) indicates that the
>> initial detection is OK:
>> (pmtimer case) (XEN) Detected 3400.114 MHz processor.
>> (pit case)     (XEN) Detected 3400.165 MHz processor.
>> So I think it's the latter. The init of a large system
>> is staving off the soft irq so that the next calc fails.
> Okay, well you could test this by inserting a process_pending_timers() in
> the CPU-booting loop in smpboot.c. If you do timer work after booting each
> CPU, perhaps that makes the problem go away?

I woke up in the middle of the night with that idea
a few days ago and tried it without success. Seemed that
calls to process_pending_timers had no effect until
a certain point. But I need to go and look at that
some more and see why...

> But ultimately the calibration code should be robust to long delays before
> it is executed. It shouldn't go haywire. So something is bad there. Do you
> have a dump of the decision made by the calibration code on cpu0 the very
> first time it actually gets invoked? We probably need to trace the hell out
> of that first invocation to work out why it gets things so badly wrong.

I don't have more than what was in the earlier email, where it shows the
large delta in TSC time, which seems to cause the bogus result.
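
For illustration only, here is one way a long gap between calibration samples could turn a large TSC delta into a bogus frequency. This is a sketch under stated assumptions, not a diagnosis of the problem in this thread and not Xen's actual calibration code: it assumes a 24-bit ACPI PM timer running at 3.579545 MHz as the reference, and a hypothetical calibration step that sees only the start and end counter values, so any whole counter wraps in between are lost. The function name `estimate_tsc_hz` and the constants are invented for the example; the TSC rate is taken from the "Detected 3400.114 MHz" line quoted above.

```python
# Illustrative model only: NOT Xen's calibration code.
PMTIMER_HZ = 3_579_545      # ACPI PM timer frequency
PMTIMER_BITS = 24           # assumed counter width (some platforms use 32 bits)
TSC_HZ = 3_400_114_000      # from the "(XEN) Detected 3400.114 MHz" boot line


def estimate_tsc_hz(elapsed_seconds):
    """Estimate the TSC rate from two samples taken elapsed_seconds apart.

    Hypothetical calibration step: it reads only the start and end values
    of the reference counter, so whole wraps of that counter go unnoticed.
    """
    tsc_delta = round(TSC_HZ * elapsed_seconds)
    true_ticks = round(PMTIMER_HZ * elapsed_seconds)
    # The reference counter wraps modulo 2**PMTIMER_BITS.
    observed_ticks = true_ticks % (1 << PMTIMER_BITS)
    # Reference time is underestimated once the counter has wrapped.
    return tsc_delta * PMTIMER_HZ / observed_ticks


# The counter wraps after 2**24 / 3579545 seconds (a little under 5 s).
wrap_period = (1 << PMTIMER_BITS) / PMTIMER_HZ

short_gap = estimate_tsc_hz(1.0)   # sampled well inside one wrap period
long_gap = estimate_tsc_hz(6.0)    # sampled after the counter has wrapped once
```

Under these assumptions a 1 s gap reproduces the true rate, while a 6 s gap, which is longer than one wrap period, gives an estimate several times too high. Whether anything like this is what happens on the large system here would still need the tracing Keir asks for.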


>  -- Keir
