This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/

[Xen-devel] Re: Large system boot problems

To: Bill Burns <bburns@xxxxxxxxxx>
Subject: [Xen-devel] Re: Large system boot problems
From: Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>
Date: Fri, 08 Feb 2008 15:14:23 +0000
Cc: Ian Pratt <Ian.Pratt@xxxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx, "Carb, Brian A" <Brian.Carb@xxxxxxxxxx>
Delivery-date: Fri, 08 Feb 2008 07:15:28 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <47AC70E2.6090900@xxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AchqZUkOh+Xl6dZYEdy43AAX8io7RQ==
Thread-topic: Large system boot problems
User-agent: Microsoft-Entourage/
On 8/2/08 15:10, "Bill Burns" <bburns@xxxxxxxxxx> wrote:

> The message from early_time_init (caller of
> init_pit_and_calibrate_tsc) indicates that the
> initial detection is ok:
> (pmtimer case) (XEN) Detected 3400.114 MHz processor.
> (pit case)     (XEN) Detected 3400.165 MHz processor.
> So I think it's the latter. The init of a large system
> is staving off the soft irq so that the next calc fails.

Okay, well you could test this by inserting a process_pending_timers() in
the CPU-booting loop in smpboot.c. If you do timer work after booting each
CPU, perhaps that makes the problem go away?
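
A minimal sketch of that experiment, as a standalone C model rather than the real smpboot.c loop. Only process_pending_timers() is named above; boot_one_cpu_stub(), process_pending_timers_stub(), boot_secondary_cpus() and the timer_passes counter are hypothetical stand-ins for illustration:

```c
#include <assert.h>
#include <stdio.h>

static unsigned int timer_passes;   /* how many times timer work was run */

/* Stand-in for Xen's process_pending_timers(); here it only counts calls. */
static void process_pending_timers_stub(void)
{
    timer_passes++;
}

/* Stand-in for bringing one secondary CPU online (hypothetical). */
static int boot_one_cpu_stub(unsigned int cpu)
{
    printf("booting CPU%u\n", cpu);
    return 0;   /* 0 = success */
}

/*
 * Boot CPUs 1..nr_cpus-1 and run timer work after each one, so that
 * pending timers are not held off until every CPU is up.
 * Returns the number of secondary CPUs booted.
 */
static unsigned int boot_secondary_cpus(unsigned int nr_cpus)
{
    unsigned int cpu, booted = 0;

    assert(nr_cpus >= 1);
    for (cpu = 1; cpu < nr_cpus; cpu++) {
        if (boot_one_cpu_stub(cpu) == 0)
            booted++;
        /* The experiment: service timers between CPU bring-ups. */
        process_pending_timers_stub();
    }
    return booted;
}
```

If the miscalibration disappears with the extra call in place, that would support the theory that timer work is being delayed for too long during boot.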

But ultimately the calibration code should be robust to long delays before
it is executed. It shouldn't go haywire. So something is bad there. Do you
have a dump of the decision made by the calibration code on cpu0 the very
first time it actually gets invoked? We probably need to trace the hell out
of that first invocation to work out why it gets things so badly wrong.
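
As a rough illustration of the kind of trace that would help, the sketch below logs the inputs and the result of one calibration pass. It is not Xen's calibration code: struct calib_sample, calib_khz() and trace_calibration() are hypothetical names, and the arithmetic is only an assumed simplification (TSC ticks divided by elapsed platform time).

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical snapshot of what one calibration pass sees. */
struct calib_sample {
    uint64_t tsc_delta;    /* TSC ticks since the previous pass */
    uint64_t elapsed_ns;   /* platform-timer time since the previous pass */
};

/*
 * Estimated CPU frequency in kHz, or 0 if no time has elapsed.
 * Note: tsc_delta * 1000000 overflows 64 bits for very long gaps
 * (on the order of an hour or more at a few GHz); a real
 * implementation would have to handle that case.
 */
static uint64_t calib_khz(const struct calib_sample *s)
{
    assert(s != NULL);
    if (s->elapsed_ns == 0)
        return 0;
    return s->tsc_delta * 1000000ULL / s->elapsed_ns;
}

/* Log the inputs and the decision for one pass; returns the kHz estimate. */
static uint64_t trace_calibration(unsigned int cpu, const struct calib_sample *s)
{
    uint64_t khz = calib_khz(s);

    printf("calib cpu%u: tsc_delta=%" PRIu64 " elapsed_ns=%" PRIu64
           " -> %" PRIu64 ".%03" PRIu64 " MHz\n",
           cpu, s->tsc_delta, s->elapsed_ns, khz / 1000, khz % 1000);
    return khz;
}
```

With a trace like this, a first pass that follows a long delay should still report roughly the boot-time figure (about 3400 MHz here); a wildly different value would point at whichever input is wrong.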

 -- Keir
