xen-devel
[Xen-devel] Re: Large system boot problems
Keir Fraser wrote:
> On 8/2/08 15:22, "Bill Burns" <bburns@xxxxxxxxxx> wrote:
>
>>> But ultimately the calibration code should be robust to long delays before
>>> it is executed. It shouldn't go haywire. So something is bad there. Do you
>>> have a dump of the decision made by the calibration code on cpu0 the very
>>> first time it actually gets invoked? We probably need to trace the hell out
>>> of that first invocation to work out why it gets things so badly wrong.
>> I don't have more than in the earlier email where is shows the
>> large delta in tsc time, which seems to cause the bogus result.
>
> Okay, well looking at the inputs on that first invocation -- master_stime
> and local_stime -- they are totally out of sync. One says that 9.3s has
> elapsed since init_xen_time() was invoked, the other says that 4.6s has
> elapsed (curiously exactly half the time). The former is correct if the CPU
> really is a 3.4GHz part and is running at full speed for the duration. But
> you ought to be able to work out which is the correct ballpark by timing
> with a stopwatch the time between init_xen_time() and that first invocation
> on cpu0 of local_time_calibration() (you'll have to printk() when
> init_xen_time() is executed).
>
> -- Keir
>
>
Well, I have a proposed fix that addresses the major symptom:
dom0 reporting time going backwards and failing to initialize
properly. I must note that dom0 still reports the wrong speed for
CPU0 when only one iteration of local_time_calibration occurs
before dom0 gets going. I believe that second issue is probably
due to the large delta between the master and local stime.
The first call to local_time_calibration automatically corrects
local stime being behind.

But when a significant amount of time has elapsed before the
initial call to local_time_calibration, the code that deals with
the local stime and tsc deltas is broken. When the 64-bit delta
for local stime is shifted down to a 32-bit value, the
tsc delta is shifted along with it, but the tsc_shift value is
not updated to match.
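
For context on why tsc_shift matters, here is a minimal sketch of a shift-plus-fraction scale of the kind used to convert TSC ticks to nanoseconds. It assumes the conversion has the form ns = ((ticks << shift) * mul_frac) >> 32; the names scale_sketch and ticks_to_ns are hypothetical and this is not copied from Xen's time.c. Under that assumption, every step by which the shift is off changes the converted time by a factor of two.

```c
#include <stdint.h>

/* Hypothetical stand-in for a TSC-to-nanosecond scale; not Xen's code. */
struct scale_sketch {
    int      shift;     /* power-of-two pre-scale applied to the tick count */
    uint32_t mul_frac;  /* binary fraction: multiplier is mul_frac / 2^32   */
};

/* Assumed model: ns = ((ticks << shift) * mul_frac) >> 32. */
static uint64_t ticks_to_ns(uint64_t ticks, struct scale_sketch s)
{
    uint64_t hi, lo;

    if ( s.shift < 0 )
        ticks >>= -s.shift;
    else
        ticks <<= s.shift;

    /* Split the multiply so the 96-bit intermediate product is not needed. */
    hi = ticks >> 32;
    lo = ticks & 0xffffffffu;
    return hi * s.mul_frac + ((lo * s.mul_frac) >> 32);
}
```

With mul_frac = 0x80000000 (one half), 1000 ticks convert to 500 at shift 0 and to 250 at shift -1.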

There are two loops. The first shifts both the stime and
tsc values in sync but fails to record the tsc shift:

    while ( ((u32)stime_elapsed64 != stime_elapsed64) ||
            ((s32)stime_elapsed64 < 0) )
    {
        stime_elapsed64 >>= 1;
        tsc_elapsed64 >>= 1;
    +   tsc_shift--;    /* not in the current code; added by the patch below */
    }

The second does the tsc shift alone, which is fine, but note
that it does record the tsc shift.

    /* tsc_elapsed <= 2*stime_elapsed */
    while ( tsc_elapsed64 > (stime_elapsed32 * 2) )
    {
        tsc_elapsed64 >>= 1;
        tsc_shift--;
    }
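
To make the difference concrete, here is a minimal sketch that runs the two loops quoted above on copies of the deltas and returns the tsc_shift they record. The helper normalize(), the struct norm_result, and the record_first_loop flag are hypothetical and exist only for this illustration; the inputs reuse the figures from Keir's message (9.3s elapsed on a 3.4GHz part, so roughly 31.62e9 ticks).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result holder for this sketch; not a Xen structure. */
struct norm_result {
    uint32_t stime_elapsed32;
    uint64_t tsc_elapsed64;
    int      tsc_shift;
};

/*
 * Runs the two normalization loops. If record_first_loop is non-zero,
 * the first loop also decrements tsc_shift, as the proposed patch does.
 */
static struct norm_result normalize(uint64_t stime_elapsed64,
                                    uint64_t tsc_elapsed64,
                                    int record_first_loop)
{
    struct norm_result r;
    int tsc_shift = 0;

    assert(stime_elapsed64 != 0);

    /* Loop 1: shift both deltas until stime fits in a positive 32-bit value. */
    while ( ((uint32_t)stime_elapsed64 != stime_elapsed64) ||
            ((int32_t)stime_elapsed64 < 0) )
    {
        stime_elapsed64 >>= 1;
        tsc_elapsed64 >>= 1;
        if ( record_first_loop )
            tsc_shift--;
    }

    r.stime_elapsed32 = (uint32_t)stime_elapsed64;

    /* Loop 2: shift tsc alone until tsc_elapsed <= 2*stime_elapsed. */
    while ( tsc_elapsed64 > ((uint64_t)r.stime_elapsed32 * 2) )
    {
        tsc_elapsed64 >>= 1;
        tsc_shift--;
    }

    r.tsc_elapsed64 = tsc_elapsed64;
    r.tsc_shift = tsc_shift;
    return r;
}
```

For normalize(9300000000, 31620000000, flag), the first loop runs three times and the second once, so the recorded tsc_shift is -1 with flag 0 and -4 with flag 1. With a short delay (1s, 3.4e9 ticks) the first loop never runs and both variants record -1, which matches the problem only appearing after a long delay. The sketch shows only the bookkeeping difference; the rest of local_time_calibration, which consumes tsc_shift, is not reproduced here.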

Making this one-line change, as in the attached patch,
yields a properly working dom0. Tested on both a small-memory
and a large-memory system.

Bill

--- arch/x86/time.c.orig 2008-02-12 07:16:48.000000000 -0500
+++ arch/x86/time.c 2008-02-12 11:19:47.000000000 -0500
@@ -857,6 +857,7 @@ static void local_time_calibration(void
     {
         stime_elapsed64 >>= 1;
         tsc_elapsed64 >>= 1;
+        tsc_shift--;
     }
     /* stime_master_diff now fits in a 32-bit word. */
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel