WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-devel

RE: [Xen-devel] Timer going backwards and Unable to handle kernel NULLpo

>>> "Ian Pratt" <Ian.Pratt@xxxxxxxxxxxx> 28.05.07 22:03 >>>
>> I've got a Thunder K8SRE with two dual-core Opteron processors and
>> eight 1 GB SDRAM sticks in it.
>> Booting into SMP mode resulted in this:
>> 
>> May 16 00:43:37 weebl kernel: Timer ISR/3: Time went backwards:
>> delta=-12967620 delta_cpu=217015619 shadow=46263756493 off=423276508
>> processed=46700000000 cpu_processed=46470016761
>> May 16 00:43:37 weebl kernel:  0: 46680000000
>> May 16 00:43:37 weebl kernel:  1: 46700016761
>> May 16 00:43:37 weebl kernel:  2: 45670016761
>> May 16 00:43:37 weebl kernel:  3: 46470016761
>
>The only time I've seen this is on a system that was overheating and
>bouncing in and out of thermal throttling. Any chance that could be
>happening here?
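You can cross-check the figures in the quoted log by recomputing the two deltas. This sketch assumes that the check works as follows:

- It takes current system time as shadow + off.
- It subtracts the global and per-CPU processed times.
- It compares each result against a tolerance.

That formula is my reading of the message fields, not code copied from the kernel. The struct, the function names and the jitter_ns parameter are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical container for the fields printed in the quoted message (ns). */
struct backwards_report {
    int64_t shadow;        /* shadow system timestamp */
    int64_t off;           /* ns interpolated since that timestamp */
    int64_t processed;     /* global processed system time */
    int64_t cpu_processed; /* this CPU's processed system time */
};

/* Assumed formula: current time (shadow + off) minus processed time. */
static int64_t global_delta(const struct backwards_report *r)
{
    return r->shadow + r->off - r->processed;
}

static int64_t cpu_delta(const struct backwards_report *r)
{
    return r->shadow + r->off - r->cpu_processed;
}

/* jitter_ns is a hypothetical tolerance; the real threshold is not stated here. */
static bool went_backwards(const struct backwards_report *r, int64_t jitter_ns)
{
    return global_delta(r) < -jitter_ns || cpu_delta(r) < -jitter_ns;
}

/* Values copied from the "Timer ISR/3" line above. */
static const struct backwards_report weebl = {
    .shadow = 46263756493LL,
    .off = 423276508LL,
    .processed = 46700000000LL,
    .cpu_processed = 46470016761LL,
};
```

With the logged values, global_delta() gives -12966999 ns and cpu_delta() gives 217016240 ns. Both are within 621 ns of the logged -12967620 and 217015619. One plausible but unverified explanation for that small mismatch is that off is sampled again when the message is printed.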

I've been seeing these pretty regularly on a single-socket dual-core Athlon
system for the last couple of months, and only on Friday did I finally find time
to start looking into them. Besides the messages above, I also see hangs
in about every other boot attempt, but only if I do *not* use serial output
(which makes debugging a little harder), and never after the initial boot has
finished; this is why I finally had to find time to look into the problem. I should
note, though, that the kernel we use does not disable CONFIG_GENERIC_TIME
and uses a Xen clocksource, as posted by Jeremy among the
paravirt ops patches.
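A Xen clocksource derives nanoseconds from the per-vCPU time record that the hypervisor publishes. Below is a minimal sketch of such a read. It assumes the field layout of Xen's public struct vcpu_time_info. The names scale_delta() and xen_system_time_ns() are hypothetical, and the C11 fences stand in for the read barriers that real kernel code would use.

```c
#include <stdint.h>
#include <stdatomic.h>

/* Layout mirrors Xen's public struct vcpu_time_info. */
struct vcpu_time_info {
    uint32_t version;           /* odd while the hypervisor is updating */
    uint32_t pad0;
    uint64_t tsc_timestamp;     /* TSC value at the last update */
    uint64_t system_time;       /* ns since boot at tsc_timestamp */
    uint32_t tsc_to_system_mul; /* 32.32 fixed-point ns-per-tick factor */
    int8_t   tsc_shift;         /* pre-scale: shift left if >= 0, right if < 0 */
    int8_t   pad1[3];
};

/* ((delta << shift) * mul) >> 32, computed in two 32-bit halves so no
 * 128-bit type is needed. Assumes |shift| is small (well below 64). */
static uint64_t scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;
    uint64_t hi = delta >> 32, lo = delta & 0xffffffffu;
    return hi * mul + ((lo * mul) >> 32);
}

/* Interpolate system time for a TSC reading, retrying if the record was
 * being updated (odd version) or changed while it was being read. */
static uint64_t xen_system_time_ns(const volatile struct vcpu_time_info *t,
                                   uint64_t tsc_now)
{
    uint32_t v, mul;
    uint64_t base, stamp;
    int8_t shift;
    do {
        v = t->version;
        atomic_thread_fence(memory_order_acquire);
        base = t->system_time;
        stamp = t->tsc_timestamp;
        mul = t->tsc_to_system_mul;
        shift = t->tsc_shift;
        atomic_thread_fence(memory_order_acquire);
    } while ((v & 1) || v != t->version);
    return base + scale_delta(tsc_now - stamp, mul, shift);
}
```

For example, with a 2 GHz TSC the factor is 0.5 ns per tick (mul 0x80000000, shift 0), so 2000 ticks past tsc_timestamp add 1000 ns to system_time. The result can only be as accurate as tsc_to_system_mul and tsc_shift.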

When the hang occurs (in do_nanosleep context), the time read or
interpolated from the Xen-provided values is in the past compared to the
last value read (and cached inside the kernel). This results in a huge
timeout value rather than the intended 50 ms one.
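A small backwards step can become a huge value through unsigned arithmetic. This minimal sketch assumes that elapsed time is computed as an unsigned difference from the cached value. That is usual for clocksource-style counters, but the message does not state it. The function names and sample values are illustrative.

```c
#include <stdint.h>

/* Unsigned elapsed time, as used for free-running counters. This is correct
 * across counter wrap, but a reading that is *behind* the cached value turns
 * a tiny negative step into an enormous positive one. */
static uint64_t elapsed_unsigned(uint64_t now, uint64_t last)
{
    return now - last;
}

/* Signed view of the same difference, computed without relying on
 * implementation-defined unsigned-to-signed conversion. */
static int64_t elapsed_signed(uint64_t now, uint64_t last)
{
    return now >= last ? (int64_t)(now - last) : -(int64_t)(last - now);
}
```

Take a cached value of 5000000000 ns and a reading 1000 ns earlier. elapsed_unsigned() yields 2^64 - 1000 rather than -1000, while a reading 50 ms later gives 50000000 either way. This sketch does not show how the wrapped value turns into the oversized nanosleep timeout, because that depends on the kernel's timekeeping path.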

I haven't yet collected data to prove this (I will do so later today), but I
currently think the interpolation parameters are too imprecise until
local_time_calibration() first runs on each CPU, i.e. during a little less
than the first second of dom0's life.
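Here is how imprecise parameters could produce a backwards step. Suppose the multiplier used before the first calibration overstates the nanoseconds per TSC tick. Interpolated time then runs ahead, and the first accurate record pulls it back.

The sketch below is not taken from the Xen or kernel sources. It uses made-up figures and a simplified, hypothetical record with no shift and no version field.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for one published time record. */
struct time_params {
    uint64_t tsc_stamp;  /* TSC when the record was published */
    uint64_t base_ns;    /* system time at tsc_stamp */
    uint32_t mul;        /* 32.32 fixed-point ns per TSC tick */
};

/* base_ns + (tsc - tsc_stamp) * mul / 2^32, split into 32-bit halves. */
static uint64_t interpolate_ns(const struct time_params *p, uint64_t tsc)
{
    uint64_t d = tsc - p->tsc_stamp;
    uint64_t hi = d >> 32, lo = d & 0xffffffffu;
    return p->base_ns + hi * p->mul + ((lo * p->mul) >> 32);
}

/* Made-up figures: a 2 GHz TSC, so the exact factor is 0.5 ns/tick
 * (0x80000000); the initial estimate 0x81000000 is about 0.8% too high. */
static const struct time_params rough = { 0, 0, 0x81000000u };
/* Accurate record published one second (2e9 ticks) later. */
static const struct time_params calibrated = { 2000000000u, 1000000000u, 0x80000000u };
```

A read just before the update is based on the rough record and gives 1007812500 ns at TSC 2000000000. A read 2000 ticks after the update is based on the calibrated record and gives 1000001000 ns. The clock therefore steps back by 7811500 ns. The 0.8% error here is arbitrary; only measurements can show whether dom0's first second behaves this way.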

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
