This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


RE: [Xen-devel] Timer going backwards and Unable to handle kernel NULLpo

To: "Ian Pratt" <Ian.Pratt@xxxxxxxxxxxx>, "Keir Fraser" <Keir.Fraser@xxxxxxxxxxxx>
Subject: RE: [Xen-devel] Timer going backwards and Unable to handle kernel NULLpointer
From: "Jan Beulich" <jbeulich@xxxxxxxxxx>
Date: Wed, 30 May 2007 17:20:58 +0200
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, Robert Hulme <rob@xxxxxxxxxxxx>
Delivery-date: Wed, 30 May 2007 08:19:04 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
>I've been seeing these pretty regularly on a single-socket dual-core Athlon
>system for the last couple of months, and only on Friday finally found time
>to start looking into them. Besides the messages above, I also see hangs
>on about every other boot attempt, but only if I do *not* use serial output
>(which makes debugging a little harder), and never once the initial boot has
>finished - this is why I finally needed to find time to look into the
>problem. I shall note though that the kernel we use does not disable
>CONFIG_GENERIC_TIME and makes use of a Xen clocksource as posted by Jeremy
>among the paravirt_ops patches.
>What happens when the hang occurs (in do_nanosleep context) is that the
>time read/interpolated from the Xen provided values is in the past compared
>to the last value read (and cached inside the kernel), resulting in a huge
>timeout value rather than the intended 50ms one.
>Without having collected data proving this (I will do so later today), I
>currently think that the interpolation parameters are too imprecise until
>the first time local_time_calibration() runs on each CPU, i.e. during a
>little less than the first second of dom0's life.
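For context, the per-CPU interpolation the quoted text refers to extrapolates nanoseconds from the TSC using a scale/shift tuple that the hypervisor publishes per CPU. A minimal sketch follows; the struct is modeled on Xen's public vcpu_time_info layout, but treat the exact fields and the helper function as illustrative, not as the code under discussion:

```c
#include <stdint.h>

/* Modeled on Xen's public vcpu_time_info; fields shown for illustration. */
struct vcpu_time_info {
    uint64_t tsc_timestamp;     /* TSC value at the last hypervisor update */
    uint64_t system_time;       /* nanoseconds at tsc_timestamp            */
    uint32_t tsc_to_system_mul; /* TSC-to-ns multiplier, scaled by 2^32    */
    int8_t   tsc_shift;         /* pre-shift applied to the TSC delta      */
};

/* Extrapolate current nanoseconds from the published tuple. */
uint64_t interpolate_ns(const struct vcpu_time_info *t, uint64_t tsc_now)
{
    uint64_t delta = tsc_now - t->tsc_timestamp;

    if (t->tsc_shift >= 0)
        delta <<= t->tsc_shift;
    else
        delta >>= -t->tsc_shift;

    /* Scale by tsc_to_system_mul / 2^32 using a 128-bit intermediate. */
    return t->system_time +
           (uint64_t)(((__uint128_t)delta * t->tsc_to_system_mul) >> 32);
}
```

If the mul/shift pair is miscalibrated on one CPU (as suspected above), two CPUs extrapolating from their own tuples can return visibly different nanosecond values for the same instant.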

The box I'm looking at takes 600ms to enable ACPI mode, during which time
no interrupts get delivered. Since it does not have a (visible) HPET, it has
to use the PIT, whose 16-bit counter manages to roll over 11 times during
this process. The result is that the TSC is considered to be running too
fast and hence gets slowed down. Since this slow-down doesn't happen at
exactly the same time on all CPUs (it can't be expected to), one CPU starts
reporting measurably smaller nanosecond time values than the other, hence
monotonicity gets violated pretty significantly.
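The scale of the problem is easy to check: at the standard 1.193182 MHz PIT input clock, a 600ms interrupt blackout covers roughly 716000 PIT ticks, i.e. between ten and eleven full 65536-tick counter periods depending on the counter's starting phase - consistent with the eleven wraps observed. A hypothetical helper (names are mine, not Xen's) that estimates the number of full roll-overs from the TSC:

```c
#include <stdint.h>

#define PIT_HZ     1193182ULL /* standard PIT input clock, ticks per second */
#define PIT_PERIOD 65536ULL   /* ticks per wrap of the 16-bit counter       */

/* Estimate full PIT counter periods elapsed across an interrupt blackout,
 * given the TSC delta over the blackout and the calibrated TSC frequency. */
uint64_t estimate_pit_rollovers(uint64_t tsc_delta, uint64_t tsc_hz)
{
    /* elapsed PIT ticks = tsc_delta * PIT_HZ / tsc_hz; split the
     * multiplication to avoid 64-bit overflow for large deltas */
    uint64_t pit_ticks = (tsc_delta / tsc_hz) * PIT_HZ
                       + (tsc_delta % tsc_hz) * PIT_HZ / tsc_hz;
    return pit_ticks / PIT_PERIOD;
}
```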

I'm therefore considering:
- making the PIT timer recover from being disabled for periods longer than
  what the 16-bit counter can tolerate (by means of estimating the number
  of roll-overs based on the TSC) - this would probably work well shortly
  after boot or at any time all TSCs are sufficiently synchronized, but
  could go pretty wrong as the individual TSCs drift apart
- inventing a method in the kernel that can cope with even significantly
  non-monotonic values interpolated on different CPUs (it is clear from the
  data collected that small deviations from monotonic values must be
  accounted for in any case, but that could be done by simply returning the
  most recently returned value whenever the interpolated value turns out to
  be smaller than that; the issue is really how to reasonably bridge large
  gaps)
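The simple fallback mentioned in the second point - returning the most recently returned value when the interpolated one is smaller - can be sketched as follows (a single-threaded illustration only; a real kernel would need an atomic compare-and-exchange loop, and a per-CPU or global policy decision):

```c
#include <stdint.h>

static uint64_t last_returned_ns; /* most recent value handed to a caller */

/* Clamp a freshly interpolated reading so time never appears to move
 * backwards: small backward steps are hidden, large ones merely stall
 * the clock until the interpolated value catches up again. */
uint64_t monotonic_ns(uint64_t interpolated_ns)
{
    if (interpolated_ns < last_returned_ns)
        return last_returned_ns;
    last_returned_ns = interpolated_ns;
    return interpolated_ns;
}
```

This handles the small deviations cheaply, but it is exactly the large-gap case where stalling the clock for a long stretch is itself problematic - which is the open issue described above.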


