> On 10/26/2010 02:22 AM, Mark Adams wrote:
> > On Thu, Oct 07, 2010 at 07:04:18AM -0700, Dan Magenheimer wrote:
> >> Hi Jeremy and Mark --
> >>
> >> Oddly, I saw that "clocksource tsc unstable" message myself
> >> on a busy 2.6.36-rc5 PV domain yesterday. While it is possible
> >> that this reflects a hardware problem, the fact that you
> >> saw it on a Nehalem+ Intel processor makes it very unlikely.
> >> The "s" and "t" debug keys (the output of which can be seen via
> >> "xm debug-key s; xm dmesg | tail" in dom0) can help diagnose
> >> the problem if it is indeed a hardware problem or BIOS
> >> problem or the result of a CPU hot-add... all unlikely.
> >>
> >> It IS possible that the code that emulates tsc is broken
> >> somewhere, but I don't think tsc should be emulated by
> >> default for dom0 on a Nehalem+ box... and even if it is,
> >> it is directly based on Xen system time which, if it went
> >> awry, would probably cause major problems.
> >>
> >> Looking through the Linux code that prints that message (in
> >> kernel/time/clocksource.c), it seems the message is printed
> >> when the tsc deviates from the "watchdog clocksource",
> >> which in PV domains is "xen" (or more precisely pvclock,
> >> I think). So most likely, this is a symptom of a problem
> >> with pvclock or the watchdog code in the pvops kernel, not
> >> an indicator that the tsc is actually unstable.
> >>
> > Is there any more information I can provide to help with debugging
> > this?
> > We haven't had the problem since. It could just be a coincidence but
> > it happened around the time that daylight savings occurred in the US
> > (we are in the UK).
>
> In Linux/Xen it shouldn't have any effect since the clocks are always
> maintained in UTC, then timezone details are applied much later in
> usermode. But Windows has a bad habit of setting the hardware RTC to
> local time, and mucking about with it for DST changes - but that would
> only be relevant if you booted Windows on your host machine (I don't
> think there's any way for a Windows guest's time to leak into the
> host/dom0's timebase).
>
> Unfortunately these kinds of time problems can be notoriously hard to
> pin down and diagnose.
This problem seems to occur when one -- or possibly all -- vcpus
are "spinning" for an unexpectedly long period of time. If so,
it may be possible to synthesize some kind of long-but-finite
"deadlock" (that is, a prolonged stall) in a domU kernel, which
might reproduce it.
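
As a rough illustration of why a long stall might trip the check that
prints "clocksource tsc unstable", here is a small Python sketch. It is
not the kernel code. The function names, the read order (watched clock
first, watchdog clock second) and the 62.5 ms tolerance are assumptions
made for the example, and the "stall" is simply modelled as a delay
between the two reads while both clocks are otherwise perfect.

```python
# Toy model of a clocksource watchdog comparison.
# All names and numbers are illustrative assumptions, not kernel code.

NSEC_PER_SEC = 1_000_000_000
THRESHOLD_NS = NSEC_PER_SEC >> 4  # 62.5 ms, assumed tolerance


def is_unstable(cs_prev, cs_now, wd_prev, wd_now, threshold=THRESHOLD_NS):
    """Return True if the two clocks disagree about elapsed time."""
    cs_delta = cs_now - cs_prev  # elapsed ns per the watched clock (tsc)
    wd_delta = wd_now - wd_prev  # elapsed ns per the watchdog clock
    return abs(cs_delta - wd_delta) > threshold


def sample(true_time_ns, stall_between_reads_ns=0):
    """Read the watched clock, then the watchdog clock.

    Both clocks are perfect in this model. A hypothetical stall between
    the two reads (e.g. the vcpu not running) delays only the second one.
    """
    cs = true_time_ns
    wd = true_time_ns + stall_between_reads_ns
    return cs, wd


if __name__ == "__main__":
    half_second = NSEC_PER_SEC // 2

    cs0, wd0 = sample(0)
    cs1, wd1 = sample(half_second)
    print("no stall:", is_unstable(cs0, cs1, wd0, wd1))

    # Same perfect clocks, but 100 ms pass between the two reads.
    cs2, wd2 = sample(2 * half_second, stall_between_reads_ns=100_000_000)
    print("100 ms stall:", is_unstable(cs1, cs2, wd1, wd2))
```

In this model the tsc is never wrong, yet the second comparison flags it,
which is at least consistent with the message being a symptom of the
stall rather than of an unstable tsc.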
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel