This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


RE: [Xen-devel] Re: [PATCH] x86: unconditionally mark TSC unstable under

> > OK, then I'm confused.  Either:
> > - this is one of those recent Intel boxes where all the TSCs should
> >   be sync'ed but due to firmware issues they are not, in which case
> >   this is a Linux bug that has already been fixed upstream; or
> > - this isn't Xen 4.0+ but should be fixed in 4.0; or
> > - this is Xen 4.0+ and the default tsc_mode is being overridden
> >
> > Otherwise, why is TSC not synchronized and pvclock always getting
> > an offset of 0?
> No, this bug doesn't really have anything to do with tsc sync issues.
> The situation is:
>     * The scheduler uses its own timebase, called sched_clock
>     * We have a pvop for sched_clock
>     * The Xen implementation for sched_clock counts unstolen ns, rather
>       than wallclock ns, since this is (somewhat, in theory) useful
>     * However, the scheduler checks to see if the tsc is stable (because
>       the default sched_clock is based on the tsc), and if so, assumes
>       that sched_clock is synced across all cpus - but of course the
>       amount of stolen time is different for each vcpu
> Unfortunately, while the idea of counting unstolen time is useful to see
> how much work got done in a timeslice, it's pretty useless for counting
> how long something was asleep for (since you don't care about how much
> time was "stolen" while you were asleep).  And the scheduler uses the
> same timebase for measuring both.
> So the fix is to simply use plain Xen system time as the scheduler
> clock, as that will be synced across cpus.

OK, that makes sense.  Thanks for the thorough explanation.

Maybe the xen_sched_clock code should be entirely removed
rather than ifdef'd since it is no longer used and
"(somewhat, in theory)" led to a strange bug?  Or, if
you are confident that it will be useful to some future
Linux scheduler, maybe add some comments about
how enabling it may cause the effects Jed saw.

And maybe an even better answer is to submit a patch upstream
so that the scheduler doesn't use the same timebase for
measuring both, since the kernel is making a bad assumption
about real vs virtual time. I'd imagine KVM users might benefit
from that also.

Xen-devel mailing list