WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
xen-devel

RE: [Xen-devel] [RFC] [PATCH] use "reliable" tsc properly when available

To: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>, "Xen-Devel (E-mail)" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: RE: [Xen-devel] [RFC] [PATCH] use "reliable" tsc properly when available, but verify
From: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
Date: Mon, 28 Sep 2009 15:05:02 -0700 (PDT)
Cc:
Delivery-date: Mon, 28 Sep 2009 15:05:46 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <C6E6E3EC.15E19%keir.fraser@xxxxxxxxxxxxx>
> From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
> Surely it should be sufficient to check TSCs for consistency across
> all CPUs periodically, and against the chosen platform timer, and
> ensure none are drifting? An operation which would not require us to
> loop for 2ms and would provide rather more useful information than an
> ad-hoc multi-CPU
> race-to-update-a-shared-variable-an-arbitrary-and-large-number-of-times.
> 
> I wouldn't take anything like this algorithm.

The algorithm ensures that the skew between any two
processors is sufficiently small so that it is unobservable
by any app (e.g. smaller than "a cache bounce").  I'm
not sure it is possible to "check for consistency across
all CPUs" and get that guarantee any other way... unless
there is some easy way to measure the minimum cost of a cache
bounce.
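
To make the "smaller than a cache bounce" argument concrete, here is a minimal single-threaded simulation of a warp check. It is only a sketch, not the Linux or Xen code: the names SimCpu, max_warp and handoff_cost are invented for illustration, the CPUs take strict turns instead of racing for a lock, and handoff_cost stands in for the minimum number of cycles that pass while the shared variable moves from one CPU to the next.

```python
from dataclasses import dataclass


@dataclass
class SimCpu:
    """Simulated CPU: its TSC is the true cycle count plus a fixed offset."""
    offset: int  # skew in cycles relative to the reference counter (assumed constant)

    def read_tsc(self, true_cycles: int) -> int:
        return true_cycles + self.offset


def max_warp(cpus, iterations, handoff_cost):
    """Return the largest backwards jump observed across CPUs (0 if none).

    Each turn, one CPU reads its TSC, compares it with the last value
    published by any CPU, then publishes its own value. handoff_cost is
    the (hypothetical) number of cycles between consecutive readers.
    """
    true_cycles = 0
    last_tsc = None
    worst = 0
    for i in range(iterations):
        cpu = cpus[i % len(cpus)]      # strict round-robin instead of a real race
        true_cycles += handoff_cost    # time spent bouncing the shared variable
        now = cpu.read_tsc(true_cycles)
        if last_tsc is not None and now < last_tsc:
            worst = max(worst, last_tsc - now)  # time appeared to go backwards
        last_tsc = now
    return worst
```

In this model a skew no larger than the handoff cost never shows up as a warp, while a larger skew is caught on the first handoff back to the slower CPU. A real test cannot choose the interleaving, which is presumably why it has to loop many times.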

I'm not sure why Linux chooses to run the test
for 20ms, but I think it is because the test runs only
once, at boot time, so it has to eat up some time to
give the TSCs a chance to skew sufficiently.  If we
run it more than once (and Xen hasn't written
to the TSCs recently), it's probably sufficient to
run it for far fewer iterations.  But given all the
possible CPU race conditions due to caches, pipelining
and such, I'm not sure how many iterations are enough.

Note that upstream Linux NEVER writes to the TSC anymore.
If the check_tsc_warp test fails, the TSC is simply marked
as unreliable for anything other than interpolating
within a jiffy.  If OSes had some intrinsic to describe
this "reliable vs unreliable TSC" to apps, lots of trouble
could have been avoided.  That's roughly what I
am trying to do with pvrdtscp, so I want to be very
sure that when Xen says the TSC is reliable, it is reliable
and stays reliable.  (Though maybe checking once at boot time
is sufficient.)

Which points out another alternative:  check_tsc_warp
need only be run if one or more domains have tsc_native
enabled AND have some mechanism (such as pvrdtscp or
a userland hypercall) to ask Xen if the TSC is reliable
or not.
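
The condition above could be expressed roughly as follows. This is a sketch only: the Domain record and both of its field names are hypothetical and do not correspond to any existing Xen structure.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    # Both fields are hypothetical names used only for this illustration.
    tsc_native: bool                 # guest reads the hardware TSC directly
    can_query_tsc_reliability: bool  # e.g. via pvrdtscp or a userland hypercall


def warp_check_needed(domains):
    """True if at least one domain both uses the native TSC and can ask
    whether the TSC is reliable; otherwise the check can be skipped."""
    return any(d.tsc_native and d.can_query_tsc_reliability for d in domains)
```

With no such domain present, the check (and its cost) would be deferred until one appears.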

But since this might be minutes/hours/days after Xen
boots, I'd still like to avoid Xen mucking around
using write_tsc in the meantime as it may be "fixing"
something that ain't broke.

> I should add, not only is the algorithm stupid and slow, but it doesn't
> even check for exactly what RELIABLE_TSC guarantees -- constant-rate
> TSCs. This would be useless on a single-CPU system, for example, or
> perhaps more practically a single-socket system where all TSCs skewed
> together due to package-wide power management. In the latter case TSCs
> would not skew relative to each other, even though they could 'skew'
> relative to wallclock (represented in Xen by the platform timer).

It's only checking for TSC skew relative to other
processors in an SMP system.  What's important to
an app is that time (as measured by sampling the
TSC on random processors) never goes backwards.
That IS what RELIABLE_TSC is supposed to guarantee.
I agree that check_tsc_warp doesn't test for skew
relative to a platform timer (though I suspect
they are driven from the same crystal) and need not
be run on a single-CPU system.
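
For comparison, the check against a platform timer that check_tsc_warp does not perform could look something like the sketch below. The function name drift_ppm and its parameters are assumptions made for illustration; it simply compares the TSC ticks counted over an interval with the ticks expected from the platform timer's elapsed time and a nominal TSC frequency.

```python
def drift_ppm(tsc_start, tsc_end, timer_start_ns, timer_end_ns, nominal_hz):
    """Return TSC drift relative to the platform timer, in parts per million.

    Negative means the TSC ran slower than nominal_hz over the interval.
    nominal_hz is assumed to be known (e.g. from calibration at boot).
    """
    elapsed_ns = timer_end_ns - timer_start_ns
    if elapsed_ns <= 0:
        raise ValueError("platform timer interval must be positive")
    expected_ticks = nominal_hz * elapsed_ns / 1e9
    actual_ticks = tsc_end - tsc_start
    return (actual_ticks - expected_ticks) / expected_ticks * 1e6
```

If every TSC in a package slowed down together, each CPU would report the same large negative drift here even though a cross-CPU warp test would see nothing, which is the single-socket case described in the quoted text.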

Dan
