xen-devel


To: "Ian Pratt" <Ian.Pratt@xxxxxxxxxxxxx>, "Keir Fraser" <Keir.Fraser@xxxxxxxxxxxxx>, "Xen-Devel (E-mail)" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: RE: Xen system skew MUCH worse than tsc skew (was RE: [Xen-devel]RE: [PATCH] record max stime skew (was RE: [PATCH] strictly increasinghvm guest time))
From: "Dan Magenheimer" <dan.magenheimer@xxxxxxxxxx>
Date: Tue, 22 Jul 2008 18:40:38 -0600
Cc: Dave Winchell <dwinchell@xxxxxxxxxxxxxxx>
In-reply-to: <DD74FBB8EE28D441903D56487861CD9D32AC257C@xxxxxxxxxxxxxxxxxxxxxx>
Organization: Oracle Corporation
Reply-to: "dan.magenheimer@xxxxxxxxxx" <dan.magenheimer@xxxxxxxxxx>
> If you want to test this theory, you can easily get all the CPUs to
> recalibrate at the same instant, though it's a bit expensive:
> 
> Get one CPU to issue an smp_call_function on all CPUs (including
> itself). The called function should atomic_inc a variable and then
> spin waiting reading the count until all CPUs have reached this
> point. When this happens, turn interrupts off, atomic_dec the same
> counter, spin until it hits zero, then read the TSC, re-enable
> interrupts, finish. The TSC reads should all happen very close to
> each other.

The code invoked by "xm debug-key t" does exactly that, and I've been
using it as one way to measure skew.  Any idea how expensive it is?
Is it too expensive to do once per second?  If it's no more expensive
than local_time_calibration() (which runs at 1Hz per processor), perhaps
we should just use it to set the TSC on all processors once per second
and dispense with the existing platform-timer-interpolated-by-tsc
approach (beautiful, but one additional frequency to resonate)?

On the other hand, I'll bet the bigger the system, the more difficult
it is to rendezvous them... and the more natural skew there will be
between the sockets.
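
For anyone who wants to play with the rendezvous idea outside the
hypervisor, here is a minimal user-space sketch. It is only an
illustration, not the code behind "xm debug-key t": threads stand in
for CPUs, clock_gettime() stands in for the TSC read, and the names
(NR_WORKERS, rendezvous_worker, rendezvous_spread_ns) are made up for
this example. It also uses two separate counters rather than
incrementing and then decrementing a single one, so that a late
spinner cannot miss the moment the first counter peaks.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

#define NR_WORKERS 4            /* hypothetical "CPU" count */

static atomic_int arrived;      /* phase 1: everyone has entered */
static atomic_int ready;        /* phase 2: everyone is about to read */
static uint64_t stamp_ns[NR_WORKERS];

/* Stand-in for a TSC read; a hypervisor would use rdtsc instead. */
static uint64_t read_clock_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

static void *rendezvous_worker(void *arg)
{
    int id = (int)(intptr_t)arg;

    /* Phase 1: announce arrival, spin until all workers are here. */
    atomic_fetch_add(&arrived, 1);
    while (atomic_load(&arrived) < NR_WORKERS)
        sched_yield();

    /* A kernel would disable interrupts at this point. */

    /* Phase 2: second gather, then read the clock immediately. */
    atomic_fetch_add(&ready, 1);
    while (atomic_load(&ready) < NR_WORKERS)
        sched_yield();
    stamp_ns[id] = read_clock_ns();

    return NULL;
}

/* Run one rendezvous; return (latest read - earliest read) in ns. */
uint64_t rendezvous_spread_ns(void)
{
    pthread_t t[NR_WORKERS];
    uint64_t lo, hi;
    int i, rc;

    atomic_store(&arrived, 0);
    atomic_store(&ready, 0);
    for (i = 0; i < NR_WORKERS; i++) {
        rc = pthread_create(&t[i], NULL, rendezvous_worker,
                            (void *)(intptr_t)i);
        assert(rc == 0);
    }
    for (i = 0; i < NR_WORKERS; i++)
        pthread_join(t[i], NULL);

    lo = hi = stamp_ns[0];
    for (i = 1; i < NR_WORKERS; i++) {
        if (stamp_ns[i] < lo) lo = stamp_ns[i];
        if (stamp_ns[i] > hi) hi = stamp_ns[i];
    }
    return hi - lo;
}
```

The spread this reports includes scheduler noise that a real
interrupts-off rendezvous would not see, so treat it as an upper bound
on what the technique can achieve, not as a skew measurement.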
 
> The only thing that could mess this up would be NMI's or SMI's. You
> could at least detect that by reading the TSC after all CPUs have
> incremented the counter, and check that only a "reasonable" amount of
> time had elapsed. If not, set a flag to indicate that a recalibration
> is required (you'd need to add another gather loop to enable all CPUs
> to vote on whether they're happy).
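
The "reasonable amount of time" check and the vote described above
could look roughly like the following. This is a sketch under
assumptions: the function names and the max_ticks budget are
hypothetical, and the per-CPU before/after TSC values are taken as
already collected into arrays rather than gathered with a real second
spin loop.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One CPU's vote: was the gap between its two TSC reads small enough
 * that no NMI/SMI is likely to have intervened? max_ticks is a
 * hypothetical budget chosen by the caller.
 */
bool rendezvous_was_quick(uint64_t tsc_before, uint64_t tsc_after,
                          uint64_t max_ticks)
{
    return (tsc_after - tsc_before) <= max_ticks;
}

/*
 * The gather step: the calibration is accepted only if every CPU is
 * happy. A single unhappy CPU means the whole round must be redone.
 */
bool all_cpus_happy(const uint64_t *before, const uint64_t *after,
                    size_t nr_cpus, uint64_t max_ticks)
{
    size_t i;

    for (i = 0; i < nr_cpus; i++)
        if (!rendezvous_was_quick(before[i], after[i], max_ticks))
            return false;   /* caller sets the "recalibrate" flag */
    return true;
}
```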

I think I've seen this code in recent Linux.

But assuming we stay with the existing approach, I'm not sure
the processors need to be calibrated at "exactly" the same time,
just "close".  Something similar to "round jiffies" (see
http://lkml.org/lkml/2006/10/10/189) may be enough... though
I guess that depends on the character of the timesource jitter.
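
To make the "close but not exact" idea concrete: each CPU could round
its next calibration deadline up to a whole-second boundary, so the
per-CPU calibrations cluster together without an explicit rendezvous.
The helper below is a simplified illustration of that rounding only;
it is not the Linux round_jiffies() implementation, and the name
round_up_to_second_ns is invented for this example.

```c
#include <stdint.h>

#define NS_PER_SEC 1000000000ull

/*
 * Round a deadline (in nanoseconds) up to the next whole second.
 * A deadline already on a boundary is returned unchanged.
 */
uint64_t round_up_to_second_ns(uint64_t t_ns)
{
    uint64_t rem = t_ns % NS_PER_SEC;

    return rem == 0 ? t_ns : t_ns - rem + NS_PER_SEC;
}
```

With this, CPUs whose timers would otherwise fire at 1.2s, 1.5s and
1.9s all calibrate at 2.0s, give or take timer latency.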

Thanks,
Dan

