

RE: [Xen-devel] write_tsc in a PV domain?

To: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Subject: RE: [Xen-devel] write_tsc in a PV domain?
From: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
Date: Wed, 26 Aug 2009 16:10:10 -0700 (PDT)
Cc: "Xen-Devel \(E-mail\)" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Delivery-date: Wed, 26 Aug 2009 16:11:17 -0700
In-reply-to: <4A95B789.6020604@xxxxxxxx>
> > While I think I understand entirely why you would want to
> > think of it that way, there's thousands (millions?) of applications
> > out there that would beg to differ.  They DO assume that
> > rdtsc bears "some" relationship to time.
> They are wrong.  Linux doesn't offer the tsc to usermode for its use. 
> The closest it gets is vgettimeofday, which we could implement better.

Linux doesn't have to offer it.  The Intel x86 CPU does.  It's
a legal instruction for an app to use and (quote from Intel
SDM) "is guaranteed to return a monotonically increasing
unique value whenever executed except for 64-bit wraparound."
While that's not precisely a "relationship" to time, mere
mortal programmers are likely to interpret it that way.

(Keir, please note that it says monotonically increasing,
not monotonically non-decreasing, so I think the current
softtsc implementation for HVM is incorrect.)
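
To make the increasing vs. non-decreasing distinction concrete,
here is a minimal sketch in C of an emulated read that never
returns the same value twice. It is not the actual softtsc code:
the struct and function names are hypothetical, it ignores 64-bit
wraparound, and it assumes a single caller at a time (no locking).

```c
#include <stdint.h>

/* Hypothetical per-guest state: the last value an emulated rdtsc returned. */
struct soft_tsc_state {
    uint64_t last_returned;
};

/*
 * Return a value strictly greater than every value returned before.
 * raw_tsc is whatever the emulation computed for "now"; if it has not
 * advanced past the previous result, bump it by one tick instead of
 * returning a duplicate (which would only be non-decreasing).
 */
static uint64_t soft_tsc_read(struct soft_tsc_state *s, uint64_t raw_tsc)
{
    uint64_t value = raw_tsc;

    if (value <= s->last_returned)
        value = s->last_returned + 1;

    s->last_returned = value;
    return value;
}
```

Two back-to-back reads that compute the same raw value get
distinct results, which is the property the SDM wording implies.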

> >   Indeed Linux itself does. 
> A pv linux guest doesn't have a TSC in the same way that it doesn't have
> a TSS or any number of other CPU features.  It would be a grave error
> for the kernel to use a tsc-based clocksource rather than the Xen pv
> clocksource.  A Xen PV VCPU bears a passing resemblance to an Intel x86
> CPU, but should not be confused with one.

So are you going to guarantee that Linux 2.6.31, when running
on Xen, has no uses of rdtsc that depend on it delivering
anything other than a random value?

> >  Exactly what that relationship to time is defined to be is
> > open to debate, and whether Xen supports whatever relationship
> > is defined is also debatable (especially in the presence of
> > migration).  But defining rdtsc as returning random bits
> > is not an acceptable solution for Xen.  Dom0 won't even
> > boot if rdtsc returns random bits so Xen must already be
> > guaranteeing that rdtsc has "some" relationship to time.
> No, it really doesn't.  It provides a PV clock, which includes "rdtsc"
> as part of its ABI.  It is not a general tsc.  You can't meaningfully
> execute "rdtsc" without also being (indirectly) aware of what pcpu it's
> running on and applying the appropriate corrections to turn it into
> system monotonic time.  Executing rdtsc willy-nilly gets you useless
> results; fortunately no PV Xen kernel does that.

While what you are saying may seem reasonable, I think you
will find by looking at linux-2.6.18-xen that it is not true
in reality.  If you trap kernel uses of rdtsc and return random
values, dom0 will not boot.
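
For reference, the "appropriate corrections" in the quoted text
amount to scaling a TSC delta with per-VCPU parameters published
by Xen. A rough sketch in C follows. The struct here is a
hypothetical snapshot modelled on the fields of Xen's
vcpu_time_info, not the real shared structure, and a real guest
must also take a consistent snapshot (using the version field) on
the VCPU that executed the rdtsc.

```c
#include <stdint.h>

/* Hypothetical snapshot, modelled on the fields of Xen's vcpu_time_info. */
struct pv_time_snapshot {
    uint64_t tsc_timestamp;     /* TSC value when system_time was sampled */
    uint64_t system_time;       /* nanoseconds since boot at that moment */
    uint32_t tsc_to_system_mul; /* 32.32 fixed-point ticks-to-ns multiplier */
    int8_t   tsc_shift;         /* pre-shift applied to the TSC delta */
};

/* Convert a raw TSC reading into system time (ns) using one snapshot. */
static uint64_t pv_tsc_to_system_time(const struct pv_time_snapshot *t,
                                      uint64_t tsc)
{
    uint64_t delta = tsc - t->tsc_timestamp;
    uint64_t hi, lo;

    /* Apply the signed pre-shift to the elapsed tick count. */
    if (t->tsc_shift < 0)
        delta >>= -t->tsc_shift;
    else
        delta <<= t->tsc_shift;

    /*
     * Compute (delta * mul) >> 32 without a 128-bit type by splitting
     * delta into its high and low 32-bit halves.
     */
    hi = delta >> 32;
    lo = delta & 0xffffffffu;

    return t->system_time
         + hi * t->tsc_to_system_mul
         + ((lo * t->tsc_to_system_mul) >> 32);
}
```

A bare rdtsc with none of this applied is only a tick count
relative to whatever physical CPU happened to run it.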

> > We've been lucky so far with allowing rdtsc to execute directly
> > in hardware, but we really do need to fix it properly.
> No, that's false.  The current Xen time model works fine for all guests
> using it correctly.
> Emulating rdtsc for hvm guests is another question entirely.

In the end, I don't care whether rdtsc instructions in the kernel
are emulated (the patch I submitted earlier doesn't emulate them,
other than to do a "slow" rdtsc).  But apps don't care whether they
are running on an HVM or a PV guest, so if they use rdtsc, even if
you believe that usage is incorrect, rdtsc must deliver what the
Intel SDM guarantees.

> > But since applications cannot WRITE to tsc and Xen has some
> > control over the OS->Xen PV API, it might be safe to define that
> > write_tsc is a no-op.
> No, write_tsc is meaningless, and anyone trying to execute it is not
> even wrong.

In that case, are you saying it is an illegal instruction for a PV
guest to execute?  If so, we should not ignore it, we should fail
the guest.  But that would be unfortunate for the RHEL5-64bit
PV guests that actually DO use it.

