

RE: [Xen-devel] write_tsc in a PV domain?

To: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>, Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Subject: RE: [Xen-devel] write_tsc in a PV domain?
From: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
Date: Mon, 31 Aug 2009 14:06:04 -0700 (PDT)
Cc: "Xen-Devel \(E-mail\)" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Alan Cox <alan@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <C6C1DDAA.139BF%keir.fraser@xxxxxxxxxxxxx>
> > I have yet to get a measurement of either syscall that
> > is better than 2.5x WORSE than emulating rdtsc. On
> > my dual-core Conroe (Intel E6850) with 64-bit Xen and
> > 32-bit dom0, I get approximately:
> > 
> > rdtsc native: 22ns
> > softtsc (rdtsc emulated): 360ns
> Trap-and-emulate in 360ns seems astoundingly good. Perhaps 
> too good to be true?

I measured with the patch you checked in as 20128.

I tried a couple of tests, first changing pv_soft_rdtsc
to always return a value with the 4 LSB of the return
value cleared, second with the 4 LSB of the return value
set.  Both were properly reflected by a userland rdtsc.
So it looks like the correct emulation code is executing.
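A minimal sketch of the userland side of that check, in C. The helper names collect_tsc and low4_all_equal are hypothetical (they are not from the patch), and the sketch assumes the __rdtsc() intrinsic from <x86intrin.h> as provided by GCC and Clang on x86. It only verifies that the 4 LSBs of every observed value are all cleared (0x0) or all set (0xF); the hypervisor-side change to pv_soft_rdtsc is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Hypothetical helper: fill out[0..n-1] with back-to-back userland rdtsc readings. */
static inline void collect_tsc(uint64_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = __rdtsc();
}
#endif

/* Hypothetical helper: return 1 if the 4 LSBs of every sample equal
 * `expected` (0x0 for the "cleared" test, 0xF for the "set" test), else 0. */
static inline int low4_all_equal(const uint64_t *samples, size_t n,
                                 uint64_t expected)
{
    assert(samples != NULL && n > 0 && expected <= 0xF);
    for (size_t i = 0; i < n; i++) {
        if ((samples[i] & 0xF) != expected)
            return 0;
    }
    return 1;
}
```

On an x86 guest one would call collect_tsc() into a small buffer and pass that buffer to low4_all_equal() with 0x0 or 0xF, depending on which variant of the emulation is loaded.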

And get_s_time() always returns nanoseconds, correct?
So consecutive emulated rdtsc's should return values
that differ by the amount of nsec necessary to do
the emulation, right?  I ran 2 million rdtsc's in
a loop and took the average so, ignoring loop
and load/store overhead, the 360ns appears to be
an accurate measurement.
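The averaging loop can be sketched roughly as follows. This is not the actual test program; mean_delta and measure_rdtsc are hypothetical names, and __rdtsc() from <x86intrin.h> (GCC/Clang on x86) is assumed. If the emulated rdtsc returns get_s_time() nanoseconds, as assumed above, the result is the per-read cost in nanoseconds; on native hardware it would be in TSC ticks instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mean gap between consecutive readings, given the
 * first reading, the last reading, and the number of gaps between them. */
static inline double mean_delta(uint64_t first, uint64_t last, uint64_t gaps)
{
    assert(gaps > 0 && last >= first);
    return (double)(last - first) / (double)gaps;
}

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Hypothetical helper: issue `reads` rdtsc instructions after an initial
 * reading and return the average difference between consecutive values.
 * Loop overhead is included in the result, as in the measurement above. */
static inline double measure_rdtsc(uint64_t reads)
{
    assert(reads > 0);
    uint64_t first = __rdtsc();
    uint64_t last = first;
    for (uint64_t i = 0; i < reads; i++)
        last = __rdtsc();
    return mean_delta(first, last, reads);
}
#endif
```

Calling measure_rdtsc(2000000) corresponds to the 2 million iterations mentioned above. Only the first and last readings are kept, so no per-sample stores are needed inside the loop.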

A thousand cycles to trap, decode, call get_s_time(),
and return seems astoundingly good?  Probably it's
faster than a vmexit because there's so much less state
to save.  But still it's roughly 16x slower than a raw rdtsc.
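As a rough sanity check on those figures, here is the arithmetic in C. It assumes the E6850's nominal 3.0 GHz clock, which is not stated in this mail, and the helper names are hypothetical. With that assumption, 360ns is about 1080 cycles, and the ratio of the two quoted latencies (360ns to 22ns) is a little over 16.

```c
#include <assert.h>

/* Hypothetical helper: convert a latency in nanoseconds to CPU cycles.
 * ghz is the clock in GHz, i.e. cycles per nanosecond (assumed 3.0 here). */
static inline double ns_to_cycles(double ns, double ghz)
{
    assert(ns >= 0.0 && ghz > 0.0);
    return ns * ghz;
}

/* Hypothetical helper: how many times slower slow_ns is than fast_ns. */
static inline double slowdown(double slow_ns, double fast_ns)
{
    assert(fast_ns > 0.0);
    return slow_ns / fast_ns;
}
```

For example, ns_to_cycles(360.0, 3.0) gives the cycle count for the emulated path, and slowdown(360.0, 22.0) gives the ratio against native rdtsc.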

If you have ideas on how to test the measurement further,
I'd be happy to give them a spin.

