This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


[Xen-devel] Re: rdtsc: correctness vs performance on Xen (and KVM?)

To: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
Subject: [Xen-devel] Re: rdtsc: correctness vs performance on Xen (and KVM?)
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Mon, 31 Aug 2009 17:22:20 -0700
Cc: "Xen-Devel \(E-mail\)" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>, Alan Cox <alan@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Mon, 31 Aug 2009 17:22:47 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <830e5c23-96f5-4e79-9f11-3884735e1c33@default>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <830e5c23-96f5-4e79-9f11-3884735e1c33@default>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv: Gecko/20090814 Fedora/3.0-2.6.b3.fc11 Lightning/1.0pre Thunderbird/3.0b3
On 08/31/09 16:52, Dan Magenheimer wrote:
> work both on Xen and bare metal, and works properly
> across: vcpu-to-pcpu rescheduling even on NUMA
> machines; system sleep/hibernation; and 
> save/restore/migration between machines with
> dissimilar clock rates. 

But it will only do this when running under Xen.  If running on bare
metal, there will be nothing providing the correction info to the app,
and it will be no better than using raw rdtsc with all its limitations. 
In practice this means that the app will have to have some other code
path anyway.

>  Implementation requires
> changes in Xen and "the app" but no OS changes
> thus making it still viable on legacy OS's
> and possibly(?) HVM domains.  Note that
> only apps that need to sample time on the
> order of >5-100K/core/second would use this;
> for other apps, rdtsc emulation overhead
> is probably negligible (<0.2%).
> 0)  Xen implements rdtsc emulation by default
> 1)  Guest OS is launched with pvtsc=1 in vm.cfg
> 2)  App running on guest OS sets up a SIGILL handler
> 3)  App executes a special rdmsr instruction or
>     hypercall.

No way to do direct hypercalls from usermode, so it would need to be an
illegal instruction (like cpuid).

But really it should be a system-wide kernel setting, set via sysctl or
the like.

> 4a) If SIGILL results, not running on Xen at all,
>     or on old Xen; app uses rdtsc at own risk. Done.
> 4b) Else, rdmsr/hypercall returns virtual address of
>     special pvclock page ("pvclock_va").
This can't be done without changing the kernel; Xen can't just start
sticking stuff into usermode mappings (how does Xen even know where a
given OS's usermode is?).

And again, usermode can't do hypercalls, and I don't think we should
start making fake rdmsrs work in usermode.

> 5)  App executes another special rdmsr instruction/
>     hypercall to disable rdtsc emulation.  This
>     affects ALL execution for all processes in this VM.

Once enabled, it should just stay enabled.  System-wide is very coarse
anyway (since there's no guarantee that all apps will use the mechanism).

> 6)  Xen maintains mapping of pvclock_va to a
>     different physical page for each processor
>     and transparently handles TLB misses for
>     pvclock_va

If you mean that a given VA has a per-cpu mapping, it requires per-cpu
pagetables.  That's not possible in Linux with PV pagetables (since two
tasks/threads on different cpus sharing the same mm will use the same
pagetable).
> 7)  App uses (unemulated) rdtsc and applies
>     pvclock algorithm (using values in memory
>     at pvclock_va) resulting in pvtsc, which
>     is nanoseconds since VM start.  App can
>     further apply local algorithms to enforce
>     monotonicity or frequency scaling as desired.
> Comments appreciated.  I realize that this is hacky
> and ugly... better alternatives gladly solicited.

In general even Linux's specialised APIs go largely unused (sendfile,
vmsplice, etc).  Something as esoteric as this will be pretty much unused.

This can be entirely done within the vsyscall mechanism without any app
changes.  There's no reason not to.

> P.S. While it would be nice if we could just tell
> apps to use a fast vgettimeofday equivalent, this
> does not exist today and, even if it did, would not
> be widely available for years in the kernel running under
> most enterprise app deployments (and, even then,
> only on 64-bit Linux.)

These rationales are very unconvincing:

Making vsyscall work on 32bit is just a matter of doing it; apparently
nobody has put the effort into it, but there's no fundamental reason why
it wouldn't work.  Besides, who runs enterprise apps on 32-bit these
days?  Anything requiring even moderate amounts of memory is better run
on 64-bit.

Your mechanism will require kernel changes anyway, so there's no getting
around that.

Once vsyscall does Xen/KVM properly, then every app will automatically
do the right thing without modification.  There's no need for
specialized APIs that nobody will end up using anyway.  It only makes
sense to go to this kind of effort if it ends up making a plain "rdtsc"
have the properties you want it to have.


Xen-devel mailing list