WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/

To: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
Subject: Re: [Xen-devel] rdtscP and xen (and maybe the app-tsc answer I've been looking for)
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Mon, 21 Sep 2009 11:36:11 -0700
Cc: kurt.hackel@xxxxxxxxxx, "Xen-Devel \(E-mail\)" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxxxx>
Delivery-date: Mon, 21 Sep 2009 11:36:35 -0700
On 09/19/09 08:34, Dan Magenheimer wrote:
> You're right, I don't need to differentiate between
> the two emulated cases.  I was trying to overload
> an extra piece of information that I really don't
> need to overload.
>
> However, I do need one special case to indicate
> emulation vs non-emulation, so wraparound is
> still a problem.
>   

I was assuming you'd just repurpose the existing version-number scheme,
which is always even and therefore can never equal -1.
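
A minimal sketch of that check, assuming TSC_AUX holds the (always even) version number directly and that Xen would load all-ones (-1 as a 32-bit value) to signal emulation. The macro and function names are hypothetical, not part of any existing Xen ABI:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel: all-ones is odd, so it can never collide with a
 * stable pvclock version number, which is always even. */
#define TSC_AUX_EMULATED UINT32_C(0xFFFFFFFF)

/* True if the TSC_AUX value says rdtsc/rdtscp is being emulated. */
static bool tsc_aux_is_emulated(uint32_t aux)
{
    return aux == TSC_AUX_EMULATED;
}

/* True if the value could be a stable (even) version number. */
static bool tsc_aux_is_valid_version(uint32_t aux)
{
    return (aux & 1u) == 0;
}
```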

>> If the hardware doesn't support rdtscp, how should an app know whether
>> or not to use it?  Should it just try running rdtscp and be prepared to
>> handle a SIGILL?
>>
> Yes, that's the plan.  I think this scheme always
> works, but only works fast if the hardware supports
> rdtscp and constant_tsc.
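
A minimal Linux/x86 sketch of the "try it and catch SIGILL" probe, using only the POSIX sigaction and sigsetjmp/siglongjmp calls. The function name probe_rdtscp is hypothetical; a real library would probably run the probe once and cache the answer:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for sigaction and sigsetjmp */
#endif
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
static sigjmp_buf rdtscp_probe_env;

/* SIGILL handler: abandon the faulting rdtscp and return to the probe. */
static void rdtscp_sigill(int sig)
{
    (void)sig;
    siglongjmp(rdtscp_probe_env, 1);
}
#endif

/* Execute rdtscp once under a temporary SIGILL handler. Returns true and
 * fills *tsc and *aux if the instruction ran, false if it trapped. */
static bool probe_rdtscp(uint64_t *tsc, uint32_t *aux)
{
#if defined(__x86_64__) || defined(__i386__)
    struct sigaction sa, old;
    volatile bool ok = false;   /* volatile: must survive siglongjmp */

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = rdtscp_sigill;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0)
        return false;

    if (sigsetjmp(rdtscp_probe_env, 1) == 0) {
        uint32_t lo, hi, c;
        __asm__ volatile("rdtscp" : "=a"(lo), "=d"(hi), "=c"(c));
        *tsc = ((uint64_t)hi << 32) | lo;
        *aux = c;
        ok = true;
    }
    sigaction(SIGILL, &old, NULL);  /* restore the previous handler */
    return ok;
#else
    (void)tsc;
    (void)aux;
    return false;                   /* rdtscp is an x86 instruction */
#endif
}
```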

What's the full algorithm for detecting this feature?  Usermode has to
establish:

   1. It is running under Xen (or not, if you expect this to be
      implemented on multiple hypervisors)
   2. rdtscp is available
   3. the ABI is actually being implemented, i.e.:
         1. the tsc_aux value actually has the correct meaning
         2. there is a working mechanism for getting the tsc scaling
            parameters
         3. there is a way to evolve the ABI in a backward-compatible way

before it can do anything else.
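
Steps 1 and 2 could be sketched with the GCC/Clang <cpuid.h> helpers: look for the "XenVMMXenVMM" signature in the hypervisor CPUID range, then test the rdtscp bit (CPUID leaf 0x80000001, EDX bit 27). The function names are illustrative. Step 3 is deliberately left out because the thread has not yet defined how the hypervisor would advertise the ABI. If rdtscp is removed from the guest's logical CPUID, as proposed in the next paragraph, step 2 would have to rely on the SIGILL probe instead.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang __get_cpuid and __cpuid */
#endif

/* Step 1 helper: does a hypervisor CPUID leaf carry Xen's signature? */
static bool is_xen_signature(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
    char sig[12];
    memcpy(sig, &ebx, 4);
    memcpy(sig + 4, &ecx, 4);
    memcpy(sig + 8, &edx, 4);
    return memcmp(sig, "XenVMMXenVMM", 12) == 0;
}

/* Step 2 helper: CPUID.80000001h:EDX bit 27 advertises rdtscp. */
static bool cpuid_edx_has_rdtscp(uint32_t edx)
{
    return (edx >> 27) & 1u;
}

/* Steps 1 and 2 only; step 3 needs a hypervisor query not yet defined. */
static bool running_on_xen_with_rdtscp(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned a, b, c, d;
    bool xen = false;

    /* CPUID.1:ECX bit 31 is set when running under a hypervisor. */
    if (!__get_cpuid(1, &a, &b, &c, &d) || !(c & (1u << 31)))
        return false;
    /* Xen's leaves may sit at 0x40000000 or at a 0x100-aligned offset
     * above it (e.g. when Viridian leaves occupy the base), so scan. */
    for (unsigned base = 0x40000000u; base < 0x40010000u && !xen; base += 0x100u) {
        __cpuid(base, a, b, c, d);
        xen = is_xen_signature(b, c, d) && a >= base + 2u;
    }
    if (!xen || !__get_cpuid(0x80000001u, &a, &b, &c, &d))
        return false;
    return cpuid_edx_has_rdtscp(d);
#else
    return false;
#endif
}
```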

If nothing else, it's probably worth removing the rdtscp feature from the
logical guest CPUID, so that nothing else tries to use it for its own
purposes; in other words, you'd be claiming rdtscp exclusively for this
ABI.  Alternatively, you could disable this ABI if a guest kernel tries to
set TSC_AUX.
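
Either option amounts to a small piece of hypervisor-side policy. The sketch below is hypothetical (the struct and function names are made up, not Xen's). The only architectural constants are the rdtscp CPUID bit (leaf 0x80000001, EDX bit 27) and the IA32_TSC_AUX MSR index 0xC0000103:

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_TSC_AUX               0xC0000103u  /* IA32_TSC_AUX */
#define CPUID_80000001_EDX_RDTSCP (1u << 27)

/* Hypothetical per-domain state for the proposed ABI. */
struct pvrdtscp_domain {
    bool abi_enabled;
};

/* Option 1: hide rdtscp from the guest's logical CPUID while the ABI is
 * active, so nothing else adopts it for its own purposes. */
static uint32_t filter_cpuid_80000001_edx(const struct pvrdtscp_domain *d,
                                          uint32_t edx)
{
    return d->abi_enabled ? (edx & ~CPUID_80000001_EDX_RDTSCP) : edx;
}

/* Option 2: if the guest kernel writes TSC_AUX anyway, give up on the ABI
 * for this domain and let the guest own the MSR from then on. */
static void on_guest_wrmsr(struct pvrdtscp_domain *d, uint32_t msr)
{
    if (msr == MSR_TSC_AUX)
        d->abi_enabled = false;
}
```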

> I've restricted the scheme to constant_tsc as I think
> it breaks down due to nasty races if running on a
> machine where the pvclock parameters differ across
> different pcpus.  I think the races can only be
> avoided if Xen sets the TSC_AUX for all of the
> pcpus running a pvrdtscp domain while all are idle.
>
> Is there a scheme that avoids the races?
>   

rdtscp makes it quite easy to avoid races because you get the tsc and
metadata about the tsc atomically.  You just need to encode enough info
in the metadata to do the conversion.

The obvious thing to do is to pack a version number and pcpu number into
TSC_AUX.  Usermode would maintain an array of pvclock parameters, one
for each pcpu.  If the version number matches, then it uses the
parameters it has; if not it fetches new parameters and repeats the
rdtscp.  There's no need to worry about either thread or vcpu context
switches because you get the (tsc,params) tuple atomically, which is the
tricky bit without rdtscp.
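
The per-pcpu lookup could look like the following, assuming the 24-bit version / 8-bit pcpu packing described next and the standard pvclock scaling. Field names follow Xen's vcpu_time_info; the struct and functions themselves are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_PCPUS 256   /* 8-bit pcpu field in TSC_AUX */

/* The pvclock fields needed to turn a TSC value into nanoseconds. */
struct pvclock_params {
    uint32_t version;           /* 24-bit truncated version; init to UINT32_MAX */
    uint64_t tsc_timestamp;     /* TSC when system_time was sampled */
    uint64_t system_time;       /* ns at tsc_timestamp */
    uint32_t tsc_to_system_mul; /* 32.32 fixed-point multiplier */
    int8_t   tsc_shift;
};

/* pvclock scaling: ns = system_time + (((delta << shift) * mul) >> 32),
 * with the 64x32-bit multiply split so it needs no 128-bit type. */
static uint64_t pvclock_scale(const struct pvclock_params *p, uint64_t tsc)
{
    uint64_t delta = tsc - p->tsc_timestamp;

    if (p->tsc_shift < 0)
        delta >>= -p->tsc_shift;
    else
        delta <<= p->tsc_shift;
    uint64_t lo = delta & 0xFFFFFFFFu, hi = delta >> 32;
    return p->system_time + hi * p->tsc_to_system_mul +
           ((lo * p->tsc_to_system_mul) >> 32);
}

/* One attempt with a (tsc, aux) pair taken from a single rdtscp.
 * Returns false if the cached parameters for that pcpu are stale. */
static bool pvclock_try_convert(const struct pvclock_params cache[NR_PCPUS],
                                uint64_t tsc, uint32_t aux, uint64_t *ns)
{
    const struct pvclock_params *p = &cache[aux & 0xFFu];

    if (p->version != (aux >> 8))
        return false;
    *ns = pvclock_scale(p, tsc);
    return true;
}
```

A caller would loop: execute rdtscp, pass the (tsc, aux) pair to pvclock_try_convert, and on false fetch fresh parameters for pcpu aux & 0xFF through whatever syscall or hypercall the ABI provides (storing the truncated version), then execute rdtscp again. Because tsc and aux come from one instruction, a migration between the rdtscp and the conversion does no harm.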

(The version number would be truncated relative to the normal pvclock
version number, but it just needs to be large enough to avoid aliasing
when it wraps; I'm assuming something like 24 bits of version and 8 bits
of cpu number.)
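
The packing itself is a couple of shifts and masks. A sketch with the hypothetical 24/8 split, which limits the scheme to 256 pcpus:

```c
#include <stdint.h>

#define AUX_CPU_BITS     8u
#define AUX_VERSION_BITS 24u
#define AUX_CPU_MASK     ((1u << AUX_CPU_BITS) - 1u)
#define AUX_VERSION_MASK ((1u << AUX_VERSION_BITS) - 1u)

/* Hypervisor side: truncate the pvclock version and pack it with the pcpu. */
static uint32_t tsc_aux_pack(uint32_t pvclock_version, uint32_t pcpu)
{
    return ((pvclock_version & AUX_VERSION_MASK) << AUX_CPU_BITS) |
           (pcpu & AUX_CPU_MASK);
}

/* Usermode side: split a TSC_AUX value back into its two fields. */
static uint32_t tsc_aux_version(uint32_t aux) { return aux >> AUX_CPU_BITS; }
static uint32_t tsc_aux_pcpu(uint32_t aux)    { return aux & AUX_CPU_MASK; }
```

A stale cached entry aliases a current one only if the version has advanced by an exact multiple of 2^24 between two lookups on the same pcpu; for example, versions 0x01000002 and 0x2 pack to the same value. Since stable pvclock versions are even, the low version bit carries no information, so shifting it out before truncating would double the wrap distance.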

> Fortunately, this also has the effect of greatly
> reducing the version increase frequency.
>   

I don't think that's going to be a huge issue; fetching time parameters
with a syscall/hypercall would cost about as much as an emulated rdtsc,
and would only need to happen, say, once per timeslice (100Hz?) at
most.

> The rate is synced but the values may not be.  Since
> software (BIOS or Xen) sets tsc on each processor
> it is essentially impossible to ensure they are
> identical.  The rendezvous algorithm should be able
> to set them so that they are "unobservably" different,
> but I keep hearing "within 2usec".  (It would be
> interesting to measure this across a broad set
> of machines.)  So it's probably prudent to recommend
> that apps be prepared for the possibility even if
> it never happens.
>   

You don't need to guarantee anything stronger than they'd see on bare
hardware.  You also need to be more precise about exactly what you're
guaranteeing.

Are you saying that a single thread will never see regressing tscs? 
That just requires making sure that Xen gets the tscs synced closer than
the context switch time of a thread between cpus, which should be possible.

Or are you making the stronger guarantee that two threads running
concurrently on different cpus doing rdtsc will see monotonically
increasing tscs with respect to the ordering of all their operations? 
That would require arbitrarily close syncing (well, within the time it
takes a cacheline to bounce, I guess).
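
The weaker, single-thread guarantee can be checked with a toy model, assuming each pcpu's TSC equals a common cycle count plus a fixed per-pcpu offset, and that migrating a thread takes at least some minimum number of cycles (all names and numbers are illustrative). The stronger cross-thread guarantee is not modeled; it would need the offsets to agree to within a cacheline transfer.

```c
#include <stdbool.h>
#include <stdint.h>

/* TSC difference a thread observes when it reads on a pcpu with offset
 * off_from, migrates (taking migrate_cycles), then reads on off_to. */
static int64_t tsc_delta_after_migration(int64_t off_from, int64_t off_to,
                                         int64_t migrate_cycles)
{
    return migrate_cycles + (off_to - off_from);
}

/* Single-thread guarantee: no migration between any two pcpus makes the
 * TSC go backwards, i.e. the worst-case skew is below min_migrate. */
static bool single_thread_monotonic(const int64_t *offsets, int ncpus,
                                    int64_t min_migrate)
{
    for (int a = 0; a < ncpus; a++)
        for (int b = 0; b < ncpus; b++)
            if (tsc_delta_after_migration(offsets[a], offsets[b],
                                          min_migrate) < 0)
                return false;
    return true;
}
```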

    J

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
