xen-devel

Re: [Xen-devel] RE: rdtsc: correctness vs performance on Xen (and KVM?)

To: Jan Beulich <JBeulich@xxxxxxxxxx>
Subject: Re: [Xen-devel] RE: rdtsc: correctness vs performance on Xen (and KVM?)
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Thu, 03 Sep 2009 10:29:40 -0700
Cc: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>, "Xen-Devel \(E-mail\)" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>, Alan Cox <alan@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Thu, 03 Sep 2009 10:30:36 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4A9F9917020000780001320C@xxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <C6C4A72F.13BDD%keir.fraser@xxxxxxxxxxxxx> <4A9EEC3D.4070402@xxxxxxxx> <4A9F9917020000780001320C@xxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.1) Gecko/20090814 Fedora/3.0-2.6.b3.fc11 Lightning/1.0pre Thunderbird/3.0b3
On 09/03/09 01:23, Jan Beulich wrote:
>>   1. Add a hypercall to set the desired location of the clock
>>      correction info rather than putting it in the shared-info area
>>      (akin to vcpu placement).  KVM already has this; they write the
>>      address to a magic MSR.
>>     
> But this is already subject to placement, as it's part of the vcpu_info
> structure. While of course you don't want to make the whole vcpu_info
> visible to guests, it would seem awkward to further segregate the
> shared_info pieces. I'd rather consider adding a second (optional) copy
> of it, since the updating of this is rather little overhead in Xen,

Hm, I guess that's possible.  Though once you've added a new "other time
struct" pointer, it would be easier to just make Xen update that pointer
rather than update two.  I don't think a guest is going to know/care
about having two versions of the info (except that it opens the
possibility of getting confused by looking at the wrong one).

I'd propose that there be just one, and that the non-valid pvclock
structure have its version set to 0xffffffff, since a guest should never
see a version in that state.
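
Something like this, as a sketch (the marker value and the helper name are
just illustrations of the convention proposed above, not existing code):

#define PVCLOCK_VERSION_INVALID 0xffffffffU

/*
 * Sketch: a guest should only consume a structure Xen is actually
 * updating.  A structure that has been abandoned (e.g. the old
 * shared_info copy after the guest asked for a new placement) would
 * carry the invalid marker and be skipped.
 */
static bool pvclock_version_valid(const struct pvclock_vcpu_time_info *ti)
{
        return ti->version != PVCLOCK_VERSION_INVALID;
}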

>  but
> using this in the kernel time handling code would eliminate the
> potential for accessing all the vcpu_info fields using percpu_read().
>   

I don't think that's a big concern.  The kernel's pvclock handling is
common between Xen and KVM now, and it just gets a pointer to the
structure; it never accesses it as a percpu variable.
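
Roughly, the split looks like this (the common helper really does just take
a pointer; the Xen-side caller here is from memory and may not match the
tree exactly):

/* common code (arch/x86/kernel/pvclock.c): no percpu knowledge at all */
cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src);

/* Xen-specific caller: the percpu lookup stays on this side */
static cycle_t xen_clocksource_read(void)
{
        struct pvclock_vcpu_time_info *src = &get_cpu_var(xen_vcpu)->time;
        cycle_t ret = pvclock_clocksource_read(src);

        put_cpu_var(xen_vcpu);
        return ret;
}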

>>   2. Pack all the clock structures into a single page, indexed by vcpu
>>      number
>>     
> That adds a scalability issue, albeit a relatively light one: You shouldn't
> anymore assume there's a limit on the number of vCPU-s. 
>   

Well, that's up to the kernel rather than Xen.  If there are a lot of CPUs
it can span multiple pages.  There's no need to make them physically
contiguous, since the kernel never needs to treat them as an array and
we can map disjoint pages contiguously into userspace (it might take a
chunk of fixmap slots).
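
i.e. something like this (a sketch; pvclock_vsyscall_base is a made-up name
for wherever the kernel decides to map those pages):

/*
 * Set up elsewhere: a contiguous *virtual* mapping (fixmap/vdso) of the
 * pages holding the per-vcpu structures; the underlying frames need not
 * be physically contiguous.
 */
extern struct pvclock_vcpu_time_info *pvclock_vsyscall_base;

static struct pvclock_vcpu_time_info *pvti_for_vcpu(unsigned int vcpu)
{
        /*
         * 32-byte entries pack evenly into pages, so plain array indexing
         * works even when the structures span several pages.
         */
        return &pvclock_vsyscall_base[vcpu];
}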

I guess one concern is that it ends up exposing the scheduling info
about all the VCPUs to all usermode.  I doubt that's a problem in
itself, but who knows if it could be used as part of a larger attack.

>>   5. On context switch, the kernel would increment the version of the
>>      *old* vcpu clock structure, so that when the usermode code
>>      re-checks the version at the end of its time calculation, it can
>>      tell that it has a stale vcpu and it needs to iterate with a new
>>      vcpu+clock structure
>>     
> I don't think you can re-use the hypervisor updated version field here,
> unless you add a protocol on how the two updaters avoid collision.
> struct vcpu_time_info has a padding field, which might be designated
> as guest-kernel-version.
>   

There's no padding.  It would be an extension of the pvclock ABI, which
KVM also implements, so we'd need to make sure they can cope too.

We only need to worry about Xen preempting a kernel update rather than
the other way around.  I think it ends up being very simple:

void ctxtsw_update_pvclock(struct pvclock_vcpu_time_info *pvclock)
{
        BUG_ON(preemptible());  /* Switching VCPUs would be a disaster */

        /*
         * We just need to update version; if Xen did it behind our back,
         * then that's OK with us.  We should never see an
         * update-in-progress because Xen will always completely update
         * the pvclock structure before rescheduling the VCPU, so version
         * should always be even.  We don't care if Xen updates the timing
         * parameters here because we're not in the middle of a clock
         * read.  Usermode might be in the middle of a read, but all it
         * needs to see is version changing to a new even number, even if
         * this add gets preempted by Xen in the middle.  There are no
         * cross-PCPU writes going on, so we don't need to worry about
         * bus-level atomicity.
         */
        pvclock->version += 2;
}

Looks like this would work for KVM too.
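
And the usermode side would be roughly the usual pvclock loop, plus
re-reading the vcpu number whenever the version check fails.  Sketch only:
read_current_vcpu() stands for whatever vgetcpu-style mechanism we end up
with, pvti_for_vcpu() is the array lookup from above, and the memory
ordering is glossed over (barrier() here is just a compiler barrier):

static inline u64 rdtsc(void)
{
        u32 lo, hi;
        asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
        return ((u64)hi << 32) | lo;
}

static u64 vread_pvclock(void)
{
        struct pvclock_vcpu_time_info *pvti;
        u64 delta, ns;
        u32 version;

        do {
                /*
                 * Re-read the vcpu each time around: if we were migrated,
                 * the kernel bumped the old structure's version (above),
                 * which sends us back here to pick up the new one.
                 */
                pvti = pvti_for_vcpu(read_current_vcpu());

                /* Odd version: Xen is mid-update, spin until it's done. */
                do {
                        version = pvti->version;
                        barrier();
                } while (version & 1);

                /* Usual pvclock arithmetic: scale the TSC delta. */
                delta = rdtsc() - pvti->tsc_timestamp;
                if (pvti->tsc_shift >= 0)
                        delta <<= pvti->tsc_shift;
                else
                        delta >>= -pvti->tsc_shift;
                ns = pvti->system_time +
                        (u64)(((unsigned __int128)delta *
                               pvti->tsc_to_system_mul) >> 32);

                barrier();
        } while (pvti->version != version);

        return ns;
}

The only new thing versus the existing kernel-side loop is that the vcpu
lookup lives inside the retry loop.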

    J

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
