On 09/03/09 01:23, Jan Beulich wrote:
>> 1. Add a hypercall to set the desired location of the clock
>> correction info rather than putting it in the shared-info area
>> (akin to vcpu placement). KVM already has this; they write the
>> address to a magic MSR.
>>
> But this is already subject to placement, as it's part of the vcpu_info
> structure. While of course you don't want to make the whole vcpu_info
> visible to guests, it would seem awkward to further segregate the
> shared_info pieces. I'd rather consider adding a second (optional) copy
> of it, since the updating of this is rather little overhead in Xen,
Hm, I guess that's possible. Though once you've added a new "other time
struct" pointer, it would be easier to just have Xen update the structure
at that pointer rather than updating two copies. I don't think a guest is
going to know or care about having two versions of the info (except that
it opens up the possibility of getting confused by looking at the wrong one).

I'd propose that there be just one, with the no-longer-valid pvclock
structure having its version set to 0xffffffff, since a guest should never
otherwise see a version with that value.
> but
> using this in the kernel time handling code would eliminate the
> potential for accessing all the vcpu_info fields using percpu_read().
>
I don't think that's a big concern. The kernel's pvclock handling is
common between Xen and KVM now, and it just gets a pointer to the
structure; it never accesses it as a percpu variable.
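For reference, the shared reader already works on a plain pointer; roughly
(from memory, so the exact code may differ a bit):

/* arch/x86/kernel/pvclock.c -- shared between Xen and KVM; takes an
 * explicit pointer, so the caller decides where the structure lives. */
cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src);

/* arch/x86/xen/time.c -- the Xen caller just hands over a pointer;
 * nothing here depends on the pvclock structure itself being percpu. */
cycle_t xen_clocksource_read(void)
{
        struct pvclock_vcpu_time_info *src;
        cycle_t ret;

        src = &get_cpu_var(xen_vcpu)->time;
        ret = pvclock_clocksource_read(src);
        put_cpu_var(xen_vcpu);
        return ret;
}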
>> 2. Pack all the clock structures into a single page, indexed by vcpu
>> number
>>
> That adds a scalability issue, albeit a relatively light one: You shouldn't
> anymore assume there's a limit on the number of vCPU-s.
>
Well, that's up to the kernel rather than Xen. If there are a lot of CPUs,
the structures can span multiple pages. There's no need to make them
physically contiguous, since the kernel never needs to treat them as an
array and we can map disjoint pages contiguously into userspace (it might
take a chunk of fixmap slots).
I guess one concern is that it ends up exposing scheduling info about all
the VCPUs to all of usermode. I doubt that's a problem in itself, but who
knows whether it could be used as part of a larger attack.
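Coming back to the layout itself, a rough sketch of the indexing I have in
mind (all names here are hypothetical, just assuming the structures are
packed back-to-back and the pages are mapped virtually contiguously for
whoever reads them):

/* 32 bytes each, so 128 structures fit in a 4k page. */
#define PVCLOCKS_PER_PAGE  (PAGE_SIZE / sizeof(struct pvclock_vcpu_time_info))

/*
 * Hypothetical base of the virtually contiguous mapping of the per-VCPU
 * pvclock pages (e.g. a run of fixmap slots on the kernel side, or a
 * vsyscall/vdso-style mapping for usermode).  The backing pages don't
 * need to be physically contiguous.
 */
extern struct pvclock_vcpu_time_info *pvclock_array;

static inline struct pvclock_vcpu_time_info *pvclock_for_vcpu(unsigned int vcpu)
{
        /* Plain array indexing works across page boundaries because the
         * mapping is virtually contiguous, even though the kernel itself
         * never needs to walk the whole thing as an array. */
        return &pvclock_array[vcpu];
}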
>> 5. On context switch, the kernel would increment the version of the
>> *old* vcpu clock structure, so that when the usermode code
>> re-checks the version at the end of its time calculation, it can
>> tell that it has a stale vcpu and it needs to iterate with a new
>> vcpu+clock structure
>>
> I don't think you can re-use the hypervisor updated version field here,
> unless you add a protocol on how the two updaters avoid collision.
> struct vcpu_time_info has a padding field, which might be designated
> as guest-kernel-version.
>
There's no padding. It would be an extension of the pvclock ABI, which
KVM also implements, so we'd need to make sure they can cope too.
We only need to worry about Xen preempting a kernel update rather than
the other way around. I think it ends up being very simple:
void ctxtsw_update_pvclock(struct pvclock_vcpu_time_info *pvclock)
{
        BUG_ON(preemptible());  /* Switching VCPUs would be a disaster */

        /*
         * We just need to update version; if Xen did it behind our back,
         * then that's OK with us.  We should never see an update-in-progress
         * because Xen will always completely update the pvclock structure
         * before rescheduling the VCPU, so version should always be even.
         * We don't care if Xen updates the timing parameters here because
         * we're not in the middle of a clock read.  Usermode might be in
         * the middle of a read, but all it needs to see is version changing
         * to a new even number, even if this add gets preempted by Xen in
         * the middle.  There are no cross-PCPU writes going on, so we don't
         * need to worry about bus-level atomicity.
         */
        pvclock->version += 2;
}
Looks like this would work for KVM too.
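And a sketch of the usermode read loop this is meant to pair with
(getvcpu(), pvclock_for_vcpu() and compute_time() are placeholders; the
interesting part is the retry condition):

/*
 * Hypothetical usermode clock read.  getvcpu() returns the VCPU the task
 * is currently running on, pvclock_for_vcpu() the corresponding mapped
 * structure, and compute_time() does the usual tsc_timestamp /
 * tsc_to_system_mul / tsc_shift arithmetic on top of system_time.
 */
static u64 vread_pvclock(void)
{
        const struct pvclock_vcpu_time_info *ti;
        u32 version;
        u64 now;

        do {
                ti = pvclock_for_vcpu(getvcpu());
                version = ti->version;
                rmb();                  /* read version before the payload */
                now = compute_time(ti);
                rmb();                  /* read the payload before re-checking */
                /*
                 * Retry if Xen was mid-update (odd version), if the
                 * parameters changed underneath us, or if the kernel bumped
                 * the version on context switch -- in which case the next
                 * iteration looks up the new VCPU's structure.
                 */
        } while ((version & 1) || version != ti->version);

        return now;
}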
J