>From: Dan Magenheimer [mailto:dan.magenheimer@xxxxxxxxxx]
>Sent: April 4, 2009 6:23
>I think I still have a real concern here. Let me see if
>I can explain.
>The goal for Xen timekeeping is to ensure that if a guest
>could somehow magically read any of its virtual clocks
>(tsc, pit, hpet, pmtimer, ??) on all its virtual processors
>simultaneously, the values read must always obey this
>"virtual clock law":
> max - min < delta
>We can argue how large that delta can reasonably be and it
>may vary depending on what the workload is, but
>it's certainly under a millisecond, ten microseconds
>might not be a bad starting point, and it is getting
>smaller as processors get faster.
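To make the "virtual clock law" concrete, here is a minimal Python sketch of the check. It assumes we could somehow collect one simultaneous reading of the same virtual clock from every vcpu, which is exactly the "magic" read described above. The function name, the nanosecond units and the sample values are hypothetical and chosen only for illustration; the 10 microsecond delta is the suggested starting point.

```python
def obeys_virtual_clock_law(readings_ns, delta_ns):
    """Return True if max - min < delta for one set of simultaneous readings.

    readings_ns: one reading per vcpu of the same virtual clock (hypothetical units: ns).
    delta_ns:    the largest spread the guest is assumed to tolerate.
    """
    if not readings_ns:
        raise ValueError("need at least one reading")
    return max(readings_ns) - min(readings_ns) < delta_ns


# Assumed tolerance: ten microseconds, the starting point suggested above.
DELTA_NS = 10_000

# Hypothetical readings taken "simultaneously" on three vcpus: spread is 4 us.
within = obeys_virtual_clock_law([1_000_000, 1_004_000, 1_002_500], DELTA_NS)

# Hypothetical readings with a 15 us spread, e.g. after a cross-node move.
violated = obeys_virtual_clock_law([1_000_000, 1_015_000], DELTA_NS)
```

Note that the comparison is strict, so a spread exactly equal to delta counts as a violation.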
>If xen can't guarantee that, then it must turn on "numa"
>mode, which appears to me to be extremely restrictive
>and no system vendor could honestly sell the true
>promise of virtualization on such a box. So we'd like
>to avoid that if possible.
I have also heard the concern that completely random load
balancing may work suboptimally on large-scale systems because
of fierce contention on shared data structures, so some
coarse-grained soft partitioning or limits are welcome, both to
control accurately which resources are assigned to a given VM
and to avoid cross-node traffic where possible. In that case
enabling 'numa' could serve the purpose to some extent: it
simply confines a given VM's activity to one node, while still
allowing administrative tools to move the VM across nodes at
their discretion. I once heard that typical VMs deployed today
are provisioned with 1 - 4 vcpus, which normally fits within one
node, but this may not be true in all cases.
Well, my point is a bit off topic here. Of course your
concern about cross-node TSC variance still makes sense
whether or not node affinity is enforced, as long as a VM
may be migrated across nodes. My point is just that turning
on 'numa' is not in itself an 'extremely restrictive' thing. :-)
>Note that the Linux approach doesn't work here
>because: 1) a guest's clocks might obey the "virtual clock
>law" at one moment on one set of physical processors
>and not at the next moment; 2) guest access to all
>clocks (except the tsc) is emulated so even if a guest
>decides the tsc is unreliable, that just doesn't help
>if the alternate clock it chooses (e.g. HPET) is silently
>emulated on top of xen system time using the physical tsc.
As Keir said, Xen system time itself is implemented to be
stable, so as long as HVM timer virtualization ultimately
goes through the emulation path, it should be stable too, at
the cost of some overhead on top of the current tsc
virtualization path.
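As a rough illustration of why an emulated clock inherits the stability of the time base underneath it, here is a toy Python model. It is not Xen's code: the class name, the start offset and the 14,318,180 Hz frequency are assumptions made for the example. The only point is that the guest-visible counter is a pure function of one shared system time, so any two vcpus that read it at the same system time see the same value, whatever their physical TSCs are doing.

```python
NS_PER_SEC = 1_000_000_000


class EmulatedCounter:
    """Toy guest-visible counter derived from a single shared system time.

    Hypothetical model only; the names and parameters are not taken from Xen.
    """

    def __init__(self, freq_hz, start_ns):
        self.freq_hz = freq_hz    # assumed tick rate of the emulated clock
        self.start_ns = start_ns  # system time at which the counter read zero

    def read(self, system_time_ns):
        # Integer arithmetic only, so every reader computes the identical value.
        elapsed_ns = system_time_ns - self.start_ns
        return elapsed_ns * self.freq_hz // NS_PER_SEC


# Assumed frequency for an HPET-like counter, and an arbitrary start time.
counter = EmulatedCounter(freq_hz=14_318_180, start_ns=5_000_000)

# Two vcpus trapping into the emulation at the same system time.
now_ns = 5_000_000 + 1_000_000  # 1 ms after the counter started
vcpu0_value = counter.read(now_ns)
vcpu1_value = counter.read(now_ns)
```

In this model the spread between vcpus is bounded by how well the shared system time itself is synchronized, which is the property being argued for above; the cost is the trap and the arithmetic on every read.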
Xen-devel mailing list