On 07/15/2010 10:14 AM, Dan Magenheimer wrote:
>>> Isn't the real problem that, in a PV guest, the cpuid instructions
>>> that are testing the TSC-related CPUID bits are obtaining the actual
>>> hardware value, rather than what Xen would like the guest to believe?
>>>
>> No, because there shouldn't be any "naked" rdtscs in the kernel.
>>
>>
>>> IOW, isn't the correct fix to use pvcpuid instead of cpuid when
>>> xen_pvdomain() is true?
>>>
>> Every use of cpuid in the kernel goes via the cpuid pvop, which ends up
>> doing the Xen cpuid rather than the native one. Usermode cpuid is
>> still the "real" one, unless they explicitly use the Xen version.
>>
> OK, then I'm confused. Either:
> - this is one of those recent Intel boxes where all the TSCs should
> be sync'ed but due to firmware issues they are not, in which case
> this is a Linux bug that has already been fixed upstream; or
> - this isn't Xen 4.0+ but should be fixed in 4.0; or
> - this is Xen 4.0+ and the default tsc_mode is being overridden
>
> Otherwise, why is TSC not synchronized and pvclock always getting
> an offset of 0?
No, this bug doesn't really have anything to do with tsc sync issues.
The situation is:
* The scheduler uses its own timebase, called sched_clock
* We have a pvop for sched_clock
* The Xen implementation for sched_clock counts unstolen ns, rather
  than wallclock ns, since this is (somewhat, in theory) useful
* However, the scheduler checks to see if the tsc is stable (because
  the default sched_clock is based on the tsc), and if so, assumes
  that sched_clock is synced across all cpus - but of course the
  amount of stolen time is different for each vcpu
Unfortunately, while the idea of counting unstolen time is useful to see
how much work got done in a timeslice, it is pretty useless for counting
how long something was asleep for (since you don't care about how much
time was "stolen" while you were asleep). And the scheduler uses the
same timebase for measuring both.
So the fix is to simply use plain Xen system time as the scheduler
clock, as that will be synced across cpus.
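
To make the mismatch concrete, here is a rough toy model in Python (not kernel code). Everything in it is an assumption made for illustration: the VCpu class, the two clock functions, and the numbers are hypothetical, and stolen time is treated as a fixed per-vcpu total rather than something that accumulates. It only shows why differencing per-vcpu unstolen clocks gives a meaningless sleep duration, while a shared system time does not.

```python
from dataclasses import dataclass


@dataclass
class VCpu:
    """Toy model of one vcpu (hypothetical; times in nanoseconds)."""
    stolen_ns: int = 0  # time the hypervisor spent running something else


def system_time(now_ns: int) -> int:
    """Stand-in for Xen system time: identical on every vcpu."""
    return now_ns


def unstolen_clock(vcpu: VCpu, now_ns: int) -> int:
    """Stand-in for an unstolen-ns clock: system time minus this vcpu's stolen time."""
    return now_ns - vcpu.stolen_ns


# Assumed scenario: a task falls asleep on vcpu0 at t=1ms and wakes on vcpu1 at t=5ms.
vcpu0 = VCpu(stolen_ns=200_000)
vcpu1 = VCpu(stolen_ns=3_000_000)
sleep_start_ns, sleep_end_ns = 1_000_000, 5_000_000

# Mixing two per-vcpu unstolen clocks folds the difference in stolen time
# into the result, so the "sleep time" no longer reflects elapsed time.
slept_unstolen = unstolen_clock(vcpu1, sleep_end_ns) - unstolen_clock(vcpu0, sleep_start_ns)

# With a clock that is the same on every vcpu, the difference is the elapsed time.
slept_system = system_time(sleep_end_ns) - system_time(sleep_start_ns)
```

In this model the task was asleep for 4ms of system time, but the unstolen clocks report 1.2ms, purely because the two vcpus had different amounts of time stolen.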
J