Hi Tim --
Thanks for your comments!
> At 17:24 +0100 on 08 Oct (1255022685), Dan Magenheimer wrote:
> > Tongue-in-cheek noted. ;-) But seriously, what I'm proposing
> > is that now that this is architected by the processor, poorly
> > designed systems (or extremely large systems) should be the rare
> > exception, not the rule.
>
> That seems like unwarranted optimism, but we'll just have to wait and
> see. I've seen enough bugs that boiled down to reputable system
> builders doing things that software engineers thought would
> surely never happen.
Well, app providers have been beating up on processor and
system vendors for years to "fix the d*mn timestamp problem".
They finally have, and have even made it architectural.
I can think of one large enterprise software provider
that would gladly redlist systems that regress in this
area.
So color me optimistic that the problem is solved, or
at least that system vendors will only sin for a very
good reason, and that their indiscretions will be public enough
that their need for a special boot-time Xen option will not
be a closely-guarded secret.
Now all I'm trying to do is ensure that Xen virtual machines
don't suffer their own "d*mn timestamp problem", especially
given that VMware doesn't have one.
> > A) unsafe (neither constant nor power-invariant)
> > B) semi-safe (constant = P-,T-state invariant, C-state may stop)
> > C) safe (constant+non-stop = P-,T-,and C-state invariant)
> > D) false-positive safe (CPUs safe, system-wide is not)
>
> OK; for the record I believe C should be assumed to be D.
What?!? And waste all that hard work by processor and
system vendors to finally fix the problem? ;-)
I admit that I have some reservations as well, so I would
like Xen to verify "safeness" at each boot, and
preferably periodically for the life of the system.
Verification turns out to be quite ugly though,
and probably even more so for those superNUMA
systems that might be most likely to fail the test.
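To give a rough idea of what the simplest form of such a check could
look like, here is a minimal sketch. It assumes some mechanism (not
shown, and the ugly part) has already collected one TSC reading per CPU
at nominally the same instant. The function name, the sample array and
the skew tolerance are all hypothetical; this is not existing Xen code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of Xen.
 * samples[i] is the TSC value read on CPU i at (nominally) the same
 * instant. Returns true if the spread between the smallest and largest
 * reading is no more than max_skew ticks.
 */
static bool tsc_samples_in_sync(const uint64_t *samples, size_t ncpus,
                                uint64_t max_skew)
{
    if (ncpus == 0)
        return true;            /* nothing to compare */

    uint64_t lo = samples[0];
    uint64_t hi = samples[0];

    for (size_t i = 1; i < ncpus; i++) {
        if (samples[i] < lo)
            lo = samples[i];
        if (samples[i] > hi)
            hi = samples[i];
    }

    return (hi - lo) <= max_skew;
}
```

Taking the samples close enough to "the same instant" is the hard
part, and it is presumably where large NUMA systems would make the
tolerance hardest to choose.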
> > Xen currently assumes A.
>
> That's what I meant by detection and correction.
IMHO, the road to software performance hell is paved with
least-common-denominator solutions. (And, yes, to
take the words right out of your mouth before you
say them, the road to software maintenance hell is paved
with never-used special cases.)
> > This is sufficient for Xen's needs,
> > and for the pvclock algorithm, but insufficient for my
> > plans to expose "TSC reliability" to usermode.
>
> Your plans for usermode<-->hypervisor direct TSC integration
> seem to me to be an unpleasant hack.
Yes, I admit it offends my aesthetics some. But I defend
it to myself by believing that this is just a first step
in a long road of closer collaboration between hypervisor
and apps. Really the whole point of paravirtualization
is to benefit from knowing that the underlying platform
is virtual. Why should apps be excluded from the party?
> I understand that you have good business
> reasons for wanting it (even if you're not allowed to tell us
> explicitly what they are) and we've seen the justifications enough
> times that we don't need to cover them again here, but it's still
> a hack.
I think I've been very explicit: Some very large apps, both
Oracle and non-Oracle, need a way to get a timestamp
at a high frequency in a way that is both correct and
very fast and works across a range of hardware/software
environments, INCLUDING running under Xen.
I AM exposed to some other companies' confidential
information, so any appearance that I am hiding something
is due to my clumsy attempts to dance around that
in a public forum.
> I'm unhappy with the idea of kicking around the Xen timekeeping code
> (and introducing the usual bug-tail) to support introducing a usermode
> TSC. If there is to be a new mode for this, it should default to the
> current (works for everyone except the engineering team of a
> not-to-be-named enterprise application) behaviour.
This isn't a new mode, it's a new (not-so-new for AMD)
hardware feature that Xen has yet to make proper use of.
And I'm not introducing a usermode TSC... Intel did that
years ago.
And if, by "new mode", you're referring to rdtsc emulation,
that's certainly not for Oracle's benefit.
> > I'm proposing that:
> > 1) for case C, Xen shall never overwrite TSC
> > 2) for case D, a new "tsc_broken" boot option must be specified
> > when Xen is booted on a broken machine
>
> Might as well call it "application_broken" and default it the other
> way. :) The system builders are entirely within their rights to have
> separate clocks for separate sockets.
If you agree with Jeremy's opinion that "any app that uses
rdtsc is fundamentally broken", your syntax makes sense.
As you know, I disagree, especially as it applies to future
hardware and software.
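For concreteness, the proposal could be sketched roughly as follows.
This is only an illustration under stated assumptions: the enum, the
function names and the boolean inputs are hypothetical, the two
CPU-reported properties are assumed to have been determined elsewhere,
and "tsc_broken" is the proposed (not yet existing) boot option.

```c
#include <stdbool.h>

/* Hypothetical classification mirroring cases A, B and C above. */
enum tsc_class {
    TSC_UNSAFE,     /* A: not constant */
    TSC_SEMI_SAFE,  /* B: constant, but may stop in C-states */
    TSC_SAFE        /* C: constant and non-stop */
};

/* Classify from the two properties the processor reports. */
static enum tsc_class classify_tsc(bool constant, bool nonstop)
{
    if (!constant)
        return TSC_UNSAFE;
    if (!nonstop)
        return TSC_SEMI_SAFE;
    return TSC_SAFE;
}

/*
 * Returns true if the hypervisor may overwrite the TSC.
 * Case C without the override: never write.
 * Case D is case C plus the admin-supplied "tsc_broken" option,
 * which falls back to the existing (assume-unsafe) behaviour.
 */
static bool may_write_tsc(enum tsc_class cls, bool tsc_broken_opt)
{
    if (cls == TSC_SAFE && !tsc_broken_opt)
        return false;
    return true;
}
```

Note that case D cannot be derived from the processor-reported
properties alone in this sketch; it is only reachable through the
boot option.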
Dan
P.S. I'll have infrequent access to email for the next week.