Re: [Xen-devel] [PATCH 0/2] Improve hpet accuracy

Dan Magenheimer wrote:

Kudos, Dave, for your excellent work!

Thanks, Dan.

Keir, I've completed enough testing to agree that
Dave's hpet policy is a huge improvement over the
existing hpet code and a major improvement over
the pit-based policies/timekeeping.  I strongly
recommend that, once Dave's soon-to-be-revised
patch is in, we turn on hpet by default for all
hvm guests. I'd also suggest that the default
timer_mode (at least when hpet=1) should be
Dave's guest_computes_missed_ticks policy.
(Dave, could you include this in your revised
patch?  Or if you want me to, let me know.)

Sure, I can do it.

A couple of remaining points...

I'm glad you're able to reproduce my results.
Are you still seeing the boot time hang up?
Is this the reason for vcpus=1?


No, I was just trying to be methodical in my testing,
covering various cases.  I haven't seen the boot-time
hang for a while.

ok. We still see it here so I'm working on a fix/workaround.

I'll put this on the bug list - unless no one
cares about apic=0.


It probably should be "on the bug list" but very low
priority compared with getting the patch cleaned up
(per Keir's requirements) in time for the 3.3 release.

ok.

Dan, thanks very much for the testing work. I know it's not
easy and you still came up with the results very quickly.

-Dave

Dan

-----Original Message-----
From: Dave Winchell [mailto:dwinchell@xxxxxxxxxxxxxxx]
Sent: Friday, June 13, 2008 6:08 AM
To: dan.magenheimer@xxxxxxxxxx; Keir Fraser; xen-devel
Cc: Ben Guthro; Dave Winchell
Subject: RE: [Xen-devel] [PATCH 0/2] Improve hpet accuracy

Hi Dan,

I'm glad you're able to reproduce my results.
Are you still seeing the boot time hang up?
Is this the reason for vcpus=1?

you can see that there are always 1000 LOC/sec.  But
with apic=1 there are also about 350 IO-APIC-edge-timer/sec
and with apic=0 there are 1000 XT-PIC-timer/sec.

I suspect that the latter of these (XT-PIC-timer) is
messing up your policy and the former (edge-timer) is not.


Thanks for this data. Your analysis is correct, I think.
I wrote the interrupt routing and callback code for the
IOAPIC edge-triggered interrupts. The PIC path does not
have the callbacks. With no callbacks, the end-of-interrupt
time stamp stays at zero, so it always looks to the routing
code in hpet.c like it's been longer than a period since the
last interrupt. Thus, you get an interrupt on each timeout,
or 1000 interrupts/sec. 350 is a typical rate when the
algorithm for missed ticks is doing its thing. I'll put this
on the bug list - unless no one cares about apic=0.

thanks,
Dave


-----Original Message-----
From: Dan Magenheimer [mailto:dan.magenheimer@xxxxxxxxxx]
Sent: Fri 6/13/2008 12:47 AM
To: Dave Winchell; Keir Fraser; xen-devel
Cc: Ben Guthro
Subject: RE: [Xen-devel] [PATCH 0/2] Improve hpet accuracy

Hi Dave --

Hmmm... in my earlier runs with rhel5u1-64, I had apic=0
(yes apic, not acpi).  Changing it to apic=1 gives excellent
results (< 0.01% even with overcommit).  Changing it back
to apic=0 has the same fairly bad results, 0.08% with no
overcommit and 0.16% (and climbing) with overcommit.
Note that this is all with vcpus=1.

How odd...

I vaguely recalled from some research a couple of months ago
that hpet is read MORE than once/tick on the boot processor.
I can't seem to find the table I compiled from that research,
but I did find this in an email I sent to you:

"You probably know this already but an n-way 2.6 Linux
kernel reads hpet (n+1)*1000 times/second.  Let's take
five 2-way guests as an example; that comes to 15000
hpet reads/second...."
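
The arithmetic in the quoted note can be checked with a small shell
sketch. The helper name hpet_reads_per_sec is made up for illustration,
and it assumes every guest has the same vcpu count and a 1000 Hz tick,
as in the example above.

```shell
# Hypothetical helper (not part of Xen or the patch): estimated HPET
# reads per second for a set of identical guests, using the
# (n+1)*1000 figure quoted above, where n is the vcpu count per guest.
hpet_reads_per_sec() {
    guests=$1
    vcpus=$2
    echo $(( guests * (vcpus + 1) * 1000 ))
}

# Five 2-way guests, as in the quoted example.
hpet_reads_per_sec 5 2
```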

I wondered what was different between apic=1 vs 0. Using:

# cat /proc/interrupts | grep -E 'LOC|timer'; sleep 10; \
    cat /proc/interrupts | grep -E 'LOC|timer'

you can see that there are always 1000 LOC/sec.  But
with apic=1 there are also about 350 IO-APIC-edge-timer/sec
and with apic=0 there are 1000 XT-PIC-timer/sec.

I suspect that the latter of these (XT-PIC-timer) is
messing up your policy and the former (edge-timer) is not.
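
For anyone repeating the measurement, the two snapshots can be turned
into per-second rates with a sketch like the following. The function
name irq_rates and the snapshot file names are assumptions for
illustration; it only sums the numeric per-CPU counters that follow
each label in /proc/interrupts.

```shell
# Hypothetical helper: per-second rates for the LOC and timer lines,
# computed from two saved copies of /proc/interrupts.
#   $1 = earlier snapshot, $2 = later snapshot, $3 = seconds between them
irq_rates() {
    awk -v secs="$3" '
        /LOC|timer/ {
            total = 0
            # Sum the per-CPU counters: the numeric fields after the label.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
            # First file: remember the count. Second file: print the rate.
            if (FNR == NR) { before[$1] = total; next }
            printf "%s %d\n", $1, (total - before[$1]) / secs
        }
    ' "$1" "$2"
}

# Example use (file names are arbitrary):
#   cat /proc/interrupts > irq.before; sleep 10; cat /proc/interrupts > irq.after
#   irq_rates irq.before irq.after 10
```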

Dan

-----Original Message-----
From: Dave Winchell [mailto:dwinchell@xxxxxxxxxxxxxxx]
Sent: Thursday, June 12, 2008 4:49 PM
To: dan.magenheimer@xxxxxxxxxx; Keir Fraser; xen-devel
Cc: Ben Guthro; Dave Winchell
Subject: RE: [Xen-devel] [PATCH 0/2] Improve hpet accuracy


Dan,

You shouldn't be getting higher than .05%.
I'd like to figure out what is wrong. I'm running the same guest
you are with heavy loads and the physical processors overcommitted
by 3:1. And I'm seeing .027% error on rh5u1-64 after an hour.

Can you type ^a^a^a at the console and then
type 'Z' a couple of times about 10 seconds apart and send
me the output? Do this when you have a domain
running that is keeping poor time.

You should take drift measurements over a period
of time that is at least 20 minutes, preferably longer.
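
As a rough illustration of how the percentages in this thread relate to
seconds of error, here is a minimal sketch. The helper name
drift_percent and the sign convention (positive means the guest clock
runs fast) are assumptions, not something taken from the patch.

```shell
# Hypothetical helper: clock drift as a percentage of elapsed time.
#   $1 = seconds elapsed on the guest clock
#   $2 = seconds elapsed on the external reference clock
# Positive output means the guest ran fast (assumed convention).
drift_percent() {
    awk -v g="$1" -v r="$2" 'BEGIN { printf "%.3f\n", (g - r) / r * 100 }'
}

# A guest that gains 0.972 s over one hour shows 0.027% drift.
drift_percent 3600.972 3600
```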

Also, can you send me a tarball of your sources from
the xen directory?


thanks,
Dave




-----Original Message-----
From: Dan Magenheimer [mailto:dan.magenheimer@xxxxxxxxxx]
Sent: Thu 6/12/2008 6:05 PM
To: Dave Winchell; Keir Fraser; xen-devel
Cc: Ben Guthro
Subject: Re: [Xen-devel] [PATCH 0/2] Improve hpet accuracy

(Going back on list.)

OK, so looking at the updated patch, hpet_avoid=1 is actually
working, just reporting wrong, correct?

With el5u1-64-hvm and hpet_avoid=1 and timer_mode=4, skew
is under -0.04% and falling.  With hpet_avoid=0, it looks
about the same.  However both cases seem to start creeping
up again when I put load on, then fall again when I remove
the load -- even with sched-credit capping cpu usage.  Odd!
This implies to me that the activity in the other domains
IS affecting skew on the domain-under-test. (Keir, any
comments on the hypothesis attached below?)

Another theoretical oddity... if you are always delivering
timer ticks "late", the guest should be receiving fewer than
the nominal 1000 ticks/sec.  So then why is guest time actually
going faster than an external source?

(In my mind, going faster is much worse than going slower
because if ntpd or a human moves time backwards to compensate
for a clock going faster, "make" and other programs can
get very confused.)

Dan

-----Original Message-----
From: Dan Magenheimer [mailto:dan.magenheimer@xxxxxxxxxx]
Sent: Thursday, June 12, 2008 3:13 PM
To: 'Dave Winchell'
Subject: RE: xen hpet patch

One more thought while waiting for compile and reboot:

Am I right that all of the policies are correcting for when
a domain "A" is out-of-context?  There's nothing in any other
domain "B" that can account for any timer loss/gain in domain
"A".  The only reason we are running other domains is to ensure
that domain "A" is sometimes out-of-context, and the more
it is out-of-context, the more likely we will observe
a problem, correct?

If this is true, it doesn't matter what workload is run
in the non-A domains... as long as it is loading the
CPU(s), thus ensuring that domain A is sometimes not
scheduled on any CPU.

And if all this is true, we may not need to run other
domains at all... running "xm sched-credit -d A -c 50"
should result in domain A being out-of-context at least
half the time.



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


