Hi Dan,
Thanks for all the investigation you've done!
-Dave
Dan Magenheimer wrote:
 
Hi Dave --
Thanks for that observation on ltp running on one vcpu!
With "clocksource=pit nohpet nopmtimer" our clock skew
problems seem to have been reduced to a reasonable
percentage.  However, our 32-bit clock skew seems to
show a measurable problem now.
 
For the 32 bit guest, which timesource did it pick?
 
 As a result,
I've been doing some digging into kernel sources and
have observed the following relative to RHEL4 (2.6.9-based)
kernels and RHEL5 (2.6.18-based) kernels and thought I
would document them for posterity.  Some of
our confusion arises from the fact that invalid command
line parameters are silently ignored.
RHEL4:
- clock= is a valid parameter for RHEL4-32
- clocksource= is not a valid parameter for RHEL4-xx
- nohpet is a valid parameter for RHEL4-64, not RHEL4-32
- nopmtimer is not a valid parameter for RHEL4-xx
- notsc is a valid parameter for RHEL4-32, not RHEL4-64
- SMP vs UP RHEL4-64 reports timekeeping in dmesg differently
For Xen RHEL4 HVM guests:
- I *think* clock=pit is sufficient for RHEL4-32 [1]
- I *think* nohpet is sufficient for RHEL4-64 [1]
RHEL5:
- there are two kinds of timekeeping, WALL and gtod
- clocksource= is a valid parameter for RHEL5-xx
- clock= is a valid but deprecated parameter for RHEL5-xx
- clock= and clocksource= are essentially equivalent
- nohpet is a valid parameter for RHEL5-64, not RHEL5-32
- nopmtimer is a valid parameter for RHEL5-64, not RHEL5-32
- notsc is a valid parameter for RHEL5-64, not RHEL5-32 [1]
- clock=pit changes the gtod source but not the WALL source[2]
- nohpet nopmtimer changes the WALL source to PIT
- /sys/devices/system/clocksource/clocksource0/...
 available_clocksource lists the possible clock sources
 current_clocksource lists the chosen clock source
 ..but neither of these works in a RHEL5 guest!
For Xen RHEL5 HVM guests:
- I *think* clock=pit is sufficient for RHEL5-32
 
But still poor accuracy, right?
 
- I *think* clock=pit nohpet nopmtimer is sufficient for RHEL5-64
Other info:
- As of 2.6.24.2, clock= is still valid (though still deprecated)
So, some open questions:
[1] Is notsc necessary for proper ticks for RHEL4-32/RHEL5-64?
   (I *think* not as it has never come up in any email.)
 
I have not investigated this yet.
 
[2] In RHEL5, I *think* it is the WALL source that we care about?
 
I'll have to check on this too.
 
And finally, since invalid command line parameters are ignored,
I think specifying:
        clock=pit nohpet nopmtimer
will force the guest clock sources into the optimal state for
all RHEL4 and RHEL5 both 32-bit and 64-bit guests (though see the
question above on tsc).  And we should keep an eye on
kernel/time/clocksource.c to ensure the __setup("clock="...)
line doesn't go away before RHEL6.
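(For reference, the registration in question looks roughly like the
following -- paraphrased from memory, so treat the exact handler names
and messages as approximate rather than verbatim kernel source:

    /* kernel/time/clocksource.c, approximately */
    static char override_name[32];

    static int __init boot_override_clocksource(char *str)
    {
        /* remember the requested clocksource name for later selection */
        if (str)
            strlcpy(override_name, str, sizeof(override_name));
        return 1;
    }
    __setup("clocksource=", boot_override_clocksource);

    static int __init boot_override_clock(char *str)
    {
        /* deprecated alias: warn, then fall through to clocksource= */
        printk("Warning! clock= boot option is deprecated. "
               "Use clocksource=xyz\n");
        return boot_override_clocksource(str);
    }
    __setup("clock=", boot_override_clock);

If that second __setup() ever disappears, clock=pit becomes one of the
silently ignored parameters mentioned above.)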
Note that if hpet=0 and pmtimer=0 were the default hvm platform
parameters for all xen hvm guests (on all versions of xen),
specifying kernel command line parameters would be unnecessary,
but c'est la vie.
Oh, and to be complete, timer_mode=0 for 32-bit RHEL guests
and timer_mode=2 for 64-bit RHEL guests.
Thanks,
Dan
-----Original Message-----
From: Dave Winchell [mailto:dwinchell@xxxxxxxxxxxxxxx]
Sent: Tuesday, February 19, 2008 8:27 AM
To: dan.magenheimer@xxxxxxxxxx
Cc: Dave Winchell; Keir Fraser; xen-devel@xxxxxxxxxxxxxxxxxxx; Deepak
Patel
Subject: Re: [Xen-devel] [PATCH] Add a timer mode that 
disables pending 
missed ticks
Hi Dan,
ltp runs by default loading up only one vcpu.
The -x option can be used to run multiple instances, though
in this mode you will get test failures.
I ran 8 instances on each guest for 16 hours, 25 min
and the time error was -11 sec (-.019%) on each guest.
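In case it isn't obvious, the percentages quoted here are just drift
divided by elapsed time; a trivial sketch of that arithmetic:

    #include <stdio.h>

    int main(void)
    {
        double drift_sec = -11.0;                    /* observed drift */
        double elapsed_sec = 16 * 3600 + 25 * 60;    /* 16 hours, 25 min */
        /* prints roughly -0.019% */
        printf("error = %.3f%%\n", drift_sec / elapsed_sec * 100.0);
        return 0;
    }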
Regards,
Dave
Dave Winchell wrote:
 
Hi Dan,
Mine was oversubscribed.
8 physical cpu, 2 guests, each with 8 vcpu.
I ran one instance of ltp on each guest, continuously. I hope ltp
loaded up all the vcpus. I seem to recall that it did, but I
could be wrong. If it didn't, that would be a major difference
between our tests. I'll verify this afternoon and run
multiple instances, if necessary.
Thanks,
Dave
Dan Magenheimer wrote:
 
Hi Dave --
No new results yet but one other question:
The problems we've seen with our testing have been with a heavily
oversubscribed system: 8 physical CPU, six 2-vcpu 2GB guests
running LTP simultaneously.
Was your LTP testing oversubscribed or just a single guest?
Thanks,
Dan
 
-----Original Message-----
From: Dave Winchell [mailto:dwinchell@xxxxxxxxxxxxxxx]
Sent: Thursday, February 14, 2008 10:56 AM
To: dan.magenheimer@xxxxxxxxxx
Cc: Keir Fraser; xen-devel@xxxxxxxxxxxxxxxxxxx; Deepak Patel; Dave
Winchell
Subject: Re: [Xen-devel] [PATCH] Add a timer mode that disables pending
missed ticks
Dan,
Here are some boot snippets for rh4u564 on xen 3.2.
#1:
Feb 14 10:44:59 vs076 kernel: Bootdata ok (command line is ro
root=LABEL=/ console=ttyS0 clocksource=pit nohpet)
Feb 14 10:44:59 vs076 kernel: Linux version 2.6.9-55.ELsmp
(brewbuilder@xxxxxxxxxxxxxxxxxxxxxxxxxxx) (gcc version 3.4.6 20060404
(Red Hat 3.4.6-3)) #1 SMP Fri Apr 20 16:36:54 EDT 2007
...
Feb 14 10:44:59 vs076 kernel: Kernel command line: ro root=LABEL=/
console=ttyS0 clocksource=pit nohpet
Feb 14 10:44:59 vs076 kernel: Initializing CPU#0
Feb 14 10:44:59 vs076 kernel: PID hash table entries: 2048 (order: 11,
65536 bytes)
Feb 14 10:44:59 vs076 kernel: time.c: Using 3.579545 MHz PM timer.
Feb 14 10:44:59 vs076 kernel: time.c: Detected 1992.050 MHz processor.
...
Feb 14 10:45:00 vs076 kernel: checking TSC synchronization across 8
CPUs: passed.
Feb 14 10:45:00 vs076 kernel: Brought up 8 CPUs
Feb 14 10:45:00 vs076 kernel: Disabling vsyscall due to use of PM timer
Feb 14 10:45:00 vs076 kernel: time.c: Using PM based timekeeping.
#2:
Feb 14 10:47:57 vs076 kernel: Bootdata ok (command line is ro
root=LABEL=/ console=ttyS0 clocksource=pit nohpet nopmtimer)
Feb 14 10:47:57 vs076 kernel: Linux version 2.6.9-55.ELsmp
(brewbuilder@xxxxxxxxxxxxxxxxxxxxxxxxxxx) (gcc version 3.4.6 20060404
(Red Hat 3.4.6-3)) #1 SMP Fri Apr 20 16:36:54 EDT 2007
...
Feb 14 10:47:58 vs076 kernel: Kernel command line: ro root=LABEL=/
console=ttyS0 clocksource=pit nohpet nopmtimer
Feb 14 10:47:58 vs076 kernel: Initializing CPU#0
Feb 14 10:47:58 vs076 kernel: PID hash table entries: 2048 (order: 11,
65536 bytes)
Feb 14 10:47:58 vs076 kernel: time.c: Using 1.193182 MHz PIT timer.
Feb 14 10:47:58 vs076 kernel: time.c: Detected 1991.959 MHz processor.
...
Feb 14 10:47:59 vs076 kernel: checking TSC synchronization across 8
CPUs: passed.
Feb 14 10:47:59 vs076 kernel: Brought up 8 CPUs
Feb 14 10:47:59 vs076 kernel: time.c: Using PIT/TSC based timekeeping.
As you can see, I only get the pit if I specify nopmtimer.
Dan Magenheimer wrote:
 
Hi Dave --
Thanks for continuing to run tests!
Hmmm... I thought I had noticed that even though Linux will acknowledge
the existence of the pmtimer, it still prints:
time.c: Using PIT/TSC based timekeeping.
I will check again, but assuming the clocksource for our tests is
indeed pit, the huge difference in the results (yours vs ours) is
baffling. I wonder if the difference may be the underlying hardware.
Maybe we will try to ensure we can duplicate the results on a different
box.
So your testing was with stock 3.2.0 xen bits (what cset?) without
any of your [quote from below] "clock related tweaks that I haven't
submitted, because I'm still characterizing them"?
 
None of the tweaks I mentioned are in this test.
It was stock with some patches. However, none of the patches are time
related, to my knowledge, and I checked vpt.c to make sure that it is
the same as what's in unstable.
The only difference is in pt_intr_post, where I set the timer mode.
I don't have timer mode tied into our config process yet, which
is different from the official xen method.
(In pt_intr_post)
    else
    {
+       if ( v->arch.paging.mode->guest_levels == 4 )
+           v->domain->arch.hvm_domain.params[HVM_PARAM_TIMER_MODE] =
+               HVMPTM_no_missed_ticks_pending;
+       else
+           v->domain->arch.hvm_domain.params[HVM_PARAM_TIMER_MODE] =
+               HVMPTM_delay_for_missed_ticks;
        if ( mode_is(v->domain, one_missed_tick_pending) ||
             mode_is(v->domain, no_missed_ticks_pending) )
        {
Could you also send details on the rhel4u4-64 kernel you
are testing with, just to ensure we are not comparing apples
and oranges?  (Perhaps there's some way we can even share the
identical disk image and vm.cfg file?)
And if our problem is indeed the pmtimer, I will need to submit
another patch to Keir to add an hvm pmtimer platform variable.
(Hmmm... I don't think he's even accepted the hpet variable patch
yet.  I'll have to check.)
Thanks,
Dan
 
-----Original Message-----
From: Dave Winchell [mailto:dwinchell@xxxxxxxxxxxxxxx]
Sent: Thursday, February 14, 2008 9:00 AM
To: dan.magenheimer@xxxxxxxxxx
Cc: Dave Winchell; Keir Fraser; xen-devel@xxxxxxxxxxxxxxxxxxx; Deepak Patel
Subject: Re: [Xen-devel] [PATCH] Add a timer mode that
disables pending
missed ticks
Hi Dan,
I ran the ltp tests with 3.2 and found the errors
for a 16 hour run to be:
rh4u564 -9.9 sec (-.017%)
rh4u464 -7.3 sec (-.013%)
There were no cliffs and the drift was linear.
I think the problem you had may be due to the use of the
pm timer. If you still have the boot log, it would tell you.
When I first tried a guest on 3.2 with "clocksource=pit nohpet"
I noticed that it picked the pm timer. Adding "nopmtimer", the
guest will pick the pit.
The reason I didn't have the problem with our 3.1 base is that
I had disabled the hpet and the pmtimer by not advertising them
in the acpi tables. I did this so long ago, I forgot that I had to
disable pmtimer as well as hpet.
So, can you re-run your test with "clocksource=pit nohpet nopmtimer"?
 
You should see this in the boot messages:
time.c: Using PIT/TSC based timekeeping.
Thanks,
Dave
Dave Winchell wrote:
 
Hi Dan,
Over the weekend the drift was +18 seconds for each guest (no ntp).
The duration was 3900 minutes, so the error for each was +.0077%.
 
Looking back through the data, it appears to drift linearly at
this rate. I've attached a plot for rh4u5-64.
This accuracy is better than what I've seen before (.03-.05%).
This may be due to the different load (ltp vs usex) or to one of the
changes I've made recently. I'll do some experimentation to see if
there is a fix I should propose.
This still doesn't address the radical drift you saw.
The next step for me is to run 3.2 and see if I can reproduce it.
 
Regards,
Dave
Dave Winchell wrote:
 
Hi Dan,
Sorry it took me so long, but I finally ran an ltp test today.
It's on rh4u4-64. I'm using the defaults for ltp and using a script
called runltp. I had a usex load on rh4u5-64. No ntpd.
virtual processors / physical processors = 2.
The clocks drifted -1 sec (4u5) and +1.5 sec (4u4) in 300 minutes
for -.005% and .008%.
I'm running a 3.1 based hypervisor with some clock related tweaks that
I haven't submitted, because I'm still characterizing them.
I'm stopping the usex load on 4u5-64 now and replacing it with ltp
and will leave the two guests running ltp over the weekend.
Regards,
Dave
Dave Winchell wrote:
 
Hi Dan, Deepak:
Thanks for the data. Those drifts are severe - no wonder ntp couldn't
keep them in sync. I'll try to reproduce that behaviour here, with
my code base.
If I can't reproduce it, I'll try 3.2.
If you can isolate what ltp is doing during the cliffs, that would
be very helpful.
thanks,
Dave
Dan Magenheimer wrote:
 
OK, Deepak repeated the test without ntpd and using ntpdate -b before
the test.
The attached graph shows his results: el5u1-64 (best=~0.07%),
el4u5-64 (middle=~0.2%), and el4u5-32 (worst=~0.3%).
We will continue to look at LTP to try to isolate.
Thanks,
Dan
P.S. elXuY is essentially RHEL XuY with some patches.
 
-----Original Message-----
From: Dave Winchell [mailto:dwinchell@xxxxxxxxxxxxxxx]
Sent: Wednesday, January 30, 2008 2:45 PM
To: Deepak Patel
Cc: dan.magenheimer@xxxxxxxxxx; Keir Fraser;
xen-devel@xxxxxxxxxxxxxxxxxxx; akira.ijuin@xxxxxxxxxx; Dave Winchell
Subject: Re: [Xen-devel] [PATCH] Add a timer mode that disables
pending missed ticks
Dan, Deepak,
It may be that the underlying clock error is too great for ntp
to handle. It would be useful if you did not run ntpd
and, instead, did ntpdate -b <timeserver> at the start of the test
for each guest. Then capture the data as you have been doing.
If the drift is greater than .05%, then we need to address that.
 
Another option is, when running ntpd, to enable loop statistics in
/etc/ntp.conf
by adding this to the file:
statistics loopstats
statsdir /var/lib/ntp/
Then you will see loop data in that directory.
Correlating the data in the loopstats files with the
peaks in skew would be interesting. You will see entries of the form
54495 76787.701 -0.045153303 -132.569229 0.020806776 239.735511 10
Where the second to last column is the Allan Deviation. When that
gets over 1000, ntpd is working pretty hard. However, I have not
seen ntpd completely lose it like you have.
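If it helps, something like the following (just a sketch I haven't run;
the column names other than the Allan deviation are my guesses at the
loopstats layout described above) would flag the samples where ntpd is
working hard:

    /* Scan a loopstats file and print lines whose Allan deviation
     * (second-to-last column) exceeds 1000. */
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        double day, sec, offset, freq, jitter, allan;
        int tc;
        FILE *f = fopen(argc > 1 ? argv[1] : "/var/lib/ntp/loopstats", "r");

        if (!f)
            return 1;
        while (fscanf(f, "%lf %lf %lf %lf %lf %lf %d",
                      &day, &sec, &offset, &freq, &jitter, &allan, &tc) == 7)
        {
            if (allan > 1000.0)
                printf("day %.0f sec %.3f: Allan deviation %.1f\n",
                       day, sec, allan);
        }
        fclose(f);
        return 0;
    }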
I'm on vacation until Monday, and won't be reading
email.
Thanks for all your work on this!
-Dave
Deepak Patel wrote:
Is the graph for RHEL5u1-64? (I've never tested this one.)
 
 
I do not know which graph was attached with this. But I saw this
behavior in EL4u5 - 32, EL4U5 - 64 and EL5U1 - 64 hvm guests when I
was running ltp tests continuously.
 
What was the behaviour of the other guests running?
 
All pvm guests are fine. But the behavior of most of the hvm guests was
as described.
 
If they had spikes, were they at the same wall time?
 
No. They are not at the same wall time.
Were the other guests running ltp as well?

Yes, all 6 guests (4 hvm and 2 pvm) are running ltp continuously.
 
How are you measuring skew?
 
I was collecting the output of "ntpdate -q <timeserver>" every
300 seconds (5 minutes) and have created the graph based on that.
 
Are you running ntpd?
 
Yes. ntp was running on all the guests.
I am investigating what causes these spikes and will let everyone
know what my findings are.
Thanks,
Deepak
 
Anything that you can discover that would be in sync with
the spikes would be very helpful!
The code that I test with is our product code, which is based
on 3.1. So it is possible that something in 3.2 other than vpt.c
is the cause. I can test with 3.2, if necessary.
thanks,
Dave
Dan Magenheimer wrote:
 
Hi Dave (Keir, see suggestion below) --
Thanks!
Turning off vhpet certainly helps a lot (though see below).
I wonder if timekeeping with vhpet is so bad that it should be
turned off by default (in 3.1, 3.2, and unstable) until it is
fixed?  (I have a patch that defaults it off, can post it if
there is agreement on the above point.)  The whole point of an
HPET is to provide more precise timekeeping and if vhpet is
worse than vpit, it can only confuse users.  Comments?
In your testing, are you just measuring % skew over a long
period of time?
We are graphing the skew continuously and
seeing periodic behavior that is unsettling, even with pit.
See attached.  Though your algorithm recovers, the "cliffs"
could still cause real user problems.  I wonder if there is
anything that can be done to make the "recovery" more
responsive?
We are looking into what part(s) of LTP is causing the cliffs.
 
Thanks,
Dan
 
-----Original Message-----
From: Dave Winchell [mailto:dwinchell@xxxxxxxxxxxxxxx]
Sent: Monday, January 28, 2008 8:21 AM
To: dan.magenheimer@xxxxxxxxxx
Cc: Keir Fraser; xen-devel@xxxxxxxxxxxxxxxxxxx;
deepak.patel@xxxxxxxxxx;
akira.ijuin@xxxxxxxxxx; Dave Winchell
Subject: Re: [Xen-devel] [PATCH] Add a timer mode that disables
pending missed ticks
Dan,
I guess I'm a bit out of date calling for clock= usage.
Looking at linux 2.6.20.4 sources, I think you should specify
"clocksource=pit nohpet" on the linux guest bootline.
You can leave the xen and dom0 bootlines as they are.
The xen and guest clocksources do not need to be the same.
In my tests, xen is using the hpet for its timekeeping and
that appears to be the default.
When you boot the guests you should see
time.c: Using PIT/TSC based timekeeping.
on the rh4u5-64 guest, and something similar on the others.
 
(xm dmesg shows 8x Xeon 3.2GHz stepping 04, Platform timer
14.318MHz HPET.)
This appears to be the xen state, which is fine.
I was wrongly assuming that this was the guest state.
You might want to look in your guest logs and see what they were
picking for a clock source.
Regards,
Dave
Dan Magenheimer wrote:
 
Thanks, I hadn't realized that!  No wonder we didn't see the same
improvement you saw!

Try specifying clock=pit on the linux boot line...

I'm confused... do you mean "clocksource=pit" on the Xen command line or
"nohpet" / "clock=pit" / "clocksource=pit" on the guest (or dom0?) command
line?  Or both places?  Since the tests take awhile, it would be nice
to get this right the first time.  Do the Xen and guest clocksources need
to be the same?
Thanks,
Dan
-----Original Message-----
*From:* Dave Winchell [mailto:dwinchell@xxxxxxxxxxxxxxx]
 
*Sent:* Sunday, January 27, 2008 2:22 PM
*To:* dan.magenheimer@xxxxxxxxxx; Keir Fraser
*Cc:* xen-devel@xxxxxxxxxxxxxxxxxxx; deepak.patel@xxxxxxxxxx;
akira.ijuin@xxxxxxxxxx; Dave Winchell
*Subject:* RE: [Xen-devel] [PATCH] Add a timer mode that disables
pending missed ticks
Hi Dan,
Hpet timer does have a fairly large error, as I was trying this
one recently.
I don't remember what I got for error, but 1% sounds about right.
 
The problem is that hpet is not built on top of vpt.c, the module
Keir and I did all the recent work in, for its periodic timer needs.
Try specifying clock=pit on the linux boot line. If it still picks the
hpet, which it might, let me know and I'll tell you how to get around this.
Regards,
Dave
 
 
------------------------------------------------------------------------
 
*From:* Dan Magenheimer [mailto:dan.magenheimer@xxxxxxxxxx]
 
*Sent:* Fri 1/25/2008 6:50 PM
*To:* Dave Winchell; Keir Fraser
*Cc:* xen-devel@xxxxxxxxxxxxxxxxxxx; deepak.patel@xxxxxxxxxx;
akira.ijuin@xxxxxxxxxx
*Subject:* RE: [Xen-devel] [PATCH] Add a timer mode that disables
pending missed ticks
Sorry for the very late follow-up on this but we finally were able
to get our testing set up again on stable 3.1 bits and have
seen some very bad results on 3.1.3-rc1, on the order of 1%.
 
Test environment was a 4-socket dual core machine with 24GB of
memory running six two-vcpu 2GB domains, four hvm plus two pv.
All six guests were running LTP simultaneously. The four hvm
guests were: RHEL5u1-64, RHEL4u5-32, RHEL5-64, and RHEL4u5-64.
Timer_mode was set to 2 for 64-bit guests and 0 for 32-bit guests.
 
All four hvm guests experienced skew around -1%, even the 32-bit
guest.  Less intensive testing didn't exhibit much skew at all.
A representative graph is attached.
Dave, I wonder if some portion of your patches didn't end up in
the xen trees?
(xm dmesg shows 8x Xeon 3.2GHz stepping 04, Platform timer
14.318MHz HPET.)
Thanks,
Dan
P.S. Many thanks to Deepak and Akira for running tests.
 
-----Original Message-----
From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
[mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx]On Behalf Of Dave Winchell
Sent: Wednesday, January 09, 2008 9:53 AM
To: Keir Fraser
Cc: dan.magenheimer@xxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx; Dave Winchell
Subject: Re: [Xen-devel] [PATCH] Add a timer mode that
disables pending missed ticks
Hi Keir,
The latest change, c/s 16690, looks fine.
I agree that the code in c/s 16690 is equivalent to
the code I submitted. Also, your version is more
concise.
The error tests confirm the equivalence. With overnight cpu loads,
the checked in version was accurate to +.048% for sles
and +.038% for red hat. My version was +.046% and +.032% in a
2 hour test.
I don't think the difference is significant.
i/o loads produced errors of +.01%.
Thanks for all your efforts on this issue.
Regards,
Dave
Keir Fraser wrote:
 
Applied as c/s 16690, although the checked-in patch is
smaller. I think the only important fix is to pt_intr_post() and the
only bit of the patch I totally omitted was the change to
pt_process_missed_ticks(). I don't think that change can be important,
but let's see what happens to the error percentage...
-- Keir
On 4/1/08 23:24, "Dave Winchell" <dwinchell@xxxxxxxxxxxxxxx> wrote:
 
 
Hi Dan and Keir,
Attached is a patch that fixes some issues with the SYNC policy
(no_missed_ticks_pending).
I have not tried to make the change the minimal one, but, rather, just
ported into the new code what I know to work well.
The error for no_missed_ticks_pending goes from
over 3% to .03% with this change according to my testing.
 
Regards,
Dave
Dan Magenheimer wrote:
 
Hi Dave --
Did you get your correction ported?  If so,
 
 
it would be
 
nice to see this get
 
into 3.1.3.
Note that I just did some very limited
 
testing with
 
timer_mode=2(=SYNC=no
 
missed ticks pending)
on tip of xen-3.1-testing (64-bit Linux hv
 
 
 
guest) and the
 
worst error I've
 
seen so far
is 0.012%.  But I haven't tried any exotic
 
 
 
loads, just LTP.
 
Thanks,
Dan
 
-----Original Message-----
From: Dave Winchell [mailto:dwinchell@xxxxxxxxxxxxxxx]
 
Sent: Wednesday, December 19, 2007 12:33 PM
To: dan.magenheimer@xxxxxxxxxx
Cc: Keir Fraser; Shan, Haitao; xen-devel@xxxxxxxxxxxxxxxxxxx; Dong,
Eddie; Jiang, Yunhong; Dave Winchell
Subject: Re: [Xen-devel] [PATCH] Add a timer mode that
disables pending missed ticks
Dan,
I did some testing with the constant tsc offset SYNC method
(now called no_missed_ticks_pending)
and found the error to be very high, much larger than 1%, as
I recall.
I have not had a chance to submit a correction. I will try to
do it later this week or the first week in January. My version of
the constant tsc offset SYNC method produces .02% error, so I just
need to port that into the current code.
The error you got for both of those kernels is what I would expect
for the default mode, delay_for_missed_ticks.
I'll let Keir answer on how to set the time mode.
 
Regards,
Dave
Dan Magenheimer wrote:
 
Anyone make measurements on the final patch?
I just ran a 64-bit RHEL5.1 pvm kernel and saw a loss of
about 0.2% with no load.  This was xen-unstable tip today
with no options specified.  32-bit was about 0.01%.
I think I missed something... how do I run the various
accounting choices and which ones are known to be appropriate
for which kernels?
 
Thanks,
Dan
 
-----Original Message-----
From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
[mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx]On Behalf Of Keir Fraser
 
Sent: Thursday, December 06, 2007 4:57 AM
To: Dave Winchell
Cc: Shan, Haitao; xen-devel@xxxxxxxxxxxxxxxxxxx; Dong, Eddie; Jiang,
Yunhong
Subject: Re: [Xen-devel] [PATCH] Add a timer mode that
disables pending missed ticks
Please take a look at xen-unstable changeset 16545.
-- Keir
On 26/11/07 20:57, "Dave Winchell" <dwinchell@xxxxxxxxxxxxxxx> wrote:
 
 
Keir,
The accuracy data I've collected for i/o loads for the
various time protocols follows. In addition, the data
for cpu loads is shown.
The loads labeled cpu and i/o-8 are on an 8 processor AMD box.
 
Two guests, red hat and sles 64 bit, 8 vcpu each.
The cpu load is usex -e36 on each guest.
(usex is available at http://people.redhat.com/anderson/usex.)
 
i/o load is 8 instances of dd if=/dev/hda6 of=/dev/null.
The loads labeled i/o-32 are 32 instances of dd.
Also, these are run on a 4 cpu AMD box.
In addition, there is an idle rh-32bit guest.
All three guests are 8vcpu.
The loads labeled i/o-4/32 are the same as i/o-32
except that the redhat-64 guest has 4 instances of dd.
 
Date  Duration        Protocol  sles error, rhat error     load
11/07 23 hrs 40 min   ASYNC     -4.96 sec, +4.42 sec, -.006%, +.005%  cpu
11/09 3 hrs 19 min    ASYNC     -.13 sec, +1.44 sec, -.001%, +.012%   cpu
11/08 2 hrs 21 min    SYNC      -.80 sec, -.34 sec, -.009%, -.004%    cpu
11/08 1 hr 25 min     SYNC      -.24 sec, -.26 sec, -.005%, -.005%    cpu
11/12 65 hrs 40 min   SYNC      -18 sec, -8 sec, -.008%, -.003%       cpu
11/08 28 min          MIXED     -.75 sec, -.67 sec, -.045%, -.040%    cpu
11/08 15 hrs 39 min   MIXED     -19. sec, -17.4 sec, -.034%, -.031%   cpu
11/14 17 hrs 17 min   ASYNC     -6.1 sec, -55.7 sec, -.01%, -.09%     i/o-8
11/15 2 hrs 44 min    ASYNC     -1.47 sec, -14.0 sec, -.015%, -.14%   i/o-8
11/13 15 hrs 38 min   SYNC      -9.7 sec, -12.3 sec, -.017%, -.022%   i/o-8
11/14 48 min          SYNC      -.46 sec, -.48 sec, -.017%, -.018%    i/o-8
11/14 4 hrs 2 min     MIXED     -2.9 sec, -4.15 sec, -.020%, -.029%   i/o-8
11/20 16 hrs 2 min    MIXED     -13.4 sec, -18.1 sec, -.023%, -.031%  i/o-8
11/21 28 min          MIXED     -2.01 sec, -.67 sec, -.12%, -.04%     i/o-32
11/21 2 hrs 25 min    SYNC      -.96 sec, -.43 sec, -.011%, -.005%    i/o-32
11/21 40 min          ASYNC     -2.43 sec, -2.77 sec, -.10%, -.11%    i/o-32
11/26 113 hrs 46 min  MIXED     -297. sec, 13. sec, -.07%, .003%      i/o-4/32
11/26 4 hrs 50 min    SYNC      -3.21 sec, 1.44 sec, -.017%, .01%     i/o-4/32
 
Overhead measurements:
Progress in terms of number of passes through a fixed system workload
on an 8 vcpu red hat with an 8 vcpu sles idle.
The workload was usex -b48.
ASYNC 167 min 145 passes .868 passes/min
SYNC 167 min 144 passes .862 passes/min
SYNC 1065 min 919 passes .863 passes/min
MIXED 221 min 196 passes .887 passes/min
Conclusions:
The only protocol which meets the .05% accuracy requirement for ntp
tracking under the loads above is the SYNC protocol. The worst case
accuracies for SYNC, MIXED, and ASYNC
are .022%, .12%, and .14%, respectively.
We could reduce the cost of the SYNC method by only scheduling the extra
wakeups if a certain number of ticks are missed.
Regards,
Dave
Keir Fraser wrote:
 
On 9/11/07 19:22, "Dave Winchell" <dwinchell@xxxxxxxxxxxxxxx> wrote:
 
 
Since I had a high error (~.03%) for the ASYNC method a
couple of days ago, I ran another ASYNC test. I think there may have
been something wrong with the code I used a couple of days ago for
ASYNC. It may have been missing the immediate delivery of interrupt
after context switch in.
 
My results indicate that either SYNC or ASYNC give acceptable accuracy,
each running consistently around or under .01%. MIXED has a fairly high
error of greater than .03%. Probably too close to the .05% ntp
threshold for comfort.
 
I don't have an overnight run with SYNC. I plan to leave SYNC running
over the weekend. If you'd rather, I can leave MIXED running instead.
It may be too early to pick the protocol and I can run
more overnight tests next week.
 
I'm a bit worried about any unwanted side effects of the SYNC+run_timer
approach -- e.g., whether timer wakeups will cause higher system-wide CPU
contention. I find it easier to think through the implications of ASYNC.
I'm surprised that MIXED loses time, and is less accurate than
ASYNC. Perhaps it delivers more timer interrupts than the other
approaches, and each interrupt event causes a small accumulated error?
Overall I would consider MIXED and ASYNC as favourites and if the latter is
actually more accurate then I can simply revert the changeset that
implemented MIXED.
Perhaps rather than running more of the same workloads you could try idle
VCPUs and I/O bound VCPUs (e.g., repeated large disc reads to /dev/null)?
We don't have any data on workloads that aren't CPU bound, so that's
really an obvious place to put any further effort imo.
-- Keir
 
 
 
 
 
diff -r cfdbdca5b831 xen/arch/x86/hvm/vpt.c
--- a/xen/arch/x86/hvm/vpt.c Thu Dec 06 15:36:07 2007 +0000
+++ b/xen/arch/x86/hvm/vpt.c Fri Jan 04 17:58:16 2008 -0500
@@ -58,7 +58,7 @@ static void pt_process_missed_ticks(stru
     missed_ticks = missed_ticks / (s_time_t) pt->period + 1;
     if ( mode_is(pt->vcpu->domain, no_missed_ticks_pending) )
-        pt->do_not_freeze = !pt->pending_intr_nr;
+        pt->do_not_freeze = 1;
     else
         pt->pending_intr_nr += missed_ticks;
     pt->scheduled += missed_ticks * pt->period;
@@ -127,7 +127,12 @@ static void pt_timer_fn(void *data)
     pt_lock(pt);
-    pt->pending_intr_nr++;
+    if ( mode_is(pt->vcpu->domain, no_missed_ticks_pending) ) {
+        pt->pending_intr_nr = 1;
+        pt->do_not_freeze = 0;
+    }
+    else
+        pt->pending_intr_nr++;
     if ( !pt->one_shot )
     {
@@ -221,8 +226,6 @@ void pt_intr_post(struct vcpu *v, struct
         return;
     }
-    pt->do_not_freeze = 0;
-
     if ( pt->one_shot )
     {
         pt->enabled = 0;
@@ -235,6 +238,10 @@ void pt_intr_post(struct vcpu *v, struct
             pt->last_plt_gtime = hvm_get_guest_time(v);
             pt->pending_intr_nr = 0; /* 'collapse' all missed ticks */
         }
+        else if ( mode_is(v->domain, no_missed_ticks_pending) ) {
+            pt->pending_intr_nr--;
+            pt->last_plt_gtime = hvm_get_guest_time(v);
+        }
         else
         {
             pt->last_plt_gtime += pt->period_cycles;
 
 
 
 
 
 
------------------------------------------------------------------------
 
 
 
 
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel