xen-devel
Re: [Xen-devel] [PATCH 0/2] Improve hpet accuracy
 
Dan Magenheimer wrote:
 
In EL5u1-32 however it looks like the fractions are accounted
for.  Indeed the EL5u1-32 "lost tick handling" code resembles
the Linux/ia64 code which is what I've always assumed was
the "missed tick" model.  In this case, I think no policy
is necessary and the measured skew should be identical to
any physical hpet skew.  I'll have to test this hypothesis though.
    
 
I've tested this hypothesis and it seems to hold true.
This means the existing (unpatched) hpet code works fine
on EL5-32bit (vcpus=1) when hpet is the clocksource,
even when the machine is overcommitted.  A second hypothesis
still needs to be tested that Dave's patch will not make this worse.
  
 
Interesting, thanks for pointing this out and confirming.
 
(Note that per previous discussion, my EL5u1-32bit guest
running on an Intel dual-core physical box chose tsc as
the best clocksource and I had to override it with
clock=hpet in the kernel command line.)
  
 
Is there one setting for all Linux guests that makes them
choose hpet? Is it "clock=hpet clocksource=hpet"?
I know you wrote at length about this before.
  
Yes, that makes sense and concurs with what I remember from
the EL4u5-32 code.  If this is true, one would expect the
default "no missed tick" policy to see time moving faster
than an external source -- the first missed tick delivered
after a long sleep would "catch up" and then the remainder
would each add another tick.
    
 
Indeed with the existing (unpatched) hpet code, time is
running faster on EL4u5-32 (vcpus=1, when overcommitted).
So Dave's patch is definitely needed here.
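The "time runs fast" effect can be illustrated with a toy model. This is only a sketch under stated assumptions: `guest_ticks` is a hypothetical stand-in for a guest handler that credits at least one tick per interrupt plus any further whole periods that elapsed, and it is not taken from the EL4u5 source.

```python
def guest_ticks(interrupt_times, period):
    """Toy guest handler (an assumption, not kernel code): each interrupt
    credits at least one tick, plus any further whole periods that elapsed
    since the previous interrupt."""
    ticks = 0
    last = 0.0
    for t in interrupt_times:
        elapsed = t - last
        ticks += max(1, int(elapsed // period))
        last = t
    return ticks


PERIOD = 1.0

# The guest was descheduled for 5 periods.
# "No missed tick" delivery: all 5 interrupts arrive back to back at t=5.
no_missed = guest_ticks([5.0] * 5, PERIOD)

# "Missed tick" delivery: a single late interrupt at t=5.
missed = guest_ticks([5.0], PERIOD)

print(no_missed, missed)  # 9 5
```

In this model the first late interrupt catches up all 5 periods and the 4 replayed interrupts each add one more tick, so the guest counts 9 ticks for 5 real periods.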
  
 
It's good to get verification of this.
thanks,
Dave
 
Will try 64-bit next.
Dan
  
-----Original Message-----
From: Dan Magenheimer [mailto:dan.magenheimer@xxxxxxxxxx]
Sent: Monday, June 09, 2008 9:21 PM
To: 'Dave Winchell'; 'Keir Fraser'
Cc: 'xen-devel'; 'Ben Guthro'
Subject: RE: [Xen-devel] [PATCH 0/2] Improve hpet accuracy
    
I'll tell you what I recall about this. Tomorrow I'll check the
guest code to verify. I think that Linux declares a full tick,
even if the interrupt is early. That's the problem.
      
 
Yes, that makes sense and concurs with what I remember from
the EL4u5-32 code.  If this is true, one would expect the
default "no missed tick" policy to see time moving faster
than an external source -- the first missed tick delivered
after a long sleep would "catch up" and then the remainder
would each add another tick.
    
On the other hand, if the interrupt is late it in effect declares
a tick plus fraction. If it just declared the fraction in the
first place, we could deliver the interrupts whenever we wanted.
      
 
My read of the EL4u5-32 code is that the fraction is discarded
and a new tick period commences at "now", so the fractions
eventually accumulate as lost time.
In EL5u1-32 however it looks like the fractions are accounted
for.  Indeed the EL5u1-32 "lost tick handling" code resembles
the Linux/ia64 code which is what I've always assumed was
the "missed tick" model.  In this case, I think no policy
is necessary and the measured skew should be identical to
any physical hpet skew.  I'll have to test this hypothesis though.
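The difference between the two behaviours described above can be sketched with two toy accounting functions. Both are assumptions made for illustration (`discard_fraction` and `carry_fraction` are hypothetical names, not the EL4u5 or EL5u1 code): one restarts the tick period at "now", the other advances its reference point only by whole periods so the remainder stays pending.

```python
def discard_fraction(interrupt_times, period):
    """Count whole periods since the last interrupt, then restart the
    period at 'now', dropping any fractional remainder."""
    ticks, mark = 0, 0.0
    for t in interrupt_times:
        ticks += max(1, int((t - mark) // period))
        mark = t  # remainder is lost here
    return ticks


def carry_fraction(interrupt_times, period):
    """Count whole periods since the reference point and advance the
    reference by exactly that many periods, keeping the remainder."""
    ticks, mark = 0, 0.0
    for t in interrupt_times:
        n = int((t - mark) // period)
        ticks += n
        mark += n * period  # remainder carries into the next interrupt
    return ticks


PERIOD = 1.0
# Interrupts arrive late, every 1.5 periods, for 300 periods in total.
late = [1.5 * k for k in range(1, 201)]

print(discard_fraction(late, PERIOD), carry_fraction(late, PERIOD))  # 200 300
```

With interrupts spaced 1.5 periods apart, the discarding model loses half a period per interrupt, while the carrying model ends up with the same tick count as real time.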
-----Original Message-----
 From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx 
[mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx]On Behalf Of 
Dave Winchell
Sent: Monday, June 09, 2008 5:35 PM
To: dan.magenheimer@xxxxxxxxxx; Keir Fraser
Cc: Dave Winchell; xen-devel; Ben Guthro
Subject: RE: [Xen-devel] [PATCH 0/2] Improve hpet accuracy
    
The Linux policy is more subtle, but is required to go
from .1% to .03%.
        
 
Thanks for the good documentation which I hadn't thoroughly
read until now.
I now understand that the essence of your
hpet missed ticks policy is to ensure that ticks are never
delivered too close together.  But I'm trying to understand
WHY your patch works, in other words, what problem it is
countering.
      
 
I'll tell you what I recall about this. Tomorrow I'll check the
guest code to verify. I think that Linux declares a full tick,
even if the interrupt is early. That's the problem.
On the other hand, if the interrupt is late it in effect declares
a tick plus fraction. If it just declared the fraction in the
first place, we could deliver the interrupts whenever we wanted.
It's really not that different from the missed ticks policy in vpt.c,
except that the period in vpt.c is based on start of interrupt
and I have improved that with end of interrupt, as described
in the patch note.
I don't recall what prompted me to try end-of-interrupt,
but I saw a significant improvement. I may have been running
a monotonicity test at the same time to explain the lock
contention mentioned in the write-up.
    
I care about this for more reasons than just
because it is interesting: (1) I'd like to feel confident that
it is fixing a bug rather than just a symptom of a bug;
and (2) I wonder how universally it is applicable.
      
 
It's worked well on my small set of guests. You and our
QA are going to tell us about the wider set. It doesn't
matter if guest A handles interrupts closely spaced or not,
just whether it handles them far apart. So it should be pretty
universal with guests that really handle missed ticks.
I think it's interesting that some 32bit Linux guests handle
missed ticks for hpet.
    
I see from code examination in mark_offset_hpet() in
RHEL4u5/arch/i386/kernel/timers/timer_hpet.c, that
the correction for lost ticks is just plain wrong in
a virtual environment. (Suppose for example that a virtual
tick was delivered every 1.999*hpet_tick... I think
the clock would be off by 50%!)  Is this the bug that
is being "countered" by your policy?
      
 
Perhaps. I haven't looked at that code.
I'll check it tomorrow.
    
However, the lost tick handling in RHEL5u1/kernel/timer.c
(which I think is used also for hpet) is much better
so I am eager to find out if your policy works there
too.
If the hpet missed tick policy works for both, though,
I should be happy, though I wonder about upstream kernels
(e.g. the trend toward tickless).
      
 
I wasn't aware of this trend. If it's robust, however, it should
handle late interrupts ...
    
That said, I'd rather
see this get into Xen 3.3 and worry about upstream kernels
later :-)
      
 
Regards,
Dave
-----Original Message-----
From: Dan Magenheimer [mailto:dan.magenheimer@xxxxxxxxxx]
Sent: Mon 6/9/2008 6:02 PM
To: Dave Winchell; Keir Fraser
Cc: Ben Guthro; xen-devel
Subject: RE: [Xen-devel] [PATCH 0/2] Improve hpet accuracy
    
The Linux policy is more subtle, but is required to go
from .1% to .03%.
      
 
Thanks for the good documentation which I hadn't thoroughly
read until now.  I now understand that the essence of your
hpet missed ticks policy is to ensure that ticks are never
delivered too close together.  But I'm trying to understand
WHY your patch works, in other words, what problem it is
countering.  I care about this for more reasons than just
because it is interesting: (1) I'd like to feel confident that
it is fixing a bug rather than just a symptom of a bug;
and (2) I wonder how universally it is applicable.
I see from code examination in mark_offset_hpet() in
RHEL4u5/arch/i386/kernel/timers/timer_hpet.c, that
the correction for lost ticks is just plain wrong in
a virtual environment. (Suppose for example that a virtual
tick was delivered every 1.999*hpet_tick... I think
the clock would be off by 50%!)  Is this the bug that
is being "countered" by your policy?
However, the lost tick handling in RHEL5u1/kernel/timer.c
(which I think is used also for hpet) is much better
so I am eager to find out if your policy works there
too.
If the hpet missed tick policy works for both, though,
I should be happy, though I wonder about upstream kernels
(e.g. the trend toward tickless).  That said, I'd rather
see this get into Xen 3.3 and worry about upstream kernels
later :-)
-----Original Message-----
From: Dave Winchell [mailto:dwinchell@xxxxxxxxxxxxxxx]
Sent: Sunday, June 08, 2008 2:32 PM
To: dan.magenheimer@xxxxxxxxxx; Keir Fraser
Cc: Ben Guthro; xen-devel; Dave Winchell
Subject: RE: [Xen-devel] [PATCH 0/2] Improve hpet accuracy
Hi Dan,
    
While I am fully supportive of offering hardware hpet as an option
for hvm guests (let's call it hwhpet=1 for shorthand), I am very
surprised by your preliminary results; the most obvious conclusion
is that Xen system time is losing time at the rate of 1000 PPM
though it's possible there's a bug somewhere else in the "time
stack".  Your Windows result is jaw-dropping and inexplicable,
though I have to admit ignorance of how Windows manages time.
      
 
I think xen system time is fine. You have to add the interrupt
delivery policies described in the write-up for the patch to get
accurate timekeeping in the guest.
The windows policy is obvious and results in a large improvement
in accuracy. The Linux policy is more subtle, but is required to go
from .1% to .03%.
    
I think with my recent patch and hpet=1 (essentially the same as
your emulated hpet), hvm guest time should track Xen system time.
I wonder if domain0 (which if I understand correctly is directly
using Xen system time) is also seeing an error of .1%?  Also
I wonder for the skew you are seeing (in both hvm guests and
domain0) is time moving too fast or too slow?
      
 
I don't recall the direction. I can look it up in my notes at work
tomorrow.
    
Although hwhpet=1 is a fine alternative in many cases, it may
be unavailable on some systems and may cause significant performance
issues on others.  So I think we will still need to track down
the poor accuracy when hwhpet=0.
      
 
Our patch is accurate to < .03% using the physical hpet mode or
the simulated mode.
    
And if for some reason
Xen system time can't be made accurate enough (< 0.05%), then
I think we should consider building Xen system time itself on
top of hardware hpet instead of TSC... at least when Xen discovers
a capable hpet.
      
 
In our experience, Xen system time is accurate enough now.
    
One more thought... do you know the accuracy of the TSC crystals
on your test systems?  I posted a patch a while ago that was
intended to test that, though I guess it was only testing skew
of different TSCs on the same system, not TSCs against an
external time source.
      
 
I do not know the tsc accuracy.
    
Or maybe there's a computation error somewhere in the hvm hpet
scaling code?  Hmmm...
      
 
Regards,
Dave
-----Original Message-----
From: Dan Magenheimer [mailto:dan.magenheimer@xxxxxxxxxx]
Sent: Fri 6/6/2008 4:29 PM
To: Dave Winchell; Keir Fraser
Cc: Ben Guthro; xen-devel
Subject: RE: [Xen-devel] [PATCH 0/2] Improve hpet accuracy
Dave --
Thanks much for posting the preliminary results!
While I am fully supportive of offering hardware hpet as an option
for hvm guests (let's call it hwhpet=1 for shorthand), I am very
surprised by your preliminary results; the most obvious conclusion
is that Xen system time is losing time at the rate of 1000 PPM
though it's possible there's a bug somewhere else in the "time
stack".  Your Windows result is jaw-dropping and inexplicable,
though I have to admit ignorance of how Windows manages time.
I think with my recent patch and hpet=1 (essentially the same as
your emulated hpet), hvm guest time should track Xen system time.
I wonder if domain0 (which if I understand correctly is directly
using Xen system time) is also seeing an error of .1%?  Also
I wonder for the skew you are seeing (in both hvm guests and
domain0) is time moving too fast or too slow?
Although hwhpet=1 is a fine alternative in many cases, it may
be unavailable on some systems and may cause significant performance
issues on others.  So I think we will still need to track down
the poor accuracy when hwhpet=0.  And if for some reason
Xen system time can't be made accurate enough (< 0.05%), then
I think we should consider building Xen system time itself on
top of hardware hpet instead of TSC... at least when Xen discovers
a capable hpet.
One more thought... do you know the accuracy of the TSC crystals
on your test systems?  I posted a patch a while ago that was
intended to test that, though I guess it was only testing skew
of different TSCs on the same system, not TSCs against an
external time source.
Or maybe there's a computation error somewhere in the hvm hpet
scaling code?  Hmmm...
Thanks,
Dan
    
-----Original Message-----
From: Dave Winchell [mailto:dwinchell@xxxxxxxxxxxxxxx]
Sent: Friday, June 06, 2008 1:33 PM
To: dan.magenheimer@xxxxxxxxxx; Keir Fraser
Cc: Ben Guthro; xen-devel; Dave Winchell
Subject: Re: [Xen-devel] [PATCH 0/2] Improve hpet accuracy
Dan, Keir:
Preliminary test results indicate an error of .1% for Linux 64 bit
guests configured for hpet with xen-unstable as is. As we have discussed
many times, the ntp requirement is .05%.
Tests on the patch we just submitted for hpet have indicated errors of
.0012% on this platform under similar test conditions and .03% on
other platforms.
Windows vista64 has an error of 11% using hpet with the
xen-unstable bits.
In an overnight test with our hpet patch, the Windows vista
error was .008%.
The tests are with two or three guests on a physical node, all under
load, and with the ratio of vcpus to phys cpus > 1.
I will continue to run tests over the next few days.
thanks,
Dave
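To put the percentages quoted in this message on a common scale, here is a small conversion sketch. The helper names are hypothetical; the arithmetic is a plain unit conversion (1% = 10,000 ppm, one day = 86,400 seconds).

```python
SECONDS_PER_DAY = 24 * 60 * 60


def drift_ppm(percent):
    """Convert a drift given in percent to parts per million."""
    return percent * 1e4


def drift_seconds_per_day(percent):
    """Seconds gained or lost per day at the given drift in percent."""
    return percent / 100.0 * SECONDS_PER_DAY


for label, pct in [
    ("ntp limit", 0.05),
    ("Linux 64, xen-unstable as is", 0.1),
    ("Linux 64, with patch (this platform)", 0.0012),
    ("Vista 64, xen-unstable as is", 11.0),
    ("Vista 64, with patch", 0.008),
]:
    print(f"{label}: {drift_ppm(pct):.0f} ppm, "
          f"{drift_seconds_per_day(pct):.2f} s/day")
```

On this scale the .05% ntp limit is 500 ppm, or about 43 seconds per day, and the .1% figure is 1000 ppm.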
Dan Magenheimer wrote:
      
Hi Dave and Ben --
When running tests on xen-unstable (without your patch), please ensure
that hpet=1 is set in the hvm config and also I think that when hpet
is the clocksource on RHEL4-32, the clock IS resilient to missed ticks
so timer_mode should be 2 (vs when pit is the clocksource on RHEL4-32,
all clock ticks must be delivered and so timer_mode should be 0).
Per http://lists.xensource.com/archives/html/xen-devel/2008-06/msg00098.html
it's my intent to clean this up, but I won't get to it until next week.
Thanks,
Dan
   -----Original Message-----
   *From:* xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
   [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] *On Behalf Of* Dave Winchell
   *Sent:* Friday, June 06, 2008 4:46 AM
   *To:* Keir Fraser; Ben Guthro; xen-devel
   *Cc:* dan.magenheimer@xxxxxxxxxx; Dave Winchell
   *Subject:* RE: [Xen-devel] [PATCH 0/2] Improve hpet accuracy
   Keir,
   I think the changes are required. We'll run some tests today so
   that we have some data to talk about.
   -Dave
   -----Original Message-----
   From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx on behalf of Keir Fraser
   Sent: Fri 6/6/2008 4:58 AM
   To: Ben Guthro; xen-devel
   Cc: dan.magenheimer@xxxxxxxxxx
   Subject: Re: [Xen-devel] [PATCH 0/2] Improve hpet accuracy
   Are these patches needed now the timers are built on Xen system
   time rather than host TSC? Dan has reported much better time-keeping
   with his patch checked in, and it's for sure a lot less invasive than
   this patchset.
    -- Keir
   On 5/6/08 15:59, "Ben Guthro" <bguthro@xxxxxxxxxxxxxxx> wrote:
   >
   > 1. Introduction
   >
   > This patch improves the hpet based guest clock in terms of drift and
   > monotonicity. Prior to this work the drift with hpet was greater than 2%,
   > far above the .05% limit for ntp to synchronize. With this code, the drift
   > ranges from .001% to .0033% depending on guest and physical platform.
   >
   > Using hpet allows guest operating systems to provide monotonic time to
   > their applications. Time sources other than hpet are not monotonic because
   > of their reliance on tsc, which is not synchronized across physical
   > processors.
   >
   > Windows 2k864 and many Linux guests are supported with two policies, one
   > for guests that handle missed clock interrupts and the other for guests
   > that require the correct number of interrupts.
   >
   > Guests may use hpet for the timing source even if the physical platform
   > has no visible hpet. Migration is supported between physical machines
   > which differ in physical hpet visibility.
   >
   > Most of the changes are in hpet.c. Two general facilities are added to
   > track interrupt progress. The ideas here and the facilities would be
   > useful in vpt.c, for other time sources, though no attempt is made here
   > to improve vpt.c.
   >
   > The following sections discuss hpet dependencies, interrupt delivery
   > policies, live migration, test results, and relation to recent work with
   > monotonic time.
   >
   >
   > 2. Virtual Hpet dependencies
   >
   > The virtual hpet depends on the ability to read the physical or simulated
   > (see discussion below) hpet.  For timekeeping, the virtual hpet also
   > depends on two new interrupt notification facilities to implement its
   > policies for interrupt delivery.
   >
   > 2.1. Two modes of low-level hpet main counter reads.
   >
   > In this implementation, the virtual hpet reads with
   > read_64_main_counter(), exported by time.c, either the real physical hpet
   > main counter register directly or a "simulated" hpet main counter.
   >
   > The simulated mode uses a monotonic version of get_s_time() (NOW()), where
   > the last time value is returned whenever the current time value is less
   > than the last time value. In simulated mode, since it is layered on
   > s_time, the underlying hardware can be hpet or some other device. The
   > frequency of the main counter in simulated mode is the same as the
   > standard physical hpet frequency, allowing live migration between nodes
   > that are configured differently.
   >
   > If the physical platform does not have an hpet device, or if xen is
   > configured not to use the device, then the simulated method is used. If
   > there is a physical hpet device, and xen has initialized it, then either
   > simulated or physical mode can be used. This is governed by a boot time
   > option, hpet-avoid. Setting this option to 1 gives the simulated mode and
   > 0 the physical mode. The default is physical mode.
   >
   > A disadvantage of the physical mode is that it may take longer to read the
   > device than in simulated mode. On some platforms the cost is about the
   > same (less than 250 nsec) for physical and simulated modes, while on
   > others physical cost is much higher than simulated. A disadvantage of the
   > simulated mode is that it can return the same value for the counter in
   > consecutive calls.
   >
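The clamping behaviour described for the simulated mode can be sketched as follows. This is a minimal illustration only: `MonotonicCounter` and its `read_raw` argument are hypothetical names, and the sketch ignores the locking and frequency scaling that a real implementation in time.c would need.

```python
class MonotonicCounter:
    """Wrap a raw time source so that reads never go backwards: if the raw
    value is lower than the last value returned, return the last value."""

    def __init__(self, read_raw):
        self._read_raw = read_raw  # callable returning the raw counter value
        self._last = None

    def read(self):
        now = self._read_raw()
        if self._last is not None and now < self._last:
            return self._last      # clamp: repeat the previous value
        self._last = now
        return now


# Raw samples that briefly step backwards (103 and 104 after 105).
samples = iter([100, 105, 103, 104, 110])
counter = MonotonicCounter(lambda: next(samples))
readings = [counter.read() for _ in range(5)]
print(readings)  # [100, 105, 105, 105, 110]
```

The repeated 105 values show the stated disadvantage: consecutive reads can return the same counter value.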
   > 2.2. Interrupt notification facilities.
   >
   > Two interrupt notification facilities are introduced, one is
   > hvm_isa_irq_assert_cb() and the other hvm_register_intr_en_notif().
   >
   > The vhpet uses hvm_isa_irq_assert_cb to deliver interrupts to the vioapic.
   > hvm_isa_irq_assert_cb allows a callback to be passed along to
   > vioapic_deliver() and this callback is called with a mask of the vcpus
   > which will get the interrupt. This callback is made before any vcpus
   > receive an interrupt.
   >
   > Vhpet uses hvm_register_intr_en_notif() to register a handler for a
   > particular vector that will be called when that vector is injected in
   > [vmx,svm]_intr_assist() and also when the guest finishes handling the
   > interrupt. Here finished is defined as the point when the guest re-enables
   > interrupts or lowers the tpr value. EOI is not used as the end of
   > interrupt as this is sometimes returned before the interrupt handler has
   > done its work. A flag is passed to the handler indicating whether this is
   > the injection point (post = 1) or the interrupt finished (post = 0) point.
   > The need for the finished point callback is discussed in the missed ticks
   > policy section.
   >
   > To prevent a possible early trigger of the finished callback,
   > intr_en_notif logic has a two stage arm, the first at injection
   > (hvm_intr_en_notif_arm()) and the second when interrupts are seen to be
   > disabled (hvm_intr_en_notif_disarm()). Once fully armed, re-enabling
   > interrupts will cause hvm_intr_en_notif_disarm() to make the end of
   > interrupt callback. hvm_intr_en_notif_arm() and hvm_intr_en_notif_disarm()
   > are called by [vmx,svm]_intr_assist().
   >
   > 3. Interrupt delivery policies
   >
   > The existing hpet interrupt delivery is preserved. This includes vcpu
   > round robin delivery used by Linux and broadcast delivery used by Windows.
   >
   > There are two policies for interrupt delivery, one for Windows 2k8-64 and
   > the other for Linux. The Linux policy takes advantage of the (guest) Linux
   > missed tick and offset calculations and does not attempt to deliver the
   > right number of interrupts. The Windows policy delivers the correct number
   > of interrupts, even if sometimes much closer to each other than the
   > period. The policies are similar to those in vpt.c, though there are some
   > important differences.
   >
   > Policies are selected with an HVMOP_set_param hypercall with index
   > HVM_PARAM_TIMER_MODE. Two new values are added,
   > HVM_HPET_guest_computes_missed_ticks and
   > HVM_HPET_guest_does_not_compute_missed_ticks.  The reason that two new
   > ones are added is that in some guests (32bit Linux) a no-missed policy is
   > needed for clock sources other than hpet and a missed ticks policy for
   > hpet. It was felt that there would be less confusion by simply introducing
   > the two hpet policies.
   >
   > 3.1. The missed ticks policy
   >
   > The Linux clock interrupt handler for hpet calculates missed ticks and
   > offset using the hpet main counter. The algorithm works well when the time
   > since the last interrupt is greater than or equal to a period and poorly
   > otherwise.
   >
   > The missed ticks policy ensures that no two clock interrupts are delivered
   > to the guest at a time interval less than a period. A time stamp (hpet
   > main counter value) is recorded (by a callback registered with
   > hvm_register_intr_en_notif) when Linux finishes handling the clock
   > interrupt. Then, ensuing interrupts are delivered to the vioapic only if
   > the current main counter value is a period greater than when the last
   > interrupt was handled.
   >
   > Tests showed a significant improvement in clock drift with end of
   > interrupt time stamps versus beginning of interrupt[1]. It is believed
   > that the reason for the improvement is that the clock interrupt handler
   > goes for a spinlock and can be therefore delayed in its processing.
   > Furthermore, the main counter is read by the guest under the lock. The net
   > effect is that if we time stamp injection, we can get the difference in
   > time between successive interrupt handler lock acquisitions to be less
   > than the period.
   >
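The gating rule of the missed ticks policy can be sketched as below. This is an illustration under assumptions, not the hpet.c code: `MissedTicksGate` and its method names are hypothetical, and the `in_flight` flag is an added assumption that no new interrupt is posted while the previous one is still being handled.

```python
class MissedTicksGate:
    """Deliver a clock interrupt only if at least one period has passed
    since the guest finished handling the previous one."""

    def __init__(self, period):
        self.period = period
        self.last_done = None    # main counter value at end of last interrupt
        self.in_flight = False   # assumption: one interrupt outstanding at most

    def try_deliver(self, now):
        """Return True if an interrupt may be posted at counter value 'now'."""
        if self.in_flight:
            return False
        if self.last_done is not None and now - self.last_done < self.period:
            return False
        self.in_flight = True
        return True

    def interrupt_finished(self, now):
        """Called when the guest is seen to finish the handler (post = 0)."""
        self.last_done = now
        self.in_flight = False


gate = MissedTicksGate(period=100)
first = gate.try_deliver(0)       # True: nothing delivered yet
gate.interrupt_finished(30)       # handler was slow, finished at 30
too_soon = gate.try_deliver(100)  # False: only 70 counts since the end stamp
on_time = gate.try_deliver(130)   # True: a full period after the end stamp
print(first, too_soon, on_time)   # True False True
```

Stamping at the end of the handler rather than at injection is what makes the second attempt, at counter value 100, be held back in this example.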
   > 3.2. The no-missed ticks policy
   >
   > Windows 2k864 keeps very poor time with the missed ticks policy. So the
   > no-missed ticks policy was developed. In the no-missed ticks policy we
   > deliver the correct number of interrupts, even if they are spaced less
   > than a period apart (when catching up).
   >
   > Windows 2k864 uses a broadcast mode in the interrupt routing such that
   > all vcpus get the clock interrupt. The best Windows drift performance was
   > achieved when the policy code ensured that all the previous interrupts (on
   > the various vcpus) had been injected before injecting the next interrupt
   > to the vioapic.
   >
   > The policy code works as follows. It uses the hvm_isa_irq_assert_cb() to
   > record the vcpus to be interrupted in h->hpet.pending_mask. Then, in the
   > callback registered with hvm_register_intr_en_notif() at post=1 time it
   > clears the current vcpu in the pending_mask. When the pending_mask is
   > clear it decrements hpet.intr_pending_nr and if intr_pending_nr is still
   > non-zero posts another interrupt to the ioapic with
   > hvm_isa_irq_assert_cb(). Intr_pending_nr is incremented in
   > hpet_route_decision_not_missed_ticks().
   >
   > The missed ticks policy intr_en_notif callback also uses the pending_mask
   > method. So even though Linux does not broadcast its interrupts, the code
   > could handle it if it did. In this case the end of interrupt time stamp is
   > made when the pending_mask is clear.
   >
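The bookkeeping described for the no-missed ticks policy can be modelled roughly as follows. The field names pending_mask and intr_pending_nr come from the description above; the class, its methods, and the `posted` counter are hypothetical and stand in for the real callbacks, so this is a sketch of the idea, not the patch itself.

```python
class NoMissedTicksPolicy:
    """Post the next broadcast interrupt only after every targeted vcpu has
    had the previous one injected, until all owed ticks are delivered."""

    def __init__(self, vcpu_mask):
        self.vcpu_mask = vcpu_mask   # vcpus that receive the broadcast
        self.pending_mask = 0        # vcpus still awaiting injection
        self.intr_pending_nr = 0     # ticks owed to the guest
        self.posted = 0              # interrupts posted (for illustration)

    def _post(self):
        # Stands in for posting to the vioapic and recording the target mask.
        self.pending_mask = self.vcpu_mask
        self.posted += 1

    def timer_expired(self):
        # Stands in for the routing decision that increments intr_pending_nr.
        self.intr_pending_nr += 1
        if self.intr_pending_nr == 1:
            self._post()

    def injected(self, vcpu):
        # Stands in for the post=1 callback on one vcpu.
        bit = 1 << vcpu
        if not self.pending_mask & bit:
            return
        self.pending_mask &= ~bit
        if self.pending_mask == 0:
            self.intr_pending_nr -= 1
            if self.intr_pending_nr > 0:
                self._post()


policy = NoMissedTicksPolicy(vcpu_mask=0b11)
for _ in range(3):                 # three periods elapse while the guest is stalled
    policy.timer_expired()
posted_while_stalled = policy.posted
for _ in range(3):                 # the guest then takes each broadcast on both vcpus
    policy.injected(0)
    policy.injected(1)
print(posted_while_stalled, policy.posted, policy.intr_pending_nr)  # 1 3 0
```

Only one interrupt is outstanding at a time, yet all three owed ticks are eventually posted.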
   > 4. Live Migration
   >
   > Live migration with hpet preserves the current offset of the guest clock
   > with respect to ntp. This is accomplished by migrating all of the state in
   > the h->hpet data structure in the usual way. The hp->mc_offset is
   > recalculated on the receiving node so that the guest sees a continuous
   > hpet main counter.
   >
   > Code has been added to xc_domain_save.c to send a small message after the
   > domain context is sent. The content of the message is the physical tsc
   > timestamp, last_tsc, read just before the message is sent. When the
   > last_tsc message is received in xc_domain_restore.c, another physical tsc
   > timestamp, cur_tsc, is read. The two timestamps are loaded into the domain
   > structure as last_tsc_sender and first_tsc_receiver with hypercalls. Then
   > xc_domain_hvm_setcontext is called so that hpet_load has access to these
   > time stamps. Hpet_load uses the timestamps to account for the time spent
   > saving and loading the domain context. With this technique, the only
   > neglected time is the time spent sending a small network message.
   >
   > 5. Test Results
   >
   > Some recent test results are:
   >
   > 5.1 Linux 4u664 and Windows 2k864 load test.
   >       Duration: 70 hours.
   >       Test date: 6/2/08
   >       Loads: usex -b48 on Linux; burn-in on Windows
   >       Guest vcpus: 8 for Linux; 2 for Windows
   >       Hardware: 8 physical cpu AMD
   >       Clock drift : Linux: .0012% Windows: .009%
   >
   > 5.2 Linux 4u664, Linux 4u464, and Windows 2k864 no-load test
   >       Duration: 23 hours.
   >       Test date: 6/3/08
   >       Loads: none
   >       Guest vcpus: 8 for each Linux; 2 for Windows
   >       Hardware: 4 physical cpu AMD
   >       Clock drift : Linux: .033% Windows: .019%
   >
   > 6. Relation to recent work in xen-unstable
   >
   > There is a similarity between hvm_get_guest_time() in xen-unstable and
   > read_64_main_counter() in this code. However, read_64_main_counter() is
   > more tuned to the needs of hpet.c. It has no "set" operation, only the
   > get. It isolates the mode, physical or simulated, in
   > read_64_main_counter() itself. It uses no vcpu or domain state as it is a
   > physical entity, in either mode. And it provides a real physical mode for
   > every read for those applications that desire this.
   >
   > 7. Conclusion
   >
   > The virtual hpet is improved by this patch in terms of accuracy and
   > monotonicity. Tests performed to date verify this and more testing is
   > under way.
   >
   > 8. Future Work
   >
   > Testing with Windows Vista will be performed soon. The reason for accuracy
   > variations on different platforms using the physical hpet device will be
   > investigated. Additional overhead measurements on simulated vs physical
   > hpet mode will be made.
   >
   > Footnotes:
   >
   > 1. I don't recall the accuracy improvement with end of interrupt stamping,
   > but it was significant, perhaps better than two to one improvement. It
   > would be a very simple matter to re-measure the improvement as the
   > facility can call back at injection time as well.
   >
   >
   > Signed-off-by: Dave Winchell <dwinchell@xxxxxxxxxxxxxxx>
   > Signed-off-by: Ben Guthro <bguthro@xxxxxxxxxxxxxxx>
   >
   >
   > _______________________________________________
   > Xen-devel mailing list
   > Xen-devel@xxxxxxxxxxxxxxxxxxx
   > http://lists.xensource.com/xen-devel
        
      
 
 