
xen-devel

RE: [Xen-devel] [PATCH 0/2] Improve hpet accuracy

To: "Dave Winchell" <dwinchell@xxxxxxxxxxxxxxx>, "Keir Fraser" <keir.fraser@xxxxxxxxxxxxx>, "Ben Guthro" <bguthro@xxxxxxxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: RE: [Xen-devel] [PATCH 0/2] Improve hpet accuracy
From: "Dan Magenheimer" <dan.magenheimer@xxxxxxxxxx>
Date: Fri, 6 Jun 2008 09:53:23 -0600
Delivery-date: Fri, 06 Jun 2008 08:54:40 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <B99564216C25704085A82B41C46DD3427B05EE@xxxxxxxxxxxxxxxxxxxxx>
Organization: Oracle Corporation
Reply-to: "dan.magenheimer@xxxxxxxxxx" <dan.magenheimer@xxxxxxxxxx>
Hi Dave and Ben --
 
When running tests on xen-unstable (without your patch), please ensure that hpet=1 is set in the hvm config. Also, I think that when hpet is the clocksource on RHEL4-32, the clock IS resilient to missed ticks, so timer_mode should be 2 (whereas when pit is the clocksource on RHEL4-32, all clock ticks must be delivered, so timer_mode should be 0).
 
Per http://lists.xensource.com/archives/html/xen-devel/2008-06/msg00098.html it's my intent to clean this up, but I won't get to it until next week.
 
Thanks,
Dan
-----Original Message-----
From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx]On Behalf Of Dave Winchell
Sent: Friday, June 06, 2008 4:46 AM
To: Keir Fraser; Ben Guthro; xen-devel
Cc: dan.magenheimer@xxxxxxxxxx; Dave Winchell
Subject: RE: [Xen-devel] [PATCH 0/2] Improve hpet accuracy

Keir,

I think the changes are required. We'll run some tests today so
that we have some data to talk about.

-Dave


-----Original Message-----
From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx on behalf of Keir Fraser
Sent: Fri 6/6/2008 4:58 AM
To: Ben Guthro; xen-devel
Cc: dan.magenheimer@xxxxxxxxxx
Subject: Re: [Xen-devel] [PATCH 0/2] Improve hpet accuracy

Are these patches needed now the timers are built on Xen system time rather
than host TSC? Dan has reported much better time-keeping with his patch
checked in, and it's for sure a lot less invasive than this patchset.


 -- Keir

On 5/6/08 15:59, "Ben Guthro" <bguthro@xxxxxxxxxxxxxxx> wrote:

>
> 1. Introduction
>
> This patch improves the hpet based guest clock in terms of drift and
> monotonicity. Prior to this work the drift with hpet was greater than 2%,
> far above the .05% limit for ntp to synchronize. With this code, the drift
> ranges from .001% to .0033% depending on guest and physical platform.
>
> Using hpet allows guest operating systems to provide monotonic time to their
> applications. Time sources other than hpet are not monotonic because
> of their reliance on tsc, which is not synchronized across physical
> processors.
>
> Windows 2k864 and many Linux guests are supported with two policies, one
> for guests that handle missed clock interrupts and the other for guests that
> require the correct number of interrupts.
>
> Guests may use hpet for the timing source even if the physical platform has
> no visible hpet. Migration is supported between physical machines which
> differ in physical hpet visibility.
>
> Most of the changes are in hpet.c. Two general facilities are added to track
> interrupt progress. The ideas here and the facilities would be useful in
> vpt.c, for other time sources, though no attempt is made here to improve
> vpt.c.
>
> The following sections discuss hpet dependencies, interrupt delivery
> policies, live migration, test results, and relation to recent work with
> monotonic time.
>
>
> 2. Virtual Hpet dependencies
>
> The virtual hpet depends on the ability to read the physical or simulated
> (see discussion below) hpet.  For timekeeping, the virtual hpet also depends
> on two new interrupt notification facilities to implement its policies for
> interrupt delivery.
>
> 2.1. Two modes of low-level hpet main counter reads.
>
> In this implementation, the virtual hpet reads with read_64_main_counter(),
> exported by time.c, either the real physical hpet main counter register
> directly or a "simulated" hpet main counter.
>
> The simulated mode uses a monotonic version of get_s_time() (NOW()), where
> the last time value is returned whenever the current time value is less than
> the last time value. In simulated mode, since it is layered on s_time, the
> underlying hardware can be hpet or some other device. The frequency of the
> main counter in simulated mode is the same as the standard physical hpet
> frequency, allowing live migration between nodes that are configured
> differently.
>
> If the physical platform does not have an hpet device, or if xen is
> configured not to use the device, then the simulated method is used. If
> there is a physical hpet device, and xen has initialized it, then either
> simulated or physical mode can be used. This is governed by a boot time
> option, hpet-avoid. Setting this option to 1 gives the simulated mode and 0
> the physical mode. The default is physical mode.
>
> A disadvantage of the physical mode is that it may take longer to read the
> device than in simulated mode. On some platforms the cost is about the same
> (less than 250 nsec) for physical and simulated modes, while on others the
> physical cost is much higher than simulated. A disadvantage of the simulated
> mode is that it can return the same value for the counter in consecutive
> calls.
>
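
As a rough illustration of the simulated mode described in 2.1, the sketch below clamps a system-time reading so it never goes backwards and then scales it to counter ticks. The names monotonic_ns() and sim_main_counter() are hypothetical, and the 14.31818 MHz rate is an assumption (the patch description only says "standard physical hpet frequency"); this is not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed nominal HPET rate; the patch description does not give a number. */
#define SIM_HPET_HZ 14318180ULL
#define NS_PER_SEC  1000000000ULL

/* Last value handed out. A real hypervisor would guard this with a lock. */
static uint64_t last_ns;

/* Return now_ns, or the previous value if time appears to have gone back. */
static uint64_t monotonic_ns(uint64_t now_ns)
{
    if (now_ns < last_ns)
        return last_ns;          /* repeat the last value instead */
    last_ns = now_ns;
    return now_ns;
}

/* Convert a system-time reading (ns) into simulated main counter ticks. */
static uint64_t sim_main_counter(uint64_t now_ns)
{
    uint64_t ns = monotonic_ns(now_ns);

    assert(ns >= now_ns);        /* the clamp never returns less than the input */

    /* Split into seconds and remainder to avoid 64-bit overflow. */
    return (ns / NS_PER_SEC) * SIM_HPET_HZ
         + (ns % NS_PER_SEC) * SIM_HPET_HZ / NS_PER_SEC;
}
```

Note that a backwards reading yields the same counter value twice in a row, which is the disadvantage of simulated mode mentioned above.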
> 2.2. Interrupt notification facilities.
>
> Two interrupt notification facilities are introduced: one is
> hvm_isa_irq_assert_cb() and the other is hvm_register_intr_en_notif().
>
> The vhpet uses hvm_isa_irq_assert_cb() to deliver interrupts to the vioapic.
> hvm_isa_irq_assert_cb() allows a callback to be passed along to
> vioapic_deliver(), and this callback is called with a mask of the vcpus which
> will get the interrupt. This callback is made before any vcpus receive an
> interrupt.
>
> Vhpet uses hvm_register_intr_en_notif() to register a handler for a
> particular vector that will be called when that vector is injected in
> [vmx,svm]_intr_assist() and also when the guest finishes handling the
> interrupt. Here finished is defined as the point when the guest re-enables
> interrupts or lowers the tpr value. EOI is not used as the end of interrupt
> as this is sometimes returned before the interrupt handler has done its
> work. A flag is passed to the handler indicating whether this is the
> injection point (post = 1) or the interrupt finished (post = 0) point. The
> need for the finished point callback is discussed in the missed ticks policy
> section.
>
> To prevent a possible early trigger of the finished callback, intr_en_notif
> logic has a two stage arm, the first at injection (hvm_intr_en_notif_arm())
> and the second when interrupts are seen to be disabled
> (hvm_intr_en_notif_disarm()). Once fully armed, re-enabling interrupts will
> cause hvm_intr_en_notif_disarm() to make the end of interrupt callback.
> hvm_intr_en_notif_arm() and hvm_intr_en_notif_disarm() are called by
> [vmx,svm]_intr_assist().
>
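
The two stage arm in 2.2 can be pictured as a small state machine. The sketch below is only an illustration under simplifying assumptions: the type and function names (intr_notif, notif_inject, notif_check) are hypothetical, only the interrupt-enable flag is modelled (the tpr condition is left out), and count_cb stands in for whatever handler the vhpet would register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum notif_state { NOTIF_IDLE, NOTIF_INJECTED, NOTIF_FULLY_ARMED };

/* post = 1: injection point; post = 0: guest finished the handler. */
typedef void (*notif_cb_t)(int post);

struct intr_notif {
    enum notif_state state;
    notif_cb_t cb;
};

/* Stage 1: the vector has just been injected. */
static void notif_inject(struct intr_notif *n)
{
    assert(n != NULL && n->cb != NULL);
    n->state = NOTIF_INJECTED;
    n->cb(1);
}

/* Called on later passes with the guest's current interrupt-enable flag. */
static void notif_check(struct intr_notif *n, bool guest_intr_enabled)
{
    assert(n != NULL && n->cb != NULL);
    if (n->state == NOTIF_INJECTED && !guest_intr_enabled) {
        /* Stage 2: interrupts seen disabled, so the handler is running. */
        n->state = NOTIF_FULLY_ARMED;
    } else if (n->state == NOTIF_FULLY_ARMED && guest_intr_enabled) {
        /* Fully armed and interrupts re-enabled: report "finished". */
        n->state = NOTIF_IDLE;
        n->cb(0);
    }
}

/* Demo handler that just counts the two kinds of callback. */
static int injected_count, finished_count;

static void count_cb(int post)
{
    if (post)
        injected_count++;
    else
        finished_count++;
}

static struct intr_notif demo = { NOTIF_IDLE, count_cb };
```

Without the second stage, a check made while the guest still has interrupts enabled (before it has entered the handler) would fire the finished callback too early; in this sketch such a check is simply ignored.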
> 3. Interrupt delivery policies
>
> The existing hpet interrupt delivery is preserved. This includes
> vcpu round robin delivery used by Linux and broadcast delivery used by
> Windows.
>
> There are two policies for interrupt delivery, one for Windows 2k8-64 and
> the other for Linux. The Linux policy takes advantage of the (guest) Linux
> missed tick and offset calculations and does not attempt to deliver the
> right number of interrupts. The Windows policy delivers the correct number
> of interrupts, even if sometimes much closer to each other than the period.
> The policies are similar to those in vpt.c, though there are some important
> differences.
>
> Policies are selected with an HVMOP_set_param hypercall with index
> HVM_PARAM_TIMER_MODE. Two new values are added,
> HVM_HPET_guest_computes_missed_ticks and
> HVM_HPET_guest_does_not_compute_missed_ticks. The reason that two new ones
> are added is that in some guests (32bit Linux) a no-missed policy is needed
> for clock sources other than hpet and a missed ticks policy for hpet. It was
> felt that there would be less confusion by simply introducing the two hpet
> policies.
>
> 3.1. The missed ticks policy
>
> The Linux clock interrupt handler for hpet calculates missed ticks and
> offset using the hpet main counter. The algorithm works well when the time
> since the last interrupt is greater than or equal to a period and poorly
> otherwise.
>
> The missed ticks policy ensures that no two clock interrupts are delivered
> to the guest at a time interval less than a period. A time stamp (hpet main
> counter value) is recorded (by a callback registered with
> hvm_register_intr_en_notif) when Linux finishes handling the clock
> interrupt. Then, ensuing interrupts are delivered to the vioapic only if the
> current main counter value is at least a period greater than when the last
> interrupt was handled.
>
> Tests showed a significant improvement in clock drift with end of interrupt
> time stamps versus beginning of interrupt[1]. It is believed that the reason
> for the improvement is that the clock interrupt handler goes for a spinlock
> and can therefore be delayed in its processing. Furthermore, the main
> counter is read by the guest under the lock. The net effect is that if we
> time stamp injection, we can get the difference in time between successive
> interrupt handler lock acquisitions to be less than the period.
>
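
A minimal sketch of the gating rule in 3.1, assuming the time stamp is taken in the "finished" callback and compared in main counter ticks. The struct and function names are hypothetical and the patch's actual data layout is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct mt_policy {
    uint64_t period;        /* timer period, in main counter ticks */
    uint64_t last_handled;  /* counter value when the guest finished a tick */
    bool     have_stamp;    /* false until the first tick has been handled */
};

/* Would be called from the "finished" (post = 0) notification. */
static void mt_on_finished(struct mt_policy *p, uint64_t mc_now)
{
    p->last_handled = mc_now;
    p->have_stamp = true;
}

/* Deliver only if at least one period has passed since the last stamp. */
static bool mt_should_deliver(const struct mt_policy *p, uint64_t mc_now)
{
    assert(p->period > 0);
    if (!p->have_stamp)
        return true;
    /* Unsigned subtraction stays correct across counter wrap-around. */
    return mc_now - p->last_handled >= p->period;
}
```

With period = 100 and a stamp taken at 1000, a tick that becomes due at 1099 is held back, and one at 1100 is delivered.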
> 3.2. The no-missed ticks policy
>
> Windows 2k864 keeps very poor time with the missed ticks policy. So the
> no-missed ticks policy was developed. In the no-missed ticks policy we
> deliver the correct number of interrupts, even if they are spaced less than
> a period apart (when catching up).
>
> Windows 2k864 uses a broadcast mode in the interrupt routing such that all
> vcpus get the clock interrupt. The best Windows drift performance was
> achieved when the policy code ensured that all the previous interrupts (on
> the various vcpus) had been injected before injecting the next interrupt to
> the vioapic.
>
> The policy code works as follows. It uses the hvm_isa_irq_assert_cb() to
> record the vcpus to be interrupted in h->hpet.pending_mask. Then, in the
> callback registered with hvm_register_intr_en_notif() at post=1 time it
> clears the current vcpu in the pending_mask. When the pending_mask is clear
> it decrements hpet.intr_pending_nr and if intr_pending_nr is still non-zero
> posts another interrupt to the ioapic with hvm_isa_irq_assert_cb().
> Intr_pending_nr is incremented in hpet_route_decision_not_missed_ticks().
>
> The missed ticks policy intr_en_notif callback also uses the pending_mask
> method. So even though Linux does not broadcast its interrupts, the code
> could handle it if it did. In this case the end of interrupt time stamp is
> made when the pending_mask is clear.
>
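
The pending_mask / intr_pending_nr bookkeeping in 3.2 might look roughly like the sketch below. Field names follow the description, but the functions (nm_tick_due, nm_injected, nm_assert_irq) are hypothetical stand-ins, and the rule for posting the first interrupt when nothing is in flight is an assumption, since the description does not spell it out.

```c
#include <assert.h>
#include <stdint.h>

struct nm_policy {
    uint32_t pending_mask;     /* vcpus that have not yet been injected */
    unsigned intr_pending_nr;  /* ticks still owed to the guest */
    uint32_t target_mask;      /* vcpus hit by each broadcast (demo input) */
    unsigned asserted;         /* number of irq asserts issued (demo output) */
};

/* Stand-in for hvm_isa_irq_assert_cb(): records which vcpus are targeted. */
static void nm_assert_irq(struct nm_policy *p)
{
    p->asserted++;
    p->pending_mask = p->target_mask;
}

/* A timer period elapsed: one more tick is owed to the guest. */
static void nm_tick_due(struct nm_policy *p)
{
    p->intr_pending_nr++;
    /* Assumption: post right away only if no interrupt is in flight. */
    if (p->intr_pending_nr == 1 && p->pending_mask == 0)
        nm_assert_irq(p);
}

/* Injection (post = 1) notification for one vcpu. */
static void nm_injected(struct nm_policy *p, unsigned vcpu)
{
    assert(vcpu < 32);
    if (p->pending_mask == 0)
        return;                               /* nothing in flight */
    p->pending_mask &= ~(UINT32_C(1) << vcpu);
    if (p->pending_mask != 0)
        return;                               /* other vcpus still pending */
    assert(p->intr_pending_nr > 0);
    if (--p->intr_pending_nr != 0)
        nm_assert_irq(p);                     /* catch up with the next tick */
}
```

With two vcpus and two ticks owed, the second assert is issued only after both vcpus have taken the first interrupt, which matches the ordering the description says gave the best Windows drift.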
> 4. Live Migration
>
> Live migration with hpet preserves the current offset of the guest clock
> with respect to ntp. This is accomplished by migrating all of the state in
> the h->hpet data structure in the usual way. The hp->mc_offset is
> recalculated on the receiving node so that the guest sees a continuous hpet
> main counter.
>
> Code has been added to xc_domain_save.c to send a small message after the
> domain context is sent. The content of the message is the physical tsc
> timestamp, last_tsc, read just before the message is sent. When the last_tsc
> message is received in xc_domain_restore.c, another physical tsc timestamp,
> cur_tsc, is read. The two timestamps are loaded into the domain structure as
> last_tsc_sender and first_tsc_receiver with hypercalls. Then
> xc_domain_hvm_setcontext is called so that hpet_load has access to these
> time stamps. Hpet_load uses the timestamps to account for the time spent
> saving and loading the domain context. With this technique, the only
> neglected time is the time spent sending a small network message.
>
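
One way to picture the offset recalculation in section 4 is shown below. This is a sketch under assumptions, not the arithmetic from the patch: it assumes the guest-visible counter is the host counter plus an offset, and it takes the save/load time as an already-computed number of ticks (how hpet_load derives that from last_tsc_sender and first_tsc_receiver is not described in this thread). The function names are hypothetical.

```c
#include <stdint.h>

/* Guest-visible main counter, assuming a simple additive offset. */
static uint64_t guest_counter(uint64_t host_mc, uint64_t mc_offset)
{
    return host_mc + mc_offset;
}

/*
 * Pick the offset on the receiving node so that the guest counter resumes
 * at its saved value plus the ticks that elapsed while the domain context
 * was being saved and loaded. Unsigned arithmetic wraps, so this also
 * works when the receiver's host counter is larger than the saved value.
 */
static uint64_t restore_offset(uint64_t saved_guest_mc,
                               uint64_t blackout_ticks,
                               uint64_t host_mc_receiver)
{
    return saved_guest_mc + blackout_ticks - host_mc_receiver;
}
```

For example, if the guest counter read 6000 at save time, 200 ticks passed during save and load, and the receiver's host counter reads 50, the new offset is 6150 and the guest's next read returns 6200.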
> 5. Test Results
>
> Some recent test results are:
>
> 5.1 Linux 4u664 and Windows 2k864 load test.
>       Duration: 70 hours.
>       Test date: 6/2/08
>       Loads: usex -b48 on Linux; burn-in on Windows
>       Guest vcpus: 8 for Linux; 2 for Windows
>       Hardware: 8 physical cpu AMD
>       Clock drift : Linux: .0012% Windows: .009%
>
> 5.2 Linux 4u664, Linux 4u464 , and Windows 2k864 no-load test
>       Duration: 23 hours.
>       Test date: 6/3/08
>       Loads: none
>       Guest vcpus: 8 for each Linux; 2 for Windows
>       Hardware: 4 physical cpu AMD
>       Clock drift : Linux: .033% Windows: .019%
>
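
To put the drift percentages above in more familiar units, the helper below converts a drift percentage to parts per million and to accumulated error over a test run. It is plain arithmetic; the function names are made up for this illustration.

```c
#include <assert.h>

/* 1% is 10,000 parts per million. */
static double drift_ppm(double drift_percent)
{
    return drift_percent * 1e4;
}

/* Seconds of clock error accumulated over a run of the given length. */
static double drift_seconds(double drift_percent, double hours)
{
    assert(hours >= 0.0);
    return drift_percent / 100.0 * hours * 3600.0;
}
```

By this arithmetic the .05% ntp limit quoted in the introduction is 500 ppm, and the .0012% Linux figure from the 70 hour load test is 12 ppm, or about 3 seconds over the whole run.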
> 6. Relation to recent work in xen-unstable
>
> There is a similarity between hvm_get_guest_time() in xen-unstable and
> read_64_main_counter() in this code. However, read_64_main_counter() is more
> tuned to the needs of hpet.c. It has no "set" operation, only the get. It
> isolates the mode, physical or simulated, in read_64_main_counter() itself.
> It uses no vcpu or domain state as it is a physical entity, in either mode.
> And it provides a real physical mode for every read for those applications
> that desire this.
>
> 7. Conclusion
>
> The virtual hpet is improved by this patch in terms of accuracy and
> monotonicity. Tests performed to date verify this and more testing is under
> way.
>
> 8. Future Work
>
> Testing with Windows Vista will be performed soon. The reason for accuracy
> variations on different platforms using the physical hpet device will be
> investigated. Additional overhead measurements on simulated vs physical hpet
> mode will be made.
>
> Footnotes:
>
> 1. I don't recall the accuracy improvement with end of interrupt stamping,
> but it was significant, perhaps better than two to one improvement. It would
> be a very simple matter to re-measure the improvement as the facility can
> call back at injection time as well.
>
>
> Signed-off-by: Dave Winchell <dwinchell@xxxxxxxxxxxxxxx>
> Signed-off-by: Ben Guthro <bguthro@xxxxxxxxxxxxxxx>
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel


