xen-devel
RE: [Xen-devel] [PATCH 0/2] Improve hpet accuracy
 
Hi Dave and Ben --

When running tests on xen-unstable (without your patch), please ensure
that hpet=1 is set in the hvm config. Also, I think that when hpet is the
clocksource on RHEL4-32, the clock IS resilient to missed ticks, so
timer_mode should be 2 (whereas when pit is the clocksource on RHEL4-32,
all clock ticks must be delivered, so timer_mode should be 0).
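The settings suggested above can be written down as a small Python sketch, for illustration only. The function names are hypothetical and are not part of Xen or its tools; only hpet=1 and the two timer_mode values come from the message, and the mapping applies only to the RHEL4-32 case discussed here.

```python
def hvm_timer_settings(clocksource):
    """Return the HVM config settings suggested above for a RHEL4-32 guest.

    Hypothetical helper: it only encodes the advice in this message.
    """
    if clocksource == "hpet":
        # hpet clocksource: the guest clock tolerates missed ticks.
        return {"hpet": 1, "timer_mode": 2}
    if clocksource == "pit":
        # pit clocksource: every clock tick must be delivered.
        return {"timer_mode": 0}
    raise ValueError("no advice given for clocksource %r" % (clocksource,))


def render_config(settings):
    """Format the settings as key=value lines for an hvm config file."""
    return "\n".join("%s=%d" % (key, value) for key, value in settings.items())


if __name__ == "__main__":
    print(render_config(hvm_timer_settings("hpet")))
```

Other guests and clocksources would need their own entries; the message does not cover them.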
  
  
Thanks, 
Dan 
  
  Keir,
  I think the changes are required. We'll run some tests today so that we
  have some data to talk about.
  -Dave
 
  -----Original Message-----
  From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx on behalf of Keir Fraser
  Sent: Fri 6/6/2008 4:58 AM
  To: Ben Guthro; xen-devel
  Cc: dan.magenheimer@xxxxxxxxxx
  Subject: Re: [Xen-devel] [PATCH 0/2] Improve hpet accuracy
  Are these patches needed now the timers are built on Xen system time
  rather than host TSC? Dan has reported much better time-keeping with his
  patch checked in, and it's for sure a lot less invasive than this
  patchset.
 
   -- Keir
  On 5/6/08 15:59, "Ben Guthro" <bguthro@xxxxxxxxxxxxxxx> wrote:
  >
  > 1. Introduction
  >
  > This patch improves the hpet based guest clock in terms of drift and
  > monotonicity. Prior to this work the drift with hpet was greater than
  > 2%, far above the .05% limit for ntp to synchronize. With this code,
  > the drift ranges from .001% to .0033% depending on guest and physical
  > platform.
  >
  > Using hpet allows guest operating systems to provide monotonic time to
  > their applications. Time sources other than hpet are not monotonic
  > because of their reliance on tsc, which is not synchronized across
  > physical processors.
  >
  > Windows 2k864 and many Linux guests are supported with two policies,
  > one for guests that handle missed clock interrupts and the other for
  > guests that require the correct number of interrupts.
  >
  > Guests may use hpet for the timing source even if the physical
  > platform has no visible hpet. Migration is supported between physical
  > machines which differ in physical hpet visibility.
  >
  > Most of the changes are in hpet.c. Two general facilities are added to
  > track interrupt progress. The ideas here and the facilities would be
  > useful in vpt.c, for other time sources, though no attempt is made
  > here to improve vpt.c.
  >
  > The following sections discuss hpet dependencies, interrupt delivery
  > policies, live migration, test results, and
  > relation to recent work with monotonic time.
  >
  >
  > 2. Virtual Hpet dependencies
  >
  > The virtual hpet depends on the ability to read the physical or
  > simulated (see discussion below) hpet. For timekeeping, the virtual
  > hpet also depends on two new interrupt notification facilities to
  > implement its policies for interrupt delivery.
  >
  > 2.1. Two modes of low-level hpet main counter reads.
  >
  > In this implementation, the virtual hpet reads with
  > read_64_main_counter(), exported by time.c, either the real physical
  > hpet main counter register directly or a "simulated" hpet main
  > counter.
  >
  > The simulated mode uses a monotonic version of get_s_time() (NOW()),
  > where the last time value is returned whenever the current time value
  > is less than the last time value. In simulated mode, since it is
  > layered on s_time, the underlying hardware can be hpet or some other
  > device. The frequency of the main counter in simulated mode is the
  > same as the standard physical hpet frequency, allowing live migration
  > between nodes that are configured differently.
  >
  > If the physical platform does not have an hpet device, or if xen is
  > configured not to use the device, then the simulated method is used.
  > If there is a physical hpet device, and xen has initialized it, then
  > either simulated or physical mode can be used. This is governed by a
  > boot time option, hpet-avoid. Setting this option to 1 gives the
  > simulated mode and 0 the physical mode. The default is physical mode.
  >
  > A disadvantage of the physical mode is that it may take longer to read
  > the device than in simulated mode. On some platforms the cost is about
  > the same (less than 250 nsec) for physical and simulated modes, while
  > on others physical cost is much higher than simulated. A disadvantage
  > of the simulated mode is that it can return the same
  > value for the counter in consecutive calls.
  >
  > 2.2. Interrupt notification facilities.
  >
  > Two interrupt notification facilities are introduced, one is
  > hvm_isa_irq_assert_cb() and the other hvm_register_intr_en_notif().
  >
  > The vhpet uses hvm_isa_irq_assert_cb to deliver interrupts to the
  > vioapic. hvm_isa_irq_assert_cb allows a callback to be passed along to
  > vioapic_deliver() and this callback is called with a mask of the vcpus
  > which will get the interrupt. This callback is made before any vcpus
  > receive an interrupt.
  >
  > Vhpet uses hvm_register_intr_en_notif() to register a handler for a
  > particular vector that will be called when that vector is injected in
  > [vmx,svm]_intr_assist() and also when the guest finishes handling the
  > interrupt. Here finished is defined as the point when the guest
  > re-enables interrupts or lowers the tpr value. EOI is not used as the
  > end of interrupt as this is sometimes returned before the interrupt
  > handler has done its work. A flag is passed to the handler indicating
  > whether this is the injection point (post = 1) or the interrupt
  > finished (post = 0) point. The need for the finished point callback is
  > discussed in the missed ticks policy section.
  >
  > To prevent a possible early trigger of the finished callback,
  > intr_en_notif logic has a two stage arm, the first at injection
  > (hvm_intr_en_notif_arm()) and the second when interrupts are seen to
  > be disabled (hvm_intr_en_notif_disarm()). Once fully armed,
  > re-enabling interrupts will cause hvm_intr_en_notif_disarm() to make
  > the end of interrupt callback. hvm_intr_en_notif_arm() and
  > hvm_intr_en_notif_disarm() are called by
  > [vmx,svm]_intr_assist().
  >
  > 3. Interrupt delivery policies
  >
  > The existing hpet interrupt delivery is preserved. This includes vcpu
  > round robin delivery used by Linux and broadcast delivery used by
  > Windows.
  >
  > There are two policies for interrupt delivery, one for Windows 2k8-64
  > and the other for Linux. The Linux policy takes advantage of the
  > (guest) Linux missed tick and offset calculations and does not attempt
  > to deliver the right number of interrupts. The Windows policy delivers
  > the correct number of interrupts, even if sometimes much closer to
  > each other than the period. The policies are similar to those in
  > vpt.c, though there are some important differences.
  >
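The difference between the two policies described above can be illustrated with a short Python sketch. It is a simplified model, not the hpet.c code: the function name and the "linux"/"windows" labels are hypothetical, and it only counts how many interrupts each policy would post after a delay.

```python
def interrupts_to_deliver(policy, elapsed, period):
    """Return how many clock interrupts to post after a delay.

    elapsed: main-counter ticks since the last interrupt was handled.
    period:  timer period in the same units.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    due = elapsed // period  # whole periods that have gone by
    if policy == "linux":
        # The guest computes missed ticks itself, so post at most one
        # interrupt, and only once a full period has passed.
        return 1 if due >= 1 else 0
    if policy == "windows":
        # The guest needs every tick, so post all of them, even if they
        # end up closer together than one period.
        return due
    raise ValueError("unknown policy: %r" % (policy,))


if __name__ == "__main__":
    # A guest that was held off for 3.5 periods.
    for policy in ("linux", "windows"):
        print(policy, interrupts_to_deliver(policy, elapsed=35, period=10))
```

With a delay of 3.5 periods, the Linux-style policy posts one interrupt and leaves the catch-up arithmetic to the guest, while the Windows-style policy posts three.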
  > Policies are selected with an HVMOP_set_param hypercall with index
  > HVM_PARAM_TIMER_MODE. Two new values are added,
  > HVM_HPET_guest_computes_missed_ticks and
  > HVM_HPET_guest_does_not_compute_missed_ticks. The reason that two new
  > ones are added is that in some guests (32bit Linux) a no-missed policy
  > is needed for clock sources other than hpet and a missed ticks policy
  > for hpet. It was felt that there would be less confusion by simply
  > introducing the two hpet policies.
  >
  > 3.1. The missed ticks policy
  >
  > The Linux clock interrupt handler for hpet calculates missed ticks and
  > offset using the hpet main counter. The algorithm works well when the
  > time since the last interrupt is greater than or equal to a period and
  > poorly otherwise.
  >
  > The missed ticks policy ensures that no two clock interrupts are
  > delivered to the guest at a time interval less than a period. A time
  > stamp (hpet main counter value) is recorded (by a callback registered
  > with hvm_register_intr_en_notif) when Linux finishes handling the
  > clock interrupt. Then, ensuing interrupts are delivered to the vioapic
  > only if the current main counter value is a period greater than when
  > the last interrupt was handled.
  >
  > Tests showed a significant improvement in clock drift with end of
  > interrupt time stamps versus beginning of interrupt[1]. It is believed
  > that the reason for the improvement is that the clock interrupt
  > handler goes for a spinlock and can therefore be delayed in its
  > processing. Furthermore, the main counter is read by the guest under
  > the lock. The net effect is that if we time stamp injection, we can
  > get the difference in time between successive interrupt handler
  > lock acquisitions to be less than the period.
  >
  > 3.2. The no-missed ticks policy
  >
  > Windows 2k864 keeps very poor time with the missed ticks policy. So
  > the no-missed ticks policy was developed. In the no-missed ticks
  > policy we deliver the correct number of interrupts, even if they are
  > spaced less than a period apart (when catching up).
  >
  > Windows 2k864 uses a broadcast mode in the interrupt routing such that
  > all vcpus get the clock interrupt. The best Windows drift performance
  > was achieved when the policy code ensured that all the previous
  > interrupts (on the various vcpus) had been injected before injecting
  > the next interrupt to the vioapic.
  >
  > The policy code works as follows. It uses the hvm_isa_irq_assert_cb()
  > to record the vcpus to be interrupted in h->hpet.pending_mask. Then,
  > in the callback registered with hvm_register_intr_en_notif() at post=1
  > time it clears the current vcpu in the pending_mask. When the
  > pending_mask is clear it decrements hpet.intr_pending_nr and if
  > intr_pending_nr is still non-zero posts another interrupt to the
  > ioapic with hvm_isa_irq_assert_cb(). Intr_pending_nr is incremented in
  > hpet_route_decision_not_missed_ticks().
  >
  > The missed ticks policy intr_en_notif callback also uses the
  > pending_mask method. So even though Linux does not broadcast its
  > interrupts, the code could handle it if it did. In this case the end
  > of interrupt time stamp is made when the pending_mask is
  made when the pending_mask is > clear. > > 4. Live 
  Migration > > Live migration with hpet preserves the current 
  offset of the guest clock with > respect > to ntp. This is 
  accomplished by migrating all of the state in the h->hpet data > 
  structure > in the usual way. The hp->mc_offset is recalculated on 
  the receiving node so > that the > guest sees a continuous hpet 
  main counter. > > Code as been added to xc_domain_save.c to send a 
  small message after the > domain context is sent. The contents of the 
  message is the physical tsc > timestamp, last_tsc, > read just 
  before the message is sent. When the last_tsc message is received in > 
  xc_domain_restore.c, > another physical tsc timestamp, cur_tsc, is read. 
  The two timestamps are > loaded into the domain > structure as 
  last_tsc_sender and first_tsc_receiver with hypercalls. Then > 
  xc_domain_hvm_setcontext > is called so that hpet_load has access to 
  these time stamps. Hpet_load uses > the timestamps > to account 
  for the time spent saving and loading the domain context. With this > 
  technique, > the only neglected time is the time spent sending a small 
  > network message.
  >
  > 5. Test Results
  >
  > Some recent test results are:
  >
  > 5.1 Linux 4u664 and Windows 2k864 load test.
  >       Duration: 70 hours.
  >       Test date: 6/2/08
  >       Loads: usex -b48 on Linux; burn-in on Windows
  >       Guest vcpus: 8 for Linux; 2 for Windows
  >       Hardware: 8 physical cpu AMD
  >       Clock drift : Linux: .0012% Windows: .009%
  >
  > 5.2 Linux 4u664, Linux 4u464, and Windows 2k864 no-load test
  >       Duration: 23 hours.
  >       Test date: 6/3/08
  >       Loads: none
  >       Guest vcpus: 8 for each Linux; 2 for Windows
  >       Hardware: 4 physical cpu AMD
  >       Clock drift : Linux: .033% Windows: .019%
  >
  > 6. Relation to recent work in xen-unstable
  >
  > There is a similarity between hvm_get_guest_time() in xen-unstable and
  > read_64_main_counter() in this code. However, read_64_main_counter()
  > is more tuned to the needs of hpet.c. It has no "set" operation, only
  > the get. It isolates the mode, physical or simulated, in
  > read_64_main_counter() itself. It uses no vcpu or domain state as it
  > is a physical entity, in either mode. And it provides a real physical
  > mode for every read for those applications that desire this.
  >
  > 7. Conclusion
  >
  > The virtual hpet is improved by this patch in terms of accuracy and
  > monotonicity. Tests performed to date verify this and more testing is
  > under way.
  >
  > 8. Future Work
  >
  > Testing with Windows Vista will be performed soon. The reason for
  > accuracy variations on different platforms using the physical hpet
  > device will be investigated. Additional overhead measurements on
  > simulated vs physical hpet mode will be made.
  >
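To put the drift percentages quoted in sections 1 and 5 into more concrete terms, the following Python sketch converts a drift percentage into seconds gained or lost per day and compares it with the .05% ntp limit mentioned in the introduction. The function names are hypothetical; the only inputs taken from the description are the percentages themselves.

```python
SECONDS_PER_DAY = 24 * 60 * 60
NTP_LIMIT_PERCENT = 0.05  # limit quoted in section 1 of the description


def drift_seconds_per_day(drift_percent):
    """Convert a clock drift given in percent into seconds per day."""
    return drift_percent / 100.0 * SECONDS_PER_DAY


def within_ntp_limit(drift_percent):
    """True if the drift is small enough for ntp to synchronize."""
    return drift_percent <= NTP_LIMIT_PERCENT


if __name__ == "__main__":
    for label, percent in [("before the patch", 2.0),
                           ("ntp limit", 0.05),
                           ("5.1 Linux", 0.0012),
                           ("5.1 Windows", 0.009)]:
        print("%-16s %8.4f%%  %9.2f s/day  ok=%s" % (
            label, percent, drift_seconds_per_day(percent),
            within_ntp_limit(percent)))
```

On this arithmetic, 2% drift is roughly 1728 seconds per day, the .05% limit is roughly 43 seconds per day, and the .0012% figure from test 5.1 is roughly one second per day.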
  > Footnotes:
  >
  > 1. I don't recall the accuracy improvement with end of interrupt
  > stamping, but it was significant, perhaps better than two to one
  > improvement. It would be a very simple matter to re-measure the
  > improvement as the facility can call back at injection time as well.
  >
  >
  > Signed-off-by: Dave Winchell <dwinchell@xxxxxxxxxxxxxxx>
  > Signed-off-by: Ben Guthro <bguthro@xxxxxxxxxxxxxxx>
  >
  >
  >
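The "simulated" main counter of section 2.1 is described as a monotonic wrapper around get_s_time(): if the underlying time source steps backwards, the last value is returned again. A minimal Python sketch of that idea follows. The class name is hypothetical, the time source is passed in as a plain callable standing in for get_s_time(), and scaling to the hpet frequency and any locking are left out.

```python
class SimulatedMainCounter:
    """Monotonic wrapper around a time source that may step backwards."""

    def __init__(self, read_time):
        self._read_time = read_time  # stand-in for get_s_time() / NOW()
        self._last = None            # last value handed out

    def read(self):
        now = self._read_time()
        if self._last is not None and now < self._last:
            # Time went backwards: repeat the last value instead.
            return self._last
        self._last = now
        return now


if __name__ == "__main__":
    samples = iter([100, 105, 103, 103, 110])
    counter = SimulatedMainCounter(lambda: next(samples))
    print([counter.read() for _ in range(5)])
```

The repeated value in the middle of the sequence is the disadvantage the description mentions: consecutive reads can return the same counter value.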
  > _______________________________________________
  > Xen-devel mailing list
  > Xen-devel@xxxxxxxxxxxxxxxxxxx
  > http://lists.xensource.com/xen-devel
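The pending_mask bookkeeping described in section 3.2 can be modelled in a few lines of Python. This is a sketch under assumptions, not the patch's code: the class and method names are hypothetical, only pending_mask and intr_pending_nr are taken from the description, and it assumes a new interrupt is posted immediately when none is in flight.

```python
class NoMissedTicksState:
    """Toy model of the no-missed ticks bookkeeping from section 3.2."""

    def __init__(self, vcpu_mask):
        if vcpu_mask == 0:
            raise ValueError("at least one vcpu must be targeted")
        self.vcpu_mask = vcpu_mask  # vcpus that receive each interrupt
        self.pending_mask = 0       # vcpus still awaiting injection
        self.intr_pending_nr = 0    # interrupts owed to the guest
        self.posted = 0             # interrupts posted so far (for the demo)

    def _post(self):
        # Models posting to the ioapic; the delivery callback records
        # which vcpus will be interrupted.
        self.pending_mask = self.vcpu_mask
        self.posted += 1

    def period_elapsed(self):
        """A timer period went by: one more interrupt is owed."""
        self.intr_pending_nr += 1
        if self.pending_mask == 0:  # assumption: post if none in flight
            self._post()

    def injected(self, vcpu):
        """Models the post=1 callback for one vcpu."""
        bit = 1 << vcpu
        if not self.pending_mask & bit:
            return  # nothing outstanding for this vcpu
        self.pending_mask &= ~bit
        if self.pending_mask == 0:
            self.intr_pending_nr -= 1
            if self.intr_pending_nr > 0:
                self._post()  # catch up with the next owed interrupt


if __name__ == "__main__":
    state = NoMissedTicksState(vcpu_mask=0b11)  # two vcpus, broadcast
    state.period_elapsed()
    state.period_elapsed()  # second tick arrives before injection
    for vcpu in (0, 1, 0, 1):
        state.injected(vcpu)
    print(state.posted, state.intr_pending_nr, state.pending_mask)
```

In the demo, the second interrupt is posted only after both vcpus have had the first one injected, which is the ordering the description says gave the best Windows drift results.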
 
 
 
   
 