To: Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>
Subject: Re: [Xen-devel] [PATCH] Add a timer mode that disables pending missed ticks
From: Dave Winchell <dwinchell@xxxxxxxxxxxxxxx>
Date: Thu, 01 Nov 2007 17:14:50 -0400
Cc: "Dong, Eddie" <eddie.dong@xxxxxxxxx>, Dave Winchell <dwinchell@xxxxxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx, "Shan, Haitao" <haitao.shan@xxxxxxxxx>, "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>
In-reply-to: <C34DDAB6.FB02%Keir.Fraser@xxxxxxxxxxxx>
References: <C34DDAB6.FB02%Keir.Fraser@xxxxxxxxxxxx>
Hi Keir, Haitao, Eddie:

Here is the patch for timekeeping.
There are three components.

1. Increase the missed ticks threshold to 100 seconds. Per the comment in
   the patch, I have seen scheduling delays of 5 seconds in a test lasting
   30 minutes. With the old threshold, the result is an immediate error of
   5 seconds in guest time that persists. (A worked example of this
   arithmetic follows the list below.)

2. In pt_timer_fn, for the no_missed_tick_accounting option, only
   increment pt->pending_intr_nr if it is found to be zero.
   This deals with the case where the guest has interrupts disabled
   for longer than a tick period.

3. Call pt_process_missed_ticks() unconditionally and place the
   test for no_missed_tick_accounting inside pt_process_missed_ticks().
   This returns the calculation of the next timer expiration
   to the original method. The original method is important
   because it sets up a theoretical time space starting at t=0 in which
   expirations occur only at n*period, where n is any integer. This, in
   turn, removes rounding errors.
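
To make item 1 concrete, here is a minimal standalone sketch of the
threshold arithmetic (illustration only, not Xen code; the 1 kHz guest
tick and the nanosecond units are my assumptions, chosen to match a
typical HZ=1000 Linux guest and Xen's s_time_t):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int64_t period_ns = 1000000;       /* 1 ms guest tick (HZ=1000) */
    int64_t delay_ns  = 5000000000LL;  /* observed 5 s scheduling delay */

    /* Same calculation as pt_process_missed_ticks(). */
    int64_t missed = delay_ns / period_ns + 1;   /* ~5001 ticks */

    /* Old threshold (1000 ticks ~ 1 s at this rate): inject only ONE
     * tick, so the guest permanently loses ~5 s.  New threshold
     * (100000 ticks ~ 100 s): all missed ticks are queued and guest
     * time stays correct. */
    int64_t old_injected = (missed > 1000)   ? 1 : missed;
    int64_t new_injected = (missed > 100000) ? 1 : missed;

    printf("missed ticks:        %lld\n", (long long)missed);
    printf("old threshold loses: ~%lld ms\n",
           (long long)((missed - old_injected) * period_ns / 1000000));
    printf("new threshold loses: ~%lld ms\n",
           (long long)((missed - new_injected) * period_ns / 1000000));
    return 0;
}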

Accuracy tests:

A. 32-bit Linux guest (item 1 above relevant)
   Test: 2 64-bit Linux guests and 1 32-bit Linux guest stacked.
         8 vcpus for each guest.
         2 physical processors on the node.
         All vcpus heavily loaded with work.
         Test duration: 3 hours.

   Result: clock error < .02%.
   (Before this patch, guest time was 16 seconds slow in a 3.5 minute test.)

B. 64-bit Linux guest (items 2 and 3 above relevant)
   Test: 2 64-bit Linux guests.
         8 vcpus for each guest.
         2 physical processors on the node.
         All vcpus heavily loaded with work.
         Test duration: 3 hours.

   Result: clock error < .02%.
   (Before this patch, clock error was 1.3%. The contributions of
   items 2 and 3 above to the improvement in accuracy are .4%
   and .9% respectively.)

Of course, these tests are without ntpd running in the guest.
ntpd requires a local clock error of < .05% (500 ppm, its maximum
frequency correction). With the .02% accuracy, one can use ntpd to
synchronize the clock.


thanks,
Dave


Keir Fraser wrote:

I'm going to apply Haitao's no_missed_ticks_fix.patch. If you think fixes
are needed apart from that, please provide a unified diff.

-- Keir

On 30/10/07 21:16, "Dave Winchell" <dwinchell@xxxxxxxxxxxxxxx> wrote:

Keir,

Here are my comments on your change.
I've attached an updated vpt.c with the changes.

1. For the no_missed_tick_accounting method, we still need the update
   to pt->scheduled to take into account the time that has elapsed when
   missed ticks are calculated. pt_process_missed_ticks() is called from
   pt_restore_timer() and pt_timer_fn() and, thus, it's easiest to put
   the check for the no_missed_tick_accounting method in
   pt_process_missed_ticks() itself.

2. In pt_timer_fn you don't want to increment pending_intr_nr beyond 1
   for no_missed_tick_accounting option.

thanks,
Dave


Keir Fraser wrote:

Applied as c/s 16274. Please take a look and make sure the mode works
as you expect.

-- Keir

On 30/10/07 14:28, "Shan, Haitao" <haitao.shan@xxxxxxxxx> wrote:

   Hi, Keir,

   This patch adds a new timer mode, in which no missed ticks are
   calculated. This can be used with the latest x86_64 Linux guests,
   since they can pick up missed ticks themselves.

    <<no_missed_ticks.patch>>

   Best Regards
   Haitao Shan



*** vpt.c.new.c 2007-10-30 16:45:26.000000000 -0400
--- vpt.c       2007-10-30 15:30:57.000000000 -0400
***************
*** 57,73 ****
          return;
  
      missed_ticks = missed_ticks / (s_time_t) pt->period + 1;
! 
!     if ( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) ) {
!         if ( missed_ticks > 1000 )
!         {
!             /* TODO: Adjust guest time together */
!             pt->pending_intr_nr++;
!         }
!         else
!         {
!             pt->pending_intr_nr += missed_ticks;
!         }
      }
  
      pt->scheduled += missed_ticks * pt->period;
--- 57,70 ----
          return;
  
      missed_ticks = missed_ticks / (s_time_t) pt->period + 1;
!     if ( missed_ticks > 1000 )
!     {
!         /* TODO: Adjust guest time together */
!         pt->pending_intr_nr++;
!     }
!     else
!     {
!         pt->pending_intr_nr += missed_ticks;
      }
  
      pt->scheduled += missed_ticks * pt->period;
***************
*** 120,126 ****
  
      list_for_each_entry ( pt, head, list )
      {
!         pt_process_missed_ticks(pt);
          set_timer(&pt->timer, pt->scheduled);
      }
  
--- 117,124 ----
  
      list_for_each_entry ( pt, head, list )
      {
!         if ( !mode_is(v->domain, no_missed_tick_accounting) )
!             pt_process_missed_ticks(pt);
          set_timer(&pt->timer, pt->scheduled);
      }
  
***************
*** 135,151 ****
  
      pt_lock(pt);
  
!     if ( mode_is(pt->vcpu->domain, no_missed_tick_accounting) ) {
!         if ( !pt->pending_intr_nr )
!             pt->pending_intr_nr++;
!     }
!     else
!         pt->pending_intr_nr++;
  
      if ( !pt->one_shot )
      {
          pt->scheduled += pt->period;
!         pt_process_missed_ticks(pt);
          set_timer(&pt->timer, pt->scheduled);
      }
  
--- 133,152 ----
  
      pt_lock(pt);
  
!     pt->pending_intr_nr++;
  
      if ( !pt->one_shot )
      {
          pt->scheduled += pt->period;
!         if ( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) )
!         {
!             pt_process_missed_ticks(pt);
!         }
!         else if ( (NOW() - pt->scheduled) >= 0 )
!         {
!             pt->pending_intr_nr++;
!             pt->scheduled = NOW() + pt->period;
!         }
          set_timer(&pt->timer, pt->scheduled);
      }
  
/*
* vpt.c: Virtual Platform Timer
*
* Copyright (c) 2006, Xiaowei Yang, Intel Corporation.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc., 59 Temple
* Place - Suite 330, Boston, MA 02111-1307 USA.
*
*/

#include <xen/time.h>
#include <asm/hvm/support.h>
#include <asm/hvm/vpt.h>
#include <asm/event.h>

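/* True iff domain d's HVM timer-mode parameter is set to HVMPTM_<name>. */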
#define mode_is(d, name) \
   ((d)->arch.hvm_domain.params[HVM_PARAM_TIMER_MODE] == HVMPTM_##name)

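/*
 * pt->vcpu may change while we wait for the lock, so retry until the
 * vcpu whose tm_lock we hold is still the timer's owner.
 */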
static void pt_lock(struct periodic_time *pt)
{
   struct vcpu *v;

   for ( ; ; )
   {
       v = pt->vcpu;
       spin_lock(&v->arch.hvm_vcpu.tm_lock);
       if ( likely(pt->vcpu == v) )
           break;
       spin_unlock(&v->arch.hvm_vcpu.tm_lock);
   }
}

static void pt_unlock(struct periodic_time *pt)
{
   spin_unlock(&pt->vcpu->arch.hvm_vcpu.tm_lock);
}

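/*
 * Count how many tick periods have elapsed past pt->scheduled and queue
 * them for injection (except in no_missed_tick_accounting mode).
 * Advancing pt->scheduled by a whole number of periods keeps expirations
 * on the t=0 + n*period grid described above, avoiding rounding errors.
 */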
static void pt_process_missed_ticks(struct periodic_time *pt)
{
   s_time_t missed_ticks;

   if ( pt->one_shot )
       return;

   missed_ticks = NOW() - pt->scheduled;
   if ( missed_ticks <= 0 )
       return;

   missed_ticks = missed_ticks / (s_time_t) pt->period + 1;

   if ( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) )
   {
       if ( missed_ticks > 1000 )
       {
           /* TODO: Adjust guest time together */
           pt->pending_intr_nr++;
       }
       else
       {
           pt->pending_intr_nr += missed_ticks;
       }
   }

   pt->scheduled += missed_ticks * pt->period;
}

static void pt_freeze_time(struct vcpu *v)
{
   if ( !mode_is(v->domain, delay_for_missed_ticks) )
       return;

   v->arch.hvm_vcpu.guest_time = hvm_get_guest_time(v);
}

static void pt_thaw_time(struct vcpu *v)
{
   if ( !mode_is(v->domain, delay_for_missed_ticks) )
       return;

   if ( v->arch.hvm_vcpu.guest_time == 0 )
       return;

   hvm_set_guest_time(v, v->arch.hvm_vcpu.guest_time);
   v->arch.hvm_vcpu.guest_time = 0;
}

void pt_save_timer(struct vcpu *v)
{
   struct list_head *head = &v->arch.hvm_vcpu.tm_list;
   struct periodic_time *pt;

   if ( test_bit(_VPF_blocked, &v->pause_flags) )
       return;

   spin_lock(&v->arch.hvm_vcpu.tm_lock);

   list_for_each_entry ( pt, head, list )
       stop_timer(&pt->timer);

   pt_freeze_time(v);

   spin_unlock(&v->arch.hvm_vcpu.tm_lock);
}

void pt_restore_timer(struct vcpu *v)
{
   struct list_head *head = &v->arch.hvm_vcpu.tm_list;
   struct periodic_time *pt;

   spin_lock(&v->arch.hvm_vcpu.tm_lock);

   list_for_each_entry ( pt, head, list )
   {
       pt_process_missed_ticks(pt);
       set_timer(&pt->timer, pt->scheduled);
   }

   pt_thaw_time(v);

   spin_unlock(&v->arch.hvm_vcpu.tm_lock);
}

static void pt_timer_fn(void *data)
{
   struct periodic_time *pt = data;

   pt_lock(pt);

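   /*
    * In no_missed_tick_accounting mode, never queue more than one pending
    * tick: the guest makes up lost time itself, e.g. after running with
    * interrupts disabled for longer than a tick period (see comment 2
    * above).
    */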
   if ( mode_is(pt->vcpu->domain, no_missed_tick_accounting) )
   {
       if ( !pt->pending_intr_nr )
           pt->pending_intr_nr++;
   }
   else
       pt->pending_intr_nr++;

   if ( !pt->one_shot )
   {
       pt->scheduled += pt->period;
       pt_process_missed_ticks(pt);
       set_timer(&pt->timer, pt->scheduled);
   }

   vcpu_kick(pt->vcpu);

   pt_unlock(pt);
}

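/*
 * Among unmasked timers with interrupts pending, pick the one whose next
 * delivery time (last delivered tick plus one period) is earliest, and
 * assert its irq.
 */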
void pt_update_irq(struct vcpu *v)
{
   struct list_head *head = &v->arch.hvm_vcpu.tm_list;
   struct periodic_time *pt;
   uint64_t max_lag = -1ULL;
   int irq = -1;

   spin_lock(&v->arch.hvm_vcpu.tm_lock);

   list_for_each_entry ( pt, head, list )
   {
       if ( !is_isa_irq_masked(v, pt->irq) && pt->pending_intr_nr &&
            ((pt->last_plt_gtime + pt->period_cycles) < max_lag) )
       {
           max_lag = pt->last_plt_gtime + pt->period_cycles;
           irq = pt->irq;
       }
   }

   spin_unlock(&v->arch.hvm_vcpu.tm_lock);

   if ( is_lvtt(v, irq) )
   {
       vlapic_set_irq(vcpu_vlapic(v), irq, 0);
   }
   else if ( irq >= 0 )
   {
       hvm_isa_irq_deassert(v->domain, irq);
       hvm_isa_irq_assert(v->domain, irq);
   }
}

static struct periodic_time *is_pt_irq(
   struct vcpu *v, struct hvm_intack intack)
{
   struct list_head *head = &v->arch.hvm_vcpu.tm_list;
   struct periodic_time *pt;
   struct RTCState *rtc = &v->domain->arch.hvm_domain.pl_time.vrtc;
   int vector;

   list_for_each_entry ( pt, head, list )
   {
       if ( !pt->pending_intr_nr )
           continue;

       if ( is_lvtt(v, pt->irq) )
       {
           if ( pt->irq != intack.vector )
               continue;
           return pt;
       }

       vector = get_isa_irq_vector(v, pt->irq, intack.source);

       /* RTC irq need special care */
       if ( (intack.vector != vector) ||
            ((pt->irq == 8) && !is_rtc_periodic_irq(rtc)) )
           continue;

       return pt;
   }

   return NULL;
}

void pt_intr_post(struct vcpu *v, struct hvm_intack intack)
{
   struct periodic_time *pt;
   time_cb *cb;
   void *cb_priv;

   spin_lock(&v->arch.hvm_vcpu.tm_lock);

   pt = is_pt_irq(v, intack);
   if ( pt == NULL )
   {
       spin_unlock(&v->arch.hvm_vcpu.tm_lock);
       return;
   }

   if ( pt->one_shot )
   {
       pt->enabled = 0;
       list_del(&pt->list);
   }
   else
   {
       pt->pending_intr_nr--;
       if ( mode_is(v->domain, no_missed_tick_accounting) )
           pt->last_plt_gtime = hvm_get_guest_time(v);
       else
           pt->last_plt_gtime += pt->period_cycles;
   }

   if ( mode_is(v->domain, delay_for_missed_ticks) &&
        (hvm_get_guest_time(v) < pt->last_plt_gtime) )
       hvm_set_guest_time(v, pt->last_plt_gtime);

   cb = pt->cb;
   cb_priv = pt->priv;

   spin_unlock(&v->arch.hvm_vcpu.tm_lock);

   if ( cb != NULL )
       cb(v, cb_priv);
}

void pt_reset(struct vcpu *v)
{
   struct list_head *head = &v->arch.hvm_vcpu.tm_list;
   struct periodic_time *pt;

   spin_lock(&v->arch.hvm_vcpu.tm_lock);

   list_for_each_entry ( pt, head, list )
   {
       pt->pending_intr_nr = 0;
       pt->last_plt_gtime = hvm_get_guest_time(pt->vcpu);
       pt->scheduled = NOW() + pt->period;
       set_timer(&pt->timer, pt->scheduled);
   }

   spin_unlock(&v->arch.hvm_vcpu.tm_lock);
}

void pt_migrate(struct vcpu *v)
{
   struct list_head *head = &v->arch.hvm_vcpu.tm_list;
   struct periodic_time *pt;

   spin_lock(&v->arch.hvm_vcpu.tm_lock);

   list_for_each_entry ( pt, head, list )
       migrate_timer(&pt->timer, v->processor);

   spin_unlock(&v->arch.hvm_vcpu.tm_lock);
}

void create_periodic_time(
   struct vcpu *v, struct periodic_time *pt, uint64_t period,
   uint8_t irq, char one_shot, time_cb *cb, void *data)
{
   destroy_periodic_time(pt);

   spin_lock(&v->arch.hvm_vcpu.tm_lock);

   pt->enabled = 1;
   pt->pending_intr_nr = 0;

   /* Periodic timer must be at least 0.9ms. */
   if ( (period < 900000) && !one_shot )
   {
       gdprintk(XENLOG_WARNING,
                "HVM_PlatformTime: program too small period %"PRIu64"\n",
                period);
       period = 900000;
   }

   pt->period = period;
   pt->vcpu = v;
   pt->last_plt_gtime = hvm_get_guest_time(pt->vcpu);
   pt->irq = irq;
   pt->period_cycles = (u64)period * cpu_khz / 1000000L;
   pt->one_shot = one_shot;
   pt->scheduled = NOW() + period;
   /*
    * Offset LAPIC ticks from other timer ticks. Otherwise guests which use
    * LAPIC ticks for process accounting can see long sequences of process
    * ticks incorrectly accounted to interrupt processing.
    */
   if ( is_lvtt(v, irq) )
       pt->scheduled += period >> 1;
   pt->cb = cb;
   pt->priv = data;

   list_add(&pt->list, &v->arch.hvm_vcpu.tm_list);

   init_timer(&pt->timer, pt_timer_fn, pt, v->processor);
   set_timer(&pt->timer, pt->scheduled);

   spin_unlock(&v->arch.hvm_vcpu.tm_lock);
}

void destroy_periodic_time(struct periodic_time *pt)
{
   if ( !pt->enabled )
       return;

   pt_lock(pt);
   pt->enabled = 0;
   list_del(&pt->list);
   pt_unlock(pt);

   /*
    * pt_timer_fn() can run until this kill_timer() returns. We must do this
    * outside pt_lock() otherwise we can deadlock with pt_timer_fn().
    */
   kill_timer(&pt->timer);
}



diff -r 2717128cbdd1 xen/arch/x86/hvm/vpt.c
--- a/xen/arch/x86/hvm/vpt.c    Wed Oct 31 10:07:42 2007 +0000
+++ b/xen/arch/x86/hvm/vpt.c    Thu Nov 01 16:19:06 2007 -0400
@@ -57,14 +57,23 @@ static void pt_process_missed_ticks(stru
         return;
 
     missed_ticks = missed_ticks / (s_time_t) pt->period + 1;
-    if ( missed_ticks > 1000 )
-    {
-        /* TODO: Adjust guest time together */
-        pt->pending_intr_nr++;
-    }
-    else
-    {
-        pt->pending_intr_nr += missed_ticks;
+
+    if( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) ) {
+       /* When many guests are stacked on a node and the total number of vcpus
+        * running loads is 10 times the number of physical processors,
+        * scheduling delays of 5 seconds are common. So set the threshold at
+        * 100 seconds to be safe. It would be reasonable to remove this
+        * check against the threshold completely.
+        */
+       if ( missed_ticks > 100000 )
+       {
+           /* TODO: Adjust guest time together */
+           pt->pending_intr_nr++;
+       }
+       else
+       {
+           pt->pending_intr_nr += missed_ticks;
+       }
     }
 
     pt->scheduled += missed_ticks * pt->period;
@@ -117,15 +126,7 @@ void pt_restore_timer(struct vcpu *v)
 
     list_for_each_entry ( pt, head, list )
     {
-        if ( !mode_is(v->domain, no_missed_tick_accounting) )
-        {
-            pt_process_missed_ticks(pt);
-        }
-        else if ( (NOW() - pt->scheduled) >= 0 )
-        {
-            pt->pending_intr_nr++;
-            pt->scheduled = NOW() + pt->period;
-        }
+        pt_process_missed_ticks(pt);
         set_timer(&pt->timer, pt->scheduled);
     }
 
@@ -140,13 +141,14 @@ static void pt_timer_fn(void *data)
 
     pt_lock(pt);
 
-    pt->pending_intr_nr++;
+    if ( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) || !pt->pending_intr_nr ) {
+        pt->pending_intr_nr++;
+    }
 
     if ( !pt->one_shot )
     {
         pt->scheduled += pt->period;
-        if ( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) )
-            pt_process_missed_ticks(pt);
+        pt_process_missed_ticks(pt);
         set_timer(&pt->timer, pt->scheduled);
     }
 