Hi Keir, Haitao, Eddie:
Here is the patch for timekeeping.
There are three components.
1. Increase the missed-ticks threshold to 100 seconds. Per the comment
in the patch, I have seen scheduling delays of 5 seconds in a test
lasting 30 minutes. The result is an immediate error of 5 seconds in
guest time that persists.
2. In pt_timer_fn, for the no_missed_tick_accounting option, only
increment pt->pending_intr_nr if it's found to be zero.
This deals with the case where the guest has interrupts disabled
for longer than a tick period.
3. Call pt_process_missed_ticks() unconditionally and place the
test for no_missed_tick_accounting inside pt_process_missed_ticks().
This returns the calculation of the next timer expiration
to the original method. The original method is important
because it sets up a theoretical time space, starting at t=0, in which
expirations occur only at n*period, where n is any integer. This, in
turn, avoids accumulating rounding errors. (A sketch of the arithmetic
follows this list.)
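To make the n*period property concrete, here is a minimal standalone
sketch of the expiry arithmetic, condensed from pt_process_missed_ticks()
in the attached vpt.c (the advance_expiry wrapper and the local s_time_t
typedef are only for illustration):

typedef long long s_time_t;    /* stand-in for Xen's s_time_t (ns) */

/* Advance an expiry that has fallen behind 'now' by whole periods.
 * Because 'scheduled' only ever moves in multiples of 'period', every
 * expiration stays on the n*period grid fixed when the timer was
 * created, so no fractional remainder can accumulate. */
static s_time_t advance_expiry(s_time_t now, s_time_t scheduled,
                               s_time_t period)
{
    s_time_t missed_ticks = now - scheduled;

    if ( missed_ticks <= 0 )
        return scheduled;                      /* not yet due */

    /* Whole periods missed, plus the tick that just became due. */
    missed_ticks = missed_ticks / period + 1;
    return scheduled + missed_ticks * period;
}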
Accuracy tests:
A. 32bit Linux guest (Item '1' above relevant)
Test: 2 64bit Linux guests and 1 32bit Linux guest stacked.
8 vcpus for each guest.
2 physical processors on the node.
All vcpus heavily loaded with work.
Test duration: 3 hours.
Result: clock error < .02%.
(Before this patch guest time was 16 seconds slow in a 3.5
minute test.)
B. 64bit Linux guest (Items '2' and '3' above relevant)
Test: 2 64bit Linux guests.
8 vcpus for each guest.
2 physical processors on the node.
All vcpus heavily loaded with work.
Test duration: 3 hours.
Result: clock error of .02%.
(Before this patch the clock error was 1.3%. The contributions of
items '2' and '3' above to the improvement in accuracy are .4%
and .9% respectively.)
Of course, these tests were run without ntpd in the guest.
Ntpd requires a local clock error of < .05%.
With .02% accuracy, one can use ntpd to synchronize the clock.
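For scale, here is a back-of-the-envelope check of what those error
rates mean over the 3-hour test (my arithmetic, not a measurement from
the tests above):

#include <stdio.h>

/* Drift in seconds = error rate * elapsed time. */
int main(void)
{
    double secs = 3.0 * 3600.0;                         /* 3-hour test */
    printf("drift at .02%%: %.1f s\n", 0.0002 * secs);  /* ~2.2 s */
    printf("drift at .05%%: %.1f s\n", 0.0005 * secs);  /* ~5.4 s */
    return 0;
}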
thanks,
Dave
Keir Fraser wrote:
I'm going to apply Haitao's no_missed_ticks_fix.patch. If you think fixes
are needed apart from that, please provide a unified diff.
-- Keir
On 30/10/07 21:16, "Dave Winchell" <dwinchell@xxxxxxxxxxxxxxx> wrote:
Keir,
Here are my comments on your change.
I've attached an updated vpt.c with the changes.
1. For the no_missed_tick_accounting method, we still need the update
to pt->scheduled that accounts for the time that has elapsed when
missed ticks are calculated. missed_ticks (pt_process_missed_ticks)
is called from pt_restore_timer() and pt_timer_fn() and, thus, it's
easiest to put the check for the no_missed_tick_accounting method in
missed_ticks() itself.
2. In pt_timer_fn you don't want to increment pending_intr_nr beyond 1
for the no_missed_tick_accounting option. (Both changes are sketched
in condensed form below.)
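Condensed from the attached vpt.c, the two changes look like this
(relevant branches only, not complete functions):

/* (1) pt_process_missed_ticks(): in no_missed_tick_accounting mode,
 * skip the pending-interrupt accounting but still advance
 * pt->scheduled by the missed whole periods (the >1000 burst cap is
 * omitted here). */
if ( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) )
    pt->pending_intr_nr += missed_ticks;
pt->scheduled += missed_ticks * pt->period;

/* (2) pt_timer_fn(): never let pending_intr_nr grow past 1 in this
 * mode, e.g. when the guest runs with interrupts disabled for longer
 * than a tick period. */
if ( mode_is(pt->vcpu->domain, no_missed_tick_accounting) )
{
    if ( !pt->pending_intr_nr )
        pt->pending_intr_nr++;
}
else
    pt->pending_intr_nr++;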
thanks,
Dave
Keir Fraser wrote:
Applied as c/s 16274. Please take a look and make sure the mode works
as you expect.
-- Keir
On 30/10/07 14:28, "Shan, Haitao" <haitao.shan@xxxxxxxxx> wrote:
Hi, Keir,
This patch adds a new timer mode, in which no missed ticks are
accounted. This can be used with the latest x86_64 Linux guest, since
it can pick up missed ticks itself.
<<no_missed_ticks.patch>>
Best Regards
Haitao Shan
------------------------------------------------------------------------
*** vpt.c.new.c 2007-10-30 16:45:26.000000000 -0400
--- vpt.c 2007-10-30 15:30:57.000000000 -0400
***************
*** 57,73 ****
return;
missed_ticks = missed_ticks / (s_time_t) pt->period + 1;
!
! if( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) ) {
! if ( missed_ticks > 1000 )
! {
! /* TODO: Adjust guest time together */
! pt->pending_intr_nr++;
! }
! else
! {
! pt->pending_intr_nr += missed_ticks;
! }
}
pt->scheduled += missed_ticks * pt->period;
--- 57,70 ----
return;
missed_ticks = missed_ticks / (s_time_t) pt->period + 1;
! if ( missed_ticks > 1000 )
! {
! /* TODO: Adjust guest time together */
! pt->pending_intr_nr++;
! }
! else
! {
! pt->pending_intr_nr += missed_ticks;
}
pt->scheduled += missed_ticks * pt->period;
***************
*** 120,126 ****
list_for_each_entry ( pt, head, list )
{
! pt_process_missed_ticks(pt);
set_timer(&pt->timer, pt->scheduled);
}
--- 117,124 ----
list_for_each_entry ( pt, head, list )
{
! if ( !mode_is(v->domain, no_missed_tick_accounting) )
! pt_process_missed_ticks(pt);
set_timer(&pt->timer, pt->scheduled);
}
***************
*** 135,151 ****
pt_lock(pt);
! if (mode_is(pt->vcpu->domain, no_missed_tick_accounting)) {
! if(!pt->pending_intr_nr)
! pt->pending_intr_nr++;
! }
! else
! pt->pending_intr_nr++;
if ( !pt->one_shot )
{
pt->scheduled += pt->period;
! pt_process_missed_ticks(pt);
set_timer(&pt->timer, pt->scheduled);
}
--- 133,152 ----
pt_lock(pt);
! pt->pending_intr_nr++;
if ( !pt->one_shot )
{
pt->scheduled += pt->period;
! if ( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) )
! {
! pt_process_missed_ticks(pt);
! }
! else if ( (NOW() - pt->scheduled) >= 0 )
! {
! pt->pending_intr_nr++;
! pt->scheduled = NOW() + pt->period;
! }
set_timer(&pt->timer, pt->scheduled);
}
/*
* vpt.c: Virtual Platform Timer
*
* Copyright (c) 2006, Xiaowei Yang, Intel Corporation.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc., 59 Temple
* Place - Suite 330, Boston, MA 02111-1307 USA.
*
*/
#include <xen/time.h>
#include <asm/hvm/support.h>
#include <asm/hvm/vpt.h>
#include <asm/event.h>
#define mode_is(d, name) \
((d)->arch.hvm_domain.params[HVM_PARAM_TIMER_MODE] == HVMPTM_##name)
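/*
 * Take the lock of the vcpu that currently owns this timer. pt->vcpu
 * can change while we wait for the lock (e.g. on timer migration), so
 * re-check it after acquiring the lock and retry if it moved.
 */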
static void pt_lock(struct periodic_time *pt)
{
struct vcpu *v;
for ( ; ; )
{
v = pt->vcpu;
spin_lock(&v->arch.hvm_vcpu.tm_lock);
if ( likely(pt->vcpu == v) )
break;
spin_unlock(&v->arch.hvm_vcpu.tm_lock);
}
}
static void pt_unlock(struct periodic_time *pt)
{
spin_unlock(&pt->vcpu->arch.hvm_vcpu.tm_lock);
}
static void pt_process_missed_ticks(struct periodic_time *pt)
{
s_time_t missed_ticks;
if ( pt->one_shot )
return;
missed_ticks = NOW() - pt->scheduled;
if ( missed_ticks <= 0 )
return;
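/* Whole periods elapsed since the last scheduled expiry, plus the
 * tick that just became due; pt->scheduled thus stays on the n*period
 * grid established when the timer was created. */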
missed_ticks = missed_ticks / (s_time_t) pt->period + 1;
if( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) ) {
if ( missed_ticks > 1000 )
{
/* TODO: Adjust guest time together */
pt->pending_intr_nr++;
}
else
{
pt->pending_intr_nr += missed_ticks;
}
}
pt->scheduled += missed_ticks * pt->period;
}
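/*
 * In delay_for_missed_ticks mode, guest time is frozen while the vcpu
 * is descheduled and restored when it runs again, so guest time falls
 * behind real time instead of jumping forward.
 */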
static void pt_freeze_time(struct vcpu *v)
{
if ( !mode_is(v->domain, delay_for_missed_ticks) )
return;
v->arch.hvm_vcpu.guest_time = hvm_get_guest_time(v);
}
static void pt_thaw_time(struct vcpu *v)
{
if ( !mode_is(v->domain, delay_for_missed_ticks) )
return;
if ( v->arch.hvm_vcpu.guest_time == 0 )
return;
hvm_set_guest_time(v, v->arch.hvm_vcpu.guest_time);
v->arch.hvm_vcpu.guest_time = 0;
}
void pt_save_timer(struct vcpu *v)
{
struct list_head *head = &v->arch.hvm_vcpu.tm_list;
struct periodic_time *pt;
if ( test_bit(_VPF_blocked, &v->pause_flags) )
return;
spin_lock(&v->arch.hvm_vcpu.tm_lock);
list_for_each_entry ( pt, head, list )
stop_timer(&pt->timer);
pt_freeze_time(v);
spin_unlock(&v->arch.hvm_vcpu.tm_lock);
}
void pt_restore_timer(struct vcpu *v)
{
struct list_head *head = &v->arch.hvm_vcpu.tm_list;
struct periodic_time *pt;
spin_lock(&v->arch.hvm_vcpu.tm_lock);
list_for_each_entry ( pt, head, list )
{
pt_process_missed_ticks(pt);
set_timer(&pt->timer, pt->scheduled);
}
pt_thaw_time(v);
spin_unlock(&v->arch.hvm_vcpu.tm_lock);
}
static void pt_timer_fn(void *data)
{
struct periodic_time *pt = data;
pt_lock(pt);
if (mode_is(pt->vcpu->domain, no_missed_tick_accounting)) {
if(!pt->pending_intr_nr)
pt->pending_intr_nr++;
}
else
pt->pending_intr_nr++;
if ( !pt->one_shot )
{
pt->scheduled += pt->period;
pt_process_missed_ticks(pt);
set_timer(&pt->timer, pt->scheduled);
}
vcpu_kick(pt->vcpu);
pt_unlock(pt);
}
void pt_update_irq(struct vcpu *v)
{
struct list_head *head = &v->arch.hvm_vcpu.tm_list;
struct periodic_time *pt;
uint64_t max_lag = -1ULL;
int irq = -1;
spin_lock(&v->arch.hvm_vcpu.tm_lock);
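/* Pick, among the timers with an interrupt pending, the one whose
 * next delivery deadline (last_plt_gtime + period_cycles) is
 * earliest. */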
list_for_each_entry ( pt, head, list )
{
if ( !is_isa_irq_masked(v, pt->irq) && pt->pending_intr_nr &&
((pt->last_plt_gtime + pt->period_cycles) < max_lag) )
{
max_lag = pt->last_plt_gtime + pt->period_cycles;
irq = pt->irq;
}
}
spin_unlock(&v->arch.hvm_vcpu.tm_lock);
if ( is_lvtt(v, irq) )
{
vlapic_set_irq(vcpu_vlapic(v), irq, 0);
}
else if ( irq >= 0 )
{
hvm_isa_irq_deassert(v->domain, irq);
hvm_isa_irq_assert(v->domain, irq);
}
}
static struct periodic_time *is_pt_irq(
struct vcpu *v, struct hvm_intack intack)
{
struct list_head *head = &v->arch.hvm_vcpu.tm_list;
struct periodic_time *pt;
struct RTCState *rtc = &v->domain->arch.hvm_domain.pl_time.vrtc;
int vector;
list_for_each_entry ( pt, head, list )
{
if ( !pt->pending_intr_nr )
continue;
if ( is_lvtt(v, pt->irq) )
{
if ( pt->irq != intack.vector )
continue;
return pt;
}
vector = get_isa_irq_vector(v, pt->irq, intack.source);
/* RTC irq need special care */
if ( (intack.vector != vector) ||
((pt->irq == 8) && !is_rtc_periodic_irq(rtc)) )
continue;
return pt;
}
return NULL;
}
void pt_intr_post(struct vcpu *v, struct hvm_intack intack)
{
struct periodic_time *pt;
time_cb *cb;
void *cb_priv;
spin_lock(&v->arch.hvm_vcpu.tm_lock);
pt = is_pt_irq(v, intack);
if ( pt == NULL )
{
spin_unlock(&v->arch.hvm_vcpu.tm_lock);
return;
}
if ( pt->one_shot )
{
pt->enabled = 0;
list_del(&pt->list);
}
else
{
pt->pending_intr_nr--;
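/* In no_missed_tick_accounting mode, re-anchor delivery time to the
 * current guest time (missed ticks are never made up); otherwise
 * advance by exactly one period. */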
if ( mode_is(v->domain, no_missed_tick_accounting) )
pt->last_plt_gtime = hvm_get_guest_time(v);
else
pt->last_plt_gtime += pt->period_cycles;
}
if ( mode_is(v->domain, delay_for_missed_ticks) &&
(hvm_get_guest_time(v) < pt->last_plt_gtime) )
hvm_set_guest_time(v, pt->last_plt_gtime);
cb = pt->cb;
cb_priv = pt->priv;
spin_unlock(&v->arch.hvm_vcpu.tm_lock);
if ( cb != NULL )
cb(v, cb_priv);
}
void pt_reset(struct vcpu *v)
{
struct list_head *head = &v->arch.hvm_vcpu.tm_list;
struct periodic_time *pt;
spin_lock(&v->arch.hvm_vcpu.tm_lock);
list_for_each_entry ( pt, head, list )
{
pt->pending_intr_nr = 0;
pt->last_plt_gtime = hvm_get_guest_time(pt->vcpu);
pt->scheduled = NOW() + pt->period;
set_timer(&pt->timer, pt->scheduled);
}
spin_unlock(&v->arch.hvm_vcpu.tm_lock);
}
void pt_migrate(struct vcpu *v)
{
struct list_head *head = &v->arch.hvm_vcpu.tm_list;
struct periodic_time *pt;
spin_lock(&v->arch.hvm_vcpu.tm_lock);
list_for_each_entry ( pt, head, list )
migrate_timer(&pt->timer, v->processor);
spin_unlock(&v->arch.hvm_vcpu.tm_lock);
}
void create_periodic_time(
struct vcpu *v, struct periodic_time *pt, uint64_t period,
uint8_t irq, char one_shot, time_cb *cb, void *data)
{
destroy_periodic_time(pt);
spin_lock(&v->arch.hvm_vcpu.tm_lock);
pt->enabled = 1;
pt->pending_intr_nr = 0;
/* Periodic timer must be at least 0.9ms. */
if ( (period < 900000) && !one_shot )
{
gdprintk(XENLOG_WARNING,
"HVM_PlatformTime: program too small period %"PRIu64"\n",
period);
period = 900000;
}
pt->period = period;
pt->vcpu = v;
pt->last_plt_gtime = hvm_get_guest_time(pt->vcpu);
pt->irq = irq;
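/* Convert the period from nanoseconds to guest TSC cycles. */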
pt->period_cycles = (u64)period * cpu_khz / 1000000L;
pt->one_shot = one_shot;
pt->scheduled = NOW() + period;
/*
* Offset LAPIC ticks from other timer ticks. Otherwise guests which use
* LAPIC ticks for process accounting can see long sequences of process
* ticks incorrectly accounted to interrupt processing.
*/
if ( is_lvtt(v, irq) )
pt->scheduled += period >> 1;
pt->cb = cb;
pt->priv = data;
list_add(&pt->list, &v->arch.hvm_vcpu.tm_list);
init_timer(&pt->timer, pt_timer_fn, pt, v->processor);
set_timer(&pt->timer, pt->scheduled);
spin_unlock(&v->arch.hvm_vcpu.tm_lock);
}
void destroy_periodic_time(struct periodic_time *pt)
{
if ( !pt->enabled )
return;
pt_lock(pt);
pt->enabled = 0;
list_del(&pt->list);
pt_unlock(pt);
/*
* pt_timer_fn() can run until this kill_timer() returns. We must do this
* outside pt_lock() otherwise we can deadlock with pt_timer_fn().
*/
kill_timer(&pt->timer);
}
diff -r 2717128cbdd1 xen/arch/x86/hvm/vpt.c
--- a/xen/arch/x86/hvm/vpt.c Wed Oct 31 10:07:42 2007 +0000
+++ b/xen/arch/x86/hvm/vpt.c Thu Nov 01 16:19:06 2007 -0400
@@ -57,14 +57,23 @@ static void pt_process_missed_ticks(stru
return;
missed_ticks = missed_ticks / (s_time_t) pt->period + 1;
- if ( missed_ticks > 1000 )
- {
- /* TODO: Adjust guest time together */
- pt->pending_intr_nr++;
- }
- else
- {
- pt->pending_intr_nr += missed_ticks;
+
+ if( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) ) {
+ /* When many guests are stacked on a node and the total number of vcpus
+ * running loads is 10 times the number of physical processors,
+ * scheduling delays of 5 seconds are common. So set the threshold at
+ * 100 seconds to be safe. It would be reasonable to remove this
+ * check against the threshold completely.
+ */
+ if ( missed_ticks > 100000 )
+ {
+ /* TODO: Adjust guest time together */
+ pt->pending_intr_nr++;
+ }
+ else
+ {
+ pt->pending_intr_nr += missed_ticks;
+ }
}
pt->scheduled += missed_ticks * pt->period;
@@ -117,15 +126,7 @@ void pt_restore_timer(struct vcpu *v)
list_for_each_entry ( pt, head, list )
{
- if ( !mode_is(v->domain, no_missed_tick_accounting) )
- {
- pt_process_missed_ticks(pt);
- }
- else if ( (NOW() - pt->scheduled) >= 0 )
- {
- pt->pending_intr_nr++;
- pt->scheduled = NOW() + pt->period;
- }
+ pt_process_missed_ticks(pt);
set_timer(&pt->timer, pt->scheduled);
}
@@ -140,13 +141,14 @@ static void pt_timer_fn(void *data)
pt_lock(pt);
- pt->pending_intr_nr++;
+ if ( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) ||
+ !pt->pending_intr_nr ) {
+ pt->pending_intr_nr++;
+ }
if ( !pt->one_shot )
{
pt->scheduled += pt->period;
- if ( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) )
- pt_process_missed_ticks(pt);
+ pt_process_missed_ticks(pt);
set_timer(&pt->timer, pt->scheduled);
}
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel