>>> On 18.10.11 at 22:42, Laszlo Ersek <lersek@xxxxxxxxxx> wrote:
> ... because the "clock_event_device framework" already accounts for idle
> time through the "event_handler" function pointer in
> xen_timer_interrupt().
>
> The patch is intended as the completion of [1]. It should fix the double
> idle times seen in PV guests' /proc/stat [2]. It should be orthogonal to
> stolen time accounting (the removed code seems to be isolated).
After some more looking around I still think it's incorrect, albeit for
a different reason: What tick_nohz_restart_sched_tick() accounts
as idle time is *all* time that passed while in cpu_idle(). What gets
accounted in do_stolen_accounting() (without your patch) is
different:
- time the vCPU was in RUNSTATE_blocked gets accounted as idle
- time the vCPU was in RUNSTATE_runnable and RUNSTATE_offline
gets accounted as stolen.
That is, on an overcommitted system (and without your patch) I
would expect you not to see the (full) double idle increment for a
vCPU that is neither fully idle nor fully loaded.
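To make the contrast concrete, the pre-patch mapping in
do_stolen_accounting() boils down to roughly the following (condensed
from the code quoted below; ns_to_ticks() is merely shorthand for the
iter_div_u64_rem()/residual handling of the real function, not an
actual helper):

	s64 blocked  = state.time[RUNSTATE_blocked]  - snap->time[RUNSTATE_blocked];
	s64 runnable = state.time[RUNSTATE_runnable] - snap->time[RUNSTATE_runnable];
	s64 offline  = state.time[RUNSTATE_offline]  - snap->time[RUNSTATE_offline];

	/* runnable + offline -> steal, blocked -> idle */
	account_steal_ticks(ns_to_ticks(runnable + offline));
	account_idle_ticks(ns_to_ticks(blocked));

whereas tick_nohz_restart_sched_tick() accounts the whole idle
residency via account_idle_ticks(), regardless of how much of it Xen
classified as blocked vs. stolen.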
Now we can of course say that for most purposes it's fine to
ignore the difference between idle and steal time, but there must
be a reason these have been getting accounted separately in Linux
without regard to Xen.
So I really think that what needs to be suppressed to address this
is tick_nohz_restart_sched_tick()'s call to account_idle_ticks(). But
simply forcing CONFIG_VIRT_CPU_ACCOUNTING is not an immediate
option (not even when considering a Xen-only kernel), as that has
further implications. Instead I wonder whether under Xen we should
e.g. update per_cpu(tick_cpu_sched, cpu).idle_jiffies prior to calling
tick_nohz_restart_sched_tick() (possibly implicitly by doing the
update in do_stolen_accounting(), though that wouldn't work when
the wakeup is due to an interrupt other than the timer one).
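Just to illustrate the idea (the helper name below is made up, and
since tick_cpu_sched is static to kernel/time/tick-sched.c a real
patch would have to add something like it there rather than poke at
the structure from Xen code):

	/* Hypothetical helper in kernel/time/tick-sched.c.  Resetting
	 * idle_jiffies to "now" makes the jiffies - ts->idle_jiffies delta
	 * computed by tick_nohz_restart_sched_tick() come out as zero, so
	 * its account_idle_ticks() call no longer re-accounts the blocked
	 * period.
	 */
	void tick_nohz_drop_idle_jiffies(void)
	{
		struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);

		ts->idle_jiffies = jiffies;
	}

This would have to be called from the Xen wakeup path before the tick
gets restarted, with the caveat mentioned above that doing it from
do_stolen_accounting() would only cover timer wakeups.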
The alternative of accounting the negative of the steal time as
idle time in do_stolen_accounting() doesn't look correct either,
since not all stolen time was necessarily accounted as idle (some
of it would have been accounted, equally incorrectly, as process or
system time).
So my conclusion is that only a full implementation of what
CONFIG_VIRT_CPU_ACCOUNTING expects would actually get
things sorted out properly (but that would require Xen *and* native
to be willing to pay the performance price, or the enabling of this
to be made dynamic so that at least native could remain
unaffected).
Or the negative stolen time should be accounted to the current
process, to the system, or as idle, depending on the context in which
do_stolen_accounting() runs (just like timer_interrupt() did in
the XenoLinux kernels for positive accounting).
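A rough sketch of that last variant (purely illustrative; it reuses
the periodic tick's dispatcher rather than open-coding the
user/system/idle split the way the XenoLinux timer_interrupt() did,
and 'ticks' stands for the tick count derived from the negative
stolen time, converted the same way the positive stolen time is for
account_steal_ticks()):

	struct pt_regs *regs = get_irq_regs();
	int user = regs && user_mode(regs);

	/* Credit each tick to user, system or idle time depending on what
	 * the vCPU was interrupted in, rather than unconditionally to idle.
	 */
	while (ticks--)
		account_process_tick(current, user);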
Jan
> The approach may be completely misguided.
>
> [1] https://lkml.org/lkml/2011/10/6/10
> [2] http://lists.xensource.com/archives/html/xen-devel/2010-08/msg01068.html
>
> Signed-off-by: Laszlo Ersek <lersek@xxxxxxxxxx>
> ---
> arch/x86/xen/time.c | 17 ++---------------
> 1 files changed, 2 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
> index 163b467..377f6ae 100644
> --- a/arch/x86/xen/time.c
> +++ b/arch/x86/xen/time.c
> @@ -36,9 +36,8 @@ static DEFINE_PER_CPU(struct vcpu_runstate_info, xen_runstate);
> /* snapshots of runstate info */
> static DEFINE_PER_CPU(struct vcpu_runstate_info, xen_runstate_snapshot);
>
> -/* unused ns of stolen and blocked time */
> +/* unused ns of stolen time */
> static DEFINE_PER_CPU(u64, xen_residual_stolen);
> -static DEFINE_PER_CPU(u64, xen_residual_blocked);
>
> /* return a consistent snapshot of 64-bit time/counter value */
> static u64 get64(const u64 *p)
> @@ -115,7 +114,7 @@ static void do_stolen_accounting(void)
> {
> struct vcpu_runstate_info state;
> struct vcpu_runstate_info *snap;
> - s64 blocked, runnable, offline, stolen;
> + s64 runnable, offline, stolen;
> cputime_t ticks;
>
> get_runstate_snapshot(&state);
> @@ -125,7 +124,6 @@ static void do_stolen_accounting(void)
> snap = &__get_cpu_var(xen_runstate_snapshot);
>
> /* work out how much time the VCPU has not been runn*ing* */
> - blocked = state.time[RUNSTATE_blocked] - snap->time[RUNSTATE_blocked];
> runnable = state.time[RUNSTATE_runnable] - snap->time[RUNSTATE_runnable];
> offline = state.time[RUNSTATE_offline] - snap->time[RUNSTATE_offline];
>
> @@ -141,17 +139,6 @@ static void do_stolen_accounting(void)
> ticks = iter_div_u64_rem(stolen, NS_PER_TICK, &stolen);
> __this_cpu_write(xen_residual_stolen, stolen);
> account_steal_ticks(ticks);
> -
> - /* Add the appropriate number of ticks of blocked time,
> - including any left-overs from last time. */
> - blocked += __this_cpu_read(xen_residual_blocked);
> -
> - if (blocked < 0)
> - blocked = 0;
> -
> - ticks = iter_div_u64_rem(blocked, NS_PER_TICK, &blocked);
> - __this_cpu_write(xen_residual_blocked, blocked);
> - account_idle_ticks(ticks);
> }
>
> /* Get the TSC speed from Xen */
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel