This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


Re: [Xen-devel] time accounting problem in pvops kernel

To: Paolo Bonzini <pbonzini@xxxxxxxxxx>
Subject: Re: [Xen-devel] time accounting problem in pvops kernel
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Tue, 17 Aug 2010 15:51:34 -0700
Cc: Glauber Costa <glommer@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <4C6AC705.1080904@xxxxxxxxxx>
References: <4C6AC705.1080904@xxxxxxxxxx>
On 08/17/2010 10:29 AM, Paolo Bonzini wrote:
> Hi,
> while experimenting a bit with time.c we found a bug in time
> accounting.  Basically, /proc/stat counts idle time twice for PV guests
> running a pvops kernel.

What version?  Upstream and stable kernels contain the changeset "xen:
drop xen_sched_clock in favour of using plain wallclock time" which
should fix a lot of timekeeping/scheduling problems.

> To reproduce, try this command in an unloaded guest:
> grep cpu0 /proc/stat; sleep 20 ; grep cpu0 /proc/stat
> and see the fourth number in /proc/stat (idle) increasing by approximately
> 4000 for a kernel with USER_HZ == 100. If you instead try these commands
> (you need an otherwise unloaded machine for these):
> grep cpu0 /proc/stat; timeout 20s yes > /dev/null ; grep cpu0 /proc/stat
> grep cpu0 /proc/stat; timeout 20s dd if=/dev/urandom > /dev/null ; grep cpu0 /proc/stat
> the first and third numbers in /proc/stat increase by only 2000.
> The reason for this seems to be that in xen_timer_interrupt Linux's
> normal timer accounting is called (via evt->event_handler) and this
> calls account_idle_time. However, idle ticks are also added from
> do_stolen_accounting, so that overall they're counted twice.
> Related to this, it looks like stolen tick accounting is subtly
> wrong. Even if only part of a tick is stolen by the hypervisor, Linux's
> time accounting will add a whole tick to the user/system/idle time. In
> a dynticks kernel (or maybe even if the scheduling quanta have some
> kind of resonance with the guest's timer interrupt?) the sum of the
> four components user+sys+idle+steal will then be larger than the wall
> time. In fact, I found experimentally steal time to be usually 20%
> off from wall-user-sys-idle when the machine is under moderate load
> (e.g. 5 domains at 100% CPU usage, on a 4-CPU machine). Of course I used
> the correct, divided-by-2 idle time to do this computation.
> Paolo
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
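
For anyone repeating the measurement quoted above, a small script makes the before/after comparison less error-prone than reading the numbers by eye. This is only a sketch: the function names are made up for illustration, the field order is the usual /proc/stat layout (user, nice, system, idle, iowait, irq, softirq, steal), and the two sample lines are invented numbers that mimic the doubled idle count, not captured output.

```python
# Field order of a "cpuN" line in /proc/stat (assumed standard layout).
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]

def parse_cpu_line(line):
    """Turn a 'cpuN ...' line into a dict of tick counters (hypothetical helper)."""
    values = [int(v) for v in line.split()[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))

def tick_deltas(before_line, after_line):
    """Per-field difference between two samples of the same cpu line."""
    before = parse_cpu_line(before_line)
    after = parse_cpu_line(after_line)
    return {name: after[name] - before[name] for name in before}

def accounted_ratio(before_line, after_line, seconds, user_hz=100):
    """Ratio of accounted ticks to the ticks expected from wall time.

    A value near 1.0 means the counters add up; near 2.0 for an idle
    guest would match the double counting described in the report.
    """
    deltas = tick_deltas(before_line, after_line)
    return sum(deltas.values()) / (seconds * user_hz)

# Invented sample lines, 20 seconds apart on an idle guest.
before = "cpu0 100 0 50 10000 0 0 0 0"
after = "cpu0 100 0 50 14000 0 0 0 0"
print(tick_deltas(before, after)["idle"])
print(accounted_ratio(before, after, seconds=20))
```

With real data, the two lines would come from the "grep cpu0 /proc/stat" commands run before and after the sleep.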

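The second point in the report (whole ticks charged to user/system/idle even when part of the tick was stolen) can be illustrated with a toy model. This is not the kernel code, just a sketch under simplifying assumptions: every timer tick charges one full tick to the running category, and stolen nanoseconds are accumulated and converted to whole steal ticks with the remainder carried over. The function name and parameters are hypothetical.

```python
def account(n_ticks, stolen_ns_per_tick, tick_ns=10_000_000):
    """Toy model of tick accounting with partially stolen ticks.

    Assumptions (not taken from the kernel source):
    - each timer tick adds one whole tick to user/system/idle ("running"),
    - stolen time accumulates in nanoseconds and is charged as steal
      in whole ticks, carrying the remainder to the next tick.
    Returns (running_ticks, steal_ticks).
    """
    running = 0
    steal = 0
    residual_ns = 0
    for _ in range(n_ticks):
        running += 1                      # a full tick is charged regardless
        residual_ns += stolen_ns_per_tick
        steal += residual_ns // tick_ns   # whole stolen ticks so far
        residual_ns %= tick_ns            # keep the fractional part
    return running, steal

# 20 seconds at 100 Hz with 2 ms of every 10 ms tick stolen.
running, steal = account(2000, 2_000_000)
print(running, steal, running + steal)
```

In this model the sum of the components is 2400 ticks against 2000 ticks of wall time, which is the kind of excess the report describes. The real behaviour depends on how timer interrupts are delivered while the VCPU is descheduled, so treat the numbers as illustrative only.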