This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
Home Products Support Community News


[Xen-devel] time accounting problem in pvops kernel

To: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-devel] time accounting problem in pvops kernel
From: Paolo Bonzini <pbonzini@xxxxxxxxxx>
Date: Tue, 17 Aug 2010 19:29:41 +0200
Cc: Glauber Costa <glommer@xxxxxxxxxx>
Delivery-date: Tue, 17 Aug 2010 10:33:07 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv: Gecko/20100621 Fedora/3.0.5-1.fc13 Lightning/1.0b2pre Mnenhy/0.8.3 Thunderbird/3.0.5

while experimenting a bit with time.c we found a bug in time
accounting.  Basically, /proc/stat counts idle time twice for PV guests
running a pvops kernel.

To reproduce, try this command in an unloaded guest:

grep cpu0 /proc/stat; sleep 20 ; grep cpu0 /proc/stat

and see the fourth number in /proc/stat (idle) increasing by approximately
4000 for a kernel with USER_HZ == 100. Instead, if you try these commands
instead (you need an otherwise unloaded machine for these):

grep cpu0 /proc/stat; timeout 20s yes > /dev/null ; grep cpu0 /proc/stat
grep cpu0 /proc/stat; timeout 20s dd if=/dev/urandom > /dev/null ; grep cpu0 

the first and third number in the /cpu/stat increase instead by 2000 only.

The reason for this seems to be that in xen_timer_interrupt Linux's
normal timer accounting is called (via evt->event_handler) and this
calls account_idle_time. However, idle ticks are also added from
do_stolen_accounting, so that overall they're counted twice.

Related to this, it looks like stolen tick accounting is subtly
wrong. Even if only part of a tick is stolen by the hypervisor, Linux's
time accounting will add a whole tick to the user/system/idle time. In
a dynticks kernel (or maybe even if the scheduling quanta have some
kind of resonance with the guest's timer interrupt?) the sum of the
four components user+sys+idle+steal will then be larger than the wall
time. In fact, I found experimentally steal time to be usually 20%
off from wall-user-sys-idle when the machine is under moderate load
(e.g. 5 domains at 100% CPU usage, on a 4-CPU machine). Of course I used
the correct, divided-by-2 idle time to do this computation.


Xen-devel mailing list

<Prev in Thread] Current Thread [Next in Thread>