This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


RE: [Xen-devel] million cycle interrupt

To: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>, "Xen-Devel (E-mail)" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: RE: [Xen-devel] million cycle interrupt
From: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
Date: Mon, 13 Apr 2009 21:15:12 +0000 (GMT)
> You can instrument irq_enter() and irq_exit() to read TSC 

Rather than do this generically and ensure I get all the macros
correct (e.g. per_cpu, nesting) I manually instrumented three
likely suspect irq_enter/exit pairs, two in do_IRQ() and one
in smp_call_function().  ALL of them show an issue with max
readings in the 300K-1M range... with smp_call_function showing
the lowest max and the second in do_IRQ (the non-guest one)
showing readings over 1M (and the guest one at about 800K).
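
For readers who want to try something similar, here is a minimal sketch of
the kind of manual instrumentation described above: take a TSC reading on
each side of a code region and keep the max plus a coarse power-of-two
histogram of the deltas. Everything named here (irq_cycle_stats, read_tsc,
stats_record, the 60K threshold constant) is hypothetical and is not the
code that was actually added to Xen; a real version inside the hypervisor
would need per-CPU storage and would have to cope with nested interrupts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_BUCKETS      32      /* bucket i counts deltas in [2^i, 2^(i+1)); */
                                /* bucket 0 also takes 0, the last takes the rest */
#define LONG_IRQ_CYCLES 60000u  /* assumed threshold, taken from the 60K figure */

/* Hypothetical statistics for one instrumented irq_enter/irq_exit pair. */
struct irq_cycle_stats {
    uint64_t samples;
    uint64_t max_cycles;
    uint64_t over_threshold;
    uint64_t bucket[NR_BUCKETS];
};

/* Read the time-stamp counter (x86 only; returns 0 on other architectures). */
static inline uint64_t read_tsc(void)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    return 0;
#endif
}

/* Index of the highest set bit of cycles, clamped to the last bucket. */
static unsigned int bucket_index(uint64_t cycles)
{
    unsigned int i = 0;

    while (cycles > 1 && i < NR_BUCKETS - 1) {
        cycles >>= 1;
        i++;
    }
    return i;
}

static void stats_init(struct irq_cycle_stats *s)
{
    memset(s, 0, sizeof(*s));
}

/* Record one region, given the TSC values read at its start and end. */
static void stats_record(struct irq_cycle_stats *s, uint64_t start, uint64_t end)
{
    uint64_t delta;

    assert(s != NULL);
    delta = end - start;        /* unsigned subtraction tolerates TSC wrap */

    s->samples++;
    if (delta > s->max_cycles)
        s->max_cycles = delta;
    if (delta > LONG_IRQ_CYCLES)
        s->over_threshold++;
    s->bucket[bucket_index(delta)]++;
}
```

The intended use is to call read_tsc() right after the irq_enter() of
interest and again right before the matching irq_exit(), then pass both
values to stats_record(). Keeping one irq_cycle_stats per instrumented
pair makes it possible to compare sites, as in the three-way comparison
above.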

Interestingly, I get no readings at all over 60K when I
recompile with max_phys_cpus=4 (and with nosmp) on my
quad-core-by-two-thread machine.  This is versus several
readings over 60K nearly every second when max_phys_cpus=8.

> Otherwise who knows, it could even be system management mode

I suppose measuring irq_enter/exit pairs still doesn't rule
this out.  But the "large" interrupts don't seem to happen
(at least not nearly as frequently) with fewer physical
processors enabled, so sys mgmt mode seems unlikely.

Anyway, still a probable problem, still mostly a mystery
as to what is actually happening.  And, repeat, this has
nothing to do with tmem... I'm just observing it using
tmem as a convenient measurement tool.

> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
> Sent: Monday, April 13, 2009 2:24 AM
> To: Dan Magenheimer; Xen-Devel (E-mail)
> Subject: Re: [Xen-devel] million cycle interrupt
> On 12/04/2009 21:16, "Dan Magenheimer" <dan.magenheimer@xxxxxxxxxx> wrote:
> > Is a million cycles in an interrupt handler bad?  Any idea what
> > might be consuming this?  The evidence might imply more cpus
> > means longer interrupt, which bodes poorly for larger machines.
> > I tried disabling the timer rendezvous code (not positive I
> > was successful), but still got large measurements, and
> > eventually the machine froze up (but not before I observed
> > the stime skew climbing quickly to the millisecond-plus
> > range).
> You can instrument irq_enter() and irq_exit() to read TSC and find out the
> distribution of irq handling times for interruptions that Xen knows about.
> Otherwise who knows, it could even be system management mode on that
> particular box.
>  -- Keir
