>From: Dan Magenheimer
>Sent: April 14, 2009 5:15
>> You can instrument irq_enter() and irq_exit() to read TSC
>Rather than do this generically and ensure I get all the macros
>correct (e.g. per_cpu, nesting) I manually instrumented three
>likely suspect irq_enter/exit pairs, two in do_IRQ() and one
>in smp_call_function(). ALL of them show an issue with max
>readings in the 300K-1M range... with smp_call_function showing
>the lowest max and the second in do_IRQ (the non-guest one)
>showing readings over 1M (and the guest one at about 800K).
Since you have already instrumented the code around the call to the
actual action handler, why not go one step further and measure every
handler (serial, apic, vtd, ...)? You can start by simply printing
which handlers are registered or invoked on your platform. If only one
handler shows abnormally high latency, the problem is probably
specific to that handler. Otherwise you can suspect some common code
shared by all handlers, or ... as Keir said, it could be SMM. :-)
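Roughly what I have in mind, as a standalone sketch rather than actual
Xen code. Every name here (handler_stats, stats_record, and so on) is
made up for illustration, and the start/end values are assumed to come
from TSC reads taken immediately before and after the handler call.
The 60K threshold is just the figure mentioned in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_HANDLERS     16
#define THRESHOLD_CYCLES 60000ULL   /* the "60K" mark from this thread */

/* Hypothetical per-handler bookkeeping; not a Xen data structure. */
struct handler_stats {
    const char *name;           /* e.g. "serial", "apic", "vtd" */
    uint64_t calls;             /* number of invocations seen */
    uint64_t max_cycles;        /* worst single invocation */
    uint64_t over_threshold;    /* invocations above THRESHOLD_CYCLES */
};

static struct handler_stats stats[MAX_HANDLERS];

/* Register a handler slot under a printable name. */
static void stats_init(int idx, const char *name)
{
    assert(idx >= 0 && idx < MAX_HANDLERS);
    stats[idx].name = name;
    stats[idx].calls = 0;
    stats[idx].max_cycles = 0;
    stats[idx].over_threshold = 0;
}

/*
 * Record one invocation. 'start' and 'end' are assumed to be cycle
 * counter readings taken right before and right after the handler.
 */
static void stats_record(int idx, uint64_t start, uint64_t end)
{
    assert(idx >= 0 && idx < MAX_HANDLERS);
    uint64_t delta = end - start;
    struct handler_stats *s = &stats[idx];

    s->calls++;
    if (delta > s->max_cycles)
        s->max_cycles = delta;
    if (delta > THRESHOLD_CYCLES)
        s->over_threshold++;
}

/* Index of the handler with the largest max among the first n slots. */
static int stats_worst(int n)
{
    int worst = 0;
    for (int i = 1; i < n; i++)
        if (stats[i].max_cycles > stats[worst].max_cycles)
            worst = i;
    return worst;
}

/* How many of the first n handlers ever exceeded the threshold. */
static int stats_count_slow(int n)
{
    int slow = 0;
    for (int i = 0; i < n; i++)
        if (stats[i].over_threshold > 0)
            slow++;
    return slow;
}
```

If stats_count_slow() comes back as 1, stats_worst() points at the
handler to look into. If most or all handlers show up as slow, the
cost is more likely in common code or outside Xen's view entirely.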
>Interestingly, I get no readings at all over 60K when I
>recompile with max_phys_cpus=4 (and with nosmp) on my
>quad-core-by-two-thread machine. This is versus several
>readings over 60K nearly every second when max_phys_cpus=8.
>> Otherwise who knows, it could even be system management mode
>I suppose measuring irq_enter/exit pairs still doesn't rule
>this out. But the "large" interrupts don't seem to happen
>(at least not nearly as frequently) with fewer physical
>processors enabled, so sys mgmt mode seems unlikely.
>Anyway, still a probable problem, still mostly a mystery
>as to what is actually happening. And, repeat, this has
>nothing to do with tmem... I'm just observing it using
>tmem as a convenient measurement tool.
>> -----Original Message-----
>> From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
>> Sent: Monday, April 13, 2009 2:24 AM
>> To: Dan Magenheimer; Xen-Devel (E-mail)
>> Subject: Re: [Xen-devel] million cycle interrupt
>> On 12/04/2009 21:16, "Dan Magenheimer"
>> <dan.magenheimer@xxxxxxxxxx> wrote:
>> > Is a million cycles in an interrupt handler bad? Any idea what
>> > might be consuming this? The evidence might imply more cpus
>> > means longer interrupt, which bodes poorly for larger machines.
>> > I tried disabling the timer rendezvous code (not positive I
>> > was successful), but still got large measurements, and
>> > eventually the machine froze up (but not before I observed
>> > the stime skew climbing quickly to the millisecond-plus
>> > range).
>> You can instrument irq_enter() and irq_exit() to read TSC and find
>> out the distribution of irq handling times for interruptions that
>> Xen knows about.
>> Otherwise who knows, it could even be system management mode on that
>> particular box.
>> -- Keir
>Xen-devel mailing list