RE: [Xen-devel] More network tests with xenoprofile this time
To: "William Cohen" <wcohen@xxxxxxxxxx>
Subject: RE: [Xen-devel] More network tests with xenoprofile this time
From: "Santos, Jose Renato G" <joserenato.santos@xxxxxx>
Date: Fri, 17 Jun 2005 12:39:09 -0700
Cc: Ian Pratt <m+Ian.Pratt@xxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx, "Turner, Yoshio" <yoshio_turner@xxxxxx>, Andrew Theurer <habanero@xxxxxxxxxx>, Aravind Menon <aravind.menon@xxxxxxx>, G John Janakiraman <john@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Fri, 17 Jun 2005 19:38:14 +0000
William and Andrew,
Sorry for the delay in replying. I have been traveling
and did not have email access while away.
>
> Hi Renato,
>
> The article was an interesting application of xenoprof.
>
> It seems like it would be useful to also have data collected using the
> cycle counts (GLOBAL_POWER_EVENTS on P4) to give some indication of
> areas with high-overhead operations. There may be some areas with a few
> very expensive instructions. Calling attention to those areas would help
> improve performance.
Yes, you are right. We have in fact collected GLOBAL_POWER_EVENTS,
but did not include them in the paper due to space limitations.
I have attached oprofile results for our ttcp-like benchmark (receive
side) for the 1-NIC case (both cycle counts and instructions).
As you can see, some functions have very expensive instructions.
For example, "hypercall" adds only 0.6% additional instructions but
consumes 3.0% more clock cycles; "unmask_IO_APIC_irq" adds only
0.25% additional instructions but consumes 5% more cycles. It would be
interesting to investigate these and see if we can optimize them.
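The comparison above can be made systematic by dividing each function's share of cycles by its share of instructions; a ratio well above 1 flags code whose instructions are unusually expensive. A minimal Python sketch, assuming both percentages are measured against the same baseline run's totals (in which case the ratio is that code's cycles per instruction divided by the baseline average CPI). `relative_cost` is a hypothetical helper, not part of oprofile or xenoprof.

```python
def relative_cost(cycle_pct, instr_pct):
    """Return cycle share divided by instruction share for one function.

    Assumes both percentages are relative to the same run's totals, so the
    result is the function's CPI divided by that run's average CPI.
    Values far above 1.0 point at expensive instructions.
    """
    if instr_pct <= 0:
        raise ValueError("instruction share must be positive")
    return cycle_pct / instr_pct


# Percentages quoted in the message above: (cycle %, instruction %).
quoted = {
    "hypercall": (3.0, 0.6),
    "unmask_IO_APIC_irq": (5.0, 0.25),
}

# Rank functions from most to least expensive per instruction.
for name, (cyc, ins) in sorted(quoted.items(),
                               key=lambda kv: relative_cost(*kv[1]),
                               reverse=True):
    print(f"{name}: {relative_cost(cyc, ins):.1f}x baseline CPI")
```

With the figures quoted above, unmask_IO_APIC_irq comes out at about 20x and hypercall at about 5x, which matches the intuition that the former is the more striking case.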
>
> The increases in I-TLB and D-TLB events for Xen-domain0 shown in Figure 4
> are surprising. Why would the working sets be that much larger for
> Xen-domain0 than for regular Linux, particularly for code? Is there a table
> similar to Table 3 for I-TLB event sample locations?
>
Yes, we were also surprised by these results. I have attached
the complete I-TLB and D-TLB oprofile results for the 3-NIC case
(note that these were collected on a different type of machine than
the other two attached oprofile results).
Aravind instrumented the macros in xen/include/asm-x86/flushtlb.h.
I am not sure whether he used PERFCOUNTER_CPU or added his
own instrumentation. With this instrumentation we did not observe
any TLB flushes, but I suppose we could have missed TLB flushes
that did not go through those macros. I think it would be a good idea
to investigate this further to confirm that TLB flushes are not
happening.
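A counter added to a flush macro only sees flushes issued through that macro. The toy Python model below is purely illustrative (the class and method names are hypothetical, not Xen code): it contrasts an instrumented flush path with a page-table switch, which on x86 also invalidates non-global TLB entries when CR3 is reloaded. Whether any Xen path actually flushes without passing through the flushtlb.h macros is exactly what remains to be checked.

```python
from collections import Counter


class FlushModel:
    """Toy model: flushes issued through the instrumented path are
    counted; flushes that happen as a side effect elsewhere are not."""

    def __init__(self):
        self.counted = Counter()  # what the instrumentation reports, per CPU
        self.actual = Counter()   # every flush that really happened, per CPU

    def _hw_flush(self, cpu):
        # Stand-in for the hardware invalidation itself.
        self.actual[cpu] += 1

    def flush_tlb(self, cpu):
        # Instrumented path: analogous to a counter added to a flush macro.
        self.counted[cpu] += 1
        self._hw_flush(cpu)

    def switch_page_table(self, cpu):
        # Hypothetical uninstrumented path: reloading CR3 flushes
        # non-global entries without going through flush_tlb().
        self._hw_flush(cpu)


m = FlushModel()
m.flush_tlb(0)
m.switch_page_table(0)
m.switch_page_table(1)
print(sum(m.counted.values()), sum(m.actual.values()))  # 1 3
```

Here the instrumentation reports one flush while three occurred, which is why a zero count from macro-level counters does not by itself rule flushes out.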
One additional observation is that, in general, the number of misses
is NOT proportional to the size of the working set. A small increase
in the working set can significantly increase the number of misses.
Therefore it is possible that the increase in TLB misses is in fact
due to a larger working set. But I agree we have to investigate this
further to confirm it.
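The nonlinearity is easiest to see in the worst case. Below is a minimal Python sketch of an idealized fully associative TLB with LRU replacement and a cyclic page-access pattern; the 64-entry size and the access pattern are illustrative assumptions, not P4 parameters, and real TLBs are set-associative with other replacement policies, so the exact curve differs. `tlb_misses` is a hypothetical helper.

```python
from collections import OrderedDict


def tlb_misses(num_entries, pages, passes):
    """Count misses for a fully associative LRU TLB with `num_entries`
    slots when `pages` distinct pages are touched cyclically `passes` times."""
    tlb = OrderedDict()  # page -> True, ordered from least to most recent
    misses = 0
    for _ in range(passes):
        for page in range(pages):
            if page in tlb:
                tlb.move_to_end(page)  # hit: mark as most recently used
            else:
                misses += 1
                if len(tlb) >= num_entries:
                    tlb.popitem(last=False)  # evict least recently used
                tlb[page] = True
    return misses


for ws in (32, 64, 65, 128):
    print(ws, tlb_misses(64, ws, 100))
```

With 64 entries, a working set of 64 pages incurs only the 64 compulsory misses, while 65 pages makes every access miss (6,500 misses over 100 passes): a one-page increase in the working set multiplies the miss count by about 100.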
> Can't the VMM use a 4-MB page? And the Xen-domain0 kernel shouldn't be
> that much larger than a regular Linux kernel.
> How were TLB flushes ruled out as a cause? Could the PERFCOUNTER_CPU
> counters in perfc_defn.h be used to see if the VMM is doing a lot of
> TLB flushes?
>
> Also, how much of the I-TLB and D-TLB events are due to the P4
> architecture? Are the results as dramatic on Athlon or AMD64 processors?
>
We did not try this on any other architecture.
Right now xenoprof is only supported on P4.
Support for other architectures is not at the top of our priority list.
Regards
Renato
> -Will
>
>
Attachments:
time_func_xen0.prof
instr_func_xen0.prof
dtlb_3nic.prof
itlb_3nic.prof
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel