Re: [Xen-ia64-devel] [PATCH][RFC] performance tuning TAKE 5
Hi. I posted the performance tuning patches.
They must be evaluated for their effect before commit
(or improved based on the analysis, or discarded).
I analyzed the behaviour of the patches based on the performance counters
to see if they work as I intended.
The figures at the end of this mail are from the performance counters:
reset the counters with the 'P' keyhandler, execute
hdparm or wget 3 times, then dump the counters with the 'p' keyhandler.
I chose hdparm and wget simply because they are handy.
- p2m exposure to a domain effectively eliminates p2m conversion hypercalls,
but the m2p conversion hypercall remains.
m2p conversion is done by in_swiotlb_aperture() of swiotlb to determine
whether a given dma address belongs to swiotlb or not.
So it might be meaningful to reduce the m2p conversion hypercall overhead.
However this isn't as easy as the p2m case because of the virtual frame table.
The virtual frame table allocates the m2p table sparsely, so simply exposing
the m2p table to a domain incurs page faults,
which result in a ring crossing and a Xen p2m traverse.
On the other hand, the page fault handler on the m2p table area is
implemented very efficiently in hand-coded assembler,
so the benefit of exposing the m2p table is doubtful.
Another way is introducing a fast hypercall like Linux/IA64's fsys mode:
a domain jumps into a page Xen provides, which issues epc, looks up
Xen's m2p table and returns without a context save (sketched below).
I think this idea can also be applied to the hyperprivop fast path;
it might be able to reduce hyperprivop overhead.
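
To make the idea concrete, here is a minimal C sketch of such an epc-based
fast m2p lookup. Every name in it (XEN_FAST_HCALL_VADDR, xen_fast_mtop_t,
fast_machine_to_phys) is made up for illustration, not an existing interface:

/*
 * Sketch only: an epc-based fast m2p lookup modeled on Linux/IA64's
 * fsys mode.  The domain branches into a page Xen maps at a fixed
 * address; the code there raises privilege with epc, walks Xen's m2p
 * table, and returns with br.ret -- no interruption, no register
 * frame save/restore.  (On ia64 a real implementation would branch
 * through a function descriptor; a plain cast is used here only to
 * keep the sketch short.)
 */
#define XEN_FAST_HCALL_VADDR  0xf200000000000000UL  /* assumed gate page */

typedef unsigned long (*xen_fast_mtop_t)(unsigned long mfn);

static inline unsigned long fast_machine_to_phys(unsigned long mfn)
{
    xen_fast_mtop_t fn = (xen_fast_mtop_t)XEN_FAST_HCALL_VADDR;
    return fn(mfn);  /* epc; lookup in Xen's m2p table; br.ret back */
}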
- tlb tracking effectively eliminates full vTLB flushes.
The number of calls to domain_flush_vtlb_all() is very small and
domain_flush_vtlb_track_entry() is called many times instead
(see the decision sketch after this item).
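
The four tlb_track_sar_* counters described below correspond to the four
outcomes of the tracking lookup when a page is zapped. A rough C sketch of
that decision, with assumed types and signatures (the enum values just
mirror the counter names):

/* Sketch only: the flush decision when a domain zaps a page, one arm
 * per tlb_track_sar_* counter.  Types and signatures are assumptions. */
static void zap_page_tlb_flush(struct domain *d, unsigned long pte)
{
    struct tlb_track_entry *entry;

    switch ( tlb_track_search_and_remove(d, pte, &entry) ) {
    case TLB_TRACK_NOT_TRACKED:     /* va unknown */
        domain_flush_vtlb_all(d);   /* whole vTLB flush */
        break;
    case TLB_TRACK_NOT_FOUND:       /* tracked, but no tlb insert seen */
        break;                      /* nothing to flush */
    case TLB_TRACK_FOUND:           /* exactly one va inserted */
        domain_flush_vtlb_track_entry(d, entry);
        break;
    case TLB_TRACK_MANY:            /* several vas inserted */
        domain_flush_vtlb_all(d);   /* full flush is cheaper */
        break;
    }
}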
- preregistering the skbuffs of netback and the pages of netfront effectively
eliminates full vTLB flushes.
I thought that some kind of deferred page freeing was needed,
but seeing these figures, deferred page freeing might not be necessary;
just batching is sufficient.
The implementation of netfront page preregistration is very hacky,
so whether to adopt it should be discussed (a rough sketch follows).
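
For illustration, a guess at what the registration could look like on the
netback side; the op name is taken from the dom0vp_tlb_track_page counter
below, and both the hypercall wrapper and its arguments are assumptions,
not the patch's actual interface:

#include <linux/skbuff.h>   /* struct sk_buff */
#include <asm/io.h>         /* virt_to_phys */

/* Guesswork sketch: register the page backing an skb for tlb tracking
 * before granting it, so the later unmapping only needs a per-entry
 * flush instead of a full vTLB flush.  HYPERVISOR_ia64_dom0vp_op and
 * IA64_DOM0VP_tlb_track_page are assumed names. */
static int netback_preregister_skb_page(struct sk_buff *skb)
{
    unsigned long gpfn = virt_to_phys(skb->data) >> PAGE_SHIFT;

    return HYPERVISOR_ia64_dom0vp_op(IA64_DOM0VP_tlb_track_page,
                                     gpfn, 0, 0);
}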
- deferred page freeing.
Currently it is actually a batched tlb flush; it doesn't defer page freeing.
Judging from the numbers of dqueue_flush_and_free_tlb_track_entries and
dfree_queue_tlb_track_entry, it works, but I'm not sure it actually
reduces overhead (see the batching sketch below).
Probably an analysis based on profiling is needed.
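
A minimal sketch of the batching as described, with a made-up structure
layout: entries are only queued at zap time, then flushed and freed in one
pass, which is what the two d* counters count:

#include <xen/list.h>
#include <xen/xmalloc.h>

/* Sketch only: a queue of tlb track entries whose flush is deferred.
 * The entry layout is an assumption. */
struct tlb_track_entry {
    struct list_head list;
    /* ... tracked virtual address, rid, etc. ... */
};

static LIST_HEAD(dfree_queue);

/* counted as dfree_queue_tlb_track_entry */
static void queue_tlb_track_entry(struct tlb_track_entry *e)
{
    list_add_tail(&e->list, &dfree_queue);    /* defer, don't flush yet */
}

/* counted as dqueue_flush_and_free_tlb_track_entries */
static void flush_and_free_tlb_track_entries(struct domain *d)
{
    struct tlb_track_entry *e, *tmp;

    list_for_each_entry_safe ( e, tmp, &dfree_queue, list ) {
        domain_flush_vtlb_track_entry(d, e);  /* one flush per entry... */
        list_del(&e->list);
        xfree(e);                             /* ...freed in the same pass */
    }
}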
- tlb flush clock.
This tries to reduce flushing of the vhpt and mTLB at vcpu context switch
and at tlb entry flush.
It works well for vcpu context switch, but it doesn't work well
for tlb entry flush (a sketch of the clock check follows).
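
The mechanism, as I understand it and modeled on the tlbflush clock Xen
already uses on x86: a global clock ticks at every flush, each cpu records
the clock value of its last flush, and a flush can be skipped when the cpu
has provably flushed since the stamp in question. Helper names below are
illustrative:

#include <xen/percpu.h>
#include <xen/types.h>

/* Sketch only: skip a flush when the cpu has already flushed since
 * the stamp of the entries in question. */
static u32 tlbflush_clock = 1;
static DEFINE_PER_CPU(u32, tlbflush_time);

/* May cpu still hold entries stamped 'stamp' in its vhpt/mTLB? */
static inline int need_tlbflush(unsigned int cpu, u32 stamp)
{
    return stamp >= per_cpu(tlbflush_time, cpu);
}

static void context_switch_tlbflush(unsigned int cpu, u32 prev_stamp)
{
    if ( !need_tlbflush(cpu, prev_stamp) )
        return;                               /* ..._cswitch_skip */
    local_vhpt_flush();                       /* ..._cswitch_purge */
    per_cpu(tlbflush_time, cpu) = ++tlbflush_clock;
}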
- per vcpu vhpt.
From those performance counter figures, I can't interpret the effect of
the per vcpu vhpt.
When the VP model was adopted, the Xen VHPT size was reduced from 16MB to 64KB
to reduce the overhead of domain_flush_vtlb_all().
64KB was chosen just because it was the smallest size Xen/IA64 accepted.
(I tried 32KB, the minimum VHPT size, but it didn't boot;
I didn't track it down.)
An appropriate VHPT size must be determined at some point.
We might want to increase its size, and that affects the overhead of
domain_flush_vtlb_all() and vcpu context switch.
One factor we should take into account is scalability.
When the number of physical cpus is large (e.g. 64 or 128) but
the number of vcpus of a domain is not so large (e.g. 4 or 8),
the per vcpu vhpt reduces the cost of domain_flush_vtlb_all()
(see the sketch after this item).
Without a per vcpu vhpt, something like tracking dirty pcpus is necessary
to get a similar overhead reduction.
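
To spell out the scalability argument: with a per vcpu vhpt a full flush is
O(number of vcpus of the domain) instead of O(number of physical cpus).
A sketch with placeholder helper names (vcpu_vhpt_flush, vcpu_flush_tlb):

#include <xen/sched.h>

/* Sketch only.  With a per-vcpu VHPT, domain_flush_vtlb_all() walks
 * the domain's vcpus and flushes each vcpu's own small (64KB) VHPT
 * plus the mTLB of the pcpu it last ran on, instead of visiting a
 * shared per-pcpu VHPT on every physical cpu (e.g. all 64 or 128). */
void domain_flush_vtlb_all(struct domain *d)
{
    struct vcpu *v;

    for_each_vcpu ( d, v ) {
        vcpu_vhpt_flush(v);   /* this vcpu's private VHPT */
        vcpu_flush_tlb(v);    /* mTLB on the pcpu it last ran on */
    }
}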
*rough description of each item
dom0vp_phystomach               number of p2m conversion hypercalls
dom0vp_machtophys               number of m2p conversion hypercalls
create_grant_host_mapping       number of grant table page mappings
destroy_grant_host_mapping      number of grant table page unmappings
steal_page                      number of grant table page transfers
domain_flush_vtlb_all           number of calls to domain_flush_vtlb_all().
                                This function flushes the whole vhpt and
                                the mTLB of all vcpus of a domain.
domain_flush_vtlb_track_entry   number of calls to
                                domain_flush_vtlb_track_entry().
                                This function flushes only one vhpt entry
                                and the mTLB of the dirtied vcpus of a
                                domain.
tlb_track_sar                   number of pages zapped from a domain
tlb_track_sar_not_tracked       the virtual address of the page isn't
                                tracked, so a whole vTLB flush is needed,
                                i.e. domain_flush_vtlb_all() is called.
tlb_track_sar_not_found         the page is tracked, but no tlb insert was
                                seen, so no tlb flush is needed when the
                                page is zapped.
tlb_track_sar_found             the page is tracked and a tlb insert of
                                one virtual address was issued, so a vTLB
                                flush of this virtual address is needed,
                                i.e. domain_flush_vtlb_track_entry() is
                                called.
tlb_track_sar_many              the page is tracked and tlb inserts of
                                more than one virtual address were issued,
                                so a whole vTLB flush is necessary,
                                i.e. domain_flush_vtlb_all() is called.
dom0vp_tlb_track_page           number of tlb track registrations
dom0vp_tlb_untrack_page         number of tlb track unregistrations
dqueue_flush_and_free_tlb_track_entries
                                number of batched flushes of tlb track
                                entries
dfree_queue_tlb_track_entry     number of tlb track entry queueings
tlbflush_clock_cswitch_purge    number of tlb flushes at context switch
tlbflush_clock_cswitch_skip     number of tlb flushes skipped at context
                                switch due to the tlbflush clock
tlbflush_clock_tlb_track_purge  number of tlb flushes when a tlb track
                                entry is flushed
tlbflush_clock_tlb_track_skip   number of tlb flushes skipped when a tlb
                                track entry is flushed
*hdparm -t /dev/hda6 x3
(XEN) dom0vp_phystomach TOTAL[ 0]
(XEN) dom0vp_machtophys TOTAL[ 502]
(XEN) create_grant_host_mapping TOTAL[ 131359]
(XEN) destroy_grant_host_mapping TOTAL[ 131359]
(XEN) steal_page_refcount TOTAL[ 429]
(XEN) steal_page TOTAL[ 430]
(XEN) domain_flush_vtlb_all TOTAL[ 3]
(XEN) domain_flush_vtlb_track_entry TOTAL[ 851]
(XEN) domain_page_flush_and_put TOTAL[ 132243]
(XEN) tlb_track_sar TOTAL[ 132245]
(XEN) tlb_track_sar_not_tracked TOTAL[ 3]
(XEN) tlb_track_sar_not_found TOTAL[ 131350]
(XEN) tlb_track_sar_found TOTAL[ 893]
(XEN) tlb_track_sar_many TOTAL[ 0]
(XEN) dom0vp_tlb_track_page TOTAL[ 2]
(XEN) dom0vp_tlb_untrack_page TOTAL[ 2]
(XEN) dqueue_flush_and_free_tlb_track_entries
TOTAL[ 469]
(XEN) dfree_queue_tlb_track_entry TOTAL[ 896]
(XEN) tlbflush_clock_cswitch_purge TOTAL[ 11821]
(XEN) tlbflush_clock_cswitch_skip TOTAL[ 1533]
(XEN) tlbflush_clock_tlb_track_purge TOTAL[ 902]
(XEN) tlbflush_clock_tlb_track_skip TOTAL[ 0]
*wget kernel source x3
(XEN) dom0vp_phystomach TOTAL[ 0]
(XEN) dom0vp_machtophys TOTAL[ 163589]
(XEN) create_grant_host_mapping TOTAL[ 57390]
(XEN) destroy_grant_host_mapping TOTAL[ 57390]
(XEN) steal_page_refcount TOTAL[ 86153]
(XEN) steal_page TOTAL[ 86153]
(XEN) domain_flush_vtlb_all TOTAL[ 19]
(XEN) domain_flush_vtlb_track_entry TOTAL[ 210230]
(XEN) domain_page_flush_and_put TOTAL[ 229734]
(XEN) tlb_track_sar TOTAL[ 229738]
(XEN) tlb_track_sar_not_tracked TOTAL[ 70]
(XEN) tlb_track_sar_not_found TOTAL[ 19321]
(XEN) tlb_track_sar_found TOTAL[ 210348]
(XEN) tlb_track_sar_many TOTAL[ 0]
(XEN) dom0vp_tlb_track_page TOTAL[ 15]
(XEN) dom0vp_tlb_untrack_page TOTAL[ 9]
(XEN) dqueue_flush_and_free_tlb_track_entries
TOTAL[ 125912]
(XEN) dfree_queue_tlb_track_entry TOTAL[ 210350]
(XEN) tlbflush_clock_cswitch_purge TOTAL[ 8408]
(XEN) tlbflush_clock_cswitch_skip TOTAL[ 1186]
(XEN) tlbflush_clock_tlb_track_purge TOTAL[ 210284]
(XEN) tlbflush_clock_tlb_track_skip TOTAL[ 0]
--
yamahata