Hi Isaku and all,
I have evaluated Isaku's performance tuning patch and VNIF copy mode.
The result was outstanding. Employing Isaku's patch and VNIF copy
mode together, netperf reported that the throughput was about
2Gbits/sec from Domain-0 to Domain-U in the same box, which was 8
times better than vanilla Xen.
* environment
CPU: 2 packages x 2 cores, w/o HT, 1.4GHz
Xen: ia64-unstable, base C/S 11701
dom0: 1 vcpu, pinned
domU: 1 vcpu, no affinity
dom0 <-> domU in the same box (Mbits/sec)
                                domU -> dom0        dom0 -> domU
                                netperf netserver   netperf netserver
--------------------------------------------------------------------
vanilla xen in page flipping         786.55              250.68
vanilla xen in copy mode             903.4              1013.44
patched xen in page flipping        1367.83             1646.91
patched xen in copy mode            1284.41             2025.46
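
As a quick check of the "8 times" figure, the ratios can be recomputed from the table above. This is only arithmetic on the numbers already listed; the dictionary keys and the function name are just labels for this sketch.

```python
# Throughput in Mbits/sec, copied from the table above.
# Each entry is (domU -> dom0, dom0 -> domU).
results = {
    "vanilla, page flipping": (786.55, 250.68),
    "vanilla, copy mode": (903.4, 1013.44),
    "patched, page flipping": (1367.83, 1646.91),
    "patched, copy mode": (1284.41, 2025.46),
}

def speedup(config, baseline="vanilla, page flipping"):
    """Return per-direction throughput ratios of config over baseline."""
    base_u2d, base_d2u = results[baseline]
    u2d, d2u = results[config]
    return u2d / base_u2d, d2u / base_d2u

for name in results:
    u2d, d2u = speedup(name)
    print(f"{name:24s} domU->dom0 x{u2d:.2f}  dom0->domU x{d2u:.2f}")
```

The dom0 -> domU ratio for "patched, copy mode" against vanilla page flipping is the one that comes out at roughly 8.
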
Regards,
Hiroya
Isaku Yamahata wrote:
> Hi. These patches are for performance tuning, TAKE 7.
> These patches are for the following changeset of xen-ia64-unstable.hg:
> 11701:2bfd19fc1b79c6a6712c99f875f1fbf883af3f35
>
> From the dom0 <-> domU benchmark results and counter-based analysis,
> the xen/ia64 tlb flush overhead is successfully reduced with these patches.
> However, domU network performance is still low.
> There might be other issues somewhere else, I guess.
> I'll suspend further investigation and want to merge these patches.
> Then I'll move on to xen oprofile and the tlb miss issue (including huge
> pages if possible). Merging these patches will be done as a background task.
> If necessary, I'll come back to network performance later.
>
>
> benchmark
> =========
> I ran a very rough netperf benchmark with netperf -c -C -H <netserver> -l 100.
> This is only to see the rough effect.
> The network environment isn't separated from other traffic,
> each case was measured only once, and the variance of the
> netperf figures seems to be large.
> If you need an accurate benchmark result, you should measure several times
> and take the average yourself. (and let me know!)
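
Since a single run is noisy, one way to follow the advice above is to repeat the run and report the mean together with the spread. A minimal sketch, assuming the per-run throughput figures have already been collected from the netperf output by hand; the sample values are made-up placeholders, not measurements, and the function name is hypothetical.

```python
import statistics

def summarize(samples):
    """Return (mean, sample standard deviation, coefficient of variation)."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    return mean, stdev, stdev / mean

# Placeholder values in Mbits/sec, NOT real measurements.
runs = [100.0, 110.0, 90.0, 105.0, 95.0]
mean, stdev, cv = summarize(runs)
print(f"runs={len(runs)} mean={mean:.2f} stdev={stdev:.2f} cv={cv:.1%}")
```

A large coefficient of variation would suggest that differences between two configurations should not be read from single runs.
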
>
> * environment
> tiger4
> CPU: 4 packages x 2 cores x 2 HT
> Native: RHEL AS Release 4 Update 2: tiger4
> dom0: tiger4, vcpu=4
> domU: tiger4-g0, vcpu=8
> NIC: e1000
>
> em64t
> CPU: Intel(R) Pentium(R) 4 CPU 3.60GHz stepping 0a
> memory: 1GB
> NIC: Tigon3 [partno(BCM95789) rev 4101 PHY(5750)]
>
> * result
> target <-> em64t (Mbits/sec)
> target                 target -> em64t      em64t -> target
>                        netperf netserver    netperf netserver
> Native                      723.33               909.49
> dom0(vanilla 11701)         673.96               836.05
> dom0(patched)               675.75               837.75
> domU(vanilla 11701)         136.28                77.78
> domU(patched)               249.32               143.60
>
>
> dom0 <-> domU in the same box (Mbits/sec)
>                          domU -> dom0         dom0 -> domU
>                          netperf netserver    netperf netserver
> vanilla xen(C/S 11701)      576.71               329.49
> patched                     973.99               930.08
>
>
> patches
> =======
> - performance counter
> - p2m exposure
> - per vcpu vhpt
> - tlb tracking
> - grant table transfer
> - netback skbuff preregister
> - netfront page preregister
> - netback page preregister
> - deferred page freeing
> - tlb flush clock
> - micro optimize __domain_flush_vtlb_track_entry
> - suppress clear_pages
>
>
> patch detail
> ============
> - per vcpu vhpt
> It focuses on vcpu migration between physical cpus.
> With the credit scheduler, vcpus are migrated frequently.
> This patch tries to reduce vTLB flushes when a vcpu is migrated.
>
> - p2m exposure
> DMA paravirtualization requires conversion from pseudo physical addresses
> to machine addresses. Currently it is done by hypercall.
> This patch tries to reduce the conversion overhead by mapping the
> xen p2m table read-only into the domain.
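
The idea can be sketched as replacing a per-address call into the hypervisor with an array lookup in a table the guest can read. This is a toy model only: the page size, the table contents, and the function name are hypothetical and do not reflect the actual Xen/IA64 data structures.

```python
PAGE_SHIFT = 14  # assumption: 16KB pages; the real value depends on the configuration
PAGE_MASK = (1 << PAGE_SHIFT) - 1

# Hypothetical p2m table: index = pseudo physical frame number,
# value = machine frame number. A tuple stands in for a read-only mapping.
p2m_table = (0x500, 0x123, 0x9a0, 0x007)

def pseudo_phys_to_machine(paddr):
    """Translate a pseudo physical address by table lookup, no hypercall."""
    pfn = paddr >> PAGE_SHIFT
    if pfn >= len(p2m_table):
        raise ValueError("address outside the exposed p2m range")
    mfn = p2m_table[pfn]
    # Keep the offset within the page, swap the frame number.
    return (mfn << PAGE_SHIFT) | (paddr & PAGE_MASK)

addr = (2 << PAGE_SHIFT) + 0x10
print(hex(pseudo_phys_to_machine(addr)))
```

The point of the sketch is only that each translation becomes a memory read instead of a transition into the hypervisor.
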
>
> - tlb tracking
> It focuses on grant table mapping.
> When a page is unmapped, a full vTLB flush is necessary.
> By tracking tlb inserts on grant mapped pages, the full vTLB flush
> can be avoided.
> In particular, vbd does only DMA, so dom0 doesn't insert a tlb entry
> on the grant mapped page. In that case no vTLB flush is needed at all.
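
The cases described above can be modelled roughly as follows. This is a toy model, not the patch's code: the class, the method names, and the returned strings are hypothetical, and the "flush tracked entry" case assumes that a targeted flush is enough once the insert has been recorded.

```python
class TlbTrack:
    """Toy model: remember whether a tlb entry was inserted for a tracked page."""

    def __init__(self):
        self.inserted = {}  # page -> True if a tlb insert was seen

    def grant_map(self, page):
        self.inserted[page] = False  # start tracking at map time

    def tlb_insert(self, page):
        if page in self.inserted:
            self.inserted[page] = True

    def grant_unmap(self, page):
        """Return the flush required when the page is unmapped."""
        seen = self.inserted.pop(page, None)
        if seen is None:
            return "full flush"  # untracked page: must be conservative
        if not seen:
            return "no flush"  # e.g. a DMA-only page, as in the vbd case
        return "flush tracked entry"  # assumption: a targeted flush suffices

track = TlbTrack()
track.grant_map("dma_page")
print(track.grant_unmap("dma_page"))
```
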
>
> - netback skbuff/netfront/netback page tlb tracking
> This focuses on grant table transfer.
> When a page is transferred, a full vTLB flush is necessary on both
> the sender domain and the receiver domain.
> By preregistering the page, Xen/IA64 begins to track tlb inserts on
> registered pages.
>
> - deferred page freeing
> When a page on which tlb inserts aren't tracked is unmapped/zapped from
> a domain, a full vTLB flush is necessary again.
> The balloon driver and grant table page transfer are such cases.
> This patch focuses on them.
> It tries to batch freeing/zapping pages from a domain in order
> to reduce full vTLB flushes.
> It also modifies the tlb track page hypercall semantics and
> reimplements the tlb untrack page hypercall.
> This patch tries to reduce the vTLB flush cost of the
> tlb track/untrack/zap page hypercalls by batching them using a timer.
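
The batching idea can be illustrated with a small model: pages are queued instead of being freed one by one, and a single full flush covers the whole queue. This is a sketch under assumptions; the class name, the batch threshold, and the explicit flush() call standing in for the timer are all hypothetical.

```python
class DeferredFreer:
    """Toy model: queue pages and pay for one full vTLB flush per batch."""

    def __init__(self, batch_size):
        self.batch_size = batch_size  # hypothetical threshold
        self.pending = []
        self.full_flushes = 0
        self.freed = []

    def free_page(self, page):
        self.pending.append(page)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Also what a timer would call to drain a partial batch."""
        if not self.pending:
            return
        self.full_flushes += 1  # one flush covers the whole batch
        self.freed.extend(self.pending)  # pages are reusable only after the flush
        self.pending.clear()

freer = DeferredFreer(batch_size=8)
for page in range(20):
    freer.free_page(page)
freer.flush()  # stands in for the timer firing
print(freer.full_flushes, len(freer.freed))  # prints "3 20"
```

Without batching the same 20 pages would cost 20 full flushes in this model; the trade-off is that a page stays unavailable until its batch is drained.
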
>
> - tlb flush clock
> This is intended to be a counterpart of the Xen/x86 tlb flush clock.
> But it is used only at vcpu context switch, not for lazy tlb flush.
>
> included patches
> ================
> 11457:de77bfdecfbe_avoid_long_time_interrupt_masking.patch
> 11458:2bf4fc5ee839_perfc_for_vtlb_flush.patch
> 11459:dc1c8c91d249_perfc_mm_c.patch
> 11460:edbfec69d631_perfc_dom0vp_p2m_and_m2p.patch
> 11461:357d5479c0ff_p2m_exposure_xen_side.patch
> 11462:dde3a660f354_p2m_exposure_linux_side.patch
> 11463:1ae54e6b7ac9_p2m_exposure_test_module.patch
> 11464:065b48a99038_script_for_p2m_test_module.patch
> 11465:96b229487ae2_pervcpu_vhpt.patch
> 11466:da72199ba08c_fix_pte_flags_conflict.patch
> 11467:677fdf7aa2de_import_linux_hash.h.patch
> 11468:114c67d3d090_tlb_track.patch
> 11469:c5fde1737a9b_deferred_page_freeing.patch
> 11470:e123f0373d66_skbuff_tlb_tracking_xen_side.patch
> 11471:1313603b6f82_skbuff_tlb_tracking_linux_side.patch
> 11472:14a194e7caa9_tlb_track_netfront_page_xen_side.patch
> 11473:31a91097ca2b_tlb_tracking_on_netfront_page_linux_side.patch
> 11474:644d8aa4ce8f_tlbflush_clock.patch
> 11475:3debc96c950d_tlb_zap_page_hypercall_xen_side.patch
> 11476:a276174da6dd_tlb_zap_hypercall_linux_side.patch
>
> FWIW my dot configs are as follows
> - xen dot config
> crash_debug=y
> debug=y
> verbose=y
> xen_ia64_dom0_virtual_physical=y
> xen_ia64_tlb_track=y
> #xen_ia64_tlb_track_cnt=y
> xen_ia64_tlb_track_cnt=n
> xen_ia64_tlb_track_grant_table_page_transfer=y
> xen_ia64_tlb_track_skbuff=y
> xen_ia64_tlb_track_netfront_page=y
> xen_ia64_tlb_track_deferred_flush=y
> xen_ia64_pervcpu_vhpt=y
> xen_ia64_deferred_free=y
> xen_ia64_tlbflush_clock=y
> xen_ia64_tlbflush_clock_tlb_track_entry=y
> xen_ia64_clear_page=n
>
> perfc=y
> perfc_arrays=y
>
> - Linux dot config includes
> CONFIG_XEN_IA64_VDSO_PARAVIRT=y
> CONFIG_XEN_IA64_EXPOSE_P2M=y
> CONFIG_XEN_IA64_EXPOSE_P2M_USE_DTR=y
> CONFIG_XEN_IA64_TLB_TRACK_SKBUFF=y
> CONFIG_XEN_IA64_TLB_TRACK_NETFRONT_PAGE=y
> CONFIG_XEN_IA64_TLB_TRACK_NETBACK_PAGE=y
>
>
> thanks.
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Xen-ia64-devel mailing list
> Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-ia64-devel