> Hi Steven and Jan,
> I modified the code according to your comments, and the latest
> version is version 4. Do you have further comments or consideration
> on this version?
No, that all looks fine to me.
Sorry about the delay in replying; I thought I'd already responded,
but I seem to have dropped it on the floor somewhere.
> Xu, Dongxiao wrote:
> > Hi,
> > Do you have comments on this version of patch?
> > Thanks,
> > Dongxiao
> > Xu, Dongxiao wrote:
> >> This is netback multithread support patchset version 4.
> >> Main Changes from v3:
> >> 1. Patchset is against xen/next tree.
> >> 2. Merge group and idx into netif->mapping.
> >> 3. Use vmalloc to allocate netbk structures.
> >> Main Changes from v2:
> >> 1. Merge "group" and "idx" into "netif->mapping", therefore
> >> page_ext is not used now.
> >> 2. Put netbk_add_netif() and netbk_remove_netif() into
> >> __netif_up() and __netif_down().
> >> 3. Change the usage of kthread_should_stop().
> >> 4. Use __get_free_pages() to replace kzalloc().
> >> 5. Modify the changes to netif_be_dbg().
> >> 6. Use MODPARM_netback_kthread to determine whether using
> >> tasklet or kernel thread.
> >> 7. Put small fields in the front, and large arrays in the end of
> >> struct xen_netbk.
> >> 8. Add more checks in netif_page_release().
> >> Current netback uses one pair of tasklets for Tx/Rx data transactions.
> >> A netback tasklet can only run on one CPU at a time, and it is used
> >> to serve all the netfronts. It has therefore become a performance
> >> bottleneck. This patch uses multiple tasklet pairs to replace
> >> the current single pair in dom0.
> >> Assuming that Dom0 has CPUNR VCPUs, we define CPUNR pairs of
> >> tasklets (CPUNR for Tx, and CPUNR for Rx). Each pair of tasklets
> >> serves a specific group of netfronts. We also duplicated the global
> >> and static variables for each group in order to avoid the
> >> spinlock.
> >> PATCH 01: Generalize static/global variables into 'struct xen_netbk'.
> >> PATCH 02: Introduce a new struct type page_ext.
> >> PATCH 03: Multiple tasklets support.
> >> PATCH 04: Use Kernel thread to replace the tasklet.
> >> Recently I re-tested the patchset with an Intel 10G multi-queue NIC
> >> device, and used 10 external 1G NICs to run netperf tests against
> >> that 10G NIC.
> >> Case 1: Dom0 has more than 10 vcpus pinned with each physical CPU.
> >> With the patchset, the performance is 2x of the original throughput.
> >> Case 2: Dom0 has 4 vcpus pinned with 4 physical CPUs.
> >> With the patchset, the performance is 3.7x of the original
> >> throughput.
> >> When we tested this patch, we found that the domain_lock in the grant
> >> table operation (gnttab_copy()) becomes a bottleneck. We temporarily
> >> removed the global domain_lock to achieve good performance.
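For anyone following the thread, the per-group idea in the quoted cover letter can be sketched in plain userspace C as below. This is only an illustration, not the code from the patchset: the names netbk_group, netbk_pool, pool_add_netif and pool_remove_netif are hypothetical, and the "pick the least-loaded group" policy is an assumption made for the example. The cover letter itself only says that each tasklet pair serves a specific group of netfronts.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_GROUPS 64  /* hypothetical upper bound, for this sketch only */

/* Hypothetical stand-in for one per-group context: state that would
 * otherwise be global is kept once per group, so groups share no lock. */
struct netbk_group {
    unsigned int netfront_count;  /* netfronts currently bound to this group */
};

struct netbk_pool {
    struct netbk_group groups[MAX_GROUPS];
    size_t ngroups;               /* e.g. one group per dom0 VCPU */
};

static void pool_init(struct netbk_pool *pool, size_t ngroups)
{
    assert(ngroups > 0 && ngroups <= MAX_GROUPS);
    pool->ngroups = ngroups;
    for (size_t i = 0; i < ngroups; i++)
        pool->groups[i].netfront_count = 0;
}

/* Bind a new netfront to the group serving the fewest netfronts
 * (assumed policy; ties go to the lowest index). Returns the group index. */
static size_t pool_add_netif(struct netbk_pool *pool)
{
    size_t best = 0;
    for (size_t i = 1; i < pool->ngroups; i++) {
        if (pool->groups[i].netfront_count <
            pool->groups[best].netfront_count)
            best = i;
    }
    pool->groups[best].netfront_count++;
    return best;
}

/* Unbind a netfront from the group it was assigned to. */
static void pool_remove_netif(struct netbk_pool *pool, size_t group)
{
    assert(group < pool->ngroups);
    assert(pool->groups[group].netfront_count > 0);
    pool->groups[group].netfront_count--;
}
```

With four groups, the first four netfronts land on groups 0 to 3 in order. Later ones go to whichever group currently has the fewest, so work spreads across the VCPUs instead of funnelling through a single tasklet pair.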