RE: [Xen-devel][Pv-ops][PATCH] Netback multiple tasklet support

To:	Ian Campbell <Ian.Campbell@xxxxxxxxxx>
Subject:	RE: [Xen-devel][Pv-ops][PATCH] Netback multiple tasklet support
From:	"Xu, Dongxiao" <dongxiao.xu@xxxxxxxxx>
Date:	Sat, 28 Nov 2009 00:08:07 +0800
Cc:	Jeremy Fitzhardinge <jeremy@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
In-reply-to:	<1259314974.7590.1042.camel@xxxxxxxxxxxxxxxxxxxxxx>
List-id:	Xen developer discussion <xen-devel.lists.xensource.com>
References:	<EADF0A36011179459010BDF5142A457501D006B913@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> <1259314974.7590.1042.camel@xxxxxxxxxxxxxxxxxxxxxx>

Ian,
        Thanks for your comments. Some explanations are below.

Best Regards,
-- Dongxiao

Ian Campbell wrote:
> Hi,
> 
> Does this change have any impact on the responsiveness of domain 0
> userspace while the host is under heavy network load? We have found
> that the netback tasklets can completely dominate dom0's VCPU to the
> point where no userspace process ever gets a chance to run; since
> this includes sshd and the management toolstack, that can be quite
> annoying.
> 
> The issue was probably specific to using a single-VCPU domain 0 in
> XenServer, but since your patch introduces a tasklet per VCPU it could
> possibly also happen with a multi-VCPU domain 0.

The problem you saw happens because all the netfronts are processed by a single
dom0 tasklet, so the one vcpu that runs that tasklet becomes extremely busy and
has no time left for userspace. My patch splits netback's workload across
several tasklets, and if those tasklets are bound to different dom0 vcpus (via
irqbalance or by manually pinning interrupts), the total CPU load is spread
evenly across the vcpus, which makes dom0 more scalable. Take our test case as
an example: under heavy network load the throughput is close to the link
bandwidth (9.53G/10G), yet dom0 uses only ~460% CPU in total. Dom0 has 10 vcpus
handling the network, so each vcpu spends about 46% on the network workload;
even a saturated 1G link needs only about half a vcpu, leaving the rest for
userspace. Therefore a 1G NIC should not cause this problem. Most current 10G
NICs support multiple queues, so their interrupts are delivered to different
CPUs and each dom0 vcpu handles only part of the workload; I believe there will
be no problem there either.
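The per-vcpu figure above is simple division. A minimal C sketch of that
arithmetic, assuming the convention used in these numbers where 100% means one
fully busy vcpu (so a 10-vcpu dom0 can reach 1000%). The helper names are
hypothetical and not part of netback.

```c
#include <assert.h>

/* Average load per vcpu, assuming the work is spread evenly.
 * Utilisation is in "percent of one vcpu" units, so the total for a
 * 10-vcpu dom0 can go up to 1000%. */
static double per_vcpu_load(double total_util_pct, unsigned int nr_vcpus)
{
    assert(nr_vcpus > 0);
    return total_util_pct / nr_vcpus;
}

/* Throughput delivered per 1% of dom0 CPU: a rough efficiency figure. */
static double mbps_per_cpu_pct(double throughput_mbps, double util_pct)
{
    assert(util_pct > 0.0);
    return throughput_mbps / util_pct;
}
```

With the figures from the quoted results, per_vcpu_load(461.64, 10) gives about
46.2%, and mbps_per_cpu_pct() gives about 10.75 Mbps per 1% of dom0 CPU without
the patch versus about 20.65 with it, so the patch nearly doubles throughput per
unit of dom0 CPU.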

> 
> For XenServer we converted the tasklets into a kernel thread, at the
> cost of a small reduction in overall throughput but yielding a massive
> improvement in domain 0 responsiveness. Unfortunately the change was
> made by someone who has since left Citrix and I cannot locate the
> numbers he left behind :-(
> 
> Our patch is attached. A netback thread per domain 0 VCPU might be
> interesting to experiment with?

Adding a kernel thread mechanism to netback is a good way to improve dom0's
responsiveness in the UP case. However, for a multi-vcpu dom0 I think it may
not be needed. In any case, it is orthogonal to my multiple-tasklet approach.
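For comparison, the kernel-thread approach Ian describes replaces tasklets,
which run in softirq context, with a thread that the scheduler can interleave
with userspace. The sketch below is a userspace pthreads analogy, not the
attached XenServer patch: a hypothetical per-group worker (struct worker and
the worker_* functions are made-up names) sleeps until work is queued, drains
it in a batch, and exits once asked to stop with nothing left pending. A
"netback thread per domain 0 VCPU" would run one such worker per group.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-group worker: a userspace stand-in for a netback
 * kernel thread. */
struct worker {
    pthread_mutex_t lock;
    pthread_cond_t  wake;
    int  pending;      /* queued work items (e.g. packets) */
    int  processed;    /* items handled so far */
    bool stop;         /* set when the worker should exit */
    pthread_t thread;
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;

    pthread_mutex_lock(&w->lock);
    for (;;) {
        /* Sleep until there is work or we are asked to stop. */
        while (w->pending == 0 && !w->stop)
            pthread_cond_wait(&w->wake, &w->lock);
        /* Exit only once everything queued has been handled. */
        if (w->pending == 0 && w->stop)
            break;
        /* Drain the queue in one batch. A real thread would drop the
         * lock while doing the work and could yield between batches. */
        w->processed += w->pending;
        w->pending = 0;
    }
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

static void worker_start(struct worker *w)
{
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->wake, NULL);
    w->pending = 0;
    w->processed = 0;
    w->stop = false;
    int rc = pthread_create(&w->thread, NULL, worker_main, w);
    assert(rc == 0);
    (void)rc;
}

/* Queue work and wake the worker (the "schedule" step). */
static void worker_queue(struct worker *w, int items)
{
    pthread_mutex_lock(&w->lock);
    w->pending += items;
    pthread_cond_signal(&w->wake);
    pthread_mutex_unlock(&w->lock);
}

/* Ask the worker to finish pending work and exit, then reap it. */
static void worker_stop(struct worker *w)
{
    pthread_mutex_lock(&w->lock);
    w->stop = true;
    pthread_cond_signal(&w->wake);
    pthread_mutex_unlock(&w->lock);
    pthread_join(w->thread, NULL);
    pthread_cond_destroy(&w->wake);
    pthread_mutex_destroy(&w->lock);
}
```

The trade-off Ian mentions shows up here: work is no longer run at the first
opportunity as a tasklet would be, but a busy worker competes with sshd and the
toolstack on equal scheduling terms instead of starving them.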

> 
> Ian.
> 
> On Fri, 2009-11-27 at 02:26 +0000, Xu, Dongxiao wrote:
>> Current netback uses one pair of tasklets for Tx/Rx data transactions.
>> A netback tasklet can only run on one CPU at a time, and it serves all
>> the netfronts, so it has become a performance bottleneck. This patch
>> replaces the single tasklet pair in dom0 with multiple pairs. Assuming
>> that Dom0 has CPUNR VCPUs, we define CPUNR tasklet pairs (CPUNR for Tx
>> and CPUNR for Rx). Each pair of tasklets serves a specific group of
>> netfronts. We also duplicated the global and static variables for each
>> group so that the groups do not contend on a shared spinlock.
>> 
>> Test scenario:
>> We use ten 1G NIC interfaces to talk with 10 VMs (netfronts) on the
>> server, so the total bandwidth is 10G.
>> On the host machine, each guest's netfront is bound to its own NIC
>> interface. On the client machine, we run netperf against each guest.
>> 
>> Test Case    Packet Size (bytes)  Throughput (Mbps)  Dom0 CPU Util  Guests CPU Util
>> w/o patch    1400                 4304.30            400.33%        112.21%
>> w/  patch    1400                 9533.13            461.64%        243.81%
>> 
>> BTW, when testing this patch we found that the domain_lock in the grant
>> table operations becomes a bottleneck. We temporarily removed the global
>> domain_lock to achieve good performance.
>> 
>> Best Regards,
>> -- Dongxiao
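A minimal sketch of the grouping idea in the patch description above, written
as plain C rather than kernel code: one group per dom0 VCPU, each with its own
copy of what used to be global state. The struct and function names are
hypothetical, each group's Tx/Rx tasklet pair is not modelled, and the
least-loaded assignment policy is an assumption; the description does not say
how netfronts are mapped to groups.

```c
#include <assert.h>

#define MAX_GROUPS 64   /* hypothetical upper bound on dom0 VCPUs */

/* One group per dom0 VCPU. In the patch each group also owns a Tx/Rx
 * tasklet pair; here we only keep the per-group state. */
struct netbk_group {
    unsigned int  nr_netfronts;  /* netfronts served by this group */
    unsigned long pending_tx;    /* example of formerly global state */
};

struct netbk {
    unsigned int nr_groups;      /* == number of dom0 VCPUs (CPUNR) */
    struct netbk_group groups[MAX_GROUPS];
};

static void netbk_init(struct netbk *nb, unsigned int nr_vcpus)
{
    assert(nr_vcpus > 0 && nr_vcpus <= MAX_GROUPS);
    nb->nr_groups = nr_vcpus;
    for (unsigned int i = 0; i < nr_vcpus; i++) {
        nb->groups[i].nr_netfronts = 0;
        nb->groups[i].pending_tx = 0;
    }
}

/* Attach a new netfront to the least-loaded group (lowest index on a
 * tie) and return that group's index. Since a group's tasklets would
 * only touch that group's state, groups never contend on one lock. */
static unsigned int netbk_assign(struct netbk *nb)
{
    unsigned int best = 0;

    for (unsigned int i = 1; i < nb->nr_groups; i++)
        if (nb->groups[i].nr_netfronts < nb->groups[best].nr_netfronts)
            best = i;
    nb->groups[best].nr_netfronts++;
    return best;
}
```

Attaching ten netfronts to a 4-VCPU dom0 this way leaves groups 0 and 1 with
three netfronts each and groups 2 and 3 with two each.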
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
