Hi, Jeremy,
Thanks very much for your comments; here are some explanations for the 01
patch.
Best Regards,
-- Dongxiao
Jeremy Fitzhardinge wrote:
> On 12/02/09 02:17, Xu, Dongxiao wrote:
>> Hi,
>> According to your feedback, I have revised my patch and am resending it now.
>>
>> [PATCH 01]: Use multiple tasklet pairs to replace the current single
>> pair in dom0.
>> [PATCH 02]: Replace the tasklet with a kernel thread. It may hurt
>> the performance, but could improve the responsiveness from userspace.
>>
>> Test scenario:
>> We use ten 1G NIC interfaces to talk with 10 VMs (netfront) in the
>> server, so the total bandwidth is 10G.
>> On the host machine, bind each guest's netfront to one NIC interface.
>> On the client machine, run netperf against each guest.
>>
>> Test Case               Throughput(Mbps)  Dom0 CPU Util  Guests CPU Util
>> w/o any patch           4304.30           400.33%        112.21%
>> w/ 01 patch             9533.13           461.64%        243.81%
>> w/ 01 and 02 patches    7942.68           597.83%        250.53%
>>
>> From the result we can see that the "w/ 01 and 02 patches" case
>> didn't reach or get near the total bandwidth. This is because some
>> vcpus in dom0 are saturated due to context switches with other
>> tasks, which hurts the performance. To prove this idea, I did an
>> experiment that sets the kernel thread to the SCHED_FIFO policy, in
>> order to avoid preemption by normal tasks. The result is shown
>> below, and it achieves good performance. However, as with the
>> tasklet, setting the kernel thread to a high priority can also hurt
>> userspace responsiveness, because userspace applications (for
>> example, sshd) cannot preempt the netback kernel thread.
>>
>> w/ hi-priority kthread 9535.74 543.56% 241.26%
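(For reference, the SCHED_FIFO experiment mentioned above only changed the
scheduling policy when creating the kthread, roughly like the sketch below.
Names such as netbk_kthread are illustrative, not the exact code.)

    struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };
    struct task_struct *task;

    task = kthread_create(netbk_kthread, &netbk[i], "netback/%u", i);
    if (!IS_ERR(task)) {
            /* SCHED_FIFO so that normal tasks cannot preempt the
             * netback thread (this is also what hurts sshd etc.). */
            sched_setscheduler(task, SCHED_FIFO, &param);
            wake_up_process(task);
    }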
>>
>> As for netchannel2, which omits the grant copy in dom0, I haven't
>> tried it yet. But I used xenoprof on the current netback system,
>> and it suggests that grant copy takes roughly 1/6 of dom0's CPU
>> cycles (including Xen and the dom0 vmlinux).
>>
>> BTW, the 02 patch is ported from the patch given by Ian Campbell.
>> You can add your signed-off-by if you want. :)
>>
>
> I've applied this to the xen/dom0/backend/netback-tasklet branch for
> now. However, I noticed a number of problems with a quick lookover of
> the code:
>
> * "netbk" should either be static, or have a longer name
> (mentioning xen)
OK, I will rename it.
> * same with "foreign_page_tracker"
> o (the foreign page tracker API should have better names,
> but that's not your problem)
> * What's cpu_online_nr for? I don't think it should be necessary
> at all, and if it is, then it needs a much more distinct name.
> * If they're really per-cpu variables, they should use the percpu
> mechanism
Actually those tasklets are not per-cpu variables.
We just define cpu_online_nr tasklets so that, in the best case, each tasklet can
run on a different cpu; however, they are not bound to cpus. Some tasklets may
run on the same vcpu of dom0 due to interrupt delivery affinity, so they are not
per-cpu variables.
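To make this concrete, here is a minimal sketch of the layout I have in mind.
The names (xen_netbk, group_nr, and net_tx_action/net_rx_action taking the
group index as their data argument) are illustrative and may differ from the
actual patch:

    #include <linux/init.h>
    #include <linux/interrupt.h>
    #include <linux/cpumask.h>
    #include <linux/slab.h>

    /* One tasklet pair per group; nothing binds group i to cpu i. */
    struct xen_netbk {
            struct tasklet_struct net_tx_tasklet;
            struct tasklet_struct net_rx_tasklet;
            /* per-group pending ring / mmap state would live here too */
    };

    static struct xen_netbk *netbk;
    static unsigned int group_nr;

    static void net_tx_action(unsigned long group);
    static void net_rx_action(unsigned long group);

    static int __init netback_init(void)
    {
            unsigned int i;

            /* Sized from the online cpu count purely as a throughput hint. */
            group_nr = num_online_cpus();
            netbk = kcalloc(group_nr, sizeof(*netbk), GFP_KERNEL);
            if (!netbk)
                    return -ENOMEM;

            for (i = 0; i < group_nr; i++) {
                    tasklet_init(&netbk[i].net_tx_tasklet, net_tx_action, i);
                    tasklet_init(&netbk[i].net_rx_tasklet, net_rx_action, i);
            }
            return 0;
    }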
> * How do you relate the number of online CPUs to the whole group
> index/pending index computation? It isn't obvious how they're
> connected, or how it guarantees that the index is enough.
Same explanation as above. Whether the number of online cpus is greater or less
than the number of tasklets does not matter in our case. We set them to the same
value only to get the best performance.
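For example, choosing the group for a new interface can be a simple modulo
spread, completely independent of the cpu count (a sketch with a hypothetical
helper name):

    /* Spread interfaces evenly over the groups; any group works, so it
     * does not matter whether group_nr matches the online cpu count. */
    static unsigned int netbk_pick_group(unsigned int domid)
    {
            return domid % group_nr;
    }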
> * What happens if you start hotplugging cpus?
It doesn't matter.
Suppose dom0 currently has 3 vcpus, so there are 3 tasklets handling the network
traffic. If the admin then adds one vcpu to dom0, the 4 vcpus will handle the
three tasklets. Compare that with the current situation without my patch, where
all of dom0's vcpus handle only one tasklet, which is a bottleneck.
> * All the repeated netbk[group_idx]. expressions would be improved
> by defining a local pointer for that value.
OK, I will improve it. Thanks!
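For example, something like this (a sketch; the field names are illustrative):

    struct xen_netbk *nb = &netbk[group_idx];

    /* instead of repeating netbk[group_idx].xxx everywhere */
    if (nb->pending_prod != nb->pending_cons)
            tasklet_schedule(&nb->net_tx_tasklet);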
>
> J
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel