Re: [Xen-devel] Xen dom0 network I/O scalability
On Apr 27, 2011, at 12:50 PM, Konrad Rzeszutek Wilk wrote:
>> So the current implementation of netback does not scale beyond a single CPU
>> core, thanks to the use of tasklets, making it a bottleneck (please correct
>> me if I am wrong). I remember coming across some patches which attempt to
>> use softirqs instead of tasklets to solve this issue. But the latest version
>> of the linux-2.6-xen.hg repo does not include them. Are they included in some
>> other version of dom0 Linux? Or will they be included in the future?
>
> You should be using the 2.6.39 kernel or the 2.6.32 to take advantage of
> those patches.
Thanks Konrad. I got hold of a pvops dom0 kernel from Jeremy's git repo
(xen/stable-2.6.32.x). As you pointed out, it did include those patches. I spent
some time studying the new netback design and ran some experiments, and I have a
few questions about them.
I am using the latest version of the hypervisor from the xen-unstable.hg repo.
I ran the experiments on a dual-socket AMD quad-core Opteron machine (with 8
CPU cores). My experiments simply involved running 'netperf' between 1 or 2
pairs of VMs on the same machine. I allocated 4 vcpus to dom0 and one vcpu to
each VM. None of the vcpus were pinned.
- So the new design allows you to choose between tasklets and kthreads within
netback, with tasklets being the default option. Is there a specific reason why
tasklets remain the default?
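For concreteness, the toggle I am referring to looks roughly like this (a
sketch in my own words; the identifier names are my guesses, not the exact
ones in the tree):

/*
 * Rough paraphrase of the tasklet-vs-kthread switch I am asking about.
 * Identifier names are mine, not necessarily those in the tree.
 */
#include <linux/module.h>
#include <linux/interrupt.h>
#include <linux/kthread.h>
#include <linux/wait.h>
#include <linux/bitops.h>

static int use_kthreads;                  /* 0 = tasklets (the default) */
module_param(use_kthreads, int, 0444);

struct netbk_group_sketch {
    struct tasklet_struct tx_tasklet;
    wait_queue_head_t wq;
    struct task_struct *task;
    unsigned long work_pending;
};

/* Kick TX processing from the event-channel interrupt path. */
static void netbk_kick_tx(struct netbk_group_sketch *netbk)
{
    if (use_kthreads) {
        set_bit(0, &netbk->work_pending);
        wake_up(&netbk->wq);              /* per-group kthread does the work */
    } else {
        tasklet_schedule(&netbk->tx_tasklet); /* softirq on the current vcpu */
    }
}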
- The inter-VM performance (throughput) is worse with both tasklets and
kthreads compared to the old version of netback (as in the linux-2.6-xen.hg
repo). I observed about a 50% drop in throughput in my experiments. Has anyone
else observed this? Is the new version yet to be optimized?
- Two tasklets (rx and tx) are created per vcpu within netback. But in my
experiments I noticed that only one vcpu was ever being used (even with 4 VMs).
I also observed that all the event channel notifications within netback are
always sent to vcpu 0. So my conjecture is that since the tasklets are always
scheduled by vcpu 0, all of them run only on vcpu 0. Is this a BUG?
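To spell out the conjecture, here is a simplified sketch (my paraphrase, not
the actual netback interrupt handler): tasklet_schedule() queues the tasklet
on the CPU it is called from, and the softirq later runs it on that same CPU,
so if the event channel upcall always lands on vcpu 0, every tasklet runs on
vcpu 0 no matter how many groups were created.

#include <linux/interrupt.h>

struct netbk_tasklets_sketch {
    struct tasklet_struct tx_tasklet;
    struct tasklet_struct rx_tasklet;
};

static irqreturn_t netif_be_int_sketch(int irq, void *dev_id)
{
    struct netbk_tasklets_sketch *group = dev_id;

    /* This runs on whichever vcpu received the event-channel upcall;
     * in my measurements that is always vcpu 0, so the tasklets below
     * are queued on (and later executed by) vcpu 0 only. */
    tasklet_schedule(&group->tx_tasklet);
    tasklet_schedule(&group->rx_tasklet);
    return IRQ_HANDLED;
}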
- Unlike with tasklets, I observed CPU utilization go up when I used kthreads
and increased the number of VMs. But the performance never scaled up. On
profiling the code (using xenoprof) I observed significant synchronization
overhead due to lock contention. The main culprit seems to be the per-domain
lock acquired inside the hypervisor (specifically within do_grant_table_op).
Further, packets are copied (inside gnttab_copy) while this lock is held. That
seems like a bad idea, no?
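As far as I can tell the pattern is roughly the following (my paraphrase of
the hypervisor side, not the actual gnttab_copy code; the names are
approximate):

/*
 * Paraphrase of the pattern I think I am seeing: the per-domain lock
 * is held across the data copy itself, so concurrent grant-copy
 * operations issued from different vcpus serialize on that one lock.
 */
#include <linux/spinlock.h>
#include <linux/string.h>
#include <linux/types.h>

struct domain_sketch {
    spinlock_t grant_lock;              /* the per-domain lock in question */
};

static void grant_copy_pattern(struct domain_sketch *d,
                               void *dst, const void *src, size_t len)
{
    spin_lock(&d->grant_lock);

    /* ... grant reference lookup and validation ... */

    memcpy(dst, src, len);              /* whole payload copied under the lock */

    spin_unlock(&d->grant_lock);

    /*
     * What I would have expected: validate and pin the pages under the
     * lock, drop the lock, and only then do the memcpy(), so that copies
     * for different packets can proceed in parallel.
     */
}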
- A smaller source of overhead is the '_lock' acquired within netback in
netif_idx_release(). Shouldn't this lock be per struct xen_netbk instead of
being global (it is declared as static within the function)? Is this a BUG?
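To illustrate what I mean (again a sketch, not the exact code; the per-group
field name is just my suggestion):

/*
 * The function-static lock below is shared by every xen_netbk group,
 * whereas I would expect one lock per group.
 */
#include <linux/spinlock.h>
#include <linux/types.h>

struct xen_netbk_sketch {
    /* ... per-group state ... */
    spinlock_t release_lock;            /* what I would expect instead */
};

static void netif_idx_release_sketch(struct xen_netbk_sketch *netbk,
                                     u16 pending_idx)
{
    static DEFINE_SPINLOCK(_lock);      /* current code: one lock for all groups */
    unsigned long flags;

    spin_lock_irqsave(&_lock, flags);
    /* ... push pending_idx onto this group's dealloc ring ... */
    spin_unlock_irqrestore(&_lock, flags);

    /*
     * With a per-group lock this would instead be:
     *   spin_lock_irqsave(&netbk->release_lock, flags);
     *   ...
     *   spin_unlock_irqrestore(&netbk->release_lock, flags);
     */
}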
If some (or all) of these points have already been discussed, I apologize in
advance!
I appreciate any feedback or pointers.
Thanks.
--Kaushik