
Re: [Xen-devel] Xen dom0 network I/O scalability

On Wed, 2011-05-11 at 19:43 +0100, Kaushik Kumar Ram wrote:
> On May 11, 2011, at 4:31 AM, Ian Campbell wrote:
> > 
> >>> - The inter-VM performance (throughput) is worse using both tasklets
> >> and kthreads as compared to the old version of netback (as in
> >> linux-2.6-xen.hg repo). I observed about 50% drop in throughput in my
> >> experiments. Has anyone else observed this? Is the new version yet to
> >> be optimized?
> >> 
> >> That is not surprising. The "new" version of netback copies pages. It
> >> does not "swizzle" or "map" them between domains (so no zero-copy).
> > 
> > I think Kaushik is running a xen/2.6.32.x tree and the copying only
> > variant is only in mainline.
> > 
> > A 50% drop in performance between linux-2.6-xen.hg and the xen.git
> > 2.6.32 tree is slightly worrying but such a big drop sounds more like a
> > misconfiguration, e.g. something like enabling debugging options in the
> > kernel .config rather than a design or implementation issue in netback.
> > 
> > (I actually have no idea what was in the linux-2.6-xen.hg tree, I don't
> > recall such a tree ever being properly maintained, the last cset appears
> > to be from 2006 and I recently cleaned it out of xenbits because noone
> > knew what it was -- did you mean linux-2.6.18-xen.hg?)
> I was referring to the single-threaded netback version in linux-2.6.18-xen.hg 
> (which btw also uses copying).

Ah, I think we are talking about different values of copying.

A long time ago the backend->frontend path (guest receive) operated
using a page-flipping mode. At some point a copying mode was added to
this path, and it became the default some time in 2006; you would have
to go out of your way to find a guest which used flipping mode these
days. I think this is the copying you are referring to. It is so long
ago that there was a distinction on this path that I had forgotten all
about it until now.

The frontend->backend path (guest transmit) has used a mapping
(PageForeign) based scheme practically since forever. However, when
netback was upstreamed into 2.6.39 this had to be removed in favour of a
copy-based implementation (PageForeign has fingers in the mm subsystem
which were unacceptable for upstreaming). This is the copying mode
Konrad and I were talking about. We know the performance will suffer
versus mapping mode, and we are working to find ways of reinstating a
mapping-based scheme upstream.

>  I don't believe misconfiguration to be the reason. 
> As I mentioned previously, I profiled the code and found significant 
> synchronization
> overhead due to lock contention. This essentially happens when two vcpus in 
> dom0 perform the grant hypercall and both try to acquire the domain_lock.
> I don't think re-introducing zero-copy in the receive path is a solution to 
> this problem.

As far as I can tell you are running with the zero-copy path. Only
mainline 2.6.39+ has anything different.

I think you need to go into detail about your test setup so we can all
get on the same page and stop confusing ourselves by guessing which
modes netback has available and is running in. Please can you describe
precisely which kernels you are running (tree URL and changeset as well
as the .config you are using). Please also describe your guest
configuration (kernels, cfg file, distro etc) and benchmark methodology
(e.g. netperf options).

I'd also be interested in seeing the actual numbers you are getting,
alongside specifics of the test scenario which produced them.

I'm especially interested in the details of the experiment(s) where you
saw a 50% drop in throughput.

>  I mentioned packet copies only to explain the severity of this
> problem. Let me try to clarify. Consider the following scenario: vcpu 1 
> performs a hypercall, acquires the domain_lock, and starts copying one or 
> more 
> packets (in gnttab_copy). Now vcpu 2 also performs a hypercall, but it cannot 
> acquire the domain_lock until all the copies have completed and the lock is 
> released by vcpu 1. So the domain_lock could be held for a long time before 
> it is released.

But this isn't a difference between the multi-threaded/tasklet and
single-threaded/tasklet versions of netback, is it?

In the single-threaded case the serialisation is explicit due to the
lack of threading. It would obviously be good to avoid it in the
multi-threaded case, but the contention doesn't really explain why
multi-threaded mode would be 50% slower. (I suppose the threaded case
could serialise things in a different order, perhaps one which is
somehow pessimal for e.g. TCP.)

It is quite easy to force the number of tasklets/threads to 1 (by
forcing xen_netbk_group_nr to 1 in netback_init()). This might be an
interesting experiment to see if the degradation is down to contention
between threads or something else which has changed between 2.6.18 and
2.6.32 (there is an extent to which this is comparing apples to oranges
but 50% is pretty severe...).
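
That experiment amounts to a one-line change. A sketch of the kind of
patch meant here (the file path and the replaced expression are
assumptions about the 2.6.32 tree; check your source):

```diff
--- a/drivers/xen/netback/netback.c
+++ b/drivers/xen/netback/netback.c
@@ static int __init netback_init(void)
-	xen_netbk_group_nr = num_online_cpus();
+	xen_netbk_group_nr = 1;	/* single tasklet/kthread group for the test */
```

If throughput recovers with one group then contention between groups is
implicated; if not, the cause lies elsewhere in the 2.6.18 to 2.6.32
delta.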

> I think to properly scale netback we need more fine grained locking.

Quite possibly. It doesn't seem at all unlikely that the domain lock on
the guest-receive grant copy is going to hurt at some point. There are
some plans to rework the guest receive path to do the copy on the guest
side; the primary motivation is to remove load from dom0 and to allow
better accounting of the work to the guests which request it, but a
side-effect of this could be to reduce contention on dom0's domain_lock.

However I would like to get to the bottom of the 50% degradation between
linux-2.6.18-xen.hg and xen.git#xen/stable-2.6.32.x before we move on to
how we can further improve the situation in xen.git.

> >>> - Two tasklets (rx and tx) are created per vcpu within netback. But
> >> in my experiments I noticed that only one vcpu was being used during
> >> the experiments (even with 4 VMs).  I also observed that all the event
> >> channel notifications within netback are always sent to vcpu 0. So my
> >> conjecture is that since the tasklets are always scheduled by vcpu 0,
> >> all of them are run only on vcpu 0. Is this a BUG?
> >> 
> >> Yes. We need to fix 'irqbalance' to work properly. There is something
> >> not working right.
> > 
> > The fix is to install the "irqbalanced" package. Without it no IRQ
> > balancing will occur in a modern kernel. (perhaps this linux-2.6-xen.hg
> > tree was from a time when the kernel would do balancing on its own?).
> > You can also manually balance the VIF IRQs under /proc/irq if you are so
> > inclined.
> Why can't the virq associated with each xen_netbk be bound to a different 
> vcpu during initialization?

An IRQ is associated with a VIF and multiple VIFs can be associated with
a netbk.

I suppose we could bind the IRQ to the same CPU as the associated netbk
thread but this can move around so we'd need to follow it. The tasklet
case is easier since, I think, the tasklet will be run on whichever CPU
scheduled it, which will be the one the IRQ occurred on.

Drivers are not typically expected to behave in this way. In fact I'm
not sure it is even allowed by the IRQ subsystem and I expect upstream
would frown on a driver doing this sort of thing (I expect their answer
would be "why aren't you using irqbalanced?"). If you can make this work
and it shows real gains over running irqbalanced we can of course
consider it.

> Also, which git repo/branch should I be using if I would like to experiment 
> with 
> the latest dom0 networking?

I wouldn't recommend playing with the stuff in mainline right now -- we
know it isn't the best due to the use of copying on the guest receive
path. The xen.git#xen/stable-2.6.32.x tree is probably the best one to
experiment on.

