xen-devel

Re: [Xen-devel] Xen dom0 network I/O scalability

On May 12, 2011, at 3:21 AM, Ian Campbell wrote:

> A long time ago the backend->frontend path (guest receive) operated using
> a page flipping mode. At some point a copying mode was added to this
> path which became the default some time in 2006. You would have to go
> out of your way to find a guest which used flipping mode these days. I
> think this is the copying you are referring to; it's so long ago that
> there was a distinction on this path that I'd forgotten all about it
> until now.

I was not referring to page flipping at all. I was only talking about the copies
in the receive path. 

> The frontend->backend path (guest transmit) has used a mapping-based
> (PageForeign) scheme practically since forever. However, when netback
> was upstreamed into 2.6.39 this had to be removed in favour of a
> copy-based implementation (PageForeign has fingers in the mm subsystem
> which were unacceptable for upstreaming). This is the copying mode
> Konrad and I were talking about. We know the performance will suffer
> versus mapping mode, and we are working to find ways of reinstating
> mapping.

Hmm.. I did not know that the copying mode was introduced in the transmit path. 
But, as I said above, I was only referring to the receive path. 

> As far as I can tell you are running with the zero-copy path. Only
> mainline 2.6.39+ has anything different.

Again, I was only referring to the receive path! I assumed you were talking
about re-introducing zero-copy in the receive path (aka page flipping).
To be clear: 
- xen.git#xen/stable-2.6.32.x uses copying in the RX path and 
mapping (zero-copy) in the TX path. 
- Copying is used in both the RX and TX paths in 2.6.39+ for upstreaming.
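
(To be concrete about what I mean by "copying in the RX path": netback builds
a batch of grant-copy operations and hands them to Xen in a GNTTABOP_copy
hypercall. The helper below is only a sketch of that idea, not the actual
netback code; the struct and flag names follow xen/include/public/grant_table.h,
and rx_copy_frag is a stand-in name of my own.)

/* Sketch only: copy one fragment from a dom0 page into a grant ref
 * supplied by the guest.  The real netback fills an array of these ops
 * and issues a single batched hypercall; one op is shown for clarity. */
#include <linux/errno.h>
#include <linux/mm.h>
#include <xen/interface/grant_table.h>
#include <asm/xen/page.h>
#include <asm/xen/hypercall.h>

static int rx_copy_frag(struct page *page, unsigned int offset,
			unsigned int len, domid_t domid, grant_ref_t gref)
{
	struct gnttab_copy op = {
		.source.u.gmfn	= virt_to_mfn(page_address(page)),
		.source.domid	= DOMID_SELF,
		.source.offset	= offset,
		.dest.u.ref	= gref,
		.dest.domid	= domid,
		.dest.offset	= 0,
		.len		= len,
		.flags		= GNTCOPY_dest_gref,
	};

	HYPERVISOR_grant_table_op(GNTTABOP_copy, &op, 1);
	return op.status == GNTST_okay ? 0 : -EIO;
}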

> I think you need to go into detail about your test setup so we can all
> get on the same page and stop confusing ourselves by guessing which
> modes netback has available and is running in. Please can you describe
> precisely which kernels you are running (tree URL and changeset as well
> as the .config you are using). Please also describe your guest
> configuration (kernels, cfg file, distro etc) and benchmark methodology
> (e.g. netperf options).
> 
> I'd also be interested in seeing the actual numbers you are seeing,
> alongside specifics of the test scenario which produced them.
> 
> I'm especially interested in the details of the experiment(s) where you
> saw a 50% drop in throughput.

I agree. I plan to run the experiments again next week. I will get back to you 
with all the details. 

But these are the versions I am trying to compare:
1. http://xenbits.xensource.com/linux-2.6.18-xen.hg (single-threaded legacy netback)
2. xen.git#xen/stable-2.6.32.x (multi-threaded netback using tasklets)
3. xen.git#xen/stable-2.6.32.x (multi-threaded netback using kthreads)

And (1) outperforms both (2) and (3).

>> I mentioned packet copies only to explain the severity of this
>> problem. Let me try to clarify. Consider the following scenario: vcpu 1
>> performs a hypercall, acquires the domain_lock, and starts copying one or
>> more packets (in gnttab_copy). Now vcpu 2 also performs a hypercall, but it
>> cannot acquire the domain_lock until all the copies have completed and the
>> lock is released by vcpu 1. So the domain_lock could be held for a long time
>> before it is released.
> 
> But this isn't a difference between the multi-threaded/tasklet and
> single-threaded/tasklet version of netback, is it?
> 
> In the single threaded case the serialisation is explicit due to the
> lack of threading, and it would obviously be good to avoid for the
> multithreaded case, but the contention doesn't really explain why
> multi-threaded mode would be 50% slower. (I suppose the threading case
> could serialise things into a different order, perhaps one which is
> somehow pessimal for e.g. TCP)
> 
> It is quite easy to force the number of tasklets/threads to 1 (by
> forcing xen_netbk_group_nr to 1 in netback_init()). This might be an
> interesting experiment to see if the degradation is down to contention
> between threads or something else which has changed between 2.6.18 and
> 2.6.32 (there is an extent to which this is comparing apples to oranges
> but 50% is pretty severe...).

Hmm.. You are right.  I will run the above experiments next week.
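
(For reference, this is the change I plan to try, along the lines you suggest;
a sketch against the 2.6.32-style netback where netback_init() sizes the group
array from num_online_cpus(). The file path and the surrounding code may differ
between trees, so treat it as illustrative only.)

--- a/drivers/xen/netback/netback.c
+++ b/drivers/xen/netback/netback.c
@@ static int __init netback_init(void)
-	xen_netbk_group_nr = num_online_cpus();
+	/* force a single tasklet/kthread group for the contention experiment */
+	xen_netbk_group_nr = 1;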

>> I think to properly scale netback we need more fine grained locking.
> 
> Quite possibly. It doesn't seem at all unlikely that the domain lock on
> the guest-receive grant copy is going to hurt at some point. There are
> some plans to rework the guest receive path to do the copy on the guest
> side; the primary motivation is to remove load from dom0 and to allow
> better accounting of work to the guests that request it, but a side-effect
> of this could be to reduce contention on dom0's domain_lock.
> 
> However I would like to get to the bottom of the 50% degradation between
> linux-2.6.18-xen.hg and xen.git#xen/stable-2.6.32.x before we move on to
> how we can further improve the situation in xen.git.

OK.

> An IRQ is associated with a VIF and multiple VIFs can be associated with
> a netbk.
> 
> I suppose we could bind the IRQ to the same CPU as the associated netbk
> thread, but this can move around so we'd need to follow it. The tasklet
> case is easier since, I think, the tasklet will be run on whichever CPU
> scheduled it, which will be the one the IRQ occurred on.
> 
> Drivers are not typically expected to behave in this way. In fact I'm
> not sure it is even allowed by the IRQ subsystem and I expect upstream
> would frown on a driver doing this sort of thing (I expect their answer
> would be "why aren't you using irqbalanced?"). If you can make this work
> and it shows real gains over running irqbalanced we can of course
> consider it.

OK.
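
(Just to note how I would prototype this if the numbers justify it: something
like the affinity-hint interface, with the caveat you raise that drivers are
not normally expected to do this. A sketch only; the helper name is mine, the
irq and cpu arguments would come from the VIF and its netbk kthread, and
irq_set_affinity_hint() only exists in kernels newer than 2.6.32 and is purely
advisory, so irqbalanced or the admin can still move the IRQ.)

#include <linux/interrupt.h>
#include <linux/cpumask.h>

/* Sketch only: suggest that a VIF's interrupt be delivered on the CPU
 * its netbk kthread currently runs on.  Would need to be refreshed
 * whenever the kthread migrates to another CPU. */
static void netbk_hint_irq_affinity(unsigned int irq, int cpu)
{
	irq_set_affinity_hint(irq, cpumask_of(cpu));
}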

Thanks.

--Kaushik
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel