This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
Home Products Support Community News


[Xen-devel] Re: SKB paged fragment lifecycle on receive

To: Eric Dumazet <eric.dumazet@xxxxxxxxx>
Subject: [Xen-devel] Re: SKB paged fragment lifecycle on receive
From: Ian Campbell <Ian.Campbell@xxxxxxxxxxxxx>
Date: Mon, 27 Jun 2011 15:42:04 +0100
Cc: "netdev@xxxxxxxxxxxxxxx" <netdev@xxxxxxxxxxxxxxx>, Jeremy Fitzhardinge <jeremy@xxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Delivery-date: Mon, 27 Jun 2011 07:43:07 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <1308938183.2532.8.camel@edumazet-laptop>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Organization: Citrix Systems, Inc.
References: <1308930202.32717.144.camel@xxxxxxxxxxxxxxxxxxxxxx> <4E04C961.9010302@xxxxxxxx> <1308938183.2532.8.camel@edumazet-laptop>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
On Fri, 2011-06-24 at 18:56 +0100, Eric Dumazet wrote:
> Le vendredi 24 juin 2011 à 10:29 -0700, Jeremy Fitzhardinge a écrit :
> > On 06/24/2011 08:43 AM, Ian Campbell wrote:
> > > We've previously looked into solutions using the skb destructor callback
> > > but that falls over if the skb is cloned since you also need to know
> > > when the clone is destroyed. Jeremy Fitzhardinge and I subsequently
> > > looked at the possibility of a no-clone skb flag (i.e. always forcing a
> > > copy instead of a clone) but IIRC honouring it universally turned into a
> > > very twisty maze with a number of nasty corner cases etc. It also seemed
> > > that the proportion of SKBs which get cloned at least once appeared as
> > > if it could be quite high which would presumably make the performance
> > > impact unacceptable when using the flag. Another issue with using the
> > > skb destructor is that functions such as __pskb_pull_tail will eat (and
> > > free) pages from the start of the frag array such that by the time the
> > > skb destructor is called they are no longer there.
> > >
> > > AIUI Rusty Russell had previously looked into a per-page destructor in
> > > the shinfo but found that it couldn't be made to work (I don't remember
> > > why, or if I even knew at the time). Could that be an approach worth
> > > reinvestigating?
> > >
> > > I can't really think of any other solution which doesn't involve some
> > > sort of driver callback at the time a page is free()d.
> > 
> This reminds me the packet mmap (tx path) games we play with pages.
> net/packet/af_packet.c : tpacket_destruct_skb(), poking
> TP_STATUS_AVAILABLE back to user to tell him he can reuse space...

This is OK because af_packet involves no kernel side protocol and hence
there can be no retransmits etc? Otherwise you would suffer from the
same sorts of issues as I described with NFS at [0]?

However it seems like this might still have a problem if your SKBs are
ever cloned. What happens in this case, e.g if a user of AF_PACKET sends
a broadcast via a device associated with a bridge[1] (where it would be

Wouldn't you get into the situation where the destructor of the initial
skb is called before one or more of the clones going to a different
destination were sent. So you would send TP_STATUS_AVAILABLE to
userspace before the stack was really finished with the page and run the
risk of userspace reusing the buffer, leading to incorrect bytes going
to some destinations?

It looks to me that anything which does any zero-copy type thing to the
network stack will have problems with one or both of protocol retransmit
or SKB clone. There's simply no mechanism to know when the stack is
really finished with a page.


[0] http://marc.info/?l=linux-nfs&m=122424132729720&w=2
[1] Since dhcp clients typically use AF_PACKET and you typically put the
IP address on the bridge itself in these configurations this won't be
that unusual.


Xen-devel mailing list

<Prev in Thread] Current Thread [Next in Thread>