[Xen-devel] [PATCH] Network Checksum Removal

To: Xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] [PATCH] Network Checksum Removal
From: Jon Mason <jdmason@xxxxxxxxxx>
Date: Fri, 20 May 2005 18:30:15 -0500
User-agent: Mutt/1.5.8i
Currently in Xen, interdomain communication wastes CPU cycles
calculating and verifying TCP/UDP checksums.  This is unnecessary, as
the possibility of packet corruption between domains is minuscule (and
can be detected in memory via ECC).  Also, domUs are unable to take
advantage of any adapter hardware checksum offload capabilities when
transmitting packets outside of the system.

This patch removes the inter-domain network checksums by reusing the
existing Linux hardware checksum offload infrastructure.  Reusing it
reduced the changes needed by this patch and made it easy to use
hardware checksum offload on the physical devices.

Here is how the traffic flow now works (generically):

Traffic generated by dom0: dom0 does not compute the TCP/UDP checksums
and notifies domU of this via the csum bit in netif_rx_response_t.  domU
checks the csum bit on each incoming packet and, if it is not set,
verifies the checksum.
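
To make the ring-format change concrete, here is a minimal sketch of how
a 16-bit slot can carry a 1-bit checksum flag next to a 15-bit id.  It
uses explicit masks instead of C bit-fields so the layout does not
depend on the compiler.  The helper names (pack_id_csum, unpack_id,
unpack_csum) and the choice of the low bit for the flag are assumptions
made for illustration; they are not part of the patch.

```c
#include <stdint.h>

/* Hypothetical layout: bit 0 = csum flag, bits 1..15 = request id. */
#define CSUM_FLAG_MASK 0x0001u
#define ID_MAX         0x7FFFu   /* largest id that fits in 15 bits */

/* Pack an id (truncated to 15 bits) and a checksum flag into 16 bits. */
static inline uint16_t pack_id_csum(uint16_t id, int csum)
{
    return (uint16_t)(((id & ID_MAX) << 1) | (csum ? CSUM_FLAG_MASK : 0u));
}

/* Recover the 15-bit id. */
static inline uint16_t unpack_id(uint16_t word)
{
    return (uint16_t)(word >> 1);
}

/* Recover the checksum flag as 0 or 1. */
static inline int unpack_csum(uint16_t word)
{
    return (word & CSUM_FLAG_MASK) != 0;
}
```

One consequence worth noting: with one bit taken for the flag, ids are
limited to the range 0..32767.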

Traffic generated externally: if rx hardware checksum offload is
available and enabled, dom0 notifies domU that it is unnecessary to
validate the checksum (provided the checksum is valid) by setting the
csum bit.  If domU is not notified that validation is unnecessary, domU
validates the checksum itself.
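
The two receive cases above can be summarized as a pair of decisions.
This is a simplified model of the prose description only, not the driver
code; the function and parameter names are hypothetical.

```c
#include <stdbool.h>

/* Backend (dom0) side: should the csum bit be set on a packet that is
 * being handed to domU?  Set it for traffic dom0 generated itself, or
 * for external traffic whose checksum the adapter already validated. */
static inline bool demo_backend_csum_bit(bool generated_by_dom0,
                                         bool hw_rx_csum_enabled,
                                         bool hw_csum_valid)
{
    if (generated_by_dom0)
        return true;
    return hw_rx_csum_enabled && hw_csum_valid;
}

/* Frontend (domU) side: verify in software only when the bit is clear. */
static inline bool demo_frontend_must_verify(bool csum_bit)
{
    return !csum_bit;
}
```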

Traffic generated by domU: domU does not compute the TCP/UDP checksums
and notifies dom0 of this via the csum bit in netif_tx_request_t.  dom0
checks the csum bit on each incoming packet and, if it is set,
calculates the fields needed for hardware checksum offload (skb->csum,
which is the offset at which to insert the checksum).  It also sets:
    skb->ip_summed = CHECKSUM_UNNECESSARY;
    skb->flags |= SKB_FDW_NO_CSUM;
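
As an illustration of the offset stored in skb->csum, the sketch below
computes where the checksum field sits inside a TCP or UDP header.  The
demo_* structs and constants are stand-ins defined here for the example
(they are not the kernel's struct tcphdr / struct udphdr), laid out
according to the standard TCP and UDP header formats.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal header layouts, for illustration only. */
struct demo_tcphdr {
    uint16_t source, dest;
    uint32_t seq, ack_seq;
    uint16_t flags;          /* data offset, reserved bits, control flags */
    uint16_t window;
    uint16_t check;          /* TCP checksum */
    uint16_t urg_ptr;
};

struct demo_udphdr {
    uint16_t source, dest, len;
    uint16_t check;          /* UDP checksum */
};

#define DEMO_IPPROTO_TCP 6
#define DEMO_IPPROTO_UDP 17

/* Byte offset of the checksum field within the transport header,
 * or 0 for protocols this sketch does not handle. */
static inline size_t demo_csum_offset(uint8_t protocol)
{
    if (protocol == DEMO_IPPROTO_TCP)
        return offsetof(struct demo_tcphdr, check);
    if (protocol == DEMO_IPPROTO_UDP)
        return offsetof(struct demo_udphdr, check);
    return 0;
}
```

With these layouts the offset is 16 bytes for TCP and 6 bytes for UDP.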

ip_summed is set for the case where the packet is destined for dom0, so
that dom0 does not check the TCP/UDP checksum.  Unfortunately, this
field is overwritten by both routing and bridging.  So I added a new skb
field, flags (replacing cloned), and a new flag, SKB_FDW_NO_CSUM.  This
flag is checked on transmission, where the fields modified by the
bridging/routing code are corrected.  Once these fields have been
corrected, the adapter (if it supports tx checksum offload) or the stack
(via skb_checksum_help()) calculates the TCP/UDP checksum.
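
The transmit-time correction can be modeled in a few lines.  This is a
simplified sketch under stated assumptions: struct demo_pkt and the
demo_* functions are hypothetical stand-ins for sk_buff and the kernel
paths, and demo_forward() merely imitates bridging/routing code
resetting ip_summed.  The constant values match those in skbuff.h.

```c
#include <stdint.h>

#define DEMO_CHECKSUM_NONE        0
#define DEMO_CHECKSUM_HW          1
#define DEMO_CHECKSUM_UNNECESSARY 2
#define DEMO_FLAG_FWD_NO_CSUM     4   /* same bit value as SKB_FDW_NO_CSUM */

struct demo_pkt {
    uint8_t  ip_summed;      /* checksum state of the packet */
    uint8_t  flags;          /* survives forwarding, unlike ip_summed */
    uint8_t  ihl;            /* IPv4 header length in 32-bit words */
    unsigned transport_off;  /* offset of TCP/UDP header from IP header */
};

/* Stand-in for bridging/routing, which overwrites ip_summed. */
static inline void demo_forward(struct demo_pkt *p)
{
    p->ip_summed = DEMO_CHECKSUM_NONE;
}

/* Transmit-time fixup: if the flag is set, request checksum offload
 * again and recompute where the transport header starts. */
static inline void demo_xmit_fixup(struct demo_pkt *p)
{
    if (p->flags & DEMO_FLAG_FWD_NO_CSUM) {
        p->ip_summed = DEMO_CHECKSUM_HW;
        p->transport_off = p->ihl * 4u;
    }
}
```

The point of the separate flag is that it is not touched by forwarding,
so the transmit path can still tell that the checksum was never filled
in.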

Performance:
I ran the following test cases with netperf3 TCP_STREAM (using
bridging) and saw the following throughput increases:

Test case               Throughput increase
domU->dom0              500Mbps
dom0->domU              10Mbps
domU->remote host       none
domU->domU              70Mbps

Note: I have a small bridging patch which increases dom0 throughput.  I
am in the process of having it accepted into the Linux kernel.

I currently do not have CPU utilization numbers (where the real boost of
this patch would be), and I do not have throughput numbers for
routing/nat.


Also, I added the ability to enable/disable checksum offload via the
ethtool command.  

Signed-off-by: Jon Mason <jdmason@xxxxxxxxxx>

--- ../xen-unstable-pristine/xen/include/public/io/netif.h      2005-05-04 22:20:10.000000000 -0500
+++ xen/include/public/io/netif.h       2005-05-18 12:05:41.000000000 -0500
@@ -12,7 +12,8 @@
 typedef struct {
     memory_t addr;   /*  0: Machine address of packet.  */
     MEMORY_PADDING;
-    u16      id;     /*  8: Echoed in response message. */
+    u16      csum:1;
+    u16      id:15;     /*  8: Echoed in response message. */
     u16      size;   /* 10: Packet size in bytes.       */
 } PACKED netif_tx_request_t; /* 12 bytes */
 
@@ -29,7 +30,8 @@ typedef struct {
 typedef struct {
     memory_t addr;   /*  0: Machine address of packet.              */
     MEMORY_PADDING;
-    u16      id;     /*  8:  */
+    u16      csum:1;
+    u16      id:15;     /*  8:  */
     s16      status; /* 10: -ve: BLKIF_RSP_* ; +ve: Rx'ed pkt size. */
 } PACKED netif_rx_response_t; /* 12 bytes */
 
--- ../xen-unstable-pristine/linux-2.6.11-xen-sparse/drivers/xen/netback/netback.c      2005-05-04 22:20:01.000000000 -0500
+++ linux-2.6.11-xen-sparse/drivers/xen/netback/netback.c       2005-05-19 13:25:50.000000000 -0500
@@ -13,6 +13,9 @@
 #include "common.h"
 #include <asm-xen/balloon.h>
 #include <asm-xen/evtchn.h>
+#include <net/ip.h>
+#include <linux/tcp.h>
+#include <linux/udp.h>
 
 #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,0)
 #include <linux/delay.h>
@@ -154,10 +157,14 @@ int netif_be_start_xmit(struct sk_buff *
         __skb_put(nskb, skb->len);
         (void)skb_copy_bits(skb, -hlen, nskb->data - hlen, skb->len + hlen);
         nskb->dev = skb->dev;
+       nskb->ip_summed = skb->ip_summed;
         dev_kfree_skb(skb);
         skb = nskb;
     }
 
+    if (skb->ip_summed > 0)
+       netif->rx->ring[MASK_NETIF_RX_IDX(netif->rx_resp_prod)].resp.csum = 1;
+       
     netif->rx_req_cons++;
     netif_get(netif);
 
@@ -646,6 +653,18 @@ static void net_tx_action(unsigned long 
         skb->dev      = netif->dev;
         skb->protocol = eth_type_trans(skb, skb->dev);
 
+       skb->csum = 0;
+       if (txreq.csum) {
+               skb->ip_summed = CHECKSUM_UNNECESSARY;
+               skb->flags |= SKB_FDW_NO_CSUM;
+               skb->nh.iph = (struct iphdr *) skb->data;
+               if (skb->nh.iph->protocol == IPPROTO_TCP)
+                       skb->csum = offsetof(struct tcphdr, check);
+               if (skb->nh.iph->protocol == IPPROTO_UDP)
+                       skb->csum = offsetof(struct udphdr, check);
+       } else
+               skb->ip_summed = CHECKSUM_NONE;
+
         netif->stats.rx_bytes += txreq.size;
         netif->stats.rx_packets++;
 
--- ../xen-unstable-pristine/linux-2.6.11-xen-sparse/drivers/xen/netback/interface.c    2005-05-04 22:20:09.000000000 -0500
+++ linux-2.6.11-xen-sparse/drivers/xen/netback/interface.c     2005-05-20 10:36:14.000000000 -0500
@@ -159,6 +159,7 @@ void netif_create(netif_be_create_t *cre
     dev->get_stats       = netif_be_get_stats;
     dev->open            = net_open;
     dev->stop            = net_close;
+    dev->features        = NETIF_F_NO_CSUM;
 
     /* Disable queuing. */
     dev->tx_queue_len = 0;
--- ../xen-unstable-pristine/linux-2.6.11-xen-sparse/drivers/xen/netfront/netfront.c    2005-05-04 22:20:11.000000000 -0500
+++ linux-2.6.11-xen-sparse/drivers/xen/netfront/netfront.c     2005-05-20 13:15:39.000000000 -0500
@@ -40,6 +40,7 @@
 #include <linux/init.h>
 #include <linux/bitops.h>
 #include <linux/proc_fs.h>
+#include <linux/ethtool.h>
 #include <net/sock.h>
 #include <net/pkt_sched.h>
 #include <net/arp.h>
@@ -287,6 +288,11 @@ static int send_fake_arp(struct net_devi
     return dev_queue_xmit(skb);
 }
 
+static struct ethtool_ops network_ethtool_ops = {
+       .get_tx_csum = ethtool_op_get_tx_csum,
+       .set_tx_csum = ethtool_op_set_tx_csum,
+};
+
 static int network_open(struct net_device *dev)
 {
     struct net_private *np = netdev_priv(dev);
@@ -472,6 +478,7 @@ static int network_start_xmit(struct sk_
     tx->id   = id;
     tx->addr = virt_to_machine(skb->data);
     tx->size = skb->len;
+    tx->csum = (skb->ip_summed) ? 1 : 0;
 
     wmb(); /* Ensure that backend will see the request. */
     np->tx->req_prod = i + 1;
@@ -572,6 +579,9 @@ static int netif_poll(struct net_device 
         skb->len  = rx->status;
         skb->tail = skb->data + skb->len;
 
+       if (rx->csum)
+               skb->ip_summed = CHECKSUM_UNNECESSARY;
+               
         np->stats.rx_packets++;
         np->stats.rx_bytes += rx->status;
 
@@ -966,7 +976,9 @@ static int create_netdev(int handle, str
     dev->get_stats       = network_get_stats;
     dev->poll            = netif_poll;
     dev->weight          = 64;
-    
+    dev->features       = NETIF_F_IP_CSUM;
+    SET_ETHTOOL_OPS(dev, &network_ethtool_ops);
+
     if ((err = register_netdev(dev)) != 0) {
         printk(KERN_WARNING "%s> register_netdev err=%d\n", __FUNCTION__, err);
         goto exit;
--- ../xen-unstable-pristine/linux-2.6.11-xen0/include/linux/skbuff.h   2005-03-02 01:38:38.000000000 -0600
+++ linux-2.6.11-xen0/include/linux/skbuff.h    2005-05-18 12:05:41.000000000 -0500
@@ -37,6 +37,10 @@
 #define CHECKSUM_HW 1
 #define CHECKSUM_UNNECESSARY 2
 
+#define SKB_CLONED     1
+#define SKB_NOHDR      2
+#define SKB_FDW_NO_CSUM        4
+
 #define SKB_DATA_ALIGN(X)      (((X) + (SMP_CACHE_BYTES - 1)) & \
                                 ~(SMP_CACHE_BYTES - 1))
 #define SKB_MAX_ORDER(X, ORDER)        (((PAGE_SIZE << (ORDER)) - (X) - \
@@ -238,7 +242,7 @@ struct sk_buff {
                                mac_len,
                                csum;
        unsigned char           local_df,
-                               cloned,
+                               flags,
                                pkt_type,
                                ip_summed;
        __u32                   priority;
@@ -370,7 +374,7 @@ static inline void kfree_skb(struct sk_b
  */
 static inline int skb_cloned(const struct sk_buff *skb)
 {
-       return skb->cloned && atomic_read(&skb_shinfo(skb)->dataref) != 1;
+       return (skb->flags & SKB_CLONED) && atomic_read(&skb_shinfo(skb)->dataref) != 1;
 }
 
 /**
--- ../xen-unstable-pristine/linux-2.6.11-xen0/net/core/skbuff.c        2005-03-02 01:38:17.000000000 -0600
+++ linux-2.6.11-xen0/net/core/skbuff.c 2005-05-18 12:05:41.000000000 -0500
@@ -240,7 +240,7 @@ static void skb_clone_fraglist(struct sk
 
 void skb_release_data(struct sk_buff *skb)
 {
-       if (!skb->cloned ||
+       if (!(skb->flags & SKB_CLONED) ||
            atomic_dec_and_test(&(skb_shinfo(skb)->dataref))) {
                if (skb_shinfo(skb)->nr_frags) {
                        int i;
@@ -352,7 +352,7 @@ struct sk_buff *skb_clone(struct sk_buff
        C(data_len);
        C(csum);
        C(local_df);
-       n->cloned = 1;
+       n->flags = skb->flags | SKB_CLONED;
        C(pkt_type);
        C(ip_summed);
        C(priority);
@@ -395,7 +395,7 @@ struct sk_buff *skb_clone(struct sk_buff
        C(end);
 
        atomic_inc(&(skb_shinfo(skb)->dataref));
-       skb->cloned = 1;
+       skb->flags |= SKB_CLONED;
 
        return n;
 }
@@ -603,7 +603,7 @@ int pskb_expand_head(struct sk_buff *skb
        skb->mac.raw += off;
        skb->h.raw   += off;
        skb->nh.raw  += off;
-       skb->cloned   = 0;
+       skb->flags    &= ~SKB_CLONED;
        atomic_set(&skb_shinfo(skb)->dataref, 1);
        return 0;
 
--- ../xen-unstable-pristine/linux-2.6.11-xen0/net/core/dev.c   2005-03-02 01:38:09.000000000 -0600
+++ linux-2.6.11-xen0/net/core/dev.c    2005-05-20 10:20:36.000000000 -0500
@@ -98,6 +98,7 @@
 #include <linux/stat.h>
 #include <linux/if_bridge.h>
 #include <linux/divert.h>
+#include <net/ip.h> 
 #include <net/dst.h>
 #include <net/pkt_sched.h>
 #include <net/checksum.h>
@@ -1182,7 +1183,7 @@ int __skb_linearize(struct sk_buff *skb,
        skb->data    += offset;
 
        /* We are no longer a clone, even if we were. */
-       skb->cloned    = 0;
+       skb->flags    &= ~SKB_CLONED;
 
        skb->tail     += skb->data_len;
        skb->data_len  = 0;
@@ -1236,6 +1237,15 @@ int dev_queue_xmit(struct sk_buff *skb)
            __skb_linearize(skb, GFP_ATOMIC))
                goto out_kfree_skb;
 
+       /* If packet is forwarded to a device that needs a checksum and not 
+        * checksummed, correct the pointers and enable checksumming in the 
+        * next function.
+        */
+       if (skb->flags & SKB_FDW_NO_CSUM) {
+               skb->ip_summed = CHECKSUM_HW;
+               skb->h.raw = (void *)skb->nh.iph + (skb->nh.iph->ihl * 4);
+       }
+
        /* If packet is not checksummed and device does not support
         * checksumming for this protocol, complete checksumming here.
         */
