WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-api

Re: [Xen-API] Re: openvswitch problem with XCP 1.0 RC3

To: blp@xxxxxxxxxxxxxxx
Subject: Re: [Xen-API] Re: openvswitch problem with XCP 1.0 RC3
From: Christian Fischer <christian.fischer@xxxxxxxxxxxxxxxxxxx>
Date: Mon, 21 Mar 2011 08:15:53 +0100
Cc: xen-api@xxxxxxxxxxxxxxxxxxx
Delivery-date: Mon, 21 Mar 2011 00:16:18 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <87pqps2no6.fsf@xxxxxxxxxxxxxxxx>
List-help: <mailto:xen-api-request@lists.xensource.com?subject=help>
List-id: Discussion of API issues surrounding Xen <xen-api.lists.xensource.com>
List-post: <mailto:xen-api@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-api>, <mailto:xen-api-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-api>, <mailto:xen-api-request@lists.xensource.com?subject=unsubscribe>
Organization: EasternGraphics GmbH
References: <EBDA14578E54D94191DD90533575FDFBF5EB3816@xxxxxxxxxxxxxxxxxxxxx> <87aah1cuyt.fsf@xxxxxxxxxxxxxxxx> <87pqps2no6.fsf@xxxxxxxxxxxxxxxx>
Sender: xen-api-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: KMail/1.13.5 (Linux/2.6.29-gentoo-r5; KDE/4.4.5; x86_64; ; )
On Tuesday 15 March 2011 20:23:21 Ben Pfaff wrote:
> Ben Pfaff <blp@xxxxxxxxxxxxxxx> writes:
> > Testing is taking longer than I expected, but we're still
> > planning to release this soon.
> 
> Here's the tested version of the kernel patch that we already
> passed along to Citrix.  I have already posted the corresponding
> OVS patch to ovs-dev here:
>         http://openvswitch.org/pipermail/dev/2011-March/028884.html
> 
> --8<--------------------------cut here-------------------------->8--
> 
> From: Ben Pfaff <blp@xxxxxxxxxx>
> Date: Fri, 25 Feb 2011 16:47:48 -0800
> Subject: [PATCH] 8021q: Add kluge to support 802.1Q VLANs on devices with
> buggy drivers.
> 
> Some Linux network drivers support a feature called "VLAN
> acceleration", associated with a data structure called a "vlan_group".
> A vlan_group is, abstractly, a dictionary that maps from a VLAN ID (in
> the range 0...4095) to a VLAN device, that is, a Linux network device
> associated with a particular VLAN, e.g. "eth0.9" for VLAN 9 on eth0.
> 
> Some drivers that support VLAN acceleration have bugs that fall
> roughly into the following categories:
> 
>     * Some NICs strip VLAN tags on receive if no vlan_group is
>       registered, so that the tag is completely lost.
> 
>     * Some drivers size their receive buffers based on whether a
>       vlan_group is enabled, meaning that a maximum size packet with a
>       VLAN tag will not fit if a vlan_group is not configured.
> 
>     * On transmit some drivers expect that VLAN acceleration will be
>       used if it is available (which can only be done if a vlan_group
>       is configured).  In these cases, the driver may fail to parse
>       the packet and correctly setup checksum offloading and/or TSO.
> 
> The correct long term solution is to fix these driver bugs.

FYI:
VLAN trunks work for us with igb, e1000, e1000e with all versions of XCP and 
openvswitch-1.1.0pre2.

The limitation is the incorrect receive buffer size which must be adjusted 
manually.

The drivers are patched to avoid transmit checksum errors, see 
http://patchwork.ozlabs.org/patch/24670/, but note that the patch is buggy in 
line 19.

> 
> This commit implements a workaround, which has been tested to work with the
> "igb" driver, which does not otherwise work with VLANs on Open vSwitch:
> 
>     * Implement a new VLAN device ioctl that sets up a vlan_group in
>       which all 4096 VLANs point back to the physical device.
> 
>     * In the code that handles received VLAN-accelerated packets, save
>       the VLAN on which the packet was received to a new sk_buff
>       member before clearing the VLAN number.
> 
>     * Check the value of the new member in the OVS kernel module
>       packet receive code.
> 
> This requires only minimal, low-risk changes to the core of the kernel
> network stack, as well as minimal changes to the Open vSwitch kernel
> module.
> 
> Driver Details
> --------------
> 
> The following drivers in linux-2.6.32.12-0.7.1.xs1.0.0.311.170586 are
> the ones that use "vlan_group" and are relevant to Open vSwitch on
> XenServer.  We have not tested any version of most of these drivers,
> so we do not know whether they have a VLAN problem that needs to be
> fixed:
> 
>     8139cp
>     acenic
>     amd8111e
>     atl1c
>     atl1e
>     atl1
>     atl2
>     be2net
>     bna
>     bnx2
>     bnx2x
>     cnic
>     cxgb
>     cxgb3
>     e1000
>     e1000e
>     enic
>     forcedeth
>     gianfar
>     igb
>     igbvf
>     ixgb
>     ixgbe
>     jme
>     mlx4_en
>     ns83820
>     qlge
>     r8169
>     s2io
>     sky2
>     starfire
>     tehuti
>     tg3
>     typhoon
>     via-velocity
>     vxge
> 
> Of the drivers that use "vlan_group" listed above, the following
> drivers inspect vlan devices in ways that imply that the VLAN
> workaround would not work for them.  These drivers have not yet been
> tested, so we do not know whether they need to use the VLAN workaround
> anyhow:
> 
>     cnic
>     cxgb3
> 
> Of the drivers that use "vlan_group" listed above, the following
> drivers appear to have bugs that would limit them to VLANs with VIDs
> in the range 0...63 with the kernel patch fix.  This is based on quick
> code inspection, so it is not necessarily an exhaustive list.  The
> drivers on this list have not yet been tested, so we do not know
> whether they need to use the VLAN workaround anyhow:
> 
>     via-velocity
> 
> The following drivers use "vlan_group" but are irrelevant to Open
> vSwitch on XenServer:
> 
>     bonding (not used with Open vSwitch)
>     ehea (cannot be built on x86; IBM Power architecture only)
>     stmmac (cannot be built on x86; SH4 architecture)
>     vmxnet3 (not shipped with XenServer; for use inside VMware VMs only)
> 
> Signed-off-by: Ben Pfaff <blp@xxxxxxxxxx>
> ---
>  include/linux/if_vlan.h |    5 ++-
>  net/8021q/vlan.c        |  131
> ++++++++++++++++++++++++++++++++++++++++++++-- net/8021q/vlan_core.c   |  
>  8 +++-
>  net/core/dev.c          |    1 +
>  4 files changed, 137 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
> index 46e2bf4..a0260d7 100644
> --- a/include/linux/if_vlan.h
> +++ b/include/linux/if_vlan.h
> @@ -84,6 +84,7 @@ struct vlan_group {
>       struct hlist_node       hlist;  /* linked list */
>       struct net_device **vlan_devices_arrays[VLAN_GROUP_ARRAY_SPLIT_PARTS];
>       struct rcu_head         rcu;
> +     bool                    loopback;
>  };
> 
>  static inline struct net_device *vlan_group_get_device(struct vlan_group
> *vg, @@ -325,7 +326,9 @@ enum vlan_ioctl_cmds {
>       SET_VLAN_NAME_TYPE_CMD,
>       SET_VLAN_FLAG_CMD,
>       GET_VLAN_REALDEV_NAME_CMD, /* If this works, you know it's a VLAN 
> device,
> btw */ -      GET_VLAN_VID_CMD /* Get the VID of this VLAN (specified by name)
> */ +  GET_VLAN_VID_CMD, /* Get the VID of this VLAN (specified by name) */
> +     ADD_ALL_VLANS_CMD,
> +     DEL_ALL_VLANS_CMD
>  };
> 
>  enum vlan_flags {
> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
> index bd8a167..f74d10e 100644
> --- a/net/8021q/vlan.c
> +++ b/net/8021q/vlan.c
> @@ -101,7 +101,8 @@ static void vlan_group_free(struct vlan_group *grp)
>       kfree(grp);
>  }
> 
> -static struct vlan_group *vlan_group_alloc(struct net_device *real_dev)
> +static struct vlan_group *vlan_group_alloc(struct net_device *real_dev,
> +                                        bool loopback)
>  {
>       struct vlan_group *grp;
> 
> @@ -110,6 +111,7 @@ static struct vlan_group *vlan_group_alloc(struct
> net_device *real_dev) return NULL;
> 
>       grp->real_dev = real_dev;
> +     grp->loopback = loopback;
>       hlist_add_head_rcu(&grp->hlist,
>                       &vlan_group_hash[vlan_grp_hashfn(real_dev->ifindex)]);
>       return grp;
> @@ -242,7 +244,7 @@ int register_vlan_dev(struct net_device *dev)
> 
>       grp = __vlan_find_group(real_dev);
>       if (!grp) {
> -             ngrp = grp = vlan_group_alloc(real_dev);
> +             ngrp = grp = vlan_group_alloc(real_dev, false);
>               if (!grp)
>                       return -ENOBUFS;
>               err = vlan_gvrp_init_applicant(real_dev);
> @@ -363,6 +365,90 @@ out_free_newdev:
>       return err;
>  }
> 
> +/* Attach a "loopback" VLAN device to every VLAN on dev.
> + * Returns 0 if successful, a negative error code otherwise.
> + */
> +static int register_all_vlans(struct net_device *dev)
> +{
> +     struct net_device **array;
> +     struct vlan_group *grp;
> +     unsigned int size;
> +     int err;
> +     int i;
> +
> +     ASSERT_RTNL();
> +
> +     if (__vlan_find_group(dev))
> +             return -EBUSY;
> +
> +     err = vlan_check_real_dev(dev, 0);
> +     if (err < 0)
> +             return err;
> +
> +     grp = vlan_group_alloc(dev, true);
> +     if (!grp)
> +             return -ENOBUFS;
> +
> +     size = sizeof(struct net_device *) * VLAN_GROUP_ARRAY_PART_LEN;
> +     array = kmalloc(size, GFP_KERNEL);
> +     if (array == NULL)
> +             return -ENOBUFS;
> +     for (i = 0; i < VLAN_GROUP_ARRAY_PART_LEN; i++)
> +             array[i] = dev;
> +     for (i = 0; i < VLAN_GROUP_ARRAY_SPLIT_PARTS; i++)
> +             grp->vlan_devices_arrays[i] = array;
> +
> +     /* Account for reference in struct vlan_dev_info */
> +     dev_hold(dev);
> +
> +     grp->nr_vlans = VLAN_GROUP_ARRAY_LEN;
> +
> +     if (dev->features & NETIF_F_HW_VLAN_RX)
> +             dev->netdev_ops->ndo_vlan_rx_register(dev, grp);
> +     if (dev->features & NETIF_F_HW_VLAN_FILTER)
> +             for (i = 0; i < VLAN_GROUP_ARRAY_LEN; i++)
> +                     dev->netdev_ops->ndo_vlan_rx_add_vid(dev, i);
> +
> +     return 0;
> +}
> +
> +static int unregister_all_vlans(struct net_device *dev)
> +{
> +     struct vlan_group *grp;
> +     int i;
> +
> +     ASSERT_RTNL();
> +
> +     grp = __vlan_find_group(dev);
> +     if (!grp || !grp->loopback)
> +             return -EINVAL;
> +
> +     /* Take it out of our own structures, but be sure to interlock with
> +      * HW accelerating devices or SW vlan input packet processing.
> +      */
> +     if (dev->features & NETIF_F_HW_VLAN_FILTER)
> +             for (i = 0; i < VLAN_GROUP_ARRAY_LEN; i++)
> +                     dev->netdev_ops->ndo_vlan_rx_kill_vid(dev, i);
> +
> +     for (i = 0; i < VLAN_GROUP_ARRAY_PART_LEN; i++)
> +             vlan_group_set_device(grp, i, NULL);
> +     grp->nr_vlans = 0;
> +
> +     if (dev->features & NETIF_F_HW_VLAN_RX)
> +             dev->netdev_ops->ndo_vlan_rx_register(dev, NULL);
> +
> +     hlist_del_rcu(&grp->hlist);
> +
> +     synchronize_net();
> +     kfree(grp->vlan_devices_arrays[0]);
> +     kfree(grp);
> +
> +     /* Get rid of the vlan's reference to dev */
> +     dev_put(dev);
> +
> +     return 0;
> +}
> +
>  static void vlan_sync_address(struct net_device *dev,
>                             struct net_device *vlandev)
>  {
> @@ -444,6 +530,12 @@ static int vlan_device_event(struct notifier_block
> *unused, unsigned long event, * as we run under the RTNL lock.
>        */
> 
> +     if (grp->loopback) {
> +             if (event == NETDEV_UNREGISTER)
> +                     unregister_all_vlans(dev);
> +             goto out;
> +     }
> +
>       switch (event) {
>       case NETDEV_CHANGE:
>               /* Propagate real device state to vlan devices */
> @@ -581,13 +673,18 @@ static int vlan_ioctl_handler(struct net *net, void
> __user *arg) case DEL_VLAN_CMD:
>       case GET_VLAN_REALDEV_NAME_CMD:
>       case GET_VLAN_VID_CMD:
> +     case ADD_ALL_VLANS_CMD:
> +     case DEL_ALL_VLANS_CMD:
>               err = -ENODEV;
>               dev = __dev_get_by_name(net, args.device1);
>               if (!dev)
>                       goto out;
> 
>               err = -EINVAL;
> -             if (args.cmd != ADD_VLAN_CMD && !is_vlan_dev(dev))
> +             if (args.cmd != ADD_VLAN_CMD &&
> +                 args.cmd != ADD_ALL_VLANS_CMD &&
> +                 args.cmd != DEL_ALL_VLANS_CMD &&
> +                 !is_vlan_dev(dev))
>                       goto out;
>       }
> 
> @@ -667,6 +764,20 @@ static int vlan_ioctl_handler(struct net *net, void
> __user *arg) err = -EFAULT;
>               break;
> 
> +     case ADD_ALL_VLANS_CMD:
> +             err = -EPERM;
> +             if (!capable(CAP_NET_ADMIN))
> +                     break;
> +             err = register_all_vlans(dev);
> +             break;
> +
> +     case DEL_ALL_VLANS_CMD:
> +             err = -EPERM;
> +             if (!capable(CAP_NET_ADMIN))
> +                     break;
> +             err = unregister_all_vlans(dev);
> +             break;
> +
>       default:
>               err = -EOPNOTSUPP;
>               break;
> @@ -769,9 +880,17 @@ static void __exit vlan_cleanup_module(void)
> 
>       dev_remove_pack(&vlan_packet_type);
> 
> -     /* This table must be empty if there are no module references left. */
> -     for (i = 0; i < VLAN_GRP_HASH_SIZE; i++)
> -             BUG_ON(!hlist_empty(&vlan_group_hash[i]));
> +     for (i = 0; i < VLAN_GRP_HASH_SIZE; i++) {
> +             struct hlist_head *hlist = &vlan_group_hash[i];
> +             while (!hlist_empty(hlist)) {
> +                     struct vlan_group *grp;
> +                     int err;
> +
> +                     grp = hlist_entry(hlist->first, struct vlan_group, 
> hlist);
> +                     err = unregister_all_vlans(grp->real_dev);
> +                     BUG_ON(err);
> +             }
> +     }
> 
>       unregister_pernet_gen_device(vlan_net_id, &vlan_net_ops);
>       rcu_barrier(); /* Wait for completion of call_rcu()'s */
> diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
> index 7f7de1a..8327334 100644
> --- a/net/8021q/vlan_core.c
> +++ b/net/8021q/vlan_core.c
> @@ -33,6 +33,11 @@ int vlan_hwaccel_do_receive(struct sk_buff *skb)
>       struct net_device *dev = skb->dev;
>       struct net_device_stats *stats;
> 
> +     if (!is_vlan_dev(dev)) {
> +             skb->pkt_type = PACKET_OTHERHOST;
> +             return 0;
> +     }
> +
>       skb->dev = vlan_dev_info(dev)->real_dev;
>       netif_nit_deliver(skb);
> 
> @@ -90,7 +95,8 @@ static int vlan_gro_common(struct napi_struct *napi,
> struct vlan_group *grp,
> 
>       for (p = napi->gro_list; p; p = p->next) {
>               NAPI_GRO_CB(p)->same_flow =
> -                     p->dev == skb->dev && !compare_ether_header(
> +                     p->dev == skb->dev && p->vlan_tci == skb->vlan_tci &&
> +                     !compare_ether_header(
>                               skb_mac_header(p), skb_gro_mac_header(skb));
>               NAPI_GRO_CB(p)->flush = 0;
>       }
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 0b5bf64..4c99b87 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2742,6 +2742,7 @@ void napi_reuse_skb(struct napi_struct *napi, struct
> sk_buff *skb) {
>       __skb_pull(skb, skb_headlen(skb));
>       skb_reserve(skb, NET_IP_ALIGN - skb_headroom(skb));
> +     skb->vlan_tci = 0;
> 
>       napi->skb = skb;
>  }


_______________________________________________
xen-api mailing list
xen-api@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/mailman/listinfo/xen-api

<Prev in Thread] Current Thread [Next in Thread>