This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


[Xen-community] A weird bug in Xen networking?

To: xen-community@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-community] A weird bug in Xen networking?
From: Danilo Godec <danilo.godec@xxxxxxxxx>
Date: Wed, 06 Oct 2010 16:21:50 +0200
Delivery-date: Wed, 06 Oct 2010 07:22:03 -0700
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv: Gecko/20100802 SUSE/3.1.2 Thunderbird/3.1.2
I think I've hit a weird and mostly hidden bug in Xen, but I'm not 100% sure.

Here's the setup: I have an OpenSuSE 11.2 based Dom0 (Xen 3.4.1). Dom0
also acts as a router / firewall and provides WAN connectivity
for the DomUs by means of IPSEC (OpenSwan). I use 'bridged' networking for
the DomUs; there are several NICs, as each DomU belongs to a separate
subnet. Each of Dom0's bridge interfaces has an IP address in the respective
subnet, and that address is used as the gateway for the subnet.

The DomUs are also OpenSuSE 11.2. I use 'cfengine' to centrally manage most
of the configuration and (custom) software distribution.

That's where things go south: when I run cfengine's 'cfagent', it
works up to a point and then just hangs. I can interrupt it with
'CTRL-C' or wait until it times out (socket timeout). Initially I
thought it was a cfengine problem, but then I noticed that a similar thing
happens when I connect to a DomU with SSH and run 'ls -lR /': it goes
through some directories but eventually stalls (and I have to
disconnect the SSH session to 'get out').

Every time such a 'hang' happens, I see OpenSwan / ipsec errors on Dom0:

   klips_error:ipsec_xmit_encap_once: tried to skb_put 20, 16 available.  This should never happen, please report.

The numbers vary somewhat (sometimes it's 21, 17 instead of 20, 16).

I posted all my 'findings' on the OpenSwan mailing list, thinking it might be
an OpenSwan issue, but one of the developers said it doesn't look like
'their' issue and that I should talk to the 'Xen guys'. Here is the relevant
part of his reply:

> Yeah, this does not seem to be an openswan bug. The code in question is:
> (one instance of it):
>         /* Set the data pointer */
>         skb_reserve(n,skb->data-skb->head+headroom);
>         /* Set the tail pointer and length */
>         if(skb_tailroom(n) < skb->len) {
>                 printk(KERN_WARNING "klips_error:skb_copy_expand: "
>                        "tried to skb_put %ld, %d available.  This should never happen, please report.\n",
>                        (unsigned long int)skb->len,
>                        skb_tailroom(n));
>                 ipsec_kfree_skb(n);
>                 return NULL;
>         }
> I would check with the xen people to see what might be going on. 
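
To make the quoted check a bit more concrete, here is a rough userspace
sketch of the pointer arithmetic behind it. It is not kernel or OpenSwan
code: the struct and the model_* functions are made-up stand-ins for the
sk_buff head/data/tail/end bookkeeping, and the buffer sizes are
assumptions picked only so that the result matches the "20, 16" in the
log line above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the sk_buff pointers, kept as offsets.
 * The buffer start (head) is offset 0 and is therefore not stored. */
struct buf_model {
    size_t data;  /* start of payload */
    size_t tail;  /* end of payload */
    size_t end;   /* end of the allocation */
};

/* Free bytes after the payload (the idea behind skb_tailroom()). */
static size_t model_tailroom(const struct buf_model *b)
{
    assert(b->tail <= b->end);
    return b->end - b->tail;
}

/* Move data and tail forward on an empty buffer (the idea behind
 * skb_reserve()). Returns -1 if the reservation does not fit. */
static int model_reserve(struct buf_model *b, size_t len)
{
    if (len > model_tailroom(b))
        return -1;
    b->data += len;
    b->tail += len;
    return 0;
}

/* Append len bytes of payload (the idea behind skb_put()).
 * Returns -1 in the case the quoted code reports as an error. */
static int model_put(struct buf_model *b, size_t len)
{
    if (model_tailroom(b) < len)
        return -1;
    b->tail += len;
    return 0;
}

/* Allocate alloc_size bytes, reserve some of it up front, then try to
 * append payload_len bytes, as the quoted copy routine does. */
static int model_copy(size_t alloc_size, size_t reserve, size_t payload_len)
{
    struct buf_model n = { 0, 0, alloc_size };

    if (model_reserve(&n, reserve) != 0)
        return -1;
    return model_put(&n, payload_len);
}
```

With these assumed sizes, model_copy(64, 48, 20) fails because only 16
bytes remain after the reservation, while model_copy(64, 44, 20) succeeds.
So the warning would mean the new buffer came out smaller than the
reserved space plus the packet length, although this sketch cannot say
why that happens on my setup.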

So here I am, asking the 'Xen guys'.

Does anyone have any idea what might be going on?

 Regards, Danilo
