xen-users

[Xen-users] bonded NICs and Xen

To: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-users] bonded NICs and Xen
From: Fraser Campbell <fraser@xxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 07 Oct 2005 09:46:17 -0400
User-agent: Mozilla Thunderbird 1.0.6 (X11/20050912)
Hi,

I am trying to configure Xen on top of an active-backup bonded configuration using eth0 and eth1. Dom0 can communicate, but dom1 cannot (well, it can with some extra work).

I have seen a few posts about this problem but didn't really find answers except "it should work" and "it works for me". I'm hoping someone can share their configuration or simply confirm that this really works as I would expect. I thought it worked at first, but after more testing I realized it did not.

This is where I am in my debugging:

- when dom0 is configured to use "plain" eth0 and xen-br0, networking to
  dom0 and dom1 works correctly
- when dom0 is configured to use "plain" eth1 and xen-br0, networking to
  dom0 and dom1 works correctly
- when server is booted non-Xen, host can communicate correctly over
  bond0 interface, failover tested by plugging/unplugging cat5 works as
  expected
- when server is booted into Xen (only a single domU for now), bond0 and
  vif1.0 are attached to xen-br0
- when server is booted into Xen, dom0 communicates correctly and has no
  networking problems (at least that I have detected)
- dom1 cannot communicate over bonded bridge

The Xen startup scripts leave my dom0's IP address on bond0 as well as xen-br0. I have removed the IP from bond0, with no improvement.

By using tcpdump I have determined that arp replies are not being received. Let's pretend my router is 192.168.1.1; here is what I am seeing:

- from dom1 "ping -n 192.168.1.1"
- arp request "who-has 192.168.1.1" goes through
  (vif1.0->xen-br0->bond0->eth0->wire)
- arp request is received by 192.168.1.1 and router replies ("arp-reply
  192.168.1.1 is at ...")
- arp reply is not received ... sniffing on eth0/eth1, bond0, etc shows
  no arp reply
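
For reference, the two ARP messages in the trace above differ only in their opcode and address fields. The following is a minimal sketch of the 28-byte ARP payload for IPv4 over Ethernet, built and decoded with the standard library only. The dom1 address 192.168.1.50 and both MAC addresses are made-up placeholders, not values from my setup, and the description strings are only modelled loosely on what tcpdump prints.

```python
import socket
import struct

ARP_REQUEST = 1  # opcode for "who-has"
ARP_REPLY = 2    # opcode for "is-at"

# htype, ptype, hlen, plen, oper, sender MAC, sender IP, target MAC, target IP
ARP_FORMAT = "!HHBBH6s4s6s4s"


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def bytes_to_mac(raw: bytes) -> str:
    """Convert 6 raw bytes into 'aa:bb:cc:dd:ee:ff'."""
    return ":".join(f"{b:02x}" for b in raw)


def build_arp(oper, sender_mac, sender_ip, target_mac, target_ip) -> bytes:
    """Build the ARP payload (without the Ethernet header)."""
    return struct.pack(
        ARP_FORMAT,
        1,       # hardware type: Ethernet
        0x0800,  # protocol type: IPv4
        6,       # hardware address length
        4,       # protocol address length
        oper,
        mac_to_bytes(sender_mac), socket.inet_aton(sender_ip),
        mac_to_bytes(target_mac), socket.inet_aton(target_ip),
    )


def describe_arp(payload: bytes) -> str:
    """Summarise an ARP payload as a request or a reply."""
    fields = struct.unpack(ARP_FORMAT, payload[:28])
    oper, sha, spa, tpa = fields[4], fields[5], fields[6], fields[8]
    if oper == ARP_REQUEST:
        return f"who-has {socket.inet_ntoa(tpa)} tell {socket.inet_ntoa(spa)}"
    if oper == ARP_REPLY:
        return f"{socket.inet_ntoa(spa)} is-at {bytes_to_mac(sha)}"
    return f"unknown ARP opcode {oper}"


# Placeholder addresses: dom1 asks for the router, the router answers.
request = build_arp(ARP_REQUEST, "00:16:3e:00:00:01", "192.168.1.50",
                    "00:00:00:00:00:00", "192.168.1.1")
reply = build_arp(ARP_REPLY, "00:11:22:33:44:55", "192.168.1.1",
                  "00:16:3e:00:00:01", "192.168.1.50")
print(describe_arp(request))
print(describe_arp(reply))
```

The request is sent to the Ethernet broadcast address, while the reply is unicast back to the requester's MAC, which is why the reply has to be forwarded correctly at every hop between the wire and vif1.0.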

This sounds like a switch problem except for one issue: why do eth0->xen-br0->vif1.0 and eth1->xen-br0->vif1.0 work, but neither eth0->bond0->xen-br0->vif1.0 nor eth1->bond0->xen-br0->vif1.0 works?

Active-backup mode is not supposed to require any support from the switch, and my testing confirms this. Without Xen in the picture, bonding and failover work exactly as they are supposed to.

I suspected that there might be some security configured in our switches (perhaps one MAC per port), but that doesn't make sense either since things work correctly without bonding. I don't control our switches ...

If I hardcode the MAC address of the router in dom1's arp table then it can communicate with the world. Anything in the local subnet must be hardcoded in the arp table too, though, or it cannot get through (logical, since arp replies are not being received).
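
As an illustration of that workaround, here is a small sketch that builds (but does not run) an iproute2 command for a permanent neighbour entry. The use of `ip neigh replace` is an assumption for the example, since the classic `arp -s` tool does the same job, and the MAC address and the interface name eth0 inside dom1 are placeholders.

```python
import ipaddress
import re

MAC_RE = re.compile(r"^[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}$")


def static_arp_command(ip: str, mac: str, dev: str) -> list:
    """Return the argv for a permanent ARP entry; nothing is executed here."""
    ipaddress.ip_address(ip)  # raises ValueError on a malformed address
    if not MAC_RE.match(mac):
        raise ValueError(f"not a MAC address: {mac!r}")
    return ["ip", "neigh", "replace", ip,
            "lladdr", mac.lower(),
            "nud", "permanent",
            "dev", dev]


# Placeholder MAC for the pretend router 192.168.1.1, on dom1's assumed eth0.
command = static_arp_command("192.168.1.1", "00:11:22:33:44:55", "eth0")
print(" ".join(command))
```

Passing the resulting list to `subprocess.run` as root inside dom1 would install the entry. One such entry is needed for every host on the local subnet that dom1 has to reach.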

It strikes me as strange that everything but arp is functioning correctly on the bridge ... isn't that one of the major functions of a bridge? Anyway I continue to read up on bridges hoping for my eureka moment.

I have played with rp_filter on all interfaces and other parameters that I thought might be relevant ... to no avail so far.
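
For anyone wanting to compare settings, a minimal sketch for collecting the rp_filter value of every interface in one pass follows. It assumes the usual Linux layout of one directory per interface under /proc/sys/net/ipv4/conf; the function name and its `conf_dir` parameter are invented for this example.

```python
from pathlib import Path


def read_rp_filter(conf_dir: str = "/proc/sys/net/ipv4/conf") -> dict:
    """Map each interface name to its rp_filter setting as an integer."""
    settings = {}
    for entry in sorted(Path(conf_dir).glob("*/rp_filter")):
        # The parent directory is named after the interface (eth0, bond0, ...).
        settings[entry.parent.name] = int(entry.read_text().strip())
    return settings


if __name__ == "__main__":
    for iface, value in read_rp_filter().items():
        print(f"{iface}: rp_filter={value}")
```

Running it in dom0 before and after each change makes it easier to confirm that a setting really was applied to bond0, xen-br0 and the vif interface rather than only to "all" or "default".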

One other strange thing to note is that sometimes it works. In fact I had a bonded server in testing for quite a while and didn't notice this problem until I moved it to a new data centre (though I don't think I had tested failover originally) ... now it won't work even in the original lab setting.

Occasionally something happens and dom1 suddenly gets an arp-reply on its own and talks. Perhaps it's just coincidentally waiting for an arp-reply at the same time as dom0, and that gets passed through?

Once in a while, if I down one interface in the bond, things start working, but not always. I'm trying to track down a cause here but have no ideas so far.
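
To correlate "works" and "doesn't work" with the state of the bond, it helps to record which slave is active at each attempt. The sketch below parses text in the style of /proc/net/bonding/bond0. The sample text is an abbreviated, hand-written illustration rather than a capture from my server, and the real file contains more fields than are shown here.

```python
SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth0
MII Status: up

Slave Interface: eth0
MII Status: up

Slave Interface: eth1
MII Status: down
"""


def parse_bonding_status(text: str) -> dict:
    """Extract the mode, the active slave and each slave's MII status."""
    status = {"mode": None, "active_slave": None, "slaves": {}}
    current = None  # slave section currently being read, if any
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "key: value"
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            status["mode"] = value
        elif key == "Currently Active Slave":
            status["active_slave"] = value
        elif key == "Slave Interface":
            current = value
            status["slaves"][current] = None
        elif key == "MII Status" and current is not None:
            # The bond-level MII status (before any slave) is skipped.
            status["slaves"][current] = value
    return status


info = parse_bonding_status(SAMPLE)
print(info["active_slave"], info["slaves"])
```

On a live system the text would come from reading /proc/net/bonding/bond0 in dom0 each time a ping from dom1 is attempted.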

If I hardcode the arp entries in dom1, things always work.

Using Xen 2.0.5/kernel 2.6.11 in dom0 (SuSE Pro 9.3) and kernel 2.6.5 in domU.

Ideas greatly appreciated, I've been mulling this one over for a while now! I will follow up for the record if I happen to stumble over an answer myself.

Thanks,
Fraser
