[Xen-users] pv_ops kernel and network problems (checksum offloading?)

To: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-users] pv_ops kernel and network problems (checksum offloading?)
From: Markus Schuster <ml@xxxxxxxxxxxxxxxxxxxx>
Date: Sun, 10 Jan 2010 02:08:49 +0100
Delivery-date: Sat, 09 Jan 2010 17:09:21 -0800
User-agent: KMail/1.12.2 (Linux/2.6.29-2-amd64; KDE/4.3.2; x86_64; ; )
Hi list,

I'm experiencing some very strange network problems when using a masquerading 
router domU with pv_ops kernels.
First of all, here is some ASCII art showing my network configuration:

                  +---------------------+
               +--|-eth0   domU2   eth1-|-----+
               |  +---------------------+     |
               |                              |
               |  +---------------------+     |
               |  |        domU1   eth1-|--+  |
               |  +---------------------+  |  |
               |                           |  |
       +-------|---------------------------|--|--------+
       |       | vif2.0             vif1.1 |  | vif2.1 |
 Internet      |                           |  |        |
 <-----|----- brexternal     dom0        brinternal    |
       | eth0                                          |
       +-----------------------------------------------+

domU1 intentionally has no internet connection, and domU2 acts as a
masquerading router for the internal network.
The configuration is very basic; on domU2 I've issued the following commands:
# echo 1 > /proc/sys/net/ipv4/ip_forward
# iptables -A POSTROUTING -t nat -s <internal/net> -j MASQUERADE

Now the problems:

1. ICMP
When I try to ping an internet host from domU1, the dom0 kernel logs the
following message for every ICMP echo request packet domU1 tries to send:
--- cut ---
Attempting to checksum a non-TCP/UDP packet, dropping a protocol 1 packet
--- cut ---
IP protocol 1 is ICMP, so this matches. Using tcpdump I've been able to follow
the ping packets along their path: domU1-eth1 -> vif1.1 -> brinternal ->
vif2.1 -> domU2-eth1 -> domU2-eth0
The packet never reaches vif2.0; it gets dropped somewhere in between (judging
by the message I see, I would expect the dom0 kernel to be the culprit).
Issuing the same ping command directly on domU2 works without any problems.
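
A minimal sketch of how the hop-by-hop trace above could be set up on the dom0
side. It only prints one tcpdump command per interface, so each capture can be
run in its own terminal. The interface names come from the diagram; the
function name trace_icmp_cmds is hypothetical and not part of any tool.

```shell
#!/bin/sh
# Sketch only: print a tcpdump command for each interface given as argument.
# -n disables name resolution, -i selects the interface, "icmp" is the filter.
trace_icmp_cmds() {
    for ifc in "$@"; do
        echo "tcpdump -n -i $ifc icmp"
    done
}

# dom0 interfaces in the order a ping from domU1 should traverse them.
trace_icmp_cmds vif1.1 brinternal vif2.1 vif2.0 brexternal eth0
```

The first interface in this list that shows no echo requests marks the hop
after which the packet is lost.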

2. TCP
When I try to connect to an internet host via TCP from domU1, I see very odd
behavior:
The TCP SYN packet leaves dom0 on eth0 as desired and reaches the remote host.
But the remote host never responds with a SYN/ACK packet, so I took a deeper
look with tcpdump and Wireshark: The packet *seems* to leave dom0 eth0 with a
correct TCP checksum but arrives at the remote host with the TCP checksum
ALWAYS set to 0xeeee, which is wrong of course, so the remote host drops the
SYN packet. I'm very sure, though, that the packet actually leaves dom0 with
the wrong checksum already.
Next I remembered the early Xen 3 days, when we were forced to use ethtool to
disable checksum offloading everywhere, so I did the same: I used
"ethtool -K <interface> tx off" for EVERY interface in the communication path
(domU1-eth1, vif1.1, brinternal, vif2.1, domU2-eth1, domU2-eth0, vif2.0,
brexternal and dom0-eth0), but the only effect is that I now see the packet
leaving dom0 at eth0 with a wrong checksum (0xeeee).
I have no problem connecting to this host directly from domU2.
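
The ethtool step above can be sketched as a small loop for the dom0 side. This
is an illustration under assumptions, not the exact commands that were run:
the function name disable_tx_csum and the RUN switch are hypothetical, and by
default the commands are only printed. The domU interfaces need the same
ethtool call inside the respective guest.

```shell
#!/bin/sh
# Sketch only: disable TX checksum offloading on the listed interfaces.
# Without RUN=1 the commands are printed instead of executed
# (executing them needs root and ethtool installed).
disable_tx_csum() {
    for ifc in "$@"; do
        if [ "${RUN:-0}" = "1" ]; then
            ethtool -K "$ifc" tx off || echo "ethtool failed on $ifc" >&2
        else
            echo "ethtool -K $ifc tx off"
        fi
    done
}

# dom0 interfaces from the diagram.
disable_tx_csum vif1.1 brinternal vif2.1 vif2.0 brexternal eth0
```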

My system configuration:
Debian lenny amd64 everywhere
Xen 3.4.2 (Debian unstable built for lenny)
dom0 kernel: pv_ops from Jeremy's tree (changeset
8735edb4a976105fd29c97c00c6d14760537e4ee)
domU kernel: pv_ops 2.6.29-2 (from Debian unstable) (I would like to move to a
newer kernel, but there's that other nasty bug :))

This looks like some sort of checksum offloading bug in the pv_ops kernel tree
that kicks in when a domU is used to route (and masquerade) other traffic.

Any ideas?

Regards,
Markus
