Before I run this by the devs, I thought I'd pose the question here
first, in case it's just something I'm overlooking. I do suspect that
it's a bug, though...
I was building a test scenario in a lab environment that was a
bit more complicated (just more variables) than what I'm about to
describe here, but I stripped out as many variables as I could in
order to track down a problem I was having with it.
What I determined is that, consistently, any packet forwarded
through the domU's kernel shows up at its destination with the
wrong checksum. Not only is it wrong, every packet has the same
checksum (0x9e85). When I use tcpdump to watch the packets through all
of the virtual interfaces, bridges and even the dom0's physical
interface, they show the correct checksum. It's only at the
destination's interface that it is incorrect (and thus the packet is
dropped there, causing communications to fail). I still suspect Xen's
networking to be the source of the problem because if I use the dom0's
kernel to do the exact same forwarding, it works just fine.
Another peculiarity is that ICMP forwarded through the domU is
dropped by the dom0's kernel, and every packet causes this message to
appear in dmesg:
  Attempting to checksum a non-TCP/UDP packet, dropping a protocol 1 packet
Here's the set-up:
I have two generic desktop machines running Debian Linux (not Xen):
One with IP address 192.168.15.3/24, the other with IP address
192.168.14.3/24. I have a Xen server, also running Debian Linux for the
privileged domain (dom0), with three ethernet interfaces. There are no
IP addresses bound to eth1, eth2, peth1, or peth2, but their links are
in the "UP" state. The Xen server is hosting a Debian Linux domU that
has all three of the dom0's ethernet interfaces bridged to it. The
first desktop is connected to eth1 on the Xen server and the second is
connected to eth2 on the Xen server (eth0 is not really used in this
test). I'm using cross-over cables for these connections in order to
eliminate ethernet switches as being culprits. The domU is configured
with 192.168.15.1/24 on eth1 and 192.168.14.1/24 on eth2. The desktops
are using those as their respective default gateways.
/proc/sys/net/ipv4/ip_forward shows "1" on the domU and there's nothing
in iptables.
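
For readers who want the addressing at a glance, here is a small Python sketch of the set-up above. The addresses come from this message; the labels "first desktop" and "second desktop" and the helper gateway_port are only illustrative names. It simply confirms that the two desktops sit on different /24 networks, so traffic between them has to be routed by the domU.

```python
import ipaddress

# domU interfaces and desktop addresses as described in the set-up.
domu = {
    "eth1": ipaddress.ip_interface("192.168.15.1/24"),
    "eth2": ipaddress.ip_interface("192.168.14.1/24"),
}
desktops = {
    "first desktop": ipaddress.ip_interface("192.168.15.3/24"),
    "second desktop": ipaddress.ip_interface("192.168.14.3/24"),
}

def gateway_port(host):
    """Return the name of the domU interface on the same subnet as host, or None."""
    for name, iface in domu.items():
        if host.network == iface.network:
            return name
    return None

for label, host in desktops.items():
    print(f"{label} ({host.ip}) reaches the domU via {gateway_port(host)}")
```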
One of the two desktop machines (the one at 192.168.15.3) has SSH
listening on port 22120. From the domU, both 192.168.15.3 and
192.168.14.3 are pingable. Connecting to port 22120 on the 192.168.15.3
machine is also possible from the domU. Attempting a connection from
the 192.168.14.3 machine to the 192.168.15.3 machine on that port yields
the following results in tcpdump:
****************
[eth1 on the domU]
192.168.14.3.42062 > 192.168.15.3.22120: Flags [S], cksum 0x6c4b
(correct), seq 2616249367, win 5840, options [mss 1460,sackOK,TS val
15712992 ecr 0,nop,wscale 6], length 0
[vif6.1 on the dom0]
192.168.14.3.42062 > 192.168.15.3.22120: Flags [S], cksum 0x6c4b
(correct), seq 2616249367, win 5840, options [mss 1460,sackOK,TS val
15712992 ecr 0,nop,wscale 6], length 0
[eth1 on the dom0] (the bridge)
192.168.14.3.42062 > 192.168.15.3.22120: Flags [S], cksum 0x6c4b
(correct), seq 2616249367, win 5840, options [mss 1460,sackOK,TS val
15712992 ecr 0,nop,wscale 6], length 0
[peth1 on the dom0]
192.168.14.3.42062 > 192.168.15.3.22120: Flags [S], cksum 0x6c4b
(correct), seq 2616249367, win 5840, options [mss 1460,sackOK,TS val
15712992 ecr 0,nop,wscale 6], length 0
[eth0 on the destination desktop machine]
192.168.14.3.42062 > 192.168.15.3.22120: Flags [S], cksum 0x9e85
(incorrect -> 0x6c4b), seq 2616249367, win 5840, options [mss
1460,sackOK,TS val 15712992 ecr 0,nop,wscale 6], length 0
****************
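
The same per-interface comparison can be done mechanically. Below is a minimal Python sketch that pulls the checksum annotation out of verbose tcpdump text like the captures above. The function name checksum_status and the shortened capture strings are illustrative only, and the regular expression assumes the "cksum 0x.... (correct)" / "(incorrect -> 0x....)" wording shown in this output.

```python
import re

# Matches annotations such as "cksum 0x6c4b (correct)" or
# "cksum 0x9e85 (incorrect -> 0x6c4b)".
CKSUM_RE = re.compile(
    r"cksum (0x[0-9a-f]+) \((correct|incorrect -> (0x[0-9a-f]+))\)"
)

def checksum_status(text):
    """Return (seen, expected, ok) for the first checksum annotation, or None."""
    # The captures are line-wrapped, so collapse whitespace before matching.
    flat = " ".join(text.split())
    m = CKSUM_RE.search(flat)
    if m is None:
        return None
    seen = int(m.group(1), 16)
    ok = m.group(2) == "correct"
    expected = seen if ok else int(m.group(3), 16)
    return seen, expected, ok

# Shortened excerpts of two of the captures above.
captures = {
    "peth1 on the dom0": "Flags [S], cksum 0x6c4b\n(correct), seq 2616249367",
    "eth0 on the destination": "Flags [S], cksum 0x9e85\n(incorrect -> 0x6c4b), seq 2616249367",
}
for point, text in captures.items():
    seen, expected, ok = checksum_status(text)
    verdict = "ok" if ok else "BAD, expected " + format(expected, "#06x")
    print(point + ": " + format(seen, "#06x") + " " + verdict)
```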
From this, I know that the packet is being correctly forwarded by
the domU kernel because it's arriving on eth2 from one desktop machine
and it's showing up on eth1, headed toward the second desktop machine.
The checksum shown on the tcpdump from eth1 on the domU (the interface
it is being forwarded to) is correct.
I tried the test with the domU's checksum offloading turned on and
with it turned off. Both yield the same result. It's as though, just
as it's physically going out on the wire, the checksum gets changed, and
it always seems to be set to 0x9e85, regardless of what's in the packet.
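
As a cross-check on the two values, the SYN from the captures can be rebuilt and checksummed by hand. The Python sketch below follows the standard TCP checksum (one's-complement sum over the IPv4 pseudo-header plus the segment, RFC 1071 style). It assumes an acknowledgment number and urgent pointer of 0, which tcpdump does not print for this SYN, and that the options are laid out in the order shown. Under those assumptions the full checksum comes out as 0x6c4b, and the sum over the pseudo-header alone (addresses, protocol 6, TCP length 40), folded but not inverted, comes out as 0x9e85. That would be consistent with the checksum field still holding the partial value that offload paths leave for the hardware to finish, but the captures alone don't prove where that happens.

```python
import socket
import struct

def ones_complement_sum(data):
    """Folded 16-bit one's-complement sum of data, not inverted."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total

def pseudo_header(src, dst, tcp_len):
    """IPv4 pseudo-header: source, destination, zero, protocol 6 (TCP), length."""
    return (socket.inet_aton(src) + socket.inet_aton(dst)
            + struct.pack("!BBH", 0, 6, tcp_len))

# TCP header fields from the tcpdump output; the checksum field is zeroed.
# Ack number and urgent pointer are assumed to be 0.
header = struct.pack("!HHIIBBHHH",
                     42062, 22120,    # source and destination ports
                     2616249367, 0,   # seq, ack
                     10 << 4, 0x02,   # data offset 10 words, SYN flag
                     5840, 0, 0)      # window, checksum, urgent pointer
options = (struct.pack("!BBH", 2, 4, 1460)             # mss 1460
           + struct.pack("!BB", 4, 2)                  # sackOK
           + struct.pack("!BBII", 8, 10, 15712992, 0)  # TS val, ecr
           + struct.pack("!B", 1)                      # nop
           + struct.pack("!BBB", 3, 3, 6))             # wscale 6
segment = header + options

pseudo = pseudo_header("192.168.14.3", "192.168.15.3", len(segment))
correct = ~ones_complement_sum(pseudo + segment) & 0xFFFF
partial = ones_complement_sum(pseudo)

print(format(correct, "#06x"))  # 0x6c4b
print(format(partial, "#06x"))  # 0x9e85
```

Since the pseudo-header sum depends only on the two addresses, the protocol and the TCP length, every 40-byte segment between these two hosts would give the same 0x9e85.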
The version of Xen is 3.4.3 (from the Debian [testing] package
xen-hypervisor-3.4-amd64_3.4.3~rc3-1_amd64.deb). The kernel is 2.6.32.3
from Jeremy Fitzhardinge's stable-2.6.32.x git branch. The server has
the following hardware:
Tyan Thunder K8S Pro S2882
2 AMD Opteron 240 1.4GHz
4GB RAM
LSI MegaRAID MRSCSI320-2X
The on-board ethernet controllers are:
Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03) (tigon)
Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03) (tigon)
Intel Corporation 82557/8/9/0/1 Ethernet Pro 100 (rev 10) (e100)
Has anyone else run across this problem? It only seems to be
affecting packets forwarded through the domU kernel. All other
communications seem to be behaving normally.
--
Scott Garron