Summary:
After sending some UDP traffic between two xen domains (Domain 0 and
Domain 1) the networking between the domains fails. This failure is 100%
repeatable.
In more detail:
I have two xen domains. They run the kernels from the 2.0.3 release. (I've run
into the same problem with 2.0.1 as well.) Domain 0 has 5 physical ethernet
interfaces, and a virtual interface to Domain 1. Domain 1 has just the virtual
interface to Domain 0.
D0 is configured with IP address 192.168.0.1, and D1 with 192.168.1.1. The
netmask is set to 255.255.0.0.
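As a sanity check on the addressing above, here is a minimal sketch (Python, standard library only) confirming that both addresses fall inside one network under the stated netmask, so the domains should reach each other directly without a router. The helper name `same_subnet` is hypothetical and not part of any Xen tooling.

```python
import ipaddress

def same_subnet(addr_a, addr_b, netmask):
    """Return True if addr_b lies in the network that addr_a/netmask belongs to."""
    # strict=False lets us pass a host address rather than the network address.
    network = ipaddress.ip_network(f"{addr_a}/{netmask}", strict=False)
    return ipaddress.ip_address(addr_b) in network

# D0 and D1 as configured in this report.
print(same_subnet("192.168.0.1", "192.168.1.1", "255.255.0.0"))
```

With a 255.255.255.0 netmask the same two addresses would land in different networks, which is why the netmask matters here.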
When I bring up D1, I can ping D1 from D0, ssh into D1, etc.
I then start a UDP server in D0, and a traffic generator in D1. After the
traffic generator sends its 128th packet, networking between the domains
fails. The 128th packet is received successfully by the UDP server, but no
later traffic arrives in D0. This includes UDP, TCP, ICMP, and ARP.
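For anyone trying to reproduce this, the test described above can be approximated with a sketch like the following. It is not the actual server or traffic generator used in this report. The function names, the 64-byte filler payload, and the 4-byte sequence-number prefix are all assumptions chosen for illustration. Each datagram stays under 100 bytes, and the receiver reports the last sequence number it saw.

```python
import socket

def send_datagrams(dest, count, payload_len=64):
    """Send `count` small UDP datagrams to `dest`, tagged with 1-based sequence numbers."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for seq in range(1, count + 1):
            # 4-byte big-endian sequence number followed by filler bytes.
            sock.sendto(seq.to_bytes(4, "big") + b"x" * payload_len, dest)

def receive_datagrams(sock, timeout=2.0):
    """Receive until `timeout` seconds pass with no traffic; return (count, last_seq)."""
    sock.settimeout(timeout)
    count, last_seq = 0, None
    while True:
        try:
            data, _addr = sock.recvfrom(2048)
        except socket.timeout:
            return count, last_seq
        count += 1
        last_seq = int.from_bytes(data[:4], "big")

def loopback_demo(count=10):
    """Run both halves over 127.0.0.1 and return what the receiver saw."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        send_datagrams(server.getsockname(), count)
        return receive_datagrams(server, timeout=0.5)
```

In the setup described here, the receiving half would bind to 192.168.0.1 in D0 on some chosen port, and the sending half would run in D1 with a count well above 128. A receiver that stops at a last sequence number of 128 would match the behaviour reported.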
Looking at the interrupt counts in /proc/interrupts, I see that D0 no longer
receives packets sent by D1. D1, however, does receive packets sent by D0. (To
be clear, the D0->D1 traffic is ICMP ping requests, unrelated to the UDP
traffic. No UDP traffic is sent from D0 to D1.)
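The interrupt-count comparison above can be automated with a sketch like this one, which sums the per-CPU counts for each line of /proc/interrupts and lists the lines whose count did not advance between two snapshots. The function names are hypothetical, and the label of the line that belongs to the virtual interface is not assumed here. It has to be read off the actual file.

```python
def parse_interrupts(text):
    """Map each IRQ label to its interrupt count summed over all CPUs."""
    totals = {}
    for line in text.splitlines()[1:]:  # first line is the CPU column header
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        total = 0
        for token in rest.split():
            if not token.isdigit():
                break  # reached the controller/device description
            total += int(token)
        totals[label.strip()] = total
    return totals

def unchanged_irqs(before, after):
    """Return the labels whose count is identical in both snapshots."""
    return sorted(k for k in before if after.get(k) == before[k])

def snapshot(path="/proc/interrupts"):
    """Read and parse the current interrupt counts."""
    with open(path) as f:
        return parse_interrupts(f.read())
```

Taking one `snapshot()` before sending a burst of pings and another afterwards, then passing both to `unchanged_irqs`, shows which interrupt sources stayed silent in between.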
(I suspect the stuff in this paragraph doesn't matter, but include it for
completeness.) Eventually, D0's ARP cache entry for D1 expires. D0 ARPs for D1,
and D1 replies. But D0 never receives these replies. And eventually, D1 stops
replying to the ARPs entirely. (D1's sending behavior is observed via tcpdump
running in the console connection to D1.)
Note that the networking failure only occurs if the UDP packets are delivered
to a user-level process in D0. In particular, UDP traffic to D0's kernel NFS
server does not induce the failure. Nor does traffic sent to D0 for which there
is no user process to accept the packets. And neither does traffic which is
forwarded on to other hosts via NAT. (I haven't tested the regular forwarding
case.)
Also, for what it's worth, Domain 0's network connectivity on its other
interfaces (which are connected to the world at large) is unaffected.
Looking through the mailing list archive, I saw a prior bug that seemed
similar, but involved IP fragmentation. That is not the case here, as the UDP
packets sent by D1 are small (<100 bytes).
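The arithmetic behind ruling out fragmentation can be sketched as follows. It assumes a 1500-byte MTU (the usual Ethernet default; the MTU of the virtual interface is not stated in this report) and an IPv4 header without options. The function name is hypothetical.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes

def needs_fragmentation(payload_len, mtu=1500):
    """Return True if a UDP payload of this size exceeds one IPv4 packet at the given MTU."""
    return IPV4_HEADER + UDP_HEADER + payload_len > mtu

# A 100-byte payload yields a 128-byte IP packet, far below the assumed MTU.
print(needs_fragmentation(100))
```

Under these assumptions a payload would have to exceed 1472 bytes before the IP layer split it.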
Any suggestions for debugging this?
Thanks,
mukesh
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/xen-devel