xen-devel
RE: [Xen-devel] arp during live migration
> > In my case, I NEVER see the gratuitous ARP being sent (confirmed using
> > tcpdump on peth0 in Dom0) and the return value from dev_queue_xmit is
> > sometimes 0 and sometimes 2 (that's PLUS 2 -- congestion notification
> > [NET_XMIT_CN]).
>
> I am seeing the same error, indeed it looks like it is NET_XMIT_CN. I
> also see 100% percent loss, the ARP never makes it to the wire in any of
> my tests.
>
So, I have a little more info now -- it seems that the ARP is being
assembled and passed to the backend driver, BUT the backend is ignoring
it because the VIF link state is down (netif_carrier_ok() is returning
FALSE) -- the link goes up shortly after, but the packet has already been
dropped by this time.
The actual sequence of events is also a little strange (but *very*
reproducible):
. In the DomU, I see the following at the end of migration:
    . First, netfront sees the backend state change to InitWait - this
      causes it to attempt to connect the rings and send the ARP (even
      though the current state is actually Connected).
    . Next, the resume processing runs in netfront (I think this is
      expected to run first but it does not).
    . Now it sees the backend state change to InitWait a second time and
      attempts to send the ARP a second time.
. In Dom0:
    . The first attempt to send the ARP is completely ignored since the
      backend is not connected yet (specifically, it hasn't set up the
      softirq handler).
    . The first thing we see is the front end state changing to Connected
      -- this causes it to initialize the connection and set up the irq
      handler.
    . Now we see an irq signaled, BUT it is ignored by the backend because
      netif_carrier_ok() returns FALSE.
    . The very next thing is the link becomes ready and the backend
      completes its state change to the Connected state.
It seems to me that the problem lies in the fact that the backend sees the
ARP packet before it has finished setting up the vif, and so ignores it.
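
To make the ordering easier to reason about, here is a toy model of the
Dom0 side of the sequence above. It is only a sketch and not netback code:
the Backend class, its method names and the "arp#1"/"arp#2" labels are all
made up for illustration, and carrier_ok simply stands in for whatever
netif_carrier_ok() would report. It assumes a notification that arrives
before the backend is ready is dropped rather than retried, which is what
the traces suggest.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """Hypothetical stand-in for the backend vif state (not real netback)."""
    irq_bound: bool = False    # has the backend set up its irq handler yet?
    carrier_ok: bool = False   # stand-in for netif_carrier_ok()
    forwarded: list = field(default_factory=list)
    dropped: list = field(default_factory=list)

    def frontend_connected(self):
        # Backend sees the frontend go to Connected and sets up the irq
        # handler, but the link is not marked up yet.
        self.irq_bound = True

    def link_ready(self):
        # Link becomes ready; backend finishes moving to Connected.
        self.carrier_ok = True

    def notify(self, packet):
        # Frontend signals that a packet is waiting to be transmitted.
        if not self.irq_bound:
            self.dropped.append((packet, "no irq handler"))
        elif not self.carrier_ok:
            self.dropped.append((packet, "carrier down"))
        else:
            self.forwarded.append(packet)


def replay_observed_order():
    """Replay the Dom0 events in the order they show up in my traces."""
    be = Backend()
    be.notify("arp#1")        # first ARP: backend not connected yet
    be.frontend_connected()   # irq handler set up, carrier still down
    be.notify("arp#2")        # second ARP: irq seen, but carrier is down
    be.link_ready()           # link comes up just after the drop
    return be


def replay_expected_order():
    """Same events, but with the ARP sent only after the link is up."""
    be = Backend()
    be.frontend_connected()
    be.link_ready()
    be.notify("arp")
    return be
```

In this model replay_observed_order() ends with nothing in forwarded and
both ARPs in dropped, while replay_expected_order() forwards the single
ARP. The only difference between the two is whether the notify happens
before or after link_ready(), which is the window I think we are hitting.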
I don't know if this is relevant, but Dom0 is running with 2 VCPUs in
this configuration so it's possible that the timing window here was not
seen when Dom0 is run as a uni-processor...
/simgr
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel