WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/

xen-devel

RE: [Xen-devel] arp during live migration

To: "Cristian Zamfir" <zamf@xxxxxxxxxxxxx>, "Jacob Gorm Hansen" <jacobg@xxxxxxx>
Subject: RE: [Xen-devel] arp during live migration
From: "Graham, Simon" <Simon.Graham@xxxxxxxxxxx>
Date: Mon, 5 Mar 2007 18:47:46 -0500
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
In-reply-to: 45EC39E2.3020100@xxxxxxxxxxxxx
References: 45E88EEA.4020707@xxxxxxxxxxxxx <342BAC0A5467384983B586A6B0B3767104DC3DAF@xxxxxxxxxxxxxxxxxxxxx> <1172938895.14470.25.camel@xxxxxxxxxxxxxxxxxxxxx> 45EC39E2.3020100@xxxxxxxxxxxxx
>  > In my case, I NEVER see the gratuitous ARP being sent (confirmed using
>  > tcpdump on peth0 in Dom0) and the return value from dev_queue_xmit is
>  > sometimes 0 and sometimes 2 (that's PLUS 2 -- congestion notification
>  > [NET_XMIT_CN]).
> 
> I am seeing the same error; indeed, it looks like it is NET_XMIT_CN. I
> also see 100% loss: the ARP never makes it to the wire in any of my
> tests.
> 
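For reference, the return codes quoted above correspond to the NET_XMIT_* constants in Linux's include/linux/netdevice.h (in 2.6-era kernels, 0 is NET_XMIT_SUCCESS, 1 is NET_XMIT_DROP and 2 is NET_XMIT_CN). The sketch below is a hypothetical user-space decoder, not kernel code; the helper name `describe_xmit_ret` is made up for illustration, and only the three codes relevant here are listed.

```python
# Values of the Linux NET_XMIT_* codes as defined in include/linux/netdevice.h
# (2.6-era kernels). Only the three codes relevant to this thread are listed.
NET_XMIT_SUCCESS = 0
NET_XMIT_DROP = 1
NET_XMIT_CN = 2  # congestion notification

_NAMES = {
    NET_XMIT_SUCCESS: "NET_XMIT_SUCCESS",
    NET_XMIT_DROP: "NET_XMIT_DROP",
    NET_XMIT_CN: "NET_XMIT_CN",
}


def describe_xmit_ret(ret: int) -> str:
    """Hypothetical helper: map a dev_queue_xmit() return value to a name."""
    if ret < 0:
        # Negative values are errno-style failures, not NET_XMIT_* codes.
        return f"errno {-ret}"
    return _NAMES.get(ret, f"unknown ({ret})")


for ret in (0, 2):
    print(ret, describe_xmit_ret(ret))
```

Note that a return of 0 only means the frame was accepted for transmission by the local device; as this thread shows, it can still be discarded further along the path.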

So, I have a little more info now. It seems that the ARP is being
assembled and passed to the backend driver, BUT the backend ignores it
because the VIF link state is down (netif_carrier_ok() returns FALSE).
The link comes up shortly afterwards, but by then the packet has already
been dropped.
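The backend behaviour described above amounts to a carrier-gated drop: a frame that arrives while the link is down is discarded rather than held until the link comes up. The toy model below illustrates only that rule. `VifBackend` and its methods are hypothetical stand-ins, `carrier_ok` merely plays the role of netif_carrier_ok(), and none of this is the actual netback code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VifBackend:
    """Toy model of a backend vif (hypothetical; not the netback source).

    carrier_ok stands in for netif_carrier_ok() on the vif.
    """
    carrier_ok: bool = False
    delivered: List[str] = field(default_factory=list)
    dropped: List[str] = field(default_factory=list)

    def receive_from_frontend(self, frame: str) -> None:
        # Frames arriving while the link is down are discarded, not queued.
        if not self.carrier_ok:
            self.dropped.append(frame)
            return
        self.delivered.append(frame)

    def carrier_on(self) -> None:
        self.carrier_ok = True


vif = VifBackend()
vif.receive_from_frontend("gratuitous ARP")  # arrives before the link is up
vif.carrier_on()                              # link comes up shortly after
print("delivered:", vif.delivered)
print("dropped:", vif.dropped)
```

Because nothing is queued, turning the carrier on afterwards does not recover the frame; it stays in `dropped`.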

The actual sequence of events is also a little strange (but *very*
reproducible):

- In the DomU, I see the following at the end of migration:
   - First, netfront sees the backend state change to InitWait. This
     causes it to attempt to connect the rings and send the ARP (even
     though the current state is actually Connected).
   - Next, the resume processing runs in netfront (I think this is
     expected to run first, but it does not).
   - Then it sees the backend state change to InitWait a second time and
     attempts to send the ARP a second time.

- In Dom0:
   - The first attempt to send the ARP is completely ignored, since the
     backend is not connected yet (specifically, it hasn't set up the
     softirq handler).
   - The first event we actually see is the frontend state changing to
     Connected. This causes the backend to initialize the connection and
     set up the IRQ handler.
   - Now we see an IRQ signalled, BUT the backend ignores it because
     netif_carrier_ok() returns FALSE.
   - Immediately after that, the link becomes ready and the backend
     completes its state change to Connected.
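The two event orders above can be replayed in a small simulation to show why neither ARP reaches the wire. This is a sketch under stated assumptions: all names are hypothetical, the frontend is assumed to send one ARP per InitWait notification, and the IRQ seen in Dom0 is assumed to carry the second ARP attempt.

```python
from enum import Enum, auto


class BackendPhase(Enum):
    NOT_CONNECTED = auto()  # no IRQ handler installed yet
    HANDLER_READY = auto()  # IRQ handler installed, carrier still down
    CONNECTED = auto()      # carrier up, backend in Connected state


def frontend_arps(domu_events):
    """DomU side (toy model): every backend InitWait notification makes
    the frontend reconnect its rings and send a gratuitous ARP."""
    arps = []
    for ev in domu_events:
        if ev == "backend_initwait":
            arps.append(f"arp#{len(arps) + 1}")
    return arps


def backend_outcome(dom0_events):
    """Dom0 side (toy model): decide what happens to each ARP given the
    backend's setup phase at the moment the ARP arrives."""
    phase = BackendPhase.NOT_CONNECTED
    log = []
    for ev in dom0_events:
        if ev == "frontend_connected":
            phase = BackendPhase.HANDLER_READY
        elif ev == "carrier_on":
            phase = BackendPhase.CONNECTED
        elif ev.startswith("arp"):
            if phase is BackendPhase.NOT_CONNECTED:
                log.append((ev, "ignored: no IRQ handler"))
            elif phase is BackendPhase.HANDLER_READY:
                log.append((ev, "dropped: carrier down"))
            else:
                log.append((ev, "transmitted"))
    return log


# DomU order from the report: InitWait, resume, InitWait again.
arps = frontend_arps(["backend_initwait", "resume", "backend_initwait"])
# Dom0 order from the report, assuming the signalled IRQ carries the second ARP.
dom0 = [arps[0], "frontend_connected", arps[1], "carrier_on"]
for frame, result in backend_outcome(dom0):
    print(frame, "->", result)
```

In this model a frame is reported as transmitted only when it arrives after `carrier_on`, which never happens in the observed order.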

It seems to me that the problem is that the backend sees the ARP packet
before it has finished setting up the vif, and so ignores it.

I don't know whether this is relevant, but Dom0 is running with 2 VCPUs
in this configuration, so it's possible that this timing window does not
show up when Dom0 runs as a uniprocessor...

/simgr
