This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
Home Products Support Community News


[Xen-devel] arp during live migration

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] arp during live migration
From: Cristian Zamfir <zamf@xxxxxxxxxxxxx>
Date: Fri, 02 Mar 2007 20:54:02 +0000
Delivery-date: Fri, 02 Mar 2007 12:53:05 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Thunderbird (X11/20070104)


I am having some trouble with the send_fake_arp in the netfront driver.

Normally, on my domU, which has no queuing disciplines compiled in, the packets are sent via dev_queue_xmit in net/core/dev.c and enqueued using pfifo_fast_enqueue in net/sched/sch_generic.c.

However, during live migration, send_fake_arp() returns -2 and does not go to pfifo_fast_enqueue any more. I have been able to trace it further than this code in dev_queue_xmit:

if (q->enqueue) {
                /* Grab device queue */
                rc = q->enqueue(skb, q);
                rc = rc == NET_XMIT_BYPASS ? NET_XMIT_SUCCESS : rc;
                goto out;

I noticed that the error code returned by send_fake_arp() is not checked. Would it be a good option to check the error code and reschedule the arp broadcast at a later time?

I have made some changes to xen 3.0.3 regarding block device migration so I might have messed things up. It could be the reason only few people reported this problem on xen-users. Obviously, the problem can also go unnoticed if a downtime of 1-2 seconds is tolerated.

Does anyone have any hints on why this might happen or how to search for more clues?

Thank you.


Xen-devel mailing list

<Prev in Thread] Current Thread [Next in Thread>