On Wed, 2011-07-13 at 17:11 +0100, Laszlo Ersek wrote:
> On 07/13/11 15:29, Konrad Rzeszutek Wilk wrote:
> > On Wed, Jul 13, 2011 at 01:44:47PM +0200, Laszlo Ersek wrote:
> >> In addition to backporting 43223efd9bfd to the RHEL-5 host side, we needed
> >> the following in the RHEL-6 guest, in order to fix the network outage after
> >> live migration. I also tested a Fedora-15 guest (without the patch), and
> >> the
> >
> > Laszlo,
> >
> > This description is .. well, pointless for upstream patches. Just succinctly
> > describe the problem, how to reproduce it, and what this patch does.
>
> I avoided writing up the commit message myself because in my first posting [1]
> I quoted two paragraphs from Ian Campbell [2]. I intended those two cited
> paragraphs verbatim as the commit message (along with my short remarks),
> because they describe the problem exactly.
Please don't expect or require maintainers to trawl around and construct
a commit message for you. Always propose the full text you would like to
see committed in each patch posting.
> Anyway:
>
> After a guest is live migrated, the xen-netfront driver emits a gratuitous ARP
> message, so that networking hardware on the target host's subnet can take
> notice, and public routing to the guest is re-established. However, if the
> packet appears on the backend interface before the backend is added to the
> target host's bridge, the packet is lost, and the migrated guest's peers
> become unable to talk to the guest.
>
> A sufficient two-part condition to prevent the above is:
>
> (1) ensure that the backend only moves to Connected xenbus state after its
> hotplug scripts have completed, i.e. the netback interface got added to the bridge;
> and
>
> (2) ensure the frontend only queues the gARP when it sees the backend move to
> Connected.
>
> These two together provide complete ordering. Sub-condition (1) is satisfied
> by pvops commit 43223efd9bfd.
>
> In general, the full condition is sufficient, not necessary, because,
> according to [2], live migration has been working for a long time without
> satisfying sub-condition (2). However, after 43223efd9bfd was backported to
> the RHEL-5 host to ensure (1), (2) still proved necessary in the RHEL-6
> guest. This patch intends to provide (2) for upstream.
I expect that 43223efd9bfd just reduces the window (enough to prevent it
occurring most of the time) but doesn't actually close it. This change
seems good to me.
> [1] http://lists.xensource.com/archives/html/xen-devel/2011-07/msg00327.html
> [2] http://lists.xensource.com/archives/html/xen-devel/2011-06/msg01969.html
>
> Signed-off-by: Laszlo Ersek <lersek@xxxxxxxxxx>
Reviewed-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
> ---
> drivers/net/xen-netfront.c | 4 +++-
> 1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index d29365a..f033656 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -1646,7 +1646,6 @@ static void netback_changed(struct xenbus_device *dev,
> case XenbusStateInitialised:
> case XenbusStateReconfiguring:
> case XenbusStateReconfigured:
> - case XenbusStateConnected:
> case XenbusStateUnknown:
> case XenbusStateClosed:
> break;
> @@ -1657,6 +1656,9 @@ static void netback_changed(struct xenbus_device *dev,
> if (xennet_connect(netdev) != 0)
> break;
> xenbus_switch_state(dev, XenbusStateConnected);
> + break;
> +
> + case XenbusStateConnected:
> netif_notify_peers(netdev);
> break;
>
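For anyone following the ordering argument, here is a rough userspace toy
model of it. It is only a sketch, not kernel code: every name in it
(toy_state, toy_link, toy_notify_peers, toy_backend_changed, toy_migrate) is
made up for illustration, and it assumes the simplest interleaving, in which
the backend joins the bridge strictly between the frontend's two state
callbacks.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two backend xenbus states that matter here. */
enum toy_state { TOY_INIT_WAIT, TOY_CONNECTED };

struct toy_link {
	bool backend_on_bridge;	/* set once the hotplug script has run */
	int garp_sent;		/* gratuitous ARPs emitted by the frontend */
	int garp_delivered;	/* gratuitous ARPs that reached the bridge */
};

/* Models netif_notify_peers(): the packet is lost if the backend
 * interface has not been added to the bridge yet. */
static void toy_notify_peers(struct toy_link *l)
{
	l->garp_sent++;
	if (l->backend_on_bridge)
		l->garp_delivered++;
}

/* Models the frontend's reaction to a backend state change.
 * patched == false: gARP is sent right after the frontend connects.
 * patched == true:  gARP is deferred until the backend is Connected. */
static void toy_backend_changed(struct toy_link *l, enum toy_state s,
				bool patched)
{
	switch (s) {
	case TOY_INIT_WAIT:
		/* the frontend would connect and switch state here */
		if (!patched)
			toy_notify_peers(l);
		break;
	case TOY_CONNECTED:
		if (patched)
			toy_notify_peers(l);
		break;
	}
}

/* Runs one simulated migration and returns how many gARPs got through. */
static int toy_migrate(bool patched)
{
	struct toy_link l = { false, 0, 0 };

	toy_backend_changed(&l, TOY_INIT_WAIT, patched);
	l.backend_on_bridge = true;	/* condition (1): hotplug done first */
	toy_backend_changed(&l, TOY_CONNECTED, patched);
	return l.garp_delivered;
}
```

In this model toy_migrate(false) sends the gARP before the bridge is ready
and loses it, while toy_migrate(true) sends it only after Connected, so it is
delivered. The real race depends on timing that the model deliberately fixes.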