RE: [Xen-devel] Shared disk corruption caused by migration

To:	"Dutton, Jeff" <Jeff.Dutton@xxxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject:	RE: [Xen-devel] Shared disk corruption caused by migration
From:	"Ian Pratt" <m+Ian.Pratt@xxxxxxxxxxxx>
Date:	Mon, 21 Aug 2006 20:58:52 +0100
List-id:	Xen developer discussion <xen-devel.lists.xensource.com>

> The blkfront xenbus_driver doesn't have a "suspend" method.  I was going to
> add one to flush the outstanding requests from the migration source to fix
> the problem.  Or maybe we can cancel all outstanding I/O requests to
> eliminate the concurrency between the two nodes.  Does the Linux block I/O
> interface allow the canceling of requests?
> 
> Anyone else seeing this problem?  Any other ideas for solutions?

There's already work in progress on this.

The simplest thing to do is to wait until the backend queues are empty
before signalling the destination host to unpause the relocated domain.
However, this would add to migration downtime. It would be nice if we
could quickly cancel the IOs queued at the original host, but Linux
doesn't have a good mechanism for this.

For targets that support fencing it's possible to quickly and
synchronously fence the original host. For other targets, we need to be
a bit cunning to minimize downtime: we can actually start running the VM
on the destination host before we've had the 'all queues empty' message
from the source host. We just have to be careful to make sure that we
don't issue any writes to blocks that also potentially still have writes
pending on them in the source host. If such a write occurs, we have to
stall issuing of the write until we receive the 'all queues empty' from
the source host. However, such conflicting writes are actually pretty
unusual, so the majority of relocations won't incur the stall. 
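
To make the stall rule concrete, here is a rough sketch in Python. It is
only a model of the idea, not the patch and not blkfront/blkback code. The
class and method names are made up for illustration, and it assumes the
destination is somehow told which blocks may still have writes pending at
the source (for example, sent along with the migration state).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DestinationWriteGate:
    """Hypothetical model of the destination host's write path.

    Assumption: `source_pending` holds the block numbers that may still
    have writes in flight on the source host when the VM is unpaused.
    """

    source_pending: set[int]
    source_drained: bool = False
    issued: list[tuple[int, bytes]] = field(default_factory=list)
    stalled: list[tuple[int, bytes]] = field(default_factory=list)

    def write(self, block: int, data: bytes) -> bool:
        """Issue the write if it is safe, otherwise stall it.

        Returns True if the write was issued immediately.
        """
        if not self.source_drained and block in self.source_pending:
            # Conflicts with a write that may still be queued at the source.
            self.stalled.append((block, data))
            return False
        self.issued.append((block, data))
        return True

    def on_all_queues_empty(self) -> None:
        """Handle the 'all queues empty' message from the source host."""
        self.source_drained = True
        self.source_pending.clear()
        # Release the stalled writes in the order they arrived.
        self.issued.extend(self.stalled)
        self.stalled.clear()


gate = DestinationWriteGate(source_pending={7, 8})
gate.write(3, b"a")         # no conflict: issued straight away
gate.write(7, b"b")         # block 7 may still be written by the source: stalled
gate.on_all_queues_empty()  # source has drained: the stalled write is released
```

Only writes that hit a possibly-pending block wait; everything else
proceeds as soon as the domain is unpaused, which is why most relocations
would not see the stall.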

Stay tuned for a patch.

Ian
 

