On Mon, 2011-07-18 at 06:25 -0400, Dave Scott wrote:
> Hi Daniel,
>
> Thanks for your thoughts. I think I'll add a wiki page later to describe the
> DRBD-based design idea and we can list some pros and cons of each perhaps.
>
> I'm still not a DRBD expert but I've now read the manual and configured it a
> few times (where 'few' = 'about 3') :)
>
> Daniel wrote:
> > If only FS integrity matters, you can run a coarser series of updates,
> > for asynchronous mirroring. I suspect DRBD does at least something like that
> > (I'm not a DRBD expert either). I'm not sure if the asynchronous mode I
> > see on the feature list allows for conclusions on DRBD's idea of HA in
> > any way. It may just limit HA to being synchronous mode. Does anyone
> > know?
>
> It seems that DRBD can operate in 3 different synchronization modes:
>
> 1. fully synchronous: writes are ACK'ed only when written to both disks
> 2. asynchronous: writes are ACK'ed when written to the primary disk (data is
> somewhere in-flight to the secondary)
> 3. semi-synchronous: writes are ACK'ed when written to the primary disk and
> in the memory (not disk) of the secondary
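
If I read the manual right, these are DRBD's protocols C, A and B
respectively. A minimal sketch of the mapping -- the mode names and the
helper are mine, not the drbd-manager API; all it really amounts to is one
line in the resource section of drbd.conf:

    # Sketch: the three modes above as DRBD protocol letters.
    PROTOCOLS = {
        'fully-synchronous': 'C',  # ACK once both disks have the write
        'asynchronous':      'A',  # ACK once the primary disk has the write
        'semi-synchronous':  'B',  # ACK once on primary disk + secondary RAM
    }

    def protocol_stanza(mode):
        """The line to drop into the resource section of drbd.conf."""
        return "protocol %s;" % PROTOCOLS[mode]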
>
> Apparently most people run it in fully synchronous mode over a fast
> LAN. Provided we could get DRBD to flush outstanding updates and
> guarantee that the two block devices are identical during the
> migration downtime when the domain is shut down, I guess we could use
> any of these methods. Although if fully synchronous is the most common
> option, we may want to stick with that?

Are we still talking about storage migration, or mirroring applications?

Well, these semantics only really make sense for a disk pair starting from
a mirrored state. So what are the semantics for a pair which has just been
created? Does DRBD default to anything while performing the initial sync at
start of day?

Ideally, it would stay asynchronous and wait for the writable working set
(WWS) to converge. Synchronous mode in that state doesn't buy you anything;
it's just going to produce seek overhead. Durability is only useful if you
actually have something consistent to switch over to in the failure case.

The normal way of doing linear passes through a dirty bitmap (what memory
migration does) makes a lot of sense for storage, because it naturally
elevates through the block list. In terms of DRBD consistency guarantees,
that's fully asynchronous.

Until stop/copy is reached, the question is whether the copy is converging
smoothly. With a sane network, a sane guest and local storage, it typically
will.
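
Roughly the loop I have in mind (a sketch only -- bitmap and copy_block are
stand-ins, not the tapdisk or DRBD API):

    def precopy(bitmap, copy_block, max_passes=10, threshold=64):
        """Walk the dirty bitmap linearly; stop once the WWS looks small
        enough to hand over to stop/copy."""
        for _ in range(max_passes):
            dirty = [i for i, d in enumerate(bitmap) if d]
            if len(dirty) <= threshold:
                return dirty                 # converged: go to stop/copy
            for block in dirty:              # linear pass == elevator order
                bitmap[block] = False        # guest may re-dirty it behind us
                copy_block(block)
        # not converging: the caller has to throttle the guest or give up
        return [i for i, d in enumerate(bitmap) if d]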

A migration smoke test usually comprises a diabolical workload to prove
correctness under worst-case scenarios. Does DRBD use a transfer block size
above the sector size? Almost certainly yes. I'd suggest trying two cases,
random and linear writes of small sizes at a stride equal to the block
size, and seeing what happens. The migration will likely need to throttle
guest throughput.
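
Something like this, run inside the guest (BLOCK and the device path are
guesses -- substitute whatever transfer block size DRBD really uses and a
scratch disk you can safely scribble on):

    import os, random, time

    BLOCK = 64 * 1024     # assumed DRBD transfer block size -- tune it
    DISK  = '/dev/xvdb'   # hypothetical scratch device inside the guest
    SPAN  = 1 << 30       # dirty blocks across 1 GiB of it

    def pound(linear=True, seconds=60):
        """512-byte synchronous writes, one per BLOCK-sized stride, so
        every write dirties a whole transfer block."""
        nblocks = SPAN // BLOCK
        fd = os.open(DISK, os.O_WRONLY | os.O_SYNC)
        deadline = time.time() + seconds
        i = 0
        while time.time() < deadline:
            blk = i % nblocks if linear else random.randrange(nblocks)
            os.lseek(fd, blk * BLOCK, os.SEEK_SET)
            os.write(fd, b'\xff' * 512)
            i += 1
        os.close(fd)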

In tapdisk, we've got some work on rate limiting on trunk now; it might fit
in.
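
Not the tapdisk code, just the shape of the thing: a token bucket capping
how fast the guest can dirty data while the mirror catches up (class name
and numbers invented for illustration):

    import time

    class Throttle(object):
        """Call account() per guest write; it sleeps when the dirtying
        rate exceeds what the mirror can absorb."""
        def __init__(self, bytes_per_sec):
            self.rate = float(bytes_per_sec)
            self.allowance = self.rate
            self.last = time.time()

        def account(self, nbytes):
            now = time.time()
            self.allowance = min(self.rate,
                                 self.allowance + (now - self.last) * self.rate)
            self.last = now
            self.allowance -= nbytes
            if self.allowance < 0:
                time.sleep(-self.allowance / self.rate)

Something like Throttle(20 << 20).account(len(buf)) in the write path would
cap the guest at roughly 20MB/s of dirtying.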

If you don't want to deal with enforcing eventual termination, yeah, I
guess a mkfs, as George suggests, is a decent scenario too.

Daniel
> > Anyway, it's not exactly a rainy weekend project, so if you want
> > consistent mirroring, there doesn't seem to be anything better than
> > DRBD
> > around the corner.
>
> It did rain this weekend :) So I've half-written a python module for
> configuring and controlling DRBD:
>
> https://github.com/djs55/drbd-manager
>
> It'll be interesting to see how this performs in practice. For some realistic
> workloads I'd quite like to measure
> 1. total migration time
> 2. total migration downtime
> 3. ... effect on the guest during migration (somehow)
>
> For (3) I would expect that continuous replication would slow down guest I/O
> more during the migrate than explicit snapshot/copy (as if every I/O
> performed a "mini snapshot/copy") but it would probably improve the downtime
> (2), since there would be no final disk copy.
>
> What would you recommend for workloads / measurements?
>
> > In summary, my point is that it's probably better to focus on migration
> > only - it's one flat dirty log index and works in-situ at the block
> > level. Beyond that, I think it's perfectly legal to implement mirroring
> > independently -- the math is very similar, but the differences make for a
> > huge impact on performance, I/O overhead, space to be set aside, and
> > robustness.
>
> Thanks,
> Dave
>
> >
> > Cheers,
> > Daniel
> >
> > [PS: comments/corrections welcome, indeed].
> >
> > > 3. use the VM metadata export/import to move the VM metadata between
> > > pools
> > >
> > > I'd also like to
> > > * make the migration code unit-testable (so I can test the failure
> > > paths easily)
> > > * make the code more robust to host failures by host heartbeating
> > > * make migrate properly cancellable
> > >
> > > I've started making a prototype-- so far I've written a simple python
> > > wrapper around the iscsi target daemon:
> > >
> > > https://github.com/djs55/iscsi-target-manager
> > >
> >
>
_______________________________________________
xen-api mailing list
xen-api@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/mailman/listinfo/xen-api