[Xen-users] HA Xen on 2 servers!! No NFS, special hardware, DRBD or iSCSI
I've been brainstorming...
I want to create a 2-node HA active/active cluster (in other words, I want to
run a handful of DomUs on one node and a handful on the other). In the event
of a failure I want all DomUs to fail over to the other node and start working
immediately. I want absolutely no single points of failure. I want to do it
with free software and no special hardware. I want to do it with just 2
servers. I want it to be pretty simple to set up. And I would like to take
advantage of Xen's live migration feature for maintenance.
Not asking much, am I? ;-)
This topic was discussed earlier this year (read the "Xen and iSCSI" thread in
the list archives), and the best solution I saw was to create a poor man's SAN
using DRBD, Heartbeat, GFS and iSCSI on two nodes, and then run Xen on two MORE
nodes. That's four nodes and quite a bit of complication, and I wonder what the
performance was like.
I think I know how to do it with 2 nodes and (I think) it'll be less
complicated and perform better (I think).
I haven't tried it, nor do I have any experience with AoE, LVM, etc. -- only
basic experience with Xen. But I think it should work.
Check it out:
* Get 2 computers with 2 NICs
* Install your favorite Linux distro on each
* Partition the drive into 4 partitions: two for the Dom0 OS and swap, two
left unformatted
* Connect the intra-node NICs with a crossover/switch/hub (Didja know that
gigabit NICs are auto-MDIX? No crossover needed!)
* Configure the intra-node IPs to something like 10.0.0.1 and 10.0.0.2 or
192.168...
* Install Xen
* Install the ATA over Ethernet (aoe) driver and vblade in Dom0
* Node1: vblade 0 1 eth1 /dev/hda3 & # One of the unformatted partitions (DomU1's hda)
* Node1: vblade 0 3 eth1 /dev/hda4 & # The other (becomes DomU2's hdb)
* Node2: vblade 0 2 eth1 /dev/hda4 & # One of the unformatted partitions (becomes DomU1's hdb)
* Node2: vblade 0 4 eth1 /dev/hda3 & # The other (DomU2's hda)
# Each node exports BOTH spare partitions so every disk stays reachable from
# either node -- live migration needs that. (Or use vbladed but I don't know how yet)
* modprobe aoe on each node
* Install LVM on both nodes (in Dom0)
* Create two volume groups on each node (pvcreate each device first):
Node1: pvcreate /dev/hda3 /dev/etherd/e0.2
Node1: vgcreate DomU1hda /dev/hda3
Node1: vgcreate DomU1hdb /dev/etherd/e0.2 # The AoE-exported device from the other node
Node2: pvcreate /dev/hda3 /dev/etherd/e0.3
Node2: vgcreate DomU2hda /dev/hda3
Node2: vgcreate DomU2hdb /dev/etherd/e0.3
* Create logical volumes, one per DomU "partition" (answering my own question:
make an LV per partition instead of running fdisk inside one big LV, so the
names line up with the Xen configs below; sizes are just examples):
Node1: lvcreate -L 10G -n hda1 DomU1hda
Node1: lvcreate -L 1G -n hda2 DomU1hda
Node1: lvcreate -L 10G -n hdb1 DomU1hdb
Node1: lvcreate -L 1G -n hdb2 DomU1hdb
Node2: the same four commands, with DomU2hda/DomU2hdb
* mkfs.ext3 /dev/DomU1hda/hda1 # OS filesystem; repeat for DomU2hda/hda1 (the
hdb halves get their contents when the mirror syncs)
* mkswap /dev/DomU1hda/hda2 # Swap; repeat for DomU2hda/hda2
* Create a Xen DomU config on each node (all four disks go in ONE disk list;
repeated disk = lines would just overwrite each other, and the hdb LVs must
show up as hdb1/hdb2 inside the DomU or there's nothing to mirror to):
Node1 DomU1:
disk = [ 'phy:DomU1hda/hda1,hda1,w',
         'phy:DomU1hdb/hdb1,hdb1,w',
         'phy:DomU1hda/hda2,hda2,w',
         'phy:DomU1hdb/hdb2,hdb2,w' ]
Node2 DomU2:
disk = [ 'phy:DomU2hda/hda1,hda1,w',
         'phy:DomU2hdb/hdb1,hdb1,w',
         'phy:DomU2hda/hda2,hda2,w',
         'phy:DomU2hdb/hdb2,hdb2,w' ]
* Install the DomU OSes
* (Important part) Mirror the OSes between hda and hdb with software RAID
inside each DomU (see the mdadm sketch further down)
* Install Heartbeat on both nodes in Dom0, and make sure the resource script
it runs for Xen does a live migration when failing over gracefully (a rough
sketch follows this list)
* Run DomU1 on Node1, DomU2 on Node2
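For the Heartbeat piece, here's roughly what I have in mind -- untested, and
the script name, paths and fallback behaviour are all my own guesses
(Heartbeat v1's haresources model may also need some massaging to trigger a
live migration on a graceful failover):

/etc/ha.d/haresources on both nodes (DomU1 prefers Node1, DomU2 prefers Node2):

  node1 xendom::DomU1
  node2 xendom::DomU2

/etc/ha.d/resource.d/xendom, a hypothetical resource script:

  #!/bin/sh
  # Usage: xendom <domU name> {start|stop|status}
  DOMU=$1
  PEER=10.0.0.2   # the other node's intra-node IP (differs per node)
  case "$2" in
  start)
      # Create the domain unless it's already here (it may have just migrated in)
      xm list "$DOMU" >/dev/null 2>&1 || xm create "/etc/xen/$DOMU.cfg"
      ;;
  stop)
      # Graceful case: push the DomU to the peer live; fall back to a clean shutdown
      xm migrate --live "$DOMU" "$PEER" || xm shutdown -w "$DOMU"
      ;;
  status)
      xm list "$DOMU" >/dev/null 2>&1 && echo running || echo stopped
      ;;
  esac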
Result:

        [ DomU1 ]                        [ DomU2 ]
        /       \                        /       \
   [ hda ]    [ hdb ]               [ hda ]    [ hdb ]
        \       /                        \       /
        [ LVM ]                          [ LVM ]
        /      \                         /      \
 [ Real HD ] [ AoE HD ]<--[ Real HD ]   |        |
                              __________|        |
                             /                   |
 [ Real HD ]-->[ AoE HD ]---'             [ Real HD ]
 [ Node1 ]                            [ Node2 ]
After a failure or during maintenance:

        [ DomU1 ]                [ DomU2 ]
        /                        /
   [ hda ]                  [ hda ]
        \                        \
        [ LVM ]                  [ LVM ]
        /                        /
 [ Real HD ]                    /
                               /
                              /
               [ Real HD ]---'
 [ Node1 ]
(ASCII art shore is purdy, Sam...)
LVM is not just a nice thing in this case, it is a necessity! In addition to
letting you resize the DomUs' partitions on the fly, it adds a layer of
indirection so that Xen is presented with the same device name on both nodes.
I understand this is critical for live migration, and it appears to be the
main reason people go with iSCSI or NFS.
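(A couple of LVM details I glossed over, if I have them right: LVM won't scan
the /dev/etherd devices unless you tell it to -- the AoE docs suggest a types
line in lvm.conf -- and the peer node picks the VGs up with a rescan rather
than another vgcreate. Something like:)

  # In /etc/lvm/lvm.conf on both nodes, in the devices { } section:
  #     types = [ "aoe", 16 ]
  # Then, e.g. on Node2, before it can ever host DomU1:
  vgscan                  # picks up DomU1hda's metadata via /dev/etherd/e0.1
  vgchange -ay DomU1hda   # activate, so /dev/DomU1hda/hda1 exists here too
  vgchange -ay DomU1hdb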
The key is to use software mirroring within the DomU OSes. I thought about
using DRBD alone, but that doesn't allow live migration. It works great if you
do a regular suspend-to-disk migration, but not a live one, because a DRBD
device can be primary on only one node at a time, and you must unmount it
before demoting it to secondary (so you'd have to stop the DomU).
Mirroring also lets a DomU restart (on the surviving node) if the node it's
running on crashes, because the data on the other node's copy should be
consistent. And it lets a DomU keep operating -- no downtime -- if the other
node (the node it's not running on) crashes.
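Inside each DomU the mirror itself would just be plain md RAID1 on the
matching virtual disks. Something like this (untested; the
degraded-create-then-add dance is one way to mirror an OS that was installed
straight onto hda, and the device names come from the configs above):

  # Inside DomU1: build the arrays degraded on the empty hdb halves
  mdadm --create /dev/md0 --level=1 --raid-devices=2 missing /dev/hdb1
  mdadm --create /dev/md1 --level=1 --raid-devices=2 missing /dev/hdb2
  mkfs.ext3 /dev/md0
  mkswap /dev/md1
  mount /dev/md0 /mnt && cp -ax / /mnt   # copy the installed OS onto the mirror
  # Point /etc/fstab and the bootloader at /dev/md0, reboot onto it, then:
  mdadm /dev/md0 --add /dev/hda1         # pull in the original halves to sync
  mdadm /dev/md1 --add /dev/hda2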
Finally, AoE is painless to set up. But I don't see why AoE could not be
replaced with iSCSI if
it's not working right.
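(For the record, verifying the AoE side should be about two commands from the
aoetools package, if I have the tool names right:)

  aoe-discover   # ask the wire for exported blades
  aoe-stat       # should list e0.1 through e0.4 as up, under /dev/etherd/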
I have three question marks in my head:
1.) How will it perform?
2.) Does it work? Someone want to beat me to the punch and try it themselves?
It's likely to be
a little while before I can find the time to try it.
3.) Is it reliable? Should be; AoE is relatively new but very simple. LVM is
well-tested and
software mirroring is as old as the hills.
Thoughts?
CD
TenThousandDollarOffer.com
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users