
To: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-users] HA Xen on 2 servers!! No NFS, special hardware, DRBD or iSCSI...
From: Chris de Vidal <chris@xxxxxxxxxx>
Date: Tue, 6 Jun 2006 21:10:20 -0700 (PDT)
Delivery-date: Tue, 06 Jun 2006 21:10:58 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-users-request@lists.xensource.com?subject=help>
List-id: Xen user discussion <xen-users.lists.xensource.com>
List-post: <mailto:xen-users@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=unsubscribe>
Reply-to: chris@xxxxxxxxxx
Sender: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
I've been brainstorming...

I want to create a 2-node HA active/active cluster (in other words, I want to run a handful of DomUs on one node and a handful on the other).  In the event of a failure I want all DomUs to fail over to the other node and start working immediately.  I want absolutely no single points of failure.  I want to do it with free software and no special hardware.  I want to do it with just 2 servers.  I want it to be pretty simple to set up.  And I would like to take advantage of Xen's live migration feature for maintenance.

Not asking much, am I?  ;-)

This topic was discussed earlier this year (read the "Xen and iSCSI" thread on the list archives) and the best solution I saw was to create a poor-man's SAN using DRBD, Heartbeat, GFS and iSCSI on two nodes and then run Xen on two MORE nodes.  Four nodes -- and quite a bit of complication, and I wonder what the performance was like?

I think I know how to do it with 2 nodes, and (I think) it'll be less complicated and perform better (I think).

I haven't tried it, nor do I have any experience with AoE, LVM, etc.  Only basic experience with Xen.  But I think it should work.

Check it out:
* Get 2 computers with 2 NICs
* Install your favorite Linux distro on each
* Partition each drive into 4 partitions.  Two are for the Dom0 OS and swap, two are left unformatted (hda3 and hda4 in the steps below).
* Connect the inter-node NICs with a crossover/switch/hub (Didja know that gigabit NICs are auto-MDIX?  No crossover needed!)
* Configure the inter-node IPs to something like 10.0.0.1 and 10.0.0.2 or 192.168...
* Install Xen
* Install ATA over Ethernet (AoE) and vblade in Dom0
* Node1: vblade 0 1 eth1 /dev/hda4   # export the unformatted partition that Node1 will NOT use locally (shelf 0, slot 1)
* Node2: vblade 0 2 eth1 /dev/hda3   # likewise on Node2 (shelf 0, slot 2)
  # vblade's argument order is shelf slot netif device; vbladed takes the same arguments and runs itself in the background
* modprobe aoe on each node
* Install LVM on both nodes (in Dom0)
* Create two volume groups on each node (run pvcreate on each device first):
  Node1: vgcreate DomU1hda /dev/hda3          # the unformatted partition kept local
  Node1: vgcreate DomU1hdb /dev/etherd/e0.2   # the AoE-exported device from the other node
  Node2: vgcreate DomU2hda /dev/hda4          # the unformatted partition kept local
  Node2: vgcreate DomU2hdb /dev/etherd/e0.1   # the AoE-exported device from the other node
* Create logical volumes.  The Xen configs below expect /dev/DomU1hda/hda1 and so on, so create one LV per DomU "partition" (OS and swap) rather than partitioning a single LV:
  Node1: lvcreate -L 10G -n hda1 DomU1hda   # sizes are just examples
  Node1: lvcreate -L 1G  -n hda2 DomU1hda
  Node1: lvcreate -L 10G -n hdb1 DomU1hdb
  Node1: lvcreate -L 1G  -n hdb2 DomU1hdb
  Node2: the same four, using DomU2hda and DomU2hdb
* mkfs.ext3 /dev/DomU1hda/hda1 # Repeat for DomU1hdb and DomU2hdX
* mkswap /dev/DomU1hda/hda2    # Repeat for DomU1hdb and DomU2hdX
* Create a Xen DomU on each node with this configuration (one disk list per DomU; a fuller example config appears after this list):
  Node1 DomU1:
     disk = [ 'phy:DomU1hda/hda1,hda1,w',
              'phy:DomU1hdb/hdb1,hdb1,w',
              'phy:DomU1hda/hda2,hda2,w',
              'phy:DomU1hdb/hdb2,hdb2,w' ]
  Node2 DomU2:
     disk = [ 'phy:DomU2hda/hda1,hda1,w',
              'phy:DomU2hdb/hdb1,hdb1,w',
              'phy:DomU2hda/hda2,hda2,w',
              'phy:DomU2hdb/hdb2,hdb2,w' ]
* Install the DomU OSes
* (Important part) Inside each DomU, mirror hda onto hdb with Linux software RAID (mdadm) -- there's a sketch of this further down
* Install Heartbeat on both nodes in Dom0, and make sure the Xen resource script uses live migration when failing over gracefully (see the sketch after this list)
* Run DomU1 on Node1, DomU2 on Node2
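
I haven't actually written these configs yet, but a minimal /etc/xen/DomU1 on Node1 might look something like the sketch below.  The kernel path, memory size and root setting are only examples; DomU2's file on Node2 would be the same apart from its name and the DomU2* volume groups.

   # /etc/xen/DomU1 -- sketch only, untested
   kernel = '/boot/vmlinuz-2.6-xenU'        # example DomU kernel
   memory = 256                             # example, in MB
   name   = 'DomU1'
   vif    = [ '' ]                          # one NIC, auto-generated MAC
   disk   = [ 'phy:DomU1hda/hda1,hda1,w',
              'phy:DomU1hdb/hdb1,hdb1,w',
              'phy:DomU1hda/hda2,hda2,w',
              'phy:DomU1hdb/hdb2,hdb2,w' ]
   root   = '/dev/hda1 ro'                  # becomes /dev/md0 once the mirror is set up

Because the file only refers to /dev/DomU1hda/* and /dev/DomU1hdb/*, the exact same file works on whichever node the DomU happens to be running on -- which is the point of the LVM layer (more on that below).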
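
I haven't set up Heartbeat for this either, but with Heartbeat's classic haresources style the idea might look roughly like the sketch below.  The script name xen-domu1 and the peer hostname node2 are made-up placeholders; the only real commands are xm create / xm migrate --live / xm shutdown.

   # /etc/ha.d/haresources -- sketch only: each node prefers "its" DomU
   node1  xen-domu1
   node2  xen-domu2

   # /etc/ha.d/resource.d/xen-domu1 -- hypothetical wrapper script, one per DomU
   #!/bin/sh
   DOMU=DomU1
   PEER=node2                               # the other Dom0
   case "$1" in
     start)
       # start the DomU here unless it is already running (e.g. it was just migrated in)
       xm list $DOMU >/dev/null 2>&1 || xm create /etc/xen/$DOMU
       ;;
     stop)
       # graceful failover: try a live migration to the peer,
       # fall back to a clean shutdown if the peer is unreachable
       xm migrate --live $DOMU $PEER || xm shutdown -w $DOMU
       ;;
     status)
       xm list $DOMU >/dev/null 2>&1 && echo running || echo stopped
       ;;
   esac

On an ungraceful failover (node death) Heartbeat simply runs "start" on the survivor, and the DomU comes up on whatever half of its mirror is still reachable (see the failure diagram below).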

Result:
[       DomU1       ]   [       DomU2       ]
       /    \                  /    \
 [ hda ]     [ hdb ]     [ hda ]     [ hdb ]
       \    /                  \    /
 [       LVM       ]     [       LVM       ]
         /  \                         |    
[ Real HD ] [ AoE HD ]<--[ Real HD ]  |
                                   ___|
                                  /   |
           [ Real HD ]-->[ AoE HD ] [ Real HD ]
[       Node1        ]   [      Node2      ]



After a failure or during maintenance:
[       DomU1       ]   [       DomU2       ]
       /                       /
 [ hda ]                 [ hda ]
       \                        \
 [       LVM       ]     [       LVM       ]
         /               /
[ Real HD ]             /
                       /
                      /
           [ Real HD ]
[       Node1        ]



(ASCII art shore is purdy, Sam...)



LVM is not just a nice thing in this case, it is a necessity!  In addition to letting you resize the DomU's partitions on the fly, it adds a layer of indirection so that Xen is presented with the same device name on both nodes.  I understand this is critical during a live migration and appears to be the reason for going with iSCSI or NFS.

The key is to use software mirroring within the DomU OSes.  I thought about using DRBD alone, but that doesn't allow live migration.  It works great if you do a regular suspend-to-disk migration, but not a live one, because a DRBD device can only be primary on one node at a time: before you can promote the other side you have to stop using the device on the current primary, which means stopping the DomU.
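
Concretely, the "mirror the OSes" step inside each DomU would be the usual convert-a-running-system-to-RAID1 procedure.  A rough sketch for DomU1 (untested; it assumes the DomU kernel has md support and uses the device names from the configs above):

   # inside DomU1, after the OS is installed on hda1/hda2:
   # 1. build degraded RAID1 arrays on the hdb (remote, AoE-backed) halves
   mdadm --create /dev/md0 --level=1 --raid-devices=2 missing /dev/hdb1
   mdadm --create /dev/md1 --level=1 --raid-devices=2 missing /dev/hdb2
   # 2. move the installed system onto the mirror
   mkfs.ext3 /dev/md0 && mkswap /dev/md1
   mount /dev/md0 /mnt
   cp -ax / /mnt                            # copy the root filesystem onto the degraded mirror
   # edit /mnt/etc/fstab (and the DomU's root= setting) to use /dev/md0 and /dev/md1,
   # then reboot the DomU onto the mirror
   # 3. add the original partitions so the mirror is complete
   mdadm /dev/md0 --add /dev/hda1
   mdadm /dev/md1 --add /dev/hda2
   cat /proc/mdstat                         # watch the resync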

Mirroring also allows the DomU OS to restart if the host node it's on crashes, because the data should be consistent.  It also allows a DomU to keep operating -- no downtime -- if the other node (the node it's not running on) crashes.
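
When the dead node does come back (and Dom0 can see its AoE export again), the stale half of each mirror has to be re-added by hand so md can resync it.  For DomU1 that would be something like:

   # inside DomU1, after the other node returns:
   mdadm /dev/md0 --add /dev/hdb1           # if md still lists the old member as faulty, --remove it first
   mdadm /dev/md1 --add /dev/hdb2
   cat /proc/mdstat                         # resync progress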

Finally, AoE is painless to set up.  But I don't see why AoE could not be replaced with iSCSI if it's not working right.


I have three question marks in my head:
1.) How will it perform?
2.) Does it work?  Someone want to beat me to the punch and try it themselves?  It's likely to be a little while before I can find the time to try it.
3.) Is it reliable?  Should be; AoE is relatively new but very simple.  LVM is well-tested and software mirroring is as old as the hills.


Thoughts?


CD

TenThousandDollarOffer.com

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users