This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


Re: [Xen-users] Re: Xen, LVM, DRBD, Linux-HA

To: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-users] Re: Xen, LVM, DRBD, Linux-HA
From: Steve Wray <steve.wray@xxxxxxxxx>
Date: Wed, 23 Apr 2008 08:19:19 +1200
Delivery-date: Tue, 22 Apr 2008 13:19:59 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <m3skxdubfw.fsf@xxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-users-request@lists.xensource.com?subject=help>
List-id: Xen user discussion <xen-users.lists.xensource.com>
List-post: <mailto:xen-users@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=unsubscribe>
References: <20080421222832.GW26447@xxxxxxxxxxxxxxxxxxxxxxx> <fukbc1$ut0$1@xxxxxxxxxxxxx> <480DB90C.4010202@xxxxxxxxx> <m3skxdubfw.fsf@xxxxxxxxxxxxxxxxxxx>
Sender: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Thunderbird (Macintosh/20080213)
I'm going to reply to this thread, but starting from something new, as it doesn't seem to have been covered in the thread so far.

I have been testing drbd under Xen and found some very disturbing things.

I'd like to implement this in a production system but this scares the hell out of me...

I have two Dom0 servers connected with a crossover cable between two gigabit e1000 NICs. No switch involved.

One DomU on each server with a 20G drbd device shared between them.

The drbd config contains:

  syncer {
    rate 10M;
    group 1;
    al-extents 257;
  }

  net {
    on-disconnect reconnect;
  }

so apart from on-disconnect, the net section is working at defaults. At first I thought the problems I was seeing were due to timeout values etc., and tried various parameters in the net section, but nothing made any difference.
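For reference, the sort of net-section knobs I experimented with look roughly like this (a sketch only; the values shown are just the documented defaults from the drbd.conf man page, not a recommendation):

```
  net {
    on-disconnect reconnect;
    timeout       60;   # tenths of a second, i.e. 6s before the peer is considered dead
    connect-int   10;   # seconds between connection attempts
    ping-int      10;   # seconds between keepalive pings
    ko-count      4;    # give up on the peer after this many timed-out requests
  }
```

None of these made any difference to the errors below.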

When, on the current secondary node, I execute

drbdadm invalidate all

I get frequent errors such as:

drbd0: PingAck did not arrive in time.
drbd0: drbd0_asender [1572]: cstate SyncSource --> NetworkFailure
drbd0: asender terminated
drbd0: drbd_send_block() failed
drbd0: drbd0_receiver [1562]: cstate NetworkFailure --> BrokenPipe
drbd0: short read expecting header on sock: r=-512
drbd0: worker terminated
drbd0: ASSERT( mdev->ee_in_use == 0 ) in /usr/src/modules/drbd/drbd/drbd_receiver.c:1880
drbd0: drbd0_receiver [1562]: cstate BrokenPipe --> Unconnected
drbd0: Connection lost.

Watching xm top on both Dom0s, I see a HUGE number of dropped RX packets reported on both DomUs' vif interfaces. The RX packets are dropped continuously throughout the drbd resync, and the count grows extremely large.
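For anyone who wants to watch those counters directly rather than eyeballing xm top, the per-interface drop counts are also visible in /proc/net/dev on the Dom0 (a quick sketch; the vif names will of course differ per domain):

```shell
# Show the RX-dropped counter for each Xen vif from Dom0's /proc/net/dev.
# Column layout after the interface name is:
#   rx_bytes rx_packets rx_errs rx_drop ...
awk '/vif/ { sub(":", " "); print $1, "rx_dropped=" $5 }' /proc/net/dev
```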

The ifconfig output within the DomUs does not show any dropped packets.

I have used iperf to test the performance of the crossover link and it is fine when there is no drbd syncing going on.

I have tried various things such as setting sysctl.conf options:


net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

but so far the only thing that prevents the "PingAck did not arrive in time" errors is to take the sync rate down to 1M.
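In other words, the only stable configuration so far is something like this (rest of the syncer section as above):

```
  syncer {
    rate 1M;          # anything much higher triggers "PingAck did not arrive in time"
    group 1;
    al-extents 257;
  }
```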

My Xen version info is:

Xen version 3.0.3-1 (Debian 3.0.3-0-4)

Please advise...


Xen-users mailing list