[Xen-users] GFS on DomU: What block device should I use?

To: <xen-users@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-users] GFS on DomU: What block device should I use?
From: Jeff Sturm <jeff.sturm@xxxxxxxxxx>
Date: Mon, 1 Jun 2009 14:19:38 -0400
Delivery-date: Tue, 02 Jun 2009 01:53:19 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-users-request@lists.xensource.com?subject=help>
List-id: Xen user discussion <xen-users.lists.xensource.com>
List-post: <mailto:xen-users@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: Acni5YYt/nw5cYqWSaG47BseU569rw==
Thread-topic: GFS on DomU: What block device should I use?
In our Xen cluster, we have:
 
- Many DomU hosts (CentOS 5.2, paravirtualized) mounting a GFS filesystem on a VBD,
- A few Dom0 hosts (CentOS 5.2), connected over GigE,
- A single SAN providing shared block storage for all of the above.
 
Works great most of the time.  The DomU storage is backed by logical volumes on the Dom0s, all part of a clustered VG on the SAN.
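 
For concreteness, the disk lines look roughly like this (volume and device names here are illustrative, not our real ones):
 
    # plain phy: mapping of a clustered LV, served to the DomU by blkback
    disk = [ 'phy:/dev/clustervg/domu-wwwdocs,xvdb,w' ]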
 
Once every few weeks however we experience FS corruption with kernel messages like:
 
May 28 09:30:26 r3core-roll03 kernel: GFS: fsid=r3core-inner:wwwdocs.1: fatal: invalid metadata block
May 28 09:30:26 r3core-roll03 kernel: GFS: fsid=r3core-inner:wwwdocs.1:   bh = 27845 (type: exp=4, found=9)
May 28 09:30:26 r3core-roll03 kernel: GFS: fsid=r3core-inner:wwwdocs.1:   function = gfs_get_meta_buffer
May 28 09:30:26 r3core-roll03 kernel: GFS: fsid=r3core-inner:wwwdocs.1:   file = /builddir/build/BUILD/gfs-kmod-0.1.23/_kmod_build_xen/src/gfs/dio.c, line = 1225
May 28 09:30:26 r3core-roll03 kernel: GFS: fsid=r3core-inner:wwwdocs.1:   time = 1243517426
May 28 09:30:27 r3core-roll03 kernel: GFS: fsid=r3core-inner:wwwdocs.1: about to withdraw from the cluster
May 28 09:30:27 r3core-roll03 kernel: GFS: fsid=r3core-inner:wwwdocs.1: telling LM to withdraw

The remedy is to shut down the nodes accessing the shared FS, run fsck and/or mkfs on it, then start everything up again.
 
What is puzzling is the exact cause of the FS corruption.  As we try to narrow it down, I've been forced to look closely at the block layers in Xen.  While I don't fully understand (yet) what blkback is doing, I'm nervous that its request queueing causes blocks to be flushed to disk asynchronously.  That could be very bad for shared filesystems, as I'd expect a file's metadata blocks to have reached physical media by the time its lock is released.
 
So I'm looking at blktap now.  Most documentation suggests configuring VBDs with tap:aio:; however, my reading suggests it can also reorder or defer block writes, which is exactly what I'm trying to avoid.  It looks like tap:sync: is what I really need, though very little documentation is available on that specific driver.
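 
To make that concrete, the two alternatives I'm weighing look something like this in the DomU config (paths and device names are again just illustrative):
 
    # what most docs recommend: asynchronous I/O through the blktap aio driver
    disk = [ 'tap:aio:/dev/clustervg/domu-wwwdocs,xvdb,w' ]
 
    # what I think I want: the blktap sync driver, writing synchronously
    disk = [ 'tap:sync:/dev/clustervg/domu-wwwdocs,xvdb,w' ]
 
If my understanding of the sync driver is right, it should keep writes from being deferred or reordered behind the DomU's back, presumably at some cost in throughput.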
 
Surely somebody must have run into this problem before, but a couple of days of searching and reading have yielded very little.  Or am I way off base in my understanding of the magic that is GFS and how it guarantees filesystem consistency?  Help please?
 
-Jeff
 
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users