xen-users

[Xen-users] XCP NFS SR issues

To: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-users] XCP NFS SR issues
From: Ciaran Kendellen <ciaran@xxxxxxxxxxxxxxx>
Date: Thu, 01 Sep 2011 09:31:55 +0100
Delivery-date: Thu, 01 Sep 2011 01:35:02 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-users-request@lists.xensource.com?subject=help>
List-id: Xen user discussion <xen-users.lists.xensource.com>
List-post: <mailto:xen-users@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-GB; rv:1.9.2.17) Gecko/20110414 SUSE/3.1.10 Thunderbird/3.1.10
Hello all,

We have 2 x compute nodes running XCP and 2 x storage nodes with GlusterFS exposed via NFS to the compute nodes.

Roughly once a week we're experiencing VM failures due to tapdisk errors like the ones below:

Aug 31 17:00:51 xen02 tapdisk[15678]: ERROR: errno -116 at vhd_complete: /var/run/sr-mount/0554f47c-1a12-9bd1-ea9b-d2b68984f0ed/75af4999-3447-4fa4-bdf0-b89580807a7c.vhd: op: 1, lsec: 3464605, secs: 8, nbytes: 4096, blk: 845, blk_offset: 2725095
Aug 31 17:00:51 xen02 tapdisk[15678]: ERROR: errno -116 at __tapdisk_vbd_complete_td_request: req 1: read 0x0008 secs to 0x0034dd9d
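
For what it's worth, errno 116 on Linux is ESTALE ("Stale file handle"), which also points at the NFS layer rather than the VHDs themselves. A quick, generic check on the host (plain Python, nothing XCP-specific) confirms the mapping:

    # Map the tapdisk errno to its symbolic name on this host.
    # On Linux, 116 resolves to ESTALE / "Stale file handle".
    import errno, os
    print("%s - %s" % (errno.errorcode[116], os.strerror(116)))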

The standard fix is always to reboot the XCP host, restart the VMs and repair their filesystems, and so far I'm happy (and lucky) to say they have repaired successfully and come back up. However, I believe it is only a matter of time before we hit an unrecoverable failure...

We have introduced NIC bonding on the storage nodes (all NICs are Gb). There are only ~8 VMs running on the compute nodes at the moment, but I have been reluctant to add NIC bonding there as well, since it means reconfiguring the VMs' VIFs etc.

Can anybody advise what fine-tuning of NFS has worked for you? Are there any "usual suspects" I should be looking at here? We're currently using vanilla NFS, straight out of the box...
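
To show what I mean by "vanilla", here is a small sketch (the SR path is the one from the log above) that prints the options the NFS SR is currently mounted with, so any suggested tuning (rsize/wsize, proto, timeo, hard/soft) can be compared against what's in effect now:

    # Print the NFS mount options currently in effect for the SR on this host.
    sr_mount = "/var/run/sr-mount/0554f47c-1a12-9bd1-ea9b-d2b68984f0ed"
    with open("/proc/mounts") as mounts:
        for line in mounts:
            device, mountpoint, fstype, options = line.split()[:4]
            if mountpoint == sr_mount and fstype.startswith("nfs"):
                print("%s %s %s" % (device, fstype, options))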

The main reason we're using Gluster is so we can expose volumes larger than a single physical disk and get built-in redundancy. We achieved good direct I/O results on the storage nodes themselves, which is why I'm suspecting NFS...
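
For comparison, something like the following could be run over the NFS SR mount to set against the local direct I/O numbers (the scratch file name is made up; it's a simple buffered write with an fsync at the end rather than a proper direct-I/O run, but it should show whether the NFS path itself is the bottleneck):

    # Rough sequential-write throughput over the NFS SR mount (~1 GiB).
    import os, time
    path = "/var/run/sr-mount/0554f47c-1a12-9bd1-ea9b-d2b68984f0ed/throughput-test.tmp"
    block = b"\0" * (1 << 20)       # 1 MiB per write
    blocks = 1024                   # ~1 GiB in total
    start = time.time()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())        # force the data out to the server before stopping the clock
    print("%.1f MiB/s" % (blocks / (time.time() - start)))
    os.remove(path)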

Any feedback would be very much appreciated - the powers that be *may* use this as an excuse to test Hyper-V as a solution, which I want to avoid at all costs ;)

Many thanks in advance for any trouble taken.

Kind regards,

Ciaran Kendellen.

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users
