Good morning,
I have been tinkering with Xen for the past couple of months, looking to
use it as the base of our new cluster infrastructure. So far I've had
significant success with it: performance is great and live migrations are
awesome.
However, as I've continued setting up the Xen0 infrastructure the way I
would prefer, I've finally run into a wall that I can't seem to get past,
so I am tossing out a plea for assistance here on the xen-users list. It
seems the error I am seeing has been discussed on the list before, but
perhaps not to a resolution satisfactory to everyone involved.
That's right, I have the infamous:
Error: Device 769 (vbd) could not be connected. Backend device
not found.
error. Woohoo.
A little bit about my cluster setup... a bunch of Dell OptiPlex GX240s
(currently 12, with more on the way), each with a 2.4GHz P4, 2GB of RAM
(I currently allocate ~64MB to each Xen0 and give each XenU plenty of the
rest, but not all of it, so I can double- and triple-up XenUs when I am
testing things), and a hard disk in the 30GB - 40GB range. Gig-E
interconnect amongst all nodes.
I'm using Xen 3.0.1, as that is the version I've had the most success
with. I am now working with a custom-compiled set of 2.6.12.6 Xen0/XenU
kernels for 3.0.1, as I needed to enable things like kernel-based NFS.
Debian Testing is my base distro.
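(For the curious, the kernel changes were just the usual NFS bits; the
option names below are from memory and may not be the complete set:)

    CONFIG_NFS_FS=y       # NFS client
    CONFIG_NFS_V3=y
    CONFIG_NFSD=y         # kernel-mode NFS server
    CONFIG_NFSD_V3=y
    CONFIG_SUNRPC=y
    CONFIG_LOCKD=y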
My working setup uses an NFS mount that holds all the images and config
files I use with Xen... saves and live migrations also go through it.
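(Roughly, with hostnames and paths changed, the shape of it is an fstab
entry on each Xen0 plus configs and images living under the mount:)

    # /etc/fstab on each Xen0 -- names approximate
    nfsserver:/export/xen   /xen   nfs   rw,hard,intr   0  0

    # under /xen:
    #   /xen/configs/node01       domain config file
    #   /xen/images/node01.img    root filesystem image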
And it all works fine and dandy. In fact, the current NFS server is
located on a machine with only 100Mb ethernet, and I have been very
impressed with the overall responsiveness when migrating.
My problem cropped up when I decided to try to do something about the
unused disk space on each of my Xen0 machines (I'm only using 6GB root +
2GB swap out of ~40GB per machine, as I wanted to play around with some
fancy network-accessible storage solutions). So I allocated the remaining
20-30GB on each machine to a partition, formatted it, and then proceeded
to set up PVFS2 (version 1.5.1). That seemed to come up without a hitch:
4 I/O servers, 4 metadata servers (on the same machines as the I/O
servers), and clients balanced out across all my machines (currently
about 12).
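(The server-side setup was essentially the stock procedure from the PVFS2
quickstart; the exact invocations below are from memory, so treat them as
approximate:)

    # generate the filesystem + per-server configs:
    pvfs2-genconfig /etc/pvfs2/fs.conf /etc/pvfs2/server.conf

    # on each server: create the storage space, then start the daemon
    pvfs2-server /etc/pvfs2/fs.conf /etc/pvfs2/server.conf-`hostname` -f
    pvfs2-server /etc/pvfs2/fs.conf /etc/pvfs2/server.conf-`hostname`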
I'm using the 2.6 kernel PVFS2 module so I can have it mounted just like
a real filesystem and use regular utilities and whatnot. I've got a nice
110GB block of space via my PVFS2 mount point, and I thought it would be
neat to see how well my Xen operations would work out of the PVFS2
storage vs. the NFS storage. So I copied the necessary files over,
updated my Xen config .sxp files, and gave it a go. That's when I first
got that dreaded error.
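(Concretely, and with paths approximate: the Xen0s mount the PVFS2 volume
through the kernel module, and the only change to the domain configs was
repointing the image paths:)

    # on each Xen0, with the pvfs2-client running:
    mount -t pvfs2 tcp://ioserver1:3334/pvfs2-fs /mnt/pvfs2

    # disk line before (NFS-backed, works):
    disk = [ 'file:/xen/images/node01.img,sda1,w' ]
    # disk line after (PVFS2-backed, gives the error 769 above):
    disk = [ 'file:/mnt/pvfs2/xen/images/node01.img,sda1,w' ]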
As for current debugging efforts: I've gone ahead and allocated up to 128
loopback devices, as was the popular suggestion in the thread I found on
this list (max_loop=128 on the appropriate kernel line in grub). A
"dmesg | grep loop" indicates this was successful, and /dev lists 128
loop devices.
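(For reference, that amounts to something like the following; the grub
entry is approximate, and max_loop only works this way because the loop
driver is built into my kernel rather than loaded as a module:)

    # /boot/grub/menu.lst -- the Xen0 kernel is the "module" line
    kernel /boot/xen-3.0.1.gz
    module /boot/vmlinuz-2.6.12.6-xen0 root=/dev/hda1 ro max_loop=128

    # checks:
    dmesg | grep -i loop        # -> "loop: loaded (max 128 devices)"
    ls /dev/loop* | wc -l       # -> 128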
However, that does not fix my problem. The error persists.
I also tried changing how Xen looks for my XenU images. In the config
file (as they work via NFS), I access the disks via the "file:" prefix.
I have seen a "phy:" prefix, which I tried, and got a slightly different
error message, saying something to the effect of "it is already mounted,
I can't do it again".
So I went in and put "w!" in for the access mode, instead of the regular
"w" that was there. This actually got the kernel booting... however, the
attempt was in vain because it could not locate the root filesystem (so
it really didn't do much for me aside from getting past the first error).
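(Spelled out, with paths approximate and from memory, the disk-line
variations I've been through look like this:)

    # what I normally use (fine from NFS, error 769 from PVFS2):
    disk = [ 'file:/mnt/pvfs2/xen/images/node01.img,sda1,w' ]

    # the phy: attempt -- complains the device is already mounted:
    disk = [ 'phy:/mnt/pvfs2/xen/images/node01.img,sda1,w' ]

    # forcing the sharing check with w! -- kernel boots, but then
    # can't find /dev/sda1 as its root:
    disk = [ 'phy:/mnt/pvfs2/xen/images/node01.img,sda1,w!' ]
    root = "/dev/sda1 ro"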
ALSO: if I go back and try the old, previously working NFS-style XenU
guest creation, it now also spits back the good old "backend device not
found" error 769.
I have found that if I turn off the PVFS2 client on the Xen0 host, NFS
then seems to work again. So this seems to indicate PVFS2 is doing
something... what, exactly, is a good question, but it is doing something.
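(For completeness, the places I've been looking for clues, in case
someone can tell me what to watch for; log locations are as on my Debian
install and may differ elsewhere:)

    losetup -a                         # which loop devices actually got bound
    grep -E 'pvfs2|nfs' /proc/mounts   # what the Xen0 thinks is mounted
    xm dmesg | tail                    # hypervisor messages
    tail /var/log/xend.log             # xend's view of the vbd failure
    tail /var/log/xend-debug.log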
Are there other prefixes I could be using? Since I'm running PVFS2 over
gig-E, I'm using the TCP transport on the default port 3334... does Xen
have any network-based file-access methods?
Any suggestions or things you all think I should try?
Thank you for any pointers you can throw my way.
-Matthew
--
Matthew Haas
SUNY Geneseo
Distributed Systems Lab
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users