Good morning,
I have been tinkering with Xen for the past couple of months, looking to
use it as the base of our new cluster infrastructure. So far I've had
significant success with it: performance is great and live migrations are
awesome.
However, as I've continued setting up the Xen0 infrastructure the way I
would prefer, I've finally run into a wall that I can't seem to get past,
so I am tossing out a plea for assistance here on the xen-users list. It
seems the error I am seeing has been discussed on the list before, but
perhaps not to a resolution satisfactory to everyone involved.
That's right, I have the infamous:
Error: Device 769 (vbd) could not be connected. Backend device
not found.
error. Woohoo.
A little bit about my cluster setup... a bunch of Dell OptiPlex GX240s
(currently 12, with more on the way), each with a 2.4GHz P4, 2GB of RAM
(I currently allocate ~64MB to each Xen0 and give each XenU plenty of the
rest, but not all of it, so I can double- and triple-up XenUs when I am
testing things), and a hard disk in the 30GB - 40GB range. Gig-E
interconnect amongst all nodes.
I'm using Xen 3.0.1, as that is the version I've had the most success
with. I am now working with a custom-compiled set of 2.6.12.6 Xen0/XenU
kernels for 3.0.1, as I needed to enable things like kernel-based NFS.
Debian Testing is my base distro.
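(For the curious, the kernel changes were just the usual NFS bits; the
option names below are from memory and may not be the complete set:)

    CONFIG_NFS_FS=y       # NFS client
    CONFIG_NFS_V3=y
    CONFIG_NFSD=y         # kernel-mode NFS server
    CONFIG_NFSD_V3=y
    CONFIG_SUNRPC=y
    CONFIG_LOCKD=y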
My working setup uses an NFS mount that holds all the images and config
files I use with Xen... saves and live migrations also go through it.
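(Roughly, with hostnames and paths changed, the shape of it is an fstab
entry on each Xen0 plus configs and images living under the mount:)

    # /etc/fstab on each Xen0 -- names approximate
    nfsserver:/export/xen   /xen   nfs   rw,hard,intr   0  0

    # under /xen:
    #   /xen/configs/node01       domain config file
    #   /xen/images/node01.img    root filesystem image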
And it all works fine and dandy. In fact, the current NFS server is
located on a machine with only 100Mb ethernet, and I have been very
impressed with the overall responsiveness when migrating.
My problem cropped up when I decided to try to do something about the
unused disk space on each of my Xen0 machines (I'm only using 6GB root +
2GB swap out of ~40GB per machine, as I wanted to play around with some
fancy network-accessible storage solutions). So I allocated the remaining
20-30GB on each machine to a partition, formatted it, and then proceeded
to set up PVFS2 (version 1.5.1). That seemed to come up without a hitch:
4 I/O servers, 4 metadata servers (on the same machines as the I/O
servers), and clients balanced out across all my machines (currently
about 12).
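(The server-side setup was essentially the stock procedure from the PVFS2
quickstart; the exact invocations below are from memory, so treat them as
approximate:)

    # generate the filesystem + per-server configs:
    pvfs2-genconfig /etc/pvfs2/fs.conf /etc/pvfs2/server.conf

    # on each server: create the storage space, then start the daemon
    pvfs2-server /etc/pvfs2/fs.conf /etc/pvfs2/server.conf-`hostname` -f
    pvfs2-server /etc/pvfs2/fs.conf /etc/pvfs2/server.conf-`hostname`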
I'm using the 2.6 kernel PVFS2 module so I can have it mounted just like
a real filesystem and use regular utilities and whatnot. I've got a nice
110GB block of space via my PVFS2 mount point, and I thought it would be
neat to see how well my Xen operations would work out of the PVFS2
storage vs. the NFS storage. So I copied the necessary files over,
updated my Xen config .sxp files, and gave it a go. That's when I first
got that dreaded error.
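(Concretely, and with paths approximate: the Xen0s mount the PVFS2 volume
through the kernel module, and the only change to the domain configs was
repointing the image paths:)

    # on each Xen0, with the pvfs2-client running:
    mount -t pvfs2 tcp://ioserver1:3334/pvfs2-fs /mnt/pvfs2

    # disk line before (NFS-backed, works):
    disk = [ 'file:/xen/images/node01.img,sda1,w' ]
    # disk line after (PVFS2-backed, gives the error 769 above):
    disk = [ 'file:/mnt/pvfs2/xen/images/node01.img,sda1,w' ]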
As for current debugging efforts: I've gone ahead and allocated up to 128
loopback devices, as was the popular suggestion in the thread I found on
this list (max_loop=128 on the appropriate kernel line in grub). A
"dmesg | grep loop" indicates this was successful, and /dev lists 128
loop devices.
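(For reference, that amounts to something like the following; the grub
entry is approximate, and max_loop only works this way because the loop
driver is built into my kernel rather than loaded as a module:)

    # /boot/grub/menu.lst -- the Xen0 kernel is the "module" line
    kernel /boot/xen-3.0.1.gz
    module /boot/vmlinuz-2.6.12.6-xen0 root=/dev/hda1 ro max_loop=128

    # checks:
    dmesg | grep -i loop        # -> "loop: loaded (max 128 devices)"
    ls /dev/loop* | wc -l       # -> 128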
However, that does not fix my problem. The error persists.
I also tried changing how Xen looks for my XenU images. In the config
file (as they work via NFS), I access the disks via the "file:" prefix.
I have seen a "phy:" prefix, which I tried, and got a slightly different
error message, saying something to the effect of "it is already mounted,
I can't do it again".
So I went in and put "w!" in for the access mode, instead of the regular
"w" that was there. This actually got the kernel booting... however, the
attempt was in vain because it could not locate the root filesystem (so
it really didn't do much for me aside from getting past the first error).
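(Spelled out, with paths approximate and from memory, the disk-line
variations I've been through look like this:)

    # what I normally use (fine from NFS, error 769 from PVFS2):
    disk = [ 'file:/mnt/pvfs2/xen/images/node01.img,sda1,w' ]

    # the phy: attempt -- complains the device is already mounted:
    disk = [ 'phy:/mnt/pvfs2/xen/images/node01.img,sda1,w' ]

    # forcing the sharing check with w! -- kernel boots, but then
    # can't find /dev/sda1 as its root:
    disk = [ 'phy:/mnt/pvfs2/xen/images/node01.img,sda1,w!' ]
    root = "/dev/sda1 ro"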
ALSO: if I go back and try the old, previously working NFS-style XenU
guest creation, it now also spits back the good old "backend device not
found" error 769.
I have found that if I turn off the PVFS2 client on the Xen0 host, NFS
then seems to work again. So this seems to indicate PVFS2 is doing
something... what, exactly, is a good question, but it is doing something.
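(For completeness, the places I've been looking for clues, in case
someone can tell me what to watch for; log locations are as on my Debian
install and may differ elsewhere:)

    losetup -a                         # which loop devices actually got bound
    grep -E 'pvfs2|nfs' /proc/mounts   # what the Xen0 thinks is mounted
    xm dmesg | tail                    # hypervisor messages
    tail /var/log/xend.log             # xend's view of the vbd failure
    tail /var/log/xend-debug.log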
Are there other prefixes I could be using? Since I'm running PVFS2 over
gig-E, I'm using the TCP transport on the default port 3334... does Xen
have any network-based file-access methods?
Any suggestions or things you all think I should try?
Thank you for any pointers you can throw my way.
-Matthew
--
Matthew Haas
SUNY Geneseo
Distributed Systems Lab
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users