Hello all. I'm trying to figure out VastSky on XCP 1.0 beta. As we all know, it has been integrated into XCP, and that's just about all one can find about the matter. I've been googling a lot, but with no luck. I'll write down here what information I've gathered, what I tried, and how far I managed to get with this.
I'll include information about my hardware, in case it has something to do with all this. I have one four-node SuperMicro twin2 server (2026TT-HiBQRF) with QDR InfiniBand (I haven't bought a switch or managed to get dom0 drivers yet, so it's gigabit ethernet for now). Each node is identical, containing: 1x Intel Xeon E5620, 12 GB DDR3, 3x 60 GB OCZ Vertex2 SSD and 3x 500 GB Seagate Momentus 7200.4 SATA 2.5". No RAID cards, just the onboard ICH10.
Networking configuration:
node A: hostname: super0nodeA ip: 192.168.10.210
node B: hostname: super0nodeB ip: 192.168.10.211
node C: hostname: super0nodeC ip: 192.168.10.212
node D: hostname: super0nodeD ip: 192.168.10.213
I have bonded two interfaces on each node, have only one gigabit switch, and haven't done any multipath configuration.
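In case the bonding matters, it was done with the usual xe commands; a minimal sketch (the UUIDs are placeholders you'd get from "xe pif-list" and "xe network-list", and I'm assuming stock XCP networking):

# create a network for the bond, then bond two PIFs onto it
xe network-create name-label=bond0
xe bond-create network-uuid=<network-uuid> pif-uuids=<pif1-uuid>,<pif2-uuid>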
My plan was to use super0nodeA as the storage manager and super0nodeB, super0nodeC and super0nodeD as storage servers, but I ended up installing the storage and head server on super0nodeA as well.
The install manual isn't XCP specific; actually it only references XCP a couple of times, but it seems pretty straightforward when it comes to the configs. Someone at ##xen-api clarified what things I need to install. I mean, there actually is an /etc/vas.conf on stock XCP 1.0 beta, but one still needs to install the needed RPMs to get the functionality.
So, as (also) stated in the installation document, one needs (taken from vas_install.txt):
<start copy paste>
vastsky-common.rpm  Common library and configuration
vastsky-hsvr.rpm    Head server agent
vastsky-ssvr.rpm    Storage server agent
vastsky-sm.rpm      Storage manager
vastsky-cli.rpm     Storage manager command-line clients
vastsky-doc.rpm     Documentation (including this file)

Basically,
- The -common package is required by the other packages.
- Head servers need the -hsvr package.
- Storage servers need the -ssvr package.
- The storage manager needs the -sm package.
- The host on which you want to run user commands needs the -cli package.
<end copy paste>
Everything I did, I did in dom0 of each server; actually I had no domUs on these servers when I did all this.
So, first I edited the "/etc/vas.conf" that exists on all four nodes and inserted the IP of "super0nodeA". The comment says "Comma separated list of hosts on which storage manager runs", but I remember reading somewhere that there can only be one instance of it; maybe one can define multiple IPs of a single host. I didn't find anything else to modify in "/etc/vas.conf".
<part of vas.conf>
[storage_manager]
# host_list:
#   Comma separated list of hosts on which storage manager runs.
host_list: 192.168.10.210
</part of vas.conf>
Then I created "/var/lib/vas/register_device_list" on each node and added disks, following the instructions in vas_install.txt. I configured one SSD and one HDD on each node. Actually, first I added this on nodes B, C and D, but later on I added it on A also.
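For reference, mine ended up looking roughly like this; I'm going from memory and from how I read vas_install.txt, so treat the one-device-per-line format and the device names as assumptions (yours will differ):

# /var/lib/vas/register_device_list
# one SSD and one HDD per node (assumed format: one device path per line)
/dev/sdb
/dev/sdd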
I didn't modify "/etc/multipath.conf", since vas_install.txt states "This step is not necessary if you solely use our XCP SR driver". I also didn't modify "/etc/hosts", since I used IP addresses instead of host names in "/etc/vas.conf", and I hadn't found anywhere else to insert host names or IP addresses.
Then, after multiple reboots and plenty of googling, I went to #xen and #xen-api to ask for some help. I was told that I need to install the RPMs. It was an "aha" moment and explained nicely why I didn't have the CLI commands available or the "/etc/init.d" scripts for the VastSky servers. So I did "rpm -i vastsky-hsvr.rpm" and "rpm -i vastsky-ssvr.rpm" on all nodes. I also did "rpm -i vastsky-sm.rpm" and "rpm -i vastsky-cli.rpm" on "super0nodeA". vastsky-common.rpm is already installed on "stock" XCP 1.0 beta and it is VastSky 2.1, so all the RPMs I installed were from 2.1, not 3.0, which seems to be the newest version available at: http://sourceforge.net/projects/vastsky/files/vastsky/
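To summarize what I ran (assuming the RPMs have already been copied to each node):

# head server + storage server agents
# (I did this on all nodes; node A actually got these a bit later, see below)
rpm -i vastsky-hsvr.rpm
rpm -i vastsky-ssvr.rpm
# storage manager + command-line clients, on super0nodeA only:
rpm -i vastsky-sm.rpm
rpm -i vastsky-cli.rpm
# vastsky-common is already on stock XCP 1.0 beta, so no need to install it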
Then I did "/etc/init.d/vas_sm init" and "/etc/init.d/vas_sm start" on "super0nodeA". Seemed like I was on fire: finally I had some processes running that I was pretty comfortable thinking had something to do with VastSky. Finally I had commands working like:
- hsvr_list  "list head servers"
- ssvr_list  "list storage servers"
- pdsk_list  "list physical disks"
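So the whole bring-up on the storage manager node, recapped as commands:

# on super0nodeA:
/etc/init.d/vas_sm init    # one-time init, then
/etc/init.d/vas_sm start   # start the storage manager
# then, from the -cli package:
hsvr_list   # list head servers
ssvr_list   # list storage servers
pdsk_list   # list physical disks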
Though no resources were present, even after I issued "/etc/init.d/vas_hsvr start" and "/etc/init.d/vas_ssvr start" on nodes "super0nodeB", "super0nodeC" and "super0nodeD". I knew that these services started, since "ps aux | grep vas" told me so, and also because I started getting lines in "/var/log/vas_<host name>.log" (not sure if that is the exact name, but the log files can be found in "/var/log"; there is only one starting with "vas" there, and it is similar to what I wrote).
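For completeness, the per-node check I used looks like this (the exact log file name is my guess, hence the wildcard):

# on each of super0nodeB/C/D:
/etc/init.d/vas_hsvr start
/etc/init.d/vas_ssvr start
# verify the agents are alive and logging:
ps aux | grep vas
tail -f /var/log/vas_*.log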
This is when I started wondering if the problem might be network related. So I installed vastsky-hsvr.rpm and vastsky-ssvr.rpm on super0nodeA and started them. I also modified my "/etc/hosts" and added:
192.168.10.210 super0nodeA super0nodeA-data1 super0nodeA-data2
192.168.10.211 super0nodeB super0nodeB-data1 super0nodeB-data2
192.168.10.212 super0nodeC super0nodeC-data1 super0nodeC-data2
192.168.10.213 super0nodeD super0nodeD-data1 super0nodeD-data2
I did this to all nodes.
This is when I finally had something come out of the storage manager. If I did hsvr_list, ssvr_list or pdsk_list, they each printed one resource, and it was the one on "super0nodeA", where the storage manager was also running. So still no connections from the other nodes, even after I rebooted all of them.
After re-re-re-re-checking all the configs, I did "/etc/init.d/vas_hsvr stop", "/etc/init.d/vas_ssvr stop" and "/etc/init.d/vas_ssvr start" on "super0nodeA". About 5 s after I started vas_ssvr, I observed my server shutting down. I tried to start it, just to see it shut itself down again right after the loading screen with the panda on it, with just some text saying something about stunnel and a bunch of numbers at the top of the screen. Well, I thought it was something I did, so I re-installed XCP.
While I was reinstalling XCP on node A, I started to think that my problem might be node A itself, so I installed vastsky-cli.rpm and vastsky-sm.rpm on "super0nodeB" and modified "/etc/vas.conf" on nodes B, C and D (changed "host_list: 192.168.10.210" to 192.168.10.211). Again, I had connections from head and storage servers, but only from the local ones. Still no connections from nodes C or D.
I did "/etc/init.d/vas_hsvr stop", "/etc/init.d/vas_ssvr stop" and "/etc/init.d/vas_ssvr start" on node B and again, server started shutting it self down. This time I had another ssh session where I had "tail -f /var/log/vas_super0nodeB.log" so even if the server shutted it self down, I was able to copy paste the content of the screen:
<start of log>
2010-12-19 15:47:59,435 ssvr_reporter DEBUG /opt/vas/bin/daemon_launcher -n 1 /opt/vas/bin/DiskPatroller /var/run/DiskPatroller.run
2010-12-19 15:47:59,443 storage_manager INFO DISPATCH registerStorageServer called. ({'ip_data': ['192.168.10.211', '192.168.10.211'], 'ver': 3},)
2010-12-19 15:47:59,444 storage_manager INFO DISPATCH registerStorageServer EXCEPTION <Fault 17: 'EEXIST'>
2010-12-19 15:47:59,445 ssvr_reporter ERROR shutdown
2010-12-19 15:47:59,500 ssvr_reporter DEBUG shutdown -g0 -h now
2010-12-19 15:47:59,501 ssvr_reporter ERROR Traceback (most recent call last):
  File "ssvr_reporter.py", line 231, in main
  File "ssvr_reporter.py", line 100, in register_resources
  File "vas_subr.py", line 68, in send_request
  File "/usr/lib/python2.4/xmlrpclib.py", line 1096, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.4/xmlrpclib.py", line 1383, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.4/xmlrpclib.py", line 1147, in request
    return self._parse_response(h.getfile(), sock)
  File "/usr/lib/python2.4/xmlrpclib.py", line 1286, in _parse_response
    return u.close()
  File "/usr/lib/python2.4/xmlrpclib.py", line 744, in close
    raise Fault(**self._stack[0])
Fault: <Fault 17: 'EEXIST'>
2010-12-19 15:48:00,337 storage_manager DEBUG RW.__send_request ('192.168.10.211', '192.168.10.211') 8883 registerShredRequest {'dextid': 4, 'capacity': 465, 'pdskid': 3, 'ver': 3, 'offset': 0}
2010-12-19 15:48:00,338 storage_manager DEBUG RW.__send_request ('192.168.10.211', '192.168.10.211') 8883 registerShredRequest {'dextid': 2, 'capacity': 55, 'pdskid': 2, 'ver': 3, 'offset': 0}
2010-12-19 15:48:00,340 ssvr_agent INFO DISPATCH registerShredRequest called. ({'dextid': 4, 'ver': 3, 'pdskid': 3, 'capacity': 465, 'offset': 0},)
2010-12-19 15:48:00,342 ssvr_agent INFO DISPATCH registerShredRequest called. ({'dextid': 2, 'ver': 3, 'pdskid': 2, 'capacity': 55, 'offset': 0},)
2010-12-19 15:48:00,343 ssvr_agent INFO false [Status 256]
2010-12-19 15:48:00,343 ssvr_agent INFO retrying(1/16) ...
2010-12-19 15:48:00,345 ssvr_agent INFO false [Status 256]
2010-12-19 15:48:00,345 ssvr_agent INFO retrying(1/16) ...
<end of log>
Notice: "2010-12-19 15:47:59,500 ssvr_reporter DEBUG shutdown -g0 -h now"
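So my reading of the log is that when registerStorageServer fails with EEXIST (the storage manager apparently thinks this server is already registered), ssvr_reporter deliberately runs "shutdown -g0 -h now" on the host. I haven't confirmed that in the source; one way to check would be something like this (the path is an assumption, taken from where daemon_launcher lives in the log above):

# look for the shutdown call in the storage server reporter
grep -n "shutdown" /opt/vas/bin/ssvr_reporter.py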
I did "/etc/init.d/vas_hsvr stop", "/etc/init.d/vas_ssvr stop" and "/etc/init.d/vas_ssvr start" on node C also and exactly the same happened. Server shut it self down and cant be started. Same stunnel... error.
This is how far I got before I stopped trying. Hope this helps someone else. I would also welcome input if someone has something to say.
-Henrik Andersson