[Xen-devel] Re: txenmon: Cluster monitoring/management

On Tue, Feb 10, 2004 at 08:06:25AM +0000, Ian Pratt wrote:
> > On Sun, Feb 08, 2004 at 09:19:56AM +0000, Ian Pratt wrote:
> > > Of course, this will all be much neater in rev 3 of the domain
> > > control tools that will use a db backend to maintain state about
> > > currently running domains across a cluster...
> > 
> > Ack!  We might be doing duplicate work.  How far have you gotten with
> > this?
> 
> We haven't even started, but have been thinking about the design,
> and what the schema for the database should be etc.

When you say "database", do you mean "an independent sqlite running in
each dom0", or do you mean "a central SQL server running somewhere on a
dedicated machine"?  (See further down for why I ask.)

As far as schema goes, the things I've needed to track so far are these
"control" items, referenced in the guest.ctl() calls in txenmon:

    domid
    gw
    host
    ip
    kernel
    mem
    run
    swap
    vbds

...and I'm considering adding a 'reboot' boolean.  I also track several
runtime state items as attributes of the Guest class -- the whole object
is saved as a pickle, so see __init__ for a list of them.  

The NFS export directory tree looks something like this:

    /export/xen/fs/stevegt
    /export/xen/fs/stevegt/tcx
    /export/xen/fs/stevegt/tcx/root
    /export/xen/fs/stevegt/tcx/ctl
    /export/xen/fs/stevegt/tcx/log
    /export/xen/fs/stevegt/xentest1
    /export/xen/fs/stevegt/xentest1/root
    /export/xen/fs/stevegt/xentest1/log
    /export/xen/fs/stevegt/xentest1/ctl
    /export/xen/fs/stevegt/xentest2
    /export/xen/fs/stevegt/xentest2/root
    /export/xen/fs/stevegt/xentest2/log
    /export/xen/fs/stevegt/xentest2/ctl
    /export/xen/fs/stevegt/crashme1
    /export/xen/fs/stevegt/crashme1/root
    /export/xen/fs/stevegt/crashme1/ctl
    /export/xen/fs/stevegt/crashme1/log

...where 'stevegt' is a user who owns one or more virtual domains, and
'xentest1' is the hostname of a virtual domain.  Those control items I
mentioned above go in individual files (qmail style) under ./ctl, and
the python pickle for each virtual domain is saved as ./log/pickle.  The
root partition for each domain is under ./root.  Here's what the
contents of ./ctl look like for a given guest:

    nfs1:/export/xen# ls -l /export/xen/fs/stevegt/tcx/ctl
    total 32
    -rw-r--r--    1 root     root            3 Feb  8 20:57 domid
    -rw-r--r--    1 root     root           12 Feb  5 22:51 gw
    -rw-r--r--    1 root     root            6 Feb  9 21:56 host
    -rw-r--r--    1 root     root           13 Feb  8 20:57 ip
    -rw-r--r--    1 root     root           30 Feb  5 22:52 kernel
    -rw-r--r--    1 root     root            4 Feb  9 17:47 mem
    -rw-r--r--    1 root     root            2 Feb  9 21:56 run
    -rw-r--r--    1 root     root           14 Feb  5 22:53 swap
    -rw-r--r--    1 root     root            0 Feb  5 22:52 vbds

Because these are individual files, it's easy to say, for instance,
'echo 0 > run' from a shell prompt to cause a domain to shut down, or
'echo node43 > host' to cause it to move to a different node.

I considered using the sqlite db for these things; I didn't, because
(1) this was faster to implement and easier to access from the command
line, and (2) I didn't want to cause future schema conflicts with
whatever you were going to do.

                                 * * * 

Having said all this, I'm less worried about schema and more worried
about single points of failure.  Right now txenmon runs in domain 0 on
each node, and the data store is distributed as above.  This makes me
dependent on the central NFS server staying up, but an NFS server is a
relatively simple thing: it can be HA'd and backed up easily, and it
will tend to have uptimes in the hundreds of days anyway as long as you
leave it alone.

If these data items were to move into a "real" database server instead,
say a central mysql or postgresql server, then I'd worry more; database
servers aren't as easy to keep available for hundreds of days without
interruption.  (See http://Infrastructures.Org for more of my
perspective on this.)

I'm moving in the direction of keeping some sort of distributed data
store, like those flat files and python pickles (or maybe the sqlite on
each dom0?), which can be cached on local disk in each dom0, and then
using something like UDP broadcast (simple) or XMPP/jabber (less
simple) as a peer-to-peer communications mechanism to keep the caches
synced.

My goal here is to be able to walk into a Xen data center and destroy
any random machine without impacting any user for more than a few
minutes.  (See http://www.infrastructures.org/bootstrap/recovery.shtml).

To this end, I'm curious what people's thoughts are on backups and
real-time replication of virtual disks -- I'm only using them for swap
right now, because of these issues.

                                 * * * 

> Cool! It's always a nice surprise to find out what work is
> going on by people on the list. 

As I said last night, you have me full time right now.  ;-)  My wife and
I are launching a commercial service based on Xen (we were evaluating
UML).  I have until the end of March.  If enough revenue is flowing by
then, then you get to keep me.  If not, then "the boss" will tell me to
put myself back on the consulting market.

Nothing like a little pressure.  ;-)

> You might want to try repulling 1.2 and trying the newer versions
> of the tools which are a bit more user friendly.

My most recent pull was a week ago; this got me xc_dom_control and
xc_vd_tool.  I'll likely do another pull this week.  We already have one
production customer (woo hoo!), so I am trying to limit upgrades/reboots
for them.

> Great, we'd love to see stuff like this in the tree.

Would it help if I exposed a bk repository you could pull from, or how
do you want to do this?

Steve
-- 
Stephen G. Traugott  (KG6HDQ)
UNIX/Linux Infrastructure Architect, TerraLuna LLC
stevegt@xxxxxxxxxxxxx 
http://www.stevegt.com -- http://Infrastructures.Org 

