[UFO Chicago] syncing multiple workstations to one master

Jesse Becker jesse_becker at yahoo.com
Mon Sep 5 13:14:32 PDT 2011

For me, there's a difference between "software management" and "systems management."  There may be other conventions, but this is my personal view of things, and there often are no clear lines of distinction.
Software management is *directly* tied to "software package management" (or more generally, "package management").  Every major Linux and BSD distribution, and several UNIX variants (e.g. Solaris, AIX), use some sort of package management.  Every "program" or closely tied collection of programs, along with supporting files (documentation, sample configuration, etc.), is bundled into a single package.  The idea is that everything in that package should be treated at install/uninstall time as a single unit.  There's usually little use for the documentation of a program you don't have, and the sample configuration files are useless without something to actually parse them.
Let's say you want to install the best email client on the planet:  Mutt.  The package for Mutt on my home system includes the /usr/bin/mutt binary, a basic configuration file, three other supporting programs (which all deal with encrypting email), translation files for 29 languages, and a host of documentation.
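On an RPM-based system you can see for yourself exactly what a package bundles together.  A quick illustration (output abbreviated here, and the exact file list will vary by distribution and version):

```
$ rpm -ql mutt        # list every file the mutt package owns
/etc/Muttrc
/usr/bin/mutt
/usr/share/doc/mutt-1.5.21/manual.txt
...
$ rpm -qi mutt        # summary information about the package
```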
Mutt does a lot of things, including connecting to email servers, handling encrypted email, keeping caches of read email files, etc.  Most of these tasks are handled by other libraries, and not by Mutt directly.  So in order to use Mutt, you need those libraries installed, because Mutt is useless without them (or severely limited, at least).
At the same time, many packages contain *only* library files that are used by other programs.  Things like libc, curses and openssl are like this, and have their own packages, even though they may not have any "programs" that are used directly.  If you remove the curses library package, Mutt (among other programs) will cease to work.  If you remove the glibc package, your entire system will probably come to a crashing halt, and getting it running again will not be pleasant.
To deal with the problems in these two examples, package management systems have a concept of dependencies between packages.  The "mutt" package has requirements for glibc, libcurses and a host of other packages.  If those packages are not present on the system already, then the "mutt" package will not install.  On the other hand, if you try to remove the "libcurses" package, the package manager should refuse to do so because there are other programs installed (such as Mutt) that require that package for operation.  (There are, of course, administrative overrides for all this, but I'm not going to deal with them now.)
Good package management programs (which almost all Linux and BSD distributions now have) will either resolve the dependencies at installation time ("You asked for Mutt to be installed, but you also need to install libcurses as well.  Do you want to install both?") or warn you that something else requires the program you asked to remove ("You requested that libcurses be removed, but Mutt requires it.  Do you want to remove both?").  This ability is a wonderful thing, especially when setting up new systems, since you can just list the "end programs" that you want to actually use, instead of those programs plus all of their many dependencies.
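On an RPM/YUM system, the dependency bookkeeping described above looks roughly like this (output trimmed; the exact library names will differ on your system):

```
$ rpm -q --requires mutt      # what does the mutt package need?
libncursesw.so.5
libssl.so.10
...
$ yum install mutt            # yum pulls in any missing dependencies for you
$ rpm -e ncurses              # removal is refused while mutt still needs it
error: Failed dependencies:
        libncursesw.so.5 is needed by (installed) mutt
```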

Now, software/package management is really one aspect of a much broader systems management category. Bundling and installing "mutt" is a software management task.  Configuring a usable workstation is a much higher-level task, which includes sub-tasks like "install and configure an email client" (which can have sub-sub-tasks like "install mutt" and "create appropriate /etc/Muttrc file").  Systems configuration also can include things like "install package X, resolving and installing all dependencies as needed," or "mount an NFS share," or "at system install time, partition the hard disks thusly," or "install NTP, configure it to start at boot, use 3 different external NTP servers, automatically restart if the /etc/ntpd.conf file changes, and punch firewall rules to allow client connections from designated subnets."  (All four of these examples are things I have cfengine do across a few hundred boxes at $day_job.)

So to actually answer your questions...
I'd start with something simple:  a "base" install of some sort.  On RPM-based systems, there's a procedure called "kickstart" which allows you to automate installations.  A kickstart install can include a list of programs; the installer will automatically resolve the dependencies for you and install everything you need.
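A minimal sketch of the relevant part of a kickstart file (the package names here are just examples; the rest of the file handles partitioning, networking, and so on):

```
# ks.cfg (excerpt) -- %packages lists what you want;
# the installer resolves and installs all dependencies
%packages
@core
mutt
ntp
httpd
%end
```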
You will only need to create your own packages if you need software that is not included in the base distribution.  For example, at $day_job, I use a program called "multitail"[1] that is not included in the main distribution.  I've compiled and created an RPM for it, and install that package on all of my systems.  But I don't have to do that for GCC, since there is already a package for it.
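For the curious, building such a package mostly comes down to writing a .spec file and running rpmbuild.  Roughly (the spec file name is hypothetical, and the output path depends on your rpmbuild setup):

```
$ rpmbuild -ba multitail.spec        # build binary and source RPMs
$ rpm -ivh ~/rpmbuild/RPMS/x86_64/multitail-*.rpm
```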
You do not need to rebuild the packages each time you make a change.
There are several ways to distribute packages.  The simplest way is to copy them to the "client" systems and install them directly.  This can be done with an NFS mount, over HTTP, or ssh--it doesn't really matter.  If you find that you have a lot of custom packages, you may want to consider creating and running your own package repository, and configuring your systems to use that, just like you would any other software repository.  There is a program called "mrepo"[2] that does this, but it sounds like this would be a "stage 3" sort of thing...
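Setting up a bare-bones YUM repository is mostly two steps: generate the metadata, then point the clients at it.  A sketch (the hostname and paths are made up):

```
# on the repo server: generate repository metadata
$ createrepo /var/www/html/local-rpms

# on each client, a file like /etc/yum.repos.d/local.repo:
[local]
name=Local custom packages
baseurl=http://repo.example.com/local-rpms
enabled=1
gpgcheck=0
```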
One nice thing about using a repository is that your systems configuration program (cfengine, puppet, chef) can then make use of your built-in package tools to handle all of the grunt work of installing packages.  For my cfengine-based systems, I simply add the package to a list, and it will be installed (if possible) the next time the policy is executed.
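In CFEngine 3, that "list of packages" can be as simple as the following (illustrative modern syntax, and the bundle name and package list are made up; older releases spell the attributes differently):

```
bundle agent my_packages
{
vars:
  "wanted" slist => { "mutt", "multitail", "ntp" };

packages:
  # the agent calls yum/apt as appropriate to make each one present
  "$(wanted)"
    policy => "present";
}
```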
The configuration management programs do not *directly* handle the software packages, but they will direct other programs to do this.  On Debian systems they will call apt-get, while on RPM systems they will call RPM or YUM as appropriate.
The other place that systems management comes into play here is actually configuring the programs after installation.  For example, if you are building a small collection of web servers, it is not enough to merely install the "httpd" package.  You also need to properly edit the configuration files to use the proper hostnames, lock down security, load the proper modules, etc.  While you can edit five different files on ten different hosts, it will get tedious very quickly.
Instead, you write rules for your cfg.mngt. program that say (roughly speaking):  "make sure that these 5 files stored on the policy host are copied to all the web servers, and restart apache if, and only if, any of the files are updated".
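A rough CFEngine-style sketch of that rule (the paths, server variable, and class name are made up; treat this as pseudocode for the idea rather than exact, version-correct syntax):

```
bundle agent webconf
{
files:
  # keep the Apache config directory in sync with the policy host's copy
  "/etc/httpd/conf.d"
    copy_from    => secure_cp("/masterfiles/httpd", "$(sys.policy_hub)"),
    depth_search => recurse("1"),
    classes      => if_repaired("restart_httpd");

commands:
  # this runs only when one of the copied files actually changed
  restart_httpd::
    "/sbin/service httpd restart";
}
```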
This does several things.  First, it makes sure that you don't forget to edit one of the 50 different files you need to check (5 files on 10 hosts).  Second, it allows you to automate certain actions based on conditional events--restart the webservers, but *only* when you have to.  Lastly, it does introduce a new layer of complexity--managing your configuration management tool can be a bit of work, it's true.  But the time invested is *far* better spent than duplicating the same sequence of tasks on multiple systems by hand.
Hope this answered at least a few of the questions...it's a big topic.

[1] http://www.vanheusden.com/multitail/
[2] http://dag.wieers.com/home-made/mrepo/

Jesse Becker

--- On Mon, 9/5/11, Calvin Pryor <calvinpryor at gmail.com> wrote:

On Mon, Sep 5, 2011 at 10:41 AM, Jesse Becker <jesse_becker at yahoo.com> wrote:

I'd also suggest learning to build RPM/DEB packages as well.  That lets you easily encapsulate and distribute programs to multiple machines, but that's probably a "stage 2" sort of thing.

Jesse you are reading my mind. My next question was going to be how to abstract the software from the hardware, so we can set up a master system with all the packages we want, then push these packages to the rest of the boxes. 

How can we replicate packages?  If I add/remove packages from the "master", can these changes be propagated to the rest of the lab automatically?  Or would I have to rebuild the "master package" after all changes are made, and push out this new package manually?  Or are changes in the packages handled by one of the tools you already mentioned (puppet, cfengine, chef)?  Since you also suggested learning how to build packages, I'm guessing puppet/cfengine/chef does not handle the software packages?

Looking at this http://en.wikipedia.org/wiki/Configuration_management it sounds like puppet/cfengine/chef is "computer hardware configuration management" and package replication is "software configuration management"?  But the link makes the software stuff seem more like what developers would use?


UFO Chicago -- Users of Free Operating Systems
Free Software Rules -- Proprietary Drools!
