Tuesday, July 11, 2006

A General-Purpose Nightly Update Script

This is the fourth (and last) in a series of posts about a simple nightly update system for OS X.

In previous posts I've described Apple's "softwareupdate" command, which keeps Apple-supplied software up-to-date, and a set of scripts that keep third-party packages up-to-date (at least if they're packaged in certain formats). These are necessary components of a nightly update system, but we still need more.

We'd also like to accommodate software that isn't packaged, such as scripts we've written locally or software we've installed from source. For this purpose I've traditionally kept a couple of local directory trees that are synchronized nightly with master copies living on a server.

In addition, we'd like to keep certain local configuration files in sync with master copies. For example, we might want to keep com.apple.loginwindow.plist the same on all machines, and be able to change this file on all machines from time to time.

But there are other tasks besides software updates and configuration file updates. To accommodate these, we'd like the ability to run one or more catchall scripts on a nightly basis. The scripts should be synchronized with master copies living on a server, and can be used to perform whatever tasks we find necessary on a given night.

To do all of this I've written the following script, called just "update", which is based on an analogous script we've been using for many years to maintain our Linux computers.

Script update
#!/usr/bin/perl

##############################################################################
# This script is run daily to update OS X computers running the standard
# configuration.
##############################################################################

use strict;

BEGIN {
    $PkgUpdate::noyaml = 0;
    unless (eval "use YAML; 1") {
        warn "YAML not found, falling back to simple config.\n";
        $PkgUpdate::noyaml = 1;
    }
}

use File::Temp qw/tempdir/;

my $configfile = "/etc/localconfig";
my $window = 4*60*60; # 4 hours

# Establish lock on update system:
if ( -f "/var/run/update" ) {
    chomp( my $OLDPID = `cat /var/run/update` );
    chomp( my $PROCESS = `ps -p $OLDPID | wc -l` );
    if ( $PROCESS > 1 ) {
        `echo "Update system is locked!" | /usr/bin/mail -s "Message from Update" root`;
        exit;
    } else {
        print "Removing stale lock file.\n";
        unlink ("/var/run/update");
    }
}
`echo $$ > /var/run/update`;


# Unless instant gratification is requested, wait for a random
# time, from 0 to 4 hours, to avoid hitting the server too hard
# all at once:
unless ( $ARGV[0] eq "now" ) {
    my $wait = int(rand($window));
    print "Sleeping for $wait seconds...\n";
    sleep($wait);
}

# Load local configuration:
my $config;
my %simpleconfig;
unless ($PkgUpdate::noyaml) {
    # LoadFile is not exported by "use YAML" by default, so call it by its full name.
    $config = YAML::LoadFile("$configfile.yml") or die "Cannot load config file $configfile.yml: $!\n";
} else {
    %simpleconfig = simpleconfig($configfile);
    $config = \%simpleconfig;
}

# Resync /common/manager:
if ( $config->{CACHE_COMMON} eq "yes" or
     $config->{CACHE_COMMON_MANAGER} eq "yes" ) {
    print "Resynchronizing /common/manager...\n";
    `chflags -R nouchg /common/manager`;
    print `rsync -aqz --delete $config->{UPDATEMASTER}::update-$config->{UPDATEVERSION}/common/manager/ /common/manager/`;
    `chflags -R uchg /common/manager`;
}

# Run pre-update script:
if ( -x "/common/manager/update.prescript" ) {
    print "Running pre-update script...\n";
    print `/common/manager/update.prescript`;
}

# Resync config files with master copies:
print "Resynchronizing config files...\n";
chomp( my $NOW = `date +%y%m%d%H%M%S` );
my $EXCL = "";
my $INCL = "";
-f "/etc/CACHE_CONFIG.exclude" && ($EXCL="--exclude-from=/etc/CACHE_CONFIG.exclude");
-f "/etc/CACHE_CONFIG.include" && ($INCL="--include-from=/etc/CACHE_CONFIG.include");
-d "/etc/oldconfigs" || mkdir("/etc/oldconfigs");
mkdir ("/etc/oldconfigs/$NOW");
print `rsync -aqz -b --backup-dir="/etc/oldconfigs/$NOW" -u $EXCL $INCL $config->{UPDATEMASTER}::update-$config->{UPDATEVERSION}/config/ /`;
# If oldconfig directory is empty, remove it:
rmdir ("/etc/oldconfigs/$NOW");

# Resync with master images, if requested:
if ( $config->{CACHE_COMMON} eq "yes" ) {
    print "Resynchronizing /common...\n";
    `chflags -R nouchg /common`;
    print `rsync -aqz --delete $config->{UPDATEMASTER}::update-$config->{UPDATEVERSION}/common/ /common/`;
    `chflags -R uchg /common`;
}
if ( $config->{CACHE_LOCAL} eq "yes" ) {
    print "Resynchronizing /local...\n";
    my $excl = "";
    my $incl = "";
    -f "/local/etc/CACHE_LOCAL.exclude" && ($excl = "--exclude-from=/local/etc/CACHE_LOCAL.exclude");
    -f "/local/etc/CACHE_LOCAL.include" && ($incl = "--include-from=/local/etc/CACHE_LOCAL.include");
    print `rsync -aqz -u $excl $incl $config->{UPDATEMASTER}::update-$config->{UPDATEVERSION}/local/ /local/`;
}

# Do Apple softwareupdate:
print "Doing Apple Softwareupdate...\n";
print `softwareupdate --install --all`;

# Do local pkg updates:
print "Doing Local Package Updates...\n";
print `/common/manager/pkgupdate /common/manager/update.pkglist`;

# Run post-update script:
if ( -x "/common/manager/update.postscript" ) {
    print "Running post-update script...\n";
    print `/common/manager/update.postscript`;
}

# Print time stamp and unlock update system:
my $now = localtime();
print "Update completed at $now\n";
unlink ("/var/run/update");

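sub simpleconfig {
    my $file = shift;

    my %config = ();
    open (my $fh, "<", $file) or die "Cannot open simpleconfig file $file: $!\n";
    while (<$fh>) {
        next if /^\s*#/;    # skip comment lines
        # "KEY: value", trimming whitespace around key and value.
        # Lines that don't match (e.g. blank lines) are skipped, so a failed
        # match can't reuse $1 and $2 from an earlier line.
        next unless /^\s*([^:]+?)\s*:\s*(.*?)\s*$/;
        $config{$1} = $2;
    }
    close ($fh);
    return %config;
}
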
The local directory trees mentioned above are /common and /local. There are two of them for a couple of reasons: one historical and probably no longer valid, and another still relevant. The historical reason is that Once Upon A Time we couldn't count on the local machine having enough disk space to accommodate all of the files we wanted to put into local directory trees, so we split the original "/common" and put some of the files (the ones that would be accessed most often) into "/local". We could then NFS-mount /common on machines with small disks.

This arrangement proved useful for another reason, even after disk sizes far outstripped our needs. It allowed us to distribute management responsibilities, where appropriate. For example, a user's search path now looks something like this:
/local/bin:/common/bin:/usr/bin:/bin
If a computer has a local administrator, he or she can effectively replace any software in /common/bin by installing a file with the same name in /local/bin. More about this later.
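The override works because the shell runs the first match it finds while scanning the search path from left to right. A minimal Perl sketch of that lookup makes the shadowing concrete; the function name find_in_path is hypothetical, and the sketch simply skips empty path entries, which a real shell would treat as the current directory.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first executable named $cmd found in the
# colon-separated $path, searching left to right as the shell does.
sub find_in_path {
    my ($cmd, $path) = @_;
    for my $dir (split /:/, $path) {
        # Empty entries (the current directory, to a shell) are skipped here.
        next unless length $dir;
        my $candidate = "$dir/$cmd";
        # -x _ reuses the stat() result from the -f test.
        return $candidate if -f $candidate && -x _;
    }
    return;    # not found: undef
}
```

With the path /local/bin:/common/bin:/usr/bin:/bin, a tool present in both /local/bin and /common/bin resolves to the /local/bin copy, while one present only in /common/bin is still found there.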

The "update" script begins by checking whether an older update process is still running: it reads the process ID recorded in the lock file /var/run/update and asks ps whether that process still exists. In the past, we've sometimes seen network problems that cause update processes to pile up on a machine, and this locking step was implemented to avoid that problem. If a valid lock is found, the process e-mails the local "root" account (which is just an alias in /etc/aliases, pointing to a central e-mail address for receiving such reports) and exits. Otherwise, the update script removes any stale lock file, establishes its own lock and proceeds.
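The script's lock check shells out to ps and wc. The same test can be done in plain Perl with kill and signal 0, which asks whether a signal could be delivered to a process without actually sending one. This is a sketch under the assumption that the lock file holds a single PID; lock_is_held and take_lock are hypothetical names, not part of the script above.

```perl
use strict;
use warnings;
use Errno qw(EPERM);

# Hypothetical helper: true if the PID stored in $pidfile is a live process.
sub lock_is_held {
    my ($pidfile) = @_;
    open(my $fh, '<', $pidfile) or return 0;    # no lock file: not locked
    my $line = <$fh>;
    close($fh);
    # An empty or garbled lock file is treated as stale.
    return 0 unless defined $line && $line =~ /^\s*(\d+)\s*$/;
    my $pid = $1;
    return 0 unless $pid > 0;
    # Signal 0 only tests for existence. EPERM means the process exists
    # but belongs to another user, so the lock is still held.
    return 1 if kill(0, $pid);
    return $! == EPERM ? 1 : 0;
}

# Hypothetical helper: record our own PID as the lock holder.
sub take_lock {
    my ($pidfile) = @_;
    open(my $fh, '>', $pidfile) or return 0;
    print {$fh} "$$\n";
    return close($fh);
}
```

Like the original, this check-then-write sequence is not atomic, so two updates started at the same instant could in principle both proceed; for a nightly job started by cron that is rarely a concern.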

At this point, the script normally waits for a random time between 0 and 4 hours in order to spread out the load on the servers that supply update information. This wait can be skipped by supplying the command-line argument "now".
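The delay logic can be pulled out into a small function. This sketch (startup_delay is a hypothetical name) mirrors the script: a first argument of "now" disables the wait, and otherwise the delay is drawn uniformly from 0 up to, but not including, the window. Spread over a 4-hour (240-minute) window, 240 machines would start on average about one update per minute.

```perl
use strict;
use warnings;

# Hypothetical helper: seconds to wait before starting the update.
# Returns 0 when the first argument is "now", as in the script.
sub startup_delay {
    my ($window, @args) = @_;
    return 0 if @args && $args[0] eq 'now';
    # rand($window) is a float in [0, $window); int() truncates it,
    # giving a whole number of seconds from 0 to $window - 1.
    return int(rand($window));
}
```

In the script this would be used as sleep(startup_delay(4*60*60, @ARGV)).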

The update script itself, and other associated scripts, live in the directory /common/manager. Before updating anything else, the update script first resynchronizes this directory with a master copy living on a central server. This ensures that changes to any of the update configuration files or subsidiary scripts will be made before the rest of the update procedure is executed. (Note that changes to the update script itself won't take effect until the following update cycle, because the copy that downloads the new version is the one already running. This could be avoided by re-exec'ing /common/manager/update with an appropriate flag after /common/manager has been updated, but we've never found that necessary.)
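The re-exec alternative mentioned above could look roughly like this sketch. The --resynced marker argument and the function name are hypothetical. Because exec replaces the program in the same process, the PID written to the lock file still belongs to the running update; the re-executed copy would therefore need to skip the lock check (which would otherwise find its own PID and report the system as locked) as well as the random sleep.

```perl
use strict;
use warnings;

use constant RESYNC_FLAG => '--resynced';    # hypothetical marker argument

# Return the command that re-runs the freshly synchronized script, or an
# empty list if this process is already the re-executed copy.
sub reexec_command {
    my ($script, @args) = @_;
    return () if grep { $_ eq RESYNC_FLAG } @args;
    # $^X is the path of the running perl interpreter. The marker goes last
    # so an existing first argument such as "now" stays in $ARGV[0].
    return ($^X, $script, @args, RESYNC_FLAG);
}

# Usage, right after /common/manager has been resynchronized:
#   if (my @cmd = reexec_command('/common/manager/update', @ARGV)) {
#       exec { $cmd[0] } @cmd;
#       die "exec failed: $!\n";
#   }
```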

Between updates, all files in the /common filesystem are kept in an immutable state. This prevents accidental (or trivially malicious) modification of these files, and it reinforces the idea that /common should always be an exact copy of a master image stored on a central server. Any local changes should be made in /local instead. Before synchronizing /common/manager (or, later in the script, /common itself), the immutable flag is temporarily lifted, and it is restored after the synchronization is complete. Note that under OS X we're using the "uchg" flag instead of "schg", since we need to turn immutability on and off.
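The unlock, synchronize, relock sequence could be factored into one helper. In this sketch (resync_immutable and its $run parameter are hypothetical), commands are passed to system as lists so no shell is involved, the rsync result is returned to the caller, and the uchg flag is restored even if rsync fails, so the tree is not left writable. The $run callback exists only so the sequence can be exercised without chflags or rsync installed.

```perl
use strict;
use warnings;

# Clear uchg on $dest, rsync $src over it, then set uchg again.
# Returns true if rsync succeeded.
sub resync_immutable {
    my ($src, $dest, $run) = @_;
    # Default runner: list-form system(), true on exit status 0.
    $run ||= sub { system(@_) == 0 };
    $run->('chflags', '-R', 'nouchg', $dest)
        or warn "chflags nouchg failed on $dest\n";
    my $ok = $run->('rsync', '-aqz', '--delete', $src, $dest);
    warn "rsync into $dest failed\n" unless $ok;
    # Re-lock unconditionally.
    $run->('chflags', '-R', 'uchg', $dest)
        or warn "chflags uchg failed on $dest\n";
    return $ok;
}
```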

The actual synchronization is done with rsync. For /common/manager, the script constructs an rsync source address of the form:
$UPDATEMASTER::update-$UPDATEVERSION/common/manager/
and synchronizes that with the destination /common/manager/. The parameters UPDATEMASTER and UPDATEVERSION are drawn from a local configuration file: /etc/localconfig.yml if the YAML Perl module is available, or the simpler "KEY: value" file /etc/localconfig otherwise.
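Building this address is simple, but it hides a Perl trap: inside a double-quoted string, "$master::update" is read as the package variable $master::update, not as $master followed by two colons. The script avoids the trap because each interpolation starts with a hash lookup ($config->{UPDATEMASTER}); with plain scalars, braces are needed. The double colon itself tells rsync to talk to an rsync daemon on the host and use the named module, here update-$UPDATEVERSION. The function name rsync_source and the sample host below are hypothetical.

```perl
use strict;
use warnings;

# Build "HOST::update-VERSION/PATH/" from the local configuration.
sub rsync_source {
    my ($config, $path) = @_;
    for my $key (qw(UPDATEMASTER UPDATEVERSION)) {
        die "missing $key in local configuration\n"
            unless defined $config->{$key} && length $config->{$key};
    }
    my $master  = $config->{UPDATEMASTER};
    my $version = $config->{UPDATEVERSION};
    # Normalize the path: no leading slash, no trailing slash.
    $path =~ s{^/+}{};
    $path =~ s{/+\z}{};
    # ${master} needs braces: "$master::update" would name $master::update.
    return "${master}::update-${version}/$path/";
}
```

For example, with UPDATEMASTER set to updates.example.com and UPDATEVERSION set to 10.4, rsync_source($config, 'common/manager') gives updates.example.com::update-10.4/common/manager/.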

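The local configuration flags CACHE_COMMON and CACHE_COMMON_MANAGER allow a local administrator to turn off synchronization of all of /common, or of everything in /common except /common/manager: with CACHE_COMMON set to something other than "yes", /common/manager is still synchronized as long as CACHE_COMMON_MANAGER is "yes".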
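The way the flags are meant to combine can be summarized in a few lines. This sketch (sync_plan is a hypothetical name) takes the parsed configuration as a hash reference and reports which trees one update run would resynchronize, including the CACHE_LOCAL flag that governs /local. For instance, a machine whose /etc/localconfig contains "CACHE_COMMON: no" and "CACHE_COMMON_MANAGER: yes" would keep /common/manager current and leave the rest of /common alone.

```perl
use strict;
use warnings;

# Hypothetical helper: which trees would one update run resynchronize,
# given the parsed local configuration (a hash reference)?
sub sync_plan {
    my ($c) = @_;
    # A flag counts as set only if its value is exactly "yes".
    my $yes = sub { defined $c->{ $_[0] } && $c->{ $_[0] } eq 'yes' };
    my $common = $yes->('CACHE_COMMON');
    return {
        # /common/manager follows either flag.
        common_manager => ($common || $yes->('CACHE_COMMON_MANAGER')) ? 1 : 0,
        common         => $common ? 1 : 0,
        local          => $yes->('CACHE_LOCAL') ? 1 : 0,
    };
}
```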

After /common/manager is updated, the script looks for an executable file called /common/manager/update.prescript. If this is found, it is executed. The update.prescript file can contain any commands deemed necessary to be done before the rest of the update process. Later on, after most of the update process is completed, the update script will also look for an update.postscript script. As a matter of local convention, we make additions to these scripts cumulative. In other words, whenever we add a task to one of the pre- or post-scripts, we add code to check to see if the task has already been done, and skip it if so. This allows us to bring up-to-date any hosts that have been offline for a while and missed some updates, without redoing tasks on machines that have already completed them.
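One way to make a pre- or post-script task cumulative is to record each completed task as a marker file and skip any task whose marker already exists. This is only a sketch: the marker directory /var/db/update-tasks, the run_once name, and the printer example are hypothetical, and checking the actual state of the system (for example, whether a file already has the desired contents) is often a sturdier test than a marker.

```perl
use strict;
use warnings;

# Hypothetical helper: run $task unless a marker file for $name exists.
# The marker is created only if $task returns true, so a task that fails
# is retried on the next nightly run.
sub run_once {
    my ($markerdir, $name, $task) = @_;
    unless (-d $markerdir) {
        mkdir($markerdir) or die "cannot create $markerdir: $!\n";
    }
    my $marker = "$markerdir/$name.done";
    return 'skipped' if -e $marker;
    $task->() or return 'failed';
    open(my $fh, '>', $marker) or die "cannot create $marker: $!\n";
    print {$fh} scalar(localtime), "\n";    # record when the task ran
    close($fh) or die "cannot write $marker: $!\n";
    return 'done';
}

# Hypothetical usage in update.postscript:
#   run_once('/var/db/update-tasks', 'remove-oldprinter',
#            sub { system('lpadmin', '-x', 'oldprinter') == 0 });
```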

After the pre-script has been run, the update script next resynchronizes selected local configuration files with master copies on a central server. This is another place where a local administrator can step in and control the update process, by selectively excluding some configuration files. The resynchronization is again done with rsync, which this time uses a source of the form
$UPDATEMASTER::update-$UPDATEVERSION/config/
This is synchronized with the destination "/", but this time the synchronization isn't exact:
  • Files present on the local machine, but not present in the master copy, are ignored.
  • Local files that are newer than their master copies are left alone (rsync's -u option).
  • Backup copies of any files changed during the resynchronization are saved into a date-stamped directory under /etc/oldconfigs.
  • The rsync process looks for include/exclude lists in the local files /etc/CACHE_CONFIG.include and /etc/CACHE_CONFIG.exclude.
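The options for this configuration pass can be assembled as a list and handed to system without going through a shell, which also sidesteps quoting problems. This sketch (config_rsync_args and its keys are hypothetical) builds the same rsync arguments as the script: -u, -b with a date-stamped --backup-dir, and include/exclude lists only when those files exist. It keeps the script's order, excludes before includes; since rsync acts on the first pattern that matches a name, an exclude pattern wins over an include pattern that matches the same file.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: build the rsync command for the config-file pass.
# Keys: source, backup_root, exclude, include, and optionally stamp.
sub config_rsync_args {
    my (%opt) = @_;
    # Same timestamp format as `date +%y%m%d%H%M%S` in the script.
    my $stamp = defined $opt{stamp} ? $opt{stamp}
                                    : strftime('%y%m%d%H%M%S', localtime);
    my @args = ('rsync', '-aqz',
                '-b', "--backup-dir=$opt{backup_root}/$stamp",
                '-u');    # keep local files that are newer than the master
    # Excludes first, then includes, as in the script.
    push @args, "--exclude-from=$opt{exclude}"
        if defined $opt{exclude} && -f $opt{exclude};
    push @args, "--include-from=$opt{include}"
        if defined $opt{include} && -f $opt{include};
    push @args, $opt{source}, '/';
    return @args;
}
```

The result would be run as system(config_rsync_args(...)) == 0 or warn "config resync failed\n", after creating the backup directory as the script does.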
The config file tree on the server might look something like this:
|-- Library
|   |-- LaunchDaemons
|   |   `-- org.zebedee.zebedee.plist
|   `-- Preferences
|       |-- SystemConfiguration
|       |   `-- com.apple.PowerManagement.plist
|       `-- com.apple.sharing.firewall.plist
`-- private
    `-- etc
        |-- bashrc
        |-- csh.cshrc
        |-- hostconfig
        `-- profile


Next, the rest of the /common tree is synchronized with the master copy. The synchronization is exact in this case, with any extra local files being deleted. As before, immutable flags are cleared before the resynchronization and re-instated afterward. The local configuration variable CACHE_COMMON allows the local administrator to selectively skip this part of the update process.

After /common is resynchronized, the update script resynchronizes /local (if CACHE_LOCAL is "yes"). This is a looser resynchronization, without the --delete flag, so that the local administrator can add extra things to the /local tree that aren't present in the master copy. The local administrator can also selectively exclude parts of the tree from synchronization by using the /local/etc/CACHE_LOCAL.exclude and /local/etc/CACHE_LOCAL.include files.

The update script next does an Apple softwareupdate, automatically installing all available appropriate updates. Once that's finished, it uses our pkgupdate script to update any locally-installed third-party packages. The list of tasks for pkgupdate is stored in /common/manager/update.pkglist.
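The script prints the output of each step but never looks at its exit status, so a failed softwareupdate or pkgupdate run is easy to miss in the nightly report. A sketch of a helper that runs a command without a shell, prints and returns its output, and warns on a nonzero exit status; run_step is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: run one update step without a shell, print its
# output, and warn if it fails. Returns (exit_code, output).
sub run_step {
    my ($label, @cmd) = @_;
    print "$label...\n";
    my $pipe;
    unless (open($pipe, '-|', @cmd)) {
        warn "$label: cannot run $cmd[0]: $!\n";
        return (-1, '');
    }
    my $output = do { local $/; <$pipe> };    # slurp all output
    $output = '' unless defined $output;
    close($pipe);    # waits for the child and sets $?
    my $status = $?;
    # Killed by a signal: report 128 + signal number, as shells do.
    my $code = ($status & 127) ? 128 + ($status & 127) : $status >> 8;
    print $output;
    warn "$label: $cmd[0] exited with status $code\n" if $code;
    return ($code, $output);
}
```

In the update script the two steps would become run_step('Doing Apple Softwareupdate', 'softwareupdate', '--install', '--all') and run_step('Doing Local Package Updates', '/common/manager/pkgupdate', '/common/manager/update.pkglist').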

Finally, the catchall post-update script (if any) is run and the update lock is cleared.
