Perl Practicum: The Devil in the Details

by Hal Pomeranz

A Modest Proposal

We usually think of UNIX tools using command-line switches rather than configuration files. Sometimes, however, the number of configuration options is so large or the amount of configuration information is so daunting (e.g., sendmail or inetd) that a configuration file is required. Configuration files also give users a relatively easy interface for customization and enable the program to adapt over time without destabilizing the application's actual code.

Lately, I have been writing a lot of applications that cry out for the use of configuration files, and I have found that a particularly flexible paradigm is to write config files in Perl syntax and then eval() them. There is no need to write a new file-parsing routine for each application, and the configuration files have direct access to data in the program's environment. The downside, of course, is that it is harder for users (particularly nontechnical users) to configure the application. Still, if you write tools for developers (as I generally do), this is an extremely powerful idea.

Pulling in Files

Generally, there are three ways to bring such configuration files into a program. First, there's the "open() the file and slurp" method:
     open(FILE, $file) || die "Failed to open $file\n";
     @lines = <FILE>;
     close(FILE);
     eval("@lines");
     die "Failed to eval() file $file:\n$@\n" if ($@);
The variable $@ is set only if the preceding eval() failed, either because of a syntax error or because the evaluated code died at run time. Configuration files tend to be small and RAM tends to be large, so there is little danger of @lines using too much memory. Note that $file must be either the configuration file's full path name or the name of a file in the program's current working directory.
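
As a minimal sketch of the slurp-and-eval approach, the following script writes a throwaway configuration file and then loads it. The helper name slurp_config() and the temporary file are illustrative assumptions, not part of the original program; the sketch also reads the file into a single string rather than an array.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

our ($VENDOR, $OS);   # package variables the config file is expected to set

# Create a throwaway config file so the example is self-contained.
my ($fh, $file) = tempfile(UNLINK => 1);
print $fh qq{\$VENDOR = "Sun";\n\$OS = "Solaris";\n};
close($fh) or die "Failed to write $file: $!\n";

# Hypothetical helper: read the whole file into one string and eval() it.
sub slurp_config {
    my ($path) = @_;
    open(my $in, '<', $path) or die "Failed to open $path: $!\n";
    my $code = do { local $/; <$in> };   # with $/ undefined, <$in> reads to EOF
    close($in);
    eval $code;
    die "Failed to eval() file $path:\n$@\n" if $@;
    return 1;
}

slurp_config($file);
print "$VENDOR $OS\n";
```

Because the string eval() runs in the scope of slurp_config(), the configuration code sees the variables declared with our at the top of the script.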

A second option, then, is to use do $file. The do construct searches the standard Perl "include" path stored in @INC. The problem is that do won't trap syntax errors in the configuration file. Instead write:

     $result = do $file;
     die "Probable syntax error in $file\n" unless ($result);
and make sure to end $file with a statement that returns a true (nonzero) value, as is typical for Perl library files (usually the last line of such files is simply "1;"). If it hits a syntax error, do stops evaluating, causing $result to be undef.
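
The return-value check can be made more informative. The perlfunc documentation for current Perl releases describes do as returning undef and setting $@ when the file cannot be compiled, and returning undef and setting $! when the file cannot be read; older releases may have behaved differently. A sketch under that assumption, with the hypothetical helpers temp_config() and load_with_do():

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

our $OS;   # set by the config file

# Hypothetical helper: write $text to a temporary file, return its path.
sub temp_config {
    my ($text) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);
    print $fh $text;
    close($fh) or die "Failed to write $path: $!\n";
    return $path;
}

# Hypothetical helper: run the file with do and report what happened.
sub load_with_do {
    my ($path) = @_;
    my $result = do $path;
    return "ok" if $result;
    return "could not parse: $@" if $@;
    return "could not read: $!" unless defined $result;
    return "did not return a true value";
}

my $good = temp_config(qq{\$OS = "Solaris";\n1;\n});
my $bad  = temp_config(qq{\$OS = ;\n1;\n});      # deliberate syntax error

print "good: ", load_with_do($good), " (OS is $OS)\n";
print "bad:  ", load_with_do($bad), "\n";
```

The temporary files are named by full path here, so @INC is not searched.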

Rather than testing the return value by hand, another option is to use require, which searches through @INC just like do but also raises a fatal error if the file contains a syntax error. (Note that require also raises a fatal error if the file does not return a true value, so the trailing "1;" is still needed.) These errors can be trapped with eval() like this:

     eval('require("$file")');
     die "*** Failed to eval() file $file:\n$@\n" if ($@);
The only difficulty with require is that it will include a given file only once. This may not seem like an issue at first, but suppose the application is a daemon that is supposed to re-read its configuration file when it receives a HUP signal.

It turns out that require keeps track of which files the program has read with the %INC hash. The keys to the hash are the arguments given to require, and the values are the full pathnames to the file as found by searching @INC. Using this information we can write this simple function:

     sub acquire {
          my($file) = @_;
          delete($INC{$file});
          eval('require("$file")');
          die "*** Failed to eval() file $file:\n$@\n" if ($@);
     }
This gives us all the benefits of require and still enables us to reread the same configuration file.
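
To illustrate the re-read behavior, the sketch below loads a temporary configuration file, rewrites it, and loads it again with acquire(). The helper write_config() and the variable $LOGLEVEL are assumptions made for the example. A daemon would call acquire() in response to HUP rather than twice in a row.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

our $LOGLEVEL;   # hypothetical setting stored in the config file

sub acquire {
    my ($file) = @_;
    delete($INC{$file});            # make require forget the file
    eval('require("$file")');
    die "*** Failed to eval() file $file:\n$@\n" if ($@);
}

# Hypothetical helper: (re)write the config file with a given level.
sub write_config {
    my ($path, $level) = @_;
    open(my $out, '>', $path) or die "Failed to write $path: $!\n";
    print $out "\$LOGLEVEL = $level;\n1;\n";
    close($out) or die "Failed to close $path: $!\n";
}

my ($fh, $file) = tempfile(UNLINK => 1);
close($fh);

write_config($file, 1);
acquire($file);
my $first = $LOGLEVEL;

write_config($file, 5);             # the configuration changes on disk
acquire($file);                     # a plain require would skip this re-read
my $second = $LOGLEVEL;

print "first load: $first, after reload: $second\n";
```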

But What Good Is It?

All right, we can safely read in configuration files written as Perl code, but what exactly does this buy us? In one of my early columns, I talked about writing portable Perl scripts that read in a configuration file that contained machine-specific configuration information. For example, consider an /etc/OSinfo file on all machines that contained information like:
     $VENDOR = "Sun";
     $HARDWARE = "Sparc";
     $OS = "Solaris";
     $VERSION = "2.5";
     $HOSTNAME = `/usr/bin/uname -n`;
     $PSCMD = "/usr/bin/ps -ef";
     $MAILER = "/usr/bin/mailx -s";
On Berkeley-based systems, $PSCMD might be ps -aux, and $HOSTNAME might be set by calling hostname instead of uname. All of a site's administration scripts could simply use the acquire() function to suck in all this configuration information. Assuming the scripts used the variables set in the configuration file, they would be completely portable across every machine on the network.

Configuration files can, of course, contain more than simple scalar variables. For example, I wrote myself a little program that splits my mailbox up into smaller files based on who the email comes from. The program reads a configuration file which defines a hash like:
     %File = ("firewalls-owner" => "firewalls",
             "owner-namedroppers" => "dns",
             "bind-" => "dns",
             "socks-owner" => "socks",
             "owner-www-security" => "wwwsec",
             "owner-best-of-security" => "bos",
             "bosslug-owner" => "bosslug",
             "owner-solaris-x86" => "x86",);
The keys of the above hash are all From addresses of various mailing lists I subscribe to. If the From address of a given message matches one of the keys in the hash, then the message is deposited in the file whose name is $File{$key} (if the From address doesn't match any key, then the message goes into a default file). This program is very useful when I've been out of the office for several days and I want to ignore all the mailing list traffic I usually get and just concentrate on mail sent by individuals.
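
The lookup itself might be sketched as follows. The function name folder_for(), the default file name, the example.com addresses, and the choice of case-insensitive substring matching are all assumptions; the original program's matching rule is not shown here.

```perl
use strict;
use warnings;

# A shortened copy of the hash a configuration file would define.
my %File = (
    "firewalls-owner"    => "firewalls",
    "owner-namedroppers" => "dns",
    "bind-"              => "dns",
);
my $DEFAULT = "personal";   # hypothetical default file name

# Return the file for a From address: the value of the first key
# (in sorted order) found anywhere in the address, else the default.
sub folder_for {
    my ($from) = @_;
    for my $key (sort keys %File) {
        return $File{$key} if index(lc $from, lc $key) >= 0;
    }
    return $DEFAULT;
}

for my $from ('owner-namedroppers@example.com',
              'bind-workers@example.com',
              'alice@example.com') {
    print "$from -> ", folder_for($from), "\n";
}
```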

It is even possible to define subroutines in the configuration files. Because the eval() statements happen at runtime, function definitions in the config file will always override declarations with the same name in your program. For example, a program like this:

     eval('require("$file")');
     die "*** Failed to eval() file $file:\n$@\n" if ($@);

     sub printer {
          print "In Perl prog\n";
     }

     printer();
with $file that contains this:
     sub printer {
          print "In required file\n";
     }
will print In required file\n when the program invokes printer(). My PLOD program uses this idea to enable users to replace a standard (but weak) encryption routine with a stronger routine of their own devising.

Generally, avoid using symbols in a configuration file that will clash with variable and function names in the program. One simple solution is to use all uppercase symbols in configuration files and all lowercase symbols in programs. Symbols in your configuration file could also be prefixed with some standardized string (such as the name of the configuration file itself). Alternatively, use package in the configuration file to push all symbols into a protected namespace.
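
The package approach can be sketched like this. The package name OSinfo and the temporary file are assumptions made for the example; the program and the configuration file each define $VENDOR and printer() without disturbing one another.

```perl
use strict;
use warnings;
no warnings 'once';   # $OSinfo::VENDOR is only defined once the config loads
use File::Temp qw(tempfile);

our $VENDOR = "program default";
sub printer { return "In Perl prog" }

# A throwaway config file that keeps its symbols in its own package.
my ($fh, $file) = tempfile(UNLINK => 1);
print $fh <<'END_CONFIG';
package OSinfo;
$VENDOR = "Sun";
sub printer { return "In required file" }
1;
END_CONFIG
close($fh) or die "Failed to write $file: $!\n";

eval('require("$file")');
die "*** Failed to eval() file $file:\n$@\n" if ($@);

# The program's own symbols are untouched; the config's symbols are
# reached through the package name.
print "$VENDOR / $OSinfo::VENDOR\n";
print printer(), " / ", OSinfo::printer(), "\n";
```

The cost is that the program must spell out the package name wherever it uses a configuration value.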

Hybrid Files

Sometimes it is undesirable to force users to write a full-blown Perl script just to configure the latest tool. Instead, consider using a hybrid-type file that has easy-to-parse fields, some of which might be Perl expressions. For example, we could rewrite our /etc/OSinfo file as

     # Sun specific:
     VENDOR Sun
     HARDWARE Sparc
     OS Solaris
     VERSION 2.5
     HOSTNAME `/usr/bin/uname -n`

     PSCMD "/usr/bin/ps -ef"
     MAILER "/usr/bin/mailx -s"
and then read in the file with
     open(FILE, "/etc/OSinfo") || die "Failed to read config\n";
     while (<FILE>) {
          next if (/^(#.*|\s*)$/);
          ($key, $val) = split(/\s+/, $_, 2);
          $Config{$key} = eval($val);
          die "Error on line $.:\n$@\n" if ($@);
     }
     close(FILE);
Note that this code skips comments (lines beginning with a "#") and blank lines (lines that contain only white space). It uses the three-argument form of split() so that the line is broken into two pieces. This kind of hybrid file can give the best of both worlds: easy, yet extremely flexible and powerful configuration.
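
One caveat: under use strict, eval() rejects a bare word such as Sun. A possible variation, sketched below, evaluates only values that begin with a quote or backtick and takes everything else literally. The function name parse_hybrid() and that quoting rule are assumptions, not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical parser: returns a hash built from hybrid config text.
sub parse_hybrid {
    my ($text) = @_;
    my %config;
    open(my $in, '<', \$text) or die "Failed to read config text: $!\n";
    while (my $line = <$in>) {
        next if $line =~ /^\s*(#.*)?$/;          # skip comments and blank lines
        chomp $line;
        my ($key, $val) = split(' ', $line, 2);  # key, then the rest of the line
        if (defined $val && $val =~ /^["'`]/) {  # looks like a Perl expression
            $val = eval $val;
            die "Error on line $.:\n$@\n" if $@;
        }
        $config{$key} = $val;
    }
    close($in);
    return %config;
}

my %Config = parse_hybrid(<<'END_CONFIG');
# Sun specific:
VENDOR Sun
VERSION 2.5

PSCMD "/usr/bin/ps -ef"
END_CONFIG

print "$Config{VENDOR} $Config{VERSION}: $Config{PSCMD}\n";
```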

Wrapping Up

Although these kinds of configuration files can be extremely powerful, they can also be a living nightmare for users. Require users to configure only useful parameters, and don't bury them under a huge number of options. Choose sensible defaults that will work properly in the normal case. Provide users with a variety of well-documented, preconfigured files that they can copy and modify to suit their particular needs.


Reproduced from ;login: Vol. 21 No. 3, June 1996.


12/4/96ah