Perl Practicum: Know All the Angles

by Hal Pomeranz

Perl 5 is Coming!

Good news, Perl enthusiasts! As of this writing, Larry Wall has just announced the Alpha release of Perl 5. It passes all of the Perl 4 regression tests, but has no Configure script yet, and is only guaranteed to build on a Sun SPARC machine. New FTP sites are springing up daily, so consult comp.lang.perl for more details. A stable release of Perl 5 by Christmas? I guess we'll have to wait and see...

File Manipulation

File manipulation is a Perl fundamental. If you have been using the language for any period of time, then you are probably more than familiar with
     open(FILE, "< myfile") || die "Can't open `myfile'\n";
     while (<FILE>) { ... }
However, you may not have caught on to everything that can go inside those angle brackets. They're not just for file handles anymore.

Arguments as Filenames

First, we have the special file handle ARGV. When used in a loop like
     while (<ARGV>) {
          ...
     }
each element of the argument list, @ARGV, will be treated as a filename. Perl will attempt to open each file in the list, read the entire contents, and move on to the next file. You will get an error message if a file cannot be opened, but the loop will continue until all filenames are exhausted. If there are no arguments to the program (i.e., if @ARGV is empty), then the loop above will get lines from the standard input, as any good UNIX program should. By the way, since the idiom is so common, there is a shorthand notation, <>, which means the same thing.

Associated with the file handle ARGV is the scalar $ARGV, which contains the name of the file currently open. We can use this to write a simple-minded grep program:

     $pat = shift @ARGV;
     $many = @ARGV > 1;
     while (<>) {
          next unless /$pat/;
          print "$ARGV:" if $many;
          print;
     }
The program uses shift() to remove the first element, the pattern to search for, from the argument list. All other arguments are treated as file names. The name of the current file is printed before the matching lines if more than one file name is given on the command line (just like the UNIX grep program does).

The only rub with this whole business is that the special variable $., which holds the current input line number, is not reset as each new file is opened, so the count simply continues from one file to the next. If this is a problem, employ the following trickery:

     $oldname = '';
     while (<>) {
          if ($ARGV ne $oldname) {
               $lineno = 0;
               $oldname = $ARGV;
          }
          $lineno++;
          ...
     }
and use $lineno instead of the $. variable. You may ask yourself why we are messing around with $lineno and not just doing the assignment to $. instead. The reason is that it simply does not work: $. is only reset on an explicit close().
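Because an explicit close() does reset $., another option is to close ARGV yourself whenever the current file runs out. The following is a minimal sketch, assuming Perl 5 (it uses my); number_lines() is a hypothetical helper name, not part of the programs above:

```perl
# Sketch: per-file line numbers via an explicit close of ARGV.
# number_lines() is a hypothetical helper; it takes a list of file names
# and returns one "file:lineno:text" string per input line.
sub number_lines {
    local(@ARGV) = @_;          # <> reads whatever is in @ARGV
    my($line, @out);
    while (defined($line = <>)) {
        push(@out, join(':', $ARGV, $., $line));
        close(ARGV) if eof;     # end of this file: closing resets $.
    }
    return @out;
}
```

Called as &number_lines("a.txt", "b.txt") (hypothetical file names), the numbering restarts at 1 for each file. Note that with an empty argument list the loop would read the standard input, just as <> always does.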

Indirect File Handles

A scalar variable inside angle brackets, <$file>, indicates an indirect file handle. Perl attempts to read the next line from the file handle whose name is the string value of $file. For example:
     open(FILE, "/etc/motd") || die "Can't open /etc/motd\n";
     $file = 'FILE';
     $line1 = <FILE>;
     $line2 = <$file>;
Why is this at all useful? First, it allows us to pass file handles to subroutines in a reasonable fashion:
     open(FILE, "/etc/motd") || die "Can't open /etc/motd\n";
     &mysub(FILE);
     sub mysub {
          local($file) = @_;
          while (<$file>) {
               ...
          }
     }
Second, the string contained in the variable need not be a valid identifier. It could even, for example, be the name of the file that we are opening:
     for (0..8) {
          $file = "/var/adm/messages.$_";
          open($file, "$file") || die "Can't open $file\n";
     }
and then we could later do something like:
     &do_something_with ("/var/adm/messages.0");

     sub do_something_with {
          local($file) = @_;
          while (<$file>) {
               ...
          }
     }
This can be a big win as far as readability goes, e.g., if you have lots of open file handles running around in your program.

Third, we can build up lists and arrays of indirect file handles:

     @myfiles = 0..7;
     for (@myfiles) {
          open($myfiles[$_], "syslog.$_") || die "Can't open syslog.$_\n";
     }
Note that although we can use an array element as the file handle argument in the open() call above, only a simple scalar variable may appear inside angle brackets to denote an indirect file handle. Thus, we must first copy the value out of @myfiles into a scalar before using it:
     $file = $myfiles[3];
     $line = <$file>;
     print "$line";
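Putting these pieces together, here is a small sketch that opens a list of files under their own names and then reads the first line of each through a scalar. It assumes Perl 5; first_lines() and its variable names are hypothetical, and the no strict 'refs' line is there only in case the surrounding program enables strict:

```perl
# Sketch: a list of indirect file handles, each named after its file.
# first_lines() is a hypothetical helper returning the first line of each file.
sub first_lines {
    my(@names) = @_;
    my($name, $line, @lines);
    no strict 'refs';           # handles are looked up by their string names

    # Open every file, using the file name itself as the handle name.
    foreach $name (@names) {
        open($name, "< $name") || die "Can't open $name: $!\n";
    }

    # Angle brackets accept only a simple scalar, so go through $name.
    foreach $name (@names) {
        $line = <$name>;
        push(@lines, $line);
        close($name);
    }
    return @lines;
}
```

The two loops are kept separate on purpose, to show that every handle stays open and addressable by name until it is closed.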

Globbing

If a string inside angle brackets is not a file handle (direct or indirect), then it is passed to a subshell (the C shell if available, otherwise the Bourne shell) to be globbed. You can use the glob in a loop to get back the matching file names one at a time:
     while (<*.c>) {
          print "Checking out $_...\n";
          system("co -l $_");
     }
or you can slurp all the files into a list:
     chmod 0644, <*.c>;
However, don't think from the two examples above that the glob behaves just like a file handle, because it doesn't. This example
     $file1 = <*.c>;
     print "$file1\n";
     $file2 = <*.c>;
     print "$file2\n";
prints the same filename twice (and spawns a subshell twice as well), rather than printing the first and second matching filenames. Read on, though, if you care to see some real tragedy.
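If what you actually want is the first and second matching names, one way is to evaluate the glob once in list context and pick the elements off the result. A minimal sketch, assuming Perl 5; first_two() is a hypothetical helper, and $dir is assumed to contain no whitespace or shell metacharacters:

```perl
# Sketch: take the first two matches from a single glob.
# first_two() is a hypothetical helper; $dir is the directory to search.
sub first_two {
    my($dir) = @_;
    # List context: the glob hands back every match at once, and the
    # list assignment keeps only the first two of them.
    my($first, $second) = <$dir/*.c>;
    return ($first, $second);
}
```

For example, ($file1, $file2) = &first_two("."); leaves $file2 undefined if fewer than two files match.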

One layer of variable interpolation will be done before the glob, but you can't say <$glob> because that's an indirect file handle. You have to throw curly braces around your variable name to force interpolation:

     $glob = "*.c";
     @c_files = <${glob}>;
For those of you who weren't paying attention, we have just illuminated one part of the seamy underbelly of Perl: a place where $glob and ${glob} do not mean the same thing.

Since Perl has to fork() and exec() a shell to glob the files, rather than relying on some built-in globbing function, it is almost always more efficient (in terms of run time, though perhaps not in terms of readability or amount of code) to use the built-in directory operators:

     opendir(DIR, ".") || die "Can't open directory `.'\n";
     @c_files = grep(/\.c$/, readdir(DIR));
     closedir(DIR);
Note that the glob will always return the file names in alphabetical order while the above code won't (although you're always free to sort() the list of files you get from the above method).
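For completeness, the sorted variant might look like the sketch below. It assumes Perl 5; c_files() is a hypothetical helper name, and the default sort() order is plain string comparison:

```perl
# Sketch: readdir()-based replacement for <*.c> that returns sorted names.
# c_files() is a hypothetical helper; $dir is the directory to scan.
sub c_files {
    my($dir) = @_;
    opendir(DIR, $dir) || die "Can't open directory $dir: $!\n";
    my @files = grep(/\.c$/, readdir(DIR));   # keep only names ending in .c
    closedir(DIR);
    @files = sort(@files);                    # readdir() order is arbitrary
    return @files;
}
```

Unlike the glob, readdir() returns bare names with no directory prefix, so prepend "$dir/" yourself if you need full paths.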

Summary

The <> idiom is a useful one and should be part of every Perl programmer's toolkit. Indirect file handles and shell globs are used less frequently, but often to good effect in improving your code's clarity and readability. Indirect file handles in particular can also be used to needlessly obfuscate your code. So remember, as you try to cloud the minds of lesser mortals, that in this life one sometimes needs to maintain one's own code.


Reproduced from ;login: Vol. 18 No. 5, October 1993.

