It is often tempting to invoke an existing tool directly, via open() or with backticks, rather than trying to write the same function in Perl.
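For instance, both idioms look something like this minimal sketch (the choice of who and uptime is purely illustrative):

    # Lean on an existing tool: read its output via a pipe, or use backticks
    open(WHO, "who |") || die "Can't run who\n";
    @users = <WHO>;        # one line per logged-in user
    close(WHO);
    $uptime = `uptime`;    # capture the command's output as a single string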
There are real reasons, however, why this may be suboptimal.
First, there's the efficiency bogeyman. Certainly there is a great deal of overhead in setting up another process for execution. Of course, as the size of the data set you are processing grows, this overhead may become insignificant. As with all optimizations, you should experiment: try different solutions on real data sets and see which approach is most efficient.
A more telling argument in favor of avoiding vendor-provided tools is portability. If you have ever maintained software across multiple platforms, then you know how difficult it can sometimes be to find the utility you need. Where does the find command live on all of your systems: /bin, /usr/bin, /usr/ucb, some other evil hidden location? Does it accept the same set of options on all of your platforms? Multiply these issues by the number of different UNIX utilities that your Perl programs would like to use, and suddenly the cost of reimplementation appears much lower. If you maintain the same revision of Perl across all platforms, you have a consistent basis to work from. As further incentive, this installment will demonstrate some simple methods for emulating various UNIX utilities in Perl.
For manipulating files and directories, Perl provides the chown(), chmod(), mkdir(), and rmdir() functions. There's also link() and symlink() for creating hard and symbolic links respectively, as well as unlink() for removing files, and even a rename() function to partially emulate mv.
The chown(), chmod(), and unlink() functions will accept a list of files to operate on and return the number of successes. Sometimes, however, you want to know exactly which operations failed. In this case, loop over the individual elements of your list of files:

    for (@files) {
        chmod(0644, $_) || warn "Can't change permissions on $_\n";
    }

A similar strategy can be used for those file operations which do not operate on lists.
Note that if your operating system does not support one of the above function calls, you will encounter various failure modes (some more graceful than others). For example, if symbolic links aren't supported on your system, then the symlink() call will cause your program to die at runtime with a fatal error. As with any function you are unsure of, you should wrap the call in an eval() to trap possible errors:

    eval 'symlink($old, $new);';
    warn "Symlink not supported\n" if ($@);

The $@ variable is guaranteed to be the null string if the eval() succeeds, so this is a reliable test.
Sometimes Perl will simply invoke the appropriate operating system
tool if a function is not provided as a library call: the
mkdir()
function is the classic example of this. In this
case, it is probably more efficient to call the program once yourself
with a list of directories, rather than spawning a process for each
individual directory you wish to create.
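For example, assuming mkdir lives in /bin on your system (the path is an assumption), something like this creates a whole list of directories with a single spawned process:

    @dirs = ('tmp', 'tmp/new', 'tmp/old');
    system('/bin/mkdir', @dirs) && warn "Can't create all of @dirs\n";

Since system() returns the command's exit status, a true (non-zero) value here means at least one directory could not be created.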
Perl's split() and substr() functions emulate cut very closely.
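As a minimal sketch, here is a rough equivalent of cut -d: -f1,5 applied to /etc/passwd (the field positions assume the standard passwd layout):

    open(PASSWD, "< /etc/passwd") || die "Can't open /etc/passwd\n";
    while (<PASSWD>) {
        chop;                              # remove the trailing newline
        @fields = split(/:/);              # split on the colon delimiter
        print "$fields[0]:$fields[4]\n";   # login name and GECOS field
    }
    close(PASSWD);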
Perl also has a builtin sort() function that is much more powerful than the UNIX sort utility, but you have to define your own comparison routines to do really tricky sorts. (See the first Perl Practicum for more information on devious sorting.) Sometimes, though, you have to change your thinking a bit to get Perl to do what you want.
For example, programmers often like to use basename and dirname to get the name the current program was called by (for error messages) and the directory it was called from. Perl stores the full invocation pathname of the program in the variable $0, and basename and dirname can be emulated with appropriate substitutions:

    ($basename = $0) =~ s%.*/%%;
    ($dirname = $0) =~ s%/[^/]*$%%;

The first substitution takes advantage of Perl's greedy pattern matching algorithm to eat up everything up to the last `/' in the pathname and throw it away. If you're interested in both the directory and the file name, you can use the following one-liner:

    ($dirname, $basename) = $0 =~ /(.*)\/(.*)/;

Again, we're making use of the greedy pattern match, as well as the fact that a pattern match returns its parenthesized subexpressions as a list. The statement looks a little strange, but the precedence is correct.
Another common UNIX filter is uniq. Of course, you always have to sort your file before passing it to uniq, because the tool will only recognize consecutive matching lines. Not so with Perl:

    open(FILE, "< myfile") || die "Can't open myfile\n";
    while (<FILE>) {
        next if $seen{$_}++;
        # ...do some processing here...
    }
    close(FILE);

Note that memory usage can get quite high if the file is large and doesn't have a great deal of repetition. On the positive side, the %seen associative array ends up holding a count of the number of repetitions of each line, in case you care to emulate uniq -c. You can always run sort() on the unique lines in the file if you really want them sorted.
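For instance, once the loop above has finished, a rough uniq -c work-alike only needs to print the accumulated counts (shown here sorted by line, whereas the real uniq -c preserves input order):

    foreach $line (sort keys %seen) {
        printf "%7d %s", $seen{$line}, $line;   # $line still ends in a newline
    }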
The grep() function in Perl can be used to emulate UNIX grep:

    open(FILE, "< myfile") || die "Can't open myfile\n";
    @lines = <FILE>;
    close(FILE);
    @found = grep(/$pattern/, @lines);

This, however, can be rather memory intensive for large files. Instead, simply operate sequentially:

    open(FILE, "< myfile") || die "Can't open myfile\n";
    while (<FILE>) {
        next unless (/$pattern/);
        # ...process here...
    }
    close(FILE);

If you want a list of matching lines, rather than operating sequentially, just push() the matching lines onto a list in the processing section. At least you save having to slurp the entire file into memory.
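That suggestion might look like the following sketch, which collects the matching lines without ever holding the whole file in memory:

    open(FILE, "< myfile") || die "Can't open myfile\n";
    while (<FILE>) {
        push(@found, $_) if /$pattern/;   # keep only the matching lines
    }
    close(FILE);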
You use a package by first "requiring" it and then calling the functions it contains as you would any user-defined function. For example, the ctime.pl package provides a simple work-alike for the UNIX date command:

    require "ctime.pl";
    $date_str = &ctime(time);

Of course, you don't get the formatting string capabilities that some date commands provide, but you can always use localtime() and printf() to emulate this behavior.
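As a minimal sketch, here is a rough equivalent of date '+%Y-%m-%d %H:%M' (the format is purely illustrative):

    ($sec, $min, $hour, $mday, $mon, $year) = localtime(time);
    printf "%04d-%02d-%02d %02d:%02d\n",
        $year + 1900, $mon + 1, $mday, $hour, $min;   # year counts from 1900, month from 0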
Also in the easy-to-use category are the getcwd.pl and fastgetcwd.pl libraries, which help you find where you are in the directory tree. The function defined in fastgetcwd.pl is more efficient because it uses chdir() to traverse the path up to the root, but you might not be able to get back where you started from once you chdir() out. For those of you who like the $PWD variable under the C shell, there's the pwd.pl library. Simply call &initpwd() after requiring the library, and then use the package-defined &chdir() function instead of Perl's built-in chdir(). The &chdir() function will continuously update the $PWD environment variable.
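Put together, the idiom described above looks roughly like this (a sketch assuming the standard pwd.pl shipped with the Perl distribution):

    require "pwd.pl";
    &initpwd();        # seed $ENV{'PWD'} from the current directory
    &chdir("/tmp");    # the package's &chdir() keeps $PWD current
    print "Now in $ENV{'PWD'}\n";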
Far and away the most useful volume in the Perl library, though, is
find.pl.
The union of find
options across
all UNIX platforms throughout history is tremendous, but their
intersection is often minimal. Writing a find
command
that works on every platform at your site can be a study in
constraints. Actually, the find.pl
library really exists
to drive the find2perl
program provided with the Perl
distribution, but you can require find.pl
directly in a
program of your own devising. (The find2perl
program will
emit a complete Perl program which will exactly match the behavior of
the find
options fed to find2perl
on the
command line.)
The &find() function defined in the library accepts a list of file and directory names as arguments and will traverse all files and subdirectories just as the UNIX find command does. For each item in the traversal, &find() will call a user-defined subroutine &wanted. No arguments are passed to &wanted(), but $dir contains the pathname of the current directory, $_ the name of the item currently being considered, and $name the full pathname "$dir/$_". If $_ is a directory, the &wanted function can set the variable $prune to true to stop the &find() function from descending into $_.
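As a minimal sketch of this interface, here is a rough equivalent of find . -name '*.bak' -print:

    require "find.pl";

    &find('.');    # traverse from the current directory down

    sub wanted {
        print "$name\n" if /\.bak$/;   # $_ is the basename, $name the full path
    }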
Beyond that, the processing done in &wanted is entirely up to the user. Judicious use of the stat() or lstat() function and the Perl file test operators can emulate most of the options supported by the UNIX find command. (Don't forget that the special underscore filehandle _ caches the result of the last stat() or lstat(), whether the function was called directly or via one of the file test operators.)
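For instance, a &wanted along these lines (a sketch in the style of find2perl output) emulates find . -type f -mtime +7 -print, reusing the cached lstat() via the _ filehandle:

    sub wanted {
        (($dev, $ino, $mode, $nlink, $uid, $gid) = lstat($_)) &&
            -f _ &&          # a plain file, tested via the cached lstat()
            (-M _ > 7) &&    # modified more than seven days ago
            print "$name\n";
    }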
Of course, Perl is a much richer and more powerful language than the command-line syntax of find, so some extremely powerful effects can be obtained. For more guidance, run find2perl on some of your favorite find invocations and study the output carefully. (There are probably dozens of good candidates in /usr/spool/cron/crontabs/* alone.)
Plenty of other tools are available in the Perl library. In
look.pl
, there's a dictionary lookup that emulates the
UNIX look
command. There's even a syslog
interface in syslog.pl
in case you hate calling
logger
all the time. More libraries are being invented,
posted to comp.lang.perl, and archived at coombs
every day.
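As a minimal sketch of the syslog.pl interface mentioned above (assuming the version shipped with the Perl distribution):

    require "syslog.pl";
    &openlog($0, 'pid', 'user');    # identify ourselves to syslog
    &syslog('info', 'run started by %s', $ENV{'USER'});
    &closelog();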
So far, we have concentrated on emulating (or, with the s2p and a2p translators that come with the Perl distribution, translating) UNIX tools. Sometimes, though, you really need to call some tool outside of Perl. For example, I have yet to find anything better than:
    chop($hostname = `/bin/hostname`);

So, next time we'll be talking about strategies for writing portable Perl scripts across a bewildering variety of UNIX implementations with subtly different pathnames and command behaviors. Tentative title: "The Thing I Love About Standards Is That There Are So Many."
Reproduced from ;login: Vol. 18 No. 6, December 1993.