Those of you who have been living under a rock for the last twelve months may have missed out on this whole World Wide Web thing. Most of you have probably already tried your hand at some basic HTML authoring. The most interesting application of Web technology, though, is using the Web as an interface to arbitrary data from other sources such as databases and system applications. One mechanism for creating these interfaces is the Common Gateway Interface, CGI for short.
Here is a trivial example of a CGI program written in Perl:
```perl
#!/bin/perl
print "Content-type: text/html\n\n";
print <<"EOmyPage";
<TITLE>"Hello World!" Page</TITLE>
<H1>HELLO WORLD!</H1>
EOmyPage
```

The first print statement outputs the header information, which specifies the type of document that follows the header. In this case, we are saying that the document is an HTML text document. A blank line must follow the header information (note the two \n characters). The rest of the program is just a "here document" that prints a trivial HTML page. If at this point you are thinking, "That's easy!", you are absolutely correct: there is no great mystery to this CGI stuff.
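The header block and its terminating blank line are easy to get wrong, so you may want to build them in one place. The following is a minimal sketch. The name cgi_header is hypothetical and does not come from any library. It takes name/value pairs in order and always ends with the blank line:

```perl
use strict;
use warnings;

# Hypothetical helper: build a CGI header block from ordered name/value
# pairs. Each header goes on its own line, and the final "\n" produces
# the blank line that must separate the header from the document body.
sub cgi_header {
    my (@pairs) = @_;
    my $block = '';
    while (my ($name, $value) = splice(@pairs, 0, 2)) {
        $block .= "$name: $value\n";
    }
    return $block . "\n";
}
```

In the script above, you would write print cgi_header('Content-type' => 'text/html'); in place of the first print statement.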
A slightly more interesting script keeps a running count of visitors in a file:

```perl
#!/bin/perl
print "Content-type: text/html\n\n";
$visitors = `cat countfile`;    # backquotes capture the command's output
$visitors++;
if (open(OUT, "> countfile")) {
    print OUT $visitors;
    close(OUT);
    print <<"EOmyPage";
<TITLE>Welcome</TITLE>
Hello visitor number $visitors.
EOmyPage
} else {
    print "Sorry, an error occurred\n";
}
```

Be warned that your HTTP server will probably be running under some other user ID, and it will have that user's access rights to files on your system. Try to run your server as a user with no privileges, such as the "nobody" user, and NEVER give an HTTP server superuser access. Make sure that whatever files you are manipulating have the correct access rights.
It is almost never a good idea to abort a CGI program in the middle of execution. Remember that there is a user on the other side of the Internet who is expecting some sort of page to be returned by your script. Notice that the script above prints an error message if the open() fails, rather than calling die() as you usually would.
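The counter script above has a subtle problem. If two visitors arrive at the same moment, both requests can read the same old count, and one increment is lost. The following is a sketch of one way to avoid that. It assumes your system supports Perl's flock() on the counter file, which usually means the file should be on a local disk. It reads the file in Perl instead of running cat. The helper name bump_counter is hypothetical. The helper keeps to the rule above: it returns undef on failure instead of dying, so the caller can still send the user a page.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);

# Hypothetical helper: increment the number stored in $file and return the
# new value, or undef if the file cannot be opened, locked, or rewritten.
sub bump_counter {
    my ($file) = @_;
    sysopen(my $fh, $file, O_RDWR | O_CREAT, 0644) or return undef;

    # An exclusive lock makes simultaneous requests take turns, so two
    # visitors cannot both read the same old count.
    unless (flock($fh, LOCK_EX)) {
        close($fh);
        return undef;
    }

    my $count = <$fh>;
    $count = 0 unless defined $count;   # a brand-new file is empty
    chomp $count;
    $count++;

    # Rewrite the file from the beginning with the new count.
    seek($fh, 0, 0)  or return undef;
    truncate($fh, 0) or return undef;
    print $fh "$count\n";
    close($fh) or return undef;         # closing also releases the lock
    return $count;
}
```

In the counter script, you would then test defined($visitors = bump_counter("countfile")) and fall back to the "Sorry, an error occurred" message when the test fails.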
Also keep in mind that you can manipulate the output of other programs from within your Perl script. In the visitor counter example, we used the UNIX cat program to retrieve the contents of a file. More generally, CGI lets you extend the reach of the Web by making data from other programs available to Web browsers. For example, here is a little CGI script that returns ps output from the machine it is run on (one could imagine this as part of a suite of remote diagnostic tools for a large network):
```perl
#!/bin/perl
print "Content-type: text/plain\n\n";
if (open(PS, "ps -ef |")) {
    while (<PS>) {
        print;
    }
    close(PS);
} else {
    print "An error occurred\n";
}
```

Note that we are using a different Content-type header. Browsers usually display plain text in a fixed-width font (such as Courier) with all whitespace preserved, unlike HTML. For those of you familiar with HTML, the output usually looks as if it had been enclosed in a <PRE> block.
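You may prefer to send the ps listing as part of an HTML page rather than as plain text. In that case, you have to escape the characters that HTML treats specially, or a stray < in the output could be read as markup. The following is a minimal sketch. The subroutine names html_escape and pre_block are hypothetical:

```perl
use strict;
use warnings;

# Escape the characters HTML treats specially in element content.
# The ampersand must be replaced first; otherwise the "&" in the
# entities produced by the later substitutions would be escaped again.
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Wrap arbitrary program output in a <PRE> block for an HTML page.
sub pre_block {
    my ($text) = @_;
    return "<PRE>\n" . html_escape($text) . "</PRE>\n";
}
```

With a text/html header, the ps script could collect the command's output with @lines = <PS> and then print pre_block(join('', @lines)).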
You can call just about any program. You could interface with other
network information services like gopher and WAIS, or even NNTP (how
about a Web-based threaded newsreader?). You could interface with
pieces of your company database and write a company phone book page,
or allow people to review their benefits via the Web. However, think
about security before you go off and try to save the world with the
Web: you may not want everybody in the world to have easy access to
much of your data. Even the ps
example above potentially
gives away more knowledge to people outside your organization than you
should be comfortable with.
Your HTTP server also passes information to your CGI program through environment variables. This script simply lists all of them:

```perl
#!/bin/perl
print "Content-type: text/plain\n\n";
foreach $var (sort keys %ENV) {
    print "\$ENV{$var} = '$ENV{$var}'\n";
}
```

For example, the REMOTE_HOST and REMOTE_ADDR variables give the fully qualified hostname and the IP address of the machine that is connecting to your HTTP server. At NetMarket we get a lot of "How'd you do that?!?" comments because our home page prints a little "Thanks for connecting from $ENV{'REMOTE_HOST'}" message.
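Servers that are not configured to look up hostnames may leave REMOTE_HOST unset, while REMOTE_ADDR is normally present. A greeting like the one above may therefore need a fallback. The following is a sketch. The subroutine name greeting is hypothetical, and the subroutine takes a hash reference so that it can be tried outside a server:

```perl
use strict;
use warnings;

# Hypothetical helper: build the greeting from a hash of environment
# variables, preferring the hostname, then the IP address, then a
# neutral phrase when neither is available.
sub greeting {
    my ($env) = @_;
    my $who = $env->{REMOTE_HOST};
    $who = $env->{REMOTE_ADDR} unless defined $who && length $who;
    $who = 'an unknown host'   unless defined $who && length $who;
    return "Thanks for connecting from $who";
}
```

Inside a CGI script, you would call it as print greeting(\%ENV);.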
The client browser can also send information to your HTTP server. Your HTTP server will put this information into your CGI program's environment using variables that are prefixed with HTTP_. In particular, the client will usually provide an identifying string such as "NCSA Mosaic for the X Window System/2.4 libwww/2.12 modified" in the HTTP_USER_AGENT variable. Unfortunately, there is no established standard format for user agent information, so it is nearly impossible to write a procedure that can identify an arbitrary browser from its user agent string. However, it is pretty easy to recognize most of the major browsers.
What good is identifying a browser? Remember that older browsers may not support all the latest features of the HTML specification. For example, you do not want to send a table to NCSA Mosaic 2.4 because the browser cannot format the table information, and you would not want to send an image map to a text-only browser like Lynx because the user would not be able to see the image.
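The following is a sketch of recognizing a few major browsers and using the result for the two decisions just described. The patterns are assumptions chosen for illustration and will not cover every user agent string. In particular, the code assumes that every NCSA Mosaic 2.x release lacks table support. It relies on the fact that Netscape identifies itself as "Mozilla" and that Microsoft Internet Explorer also claims "Mozilla" but adds "MSIE", which is why MSIE is checked first. The subroutine names are hypothetical.

```perl
use strict;
use warnings;

# Guess the browser family from a User-Agent string (illustrative only).
sub browser_family {
    my ($ua) = @_;
    return 'unknown'  unless defined $ua;
    return 'lynx'     if $ua =~ /^Lynx/i;
    return 'mosaic'   if $ua =~ /Mosaic/;
    return 'msie'     if $ua =~ /MSIE/;      # must precede the Mozilla test
    return 'netscape' if $ua =~ /^Mozilla/;
    return 'unknown';
}

# Assumption: NCSA Mosaic 2.x cannot format tables. The pattern looks for
# the version right after the product name, so "libwww/2.12" is ignored.
sub can_use_tables {
    my ($ua) = @_;
    return !(browser_family($ua) eq 'mosaic' && $ua =~ m{Mosaic[^/]*/2\.});
}

# A text-only browser such as Lynx cannot show an image map.
sub can_show_images {
    my ($ua) = @_;
    return browser_family($ua) ne 'lynx';
}
```

A script would pass $ENV{HTTP_USER_AGENT} to these subroutines and fall back to plain markup when they return false.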
Forms let users send information back to your CGI programs. Here is a simple comment form:

```html
<TITLE>Send Us Email!</TITLE>
We'd love to hear from you. Enter your email address and comments in
the spaces provided and we'll respond as quickly as we can!<P>
<FORM METHOD="POST" ACTION="bin/process_form">
Your E-mail address<BR>
<INPUT NAME="email" SIZE=45 MAXLENGTH=45><BR>
Your Message<BR>
<TEXTAREA NAME="comments" ROWS=12 COLS=45></TEXTAREA><P>
<INPUT TYPE="submit" VALUE="Send your comments">
</FORM>
```

The ACTION attribute of the <FORM ... ACTION="..."> tag specifies which program the user's browser should ask the server to run when the user submits the form. This form creates a space for the user to enter an email address and a free-form text area for the user to type in a message. Finally, there is a Send your comments button that lets the user submit the form information.
When the user punches the Send your comments button, the client browser bundles up all the information that the user entered. It sends that information to your HTTP server along with a request to run the program named in the <FORM ... ACTION="..."> tag. How your program gets the form information depends on the <FORM METHOD="..." ...> tag. In the example form above, the form method is POST, which means that the form information will be handed to your CGI program on standard input. You will get a blob of data whose length is specified by the CONTENT_LENGTH environment variable. The easiest way to grab the data is with the read() function:
```perl
#!/bin/perl
read(STDIN, $stuff, $ENV{'CONTENT_LENGTH'});
# . . .
```

Now you have to break up the data into intelligible pieces. The data comes to you in name=value pairs separated by & characters. The name for each piece of data is whatever you specified in the form with a NAME="..." attribute. In the example above, the name for the email field is email, and the name for the free-form text area is comments. The other tricky part is that spaces are converted to + signs, and non-alphanumeric characters are generally converted to %<hex>, where <hex> is the ASCII value of the character in hexadecimal notation (for example, @ becomes %40). Typically, the beginning of every form processing program looks like this:
```perl
#!/bin/perl
read(STDIN, $stuff, $ENV{'CONTENT_LENGTH'});
@pairs = split(/\&/, $stuff);
for (@pairs) {
    ($field, $val) = split(/=/, $_, 2);
    # Decode "+" first, then %XX escapes (only genuine hex digits).
    $field =~ s/\+/ /g;
    $field =~ s/%([0-9A-Fa-f]{2})/sprintf("%c", hex($1))/eg;
    $val =~ s/\+/ /g;
    $val =~ s/%([0-9A-Fa-f]{2})/sprintf("%c", hex($1))/eg;
    $entries{$field} = $val;
}
# ...
```

First, we read the data off standard input and break it up into a list of name=value pairs. Then we iterate over each pair, break the pair apart, and convert the plus signs and hexadecimal escapes back to the original characters. Do not try to do the substitutions before you split everything up, because some of the escaped characters may be & or =. Convert the + signs to spaces first, because some of the escaped characters may be +.
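The following sketch packages the same steps as reusable subroutines. It also covers the GET method. For GET requests, the server passes the encoded data in the QUERY_STRING environment variable rather than on standard input. The article's form uses POST, so GET support here is an addition. A single read() call may return fewer bytes than requested, so the POST branch keeps reading until it has CONTENT_LENGTH bytes. The names raw_form_data and parse_form are hypothetical. If a field name appears more than once, the last value wins, as in the loop above.

```perl
use strict;
use warnings;

# Fetch the raw, still-encoded form data for either request method.
sub raw_form_data {
    my $method = $ENV{REQUEST_METHOD} || 'GET';
    if ($method ne 'POST') {
        return defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : '';
    }
    my $want = $ENV{CONTENT_LENGTH} || 0;
    my $data = '';
    while (length($data) < $want) {
        # Append at the current end of $data; stop on EOF (0) or error (undef).
        my $got = read(STDIN, $data, $want - length($data), length($data));
        last unless $got;
    }
    return $data;
}

# Split on "&" and "=" first, then decode "+" and %XX escapes.
sub parse_form {
    my ($raw) = @_;
    my %entries;
    for my $pair (split /&/, $raw) {
        next unless length $pair;              # skip empty pieces like "a=1&&b=2"
        my ($field, $val) = split /=/, $pair, 2;
        $val = '' unless defined $val;
        for ($field, $val) {                   # $_ aliases each variable in turn
            tr/+/ /;                           # plus signs first ...
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;   # ... then hex escapes
        }
        $entries{$field} = $val;
    }
    return %entries;
}
```

A form handler would then start with %entries = parse_form(raw_form_data());.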
Now that you have parsed the input into an associative array, you can do anything you like with the information. You must, however, return a page to the user as a result of their form submission:
```perl
print "Content-type: text/html\n\n";
if (open(MAIL, "| /usr/lib/sendmail webmaster")) {
    print MAIL <<"EOdoc";
From: The Comments Page <webmaster>
To: webmaster
Subject: Comments Mail

Mail from: $entries{"email"}

$entries{"comments"}
EOdoc
    close(MAIL);
    print <<"EOpage";
<TITLE>Thanks!</TITLE>
Thanks for taking the time to send us comments!<P>
We will be responding promptly.<P>
EOpage
} else {
    print <<"EOpage";
<TITLE>Bummer!</TITLE>
We encountered an error trying to send your comments.<P>
Please send mail to <I>webmaster\@netmarket.com</I><P>
EOpage
}
```

Be VERY careful about what you do with the data you collect from a form: remember that the user can type ANYTHING into that form and could cause huge amounts of havoc if you trust what they type in. Do not ever allow form data to be used as part of a command that you execute from your script. Notice that I will not even put the user's email address in the From: line of my message, because that data might be used to generate a sendmail command if the email bounces.
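Sometimes you do need to use a form value in a sensitive place. One cautious approach is to accept only values that match a narrow pattern and reject everything else, rather than trying to strip out "bad" characters. The following is a sketch for email addresses. The pattern is an assumption that deliberately rejects some legal but unusual addresses, and the name checked_email is hypothetical. The pattern is anchored with \A and \z because $ would also accept a trailing newline, which could be used to sneak an extra header line into a message. Perl's taint mode (the -T switch) can also help: it makes Perl refuse to use outside data in shell commands until you have extracted it through a regular expression match, as this sketch does with $1.

```perl
use strict;
use warnings;

# Hypothetical validator: return the address if it consists only of
# characters commonly found in plain addresses, otherwise undef.
# Returning the captured $1 (not the input) also untaints the value
# when the script runs under perl -T.
sub checked_email {
    my ($addr) = @_;
    return undef unless defined $addr;
    return $addr =~ /\A([A-Za-z0-9._+-]+\@[A-Za-z0-9.-]+)\z/ ? $1 : undef;
}
```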
Sample CGI programs are available all over the Web (NCSA has a small archive of examples to get you started).
Reproduced from ;login: Vol. 20 No. 4, August 1995.
11/27/96ah