This multipart article examines network programming using Perl. Network programming with Perl is very much like network programming with C, but Perl's language constructs make it much easier to focus on the actual work of setting up a network connection, rather than issues like exception handling and data reformatting. People who have always been mystified by network applications can often use Perl to spin themselves up; veteran programmers can use Perl to prototype network applications rapidly.
One useful analogy is the in-flight music system that airlines have. The server is the airplane's music system: it has a bunch of data (movie soundtracks and different types of music) that it can supply to the passengers (the clients). The clients have to ask explicitly for the information, however, by plugging in a pair of headphones and dialing a little number to get the exact sounds they want to listen to.
The headphones in the above analogy stand for a concept that network programmers call a "socket." Clients establish socket connections to servers by connecting one end of a logical pipe to the server at a well-known address (the little hole in your airline seat) with a specific port number (dialing the number on your seat to get rock music) while holding onto the other end of the pipe (keeping the headphones on your ears).
As in my airline example, properly designed network servers can handle several client connections simultaneously. Unlike my analogy, network clients can connect to several different servers (or multiple times to the same server) simultaneously.
    use Socket;

    $server = "www.netmarket.com";
    $port   = 80;

    $server_addr   = (gethostbyname($server))[4];     # packed network address
    $server_struct = pack("S n a4 x8", AF_INET, $port, $server_addr);
    $proto         = (getprotobyname('tcp'))[2];      # TCP protocol number

    socket(MYSOCK, PF_INET, SOCK_STREAM, $proto)
        || die "Failed to initialize socket: $!\n";
    connect(MYSOCK, $server_struct)
        || die "Failed to connect() to server: $!\n";

The first line of this example simply pulls in the Perl sockets module. This module defines a number of useful constants that are employed later in the program. Next come the name of the server that this client will contact and the network port on which to talk to the server (port 80 happens to be the port that Web servers listen on by default). You might actually get these values passed into your program as command line arguments, or this code might become part of a function that gets these values as function arguments.
In order to connect to the server, the program has to translate the server's human-readable name (www.netmarket.com) into a network address. The gethostbyname() function looks up the server name and returns a list of information; the network address of the server is the fifth value of the result (don't worry about the other values right now).
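To make the shape of that returned list concrete, here is a minimal sketch. The helper name `addresses_for` and the choice of `localhost` as the name to look up are assumptions made for illustration; `inet_ntoa()` comes from the Socket module and turns a packed address into dotted-quad text.

```perl
use strict;
use warnings;
use Socket qw(inet_ntoa);

# Hypothetical helper: return every IPv4 address for $host as dotted-quad
# strings, or an empty list if the lookup fails.
sub addresses_for {
    my ($host) = @_;
    # In list context gethostbyname() returns
    # (name, aliases, address type, address length, address, address, ...),
    # so index 4 is the first packed address.
    my ($name, $aliases, $addrtype, $length, @addrs) = gethostbyname($host);
    return map { inet_ntoa($_) } @addrs;
}

my $host  = 'localhost';    # assumption: any resolvable name will do
my @addrs = addresses_for($host);
if (@addrs) {
    print "$host has address(es): @addrs\n";
} else {
    print "lookup of $host failed\n";
}
```

A failed lookup returns an empty list, which is why it is worth checking the result before packing it into anything.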
The connect() call used later expects the server's address in a C structure, which is created from this address using the pack() function. This structure has three fields: a description of the type of network address in the rest of the structure, the port to connect to, and the server address to connect to (the rest of the structure is just filled with zeroes). In the pack() template, S packs the address type as an unsigned short, n packs the port number in network byte order, a4 copies the four address bytes, and x8 adds eight zero bytes. AF_INET is a constant defined in the Perl Socket module that stands for an Internet Protocol (IP) type address (unfortunate people have to use other types of networks like AppleTalk, DECnet, or X.25, all of which have their own AF_* constants in Socket.pm). Unless the programmer specifies the type of network connection at the front of the structure, the operating system will not be able to interpret the network address information in the rest of the structure, and the attempt to set up the socket will fail.
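A hand-written pack() template assumes one particular layout for the structure, and that layout is not the same everywhere (some BSD-derived systems, for instance, put a length byte in front of the address type). As a sketch of an alternative, not the method used in this article, the Socket module provides `pack_sockaddr_in()` and `unpack_sockaddr_in()`, which build and take apart the structure for the local system. The address 127.0.0.1 and the variable names below are chosen purely for illustration.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa pack_sockaddr_in unpack_sockaddr_in);

my $port      = 80;
my $packed_ip = inet_aton('127.0.0.1');   # dotted quad -> 4 raw bytes
my $sockaddr  = pack_sockaddr_in($port, $packed_ip);

# Take the structure apart again to show what it carries.
my ($got_port, $got_ip) = unpack_sockaddr_in($sockaddr);
printf "%d-byte structure: port %d, address %s\n",
    length($sockaddr), $got_port, inet_ntoa($got_ip);
```

The result of `pack_sockaddr_in()` can be handed to connect() exactly where the hand-packed structure would go.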
With that messy pack() business out of the way, we can start setting up the actual socket. First, the client initializes its end of the socket as a Perl file handle, MYSOCK. The other arguments to the socket() function specify the type of network connection, how the socket will be used, and the transmission protocol. PF_INET is another constant from Socket.pm that is related to AF_INET and specifies that this socket will be an IP type socket (indeed, in the early days, AF_INET was used in both the C structure and in the socket() call; avoid and abhor this practice). SOCK_STREAM is another constant, which says that the client and server will talk using a connection similar to a telephone call: both parties can talk back and forth to each other, and the connection will stay up until one party hangs up. (SOCK_STREAM is the most common communications method, but other methods exist, such as SOCK_DGRAM, which is more like smoke signalling: client and server can send out messages, but there is no guarantee that the other party will receive them.)
Finally, the transmission protocol is specified. The discussion of TCP versus UDP is beyond the scope of this article, but TCP is always the right thing to use unless you are very sure that it isn't. Always use getprotobyname() to get the right value for the TCP protocol number. Lazy programmers frequently hard-code this value because it happens to be the same on nearly every UNIX variant out there, and people like me curse them when we have to port the code to non-UNIX systems or strange UNIX variants.
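The socket() step can be tried on its own, since creating a socket does not touch the network. The sketch below assumes a hypothetical helper named `new_socket` and uses lexical file handles rather than a bareword like MYSOCK; it creates one stream socket and one datagram socket, looking up each protocol number by name.

```perl
use strict;
use warnings;
use Socket qw(PF_INET SOCK_STREAM SOCK_DGRAM);

# Hypothetical helper: create an unconnected IP socket of the given type,
# looking the protocol number up by name instead of hard-coding it.
# Returns the handle, or nothing (after a warning) on failure.
sub new_socket {
    my ($type, $proto_name) = @_;
    my $proto = (getprotobyname($proto_name))[2];
    unless (defined $proto) {
        warn "no entry for '$proto_name' in the protocols database\n";
        return;
    }
    unless (socket(my $sock, PF_INET, $type, $proto)) {
        warn "Failed to initialize socket: $!\n";
        return;
    } else {
        return $sock;
    }
}

for my $pair ([SOCK_STREAM, 'tcp'], [SOCK_DGRAM, 'udp']) {
    my ($type, $name) = @$pair;
    my $sock = new_socket($type, $name) or next;
    print "created a $name socket on file descriptor ", fileno($sock), "\n";
    close($sock);
}
```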
With one end of the socket firmly in hand (again, as the file handle MYSOCK), the client calls connect() to actually contact the server. The connect() function takes as arguments the file handle and the C structure created earlier. Assuming the connect() succeeds, the client has established a session with the server. MYSOCK can now be treated just like any Perl file handle, except that you can both read from and write to the same socket. To save network and system resources, it is particularly important to remember to close() sockets when you are done with them.
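Put together, the lookup, socket() and connect() steps fit naturally into one function. The following is a sketch under stated assumptions rather than a definitive version: the name `tcp_connect` is hypothetical, it uses `pack_sockaddr_in()` from the Socket module in place of the hand-written pack() template, and it falls back to the module's `IPPROTO_TCP` constant only if the system has no protocols database entry for TCP.

```perl
use strict;
use warnings;
use Socket qw(PF_INET SOCK_STREAM IPPROTO_TCP pack_sockaddr_in);

# Hypothetical helper: return a connected socket handle, or die with a
# message naming the step that failed.
sub tcp_connect {
    my ($server, $port) = @_;

    my $server_addr = (gethostbyname($server))[4];
    defined $server_addr or die "Cannot resolve $server\n";

    my $proto = (getprotobyname('tcp'))[2];
    # Assumption: use the system-header constant only when the lookup fails.
    $proto = IPPROTO_TCP unless defined $proto;

    socket(my $sock, PF_INET, SOCK_STREAM, $proto)
        or die "Failed to initialize socket: $!\n";
    connect($sock, pack_sockaddr_in($port, $server_addr))
        or die "Failed to connect() to $server port $port: $!\n";
    return $sock;
}

# Usage: perl connect.pl HOST PORT   (does nothing without both arguments)
if (@ARGV == 2) {
    my $sock = tcp_connect(@ARGV);
    print "connected to $ARGV[0] port $ARGV[1]\n";
    close($sock) or warn "close failed: $!\n";
}
```

Returning the handle from a function keeps the close() next to the code that uses the connection, which makes it harder to forget.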
Because this client has connected to the Web server (port 80, remember?) on www.netmarket.com, the client program can request an HTML document using the HTTP protocol:
    select(MYSOCK);
    $| = 1;
    select(STDOUT);
    print MYSOCK "GET /\n\n";
    while (<MYSOCK>) {
        print;
    }
    close(MYSOCK);

The first three lines turn off output buffering on the socket. When reading and writing from a file, it is usually most efficient to do large reads or writes (read more data than needed or save up a lot of small writes and do them all at once), and most UNIX systems take care of doing this automatically. This behavior can, however, be disabled, which is what you want on a network socket where the client and server are passing short messages back and forth. The Perl mechanism for turning off buffering is to set the $| variable to be non-zero (it's zero by default). Setting this variable affects only the currently select()ed file handle (STDOUT is selected by default), so you have to select(MYSOCK), set the variable, and then go back to the default of STDOUT.
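The select-set-reselect dance is easy to wrap in a function. In this sketch the helper name `unbuffer` is hypothetical, and a pipe stands in for the socket so that the example runs without a network connection.

```perl
use strict;
use warnings;

# Hypothetical helper: turn off output buffering for one handle, then
# restore whichever handle was selected before.
sub unbuffer {
    my ($fh) = @_;
    my $previous = select($fh);   # select() returns the previously selected handle
    $| = 1;                       # applies only to the handle selected right now
    select($previous);
    return $fh;
}

# A pipe stands in for the socket here.
pipe(my $reader, my $writer) or die "pipe failed: $!\n";
unbuffer($writer);

print $writer "hello\n";   # written to the pipe immediately because of $|
my $line = <$reader>;      # would wait forever if the write were still buffered
print "read back: $line";
```

The IO::Handle module offers the same effect as a method call, `$writer->autoflush(1)`, for those who prefer not to touch `$|` directly.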
That done, the client requests a file from the Web server using the GET command in the HTTP protocol. The argument to GET is the name of the file requested (in this case, the client is asking for the file at the root of the document tree, but could just as easily have asked for /some/other/file.html). The GET request is followed by two newlines.
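The request is just a string, so it can be built and inspected before anything is sent. The sketch below uses two hypothetical helpers: `simple_request` reproduces the form used in this article, and `full_request` builds the HTTP/1.0 full-request form (a request line with a version, terminated by carriage return and line feed pairs), which is an addition for comparison and not something the example above sends.

```perl
use strict;
use warnings;

# The form used in this article: no protocol version, bare newlines.
sub simple_request {
    my ($path) = @_;
    return "GET $path\n\n";
}

# Assumption for comparison: the HTTP/1.0 full-request form.
sub full_request {
    my ($path) = @_;
    return "GET $path HTTP/1.0\r\n\r\n";
}

# Print both with the line terminators made visible.
for my $request (simple_request('/'), full_request('/some/other/file.html')) {
    (my $visible = $request) =~ s/\r/\\r/g;
    $visible =~ s/\n/\\n/g;
    print "$visible\n";
}
```

With the full form, the server sends a status line and headers ahead of the document, so a client using it has to skip past those before it reaches the HTML.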
Once the client makes its request, the server sends the contents of the requested file back down the socket (or an error message if the file was not found or some other error occurred). The standard HTTP protocol defines that when the server finishes sending the file, it hangs up its end of the connection - this causes the entire socket to be torn down. A client reading from a socket interprets this event just as if it had been reading from a file and reached the end-of-file marker. In the program above, the HTML document is simply being printed to the standard output.
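The end-of-file behavior can be seen without a Web server. In this sketch a pipe plays the part of the socket, closing the write end plays the part of the server hanging up, and the helper name `read_until_eof` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: collect lines from a handle until end-of-file.
sub read_until_eof {
    my ($fh) = @_;
    my @lines;
    while (defined(my $line = <$fh>)) {
        push @lines, $line;
    }
    return @lines;
}

pipe(my $reader, my $writer) or die "pipe failed: $!\n";
print $writer "<html>\n", "</html>\n";
close($writer) or die "close failed: $!\n";   # the "server" hangs up

my @document = read_until_eof($reader);
close($reader);
printf "received %d lines before end-of-file\n", scalar @document;
```

The loop needs no length or terminator from the sender: the read simply returns undef once the other end has closed.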
In the meantime, practice these concepts by taking the example above and writing a program that will take the server name, port number (default to port 80), and file name as command line arguments and fetch that file from the remote Web server. Impress your friends (and increase your productivity) by building a Web robot that surfs the Web for you by looking for HREF tags in the documents you download and then fetches those documents as well (making sure that you don't download the same document twice!). Now make sure the robot stops at some point, or you'll download the entire Web.
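As a starting point for the robot, here is a rough sketch of the bookkeeping that avoids downloading the same document twice. The helper name `new_links` and the sample page are made up for illustration, and the regular expression is a deliberately simple assumption about what an HREF looks like, not a real HTML parser.

```perl
use strict;
use warnings;

my %seen;   # every link already handed out, so nothing is fetched twice

# Hypothetical helper: return the HREF values in $html not seen before.
sub new_links {
    my ($html) = @_;
    my @fresh;
    # Rough pattern: href, optional quote, then everything up to a quote,
    # whitespace, or the end of the tag.
    while ($html =~ /href\s*=\s*["']?([^"'\s>]+)/gi) {
        my $url = $1;
        push @fresh, $url unless $seen{$url}++;
    }
    return @fresh;
}

my $page = '<A HREF="/a.html">A</A> <a href=/b.html>B</a> <a href="/a.html">again</a>';
print "$_\n" for new_links($page);
```

A limit on the number of entries in `%seen`, or on how many links deep the robot follows, is one simple way to make sure it stops.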
Reproduced from ;login: Vol. 21 No. 4, August 1996.