Perl Practicum: Network Wiles
(Part II)

by Hal Pomeranz

In the last installment, we saw how to program a network client by writing a simple tool to get pages from remote Web servers. In this issue, we will explore how to write a simple network server. As an example project, we will actually write a simpleminded Web server (the complete code is presented at the end of this article in case you find it easier to follow along that way). Reread the previous issue if you think you have forgotten any of the basic networking concepts I presented there.

Getting Started

The first thing a network server must do is set up a socket upon which it can accept requests. The first phase of this process looks a lot like the initial code of a network client:
     use Socket;
     $this_host = 'my-server.netmarket.com';
     $port = 8080;
     $server_addr = (gethostbyname($this_host))[4];
     $server_struct = pack("S n a4 x8", AF_INET, $port, $server_addr);
     $proto = (getprotobyname('tcp'))[2];
     socket(SOCK, PF_INET, SOCK_STREAM, $proto) ||
          die "Failed to initialize socket: $!\n";
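First, the program has to pull in the Perl Socket.pm module. The hostname of the machine upon which the server will run and the port upon which it will accept requests are specified on the next two lines (you can imagine getting these parameters out of a configuration file or on the command line). The program then calls gethostbyname() to get the IP address of the server machine and uses that information to create a C structure (a struct sockaddr_in) which we will use later. The pack() template used here assumes the traditional layout of that structure; on systems whose sockaddr_in begins with a one-byte length field, such as 4.4BSD derivatives, the sockaddr_in() function from Socket.pm builds the structure portably. Finally, getprotobyname() looks up the protocol number for TCP, and we call socket() to create a file handle for the socket.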
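For comparison, here is a sketch of the same setup using Socket.pm's helper functions instead of a hand-written pack() template. The loopback address 127.0.0.1 is an assumption that stands in for the server's real host name so the example runs without a DNS lookup; inet_aton() accepts either a name or a dotted quad.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa sockaddr_in);

# Assumed values for illustration; a real server would use its own name.
my $this_host = '127.0.0.1';
my $port      = 8080;

# inet_aton() returns the packed 4-byte address, or undef if lookup fails.
my $server_addr = inet_aton($this_host)
    or die "Can't resolve $this_host\n";

# In scalar context with two arguments, sockaddr_in() packs a
# struct sockaddr_in in whatever layout the local system expects.
my $server_struct = sockaddr_in($port, $server_addr);

# In list context with one argument, it unpacks the structure again.
my ($check_port, $check_addr) = sockaddr_in($server_struct);
printf "port %d, address %s\n", $check_port, inet_ntoa($check_addr);
```

The resulting $server_struct can be handed to bind() exactly like the packed string in the article.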

Remember from the last article that Web servers usually wait for connections on port 80. Why does the code above specify the port as 8080? As a security feature, only the superuser is allowed to run servers that accept connections on ports below 1024. The reasoning behind this policy is that, since only the system manager at a remote site can start a service on a low-numbered port, users can trust that services listening on those ports (Telnet, FTP, gopher, et al.) have been "approved" by that system manager. This reasoning is probably no longer valid in this age of workstations on every desk, but the rule remains.
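A server that takes its port from the command line might want to catch the privileged-port case up front rather than wait for bind() to fail. The sketch below is a simplification: it assumes the classic rule that only effective UID 0 may use ports below 1024, while some current systems can also grant that right by other means (Linux capabilities, for example). The port_problem() helper is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an error string if $port is invalid, or is
# a privileged port (below 1024) and $euid is not the superuser's (0).
# Returns "" when the port looks usable.
sub port_problem {
    my ($port, $euid) = @_;
    return "port must be a number between 1 and 65535"
        unless $port =~ /^\d+$/ && $port >= 1 && $port <= 65535;
    return "only the superuser may listen on port $port"
        if $port < 1024 && $euid != 0;
    return "";
}

# Take the port from the command line, defaulting to 8080.
my $port    = @ARGV ? $ARGV[0] : 8080;
my $problem = port_problem($port, $>);    # $> is the effective UID
die "$problem\n" if $problem;
print "OK to listen on port $port\n";
```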

Returning to our example, the server now needs to prepare to receive connections at the given address and port combination:

     setsockopt(SOCK, SOL_SOCKET, SO_REUSEADDR,1) ||
          die "setsockopt() failed: $!\n";
     bind(SOCK, $server_struct) || die "bind() failed: $!\n";
     listen(SOCK, SOMAXCONN) || die "listen() failed: $!\n";
The setsockopt() function allows the program to change various parameters associated with the socket; more on SO_REUSEADDR in a moment. The bind() call is what actually associates the SOCK file handle with the address and port number pair specified at the top of the program. As long as any program has bound itself to a particular address and port, no other program can bind to the same location. This is useful and prevents confusion. However, if the server has been handling connections, its address/port combination does not become available again until some time after the program exits (typically a minute or more, while the operating system holds recently closed connections in TCP's TIME_WAIT state), even if you rerun the exact same program. This is annoying and creates bad feelings. Use setsockopt() to set the SO_REUSEADDR option to 1 (true), BEFORE the call to bind(), so that a restarted server can bind to the same port immediately after the previous one has exited. Both the SOL_SOCKET and SO_REUSEADDR constants are defined in Socket.pm.
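The following sketch shows that order of operations and reads the option back with getsockopt() to confirm it took effect. Binding to 127.0.0.1 with port 0 (which asks the kernel to pick any free port) is an assumption made so the example never collides with a real server; getsockname() reports which port was actually chosen.

```perl
use strict;
use warnings;
use Socket qw(PF_INET SOCK_STREAM SOL_SOCKET SO_REUSEADDR
              INADDR_LOOPBACK sockaddr_in inet_ntoa);

# Scalar-context getprotobyname() returns the protocol number; fall back
# to 0 (the default protocol, TCP for SOCK_STREAM) if the lookup fails.
my $proto = getprotobyname('tcp') || 0;
socket(my $sock, PF_INET, SOCK_STREAM, $proto)
    or die "socket() failed: $!\n";

# SO_REUSEADDR must be set before bind() to have any effect on it.
setsockopt($sock, SOL_SOCKET, SO_REUSEADDR, 1)
    or die "setsockopt() failed: $!\n";

bind($sock, sockaddr_in(0, INADDR_LOOPBACK))
    or die "bind() failed: $!\n";

# getsockopt() returns the option value as a packed C int.
my $reuse = unpack("i", getsockopt($sock, SOL_SOCKET, SO_REUSEADDR));

# getsockname() returns the address/port pair the socket is bound to.
my ($bound_port, $bound_addr) = sockaddr_in(getsockname($sock));
printf "bound to %s:%d, SO_REUSEADDR=%d\n",
    inet_ntoa($bound_addr), $bound_port, $reuse;
close($sock);
```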

The listen() call is probably misnamed. Apart from marking the socket as one that will accept incoming connections, all this function does is specify how long a queue of pending connection attempts the server is willing to deal with. If the queue is full, further connection attempts are turned away: depending on the system, the client either gets an error or its attempt is simply dropped and retried later. At the time of this writing, on almost every socket implementation in existence, the maximum queue length that you can set is 5 (so handle incoming connection requests quickly!), and SOMAXCONN (another helpful constant from Socket.pm) is usually set to 5. If you try to set the queue length to a value above 5, the operating system silently throttles the queue length back to the maximum value. Solaris 2.x is the only modern operating system that I am aware of where you can meaningfully specify queue length values that are greater than 5 (though interestingly SOMAXCONN is still given as 5 in the Solaris 2.x system header files).
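To see the pending queue in action, the following sketch (an illustration, not part of the Web server) listens on a kernel-chosen loopback port, opens three client connections before calling accept(), and only then drains them. The loopback address and port 0 are assumptions made so it runs anywhere. The connections complete even though nobody has called accept() yet, because the kernel holds them in the listen() queue.

```perl
use strict;
use warnings;
use Socket qw(PF_INET SOCK_STREAM SOMAXCONN INADDR_LOOPBACK sockaddr_in);

my $proto = getprotobyname('tcp') || 0;

socket(my $server, PF_INET, SOCK_STREAM, $proto) or die "socket: $!\n";
bind($server, sockaddr_in(0, INADDR_LOOPBACK))   or die "bind: $!\n";
listen($server, SOMAXCONN)                        or die "listen: $!\n";
my ($port) = sockaddr_in(getsockname($server));

print "SOMAXCONN on this system is ", SOMAXCONN(), "\n";

# Three clients connect before the server accepts anything; each
# completed connection waits in the listen() queue.
my @clients;
for (1 .. 3) {
    socket(my $c, PF_INET, SOCK_STREAM, $proto) or die "socket: $!\n";
    connect($c, sockaddr_in($port, INADDR_LOOPBACK)) or die "connect: $!\n";
    push @clients, $c;
}

# Drain the queue: each accept() returns one pending connection.
my $accepted = 0;
for (1 .. 3) {
    accept(my $conn, $server) or die "accept: $!\n";
    $accepted++;
    close($conn);
}
close($_) for @clients;
close($server);
print "accepted $accepted queued connections\n";
```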

Dealing with Pending Requests

At this point, most network servers go into an endless loop so that they can rapidly deal with their queue of pending network connections:
     for (;;) {
          $remote_host = accept(NEWSOCK, SOCK);
          die "accept() error: $!\n" unless ($remote_host);

          # do some work here
          close(NEWSOCK);
     }
The accept() call grabs the next connection request off the pending queue for SOCK. (If there are no pending connections, accept() waits until one comes in.) It then creates a new socket, NEWSOCK, which is the local endpoint of this new communications channel; SOCK itself stays open and keeps listening for further connections. If you print to NEWSOCK, you are sending data to the remote machine making the connection, and you can read from NEWSOCK just like any other file handle to get data from the remote machine. Always remember to close NEWSOCK when it is no longer needed.
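The following self-contained sketch plays both roles in one process so the data flow through the accepted socket is visible. The names $listener, $client and $newsock are hypothetical stand-ins for SOCK, a remote browser, and NEWSOCK; the loopback address and kernel-chosen port are assumptions so the example runs without a real network.

```perl
use strict;
use warnings;
use IO::Handle;    # provides autoflush() on socket handles
use Socket qw(PF_INET SOCK_STREAM INADDR_LOOPBACK sockaddr_in);

my $proto = getprotobyname('tcp') || 0;

# Listening socket (stands in for SOCK).
socket(my $listener, PF_INET, SOCK_STREAM, $proto) or die "socket: $!\n";
bind($listener, sockaddr_in(0, INADDR_LOOPBACK))   or die "bind: $!\n";
listen($listener, 5)                                or die "listen: $!\n";
my ($port) = sockaddr_in(getsockname($listener));

# A client in the same process plays the remote browser.
socket(my $client, PF_INET, SOCK_STREAM, $proto) or die "socket: $!\n";
connect($client, sockaddr_in($port, INADDR_LOOPBACK)) or die "connect: $!\n";
$client->autoflush(1);
print $client "GET /index.html HTTP/1.0\r\n\r\n";

# accept() returns the client's packed address and opens a new socket;
# $listener keeps listening for other connections.
my $peer = accept(my $newsock, $listener) or die "accept: $!\n";
$newsock->autoflush(1);

my $request = <$newsock>;     # read what the remote end sent...
print $newsock "thanks\n";    # ...and write a reply back to it
close($newsock);

my $reply = <$client>;
close($client);
close($listener);
print "server read: $request";
print "client read: $reply";
```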

The accept() function returns a C structure containing the address (and port) of the remote machine, or a false value if the accept() fails for any reason. This structure has the same layout as the one passed to bind() and connect(), and you can extract the IP address of the remote machine as follows:

     $raw_addr = (unpack("S n a4 x8",$remote_host))[2];
     @octets = unpack("C4", $raw_addr);
     $address = join(".", @octets);
In most cases you can also obtain the hostname of the remote machine with the gethostbyaddr() function:
     $hostname = (gethostbyaddr($raw_addr,AF_INET))[0];
This can be useful for logging purposes. Note the reappearance of AF_INET: gethostbyaddr() needs to be told what type of network address it is being given. If the reverse lookup fails, gethostbyaddr() returns an empty list, so $hostname is left undefined.
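Putting these pieces together, a logging helper might report the dotted-quad address and fall back to it alone when the reverse lookup fails. The describe_peer() name and the made-up port 4321 are hypothetical; the sketch uses Socket.pm's sockaddr_in() to unpack the structure rather than the hand-written unpack() template, and the article's unpack("C4", ...) to build the dotted quad.

```perl
use strict;
use warnings;
use Socket qw(AF_INET inet_aton inet_ntoa sockaddr_in);

# Hypothetical helper: describe the peer whose packed address accept()
# returned, as "name [a.b.c.d]:port" or just "a.b.c.d:port".
sub describe_peer {
    my ($remote_host) = @_;
    my ($remote_port, $raw_addr) = sockaddr_in($remote_host);
    # Same conversion as join(".", unpack("C4", ...)) in the article.
    my $dot_addr = join(".", unpack("C4", $raw_addr));
    # Reverse lookup; an empty list (undef name) means it failed.
    my $name = (gethostbyaddr($raw_addr, AF_INET))[0];
    my $have_name = defined($name) && length($name);
    return $have_name ? "$name [$dot_addr]:$remote_port"
                      : "$dot_addr:$remote_port";
}

# Stand-in for a real accept() result: loopback address, made-up port.
my $fake_peer = sockaddr_in(4321, inet_aton('127.0.0.1'));
my $line = describe_peer($fake_peer);
print "$line\n";
```

Socket.pm's inet_ntoa() performs the same packed-address-to-dotted-quad conversion as the unpack()/join() pair.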

A Simple Web Server

Up to this point, we've been fleshing out the basic skeleton that every network server application has to have. Now let's do something interesting with it.

HTTP is an incredibly simpleminded protocol. Requests sent by the Web browser are simply lines of ASCII text, terminated by a blank line. After seeing the blank line, the server sends back the requested data and shuts down the connection. Although the client typically sends over a great deal of useful information in its request, a simple Web server can ignore everything except the line that looks like:

     GET /some/path/to/file.html ...
Here's some code that reads the client request and extracts the path to the information that the user is requesting:
     while (<NEWSOCK>) {
          last if (/^\s*$/);
          next unless (/^GET /);
          $path = (split(/\s+/))[1];
     }
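Wrapped in a subroutine, the same loop can be exercised without any network connection by reading from a string. The request_path() name and the sample request below are made up for illustration, and reading from an in-memory file requires Perl 5.8 or later. Declaring $path with my inside the subroutine also means a request with no GET line yields undef rather than reusing the path left over from the previous connection, as the global $path in the loop above would.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the request-reading loop: read lines from
# $fh until the blank line and return the path from the GET line, if any.
sub request_path {
    my ($fh) = @_;
    my $path;
    while (<$fh>) {
        last if /^\s*$/;            # blank line ("\r\n") ends the request
        next unless /^GET /;
        $path = (split /\s+/)[1];   # second field of "GET /path HTTP/1.0"
    }
    return $path;
}

# A sample request with CRLF line endings, as a browser would send it.
my $request = "GET /docs/index.html HTTP/1.0\r\n"
            . "User-Agent: ExampleBrowser/1.0\r\n"
            . "Accept: text/html\r\n"
            . "\r\n";

open(my $fh, '<', \$request) or die "can't open in-memory file: $!\n";
my $path = request_path($fh);
print "requested path: $path\n";   # prints "requested path: /docs/index.html"
```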
Now the server has to respond. Typically $path is relative to the top of some directory hierarchy where your Web documents live: your $docroot in Web-speak. This directory can be defined in a config file or on the command line. Assuming that $docroot has been defined elsewhere, we can simply write:
     if (open(FILE, "< $docroot$path")) {
          @lines = <FILE>;
          print NEWSOCK @lines;
          close(FILE);
     }
     else {
          print NEWSOCK <<"EOErrMsg";
     <TITLE>Error</TITLE><H1>Error</H1>
     The following error occurred while
     trying to retrieve your information:
     $!
     EOErrMsg
     }
If we are able to open the requested file, we simply dump its contents down NEWSOCK. Note that the server sends back an error message if the open() fails. Never forget that there is somebody on the other end of that connection who is waiting to hear something back as a result of his or her request.
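This server sends the file with no status line or headers, which is how the older HTTP/0.9 protocol replies. A client that sent an HTTP/1.0 request normally expects the reply to begin with a status line such as HTTP/1.0 200 OK, followed by header lines and a blank line, and some current browsers refuse header-less replies altogether. Here is a minimal sketch of building such a reply; http_response() is a hypothetical helper, and a real server would choose the Content-Type from the file name instead of hard-coding it.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap $body in a minimal HTTP/1.0 response.
# Header lines end in CRLF; an empty line separates headers from body.
# length() counts characters, which equals bytes for data read in binary mode.
sub http_response {
    my ($status, $type, $body) = @_;
    return "HTTP/1.0 $status\r\n"
         . "Content-Type: $type\r\n"
         . "Content-Length: " . length($body) . "\r\n"
         . "\r\n"
         . $body;
}

my $ok  = http_response("200 OK", "text/html", "<H1>Hello</H1>\n");
my $err = http_response("404 Not Found", "text/html",
                        "<TITLE>Error</TITLE><H1>Error</H1>\n");
print $ok;
```

In the server loop, something like print NEWSOCK http_response("200 OK", "text/html", join("", @lines)) would take the place of the bare print NEWSOCK @lines, with a similar call wrapping the error message.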

Congratulations. If you glue together all the code fragments in this article, you will have a bare-bones Web server. You will find all of the code in proper order at the end of this article to make it easier to review all the concepts presented here.

That's Not All

Although this Web server "works" as far as answering simple requests for information, it has a number of problems. First and foremost, it can only handle one request at a time: most production-quality servers can handle hundreds or thousands of simultaneous requests. Second, if you run this server on your machine, I can request
     /../../../../../../../etc/passwd
and get a copy of your password file. Obviously, a better access control mechanism is needed.
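Pending a proper access control mechanism, a server could at least refuse any path containing a ".." component. The safe_path() helper below is a hypothetical and deliberately crude sketch: it does not decode URL escapes such as %2e%2e, and it does nothing about symbolic links inside $docroot that point elsewhere.

```perl
use strict;
use warnings;

# Hypothetical, crude check: accept only absolute paths with no ".."
# component, so a request cannot climb above $docroot.
sub safe_path {
    my ($path) = @_;
    return 0 unless defined $path && $path =~ m{^/};
    for my $part (split m{/}, $path) {
        return 0 if $part eq '..';
    }
    return 1;
}

for my $p ('/index.html', '/../../../../../../../etc/passwd', '/a/../b') {
    printf "%-35s %s\n", $p, safe_path($p) ? "allowed" : "rejected";
}
```

Note that harmless paths such as /a/../b are rejected too; being overly strict is the safer failure here.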

In the third and final installment of this series, we will look at ways to solve these (and other) problems with our mini Web server.

     #!/packages/misc/bin/perl

     use Socket;

     $docroot = '/home/hal/public_html';
     $this_host = 'my-server.netmarket.com';
     $port = 8080;

     # Initialize C structure
     $server_addr = (gethostbyname($this_host))[4];
     $server_struct = pack("S n a4 x8", AF_INET, $port, $server_addr);

     # Set up socket
     $proto = (getprotobyname('tcp'))[2];
     socket(SOCK, PF_INET, SOCK_STREAM, $proto) ||
          die "Failed to initialize socket: $!\n";

     # Bind to address/port and set up pending queue
     setsockopt(SOCK, SOL_SOCKET, SO_REUSEADDR, 1) || die "setsockopt() failed: $!\n";
     bind(SOCK, $server_struct) || die "bind() failed: $!\n";
     listen(SOCK, SOMAXCONN) || die "listen() failed: $!\n";

     # Deal with requests
     for (;;) {
          # Grab next pending request
          $remote_host = accept(NEWSOCK, SOCK);
          die "accept() error: $!\n" unless ($remote_host);

          # Read client request and get $path
          while (<NEWSOCK>) {
               last if (/^\s*$/);
               next unless (/^GET /);
               $path = (split(/\s+/))[1];
          }

          # Print a line of logging info to STDOUT
          $raw_addr = (unpack("S n a4 x8", $remote_host))[2];
          $dot_addr = join(".", unpack("C4", $raw_addr));
          $name = (gethostbyaddr($raw_addr, AF_INET))[0];
          print "$dot_addr\t$name\t$path\n";

          # Respond with info or error message
          if (open(FILE, "< $docroot$path")) {
               @lines = <FILE>;
               print NEWSOCK @lines;
               close(FILE);
          }
          else {
               print NEWSOCK <<"EOErrMsg";
     <TITLE>Error</TITLE><H1>Error</H1>
     The following error occurred while trying to retrieve your information: $!
     EOErrMsg
          }

          # All done
          close(NEWSOCK);
     }


Reproduced from ;login: Vol. 21 No. 5, October 1996.


12/5/96ah