Perl Practicum: Network Wiles
(Part III)

by Hal Pomeranz

This is the last of three articles dealing with network programming in Perl. Strictly speaking, this article is not about network programming, per se, but rather deals with a pair of system calls that are often used in network servers. This article starts from where I left off in Part II, so a quick reread is in order if you are hazy on any of the concepts presented there.

Doing Several Things At Once

Last issue presented the typical loop a network server uses for handling pending network connections:
for ( ;; ) {
	$remote_host = accept(NEWSOCK, SOCK);
	die "accept() error: $!\n" unless ($remote_host);

	# do some work here

	close(NEWSOCK);
}
Unfortunately, during the "do some work here" phase, the network server is not handling requests that are queueing. Heavily loaded Web servers can get hundreds of requests per second, so handling them all in a serial fashion is unworkable. In a perfect world, the server would spend all of its time doing accept()s and let one or more other processes handle all of the specific requests for information in parallel.

UNIX-like systems support the fork()system call for creating new processes. The fork() call causes the program to create an exact duplicate of itself - all data structures and file handles (and buffered output) are copied to the new process and both processes continue execution from the point in the program at which fork() was called. The only difference between the two processes is that fork() returns zero to the new process (referred to as the "child") while the original process (the "parent") gets the process id number of the child process. The usual application of fork() in network daemons is to have the parent process call accept() and then immediately call fork(). The parent goes back to the next accept() while the child handles the information request from the remote machine and then exits once the request has been completed. In Perl:

for ( ;; ) {
	$remote_host = accept(NEWSOCK, SOCK);
	die "accept() error: $!\n" unless ($remote_host);

	# We're the parent if fork() returns non-zero

	last unless (fork());
	close(NEWSOCK);
}

# We've fallen out of the loop, we're the child.
# Do work here...
It is the child process that falls out of the for loop (i.e., when fork() returns zero). Otherwise, the parent closes the socket that was created for the pending request and jumps back up to the top of the loop and waits for the next accept(). It is safe for the parent to close NEWSOCK since the child has its own copy, which is still open.

The child continues executing the same code that used to live inside the for loop as shown in the last article (in ;login: Vol. 21 No. 5, October 1996) - a complete copy of the code for the mini Web server is reproduced at the end of this article. Once the request has been serviced, the child exits, but it does not die completely silently. When a child process exits, the parent is notified and must acknowledge the child before the child process can be terminated. If the parent does not acknowledge the child, then the child process stays around in limbo until the parent process exits (such stuck child processes are referred to as "zombies").

The parent receives its notification through the UNIX signal(3) interface. Specifically, every time a child exits, the parent receives a SIGCHLD signal and must respond in some fashion. Perl uses the special variable %SIG to define how a process will respond to a given signal: the keys of %SIG are the signal names, and the values are the name of a subroutine to call when the signal is received.

The easiest response is to simply ignore the signal. Using the keyword "IGNORE" instead of a subroutine name as a value in the %SIG array causes the process to acknowledge but discard any occurrences of the given signal:

	$SIG{"CHLD"} = "IGNORE";
at the top of your program will cause child processes to exit silently.

Protecting Your Data

In the last issue, our server responded to requests with:
if (open(FILE, "< $docroot$path")) {
	@lines = <FILE>;
	print NEWSOCK @lines;
	close(FILE);
}
where $docroot was defined at the top of the script, and $path was the document pathname in the HTTP request. We noted that users could request
	../../../../../../../etc/passwd
and get a copy of the system password file. Ideally, the server should allow only remote users to get files that are located under $docroot. Most UNIX-like systems support the chroot() system call, which enables a process to restrict the directories that can be accessed. The chroot() call takes a directory name as an argument and effectively makes that directory the root of the filesystem for the process. If our Web server were to
	chroot($docroot);
and then get the request shown above, the remote user would get only $docroot/etc/passwd. There are, however, a couple of problems with using chroot(). First, only processes running as the superuser can call chroot(). Although working in a chroot-ed environment is very secure, running all of your network servers as root is not. The usual workaround is for the process to give up superuser privileges as soon as it has performed the chroot() call - usually becoming a system user with no privileges like the "nobody" user. This is easily accomplished in Perl:
$user = "nobody";
unless ($uid = (getpwnam($user))[2]) {
	die "Attempt to run server as non-existent or superuser\n";
}

# [...] stuff happens [...]

# chroot() to docroot and then change our effective userid
#
chroot($docroot) || die "chroot() failed: $!\n";
$> = $uid;
The special variable $> is the effective userid of the process. Perl programs that are running as the superuser may change this variable to change their privilege level.

The second problem with chroot() is that, once the process has performed the chroot(), it can no longer access system configuration files or system devices that live outside the chroot-ed filesystem. This is why anonymous FTP servers require the administrator to create scaled-down copies of various system files in the anonymous FTP area: the FTP server does a chroot() before giving anonymous users access.

This really is not a big problem for our Web server, because the files we need to access are under $docroot. However, the server wants to try and look up the hostname of the remote host for logging purposes, and this step must be per formed before the chroot() or the lookup will fail.

$user = "nobody";
unless ($uid = (getpwnam($user))[2]) {
	die "Attempt to run server as non-existent or superuser\n";
}

# [...] stuff happens [...]

# Resolve hostname of remote machine before calling chroot()
#
$raw_addr = (unpack("S n a4 x8", $remote_host))[2];
$dot_addr = join(".", unpack("C4", $raw_addr));
$name = (gethostbyaddr($raw_addr, AF_INET))[0];

# chroot() to docroot and then change our effective userid
#
chroot($docroot) || die "chroot() failed: $!\n";
$> = $uid;
Not all network servers are amenable to running in a chroot-ed environment, but consider the option when developing your own servers.

That's It!

Network programming is fun once you get the hang of it. Feel free to take this basic framework and adapt it for your own needs. One improvement that could be made is to use syslog() instead of print() for outputting informational and error messages. Typically, nothing is watching the standard output of network daemons, so syslog is a more appropriate mechanism. See Syslog.pm in the Perl lib directory for more details.
#!/usr/local/bin/perl

use Socket;

$docroot = "/home/hal/public_html";
$this_host = "my-server.netmarket.com";
$port = 80;
$user = "nobody";

# Let children perish
#
$SIG{"CHLD"} = "IGNORE";

# Get userid for $user. Abort if userid is zero (superuser) or
# non-existent.
#
unless ($uid = (getpwnam($user))[2]) {
	die "Attempt to run server as non-existent or superuser\n"; }

# Initialize C structure
#
$server_addr = (gethostbyname($this_host))[4];
$server_struct = pack("S n a4 x8", AF_INET, $port,
				$server_addr);

# Set up socket
#
$proto = (getprotobyname("tcp"))[2];
socket(SOCK, PF_INET, SOCK_STREAM, $proto) ||
	die "Failed to initialize socket: $!\n";

# Bind to address/port and set up pending queue
#
setsockopt(SOCK, SOL_SOCKET, SO_REUSEADDR, 1) ||
	die "setsockopt() failed: $!\n";
bind(SOCK, $server_struct) || die "bind() failed: $!\n";
listen(SOCK, SOMAXCONN) || die "listen() failed: $!\n";

# Deal with requests
#
for ( ;; ) {
	# Grab next pending request
	#
	$remote_host = accept(NEWSOCK, SOCK);
	die "accept() error: $!\n" unless ($remote_host);

	# We're the parent if fork() returns non-zero
	#
	last unless (fork());
	close(NEWSOCK);
}

# *** If we've fallen out of the loop, then we're the child. ***

# Close master socket
#
close(SOCK);

# Resolve hostname of remote machine before calling chroot()
#
$raw_addr = (unpack("S n a4 x8", $remote_host))[2];
$dot_addr = join(".", unpack("C4", $raw_addr));
$name = (gethostbyaddr($raw_addr, AF_INET))[0];

# chroot() to docroot and then change our effective userid
#
chroot($docroot) || die "chroot() failed: $!\n";
$> = $uid;

# Read client request and get $path
#
while (<NEWSOCK>) {
	last if (/^\s*$/);
	next unless (/^GET /);
	$path = (split(/\s+/))[1];
}

# Print a line of logging info to STDOUT
#
print "$dot_addr\t$name\t$path\n";

# Respond with info or error message
#
if (open(FILE, "< $path")) {
	@lines = <FILE>;
	print NEWSOCK @lines;
	close(FILE);
}
else {
	print NEWSOCK <<"EOErrMsg";
<TITLE>Error</TITLE><H1>Error</H1>
The following error occurred while trying to retrieve your
information:
$! EOErrMsg
}

# All done
#
close(NEWSOCK);


Reproduced from ;login: Vol. 22 No. 5, February 1997.

Back to Table of Contents

2/12/97jd