CS 290
Lab 3: Building an HTTP Server
IMPORTANT: This is a team project. The maximum number of members
in a team is 2. You may work individually if you decide to do so, but
your project will be subject to the same grading standards. You are
required to use a source control system such as SVN or Mercurial. Set
it up before you start your project.
Pre-reading for this Lab
Before coming to the lab, study carefully the example server given in [1].
Also, familiarize yourself with the following functions in the socket
API: getservbyname, getprotobyname, bind, listen, accept, etc.
Purpose of the Lab
The objective of this lab is to implement an HTTP server that allows
an HTTP client (a web browser such as Firefox or Internet Explorer) to
connect to it and download files.
HTTP Protocol Overview
An HTTP client issues a `GET' request to a server in order to retrieve a
file. The general syntax of such a request is given below:
GET <sp> <Document Requested> <sp> HTTP/1.0 <crlf>
{<Other Header Information> <crlf>}*
<crlf>
where :
- <sp> stands for a whitespace character.
- <crlf> stands for a carriage return-linefeed pair, i.e. a carriage
return (ASCII character 13) followed by a linefeed (ASCII character 10).
- <crlf><crlf> is written "\r\n\r\n" in a C string.
- <Document Requested> gives the name of the file requested by the
client. As mentioned in the previous lab, this could be just a forward
slash ( / ) if the client is requesting the default file on the server.
- {<Other Header Information> <crlf>}* contains useful (but not
critical) information sent by a client. These can be ignored for this
lab. Note that this part can be composed of several lines, each
separated by a <crlf>.
- * is the Kleene star from regular expressions: the enclosed item may
appear zero or more times.
Finally, observe that the client ends the request with two carriage
return-linefeed pairs: <crlf><crlf>
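The request syntax above can be parsed with a few string operations. The following is a minimal sketch, not the required implementation; the function names parse_request_path and request_complete are hypothetical and chosen only for illustration.

```cpp
#include <string>

// Extracts <Document Requested> from the first line of a GET request.
// Returns an empty string if the request line is malformed.
std::string parse_request_path(const std::string &request) {
    // The request line ends at the first <crlf>.
    std::string::size_type eol = request.find("\r\n");
    if (eol == std::string::npos) return "";
    std::string line = request.substr(0, eol);

    // Expected shape: GET <sp> <Document Requested> <sp> HTTP/1.0
    if (line.compare(0, 4, "GET ") != 0) return "";
    std::string::size_type end = line.find(' ', 4);
    if (end == std::string::npos || end == 4) return "";
    return line.substr(4, end - 4);
}

// True once the terminating <crlf><crlf> has been received, which tells
// the server it can stop reading from the socket.
bool request_complete(const std::string &buffer) {
    return buffer.find("\r\n\r\n") != std::string::npos;
}
```

A server would typically keep appending the bytes it reads to a buffer until request_complete() returns true, and only then call parse_request_path().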
The function of an HTTP server is to parse the above request from a
client, identify the file being requested and send the file across to
the client.
However, before sending the actual document, the HTTP server must send
a response header to the client. The following shows a typical response
from an HTTP server when the requested file is found on the server:
HTTP/1.0 <sp> 200 <sp> Document <sp> follows <crlf>
Server: <sp> <Server-Type> <crlf>
Content-type: <sp> <Document-Type> <crlf>
{<Other Header Information> <crlf>}*
<crlf>
<Document Data>
where :
- <Server-Type> identifies the manufacturer/version of the server.
For this lab, you can set this to CS 290 lab3.
- <Document-Type> indicates to the client the type of document
being sent. This should be "text/html" for an html document,
"image/gif" for a gif file, "text/plain" for plain text, etc.
- {<Other Header Information><crlf>}*, as before, contains
some additional useful header information for the client to use. These
may be ignored for this lab.
- <Document Data> is the actual document requested. Observe that
this is separated from the response headers by two carriage
return-linefeed pairs: the one that ends the last header line and the
one that forms the empty line.
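As an illustration of the response format above, the sketch below builds the header for a found document. The helper names content_type_for and ok_header are hypothetical, and choosing the document type from the file extension is an assumption; the handout only lists the types to send.

```cpp
#include <string>

// Picks a <Document-Type> from the file extension. Only the types named in
// the handout are covered; anything else falls back to text/plain.
std::string content_type_for(const std::string &path) {
    std::string::size_type dot = path.rfind('.');
    std::string ext = (dot == std::string::npos) ? "" : path.substr(dot);
    if (ext == ".html" || ext == ".htm") return "text/html";
    if (ext == ".gif") return "image/gif";
    return "text/plain";
}

// Builds the response header sent before <Document Data>. The final
// "\r\n" is the empty line that separates headers from the document.
std::string ok_header(const std::string &path) {
    return "HTTP/1.0 200 Document follows\r\n"
           "Server: CS 290 lab3\r\n"
           "Content-type: " + content_type_for(path) + "\r\n"
           "\r\n";
}
```

The server would write the string returned by ok_header() to the socket and then write the file contents.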
If the requested file cannot be found on the server, the server must
send
a response header indicating the error. The following shows a typical
response:
HTTP/1.0 <sp> 404 File Not Found <crlf>
Server: <sp> <Server-Type> <crlf>
Content-type: <sp> <Document-Type> <crlf>
<crlf>
<Error Message>
where :
- <Document-Type> indicates the type of document (i.e. the error
message in this case) being sent. Since you are going to send a plain
text message, this should be set to text/plain.
- <Error Message> is a human readable description of the error
in plain text/html format (e.g. Could not find the
specified URL. The server returned an error).
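The error response can be assembled the same way. This is a minimal sketch; the function name not_found_response is hypothetical, and the message text is the example given above.

```cpp
#include <string>

// Builds the complete response for a missing file: status line, headers,
// the empty line, and a plain-text error message.
std::string not_found_response() {
    return "HTTP/1.0 404 File Not Found\r\n"
           "Server: CS 290 lab3\r\n"
           "Content-type: text/plain\r\n"
           "\r\n"
           "Could not find the specified URL. The server returned an error.\r\n";
}
```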
Procedure and Algorithm Details
This project is divided into three stages. Each stage is built on top of
the previous one. If you are not able to finish the later stages, you
may obtain partial credit by finishing and turning in the earlier stages.
Stage 0:
The source of an example time server program called daytime-server
is provided. See lab3-src. You can use this program to learn how to do
socket programming with servers. Download the file lab3-src.tar.gz,
then uncompress and untar it:
gunzip lab3-src.tar.gz
tar -xvf lab3-src.tar
Then build the server by typing make. Run daytime-server without
arguments to get information about how to use the server. Run the
server and read the sources to see how it is implemented. Some of the
functionality of the HTTP server that you will implement is already
available in this server.
Stage 1:
Basic Server
You will implement an iterative HTTP server that implements the
following basic algorithm:
- Open Passive Socket.
- Do Forever
  - Accept new TCP connection.
  - Read request from TCP connection and parse it.
  - Frame the appropriate response header depending on whether the URL
    requested is found on the server or not.
  - Write the response header to TCP connection.
  - Write requested document (if found) to TCP connection.
  - Close TCP connection.
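The algorithm above might be sketched as follows. This is an illustration under assumptions, not the daytime-server code: the names open_passive_socket and serve_forever are hypothetical, and the per-request work (read, parse, write header and document) is left to a callback supplied by the caller.

```cpp
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>

// Creates a TCP socket bound to `port` on all interfaces and puts it in
// listen mode. Returns the socket descriptor, or -1 on failure.
int open_passive_socket(int port, int queue_length) {
    int master = socket(AF_INET, SOCK_STREAM, 0);
    if (master < 0) return -1;

    int on = 1;  // lets the server be restarted quickly on the same port
    setsockopt(master, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

    struct sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(static_cast<uint16_t>(port));

    if (bind(master, reinterpret_cast<struct sockaddr *>(&addr), sizeof(addr)) < 0 ||
        listen(master, queue_length) < 0) {
        close(master);
        return -1;
    }
    return master;
}

// Iterative loop: one connection is served at a time, and the remaining
// connections wait in the listen queue.
void serve_forever(int master, void (*handle_request)(int)) {
    for (;;) {
        int slave = accept(master, NULL, NULL);
        if (slave < 0) continue;   // e.g. accept() was interrupted
        handle_request(slave);     // read request, write header and document
        close(slave);
    }
}
```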
The server that you will implement at this stage will not be
concurrent, i.e., it will not serve more than one client at a time (it
queues the remaining requests while processing each request). You can
base your implementation on the example server given in [1]. The server
should work as specified in the overview above. Make a copy of the
daytime server and name it "myhttpd.cpp". Add the rules to the Makefile
to build it.
POINTS WILL BE DEDUCTED FOR INCORRECT MAKEFILE.
Adding Concurrency
You will also add concurrency to the server. You will implement three
concurrency modes. The concurrency mode will be passed as a
command-line argument. The concurrency modes you will implement are the
following:
-f : Create a new process for each request
In this mode your HTTP server will fork a child process when a request
arrives. The child process will process this request while the parent
process waits for another incoming request. You will also have to
prevent the accumulation of inactive zombie processes. You can base
your implementation on the server given in [3].
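One common way to avoid zombies in the -f mode is to reap finished children from a SIGCHLD handler. The sketch below shows that approach; it is one option among several, and the names reap_children, install_zombie_reaper and serve_forking are hypothetical.

```cpp
#include <signal.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cerrno>
#include <cstring>

// SIGCHLD handler: reaps every child that has exited, without blocking.
static void reap_children(int) {
    int saved_errno = errno;
    while (waitpid(-1, NULL, WNOHANG) > 0) {
    }
    errno = saved_errno;
}

// Installs the handler. Returns 0 on success, -1 on failure.
int install_zombie_reaper() {
    struct sigaction sa;
    std::memset(&sa, 0, sizeof(sa));
    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;  // restart accept() if the signal interrupts it
    return sigaction(SIGCHLD, &sa, NULL);
}

// One child process per connection; the parent goes straight back to accept().
void serve_forking(int master, void (*handle_request)(int)) {
    for (;;) {
        int slave = accept(master, NULL, NULL);
        if (slave < 0) continue;
        pid_t pid = fork();
        if (pid == 0) {            // child: serve the request and exit
            close(master);
            handle_request(slave);
            close(slave);
            _exit(0);
        }
        close(slave);              // parent (also reached if fork() failed)
    }
}
```

The parent must close its copy of the slave socket; otherwise the connection stays open after the child finishes.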
-t : Create a new thread for each request
In this mode your HTTP server will create a new thread to process each
request that arrives. The thread will go away when the request is
completed.
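A thread-per-request mode could look like the sketch below, which uses detached POSIX threads so that each thread's resources are released when it finishes. The names request_handler, thread_args and spawn_request_thread are hypothetical.

```cpp
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

typedef void (*request_handler)(int);

// Everything a request thread needs, allocated per connection.
struct thread_args {
    int slave;
    request_handler handle;
};

static void *request_thread(void *p) {
    thread_args *args = static_cast<thread_args *>(p);
    args->handle(args->slave);   // read request, write response
    close(args->slave);
    delete args;
    return NULL;                 // the thread goes away here
}

// Starts a detached thread for an accepted connection.
// Returns 0 on success, or the pthread_create error code.
int spawn_request_thread(int slave, request_handler handle) {
    thread_args *args = new thread_args;
    args->slave = slave;
    args->handle = handle;

    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    pthread_t tid;
    int rc = pthread_create(&tid, &attr, request_thread, args);
    pthread_attr_destroy(&attr);

    if (rc != 0) {               // could not start a thread: drop the client
        close(slave);
        delete args;
    }
    return rc;
}
```

The main loop would call accept() and pass each returned socket to spawn_request_thread().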
-p : Pool of threads
In this mode your server will first put the master socket in listen
mode and then create a pool of 5 threads, where each thread executes
a procedure with a loop that runs forever, calling accept() and
dispatching the request. The idea is to have an iterative server running
in each thread. Having multiple threads calling accept() at the same
time will work, but it creates some overhead under Solaris (See [4]). To
avoid having multiple threads calling accept() at the same time, use a
MUTEX lock around the accept() call.
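The pool described above might be sketched like this. The names pool_config, pool_worker and start_pool are hypothetical; the mutex is held only around accept(), so the other threads can serve requests while one thread waits for a connection.

```cpp
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

typedef void (*request_handler)(int);

// Shared by all worker threads; it must outlive them.
struct pool_config {
    int master;               // socket that is already in listen mode
    request_handler handle;
};

static pthread_mutex_t accept_mutex = PTHREAD_MUTEX_INITIALIZER;

// Each worker is an iterative server of its own.
static void *pool_worker(void *p) {
    pool_config *cfg = static_cast<pool_config *>(p);
    for (;;) {
        pthread_mutex_lock(&accept_mutex);     // one thread in accept() at a time
        int slave = accept(cfg->master, NULL, NULL);
        pthread_mutex_unlock(&accept_mutex);
        if (slave < 0) continue;
        cfg->handle(slave);
        close(slave);
    }
    return NULL;
}

// Starts nthreads workers and returns how many were actually created.
int start_pool(pool_config *cfg, int nthreads) {
    int started = 0;
    for (int i = 0; i < nthreads; i++) {
        pthread_t tid;
        if (pthread_create(&tid, NULL, pool_worker, cfg) == 0) {
            pthread_detach(tid);
            started++;
        }
    }
    return started;
}
```

After start_pool(&cfg, 5) returns, the main thread can either run the same worker loop itself or simply block, since the workers never exit.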
If you want a review of threads see Introduction to Threads.
The format of the command should be:
myhttpd [-f|-t|-p] [<port>]
If no flags are passed, the server will be an iterative server like in
the Basic Server section. If <port> is not passed, you will
choose your own default port number. Make sure it is larger than 1024
and less than 65536.
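The command format above could be parsed as in the following sketch. The names concurrency_mode, server_options and parse_args are hypothetical, the default port 8080 is only an example value, and applying the 1024 to 65536 range to a port given on the command line is an assumption (the handout states that range for the default port).

```cpp
#include <cstdlib>
#include <cstring>

enum concurrency_mode { ITERATIVE, FORK_MODE, THREAD_MODE, POOL_MODE };

struct server_options {
    concurrency_mode mode;
    int port;
    bool ok;   // false: the caller should print the help text and exit
};

const int DEFAULT_PORT = 8080;  // assumption: any value in 1025..65535 works

// Parses: myhttpd [-f|-t|-p] [<port>]
server_options parse_args(int argc, const char *const argv[]) {
    server_options opts = { ITERATIVE, DEFAULT_PORT, true };
    int i = 1;
    if (i < argc && argv[i][0] == '-') {
        if (std::strcmp(argv[i], "-f") == 0) opts.mode = FORK_MODE;
        else if (std::strcmp(argv[i], "-t") == 0) opts.mode = THREAD_MODE;
        else if (std::strcmp(argv[i], "-p") == 0) opts.mode = POOL_MODE;
        else opts.ok = false;          // unknown flag, including -h
        i++;
    }
    if (i < argc) {
        char *end = NULL;
        long port = std::strtol(argv[i], &end, 10);
        if (end == argv[i] || *end != '\0' || port <= 1024 || port >= 65536)
            opts.ok = false;           // not a number or out of range
        else
            opts.port = static_cast<int>(port);
        i++;
    }
    if (i < argc) opts.ok = false;     // unexpected extra arguments
    return opts;
}
```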
MAKE SURE THAT THERE IS A HELP FUNCTION AND YOUR CODE IS INDENTED
AND EASY TO READ (there will be points for this).
This stage is due Friday, October 22nd, at 11:59pm. Write your
program in a directory called lab3-src. Make sure that your server can
be built by typing "make" in one of the lab machines. Include a
file README with the names and logins of the team members. Submit only
once for each team. You will turn in this part electronically by typing
the following command:
turnin -c cs290 -p lab3-1 lab3-src
Stage 2
Browsing Directories
In this stage you will add to your server the capacity to browse
directories. If the <Document Requested> in the request is a
directory, your HTTP server should return an HTML document with
hyperlinks to the contents of the directory. Also, you should be able
to recursively browse subdirectories contained in this directory. An
example of what a directory listing should look like is given in
http-root-dir. Check the man pages for opendir and readdir.
Also implement sorting by name, size, and modification time.
The following pdf file introduces the functions you might use for
this stage:
pso02.pdf
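A directory listing can be produced with opendir and readdir roughly as follows. This sketch sorts by name only; sorting by size or modification time would additionally need stat() on each entry. The function name directory_listing and the HTML layout are hypothetical, and entry names are not HTML-escaped here.

```cpp
#include <dirent.h>
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Returns an HTML page with one hyperlink per entry of fs_dir, sorted by
// name. url_path is the URL of the directory and is assumed to end in "/",
// so the relative links resolve correctly. Returns "" if the directory
// cannot be opened.
std::string directory_listing(const std::string &fs_dir, const std::string &url_path) {
    DIR *dir = opendir(fs_dir.c_str());
    if (dir == NULL) return "";

    std::vector<std::string> names;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        std::string name = entry->d_name;
        if (name != ".") names.push_back(name);  // ".." stays as the parent link
    }
    closedir(dir);
    std::sort(names.begin(), names.end());

    std::string html = "<html><head><title>Index of " + url_path +
                       "</title></head><body>\n<h1>Index of " + url_path +
                       "</h1>\n<ul>\n";
    for (std::size_t i = 0; i < names.size(); i++) {
        html += "<li><a href=\"" + names[i] + "\">" + names[i] + "</a></li>\n";
    }
    html += "</ul>\n</body></html>\n";
    return html;
}
```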
Requesting directories without the "/"
Requesting directories without "/" can confuse the relative paths
inside that directory. It is better to handle these requests using a
301 redirect message like Apache does. If the document requested is a
directory without a "/" at the end, then your server should send a 301
message redirecting that document to a document with "/" at the end.
See the following reply from Apache. Your server should do the same.
harry@vodka:~$ echo -ne "GET /homes/grr HTTP/1.0\r\nhost: www.cs.purdue.edu 80\r\n\r\n" | nc www.cs.purdue.edu 80
HTTP/1.0 301 Moved Permanently
Date: Tue, 10 Feb 2009 19:39:28 GMT
Server: Apache/1.3.37 (Unix)
Location: http://www.cs.purdue.edu/homes/grr/
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1
139
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>301 Moved Permanently</TITLE>
</HEAD><BODY>
<H1>Moved Permanently</H1>
The document has moved <A
HREF="http://www.cs.purdue.edu/homes/grr/">here</A>.<P>
<HR>
<ADDRESS>Apache/1.3.37 Server at www.cs.purdue.edu Port 80</ADDRESS>
</BODY></HTML>
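A reply of this kind could be built as in the sketch below. The function name redirect_response is hypothetical, and taking the host (with port, if any) from the caller is an assumption; it might come from the request's Host header or from the server's own configuration.

```cpp
#include <string>

// Builds a 301 response that redirects url_path (a directory requested
// without the trailing "/") to the same path with "/" appended.
std::string redirect_response(const std::string &host, const std::string &url_path) {
    std::string location = "http://" + host + url_path + "/";
    return "HTTP/1.0 301 Moved Permanently\r\n"
           "Server: CS 290 lab3\r\n"
           "Location: " + location + "\r\n"
           "Content-type: text/html\r\n"
           "\r\n"
           "<html><body>The document has moved <a href=\"" + location +
           "\">here</a>.</body></html>\r\n";
}
```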
MAKE SURE THAT THERE IS A HELP FUNCTION when using -h AND YOUR
CODE IS INDENTED AND EASY TO READ (there will be points for this).
This stage is due Friday, October 29th, at 11:59pm. Write your program
in a directory called lab3-src. Make sure that your server can be built
by typing "make" in one of the lab machines. You will turn in this part
electronically by typing the following command:
turnin -c cs290 -p lab3-2 lab3-src
URL to File Mapping
In Apache and other web servers, the mapping from URL documents
to directories in the file system is done through a configuration file.
In your project you will do this mapping programmatically in the
following way:
If the document is icons/document in the URL, then your server will
serve the document from http-root-dir/icons/document.
If the document is cgi-bin/script, then your server will execute the
script in http-root-dir/cgi-bin.
Otherwise, the document requested will be served from
http-root-dir/htdocs. The document could be a subdirectory or a file in
a subdirectory in htdocs.
IMPORTANT: A URL for your server should not contain http-root-dir or
htdocs in it.
IMPORTANT: Make sure that a user cannot request or browse files above
the htdocs/, cgi-bin/ or icons/ directories. You can use the
"realpath" function to translate paths with ".." and other relative
paths to absolute paths.
IMPORTANT: The default page when requesting http://host:port should be
the index.html in htdocs.
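The mapping rules and the realpath check above might be combined as in this sketch. The function names map_url and is_inside are hypothetical; `root` stands for the path of http-root-dir on disk.

```cpp
#include <limits.h>
#include <stdlib.h>
#include <string>

// Maps the path part of a URL to a path under root (http-root-dir).
std::string map_url(const std::string &root, const std::string &url_path) {
    if (url_path == "/") return root + "/htdocs/index.html";   // default page
    if (url_path.compare(0, 7, "/icons/") == 0 ||
        url_path.compare(0, 9, "/cgi-bin/") == 0) {
        return root + url_path;
    }
    return root + "/htdocs" + url_path;
}

// True if `path` resolves (after following ".." and symbolic links) to
// allowed_dir itself or to something below it. Returns false if either
// path does not exist, which the caller can treat as "not found".
bool is_inside(const std::string &allowed_dir, const std::string &path) {
    char base[PATH_MAX];
    char real[PATH_MAX];
    if (realpath(allowed_dir.c_str(), base) == NULL) return false;
    if (realpath(path.c_str(), real) == NULL) return false;
    std::string b(base), r(real);
    if (b == "/") return true;
    return r == b || r.compare(0, b.size() + 1, b + "/") == 0;
}
```

For a request that maps into htdocs, the server would call is_inside(root + "/htdocs", mapped_path) before opening the file, and likewise with icons or cgi-bin for the other two cases.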
Stage 3
CGI-BIN
In this stage you will implement cgi-bin. When a request like this
one arrives:
GET <sp> /cgi-bin/<script>?{<var>=<val>&}*{<var>=<val>} <sp> HTTP/1.0 <crlf>
{<Other Header Information> <crlf>}*
<crlf>
the child process that is processing the request will call execv
on the program in cgi-bin/<script>.
There are two ways the variable-value pairs in
{<var>=<val>&}*{<var>=<val>} are passed to the cgi-bin script: the GET
method and the POST method. You will implement the GET method, and for
extra points you may implement the POST method.
In the GET method the string of variables {<var>=<val>&}*{<var>=<val>}
is passed to the <script> program as an environment variable
QUERY_STRING. It is up to the <script> program to decode this string.
Also, if this string of variables exists, you should set the
REQUEST_METHOD environment variable to "GET". The output of <script>
will be sent back to the client.
For more information on how cgi-bin works see The
Common Gateway Interface <http://hoohoo.ncsa.uiuc.edu/cgi/>.
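The GET method described above could be sketched as follows. The names cgi_request, split_cgi_url and exec_cgi are hypothetical. Redirecting the script's standard output to the slave socket with dup2 is an assumption about how its output reaches the client, and exec_cgi is meant to run in the child process after fork().

```cpp
#include <stdlib.h>
#include <unistd.h>
#include <string>

struct cgi_request {
    std::string script;   // name of the program in cgi-bin
    std::string query;    // {<var>=<val>&}*{<var>=<val>}, may be empty
};

// Splits "/cgi-bin/<script>?<vars>" into the script name and the variable
// string. Both fields are empty if the URL is not under /cgi-bin/.
cgi_request split_cgi_url(const std::string &url_path) {
    const std::string prefix = "/cgi-bin/";
    cgi_request req;
    if (url_path.compare(0, prefix.size(), prefix) != 0) return req;
    std::string rest = url_path.substr(prefix.size());
    std::string::size_type q = rest.find('?');
    req.script = rest.substr(0, q);   // whole string when there is no '?'
    if (q != std::string::npos) req.query = rest.substr(q + 1);
    return req;
}

// Runs the script in the current (child) process. Does not return.
void exec_cgi(int slave, const std::string &script_path, const std::string &query) {
    if (!query.empty()) {
        setenv("REQUEST_METHOD", "GET", 1);
        setenv("QUERY_STRING", query.c_str(), 1);
    }
    dup2(slave, STDOUT_FILENO);       // the script's output goes to the client
    close(slave);
    char *const argv[] = { const_cast<char *>(script_path.c_str()), NULL };
    execv(script_path.c_str(), argv);
    _exit(1);                         // reached only if execv failed
}
```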
Loadable Modules
In this stage you will implement loadable modules to be able to extend
your server. When the name of a cgi-bin script ends with .so, instead
of calling exec for this file your server will load that module into
memory using dlopen(), if it has not been previously loaded. Then
your server will transfer control to this module by first looking up
the function extern "C" httprun(int ssock, char * query_string)
in that module using dlsym() and then calling httprun(), passing
the slave socket and the query string as parameters. httprun() will
write the response to the ssock slave socket using the parameters in
query_string.
For example, a request of the form:
http://localhost:8080/cgi-bin/hello.so?a=b
will make your server load the loadable module hello.so into memory
and then call the function httprun() in this module with ssock
and query_string as parameters. It is up to the module to write the
response to ssock. Your server needs to keep track of which modules
have already been loaded so that it does not call dlopen() multiple
times for the same module.
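Keeping track of loaded modules can be done with a map from module path to dlopen() handle, as in the sketch below. The names httprun_fn, loaded_modules and find_httprun are hypothetical, and the void return type of httprun is an assumption because the handout does not state it. The map is not protected by a lock here, so the threaded modes would need a mutex around it. Depending on the system, the program may need to be linked with -ldl.

```cpp
#include <dlfcn.h>
#include <map>
#include <string>

// Assumed signature of the module entry point (return type not given above).
typedef void (*httprun_fn)(int ssock, char *query_string);

// Handles of the modules loaded so far, keyed by module path.
static std::map<std::string, void *> loaded_modules;

// Returns the httprun function of the module, loading the module only the
// first time it is requested. Returns NULL if the module cannot be loaded
// or does not export httprun. module_path should contain a "/" so that
// dlopen() opens that file instead of searching the library path.
httprun_fn find_httprun(const std::string &module_path) {
    void *handle = NULL;
    std::map<std::string, void *>::iterator it = loaded_modules.find(module_path);
    if (it != loaded_modules.end()) {
        handle = it->second;                       // already loaded
    } else {
        handle = dlopen(module_path.c_str(), RTLD_LAZY);
        if (handle == NULL) return NULL;
        loaded_modules[module_path] = handle;
    }
    return reinterpret_cast<httprun_fn>(dlsym(handle, "httprun"));
}
```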
There is an example of how to use loadable modules in your lab3-src/.
Also, in this part, you will need to rewrite the script
http-root-dir/cgi-src/jj.c into a loadable module and name it jj-mod.c.
Hint: Use the call fdopen to be able to use buffered and formatted
calls such as fprintf() to write to the slave socket. For example,
at the top of httprun() in jj-mod.c call
FILE * fssock = fdopen( ssock, "r+");
Then you can use the following to print to the slave socket:
fprintf (fssock, "tomato, and mayo.<P>%c",LF);
Remember to close fssock at the end of httprun().
fclose( fssock);
MAKE SURE THAT THERE IS A HELP FUNCTION AND YOUR CODE IS INDENTED
AND EASY TO READ (there will be points for this)
Implementing the Statistics and Log pages
You will implement a page http://localhost:<port>/stats
with the following:
- The names of the team members
- The time the server has been up
- The number of requests since the server started
- The minimum service time and the URL request that took this time.
- The maximum service time and the URL request that took this time.
The service time is the time it takes to service a request, measured
from when the request is accepted until the socket is closed. Use the
function timer_gettime to measure the duration of the requests and link
your program with -lrt.
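The bookkeeping for the minimum and maximum service times could look like the sketch below. Note that it reads the clock with clock_gettime(CLOCK_MONOTONIC) rather than the timer_gettime call named above; that is a substitution made for this illustration, since clock_gettime returns the current time directly. The names server_stats, now_seconds and record_request are hypothetical. In the threaded modes the structure would need a mutex, and in the fork mode the children do not share it with the parent.

```cpp
#include <time.h>
#include <string>

struct server_stats {
    long requests;
    double min_time, max_time;        // service times in seconds
    std::string min_url, max_url;     // requests that took those times
    server_stats() : requests(0), min_time(0), max_time(0) {}
};

// Current value of a clock that never jumps backwards, in seconds.
double now_seconds() {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

// Records one finished request. `seconds` is the time between accept()
// returning and the socket being closed.
void record_request(server_stats &stats, const std::string &url, double seconds) {
    if (stats.requests == 0 || seconds < stats.min_time) {
        stats.min_time = seconds;
        stats.min_url = url;
    }
    if (stats.requests == 0 || seconds > stats.max_time) {
        stats.max_time = seconds;
        stats.max_url = url;
    }
    stats.requests++;
}
```

A request handler would call now_seconds() right after accept() and again after close(), and pass the difference to record_request().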
Also implement a page http://localhost:<port>/logs that will display a
list of all the requests so far, including in each line:
- The source host of the request
- The directory requested
The log will be stored into a file that will be preserved across runs.
Turning in your project
1. You will present your project to your PSO instructor during PSO
time. If you will not be able to attend your PSO, you are responsible
for arranging another time with the instructor for the presentation.
2. Make sure that your server uses the http-root-dir and that it loads
the index.html from this directory by default. Test the simple test,
the complex test, browsing directories and the cgi-bin scripts. Your
PSO instructors will use this directory during the presentation.
3. Write a short README file that includes:
a) Features in the handout that you have implemented
b) Features in the handout that you have not implemented
c) Extra features
Include this file in your server's directory lab3-src/
4. You still need to turn in your project electronically.
Write your program in a directory called lab3-src. Make sure that your
server can be built by typing "make" in one of the lab machines.
IMPORTANT: Do not include the http-root-dir in your submitted files.
You will turn in this part electronically by typing the following
command
from a lab machine before your presentation:
turnin -c cs290 -p lab3-3 lab3-src
The deadline for the project is November 8th at 11:59, one day
before the presentations take place. The presentations will take place
on November 9th and 11th during your PSO. The list of available times
will be posted in LWSN1169 so you can sign up for your presentation
time.
The grade will be based on how well your server works, the organization
of your code, as well as the extra features you include in your
project. Do not forget the README file.
Reading and References
- [1] Chapter 30 in `Computer Networks and Internets' by Douglas E.
Comer - "Example of a client and a server".
- [2] Chapter 10 in `Internetworking with TCP/IP - Vol 3' by Douglas E.
Comer and David L. Stevens - "Iterative, Connection Oriented Servers (TCP)".
- [3] Chapter 11 in `Internetworking with TCP/IP - Vol 3' by Douglas E.
Comer and David L. Stevens - "Concurrent, Connection Oriented Servers (TCP)".
- [4] RFC 1945 defines the HTTP 1.0 protocol. You can access this by
typing `rfc 1945' on your console.
- [5] "UNIX Network Programming Vol 1" by Richard Stevens.