CS 252
Lab 5: Building an HTTP Server

IMPORTANT: This project will be done individually.

Pre-reading for this Lab

Before coming to the lab, carefully study the example server given in [1]. Also, familiarize yourself with the following functions in the socket API: getservbyname, getprotobyname, bind, listen, accept, etc.

Purpose of the Lab

The objective of this lab is to implement an HTTP server that will allow an HTTP client (a web browser like Firefox or Internet Explorer) to connect to it and download files.

HTTP Protocol Overview

An HTTP client issues a GET request to a server in order to retrieve a file. The general syntax of such a request is given below:

GET <sp> <Document Requested> <sp> HTTP/1.0 <crlf>
{<Other Header Information> <crlf>}*
<crlf>

where :

<sp> stands for a space character, and
<crlf> stands for a carriage return-linefeed pair, i.e. a carriage return (ASCII character 13) followed by a linefeed (ASCII character 10).
<crlf><crlf> is written as "\r\n\r\n" in C.
<Document Requested> gives us the name of the file requested by the client. As mentioned in the previous lab, this could be just a forward slash ( / ) if the client is requesting the default file on the server.
{<Other Header Information> <crlf>}* contains useful (but not critical) information sent by a client. These can be ignored for this lab. Note that this part can be composed of several lines, each separated by a <crlf>.
* is the Kleene star from regular expressions: the bracketed group may appear zero or more times.

Finally, observe that the client ends the request with two carriage return-linefeed pairs: <crlf><crlf>
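
The request syntax above can be parsed with a few string operations. The following is a minimal sketch, not the required implementation: the function name parseRequestedDocument is hypothetical, and it assumes the whole request has already been read into one string.

```cpp
#include <string>

// Hypothetical helper: returns the <Document Requested> field of a GET
// request, or an empty string if the request line is malformed.
std::string parseRequestedDocument(const std::string &request) {
    // The request line ends at the first <crlf>.
    std::string::size_type eol = request.find("\r\n");
    if (eol == std::string::npos) return "";
    std::string line = request.substr(0, eol);

    // Expected shape: GET <sp> <Document Requested> <sp> HTTP/1.0
    if (line.compare(0, 4, "GET ") != 0) return "";
    std::string::size_type end = line.find(' ', 4);
    if (end == std::string::npos || end == 4) return "";
    return line.substr(4, end - 4);
}
```

A server would typically keep reading from the socket until the buffer contains "\r\n\r\n" and only then call a helper like this one.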

The function of an HTTP server is to parse the above request from a client, identify the file being requested, and send the file to the client. However, before sending the actual document, the HTTP server must send a response header to the client. The following shows a typical response from an HTTP server when the requested file is found on the server:

HTTP/1.1 <sp> 200 <sp> Document <sp> follows <crlf>
Server: <sp> <Server-Type> <crlf>
Content-type: <sp> <Document-Type> <crlf>
{<Other Header Information> <crlf>}*
<crlf>
<Document Data>

where :

<Server-Type> identifies the manufacturer/version of the server. For this lab, you can set this to CS 252 lab5.
<Document-Type> indicates to the client, the type of document being sent. This should be "text/html" for an html document, "image/gif" for a gif file, "text/plain" for plain text, etc.
{<Other Header Information><crlf>}* as before, contains some additional useful header information for the client to use. These may be ignored for this lab.
<Document Data> is the actual document requested. Observe that this is separated from the response headers by two carriage return-linefeed pairs.
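
As an illustration of the response format above, here is a small sketch that builds the header for a file that was found. The function names documentTypeFor and buildOkHeader are hypothetical, and the extension table covers only the three types mentioned in this handout; anything else falls back to text/plain as an assumption.

```cpp
#include <string>

// Hypothetical helper: maps a file name to a <Document-Type> by extension.
std::string documentTypeFor(const std::string &path) {
    std::string::size_type dot = path.rfind('.');
    std::string ext = (dot == std::string::npos) ? "" : path.substr(dot);
    if (ext == ".html" || ext == ".htm") return "text/html";
    if (ext == ".gif") return "image/gif";
    return "text/plain";  // assumed fallback for unknown extensions
}

// Builds the response header, including the blank line that separates
// the header from <Document Data>.
std::string buildOkHeader(const std::string &path) {
    return "HTTP/1.1 200 Document follows\r\n"
           "Server: CS 252 lab5\r\n"
           "Content-type: " + documentTypeFor(path) + "\r\n"
           "\r\n";
}
```

The document data is then written to the socket right after this header.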

If the requested file cannot be found on the server, the server must send a response header indicating the error. The following shows a typical response:

HTTP/1.1 <sp> 404 <sp> File <sp> Not <sp> Found <crlf>
Server: <sp> <Server-Type> <crlf>
Content-type: <sp> <Document-Type> <crlf>
<crlf>
<Error Message>

where :

<Document-Type> indicates the type of document (i.e. error message in this case) being sent. Since you are going to send a plain text message, this should be set to text/plain.
<Error Message> is a human readable description of the error in plain text/html format indicating the error (e.g. Could not find the specified URL. The server returned an error).
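
The error response can be assembled the same way. This is only a sketch; the function name buildNotFoundResponse is hypothetical, and the message text is the example wording from the description above.

```cpp
#include <string>

// Builds the complete 404 response: header, blank line, and a plain-text
// error message.
std::string buildNotFoundResponse() {
    const std::string message =
        "Could not find the specified URL. The server returned an error.";
    return "HTTP/1.1 404 File Not Found\r\n"
           "Server: CS 252 lab5\r\n"
           "Content-type: text/plain\r\n"
           "\r\n" + message;
}
```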

Procedure and Algorithm Details

This project is divided into three stages. Each stage is built on top of the previous one.

Stage 0:

The source of an example time server program called daytime-server is provided. You can use this program to learn how to do socket programming with servers. In this project you will use git for source control, both to download your sources and to keep track of your project. It is important that you push your changes at least twice a week so we can see the progress of your project. Please see the git tutorial first. Here is a summary of the commands that you will be using.
Log in to an sslab machine (sslab01-sslab20) and type:

	   git config --global user.name $USER
	   git config --global user.email "$USER@purdue.edu"
	   git clone ssh://$USER@data.cs.purdue.edu/homes/cs252/sourcecontrol/work/$USER/lab5-src.git

When you have done a group of modifications and you have reached a milestone, then it is time to push your changes. Type:

	   git commit -am "Type here a description of the changes you have made."
	   git push

If you want to see a list of all the changes you have made, type:

	   git log

See the git tutorial described above. It is important that you commit/push your changes at least twice a week. Then build the server by typing make. Run the server by typing daytime-server without arguments to get information about how to use the server. Run the server and read the sources to see how it is implemented. Some of the functionality of the HTTP server that you will implement is already available in this server.

Stage 1:

Basic Server

You will implement an iterative HTTP server that implements the following basic algorithm:

Open Passive Socket.

Do Forever

    Accept new TCP connection.

    Read request from TCP connection and parse it.

    Frame the appropriate response header depending on whether the URL requested is found on the server or not.

    Write the response header to TCP connection.

    Write requested document (if found) to TCP connection.

    Close TCP connection.
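
The algorithm above maps onto the socket API roughly as follows. This is a sketch under assumptions, not the daytime-server code: openPassiveSocket, serveForever, and the handleRequest callback are hypothetical names, and error handling is kept to a minimum.

```cpp
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstring>

// Hypothetical helper: creates a TCP socket bound to `port` on all
// interfaces and puts it in listen mode. Returns the descriptor, or -1.
int openPassiveSocket(int port, int queueLength) {
    int masterSocket = socket(AF_INET, SOCK_STREAM, 0);
    if (masterSocket < 0) return -1;

    // Allow the port to be reused right after the server is restarted.
    int on = 1;
    setsockopt(masterSocket, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

    struct sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(static_cast<unsigned short>(port));

    if (bind(masterSocket, reinterpret_cast<struct sockaddr *>(&addr),
             sizeof(addr)) < 0 ||
        listen(masterSocket, queueLength) < 0) {
        close(masterSocket);
        return -1;
    }
    return masterSocket;
}

// Iterative loop: serves one client at a time. `handleRequest` is a
// hypothetical callback that reads the request and writes the response.
void serveForever(int masterSocket, void (*handleRequest)(int)) {
    for (;;) {
        int slaveSocket = accept(masterSocket, nullptr, nullptr);
        if (slaveSocket < 0) continue;  // e.g. interrupted by a signal
        handleRequest(slaveSocket);
        close(slaveSocket);
    }
}
```

While handleRequest runs, new connections wait in the listen queue, which is the queuing behavior described for the iterative server.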

The server that you will implement at this stage will not be concurrent, i.e., it will not serve more than one client at a time (it queues the remaining requests while processing each request). You can base your implementation on the example server given in [1]. The server should work as specified in the overview above. Make a copy of the daytime server and name it "myhttpd.cpp". Add the rules to the Makefile to build it.

POINTS WILL BE DEDUCTED FOR INCORRECT MAKEFILE.

Adding Concurrency

You will also add concurrency to the server. You will implement three concurrency modes. The concurrency mode will be passed as argument. The concurrency modes you will implement are the following:

-f : Create a new process for each request

In this mode your HTTP server will fork a child process when a request arrives. The child process will process this request while the parent process waits for another incoming request. You will also have to prevent the accumulation of inactive zombie processes. You can base your implementation on the server given in [3].
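
One common way to avoid zombies is to reap exited children from a SIGCHLD handler. The sketch below shows that approach together with a fork-per-request dispatch; it is one possible design, not necessarily the one used in [3], and the names reapChildren, installZombieReaper, and dispatchWithFork are hypothetical.

```cpp
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cerrno>
#include <cstring>

// SIGCHLD handler: reap every child that has exited, without blocking.
extern "C" void reapChildren(int) {
    int savedErrno = errno;
    while (waitpid(-1, nullptr, WNOHANG) > 0) {
    }
    errno = savedErrno;
}

// Installs the handler. SA_RESTART asks the kernel to restart an
// interrupted accept() instead of failing it with EINTR.
int installZombieReaper() {
    struct sigaction sa;
    std::memset(&sa, 0, sizeof(sa));
    sa.sa_handler = reapChildren;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGCHLD, &sa, nullptr);
}

// Process-per-request dispatch. `handleRequest` is a hypothetical callback.
void dispatchWithFork(int masterSocket, int slaveSocket,
                      void (*handleRequest)(int)) {
    pid_t pid = fork();
    if (pid == 0) {
        close(masterSocket);  // the child does not accept connections
        handleRequest(slaveSocket);
        close(slaveSocket);
        _exit(0);
    }
    // Parent (or fork failure): the parent's copy is no longer needed.
    close(slaveSocket);
}
```

The parent must close its copy of the slave socket; otherwise the connection stays open after the child finishes and the browser keeps waiting.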

-t : Create a new thread for each request

In this mode your HTTP server will create a new thread to process each request that arrives. The thread will go away when the request is completed.
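
A thread-per-request dispatch can be sketched with POSIX threads as below. The names ThreadJob, runJob, and dispatchWithThread are hypothetical. The thread is created detached so that it "goes away" on its own when the request completes and nobody has to join it.

```cpp
#include <pthread.h>
#include <unistd.h>

// Everything a worker thread needs to serve one request.
struct ThreadJob {
    int slaveSocket;
    void (*handleRequest)(int);  // hypothetical request handler
};

// Thread entry point: serve the request, then release all resources.
extern "C" void *runJob(void *arg) {
    ThreadJob *job = static_cast<ThreadJob *>(arg);
    job->handleRequest(job->slaveSocket);
    close(job->slaveSocket);
    delete job;
    return nullptr;
}

// Starts a detached thread for one accepted connection.
// Returns 0 on success or the pthread_create error code.
int dispatchWithThread(int slaveSocket, void (*handleRequest)(int)) {
    ThreadJob *job = new ThreadJob{slaveSocket, handleRequest};

    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);

    pthread_t tid;
    int rc = pthread_create(&tid, &attr, runJob, job);
    pthread_attr_destroy(&attr);

    if (rc != 0) {  // thread was not created: clean up here instead
        close(slaveSocket);
        delete job;
    }
    return rc;
}
```

The job is allocated on the heap because a local variable in the accept loop could be overwritten by the next connection before the new thread reads it.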

-p: Pool of threads

In this mode your server will first put the master socket in listen mode and then create a pool of 5 threads. Each thread will execute a procedure with a while loop that runs forever, calling accept() and dispatching the request. The idea is to have an iterative server running in each thread. Having multiple threads call accept() at the same time will work, but it creates some overhead under Solaris (see [4]). To avoid having multiple threads call accept() at the same time, use a mutex lock around the accept() call.

If you want a review of threads see Introduction to Threads.

The format of the command should be:

myhttpd [-f|-t|-p] [<port>]

If no flags are passed the server will be an iterative server like in the Basic Server section. If <port> is not passed, you will choose your own default port number. Make sure it is larger than 1024 and less than 65536.
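
The command format above can be parsed as in the sketch below. All names here (ConcurrencyMode, ServerOptions, parseArguments) are hypothetical, DEFAULT_PORT = 8080 is only an example value, and applying the 1024 to 65536 range check to a port given on the command line is an assumption. When parsing fails, the server would print its help text.

```cpp
#include <cstdlib>
#include <cstring>

enum ConcurrencyMode { ITERATIVE, FORK_PER_REQUEST, THREAD_PER_REQUEST, THREAD_POOL };

struct ServerOptions {
    ConcurrencyMode mode;
    int port;
};

const int DEFAULT_PORT = 8080;  // assumption: pick your own default

// Parses: myhttpd [-f|-t|-p] [<port>]
// Returns false if the arguments do not match that format.
bool parseArguments(int argc, char **argv, ServerOptions *out) {
    out->mode = ITERATIVE;
    out->port = DEFAULT_PORT;
    int i = 1;

    // Optional concurrency flag.
    if (i < argc && argv[i][0] == '-') {
        if (std::strcmp(argv[i], "-f") == 0) out->mode = FORK_PER_REQUEST;
        else if (std::strcmp(argv[i], "-t") == 0) out->mode = THREAD_PER_REQUEST;
        else if (std::strcmp(argv[i], "-p") == 0) out->mode = THREAD_POOL;
        else return false;
        i++;
    }

    // Optional port number.
    if (i < argc) {
        char *end = nullptr;
        long port = std::strtol(argv[i], &end, 10);
        if (end == argv[i] || *end != '\0' || port <= 1024 || port >= 65536)
            return false;
        out->port = static_cast<int>(port);
        i++;
    }

    return i == argc;  // reject any extra arguments
}
```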

MAKE SURE THAT THERE IS A HELP FUNCTION AND YOUR CODE IS INDENTED AND EASY TO READ (there will be points for this).

Stage 2

Browsing Directories

In this stage you will add to your server the capacity to browse directories. If the <Document Requested> in the request is a directory, your HTTP server should return an HTML document with hyperlinks to the contents of the directory. Also, you should be able to recursively browse subdirectories contained in this directory. An example of how a directory should look is given in http-root-dir. Check the man pages for opendir and readdir.

Also implement sorting by name, size, and modification time.
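
A directory listing can be produced with opendir, readdir, and stat, as in the sketch below. The names DirEntry, listDirectory, and renderDirectoryPage are hypothetical, the HTML layout is a bare list rather than the layout shown in http-root-dir, and only sorting by name is shown. Sorting by size or modification time would swap in a different comparison on the same struct.

```cpp
#include <dirent.h>
#include <sys/stat.h>
#include <algorithm>
#include <string>
#include <vector>

struct DirEntry {
    std::string name;
    off_t size;
    time_t modified;
    bool isDirectory;
};

// Appends the entries of dirPath (skipping ".") to `entries`, then sorts
// by name. Returns false if the directory cannot be opened.
bool listDirectory(const std::string &dirPath, std::vector<DirEntry> *entries) {
    DIR *dir = opendir(dirPath.c_str());
    if (dir == nullptr) return false;

    struct dirent *ent;
    while ((ent = readdir(dir)) != nullptr) {
        std::string name = ent->d_name;
        if (name == ".") continue;
        struct stat st;
        if (stat((dirPath + "/" + name).c_str(), &st) != 0) continue;
        entries->push_back({name, st.st_size, st.st_mtime, S_ISDIR(st.st_mode) != 0});
    }
    closedir(dir);

    std::sort(entries->begin(), entries->end(),
              [](const DirEntry &a, const DirEntry &b) { return a.name < b.name; });
    return true;
}

// Renders the listing as HTML. `urlPath` is the requested directory and
// is assumed to end in "/". Names are not HTML-escaped in this sketch.
std::string renderDirectoryPage(const std::string &urlPath,
                                const std::vector<DirEntry> &entries) {
    std::string html = "<html><body><h1>Index of " + urlPath + "</h1><ul>\n";
    for (const DirEntry &e : entries) {
        std::string target = e.name + (e.isDirectory ? "/" : "");
        html += "<li><a href=\"" + urlPath + target + "\">" + target + "</a></li>\n";
    }
    html += "</ul></body></html>\n";
    return html;
}
```

Because subdirectory links end in "/", following one sends a new directory request to the server, which gives the recursive browsing described above.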

MAKE SURE THAT THERE IS A HELP FUNCTION AND YOUR CODE IS INDENTED AND EASY TO READ (there will be points for this)

For purposes of pacing, you should have this stage completed by Monday, April 6th, at 11:59pm. Write your program in a directory called lab5-src. Make sure that your server can be built by typing "make" on one of the lab machines. You will turn in this part electronically by typing the following commands:

         git commit -am "Lab5 Part 1 submission" 
         git push

Verify that you have turned in your files correctly by typing:

        git status

Stage 3

CGI-BIN

In this stage you will implement cgi-bin. When a request like this one arrives:

GET <sp> /cgi-bin/<script>?{<var>=<val>&}*{<var>=<val>}<sp> HTTP/1.0 <crlf>
{<Other Header Information> <crlf>}*
<crlf>

the child process that is processing the request will call execv on the program in cgi-bin/<script>.

There are two ways the variable-value pairs in {<var>=<val>&}*{<var>=<val>} are passed to the cgi-bin script: the GET method and the POST method. You will implement the GET method, and for extra points you may implement the POST method.

In the GET method the string of variables {<var>=<val>&}*{<var>=<val>} is passed to the <script> program as an environment variable QUERY_STRING. It is up to the <script> program to decode this string. Also if this string of variables exists, you should set the REQUEST_METHOD environment variable to "GET". The output of <script> will be sent back to the client.

In summary your cgi-bin implementation should:

1. Fork a child process.
2. Set the environment variable REQUEST_METHOD=GET.
3. Set the environment variable QUERY_STRING=(arguments after ?).
4. Redirect the output of the child process to the slave socket.
5. Print the following header:

HTTP/1.1 <sp> 200 <sp> Document <sp> follows <crlf>
Server: <sp> <Server-Type> <crlf>

6. Execute the script.

The script or cgi program will print the content type and will generate an output that is sent to the browser.
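
The cgi-bin steps summarized above could be sketched as follows. The names splitCgiRequest and runCgi are hypothetical, scriptPath is assumed to be the already-resolved file system path of the script, and the parent simply waits for the child, which suits the iterative mode but is only one possible choice.

```cpp
#include <sys/wait.h>
#include <unistd.h>
#include <cstdlib>
#include <string>

// Splits "/cgi-bin/script?a=b" into the script part and the query string.
void splitCgiRequest(const std::string &document, std::string *script,
                     std::string *query) {
    std::string::size_type q = document.find('?');
    *script = document.substr(0, q);
    *query = (q == std::string::npos) ? "" : document.substr(q + 1);
}

// Runs the script with its output sent to slaveSocket.
// Returns the wait status of the child, or -1 if fork failed.
int runCgi(int slaveSocket, const std::string &scriptPath,
           const std::string &query) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        setenv("REQUEST_METHOD", "GET", 1);
        setenv("QUERY_STRING", query.c_str(), 1);

        // Redirect standard output to the slave socket.
        dup2(slaveSocket, STDOUT_FILENO);

        // No blank line here: the script prints Content-type itself.
        const char header[] =
            "HTTP/1.1 200 Document follows\r\nServer: CS 252 lab5\r\n";
        if (write(STDOUT_FILENO, header, sizeof(header) - 1) < 0) _exit(1);

        char *const argv[] = {const_cast<char *>(scriptPath.c_str()), nullptr};
        execv(scriptPath.c_str(), argv);
        _exit(1);  // reached only if execv fails
    }
    int status = 0;
    waitpid(pid, &status, 0);
    return status;
}
```

A real server should check that the script exists and is executable before printing the 200 header, so that it can answer with the 404 response instead.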

For more information on how cgi-bin works see the Apache documentation.
Note. You will need to recompile the cgi-bin modules in lab5-src/http-root-dir/cgi-src

cd lab5-src/http-root-dir/cgi-src
grep getline *
----- Replace all occurrences of "getline" with "mygetline"
make clean
make

Loadable Modules

In this stage you will implement loadable modules to be able to extend your server. When the name of a cgi-bin script ends with .so, instead of calling exec for this file, your server will load that module into memory using dlopen(), if it has not been previously loaded. Then your server will transfer control to this module by first looking up the function extern "C" httprun(int ssock, char * query_string) in that module using dlsym() and then calling httprun(), passing the slave socket and the query string as parameters. httprun() will write the response to the ssock slave socket using the parameters in query_string.

For example, a request of the form:

http://localhost:8080/cgi-bin/hello.so?a=b

will make your server load the loadable module hello.so into memory and then call the function httprun() in this module with ssock and query_string as parameters. It is up to the module to write the response to ssock. Your server needs to keep track of which modules have already been loaded, so that it does not call dlopen() multiple times for the same module.
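
Keeping track of loaded modules can be done with a small cache keyed by module path, as sketched below. The names HttpRunFn, loadedModules, and lookupModule are hypothetical. The handout does not give the return type of httprun(), so void is an assumption here; check the example in lab5-src. The cache is not thread-safe as written, and on some systems the program must be linked with -ldl.

```cpp
#include <dlfcn.h>
#include <map>
#include <string>

// Assumption: httprun returns void; the handout omits the return type.
typedef void (*HttpRunFn)(int ssock, char *query_string);

// Modules loaded so far, keyed by path, so that dlopen() runs only once
// per module.
static std::map<std::string, HttpRunFn> loadedModules;

// Returns the httprun function of the module, loading the module on
// first use. Returns nullptr if the module or the symbol is missing.
HttpRunFn lookupModule(const std::string &modulePath) {
    std::map<std::string, HttpRunFn>::iterator it = loadedModules.find(modulePath);
    if (it != loadedModules.end()) return it->second;

    void *handle = dlopen(modulePath.c_str(), RTLD_LAZY);
    if (handle == nullptr) return nullptr;

    HttpRunFn fn = reinterpret_cast<HttpRunFn>(dlsym(handle, "httprun"));
    if (fn == nullptr) {
        dlclose(handle);
        return nullptr;
    }
    loadedModules[modulePath] = fn;
    return fn;
}
```

The server would then call the returned function with the slave socket and the query string, and close the socket afterwards if the module has not already done so.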

There is an example of how to use loadable modules in your lab5-src directory included in your git repository.

Also, in this part, you will need to rewrite the script http-root-dir/cgi-src/jj.c into a loadable module and name it jj-mod.c.
Hint: Use the call fdopen to be able to use buffered, formatted output calls such as fprintf() to write to the slave socket. For example, at the top of httprun() in jj-mod.c call

FILE * fssock = fdopen( ssock, "r+");

Then you can use the following to print to the slave socket:

fprintf (fssock, "tomato, and mayo.<P>%c",LF);

Remember to close fssock at the end of httprun().

fclose( fssock);

MAKE SURE THAT THERE IS A HELP FUNCTION AND YOUR CODE IS INDENTED AND EASY TO READ (there will be points for this)

Implementing the Statistics and Log pages

You will implement a page http://localhost:<port>/stats with the following:

The name of the student who wrote the project
The time the server has been up
The number of requests since the server started
The minimum service time and the URL request that took this time.
The maximum service time and the URL request that took this time.

The service time is the time it takes to service a request, measured from when the request is accepted until the socket is closed. Use the function timer_gettime to measure the duration of the requests and link your program with -lrt.
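
The bookkeeping for the minimum and maximum service times can be sketched as below. ServerStats, monotonicSeconds, and recordRequest are hypothetical names. Note that this sketch takes timestamps with clock_gettime(CLOCK_MONOTONIC) rather than the timer_gettime call named above; it is shown only because it needs no timer setup, and the min/max logic is the same whichever call supplies the duration.

```cpp
#include <time.h>
#include <string>

// Counters shown on the /stats page.
struct ServerStats {
    long requestCount = 0;
    double minSeconds = 0.0;
    double maxSeconds = 0.0;
    std::string minUrl;
    std::string maxUrl;
};

// Current time in seconds from a clock that never jumps backwards.
// (Swapped-in technique: clock_gettime instead of timer_gettime.)
double monotonicSeconds() {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

// Records one finished request and updates the minimum and maximum.
void recordRequest(ServerStats *stats, const std::string &url, double seconds) {
    if (stats->requestCount == 0 || seconds < stats->minSeconds) {
        stats->minSeconds = seconds;
        stats->minUrl = url;
    }
    if (stats->requestCount == 0 || seconds > stats->maxSeconds) {
        stats->maxSeconds = seconds;
        stats->maxUrl = url;
    }
    stats->requestCount++;
}
```

A request handler would take one timestamp right after accept() returns and another right after the socket is closed, then pass the difference to recordRequest. In the threaded modes the update needs a mutex, and in the -f mode each child has its own copy of the struct, so the values must be shared some other way.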

Also implement a page http://localhost:<port>/logs that will display a list of all the requests so far including in each line:

The source host of the request
The directory requested

The log will be stored into a file that will be preserved across runs.

Extra Credit

You may notice that the logging features become somewhat complicated when multiple processes are involved (such as with the -f option). How do you coordinate writes to the log file between these processes? Successfully creating a logging feature that cleanly logs to the same file from multiple processes will earn 5 extra credit points.

Hint: look up the mmap() function manpage.

Turning in your project

1. You will present your project to your PSO instructor during PSO time. If you will not be able to attend your PSO, you are responsible for arranging another time with the instructor for the presentation.
2. Make sure that your server uses the http-root-dir and that it loads index.html from this directory by default. Test the simple and complex tests, directory browsing, and the cgi-bin scripts. Your PSO instructors will use this directory during the presentation.

3. Write a short README file that includes:

        a) Features in the handout that you have implemented
        b) Features in the handout that you have not implemented
        c) Extra features

Include this file in your server's directory lab5-src/

4. You still need to turn in your project electronically.

Write your program in a directory called lab5-src. Make sure that your server can be built by typing "make" on one of the lab machines.

IMPORTANT: Do not include the http-root-dir in your submitted files.

REMEMBER: You have to do a "git add" on any new source files that you have created and added to your lab in order for git to know about them!

You will turn in this part electronically by typing the following command from a lab machine:

         git add <any previously unversioned files>
         git commit -am "Lab5 Part 1 submission"
         git push

5. Verify that you have turned in your files correctly by typing:

        git status

The deadline for electronic turnin is Monday April 13th at 11:59pm. The presentations will take place the following week during your PSO.

The grade will be based on how well your server works, the organization of your code, and the extra features you include in your project.

Reading and References

[1] Chapter 30 in `Computer Networks and Internets' by Douglas E. Comer - "Example of a client and a server"
[2] Chapter 10 in `Internetworking with TCP/IP - Vol 3' by Douglas E. Comer and David L. Stevens - "Iterative, Connection Oriented Servers (TCP)".
[3] Chapter 11 in `Internetworking with TCP/IP - Vol 3' by Douglas E. Comer and David L. Stevens - "Concurrent, Connection Oriented Servers (TCP)".
[4] RFC 1945 defines the HTTP 1.0 protocol. You can access this by typing `rfc 1945' on your console.
[5] "UNIX Network Programming Vol 1" by Richard Stevens
