Frequently Asked Questions

1. Content Types, virtual root directory and safety

 >
 > I have some questions about the project. Should there be a configuration
 > file of supported "Content-type"s ? Our server only supports text/html and
 > text/plain right now. Should we support the standard ones like image/gif,
 > image/jpg, etc?

Support at least the necessary ones to load the example
documents in htdocs. They are "text/html" for html files,
"text/plain" for plain text, and "image/gif" for gif files.

You will change the Content-type depending on the suffix of the file.
For example, if the requested file ends with ".html" or ".htm", then
the Content-type will be "text/html". If it ends with ".gif", then the
Content-type is "image/gif".

You don't need a configuration file for that. You can hardwire the
types specified above in your program. It is up to you if you want to
implement a configuration file.
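
A hardwired mapping could look like the sketch below. The function name
content_type_for is made up for this example, and falling back to
"text/plain" for unknown suffixes is an assumption, not a requirement of
the handout.

```c
#include <string.h>
#include <strings.h>

/* Hypothetical helper: choose a Content-type from the file suffix. */
const char *content_type_for(const char *path)
{
    const char *dot = strrchr(path, '.');   /* last '.' in the path */

    if (dot != NULL) {
        if (strcasecmp(dot, ".html") == 0 || strcasecmp(dot, ".htm") == 0)
            return "text/html";
        if (strcasecmp(dot, ".gif") == 0)
            return "image/gif";
    }
    return "text/plain";   /* assumed default for everything else */
}
```

Adding another type later only means adding one more comparison.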

 > Will you be giving us some more details on Friday, or do
 > you want us to determine things like where the virtual root
 > directory

The virtual root directory is the directory where the server looks up
documents.

By default the virtual root directory of your server will be the
current directory where you execute your server. It is up to you if
you want to implement passing a different root directory as argument.

 > should be and if a "../../" in the URL should allow the user to read files
 > above the virtual root directory on the file system?
 >

Make sure that the URLs do not go above the virtual root
directory. This is the minimum security mechanism that your http server
must have. If a URL tries to read above the virtual root, return an
error.
 

2. Problems sending gif files

 > Is there any reason you can think of why a .gif image would fail to load
 > properly when sent by my http server?  HTML pages come up correctly, but
 > when I try to load a gif, it comes up in Netscape as a broken picture.
 >
 > When I manually telnet to my port and request a gif, it appears to be
 > sending correctly, i.e. HTTP header, followed by a blank line, and then a
 > bunch of control characters.
 >
 > I faintly remember you mentioning something about .gifs in class, but I
 > could be wrong.  I was also unable to check the class web page FAQ since
 > the CS webserver seems to be down tonight.
 >

Make sure that you are not sending fewer bytes than you actually read
at the server. Check the bytesRead value returned by the read call and the
bytesSent value returned by the send call and compare the two. Also, it is
better to use fread rather than fgets with images, because fgets treats
the data as lines of text.

Make sure to use two <CRLF> sequences to separate the http header
and the document (or gif). That is how netscape knows where the
document starts.
 

3. Problems with undefined symbols

 > I got the following error message:
 > enad138-003.cc.purdue.edu% make
 > gcc -c H_server.cc
 > gcc -o H_server H_server.o connectsock.o connectTCP.o passivesock.o passiveTCP.o
 > errexit.o -lnsl -lsocket -lthread
 > Undefined                       first referenced
 >  symbol                             in file
 > passiveTCP__FPCci                   H_server.o
 > errexit__FPCce                      H_server.o
 > ld: fatal: Symbol referencing errors. No output written to H_server
 > *** Error code 1
 > make: Fatal error: Command failed for target `H_server'
 >

To include "C" functions in your C++ program you have to
declare them as:

 extern "C" <function>

For example, type the following at the beginning of your C++ program:

 extern "C" int passiveTCP(const char *service, int qlen);
 extern "C" int errexit(const char *format, ...);
 

4. Using setsockopt() and SEGV in accept()

When using accept()

 int accept(int s, struct sockaddr *addr, int *addrlen);

you have to initialize the parameter "addrlen" with the size of
the argument you pass in "addr". If you fail to do so, your program may
get a SEGV in unexpected places.
 

Also when using setsockopt()

 int setsockopt(int s, int level, int optname,
  const char *optval, int optlen);

You have to pass in optval an address to an integer variable that has
0 if you want to disable the option, or 1 if you want to enable it.
 

5. Converting integers to strings and formatting

You can use "sprintf()" to convert integers to strings and
to format the output.

 int sprintf(char *s, const char *format, /* args */ ... );

sprintf works like printf, however instead of sending the output to
stdout, it sends it to the string "s" in the first argument. Make sure
that "s" has enough space to store the output of sprintf.
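
As a sketch of the idea, the helper below formats a header line with an
integer in it. It uses snprintf(), the size-checked variant of sprintf(),
so that a buffer that is too small is reported instead of overrun. The
function name format_content_length and the choice of the Content-Length
header are assumptions made for this example.

```c
#include <stdio.h>

/* Hypothetical helper: write "Content-Length: <nbytes>\r\n" into buf.
 * Returns the number of characters written, or -1 if buf is too small. */
int format_content_length(char *buf, size_t size, long nbytes)
{
    int n = snprintf(buf, size, "Content-Length: %ld\r\n", nbytes);

    if (n < 0 || (size_t)n >= size)
        return -1;          /* output did not fit (or an encoding error) */
    return n;
}
```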
 

6. Problems reusing port

 > I am still having problems with my server giving up the socket when the
 > process has been killed.  I am doing this...
 >
 >   int optval = 1;
 >   int pid;
 >   int msock,ssock;
 >   struct sockaddr_in fsin; /* the from address of a client */
 >   int alen;   /* from-address length  */
 >   printf("Concurrency: fork\n");
 >
 >  msock = passiveTCP("80808", QLEN);
 >
 >         setsockopt(msock, SOL_SOCKET, SO_REUSEADDR, (char *)&optval,
 >                    sizeof(int));
 >
 > I keep having to wait for a little while before I can reuse the socket.
 > Is there something wrong here?
 >
 >

The call to setsockopt() should be before binding the socket in
passivesock().
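
The ordering can be sketched as follows. listen_reusable is a hypothetical
stand-in for passivesock(), written only to show where the setsockopt()
call goes; it is not the code from the handout.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical stand-in for passivesock(): returns a listening TCP
 * socket on the given port, or -1 on error. */
int listen_reusable(unsigned short port, int qlen)
{
    struct sockaddr_in sin;
    int optval = 1;
    int s = socket(AF_INET, SOCK_STREAM, 0);

    if (s < 0)
        return -1;

    /* Enable address reuse BEFORE bind(); afterwards it is too late. */
    if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval)) < 0) {
        close(s);
        return -1;
    }

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(port);          /* network byte order */

    if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
        listen(s, qlen) < 0) {
        close(s);
        return -1;
    }
    return s;
}
```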
 

7. Problems with finger and others

> When we do cgi, like finger, do we have to reparse
> finger?gbujak%40cs.purdue.edu
> to gbujak@cs.purdue.edu
> or when its greg[space]bujak which cgi turns it into greg+bujak
> back to greg[space]bujak
>

All the cgi-bin scripts that use <ISINDEX> to get information, such as
archie, calendar, finger, and wais, pass the input parameters in a
different way. If there is a "+" sign in the request, like
http://.../cgi-bin/xxx?a+b+c, then the arguments that the "+" separates
have to be passed as arguments to execvp when executing "xxx", i.e.,
arg[0]="xxx", arg[1]="a", arg[2]="b", arg[3]="c". This is not a required
feature, and we will consider it as an extra.
 

8. How to test my server using webdump/netscape

>
>   How can the server be tested? More precisely, if I run the server at
> bobo.cs.purdue.edu machine, how should I use webdump to ask for requests
> to that server I created?
>

If your server runs at port 6880 you can use the URL
bobo.cs.purdue.edu:6880 in webdump or netscape.
 

9. HTDOCS and cgi-bin examples and server semantics

You will find in the on-line handout for lab7 a hyperlink to
a tar file with the htdocs and cgi-bin directories and files that
we will be using for grading.

Untar the tar file. You will see an htdocs and a cgi-bin directory.

Make sure that your http server only serves files from the
htdocs directories and subdirectories and that cgi-bin requests
are only for files in the cgi-bin directory.

HTTP requests for documents will be relative to the htdocs
directory, so it does not need to appear explicitly in the request.

If you have questions about server semantics not specified in the handout,
check first what the departmental HTTP server does. The department
currently uses an Apache server.

10. htdocs and cgi-bin directories

> Should the cgi-bin directory be inside htdocs or not?

No.

> If not, then how should server interpret the request like /directory?
> As http-root-dir/directory or as http-root-dir/htdocs/directory?

As http-root-dir/htdocs/directory.

If the request comes as cgi-bin/finger for example, then your server
will identify that as a cgi-bin request and execute the file in
http-root-dir/cgi-bin/finger

11. Broken pipe

>   One of the students here got some problem like this but I am sure of the
> reason:
>   His server can send ascii files without any problem, even some of size
> 113K, but if it comes to some not very big jpeg file(5K), it can still
> send the file, but after that the server will quit with some info like
> "broken pipe".
>   what could be the possible reason?

It may be that the browser is closing the connection prematurely.
Try ignoring or handling the signal SIGPIPE.
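
One way to ignore the signal is sketched below. Once SIGPIPE is ignored,
a write to a connection the browser has closed fails with errno set to
EPIPE instead of killing the server. The names ignore_sigpipe and
write_to_client are invented for this example, and the wrapper does not
deal with short writes.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Call once at startup. Returns 0 on success, -1 on error. */
int ignore_sigpipe(void)
{
    return signal(SIGPIPE, SIG_IGN) == SIG_ERR ? -1 : 0;
}

/* Hypothetical wrapper around write():
 * returns 1 if data was written, 0 if the client closed the
 * connection (EPIPE), and -1 on any other error. */
int write_to_client(int fd, const void *buf, size_t len)
{
    ssize_t n = write(fd, buf, len);

    if (n >= 0)
        return 1;
    return errno == EPIPE ? 0 : -1;
}
```

When write_to_client() returns 0 the server can simply close the slave
socket and go on to the next request.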
 

12. ".."  in requested directory

> 1.  I am working on assuring that the http request does not go above the
> virtual root, but I'm not sure how to do that.  I currently believe that
> searching for the "../" in the request string will do it, but the user
> may,(for some strange reason) have "../" in there without actually trying
> to go above the root such as
> "http://melissa.cs.purdue.edu:8001/dir/../bla"
> How can I take care of this?

Add the check that .. is not allowed at all in the requested directory.
Only serve files that are in the htdocs/ directory and subdirectories.
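
A possible check is sketched below. It rejects a request only when a whole
path component is "..", so a name such as "a..b" is still allowed. The
function name has_dotdot is made up for this example, and the sketch
assumes the request path has already been unescaped.

```c
#include <string.h>

/* Returns 1 if any '/'-separated component of path is exactly "..". */
int has_dotdot(const char *path)
{
    const char *p = path;

    while (*p != '\0') {
        size_t len = strcspn(p, "/");    /* length of this component */

        if (len == 2 && p[0] == '.' && p[1] == '.')
            return 1;
        p += len;
        if (*p == '/')
            p++;                         /* skip the separator */
    }
    return 0;
}
```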

13. How the TCP implementation demultiplexes data

> For the first part of this project, are we supposed to open a new port to
> send back to the client, or just use the one that's already open?

The server does not need to open a new port to interact with the client.
It will use the slave socket returned by accept.

The port that the slave sockets use is the same one that the
master socket uses. The TCP implementation uses the tuple
<server-ip, server-port, client-ip, client-port> to identify a connection
and to demultiplex received TCP data to the right slave socket.

14. Problems connecting to web server


> I'm having trouble just getting the web client to connect to the web server.
> The basic algorithm was to create a socket, bind it, listen on it,
> and then accept. But, I can't get them to communicate. When I use Webdump, I
> get a message saying that the web server is not accepting requests and that
> the connection was refused. netscape says the same. Do you have any ideas of
> why this might be?
>

The code I gave in class has the problem that the port is not converted to
network byte order before assigning it:

        sad.sin_port = htons( port );

Also, you can use "netstat -a" as a debugging tool to find out the ports
that are currently in use on the machine. The port your server is using
should be there.
 

15. http-root-dir

I suggest that you copy the  http-root-dir directory to your server's
directory. All the requests that go to your server will be obtained from
this directory.

This way, requests such as:

http://lorenzo:8080/cgi-bin/xxxx

        are obtained from

        <your-servers-dir>/http-root-dir/cgi-bin

http://lorenzo:8080/icons/folder.gif

        are obtained from

        <your-servers-dir>/http-root-dir/icons

http://lorenzo:8080/a

        are obtained from

        <your-servers-dir>/http-root-dir/htdocs
 

If there is a request that contains a directory ".." print an error.
 

16. Diagnostics not being printed

> ...the program prints everything until the file not found printfs, then
> stops.... I am including the source ...

First of all, add "\n" at the end of each printf. Otherwise the output of
your diagnostics will not be flushed.
 

17. Program hangs in read...

A common problem I have seen in some projects is that
the server hangs in read(ss...). This is likely to happen
because in the accept call:

        if (ss=accept(........) < 0 ) {

        }

it should be:

        if ((ss=accept(........)) < 0 ) {

        }

"<" has higher precedence than "=".

This causes ss to be assigned the result of the comparison (0 when accept
succeeds) instead of the value returned by accept. read(0,...) will then
try to read input from stdin.
 

18. Service exits unexpectedly


> I'm having problems making my child processes quit. I don't quite
> understand the point of capturing the SIGCHLD signal. It would seem to me
> like just having the child process exit would work.

What is happening (as Gong pointed out) is that accept() returns -1
when SIGCHLD is received after a child process exits.

Therefore, you need to have something like the following in your code:

        while (1) {
                ss = accept(....);

                if ( ss == -1 && errno == EINTR ) {
                        // accept was interrupted. Retry the call
                        continue;
                }

                // ... handle the connection on ss ...
        }

errno holds the error code of the last failed system call. errno will be
EINTR if accept was interrupted by a signal.

EINTR is defined in <errno.h>
 

19. How should we test our concurrency once we implement it?


One way to check concurrency is to load the complex example and see how
pictures are loaded simultaneously instead of one by one.
 

20. How to know if it is eliminating zombies?


Type ps -u <login> and check that there are no <defunct> processes around.
 

21. Only first Zombie process is eliminated


> It looks like my zombie process is eliminated only once. I am using
> signal() to set up handler.

signal() only sets up the signal handler once and then after the signal
is delivered it disables it.

sigset() restores the signal handler automatically after the signal is
delivered.

Use sigset instead of signal. Also you may use sigaction().
 

22. Special chars in cgi-bin

>
> My server works fine w/ an input of somebody login, but if I add '@' at
> the end, it will be interpreted as %40. ('+' -> %2B, '&' -> %26)
>
> Do we need change those back?
>

Yes.

Special characters such as + and @ are escaped when passed as cgi-bin
arguments.

You will need to unescape these characters.
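
A minimal sketch of the unescaping step follows. It converts each %XX
sequence back to the character it stands for and leaves everything else,
including a literal "+", untouched. The function name url_unescape is an
assumption of this example. Note that "%00" would put a zero byte in the
middle of the string, which a real server may want to reject.

```c
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode %XX escapes in place. Returns s. */
char *url_unescape(char *s)
{
    char *src = s;
    char *dst = s;

    while (*src != '\0') {
        int hi, lo;

        if (src[0] == '%' &&
            (hi = hex_value(src[1])) >= 0 &&
            (lo = hex_value(src[2])) >= 0) {
            *dst++ = (char)(hi * 16 + lo);
            src += 3;
        } else {
            *dst++ = *src++;     /* ordinary character or malformed escape */
        }
    }
    *dst = '\0';
    return s;
}
```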
 

23. Debugging headers and problems with gifs


> Hi, I got it to work for Netscape; however, I'm getting broken images for
> complex.html (but it works with IE).
>
> please help? (the header was still the same).
>

This can be due to not sending the right header.
To debug what your server sends, whenever you have an instruction such as:

        write(ss, .....)

Also add an instruction to write it to the screen:

        write(ss, .....)
        write(1, .....)

In this way you verify that what is sent is what you expect.

Additionally, make sure that the number of bytes read from the file is the
number sent back to the client. Some students assume that they can
manipulate gif files like text files. gif files are binary files and may
contain zero bytes. Do not use string operations on the contents of gif
files.
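
The sketch below copies a file to a socket as raw bytes. It uses the
count returned by fread() instead of strlen(), and it retries when write()
sends only part of the buffer. The names write_all and send_file are
invented for this example.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Write exactly len bytes to fd, retrying on short writes.
 * Returns 0 on success, -1 on error. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: retry */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Copy the rest of fp to fd. Returns the bytes sent, or -1 on error. */
long send_file(int fd, FILE *fp)
{
    char buf[4096];
    size_t nread;
    long total = 0;

    while ((nread = fread(buf, 1, sizeof(buf), fp)) > 0) {
        if (write_all(fd, buf, nread) < 0)
            return -1;
        total += (long)nread;
    }
    return ferror(fp) ? -1 : total;
}
```

Comparing the value returned by send_file() with the size of the file is
a quick way to confirm that nothing was lost.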
 

24. Could not find some cgi-bin scripts such as "jj"


> How about the link to "pizza".  The link is to a file called "jj" in the
> cgi-bin directory that doesn't exist.
>

For those files cd to cgi-src/ and type "make".
 

25. Making the finger cgi-script work


> My confusion is that in the link from index.html, there are no variables.
> If you click on the finger link, it doesn't send any variables. That's why
> I'm confused.

The finger cgi-script will send a form if there are no arguments.

Since this script uses the <ISINDEX> tag, you may need to do
extra processing in the server. Look at what netscape sends to the server
and modify your server so the finger script receives the right
arguments. In other words, do whatever is necessary to make the finger
script work.
 

26. Running cgi-scripts

1. The test-env and test-cgi scripts should work fine as long as your cgi
implementation is OK.

If you are obtaining a scrambled output in netscape it may be because
your server is not sending its header. The cgi-bin scripts send the
content type but the server has to include its own server header.

HTTP/1.1 <sp> 200 <sp> Document <sp> follows <crlf>
Server: <sp> <Server-Type> <crlf>

2. For the finger script, finger without arguments should return an HTML
document. This document includes a tag <ISINDEX> that uses cgi-bin in a
different way that allows passing parameters as arguments of the script
and not in the QUERY_STRING. You will have to do some reverse
engineering and see what <ISINDEX> sends to the server. See
The Common Gateway Interface <http://hoohoo.ncsa.uiuc.edu/cgi/> for more
info.

3. For the pizza example, you will have to modify the file  cgi-src/jj.c
line <FORM ACTION=\"http://hoohoo.ncsa.uiuc.edu:80/htbin/jj\"> to point
to your server. Then type "make". The scripts should run fine as long as
your cgi-bin implementation is OK.

Do whatever it takes to have these scripts running correctly. You may
modify the scripts if necessary. Also you may add your own.

4. Make sure that you have implemented the port reuse option. It is
frustrating to have to grade somebody in the presentation and have to wait
4 minutes or manually change the port number and recompile the server. See
the FAQ to find out how to implement the port reuse.

5. Make sure that the default document that is loaded when you type the
URL of your server in netscape is the one in
http-root-dir/htdocs/index.html. Also make sure that all the simple,
complex tests, directory browsing, and cgi-bin can be accessed from
this page by just clicking on them. This will make the grading and the
presentation easier and you will not lose points for it.
 

27. Mapping htdocs, cgi-bin, and icons

> Should our web server be able to handle requests like this below (i.e.
> should it be able to get files in other directories?)
>
>
> GET /mydir/anotherdir/index.html HTTP/1.0
>
>

The directory that your server should serve is the one in:

        http://www.cs.purdue.edu/homes/cs422/lab4/http-root-dir.tar.Z

Download this file and install it.

There is a  htdocs/, cgi-bin/, and icons/ subdirectory.

A request such as "GET /" will get the file htdocs/index.html.
A request such as "GET /cgi-bin/finger" will get the file from
cgi-bin/finger.
A request such as "GET /icons/ball.gif" will get the file from
icons/ball.gif.

Your server should only return files in htdocs/ cgi-bin/ or icons/

You can use the function "realpath" and make sure that the
directory requested is inside the http-root-directory.

28. Problems with dir1 and dir


> If the dir like does not have a trailing slash my server does not recurse to
> a depth greater than that particular directory's contents. If it has a
> trailing slash then it is no problem.
>
> Should I fix this and if yes, How?
>

Yes. You have to fix it. That is why the dir1/ and dir are there.

With your http-client  or telnet try to request dir1 and dir1/ in the apache
web-server in:

http://www.cs.purdue.edu/homes/grr/cs422-root-dir-test/htdocs/dir1/

and

http://www.cs.purdue.edu/homes/grr/cs422-root-dir-test/htdocs/dir1

Your server should do the same.
 

29. Sorting by date etc.


> For retrieving the directory, do we need to implement the resorting of
> files/subdirectories upon pressing the according "Name", "Last Modified",
> "Size", etc? They should be thought as CGI since the request actually
> contains "?variable=value", but they are not in cgi-bin directory.
>

It is not required but it would be considered an extra (About 5 pts of 100).
Sorting by name as the default is required.


30. Problems with finger

1. When receiving a request such as:

GET /cgi-bin/a?b

Then "b" is passed as argument to "a" in execvp

argv[0] = "a";
argv[1] = "b";
argv[2] = 0;

execvp( argv[0], argv);

2. If the request has the form:

GET /cgi-bin/a?b+c+d

Then "b", "c", and "d" are passed as arguments to "a" in execvp

argv[0] = "a";
argv[1] = "b";
argv[2] = "c";
argv[3] = "d";
argv[4] = 0;

execvp( argv[0], argv);

3. If the request has the form:

GET /cgi-bin/a?b=c&d=e

Then you pass "b=c&d=e" in the QUERY_STRING.

You can distinguish between 1, 2 and 3 by the "=" or the "+".
 
 

31. Using STL strings

Since I have been encouraging both of my sections, and anyone who will listen, to use STL strings for all of their projects, I have decided to create some examples to get students started on using STL strings. I know there is a bit of an initial learning curve with using STL strings, and the error messages are quite crazy which doesn't help.

However, these examples should cover almost all of the commands you will need: http://www.cs.purdue.edu/homes/wspeirs/stl_strings.html They are laid out as a comparison between C and C++ with STL Strings.

As always it is a good idea to reference the people who made the library, HP & SGI, when looking for definitions, etc. The STL Programmer's Guide can be found here on SGI's site: http://www.sgi.com/tech/stl/

The STL library was created back in 1994 so any errors or unusual results are not because of an error in STL, but rather a misuse of the library or a misunderstanding of what a call is actually doing. Also, the STL library contains a lot of container classes like linked lists, a safer version of array called a vector, and other containers like hash tables, sets, maps, etc. These are already implemented for you, and can reduce the amount of code you need to write by an order of magnitude. Also, common functions like sorting are already done and can be used on anything from a list to an array, to a string; because all of the functions are templated for any type. See SGI's site for more information on these functions.

If anyone has questions about STL please feel free to send me e-mails. Once you get over the learning curve they really become easier to use, and more logical when reading through code. See my example page...
 

32. Compiling Errors Using strtok_r


There have been quite a few questions about implicit declaration errors
when trying to compile code with _r functions. These functions are
the thread safe versions of the original functions. Ex: strtok (not
thread safe) -> strtok_r (thread safe).

If you are using the default version of gcc/g++ you will get these
errors. You need to do one of two things:

1) The best option is to use the newer version of gcc/g++, version 3.3.
You can tell what version of gcc/g++ you are using by issuing the
command: gcc -v. The default version is 2.95.2 and is VERY old. They are
almost going to release version 4. I would highly recommend using
version 3.3 located here: /p/gcc-3.3/bin/gcc or /p/gcc-3.3/bin/g++. This
can be easily changed in the Makefile. The error messages from 3.3 are a
LOT better, and 3.3 can deal with namespaces, 2.95 cannot. If you are
using STL I would definitely recommend using 3.3. If you are not using
STL you should still use 3.3, but you are a fool for not using STL ;-)

2) If you are really attached to version 2.95, or your code won't
compile under 3.3 (I'd be amazed if it wouldn't) you can use 2.95 and
just add a compiler flag. Add the compiler flag
-D_POSIX_PTHREAD_SEMANTICS to your compile statement for any file using
these functions. For example, if you have the file main.cpp and it
uses strtok_r then you would compile the file like this: g++ -c main.cpp
-D_POSIX_PTHREAD_SEMANTICS. This will also take care of this error, but
using version 3.3 is the better option.

I hope this clears things up for people. If you are using strtok and NOT
strtok_r then you should talk to your TA because you are doing something
wrong... unless of course you are using STL strings, they are thread
safe!!! (http://www.sgi.com/tech/stl/thread_safety.html)
 

33. Problems using dlopen

Some students are having problems with the jj loadable module.
 
1. To build the module copy the files jj.c, util.c and util.h from
http-root-dir/cgi-src to the lab4-src directory.

cp http-root-dir/cgi-src/jj.c http-root-dir/cgi-src/util.* .

2. Rename jj.c to jj-mod.c. jj-mod.c will be the loadable module

mv jj.c jj-mod.c

3. Modify the Makefile to build jj-mod.so. Add the following to
the Makefile:

CC= gcc

jj-mod.o: jj-mod.c
        $(CC) -c jj-mod.c

util.o: util.c
        $(CC) -c util.c

jj-mod.so: jj-mod.o util.o
        ld -G -o jj-mod.so jj-mod.o util.o

4. Also add -ldl to your http server rule

myhttpd : myhttpd.o
        $(CXX) -o $@ $@.o $(NETLIBS) -ldl

5. To load the library see the file use-dlopen.cc

   // Opening
  void * lib = dlopen( "./jj-mod.so", RTLD_LAZY );

  if ( lib == NULL ) {
    fprintf( stderr, "dlerror:%s\n", dlerror());
    perror( "dlopen");
    exit(1);
  }

6. Implement in jj-mod.c the function httprun() as described in the
handout and call dlopen/dlsym in your myhttpd server program.

7. It helps to print dlerror() if there is an error in dlopen or dlsym:

// Opening
  void * lib = dlopen( "./jj-mod.so", RTLD_LAZY );

  if ( lib == NULL ) {
    fprintf( stderr, "dlerror:%s\n", dlerror());
    perror( "dlopen");
    exit(1);
  }