Support at least the necessary ones to load the example
documents in htdocs. They are "text/html" for html files,
"text/plain" for plain text, and "image/gif" for gif files.
You will change the Content-type depending on the suffix of the file.
For example, if the requested file ends with ".html" or ".htm", then
the Content-type will be "text/html". If it ends with ".gif", then the
Content-type is "image/gif".
You don't need a configuration file for that. You can hardwire the
types specified above in your program. It is up to you if you want to
implement a configuration file.
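The suffix check described above could be hardwired roughly as follows. This is only a sketch: the function name content_type_for is made up for illustration, and falling back to "text/plain" for unknown suffixes is an assumption, not something the handout requires.

```c
#include <string.h>
#include <strings.h>   /* strcasecmp */

/* Return the Content-type for a file name based on its suffix.
 * Only the three types named above are handled; the fallback
 * to "text/plain" is an assumption. */
const char *content_type_for(const char *path)
{
    const char *dot = strrchr(path, '.');   /* last '.' in the name */

    if (dot != NULL) {
        if (strcasecmp(dot, ".html") == 0 || strcasecmp(dot, ".htm") == 0)
            return "text/html";
        if (strcasecmp(dot, ".gif") == 0)
            return "image/gif";
    }
    return "text/plain";
}
```

For example, content_type_for("icons/ball.gif") gives "image/gif".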
> Will you be giving us some more details on Friday, or do
> you want us to determine things like where the virtual root
> directory
The virtual root directory is the directory where the server looks
for documents.
By default the virtual root directory of your server will be the
current directory where you execute your server. It is up to you if
you want to implement passing a different root directory as argument.
> should be and if a "../../" in the URL should allow the user to read
> files above the virtual root directory on the file system?
>
Make sure that URLs do not go above the virtual root
directory. This is the minimum security mechanism that your http server
should have. If a URL tries to read above the virtual root, return an
error.
Make sure that you are not sending fewer bytes than you actually read
at the server. Check bytesRead from the read call, check bytesSend
returned by the send call, and compare the two. Also, it is better to
use fread rather than fgets with images.
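One way to make sure every byte that was read is also sent is to loop until the whole buffer has been written. The sketch below is only an illustration: write_all and send_file are hypothetical names, the 4096-byte buffer is arbitrary, and it uses write() on the socket descriptor where your code may use send().

```c
#include <stdio.h>
#include <unistd.h>
#include <errno.h>

/* Write exactly len bytes to fd, retrying after short writes.
 * Returns 0 on success, -1 on error. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Copy a whole file to fd in binary-safe chunks using fread.
 * Returns the number of bytes sent, or -1 on error. */
long send_file(int fd, const char *path)
{
    char buf[4096];
    size_t bytesRead;
    long total = 0;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    while ((bytesRead = fread(buf, 1, sizeof buf, fp)) > 0) {
        if (write_all(fd, buf, bytesRead) < 0) {
            fclose(fp);
            return -1;
        }
        total += (long)bytesRead;
    }
    fclose(fp);
    return total;
}
```

Because the length comes from fread and not from strlen, zero bytes inside a gif are sent like any other byte.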
Make sure to use two <CRLF> sequences to separate the http header
and the document (or gif). That is the way netscape knows where the
document starts.
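As a rough sketch of a header that ends with the two <CRLF> sequences: the status line below follows the "HTTP/1.1 200 Document follows" format quoted elsewhere in this FAQ, while the function name format_header and the server name "myhttpd" are placeholders chosen for the example.

```c
#include <stdio.h>

/* Build the response header into buf. Returns the header length,
 * or -1 if buf is too small. The server name is a placeholder. */
int format_header(char *buf, size_t size, const char *content_type)
{
    int n = snprintf(buf, size,
                     "HTTP/1.1 200 Document follows\r\n"
                     "Server: myhttpd\r\n"
                     "Content-type: %s\r\n"
                     "\r\n",              /* blank line ends the header */
                     content_type);

    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

The last header line ends with one <CRLF> and the empty line supplies the second, so the document bytes can be written right after the returned length.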
To include "C" functions in your C++ program you have to
declare them as:
extern "C" <function>
For example, put the following at the beginning of your C++ program:
extern "C" int passiveTCP(const char *service, int qlen);
extern "C" int errexit(const char *format, ...);
int accept(int s, struct sockaddr *addr, int *addrlen);
You have to initialize the parameter "addrlen" with the size of
the argument you pass in "addr". If you fail to do so, your program may
get SEGV in unexpected places.
int addrlen = sizeof( struct sockaddr_in );
ssock = accept( msock, (struct sockaddr *)&fsin, &addrlen);
int setsockopt(int s, int level, int optname,
const char *optval, int optlen);
You have to pass in optval an address to an integer variable that has
0 if you want to disable the option, or 1 if you want to enable it.
int sprintf(char *s, const char *format, /* args */ ... );
sprintf works like printf, however instead of sending the output to
stdout, it sends it to the string "s" in the first argument. Make sure
that "s" has enough space to store the output of sprintf.
The call to setsockopt() should be before binding the socket in
passivesock().
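A minimal sketch of enabling the option before bind() is shown below. It assumes that the port reuse option meant here is SO_REUSEADDR at level SOL_SOCKET, which is the usual choice; the function name reusable_tcp_socket is hypothetical, and in your server the setsockopt() call would go inside passivesock().

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket with the port reuse option enabled.
 * Returns the socket descriptor, or -1 on error. The caller
 * then calls bind() and listen() as usual. */
int reusable_tcp_socket(void)
{
    int optval = 1;                       /* 1 enables, 0 disables */
    int s = socket(AF_INET, SOCK_STREAM, 0);

    if (s < 0)
        return -1;
    if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR,
                   (const char *)&optval, sizeof optval) < 0) {
        close(s);
        return -1;
    }
    return s;
}
```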
If your server runs at port 6880 you can use the URL
bobo.cs.purdue.edu:6880 in webdump or netscape.
Untar the tar file. You will see an htdocs and a cgi-bin directory.
Make sure that your http server only serves files from the
htdocs directory and its subdirectories and that cgi-bin requests
are only for files in the cgi-bin directory.
HTTP requests for documents will be relative to the htdocs
directory, so it does not need to appear explicitly in the request.
If you have questions about server semantics not specified in the handout,
check first what the departmental HTTP server does. The department
currently uses an Apache server.
No.
> If not, then how should server interpret the request like /directory?
> As http-root-dir/directory or as http-root-dir/htdocs/directory?
As http-root-dir/htdocs/directory.
If the request comes as cgi-bin/finger for example, then your server
will identify that as a cgi-bin request and execute the file in
http-root-dir/cgi-bin/finger
It may be that the browser is closing the connection prematurely.
Try ignoring or handling the signal SIGPIPE.
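Ignoring the signal can be as small as the sketch below; the wrapper name ignore_sigpipe is made up for the example. Once SIGPIPE is ignored, a write() to a connection that the browser has closed fails with errno set to EPIPE, so the server can close the slave socket and carry on.

```c
#include <signal.h>

/* Ignore SIGPIPE so that writing to a closed connection makes
 * write() fail with EPIPE instead of killing the server.
 * Returns 0 on success, -1 on error. */
int ignore_sigpipe(void)
{
    return signal(SIGPIPE, SIG_IGN) == SIG_ERR ? -1 : 0;
}
```

Call it once at startup, before entering the accept loop.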
Add the check that .. is not allowed at all in the requested directory.
Only serve files that are in the htdocs/ directory and subdirectories.
The server does not need to open a new port to interact with the client.
It will use the slave socket returned by accept.
The port that the slave sockets use is the same one that the
master socket uses. The TCP implementation uses the tuple
<server-ip, server-port, client-ip, client-port> to identify a connection
and to multiplex to which slave socket the TCP data received should
go.
> I'm having trouble just getting the web client to connect to the
> web server.
> The basic algorithm was to create a socket, bind it, listen on it,
> and then accept. But, I can't get them to communicate. When I use
> Webdump, I get a message saying that the web server is not accepting
> requests and that the connection was refused. netscape says the same.
> Do you have any ideas of why this might be?
>
The code I gave in class has the problem that the port is not converted
to network byte order before assigning it:
sad.sin_port = htons( port );
Also, you can use "netstat -a" as a debugging tool to find out the ports
that are currently in use on the machine. The port your server is using
should be there.
In this order, requests such as:
http://lorenzo:8080/cgi-bin/xxxx
are obtained from
<your-servers-dir>/http-root-dir/cgi-bin
http://lorenzo:8080/icons/folder.gif
are obtained from
<your-servers-dir>/http-root-dir/icons
http://lorenzo:8080/a
are obtained from
<your-servers-dir>/http-root-dir/htdocs
If there is a request that contains a directory ".." print an error.
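The mapping above could be sketched like this. The function name map_request_path is hypothetical, root stands for <your-servers-dir>/http-root-dir, and rejecting any path that contains ".." anywhere is a deliberately strict assumption.

```c
#include <stdio.h>
#include <string.h>

/* Map a request path such as "/icons/folder.gif" to a file under
 * root. Returns 0 on success, -1 if the path contains ".." or the
 * result does not fit in out. */
int map_request_path(const char *root, const char *req,
                     char *out, size_t size)
{
    int n;

    if (strstr(req, "..") != NULL)
        return -1;                         /* never go above the root */

    if (strncmp(req, "/cgi-bin/", 9) == 0 || strncmp(req, "/icons/", 7) == 0)
        n = snprintf(out, size, "%s%s", root, req);
    else
        n = snprintf(out, size, "%s/htdocs%s", root, req);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With root set to "http-root-dir", the request "/a" maps to "http-root-dir/htdocs/a".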
If you write:
if (ss=accept(........) < 0 ) {
}
it should be:
if ((ss=accept(........)) < 0 ) {
}
"<" has higher precedence than "=".
Without the extra parentheses, ss is assigned the result of the
comparison (0 when accept succeeds) instead of the value returned by
accept. read(0,...) will then try to read input from stdin.
> I'm having problems making my child processes quit. I don't quite
> understand the point of capturing the SIGCHLD signal. It would seem
> to me like just having the child process exit would work.
What is happening (as Gong pointed out) is that accept() returns -1
when SIGCHLD is received after a child process exits.
Therefore, you need to have something like the following in your code:
while (1) {
    ss=accept(....);
    if ( ss == -1 && errno == EINTR) {
        //accept was interrupted. Retry call
        continue;
    }
    // ... handle the connection on ss ...
}
errno holds the error code of the last failed system call. errno will be
EINTR if accept was interrupted by a signal.
EINTR is defined in <errno.h>
One way to check concurrency is to load the complex example and see how
pictures are loaded simultaneously instead of one by one.
Type ps -u <login> and check that there are no <defunct> processes
around.
> It looks like my zombie process is eliminated only once. I am using
> signal() to set up the handler.
On systems with System V semantics, signal() only sets up the signal
handler once and then after the signal is delivered it disables it.
sigset() restores the signal handler automatically after the signal is
delivered.
Use sigset instead of signal. You may also use sigaction().
Yes.
Special characters such as + and @ are escaped when passed as cgi-bin
arguments.
You will need to unescape these characters.
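A small sketch of the unescaping step follows. It assumes the escapes arrive in the usual %XX hexadecimal form (for example "%40" for "@" and "%2B" for "+"); the function name url_unescape is hypothetical. A literal "+" is left alone here because it is also used to separate arguments in the request.

```c
#include <ctype.h>
#include <stdlib.h>

/* Decode %XX escapes in place, e.g. "%40" becomes "@".
 * Malformed escapes are copied through unchanged. */
void url_unescape(char *s)
{
    char *out = s;

    while (*s != '\0') {
        if (s[0] == '%' && isxdigit((unsigned char)s[1])
                        && isxdigit((unsigned char)s[2])) {
            char hex[3] = { s[1], s[2], '\0' };
            *out++ = (char)strtol(hex, NULL, 16);
            s += 3;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}
```

Decoding in place is safe because the output is never longer than the input. If you split the arguments on "+", do that first and unescape each piece afterwards.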
> Hi, I got it to work for Netscape; however, I'm getting broken
> images for complex.html (but it works with IE).
>
> please help? (the header was still the same).
>
This can be due to not sending the right header.
To debug what your server sends, whenever you have an instruction such
as:
write(ss, .....)
also add an instruction to write it to the screen:
write(ss, .....)
write(1, .....)
In this way you verify that what is sent is what you expect.
Additionally, make sure that the number of bytes read from the file is
the number sent back to the client. Some students assume that they can
manipulate gif files like text files. gif files are binary files and
may contain 0 bytes. Do not use string operations on the contents of
gif files.
> How about the link to "pizza". The link is to a file called "jj" in the
> cgi-bin directory that doesn't exist.
>
For those files cd to cgi-src/ and type "make".
> My confusion is that in the link from index.html, there are no
> variables.
> If you click on the finger link, it doesn't send any variables. That's
> why I'm confused.
The finger cgi-script will send a form if there are no arguments.
Since this script uses the <ISINDEX> tag, you may need to do
extra processing in the server. Look at what netscape sends to the
server and modify your server so the finger script receives the right
arguments. In other words, do whatever is necessary to make the finger
script work.
If you are obtaining a scrambled output in netscape it may be because
your server is not sending its header. The cgi-bin scripts send the
content type but the server has to include its own server header.
HTTP/1.1 <sp> 200 <sp> Document <sp> follows <crlf>
Server: <sp> <Server-Type> <crlf>
2. For the finger script, finger without arguments should return an
HTML document. This document includes a tag <ISINDEX> that uses cgi-bin
in a different way that allows passing parameters as arguments of the
script and not in the QUERY_STRING. You will have to do some reverse
engineering and see what <ISINDEX> sends to the server. See
The Common Gateway Interface <http://hoohoo.ncsa.uiuc.edu/cgi/>
for more info.
3. For the pizza example, you will have to modify the file cgi-src/jj.c
line <FORM ACTION=\"http://hoohoo.ncsa.uiuc.edu:80/htbin/jj\"> to point
to your server. Then type "make". The scripts should run fine as long as
your cgi-bin implementation is OK.
Do whatever it takes to have these scripts running correctly. You may
modify the scripts if necessary. Also you may add your own.
4. Make sure that you have implemented the port reuse option. It is
frustrating to have to grade somebody in the presentation and have to
wait 4 minutes or manually change the port number and recompile the
server. See the FAQ to find out how to implement the port reuse.
5. Make sure that the default document that is loaded when you type the
URL of your server in netscape is the one in
http-root-dir/htdocs/index.html. Also make sure that all the simple and
complex tests, directory browsing, and cgi-bin can be accessed from
this page by just clicking on them. This will make the grading and the
presentation easier and you will not lose points for it.
The directory that your server should serve is the one in:
http://www.cs.purdue.edu/homes/cs422/lab4/http-root-dir.tar.Z
Download this file and install it.
There are htdocs/, cgi-bin/, and icons/ subdirectories.
A request such as "GET /" will get the file htdocs/index.html.
A request such as "GET /cgi-bin/finger" will get the file from
cgi-bin/finger.
A request such as "GET /icons/ball.gif" will get the file from
icons/ball.gif.
Your server should only return files in htdocs/, cgi-bin/, or icons/.
You can use the function "realpath" and make sure that the
directory requested is inside the http-root-directory.
> If the dir does not have a trailing slash my server does not recurse to
> a depth greater than that particular directory's contents. If it has a
> trailing slash then it is no problem.
>
> Should I fix this and if yes, How?
>
Yes. You have to fix it. That is why the dir1/ and dir are there.
With your http-client or telnet try to request dir1 and dir1/
in the apache web-server in:
http://www.cs.purdue.edu/homes/grr/cs422-root-dir-test/htdocs/dir1/
and
http://www.cs.purdue.edu/homes/grr/cs422-root-dir-test/htdocs/dir1
Your server should do the same.
> For retrieving the directory, do we need to implement the resorting of
> files/subdirectories upon pressing the according "Name", "Last Modified",
> "Size", etc? They should be thought as CGI since the request actually
> contains "?variable=value", but they are not in cgi-bin directory.
>
It is not required but it would be considered an extra (about 5 pts of
100).
Sorting by name as the default is required.
for the students.
1. If the request has the form:
GET /cgi-bin/a?b
Then "b" is passed as argument to "a" in execvp
argv[0] = "a";
argv[1] = "b";
argv[2] = 0;
execvp( argv[0], argv);
2. If the request has the form:
GET /cgi-bin/a?b+c+d
Then "b", "c", "d" are passed as arguments to "a" in execvp
argv[0] = "a";
argv[1] = "b";
argv[2] = "c";
argv[3] = "d";
argv[4] = 0;
execvp( argv[0], argv);
3. If the request has the form:
GET /cgi-bin/a?b=c&d=e
Then you pass the "b=c&d=e" in the QUERY_STRING.
You can distinguish between 1, 2 and 3 by the "=" or the "+".
However, these examples should cover almost all of the commands you will need: http://www.cs.purdue.edu/homes/wspeirs/stl_strings.html They are laid out as a comparison between C and C++ with STL Strings.
As always it is a good idea to reference the people who made the library, HP & SGI, when looking for definitions, etc. The STL Programmer's Guide can be found here on SGI's site: http://www.sgi.com/tech/stl/
The STL library was created back in 1994 so any errors or unusual results are not because of an error in STL, but rather a misuse of the library or a misunderstanding of what a call is actually doing. Also, the STL library contains a lot of container classes like linked lists, a safer version of array called a vector, and other containers like hash tables, sets, maps, etc. These are already implemented for you, and can reduce the amount of code you need to write by an order of magnitude. Also, common functions like sorting are already done and can be used on anything from a list to an array, to a string; because all of the functions are templated for any type. See SGI's site for more information on these functions.
If anyone has questions about STL please feel free to send me e-mails.
Once you get over the learning curve they really become easier to use,
and more logical when reading through code. See my example page...
There have been quite a few questions about implicit declaration errors
when trying to compile code with _r system calls. These system calls are
the thread safe versions of the original functions. Ex: strtok (not
thread safe) -> strtok_r (thread safe).
If you are using the default version of gcc/g++ you will get these
errors. You need to do one of two things:
1) The best option is to use the newer version of gcc/g++, version 3.3.
You can tell what version of gcc/g++ you are using by issuing the
command: gcc -v. The default version is 2.95.2 and is VERY old. They are
almost going to release version 4. I would highly recommend using
version 3.3 located here: /p/gcc-3.3/bin/gcc or /p/gcc-3.3/bin/g++. This
can be easily changed in the Makefile. The error messages from 3.3 are a
LOT better, and 3.3 can deal with namespaces, 2.95 cannot. If you are
using STL I would definitely recommend using 3.3. If you are not using
STL you should still use 3.3, but you are a fool for not using STL ;-)
2) If you are really attached to version 2.95, or your code won't
compile under 3.3 (I'd be amazed if it wouldn't) you can use 2.95 and
just add a compiler flag. Add the compiler flag
-D_POSIX_PTHREAD_SEMANTICS to your compile statement for any file using
these system calls. For example, if you have the file main.cpp and it
uses strtok_r then you would compile the file like this:
g++ -c main.cpp -D_POSIX_PTHREAD_SEMANTICS
This will also take care of this error, but using version 3.3 is the
better option.
I hope this clears things up for people. If you are using strtok and NOT
strtok_r then you should talk to your TA because you are doing something
wrong... unless of course you are using STL strings, they are thread
safe!!! (http://www.sgi.com/tech/stl/thread_safety.html)
1. To build the module copy the files jj.c, util.c and util.h from
http-root-dir/cgi-src to the lab4-src directory.
cp http-root-dir/cgi-src/jj.c http-root-dir/cgi-src/util.* .
2. Rename jj.c to jj-mod.c. jj-mod.c will be the loadable module.
mv jj.c jj-mod.c
3. Modify the Makefile to build jj-mod.so. Add the following to
the Makefile:
CC= gcc
jj-mod.o: jj-mod.c
	$(CC) -c jj-mod.c
util.o: util.c
	$(CC) -c util.c
jj-mod.so: jj-mod.o util.o
	ld -G -o jj-mod.so jj-mod.o util.o
4. Also add -ldl to your http server rule
myhttpd : myhttpd.o
	$(CXX) -o $@ $@.o $(NETLIBS) -ldl
5. To load the library see the file use-dlopen.cc
// Opening
void * lib = dlopen( "./jj-mod.so", RTLD_LAZY );
if ( lib == NULL ) {
fprintf( stderr, "dlerror:%s\n", dlerror());
perror( "dlopen");
exit(1);
}
6. Implement in jj-mod.c the function httprun() as described in the
handout and call dlopen/dlsym in your myhttp server program.
7. It helps to print dlerror() if there is an error in dlopen or dlsym,
as in the code shown in step 5.