CS 252: Systems Programming
Fall 2019
Lab 5: Web Server
Prof. Turkstra
Monday, November 11 11:58pm
1 Goals
In this lab, you will implement an HTTP server which allows an HTTP client (such as Mozilla
Firefox or Google Chrome) to browse a website and receive different types of content. You will gain
a high level understanding of Internet sockets, the HyperText Transfer Protocol (HTTP), SSL/TLS
extensions to HTTP, and the Common Gateway Interface (CGI).
2 Deadlines
• The deadline for the checkpoint is Monday, November 4 11:58pm. No late submissions
for the checkpoint will be accepted.
• The deadline for the final submission is Monday, November 11 11:58pm.
3 The Big Idea
You will implement relevant portions of the HTTP/1.1 specification (RFC 2616). Your server
will not need to support any methods beyond GET, although there is extra credit available for
supporting other methods (outlined in the extra credit section).
The lab template provides the basic foundation of your server in C and will allow you to focus more
on the technical systems programming aspect of this lab, rather than needing to come up with a
maintainable design for your server.
4 Getting Started
Log in to a CS department machine (a lab machine or data.cs.purdue.edu), navigate to the directory
containing your labs, and run
$ git clone ~cs252/repos/$USER/lab5
$ cd lab5
Lab 5: Web Server CS252 - Fall 2019
There is also a file example/daytime-server.c that has the example on sockets covered in class.
Build the server by typing make, and start it by running ./myhttpd <portno>, where portno is the
port number you wish to bind to. You can see the various options that need to be implemented by
running ./myhttpd -h.
4.1 Security Notice
All CS machines are publicly accessible on the Internet. When developing a web server, it is wise to
shield the server from any possible network traffic so that an attacker cannot take advantage of
bugs in your server. You should modify your server to bind to the loopback interface for
at least the first 3 tasks.
Upon doing this, if you are connected to data, your webserver will only accept connections coming
from data itself. While all CS machines allow many users to be logged in simultaneously, it is a
reasonably safe assumption that no malicious requests will be sent to your server. When you are
comfortable that your server won’t accidentally leak secret files, you can accept all connections from
any interface so that you can develop via SSH and the web browser on your machine.
You can make your server bind to the loopback interface by modifying src/tcp.c to bind to
INADDR_LOOPBACK instead of INADDR_ANY.
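As a rough illustration of that change, the sketch below opens a listening socket bound only to the loopback interface. The function name open_loopback_listener and the backlog of 16 are assumptions made for this example; the actual code in src/tcp.c may be organized differently.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not from the lab template): open a TCP listening
 * socket that accepts connections on 127.0.0.1 only. Passing port 0 lets
 * the kernel pick a free port. Returns the socket fd, or -1 on failure. */
int open_loopback_listener(unsigned short port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    int yes = 1;  /* allow quick restarts of the server on the same port */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    /* Loopback only; htonl(INADDR_ANY) here would accept from any interface. */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Note that INADDR_LOOPBACK is defined in host byte order, so it has to go through htonl() before being stored in sin_addr.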
4.2 Resource Limits
In this lab, we limit your server to 48 MiB of memory, 5 minutes of CPU time, and at most 50
forked processes running at once. We
don’t expect you to hit these limits, so if your server hits any of these limits, it’s probably because
you need to change the way you’re doing something! Web servers like the one we are implementing
should not be resource intensive.
5 The Assignment - Checkpoint
5.1 Background
HTTP, or HyperText Transfer Protocol, is a protocol for communicating over the Internet that
is used by numerous applications. HTTP specifies the format and meaning of messages that are
used to communicate between client applications and server applications. It is a textual format,
and requests and responses can be thought of as structured blobs of text that humans can read.
HTTP 1.0 follows a strict request-response model where the client opens a connection, sends a single
request, and then receives a single response from the server before the server closes the connection
and the transaction is complete.
Connections for HTTP are initiated using TCP, which is a reliable transport protocol that has
many features built in to handle dropped connections and faulty data transmission. You will not be
implementing TCP in any capacity in this lab; it is merely the mechanism used to facilitate HTTP
connections.
There are two types of HTTP messages: request messages and response messages. As you can
imagine, a client sends a request message to an HTTP server to serve some request (e.g. the webpage
/index.html or the picture /img/PurdueSeal.png). The request message contains details such as
the version of HTTP being used, which resource is being requested, what you’d like to do with that
resource, what sorts of responses the client accepts, and so on. The server is responsible for parsing
the request message and creating a response message containing the results of the query.
5.2 Format
An HTTP client issues a ‘GET’ request to a server in order to retrieve a file. The general syntax of
such a request is given below:
GET <sp> <Document Requested> <sp> HTTP/1.1 <crlf>
(<Other Header Information> <crlf>)*
<crlf>
where:
• <sp> stands for a whitespace character
• <crlf> stands for a carriage return + line feed pair (ASCII character 13 followed by ASCII
character 10).
• <crlf><crlf> is also represented as \r\n\r\n.
• <Document Requested> gives us the name and location of the file requested by the client relative to a specified
DocumentRoot. This could be just a forward slash (/) if the client is requesting the default
index file on the server.
• (<Other Header Information> <crlf>)* contains additional information that can influence how the server
behaves when responding. Note that this part can be composed of several lines each separated
by a <crlf>.
Finally, observe that the client ends the request with two carriage return + line feed character pairs:
<crlf><crlf>
The function of an HTTP server is to parse the above request from a client, identify the resource
being requested, and send the resource to the client.
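The first line of a request can be split into its three parts with a few string operations. The following is a minimal sketch, not the template's parser: struct request_line, its buffer sizes, and parse_request_line are names invented for this example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical container for the three fields of the request line. */
struct request_line {
    char method[16];
    char path[1024];
    char version[16];
};

/* Split "GET /path HTTP/1.1\r\n..." into method, path and version.
 * Returns 0 on success, -1 if the line is malformed or a field is too long. */
int parse_request_line(const char *request, struct request_line *out) {
    assert(request != NULL && out != NULL);

    const char *eol = strstr(request, "\r\n");  /* lines must end in CRLF */
    if (eol == NULL) return -1;

    const char *sp1 = memchr(request, ' ', (size_t)(eol - request));
    if (sp1 == NULL) return -1;
    const char *sp2 = memchr(sp1 + 1, ' ', (size_t)(eol - (sp1 + 1)));
    if (sp2 == NULL) return -1;

    size_t mlen = (size_t)(sp1 - request);
    size_t plen = (size_t)(sp2 - (sp1 + 1));
    size_t vlen = (size_t)(eol - (sp2 + 1));
    if (mlen == 0 || plen == 0 || vlen == 0) return -1;
    if (mlen >= sizeof out->method || plen >= sizeof out->path ||
        vlen >= sizeof out->version) {
        return -1;
    }

    memcpy(out->method, request, mlen);
    out->method[mlen] = '\0';
    memcpy(out->path, sp1 + 1, plen);
    out->path[plen] = '\0';
    memcpy(out->version, sp2 + 1, vlen);
    out->version[vlen] = '\0';
    return 0;
}
```

For the request "GET /index.html HTTP/1.1" followed by CRLF, this fills in the method GET, the path /index.html, and the version HTTP/1.1. A line terminated by a bare LF is rejected.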
Before sending the actual document, the HTTP server must send a response header to the client.
The following shows a typical response from an HTTP server when the requested resource is found
on the server:
HTTP/1.1 200 OK <crlf>
Connection: close <crlf>
Content-Type: <Document-Type> <crlf>
Content-Length: <Document-Size> <crlf>
(<Other Header Information> <crlf>)*
<crlf>
<Document Data>
where:
• Connection: close indicates that the connection will be closed upon completion of the
response
• <Document-Type> indicates to the client the type of document being sent. Below are the
document types that need to be implemented for this lab:
– “text/plain” for plain text
– “text/html” for HTML documents
– “text/css” for CSS documents
– “image/gif” for gif files
– “image/png” for png files
– “image/jpeg” for jpg/jpeg files
– “image/svg+xml” for svg files
• <Document-Size> is the number of bytes that compose the delivered content
• (<Other Header Information> <crlf>)* contains, as before, some additional useful information for
the client to use.
• <Document Data> is the actual document requested. Observe that this is separated from the response
headers by two carriage return + line feed pairs.
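To make the list of document types concrete, here is one way the mapping could be expressed as a lookup by file extension. This is only an illustrative sketch: content_type_for is a made-up name, and choosing the type from the extension is an assumption of this example rather than something the handout prescribes.

```c
#include <assert.h>
#include <string.h>
#include <strings.h>  /* strcasecmp */

/* Hypothetical helper: pick a Content-Type from the file extension.
 * Unknown or missing extensions fall back to text/plain. */
const char *content_type_for(const char *filename) {
    static const struct {
        const char *ext;
        const char *type;
    } table[] = {
        { ".html", "text/html" },
        { ".css",  "text/css" },
        { ".gif",  "image/gif" },
        { ".png",  "image/png" },
        { ".jpg",  "image/jpeg" },
        { ".jpeg", "image/jpeg" },
        { ".svg",  "image/svg+xml" },
    };
    assert(filename != NULL);

    const char *dot = strrchr(filename, '.');  /* last '.' in the name */
    if (dot != NULL) {
        for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
            if (strcasecmp(dot, table[i].ext) == 0) return table[i].type;
        }
    }
    return "text/plain";
}
```

An extension table is cheap and predictable, but it trusts the file name rather than the file contents.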
If the requested file cannot be found on the server, the server must send a response header indicating
the error. The following shows a typical response:
HTTP/1.1 404 File Not Found <crlf>
Content-Type: <Document-Type> <crlf>
<crlf>
<Error Message>
where:
• <Document-Type> indicates the type of document (i.e. error message in this case) being sent.
Since you are going to send a plain text message, this should be set to text/plain.
• <Error Message> is a human readable description of the error in plain text format indicating
the error (e.g. Could not find the specified URL. The server returned an error).
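Both kinds of response can be assembled with snprintf. The sketch below is one possible shape, with function names (format_response_header, format_not_found) invented for this example. It also sends Connection and Content-Length with the 404 response, which is a choice made here rather than a requirement stated above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write a response header into buf, every line ending
 * in CRLF and the header closed by an empty line. Returns the header length,
 * or -1 if it does not fit in cap bytes. */
int format_response_header(char *buf, size_t cap, int status,
                           const char *reason, const char *content_type,
                           size_t content_length) {
    assert(buf != NULL && reason != NULL && content_type != NULL);
    int n = snprintf(buf, cap,
                     "HTTP/1.1 %d %s\r\n"
                     "Connection: close\r\n"
                     "Content-Type: %s\r\n"
                     "Content-Length: %zu\r\n"
                     "\r\n",
                     status, reason, content_type, content_length);
    if (n < 0 || (size_t)n >= cap) return -1;
    return n;
}

/* Hypothetical helper: a complete 404 response (header plus plain-text
 * message). Returns the total length, or -1 if it does not fit. */
int format_not_found(char *buf, size_t cap, const char *message) {
    assert(message != NULL);
    int n = format_response_header(buf, cap, 404, "File Not Found",
                                   "text/plain", strlen(message));
    if (n < 0) return -1;
    int m = snprintf(buf + n, cap - (size_t)n, "%s", message);
    if (m < 0 || (size_t)m >= cap - (size_t)n) return -1;
    return n + m;
}
```

Checking the snprintf return value against the buffer size catches truncation, which would otherwise produce a header without its terminating blank line.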
5.3 Basic Server
You will need to implement an iterative HTTP server that implements the following basic algorithm:
• Open a passive socket
• Do forever:
– Accept a new TCP connection
– Read a request from the TCP connection and parse it
– Choose the appropriate response header depending on whether the URL requested is
found on the server or not
– Write the response header to the TCP connection
– Write the requested document (by default you should respond with index.html,
located at htdocs/index.html) to the TCP connection
– Close the TCP connection
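The steps above can be sketched as follows. This is a simplified illustration under several assumptions: the names read_request, write_all, handle_client and run_iterative are invented here, the 4096-byte request buffer is arbitrary, and the response is a fixed placeholder instead of a real document.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Read until the empty line ("\r\n\r\n") that ends the request headers.
 * Returns the number of bytes read, or -1 on error, EOF, or overflow. */
ssize_t read_request(int fd, char *buf, size_t cap) {
    assert(buf != NULL && cap > 0);
    size_t used = 0;
    while (used + 1 < cap) {
        ssize_t n = read(fd, buf + used, cap - 1 - used);
        if (n <= 0) return -1;
        used += (size_t)n;
        buf[used] = '\0';
        if (strstr(buf, "\r\n\r\n") != NULL) return (ssize_t)used;
    }
    return -1;
}

/* write() may send fewer bytes than asked, so loop until all are sent. */
int write_all(int fd, const char *data, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, data, len);
        if (n < 0) return -1;
        data += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Serve one connection with a fixed placeholder response. */
void handle_client(int client_fd) {
    static const char reply[] =
        "HTTP/1.1 200 OK\r\n"
        "Connection: close\r\n"
        "Content-Type: text/plain\r\n"
        "Content-Length: 6\r\n"
        "\r\n"
        "hello\n";
    char request[4096];
    if (read_request(client_fd, request, sizeof request) > 0) {
        write_all(client_fd, reply, sizeof reply - 1);
    }
}

/* Iterative server: one client at a time, forever. */
void run_iterative(int listen_fd) {
    for (;;) {
        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd < 0) continue;
        handle_client(client_fd);
        close(client_fd);
    }
}
```

Reading in a loop matters because a single read() is not guaranteed to return the whole request; TCP delivers a byte stream, not messages.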
The server that you will implement at this stage will not be concurrent, meaning that it will not
serve more than one client at a time (it queues the remaining requests while processing each request).
Use the aforementioned daytime server as a reference for programming with sockets. Implement
your HTTP server in “server.c” and “http_messages.c”.
Note that in HTTP, all newlines should be “\r\n” (CRLF), not just “\n” (LF)!
• You should read RFC2616 Section 5 for information on how to parse request messages. The
only method you are required to support is GET.
• You should also read RFC2616 Section 6 for information on how to form response messages.
5.4 Basic HTTP Authentication
In this part, you will add basic HTTP authentication to your server. Your HTTP server may
have some bugs and may expose security problems. You don’t want to expose this to the open
Internet. One way to minimize this security risk is to implement basic HTTP authentication. You will
implement the authentication scheme in RFC7617, aptly called “Basic HTTP Authentication.”
In Basic HTTP Authentication, you will check for an Authorization header field in all HTTP
requests. If the Authorization header field isn’t present, you should respond with a status code of
401 Unauthorized with the following additional field:
WWW-Authenticate: Basic realm="something" (you should change "something" to a realm ID
of your choosing, such as “The Great Realm of CS252”)
When your browser receives this response, it knows to prompt you for a username and password.
Your browser will encode this in Base64 in the following format: username:password and will supply
them in the Authorization header field. Your browser will repeat the request with the Authorization
header.
You should create your own username/password combination and encode it using a Base64 encoder,
which may be found online, or on a CS lab machine with the below command:
$ cat mycredentials.txt | base64
To illustrate this process, consider the following message sequence:
Client Request:
GET /index.html HTTP/1.1
Server Response:
HTTP/1.1 401 Unauthorized
WWW-Authenticate: Basic realm="myhttpd-cs252"
Client browser prompts for username/password. User supplies “cs252” as the username and
“password” as the password, which the client then encodes as cs252:password in Base64
(Y3MyNTI6cGFzc3dvcmQ=).
Client Request:
GET /index.html HTTP/1.1
Authorization: Basic Y3MyNTI6cGFzc3dvcmQ=
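The encoding step in the sequence above can be reproduced with a small Base64 encoder. The sketch below is for illustration only; base64_encode and base64_encode_str are names invented here, and the lab does not require you to write your own encoder.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encode len bytes of src as Base64 into dst (NUL-terminated).
 * dst must hold at least 4 * ((len + 2) / 3) + 1 bytes. */
void base64_encode(const unsigned char *src, size_t len, char *dst) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    assert(src != NULL && dst != NULL);
    size_t i = 0, o = 0;

    while (i + 2 < len) {  /* full 3-byte groups become 4 characters */
        unsigned v = ((unsigned)src[i] << 16) | ((unsigned)src[i + 1] << 8) |
                     (unsigned)src[i + 2];
        dst[o++] = alphabet[(v >> 18) & 63];
        dst[o++] = alphabet[(v >> 12) & 63];
        dst[o++] = alphabet[(v >> 6) & 63];
        dst[o++] = alphabet[v & 63];
        i += 3;
    }
    if (i < len) {  /* 1 or 2 leftover bytes are padded with '=' */
        unsigned v = (unsigned)src[i] << 16;
        if (i + 1 < len) v |= (unsigned)src[i + 1] << 8;
        dst[o++] = alphabet[(v >> 18) & 63];
        dst[o++] = alphabet[(v >> 12) & 63];
        dst[o++] = (i + 1 < len) ? alphabet[(v >> 6) & 63] : '=';
        dst[o++] = '=';
    }
    dst[o] = '\0';
}

/* Convenience wrapper for C strings such as "username:password". */
void base64_encode_str(const char *s, char *dst) {
    base64_encode((const unsigned char *)s, strlen(s), dst);
}
```

Encoding the 14-byte string cs252:password this way yields Y3MyNTI6cGFzc3dvcmQ=, including the trailing = padding character. When using the base64 command instead, keep in mind that a trailing newline in the credentials file is encoded too and changes the result.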
Note: you should create your own username and password and NOT use cs252 and
password. You can modify the username and password in the file auth.txt, and use the function
return_user_pwd_string() (in server.c) to load it. When loaded, the string is stored in the global
variable g_user_pass as well. You shouldn’t use your Purdue career account credentials either.
You will check that the request includes the line
"Authorization: Basic <your credentials in Base64>"
and then respond. If the request does not include this line you will return an error.
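A minimal sketch of that check is shown below. The function name is_authorized is an assumption of this example, and the comparison is deliberately simplistic: it matches the header name case-sensitively and expects exactly one space after "Basic", which real clients usually send but HTTP does not strictly guarantee.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the raw request text contains the header
 * line "Authorization: Basic <expected_b64>", and 0 otherwise.
 * expected_b64 is the Base64 form of "username:password". */
int is_authorized(const char *request, const char *expected_b64) {
    /* The leading CRLF makes sure we match the start of a header line. */
    static const char prefix[] = "\r\nAuthorization: Basic ";
    assert(request != NULL && expected_b64 != NULL);

    const char *p = strstr(request, prefix);
    if (p == NULL) return 0;
    p += sizeof prefix - 1;

    size_t n = strlen(expected_b64);
    /* The credentials must match in full and be followed by the line end. */
    return strncmp(p, expected_b64, n) == 0 &&
           strncmp(p + n, "\r\n", 2) == 0;
}
```

When this returns 0, the server would answer with the 401 Unauthorized response and WWW-Authenticate field described earlier in this section.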
After you add the basic HTTP authentication, you may serve other documents besides index.html.
5.5 Serving Static Files
A lot of web servers are really simple and have no dynamic content. These web servers use a
directory hierarchy as their website structure. Your webserver will serve the http-root-dir/htdocs
directory in the lab handout. When your web server gets a request such as /index.html, you will
look for the file http-root-dir/htdocs/index.html and send that file. If you get a request for a file
that doesn’t exist, you should reply back with a Status-Code of 404. If you get a request for a
directory, you should serve the index.html file in that directory (if that file is not present, then
404).
You do not need to respect the Accept field of the Request message. We do expect that you give
a valid Content-Type in your response. You can call get_content_type() (defined in misc.h) to
get this information for you. Be careful though! This function does not validate the filename and
simply runs file -biE and gives you the output.
You will need to be capable of sending the following status codes: 200, 404. When grading for 404,
we will only check that the Status-Code was set correctly. You can provide any content message
you wish.
Some notes:
• You will mostly be modifying server.c and htdocs.c for this part.
• If you receive a request for a document that is a directory without a trailing forward slash (e.g.
/dir1, not /dir1/), then you should really serve /dir1/index.html. That is, pretend as
though requests for /dir1 are really requests for /dir1/index.html.
• If a request has a trailing forward slash, you will handle this in the browsable directories part.
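The lookup described in this section could be sketched as below. The function name resolve_document is invented for this example, and rejecting any path that contains ".." is an added precaution assumed here (it is conservative and also rejects harmless names such as a..b.html); the handout itself does not specify how to guard against requests that escape the document root.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: map a request path onto a file under root.
 * A directory is mapped to the index.html inside it.
 * Returns 0 and fills out on success; -1 means "respond with 404". */
int resolve_document(const char *root, const char *url_path,
                     char *out, size_t cap) {
    assert(root != NULL && url_path != NULL && out != NULL);

    /* Assumed precaution: never follow ".." out of the document root. */
    if (url_path[0] != '/' || strstr(url_path, "..") != NULL) return -1;

    int n = snprintf(out, cap, "%s%s", root, url_path);
    if (n < 0 || (size_t)n >= cap) return -1;

    struct stat st;
    if (stat(out, &st) != 0) return -1;  /* no such file or directory */

    if (S_ISDIR(st.st_mode)) {
        /* Treat "/dir1" as if "/dir1/index.html" had been requested. */
        const char *sep = (out[n - 1] == '/') ? "" : "/";
        int m = snprintf(out + n, cap - (size_t)n, "%sindex.html", sep);
        if (m < 0 || (size_t)m >= cap - (size_t)n) return -1;
        if (stat(out, &st) != 0) return -1;
    }
    return S_ISREG(st.st_mode) ? 0 : -1;
}
```

With root set to http-root-dir/htdocs, a request for /index.html resolves to http-root-dir/htdocs/index.html if that file exists, and a request for a missing file returns -1 so the caller can send a 404.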
5.6 Concurrency
In this part, you will add concurrency to the server. You will implement three concurrency modes,
distinguished by an input argument passed to the server. The concurrency modes you will implement
are the following:
• -f : Create a new process for each request
Fork mode (run_forking_server()): You should handle each request in a new process. To
do this, after you accept a connection off of your socket acceptor, you should call fork() and
execute your normal request/response logic from within the child. Don’t forget about zombies!
• -t : Create a new thread for each request
Thread-per-request mode (run_threaded_server()): You should handle each request in a
new thread. To do this, after you accept a connection off of your socket acceptor, you should
create a new thread and execute your normal request/response logic from within that thread.
You should consider using pthreads for this part.
• -pNUM_THREADS : Pool of threads
Pool of threads (run_thread_pool_server()): You should handle each request using a pool
of n workers. n is denoted by NUM_THREADS. Your program should be using n+1 threads
during execution - the thread that your program starts in should be used to create the n
workers and then wait for them to finish. Of course, since each worker is running in an infinite
loop, they will only finish on an error or when the program exits.
• -h : Print usage
This flag will print out the usage of myhttpd and myhttpds.
The format of the command to run your server should be:
myhttpd [-f|-t|-pNUM_THREADS] [-h]
If no flags are passed, the server should act like an iterative server as created in the Basic Server
section. If no port is passed, choose your own default port number. Make sure it is
larger than 1024 and less than 65536.
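The pool-of-threads mode described above (n workers plus the starting thread, which creates them and waits) might be structured as in this sketch. The names run_pool, accept_worker and struct pool_ctx are assumptions of this example and are unrelated to the template's run_thread_pool_server().

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical shared state handed to every worker. */
struct pool_ctx {
    int listen_fd;                  /* passive socket shared by all workers */
    void (*handle)(int client_fd);  /* request/response logic */
};

/* Each worker loops forever: accept, serve, close. It returns only when
 * accept() fails. Several threads may call accept() on the same socket. */
void *accept_worker(void *arg) {
    struct pool_ctx *ctx = arg;
    for (;;) {
        int client_fd = accept(ctx->listen_fd, NULL, NULL);
        if (client_fd < 0) break;
        ctx->handle(client_fd);
        close(client_fd);
    }
    return NULL;
}

/* Start n workers, then wait for all of them in the calling thread, for a
 * total of n + 1 threads. Returns 0 if all n workers were started. */
int run_pool(int n, void *(*worker)(void *), void *arg) {
    assert(worker != NULL);
    if (n <= 0) return -1;

    pthread_t *threads = malloc((size_t)n * sizeof *threads);
    if (threads == NULL) return -1;

    int started = 0;
    while (started < n &&
           pthread_create(&threads[started], NULL, worker, arg) == 0) {
        started++;
    }
    for (int i = 0; i < started; i++) pthread_join(threads[i], NULL);

    free(threads);
    return started == n ? 0 : -1;
}
```

A call such as run_pool(4, accept_worker, &ctx) blocks until every worker has returned, which in normal operation only happens on an error. On older toolchains this needs to be compiled with -pthread.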
5.7 Turning in the Checkpoint
The deadline for the checkpoint is Monday, November 4th, 2019 at 11:58 PM. You must
run the following commands in order for your submission to be valid:
$ make clean
$ make myhttpd
$ make submit_checkpoint
6 The Assignment - Final
6.1 Secure HTTP (HTTPS)
HTTP messages are sent across the Internet unencrypted, which means that anyone who can sniff
traffic along a message’s route can see anything transmitted between the client and server (think
about that Base64 encoded password from earlier...). This is often undesirable.
Fortunately a more secure version of HTTP exists, called HTTPS. HTTPS uses Transport Layer
Security (TLS) to communicate. TLS encrypts all traffic at the transport layer of the Internet
model, which means that you, as an application programmer, only need to worry about adding
high-level support for communicating via TLS. That is, the underlying socket has changed, but the
HTTP protocol you’ve implemented is unconcerned about what the socket does when your server
asks to read or write data.
Fortunately, we have an abstraction of tls_socket and tls_acceptor. You just need to implement the
functions declared in tls.h:
• int close_tls_socket(tls_socket *socket);
• int tls_write(tls_socket *socket, char *buf, size_t buf_len);
• int tls_read(tls_socket *socket, char *buf, size_t buf_len);
• tls_acceptor *create_tls_acceptor(int port);
• tls_socket *accept_tls_connection(tls_acceptor *acceptor);
• int close_tls_acceptor(tls_acceptor *acceptor);
To run your server with the TLS socket, run:
$ make myhttpds
$ ./myhttpds ...
When you navigate to your server with your Internet browser, be sure to include the https://
prefix.
Note: After successfully porting the TLS “driver” to your web server, you will receive warning
messages in your browser along the lines of “Connection is not Private.” This is because your
SSL certificate was not signed by an accepted certificate authority, such as Comodo, LetsEncrypt,
GlobalSign, etc. You can manually install your certificate as a trusted certificate in your browser,
or you may simply add an exception. In Chrome, you do this by clicking “Advanced” and “Proceed
to ...”
If you haven’t added the certificate exception when you first connect to your server, Firefox will
immediately close the connection—often before your server can write a response back to Firefox.
Your code will likely crash because you receive a SIGPIPE. You must safely handle this signal.
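One common way to handle this is to ignore SIGPIPE for the whole process, so that a write to a closed connection fails with an error code instead of terminating the server. The sketch below shows the idea on plain file descriptors; ignore_sigpipe and client_went_away are hypothetical names, and how the failure surfaces through your TLS write function is not shown here.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Ignore SIGPIPE process-wide. After this, writing to a connection whose
 * peer has gone away makes write() return -1 with errno set to EPIPE
 * instead of killing the process. Returns 0 on success, -1 on failure. */
int ignore_sigpipe(void) {
    return signal(SIGPIPE, SIG_IGN) == SIG_ERR ? -1 : 0;
}

/* Given the result of a write(), report whether the failure means the
 * client disconnected. Must be called before errno is overwritten. */
int client_went_away(ssize_t write_result) {
    return write_result < 0 && (errno == EPIPE || errno == ECONNRESET);
}
```

Calling ignore_sigpipe() once at startup is enough. The code that writes responses then has to check return values and close the connection when client_went_away() reports a disconnect.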
6.1.1 Resources
We have provided a simple reference TLS server in the file examples/tls-server.c. You can build
it with make tls-server and play around with it to learn how to use the OpenSSL library. The
Simple TLS Server example assumes that your certificate and private key files are called cert.pem
and key.pem, and that you can bind to port 4433. You will want to change this port number so
that you can run the demo. You can generate these keys by running the following command and
filling in the prompts:
$ openssl req -newkey rsa:4096 -nodes -sha512 -x509 -days 21 -out cert.pem -keyout key.pem
$ chmod 700 *.pem
Note: For this task, you should focus on understanding the flow of the provided Simple TLS
Server and how the functions in socket.c decide whether to use the TCP functions or the TLS functions.
6.2 cgi-bin
For this part, you will implement CGI-like behavior on your server. The Common Gateway Interface
allows a server to handle dynamic requests and forward them on to an arbitrary executable, typically
inside of a specified folder like cgi-bin.
When a request like this one arrives:
GET /cgi-bin/