2019 CSEE 4119: Computer Networks
Project 1: Video CDN

Preliminary Stage Due: October 14, 2020, at 11pm
Final Stage Due: November 18, 2020, at 11pm

Updates:
2020/10/15: (FINAL STAGE) WE RELEASED A NEW VM WITH VITAL PACKAGES. YOU MUST RE-FETCH THE VM, EVEN IF YOU DOWNLOADED IT FOR THE PRELIMINARY STAGE.

1. Overview
1.1 In the Real World
1.2 Your System
1.3 Groups and collaboration policy
2. Preliminary stage
2.1 Get your connections right.
2.2 Forwarding protocol.
2.3 Running the proxy
2.4 Test it out!
2.5 Final result
2.6 What to Submit for preliminary stage
2.7 Where to Submit
3. Final stage: Video Bitrate Adaptation
3.1 Requirements
3.2 Implementation Details
3.2.1 Simple ABR Algorithm
3.2.1.1 Throughput Calculation
3.2.1.2 Choosing a Bitrate
3.2.2 Logging
3.2.3 Running the Proxy
3.3 What to Submit for the final stage
3.4 Where to Submit
3.5 Possible plan of attack
3.6 Hints
3.7 Sample output
4 Development Environment
4.1 Virtual Box
4.2 Starter Files (for final stage)
4.3 Network Simulation (for final stage)
4.4 Apache
4.5 Programming Language and Packages
4.6 (Optional) CUIT Workstations
5 Grading
Academic integrity: Zero tolerance on plagiarism

1. Overview

In this project, you will explore aspects of how streaming video works, as well as socket programming and HTTP. In particular, you will implement adaptive bitrate selection. The programming languages and packages are specified in the development environment section.

1.1 In the Real World

Figure 1 depicts (at a high level) what this system looks like in the real world. Clients trying to stream a video first issue a DNS query to resolve the service’s domain name to an IP address for one of the content servers operated by a content delivery network (CDN).
The CDN’s authoritative DNS server selects the “best” content server for each particular client based on (1) the client’s IP address (from which it learns the client’s geographic or network location) and (2) current load on the content servers (which the servers periodically report to the DNS server).

Once the client has the IP address for one of the content servers, it begins requesting chunks of the video the user requested. The video is encoded at multiple bitrates. As the client player receives video data, it calculates the throughput of the transfer and monitors how much video it has buffered to play, and it requests the highest bitrate the connection can support without running out of video in the playback buffer.

1.2 Your System

Implementing an entire CDN is clearly a tall order, so let’s simplify things. First, your entire system will run on one host; we’re providing a network simulator (described in Development Environment) that will allow you to run several processes with arbitrary IP addresses on one machine. Our simulator also allows you to assign arbitrary link characteristics (bandwidth and latency) to the path between each pair of “end hosts” (processes). For this project, you will do your development and testing using a virtual machine (VM) we provide.

Browser. You’ll use an off-the-shelf web browser (e.g. Firefox) to play videos served by your CDN (via your proxy).

Proxy. Rather than modify the video player itself, you will implement adaptive bitrate selection in an HTTP proxy. The player requests chunks with standard HTTP GET requests. Your proxy will intercept these and modify them to retrieve whichever bitrate your algorithm deems appropriate. To simulate multiple clients, you will launch multiple instances of your proxy.

Web Server. Video content will be served from an off-the-shelf web server (Apache). More detail is given in the Development Environment section.
As with the proxy, you can run multiple instances of Apache on different fake IP addresses to simulate a CDN with several content servers. However, in the assignment, rather than using DNS redirection like a CDN would, the proxy will contact a particular server via its IP address (without a DNS lookup). A possible (ungraded) future extension to the project could include implementing a DNS server that decides which server to direct the proxy to, based on distance or network conditions from a proxy to various web servers.

The project is broken up into two stages:
● In the preliminary stage, you will implement a simple proxy that sequentially handles clients and passes messages back and forth between client and server without modifying the messages.
● In the final stage, you will extend the proxy to implement the full functionality described above, with the proxy modifying HTTP requests to perform bitrate adaptation.

1.3 Groups and collaboration policy

This is an individual project, but you can discuss it at a conceptual level with other students or consult Internet material (excluding implementations of Python proxies), as long as the final code and configuration you submit is completely yours and as long as you do not share code or configuration. Before starting the project, be sure to read the collaboration policy at the end of this document.

2. Preliminary stage

You will be implementing a simple proxy that accepts client connections sequentially (i.e. handles a client, and, once it disconnects, takes care of the next client). In later stages, your proxy will be required to handle client connections concurrently. In this preliminary stage, the proxy does NOT need to modify any messages it receives, as it will just relay the messages back and forth. In later stages, you will enhance your proxy to modify messages in order to perform adaptive bitrate selection.

2.1 Get your connections right.
Your proxy should accept connections from clients and then open up another connection with a server (see 2.3 Running the proxy). Once both connections are established, the proxy should forward messages between the client and server. You should implement this in two steps:

a. Establish a connection with a client: Your proxy should listen for connections from a client on any IP address on the port specified as a command line argument (see 2.3 Running the proxy). Your proxy should accept multiple connections from clients. It is not required to handle them concurrently for now. Simply handling them one by one, sequentially, will be enough.
b. Establish a connection with a server: Once the proxy gets connected to the client, it should then connect to the server. The server IP is provided as a command line argument. As for the port number, use 8080.

Make sure to close connections to the client and server when either of them disconnects.

(Figure 4) Preliminary structure

2.2 Forwarding protocol.

The “messages” that the proxy forwards follow a particular structure (for example, in HTTP, you know that there is a header and a body). This structure is important, as it tells the recipient of the message where to look to get a specific piece of information. For the preliminary stage, we keep our message structure very simple:

(Figure 5) Structure of a message

The message has a body, and an End Of Message (EOM) symbol that indicates the end of the message. In our case, we define our EOM as the new line character ‘\n’. Please note that detecting ‘\n’ is different from detecting the slash character ‘\’ followed by the letter ‘n’. A message has to be fully received by the proxy before being forwarded to the other side.

Here is how the forwarding protocol works in our case:
1. The proxy gets a message from the client and forwards it to the server.
2. The proxy expects a response from the server, gets it, and forwards it to the client.

An important thing to notice here is that there is no asynchronous forwarding (i.e., the proxy doesn’t simply forward any message; it first waits for a message from the client, and then waits for a response from the server). In other words, the proxy shouldn’t forward a message coming from the server before getting one from the client.

2.3 Running the proxy

You should create an executable Python script called proxy inside the proxy directory (see below for a description of the development environment), which should be invoked as follows:

cd ~/project1-starter/proxy
./proxy

listen-port: The TCP port your proxy should listen on for accepting connections from the client.

fake-ip: Your proxy should bind to this IP address for outbound connections to the server. You should not bind your proxy listen socket to this IP address; instead, bind the listen socket to receive traffic to the specified port regardless of the IP address (i.e. by calling mySocket.bind((“”, <listen-port>))).

Important note: The above is a pretty unusual thing to do. You might think “in lecture, we saw that the client socket doesn’t bind, and now, you are telling us to bind the proxy’s outbound socket when connecting to the server”. However, it is necessary for the Final Stage’s network simulator to work properly, and so we will add it in this stage.

server-ip: The IP address of the server.

See instructions for making your script executable in 2.6 What to Submit for preliminary stage.

2.4 Test it out!

You can test the parts of your proxy implementation described in Get your connections right and Forwarding protocol by using the netcat tool (nc or netcat, which is installed in the VM) presented in class, using both a netcat client and a netcat server. You should be able to send a message from the client and see it appear on the server side. Then, any response sent from your server should also appear on the client.
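Because TCP delivers a byte stream rather than discrete messages, a single recv() call may return only part of a message, or more than one. The EOM framing from 2.2 can be sketched roughly as follows. This is a minimal illustration, not a required design: the helper name recv_message, the leftover buffer argument, and the 4096-byte read size are all assumptions made for this example.

```python
import socket

EOM = b"\n"  # a single newline byte (0x0A), not a backslash followed by "n"


def recv_message(sock, buffer):
    """Read from sock until one full message (ending in EOM) is available.

    `buffer` is a bytearray holding bytes left over from earlier reads
    (hypothetical helper argument, not part of the assignment spec).
    Returns the message including its EOM, or None if the peer closed
    the connection before a complete message arrived.
    """
    while EOM not in buffer:
        data = sock.recv(4096)
        if not data:          # empty read: the peer closed the connection
            return None
        buffer.extend(data)
    end = buffer.index(EOM) + len(EOM)
    message = bytes(buffer[:end])
    del buffer[:end]          # keep any bytes that belong to the next message
    return message
```

A proxy built on such a helper would keep one buffer per socket, call it on the client socket, forward the result with sendall() on the server socket, and then do the same in the other direction.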
For the fake-ip, you can use 127.0.0.1 (localhost) when testing with netcat instances that are created on your machine.

2.5 Final result

Your proxy should be able to forward a message from a client to the server, and in turn, forward the response from the server back to the client. It should support back-and-forth messages until one side closes the connection. After the connection is closed, it should be able to accept a new connection from a client. You should be able to test the behavior of your application by creating netcat instances.

Note that this version simply forwards messages with the structure specified in Figure 5. The next stage will have a different message structure, the HTTP message structure, and you’ll have to adapt your proxy based on your knowledge of HTTP messages.

2.6 What to Submit for preliminary stage

PLEASE PAY ATTENTION TO THE HANDIN STRUCTURE, AS EVEN A TYPO WILL CAUSE THE GRADER TO BREAK, WHICH CAN MAKE YOU LOSE 10 POINTS.

You will submit your project as a zipped file named <yourUNI>.zip. Unzipping this file should give us a directory named handin which should only contain the following:
● proxy: A directory named proxy containing only your source code. The code that you want to execute should be an executable named proxy, as described in 2.3 Running the proxy.

To make the code executable, follow these steps:
1. Add ‘#!/usr/bin/env python3’ to the top of your proxy Python file.
2. Run ‘chmod 755 proxy’ to ensure that the file has the correct permissions to be executable by us.

(Figure 6) Preliminary stage submission file structure.

You may organize your code within the proxy directory as you see fit. Part of your grade may be based on how understandable/organized/well-explained your code is, but we do not require any particular organization as long as it is well-organized.

2.7 Where to Submit

You will submit your code to CourseWorks. If you have any questions about it, please let us know ASAP.

3. Final stage: Video Bitrate Adaptation

Many video players monitor how quickly they receive data from the server and use this throughput value to request higher or lower quality encodings of the video, aiming to stream the highest quality encoding that the connection can handle. Rather than modifying an existing video client to perform bitrate adaptation, you will implement this functionality in an HTTP proxy through which your browser will direct requests.

3.1 Requirements

(1) Implement basic video adaptation functionality in your proxy. Your proxy should calculate the throughput it receives from the video server and select the best bitrate for the connection. See Implementation Details for details.

(2) Evaluate your basic video adaptation function. Running the dumbbell topology (via netsim) will create two servers (listening on the IP addresses in topo1.servers: 3.0.0.1:8080 and 4.0.0.1:8080); you should direct one proxy to EACH server. Now:
(a) Start playing the video through each proxy. Run the topo1 events file and direct netsim.py to generate a log file: ./netsim.py -l ../topos/topo1 run -- after 1 minute, stop video playback and kill the proxies.
(b) Gather the netsim log file and the log files from your proxy and use them to generate plots for link utilization, fairness, and smoothness. Use our grapher.py script to do this: ./grapher.py .
(c) Repeat this process for alpha = .1, .5, .9. Generate 9 plots (from (b)) and compile them into a single PDF. In the PDF, include a brief discussion of the tradeoffs and trends when varying alpha. We are looking for a paragraph or two, at most.

3.2 Implementation Details

You are implementing a simple HTTP proxy. It accepts connections from web browsers, modifies video chunk requests as described below, opens a connection with the web server’s IP address, and forwards the modified request to the server. Any data (the video chunks) returned by the server should be forwarded, unmodified, to the browser.
Your proxy should listen for connections from a browser on any IP address on the port specified as a command line argument (see below). Your proxy should accept multiple concurrent connections from web browsers by starting a new thread or process for each new request. When it connects to a server, it should first bind the socket to the fake IP address specified on the command line (note that this is somewhat atypical: you do not ordinarily bind() a client socket before connecting). Figure 4 depicts this.

3.2.1 Simple ABR Algorithm

The simple ABR algorithm looks at the bandwidth available to a client, and requests bitrates that the connection can likely handle. Hence, there are two functions: throughput calculation and bitrate requests.

3.2.1.1 Throughput Calculation

Your proxy could estimate each stream’s throughput once per chunk as follows. Note the start time, $t_s$, of each chunk request (i.e., import the time module and save the current timestamp using time.time() when your proxy receives a request from the player). Save another timestamp, $t_f$, when you have finished receiving the chunk from the server. Now, given the size of the chunk, $B$, you can compute the throughput, $T$, your proxy saw for this chunk (to get the size of each chunk, parse the received data and find the “Content-Length” parameter):

$$T = \frac{B}{t_f - t_s}$$

To smooth your throughput estimate, your proxy should use an exponentially-weighted moving average (EWMA). Every time you make a new throughput measurement ($T_{new}$), update your current throughput estimate as follows:

$$T_{current} = \alpha T_{new} + (1 - \alpha) T_{current}$$

The constant $0 \le \alpha \le 1$ controls the tradeoff between a smooth throughput estimate ($\alpha$ closer to 0) and one that reacts quickly to changes ($\alpha$ closer to 1). You will control $\alpha$ via a command line argument. When a new stream starts, set $T_{current}$ to the lowest available bitrate for that video.
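The two formulas above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the conversion to kilobits per second (1 Kbps taken as 1000 bits per second) is an assumption chosen so the estimate can be compared directly with bitrates expressed in Kbps.

```python
def chunk_throughput(size_bytes, t_start, t_finish):
    """Per-chunk throughput T = B / (t_f - t_s).

    size_bytes is the chunk size B (e.g. from Content-Length);
    t_start and t_finish are timestamps in seconds.
    The result is in kilobits per second (unit choice is an assumption).
    """
    kilobits = size_bytes * 8 / 1000
    return kilobits / (t_finish - t_start)


def update_estimate(t_current, t_new, alpha):
    """EWMA update: T_current = alpha * T_new + (1 - alpha) * T_current."""
    return alpha * t_new + (1 - alpha) * t_current
```

In a proxy, t_start and t_finish would come from time.time() calls taken when the request arrives and when the last byte of the chunk has been received. For instance, a 125000-byte chunk received in 2 seconds gives chunk_throughput(125000, 10.0, 12.0) = 500.0, and with alpha = 0.5 a previous estimate of 500 and a new measurement of 1000 combine to 750.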
3.2.1.2 Choosing a Bitrate

Once your proxy has calculated the connection’s current throughput, it should select the highest offered bitrate the connection can support. For this project, we say a connection can support a bitrate if the average throughput is at least 1.5 times the bitrate. For example, before your proxy requests chunks encoded at 1000 Kbps, its current throughput estimate should be at least 1.5 Mbps.

Your proxy should learn which bitrates are available for a given video by parsing the manifest file (the “.f4m” initially requested at the beginning of the stream). The manifest is encoded in XML; each encoding of the video is described by an element whose bitrate attribute you should find.

Important: Upon investigating the video files, we found that the labeled bitrate and actual bitrate do not match. Hence videos labeled ‘10’ are not actually 10 Kbps. We are asking all of YOU to find the proper bitrates of each labeled bitrate (HINT: you can assume that all chunks are the same duration as each other, and that all chunks of the same labeled bitrate are the same actual bitrate, as the differences are minor). Note that you only need to find the actual bitrates once (offline), and can hard-code the mapping from labeled bitrate to actual bitrate in your proxy script. When requesting videos, you should name requests according to the bitrates found in big_buck_bunny.f4m (i.e., the labeled bitrate), but you should choose which bitrate to request based on the actual bitrate you calculate yourself (and how it compares to the current throughput estimate). In summary, the labeled bitrate is in the chunk filename, but the actual bitrate is related to the size of the chunk in bytes, and the two are not necessarily equal.

Your proxy replaces each chunk request with a request for the same chunk at the selected labeled bitrate by modifying the HTTP request’s Request-URI.
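The selection rule described above (highest bitrate whose actual value is at most the throughput estimate divided by 1.5) might be sketched like this. The function name, the dictionary layout, and the fallback to the lowest encoding when nothing qualifies are assumptions for illustration; the numbers in the example mapping are placeholders, not measured bitrates.

```python
def choose_bitrate(throughput_kbps, actual_bitrates):
    """Return the bitrate label to request for the next chunk.

    actual_bitrates maps each labeled bitrate (as listed in the manifest)
    to the actual bitrate in Kbps that you measured offline.
    """
    supported = [label for label, actual in actual_bitrates.items()
                 if throughput_kbps >= 1.5 * actual]
    if not supported:
        # Assumption: if no encoding meets the 1.5x rule, use the lowest one.
        return min(actual_bitrates, key=actual_bitrates.get)
    # Pick the label whose actual bitrate is highest among the supported ones.
    return max(supported, key=actual_bitrates.get)


# Placeholder mapping for illustration only; measure the real values yourself.
EXAMPLE_BITRATES = {10: 12, 100: 110, 500: 520, 1000: 1050}
```

Note that the comparison uses the actual bitrate, while the returned value is the label that goes into the chunk filename.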
Video chunk URIs are structured as follows:

/path/to/video/<bitrate>Seg<num>-Frag<num>

For example, suppose the player requests fragment 3 of chunk 2 of the video Big Buck Bunny at bitrate label 500:

/path/to/video/500Seg2-Frag3

To switch to a higher bitrate, e.g., bitrate label 1000, the proxy should modify the URI like this:

/path/to/video/1000Seg2-Frag3

IMPORTANT: When the video player requests big_buck_bunny.f4m, you should instead return big_buck_bunny_nolist.f4m, which is also stored in servers. The latter does not list the available bitrates (actually, it only lists a single bitrate label, 1000, so you should expect the browser to always send you queries with 1000 as the desired bitrate label for any sequence). Your proxy should, however, fetch big_buck_bunny.f4m for itself (i.e., don’t return it to the client) so you can parse the list of available encodings as described above.

3.2.2 Logging

We require that your proxy create a log of its activity in a very particular format for scoring and graphing. After each request, your proxy should append the following line to the log: