Assignment – 5 (CSE30: Computer Organization and Systems) Instructors: Bryan Chin SP21 Due: Friday, May 7th 11:59PM Please read over the entire assignment before starting to get a sense of what you will need to get done in the next week. REMEMBER: Everyone procrastinates but it is important to know that you are procrastinating and still leave yourself enough time to finish. Follow the suggested development flow discussed in lecture (soapbox talk). Background (Optional Reading) In the area of network security and privacy there is a category of devices that focus on network tracking and advertisement blocking called DNS sinkholes. From Wikipedia (https://en.wikipedia.org/wiki/Domain_Name_System): “The Domain Name System (DNS) is a hierarchical and decentralized naming system for computers, services, or other resources connected to the Internet or a private network. The DNS system associates various information with domain names assigned to each of the participating entities. One important function of DNS is that it translates more readily memorized domain names, such as www.ucsd.edu, to the numerical IP addresses needed for locating and identifying computer services and devices with the underlying network protocols. By providing a worldwide, distributed directory service, the Domain Name System has been an essential component of the functionality of the Internet since 1985.” When your computer connects to the network, it is given the IP address of a DNS server where DNS queries are to be sent. An end-user application, like a web browser, will make a DNS request for every DNS name associated with a web page that you visit. The DNS server will respond with the IP address for each DNS name. The open source project, pihole (https://pi-hole.net/), turns a raspberry pi into a DNS server (and DNS sinkhole) for your local/home network. Computers on your local network are then configured to make DNS requests directly to the pihole DNS server. The pihole DNS server will either respond to your DNS name request directly (it keeps copies of commonly requested domain names) or will query higher level DNS servers on your behalf to get the IP address and then forward the response to your computer. By placing the pihole DNS server between your computer and the internet, DNS servers allow the pihole to filter DNS requests to block sites that you do not want to visit (like advertising, tracking sites, malware etc). A DNS sinkhole includes a database of sites keyed by DNS names that are to be blocked. When a DNS sinkhole receives a DNS request for a blocked site from say a web browser on your computer, it responds not with the actual IP address of the requested site but substitutes the IP address of the DNS sinkhole. The application then attempts to connect to the DNS sinkhole instead. The DNS sinkhole takes the place of the real site and in the case of a web page connection it returns a blank page (blocking access to the actual page requested), or for other types of applications drops the request. In the Figure 1 below we show what this looks like at a client web browser when the IP address for a blocked address (adservice.google.com) is returned. The warning page from the DNS sinkhole web server replaces the actual web page. Figure 1 : Warning Page Assignment Overview Jarvis needs to block certain websites from being accessible. The way to do that is to build a DNS blocker. Your assignment is to help build a system that checks if a website is blocked or not given a list of blocked websites and an input file which has the list of websites one would like to access. You will do this by implementing a hash table-based in-memory database using single linked chains for collision resolution. The database will contain DNS names of websites you wish to block. You will load a real DNS block list into the database and then make some queries for different DNS names. If a DNS name is found in the database, it will respond with a blocked message. If the DNS name is not in the database, it will respond with a not blocked message. Figure 2 shows a sample visualization of a hash table once it has been loaded with a database of DNS names of blocked websites. Figure 2 : Visualization of a hash table - a short video reviewing hashing can be found on canvas - “Hashing for HW5” Part 0: Getting started We’ve provided you some starter code with a Makefile here. Download all the files to use. This assignment has both - C programming as well as ARM assembly. We have placed the reference solution binary at: /home/linux/ieng6/cs30sp21/public/bin/dnsblockA5reference. This can only be executed in the pi-cluster. It will NOT run if you try to execute it in ieng6. In order to run your program you MUST ssh into the pi-cluster on ieng6. Both your ieng6 and pi-cluster accounts share the same files so anything you copy to ieng6 (scp to ieng6) will ALSO be on the pi-cluster when you ssh in. Compile and run your program ONLY on the pi-cluster, it will not work on ieng6.From Your Personal Computer 1. Open a Terminal 2. ssh into your ieng6 account 3. Download the starter code 4. From ieng61, ssh into the pi-cluster using the following command (may not work if executing from your personal computer): $ ssh
@pi-cluster.ucsd.edu Example: $ ssh [email protected] 5. You are ready to start! (REMEMBER: when you exit, you need to do so twice if you want to get back to your local machine!) Makefile: The Makefile provided will create a dnsblock executable from the source files provided. Compile it by typing make into the terminal. Run make clean to remove files generated by make. gdb: To run with gdb, the command is gdb [executable] to start gdb with your executable. r [arguments] will run your executable with [arguments] Part 1: dnsblock program [50 points] Step 1. Command line Parsing Your program will be called using the following syntax: dnsblock [-s] [-t tablesize] -b blockfile < input.txt The -s option prints out the following information after the hash table has been loaded. It works by walking down each chain to find the total number of entries and the length of the longest and shortest collision chain, it also keeps track of the website which was blocked the most number of times. Table size: 1873 Total entries: 75064 Longest chain: 65 Shortest chain: 18 Maximum block count: 10 The -t tablesize option allows an override of the default size of the hash table array (use 1873 as a default) to tablesize. Making this a small positive number may help you to debug your chaining routines. The simple hash function we are using (see below) often works better with prime numbers for the size of the hash table. 1 You can also log directly into the pi-cluster if you are on a UCSD network (e.g. on campus, or use UCSD VPN). The -b blockfile must be specified as it is the filename of the block list test file to load into the database. If not specified it is an error. You will open the file using fopen() in read only mode (“r”), read the file with getline() and then close the file using fclose()after EOF is reached. The blockfile will never be empty - it will at least have one entry in it. Step 2. Load the Hashtable from the blockfile Each entry in the hash chain is of the following type: typedef struct node { char *DNSname; int count; struct node *next; } node; Create the hash table on the head where htable is a pointer to the table. node **htable; Using calloc(), allocate space for the hash table to contain tablesize elements of (node *). This way the table is zero filled (where all the (node *) pointers are NULL to start. The blockfile is an ascii text file of blocked DNS names. Each line of the file will contain a single DNS name terminated by a newline (‘\n’). Lines that start with a # or a ‘\n’ are skipped over (ignored - not inserted into the hash table). You can assume all other lines are ok. Until you reach the end of the file, read the text file, one line at a time using a getline() loop, remove the trailing ‘\n’ from the input (you can use a routine like strchr()if you like) and then generate a hash value for each entry in the block DNS file (refer hash function part described later in the write up). Using the hash value, you will check if the entry is already in the table. If the DNS name you are trying to add to the database is already in the table, using fprintf() you will print a message line (as below) to stderr and go to the next entry in the blockfile. Loadtable Duplicate entry: ads.google.com If this DNSname is not in the table, you need to insert it into the hash table. Use malloc() to allocate a node and use strdup() to allocate the space (pointed at by a DNS name member) to contain a DNS name string. Insert the new entry at the front of the collision chain at the bucket specified by the hash value. Step 3. Print the stats If the -s option is given you will walk the hash table from the first chain pointer (offset 0) to the last (offset tablesize – 1). For each entry you walk the chain and while doing so count the number of entries on the chain. You need to keep track of the shortest and longest chain as well as the total number of entries in the table. The shortest length of a chain is greater than equal to 1 (if the hash table has no nodes in a particular offset - it is not a chain). You also need to keep track of the number of times a website was blocked for - you need to print the maximum. Step 4. Query Mode After the table is built, you will read standard input (until a ctrl-d is pressed on a line by itself and the program will exit) using getline(). You can type on each line input, a DNS name that you will check to see if it is blocked (found in the hash table) or is not blocked (not found in the hash table). Lines that start with a # or a ‘\n’ are skipped over. You can also create a test file and test it with: dnsblock -s -b blocklist.txt < test_file.txt Your output will be DNS name input followed by a space and either [blocked] or [not blocked]: Below we provided the output of executing the program with test_file.txt file that was provided as part of the starter code. ach.magnificentpool.com [not blocked] events3alt.adcolony.com [blocked] ach.mine-vn.com [not blocked] ach.nibirupool.com [not blocked] www.amazonco.uk [blocked] Table size: 1873 Total entries: 2919 Longest chain: 7 Shortest chain: 1 Maximum block count: 1 How to develop your code There are four files, Makefile, hash.S, blocklist.txt and dnsblock.c. Use the file dnsblock.c and hash.S to write your program. blocklist.txt is a real blocklist file. You must follow the following rules: ● You cannot change the name of parameters (name or count) of any function prototype given. ● You cannot alter main(), parseopts(), or hash() in any way. ● You cannot change the typedef for struct node. ● If given you must use any printf()or fprintf() format string. ● Implement the hash function (hashFun()) in the file hash.S - only insert your code in the part that the comment in the file indicates to. ● You must check the return value of all functions you use in the body you write and print a descriptive error message to stderr. Functions to Implement (in dnsblock.c) : a) loadtable Arguments: node **htable -> pointer to hash table unsigned long tabz -> hash table size char *blockname -> file name of block list Operation: ● opens the blockname file for "r". ● reads using getline(), each line from the blocklist and removes trailing newline. ● Lines that are start with a # or have just a newline are skipped. ● Hashes the name to find the chain. ● Calls dns_lookup() to see if the entry is already in the table ● If so fprintf to stderr and skips to next entry, otherwise calls add_front() to insert it into the table ● Closes the file and frees the buffer created by getline(). ● Returns 0 if all ok, 1 otherwise b) dns_lookup Arguments: node *front -> hash chain pointer head char *DNSname -> DNSname string Operation: The function walks the chain looking for a node that has DNSname and returns a pointer to the node in the chain that has the key DNSname. It returns NULL if not found. c) add_front Arguments: node *front -> hash chain pointer head char *DNSname -> DNSname string Operation: The function inserts a new node with the DNSname at the front of the chain. It returns a pointer to the new head of the chain if insert succeeds or returns NULL if the insert fails. d) dostats Arguments: node **htable -> pointer to hash table unsigned long tabsz -> hash table size Operation: ● Walks the hash table chain by chain ● Calculates the number of nodes in the table ● The longest and shortest chains ● The maximum block count ● Prints all this information to stderr e) querytable Arguments: node **htable -> pointer to hash table unsigned long tabsz -> hash table size Operation: The function reads a line that is a DNSname one line at a time from stdin until getline() returns <= 0. This happens when a ctrl+d keypress is made on a line by itself (after enter is pressed). Looks up to see if the name is blocked or not. Using printf, prints “$dns_address [blocked]” if found in the hashtable or or “$dns_address [not blocked]” if not found in the database. f) deleteTable Arguments: node **htable -> pointer to hash table unsigned long tabsz -> hash table size Operation: This function frees all the memory allocated while creating linked lists for each entry in the hash table. Finally, it frees the memory allocated for the hash table itself. Run valgrind to check for memory leaks and memory errors. When you run valgrind ./dnsblock -s -b blocklist.txt < test_file.txt, It should report “All heap blocks were freed -- no leaks are possible” and “ERROR SUMMARY: 0 errors from 0 contexts” Hash function: The hash function will operate on the string you pass to it and return an unsigned long int. Insert the new entry at the front of the collision chain at the bucket specified by the hash value. The hash function is a C function that calls a small ARM assembly function called hashFun. It goes in the file hash.S - in the part denoted by the comment in the file in the starter code. The hash function will follow the following logic (modified from Dan Bernstein, University of Illinois): unsigned long hash(char *str) { unsigned long hash = 0; unsigned int c; while ((c = (unsigned char)*str++) != '\0') hash = hashFun((unsigned long)c, hash); // hash = c + (hash << 6) + (hash << 16) - hash; return hash; } where hashFun has this signature: unsigned long hashFun(unsigned long c, unsigned long hash); hashFun is an ARM assembly function which you will be implementing in hash.S. We have provided the boiler plate code and you need to implement the following logic: hash = c + (hash << 6) + (hash << 16) - hash; where c is in R0, hash is in R1, return value is in R0. You may use registers R2 and R3 as temporaries. Allowed Instructions (in hash.S) You are only allowed to use the instructions provided in the ARM ISA Green Card. Failure to comply will result in a score of 0 on the assignment. Style and Commenting Points will NOT be given for style in THIS assignment but future ARM assignments WILL have points for styling, and teaching staff won't be able to provide assistance or regrades unless code is readable. Please follow these Style Guidelines for ARM assembly. Submission and Grading 1. Submit your files to Gradescope under the assignment titled Assignment5. You will submit the following file: dnsblock.c hash.S README 2. After submitting, the autograder will run a few tests: a. Checks that all required files were submitted b. Checks that dnsblock.c, hash.S compile c. Runs some tests on dnsblock.c, hash.S Make sure to check the autograder output after submitting! We will be running additional tests after the deadline passes to determine your final grade. The assignment will be graded out of 50 points. Make sure your assignment compiles correctly on the pi-cluster accessed via ieng6. Any assignment that does not compile will receive 0 credit. Slides and Video Slides: https://docs.google.com/presentation/d/1iyIG7oXTZEjboLOJOJ6y6X0o5dYsDw0Nq9naYbrh5lk/edit?usp= sharing Video: https://bit.ly/CSE30sp21A5 References [1] https://en.wikipedia.org/wiki/Domain_Name_System 欢迎咨询51作业君