University College London
Department of Computer Science
COMP0023 Networked Systems
Individual Coursework 1: The Domain Name System
Due: 4th November 2020
By mapping names to IP addresses, the Domain Name System (DNS) plays a critical role in enabling end
users to access services all over the Internet.
For this coursework, assume that some UK universities have decided to run a DNS hierarchy (from the root
to authoritative name servers) on their own, as part of a larger experiment aimed at reproducing the dynamics
of the Internet in a controlled environment1. As soon as you become aware of the project, you enthusiastically
volunteer to join it, eager to experiment with concepts you are studying in COMP0023.
Soon after you volunteer, you are granted access to the machines prepared to host the university DNS
infrastructure. You are then given the task of implementing name servers (NSes) that support specific name
resolution patterns decided by your new team leader.
Getting Started
Before your code is deployed in production, you are asked to demonstrate the correctness and robustness of
your implementation in a virtual environment – a common practice, also used in industry, to limit the number
of bugs and misbehaviours experienced by actual users.
To complete this coursework, you will therefore have to download a VirtualBox Virtual Machine (VM),
and run an emulated network in it. In the following, we will also refer to such an emulated network as
a lab. A link to the VM you will have to use for this coursework is available at the following web page:
http://www0.cs.ucl.ac.uk/staff/S.Vissicchio/comp0023/cw.html.
The VM includes all the software needed to complete your coursework. Indeed, you are strongly advised
not to install any additional software in the VM. This is in your own interest: our marking scripts will test
your submission inside a fresh copy of the VM for this coursework, and you will get zero marks if our scripts fail
because your code relies on additional software (e.g., tools or libraries) with which you customised your local
copy of the VM.
Our original VM comes with no graphical interface. In other words, when you connect to the running VM,
all you will see is a command line interface (CLI). This is a deliberate choice of ours, motivated by the fact that
a command line is really all you need (for this coursework, at least). Avoiding graphical interfaces also lets
you gain first-hand experience of the convenience and expressiveness of the Linux CLI (in case you haven't had
the chance to appreciate it yet), while limiting the hardware requirements for running the VM. If you feel you
absolutely need a graphical interface to be productive, you can of course install one. In that case, however, do
remember to test your coursework solution inside a copy of our original VM.
Downloading your lab
After installing the VM, the first step is for you to download and install your lab for this coursework. To do so,
you can log in to the VM specifying vagrant as both username and password. Then, simply run the installer
provided in the VM: change to the directory /home/vagrant/comp0023_platform, and execute the following
command.
vagrant@vagrant:$ cd comp0023_platform
vagrant@vagrant:~/comp0023_platform$ bash install_lab.sh <student number>
where the argument of the script is your own Portico student number (as it appears on Moodle).
Starting and stopping an emulated network
Now, you are ready to start a new network emulation, that is, to spawn virtual hosts and routers inside your
machine. To do so, change to the /home/vagrant/comp0023_platform directory, and execute the startup.sh
script from there, as follows:
1 Note that this scenario is totally fictitious, to the best of our knowledge.
vagrant@vagrant:$ cd /home/vagrant/comp0023_platform
vagrant@vagrant:~/comp0023_platform$ sudo bash startup.sh
where sudo is needed to run the startup script as a privileged user inside the VM.
Important: the first instructions run by the startup.sh script destroy all the emulated hosts and
routers running in the VM at the moment the startup script is executed. So, when executing the
above command, always be sure that no data, files or scripts need to be saved from the emulated
network devices before they are torn down: you would lose them otherwise!
Also, note that the images for emulated network devices will be downloaded the first time you start a lab. This
download can take quite some time, depending on the speed of your Internet connection. It is however done
only once, at the very first execution of startup.sh.
To stop a network emulation without starting a new one, use the cleanup script, specifying the directory
/home/vagrant/comp0023_platform as its argument – or "." if you are already in that directory.
vagrant@vagrant:~/comp0023_platform$ sudo bash cleanup/cleanup.sh .
Interacting with an emulated network
To perform network emulations, we rely on Docker. Roughly speaking, Docker makes it possible to run software
in (partially) isolated environments, called containers, which can be thought of as lightweight Virtual Machines.
In this coursework, we emulate both network intermediate systems (i.e., routers) and hosts (i.e., machines
running DNS name servers) as Docker containers.
You can retrieve diagnostic information about Docker containers and interact with them through the docker
CLI command. Although a full tutorial on the docker command line interface is beyond the scope of this
document, we report hereafter a few commands that we expect to be useful for you. We also encourage you to
check the docker documentation, e.g., by typing man docker, man docker ps, and similar calls to the man
pages of other docker sub-commands on the VM CLI.
Once an emulated network is started with the startup.sh command described in the previous section, you
can obtain a list of running containers by typing the following command.
vagrant@vagrant:$ sudo docker ps | grep "host"
In particular, the above command will print information about the Docker containers emulating hosts. For
this coursework, you will have to work on hosts only, that is, those whose names terminate with the "host"
substring – so please ignore the containers emulating routers that you may notice if you execute sudo docker
ps without the grep instruction.
To log in a container (e.g., to access the container running a name server), you can use docker exec:
vagrant@vagrant:$ sudo docker exec -it <container name> bash
The above command will launch a (bash) command line interface on the given container. Hence, if you
specified 1_R1host as the container name, you will be presented with the following prompt:
root@R1_host:/#
indicating that the commands you subsequently type will be executed inside the container named 1_R1host.
Once logged in to a container, you are conceptually working in the machine emulated by the container: each
container has its own filesystem and runs its own processes. For example, try typing ls on the command line
to check which files are locally available.
You may find it useful to retrieve information about the network interfaces of hosts emulated by containers.
Use ifconfig to do so:
root@R1_host:/# ifconfig
The output of ifconfig will list the number, type and various parameters of each network interface of the
emulated host. Among them, the IP address of each interface is reported next to the keyword inet.
Important: ignore the interface called lo, which is a “virtual” interface, not corresponding to any
physical link.
You may like to develop code outside containers, and then transfer files (e.g., source code) to them; or,
vice versa, you may want to perform some debugging inside a container and then analyse collected information
outside it. You can transfer files between the VM and the containers running in it through docker cp. In
particular, the following command will copy a file from the VM to a container:
vagrant@vagrant:$ sudo docker cp <file path on VM> <container name>:<file path in container>
You can use docker cp to copy a file from a container to the VirtualBox VM, too:
vagrant@vagrant:$ sudo docker cp <container name>:<file path in container> <file path on VM>
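For instance, to copy the root name server implementation introduced in the next section from the 1_R2host
container into the current directory of the VM, you could run:
vagrant@vagrant:$ sudo docker cp 1_R2host:/home/root-ns.py .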
DNS development and debugging utilities
As a starting point for your work, we provide an implementation of the root name server for the portion of the
fictitious university-hosted DNS hierarchy you have to implement. After starting your emulated network, you
will find the Python 3 implementation of such a root name server up and running on the container 1_R2host.
In the same container, you will also find the source code of this implementation: check the /home/root-ns.py
file inside 1_R2host.
The source code for the root name server exemplifies the usage of a few building blocks you will likely need
for the implementation of your name servers. Most prominently, root-ns.py shows how to programmatically
send and receive UDP packets through a network interface, using Python sockets. It additionally illustrates
how to use the dnslib Python library to parse received DNS packets and pack replies to them. Although we
believe that the code is mostly self-explanatory, don't hesitate to consult the official Python documentation on
sockets (see https://docs.python.org/3.5/library/socket.html) and on the dnslib library (see
https://pypi.org/project/dnslib/) if you need additional information.
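To make those building blocks concrete, here is a minimal, hypothetical sketch of a name server in the style
described above. It is not the actual content of root-ns.py, and the hard-coded record for www.example.com
is only a placeholder borrowed from the example trace later in this document:

import argparse
import socket

from dnslib import A, QTYPE, RCODE, RR, DNSRecord

# Hypothetical zone data: the names this server can answer directly.
RECORDS = {"www.example.com.": "10.58.210.228"}

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("-a", dest="address", required=True,
                        help="IP address to listen on")
    args = parser.parse_args()

    # DNS queries arrive over UDP on port 53; binding to port 53
    # requires root privileges, which you have inside the containers.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((args.address, 53))

    while True:
        data, client = sock.recvfrom(512)   # classic DNS/UDP packets fit in 512 bytes
        request = DNSRecord.parse(data)     # dnslib parses the raw query
        qname = str(request.q.qname)

        reply = request.reply()             # reply matching the query's id and question
        if qname in RECORDS:
            reply.add_answer(RR(qname, QTYPE.A, rdata=A(RECORDS[qname]), ttl=300))
        else:
            reply.header.rcode = RCODE.NXDOMAIN   # name unknown to this server

        sock.sendto(reply.pack(), client)   # pack the reply back into raw bytes

if __name__ == "__main__":
    main()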
Of course, you can copy the root-ns.py file (e.g., using docker cp) to other hosts, and use it as a basis for
a name server responsible for DNS zones other than ".". Since the source file is in Python, running the software
on a container named hostX boils down to connecting to that container and executing the script from its CLI:
root@hostX:/home# python3 root-ns.py -a X.Y.Z.W
where the string X.Y.Z.W following the -a option is the IP address you would like other hosts to contact
the name server at. Once the above command is executed, other hosts can therefore send packets to the process
running root-ns.py by specifying X.Y.Z.W as the destination IP address. For example, you can log in to another
emulated host, say 1_R1host, and send a DNS query to the root name server using the dig tool:
root@1_R1host:/home# dig @1.102.0.1 www.example.com.
The above dig command will print the records returned by the name server at the IP address 1.102.0.1
when that NS is asked about the input name www.example.com. You can further check the content of the query
packet received by the root name server, as well as its response, by adding print(..) statements inside the
root-ns.py file in 1_R2host.
In addition to dig, you can also debug your DNS hierarchy with the tracedns script located in the home
directory of every emulated host of a running lab. This script tries to resolve a name provided as an argument
in the same way a local name server with an empty cache would. For instance, the following command would
trigger iterative DNS queries to the name servers in the emulated DNS hierarchy, aimed at resolving the input
name, www.example.com.
root@1_R1host:# python3 home/tracedns.py www.example.com.
More precisely, the above command will send a first DNS query for www.example.com. to the hard-coded IP
address of the root NS (i.e., 1.102.0.1). If any NS record is provided in the reply, tracedns will then send a new
query for the same name to the returned authoritative name server, and it will continue to traverse the DNS
hierarchy until either it resolves the initial name, or it cannot proceed further (e.g., because of an unresponsive
name server). We provide more details on the tracedns output in the following section, since such output is
also used to specify the name resolution patterns your NS implementation should support.
Of course, you are also allowed to build your own additional debugging tools. For example, you could
implement a local name server answering recursive queries. We stress, however, that no custom tool beyond
dig and tracedns is needed to complete this coursework.
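As a hypothetical example of such a tool, the following sketch performs the iterative resolution loop that
tracedns is described as implementing, starting from the hard-coded root NS address mentioned above
(1.102.0.1); it assumes the dnslib library and omits error handling:

from dnslib import QTYPE, DNSRecord

ROOT_NS_IP = "1.102.0.1"   # hard-coded root NS address in the lab

def iterative_resolve(name, max_steps=10):
    ns_ip = ROOT_NS_IP
    for _ in range(max_steps):
        query = DNSRecord.question(name)    # build an A query for the name
        reply = DNSRecord.parse(query.send(ns_ip, 53, timeout=5))

        answers = [rr for rr in reply.rr if rr.rtype == QTYPE.A]
        if answers:
            return str(answers[0].rdata)    # name resolved

        # No answer: follow the referral by looking up the glue (A) record
        # of the first NS listed in the authority section, if any.
        auth = [rr for rr in reply.auth if rr.rtype == QTYPE.NS]
        glue = {str(rr.rname): str(rr.rdata)
                for rr in reply.ar if rr.rtype == QTYPE.A}
        if not auth or str(auth[0].rdata) not in glue:
            return None                     # cannot proceed further
        ns_ip = glue[str(auth[0].rdata)]
    return None

print(iterative_resolve("www.example.com."))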
Stage 1: Supporting Name Resolution Patterns
Your primary objective in this coursework is to support the resolution of names according to the unmodified
DNS protocol. This means that you will have to implement support for the very same interactions that happen
in the Internet when a name is mapped to an IP address. Yet, you will have to do so for specific names and IP
addresses, supporting pre-defined resolution patterns.
Input: All the information you need to complete Stage 1 is provided in the text files inside the stage1 directory
(created after the installation of your lab). Those files store the output that tracedns should return for specific
names once your name servers are implemented. The content of those files is similar to the following, although
names, IP addresses and TTL values will be different:
trace recursive DNS query to resolve www.example.com (10.58.210.228)
1 root-server.net. [10.0.0.1]
2 tld.net. [10.41.162.30]
3 ns2.example.com. [10.239.34.10]
final reply
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33369
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 4
;; QUESTION SECTION:
;www.example.com. IN A
;; ANSWER SECTION:
www.example.com. 299 IN A 10.58.210.228
;; AUTHORITY SECTION:
example.com. 172800 IN NS ns2.example.com.
;; ADDITIONAL SECTION:
ns2.example.com. 172800 IN A 10.239.34.10
The content of each resolution pattern's file encompasses two main parts. The first part shows the trace of
the query, i.e., the sequence of name servers contacted during a recursive query to resolve a given name. The
second part of the file details the final answer to the query, in a format very close to that of dig's output.
A couple of observations are worth making. Each trace you are provided with lists the name and IP address
of the name servers that are sequentially queried by a local name server with an empty cache when resolving the
name specified in the first line of the file. Thus, the first name server in the list is always the DNS root, and the
last one is the name server returning the final answer to the query. The first four lines in the above example
trace therefore indicate that to resolve the name www.example.com, a local NS with an empty cache would first
query root-server.net, then tld.net, and finally ns2.example.com. In analysing the traces in the stage1
directory, assume that all NSes in the trace provided an answer when queried, and that no DNS packet was lost
when the files were generated.
Task: implement a hierarchy of name servers supporting the input resolution patterns. In other words, for
each trace file t in the stage1 directory, running tracedns with the name in t as a parameter on any host in the
emulated lab should produce the same output as in t. [70 marks]
• Deliverable: a zip file cw1-task1.zip. The zip file must include one file per NS, with your Python
implementation of that NS, and no other files2. Each Python file inside the zip archive must be named
<container name>.py, where <container name> is the name of the Docker container where the Python
file has to be copied and run.
Important: you will lose marks if your submission does not follow the above format.
• Constraints: Each name server must be implemented in Python 3, and it must be possible to run it in
the same way root-ns.py is run on 1_R2host when a new lab is started. More precisely, given a script
hostX.py, it must be possible to run the script by copying it to the container named hostX, logging in to
that container, and executing the following command from the directory where the script is copied:
2 Any file additional to the Python implementation of NSes will be ignored anyway.
root@hostX:# python3 hostX.py -a X.Y.Z.W
where X.Y.Z.W is the IP address of hostX's network interface other than lo. The above command should
launch a process that answers DNS queries sent to the given IP address X.Y.Z.W on port 53 (i.e., the
port reserved for DNS).
To implement NSes, you must not use any libraries other than the ones already provided inside the Docker
containers emulating hosts, that is, those included in the installed Python distribution plus dnslib. More
generally, you should not install additional software on any Docker container.
• Suggestions: Although this task requires the implementation of several name servers, we stress that
the NSes to be implemented share the vast majority of their functionality (e.g., parsing received messages,
processing them, sending replies, and so on) with each other – and with the root name server. What mainly
changes across different name servers is the DNS zones for which they are authoritative, and hence the
DNS records they store and return in their replies. In implementing your NSes, you may therefore want
to structure your code to isolate the functions shared across NSes (see the sketch at the end of this stage).
Important: irrespective of how you develop your code, remember to submit only one Python script
per NS; you will lose marks otherwise!
• Marking scheme: We will assign marks for each trace that can be reproduced after running your NSes’
implementations. We will assign additional marks for non-disclosed queries that are consistent with the
traces you are given. Those non-disclosed queries are meant to check that your implementation does not
over-fit the input traces. More schematically:
– support for trace01: 10 marks
– support for trace02: 10 marks
– support for trace03: 10 marks
– support for trace04: 10 marks
– support for trace05: 15 marks
– support for additional queries: 15 marks
The additional queries may test the behaviour of your NS implementations when queried for: (i) zones in
your portion of the DNS hierarchy, (ii) names of other name servers in your portion of the DNS hierarchy,
and (iii) names with no A record. For each of those additional queries, we will check that your name
servers answer as production NSes deployed in the Internet would.
Note that you are actually given six traces. We believe that the sixth trace presents exceptional challenges.
Your implementation of the DNS hierarchy is therefore not required to support trace06: you can still get
the maximum number of marks (i.e., 100) irrespective of whether the sixth trace is supported or not.
Yet, you may want to accept the challenge of supporting this last trace too: if you succeed, you will get
ten bonus marks – in addition to glory and fame! Those marks will not enable you to exceed 100 marks,
but they can compensate for marks you may lose on other parts of the coursework. Carefully evaluate the
pros and cons of supporting trace06!
– [bonus] support for trace06: 10 marks
To check that a trace is correctly implemented, our marking scripts will download a copy of the VM and
of the lab you are initially provided with, copy your submitted zip file into the VM, copy and run each
Python script on the container indicated by the script's name, and then launch tracedns for each of the
names in your resolution pattern files. Our scripts will automatically compute your mark by comparing
the output of such tracedns commands with the content of the corresponding resolution pattern files.
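As an illustration of the structure suggested earlier, the following hypothetical sketch isolates the shared
reply-building logic from per-NS zone data; the records shown are placeholders taken from the sample trace,
and the surrounding socket loop is omitted:

from dnslib import A, NS, QTYPE, RR, DNSRecord

# Per-NS data: the only part that would differ across your NS scripts.
ANSWERS = {"www.example.com.": ("10.58.210.228", 299)}
DELEGATIONS = {"example.com.": [("ns2.example.com.", "10.239.34.10", 172800)]}

def build_reply(request):
    """Shared logic: answer directly, or refer the querier to a child zone."""
    qname = str(request.q.qname)
    reply = request.reply()

    if qname in ANSWERS:
        ip, ttl = ANSWERS[qname]
        reply.add_answer(RR(qname, QTYPE.A, rdata=A(ip), ttl=ttl))
        return reply

    # Referral: NS records go in the authority section, and the matching
    # glue A records go in the additional section, as in the sample trace.
    for zone, nses in DELEGATIONS.items():
        if qname == zone or qname.endswith("." + zone):
            for ns_name, ns_ip, ttl in nses:
                reply.add_auth(RR(zone, QTYPE.NS, rdata=NS(ns_name), ttl=ttl))
                reply.add_ar(RR(ns_name, QTYPE.A, rdata=A(ns_ip), ttl=ttl))
    return reply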
Stage 2: Avoiding Overloads
Suppose now that, while operating the hierarchy, university researchers note that some name servers receive too
many requests for their resources. Because of one of those overloads, a name server in the portion of the hierarchy
you implemented in Stage 1 is observed to fail at times. You are then asked to deploy on 1_R1host a replica of
the failing NS.
Input: You are provided with a minimalistic workload under which the non-root NS receiving the highest
number of requests is likely to fail: your task is to identify such a name server, and replicate it on 1_R1host.
The workload is specified in the stage2/workload.txt file, which has the following format:
LNS 1:
<queried name>
<queried name>
...
<queried name>
LNS 2:
<queried name>
<queried name>
...
<queried name>
The file reports queries sent to your portion of the DNS hierarchy by two different local name servers (LNSes).
For each LNS, the workload file lists the names that the LNS queried when the NS you will have to replicate
failed. In computing which NS received the highest number of queries under the provided workload, assume
that no packet was lost during any query, and that every LNS started with an empty cache, resolved the names
in workload.txt one after the other (i.e., no parallelization), and did not discard any resource record because
of TTL expiration.
Important: running tracedns on the queried names, one after the other, and counting the queries
generated this way is not sufficient to determine the NS you should replicate. Before starting Stage
2, be sure you understand why.
Task: Reduce the load of the non-root NS that receives the maximum number of queries when two LNSes with
an initially empty cache resolve the names in workload.txt. You are allowed to change the implementation of
any name server implemented in Stage 1 (including the root NS we provide), and to replicate exactly one NS
to the container named 1_R1host. For the replicated NS, the failure of either of the two replicas should not
reduce the number of names that can be resolved by an external LNS. [30 marks]
• Deliverable: a text file named cw1-task2.txt, and a zip file cw1-task2.zip. The text file must include
a single line with the name of the Python file to be deployed on the container 1_R1host. For example,
if you intend to replicate the name server running on 1_R100host3, you should submit a cw1-task2.txt
file with the following line only:
1_R100host.py
The cw1-task2.zip archive must exclusively include one file per NS: each of those files must be the
Python implementation of an NS. Each Python file must be named <container name>.py, where
<container name> is the name of the Docker container where the Python file has to be copied and run.
Do not include in the zip file any NS implementation for the replica to be run on 1_R1host – we will run
the script indicated in cw1-task2.txt on 1_R1host.
Important: you will lose marks if your submission does not follow the above format.
• Constraints: the implementation of all NSes must be in Python 3. As for Stage 1, it must be possible to
run every submitted Python file from the CLI, using a command equivalent to the one used to start the
root NS. You must not change the mapping implied by Stage 1 between each NS and the zones for which
that NS is authoritative. You must not install new software on any Docker container. Additionally, you
must not change the IP address at which existing NSes run – doing so would anyway provide no benefit
in the context of this coursework.
3 This will obviously not be your case, since there are far fewer than one hundred hosts in your lab.
Important: The NS replica deployed on 1_R1host must appear in the traces with a name different
from that of any other NS in the original lab.
• Assumptions: Your input workload may include names that are not mentioned in Stage 1: assume that
no name server in the DNS hierarchy stores an A record for any of those names. If multiple NSes are
specified in the authority section of a DNS reply, all LNSes always try to contact them in order: they
send a query to the first authoritative NS; if they get no response, they try the second one, and so on. In
computing the load for each NS, assume that: (i) no packet is lost during any of those queries, and (ii)
the implementations of LNSes and NSes are bug-free.
• Marking scheme: We will assign marks as follows.
– correct selection of the NS to replicate; i.e., cw1-task2.txt specifies the name of the NS receiving
the highest number of queries under the input workload: 10 marks
– reduction of the number of queries for the replicated NS under the input workload: 10 marks
– robustness to one replica failure: 10 marks
Your implementation will be automatically checked by marking scripts. Our scripts first deploy the Python
files on Docker containers on the basis of the corresponding names. They then check how many queries
are served by every NS. Load-reduction checks will be repeated five times, each time simulating LNSes
restarting with an empty cache: your replica implementation will therefore pass our tests if it reduces the
load of the NS that was most loaded before replication, either in a single execution of the given workload
or across up to five executions.
Academic Honesty
You are permitted to discuss the lectures’ and assigned readings’ content about the Domain Name System with
your classmates, but you are not permitted to share details of the assignment, show your code (in whole or in
part) to any other student, or to contribute any lines of code to any other student’s solution.
All code that you submit must be written entirely by you alone.
We use sophisticated copying detection software that exhaustively compares code submitted by all students
from this year’s class and past years’ classes, and produces color-coded copies of students’ submissions, showing
exactly which parts of pairs of submissions are highly similar. Do not copy code from anyone, either in
the current year, or from a past year of the class. You will be caught, just as students have been caught
in years past.
Copying of code from student to student is a serious infraction; it will result in automatic awarding of zero
marks to all students involved, and is viewed by the UCL administration as cheating under the regulations
concerning Examination Irregularities (normally resulting in exclusion from all further examinations at UCL).
You have been warned!
Questions and Piazza Site
If you have questions about the coursework, please don’t hesitate to visit us during office hours, or to ask
questions on Piazza. When asking questions on Piazza, please be careful to mark your question as private if it
reveals details of your solution. Questions that don’t reveal details of your solution, such as those about how
to interpret the coursework text or lecture material, should be left public, though, so that all in the class may
benefit from seeing the answers.
As always, please monitor the Piazza site of the course. Any announcements (e.g., helpful tips on how to
work around unexpected problems encountered by others) will be posted there.
Credits
The support for network emulation used in this coursework is derived from the mini-Internet project4 developed
by the ETH network research group.
4 T. Holterbach, T. Bühler, T. Rellstab, and L. Vanbever, "An Open Platform to Teach How the Internet Practically Works,"
in SIGCOMM Comput. Commun. Rev., 2020.
