Name: Tufts ID:

COMP160: Algorithms, Spring 2020, Final Exam

Instructions

• You have 24 hours to answer the 8 questions in the exam.

• There are 120 points in the exam but the exam will be graded over 100 points (i.e., you can

get over 100%) in exam.

• Please read each question carefully, specify any assumptions that you make, and briefly

justify all your answers.

• Please start each of the 8 questions on a new page (for easier grading). Your answer can

be more than a page long. Subquestions can be on the same page. Your answers may be

handwritten or typewritten or any combination as long as they are easily readable.

• To maximize how much credit we can give you, we remind you to write answers that are

well-structured and easy to understand quickly.

• Upload your answers to Gradescope before noon on Saturday, May 2. To be safe, submit

something by 11am. Remember you can resubmit as many times as you like up until the

deadline. Late submissions will be penalized 1 pt per 3 minutes late.

• This exam is open-book. You may consult any books, notes, webpages, online videos, etc

that do not involve interacting with a live person. You may not consult a live person in any

way.

• Do not send any messages asking for clarification. If any parameters are unclear,

make a decision that is consistent with what is given and what we have done in COMP 160,

and specify your assumptions. However this does not mean you can add extra unwarranted

assumptions.

Examples: If we mention an otherwise unspecified hash function, you should assume it

is SUH since that was generally the case in this course. If we ask you about sorting without

mentioning duplicates, you are welcome to assume there either are or aren’t duplicates and

proceed accordingly. On the other hand, if we ask you a question about a BST, you cannot

assume it is an AVL tree. All you know is that it is a BST.

• If you have any essential questions during the exam (e.g., something in the exam appears

incorrect) make a private question on Piazza. We will do our best to respond on time.

•

WARNING: This is an exam from previous semesters given as prac-

tice. Note that topics were slightly different and learning expectations

may have changed. Also, solutions were hastily written just as notes

for TAs. Use them at your own discretion

•

Self reflection post exam: students complained on length even though we feel it is shorter

than 2 midterms (and exam is 3 hours long). In the future we should not make 24 hour

long questions (students felt forced to work 10+ hours on exam). Similarly, we should

remove bonus questions: caused unnecessary stress and was not needed since grades were

high (median of 100!).

1 ©2020 Tufts University

Please write the following affirmation and sign with your name.

This will be uploaded as problem 8 on Gradescope and is necessary

in order for you to receive credit on the exam:

“I affirm that I have not consulted with any other person (except the

instructors) in the course of answering this exam. I understand that doing

so may result in a grade of zero on the exam. Signed, ”

2 ©2020 Tufts University

1. (16 points) When we discussed hash tables we assumed that n (the number of elements that

we insert) is known in advance. That way we could ignore rehashing. For this exercise we

will consider the case in which n is not known in advance, deletions are not allowed, and we

are handling collisions with chaining. Also, for any instant of time i, let ni and mi represent

the number of elements and buckets in the hash table, respectively.

When n is not known in advance we must modify insertion. In the table we always keep track

of ni and mi (when table is created, we set mi = 2). You also fix a desired upper bound

on the load factor α (for this exercise we will set α = 1). Before inserting element ei in the

table, you check if after insertion, the load factor would exceed α. If so, we double the size

of the array, rehash all elements into the new table, insert ei (and update values ni and mi).

Otherwise, no resizing is needed so we proceed with the usual insertion (and update ni).

(a) Explain in 1-2 sentences why now the worst case runtime of insertion under this modi-

fication is Θ(n)

Any time we insert and need to expand we must pay Θ(ni). So any expansion where

ni > n/2 will have this situation.

(b) The remainder of the exercise will show that insertion has an amortized cost of O(1).

Assume this is true. What does it mean to say insertion has an amortized cost of O(1)?

Explain it in your own words (in 2-4 sentences)

One single operation may need linear time, but n executions will need in total O(n) time.

(c) For a hash table Hi we use the potential function Φ(Hi) = 2ni−mi + 2. No justification

necessary.

i. What is the potential of an empty hash table? Remember that we do not allow

deletions and initial table size is 2

0

ii. What is the potential of a hash table when ni = 5 and mi = 8?

10− 8 + 2 = 4

iii. Can the potential be ever negative? Why? Remember that we rehash when α > 1

no, because we always have ni ≤ mi ≤ 2ni. The only exception to the previous inequality

is when the array is empty.

(d) Say that we execute an insertion that does not need to rehash all elements. Show that

the potential grows by 2 (that is Φ(Hi)− Φ(Hi−1) = 2)

It just follows from the fact that mi = mi−1 and ni1 = ni−1 + 1.

(e) Say that we do an insertion that causes a rehash. Show that the potential after the

insertion is 4 (that is Φ(Hi) = 4). What is the value of Φ(Hi)− Φ(Hi−1)?

Now we have mi = 2mi−1 and ni = ni−1 + 1. More before we were full, so ni−1 = mi−1

and 2ni−1 = mi.

So, Φ(Hi) = 2ni −mi + 2 = 2(ni−1 + 1)− 2ni−1 + 2 = 4.

Similarly:

Φ(Hi)− Φ(Hi−1) = 2ni −mi + 2− (2ni−1 −mi−1 + 2) (1)

= 2ni−1 + 2− 2ni−1 + 2− (2ni−1 − ni−1 + 2) (2)

= 4− (ni−1 + 2) = 2− ni−1 (3)

3 ©2020 Tufts University

(f) With the above results, show that the amortized cost cˆi of an insertion satisfies cˆi ≤ 3,

regardless of whether or not it caused a rehash. For simplicity, you may assume that it

takes 1 unit of time to insert an element in a hash table (in normal insertion or when

rehashing) and that the load factor α is 1. You can ignore times needed to check if

pointers are null, time needed to deallocate memory, and other similar computations.

In an insertion without expansion we have ci = 1. If we need to expand, cost is ci = ni.

So, now we have:

Without expansion: cˆi = ci + φi − φi−1 = 1 + 2 = 3.

With expansion: cˆi = ci + φi − φi−1 = ni + (2− ni−1) = (ni − ni−1) + 2 = 3.

(g) Use your previous answers to justify (with 4-6 bullet points) why amortized cost of

insertion is O(1).

Pasted from recitation:

• Let ci be the real cost of doing the i-th operation. The runtime is

∑

i ci

• It is difficult to give a bound for ci (sometimes it is low, sometimes it is high). We

want to somehow flatten the runtimes

• In order to make all runtimes as similar as possible we introduce the amortized cost

cˆi = ci + φi − φi−1

• φi is the potential of the data structure after the i-th operation

• In short, when ci is small, the potential should grow (φi − φi−1 must be positive).

The reverse when ci is big

• With some minor requirements on φ we can show

∑

ci ≤

∑

cˆi

• Use some magic to show that cˆi is small (say, cˆi ≤ O(1) or cˆi ≤ O(log n))

• Total time spent by the algorithm is equal to

∑

ci ≤

∑

cˆi ≤ n·the bound obtained

in previous step

Summary: cheap operations increase the potential. Expensive operations must use

enough potential to make sure cˆi remains low.

2. (10 points) Mati and Karen discover one day that they own the exact same set of n games.

They decide to pass the time over n summer evenings as follows: each evening Mati picks

one game at random from his collection. Similarly, Karen picks a game from her collection

and both games are played. Once a game has been picked it is retired from that person’s

collection and not picked again. On any evening where they both happen to pick the same

game, they play the game once and go out to JP Licks afterwards.

By end of the summer, how many times should they expect to go out to JP Licks?

This problem is identical to the hat-check problem in the class notes. The answer is 1.

X = number of times they go to JP licks.

Xi = 1 if on ith day they go to JP licks

X =

∑

Xi

E[Xi] = P (Xi = 1) = 1/n

E[X] = E[

∑

Xi] =

∑

1/n = n

4 ©2020 Tufts University

3. (10 points) By now you have heard a gazillion times that staying at home is the best we can

do to #FlattenTheCurve. Let’s discuss this with Comp 160 terminology:

• Model 1: no quarantine is introduced. In this situation, every currently infected person

infects a new person every 3 days.

• Model 2: everyone stays at home and spread is reduced by 75 percent, which we will

model by saying every currently infected person infects a new person every 12 days.

This model is vastly oversimplified: it assumes that once someone is infected they stay infected

and does not consider other effects like herd immunity and so on. Say that on day 0 there is

one person infected. Let M1(t) and M2(t) denote the number of infected people under models

1 and 2, respectively, where t represents days.

(a) Give a recurrence for M1(t) (that is, define M1(t) in terms of some M1(t

′) where t′ < t.)

M1(t) = 2M1(t− 3)

(b) Choose the correct explicit formulas for M1(t) and M2(t). Note that you can check this

against your answer to part (a).

i. M1(t) = t

2/3, M2(t) = t

2/12

ii. M1(t) = 3

t, M2(t) =

(

3

4

)t

iii. M1(t) = 2

t/3, M2(t) = 2

t/12

iv. M1(t) = 3t, M2(t) =

3

4 t

M1(t) = 2

t/3, M2(t) = 2

t/12

(c) Show that M2(t) = O(M1(t)) but that the reverse is NOT true.

M2(t) = 2

t/12 = (2t/3)1/4 < 2t/3 = M1(t), which shows M2(t) = O(M1(t)).

However, can we find c, n0 > 0 s.t. M1(t) ≤ cM2(t), i.e., 2t/3 ≤ c · (2t/3)1/4 for all n > n0?

No because dividing by (2t/3)1/4 would give (2t/3)3/4 ≤ c, and (2t/3)3/4 = 2t/4 is not a

bounded function.

Fun fact: we have just shown that the two exponential curves are not asymptotically

equivalent. This is why it is so important that we stick to Model 2.

4. (20 points) We are given a long string (like Comp160isAmazing) that we want to split into

a series of substrings of length `1, `2, . . . , `n in that order (in the previous example, we want

lengths 4, 3, 2, and 7).

You can split a string into two by slicing anywhere, but it costs `′ units (where `′ is the length

of the string).

So, in the previous example we could do as follows:

• Slice the string into one of length 9 (string Comp160Is) and a second one of length 7

(string Amazing). Cost of cut = original string length = 16

• Now, we slice the first string into two (Comp160 and Is). Cost of cut = 9

• We have three strings: Comp 160, Is, and Amazing. To finish, we slice the first string

into two, creating Comp and 160. Cost of cut = 7

5 ©2020 Tufts University

The total cost of all slice operations is 16 + 9 + 7 = 32

(a) First observe that the order matters. Give a different slice order for the example above

that results in a smaller overall cost.

• Slice Comp160isAmazing into Comp160Is and Amazing. 16 pts

• Slice Comp160Is into Comp and 160Is. 9 pts

• Slice 160Is into 160 and Is. 5 pts

Total 30 pts.

(b) Note that in the above example, the minimal cost is achieved by a greedy strategy (slicing

off the longest of either the first or last substrings at every step). Give a different problem

instance in which this approach would fail (Hint: what happens if all target words have

the same length?)

If a string has length 8 and we want all strings of length 1 the best we can do is split always

by the middle. We pay 8+4+4+2+2+2+2 = 20. Greedy will pay 8+7+6+5+4+3+2 =

35.

(c) Give an algorithm using dynamic programming that given the target lengths `1, `2, . . . , `n

slices a string of length ` = `1 + . . . + `n into the desired substrings, while at the same

time minimizing the overall cost. Justify correctness of your algorithm. What is the

runtime of your algorithm?

Note: as usual, points will be given for clear structure, efficiency, and arguing correct-

ness.

6 ©2020 Tufts University

I will call L the array of lengths, e.g. L = [4,3,2,7]. Let L be 1-indexed. Splitting the

original string at one of the designated locations is like separating L into two subarrays

1 to i and i + 1 to n, with cost equal to the sum of all entries of L. Splitting the entire

spring into the target lengths corresponds to splitting L into subarrays of size 1. We

do not actually need to know what the string is to compute the minimal cost of this

procedure.

Define subproblem: let MinCost(i, j) = the minimal cost required to split L[i,j] into

individual subarrays of size 1. Assume i ≤ j.

Base case(s): MinCost(i, i) = 0 because there is nothing further to split.

Justification of recursion: Consider the first split of the optimal solution for

MinCost(i, j). It must happen between consecutive indices of L. The cost of the first

split is the sum of all entries of L. The cost of all remaining splits will be accounted for

in the subproblems for each subarray.

Recursion:

MinCost(i, j) =

0 if i = j

j∑

a=i

L[a] + min

1≤a≤j−1

(MinCost(i, a) +MinCost(a+ 1, j)) otherwise

Runtime: is horrible so there’s probably a better solution. There are n2 subproblems

and each one takes O(j − i) = O(n) to compute, so the total runtime is n3. But if

this solution works I’d give a student 80-90% credit for it. We can be more efficient by

computing the table of subarray sums first in n2 time using DP (i.e. Sum(i, j) =

j∑

a=i

L[a])

but this doesn’t affect the asymptotic runtime since we still have to compute the min for

each subproblem of MinCost.

(d) (Bonus 3 points. Attempt this only when you have finished everything else) What if

you wanted to list the slices itself (in addition to the minimum cost)? Describe with 4-5

sentences how would you do so.

We need to be able to determine what choice was done. This is done by storing bread-

crumbs or noticing what made the decision at each step.

5. (20 points) You are given a weighted directed graph G = (V,E) with n vertices and m edges.

Each edge e has a weight w(e) (for simplicity you can assume that all weights are strictly

positive) and a color (white or black). Your goal is to modify an existing algorithm from class

so that given G and a vertex s ∈ V , the algorithm computes the shortest path from s to all

other vertices of G with the constraint that in a shortest path you are not allowed to traverse

two white edges in a row (for example: a possible path could follow edges in color black,

black, white, black, white but is not allowed to do white, white, black). Your algorithm need

only compute the length of the shortest path (not the path itself).

For full credit you must address the full points in separate entries

(a) Which algorithm from class do you plan to modify to solve the problem? (just name the

algorithm)

7 ©2020 Tufts University

Dijkstra or BF work. Of course they have to be consistent with answer below

(b) What modifications will you apply? (explain in more detail the modifications). The

details can be challenging. A fully correct algorithm will receive full credit. An attempt

which is partly correct but also analyzes what parts are not handled correctly yet will get

very high credit. An attempt which is partly correct with no reflection will get partial

credit. MATI: Grader’s note: this is a tough question: our goal is that well structured

attempts get half credit, a quarter for the idea and a quarter for a good execution overall.

Strategy 1 (easiest). Modify the graph: We will make 2 copies of G: intuitively speaking,

which copy you are in represents in what state you are. If you are in a vertex of the first

copy, then you can follow any edge. If you are in a vertex of copy 2 the next edge to

follow cannot be white.

Every vertex v appears twice (let v1 be the copy in first graph and v2 the copy in second

graph). Now we add edges: if there is a black edge uv in G, we add an edge from u1 to

v1 and from u2 to v1. If there is a white edge uv in G, we add an edge from u1 to v2.

Edges inherit the weights from G.

Let G′ be the modified graph. Run Dijkstra on G′ starting on s1. For every vertex

u, return its score as the minimum between u1 and u2. The shortest path is found by

following the edges from the vertex that had smallest score.

Strategy 2 (difficult). Remember two paths: Dijkstra normally stores only the shortest

path (this is why it has one score and one parent). We need to modify it to keep the

overall shortest path and the shortest path that reaches a vertex making sure that you

arrive by following a black edge.

The way to do so is to keep two scores: score (as usual, shortest path from s to current

vertex) and black score (shortest path under the constraint that the last edge that reaches

current vertex is black). Notice that v.score ≤ v.blackscore. For every vertex we will store

parent of both score and blackscore. As you will see, the details are a bit tricky

Initiallize all scores as infinite, except for source (both scores zero). Now priority queue

will have 2n elements, each vertex appears twice in the queue: once with flag followed

white edge equal to true and once with the flag equal to false. Those flags will never

change

Main loop of algorithm is the same as Dijkstra: extract the vertex v of lowest priority,

relax outgoing edges (possibly updating priorities), extract another vertex and so on. We

finish when no elements remain in the priority queue.

However, some changes are needed in main loop. When you extract vertex v from queue,

check its flag:

• If followed white edge is true, we have just obtained the score of v. For any black

outgoing edge vw we relax both copies of w in the priority queue (with flag true and

false) if they are in queue. Note that we do not relax from white outgoing edges.

• if followed white edge is false, we have just obtained the black score of v. We relax

all black outgoing edges as before. In addition, for any white outgoing edge vw we

relax the copy w that has the flag equal to true (if it is still in Q).

Once the big loop has ended the length of the shortest paths are stored in the score as

usual. Paths are a bit hard to extract (you need to recall how you reached that vertex

and follow either parent or black parent...but this was not asked in the question).

8 ©2020 Tufts University

Strategy 3 (intermediate). Use same idea as 2 but with Bellman-Ford:

Each vertex has two scores (v.score v.blackscore). As usual, source has 0 for both, infinite

for the rest

Now, when you relax edge uv you have two cases:

• If edge is white: we update score of v (if current score higher than u.blackscore

+w(uv), we update score and parents)

• if edge is black we update both score and blackscore of v (possible new score is u.score

+w(uv)).

We still do m rounds of relaxations. When we end, the scores contain the lengths of the

shortest paths.

(c) Any additional assumptions?

Graph is given in adjacency list

(d) Justify correctness of your modified algorithm

Strategy 1: is based on the fact that the two layers can be used to remember which

edge we followed. When we are at copy 1 we can follow any edge, but possibly we land

on copy 2 (if we follow a white edge). When we are at copy 2 we can only follow black

edges (which puts us back at copy 1). Once this is somehow justified the rest follows from

classic Dijkstra

Strategy 2: The key in correctness follows from same idea as Dijkstra: when you extract

a vertex you know their correct score. Now, you need to be careful of what type of

extraction you did (depending on flag), and if so you computed same score or blackscore.

Strategy 3: As in BF, we do induction in the number of hops in the shortest path. The

main change is that score and blackscore both are correct. One more round of relaxation

should update both scores one more hop (students should have some justification on why

the two cases mentioned above work)

(e) What is the runtime of your algorithm?

In both cases we are running Dijkstra with twice the number of vertices, so O(n+m log n)

if using Fibonacci...but other similar runtimes are accepted (see Recitation 2A). BF will

run in O(nm) time. We just do a constant modification

(f) How much space does your algorithm need?

O(n+m) as usual for Dijkstra/BF

Hint: this question has the same vibe as homework 9, question 2. Rather than creating an

algorithm from scratch, pick a technique that we described in class and modify it to work

for this case. Remember that you do not need to describe what is common: limit yourself to

what you changed. Make sure to justify correctness and discuss time/space constraints. Feel

free to reuse correctness Lemmas from class (but make sure that they apply to your modified

algorithm!).

Fun fact: variations of this question (possibly with several colors and more complicated

prohibitions) are used as a challenge for coding interviews. Make sure you find out the

answer and you will dazzle your interviewer!

6. (24 points) True/False? Justify your answer (The justification should be 1-3 sentences OR a

counterexample. The justification will be most of the credit).

9 ©2020 Tufts University

(a) If all weights are distinct, then there is a unique shortest path from u to v.

False. This can be done with two paths from u to v. One has weights 3 and 7, but the

other has 4 and 6

(b) Suppose s

p v is a shortest path from s to v. Then Dijkstra always relaxes the edges of

p in order along the path.

True. This is why dijkstra is so much faster than BF (and why we relax each edge once).

(c) If G is undirected, connected and has negative weights, it is guaranteed to have an MST.

True. There will be spanning trees with scores, and (since there are finitely many) there

will be one or more with a lowest score. Note: Both Prim’s and Kruskal’s will work.

(d) If G is undirected, connected and has negative weights, it is guaranteed to have an SSSP

solution.

False. In case of negative cycles there is no solution.

(e) Suppose algorithm DoesSomething calls itself recursively three times with a quarter

of the original input, but requires O(

√

n) overhead to process the results of the recursive

calls. Overall this means DoesSomething will have greater than linear runtime.

This is recursion T (n) = 3T (n/4) + O(

√

(n) which by the master theorem solves to

Θ(nlog4 3) which is less than n. Therefore false.

(f) Let G be a graph where some of the edges and vertices are shown below with their

information from the CutVertices algorithm (as usual, in format (disc, α, β)). Based

on this information, G must have a cut vertex.

u

(20,5,1)

v

(26, 20, 1)

w

(28, 20, 20)

false. We do not have enough information. Clearly u and v are not causing a cut vertex.

Maybe the parent if w is a cut vertex but it depends on who the parent is (if it is u then

yes, but it could be v in which case it is not a cut vertex).

(g) Decision problems can have approximations.

false. You cannot approximate a true/false statement

(h) A “problem lower bound” is the minimum of all worst-case runtimes for all known

algorithms that solve the problem.

false. it is the minimum of all existing algorithms. Even those that we do not know

7. (15 points) Examples. Give an example for each of the following, OR explain why it is not

possible. Justify if needed (e.g., to explain why the example you provide has the desired

properties, if not immediately obvious.)

(a) A graph with 10 vertices where BFS and DFS have the same discovery times and the

same tree (that is, BFS/DFS follow the same edges in their exploration).

a star (that is, vertex 1 connected to all other vertices and no other edges)

10 ©2020 Tufts University

(b) A graph with 10 vertices where BFS and DFS have the same discovery times and a

different tree.

the union of a star and a path (that is, vi connected to vi−1 and vi+1)

(c) An array of 10 numbers between 50 and 100 where constructing an AVL tree in the

order given will require one single rotation and one double rotation. Your answer should

consist of: the array, the tree before & after the single rotation, and the tree before &

after the double rotation. No other steps needed.

insert 75,50, 25. This causes a single rotation. Now we have 50 on top and 25,75 as

children.

Now insert 60, 80 and 65. This causes a double rotation.

(d) A situation in which RadixSort would take Θ(n log n) time.

all numbers are distint. Then we need log n bits to represent each number and thus nlogn

total time

(e) A weighted, undirected, connected graph with 8 vertices such that throughout Kruskal’s

algorithm each vertex will belong to components of size 1,2,4,8 (in that order, of course).

Draw the example (with weights on edges) and explain in what order will edges be added.

Fun fact: this example helps prove that the Θ(log(n)) bound on Union-Find is tight.

Easiest way is to draw the 8 vertices. Then put weights so that we choose how they are

added onto kruskal.

Edges of weigth 1:between v1 and v2, between v3 and v4, ... and v7-v8

All those edges are added and now all vertices started in components of size 1 to 2

Now we put as edges of weight 2 a connection between v1-v3 and v5-v7 Now we have two

components of size 4

Finally, an edbe between v1 and v8 of weight 3

8. (5 points, required for credit) Write the following affirmation and sign with your name. This

is necessary in order for you to receive credit on the exam:

“I affirm that I have not consulted with any other person (except the instructors) in the

course of answering this exam. I understand that doing so may result in a grade of zero on

the exam. Signed, ”

11 ©2020 Tufts University

欢迎咨询51作业君