程序代写案例-1COMPSCI

1COMPSCI 514: Algorithms for Data Science
Exam 2: Nov. 15, 2018
Instructor: Arya Mazumdar
T WO PAGES OF NOTES ALLOWED (BOTH SIDES0. CALCULATORS/OTHE
R ELECTRONIC
DEVICES NOT ALLOWED. PLEASE BE RIGOROUS AND PRECISE AND SHOW YOUR WORK.
IF YOU NEED EXTRA SPACE, USE THE BACKOF A PAGE. IF YOU HAVEQUESTIONS DURING
THE EXAM, RAISE YOUR HAND. THIS EXAM CONSISTS OF FOUR PROBLEMS THAT CARRY
A TOTAL OF 35 POINTS, AS MARKED (PERFECT SCORE = 35). DURATION: 60 MINUTES.
BEST WISHES!
NAME (LAST, FIRST):
Student Id Number:
Points Available Points Achieved
Problem 1 : 5
Problem 2 : 17
Problem 3 : 6
Problem 4 : 7
Total : 35
2Problem 1: (1 × 5 points) Write if the following statements are True or False beside them (no
justification is necessary).
1) The Jaccard distance between the two sets {a, s, d, f, 9, k} and {2, 9, w, s, t, d} is 1
3
.
2) The total number of elements in the set of 2-shingles in the string abcdabd is five.
3) For any two sets S and T , the probability that the minhashes (with a random uniform permutation)
being not equal is the Jaccard distance between the sets S and T .
4) The L2-distance between two vectors (1, 0, 0) and (0, 1, 0) is 2.
5) For any random variable X , E[X2] ≥ (E[X])2.
3Problem 2: (1 + 6 + 3 + 7 points) Consider the data stream 1,2,6,3,2,2,3,1,6,6
• What is the number of distinct items in the stream?
• Find the number of distinct items in the stream by the Flajolet-Martin Algorithm. Use the hash
h(x) = 3x+ 1 mod 7 for that. Show your work.
• Find the frequency of each element in the stream.
• Show the execution of Count-Min sketch data structure on the stream. Draw the Count-Min sketch
table and compute the estimated frequency from it for each of the input element. Use the following
two hashes and table width 7:
1) h1(x) = (2x+ 1) mod 7
2) h2(x) = (5x+ 3) mod 7
4Problem 3: (2 + 4 points) Suppose you have a Bloom filter with 8 bit memory and a set S of 100
allowed keys. The filter uses 3 random independent hash functions
• What is the probability that an element in S is going to be blocked by the filter?
• What is the probability that an element not in S is going to be blocked by the filter?
5Problem 4: (2 + 2 + 3 points) Suppose you toss a biased coin with probability of heads equal to 1
4
a
total of 1000 times (each of the toss outcomes are independent).
• If you use Markov inequality, what is the upper bound on the probability that the number of heads
is 500 or more?
• Use Chebyshev inequality to derive another upper bound on the same probability as above. What do
you get?
Consider a graph on n vertices, and for every pair of vertices draw an edge randomly if a toss of the coin