1COMPSCI 514: Algorithms for Data Science Exam 2: Nov. 15, 2018 Instructor: Arya Mazumdar T WO PAGES OF NOTES ALLOWED (BOTH SIDES0. CALCULATORS/OTHER ELECTRONIC DEVICES NOT ALLOWED. PLEASE BE RIGOROUS AND PRECISE AND SHOW YOUR WORK. IF YOU NEED EXTRA SPACE, USE THE BACKOF A PAGE. IF YOU HAVEQUESTIONS DURING THE EXAM, RAISE YOUR HAND. THIS EXAM CONSISTS OF FOUR PROBLEMS THAT CARRY A TOTAL OF 35 POINTS, AS MARKED (PERFECT SCORE = 35). DURATION: 60 MINUTES. BEST WISHES! NAME (LAST, FIRST): Student Id Number: Points Available Points Achieved Problem 1 : 5 Problem 2 : 17 Problem 3 : 6 Problem 4 : 7 Total : 35 2Problem 1: (1 × 5 points) Write if the following statements are True or False beside them (no justification is necessary). 1) The Jaccard distance between the two sets {a, s, d, f, 9, k} and {2, 9, w, s, t, d} is 1 3 . 2) The total number of elements in the set of 2-shingles in the string abcdabd is five. 3) For any two sets S and T , the probability that the minhashes (with a random uniform permutation) being not equal is the Jaccard distance between the sets S and T . 4) The L2-distance between two vectors (1, 0, 0) and (0, 1, 0) is 2. 5) For any random variable X , E[X2] ≥ (E[X])2. 3Problem 2: (1 + 6 + 3 + 7 points) Consider the data stream 1,2,6,3,2,2,3,1,6,6 • What is the number of distinct items in the stream? • Find the number of distinct items in the stream by the Flajolet-Martin Algorithm. Use the hash h(x) = 3x+ 1 mod 7 for that. Show your work. • Find the frequency of each element in the stream. • Show the execution of Count-Min sketch data structure on the stream. Draw the Count-Min sketch table and compute the estimated frequency from it for each of the input element. Use the following two hashes and table width 7: 1) h1(x) = (2x+ 1) mod 7 2) h2(x) = (5x+ 3) mod 7 4Problem 3: (2 + 4 points) Suppose you have a Bloom filter with 8 bit memory and a set S of 100 allowed keys. The filter uses 3 random independent hash functions • What is the probability that an element in S is going to be blocked by the filter? • What is the probability that an element not in S is going to be blocked by the filter? 5Problem 4: (2 + 2 + 3 points) Suppose you toss a biased coin with probability of heads equal to 1 4 a total of 1000 times (each of the toss outcomes are independent). • If you use Markov inequality, what is the upper bound on the probability that the number of heads is 500 or more? • Use Chebyshev inequality to derive another upper bound on the same probability as above. What do you get? Consider a graph on n vertices, and for every pair of vertices draw an edge randomly if a toss of the coin above gives you an head. • What is the average number of triangles in the graph?
欢迎咨询51作业君