辅导案例-CS 391

欢迎使用51辅导，51作业君孵化低价透明的学长辅导平台，服务保持优质，平均费用压低50%以上！ 51fudao.top

CS 391: Midterm
October 15, 2020
Due date: Oct 16, 8am Boston time
Remember: For any question you answer “I do not know” you get 20% of the grade associated with this
question. A totally wrong answer gets 0.
Collaboration and Citing Policy: You must cite any sources that you use, including the lecture slides.
Additionally, collaboration with other students is not allowed. You are encouraged to find the answers
yourselves. If you use external sources please cite them.
• All solutions should be typed in LaTeX.
• Plots generated in python should be incorporated in the LaTeX file you submit.
• Your python code should also be submitted.
Exercise 1 (20 points) You and your friend decide to test whether a coin is a fair coin. You decide to
flip it 2000 times and count the number of heads and tails. Your experiment results in 1200 heads and 800
tails. How confident can you be that the coin is not fair? In other words, if the coin were fair, how likely
is the observed event? Give three upper bounds using the Markov, Chebyshev and Chernoff inequalities.
Which one is the best?
Exercise 2 (30 points)
• (20 points) Assume that you are given a dataset and you are asked to produce a soft clustering using
the EM algorithm. How are you going to decide the number of components (i.e., the value of k) by
running the EM algorithm as many times as you want (and without visualizing the data).
Describe your algorithm and explain why it is a reasonable algorithm.
• (10 points) The problem of describing the data using a mixture of k Gaussians is solved by the EM
algorithm, which finds a locally optimal solution (not necessarily the best). Show that when k = 1 this
problem can be solved exactly. Describe your algorithm and prove its correctness.
1
Exercice 3 (30 points) Use the dataset “spatial data.txt” which contains as rows Facebook users and as
columns 210 content categories on Facebook. Let’s denote this matrix as A.
• (5 points )Compute the cosine similarity between users in A and store it in matrix C.
• (10 points) Using code similar to the one we described in class, compute a low rank approximation
of A, using its first 25 singular vectors.
• (5 points )Compute the cosine similarity between users as described in this new 25-dimensional space.
Let these similarities be stored in matrix C ′
• (10 points ) Plot a histogram of the entries in D = C − C ′.
Exercise 4 (20 points)
1. (10 points) Show that A and AT have the same nonzero singular values. How are their SVD’s related?
2. (10 points) Show that if σ is a singular value of A then there exists a nonzero vector ~x such that:
σ =
||A~x||2
||~x||2 .
(Hint: A vector x can be written as a linear combination of scaled unit vectors. For instance, let
φ = {u1, u2, . . . , un} be a set of orthonormal vectors. We can then express x as x =
∑n
i=1 aiui for some
scalar values {a1, a2, . . . , an} along each unit vector. How can we express Ax using singular values?)
2

欢迎咨询51作业君