Statistical ML STATS 303

Please assign questions to the corresponding pages when submitting, and show your work to earn credit (you do not need to show the details of 1-dimensional integrations; feel free to use your favorite integral calculator, but adding the steps of a computation may earn you partial credit in case of a wrong answer). Give numerical answers to 3 decimal digits.

1. Suppose that $P(Y=1) = P(Y=0) = 1/2$ and
   $X \mid Y = 0 \sim N(0, 1)$,
   $X \mid Y = 1 \sim \frac{1}{2} N(-5, 1) + \frac{1}{2} N(5, 1)$.

   (a) Find an expression for the Bayes classifier (the Bayes decision rule for classification) for the 0/1 loss, and find an expression for the corresponding Bayes risk.

   (b) What linear classifier minimizes the risk, and what is its risk? (If it is not unique, give one optimal linear classifier.) Here, a linear classifier is a classifier that divides the space with a single hyperplane (which in the $d = 1$ case is a single point).

2. Consider a three-category classification problem with prior probabilities
   $P(Y=1) = P(Y=2) = P(Y=3) = 1/3$.
   The class-conditional densities are multivariate normal densities with parameters
   $\mu_1 = [0, 0]^\top$, $\mu_2 = [1, 1]^\top$, $\mu_3 = [-1, 1]^\top$ and
   $\Sigma_1 = \Sigma_2 = \Sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$.

   (a) Compute the Bayes optimal classifier for the 0/1 loss at the points $x = [-0.7, 0.1]$ and $x = [0.7, 0.7]$.

   (b) Now assume that
   $\Sigma_1 = \begin{pmatrix} 0.7 & 0 \\ 0 & 0.7 \end{pmatrix}$, $\Sigma_2 = \begin{pmatrix} 0.8 & 0.2 \\ 0.2 & 0.8 \end{pmatrix}$, $\Sigma_3 = \begin{pmatrix} 0.8 & 0.2 \\ 0.2 & 0.8 \end{pmatrix}$.
   Compute the Bayes optimal classifier for the 0/1 loss at the points $x = [-0.5, 0.5]$ and $x = [0.5, 0.5]$.

3. Assume a regression model $y = f(x) + \epsilon$ where $x, y \in \mathbb{R}$, $f(x)$ is some deterministic but unknown function, and $\epsilon \sim N(0, \sigma^2)$. Suppose $g(x \mid \theta)$ is our estimator of $f$, where $\theta$ denotes the parameters.

   (a) Write the density $p(y \mid x)$ in terms of $g(x \mid \theta)$ and $\sigma$.
   (b) Suppose there is an unknown joint density $p(x, y)$ for $x$ and $y$. Explain why the log likelihood $L(\theta \mid \mathcal{X})$ of $p(x, y)$, where the sample $\mathcal{X} = \{x_\ell, y_\ell\}_{\ell=1}^{N}$ contains i.i.d. data points, can be written as
   $L(\theta \mid \mathcal{X}) = \log \prod_{\ell=1}^{N} p(y_\ell \mid x_\ell) + C$.

   (c) Using Parts (a) and (b), show that the maximum likelihood estimator is obtained by minimizing
   $\frac{1}{2} \sum_{\ell=1}^{N} \left[ y_\ell - g(x_\ell \mid \theta) \right]^2$.

4. Consider the data points $x_1 = (0, 1, 2)^\top$, $x_2 = (-1, 3, 4)^\top$, $x_3 = (0, 0, 1)^\top$ and $x_4 = (2, 3, -2)^\top$.

   (a) Write a data matrix $X$ for the data points, where each row corresponds to a data point.

   (b) Suppose a system gives output $y_j$ when we input $x_j$, for $j = 1, 2, 3, 4$. We fit a ridge regression model by solving
   $\min_{w \in \mathbb{R}^4} \frac{1}{2} \| y - \tilde{X} w \|_2^2 + \frac{\lambda}{2} \| w \|_2^2$,   (1)
   where $y = [y_1, y_2, y_3, y_4]^\top$. What is $\tilde{X}$?

   (c) By taking the gradient with respect to $w$, derive the solution of (1) in terms of $\tilde{X}$, $\lambda$ and $y$.

   (d) Describe qualitatively how you expect your answer to the previous question to change if you used a regularization of the form $\lambda \| w \|_1$, where $\| \cdot \|_1$ denotes the $\ell_1$ norm in $\mathbb{R}^4$.

   (e) What are the names of the two problems considered above?

5. (a) Let $\{x_\ell\}_{\ell=1}^{N}$ with $x_\ell \in \mathbb{R}$ be given. The K-NN density estimator is given by
   $\hat{p}(x) = \frac{K}{2 N d_K(x)}$,
   where $d_K(x)$ is the distance between $x$ and its $K$-th closest neighbor in $\{x_\ell\}_{\ell=1}^{N}$. Prove that $\hat{p}$ is NOT a probability density.

   (b) Consider applying K-means with $K = 2$ clusters to the five points $(0,0)$, $(1,2)$, $(2,0)$, $(3,2)$, $(4,0)$. Suppose the initial centers are set to $(0,0)$ and $(3,0)$. Write the E-step and the M-step for the first iteration. You need to clearly state the locations of the centers and the labels of the points.

6. Let $(Z^1, Y^1), \ldots, (Z^N, Y^N)$ be generated independently as follows:
   $Z^\ell \sim \mathrm{Bernoulli}(p)$,
   $Y^\ell \mid Z^\ell = 0 \sim N(0, 1)$,
   $Y^\ell \mid Z^\ell = 1 \sim N(3, 1)$.

   (a) Assume we do not observe the $\{Z^\ell\}$. Write the distribution (density) $f_Y(y)$ of $Y$ as a mixture.

   (b) Write down the complete likelihood function for $p$ (assuming the $Z^\ell$ are observed).

   (c) Write down the E-step and the M-step of the EM algorithm (i.e., write the updates of the algorithm explicitly, as we did in class for the Gaussian case).

7.
Let $X_1 \in \mathbb{R}$ and $X_2 \in \mathbb{R}$, and let $Y = m(X_1, X_2) + \epsilon$ where $E(\epsilon) = 0$.

   (a) Consider the class of multiplicative predictors of the form $\hat{m}(x_1, x_2) = \beta x_1 x_2$. Let $\beta^*$ be the best predictor, that is, $\beta^*$ minimizes $E(Y - \beta X_1 X_2)^2$. Find an expression for $\beta^*$ in terms of expectations of the quantities being considered.

   (b) Suppose the true regression function is $Y = X_1 + X_2 + \epsilon$. Also assume that $E(X_1) = E(X_2) = 0$, $E(X_1^2) = E(X_2^2) = 1$, and that $X_1$ and $X_2$ are independent. Find the predictive risk $R = E(Y - \beta^* X_1 X_2)^2$, where $\beta^*$ was defined in part (a). (This answer should be a number.)
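Not part of what you hand in, but if you want to sanity-check the decision rule you derive in Problem 1, a minimal sketch (the helper names are mine) that compares the two class densities pointwise, as the 0/1-loss Bayes rule with equal priors does:

```python
import math

def normpdf(x, mu):
    # density of N(mu, 1) at x
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

def p0(x):
    # class-conditional density given Y = 0
    return normpdf(x, 0.0)

def p1(x):
    # class-conditional density given Y = 1: equal mixture of N(-5, 1) and N(5, 1)
    return 0.5 * normpdf(x, -5.0) + 0.5 * normpdf(x, 5.0)

def bayes_classify(x):
    # equal priors and 0/1 loss: predict the class with the larger density at x
    return 0 if p0(x) > p1(x) else 1

print(bayes_classify(0.0))  # near the N(0, 1) mode
print(bayes_classify(5.0))  # near one of the mixture modes
```

Scanning `bayes_classify` over a grid of `x` values also shows where the decision boundaries sit, which is useful when writing the Bayes risk as a sum of tail integrals.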
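For Problem 2(a), equal priors and a shared identity covariance reduce the Bayes rule to nearest-mean classification, so a hand computation can be checked with a short script (the function name is mine):

```python
means = {1: (0.0, 0.0), 2: (1.0, 1.0), 3: (-1.0, 1.0)}

def nearest_mean_class(x):
    # equal priors + shared identity covariance: pick the class whose mean
    # minimizes the squared Euclidean distance to x
    d2 = {k: (x[0] - m[0]) ** 2 + (x[1] - m[1]) ** 2 for k, m in means.items()}
    return min(d2, key=d2.get)

print(nearest_mean_class((-0.7, 0.1)))
print(nearest_mean_class((0.7, 0.7)))
```

For part (b) this shortcut no longer applies: with unequal covariances you must compare the full quadratic discriminants, including the $\log \det \Sigma_k$ terms.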
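For Problem 4(c), whatever closed form you derive can be verified numerically: at the true minimizer, the gradient of the objective in (1) must vanish. A sketch with random stand-in data (the matrix and $\lambda$ below are arbitrary choices of mine, not the assignment's data):

```python
import numpy as np

rng = np.random.default_rng(0)
Xt = rng.standard_normal((4, 4))   # stand-in for the augmented matrix X~
y = rng.standard_normal(4)
lam = 0.5

# candidate minimizer w, obtained by solving the normal equations of (1)
w = np.linalg.solve(Xt.T @ Xt + lam * np.eye(4), Xt.T @ y)

# gradient of (1/2)||y - X~ w||^2 + (lam/2)||w||^2; should be ~0 at the minimizer
grad = Xt.T @ (Xt @ w - y) + lam * w
print(np.max(np.abs(grad)))
```

If your derived formula disagrees with `np.linalg.solve` here, recheck the sign conventions in your gradient.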
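The one K-means iteration asked for in Problem 5(b) can be checked against a direct implementation of the two steps (a sketch; variable names are mine):

```python
def kmeans_step(points, centers):
    # E-step: assign each point to its nearest center (squared Euclidean distance)
    labels = []
    for p in points:
        d2 = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
        labels.append(d2.index(min(d2)))
    # M-step: move each center to the mean of its assigned points
    new_centers = []
    for k in range(len(centers)):
        members = [p for p, l in zip(points, labels) if l == k]
        new_centers.append(tuple(sum(m[i] for m in members) / len(members) for i in (0, 1)))
    return labels, new_centers

pts = [(0, 0), (1, 2), (2, 0), (3, 2), (4, 0)]
labels, centers = kmeans_step(pts, [(0, 0), (3, 0)])
print(labels)    # cluster label of each point after the E-step
print(centers)   # center locations after the M-step
```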
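Finally, the EM updates you write for Problem 6(c) can be tried on synthetic data. The sketch below assumes the two component means (0 and 3) and unit variances are held fixed, so only the Bernoulli weight $p$ is updated; the sample values are arbitrary illustrations of mine:

```python
import math

def phi(y, mu):
    # N(mu, 1) density at y
    return math.exp(-0.5 * (y - mu) ** 2) / math.sqrt(2 * math.pi)

def em_update(ys, p):
    # E-step: responsibility that each y was generated by the N(3, 1) component
    gamma = [p * phi(y, 3) / ((1 - p) * phi(y, 0) + p * phi(y, 3)) for y in ys]
    # M-step: the new mixing weight is the average responsibility
    return sum(gamma) / len(gamma)

ys = [0.1, -0.3, 3.2, 2.8, 0.0]
p = 0.5
for _ in range(20):
    p = em_update(ys, p)
print(round(p, 3))  # mixing-weight estimate after 20 iterations
```

With two of the five points near 3, the estimate settles near 0.4, which matches the intuition behind the M-step.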