HW3: More Linear Regression
Stat 154, Fall 2019
Problem 1
We examine a response variable $Y$ in terms of two predictors $X$ and $Z$. There are $n$ observations. Let $X$ be the matrix whose columns are a constant column of ones, the vector $x$, and the vector $z$. Consider the cross-product matrix $X^\top X$ given below:
$$X^\top X = \begin{pmatrix} 30 & 0 & 0 \\ ? & 10 & 7 \\ ? & ? & 15 \end{pmatrix}$$
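As a reminder of where these entries come from: when the columns of $X$ are the constant vector of ones, $x$, and $z$, the cross-product matrix has the general (symmetric) form
$$X^\top X = \begin{pmatrix} n & \sum_i x_i & \sum_i z_i \\ \sum_i x_i & \sum_i x_i^2 & \sum_i x_i z_i \\ \sum_i z_i & \sum_i x_i z_i & \sum_i z_i^2 \end{pmatrix}.$$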

a. Complete the missing values denoted by “?”, and determine the value of $n$.
b. Calculate the linear correlation coefficient between $X$ and $Z$.
c. If the OLS regression equation is $\hat{y}_i = -2 + x_i + 2z_i$, what is the value of $\bar{y}$?
d. If the residual sum of squares (RSS) is 12, what is the value of $R^2$? Recall that $\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
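For part (d), it may also help to recall the standard identity (not stated in the problem):
$$R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}, \qquad \mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2.$$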
Problem 2
Consider a linear regression model:
$$y_i = w_0 x_{i0} + w_1 x_{i1} + \cdots + w_p x_{ip} + \epsilon_i = w^\top x_i + \epsilon_i$$
and assume that the noise terms $\epsilon_i$ are independent and have a Gaussian distribution with mean zero and constant variance $\sigma^2$:
$$\epsilon_i \sim N(0, \sigma^2)$$
In lecture, we discussed how to obtain the parameters $w = (w_0, w_1, \ldots, w_p)$ of a linear regression model via Maximum Likelihood (ML), which are given by $\hat{w} = (X^\top X)^{-1} X^\top y$.
Determine the ML estimate of the other model parameter: $\sigma^2$ (the constant variance).
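As a starting point (a standard fact, consistent with the model above, though not stated in the problem), the log-likelihood of the observed responses is
$$\ell(w, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - w^\top x_i\right)^2,$$
and the ML estimate of $\sigma^2$ is obtained by maximizing this expression over $\sigma^2$.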
Problem 3
Multicollinearity is easy to detect in a linear regression model with two predictors; we need only look at the value of $r_{12} = \mathrm{cor}(X_1, X_2)$. When there are more than two regressors, however, inspection of the $r_{ij}$ is not sufficient.
For example, assume that we have four predictors $X_1$, $X_2$, $X_3$, and $X_4$, with correlation coefficients $r_{12} = r_{13} = r_{23} = 0$, variances $\sigma_1^2 = \sigma_2^2 = \sigma_3^2$, and $X_4 = X_1 + X_2 + X_3$.
Show that $r_{14} = r_{24} = r_{34} = 1/\sqrt{3} \approx 0.577$.
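This is not a proof, but the claim can be checked numerically. The sketch below assumes three independent standard normal predictors, so that in sample $r_{12}, r_{13}, r_{23} \approx 0$ and the three variances are approximately equal:

import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Three (approximately) uncorrelated predictors with equal variance.
X1, X2, X3 = rng.normal(size=(3, n))
X4 = X1 + X2 + X3

# Each correlation should be close to 1/sqrt(3) ~ 0.577.
for Xj in (X1, X2, X3):
    print(np.corrcoef(Xj, X4)[0, 1])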
Problem 4
Consider minimizing a quadratic function $f(x) = \frac{1}{2} x^\top A x - b^\top x$, where $b$ is a vector and $A$ is a positive semidefinite matrix.
Suppose that $A$ is invertible. Show that the minimum of $f(x)$ is attained at $x^* = A^{-1} b$.
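A quick numerical sanity check of this claim (not a proof); the particular positive definite $A$ and vector $b$ below are arbitrary choices for illustration:

import numpy as np

rng = np.random.default_rng(1)

# A random positive definite (hence invertible) matrix and a random vector.
M = rng.normal(size=(3, 3))
A = M @ M.T + np.eye(3)
b = rng.normal(size=3)

def f(x):
    return 0.5 * x @ A @ x - b @ x

x_star = np.linalg.solve(A, b)  # candidate minimizer x* = A^{-1} b

# f at x* should never exceed f at randomly perturbed points.
print(all(f(x_star) <= f(x_star + rng.normal(size=3)) for _ in range(1000)))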
Problem 5
Let
$$A_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}, \qquad A_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}.$$
Let $f_1(x) = \frac{1}{2} x^\top A_1 x - b^\top x$ and $f_2(x) = \frac{1}{2} x^\top A_2 x - b^\top x$.
Implement Gradient Descent to minimize both $f_1(x)$ and $f_2(x)$. For each function, run gradient descent with 5 different random initializations, and print the solutions. Are they the same for each random initialization? Explain.
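A minimal sketch of such an implementation is given below; the step size and iteration count are arbitrary choices, not specified by the problem:

import numpy as np

def gradient_descent(A, b, x0, lr=0.1, iters=2000):
    """Minimize f(x) = 0.5 x^T A x - b^T x; the gradient is A x - b."""
    x = x0.copy()
    for _ in range(iters):
        x = x - lr * (A @ x - b)
    return x

A1 = np.diag([1.0, 2.0, 2.0])
A2 = np.diag([1.0, 2.0, 0.0])
b = np.array([1.0, 1.0, 0.0])

rng = np.random.default_rng(154)
for name, A in [("f1", A1), ("f2", A2)]:
    print(name)
    for _ in range(5):
        x0 = rng.normal(size=3)  # random initialization
        print(gradient_descent(A, b, x0))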