STAT 2131 take-home final exam
Due Friday, 12/4/20 at 11:59pm on Canvas.
This exam is to be completed on your own. Since no one is watching you to make sure you
follow that direction, I am trusting you to abide by the honor code and not collaborate with your
classmates. I will be available via email and will hold my regularly scheduled office hours on
Monday and Tuesday next week.
Please be concise with your answers, and do not turn in any R code (I will not read it if you
do). All numerical answers should be rounded to three decimal places. Good luck!!
1. Let $Y = (y_1, \ldots, y_n)^T \in \mathbb{R}^n$, $x_i \in \mathbb{R}^p$, and $X = (x_1 \cdots x_n)^T \in \mathbb{R}^{n \times p}$ be full rank. Suppose
your colleague wants to model $y_i$ as $y_i = x_i^T \beta + \epsilon_i$, but has reason to believe some of the $\epsilon_i$'s
take extreme, outlying values.
(a) Would you recommend that your colleague use ordinary least squares to estimate β?
Why or why not?
(b) Your colleague wishes to simultaneously estimate β and identify potential outliers.
They therefore propose the following estimator for β:
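$$L_\lambda(\beta, \delta) = \tfrac{1}{2}\|Y - X\beta - \delta\|_2^2 + \lambda\|\delta\|_1,$$
$$\{\hat\beta^{(\lambda)}, \hat\delta^{(\lambda)}\} = \arg\min_{\beta \in \mathbb{R}^p,\ \delta \in \mathbb{R}^n} L_\lambda(\beta, \delta).$$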
Note that they are only penalizing δ, and not β.
(i) What is $\lim_{\lambda \to \infty} \hat\beta^{(\lambda)}$?
(ii) Is $\hat\beta^{(0)}$ unique?
(iii) Show that the loss function $L_\lambda$ is convex in $x = (\beta^T, \delta^T)^T \in \mathbb{R}^{p+n}$. (Hint: Recall
the function $f(x)$ is convex if $f\{\alpha x_1 + (1 - \alpha) x_2\} \le \alpha f(x_1) + (1 - \alpha) f(x_2)$ for all
$\alpha \in [0, 1]$.)
(iv) Give a qualitative description of the purpose of $\delta = (\delta_1, \ldots, \delta_n)^T$ in the above loss
function. When answering this question, think about what values of $\epsilon_i$ might cause
$\hat\delta^{(\lambda)}_i \neq 0$.
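(c) Show that $\hat\beta^{(\lambda)}$ and $\hat\delta^{(\lambda)}$ must satisfy
$$\hat\beta^{(\lambda)} = (X^T X)^{-1} X^T \left( Y - \hat\delta^{(\lambda)} \right),$$
$$\hat\delta^{(\lambda)}_i = S_\lambda\left( y_i - x_i^T \hat\beta^{(\lambda)} \right), \quad i \in \{1, \ldots, n\},$$
where $S_\lambda(x) = \mathrm{sign}(x)(|x| - \lambda)_+$ is the soft-thresholding function.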
(d) Use part (c) to derive an iterative algorithm to determine $\hat\beta^{(\lambda)}$, and prove that the
sequence of iterates is such that $L_\lambda$ is non-increasing at each iteration.
(e) Your colleague says that if they knew which samples $i = 1, \ldots, n$ were outliers, they
would remove them from the dataset completely. Based on your answer to part (c), is
the loss function $L_\lambda$ congruent with this statement? Explain.
(f) Let
$$\rho_\lambda(x) = \begin{cases} \tfrac{1}{2} x^2 & \text{if } |x| \le \lambda \\ \lambda|x| - \tfrac{1}{2}\lambda^2 & \text{if } |x| > \lambda \end{cases}$$
be Huber's loss function. Show that $\hat\beta^{(\lambda)}$ is a minimizer of $\sum_i \rho_\lambda(y_i - x_i^T \beta)$. Given
your answer in part (e), what does this imply about the robustness of Huber's loss in
the presence of outliers? Should it be used if your goal is to completely eliminate the
impact of outliers on your estimate for $\beta$? Explain.
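(g) Now consider the general outlier-corrected estimate
$$\{\tilde\beta^{(\lambda)}, \tilde\delta^{(\lambda)}\} = \arg\min_{\beta \in \mathbb{R}^p,\ \delta \in \mathbb{R}^n} \left\{ \tfrac{1}{2}\|Y - X\beta - \delta\|_2^2 + \sum_{i=1}^{n} P_\lambda(\delta_i) \right\},$$
where $P_\lambda(x)$ penalizes large values of $x$ and $\lambda \ge 0$ is a tuning parameter. Under mild
assumptions, $P_\lambda(x)$ defines a thresholding rule $S_{\lambda,P}(x)$, where
$$S_{\lambda,P}(x) = \arg\min_{u \in \mathbb{R}} \left\{ \tfrac{1}{2}(x - u)^2 + P_\lambda(u) \right\}.$$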
(i) If $P_\lambda(x) = \frac{\lambda^2}{2} 1\{x \neq 0\}$, show that $S_{\lambda,P}$ is the hard-thresholding function, i.e.
$S_{\lambda,P}(x) = x 1\{|x| > \lambda\}$.
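(ii) Show that like $\hat\beta^{(\lambda)}$ and $\hat\delta^{(\lambda)}$, $\tilde\beta^{(\lambda)}$ and $\tilde\delta^{(\lambda)}$ must satisfy
$$\tilde\beta^{(\lambda)} = (X^T X)^{-1} X^T \left( Y - \tilde\delta^{(\lambda)} \right),$$
$$\tilde\delta^{(\lambda)}_i = S_{\lambda,P}\left( y_i - x_i^T \tilde\beta^{(\lambda)} \right), \quad i \in \{1, \ldots, n\}.$$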
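(iii) Let $\rho_\lambda(x)$ be any continuously differentiable and robust loss function (e.g. Huber's
loss), and let $\Psi_\lambda(x) = \frac{d}{dx}\rho_\lambda(x)$. Show that if $\Psi_\lambda(x) + S_{\lambda,P}(x) = x$, then $\tilde\beta^{(\lambda)}$ is a stable
point of $\sum_i \rho_\lambda(y_i - x_i^T \beta)$. That is,
$$\frac{d}{d\beta} \left\{ \sum_i \rho_\lambda(y_i - x_i^T \beta) \right\} \Big|_{\beta = \tilde\beta^{(\lambda)}} = 0_p.$$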
(iv) Use this result to suggest a continuous shrinkage function $S_{\lambda,P}$ that would satisfy
your colleague's goal of removing data points with potentially extreme values of
$\epsilon_i$.
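For readers who want to see the objects defined in Problem 1 concretely, the following is a minimal Python sketch (the exam itself contains no code, so NumPy and every function name here are assumptions made for illustration). It implements the soft-thresholding function from part (c), Huber's loss from part (f) together with its derivative, and the hard-thresholding function from part (g)(i), then checks numerically on a handful of points that the condition $\Psi_\lambda(x) + S_{\lambda,P}(x) = x$ from part (g)(iii) holds for the Huber and soft-thresholding pair. It is a numerical illustration only, not a proof.

```python
import numpy as np

def soft_threshold(x, lam):
    """Soft-thresholding: sign(x) * (|x| - lam)_+ ."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def hard_threshold(x, lam):
    """Hard-thresholding: x * 1{|x| > lam}."""
    x = np.asarray(x, dtype=float)
    return x * (np.abs(x) > lam)

def huber_loss(x, lam):
    """Huber's loss: quadratic for |x| <= lam, linear beyond."""
    x = np.asarray(x, dtype=float)
    quadratic = 0.5 * x**2
    linear = lam * np.abs(x) - 0.5 * lam**2
    return np.where(np.abs(x) <= lam, quadratic, linear)

def huber_psi(x, lam):
    """Derivative of Huber's loss: x clipped to [-lam, lam]."""
    return np.clip(np.asarray(x, dtype=float), -lam, lam)

# Hypothetical residual values, chosen only for illustration.
r = np.array([-3.0, -0.5, 0.0, 0.8, 2.5])
lam = 1.0

print("soft:", soft_threshold(r, lam))
print("hard:", hard_threshold(r, lam))
print("huber:", huber_loss(r, lam))

# Numerical check of Psi(x) + S(x) = x for the Huber / soft-threshold pair.
identity_holds = np.allclose(huber_psi(r, lam) + soft_threshold(r, lam), r)
print("Psi + S == x on these points:", identity_holds)
```

Note that soft-thresholding shrinks every large residual by exactly `lam`, whereas hard-thresholding leaves large residuals untouched and is discontinuous at `|x| = lam`.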
2. An airfoil is the cross-sectional shape of a wing or propeller, and is the object that helps
generate lift (the force that allows airplanes to fly, for example). Here you will use the
data “airfoil.dat” to model “pressue” (the sound pressure level, in decibels) as a function of
“frequency”, “angle”, “chordLength”, “velocity”, and “thickness”. More information about
the experiment and, for those interested, about airfoils can be found here and here.
(a) Use ordinary least squares to regress pressure onto all of the other variables.
(i) Write down the assumed mathematical model for pressure, define all coefficients
in your model, and clearly list all assumptions.
(ii) In order to satisfy modelling assumptions, decide whether the dependent variable
requires a transformation.
(iii) Fit the new model, and determine if this new model satisfies your assumptions
from part (i).
(iv) A colleague suggests that because pressure is not normally distributed, the
bootstrap is a more appropriate way to do inference in these data. Design a bootstrap
procedure to determine a 90% confidence interval for the expected transformed
pressure variable at frequency, angle, chordLength, velocity and thickness values
given in “Interval.txt”.
(v) How does your interval compare to the confidence interval obtained using
standard normal theory? Are the similarities/differences between the two surprising?
Explain.
(b) Using the most appropriate model from (a), determine if the dependent variable is
non-linearly related to the five covariates. If so, do your best to fix these non-linearities in
the context of classical linear modelling, and determine if the changes to your model
from part (a) are significant.
(c) In your opinion, is the model from part (b) better or worse than that from part (a)?
While we have not discussed many alternatives in class, do you think linear modelling
is an appropriate way to analyze these data? Explain.
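As an illustration of the kind of procedure part (a)(iv) asks for, here is a minimal Python sketch of one possible design: a pairs (case-resampling) bootstrap with a percentile interval for the expected response at a fixed covariate vector. This is only one of several reasonable designs (a residual bootstrap is another). The data are synthetic stand-ins, not “airfoil.dat” or “Interval.txt”, and the function name `bootstrap_mean_response_ci`, the design matrix, the coefficients, and the point `x0` are all assumptions made for the example. Python is used here purely for illustration.

```python
import numpy as np

def bootstrap_mean_response_ci(X, y, x0, n_boot=2000, level=0.90, seed=0):
    """Percentile interval for x0' beta via a pairs bootstrap (one possible design)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    preds = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        beta_b, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        preds[b] = x0 @ beta_b            # fitted mean response at x0
    alpha = 1.0 - level
    lower, upper = np.quantile(preds, [alpha / 2, 1 - alpha / 2])
    return lower, upper

# Synthetic data (hypothetical): intercept plus two covariates, heavy-tailed errors.
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -1.0])
y = X @ beta_true + rng.standard_t(df=5, size=n)

x0 = np.array([1.0, 0.5, -0.5])  # hypothetical covariate values of interest
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
point_estimate = x0 @ beta_hat

lower, upper = bootstrap_mean_response_ci(X, y, x0)
print("OLS estimate at x0:", point_estimate)
print("90% bootstrap percentile interval:", (lower, upper))
```

Resampling whole rows avoids assuming a particular error distribution, which is why it is a natural candidate when normality is in doubt. The interval can then be compared with the usual normal-theory interval for the same `x0`.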