STAT 2131 take-home final exam
Due Friday, 12/4/20 at 11:59pm on Canvas.

This exam is to be completed on your own. Since no one is watching you to make sure you follow that direction, I am trusting you to abide by the honor code and not collaborate with your classmates. I will be available via email and will hold my regularly scheduled office hours on Monday and Tuesday next week. Please be concise with your answers, and do not turn in any R code (I will not read it if you do). All numerical answers should be rounded to three decimal places. Good luck!!

1. Let $Y = (y_1, \ldots, y_n)^T \in \mathbb{R}^n$, $x_i \in \mathbb{R}^p$, and let $X = (x_1 \cdots x_n)^T \in \mathbb{R}^{n \times p}$ be full rank. Suppose your colleague wants to model $y_i$ as $y_i = x_i^T \beta + \epsilon_i$, but has reason to believe some of the $\epsilon_i$'s take extreme, outlying values.

(a) Would you recommend that your colleague use ordinary least squares to estimate $\beta$? Why or why not?

(b) Your colleague wishes to simultaneously estimate $\beta$ and identify potential outliers. They therefore propose the following estimator for $\beta$:
$$L_\lambda(\beta, \delta) = \frac{1}{2}\|Y - X\beta - \delta\|_2^2 + \lambda\|\delta\|_1, \qquad \{\hat\beta(\lambda), \hat\delta(\lambda)\} = \arg\min_{\beta \in \mathbb{R}^p,\, \delta \in \mathbb{R}^n} L_\lambda(\beta, \delta).$$
Note that they are only penalizing $\delta$, and not $\beta$.
(i) What is $\lim_{\lambda \to \infty} \hat\beta(\lambda)$?
(ii) Is $\hat\beta(0)$ unique?
(iii) Show that the loss function $L_\lambda$ is convex in $x = (\beta^T, \delta^T)^T \in \mathbb{R}^{p+n}$. (Hint: Recall that a function $f(x)$ is convex if $f\{\alpha x_1 + (1 - \alpha) x_2\} \le \alpha f(x_1) + (1 - \alpha) f(x_2)$ for all $x_1, x_2$ and all $\alpha \in [0, 1]$.)
(iv) Give a qualitative description of the purpose of $\delta = (\delta_1, \ldots, \delta_n)^T$ in the above loss function. When answering this question, think about what values of $\epsilon_i$ might cause $\hat\delta(\lambda)_i \neq 0$.

(c) Show that $\hat\beta(\lambda)$ and $\hat\delta(\lambda)$ must satisfy
$$\hat\beta(\lambda) = (X^T X)^{-1} X^T \{Y - \hat\delta(\lambda)\}, \qquad \hat\delta(\lambda)_i = S_\lambda\{y_i - x_i^T \hat\beta(\lambda)\}, \quad i \in \{1, \ldots, n\},$$
where $S_\lambda(x) = \mathrm{sign}(x)(|x| - \lambda)_+$ is the soft-thresholding function.

(d) Use part (c) to derive an iterative algorithm to determine $\hat\beta(\lambda)$, and prove that the sequence of iterates is such that $L_\lambda$ is non-increasing at each iteration.
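As an illustration of parts (c) and (d), here is a minimal Python/NumPy sketch of the soft-thresholding function $S_\lambda$ and of one natural iterative scheme: alternate the two fixed-point equations, i.e. block coordinate descent over $\beta$ and $\delta$. The function names, the starting value $\delta = 0$, and the stopping rule are hypothetical choices made for illustration, not part of the exam. `np.linalg.lstsq` stands in for $(X^TX)^{-1}X^T$; when $X$ has full rank it returns the same solution.

```python
import numpy as np

def soft_threshold(x, lam):
    """S_lambda(x) = sign(x) * (|x| - lambda)_+, applied elementwise."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def loss_lambda(Y, X, beta, delta, lam):
    """L_lambda(beta, delta) = 0.5 * ||Y - X beta - delta||_2^2 + lambda * ||delta||_1."""
    r = Y - X @ beta - delta
    return 0.5 * (r @ r) + lam * np.abs(delta).sum()

def outlier_corrected_ls(Y, X, lam, max_iter=500, tol=1e-10):
    """Alternate the beta- and delta-updates from part (c).

    Returns (beta, delta, history), where history[k] is L_lambda after iteration k.
    Starting from delta = 0 (no points flagged) is an assumption of this sketch.
    """
    n = X.shape[0]
    delta = np.zeros(n)
    history = []
    for _ in range(max_iter):
        # beta-step: exact least-squares minimizer given delta (response Y - delta)
        beta = np.linalg.lstsq(X, Y - delta, rcond=None)[0]
        # delta-step: exact minimizer given beta, soft-thresholding each residual
        delta = soft_threshold(Y - X @ beta, lam)
        history.append(loss_lambda(Y, X, beta, delta, lam))
        # stop once the loss no longer decreases by more than tol
        if len(history) > 1 and history[-2] - history[-1] < tol:
            break
    return beta, delta, history
```

Each step minimizes $L_\lambda$ exactly over one block with the other held fixed, so the values in `history` should be non-increasing up to floating-point rounding; plotting or differencing `history` is a quick numerical check on a proof for part (d).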
(e) Your colleague says that if they knew which samples $i = 1, \ldots, n$ were outliers, they would remove them from the dataset completely. Based on your answer to part (c), is the loss function $L_\lambda$ congruent with this statement? Explain.

(f) Let
$$\rho_\lambda(x) = \begin{cases} \frac{1}{2}x^2 & \text{if } |x| \le \lambda, \\ \lambda|x| - \frac{1}{2}\lambda^2 & \text{if } |x| > \lambda \end{cases}$$
be Huber's loss function. Show that $\hat\beta(\lambda)$ is a minimizer of $\sum_i \rho_\lambda(y_i - x_i^T\beta)$. Given your answer in part (e), what does this imply about the robustness of Huber's loss in the presence of outliers? Should it be used if your goal is to completely eliminate the impact of outliers on your estimate for $\beta$? Explain.

(g) Now consider the general outlier-corrected estimate
$$\{\tilde\beta(\lambda), \tilde\delta(\lambda)\} = \arg\min_{\beta \in \mathbb{R}^p,\, \delta \in \mathbb{R}^n} \left\{\frac{1}{2}\|Y - X\beta - \delta\|_2^2 + \sum_{i=1}^n P_\lambda(\delta_i)\right\},$$
where $P_\lambda(x)$ penalizes large values of $x$ and $\lambda \ge 0$ is a tuning parameter. Under mild assumptions, $P_\lambda(x)$ defines a thresholding rule $S_{\lambda,P}(x)$, where
$$S_{\lambda,P}(x) = \arg\min_{u \in \mathbb{R}} \left\{\frac{1}{2}(x - u)^2 + P_\lambda(u)\right\}.$$
(i) If $P_\lambda(x) = \frac{\lambda^2}{2}\mathbf{1}\{x \neq 0\}$, show that $S_{\lambda,P}$ is the hard-thresholding function, i.e. $S_{\lambda,P}(x) = x\mathbf{1}\{|x| > \lambda\}$.
(ii) Show that, like $\hat\beta(\lambda)$ and $\hat\delta(\lambda)$, $\tilde\beta(\lambda)$ and $\tilde\delta(\lambda)$ must satisfy
$$\tilde\beta(\lambda) = (X^T X)^{-1} X^T \{Y - \tilde\delta(\lambda)\}, \qquad \tilde\delta(\lambda)_i = S_{\lambda,P}\{y_i - x_i^T \tilde\beta(\lambda)\}, \quad i \in \{1, \ldots, n\}.$$
(iii) Let $\rho_\lambda(x)$ be any continuously differentiable and robust loss function (e.g. Huber's loss), and let $\Psi_\lambda(x) = \frac{d}{dx}\rho_\lambda(x)$. Show that if $\Psi_\lambda(x) + S_{\lambda,P}(x) = x$ for all $x \in \mathbb{R}$, then $\tilde\beta(\lambda)$ is a stationary point of $\sum_i \rho_\lambda(y_i - x_i^T\beta)$. That is,
$$\frac{d}{d\beta}\left\{\sum_i \rho_\lambda(y_i - x_i^T\beta)\right\}\Bigg|_{\beta = \tilde\beta(\lambda)} = 0_p.$$
(iv) Use this result to suggest a continuous shrinkage function $S_{\lambda,P}$ that would satisfy your colleague's goal of removing data points with potentially extreme values of $\epsilon_i$.

2. An airfoil is the cross-sectional shape of a wing or propeller, and is the object that helps generate lift (the force that allows airplanes to fly, for example). Here you will use the data “airfoil.dat” to model “pressure” (the sound pressure level, in decibels) as a function of “frequency”, “angle”, “chordLength”, “velocity”, and “thickness”. More information about the experiment and, for those interested, about airfoils can be found here and here.

(a) Use ordinary least squares to regress pressure onto all of the other variables.
(i) Write down the assumed mathematical model for pressure, define all coefficients in your model, and clearly list all assumptions.
(ii) In order to satisfy modelling assumptions, decide whether the dependent variable requires a transformation.
(iii) Fit the new model, and determine if this new model satisfies your assumptions from part (i).
(iv) A colleague suggests that because pressure is not normally distributed, the bootstrap is a more appropriate way to do inference on these data. Design a bootstrap procedure to determine a 90% confidence interval for the expected value of the transformed pressure variable at the frequency, angle, chordLength, velocity, and thickness values given in “Interval.txt”.
(v) How does your interval compare to the confidence interval obtained using standard normal theory? Are the similarities/differences between the two surprising? Explain.

(b) Using the most appropriate model from (a), determine if the dependent variable is non-linearly related to the five covariates. If so, do your best to fix these non-linearities in the context of classical linear modelling, and determine if the changes to your model from part (a) are significant.

(c) In your opinion, is the model from part (b) better or worse than that from part (a)? While we have not discussed many alternatives in class, do you think linear modelling is an appropriate way to analyze these data? Explain.
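For part 2(a)(iv), one of several reasonable designs is a pairs (case-resampling) percentile bootstrap for $x_0^T\beta$, the expected transformed pressure at a fixed covariate row $x_0$; a residual bootstrap is an equally valid alternative. The Python/NumPy sketch below assumes the rows are i.i.d. draws, that `X` already holds an intercept column plus the five covariates, that `y` is the transformed pressure, and that `x0` is the row from “Interval.txt” with a leading 1. Reading the files and choosing the transformation are not shown, and the function names are hypothetical.

```python
import numpy as np

def ols_fit(X, y):
    """OLS coefficients via least squares (X is assumed to include an intercept column)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def pairs_bootstrap_mean_ci(X, y, x0, level=0.90, B=2000, seed=0):
    """Percentile bootstrap CI for E[y | x0] = x0^T beta, resampling (x_i, y_i) pairs.

    Returns (lower, upper, boot_estimates).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    est = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)  # draw n row indices with replacement
        est[b] = x0 @ ols_fit(X[idx], y[idx])  # refit OLS on the resample
    alpha = 1.0 - level
    lower, upper = np.quantile(est, [alpha / 2, 1 - alpha / 2])
    return lower, upper, est
```

For the comparison in part (v), the classical interval is $x_0^T\hat\beta \pm t_{n-q,\,0.95}\, s\sqrt{x_0^T(X^TX)^{-1}x_0}$, where $q$ is the number of columns of $X$ and $s$ is the residual standard error; placing it next to `(lower, upper)` makes the similarities or differences easy to see.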