Shrinkage Estimators
• LASSO — L1 penalty
• Shrinks some coefficient estimates all the way to 0
• Ridge — L2 penalty
• Shrinks coefficient estimates (not all the way to 0)
• Relaxed LASSO—decouple selection from shrinkage
• Many others
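A minimal sketch of the difference between the two penalties, assuming an orthonormal design ($X^T X = I$) so that both estimators have simple closed forms. The coefficients, noise level, and penalty strength `lam` are hypothetical values chosen for illustration, and the objectives are written out in the comments because scaling conventions vary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Orthonormal design (X^T X = I): assumption made so closed forms are available.
n, p = 50, 4
X, _ = np.linalg.qr(rng.normal(size=(n, p)))
beta_true = np.array([3.0, 1.5, 0.2, 0.0])   # hypothetical coefficients
y = X @ beta_true + 0.1 * rng.normal(size=n)

beta_ols = X.T @ y   # OLS estimate when X^T X = I
lam = 1.0            # hypothetical penalty strength

# Ridge: minimize ||y - Xb||^2 + lam * ||b||_2^2  ->  proportional shrinkage
beta_ridge = beta_ols / (1.0 + lam)

# LASSO: minimize 0.5 * ||y - Xb||^2 + lam * ||b||_1  ->  soft-thresholding
beta_lasso = np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam, 0.0)

print("OLS  :", np.round(beta_ols, 3))
print("Ridge:", np.round(beta_ridge, 3))
print("LASSO:", np.round(beta_lasso, 3))
```

In this setup ridge scales every coefficient toward 0 without reaching it, while LASSO sets the small coefficients exactly to 0.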
Linear Regression
$y = X\beta + \epsilon$, $\epsilon \sim \mathrm{MVN}(0, \sigma^2 I)$
1. Linearity
2. Independence
3. Normality
4. Equal variance (homoskedasticity)
Estimate via OLS: minimizing
$\sum_i (y_i - x_i^T \beta)^2$
yields $\hat\beta = (X^T X)^{-1} X^T y$, and we have shown
$\hat\beta \sim N(\beta, \sigma^2 (X^T X)^{-1})$
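The closed-form estimator can be checked numerically. This is a sketch on simulated data; the sample size, coefficients, and error SD are hypothetical. In practice `np.linalg.lstsq` is numerically preferable to inverting $X^T X$ explicitly; the inverse is used here only to mirror the formula.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 covariates
beta = np.array([1.0, 2.0, -0.5])   # hypothetical true coefficients
sigma = 1.5                          # hypothetical error SD
y = X @ beta + sigma * rng.normal(size=n)

# Closed-form OLS: beta_hat = (X^T X)^{-1} X^T y
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# Same estimate from a least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Estimated SEs from sigma_hat^2 (X^T X)^{-1}, with sigma_hat^2 = RSS / (n - #columns)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - X.shape[1])
se = np.sqrt(sigma2_hat * np.diag(XtX_inv))

print(np.round(beta_hat, 3), np.round(se, 3))
```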
What happens when our assumptions are broken?
Minimizing $\sum_i (y_i - x_i^T \beta)^2$ is still a reasonable thing to do, so we can get
our usual OLS estimator $\hat\beta$.
What about the appealing features of $\hat\beta$?
$E[\hat\beta] = (X^T X)^{-1} X^T E[y]$
Under linearity, $E[y] = X\beta$, hence
$E[\hat\beta] = (X^T X)^{-1} X^T X \beta = \beta$
If linearity does not hold, $E[\hat\beta] \neq \beta$
Note the other three assumptions were not necessary for unbiasedness.
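A small Monte Carlo sketch of this point: linearity holds, but the errors are deliberately skewed and heteroskedastic (a hypothetical choice), and the average of $\hat\beta$ over many simulated datasets still lands close to $\beta$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 50, 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta = np.array([1.0, 2.0])            # hypothetical true coefficients
A = np.linalg.inv(X.T @ X) @ X.T       # maps y to beta_hat

# Mean-zero errors that are non-normal (centered exponential)
# and heteroskedastic (scaled by 1 + |x_i|).
scale = 1.0 + np.abs(x)
eps = (rng.exponential(size=(reps, n)) - 1.0) * scale

Y = X @ beta + eps                     # one simulated dataset per row
beta_hats = Y @ A.T                    # one beta_hat per row
mean_beta_hat = beta_hats.mean(axis=0)

print(np.round(mean_beta_hat, 3))      # should be close to beta
```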
What about SEs?
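$\mathrm{Var}(\hat\beta) = \mathrm{Var}((X^T X)^{-1} X^T y)$
$= (X^T X)^{-1} X^T \mathrm{Var}(y) X (X^T X)^{-1}$
Under independence and homoskedasticity, $\mathrm{Var}(y) = \sigma^2 I$:
$\mathrm{Var}(\hat\beta) = (X^T X)^{-1} X^T \sigma^2 I X (X^T X)^{-1} = \sigma^2 (X^T X)^{-1}$
But if either assumption is not met, our variance estimates will be incorrect.
Hence our SEs, CIs, etc. are invalid.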
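To see how far off the usual formula can be, the sketch below evaluates the general expression for $\mathrm{Var}(\hat\beta)$ with a known $\mathrm{Var}(y)$. The heteroskedastic variance $2 + |x_i|$ is a hypothetical choice, and the "naive" version plugs the average error variance into $\sigma^2 (X^T X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)

def var_beta_hat(var_y):
    """General formula: (X^T X)^{-1} X^T Var(y) X (X^T X)^{-1}."""
    return XtX_inv @ X.T @ var_y @ X @ XtX_inv

# Homoskedastic, independent errors: general formula collapses to sigma^2 (X^T X)^{-1}
sigma2 = 2.0
V_homo = var_beta_hat(sigma2 * np.eye(n))
V_usual = sigma2 * XtX_inv

# Heteroskedastic errors (hypothetical): Var(y_i) = 2 + |x_i|
v = 2.0 + np.abs(x)
V_het_true = var_beta_hat(np.diag(v))
V_het_naive = v.mean() * XtX_inv       # usual formula with the average variance

print("true slope variance :", V_het_true[1, 1])
print("naive slope variance:", V_het_naive[1, 1])
```

Here the naive formula understates the slope variance because the noisiest observations are also the ones with the most extreme $x_i$.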
A Note on Normality
Without Normality, $\hat\beta$ is no longer a linear transformation of a MVN
vector, hence it is no longer Normally distributed, so our CIs and tests are
not necessarily valid.
However, in large samples, $\hat\beta$ is approximately Normally distributed due
to the Central Limit Theorem:
$\frac{\sqrt{n}(\bar z - E[z])}{\sqrt{\mathrm{Var}(z)}} \xrightarrow{d} N(0, 1)$
So we can get away with valid inference despite non-normal errors in
“large enough” samples.
• Replace critical $t_{n-p-1,\alpha/2}$ values with $z_{\alpha/2}$
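A simulation sketch of the large-sample claim, assuming uniform (non-normal) errors and hypothetical values for $n$ and $\beta$: the empirical coverage of the 95% interval for the slope built with $z_{\alpha/2}$ should be close to 0.95.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(4)
n, reps = 200, 4000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta = np.array([1.0, 2.0])                 # hypothetical true coefficients
XtX_inv = np.linalg.inv(X.T @ X)
A = XtX_inv @ X.T

# Non-normal errors: Uniform(-2, 2), independent with constant variance
eps = rng.uniform(-2.0, 2.0, size=(reps, n))
Y = X @ beta + eps                          # one dataset per row
beta_hats = Y @ A.T
resid = Y - beta_hats @ X.T
sigma2_hat = (resid ** 2).sum(axis=1) / (n - 2)
se_slope = np.sqrt(sigma2_hat * XtX_inv[1, 1])

# z-based 95% interval for the slope, and its empirical coverage
z = NormalDist().inv_cdf(0.975)
covered = np.abs(beta_hats[:, 1] - beta[1]) <= z * se_slope
coverage = covered.mean()
print(round(float(coverage), 3))
```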
Prediction intervals explicitly require Normality: $y_{new} \sim N(x_{new}^T \beta, \sigma^2)$
Without normality, our prediction intervals are invalid.
• Predictions are still unbiased, however (why?)
Prediction intervals are sensitive to all 4 assumptions
Model Diagnostics
Each nice feature of regression relies on one of our assumptions (to
varying degrees)
Once we have fit a model, we need some tools to diagnose whether our
assumptions are broken
One of the best tools for diagnostics is to visualize residuals.
We can use ordinary residuals: $e_i = y_i - \hat y_i$
Could also use studentized residuals: $r_i = \frac{e_i}{\hat\sigma \sqrt{1 - h_i}}$, where $h_i$ is the $i$th
diagonal element of the hat matrix $H$.
Intuition for studentized residuals
• Recall: $e = (I - H) y \sim N(0, \sigma^2 (I - H))$
• Hence $e_i \sim N(0, \sigma^2 (1 - h_i))$
• I.e., the $e_i$ have different variances, so it is difficult to learn anything
about their distribution
• By contrast, $e_i / \sqrt{1 - h_i}$ has constant variance $\sigma^2$, so they should
look normally distributed when plotted.
• Note: in practice we estimate $\sigma$ with $\hat\sigma$, so the studentized residuals are
really t-distributed
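The quantities above can be computed directly from the hat matrix. This is a sketch on simulated data with hypothetical coefficients; it uses $\hat\sigma^2 = \mathrm{RSS} / (n - \text{number of columns of } X)$, which is one common convention.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)   # hypothetical model

H = X @ np.linalg.inv(X.T @ X) @ X.T     # hat matrix
h = np.diag(H)                           # leverages h_i
e = y - H @ y                            # ordinary residuals, e = (I - H) y
sigma_hat = np.sqrt(e @ e / (n - X.shape[1]))
r = e / (sigma_hat * np.sqrt(1.0 - h))   # studentized residuals

print("leverages sum to the number of columns of X:", round(float(h.sum()), 6))
print("largest |r_i|:", round(float(np.abs(r).max()), 3))
```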
Assessing Normality
Histogram of studentized residuals
• Should look like the density for N(0,1)
Normal-QQ plots
• Comparing quantiles to theoretical quantiles of N(0,1)
• Should fall on 45 degree line
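A sketch of how the points of a Normal-QQ plot are built, without any plotting library. The plotting positions $(i - 0.5)/n$ are one common convention (an assumption here), and the correlation of the QQ points is used only as a rough numeric stand-in for "falls on a straight line".

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(6)
n = 500

def qq_points(values):
    """Return (theoretical N(0,1) quantiles, sorted sample values)."""
    values = np.sort(np.asarray(values, dtype=float))
    probs = (np.arange(1, len(values) + 1) - 0.5) / len(values)
    theo = np.array([NormalDist().inv_cdf(p) for p in probs])
    return theo, values

normal_sample = rng.normal(size=n)
skewed_sample = rng.exponential(size=n) - 1.0   # mean 0, variance 1, right-skewed

theo, obs_norm = qq_points(normal_sample)
_, obs_skew = qq_points(skewed_sample)

# Close to 1 when the QQ points hug a straight line
corr_norm = np.corrcoef(theo, obs_norm)[0, 1]
corr_skew = np.corrcoef(theo, obs_skew)[0, 1]
print(round(float(corr_norm), 4), round(float(corr_skew), 4))
```

Plotting `obs_norm` against `theo` gives the QQ plot itself; the skewed sample bends away from the line in the upper tail.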
Assessing Heteroskedasticity
Plot residuals against fitted values
• Can detect mean-variance relationships
• I.e. if there is higher variance for larger fitted values
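A numeric companion to the residuals-versus-fitted plot, on simulated data with hypothetical parameters. The correlation between $|e_i|$ and the fitted values is an ad hoc summary used here for illustration, not a formal test.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000
x = rng.uniform(0.0, 4.0, size=n)
X = np.column_stack([np.ones(n), x])

def abs_resid_vs_fitted(y):
    """Correlation between |residuals| and fitted values from an OLS fit on X."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta_hat
    resid = y - fitted
    return np.corrcoef(fitted, np.abs(resid))[0, 1]

# Constant error SD versus an error SD that grows with the mean
y_homo = 1.0 + 2.0 * x + rng.normal(size=n)
y_het = 1.0 + 2.0 * x + (0.5 + x) * rng.normal(size=n)

corr_homo = abs_resid_vs_fitted(y_homo)
corr_het = abs_resid_vs_fitted(y_het)
print(round(float(corr_homo), 3), round(float(corr_het), 3))
```

A value near 0 matches a residual plot with an even band; a clearly positive value matches the fan shape of a mean-variance relationship.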
Assessing Independence
Difficult to visualize unless you have something like time-series data
(Also: note that residuals are not independent even when the errors are!)
• Clear from $\sum_i e_i = 0$
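A tiny sketch of why the residuals are dependent, assuming the model includes an intercept: $\mathrm{Cov}(e) = \sigma^2 (I - H)$ has nonzero off-diagonal entries, and each of its rows sums to 0.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 5
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])      # intercept included
H = X @ np.linalg.inv(X.T @ X) @ X.T
cov_e_over_sigma2 = np.eye(n) - H         # Cov(e) / sigma^2

# Independent errors, yet the residuals are constrained to sum to 0
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
e = y - H @ y
off_diag = cov_e_over_sigma2[~np.eye(n, dtype=bool)]

print("sum of residuals:", float(e.sum()))
print("largest |off-diagonal| of I - H:", float(np.abs(off_diag).max()))
```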
Instead consider how data were collected:
• Observations on patients, clustered within hospitals
• Students within classes
• Following a person over time
Assessing Linearity in SLR
Consider simple linear regression:
$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$
Linearity assumption is
$E[y_i] = \beta_0 + \beta_1 x_i$
Residuals: $e_i = y_i - (\hat\beta_0 + \hat\beta_1 x_i)$
Plot $y_i$ against $x_i$
• Should look linear!
Plot residuals $e_i$ against $x_i$
• Conspicuous pattern may indicate non-linearity
• Should look fairly “random”
• Sometimes easier to identify non-linearity than plotting $y_i$ against $x_i$
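A sketch of the kind of pattern to look for, using a hypothetical quadratic truth fitted with a straight line. Averaging the residuals within the low, middle, and high thirds of $x_i$ is a crude numeric summary of what the residual plot shows.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 3000
x = rng.normal(size=n)
y = 1.0 + x ** 2 + 0.5 * rng.normal(size=n)   # hypothetical nonlinear truth

# Fit the (misspecified) straight line and compute residuals
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_hat

# Average residual within the low / middle / high thirds of x
cuts = np.quantile(x, [1 / 3, 2 / 3])
bin_id = np.digitize(x, cuts)                 # 0, 1, or 2
bin_means = np.array([e[bin_id == b].mean() for b in range(3)])
print(np.round(bin_means, 3))
```

Positive averages at both ends and a negative average in the middle correspond to the U shape that would appear in the plot of $e_i$ against $x_i$.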
Assessing Linearity in MLR
Plotting $y_i$ against $x_i$ ignores the effect of all the other covariates!
Instead visualize partial regression plots (added variable plots)
To assess linearity in $x^*$:
1. Regress $y$ on the other covariates (all $x_j$ except $x_j = x^*$)
• Get fitted values from this model fit, and compute the residuals, $e_y$
2. Regress $x^*$ on the other covariates (all $x_j$ except $x_j = x^*$)
• Get fitted values from this model fit, and compute the residuals, $e_{x^*}$
3. Plot $e_y$ against $e_{x^*}$
Intuitively: we are isolating the $y \sim x^*$ relationship, after adjusting for
the other covariates
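The three steps can be sketched as follows on simulated data (the covariates `z1`, `z2` and all coefficients are hypothetical). A useful check is that the least-squares slope through the added variable plot equals the coefficient on $x^*$ in the full regression.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 300
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)
x_star = 0.6 * z1 + rng.normal(size=n)     # covariate of interest, correlated with z1
y = 1.0 + 2.0 * x_star - 1.0 * z1 + 0.5 * z2 + rng.normal(size=n)

def residuals(target, design):
    """Residuals from an OLS regression of target on design."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef

others = np.column_stack([np.ones(n), z1, z2])
e_y = residuals(y, others)        # step 1: y adjusted for the other covariates
e_x = residuals(x_star, others)   # step 2: x* adjusted for the other covariates

# Step 3 would plot e_y against e_x; here we compute the slope through those points
slope_av = (e_x @ e_y) / (e_x @ e_x)

# Coefficient on x* in the full regression, for comparison
full = np.column_stack([others, x_star])
coef_full, *_ = np.linalg.lstsq(full, y, rcond=None)
print(round(float(slope_av), 6), round(float(coef_full[-1]), 6))
```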
Practice Q1
Load the HSB2.csv data, and consider only the first 100 observations.
Regress math on locus, concept, mot.
(a) Are you comfortable with the homoskedasticity assumption?
(b) Are you comfortable with the normality assumption?
(c) Are you comfortable with the linearity assumption?
Practice Q2
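First generate a normally distributed covariate, $x_i$. Now simulate a
dataset in each of the settings below (assuming all other assumptions
hold as usual). Regress $y_i$ on $x_i$ as usual and examine the diagnostic plots
to investigate the impact of these violations.
(a) Generate $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ where $\epsilon_i \overset{iid}{\sim} N(0, 2)$
(b) Generate $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ where $\epsilon_i \overset{iid}{\sim} \mathrm{Unif}(-2, 2)$
(c) Generate $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ where $\epsilon_i \overset{iid}{\sim} N(0, 2 + |x_i|)$
(d) Generate $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ where $\epsilon_i \overset{iid}{\sim} N(0, 1 \cdot I(x_i \le 0) + 3 \cdot I(x_i > 0))$
(e) Generate $y_i = \beta_0 + \beta_1 x_i^3 + \epsilon_i$ where $\epsilon_i \overset{iid}{\sim} N(0, 2)$
(f) Generate $y_i = \beta_0 + \beta_1 \exp(x_i) + \epsilon_i$ where $\epsilon_i \overset{iid}{\sim} N(0, 2)$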
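One possible starting point for the simulations, as a sketch under stated assumptions: the exercise does not fix $\beta_0$, $\beta_1$, or $n$, so the values below are hypothetical; the second argument of $N(0, \cdot)$ is read as a variance; and setting (b) is read as uniform on $(-2, 2)$. The diagnostic plots themselves are left to the reader.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 500
beta0, beta1 = 1.0, 2.0          # hypothetical values, not given in the exercise
x = rng.normal(size=n)

def normal_errors(variance):
    # Assumption: the second argument of N(0, .) is a variance, not an SD
    return rng.normal(size=n) * np.sqrt(variance)

datasets = {
    "a": beta0 + beta1 * x + normal_errors(2.0),
    "b": beta0 + beta1 * x + rng.uniform(-2.0, 2.0, size=n),
    "c": beta0 + beta1 * x + normal_errors(2.0 + np.abs(x)),
    "d": beta0 + beta1 * x + normal_errors(np.where(x <= 0, 1.0, 3.0)),
    "e": beta0 + beta1 * x ** 3 + normal_errors(2.0),
    "f": beta0 + beta1 * np.exp(x) + normal_errors(2.0),
}

# Regress y on x in every setting and keep what the diagnostics need
X = np.column_stack([np.ones(n), x])
fits = {}
for name, y in datasets.items():
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta_hat
    fits[name] = {"beta_hat": beta_hat, "fitted": fitted, "resid": y - fitted}

for name, fit in fits.items():
    print(name, np.round(fit["beta_hat"], 3))
```

Each entry of `fits` holds the fitted values and residuals for one setting, which can then be passed to whatever plotting tool is used for the diagnostics.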
