Multiple Linear Regression Models - Part 2: Residual Diagnostics and Unusual Observations

Dr. Linh Nghiem
STAT3022 Applied Linear Models

Regression Diagnostics

Background

Recall the MLR model $y = X\beta + \varepsilon$, with $E(y) = X\beta$ and $\operatorname{Var}(y) = \operatorname{Var}(\varepsilon) = \sigma^2 I_n$. Assuming the design matrix $X$ has full rank, the OLS estimate is
$$\hat{\beta} = (X^\top X)^{-1} X^\top y.$$
The vectors of fitted values and residuals are
$$\hat{y} = X\hat{\beta} = X(X^\top X)^{-1} X^\top y = Hy, \qquad e = y - \hat{y} = (I_n - H)y,$$
where $H = X(X^\top X)^{-1} X^\top$ is the $n \times n$ hat matrix.

Background

Similar to model diagnostics for SLR, diagnostics for MLR are based on the residuals, which depend critically on the hat matrix $H$.

• $H$ is symmetric, i.e. $H^\top = H$. As a result, the matrix $I_n - H$ is also symmetric.
• Next, $HX = X$. As a result, $(I_n - H)X = X - X = 0$.
• Third, $H^2 = H$, so we say $H$ is idempotent. As a result, the matrix $I_n - H$ is also idempotent, since
$$(I_n - H)(I_n - H) = I_n - H - H + H^2 = I_n - H.$$
• Finally, as proved in Tutorial 4, $\operatorname{trace}(H) = \sum_{i=1}^{n} h_{ii} = p$.

Residual vector

• First, let us compute its expectation:
$$E(e) = E\{(I_n - H)y\} = (I_n - H)E(y) = (I_n - H)X\beta = 0.$$
• Second, let us compute the variance-covariance matrix:
$$\operatorname{Var}(e) = (I_n - H)\operatorname{Var}(y)(I_n - H)^\top = \sigma^2 (I_n - H)(I_n - H) = \sigma^2 (I_n - H),$$
i.e. $\operatorname{Var}(e_i) = \sigma^2 (1 - h_{ii})$ and $\operatorname{Cov}(e_i, e_j) = -\sigma^2 h_{ij}$ for $i \neq j$.

These computations tell us that (1) each residual $e_i$ has a smaller variance than the true error $\varepsilon_i$, and (2) the residuals are correlated.

Residual plots

We can use residual plots similar to those for simple linear regression for model diagnostics. Specifically,

• To check the constant-variance assumption: use the plot of residuals $e_i$ vs. fitted values $\hat{y}_i$, or the plot of residuals vs. each covariate; no news (i.e. no systematic pattern) is good news.
• To check the normality assumption: use a normal quantile-quantile (QQ) plot, or a formality test for normality.

A short simulated example illustrating both checks follows the figures below.

[Figure: residuals vs. fitted values plot; caption "Constant variance is reasonable".]

[Figure: nine residuals-vs-fitted panels illustrating violations of the constant-variance assumption.]
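The following is a minimal sketch in Python of how these diagnostics can be computed and plotted. The simulated design, coefficients, and seed are illustrative assumptions, not part of the lecture; the hat-matrix computations follow the formulas above.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)
n, p = 50, 3                              # n observations, p coefficients (incl. intercept)
X = np.column_stack([np.ones(n), rng.uniform(0, 5, (n, p - 1))])
y = X @ np.array([2.0, 1.5, -0.5]) + rng.normal(0, 1, n)

# Hat matrix, fitted values, and residuals, exactly as defined above
H = X @ np.linalg.solve(X.T @ X, X.T)     # H = X (X'X)^{-1} X'
y_hat = H @ y                             # fitted values: y_hat = H y
e = y - y_hat                             # residuals:     e = (I_n - H) y

print(np.allclose(H @ H, H))              # idempotent: H^2 = H   -> True
print(np.isclose(np.trace(H), p))         # trace(H) = p          -> True

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
ax1.scatter(y_hat, e)
ax1.axhline(0, linestyle="--")
ax1.set(xlabel="Fitted values", ylabel="Residuals")   # constant-variance check
stats.probplot(e, dist="norm", plot=ax2)              # normality check (QQ plot)
plt.tight_layout()
plt.show()
```

Since the simulated model satisfies all the assumptions, the left panel should show no systematic pattern and the QQ plot should be close to a straight line.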
[Figure: normal QQ plot of residuals; caption "Normality is reasonable".]

[Figure: nine normal QQ-plot panels illustrating violations of the normality assumption.]

Leverage, Outliers, and Influential Observations

Overview

Roughly speaking,

• An outlier is an observation that appears to contradict the postulated model used to describe the data.
• A high-leverage observation is one that is far from the center of the predictor space.
• An influential observation is one that exerts substantial influence on the fitted model; i.e. when it is removed, the fitted model changes substantially.

Note that one of these properties does not necessarily imply another.

[Figure: three scatterplots of $y$ vs. $x$, each with one highlighted red point: "Outlier, low leverage, not influential" ($R^2 = 0.86$; $0.87$ with the red point removed); "High leverage only" ($R^2 = 0.9$ either way); "High leverage, influential, outlier" ($R^2 = 0.36$; $0.6$ with the red point removed).]

Leverage

• Each observation in the data always tries to "pull" the fitted line toward itself, i.e. tries to make its fitted value as close to the observed outcome as possible.
• Formally, the leverage of the $i$th point represents the change in $\hat{y}_i$ when $y_i$ changes by one unit. In MLR, we have
$$\hat{y}_i = h_i^\top y = \sum_{j=1}^{n} h_{ij} y_j,$$
where $h_{ij}$ denotes the $(i, j)$ element of $H = X(X^\top X)^{-1} X^\top$. Hence, when $y_i$ changes by one unit, $\hat{y}_i$ changes by $h_{ii}$ units. Therefore $h_{ii}$ is a measure of the leverage of the $i$th observation in MLR.

Leverage

Note that we have $\operatorname{trace}(H) = \sum_{i=1}^{n} h_{ii} = p$, so
$$\bar{h} = \frac{1}{n} \sum_{i=1}^{n} h_{ii} = \frac{p}{n};$$
in other words, the average leverage is $p/n$.

• As a rule of thumb, the $i$th observation is said to have high leverage if $h_{ii} > 2\bar{h}$ or $h_{ii} > 3\bar{h}$; see the sketch below.
• $h_{ii} = x_i^\top (X^\top X)^{-1} x_i$; in SLR, $h_{ii} = n^{-1} + (x_i - \bar{x})^2 / S_{xx}$. The further a point is from the center of the predictor space, the higher its leverage.
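A minimal sketch of the leverage computation and the $2\bar{h}$ rule of thumb (the $3\bar{h}$ variant is analogous); as before, the simulated design and seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 5, (n, p - 1))])

# Leverages h_ii = x_i' (X'X)^{-1} x_i, computed without forming the full hat matrix
XtX_inv = np.linalg.inv(X.T @ X)
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)

h_bar = p / n                             # average leverage: trace(H)/n = p/n
flagged = np.where(h > 2 * h_bar)[0]      # rule of thumb: h_ii > 2 * h_bar
print(h_bar, flagged, h[flagged])
```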
Outliers

An outlier is an observation that is far from the postulated model. Hence, a natural idea for detecting whether an observation is an outlier is to look at whether the magnitude of the corresponding residual $e_i$ is big. But how big is big? From the background slides,
$$e_i \sim N\left(0, \sigma^2 (1 - h_{ii})\right),$$
so the variance of each residual $e_i$ depends on the scale $\sigma^2$. Therefore, to determine whether an observation is an outlier, we typically look at one of the following types of standardized residuals.

Different types of residuals

Standardized residual: we replace $\sigma^2$ by $\hat{\sigma}^2$ and standardize,
$$r_i = \frac{e_i}{\hat{\sigma} \sqrt{1 - h_{ii}}}.$$
The main problem with this kind of residual is that $e_i$ and $\hat{\sigma}$ are not independent, so the exact distribution of $r_i$ is difficult to calculate.

Different types of residuals

To overcome this problem, we consider the regression with the $i$th observation deleted (leave-one-out):

• Take the $i$th observation $(x_i^\top, y_i)$ out of the dataset.
• Fit the model on the $(n - 1)$ remaining observations and obtain the residual standard error $\hat{\sigma}_{(i)}$.

Finally, we obtain the externally studentized residual
$$t_i = \frac{e_i}{\hat{\sigma}_{(i)} \sqrt{1 - h_{ii}}} \sim t_{n-1-p},$$
so the $i$th observation can be considered an outlier if $|t_i|$ is large (for example, greater than $t_{1-\alpha/2,\, n-1-p}$ with $\alpha = 0.05$).

Different types of residuals

This procedure seems computationally expensive, since it requires us to run $n$ regressions, each with one observation removed. It turns out, however, that
$$t_i = e_i \left[ \frac{n - p - 1}{\mathrm{SSE}(1 - h_{ii}) - e_i^2} \right]^{1/2},$$
where $\mathrm{SSE} = \sum_{i=1}^{n} e_i^2$ is the sum of squared residuals from the regression on the full dataset, so no refitting is actually needed.

Outlier detection

In practice, we typically obtain studentized residuals for all $n$ observations and check every one of them, which amounts to performing $n$ simultaneous tests. Hence, when deciding whether an observation is an outlier, we should use a more conservative threshold for $|t_i|$. A simple way is the Bonferroni correction: we claim the $i$th observation is an outlier if $|t_i| > t_{1-\alpha/(2n),\, n-1-p}$.
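A minimal sketch of this outlier check, using the shortcut formula above together with the Bonferroni-corrected cutoff. The simulated data and seed are illustrative assumptions, and one artificially inflated response is planted so the check has something to find.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 5, (n, p - 1))])
y = X @ np.array([2.0, 1.5, -0.5]) + rng.normal(0, 1, n)
y[0] += 6.0                                   # plant one outlier for illustration

H = X @ np.linalg.solve(X.T @ X, X.T)
e = y - H @ y                                 # residuals from the full fit
h = np.diag(H)                                # leverages h_ii

# Externally studentized residuals via the shortcut formula (no refitting)
SSE = np.sum(e**2)
t_i = e * np.sqrt((n - p - 1) / (SSE * (1 - h) - e**2))

# Bonferroni-corrected rule: flag i if |t_i| > t_{1 - alpha/(2n), n-1-p}
alpha = 0.05
cutoff = stats.t.ppf(1 - alpha / (2 * n), df=n - 1 - p)
print(cutoff, np.where(np.abs(t_i) > cutoff)[0])
```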
Influential Observations

The last kind of unusual observation is the influential observation: a point whose removal results in substantial changes to the fitted model. To inspect whether the $i$th observation is influential, this definition motivates us to fit the model with and without the $i$th observation and examine the changes. The two most common classes of measures of these changes are

• DFBETA and DFBETAS: measure changes in the estimated regression coefficients.
• DFFITS and Cook's distance: measure changes in the model fit.

DFBETA and DFBETAS

Recall that $\hat{\beta}$ is the $p \times 1$ estimated coefficient vector from the full dataset, while $\hat{\beta}_{(i)}$ is the same quantity with the $i$th observation removed. Hence, we define
$$\mathrm{DFBETA}_{(i)} = \hat{\beta} - \hat{\beta}_{(i)},$$
so DFBETA is another $p \times 1$ vector, whose elements are denoted $\mathrm{DFBETA}_{(i)j}$, $j = 1, \dots, p$. Finally, we form the standardized vector DFBETAS by standardizing each element separately,
$$\mathrm{DFBETAS}_{(i)j} = \frac{\mathrm{DFBETA}_{(i)j}}{\hat{\sigma}_{(i)} \sqrt{v_{jj}}},$$
where $v_{jj}$ is the $j$th diagonal element of the matrix $V = (X^\top X)^{-1}$.

DFBETA and DFBETAS

• DFBETAS helps us determine which data points influence which coefficients.
• You can use cutoffs of 1, 2, or 3 to determine whether $\mathrm{DFBETAS}_{(i)j}$ is big enough.
• But the best approach is, for each coefficient $\beta_j$, to make a plot of all the $\mathrm{DFBETAS}_{(i)j}$, $i = 1, \dots, n$, and inspect the large values.

DFFITS and Cook's Distance

The second approach is to measure changes in model fit. The influence of case $i$ on its own fitted value $\hat{y}_i$ is
$$\mathrm{DFFITS}_i = \frac{\hat{y}_i - \hat{y}_{(i)i}}{\hat{\sigma}_{(i)} \sqrt{h_{ii}}} = \frac{x_i^\top \hat{\beta} - x_i^\top \hat{\beta}_{(i)}}{\hat{\sigma}_{(i)} \sqrt{h_{ii}}} = t_i \sqrt{\frac{h_{ii}}{1 - h_{ii}}}.$$
Cook's distance measures the influence of the $i$th case on all $n$ fitted values:
$$D_i = \frac{\sum_{j=1}^{n} \left(\hat{y}_j - \hat{y}_{(i)j}\right)^2}{p \, \mathrm{MSE}} = \frac{r_i^2 \, h_{ii}}{p (1 - h_{ii})},$$
where $r_i$ is the corresponding standardized residual defined earlier. In the first form of each expression, the denominator is an estimate of the standard error of the numerator.

DFFITS and Cook's Distance

• Both DFFITS and Cook's distance simultaneously take both the leverage ($h_{ii}$) and the outlier ($t_i$ or $r_i$) measures into account.
• Hence, influence is associated with at least one of the two: high leverage or an outlying response.
• Similar to DFBETAS, a good approach for determining whether an observation is influential is to plot all the $\mathrm{DFFITS}_i$ or $D_i$, then inspect the large values; a sketch follows below.
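In practice, these measures rarely need to be coded by hand. Below is a sketch using statsmodels, whose OLSInfluence object exposes the quantities discussed in this section; the simulated data are the same illustrative assumption as before, and the final line cross-checks the library's Cook's distance against the formula on the slide.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 5, (n, p - 1))])
y = X @ np.array([2.0, 1.5, -0.5]) + rng.normal(0, 1, n)

infl = sm.OLS(y, X).fit().get_influence()   # X already contains an intercept column

h = infl.hat_matrix_diag                    # leverages h_ii
r = infl.resid_studentized_internal         # standardized residuals r_i
t = infl.resid_studentized_external         # externally studentized residuals t_i
dffits, dffits_cutoff = infl.dffits         # DFFITS_i and a suggested cutoff
cooks_d, _ = infl.cooks_distance            # Cook's distance D_i
dfbetas = infl.dfbetas                      # n x p matrix of DFBETAS_(i)j

print(np.argsort(cooks_d)[-3:])             # the three most influential cases
# Cross-check: D_i = r_i^2 h_ii / (p (1 - h_ii))
print(np.allclose(cooks_d, r**2 * h / (p * (1 - h))))
```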