MTH 542/642 Chapter 10 – Variable Selection

If there are p predictors, then there are 2^p possible models.
Prefer: a model that fits sufficiently well in the least complex way.
Notice: removing variables from the model may improve the precision of the parameter estimates, but
may increase bias of these estimates.

There are two main types of variable selection: stepwise testing and criterion-based methods.
Criteria for comparing the selected sets of variables are based on the lack of fit of a model and on its
complexity, the latter measured by the number of terms in the model.

Crime Data (from Faraway)

Data collected from 50 states (U.S. Census Bureau). Response: life expectancy; the other variables
are predictors.
> install.packages("faraway")
> library(faraway)
> statedata<-data.frame(state.x77,row.names=state.abb)
> statedata
Population Income Illiteracy Life.Exp Murder HS.Grad Frost Area
AL 3615 3624 2.1 69.05 15.1 41.3 20 50708
AK 365 6315 1.5 69.31 11.3 66.7 152 566432
AZ 2212 4530 1.8 70.55 7.8 58.1 15 113417
AR 2110 3378 1.9 70.66 10.1 39.9 65 51945
-----
WI 4589 4468 0.7 72.48 3.0 54.5 149 54464
WY 376 4566 0.6 70.29 6.9 62.9 173 97203

Backward Elimination Method
- Start with the model that includes all predictors.
- Remove the predictor with the highest p-value greater than a chosen α (e.g., 0.05).
- Refit the model with the remaining predictors and repeat the previous step until no remaining
predictor has a p-value greater than α.
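This elimination loop can be sketched in code. Below is a minimal illustration on synthetic data (not the state data), using plain least squares in numpy and the rough rule |t| < 2 in place of the p-value cutoff, since |t| ≈ 2 corresponds to p ≈ 0.05 for moderate sample sizes; all names here are invented for the example.

```python
import numpy as np

def abs_tvalues(X, y):
    """Fit OLS with an intercept and return |t| for each slope coefficient."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (len(y) - A.shape[1])    # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    return np.abs(beta[1:] / se[1:])                  # skip the intercept

rng = np.random.default_rng(42)
n = 200
X = rng.normal(size=(n, 4))
# Only columns 0 and 1 truly influence the response.
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

cols = list(range(4))             # indices of predictors still in the model
while cols:
    t = abs_tvalues(X[:, cols], y)
    worst = int(np.argmin(t))     # least significant remaining predictor
    if t[worst] >= 2.0:           # everything left is significant: stop
        break
    del cols[worst]               # eliminate it and refit
print(cols)                       # the true predictors 0 and 1 survive
```

In R, the repeated update()/summary() calls below carry out exactly this loop by hand.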

Step 1. The model with all predictors

> m1<-lm(Life.Exp~.,data=statedata)
> summary(m1)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.094e+01 1.748e+00 40.586 < 2e-16 ***
Population 5.180e-05 2.919e-05 1.775 0.0832 .
Income -2.180e-05 2.444e-04 -0.089 0.9293
Illiteracy 3.382e-02 3.663e-01 0.092 0.9269
Murder -3.011e-01 4.662e-02 -6.459 8.68e-08 ***
HS.Grad 4.893e-02 2.332e-02 2.098 0.0420 *
Frost -5.735e-03 3.143e-03 -1.825 0.0752 .
Area -7.383e-08 1.668e-06 -0.044 0.9649 (highest p-value)
---
Residual standard error: 0.7448 on 42 degrees of freedom
Multiple R-Squared: 0.7362, Adjusted R-squared: 0.6922
F-statistic: 16.74 on 7 and 42 DF, p-value: 2.534e-10

Step 2. Eliminate Area
> m2<-update(m1,.~.-Area)
> summary(m2)
Call:
lm(formula = Life.Exp ~ Population + Income + Illiteracy + Murder
+ HS.Grad + Frost, data = statedata)

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.099e+01 1.387e+00 51.165 < 2e-16 ***
Population 5.188e-05 2.879e-05 1.802 0.0785 .
Income -2.444e-05 2.343e-04 -0.104 0.9174
Illiteracy 2.846e-02 3.416e-01 0.083 0.9340
Murder -3.018e-01 4.334e-02 -6.963 1.45e-08 ***
HS.Grad 4.847e-02 2.067e-02 2.345 0.0237 *
Frost -5.776e-03 2.970e-03 -1.945 0.0584 .


Step 3. Eliminate Illiteracy
> m3<-update(m2,.~.-Illiteracy)
> summary(m3)

Call:
lm(formula = Life.Exp ~ Population + Income + Murder + HS.Grad +
Frost, data = statedata)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.107e+01 1.029e+00 69.067 < 2e-16 ***
Population 5.115e-05 2.709e-05 1.888 0.0657 .
Income -2.477e-05 2.316e-04 -0.107 0.9153
Murder -3.000e-01 3.704e-02 -8.099 2.91e-10 ***
HS.Grad 4.776e-02 1.859e-02 2.569 0.0137 *
Frost -5.910e-03 2.468e-03 -2.395 0.0210 *
---
Residual standard error: 0.7277 on 44 degrees of freedom
Multiple R-Squared: 0.7361, Adjusted R-squared: 0.7061

Step 4. Eliminate Income
> m4<-update(m3,.~.-Income)
> summary(m4)
Call:
lm(formula = Life.Exp ~ Population + Murder + HS.Grad + Frost,
data = statedata)

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.103e+01 9.529e-01 74.542 < 2e-16 ***
Population 5.014e-05 2.512e-05 1.996 0.05201 .
Murder -3.001e-01 3.661e-02 -8.199 1.77e-10 ***
HS.Grad 4.658e-02 1.483e-02 3.142 0.00297 **
Frost -5.943e-03 2.421e-03 -2.455 0.01802 *
---
Residual standard error: 0.7197 on 45 degrees of freedom
Multiple R-Squared: 0.736, Adjusted R-squared: 0.7126
>




Step 5. Eliminate Population (should we remove Population?)
> m5<-update(m4,.~.-Population)
> summary(m5)
Call:
lm(formula = Life.Exp ~ Murder + HS.Grad + Frost, data = statedata)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 71.036379 0.983262 72.246 < 2e-16 ***
Murder -0.283065 0.036731 -7.706 8.04e-10 ***
HS.Grad 0.049949 0.015201 3.286 0.00195 **
Frost -0.006912 0.002447 -2.824 0.00699 **
---
Residual standard error: 0.7427 on 46 degrees of freedom
Multiple R-Squared: 0.7127, Adjusted R-squared: 0.6939
F-statistic: 38.03 on 3 and 46 DF, p-value: 1.634e-12

Compare the R-squared for the model with all predictors (0.7362)
with the R-squared for the model after step 5 (0.7127).
This is only a minor reduction, so we could keep the model from step 5.

Even though we eliminated variables such as Illiteracy from the model, we cannot say that those
variables are of no interest.

Criterion Based Procedures

There are three such criteria:
1. AIC = Akaike Information Criterion

AIC = n log(RSS/n) + 2p

(p is the number of parameters in the fitted model, counting the intercept; this is the convention
R's step function uses)
Smaller values of AIC are preferred.
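As a quick arithmetic check of this formula (a sketch in Python; RSS = 23.297 and the count of 8 parameters come from the full state-data model m1 fitted above):

```python
import math

n, p = 50, 8          # 50 states; 7 predictors plus the intercept
rss = 23.297          # RSS of the full model, as reported by R
aic = n * math.log(rss / n) + 2 * p
# aic is about -22.18, the value R's step() prints for the full model
```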

2. BIC = Schwarz's Bayes Information Criterion

BIC = n log(RSS/n) + p log(n)

Smaller values of BIC are preferred.
BIC penalizes larger models more heavily than AIC does, so it will tend to choose smaller models
than AIC does.
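The BIC formula can be checked the same way (a sketch in Python; RSS = 23.297 and the 8 parameters again come from the full state-data model m1):

```python
import math

n, p = 50, 8
rss = 23.297
bic = n * math.log(rss / n) + p * math.log(n)
# bic is about -6.89, which step(m1, k = log(50)) prints for the full model
```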


3. Mallows' Cp

Cp = RSS_p / σ̂² + 2p - n

where RSS_p is the residual sum of squares of a model with p parameters and σ̂² is the estimate of
σ² from the model with all predictors.
If a p-parameter model fits, then E(RSS_p) = (n - p)σ² and hence E(Cp) ≈ p.
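A quick check of the E(Cp) ≈ p property (a sketch in Python): for the full model the identity is exact, because the same fit supplies both RSS_p and σ̂². The numbers below are from the full state-data model (RSS = 23.297 on 42 residual degrees of freedom).

```python
n, p = 50, 8                   # 7 predictors plus the intercept
rss_full = 23.297
sigma2 = rss_full / (n - p)    # sigma-hat^2 estimated from the full model
cp_full = rss_full / sigma2 + 2 * p - n
# cp_full equals p = 8 exactly for the full model, by construction
```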


We will use R to illustrate variable selection based on the AIC criterion for the state-level life
expectancy data.

We use the function step:

k = 2 (the default) gives the AIC criterion and k = log(n) gives the BIC criterion.

The step function does not evaluate all possible models; it compares them sequentially, here by
backward elimination from the full model.

> step(m1)
Start: AIC= -22.18
Life.Exp ~ Population + Income + Illiteracy + Murder + HS.Grad +
Frost + Area

Df Sum of Sq RSS AIC
- Area 1 0.001 23.298 -24.182
- Income 1 0.004 23.302 -24.175
- Illiteracy 1 0.005 23.302 -24.174
&lt;none&gt;                    23.297 -22.185
- Population 1 1.747 25.044 -20.569
- Frost 1 1.847 25.144 -20.371
- HS.Grad 1 2.441 25.738 -19.202
- Murder 1 23.141 46.438 10.305

Step: AIC= -24.18
Life.Exp ~ Population + Income + Illiteracy + Murder + HS.Grad +
Frost

Df Sum of Sq RSS AIC
- Illiteracy 1 0.004 23.302 -26.174
- Income 1 0.006 23.304 -26.170
&lt;none&gt;                    23.298 -24.182
- Population 1 1.760 25.058 -22.541
- Frost 1 2.049 25.347 -21.968
- HS.Grad 1 2.980 26.279 -20.163
- Murder 1 26.272 49.570 11.568

Step: AIC= -26.17
Life.Exp ~ Population + Income + Murder + HS.Grad + Frost

Df Sum of Sq RSS AIC
- Income 1 0.006 23.308 -28.161
&lt;none&gt;                    23.302 -26.174
- Population 1 1.887 25.189 -24.280
- Frost 1 3.037 26.339 -22.048
- HS.Grad 1 3.495 26.797 -21.187
- Murder 1 34.739 58.041 17.457

Step: AIC= -28.16
Life.Exp ~ Population + Murder + HS.Grad + Frost

Df Sum of Sq RSS AIC
&lt;none&gt;                    23.308 -28.161
- Population 1 2.064 25.372 -25.920
- Frost 1 3.122 26.430 -23.876
- HS.Grad 1 5.112 28.420 -20.246
- Murder 1 34.816 58.124 15.528

Call:
lm(formula = Life.Exp ~ Population + Murder + HS.Grad + Frost, data =
statedata)

Coefficients:
(Intercept) Population Murder HS.Grad Frost
7.103e+01 5.014e-05 -3.001e-01 4.658e-02 -5.943e-03
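The AIC values in the trace above can be reproduced by hand (a sketch in Python; the RSS values are copied from the first comparison table of the trace):

```python
import math

n = 50

def step_aic(rss, n_params):
    # The criterion step() minimizes: n*log(RSS/n) + 2*(number of parameters)
    return n * math.log(rss / n) + 2 * n_params

# Dropping one of the 7 predictors leaves 7 parameters (incl. the intercept).
aic_drop_area = step_aic(23.298, 7)     # about -24.18, the "- Area" row
aic_drop_murder = step_aic(46.438, 7)   # about 10.30, the "- Murder" row
```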


The following call uses the BIC criterion (note that R still labels the values "AIC" in the output):
> step(m1,k = log(50))
Start: AIC= -6.89
Life.Exp ~ Population + Income + Illiteracy + Murder + HS.Grad +
Frost + Area

Df Sum of Sq RSS AIC
- Area 1 0.001 23.298 -10.798
- Income 1 0.004 23.302 -10.791
- Illiteracy 1 0.005 23.302 -10.790
- Population 1 1.747 25.044 -7.185
- Frost 1 1.847 25.144 -6.987
&lt;none&gt;                    23.297 -6.888
- HS.Grad 1 2.441 25.738 -5.818
- Murder 1 23.141 46.438 23.689

Step: AIC= -10.8
Life.Exp ~ Population + Income + Illiteracy + Murder + HS.Grad +
Frost

Df Sum of Sq RSS AIC
- Illiteracy 1 0.004 23.302 -14.702
- Income 1 0.006 23.304 -14.698
- Population 1 1.760 25.058 -11.069
&lt;none&gt;                    23.298 -10.798
- Frost 1 2.049 25.347 -10.496
- HS.Grad 1 2.980 26.279 -8.691
- Murder 1 26.272 49.570 23.041

Step: AIC= -14.7
Life.Exp ~ Population + Income + Murder + HS.Grad + Frost

Df Sum of Sq RSS AIC
- Income 1 0.006 23.308 -18.601
- Population 1 1.887 25.189 -14.720
&lt;none&gt;                    23.302 -14.702
- Frost 1 3.037 26.339 -12.488
- HS.Grad 1 3.495 26.797 -11.627
- Murder 1 34.739 58.041 27.017

Step: AIC= -18.6
Life.Exp ~ Population + Murder + HS.Grad + Frost

Df Sum of Sq RSS AIC
&lt;none&gt;                    23.308 -18.601
- Population 1 2.064 25.372 -18.271
- Frost 1 3.122 26.430 -16.228
- HS.Grad 1 5.112 28.420 -12.598
- Murder 1 34.816 58.124 23.176

Call:
lm(formula = Life.Exp ~ Population + Murder + HS.Grad + Frost, data =
statedata)

Coefficients:
(Intercept) Population Murder HS.Grad Frost
7.103e+01 5.014e-05 -3.001e-01 4.658e-02 -5.943e-03

Mallows' Cp Criterion

We need a new package: leaps
> install.packages("leaps")
> library(leaps)
> help("leaps")
leaps() performs an exhaustive search for the best subsets of the
variables in x for predicting y in linear regression, using an
efficient branch-and-bound algorithm. It is a compatibility
wrapper for 'regsubsets', which does the same thing better.

It will give the best model for each size.

> rs<-summary(regsubsets(Life.Exp~.,data=statedata))
> rs
Subset selection object
Call: regsubsets.formula(Life.Exp ~ ., data = statedata)
7 Variables (and intercept)
1 subsets of each size up to 7
Selection Algorithm: exhaustive
Population Income Illiteracy Murder HS.Grad Frost Area
1 ( 1 ) " " " " " " "*" " " " " " "
2 ( 1 ) " " " " " " "*" "*" " " " "
3 ( 1 ) " " " " " " "*" "*" "*" " "
4 ( 1 ) "*" " " " " "*" "*" "*" " "
5 ( 1 ) "*" "*" " " "*" "*" "*" " "
6 ( 1 ) "*" "*" "*" "*" "*" "*" " "
7 ( 1 ) "*" "*" "*" "*" "*" "*" "*"

Next, plot the Cp values against the number of parameters.
> plot(2:8, rs$cp, xlab="Number of parameters", ylab="Mallows' Cp")