MATH2831 Linear Models
Assignment
Note: This assignment is due by 11:59pm Monday 16 November (week 10)
Please follow the instructions below for completing the assignment. It is worth
20% of your final mark.
• This assignment must be completed individually.
• Your assignment must be submitted as a pdf file. It may be typed
or handwritten, then converted into one pdf file. You must include the
completed coversheet in your assignment.
• You must sign and date your submitted assignment, and include your name
and zID below.
I declare that this assessment item is my own work, except where acknowledged,
and has not been submitted for academic credit elsewhere, and acknowledge that
the assessor of this item may, for the purpose of assessing this item:
• Reproduce this assessment item and provide a copy to another member of
the University; and/or,
• Communicate a copy of this assessment item to a plagiarism checking
service (which may then retain a copy of the assessment item on its database
for the purpose of future plagiarism checking).
I certify that I have read and understood the University Rules in respect of
Student Academic Misconduct.
Student’s full name and zID
Signed: Date:
1. An experiment was conducted in order to study the size of squid eaten by
sharks and tuna. The predictor variables are characteristics of the beak or
mouth of the squid. The predictors and response considered for the study
are:
x1 : Rostral length in inches
x2 : Wing length in inches
x3 : Rostral to notch length
x4 : Notch to wing length
x5 : Width in inches
y : Weight in pounds
The study involved measurements and weights taken on 22 specimens, and the
data are available in the squid.txt data set.
(I) Best subset selection. Carry out a best subset linear regression analysis
using the regsubsets() function on the squid data set.
(a) Copy the summary output into your report and briefly comment on
the models identified in this output. From the summary output,
identify the best model obtained with four predictors.
(b) What is the best model based on adjusted $R^2$, PRESS and $C_p$
from among the models chosen by the regsubsets() function?
To provide evidence for your answer, include in your report a
table showing the values of adjusted $R^2$, PRESS and $C_p$ for the
best subsets of each size. Include also two plots, one of adjusted
$R^2$ and another of $C_p$ for the best subsets of each size against the
number of predictors.
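One possible way to assemble the table and plots requested in (I) is sketched below. It is only a sketch under stated assumptions: the leaps package is installed, squid.txt has a header row with columns named x1 to x5 and y, and the data frame built here is a synthetic placeholder so that the code runs on its own. PRESS is computed from the standard identity PRESS = sum of (e_i / (1 - h_ii))^2, with e_i the residuals and h_ii the leverages.

```r
library(leaps)  # assumption: the leaps package is installed

# Synthetic placeholder so the sketch runs on its own. For the assignment,
# replace this block with something like
#   squid <- read.table("squid.txt", header = TRUE)  # assumed columns x1..x5, y
set.seed(2831)
n <- 22
squid <- as.data.frame(matrix(rnorm(n * 5), n, 5))
names(squid) <- paste0("x", 1:5)
squid$y <- 1 + 2 * squid$x2 + squid$x4 + rnorm(n)

# Best subset of each size (1 to 5 predictors).
best <- regsubsets(y ~ ., data = squid, nvmax = 5)
s <- summary(best)
s$which  # which predictors enter the best model of each size

# PRESS from residuals and leverages of a fitted lm.
press <- function(fit) sum((residuals(fit) / (1 - hatvalues(fit)))^2)

# Refit the best model of each size with lm() to obtain its PRESS.
press_by_size <- sapply(seq_len(nrow(s$which)), function(k) {
  vars <- names(which(s$which[k, -1]))  # drop the intercept column
  press(lm(reformulate(vars, response = "y"), data = squid))
})

tab <- data.frame(size = seq_along(s$adjr2),
                  adjr2 = s$adjr2,
                  Cp = s$cp,
                  PRESS = press_by_size)
print(tab)

# Adjusted R^2 and Cp against the number of predictors.
par(mfrow = c(1, 2))
plot(tab$size, tab$adjr2, type = "b",
     xlab = "Number of predictors", ylab = "Adjusted R^2")
plot(tab$size, tab$Cp, type = "b",
     xlab = "Number of predictors", ylab = "Cp")
abline(a = 1, b = 1, lty = 2)  # reference line Cp = p, with p = predictors + 1
par(mfrow = c(1, 1))
```

The numbers produced from the placeholder data mean nothing; only the structure of the table and plots carries over to the real data set.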
(II) Sequential variable selection on the squid data set.
(a) Carry out forward model selection with the stepAIC() function,
using all the available predictors and starting from the model with
just an intercept.
(i) Copy the R output into your report and describe the selection
procedure from the output. At each step state the 'current
model', which predictor was added to the current model and
why.
(ii) Clearly state the final model obtained, including the coefficient
estimates of the fitted model. How does your answer compare
to the results in (I)?
(b) Repeat (a) using backward model selection, starting from the model
with all the available predictors.
(i) Copy the R output into your report and describe the selection
procedure from the output. At each step state the 'current
model', which predictor was removed from the current model
and why. What is the AIC for the model with just x4?
(ii) Clearly state the final model obtained, including the coefficient
estimates of the fitted model. Do you obtain the same model
as in the forward selection above?
(c) Carry out a stepwise selection procedure with the stepAIC()
function, using all the available predictors and starting from the model
with just x1. Do NOT include the R output, just state the final
model in your answer, and compare to your findings in parts (a)
and (b) above.
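The three stepAIC() calls described in (II) might be set up along the following lines. This is a sketch under assumptions: the MASS package is available, the columns of squid.txt are named x1 to x5 and y, and the data frame below is a synthetic placeholder used only so that the code runs on its own.

```r
library(MASS)  # provides stepAIC()

# Synthetic placeholder so the sketch runs on its own. For the assignment,
# replace this block with something like
#   squid <- read.table("squid.txt", header = TRUE)  # assumed columns x1..x5, y
set.seed(2831)
n <- 22
squid <- as.data.frame(matrix(rnorm(n * 5), n, 5))
names(squid) <- paste0("x", 1:5)
squid$y <- 1 + 2 * squid$x2 + squid$x4 + rnorm(n)

# Largest model that the search is allowed to reach.
upper <- ~ x1 + x2 + x3 + x4 + x5

# (a) Forward selection, starting from the intercept-only model.
fwd <- stepAIC(lm(y ~ 1, data = squid),
               scope = list(lower = ~ 1, upper = upper),
               direction = "forward")

# (b) Backward selection, starting from the model with all predictors.
bwd <- stepAIC(lm(y ~ ., data = squid), direction = "backward")

# (c) Stepwise selection starting from the model with just x1;
#     trace = FALSE suppresses the step-by-step output.
both <- stepAIC(lm(y ~ x1, data = squid),
                scope = list(lower = ~ 1, upper = upper),
                direction = "both", trace = FALSE)

# Coefficient estimates of the selected models.
coef(fwd)
coef(bwd)
coef(both)

# extractAIC() returns c(edf, AIC); the second element is on the same
# scale as the AIC values printed by stepAIC().
aic_x4 <- extractAIC(lm(y ~ x4, data = squid))
aic_x4
```

With trace left at its default, each step prints the current model followed by a table of candidate additions or deletions ordered by AIC, which is the output parts (a)(i) and (b)(i) ask you to describe.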
(III) Model criticism. Fit a linear model with y as the response and all the
available predictors to the squid data.
(a) Include in your report diagnostic plots of residuals for the fitted
model and comment on the appropriateness of the general linear
model assumptions. In particular, comment on whether or not
there appears to be any violation of model assumptions, such as
an incorrectly specified mean, failure of the constancy of error
variance, departure from normality, outliers and observations that
have a large influence on the model analysis.
Note that you can plot all four residual plots using:
par(mfrow=c(2,2))
plot(model)
par(mfrow=c(1,1))
(b) Recommend the final optimal model based on your findings in (I)
and (II). Give reasons to support your choice.
Fit the chosen optimal model to the squid data and repeat part (a).
Produce the summary output of the fitted model and include it in
your report. Are all the predictor variables "significant" in the
final model according to the partial t-tests?
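The diagnostic plots in (III) can be complemented with numeric influence measures. The sketch below assumes columns named x1 to x5 and y and uses a synthetic placeholder data frame so that it runs on its own. The cut-offs used to flag observations (leverage above 2p/n, absolute standardised residual above 2) are common rules of thumb, not thresholds taken from this assignment.

```r
# Synthetic placeholder so the sketch runs on its own. For the assignment,
# replace this block with something like
#   squid <- read.table("squid.txt", header = TRUE)  # assumed columns x1..x5, y
set.seed(2831)
n <- 22
squid <- as.data.frame(matrix(rnorm(n * 5), n, 5))
names(squid) <- paste0("x", 1:5)
squid$y <- 1 + 2 * squid$x2 + squid$x4 + rnorm(n)

model <- lm(y ~ ., data = squid)
p <- length(coef(model))  # number of parameters, including the intercept

# Leverage, standardised residual and Cook's distance for each observation.
infl <- data.frame(leverage  = hatvalues(model),
                   std_resid = rstandard(model),
                   cooks_d   = cooks.distance(model))

# Rule-of-thumb flags (conventions, not part of the assignment text).
infl[infl$leverage > 2 * p / n | abs(infl$std_resid) > 2, ]

# Partial t-tests for each coefficient.
summary(model)$coefficients
```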
2. In this question we consider the derivation of Mallows' $C_p$ statistic for
model selection discussed in lectures.
Suppose the experimenter proposes a model
$$y = X_1\beta_1 + \varepsilon^* \qquad (p \text{ parameters})$$
where $X_1$ is an $n \times p$ matrix and the vector $\beta_1$ contains $p$ parameters.
The “true” model, however, contains an additional $m - p$ parameters described
by the vector $\beta_2$. So the “true” model is given by
$$y = X_1\beta_1 + X_2\beta_2 + \varepsilon \qquad (m \text{ parameters},\ m > p)$$
where $X_2$ is an $n \times (m-p)$ matrix. Assume that the errors $\varepsilon$ are
uncorrelated with mean zero and common variance $\sigma^2$.
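Consider fitting the proposed general linear model to data and write $\hat{y}_i$ for
the fitted value at $x_i$ and $\mathrm{MSE}(\hat{y}_i)$ for its mean squared error. Recall that
if the error variance $\sigma^2$ is known, then an estimate of
$$\frac{\sum_{i=1}^{n} \mathrm{MSE}(\hat{y}_i)}{\sigma^2} = \frac{\sum_{i=1}^{n} \mathrm{Var}(\hat{y}_i)}{\sigma^2} + \frac{\sum_{i=1}^{n} \mathrm{Bias}^2(\hat{y}_i)}{\sigma^2}$$
is
$$p + \frac{(n-p)(\hat{\sigma}^2 - \sigma^2)}{\sigma^2} \qquad (1)$$
where $\hat{\sigma}^2$ is the estimate of the error variance for the proposed model and
$p$ is the number of parameters.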
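For orientation, a short rearrangement shows how (1) relates to the form of the statistic that is usually quoted. It uses only the definition $\hat{\sigma}^2 = \mathrm{SSE}_p/(n-p)$, where $\mathrm{SSE}_p$ (a symbol introduced here for illustration) denotes the residual sum of squares of the proposed model:

```latex
p + \frac{(n-p)(\hat{\sigma}^2 - \sigma^2)}{\sigma^2}
  = p + \frac{\mathrm{SSE}_p}{\sigma^2} - (n-p)
  = \frac{\mathrm{SSE}_p}{\sigma^2} - n + 2p .
```

In practice $\sigma^2$ is unknown and is typically replaced by an estimate from the largest model under consideration, which gives the $C_p$ values reported by software.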
You now have to provide a justification for (1).
(a) Writing $\hat{y} = (\hat{y}_1, \ldots, \hat{y}_n)^\top$ for the vector of fitted values for the proposed
model and observing that
$$\hat{y} = X_1 (X_1^\top X_1)^{-1} X_1^\top y = H_1 y,$$
show that
$$\sum_{i=1}^{n} \mathrm{Var}(\hat{y}_i) = \sigma^2\, \mathrm{tr}(H_1),$$
where $\mathrm{tr}(A)$ denotes the trace of $A$ and $H_1 = X_1 (X_1^\top X_1)^{-1} X_1^\top$ denotes
the hat matrix corresponding to the proposed model. By using the
rules given in lectures about matrix traces, deduce that
$$\frac{\sum_{i=1}^{n} \mathrm{Var}(\hat{y}_i)}{\sigma^2} = p.$$
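The identities in (a) can be checked numerically before attempting the proof. The sketch below uses a small made-up design matrix (not the squid data) with p = 3 columns and an arbitrary value of the error variance. It is a sanity check only and does not replace the derivation.

```r
# Illustrative design matrix (made-up numbers): intercept, linear and
# quadratic term, so p = 3.
n <- 8
X1 <- cbind(1, 1:n, (1:n)^2)
p <- ncol(X1)

# Hat matrix H1 = X1 (X1'X1)^{-1} X1'.
H1 <- X1 %*% solve(crossprod(X1)) %*% t(X1)

# With Var(y) = sigma^2 I, the covariance matrix of y_hat = H1 y
# is sigma^2 H1 H1'.
sigma2 <- 2.5  # arbitrary illustrative value
var_yhat <- sigma2 * H1 %*% t(H1)

sum(diag(var_yhat))     # sum of Var(y_hat_i)
sigma2 * sum(diag(H1))  # sigma^2 tr(H1)
sum(diag(H1))           # agrees with p up to rounding error
```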
(b) Consider the estimate of $\sigma^2$ obtained in lectures for the proposed
model,
$$\hat{\sigma}^2 = \frac{y^\top \left(I - X_1 (X_1^\top X_1)^{-1} X_1^\top\right) y}{n-p}.$$
By using the result stated in lectures about the expected value of a
quadratic form $y^\top A y$, and noting that
$$E(y) = X_1\beta_1 + X_2\beta_2,$$
show that
$$E(\hat{\sigma}^2) = \sigma^2 + \frac{1}{n-p}\, \beta_2^\top X_2^\top (I - H_1) X_2 \beta_2.$$
(c) Show that
$$\sum_{i=1}^{n} \mathrm{Bias}^2(\hat{y}_i) = (E(y) - E(\hat{y}))^\top (E(y) - E(\hat{y})) = \beta_2^\top X_2^\top (I - H_1) X_2 \beta_2.$$
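(d) From (b) and (c), deduce that an unbiased estimator of
$$\frac{\sum_{i=1}^{n} \mathrm{Bias}^2(\hat{y}_i)}{\sigma^2}$$
is
$$\frac{(n-p)(\hat{\sigma}^2 - \sigma^2)}{\sigma^2},$$
from which it follows that (1) is a sensible estimator of
$$\frac{\sum_{i=1}^{n} \mathrm{MSE}(\hat{y}_i)}{\sigma^2}.$$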
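The results in (c) and (d) can be checked numerically with a small simulation. Everything in the sketch below is an illustrative assumption rather than part of the question: the proposed model is a straight line (p = 2), the "true" model adds a quadratic term (m = 3), the coefficient values are made up, and the errors are drawn from a normal distribution although the question only requires uncorrelated errors with mean zero and common variance.

```r
set.seed(2831)

# Illustrative setup (made-up numbers).
n <- 20
t_grid <- seq(-1, 1, length.out = n)
X1 <- cbind(1, t_grid)   # proposed model: intercept and slope
X2 <- cbind(t_grid^2)    # extra column in the "true" model
beta1 <- c(1, 2)
beta2 <- 3
sigma2 <- 1
p <- ncol(X1)

H1 <- X1 %*% solve(crossprod(X1)) %*% t(X1)
mu <- drop(X1 %*% beta1 + X2 %*% beta2)  # E(y) under the "true" model

# Sum of squared biases, computed in the two ways that appear in part (c).
bias <- mu - drop(H1 %*% mu)             # E(y) - E(y_hat)
bias2_direct <- sum(bias^2)
bias2_quad <- drop(t(beta2) %*% t(X2) %*% (diag(n) - H1) %*% X2 %*% beta2)

# Exact target: sum of MSE(y_hat_i) / sigma^2 = p + sum of Bias^2 / sigma^2.
target <- p + bias2_direct / sigma2

# Monte Carlo average of the estimator in (1), using normal errors.
estimator <- replicate(10000, {
  y <- mu + rnorm(n, sd = sqrt(sigma2))
  sigma2_hat <- sum((y - drop(H1 %*% y))^2) / (n - p)
  p + (n - p) * (sigma2_hat - sigma2) / sigma2
})

c(target = target, monte_carlo_mean = mean(estimator))
```

The two bias computations should agree to rounding error, and the Monte Carlo mean should lie close to the target, which is what unbiasedness in (d) predicts.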