STAT 527 — Fall 2020
Homework 3
(Posted on Friday Oct. 02; Due Friday Oct. 16)
Please submit your assignment following the Guidelines for Homework posted on the course website.
(Even if correct, answers might not receive credit if they are too difficult to read.) Remember to
include relevant computer output.
1. (a) Using the prostate data set, fit a linear regression model with lpsa as the response
and all of the other variables as predictors.
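(b) Fit a generalized least squares model (provide estimates of $\beta$ and $\sigma^2$) which assumes
the following covariance matrix for the errors:
$$\mathrm{Var}(\varepsilon) = \sigma^2 W,$$
where
$$W = \begin{pmatrix}
1 & 0.5 & 0 & \cdots & 0 \\
0.5 & 1 & 0.5 & \cdots & 0 \\
0 & 0.5 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1 & 0.5 \\
0 & 0 & \cdots & 0.5 & 1
\end{pmatrix}.$$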
That is, the correlation between adjacent errors is 0.5, and all other pairs of errors are
uncorrelated. To find the inverse of W, you can use the solve function in R; to find the
square root of a matrix, you can use the sqrtm function from the expm package.
(c) Provide confidence intervals for all the coefficient parameters based on the generalized
least squares model from Part (b).
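For reference, the standard GLS quantities behind parts (b) and (c) can be sketched as follows. The notation is an assumption, not taken from the assignment: $X$ is the $n \times p$ design matrix (including the intercept column), $y$ is the response vector, and $W$ is the known matrix above.

```latex
\begin{align*}
\hat\beta_{\mathrm{GLS}} &= (X^\top W^{-1} X)^{-1} X^\top W^{-1} y, \\
\hat\sigma^2 &= \frac{(y - X\hat\beta_{\mathrm{GLS}})^\top W^{-1} (y - X\hat\beta_{\mathrm{GLS}})}{n - p}, \\
\widehat{\mathrm{Var}}(\hat\beta_{\mathrm{GLS}}) &= \hat\sigma^2 \, (X^\top W^{-1} X)^{-1}, \\
\text{CI for } \beta_j &: \quad \hat\beta_{\mathrm{GLS},j} \pm t_{n-p,\,1-\alpha/2}
  \sqrt{\widehat{\mathrm{Var}}(\hat\beta_{\mathrm{GLS}})_{jj}} .
\end{align*}
```

Equivalently, with the symmetric square root $S = W^{1/2}$, an ordinary least squares fit of $S^{-1} y$ on $S^{-1} X$ (without an additional intercept) gives the same estimates, since $\mathrm{Var}(S^{-1}\varepsilon) = \sigma^2 I$. This is where the solve and sqrtm functions come in.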
2. (a) Perform a Box-Cox transformation analysis for the linear regression model in the
previous question (that is, for the prostate data). Which transformation of the response
would you use, if any?
(b) Include squares of all the covariates in the model. Are any of the square terms
significant?
(c) Now perform the Box-Cox transformation for the model in part (b) and compare it
with the transformation from part (a).
(d) Out of all these models with or without transformations on the response and/or the
covariates, which one would you consider to be the best and why?
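As a reminder, the Box-Cox family referred to in this question is usually written as below. This is the textbook form, given here as a sketch; the symbols $\lambda$, $g_\lambda$, and $\mathrm{RSS}_\lambda$ are notation assumed for illustration, and the family is defined only for a strictly positive response.

```latex
\[
g_\lambda(y) =
\begin{cases}
\dfrac{y^\lambda - 1}{\lambda}, & \lambda \neq 0, \\[1.5ex]
\log y, & \lambda = 0,
\end{cases}
\qquad y > 0,
\]
\[
L(\lambda) = -\frac{n}{2} \log\!\left(\frac{\mathrm{RSS}_\lambda}{n}\right)
  + (\lambda - 1) \sum_{i=1}^{n} \log y_i ,
\]
```

Here $\mathrm{RSS}_\lambda$ is the residual sum of squares from regressing $g_\lambda(y)$ on the covariates. The value of $\lambda$ maximizing the profile log-likelihood $L(\lambda)$, together with its confidence interval, suggests the transformation. For example, an interval containing 1 gives little reason to transform.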
3. Consider the corn yield data discussed in the lecture. Perform the following analyses:
(a) Fit a regression model with yield as the response and time and rain as covariates.
(b) Now, include quadratic terms for both the predictors. Compare these two models using
an F statistic and decide which one is preferred.
(c) Following the code presented in the lecture slides, perform a spline regression analysis
for the regression model with yield as response, time and rain as covariates (splines
should be included for both these covariates).
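The F statistic for comparing two nested linear models, as in part (b), can be sketched as follows. The subscripts are assumed notation: "small" denotes the model without quadratic terms, "big" the model with them, and $\mathrm{df}$ the residual degrees of freedom of each fit.

```latex
\[
F = \frac{(\mathrm{RSS}_{\mathrm{small}} - \mathrm{RSS}_{\mathrm{big}})
          / (\mathrm{df}_{\mathrm{small}} - \mathrm{df}_{\mathrm{big}})}
         {\mathrm{RSS}_{\mathrm{big}} / \mathrm{df}_{\mathrm{big}}}
\;\sim\; F_{\,\mathrm{df}_{\mathrm{small}} - \mathrm{df}_{\mathrm{big}},\; \mathrm{df}_{\mathrm{big}}}
\quad \text{under the smaller model.}
\]
```

A large value of $F$ (small p-value) favors the larger model. Otherwise the simpler model is preferred.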
4. Using the sat data, fit a model with total as the response and takers, salary and expend as
predictors using the following methods:
(a) Ordinary least squares.
(b) Huber’s robust regression.
(c) Least absolute deviation method (find an R command that can be used for doing this).
Compare the results. In each case, comment on the significance of the
predictors.
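The three methods differ only in the loss applied to the residuals $r_i = y_i - x_i^\top \beta$. A sketch of the standard objective functions follows. The notation, including the tuning constant $c$, is assumed for illustration, and in practice residuals are scaled by a robust estimate of $\sigma$ before the Huber loss is applied.

```latex
\begin{align*}
\text{OLS:} \quad & \min_\beta \sum_{i=1}^{n} r_i^2, \\
\text{LAD:} \quad & \min_\beta \sum_{i=1}^{n} |r_i|, \\
\text{Huber:} \quad & \min_\beta \sum_{i=1}^{n} \rho_c(r_i), \qquad
\rho_c(r) =
\begin{cases}
r^2 / 2, & |r| \le c, \\
c\,|r| - c^2 / 2, & |r| > c.
\end{cases}
\end{align*}
```

The Huber loss behaves like least squares for small residuals and like least absolute deviations for large ones, which is why its estimates often fall between the other two when outliers are present.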
5. Given a dataset with response Y and covariates X, provide a flowchart of the key steps you
would follow to perform a comprehensive linear regression analysis of this dataset to
understand the relationship between the covariates and the response.
6. Using the seatpos data, fit the linear regression of hipcenter on all of the other variables.
(a) Produce a summary of the regression results.
(b) Do any variables appear to be significant based on the individual t-tests for their
coefficients? What about based on the overall F-test (for all of the variables together)?
(c) Compute the variance inflation factors (VIFs) for the variables. Using the threshold of
10 to determine if a VIF indicates a problem of collinearity, which variables have a VIF
indicating a possible problem?
(d) Reduce the model by removing all variables that had VIFs you identified as
problematic in the previous part. Produce a summary of the regression results.
(e) For the model of the previous part, do any variables appear to be significant based on
the individual t-tests for their coefficients? What about based on the overall F-test (for
all of the variables together)?
(f) Compute the VIFs for the reduced set of variables. (Have they changed?) Again using
the threshold of 10, which variables have a VIF indicating a possible problem?
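For reference, the variance inflation factor used in parts (c) and (f) has the standard definition below. $R_j^2$ is assumed notation for the coefficient of determination obtained by regressing the $j$-th predictor on all the other predictors in the model.

```latex
\[
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}, \qquad j = 1, \dots, p .
\]
```

With the threshold of 10, a predictor is flagged when $R_j^2 > 0.9$, that is, when more than 90% of its variation is explained by the other predictors. Because $R_j^2$ depends on which other predictors are present, the VIFs generally change when variables are removed.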