STAT 527, Fall 2020
Homework 3 (Posted on Friday Oct. 02; Due Friday Oct. 16)

Please submit your assignment following the Guidelines for Homework posted on the course website. (Even if correct, answers might not receive credit if they are too difficult to read.) Remember to include relevant computer output.

1. (a) Using the prostate data set, fit a linear regression model with lpsa as the response and all of the other variables as predictors.
   (b) Fit a generalized least squares model (provide estimates of $\beta$ and $\sigma^2$) that assumes the following covariance matrix for the errors: $\operatorname{Var}(\varepsilon) = \sigma^2 W$, where

   $$
   W = \begin{pmatrix}
   1 & 0.5 & 0 & \cdots & 0 \\
   0.5 & 1 & 0.5 & \cdots & 0 \\
   0 & 0.5 & 1 & \cdots & 0 \\
   \vdots & \vdots & \vdots & \ddots & \vdots \\
   0 & 0 & \cdots & 1 & 0.5 \\
   0 & 0 & \cdots & 0.5 & 1
   \end{pmatrix}.
   $$

   That is, the correlation between adjacent errors is 0.5, and all other pairs of errors are uncorrelated. To find the inverse of $W$, you can use the solve function in R, and to find the square root of a matrix you can use the sqrtm function from the expm package.
   (c) Provide confidence intervals for all the coefficient parameters based on the generalized least squares model from part (b).

2. (a) Perform a Box-Cox transformation analysis for the linear regression model in the previous question (that is, for the prostate data). Which transformation of the response would you use, if any?
   (b) Include squares of all the covariates in the model. Are any of the square terms significant?
   (c) Now perform the Box-Cox transformation analysis for the model in part (b) and compare it with the transformation from part (a).
   (d) Out of all these models, with or without transformations on the response and/or the covariates, which one would you consider to be the best, and why?

3. Consider the corn yield data discussed in the lecture. Perform the following analysis:
   (a) Fit a regression model with yield as the response and time and rain as covariates.
   (b) Now include quadratic terms for both predictors. Compare these two models using an F statistic and decide which one is preferred.
   (c) Following the code presented in the lecture slides, perform a spline regression analysis for the regression model with yield as the response and time and rain as covariates (splines should be included for both of these covariates).

4. Using the sat data, fit a model with total as the response and takers, salary and expend as predictors using the following methods:
   (a) Ordinary least squares
   (b) Huber's robust regression
   (c) Least absolute deviation method (find an R command that can be used for doing this).
   Compare the results. In each case, comment on the significance of the predictors.

5. Given a dataset with response Y and covariates X, provide a flowchart of the key steps you would follow to perform a comprehensive linear regression analysis of this dataset to understand the relationship between the covariates and the response.

6. Using the seatpos data, fit the linear regression of hipcenter on all of the other variables.
   (a) Produce a summary of the regression results.
   (b) Do any variables appear to be significant based on the individual t-tests for their coefficients? What about based on the overall F-test (for all of the variables together)?
   (c) Compute the variance inflation factors (VIFs) for the variables. Using the threshold of 10 to determine whether a VIF indicates a collinearity problem, which variables have a VIF indicating a possible problem?
   (d) Reduce the model by removing all variables whose VIFs you identified as problematic in the previous part. Produce a summary of the regression results.
   (e) For the model of the previous part, do any variables appear to be significant based on the individual t-tests for their coefficients? What about based on the overall F-test (for all of the variables together)?
   (f) Compute the VIFs for the reduced set of variables. (Have they changed?) Again using the threshold of 10, which variables have a VIF indicating a possible problem?
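
Note on Problem 1(b). The hint about solve and sqrtm can be read against the standard generalized least squares reduction, sketched below. This is a reminder under stated assumptions rather than the required solution method: the symbols $S$, $n$, $p$ and the model form $y = X\beta + \varepsilon$ are notation introduced here for illustration, and $W$ is assumed known and positive definite.

```latex
% Assumed notation (not from the assignment text):
%   y = X\beta + \varepsilon, with X an n x p matrix of full column rank,
%   S = W^{1/2} the symmetric square root of W, so that S S = W.
\begin{align*}
\operatorname{Var}(S^{-1}\varepsilon)
  &= S^{-1}\,(\sigma^2 W)\,S^{-1} = \sigma^2 I, \\
\hat\beta
  &= (X^\top W^{-1} X)^{-1} X^\top W^{-1} y, \\
\hat\sigma^2
  &= \frac{(y - X\hat\beta)^\top W^{-1} (y - X\hat\beta)}{n - p}, \\
\widehat{\operatorname{Var}}(\hat\beta)
  &= \hat\sigma^2\,(X^\top W^{-1} X)^{-1}.
\end{align*}
```

The first line shows why a matrix square root is useful: regressing $S^{-1}y$ on $S^{-1}X$ by ordinary least squares gives the same $\hat\beta$ as the second line, which needs only $W^{-1}$.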