Student number

Semester 1 Assessment, 2022
School of Mathematics and Statistics
MAST30025 Linear Statistical Models
Assignment 2
Submission deadline: Friday April 29, 5pm

This assignment consists of 3 pages (including this page) with 5 questions and 40 total marks

Instructions to Students

Writing
This assignment is worth 7% of your total mark.
You may choose to either typeset your assignment in LaTeX, or handwrite and scan it to produce an electronic version.
You may use R for this assignment, including the lm function unless otherwise specified. If you do, include your R commands and output.
Write your answers on A4 paper. Page 1 should only have your student number, the subject code and the subject name. Write on one side of each sheet only. Each question should be on a new page. The question number must be written at the top of each page.

Scanning and Submitting
Put the pages in question order and all the same way up. Use a scanning app to scan all pages to PDF. Scan directly from above. Crop pages to A4.
Submit your scanned assignment as a single PDF file and carefully review the submission in Gradescope. Scan again and resubmit if necessary.

©University of Melbourne 2022
Can be placed in Baillieu Library

Question 1 (5 marks)
Consider a general full rank linear model y = Xβ + ε with p > 2 parameters. Derive an expression for a joint 100(1 − α)% confidence region for parameters β_i and β_j, where i and j are arbitrary.

Question 2 (11 marks)
(11 marks does not include the bonus part (f).)
An experiment is conducted to estimate the annual demand for cars, based on their cost, the current unemployment rate, and the current interest rate.
A survey is conducted and the following measurements obtained:

Cars sold (×10^3)   Cost ($k)   Unemployment rate (%)   Interest rate (%)
       5.5             7.2               8.7                   5.5
       5.9            10.0               9.4                   4.4
       6.5             9.0              10.0                   4.0
       5.9             5.5               9.0                   7.0
       8.0             9.0              12.0                   5.0
       9.0             9.8              11.0                   6.2
      10.0            14.5              12.0                   5.8
      10.8             8.0              13.7                   3.9

For this question, you may not use the lm function in R.
(a) Fit a linear model to the data, and estimate the parameters and error variance.
(b) Calculate 95% confidence intervals for the model parameters.
(c) In a year with 8% unemployment rate and 3.5% interest rate, we price a car at $12,000 and observe that 7,000 cars are sold. Is this an atypical year (according to your model)?
(d) Using your answer from question 1, find and draw a joint 95% confidence region for the parameters corresponding to unemployment rate and interest rate. Superimpose a rectangle corresponding to the confidence intervals found in (b).
(e) Do you expect the confidence region to be larger or smaller than the rectangle? Justify your answer.
(f) (Bonus) What is the probability that the true parameters for unemployment rate and interest rate (jointly) lie in the rectangle you drew in (d)?

Question 3 (7 marks)
Consider a full rank linear model y = Xβ + ε. Derive a formula for a 100(1 − α)% prediction interval for the sum of the responses of two independent future observations y_1 and y_2, with predictors x_1 and x_2 respectively.

Question 4 (12 marks)
For this question we use the data set bike.csv (available on the LMS). This data set records counts of public bikes rented in an hour with the corresponding weather information. The variables are:
count = the number of bikes rented in an hour
temp = temperature (in Celsius)
hum = relative humidity
wind = windspeed (in m/s)
visi = visibility (in metres)
dew = dew point temperature (in Celsius)
solar = solar radiation (in MJ/m^2)
(a) Fit a linear model using all of the variables.
(b) Test for model relevance, using a corrected sum of squares.
(c) Use forward selection with F tests to select variables for your model.
(d) Starting from a null model, use stepwise selection with AIC to select variables for your model. Use this as your final model; comment briefly on the variables included.
(e) Using the full model, test whether the temperature and dew point temperature have the same effect on the number of bikes rented.
(f) Comment on the suitability of your final model, using diagnostic plots.

Question 5 (5 marks)
Suppose that we have a response variable y which is known to have a quadratic relationship with a predictor variable x. Explain all of the differences between fitting a linear model of y against x and x^2, versus a linear model of √y against x. Which would you use for each of the two datasets shown below?

[Figure: two scatter plots, y1 against x and y2 against x. Axis ticks: x from 2 to 12 in both plots; y1 from 0 to 40; y2 from 0 to 35.]

End of Assignment. Total Available Marks = 40