STATS 201/208 Page 1 of 36 STATS 201/208 Page 2 of 36 STATS 201/208 Page 3 of 36 STATS 201/208 1. This question refers to the Attack Data in Appendix A. [Total 10 marks] a. Comment on what the scatter plot at the start of Appendix A reveals. [2 marks] b. Answer the lecturer’s first question about how the game deals with random- ness. Justify your answer. Note: you do not need to answer this question again later in the Executive Summary. [2 marks] c. Give a brief Executive Summary of the main conclusions of the analysis of the Attack Data in Appendix A. [6 marks] 2. This question refers to the Triplefin Respiration Rate Data in Appendix B. [Total 18 marks] a. Comment why it is appropriate to log the response (Rate) for this analysis. [1 mark] b. Comment on what the Second interaction plot (for the logged data) reveals. [2 marks] c. Write brief Methods and Assumption Checks of the analysis of the Triplefin Respiration Rate Data in Appendix B. [7 marks] d. Give a brief Executive Summary of the main conclusions of the Triplefin Respiration Rate Data in Appendix B. Remember: focus on the questions the UoA researchers were interested in. [8 marks] Page 4 of 36 STATS 201/208 3. This question refers to the Glass Fragments Data in Appendix C. [Total 20 marks] a. Consider the plots of Count and log(Count) versus Velocity. Comment on what they reveal about the nature of the relationship between the number of recovered glass fragments and the different projectile hardnesses. Also comment on whether there is anything that stands out about the projectile velocities. [3 marks] b. Two GLM models have been fitted to the data, one specifying family=poisson and the other specifying family=quasipoisson. Which Poisson assumption is violated that justified the use of the quasi-Poisson correction? What evidence is there in the analyses to support the use of the quasi-Poisson correction? [2 marks] c. Write brief Methods and Assumption Checks notes of the analysis of the Glass Fragments Data in Appendix C. [4 marks] d. Consider the variable HHRH55 in this model (before the baseline is changed). What does the coefficient of this variable estimate? And hence, explain why we would not directly interpret this in an Executive Summary. [3 marks] e. Give a brief Executive Summary of the main conclusions of the analysis of the Glass Fragments Data in Appendix C, making sure to address the specific interests in investigating this data set. [8 marks] 4. This question refers to the Titanic Data in Appendix D. [Total 10 marks] a. Comment on both the bar charts at the start of the analysis of the Titanic Data. Focus your comments on British and American passengers as that is what the newspaper article focused on. [3 marks] b. Discuss the reasoning behind the step in changing between models titanic.gfit and titanic.gfit1. [1 mark] c. Which of the factors, Residence or PClass, has the strongest evidence of influencing the likelihood of passengers survival? [1 mark] d. Can the difference in the odds of survival between Americans and Britons be solely explained by differences in passenger class? Justify your answer. [2 marks] e. Write a sentence (as if for an Executive Summary) quantifying the difference in odds of surviving for American and British passengers. [3 marks] Page 5 of 36 STATS 201/208 5. This question refers to the New Zealand Visitors Data in Appendix E. [Total 12 marks] a. Consider the time series plots of Visitors and log(Visitors) for January 2011 to December 2019 at the start of Appendix E, comment on what they reveal. [2 marks] b. Look at the acf plot for Visitors.fit1. How does it provide evidence of autocorrelation in the Residual Series? How is this confirmed later in the model fitting process? [2 marks] c. Does the Residual Series plot for Visitors.fit2 look like White Noise? Explain your answer. [1 mark] d. Using model Visitors.fit2, write an equation to calculate the estimated log(visitors) in January, 2020. This is the estimated equation, not the model equation, so use the estimated values for the coefficients and substitute appropriate values for variables. You DO NOT need to calculate the final value. [3 marks] e. Using the Holt-Winters model, provide a 95% prediction interval for the number of visitors in July, 2020. [1 mark] f. Based on the previous patterns in the series, do the predictions look reason- able? [1 mark] g. Can we rely on the 2020 predictions? Explain your answer. [2 marks] APPENDICES BOOKLET FOLLOWS Page 6 of 36 STATS 201/208 APPENDICES BOOKLET CONTENTS Appendix Name Pages A Attack Data 8–11 B Triplefin Respiration Rate Data 12–17 C Glass Fragments Data 18–24 D Titanic Data 25–29 E New Zealand Visitors Data 30–36 Page 7 of 36 STATS 201/208 Appendix A Attack Data A lecturer was playing a computer game where they sent a number of troops to attack a fort. It would take quite a few attacks over time to destroy a fort so the lecturer was interested in how the game calculated damage. He noticed that if the identical strength of troops were sent on the attack on multiple occasions, different amounts of damage were done, so there was a random element to the damage. A minimum of 100 points of troop strength had to be sent for the attack, otherwise it failed. The lecturer decided to collect some data on this and so carried out a series of attacks where he randomly allocated a strength of troops for the attack (between 100 and 500) and then recorded the damage. The variables measured for each attack are: Strength Troop strength for the attack. Damage number of points of damage inflicted on the fort. The lecturer was interested in several questions: How does the game deal with randomness? Is there a single sized random adjustment added to the attack or does the size of the random adjustment depend on the size of the strength of the attack force. On average, how much damage would an attack force of strength 100 be expected to do? On average, how much additional damage would each additional 100 points of strength in an attack force add? Page 8 of 36 STATS 201/208 > plot(Damage ~ Strength, data = Attack.df) 100 200 300 400 500 20 0 40 0 60 0 80 0 Strength D am ag e > fit1=lm(Damage~Strength,data=Attack.df) > plot(fit1,which=1) 200 400 600 800 − 20 − 10 0 10 20 Fitted values R es id ua ls Residuals vs Fitted 70 92 98 Page 9 of 36 STATS 201/208 > normcheck(fit1, main = "") Theoretical Quantiles Sa m pl e Qu an tile s Residuals from lm(Damage ~ Strength) > cooks20x(fit1) observation number 0 20 40 60 80 1000 .0 0 0. 02 0. 04 0. 06 Cook's Distance plot Co ok 's di st an ce 92 82 13 Page 10 of 36 STATS 201/208 > summary(fit1) Call: lm(formula = Damage ~ Strength, data = Attack.df) Residuals: Min 1Q Median 3Q Max -17.8525 -5.6600 0.1072 5.2280 16.9520 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -1.518e+02 1.912e+00 -79.38 <2e-16 *** Strength 2.006e+00 6.029e-03 332.68 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 7.731 on 98 degrees of freedom Multiple R-squared: 0.9991,Adjusted R-squared: 0.9991 F-statistic: 1.107e+05 on 1 and 98 DF, p-value: < 2.2e-16 > confint(fit1) 2.5 % 97.5 % (Intercept) -155.550376 -147.962755 Strength 1.993782 2.017711 > preddata = data.frame(Strength = 100) > predict(fit1, preddata, interval = "confidence") fit lwr upr 1 48.81811 46.07553 51.56069 Page 11 of 36 STATS 201/208 Appendix B Triplefin Respiration Rate Data Intertidal fishes inhabit the narrow body of water between the high and low tide marks along rocky coastlines, many of them remaining in the rock pools that form at low tide. The temperatures in these rock pools can rise during the low tide cycle leading to a reduction in the amount of oxygen dissolvable in the seawater. Critically high temperatures can result in a mismatch occuring between oxygen supply and demand, leading to hypoxia (low oxygen) and, eventually, death. A small group of researchers from The University of Auckland (UoA) studied two closely related species of New Zealand triplefin fishes, namely Bellapiscis medius (BM), inhabiting intertidal rock pools, and B. lesleyae (BL), inhabiting low intertidal and subtidal environments. Specimens of each species were caught at the same location and held in tanks with the water temperature set at 15◦C for one month. Individual fishes were then randomly selected and put into a new tank in which the water temperature was set at 10, 15, 20 or 25◦C. The target temperature of the water was reached after approximately 2 hours. The fish remained in the tank for a further 30 minutes and then had its respiration rate measured. The variables in this study are as follows: Species a factor with two levels indicating the species of New Zealand triplefin fishes, either BM (B. medius) or BL (B. lesleyae). Note: Treat BM as intertidal and BL as subtidal. Temp a factor with four levels indicating the seawater temperature in the tank the fish was placed in, one of 10, 15, 20 or 25 ◦C. Rate the respiration rate, in metric tons. Respiration rates typically increase as dissolved oxygen concentration decreases, so higher respiration rates would be indicative of greater tolerance to an increase in water temperature. The UoA researchers were interested in whether differences in tolerance due to wa- ter temperature depend of whether fish are intertidal or subtidal. What were any differences in tolerances due to water temperature between intertidal or subtidal fishes? In particular, is there any evidence that intertidal fishes have greater tolerance to higher water temperatures than subtidal fishes? > rate.df <- transform(rate.df, + logRate = log(Rate), + Temp = factor(rate.df$Temp)) Page 12 of 36 STATS 201/208 > interactionPlots(Rate ~ Temp + Species, data = rate.df) Plot of 'Rate' by levels of 'Temp' and 'Species' Temp R at e 0. 2 0. 4 0. 6 0. 8 10 15 20 25 BM BL > rate.fit = lm(Rate ~ Temp * Species, data = rate.df) > plot(rate.fit , which = 1) 0.2 0.3 0.4 0.5 − 0. 2 0. 0 0. 1 0. 2 0. 3 Fitted values R es id ua ls Residuals vs Fitted 84 68 92 Page 13 of 36 STATS 201/208 > interactionPlots(logRate ~ Temp + Species, data = rate.df) Plot of 'logRate' by levels of 'Temp' and 'Species' Temp lo gR at e − 2. 5 − 2. 0 − 1. 5 − 1. 0 − 0. 5 0. 0 10 15 20 25 BM BL > lograte.fit = lm(logRate ~ Temp * Species, data = rate.df) > plot(lograte.fit , which = 1) −2.0 −1.8 −1.6 −1.4 −1.2 −1.0 −0.8 − 0. 4 0. 0 0. 2 0. 4 Fitted values R es id ua ls Residuals vs Fitted 68 84 14 Page 14 of 36 STATS 201/208 > normcheck(lograte.fit , main = "") Theoretical Quantiles Sa m pl e Qu an tile s Residuals from lm(logRate ~ Temp * Species) > cooks20x(lograte.fit) observation number 0 20 40 60 800 .0 0 0. 02 0. 04 0. 06 Cook's Distance plot Co ok 's di st an ce 14 68 16 Page 15 of 36 STATS 201/208 > anova(lograte.fit) Analysis of Variance Table Response: logRate Df Sum Sq Mean Sq F value Pr(>F) Temp 3 23.5635 7.8545 218.2875 < 2.2e-16 *** Species 1 0.0228 0.0228 0.6332 0.4283 Temp:Species 3 1.7573 0.5858 16.2796 1.755e-08 *** Residuals 87 3.1305 0.0360 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 > summary(lograte.fit) Call: lm(formula = logRate ~ Temp * Species, data = rate.df) Residuals: Min 1Q Median 3Q Max -0.39742 -0.11028 0.00239 0.13441 0.43660 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -1.88840 0.06707 -28.157 < 2e-16 *** Temp15 0.25515 0.08524 2.993 0.00359 ** Temp20 0.58336 0.08524 6.844 1.03e-09 *** Temp25 1.22617 0.08524 14.385 < 2e-16 *** SpeciesBM 0.05775 0.08998 0.642 0.52265 Temp15:SpeciesBM -0.38569 0.11676 -3.303 0.00139 ** Temp20:SpeciesBM 0.35559 0.11774 3.020 0.00332 ** Temp25:SpeciesBM -0.05311 0.11676 -0.455 0.65035 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.1897 on 87 degrees of freedom Multiple R-squared: 0.8901,Adjusted R-squared: 0.8812 F-statistic: 100.6 on 7 and 87 DF, p-value: < 2.2e-16 Page 16 of 36 STATS 201/208 > summary2way(lograte.fit, page = "interaction") Tukey multiple comparisons of means 95% family-wise confidence level Fit: aov(formula = fit) $`Comparisons within Temp` diff lwr upr p adj 10:BM-10:BL 0.057753100 -0.2217129 0.33721909 0.9981469 15:BM-15:BL -0.327933863 -0.5590238 -0.09684393 0.0007549 20:BM-20:BL 0.413347758 0.1774926 0.64920293 0.0000129 25:BM-25:BL 0.004646926 -0.2264430 0.23573686 1.0000000 $`Comparisons between Temp` diff lwr upr p adj 15:BL-10:BL 0.2551468 -0.009599945 0.5198936 0.0672285 20:BL-10:BL 0.5833588 0.318611986 0.8481055 0.0000000 25:BL-10:BL 1.2261675 0.961420688 1.4909142 0.0000000 20:BL-15:BL 0.3282119 0.097121997 0.5593019 0.0007446 25:BL-15:BL 0.9710206 0.739930699 1.2021106 0.0000000 25:BL-20:BL 0.6428087 0.411718768 0.8738986 0.0000000 15:BM-10:BM -0.1305401 -0.378356475 0.1172762 0.7272595 20:BM-10:BM 0.9389534 0.686687595 1.1912192 0.0000000 25:BM-10:BM 1.1730613 0.925244946 1.4208776 0.0000000 20:BM-15:BM 1.0694936 0.833638376 1.3053487 0.0000000 25:BM-15:BM 1.3036014 1.072511488 1.5346914 0.0000000 25:BM-20:BM 0.2341079 -0.001747306 0.4699630 0.0531724 > exp(c(-0.5590238, -0.09684393)) [1] 0.5717670 0.9076977 > 100 * (exp(c(-0.5590238, -0.09684393)) - 1) [1] -42.823305 -9.230234 > exp(c(0.1774926, 0.64920293)) [1] 1.194219 1.914015 > 100 * (exp(c(0.1774926, 0.64920293)) - 1) [1] 19.42192 91.40146 Page 17 of 36 STATS 201/208 Appendix C Glass Fragments Data Wong (2007) was interested in modeling the number of glass fragments on the ground that were recovered after a window pane was shot with a handgun. In particular, he ex- plored the hypothesis that the expected number of fragments was altered by the velocity, hardness, and shape (profile) of the projectile – the projectile being the lead part of a bullet. In the analyses below attention is restricted to only the velocity (V) and hardness (H) of the projectile. The variables are: Count the number of recovered glass fragments V the velocity of the projectile (in metres per second) H a factor with two levels indicating the hardness of the projectile, one of HRH40 (soft) or HRH55 (hard) Our interest here lies in understanding whether the nature of the relationship between the expected number of recovered glass fragments and projectile velocity depends on a projectile’s hardness. More specifically, we are interested in whether the percentage increase in the expected number of recovered glass fragments with increased velocity depends on the projectile’s hardness. Furthermore, we are interested in estimating any such percentage increases. Page 18 of 36 STATS 201/208 > plot(Count ~ V, pch = c(1:2)[glass.df$H], data = glass.df) 600 800 1000 1200 1400 0 50 00 10 00 0 15 00 0 20 00 0 V Co un t Hardness HRH40 HRH55 > plot(log(Count) ~ V, pch = c(1:2)[glass.df$H], data = glass.df) 600 800 1000 1200 1400 5 6 7 8 9 10 V lo g(C ou nt) Hardness HRH40 HRH55 Page 19 of 36 STATS 201/208 > glass.gfit <- glm(Count ~ H*V, family = poisson, data = glass.df) > plot(glass.gfit, which = 1, lty = 3) > summary(glass.gfit) Call: glm(formula = Count ~ H * V, family = poisson, data = glass.df) Deviance Residuals: Min 1Q Median 3Q Max -61.275 -11.044 -4.205 4.552 74.495 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 2.606e+00 2.821e-02 92.39 <2e-16 *** HHRH55 1.296e+00 3.718e-02 34.86 <2e-16 *** V 5.425e-03 2.412e-05 224.91 <2e-16 *** HHRH55:V -1.787e-03 3.125e-05 -57.19 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 159051 on 35 degrees of freedom Residual deviance: 21880 on 32 degrees of freedom AIC: 22211 Number of Fisher Scoring iterations: 4 > 1 - pchisq(21880, 32) [1] 0 Page 20 of 36 STATS 201/208 > glass.quasigfit <- glm(Count ~ H * V, family = quasipoisson, data = glass.df) > summary(glass.quasigfit) Call: glm(formula = Count ~ H * V, family = quasipoisson, data = glass.df) Deviance Residuals: Min 1Q Median 3Q Max -61.275 -11.044 -4.205 4.552 74.495 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 2.6059085 0.7725498 3.373 0.00196 ** HHRH55 1.2961129 1.0184172 1.273 0.21230 V 0.0054254 0.0006607 8.212 2.22e-09 *** HHRH55:V -0.0017872 0.0008559 -2.088 0.04483 * --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for quasipoisson family taken to be 750.176) Null deviance: 159051 on 35 degrees of freedom Residual deviance: 21880 on 32 degrees of freedom AIC: NA Number of Fisher Scoring iterations: 4 Page 21 of 36 STATS 201/208 > confint(glass.quasigfit) 2.5 % 97.5 % (Intercept) 0.972690034 4.0122136166 HHRH55 -0.683428951 3.3350412572 V 0.004207561 0.0068068053 HHRH55:V -0.003503468 -0.0001332541 > exp(confint(glass.quasigfit)) 2.5 % 97.5 % (Intercept) 2.6450502 55.2690798 HHRH55 0.5048828 28.0795417 V 1.0042164 1.0068300 HHRH55:V 0.9965027 0.9998668 > exp(confint(glass.quasigfit)[3, ]) 2.5 % 97.5 % 1.004216 1.006830 > exp(100 * confint(glass.quasigfit)[3, ]) 2.5 % 97.5 % 1.523113 1.975221 > 100 * (exp(100 * confint(glass.quasigfit)[3, ]) - 1) 2.5 % 97.5 % 52.31128 97.52215 Page 22 of 36 STATS 201/208 Changing the baseline level of H to HRH55 > glass.df$H <- relevel(glass.df$H, ref = "HRH55") > glass.quasigfit <- glm(Count ~ H * V, family = quasipoisson, data = glass.df) > summary(glass.quasigfit) Call: glm(formula = Count ~ H * V, family = quasipoisson, data = glass.df) Deviance Residuals: Min 1Q Median 3Q Max -61.275 -11.044 -4.205 4.552 74.495 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 3.9020214 0.6635815 5.880 1.54e-06 *** HHRH40 -1.2961129 1.0184172 -1.273 0.2123 V 0.0036382 0.0005441 6.687 1.51e-07 *** HHRH40:V 0.0017872 0.0008559 2.088 0.0448 * --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for quasipoisson family taken to be 750.176) Null deviance: 159051 on 35 degrees of freedom Residual deviance: 21880 on 32 degrees of freedom AIC: NA Number of Fisher Scoring iterations: 4 Page 23 of 36 STATS 201/208 > confint(glass.quasigfit) 2.5 % 97.5 % (Intercept) 2.5131267957 5.123678079 HHRH40 -3.3350412572 0.683428951 V 0.0026114910 0.004751810 HHRH40:V 0.0001332541 0.003503468 > exp(confint(glass.quasigfit)) 2.5 % 97.5 % (Intercept) 12.34346528 167.951976 HHRH40 0.03561312 1.980658 V 1.00261490 1.004763 HHRH40:V 1.00013326 1.003510 > exp(confint(glass.quasigfit)[3, ]) 2.5 % 97.5 % 1.002615 1.004763 > exp(100 * confint(glass.quasigfit)[3, ]) 2.5 % 97.5 % 1.298421 1.608305 > 100 * (exp(100 * confint(glass.quasigfit)[3, ]) - 1) 2.5 % 97.5 % 29.84212 60.83053 Page 24 of 36 STATS 201/208 Appendix D Titanic Data Over 2200 passengers and crew were on board the luxury steamship RMS Titanic when it set sail on its maiden voyage from Southampton to New York City. More than 1,500 men, women and children lost their lives when, in the early hours of 15 April, 1912, the ship collided with an iceberg and began taking on water! Data obtained on the passengers and crew on board the Titanic have led to numerous articles, including one which appeared in the Independent newspaper in 2009 almost a century after the disaster with the headline More Britons than Americans died on Titanic ‘because they queued’. Data on the passengers of the Titanic was analysed. The variables (of interest to us) in the data set are: survived A binary variable which takes the value 1 if a passenger survived or 0 if he or she died. PClass A factor variable indicating a passenger’s ticket class 1st, 2nd or 3rd. Residence A factor variable indicating a passenger’s country of residence, namely American, British or Other. In the following analysis, we are interested in whether or not the 2009 headline is mis- leading. Specifically, did Britons have a lower odds of surviving than Americans? Or was it, as others have speculated, that most Americans were in 1st class which was the main factor influencing survival? Plots of the numbers of passengers and survival rates of passengers by combination of Residence and PClass have been created and are on the next page. The R-code has been hidden to save space. Page 25 of 36 STATS 201/208 1st 2nd 3rd American British Other Number of Passengers by combination of Residence and Class Passenger class N um be r o f P a ss e n ge rs 0 10 0 20 0 30 0 40 0 50 0 60 0 1st 2nd 3rd American British Other Percentage Survived by combination of Residence and Class Passenger class Su rv ive d (% ) 0 20 40 60 80 10 0 Page 26 of 36 STATS 201/208 > titanic.gfit <- glm(survived ~ PClass * Residence, family = binomial, + data = titanic.df) > anova(titanic.gfit, test = "Chisq") Analysis of Deviance Table Model: binomial, link: logit Response: survived Terms added sequentially (first to last) Df Deviance Resid. Df Resid. Dev Pr(>Chi) NULL 1308 1741.0 PClass 2 127.765 1306 1613.3 < 2e-16 *** Residence 2 7.129 1304 1606.1 0.02831 * PClass:Residence 4 2.001 1300 1604.1 0.73567 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 > titanic.gfit1 <- glm(survived ~ PClass + Residence, family = binomial, + data = titanic.df) > anova(titanic.gfit1, test = "Chisq") Analysis of Deviance Table Model: binomial, link: logit Response: survived Terms added sequentially (first to last) Df Deviance Resid. Df Resid. Dev Pr(>Chi) NULL 1308 1741.0 PClass 2 127.765 1306 1613.3 < 2e-16 *** Residence 2 7.129 1304 1606.1 0.02831 * --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Page 27 of 36 STATS 201/208 > summary(titanic.gfit1) Call: glm(formula = survived ~ PClass + Residence, family = binomial, data = titanic.df) Deviance Residuals: Min 1Q Median 3Q Max -1.4210 -0.7889 -0.7889 0.9750 1.7985 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 0.55649 0.13815 4.028 5.62e-05 *** PClass2nd -0.60137 0.18444 -3.260 0.00111 ** PClass3rd -1.50428 0.16771 -8.970 < 2e-16 *** ResidenceBritish -0.44838 0.20271 -2.212 0.02697 * ResidenceOther -0.05988 0.17778 -0.337 0.73626 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 1741.0 on 1308 degrees of freedom Residual deviance: 1606.1 on 1304 degrees of freedom AIC: 1616.1 Number of Fisher Scoring iterations: 4 Page 28 of 36 STATS 201/208 > confint(titanic.gfit1) 2.5 % 97.5 % (Intercept) 0.2878516 0.83007470 PClass2nd -0.9644187 -0.24078947 PClass3rd -1.8361431 -1.17814378 ResidenceBritish -0.8463438 -0.05100489 ResidenceOther -0.4068087 0.29068337 > exp(confint(titanic.gfit1)) 2.5 % 97.5 % (Intercept) 1.3335593 2.2934901 PClass2nd 0.3812047 0.7860071 PClass3rd 0.1594312 0.3078496 ResidenceBritish 0.4289805 0.9502740 ResidenceOther 0.6657716 1.3373411 > 100 * (exp(confint(titanic.gfit1)) - 1) 2.5 % 97.5 % (Intercept) 33.35593 129.349007 PClass2nd -61.87953 -21.399291 PClass3rd -84.05688 -69.215036 ResidenceBritish -57.10195 -4.972597 ResidenceOther -33.42284 33.734107 Page 29 of 36 STATS 201/208 Appendix E New Zealand Visitors Data The following data contains the combined monthly figures for visitors to New Zealand from January 2011 to December 2019. The variable is: Visitors the monthly number of visitors to New Zealand. Note: Define Month as a factor for month of the year coded as 1 for January, 2 for February, etc. Define Time as a numeric variable with values 1 for January 2011, 2 for February 1980, ..., 108 for December 2019. > Time = 1:108 > Month = factor(visitors.df$Month[1:108]) Page 30 of 36 STATS 201/208 > Visitors.ts = ts(visitors.df$Visitors[1:108],start=2011,frequency=12) > plot(Visitors.ts,xlab="month",ylab="numbers of visitors", + main="NZ Visitors 2011 to 2019") NZ Visitors 2011 to 2019 month n u m be rs o f v isi to rs 2012 2014 2016 2018 2020 2e +0 5 4e +0 5 > log.Visitors.ts=ts(log(visitors.df$Visitors[1:108]),start=2011,frequency=12) > plot(log.Visitors.ts,xlab="month",ylab="(log)numbers of visitors", + main="(log) NZ Visitors 2011 to 2019") (log) NZ Visitors 2011 to 2019 month (lo g)n u m be rs o f v isi to rs 2012 2014 2016 2018 2020 11 .8 12 .2 12 .6 13 .0 Page 31 of 36 STATS 201/208 > Visitors.fit1 = lm(log(Visitors.ts) ~ Time + Month) > acf(residuals(Visitors.fit1)) 0 5 10 15 20 − 0. 2 0. 2 0. 4 0. 6 0. 8 1. 0 Lag AC F > Visitors.fit2 = lm(log(Visitors.ts[-1]) + ~Time[-1]+Month[-1]+log(Visitors.ts[-108])) > acf(residuals(Visitors.fit2)) 0 5 10 15 20 − 0. 2 0. 2 0. 4 0. 6 0. 8 1. 0 Lag AC F Page 32 of 36 STATS 201/208 > plot.ts(residuals(Visitors.fit2), main = "Residual Series") Residual Series Time re si du al s(V isi tor s.f it2 ) 0 20 40 60 80 100 − 0. 10 0. 00 0. 10 0. 20 > normcheck(Visitors.fit2, main = "") Theoretical Quantiles Sa m pl e Qu an tile s Residuals from lm(log(Visitors.ts[−1]) ~ Time[−1] + Month[−1] + log(Visitors.ts[−108])) Page 33 of 36 STATS 201/208 > summary(Visitors.fit2) Call: lm(formula = log(Visitors.ts[-1]) ~ Time[-1] + Month[-1] + log(Visitors.ts[-108])) Residuals: Min 1Q Median 3Q Max -0.096130 -0.027219 -0.005955 0.026706 0.196144 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 6.0440122 1.1388469 5.307 7.53e-07 *** Time[-1] 0.0025502 0.0004836 5.274 8.67e-07 *** Month[-1]2 0.1958563 0.0348421 5.621 1.97e-07 *** Month[-1]3 0.0531338 0.0320116 1.660 0.100318 Month[-1]4 -0.0949275 0.0400473 -2.370 0.019834 * Month[-1]5 -0.3101308 0.0561956 -5.519 3.07e-07 *** Month[-1]6 -0.1456389 0.0830669 -1.753 0.082850 . Month[-1]7 0.0180168 0.0826995 0.218 0.828017 Month[-1]8 -0.0891090 0.0685877 -1.299 0.197088 Month[-1]9 -0.0161417 0.0706131 -0.229 0.819687 Month[-1]10 0.0008575 0.0654930 0.013 0.989582 Month[-1]11 0.2143367 0.0615398 3.483 0.000758 *** Month[-1]12 0.4826646 0.0427035 11.303 < 2e-16 *** log(Visitors.ts[-108]) 0.5012654 0.0896925 5.589 2.27e-07 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.04886 on 93 degrees of freedom Multiple R-squared: 0.9788,Adjusted R-squared: 0.9758 F-statistic: 330.4 on 13 and 93 DF, p-value: < 2.2e-16 > log.Visitors.ts[108] [1] 13.17727 Page 34 of 36 STATS 201/208 > HW.fit = HoltWinters(Visitors.ts, seasonal = "multiplicative") > HW.fit Holt-Winters exponential smoothing with trend and multiplicative seasonal component. Call: HoltWinters(x = Visitors.ts, seasonal = "multiplicative") Smoothing parameters: alpha: 0.1096714 beta : 0.1615218 gamma: 0.5832997 Coefficients: [,1] a 3.207408e+05 b 3.129164e+01 s1 1.240819e+00 s2 1.314053e+00 s3 1.189316e+00 s4 9.472730e-01 s5 6.874044e-01 s6 6.778433e-01 s7 7.987935e-01 s8 7.787323e-01 s9 8.174633e-01 s10 8.851151e-01 s11 1.168538e+00 s12 1.649082e+00 > HW.Mult.pred = predict(HW.fit,n.ahead=12,prediction.interval=TRUE) > HW.Mult.pred fit upr lwr Jan 2020 398020.0 414650.9 381389.1 Feb 2020 421552.5 438561.6 404543.4 Mar 2020 381573.9 398877.1 364270.7 Apr 2020 303947.6 321332.5 286562.8 May 2020 220586.2 237893.4 203278.9 Jun 2020 217539.3 235299.8 199778.8 Jul 2020 256380.6 275336.8 237424.5 Aug 2020 249966.1 269568.2 230364.1 Sep 2020 262424.0 283244.4 241603.7 Oct 2020 284169.5 306785.1 261553.9 Nov 2020 375199.9 403209.9 347190.0 Dec 2020 529547.0 564448.3 494645.8 Page 35 of 36 STATS 201/208 > plot(HW.fit,HW.Mult.pred, + main="HW Multiplicative Model and Predictions for NZ Visitors in 2020") HW Multiplicative Model and Predictions for NZ Visitors in 2020 Time O bs er ve d / F itt ed 2012 2014 2016 2018 2020 2e +0 5 4e +0 5 Page 36 of 36
欢迎咨询51作业君