辅导案例-STATS 201/208

欢迎使用51辅导，51作业君孵化低价透明的学长辅导平台，服务保持优质，平均费用压低50%以上！ 51fudao.top

STATS 201/208
Page 1 of 36
STATS 201/208

Page 2 of 36
STATS 201/208

Page 3 of 36
STATS 201/208
1. This question refers to the Attack Data in Appendix A. [Total 10 marks]
a. Comment on what the scatter plot at the start of Appendix A reveals.
[2 marks]
b. Answer the lecturer’s first question about how the game deals with random-
ness. Justify your answer. Note: you do not need to answer this question
again later in the Executive Summary. [2 marks]
c. Give a brief Executive Summary of the main conclusions of the analysis of
the Attack Data in Appendix A. [6 marks]
2. This question refers to the Triplefin Respiration Rate Data in Appendix B.
[Total 18 marks]
a. Comment why it is appropriate to log the response (Rate) for this analysis.
[1 mark]
b. Comment on what the Second interaction plot (for the logged data) reveals.
[2 marks]
c. Write brief Methods and Assumption Checks of the analysis of the
Triplefin Respiration Rate Data in Appendix B. [7 marks]
d. Give a brief Executive Summary of the main conclusions of the Triplefin
Respiration Rate Data in Appendix B.
Remember: focus on the questions the UoA researchers were interested in.
[8 marks]
Page 4 of 36
STATS 201/208
3. This question refers to the Glass Fragments Data in Appendix C.
[Total 20 marks]
a. Consider the plots of Count and log(Count) versus Velocity. Comment on
what they reveal about the nature of the relationship between the number
of recovered glass fragments and the different projectile hardnesses. Also
comment on whether there is anything that stands out about the projectile
velocities. [3 marks]
b. Two GLM models have been fitted to the data, one specifying family=poisson
and the other specifying family=quasipoisson. Which Poisson assumption is
violated that justified the use of the quasi-Poisson correction? What evidence
is there in the analyses to support the use of the quasi-Poisson correction?
[2 marks]
c. Write brief Methods and Assumption Checks notes of the analysis of the
Glass Fragments Data in Appendix C. [4 marks]
d. Consider the variable HHRH55 in this model (before the baseline is changed).
What does the coefficient of this variable estimate? And hence, explain why
we would not directly interpret this in an Executive Summary. [3 marks]
e. Give a brief Executive Summary of the main conclusions of the
analysis of the Glass Fragments Data in Appendix C, making sure to
address the specific interests in investigating this data set. [8 marks]
4. This question refers to the Titanic Data in Appendix D. [Total 10 marks]
a. Comment on both the bar charts at the start of the analysis of the Titanic
Data. Focus your comments on British and American passengers as that is
what the newspaper article focused on. [3 marks]
b. Discuss the reasoning behind the step in changing between models
titanic.gfit and titanic.gfit1. [1 mark]
c. Which of the factors, Residence or PClass, has the strongest evidence of
influencing the likelihood of passengers survival? [1 mark]
d. Can the difference in the odds of survival between Americans and Britons be
solely explained by differences in passenger class? Justify your answer.
[2 marks]
e. Write a sentence (as if for an Executive Summary) quantifying the
difference in odds of surviving for American and British passengers. [3 marks]
Page 5 of 36
STATS 201/208
5. This question refers to the New Zealand Visitors Data in Appendix E.
[Total 12 marks]
a. Consider the time series plots of Visitors and log(Visitors) for January
2011 to December 2019 at the start of Appendix E, comment on what they
reveal. [2 marks]
b. Look at the acf plot for Visitors.fit1. How does it provide evidence of
autocorrelation in the Residual Series? How is this confirmed later in the
model fitting process? [2 marks]
c. Does the Residual Series plot for Visitors.fit2 look like White Noise?
Explain your answer. [1 mark]
d. Using model Visitors.fit2, write an equation to calculate the estimated
log(visitors) in January, 2020. This is the estimated equation, not the model
equation, so use the estimated values for the coefficients and substitute
appropriate values for variables. You DO NOT need to calculate the final
value. [3 marks]
e. Using the Holt-Winters model, provide a 95% prediction interval for the
number of visitors in July, 2020. [1 mark]
f. Based on the previous patterns in the series, do the predictions look reason-
able?
[1 mark]
g. Can we rely on the 2020 predictions? Explain your answer. [2 marks]
APPENDICES BOOKLET FOLLOWS
Page 6 of 36
STATS 201/208
APPENDICES BOOKLET
CONTENTS
Appendix Name Pages
A Attack Data 8–11
B Triplefin Respiration Rate Data 12–17
C Glass Fragments Data 18–24
D Titanic Data 25–29
E New Zealand Visitors Data 30–36
Page 7 of 36
STATS 201/208
Appendix A Attack Data
A lecturer was playing a computer game where they sent a number of troops to attack
a fort. It would take quite a few attacks over time to destroy a fort so the lecturer was
interested in how the game calculated damage. He noticed that if the identical strength
of troops were sent on the attack on multiple occasions, different amounts of damage were
done, so there was a random element to the damage. A minimum of 100 points of troop
strength had to be sent for the attack, otherwise it failed. The lecturer decided to collect
some data on this and so carried out a series of attacks where he randomly allocated a
strength of troops for the attack (between 100 and 500) and then recorded the damage.
The variables measured for each attack are:
Strength Troop strength for the attack.
Damage number of points of damage inflicted on the fort.
The lecturer was interested in several questions:
How does the game deal with randomness? Is there a single sized random
adjustment added to the attack or does the size of the random adjustment depend
on the size of the strength of the attack force.
On average, how much damage would an attack force of strength 100 be expected
to do?
On average, how much additional damage would each additional 100 points of
strength in an attack force add?
Page 8 of 36
STATS 201/208
> plot(Damage ~ Strength, data = Attack.df)
100 200 300 400 500
20
0
40
0
60
0
80
0
Strength
D
am
ag
e
> fit1=lm(Damage~Strength,data=Attack.df)
> plot(fit1,which=1)
200 400 600 800
−
20
−
10
0
10
20
Fitted values
R
es
id
ua
ls
Residuals vs Fitted
70
92
98
Page 9 of 36
STATS 201/208
> normcheck(fit1, main = "")
Theoretical Quantiles
Sa
m
pl
e
Qu
an
tile
s
Residuals from lm(Damage ~ Strength)
> cooks20x(fit1)
observation number
0 20 40 60 80 1000
.0
0
0.
02
0.
04
0.
06
Cook's Distance plot
Co
ok
's
di
st
an
ce
92
82
13
Page 10 of 36
STATS 201/208
> summary(fit1)
Call:
lm(formula = Damage ~ Strength, data = Attack.df)
Residuals:
Min 1Q Median 3Q Max
-17.8525 -5.6600 0.1072 5.2280 16.9520
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.518e+02 1.912e+00 -79.38 <2e-16 ***
Strength 2.006e+00 6.029e-03 332.68 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.731 on 98 degrees of freedom
Multiple R-squared: 0.9991,Adjusted R-squared: 0.9991
F-statistic: 1.107e+05 on 1 and 98 DF, p-value: < 2.2e-16
> confint(fit1)
2.5 % 97.5 %
(Intercept) -155.550376 -147.962755
Strength 1.993782 2.017711
> preddata = data.frame(Strength = 100)
> predict(fit1, preddata, interval = "confidence")
fit lwr upr
1 48.81811 46.07553 51.56069
Page 11 of 36
STATS 201/208
Appendix B Triplefin Respiration Rate Data
Intertidal fishes inhabit the narrow body of water between the high and low tide marks
along rocky coastlines, many of them remaining in the rock pools that form at low
tide. The temperatures in these rock pools can rise during the low tide cycle leading
to a reduction in the amount of oxygen dissolvable in the seawater. Critically high
temperatures can result in a mismatch occuring between oxygen supply and demand,
leading to hypoxia (low oxygen) and, eventually, death.
A small group of researchers from The University of Auckland (UoA) studied two closely
related species of New Zealand triplefin fishes, namely Bellapiscis medius (BM),
inhabiting intertidal rock pools, and B. lesleyae (BL), inhabiting low intertidal and
subtidal environments. Specimens of each species were caught at the same location and
held in tanks with the water temperature set at 15◦C for one month. Individual fishes
were then randomly selected and put into a new tank in which the water temperature
was set at 10, 15, 20 or 25◦C. The target temperature of the water was reached after
approximately 2 hours. The fish remained in the tank for a further 30 minutes and then
had its respiration rate measured.
The variables in this study are as follows:
Species a factor with two levels indicating the species of New Zealand triplefin
fishes, either BM (B. medius) or BL (B. lesleyae).
Note: Treat BM as intertidal and BL as subtidal.
Temp a factor with four levels indicating the seawater temperature in the tank
the fish was placed in, one of 10, 15, 20 or 25 ◦C.
Rate the respiration rate, in metric tons.
Respiration rates typically increase as dissolved oxygen concentration decreases, so higher
respiration rates would be indicative of greater tolerance to an increase in water
temperature.
The UoA researchers were interested in whether differences in tolerance due to wa-
ter temperature depend of whether fish are intertidal or subtidal. What were any
differences in tolerances due to water temperature between intertidal or subtidal fishes? In
particular, is there any evidence that intertidal fishes have greater tolerance to higher
water temperatures than subtidal fishes?
> rate.df <- transform(rate.df,
+ logRate = log(Rate),
+ Temp = factor(rate.df$Temp))
Page 12 of 36
STATS 201/208
> interactionPlots(Rate ~ Temp + Species, data = rate.df)
Plot of 'Rate'
by levels of 'Temp' and 'Species'
Temp
R
at
e
0.
2
0.
4
0.
6
0.
8
10 15 20 25
BM
BL
> rate.fit = lm(Rate ~ Temp * Species, data = rate.df)
> plot(rate.fit , which = 1)
0.2 0.3 0.4 0.5
−
0.
2
0.
0
0.
1
0.
2
0.
3
Fitted values
R
es
id
ua
ls
Residuals vs Fitted
84
68
92
Page 13 of 36
STATS 201/208
> interactionPlots(logRate ~ Temp + Species, data = rate.df)
Plot of 'logRate'
by levels of 'Temp' and 'Species'
Temp
lo
gR
at
e
−
2.
5
−
2.
0
−
1.
5
−
1.
0
−
0.
5
0.
0
10 15 20 25
BM
BL
> lograte.fit = lm(logRate ~ Temp * Species, data = rate.df)
> plot(lograte.fit , which = 1)
−2.0 −1.8 −1.6 −1.4 −1.2 −1.0 −0.8
−
0.
4
0.
0
0.
2
0.
4
Fitted values
R
es
id
ua
ls
Residuals vs Fitted
68 84
14
Page 14 of 36
STATS 201/208
> normcheck(lograte.fit , main = "")
Theoretical Quantiles
Sa
m
pl
e
Qu
an
tile
s
Residuals from lm(logRate ~ Temp * Species)
> cooks20x(lograte.fit)
observation number
0 20 40 60 800
.0
0
0.
02
0.
04
0.
06
Cook's Distance plot
Co
ok
's
di
st
an
ce
14 68
16
Page 15 of 36
STATS 201/208
> anova(lograte.fit)
Analysis of Variance Table
Response: logRate
Df Sum Sq Mean Sq F value Pr(>F)
Temp 3 23.5635 7.8545 218.2875 < 2.2e-16 ***
Species 1 0.0228 0.0228 0.6332 0.4283
Temp:Species 3 1.7573 0.5858 16.2796 1.755e-08 ***
Residuals 87 3.1305 0.0360
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> summary(lograte.fit)
Call:
lm(formula = logRate ~ Temp * Species, data = rate.df)
Residuals:
Min 1Q Median 3Q Max
-0.39742 -0.11028 0.00239 0.13441 0.43660
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.88840 0.06707 -28.157 < 2e-16 ***
Temp15 0.25515 0.08524 2.993 0.00359 **
Temp20 0.58336 0.08524 6.844 1.03e-09 ***
Temp25 1.22617 0.08524 14.385 < 2e-16 ***
SpeciesBM 0.05775 0.08998 0.642 0.52265
Temp15:SpeciesBM -0.38569 0.11676 -3.303 0.00139 **
Temp20:SpeciesBM 0.35559 0.11774 3.020 0.00332 **
Temp25:SpeciesBM -0.05311 0.11676 -0.455 0.65035
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1897 on 87 degrees of freedom
Multiple R-squared: 0.8901,Adjusted R-squared: 0.8812
F-statistic: 100.6 on 7 and 87 DF, p-value: < 2.2e-16
Page 16 of 36
STATS 201/208
> summary2way(lograte.fit, page = "interaction")
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = fit)
$`Comparisons within Temp`
diff lwr upr p adj
10:BM-10:BL 0.057753100 -0.2217129 0.33721909 0.9981469
15:BM-15:BL -0.327933863 -0.5590238 -0.09684393 0.0007549
20:BM-20:BL 0.413347758 0.1774926 0.64920293 0.0000129
25:BM-25:BL 0.004646926 -0.2264430 0.23573686 1.0000000
$`Comparisons between Temp`
diff lwr upr p adj
15:BL-10:BL 0.2551468 -0.009599945 0.5198936 0.0672285
20:BL-10:BL 0.5833588 0.318611986 0.8481055 0.0000000
25:BL-10:BL 1.2261675 0.961420688 1.4909142 0.0000000
20:BL-15:BL 0.3282119 0.097121997 0.5593019 0.0007446
25:BL-15:BL 0.9710206 0.739930699 1.2021106 0.0000000
25:BL-20:BL 0.6428087 0.411718768 0.8738986 0.0000000
15:BM-10:BM -0.1305401 -0.378356475 0.1172762 0.7272595
20:BM-10:BM 0.9389534 0.686687595 1.1912192 0.0000000
25:BM-10:BM 1.1730613 0.925244946 1.4208776 0.0000000
20:BM-15:BM 1.0694936 0.833638376 1.3053487 0.0000000
25:BM-15:BM 1.3036014 1.072511488 1.5346914 0.0000000
25:BM-20:BM 0.2341079 -0.001747306 0.4699630 0.0531724
> exp(c(-0.5590238, -0.09684393))
[1] 0.5717670 0.9076977
> 100 * (exp(c(-0.5590238, -0.09684393)) - 1)
[1] -42.823305 -9.230234
> exp(c(0.1774926, 0.64920293))
[1] 1.194219 1.914015
> 100 * (exp(c(0.1774926, 0.64920293)) - 1)
[1] 19.42192 91.40146
Page 17 of 36
STATS 201/208
Appendix C Glass Fragments Data
Wong (2007) was interested in modeling the number of glass fragments on the ground
that were recovered after a window pane was shot with a handgun. In particular, he ex-
plored the hypothesis that the expected number of fragments was altered by the velocity,
hardness, and shape (profile) of the projectile – the projectile being the lead part of a
bullet. In the analyses below attention is restricted to only the velocity (V) and hardness
(H) of the projectile.
The variables are:
Count the number of recovered glass fragments
V the velocity of the projectile (in metres per second)
H a factor with two levels indicating the hardness of the projectile, one of
HRH40 (soft) or HRH55 (hard)
Our interest here lies in understanding whether the nature of the relationship between
the expected number of recovered glass fragments and projectile velocity depends on
a projectile’s hardness. More specifically, we are interested in whether the percentage
increase in the expected number of recovered glass fragments with increased velocity
depends on the projectile’s hardness. Furthermore, we are interested in estimating any
such percentage increases.
Page 18 of 36
STATS 201/208
> plot(Count ~ V, pch = c(1:2)[glass.df$H], data = glass.df)
600 800 1000 1200 1400
0
50
00
10
00
0
15
00
0
20
00
0
V
Co
un
t
Hardness
HRH40
HRH55
> plot(log(Count) ~ V, pch = c(1:2)[glass.df$H], data = glass.df)
600 800 1000 1200 1400
5
6
7
8
9
10
V
lo
g(C
ou
nt)
Hardness
HRH40
HRH55
Page 19 of 36
STATS 201/208
> glass.gfit <- glm(Count ~ H*V, family = poisson, data = glass.df)
> plot(glass.gfit, which = 1, lty = 3)
> summary(glass.gfit)
Call:
glm(formula = Count ~ H * V, family = poisson, data = glass.df)
Deviance Residuals:
Min 1Q Median 3Q Max
-61.275 -11.044 -4.205 4.552 74.495
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.606e+00 2.821e-02 92.39 <2e-16 ***
HHRH55 1.296e+00 3.718e-02 34.86 <2e-16 ***
V 5.425e-03 2.412e-05 224.91 <2e-16 ***
HHRH55:V -1.787e-03 3.125e-05 -57.19 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 159051 on 35 degrees of freedom
Residual deviance: 21880 on 32 degrees of freedom
AIC: 22211
Number of Fisher Scoring iterations: 4
> 1 - pchisq(21880, 32)
[1] 0
Page 20 of 36
STATS 201/208
> glass.quasigfit <- glm(Count ~ H * V, family = quasipoisson, data = glass.df)
> summary(glass.quasigfit)
Call:
glm(formula = Count ~ H * V, family = quasipoisson, data = glass.df)
Deviance Residuals:
Min 1Q Median 3Q Max
-61.275 -11.044 -4.205 4.552 74.495
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.6059085 0.7725498 3.373 0.00196 **
HHRH55 1.2961129 1.0184172 1.273 0.21230
V 0.0054254 0.0006607 8.212 2.22e-09 ***
HHRH55:V -0.0017872 0.0008559 -2.088 0.04483 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasipoisson family taken to be 750.176)
Null deviance: 159051 on 35 degrees of freedom
Residual deviance: 21880 on 32 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 4
Page 21 of 36
STATS 201/208
> confint(glass.quasigfit)
2.5 % 97.5 %
(Intercept) 0.972690034 4.0122136166
HHRH55 -0.683428951 3.3350412572
V 0.004207561 0.0068068053
HHRH55:V -0.003503468 -0.0001332541
> exp(confint(glass.quasigfit))
2.5 % 97.5 %
(Intercept) 2.6450502 55.2690798
HHRH55 0.5048828 28.0795417
V 1.0042164 1.0068300
HHRH55:V 0.9965027 0.9998668
> exp(confint(glass.quasigfit)[3, ])
2.5 % 97.5 %
1.004216 1.006830
> exp(100 * confint(glass.quasigfit)[3, ])
2.5 % 97.5 %
1.523113 1.975221
> 100 * (exp(100 * confint(glass.quasigfit)[3, ]) - 1)
2.5 % 97.5 %
52.31128 97.52215
Page 22 of 36
STATS 201/208
Changing the baseline level of H to HRH55
> glass.df$H <- relevel(glass.df$H, ref = "HRH55")
> glass.quasigfit <- glm(Count ~ H * V, family = quasipoisson, data = glass.df)
> summary(glass.quasigfit)
Call:
glm(formula = Count ~ H * V, family = quasipoisson, data = glass.df)
Deviance Residuals:
Min 1Q Median 3Q Max
-61.275 -11.044 -4.205 4.552 74.495
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.9020214 0.6635815 5.880 1.54e-06 ***
HHRH40 -1.2961129 1.0184172 -1.273 0.2123
V 0.0036382 0.0005441 6.687 1.51e-07 ***
HHRH40:V 0.0017872 0.0008559 2.088 0.0448 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasipoisson family taken to be 750.176)
Null deviance: 159051 on 35 degrees of freedom
Residual deviance: 21880 on 32 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 4
Page 23 of 36
STATS 201/208
> confint(glass.quasigfit)
2.5 % 97.5 %
(Intercept) 2.5131267957 5.123678079
HHRH40 -3.3350412572 0.683428951
V 0.0026114910 0.004751810
HHRH40:V 0.0001332541 0.003503468
> exp(confint(glass.quasigfit))
2.5 % 97.5 %
(Intercept) 12.34346528 167.951976
HHRH40 0.03561312 1.980658
V 1.00261490 1.004763
HHRH40:V 1.00013326 1.003510
> exp(confint(glass.quasigfit)[3, ])
2.5 % 97.5 %
1.002615 1.004763
> exp(100 * confint(glass.quasigfit)[3, ])
2.5 % 97.5 %
1.298421 1.608305
> 100 * (exp(100 * confint(glass.quasigfit)[3, ]) - 1)
2.5 % 97.5 %
29.84212 60.83053
Page 24 of 36
STATS 201/208
Appendix D Titanic Data
Over 2200 passengers and crew were on board the luxury steamship RMS Titanic when
it set sail on its maiden voyage from Southampton to New York City. More than 1,500
men, women and children lost their lives when, in the early hours of 15 April, 1912, the
ship collided with an iceberg and began taking on water!
Data obtained on the passengers and crew on board the Titanic have led to numerous
articles, including one which appeared in the Independent newspaper in 2009 almost a
century after the disaster with the headline More Britons than Americans died on Titanic
‘because they queued’.
Data on the passengers of the Titanic was analysed. The variables (of interest to us) in
the data set are:
survived A binary variable which takes the value 1 if a passenger survived or 0
if he or she died.
PClass A factor variable indicating a passenger’s ticket class 1st, 2nd or 3rd.
Residence A factor variable indicating a passenger’s country of residence,
namely American, British or Other.
In the following analysis, we are interested in whether or not the 2009 headline is mis-
leading. Specifically, did Britons have a lower odds of surviving than Americans? Or was
it, as others have speculated, that most Americans were in 1st class which was the main
factor influencing survival?
Plots of the numbers of passengers and survival rates of passengers by combination of
Residence and PClass have been created and are on the next page. The R-code has
been hidden to save space.
Page 25 of 36
STATS 201/208
1st 2nd 3rd
American
British
Other
Number of Passengers by combination of Residence and Class
Passenger class
N
um
be
r o
f P
a
ss
e
n
ge
rs
0
10
0
20
0
30
0
40
0
50
0
60
0
1st 2nd 3rd
American
British
Other
Percentage Survived by combination of Residence and Class
Passenger class
Su
rv
ive
d
(%
)
0
20
40
60
80
10
0
Page 26 of 36
STATS 201/208
> titanic.gfit <- glm(survived ~ PClass * Residence, family = binomial,
+ data = titanic.df)
> anova(titanic.gfit, test = "Chisq")
Analysis of Deviance Table
Model: binomial, link: logit
Response: survived
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 1308 1741.0
PClass 2 127.765 1306 1613.3 < 2e-16 ***
Residence 2 7.129 1304 1606.1 0.02831 *
PClass:Residence 4 2.001 1300 1604.1 0.73567
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> titanic.gfit1 <- glm(survived ~ PClass + Residence, family = binomial,
+ data = titanic.df)
> anova(titanic.gfit1, test = "Chisq")
Analysis of Deviance Table
Model: binomial, link: logit
Response: survived
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 1308 1741.0
PClass 2 127.765 1306 1613.3 < 2e-16 ***
Residence 2 7.129 1304 1606.1 0.02831 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Page 27 of 36
STATS 201/208
> summary(titanic.gfit1)
Call:
glm(formula = survived ~ PClass + Residence, family = binomial,
data = titanic.df)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.4210 -0.7889 -0.7889 0.9750 1.7985
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.55649 0.13815 4.028 5.62e-05 ***
PClass2nd -0.60137 0.18444 -3.260 0.00111 **
PClass3rd -1.50428 0.16771 -8.970 < 2e-16 ***
ResidenceBritish -0.44838 0.20271 -2.212 0.02697 *
ResidenceOther -0.05988 0.17778 -0.337 0.73626
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1741.0 on 1308 degrees of freedom
Residual deviance: 1606.1 on 1304 degrees of freedom
AIC: 1616.1
Number of Fisher Scoring iterations: 4
Page 28 of 36
STATS 201/208
> confint(titanic.gfit1)
2.5 % 97.5 %
(Intercept) 0.2878516 0.83007470
PClass2nd -0.9644187 -0.24078947
PClass3rd -1.8361431 -1.17814378
ResidenceBritish -0.8463438 -0.05100489
ResidenceOther -0.4068087 0.29068337
> exp(confint(titanic.gfit1))
2.5 % 97.5 %
(Intercept) 1.3335593 2.2934901
PClass2nd 0.3812047 0.7860071
PClass3rd 0.1594312 0.3078496
ResidenceBritish 0.4289805 0.9502740
ResidenceOther 0.6657716 1.3373411
> 100 * (exp(confint(titanic.gfit1)) - 1)
2.5 % 97.5 %
(Intercept) 33.35593 129.349007
PClass2nd -61.87953 -21.399291
PClass3rd -84.05688 -69.215036
ResidenceBritish -57.10195 -4.972597
ResidenceOther -33.42284 33.734107
Page 29 of 36
STATS 201/208
Appendix E New Zealand Visitors Data
The following data contains the combined monthly figures for visitors to New Zealand
from January 2011 to December 2019.
The variable is:
Visitors the monthly number of visitors to New Zealand.
Note: Define Month as a factor for month of the year coded as 1 for January, 2 for
February, etc. Define Time as a numeric variable with values 1 for January 2011, 2 for
February 1980, ..., 108 for December 2019.
> Time = 1:108
> Month = factor(visitors.df$Month[1:108])
Page 30 of 36
STATS 201/208
> Visitors.ts = ts(visitors.df$Visitors[1:108],start=2011,frequency=12)
> plot(Visitors.ts,xlab="month",ylab="numbers of visitors",
+ main="NZ Visitors 2011 to 2019")
NZ Visitors 2011 to 2019
month
n
u
m
be
rs
o
f v
isi
to
rs
2012 2014 2016 2018 2020
2e
+0
5
4e
+0
5
> log.Visitors.ts=ts(log(visitors.df$Visitors[1:108]),start=2011,frequency=12)
> plot(log.Visitors.ts,xlab="month",ylab="(log)numbers of visitors",
+ main="(log) NZ Visitors 2011 to 2019")
(log) NZ Visitors 2011 to 2019
month
(lo
g)n
u
m
be
rs
o
f v
isi
to
rs
2012 2014 2016 2018 2020
11
.8
12
.2
12
.6
13
.0
Page 31 of 36
STATS 201/208
> Visitors.fit1 = lm(log(Visitors.ts) ~ Time + Month)
> acf(residuals(Visitors.fit1))
0 5 10 15 20
−
0.
2
0.
2
0.
4
0.
6
0.
8
1.
0
Lag
AC
F
> Visitors.fit2 = lm(log(Visitors.ts[-1])
+ ~Time[-1]+Month[-1]+log(Visitors.ts[-108]))
> acf(residuals(Visitors.fit2))
0 5 10 15 20
−
0.
2
0.
2
0.
4
0.
6
0.
8
1.
0
Lag
AC
F
Page 32 of 36
STATS 201/208
> plot.ts(residuals(Visitors.fit2), main = "Residual Series")
Residual Series
Time
re
si
du
al
s(V
isi
tor
s.f
it2
)
0 20 40 60 80 100
−
0.
10
0.
00
0.
10
0.
20
> normcheck(Visitors.fit2, main = "")
Theoretical Quantiles
Sa
m
pl
e
Qu
an
tile
s
Residuals from lm(log(Visitors.ts[−1]) ~ Time[−1] + Month[−1] + log(Visitors.ts[−108]))
Page 33 of 36
STATS 201/208
> summary(Visitors.fit2)
Call:
lm(formula = log(Visitors.ts[-1]) ~ Time[-1] + Month[-1] + log(Visitors.ts[-108]))
Residuals:
Min 1Q Median 3Q Max
-0.096130 -0.027219 -0.005955 0.026706 0.196144
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.0440122 1.1388469 5.307 7.53e-07 ***
Time[-1] 0.0025502 0.0004836 5.274 8.67e-07 ***
Month[-1]2 0.1958563 0.0348421 5.621 1.97e-07 ***
Month[-1]3 0.0531338 0.0320116 1.660 0.100318
Month[-1]4 -0.0949275 0.0400473 -2.370 0.019834 *
Month[-1]5 -0.3101308 0.0561956 -5.519 3.07e-07 ***
Month[-1]6 -0.1456389 0.0830669 -1.753 0.082850 .
Month[-1]7 0.0180168 0.0826995 0.218 0.828017
Month[-1]8 -0.0891090 0.0685877 -1.299 0.197088
Month[-1]9 -0.0161417 0.0706131 -0.229 0.819687
Month[-1]10 0.0008575 0.0654930 0.013 0.989582
Month[-1]11 0.2143367 0.0615398 3.483 0.000758 ***
Month[-1]12 0.4826646 0.0427035 11.303 < 2e-16 ***
log(Visitors.ts[-108]) 0.5012654 0.0896925 5.589 2.27e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.04886 on 93 degrees of freedom
Multiple R-squared: 0.9788,Adjusted R-squared: 0.9758
F-statistic: 330.4 on 13 and 93 DF, p-value: < 2.2e-16
> log.Visitors.ts[108]
[1] 13.17727
Page 34 of 36
STATS 201/208
> HW.fit = HoltWinters(Visitors.ts, seasonal = "multiplicative")
> HW.fit
Holt-Winters exponential smoothing with trend and multiplicative seasonal component.
Call:
HoltWinters(x = Visitors.ts, seasonal = "multiplicative")
Smoothing parameters:
alpha: 0.1096714
beta : 0.1615218
gamma: 0.5832997
Coefficients:
[,1]
a 3.207408e+05
b 3.129164e+01
s1 1.240819e+00
s2 1.314053e+00
s3 1.189316e+00
s4 9.472730e-01
s5 6.874044e-01
s6 6.778433e-01
s7 7.987935e-01
s8 7.787323e-01
s9 8.174633e-01
s10 8.851151e-01
s11 1.168538e+00
s12 1.649082e+00
> HW.Mult.pred = predict(HW.fit,n.ahead=12,prediction.interval=TRUE)
> HW.Mult.pred
fit upr lwr
Jan 2020 398020.0 414650.9 381389.1
Feb 2020 421552.5 438561.6 404543.4
Mar 2020 381573.9 398877.1 364270.7
Apr 2020 303947.6 321332.5 286562.8
May 2020 220586.2 237893.4 203278.9
Jun 2020 217539.3 235299.8 199778.8
Jul 2020 256380.6 275336.8 237424.5
Aug 2020 249966.1 269568.2 230364.1
Sep 2020 262424.0 283244.4 241603.7
Oct 2020 284169.5 306785.1 261553.9
Nov 2020 375199.9 403209.9 347190.0
Dec 2020 529547.0 564448.3 494645.8
Page 35 of 36
STATS 201/208
> plot(HW.fit,HW.Mult.pred,
+ main="HW Multiplicative Model and Predictions for NZ Visitors in 2020")
HW Multiplicative Model and Predictions for NZ Visitors in 2020
Time
O
bs
er
ve
d
/ F
itt
ed
2012 2014 2016 2018 2020
2e
+0
5
4e
+0
5
Page 36 of 36

欢迎咨询51作业君