STATS 786
SEMESTER 1, 2021
STATISTICS
Special Topic in Statistical Computing
(Time Series Forecasting for Data Science)

NOTE: For constants $a$ and $b$, $(a + b)^2 = a^2 + 2ab + b^2$.

1 Select FOUR of the following scenarios. State whether the underlined statements are true or false. You MUST provide reasoning for your answer.

(a) Consider a time series generated from the following model:

$$y_t = \beta_0 + \beta_1 t^2 + y_{t-12} + \varepsilon_t,$$

where $\varepsilon_t$ is a white noise series, and $\beta_0$ and $\beta_1$ are constants. A seasonal differencing and a first differencing are sufficient to make the time series stationary.

(b) The following sample autocorrelations are computed using a time series of length 500:

    lag (k)      1       2       3       4       5
    r_k      0.208   -0.44  -0.166  -0.036  -0.021

Assume that the sample autocorrelations are approximately normally distributed. Only the first two autocorrelations are statistically significant at the 5% level.

(c) The MA(2) model given below is stationary and invertible:

$$y_t = 0.3 + \varepsilon_t + 1.2\varepsilon_{t-1} + 0.8\varepsilon_{t-2},$$

where the white noise series $\varepsilon_t$ has mean 0 and variance 4.

(d) Suppose the error sum of squares has decreased after including an additional term in the model. As a result, the value of the Akaike Information Criterion (AIC) will also decrease.

(e) The ETS(A,N,N) model has a flat forecast function, and the width of the pointwise prediction intervals is fixed.

(f) The ACF plot shown on the right of Figure 1 does not match the time plot given.

[Figure 1: Time and ACF plots. Left: time plot of Sales (in millions) by Month, 2002 Jan–2008 Jan. Right: ACF plotted against lag.]

[Total: 20 marks]

2 Figures 2 and 3 show the time, seasonal, and subseries plots for quarterly sales (in millions of dollars) from food and beverage services in New Zealand over the period 1995 Q3–2020 Q4.
[Figure 2: Time and seasonal plots for sales in food and beverage services. Left: time plot of Sales (in millions of dollars) by Quarter. Right: seasonal plot across Q1–Q4, with one line per year from 1995 to 2020.]

[Figure 3: Subseries plot for sales in food and beverage services, with one panel per quarter (Q1–Q4) showing Sales (in millions of dollars) over time.]

(a) Using Figures 2 and 3, describe the sales data for food and beverage services in New Zealand. Your answer must refer to information obtained from all three plots. [9 marks]

(b) The sales time series is decomposed into its components using two different settings. The estimates of the decomposition are shown in Figure 4. Comment on
• what is plotted in all eight panels of Figure 4;
• the behaviour of each component over time;
• the effect of using robust = TRUE.
Which setting would you consider appropriate for this time series? [16 marks]

[Total: 25 marks]

[Figure 4: Decomposition settings 1 and 2. Each setting has four panels (log(Sales), trend, season_year, remainder) plotted by Quarter. Setting 1: STL(log(Sales) ~ trend(window = 17)). Setting 2: STL(log(Sales) ~ trend(window = 17), robust = TRUE).]

3 Figure 5 shows the number of employees in food and beverage stores in the US over the period January 1990–March 2021.
[Figure 5: Time plot for the number of employees in food and beverage stores in the US: Number of employees (in millions) by Month, 1990 Jan–2021 Mar.]

(a) Briefly comment on the main features that you can observe in this time series. Can you identify any unusual observations? [4 marks]

(b) The R code below is used to fit two models to the employees’ data shown in Figure 5 and to extract summary output from each model. The estimated components for the two models are shown in Figure 6. Use this information to answer questions 3(b)i–3(b)vi.

    library(fpp3)  # attaches tsibble, fable, feasts and dplyr

    # Holt-Winters style ETS models: additive trend vs damped trend
    fit <- employees_food %>%
      model(
        additive = ETS(Persons ~ trend("A")),
        damped = ETS(Persons ~ trend("Ad"))
      )

    fit %>% select(additive) %>% report()
    ## Series: Persons
    ## Model: ETS(A,A,A)
    ##   Smoothing parameters:
    ##     alpha = 0.253
    ##     beta  = 0.0545
    ##     gamma = 0.000104
    ##
    ##   Initial states:
    ##     l        b     s1     s2      s3       s4     s5     s6
    ##  2.77 0.000889 0.0364 0.0254 0.00395 -0.00354 0.0128 0.0207
    ##      s7       s8      s9     s10     s11    s12
    ##  0.0205 -0.00885 -0.0308 -0.0352 -0.0273 -0.014
    ##
    ##   sigma^2:  1e-04
    ##
    ##    AIC  AICc   BIC
    ##  -1156 -1154 -1089

    fit %>% select(damped) %>% report()
    ## Series: Persons
    ## Model: ETS(M,Ad,A)
    ##   Smoothing parameters:
    ##     alpha = 0.754
    ##     beta  = 0.217
    ##     gamma = 0.24
    ##     phi   = 0.879
    ##
    ##   Initial states:
    ##     l        b     s1     s2     s3      s4      s5     s6
    ##  2.79 -0.00323 0.0548 0.0333 0.0119 -0.0112 0.00827 0.0197
    ##      s7      s8      s9     s10     s11     s12
    ##  0.0106 -0.0194 -0.0363 -0.0435 -0.0308 0.00272
    ##
    ##   sigma^2:  0
    ##
    ##    AIC  AICc   BIC
    ##  -1286 -1284 -1215

Note: The estimate $\hat{\sigma}^2$ for the damped model appears in the summary output as zero due to rounding.
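The smoothing parameters in the two reports act through the model's error-correction recursions. The base-R sketch below is a minimal illustration of those recursions for ETS(A,A,A), written in the innovations state-space form ($\hat{y}_{t|t-1} = \ell_{t-1} + b_{t-1} + s_{t-m}$, $\ell_t = \ell_{t-1} + b_{t-1} + \alpha\varepsilon_t$, $b_t = b_{t-1} + \beta\varepsilon_t$, $s_t = s_{t-m} + \gamma\varepsilon_t$). It is not fable's implementation, which also estimates the parameters and initial states by maximum likelihood; the function name `ets_aaa_filter`, the seasonal period m = 2 and the toy data are hypothetical choices made only for illustration.

```r
# Hypothetical helper: run the additive Holt-Winters / ETS(A,A,A) recursions
# with user-supplied parameters and initial states (no estimation is done).
ets_aaa_filter <- function(y, m, alpha, beta, gamma, l0, b0, s0) {
  stopifnot(length(s0) == m)
  n <- length(y)
  level <- numeric(n)
  slope <- numeric(n)
  fitted <- numeric(n)
  resid <- numeric(n)
  # season[1:m] holds the initial states s_{1-m}, ..., s_0 (oldest first);
  # season[t + m] will hold s_t, so s_{t-m} is always season[t].
  season <- c(s0, numeric(n))
  l_prev <- l0
  b_prev <- b0
  for (t in seq_len(n)) {
    fitted[t] <- l_prev + b_prev + season[t]   # one-step forecast
    resid[t] <- y[t] - fitted[t]               # innovation e_t
    level[t] <- l_prev + b_prev + alpha * resid[t]
    slope[t] <- b_prev + beta * resid[t]
    season[t + m] <- season[t] + gamma * resid[t]
    l_prev <- level[t]
    b_prev <- slope[t]
  }
  list(level = level, slope = slope, season = season[-seq_len(m)],
       fitted = fitted, resid = resid)
}

# Toy example with m = 2: these y values follow the initial states exactly,
# so every innovation is zero and the level simply grows by the slope.
out <- ets_aaa_filter(y = c(10, 13, 12, 15), m = 2, alpha = 0.5, beta = 0.1,
                      gamma = 0.2, l0 = 10, b0 = 1, s0 = c(-1, 1))
print(out$resid)
print(out$level)
```

Because each seasonal state changes only by $\gamma$ times the current innovation, a value of $\gamma$ very close to zero, such as the 0.000104 in the additive fit, leaves the seasonal pattern almost unchanged from one year to the next.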
[Figure 6: Estimated components from the two models. Each set of panels shows Persons, level, slope, season and remainder by Month, 1990 Jan–2021 Mar. Left: ETS(A,A,A) decomposition. Right: ETS(M,Ad,A) decomposition.]

i Describe the differences between the two model specifications. [5 marks]

ii Describe the estimated components shown in Figure 6 for the ETS(A,A,A) model. Explain how they are related to the estimated parameters. [4 marks]

iii Considering the names of the R objects created above for this question, write R code to assess the fit of the additive model. [6 marks]

iv What modifications would you make to the R code written in 3(b)iii to assess the fit of the damped model? [3 marks]

v Figure 7 shows forecasts from the two fitted models. Based on these forecasts, which model would you choose for the given data? Give reasons for your selection. [6 marks]

vi Write down the equations for the model you have chosen in 3(b)v. [5 marks]

[Total: 33 marks]

[Figure 7: Forecasts from the two models fitted to the employees’ data, with 80% and 95% prediction intervals; Number of employees (in millions) by Month. Left: forecasts from ETS(A,A,A) model. Right: forecasts from ETS(M,Ad,A) model.]

4 Consider the employees’ time series data used in Question 3. The following R code creates three new variables.
    # First, seasonal (lag 12), and seasonal-then-first differences
    employees_food %>%
      mutate(diff_persons = difference(Persons),
             sdiff_persons = difference(Persons, 12),
             diff_sdiff_persons = difference(difference(Persons, 12)))

Figures 8 and 9 show time, ACF, and PACF plots for the original employees’ time series and the new variables constructed in the R code above.

[Figure 8: Time plots related to the employees’ time series. Panels: Persons, diff_persons, sdiff_persons, diff_sdiff_persons, plotted by Month.]

[Figure 9: ACF and PACF plots related to the employees’ time series, for lags 1–24 [1M]. Panels: ACF of Persons, ACF of diff_persons, ACF of sdiff_persons, ACF of diff_sdiff_persons, PACF of Persons, PACF of diff_persons, PACF of sdiff_persons, PACF of diff_sdiff_persons.]

(a) Use Figures 8 and 9 to find an appropriate differencing to obtain a stationary time series for the employees’ data. Give reasons for your selection. [6 marks]

(b) The R code below is used to fit three models to the employees’ data and extract summary output from each model. Consider the model for your choice of differencing in 4a to answer questions 4(b)i–4(b)iii.

    fit <- employees_food %>%
      model(arima1 = ARIMA(Persons ~ pdq(d = 1) + PDQ(D = 0), stepwise = FALSE),
            arima2 = ARIMA(Persons ~ pdq(d = 0) + PDQ(D = 1), stepwise = FALSE),
            arima3 = ARIMA(Persons ~ pdq(d = 1) + PDQ(D = 1), stepwise = FALSE))

    fit %>% select(arima1) %>% report()
    ## Series: Persons
    ## Model: ARIMA(4,1,0)(0,0,2)[12]
    ##
    ## Coefficients:
    ##          ar1     ar2      ar3      ar4    sma1    sma2
    ##       0.0810  -0.181  -0.1915  -0.1115  0.7616  0.5468
    ## s.e.  0.0547   0.051   0.0511   0.0539  0.0602  0.0572
    ##
    ## sigma^2 estimated as 0.0001507:  log likelihood=1112
    ## AIC=-2210   AICc=-2210   BIC=-2182

    fit %>% select(arima2) %>% report()
    ## Series: Persons
    ## Model: ARIMA(2,0,1)(1,1,2)[12]
    ##
    ## Coefficients:
    ##          ar1     ar2      ma1    sar1   sma1     sma2
    ##       1.9639  -0.965  -0.9190  -0.737  0.121  -0.6135
    ## s.e.  0.0221   0.022   0.0319   0.168  0.155   0.0936
    ##
    ## sigma^2 estimated as 6.55e-05:  log likelihood=1231
    ## AIC=-2449   AICc=-2448   BIC=-2421

    fit %>% select(arima3) %>% report()
    ## Series: Persons
    ## Model: ARIMA(0,1,0)(0,1,2)[12]
    ##
    ## Coefficients:
    ##          sma1     sma2
    ##       -0.5482  -0.1959
    ## s.e.   0.0595   0.0602
    ##
    ## sigma^2 estimated as 6.721e-05:  log likelihood=1223
    ## AIC=-2440   AICc=-2440   BIC=-2428

i Describe the relationship between the relevant ACF and PACF plots given in Figure 9 and the orders estimated in the ARIMA model. [5 marks]

ii Write down the estimated model using the backward shift operator. [3 marks]

iii Use the information given below to compute a 1-step-ahead forecast and its 95% prediction interval for the model written in 4(b)ii. [8 marks]

[Total: 22 marks]

Information about arima1 model

    ## # A tsibble: 15 x 3 [1M]
    ##       Month Persons .resid
    ##  1 2020 Jan    3.06 -0.0124
    ##  2 2020 Feb    3.05 -0.00915
    ##  3 2020 Mar    3.03 -0.0222
    ##  4 2020 Apr    3.01 -0.0302
    ##  5 2020 May    3.08  0.0590
    ##  6 2020 Jun    3.14  0.0306
    ##  7 2020 Jul    3.13 -0.0202
    ##  8 2020 Aug    3.13  0.0261
    ##  9 2020 Sep    3.11  0.00132
    ## 10 2020 Oct    3.13  0.0111
    ## 11 2020 Nov    3.16  0.0165
    ## 12 2020 Dec    3.18  0.0164
    ## 13 2021 Jan    3.14 -0.0125
    ## 14 2021 Feb    3.14  0.0212
    ## 15 2021 Mar    3.13  0.00427

Information about arima2 model

    ## # A tsibble: 15 x 3 [1M]
    ##       Month Persons .resid
    ##  1 2020 Jan    3.06  0.00101
    ##  2 2020 Feb    3.05  0.00147
    ##  3 2020 Mar    3.03 -0.0190
    ##  4 2020 Apr    3.01 -0.0229
    ##  5 2020 May    3.08  0.0578
    ##  6 2020 Jun    3.14  0.0334
    ##  7 2020 Jul    3.13 -0.0170
    ##  8 2020 Aug    3.13  0.0105
    ##  9 2020 Sep    3.11  0.00434
    ## 10 2020 Oct    3.13  0.00494
    ## 11 2020 Nov    3.16  0.00419
    ## 12 2020 Dec    3.18  0.0107
    ## 13 2021 Jan    3.14  0.00339
    ## 14 2021 Feb    3.14  0.0122
    ## 15 2021 Mar    3.13 -0.00289

Information about arima3 model

    ## # A tsibble: 15 x 3 [1M]
    ##       Month Persons .resid
    ##  1 2020 Jan    3.06  0.000272
    ##  2 2020 Feb    3.05 -0.000909
    ##  3 2020 Mar    3.03 -0.0207
    ##  4 2020 Apr    3.01 -0.0259
    ##  5 2020 May    3.08  0.0551
    ##  6 2020 Jun    3.14  0.0335
    ##  7 2020 Jul    3.13 -0.0174
    ##  8 2020 Aug    3.13  0.0123
    ##  9 2020 Sep    3.11  0.00191
    ## 10 2020 Oct    3.13  0.00453
    ## 11 2020 Nov    3.16  0.00465
    ## 12 2020 Dec    3.18  0.0119
    ## 13 2021 Jan    3.14  0.00414
    ## 14 2021 Feb    3.14  0.0153
    ## 15 2021 Mar    3.13 -0.0000886
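A 1-step-ahead forecast from a seasonal ARIMA model follows from expanding the backward-shift form and replacing the unknown future innovation by its mean of zero. For the ARIMA(0,1,0)(0,1,2)[12] specification reported for arima3, and assuming the "+" sign convention for MA terms (the convention of R's `stats::arima`, which fable's `ARIMA()` is assumed here to share), $(1-B)(1-B^{12})y_t = (1+\Theta_1 B^{12}+\Theta_2 B^{24})\varepsilon_t$ gives $y_{T+1} = y_T + y_{T-11} - y_{T-12} + \varepsilon_{T+1} + \Theta_1\varepsilon_{T-11} + \Theta_2\varepsilon_{T-23}$. The base-R sketch below codes this with a normal-approximation interval; the function name `sarima_step1` and the toy inputs are hypothetical.

```r
# Hypothetical helper: 1-step-ahead forecast and prediction interval for a
# seasonal ARIMA(0,1,0)(0,1,2)[m] model, assuming the "+" MA sign convention:
#   (1 - B)(1 - B^m) y_t = (1 + Theta1 B^m + Theta2 B^(2m)) e_t
# i.e. y_t = y_{t-1} + y_{t-m} - y_{t-m-1} + e_t + Theta1 e_{t-m} + Theta2 e_{t-2m}
sarima_step1 <- function(y, e, Theta1, Theta2, sigma2, m = 12, level = 0.95) {
  n <- length(y)                   # y[n] is the last observation y_T
  stopifnot(length(e) == n, n >= 2 * m)
  # The future innovation e_{T+1} is replaced by its mean, zero.
  point <- y[n] + y[n + 1 - m] - y[n - m] +
    Theta1 * e[n + 1 - m] + Theta2 * e[n + 1 - 2 * m]
  z <- qnorm(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
  half <- z * sqrt(sigma2)         # 1-step forecast variance is sigma^2
  c(forecast = point, lower = point - half, upper = point + half)
}

# Toy example with m = 2 (hypothetical numbers, not the employees data)
res <- sarima_step1(y = c(1, 2, 3, 4), e = c(0.1, 0, 0.2, 0),
                    Theta1 = -0.5, Theta2 = -0.2, sigma2 = 0.04, m = 2)
print(res)
```

For the monthly data one would pass m = 12 together with Theta1 = -0.5482, Theta2 = -0.1959 and sigma2 = 6.721e-05 from the arima3 report, so the 95% interval is the point forecast plus or minus about $1.96\sqrt{6.721\times 10^{-5}} \approx 0.0161$. Note that the lag-24 term needs the residual from 23 months before the forecast origin.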