Statistics 726
Assignment 4, Semester 2, 2020
Due: October 28, 2020, 11PM

Please submit a scanned document of your solutions on Canvas. The answers should be numbered accordingly. The calculations should include all the necessary steps and/or reasoning that lead to the final answer. Please also submit a .R file containing all the R code that you have written.

1. Consider the Gaussian AR(2) process given by

       $X_t - \mu = \phi_1 (X_{t-1} - \mu) + \phi_2 (X_{t-2} - \mu) + \varepsilon_t,$

   where $E[X_t] = \mu$ and $\{\varepsilon_t\}_{t \in \mathbb{Z}} \sim \mathrm{IID}\,N(0, \sigma^2)$.

   (a) i. (3 marks) Derive an expression for the conditional log-likelihood function, $\log L(\mu, \phi_1, \phi_2, \sigma \mid X_1, X_2)$, based on $X_1, X_2, \ldots, X_n$ ($n > 2$).

       ii. (3 marks) Using the result from part (a)i, demonstrate that maximizing the conditional log-likelihood function with respect to $\mu$, $\phi_1$ and $\phi_2$ is equivalent to minimizing the sum of squared errors of the following regression model:

               $Y = Z\beta + \varepsilon,$

           where $Y \in \mathbb{R}^{n-2}$, $Z \in \mathbb{R}^{(n-2) \times 3}$, $\beta = [\mu(1 - \phi_1 - \phi_2), \phi_1, \phi_2]^\top$ and $\varepsilon = [\varepsilon_3, \varepsilon_4, \ldots, \varepsilon_n]^\top$. You are expected to identify $Y$ and $Z$ explicitly.

       iii. (5 marks) Write an R function named CSSar2 to compute the conditional sum of squares estimates for $\mu$, $\phi_1$, $\phi_2$ and $\sigma^2$. The function definition is given below:

               CSSar2 <- function(x)

               x    A numeric vector of observations.

       iv. (1 mark) Generate $n = 200$ observations from the AR(2) process with $\mu = 5$, $\phi_1 = 1.5$, $\phi_2 = -0.75$ and $\sigma^2 = 2$.
           Hint: You may find the arima.sim function in R useful.

       v. (2 marks) For the observations generated in part (a)iv, compute the conditional sum of squares estimates for $\mu$, $\phi_1$, $\phi_2$ and $\sigma^2$ using the CSSar2 function defined in part (a)iii. Compare these estimates with the output from the Arima function in the "forecast" package for R.

   (b) i. (5 marks) Find the autocovariance function of $\{X_t\}_{t \in \mathbb{Z}}$ at lag 0 (in terms of $\rho(1)$, $\rho(2)$, $\phi_1$, $\phi_2$ and $\sigma^2$) and the autocorrelation function at lags 1 and 2 (in terms of $\phi_1$ and $\phi_2$).

       ii. (3 marks) State the joint distribution of $X_1$ and $X_2$ in terms of $\gamma(0)$ and $\rho(1)$.
           Note: You are not required to substitute the expressions for $\gamma(0)$ and $\rho(1)$.

       iii. (6 marks) Using the result from part (b)ii, derive an expression for the negative log-likelihood function, $-\log L(\mu, \phi_1, \phi_2, \sigma; X_1, \ldots, X_n)$.
           Hint: For ease of calculation you may use the following notation: $\mathbf{X}_2 = (X_1 - \mu, X_2 - \mu)^\top$ and $\mathrm{Cov}(\mathbf{X}_2, \mathbf{X}_2) = \sigma^2 \Gamma_2$, where $\Gamma_2 \in \mathbb{R}^{2 \times 2}$. For a non-singular matrix $A \in \mathbb{R}^{m \times m}$ and nonzero $k \in \mathbb{R}$: $|kA| = k^m |A|$ and $(kA)^{-1} = k^{-1} A^{-1}$, where $|A|$ denotes the determinant of $A$.

       iv. (3 marks) Find an expression for the maximum likelihood estimator for $\sigma^2$ (denote it by $\hat{\sigma}^2_{ML}$).

       v. (2 marks) By substituting $\hat{\sigma}^2_{ML}$ into the negative log-likelihood function, find the concentrated negative log-likelihood function, $-\log L(\mu, \phi_1, \phi_2; X_1, X_2, \ldots, X_n)$, in terms of $\hat{\sigma}^2_{ML}$, $\Gamma_2$ and $n$.

       vi. (6 marks) Write an R function named neglogL to compute the concentrated negative log-likelihood function obtained in part (b)v. The function definition is given below:

               neglogL <- function(par, x)

               par  A numeric vector of length 3 for the values of $\mu$, $\phi_1$ and $\phi_2$.
               x    A numeric vector of observations.

       vii. (6 marks) Using the functions defined in parts (a)iii and (b)vi, write another function in R named CSSMLar2 to compute the maximum likelihood estimates for $\mu$, $\phi_1$, $\phi_2$ and $\sigma^2$, using the conditional sum of squares estimates for $\mu$, $\phi_1$ and $\phi_2$ as the initial values. The function definition is given below:

               CSSMLar2 <- function(x)

               x    A numeric vector of observations.

           Hint: Use the optim function in R to minimize the non-linear concentrated negative log-likelihood function.

       viii. (3 marks) Using the observations generated in part (a)iv and the CSSMLar2 function, find the CSS-ML estimates for $\mu$, $\phi_1$, $\phi_2$ and $\sigma^2$. Compare these estimates with the output from the Arima function in the "forecast" package for R, setting the argument method to "CSS-ML".
           Note: This question is for pedagogical purposes only. Do not assume that CSSar2 or CSSMLar2 are better replacements for Arima or any other related functions defined in R.

2. Consider the Gaussian MA(1) process defined by

       $X_t = \theta \varepsilon_{t-1} + \varepsilon_t,$

   where $\{\varepsilon_t\}_{t \in \mathbb{Z}} \sim \mathrm{IID}\,N(0, \sigma^2)$. Let $\tilde{X}_t^{t-1}$ be the best linear predictor for $X_t$ based on $\{X_{t-1}, \ldots, X_1\}$, and let $Z_t = X_t - \tilde{X}_t^{t-1}$ with $Z_1 = X_1$. Define $\mathrm{Var}(Z_t) = \sigma^2 r_t$.

   (a) (5 marks) Using the result

           $Z_t = X_t - \frac{\mathrm{Cov}(Z_{t-1}, X_t)}{\mathrm{Var}(Z_{t-1})} Z_{t-1},$

       show that

           $Z_t = X_t - \frac{\theta}{r_{t-1}} Z_{t-1}, \qquad r_t = 1 + \theta^2 - \frac{\theta^2}{r_{t-1}}.$

       Hint: Refer to the results in slide 29 of the "ARIMA" handout.

   (b) (1 mark) Write an expression for the negative log-likelihood function, $-\log L(\theta, \sigma; X_1, X_2, \ldots, X_n)$, for $n \geq 1$.

   (c) (2 marks) Find an expression for the maximum likelihood estimator for $\sigma^2$ (denote it by $\hat{\sigma}^2_{ML}$).

   (d) (1 mark) By substituting $\hat{\sigma}^2_{ML}$ into the negative log-likelihood function, find the concentrated negative log-likelihood function, $-\log L(\theta; X_1, X_2, \ldots, X_n)$.

   (e) (6 marks) Write an R function named neglogL to compute the concentrated negative log-likelihood function obtained in part (d). The function definition is given below:

           neglogL <- function(par, x)

           par  A numeric vector of length 1 for the value of $\theta$.
           x    A numeric vector of observations.

   (f) (6 marks) Using the function defined in part (e), write another function in R named MLma1 to compute the maximum likelihood estimates for $\theta$ and $\sigma^2$. The function definition is given below:

           MLma1 <- function(x)

           x    A numeric vector of observations.

       Hint: Set the initial value of $\theta$ to zero (when using the optim function).

   (g) (1 mark) Generate $n = 200$ observations from the MA(1) process with $\theta = 0.8$ and $\sigma^2 = 2$.

   (h) (3 marks) For the observations generated in part (g), compute the maximum likelihood estimates for $\theta$ and $\sigma^2$ using the MLma1 function defined in part (f). Compare these estimates with the output from the Arima function in the "forecast" package for R, setting the argument method to "ML".

3. Consider the Gaussian AR(2) process defined by

       $X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \varepsilon_t,$

   where $\{\varepsilon_t\}_{t \in \mathbb{Z}} \sim \mathrm{IID}\,N(0, \sigma^2)$ and $\sigma^2 = 2$.

   (a) (3 marks) Let the two roots of the polynomial $\phi(z) = 1 - \phi_1 z - \phi_2 z^2$ be written as $z = r[\cos(\omega) \pm i \sin(\omega)]$, where $r = 3$ and $\omega \in (-\pi, \pi)$. Choose a non-zero value for $\omega$ and calculate the values of $\phi_1$ and $\phi_2$.

   (b) (5 marks) Using the values of $\phi_1$ and $\phi_2$ computed in part (a) and $\sigma^2$, generate 2000 time series of length $n = 50$. For each series, find the maximum likelihood estimates for $\phi_1$, $\phi_2$ and $\sigma^2$. Denote them by $\{\hat{\phi}_{1,ts}\}_{ts=1}^{2000}$, $\{\hat{\phi}_{2,ts}\}_{ts=1}^{2000}$ and $\{\hat{\sigma}^2_{ts}\}_{ts=1}^{2000}$ for $\phi_1$, $\phi_2$ and $\sigma^2$, respectively.
       Hint: You may use the Arima function in the "forecast" package for R.

   (c) (1 mark) Compute the mean of $\{\hat{\phi}_{1,ts}\}_{ts=1}^{2000}$ obtained in part (b).

   (d) (1 mark) Repeat part (c) for $\{\hat{\phi}_{2,ts}\}_{ts=1}^{2000}$ and $\{\hat{\sigma}^2_{ts}\}_{ts=1}^{2000}$.

   (e) (6 marks) Repeat parts (b) to (d) for $n = 150$, $n = 250$ and $n = 500$.

   (f) (3 marks) Comment on the magnitude of the difference between the mean of $\{\hat{\phi}_{1,ts}\}_{ts=1}^{2000}$ and $\phi_1$ as $n$ increases. Repeat this step for $\phi_2$ and $\sigma^2$.

   Hint: You may set the seed of the random number generator using the set.seed function in R.

4. Consider the data set wmurders, which contains the annual totals of women murdered (per 100,000 standard population) in the United States. The data set is available through the fpp2 package in R.

   (a) (10 marks) Using time plots, ACF and PACF plots, suggest an appropriate ARIMA($p$, $d$, $q$) model for these data. Give reasons for your selection.
       Note: You may consider applying a simple mathematical transformation to stabilize the variance before fitting the models.

   (b) (2 marks) Write the model you suggested in terms of the backward-shift operator.

   (c) (4 marks) Fit the model using R and examine the residuals. Is the model satisfactory?

   (d) (8 marks) Use the model fitted in part (c) to compute forecasts for the next 3 years (use functions in R to perform this task). Manually compute these forecasts and compare your results with those produced in R.

   (e) (4 marks) Does the auto.arima function in R give the same model that you have chosen in part (a)? If not, which model do you think is more appropriate? Give reasons for your selection.

Total: 120 marks
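
Note on the simulation hints: the following is a minimal sketch of how arima.sim and set.seed might be combined, assuming illustrative parameter values (an AR(1) with mean 10, coefficient 0.5 and innovation variance 4) that are hypothetical and not taken from any question. It relies on two points that are easy to overlook: arima.sim returns a zero-mean series, so the mean has to be added afterwards, and its sd argument is passed on to rnorm, so it expects a standard deviation rather than a variance.

```r
# Illustrative values only (hypothetical, not taken from any question).
set.seed(726)                 # fix the RNG state so the draw is reproducible
mu     <- 10                  # process mean
phi    <- 0.5                 # AR(1) coefficient
sigma2 <- 4                   # innovation variance

# arima.sim() simulates a zero-mean series; `sd` is forwarded to rnorm(),
# so it must be the innovation standard deviation, not the variance.
x <- mu + arima.sim(model = list(ar = phi), n = 100, sd = sqrt(sigma2))

length(x)   # number of simulated observations
mean(x)     # sample mean, which should be close to mu
```

Calling set.seed with the same value before each simulation reproduces the same series, which makes comparisons between estimation methods repeatable.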