# Tutoring Case: 11PM, Assignment 4

Statistics 726 Assignment 4 Due: October 28, 2020, 11PM
accordingly. The calculations should include all the necessary steps and/or reasoning that lead
Please submit a .R file that includes all the R code you have written.
1. Consider the Gaussian AR(2) process given by
$$X_t - \mu = \phi_1 (X_{t-1} - \mu) + \phi_2 (X_{t-2} - \mu) + \varepsilon_t,$$
where $E[X_t] = \mu$ and $\{\varepsilon_t\}_{t \in \mathbb{Z}} \sim \text{IID } N(0, \sigma^2)$.
(a) i. (3 marks) Derive an expression for the conditional log-likelihood function,
$\log L(\mu, \phi_1, \phi_2, \sigma \mid X_1, X_2)$, based on $X_1, X_2, \ldots, X_n$ ($n > 2$).
ii. (3 marks) Using the result from part (a)i, demonstrate that maximizing the
conditional log-likelihood function with respect to $\mu$, $\phi_1$ and $\phi_2$ is equivalent to
minimizing the least-squares criterion of the following regression model:
$$Y = Z\beta + \varepsilon,$$
where $Y \in \mathbb{R}^{n-2}$, $Z \in \mathbb{R}^{(n-2) \times 3}$, $\beta = [\mu(1 - \phi_1 - \phi_2), \phi_1, \phi_2]^\top$ and
$\varepsilon = [\varepsilon_3, \varepsilon_4, \ldots, \varepsilon_n]^\top$. You are expected to identify $Y$ and $Z$ explicitly.
iii. (5 marks) Write an R function named CSSar2 to compute the conditional sum
of squares estimates for $\mu$, $\phi_1$, $\phi_2$ and $\sigma^2$.
The function definition is given below:
CSSar2 <- function(x)
x A numeric vector of observations.
iv. (1 mark) Generate $n = 200$ observations from the AR(2) process with $\mu = 5$,
$\phi_1 = 1.5$, $\phi_2 = -0.75$ and $\sigma^2 = 2$.
Hint: You may find the arima.sim function in R useful.
v. (2 marks) For the observations generated in part (a)iv, compute the conditional
sum of squares estimates for $\mu$, $\phi_1$, $\phi_2$ and $\sigma^2$ using the CSSar2 function defined
in part (a)iii. Compare these estimates with the output of the Arima function in
the “forecast” package for R.
(b) i. (5 marks) Find the autocovariance function of $\{X_t\}_{t \in \mathbb{Z}}$ at lag 0 (in terms of
$\rho(1)$, $\rho(2)$, $\phi_1$, $\phi_2$ and $\sigma^2$) and the autocorrelation function at lags 1 and 2 (in terms
of $\phi_1$ and $\phi_2$).
ii. (3 marks) State the joint distribution of $X_1$ and $X_2$ in terms of $\gamma(0)$ and $\rho(1)$.
Note: You are not required to substitute the expressions for $\gamma(0)$ and $\rho(1)$.
Assignment 4, Semester 2, 2020
iii. (6 marks) Using the result from part (b)ii, derive an expression for the negative
log-likelihood function, $-\log L(\mu, \phi_1, \phi_2, \sigma; X_1, \ldots, X_n)$.
Hint: For ease of calculation, you may use the following notation:
$\mathbf{X}_2 = (X_1 - \mu, X_2 - \mu)^\top$ and $\mathrm{Cov}(\mathbf{X}_2, \mathbf{X}_2) = \sigma^2 \Gamma_2$, where $\Gamma_2 \in \mathbb{R}^{2 \times 2}$. For a non-singular
matrix $A \in \mathbb{R}^{m \times m}$ and $k \in \mathbb{R}$: $|kA| = k^m |A|$ and $(kA)^{-1} = k^{-1} A^{-1}$, where $|A|$
denotes the determinant of $A$.
iv. (3 marks) Find an expression for the maximum likelihood estimator for $\sigma^2$
(denote it by $\hat{\sigma}^2_{\mathrm{ML}}$).
v. (2 marks) By substituting $\hat{\sigma}^2_{\mathrm{ML}}$ into the negative log-likelihood function, find the
concentrated negative log-likelihood function, $-\log L(\mu, \phi_1, \phi_2; X_1, X_2, \ldots, X_n)$,
in terms of $\hat{\sigma}^2_{\mathrm{ML}}$, $\Gamma_2$ and $n$.
vi. (6 marks) Write an R function named neglogL to compute the concentrated
negative log-likelihood function obtained in part (b)v.
The function definition is given below:
neglogL <- function(par, x)
par A numeric vector of length 3 for the values of µ, φ1 and φ2.
x A numeric vector of observations.
vii. (6 marks) Using the functions defined in parts (a)iii and (b)vi, write another
function in R named CSSMLar2 to compute the maximum likelihood estimates
for $\mu$, $\phi_1$, $\phi_2$ and $\sigma^2$, using the conditional sum of squares estimates for $\mu$, $\phi_1$ and $\phi_2$
as the initial values.
The function definition is given below:
CSSMLar2 <- function(x)
x A numeric vector of observations.
Hint: Use the optim function in R to minimize the non-linear concentrated negative
log-likelihood function.
viii. (3 marks) Using the observations generated in part (a)iv and the CSSMLar2 function,
find the CSS-ML estimates for $\mu$, $\phi_1$, $\phi_2$ and $\sigma^2$. Compare these estimates with
the output of the Arima function in the “forecast” package for R by setting the
argument method to "CSS-ML".
Note: This question is only for pedagogical purposes. Do not assume that CSSar2 or
CSSMLar2 are better replacements for Arima or any other related functions defined in R.
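
The regression view of Question 1(a) can be illustrated with a short sketch. It is not a reference solution: it simulates the AR(2) process with the parameter values stated in part (a)iv and solves the least-squares problem directly rather than wrapping it in CSSar2. The seed, the variable names and the divisor $n - 2$ used for the variance estimate are assumptions made for illustration.

```r
# Sketch only: simulate the AR(2) process of Question 1 and fit the
# conditional least-squares regression. All names are illustrative.
set.seed(726)                      # arbitrary seed (assumption)
mu <- 5
phi <- c(1.5, -0.75)
sigma2 <- 2
n <- 200

# arima.sim() simulates a zero-mean process, so the mean is added afterwards.
x <- mu + arima.sim(model = list(ar = phi), n = n, sd = sqrt(sigma2))
x <- as.numeric(x)

# Regress X_t on an intercept, X_{t-1} and X_{t-2} for t = 3, ..., n.
Y <- x[3:n]
Z <- cbind(1, x[2:(n - 1)], x[1:(n - 2)])
beta_hat <- solve(crossprod(Z), crossprod(Z, Y))   # (Z'Z)^{-1} Z'Y

phi_hat <- beta_hat[2:3]
mu_hat <- beta_hat[1] / (1 - sum(phi_hat))   # intercept = mu * (1 - phi1 - phi2)
resid <- Y - Z %*% beta_hat
sigma2_hat <- sum(resid^2) / (n - 2)         # divisor n - 2 is an assumption

print(c(mu = mu_hat, phi1 = phi_hat[1], phi2 = phi_hat[2], sigma2 = sigma2_hat))
```

Solving the normal equations by hand keeps the link to $Y = Z\beta + \varepsilon$ visible; calling lm on the same columns should give the same coefficients.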
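2. Consider the Gaussian MA(1) process defined by
$$X_t = \theta \varepsilon_{t-1} + \varepsilon_t,$$
where $\{\varepsilon_t\}_{t \in \mathbb{Z}} \sim \text{IID } N(0, \sigma^2)$.
Let $\tilde{X}_t^{t-1}$ be the best linear predictor for $X_t$ based on $\{X_{t-1}, \ldots, X_1\}$, $Z_t = X_t - \tilde{X}_t^{t-1}$
and $Z_1 = X_1$. Define $\mathrm{Var}(Z_t) = \sigma^2 r_t$.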
(a) (5 marks) Using the result
$$Z_t = X_t - \frac{\mathrm{Cov}(Z_{t-1}, X_t)}{\mathrm{Var}(Z_{t-1})} Z_{t-1},$$
show that
$$Z_t = X_t - \frac{\theta}{r_{t-1}} Z_{t-1}, \qquad r_t = 1 + \theta^2 - \frac{\theta^2}{r_{t-1}}.$$
Hint: Refer to the results in slide 29 of the “ARIMA” handout.
(b) (1 mark) Write an expression for the negative log-likelihood function,
$-\log L(\theta, \sigma; X_1, X_2, \ldots, X_n)$, for $n \geq 1$.
(c) (2 marks) Find an expression for the maximum likelihood estimator for $\sigma^2$ (denote
it by $\hat{\sigma}^2_{\mathrm{ML}}$).
(d) (1 mark) By substituting $\hat{\sigma}^2_{\mathrm{ML}}$ into the negative log-likelihood function, find the
concentrated negative log-likelihood function, $-\log L(\theta; X_1, X_2, \ldots, X_n)$.
(e) (6 marks) Write an R function named neglogL to compute the concentrated negative
log-likelihood function obtained in part (d).
The function definition is given below:
neglogL <- function(par, x)
par A numeric vector of length 1 for the value of θ.
x A numeric vector of observations.
(f) (6 marks) Using the function defined in part (e), write another function in R named
MLma1 to compute the maximum likelihood estimates for $\theta$ and $\sigma^2$.
The function definition is given below:
MLma1 <- function(x)
x A numeric vector of observations.
Hint: Set the initial value of θ to zero (when using the optim function).
(g) (1 mark) Generate $n = 200$ observations from the MA(1) process with $\theta = 0.8$ and
$\sigma^2 = 2$.
(h) (3 marks) For the observations generated in part (g), compute the maximum likelihood
estimates for $\theta$ and $\sigma^2$ using the MLma1 function defined in part (f). Compare these
estimates with the output of the Arima function in the “forecast” package for R by
setting the argument method to "ML".
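
The recursion in Question 2(a) can be checked numerically. The sketch below is a sanity check under stated assumptions, not the required neglogL or MLma1: it evaluates the full (non-concentrated) Gaussian negative log-likelihood twice, once from the innovations $Z_t$ and scale factors $r_t$, and once directly from the MA(1) covariance matrix. The function names ma1_innovations, ma1_negloglik and direct_negloglik are hypothetical, and the short series length and seed are arbitrary.

```r
# Sketch: innovations recursion for an MA(1) process, checked against the
# exact Gaussian likelihood. Function names are illustrative only.
ma1_innovations <- function(theta, x) {
  n <- length(x)
  z <- numeric(n)
  r <- numeric(n)
  z[1] <- x[1]                 # Z_1 = X_1
  r[1] <- 1 + theta^2          # Var(X_1) = sigma^2 * (1 + theta^2)
  if (n > 1) {
    for (t in 2:n) {
      z[t] <- x[t] - theta / r[t - 1] * z[t - 1]
      r[t] <- 1 + theta^2 - theta^2 / r[t - 1]
    }
  }
  list(z = z, r = r)
}

# Negative log-likelihood built from the innovations.
ma1_negloglik <- function(theta, sigma2, x) {
  inn <- ma1_innovations(theta, x)
  n <- length(x)
  0.5 * n * log(2 * pi * sigma2) + 0.5 * sum(log(inn$r)) +
    0.5 * sum(inn$z^2 / inn$r) / sigma2
}

# Direct evaluation with the full covariance matrix, for comparison.
direct_negloglik <- function(theta, sigma2, x) {
  n <- length(x)
  acvf <- c(1 + theta^2, theta, rep(0, max(n - 2, 0)))[1:n]
  Sigma <- sigma2 * toeplitz(acvf)
  logdet <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  quad <- sum(x * solve(Sigma, x))
  0.5 * n * log(2 * pi) + 0.5 * logdet + 0.5 * quad
}

set.seed(1)                    # arbitrary seed (assumption)
x <- as.numeric(arima.sim(model = list(ma = 0.8), n = 20, sd = sqrt(2)))
print(ma1_negloglik(0.8, 2, x))
print(direct_negloglik(0.8, 2, x))
```

The two printed values should agree up to rounding error. The recursion needs only $O(n)$ operations, whereas the direct form factorizes an $n \times n$ matrix.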
3. Consider the Gaussian AR(2) process defined by
$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \varepsilon_t,$$
where $\{\varepsilon_t\}_{t \in \mathbb{Z}} \sim \text{IID } N(0, \sigma^2)$ and $\sigma^2 = 2$.
(a) (3 marks) Let the two roots of the polynomial $\phi(z) = 1 - \phi_1 z - \phi_2 z^2$ be written as
$$z = r[\cos(\omega) \pm i \sin(\omega)],$$
where $r = 3$ and $\omega \in (-\pi, \pi)$. Choose a non-zero value for $\omega$ and calculate the values
of $\phi_1$ and $\phi_2$.
(b) (5 marks) Using the values of $\phi_1$ and $\phi_2$ computed in part (a) and $\sigma^2$, generate 2000
time series of length $n = 50$. For each series, find the maximum likelihood estimates
for $\phi_1$, $\phi_2$ and $\sigma^2$. Denote them by $\{\hat{\phi}_{1,ts}\}_{ts=1}^{2000}$, $\{\hat{\phi}_{2,ts}\}_{ts=1}^{2000}$ and $\{\hat{\sigma}^2_{ts}\}_{ts=1}^{2000}$
for $\phi_1$, $\phi_2$ and $\sigma^2$, respectively.
Hint: You may use the Arima function in the “forecast” package for R.
(c) (1 mark) Compute the mean of $\{\hat{\phi}_{1,ts}\}_{ts=1}^{2000}$ obtained in part (b).
(d) (1 mark) Repeat part (c) for $\{\hat{\phi}_{2,ts}\}_{ts=1}^{2000}$ and $\{\hat{\sigma}^2_{ts}\}_{ts=1}^{2000}$.
(e) (6 marks) Repeat parts (b)–(d) for $n = 150$, $n = 250$ and $n = 500$.
(f) (3 marks) Comment on the magnitude of the difference between the mean of $\{\hat{\phi}_{1,ts}\}_{ts=1}^{2000}$
and $\phi_1$ as $n$ increases. Repeat this step for $\phi_2$ and $\sigma^2$.
Hint: You may set the seed of the random number generator using set.seed function in
R.
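
For Question 3(a), the map from a pair of complex-conjugate roots to the AR(2) coefficients can be verified numerically. The following is a sketch under assumptions: the angle $\omega = \pi/3$ is an arbitrary non-zero choice, and the variable names are illustrative. It relies on the factorization $\phi(z) = (1 - z/z_1)(1 - z/z_2)$, which gives $\phi_1 = 1/z_1 + 1/z_2$ and $\phi_2 = -1/(z_1 z_2)$.

```r
# Sketch: AR(2) coefficients from complex-conjugate roots z = r * exp(+/- i * omega).
r <- 3
omega <- pi / 3                # arbitrary non-zero choice (assumption)

# phi(z) = (1 - z/z1)(1 - z/z2) gives phi1 = 1/z1 + 1/z2 and phi2 = -1/(z1 * z2).
phi1 <- 2 * cos(omega) / r
phi2 <- -1 / r^2

# polyroot() takes coefficients in increasing order: 1 - phi1 * z - phi2 * z^2.
roots <- polyroot(c(1, -phi1, -phi2))
print(c(phi1 = phi1, phi2 = phi2))
print(Mod(roots))              # both moduli should be close to r
print(abs(Arg(roots)))         # both angles should be close to omega
```

Because both roots have modulus 3, which is greater than 1, the resulting process is stationary and can be passed to arima.sim.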
4. Consider the data set wmurders, which contains the annual totals of women murdered (per
100,000 standard population) in the United States. The data set is available through
the fpp2 package in R.
(a) (10 marks) Using time plots, ACF and PACF plots, suggest an appropriate ARIMA(p, d, q)
model for these data. Give reasons for your selection.
Note: You may consider applying a simple mathematical transformation to stabilize
the variance before fitting the models.
(b) (2 marks) Write the model you suggested in terms of the backward-shift operator.
(c) (4 marks) Fit the model using R and examine the residuals. Is the model satisfactory?
(d) (8 marks) Use the model fitted in part (c) to compute forecasts for the next 3 years
(use functions in R to perform this task). Manually compute these forecasts and
compare your results with those produced in R.
(e) (4 marks) Does the auto.arima function in R give the same model that you chose
in part (a)? If not, which model do you think is more appropriate? Give reasons for  