MAST30027: Modern Applied Statistics
Assignment 3, 2022.
Due: 11:59pm Sunday October 2nd
• This assignment is worth 14% of your total mark.
• To get full marks, show your working, including 1) the R commands and outputs you use,
2) mathematical derivations, and 3) a rigorous explanation of why you reach your conclusions
or answers. If you provide only final answers, you will get zero marks.
• The assignment you hand in must be typed (except for math formulas) and submitted
via the LMS as a single PDF document only (no other formats allowed). For math formulas,
you may take a picture of them. Your answers must be clearly numbered and in the same
order as the assignment questions.
• The LMS will not accept late submissions. It is your responsibility to ensure that your
assignments are submitted correctly and on time, and problems with online submissions are
not a valid excuse for submitting a late or incorrect version of an assignment.
• We will mark a selected set of problems. We will select problems worth ≥ 50% of the full
marks listed.
• If you need an extension, please contact the tutor coordinator before the due date with
appropriate justification and supporting documents. Late assignments will only be accepted
if you have obtained an extension from the tutor coordinator before the due date. Under
no circumstances will an assignment be marked if solutions for it have been released. Please
DO NOT email the lecturer with extension requests.
• Also, please read the “Assessments” section in “Subject Overview” page of the LMS.
1. The file assignment3_prob1.txt contains 300 observations. We can read the observations
and make a histogram as follows.
> X = scan(file="assignment3_prob1.txt", what=double())
Read 300 items
> length(X)
[1] 300
> hist(X)
We will model the observed data using a mixture of three binomial distributions. Specifically,
we assume the observations $X_1, \dots, X_{300}$ are independent of each other, and each $X_i$ follows
this mixture model:
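$$
\begin{aligned}
Z_i &\sim \text{Categorical}(\pi_1, \pi_2, 1 - \pi_1 - \pi_2),\\
X_i \mid Z_i = 1 &\sim \text{Binomial}(20, p_1),\\
X_i \mid Z_i = 2 &\sim \text{Binomial}(20, p_2),\\
X_i \mid Z_i = 3 &\sim \text{Binomial}(20, p_3).
\end{aligned}
$$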
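The generative process above can be made concrete with a short simulation. The following is a minimal R sketch using base R's `sample()` and `rbinom()`. The function name `sim.mixture` and the parameter values in `theta.sim` are hypothetical and chosen only for illustration. They are not estimates from assignment3_prob1.txt.

```r
# Hypothetical parameter values, for illustration only (NOT fitted to the data).
theta.sim <- list(pi = c(0.3, 0.3, 0.4), p = c(0.2, 0.5, 0.7))

# Draw n observations from the three-component binomial mixture:
# first the latent label Z_i, then X_i | Z_i = k ~ Binomial(size, p_k).
sim.mixture <- function(n, theta, size = 20) {
  Z <- sample(1:3, size = n, replace = TRUE, prob = theta$pi)
  X <- rbinom(n, size = size, prob = theta$p[Z])
  list(X = X, Z = Z)
}

set.seed(1)
sim <- sim.mixture(300, theta.sim)
table(sim$Z)  # number of simulated draws from each component
```

In real data only `X` is observed. The labels `Z` are the missing data that the EM algorithm averages over.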
The binomial distribution has probability mass function
$$
f(x; m, p) = \binom{m}{x} p^x (1 - p)^{m - x}.
$$
We aim to obtain the MLE of the parameters $\theta = (\pi_1, \pi_2, p_1, p_2, p_3)$ using the EM algorithm.
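The EM algorithm maximises the incomplete (observed-data) log-likelihood, $\ell(\theta) = \sum_{i=1}^{300} \log \sum_{k=1}^{3} \pi_k f(X_i; 20, p_k)$ with $\pi_3 = 1 - \pi_1 - \pi_2$. Below is a minimal R sketch of this function. It relies on base R's `dbinom()`, which the first line checks against the pmf $f(x; m, p)$ given above. The name `mix.loglik` and the layout `theta = c(pi1, pi2, p1, p2, p3)` are illustrative assumptions, not part of the assignment.

```r
# Check that R's dbinom() matches f(x; m, p) = choose(m, x) p^x (1 - p)^(m - x).
x <- 0:20
stopifnot(isTRUE(all.equal(dbinom(x, 20, 0.3), choose(20, x) * 0.3^x * 0.7^(20 - x))))

# Incomplete log-likelihood of the three-component binomial mixture.
# theta = c(pi1, pi2, p1, p2, p3); pi3 is implied as 1 - pi1 - pi2.
mix.loglik <- function(X, theta, size = 20) {
  w <- c(theta[1], theta[2], 1 - theta[1] - theta[2])
  # n x 3 matrix whose (i, k) entry is pi_k * f(X_i; size, p_k)
  dens <- outer(X, 1:3, function(x, k) w[k] * dbinom(x, size, theta[2 + k]))
  sum(log(rowSums(dens)))
}
```

With the data read by the `scan()` call above, `mix.loglik(X, c(0.3, 0.3, 0.2, 0.5, 0.7))` evaluates $\ell(\theta)$ at that parameter vector.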
(a) (5 marks) Let $X = (X_1, \dots, X_{300})$ and $Z = (Z_1, \dots, Z_{300})$. Derive the expectation
of the complete log-likelihood, $Q(\theta, \theta^0) = E_{Z \mid X, \theta^0}[\log P(X, Z \mid \theta)]$.
(b) (3 marks) Derive the E-step of the EM algorithm.
(c) (5 marks) Derive the M-step of the EM algorithm.
(d) (5 marks) Note: Your answer for this problem should be typed. Answers
that include screen-captured R code or figures won't be marked.
Implement the EM algorithm and obtain the MLE of the parameters by applying the implemented
algorithm to the observed data, $X_1, \dots, X_{300}$. Set the EM iterations to stop when either
the number of EM iterations reaches 100 (max.iter = 100) or the incomplete log-likelihood
has changed by less than 0.00001 ($\epsilon = 0.00001$). Run the EM algorithm twice with
the following two sets of initial values, and report the estimates with the highest incomplete
log-likelihood.
| | $\pi_1$ | $\pi_2$ | $p_1$ | $p_2$ | $p_3$ |
|---|---|---|---|---|---|
| 1st initial values | 0.3 | 0.3 | 0.2 | 0.5 | 0.7 |
| 2nd initial values | 0.1 | 0.2 | 0.1 | 0.3 | 0.7 |
For each EM run, check that the incomplete log-likelihood increases at each EM step by
plotting it.
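For orientation, here is a minimal R sketch of one way to implement parts (b) to (d). It uses the standard EM updates for a finite binomial mixture:

- E-step: $\gamma_{ik} = \pi_k f(X_i; 20, p_k) / \sum_j \pi_j f(X_i; 20, p_j)$.
- M-step: $\pi_k = \frac{1}{n}\sum_i \gamma_{ik}$ and $p_k = \sum_i \gamma_{ik} X_i / (20 \sum_i \gamma_{ik})$.

These formulas are stated as assumptions to check against your own derivation, not as the official solution. The helper names (`weighted.dens`, `mix.loglik`, `e.step`, `m.step`, `run.em`) are hypothetical. The demonstration runs on simulated data with made-up parameters, not on assignment3_prob1.txt.

```r
# theta = c(pi1, pi2, p1, p2, p3); pi3 = 1 - pi1 - pi2 throughout.

# n x 3 matrix with entries pi_k * f(X_i; size, p_k).
weighted.dens <- function(X, theta, size = 20) {
  w <- c(theta[1], theta[2], 1 - theta[1] - theta[2])
  outer(X, 1:3, function(x, k) w[k] * dbinom(x, size, theta[2 + k]))
}

# Incomplete log-likelihood: sum_i log sum_k pi_k f(X_i; size, p_k).
mix.loglik <- function(X, theta, size = 20) {
  sum(log(rowSums(weighted.dens(X, theta, size))))
}

# E-step: gamma[i, k] = P(Z_i = k | X_i, current theta).
e.step <- function(X, theta, size = 20) {
  dens <- weighted.dens(X, theta, size)
  dens / rowSums(dens)
}

# M-step: pi_k = mean_i gamma[i, k];
#         p_k  = sum_i gamma[i, k] X_i / (size * sum_i gamma[i, k]).
m.step <- function(X, gamma, size = 20) {
  Nk <- colSums(gamma)                       # expected count in each component
  p.new <- colSums(gamma * X) / (size * Nk)  # gamma * X scales row i by X_i
  c(Nk[1:2] / length(X), p.new)
}

# Iterate until max.iter EM steps or |change in log-likelihood| < eps.
run.em <- function(X, theta0, max.iter = 100, eps = 1e-5) {
  theta <- theta0
  ll <- mix.loglik(X, theta)
  for (iter in seq_len(max.iter)) {
    theta <- m.step(X, e.step(X, theta))
    ll <- c(ll, mix.loglik(X, theta))
    if (abs(ll[length(ll)] - ll[length(ll) - 1]) < eps) break
  }
  names(theta) <- c("pi1", "pi2", "p1", "p2", "p3")
  list(theta = theta, loglik = ll)
}

# Demonstration on simulated data (hypothetical parameters, NOT the assignment data).
set.seed(2022)
Z.sim <- sample(1:3, 300, replace = TRUE, prob = c(0.3, 0.3, 0.4))
X.sim <- rbinom(300, 20, c(0.2, 0.5, 0.7)[Z.sim])
fit <- run.em(X.sim, c(0.3, 0.3, 0.2, 0.5, 0.7))
plot(fit$loglik, type = "b", xlab = "Iteration (1 = initial values)",
     ylab = "Incomplete log-likelihood")
# EM should never decrease the incomplete log-likelihood (up to rounding error).
stopifnot(all(diff(fit$loglik) > -1e-8))
```

To apply the sketch to the assignment data:

1. Read `X` with the `scan()` call shown above.
2. Call `run.em(X, c(0.3, 0.3, 0.2, 0.5, 0.7))` and `run.em(X, c(0.1, 0.2, 0.1, 0.3, 0.7))`.
3. Keep the run whose final `loglik` value is larger.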
2. The file assignment3_prob2.txt contains 100 observations. We can read the 300 observations
from Problem 1 and the 100 new observations and make histograms as follows.
> X = scan(file="assignment3_prob1.txt", what=double())
Read 300 items
> X.more = scan(file="assignment3_prob2.txt", what=double())
Read 100 items
> length(X)
[1] 300
> length(X.more)
[1] 100
> par(mfrow=c(2,2))
> hist(X, xlim=c(0,20), ylim=c(0,80))
> hist(X.more, xlim=c(0,20), ylim=c(0,80))
> hist(c(X,X.more), xlim=c(0,20), ylim=c(0,80), xlab="X + X.more", main = "Histogram of X + X.more")
Let $X_1, \dots, X_{300}$ and $X_{301}, \dots, X_{400}$ denote the 300 observations from assignment3_prob1.txt
and the 100 observations from assignment3_prob2.txt, respectively. We assume the
observations $X_1, \dots, X_{400}$ are independent of each other. We model $X_1, \dots, X_{300}$ (from
assignment3_prob1.txt) using the mixture of three binomial distributions (as we did in
Problem 1), but we model $X_{301}, \dots, X_{400}$ (from assignment3_prob2.txt) using one of the
three binomial distributions. Specifically, for $i = 1, \dots, 300$, $X_i$ follows this mixture model:
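$$
\begin{aligned}
Z_i &\sim \text{Categorical}(\pi_1, \pi_2, 1 - \pi_1 - \pi_2),\\
X_i \mid Z_i = 1 &\sim \text{Binomial}(20, p_1),\\
X_i \mid Z_i = 2 &\sim \text{Binomial}(20, p_2),\\
X_i \mid Z_i = 3 &\sim \text{Binomial}(20, p_3),
\end{aligned}
$$
and for $i = 301, \dots, 400$,
$$
X_i \sim \text{Binomial}(20, p_1).
$$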
We aim to obtain the MLE of the parameters $\theta = (\pi_1, \pi_2, p_1, p_2, p_3)$ using the EM algorithm.
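Under this model, the incomplete log-likelihood is the mixture term for the first 300 observations plus a plain binomial term for the last 100:

$$\ell(\theta) = \sum_{i=1}^{300} \log \sum_{k=1}^{3} \pi_k f(X_i; 20, p_k) + \sum_{i=301}^{400} \log f(X_i; 20, p_1).$$

Below is a minimal R sketch of this objective. The name `loglik2` and the argument layout are hypothetical. `X` and `X.more` are assumed to hold the two files' observations, as read by the commands above.

```r
# Incomplete log-likelihood for Problem 2.
# X: observations modelled by the three-component mixture (X_1, ..., X_300).
# X.more: observations modelled as Binomial(size, p1) (X_301, ..., X_400).
# theta = c(pi1, pi2, p1, p2, p3), with pi3 = 1 - pi1 - pi2.
loglik2 <- function(X, X.more, theta, size = 20) {
  w <- c(theta[1], theta[2], 1 - theta[1] - theta[2])
  dens <- outer(X, 1:3, function(x, k) w[k] * dbinom(x, size, theta[2 + k]))
  sum(log(rowSums(dens))) + sum(dbinom(X.more, size, theta[3], log = TRUE))
}
```

For example, `loglik2(X, X.more, c(0.3, 0.3, 0.2, 0.5, 0.7))` evaluates $\ell(\theta)$ at that parameter vector. Note that $p_1$ (`theta[3]`) is the only parameter that appears in both terms.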
(a) (5 marks) Let $X = (X_1, \dots, X_{400})$ and $Z = (Z_1, \dots, Z_{300})$. Derive the expectation
of the complete log-likelihood, $Q(\theta, \theta^0) = E_{Z \mid X, \theta^0}[\log P(X, Z \mid \theta)]$.
(b) (5 marks) Derive the E-step and M-step of the EM algorithm.
(c) (5 marks) Note: Your answer for this problem should be typed. Answers
that include screen-captured R code or figures won't be marked.
Implement the EM algorithm and obtain the MLE of the parameters by applying the implemented
algorithm to the observed data, $X_1, \dots, X_{400}$. Set the EM iterations to stop when either
the number of EM iterations reaches 100 (max.iter = 100) or the incomplete log-likelihood
has changed by less than 0.00001 ($\epsilon = 0.00001$). Run the EM algorithm twice with
the following two sets of initial values, and report the estimates with the highest incomplete
log-likelihood.
| | $\pi_1$ | $\pi_2$ | $p_1$ | $p_2$ | $p_3$ |
|---|---|---|---|---|---|
| 1st initial values | 0.3 | 0.3 | 0.2 | 0.5 | 0.7 |
| 2nd initial values | 0.1 | 0.2 | 0.1 | 0.3 | 0.7 |
For each EM run, check that the incomplete log-likelihood increases at each EM step by
plotting it.
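A minimal R sketch of how the EM updates change for Problem 2 follows. Only $X_1, \dots, X_{300}$ carry latent labels, so the E-step is the same as in Problem 1 and $\pi_k$ is averaged over those 300 responsibilities.

Setting $\partial Q / \partial p_1 = 0$ under the stated model suggests the pooled update

$$p_1 = \frac{\sum_{i \le 300} \gamma_{i1} X_i + \sum_{i=301}^{400} X_i}{20\left(\sum_{i \le 300} \gamma_{i1} + 100\right)}.$$

The updates for $p_2$ and $p_3$ are unchanged from Problem 1. Treat these formulas as assumptions to verify against your own derivation. The helper names (`weighted.dens`, `loglik2`, `e.step2`, `m.step2`, `run.em2`) are hypothetical. The demonstration uses simulated data with made-up parameters, not the assignment files.

```r
# theta = c(pi1, pi2, p1, p2, p3); pi3 = 1 - pi1 - pi2.
# X: 300 mixture observations; X.more: 100 observations ~ Binomial(size, p1).

# n x 3 matrix with entries pi_k * f(X_i; size, p_k).
weighted.dens <- function(X, theta, size = 20) {
  w <- c(theta[1], theta[2], 1 - theta[1] - theta[2])
  outer(X, 1:3, function(x, k) w[k] * dbinom(x, size, theta[2 + k]))
}

# Incomplete log-likelihood: mixture part plus the single-binomial part.
loglik2 <- function(X, X.more, theta, size = 20) {
  sum(log(rowSums(weighted.dens(X, theta, size)))) +
    sum(dbinom(X.more, size, theta[3], log = TRUE))
}

# E-step: responsibilities for the 300 mixture observations only;
# X.more has no latent label, so it does not enter this step.
e.step2 <- function(X, theta, size = 20) {
  dens <- weighted.dens(X, theta, size)
  dens / rowSums(dens)
}

# M-step: pi_k, p2 and p3 as in Problem 1; p1 pools the expected successes
# of component 1 with all successes in X.more.
m.step2 <- function(X, X.more, gamma, size = 20) {
  Nk <- colSums(gamma)
  Sk <- colSums(gamma * X)  # expected number of successes per component
  p.new <- Sk / (size * Nk)
  p.new[1] <- (Sk[1] + sum(X.more)) / (size * (Nk[1] + length(X.more)))
  c(Nk[1:2] / length(X), p.new)
}

# Iterate until max.iter EM steps or |change in log-likelihood| < eps.
run.em2 <- function(X, X.more, theta0, max.iter = 100, eps = 1e-5) {
  theta <- theta0
  ll <- loglik2(X, X.more, theta)
  for (iter in seq_len(max.iter)) {
    theta <- m.step2(X, X.more, e.step2(X, theta))
    ll <- c(ll, loglik2(X, X.more, theta))
    if (abs(ll[length(ll)] - ll[length(ll) - 1]) < eps) break
  }
  names(theta) <- c("pi1", "pi2", "p1", "p2", "p3")
  list(theta = theta, loglik = ll)
}

# Demonstration on simulated data (hypothetical parameters, NOT the assignment data).
set.seed(2022)
Z.sim <- sample(1:3, 300, replace = TRUE, prob = c(0.3, 0.3, 0.4))
X.sim <- rbinom(300, 20, c(0.2, 0.5, 0.7)[Z.sim])
X.more.sim <- rbinom(100, 20, 0.2)
fit2 <- run.em2(X.sim, X.more.sim, c(0.3, 0.3, 0.2, 0.5, 0.7))
plot(fit2$loglik, type = "b", xlab = "Iteration (1 = initial values)",
     ylab = "Incomplete log-likelihood")
# EM should never decrease the incomplete log-likelihood (up to rounding error).
stopifnot(all(diff(fit2$loglik) > -1e-8))
```

With the real data, call `run.em2(X, X.more, theta0)` once for each row of the initial-value table. Then keep the run whose final `loglik` value is larger.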