Main Examination period 2020 – January – Semester A
MTH6102: Bayesian Statistical Methods
Duration: 2 hours
Apart from this page, you are not permitted to read the contents of this question paper
until instructed to do so by an invigilator.
You should attempt ALL questions. Marks available are shown next to the questions.
Only non-programmable calculators that have been approved from the college list of
non-programmable calculators are permitted in this examination. Please state on your
answer book the name and type of machine used.
Complete all rough work in the answer book and cross through any work that is not to be
assessed.
Possession of unauthorised material at any time when under examination conditions is an
assessment offence and can lead to expulsion from QMUL. Check now to ensure you do not
have any unauthorised notes, mobile phones, smartwatches or unauthorised electronic devices
on your person. If you do, raise your hand and give them to an invigilator immediately.
It is also an offence to have any writing of any kind on your person, including on your body. If
you are found to have hidden unauthorised material elsewhere, including toilets and
cloakrooms, it will be treated as being found in your possession. Unauthorised material found
on your mobile phone or other electronic device will be considered the same as being in
possession of paper notes. A mobile phone that causes a disruption in the exam is also an
assessment offence.
Exam papers must not be removed from the examination room.
Examiners: J. Griffin, L. Pettit
© Queen Mary University of London (2020)
Question 1 [12 marks].
A box contains m = 5 balls, of which r are red and the rest black. The unknown quantity is r.
Our prior distribution is that each value r = 0, 1, ..., m has equal probability. We are told that
twice, a ball was taken out and immediately replaced, and both times the ball was red.
(a) Write down the likelihood for the observed data. What is the maximum likelihood
estimate for r? [4]
(b) Derive the normalized posterior distribution for r. What is the posterior mean for r? [5]
(c) Find the posterior predictive probability that if another ball is taken from the box, it is
black. [3]
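For checking working on Question 1 numerically, the following R sketch enumerates r = 0, ..., 5. It assumes the likelihood of two red draws with replacement is (r/m)^2; the variable names (lik, post, p_black and so on) are illustrative and not part of the paper.

```r
# Sketch for Question 1: enumerate the possible values of r.
m <- 5
r <- 0:m
prior <- rep(1 / (m + 1), m + 1)         # uniform prior on r = 0, ..., m
lik <- (r / m)^2                         # two red draws with replacement (assumed form)
post <- prior * lik / sum(prior * lik)   # normalized posterior
r_mle <- r[which.max(lik)]               # value of r that maximizes the likelihood
post_mean <- sum(r * post)               # posterior mean of r
p_black <- sum((1 - r / m) * post)       # predictive probability that the next ball is black
print(round(post, 4))
print(c(mle = r_mle, post_mean = post_mean, p_black = p_black))
```

Because r takes only six values, every sum here is exact up to floating-point rounding, so the output can be compared directly with a hand calculation.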
Question 2 [34 marks].
A biased coin with probability q of landing heads is repeatedly tossed until the first head is
seen. The number of tails X before the first head is modelled as a geometric distribution with
probability mass function $P(X = x) = q(1-q)^x$. The experiment was repeated n times and
$x_1, x_2, \dots, x_n$ tails were observed.
(a) Write down the likelihood for q. Show that the maximum likelihood estimate for q is
$\hat{q} = \frac{n}{n + S}$, where $S = \sum_{i=1}^{n} x_i$. [6]
(b) Find the Fisher information and hence the asymptotic variance for $\hat{q}$. [5]
(c) A Beta($\alpha_0$, $\beta_0$) distribution is chosen as the prior distribution for q. Show that the
posterior distribution is Beta($\alpha_1$, $\beta_1$), where you should determine $\alpha_1$ and $\beta_1$. [6]
(d) We have n = 5 and observed data $x_1, \dots, x_n = 4, 2, 5, 6, 3$.
(i) What is the maximum likelihood estimate $\hat{q}$? [3]
(ii) Find an approximate 95% confidence interval for q. [4]
(iii) Before seeing the data, our probability distribution for q has mean 0.4 and
standard deviation 0.2. Find values of $\alpha_0$ and $\beta_0$ corresponding to this belief. What
is then the posterior distribution for q? What is the posterior mean? [8]
(iv) Comment on the posterior mean compared to the maximum likelihood estimate
and the prior mean for this example. No further calculations or formulae are
needed here. [2]
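The numerical parts of Question 2(d) can be checked with a short R sketch. It assumes that the Fisher information for n geometric observations works out to n / (q^2 (1 - q)), so that the asymptotic variance is q^2 (1 - q) / n, and that the Beta prior updates to Beta(α0 + n, β0 + S). The names se, ci, total, a0, b0, a1 and b1 are illustrative.

```r
# Sketch for Question 2(d): MLE, Wald interval and conjugate Beta update.
x <- c(4, 2, 5, 6, 3)
n <- length(x)
S <- sum(x)
q_hat <- n / (n + S)                       # maximum likelihood estimate

# Assumed asymptotic variance q^2 (1 - q) / n (inverse Fisher information)
se <- sqrt(q_hat^2 * (1 - q_hat) / n)
ci <- q_hat + c(-1, 1) * 1.96 * se         # approximate 95% confidence interval

# Match a Beta(a0, b0) prior to the stated mean and standard deviation,
# using mean = a0 / (a0 + b0) and variance = mean (1 - mean) / (a0 + b0 + 1)
prior_mean <- 0.4
prior_sd <- 0.2
total <- prior_mean * (1 - prior_mean) / prior_sd^2 - 1   # a0 + b0
a0 <- prior_mean * total
b0 <- (1 - prior_mean) * total

a1 <- a0 + n                               # posterior Beta parameters
b1 <- b0 + S
post_mean <- a1 / (a1 + b1)
print(c(q_hat = q_hat, lower = ci[1], upper = ci[2]))
print(c(a0 = a0, b0 = b0, a1 = a1, b1 = b1, post_mean = post_mean))
```

The Beta mean and variance formulas used for the prior matching are the ones listed in the appendix table.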
Question 3 [26 marks].
We want to estimate a single unknown parameter θ in a certain model. Assume that in R we
have defined a function log_post to calculate the log of the unnormalized posterior density as
a function of θ. This function and the data y being analysed are not shown in the code extract
below. The posterior density is p(θ | y). Consider the following R code:
nb = 1000
nm = 10000
theta = vector(length=nm)
s = 0.4
theta0 = 2
log_post0 = log_post(theta0)
for(i in 1:(nb+nm)){
  theta1 = rnorm(1, mean=theta0, sd=s)
  log_post1 = log_post(theta1)
  if(log(runif(1)) < log_post1-log_post0){
    theta0 = theta1
    log_post0 = log_post1
  }
  if(i>nb) theta[i-nb] = theta0
}
stheta = sort(theta)
stheta[nm/2]
stheta[nm*0.025]
stheta[nm*0.975]
Except where stated, an explanation in words is all that is needed for this question.
(a) What is the name of the algorithm that the code is carrying out? [3]
(b) Explain what the command theta1 = rnorm(1, mean=theta0, sd=s) is doing in
the context of the algorithm. [4]
(c) Explain what the command if(log(runif(1)) < log_post1-log_post0) is doing
in the context of the algorithm. In your answer, include a formula involving p(θ | y) that
the code is implementing. [5]
(d) What are the effects on the behaviour of the algorithm of making the variable called s
smaller? What are the effects of making it larger? [4]
(e) What is the purpose of the variable called nb? [2]
(f) When the code has run, what will the vector theta contain? [2]
(g) In statistical terms, what will the command stheta[nm/2] output? [2]
(h) In statistical terms, what will the last two lines of code output? [4]
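To experiment with the Question 3 code, a stand-in for log_post is needed, since the paper does not show the real one. The sketch below assumes a hypothetical target whose unnormalized log density is that of N(3, 1); the seed, the counter n_accept and the names med, lo, hi and acc_rate are additions for illustration.

```r
# Runnable sketch of the Question 3 code with a stand-in target.
# ASSUMPTION: this log_post is hypothetical; the paper does not show its own.
log_post <- function(theta) -0.5 * (theta - 3)^2   # unnormalized N(3, 1) log density

set.seed(1)                      # fixed seed so the run is reproducible
nb <- 1000
nm <- 10000
s <- 0.4
theta <- numeric(nm)
theta0 <- 2
log_post0 <- log_post(theta0)
n_accept <- 0                    # counts how often the candidate value is kept
for (i in 1:(nb + nm)) {
  theta1 <- rnorm(1, mean = theta0, sd = s)
  log_post1 <- log_post(theta1)
  if (log(runif(1)) < log_post1 - log_post0) {
    theta0 <- theta1
    log_post0 <- log_post1
    n_accept <- n_accept + 1
  }
  if (i > nb) theta[i - nb] <- theta0
}
stheta <- sort(theta)
med <- stheta[nm / 2]
lo <- stheta[nm * 0.025]
hi <- stheta[nm * 0.975]
acc_rate <- n_accept / (nb + nm)
print(c(med = med, lo = lo, hi = hi, acc_rate = acc_rate))
```

Rerunning with different values of s and watching acc_rate and a trace plot such as plot(theta, type = "l") is one way to explore part (d) empirically.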
Question 4 [17 marks].
The observed data $y = \{y_{ij} : i = 1, \dots, n,\ j = 1, \dots, m_i\}$ are the recorded counts of a disease in
district j within county i. The population of each district is $N_{ij}$. The following hierarchical
model is considered reasonable:
$y_{ij} \sim \text{Poisson}(\lambda_i N_{ij}), \quad j = 1, \dots, m_i$
$\lambda_i \sim \text{Gamma}(\alpha, \beta), \quad i = 1, \dots, n.$
α and β are unknown parameters which are given a prior distribution p(α, β).
Suppose that we have generated a sample of size M from the joint posterior distribution
$p(\alpha, \beta, \lambda_1, \dots, \lambda_n \mid y)$.
(a) How would we obtain a sample from the marginal posterior distribution p(α,β | y) using
the joint posterior sample? How would we estimate the posterior mean for α/β? [5]
(b) Explain how to generate a sample from the posterior predictive distribution of the
disease count for a district not in our dataset with population P, in each of the following
two cases: if the county containing the district is in our dataset; or if the county is not in
our dataset. In the latter case, how would we estimate the posterior predictive
probability that the disease count in this district will be zero? [8]
(c) Give two reasons why in general we might want to use a hierarchical model instead of a
single-level model. [4]
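The sampling steps asked about in Question 4 can be sketched in R. No real posterior sample is available here, so the draws of alpha, beta and lambda below are placeholders generated purely for illustration; the number of counties n = 3, the population P = 1500 and all variable names are likewise hypothetical.

```r
# Sketch for Question 4: using joint posterior draws for summaries and prediction.
# ASSUMPTION: the draws below are placeholders, not a real posterior sample;
# in practice they would come from an MCMC run on the actual data.
set.seed(1)
M <- 5000
n <- 3                                        # hypothetical number of counties
alpha <- rgamma(M, shape = 20, rate = 10)     # stand-in draws of alpha
beta <- rgamma(M, shape = 2000, rate = 1)     # stand-in draws of beta
# Row k holds (lambda_1, ..., lambda_n) for draw k; alpha and beta are recycled by row
lambda <- matrix(rgamma(M * n, shape = alpha, rate = beta), nrow = M)

# Keeping only the (alpha, beta) columns gives draws from p(alpha, beta | y)
ratio_mean <- mean(alpha / beta)              # Monte Carlo estimate of E(alpha / beta | y)

P <- 1500                                     # population of the new district
# Case 1: the county (say i = 1) is in the dataset, so reuse its lambda draws
y_known <- rpois(M, lambda[, 1] * P)
# Case 2: the county is not in the dataset, so draw a new lambda for each (alpha, beta)
lambda_new <- rgamma(M, shape = alpha, rate = beta)
y_new <- rpois(M, lambda_new * P)
p_zero <- mean(y_new == 0)                    # estimate of P(count = 0 | y)
print(c(ratio_mean = ratio_mean, p_zero = p_zero))
```

Each predictive draw uses one joint posterior draw, so y_known and y_new carry the posterior uncertainty in the parameters as well as the Poisson sampling variation.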
Question 5 [11 marks].
Two models $M_1$ and $M_2$ are under consideration, with corresponding parameters θ and ψ. θ is
a single parameter with unbounded range. For the prior distribution $p(\theta \mid M_1)$, we assign a
normal distribution $N(0, \sigma^2)$ with an extremely large value of σ so that the prior is practically
flat over the range supported by the likelihood. We also assign a prior distribution $p(\psi \mid M_2)$.
The observed data is y.
(a) State the formula for the Bayes factor $B_{12}$ for comparing the models, in which large
values of $B_{12}$ favour model $M_1$. [5]
(b) For inference conditional upon model $M_1$, what is the effect on the posterior mean for θ
if we replace σ with 1000σ in $p(\theta \mid M_1)$? [3]
(c) What is the effect on $B_{12}$ if we replace σ with 1000σ in $p(\theta \mid M_1)$? [3]
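Parts (b) and (c) of Question 5 can be explored with a toy model in R. The model is an assumption and does not come from the paper: under M1, a single observation y ~ N(θ, 1) with prior θ ~ N(0, σ^2), for which the posterior mean and the marginal likelihood p(y | M1) have closed forms. Since p(y | M2) does not involve σ, the ratio of the two marginal likelihoods is the factor by which the Bayes factor changes.

```r
# Sketch for Question 5 with a toy model (an assumption, not from the paper):
# under M1, y ~ N(theta, 1) with prior theta ~ N(0, sigma^2).
y <- 1.3                                   # hypothetical observation

# Posterior mean of theta given y under this conjugate normal model
post_mean <- function(sigma) y * sigma^2 / (1 + sigma^2)
# Marginal likelihood p(y | M1): y is N(0, 1 + sigma^2) after integrating out theta
marg_lik <- function(sigma) dnorm(y, mean = 0, sd = sqrt(1 + sigma^2))

sigma <- 100
mean_change <- post_mean(1000 * sigma) - post_mean(sigma)   # shift in posterior mean
ml_ratio <- marg_lik(sigma) / marg_lik(1000 * sigma)        # factor by which p(y | M1) shrinks
print(c(mean_change = mean_change, ml_ratio = ml_ratio))
```

In this toy case the posterior mean barely moves, while the marginal likelihood under M1 drops by a factor close to 1000.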
End of Paper – An appendix of 1 page follows.
Appendix: common distributions
For each distribution, x is the random quantity and the other symbols are parameters.
Discrete distributions
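Distribution | Probability mass function | Range of parameters and variates | Mean | Variance
Binomial | $\binom{n}{x} q^x (1-q)^{n-x}$ | $0 \le q \le 1$; $x = 0, 1, \dots, n$ | $nq$ | $nq(1-q)$
Poisson | $\frac{\lambda^x e^{-\lambda}}{x!}$ | $\lambda > 0$; $x = 0, 1, 2, \dots$ | $\lambda$ | $\lambda$
Geometric | $q(1-q)^x$ | $0 < q \le 1$; $x = 0, 1, 2, \dots$ | $\frac{1-q}{q}$ | $\frac{1-q}{q^2}$
Negative binomial | $\binom{r+x-1}{x} q^r (1-q)^x$ | $0 < q \le 1$, $r > 0$; $x = 0, 1, 2, \dots$ | $\frac{r(1-q)}{q}$ | $\frac{r(1-q)}{q^2}$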
Continuous distributions
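Distribution | Probability density function | Range of parameters and variates | Mean | Variance
Uniform | $\frac{1}{b-a}$ | $-\infty < a < b < \infty$; $a < x < b$ | $\frac{a+b}{2}$ | $\frac{(b-a)^2}{12}$
Normal $N(\mu, \sigma^2)$ | $\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ | $-\infty < \mu < \infty$, $\sigma > 0$; $-\infty < x < \infty$ | $\mu$ | $\sigma^2$
Normal $No(\mu, \tau)$ | $\frac{\sqrt{\tau}}{\sqrt{2\pi}} \exp\left(-\frac{\tau (x-\mu)^2}{2}\right)$ | $-\infty < \mu < \infty$, $\tau > 0$; $-\infty < x < \infty$ | $\mu$ | $\tau^{-1}$ (precision $\tau$)
Exponential | $\lambda e^{-\lambda x}$ | $\lambda > 0$; $x > 0$ | $\frac{1}{\lambda}$ | $\frac{1}{\lambda^2}$
Gamma | $\frac{\beta^{\alpha} x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)}$ | $\alpha > 0$, $\beta > 0$; $x > 0$ | $\frac{\alpha}{\beta}$ | $\frac{\alpha}{\beta^2}$
Beta | $\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha-1} (1-x)^{\beta-1}$ | $\alpha > 0$, $\beta > 0$; $0 < x < 1$ | $\frac{\alpha}{\alpha+\beta}$ | $\frac{\alpha\beta}{(\alpha+\beta)^2 (\alpha+\beta+1)}$
The 95th and 97.5th percentiles of the standard N(0,1) distribution are 1.64 and 1.96, respectively.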
End of Appendix.