Australian National University RSFAS, College of Business and Economics
INTRODUCTION TO BAYESIAN DATA ANALYSIS (STAT3016/4116/7016)
SEMESTER 2 2021
ASSIGNMENT 1
DUE DATE: Friday 3 September 2021, by 11:59pm
(15% of total course grade)
(Total Marks: 55 (STAT3016); 70 (STAT4116/7016))
INSTRUCTIONS:
1. All students must hand in an assignment of their own writing.
2. The assignment should be submitted using the online submission facility Turnitin on
the course Wattle site under ‘ASSIGNMENTS/Assignment 1’.
3. Begin each question on a new page. The questions are not equally weighted.
4. Typed solutions are preferred. However, you may scan and submit handwritten
solutions to problems which require mathematical derivations. Be sure your handwritten
work is legible on the computer once scanned.
5. Where required, provide sufficient computer output to support your answers. Provide
enough intermediate numerical calculations to justify working for your final answer.
6. Computer output must be interpreted in written format. A solution solely highlighting
the computer output is not acceptable.
7. No late assignments will be accepted unless permission has been obtained from the
course convenor before the due date and time.
COLLABORATION POLICY
University policies on plagiarism will be strictly enforced. You are encouraged to (orally)
discuss your assignments with your classmates, but each student must write up solutions
separately. Be sure that you have worked through each problem yourself and that all
answers you submit are the results of your own efforts. This includes all computer code
and output. The submission facility Turnitin will provide a similarity score after matching
your submission against other student submissions and external sources.
Problem 1 [10 marks]
How many ambulance vehicles are there in Canberra? Suppose I know there cannot be
more than 60 ambulance vehicles in Canberra. Whilst driving around last week I observed
six ambulances numbered 13, 14, 21, 26, 37 and 38. So the sample size is n = 6. I assume
that ambulances in Canberra are numbered from 1 to N, and that I am equally likely to
observe any numbered ambulance at any time. I also assume observations are independent.
To solve this problem, suppose one takes independent observations y1, ..., yn from a discrete
uniform distribution on the set {1, 2, ..., N}, where the upper bound N is unknown. Suppose
one places a uniform discrete prior for N on the values 1, ..., B, where B is known.
(a) [3 marks] Derive the posterior distribution of N up to a proportionality constant. Be
sure to specify the bounds on the parameter space of N in your posterior distribution.
(b) [3 marks] Compute posterior probabilities of N over a grid of values.
(c) [2 marks] Compute the posterior mean and posterior standard deviation of N.
(d) [2 marks] Find the posterior probability that there are more than 50 ambulance
vehicles in Canberra.
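
The grid computation for Problem 1 can be sketched in Python as follows. This is a minimal illustration, not a model answer. It assumes B = 60 (taken from the stated upper bound) and that the posterior is proportional to N^(-n) for max(y) ≤ N ≤ B and zero elsewhere, which is what the discrete uniform likelihood and the uniform prior imply. The function name `ambulance_posterior` is hypothetical.

```python
import math

def ambulance_posterior(y, B):
    """Posterior of N on a grid, assuming p(N | y) is proportional to N**(-n)
    for max(y) <= N <= B and zero elsewhere (uniform prior on 1..B)."""
    n = len(y)
    grid = list(range(max(y), B + 1))
    weights = [N ** (-n) for N in grid]
    total = sum(weights)
    return {N: w / total for N, w in zip(grid, weights)}

y = [13, 14, 21, 26, 37, 38]          # observed ambulance numbers
post = ambulance_posterior(y, B=60)   # B = 60 is the assumed upper bound

# Posterior summaries computed directly from the grid probabilities.
post_mean = sum(N * p for N, p in post.items())
post_sd = math.sqrt(sum((N - post_mean) ** 2 * p for N, p in post.items()))
prob_over_50 = sum(p for N, p in post.items() if N > 50)

print(round(post_mean, 2), round(post_sd, 2), round(prob_over_50, 3))
```

Because the support is a finite set of integers, the normalising constant is just the sum of the unnormalised weights, so no numerical integration is needed.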
Problem 2 [5 marks]
Suppose for a binary sampling problem we plan on using a uniform, or Beta(1,1) prior
for the population proportion θ. Perhaps our reasoning is that this represents “no prior
information about θ”. However, some people like to look at proportions on the log-odds
scale, that is, they are interested in $\gamma = \log\left(\frac{\theta}{1-\theta}\right)$. Via Monte Carlo sampling or otherwise,
find the prior distribution for γ that is induced by the uniform prior for θ. Is this prior
informative about γ?
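
One way to explore the induced prior by Monte Carlo is sketched below in Python. The sketch draws θ from Uniform(0, 1), transforms each draw to the log-odds scale, and compares a simulated probability with the value from the standard logistic distribution, whose density e^γ/(1 + e^γ)^2 follows from the change of variables θ = 1/(1 + e^(-γ)). The function name `sample_log_odds` and the cut-off of 3 are illustrative choices only.

```python
import math
import random

random.seed(1)

def sample_log_odds(num_draws):
    """Draw theta ~ Uniform(0, 1) and return gamma = log(theta / (1 - theta))."""
    draws = []
    while len(draws) < num_draws:
        theta = random.random()
        if 0.0 < theta < 1.0:  # guard against log(0)
            draws.append(math.log(theta / (1.0 - theta)))
    return draws

gammas = sample_log_odds(100_000)

# Share of prior mass with |gamma| < 3, by simulation and from the
# standard logistic CDF F(g) = 1 / (1 + exp(-g)).
mc_share = sum(abs(g) < 3 for g in gammas) / len(gammas)
exact_share = 1 / (1 + math.exp(-3)) - 1 / (1 + math.exp(3))
print(round(mc_share, 3), round(exact_share, 3))
```

A histogram of `gammas` would show where the prior mass on the log-odds scale concentrates, which is the basis for judging whether the prior is informative about γ.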
Problem 3 [15 marks]
The speed limit on Gungahlin Drive in Canberra between Barton Highway and the Glenloch
interchange is 90km/h. Ben frequently drives on Gungahlin Drive and typically drives at
a constant speed of 90km/h (that is, at the speed limit) on this section of road. One day,
he passes 3 cars and gets passed by 17 cars on this section of road.
Suppose that car speeds on this section of road are normally distributed with unknown
mean µ and known standard deviation σ = 4.5. Let s = 3 denote the number of cars that
Ben overtakes and let their unobserved car speeds be y1, y2, y3. Let t = 17 denote the
number of cars that overtake Ben and let their unobserved car speeds be y4, y5, ..., y20.
(a) [3 marks] Assign the unknown mean µ a flat prior density. Write down the mathematical
expression for the posterior density of µ (that is, p(µ|σ, s, t, y)) up to a proportionality
constant. Hint: the actual car speeds yi (i = 1, ..., 20) are not observed, but if Ben
passes, say, Car A, then we know that the speed of Car A must be less than 90km/h.
Similarly, if Car B passes Ben, then we know that the speed of Car B must be greater
than 90km/h.
(b) [2 marks] Plot the posterior density of µ.
(c) [1 mark] Using the density found in part (b), provide a 95% interval estimate for
the average speed at which cars travel along this section of Gungahlin Drive between
Barton Highway and the Glenloch interchange.
(d) [1 mark] Estimate the probability that the average speed of the cars on this section
of Gungahlin Drive exceeds the 90km/h speed limit.
(e) [2 marks] Now let’s assume σ is unknown. Assume the non-informative joint prior
distribution $p(\mu, \sigma^2) \propto (\sigma^2)^{-1}$. Derive the joint posterior distribution of $(\mu, \sigma^2)$ up
to a proportionality constant.
(f) [3 marks] Create a contour plot of the joint posterior density of µ and $\sigma^2$.
(g) [3 marks] Using the joint density found in part (f), provide a 95% interval estimate for
the average speed at which cars travel along this section of Gungahlin Drive between
Barton Highway and the Glenloch interchange. How does your answer compare to
your previous answer in part (c)?
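
For the known-σ case in parts (a) to (d), a grid approximation can be sketched in Python as below. It assumes that each overtaken car contributes a factor Φ((90 − µ)/σ) and each overtaking car a factor 1 − Φ((90 − µ)/σ), so that under a flat prior the posterior of µ is proportional to Φ((90 − µ)/σ)^s [1 − Φ((90 − µ)/σ)]^t. The grid limits (80 to 110), the number of steps, and all function names are hypothetical choices for illustration.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mu_posterior_grid(s, t, limit, sigma, lo, hi, steps):
    """Normalised grid approximation, assuming the posterior of mu is
    proportional to Phi((limit - mu)/sigma)**s * (1 - Phi(...))**t
    under a flat prior on mu."""
    step = (hi - lo) / steps
    grid = [lo + i * step for i in range(steps + 1)]
    dens = []
    for mu in grid:
        p_slow = normal_cdf((limit - mu) / sigma)  # P(speed < limit | mu)
        dens.append(p_slow ** s * (1.0 - p_slow) ** t)
    total = sum(dens)
    return grid, [d / total for d in dens]

def grid_quantile(grid, probs, q):
    """Smallest grid value whose cumulative probability reaches q."""
    cum = 0.0
    for mu, p in zip(grid, probs):
        cum += p
        if cum >= q:
            return mu
    return grid[-1]

grid, probs = mu_posterior_grid(s=3, t=17, limit=90.0, sigma=4.5,
                                lo=80.0, hi=110.0, steps=3000)

interval = (grid_quantile(grid, probs, 0.025), grid_quantile(grid, probs, 0.975))
prob_mu_over_limit = sum(p for mu, p in zip(grid, probs) if mu > 90.0)
print(interval, round(prob_mu_over_limit, 3))
```

The same grid values can be passed to any plotting routine for part (b). The interval here is equal-tailed, which is one of several reasonable choices.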
Problem 4 [8 marks]
Due to COVID restrictions, final exams are currently administered online. For exams that
do not use invigilation software there has been an increase in reports of potential collusion
between students when sitting the online exam. Identical answers or matching submission
times for each question raise red flags for a potential breach of academic integrity.
Consider an online exam with 20 questions and answers are directly entered into input
boxes on Wattle (that is, no file uploads are permitted). Two students, say Student A and
Student B provided an answer to all 20 questions. Suppose for 15 of the 20 questions, the
answers from Student A are identical to the answers from Student B, and for the remaining
5 questions, the answers are very similar. As further evidence of collusion, we check the
time stamps of when each of the 20 answers were submitted.
Let yi = 1 if there is a match between the timestamp of Question i for Student A and the
timestamp of Question i for Student B, (i = 1, ..., 20). Let θ be the common probability of a
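match in timestamps for any question. The assumed likelihood function is $y_i \mid \theta \overset{\text{iid}}{\sim} \text{Bern}(\theta)$.
Therefore, if n = 20 is fixed, the joint likelihood is
$$p(y_1, \ldots, y_{20} \mid \theta) \propto \theta^{\sum_{i=1}^{n} y_i} (1-\theta)^{n - \sum_{i=1}^{n} y_i}.$$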
Suppose the observed data are, in order, 1,1,1,0,1,1,1,1,1,1,1,1,1,0,1,1,0,1,1,1. What criteria
should we use to establish a case for collusion between Student A and Student B based on
the observed timestamps? Suppose the protocol for measurement is to stop once 15 ones
have appeared.
(a) [2 marks] Assume a uniform prior on θ. What is the posterior distribution p(θ|y)
under the new protocol where n is not fixed?
(b) [6 marks] Let’s run some posterior predictive checks. Define the test quantity T =
number of switches between 0 and 1 in the sequence. Simulate the replications $y^{\text{rep}}$
under the new measurement protocol to stop once 15 ones have appeared. Display
the predictive simulations $T(y^{\text{rep}})$ in a histogram. Compare to the distribution of
$T(y^{\text{rep}})$ when n = 20 is fixed and explain any differences.
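
The mechanics of the two simulation schemes in part (b) can be sketched in Python as follows. The sketch shows how to count switches and how to generate a replicated sequence under each protocol. The Beta(18, 4) distribution used to draw θ is an assumption for illustration only: it corresponds to a uniform prior combined with all 20 recorded values, and should be replaced by whatever posterior is derived in part (a). All function names are hypothetical.

```python
import random

random.seed(2)

def count_switches(seq):
    """Number of adjacent positions where the sequence changes value."""
    return sum(a != b for a, b in zip(seq, seq[1:]))

def simulate_until_ones(theta, target_ones):
    """Bernoulli(theta) draws, stopping once target_ones ones have appeared."""
    seq, ones = [], 0
    while ones < target_ones:
        y = 1 if random.random() < theta else 0
        seq.append(y)
        ones += y
    return seq

def simulate_fixed_n(theta, n):
    """Bernoulli(theta) draws with the sequence length fixed at n."""
    return [1 if random.random() < theta else 0 for _ in range(n)]

observed = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
t_obs = count_switches(observed)

# Assumption: theta is drawn from Beta(a, b); a and b are placeholders and
# should be replaced by the posterior parameters derived in part (a).
a, b = 18.0, 4.0
t_stop, t_fixed = [], []
for _ in range(5000):
    theta = random.betavariate(a, b)
    t_stop.append(count_switches(simulate_until_ones(theta, 15)))
    t_fixed.append(count_switches(simulate_fixed_n(theta, 20)))

print(t_obs, sum(t_stop) / len(t_stop), sum(t_fixed) / len(t_fixed))
```

The lists `t_stop` and `t_fixed` hold the replicated test quantities for the two protocols and can be passed to a histogram routine, with `t_obs` marked for comparison.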
Problem 5 [17 marks]
We often use the t-distribution to model continuous random variables where the distribution
is approximately symmetric but with a higher possibility of outliers compared to a normal
distribution. Consider the following course grade data for a class of size n=15 students,
78, 50, 72, 72, 75, 72, 68, 94, 66, 92, 66, 90, 64, 71, 45. With lower and upper outliers in
the observed data, a t-distribution may be more appropriate to model the distribution of
course grades.
Suppose y1, ..., yn are a sample from a t distribution with location parameter µ, scale
parameter σ, and known degrees of freedom ν. Assuming conditional independence given
the parameters, the likelihood function is given by
$$p(y_1, \ldots, y_n \mid \mu, \sigma, \nu) = \prod_{i=1}^{n} \frac{1}{\sigma}\left(1 + \frac{(y_i - \mu)^2}{\sigma^2}\right)^{-(\nu+1)/2}.$$
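Assuming a non-informative prior $g(\mu, \sigma) \propto 1/\sigma$, the posterior density is given by
$$p(\mu, \sigma \mid y_1, \ldots, y_n, \nu) \propto \frac{1}{\sigma} \prod_{i=1}^{n} \frac{1}{\sigma}\left(1 + \frac{(y_i - \mu)^2}{\sigma^2}\right)^{-(\nu+1)/2}.$$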
To obtain posterior draws of µ and σ, we can actually implement a Gibbs sampler. First,
we need to introduce a new scale parameter λ and each observation yi is represented as a
scale mixture of normals. The new representation of the model is
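$$y_i \mid \mu, \sigma, \lambda_i \sim \text{Normal}(\mu, \sigma/\sqrt{\lambda_i})$$
$$\lambda_i \mid \nu \sim \text{Gamma}(\nu/2, \nu/2)$$
$$p(\mu, \sigma) \propto 1/\sigma,$$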
where ν is assumed fixed and known.
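
The scale-mixture representation can be checked by simulation, as in the Python sketch below. It assumes that the second argument of Normal is a standard deviation and that Gamma(ν/2, ν/2) is parameterised by shape and rate. The values of µ and σ are arbitrary illustrations, and `draw_scale_mixture` is a hypothetical name. If the mixture does yield a t distribution with ν = 4, about 5% of standardised draws should lie beyond ±2.776, the usual two-sided 5% critical value for 4 degrees of freedom.

```python
import math
import random

random.seed(3)

def draw_scale_mixture(mu, sigma, nu):
    """One draw of y: lambda ~ Gamma(shape nu/2, rate nu/2), then
    y | lambda ~ Normal(mu, sd = sigma / sqrt(lambda))."""
    # random.gammavariate takes (shape, scale), so scale = 1 / rate = 2 / nu.
    lam = random.gammavariate(nu / 2.0, 2.0 / nu)
    return random.gauss(mu, sigma / math.sqrt(lam))

mu, sigma, nu = 70.0, 10.0, 4.0   # illustrative values, not estimates
draws = [draw_scale_mixture(mu, sigma, nu) for _ in range(200_000)]

# Share of standardised draws beyond the t(4) two-sided 5% critical value.
tail_share = sum(abs((y - mu) / sigma) > 2.776 for y in draws) / len(draws)
print(round(tail_share, 3))
```

A normal distribution with the same location and scale would put far less mass beyond ±2.776 standard deviations, which is the heavier-tail behaviour the mixture is meant to capture.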
(a) [3 marks] Reparameterise the model in terms of $\sigma^2$ instead of σ, and write out the joint
posterior density of all parameters (µ, $\sigma^2$, λi) (i = 1, ..., n).
(b) [2 marks] Derive the full conditional distribution of λi (i = 1, ..., n) given µ, $\sigma^2$
and ν.
(c) [2 marks] Derive the full conditional distribution of µ given λi (i = 1, ..., n), $\sigma^2$
and ν.
(d) [2 marks] Derive the full conditional distribution of $\sigma^2$ given µ, λi (i = 1, ..., n)
and ν.
(e) [3 marks] Assume ν = 4. Write some code in R (or other computing package) to
implement your Gibbs sampling algorithm.
(f) [2 marks] Obtain 95% posterior interval estimates for µ and $\sigma^2$.
(g) [3 marks] Show some convergence diagnostics for your Gibbs sampling algorithm.
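
A Gibbs sampler for the scale-mixture model might be sketched in Python as follows. This is an illustration under stated assumptions, not a reference solution. It assumes the full conditionals take the forms commonly derived for this model: λi is Gamma with shape (ν + 1)/2 and rate (ν + (yi − µ)^2/σ^2)/2, µ is Normal with mean Σλi yi / Σλi and variance σ^2/Σλi, and σ^2 is Inverse-Gamma with shape n/2 and scale Σλi (yi − µ)^2 / 2. These should be checked against the derivations in parts (b) to (d). The function names, iteration count and burn-in length are hypothetical.

```python
import math
import random

random.seed(4)

grades = [78, 50, 72, 72, 75, 72, 68, 94, 66, 92, 66, 90, 64, 71, 45]

def gibbs_t(y, nu, n_iter, burn_in):
    """Gibbs sampler for (mu, sigma^2) under the scale-mixture model,
    assuming the full conditionals stated in the lead-in."""
    n = len(y)
    mu = sum(y) / n                                    # start at sample mean
    sigma2 = sum((v - mu) ** 2 for v in y) / (n - 1)   # and sample variance
    mu_draws, sigma2_draws = [], []
    for it in range(n_iter):
        # lambda_i | . ~ Gamma(shape (nu+1)/2, rate (nu + (y_i-mu)^2/sigma2)/2);
        # gammavariate takes a scale, so pass 1 / rate.
        lam = [random.gammavariate((nu + 1) / 2.0,
                                   2.0 / (nu + (v - mu) ** 2 / sigma2))
               for v in y]
        # mu | . ~ Normal(weighted mean, variance sigma2 / sum(lambda))
        lam_sum = sum(lam)
        w_mean = sum(l * v for l, v in zip(lam, y)) / lam_sum
        mu = random.gauss(w_mean, math.sqrt(sigma2 / lam_sum))
        # sigma2 | . ~ Inverse-Gamma(n/2, sum(lambda_i (y_i-mu)^2)/2),
        # drawn as scale / Gamma(shape, 1).
        scale = sum(l * (v - mu) ** 2 for l, v in zip(lam, y)) / 2.0
        sigma2 = scale / random.gammavariate(n / 2.0, 1.0)
        if it >= burn_in:
            mu_draws.append(mu)
            sigma2_draws.append(sigma2)
    return mu_draws, sigma2_draws

def interval_95(draws):
    """Equal-tailed 95% interval from the sorted draws."""
    s = sorted(draws)
    return s[int(0.025 * len(s))], s[int(0.975 * len(s)) - 1]

mu_draws, sigma2_draws = gibbs_t(grades, nu=4.0, n_iter=6000, burn_in=1000)
print(interval_95(mu_draws), interval_95(sigma2_draws))
```

The retained draws in `mu_draws` and `sigma2_draws` can also feed trace plots or autocorrelation summaries when examining convergence.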
Problem 6 [STAT4116/STAT7016 ONLY] [15 marks]
Consider the problem of comparing proportions from two binomial distributions, θ1 and
θ2. We observe y1 distributed as Binomial(n1, θ1) and y2 distributed as Binomial(n2, θ2).
We want to derive the posterior distributions of θ1 and θ2.
Let’s consider the case of dependent priors for θ1 and θ2. That is, knowledge of the
value of θ1 may influence the prior belief about the location of the second proportion θ2.
For example, the Australian Technical Advisory Group on Immunisation (ATAGI) has
recommended that Pfizer is the preferred vaccine for people aged 60 and under. So what
proportion of people under age 60 are willing to jump the Pfizer queue and get the alternative
AstraZeneca vaccine which is more readily available?
Let θ1 denote the proportion of people aged 30-39 who are willing to get the AstraZeneca
vaccine. Let θ2 denote the proportion of people aged 40-49 who are willing to get the
AstraZeneca vaccine. Because we are considering adjacent age groups, the vaccine preferences
of people in the first age group may affect the vaccine preferences of people in the second
age group and vice versa. That is, the belief that θ1 is close to say 7% might lead us to
believe that the value of θ2 is also close to 7%. This belief implies the use of dependent
priors for θ1 and θ2.
What are the options for a dependent prior? Howard (1998) proposed a special form of
dependent prior between θ1 and θ2 expressed as follows. First, consider a logit transformation
of the parameters θ1 and θ2. That is, define
$$\gamma_1 = \log\frac{\theta_1}{1-\theta_1} \quad \text{and} \quad \gamma_2 = \log\frac{\theta_2}{1-\theta_2}.$$
To model the dependency, let γ2|γ1 ∼ Normal(mean = γ1, stdev = σ). Howard (1998)
proposed the following general form of the dependent prior
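$$p(\theta_1, \theta_2) \propto e^{-\frac{1}{2}u^2}\,\theta_1^{\alpha-1}(1-\theta_1)^{\beta-1}\,\theta_2^{\kappa-1}(1-\theta_2)^{\delta-1},$$
where $u = \frac{1}{\sigma}(\gamma_1 - \gamma_2)$.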
(a) [3 marks] Explain the role of each of the hyperparameters (α, β, κ, δ, σ).
(b) [2 marks] Is the joint prior p(θ1, θ2) defined above a conjugate prior? Explain why
or why not.
(c) [6 marks] Suppose the following data are observed from a sample of 30-39 year old
people and 40-49 year old people in Canberra on their willingness to receive an
AstraZeneca vaccine.
Age group   Yes   No   Total
30-39         3   15      18
40-49         1   11      12
Write an R function to compute the value of the posterior density p(θ1, θ2|y1, y2).
Draw contour plots of the posterior distribution for values of the parameter σ = 2,
1, 0.5 and 0.25 respectively.
(d) [4 marks] For each of the four assumed values of σ in part (c), compute the posterior
probability that θ1 > θ2 (that is, Pr(θ1 > θ2|y1, y2)) by simulating samples from the
posterior distribution p(θ1, θ2|y1, y2).
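
The posterior density in part (c) can be evaluated as sketched below. The assignment asks for R; Python is used here purely for illustration. The sketch multiplies Howard's dependent prior by the two binomial likelihoods on the log scale. The hyperparameter values α = β = κ = δ = 1 are an assumption, since the text does not specify them, and the function names are hypothetical. The probability Pr(θ1 > θ2 | y1, y2) is approximated on a grid of cell midpoints rather than by simulation, so it is a stand-in for, not an implementation of, the sampling approach requested in part (d).

```python
import math

def log_posterior(theta1, theta2, sigma, y1=3, n1=18, y2=1, n2=12,
                  alpha=1.0, beta=1.0, kappa=1.0, delta=1.0):
    """Unnormalised log posterior: Howard's dependent prior times two
    binomial likelihoods. Hyperparameter defaults of 1 are an assumption."""
    gamma1 = math.log(theta1 / (1.0 - theta1))
    gamma2 = math.log(theta2 / (1.0 - theta2))
    u = (gamma1 - gamma2) / sigma
    log_prior = (-0.5 * u ** 2
                 + (alpha - 1) * math.log(theta1) + (beta - 1) * math.log(1 - theta1)
                 + (kappa - 1) * math.log(theta2) + (delta - 1) * math.log(1 - theta2))
    log_lik = (y1 * math.log(theta1) + (n1 - y1) * math.log(1 - theta1)
               + y2 * math.log(theta2) + (n2 - y2) * math.log(1 - theta2))
    return log_prior + log_lik

def prob_theta1_greater(sigma, points=200):
    """Grid approximation to Pr(theta1 > theta2 | y1, y2) on cell midpoints."""
    grid = [(i + 0.5) / points for i in range(points)]
    total = above = 0.0
    for t1 in grid:
        for t2 in grid:
            w = math.exp(log_posterior(t1, t2, sigma))
            total += w
            if t1 > t2:
                above += w
            elif t1 == t2:
                above += 0.5 * w  # split diagonal cells evenly
    return above / total

for sigma in (2.0, 1.0, 0.5, 0.25):
    print(sigma, round(prob_theta1_greater(sigma), 3))
```

The same `log_posterior` values, exponentiated over a grid, can be supplied to a contour-plotting routine for each value of σ.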
