FIT3154 Assignment 2
Due Date: 11:59PM, Sunday, 8/11/2020
Introduction
There are a total of two questions worth 17 + 19 = 36 marks in this assignment.
This assignment is worth a total of 20% of your final mark, subject to hurdles and any other matters
(e.g., late penalties, special consideration, etc.) as specified in the FIT3154 Unit Guide or elsewhere
in the FIT3154 Moodle site (including Faculty of I.T. and Monash University policies).
Students are reminded of the Academic Integrity Awareness Training Tutorial Activity and, in particular, of Monash University's policies on academic integrity. In submitting this assignment, you
acknowledge your awareness of Monash University’s policies on academic integrity and that work is
done and submitted in accordance with these policies.
Submission: No files are to be submitted via e-mail. Correct files are to be submitted to Moodle,
as given above. You must submit your files in a single ZIP archive. Your ZIP file should contain the
following:
1. One PDF file containing non-code answers to all the questions that require written answers. This
file should also include all your plots, and must be clearly written and formatted.
2. The required R and Stan files containing R and Stan code answers.
Please read these submission instructions carefully and take care to submit the correct files in the
correct places.
Question 1 (17 marks)
In this question we will again examine Bayesian inference of the exponential distribution, but this
time using a different prior distribution. Recall that the exponential distribution is frequently used in
the analysis of time-to-failure or time-to-event (or so called reliability) data, and has the probability
density
p(y | θ) = (1/θ) exp(−y/θ)
where θ > 0 is the scale parameter of the distribution. In Assignment 1 we used an inverse gamma prior
distribution as this resulted in a simple posterior. This time we will use a heavy-tailed half-Cauchy
prior distribution with probability density
π(θ | s) = 2 / (sπ(1 + θ²/s²)).
If a random variable X follows a half-Cauchy distribution with scale s, we say X ∼ C+(0, s). Our hierarchy is then
yi | θ ∼ Exp(θ), i = 1, . . . , n (1)
θ | s ∼ C+(0, s) (2)
where s can be thought of as setting the value of the “best guess” of θ before we see the data. For
reference, the analysis in Assignment 1 used an inverse gamma prior distribution with hyperparameters
a = 5.85 and b = 68, which encoded a prior best guess for θ of 14. The posterior mean was E [θ |y] ≈
14.45 and the 95% credible interval was ≈ (13.25, 15.77). We do not have access to the exact values
of the Covid recovery data we analysed last assignment, but the file covid.sim.csv contains n = 502
simulated observations with a sum of 7,258.9, as per the statistics reported by the Israeli government.
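Before fitting anything, it can help to see what the C+(0, s) prior looks like. The sketch below (Python, purely for illustration; the assignment itself uses R and Stan) draws from the half-Cauchy by taking absolute values of Cauchy draws:

```python
import numpy as np

rng = np.random.default_rng(0)
s = 14.0

# A C+(0, s) draw is the absolute value of a Cauchy(0, s) draw.
theta_draws = s * np.abs(rng.standard_cauchy(100_000))

# The prior median is s itself (the "best guess"), while the heavy tail
# pushes the upper quantiles far above it.
print(np.quantile(theta_draws, [0.5, 0.9, 0.99]))
```

The median equals s because P(|Cauchy(0, s)| ≤ s) = 1/2, which is one way to read s as a prior best guess for θ.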
1. Implement the hierarchy (1)-(2) in Stan. Provide the Stan code in the report. (hint: note that
the Stan implementation of the exponential is in terms of an inverse-scale parameter β = 1/θ
instead of the scale parameter θ) [2 marks]
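One possible shape for such a Stan file is sketched below (the variable names are illustrative, not prescribed by the assignment); note how the hint is handled by passing the inverse scale 1/θ to Stan's exponential:

```stan
data {
  int<lower=1> n;
  vector<lower=0>[n] y;
  real<lower=0> s;
}
parameters {
  real<lower=0> theta;
}
model {
  // With theta constrained positive, a Cauchy(0, s) prior is the half-Cauchy C+(0, s).
  theta ~ cauchy(0, s);
  // Stan's exponential is parameterised by the inverse scale beta = 1/theta.
  y ~ exponential(1 / theta);
}
```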
2. Use this Stan code to analyse the provided Covid-19 recovery “data” in covid.sim.csv. Use
s = 14 as your prior best guess. Use four chains of at least 20,000 samples each. Plot the
histogram of the samples, and report on the posterior mean and the 95% credible intervals;
compare these to the posterior mean and intervals found using the inverse-gamma prior from
Assignment 1 (summarised above). [3 marks]
The second part of this question will look at finding the maximum a posteriori (MAP) estimator, which
is found by maximising the posterior, or equivalently minimising the negative log-posterior, i.e.,
θ̂_MAP = argmin_θ {− log p(y | θ)π(θ | s)} (3)
Please answer the following questions:
3. Write down the formula for the negative log-posterior, i.e., − log p(y | θ)π(θ | s), up to constants
not dependent on θ. Make sure to simplify the expression. [2 marks]
4. Prove that maximising the posterior is equivalent to solving
(n + 2)θ³ − Y θ² + ns²θ − s²Y = 0 (4)
where Y = Σ_{i=1}^{n} y_i. [2 marks]
5. Download the function exp.hc.map.R. This contains the function exp.map(y,s) which finds the
MAP estimator by solving equation (4).
Using this function and the data in covid.sim.csv, compute the MAP estimator for s = 14
(our prior best guess based on SARS recovery). Compare this estimate to the posterior mean
using the half-Cauchy found in Question 1.2. Discuss why you think they are similar/different.
[1 mark]
6. Using function exp.map(y,s), explore the sensitivity of the MAP estimator for our Covid-19
data when using choices of prior best guess that are quite different from our guess based on the
SARS recovery time. In particular, try the values
s ∈ {10^−4, 10^−3, . . . , 1, . . . , 10^6, 10^7}.
Plot the MAP estimates using these values of s against the values of s. How do these estimates
compare to using s = 14? What does this say about the half-Cauchy prior? [2 marks]
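The root-finding behind exp.map() can also be checked independently; a minimal Python sketch (using numpy.roots and the summary statistics n = 502 and Y = 7,258.9 quoted above) solves equation (4) directly:

```python
import numpy as np

# Coefficients of (n+2)*theta^3 - Y*theta^2 + n*s^2*theta - s^2*Y = 0,
# using the summary statistics quoted above (n = 502, sum of data Y = 7258.9).
n, Y, s = 502, 7258.9, 14.0
coefs = [n + 2, -Y, n * s**2, -s**2 * Y]

roots = np.roots(coefs)
# Keep only the real, positive root; theta must be positive.
real_roots = roots.real[np.abs(roots.imag) < 1e-8]
theta_map = real_roots[real_roots > 0][0]
print(theta_map)
```

Re-running this over the grid of s values above traces out the same sensitivity curve the question asks you to plot.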
7. Find the (asymptotic) formula for the MAP estimate of θ, i.e., the solution to (4) when (i) s → 0,
and (ii) s → ∞. Comment on these. [2 marks]
As discussed in Assignment 1, it is sometimes more natural to use the alternative parameterisation
v = log θ for the exponential distribution.
8. Transform the half-Cauchy prior (2) on θ to a prior on v using the transformation of random
variables technique from Lecture 5. [2 marks]
9. Plot this prior distribution for the choice of hyperparameters s = 10−1, s = 1 and s = 10. How
does s affect the prior on v? [1 mark]
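For reference, the transformation-of-random-variables technique referred to in Question 8 has the generic form below (for a monotone map v = g(θ)); substituting the half-Cauchy density is left to the question:

```latex
p_V(v) \;=\; p_\Theta\!\left(g^{-1}(v)\right)\,
\left|\frac{\mathrm{d}}{\mathrm{d}v}\, g^{-1}(v)\right|,
\qquad
v = \log\theta \;\Longrightarrow\; \theta = e^{v},\quad
\frac{\mathrm{d}\theta}{\mathrm{d}v} = e^{v}.
```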
Question 2: Exponential Regression (19 marks)
In this question we are going to examine an extension to the exponential distribution to a regression
setting. Let y = (y1, . . . , yn) be n non-negative real number observations that we would like to model,
and let xi = (xi,1, . . . , xi,p) be a vector of p predictors associated with the i-th data point. Exponential
regression models extend the usual exponential distribution in the following way:
yi | xi,1, . . . , xi,p ∼ Exp(θi), (5)
θi = exp(β0 + Σ_{j=1}^{p} βj xi,j)
where β = (β1, . . . , βp) are the coefficients relating the predictors to the target and β0 is the intercept.
In this way the exponential regression model extends the regular linear regression model to modelling
non-negative real number data. The effects of the predictors are additive on the log scale, and therefore
multiplicative on the original scale.
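Simulating from model (5) makes the multiplicative interpretation concrete; the following sketch (Python, purely illustrative; all sizes and coefficient values are invented) draws targets whose conditional mean is θi:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented sizes and coefficients, purely to illustrate model (5).
n, p = 1000, 2
X = rng.normal(size=(n, p))
beta0, beta = 1.0, np.array([0.5, -0.25])

theta = np.exp(beta0 + X @ beta)   # theta_i: conditional scale (and mean) of y_i
y = rng.exponential(scale=theta)   # y_i | x_i ~ Exp(theta_i)

# Additive on the log scale => multiplicative on the original scale:
# a one-unit increase in x_{i,1} multiplies the expected response by exp(0.5).
print(np.exp(beta[0]))
```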
Part I
The first part of this question will examine using the exponential regression model. Download the file
diabetes.train.ass2.2020.csv. This contains data regarding the diabetes progression on n = 400
individuals. The target, Y, is a measure of diabetes progression, with a higher score indicating a worse
progression.
1. Using the glm() function fit an exponential regression using maximum likelihood to the diabetes
data, using Y as the target (hint: use the family=Gamma(link="log") option to fit an exponential
regression). Use summary() to inspect the fitted model. From this model:
(a) which variables do you think are associated with disease progression, and why? [2 marks]
(b) how do sex and blood pressure (bp) affect diabetes progression? (note: sex is coded as
0=male, 1=female and bp is measured in millimeters of mercury (mmHg)) [2 marks]
2. We previously analysed this diabetes data using a regular linear (Gaussian) regression model.
Fit a linear model with Gaussian targets to this data using lm(). Using summary(), compare the
two fitted models (exponential and Gaussian). How do they differ in terms of which variables
they suggest are important? [1 mark]
3. The file diabetes.test.ass2.2020.csv contains data on a further n′ = 42 individuals. Produce
predictions for both your exponential regression model (use the type="response" option when
predicting) and Gaussian linear regression model. From these predictions, which model do you
think is a better fit to the testing data? [1 mark]
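One simple, model-agnostic way to judge "better fit" on held-out data is to compare prediction errors; a tiny sketch (Python; the numbers are made up, and in the assignment the predictions come from predict() on the two R fits):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared prediction error on a held-out set."""
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

# Made-up targets and two made-up sets of predictions.
y_test = [3.0, 1.0, 4.0]
pred_a = [2.5, 1.5, 3.5]
pred_b = [3.0, 0.0, 6.0]

# The model with the smaller test error is the better fit by this criterion.
print(mse(y_test, pred_a), mse(y_test, pred_b))
```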
Part II
There is no simple closed-form formula for the maximum likelihood estimates for an exponential regression. The second part will require you to write code to fit an exponential regression using maximum
likelihood and gradient descent. To get you started I have provided a script called expreg.gd.R which
contains two functions: expreg.gd() and my.standardise().
The main function expreg.gd() is a skeleton that takes a y and X, standardises the predictors
(as per Lecture 3) using my.standardise() and then has space for you to fill in with your gradient
descent code. Once your code has finished running, the function unstandardises the coefficients back
to the original scale. Download the script df2matrix.R. It contains a function called df2matrix()
that takes a data frame and a formula and returns an appropriate matrix of predictors X and vector
of targets y. Answer the following questions:
4. Write down the expression for the negative log-likelihood of data y = (y1, . . . , yn) under the
exponential regression model (5) with predictor matrix X of size n× p, coefficient vector β and
intercept β0. [2 marks]
5. Find the expressions for the gradient of the negative log-likelihood with respect to the intercept
β0 and the coefficients β1, . . . , βp. [4 marks]
6. Write an R function that takes a vector of data y, a matrix of predictors, X, an intercept beta0
and a vector of coefficients beta and returns: (i) the negative log-likelihood and (ii) the gradients
with respect to β0 and the coefficients β1, . . . , βp [2 marks]
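For model (5), writing ηi = β0 + xi · β (so θi = exp(ηi)) gives a compact way to compute both quantities at once. The sketch below (Python rather than the required R, and checked against finite differences) is one way such a function can be organised:

```python
import numpy as np

# A sketch of the quantities asked for: the negative log-likelihood of
# model (5) and its gradients, with eta_i = log theta_i = beta0 + x_i . beta.
def expreg_nll_grad(y, X, beta0, beta):
    eta = beta0 + X @ beta            # eta_i = log theta_i
    r = 1.0 - y * np.exp(-eta)        # common term shared by both gradients
    nll = np.sum(eta + y * np.exp(-eta))
    return nll, np.sum(r), X.T @ r    # (nll, d/dbeta0, d/dbeta)

# Finite-difference check of the beta0 gradient on simulated data.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = rng.exponential(scale=np.exp(0.5 + X @ np.array([0.3, -0.2, 0.1])))
b0, b = 0.2, np.zeros(3)
nll, g0, g = expreg_nll_grad(y, X, b0, b)
h = 1e-6
num_g0 = (expreg_nll_grad(y, X, b0 + h, b)[0]
          - expreg_nll_grad(y, X, b0 - h, b)[0]) / (2 * h)
print(abs(g0 - num_g0))
```

The central-difference check is a cheap way to catch sign errors before plugging the gradients into gradient descent.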
7. Implement gradient descent to find the maximum likelihood estimates of the intercept β0 and
coefficients β1, . . . , βp. Use the provided expreg.gd.R as a basis to get started. The function takes
as arguments
• An X matrix of predictors and y vector of targets
• An initial value for the learning rate κ.
• The size of the largest absolute value of the gradient used for termination (max.grad).
The function does some initialisation of the estimates of β0 and β for you by using least-squares
onto the log-transformed targets. These are good starting points for the search. You will need to
write the code to find the maximum likelihood estimates βˆ0 and βˆ1, . . . , βˆp for the X and y that
were passed using the gradient descent algorithm. Once it has finished, it should return a list
containing:
• The ML estimate of the intercept β0.
• The ML estimates of the coefficients β1, . . . , βp
• The negative log-likelihood at your final estimates
• The gradient at your final estimates
• The negative log-likelihood recorded at every iteration of your gradient descent loop.
Some further details
• Your learning rate, κ, must be adaptive.
• Your algorithm should terminate when the largest absolute value of the gradients is less
than the max.grad argument passed to your function.
• When writing your code, you should consider computational efficiency and ensure that it is
cleanly written and commented.
[4 marks]
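The accept/reject step-size scheme described above ("adaptive κ") can be prototyped quickly. The sketch below (Python, not the required R; all names are invented) halves κ after a step that increases the negative log-likelihood and grows it slightly after an accepted step:

```python
import numpy as np

def expreg_gd_sketch(y, X, kappa=1e-3, max_grad=1e-6, max_iter=50_000):
    """Illustrative adaptive gradient descent for exponential regression."""
    def nll_grad(b0, b):
        eta = b0 + X @ b                       # eta_i = log theta_i
        r = 1.0 - y * np.exp(-eta)
        return np.sum(eta + y * np.exp(-eta)), np.sum(r), X.T @ r

    beta0, beta = np.log(y.mean()), np.zeros(X.shape[1])  # crude start
    nll, g0, g = nll_grad(beta0, beta)
    history = [nll]
    for _ in range(max_iter):
        if max(abs(g0), np.abs(g).max()) < max_grad:
            break                              # gradients small enough: stop
        cand0, cand = beta0 - kappa * g0, beta - kappa * g
        cand_nll, cand_g0, cand_g = nll_grad(cand0, cand)
        if cand_nll < nll:                     # accept and be a little bolder
            beta0, beta, nll, g0, g = cand0, cand, cand_nll, cand_g0, cand_g
            kappa *= 1.1
            history.append(nll)
        else:                                  # reject and shrink the step
            kappa *= 0.5
    return beta0, beta, nll, history

# Sanity check on simulated data with known coefficients.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = rng.exponential(scale=np.exp(1.0 + X @ np.array([0.5, -0.5])))
b0, b, nll, hist = expreg_gd_sketch(y, X)
print(round(b0, 2), np.round(b, 2))
```

On simulated data the recovered coefficients should sit close to the generating values, which is the same sanity check Question 8 asks you to perform against glm().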
8. Using the gradient descent code you have written, fit an exponential regression to the diabetes
data you analysed previously. You should compare your estimates to those obtained using the
glm() function to make sure your code is working properly. Plot the negative log-likelihoods
recorded for every iteration of your descent loop (i.e., plot how the negative log-likelihood decreased as the algorithm ran until it reached the minimum). [1 mark]
At the end of this assignment, you should reflect on how much you have learned since you started
this unit. You have now learned how to use the Bayesian approach to statistical inference, you have
ideas about how to analyse non-Gaussian data using regression models, and you have even managed to
write a program to fit such a model using maximum likelihood!
