FIT3154 Assignment 1

Due Date: 12:00PM, Monday, 12/9/2022

Introduction

There are a total of three questions worth 17 + 15 + 14 = 46 marks in this assignment.

This assignment is worth a total of 20% of your final mark, subject to hurdles and any other matters

(e.g., late penalties, special consideration, etc.) as specified in the FIT3154 Unit Guide or elsewhere

in the FIT3154 Moodle site (including Faculty of I.T. and Monash University policies).

Students are reminded of the Academic Integrity Awareness Training Tutorial Activity and, in par-

ticular, of Monash University’s policies on academic integrity. In submitting this assignment, you

acknowledge your awareness of Monash University’s policies on academic integrity and that work is

done and submitted in accordance with these policies.

Submission: No files are to be submitted via e-mail. Correct files are to be submitted to Moodle.

Scans of handwritten answers are acceptable but theymust be clean and legible. You must ensure your

submission contains answers to the questions in the order they appear in the assignment. Submission

must occur before 12:00 PM Monday, 12th of September, and late submissions will incur penalties as

per Faculty of I.T. policies.

1

Question 1 (17 marks)

Background: The Gamma Distribution

In this question we will be looking at the gamma distribution. This is a distribution frequently used

in the analysis of non-negative real numbers. This is introduced in Lecture 6, but here we will be

using a different (but equivalent) form parameterised instead in terms of the mean µ and shape φ with

probability density

p(y |µ, φ) = 1Γ(φ)

(

φ

µ

)φ

yφ−1 exp

(

−φy

µ

)

.

where Γ(·) is known as the “gamma function”. Figure 1 shows several example gamma distributions.

The following facts about the gamma distribution will prove very helpful for answering the questions.

If Y ∼ Ga(µ, φ) is a random variable following a gamma distribution with mean µ and shape φ then

E [Y ] = µ, and V [Y ] = µ

2

φ

so that µ controls the mean, and φ inversely scales the variance. Given a sample y = (y1, . . . , yn) the

maximum likelihood estimate, and Fisher information, for µ are well known to be

µˆ = y¯ and Jn(µ) =

nφ

µ2

(1)

respectively, where y¯ = (1/n)

∑n

i=1 yi is the sample mean. The estimates for φ are complicated, so

initially we will proceed assuming φ is known; we will later relax this assumption.

Bayesian Analysis of the Gamma Distribution with Known φ

Let us now consider a Bayesian analysis of the gamma distribution with known φ using the hierarchy:

yi |µ ∼ Ga(µ, φ), i = 1, . . . , n

µ | a, b ∼ IG(a, b)

where IG(a, b) denotes an inverse-gamma distribution with shape a and scale b1. A useful fact is that

if Y ∼ IG(a, b) and a > 1 then

E [Y ] = b

a− 1 .

One reason people use the inverse-gamma distribution as a prior for the mean of the gamma distribution

is that it is flexible; Figure 2 demonstrates several possible different inverse-gamma distributions. It

is also convenient, as the posterior distribution is then also an inverse gamma distribution of the form

µ |y, φ, a, b ∼ IG (nφ+ a, nφy¯ + b) .

The posterior mean of µ is then given by

E [µ |y] ≡ µˆa,b = φny¯ + b

φn+ (a− 1) . (2)

which can be used as an estimate (best guess) of the unknown population µ. You should install

the R package invgamma. This gives you access to the functions dinvgamma(), pinvgamma() and

qinvgamma() which will be crucial in answering some questions.

Important: for some reason, this package refers to the scale parameter of an inverse

gamma distribution as a rate parameter.

1https://en.wikipedia.org/wiki/Inverse-gamma_distribution

2

Consider using (2) as an estimate of the unknown population µ from a sample of data y =

(y1, . . . , yn) coming from a gamma distribution with mean µ and shape φ. The bias and variance

of this estimator is

biasµ(µˆa,b) =

(1− a)µ+ b

nφ+ a− 1 , Vµ [µˆa,b] =

nµ2φ

(nφ+ a− 1)2 (3)

Please answer the following questions. Provide appropriate working/mathematical justification.

1. If φ = 1, how can we interpret the hyperparameters a and b? [2 marks]

2. Describe the dependency of the variance of the estimator on the population values of µ and φ.

[1 mark]

3. Is the estimator (2) asymptotically efficient as n→∞? [2 marks]

It is common to use so called “normalized” squared-error for many problems in which the mean and

variance are linked. For the gamma distribution the normalised squared-error is

L(µ, µˆ) = (µˆ− µ)

2

(µ2/φ) .

and the normalized squared-error risk is therefore

R(µ, µˆ) =

(

φ

µ2

)

E

[

(µˆ− µ)2] .

Please answer the following questions regarding this loss function.

4. Why do you think the normalized squared-error loss might be preferable to the usual (unnor-

malized) squared-error loss? [1 mark]

5. Making use of (3), what is the normalized squared-error risk for the estimator when b = 0?

Comment on the risk. [2 marks]

6. Prove that when a = 1, the choice b = 0 dominates the choice b > 0. [1 mark]

7. Plot the normalized squared-error risk (expected normalized squared-error) for n = 5, φ = 2 and

µ ∈ (0.01, 10) for (a = 1, b = 0) (i.e., maximum likelihood), (a = 1/2, b = 1) and (a = 2, b = 1/2).

You may plot all curves on the one plot. For readability, use a logarithmic scale for the y-axis

and set the limits for the y-axis from (0.1, 2). Describe the key features of the risk curves. Over

the range of µ values considered in our plot do any of the estimators appear to be inadmissable

(that is, not admissable)? [3 marks]

8. Given an a > 1 and a b > 0, for which population value of µ does the estimator (2) attain the

smallest normalized squared-error risk? [2 marks]

Sometimes the alternative log-mean parameterisation v = logµ is used for the gamma distribution.

This is in some (statistical) ways a more natural parameterisation than the mean µ. Under this

parameterisation the probability density is

p(y | v, φ) = 1Γ(φ)φ

φyφ−1 exp

(−φe−vy − φv)

where ex ≡ exp(x). Please answer the following questions.

7. Derive the Fisher information Jn(v) for the log-scale parameter v. (hint: remember the observa-

tions are independently and identically distributed) [2 marks]

8. Provide an argument as to why v = logµ might be more “natural” by comparing the Fisher

informations for v, Jn(v), to the Fisher information for µ, Jn(µ) given by (1). [1 mark]

3

Question 2 (15 marks)

In this question we will use the gamma distribution to analyse some topical and relevant data: case

numbers of people in Victoria, Australia infected with the novel coronavirus (Covid-19). The data we

will use was obtained from the the Victorian government public health website. In particular, we will

analyse the daily case numbers for the month of July, 2022 from covid.daily.cases.07.2022.csv.

You will also need to download the file gamma.fit.R. This contains functions that will assist you

in fitting a gamma distribution to the data; in particular it contains code for maximum likelihood

estimation and Bayesian sampling.

Let us first analyse the data using maximum likelihood and the asymptotic distribution of maximum

likelihood estimates. Please answer the following questions. Provide working/justifications/code as

required to support your answers.

1. Fit a gamma distribution to the data using maximum likelihood via the function gamma.ml(y).

What is the estimated mean number of daily cases µ for the month of July 2022? Provide an

approximate 95% confidence interval for this estimate using the asymptotic distribution of the

maximum likelihood estimate. [3 marks]

2. The average daily case numbers in the previous month was roughly 7,200. Using the asymptotic

distribution of the ML estimate of µ, test the hypothesis that the average number of daily cases

in July 2022 is the same as in June 2022. Clearly specify the hypothesis you are testing and

discuss the resulting p-value. [3 marks]

Let us perform a Bayesian analysis. We will use the function gamma.sampling(y,a,b,n.samples)

to generate samples from the posterior distribution for the following hierarchy:

yj |µ, φ ∼ Ga(µ, φ)

µ | a, b ∼ IG(a, b)

φ ∼ C+(0, 1)

where y is the data, a and b are the hyperparameters for the inverse-gamma prior distribution for µ,

and n.samples is the number of samples to generate. The hierarchy uses a default weakly informative

half-Cauchy prior for φ, so we do not need to specify any hyperparameters for this prior. Using the

daily case numbers from the previous four and a half months, we find that a reasonable prior guess at

the average daily case numbers is around 8,500, with a 95% prior probability of the average daily case

numbers ranging anywhere from 5,000 to 13,000 cases.

3. Select appropriate values of a and b for our inverse gamma prior distribution to (approximately)

encode the specified information (hint: use the pinvgamma() function). [2 marks]

4. Generate 100, 000 samples from the posterior distribution using your specified prior hyperparam-

eters. Plot the prior and posterior distributions of µ (hint: use the dinvgamma() function and

histogram, as appropriate). [2 marks]

5. Report on the posterior mean and 95% credible intervals for µ. How do these compare to our

maximum likelihood estimates? (hint: use the quantile() function) [2 marks]

6. Do you think the Bayesian analysis supports the hypothesis that the average daily case numbers

for July, 2022 are the same as June, 2022? [1 mark]

7. It is important for authorities to have an idea on the likely range of daily case numbers. Provide

an estimate of the 97.5-th percentile of daily case numbers using your Bayesian samples. Provide

a 95% credible interval for this estimate (hint: use the qgamma.mu() function provided). [2

marks]

4

Question 3 (14 marks)

This question will require you to analyse a regression dataset using Bayesian lasso regression. In

particular, you will be looking at predicting the fuel efficiency of a car (in kilometers per litre) based

on characteristics of the car and its engine. This is clearly an important and useful problem. The

dataset fuel.2022.csv contains n = 1, 000 observations on p = 9 predictors obtained from actual fuel

efficiency tables for car models available for sale during the years 2017 through to 2020. The data

dictionary for this dataset is given in Table 1.

To analyse this data in a Bayesian fashion you will use the bayesreg package in R.

1. Fit a regression model to the fuel efficiency data using the least-squares method in R (i.e., using

lm()). From this output, what variables do you believe are potentially associated with increased

fuel efficiency, and why? [2 marks]

2. Analyse the fuel efficiency data using the Bayesian lasso; use at least 10, 000 samples from the

posterior. Compare posterior means, standard deviations and t-statistics to those obtained using

least-squares. How do they compare, and if you believe they are different, why do you think they

are different? [2 marks]

3. Write down your best estimate of the regression equation relating fuel efficiency to the explana-

tory variables. [2 marks]

4. If we wanted to advise a car manufacturer regarding the choice of an engine to have likely good

fuel efficiency, what does this model suggest we should look for? [2 marks]

5. From your Bayesian analysis, which variables seem unimportant/unlikely to be associated with

fuel efficiency? Justify your answer [2 marks]

6. Imagine that a friend proposes purchasing a particular model of car with the properties given in

Table 2. Use your Bayesian model to:

(a) Predict the mean fuel efficiency for this car. [1 mark]

(b) Provide a histogram of the posterior distribution of the mean fuel efficiency for this car,

and a 95% credible interval on the plausible range of mean fuel efficiency for this car. [1

mark]

(hint: use the rv$beta matrix in the return value from bayesreg() to access the samples of the

coefficients)

7. An engineer suggests they believe there is likely an interaction effect between engine displacement

and the number of cylinders. Using the Bayesian lasso, and statistics produced by the Bayesian

lasso code, assess whether you think this is the case, and what effect it has on the model? [2

marks]

5

Variable name Description Values

Model.Year Year of sale 2017− 2020

Eng.Displacement Engine Displacement (litres, l) 0.9− 8.4

No.Cylinders Number of Cylinders 3− 16

Aspiration Engine Aspiration (Oxygen intake) N: Naturally∗

OT: Other

SC: Supercharged

TC: Turbocharged

TS: Turbo+supercharged

No.Gears Number of Gears 1− 10

Lockup.Torque.Converter Lockup torque converter present? N∗ and Y

Drive.Sys Drive System 4∗: 4-wheel drive

A:All-wheel

F:Front-wheel

P:Part-time 4-wheel

R:Rear-wheel

Max.Ethanol Maximum % of Ethanol allowed 10− 85

Fuel.Type Type of Fuel G∗: Regular Unleaded

GM: Mid-grade Unleaded Recommended

GP: Premium Unleaded Recommended

GPR: Premium Unleaded Required

Comb.FE Fuel Efficiency (km/l) 4.974− 26.224

Table 1: Fuel efficiency data dictionary. The ∗ denotes the reference category for each categorical

variable.

Variable Model.Year Eng.Displacement No.Cylinders Aspiration No.Gears Lockup.Torque.Converter Drive.Sys Max.Ethanol Fuel.Type

Value 2018 3.5 6 N 8 N R 15 GPR

Table 2: Engine Properties.

some text

6

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5

0

0.2

0.4

0.6

0.8

1

1.2

1.4

Figure 1: Example probability density functions for the gamma distribution with mean µ and shape

φ.

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5

0

0.2

0.4

0.6

0.8

1

1.2

Figure 2: Example probability density functions for the inverse gamma distribution with shape a and

scale b.

7

欢迎咨询51作业君