STAT2004/2904/7004 2024 – Assignment 4

Due date: 25 October 2024 at 16:00

STAT2004/7004: Complete Exercises 1–4 for a maximum of 40 marks and a total of 10%.

STAT2904: Complete Exercises 1–5 for a maximum of 50 (+5 bonus) marks and a total of

10% (+1% bonus).

All students: Exercise 6 is a bonus question. A complete and correct solution of this question earns you an extra 1% for this assignment.

Note that some questions involve interpretation and communication of results in the

form of an audio recording which you upload onto Blackboard as an audio file.

Reminder: while discussion of the Assignment questions (amongst yourselves, with lecturers

and/or tutors) is encouraged, the final write-up must be your own. If you cannot express a

solution in your own words, then you must cite your source(s).

Question 1 (Testing exponential rates) (8 marks)

Let $X_1, X_2, \ldots, X_7$ be a random sample from an exponential distribution with pdf

$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0,$$

and $Y_1, Y_2, \ldots, Y_8$ be another independent sample from an exponential distribution with pdf

$$f(y) = \theta e^{-\theta y}, \quad y \ge 0.$$

Here, $\lambda > 0$ and $\theta > 0$ are both unknown parameters. We want to test the null hypothesis $H_0 : \lambda = \theta$ versus the alternative hypothesis $H_1 : \lambda \neq \theta$.

(a) (2 marks) Show that under the null hypothesis, the maximum likelihood estimator of $\lambda = \theta$ is given by

$$\hat\lambda = \hat\theta = \frac{15}{\sum_{i=1}^{7} X_i + \sum_{j=1}^{8} Y_j}.$$

(b) (1 mark) Show that under the alternative hypothesis, the maximum likelihood estimators of $\lambda$ and $\theta$ are given respectively by

$$\hat\lambda = \frac{7}{\sum_{i=1}^{7} X_i}, \quad\text{and}\quad \hat\theta = \frac{8}{\sum_{j=1}^{8} Y_j}.$$

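(c) (3 marks) Derive the generalized likelihood ratio test for $H_0 : \lambda = \theta$ versus $H_1 : \lambda \neq \theta$, and show that it reduces to a test based on large or small values of the test statistic

$$T(X,Y) = \frac{\sum_{i=1}^{7} X_i}{\sum_{i=1}^{7} X_i + \sum_{j=1}^{8} Y_j}.$$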

It is given to you that $T(X,Y) \sim \mathrm{Beta}(7,8)$ under the null hypothesis $H_0 : \lambda = \theta$.
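
The stated null distribution can be sanity-checked by simulation. The sketch below is only an illustration: the common rate 1.5, the seed, and the number of replications are arbitrary assumptions, and it compares just the first two moments of the simulated statistic with those of a Beta(7, 8) distribution.

```python
import random
import statistics

def simulate_T(rate_x, rate_y, rng, n_x=7, n_y=8):
    """Draw one pair of samples and return T = sum(X) / (sum(X) + sum(Y))."""
    sum_x = sum(rng.expovariate(rate_x) for _ in range(n_x))
    sum_y = sum(rng.expovariate(rate_y) for _ in range(n_y))
    return sum_x / (sum_x + sum_y)

rng = random.Random(2024)   # arbitrary seed, for reproducibility only
common_rate = 1.5           # hypothetical common value of lambda = theta
draws = [simulate_T(common_rate, common_rate, rng) for _ in range(20000)]

# Beta(a, b) has mean a/(a+b) and variance ab / ((a+b)^2 (a+b+1)).
beta_mean = 7 / 15
beta_var = 7 * 8 / (15 ** 2 * 16)

sim_mean = statistics.fmean(draws)
sim_var = statistics.variance(draws)
print(f"mean: simulated {sim_mean:.4f}, Beta(7, 8) {beta_mean:.4f}")
print(f"variance: simulated {sim_var:.5f}, Beta(7, 8) {beta_var:.5f}")
```

Changing `common_rate` should leave the comparison essentially unchanged, since the statistic is a ratio and does not depend on the common scale.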

(d) (1 mark) Explain how you would set critical value(s) for your test from part (c) to

control the Type I error at α = 5%, and write down your decision rule explicitly using

these critical value(s).
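
One possible way to obtain critical values numerically is sketched below. It assumes an equal-tailed rule (2.5% in each tail), which is a common convention but not the only valid choice. For integer parameters, the Beta(a, b) CDF equals a binomial tail probability, so only the standard library is needed; the function names are hypothetical.

```python
from math import comb

def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) for positive integers a, b.

    Uses the identity P(Beta(a, b) <= x) = P(Binomial(a + b - 1, x) >= a).
    """
    n = a + b - 1
    return sum(comb(n, k) * x ** k * (1 - x) ** (n - k) for k in range(a, n + 1))

def beta_quantile_int(prob, a, b, tol=1e-10):
    """Invert the CDF by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf_int(mid, a, b) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha = 0.05
lower = beta_quantile_int(alpha / 2, 7, 8)       # assumed equal-tailed split
upper = beta_quantile_int(1 - alpha / 2, 7, 8)

def reject_null(t_obs):
    """Decision rule: reject H0 when T falls outside [lower, upper]."""
    return t_obs < lower or t_obs > upper

print(f"reject H0 if T < {lower:.4f} or T > {upper:.4f}")
```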

(e) (1 mark) [Audio question]: Is your test from parts (c)–(d) uniformly most powerful for testing $H_0 : \lambda = \theta$ versus $H_1 : \lambda \neq \theta$ at the 5% significance level? Briefly explain why, or why not.

Question 2 (Comparing ratings across groups) (8 marks)

A recent poll asked social media users to provide their opinions on a decision by a popular

photo-sharing app to remove the number of “likes” from their posts. Each respondent was

asked to express their opinion on the following five-point scale:

1 = Strongly disagree

2 = Disagree

3 = Neutral

4 = Agree

5 = Strongly Agree

Of the n = 198 respondents, 98 were “influencers” (with over 10,000 followers each) while

the other 100 were regular users. The full dataset can be downloaded as a .csv file from

Blackboard > Assessment > Assignment 4 > likes.csv.

(a) (1 mark) Visualise the data using an appropriate graph(s).

(b) (3 marks) Do the two types of users exhibit differing opinions regarding the recent changes to the photo-sharing app? Answer this question by carrying out an appropriate hypothesis test. Clearly state the null and alternative hypotheses, propose a test statistic, compute and interpret a p-value, and write your conclusions in a way that is understandable to a social scientist.

(d) (2 marks) [Audio question]: A social scientist suggests comparing the two groups

using a two-sample t-test applied directly to the five-point responses. Explain to her

why this is inappropriate here.

Question 3 (Tuberculosis and blood type) (14 marks)

Overfield and Klauber (1980) published the following data on the incidence of tuberculosis

in relation to ABO blood groups in a sample of Eskimos:

                        Blood type
Tuberculosis severity   O     A     AB    B
Moderate or advanced    7     5     3     13
Minimal                 27    32    8     18
Not present             55    50    7     24

We want to investigate whether tuberculosis incidence is related to blood type.

Let $p_{ij}$ denote the underlying proportion of the population with tuberculosis severity $i \in$ {moderate/advanced, minimal, not present} and blood type $j \in$ {O, A, AB, B}. For convenience, write $p = (p_{ij})$ for the $3 \times 4$ vector of proportions.

(a) (1 mark) Write down the null and alternative hypotheses in words.

(b) (1 mark) Write down the likelihood function for p given the observed counts x.

Under the null hypothesis, $p_{ij} = p_{i\bullet} \times p_{\bullet j}$ for each $i$ and $j$, where $p_{i\bullet}$ is the overall proportion with tuberculosis severity level $i$ and $p_{\bullet j}$ is the overall proportion with blood type $j$.

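(c) (3 marks) Show that, under the null hypothesis, the maximum likelihood estimates of $p_{i\bullet}$ and $p_{\bullet j}$ are given, respectively, by

$$\hat p_{i\bullet} = x_{i\bullet}/n \quad\text{and}\quad \hat p_{\bullet j} = x_{\bullet j}/n,$$

where $x_{i\bullet}$ is the observed number of cases of tuberculosis severity $i$, $x_{\bullet j}$ is the observed number of cases of blood type $j$, and $n$ is the total sample size.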

(d) (1 mark) Using the results from part (c), or otherwise, what counts would we expect

to see in each cell of the table if the null hypothesis is indeed true?
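
As a minimal sketch, the expected counts under independence can be computed from the margins of the table as (row total × column total) / n. The variable names below are illustrative only.

```python
severity = ["moderate or advanced", "minimal", "not present"]
blood = ["O", "A", "AB", "B"]

# Observed counts, rows = severity, columns = blood type (from the table above).
observed = [
    [7, 5, 3, 13],
    [27, 32, 8, 18],
    [55, 50, 7, 24],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count in cell (i, j) under independence: n * p_hat_i. * p_hat_.j
expected = [[r * c / n for c in col_totals] for r in row_totals]

print("n =", n)
for name, row in zip(severity, expected):
    print(f"{name:22s}", "  ".join(f"{e:7.3f}" for e in row))
```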

Under the alternative hypothesis, there are no restrictions on the cell proportions $p_{ij}$ (except that they must all sum to 1).

(e) (1 mark) State the ML estimates $\hat p_{ij}$ of each cell proportion $p_{ij}$ under the alternative. (You do not have to prove that these are the MLEs.)

(f) (2 marks) Using your results from parts (b), (c) and (e), or otherwise, numerically evaluate the generalized likelihood ratio test statistic,

$$\Lambda = \frac{\sup_{H_0} L(p \mid x)}{\sup L(p \mid x)},$$

for testing the association between tuberculosis and blood type based on the observed counts in the table above. Also, numerically compute the transformation $-2\log\Lambda$.
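
For orientation, the following is one way the statistic simplifies. It is a sketch that assumes multinomial sampling and takes the unrestricted estimates to be $x_{ij}/n$; the symbol $E_{ij}$ (the expected count under independence) is introduced here for convenience and does not appear in the original question.

```latex
\begin{align*}
L(p \mid x) &\propto \prod_{i,j} p_{ij}^{x_{ij}}, \\
\Lambda &= \prod_{i,j} \left( \frac{\hat p_{i\bullet}\,\hat p_{\bullet j}}{x_{ij}/n} \right)^{x_{ij}}
         = \prod_{i,j} \left( \frac{E_{ij}}{x_{ij}} \right)^{x_{ij}},
         \qquad E_{ij} = \frac{x_{i\bullet}\, x_{\bullet j}}{n}, \\
-2\log\Lambda &= 2 \sum_{i,j} x_{ij} \log\frac{x_{ij}}{E_{ij}}.
\end{align*}
```

The multinomial coefficient cancels in the ratio, so only the product of cell proportions matters.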

(g) (1 mark) Using your results from part (d), or otherwise, compute Pearson's $\chi^2$ statistic,

$$\sum_{\text{cells } i,j} \frac{(\text{Observed}_{ij} - \text{Expected}_{ij})^2}{\text{Expected}_{ij}}.$$

Is Pearson's $\chi^2$ statistic numerically close to the $-2\log\Lambda$ statistic from part (f)?
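
A short sketch of how the two statistics might be computed side by side is given below. It assumes the likelihood ratio statistic takes the form 2 Σ O log(O/E), which holds under multinomial sampling; variable names are illustrative.

```python
from math import log

observed = [
    [7, 5, 3, 13],
    [27, 32, 8, 18],
    [55, 50, 7, 24],
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Flatten to (observed, expected) pairs, one per cell.
cells = [(o, e) for o_row, e_row in zip(observed, expected)
         for o, e in zip(o_row, e_row)]

# Pearson's chi-squared statistic.
pearson = sum((o - e) ** 2 / e for o, e in cells)

# -2 log Lambda, assuming the form 2 * sum(O * log(O / E)); empty cells add 0.
g_stat = 2 * sum(o * log(o / e) for o, e in cells if o > 0)

print(f"Pearson chi-squared: {pearson:.3f}")
print(f"-2 log Lambda:       {g_stat:.3f}")
```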

(h) (2 marks) Carry out the hypothesis test by computing and interpreting a p-value, and

state your conclusion in a way that is understandable to a population health scientist.
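
The p-value can be obtained from any chi-squared table or statistical package. As a self-contained sketch, the code below uses the closed-form upper tail of the chi-squared distribution that is available when the degrees of freedom are even, which applies here because (3 − 1)(4 − 1) = 6. It uses Pearson's statistic as the test statistic, which is an assumption; the likelihood ratio statistic could be used in the same way.

```python
from math import exp, factorial

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(chi2_df > x), valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    return exp(-half) * sum(half ** k / factorial(k) for k in range(df // 2))

observed = [
    [7, 5, 3, 13],
    [27, 32, 8, 18],
    [55, 50, 7, 24],
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson's statistic, with expected counts row_total * col_total / n.
pearson = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(len(row_totals))
    for j in range(len(col_totals))
)

df = (len(row_totals) - 1) * (len(col_totals) - 1)
p_value = chi2_sf_even_df(pearson, df)
print(f"df = {df}, statistic = {pearson:.3f}, p-value = {p_value:.4f}")
```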

Notice that one of the cells in the table contains only 3 counts. This may render the asymptotic $\chi^2$ distribution inaccurate for part (h). Instead, we can consider Fisher's exact test.

(i) (2 marks) Using an alternative approach, or otherwise, re-do the analysis to account

for the low counts in some of the cells. Does your conclusion from part (h) change?
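
One alternative that avoids the asymptotic approximation is a Monte Carlo permutation test, sketched below. Note that this is not Fisher's exact test itself: it shuffles blood-type labels across individuals (which keeps both margins fixed) and ranks tables by Pearson's statistic. The seed and the number of permutations are arbitrary assumptions.

```python
import random

observed = [
    [7, 5, 3, 13],
    [27, 32, 8, 18],
    [55, 50, 7, 24],
]

def pearson_stat(table):
    """Pearson's chi-squared statistic for a two-way table of counts."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    return sum(
        (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(len(row_totals))
        for j in range(len(col_totals))
    )

# Expand the table into one (severity, blood type) label pair per individual.
row_labels, col_labels = [], []
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        row_labels.extend([i] * count)
        col_labels.extend([j] * count)

rng = random.Random(1980)   # arbitrary seed
obs_stat = pearson_stat(observed)
n_perm = 5000
exceed = 0
for _ in range(n_perm):
    rng.shuffle(col_labels)  # break any association, keep both margins
    table = [[0] * 4 for _ in range(3)]
    for i, j in zip(row_labels, col_labels):
        table[i][j] += 1
    if pearson_stat(table) >= obs_stat - 1e-12:
        exceed += 1

# Add-one correction so the estimated p-value is never exactly zero.
p_mc = (exceed + 1) / (n_perm + 1)
print(f"Monte Carlo p-value: {p_mc:.4f}")
```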

Question 4 (Weight gain in pigs) (10 marks)

A trial was conducted in Iowa, USA, examining the effects of vitamin B12 dietary supplements

and antibiotics on weight gain in pigs. Twelve adult pigs were randomly divided into four

groups (one using standard pig chow, one using pig chow with added vitamin B12, one using

pig chow with added antibiotics, and one using pig chow with both added vitamin B12 and

antibiotics). After one week of feeding, the pigs were weighed and their weight gain (in

grams) was recorded. The data are plotted below:

[Figure: Weight gain in pigs, by Vitamin B12 level and [P]resence or [A]bsence of Antibiotics. Horizontal axis: Vitamin B12 (No, Yes). Vertical axis: Weight gain (kg), with ticks from 100 to 500.]

We can model the weight gains $\{Y_{jki}\}$ using a two-way ANOVA with interactions:

$$Y_{jki} = \mu + \alpha_j + \beta_k + \delta_{jk} + \epsilon_{jki},$$

where $j = 1,2$ denotes the level of factor A (antibiotics), $k = 1,2$ denotes the level of factor B (vitamin B12), and $i = 1,2,3$ indexes the observations in each group. Assume that the errors $\epsilon_{jki} \overset{\text{iid}}{\sim} N(0,\sigma^2)$ across all $j$, $k$ and $i$. The common variance $\sigma^2$ is taken to be unknown.

If we parametrize this model using the contrast constraints,

$$\alpha_1 = 0, \quad \beta_1 = 0 \quad\text{and}\quad \delta_{1k} = \delta_{j1} = 0 \ \text{ for } j,k = 1,2,$$

then $\mu$ can be interpreted as the mean of the baseline group with no antibiotics and no vitamin B12, $\alpha_2$ is the mean change from adding antibiotics only, $\beta_2$ is the mean change from adding vitamin B12 only, and the interaction $\delta_{22}$ is the additional mean change from adding both antibiotics and vitamin B12 simultaneously.

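(a) (3 marks) Show that under these contrast constraints the MLE of each parameter is given by

$$\hat\mu = \bar{Y}_{11\bullet},$$

$$\hat\alpha_2 = \bar{Y}_{21\bullet} - \bar{Y}_{11\bullet},$$

$$\hat\beta_2 = \bar{Y}_{12\bullet} - \bar{Y}_{11\bullet},$$

$$\hat\delta_{22} = \bar{Y}_{22\bullet} - \bar{Y}_{21\bullet} - \bar{Y}_{12\bullet} + \bar{Y}_{11\bullet}.$$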

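(b) (2 marks) Show that the following sum-of-squares decomposition holds:

$$SS_{\text{Total}} = SS_A + SS_B + SS_{AB} + SS_{\text{residual}},$$

where

$SS_{\text{Total}} = \sum_{jki} (Y_{jki} - \bar{Y}_{\bullet\bullet\bullet})^2$ is the overall sum-of-squares ignoring groups,

$SS_A = \sum_{jki} (\bar{Y}_{j\bullet\bullet} - \bar{Y}_{\bullet\bullet\bullet})^2$ is the sum-of-squares between levels of factor A,

$SS_B = \sum_{jki} (\bar{Y}_{\bullet k\bullet} - \bar{Y}_{\bullet\bullet\bullet})^2$ is the sum-of-squares between levels of factor B,

$SS_{AB} = \sum_{jki} (\bar{Y}_{jk\bullet} - \bar{Y}_{j\bullet\bullet} - \bar{Y}_{\bullet k\bullet} + \bar{Y}_{\bullet\bullet\bullet})^2$ is the interaction sum-of-squares,

$SS_{\text{residual}} = \sum_{jki} (Y_{jki} - \bar{Y}_{jk\bullet})^2$ is the residual sum-of-squares within groups.

Hint: Start with the following identity:

$$Y_{jki} - \bar{Y}_{\bullet\bullet\bullet} = (Y_{jki} - \bar{Y}_{jk\bullet}) + (\bar{Y}_{j\bullet\bullet} - \bar{Y}_{\bullet\bullet\bullet}) + (\bar{Y}_{\bullet k\bullet} - \bar{Y}_{\bullet\bullet\bullet}) + (\bar{Y}_{jk\bullet} - \bar{Y}_{j\bullet\bullet} - \bar{Y}_{\bullet k\bullet} + \bar{Y}_{\bullet\bullet\bullet}).$$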
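
Before attempting the algebra, the decomposition can be checked numerically on a balanced layout. The sketch below uses made-up weight gains (the actual trial data are not reproduced here), so the numbers are purely hypothetical; only the equality of the two sides is of interest.

```python
from math import isclose
from statistics import fmean

# Hypothetical weight gains keyed by (j, k) = (antibiotics level, B12 level).
# These values are invented for illustration; they are NOT the trial data.
y = {
    (1, 1): [120.0, 150.0, 135.0],
    (1, 2): [210.0, 190.0, 230.0],
    (2, 1): [100.0, 140.0, 110.0],
    (2, 2): [480.0, 430.0, 500.0],
}
levels = (1, 2)

grand = fmean(v for cell in y.values() for v in cell)
cell_mean = {jk: fmean(cell) for jk, cell in y.items()}
a_mean = {j: fmean(v for k in levels for v in y[(j, k)]) for j in levels}
b_mean = {k: fmean(v for j in levels for v in y[(j, k)]) for k in levels}

ss_total = ss_a = ss_b = ss_ab = ss_res = 0.0
for (j, k), cell in y.items():
    for v in cell:
        # Every sum runs over all (j, k, i), as in the definitions above.
        ss_total += (v - grand) ** 2
        ss_a += (a_mean[j] - grand) ** 2
        ss_b += (b_mean[k] - grand) ** 2
        ss_ab += (cell_mean[(j, k)] - a_mean[j] - b_mean[k] + grand) ** 2
        ss_res += (v - cell_mean[(j, k)]) ** 2

print(f"SS_Total              = {ss_total:.2f}")
print(f"SS_A + SS_B + SS_AB + SS_residual = {ss_a + ss_b + ss_ab + ss_res:.2f}")
```

The cross-product terms vanish because the design is balanced (equal replication in every cell); with unequal cell sizes the simple decomposition above would not generally hold.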

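(c) (1 mark) Show that

$$\frac{SS_{\text{residual}}}{\sigma^2} \sim \chi^2_{df_{\text{residual}}},$$

where $df_{\text{residual}} = JK(r-1) = 8$. [Here, $J = 2$ is the number of levels of factor A, $K = 2$ is the number of levels of factor B, and $r = 3$ is the number of replications in each group.]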

(d) (1 mark) Briefly argue why the residual sum-of-squares $SS_{\text{residual}}$ is independent of the interaction sum-of-squares $SS_{AB}$.

Using similar calculations to part (c), it can also be shown that under the null hypothesis $H_0$: all interactions $\delta_{jk} = 0$, the interaction sum-of-squares has distribution given by

$$\frac{SS_{AB}}{\sigma^2} \sim \chi^2_{df_{AB}},$$

where $df_{AB} = (J-1)(K-1) = 1$.

(e) (1 mark) Using parts (c), (d) and the above result, or otherwise, argue why the null distribution of the so-called F-ratio,

$$F = \frac{MS_{AB}}{MS_{\text{residual}}} := \frac{SS_{AB}/df_{AB}}{SS_{\text{residual}}/df_{\text{residual}}},$$

is an F distribution with numerator degrees-of-freedom $df_{AB}$ and denominator degrees-of-freedom $df_{\text{residual}}$.

A partially-complete two-way ANOVA table for the pigs weight dataset is given below:

Source                    df    SS        MS        F        P
Vitamin B12               1     218700    218700    60.33    < 0.0001
Antibiotics               1     19200     19200     5.30     ≈ 0.05
Vitamin B12:Antibiotics   1     172800    172800    47.67    < 0.0005
Residuals                 8     29000     3625
Total                     11    439700
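
The internal arithmetic of an ANOVA table (MS = SS/df, F = MS/MS_residual, and the totals) can be checked mechanically. The sketch below does this for the entries shown above; the dictionary layout is just one convenient, hypothetical way to hold the table.

```python
# Degrees of freedom and sums of squares as listed in the table above.
df = {
    "Vitamin B12": 1,
    "Antibiotics": 1,
    "Vitamin B12:Antibiotics": 1,
    "Residuals": 8,
}
ss = {
    "Vitamin B12": 218700,
    "Antibiotics": 19200,
    "Vitamin B12:Antibiotics": 172800,
    "Residuals": 29000,
}

# Mean squares: each sum of squares divided by its degrees of freedom.
ms = {source: ss[source] / df[source] for source in ss}

# F-ratios: each effect's mean square over the residual mean square.
f_ratio = {
    source: ms[source] / ms["Residuals"]
    for source in ss
    if source != "Residuals"
}

print("Total df:", sum(df.values()), " Total SS:", sum(ss.values()))
for source, value in f_ratio.items():
    print(f"{source:25s} MS = {ms[source]:9.1f}  F = {value:6.2f}")
```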

(f) (2 marks) Using your results from parts (b) and (e), or otherwise, complete the above

ANOVA table. Hence, summarise the main finding(s) of this experiment and write a

short conclusion.

Question 5 (STAT2904 only) (10 marks)

Let $X_1, X_2, \ldots, X_n$ be iid random variables from a Pareto distribution with pdf

$$f(x \mid \theta,\nu) = \begin{cases} \dfrac{\theta\nu^{\theta}}{x^{\theta+1}}, & x \ge \nu, \\ 0, & \text{otherwise}, \end{cases}$$

where $\theta, \nu > 0$ are two unknown parameters.

(a) (4 marks) Find the MLEs for $\theta$ and $\nu$.

(b) (1 mark) If it is given to you that θ = 1, does that change the MLE for ν?

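(c) (5 marks) Derive the generalized likelihood ratio test (GLRT) for testing

$$H_0 : \theta = 1,\ \nu \text{ unknown} \quad\text{versus}\quad H_1 : \theta \neq 1,\ \nu \text{ unknown},$$

and show that it reduces to a test based on either small or large values of the statistic $T(X)$ given by

$$T(X) = \log\left[\frac{\prod_{i=1}^{n} X_i}{(\min_i X_i)^{n}}\right].$$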

To finish specifying this test, we need to set the critical values for T(X) that determine what

is “too small” or “too large”. However, the distribution of T(X) is too difficult to derive

analytically. Instead, we can use simulations to help us find these critical values.

STAT2904 Bonus Questions (5 marks):

(d) (2 marks) For a sample size of $n = 22$, say, simulate one set of observations $x_1, x_2, \ldots, x_{22}$ from the Pareto distribution with $\theta = 1$ and $\nu = 2.1$. From this realisation, compute the value of the observed test statistic

$$T(x) = \log\left[\frac{\prod_{i=1}^{22} x_i}{(\min_i x_i)^{22}}\right].$$

(e) (1 mark) Repeat the simulation setting from part (d) 10,000 times, each time computing and saving the observed test statistic $T(x)$.

(f) (1 mark) Estimate the upper and lower 2.5%-tiles of the distribution of T(X) using the

simulated values from part (e).

(g) (1 mark) Investigate numerically how the cutoff values from part (f) change if you set the nuisance parameter $\nu$ to another value (e.g., try $\nu$ = 1.3, 2.7, 3.4, etc.).

Question 6 (Bonus question for all students) (4 marks)

Let $Y_1$ and $Y_2$ be two random samples from a Uniform$(\lambda, \lambda+1)$ distribution. To test the hypothesis $H_0 : \lambda = 0$ versus $H_1 : \lambda > 0$, two competing tests are proposed:

• Geoff's Test: reject $H_0$ in favour of $H_1$ if $Y_2 \ge 0.95$.

• Alan's Test: reject $H_0$ in favour of $H_1$ if $Y_1 + Y_2 \ge c$ for some critical value $c$.

(a) Find the value of c such that Alan’s Test has the same significance level as Geoff’s Test.

(b) Prove or disprove: Alan’s Test is more powerful than Geoff’s Test.

and Geoff’s Test.
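
The two tests can also be explored by simulation before attempting a proof. The sketch below estimates each test's rejection probability over a few values of λ. The critical value used, c = 2 − √0.1, is an assumption obtained by setting the null tail probability of the sum of two independent Uniform(0, 1) variables, (2 − c)²/2 for 1 ≤ c ≤ 2, equal to 0.05; the grid of λ values, the seed, and the function name are arbitrary.

```python
import math
import random

def rejection_rates(lam, c, n_sim, rng):
    """Estimate P(reject H0) for Geoff's and Alan's tests at a given lambda."""
    geoff = alan = 0
    for _ in range(n_sim):
        y1 = lam + rng.random()   # Uniform(lam, lam + 1)
        y2 = lam + rng.random()
        geoff += y2 >= 0.95
        alan += (y1 + y2) >= c
    return geoff / n_sim, alan / n_sim

# Assumed critical value: solves (2 - c)^2 / 2 = 0.05 under lambda = 0.
c = 2.0 - math.sqrt(0.1)

rng = random.Random(6)   # arbitrary seed
for lam in (0.0, 0.1, 0.2, 0.4, 0.6):
    g, a = rejection_rates(lam, c, 50_000, rng)
    print(f"lambda = {lam:.1f}: Geoff {g:.3f}, Alan {a:.3f}")
```

At λ = 0 both estimates should sit near 0.05, which checks that the tests have matching size; the remaining rows give a numerical picture of how the two power curves compare.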