
Exam information
Course code and name: STAT2004 Statistical Modelling and Analysis
Semester: Semester 2, 2020
Exam type: Online, non-invigilated
Exam date and time: Please refer to your personalised timetable
Exam duration
You have a 12-hour window in which you must complete your exam. You
can access and submit your exam at any time within the 12 hours. Even
though you have the entire 12 hours to complete and submit your exam,
the expectation is that it will take most students between 2 and 2.5 hours.
Reading time
Reading time has not been formally allocated for online exams; however,
students are encouraged to review and plan their approach for the exam
before they start. The total exam time should be sufficient to do this.
Exam window
You must commence your exam during the time listed in your
personalised timetable. The exam will remain open only for the duration
of the exam.
Weighting: This exam is weighted at 70% of your total mark for this course.
Permitted materials: This is an open book exam – you may use any course resource, including lecture notes, reference notes, statistical software and online resources.
Instructions
Answer all the questions. Note that not all questions are worth the same
number of marks. Allocate your time appropriately for each question.
You need to write or type your answers on blank paper (clearly label each
solution with the problem it answers).
You must submit your answers as a single pdf file through Blackboard
before the end of the allowed time.
For questions requiring an audio response, please upload these as
separate audio files via Blackboard before the end of the allowed time.
You should include your name and student number on the first page of
the file that you submit.
Who to contact
If you have any concerns or queries about a particular question, or need
to make any assumptions to answer the question, state these at the start
of your solution to that question. You may also include any queries you
would have raised about a particular question had you been able to
‘raise your hand’ in an examination room.
If you experience any technical difficulties during the exam, contact the
Library AskUs service for advice (open 7am–10pm, 7 days a week,
Brisbane time):
Chat: https://support.my.uq.edu.au/app/chat/chat_launch_lib/p/45
Phone: +61 7 3506 2615
Email: [email protected]
You should also ask for an email documenting the advice provided so you
can provide this on request.
In the event of a late submission, you will be required to submit evidence
that you completed the exam in the time allowed. We recommend you use
a phone camera to take photos (or a video) of every page of your exam.
Ensure that the photos are time-stamped.
If you submit your exam after the due time then you should send details
(including any evidence) to SMP Exams ([email protected]) as
soon as possible after the end of the exam.

Important exam condition information
The normal academic integrity rules apply.
You cannot cut-and-paste material other than your own work as answers.
You are not permitted to consult any other person – whether directly,
online, or through any other means – about any aspect of this assessment
during the period that this assessment is available.
If it is found that you have given or sought outside assistance with this
assessment then that will be deemed to be cheating and will result in
disciplinary action.
By undertaking this online assessment you will be deemed to have
acknowledged UQ’s academic integrity pledge and to have made the following
declaration:
“I certify that my submitted answers are entirely my own work and that I
have neither given nor received any unauthorised assistance on this
assessment item”.

STAT2004 - Statistical Modelling and Analysis
Semester Two Final Examinations, 2020
Question 1 (9 marks)
In an exciting scene¹ from the 1980 space-opera film “Flash Gordon”, two protagonists take
turns in putting their arms into different holes of a large tree stump, where the wood beast
lives, in order to see who is subjected to its fatal sting. In a twist on this competition, the
sting may take two days to kill, so the game continues even if a sting is made.
Suppose that there are nine holes in this stump, all of which are identical looking except that
three of them lead to a fatal sting. You are to take on the role of one of the protagonists.
(a) (1 mark) Suppose you go first. What is the probability that you do not get stung on
your first turn?
(b) (1 mark) Suppose you went first and did not get stung. What is the probability that
your competitor also does not get stung on their first turn, given that you did not get
stung on your first turn?
(c) (1 mark) Suppose you went first and got stung. What is the probability that your
competitor does not get stung on their first turn, given that you got stung on your first
turn?
(d) (2 marks) Suppose that we stop the game after each player has taken exactly one turn.
Combining your results from parts (a)–(c), or otherwise, show that the probability that
you do not get stung is the same regardless of whether you go first or second.
(e) (4 marks) Now suppose we keep playing until every hole has been tried exactly once.
Would it be better to go first or second? Justify your answer using probability calculations.
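One way to check hand calculations for Question 1 is to enumerate every equally likely placement of the three stinging holes. The sketch below is illustrative only and is not part of the exam; it assumes (by symmetry) that the holes can be relabelled in the order they are tried, and that the first player takes the odd-numbered turns. The helper name `safe_probability` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

HOLES = 9   # holes in the stump
STINGS = 3  # holes that lead to a sting

def safe_probability(my_turns):
    """Exact P(player is never stung), with holes relabelled in trial order.

    my_turns: positions (0-based) in the trial order at which this player
    reaches in. Every set of STINGS positions is equally likely to hold
    the stings.
    """
    mine = set(my_turns)
    layouts = list(combinations(range(HOLES), STINGS))
    safe = sum(1 for layout in layouts if not mine & set(layout))
    return Fraction(safe, len(layouts))

# One turn each (part (d)): first player tries position 0, second position 1.
first_one_turn = safe_probability([0])
second_one_turn = safe_probability([1])

# Every hole tried once (part (e)): first player takes 5 turns, second takes 4.
first_full = safe_probability(range(0, HOLES, 2))
second_full = safe_probability(range(1, HOLES, 2))

print("one turn each:", first_one_turn, second_one_turn)
print("all holes    :", first_full, second_full)
```

Exact fractions are used so the enumeration can be compared directly with a combinatorial argument.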
Question 2 (10 marks)
Suppose we tossed a fair coin n times, obtaining x = 16 heads. However, we later forgot the
number of times we tossed the coin! So n is an unknown value here, and we would like to
estimate it based on our observed outcome of x = 16 heads.
(a) (2 marks) Write down the likelihood function for the parameter n given the observed
data x = 16.
(b) (2 marks) Show that this likelihood function is maximised at both $\hat{n} = 31$ and $\hat{n} = 32$.
(c) (2 marks) [Audio Question:] Give an intuitive explanation as to why the likelihood
is maximised at two values here.
(d) (2 marks) Let $X \sim \text{Binom}(n, 0.5)$. Find the largest n value such that $P(X \ge 16) \le 0.05$
and the smallest n value such that $P(X \le 16) \le 0.05$.
(e) (2 marks) Using your results from part (d), or otherwise, construct and interpret a 90%
confidence interval for n based on the observed data x = 16.
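For Question 2, the likelihood in n can be tabulated exactly to sanity-check an algebraic argument. The sketch below is illustrative and assumes the likelihood is the Binomial(n, 1/2) probability of x = 16 heads; the grid upper limit of 60 and the helper names are arbitrary choices, not part of the exam.

```python
from fractions import Fraction
from math import comb

X_OBS = 16  # observed number of heads

def likelihood(n, x=X_OBS):
    """Binomial(n, 1/2) probability of x heads, viewed as a function of n."""
    if n < x:
        return Fraction(0)
    return Fraction(comb(n, x), 2 ** n)

def upper_tail(n, x=X_OBS):
    """P(X >= x) for X ~ Binomial(n, 1/2)."""
    return sum(likelihood(n, k) for k in range(x, n + 1))

def lower_tail(n, x=X_OBS):
    """P(X <= x) for X ~ Binomial(n, 1/2)."""
    return sum(likelihood(n, k) for k in range(0, min(x, n) + 1))

# Tabulate the likelihood over a grid of candidate n (upper limit is arbitrary).
values = {n: likelihood(n) for n in range(X_OBS, 61)}
best = max(values.values())
maximisers = [n for n, v in values.items() if v == best]
print("maximisers:", maximisers)
```

The tail helpers can be scanned over n in the same way for part (d), for example by printing `float(upper_tail(n))` and `float(lower_tail(n))` for a range of n.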
¹ https://www.youtube.com/watch?v=80sCD2p0W1Q
Question 3 (17 marks)
Let $Z_1, Z_2, \ldots$ be independent and identically distributed copies of $Z$, where $Z$ is a Gaussian
random variable with mean 0 but some unknown variance $\sigma^2$. For each $n$, define a random
variable $S_n$ by
$$S_n = \sum_{i=1}^{n} Z_i^2.$$
(a) (3 marks) Evaluate $E(Z^2)$ and $E(Z^4)$. (Hint: moment generating functions may be
helpful here.)
(b) (4 marks) Let $Y = Z^2$. Find the probability density function for $Y$, and calculate its
mean and variance.
(c) (4 marks) Using the Central Limit Theorem, or otherwise, show that for every constant
$k$ the probability $P(n - k\sqrt{2n} \le S_n/\sigma^2 \le n + k\sqrt{2n})$ tends to a limit as $n \to \infty$.
(d) (4 marks) Using the result from part (c), or otherwise, construct an approximate 95%
confidence interval for the population variance $\sigma^2$ based on the following sample of 24
draws from a $N(0, \sigma^2)$ distribution:
z=c(-1.0772249, -3.8338972, -3.9409255, 0.7811483, -0.0734284, 4.2074467,
0.1049149, -2.8546731, 4.3560279, -0.8312923, 2.5316146, 1.1064087,
-0.9579516, -1.7817389, -2.7417048, -1.2759260, 0.4189448, -2.4035361,
-1.5782865, 4.3048260, 0.6618659, -3.7423525, 1.5266717, 3.7929504)
Feel free to copy-and-paste this dataset into R (or any other language of your choice).
(e) (2 marks) [Audio question:] Give an interpretation of the confidence interval you
computed in part (d) in a way that is understandable for another STAT2004 student.
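The interval in part (d) can be computed numerically by inverting the event in part (c). The sketch below shows one possible construction and is not a model answer: it assumes the limiting probability in (c) is that of a standard normal falling in [-k, k] (since $Z_i^2/\sigma^2$ has mean 1 and variance 2), takes k as the 97.5% standard normal quantile, and rearranges the inequality for $\sigma^2$.

```python
from math import sqrt
from statistics import NormalDist

# Sample of 24 draws copied from the exam (R vector rewritten as a Python list).
z = [-1.0772249, -3.8338972, -3.9409255, 0.7811483, -0.0734284, 4.2074467,
     0.1049149, -2.8546731, 4.3560279, -0.8312923, 2.5316146, 1.1064087,
     -0.9579516, -1.7817389, -2.7417048, -1.2759260, 0.4189448, -2.4035361,
     -1.5782865, 4.3048260, 0.6618659, -3.7423525, 1.5266717, 3.7929504]

n = len(z)
s_n = sum(v * v for v in z)          # S_n = sum of squared observations

k = NormalDist().inv_cdf(0.975)      # two-sided 95% normal quantile
half_width = k * sqrt(2 * n)

# n - k*sqrt(2n) <= S_n / sigma^2 <= n + k*sqrt(2n), rearranged for sigma^2.
lower = s_n / (n + half_width)
upper = s_n / (n - half_width)

print(f"S_n = {s_n:.4f}, point estimate S_n/n = {s_n / n:.4f}")
print(f"approximate 95% CI for sigma^2: ({lower:.4f}, {upper:.4f})")
```

The rearrangement is only valid while `n - half_width` is positive, which holds for n = 24 at the 95% level.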
Question 4 (17 marks)
Suppose that the waiting time X between buses on a certain route can be modelled by an
exponential distribution with pdf given by
$$f(x;\lambda) = \lambda e^{-\lambda x}, \quad x \ge 0,$$
where λ > 0 is an unknown frequency parameter. The bus company, LinksTran, claims that
buses on this route arrive once every 12 minutes on average, but a customer suspects that
the true average waiting time is longer than this. To investigate this dispute, the customer
turns up to her local bus stop at 14 randomly selected times during a week, and records the
waiting times X1, X2, . . . , X14 for buses on this route to arrive.
(a) (1 mark) Briefly explain why the null and alternative hypotheses here are $H_0 : \lambda = 1/12$
and $H_1 : \lambda = \lambda_1 < 1/12$, respectively.
(b) (1 mark) Write down the likelihood function for $\lambda$ given the data $X = (X_1, X_2, \ldots, X_{14})^\top$.
(c) (5 marks) Construct a likelihood ratio test for testing $H_0 : \lambda = 1/12$ against
$H_1 : \lambda = \lambda_1 < 1/12$, and show that it reduces to a test with rejection region
$$\{\, X_1 + X_2 + \cdots + X_{14} \ge d, \ \text{for some critical value } d \,\}.$$
(d) (2 marks) [Audio question:] Explain to a LinksTran operations officer what the test
in part (c) is doing and why this test makes intuitive sense.
It is given to you that the sum, $Y = X_1 + X_2 + \cdots + X_{14}$, of fourteen independent exponential($\lambda$)
random variables has a gamma distribution with shape = 14, rate = $\lambda$, and pdf given by
$$f(y) = \frac{\lambda^{14}}{\Gamma(14)}\, y^{13} e^{-\lambda y}, \quad y \ge 0.$$
You do not have to show this result – it is given to you as a fact.
(e) (2 marks) Using the pgamma function in R, or otherwise, show that setting the critical
value d in the test to be 248.023 minutes controls the Type I error at 5%.
(f) (2 marks) Suppose that the true rate of buses on this route is in fact once every 24
minutes. Using the pgamma function in R, or otherwise, compute the power of this test.
(g) (1 mark) Is this test the most powerful at the 5% level for testing H0 : λ = 1/12 against
H1 : λ < 1/12? Briefly explain why, or why not.
The observed waiting times of buses for these 14 randomly selected times were:
x=c(35.441696, 7.690213, 2.786401, 32.883485, 1.700997, 12.671055, 20.412970,
28.720957, 24.971242, 2.974075, 14.157260, 23.623562, 12.801789, 30.215812)
Feel free to copy-and-paste this dataset into R (or any other language of your choice).
(h) (1 mark) Decide between the two hypotheses from part (a) based on the observed data.
(i) (2 marks) Supplement your decision from part (h) by computing a p-value. You may
use the pgamma function in R, or otherwise, to help you answer this question.
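For readers working outside R, the gamma tail probabilities needed in parts (e), (f) and (i) can be obtained without `pgamma`. The sketch below is illustrative only: it relies on the identity that, for an integer shape m, P(Y >= y) for Y ~ Gamma(m, rate) equals P(N <= m - 1) for N ~ Poisson(rate * y). The helper name `gamma_upper_tail` is hypothetical; in R the corresponding call would be `pgamma(y, shape = 14, rate = rate, lower.tail = FALSE)`.

```python
from math import exp, factorial

SHAPE = 14        # number of waiting times summed
D_CRIT = 248.023  # critical value quoted in part (e), in minutes

def gamma_upper_tail(y, rate, shape=SHAPE):
    """P(Y >= y) for Y ~ Gamma(shape, rate) with integer shape.

    Uses P(Y >= y) = P(Poisson(rate * y) <= shape - 1).
    """
    mu = rate * y
    return sum(exp(-mu) * mu ** k / factorial(k) for k in range(shape))

# Observed waiting times copied from the exam (R vector as a Python list).
x = [35.441696, 7.690213, 2.786401, 32.883485, 1.700997, 12.671055, 20.412970,
     28.720957, 24.971242, 2.974075, 14.157260, 23.623562, 12.801789, 30.215812]

type_one_error = gamma_upper_tail(D_CRIT, rate=1 / 12)  # under H0
power = gamma_upper_tail(D_CRIT, rate=1 / 24)           # if buses come every 24 min
total_wait = sum(x)                                     # observed test statistic
p_value = gamma_upper_tail(total_wait, rate=1 / 12)     # P(Y >= observed | H0)

print(f"Type I error at d = {D_CRIT}: {type_one_error:.4f}")
print(f"power at rate 1/24: {power:.4f}")
print(f"observed sum: {total_wait:.3f}, p-value: {p_value:.4f}")
```

Comparing `total_wait` with `D_CRIT` gives the decision in part (h); the p-value in part (i) is the same tail function evaluated at the observed sum.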
Question 5 (17 marks)
A paper manufacturer is interested in improving their product’s tensile strength. The engineering
team at this manufacturer believes that tensile strength is a function of the hardwood
concentration in the pulp, and that the range of hardwood concentrations of practical interest
is between 5% and 20%. This team of engineers decides to investigate four levels of hardwood
concentrations: 5%, 10%, 15%, and 20%. They decide to make up six test specimens
at each concentration level by using a pilot plant. All 24 specimens were then tested on a
laboratory tensile tester in random order. A side-by-side boxplot of the tensile strengths for
each hardwood concentration is given below:
[Figure: side-by-side boxplots of tensile strength for each hardwood concentration]
A one-way analysis-of-variance (ANOVA) is used to test whether there are systematic differences
between the tensile strengths across the four different hardwood concentrations.
(a) (3 marks) Write down the underlying statistical model for a one-way ANOVA, including
a full description of all the parameters and any distributional assumptions required by
the model.
(b) (2 marks) State the appropriate null and alternative hypotheses for this scenario.
(c) (3 marks) Show that the ANOVA model from part (a) can be written in the linear
model form “y = Xβ + ε”, by clearly defining each of the terms y, X, β and ε here.
A partially-complete one-way ANOVA table for these data is given below:
Source        df    SS        MS    F    Pr(F)
Hardwood %          382.79
Error
Total               512.96
(d) (3 marks) By completing the above ANOVA table, or otherwise, make a decision between
the null and alternative hypotheses. Support your decision by computing and
interpreting a p-value, and write your conclusions in a way that is understandable to
the engineering team.
Using their knowledge of physics, the engineering team proposes a square-root law for the
tensile strength of paper as a function of hardwood concentration:
$$\text{Tensile strength} = \beta \times \sqrt{\text{Hardwood concentration}} + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2).$$
(e) (2 marks) Show that the square-root model for this dataset can be written as a constrained
version of the ANOVA model from part (a). Clearly describe how the ANOVA
model parameter(s) are being constrained and identify the number of free parameters
in the two models.
The fitted square-root model is given in the output below:
Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
sqrt(hardwood.conc)   4.6481     0.1443   32.22   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.499 on 23 degrees of freedom
Multiple R-squared:  0.9783,  Adjusted R-squared:  0.9774
F-statistic:  1038 on 1 and 23 DF,  p-value: < 2.2e-16
(f) (3 marks) Using the given output and the ANOVA table from part (d), or otherwise,
test whether the square-root law provides an adequate fit to the data. Do this by stating
the null and alternative hypotheses, computing a test statistic, and finding a p-value for
your test.
(g) (1 mark) [Audio question:] Explain the conclusions from your test in part (f) in a
way that is understandable to the engineering team.
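The arithmetic behind parts (d) and (f) of Question 5 can be laid out step by step. The sketch below is illustrative, not a model answer. It assumes the two figures in the partial table are sums of squares, that the design is balanced with 4 groups of 6 specimens, and that the residual sum of squares of the square-root model can be recovered (approximately, since the output is rounded) as the squared residual standard error times its degrees of freedom. All variable names are hypothetical.

```python
GROUPS = 4         # hardwood concentration levels
PER_GROUP = 6      # specimens per level
N = GROUPS * PER_GROUP

# Part (d): complete the one-way ANOVA table.
ss_treat = 382.79                   # Hardwood % sum of squares (from the table)
ss_total = 512.96                   # total sum of squares (from the table)
ss_error = ss_total - ss_treat
df_treat = GROUPS - 1
df_error = N - GROUPS
ms_treat = ss_treat / df_treat
ms_error = ss_error / df_error
f_anova = ms_treat / ms_error       # compare with an F(df_treat, df_error) distribution

# Part (f): compare the square-root model (1 parameter) with the ANOVA model (4 means).
rse_sqrt = 2.499                    # residual standard error from the output (rounded)
df_sqrt = 23                        # its residual degrees of freedom
ss_sqrt = rse_sqrt ** 2 * df_sqrt   # approximate residual SS of the square-root model
df_extra = df_sqrt - df_error       # parameters freed by moving to the ANOVA model
f_fit = ((ss_sqrt - ss_error) / df_extra) / ms_error

print(f"ANOVA: SS_error = {ss_error:.2f}, F = {f_anova:.3f} on ({df_treat}, {df_error}) df")
print(f"Lack of fit: F = {f_fit:.3f} on ({df_extra}, {df_error}) df")
```

The Python standard library has no F distribution, so the p-values would need another tool; in R, `pf(f, df1, df2, lower.tail = FALSE)` returns the upper tail probability.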
END OF EXAMINATION