Module Code
MAT00033I/MAT00035I
BA, BSc and MMath Examinations 2018/9
Department:
Mathematics
Title of Exam:
Probability and Statistics/Statistics option - Statistical Inference II and Linear Models
Time Allowed:
3 hours
Allocation of Marks:
Each question carries 30 marks.
The marking scheme shown on each question is indicative only.
Instructions for Candidates:
This paper contains five questions. Answer all questions.
The first two pages of the question booklet contain tables of probabilities and
quantiles you may use in your answers.
Please write your answers in ink; pencil is acceptable for graphs and diagrams.
Do not use red ink.
Materials Supplied:
Answer booklet
Calculator
Do not write on this booklet before the exam begins.
Do not turn over this page until instructed to do so by an invigilator.
The following four tables contain quantile information for use in answering exam questions.
All probabilities refer to the lower tail of distributions, i.e. the probability mass to the left of
particular quantiles.
        p=0.5   p=0.9   p=0.95   p=0.975
n=7     0       1.41    1.89     2.36
n=8     0       1.40    1.86     2.31
n=9     0       1.38    1.83     2.26
Table 1: Selected quantiles for the t-distribution with n degrees of freedom, e.g.
$P(t_7 < 1.41) = 0.9$.

        p=0.5   p=0.9   p=0.95   p=0.975
        0       1.28    1.64     1.96
Table 2: Selected quantiles for the unit normal distribution N(0, 1), e.g.
$P(z < 1.28) = 0.9$.
        p=0.9    p=0.95   p=0.975
k=6     10.64    12.59    14.45
k=7     12.02    14.07    16.01
k=8     13.36    15.51    17.53
k=9     14.68    16.92    19.02
k=10    15.99    18.31    20.48
k=11    17.28    19.68    21.92
k=12    18.55    21.03    23.34
Table 3: Selected quantiles for the $\chi^2$-distribution with k degrees of freedom, e.g.
$P(\chi^2_6 < 10.64) = 0.9$.
             λ=1    λ=2    λ=3    λ=4    λ=5
P(X ≤ 6)     0.89   0.76   0.61   0.45   0.31
P(X ≤ 7)     0.95   0.87   0.74   0.60   0.45
P(X ≤ 8)     0.98   0.93   0.85   0.73   0.59
P(X ≤ 9)     0.99   0.97   0.92   0.83   0.72
P(X ≤ 10)    1.00   0.99   0.96   0.90   0.82
P(X ≤ 11)    1.00   0.99   0.98   0.95   0.89
P(X ≤ 12)    1.00   1.00   0.99   0.97   0.94
P(X ≤ 13)    1.00   1.00   1.00   0.99   0.97
P(X ≤ 14)    1.00   1.00   1.00   0.99   0.98
P(X ≤ 15)    1.00   1.00   1.00   1.00   0.99
P(X ≤ 16)    1.00   1.00   1.00   1.00   1.00
Table 4: Values of the cumulative mass function for Poisson distributions with rate
parameters λ = 1, ..., 5, e.g. $P(X \leq 6 \mid X \sim \mathrm{Poisson}(\lambda = 1)) = 0.89$.
1 (of 5). Body Mass Index (BMI) is used to measure a person’s weight relative to their
height. Medics from the UK’s National Health Service want to estimate the average
BMI of all the country’s residents. They provide you with the following data and
summary statistics, and ask you for advice.
18.87   22.92   17.82   29.98   23.65   17.9   24.44   25.69
Table 5: BMI measurements in kg/m² for n = 8 individuals.
\[
\bar{x} = n^{-1} \sum_{i=1}^{n} x_i = 22.66, \qquad
\hat{\sigma}^2 = (n-1)^{-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = 18.20.
\]
You suggest that a confidence interval would be a more informative estimate than
just a single number, and you start to explain what a confidence interval is.
(a) (i) Explain the difference between the level and the coverage of a confidence
interval. [5]
(ii) For a Frequentist statistician the probability of an event corresponds to
the fraction of times the event occurs in a long series of trials. In the
context of the confidence interval for average BMI and its probabilistic
behavior, what would constitute a trial? [5]
(b) (i) Clearly stating all relevant statistical assumptions, derive a confidence
interval with level $1 - \alpha = 0.95$ given the extra assumption that the
estimated population variance, $\hat{\sigma}^2$, is the true population variance. [5]
(ii) Derive another confidence interval with level 1 − α = 0.95 without the
assumption that the estimated population variance is the true population
variance. [5]
(iii) For the confidence interval computed in part (b)(i), how many measurements
would have been needed to achieve a confidence interval with
width less than 1 kg/m²? [5]
(c) You find out that the measurements you are analyzing are collected from pa-
tients that have visited hospital in the last year. How might this affect the
validity of the estimate? Which of the statistical assumptions may have been
violated? [5]
2 (of 5). Imagine a country’s legal system is overwhelmed with cases to make judgment on.
The government decides to produce an algorithm for deciding whether or not a
suspect is innocent of a crime based on data collected by the security services. The
algorithm performs a statistical hypothesis test.
(a) (i) Describe in words appropriate hypotheses for the algorithm to test. [2]
(ii) Describe the two different types of error that the algorithm could make.
Describe how the terms size, significance and power relate to these er-
rors. [8]
(iii) Consider the case in which the hypothesis test leads to the decision not
to convict a suspect. Strictly speaking, what should we conclude about
the suspect’s innocence and about the evidence collected by the security
services? [5]
(b) The data from the security services consist of counts of the number of crime
scenes a suspect was seen at over the past year. It is assumed that the number
of crime scenes a totally innocent citizen is seen at (just by coincidence) is
well described by a Poisson distribution with rate parameter λ = 3.
(i) Describe appropriate hypotheses for the test, referring to both the suspect
and the count data. [4]
(ii) Derive a rejection rule for the test so that it is significant at level α = 0.01.
[5]
(iii) A particular suspect has been seen at 12 crime scenes. Does your test
classify them as guilty? [1]
(iv) Compute a p-value for the test and explain how its value relates to the
observed count data. [5]
3 (of 5). You are asked by a colleague at the medical school to help interpret a linear model
they have fitted using the computer program R.
(a) In lectures we looked at some ways to score linear models.
(i) For what behavior were models rewarded? [3]
(ii) For what properties were the models punished? [3]
(iii) The $R^2$ statistic could be used to score a linear model; it also played a
key part in ANOVA calculations. Describe in words what the $R^2$ statistic
quantifies, and give a formula for computing it in terms of the mean
response value $\bar{y}$, the individual response values $y_i$, and the model’s fitted
values $\hat{y}_i$, $i = 1, \ldots, n$. [4]
(b) R has also performed an ANOVA for the model but your colleague does not
understand the results.
(i) The model’s response variable is the age of an individual at death and
the covariates measure aspects of their lifestyle. Why would the medic
be interested in the ANOVA procedure? [3]
(ii) Each row of the ANOVA table corresponds to a statistical hypothesis
test. What are the precise assumptions under which the tests are valid
and what are the hypotheses being tested? [4]
(iii) What is the role of the F-distribution in the ANOVA procedure? How are
the parameters for the distribution calculated for each test? [3]
(c) Your colleague shows you the ANOVA table
Analysis of Variance Table

Response: y
          Df  Sum Sq Mean Sq F value    Pr(>F)
x_1        1  3.0696  3.0696 12.7001 0.0007317 ***
x_2        1  6.1795  6.1795 25.5665 4.437e-06 ***
x_3        1  0.3129  0.3129  1.2946 0.2597951
x_4        1  1.5506    ????    ???? 0.0139934 *
Residuals 59 14.2604    ????
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(i) Show how the estimated variance for the model errors can be computed
from values in the ANOVA table. Hint: the estimated variance for the
model errors is 0.2417 (to 4 d.p.). [3]
(ii) Show how the $R^2$ statistic for the full model can be computed from values
in the ANOVA table. Hint: the $R^2$ statistic is 0.4379 (to 4 d.p.). [3]
(iii) Fill in the three missing cells (currently filled with ????) of the ANOVA
table. [4]
4 (of 5). A statistician is defending her linear model, and predictions consistent with it.
(a) Describe in words, in summation notation and in vector notation what is
meant by the model’s sum of squared residuals (SSE) for a linear model with
regression coefficient vector $\tilde{\beta}$. [5]
(b) Write down the formula for $\hat{\beta}$, the least-squares estimator for the regression
coefficients, and describe the way in which it is optimal. [5]
(c) Suppose that you are using the least-squares estimator $\hat{\beta}$ to predict previously
observed response values and that your colleague is using a different estimator
$\hat{\beta} + \delta$.
(i) Show that the difference in the sums of squares of errors can be written
as
\[
\mathrm{SSE}(\hat{\beta} + \delta) - \mathrm{SSE}(\hat{\beta}) = -2 (y - X\hat{\beta})^T X \delta + \delta^T X^T X \delta.
\]
[5]
(ii) If $\delta^T X^T X \delta$ is non-negative then
\[
\mathrm{SSE}(\hat{\beta} + \delta) - \mathrm{SSE}(\hat{\beta}) \geq -2 (y - X\hat{\beta})^T X \delta.
\]
Explain how you know $\delta^T X^T X \delta$ must be non-negative. [5]
(iii) Show that
\[
(y - X\hat{\beta})^T X = 0.
\]
[5]
(d) You have now shown that
\[
\mathrm{SSE}(\hat{\beta} + \delta) - \mathrm{SSE}(\hat{\beta}) \geq 0
\]
for any $\delta$. Explain what this means if you and your colleague were to use the
previously observed data to test your models. Can we prove that the least-squares
estimates will do better on unobserved data in the future? [5]
5 (of 5). Staff in the mathematics department are looking at the relationship between stu-
dents’ A-level grades and their degree classifications. Aggregated count data for
n = 400 graduates are given in Table 6. For the purposes of this question we will
refer to the groupings of grades as grade categories.
        3rd    2:2    2:1    1st
F-E       3     12     15     10
C-D       9     26     41      6
B-A      13     64    142     59
Table 6: Numbers of students achieving particular A-level grades and degree classifications.
(a) (i) Estimate the marginal probabilities for a student to fall into each A-level
category, and the marginal probabilities of falling into each degree cate-
gory. [3]
(ii) Given that each student’s A-level grade and degree classification are
probabilistically independent, estimate the number of students achiev-
ing each combination of A-level grade and degree classification. [3]
(iii) Clearly explaining all steps, test the hypothesis at significance level α = 0.05
that the degree classifications are independent of the A-level grades. To
save time on routine calculations, you can assume that the relevant test
statistic takes value 15.12. (You are not expected to calculate a p-value
for this test). [8]
(b) Your colleague suggests that you merge the F-E and C-D categories of A-level
results and repeat the test.
(i) Why might your colleague make this suggestion? [4]
(ii) Now that some categories are merged there are aspects of the null hypothesis
that are no longer being tested (e.g. if getting an F-E rather than
a C-D grade has an effect on degree classification). Without performing
any formal calculations, what can you say about the relative power of the
test with the merged cells? [4]
(iii) Describe a hypothesis test that is guaranteed to reject the null hypothesis
every time the null is false. State the power of this test. [4]
(iv) Your colleague is experimenting with different tests. She constructs one
test with significance 0.05 and a second test with significance 0.01. The
second test is a likelihood ratio test. What can the Neyman-Pearson
lemma tell her about the relative power of the tests, and why? [4]
End of examination.

SOLUTIONS: MAT00033I/MAT00035I
1. (a) (i) The coverage is the probability that the computed confidence interval
will contain the true numerical value of the quantity we are estimating.
The level is a lower bound on the coverage.
5 Marks
(ii) The coverage is a probability that corresponds to the proportion of re-
peated experiments producing data from which confidence intervals
containing the true parameter are calculated. In this case repeated ex-
periments/trials would involve randomly sampling new sets of people
from the population and measuring their BMI.
5 Marks
(b) (i) We begin by assuming that the measurements are well described as
an iid sample from a Normal distribution with mean $\mu$ and variance
$\sigma^2$, i.e. the finite population of BMI measurements for all residents is
approximated by the infinite population of possible values that can be
taken by the random variables. Since the mean of a set of independent normal
random variables is also normal and since the expectation and variance of the
mean are $\mu$ and $\sigma^2/n$, it follows that
\[
\bar{X} \sim N(\mu, \sigma^2/n) \iff t(X) := \frac{n^{1/2}(\bar{X} - \mu)}{\sigma} \sim N(0, 1).
\]
Since the quantity $t(X)$ is unit normal we can compute the probability
of it falling inside a particular interval using pre-computed quantiles. We
proceed to rearrange the inequalities that define the interval to derive the
formula for our confidence interval
\[
\begin{aligned}
1 - \alpha &= P\left[ z_{\alpha/2} \leq \frac{n^{1/2}(\bar{X} - \mu)}{\sigma} \leq z_{1-\alpha/2} \right] \\
&= P\left[ \bar{X} - n^{-1/2} \sigma z_{1-\alpha/2} \leq \mu \leq \bar{X} - n^{-1/2} \sigma z_{\alpha/2} \right].
\end{aligned}
\]
Plugging in the values $\bar{x} = 22.66$, $\sigma^2 = \hat{\sigma}^2 = 18.2$, $n = 8$,
$z_{1-\alpha/2} = -z_{\alpha/2} = 1.96$ we arrive at the interval
\[
R(X) = \left[ 22.66 - \sqrt{18.2/8} \times 1.96,\; 22.66 + \sqrt{18.2/8} \times 1.96 \right] = [19.70, 25.62].
\]
5 Marks
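The arithmetic for the known-variance interval can be checked with a few lines of Python. This is only a sketch under the solution's assumptions: the function name `z_interval` is illustrative, and the inputs are the rounded summary statistics and the Table 2 quantile quoted above.

```python
import math

# Summary statistics from the question and the quantile from Table 2.
n = 8
x_bar = 22.66
sigma2 = 18.20   # treated as the known population variance in part (b)(i)
z = 1.96         # z_{1 - alpha/2} for alpha = 0.05

def z_interval(mean, variance, size, quantile):
    """Return (lower, upper) = mean -/+ quantile * sqrt(variance / size)."""
    half_width = quantile * math.sqrt(variance / size)
    return mean - half_width, mean + half_width

lower, upper = z_interval(x_bar, sigma2, n, z)
print(f"[{lower:.2f}, {upper:.2f}]")  # prints "[19.70, 25.62]"
```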
(ii) When the same distributional assumptions hold but the population variance
is estimated from the data the statistic
\[
t(X) := \frac{n^{1/2}(\bar{X} - \mu)}{\hat{\sigma}} \sim t_{n-1}
\]
follows a t-distribution with $n - 1$ degrees of freedom. The same algebraic
manipulations performed for the previous question now lead us to the confidence interval
\[
\begin{aligned}
R(X) &= \left[ \bar{x} - \sqrt{\hat{\sigma}^2/n} \times t_{1-\alpha/2,\,n-1},\; \bar{x} + \sqrt{\hat{\sigma}^2/n} \times t_{1-\alpha/2,\,n-1} \right] \\
&= \left[ 22.66 - \sqrt{18.2/8} \times 2.36,\; 22.66 + \sqrt{18.2/8} \times 2.36 \right] \\
&= [19.10, 26.22].
\end{aligned}
\]
5 Marks
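(iii) The width of the first confidence interval is $2 n^{-1/2} \sigma z_{1-\alpha/2}$. For this to
be less than one we require
\[
2 n^{-1/2} \sigma z_{1-\alpha/2} < 1 \iff (2 \sigma z_{1-\alpha/2})^2 < n \iff 279.7 < n,
\]
where the bound is evaluated with $\sigma^2 = 18.2$ and $z_{1-\alpha/2} = 1.96$.
This implies that we would need to measure at least 280 people to produce
such a confidence interval.
5 Marks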
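Parts (b)(ii) and (b)(iii) can be checked numerically in the same spirit. The sketch below assumes the rounded inputs $\bar{x} = 22.66$ and $\hat{\sigma}^2 = 18.20$ and the tabulated quantiles 2.36 (Table 1, 7 degrees of freedom) and 1.96 (Table 2). The variable names are illustrative, and the endpoints agree with the solution only up to the rounding of those inputs.

```python
import math

n = 8
x_bar = 22.66
sigma2_hat = 18.20
t_quantile = 2.36   # Table 1: 7 degrees of freedom, p = 0.975
z = 1.96            # Table 2: p = 0.975

# (b)(ii): t-based interval with n - 1 = 7 degrees of freedom.
half_width_t = t_quantile * math.sqrt(sigma2_hat / n)
t_interval = (x_bar - half_width_t, x_bar + half_width_t)

# (b)(iii): smallest n whose interval width 2 * z * sigma / sqrt(n)
# is strictly below the target width, i.e. n > (2 * z * sigma / width)^2.
target_width = 1.0
bound = (2 * z * math.sqrt(sigma2_hat) / target_width) ** 2
n_required = math.floor(bound) + 1

print(f"t interval: [{t_interval[0]:.2f}, {t_interval[1]:.2f}]")
print(f"bound on n: {bound:.2f}, measurements needed: {n_required}")
```

With these rounded inputs the bound evaluates to about 279.67, so the conclusion that at least 280 measurements are needed does not depend on the last decimal place.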
(c) This new finding undermines the assumption that the measured BMIs are
iid samples from the total UK population. More specifically, we may sus-
pect that the subpopulation of people visiting hospital in the last year are
systematically different from the total population. This might mean that the
mean of the subpopulation sampled from is not the same as the mean of the
total population, which is what we are trying to estimate.
5 Marks
Remarks. Standard confidence interval material was covered extensively in lec-
tures and in homeworks. TD2 is tested here when the students look up normal
quantiles. Total: 30 Marks
2. (a) (i) The natural null hypothesis is that the suspect is innocent, the corre-
sponding alternative hypothesis is that the suspect is guilty.
2 Marks
(ii) The algorithm/test could reject the null hypothesis when the null hy-
pothesis is true. We called this type I error. The algorithm could also
fail to reject the null hypothesis when the null hypothesis is false. We
called this type II error.
4 Marks
The size of a test is the probability of making a type I error, the sig-
nificance is an upper bound on the size. The power is one minus the
probability of making a type II error.
4 Marks
(iii) The theory for statistical hypothesis testing makes clear that failing to
reject the null hypothesis is not the same as saying that the null hypoth-
esis is true, i.e. failing to show the suspect is guilty is not the same as
saying he is innocent. The test is really a test of the evidence: the test
asks whether the evidence is sufficient to convince us of the suspect’s
guilt.
5 Marks
(b) (i) As suggested in the question, we will assume that the number of times
an innocent citizen is seen at a crime scene is a Poisson random variable
with rate parameter $\lambda = 3$. We will also assume that the number of
times a criminal is seen at a crime scene is Poisson distributed with rate
parameter $\lambda_1 > 3$, so that the expected count for a criminal is higher than
that for an innocent citizen. Denoting the count as $X$, we write
\[
H_0 : X \sim \mathrm{Poisson}(\lambda = 3) \quad \text{vs.} \quad H_1 : X \sim \mathrm{Poisson}(\lambda = \lambda_1 > 3).
\]
4 Marks
(ii) A sensible rejection rule would reject the ‘innocent’ hypothesis ($H_0$)
when the count for a suspect reaches or exceeds some critical value denoted $c$. For
the test to have significance $\alpha = 0.01$ we require that the probability of
rejecting $H_0$ when $H_0$ is true is no larger than $\alpha$:
\[
P[X \geq c : H_0] = 1 - P[X \leq c - 1 : H_0] \leq \alpha
\iff 1 - \alpha \leq P[X \leq c - 1 : H_0].
\]
From Table 4 at the front of the exam booklet we can see that this statement
holds true for $c - 1 = 12, 13, \ldots$. The lowest critical value (leading
to the most powerful test) is $c = 13$. Explicitly, our rejection rule becomes:
‘Reject the hypothesis that the suspect is innocent if he is seen at
13 or more crime scenes in the past year’.
5 Marks
(iii) A suspect seen at 12 crime scenes would not trigger the rejection rule.
We do not reject the ‘innocent’ hypothesis.
1 Mark
(iv) The p-value is the probability that new data distributed according to the
specifications of the null hypothesis would take a value at least as extreme
as the value we have just observed, i.e.
\[
p = P[X \geq 12 : H_0] = 1 - P[X \leq 11 : H_0] = 1 - 0.98 = 0.02.
\]
5 Marks
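The critical value and the p-value both come from looking up the λ = 3 column of Table 4. A minimal sketch of that lookup follows. It deliberately uses the tabulated values as printed in the exam booklet rather than recomputing the distribution function, and the helper names `critical_value` and `p_value` are illustrative.

```python
from decimal import Decimal

# Column of Table 4 for rate parameter 3: k -> tabulated P(X <= k).
# Decimal keeps the two-decimal table entries exact.
cdf_table = {
    6: Decimal("0.61"), 7: Decimal("0.74"), 8: Decimal("0.85"),
    9: Decimal("0.92"), 10: Decimal("0.96"), 11: Decimal("0.98"),
    12: Decimal("0.99"), 13: Decimal("1.00"), 14: Decimal("1.00"),
    15: Decimal("1.00"), 16: Decimal("1.00"),
}

def critical_value(cdf, alpha):
    """Smallest c with tabulated P(X >= c) = 1 - P(X <= c - 1) <= alpha."""
    for k in sorted(cdf):
        if 1 - cdf[k] <= alpha:
            return k + 1
    raise ValueError("alpha is too small for the tabulated range")

def p_value(cdf, observed):
    """Tabulated P(X >= observed) under the null hypothesis."""
    return 1 - cdf[observed - 1]

c = critical_value(cdf_table, Decimal("0.01"))
p = p_value(cdf_table, 12)
print(c, p)  # prints "13 0.02"
```

An observed count of 12 falls just below the critical value of 13, which matches a p-value of 0.02 lying above the significance level of 0.01.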
Remarks. A similar question, taken from Dekking et al., was looked at in a
revision class towards the end of term. Question is relevant to TD1 and TD2. Total: 30 Marks
3. (a) (i) The model scores we looked at in lectures rewarded models which did
well at predicting previous observations having been optimized to fit
those same observations. In practice this was quantified by the maximum
value of the likelihood for the model parameters.
3 Marks
(ii) The models were punished if they had many parameters, or degrees of
freedom.
3 Marks
(iii) The $R^2$ statistic arises as the fraction of the variance of the response
variable that can be explained by the model. It is computed as
\[
R^2 = \frac{n^{-1} \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{n^{-1} \sum_{i=1}^{n} (y_i - \bar{y})^2}.
\]
4 Marks
(b) (i) The medic would be interested in the ANOVA because it will help them
identify covariates that are useful for predicting the age at which an
individual will die.
3 Marks
(ii) The ANOVA procedure assumes that the model error terms are iid nor-
mal. We also need to provide an order in which the coefficients for co-
variates are to be tested. Given such an order, row k contains information
relevant to the test between
\[
\begin{aligned}
H_0 &: \beta_k = 0 \cap \beta_j \neq 0, \quad j = 1, \ldots, k - 1, \\
H_1 &: \beta_k \neq 0 \cap \beta_j \neq 0, \quad j = 1, \ldots, k - 1,
\end{aligned}
\]
i.e. we test the null hypothesis that the kth coefficient is zero against
the alternative hypothesis that it is non-zero, where both hypotheses
assume that preceding coefficients are non-zero.
4 Marks
(iii) The distributions of the test statistics for each test/row are F-distributions
given the respective null hypotheses are true. We use quantiles of these
distributions to calculate critical values for the tests. The F-distributions
are parameterized by two degrees of freedom: one relating to the num-
ber of degrees of freedom in the part of the model being tested, the other
relating to the number of degrees of freedom for error (given the full
model). We denoted these quantities pk and n− p in lectures.
3 Marks
(c) (i) The unbiased estimate for the variance of the residuals can be computed
by dividing the sum of squares for the residuals by the number of de-
grees of freedom for them. Both of these quantities can be found in the
ANOVA table.
\[
\hat{\sigma}^2 = \frac{1}{n - p} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
= \frac{\text{Sum of squares of residuals}}{\text{degrees of freedom for residuals}}
= \frac{14.2604}{59} = 0.2417.
\]
3 Marks
(ii) The R2 statistic is computed by dividing the sum of squares attributable
to the model by the total sum of squares. The former is the sum of the
sums of squares attributable to each covariate. The latter is the sum of the
sums of squares for the covariates plus the sum of squares attributable
to the errors:
\[
R^2 = \frac{\text{Sum of squares of fitted values}}{\text{Total sum of squares}}
= \frac{3.0696 + 6.1795 + 0.3129 + 1.5506}{3.0696 + 6.1795 + 0.3129 + 1.5506 + 14.2604}
= 0.438.
\]
3 Marks
(iii) The missing values are
\[
\text{Mean Sq}_4 = \frac{\text{Sum Sq}_4}{\text{Df}_4} = \frac{1.5506}{1} = 1.5506,
\]
\[
\text{Mean Sq}_{\text{resid}} = \frac{\text{Sum Sq}_{\text{resid}}}{\text{Df}_{\text{resid}}} = \frac{14.2604}{59} = 0.2417
\]
(which we have already computed), and
\[
F = \frac{\text{Mean Sq}_4}{\text{Mean Sq}_{\text{resid}}} = \frac{1.5506}{14.2604/59} = 6.4153,
\]
where the unrounded residual mean square is used (dividing by the rounded value 0.2417 gives 6.4154).
The completed table is as follows
Analysis of Variance Table

Response: y
          Df  Sum Sq Mean Sq F value    Pr(>F)
x_1        1  3.0696  3.0696 12.7001 0.0007317 ***
x_2        1  6.1795  6.1795 25.5665 4.437e-06 ***
x_3        1  0.3129  0.3129  1.2946 0.2597951
x_4        1  1.5506  1.5506  6.4153 0.0139934 *
Residuals 59 14.2604  0.2417
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
4 Marks
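The three quantities in part (c) follow directly from the Df and Sum Sq columns, which the short Python sketch below reproduces. The dictionary layout and variable names are illustrative choices, and the inputs are the four-decimal values printed in the table, so results match R's output only up to that rounding.

```python
# Sums of squares copied from the ANOVA table; every covariate has Df = 1.
sum_sq = {"x_1": 3.0696, "x_2": 6.1795, "x_3": 0.3129, "x_4": 1.5506}
resid_ss, resid_df = 14.2604, 59

# (c)(i): estimated error variance = residual mean square.
sigma2_hat = resid_ss / resid_df

# (c)(ii): R^2 = model sum of squares / total sum of squares.
model_ss = sum(sum_sq.values())
r_squared = model_ss / (model_ss + resid_ss)

# (c)(iii): F statistic for x_4 = its mean square / residual mean square.
mean_sq_x4 = sum_sq["x_4"] / 1
f_x4 = mean_sq_x4 / sigma2_hat

print(round(sigma2_hat, 4), round(r_squared, 3), round(f_x4, 4))
```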
Remarks. The ANOVA material tested here is routine. The question requires
students to have remembered the meaning and structure of ANOVA calculation
without asking for many calculations. Total: 30 Marks
4. (a) The SSE is the sum of squared differences between the model’s predictions
for data quantities and the values for those quantities that are actually
observed. We write this formally as
\[
\mathrm{SSE}(\tilde{\beta}) = \sum_{i=1}^{n} (y_i - x_i^T \tilde{\beta})^2 = (y - X\tilde{\beta})^T (y - X\tilde{\beta}).
\]
5 Marks
(b) The OLS estimate for the true vector of regression coefficients is
\[
\hat{\beta} = (X^T X)^{-1} X^T y.
\]
The estimate is optimal in that it minimizes the SSE:
\[
\hat{\beta} = \arg\min_{\tilde{\beta}} \mathrm{SSE}(\tilde{\beta}).
\]
5 Marks
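(c) (i) Using the vector notation, the difference between the two SSEs is
\[
\begin{aligned}
\Delta\mathrm{SSE} &= \mathrm{SSE}(\hat{\beta} + \delta) - \mathrm{SSE}(\hat{\beta}) \\
&= (y - X(\hat{\beta} + \delta))^T (y - X(\hat{\beta} + \delta)) - (y - X\hat{\beta})^T (y - X\hat{\beta}) \\
&= ((y - X\hat{\beta}) - X\delta)^T ((y - X\hat{\beta}) - X\delta) - (y - X\hat{\beta})^T (y - X\hat{\beta}) \\
&= (y - X\hat{\beta})^T (y - X\hat{\beta}) - 2 (y - X\hat{\beta})^T X\delta + (X\delta)^T (X\delta) - (y - X\hat{\beta})^T (y - X\hat{\beta}) \\
&= -2 (y - X\hat{\beta})^T X\delta + (X\delta)^T (X\delta),
\end{aligned}
\]
where the first and last terms on the penultimate line cancel.
5 Marks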
(ii) $X\delta$ is a vector of length $n$, just like $\hat{y} = X\hat{\beta}$. The product
$\delta^T X^T X \delta = (X\delta)^T X\delta$ is the squared modulus of this vector, which
cannot be a negative quantity. Equivalently, we can write
\[
(X\delta)^T X\delta = \sum_{i=1}^{n} (X\delta)_i^2
\]
and see that the product is a sum of squared (real) quantities.
5 Marks
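(iii)
\[
\begin{aligned}
(y - X\hat{\beta})^T X &= (y - X (X^T X)^{-1} X^T y)^T X \\
&= (y^T - y^T X (X^T X)^{-1} X^T) X \\
&= y^T (X - X (X^T X)^{-1} X^T X) \\
&= y^T (X - X) \\
&= y^T 0 = 0.
\end{aligned}
\]
5 Marks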
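The identities in part (c) can be sanity-checked numerically. The sketch below uses hypothetical toy data (not taken from the exam) with an intercept and a single covariate, so that the normal equations $(X^T X)\beta = X^T y$ reduce to a 2x2 system that can be solved by hand; all names are illustrative.

```python
# Hypothetical toy data (not from the exam): intercept plus one covariate.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
X = [[1.0, xi] for xi in x]   # design matrix rows

def sse(beta):
    """Sum of squared residuals for coefficient vector beta."""
    return sum((yi - (beta[0] * r[0] + beta[1] * r[1])) ** 2
               for r, yi in zip(X, y))

# Solve the 2x2 normal equations (X^T X) beta = X^T y explicitly.
n = len(x)
sx, sxx = sum(x), sum(xi * xi for xi in x)
sy, sxy = sum(y), sum(xi * yi for xi, yi in zip(x, y))
det = n * sxx - sx * sx
beta_hat = [(sxx * sy - sx * sxy) / det, (n * sxy - sx * sy) / det]

# (c)(iii): the residual vector is orthogonal to every column of X.
resid = [yi - (beta_hat[0] * r[0] + beta_hat[1] * r[1]) for r, yi in zip(X, y)]
orthogonality = [sum(e * r[j] for e, r in zip(resid, X)) for j in range(2)]

# (c)(i)-(ii): the SSE gap equals |X delta|^2, which is non-negative.
delta = [0.3, -0.2]           # an arbitrary perturbation
gap = sse([beta_hat[0] + delta[0], beta_hat[1] + delta[1]]) - sse(beta_hat)
x_delta_sq = sum((delta[0] * r[0] + delta[1] * r[1]) ** 2 for r in X)

print(orthogonality, gap, x_delta_sq)
```

Up to floating-point error the two orthogonality entries are zero and `gap` equals `x_delta_sq`, as the derivation predicts.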
(d) The optimality result means that if we went back and tried to predict the
previously observed data with our linear model, the model using the OLS
estimates would achieve a sum of squared residuals no larger than that of a model
using any other coefficient estimates. We cannot prove
that the OLS estimates will lead to a smaller sum of squared residuals for
unobserved data in the future.
5 Marks
Remarks. The question tests TD3. Final question is more open-ended than most. Total: 30 Marks
5. (a) (i) The marginal probabilities for the A-level grade categories are (0.100, 0.205, 0.695).
The marginal probabilities for the degree classifications are (0.0625, 0.2550, 0.4950, 0.1875).
3 Marks
(ii) The joint probabilities consistent with the assumption of independence
follow from multiplying the marginal probabilities. The expected counts
follow from multiplying the probabilities by n = 400. Doing this, we
arrive at the figures
          3rd      2:2      2:1      1st
F-E      2.50    10.20    19.80     7.50
C-D      5.12    20.91    40.59    15.38
B-A     17.38    70.89   137.61    52.12
3 Marks
(iii) Our assumptions are that, collectively, the counts are well described by
a multinomial distribution. Pearson’s $\chi^2$-test further assumes that, under the
null hypothesis, the test statistic
\[
t(O) = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
\]
is approximately $\chi^2$-distributed with $(n_i - 1)(n_j - 1)$ degrees of freedom,
where $n_i = 3$ and $n_j = 4$ are the numbers of A-level and degree categories. Our test
hypotheses concern the probabilities of any individual
falling into a particular pair of categories. More specifically, our null hypothesis
states that all the joint probabilities $p_{ij}$ are the products of the
marginal probabilities $p_{i,\cdot}$ and $p_{\cdot,j}$. Our alternative hypothesis states that
this relationship is violated for at least one pair of categories, i.e.
\[
\begin{aligned}
H_0 &: p_{ij} = p_{i,\cdot}\, p_{\cdot,j} \quad \forall i, j, \\
H_1 &: \exists (i, j) \text{ such that } p_{ij} \neq p_{i,\cdot}\, p_{\cdot,j}.
\end{aligned}
\]
The test’s rejection rule tells us to reject the null hypothesis when the
test statistic exceeds a critical value $c$, here the 0.95 quantile of the
$\chi^2$-distribution with $2 \times 3 = 6$ degrees of freedom (Table 3). The result here is that
\[
t(O) = 15.12 > c = \chi^2_{2 \times 3} = 12.59
\]
so we do reject the null hypothesis.
8 Marks
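The expected counts and the quoted test statistic can be reproduced from Table 6 with a few lines of Python. This is a sketch rather than a prescribed method: the variable names are illustrative and the critical value 12.59 is read from Table 3 instead of being computed.

```python
# Observed counts from Table 6 (rows: A-level category, columns: degree class).
observed = [
    [3, 12, 15, 10],     # F-E
    [9, 26, 41, 6],      # C-D
    [13, 64, 142, 59],   # B-A
]

n = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Expected counts under independence: n * (row marginal) * (column marginal).
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson's statistic: sum over all cells of (O - E)^2 / E.
statistic = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)

df = (len(row_totals) - 1) * (len(col_totals) - 1)
critical = 12.59   # Table 3: k = 6, p = 0.95
reject = statistic > critical

print(round(statistic, 2), df, reject)  # prints "15.12 6 True"
```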
(b) (i) Pearson’s χ2-test is based on approximations that improve as the ex-
pected counts for every category increase. When the expected counts
are low we cannot be as confident that the test is well calibrated in
terms of its significance.
4 Marks
(ii) When we are unable to test and criticize the null hypothesis in as much
detail it becomes harder to reject the null when it is false. It is thus
harder to avoid type II error. We would therefore expect the test with
the merged categories to have less power than the first test.
4 Marks
(iii) A test that rejects the null no matter the values taken by the data will al-
ways reject the null, so will always reject the null when the null is false.
Its power is 1.
4 Marks
(iv) The Neyman-Pearson lemma cannot help your colleague here. It
would be able to tell her that the likelihood ratio test is at least as powerful as
the other test if the two tests had the same significance level, but they do not.
4 Marks
Remarks. Opportunities to access TDs 1 and 2. Questions (b)(ii) and (b)(iv)
require genuine understanding of results covered in lectures. Total: 30 Marks