Module Code

MAT00033I/MAT00035I

BA, BSc and MMath Examinations 2018/9

Department:

Mathematics

Title of Exam:

Probability and Statistics/Statistics option - Statistical Inference II and Linear Models

Time Allowed:

3 hours

Allocation of Marks:

Each question carries 30 marks.

The marking scheme shown on each question is indicative only.

Instructions for Candidates:

This paper contains five questions. Answer all questions.

The first two pages of the question booklet contain tables of probabilities and

quantiles you may use in your answers.

Please write your answers in ink; pencil is acceptable for graphs and diagrams.

Do not use red ink.

Materials Supplied:

Answer booklet

Calculator

Do not write on this booklet before the exam begins.

Do not turn over this page until instructed to do so by an invigilator.

The following four tables contain quantile information for use in answering exam questions.

All probabilities refer to the lower tail of distributions, i.e. the probability mass to the left of

particular quantiles.

        p=0.5   p=0.9   p=0.95   p=0.975
n=7     0       1.41    1.89     2.36
n=8     0       1.40    1.86     2.31
n=9     0       1.38    1.83     2.26

Table 1: Selected quantiles for the t-distribution with n degrees of freedom, e.g. $P(t_7 < 1.41) = 0.9$.

p=0.5   p=0.9   p=0.95   p=0.975
0       1.28    1.64     1.96

Table 2: Selected quantiles for the unit normal distribution N(0, 1), e.g. $P(z < 1.28) = 0.9$.

        p=0.9    p=0.95   p=0.975
k=6     10.64    12.59    14.45
k=7     12.02    14.07    16.01
k=8     13.36    15.51    17.53
k=9     14.68    16.92    19.02
k=10    15.99    18.31    20.48
k=11    17.28    19.68    21.92
k=12    18.55    21.03    23.34

Table 3: Selected quantiles for the $\chi^2$-distribution with k degrees of freedom, e.g. $P(\chi^2_6 < 10.64) = 0.9$.

             λ=1    λ=2    λ=3    λ=4    λ=5
P(X ≤ 6)     0.89   0.76   0.61   0.45   0.31
P(X ≤ 7)     0.95   0.87   0.74   0.60   0.45
P(X ≤ 8)     0.98   0.93   0.85   0.73   0.59
P(X ≤ 9)     0.99   0.97   0.92   0.83   0.72
P(X ≤ 10)    1.00   0.99   0.96   0.90   0.82
P(X ≤ 11)    1.00   0.99   0.98   0.95   0.89
P(X ≤ 12)    1.00   1.00   0.99   0.97   0.94
P(X ≤ 13)    1.00   1.00   1.00   0.99   0.97
P(X ≤ 14)    1.00   1.00   1.00   0.99   0.98
P(X ≤ 15)    1.00   1.00   1.00   1.00   0.99
P(X ≤ 16)    1.00   1.00   1.00   1.00   1.00

Table 4: Values of the cumulative mass function for Poisson distributions with rate parameters λ = 1, ..., 5, e.g. $P(X \le 6 \mid X \sim \mathrm{Poisson}(\lambda = 1)) = 0.89$.

1 (of 5). Body Mass Index (BMI) is used to measure a person’s weight relative to their

height. Medics from the UK’s National Health Service want to estimate the average

BMI of all the country’s residents. They provide you with the following data and

summary statistics, and ask you for advice.

18.87   22.92   17.82   29.98   23.65   17.90   24.44   25.69

Table 5: BMI measurements in kg/m² for n = 8 individuals.

$$\bar{x} = n^{-1} \sum_{i=1}^{n} x_i = 22.66, \qquad \hat{\sigma}^2 = (n-1)^{-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = 18.20.$$

You suggest that a confidence interval would be a more informative estimate than

just a single number, and you start to explain what a confidence interval is.

(a) (i) Explain the difference between the level and the coverage of a confidence

interval. [5]

(ii) For a Frequentist statistician the probability of an event corresponds to

the fraction of times the event occurs in a long series of trials. In the

context of the confidence interval for average BMI and its probabilistic

behavior, what would constitute a trial? [5]

(b) (i) Clearly stating all relevant statistical assumptions, derive a confidence interval with level $1 - \alpha = 0.95$ given the extra assumption that the estimated population variance, $\hat{\sigma}^2$, is the true population variance. [5]

(ii) Derive another confidence interval with level 1 − α = 0.95 without the

assumption that the estimated population variance is the true population

variance. [5]

(iii) For the confidence interval computed in part (b)(i), how many measurements would have been needed to achieve a confidence interval with width less than 1 kg/m²? [5]

(c) You find out that the measurements you are analyzing are collected from pa-

tients that have visited hospital in the last year. How might this affect the

validity of the estimate? Which of the statistical assumptions may have been

violated? [5]

2 (of 5). Imagine a country’s legal system is overwhelmed with cases to make judgment on.

The government decides to produce an algorithm for deciding whether or not a

suspect is innocent of a crime based on data collected by the security services. The

algorithm performs a statistical hypothesis test.

(a) (i) Describe in words appropriate hypotheses for the algorithm to test. [2]

(ii) Describe the two different types of error that the algorithm could make. Describe how the terms size, significance and power relate to these errors. [8]

(iii) Consider the case in which the hypothesis test leads to the decision not

to convict a suspect. Strictly speaking, what should we conclude about

the suspect’s innocence and about the evidence collected by the security

services? [5]

(b) The data from the security services consist of counts of the number of crime

scenes a suspect was seen at over the past year. It is assumed that the number

of crime scenes a totally innocent citizen is seen at (just by coincidence) is

well described by a Poisson distribution with rate parameter λ = 3.

(i) Describe appropriate hypotheses for the test, referring to both the suspect

and the count data. [4]

(ii) Derive a rejection rule for the test so that it is significant at level α = 0.01.

[5]

(iii) A particular suspect has been seen at 12 crime scenes. Does your test

classify them as guilty? [1]

(iv) Compute a p-value for the test and explain how its value relates to the

observed count data. [5]

3 (of 5). You are asked by a colleague at the medical school to help interpret a linear model

they have fitted using the computer program R.

(a) In lectures we looked at some ways to score linear models.

(i) For what behavior were models rewarded? [3]

(ii) For what properties were the models punished? [3]

(iii) The $R^2$ statistic could be used to score a linear model; it also played a key part in ANOVA calculations. Describe in words what the $R^2$ statistic quantifies, and give a formula for computing it in terms of the mean response value $\bar{y}$, the individual response values $y_i$, and the model’s fitted values $\hat{y}_i$, $i = 1, \ldots, n$. [4]

(b) R has also performed an ANOVA for the model but your colleague does not

understand the results.

(i) The model’s response variable is the age of an individual at death and

the covariates measure aspects of their lifestyle. Why would the medic

be interested in the ANOVA procedure? [3]

(ii) Each row of the ANOVA table corresponds to a statistical hypothesis

test. What are the precise assumptions under which the tests are valid

and what are the hypotheses being tested? [4]

(iii) What is the role of the F-distribution in the ANOVA procedure? How are

the parameters for the distribution calculated for each test? [3]

(c) Your colleague shows you the ANOVA table

Analysis of Variance Table

Response: y
          Df  Sum Sq Mean Sq F value    Pr(>F)
x_1        1  3.0696  3.0696 12.7001 0.0007317 ***
x_2        1  6.1795  6.1795 25.5665 4.437e-06 ***
x_3        1  0.3129  0.3129  1.2946 0.2597951
x_4        1  1.5506    ????    ???? 0.0139934 *
Residuals 59 14.2604    ????
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(i) Show how the estimated variance for the model errors can be computed

from values in the ANOVA table. Hint: the estimated variance for the

model errors is 0.2417 (to 4 d.p.). [3]

(ii) Show how the $R^2$ statistic for the full model can be computed from values in the ANOVA table. Hint: the $R^2$ statistic is 0.4379 (to 4 d.p.). [3]

(iii) Fill in the three missing cells (currently filled with ????) of the ANOVA

table. [4]

4 (of 5). A statistician is defending her linear model, and predictions consistent with it.

(a) Describe in words, in summation notation and in vector notation what is meant by the model’s sum of squared residuals (SSE) for a linear model with regression coefficient vector $\tilde{\beta}$. [5]

(b) Write down the formula for $\hat{\beta}$, the least-squares estimator for the regression coefficients, and describe the way in which it is optimal. [5]

(c) Suppose that you are using the least-squares estimator $\hat{\beta}$ to predict previously observed response values and that your colleague is using a different estimator $\hat{\beta} + \delta$.

(i) Show that the difference in the sums of squares of errors can be written as

$$\mathrm{SSE}(\hat{\beta} + \delta) - \mathrm{SSE}(\hat{\beta}) = -2(y - X\hat{\beta})^T X\delta + \delta^T X^T X \delta.$$

[5]

(ii) If $\delta^T X^T X \delta$ is non-negative then

$$\mathrm{SSE}(\hat{\beta} + \delta) - \mathrm{SSE}(\hat{\beta}) \ge -2(y - X\hat{\beta})^T X\delta.$$

Explain how you know $\delta^T X^T X \delta$ must be non-negative. [5]

(iii) Show that

$$(y - X\hat{\beta})^T X = 0.$$

[5]

(d) You have now shown that

$$\mathrm{SSE}(\hat{\beta} + \delta) - \mathrm{SSE}(\hat{\beta}) \ge 0$$

for any $\delta$. Explain what this means if you and your colleague were to use the previously observed data to test your models. Can we prove that the least-squares estimates will do better on unobserved data in the future? [5]

5 (of 5). Staff in the mathematics department are looking at the relationship between students’ A-level grades and their degree classifications. Aggregated count data for n = 400 graduates are given in Table 6. For the purposes of this question we will refer to the groupings of grades as grade categories.

        3rd   2:2   2:1   1st
F-E       3    12    15    10
C-D       9    26    41     6
B-A      13    64   142    59

Table 6: Numbers of students achieving particular A-level grades and degree classifications.

(a) (i) Estimate the marginal probabilities for a student to fall into each A-level

category, and the marginal probabilities of falling into each degree cate-

gory. [3]

(ii) Given that each student’s A-level grade and degree classification are

probabilistically independent, estimate the number of students achiev-

ing each combination of A-level grade and degree classification. [3]

(iii) Clearly explaining all steps, test the hypothesis at significance level α = 0.05

that the degree classifications are independent of the A-level grades. To

save time on routine calculations, you can assume that the relevant test

statistic takes value 15.12. (You are not expected to calculate a p-value

for this test). [8]

(b) Your colleague suggests that you merge the F-E and C-D categories of A-level

results and repeat the test.

(i) Why might your colleague make this suggestion? [4]

(ii) Now that some categories are merged there are aspects of the null hypothesis that are no longer being tested (e.g. whether getting an F-E rather than a C-D grade has an effect on degree classification). Without performing any formal calculations, what can you say about the relative power of the test with the merged cells? [4]

(iii) Describe a hypothesis test that is guaranteed to reject the null hypothesis

every time the null is false. State the power of this test. [4]

(iv) Your colleague is experimenting with different tests. She constructs one

test with significance 0.05 and a second test with significance 0.01. The

second test is a likelihood ratio test. What can the Neyman-Pearson

lemma tell her about the relative power of the tests, and why? [4]

End of examination.

SOLUTIONS: MAT00033I/MAT00035I

1. (a) (i) The coverage is the probability that the computed confidence interval

will contain the true numerical value of the quantity we are estimating.

The level is a lower bound on the coverage.

5 Marks

(ii) The coverage is a probability that corresponds to the proportion of re-

peated experiments producing data from which confidence intervals

containing the true parameter are calculated. In this case repeated ex-

periments/trials would involve randomly sampling new sets of people

from the population and measuring their BMI.

5 Marks
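The frequentist reading of coverage can be illustrated with a short simulation. The sketch below is only an illustration under stated assumptions: the "true" population is taken to be Normal with mean 22.66 and variance 18.2 (hypothetical values chosen to echo the question), each freshly drawn sample of n = 8 people counts as one trial, and the known-variance interval with level 0.95 is recomputed on every trial.

```python
import math
import random

# Hypothetical "true" population values, chosen only for illustration.
MU, SIGMA2, N, Z = 22.66, 18.2, 8, 1.96

def interval_covers_mu(rng):
    """One trial: draw a new sample of N people and build the z-interval."""
    sample = [rng.gauss(MU, math.sqrt(SIGMA2)) for _ in range(N)]
    x_bar = sum(sample) / N
    half_width = Z * math.sqrt(SIGMA2 / N)
    return x_bar - half_width <= MU <= x_bar + half_width

def estimate_coverage(trials=20000, seed=1):
    """Fraction of trials whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = sum(interval_covers_mu(rng) for _ in range(trials))
    return hits / trials

coverage = estimate_coverage()
print(f"estimated coverage: {coverage:.3f}")  # expected to be close to 0.95
```

The long-run fraction of intervals that contain the true mean is the coverage; here it should sit near the level because the assumed model holds exactly.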

(b) (i) We begin by assuming that the measurements are well described as

an iid sample from a Normal distribution with mean µ and variance

σ2, i.e. the finite population of BMI measurements for all residents is

approximated by the infinite population of possible values that can be

taken by the random variables. Since the mean of a set of normal random

variables is also normal and since the expectation and variance of the

mean are µ and σ2/n, it follows that

$$\bar{X} \sim N(\mu, \sigma^2/n) \iff t(X) := \frac{n^{1/2}(\bar{X} - \mu)}{\sigma} \sim N(0, 1).$$

Since the quantity $t(X)$ is unit normal we can compute the probability of it falling inside a particular interval using pre-computed quantiles. We proceed to rearrange the inequalities that define the interval to derive the formula for our confidence interval

$$1 - \alpha = P\left[ z_{\alpha/2} \le \frac{n^{1/2}(\bar{X} - \mu)}{\sigma} \le z_{1-\alpha/2} \right] = P\left[ \bar{X} - n^{-1/2}\sigma z_{1-\alpha/2} \le \mu \le \bar{X} - n^{-1/2}\sigma z_{\alpha/2} \right].$$

Plugging in the values $\bar{x} = 22.66$, $\sigma^2 = \hat{\sigma}^2 = 18.2$, $n = 8$, $z_{1-\alpha/2} = -z_{\alpha/2} = 1.96$ we arrive at the interval

$$R(X) = \left[ 22.66 - \sqrt{18.2/8} \times 1.96,\; 22.66 + \sqrt{18.2/8} \times 1.96 \right] = [19.70, 25.62].$$

5 Marks
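The arithmetic for the known-variance interval can be checked with a few lines of Python. This is a minimal sketch using the rounded summary values quoted in the question and the quantile 1.96 from Table 2; the variable names are illustrative.

```python
import math

x_bar, sigma2, n = 22.66, 18.2, 8   # summary values quoted in the question
z = 1.96                            # 0.975 quantile of N(0, 1), from Table 2

# Half-width of the interval: z * sigma / sqrt(n)
half_width = z * math.sqrt(sigma2 / n)
z_interval = (x_bar - half_width, x_bar + half_width)

print(tuple(round(v, 2) for v in z_interval))
```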

(ii) When the same distributional assumptions hold but the population variance is estimated from the data the statistic

$$t(X) := \frac{n^{1/2}(\bar{X} - \mu)}{\hat{\sigma}} \sim t_{n-1}$$

follows a t-distribution. The same algebraic manipulations performed for the previous question now lead us to the confidence interval

$$R(X) = \left[ \bar{x} - \sqrt{\hat{\sigma}^2/n} \times t_{1-\alpha/2,\,n-1},\; \bar{x} + \sqrt{\hat{\sigma}^2/n} \times t_{1-\alpha/2,\,n-1} \right] = \left[ 22.66 - \sqrt{18.2/8} \times 2.36,\; 22.66 + \sqrt{18.2/8} \times 2.36 \right] = [19.10, 26.22].$$

5 Marks
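The same check can be repeated for the t-based interval. The sketch below assumes the two-decimal quantile 2.36 from Table 1 (7 degrees of freedom); a more precise quantile moves the endpoints in the second decimal place, so small differences in the last digit are to be expected.

```python
import math

x_bar, s2, n = 22.66, 18.2, 8   # sample mean and sample variance from the question
t_quantile = 2.36               # 0.975 quantile of t with n - 1 = 7 d.f., from Table 1

# Half-width of the interval: t * s / sqrt(n)
half_width = t_quantile * math.sqrt(s2 / n)
t_interval = (x_bar - half_width, x_bar + half_width)

print(tuple(round(v, 2) for v in t_interval))
```

The interval is wider than the known-variance one because the t quantile exceeds the normal quantile 1.96, reflecting the extra uncertainty from estimating the variance.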

(iii) The width of the first confidence interval is $2 n^{-1/2} \sigma z_{1-\alpha/2}$. For this to be less than one we require

$$2 n^{-1/2} \sigma z_{1-\alpha/2} < 1 \iff (2 \sigma z_{1-\alpha/2})^2 < n \iff 279.67 < n.$$

This implies that we would need to measure at least 280 people to produce such a confidence interval.

5 Marks
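The sample-size bound can be verified numerically. This is a sketch under the assumptions of part (b)(i): the variance 18.2 is treated as known and the quantile 1.96 is used; the helper `width` is an illustrative name.

```python
import math

sigma2 = 18.2        # variance treated as known, as in part (b)(i)
z = 1.96             # 0.975 quantile of N(0, 1)
target_width = 1.0   # required width in kg/m^2

# Width 2 * z * sigma / sqrt(n) < target  <=>  n > (2 * z * sigma / target)^2
bound = (2 * z * math.sqrt(sigma2) / target_width) ** 2
n_required = math.floor(bound) + 1   # smallest integer strictly above the bound

def width(n):
    """Width of the known-variance interval for a sample of size n."""
    return 2 * z * math.sqrt(sigma2 / n)

print(n_required, round(width(n_required), 4), round(width(n_required - 1), 4))
```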

(c) This new finding undermines the assumption that the measured BMIs are

iid samples from the total UK population. More specifically, we may sus-

pect that the subpopulation of people visiting hospital in the last year are

systematically different from the total population. This might mean that the

mean of the subpopulation sampled from is not the same as the mean of the

total population, which is what we are trying to estimate.

5 Marks

Remarks. Standard confidence interval material was covered extensively in lec-

tures and in homeworks. TD2 is tested here when the students look up normal

quantiles. Total: 30 Marks

2. (a) (i) The natural null hypothesis is that the suspect is innocent, the corre-

sponding alternative hypothesis is that the suspect is guilty.

2 Marks

(ii) The algorithm/test could reject the null hypothesis when the null hy-

pothesis is true. We called this type I error. The algorithm could also

fail to reject the null hypothesis when the null hypothesis is false. We

called this type II error.

4 Marks

The size of a test is the probability of making a type I error; the significance is an upper bound on the size. The power is one minus the probability of making a type II error.

4 Marks

(iii) The theory for statistical hypothesis testing makes clear that failing to

reject the null hypothesis is not the same as saying that the null hypoth-

esis is true, i.e. failing to show the suspect is guilty is not the same as

saying he is innocent. The test is really a test of the evidence: the test

asks whether the evidence is sufficient to convince us of the suspect’s

guilt.

5 Marks

(b) (i) As suggested in the question, we will assume that the number of times an innocent citizen is seen at a crime scene is a Poisson random variable with rate parameter $\lambda = 3$. We will also assume that the number of times a criminal is seen at a crime scene is Poisson distributed with rate parameter $\lambda_1 > 3$, so that the expected count for a criminal is higher than that for an innocent citizen. Denoting the count as $X$, we write

$$H_0 : X \sim \mathrm{Poisson}(\lambda = 3) \quad \text{vs.} \quad H_1 : X \sim \mathrm{Poisson}(\lambda = \lambda_1 > 3).$$

4 Marks

(ii) A sensible rejection rule would reject the ‘innocent’ hypothesis ($H_0$) when the count for a suspect reaches or exceeds some critical value denoted $c$. For the test to have significance $\alpha = 0.01$ we require that the probability of rejecting $H_0$ when $H_0$ is true is no larger than $\alpha$:

$$P[X \ge c : H_0] = 1 - P[X \le c - 1 : H_0] \le \alpha \iff 1 - \alpha \le P[X \le c - 1 : H_0].$$

From the table at the front of the exam booklet we can see that this statement holds true for $c - 1 = 12, 13, \ldots$. The lowest critical value (leading to the most powerful test) is $c = 13$. Explicitly, our rejection rule becomes: ‘Reject the hypothesis that the suspect is innocent if he is seen at 13 or more crime scenes in the past year’.

5 Marks

(iii) A suspect seen at 12 crime scenes would not trigger the rejection rule.

We do not reject the ‘innocent’ hypothesis.

1 Mark

(iv) The p-value is the probability that new data distributed according to the specifications of the null hypothesis would take a value at least as extreme as the value we have just observed, i.e.

$$p = P[X \ge 12 : H_0] = 1 - P[X \le 11 : H_0] = 1 - 0.98 = 0.02.$$

5 Marks
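The table lookups behind the rejection rule and the p-value can be sketched in code. The block below is only an illustration: it hard-codes the λ = 3 column of Table 4 exactly as printed in the booklet (two-decimal values), and the function names are hypothetical.

```python
# Lower-tail probabilities P(X <= k) copied from the lambda = 3 column of
# Table 4 exactly as printed in the booklet (two-decimal values).
CDF_LAMBDA_3 = {6: 0.61, 7: 0.74, 8: 0.85, 9: 0.92, 10: 0.96, 11: 0.98,
                12: 0.99, 13: 1.00, 14: 1.00, 15: 1.00, 16: 1.00}

EPS = 1e-12  # guards against floating-point noise when comparing table values

def critical_value(cdf, alpha):
    """Smallest tabulated c with P(X >= c) = 1 - P(X <= c - 1) <= alpha."""
    for k in sorted(cdf):
        if cdf[k] >= 1 - alpha - EPS:
            return k + 1
    raise ValueError("table does not extend far enough")

def p_value(cdf, observed):
    """Upper-tail probability P(X >= observed) from the tabulated CDF."""
    return 1 - cdf[observed - 1]

c = critical_value(CDF_LAMBDA_3, alpha=0.01)
p = p_value(CDF_LAMBDA_3, observed=12)
print(c, round(p, 2), 12 >= c)
```

An observed count of 12 falls below the critical value, and equivalently the p-value is larger than α = 0.01, so the two ways of stating the decision agree.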

Remarks. A similar question, taken from Dekking et al., was looked at in a

revision class towards the end of term. Question is relevant to TD1 and TD2. Total: 30 Marks

3. (a) (i) The model scores we looked at in lectures rewarded models which did

well at predicting previous observations having been optimized to fit

those same observations. In practice this was quantified by the maximum

value of the likelihood for the model parameters.

3 Marks

(ii) The models were punished if they had many parameters, or degrees of

freedom.

3 Marks

(iii) The $R^2$ statistic arises as the fraction of the variance of the response variable that can be explained by the model. It is computed as

$$R^2 = \frac{n^{-1} \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{n^{-1} \sum_{i=1}^{n} (y_i - \bar{y})^2}.$$

4 Marks

(b) (i) The medic would be interested in the ANOVA because it will help them identify covariates that are useful for predicting the age at which an individual will die.

3 Marks

(ii) The ANOVA procedure assumes that the model error terms are iid normal. We also need to provide an order in which the coefficients for covariates are to be tested. Given such an order, row $k$ contains information relevant to the test between

$$H_0 : \beta_k = 0 \,\cap\, \beta_j \ne 0, \quad j = 1, \ldots, k - 1,$$
$$H_1 : \beta_k \ne 0 \,\cap\, \beta_j \ne 0, \quad j = 1, \ldots, k - 1,$$

i.e. we test the null hypothesis that the $k$th coefficient is zero against the alternative hypothesis that it is non-zero, where both hypotheses assume that preceding coefficients are non-zero.

4 Marks

(iii) The distributions of the test statistics for each test/row are F-distributions given the respective null hypotheses are true. We use quantiles of these distributions to calculate critical values for the tests. The F-distributions are parameterized by two degrees of freedom: one relating to the number of degrees of freedom in the part of the model being tested, the other relating to the number of degrees of freedom for error (given the full model). We denoted these quantities $p_k$ and $n - p$ in lectures.

3 Marks

(c) (i) The unbiased estimate for the variance of the residuals can be computed by dividing the sum of squares for the residuals by the number of degrees of freedom for them. Both of these quantities can be found in the ANOVA table.

$$\hat{\sigma}^2 = \frac{1}{n - p} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \frac{\text{Sum of squares of residuals}}{\text{degrees of freedom for residuals}} = \frac{14.2604}{59} = 0.2417.$$

3 Marks

(ii) The $R^2$ statistic is computed by dividing the sum of squares attributable to the model by the total sum of squares. The former is the sum of the sums of squares attributable to each covariate. The latter is the sum of the sums of squares for the covariates plus the sum of squares attributable to the errors:

$$R^2 = \frac{\text{Sum of squares of fitted values}}{\text{Total sum of squares}} = \frac{3.0696 + 6.1795 + 0.3129 + 1.5506}{3.0696 + 6.1795 + 0.3129 + 1.5506 + 14.2604} = 0.438.$$

3 Marks

(iii) The missing values are

$$\text{Mean Sq}_4 = \frac{\text{Sum Sq}_4}{\text{Df}_4} = \frac{1.5506}{1} = 1.5506,$$

$$\text{Mean Sq}_{\text{resid}} = \frac{\text{Sum Sq}_{\text{resid}}}{\text{Df}_{\text{resid}}} = 0.2417$$

(which we have already computed), and

$$F = \frac{\text{Mean Sq}_4}{\text{Mean Sq}_{\text{resid}}} = \frac{1.5506}{14.2604/59} = 6.4153.$$

The completed table is as follows

Analysis of Variance Table

Response: y
          Df  Sum Sq Mean Sq F value    Pr(>F)
x_1        1  3.0696  3.0696 12.7001 0.0007317 ***
x_2        1  6.1795  6.1795 25.5665 4.437e-06 ***
x_3        1  0.3129  0.3129  1.2946 0.2597951
x_4        1  1.5506  1.5506  6.4153 0.0139934 *
Residuals 59 14.2604  0.2417
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

4 Marks
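The three calculations in part (c) can be reproduced from the table entries alone. The following is a minimal sketch in Python (not the R session that produced the table); the dictionary names are illustrative and the inputs are the four-decimal values printed in the table, so results for other rows may differ from the R output in the last digit.

```python
# Sums of squares and degrees of freedom read from the ANOVA table.
sum_sq = {"x_1": 3.0696, "x_2": 6.1795, "x_3": 0.3129, "x_4": 1.5506}
df = {"x_1": 1, "x_2": 1, "x_3": 1, "x_4": 1}
resid_ss, resid_df = 14.2604, 59

# Residual mean square: the unbiased estimate of the error variance.
sigma2_hat = resid_ss / resid_df

# R^2 = model sum of squares / total sum of squares.
model_ss = sum(sum_sq.values())
r_squared = model_ss / (model_ss + resid_ss)

# Mean square and F statistic for each sequential row of the table.
mean_sq = {name: sum_sq[name] / df[name] for name in sum_sq}
f_value = {name: mean_sq[name] / sigma2_hat for name in sum_sq}

print(round(sigma2_hat, 4), round(r_squared, 3), round(f_value["x_4"], 4))
```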

Remarks. The ANOVA material tested here is routine. The question requires

students to have remembered the meaning and structure of ANOVA calculation

without asking for many calculations. Total: 30 Marks

4. (a) The SSE is the sum of squared differences between the model’s predictions for data quantities and the values for those quantities that are actually observed. We write this formally as

$$\mathrm{SSE}(\tilde{\beta}) = \sum_{i=1}^{n} (y_i - x_i^T \tilde{\beta})^2 = (y - X\tilde{\beta})^T (y - X\tilde{\beta}).$$

5 Marks

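(b) The OLS estimate for the true vector of regression coefficients is

$$\hat{\beta} = (X^T X)^{-1} X^T y.$$

The estimate is optimal in the sense that it minimizes the SSE:

$$\hat{\beta} = \operatorname*{argmin}_{\tilde{\beta}} \left( \mathrm{SSE}(\tilde{\beta}) \right).$$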

5 Marks

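(c) (i) Using the vector notation, the difference between the two SSEs is

$$\begin{aligned}
\Delta\mathrm{SSE} &= \mathrm{SSE}(\hat{\beta} + \delta) - \mathrm{SSE}(\hat{\beta}) \\
&= (y - X(\hat{\beta} + \delta))^T (y - X(\hat{\beta} + \delta)) - (y - X\hat{\beta})^T (y - X\hat{\beta}) \\
&= ((y - X\hat{\beta}) - X\delta)^T ((y - X\hat{\beta}) - X\delta) - (y - X\hat{\beta})^T (y - X\hat{\beta}) \\
&= (y - X\hat{\beta})^T (y - X\hat{\beta}) - 2(y - X\hat{\beta})^T X\delta + (X\delta)^T (X\delta) - (y - X\hat{\beta})^T (y - X\hat{\beta}) \\
&= -2(y - X\hat{\beta})^T X\delta + (X\delta)^T (X\delta),
\end{aligned}$$

where the first and last terms on the fourth line cancel.

5 Marks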

(ii) $X\delta$ is a vector of length $n$, just like $\hat{y} = X\hat{\beta}$ is. The product $\delta^T X^T X \delta = (X\delta)^T X\delta$ is the squared modulus of this vector, which cannot be a negative quantity. Equivalently, we can write

$$(X\delta)^T X\delta = \sum_{i=1}^{n} (X\delta)_i^2$$

and see that the product is a sum of squared (real) quantities.

5 Marks

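(iii)

$$\begin{aligned}
(y - X\hat{\beta})^T X &= (y - X(X^T X)^{-1} X^T y)^T X \\
&= (y^T - y^T X (X^T X)^{-1} X^T) X \\
&= y^T (X - X (X^T X)^{-1} X^T X) \\
&= y^T (X - X) \\
&= y^T 0 = 0.
\end{aligned}$$

5 Marks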
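The three results in part (c) can be sanity-checked numerically. The sketch below uses hypothetical toy data (not from the exam) with an intercept and a single covariate, so that the normal equations can be solved in closed form without any libraries; all names are illustrative.

```python
# Hypothetical toy data (not from the exam): intercept plus one covariate.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1]
X = [[1.0, xi] for xi in x]          # design matrix, one row per observation

def predict(beta):
    """Compute X @ beta for a two-element coefficient vector."""
    return [row[0] * beta[0] + row[1] * beta[1] for row in X]

def sse(beta):
    return sum((yi - fi) ** 2 for yi, fi in zip(y, predict(beta)))

# Solve the 2x2 normal equations (X^T X) beta = X^T y in closed form.
n, sx, sxx = len(x), sum(x), sum(xi * xi for xi in x)
sy, sxy = sum(y), sum(xi * yi for xi, yi in zip(x, y))
det = n * sxx - sx * sx
beta_hat = [(sxx * sy - sx * sxy) / det, (n * sxy - sx * sy) / det]

# (y - X beta_hat)^T X should vanish, up to rounding error.
resid = [yi - fi for yi, fi in zip(y, predict(beta_hat))]
cross_term = [sum(r * row[j] for r, row in zip(resid, X)) for j in range(2)]

def sse_gap(delta):
    """SSE(beta_hat + delta) - SSE(beta_hat)."""
    shifted = [b + d for b, d in zip(beta_hat, delta)]
    return sse(shifted) - sse(beta_hat)

def quadratic_form(delta):
    """delta^T X^T X delta, computed as the squared length of X @ delta."""
    return sum(v ** 2 for v in predict(delta))

deltas = ([0.5, 0.0], [-0.3, 0.2], [1.0, -1.0])
for delta in deltas:
    print(delta, round(sse_gap(delta), 6), round(quadratic_form(delta), 6))
```

For every perturbation tried, the SSE gap matches the quadratic form and is non-negative, which is what the algebra predicts once the cross term is shown to be zero.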

(d) The optimality result means that if we went back and tried to predict the previously observed data with our linear models, the model using the OLS estimates would achieve a sum of squared residuals no larger than that of the model using any other estimates. We cannot prove that the OLS estimates will lead to a smaller sum of squared residuals for unobserved data in the future.

5 Marks

Remarks. The question tests TD3. Final question is more open-ended than most. Total: 30 Marks

5. (a) (i) The marginal probabilities for the A-level grade categories are (0.100, 0.205, 0.695).

The marginal probabilities for the degree classifications are (0.0625, 0.2550, 0.4950, 0.1875). 3 Marks

(ii) The joint probabilities consistent with the assumption of independence

follow from multiplying the marginal probabilities. The expected counts

follow from multiplying the probabilities by n = 400. Doing this, we

arrive at the figures

        3rd     2:2      2:1     1st
F-E     2.50   10.20    19.80    7.50
C-D     5.12   20.91    40.59   15.38
B-A    17.38   70.89   137.61   52.12

3 Marks
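The marginal probabilities and expected counts follow mechanically from Table 6. The sketch below shows one way to compute them in plain Python; the variable names are illustrative.

```python
# Observed counts from Table 6, keyed by A-level grade category.
observed = {
    "F-E": [3, 12, 15, 10],
    "C-D": [9, 26, 41, 6],
    "B-A": [13, 64, 142, 59],
}
degrees = ["3rd", "2:2", "2:1", "1st"]

n = sum(sum(row) for row in observed.values())

# Estimated marginal probabilities: row and column totals divided by n.
row_prob = {grade: sum(row) / n for grade, row in observed.items()}
col_prob = [sum(row[j] for row in observed.values()) / n
            for j in range(len(degrees))]

# Under independence: expected count = n * P(row) * P(column).
expected = {grade: [n * row_prob[grade] * p for p in col_prob]
            for grade in observed}

for grade, row in expected.items():
    print(grade, [round(v, 3) for v in row])
```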

(iii) Our assumptions are that, collectively, the counts are well described by a multinomial distribution. Pearson’s $\chi^2$-test further assumes that the test statistic

$$t(O) = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

is approximately $\chi^2$-distributed with $(n_i - 1)(n_j - 1)$ degrees of freedom. Our test hypotheses concern the probabilities of any individual falling into a particular pair of categories. More specifically, our null hypothesis states that all the joint probabilities $p_{ij}$ are the products of the marginal probabilities $p_{i,\cdot}$ and $p_{\cdot,j}$. Our alternative hypothesis states that this relationship is violated for at least one pair of categories, i.e.

$$H_0 : p_{ij} = p_{i,\cdot}\, p_{\cdot,j} \quad \forall i, j,$$
$$H_1 : \exists (i, j) \text{ such that } p_{ij} \ne p_{i,\cdot}\, p_{\cdot,j}.$$

The test’s rejection rule tells us to reject the null hypothesis when the test statistic exceeds a critical value $c$, here the 0.95 quantile of the $\chi^2$-distribution with $2 \times 3 = 6$ degrees of freedom (Table 3). The result here is that

$$t(O) = 15.12 > c = \chi^2_{2 \times 3} = 12.59$$

so we do reject the null hypothesis.

8 Marks
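The quoted test statistic can be recomputed directly from the observed counts. The following is a minimal sketch in plain Python; the critical value 12.59 is the tabulated 0.95 quantile for 6 degrees of freedom from Table 3 rather than something the code derives.

```python
# Observed counts from Table 6 (rows: F-E, C-D, B-A; columns: 3rd, 2:2, 2:1, 1st).
observed = [[3, 12, 15, 10], [9, 26, 41, 6], [13, 64, 142, 59]]

n = sum(map(sum, observed))
row_tot = [sum(row) for row in observed]
col_tot = [sum(col) for col in zip(*observed)]

def expected(i, j):
    """Expected count in cell (i, j) under independence."""
    return row_tot[i] * col_tot[j] / n

# Pearson's statistic: sum over cells of (O - E)^2 / E.
statistic = sum(
    (observed[i][j] - expected(i, j)) ** 2 / expected(i, j)
    for i in range(len(row_tot))
    for j in range(len(col_tot))
)

dof = (len(row_tot) - 1) * (len(col_tot) - 1)
critical = 12.59   # 0.95 quantile of chi-square with 6 d.f., from Table 3
reject_null = statistic > critical

print(round(statistic, 2), dof, reject_null)
```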

(b) (i) Pearson’s $\chi^2$-test is based on approximations that improve as the expected counts for every category increase. When the expected counts are low we cannot be as confident that the test is well calibrated in terms of its significance.

4 Marks

(ii) When we are unable to test and criticize the null hypothesis in as much

detail it becomes harder to reject the null when it is false. It is thus

harder to avoid type II error. We would therefore expect the test with

the merged categories to have less power than the first test.

4 Marks

(iii) A test that rejects the null no matter the values taken by the data will always reject the null, so will always reject the null when the null is false. Its power is 1.

4 Marks

(iv) The Neyman-Pearson lemma cannot help your colleague here. It would be able to tell her that the likelihood ratio test is the more powerful if the other test had the same significance level, but it does not.

4 Marks

Remarks. Opportunities to access TDs 1 and 2. Questions (b)(ii) and (b)(iv)

require genuine understanding of results covered in lectures. Total: 30 Marks
