STAT7002 Examination 2018 Page 1

Answer ALL questions. Section A carries 40% of the total marks and Section B carries 60% of the total marks. The relative weights attached to each question are as follows: A1 (9), A2 (8), A3 (8), A4 (9), A5 (6); B1 (20), B2 (20), B3 (20). The numbers in square brackets indicate the relative weight attached to each part question. An appendix containing some formulae from the STAT7002 course is provided at the end of this examination paper.

Section A

A1. A travel company uses the following questions on a questionnaire given to tourists as part of a visit to London. For each question, briefly identify a potential problem that might lead to bias. Explain your reasoning.

(a) ‘The Phantom of the Opera has played continuously at Her Majesty’s Theatre since 1986, winning over 70 major theatre awards and receiving much critical acclaim. Did you see it during your stay?’

[3]

(b) ‘Do you think that public transport in London is easy to use and reasonably priced? Please circle a response.’

YES NO

[3]

(c) ‘In the box below, please write down the amount that you spent during your stay in London (not including money spent on accommodation and travel).’

£

[3]

A2. A British town contains 25000 eligible voters. Before an election, a political researcher performs a simple random sample of 400 eligible voters. Each sampled voter is asked if they intend to vote for the Labour party candidate, with two possible responses: ‘Yes’ or ‘No’. Of the sampled voters, 118 answered ‘Yes’ to this question.

(a) Define the term simple random sample.

[2]

(b) Assuming that there was no non-response, calculate an estimate and an associated 95% confidence interval for the proportion of Labour voters in the town. You should define any notation that you introduce and show your working clearly.

[6]
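For a calculation of the kind asked for in A2(b), the arithmetic could be organised as in the Python sketch below. It is an illustration under stated assumptions rather than a model answer: it assumes the appendix formula ese(P̂) = sqrt(pq(1 − f)/(n − 1)) with f = n/N, and a normal approximation with multiplier 1.96. The function name `srs_proportion_ci` is hypothetical.

```python
import math


def srs_proportion_ci(successes, n, N, z=1.96):
    """Estimate a proportion from a simple random sample, with an approximate CI.

    Assumes the finite-population formula sqrt(p*q*(1 - f)/(n - 1)), f = n/N,
    and a normal approximation (z = 1.96 for roughly 95% coverage).
    """
    p_hat = successes / n          # sample proportion
    f = n / N                      # sampling fraction
    ese = math.sqrt(p_hat * (1 - p_hat) * (1 - f) / (n - 1))
    return p_hat, ese, (p_hat - z * ese, p_hat + z * ese)


# Figures taken from question A2: 118 'Yes' out of n = 400, N = 25000.
p_hat, ese, (lower, upper) = srs_proportion_ci(118, 400, 25000)
print(f"estimate = {p_hat:.3f}, ese = {ese:.4f}, CI = ({lower:.3f}, {upper:.3f})")
```

Because the sampling fraction here is small (400/25000), the finite population correction changes the interval only slightly.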


A3. A property investor owns four houses in London. The values of the houses are shown in the table below, with letters (A–D) to denote the different houses.

House    Value (£)
A        499,950
B        525,000
C        610,000
D        774,950

(a) A prospective buyer wishes to view two of the property investor’s houses. The buyer will choose which houses to view using a simple random sample. Assuming this sampling approach, derive the sampling distribution of the sample mean house value. You should define any notation or terms that you introduce.

[5]

(b) Using your answer to (a), calculate the expectation of the sample mean.

[3]
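The enumeration behind A3 can be sketched in Python as follows. This is an illustrative sketch, assuming that "simple random sample" here means sampling two houses without replacement, so that each of the six unordered pairs is equally likely. The variable names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# House values from the table in question A3.
values = {"A": 499_950, "B": 525_000, "C": 610_000, "D": 774_950}

# All unordered pairs of houses; under simple random sampling without
# replacement each pair is assumed to have the same selection probability.
samples = list(combinations(sorted(values), 2))
prob = Fraction(1, len(samples))

# Sampling distribution: pair -> (sample mean value, probability).
distribution = {
    pair: (Fraction(values[pair[0]] + values[pair[1]], 2), prob)
    for pair in samples
}

# Expectation of the sample mean, computed from the distribution.
expectation = sum(mean * p for mean, p in distribution.values())

for pair, (mean, p) in distribution.items():
    print(pair, float(mean), p)
print("E[sample mean] =", float(expectation))
```

Exact fractions are used so that the expectation can be compared with the population mean without rounding error.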

A4. The descriptions (a)–(c) outline sampling schemes. For each of (a)–(c), identify the type of sampling scheme used and describe a potential problem with the proposed sampling scheme. Justify your answers.

(a) A researcher wants to know about the experience of passengers who use the London Underground. A questionnaire is devised and the researcher stands outside Holborn Station between 0900 and 1100 on a given Monday morning, asking potential respondents who pass by to participate in a survey.

[3]

(b) An investigative reporter is interested in finding out about the living conditions of illegal workers. The reporter knows three illegal workers, who agree to participate in a study. These illegal workers are asked to invite any other illegal workers, whom they know, to participate in the same study.

[3]

(c) A high school contains 1200 pupils aged 11–16. The list of pupils in the school is ordered by date of birth (youngest to oldest) and every tenth pupil on the ordered list is selected to participate in a school sports event, until an overall sample size of 50 pupils is reached.

[3]
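The selection rule described in A4(c) can be traced with a short Python sketch. It is only an illustration: the question does not state where on the list the selection starts, so the sketch assumes it starts at the tenth pupil. The function name `systematic_positions` is hypothetical.

```python
def systematic_positions(list_size, step, sample_size, start=None):
    """Return the 1-based list positions picked by taking every `step`-th entry.

    Assumption: selection starts at position `step` unless `start` is given,
    and stops as soon as `sample_size` positions have been collected.
    """
    start = step if start is None else start
    positions = list(range(start, list_size + 1, step))
    return positions[:sample_size]


# Figures from question A4(c): 1200 pupils, every tenth pupil, 50 selected.
positions = systematic_positions(1200, 10, 50)
print(positions[0], positions[-1], len(positions))
```

Since the list is ordered from youngest to oldest, the last selected position shows how far down the age-ordered list the selection reaches before the target of 50 is met.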


A5. Over a 30-month period, 60 obese males participated in a weight loss study. The weight of each study participant was recorded at several time points. To show the change in mean weight of the participants over time, the study research team produced the following visual display of their data.

[Figure: plot showing five points; vertical axis tick marks at 0, 50, 100, 150, 200, 250, 300; horizontal axis tick marks at 0, 3, 12, 24, 30.]

(a) Identify two distinct problematic features of this visual display. Justify your answer.

[2]

(b) Identify the scale type used for each of the following study variables. You should justify your answer in each case.

(i) A participant’s weight (in pounds).

(ii) The number of visits to the gym that a participant makes.

[4]


Section B

B1. Researchers from a town’s council want to measure residents’ attitudes concerning the living environment in their town. Below are two statements that the researchers aim to present to a sample of residents as part of a questionnaire.

‘There is too much litter around the town centre.’

‘Public spaces and gardens within our town are well maintained.’

(a) Using these statements as examples, describe how a Likert Scale could be constructed in this questionnaire to measure the attitude of residents concerning the living environment in the town. Your answer should include a description of polarity and a definition of the polarity of each of the above statements.

[9]

(b) Describe how Likert Scale responses for a single item (such as either of those above) could be summarised and presented for a sample of residents who complete the questionnaire.

[3]

(c) Explain what is meant by the reliability of a measurement instrument and describe how the reliability of a questionnaire, in which several responses are used to measure the same attitude with a Likert scale, may be assessed.

[4]

The council decide that they will sample 200 of the town’s residents; the target population for the study is all adult residents of the town (20000 people). Sampling will be done by randomly selecting e-mail addresses of people who have paid Council Tax using the council’s online payment system, with a link to an online questionnaire sent to each selected e-mail address.

(d) Is this proposed sampling scheme satisfactory? Justify your answer.

[4]


B2. Suppose that $Y_1, \ldots, Y_N$ are binary variables in a population of size $N \in \mathbb{N}$ with $N > 2$. The population proportion is given by

$$P = \frac{1}{N} \sum_{i=1}^{N} Y_i.$$

A researcher wants to draw a simple random sample of size $n$ from this population (where $n < N$), in which the sampled variables are denoted $y_1, \ldots, y_n$.

(a) Denoting $\hat{P}$ as the sample mean of the $n$ sampled binary variables, show that

$$\mathrm{Var}(\hat{P}) = \frac{P(1 - P)(N - n)}{n(N - 1)}.$$

You may use the following without proof:

$$\mathrm{Cov}(y_j, y_k) = \frac{-P(1 - P)}{N - 1} \quad \text{for } j \neq k.$$

[6]

(b) The researcher wants to sample enough binary variables so that the standard error of $\hat{P}$ is less than some pre-specified positive constant $c$. Show that the number of sampled variables, $n$, should satisfy

$$n > \left[ \frac{4(N - 1)c^2}{N} + \frac{1}{N} \right]^{-1}.$$

[5]

A high school, in which the number of registered pupils is 900, wishes to perform a simple random sample of pupils. Each sampled pupil will be sent a postal questionnaire on various aspects of school life. One of the questions will ask ‘Overall, are you satisfied with the standard of teaching at school?’ with respondents given two answer options of ‘Yes’ or ‘No’. It is assumed that the proportion of pupils who would not answer this question is 10%.

(c) Calculate the number of pupils that should be sampled so that the proportion of pupils who are satisfied with the school’s standard of teaching can be estimated with a standard error no larger than 0.03. You should show your working clearly and define carefully any assumptions that you make.

[5]
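A sample size calculation of the kind asked for in B2(c) might be set up as in the Python sketch below. It is a sketch under stated assumptions, not a model answer: it uses the bound from B2(b), which rests on the worst case P(1 − P) ≤ 1/4, and it assumes that non-response is unrelated to the answer, so the required number of respondents is simply divided by the assumed response rate of 0.9. Both function names are hypothetical.

```python
import math


def min_respondents(N, c):
    """Smallest whole number of respondents meeting the bound from B2(b).

    Assumes the worst case P = 0.5, which is what the factor 4 in
    1/n <= 4(N - 1)c^2/N + 1/N builds in.
    """
    bound = 1 / (4 * (N - 1) * c**2 / N + 1 / N)
    return math.ceil(bound)


def inflate_for_nonresponse(n_respondents, response_rate):
    """Number to sample so that the expected number of respondents is enough.

    Assumes non-response is unrelated to the answer that would be given.
    """
    return math.ceil(n_respondents / response_rate)


# Figures from question B2: N = 900 pupils, c = 0.03, 10% item non-response.
n_resp = min_respondents(900, 0.03)
n_sample = inflate_for_nonresponse(n_resp, 0.9)
print(n_resp, n_sample)
```

Rounding up at each step keeps the calculation conservative; rounding only once at the end gives the same sample size for these figures.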

(d) The school’s headteacher assumes that pupils who are not satisfied with the standard of teaching at the school are less likely to answer the question on the standard of teaching than other pupils in the school, leading to missing data for some of the answers to this question. Describe this missing data assumption, using words and appropriate mathematical notation.

[4]


B3. A town contains two medical centres (labelled A and B). Centre A has 2000 registered adult patients and Centre B has 3000 registered adult patients. A medical researcher carries out a stratified random sample of adult patients from these medical centres, with stratification done by the centre at which a patient is registered. A total of 400 adult patients are sampled (150 registered at Centre A and 250 registered at Centre B) and the body mass index (BMI) is recorded for each sampled patient. For patients sampled from Centre A, the sample mean and sample standard deviation BMI are 25.2 kg/m² and 3.8 kg/m², respectively. For patients sampled from Centre B, the sample mean and sample standard deviation BMI are 28.1 kg/m² and 4.1 kg/m², respectively.

(a) Define the term stratified random sample.

[3]

(b) Calculate an estimate of the mean BMI of adults in the town and an associated 95% confidence interval. You should show your working and define any notation or terms that you introduce.

[8]
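The stratified calculation in B3(b) could be organised along the lines of the Python sketch below. It is an illustration under stated assumptions rather than a model answer: it uses the appendix formulae for the stratified mean and its estimated standard error with weights W_i = N_i/N and f_i = n_i/N_i, a normal approximation with multiplier 1.96, and it treats the 5000 registered adult patients as the town's adult population. The function name `stratified_mean_ci` is hypothetical.

```python
import math

# Stratum summaries from question B3 (N: registered adults, n: sampled).
strata = {
    "A": {"N": 2000, "n": 150, "mean": 25.2, "sd": 3.8},
    "B": {"N": 3000, "n": 250, "mean": 28.1, "sd": 4.1},
}


def stratified_mean_ci(strata, z=1.96):
    """Stratified estimate of a mean with an approximate confidence interval.

    Assumes weights W_i = N_i / N, sampling fractions f_i = n_i / N_i and
    ese = sqrt(sum_i W_i^2 s_i^2 (1 - f_i) / n_i), as listed in the appendix.
    """
    N = sum(s["N"] for s in strata.values())
    mu_hat = 0.0
    variance = 0.0
    for s in strata.values():
        W = s["N"] / N                 # stratum weight
        f = s["n"] / s["N"]            # stratum sampling fraction
        mu_hat += W * s["mean"]
        variance += W**2 * s["sd"]**2 * (1 - f) / s["n"]
    ese = math.sqrt(variance)
    return mu_hat, ese, (mu_hat - z * ese, mu_hat + z * ese)


mu_hat, ese, (lower, upper) = stratified_mean_ci(strata)
print(f"estimate = {mu_hat:.2f}, ese = {ese:.3f}, CI = ({lower:.2f}, {upper:.2f})")
```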

Another researcher plans to sample 400 of the town’s households at random and collect data on the BMI of adult occupants of each sampled home.

(c) Identify the sampling approach that this researcher has proposed. Justify your answer.

[3]

(d) Assuming that this sampling approach is used, write down an appropriate statistical model for BMI that accounts for variability between adults within households and for variability between households. You should define any notation or terms that you introduce.

[6]


STAT7002 Social Statistics: Some formulae

Below are some formulae from the STAT7002 course notes. Note that these formulae are just copied; there is no properly introduced notation and no explanation regarding each formula. The same symbol may mean different things in different formulae and may not necessarily apply to any examination question where the same symbol is used.

There is no guarantee that any of these formulae is needed in the examination.

In addition, there is no guarantee that all formulae required for this examination are listed below.

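$$\mathrm{ese}(\hat{\mu}) = \sqrt{\frac{s^2(1 - f)}{n}}, \qquad \mathrm{ese}(\hat{T}) = N\sqrt{\frac{s^2(1 - f)}{n}}, \qquad \mathrm{ese}(\hat{P}) = \sqrt{\frac{pq(1 - f)}{n - 1}}.$$

$$\mathrm{ese}(\hat{T}) = \sqrt{\sum_i N_i^2 s_i^2 (1 - f_i)/n_i}, \qquad \mathrm{ese}(\hat{\mu}) = \sqrt{\sum_i W_i^2 s_i^2 (1 - f_i)/n_i}.$$

$$\mathrm{ese}(\hat{P}) = \sqrt{\sum_i W_i^2 (1 - f_i) p_i q_i/(n_i - 1)}, \qquad f = \frac{n}{N}.$$

$$\frac{1}{n} \le \left(\frac{k}{S}\right)^2 + \frac{1}{N}, \qquad \frac{1}{n} \le \frac{4(N - 1)k^2}{N} + \frac{1}{N}, \qquad \frac{1}{n} \le \left(\frac{k}{CV}\right)^2 + \frac{1}{N}$$

$$\alpha = \frac{k\bar{r}}{1 + (k - 1)\bar{r}}, \qquad n_i = \frac{N_i}{N}\,n, \qquad n_i = W_i S_i/\sqrt{\lambda c_i}, \qquad n_i = \left(\frac{N_i S_i}{\sum_{i=1}^{k} N_i S_i}\right) n$$

$$\hat{\mu} = \sum_{i=1}^{k} W_i \bar{y}_i, \qquad \sqrt{\lambda} = \frac{\sum_{i=1}^{k} \sqrt{c_i}\, W_i S_i}{C - c_0}, \qquad \sqrt{\lambda} = \frac{V + \sum_{i=1}^{k} W_i^2 S_i^2/N_i}{\sum_{i=1}^{k} \sqrt{c_i}\, W_i S_i}$$

$$\mathrm{var}(\hat{\mu}) = \sum_{i=1}^{k} W_i^2 S_i^2 (1 - f_i)/n_i, \qquad \rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}$$

$$\alpha = \frac{k}{k - 1}\left(1 - \frac{\sum_{i=1}^{k} s_i^2}{s_Y^2}\right), \qquad \mathrm{var}(\hat{\mu}) = \frac{1}{n}\left(\sum_{i=1}^{k} W_i S_i\right)^2 - \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2,$$

$$S_i^2 = P_i Q_i N_i/(N_i - 1) \approx P_i Q_i, \qquad \hat{T} = \sum_j N_j \bar{y}_j, \qquad X^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

$$\mathrm{deff} = 1 + \rho(\bar{m} - 1), \qquad \hat{\mu}_{cl} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} m_i}, \qquad \hat{\mu}_{IPW} = \frac{\sum_{i=1}^{k} w_i y_i}{\sum_{i=1}^{k} w_i n_i}$$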

End of Paper