Question   Mark   Out of
A1                8
A2                12
A3                10
A4                6
A5                9
B1                8
B2                7
B3                10
C1                6
C2                12
C3                6
C4                6
TOTAL             100

Page 2 of 43

Instructions

Answer each question in the space provided. You can write in pen or pencil. Marks are

indicated next to each question. The total mark for the exam is 100.

Part A (45 marks in total)

Question A.1 (1+1+1+1+2+1+1=8 marks)

Consider the following set of numbers: -25, 2, 3, 8, 10, 14, 18, 21, 32. For each of the questions

below, state your answer, showing working if necessary.

(a) What is the median?

(b) What is the 1st quartile?

(c) What is the 3rd quartile?

(d) What is the interquartile range?

(e) Hence sketch a box-plot. Lay it out horizontally below. Be sure to mark the values of

the various parts.

(f) You are told the mean of the numbers is 9.222 and the mean of their squares is 309.666.

What is the sample standard deviation?

(g) If you only knew the mean and sample standard deviation of the sample, what does

Chebyshev’s inequality tell you?

Question A.2 (4+2+2+4=12 marks)

Throughout this question, show your working and leave your answer in a clear form. Of those

reporting to a medical clinic, 2% have medical condition Z. It is assumed that this figure of

2% is also the base rate across the population. There is a test for condition Z such that, for

those patients who have condition Z, 85% will test positive; and for those patients who do

not have condition Z, 25% will test positive.

(a) If a patient tests positive, what is the probability that the patient has condition Z?

After some consideration, it is decided that the test gives too many false positives, so the test is modified as follows. The new test is simply to administer the original test twice, where it is assumed that the two administrations give results that are independent of one another. A patient is considered to have tested positive on the new test precisely when both administrations of the original test return a positive result.

(b) If a patient has condition Z, what is the probability that the patient will test positive

on the new test?

(c) If a patient does not have condition Z, what is the probability that the patient will test

positive on the new test?

(d) If a patient returns a positive result on this new test, what is the probability that the

patient has condition Z?
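
The calculation this question asks for can be checked numerically. The following is a minimal sketch, assuming a hypothetical helper `posterior` (not part of the exam) that applies Bayes' rule to a base rate, a true-positive rate and a false-positive rate. For the repeated test, the stated independence assumption means each rate is simply squared.

```python
def posterior(prior, true_pos_rate, false_pos_rate):
    """P(condition | positive test) by Bayes' rule."""
    p_pos_and_cond = prior * true_pos_rate
    p_pos_and_no_cond = (1.0 - prior) * false_pos_rate
    return p_pos_and_cond / (p_pos_and_cond + p_pos_and_no_cond)

# Figures taken from the question: 2% base rate, 85% and 25% positive rates.
prior, tpr, fpr = 0.02, 0.85, 0.25

single = posterior(prior, tpr, fpr)
# Two independent administrations, both of which must be positive.
double = posterior(prior, tpr ** 2, fpr ** 2)

print(f"single test: {single:.4f}")
print(f"double test: {double:.4f}")
```

Squaring both rates lowers the false-positive rate proportionally more than the true-positive rate, which is why the posterior rises under the new test.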

Question A.3 (2+3+3+2=10 marks)

Consider the probability density function given at the right, defined by

\[
p(x) = \begin{cases}
\frac{1}{2}x & 0 \le x \le 1 \\
0.5 & 1 \le x \le 2 \\
0.25 & 2 \le x \le 3 \\
0 & \text{otherwise}
\end{cases}
\]

Consider the cumulative distribution function P(x) corresponding to p(x), and the quantile function Q(p).

(a) What are P(0.5) and Q(0.375)?

(b) Derive the function for P(x).

(c) Hence give the quantile function Q(p) corresponding to p(x).

(d) Hence, or otherwise, write pseudo-code for an algorithm that will generate a sample

from this distribution.
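
One way to approach part (d) is inverse-transform sampling: draw u uniformly on [0, 1] and return Q(u). The sketch below assumes the first piece of the density is x/2 (the reading under which the density integrates to 1) and uses CDF pieces obtained by integrating each segment. The names `quantile` and `sample` are illustrative only.

```python
import math
import random

def quantile(u):
    """Inverse of the piecewise CDF; u must lie in [0, 1]."""
    if u <= 0.25:
        # P(x) = x^2 / 4 on [0, 1]
        return 2.0 * math.sqrt(u)
    if u <= 0.75:
        # P(x) = 0.25 + 0.5 (x - 1) on [1, 2]
        return 1.0 + (u - 0.25) / 0.5
    # P(x) = 0.75 + 0.25 (x - 2) on [2, 3]
    return 2.0 + (u - 0.75) / 0.25

def sample(rng):
    """Draw one value by pushing a uniform variate through the quantile function."""
    return quantile(rng.random())

rng = random.Random(0)
draws = [sample(rng) for _ in range(10000)]
print(min(draws), max(draws))
```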

Question A.4 (2+2+2=6 marks)

If $E[X] = 1$ and $E[X^2] = 4$, $E[Y] = 0$ and $E[Y^2] = 1$, and $X$ and $Y$ are independent, then:

(a) Calculate $E[2X^2 + (X + 1)^2]$.

(b) Calculate $E[(X + 1)(Y + 1)^2]$.

(c) Calculate $V[(X + 1)(Y + 1)]$.

Question A.5 (3+3+3=9 marks)

Consider the probability density function given by a mixture of two Gaussians with identical standard deviation $\sigma$, as

\[ p(x \mid \rho, \mu_1, \mu_2, \sigma) = \rho\, N(x \mid \mu_1, \sigma) + (1 - \rho)\, N(x \mid \mu_2, \sigma) \]

where $N(\cdot \mid \cdot)$ is the probability density function of a Gaussian. Thus the expected value of a function $f(x)$ under this distribution is given by

\[ E_{\rho,\mu_1,\mu_2,\sigma}[f(x)] = \rho\, E_{N(\mu_1,\sigma)}[f(x)] + (1 - \rho)\, E_{N(\mu_2,\sigma)}[f(x)] \]

where the two expected values on the right-hand side are taken under the respective Gaussian distributions.

(a) What is the mean of $x$ for the mixture of two Gaussians?

(b) What is the mean of $x^2$ for the mixture of two Gaussians?

(c) What is the variance for the mixture of two Gaussians?
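
The expectation identity for the mixture can be illustrated numerically. The sketch below applies it with f(x) = x and f(x) = x^2, using the standard Gaussian fact that E[x^2] = sigma^2 + mu^2, and compares the result with a seeded Monte Carlo estimate. The parameter values and the function names are hypothetical choices for illustration, not values from the exam.

```python
import random

def mixture_moments(rho, mu1, mu2, sigma):
    """Return (E[x], E[x^2], Var[x]) for the two-Gaussian mixture."""
    mean = rho * mu1 + (1 - rho) * mu2
    # For a Gaussian, E[x^2] = sigma^2 + mu^2.
    second = rho * (sigma ** 2 + mu1 ** 2) + (1 - rho) * (sigma ** 2 + mu2 ** 2)
    return mean, second, second - mean ** 2

def sample_mixture(rho, mu1, mu2, sigma, rng):
    """Pick a component with probability rho, then draw from that Gaussian."""
    mu = mu1 if rng.random() < rho else mu2
    return rng.gauss(mu, sigma)

rho, mu1, mu2, sigma = 0.3, -1.0, 2.0, 0.5  # hypothetical values
mean, second, var = mixture_moments(rho, mu1, mu2, sigma)

rng = random.Random(0)
draws = [sample_mixture(rho, mu1, mu2, sigma, rng) for _ in range(100000)]
mc_mean = sum(draws) / len(draws)
mc_second = sum(d * d for d in draws) / len(draws)

print(mean, mc_mean)
print(second, mc_second)
```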

Part B (25 marks in total)

Question B.1 (3+2+3=8 marks)

You have data x distributed as Poisson with rate λ = 16, so x ∼ Pois(16).

(a) Show how to use the central limit theorem to get an approximate value for p(10 ≤ x ≤ 20). Compute the approximate value, noting that the Z tables are only accurate to 2 decimal places.

(b) You have a sample of 10 values from this distribution, and compute its mean x̄. What is an approximate distribution for x̄?

(c) What is the 95% confidence interval for the mean x̄, according to this approximation?

Question B.2 (2+5=7 marks)

IQ is considered to have a mean of 100 and a standard deviation of 15. You expect that students in your masters class will have a higher mean.

(a) Given a sample of size 10, compute a one-sided 95% confidence interval in the form

(−∞, I] for where the measured mean should lie.

(b) You get data from 10 students with the form [104, 120, 100, 112, 133, 138, 111, 118, 114, 118].

Note that the mean of the sample is 116.8 and the mean of the squares of the sample is 13765.8.

Test the null hypothesis that the students’ IQ has mean 100. Without assuming you know

the standard deviation, give the test statistic and the p-value for this data. Note that the tables

of statistics given at the back of the exam will not allow you to look up the p-value precisely.

Question B.3 (2+2+4+2=10 marks)

You obtain paired data (X, Y) with values $\vec{x}$ = [4.59, 4.60, 6.32, 4.85, 3.27, 5.92, 1.92, 6.90, 4.82, 5.39] and $\vec{y}$ = [2.89, 2.46, 3.28, 2.34, 2.11, 3.56, 1.77, 3.29, 2.46, 2.60]. The various sample means (using the above data) are:

$\bar{x} = 4.859$

$\bar{y} = 2.677$

$\overline{x^2} = 25.516$

$\overline{y^2} = 7.460$

$\overline{xy} = 13.670$

(a) What is the correlation coefficient between X and Y? What does this tell you about X and Y?

(b) Fit a simple linear model to this data in the form

\[ \hat{Y} = \beta_0 + \beta_1 X \]

What are your estimates for $\beta_0$ and $\beta_1$?
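
The correlation and the least-squares coefficients can be obtained directly from the five sample means quoted above. The following is a minimal sketch under that approach. It uses the population-style moments (dividing by n), which is an assumption made here for simplicity; the n/(n-1) correction cancels in both the correlation and the slope, so these three quantities are unaffected.

```python
import math

# Sample means as given in the question.
mean_x, mean_y = 4.859, 2.677
mean_x2, mean_y2, mean_xy = 25.516, 7.460, 13.670

# Second central moments built from the raw means.
var_x = mean_x2 - mean_x ** 2
var_y = mean_y2 - mean_y ** 2
cov_xy = mean_xy - mean_x * mean_y

r = cov_xy / math.sqrt(var_x * var_y)   # correlation coefficient
beta1 = cov_xy / var_x                  # slope
beta0 = mean_y - beta1 * mean_x         # intercept

print(f"r = {r:.3f}, beta0 = {beta0:.3f}, beta1 = {beta1:.3f}")
```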

(c) What are the standard errors for $\beta_0$ and $\beta_1$?

(d) Test the hypothesis that $\beta_1 = 0$. What is your test statistic and its p-value? What is the outcome of the test?

Part C (30 marks in total)

Question C.1 (2+2+2=6 marks)

You have a data set supplied as real-valued pairs (X,Y ) and you wish to regress X onto Y .

You have 2 models:

A: a degree-4 polynomial

\[ \hat{y} = \sum_{i=0}^{4} a_i x^i \]

B: a degree-20 polynomial

\[ \hat{y} = \sum_{i=0}^{20} a_i x^i \]

(a) Describe how the bias of models A and B differ.

(b) Describe how the variance of models A and B differ.

(c) If you had 100 data points in your sample, which of the two models would you recommend? Justify your answer.

Question C.2 (5+3+2+2=12 marks)

(a) You wish to build a naïve Bayes classifier regressing Booleans A, B and C onto the Boolean X. Someone has already counted the data for you to create the frequency tables below:

       A=0   A=1   B=0   B=1   C=0   C=1
X=0     10    40    30    20    15    35
X=1     30    20     5    45    40    10

Construct probability tables as needed to specify the estimated naïve Bayes classifier for the task. Then give the formula for the classifier and describe how it would be used.
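
The construction described in part (a) can be sketched in code. The example below assumes plain maximum-likelihood estimates (counts divided by class totals, with no smoothing), which is one reasonable choice rather than the only one. The names `counts`, `score` and `predict` are illustrative.

```python
# counts[x][feature] = (count with feature=0, count with feature=1), from the table.
counts = {
    0: {"A": (10, 40), "B": (30, 20), "C": (15, 35)},
    1: {"A": (30, 20), "B": (5, 45), "C": (40, 10)},
}

def class_total(x):
    """Number of training cases with X = x (the same for every feature)."""
    return sum(counts[x]["A"])

n_total = sum(class_total(x) for x in counts)

def score(x, features):
    """Unnormalised posterior: p(X=x) times the product of p(feature=value | X=x)."""
    total = class_total(x)
    s = total / n_total
    for name, value in features.items():
        s *= counts[x][name][value] / total
    return s

def predict(features):
    """Return the class with the larger unnormalised posterior."""
    return max(counts, key=lambda x: score(x, features))

print(predict({"A": 1, "B": 0, "C": 1}))
print(predict({"A": 0, "B": 1, "C": 0}))
```

Dividing each score by the sum of the scores over both classes would give the normalised posterior, which is not needed just to pick the larger one.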

(b) Consider the probabilities p(A=0|X=0) and p(B=0|X=1). Compute their standard errors, making any assumptions as needed. What can you say about the resulting estimates?

(c) Which would be better for this data set, the naïve Bayes classifier or the logistic regression classifier? Justify your answer.

(d) The first step of the k-means algorithm is to initialise the centroids. Describe a way

this could be done, and why it is OK to use it.

Question C.3 (6 marks)

Consider the probability density function given below, defined by

\[
p(x) = \begin{cases}
\frac{2}{\pi}\sqrt{1 - (2x - 1)^2} & 0 \le x \le 1 \\
\frac{2}{\pi}\sqrt{1 - (2x - 3)^2} & 1 \le x \le 2 \\
0 & \text{otherwise}
\end{cases}
\]

This is two semi-circles of radius 1/2 placed side by side, then scaled by 4/π to get a PDF.

(a) Devise pseudo-code for a rejection sampler for this distribution. Note that the maximum value of p(x) is 2/π.
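
A rejection sampler of the kind asked for can be sketched as follows. The sketch assumes a uniform proposal on [0, 2] with a flat envelope at height 2/π (the peak of each semi-circle), so a proposed x is accepted when a second uniform draw on [0, 2/π] falls under the density at x. The names `density` and `sample` are illustrative.

```python
import math
import random

P_MAX = 2.0 / math.pi  # peak height of the density, used as the envelope

def density(x):
    """Two scaled semi-circles of radius 1/2 centred at 0.5 and 1.5."""
    if 0.0 <= x <= 1.0:
        return P_MAX * math.sqrt(max(0.0, 1.0 - (2.0 * x - 1.0) ** 2))
    if 1.0 < x <= 2.0:
        return P_MAX * math.sqrt(max(0.0, 1.0 - (2.0 * x - 3.0) ** 2))
    return 0.0

def sample(rng):
    """Propose uniformly in the bounding box and keep points under the curve."""
    while True:
        x = rng.uniform(0.0, 2.0)
        u = rng.uniform(0.0, P_MAX)
        if u <= density(x):
            return x

rng = random.Random(1)
draws = [sample(rng) for _ in range(2000)]
print(sum(draws) / len(draws))
```

With this envelope the bounding box has area 4/π and the density has area 1, so roughly π/4 of proposals are accepted.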

Question C.4 (5+1=6 marks)

You wish to build a decision tree to predict a three-valued variable X. The first two features to

test are Booleans A and B. Someone has already counted the data for you to create frequency

tables below:

       A=0   A=1   B=0   B=1
X=0     10    40    30    20
X=1     30    20     5    45
X=2     30    20    45     5

(a) Compute and report the quality measure for the attributes A and B using the information gain metric.

(b) Hence state which attribute is recommended for use at the root of the tree.
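
Information gain for a candidate split is the entropy of the class variable minus the weighted average entropy of the class variable within each branch. The sketch below computes it for the two frequency tables, assuming entropy in bits (log base 2), which is a common convention but is not stated in the question. The function names are illustrative.

```python
import math

def entropy(counts):
    """Shannon entropy in bits of a list of counts; zero counts contribute nothing."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def information_gain(table):
    """table[x][v] is the count of class x among cases with attribute value v."""
    class_counts = [sum(row) for row in table]
    n = sum(class_counts)
    remainder = 0.0
    for v in range(len(table[0])):
        column = [row[v] for row in table]
        remainder += sum(column) / n * entropy(column)
    return entropy(class_counts) - remainder

# Rows are X=0, X=1, X=2; columns are attribute value 0 and 1.
table_a = [(10, 40), (30, 20), (30, 20)]
table_b = [(30, 20), (5, 45), (45, 5)]

gain_a = information_gain(table_a)
gain_b = information_gain(table_b)
print(f"gain(A) = {gain_a:.3f}, gain(B) = {gain_b:.3f}")
```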
