Module Code: MATH5745M01

Q1.

(a) Suppose that A is a square n × n matrix. What do you understand by the phrase "matrix A is not of full rank"? Outline some properties of a matrix that is not of full rank. How would you check whether the matrix A is not of full rank? [8 marks]
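As a numerical illustration of the checks the question invites (sketched in Python/NumPy rather than the module's R; the matrix below is a hypothetical example whose third row is the sum of the first two, so it cannot be of full rank):

```python
import numpy as np

# Hypothetical 3x3 matrix with linearly dependent rows (row 3 = row 1 + row 2).
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [5.0, 7.0, 9.0]])

det_A = np.linalg.det(A)           # zero (up to rounding) when A is not of full rank
rank_A = np.linalg.matrix_rank(A)  # numerical rank computed from singular values
eigvals = np.linalg.eigvals(A)     # at least one eigenvalue is (numerically) zero
```

Any one of these checks suffices in practice: a zero determinant, a rank below n, or a zero eigenvalue all signal that A is not of full rank.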

(b) Suppose that B is a symmetric matrix such that B > 0. Explain carefully what the notation "B > 0" tells you about the matrix B. Outline some properties of matrix B. [8 marks]

(c) Suppose you are told that a square n × n matrix C is an orthogonal matrix. What does this tell you about matrix C? Outline some properties of matrix C. [6 marks]

(d) The spectral decomposition theorem states: "Any symmetric (n × n) matrix S can be written as S = GDG′, where G is the matrix of standardized eigenvectors of S and D is a diagonal matrix of eigenvalues of S." Discuss the importance of the spectral decomposition theorem to Multivariate Statistics. [4 marks]
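A small numerical sketch of the theorem (Python/NumPy, using a hypothetical symmetric matrix, not one from this paper): `eigh` returns orthonormal eigenvectors G and eigenvalues, and G D G′ reproduces S:

```python
import numpy as np

# Hypothetical symmetric matrix for illustration only.
S = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])

vals, G = np.linalg.eigh(S)  # G: orthonormal eigenvectors; vals: eigenvalues
D = np.diag(vals)

S_rebuilt = G @ D @ G.T      # spectral decomposition: S = G D G'
```

The same decomposition underlies principal component analysis and the other eigen-based methods in this paper.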

(e) Why is the multivariate normal distribution important in Multivariate Analysis? Discuss its usefulness and its limitations. [8 marks]

(f) The figure below shows the results of a cluster analysis of some data for 28 countries based upon the values of two variables at the end of April 2020. Explain to a non-statistician what the plot shows (you do not need to know what the data is) and also explain the methodology used to construct the plot. Discuss whether alternative methods might be suitable for producing a more informative cluster analysis plot. [11 marks]

[Figure: dendrogram titled "Data 30/4/2020: Average linkage − Euclidean distance". Vertical axis: Distance (0 to 2500). Leaves: Brazil, Poland, Japan, S. Korea, Malaysia, Iran, Romania, Russia, Finland, Netherlands, Sweden, Portugal, UK, France, Germany, Israel, Canada, Norway, Austria, Denmark, Spain, Iceland, Belgium, Ireland, Singapore, Italy, USA, Switzerland.]

Page 1 of 13 Turn the page over


Q2.

(a) The figure shows the contours of probability density functions from three bivariate normal distributions with variables x1 and x2. They have the same mean vector µ = (5, 5)′, but different structures of covariance matrix Σ.

[Figure: three contour plots, panels (a), (b) and (c), each with axes x1 and x2 running from 0 to 10. Contour levels: (a) 0.01 to 0.07; (b) 0.01 to 0.08; (c) 0.05 to 0.3.]

Answer the following questions, and explain briefly your reasoning.

(i) Which figure has the highest correlation between x1 and x2? [2 marks]

(ii) Which figure shows independence between x1 and x2? [2 marks]

(iii) Which figure has the lowest generalised population covariance |Σ|? [2 marks]

(iv) What can you say about the relationship between x1 and x2 in each of figures (a), (b) and (c) above? [3 marks]

(v) If you are told that x2 = 6, what can you then say about the distribution of x1 in each of figures (a), (b) and (c) above? [6 marks]
(There is no need to try to estimate the mean and variance exactly.)


Question Q2 continued:

(b) An anthropologist is interested in the physical characteristics of adult males in an isolated tribe. He measures the total body height (x1), arm length (x2), and head circumference (x3) of 20 adult males. The measurements are all in centimeters. The sample mean vector x̄ and sample covariance matrix S of the measurements are given by

x̄ = (154.6, 61.7, 53.4)′,

S =
[ 7.095 2.768 2.695 ]
[ 2.768 5.589 3.495 ]
[ 2.695 3.495 2.673 ].

(i) Let X be the data matrix of size 20 × 3 for the above example. Suppose the anthropologist wishes the measurements to be in inches instead of centimeters. Using matrix operations, how would you define a new data matrix Y with the measurements in inches? How would you obtain the sample covariance matrix S(Y) of the data matrix Y if given S? Do you expect the elements of S(Y) to be larger or smaller in magnitude than those of S? Explain your reasoning. [4 marks]
(Note that there is no need to calculate the elements of S(Y). Also note that one inch is equal to 2.54 centimeters.)
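A sketch of the rescaling at work (Python/NumPy, with simulated data standing in for the anthropologist's matrix X): dividing every entry by 2.54 rescales every covariance by 1/2.54², so the elements of S(Y) come out smaller in magnitude:

```python
import numpy as np

# Simulated 20x3 data matrix in centimetres (hypothetical values,
# loosely centred on the sample means from the question).
rng = np.random.default_rng(0)
X = rng.normal(loc=[154.6, 61.7, 53.4], scale=[2.7, 2.4, 1.6], size=(20, 3))

c = 1 / 2.54   # centimetres -> inches
Y = c * X      # Y = (1/2.54) X: every measurement rescaled

S_X = np.cov(X, rowvar=False)
S_Y = np.cov(Y, rowvar=False)
# Covariance is quadratic in the data, so S(Y) = S / 2.54^2.
```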

(ii) For the given sample mean vector x̄ and the given sample covariance matrix S above, what can you deduce about the three variables x1, x2 and x3 and their inter-relationship? [4 marks]

(iii) Let R be the sample correlation matrix of X. How would you obtain R using matrix operations if given S? What can you say when comparing R to the sample correlation matrix R(Y) based upon the data matrix Y? Explain your reasoning. [4 marks]

(c) The anthropologist is interested in the null hypothesis H0 : Σ = Σ0, where Σ is the population covariance matrix of X and

Σ0 =
[ 5 0 0 ]
[ 0 5 0 ]
[ 0 0 5 ].

(i) Looking at the structure of Σ0, what is being hypothesised by the anthropologist with regard to the dependencies between variables? Do you think this is a plausible hypothesis? Explain your answer. [3 marks]

(ii) Suppose V is the maximum likelihood estimate of the covariance matrix of X. How do you calculate V if given S? [1 mark]

(iii) From the above data, we have log_e |Σ0⁻¹V| = −2.64038, and trace(Σ0⁻¹V) = 2.91783. Test the null hypothesis H0 at the 5% significance level. What do you conclude? [4 marks]

(iv) If the null hypothesis were of the form Σ0 = vI, where I is a (3 × 3) identity matrix, and given the values in (c)(iii) above, write down your new test statistic U in terms of v. What value of v makes U a minimum? [5 marks]
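For orientation, one common form of the likelihood-ratio statistic for H0 : Σ = Σ0 is U = n(trace(Σ0⁻¹V) − log_e|Σ0⁻¹V| − p), referred to a χ² distribution on p(p+1)/2 degrees of freedom. A sketch with the values given in (c)(iii), assuming this is the intended statistic:

```python
n, p = 20, 3
log_det = -2.64038  # log_e |Sigma0^{-1} V|, given in (c)(iii)
trace = 2.91783     # trace(Sigma0^{-1} V), given in (c)(iii)

# Likelihood-ratio statistic (one standard form) for H0: Sigma = Sigma0.
U = n * (trace - log_det - p)

# Upper 5% point of chi-squared on p(p+1)/2 = 6 df is 12.592 (from the tables).
reject = U > 12.592
```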


Q3.

Morphology is the branch of biology that deals with the form (structure) of living organisms. An expert measures the length (in cm) and weight (in hundreds of grams) of 20 adult birds from the same species, but from two different sub-species (10 birds in each sub-species). The data can be seen in the following figure, where the points are marked differently to distinguish observations from sub-species 1 and sub-species 2.

[Figure: scatter plot of Weight (100 gr) against Length (cm), Length from 40 to 48 and Weight from 4 to 14, with separate symbols for Sub-species 1 and Sub-species 2.]

The sample mean for sub-species 1 is ȳ1 = (45.4, 8.01)′ and for sub-species 2 it is ȳ2 = (43.0, 10.06)′. The pooled sample covariance matrix Sp and its inverse Sp⁻¹ are given by:

Sp =
[ 3.578 2.053 ]
[ 2.053 2.002 ],

Sp⁻¹ =
[ 0.6795 −0.6970 ]
[ −0.6970 1.2145 ].

(a) Describe the principle of linear discriminant analysis. [3 marks]

(b) Looking solely at the horizontal axis (Length) or the vertical axis (Weight) in the above figure, can you identify a clear separation between the two sub-species? Explain briefly your reasoning. [2 marks]

(c) Find the discriminant function from the above data. What can you say about the discriminant function line? Calculate the standardised coefficients of the discriminant function. [5 marks]

(d) Suppose the expert found two new observations. The first one is a bird with length 44 cm and weight 8.7 (in units of 100 grams). The second one is a bird with length 47 cm and weight 9.0 (in units of 100 grams).

(i) Before performing any calculations, identify to which sub-species each new observation should be classified. Explain briefly your reasoning. [3 marks]

(ii) Now write down the discriminant rule. Based on this rule, to which sub-species should each new observation be classified? [4 marks]
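A sketch (Python/NumPy) of the standard equal-covariance discriminant rule — direction a = Sp⁻¹(ȳ1 − ȳ2), allocate to sub-species 1 when a′(x − (ȳ1 + ȳ2)/2) > 0 — applied to the quantities above. Treat it as one plausible calculation under that rule, not as the model answer:

```python
import numpy as np

# Sample quantities given in the question.
y1 = np.array([45.4, 8.01])    # sub-species 1 mean
y2 = np.array([43.0, 10.06])   # sub-species 2 mean
Sp_inv = np.array([[0.6795, -0.6970],
                   [-0.6970, 1.2145]])

a = Sp_inv @ (y1 - y2)   # discriminant coefficients
mid = (y1 + y2) / 2      # midpoint between the group means

def score(x):
    """Positive score -> allocate to sub-species 1, negative -> sub-species 2."""
    return a @ (np.asarray(x, dtype=float) - mid)

s1 = score([44.0, 8.7])  # first new bird
s2 = score([47.0, 9.0])  # second new bird
```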


Question Q3 continued:

(e) Consider the following two plots, each showing data from two different groups.

[Figure: two scatter plots. Panel (a): x1 from 42 to 48, x2 from 4 to 12. Panel (b): x1 from 0.0 to 3.0, x2 from 4.0 to 5.2. Each panel shows points from two groups.]

For each of the above two figures, answer these questions:

(i) Do you see a group separation in the figure?

(ii) Is linear discriminant analysis suitable for separating the groups? Explain your reasoning.

[4 marks]

(f) Discuss how linear discriminant analysis is similar to or different from the aims of cluster analysis. [4 marks]


Q4.

Ten students in the School of Mathematics took the same set of modules in Semester 1, denoted M1, M2, M3, and M4. Their module marks are shown in the following table, where the students are labelled A-J.

Student M1 M2 M3 M4
A 60 57 65 56
B 65 63 63 48
C 58 56 64 57
D 67 58 60 66
E 65 49 60 50
F 52 52 54 47
G 57 50 60 54
H 55 60 62 53
I 73 63 66 61
J 67 55 59 51

The sample mean vector ȳ and covariance matrix S of the data are given by

ȳ = (61.9, 56.3, 61.3, 54.3)′,

S =
[ 42.544 13.589 10.144 17.033 ]
[ 13.589 24.456 10.789  9.678 ]
[ 10.144 10.789 12.233  9.456 ]
[ 17.033  9.678  9.456 35.122 ].

(a) Explain the idea of principal component analysis. [3 marks]

(b) Consider the following edited output of an analysis in R, where dat contains the dataset above and "??" replaces a real number.

> eigen(cov(dat))
$values
[1] ?? 21.594106 18.779765 5.446926
$vectors
          [,1]        [,2]       [,3]        [,4]
[1,] 0.6785561 -0.55556959  0.4795013 -0.03134488
[2,] 0.3995959 -0.17495326 -0.7956614 -0.42028241
[3,] 0.2901608  0.01717662 -0.3320533  0.89735850
[4,] 0.5437751  0.81267383  0.1635297 -0.13087369

(i) What is the trace of S? What does this trace represent? [2 marks]

(ii) What is the value that is replaced by ?? in the above R output? [1 mark]

(iii) Calculate the cumulative proportion of variability of the principal components. Giving reasons, suggest how many principal components should be considered. [6 marks]

(iv) Explain briefly your assessment of the loadings of the principal components you selected in part (b)(iii) above. What proportion of total variability in the data do your chosen principal components represent? [4 marks]
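The masked eigenvalue can be recovered from the fact that the eigenvalues of S sum to trace(S); a sketch in Python (the exam's own output is from R):

```python
import numpy as np

# Diagonal of S from the question; its sum (the trace) is the total sample variance.
diag_S = [42.544, 24.456, 12.233, 35.122]
trace_S = sum(diag_S)                    # 114.355

# The eigenvalues of S sum to trace(S), so the "??" value can be
# recovered from the three eigenvalues R did print.
known = [21.594106, 18.779765, 5.446926]
lam1 = trace_S - sum(known)              # the hidden first eigenvalue

eigvals = [lam1] + known
cum_prop = np.cumsum(eigvals) / trace_S  # cumulative proportion of variability
```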


Question Q4 continued:

(c) The following figure shows the first two principal components from the above data, where the points have been replaced by the student labels. Comment on the patterns that you see in the figure, in light of your answer in part (b)(iv) above and the data table above. [4 marks]

[Figure: scatter plot of the first two principal component scores, z1 (horizontal, −15 to 15) against z2 (vertical, −10 to 10), with the points labelled A to J.]

(d) Suppose, hypothetically, the sample covariance matrix that you observe is of the form

S⋆ =
[ 250 10 10 10 ]
[  10 10  8  8 ]
[  10  8 10  8 ]
[  10  8  8 10 ].

What potential problem could arise in the interpretation of the results of the analysis using S⋆? What solution do you recommend to deal with the problem? Explain your reasoning. Would your solution give the same results as using S⋆? [3 marks]

(e) Explain briefly the differences and similarities between the aims of principal component analysis and those of factor analysis. [2 marks]


Q5.

An expert in modern languages conducted an experiment in which the time (in milliseconds) taken to pronounce two different syllables, denoted x and y, was measured in three different contexts from 12 unrelated individuals. The expert believes that both syllables are highly correlated in terms of the time taken to pronounce them across different contexts. Some of the data are shown in the following table:

x1 x2 x3 y1 y2 y3
28 29 37 30 32 42
28 24 28 26 34 35
28 29 29 33 32 27
27 25 25 25 24 29
 ⋮  ⋮  ⋮  ⋮  ⋮  ⋮

The overall sample covariance matrix S can be partitioned as follows:

S =
[ Sxx Sxy ]
[ Syx Syy ].

The diagonal elements of S are given by (4.99, 12.73, 14.82, 12.45, 13.12, 27.42). The corresponding sample correlation matrix Rxy between the x variables and the y variables is given by

Rxy =
        y1      y2      y3
x1   0.633   0.600   0.193
x2   0.390   0.442  −0.039
x3   0.438   0.416   0.451.

(a) Explain the aim of canonical correlation analysis. Why is it done? [2 marks]

(b) Looking at the correlation matrix Rxy (and without doing any calculation), is the expert's belief likely? Explain your reasoning. [2 marks]

(c) Consider the following matrix

Mx = Sxx⁻¹ Sxy Syy⁻¹ Syx,

with eigenvalues and eigenvectors (as an output from R) satisfying:


Question Q5(c) continued:

> eigen(mx)

$values

[1] 0.721597936 0.231768804 0.002894897

$vectors

[,1] [,2] [,3]

[1,] -0.9084010 -0.3187745 -0.8398968

[2,] -0.1633495 -0.6918638 0.4608839

[3,] -0.3848694 0.6478482 0.2866345

(i) What is the largest canonical correlation, denoted r1, between x's and y's? [1 mark]

(ii) What can be inferred when you compare r1 to the elements of Rxy? What could be the reason? Explain your answer. [2 marks]
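A sketch of the standard relationship (in Python rather than the exam's R): the eigenvalues of Mx are the squared canonical correlations, so the canonical correlations themselves are their square roots:

```python
import math

# Eigenvalues of Mx from the R output; each is a squared canonical correlation.
eigenvalues = [0.721597936, 0.231768804, 0.002894897]

# Canonical correlations are the square roots of these eigenvalues.
r = [math.sqrt(v) for v in eigenvalues]
r1 = r[0]  # the largest canonical correlation
```

Note that r1 exceeds every individual entry of Rxy: a linear combination of the x's can correlate more strongly with a linear combination of the y's than any single pair of variables does.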

(d) Consider now the matrix

My = Syy⁻¹ Syx Sxx⁻¹ Sxy,

with eigenvectors (as output from R) satisfying:

$vectors

[,1] [,2] [,3]

[1,] -0.8744107 0.2947793 -0.72690136

[2,] -0.3866843 -0.7338849 0.68363101

[3,] -0.2930549 0.6119789 0.06529217

What are the eigenvalues of My? Explain briefly your reasoning. [2 marks]

(e) (i) Which variables substantially contribute to the largest canonical correlation? Explain briefly your reasoning. [5 marks]

(ii) Are the pair of canonical variates you have considered in part (e)(i) above independent? Explain briefly your answer. [2 marks]

(f) Test whether all correlations between x's and y's are zero at the 5% significance level. Is the expert's belief justified? [5 marks]
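One commonly used test here is Bartlett's approximation based on Wilks' Λ = Π(1 − λi), with −(n − 1 − (p + q + 1)/2) log Λ referred to χ² on pq degrees of freedom; a sketch assuming that is the intended test:

```python
import math

n, p, q = 12, 3, 3
eigenvalues = [0.721597936, 0.231768804, 0.002894897]  # from eigen(mx)

# Wilks' lambda for H0: all correlations between x's and y's are zero.
wilks = 1.0
for lam in eigenvalues:
    wilks *= (1.0 - lam)

# Bartlett's chi-squared approximation on pq = 9 degrees of freedom.
statistic = -(n - 1 - (p + q + 1) / 2) * math.log(wilks)

# Upper 5% point of chi-squared on 9 df is 16.919 (from the tables).
reject = statistic > 16.919
```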

(g) If there were just one y variable and just one x variable, what would the matrices Mx and My correspond to? What would the eigenvectors of Mx and My then equal? Explain your answer. [4 marks]


Normal Distribution Function Tables

The first table gives

Φ(x) = (1/√(2π)) ∫₋∞ˣ e^(−t²/2) dt

and this corresponds to the shaded area in the figure to the right. Φ(x) is the probability that a random variable, normally distributed with zero mean and unit variance, will be less than or equal to x. When x < 0 use Φ(x) = 1 − Φ(−x), as the normal distribution with mean zero is symmetric about zero. To interpolate, use the formula

Φ(x) ≈ Φ(x1) + ((x − x1)/(x2 − x1)) (Φ(x2) − Φ(x1)).

[Figure: standard normal density curve for x from −3 to 3, density from 0.0 to 0.4, with the area to the left of x shaded.]
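A worked instance of the interpolation formula, approximating Φ(0.57) from the tabulated values at 0.55 and 0.60:

```python
# Linear interpolation in Table 1 using the formula above.
x1, phi1 = 0.55, 0.7088  # tabulated Phi(0.55)
x2, phi2 = 0.60, 0.7257  # tabulated Phi(0.60)

x = 0.57
phi_x = phi1 + (x - x1) / (x2 - x1) * (phi2 - phi1)  # approx 0.7156
```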

Table 1

x Φ(x) x Φ(x) x Φ(x) x Φ(x) x Φ(x) x Φ(x)
0.00 0.5000 0.50 0.6915 1.00 0.8413 1.50 0.9332 2.00 0.9772 2.50 0.9938
0.05 0.5199 0.55 0.7088 1.05 0.8531 1.55 0.9394 2.05 0.9798 2.55 0.9946
0.10 0.5398 0.60 0.7257 1.10 0.8643 1.60 0.9452 2.10 0.9821 2.60 0.9953
0.15 0.5596 0.65 0.7422 1.15 0.8749 1.65 0.9505 2.15 0.9842 2.65 0.9960
0.20 0.5793 0.70 0.7580 1.20 0.8849 1.70 0.9554 2.20 0.9861 2.70 0.9965
0.25 0.5987 0.75 0.7734 1.25 0.8944 1.75 0.9599 2.25 0.9878 2.75 0.9970
0.30 0.6179 0.80 0.7881 1.30 0.9032 1.80 0.9641 2.30 0.9893 2.80 0.9974
0.35 0.6368 0.85 0.8023 1.35 0.9115 1.85 0.9678 2.35 0.9906 2.85 0.9978
0.40 0.6554 0.90 0.8159 1.40 0.9192 1.90 0.9713 2.40 0.9918 2.90 0.9981
0.45 0.6736 0.95 0.8289 1.45 0.9265 1.95 0.9744 2.45 0.9929 2.95 0.9984
0.50 0.6915 1.00 0.8413 1.50 0.9332 2.00 0.9772 2.50 0.9938 3.00 0.9987

The inverse function Φ⁻¹(p) is tabulated below for various values of p.

Table 2
p 0.900 0.950 0.975 0.990 0.995 0.999 0.9995
Φ⁻¹(p) 1.2816 1.6449 1.9600 2.3263 2.5758 3.0902 3.2905


Percentage Points of the χ²-Distribution

This table gives the percentage points χ²ν(P) for various values of P and degrees of freedom ν, as indicated by the figure to the right. If X is a variable distributed as χ² with ν degrees of freedom, P/100 is the probability that X ≥ χ²ν(P). For ν > 100, √(2X) is approximately normally distributed with mean √(2ν − 1) and unit variance.

[Figure: χ² density curve with upper-tail area P/100 to the right of χ²ν(P).]

Percentage points P
ν 10 5 2.5 1 0.5 0.1 0.05
1 2.706 3.841 5.024 6.635 7.879 10.828 12.116
2 4.605 5.991 7.378 9.210 10.597 13.816 15.202
3 6.251 7.815 9.348 11.345 12.838 16.266 17.730
4 7.779 9.488 11.143 13.277 14.860 18.467 19.997
5 9.236 11.070 12.833 15.086 16.750 20.515 22.105
6 10.645 12.592 14.449 16.812 18.548 22.458 24.103
7 12.017 14.067 16.013 18.475 20.278 24.322 26.018
8 13.362 15.507 17.535 20.090 21.955 26.124 27.868
9 14.684 16.919 19.023 21.666 23.589 27.877 29.666
10 15.987 18.307 20.483 23.209 25.188 29.588 31.420
11 17.275 19.675 21.920 24.725 26.757 31.264 33.137
12 18.549 21.026 23.337 26.217 28.300 32.909 34.821
13 19.812 22.362 24.736 27.688 29.819 34.528 36.478
14 21.064 23.685 26.119 29.141 31.319 36.123 38.109
15 22.307 24.996 27.488 30.578 32.801 37.697 39.719
16 23.542 26.296 28.845 32.000 34.267 39.252 41.308
17 24.769 27.587 30.191 33.409 35.718 40.790 42.879
18 25.989 28.869 31.526 34.805 37.156 42.312 44.434
19 27.204 30.144 32.852 36.191 38.582 43.820 45.973
20 28.412 31.410 34.170 37.566 39.997 45.315 47.498
25 34.382 37.652 40.646 44.314 46.928 52.620 54.947
30 40.256 43.773 46.979 50.892 53.672 59.703 62.162
40 51.805 55.758 59.342 63.691 66.766 73.402 76.095
50 63.167 67.505 71.420 76.154 79.490 86.661 89.561
80 96.578 101.879 106.629 112.329 116.321 124.839 128.261


Percentage Points of the t-Distribution

This table gives the percentage points tν(P) for various values of P and degrees of freedom ν, as indicated by the figure to the right. The lower percentage points are given by symmetry as −tν(P), and the probability that |t| ≥ tν(P) is 2P/100. The limiting distribution of t as ν → ∞ is the normal distribution with zero mean and unit variance.

[Figure: t density curve with upper-tail area P/100 to the right of tν(P).]

Percentage points P
ν 10 5 2.5 1 0.5 0.1 0.05
1 3.078 6.314 12.706 31.821 63.657 318.309 636.619
2 1.886 2.920 4.303 6.965 9.925 22.327 31.599
3 1.638 2.353 3.182 4.541 5.841 10.215 12.924
4 1.533 2.132 2.776 3.747 4.604 7.173 8.610
5 1.476 2.015 2.571 3.365 4.032 5.893 6.869
6 1.440 1.943 2.447 3.143 3.707 5.208 5.959
7 1.415 1.895 2.365 2.998 3.499 4.785 5.408
8 1.397 1.860 2.306 2.896 3.355 4.501 5.041
9 1.383 1.833 2.262 2.821 3.250 4.297 4.781
10 1.372 1.812 2.228 2.764 3.169 4.144 4.587
11 1.363 1.796 2.201 2.718 3.106 4.025 4.437
12 1.356 1.782 2.179 2.681 3.055 3.930 4.318
13 1.350 1.771 2.160 2.650 3.012 3.852 4.221
14 1.345 1.761 2.145 2.624 2.977 3.787 4.140
15 1.341 1.753 2.131 2.602 2.947 3.733 4.073
16 1.337 1.746 2.120 2.583 2.921 3.686 4.015
18 1.330 1.734 2.101 2.552 2.878 3.610 3.922
21 1.323 1.721 2.080 2.518 2.831 3.527 3.819
25 1.316 1.708 2.060 2.485 2.787 3.450 3.725
30 1.310 1.697 2.042 2.457 2.750 3.385 3.646
40 1.303 1.684 2.021 2.423 2.704 3.307 3.551
50 1.299 1.676 2.009 2.403 2.678 3.261 3.496
70 1.294 1.667 1.994 2.381 2.648 3.211 3.435
100 1.290 1.660 1.984 2.364 2.626 3.174 3.390
∞ 1.282 1.645 1.960 2.326 2.576 3.090 3.291


5 Percent Points of the F-Distribution

This table gives the percentage points Fν1,ν2(P) for P = 0.05 and degrees of freedom ν1, ν2, as indicated by the figure to the right. The lower percentage points, that is the values F′ν1,ν2(P) such that the probability that F ≤ F′ν1,ν2(P) is equal to P/100, may be found using the formula F′ν1,ν2(P) = 1/Fν2,ν1(P).

[Figure: F density curve with upper-tail area P/100 to the right of F(P).]

ν2 \ ν1 1 2 3 4 5 6 12 24 ∞
2 18.513 19.000 19.164 19.247 19.296 19.330 19.413 19.454 19.496
3 10.128 9.552 9.277 9.117 9.013 8.941 8.745 8.639 8.526
4 7.709 6.944 6.591 6.388 6.256 6.163 5.912 5.774 5.628
5 6.608 5.786 5.409 5.192 5.050 4.950 4.678 4.527 4.365
6 5.987 5.143 4.757 4.534 4.387 4.284 4.000 3.841 3.669
7 5.591 4.737 4.347 4.120 3.972 3.866 3.575 3.410 3.230
8 5.318 4.459 4.066 3.838 3.687 3.581 3.284 3.115 2.928
9 5.117 4.256 3.863 3.633 3.482 3.374 3.073 2.900 2.707
10 4.965 4.103 3.708 3.478 3.326 3.217 2.913 2.737 2.538
11 4.844 3.982 3.587 3.357 3.204 3.095 2.788 2.609 2.404
12 4.747 3.885 3.490 3.259 3.106 2.996 2.687 2.505 2.296
13 4.667 3.806 3.411 3.179 3.025 2.915 2.604 2.420 2.206
14 4.600 3.739 3.344 3.112 2.958 2.848 2.534 2.349 2.131
15 4.543 3.682 3.287 3.056 2.901 2.790 2.475 2.288 2.066
16 4.494 3.634 3.239 3.007 2.852 2.741 2.425 2.235 2.010
17 4.451 3.592 3.197 2.965 2.810 2.699 2.381 2.190 1.960
18 4.414 3.555 3.160 2.928 2.773 2.661 2.342 2.150 1.917
19 4.381 3.522 3.127 2.895 2.740 2.628 2.308 2.114 1.878
20 4.351 3.493 3.098 2.866 2.711 2.599 2.278 2.082 1.843
25 4.242 3.385 2.991 2.759 2.603 2.490 2.165 1.964 1.711
30 4.171 3.316 2.922 2.690 2.534 2.421 2.092 1.887 1.622
40 4.085 3.232 2.839 2.606 2.449 2.336 2.003 1.793 1.509
50 4.034 3.183 2.790 2.557 2.400 2.286 1.952 1.737 1.438
100 3.936 3.087 2.696 2.463 2.305 2.191 1.850 1.627 1.283
∞ 3.841 2.996 2.605 2.372 2.214 2.099 1.752 1.517 1.002

End.