Module Code: MATH5745M01

Q1.

(a) Suppose that A is a square n × n matrix. What do you understand by the phrase "matrix A is not of full rank"? Outline some properties of a matrix that is not of full rank. How would you check whether the matrix A is not of full rank? [8 marks]
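As a numerical illustration of the checks the question invites (sketched in Python/NumPy rather than the module's R; the matrix below is a hypothetical example whose third row is the sum of the first two, so it cannot be of full rank):

```python
import numpy as np

# Hypothetical 3x3 matrix with linearly dependent rows (row 3 = row 1 + row 2).
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [5.0, 7.0, 9.0]])

det_A = np.linalg.det(A)           # zero (up to rounding) when A is not of full rank
rank_A = np.linalg.matrix_rank(A)  # numerical rank computed from singular values
eigvals = np.linalg.eigvals(A)     # at least one eigenvalue is (numerically) zero
```

Any one of these checks suffices in practice: a zero determinant, a rank below n, or a zero eigenvalue all signal that A is not of full rank.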

(b) Suppose that B is a symmetric matrix such that B > 0. Explain carefully what the notation "B > 0" tells you about the matrix B. Outline some properties of matrix B. [8 marks]

(c) Suppose you are told that a square n × n matrix C is an orthogonal matrix. What does this tell you about matrix C? Outline some properties of matrix C. [6 marks]

(d) The spectral decomposition theorem states: "Any symmetric (n × n) matrix S can be written as S = GDG′, where G is the matrix of standardized eigenvectors of S and D is a diagonal matrix of eigenvalues of S." Discuss the importance of the spectral decomposition theorem to Multivariate Statistics. [4 marks]
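A small numerical sketch of the theorem (Python/NumPy, using a hypothetical symmetric matrix, not one from this paper): `eigh` returns orthonormal eigenvectors G and eigenvalues, and G D G′ reproduces S:

```python
import numpy as np

# Hypothetical symmetric matrix for illustration only.
S = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])

vals, G = np.linalg.eigh(S)  # G: orthonormal eigenvectors; vals: eigenvalues
D = np.diag(vals)

S_rebuilt = G @ D @ G.T      # spectral decomposition: S = G D G'
```

The same decomposition underlies principal component analysis and the other eigen-based methods in this paper.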

(e) Why is the multivariate normal distribution important in Multivariate Analysis? Discuss its usefulness and its limitations. [8 marks]

(f) The figure below shows the results of a cluster analysis of some data for 28 countries based upon the values of two variables at the end of April 2020. Explain to a non-statistician what the plot shows (you do not need to know what the data is) and also explain the methodology used to construct the plot. Discuss whether alternative methods might be suitable for producing a more informative cluster analysis plot. [11 marks]

[Figure: dendrogram titled "Data 30/4/2020: Average linkage − Euclidean distance". Vertical axis: Distance (0 to 2500). Leaves: Brazil, Poland, Japan, S. Korea, Malaysia, Iran, Romania, Russia, Finland, Netherlands, Sweden, Portugal, UK, France, Germany, Israel, Canada, Norway, Austria, Denmark, Spain, Iceland, Belgium, Ireland, Singapore, Italy, USA, Switzerland.]

Page 1 of 13 Turn the page over


Q2.

(a) The figure shows the contours of probability density functions from three bivariate normal distributions with variables x1 and x2. They have the same mean vector µ = (5, 5)′, but different structures of covariance matrix Σ.

[Figure: three contour plots, panels (a), (b) and (c), each with axes x1 and x2 running from 0 to 10. Contour levels: (a) 0.01 to 0.07; (b) 0.01 to 0.08; (c) 0.05 to 0.3.]

Answer the following questions, and explain briefly your reasoning.

(i) Which figure has the highest correlation between x1 and x2? [2 marks]

(ii) Which figure shows independence between x1 and x2? [2 marks]

(iii) Which figure has the lowest generalised population covariance |Σ|? [2 marks]

(iv) What can you say about the relationship between x1 and x2 in each of figures (a), (b) and (c) above? [3 marks]

(v) If you are told that x2 = 6, what can you then say about the distribution of x1 in each of figures (a), (b) and (c) above? [6 marks]
(There is no need to try to estimate the mean and variance exactly.)


Question Q2 continued:

(b) An anthropologist is interested in the physical characteristics of adult males in an isolated tribe. He measures the total body height (x1), arm length (x2), and head circumference (x3) of 20 adult males. The measurements are all in centimeters. The sample mean vector x̄ and sample covariance matrix S of the measurements are given by

x̄ = (154.6, 61.7, 53.4)′,

S =
[ 7.095 2.768 2.695 ]
[ 2.768 5.589 3.495 ]
[ 2.695 3.495 2.673 ].

(i) Let X be the data matrix of size 20 × 3 for the above example. Suppose the anthropologist wishes the measurements to be in inches instead of centimeters. Using matrix operations, how would you define a new data matrix Y with the measurements in inches? How would you obtain the sample covariance matrix S(Y) of the data matrix Y if given S? Do you expect the elements of S(Y) to be larger or smaller in magnitude than those of S? Explain your reasoning. [4 marks]
(Note that there is no need to calculate the elements of S(Y). Also note that one inch is equal to 2.54 centimeters.)
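A sketch of the rescaling at work (Python/NumPy, with simulated data standing in for the anthropologist's matrix X): dividing every entry by 2.54 rescales every covariance by 1/2.54², so the elements of S(Y) come out smaller in magnitude:

```python
import numpy as np

# Simulated 20x3 data matrix in centimetres (hypothetical values,
# loosely centred on the sample means from the question).
rng = np.random.default_rng(0)
X = rng.normal(loc=[154.6, 61.7, 53.4], scale=[2.7, 2.4, 1.6], size=(20, 3))

c = 1 / 2.54   # centimetres -> inches
Y = c * X      # Y = (1/2.54) X: every measurement rescaled

S_X = np.cov(X, rowvar=False)
S_Y = np.cov(Y, rowvar=False)
# Covariance is quadratic in the data, so S(Y) = S / 2.54^2.
```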

(ii) For the given sample mean vector x̄ and the given sample covariance matrix S above, what can you deduce about the three variables x1, x2 and x3 and their inter-relationship? [4 marks]

(iii) Let R be the sample correlation matrix of X. How would you obtain R using matrix operations if given S? What can you say when comparing R to the sample correlation matrix R(Y) based upon the data matrix Y? Explain your reasoning. [4 marks]

(c) The anthropologist is interested in the null hypothesis H0 : Σ = Σ0, where Σ is the population covariance matrix of X and

Σ0 =
[ 5 0 0 ]
[ 0 5 0 ]
[ 0 0 5 ].

(i) Looking at the structure of Σ0, what is being hypothesised by the anthropologist with regard to the dependencies between variables? Do you think this is a plausible hypothesis? Explain your answer. [3 marks]

(ii) Suppose V is the maximum likelihood estimate of the covariance matrix of X. How do you calculate V if given S? [1 mark]

(iii) From the above data, we have log_e |Σ0⁻¹V| = −2.64038, and trace(Σ0⁻¹V) = 2.91783. Test the null hypothesis H0 at the 5% significance level. What do you conclude? [4 marks]

(iv) If the null hypothesis were of the form Σ0 = vI, where I is a (3 × 3) identity matrix, and given the values in (c)(iii) above, write down your new test statistic U in terms of v. What value of v makes U a minimum? [5 marks]
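For orientation, one common form of the likelihood-ratio statistic for H0 : Σ = Σ0 is U = n(trace(Σ0⁻¹V) − log_e|Σ0⁻¹V| − p), referred to a χ² distribution on p(p+1)/2 degrees of freedom. A sketch with the values given in (c)(iii), assuming this is the intended statistic:

```python
n, p = 20, 3
log_det = -2.64038  # log_e |Sigma0^{-1} V|, given in (c)(iii)
trace = 2.91783     # trace(Sigma0^{-1} V), given in (c)(iii)

# Likelihood-ratio statistic (one standard form) for H0: Sigma = Sigma0.
U = n * (trace - log_det - p)

# Upper 5% point of chi-squared on p(p+1)/2 = 6 df is 12.592 (from the tables).
reject = U > 12.592
```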


Q3.

Morphology is the branch of biology that deals with the form (structure) of living organisms. An expert measures the length (in cm) and weight (in hundreds of grams) of 20 adult birds from the same species, but from two different sub-species (10 birds in each sub-species). The data can be seen in the following figure, where the points are marked differently to distinguish observations from sub-species 1 and sub-species 2.

[Figure: scatter plot of Weight (100 gr) against Length (cm), Length from 40 to 48 and Weight from 4 to 14, with separate symbols for Sub-species 1 and Sub-species 2.]

The sample mean for sub-species 1 is ȳ1 = (45.4, 8.01)′ and for sub-species 2 it is ȳ2 = (43.0, 10.06)′. The pooled sample covariance matrix Sp and its inverse Sp⁻¹ are given by:

Sp =
[ 3.578 2.053 ]
[ 2.053 2.002 ],

Sp⁻¹ =
[ 0.6795 −0.6970 ]
[ −0.6970 1.2145 ].

(a) Describe the principle of linear discriminant analysis. [3 marks]

(b) Looking solely at the horizontal axis (Length) or the vertical axis (Weight) in the above figure, can you identify a clear separation between the two sub-species? Explain briefly your reasoning. [2 marks]

(c) Find the discriminant function from the above data. What can you say about the discriminant function line? Calculate the standardised coefficients of the discriminant function. [5 marks]

(d) Suppose the expert found two new observations. The first one is a bird with length 44 cm and weight 8.7 (in units of 100 grams). The second one is a bird with length 47 cm and weight 9.0 (in units of 100 grams).

(i) Before performing any calculations, identify to which sub-species each new observation should be classified. Explain briefly your reasoning. [3 marks]

(ii) Now write down the discriminant rule. Based on this rule, to which sub-species should each new observation be classified? [4 marks]
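A sketch (Python/NumPy) of the standard equal-covariance discriminant rule — direction a = Sp⁻¹(ȳ1 − ȳ2), allocate to sub-species 1 when a′(x − (ȳ1 + ȳ2)/2) > 0 — applied to the quantities above. Treat it as one plausible calculation under that rule, not as the model answer:

```python
import numpy as np

# Sample quantities given in the question.
y1 = np.array([45.4, 8.01])    # sub-species 1 mean
y2 = np.array([43.0, 10.06])   # sub-species 2 mean
Sp_inv = np.array([[0.6795, -0.6970],
                   [-0.6970, 1.2145]])

a = Sp_inv @ (y1 - y2)   # discriminant coefficients
mid = (y1 + y2) / 2      # midpoint between the group means

def score(x):
    """Positive score -> allocate to sub-species 1, negative -> sub-species 2."""
    return a @ (np.asarray(x, dtype=float) - mid)

s1 = score([44.0, 8.7])  # first new bird
s2 = score([47.0, 9.0])  # second new bird
```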


Question Q3 continued:

(e) Consider the following two plots, each showing data from two different groups.

[Figure: two scatter plots. Panel (a): x1 from 42 to 48, x2 from 4 to 12. Panel (b): x1 from 0.0 to 3.0, x2 from 4.0 to 5.2. Each panel shows points from two groups.]

For each of the above two figures, answer these questions:

(i) Do you see a group separation in the figure?

(ii) Is linear discriminant analysis suitable for separating the groups? Explain your reasoning.

[4 marks]

(f) Discuss how linear discriminant analysis is similar to or different from the aims of cluster analysis. [4 marks]


Q4.

Ten students in the School of Mathematics took the same set of modules in Semester 1, denoted M1, M2, M3, and M4. Their module marks are shown in the following table, where the students are labelled A-J.

Student M1 M2 M3 M4
A 60 57 65 56
B 65 63 63 48
C 58 56 64 57
D 67 58 60 66
E 65 49 60 50
F 52 52 54 47
G 57 50 60 54
H 55 60 62 53
I 73 63 66 61
J 67 55 59 51

The sample mean vector ȳ and covariance matrix S of the data are given by

ȳ = (61.9, 56.3, 61.3, 54.3)′,

S =
[ 42.544 13.589 10.144 17.033 ]
[ 13.589 24.456 10.789  9.678 ]
[ 10.144 10.789 12.233  9.456 ]
[ 17.033  9.678  9.456 35.122 ].

(a) Explain the idea of principal component analysis. [3 marks]

(b) Consider the following edited output of an analysis in R, where dat contains the dataset above and "??" replaces a real number.

> eigen(cov(dat))
$values
[1] ?? 21.594106 18.779765 5.446926
$vectors
          [,1]        [,2]       [,3]        [,4]
[1,] 0.6785561 -0.55556959  0.4795013 -0.03134488
[2,] 0.3995959 -0.17495326 -0.7956614 -0.42028241
[3,] 0.2901608  0.01717662 -0.3320533  0.89735850
[4,] 0.5437751  0.81267383  0.1635297 -0.13087369

(i) What is the trace of S? What does this trace represent? [2 marks]

(ii) What is the value that is replaced by ?? in the above R output? [1 mark]

(iii) Calculate the cumulative proportion of variability of the principal components. Giving reasons, suggest how many principal components should be considered. [6 marks]

(iv) Explain briefly your assessment of the loadings of the principal components you selected in part (b)(iii) above. What proportion of total variability in the data do your chosen principal components represent? [4 marks]
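The masked eigenvalue can be recovered from the fact that the eigenvalues of S sum to trace(S); a sketch in Python (the exam's own output is from R):

```python
import numpy as np

# Diagonal of S from the question; its sum (the trace) is the total sample variance.
diag_S = [42.544, 24.456, 12.233, 35.122]
trace_S = sum(diag_S)                    # 114.355

# The eigenvalues of S sum to trace(S), so the "??" value can be
# recovered from the three eigenvalues R did print.
known = [21.594106, 18.779765, 5.446926]
lam1 = trace_S - sum(known)              # the hidden first eigenvalue

eigvals = [lam1] + known
cum_prop = np.cumsum(eigvals) / trace_S  # cumulative proportion of variability
```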


Question Q4 continued:

(c) The following figure shows the first two principal components from the above data, where the points have been replaced by the student labels. Comment on the patterns that you see in the figure, in light of your answer in part (b)(iv) above and the data table above. [4 marks]

[Figure: scatter plot of the first two principal component scores, z1 (horizontal, −15 to 15) against z2 (vertical, −10 to 10), with the points labelled A to J.]

(d) Suppose, hypothetically, the sample covariance matrix that you observe is of the form

S⋆ =
[ 250 10 10 10 ]
[  10 10  8  8 ]
[  10  8 10  8 ]
[  10  8  8 10 ].

What potential problem could arise in the interpretation of the results of the analysis using S⋆? What solution do you recommend to deal with the problem? Explain your reasoning. Would your solution give the same results as using S⋆? [3 marks]

(e) Explain briefly the differences and similarities between the aims of principal component analysis and those of factor analysis. [2 marks]


Q5.

An expert in modern languages conducted an experiment in which the time (in milliseconds) taken to pronounce two different syllables, denoted x and y, was measured in three different contexts from 12 unrelated individuals. The expert believes that both syllables are highly correlated in terms of the time taken to pronounce them across different contexts. Some of the data are shown in the following table:

x1 x2 x3 y1 y2 y3
28 29 37 30 32 42
28 24 28 26 34 35
28 29 29 33 32 27
27 25 25 25 24 29
 ⋮  ⋮  ⋮  ⋮  ⋮  ⋮

The overall sample covariance matrix S can be partitioned as follows:

S =
[ Sxx Sxy ]
[ Syx Syy ].

The diagonal elements of S are given by (4.99, 12.73, 14.82, 12.45, 13.12, 27.42). The corresponding sample correlation matrix Rxy between the x variables and the y variables is given by

Rxy =
        y1      y2      y3
x1   0.633   0.600   0.193
x2   0.390   0.442  −0.039
x3   0.438   0.416   0.451.

(a) Explain the aim of canonical correlation analysis. Why is it done? [2 marks]

(b) Looking at the correlation matrix Rxy (and without doing any calculation), is the expert's belief likely? Explain your reasoning. [2 marks]

(c) Consider the following matrix

Mx = Sxx⁻¹ Sxy Syy⁻¹ Syx,

with eigenvalues and eigenvectors (as an output from R) satisfying:


Question Q5(c) continued:

> eigen(mx)

$values

[1] 0.721597936 0.231768804 0.002894897

$vectors

[,1] [,2] [,3]

[1,] -0.9084010 -0.3187745 -0.8398968

[2,] -0.1633495 -0.6918638 0.4608839

[3,] -0.3848694 0.6478482 0.2866345

(i) What is the largest canonical correlation, denoted r1, between x's and y's? [1 mark]

(ii) What can be inferred when you compare r1 to the elements of Rxy? What could be the reason? Explain your answer. [2 marks]
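A sketch of the standard relationship (in Python rather than the exam's R): the eigenvalues of Mx are the squared canonical correlations, so the canonical correlations themselves are their square roots:

```python
import math

# Eigenvalues of Mx from the R output; each is a squared canonical correlation.
eigenvalues = [0.721597936, 0.231768804, 0.002894897]

# Canonical correlations are the square roots of these eigenvalues.
r = [math.sqrt(v) for v in eigenvalues]
r1 = r[0]  # the largest canonical correlation
```

Note that r1 exceeds every individual entry of Rxy: a linear combination of the x's can correlate more strongly with a linear combination of the y's than any single pair of variables does.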

(d) Consider now the matrix

My = Syy⁻¹ Syx Sxx⁻¹ Sxy,

with eigenvectors (as output from R) satisfying:

$vectors

[,1] [,2] [,3]

[1,] -0.8744107 0.2947793 -0.72690136

[2,] -0.3866843 -0.7338849 0.68363101

[3,] -0.2930549 0.6119789 0.06529217

What are the eigenvalues of My? Explain briefly your reasoning. [2 marks]

(e) (i) Which variables substantially contribute to the largest canonical correlation? Explain briefly your reasoning. [5 marks]

(ii) Are the pair of canonical variates you have considered in part (e)(i) above independent? Explain briefly your answer. [2 marks]

(f) Test whether all correlations between x's and y's are zero at the 5% significance level. Is the expert's belief justified? [5 marks]
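One commonly used test here is Bartlett's approximation based on Wilks' Λ = Π(1 − λi), with −(n − 1 − (p + q + 1)/2) log Λ referred to χ² on pq degrees of freedom; a sketch assuming that is the intended test:

```python
import math

n, p, q = 12, 3, 3
eigenvalues = [0.721597936, 0.231768804, 0.002894897]  # from eigen(mx)

# Wilks' lambda for H0: all correlations between x's and y's are zero.
wilks = 1.0
for lam in eigenvalues:
    wilks *= (1.0 - lam)

# Bartlett's chi-squared approximation on pq = 9 degrees of freedom.
statistic = -(n - 1 - (p + q + 1) / 2) * math.log(wilks)

# Upper 5% point of chi-squared on 9 df is 16.919 (from the tables).
reject = statistic > 16.919
```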

(g) If there were just one y variable and just one x variable, what would the matrices Mx and My correspond to? What would the eigenvectors of Mx and My then equal? Explain your answer. [4 marks]


Normal Distribution Function Tables

The first table gives

Φ(x) = (1/√(2π)) ∫₋∞ˣ e^(−t²/2) dt

and this corresponds to the shaded area in the figure to the right. Φ(x) is the probability that a random variable, normally distributed with zero mean and unit variance, will be less than or equal to x. When x < 0 use Φ(x) = 1 − Φ(−x), as the normal distribution with mean zero is symmetric about zero. To interpolate, use the formula

Φ(x) ≈ Φ(x1) + ((x − x1)/(x2 − x1)) (Φ(x2) − Φ(x1)).

[Figure: standard normal density curve for x from −3 to 3, density from 0.0 to 0.4, with the area to the left of x shaded.]
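A worked instance of the interpolation formula, approximating Φ(0.57) from the tabulated values at 0.55 and 0.60:

```python
# Linear interpolation in Table 1 using the formula above.
x1, phi1 = 0.55, 0.7088  # tabulated Phi(0.55)
x2, phi2 = 0.60, 0.7257  # tabulated Phi(0.60)

x = 0.57
phi_x = phi1 + (x - x1) / (x2 - x1) * (phi2 - phi1)  # approx 0.7156
```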

Table 1

x Φ(x) x Φ(x) x Φ(x) x Φ(x) x Φ(x) x Φ(x)
0.00 0.5000 0.50 0.6915 1.00 0.8413 1.50 0.9332 2.00 0.9772 2.50 0.9938
0.05 0.5199 0.55 0.7088 1.05 0.8531 1.55 0.9394 2.05 0.9798 2.55 0.9946
0.10 0.5398 0.60 0.7257 1.10 0.8643 1.60 0.9452 2.10 0.9821 2.60 0.9953
0.15 0.5596 0.65 0.7422 1.15 0.8749 1.65 0.9505 2.15 0.9842 2.65 0.9960
0.20 0.5793 0.70 0.7580 1.20 0.8849 1.70 0.9554 2.20 0.9861 2.70 0.9965
0.25 0.5987 0.75 0.7734 1.25 0.8944 1.75 0.9599 2.25 0.9878 2.75 0.9970
0.30 0.6179 0.80 0.7881 1.30 0.9032 1.80 0.9641 2.30 0.9893 2.80 0.9974
0.35 0.6368 0.85 0.8023 1.35 0.9115 1.85 0.9678 2.35 0.9906 2.85 0.9978
0.40 0.6554 0.90 0.8159 1.40 0.9192 1.90 0.9713 2.40 0.9918 2.90 0.9981
0.45 0.6736 0.95 0.8289 1.45 0.9265 1.95 0.9744 2.45 0.9929 2.95 0.9984
0.50 0.6915 1.00 0.8413 1.50 0.9332 2.00 0.9772 2.50 0.9938 3.00 0.9987

The inverse function Φ⁻¹(p) is tabulated below for various values of p.

Table 2
p 0.900 0.950 0.975 0.990 0.995 0.999 0.9995
Φ⁻¹(p) 1.2816 1.6449 1.9600 2.3263 2.5758 3.0902 3.2905


Percentage Points of the χ²-Distribution

This table gives the percentage points χ²ν(P) for various values of P and degrees of freedom ν, as indicated by the figure to the right. If X is a variable distributed as χ² with ν degrees of freedom, P/100 is the probability that X ≥ χ²ν(P). For ν > 100, √(2X) is approximately normally distributed with mean √(2ν − 1) and unit variance.

[Figure: χ² density curve with upper-tail area P/100 to the right of χ²ν(P).]

Percentage points P
ν 10 5 2.5 1 0.5 0.1 0.05
1 2.706 3.841 5.024 6.635 7.879 10.828 12.116
2 4.605 5.991 7.378 9.210 10.597 13.816 15.202
3 6.251 7.815 9.348 11.345 12.838 16.266 17.730
4 7.779 9.488 11.143 13.277 14.860 18.467 19.997
5 9.236 11.070 12.833 15.086 16.750 20.515 22.105
6 10.645 12.592 14.449 16.812 18.548 22.458 24.103
7 12.017 14.067 16.013 18.475 20.278 24.322 26.018
8 13.362 15.507 17.535 20.090 21.955 26.124 27.868
9 14.684 16.919 19.023 21.666 23.589 27.877 29.666
10 15.987 18.307 20.483 23.209 25.188 29.588 31.420
11 17.275 19.675 21.920 24.725 26.757 31.264 33.137
12 18.549 21.026 23.337 26.217 28.300 32.909 34.821
13 19.812 22.362 24.736 27.688 29.819 34.528 36.478
14 21.064 23.685 26.119 29.141 31.319 36.123 38.109
15 22.307 24.996 27.488 30.578 32.801 37.697 39.719
16 23.542 26.296 28.845 32.000 34.267 39.252 41.308
17 24.769 27.587 30.191 33.409 35.718 40.790 42.879
18 25.989 28.869 31.526 34.805 37.156 42.312 44.434
19 27.204 30.144 32.852 36.191 38.582 43.820 45.973
20 28.412 31.410 34.170 37.566 39.997 45.315 47.498
25 34.382 37.652 40.646 44.314 46.928 52.620 54.947
30 40.256 43.773 46.979 50.892 53.672 59.703 62.162
40 51.805 55.758 59.342 63.691 66.766 73.402 76.095
50 63.167 67.505 71.420 76.154 79.490 86.661 89.561
80 96.578 101.879 106.629 112.329 116.321 124.839 128.261


Percentage Points of the t-Distribution

This table gives the percentage points tν(P) for various values of P and degrees of freedom ν, as indicated by the figure to the right. The lower percentage points are given by symmetry as −tν(P), and the probability that |t| ≥ tν(P) is 2P/100. The limiting distribution of t as ν → ∞ is the normal distribution with zero mean and unit variance.

[Figure: t density curve with upper-tail area P/100 to the right of tν(P).]

Percentage points P
ν 10 5 2.5 1 0.5 0.1 0.05
1 3.078 6.314 12.706 31.821 63.657 318.309 636.619
2 1.886 2.920 4.303 6.965 9.925 22.327 31.599
3 1.638 2.353 3.182 4.541 5.841 10.215 12.924
4 1.533 2.132 2.776 3.747 4.604 7.173 8.610
5 1.476 2.015 2.571 3.365 4.032 5.893 6.869
6 1.440 1.943 2.447 3.143 3.707 5.208 5.959
7 1.415 1.895 2.365 2.998 3.499 4.785 5.408
8 1.397 1.860 2.306 2.896 3.355 4.501 5.041
9 1.383 1.833 2.262 2.821 3.250 4.297 4.781
10 1.372 1.812 2.228 2.764 3.169 4.144 4.587
11 1.363 1.796 2.201 2.718 3.106 4.025 4.437
12 1.356 1.782 2.179 2.681 3.055 3.930 4.318
13 1.350 1.771 2.160 2.650 3.012 3.852 4.221
14 1.345 1.761 2.145 2.624 2.977 3.787 4.140
15 1.341 1.753 2.131 2.602 2.947 3.733 4.073
16 1.337 1.746 2.120 2.583 2.921 3.686 4.015
18 1.330 1.734 2.101 2.552 2.878 3.610 3.922
21 1.323 1.721 2.080 2.518 2.831 3.527 3.819
25 1.316 1.708 2.060 2.485 2.787 3.450 3.725
30 1.310 1.697 2.042 2.457 2.750 3.385 3.646
40 1.303 1.684 2.021 2.423 2.704 3.307 3.551
50 1.299 1.676 2.009 2.403 2.678 3.261 3.496
70 1.294 1.667 1.994 2.381 2.648 3.211 3.435
100 1.290 1.660 1.984 2.364 2.626 3.174 3.390
∞ 1.282 1.645 1.960 2.326 2.576 3.090 3.291


5 Percent Points of the F-Distribution

This table gives the percentage points Fν1,ν2(P) for P = 0.05 and degrees of freedom ν1, ν2, as indicated by the figure to the right. The lower percentage points, that is the values F′ν1,ν2(P) such that the probability that F ≤ F′ν1,ν2(P) is equal to P/100, may be found using the formula F′ν1,ν2(P) = 1/Fν2,ν1(P).

[Figure: F density curve with upper-tail area P/100 to the right of F(P).]

ν2 \ ν1 1 2 3 4 5 6 12 24 ∞
2 18.513 19.000 19.164 19.247 19.296 19.330 19.413 19.454 19.496
3 10.128 9.552 9.277 9.117 9.013 8.941 8.745 8.639 8.526
4 7.709 6.944 6.591 6.388 6.256 6.163 5.912 5.774 5.628
5 6.608 5.786 5.409 5.192 5.050 4.950 4.678 4.527 4.365
6 5.987 5.143 4.757 4.534 4.387 4.284 4.000 3.841 3.669
7 5.591 4.737 4.347 4.120 3.972 3.866 3.575 3.410 3.230
8 5.318 4.459 4.066 3.838 3.687 3.581 3.284 3.115 2.928
9 5.117 4.256 3.863 3.633 3.482 3.374 3.073 2.900 2.707
10 4.965 4.103 3.708 3.478 3.326 3.217 2.913 2.737 2.538
11 4.844 3.982 3.587 3.357 3.204 3.095 2.788 2.609 2.404
12 4.747 3.885 3.490 3.259 3.106 2.996 2.687 2.505 2.296
13 4.667 3.806 3.411 3.179 3.025 2.915 2.604 2.420 2.206
14 4.600 3.739 3.344 3.112 2.958 2.848 2.534 2.349 2.131
15 4.543 3.682 3.287 3.056 2.901 2.790 2.475 2.288 2.066
16 4.494 3.634 3.239 3.007 2.852 2.741 2.425 2.235 2.010
17 4.451 3.592 3.197 2.965 2.810 2.699 2.381 2.190 1.960
18 4.414 3.555 3.160 2.928 2.773 2.661 2.342 2.150 1.917
19 4.381 3.522 3.127 2.895 2.740 2.628 2.308 2.114 1.878
20 4.351 3.493 3.098 2.866 2.711 2.599 2.278 2.082 1.843
25 4.242 3.385 2.991 2.759 2.603 2.490 2.165 1.964 1.711
30 4.171 3.316 2.922 2.690 2.534 2.421 2.092 1.887 1.622
40 4.085 3.232 2.839 2.606 2.449 2.336 2.003 1.793 1.509
50 4.034 3.183 2.790 2.557 2.400 2.286 1.952 1.737 1.438
100 3.936 3.087 2.696 2.463 2.305 2.191 1.850 1.627 1.283
∞ 3.841 2.996 2.605 2.372 2.214 2.099 1.752 1.517 1.002

End.