MATH3030.G13MVA-E1
MATH4068.G14AMS-E1
The University of Nottingham
SCHOOL OF MATHEMATICAL SCIENCES
A LEVEL 3 MODULE, SPRING SEMESTER 2018-2019
MULTIVARIATE ANALYSIS
Time allowed TWO Hours THIRTY Minutes
Candidates may complete the front cover of their answer book and sign their desk card but
must NOT write anything else until the start of the examination period is announced.
Credit will be given for the best THREE answers
Only silent, self-contained calculators with a Single-Line Display or Dual-Line Display are
permitted in this examination.
Dictionaries are not allowed with one exception. Those whose first language is not English may
use a standard translation dictionary to translate between that language and English provided
that neither language is the subject of this examination. Subject specific translation
dictionaries are not permitted.
No electronic devices capable of storing and retrieving text, including electronic dictionaries,
may be used.
DO NOT turn examination paper over until instructed to do so
ADDITIONAL MATERIAL: Neave's Statistics Tables
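1. (a) i) What does it mean to say that $\mathbf{A}$ ($p \times p$) is an orthogonal matrix?
ii) The spectral decomposition theorem states that any symmetric matrix $\mathbf{W}$ ($p \times p$)
may be written in the form
$$\mathbf{W} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^\top = \sum_{j=1}^{p} \lambda_j \mathbf{q}_j \mathbf{q}_j^\top. \qquad (1)$$
Specify $\mathbf{Q}$ in terms of the $\mathbf{q}_j$, $j = 1, \ldots, p$, and $\boldsymbol{\Lambda}$ in terms of the $\lambda_j$, $j = 1, \ldots, p$.
What is the value of $\mathbf{q}_j^\top \mathbf{q}_k$?
iii) Explain how to simulate a random vector $\mathbf{x} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, assuming that a method
for simulating $\mathbf{z} \sim N_p(\mathbf{0}_p, \mathbf{I}_p)$ is available, where $\mathbf{0}_p$ is the $p$-vector of zeros and $\mathbf{I}_p$
is the $p \times p$ identity matrix. Prove that your method produces a random vector $\mathbf{x}$
with the correct distribution.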
[12 marks]
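One standard construction relevant to part (a)(iii) is sketched below. It assumes that the spectral decomposition (1) is applied to $\boldsymbol{\Sigma}$ itself, so that $\boldsymbol{\Sigma} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^\top$ with all $\lambda_j \geq 0$. The symbol $\boldsymbol{\Lambda}^{1/2}$ is introduced here for illustration and does not appear in the paper.

```latex
\[
\mathbf{x} = \boldsymbol{\mu} + \mathbf{Q}\boldsymbol{\Lambda}^{1/2}\mathbf{z},
\qquad
\boldsymbol{\Lambda}^{1/2} = \operatorname{diag}\{\lambda_1^{1/2}, \ldots, \lambda_p^{1/2}\}.
\]
\[
\mathrm{E}(\mathbf{x}) = \boldsymbol{\mu} + \mathbf{Q}\boldsymbol{\Lambda}^{1/2}\,\mathrm{E}(\mathbf{z}) = \boldsymbol{\mu},
\qquad
\operatorname{Var}(\mathbf{x})
= \mathbf{Q}\boldsymbol{\Lambda}^{1/2}\,\mathbf{I}_p\,\boldsymbol{\Lambda}^{1/2}\mathbf{Q}^\top
= \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^\top
= \boldsymbol{\Sigma}.
\]
```

Because $\mathbf{x}$ is an affine function of a multivariate normal vector, it is itself multivariate normal, which together with the two moments above gives $\mathbf{x} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. Any other matrix square root of $\boldsymbol{\Sigma}$ would serve equally well.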
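(b) i) Let $\mathbf{S}$ denote the sample covariance matrix of $p$-vectors $\mathbf{x}_1, \ldots, \mathbf{x}_n$, i.e.
$$\mathbf{S} = \frac{1}{n} \sum_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^\top,$$
where $\bar{\mathbf{x}} = n^{-1} \sum_{i=1}^{n} \mathbf{x}_i$. Assuming that $\mathbf{S}$ has eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p$,
prove that
$$\max_{\mathbf{u}:\, \|\mathbf{u}\| = 1} \mathbf{u}^\top \mathbf{S} \mathbf{u} = \lambda_1.$$
Specify a choice of $\mathbf{u}$ for which the maximum is attained.
ii) Explain how the remaining principal components are obtained, and express them
in terms of quantities which appear in the spectral decomposition (1) of the matrix $\mathbf{S}$.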
[12 marks]
(c) Five measurements were taken on each of 49 female sparrows. These measurements
(in millimetres) were: $X_1$ = total length, $X_2$ = alar extent, $X_3$ = length of beak and
head, $X_4$ = length of humerus and $X_5$ = length of keel of sternum.
The eigenvalues and eigenvectors of the sample correlation matrix of these 49 observation
vectors are given by

Eigenvalues       3.616    0.532    0.386    0.302    0.165

Eigenvectors      0.452   -0.051    0.691   -0.420    0.374
in columns        0.462    0.300    0.341    0.548   -0.530
                  0.451    0.325   -0.455   -0.606   -0.343
                  0.471    0.185   -0.411    0.388    0.652
                  0.398   -0.877   -0.179    0.069   -0.192
i) Draw a scree plot. Hence or otherwise, suggest the number, $k$, of principal components
that should be retained. Explain your reasoning.
ii) Calculate the proportion of variability explained by these $k$ components.
iii) Provide an interpretation for each of the first 3 components.
iv) It turns out that the instrument for measuring variable $X_1$ was in error by a
constant factor 1.1. E.g. if the measurement obtained was 10mm then the correct
measurement would be $10 \times 1.1 = 11$mm. All the other variables were measured
correctly. Discuss whether the calculation of the eigenvalues and eigenvectors of
the correlation matrix needs to be done again with the corrected values of the $X_1$
variable given.
[16 marks]
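As a small numerical companion to part (c)(ii), the sketch below computes the share of total variation carried by the leading components, using the eigenvalues quoted in part (c). The names `eigenvalues` and `cumulative_proportion` are illustrative and not part of the paper, and the choice of $k$ is deliberately left open.

```python
# Eigenvalues of the sample correlation matrix, as quoted in part (c).
eigenvalues = [3.616, 0.532, 0.386, 0.302, 0.165]


def cumulative_proportion(values, k):
    """Share of total variance carried by the k largest eigenvalues."""
    ordered = sorted(values, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


# Tabulate the cumulative proportion for every possible k.
for k in range(1, len(eigenvalues) + 1):
    print(k, round(cumulative_proportion(eigenvalues, k), 3))
```

For a correlation matrix the eigenvalues sum to the number of variables (here 5, up to rounding in the quoted values), so each proportion is roughly the eigenvalue divided by 5.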
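2. (a) Suppose that $\mathbf{x}_1, \ldots, \mathbf{x}_n$ is a random sample of vectors from the $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ distribution.
Throughout this question it should be assumed $\boldsymbol{\Sigma}$ has full rank.
i) Derive the distribution of the sample mean vector
$$\bar{\mathbf{x}} = n^{-1} \sum_{i=1}^{n} \mathbf{x}_i.$$
ii) Define the Wishart $W_p(\boldsymbol{\Sigma}, n)$ distribution and Hotelling's $T^2(p, n)$ distribution, and
state without proof the distribution of $n\mathbf{S}$ where
$$\mathbf{S} = \frac{1}{n} \sum_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^\top$$
is the sample covariance matrix. Hence derive the distribution of
$$(n - 1)(\bar{\mathbf{x}} - \boldsymbol{\mu})^\top \mathbf{S}^{-1} (\bar{\mathbf{x}} - \boldsymbol{\mu}).$$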
[12 marks]
(b) A sample of 15 components produced in a certain manufacturing process was collected.
The length (variable 1) and width (variable 2) of each component was measured in
millimetres (mm). The sample mean vector and sample covariance matrix were found
to be
$$\bar{\mathbf{x}} = \begin{pmatrix} 15.53 \\ 10.53 \end{pmatrix} \quad \text{and} \quad \mathbf{S} = \begin{pmatrix} 1.24 & 0.5 \\ 0.5 & 0.58 \end{pmatrix}$$
respectively.
i) The components are required to have a mean of $\boldsymbol{\mu}_0 = (15.0, 10.0)^\top$. Assess the
evidence that the components produced by the manufacturer meet this requirement,
clearly stating any assumptions that you make. You should present your answer in
terms of bounds for the $p$-value of an appropriate test.
ii) Explain briefly why just investigating the mean of the manufacturing process is
inadequate.
Hint: If $\tau^2 \sim T^2(p, n)$ then $[(n - p + 1)/(pn)]\,\tau^2 \sim F_{p,\, n-p+1}$.
[10 marks]
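The arithmetic behind part (b)(i) can be sketched as follows. This is a minimal illustration, not a model answer. It assumes that the data are a random sample from a bivariate normal distribution, that $\mathbf{S}$ uses divisor $n$ as in part (a), and that the statistic from part (a)(ii) has a $T^2(p, n-1)$ distribution under the null hypothesis. The helper names `inverse_2x2` and `quadratic_form` are made up for this example.

```python
# Summary statistics from part (b); S is taken to use divisor n, as in part (a).
n, p = 15, 2
xbar = [15.53, 10.53]
mu0 = [15.0, 10.0]
S = [[1.24, 0.5], [0.5, 0.58]]


def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def quadratic_form(d, m):
    """Compute d^T m d for a 2-vector d and a 2 x 2 matrix m."""
    return sum(d[i] * m[i][j] * d[j] for i in range(2) for j in range(2))


d = [xbar[i] - mu0[i] for i in range(2)]
# Assumed from part (a): (n - 1) d^T S^{-1} d ~ T^2(p, n - 1) under H0.
t2 = (n - 1) * quadratic_form(d, inverse_2x2(S))
# Hint applied with n replaced by n - 1, giving an F(p, n - p) reference.
f_stat = (n - p) / (p * (n - 1)) * t2
print(round(t2, 3), round(f_stat, 3))
```

The value of `f_stat` can then be compared with tabulated percentage points of the $F_{p,\,n-p}$ distribution to bracket the $p$-value.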
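(c) Consider the multivariate linear model
$$\mathbf{y}_i = \mathbf{B}^\top \mathbf{x}_i + \boldsymbol{\varepsilon}_i, \quad i = 1, \ldots, n,$$
where $\boldsymbol{\varepsilon}_i \overset{\text{IID}}{\sim} N_p(\mathbf{0}_p, \boldsymbol{\Sigma})$, $\mathbf{0}_p$ is the $p$-vector of zeros, $\mathbf{x}_i \in \mathbb{R}^q$, $i = 1, \ldots, n$,
$\boldsymbol{\Sigma}$ ($p \times p$) is a covariance matrix of full rank and $\mathbf{B}$ ($q \times p$) is a parameter matrix.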
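i) Show that the log-likelihood for $\mathbf{B}$ and $\boldsymbol{\Sigma}$ is given by
$$\ell(\mathbf{B}, \boldsymbol{\Sigma}) = -\frac{n}{2} \log |2\pi\boldsymbol{\Sigma}| - \frac{1}{2} \sum_{i=1}^{n} (\mathbf{y}_i - \mathbf{B}^\top \mathbf{x}_i)^\top \boldsymbol{\Sigma}^{-1} (\mathbf{y}_i - \mathbf{B}^\top \mathbf{x}_i).$$
ii) Now write $\mathbf{Y} = [\mathbf{y}_1, \ldots, \mathbf{y}_n]^\top$ and $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_n]^\top$. Show that
$$(\mathbf{Y} - \mathbf{X}\mathbf{B})\boldsymbol{\Sigma}^{-1}(\mathbf{Y} - \mathbf{X}\mathbf{B})^\top = \mathbf{C} = (c_{ij})_{i,j=1}^{n},$$
where
$$c_{ij} = (\mathbf{y}_i - \mathbf{B}^\top \mathbf{x}_i)^\top \boldsymbol{\Sigma}^{-1} (\mathbf{y}_j - \mathbf{B}^\top \mathbf{x}_j).$$
Hence deduce that
$$\operatorname{tr}\{(\mathbf{Y} - \mathbf{X}\mathbf{B})\boldsymbol{\Sigma}^{-1}(\mathbf{Y} - \mathbf{X}\mathbf{B})^\top\} = \sum_{i=1}^{n} (\mathbf{y}_i - \mathbf{B}^\top \mathbf{x}_i)^\top \boldsymbol{\Sigma}^{-1} (\mathbf{y}_i - \mathbf{B}^\top \mathbf{x}_i).$$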
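iii) Define
$$\mathbf{P} = \mathbf{I}_n - \mathbf{X}(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top$$
and show that $\mathbf{P}\mathbf{X} = \mathbf{0}_{n,q}$ and $\mathbf{X}^\top\mathbf{P} = \mathbf{0}_{q,n}$, where $\mathbf{0}_{n,q}$ is the $n \times q$ matrix of zeros.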
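iv) Define
$$\hat{\mathbf{B}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{Y} \qquad (2)$$
and show that
$$\operatorname{tr}\{(\mathbf{Y} - \mathbf{X}\mathbf{B})\boldsymbol{\Sigma}^{-1}(\mathbf{Y} - \mathbf{X}\mathbf{B})^\top\} = \operatorname{tr}\{\mathbf{P}\mathbf{Y}\boldsymbol{\Sigma}^{-1}\mathbf{Y}^\top\mathbf{P}\} + \operatorname{tr}\{\mathbf{X}(\hat{\mathbf{B}} - \mathbf{B})\boldsymbol{\Sigma}^{-1}(\hat{\mathbf{B}} - \mathbf{B})^\top\mathbf{X}^\top\}.$$
Hint: Recall that for any compatible matrices $\mathbf{A}$ and $\mathbf{B}$, $\operatorname{tr}(\mathbf{A}\mathbf{B}) = \operatorname{tr}(\mathbf{B}\mathbf{A})$.
v) Deduce that $\hat{\mathbf{B}}$ is the maximum likelihood estimator of $\mathbf{B}$. Briefly explain your
reasoning.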
[18 marks]
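3. (a) i) Let $\Pi_j$ denote a population which is defined by a probability density function $f_j(\mathbf{x})$,
$j = 1, \ldots, g$, where $\mathbf{x} \in \mathbb{R}^p$. What is the (population) maximum likelihood (ML)
decision rule for allocating a new observation vector $\mathbf{z} \in \mathbb{R}^p$ to one of $\Pi_1, \ldots, \Pi_g$?
ii) Supposing now that $g = 2$ and that the $\Pi_j$ have $N_p(\boldsymbol{\mu}_j, \boldsymbol{\Sigma})$ densities with different
means but a common covariance matrix $\boldsymbol{\Sigma}$, which is assumed to be non-singular.
The population ML discriminant rule is:
"Allocate $\mathbf{z}$ to $\Pi_1$ if and only if $\mathbf{a}^\top(\mathbf{z} - \mathbf{h}) > 0$ where $\mathbf{a} = \boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)$ and $\mathbf{h} = \frac{1}{2}(\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2)$."
Suppose now that we do not know the values of $\boldsymbol{\mu}_1$, $\boldsymbol{\mu}_2$ and $\boldsymbol{\Sigma}$ but we do have access
to training samples $\mathbf{x}_1, \ldots, \mathbf{x}_n$ and $\mathbf{y}_1, \ldots, \mathbf{y}_m$ from $\Pi_1$ and $\Pi_2$ respectively. Explain
how you would construct a "sample ML decision rule", defining any quantities that
you use.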
[6 marks]
(b) i) Suppose we observe training samples of 50 observation vectors $\mathbf{x}_1, \ldots, \mathbf{x}_{50}$ from
population $\Pi_1$ and 60 observation vectors $\mathbf{y}_1, \ldots, \mathbf{y}_{60}$ from $\Pi_2$, with sample mean
vectors and sample covariance matrices given by
$$\bar{\mathbf{x}} = \begin{pmatrix} 4.0 \\ 4.3 \end{pmatrix}, \quad \bar{\mathbf{y}} = \begin{pmatrix} 4.6 \\ 4.1 \end{pmatrix}, \quad \mathbf{S}_x = \begin{pmatrix} 10 & 6 \\ 6 & 14 \end{pmatrix}, \quad \mathbf{S}_y = \begin{pmatrix} 12 & 5 \\ 5 & 13 \end{pmatrix}.$$
Determine the ML decision rule for allocating a new observation vector $\mathbf{z} = (z_1, z_2)^\top$
to one of the two populations, and show that it may be expressed in the form
"allocate $\mathbf{z}$ to $\Pi_1$ if and only if $z_2 > \frac{5}{3} z_1 - 2.97$".
Present your results graphically, with $z_1$ and $z_2$ as the coordinate axes. Any
assumptions that you make should be clearly stated.
ii) To which population(s) would you allocate new observation vectors $(4.2, 4.2)^\top$ and
$(4.7, 4.4)^\top$ using the rule derived in part (b)(i)?
iii) Suppose now that $\mathbf{z}$ is really from $\Pi_1$. Obtain an estimate of the probability that $\mathbf{z}$
is misclassified (i.e. $\mathbf{z}$ is allocated to $\Pi_2$).
[18 marks]
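The calculation in part (b)(i) can be checked numerically with a short sketch like the one below. It rests on two labelled assumptions: the populations share a covariance matrix, and that matrix is estimated by the pooled unbiased estimator $(n_1\mathbf{S}_x + n_2\mathbf{S}_y)/(n_1 + n_2 - 2)$, with $\mathbf{S}_x$ and $\mathbf{S}_y$ using divisor $n$. The variable and function names are illustrative only.

```python
# Training summaries from part (b)(i); Sx and Sy are taken to use divisor n.
n1, n2 = 50, 60
xbar, ybar = [4.0, 4.3], [4.6, 4.1]
Sx = [[10.0, 6.0], [6.0, 14.0]]
Sy = [[12.0, 5.0], [5.0, 13.0]]

# Assumption: common covariance, estimated by the pooled unbiased estimator.
Su = [[(n1 * Sx[i][j] + n2 * Sy[i][j]) / (n1 + n2 - 2) for j in range(2)]
      for i in range(2)]

# Invert the 2 x 2 pooled matrix with the adjugate formula.
det = Su[0][0] * Su[1][1] - Su[0][1] * Su[1][0]
Su_inv = [[Su[1][1] / det, -Su[0][1] / det],
          [-Su[1][0] / det, Su[0][0] / det]]

# Sample versions of a = Sigma^{-1}(mu1 - mu2) and h = (mu1 + mu2) / 2.
diff = [xbar[i] - ybar[i] for i in range(2)]
a = [sum(Su_inv[i][j] * diff[j] for j in range(2)) for i in range(2)]
h = [(xbar[i] + ybar[i]) / 2 for i in range(2)]


def allocate(z):
    """Return 1 if a^T (z - h) > 0, otherwise 2."""
    score = sum(a[i] * (z[i] - h[i]) for i in range(2))
    return 1 if score > 0 else 2


# Rewrite a^T (z - h) > 0 as z2 > slope * z1 + intercept (valid since a[1] > 0).
slope = -a[0] / a[1]
intercept = h[1] - slope * h[0]
print(round(slope, 3), round(intercept, 3))
print(allocate([4.2, 4.2]), allocate([4.7, 4.4]))
```

The slope and intercept do not depend on the scaling of the pooled estimate, so using divisor $n_1 + n_2$ instead would give the same boundary line.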
(c) i) Give a brief description of Fisher's linear classification rule.
ii) Under the Gaussian model in part (a)(ii), where $\Pi_j$ has a $N_p(\boldsymbol{\mu}_j, \boldsymbol{\Sigma})$ distribution,
$j = 1, \ldots, g$, and the populations have a common covariance matrix $\boldsymbol{\Sigma}$, discuss how
you would calculate misclassification probabilities when $g > 2$.
iii) Should we prefer the ML decision rule or Fisher's classification rule? Discuss briefly.
[16 marks]
4. (a) The Singular Value Decomposition (SVD) states that any $p \times q$ matrix may be written
in the form
$$\mathbf{A} = \mathbf{Q}\boldsymbol{\Xi}\mathbf{R}^\top = \sum_{j=1}^{t} \xi_j \mathbf{q}_j \mathbf{r}_j^\top, \qquad (3)$$
where $t = \min(p, q)$, $\mathbf{Q} = [\mathbf{q}_1, \ldots, \mathbf{q}_t]$, $\mathbf{R} = [\mathbf{r}_1, \ldots, \mathbf{r}_t]$, $\mathbf{Q}^\top\mathbf{Q} = \mathbf{I}_t = \mathbf{R}^\top\mathbf{R}$,
$\boldsymbol{\Xi} = \operatorname{diag}\{\xi_1, \ldots, \xi_t\}$ and $\xi_1 \geq \cdots \geq \xi_t \geq 0$.
i) Given a set of $p$ $x$-variables and $q$ $y$-variables, explain the purpose of a canonical
correlation (cc) analysis.
ii) Give details of how to calculate the first cc component: namely, the weight vector
for the $x$-variables, the weight vector for the $y$-variables and the first cc coefficient.
You may express these cc quantities in terms of the quantities which arise in the
SVD in (3). Ensure that you define the matrix $\mathbf{A}$.
iii) Consider now the case $q = 1$, i.e. there is only one $y$-variable. Apply the formulae
given in part (a)(ii) in this case, simplifying the expressions as far as possible, and
explain the connection with the ordinary least squares (OLS) estimator of $\boldsymbol{\beta}$ in the
univariate linear model
$$y_i = \boldsymbol{\beta}^\top \mathbf{x}_i + \varepsilon_i, \quad i = 1, \ldots, n, \qquad (4)$$
where $\boldsymbol{\beta}$ and $\mathbf{x}_i$ are $p$-vectors and $\varepsilon_i \overset{\text{IID}}{\sim} N(0, \sigma^2)$.
iv) What interpretation of OLS is suggested by canonical correlation analysis?
Hint: Note that the OLS estimator of $\boldsymbol{\beta}$ in (4) is given by the formula (2) with $p = 1$.
[18 marks]
(b) The following dissimilarity matrix was obtained for 4 legal cases, $A$, $B$, $C$ and $D$, that
are being studied.

                     Case
              A      B      C      D
         A    0      0.69   0.92   0.95
Case     B    0.69   0      0.98   0.90
         C    0.92   0.98   0      0.66
         D    0.95   0.90   0.66   0

i) Plot the dendrogram based on the single linkage method.
ii) Plot the dendrogram based on the complete linkage method.
iii) What clusters arise in (i) and (ii) if we take the threshold as 0.94 in each case?
Comment briefly on the similarities and differences between the single and complete
linkage methods in this example.
[12 marks]
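The merge heights needed for the two dendrograms in part (b) can be obtained with a small agglomerative routine such as the sketch below. The function `agglomerate` and its `linkage` argument are made-up names for this illustration. The only difference between the two methods is whether the between-cluster dissimilarity is the minimum (single linkage) or the maximum (complete linkage) over pairs.

```python
# Dissimilarities between the four legal cases, from part (b).
labels = ["A", "B", "C", "D"]
d = {
    ("A", "B"): 0.69, ("A", "C"): 0.92, ("A", "D"): 0.95,
    ("B", "C"): 0.98, ("B", "D"): 0.90, ("C", "D"): 0.66,
}


def dissimilarity(x, y):
    """Look up a pairwise dissimilarity regardless of argument order."""
    return d[(x, y)] if (x, y) in d else d[(y, x)]


def agglomerate(linkage):
    """Return the merge history as (cluster, cluster, height) triples.

    linkage is min for single linkage and max for complete linkage.
    """
    clusters = [frozenset([name]) for name in labels]
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        height, left, right = min(
            (linkage(dissimilarity(x, y) for x in c1 for y in c2), i, j)
            for i, c1 in enumerate(clusters)
            for j, c2 in enumerate(clusters) if i < j
        )
        merged = clusters[left] | clusters[right]
        history.append((sorted(clusters[left]), sorted(clusters[right]), height))
        clusters = [c for k, c in enumerate(clusters) if k not in (left, right)]
        clusters.append(merged)
    return history


single = agglomerate(min)
complete = agglomerate(max)
for name, history in (("single", single), ("complete", complete)):
    print(name, history)
```

Cutting a dendrogram at a threshold keeps only the merges whose height falls below it, so the clusters at 0.94 can be read off each merge history.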
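(c) i) Given a distance matrix $\mathbf{D}$, explain how to calculate the principal coordinates from
a certain matrix, $\mathbf{B}$ say, which you should define explicitly in terms of $\mathbf{D}$.
ii) Suppose that $\mathbf{B}$ has eigenvalues
$$\lambda_1 = 2, \quad \lambda_2 = 1, \quad \lambda_3 = 0 \quad \text{and} \quad \lambda_4 = -1$$
and corresponding unit eigenvectors
$$\mathbf{q}_1 = \frac{1}{2}\begin{pmatrix} 1 \\ -1 \\ 1 \\ -1 \end{pmatrix}, \quad \mathbf{q}_2 = \frac{1}{2}\begin{pmatrix} 1 \\ 1 \\ -1 \\ -1 \end{pmatrix}, \quad \mathbf{q}_3 = \frac{1}{2}\begin{pmatrix} 1 \\ -1 \\ -1 \\ 1 \end{pmatrix}, \quad \mathbf{q}_4 = \frac{1}{2}\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}.$$
A) Is $\mathbf{D}$ a Euclidean distance matrix? Any result that you use should be clearly
stated but no proof is required.
B) Write down the first two principal coordinates for each point.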
[10 marks]
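A minimal sketch of the computation behind part (c)(ii) follows. It assumes the usual classical scaling convention in which the coordinates of the points on axis $j$ are $\sqrt{\lambda_j}\,\mathbf{q}_j$, and the standard result that $\mathbf{D}$ is Euclidean exactly when $\mathbf{B}$ is positive semi-definite. The names `is_euclidean` and `principal_coordinates` are illustrative.

```python
import math

# Eigenvalues and unit eigenvectors of B, as given in part (c)(ii).
eigenvalues = [2.0, 1.0, 0.0, -1.0]
eigenvectors = [
    [0.5, -0.5, 0.5, -0.5],
    [0.5, 0.5, -0.5, -0.5],
    [0.5, -0.5, -0.5, 0.5],
    [0.5, 0.5, 0.5, 0.5],
]

# Assumed result: D is Euclidean exactly when B is positive semi-definite.
is_euclidean = all(lam >= 0 for lam in eigenvalues)


def principal_coordinates(k):
    """Coordinates of each point on the first k axes: sqrt(lambda_j) * q_j."""
    n_points = len(eigenvectors[0])
    return [
        [math.sqrt(eigenvalues[j]) * eigenvectors[j][r] for j in range(k)]
        for r in range(n_points)
    ]


coords = principal_coordinates(2)
print(is_euclidean)
for point in coords:
    print([round(c, 3) for c in point])
```

Only axes with positive eigenvalues can be used this way, which is why the sketch stops at the first two.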
END
