MATH3030-E1

The University of Nottingham
SCHOOL OF MATHEMATICAL SCIENCES

A LEVEL 3 MODULE, SPRING SEMESTER 2019-2020

MULTIVARIATE ANALYSIS

Suggested time to complete: TWO Hours THIRTY Minutes
Paper set: 21/05/2020 - 10:00
Paper due: 28/05/2020 - 10:00

Answer ALL questions

Your solutions should be written on white paper using dark ink (not pencil), on a tablet, or typeset. Do not write close to the margins. Your solutions should include complete explanations and all intermediate derivations. Your solutions should be based on the material covered in the module and its prerequisites only. Any notation used should be consistent with that in the Lecture Notes.

Guidance on the Alternative Assessment Arrangements can be found on the Faculty of Science Moodle page: https://moodle.nottingham.ac.uk/course/view.php?id=99154#section-2

Submit your answers as a single PDF with each page in the correct orientation, to the appropriate dropbox on the module's Moodle page. Use the standard naming convention for your document: [StudentID]_[ModuleCode].pdf. Please check the box indicated on Moodle to confirm that you have read and understood the statement on academic integrity: https://moodle.nottingham.ac.uk/pluginfile.php/6288943/mod_tabbedcontent/tabcontent/8496/FoS%20Statement%20on%20Academic%20Integrity.pdf

A scan of handwritten notes is completely acceptable. Make sure your PDF is easily readable and does not require magnification. Text which is not in focus or is not legible for any other reason will be ignored. If your scan is larger than 20Mb, please see if it can easily be reduced in size (e.g. scan in black & white, use a lower dpi, but not so low that readability is compromised).

Staff are not permitted to answer assessment or teaching queries during the assessment period. If you spot what you think may be an error on the exam paper, note this in your submission but answer the question as written.
Where necessary, minor clarifications or general guidance may be posted on Moodle for all students to access. Students with approved accommodations are permitted an extension of 3 days. The standard University of Nottingham penalty of 5% deduction per working day will apply to any late submission.

Academic Integrity in Alternative Assessments

The alternative assessment tasks for summer 2020 are to replace exams that would have assessed your individual performance. You will work remotely on your alternative assessment tasks and they will all be undertaken in "open book" conditions. Work submitted for assessment should be entirely your own work. You must not collude with others or employ the services of others to work on your assessment. As with all assessments, you also need to avoid plagiarism.

Plagiarism, collusion and false authorship are all examples of academic misconduct. They are defined in the University Academic Misconduct Policy at: https://www.nottingham.ac.uk/academicservices/qualitymanual/assessmentandawards/academic-misconduct.aspx

Plagiarism: representing another person's work or ideas as your own. You could do this by failing to correctly acknowledge others' ideas and work as sources of information in an assignment or neglecting to use quotation marks. This also applies to the use of graphical material, calculations etc. in that plagiarism is not limited to text-based sources. There is further guidance about avoiding plagiarism on the University of Nottingham website.

False Authorship: where you are not the author of the work you submit. This may include submitting the work of another student or submitting work that has been produced (in whole or in part) by a third party such as through an essay mill website. As it is the authorship of an assignment that is contested, there is no requirement to prove that the assignment has been purchased for this to be classed as false authorship.
Collusion: cooperation in order to gain an unpermitted advantage. This may occur where you have consciously collaborated on a piece of work, in part or whole, and passed it off as your own individual effort or where you authorise another student to use your work, in part or whole, and to submit it as their own. Note that working with one or more other students to plan your assignment would be classed as collusion, even if you go on to complete your assignment independently after this preparatory work. Allowing someone else to copy your work and submit it as their own is also a form of collusion.

Statement of Academic Integrity

By submitting a piece of work for assessment you are agreeing to the following statements:
1. I confirm that I have read and understood the definitions of plagiarism, false authorship and collusion.
2. I confirm that this assessment is my own work and is not copied from any other person's work (published or unpublished).
3. I confirm that I have not worked with others to complete this work.
4. I understand that plagiarism, false authorship, and collusion are academic offences and I may be referred to the Academic Misconduct Committee if plagiarism, false authorship or collusion is suspected.

1. (a) i) Briefly describe the method of principal components analysis and explain its main uses.
ii) Describe the situations when it is most suitable to use principal components analysis of the sample correlation matrix $\mathbf{R}$ rather than the sample covariance matrix $\mathbf{S}$, and when it is preferable to use $\mathbf{S}$ rather than $\mathbf{R}$.
[10 marks]

(b) The profits (in [units illegible in source]) of five banks $A, B, C, D, E$ from the United Kingdom were recorded as vectors of length 5 over 40 quarter year periods.
A principal components analysis was performed on the sample covariance matrix with eigenvectors given by

        PC1     PC2     PC3     PC4     PC5
A     0.421  -0.526   0.541  -0.176   0.472
B     0.457   0.509   0.178   0.676   0.206
C     0.421       X  -0.435   0.385  -0.382
D     0.470   0.260   0.335  -0.400  -0.662
E     0.464   0.240  -0.612  -0.451   0.387

with corresponding eigenvalues $2.502, 1.273, 0.329, 0.201, 0.136$.
i) Calculate the value of $X$ (to two decimal places).
ii) Draw a scree plot and suggest the number of components $k$ that are needed to describe the data adequately.
iii) Provide an interpretation of these $k$ components.
iv) Calculate the total percentage of variability explained by these $k$ components.
[10 marks]

(c) Data are available for another 7 banks from the United States over the same period. State what method could be used to investigate the linear combinations of the bank profits that are most highly correlated in the two datasets of UK and US banks, and give brief details of the technique.
[10 marks]

(d) In the table below, Euclidean distances are given in a matrix $\mathbf{D}$ between four pension funds based on measurements of 23 financial variables.

          Fund A  Fund B  Fund C  Fund D
Fund A       0     2.1     2.0     2.4
Fund B     2.1       0     1.8     0.2
Fund C     2.0     1.8       0     2.5
Fund D     2.4     0.2     2.5       0

i) Apply the single linkage method to the matrix $\mathbf{D}$. Summarise your results graphically using a dendrogram.
ii) Apply the complete linkage method to the matrix $\mathbf{D}$. Summarise your results graphically using a dendrogram.
iii) Suppose exactly two clusters are required. What would be your clusters based on (I) single linkage and (II) complete linkage?
iv) If two clusters are required then state which of these two linkage methods you prefer, and briefly give your reasons.
[10 marks]

2. (a) Let $\mathbf{x}_1, \ldots, \mathbf{x}_n$ be independent identically distributed $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ random variables.
Denote the sample mean by $\bar{\mathbf{x}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i$ and the sample covariance matrix by $\mathbf{S} = \frac{1}{n} \sum_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^\top$.
i) Using the result $n(\bar{\mathbf{x}} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\bar{\mathbf{x}} - \boldsymbol{\mu}) \sim \chi^2_p$, describe how to obtain a confidence region for $\boldsymbol{\mu}$ when $\boldsymbol{\Sigma}$ is known.
ii) State, without proof, the distribution of $\tau^2 = (n - 1)(\bar{\mathbf{x}} - \boldsymbol{\mu})^\top \mathbf{S}^{-1} (\bar{\mathbf{x}} - \boldsymbol{\mu})$.
iii) Explain how the result in ii) can be used to test $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ versus $H_1: \boldsymbol{\mu} \neq \boldsymbol{\mu}_0$. In practice which null distribution is used for carrying out the test?
[10 marks]

(b) The length and width measurements in $mm$ for a particular species of fish with sample size $n = 30$ have mean vector $\bar{\mathbf{x}}_1 = (81.50, 68.90)^\top$ and covariance matrix
$\mathbf{S}_1 = \begin{pmatrix} 30.0 & 10.0 \\ 10.0 & 20.0 \end{pmatrix}$.
It has been conjectured that the mean of the length and width of this species of fish should equal $80$ and $69$ $mm$ respectively, i.e. $\boldsymbol{\mu}_0 = (80, 69)^\top$. Examine this claim by carrying out a suitable test, and carefully state your assumptions.
[10 marks]

(c) The length and width measurements for a second species of fish with sample size $m = 35$ have mean vector $\bar{\mathbf{x}}_2 = (84.{*}{*}, 70.60)^\top$ and covariance matrix
$\mathbf{S}_2 = \begin{pmatrix} 28.6 & 9.9 \\ 9.9 & 21.1 \end{pmatrix}$,
where ** are the final two digits of your Student ID number. For example if your Student ID is 53256178 then ** is 78 and the sample mean length of the second species of fish is 84.78. Carry out a two sample test using test statistic
$\delta^2 = \frac{(n + m - p - 1)}{(n + m - 2)p} \, \frac{nm}{n + m} \, (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)^\top \mathbf{S}^{-1} (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)$,
to investigate whether the two population means are the same or not, carefully stating your assumptions. Note that $\mathbf{S}$ is the pooled unbiased covariance matrix estimator.
[10 marks]

(d) i) Comment on whether or not the assumptions for the test in (c) are reasonable.
ii) Briefly discuss an alternative procedure for testing the hypothesis in (c) which is based on the multivariate linear model $\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E}$, where the terms in the model should be specified. There is no need to carry out this alternative test.
[10 marks]

3.
(a) Consider the following road distances between some cities in Europe (in km):

            Athens  Barcelona  Brussels  Calais  Cherbourg  Cologne
Athens           0       3313      2963    3175       3339     2762
Barcelona     3313          0      1318    1326       1294     1498
Brussels      2963       1318         0     204        583      206
Calais        3175       1326       204       0        460      409
Cherbourg     3339       1294       583     460          0      785
Cologne       2762       1498       206     409        785        0

i) Briefly describe how to obtain the principal co-ordinates using the method of classical multidimensional scaling, making reference to how the centering matrix $\mathbf{H}$ is used in the calculation.
ii) The eigenvalues calculated from using classical multidimensional scaling for these data are ($\times 10^6$):
$(8.015, 1.423, 0.157, 0, -0.002, -0.019)$
State whether or not the resulting estimated two dimensional map from multidimensional scaling is a good approximation to the spatial arrangement of the cities, giving your reasons.
iii) Is the distance matrix between the cities a Euclidean distance matrix? Give your reasoning.
[15 marks]

(b) For $j = 1, \ldots, g$ let $\Pi_j$ denote a population described by a probability density function $f_j(\mathbf{x} \mid \boldsymbol{\theta}_j)$. Provide a brief explanation of the sample ML discriminant rule in this situation.
[5 marks]

(c) i) Measurements of cranial length ($x_1$) and cranial breadth ($x_2$), both measured in millimetres, on samples of $40$ male and $40$ female frogs led to the following statistics for the sample mean vector ($\bar{\mathbf{x}}_k$) and sample covariance matrix ($\mathbf{S}_k$), with $k = 1$ for male frogs and $k = 2$ for female frogs.
$\bar{\mathbf{x}}_1 = \begin{pmatrix} 24.1 \\ 24.8 \end{pmatrix}$, $\bar{\mathbf{x}}_2 = \begin{pmatrix} 22.8 \\ 23.5 \end{pmatrix}$, $\mathbf{S}_1 = \begin{pmatrix} 6.5 & 3 \\ 3 & 6.5 \end{pmatrix}$, $\mathbf{S}_2 = \begin{pmatrix} 5.5 & 3 \\ 3 & 5.5 \end{pmatrix}$.
Assuming the data are multivariate normal and stating any additional assumptions that you make, derive a sample ML discriminant rule for allocating a new observation to $\Pi_1$ or $\Pi_2$ for this example.
ii) Provide a suitable diagram and plot the straight line which separates the two allocation regions and label each region.
iii) Would you classify a new observation $\mathbf{x} = (23, 24)^\top$ as male or female? Give your reasoning.
iv) If the multivariate normal assumption looks suspect, e.g.
the distributions have thicker tails than the multivariate normal distribution, briefly describe a possible strategy for obtaining an appropriate classification rule.
[20 marks]

MATH3030-E1 END
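
As an illustrative aside to question 1(b) parts i) and iv), the sketch below shows one way the arithmetic could be checked. It assumes that eigenvectors of a symmetric covariance matrix are mutually orthogonal, so the unknown PC2 loading for bank C can be obtained from the zero inner product of PC1 and PC2. The loadings and eigenvalues are transcribed from the table in the question; the function names `solve_missing_loading` and `cumulative_percent` are hypothetical helpers introduced only for this example, not part of the module's notes.

```python
# Loadings transcribed from the table in question 1(b).
# The PC2 entry for bank C (index 2) is the unknown value X.
pc1 = [0.421, 0.457, 0.421, 0.470, 0.464]
pc2_known = {0: -0.526, 1: 0.509, 3: 0.260, 4: 0.240}
eigenvalues = [2.502, 1.273, 0.329, 0.201, 0.136]


def solve_missing_loading(v1, v2_known, missing_index):
    """Pick the missing entry so that v1 and v2 are orthogonal (assumed property)."""
    partial = sum(v1[i] * value for i, value in v2_known.items())
    return -partial / v1[missing_index]


def cumulative_percent(values):
    """Cumulative percentage of total variance explained by the leading components."""
    total = sum(values)
    running, out = 0.0, []
    for value in values:
        running += value
        out.append(100.0 * running / total)
    return out


x = solve_missing_loading(pc1, pc2_known, 2)
print("X =", round(x, 2))
print("cumulative %:", [round(p, 1) for p in cumulative_percent(eigenvalues)])
```

The same value of X could alternatively be checked against the unit-length condition on the PC2 column, which fixes only its magnitude; the orthogonality condition used here also fixes the sign.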
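
As an illustrative aside to question 1(d), the following is a minimal sketch of agglomerative clustering on the four-fund distance matrix, assuming the usual definitions: single linkage takes the minimum pairwise distance between two clusters and complete linkage takes the maximum. The distances are transcribed from the table in the question; `point_distance`, `cluster_distance` and `agglomerate` are hypothetical names chosen for this example, and the code is a brute-force illustration rather than an efficient implementation.

```python
from itertools import combinations

# Pairwise distances between funds, transcribed from the table in question 1(d).
DIST = {
    ("A", "B"): 2.1, ("A", "C"): 2.0, ("A", "D"): 2.4,
    ("B", "C"): 1.8, ("B", "D"): 0.2, ("C", "D"): 2.5,
}


def point_distance(a, b):
    """Look up the symmetric distance between two distinct funds."""
    return DIST[(a, b)] if (a, b) in DIST else DIST[(b, a)]


def cluster_distance(c1, c2, linkage):
    """Linkage distance: min over pairs (single) or max over pairs (complete)."""
    return linkage(point_distance(a, b) for a in c1 for b in c2)


def agglomerate(labels, linkage):
    """Return the merge history as a list of (cluster, cluster, height) tuples."""
    clusters = [frozenset([label]) for label in labels]
    history = []
    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest linkage distance.
        best = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], linkage),
        )
        height = cluster_distance(best[0], best[1], linkage)
        history.append((sorted(best[0]), sorted(best[1]), height))
        clusters = [c for c in clusters if c not in best] + [best[0] | best[1]]
    return history


single = agglomerate("ABCD", min)
complete = agglomerate("ABCD", max)
for name, history in (("single", single), ("complete", complete)):
    print(name, history)
```

Each tuple in the printed history gives the two clusters merged and the height at which a dendrogram would join them; cutting just below the final merge height yields the two-cluster solution for that linkage.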