Tutoring Case: MATH 1309 Assignment 2


Assignment 2 PG MATH 1309 MULTIVARIATE ANALYSIS
45 points
Due date: October 25, 2019, 11:59 pm.
Show your SAS code, output and answers within the one attached assignment pdf or docx that you
submit in Canvas.




Question 1 (23 marks)

The file THC.csv contains data on concentrations of 13 different chemical compounds in marijuana
plants grown in the same region in Colombia that are derived from three different species varieties.

1. Compute the mean and standard deviation for the 13 chemical concentrations in the
sampleTHC data via SAS (1.5 marks)

2. Produce the correlation matrix and a scatterplot in SAS. Is the correlation matrix suitable for
a principal component analysis? (1.5 marks)

3. Perform a principal component analysis using SAS on the raw data and assess how many PCs
need to be retained. Answer the following from the resultant output. (10 marks; each part below
is worth 2 marks)

a) What percentage of the total sample variation is accounted for by the first, second and
third PCs?
b) Interpret the first 3 PCs.
c) Write out the first, second and third PCs as linear functions of the original variables.
d) Can the data be effectively summarised in fewer than 13 dimensions? Justify your
answer and comment on it.
e) Obtain via SAS or sketch the scree plot to confirm your choice of the number of PCs.

4. Perform a principal component analysis using SAS on the correlation matrix. Answer the
following from the resultant output. (10 marks; each part below is worth 2 marks)

a) What percentage of the total sample variation is accounted for by the first, second and
third PCs?
b) Interpret the first 3 PCs.
c) Write out the first, second and third PCs as linear functions of the standardised
variables.
d) Can the data be effectively summarised in fewer than 13 dimensions? Justify your
answer and comment on it.
e) Obtain via SAS or sketch the scree plot to confirm your choice of the number of PCs.
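
As a rough starting point for items 1 to 4, the steps could be sketched in SAS as below. This is only a sketch under assumptions: the data set name thc is a placeholder, THC.csv is assumed to sit in the working directory with a header row, and every numeric column is assumed to be one of the 13 concentrations. If the file also holds a numeric variety code, add a VAR statement listing only the concentration columns.

```sas
ods graphics on;

/* Assumption: THC.csv has a header row; thc is a placeholder name */
proc import datafile="THC.csv" out=thc dbms=csv replace;
    getnames=yes;
run;

/* Item 1: means and standard deviations of every numeric column */
proc means data=thc mean std;
run;

/* Item 2: correlation matrix plus a scatterplot matrix of all variables */
proc corr data=thc plots=matrix(nvar=all);
run;

/* Item 3: PCA on the raw data, i.e. on the covariance matrix (COV option) */
proc princomp data=thc cov plots=scree;
run;

/* Item 4: PCA on the correlation matrix (the PROC PRINCOMP default) */
proc princomp data=thc plots=scree;
run;
```

Each PROC PRINCOMP run prints the eigenvalues with their proportion and cumulative proportion of variance, and the eigenvectors, which are the coefficients of the PCs as linear functions of the variables.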




Question 2 (14 marks)
Consider the raw data set with 12 observations, on 5 socio-economic variables, called Population,
School, Employment, Services and HouseValue.

data SocioEconomics;
input Population School Employment Services HouseValue;
datalines;
5700 12.8 2500 270 25000
1000 10.9 600 10 10000
3400 8.8 1000 10 9000
3800 13.6 1700 140 25000
4000 12.8 1600 140 25000
8200 8.3 2600 60 12000
1200 11.4 400 10 16000
9100 11.5 3300 60 14000
9900 12.5 3400 180 18000
9600 13.7 3600 390 25000
9600 9.6 3300 80 12000
9400 11.4 4000 100 13000
;
proc factor data=SocioEconomics simple corr;
run;


Conduct a factor analysis using the SAS statements above.
Show your SAS code (it may differ from the one I suggest), output and answers within the ONE
assignment pdf or docx that you submit in Canvas.

1. Prepare the dataset for a Factor analysis via SAS. (1 mark)
2. Generate the means and standard deviations of the data. (1 mark)
3. Perform a Factor analysis on the raw data and the correlation matrix using the code above,
and answer the following questions. (2 marks)

4. From the eigenvalues of the correlation matrix, the factor loading matrix and the
communalities in the output, answer the following questions.

a) Do the first two principal components (factors) provide an adequate summary of the
data? (1 mark)
b) How much of the variation is accounted for by 2 factors? (1 mark)
c) How much of the variation is accounted for by 3 factors? (1 mark)


5. To display the scoring coefficients as eigenvectors, use PROC PRINCOMP as shown below, and
answer the following questions.

proc princomp data=SocioEconomics;
run;

a) What are the eigenvalues and the respective eigenvectors? (1 mark)
b) What is the proportion of the variance accounted for by the first and second component
respectively? (1 mark)
c) How much of the standardised variance do the first and second factors together account
for? (1 mark)
d) According to the final communality estimates, how many components or factors are needed
for all the variables to be well accounted for? Justify your answer. (1 mark)

6. To obtain the component scores as linear combinations of the observed variables, request
the standardized scoring coefficients by adding the SCORE option to the PROC FACTOR statement,
and run the code below. Note that the SCORE option requests the display of the
standardized scoring coefficients.

proc factor data=SocioEconomics n=5 score;
run;

As each factor/component can be expressed as a linear combination of the standardised
observed variables using the code above, answer the following questions:

a) Write down the first principal component or Factor1 in terms of the standardised
variables. (1 mark)
b) Write down the second principal component or Factor2 in terms of the standardised
variables. (1 mark)
c) Write the first and second PCs in terms of eigenvectors. (1 mark)

NOTES/HINTS:

• The SIMPLE option specified in the PROC FACTOR statement generates the means and
standard deviations of all observed variables in the analysis.
• The CORR option specified in the PROC FACTOR statement generates the output of the
observed correlations.
• To express the observed variables as functions of the components (or factors), you inspect
the factor loading matrix.
• To obtain the component scores as linear combinations of the observed variables, request
the standardized scoring coefficients by adding the SCORE option to the PROC FACTOR statement.
The SCORE option in the code below requests the display of the standardized scoring
coefficients:
proc factor data=SocioEconomics n=5 score;
run;
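
For orientation, the link between the PROC PRINCOMP eigenvectors and the PROC FACTOR loadings and scoring coefficients can be written out as below. This is a sketch that assumes the default principal component method of PROC FACTOR with no rotation. The symbols z_i (standardised variables), lambda_j and e_j (eigenvalues and eigenvectors of the correlation matrix), l_ij (loadings) and h_i^2 (communalities) are notation introduced here, not taken from the assignment.

```latex
% z_1, ..., z_5: standardised variables
% (\lambda_j, e_j): j-th eigenvalue/eigenvector pair of the correlation matrix
\[ \mathrm{PC}_j = e_{1j} z_1 + \dots + e_{5j} z_5,
   \qquad \operatorname{Var}(\mathrm{PC}_j) = \lambda_j \]
% Factor loadings (factor pattern) are rescaled eigenvectors
\[ \ell_{ij} = \sqrt{\lambda_j}\, e_{ij} \]
% Standardised scoring coefficients are e_{ij} / \sqrt{\lambda_j}
\[ \mathrm{Factor}_j = \frac{\mathrm{PC}_j}{\sqrt{\lambda_j}}
   = \sum_{i=1}^{5} \frac{e_{ij}}{\sqrt{\lambda_j}}\, z_i,
   \qquad \operatorname{Var}(\mathrm{Factor}_j) = 1 \]
% Communality of variable i when m factors are retained
\[ h_i^2 = \sum_{j=1}^{m} \ell_{ij}^2 \]
% Proportion of the standardised variance explained by m factors
\[ \frac{\lambda_1 + \dots + \lambda_m}{5} \]
```

Under these assumptions Factor j is the j-th principal component rescaled to unit variance, so the two procedures describe the same directions with different scalings.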




QUESTION 3 (8 marks)

Six variables measured on 100 genuine and 100 forged (counterfeit/fake) old Swiss 1000-franc bank
notes are given in Appendix A of the assignment (also available in an R library):

data(banknote)

A data.frame of dimension 200x7 with the following 7 variables:
Class: a factor with classes genuine and counterfeit
Length: length of bill (mm)
Left: width of left edge (mm)
Right: width of right edge (mm)
Bottom: bottom margin width (mm)
Top: top margin width (mm)
Diagonal: length of diagonal (mm)


Note that, if you do not use R, the data in Appendix A has 6 columns corresponding to the
following 6 variables:

1. Length of the bank note, length
2. Height of the bank note, measured on the left, left
3. Height of the bank note, measured on the right, right
4. Distance of inner frame to the lower border, bottom
5. Distance of inner frame to the upper border, top
6. Length of the diagonal, diag

You need to add the class or group indicator column (genuine versus fake) to the Appendix A data.

Show your SAS code, SAS output and answers within your final assignment pdf or docx that you
submit in Canvas.

1. Prepare the dataset for input for a Discriminant analysis via SAS. (0.5 marks)
2. Generate the means and the variance-covariance matrix of the data for the genuine notes.
(0.5 marks)
3. Generate the means and standard deviations and the variance-covariance matrix of the data
for the forged/fake/counterfeit notes. (0.5 marks)
4. Produce the correlation matrix and an associated scatterplot of the inputted data for the
genuine notes. (0.5 marks)


5. Produce the correlation matrix and an associated scatterplot of the inputted data for the
forged/fake notes. (0.5 marks)
6. Run the discriminant analysis using the SAS code below, which allocates a bank note with
the characteristics $x_0^T = (214.9, 130.1, 129.9, 9, 10.6, 140.5)$ to the appropriate
grouping, i.e. allocates it to either the genuine or the forged/fake class.

Using the SAS DISCRIM code below and resultant output answer the following questions.

a) Is $\Sigma_1 = \Sigma_2$? Justify your answer. (1 mark)

b) How is the bank note with $x_0^T = (214.9, 130.1, 129.9, 9, 10.6, 140.5)$ allocated?
(1 mark)

c) Write down the resultant confusion matrix. (1 mark)

data test;
input length left right bottom top diag;
cards;
214.9 130.1 129.9 9 10.6 140.5
;
run;
proc discrim data=combine pool=test crossvalidate testdata=test
testout=a;
class type;
var length left right bottom top diag;
priors "real"=0.99 "fake"=0.01;
run;
proc print;
run;
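
The PROC DISCRIM call above expects a data set named combine with a class variable type taking the values "real" and "fake". A minimal sketch of how that data set and the per-group summaries of items 1 to 5 might be produced is given below. It rests on assumptions: the file name banknote.txt is a placeholder, and the file is assumed to hold only the 200 data rows of Appendix A, in the original order and without the header lines.

```sas
/* Assumption: banknote.txt is a placeholder name for a text file that
   holds the 200 rows of Appendix A without the two header lines */
data combine;
    infile "banknote.txt";
    input length left right bottom top diag;
    /* Rows 1-100 are genuine notes, rows 101-200 are counterfeit */
    if _n_ <= 100 then type = "real";
    else type = "fake";
run;

ods graphics on;

/* Genuine notes: means, standard deviations, covariance and correlation
   matrices, and a scatterplot matrix */
proc corr data=combine cov plots=matrix(nvar=all);
    where type = "real";
    var length left right bottom top diag;
run;

/* The same summaries for the counterfeit notes */
proc corr data=combine cov plots=matrix(nvar=all);
    where type = "fake";
    var length left right bottom top diag;
run;
```

A WHERE statement is used rather than BY type because the rows are ordered "real" then "fake", which is not ascending order, so BY processing would need a prior PROC SORT or the NOTSORTED option.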

HINTS AND NOTES FOR INTERPRETING THE OUTPUT OF THE SAS CODE ABOVE:
• By including pool=test, SAS decides which kind of discriminant analysis to carry out based
on the result of Bartlett's test for equality of the variance-covariance matrices.
  o If the test fails to reject, then SAS will automatically do a linear discriminant analysis
  (LDF).
  o If the test rejects, then SAS will do a quadratic discriminant analysis (QDF).

• There are two other options. If we put pool=yes, then SAS will conduct a linear
discriminant analysis whether it is warranted or not. It will pool the variance-covariance
matrices of the 2 classes/groups and do a linear discriminant analysis without reporting
Bartlett's test.

• If pool=no, then SAS will not pool the variance-covariance matrices and will perform the
quadratic discriminant analysis. SAS does not actually print out the quadratic discriminant
function, but it will use quadratic discriminant analysis to classify sample units into
populations.
• Note: SAS runs Bartlett's test to test whether there is a significant difference between
the variance-covariance matrices of the genuine and counterfeit (fake) bank notes, i.e. it
tests whether $\Sigma_1 = \Sigma_2$.

APPENDIX A
Observations 1-100 are the genuine bank notes and the other 100 observations are the counterfeit
(forged/fake) bank notes. You need to create the class or group indicator column; otherwise use

data(banknote)

Length  Height (left)  Height (right)  Inner Frame (lower)  Inner Frame (upper)  Diagonal

214.8 131.0 131.1 9.0 9.7 141.0
214.6 129.7 129.7 8.1 9.5 141.7
214.8 129.7 129.7 8.7 9.6 142.2
214.8 129.7 129.6 7.5 10.4 142.0
215.0 129.6 129.7 10.4 7.7 141.8
215.7 130.8 130.5 9.0 10.1 141.4
215.5 129.5 129.7 7.9 9.6 141.6
214.5 129.6 129.2 7.2 10.7 141.7
214.9 129.4 129.7 8.2 11.0 141.9
215.2 130.4 130.3 9.2 10.0 140.7
215.3 130.4 130.3 7.9 11.7 141.8
215.1 129.5 129.6 7.7 10.5 142.2
215.2 130.8 129.6 7.9 10.8 141.4
214.7 129.7 129.7 7.7 10.9 141.7
215.1 129.9 129.7 7.7 10.8 141.8
214.5 129.8 129.8 9.3 8.5 141.6
214.6 129.9 130.1 8.2 9.8 141.7
215.0 129.9 129.7 9.0 9.0 141.9
215.2 129.6 129.6 7.4 11.5 141.5
214.7 130.2 129.9 8.6 10.0 141.9
215.0 129.9 129.3 8.4 10.0 141.4
215.6 130.5 130.0 8.1 10.3 141.6
215.3 130.6 130.0 8.4 10.8 141.5
215.7 130.2 130.0 8.7 10.0 141.6
215.1 129.7 129.9 7.4 10.8 141.1
215.3 130.4 130.4 8.0 11.0 142.3
215.5 130.2 130.1 8.9 9.8 142.4
215.1 130.3 130.3 9.8 9.5 141.9
215.1 130.0 130.0 7.4 10.5 141.8
214.8 129.7 129.3 8.3 9.0 142.0
215.2 130.1 129.8 7.9 10.7 141.8
214.8 129.7 129.7 8.6 9.1 142.3
215.0 130.0 129.6 7.7 10.5 140.7
215.6 130.4 130.1 8.4 10.3 141.0
215.9 130.4 130.0 8.9 10.6 141.4
214.6 130.2 130.2 9.4 9.7 141.8
215.5 130.3 130.0 8.4 9.7 141.8
215.3 129.9 129.4 7.9 10.0 142.0
215.3 130.3 130.1 8.5 9.3 142.1
213.9 130.3 129.0 8.1 9.7 141.3
214.4 129.8 129.2 8.9 9.4 142.3
214.8 130.1 129.6 8.8 9.9 140.9
214.9 129.6 129.4 9.3 9.0 141.7
214.9 130.4 129.7 9.0 9.8 140.9
214.8 129.4 129.1 8.2 10.2 141.0
214.3 129.5 129.4 8.3 10.2 141.8
214.8 129.9 129.7 8.3 10.2 141.5
214.8 129.9 129.7 7.3 10.9 142.0
214.6 129.7 129.8 7.9 10.3 141.1
214.5 129.0 129.6 7.8 9.8 142.0
214.6 129.8 129.4 7.2 10.0 141.3
215.3 130.6 130.0 9.5 9.7 141.1
214.5 130.1 130.0 7.8 10.9 140.9
215.4 130.2 130.2 7.6 10.9 141.6
214.5 129.4 129.5 7.9 10.0 141.4
215.2 129.7 129.4 9.2 9.4 142.0
215.7 130.0 129.4 9.2 10.4 141.2
215.0 129.6 129.4 8.8 9.0 141.1
215.1 130.1 129.9 7.9 11.0 141.3
215.1 130.0 129.8 8.2 10.3 141.4
215.1 129.6 129.3 8.3 9.9 141.6
215.3 129.7 129.4 7.5 10.5 141.5
215.4 129.8 129.4 8.0 10.6 141.5
214.5 130.0 129.5 8.0 10.8 141.4
215.0 130.0 129.8 8.6 10.6 141.5
215.2 130.6 130.0 8.8 10.6 140.8
214.6 129.5 129.2 7.7 10.3 141.3
214.8 129.7 129.3 9.1 9.5 141.5
215.1 129.6 129.8 8.6 9.8 141.8
214.9 130.2 130.2 8.0 11.2 139.6
213.8 129.8 129.5 8.4 11.1 140.9
215.2 129.9 129.5 8.2 10.3 141.4
215.0 129.6 130.2 8.7 10.0 141.2
214.4 129.9 129.6 7.5 10.5 141.8
215.2 129.9 129.7 7.2 10.6 142.1
214.1 129.6 129.3 7.6 10.7 141.7
214.9 129.9 130.1 8.8 10.0 141.2
214.6 129.8 129.4 7.4 10.6 141.0
215.2 130.5 129.8 7.9 10.9 140.9
214.6 129.9 129.4 7.9 10.0 141.8
215.1 129.7 129.7 8.6 10.3 140.6
214.9 129.8 129.6 7.5 10.3 141.0
215.2 129.7 129.1 9.0 9.7 141.9
215.2 130.1 129.9 7.9 10.8 141.3
215.4 130.7 130.2 9.0 11.1 141.2
215.1 129.9 129.6 8.9 10.2 141.5
215.2 129.9 129.7 8.7 9.5 141.6
215.0 129.6 129.2 8.4 10.2 142.1
214.9 130.3 129.9 7.4 11.2 141.5
215.0 129.9 129.7 8.0 10.5 142.0
214.7 129.7 129.3 8.6 9.6 141.6
215.4 130.0 129.9 8.5 9.7 141.4
214.9 129.4 129.5 8.2 9.9 141.5
214.5 129.5 129.3 7.4 10.7 141.5
214.7 129.6 129.5 8.3 10.0 142.0
215.6 129.9 129.9 9.0 9.5 141.7
215.0 130.4 130.3 9.1 10.2 141.1
214.4 129.7 129.5 8.0 10.3 141.2
215.1 130.0 129.8 9.1 10.2 141.5
214.7 130.0 129.4 7.8 10.0 141.2
214.4 130.1 130.3 9.7 11.7 139.8
214.9 130.5 130.2 11.0 11.5 139.5
214.9 130.3 130.1 8.7 11.7 140.2
215.0 130.4 130.6 9.9 10.9 140.3
214.7 130.2 130.3 11.8 10.9 139.7
215.0 130.2 130.2 10.6 10.7 139.9
215.3 130.3 130.1 9.3 12.1 140.2
214.8 130.1 130.4 9.8 11.5 139.9
215.0 130.2 129.9 10.0 11.9 139.4
215.2 130.6 130.8 10.4 11.2 140.3
215.2 130.4 130.3 8.0 11.5 139.2
215.1 130.5 130.3 10.6 11.5 140.1
215.4 130.7 131.1 9.7 11.8 140.6
214.9 130.4 129.9 11.4 11.0 139.9
215.1 130.3 130.0 10.6 10.8 139.7
215.5 130.4 130.0 8.2 11.2 139.2
214.7 130.6 130.1 11.8 10.5 139.8
214.7 130.4 130.1 12.1 10.4 139.9
214.8 130.5 130.2 11.0 11.0 140.0
214.4 130.2 129.9 10.1 12.0 139.2
214.8 130.3 130.4 10.1 12.1 139.6
215.1 130.6 130.3 12.3 10.2 139.6
215.3 130.8 131.1 11.6 10.6 140.2
215.1 130.7 130.4 10.5 11.2 139.7
214.7 130.5 130.5 9.9 10.3 140.1
214.9 130.0 130.3 10.2 11.4 139.6
215.0 130.4 130.4 9.4 11.6 140.2
215.5 130.7 130.3 10.2 11.8 140.0
215.1 130.2 130.2 10.1 11.3 140.3
214.5 130.2 130.6 9.8 12.1 139.9
214.3 130.2 130.0 10.7 10.5 139.8
214.5 130.2 129.8 12.3 11.2 139.2
214.9 130.5 130.2 10.6 11.5 139.9
214.6 130.2 130.4 10.5 11.8 139.7
214.2 130.0 130.2 11.0 11.2 139.5
214.8 130.1 130.1 11.9 11.1 139.5
214.6 129.8 130.2 10.7 11.1 139.4
214.9 130.7 130.3 9.3 11.2 138.3
214.6 130.4 130.4 11.3 10.8 139.8
214.5 130.5 130.2 11.8 10.2 139.6
214.8 130.2 130.3 10.0 11.9 139.3
214.7 130.0 129.4 10.2 11.0 139.2
214.6 130.2 130.4 11.2 10.7 139.9
215.0 130.5 130.4 10.6 11.1 139.9
214.5 129.8 129.8 11.4 10.0 139.3
214.9 130.6 130.4 11.9 10.5 139.8
215.0 130.5 130.4 11.4 10.7 139.9
215.3 130.6 130.3 9.3 11.3 138.1
214.7 130.2 130.1 10.7 11.0 139.4
214.9 129.9 130.0 9.9 12.3 139.4
214.9 130.3 129.9 11.9 10.6 139.8
214.6 129.9 129.7 11.9 10.1 139.0
214.6 129.7 129.3 10.4 11.0 139.3
214.5 130.1 130.1 12.1 10.3 139.4
214.5 130.3 130.0 11.0 11.5 139.5
215.1 130.0 130.3 11.6 10.5 139.7
214.2 129.7 129.6 10.3 11.4 139.5
214.4 130.1 130.0 11.3 10.7 139.2
214.8 130.4 130.6 12.5 10.0 139.3
214.6 130.6 130.1 8.1 12.1 137.9
215.6 130.1 129.7 7.4 12.2 138.4
214.9 130.5 130.1 9.9 10.2 138.1
214.6 130.1 130.0 11.5 10.6 139.5
214.7 130.1 130.2 11.6 10.9 139.1
214.3 130.3 130.0 11.4 10.5 139.8
215.1 130.3 130.6 10.3 12.0 139.7
216.3 130.7 130.4 10.0 10.1 138.8
215.6 130.4 130.1 9.6 11.2 138.6
214.8 129.9 129.8 9.6 12.0 139.6
214.9 130.0 129.9 11.4 10.9 139.7
213.9 130.7 130.5 8.7 11.5 137.8
214.2 130.6 130.4 12.0 10.2 139.6
214.8 130.5 130.3 11.8 10.5 139.4
214.8 129.6 130.0 10.4 11.6 139.2
214.8 130.1 130.0 11.4 10.5 139.6
214.9 130.4 130.2 11.9 10.7 139.0
214.3 130.1 130.1 11.6 10.5 139.7
214.5 130.4 130.0 9.9 12.0 139.6
214.8 130.5 130.3 10.2 12.1 139.1
214.5 130.2 130.4 8.2 11.8 137.8
215.0 130.4 130.1 11.4 10.7 139.1
214.8 130.6 130.6 8.0 11.4 138.7
215.0 130.5 130.1 11.0 11.4 139.3
214.6 130.5 130.4 10.1 11.4 139.3
214.7 130.2 130.1 10.7 11.1 139.5
214.7 130.4 130.0 11.5 10.7 139.4
214.5 130.4 130.0 8.0 12.2 138.5
214.8 130.0 129.7 11.4 10.6 139.2
214.8 129.9 130.2 9.6 11.9 139.4
214.6 130.3 130.2 12.7 9.1 139.2
215.1 130.2 129.8 10.2 12.0 139.4
215.4 130.5 130.6 8.8 11.0 138.6
214.7 130.3 130.2 10.8 11.1 139.2
215.0 130.5 130.3 9.6 11.0 138.5
214.9 130.3 130.5 11.6 10.6 139.8
215.0 130.4 130.3 9.9 12.1 139.6
215.1 130.3 129.9 10.3 11.5 139.7
214.8 130.3 130.4 10.6 11.1 140.0
214.7 130.7 130.8 11.2 11.2 139.4
214.3 129.9 129.9 10.2 11.5 139.6


