CS6923 Machine Learning, Fall 2023
NYU School of Engineering
Homework 7
Submission is required only on GradeScope.
Part I: Written Exercises
For this assignment, refer to the matrices defined below: the diagonal matrix D, the orthogonal matrix V, and the zero-centered data matrix X.

    D = [ 157    0    0    0 ]
        [   0   16    0    0 ]
        [   0    0    4    0 ]
        [   0    0    0    0 ]

    V = [ −0.477    0.522    0.480    0.520 ]
        [  0.476   −0.521    0.521    0.480 ]
        [  0.561    0.475   −0.479    0.480 ]
        [ −0.480   −0.479   −0.519    0.520 ]

    X = [  4.5   −3.5   −3.0    1.5 ]
        [  1.5   −2.5   −4.0    4.5 ]
        [ −3.5    4.5    2.0   −2.5 ]
        [ −2.5    1.5    5.0   −3.5 ]

1. Considering matrix D:

   • v ∈ { (1, 0, 0, 0)^T, (0.71, 0, 0.50, 0.50)^T, (0.50, 0.50, 0.50, 0.50)^T }.
     Which vector v from the given set will maximize v^T D v?

   • Is there a better unit vector v ∈ R^4 that can achieve a higher result? (FYI: 0.71 × 0.71 ≈ 0.50.)
2. Using the orthogonal matrix V:

   • Calculate V^T v for every
     v ∈ { (−0.477, 0.476, 0.561, −0.480)^T, (0.522, −0.521, 0.475, −0.479)^T,
           (0.480, 0.521, −0.479, −0.519)^T, (0.520, 0.480, 0.480, 0.520)^T }.
     Round your answer to 3 decimal places.¹

3. For A = X^T X = V D V^T:

   • Which unit vector v ∈ R^4 maximizes v^T A v?

¹ Hint: Think of what the answer should be. Verify you are right by performing the calculations using a calculator. You will encounter round-off error.
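Because the matrices are small, the relationship stated in question 3 can be checked numerically. The following is a minimal numpy sketch (not required for the assignment) that transcribes D, V, and X as given above and verifies A = X^T X ≈ V D V^T; the small discrepancies come from V being rounded to three decimals.

```python
import numpy as np

# Matrices as given in the assignment (V is rounded to 3 decimals).
D = np.diag([157.0, 16.0, 4.0, 0.0])
V = np.array([[-0.477,  0.522,  0.480,  0.520],
              [ 0.476, -0.521,  0.521,  0.480],
              [ 0.561,  0.475, -0.479,  0.480],
              [-0.480, -0.479, -0.519,  0.520]])
X = np.array([[ 4.5, -3.5, -3.0,  1.5],
              [ 1.5, -2.5, -4.0,  4.5],
              [-3.5,  4.5,  2.0, -2.5],
              [-2.5,  1.5,  5.0, -3.5]])

A = X.T @ X                      # scatter matrix A = X^T X (exact, since X has exact entries)
print(np.round(A, 2))
print(np.round(V @ D @ V.T, 2))  # should match A up to the rounding error in V
print(np.round(V.T @ V, 2))      # should be approximately the identity (V is orthogonal)
```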
4. If you transformed all examples Φ(x) to have only one feature, and assuming you have no prior knowledge about the significance or importance of the original features, which feature would you use to classify the data?
5. The total variance of a data set X is the sum of the variances of all the principal components. If we project onto the top k principal components when performing PCA, the amount of variance retained is the sum of the variances of those top k principal components.²
For X, how much original variance is retained after projecting onto the top two principal components?
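In symbols, writing λ_1 ≥ λ_2 ≥ ... ≥ λ_d for the eigenvalues of X^T X (the diagonal entries of D), the fraction of variance retained by the top k principal components can be written as:

```latex
\text{fraction retained} \;=\; \frac{\lambda_1 + \cdots + \lambda_k}{\lambda_1 + \cdots + \lambda_d}
```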
6. Consider the following training set, consisting of 4 unlabeled examples, with real-valued attributes x1, x2, and x3. The training set is represented by a matrix X, where each row of the matrix corresponds to an example, and column i corresponds to the ith attribute.
    X = [ 5  2  4 ]
        [ 9  6  4 ]
        [ 7  1  0 ]
Perform Principal Component Analysis (PCA) with k = 2. Before applying PCA, make sure to center the data by subtracting the mean from each attribute.
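If you want to check the hand computation afterwards, here is a minimal numpy sketch of those same steps (center, eigendecompose the scatter matrix, keep the top-k eigenvectors, project); the helper name pca_top_k is illustrative only, and the sign of each principal direction is arbitrary.

```python
import numpy as np

def pca_top_k(X, k):
    """Center X, then return the top-k principal directions and the projected data."""
    Xc = X - X.mean(axis=0)            # center each attribute (column)
    A = Xc.T @ Xc                      # scatter matrix (proportional to the covariance)
    eigvals, eigvecs = np.linalg.eigh(A)
    order = np.argsort(eigvals)[::-1]  # eigh returns ascending order; sort descending
    W = eigvecs[:, order[:k]]          # top-k eigenvectors as columns
    return W, Xc @ W                   # principal directions and k-dimensional coordinates

# Example usage on a small data matrix (rows = examples, columns = attributes):
# W, Z = pca_top_k(X, k=2)
```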
Part II: Programming Exercises
7. Load the Python notebook homework 7 PCA.ipynb. The first part of the notebook is a tutorial on PCA. At the bottom of the tutorial, you will find a section labeled Homework. Complete the following tasks:
(a) Display the fourth face.
(b) Calculate the mean of all the examples in the dataset fea.
This involves creating an image in which each pixel i is the mean of pixel i across all the images in “fea”.
Display the mean image using an appropriate Python command and include the commands you used.
(c) Perform dimensionality reduction using PCA.
Compute the top 5 principal components of the data matrix fea and provide the Python commands used.
Print the 5 principal components.
(d) Project the fourth face in the dataset onto the first 5 principal components.
Provide z(4), which represents the fourth example in the transformed space.
(e) Project the fourth face back into the original space and add back the mean.
Provide the Python commands you used.
Display the resulting approximate image.
Repeat with the first 50 principal components (instead of 5).
(f) Determine how much variance was retained after projecting onto the first 5 principal components and how much was retained after projecting onto the first 50 components.
All output asked for in the questions should be appended to the written answers.
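For orientation only, here is one possible numpy/matplotlib sketch of the pipeline behind parts (a)–(f). It assumes fea is the data matrix already loaded by the notebook (one flattened face per row); the image dimensions img_h and img_w are placeholders you should replace with the values used in the tutorial, and the notebook's own helper functions may of course be used instead.

```python
import numpy as np
import matplotlib.pyplot as plt

# `fea` is assumed to be the (n_images, n_pixels) array loaded by the notebook.
img_h, img_w = 64, 64                  # placeholder image size; use the tutorial's values
k = 5                                  # number of principal components (repeat with k = 50)

plt.imshow(fea[3].reshape(img_h, img_w), cmap='gray'); plt.show()   # (a) fourth face

mean_face = fea.mean(axis=0)                                         # (b) mean image
plt.imshow(mean_face.reshape(img_h, img_w), cmap='gray'); plt.show()

Xc = fea - mean_face                                                 # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt[:k]                                                  # (c) top-k principal components

z4 = components @ Xc[3]                                              # (d) fourth face in the reduced space
x4_hat = components.T @ z4 + mean_face                               # (e) back-projection plus the mean
plt.imshow(x4_hat.reshape(img_h, img_w), cmap='gray'); plt.show()

var_retained = (S[:k] ** 2).sum() / (S ** 2).sum()                   # (f) fraction of variance retained
print(var_retained)
```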
² Learn more at: