EE 240: Pattern Recognition and Machine Learning
Homework 4
Due date: May 30, 2021
Description: K-means clustering, principal component analysis.
Reading assignment and references: Instructor notes, AML Ch. 6, Appendix C; ESL Ch. 13 & 14.
Homework and lab assignment submission policy:
All homework and lab assignments must be submitted online via https://iLearn.ucr.edu.
Homework solutions should be written and submitted individually, but discussions among students are
encouraged.
All assignments should be submitted by the due date. You are allowed a total of 5 “late days” in the
whole quarter. If you submit your assignment late, after using all the late days, you will not receive
any credit.
H4.1 K-means clustering: In this exercise we will perform color-based segmentation using the K-means
algorithm.
(a) Implement the K-means algorithm in Python so that it accepts the target number of clusters (K) and a color
image as input parameters. Treat each color pixel as a 3-dimensional feature vector $x_i$. (5 pts)
A general K-means algorithm can be described as follows. Suppose we are given training
examples $x_1, x_2, \ldots, x_N$, where each $x_n \in \mathbb{R}^d$. We want to group the $N$ data samples into $K$
clusters.
i. Initialize cluster centers $\mu_1, \ldots, \mu_K \in \mathbb{R}^d$ at random.
ii. Repeat until convergence {
For every data point $x_i$, update its label as
$$l_i = \arg\min_j \|x_i - \mu_j\|_2^2. \tag{1}$$
For each cluster $j$, update its center $\mu_j$ as the mean of all points assigned to cluster $j$:
$$\mu_j = \frac{\sum_{i=1}^{N} \delta\{l_i = j\}\, x_i}{\sum_{i=1}^{N} \delta\{l_i = j\}}.$$
}
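The two steps above can be sketched in NumPy as follows. This is a minimal illustration, not a reference solution: the function name `kmeans`, the `max_iter` and `seed` parameters, the choice to initialize the centers from randomly chosen data points, and the handling of empty clusters are all assumptions that the problem statement does not specify.

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    """Cluster the rows of X (shape N x d) into K groups."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Step i: use K distinct data points as initial centers (one common choice).
    centers = X[rng.choice(len(X), size=K, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Assignment step, Eq. (1): squared Euclidean distance to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # converged: no label changed
        labels = new_labels
        # Update step: each center becomes the mean of its assigned points.
        for j in range(K):
            members = X[labels == j]
            if len(members) > 0:  # assumption: keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels, centers

# Two well-separated groups of 3-dimensional points.
X = np.vstack([np.zeros((5, 3)), np.full((5, 3), 10.0)])
labels, centers = kmeans(X, K=2)
```

For an image stored as an H x W x 3 array, the pixels can be passed in as `image.reshape(-1, 3)`, which yields one 3-dimensional feature vector per row.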
(b) Take a selfie with a background whose colors differ from those of your skin and clothing.
Use the K-means script from the previous step to segment your image into K clusters. To create a
segmented output image, replace every pixel in your image with the center of the cluster assigned
to it. Report your results for K ∈ {2, 4, 8, 16} clusters. (10 pts)
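Replacing each pixel by its cluster center amounts to an index lookup followed by a reshape. The sketch below assumes the labels are stored in row-major pixel order; the helper name `segment_image` and the hard-coded labels and centers are hypothetical stand-ins for the output of a K-means run.

```python
import numpy as np

def segment_image(image, labels, centers):
    """Replace every pixel by the center of its cluster.

    image:   H x W x 3 array (only its shape is used)
    labels:  length H*W array of cluster indices, in row-major pixel order
    centers: K x 3 array of cluster centers
    """
    h, w, c = image.shape
    segmented = centers[labels].reshape(h, w, c)
    # Round back to 8-bit colors so the result can be saved or displayed.
    return np.clip(np.rint(segmented), 0, 255).astype(np.uint8)

# Tiny 2 x 2 "image" with two reddish and two bluish pixels.
image = np.array([[[250, 0, 0], [0, 0, 250]],
                  [[240, 10, 0], [0, 10, 240]]], dtype=np.uint8)
pixels = image.reshape(-1, 3).astype(float)   # N x 3 feature vectors for K-means
labels = np.array([0, 1, 0, 1])               # hypothetical K-means labels
centers = np.array([[245.0, 5.0, 0.0], [0.0, 5.0, 245.0]])  # hypothetical centers
out = segment_image(image, labels, centers)
```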
(c) Repeat steps (a) and (b) with absolute distance instead of squared Euclidean distance. That
is, implement a new script that replaces the minimum Euclidean distance in (1) with the minimum
absolute distance¹, $l_i = \arg\min_j \|x_i - \mu_j\|_1$. Report your results for selfie segmentation using
the new distance. (10 pts)
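Only the assignment step changes when the distance is swapped. The following sketch shows a label assignment that can switch between the two distances; the function name `assign_labels` and its `norm` argument are illustrative assumptions. The example point is chosen so that the two distances disagree about the nearest center.

```python
import numpy as np

def assign_labels(X, centers, norm="l2"):
    """Return the index of the nearest center for every row of X."""
    diff = X[:, None, :] - centers[None, :, :]   # shape N x K x d
    if norm == "l1":
        dists = np.abs(diff).sum(axis=2)         # absolute (l1) distance
    else:
        dists = (diff ** 2).sum(axis=2)          # squared Euclidean distance
    return dists.argmin(axis=1)

X = np.array([[3.0, 3.0, 0.0]])
centers = np.array([[0.0, 0.0, 0.0], [3.0, 3.0, 5.0]])
# Squared Euclidean: 18 to center 0, 25 to center 1, so center 0 wins.
# Absolute distance:  6 to center 0,  5 to center 1, so center 1 wins.
label_l2 = assign_labels(X, centers, "l2")[0]
label_l1 = assign_labels(X, centers, "l1")[0]
```

In this sketch the center update is left untouched, since the problem statement asks only for the distance in (1) to be replaced.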
¹ $\|u\|_1 = \sum_{j=1}^{d} |u(j)|$ denotes the $\ell_1$ norm of a vector $u$, defined as the sum of the absolute values of all entries in $u$.
H4.2 Principal component analysis (PCA): In this problem we will consider two tasks. First, we will
explore the efficiency of PCA as a tool for dimensionality reduction and compression. Then, we will
utilize PCA for constructing a rudimentary face recognition algorithm. Download the ATT Face dataset
from Piazza. The ATT Face dataset contains images of the faces of 40 individuals. For each individual,
there are 10 images taken under different poses. Divide your data into two sets: select 60% of the images
for training and the remaining 40% for testing.
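One possible way to form the 60/40 split is sketched below. It assumes the split is made per individual (6 training and 4 test images each) and that the images are stored consecutively by person, so that image i belongs to person i // 10. Neither assumption is stated in the assignment, and the function name `split_per_person` is hypothetical.

```python
import numpy as np

def split_per_person(n_people=40, n_per_person=10, train_frac=0.6, seed=0):
    """Return index arrays for a per-person random train/test split."""
    rng = np.random.default_rng(seed)
    n_train = int(round(train_frac * n_per_person))
    train_idx, test_idx = [], []
    for p in range(n_people):
        # Shuffle this person's image indices, then cut at n_train.
        order = p * n_per_person + rng.permutation(n_per_person)
        train_idx.extend(order[:n_train].tolist())
        test_idx.extend(order[n_train:].tolist())
    return np.array(train_idx), np.array(test_idx)

train_idx, test_idx = split_per_person()
```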
You can read about eigenfaces at this link: http://www.scholarpedia.org/article/Eigenfaces.
You are allowed to use the PCA module in sklearn:
from sklearn.decomposition import PCA
(a) Perform PCA on the training images viewed as points in high-dimensional space (using their
pixel values). Plot a curve displaying the amount of “energy” captured by the first k principal
components, where energy is the cumulative sum of the variances of the top-k components, divided by the
sum of all the variances. How many components do we need in order to capture 50% of the
energy? How much of the energy is captured with k = 25? (10 pts)
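The energy curve defined in (a) can be computed as in the sketch below, which uses NumPy's SVD on synthetic data as a stand-in for the training images. With sklearn, the cumulative sum of `PCA().fit(X).explained_variance_ratio_` should give the same curve. The function name `energy_curve` and the synthetic matrix are assumptions made for illustration.

```python
import numpy as np

def energy_curve(X):
    """Cumulative fraction of variance captured by the first k components."""
    Xc = X - X.mean(axis=0)                    # center the data
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values, descending
    variances = s ** 2 / (len(X) - 1)          # variance along each component
    return np.cumsum(variances) / variances.sum()

# Synthetic stand-in: 240 samples, 50 features with decaying scale.
rng = np.random.default_rng(0)
X = rng.normal(size=(240, 50)) * np.linspace(10, 1, 50)
energy = energy_curve(X)                       # energy[k - 1] is the energy at k
k50 = int(np.searchsorted(energy, 0.5)) + 1    # smallest k with energy >= 0.5
energy_at_25 = energy[24]                      # energy captured with k = 25
```

Plotting `energy` against k = 1, 2, ... gives the requested curve.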
(b) Visualize the previously discovered top 25 eigenfaces (eigenvectors obtained from PCA). Order
them according to the magnitudes of their corresponding eigenvalues and plot them in a single
figure. (5 pts)
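One way to show 25 eigenfaces in a single figure is to tile them into a 5 x 5 mosaic. The sketch below assumes the components are given as rows already sorted by decreasing eigenvalue (as in sklearn's `components_`) and that the image height and width are known. The function name `eigenface_grid` and the random stand-in components are hypothetical.

```python
import numpy as np

def eigenface_grid(components, img_shape, rows=5, cols=5):
    """Tile the first rows*cols components into one 2-D mosaic array."""
    h, w = img_shape
    grid = np.zeros((rows * h, cols * w))
    for i in range(rows * cols):
        face = components[i].reshape(h, w)
        # Rescale each eigenface to [0, 1] so its structure is visible.
        face = (face - face.min()) / (face.max() - face.min() + 1e-12)
        r, c = divmod(i, cols)
        grid[r * h:(r + 1) * h, c * w:(c + 1) * w] = face
    return grid

rng = np.random.default_rng(0)
components = rng.normal(size=(25, 8 * 6))   # stand-in for 25 components of 8 x 6 images
grid = eigenface_grid(components, (8, 6))
```

The resulting array can then be displayed with, for example, `matplotlib.pyplot.imshow(grid, cmap="gray")`.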
(c) Let us now try to recognize the identity of a person’s face in a previously unseen image. Load
an image from the test set, subtract from it the mean of the training images, and project it onto
the previously computed top-25 principal components. Then, use a nearest neighbor search
to find its closest image in the training set. If the nearest neighbor found depicts the face of
the same person as the one in the unseen image, consider this a successful discovery of the
person’s identity. Repeat this experiment for all the test images and report the mean accuracy
on the entire test data set. Comment on the test images that are mistakenly identified.
(10 pts)
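The recognition procedure in (c) can be sketched as follows, using synthetic data in place of the face images. The sketch assumes that the training images are also projected onto the same components and that the nearest neighbor is measured by Euclidean distance in the projected space. The function name `pca_nn_accuracy` and the synthetic "identities" are illustrative, not part of the assignment.

```python
import numpy as np

def pca_nn_accuracy(X_train, y_train, X_test, y_test, k=25):
    """Project onto the top-k principal components, then classify by 1-NN."""
    mean = X_train.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by decreasing variance.
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    W = Vt[:k].T                                 # d x k projection matrix
    Z_train = (X_train - mean) @ W
    Z_test = (X_test - mean) @ W                 # subtract the *training* mean
    # Squared Euclidean distance from every test point to every training point.
    d2 = ((Z_test[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=2)
    pred = y_train[d2.argmin(axis=1)]
    return (pred == y_test).mean(), pred

# Synthetic stand-in: 4 "identities", 6 training and 4 test samples each.
rng = np.random.default_rng(0)
prototypes = rng.normal(size=(4, 100)) * 10.0
y_train = np.repeat(np.arange(4), 6)
y_test = np.repeat(np.arange(4), 4)
X_train = prototypes[y_train] + 0.1 * rng.normal(size=(24, 100))
X_test = prototypes[y_test] + 0.1 * rng.normal(size=(16, 100))
accuracy, pred = pca_nn_accuracy(X_train, y_train, X_test, y_test, k=3)
wrong = np.flatnonzero(pred != y_test)           # indices of misidentified test samples
```

The `wrong` array lists the test samples to inspect when commenting on mistaken identifications.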
Maximum points: 50