COMP61021: Modelling and Visualization of High Dimensional Data
Lab 2: Applications of PCA (Assessed Lab Exercise)
This coursework (a zipped file) must be submitted via Blackboard. The deadline for
this lab exercise is 23:30 on 19th November 2019. The late submission policy
applies (see the teaching website and FAQs for details).

PCA is one of the most important data analysis tools and can be applied to the visualization
and compression of high-dimensional data. In this exercise, you are asked to apply the
appropriate Matlab implementations of PCA provided to real data sets for visualization and
compression. You can download the relevant Matlab code and data sets from
http://syllabus.cs.manchester.ac.uk/pgt/COMP61021/Lab/lab1.zip
After unzipping it, you should find two sub-directories: code and data. The
code sub-directory contains three Matlab functions:
pca1.m            Matlab function for PCA derived from the covariance matrix
pca2.m            Matlab function for PCA derived from SVD (dual PCA)
display_digit.m   Matlab function for displaying a grey-level image
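
As a rough illustration of the two routes named above, the sketch below computes principal components of a small synthetic matrix, first from the covariance matrix and then from an SVD of the centred data. It is not the content of pca1.m or pca2.m, whose interfaces are not described in this handout; the random data and all variable names are assumptions made for the example.

```matlab
% Synthetic stand-in data: 50 samples (rows), 6 features (columns).
rng(0);
X  = randn(50, 6) * randn(6, 6);
n  = size(X, 1);
Xc = X - mean(X, 1);                 % centre each feature (needs R2016b+)

% Route 1: eigendecomposition of the covariance matrix.
C = (Xc' * Xc) / (n - 1);
[V, D] = eig((C + C') / 2);          % symmetrise against round-off
[evals, order] = sort(diag(D), 'descend');
W_cov = V(:, order);                 % columns = principal directions

% Route 2: SVD of the centred data, no covariance matrix formed.
[~, S, W_svd] = svd(Xc, 'econ');
svals = diag(S);

% Both routes agree: eigenvalues equal squared singular values / (n - 1),
% and the directions match up to sign.
max_eval_gap = max(abs(evals - svals.^2 / (n - 1)));
max_dir_gap  = max(abs(abs(diag(W_cov' * W_svd)) - 1));
fprintf('eigenvalue gap %.2e, direction gap %.2e\n', max_eval_gap, max_dir_gap);
```

The two routes give the same subspace, so the practical difference is cost: the covariance route decomposes a matrix whose side is the number of features, which becomes expensive when the dimension is much larger than the number of samples.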

In the data sub-directory, you will find two files, iris.mat and digit.mat. The file
digit.mat contains two data subsets, train and test, which hold grey-level images
of the hand-written digit “6”.

iris.mat    IRIS data set consisting of 150 four-dimensional data points
digit.mat   Data set of images of the hand-written digit “6”, split into the two subsets below
  train     Training subset of 300 images of the hand-written digit “6”, 28x28 pixels each
  test      Test subset of 10 images of the hand-written digit “6”, 28x28 pixels each


Now… the lab exercise …





PART 1 – Visualization
For this part, you are asked to choose an appropriate implementation of PCA from those
provided. You need to apply it to the IRIS data set and then project all 150 data points onto
a two-dimensional PCA subspace spanned by two principal components for visualization. Let
PC1, PC2 and PC3 denote the top three principal components of this data set. You need to use
a Matlab display function to show your results in the PC1-PC2, PC1-PC3 and PC2-PC3 subspaces.
Based on your observations of the visualized results, describe any non-trivial properties you
find in this data set. The three plots showing your results and the description of your
observations must appear in your report.

PART 2 – Image compression
You are asked to apply an appropriate implementation of PCA to hand-written digit images
for compression. The 300 images in the train set are used to build a PCA
compression system, while the 10 images in the test set are used to evaluate it.
In this part, you need to answer the following questions experimentally,
with sensible justification.
• Which implementation of PCA is appropriate for this application? (Justify why)
• How many principal components are needed to establish a satisfactory compression
system? (Justify your chosen number of principal components with evidence)
• For the 10 images in the test set, what are their low-dimensional representations?
(Explicitly list the low-dimensional representations of all 10 images in a table,
which should be put in an appendix.)
• How can you reconstruct those images from their low-dimensional representations?
(Display the 10 original images and their corresponding reconstructions for comparison)
• How can you estimate the reconstruction errors for the 10 test images? (Report them in
detail using your chosen measure and justify why you chose that measure to estimate
the 10 reconstruction errors)
The answers to all the above questions must explicitly appear in your report.
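
The pipeline behind these questions (fit on the training images, compress the test images, reconstruct, measure the error) might look roughly like the sketch below. Everything in it is an assumption for illustration: the data are synthetic stand-ins with one flattened image per row, the 95% retained-variance threshold is an arbitrary example rather than a recommended answer, and mean squared error is only one of several possible error measures.

```matlab
% Stand-in data: rows are flattened 28x28 images (784 values each).
% In the lab these would be built from the train and test subsets.
rng(2);
basis  = randn(20, 784);                          % hidden low-rank structure
Xtrain = randn(300, 20) * basis + 0.1 * randn(300, 784);
Xtest  = randn(10, 20)  * basis + 0.1 * randn(10, 784);

mu = mean(Xtrain, 1);                             % mean training image
[~, S, V] = svd(Xtrain - mu, 'econ');             % principal directions in V
lambda = diag(S).^2 / (size(Xtrain, 1) - 1);      % variance per component

% Smallest k whose components retain at least 95% of the training
% variance (the 95% threshold is an assumption for this example).
retained = cumsum(lambda) / sum(lambda);
k = find(retained >= 0.95, 1);
W = V(:, 1:k);                                    % 784 x k projection matrix

Ztest = (Xtest - mu) * W;                         % 10 x k compressed codes
Xrec  = Ztest * W' + mu;                          % 10 x 784 reconstructions

% Per-image mean squared error between original and reconstruction.
mse = mean((Xtest - Xrec).^2, 2);
fprintf('k = %d, mean test MSE = %.4f\n', k, mean(mse));
```

Note that the mean and the projection matrix are estimated from the training images only and then reused unchanged on the test images, so the reported errors reflect how well the compression generalises.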
After loading digit.mat in Matlab, you can extract an image directly from either the train
subset or the test subset, e.g., Ik = train{k} extracts the kth image in the
train subset. You can use the display_digit.m function provided to display any image,
e.g., display_digit(Ik) shows the kth image Ik in the train subset.
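
PCA routines generally expect one data point per row (or column), so the images need to be flattened into vectors first. The sketch below shows one way to do that. The `train{k}` syntax suggests a cell array, but the storage format is otherwise an assumption here, so the example builds a random stand-in cell array of 28x28 matrices instead of loading digit.mat.

```matlab
% Stand-in for the train subset: a cell array of 28x28 images. In the lab,
% load('digit.mat') is expected to provide the real train and test subsets.
train = cell(1, 300);
for k = 1:numel(train)
    train{k} = rand(28, 28);
end

N = numel(train);
X = zeros(N, 28 * 28);
for k = 1:N
    Ik = train{k};
    X(k, :) = double(Ik(:))';      % column-major flatten into one row
end

% Undo the flatten for one row, assuming the images are 28x28 matrices.
I1 = reshape(X(1, :), 28, 28);
```

The same reshape can be applied to a reconstructed row vector to turn it back into an image before displaying it.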

PART 3 – Bonus marks
Additionally, bonus marks are available for truly exceptional students. To obtain marks in this
category, you should show evidence of learning beyond the supplied lecture notes that is
closely linked to the problems in Part 2. In any case, the data sets given in Part 2 must be
used in this part. An example of what you could do: apply an alternative dimensionality
reduction technique to the problems described in Part 2 to achieve better performance, showing
a full understanding of the technique applied rather than only reporting the performance
improvement. Along with the experimental results, the justification for your chosen method and
the reason(s) for its success must be described clearly in the report.

DELIVERABLES

A zipped file, named “yourname-lab1.zip”, including a report in PDF format (two
single-sided A4 pages in 11pt font plus a one-page appendix) and all relevant source code
along with a readme.txt file in plain-text format. The zipped file must be submitted via
Blackboard.

Your report must address all requirements and key points/results specified in Parts 1 and
2. The same requirements apply to Part 3 if you attempt it. Your readme.txt file must
contain a step-by-step procedure so that a marker can follow your instructions to run your
submitted code and replicate the results described in your report.

Note that we are not interested in the details of your code, such as which Matlab functions
are called or what they return. This course unit is about machine learning algorithms and is
indifferent to how you program them in Matlab.

There is no specific format. Marks will be allocated roughly on the basis of:
• rigorous experimentation
• how informatively and clearly your results are presented in your report
• imagination/research/understanding/performance in Part 2 and above
• grammar and ease of reading


The lab is marked out of 15:
Part 1 – Visualization       4 marks
Part 2 – Image Compression   8 marks
Part 3 – Bonus               3 marks


Marks and feedback will be available on Blackboard. Once the marking is
completed, you will be notified via email.
