Tutoring case: COMP61021
COMP61021: Modelling and Visualization of High Dimensional Data

Lab 2: Applications of PCA (Assessed Lab Exercise)

This coursework (a zipped file) must be submitted via Blackboard. The deadline for this lab exercise is 23:30 on 19th November 2019. The late submission policy applies (see the teaching website and FAQs for details).

PCA is one of the most important data analysis tools and can be applied to high dimensional data visualization and compression. In this exercise, you are asked to apply the appropriate PCA implementations provided in Matlab to real data sets for visualization and compression.

You can download the relevant Matlab code and data sets from http://syllabus.cs.manchester.ac.uk/pgt/COMP61021/Lab/lab1.zip

After unzipping it, you should find two sub-directories: code and data. The code sub-directory contains three Matlab functions:

pca1.m            Matlab function of PCA derived from the covariance matrix
pca2.m            Matlab function of PCA derived from SVD (dual PCA)
display_digit.m   Matlab function for displaying a grey-level image

The data sub-directory contains two files, iris.mat and digit.mat. The file digit.mat contains two further data subsets, train and test, which hold grey-level images of the hand-written digit “6”.

iris.mat    IRIS data set consisting of 150 four-dimensional data points
digit.mat   The whole data set of hand-written digit images “6”, made up of the two data subsets below
  train     Training subset of 300 hand-written digit images “6” of 28×28 pixels
  test      Test subset of 10 hand-written digit images “6” of 28×28 pixels

Now… the lab exercise …

PART 1 – Visualisation

For this part, you are asked to choose an appropriate implementation of PCA from those provided. You need to apply it to the IRIS data set and then project all 150 data points onto a two-dimensional PCA subspace consisting of two principal components for visualization. Let PC1, PC2 and PC3 denote the top three principal components of this data set.
You need to use a Matlab display function to show your results in the PC1-PC2, PC1-PC3 and PC2-PC3 subspaces. Based on your observations of the visualized results, describe any non-trivial properties you find in this data set. The three plots showing your results and the description of your observations must appear in your report.

PART 2 – Image compression

You are asked to apply an appropriate implementation of PCA to hand-written digit images for compression. The 300 images in the train set are used to build a PCA compression system, while the 10 images in the test set are used to evaluate it. In this part, you need to answer the following questions experimentally, with sensible justification.

• What is the appropriate implementation of PCA for this application? (Justify why)
• How many principal components are needed to establish a satisfactory compression system? (Justify your chosen number of principal components with evidence)
• For the 10 images in the test set, what are their low-dimensional representations? (Explicitly list the low-dimensional representations of all 10 images in a table, which should be put in an appendix.)
• How can you reconstruct those images from their low-dimensional representations? (Display the 10 original images and their corresponding reconstructions for comparison)
• How can you estimate the reconstruction errors for the 10 test images? (Report them in detail using your chosen measure, and justify why you chose that measure to estimate the 10 reconstruction errors)

The answers to all the above questions must appear explicitly in your report.

After loading digit.mat in Matlab, you can extract an image directly from either the train subset or the test subset, e.g., Ik = train{k} extracts the kth image in the train subset. You can use the display_digit.m function provided to display any image, e.g., display_digit(Ik) shows the kth image Ik in the train subset.
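The steps that the Part 2 questions refer to (choosing the number of components, projecting, reconstructing and measuring error) can be illustrated with a minimal Matlab sketch. It is not the provided pca1.m or pca2.m, whose interfaces are not described here. It uses the built-in svd on random placeholder data with the same shape as the digit sets, assuming each image is vectorised into a 784-element column. The 95% variance threshold and the mean squared error measure are illustrative assumptions, not requirements of the exercise.

```matlab
% Sketch only: PCA compression of vectorised 28x28 images via SVD.
% Random placeholder data stands in for the real images. With digit.mat
% loaded, each column could instead be built as train{k}(:) (assumed layout).
rng(0);                                % reproducible placeholder data
d = 28 * 28;  nTrain = 300;  nTest = 10;
Xtrain = rand(d, nTrain);              % one image per column (placeholder)
Xtest  = rand(d, nTest);

mu = mean(Xtrain, 2);                  % mean training image, d-by-1
Xc = Xtrain - mu;                      % centre the data (implicit expansion, R2016b+)

% Economy-size SVD: the columns of U are the principal directions.
[U, S, ~] = svd(Xc, 'econ');
eigvals = diag(S).^2 / (nTrain - 1);   % variance captured by each component

% Pick the smallest M that retains 95% of the variance
% (the threshold is an illustrative assumption).
ratio = cumsum(eigvals) / sum(eigvals);
M = find(ratio >= 0.95, 1);

W = U(:, 1:M);                         % d-by-M projection basis
Z = W' * (Xtest - mu);                 % M-by-nTest low-dimensional codes
Xrec = W * Z + mu;                     % reconstructions in pixel space

% Per-image mean squared error between original and reconstruction.
mse = mean((Xtest - Xrec).^2, 1);
fprintf('M = %d, mean test MSE = %.4f\n', M, mean(mse));
```

A reconstructed column can be turned back into an image with reshape(Xrec(:, k), 28, 28) before display, assuming display_digit accepts a 28-by-28 matrix. Random data has no low-dimensional structure, so the chosen M will be far larger here than it is likely to be on real digit images.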
PART 3 – Bonus marks

Additionally, bonus marks are available for truly exceptional students. To obtain marks in this category, you should show evidence of learning outside the supplied lecture notes that links closely to the problems in Part 2. In any case, the data sets given in Part 2 must be used in this part. An example of what you could do: apply an alternative dimensionality reduction technique to the problems described in Part 2 to achieve better performance, demonstrating a full understanding of the technique applied rather than only reporting the performance improvement. Along with the experimental results, the justification for your chosen method and the reason(s) for its success must be described clearly in the report.

DELIVERABLES

A zipped file, named “yourname-lab1.zip”, including a report in PDF format (two single-sided A4 pages (11pt font) + a one-page appendix) and all relevant source code, along with a readme.txt file in text format. The zipped file must be submitted via Blackboard. Your report must address all requirements and key points/results specified in Parts 1 and 2. The same requirements apply to Part 3 if you attempt it. Your readme.txt file must contain a step-by-step procedure so that a marker can follow your instructions to run your submitted code and replicate the results described in your report.

Take note: we are not interested in the details of your code, what Matlab functions are called, what they return, etc. This course unit is about machine learning algorithms, and is indifferent to how you program them in Matlab.
There is no specific format – marks will be allocated roughly on the basis of:

• rigorous experimentation
• how informative and well presented your results are in your report
• imagination/research/understanding/performance in Part 2 and above
• grammar, ease of reading

The lab is marked out of 15:

Part 1 – Visualization       4 marks
Part 2 – Image Compression   8 marks
Part 3 – Bonus               3 marks

Marks and feedback will be available on Blackboard. Once the marking is completed, you will be notified via email.