辅导案例-ECE 174
ECE 174: Introduction to Linear and Nonlinear Optimization with Applications Fall 2019 Programming Assignment # 1 Due: November 5 Instructions: In this programming assignment you will implement a “K-Means Clustering" heuristic, test your program on real datasets under various settings, and interpret your results. • You can write the source code in any language of your choice (Python, MATLAB). The code should be properly commented and you will be required to submit the source code. You should write your own code and not copy from any resource on the internet. If you consult any resource online before writing your own code, you must cite the source, add a comment to your code with a link to the source you got it from. Any uncited code you did not write yourself, or any code which is substantially similar to another student’s, will be referred to the Office of Academic Integrity. • In addition to the source code, you will submit a detailed report that describes the simula- tions and explains the results using figures and mathematical interpretations. Your report must address all the questions asked in this assignment. We do not have any detailed re- quirements on the formatting, but we expect the report to be presented in a complete and professional manner, with appropriate labels on items and comments which demonstrate understanding of the results. • Your grade on the programming assignment will be based on the correctness of the code as well as the quality of the report. Since our evaluation of your code is based on its results as presented in your report, correct code is very important. We have every intention of awarding partial credit, but for instance if a small bug causes your program’s output to be unusable throughout the report, we cannot award much credit even if the bug was very small. • Reading: Read Sec. 4.2 and 4.3 from the textbook to review “k-means" clustering. Also study Example 4.4.1 on image clustering. You will implement this example and do several numerical experiments. • Dataset: We will use the well-known MNIST (Mixed National Institute of Standards) database of handwritten digits which contains grayscale images of size 28ˆ 28 which can be represented as vectors of size 784. The dataset consists of N = 60000 images that you will cluster using k-means. The dataset is provided as mnist.mat which you can directly read into MATLAB. For those working in Python, you can also refer to http://rasbt. github.io/mlxtend/user_guide/data/loadlocal_mnist/ and http://rasbt.github.io/ mlxtend/installation/ for obtaining the equivalent data from the MNIST images. 1. Write a program k-means to implement k-means clustering. The functions accepts two in- puts: a list of N data vectors to be clustered, and the number of clusters, K. The program outputs three quantities: a list of N group assignment indices (c1, c2, ¨ ¨ ¨ , cN), a list of K group representative vectors (z1, ¨ ¨ ¨ , zK) and the value of Jclust after each iteration of the algorithm until convergence or termination. Initialization and Termination: Initialize k-means by randomly assigning the data points to K group, i.e., randomly selecting each ci to be an integer between 1 and K. Alternatively, you can also randomly assign K data points as the K group representatives. Explicitly mention your initialization technique in the report. You can follow any of the termination rules suggested in your textbook (Page 76). Again, mention in your report what criterion you used. 2. Let K = 20. Implement k-means on the MNIST dataset, starting with a random assignment of the vectors to clusters, and repeating the experiment P = 30 times. Save the output of the algorithm for each of the 30 runs. Consider the two runs that produce the maximum and minimum value of Jclust upon convergence. For each of these two runs • Plot the value of Jclust as a function of the number of iterations. Comment. • Visualize the K group representatives as images. You should generate something similar to Fig. 4.8 and 4.9 of the textbook. Interpret the group representatives (if which digits they represent, or if they represent something else) and compare them for these two runs. Discuss your results. • Find the 10 nearest data points to each of the K group representatives. Manually list what digits they represent (by eye estimation). In your opinion, how many of these 10 images match the corresponding group representative and how many of them are misclassified? Generate a table and interpret your results. Plot a few of the correctly classified as well as misclassified data points as images, alongside the group representatives (also represented as images). 3. Repeat part 2 for K = 10 and P = 20. Compare with the previous results. 4. Repeat part 2 for K = 5 and P = 10. Compare with the previous results.