ECE 174: Introduction to Linear and Nonlinear Optimization with Applications
Fall 2019
Programming Assignment # 1
Due: November 5
Instructions: In this programming assignment you will implement a “K-Means Clustering"
heuristic, test your program on real datasets under various settings, and interpret your results.
• You can write the source code in any language of your choice (Python, MATLAB). The
code should be properly commented and you will be required to submit the source code.
You should write your own code and not copy from any resource on the internet. If you
consult any resource online before writing your own code, you must cite the source, add
a comment to your code with a link to the source you got it from. Any uncited code you
did not write yourself, or any code which is substantially similar to another student’s,
will be referred to the Office of Academic Integrity.
• In addition to the source code, you will submit a detailed report that describes the simula-
tions and explains the results using figures and mathematical interpretations. Your report
must address all the questions asked in this assignment. We do not have any detailed re-
quirements on the formatting, but we expect the report to be presented in a complete and
professional manner, with appropriate labels on items and comments which demonstrate
understanding of the results.
• Your grade on the programming assignment will be based on the correctness of the code
as well as the quality of the report. Since our evaluation of your code is based on its results
as presented in your report, correct code is very important. We have every intention of
awarding partial credit, but for instance if a small bug causes your program’s output to be
unusable throughout the report, we cannot award much credit even if the bug was very
small.
• Reading: Read Sec. 4.2 and 4.3 from the textbook to review “k-means" clustering. Also
study Example 4.4.1 on image clustering. You will implement this example and do several
numerical experiments.
• Dataset: We will use the well-known MNIST (Mixed National Institute of Standards)
database of handwritten digits which contains grayscale images of size 28ˆ 28 which can
be represented as vectors of size 784. The dataset consists of N = 60000 images that you
will cluster using k-means. The dataset is provided as mnist.mat which you can directly
read into MATLAB. For those working in Python, you can also refer to http://rasbt.
mlxtend/installation/ for obtaining the equivalent data from the MNIST images.
1. Write a program k-means to implement k-means clustering. The functions accepts two in-
puts: a list of N data vectors to be clustered, and the number of clusters, K. The program
outputs three quantities: a list of N group assignment indices (c1, c2, ¨ ¨ ¨ , cN), a list of K
group representative vectors (z1, ¨ ¨ ¨ , zK) and the value of Jclust after each iteration of the
algorithm until convergence or termination.
Initialization and Termination: Initialize k-means by randomly assigning the data points
to K group, i.e., randomly selecting each ci to be an integer between 1 and K. Alternatively,
you can also randomly assign K data points as the K group representatives. Explicitly
mention your initialization technique in the report. You can follow any of the termination
rules suggested in your textbook (Page 76). Again, mention in your report what criterion
you used.
2. Let K = 20. Implement k-means on the MNIST dataset, starting with a random assignment
of the vectors to clusters, and repeating the experiment P = 30 times. Save the output of
the algorithm for each of the 30 runs. Consider the two runs that produce the maximum
and minimum value of Jclust upon convergence. For each of these two runs
• Plot the value of Jclust as a function of the number of iterations. Comment.
• Visualize the K group representatives as images. You should generate something
similar to Fig. 4.8 and 4.9 of the textbook. Interpret the group representatives (if
which digits they represent, or if they represent something else) and compare them
for these two runs. Discuss your results.
• Find the 10 nearest data points to each of the K group representatives. Manually
list what digits they represent (by eye estimation). In your opinion, how many of
these 10 images match the corresponding group representative and how many of
them are misclassified? Generate a table and interpret your results. Plot a few of
the correctly classified as well as misclassified data points as images, alongside the
group representatives (also represented as images).
3. Repeat part 2 for K = 10 and P = 20. Compare with the previous results.
4. Repeat part 2 for K = 5 and P = 10. Compare with the previous results.  Email:51zuoyejun

@gmail.com