Introduction to Machine Learning
CSCE 478/878
Programming Assignment 4
Fall 2020
Linear Support Vector Machine & Principle Component Analysis
Basic Info
You will work in teams of maximum three students from the previous assignment.
The programming code will be graded on both implementation and correctness.
This assignment doesn’t require a written report.
Assignment Goals
This assignment is intended to build the following skill:
 Implement the Gradient Descent algorithm for the Linear Support Vector
Machine classifier model.
 Perform Principle Component Analysis (PCA) based dimensionality reduction by
using the eigendecomposition technique.
Assignment Instructions
Note: you are not allowed to use any ScikitLearn or python library for building the
Linear SVM model and performing PCA.
i. The code should be written in a Jupyter notebook. Use the following naming
convention.
ii. The Jupyter notebook should be submitted via webhandin.
________________________________________________________________________
2
Score Distribution
Part A: 478 (70 pts) & 878 (80 pts)
Part B: 478 (45 pts) & 878 (45 pts)
Total: 478 (115 pts) & 878 (125 pts)
__________________________________________________________________
Part A: Linear Support Vector Machine (SVM)
Dataset: You will use the Iris dataset for binary classification. Use petal length and
petal width features of the Iris dataset.
URL: https://scikitlearn.org/stable/modules/generated/sklearn.datasets.load_iris.html
1. Implement a Linear_SVC model class for performing binary classification. The
model should implement the batch Gradient Descent (GD) algorithm.
[40 pts]
a) __init__(self, C=1, max_iter=100, tol=None, learning_rate=‘constant’,
learning_rate_init=0.001, t_0=1, t_1=1000, early_stopping=False,
validation_fraction=0.1, **kwargs)
This method is used to initialize the data members of the class when an object
of class is created. For example, self.C = C
Arguments:
C : float
It provides the regularization/penalty coefficient.
max_iter : int
Maximum number of iterations. The GD algorithm iterates until
convergence (determined by ‘tol’) or this number of iterations.
tol : float
Tolerance for the optimization.
learning_rate : string (default ‘constant’)
It allows to specify the technique to set the learning rate: constant
learning for all iterations, or varying learning rate given by a learning rate
schedule function.
3
 ‘constant’: a constant learning rate given by ‘learning_rate_init’.
 ‘adaptive’: gradually decreases the learning rate based on a
schedule. Write a function that would be used if learning_rate is set
to ‘adaptive’. When ‘adaptive’ is used, the ‘learning_rate_init’
parameter has no effect as the learning rate varies by a learning rate
schedule function. It uses the ‘t_0’ and ‘t_1’ parameters (see below).

Pseudocode for the “adaptive” learning_rate function:
Write a function that decreases learning rate gradually during each iteration:
Learning rate = !_#$!%&'!$()*!_+
where t_0 and t_1 are two constants that you need to determine empirically.
Choose constant t_0 and t_1 such that initially the learning rate is large
enough.
learning_rate_init : double
The initial learning rate value if learning_rate is set to ‘constant’. It
controls the stepsize in updating the weights. It has no effect is the
‘learning_rate’ is ‘adaptive’.
early_stopping : Boolean, default=False
Whether to use early stopping to terminate training when validation score
is not improving. If set to True, it will automatically set aside a fraction
of training data as validation and terminate training when validation
score is not improving.
validation_fraction : float, default=0.1
The proportion of training data to set aside as validation set for early
stopping. Must be between 0 and 1. Only used if early_stopping is True.
b) fit(self, X, Y):
Implement the batch GD algorithm in the fit method. The weight vector and the
intercept/bias should be denoted by w and b, respectively. Store the cost values
for each iteration so that later you can use it to create a learning curve.
Arguments:
X : ndarray
A numpy array with rows representing data samples and columns
representing features.
Y : ndarray
A 1D numpy array with labels corresponding to each row of the feature
matrix X.
Note: the “fit” method should update the following parameters:
self.intercept_ = np.array([b])
4
self.coef_ = np.array([w])
self.support_vectors_ =
The “fit” method should display the total number of iterations using a print
statement.
Returns:
self
c)
predict(self, X)
Arguments:
X : ndarray
A numpy array containing samples to be used for prediction. Its rows
represent data samples and columns represent features.
Returns:
1D array of predicted class labels for each row in X.
Note: the “predict” method uses the self.coef_[0] and self.intercept_[0] to make
predictions.
Binary Classification using Linear_SVC Classifier
2. Read the Iris data using the sklearn.datasets.load_iris method. Create the data
matrix X by using two features: petal length and petal width
[1 pts]
3. Partition the data into train and test set (80%  20%). Use the “Partition”
function from your previous assignment.
[2 pts]
4. Model selection via Hyperparameter tuning: Use the kFold function from
previous assignment to find the optimal values for the following
hyperparameters.
[5 pts]
C
learning_rate
learning_rate_init (when ‘constant’ learning_rate is used)
max_iter
tol
5
5. Train the model using optimal values for the hyperparameters and evaluate on the
test data. Report the test accuracy and test confusion matrix.
[5 pts]
6. Plot the learning curve.
[5 pts]
7. Plot the decision boundary and show the support vectors using the
“decision_boundary_support_vectors” function given in:
https://github.com/rhasanbd/SupportVectorMachineClassifierBeginners
SurvivalKit/blob/master/Support%20Vector%20Machine1
Linearly%20Separable%20Data.ipynb
[12 pts]
Note that if your test accuracy is less than 95% you will lose 10% of the total
obtained points. If your test accuracy is less than 90% you will lose 30% of the
total obtained points.
8. [Extra Credit for 478 and Mandatory for 878] Implement early stopping in
the “fit” method of the Linear_SVC model. You will have to use the following
two parameters of the model: early_stopping and validation_fraction. Also note
that when training the model using early stopping it should generate an early
stopping curve. [10 pts]
Part B: Principle Component Analysis
You will perform dimensionality reduction on a grayscale image (posted on
Canvas) using PCA. The PCA will be implemented using the eigendecomposition
technique. More specifically, you will find the top k eigenvectors (i.e., principle
components) of a pixel matrix (a gray scale image). Then, using the top k
eigenvector matrix, you will project the pixel matrix on its principle components.
This will reduce the dimension of the pixel matrix without losing much variance.
9. Using the matplotlib.pyplot “imread” function read the image as a 2D matrix.
Denote it with “X”. Show the image using matplotlib.pyplot imshow function.
If the image is RGB, then you need to convert it into a grayscale image, as
follows (use matplotlib.pyplot “gray” function).
X = imread("image_path")[:,:,0]
gray()
[3 pts]
6
10. Implement the steps of eigendecomposition based PCA on X: (a) mean center the
data matrix X, (b) compute the covariance matrix from it, (c) find eigenvalues
and eigenvectors of the covariance matrix (you may use the numpy.linalg.eig
function). [7 pts]
11. Then, find the top k eigenvectors (sort eigenvalueeigenvector pairs from high to
low, and get the top k eigenvectors), and create an eigenvector matrix using top k
eigenvectors (each eigenvector should be a column vector in the matrix, so there
should be k columns). [10 pts]
12. Finally project the mean centered data on the k top eigenvectors (it should be a
dot product between mean centered X and the top k eigenvector matrix).
[5 pts]
13. Reconstruct the data matrix by taking dot product between the projected data
(from last step) and the transpose of the top k eigenvector matrix.
[5 pts]
14. Compute the reconstruction error between the mean centered data matrix X and
reconstructed data matrix (you may use the sklearn.metrics.mean_squared_error
function).
[5 pts]
15. Perform steps 11 – 14 for the following values of k: 10, 30, 50, 100, 500. For
each k, show the reconstructed image (use the matplotlib.pyplot imshow function
with the reconstructed data matrix for each k). With each reconstructed image
print the value of k and the reconstruction error.
[10 pts]
欢迎咨询51作业君