Q1 (30%)

Gradient descent - Linear regression

In this question we are going to learn first-hand how gradient descent works in the context of a concrete house-pricing model. Note that we are not using a validation strategy here (a held-out test set or cross-validation), which would otherwise generally be recommended. This exercise focuses on the inner workings of gradient descent applied to the quadratic cost function covered in the Linear regression class. It is important that the code provided is well documented and that no external packages are used beyond pandas and numpy.

a) Using the house price prediction dataset (described here and downloadable from Brightspace as housePriceKaggleTrain) and two features, namely 1stFlrSF and 2ndFlrSF, we want to fit a linear regression model to predict SalePrice. We will fit this model by applying a hand-made gradient descent algorithm, using the basic cost function learned in the lecture.

The model should be as follows:

p: sale price (SalePrice)
t: first floor square feet (1stFlrSF)
s: second floor square feet (2ndFlrSF)

p = x1 · t + x2 · s + b

Your function should be able to return the updated weights x1, x2 and b after every iteration of the gradient descent algorithm.

Your function should be defined as follows:

def LRGradDesc(data, target, init_x1, init_x2, init_b, learning_rate_w, learning_rate_b, max_iter):

Note that there is a slight variation from the approach proposed in the lecture, as here we use two different learning rates (one for the weight coefficients and another for the bias).

And it should print lines as indicated below:

Iteration 0: init_x1, init_x2, init_b, initial_cost
Iteration 1: [x1 after first iteration], [x2 after first iteration], [b after first iteration], [cost after first iteration]
Iteration 2: [x1 after second iteration], [x2 after second iteration], [b after second iteration], [cost after second iteration]
…
Iteration max_iter: [x1 after max_iter iterations], [x2 after max_iter iterations], [b after max_iter iterations], [cost after max_iter iterations]

Note that you may want to print only every 100 or every 1000 iterations if max_iter is fairly large (but you should not run more iterations than indicated by max_iter).
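As a reference for the expected shape of the solution (a minimal sketch, not the required implementation), such a function could look as follows, assuming `data` is an n×2 numpy array holding the 1stFlrSF and 2ndFlrSF columns:

```python
import numpy as np

def LRGradDesc(data, target, init_x1, init_x2, init_b,
               learning_rate_w, learning_rate_b, max_iter):
    """Gradient descent for p = x1*t + x2*s + b with the quadratic cost."""
    t, s = data[:, 0], data[:, 1]      # 1stFlrSF and 2ndFlrSF columns
    x1, x2, b = init_x1, init_x2, init_b
    for i in range(max_iter + 1):
        pred = x1 * t + x2 * s + b
        err = pred - target
        cost = (err ** 2).mean() / 2   # quadratic cost from the lecture
        if i % 100 == 0 or i == max_iter:   # avoid flooding the output
            print(f"Iteration {i}: {x1}, {x2}, {b}, {cost}")
        if i == max_iter:
            break                      # no more than max_iter updates
        # partial derivatives of the cost w.r.t. each parameter
        grad_x1 = (err * t).mean()
        grad_x2 = (err * s).mean()
        grad_b = err.mean()
        x1 -= learning_rate_w * grad_x1
        x2 -= learning_rate_w * grad_x2
        b -= learning_rate_b * grad_b
    return x1, x2, b
```

Note how the weights and the bias use their separate learning rates, matching the two-rate variation described above.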

b) Compare this model with a solution computed in closed form or obtained by using a machine learning library to compute the linear regression model.
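For reference, the closed-form route is the normal equation; the sketch below uses numpy's least-squares solver, which is numerically safer than forming the inverse explicitly (the function name closed_form_lr is illustrative):

```python
import numpy as np

def closed_form_lr(data, target):
    """Ordinary least squares: solve X w = y in the least-squares sense."""
    X = np.column_stack([data, np.ones(len(target))])  # append a bias column
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    x1, x2, b = coef
    return x1, x2, b
```

The weights returned here should closely match what gradient descent converges to, given suitable learning rates and enough iterations.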

c) Discuss how the choice of learning rate affects the fitting of the model.

HINT: The learning rates for the weight coefficients should be about 1000 to 100000 times smaller than the one for the bias. In part c) you are encouraged to experiment with different learning rates.

Q2 (20%)

Classification using two standard techniques

In this question you will experiment with two traditional classification models on the Pima Indians dataset. The models are decision trees and SVM. The objective of this question is to demonstrate your ability to compare different machine learning models and, where possible, derive a conclusion.

a) For the SVM classifier, use the default parameters and 5-fold cross-validation, and report the overall accuracy and confusion matrix, as well as the accuracy and confusion matrix for each fold. What is the standard deviation of the accuracy over the folds?
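A possible structure for this part (a sketch, assuming the Pima Indians features and labels have already been loaded into arrays X and y elsewhere; the function name is illustrative):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, confusion_matrix

def svm_cv_report(X, y, n_splits=5):
    """5-fold CV with a default-parameter SVM: per-fold accuracies and
    an overall confusion matrix built from out-of-fold predictions."""
    X, y = np.asarray(X), np.asarray(y)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    fold_acc, all_true, all_pred = [], [], []
    for train_idx, test_idx in skf.split(X, y):
        clf = SVC()                                  # default parameters
        clf.fit(X[train_idx], y[train_idx])
        pred = clf.predict(X[test_idx])
        fold_acc.append(accuracy_score(y[test_idx], pred))
        print(confusion_matrix(y[test_idx], pred))   # per-fold matrix
        all_true.extend(y[test_idx])
        all_pred.extend(pred)
    return np.array(fold_acc), confusion_matrix(all_true, all_pred)
```

The per-fold accuracies array makes the requested standard deviation a one-liner (fold_acc.std()).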

b) For the decision tree classifier, experiment with different maximum depths of the tree (in the range from 2 to 8). Use the default values for the remaining parameters. Evaluate each parameter selection using 5-fold cross-validation, and report the overall accuracy and the confusion matrix. Only report the interesting parameter settings.
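The depth sweep could be organised along these lines (again a sketch with X and y loaded elsewhere; the fixed random_state is only for reproducibility):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import accuracy_score, confusion_matrix

def tree_depth_sweep(X, y, depths=range(2, 9)):
    """Evaluate each max_depth with 5-fold CV; other parameters default."""
    results = {}
    for d in depths:
        clf = DecisionTreeClassifier(max_depth=d, random_state=0)
        pred = cross_val_predict(clf, X, y, cv=5)  # out-of-fold predictions
        results[d] = (accuracy_score(y, pred), confusion_matrix(y, pred))
    return results
```

From the resulting dictionary you can pick out only the depths where the behaviour actually changes, as the question asks.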

c) Summarize your findings from (a) and (b). What can you conclude from these two methods?

Q3 (20%)

Polynomial regression

This question aims at applying polynomial regression to a large dataset and deciding which set of hyperparameters best applies to this scenario.

a) Use the California Housing dataset from scikit-learn and apply polynomial regression with the full set of features (no interactions). Experiment with polynomials of different degrees. Compare their performance with each other using a fixed train and validation partition (top 80% for training and the remaining 20% for validation). Discuss the interpretation of the results. Use visualizations of appropriate quantities to make sense of the results and support your interpretation.
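One way to structure the sweep is to build per-feature powers by hand, which keeps interaction terms out as required (a sketch; loading the dataset via scikit-learn's fetch_california_housing is assumed to happen elsewhere, with X holding the features and y the targets):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def poly_features(X, degree):
    """Powers x, x^2, ..., x^degree of each feature; no cross-feature terms."""
    return np.hstack([X ** d for d in range(1, degree + 1)])

def poly_degree_sweep(X, y, degrees=(1, 2, 3)):
    """Validation MSE per degree on a fixed top-80%/bottom-20% partition."""
    split = int(0.8 * len(y))            # top 80% of rows for training
    scores = {}
    for d in degrees:
        Xp = poly_features(X, d)
        model = LinearRegression().fit(Xp[:split], y[:split])
        scores[d] = mean_squared_error(y[split:], model.predict(Xp[split:]))
    return scores
```

Plotting the validation error in `scores` against the degree is one of the visualizations that can support the discussion of under- and over-fitting.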

Q4 (30%)

k-Nearest Neighbors

In this question you will develop a kNN algorithm to be used for the automatic recognition of handwritten digits. You are allowed to use libraries for data manipulation and math operations (e.g. numpy, pandas) but no machine learning library to compute k-nearest neighbors (e.g. scikit-learn is not ok). If in doubt, please ask.

a) Write a k-NN algorithm that receives as input labeled training data and a set of unlabeled instances. The algorithm returns the most likely label (i.e. a digit) for each unlabeled instance. The algorithm should implement the Minkowski distance. The value of k (number of neighbors) and the parameter p are hyperparameters of the method that will also be passed as input.

Your function should be defined as follows:

def kNN(train_x, train_y, test_x, k, p):

The code should return a list or array of predictions for all the instances in test_x.
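A numpy-only sketch of such a function (tie-breaking toward the smallest label via np.unique is one possible choice, not a requirement):

```python
import numpy as np

def kNN(train_x, train_y, test_x, k, p):
    """k-nearest neighbors with the Minkowski distance of order p."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y)
    preds = []
    for x in np.asarray(test_x, dtype=float):
        # Minkowski distance from x to every training instance
        dist = (np.abs(train_x - x) ** p).sum(axis=1) ** (1.0 / p)
        nearest = np.argsort(dist)[:k]           # indices of the k closest
        labels, counts = np.unique(train_y[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])  # majority vote
    return preds
```

Note that p = 2 recovers the Euclidean distance and p = 1 the Manhattan distance, which is a useful sanity check when testing.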

You will test your code using a reduced version of the MNIST database. The original MNIST dataset is a database of 60,000 handwritten digits for training and 10,000 for testing, where each digit is represented by a 784-dimensional vector. The reduced version of this dataset can be found on Brightspace as reducedMNIST.zip.

b) Report the classification error and confusion matrix you obtain for different combinations of p and k. Remember that you must only use the “test_labels.csv” file to compare your predictions against. In addition, critically discuss your results as you vary p and k.
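Since scikit-learn is not allowed here, the confusion matrix and error rate can be computed by hand; a possible helper (name and signature illustrative), to be fed the predictions from your kNN function and the labels from test_labels.csv:

```python
import numpy as np

def confusion_and_error(y_true, y_pred, n_classes=10):
    """Confusion matrix (rows = true digit, cols = predicted) and error rate."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, pr in zip(y_true, y_pred):
        cm[t, pr] += 1                 # count each (true, predicted) pair
    error = float((y_true != y_pred).mean())
    return cm, error
```

Calling this once per (p, k) combination gives the numbers the report asks for.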

c) What strategy would you follow to choose the parameters (k and p) for your final model if you did not have the labels for the test set?

Submitting the assignment

1. Your assignment, as a single .ipynb file including your answers, should be submitted before the deadline on Brightspace. Use markdown syntax to format your answers and to clearly indicate which question is being answered.

2. You can submit multiple versions of your assignment. Only the last one will be marked. It is recommended to upload a complete submission, even if you are still improving it, so that you have something in the system if your computer fails for whatever reason.

3. IMPORTANT: PLEASE NAME YOUR PYTHON NOTEBOOK FILE AS: Soto-Axel-Assignment-1.ipynb. A penalty applies if the format is not correct.

4. The markers will enter your marks and their overall feedback on Brightspace, and they will upload your Python notebook file with comments on specific cells, added as a new markdown cell below the cell being commented on.

Marking the assignment

Criteria and weights. Each criterion is marked by a letter grade. The overall mark is the weighted average of the grades of each criterion.

0.2 Clarity: All steps are clearly described. The origin of all code used is clearly stated. Markdown is used effectively to format the answer, making it easier to read and grasp the main points. Links have been added to all online resources used (markdown syntax is: [AnchorText](URL)).

0.2 Justification: Parameter choices or processes are well justified.

0.2 Results: The results are complete. The results are presented in a manner that is easy to understand. The answer is selective in the amount and diversity of the experimental results presented. Only key results that support the insights are presented; there is no need to present every single experiment you carried out. Only the interesting results are presented, where the behaviour of the ML model varies.

0.4 Insights: The insights obtained from the experimental results are clearly explained. The insights are connected with the concepts discussed in the lectures. The insights can also include statistical considerations (separate training-test data, cross-validation, variance). Preliminary investigation of the statistical properties of the attributes (e.g. histogram, mean, standard deviation) is included.