Q1 (30%)

Gradient descent - Logistic regression

In this question we are going to experiment with logistic regression. This exercise focuses on

the inner workings of gradient descent using a cross-entropy cost function, as covered in

class.

a) Using the Pima Indians data set, first separate a random 20% of your data instances for

validation. Then, apply a feature selection algorithm based on evaluating feature importance

using Pearson correlation (scipy documentation). Extract the top two most important features

based on this measure.
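
The split and the ranking in part a) could be sketched as below. This is only an illustration on synthetic data, not the Pima Indians data, and the helper names `split_train_validation` and `top_k_by_pearson` are hypothetical. It computes Pearson's r with `np.corrcoef`; `scipy.stats.pearsonr` returns the same coefficient together with a p-value. Ranking on the training portion only is an assumption made here to keep the validation rows untouched.

```python
import numpy as np

def split_train_validation(X, y, val_fraction=0.2, seed=0):
    """Shuffle row indices and hold out val_fraction of them for validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_val = int(round(val_fraction * len(X)))
    val_idx, train_idx = order[:n_val], order[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

def top_k_by_pearson(X, y, k=2):
    """Return the k column indices with the largest |Pearson r| against y."""
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    ranked = np.argsort(-np.abs(r))  # strongest correlation first
    return ranked[:k], r

# Synthetic stand-in data (NOT the Pima data): column 2 drives the label.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 4))
y_demo = (X_demo[:, 2] + 0.1 * rng.normal(size=200) > 0).astype(float)

X_tr, y_tr, X_val, y_val = split_train_validation(X_demo, y_demo)
top_features, r_values = top_k_by_pearson(X_tr, y_tr, k=2)
```

The absolute value is used because a strongly negative correlation is as informative as a strongly positive one.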

b) We want to train a logistic regression model to predict the target feature Outcome. It’s

important that no other external package is used (pandas, numpy are ok) for this question

part. We want to find the weights for the logistic regression using a handmade gradient descent

algorithm. We will use cross-entropy as the cost function, and the gradient of the logistic cross-entropy to

compute the weight update during gradient descent. It is OK to reuse as much as you need from

the code you developed for Assignment 1. Unlike in Assignment 1, we

are now using a random 20% of your data instances for validation.

Your function should be able to return the updated weights and bias after every iteration of the

gradient descent algorithm.

Your function should be defined as follows:

def LRGradDesc(data, target, weight_init, bias_init, learning_rate, max_iter):

And it should print lines as indicated below (note the last line with the weights):

Iteration 0: [initial_train cost], [train accuracy], [validation accuracy]

Iteration 1: [train cost after first iteration], [train accuracy after first iteration], [validation

accuracy after first iteration]

Iteration 2: [weights after second iteration], [train cost after second iteration], [train accuracy after

second iteration], [validation accuracy after second iteration]

…

Iteration max_iter: [weights after max_iter iteration], [train cost after max_iter iteration], [train

accuracy after max_iter iteration], [validation accuracy after max_iter iterations]

Final weights: [bias], [w_0], [w_1]

Note that you may want to print every 100 or every 1000 iterations if max_iter is a fairly large

number (but you shouldn’t run more iterations than indicated by max_iter).
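
As a rough illustration of the update described in part b), the sketch below shows one way to write the cross-entropy cost and a single batch gradient descent step with numpy only. The helper names `sigmoid`, `cross_entropy` and `gradient_step` are hypothetical and this is not the required `LRGradDesc` function; it assumes the cost is the mean cross-entropy over the training rows and uses synthetic data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(y, p, eps=1e-12):
    """Mean binary cross-entropy; eps guards against log(0)."""
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def gradient_step(X, y, w, b, learning_rate):
    """One batch gradient descent update for logistic regression."""
    p = sigmoid(X @ w + b)
    error = p - y                     # derivative of the cost w.r.t. the logit
    grad_w = X.T @ error / len(y)
    grad_b = np.mean(error)
    return w - learning_rate * grad_w, b - learning_rate * grad_b

# Synthetic two-feature data, only to exercise the update.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 2))
y_demo = (X_demo[:, 0] - X_demo[:, 1] > 0).astype(float)

w, b = np.zeros(2), 0.0
cost_before = cross_entropy(y_demo, sigmoid(X_demo @ w + b))
for _ in range(200):
    w, b = gradient_step(X_demo, y_demo, w, b, learning_rate=0.1)
cost_after = cross_entropy(y_demo, sigmoid(X_demo @ w + b))
```

With zero initial weights every prediction is 0.5, so the initial cost equals ln 2; it should then decrease as the updates proceed.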

c) Discuss how the choice of learning_rate affects the fitting of the model.
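
A toy illustration that may help frame the discussion in part c): the sketch below runs gradient descent on f(w) = w², not on the logistic cost, with three learning rates. The function name `descend` and the specific rates are assumptions chosen only to show slow convergence, fast convergence and divergence.

```python
def descend(learning_rate, steps=20, w=1.0):
    """Gradient descent on f(w) = w**2, whose gradient is 2*w."""
    for _ in range(steps):
        w = w - learning_rate * 2.0 * w
    return w

small = descend(0.01)   # shrinks by a factor 0.98 per step: converges slowly
good = descend(0.4)     # shrinks by a factor 0.2 per step: converges quickly
large = descend(1.1)    # multiplied by -1.2 per step: oscillates and diverges
```

The same qualitative pattern is worth looking for in the training cost of your own model, although the thresholds will differ.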

d) Compare your model with one using a machine learning library to compute logistic

regression.

e) Retrain your model using three features of your choice. Compare both models using an ROC

curve (you can use code from here to draw the ROC curve)
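
For reference, the points of an ROC curve can also be computed by hand, as in the minimal sketch below. The helpers `roc_points` and `auc_trapezoid` are hypothetical names, the four-sample data is made up, and the code assumes all scores are distinct (tied scores would need to be grouped).

```python
import numpy as np

def roc_points(y_true, scores):
    """Return (fpr, tpr) obtained by lowering the threshold one sample at a time."""
    order = np.argsort(-np.asarray(scores))   # highest score first
    y_sorted = np.asarray(y_true)[order]
    tps = np.cumsum(y_sorted == 1)            # true positives so far
    fps = np.cumsum(y_sorted == 0)            # false positives so far
    tpr = np.concatenate([[0.0], tps / tps[-1]])
    fpr = np.concatenate([[0.0], fps / fps[-1]])
    return fpr, tpr

def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_true, scores)
auc = auc_trapezoid(fpr, tpr)   # 0.75 for this toy example
```

Plotting `tpr` against `fpr` for each model on the same axes gives the comparison asked for in part e).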

Q2 (30%)

Multi-class classification using neural networks

In this question you will experiment with a neural network in the context of text classification,

where a document can belong to one out of several possible categories. The main goal for you

is to try different hyperparameters in a systematic manner so that you can propose a network

configuration that is properly justified. You will experiment with the Reuters dataset, which can

be loaded directly from Keras:

from keras.datasets import reuters

(train_data, train_labels), (test_data, test_labels) = reuters.load_data(num_words=10000)
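
The loaded documents are lists of integer word indices of varying length, so they need a fixed-size encoding before a dense network can use them. One common option, shown here only as a sketch, is a multi-hot vector per document and a one-hot vector per label. The helper names `vectorize_sequences` and `to_one_hot` are hypothetical, and the default `dimension=10000` is assumed to match `num_words=10000` above.

```python
import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    """Turn lists of word indices into fixed-length multi-hot rows."""
    results = np.zeros((len(sequences), dimension), dtype="float32")
    for i, sequence in enumerate(sequences):
        results[i, list(sequence)] = 1.0   # repeated words still give a single 1
    return results

def to_one_hot(labels, num_classes):
    """One-hot encode integer class labels."""
    labels = np.asarray(labels)
    encoded = np.zeros((len(labels), num_classes), dtype="float32")
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

# Toy input standing in for train_data / train_labels.
toy_docs = [[1, 3, 3], [0, 2]]
x_toy = vectorize_sequences(toy_docs, dimension=5)
y_toy = to_one_hot([2, 0], num_classes=3)
```

Other encodings (for example, an embedding layer over padded sequences) are equally valid; this choice is not prescribed by the assignment.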

a) Experiment with different hyper-parameters and report your best accuracy found. The most

important hyperparameters that you need to experiment with in this question part are: number of

layers, nodes per hidden layer, learning rate, and number of epochs.

b) Describe how your convergence changes when you vary the size of your mini-batch. A plot

showing cost in terms of number of epochs would be enough. Discuss the reasons for this.
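
To make the role of the mini-batch size concrete, the sketch below counts how many weight updates one epoch contains for different batch sizes. The generator `iterate_minibatches` is a hypothetical helper written with numpy only; it is not how Keras batches data internally.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled (X_batch, y_batch) pairs covering the data once (one epoch)."""
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)
rng = np.random.default_rng(0)

# Number of gradient updates per epoch for each batch size.
updates_per_epoch = {
    bs: sum(1 for _ in iterate_minibatches(X_demo, y_demo, bs, rng))
    for bs in (1, 4, 10)
}
```

Smaller batches give more updates per epoch, each computed from a noisier gradient estimate, which is one factor to consider when you interpret your cost curves.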

c) Experiment with different regularization options (e.g. L2 and dropout). You may need to make

your network larger if you don’t find much benefit from applying regularization.

Note: we recommend that you control your initialization parameters by means of a seed

https://keras.io/api/layers/initializers/.

Q3 (10%)

Computational graph (no code involved)

This question aims at checking your understanding of defining arbitrary network architectures

and computing any derivatives involved in optimization.

Consider a neural network with N input units, N output units, and K hidden units. The activations

are computed as follows:

where σ denotes the logistic function, applied elementwise. The cost involves a squared

difference with the target s (with a 0.5 factor) and a regularization term that accounts for the dot

product with respect to an external vector r. More concretely:

a) Draw the computation graph relating x, z, h, y, , , and .

b) Derive the backpropagation equations for computing the derivative of the cost with respect to W^(1). To make things simpler, you

may use σ’ to denote the derivative of the logistic function.
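
For reference, the logistic function and its derivative have the standard closed form below. This is a general identity, not part of the required derivation.

```latex
\sigma(z) = \frac{1}{1 + e^{-z}},
\qquad
\sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr)
```

Because σ is applied elementwise, σ’ is applied elementwise as well.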

Q4 (30%)

Tuning generalization

In this question you will construct a neural network to classify a large set of low resolution

images. Unlike in Q2, in this case we suggest a neural network for you to start experimenting

with, but we would like you to describe the behavior of the network as you modify certain

parameters. You will be reproducing some concepts mentioned during the lectures, such as the

one shown on slide 8, of the lecture on “Ensembles, regularization and feature selection” from

Week 4.

Use the CIFAR-100 dataset (available from Keras)

from keras.datasets import cifar100

(x_train_original, y_train_original), (x_test_original, y_test_original) = cifar100.load_data(label_mode='fine')

to train a neural network with two hidden layers using the ReLU activation function, with 500 and

200 hidden nodes, respectively. The output layer should be defined according to the nature of

the targets.

a) Generate a plot that shows average precision for training and test sets as a function of the

number of epochs. Indicate what a reasonable number of epochs should be.
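
The sketch below shows one possible reading of "average precision": the ranking-based metric (the sum of precision values weighted by the increase in recall), computed one-vs-rest per class and then averaged over classes. This interpretation is an assumption; if the course means plain precision averaged over classes, a different computation applies. The helper names are hypothetical and the code assumes distinct scores.

```python
import numpy as np

def average_precision(y_true, scores):
    """AP = sum_n (R_n - R_{n-1}) * P_n over the list ranked by score."""
    order = np.argsort(-np.asarray(scores))   # highest score first
    hits = np.asarray(y_true)[order]
    tp = np.cumsum(hits)
    precision = tp / np.arange(1, len(hits) + 1)
    # Recall only increases at positive items, each time by 1 / n_positives.
    return float(np.sum(precision * hits) / hits.sum())

def macro_average_precision(y_onehot, probs):
    """Mean of the one-vs-rest AP over the classes (columns)."""
    return float(np.mean([average_precision(y_onehot[:, c], probs[:, c])
                          for c in range(y_onehot.shape[1])]))

# Toy binary example with made-up scores.
ap = average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Here the positives are ranked first and third, so the result is (1 + 2/3) / 2.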

b) Generate a plot that shows average precision for training and test sets as a function of the

number of weights/parameters (# hidden nodes). For this question part, you will be modifying

the architecture that was given to you as a starting point.

c) Generate a plot that shows average precision for training and test sets as a function of the

number of instances in the training set. For this question part, you will be modifying your training

set. For instance, you can run 10 experiments where you first use a random 10% of the training

data, a second experiment where you use a random 20% of the training data, and so on until

you use the entire training set. Keep the network hyperparameters constant during your

experiments.
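
The ten experiments in part c) could draw their training subsets as sketched below. The helper `training_subsets` is a hypothetical name, and making the subsets nested (each larger subset contains the smaller ones) is a design choice assumed here, not a requirement. The value 50000 is the size of the CIFAR-100 training set.

```python
import numpy as np

def training_subsets(n_samples, fractions, seed=0):
    """Map each fraction to a random index subset of that size.

    All subsets are prefixes of one shuffled order, so they are nested.
    """
    order = np.random.default_rng(seed).permutation(n_samples)
    return {f: order[:int(round(f * n_samples))] for f in fractions}

fractions = [round(0.1 * i, 1) for i in range(1, 11)]   # 0.1, 0.2, ..., 1.0
subsets = training_subsets(n_samples=50000, fractions=fractions)
sizes = [len(subsets[f]) for f in fractions]
```

Each index array can be used to slice the training images and labels, for example `x_train_original[subsets[0.1]]`.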

d) Based on all your experiments above, define a network architecture and report accuracy and

average precision for all classes.

e) Can you improve test prediction performance by using an ensemble of neural networks?
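
One simple way to combine several networks, shown here only as a sketch, is to average their predicted class probabilities and take the most probable class. The function `ensemble_predict` is a hypothetical helper and the probabilities below are made up; other combination rules (for example, majority voting) are also possible.

```python
import numpy as np

def ensemble_predict(prob_list):
    """Average class probabilities from several models, then take the argmax."""
    mean_probs = np.mean(np.stack(prob_list, axis=0), axis=0)
    return mean_probs.argmax(axis=1), mean_probs

# Made-up outputs of three hypothetical models: 2 samples x 3 classes.
p1 = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
p2 = np.array([[0.4, 0.5, 0.1], [0.1, 0.3, 0.6]])
p3 = np.array([[0.5, 0.2, 0.3], [0.2, 0.3, 0.5]])
labels, mean_probs = ensemble_predict([p1, p2, p3])
```

In the second sample the first model alone would pick class 1, while the averaged prediction picks class 2.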

Submitting the assignment (REVISED)

Note that there will be four separate Assignment 2 entries on Brightspace, i.e. one for each

question (A2-Q1, A2-Q2, A2-Q3 and A2-Q4)

1. Your assignment as a single .ipynb file including your answers should be submitted for each

question before the deadline on Brightspace.

Use markdown syntax to format your answers.

2. You can submit multiple versions of your assignment. Only the last one will be marked. It is

recommended to upload a complete submission, even if you are still improving it, so that you

have something in the system if your computer fails for whatever reason.

3. IMPORTANT: PLEASE NAME YOUR PYTHON NOTEBOOK FILE AS:

Soto-Axel-Assignment-2-1.ipynb (for the first question of the second assignment)

A penalty applies if the format is not correct.

4. The markers will enter your marks and their overall feedback on Brightspace. If

there is any important feedback, it will be given to you; otherwise you will need to refer to

the model solutions.

Marking the assignment

Criteria and weights. Each criterion is marked by a letter grade. The overall mark is the

weighted average of the grade of each criterion.

For the experimental questions:

0.2 Clarity: All steps are clearly described. The origin of all code used is clearly stated. Markdown is

used effectively to format the answer to make it easier to read and grasp the main points. Links

have been added to all online resources used (markdown syntax is: [AnchorText](URL) ).

0.2 Justification: Parameter choices or processes are well justified.

0.2 Results: The results are complete. The results are presented in a manner that is easy to

understand. The answer is selective in the amount and diversity of the experimental results

presented.

Only key results that support the insights are presented. There is no need to present every

single experiment you carried out. Only the interesting results are presented, where the

behaviour of the ML model varies.

0.4 Insights: The insights obtained from the experimental results are clearly explained. The

insights are connected with the concepts discussed in the lectures.

The insights can also include statistical considerations (separate training-test data,

cross-validation, variance). Preliminary investigation of the statistical properties of the attributes

(e.g. histogram, mean, standard deviation) is included.

For the theoretical questions (Q3):

0.6 Correctness: Correctness of the answer. Explanation is clear and precise.

0.4 Neatness of explanation: Explanation is well written, well structured and easy to read. It

uses well defined and consistent notation.