
# Assignment 2 - Foundations of Machine Learning, CSCI3151, Dalhousie University

## Q1 (30%)

In this question we are going to experiment with logistic regression. This exercise focuses on the inner workings of gradient descent with a cross-entropy cost function, as covered in class.

a) Using the Pima Indians data set, first set aside a random 20% of your data instances for validation. Then apply a feature selection method that evaluates feature importance using the Pearson correlation (see the scipy documentation), and extract the two most important features according to this measure.
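A minimal sketch of Pearson-based feature ranking using `scipy.stats.pearsonr`. The DataFrame below is a synthetic stand-in for the Pima Indians data (the column names `f1`, `f2`, `f3` are made up), and the 20% validation split is omitted here:

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

def top_k_features(df, target_col, k=2):
    """Rank features by |Pearson r| with the target and return the top k."""
    scores = {}
    for col in df.columns:
        if col == target_col:
            continue
        r, _ = pearsonr(df[col], df[target_col])
        scores[col] = abs(r)
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Tiny synthetic demo: f1 tracks the target closely, f2 is pure noise,
# f3 is moderately (negatively) correlated.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
df = pd.DataFrame({
    "f1": y + 0.1 * rng.normal(size=200),
    "f2": rng.normal(size=200),
    "f3": -y + 0.5 * rng.normal(size=200),
    "Outcome": y,
})
print(top_k_features(df, "Outcome", k=2))  # strongest two by |r|
```

On the real data set the same ranking would be applied to the eight Pima features against `Outcome`, computed on the training portion only.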
b) We want to train a logistic regression model to predict the target feature Outcome. It is important that no other external package is used (pandas and numpy are OK) for this question part. We want to find the weights of the logistic regression using a hand-made gradient descent algorithm, with cross-entropy as the cost function and its gradient to compute the weight updates. It is OK to reuse as much as you need from the code you developed for Assignment 1. Differently from Assignment 1, we are now using a random 20% of the data instances for validation.

Your function should be able to return the updated weights and bias after every iteration of the gradient descent algorithm.

Your function should be defined as follows:
```python
def LRGradDesc(data, target, weight_init, bias_init, learning_rate, max_iter):
```

And it should print lines as indicated below (note the last line with the weights):

```
Iteration 0: [initial train cost], [train accuracy], [validation accuracy]
Iteration 1: [train cost after iteration 1], [train accuracy after iteration 1], [validation accuracy after iteration 1]
Iteration 2: [train cost after iteration 2], [train accuracy after iteration 2], [validation accuracy after iteration 2]
...
Iteration max_iter: [train cost after iteration max_iter], [train accuracy after iteration max_iter], [validation accuracy after iteration max_iter]
Final weights: [bias], [w_0], [w_1]
```

Note that you may want to print only every 100 or every 1000 iterations if max_iter is fairly large (but you should not run more iterations than indicated by max_iter).
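One possible sketch of the requested function, using only numpy. The optional validation and printing arguments go beyond the stated signature and are assumptions; the toy data at the end only demonstrates the call (real use would pass the two selected Pima features and the Outcome column):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def LRGradDesc(data, target, weight_init, bias_init, learning_rate, max_iter,
               val_data=None, val_target=None, print_every=100):
    """Batch gradient descent for logistic regression with the
    cross-entropy cost. Prints progress and returns (bias, weights)."""
    w = np.asarray(weight_init, dtype=float).copy()
    b = float(bias_init)
    n = len(target)
    for it in range(max_iter + 1):
        p = sigmoid(data @ w + b)      # predicted probabilities
        eps = 1e-12                    # guards against log(0)
        cost = -np.mean(target * np.log(p + eps)
                        + (1 - target) * np.log(1 - p + eps))
        if it % print_every == 0 or it == max_iter:
            line = f"Iteration {it}: {cost:.4f}, {np.mean((p >= 0.5) == target):.4f}"
            if val_data is not None:
                vp = sigmoid(val_data @ w + b)
                line += f", {np.mean((vp >= 0.5) == val_target):.4f}"
            print(line)
        if it == max_iter:
            break  # exactly max_iter weight updates
        # Gradient of the cross-entropy cost w.r.t. weights and bias
        w -= learning_rate * (data.T @ (p - target)) / n
        b -= learning_rate * np.mean(p - target)
    print("Final weights: " + ", ".join(str(v) for v in [b, *w]))
    return b, w

# Toy linearly separable data just to show the call.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
bias, weights = LRGradDesc(X, y, np.zeros(2), 0.0, 0.5, 500, print_every=250)
```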

c) Discuss how the choice of learning_rate affects the fitting of the model.
d) Compare your model against a logistic regression model trained with a machine learning library.

e) Retrain your model using three features of your choice. Compare both models using an ROC curve (you can use code from here to draw the ROC curve).
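If the linked ROC code is unavailable, the curve points for part (e) can also be computed by hand from the predicted probabilities; a minimal sketch:

```python
import numpy as np

def roc_points(scores, labels):
    """Sweep the decision threshold from high to low and accumulate
    true/false positive rates."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    labels = np.asarray(labels)[order]
    tpr = np.cumsum(labels) / labels.sum()
    fpr = np.cumsum(1 - labels) / (len(labels) - labels.sum())
    return np.concatenate(([0.0], fpr)), np.concatenate(([0.0], tpr))

def auc(fpr, tpr):
    """Area under the curve via the trapezoidal rule."""
    return float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2))

# A perfect ranker places every positive ahead of every negative.
fpr, tpr = roc_points([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
print(auc(fpr, tpr))  # 1.0
```

The `(fpr, tpr)` arrays can then be plotted directly to compare the two-feature and three-feature models on the validation split.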

## Q2 (30%)

Multi-class classification using neural networks

In this question you will experiment with a neural network in the context of text classification, where a document can belong to one out of several possible categories. Your main goal is to try different hyperparameters in a systematic manner so that you can propose a properly justified network configuration. You will experiment with the Reuters dataset, which can be loaded as follows:

```python
from keras.datasets import reuters

(train_data, train_labels), (test_data, test_labels) = reuters.load_data(num_words=10000)
```
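The loaded `train_data` contains variable-length lists of word indices, while a dense network needs fixed-size inputs. A common preprocessing step is multi-hot encoding; the mini-batch below is a made-up stand-in for the real Reuters data:

```python
import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    """Multi-hot encode: result[i, j] = 1 iff word index j occurs in sequence i."""
    results = np.zeros((len(sequences), dimension))
    for i, seq in enumerate(sequences):
        results[i, seq] = 1.0  # fancy indexing sets every listed index
    return results

# Stand-in for two Reuters documents (lists of word indices).
batch = [[3, 5, 5, 7], [2, 9999]]
x = vectorize_sequences(batch)
print(x.shape)  # (2, 10000)
```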

a) Experiment with different hyperparameters and report the best accuracy you find. The most important hyperparameters to experiment with in this question part are: number of layers, nodes per hidden layer, learning rate, and number of epochs.
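One way to make the search systematic is to enumerate a full grid of configurations up front and train one model per configuration. The specific values below are illustrative assumptions, and each resulting dict would be passed to a model-building function (not shown):

```python
import itertools

# Hypothetical search space; the values are examples, not prescribed settings.
grid = {
    "layers": [1, 2, 3],
    "nodes": [32, 64, 128],
    "learning_rate": [1e-2, 1e-3],
    "epochs": [10, 20],
}

# Cartesian product of all hyperparameter values, one dict per configuration.
configs = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]
print(len(configs))  # 3 * 3 * 2 * 2 = 36
print(configs[0])
```

Recording the validation accuracy per configuration makes it straightforward to justify the proposed network.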
b) Describe how convergence changes when you vary the size of your mini-batch. A plot showing cost as a function of the number of epochs is enough. Discuss the reasons for this behaviour.
c) Experiment with different regularization options (e.g. L2 and dropout). You may need to make your network larger if you don't see much benefit from applying regularization.

Note: we recommend controlling your initialization by means of a seed; see [Keras initializers](https://keras.io/api/layers/initializers/).

## Q3 (10%)

Computational graph (no code involved)

This question checks your understanding of defining arbitrary network architectures and computing any derivatives involved in optimization.

Consider a neural network with N input units, N output units, and K hidden units. The activations are computed as follows:

[activation equations not reproduced in this copy]

where σ denotes the logistic function, applied elementwise. The cost involves a squared difference with the target s (with a 0.5 factor) and a regularization term that accounts for the dot product with respect to an external vector r. More concretely:

[cost equation not reproduced in this copy]

a) Draw the computation graph relating x, z, h, y, and the remaining quantities that appear in the cost.
b) Derive the backpropagation equations for computing the derivative of the cost with respect to W^(1). To make things simpler, you may use σ' to denote the derivative of the logistic function.
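Since the original equations did not survive in this copy, the following is only an illustrative sketch of the kind of chain-rule derivation part (b) expects. The architecture and cost assumed here are a guess at one common form of this exercise, not the original assignment's equations:

```latex
% Assumed architecture (an assumption, since the original equations are missing):
%   z = W^{(1)} x + b^{(1)}, \quad h = \sigma(z), \quad y = W^{(2)} h + b^{(2)}
% Assumed cost with squared difference to target s and dot-product regularizer r:
%   E = \tfrac{1}{2} \lVert y - s \rVert^2 + r^{\top} y
\begin{aligned}
\frac{\partial E}{\partial y} &= (y - s) + r \\
\frac{\partial E}{\partial z} &= \left( W^{(2)\top} \frac{\partial E}{\partial y} \right) \circ \sigma'(z) \\
\frac{\partial E}{\partial W^{(1)}} &= \frac{\partial E}{\partial z} \, x^{\top}
\end{aligned}
```

Whatever the exact equations, the derivation proceeds the same way: differentiate the cost with respect to y, push the result back through each intermediate quantity, and multiply by the local input at the layer whose weights are being differentiated.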

## Q4 (30%)

Tuning generalization

In this question you will construct a neural network to classify a large set of low-resolution images. Differently from Q2, in this case we suggest a neural network to start experimenting with, but we would like you to describe the behaviour of the network as you modify certain parameters. You will be reproducing some concepts mentioned during the lectures, such as the one shown on slide 8 of the lecture on "Ensembles, regularization and feature selection" from Week 4.

Use the CIFAR-100 dataset (available from Keras)

```python
from keras.datasets import cifar100

(x_train_original, y_train_original), (x_test_original, y_test_original) = cifar100.load_data()
```

to train a neural network with two hidden layers using the ReLU activation function, with 500 and 200 hidden nodes, respectively. The output layer should be defined according to the nature of the targets.
a) Generate a plot that shows average precision for training and test sets as a function of the
number of epochs. Indicate what a reasonable number of epochs should be.
b) Generate a plot that shows average precision for training and test sets as a function of the
number of weights/parameters (# hidden nodes). For this question part, you will be modifying
the architecture that was given to you as a starting point.
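For the x-axis of the part (b) plot, the number of weights/parameters can be computed directly from the layer widths. A small sketch, assuming flattened 32x32x3 inputs, the suggested 500/200 hidden layers, and 100 output classes (one per fine label):

```python
def dense_param_count(layer_sizes):
    """Total weights + biases of a fully connected net, given the list of
    layer widths from input to output."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Flattened CIFAR-100 input, the suggested hidden layers, and the output layer.
print(dense_param_count([32 * 32 * 3, 500, 200, 100]))  # 1656800
```

Varying the hidden widths in this call gives the parameter count for each architecture you try.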
c) Generate a plot that shows average precision for training and test sets as a function of the number of instances in the training set. For this question part, you will be modifying your training set: for instance, you can run 10 experiments, first using a random 10% of the training data, then a random 20%, and so on until you use the entire training set. Keep the network hyperparameters constant across your experiments.
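The per-experiment subsets for part (c) can be drawn as in this sketch; the arrays below are stand-ins for x_train_original and y_train_original:

```python
import numpy as np

def random_fraction(x, y, fraction, seed=0):
    """Return a random `fraction` of the training pairs, sampled without
    replacement so no instance repeats within one subset."""
    rng = np.random.default_rng(seed)
    n = int(len(x) * fraction)
    idx = rng.choice(len(x), size=n, replace=False)
    return x[idx], y[idx]

x = np.arange(1000).reshape(-1, 1)   # stand-in training inputs
y = np.arange(1000) % 100            # stand-in labels
for frac in [0.1, 0.2, 0.5, 1.0]:
    xs, ys = random_fraction(x, y, frac)
    print(frac, len(xs))
```

Fixing the seed per experiment keeps the subsets reproducible while the hyperparameters stay constant.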
d) Based on all your experiments above, define a network architecture and report accuracy and
average precision for all classes.
e) Can you improve test prediction performance by using an ensemble of neural networks?
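One simple ensemble for part (e) averages the predicted class probabilities of several independently trained networks and takes the argmax; a minimal sketch with made-up model outputs:

```python
import numpy as np

def ensemble_predict(prob_list):
    """Average the class-probability outputs of several models and
    pick the class with highest mean probability per instance."""
    return np.mean(prob_list, axis=0).argmax(axis=1)

# Three hypothetical models scoring 2 instances over 3 classes.
m1 = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
m2 = np.array([[0.5, 0.4, 0.1], [0.1, 0.2, 0.7]])
m3 = np.array([[0.4, 0.5, 0.1], [0.2, 0.2, 0.6]])
print(ensemble_predict([m1, m2, m3]))  # [0 2]
```

Training the member networks with different seeds (or different random subsets, as in part c) is what gives the ensemble its diversity.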

## Submitting the assignment (REVISED)

Note that you will have four separate Assignment 2 entries on Brightspace, i.e. one for each question (A2-Q1, A2-Q2, A2-Q3 and A2-Q4).

1. Submit each question before its deadline on Brightspace.
2. You can submit multiple editions of your assignment. Only the last one will be marked. It is recommended to upload a complete submission, even if you are still improving it, so that you have something in the system if your computer fails for whatever reason.
3. Name your file Lastname-Firstname-Assignment-N-Q.ipynb, for example Soto-Axel-Assignment-2-1.ipynb (for the first question of the second assignment). A penalty applies if the format is not correct.
4. The markers will enter your marks and their overall feedback on Brightspace. Any important feedback will be given to you; otherwise, refer to the model solutions.

## Marking the assignment

Criteria and weights: each criterion is marked by a letter grade. The overall mark is the weighted average of the grades of the criteria.
For the experimental questions:

- 0.2 Clarity: All steps are clearly described. The origin of all code used is clearly stated. Markdown is used effectively to format the answer, making it easier to read and grasp the main points. Links have been added to all online resources used (the markdown syntax is [AnchorText](URL)).
- 0.2 Justification: Parameter choices and processes are well justified.
- 0.2 Results: The results are complete and presented in a manner that is easy to understand. The answer is selective in the amount and diversity of experimental results presented: only key results that support the insights are shown, where the behaviour of the ML model varies; there is no need to present every single experiment you carried out.
- 0.4 Insights: The insights obtained from the experimental results are clearly explained and connected with the concepts discussed in the lectures. The insights can also include statistical considerations (separate training-test data, cross-validation, variance). A preliminary investigation of the statistical properties of the attributes (e.g. histogram, mean, standard deviation) is included.

For the theoretical questions (Q3):

- 0.6 Correctness: Correctness of the answer. The explanation is clear and precise.
- 0.4 Neatness of explanation: The explanation is well written, well structured and easy to read. It uses well defined and consistent notation.