CS 486/686 Assignment 3 Winter 2022 (135 marks)
Blake VanBerlo
Due Date: 11:59 PM ET on Wednesday, March 23, 2022

Changes

• v1.1: Small changes to hyperparameters in Q2.2a and Q2.2b. Fixed some typos.
• v1.2: Fixed typos in function names.
• v1.3: Added instructions to calculate the average cross entropy loss.

Academic Integrity Statement

If your written submission on Learn does not include this academic integrity statement with your signature (typed name), we will deduct 5 marks from your final assignment mark.

I declare the following statements to be true:

• The work I submit here is entirely my own.
• I have not shared and will not share any of my code with anyone at any point.
• I have not posted and will not post my code on any public or private forum or website.
• I have not discussed and will not discuss the contents of this assessment with anyone at any point.
• I have not posted and will not post the contents of this assessment and its solutions on any public or private forum or website.
• I will not search for assessment solutions online.
• I am aware that misconduct related to assessments can result in significant penalties, possibly including failure in the course and suspension. This is covered in Policy 71: https://uwaterloo.ca/secretariat/policies-procedures-guidelines/policy-71.

Failure to accept the integrity policy will result in your assignment not being graded. By typing or writing my full legal name below, I confirm that I have read and understood the academic integrity statement above.

Instructions

• Submit any written solutions in a file named writeup.pdf to the A3 Dropbox on Learn. If your written submission on Learn does not contain one file named writeup.pdf, we will deduct 5 marks from your final assignment mark.
• Submit any code to Marmoset at https://marmoset.student.cs.uwaterloo.ca/. Grades for the programming component are determined by the unit tests in the "Assignment 3 (Final)" project on Marmoset. The "Assignment 3 (Week 1)" and "Assignment 3 (Week 2)" projects contain ungraded public test suites meant to help you debug, and they are only available temporarily.
• No late assignment will be accepted. This assignment is to be done individually.
• I strongly encourage you to complete your write-up in LaTeX, using this source file. If you do, please replace the author with your name and student number in your submission. Please also remove the due date, the Instructions section, and the Learning goals section. Thanks!
• Lead TAs:
  – Question 1: Xuejun Du ([email protected])
  – Question 2: Sharhad Bashar ([email protected])

The TAs' office hours will be scheduled on MS Teams.

Learning goals

Decision Trees
• Compute the entropy of a probability distribution.
• Trace the execution of the algorithm for learning a decision tree.
• Determine valid splits for real-valued features.
• Apply overfitting prevention strategies for decision trees.

Neural networks
• Implement a multilayer perceptron.
• Implement the backpropagation algorithm, including the forward and backward passes.
• Understand and interpret performance metrics in supervised learning.
1 Decision Trees (35 marks)

1. Recall that the entropy of a probability distribution over k outcomes c_1, c_2, ..., c_k is:

   I(P(c_1), P(c_2), \ldots, P(c_k)) = -\sum_{i=1}^{k} P(c_i) \log_2(P(c_i))

   In Lecture 13, we learned that the maximum entropy probability distribution over 2 outcomes is ⟨1/2, 1/2⟩. Interestingly, the maximum entropy probability distribution over n outcomes is the discrete uniform distribution ⟨1/n, 1/n, ..., 1/n⟩.

   Show that ⟨1/3, 1/3, 1/3⟩ is the maximum entropy probability distribution over 3 outcomes.

   Hint: Define a general distribution over 3 outcomes as ⟨p, q, 1 − p − q⟩, where 0 ≤ p, q ≤ 1. Let H(p, q) = I(p, q, 1 − p − q), where I is the entropy as defined in lecture. The maximum of H occurs when ∇H = ⟨∂H/∂p, ∂H/∂q⟩ = ⟨0, 0⟩.

   Marking Scheme: (10 marks)
   • (8 marks) Proof is correct.
   • (2 marks) Proof is clear and easy to understand.
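The following Python snippet is not part of the required solution; it is a minimal numerical illustration of the entropy definition above (the function name entropy is ours, not something provided with the assignment). Evaluating I for a few distributions over 3 outcomes can serve as a sanity check on your analytical argument.

import numpy as np

def entropy(dist):
    """Entropy (in bits) of a discrete distribution; 0 * log2(0) is treated as 0."""
    p = np.asarray(dist, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# The uniform distribution attains the largest value, log2(3) ≈ 1.585:
print(entropy([1/3, 1/3, 1/3]))    # ≈ 1.5850
print(entropy([1/2, 1/4, 1/4]))    # 1.5
print(entropy([0.9, 0.05, 0.05]))  # ≈ 0.5690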
2. Suppose that the computer science department would like to predict whether a student will pass CS486. Professor X thinks that the following features may be sufficient to predict success in CS486:

   • Calc: Whether or not the student took an introductory calculus course in the past. The domain is {true, false}.
   • Algo: Whether or not the student took an algorithms and data structures course. The domain is {true, false}.
   • Python: Whether or not the student has experience programming in Python. The domain is {true, false}.
   • Avg: The student's grade average in the semester before taking CS486 (expressed as a percentage).

   The target variable is whether or not the student passed CS486 (i.e. "Passed") and its domain is {true, false}.

   Professor X would like to fit a decision tree to predict whether a student will pass CS486. They extract information from 20 students who took CS486 in the past, comprising the training set given below. (This dataset is fictional.)

   Student  Calc   Algo   Python  Avg   Passed
   1        false  false  false   54.8  false
   2        true   true   false   63.4  false
   3        true   false  false   68.7  false
   4        true   true   false   71.3  true
   5        false  true   false   73.2  true
   6        true   false  false   73.6  false
   7        true   false  true    75.4  false
   8        false  true   true    80.0  true
   9        false  true   true    80.5  true
   10       true   true   false   84.5  true
   11       true   true   true    85.6  true
   12       false  true   true    86.0  true
   13       true   true   true    88.4  true
   14       false  true   true    89.0  true
   15       false  true   true    91.0  true
   16       true   true   false   92.2  true
   17       true   true   false   93.0  true
   18       true   true   true    93.2  true
   19       true   true   true    95.3  true
   20       true   true   true    99.0  true

   Training Set

   Using the decision tree learner algorithm presented in class, fit a decision tree for this training set. Select the next feature to test based on maximum information gain among the remaining features. Remember that discrete features can only appear once on any path in the tree. For real-valued features, consider each possible split as a binary feature. For example, if you have already tested Calc and Python on the current branch, then find the split points for Avg for the examples in the branch. If the candidate split points for Avg are x and y, then you must test the following 3 features: Algo, Avg ≥ x, and Avg ≥ y.

   Show all of your steps and calculations. You may round all of your calculations to 4 decimal places. You may also abbreviate entropy calculations with I(p0, p1), where p0 and p1 are probabilities. Be sure to clearly show your final decision tree.

   Marking Scheme: (20 marks)
   • (4 marks) Correct feature selection
   • (4 marks) Correct information gain calculations
   • (4 marks) Correct splitting of Avg
   • (4 marks) Correct final decision tree
   • (4 marks) All calculations are shown
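If you want to double-check your hand calculations for this question, a small script along the following lines can help. It is not required, the helper names are ours, and the split-point rule shown (midpoints between consecutive Avg values whose labels differ) is only one common convention; use whichever convention was presented in lecture.

import numpy as np

def entropy(labels):
    """Entropy (in bits) of a boolean label array."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0.0
    p = labels.mean()
    return float(-sum(q * np.log2(q) for q in (p, 1 - p) if q > 0))

def candidate_splits(avg, passed):
    """Candidate split points for Avg: midpoints between consecutive values with differing labels."""
    order = np.argsort(avg)
    avg, passed = np.asarray(avg)[order], np.asarray(passed)[order]
    return [(avg[i] + avg[i + 1]) / 2
            for i in range(len(avg) - 1) if passed[i] != passed[i + 1]]

def information_gain(feature, labels):
    """Information gain of a binary feature (e.g. Algo, or Avg >= x) on boolean labels."""
    feature, labels = np.asarray(feature), np.asarray(labels)
    remainder = sum((feature == v).mean() * entropy(labels[feature == v]) for v in (True, False))
    return entropy(labels) - remainder

For instance, information_gain(np.asarray(avg) >= x, passed) gives the gain of the binary feature Avg ≥ x on whichever subset of examples you pass in.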
3. Professor X collects information from 10 separate students to create the following test set:

   Student  Calc   Algo   Python  Avg   Passed
   21       true   false  false   58.1  false
   22       true   false  true    64.2  true
   23       false  false  true    71.1  false
   24       false  true   true    72.0  true
   25       true   true   false   73.3  true
   26       true   true   false   78.6  true
   27       true   false  true    80.1  true
   28       true   true   true    86.0  true
   29       true   true   true    92.9  true
   30       true   true   true    97.2  true

   Test Set

   List the predictions of your decision tree from the previous question for each example in the test set. What is the error on the test set? What is the accuracy on the test set?

   Marking Scheme: (5 marks)
   • (3 marks) Correct predictions
   • (1 mark) Correct error
   • (1 mark) Correct accuracy

2 Neural Networks for Classification and Regression (100 marks)

In this part of the assignment, you will implement a feedforward neural network from scratch. Additionally, you will implement multiple activation functions, loss functions, and performance metrics. Lastly, you will train a neural network model to solve a classification problem and a regression problem.

2.1 Bank Note Forgery - A Classification Problem

The classification problem we will examine is the prediction of whether or not a bank note is forged. The labelled dataset included in the assignment was downloaded from the UCI Machine Learning Repository. The target y ∈ {0, 1} is a binary variable, where 0 and 1 refer to fake and real respectively. The features are all real-valued. They are listed below:

• Variance of the transformed image of the bank note
• Skewness of the transformed image of the bank note
• Curtosis of the transformed image of the bank note
• Entropy of the image

2.2 Red Wine Quality - A Regression Problem

The task is to predict the quality of red wine from northern Portugal, given some physical characteristics of the wine. The target y ∈ [0, 10] is a continuous variable, where 10 is the best possible wine, according to human tasters. Again, this dataset was downloaded from the UCI Machine Learning Repository. The features are all real-valued. They are listed below:

• Fixed acidity
• Volatile acidity
• Citric acid
• Residual sugar
• Chlorides
• Free sulfur dioxide
• Total sulfur dioxide
• Density
• pH
• Sulphates
• Alcohol
2.3 Training a Neural Network

In Lecture 14, you learned how to train a neural network using the backpropagation algorithm. In this assignment, you will apply the forward and backward pass to the entire dataset simultaneously (i.e. batch gradient descent). As a result, your forward and backward passes will manipulate tensors, where the first dimension is the number of examples in the training set, n. When updating an individual weight W^(ℓ)_{i,j}, you will need to find the average gradient ∂E/∂W^(ℓ)_{i,j} across all examples in the training set to apply the update.

Algorithm 1 gives the training algorithm in terms of functions that you will implement in this assignment. Further details can be found in the documentation for each function in the provided source code.

Algorithm 1 Neural network training

Require: η > 0  ▷ Learning rate
Require: n_epochs ∈ N+  ▷ Number of epochs
Require: X ∈ R^{n×f}  ▷ Training examples with n examples and f features
Require: y ∈ R^n  ▷ Targets for the training examples
Initialize the weight matrices W^(ℓ) randomly for each layer  ▷ Initialize net
for i ∈ {1, 2, ..., n_epochs} do  ▷ Conduct n_epochs epochs
    A_vals, Z_vals ← net.forward_pass(X)  ▷ Forward pass
    ŷ ← Z_vals[-1]  ▷ Predictions
    L ← L(ŷ, y)
    Compute ∂L(ŷ, y)/∂ŷ  ▷ Derivative of the error with respect to the predictions
    deltas ← net.backward_pass(A_vals, ∂L(ŷ, y)/∂ŷ)  ▷ Backward pass
    net.update_weights()  ▷ W^(ℓ)_{i,j} ← W^(ℓ)_{i,j} − (η/n) Σ_{e=1}^{n} ∂L/∂W^(ℓ)_{i,j} for each weight
end for
return the trained weight matrices W^(ℓ)
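To make the batch-update idea concrete, here is a rough single-layer illustration of Algorithm 1 in plain numpy. It is not the provided NeuralNetwork class (whose method signatures are defined in the source files and take precedence); it only shows the whole dataset being pushed through the forward and backward pass at once and the per-example gradients being averaged before each weight update.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_single_layer(X, y, eta=0.01, n_epochs=1000):
    """Batch gradient descent for a one-layer sigmoid model with average cross-entropy loss."""
    n, f = X.shape
    rng = np.random.default_rng(0)   # random init is fine here; this sketch is not submitted code
    W = rng.uniform(-0.5, 0.5, size=(f, 1))
    y = y.reshape(-1, 1)
    for _ in range(n_epochs):
        A = X @ W                    # forward pass on all n examples at once, shape (n, 1)
        y_hat = sigmoid(A)           # predictions
        delta = (y_hat - y) / n      # dL/dA for the average cross-entropy loss with a sigmoid output
        grad = X.T @ delta           # average gradient over the n examples
        W -= eta * grad              # weight update
    return W

In the actual assignment, the same averaging over the n examples happens when you apply the update shown in the comment of Algorithm 1.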
2.4 Activation and Loss Functions

You will implement the following activation functions and their derivatives:

Sigmoid: g(x) = 1 / (1 + e^{-kx})

Hyperbolic tangent: g(x) = tanh(x)

ReLU: g(x) = max(0, x)

Leaky ReLU: g(x) = max(0, x) + min(0, kx)

You will implement the following loss functions and their derivatives:

Cross entropy loss (for binary classification): compute the average over all of the examples. Note that log() refers to the natural logarithm.

   L(\hat{y}, y) = \frac{1}{n} \sum_{i=1}^{n} -\left( y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right)

Mean squared error loss (for regression):

   L(\hat{y}, y) = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2
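As a reference for the mathematics only, the sigmoid pair and the mean squared error pair could look roughly like the sketch below. The real classes in operations.py extend the provided Activation and Loss base classes, so their signatures, array shapes, and any constants such as k are dictated by that file, not by this sketch.

import numpy as np

def sigmoid_value(x, k=1.0):
    return 1.0 / (1.0 + np.exp(-k * x))

def sigmoid_derivative(x, k=1.0):
    s = sigmoid_value(x, k)
    return k * s * (1.0 - s)               # d/dx [1 / (1 + e^{-kx})] = k * s * (1 - s)

def mse_value(y_hat, y):
    return np.mean((y_hat - y) ** 2)       # average over the n examples

def mse_derivative(y_hat, y):
    return 2.0 * (y_hat - y) / y.shape[0]  # derivative of the averaged loss w.r.t. each prediction

Depending on how Loss.derivative() is specified in operations.py, the 1/n factor may belong in the derivative or in the weight update; follow the provided docstrings.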
2.5 Implementation

We have provided three Python files. Please read the detailed comments in the provided files carefully. Note that some functions have already been implemented for you.

1. neural_network.py: Contains an implementation of a NeuralNetwork class. You must implement the forward_pass(), backward_pass(), and update_weights() methods in the NeuralNetwork class. Do not change the function signatures. Do not change anything else in this file!

2. operations.py: Contains classes for the activation functions and loss functions, as well as functions for performance metrics. The activation functions extend a base Activation class and the loss functions extend a base Loss class. You must implement all the blank functions as indicated in this file. Do not change the function signatures. Do not change anything else in this file!

3. train_experiment.py: Provides a demonstration of how to define a NeuralNetwork object and train it on one of the provided datasets. Feel free to change this file as you desire.

Please complete the following tasks.

1. Implement the empty functions in neural_network.py and operations.py. Zip and submit these two files on Marmoset. Please do not invoke any numpy random operations in neural_network.py and operations.py. This may throw the automatic grading off.

   Marking Scheme: (84 marks)

   Unit tests for neural_network.py:
   • NeuralNetwork.forward_pass() (1 public test + 2 secret tests) × 4 marks = 12 marks
   • NeuralNetwork.backward_pass() (1 public test + 2 secret tests) × 5 marks = 15 marks
   • NeuralNetwork.update_weights() (1 public test + 2 secret tests) × 5 marks = 15 marks

   Unit tests for operations.py:
   • Sigmoid.value() (1 public test + 2 secret tests) × 1 mark = 3 marks
   • Sigmoid.derivative() (1 public test + 2 secret tests) × 1 mark = 3 marks
   • Tanh.value() (1 public test + 2 secret tests) × 1 mark = 3 marks
   • Tanh.derivative() (1 public test + 2 secret tests) × 1 mark = 3 marks
   • ReLU.value() (1 public test + 2 secret tests) × 1 mark = 3 marks
   • ReLU.derivative() (1 public test + 2 secret tests) × 1 mark = 3 marks
   • LeakyReLU.value() (1 public test + 2 secret tests) × 1 mark = 3 marks
   • LeakyReLU.derivative() (1 public test + 2 secret tests) × 1 mark = 3 marks
   • CrossEntropy.value() (1 public test + 2 secret tests) × 1 mark = 3 marks
   • CrossEntropy.derivative() (1 public test + 2 secret tests) × 1 mark = 3 marks
   • MeanSquaredError.value() (1 public test + 2 secret tests) × 1 mark = 3 marks
   • MeanSquaredError.derivative() (1 public test + 2 secret tests) × 1 mark = 3 marks
   • accuracy (1 public test + 2 secret tests) × 1 mark = 3 marks
   • mean_absolute_error (1 public test + 2 secret tests) × 1 mark = 3 marks

2. Once you have implemented the functions, you can train the neural networks on the two provided datasets. The bank note forgery dataset is in data/banknote_authentication.csv and the wine quality dataset is in data/wine_quality.csv. In train_experiment.py, we have provided some code to instantiate a neural network and train it on an entire dataset. Implement the required functions and then complete the following activities.

   (a) Execute k-fold cross validation for the banknote forgery dataset with k = 5. Use the sigmoid activation function for your output layer. Report the sizes of the layers that you used in your neural network, along with the activation functions you used for your hidden layers. Train for 1000 epochs in each trial and use η = 0.01.

   To perform cross validation, randomly split the data into 5 folds. For each fold, train the model on the remaining data and determine the trained model's accuracy on the fold. You can use NeuralNetwork.evaluate() to determine the accuracy on the validation set (i.e. the fold).

   Report the average and standard deviation of the accuracy across all folds. Produce a plot where the x-axis is the epoch number and the y-axis is the average loss across all folds for the current epoch.

   Marking Scheme: (8 marks)
   • (6 marks) Reasonably correct plot
   • (2 marks) Reasonable accuracy (average and standard deviation)

   (b) Execute k-fold cross validation for the wine quality dataset with k = 5. Use the ReLU activation function for your output layer. Report the sizes of the layers that you used in your neural network, along with the activation functions you used for your hidden layers. Train for 1000 epochs in each trial and use η = 0.001.

   To perform cross validation, randomly split the data into 5 folds. For each fold, train the model on the remaining data and determine the trained model's mean absolute error on the fold. You can use NeuralNetwork.evaluate() to determine the mean absolute error on the validation set (i.e. the fold).

   Report the average and standard deviation of the mean absolute error across all folds. Produce a plot where the x-axis is the epoch number and the y-axis is the average loss across all folds for the current epoch.

   Marking Scheme: (8 marks)
   • (6 marks) Reasonably correct plot
   • (2 marks) Reasonable mean absolute error (average and standard deviation)
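Finally, one possible way to organize the 5-fold cross-validation in parts (a) and (b) is sketched below. The sketch deliberately takes model construction, training, and evaluation as callbacks, because the real interface for building and training a NeuralNetwork is shown in train_experiment.py and NeuralNetwork.evaluate(); only the fold bookkeeping is the point of this example.

import numpy as np

def cross_validate(X, y, train_fn, eval_fn, k=5, seed=0):
    """Randomly split the data into k folds; train on k-1 folds and evaluate on the held-out fold.

    train_fn(X_train, y_train) should return a trained model, and eval_fn(model, X_val, y_val)
    should return the metric of interest (accuracy or mean absolute error).
    """
    indices = np.random.RandomState(seed).permutation(X.shape[0])
    folds = np.array_split(indices, k)
    scores = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = train_fn(X[train_idx], y[train_idx])
        scores.append(eval_fn(model, X[val_idx], y[val_idx]))
    return float(np.mean(scores)), float(np.std(scores))

To produce the required plot, also record the loss at every epoch in each fold (for example, by returning the loss history from train_fn) and average the five histories element-wise before plotting loss against epoch number.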