STAT3007 Deep Learning, Assignment 2
2021 Semester 1, due 5pm on 1 Apr

Instructions. Please read the instructions carefully; not following them may result in a penalty.

(a) Write down your name and student number below.
Name: [Your name]
Student Number: [Your student number]

(b) There are 4 questions. Provide your answers (text/code/output) in the cells marked with Answer in Q1.ipynb to Q4.ipynb. Avoid excessive output (e.g. debugging messages). Save your work frequently. Make sure that we can reproduce your answers by running your notebooks. You may find the Tips section in readme.ipynb helpful for completing the assignment.

(c) When you finish your assignment, follow the submission instructions in readme.ipynb to create a zip file. Log on to Blackboard, then go to Assessment, Assignment 2 to submit the zip file. You can submit as many times as you want before the deadline; the last submission will be graded.

(d) Follow the integrity rules, and provide citations as needed. You can discuss the assignment with your classmates, but you must write your solutions independently and state in your solution who you discussed it with. If you do not know how to solve a problem, you can get 15% of the mark by writing down "I don't know".

Tips

For programming questions, if you want to use the free computing power offered by Colab, you can upload the notebook and work on it in Colab. Once you have all the results in the notebook, you can download it and replace your local version with it.

For theory questions, you are encouraged to write your solution using LaTeX in the notebook. If you prefer, you can also write your solutions in another file and then display it with the show_file function provided in util.py. You can run the following code to import the show_file function. If you use such files, please add them to the supplements folder.
In [ ]:
from util import *

If the show_file function doesn't display your images correctly, you can also use an HTML command to display images in your notebooks. For example, if you have an image supplements/Q1-sol.png, you can include it by running this HTML command
in a text cell.

Submitting your solution

Once you are ready to submit your solution, run the code below to create A2-sol.zip for submission.

In [ ]:
from util import *
zip_sol()

Voila! You're ready to submit A2-sol.zip on Blackboard.

Q1. Perceptrons (15 marks)

In this question, we examine the representational power of the perceptron. Consider a function $f(x_1, x_2, x_3)$ of three binary variables $x_1, x_2, x_3 \in \{0,1\}$, with the function taking value -1 or 1. The function can be specified using the set of examples $D = \{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_8, y_8)\}$, where $\mathbf{x}_1, \ldots, \mathbf{x}_8$ are the 8 input configurations for the three binary variables, and each $y_i = f(\mathbf{x}_i)$. Now augment each input vector in $D$ with a constant variable 1. We say that the perceptron algorithm learns the function $f$ if the perceptron algorithm converges on the training set $D$, and the model at convergence gives exactly the same input-output relationship as $f$.

For each of the two functions below, does the perceptron algorithm learn it? Justify your answer. If your answer is yes, derive an upper bound on the number of mistakes that the perceptron algorithm makes before convergence, when the initial parameters are 0.

(a) (5 marks) The XOR function $f(x_1, x_2, x_3)$ for three binary variables $x_1, x_2, x_3 \in \{0,1\}$ takes value 1 if an odd number of the variables are 1, and value -1 if an even number of the variables are 1.

Answer. [Write your solution here. Add cells as needed.]

(b) (10 marks) The function
$$f(x_1, x_2, x_3) = \begin{cases} 1 & \text{if } (x_1 \wedge x_2) \vee (x_2 \wedge x_3) \vee (x_3 \wedge x_1), \\ -1 & \text{otherwise,} \end{cases}$$
where $x_1, x_2, x_3 \in \{0,1\}$, $\wedge$ is the logical and, and $\vee$ is the logical or.

Answer. [Write your solution here. Add cells as needed.]

Q2. Hopfield networks (35 marks)

In this question, we train a Hopfield network and use it to reconstruct noisy patterns.

(a) (5 marks) The supplements folder contains three animal image files: image1.png, image2.png, and image3.png. Load the images in Python.
What are the sizes of the images and their average pixel values? You may find the skimage.io.imread function from the scikit-image library useful.

Answer. [Write your solution here. Add cells as needed.]

(b) (5 marks) For each of the 3 images, convert it into a binary pattern by setting each pixel value to -1 if it is smaller than the average pixel value, and +1 otherwise.

Answer. [Write your solution here. Add cells as needed.]

(c) (5 marks) If we want to use a Hopfield network to store the above 3 binary patterns, how many neurons do we need?

Answer. [Write your solution here. Add cells as needed.]

(d) (5 marks) Train a Hopfield network to store the 3 binary patterns. How many weights are positive and how many are negative?

Answer. [Write your solution here. Add cells as needed.]

(e) (5 marks) Write a function that accepts an input pattern and synchronously updates the activation states of the neurons at each iteration, until convergence or for up to 100 iterations. The supplements folder also contains three images, noisy1.png, noisy2.png, and noisy3.png, each obtained by adding noise to one of the 3 animal images. Use the trained Hopfield network and the above synchronous update to reconstruct the original images. Display the reconstructed images. Note that you need to convert between images and binary patterns.

Answer. [Write your solution here. Add cells as needed.]

(f) (5 marks) Write a function that accepts an input pattern and, at each iteration, updates all the activation states of the neurons in a random order, until convergence or for up to 100 iterations. Use the trained Hopfield network and the above semi-random update to reconstruct the original images. Display the reconstructed images. To make your results reproducible, set the seed for the random number generator to 1 before using it. For example, if you are using numpy to generate the random ordering, you can set the random seed using the function numpy.random.seed.

Answer.
[Write your solution here. Add cells as needed.]

(g) (5 marks) Construct an input image that is more similar to image2 than to image1, but whose reconstruction obtained using the synchronous update function is more similar to image1 than to image2. Use the number of common pixels as the similarity measure. Report the similarity numbers, and display image1, image2, the input image, and the reconstructed image. Explain the reasoning behind your construction.

Answer. [Write your solution here. Add cells as needed.]

Q3. Multilayer Perceptrons (30 marks)

In this question, we will train and evaluate a single hidden layer MLP on the MNIST dataset. You should use the PyTorch library to answer the questions below.

(a) (5 marks) Load the MNIST training and test set images and labels. You can use the torchvision.datasets.MNIST class, or load them from the data files at http://yann.lecun.com/exdb/mnist/ using your own code. Convert each digit label to a one-hot representation, which is all zero except at the position corresponding to the digit label (e.g., the one-hot representation for 1 is the vector $(0,1,0,\ldots,0) \in \mathbb{R}^{10}$). How many examples are in the training and test sets? What is the size of each digit image?

Answer. [Write your solution here. Add cells as needed.]

(b) (10 marks) Consider a single hidden layer MLP with 100 hidden units and 10 output units, defined by
$$f(x; W_1, W_2, b_1, b_2) = \sigma(W_2\, \sigma(W_1 x + b_1) + b_2),$$
where each $x \in \mathbb{R}^d$ is the vector representation of a digit image, $W_1 \in \mathbb{R}^{100 \times d}$ and $W_2 \in \mathbb{R}^{10 \times 100}$ are the weight matrices, $b_1 \in \mathbb{R}^{100}$ and $b_2 \in \mathbb{R}^{10}$ are the biases, and $\sigma(u)$ applies the sigmoid function to each component of a vector $u$. We want to train this MLP by minimizing the quadratic loss
$$R_n(W_1, W_2, b_1, b_2) = \frac{1}{n} \sum_{i=1}^{n} \| f(x_i; W_1, W_2, b_1, b_2) - y_i \|_2^2,$$
where $(x_1, y_1), \ldots, (x_n, y_n) \in \mathbb{R}^d \times \mathbb{R}^{10}$ is the training set. Note that you need to implement exactly the loss defined above. Starting with $W_1$, $W_2$, $b_1$, $b_2$ all set to 0, run gradient descent with step size 0.01 for 100 iterations. Compute the training and test set classification errors for the trained model. The trained model makes predictions by assigning an input $x$ to the index of its largest output.

Answer. [Write your solution here. Add cells as needed.]

(c) (5 marks) Repeat (b), but train the MLP starting from parameters randomly chosen from $[-0.5, 0.5]$. What are the training set and test set classification errors?

Answer. [Write your solution here. Add cells as needed.]

(d) (5 marks) Repeat (b) and (c) with each full gradient replaced by a stochastic gradient computed over mini-batches of 100 examples, and run SGD for 100 epochs. One epoch means one pass through the training data, so 100 epochs is equivalent to $100n/100 = n$ SGD iterations.

Answer. [Write your solution here. Add cells as needed.]

(e) (5 marks) Train the MLP using each of the learning rates 0.001, 0.1, and 1 for both gradient descent and SGD, starting from 0 and from random parameters. Report the training set and test set classification errors for these 12 models. Comment on the performance of all 16 models that you obtained above. Which one has the best performance? Can you explain why?

Answer. [Write your solution here. Add cells as needed.]

Q4. Transfer learning (20 marks)

When we learn to solve a new problem, we often leverage knowledge that we have acquired on related tasks. For example, when we learn how to play chess after learning how to play Chinese chess, we can often pick up chess tactics quickly by relating them to tactics we have learned for Chinese chess. Exploiting previously acquired knowledge on related tasks to learn how to solve new problems is known as transfer learning, and this idea has also been applied to build machine learning systems.
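A common way to reuse knowledge in machine learning is to keep a previously trained feature network fixed and train only a new classifier "head" on the features it produces. The numpy code below is a minimal sketch of that mechanism. It rests on several assumptions:

- The extractor is a hypothetical random linear map followed by ReLU. It stands in for a genuinely pretrained network such as the one loaded from supplements/fnet.pt in part (a); the pretraining step itself is not shown.
- Synthetic two-class Gaussian data stand in for images.
- The head is a softmax-regression model trained by full-batch gradient descent.

The names frozen_features, train_head and predict are illustrative only.

```python
import numpy as np


def frozen_features(x, w_feat):
    """Hypothetical frozen extractor: a fixed linear map followed by ReLU.
    w_feat is never modified, mimicking a pretrained network kept frozen."""
    return np.maximum(x @ w_feat, 0.0)


def train_head(feats, labels, n_classes, lr=0.1, epochs=300):
    """Train only a softmax-regression head on fixed features (full-batch GD)."""
    n, k = feats.shape
    w = np.zeros((k, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = feats @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # for numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # d(mean cross-entropy)/d(logits)
        w -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return w, b


def predict(feats, w, b):
    """Assign each example to the class with the largest head output."""
    return np.argmax(feats @ w + b, axis=1)


rng = np.random.default_rng(0)
# Synthetic two-class data (an assumption standing in for real images).
x = np.vstack([rng.normal(-1.0, 1.0, (100, 20)), rng.normal(1.0, 1.0, (100, 20))])
y = np.array([0] * 100 + [1] * 100)

w_feat = rng.normal(0.0, 0.3, (20, 32))  # stand-in "pretrained" weights, frozen
w_feat_before = w_feat.copy()

feats = frozen_features(x, w_feat)        # features are computed once
w, b = train_head(feats, y, n_classes=2)  # only the head's parameters change
acc = float((predict(feats, w, b) == y).mean())
print(f"training accuracy of the new head: {acc:.2f}")
```

Because the extractor is frozen, its features can be computed once and cached. This is what makes this form of transfer cheap compared with retraining the whole network.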
In this question, we will exploit a model learned on a noisy version of MNIST to learn a model on MNIST.

(a) (0 marks) The code below loads a model trained on a noisy version of MNIST, with all pixel values normalized to the range [0,1]. Some of the noisy MNIST images are shown in the left figure below, and some of the original MNIST images are shown in the right figure below for comparison. The model is actually a feature network that converts an input image into a feature vector. Read and run the code to understand how it works.

In [ ]:
import torch
torch.manual_seed(1)

# load the feature network
# (on PyTorch 2.6 or later, torch.load defaults to weights_only=True; loading a
# whole pickled model then requires torch.load(..., weights_only=False))
feature_net = torch.load('supplements/fnet.pt')

# create 3 random images of size 1x28x28
images = torch.rand(3, 1, 28, 28)

# compute the features for the 3 random images
feature_net(images)

(b) (5 marks) Load the MNIST dataset and convert the images into feature vectors using the provided feature network.

Answer. [Write your solution here. Add cells as needed.]

(c) (10 marks) Pick a simple classifier of your choice. Train it on MNIST using the pixel values as features, and train it again using the features extracted in (b). Compare and comment on the performance of the two models.

Answer. [Write your solution here. Add cells as needed.]

(d) (5 marks) We investigate how the features learned on noisy MNIST transfer to another variant of MNIST, in which each image has a small random 3x3 patch cut out. Read and run the code below to understand how it works. Repeat (c) on this damaged MNIST dataset.
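In [ ]:
import copy

import matplotlib.pyplot as plt
import numpy as np

def randcut(x, patch_size=(3, 3)):
    # Work on a copy so that the original images are left untouched.
    x = copy.deepcopy(x)
    h, w = patch_size
    H, W = x.shape[-2:]
    for i in range(len(x)):
        # Random top-left corner of the patch for image i.
        a = np.random.choice(H - h)
        b = np.random.choice(W - w)
        # Overwrite the patch with 255 (white for 0-255 pixel data).
        x[i, ..., a:a+h, b:b+w] = 255
    return x

# randomly cut 3x3 patches out from MNIST images
# (x_train and x_test are the MNIST image arrays from your earlier answers,
# assumed here to have shape (N, 1, 28, 28) with pixel values in 0-255)
np.random.seed(3)
x_tr_cut = randcut(x_train)
x_ts_cut = randcut(x_test)

In [ ]:
# display the first modified training image
plt.imshow(x_tr_cut[0, 0], cmap='gray')
plt.show()

Answer. [Write your solution here. Add cells as needed.]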
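Parts (c) and (d) leave the choice of classifier open. One simple option is a nearest-centroid classifier. It stores one mean vector per class and assigns each input to the closest mean, so the same code runs unchanged on raw pixels and on extracted features. The numpy code below is a hedged sketch of that option, not a prescribed method. The helper names (fit_centroids, centroid_error, compare_representations) and the synthetic data in the usage lines are illustrative assumptions.

```python
import numpy as np


def fit_centroids(x, y, n_classes):
    """Return a (n_classes, dim) array holding the mean vector of each class."""
    return np.stack([x[y == c].mean(axis=0) for c in range(n_classes)])


def centroid_error(centroids, x, y):
    """Fraction of rows of x whose nearest centroid (Euclidean) is not the label."""
    # ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, computed without building an
    # (n, n_classes, dim) array, so large inputs stay memory-friendly.
    d2 = (x ** 2).sum(axis=1, keepdims=True) - 2 * x @ centroids.T + (centroids ** 2).sum(axis=1)
    return float((d2.argmin(axis=1) != y).mean())


def compare_representations(reps, y_tr, y_ts, n_classes=10):
    """reps maps a name to a (train, test) pair of 2-D arrays, one row per image.
    Returns {name: (training error, test error)}."""
    results = {}
    for name, (x_tr, x_ts) in reps.items():
        centroids = fit_centroids(x_tr, y_tr, n_classes)
        results[name] = (centroid_error(centroids, x_tr, y_tr),
                         centroid_error(centroids, x_ts, y_ts))
    return results


# Usage on synthetic data: one informative representation and one pure noise.
rng = np.random.default_rng(0)
y_tr, y_ts = np.repeat(np.arange(3), 50), np.repeat(np.arange(3), 20)
good = (5 * np.eye(3)[y_tr] + rng.normal(size=(150, 3)),
        5 * np.eye(3)[y_ts] + rng.normal(size=(60, 3)))
noise = (rng.normal(size=(150, 3)), rng.normal(size=(60, 3)))
results = compare_representations({"good": good, "noise": noise}, y_tr, y_ts, n_classes=3)
for name, (err_tr, err_ts) in results.items():
    print(f"{name}: training error {err_tr:.3f}, test error {err_ts:.3f}")
```

On MNIST, the call would look like compare_representations({"pixels": (x_train_flat, x_test_flat), "features": (f_train, f_test)}, y_train, y_test). All of these array names are hypothetical placeholders for the flattened images and the feature vectors built in your earlier answers. For (d), pass the damaged images (and their features) in place of the originals.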