
PS4 Coding
In this assignment you will build a deep convolutional neural network, similar in architecture to
LeNet-5.
This network will use the softmax function to perform 10-class image classification on the MNIST data set (the
original MNIST, not the fashion_mnist we've been working with thus far).
But before you get started, please make sure you have the following packages installed
Packages to install:
1. numpy
2. keras
3. tensorflow
4. matplotlib
For keras and tensorflow, please refer to this link (https://docs.floydhub.com/guides/environments/) to make
sure you install versions that are compatible with each other. I would highly recommend getting
tensorflow==1.14.1 and the compatible keras version. The exact python version, as long as it's python3+,
should not impact your ability to use these two packages.
Structure of Assignment
What's new compared to PS3:
1. Convolutional Kernels
2. Ensemble training
Terminology
Please look over the power point under Piazza > Resources > ConvolutionalNetwork.ppt to make sure you
understand exactly what I mean when I type the following terms:
1. Applying a kernel/filter
2. Kernel/Filter
3. Max Pool
4. Feature map
5. Convolution
Network Architecture
[Figure: LeNet-5 network architecture diagram]
You will be implementing variations of LeNet-5 by hand, with flexible kernel shapes. In addition you will be
implementing ensemble learning using LeNet-5.
The static architecture can be seen above:
1. 2 convolutional layers, each followed by a max-pooling layer
2. 3 fully connected layers with an output shape of 10
You will notice that each of the hidden convolutional outputs and max-pooling outputs are ??x?? or ?x? in
terms of their dimensions. That is intentional as your first task is to figure out exactly what those are.
1. Kernel 1: 4 x 4, padding = 0, stride = 1
2. Kernel 2: 4 x 4, padding = 0, stride = 1
3. MaxPool1: 2 x 2, padding = 0, stride = 2
4. MaxPool2: 2 x 2, padding = 0, stride = 2
These kernel and maxpool sizes are values to start with. After you get the network working with these kernel
and maxpool shapes, you will need to adjust it so it can take any valid kernel and any valid max pool shape.
Here, we define valid as: out_shape is an integer greater than 1, where

out_shape = (in_shape + 2 * padding - kernel_shape) / stride + 1
Static Variables:
1. Number of layers
2. Output shape
Flexible variables:
1. Kernel shapes
2. MaxPool filter shapes
3. Number of kernels per conv layer
4. Number of nodes per FC layer
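The validity condition above can be checked with a small helper; this is a sketch assuming the standard convolution output-size formula (the function names `out_shape` and `is_valid` are my own, not required by the assignment):

```python
import numpy as np

def out_shape(in_shape, kernel_shape, padding, stride):
    # Output size of a convolution or pooling step along one dimension:
    # out_shape = (in_shape + 2 * padding - kernel_shape) / stride + 1
    return (in_shape + 2 * padding - kernel_shape) / stride + 1

def is_valid(in_shape, kernel_shape, padding, stride):
    # "valid" here means the result is an integer greater than 1
    s = out_shape(in_shape, kernel_shape, padding, stride)
    return s > 1 and float(s).is_integer()

# e.g. a 5x5 kernel (padding 0, stride 1) on a 28x28 image gives 24x24,
# and a 2x2 max pool with stride 2 then gives 12x12
```

Running the check over candidate shapes before building the network is an easy way to catch invalid kernel/maxpool combinations early.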
Assignment Grading and Procedure Recommendation
This assignment has ??? points total for all the methods you have to implement. Imposed on this total are the
following percentages:
1. If you correctly implement all methods, and you can correctly apply one kernel per conv layer, you will earn
80% of the points
2. Correct implementation of multiple kernels applied at each layer will earn you an additional 10%
3. Correct implementation of ensemble learning will earn you an additional 5%
4. Experimentation on parameters will earn you the last 5% of the points. See the bottom of this document for
details
For instance, if you correctly implement all methods but have neither multiple kernels nor ensemble training, you
will receive (87 x 0.75) out of the possible 87.
If you accomplish the situation above but also correctly add multiple kernels and ensemble training, you will then
receive (87 * 0.95) out of the possible 87.
Here is how I recommend going about this assignment:
1. Implement the network with batch training, weight decay, bias terms, and one kernel applied to each conv
layer
2. Add multiple kernels per conv layer functionality
3. Add different kernel shape functionality
4. Implement ensemble learning
5. Experimentation
A note on "different kernel shape":
For a convolutional network, all kernels applied at the same layer will be the same shape. When I say that your
network should be able to handle different kernel shapes, I mean that if you change the kernel shape at a given
layer, all kernels applied on that layer will adopt the new shape.
For instance, Kernel1 begins as a 4x4 kernel. This means if I wanted to apply multiple Kernel1's to the input
layer, then I will apply multiple 4x4 kernels (they are all the same shape). If I change my network such that
Kernel1 is now a 6x6 kernel that means ALL applications of kernel1 will now be 6x6.
Data Format
You will notice that this assignment has very few headers and comments. I am leaving it up to you to decide
exactly what info you need to incorporate for each function as a parameter, and the functionality and output of
each function. Feel free to use the previous problem sets as models for how to structure your code. I
recommend you continue to format your data in terms of N x M
1. N = number of features
2. M = number of data points
Loops during multi-kernel convolution
So that you don't get bogged down dealing with 3D and 4D matrix multiplication, I will say the
following:
The application of a single kernel should not impose any loops (straight matrix multiplication).
However, when you reach the stage of applying multiple kernels to a single layer, I would recommend you
simply loop through all kernel matrices for that layer and apply them one at a time.
This means that if your data begins as a NxM (2D matrix), then each kernel application will produce a (N1 x M)
2D matrix, where N1 = the flattened feature map of the kernel application. These (N1 x M) matrices can be
kept separately, rather than combining them into one 3D matrix.
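The recommended loop can be sketched as follows, assuming each kernel has already been converted to its (N1 x N) matrix form (the function name `apply_kernels` is my own):

```python
import numpy as np

def apply_kernels(kernel_matrices, X):
    # kernel_matrices: a list of (N1 x N) matrices, one per kernel
    # X: the (N x M) data, one flattened image per column
    # Each single-kernel application is a straight matrix multiply;
    # we only loop over the kernels themselves.
    return [K @ X for K in kernel_matrices]
```

The result is a Python list of (N1 x M) feature maps that can be kept separate, exactly as the text above suggests, rather than stacked into one 3D array.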
Data management
We will be using two data sets for this problem set.
1. MNIST (the most popular computer vision data set)
2. Dummy data (for testing purposes)
Graded Exercise (15 points total, 3 points each) - implement the following functions for data parsing:
In [2]: from keras.datasets import mnist
        import tensorflow as tf
        from tensorflow import keras
        from matplotlib import pyplot
        import numpy as np

        ####BEGIN CODE HERE####
        def gen_dummy():
            '''
            dummy data is exceptionally useful to test whether or not your
            network behaves as expected.
            For dummy data, you should generate a few (<= 5) input/output
            pairs that you can use to test your forward and backward
            propagation algorithms

            output:
            dummy_x = a NxM np matrix, both dimensions of your choosing, of
            very simple data
            dummy_y = a (M, ) np array with the corresponding labels
            '''
            dummy_x = []
            dummy_y = []
            ####BEGIN CODE HERE####

            ####END CODE HERE####

            return dummy_x, dummy_y

        def load_mnist():
            '''
            look up how to load the mnist data set via keras
            '''
            return 0

        def flatten_normalize():
            '''
            convert the images from N1xN1xM to NxM format, where
            N1 = square_root(N), and normalize
            '''
            return 0

        def subset_mnist_training():
            '''
            Return 100 training samples from each of the 10 classes, 1000
            samples all together
            '''
            return 0

        def subset_mnist_testing():
            '''
            Return 20 testing samples from each of the 10 classes, 200
            samples all together
            '''
            return 0

        ####END CODE HERE####
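As one concrete possibility for `flatten_normalize`, here is a sketch assuming images arrive in keras's (M, N1, N1) uint8 layout; the exact signature is yours to decide, and the fake data below just stands in for MNIST so the example runs offline:

```python
import numpy as np

def flatten_normalize_sketch(images):
    # images: (M, N1, N1) array of uint8 pixel values, as keras loads them
    M = images.shape[0]
    # flatten each image into a column (N x M) and scale pixels into [0, 1]
    return images.reshape(M, -1).T / 255.0

# fake "images" standing in for the real mnist download
fake = np.random.randint(0, 256, size=(5, 28, 28), dtype=np.uint8)
X = flatten_normalize_sketch(fake)   # X has shape (784, 5)
```

Note the transpose: it produces the N x M layout (features by data points) that the rest of the assignment expects.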
Graded exercise (28 points total, 4 points each): Complete the following helper functions
You will notice that there are no parameters; it is up to you to determine what each function needs. You will
also notice there isn't much explanation as to what each function does. That is because you should determine
what each function takes as parameters and what it returns. The only thing I ask you not to change is the
function names.
In [4]: ####BEGIN CODE HERE####
        def log_cost():
            '''
            computes the log cost of the current predictions using the
            labels (same as PS3)
            '''
            return 0

        def softmax():
            '''
            computes the softmax of the input (same as PS3)
            '''
            return 0

        def ReLU():
            '''
            computes the ReLU of the input (same as PS3)
            '''
            return 0

        def ReLU_prime():
            '''
            computes the ReLU' of the input (same as PS3)
            '''
            return 0

        def kernel_to_matrix():
            '''
            converts a kernel to its matrix form
            '''
            return 0

        def max_pool():
            '''
            applies max-pooling to an input image
            '''
            return 0

        def max_pool_backwards():
            '''
            takes the output of a maxpool, and projects back to the
            original shape.
            see PPT slides on convolutional backprop if you have no idea
            what I'm talking about.
            '''
            return 0

        ####END CODE HERE####
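One common way to realize `kernel_to_matrix` is to unroll the kernel into a convolution matrix, so that applying the kernel to a flattened image becomes a single matrix multiply. Here is a sketch under the assumptions padding = 0 and stride = 1 (the function name and loop-based construction are mine; a vectorized version is also possible):

```python
import numpy as np

def kernel_to_matrix_sketch(kernel, in_h, in_w):
    # Build C so that C @ image.flatten() equals the flattened feature map
    k_h, k_w = kernel.shape
    out_h, out_w = in_h - k_h + 1, in_w - k_w + 1   # padding 0, stride 1
    C = np.zeros((out_h * out_w, in_h * in_w))
    for i in range(out_h):
        for j in range(out_w):
            row = i * out_w + j
            for a in range(k_h):
                for b in range(k_w):
                    # output pixel (i, j) reads input pixel (i + a, j + b)
                    C[row, (i + a) * in_w + (j + b)] = kernel[a, b]
    return C
```

Building C once per kernel keeps the per-application cost to one matrix multiply, which is what the "no loops for a single kernel" requirement asks for.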
Graded exercise (6 points total, 2 points per function) - Complete the initialization functions
In [5]: ####BEGIN CODE HERE####
        def kernel_initialization():
            '''
            returns a kernel with specified height and width.
            The values of the kernel should be initialized using the same
            formula as the He_initialize_weight() function
            '''
            return 0

        def He_initialize_weight():
            '''
            (same as PS3)
            returns a weight matrix with the passed in dimensions
            '''
            return 0

        def bias_initialization():
            '''
            (same as PS3)
            returns a bias matrix of the passed in dimensions
            '''
            return 0
        ####END CODE HERE####
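He initialization is usually implemented as a zero-mean Gaussian scaled by sqrt(2 / fan_in); a sketch of the idea (the function name here is mine, and your He_initialize_weight from PS3 is the authoritative version):

```python
import numpy as np

def he_initialize_sketch(n_out, n_in):
    # He initialization: zero-mean Gaussian scaled by sqrt(2 / fan_in),
    # which keeps activation variance stable under ReLU nonlinearities
    return np.random.randn(n_out, n_in) * np.sqrt(2.0 / n_in)
```

The same scaling can be reused for kernel_initialization by treating the kernel's height * width as the fan-in.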
Forward and BackProp functions
[Figure: forward and backpropagation computation diagram for the network]
Graded Exercise (28 points total, 4 points each) - complete the following functions:
1. predict()
2. delta_Last()
3. delta_el()
4. dW()
5. db()
6. weight_update()
7. bias_update()
For full credit for predict, you will need to incorporate the bias terms correctly, and for weight_update, you will
need to correctly utilize the weight_decay parameter
You may find the image above helpful when implementing the backpropagation methods
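A minimal sketch of how the weight_decay parameter typically enters the update (the function name and signature here are my own, not required by the assignment):

```python
import numpy as np

def weight_update_sketch(W, dW, learning_rate, weight_decay):
    # L2 weight decay adds a weight_decay * W term to the gradient
    # before taking the gradient-descent step
    return W - learning_rate * (dW + weight_decay * W)
```

Setting weight_decay to 0 recovers plain gradient descent, which is a convenient way to verify the rest of the update independently.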
In [6]: ####BEGIN CODE HERE####
        def predict():
            '''
            minimum output: the predictions made by the network

            You are free to return more things from this function if you
            see fit

            hint: you will need to return all intermediate computations,
            not just the output
            To figure out what you need to return, look at what
            intermediate results you need to compute backpropagation

            you can return more than one variable with the following
            syntax:

            return var1, var2, ..., varN
            '''
            return 0

        def delta_Last():
            '''
            task: compute error term for ONLY the output layer
            '''
            return 0

        def delta_el():
            '''
            task: compute error term for any hidden layer
            '''
            ####BEGIN CODE HERE####

            return 0

            ####END CODE HERE####

        def dW():
            '''
            task: compute gradient for any weight matrix
            '''
            return 0

        def db():
            '''
            task: compute gradient for any bias term
            '''
            return 0

        def weight_update():
            '''
            task: update each of the weight matrices, and return them in a
            variable, name of your choosing
            '''
            return 0

        def bias_update():
            '''
            task: update each of the bias terms, return them in a variable,
            name of your choosing
            '''
            return 0

        ####END CODE HERE####
Training
Graded Exercise (10 points): Complete the training function below
In [4]: def train():
            '''
            IN ADDITION: please have the train function output a graph of
            the cost using matplotlib.pyplot.
            If everything works correctly the cost should be monotonically
            decreasing
            '''
            ####BEGIN CODE HERE####

            ####END CODE HERE####
            return 0
In [7]: '''
        In the section below is where you should set up all your weights,
        biases, parameters, and input/labels

        Then pass all the relevant parameters to train, and run and debug
        it

        Remember: no aspect of the network size should be hard coded
        except the output layer (which should always have ten nodes); the
        rest should be adjustable via variables like the ones provided.

        The size of your input layer will be dictated by which data set
        you are using. Once again, that parameter should be in terms of
        the variables, and not a hard coded size
        '''
        output_nodes = 10
        ####BEGIN CODE HERE####

        ####END CODE HERE####
Testing
There are two steps to the testing process: first, we need to take the output of our network and make a
decision, either class 0, class 1, ..., or class 9, and second we need to measure how well our network performs
on the testing set.
Remember that our network outputs 10 probability values, all between 0 and 1, and we need to turn this vector
of ten elements into a single output: the predicted class of the data point by the network. Simply pick the index
of the largest probability value as the decision.
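The argmax decision rule above amounts to one line of numpy (the function name `decision_sketch` is my own; your graded decision function may differ):

```python
import numpy as np

def decision_sketch(prediction):
    # prediction: (10, M) matrix of softmax outputs, one column per point
    # returns: (M,) array of predicted classes, one argmax per column
    return np.argmax(prediction, axis=0)
```

Note the `axis=0`: the maximum is taken down each column, one decision per data point.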
Graded Exercise (5 points total) - implement decision (1 point) and test (4 points)
In [6]: ####BEGIN CODE HERE####
        def decision(prediction):
            '''
            input: a (10, M) matrix where each column is the softmax
            output for data point i, 0 <= i < M

            output: a (M, ) np array where each element corresponds to the
            highest probability class from prediction
            '''
            return 0

        def test():
            '''
            output: the accuracy of your model on the unseen x and y pairs
            '''
            accuracy = 0
            return accuracy

        ####END CODE HERE####
In [ ]: '''
        Here is some space for you to call and test your test/decision
        functions
        '''
        ####BEGIN CODE HERE####

        ####END CODE HERE####
Ensemble learning:
This is a fairly common technique to try to promote regularization in a system. Simply put, you train multiple
networks (either of the same or different sizes/shapes) on the same data.
Then, you pass each of the networks a testing set and initiate a "vote".
The voting procedure is: for each testing point, the class with the majority vote wins.
If none of your models agree on a class, then you just randomly pick from their decisions. Tie breakers are also
determined randomly.
To receive full credit for ensemble learning, simply train a minimum of 5 unique networks, and correctly code a
voting process
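The voting procedure described above might be sketched like this (the function name and the (n_models, M) decisions layout are my own choices; any equivalent layout works):

```python
import numpy as np

def ensemble_vote_sketch(decisions):
    # decisions: (n_models, M) array, one row of class decisions per model.
    # For each data point the majority class wins; ties (including the case
    # where no two models agree) are broken by a uniform random choice
    # among the tied classes.
    n_models, M = decisions.shape
    winners = np.empty(M, dtype=int)
    for m in range(M):
        classes, counts = np.unique(decisions[:, m], return_counts=True)
        tied = classes[counts == counts.max()]
        winners[m] = np.random.choice(tied)
    return winners
```

When every model disagrees, all classes are "tied" at one vote each, so the random choice among them matches the random-pick rule above.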
In [ ]: ####BEGIN CODE HERE####
'''
ensemble training and testing space
'''

####END CODE HERE####
Experimentation explanation:
What are you experimenting on? Any aspect of your network that is not learned by SGD, and is not the number
of layers of the network, is free for experimentation. Learning rate, number of training samples, epochs, nodes
per layer, etc. are a few examples.
All I ask in this section is that you try to maximize your performance on the fashion_mnist and mnist data sets
on at least 60 testing samples, and keep track of what you found below. The write-up can be as simple as:
"The best I got with mnist was (accuracy) with these parameters: (list of parameters)". But ideally you would
talk a bit more about the trends, such as "I noticed that if I kept decreasing my learning rate, the performance
on X dataset would improve until a certain point, then would become worse".
There are a lot of ways to get full credit on this experimentation portion, as there is no dedicated format I am
requesting. There is no benchmark. There is no "you must get 100% accuracy". I simply ask you to see how
well your models can do with the restrictions in place.
Best of luck, Ryan
Experimentation notes:
This cell has been left in text format for you to freely edit and keep track
of your experimentations.
In [ ]: