
PS4 Coding

This assignment will have us build a deep convolutional neural network similar to the LeNet-5 architecture. This network will use the softmax function to perform 10-class image classification on the MNIST data set (the original MNIST, not the fashion_mnist we've been working with thus far).

Before you get started, please make sure you have the following packages installed:

Packages to install:

1. numpy

2. keras

3. tensorflow

4. matplotlib

For keras and tensorflow, please refer to this link (https://docs.floydhub.com/guides/environments/) to make sure you install versions that are compatible with each other. I would highly recommend getting tensorflow==1.14.1 and the compatible keras version. The exact Python version, as long as it's Python 3+, should not impact your ability to use these two packages.

Structure of Assignment

What's new compared to PS3:

1. Convolutional Kernels

2. Ensemble training

Terminology

Please look over the PowerPoint under Piazza > Resources > ConvolutionalNetwork.ppt to make sure you understand exactly what I mean by the following terms:

1. Applying a kernel/filter

2. Kernel/Filter

3. Max Pool

4. Feature map

5. Convolution

Network Architecture

[Figure: the static network architecture]

You will be implementing variations of LeNet-5 by hand, with flexible kernel shapes. In addition, you will be implementing ensemble learning using LeNet-5.

The static architecture can be seen above:

1. 2 convolutional layers, each followed by a max-pooling layer

2. 3 fully connected layers with an output shape of 10

You will notice that the dimensions of each hidden convolutional output and max-pooling output are given as ??x?? or ?x?. That is intentional, as your first task is to figure out exactly what they are.

1. Kernel 1: 4 x 4, padding = 0, stride = 1

2. Kernel 2: 4 x 4, padding = 0, stride = 1

3. MaxPool1: 2 x 2, padding = 0, stride = 2

4. MaxPool2: 2 x 2, padding = 0, stride = 2

These kernel and max-pool sizes are starting values. After you get the network working with these kernel and max-pool shapes, you will need to adjust it so it can take any valid kernel shape and any valid max-pool shape.

Here, we define valid as: out_shape is an integer greater than 1, where

$$\text{out\_shape} = \frac{\text{in\_shape} + 2 \cdot \text{padding} - \text{kernel\_shape}}{\text{stride}} + 1$$

Static Variables:

1. Number of layers

2. Output shape

Flexible variables:

1. Kernel shapes

2. MaxPool filter shapes

3. Number of kernels per conv layer

4. Number of nodes per FC layer

Assignment Grading and Procedure Recommendation

This assignment is worth ??? points in total across all the methods you have to implement. The following percentages are applied to this total:

1. If you correctly implement all methods and can correctly apply one kernel per conv layer, you will earn 80% of the points.

2. Correct implementation of multiple kernels applied at each layer will earn you an additional 10%.

3. Correct implementation of ensemble learning will earn you an additional 5%.

4. Experimentation on parameters will earn you the last 5% of the points. See the bottom of this document for details.

For instance, if you correctly implement all methods but implement neither multiple kernels nor ensemble training, you will receive (87 x 0.80) out of the possible 87.

If you accomplish the situation above but also correctly add multiple kernels and ensemble training, you will then receive (87 x 0.95) out of the possible 87.

Here is how I recommend going about this assignment:

1. Implement the network with batch training, weight decay, bias terms, and one kernel applied to each conv layer

2. Add multiple kernels per conv layer functionality

3. Add different kernel shape functionality

4. Implement ensemble learning

5. Experimentation

A note on "different kernel shape":

For a convolutional network, all kernels applied at the same layer have the same shape. When I say that your network should be able to handle different kernel shapes, I mean that if you change the kernel shape at a given layer, all kernels applied at that layer adopt the new shape.

For instance, Kernel1 begins as a 4x4 kernel. This means that if I want to apply multiple Kernel1s to the input layer, I will apply multiple 4x4 kernels (they all have the same shape). If I change my network so that Kernel1 is now a 6x6 kernel, then ALL applications of Kernel1 will be 6x6.
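Whenever a kernel or pooling shape changes, the validity rule can be checked before any matrices are built. Below is a minimal sketch, assuming square inputs and square windows and using the output-size formula out_shape = (in_shape + 2 * padding - kernel_shape) / stride + 1. The helper names `out_shape`, `is_valid`, and `layer_shapes` are hypothetical and not part of the assignment template.

```python
def out_shape(in_shape, kernel_shape, padding=0, stride=1):
    """(in_shape + 2 * padding - kernel_shape) / stride + 1, as a float."""
    return (in_shape + 2 * padding - kernel_shape) / stride + 1


def is_valid(in_shape, kernel_shape, padding=0, stride=1):
    """Valid means the output side length is an integer greater than 1."""
    out = out_shape(in_shape, kernel_shape, padding, stride)
    return out.is_integer() and out > 1


def layer_shapes(in_shape, layers):
    """Walk (kernel_shape, padding, stride) tuples in order and return the side
    length after each layer; raise ValueError at the first invalid layer."""
    shapes = []
    current = in_shape
    for kernel_shape, padding, stride in layers:
        if not is_valid(current, kernel_shape, padding, stride):
            raise ValueError(
                f"invalid layer: in={current}, kernel={kernel_shape}, "
                f"padding={padding}, stride={stride}"
            )
        current = int(out_shape(current, kernel_shape, padding, stride))
        shapes.append(current)
    return shapes


# Kernel 1 (4 x 4, padding 0, stride 1) applied to a 28 x 28 MNIST image.
print(out_shape(28, 4, padding=0, stride=1))            # 25.0
# A hypothetical configuration with 5 x 5 kernels and 2 x 2, stride-2 pools.
print(layer_shapes(28, [(5, 0, 1), (2, 0, 2), (5, 0, 1), (2, 0, 2)]))  # [24, 12, 8, 4]
```

Under this definition, a 2 x 2, stride-2 max pool applied to a 25 x 25 map gives 12.5, so `is_valid(25, 2, 0, 2)` returns False. Running a check like `layer_shapes` whenever a shape changes catches such combinations early.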

Data Format

You will notice that this assignment has very few headers and comments. I am leaving it up to you to decide exactly what information each function needs as parameters, as well as the functionality and output of each function. Feel free to use the previous problem sets as models for your code. I recommend you continue to format your data as N x M, where:

1. N = number of features

2. M = number of data points
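As a concrete illustration of the N x M layout, the sketch below turns a stack of images shaped (M, H, W), which is the layout `keras.datasets.mnist.load_data()` returns (28 x 28 images), into an (H*W) x M matrix with one flattened image per column. The helper name `to_n_by_m` is hypothetical, and dividing by 255 is only one possible normalization, assumed here for 8-bit pixel values.

```python
import numpy as np


def to_n_by_m(images):
    """Reshape an (M, H, W) image stack into an (H*W, M) matrix:
    N = H*W features per data point, one data point per column."""
    m = images.shape[0]
    # Flatten each image row by row, then transpose so samples become columns.
    return images.reshape(m, -1).T


# Three random 28 x 28 "images" with 8-bit pixel values (fake data).
rng = np.random.default_rng(0)
fake = rng.integers(0, 256, size=(3, 28, 28)).astype(np.uint8)

# Assumed normalization: scale pixels into [0, 1].
X = to_n_by_m(fake).astype(np.float64) / 255.0
print(X.shape)  # (784, 3)
```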

Loops during multi-kernel convolution

In order not to get you bogged down in 3D and 4D matrix multiplication, I will say the following:

Applying a single kernel should not require any loops (it should be a straight matrix multiplication).

However, when you reach the stage of applying multiple kernels to a single layer, I recommend you simply loop through all kernel matrices for that layer and apply them one at a time.

This means that if your data begins as an N x M (2D) matrix, then each kernel application will produce an (N1 x M) 2D matrix, where N1 is the size of the flattened feature map produced by that kernel. These (N1 x M) matrices can be kept separately rather than combined into one 3D matrix.
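To make "straight matrix multiplication" concrete: for a single-channel square image stored as one column of X, applying a kernel with no padding is a linear map, so it can be written as one matrix K of shape (N1 x N) and applied to all M columns at once with K @ X. The sketch below builds K with loops (done once per kernel) and then applies each kernel with a single multiplication, looping only over kernels as recommended above. The function name `build_kernel_matrix` is hypothetical, and it computes cross-correlation (the kernel is not flipped), the convention most deep learning libraries use; check the course slides for the convention expected here.

```python
import numpy as np


def build_kernel_matrix(kernel, in_side, stride=1):
    """Return K with shape (out_h*out_w, in_side*in_side) such that K @ X
    applies `kernel` (no padding) to every column of X, where each column
    is one flattened in_side x in_side image. Computes cross-correlation."""
    kh, kw = kernel.shape
    out_h = (in_side - kh) // stride + 1
    out_w = (in_side - kw) // stride + 1
    K = np.zeros((out_h * out_w, in_side * in_side))
    for i in range(out_h):
        for j in range(out_w):
            row = i * out_w + j                      # flat index of output pixel (i, j)
            for a in range(kh):
                for b in range(kw):
                    # Flat index of the input pixel under kernel entry (a, b).
                    col = (i * stride + a) * in_side + (j * stride + b)
                    K[row, col] = kernel[a, b]
    return K


rng = np.random.default_rng(0)
X = rng.standard_normal((6 * 6, 2))                  # two 6 x 6 images, N x M layout
kernels = [rng.standard_normal((3, 3)) for _ in range(2)]

# One matrix multiplication per kernel; each result is an (N1 x M) = (16 x 2) feature map.
feature_maps = [build_kernel_matrix(k, 6) @ X for k in kernels]
print([fm.shape for fm in feature_maps])             # [(16, 2), (16, 2)]
```

Because the map is linear, backpropagation through such a layer can reuse the same matrix: the error term flows back to the layer's input as K.T @ delta.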

Data management

We will be using two data sets for this problem set:

1. MNIST (the most popular computer vision data set)

2. Dummy data (for testing purposes)

Graded Exercise (15 points total, 3 points each) - implement the following functions for data parsing:

In [2]: from keras.datasets import mnist
import tensorflow as tf
from tensorflow import keras
from matplotlib import pyplot
import numpy as np

####BEGIN CODE HERE####

def gen_dummy():
    '''
    Dummy data is exceptionally useful to test whether or not your network
    behaves as expected.
    For dummy data, you should generate a few (<= 5) input/output pairs
    that you can use to test your forward and backward propagation algorithms.

    output:
    dummy_x = an N x M np matrix of very simple data, both dimensions of your choosing
    dummy_y = an (M, ) np array with the corresponding labels
    '''
    dummy_x = []
    dummy_y = []
    ####BEGIN CODE HERE####

    ####END CODE HERE####
    return dummy_x, dummy_y

def load_mnist():
    '''
    Look up how to load the MNIST data set via keras.
    '''
    return 0

def flatten_normalize():
    '''
    Convert the images from N1 x N1 x M to N x M format, where
    N1 = square_root(N), and normalize them.
    '''
    return 0

def subset_mnist_training():
    '''
    Return 100 training samples from each of the 10 classes, 1000 samples altogether.
    '''
    return 0

def subset_mnist_testing():
    '''
    Return 20 testing samples from each of the 10 classes, 200 samples altogether.
    '''
    return 0

####END CODE HERE####
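The two subset functions both need a class-balanced sample (100 per class for training, 20 per class for testing). One way to draw such a sample with NumPy is sketched below, assuming the data is already in N x M layout with an (M,) label array. The name `balanced_subset` and the random-sampling strategy (rather than, say, taking the first examples of each class) are assumptions, not requirements of the assignment.

```python
import numpy as np


def balanced_subset(x, y, per_class, num_classes=10, seed=0):
    """Randomly pick `per_class` columns of each label from x (N x M) and y (M,).
    Returns (x_sub, y_sub) with num_classes * per_class data points, shuffled."""
    rng = np.random.default_rng(seed)
    picks = []
    for c in range(num_classes):
        candidates = np.flatnonzero(y == c)          # column indices with label c
        picks.append(rng.choice(candidates, size=per_class, replace=False))
    idx = np.concatenate(picks)
    rng.shuffle(idx)                                 # avoid long runs of one class
    return x[:, idx], y[idx]


# Fake data: 300 points, 30 per class, 5 features each.
y = np.repeat(np.arange(10), 30)
x = np.random.default_rng(1).standard_normal((5, y.size))
x_sub, y_sub = balanced_subset(x, y, per_class=20)
print(x_sub.shape, np.bincount(y_sub, minlength=10))  # (5, 200) and 20 of every class
```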

Graded exercise (28 points total, 4 points each): Complete the following helper functions

You will notice that there are no parameters; it is up to you to determine what each function needs. You will also notice that there isn't much explanation of what each function does. That is because you should determine what each function takes as parameters and what it returns. The only thing I ask you not to change is the function name.

In [4]: ####BEGIN CODE HERE####

def log_cost():
    '''
    Computes the log cost of the current predictions using the labels (same as PS3).
    '''
    return 0

def softmax():
    '''
    Computes the softmax of the input (same as PS3).
    '''
    return 0

def ReLU():
    '''
    Computes the ReLU of the input (same as PS3).
    '''
    return 0

def ReLU_prime():
    '''
    Computes the ReLU' of the input (same as PS3).
    '''
    return 0

def kernel_to_matrix():
    '''
    Converts a kernel to its matrix form.
    '''
    return 0

def max_pool():
    '''
    Applies max-pooling to an input image.
    '''
    return 0

def max_pool_backwards():
    '''
    Takes the output of a max pool and projects it back to the original shape.
    See the PPT slides on convolutional backprop if you have no idea what
    I'm talking about.
    '''
    return 0

####END CODE HERE####
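For `max_pool` and `max_pool_backwards`, the key idea is that the backward pass sends each upstream gradient only to the input position that produced the maximum; every other position gets zero. The sketch below assumes one square feature map per column of X (N x M layout) and square pooling windows. The function names are hypothetical, and the forward pass records the winning input index so the backward pass can reuse it.

```python
import numpy as np


def max_pool_forward_sketch(X, side, pool=2, stride=2):
    """Max-pool each column of X, where a column is one flattened side x side map.
    Returns the pooled (out_side**2 x M) matrix and, for every pooled entry,
    the flat input index that held the maximum."""
    m = X.shape[1]
    maps = X.T.reshape(m, side, side)
    out_side = (side - pool) // stride + 1
    pooled = np.empty((out_side * out_side, m))
    winners = np.empty((out_side * out_side, m), dtype=np.int64)
    cols = np.arange(m)
    for i in range(out_side):
        for j in range(out_side):
            r, c = i * stride, j * stride
            window = maps[:, r:r + pool, c:c + pool].reshape(m, -1)   # (M, pool*pool)
            k = window.argmax(axis=1)                                 # winner inside window
            pooled[i * out_side + j] = window[cols, k]
            # Convert the winner's position in the window to a flat input index.
            winners[i * out_side + j] = (r + k // pool) * side + (c + k % pool)
    return pooled, winners


def max_pool_backward_sketch(d_pooled, winners, side):
    """Route each upstream gradient to the input position that won the max;
    all other positions get zero (np.add.at sums contributions if windows overlap)."""
    m = d_pooled.shape[1]
    d_input = np.zeros((side * side, m))
    cols = np.broadcast_to(np.arange(m), winners.shape)
    np.add.at(d_input, (winners, cols), d_pooled)
    return d_input


X = np.array([[1., 2., 5., 0.],
              [3., 4., 1., 2.],
              [0., 1., 2., 3.],
              [7., 0., 1., 1.]]).reshape(-1, 1)        # one 4 x 4 map as a single column
pooled, winners = max_pool_forward_sketch(X, side=4)
print(pooled.ravel())                                  # [4. 5. 7. 3.]
d_input = max_pool_backward_sketch(np.ones_like(pooled), winners, side=4)
print(d_input.reshape(4, 4))
```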

Graded exercise (6 points total, 2 points per function) - Complete the initialization functions

In [5]: ####BEGIN CODE HERE####

def kernel_initialization():
    '''
    Returns a kernel with the specified height and width.
    The values of the kernel should be initialized using the same formula as the
    He_initialize_weight() function.
    '''
    return 0

def He_initialize_weight():
    '''
    (same as PS3)
    Returns a weight matrix with the passed-in dimensions.
    '''
    return 0

def bias_initialization():
    '''
    (same as PS3)
    Returns a bias matrix of the passed-in dimensions.
    '''
    return 0

####END CODE HERE####
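The PS3 version of `He_initialize_weight` is not reproduced in this document, so the sketch below assumes the common Gaussian form of He initialization: entries drawn from a normal distribution with mean 0 and variance 2 / fan_in. For a kernel, each output value depends on height * width inputs, so that product plays the role of fan_in. Zero biases are also an assumption, and all function names and layer sizes here are hypothetical.

```python
import numpy as np


def he_weight_sketch(fan_out, fan_in, rng):
    """(fan_out, fan_in) weights ~ N(0, 2 / fan_in), matching Z = W @ A + b."""
    return rng.standard_normal((fan_out, fan_in)) * np.sqrt(2.0 / fan_in)


def he_kernel_sketch(height, width, rng):
    """(height, width) kernel ~ N(0, 2 / (height * width))."""
    return rng.standard_normal((height, width)) * np.sqrt(2.0 / (height * width))


def zero_bias_sketch(fan_out):
    """(fan_out, 1) column of zeros; broadcasts across the M columns of Z."""
    return np.zeros((fan_out, 1))


rng = np.random.default_rng(0)
W1 = he_weight_sketch(64, 100, rng)    # hypothetical layer sizes
K1 = he_kernel_sketch(4, 4, rng)       # same shape as the starting Kernel 1
b1 = zero_bias_sketch(64)
print(W1.shape, K1.shape, b1.shape)    # (64, 100) (4, 4) (64, 1)
```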

Forward and BackProp functions

[Figure: forward and backpropagation diagram]

Graded Exercise (28 points total, 4 points each) - complete the following functions:

1. predict()

2. delta_Last()

3. delta_el()

4. dW()

5. db()

6. weight_update()

7. bias_update()

For full credit on predict, you will need to incorporate the bias terms correctly, and for weight_update, you will need to correctly use the weight_decay parameter.

You may find the image above helpful when implementing the backpropagation methods.

In [6]: ####BEGIN CODE HERE####

def predict():
    '''
    Minimum output: the predictions made by the network.
    You are free to return more things from this function if you see fit.
    Hint: you will need to return all intermediate computations, not just
    the output. To figure out what you need to return, look at which
    intermediate results you need to compute backpropagation.
    You can return more than one variable with the following syntax:
        return var1, var2, ..., varN
    '''
    return 0

def delta_Last():
    '''
    Task: compute the error term for ONLY the output layer.
    '''
    return 0

def delta_el():
    '''
    Task: compute the error term for any hidden layer.
    '''
    ####BEGIN CODE HERE####

    return 0
    ####END CODE HERE####

def dW():
    '''
    Task: compute the gradient for any weight matrix.
    '''
    return 0

def db():
    '''
    Task: compute the gradient for any bias term.
    '''
    return 0

def weight_update():
    '''
    Task: update each of the weight matrices, and return them in a variable
    with a name of your choosing.
    '''
    return 0

def bias_update():
    '''
    Task: update each of the bias terms, and return them in a variable
    with a name of your choosing.
    '''
    return 0
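The fully connected part of backpropagation can be sanity-checked on dummy data before any convolution is involved. The sketch below assumes one ReLU hidden layer feeding a softmax output with the mean log cost, uses the N x M layout, and applies weight decay as an L2 term added to each weight gradient (biases are not decayed); PS3's exact weight-decay form may differ. With softmax and log cost, the output error term simplifies to A - Y. All names are hypothetical and distinct from the template functions, and the convolutional layers are not covered.

```python
import numpy as np


def one_hot(labels, num_classes):
    """(M,) integer labels -> (num_classes, M) one-hot matrix Y."""
    Y = np.zeros((num_classes, labels.size))
    Y[labels, np.arange(labels.size)] = 1.0
    return Y


def softmax_cols(Z):
    """Column-wise softmax; subtracting each column's max avoids overflow."""
    E = np.exp(Z - Z.max(axis=0, keepdims=True))
    return E / E.sum(axis=0, keepdims=True)


def cross_entropy(A, Y):
    """Mean log cost over the M columns (a small epsilon guards log(0))."""
    return -np.sum(Y * np.log(A + 1e-12)) / Y.shape[1]


def forward(X, params):
    """Return the intermediates backprop needs: Z1, A1 and the softmax output A2."""
    W1, b1, W2, b2 = params
    Z1 = W1 @ X + b1
    A1 = np.maximum(Z1, 0.0)                       # ReLU
    A2 = softmax_cols(W2 @ A1 + b2)
    return Z1, A1, A2


def gradients(X, Y, params):
    W1, b1, W2, b2 = params
    m = X.shape[1]
    Z1, A1, A2 = forward(X, params)
    delta2 = A2 - Y                                # output error term (softmax + log cost)
    delta1 = (W2.T @ delta2) * (Z1 > 0)            # hidden error term; ReLU' is 1 where Z1 > 0
    dW2 = delta2 @ A1.T / m
    db2 = delta2.sum(axis=1, keepdims=True) / m
    dW1 = delta1 @ X.T / m
    db1 = delta1.sum(axis=1, keepdims=True) / m
    return dW1, db1, dW2, db2


def update(params, grads, lr, weight_decay):
    """One gradient step; L2 weight decay on the weights only."""
    W1, b1, W2, b2 = params
    dW1, db1, dW2, db2 = grads
    return (W1 - lr * (dW1 + weight_decay * W1), b1 - lr * db1,
            W2 - lr * (dW2 + weight_decay * W2), b2 - lr * db2)


# Dummy data: 4 features, 5 data points, 3 classes.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5))
Y = one_hot(np.array([0, 1, 2, 1, 0]), 3)
params = (rng.standard_normal((6, 4)) * 0.5, np.zeros((6, 1)),
          rng.standard_normal((3, 6)) * 0.5, np.zeros((3, 1)))

costs = []
for _ in range(200):
    costs.append(cross_entropy(forward(X, params)[2], Y))
    params = update(params, gradients(X, Y, params), lr=0.1, weight_decay=1e-4)
print(costs[0], costs[-1])
```

A finite-difference check on a single weight (perturb it by a small epsilon in both directions and compare the cost difference with the analytic gradient) is a cheap way to confirm the same formulas in your own layers.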

TRAINING

Graded Exercise (10 points): Complete the training function below

In [4]: def train():
    '''
    IN ADDITION: please have the train function output a graph of the cost
    using matplotlib.pyplot.
    If everything works correctly, the cost should be monotonically decreasing.
    '''
    ####BEGIN CODE HERE####

    ####END CODE HERE####
    return 0

In [7]: '''
In the section below is where you should set up all your weights, biases,
parameters, and input/labels.
Then pass all the relevant parameters to train, and run and debug it.
Remember: no aspect of the network size should be hard coded except
the output layer (which should always have 10 nodes); sizes should be
adjustable via variables like the ones provided.
The size of your input layer will be dictated by which data set you
are using. Once again, that parameter should be expressed in terms of
the variables, not as a hard-coded size.
'''
output_nodes = 10
####BEGIN CODE HERE####

####END CODE HERE####

Testing

There are two steps to the testing process: first, we need to take the output of our network and make a decision (class 0, class 1, ..., or class 9), and second, we need to measure how well our network performs on the testing set.

Remember that our network outputs 10 probability values, all between 0 and 1, and we need to turn this vector of ten elements into a single output: the class the network predicts for that data point. Simply pick the index of the largest probability value as the decision.
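Picking the index of the largest probability in each column is a single `argmax` over the class axis, and accuracy is then the fraction of matching labels. A small sketch, with three classes instead of ten for readability; `argmax_decision` and `accuracy_of` are hypothetical names.

```python
import numpy as np


def argmax_decision(probs):
    """(num_classes, M) softmax outputs -> (M,) index of the largest value per column."""
    return np.argmax(probs, axis=0)


def accuracy_of(probs, labels):
    """Fraction of columns whose predicted class equals the true label."""
    return float(np.mean(argmax_decision(probs) == labels))


probs = np.array([[0.7, 0.1, 0.2],
                  [0.2, 0.8, 0.3],
                  [0.1, 0.1, 0.5]])   # 3 classes x 3 data points; columns sum to 1
labels = np.array([0, 1, 1])
print(argmax_decision(probs))          # [0 1 2]
print(accuracy_of(probs, labels))
```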

Graded Exercise (5 points total) - implement decision (1 point) and test (4 points)

In [6]: ####BEGIN CODE HERE####

def decision(prediction):
    '''
    input: a (10, M) matrix where each column is the softmax output for
    data point i, 0 <= i < M
    output: an (M, ) np array where each element corresponds to the
    highest-probability class from prediction
    '''
    return 0

def test():
    '''
    output: the accuracy of your model on the unseen x and y pairs
    '''
    accuracy = 0  # placeholder: replace with your computed accuracy
    return accuracy

####END CODE HERE####

In [ ]: '''
Here is some space for you to call and test your test/decision functions.
'''
####BEGIN CODE HERE####

####END CODE HERE####

Ensemble learning:

This is a fairly common technique for promoting regularization in a system. Simply put, you train multiple networks (of the same or different sizes/shapes) on the same data.

Then, you pass each of the networks a testing set and hold a "vote".

The voting procedure is: for each testing point, the class with the majority vote wins.

If none of your models agree on a class, then you just randomly pick from their decisions. Ties are also broken randomly.

To receive full credit for ensemble learning, simply train a minimum of 5 unique networks and correctly code a voting process.
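The voting rule above (the most-voted class wins; ties, including the case where every model picks a different class, are broken at random) can be sketched per test point as follows. The sketch assumes each trained network's decisions are stacked into a (num_models, M) integer array; `ensemble_vote` is a hypothetical name.

```python
import numpy as np


def ensemble_vote(decisions, rng):
    """decisions: (num_models, M) predicted classes, one row per network.
    Returns the (M,) voted classes. The most frequent class wins; ties,
    including the case where every model disagrees, are broken uniformly
    at random among the tied classes."""
    decisions = np.asarray(decisions)
    votes = np.empty(decisions.shape[1], dtype=decisions.dtype)
    for j in range(decisions.shape[1]):
        classes, counts = np.unique(decisions[:, j], return_counts=True)
        tied = classes[counts == counts.max()]
        votes[j] = rng.choice(tied)
    return votes


# Five hypothetical networks' decisions on three test points.
decisions = np.array([[1, 2, 3],
                      [1, 2, 4],
                      [0, 5, 5],
                      [1, 2, 6],
                      [9, 7, 7]])
print(ensemble_vote(decisions, np.random.default_rng(0)))
```

Here the first two points have clear winners (1 and 2), while on the third point all five networks disagree, so the result is drawn at random from {3, 4, 5, 6, 7}.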

In [ ]: ####BEGIN CODE HERE####
'''
Ensemble training and testing space
'''

####END CODE HERE####

Experimentation explanation:

What are you experimenting on? Any aspect of your network that is not learned by SGD and is not the number of layers is free for experimentation. Learning rate, number of training samples, epochs, nodes per layer, etc. are a few examples.

All I ask in this section is that you try to maximize your performance on the MNIST data set on at least 60 testing samples, and keep track of what you found below. The write-up can be as simple as: "The best I got with mnist was (accuracy) with these parameters: (list of parameters)". But ideally you would talk a bit more about the trends, such as "I noticed that if I kept decreasing my learning rate, the performance on X dataset would improve until a certain point, then would become worse".

There are a lot of ways to get full credit on this experimentation portion, as I am not requesting a dedicated format. There is no benchmark. There is no "you must get 100% accuracy". I simply ask you to see how well your models can do with the restrictions in place.

Best of luck, Ryan

Experimentation notes:

This cell has been left in text format for you to freely edit and keep track of your experiments.

In [ ]:
