
4/9/2024 21:03 FIT3181_DeepLearningAssignment1_Official file:///Users/yijiaxue/Downloads/FIT3181_DeepLearningAssignment1_Official.html

FIT3181: Deep Learning (2024)

CE/Lecturer (Clayton): Dr Trung Le | [email protected]
Lecturer (Clayton): Prof Dinh Phung | [email protected]
Lecturer (Malaysia): Dr Arghya Pal | [email protected]
Lecturer (Malaysia): Dr Lim Chern Hong | [email protected]
Head Tutor 3181: Miss Vy Vo | [email protected]
Head Tutor 5215: Dr Van Nguyen | [email protected]

Faculty of Information Technology, Monash University, Australia

Student Information
Surname: [Enter your surname here]
Firstname: [Enter your firstname here]
Student ID: [Enter your ID here]
Email: [Enter your email here]
Your tutorial time: [Enter your tutorial time here]

Deep Neural Networks

Due: 11:55pm Sunday, 8 September 2024

Important note: This is an individual assignment. It contributes 25% to your final mark. Read the assignment instructions carefully.

What to submit

This assignment is to be completed individually and submitted to the Moodle unit site. By the due date, you are required to submit one single zip file, named xxx_assignment01_solution.zip where xxx is your student ID, to the corresponding Assignment (Dropbox) in Moodle. You can use Google Colab to do Assignment 1, but you need to save it to an *.ipynb file to submit to the unit Moodle. More importantly, if you use Google Colab to do this assignment, you need to first make a copy of this notebook on your Google Drive.

For example, if your student ID is 123456, gather all of your assignment solution files into a folder, create a zip file named 123456_assignment01_solution.zip, and submit this file.

Within this zip file, you must submit the following files:
1. Assignment01_solution.ipynb: this is your Python notebook solution source file.
2. Assignment01_output.html: this is the output of your Python notebook solution exported in HTML format.
3. Any extra files or folders needed to complete your assignment (e.g., images used in your answers).

Since the notebook is quite big to load and work with, one recommended option is to split the solution into three parts and work on them separately. In that case, replace Assignment01_solution.ipynb by three notebooks: Assignment01_Part1_solution.ipynb, Assignment01_Part2_solution.ipynb and Assignment01_Part3_solution.ipynb.

You can run your code on Google Colab. In this case, you have to make a copy of your Google Colab notebook, including the traces and progress of model training, before submitting.

Part 1: Theory and Knowledge Questions
[Total marks for this part: 30 points]

The first part of this assignment is to demonstrate the knowledge in deep learning that you have acquired from the lecture and tutorial materials. Most of the contents in this part are drawn from the lectures and tutorials from weeks 1 to 4. Going through these materials before attempting this part is highly recommended.

**Question 1.1** Activation functions play an important role in modern deep neural networks.
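When deriving the derivative of an activation function by hand, a quick numerical check can catch algebra slips. The sketch below compares an analytic derivative against a central finite difference, using the logistic sigmoid purely as a stand-in activation. The function names, test points, and step size are illustrative assumptions and are not part of the assignment.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, used here only as a stand-in activation."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Analytic derivative: sigma'(x) = sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def central_difference(f, x: float, h: float = 1e-5) -> float:
    """Numerical derivative (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def max_grad_error(f, grad, points, h: float = 1e-5) -> float:
    """Largest absolute gap between the analytic and numerical derivative."""
    return max(abs(grad(x) - central_difference(f, x, h)) for x in points)


# Hypothetical test points on both sides of zero.
points = [-3.0, -1.0, -0.1, 0.5, 2.0]
error = max_grad_error(sigmoid, sigmoid_grad, points)
print(f"max abs error: {error:.2e}")
```

The same check can be pointed at any hand-derived derivative by swapping in the corresponding pair of functions. For piecewise activations, it is worth including test points on each side of the breakpoint.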
For each of the activation functions below, state its output range, find its derivative (show your steps), and plot the activation function and its derivative.

**(a)** Exponential linear unit (ELU):

$$\text{ELU}(x) = \begin{cases} 0.1\,(\exp(x) - 1) & \text{if } x \le 0 \\ x & \text{if } x > 0 \end{cases}$$

[1.5 points]

**(b)** Gaussian Error Linear Unit (GELU): $\text{GELU}(x) = x\,\Phi(x)$, where $\Phi(x)$ is the cumulative distribution function of the standard Gaussian distribution, or $\Phi(x) = P(X \le x)$ where $X \sim N(0, 1)$. In addition, the GELU activation function (the link for the main paper) has been widely used in state-of-the-art Vision Transformers (e.g., here is the link for the main ViT paper).

[1.5 points]

Write your answer here. You can add more cells if needed.

**Question 1.2:** Assume that we feed a data point $x$ with a ground-truth label $y = 2$ to the feed-forward neural network with the ReLU activation function as shown in the following figure.

**(a)** What is the numerical value of the latent representation $h^1(x)$? [1 point]

**(b)** What is the numerical value of the latent representation $h^2(x)$? [1 point]

**(c)** What is the numerical value of the logit $h^3(x)$? [1 point]

**(d)** What are the corresponding prediction probabilities $p(x)$? [1 point]

**(e)** What is the predicted label $\hat{y}$? Is it a correct or an incorrect prediction? Recall that $y = 2$. [1 point]

**(f)** What is the cross-entropy loss caused by the feed-forward neural network at $(x, y)$? Recall that $y = 2$. [1 point]

**(g)** Why is the cross-entropy loss caused by the feed-forward neural network at $(x, y)$ (i.e., $\text{CE}(1_y, p(x))$) always non-negative? When does this $\text{CE}(1_y, p(x))$ loss get the value $0$? Note that you need to answer this question for a general pair $(x, y)$ and a general feed-forward neural network with, for example, $M = 4$ classes. [1 point]

You must show both formulas and numerical results to earn full marks. Although it is optional, it is great if you show your PyTorch code for your computation.

**Question 1.3:** For Question 1.3, you have two options: (1) perform the forward propagation, backward propagation, and SGD update for one mini-batch (10 points), or (2) manually implement a feed-forward neural network that can work on real tabular datasets (20 points). You can choose either (1) or (2) to proceed.

**Option 1** [Total marks for this option: 10 points]

Assume that we are constructing a multilayered feed-forward neural network for a classification problem with three classes, where the model parameters will be generated randomly using your student ID. The architecture of this network is $3\,(\text{Input}) \to 5\,(\text{ELU}) \to 3\,(\text{Output})$, as shown in the following figure. Note that the ELU has the same formula as the one in Q1.1. We feed a batch $X$ with the labels $Y$ as shown in the figure. Answer the following questions. You need to show formulas, numerical results, and your PyTorch code for your computation to earn full marks.
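As a way to sanity-check hand computations for a softmax classifier, the following sketch computes softmax probabilities, the mean cross-entropy loss over a mini-batch, and the gradient of that loss with respect to the logits, which for softmax cross-entropy is $(p - 1_y)/B$ for batch size $B$. The logits and labels are made-up values rather than those of the figure, the helper names are illustrative, and averaging over the batch is an assumption that matches the default `reduction='mean'` of PyTorch's `CrossEntropyLoss`.

```python
import math


def softmax(logits):
    """Numerically stable softmax of one list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_batch(batch_logits, labels):
    """Return the mean CE loss and its gradient w.r.t. every logit.

    labels are 0-based class indices (an assumption of this sketch).
    """
    batch_size = len(batch_logits)
    loss = 0.0
    grads = []
    for logits, y in zip(batch_logits, labels):
        p = softmax(logits)
        loss += -math.log(p[y])
        # d(mean CE)/d(logit_k) = (p_k - 1[k == y]) / batch_size
        grads.append([(p_k - (1.0 if k == y else 0.0)) / batch_size
                      for k, p_k in enumerate(p)])
    return loss / batch_size, grads


# Hypothetical mini-batch of two samples with three classes.
batch_logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
labels = [0, 2]
loss, grads = cross_entropy_batch(batch_logits, labels)
print(f"loss = {loss:.4f}")
for row in grads:
    print([round(g, 4) for g in row])
```

Each gradient row sums to zero because the softmax probabilities sum to one. This gives a cheap consistency check before propagating the gradient further back through the weights and the hidden layer.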

In [ ]:
import torch
student_id = 1234  # insert your student id here, for example 1234
torch.manual_seed(student_id)

Out[ ]:

In [ ]:
#Code to generate random matrices and biases for W1, b1, W2, b2
#(network: 3 (Input) -> 5 (ELU) -> 3 (Output))

Forward propagation

**(a)** What is the value of $\bar{h}^1(x)$ (the pre-activation values of $h^1$)? [0.5 point]

In [ ]:
#Show your code

**(b)** What is the value of $h^1(x)$? [0.5 point]

In [ ]:
#Show your code

**(c)** What is the predicted value $\hat{y}$? [0.5 point]

In [ ]:
#Show your code

**(d)** Suppose that we use the cross-entropy (CE) loss. What is the value of the CE loss $l$ incurred by the mini-batch? [0.5 point]

In [ ]:
#Show your code

Backward propagation

**(e)** What are the derivatives $\frac{\partial l}{\partial h^2}$, $\frac{\partial l}{\partial W^2}$, and $\frac{\partial l}{\partial b^2}$? [3 points]

In [ ]:
#Show your code

**(f)** What are the derivatives $\frac{\partial l}{\partial h^1}$, $\frac{\partial l}{\partial \bar{h}^1}$, $\frac{\partial l}{\partial W^1}$, and $\frac{\partial l}{\partial b^1}$? [3 points]

In [ ]:
#Show your code

SGD update

**(g)** Assume that we use SGD with learning rate $\eta = 0.01$ to update the model parameters. What are the values of $W^2, b^2$ and $W^1, b^1$ after updating? [2 points]

In [ ]:
#Show your code

**Option 2** [Total marks for this option: 20 points]

In [ ]:
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

In Option 2, you need to implement a feed-forward NN manually using PyTorch and its auto-differentiation. We then manually train the model on the MNIST dataset. We first download the MNIST

dataset and preprocess it. Each data point has dimension [28,28]. We need to flatten it to a vector to input to our FFN.

In [ ]:
transform = transforms.Compose([
    transforms.ToTensor(),  # Convert the image to a tensor with shape [C, H, W]
    transforms.Normalize((0.5,), (0.5,)),  # Normalize to [-1, 1]
    transforms.Lambda(lambda x: x.view(28*28))  # Flatten the tensor to shape [-1,HW]
])

# Load the MNIST dataset
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
train_data, train_labels = train_dataset.data, train_dataset.targets
test_data, test_labels = test_dataset.data, test_dataset.targets
print(train_data.shape, train_labels.shape)
print(test_data.shape, test_labels.shape)

In [ ]:
train_dataset.data = train_data.data.view(-1, 28*28)
test_dataset.data = test_data.data.view(-1, 28*28)
train_data, train_labels = train_dataset.data, train_dataset.targets
test_data, test_labels = test_dataset.data, test_dataset.targets
print(train_data.shape, train_labels.shape)
print(test_data.shape, test_labels.shape)

In [ ]:
train_loader = DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)

Develop the feed-forward neural networks

(a) You need to develop the class MyLinear with the following skeleton. You need to declare the weight matrix and bias of this linear layer. [3 points]

In [ ]:
class MyLinear(torch.nn.Module):
    def __init__(self, input_size, output_size):
        """
        input_size: the size of the input
        output_size: the size of the output
        """
        #Your code here
        self.W =
        self.b =

    #forward propagation
    def forward(self, x): #x is a mini-batch
        #Your code here

(b) You need to develop the class MyFFN with the following skeleton. [7 points]

In [ ]:
class MyFFN(torch.nn.Module):

    def __init__(self, input_size, num_classes, hidden_sizes, act = torch.nn.ReLU()):
        """
        input_size: the size of the input
        num_classes: the number of classes
        act is the activation function
        hidden_sizes is the list of hidden sizes
        for example input_size = 3, hidden_sizes = [5, 7], num_classes = 4, and act = torch.nn.ReLU()
        means that we are building up a FFN with the configuration
        (3 (Input) -> 5 (ReLU) -> 7 (ReLU) -> 4 (Output))
        """
        super(MyFFN, self).__init__()
        self.input_size = input_size
        self.num_classes = num_classes
        self.act = act
        self.hidden_sizes = hidden_sizes
        self.num_layers = len(hidden_sizes) + 1

    def create_FFN(self):
        """
        This function creates the feed-forward neural network
        We stack many MyLinear layers
        """
        hidden_sizes = [self.input_size] + self.hidden_sizes + [self.num_classes]
        self.layers = []
        #Your code here

    def forward(self,x):
        """
        This implements the forward propagation of the batch x
        This needs to return the prediction probabilities of x
        """
        #Your code here

    def compute_loss(self, x, y):
        """
        This function computes the cross-entropy loss
        You can use the built-in CE loss of PyTorch
        """
        #Your code here

    def update_SGD(self, x, y, learning_rate = 0.01):
        """
        This function updates the model parameters using SGD using the batch (x,y)
        You need to implement the update rule manually and cannot rely on the built-in optimizer
        """
        #Your code here

    def update_SGDwithMomentum(self, x, y, learning_rate = 0.01, momentum = 0.9):
        """
        This function updates the model parameters using SGD with momentum using the batch (x,y)
        You need to implement the update rule manually and cannot rely on the built-in optimizer
        """
        #Your code here

    def update_AdaGrad(self, x, y, learning_rate = 0.01):
        """
        This function updates the model parameters using AdaGrad using the batch (x,y)
        You need to implement the update rule manually and cannot rely on the built-in optimizer
        """
        #Your code here

In [ ]:
myFFN = MyFFN(input_size = 28*28, num_classes = 10, hidden_sizes = [100, 100], act = torch.nn.ReLU())
myFFN.create_FFN()
print(myFFN)

(c) Write the code to evaluate the accuracy of the current myFFN model on a data loader (e.g., train_loader or test_loader). [2.5 points]

In [ ]:
def compute_acc(model, data_loader):
    """
    This function computes the accuracy of the model on a data loader
    """
    #Your code here

(c) Write the code to evaluate the loss of the current myFFN model on a data loader (e.g., train_loader or test_loader). [2.5 points]

In [ ]:
def compute_loss(model, data_loader):
    """
    This function computes the loss of the model on a data loader
    """
    #Your code here

Train on the MNIST

data for 50 epochs using update_SGD.

In [ ]:
num_epochs = 50
for epoch in range(num_epochs):
    for i, (x, y) in enumerate(train_loader):
        myFFN.update_SGD(x, y, learning_rate = 0.01)
    train_acc = compute_acc(myFFN, train_loader)
    train_loss = compute_loss(myFFN, train_loader)
    test_acc = compute_acc(myFFN, test_loader)
    test_loss = compute_loss(myFFN, test_loader)
    print(f"Epoch {epoch+1}/{num_epochs}, Train Loss: {train_loss:.4f}, Train Acc: {train_acc*100:.2f}%, Test Loss: {test_loss:.4f}, Test Acc: {test_acc*100:.2f}%")

(d) Implement the function update_SGDwithMomentum in the class and train the model with this optimizer for 50 epochs. You can update the corresponding function in the MyFFN class. [2.5 points]

In [ ]:
#Your code here

(e) Implement the function update_AdaGrad in the class and train the model with this optimizer for 50 epochs. You can update the corresponding function in the MyFFN class. [2.5 points]

In [ ]:
#Your code here

Part 2: Deep Neural Networks (DNN)
[Total marks for this part: 25 points]

The second part of this assignment is to demonstrate the basic knowledge in deep learning that you have acquired from the lecture and tutorial materials. Most of the contents in this part are drawn from the tutorials covered in weeks 1 to 2. Going through these materials before attempting this part is highly recommended.

In the second part of this assignment, you are going to work with the FashionMNIST dataset for an image recognition task. It has the exact same format as MNIST (70,000 grayscale images of 28 × 28 pixels each with 10 classes), but the images represent fashion items rather than handwritten digits, so each class is more diverse, and the problem is significantly more challenging than MNIST.

Load the Fashion MNIST using torchvision

In [ ]:
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
torch.manual_seed(1234)

In [ ]:
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
train_dataset_orgin = datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)
print(train_dataset_orgin.data.shape, train_dataset_orgin.targets.shape)
print(test_dataset.data.shape, test_dataset.targets.shape)
train_dataset_orgin.data = train_dataset_orgin.data.view(-1, 28*28)
test_dataset.data = test_dataset.data.view(-1, 28*28)
print(train_dataset_orgin.data.shape, train_dataset_orgin.targets.shape)
print(test_dataset.data.shape, test_dataset.targets.shape)
N = len(train_dataset_orgin)
print(f"Number of training samples: {N}")
N_train = int(0.9*N)
N_val = N - N_train
print(f"Number of training samples: {N_train}")
print(f"Number of validation samples: {N_val}")
train_dataset, val_dataset = torch.utils.data.random_split(train_dataset_orgin, [N_train, N_val])
# Train on the 90% split only, so the validation samples stay unseen
train_loader = DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(dataset=val_dataset, batch_size=64, shuffle=False)
test_loader = DataLoader(dataset=test_dataset, batch_size=1000, shuffle=False)

torch.Size([60000, 28, 28]) torch.Size([60000])
torch.Size([10000, 28, 28]) torch.Size([10000])
torch.Size([60000, 784]) torch.Size([60000])
torch.Size([10000, 784]) torch.Size([10000])
Number of training samples: 18827
Number of training samples: 16944
Number of validation samples: 1883

**Question 2.1:** Write the code to visualize a mini-batch in train_loader, including its images and labels. [5 points]

In [ ]:
#Your code here

**Question 2.2:** Write the code for the feed-forward neural net using PyTorch [5 points]

We now develop a feed-forward neural network with the architecture $784 \to 40\,(\text{ReLU}) \to 30\,(\text{ReLU}) \to 10\,(\text{softmax})$. You can choose your own way to implement your network and an optimizer of interest. You should train the model for $50$ epochs and evaluate the trained model on the test set.

In [ ]:
#Your code here

**Question 2.3:** Tuning hyper-parameters with grid search [5 points]

Assume that you need to tune the number of neurons on the first and second hidden layers, $n_1 \in \{20, 40\}$ and $n_2 \in \{20, 40\}$, and the activation function $act \in \{\text{sigmoid}, \text{tanh}, \text{relu}\}$. The network has the architecture pattern $784 \to n_1\,(act) \to n_2\,(act) \to 10\,(\text{softmax})$, where $n_1$, $n_2$, and $act$ are in their grids. Write the code to tune the hyper-parameters $n_1$, $n_2$, and $act$. Note that you can freely choose the optimizer and learning rate of interest for this task.

In [ ]:
#Your code here

**Question 2.4:** Implement the loss with the form $\text{loss}(p, y) = \text{CE}(1_y, p) + \lambda H(p)$, where $H(p) = -\sum_{i=1}^{M} p_i \log p_i$ is the entropy of $p$, $p$ is the prediction probabilities of a data point $x$ with the ground-truth label $y$, $1_y$ is a one-hot label, and $\lambda > 0$ is a trade-off parameter. Set $\lambda = 0.1$ to train a model. [5 points]

In [ ]:
#Your code here

**Question 2.5:** Experimenting with sharpness-aware minimization technique [5 points]

Sharpness-aware minimization (SAM) (i.e., link for main paper from Google Research) is a simple yet efficient technique to improve the generalization ability of deep learning models on unseen data examples. In your research or your work, you might potentially use this idea. Your task is to read the paper and implement sharpness-aware minimization (SAM). Finally, you need to apply SAM to the best architecture found in Question 2.3.

In [ ]:
#Your code here

Part 3: Convolutional Neural Networks and Image Classification
[Total marks for this part: 45 points]

The third part of this assignment is to demonstrate the basic knowledge in deep learning that you have acquired from the lecture and tutorial materials. Most of the contents in this part are drawn from the tutorials covered in weeks 3 to 6. Going through these materials before attempting this part is highly recommended.

The dataset used for this part is a specific dataset for this unit consisting of approximately $10,000$ images of $20$ classes of Animals, each of which has approximately 500 images. You can download the dataset at download here if you want to do your assignment on your machine.

In [ ]:
import os
import requests
import tarfile
import time
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, random_split
import torchvision.models as models
import torch.nn as nn
import torch
import PIL.Image
import pathlib
from torchsummary import summary
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np

# check if CUDA is available
train_on_gpu = torch.cuda.is_available()
if not train_on_gpu:
    print('CUDA is not available.  Training on CPU ...')
else:
    print('CUDA is available!  Training on GPU ...')
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.manual_seed(1234)

CUDA is not available.  Training on CPU ...

Out[ ]:

Download the dataset to the folder of this Google Colab.

In [ ]:
!gdown --fuzzy https://drive.google.com/file/d/1aEkxNWaD02Z8ZNvZzeMefUoY97C-3wTG/view?usp=drive_link
# !gdown --fuzzy https://drive.google.com/file/d/1qdElRqDS4TitXfv_iG_TFQSi9QIfy4uM/view?usp=drive_link # backup url

Downloading...
From (original): https://drive.google.com/uc?id=1aEkxNWaD02Z8ZNvZzeMefUoY97C-3wTG
From (redirected): https://drive.google.com/uc?id=1aEkxNWaD02Z8ZNvZzeMefUoY97C-3wTG&confirm=t&uuid=c6914688-3441-4198-b7a3-ee55f3c61869
To: /content/Animals_Dataset.zip
100% 643M/643M [00:13<00:00, 48.0MB/s]

We unzip the dataset to the folder.

In [ ]:
!unzip -q Animals_Dataset.zip

In [ ]:
data_dir = "./FIT5215_Dataset"
# We resize the images to [3,64,64]
transform = transforms.Compose([transforms.Resize((64,64)),  # resizes the image to the input size our model expects
                                transforms.RandomHorizontalFlip(),  # Flips the image w.r.t. the horizontal axis
                                #Rotates the image to a specified angle
                                #transforms.RandomAffine(0, shear=10, scale=(0.8,1.2)), #Performs actions like zooms, change shear angles.
                                transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),  # Set the color params
                                transforms.ToTensor(),  # convert the image to tensor so that it can work with torch
                                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # Normalize the images, each R,G,B value is normalized with mean=0.5 and std=0.5
                                ])

# Load the dataset using torchvision.datasets.ImageFolder and apply transformations
dataset = datasets.ImageFolder(data_dir, transform=transform)

# Split the dataset into training and validation sets
train_size = int(0.9 * len(dataset))
valid_size = len(dataset) - train_size
train_dataset, val_dataset = random_split(dataset, [train_size, valid_size])

# Example of DataLoader creation for training and validation
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)
print("Number of instance in train_set: %s" % len(train_dataset))
print("Number of instance in val_set: %s" % len(val_dataset))

Number of instance in train_set: 8519
Number of instance in val_set: 947

In [ ]:
class_names = ['bird', 'bottle', 'bread', 'butterfly', 'cake', 'cat', 'chicken', 'cow', 'dog', 'duck',

               'elephant', 'fish', 'handgun', 'horse', 'lion', 'lipstick', 'seal', 'snake', 'spider', 'vase']

In [ ]:
# obtain one batch of training images
dataiter = iter(train_loader)
images, labels = next(dataiter)
images = images.numpy() # convert images to numpy for display

In [ ]:
import math

def imshow(img):
    img = img / 2 + 0.5  # unnormalize
    plt.imshow(np.transpose(img, (1, 2, 0)))  # convert from Tensor image

def visualize_data(images, categories, images_per_row = 8):
    class_names = ['bird', 'bottle', 'bread', 'butterfly', 'cake', 'cat', 'chicken', 'cow', 'dog', 'duck',
                   'elephant', 'fish', 'handgun', 'horse', 'lion', 'lipstick', 'seal', 'snake', 'spider', 'vase']
    n_images = len(images)
    n_rows = math.ceil(float(n_images)/images_per_row)
    fig = plt.figure(figsize=(1.5*images_per_row, 1.5*n_rows))
    for i in range(n_images):
        plt.subplot(n_rows, images_per_row, i+1)
        plt.axis('off')
        imshow(images[i])
        class_index = categories[i]
        plt.title(class_names[class_index])
    plt.show()

In [ ]:
visualize_data(images, labels)

For questions 3.1 to 3.7, you'll need to write your own model in a way that makes it easy for you to experiment with different architectures and parameters. The goal is to be able to pass the parameters to initialize a new instance of YourModel to build different network architectures with different parameters. Below are descriptions of some parameters for YourModel:

1. Block configuration: Our network consists of many blocks. Each block has the pattern [conv, batch norm, activation, conv, batch norm, activation, max pool, dropout]. All convolutional layers have filter size $(3, 3)$, strides $(1, 1)$ and padding = 1, and all max pool layers have strides $(2, 2)$, kernel size $2$, and padding = 0. The network will consist of a few blocks before applying a linear layer to output the logits for the softmax layer.
2. list_feature_maps: the number of feature maps in the blocks of the network. For example, if list_feature_maps = [16, 32, 64], our network has two blocks, and the numbers of input channels (feature maps) are 16, 32, and 64 respectively.
3. drop_rate: the drop probability for dropout. Setting drop_rate to $0.0$ means not using dropout.
4. batch_norm: whether batch normalization is used or not. Setting batch_norm to false means not using batch normalization.
5. use_skip: whether the skip connection is used in the blocks or not. Setting this to true means that we use a 1x1 Conv2D with strides=2 for the skip connection.
6. At the end, you need to apply global average pooling (GAP) (AdaptiveAvgPool2d((1, 1))) to flatten the 3D output tensor before defining the output linear layer for predicting the labels.

Here is the model configuration of YourCNN if list_feature_maps = [16, 32, 64] and batch_norm = true.

**Question 3.1:** You need to implement the aforementioned CNN.

First, you need to implement the block of our CNN in the class YourBlock. You can ignore use_skip and the skip connection for simplicity. However, in that case you cannot earn full marks for this question. [6 points]

In [ ]:
#Your code here
class YourBlock(nn.Module):

    def __init__(self, in_feature_maps, out_feature_maps, drop_rate = 0.2, batch_norm = True, use_skip = True):
        super(YourBlock, self).__init__()
        self.use_skip = use_skip
        #Your code here

    def forward(self, x):
        #Write your code here

Second, you need to use the above YourBlock to implement the class YourCNN. [6 points]

In [ ]:
class YourCNN(nn.Module):
    def __init__(self, list_feature_maps = [16, 32, 64], drop_rate = 0.2, batch_norm= True, use_skip = True):
        super(YourCNN, self).__init__()
        layers = []
        #Write your code here

    def forward(self, x):
        #Write your code here

We declare my_cnn from YourCNN as follows.

In [ ]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
my_cnn = YourCNN(list_feature_maps = [16, 32, 64], use_skip = True)
my_cnn = my_cnn.to(device)
print(my_cnn)

YourCNN(

  (block): ModuleList(
    (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): YourBlock(
      (block): ModuleList(
        (0): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): ReLU(inplace=True)
        (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
        (7): Dropout(p=0.2, inplace=False)
      )
      (skip_conv): Conv2d(16, 32, kernel_size=(1, 1), stride=(2, 2))
    )
    (4): YourBlock(
      (block): ModuleList(
        (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): ReLU(inplace=True)
        (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
        (7): Dropout(p=0.2, inplace=False)
      )
      (skip_conv): Conv2d(32, 64, kernel_size=(1, 1), stride=(2, 2))
    )
    (5): AdaptiveAvgPool2d(output_size=(1, 1))
    (6): Flatten(start_dim=1, end_dim=-1)
    (7): Linear(in_features=64, out_features=20, bias=True)
  )
)

We declare the optimizer and the loss function.

In [ ]:
# Loss and optimizer
learning_rate = 0.001
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(my_cnn.parameters(), lr=learning_rate)

Here is the code to compute the loss and accuracy, followed by the code to train our model.

In [ ]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

def compute_loss(model, loss_fn, loader):
    loss = 0
    # Set model to eval mode for inference
    model.eval()
    with torch.no_grad():  # No need to track gradients for validation
        for (batchX, batchY) in loader:
            # Move data to the same device as the model
            batchX, batchY = batchX.to(device).type(torch.float32), batchY.to(device).type(torch.long)
            loss += loss_fn(model(batchX), batchY)
    # Set model back to train mode
    model.train()
    return float(loss)/len(loader)

In [ ]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

def compute_acc(model, loader):

    correct = 0
    totals = 0
    # Set model to eval mode for inference
    model.eval()
    for (batchX, batchY) in loader:
        # Move batchX and batchY to the same device as the model
        batchX, batchY = batchX.to(device).type(torch.float32), batchY.to(device)
        outputs = model(batchX)  # feed batch to the model
        totals += batchY.size(0)  # accumulate totals with the current batch size
        predicted = torch.argmax(outputs.data, 1)  # get the predicted class
        # predicted and batchY are on the same device, so they can be compared directly
        correct += (predicted == batchY).sum().item()
    return correct / totals

In [ ]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

def fit(model= None, train_loader = None, valid_loader= None, optimizer = None,

        num_epochs = 50, verbose = True, seed= 1234):
    torch.manual_seed(seed)
    # Move the model to the device before initializing the optimizer
    model.to(device)  # Move the model to the GPU if one is available
    if optimizer is None:
        optim = torch.optim.Adam(model.parameters(), lr = 0.001)  # Now initialize optimizer with model on the device
    else:
        optim = optimizer
    history = dict()
    history['val_loss'] = list()
    history['val_acc'] = list()
    history['train_loss'] = list()
    history['train_acc'] = list()
    for epoch in range(num_epochs):
        model.train()  # compute_acc leaves the model in eval mode
        for (X, y) in train_loader:
            # Move input data to the same device as the model
            X,y = X.to(device), y.to(device)
            # Forward pass
            outputs = model(X.type(torch.float32))  # X is already on the correct device
            loss = loss_fn(outputs, y.type(torch.long))
            # Backward and optimize
            optim.zero_grad()
            loss.backward()
            optim.step()
        #losses and accuracies for epoch
        val_loss = compute_loss(model, loss_fn, valid_loader)
        val_acc = compute_acc(model, valid_loader)
        train_loss = compute_loss(model, loss_fn, train_loader)
        train_acc = compute_acc(model, train_loader)
        history['val_loss'].append(val_loss)
        history['val_acc'].append(val_acc)
        history['train_loss'].append(train_loss)
        history['train_acc'].append(train_acc)
        if not verbose:  #verbose = False means we show the training information during training
            print(f"Epoch {epoch+1}/{num_epochs}")
            print(f"train loss= {train_loss:.4f} - train acc= {train_acc*100:.2f}% - valid loss= {val_loss:.4f} - valid acc= {val_acc*100:.2f}%")

    return history

In [ ]:
history = fit(model=my_cnn, train_loader=train_loader, valid_loader=val_loader, optimizer=optimizer, num_epochs=10, verbose=False)

Epoch 1/10
train loss= 2.2247 - train acc= 29.49% - valid loss= 2.3166 - valid acc= 29.88%
Epoch 2/10
train loss= 1.9459 - train acc= 38.62% - valid loss= 2.2254 - valid acc= 34.42%
Epoch 3/10
train loss= 1.7757 - train acc= 42.18% - valid loss= 1.8783 - valid acc= 42.13%
Epoch 4/10
train loss= 1.6908 - train acc= 44.52% - valid loss= 1.7666 - valid acc= 40.55%
Epoch 5/10
train loss= 1.6232 - train acc= 47.31% - valid loss= 1.8377 - valid acc= 44.56%
Epoch 6/10
train loss= 1.5740 - train acc= 48.37% - valid loss= 1.7418 - valid acc= 47.41%
Epoch 7/10
train loss= 1.4314 - train acc= 53.27% - valid loss= 1.7969 - valid acc= 52.06%
Epoch 8/10
train loss= 1.3645 - train acc= 54.60% - valid loss= 1.4301 - valid acc= 52.16%
Epoch 9/10
train loss= 1.3372 - train acc= 55.76% - valid loss= 1.4144 - valid acc= 53.64%
Epoch 10/10
train loss= 1.2766 - train acc= 57.69% - valid loss= 1.5284 - valid acc= 54.28%

**Question 3.2:** Now, let us tune the number of blocks,

$\text{use\_skip} \in \{\text{true}, \text{false}\}$, and $\text{learning\_rate} \in \{0.001, 0.0005\}$. Write your code for this tuning and report the result of the best model on the testing set. Note that you need to show your code for tuning and evaluating on the test set to earn the full marks. During tuning, you can set the instance variable verbose of your model to True for not showing the training details of each epoch. Note that for this question, depending on your computational resource, you can choose list_feature_maps= [32, 64] or list_feature_maps= [16, 32, 64]. [3 points]

Please note that if you struggle in implementing the aforementioned CNN, you can use the MiniVGG network in our labs for doing the following questions. However, you cannot earn any mark for 3.1 and 3.2.

In [ ]: #Your code here

**Question 3.3:** Exploring Data Mixup Technique for Improving Generalization Ability. [4 points]

Data mixup is another super-simple technique used to boost the generalization ability of deep learning models. You need to incorporate the data mixup technique into the above deep learning model and experiment with its performance. There are some papers and documents for data mixup as follows: the main paper for data mixup (link for main paper) and a good article (article link). You need to extend your model developed above, train a model using data mixup, and write your observations and comments about the result.

In [ ]: #Your code here

**Question 3.4:** Exploring CutMix Technique for Improving Generalization Ability. [4 points]

CutMix is another super-simple technique used to boost the generalization ability of deep learning models. You need to incorporate the CutMix technique into the above deep learning model and experiment with its performance. There are some papers and documents for CutMix as follows: the main paper for CutMix (link for main paper) and a good article (article link). You need to extend your model developed above, train a model using CutMix, and write your observations and comments about the result.

In [ ]: #Your code here

**Question 3.5:** Implement the one-versus-all (OVA) loss. The details are as follows: You need to apply the sigmoid activation function

to logits $h = [h_1, h_2, \dots, h_M]$ instead of the softmax activation function as usual to obtain $p = [p_1, p_2, \dots, p_M]$, meaning that $p_i = \mathrm{sigmoid}(h_i),\ i = 1, \dots, M$. Note that $M$ is the number of classes. Given a data example $x$ with the ground-truth label $y$, the idea is to maximize the likelihood $p_y$ and to minimize the likelihoods $p_i,\ i \neq y$. Therefore, the objective function is to find the model parameters to $\max\{\log p_y + \sum_{i \neq y} \log(1 - p_i)\}$ or equivalently $\min\{-\log p_y - \sum_{i \neq y} \log(1 - p_i)\}$. For example, if $M = 3$ and $y = 2$, the objective is $\min\{-\log(1 - p_1) - \log p_2 - \log(1 - p_3)\}$. Compare the model trained with the OVA loss and the same model trained with the standard cross-entropy loss. [4 points]

In [ ]: #Your code here

**Question 3.6:** Attack your best obtained model with PGD attacks with $\epsilon = 0.0313, k = 20, \eta = 0.002$ on the testing set. Write the code for the attacks and report the robust accuracies. Also choose a random set of 20 clean images in the testing set and visualize the original and attacked images. [4 points]

In [ ]: #Your code here

**Question 3.7:** Train a robust model using adversarial training with PGD $\epsilon = 0.0313, k = 10, \eta = 0.002$. Write the code for the adversarial training and report the robust accuracies. After finishing the training, you need to store your best robust model in the folder ./models and load the model to evaluate the robust accuracies for PGD and FGSM attacks with $\epsilon = 0.0313, k = 20, \eta = 0.002$ on the testing set. [4 points]

In [ ]: #Your code here

**Question 3.8 (Kaggle competition)** [10 points]

You can reuse the best model obtained in this assignment or develop new models to evaluate on the testing set of the FIT3181/5215 Kaggle competition. However, to gain any points for this question, your testing accuracy must exceed the accuracy threshold from a base model developed by us as shown in the leader board of the competition. The marks for this question are as follows:
If you are in top 10% of your cohort, you gain 10 points.
If you are in top 20% of your cohort, you gain 8 points.
If you are in top 30% of your cohort, you gain 6 points.
If you beat our base model, you gain 4 points.

END OF ASSIGNMENT
GOOD LUCK WITH YOUR ASSIGNMENT 1!
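The OVA objective in Question 3.5 can be sketched in PyTorch as follows. This is only one possible formulation, not the official solution: the function name `ova_loss` is hypothetical, labels are assumed to be 0-indexed class indices (so $y = 2$ in the example corresponds to index 1), and the per-example losses are averaged over the batch, which the question does not specify. It relies on the fact that binary cross-entropy against a one-hot target gives $-\log p_y$ for the true class and $-\log(1 - p_i)$ for every other class.

```python
import torch
import torch.nn.functional as F

def ova_loss(logits, targets):
    """One-versus-all loss: -log p_y - sum_{i != y} log(1 - p_i), averaged over the batch.

    logits:  float tensor of shape (batch, M), the raw outputs h
    targets: long tensor of shape (batch,), 0-indexed class labels (assumption)
    """
    one_hot = F.one_hot(targets, num_classes=logits.size(1)).to(logits.dtype)
    # The sigmoid is applied inside this call in a numerically stable way
    per_class = F.binary_cross_entropy_with_logits(logits, one_hot, reduction="none")
    # Sum over the M classes, then average over the examples in the batch
    return per_class.sum(dim=1).mean()

# Worked example from the question: M = 3 and y = 2 (index 1 when counting from 0)
logits = torch.tensor([[0.5, 2.0, -1.0]])
targets = torch.tensor([1])
p = torch.sigmoid(logits[0])
manual = -(torch.log(1 - p[0]) + torch.log(p[1]) + torch.log(1 - p[2]))
print(ova_loss(logits, targets).item(), manual.item())
```

Under these assumptions the function has the same call shape as a standard classification loss (logits first, integer labels second), so it could be used where `loss_fn` is expected when comparing against cross-entropy.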



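For the data mixup technique in Question 3.3, a minimal sketch is given below. It assumes the common batch-level variant in which each example is blended with a randomly chosen partner from the same batch using one mixing coefficient drawn from a Beta distribution. The helper names `mixup_batch` and `mixup_loss` and the default `alpha=0.4` are illustrative choices, not values taken from the assignment.

```python
import torch
import torch.nn.functional as F

def mixup_batch(x, y, alpha=0.4):
    """Blend every example in the batch with a randomly chosen partner."""
    # lam ~ Beta(alpha, alpha) is the weight kept for the original example
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0), device=x.device)
    mixed_x = lam * x + (1.0 - lam) * x[perm]
    # Return both label sets so the loss can be mixed with the same weight
    return mixed_x, y, y[perm], lam

def mixup_loss(logits, y_a, y_b, lam):
    """Weighted sum of the cross-entropy losses against both label sets."""
    return lam * F.cross_entropy(logits, y_a) + (1.0 - lam) * F.cross_entropy(logits, y_b)

# Stand-in data; in training, logits would come from model(mixed_x)
x = torch.rand(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))
mixed_x, y_a, y_b, lam = mixup_batch(x, y)
logits = torch.randn(8, 10)
print(mixed_x.shape, lam, mixup_loss(logits, y_a, y_b, lam).item())
```

Inside a training loop, the two helpers would replace the plain forward pass and loss computation for each batch; validation and test batches are normally left unmixed.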

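The PGD attack used in Questions 3.6 and 3.7 can be sketched as below, with the question's $\epsilon$, $k$ and $\eta$ as defaults. Several points are assumptions rather than requirements of the assignment: the attack is untargeted under the $L_\infty$ norm, inputs are assumed to lie in $[0, 1]$, a random start inside the $\epsilon$-ball is used, and cross-entropy is the loss being maximized. The function name `pgd_attack` is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.0313, k=20, eta=0.002):
    """Untargeted L-infinity PGD; assumes pixel values lie in [0, 1]."""
    model.eval()
    x = x.detach()
    # Random start inside the eps-ball (a common choice, assumed here)
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0)
    for _ in range(k):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        # Step in the direction that increases the loss
        x_adv = x_adv.detach() + eta * grad.sign()
        # Project back onto the eps-ball around x and the valid pixel range
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv.detach()

# Toy stand-in model and batch, only to show the call
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
x = torch.rand(4, 3, 8, 8)
y = torch.randint(0, 10, (4,))
x_adv = pgd_attack(toy_model, x, y, k=5)
print((x_adv - x).abs().max().item())
```

Robust accuracy would then be the ordinary accuracy computed on `pgd_attack(model, x, y)` instead of `x`. With `k=1`, `eta=eps` and no random start, the same loop reduces to an FGSM-style single step.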