# FIT3181: Assignment 1

is your student ID, to the corresponding Assignment (Dropbox) in Moodle. You can use Google Colab to do Assignment 1, but you need to save it to an *.ipynb

$$\text{ELU}(x) = \begin{cases} 0.1\,(\exp(x) - 1) & \text{if } x \le 0 \\ x & \text{if } x > 0 \end{cases}$$

[1.5 points]

**(b)** Gaussian Error Linear Unit (GELU):

$$\text{GELU}(x) = x\,\Phi(x)$$

where $\Phi(x)$ is the cumulative distribution function of the standard Gaussian distribution, or $\Phi(x) = P(X \le x)$ where $X \sim N(0, 1)$. In addition, the GELU activation function (the link for the main paper) has been widely used in the state-of-the-art Vision Transformers (e.g., here is the link for the main ViT paper). [1.5 points]

Write your answer here. You can add more cells if needed.

**Question 1.2:** Assume that we feed a data point $x$ with a ground-truth label $y = 2$ to the feed-forward neural network with the ReLU activation function as shown in the following figure.

4/9/2024 21:03 FIT3181_DeepLearningAssignment1_Official, page 4/26, file:///Users/yijiaxue/Downloads/FIT3181_DeepLearningAssignment1_Official.html

**(a)** What is the numerical value of the latent representation $h_1(x)$? [1 point]

**(b)** What is the numerical value of the latent representation $h_2(x)$? [1 point]

**(c)** What is the numerical value of the logit $h_3(x)$? [1 point]

**(d)** What are the corresponding prediction probabilities $p(x)$? [1 point]

**(e)** What is the predicted label $\hat{y}$? Is it a correct or an incorrect prediction? Recall that $y = 2$. [1 point]

**(f)** What is the cross-entropy loss caused by the feed-forward neural network at $(x, y)$? Recall that $y = 2$. [1 point]

**(g)** Why is the cross-entropy loss caused by the feed-forward neural network at $(x, y)$ (i.e., $CE(1_y, p(x))$) always non-negative? When does this $CE(1_y, p(x))$ loss get the value $0$? Note that you need to answer this question for a general pair $(x, y)$ and a general feed-forward neural network with, for example, $M = 4$ classes. [1 point]

You must show both formulas and numerical results to earn full marks. Although it is optional, it is great if you show your PyTorch code for your computation.

**Question 1.3:** For Question 1.3, you have two options: (1) perform the forward propagation, backward propagation, and SGD update for one mini-batch (10 points), or (2) manually implement a feed-forward neural network that can work on real tabular datasets (20 points). You can choose either (1) or (2) to proceed.

**Option 1** [Total marks for this option: 10 points]

Assume that we are constructing a multilayered feed-forward neural network for a classification problem with three classes where the model parameters will be generated randomly using your student ID. The architecture of this network is $3 (\text{Input}) \to 5 (\text{ELU}) \to 3 (\text{Output})$ as shown in the following figure. Note that the ELU has the same formula as the one in Q1.1. We feed a batch $X$ with the labels $Y$ as shown in the figure. Answer the following questions. You need to show both formulas, numerical results, and your PyTorch code for your computation to earn full marks.
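As a quick reference for the definitions used in Questions 1.1 to 1.3, here is a minimal sketch in plain Python (standard library only, not PyTorch) of ELU with the factor 0.1, GELU via the standard normal CDF, softmax, and the cross-entropy of a one-hot label. The logits and the 0-based class index are illustrative assumptions and are not taken from the assignment figures.

```python
import math

def elu(x, alpha=0.1):
    """ELU with the factor used in Q1.1: alpha * (exp(x) - 1) for x <= 0, x otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def gelu(x):
    """GELU(x) = x * Phi(x), where Phi is the standard normal CDF."""
    phi = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return x * phi

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """CE(1_y, p) = -log p_y, with `label` a 0-based class index (an assumption here)."""
    return -math.log(probs[label])

# Hypothetical logits for a three-class problem (not from the assignment).
logits = [1.0, 2.0, 0.5]
probs = softmax(logits)
loss = cross_entropy(probs, label=1)
```

Because the loss reduces to $-\log p_y$ for a one-hot label, only the probability assigned to the true class enters the result.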

Forward propagation

In [ ]:

    import torch
    student_id = 1234  # insert your student ID here, for example 1234
    torch.manual_seed(student_id)

In [ ]:

    # Code to generate random matrices and biases for W1, b1, W2, b2
    # of the 3 (Input) → 5 (ELU) → 3 (Output) network

**(a)** What is the value of $\bar{h}_1(x)$ (the pre-activation values of $h_1$)? [0.5 point]

In [ ]:

    #Show your code

**(b)** What is the value of $h_1(x)$? [0.5 point]

In [ ]:

    #Show your code

**(c)** What is the predicted value $\hat{y}$? [0.5 point]

In [ ]:

    #Show your code

**(d)** Suppose that we use the cross-entropy (CE) loss. What is the value of the CE loss $l$ incurred by the mini-batch $(X, Y)$? [0.5 point]

In [ ]:

    #Show your code

Backward propagation

**(e)** What are the derivatives $\frac{\partial l}{\partial h_2}$, $\frac{\partial l}{\partial W^2}$, and $\frac{\partial l}{\partial b^2}$? [3 points]

In [ ]:

    #Show your code

**(f)** What are the derivatives $\frac{\partial l}{\partial h_1}$, $\frac{\partial l}{\partial \bar{h}_1}$, $\frac{\partial l}{\partial W^1}$, and $\frac{\partial l}{\partial b^1}$? [3 points]

In [ ]:

    #Show your code

SGD update

**(g)** Assume that we use SGD with learning rate $\eta = 0.01$ to update the model parameters. What are the values of $W^2, b^2$ and $W^1, b^1$ after updating? [2 points]

In [ ]:

    #Show your code

**Option 2** [Total marks for this option: 20 points]

In [ ]:

    import torch
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

In Option 2, you need to implement a feed-forward NN manually using PyTorch and auto-differentiation of PyTorch. We then manually train the model on the MNIST dataset. We first download the MNIST

dataset and preprocess it. Each data point has dimension [28,28]. We need to flatten it to a vector to input to our FFN.

In [ ]:

    transform = transforms.Compose([
        transforms.ToTensor(),  # Convert the image to a tensor with shape [C, H, W]
        transforms.Normalize((0.5,), (0.5,)),  # Normalize to [-1, 1]

Develop the feed-forward neural networks

**(a)** You need to develop the class MyLinear with the following skeleton. You need to declare the weight matrix and bias of this linear layer. [3 points]

In [ ]:

    class MyLinear(torch.nn.Module):
        def __init__(self, input_size, output_size):
            """
            input_size: the size of the input
            output_size: the size of the output
            """
            super().__init__()
            self.W = None  # Your code here: declare the weight matrix
            self.b = None  # Your code here: declare the bias

        #forward propagation
        def forward(self, x): #x is a mini-batch
            #Your code here
            pass

with the following skeleton [7 points]

In [ ]:

    class MyFFN(torch.nn.Module):

        def __init__(self, input_size, num_classes, hidden_sizes, act = torch.nn.ReLU()):
            """
            input_size: the size of the input
            num_classes: the number of classes
            act is the activation function
            hidden_sizes is the list of hidden sizes
            for example input_size = 3, hidden_sizes = [5, 7], num_classes = 4, and act = torch.nn.ReLU()
            means that we are building up a FFN with the configuration
            (3 (Input) -> 5 (ReLU) -> 7 (ReLU) -> 4 (Output))
            """
            super(MyFFN, self).__init__()
            self.input_size = input_size
            self.num_classes = num_classes
            self.act = act
            self.hidden_sizes = hidden_sizes
            self.num_layers = len(hidden_sizes) + 1

        def create_FFN(self):
            """
            This function creates the feed-forward neural network
            We stack many MyLinear layers
            """
            hidden_sizes = [self.input_size] + self.hidden_sizes + [self.num_classes]
            self.layers = []

        def forward(self,x):
            """
            This implements the forward propagation of the batch x
            This needs to return the prediction probabilities of x
            """

        def compute_loss(self, x, y):
            """
            This function computes the cross-entropy loss
            You can use the built-in CE loss of PyTorch
            """

        def update_SGD(self, x, y, learning_rate = 0.01):
            """
            This function updates the model parameters using SGD using the batch (x,y)
            You need to implement the update rule manually and cannot rely on the built-in optimizer
            """

        def update_SGDwithMomentum(self, x, y, learning_rate = 0.01, momentum = 0.9):
            """
            This function updates the model parameters using SGD with momentum using the batch (x,y)
            You need to implement the update rule manually and cannot rely on the built-in optimizer
            """
            #Your code here

In [ ]:

    # act is passed as an instance, matching the default argument torch.nn.ReLU()
    myFFN = MyFFN(input_size = 28*28, num_classes = 10, hidden_sizes = [100, 100], act = torch.nn.ReLU())
    myFFN.create_FFN()
    print(myFFN)

**(c)** Write the code to evaluate the accuracy of the current myFFN model on a data loader (e.g., train_loader or test_loader). [2.5 points]

In [ ]:

    def compute_acc(model, data_loader):
        """
        This function computes the accuracy of the model on a data loader
        """

**(c)** Write the code to evaluate the loss of the current myFFN

In [ ]:

    """
    This function computes the loss of the model on a data loader
    """

data with 50 epochs using update_SGD.

In [ ]:

    num_epochs = 50
    for epoch in range(num_epochs):
        for i, (x, y) in enumerate(train_loader):
            myFFN.update_SGD(x, y, learning_rate = 0.01)
        # train_loss, train_acc, test_loss, and test_acc are evaluated once per epoch
        print(f"Epoch {epoch+1}/{num_epochs}, Train Loss: {train_loss:.4f}, Train Acc: {train_acc*100:.2f}%, Test Loss: {test_loss:.4f}, Test Acc: {test_acc*100:.2f}%")

**(d)** Implement the function update_SGDwithMomentum in the class and train the model with this optimizer in 50 epochs. You can update the corresponding function in the MyFFN class. [2.5 points]

In [ ]:

    #Your code here

In [ ]:

    #Your code here

**Part 2: Deep Neural Networks (DNN)** [Total marks for this part: 25 points]

The second part of this assignment is to demonstrate the basic knowledge in deep learning that you have acquired from the lecture and tutorial materials. Most of the contents in this assignment are drawn from the tutorials covered from weeks 1 to 2. Going through these materials before attempting this assignment is highly recommended.

In the second part of this assignment, you are going to work with the FashionMNIST dataset for an image recognition task. It has the exact same format as MNIST (70,000 grayscale images of 28 × 28 pixels each with 10 classes), but the images represent fashion items rather than handwritten digits, so each class is more diverse, and the problem is significantly more challenging than MNIST.

Load the Fashion MNIST using torchvision

    torch.Size([60000, 28, 28]) torch.Size([60000])
    torch.Size([10000, 28, 28]) torch.Size([10000])
    torch.Size([60000, 784]) torch.Size([60000])
    torch.Size([10000, 784]) torch.Size([10000])
    Number of training samples: 18827
    Number of training samples: 16944
    Number of validation samples: 1883

**Question 2.1:** Write the code to visualize a mini-batch in train_loader

epochs and evaluate the trained model on the test set.

**Question 2.3:** Tuning hyper-parameters with grid search [5 points]

Assume that you need to tune the number of neurons on the first and second hidden layers , and the used activation function . The network has the architecture pattern where , and are in their grids. Write the code to tune the hyper-parameters , and . Note that you can freely choose the optimizer and learning rate of interest for this task.

**Question 2.4:** Implement the loss with the form: where is the entropy of , is the prediction probabilities of a data point with the ground-truth label , is a one-hot label, and is a trade-off parameter. Set images of
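For the grid search in Question 2.3, the following is a minimal sketch of exhaustively trying every combination of hyper-parameters and keeping the best one. The grid values, the names `n1`, `n2`, and `act`, and the `dummy_evaluate` scoring function are all hypothetical placeholders; in practice the scoring function would train a model with the given configuration and return its validation accuracy.

```python
from itertools import product

def grid_search(grid, evaluate):
    """Try every combination in `grid` and return (best_config, best_score)."""
    names = list(grid)
    best_config, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        score = evaluate(config)  # e.g. validation accuracy of a trained model
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

# Hypothetical grids; the assignment's actual grids are not reproduced here.
grid = {"n1": [20, 40], "n2": [20, 40], "act": ["sigmoid", "tanh", "relu"]}

def dummy_evaluate(config):
    """Stand-in for 'train the network and return its validation accuracy'."""
    bonus = {"sigmoid": 0.0, "tanh": 0.01, "relu": 0.02}[config["act"]]
    return 0.5 + 0.001 * config["n1"] + 0.002 * config["n2"] + bonus

best_config, best_score = grid_search(grid, dummy_evaluate)
```

Selecting on a validation score rather than the test score keeps the test set untouched until the final evaluation.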

In [ ]:

    if not torch.cuda.is_available():
        print('CUDA is not available.  Training on CPU ...')
    else:
        print('CUDA is available!  Training on GPU ...')
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    torch.manual_seed(1234)

Training on CPU ...

In [ ]:

    !gdown --fuzzy https://drive.google.com/file/d/1aEkxNWaD02Z8ZNvZzeMefUoY97C-3wTG/view?usp=drive_link
    # !gdown --fuzzy https://drive.google.com/file/d/1qdElRqDS4TitXfv_iG_TFQSi9QIfy4uM/view?usp=drive_link # backup url

In [ ]:

    !unzip -q Animals_Dataset.zip

In [ ]:

    from torch.utils.data import random_split

    data_dir = "./FIT5215_Dataset"
    # We resize the images to [3,64,64]
    transform = transforms.Compose([
        transforms.Resize((64,64)),  # resizes the image so it fits our model
        transforms.RandomHorizontalFlip(),  # flips the image horizontally
        #transforms.RandomRotation(4),  # rotates the image by a specified angle
        #transforms.RandomAffine(0, shear=10, scale=(0.8,1.2)),  # performs actions like zooms, change shear angles
        transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),  # set the color params
        transforms.ToTensor(),  # convert the image to tensor so that it can work with torch
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # each R,G,B value is normalized with mean=0.5 and std=0.5
    ])
    # Load the dataset using torchvision.datasets.ImageFolder and apply transformations
    dataset = datasets.ImageFolder(data_dir, transform=transform)
    # Split the dataset into training and validation sets
    train_size = int(0.9 * len(dataset))
    valid_size = len(dataset) - train_size
    train_dataset, val_dataset = random_split(dataset, [train_size, valid_size])
    # Example of DataLoader creation for training and validation
    train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)
    print("Number of instance in train_set: %s" % len(train_dataset))
    print("Number of instance in val_set: %s" % len(val_dataset))

Number of instance in train_set: 8519
Number of instance in val_set: 947

In [ ]:

    class_names = ['bird', 'bottle', 'bread', 'butterfly', 'cake', 'cat', 'chicken', 'cow', 'dog', 'duck',
                   'elephant', 'fish', 'handgun', 'horse', 'lion', 'lipstick', 'seal', 'snake', 'spider', 'vase']

In [ ]:

    # obtain one batch of training images
    dataiter = iter(train_loader)
    images, labels = next(dataiter)
    images = images.numpy() # convert images to numpy for display

In [ ]:

    import math
    import numpy as np
    import matplotlib.pyplot as plt

    def imshow(img):
        img = img / 2 + 0.5  # unnormalize
        plt.imshow(np.transpose(img, (1, 2, 0)))  # convert from Tensor image

    def visualize_data(images, categories, images_per_row = 8):
        class_names = ['bird', 'bottle', 'bread', 'butterfly', 'cake', 'cat', 'chicken', 'cow', 'dog', 'duck',
                       'elephant', 'fish', 'handgun', 'horse', 'lion', 'lipstick', 'seal', 'snake', 'spider', 'vase']
        n_images = len(images)
        n_rows = math.ceil(float(n_images)/images_per_row)
        fig = plt.figure(figsize=(1.5*images_per_row, 1.5*n_rows))
        fig.patch.set_facecolor('white')
        for i in range(n_images):
            plt.subplot(n_rows, images_per_row, i+1)
            plt.xticks([])
            plt.yticks([])
            imshow(images[i])
            class_index = categories[i]
            plt.xlabel(class_names[class_index])
        plt.show()

In [ ]:

    visualize_data(images, labels)

For questions 3.1 to 3.7, you'll need to write your own model in a way that makes it easy for you to experiment with different architectures and parameters. The goal is to be able to pass the parameters to initialize a new instance of YourModel to build different network architectures with different parameters. Below are descriptions of some parameters for YourModel:

1. Block configuration: Our network consists of many blocks. Each block has the pattern [conv, batch norm, activation, conv, batch norm,

activation, max pool, dropout]. All convolutional layers have filter size $(3, 3)$, strides $(1, 1)$ and padding = 1, and all max pool layers have strides $(2, 2)$, kernel size $2$, and padding = 0. The network will consist of a few blocks before applying a linear layer to output the logits for the softmax layer.
2. list_feature_maps: the number of feature maps in the blocks of the network. For example, if list_feature_maps = [16, 32, 64], our network has two blocks, with the input_channels or numbers of feature maps being 16, 32, and 64, respectively.
3. drop_rate: the drop probability for dropout. Setting drop_rate to $0.0$ means not using dropout.
4. batch_norm: whether batch normalization is used. Setting batch_norm to false means not using batch normalization.
5. use_skip: whether the skip connection is used in the blocks. Setting this to true means that we use 1x1 Conv2D with strides=2 for the skip connection.
6. At the end, you need to apply global average pooling (GAP) (AdaptiveAvgPool2d((1, 1))) to flatten the 3D output tensor before defining the output linear layer for predicting the labels.

Here is the model configuration of YourCNN if list_feature_maps = [16, 32, 64] and batch_norm = true.

**Question 3.1:** You need to implement the aforementioned CNN.

First, you need to implement the block of our CNN in the class YourBlock. You can ignore use_skip and the skip connection for simplicity. However, in that case you cannot earn full marks for this question. [6 points]

In [ ]:

    #Your code here
    class YourBlock(nn.Module):
        def __init__(self, in_feature_maps, out_feature_maps, drop_rate = 0.2, batch_norm = True, use_skip = True):
            super(YourBlock, self).__init__()
            self.use_skip = use_skip

        def forward(self, x):
            pass  # Your code here
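To see why a 1x1 convolution with stride 2 is a natural choice for the skip connection, here is a small plain-Python sketch of the spatial-size arithmetic for one block (two 3x3 convolutions with stride 1 and padding 1, then a 2x2 max pool with stride 2 and padding 0). The helper names are hypothetical, and the formula assumes dilation 1 and floor rounding, which are the PyTorch defaults.

```python
def conv_out_size(size, kernel, stride, padding):
    """Output height/width of a conv or pooling layer (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1

def block_out_size(size):
    """Spatial size after one block: conv 3x3, conv 3x3, max pool 2x2."""
    size = conv_out_size(size, kernel=3, stride=1, padding=1)  # first conv keeps the size
    size = conv_out_size(size, kernel=3, stride=1, padding=1)  # second conv keeps the size
    size = conv_out_size(size, kernel=2, stride=2, padding=0)  # max pool halves the size
    return size

def skip_out_size(size):
    """Spatial size after the 1x1 convolution with stride 2 on the skip path."""
    return conv_out_size(size, kernel=1, stride=2, padding=0)

# For 64x64 inputs, both paths produce 32x32 maps, so they can be added element-wise.
main_path = block_out_size(64)
skip_path = skip_out_size(64)
```

The two paths also need the same number of channels, which is why the skip convolution maps the block's input feature maps to its output feature maps.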

to implement the class YourCNN. [6 points]

In [ ]:

    class YourCNN(nn.Module):
        def __init__(self, list_feature_maps = [16, 32, 64], drop_rate = 0.2, batch_norm= True, use_skip = True):
            super(YourCNN, self).__init__()
            layers = []

        def forward(self, x):
            #Write your code here
            pass

We declare my_cnn from YourCNN as follows.

In [ ]:

    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    my_cnn = YourCNN(list_feature_maps = [16, 32, 64], use_skip = True)
    my_cnn = my_cnn.to(device)
    print(my_cnn)

    YourCNN(

      (block): ModuleList(
        (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): YourBlock(
          (block): ModuleList(
            (0): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU(inplace=True)
            (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (5): ReLU(inplace=True)
            (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
            (7): Dropout(p=0.2, inplace=False)
          )
          (skip_conv): Conv2d(16, 32, kernel_size=(1, 1), stride=(2, 2))
        )
        (4): YourBlock(
          (block): ModuleList(
            (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU(inplace=True)
            (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (5): ReLU(inplace=True)
            (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
            (7): Dropout(p=0.2, inplace=False)
          )
          (skip_conv): Conv2d(32, 64, kernel_size=(1, 1), stride=(2, 2))
        )
        (6): Flatten(start_dim=1, end_dim=-1)
        (7): Linear(in_features=64, out_features=20, bias=True)
      )
    )

We declare the optimizer and the loss function.

In [ ]:

    # Loss and optimizer
    learning_rate = 0.001
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(my_cnn.parameters(), lr=learning_rate)

Here are the codes to compute the loss and accuracy, followed by the code to train our model.

In [ ]:

    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    def compute_loss(model, loss_fn, loader):

        loss = 0
        # Set model to eval mode for inference
        model.eval()
        # No need to track gradients for validation
        with torch.no_grad():
            for batchX, batchY in loader:
                # Move data to the same device as the model
                batchX, batchY = batchX.to(device).type(torch.float32), batchY.to(device).type(torch.long)
                loss += loss_fn(model(batchX), batchY)
        # Set model back to train mode
        model.train()
        return float(loss)/len(loader)

In [ ]:

    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    def compute_acc(model, loader):

        correct = 0
        totals = 0
        # Set model to eval mode for inference
        model.eval()
        with torch.no_grad():
            for batchX, batchY in loader:
                # Move batchX and batchY to the same device as the model
                batchX, batchY = batchX.to(device).type(torch.float32), batchY.to(device)
                outputs = model(batchX)  # feed batch to the model
                totals += batchY.size(0)  # accumulate totals with the current batch size
                predicted = torch.argmax(outputs.data, 1)  # get the predicted class
                # predicted and batchY are on the same device, so they can be compared directly
                correct += (predicted == batchY).sum().item()
        return correct / totals

In [ ]:

    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    def fit(model= None, train_loader = None, valid_loader= None, optimizer = None,

            num_epochs = 50, verbose = True, seed= 1234):
        torch.manual_seed(seed)
        # Move the model to the device before initializing the optimizer
        model.to(device)
        if optimizer is None:
            optim = torch.optim.Adam(model.parameters(), lr = 0.001)  # initialize the optimizer with the model already on the device
        else:
            optim = optimizer
        history = dict()
        history['val_loss'] = list()
        history['val_acc'] = list()
        history['train_loss'] = list()
        history['train_acc'] = list()
        for epoch in range(num_epochs):
            model.train()
            for X, y in train_loader:
                # Move input data to the same device as the model
                X, y = X.to(device), y.to(device)
                # Forward pass
                outputs = model(X.type(torch.float32))
                loss = loss_fn(outputs, y.type(torch.long))
                # Backward and optimize
                optim.zero_grad()
                loss.backward()
                optim.step()
            # losses and accuracies for epoch
            val_loss = compute_loss(model, loss_fn, valid_loader)
            val_acc = compute_acc(model, valid_loader)
            train_loss = compute_loss(model, loss_fn, train_loader)
            train_acc = compute_acc(model, train_loader)
            history['val_loss'].append(val_loss)
            history['val_acc'].append(val_acc)
            history['train_loss'].append(train_loss)
            history['train_acc'].append(train_acc)
            if verbose:  # verbose = True means we do show the training information during training
                print(f"Epoch {epoch+1}/{num_epochs}")
                print(f"train loss= {train_loss:.4f} - train acc= {train_acc*100:.2f}% - valid loss= {val_loss:.4f} - valid acc= {val_acc*100:.2f}%")
        return history

In [ ]:

    history = fit(model= my_cnn, train_loader=train_loader, valid_loader = val_loader, optimizer = optimizer, num_epochs= 10, verbose = True)

    Epoch 1/10
    train loss= 2.2247 - train acc= 29.49% - valid loss= 2.3166 - valid acc= 29.88%
    Epoch 2/10
    train loss= 1.9459 - train acc= 38.62% - valid loss= 2.2254 - valid acc= 34.42%
    Epoch 3/10
    train loss= 1.7757 - train acc= 42.18% - valid loss= 1.8783 - valid acc= 42.13%
    Epoch 4/10
    train loss= 1.6908 - train acc= 44.52% - valid loss= 1.7666 - valid acc= 40.55%
    Epoch 5/10
    train loss= 1.6232 - train acc= 47.31% - valid loss= 1.8377 - valid acc= 44.56%
    Epoch 6/10
    train loss= 1.5740 - train acc= 48.37% - valid loss= 1.7418 - valid acc= 47.41%
    Epoch 7/10
    train loss= 1.4314 - train acc= 53.27% - valid loss= 1.7969 - valid acc= 52.06%
    Epoch 8/10
    train loss= 1.3645 - train acc= 54.60% - valid loss= 1.4301 - valid acc= 52.16%
    Epoch 9/10
    train loss= 1.3372 - train acc= 55.76% - valid loss= 1.4144 - valid acc= 53.64%
    Epoch 10/10
    train loss= 1.2766 - train acc= 57.69% - valid loss= 1.5284 - valid acc= 54.28%

**Question 3.2:** Now, let us tune the number of blocks,

and . Write your code for this tuning and report the result of the best model on the testing set. Note that you need to show your code for tuning and evaluating on the test set to earn full marks. During tuning, you can set the parameter verbose for not showing the training details of each epoch. Note that for this question, depending on your computational resources, you can choose list_feature_maps= [32, 64] or list_feature_maps= [16, 32,

to logits function as usual to obtain , meaning that . Note that is the number of classes. Given a data example with the ground-truth label , the idea is to maximize the likelihood and to minimize the likelihoods . Therefore, the objective function is to find the model parameters to or equivalently . For example, if and , you need to minimize . Compare the model trained with the OVA loss and the same model trained with the standard cross-entropy loss. [4 points]

**Question 3.6:** Attack your best obtained model with PGD attacks with on the testing set. Write the code for the attacks and report the robust accuracies. Also choose a random set of 20 clean images in the testing set and visualize the original and attacked images. [4 points]

**Question 3.7:** Train a robust model using adversarial training with PGD . Write the code for the adversarial training and report the robust accuracies. After finishing the training, you need to store your best robust model in the folder ./models and load the model to evaluate the robust accuracies for PGD and FGSM attacks with
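To illustrate the idea behind the PGD attack mentioned in Questions 3.6 and 3.7, here is a minimal plain-Python sketch of an $L_\infty$ PGD loop on a tiny logistic-regression model, where the input gradient is available in closed form. Everything in it is a hypothetical stand-in: the weights, the input, and the values of `epsilon`, `step_size`, and `num_steps` are not the ones required by the assignment, and a real attack on the CNN would obtain the gradient through PyTorch autograd and would typically also clip to the valid pixel range.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_grad_x(w, b, x, y):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = sigmoid(z)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad = [(p - y) * wi for wi in w]
    return loss, grad

def sign(v):
    return (v > 0) - (v < 0)

def pgd_attack(w, b, x, y, epsilon, step_size, num_steps):
    """Repeat: ascend the loss along the gradient sign, then project into the epsilon-ball."""
    x_adv = list(x)
    for _ in range(num_steps):
        _, grad = loss_and_grad_x(w, b, x_adv, y)
        x_adv = [xa + step_size * sign(g) for xa, g in zip(x_adv, grad)]
        # Projection: keep every coordinate within epsilon of the clean input.
        x_adv = [min(max(xa, xi - epsilon), xi + epsilon) for xa, xi in zip(x_adv, x)]
    return x_adv

# Hypothetical model, input, and attack settings.
w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.2], 1
epsilon = 0.3
x_adv = pgd_attack(w, b, x, y, epsilon=epsilon, step_size=0.1, num_steps=10)
clean_loss, _ = loss_and_grad_x(w, b, x, y)
adv_loss, _ = loss_and_grad_x(w, b, x_adv, y)
```

Calling `pgd_attack` with `num_steps=1` and `step_size=epsilon` reduces this loop to a single signed-gradient step, which is the FGSM attack.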