Assignment 1

Object Detection with SSD

Total points: 100 + 5 points

Overview

In this assignment, you will be implementing SSD (Single Shot MultiBox Detector), a

type of object detection network. Similar to YOLO, SSD divides the image into grids

and outputs bounding boxes in each cell. However, SSD is more complex as it divides

grids at different scales and uses default bounding boxes as anchors.

It is worth noting that the SSD you will be implementing in this assignment is a

simplified version. If there are any discrepancies between the instructions provided

in this assignment and the SSD paper, please follow the instructions provided in the

assignment. Additionally, you can learn more about the task by reading the SSD

paper.

CMPT 742: Practices in visual computing I

A1

1 Introduction (0 Points)

The main difference between YOLO and SSD is the default bounding boxes. In YOLO,

we divide the image into 5x5 grid cells. Each cell has its confidence, or probabilities for

each class (P(cat), P(dog), P(person), P(background)). For SSD, each cell will provide

some default bounding boxes. In this assignment, each cell will provide four default

bounding boxes as follows:

Figure 1: Default bounding boxes of cell [1,1]

As you can see, whereas YOLO has at most 25 bounding boxes, a 5x5 SSD grid will

have 25×4 = 100 bounding boxes. Furthermore, we will introduce grids at multiple

scales. In YOLO, we only have one 5x5 grid in the output layer, but for SSD in this

assignment, we will have four output branches: 10x10, 5x5, 3x3 and 1x1, a total of 135

cells and 540 default bounding boxes.
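The counts above can be checked with a few lines of arithmetic. This is only a sanity check; the variable names are illustrative and not part of the provided code.

```python
# Grid resolutions of the four output branches, as listed in the text.
layers = [10, 5, 3, 1]
boxes_per_cell = 4

num_cells = sum(n * n for n in layers)   # 100 + 25 + 9 + 1
num_boxes = num_cells * boxes_per_cell

print(num_cells, num_boxes)  # 135 540
```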

We aim to match ground truth bounding boxes to some of our pre-determined

default bounding boxes. This will ensure that the default bounding boxes are

accurate estimates of the ground truth bounding box. We can then use these default

bounding boxes as "anchors" to predict the relative positions and sizes of objects in the

image. That is, we do not need to predict the absolute width and height as in YOLO;

we can just predict the relative positions and sizes with respect to the default bounding

box, as shown in the above figure, ∆(cx, cy, w, h). Overall, the main challenge in this

assignment (SSD) compared to the tutorial (YOLO) is how to generate the default

bounding boxes and how to assign ground truth objects to those default bounding

boxes.

2 Workspace Initialization (0 Points)

Download the dataset from here and extract it in the data/ directory.

3 Generating default bounding boxes (10 Points)

As mentioned, in this assignment you are going to work on (10×10 + 5×5 + 3×3 +

1×1)×4 = 540 default bounding boxes. The first step is to generate them.

You need to complete the function default_box_generator in dataset.py. The

function takes a series of parameters and eventually outputs a 540×8 array, storing 540

bounding boxes. The last dimension 8 means the 8 attributes of each default bounding

box:

[x_center, y_center, box_width, box_height, x_min, y_min, x_max, y_max]

Note that here, all values are absolute positions and sizes. For example, if a default

bounding box has a center (x=0.3, y=0.4), width=0.1 and height=0.2, the attributes

you need to store are:

[0.3, 0.4, 0.1, 0.2, 0.25, 0.3, 0.35, 0.5]

You need to generate 4 default bounding boxes for each grid cell in each grid (10, 5, 3,

1), using the provided scales large_scale and small_scale to determine the sizes of the

default bounding boxes. There may be bounding boxes exceeding the image boundary,

therefore you may clip them so that the bounding boxes stay inside the image.

For example, consider filling the default boxes for the first cell. The size of the grid is

10 since layers[0]=10. The scale of the three large boxes is lsize = large_scale[0] = 0.2.

The scale of the one small box is ssize = small_scale[0] = 0.1. The first cell is (0,0) in

a 10x10 cell grid.

For each cell:

Generate a box with width and height [ssize, ssize].

Generate a box with width and height [lsize, lsize].

Generate a box with width and height [lsize*sqrt(2), lsize/sqrt(2)].

Generate a box with width and height [lsize/sqrt(2), lsize*sqrt(2)].

All four boxes above are centered at the center of the first cell, (0.5/10, 0.5/10). The

four boxes you get for the first cell are:

[0.05, 0.05, 0.1, 0.1, 0, 0, 0.1, 0.1]

[0.05, 0.05, 0.2, 0.2, -0.05, -0.05, 0.15, 0.15]

[0.05, 0.05, 0.28, 0.14, -0.09, -0.02, 0.19, 0.12]

[0.05, 0.05, 0.14, 0.28, -0.02, -0.09, 0.12, 0.19]

The four boxes after clipping (optional) are:

[0.05, 0.05, 0.1, 0.1, 0, 0, 0.1, 0.1]

[0.05, 0.05, 0.2, 0.2, 0, 0, 0.15, 0.15]

[0.05, 0.05, 0.28, 0.14, 0, 0, 0.19, 0.12]

[0.05, 0.05, 0.14, 0.28, 0, 0, 0.12, 0.19]

You need to create boxes for all cells in a similar manner. You can modify the box
sizes in large_scale and small_scale to see if you get better results when training
the network.
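The procedure above can be sketched as follows. This is a minimal sketch, not the reference solution: the exact signature, the clip flag, the row-by-row cell order, and every scale value other than large_scale[0] = 0.2 and small_scale[0] = 0.1 are assumptions made for illustration.

```python
import numpy as np

def default_box_generator(layers, large_scale, small_scale, clip=False):
    """Return an array of shape [sum(n*n for n in layers) * 4, 8]."""
    boxes = []
    for n, lsize, ssize in zip(layers, large_scale, small_scale):
        # Width/height pairs of the four default boxes placed in every cell.
        sizes = [
            (ssize, ssize),
            (lsize, lsize),
            (lsize * np.sqrt(2), lsize / np.sqrt(2)),
            (lsize / np.sqrt(2), lsize * np.sqrt(2)),
        ]
        # Assumption: cells are visited row by row (row outer, column inner).
        for row in range(n):
            for col in range(n):
                cx, cy = (col + 0.5) / n, (row + 0.5) / n
                for w, h in sizes:
                    boxes.append([cx, cy, w, h,
                                  cx - w / 2, cy - h / 2,
                                  cx + w / 2, cy + h / 2])
    boxes = np.array(boxes, dtype=np.float32)
    if clip:
        # Optional step: clip only the corner coordinates to the image,
        # as in the clipped example in the text.
        boxes[:, 4:] = np.clip(boxes[:, 4:], 0.0, 1.0)
    return boxes

# Only the first entry of each scale list comes from the text;
# the remaining scale values are placeholders.
boxes = default_box_generator([10, 5, 3, 1],
                              [0.2, 0.4, 0.6, 0.8],
                              [0.1, 0.3, 0.5, 0.7])
clipped = default_box_generator([10, 5, 3, 1],
                                [0.2, 0.4, 0.6, 0.8],
                                [0.1, 0.3, 0.5, 0.7], clip=True)
print(boxes.shape)            # (540, 8)
print(np.round(boxes[2], 2))  # third box of the first cell
```

Whatever cell order you choose here must be the same order your network uses when it concatenates its output branches.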

4 Assigning ground truth objects to default bounding boxes (15 Points)

This part is done in the dataloader in dataset.py. You need to read the image and
ground truth bounding boxes, and then return the resized and transposed image, the
ground truth probabilities ann_confidence, and the ground truth bounding boxes
ann_box. ann_confidence is 540x4, since we have 540 default boxes and 4 classes
(cat, dog, person, background). ann_confidence should be 540 one-hot vectors.
ann_box is 540x4, since we have 4 attributes for a bounding box:

[relative_center_x, relative_center_y, relative_width, relative_height]

The attributes are all relative to the default bounding box as follows:

Suppose we have a default box [px,py,pw,ph], which are exactly the first four
attributes of our generated default bounding boxes. Also, we have a ground truth
object bounding box [gx,gy,gw,gh]. The 4 attributes we need for ann_box are
[tx,ty,tw,th], the relative positions and sizes of the ground truth bounding box
with respect to the default bounding box. If you need to recover the ground truth
bounding box, or to show the predicted bounding box of your network, you can do an
inverse process as follows, where [dx,dy,dw,dh] are the predicted relative attributes
from your network.
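The encoding and its inverse can be sketched as below. The formulas themselves are not reproduced in this text, so the sketch assumes the usual SSD parameterization without variance scaling: tx = (gx - px)/pw, ty = (gy - py)/ph, tw = log(gw/pw), th = log(gh/ph). Check it against the formulas in the handout before relying on it. The function names encode_box and decode_box are hypothetical.

```python
import numpy as np

def encode_box(default_box, gt_box):
    """Ground truth [gx, gy, gw, gh] relative to default box [px, py, pw, ph]."""
    px, py, pw, ph = default_box
    gx, gy, gw, gh = gt_box
    return np.array([(gx - px) / pw,      # tx
                     (gy - py) / ph,      # ty
                     np.log(gw / pw),     # tw
                     np.log(gh / ph)])    # th

def decode_box(default_box, rel_box):
    """Inverse of encode_box: recover an absolute [cx, cy, w, h] box."""
    px, py, pw, ph = default_box
    dx, dy, dw, dh = rel_box
    return np.array([pw * dx + px,
                     ph * dy + py,
                     pw * np.exp(dw),
                     ph * np.exp(dh)])

default_box = [0.05, 0.05, 0.2, 0.2]
gt_box = [0.08, 0.06, 0.25, 0.15]
t = encode_box(default_box, gt_box)
recovered = decode_box(default_box, t)   # equals gt_box up to rounding
```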

How do we determine whether a default bounding box is carrying an object? Unlike
YOLO, this time you need to assign one object to multiple default boxes. We say that
a default box is carrying an object if the ground truth bounding box of this object has
an IOU greater than a threshold (0.5 in this assignment) with the default box. There
may be cases where the ground truth box does not have sufficient overlap with any of
the default boxes. In that case, we assign the object to the default box that has the
largest IOU with the object bounding box, to make sure at least one default bounding
box is used for each object.

You need to implement a function match to process each bounding box and update

the entries of ann_confidence and ann_box. You only need to fill in the bounding box

attributes for boxes that carry objects. For empty boxes, you can ignore them since

they are not used when training the network.
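A possible shape for match is sketched below. It is only a sketch under stated assumptions: the argument list is hypothetical, background is assumed to be class index 3, and the relative attributes use the usual SSD parameterization (center offsets divided by the default box size, log of the size ratios), which you should check against the handout's formulas.

```python
import numpy as np

def iou(boxs_default, x_min, y_min, x_max, y_max):
    """IOU between every default box (columns 4..7 are its corners) and one box."""
    inter_w = np.maximum(np.minimum(boxs_default[:, 6], x_max)
                         - np.maximum(boxs_default[:, 4], x_min), 0.0)
    inter_h = np.maximum(np.minimum(boxs_default[:, 7], y_max)
                         - np.maximum(boxs_default[:, 5], y_min), 0.0)
    inter = inter_w * inter_h
    area_default = ((boxs_default[:, 6] - boxs_default[:, 4])
                    * (boxs_default[:, 7] - boxs_default[:, 5]))
    area_gt = (x_max - x_min) * (y_max - y_min)
    return inter / np.maximum(area_default + area_gt - inter, 1e-8)

def match(ann_box, ann_confidence, boxs_default, threshold, cat_id,
          x_min, y_min, x_max, y_max):
    """Update ann_box and ann_confidence in place for one ground truth box."""
    ious = iou(boxs_default, x_min, y_min, x_max, y_max)
    matched = ious > threshold
    # Guarantee at least one default box per object. If any box passed the
    # threshold, the best box is already among them, so this changes nothing.
    matched[np.argmax(ious)] = True

    gx, gy = (x_min + x_max) / 2, (y_min + y_max) / 2
    gw, gh = x_max - x_min, y_max - y_min
    px, py, pw, ph = boxs_default[matched, :4].T
    ann_box[matched] = np.stack([(gx - px) / pw, (gy - py) / ph,
                                 np.log(gw / pw), np.log(gh / ph)], axis=1)
    ann_confidence[matched] = 0
    ann_confidence[matched, cat_id] = 1   # one-hot vector for the object class

# Three hand-written default boxes keep the example small.
boxs_default = np.array([
    [0.25, 0.25, 0.5, 0.5, 0.0, 0.0, 0.5, 0.5],
    [0.75, 0.75, 0.5, 0.5, 0.5, 0.5, 1.0, 1.0],
    [0.50, 0.50, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0],
])
ann_box = np.zeros((3, 4))
ann_confidence = np.zeros((3, 4))
ann_confidence[:, 3] = 1   # every box starts as background (assumed index 3)
match(ann_box, ann_confidence, boxs_default, 0.5, 1, 0.05, 0.05, 0.5, 0.5)
print(ann_confidence)
```

In this example only the first default box has an IOU above 0.5 with the object (0.81), so it is the only one whose entries change.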

Other important things to do:

1. You should split the dataset into 90% training and 10% validation. You can use

all images for the training of course, but you will not know whether your network has

overfitted without a test or validation set. Your model may have very good performance

on the training set, but have very poor performance on the testing set. Keep in mind

that both training accuracy and testing accuracy are considered when grading your

assignment.

2. Data augmentation. Usually the networks are trained on millions of images. It is

uncommon to train an object detection network on only 6000 images. But we have

to reduce the data size so you can finish training within 2 hours. Therefore you have

a high chance of overfitting on the training set. You will need to augment the data to

mitigate that.
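Both points can be sketched with NumPy alone. The helper names split_indices and horizontal_flip are hypothetical, the 6000 is simply the image count mentioned above, and a horizontal flip is just one example of an augmentation that must also transform the bounding boxes.

```python
import numpy as np

def split_indices(num_images, train_fraction=0.9, seed=0):
    """Shuffle image indices once and split them into train and validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_images)
    num_train = int(round(num_images * train_fraction))
    return order[:num_train], order[num_train:]

def horizontal_flip(image, boxes):
    """image: [H, W, C]; boxes: rows of normalized [x_min, y_min, x_max, y_max]."""
    flipped = image[:, ::-1].copy()
    new_boxes = boxes.copy()
    new_boxes[:, 0] = 1.0 - boxes[:, 2]   # new x_min comes from the old x_max
    new_boxes[:, 2] = 1.0 - boxes[:, 0]   # new x_max comes from the old x_min
    return flipped, new_boxes

train_idx, val_idx = split_indices(6000)
print(len(train_idx), len(val_idx))  # 5400 600

image = np.zeros((4, 6, 3))
image[:, 0] = 1.0                         # mark the leftmost column
flipped, flipped_boxes = horizontal_flip(image, np.array([[0.1, 0.2, 0.4, 0.6]]))
```

Fixing the seed keeps the validation set identical between runs, so results from different experiments stay comparable.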

5 The network (15 Points)

Please open model.py and implement the network shown in Figure 2. The network is
the same as YOLO until you reach resolution 10x10. Be sure to include a bias term
in your convolution layers.

Figure 2: Model architecture

Pay close attention to how the four output branches are concatenated, especially their

ordering. You need to make sure that each entry in your 540 output boxes actually

corresponds to the entry in the 540 default boxes defined in the previous section. For

example, if you define ann_confidence[4] and ann_box[4] to be a box in cell (0,1)

of the 10x10 grid, your network output must have confidence[4] and bboxes[4]

correspond to the cell (0,1) of the 10x10 output branch. This part is very error-prone.
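The ordering issue can be illustrated with plain arrays. The sketch below assumes each branch produces a [N, 16, H, W] output whose 16 channels hold 4 boxes times 4 values, with channel index = box index * 4 + value index; the architecture figure is not reproduced in this text, so treat this layout as an assumption. NumPy is used to keep the example small; for PyTorch tensors the analogous calls are Tensor.reshape, torch.cat and Tensor.permute.

```python
import numpy as np

def merge_branches(branches):
    """branches: arrays shaped [N, 16, H, W], ordered 10x10, 5x5, 3x3, 1x1.

    Returns an array of shape [N, 540, 4] whose box order is
    branch -> cell (row by row) -> box within the cell.
    """
    n = branches[0].shape[0]
    flat = [b.reshape(n, 16, -1) for b in branches]   # [N, 16, H*W]
    merged = np.concatenate(flat, axis=2)             # [N, 16, 135]
    merged = merged.transpose(0, 2, 1)                # [N, 135, 16]
    return merged.reshape(n, -1, 4)                   # [N, 540, 4]

branches = [np.zeros((1, 16, s, s)) for s in (10, 5, 3, 1)]
# Mark the second box (channels 4..7) of the cell in row 0, column 1
# of the 10x10 branch.
branches[0][0, 4:8, 0, 1] = [1, 2, 3, 4]
out = merge_branches(branches)
print(out.shape)   # (1, 540, 4)
print(out[0, 5])   # [1. 2. 3. 4.]
```

The marked values land at index 5 because the cell in row 0, column 1 is cell number 1, which owns boxes 4 to 7, and the marked box is the second of them. A test of this kind, run against your own default box order, is a cheap way to catch ordering mistakes.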

6 The loss function (15 Points)

The loss function is almost the same as YOLO.

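$$
L_{cls} = \frac{1}{\sum_i x_i^{obj}} \sum_i x_i^{obj}\, \mathrm{crossentropy}\left(\mathrm{conf}_i^{pred}, \mathrm{conf}_i^{gt}\right) + \frac{1}{\sum_i x_i^{noobj}} \sum_i x_i^{noobj}\, \mathrm{crossentropy}\left(\mathrm{conf}_i^{pred}, \mathrm{conf}_i^{gt}\right)
$$

$$
L_{box} = \frac{1}{\sum_i x_i^{obj}} \sum_i x_i^{obj}\, \mathrm{L1}\left(\mathrm{box}_i^{pred}, \mathrm{box}_i^{gt}\right)
$$

$\mathrm{conf}_i^{pred}$: class probabilities; $\mathrm{conf}_i^{gt}$: ground truth probabilities; $\mathrm{box}_i^{pred}$: predicted box attributes; $\mathrm{box}_i^{gt}$: ground truth box attributes.

$x_i^{obj} = 1$ and $x_i^{noobj} = 0$ if box $i$ carries an object; otherwise, $x_i^{obj} = 0$ and $x_i^{noobj} = 1$.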

You need to figure out how you can get the indices of all boxes carrying objects, and use

confidence[indices], box[indices] to select those boxes. If your implementation

is correct, after you get those indices, your code to compute loss should be no more

than three lines.

Note that the average operation is done automatically by the built-in functions
F.cross_entropy and F.smooth_l1_loss.
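One way the selection and the two loss terms could look in PyTorch is sketched below. The function name ssd_loss and its arguments are hypothetical, background is assumed to be class index 3, and the confidence input is assumed to hold raw class scores, since F.cross_entropy applies log-softmax internally. How the two terms are combined is not specified above, so the sketch returns them separately.

```python
import torch
import torch.nn.functional as F

def ssd_loss(pred_confidence, pred_box, ann_confidence, ann_box):
    """All inputs have shape [N, 540, 4]; returns (loss_cls, loss_box)."""
    pred_confidence = pred_confidence.reshape(-1, 4)
    pred_box = pred_box.reshape(-1, 4)
    ann_confidence = ann_confidence.reshape(-1, 4)
    ann_box = ann_box.reshape(-1, 4)

    target = ann_confidence.argmax(dim=1)   # class index of each one-hot row
    obj = target != 3                       # assumption: background is index 3
    noobj = ~obj

    # Each built-in call averages over the boxes it receives, which supplies
    # the 1 / sum_i x_i normalization in the formulas above.
    loss_cls = (F.cross_entropy(pred_confidence[obj], target[obj])
                + F.cross_entropy(pred_confidence[noobj], target[noobj]))
    loss_box = F.smooth_l1_loss(pred_box[obj], ann_box[obj])
    return loss_cls, loss_box

torch.manual_seed(0)
ann_confidence = torch.zeros(2, 540, 4)
ann_confidence[..., 3] = 1                              # all background ...
ann_confidence[0, 5] = torch.tensor([0., 1., 0., 0.])   # ... except one box
ann_box = torch.zeros(2, 540, 4)
ann_box[0, 5] = torch.tensor([0.1, -0.2, 0.3, 0.0])
pred_confidence = torch.randn(2, 540, 4)
pred_box = ann_box.clone()                              # perfect box prediction
loss_cls, loss_box = ssd_loss(pred_confidence, pred_box, ann_confidence, ann_box)
```

Note that F.smooth_l1_loss averages over all selected elements (boxes times 4 attributes), and that a batch without any object makes the object terms undefined (NaN), so a guard for that case may be needed.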

7 Non maximum suppression (10 Points)

This part is already covered in the lecture and in the tutorial sessions. Please refer to

Ali’s slides.

You need to complete the function in utils.py. An example of NMS is shown below:

A brief pipeline of NMS:

Keep two lists: A = all predicted bounding boxes; B = [ ]

1. Select the bounding box in A with the highest probability in class cat,

dog or person.

2. If that highest probability is greater than a threshold (threshold=0.5),

proceed; otherwise, the NMS is done.

3. Denote the bounding box with the highest probability as x. Move x from

A to B.

4. For all boxes in A, if a box has IOU greater than an overlap threshold

(overlap=0.5) with x, remove that box from A.

5. Jump to 1.
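The five steps above can be sketched as follows. The function name, the argument layout (class probabilities with background in the last column, boxes given as absolute corner coordinates) and the choice to return indices are assumptions for illustration, not the interface of the provided utils.py.

```python
import numpy as np

def box_iou(box, others):
    """IOU between one [x_min, y_min, x_max, y_max] box and an array of boxes."""
    inter_w = np.clip(np.minimum(box[2], others[:, 2])
                      - np.maximum(box[0], others[:, 0]), 0, None)
    inter_h = np.clip(np.minimum(box[3], others[:, 3])
                      - np.maximum(box[1], others[:, 1]), 0, None)
    inter = inter_w * inter_h
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area + areas - inter)

def non_maximum_suppression(confidence, corners, overlap=0.5, threshold=0.5):
    """Return the indices of the boxes that end up in list B."""
    remaining = list(range(len(confidence)))   # list A
    kept = []                                  # list B
    while remaining:
        # Step 1: highest probability over cat, dog or person (columns 0..2).
        scores = confidence[remaining, :3].max(axis=1)
        best = int(np.argmax(scores))
        # Step 2: stop once the best remaining box is not confident enough.
        if scores[best] <= threshold:
            break
        # Step 3: move x from A to B.
        x = remaining.pop(best)
        kept.append(x)
        # Step 4: drop boxes in A that overlap x too much.
        if remaining:
            ious = box_iou(corners[x], corners[remaining])
            remaining = [i for i, v in zip(remaining, ious) if v <= overlap]
    return kept

confidence = np.array([
    [0.9, 0.05, 0.0, 0.05],
    [0.8, 0.10, 0.0, 0.10],
    [0.1, 0.70, 0.1, 0.10],
    [0.1, 0.10, 0.1, 0.70],
])
corners = np.array([
    [0.10, 0.1, 0.5, 0.5],
    [0.12, 0.1, 0.5, 0.5],
    [0.60, 0.6, 0.9, 0.9],
    [0.00, 0.0, 0.2, 0.2],
])
kept = non_maximum_suppression(confidence, corners)
print(kept)  # [0, 2]
```

Box 1 is suppressed because it overlaps box 0 almost completely, and box 3 is never selected because its best object probability is below the threshold.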

8 Evaluation (15 Points)

For this part, you need to plot a precision-recall curve and compute mAP using your
validation set. Implement the generate_mAP function in utils.py.
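The handout does not fix a particular AP convention here, so the following is a sketch of one common choice: for a single class, sort the detections by confidence, accumulate true and false positives, and take the area under the precision envelope. The function name and inputs are hypothetical; is_true_positive is assumed to come from matching each detection to a ground truth box by IOU, and mAP would then be the mean of the per-class values.

```python
import numpy as np

def average_precision(scores, is_true_positive, num_gt):
    """Precision, recall and AP for one class.

    scores: confidence of each detection; is_true_positive: 1 if the
    detection matched a ground truth object, else 0; num_gt: number of
    ground truth objects of this class.
    """
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_true_positive, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / max(num_gt, 1)
    precision = cum_tp / (cum_tp + cum_fp)
    # Precision envelope: best precision achievable at this recall or higher.
    envelope = np.maximum.accumulate(precision[::-1])[::-1]
    # Area under the resulting step curve.
    recall_steps = np.concatenate([[0.0], recall])
    ap = float(np.sum((recall_steps[1:] - recall_steps[:-1]) * envelope))
    return precision, recall, ap

precision, recall, ap = average_precision(
    scores=[0.9, 0.8, 0.7, 0.6],
    is_true_positive=[1, 0, 1, 1],
    num_gt=4,
)
print(ap)  # 0.625
```

The precision and recall arrays returned here are the points you would plot for the precision-recall curve.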

9 (Optional) Albumentations (5 Points)

For this part, you will be using the Albumentations library to perform data
augmentation. You then need to provide separate evaluation results for it.

10 Demo (20 Points)

During the demo session, first, we will check if you implemented the other parts. Then,

we will ask you exactly four questions related to the assignment, each having 5 points.

1. We ask you to run your code on a set of examples provided by us. Your code

needs to run without errors.

2. After running, your code needs to produce the expected results. We will not be

very tough on the numbers and metrics; we will just visually inspect the results to

see if your implementation is performing reasonably.

3. Then we will ask you a question about a part of the code. You need to be able

to clearly explain what is going on.

4. Finally, we will ask a (fairly simple) question to test your knowledge and intuition

about the assignment.

Note that your code should be zipped and submitted to Canvas before the deadline,
and you may not change it on the day of the demo session.
