CS4131

THE UNIVERSITY OF WARWICK

LEVEL 7 Open Book Assessment 2 hours

Department of Computer Science

CS413 Image and Video Analysis

Instructions

1. Read all instructions carefully and read through the entire paper before you

start writing.

2. You should attempt 4 questions. You should NOT submit answers to more

than the required number of questions.

3. All questions will carry the same number of marks.

4. You should handwrite your answers either with paper and pen or using an

electronic device with a stylus (unless you have special arrangements for exams

which allow the use of a computer).

5. Begin each question on a new page and clearly mark each page with the page

number, your student ID and the question number.

(a) Handwritten notes must be scanned or photographed and all individual

solutions should (if possible) be collated into a single PDF with pages in the

correct order.

(b) You must upload two files to the AEP: your PDF of solutions and a

completed cover sheet.

(c) You must click FINISH ASSESSMENT to complete the submission process.

After you have done so you will not be able to upload anything

further.

6. Please check the legibility of your final submission before uploading. It is your

responsibility to ensure that your work can be read.

7. You are allowed to access module materials, notes, resources, references and

the internet during the assessment.

8. You should not try to communicate with any other candidate during the assessment

period or seek assistance from anyone else in completing your answers.

The Computer Science Department expects the conduct of all students taking

this assessment to conform to the stated requirements. Measures will be

in operation to check for possible misconduct. These will include the use of

similarity detection tools and the right to require live interviews with selected

students following the assessment.

9. By starting this assessment, you are declaring yourself fit to undertake it. You

are expected to make a reasonable attempt at the assessment by answering the

questions in the paper.

Please note that:

• You must have completed and uploaded your assessment before the 24-hour

assessment window closes.

• You have an additional 45 minutes beyond the stated length of the paper to

allow for downloading and uploading the assessment, your files and technical

delays.

• For further details you should refer to the AEP documentation.

Notify [email protected] as soon as possible if you cannot complete

your assessment because:

• you lose your internet connection;

• your device fails;

• you become unwell and are unable to continue;

• you are affected by circumstances beyond your control (e.g. fire alarm).

Please note that this is for notification purposes only; it is not a help line.

1. This question is about the Human Visual System (HVS).

(a) Sketch the anatomy of the eye and describe the structure of the retina

with reference to colour perception. What image analysis is performed

by the eye? Does it matter that the eye produces an upside-down, left-to-right

image of the world? What are retinotopic maps and what do

they tell us about how the HVS processes information? In your answer,

where appropriate, give specific examples of image processing operations

performed by the eye and the brain. [10]

(b) Describe in detail the visual pathway of the HVS. Giving an example,

explain why perception is not simply a feed-forward process. What are

the similarities between how we think the HVS works and how artificial

neural networks are used to learn and perform visual tasks? [15]

2. (a) Giving definitions and simple examples, explain how a 1D Discrete Fourier

Transform works. What is the relationship between filtering using convolution

and filtering in the frequency domain? [6]

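(b) A 2D Discrete Cosine Transform can be derived from the 1D discrete

projections using functions of the form:

\[ g(x, u) = \alpha(u) \cos\left[ \frac{\pi}{2N} (2x + 1) u \right], \]

where α(u) is

\[ \alpha(u) = \begin{cases} \sqrt{\dfrac{1}{N}}, & u = 0 \\[2ex] \sqrt{\dfrac{2}{N}}, & 1 \le u < N. \end{cases} \]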

Carefully explain this equation and how it can be used to perform decomposition

of a 1D signal f[x], 0 ≤ x < N, into DCT coefficients, F[u].

Giving reasons, explain how the signal can be synthesised given the DCT

coefficients, F[u], 0 ≤ u < N. [5]

What is the 2D form of the forward DCT expansion given the 1D form

above? [3]

(c) Explain why blocking artefacts can be observed in an image compressed

by the standard JPEG technique at low bit rates. How might blocking

artefacts be reduced? [6]

(d) A webcam generates 8-bit monochrome video frames of resolution 320 × 240

pixels at the rate of 10 frames per second (FPS). Calculate the total

number of bytes in the stream and the compression ratio achieved by a

simple DCT coder which uses 8 × 8 blocks using the quantisation scheme:

8 8 6 6 4 4 2 2
8 8 6 6 4 4 2 2
6 6 6 6 4 4 2 2
6 6 6 6 4 4 2 2
4 4 4 4 4 4 2 2
4 4 4 4 4 4 2 2
2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2

i. if all the coefficients of every block are encoded;

ii. if only frequency coefficients at 0 ≤ u, v ≤ 3 are encoded.

You can assume that no other type of compression is applied to the resulting

encoded data stream.

[5]
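As an illustration of the basis functions g(x, u) defined in part (b) of this question, the following is a minimal NumPy sketch of 1D DCT analysis and synthesis. It assumes the coefficients are obtained by projecting f[x] onto each basis function, and it relies on the basis being orthonormal so that the transpose inverts the transform. The names `dct_basis`, `dct_forward` and `dct_inverse` are hypothetical and not part of the paper.

```python
import numpy as np

def dct_basis(N):
    """Return the N x N matrix G with G[u, x] = alpha(u) * cos(pi/(2N) * (2x + 1) * u)."""
    x = np.arange(N)                      # sample index, one per column
    u = np.arange(N).reshape(-1, 1)       # frequency index, one per row
    alpha = np.full((N, 1), np.sqrt(2.0 / N))
    alpha[0, 0] = np.sqrt(1.0 / N)        # special case u = 0
    return alpha * np.cos(np.pi / (2 * N) * (2 * x + 1) * u)

def dct_forward(f):
    """Analysis: F[u] = sum_x f[x] g(x, u)."""
    return dct_basis(len(f)) @ f

def dct_inverse(F):
    """Synthesis: f[x] = sum_u F[u] g(x, u), valid because G is orthonormal."""
    return dct_basis(len(F)).T @ F

# Illustrative signal (an assumption, not taken from the paper).
f = np.array([1.0, 2.0, 3.0, 4.0])
G = dct_basis(len(f))
F = dct_forward(f)
f_rec = dct_inverse(F)
```

Under these assumptions `G @ G.T` is the identity matrix, which is why synthesis needs only the transpose of the analysis matrix rather than a general matrix inverse.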

3. (a) What are the major problems for background subtraction algorithms?

Give examples of video systems which might use background subtraction.

[6]

(b) A sequence of video frames, f(x, y, t), is being processed by a background

modelling method using a running-Gaussian model and a learning

update rule:

i. What parameters define the background model? For image frames

with size 1280 × 720 using 64-bit floating-point arithmetic, what is

approximately the memory size of the model in bytes? [4]

ii. Give the update equations for the model at frame t + 1. [4]

iii. If the foreground classifier uses a significance of K standard deviations,

state the classifier rule for deciding if a frame pixel is foreground.

[2]

(c) The Stauffer-Grimson background model uses a mixture of Gaussian distributions.

What is the advantage of this over a running average or a single

running-Gaussian model? [2]

(d) A two-component GMM is used to model background with weights $w_1 = w_2 = 0.5$,

means $\mu_1 = 85$, $\mu_2 = 170$ and variances $\sigma_1^2 = \sigma_2^2 = 900$. Explain

how the Stauffer-Grimson algorithm will update the model parameters.

If f(x, y, t + 1) = 120 at some pixel (x, y), what are the parameters' values

at t + 1 if the running average feedback weight is α = 0.1? [7]

4. (a) What is the output of applying the 1D filter kernel, h(x) = {−1, 4, −1},

using a convolution operation to the following image matrix? Explain how

you deal with pixels on the boundary.

0 0 0 0
0 4 4 0
0 4 4 0
0 0 0 0

[3]

(b) Describe how an estimate of edge orientation and edge strength can be

produced by using a pair of convolution kernels,

\[ R_1 = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}, \quad R_2 = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}. \]

[6]

(c) A 2D Gaussian filter can be defined as

\[ g(x, y; \sigma) = \frac{1}{2\pi\sigma^2} \exp\left( -\frac{x^2 + y^2}{2\sigma^2} \right). \]

i. What is the effect of changing σ when g is used on an image? What

is a good size of kernel to use if, say, σ = 2? [4]

ii. Show how g can be made into an edge detector for vertical edges.

What does σ do in this case? [6]

iii. The 2D LoG operator takes the form

\[ \nabla^2 g(x, y; \sigma) = -\frac{1}{\pi\sigma^4} \left[ 1 - \frac{x^2 + y^2}{2\sigma^2} \right] g(x, y; \sigma). \]

Show that the 3 × 3 kernel

1  1  1
1 -8  1
1  1  1

is a fairly reasonable approximation. [6]
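The Gaussian g(x, y; σ) defined in part (c) can be sampled on an integer grid to obtain a discrete smoothing kernel. The sketch below is one possible way to do this in NumPy. Truncating the kernel at a radius of about 3σ is an assumed rule of thumb rather than something stated in the paper, and the function name `gaussian_kernel` is hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Sample g(x, y; sigma) on the integer grid [-radius, radius]^2."""
    ax = np.arange(-radius, radius + 1)
    x, y = np.meshgrid(ax, ax)
    return np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)

sigma = 2.0
radius = int(np.ceil(3 * sigma))      # assumption: cut the tails off at about 3 sigma
kernel = gaussian_kernel(sigma, radius)

# Renormalise so the truncated kernel sums to exactly 1 and preserves mean brightness.
kernel_normalised = kernel / kernel.sum()
```

With σ = 2 this gives a 13 × 13 array. The small amount of mass lost by truncation is why the kernel is renormalised before use.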

5. (a) What are the properties of good visual features and why? [5]

(b) What are key-points and what are feature descriptors? Given two feature

sets:

\[ P = \{p_i(x, y)\}, \quad Q = \{q_j(x, y)\} \]

with M and N features respectively, how can nearest-neighbour matching

be used to find out if P and Q contain the same object? [5]

(c) The following expression calculates the homogeneous coordinates of image

points for a pin-hole camera:

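\[ \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \]

Give a sketch of the geometry implied by this equation, explaining the role

of f. [3]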

(d) Using diagrams and equations, explain what the extrinsic and intrinsic

parameters of a camera are. Why is camera calibration useful in image and

video analysis? [5]

(e) The combined homogeneous camera matrix M with 11 unknown parameters,

\[ \begin{pmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & 1 \end{pmatrix}, \]

takes world coordinates, X, onto image coordinates x. Show that given a

set of point pairs {Xi,xi}, the camera matrix can be solved using linear

least squares. What is the minimum number of points required to obtain

M? [7]
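The pin-hole projection in part (c) of this question can be illustrated numerically. The sketch below applies the 3 × 4 matrix to homogeneous world points and then divides by the third component. That division is the usual reading of a homogeneous equality and is an assumption here, as are the function name `project_pinhole` and the example points.

```python
import numpy as np

def project_pinhole(points_3d, f):
    """Project N x 3 world points (X, Y, Z) to N x 2 image points for focal length f."""
    P = np.array([[f, 0.0, 0.0, 0.0],
                  [0.0, f, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0]])
    pts = np.asarray(points_3d, dtype=float)
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])   # rows (X, Y, Z, 1)
    proj = homog @ P.T                                      # rows (fX, fY, Z)
    return proj[:, :2] / proj[:, 2:3]                       # rows (fX/Z, fY/Z)

# Two illustrative points on the same ray direction in X and Y, at depths 4 and 8.
pts = [[1.0, 2.0, 4.0], [1.0, 2.0, 8.0]]
image_pts = project_pinhole(pts, f=2.0)
```

In this example, doubling the depth Z halves both image coordinates, which is the perspective scaling governed by f.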

6. (a) A Perceptron takes two-dimensional inputs, x, and produces scalar outputs

y and has the following design:

[Figure: Perceptron design; diagram not reproduced.]

In the design, the activation function is linear: f(z) = z. To train the

weights, a loss function, $L(y, \hat{y}) = \frac{1}{2}(y - \hat{y})^2$, is used.

i. Perform a forward pass to calculate the output given the initial input

and set of weights:

\[ \mathbf{x} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad \mathbf{w} = \begin{pmatrix} 0.5 \\ 0.5 \\ -2 \end{pmatrix} \]

What is the loss if the corresponding true value to the current input

is $\hat{y} = 1$? [1]

ii. Write down an expression for $\partial L / \partial \mathbf{w}$ and hence determine the proportional

gradient step, Δw, which will reduce the loss, given the single

sample pair {x, y}. What are the new weights at the second epoch if

the learning rate is set to 1/4? [4]

iii. What is the forward-pass and the weight update if the activation

function is a ReLU? [4]

(b) Explain how Conv and Max-Pooling layers work. Including biases, what

is the total number of weights of a 2D Conv layer with 10 filters of size 3 × 3

if the input size is 28 × 28? [4]

(c) Look carefully at the summary table description of a CNN intended for

classification:

i. Give an interpretation of what the network is likely to learn from

images. What can be said about the feature classification capabilities

of the fully connected part? What activation would you recommend

for the output layer and why? [6]

ii. This network is known to be overfitting on some data. Explain the

phenomenon of overfitting and what strategies can be employed to

prevent overfitting. [6]

End