CS4131
THE UNIVERSITY OF WARWICK
LEVEL 7 Open Book Assessment 2 hours
Department of Computer Science
CS413 Image and Video Analysis
Instructions
1. Read all instructions carefully and read through the entire paper before you
start writing.
2. You should attempt 4 questions. You should NOT submit answers to more
than the required number of questions.
3. All questions will carry the same number of marks.
4. You should handwrite your answers either with paper and pen or using an
electronic device with a stylus (unless you have special arrangements for exams
which allow the use of a computer).
5. Begin each question on a new page and clearly mark each page with the page
number, your student ID and the question number.
(a) Handwritten notes must be scanned or photographed and all individual
solutions should (if possible) be collated into a single PDF with pages in the
correct order.
(b) You must upload two files to the AEP: your PDF of solutions and a
completed cover sheet.
(c) You must click FINISH ASSESSMENT to complete the submission process.
After you have done so you will not be able to upload anything
further.
6. Please check the legibility of your final submission before uploading. It is your
responsibility to ensure that your work can be read.
7. You are allowed to access module materials, notes, resources, references and
the internet during the assessment.
8. You should not try to communicate with any other candidate during the assessment
period or seek assistance from anyone else in completing your answers.
The Computer Science Department expects the conduct of all students taking
this assessment to conform to the stated requirements. Measures will be
in operation to check for possible misconduct. These will include the use of
similarity detection tools and the right to require live interviews with selected
students following the assessment.
9. By starting this assessment, you are declaring yourself fit to undertake it. You
are expected to make a reasonable attempt at the assessment by answering the
questions in the paper.
Please note that:
• You must have completed and uploaded your assessment before the 24 hour
assessment window closes.
• You have an additional 45 minutes beyond the stated length of the paper to
allow for downloading and uploading the assessment, your files and technical
delays.
• For further details you should refer to the AEP documentation.
Notify [email protected] as soon as possible if you cannot complete
your assessment because:
• you lose your internet connection;
• your device fails;
• you become unwell and are unable to continue;
• you are affected by circumstances beyond your control (e.g. fire alarm).
Please note that this is for notification purposes only; it is not a help line.
1. This question is about the Human Visual System (HVS).
(a) Sketch the anatomy of the eye and describe the structure of the retina
with reference to colour perception. What image analysis is performed
by the eye? Does it matter that the eye produces an upside-down, left-to-right
image of the world? What are retinotopic maps and what do
they tell us about how the HVS processes information? In your answer,
where appropriate, give specific examples of image processing operations
performed by the eye and the brain. [10]
(b) Describe in detail the visual pathway of the HVS. Giving an example,
explain why perception is not simply a feed-forward process. What are
the similarities between how we think the HVS works and how artificial
neural networks are used to learn and perform visual tasks? [15]
2. (a) Giving definitions and simple examples, explain how a 1D Discrete Fourier
Transform works. What is the relationship between filtering using convolution
and filtering in the frequency domain? [6]
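The relationship asked about in part (a) can be checked numerically. The following is a minimal sketch, assuming the usual unnormalised forward DFT with a 1/N factor on the inverse; the signal and kernel values are made up for illustration. It compares circular convolution in the signal domain with a pointwise product of DFT coefficients.

```python
import cmath

def dft(f):
    """Forward 1D DFT: F[u] = sum_x f[x] * exp(-2*pi*i*u*x/N)."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
            for u in range(n)]

def idft(F):
    """Inverse 1D DFT, carrying the 1/N normalisation."""
    n = len(F)
    return [sum(F[u] * cmath.exp(2j * cmath.pi * u * x / n) for u in range(n)) / n
            for x in range(n)]

def circular_convolve(f, h):
    """Circular (periodic) convolution of two equal-length sequences."""
    n = len(f)
    return [sum(f[k] * h[(x - k) % n] for k in range(n)) for x in range(n)]

# Illustrative data (not from the paper): a ramp and a two-tap averaging
# filter, zero-padded to the signal length.
signal = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, 0.5, 0.0, 0.0]

spatial = circular_convolve(signal, kernel)
spectral = [v.real for v in idft([a * b for a, b in zip(dft(signal), dft(kernel))])]

for s, t in zip(spatial, spectral):
    print(round(s, 6), round(t, 6))
```

The two columns agree up to rounding. Note that the product of DFTs corresponds to circular convolution, so a linear convolution needs zero-padding first.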
(b) A 2D Discrete Cosine Transform can be derived from the 1D discrete
projections using functions of the form:
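\[
g(x, u) = \alpha(u) \cos\left[ \frac{\pi}{2N} (2x + 1) u \right],
\]
where α(u) is
\[
\alpha(u) =
\begin{cases}
\sqrt{\dfrac{1}{N}}, & u = 0, \\[6pt]
\sqrt{\dfrac{2}{N}}, & 1 \le u < N.
\end{cases}
\]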
Carefully explain this equation and how it can be used to perform decomposition
of a 1D signal f[x], 0 ≤ x < N, into DCT coefficients, F[u].
Giving reasons, explain how the signal can be synthesised given the DCT
coefficients, F[u], 0 ≤ u < N. [5]
What is the 2D form of the forward DCT expansion given the 1D form
above? [3]
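As an illustration of the basis functions in part (b), here is a minimal sketch, assuming analysis is the projection F[u] = Σ_x f[x] g(x, u) and synthesis is f[x] = Σ_u F[u] g(x, u). The function names and the test signal are hypothetical. Because the g(x, u) are orthonormal, the same functions serve for both directions.

```python
import math

def alpha(u, n):
    """Normalisation factor from the question: sqrt(1/N) for u = 0, else sqrt(2/N)."""
    return math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)

def g(x, u, n):
    """DCT basis function g(x, u) = alpha(u) * cos(pi * (2x + 1) * u / (2N))."""
    return alpha(u, n) * math.cos(math.pi * (2 * x + 1) * u / (2 * n))

def dct_1d(f):
    """Analysis: project the signal onto each basis function."""
    n = len(f)
    return [sum(f[x] * g(x, u, n) for x in range(n)) for u in range(n)]

def idct_1d(F):
    """Synthesis: weighted sum of basis functions."""
    n = len(F)
    return [sum(F[u] * g(x, u, n) for u in range(n)) for x in range(n)]

# Illustrative signal, not taken from the paper.
signal = [4.0, 4.0, 0.0, 0.0, 1.0, 3.0, 2.0, 5.0]
coeffs = dct_1d(signal)
recon = idct_1d(coeffs)

print([round(c, 3) for c in coeffs])
print([round(v, 6) for v in recon])
```

The second printed list reproduces the input signal up to rounding, which is the practical consequence of orthonormality.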
(c) Explain why blocking artefacts can be observed in an image compressed
by the standard JPEG technique at low bit rates. How might blocking
artefacts be reduced? [6]
(d) A webcam generates 8-bit monochrome video frames of resolution 320 × 240
pixels at the rate of 10 frames per second (FPS). Calculate the total
number of bytes in the stream and the compression ratio achieved by a
simple DCT coder which uses 8 × 8 blocks using the quantisation scheme:
8 8 6 6 4 4 2 2
8 8 6 6 4 4 2 2
6 6 6 6 4 4 2 2
6 6 6 6 4 4 2 2
4 4 4 4 4 4 2 2
4 4 4 4 4 4 2 2
2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2
i. if all the coefficients of every block are encoded
ii. if only frequency coefficients at 0 ≤ (u, v) ≤ 3 are encoded
You can assume that no other type of compression is applied to the resulting
encoded data stream.
[5]
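The arithmetic in part (d) can be laid out as a short script. This is a sketch under two assumptions that the paper does not state explicitly: each table entry is read as the number of bits allocated to the coefficient at that position, and the stream size is reported per second of video.

```python
WIDTH, HEIGHT, FPS = 320, 240, 10
BLOCK = 8

# Quantisation table from the question, read here as bits per coefficient
# (an assumption about how the scheme is meant to be interpreted).
BITS = [
    [8, 8, 6, 6, 4, 4, 2, 2],
    [8, 8, 6, 6, 4, 4, 2, 2],
    [6, 6, 6, 6, 4, 4, 2, 2],
    [6, 6, 6, 6, 4, 4, 2, 2],
    [4, 4, 4, 4, 4, 4, 2, 2],
    [4, 4, 4, 4, 4, 4, 2, 2],
    [2, 2, 2, 2, 2, 2, 2, 2],
    [2, 2, 2, 2, 2, 2, 2, 2],
]

raw_bytes_per_second = WIDTH * HEIGHT * FPS          # one byte per 8-bit pixel
blocks_per_frame = (WIDTH // BLOCK) * (HEIGHT // BLOCK)

def coded_bytes_per_second(max_index):
    """Bytes per second when only coefficients with 0 <= u, v <= max_index are kept."""
    bits_per_block = sum(BITS[v][u]
                         for v in range(max_index + 1)
                         for u in range(max_index + 1))
    return blocks_per_frame * bits_per_block * FPS / 8

for label, max_index in (("all coefficients", 7), ("0 <= u, v <= 3", 3)):
    coded = coded_bytes_per_second(max_index)
    print(label, int(coded), round(raw_bytes_per_second / coded, 2))
```

Under this reading, a full block costs 240 bits and the 4 × 4 low-frequency corner costs 104 bits, against 512 bits for the raw block.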
3. (a) What are the major problems for background subtraction algorithms?
Give examples of video systems which might use background subtraction.
[6]
(b) A sequence of video frames, f(x, y, t), is being processed by a
background modelling method using a running-Gaussian model and a learning
update rule:
i. What parameters define the background model? For image frames
with size 1280 × 720 using 64 bit floating point arithmetic, what is
approximately the memory size of the model in bytes? [4]
ii. Give the update equations for the model at frame t + 1. [4]
iii. If the foreground classifier uses a significance of K standard deviations,
state the classifier rule for deciding if a frame pixel is foreground.
[2]
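The three parts of (b) can be tied together in a small sketch. It assumes one common formulation of the running-Gaussian model: a per-pixel mean and variance, exponential updates with a learning rate called `alpha` here (a name not given in this part of the question), and a variance update that uses the already-updated mean. Other textbooks use slightly different forms.

```python
import math

WIDTH, HEIGHT = 1280, 720
BYTES_PER_FLOAT = 8        # 64-bit floating point
PARAMS_PER_PIXEL = 2       # assumed model: running mean and running variance

model_bytes = WIDTH * HEIGHT * PARAMS_PER_PIXEL * BYTES_PER_FLOAT

def update(mean, var, pixel, alpha):
    """One running-Gaussian update for a single pixel (one common variant)."""
    new_mean = alpha * pixel + (1.0 - alpha) * mean
    new_var = alpha * (pixel - new_mean) ** 2 + (1.0 - alpha) * var
    return new_mean, new_var

def is_foreground(mean, var, pixel, k):
    """Foreground if the pixel lies more than k standard deviations from the mean."""
    return abs(pixel - mean) > k * math.sqrt(var)

# Illustrative numbers, not taken from the paper.
print(model_bytes)
print(update(100.0, 25.0, 104.0, 0.1))
print(is_foreground(100.0, 25.0, 120.0, 2.5), is_foreground(100.0, 25.0, 104.0, 2.5))
```

In practice the update is usually applied only to pixels classified as background, so that foreground objects do not pollute the model; that refinement is omitted here.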
(c) The Stauffer-Grimson background model uses a mixture of Gaussian
distributions. What is the advantage of this over a running average or a single
running-Gaussian model? [2]
(d) A two-component GMM is used to model background with weights
$w_1 = w_2 = 0.5$, means $\mu_1 = 85$, $\mu_2 = 170$ and variances
$\sigma_1^2 = \sigma_2^2 = 900$. Explain
how the Stauffer-Grimson algorithm will update the model parameters.
If f(x, y, t + 1) = 120 at some pixel (x, y), what are the parameters’ values
at t + 1 if the running average feedback weight is α = 0.1? [7]
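One possible reading of the update in part (d) is sketched below. Several choices are assumptions rather than statements from the paper: a match threshold of 2.5 standard deviations, components tested in the order listed (the two tie on weight and variance), the simplification ρ = α for the mean and variance learning rate, and a variance update that uses the updated mean. The no-match case, where the weakest component is replaced, is left out.

```python
import math

def stauffer_grimson_update(weights, means, variances, pixel, alpha, match_sigmas=2.5):
    """Single-pixel update of a Gaussian mixture (simplified, rho = alpha)."""
    # Find the first component whose mean is within match_sigmas standard deviations.
    matched = None
    for k in range(len(weights)):
        if abs(pixel - means[k]) <= match_sigmas * math.sqrt(variances[k]):
            matched = k
            break

    # Weights move towards 1 for the matched component and towards 0 otherwise.
    new_w = [(1.0 - alpha) * w + alpha * (1.0 if k == matched else 0.0)
             for k, w in enumerate(weights)]
    new_mu, new_var = list(means), list(variances)

    if matched is not None:
        rho = alpha  # assumption: the original method scales alpha by the likelihood
        new_mu[matched] = (1.0 - rho) * means[matched] + rho * pixel
        new_var[matched] = ((1.0 - rho) * variances[matched]
                            + rho * (pixel - new_mu[matched]) ** 2)

    total = sum(new_w)
    return [w / total for w in new_w], new_mu, new_var

w, mu, var = stauffer_grimson_update([0.5, 0.5], [85.0, 170.0], [900.0, 900.0], 120.0, 0.1)
print([round(v, 3) for v in w], mu, [round(v, 3) for v in var])
```

With these assumptions the pixel value 120 matches the first component, so only that component's mean and variance change while both weights are adjusted. A different tie-break or a likelihood-scaled ρ would give different numbers.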
4. (a) What is the output of applying the 1D filter kernel, h(x) = {−1, 4, −1},
using a convolution operation to the following image matrix? Explain how
you deal with pixels on the boundary.
0 0 0 0
0 4 4 0
0 4 4 0
0 0 0 0
[3]
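A minimal sketch of the operation in part (a) follows. It assumes the 1D kernel is applied along each row (the x direction) and that pixels outside the image are treated as zero; replicating or mirroring the border are equally valid choices and would change the boundary values.

```python
def convolve_rows(image, kernel):
    """Convolve every row with a 1D kernel, treating out-of-range pixels as zero."""
    half = len(kernel) // 2
    flipped = kernel[::-1]  # true convolution flips the kernel (a no-op if symmetric)
    out = []
    for row in image:
        n = len(row)
        out_row = []
        for x in range(n):
            acc = 0
            for i, h in enumerate(flipped):
                xi = x + i - half
                if 0 <= xi < n:      # zero padding: skip samples outside the row
                    acc += h * row[xi]
            out_row.append(acc)
        out.append(out_row)
    return out

image = [
    [0, 0, 0, 0],
    [0, 4, 4, 0],
    [0, 4, 4, 0],
    [0, 0, 0, 0],
]
result = convolve_rows(image, [-1, 4, -1])
for row in result:
    print(row)
```

Because the kernel is symmetric, convolution and correlation give the same result here.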
(b) Describe how an estimate of edge orientation and edge strength can be
produced by using a pair of convolution kernels,
\[
R_1 = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad
R_2 = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}.
\]
[6]
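The pair of kernels in part (b) can be exercised on a toy image. This sketch assumes the kernels are applied as correlation masks with their top-left entry on the current pixel; true convolution would only negate both responses. The angle returned is the raw `atan2` of the two responses, which is measured relative to the kernels' diagonal axes, so an offset is needed to express it in image axes. The test image is hypothetical.

```python
import math

def roberts(image):
    """Edge strength and raw angle from the two diagonal difference kernels."""
    rows, cols = len(image), len(image[0])
    strength = [[0.0] * (cols - 1) for _ in range(rows - 1)]
    angle = [[0.0] * (cols - 1) for _ in range(rows - 1)]
    for y in range(rows - 1):
        for x in range(cols - 1):
            r1 = image[y + 1][x + 1] - image[y][x]   # response of R1
            r2 = image[y + 1][x] - image[y][x + 1]   # response of R2
            strength[y][x] = math.hypot(r1, r2)      # sqrt(r1^2 + r2^2)
            angle[y][x] = math.atan2(r2, r1)         # in the diagonal frame
    return strength, angle

# Illustrative image with a vertical step edge between columns 1 and 2.
image = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
strength, angle = roberts(image)
for row in strength:
    print([round(v, 3) for v in row])
```

Only the column straddling the step gives a non-zero strength; flat regions on either side give zero.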
(c) A 2D Gaussian filter can be defined as
\[
g(x, y; \sigma) = \frac{1}{2\pi\sigma^2} \exp\left( -\frac{x^2 + y^2}{2\sigma^2} \right).
\]
i. What is the effect of changing σ when g is used on an image? What
is a good size of kernel to use if say σ = 2? [4]
ii. Show how g can be made into an edge detector for vertical edges.
What does σ do in this case? [6]
iii. The 2D LoG operator takes the form
\[
\nabla^2 g(x, y; \sigma) = -\frac{1}{\pi\sigma^4} \left[ 1 - \frac{x^2 + y^2}{2\sigma^2} \right] g(x, y; \sigma).
\]
Show that the 3 × 3 kernel
1 1 1
1 -8 1
1 1 1
is a fairly reasonable approximation. [6]
5. (a) What are the properties of good visual features and why? [5]
(b) What are key-points and what are feature descriptors? Given two feature
sets:
$P = \{p_i(x, y)\}$, $Q = \{q_j(x, y)\}$
with M and N features respectively, how can nearest-neighbour matching
be used to find out whether P and Q contain the same object? [5]
(c) The following expression calculates the homogeneous coordinates of image
points for a pin-hole camera:
\[
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} =
\begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
\]
Give a sketch of the geometry implied by this equation, explaining the role
of f. [3]
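The projection in part (c) can be traced numerically. This is a minimal sketch, assuming the equality is read in the homogeneous sense, so the product is divided by its third component to obtain image coordinates; the point and focal length used are made-up values.

```python
def project(point, f):
    """Project a 3D point (X, Y, Z) with the pin-hole matrix from the question."""
    X, Y, Z = point
    P = [
        [f, 0, 0, 0],
        [0, f, 0, 0],
        [0, 0, 1, 0],
    ]
    world = [X, Y, Z, 1.0]
    hx, hy, hw = (sum(P[r][c] * world[c] for c in range(4)) for r in range(3))
    # Normalise the homogeneous result so that its last component is 1.
    return hx / hw, hy / hw

# Illustrative values: the same point at depth 4 and at depth 8.
print(project((2.0, 1.0, 4.0), f=2.0))
print(project((2.0, 1.0, 8.0), f=2.0))
```

Doubling the depth halves the image coordinates, and scaling f scales them proportionally, which is the perspective behaviour the matrix encodes.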
(d) Using diagrams and equations, explain what the extrinsic and intrinsic
parameters of a camera are. Why is camera calibration useful in image and
video analysis? [5]
(e) The combined homogeneous camera matrix M with 11 unknown parameters,
\[
M = \begin{bmatrix}
m_{11} & m_{12} & m_{13} & m_{14} \\
m_{21} & m_{22} & m_{23} & m_{24} \\
m_{31} & m_{32} & m_{33} & 1
\end{bmatrix},
\]
takes world coordinates, X, onto image coordinates x. Show that given a
set of point pairs $\{X_i, x_i\}$, the camera matrix can be solved using linear
least squares. What is the minimum number of points required to obtain
M? [7]
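The structure of the linear system in part (e) can be sketched as follows. The notation $X_i = (X_i, Y_i, Z_i, 1)^\top$ and $x_i = (x_i, y_i, 1)^\top$ is an assumption about how the point pairs are written; the stacking into $A\mathbf{m} = \mathbf{b}$ is one standard way to set the problem up, not necessarily the derivation the examiner intends.

```latex
% Dividing the first two rows of M X_i by the third gives, for each point pair:
\[
x_i = \frac{m_{11} X_i + m_{12} Y_i + m_{13} Z_i + m_{14}}
           {m_{31} X_i + m_{32} Y_i + m_{33} Z_i + 1},
\qquad
y_i = \frac{m_{21} X_i + m_{22} Y_i + m_{23} Z_i + m_{24}}
           {m_{31} X_i + m_{32} Y_i + m_{33} Z_i + 1}.
\]
% Multiplying through by the denominator makes both equations linear in the m_{jk}:
\[
\begin{aligned}
m_{11} X_i + m_{12} Y_i + m_{13} Z_i + m_{14}
  - x_i \left( m_{31} X_i + m_{32} Y_i + m_{33} Z_i \right) &= x_i, \\
m_{21} X_i + m_{22} Y_i + m_{23} Z_i + m_{24}
  - y_i \left( m_{31} X_i + m_{32} Y_i + m_{33} Z_i \right) &= y_i.
\end{aligned}
\]
% Stacking the two rows from every pair gives A m = b with 11 unknowns,
% solved in the least-squares sense by the normal equations:
\[
A \mathbf{m} = \mathbf{b},
\qquad
\hat{\mathbf{m}} = \left( A^{\top} A \right)^{-1} A^{\top} \mathbf{b}.
\]
```

Each point pair contributes two equations, so the 11 unknowns need at least six pairs, provided the points are in general position (not all coplanar).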
6. (a) A Perceptron takes two-dimensional inputs, x, and produces scalar
outputs, y, and has the following design:
[Figure: Perceptron design diagram, not recoverable from the source.]
In the design, the activation function is linear: f(z) = z. To train the
weights, a loss function, $L(y, \hat{y}) = \frac{1}{2}(y - \hat{y})^2$, is used.
i. Perform a forward pass to calculate the output given the initial input
and set of weights:
\[
\mathbf{x} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad
\mathbf{w} = \begin{pmatrix} 0.5 \\ 0.5 \\ -2 \end{pmatrix}
\]
What is the loss if the corresponding true value to the current input
is $\hat{y} = 1$? [1]
ii. Write down an expression for $\partial L / \partial \mathbf{w}$ and hence determine the
proportional gradient step, $\Delta \mathbf{w}$, which will reduce the loss, given the single
sample pair {x, y}. What are the new weights at the second epoch if
the learning rate is set to $\frac{1}{4}$? [4]
iii. What is the forward-pass and the weight update if the activation
function is a ReLU? [4]
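The forward pass and gradient step in part (a) can be traced with a few lines of code. Since the design diagram is missing, this sketch assumes the three weights act on the two inputs plus a constant bias input of 1; with x = (1, 1) every input equals 1, so the position of the bias weight does not affect the numbers. Following the paper's notation, y is the network output and the target is the true value.

```python
def forward(x, w, activation):
    """Weighted sum over the inputs plus an assumed constant bias input of 1."""
    inputs = list(x) + [1.0]
    z = sum(wi * xi for wi, xi in zip(w, inputs))
    return inputs, z, activation(z)

def train_step(x, w, target, lr, activation, derivative):
    """One gradient-descent step on L = 0.5 * (y - target)^2."""
    inputs, z, y = forward(x, w, activation)
    loss = 0.5 * (y - target) ** 2
    # Chain rule: dL/dw_i = (y - target) * f'(z) * input_i
    grad = [(y - target) * derivative(z) * xi for xi in inputs]
    new_w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return y, loss, new_w

linear = (lambda z: z, lambda z: 1.0)
relu = (lambda z: max(0.0, z), lambda z: 1.0 if z > 0 else 0.0)

x, w, target = [1.0, 1.0], [0.5, 0.5, -2.0], 1.0
y_lin, loss_lin, w_lin = train_step(x, w, target, 0.25, *linear)
y_relu, loss_relu, w_relu = train_step(x, w, target, 0.25, *relu)

print(y_lin, loss_lin, w_lin)
print(y_relu, loss_relu, w_relu)
```

With the ReLU the pre-activation is negative, so the unit outputs zero and its gradient vanishes; the weights are left unchanged even though the loss is non-zero.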
(b) Explain how Conv and Max-Pooling layers work. Including biases, what
is the total number of weights of a 2D Conv layer with 10, 3 by 3 filters
if the input size is 28 x 28? [4]
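The parameter count and the pooling operation in part (b) can be illustrated briefly. This sketch assumes a single-channel input (the question gives only the spatial size) and a 2 × 2 max-pool with stride 2; the function names and the toy feature map are hypothetical.

```python
def conv2d_param_count(num_filters, kernel_h, kernel_w, in_channels=1):
    """Weights plus one bias per filter; independent of the input's spatial size."""
    return num_filters * (kernel_h * kernel_w * in_channels + 1)

def max_pool_2x2(feature_map):
    """Non-overlapping 2 x 2 max pooling (stride 2) on a 2D list."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[r][c], feature_map[r][c + 1],
             feature_map[r + 1][c], feature_map[r + 1][c + 1])
         for c in range(0, cols - 1, 2)]
        for r in range(0, rows - 1, 2)
    ]

print(conv2d_param_count(10, 3, 3))

feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(max_pool_2x2(feature_map))
```

Because the filters are shared across all positions, the 28 × 28 input size affects the size of the output feature maps but not the number of trainable parameters.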
(c) Look carefully at the summary table description of a CNN intended for
classification:
i. Give an interpretation of what the network is likely to learn from
images. What can be said about the feature classification capabilities
of the fully connected part? What activation would you recommend
for the output layer and why? [6]
ii. This network is known to be overfitting on some data. Explain the
phenomenon of overfitting and what strategies can be employed to
prevent overfitting. [6]
- 9 - End