—— SOLUTIONS ——
January 2018 7CCSMCVI
1. Compulsory Question
a. Give a brief definition of each of the following terms.
i. image processing
ii. mid-level vision
iii. horopter
[6 marks]
i) image processing = signal processing applied to an image, with
another image as the resulting output
ii) mid-level vision = a range of processes that group together related
image elements and segment them from other image elements
iii) horopter = an imaginary surface on which all points have zero
disparity
Marking scheme
2 marks for each correct definition.
b. Below are shown a convolution mask H and an image I.

H =
0 0 1
0 1 0
0 0 1

I =
0 1 1 1
2 0 2 0
2 2 1 1
2 2 0 0

What is the result of the convolution of mask H with image I? The
result should be an image that is the same size as I.
[5 marks]
0 3 1 3
2 2 5 2
2 6 3 3
2 4 2 1
Marking scheme
5 marks. Partial marks are possible for partially correct answers.
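The result above can be verified numerically. A minimal Python/NumPy sketch (the function name conv2_same is illustrative; note that convolution flips the mask before the sliding sum):

```python
import numpy as np

def conv2_same(image, mask):
    """2D convolution with zero padding; output is the same size as image."""
    mask = np.flipud(np.fliplr(mask))        # convolution flips the mask
    mh, mw = mask.shape
    ph, pw = mh // 2, mw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))
    out = np.zeros_like(image)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            out[r, c] = np.sum(padded[r:r + mh, c:c + mw] * mask)
    return out

H = np.array([[0, 0, 1], [0, 1, 0], [0, 0, 1]])
I = np.array([[0, 1, 1, 1], [2, 0, 2, 0], [2, 2, 1, 1], [2, 2, 0, 0]])
print(conv2_same(I, H))
```

Running it reproduces the answer matrix above.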
c. Briefly compare the mechanisms used for sampling an image in a
camera and in an eye.
[6 marks]
Camera:
• Has sensing elements sensitive to three wavelengths (RGB).
• Sensing elements occur in a fixed ratio across the whole image
plane.
• The sampling density is uniform across the whole image plane.
Eye:
• Has sensing elements sensitive to four wavelengths (RGBW).
• Sensing elements occur in variable ratios across the image plane
(cone density highest at fovea, rod density highest outside fovea).
• The sampling density is non-uniform across the image plane (density
is highest at the fovea).
Marking scheme
3 marks for each part.
d. The RGB channels for a 3-by-3 pixel colour image are shown below.

R =
140 140 150
150 140 150
0 10 20

G =
160 170 255
170 160 150
0 0 10

B =
200 190 180
210 200 200
255 200 210

i. What is the colour of the pixel at coordinates (1,3)?
[2 marks]
Blue
Marking scheme
2 marks
ii. What is the colour of the surface in the world shown at coordinates
(1,3)?
[2 marks]
Unknown. The RGB values of the image will depend both on the
properties of the surface (including its colour) and the properties of
the light it is reflecting. Without knowledge of the latter, we can’t
know the former.
Marking scheme
2 marks
e. Briefly explain the differences between “viewer-centred” and
“object-centred” approaches to object recognition.
[4 marks]
In the viewer-centred approach, the 3D object is modelled as a set of
2D images, showing different views of the object.
In the object-centred approach, a single 3D model is used to describe
the object.
Marking scheme
4 marks.
2.
a. Draw a cross-sectional diagram showing how a lens forms an image
(P’ ) of a point (P). Ensure that you label the optical centre (O), the
focal point (F), and the coordinates of the world point (y,z) and the
image point (y’,z’).
[5 marks]
Marking scheme
5 marks
b. Derive the thin lens equation, which relates the focal length of a lens
to the depths of the image and object.
[6 marks]
From similar triangles:
y′/z′ = y/z  ⟹  y′ = z′y/z
and
y′/(z′ − f) = y/f  ⟹  y′ = (z′ − f)y/f
Equating for y′: z′y/z = (z′ − f)y/f
y cancels, hence: z′/z = (z′ − f)/f = z′/f − 1
Dividing both sides by z′: 1/z = 1/f − 1/z′  ⟹  1/z + 1/z′ = 1/f
Marking scheme
6 marks
c. If a lens has a focal length of 30mm at what depth should the image
plane be placed to bring an object 6m from the camera into focus?
[3 marks]
1/f = 1/‖z‖ + 1/‖z′‖  ⟹  1/‖z′‖ = 1/f − 1/‖z‖
For an object at 6m (= 6000mm):
1/‖z′‖ = 1/30 − 1/6000  ⟹  z′ = 6000/199 ≈ 30.15mm
Marking scheme
3 marks.
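A quick numeric check of this calculation in Python (all lengths in mm):

```python
f = 30.0       # focal length in mm
z = 6000.0     # object depth in mm (6 m)
# thin lens equation: 1/f = 1/z + 1/z'  =>  1/z' = 1/f - 1/z
z_prime = 1.0 / (1.0 / f - 1.0 / z)
print(round(z_prime, 2))  # 30.15
```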
d. Briefly compare the mechanisms used for focusing a camera and an
eye.
[4 marks]
A camera lens has a fixed shape, and hence, a fixed focal length.
Focusing is achieved by moving the lens to change the distance to the
image plane.
An eye lens has an adjustable shape, and hence, a variable focal length,
whereas the distance between the lens and the image plane (the retina)
is fixed. Focusing is achieved by changing the focal length of the lens.
Marking scheme
4 marks.
e. Derive the equation for the pinhole camera model of image formation
relating the coordinates of a 3D point P(y,z) to the coordinates of its
image P’(y’,f’). Note that in the pinhole camera model, the image
plane is located at distance f’ from the optical centre.
[4 marks]
From similar triangles (as before):
y′/z′ = y/z  ⟹  y′ = z′y/z
Substituting z′ = f′ gives: y′ = f′y/z
Marking scheme
4 marks.
f. Use the pinhole camera model to calculate the coordinates (x’,y’) of
the image of a point in 3D space which has coordinates (0.4,0.5,6)
measured, in metres, relative to the optical centre of the camera.
Assume that the lens has a focal length of 30mm.
[3 marks]
x′ = f′x/z = (30 × 400)/6000 = 2mm
y′ = f′y/z = (30 × 500)/6000 = 2.5mm
Marking scheme
1 mark each, plus 1 additional mark for getting the units correct.
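A quick numeric check (lengths converted to mm, so 0.4 m = 400 mm, 0.5 m = 500 mm, 6 m = 6000 mm):

```python
f = 30.0                         # focal length in mm
x, y, z = 400.0, 500.0, 6000.0   # world point in mm
x_p = f * x / z                  # pinhole projection: x' = f'x/z
y_p = f * y / z                  # pinhole projection: y' = f'y/z
print(x_p, y_p)  # 2.0 2.5 (mm)
```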
3.
a. To locate intensity discontinuities in an image a difference mask is
usually “combined” with a smoothing mask.
i. How are these masks “combined”?
ii. Why is this advantageous for edge detection?
[5 marks]
i) Masks are combined using convolution.
Marking scheme
2 marks
ii) A difference mask is sensitive to noise as well as to genuine
intensity-level discontinuities.
Combining the two produces a mask that is sensitive to intensity-level
discontinuities that are image features rather than noise.
Marking scheme
3 marks
b. Use the following formula for a 2D Gaussian to calculate a 3-by-3 pixel
numerical approximation to a Gaussian with standard deviation of 0.46
pixels, rounding values to two decimal places.
G(x, y) = (1/(2πσ²)) exp(−(x² + y²)/(2σ²))
[3 marks]
G ≈
0.01 0.07 0.01
0.07 0.75 0.07
0.01 0.07 0.01
Marking scheme
1 mark each for the 3 different values.
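These values can be checked with a short Python/NumPy sketch that evaluates the formula over the nine pixel offsets:

```python
import numpy as np

sigma = 0.46
offsets = np.arange(-1, 2)            # pixel offsets -1, 0, 1
X, Y = np.meshgrid(offsets, offsets)
# 2D Gaussian: G(x,y) = 1/(2*pi*sigma^2) * exp(-(x^2+y^2)/(2*sigma^2))
G = np.exp(-(X**2 + Y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
print(np.round(G, 2))
```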
c. Convolution masks can be used to provide a finite difference
approximation to first and second order directional derivatives. Write
down the masks that approximate the following directional derivatives:
i. −∂/∂x
ii. −∂²/∂y²
[4 marks]
i) −∂/∂x ≈ [−1 1]
ii) −∂²/∂y² ≈ the column vector
−1
2
−1
Marking scheme
2 marks for each correct definition.
Page 10 SEE NEXT PAGE
—— SOLUTIONS ——
January 2018 7CCSMCVI
d. Convolve the Gaussian mask given in answer to question 3.b with the
difference mask given in answer to question 3.c to produce a 4-by-3
pixel x-derivative of Gaussian mask.
[3 marks]
To calculate the x-derivative of Gaussian mask:
Gx = G ∗ [−1, 1] =
−0.01 −0.06 0.06 0.01
−0.07 −0.68 0.68 0.07
−0.01 −0.06 0.06 0.01
Marking scheme
3 marks.
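The convolution G ∗ [−1, 1] can be checked row by row with NumPy's 1D full convolution, which grows each 3-element row to 4 elements:

```python
import numpy as np

# the rounded 3x3 Gaussian mask from question 3.b
G = np.array([[0.01, 0.07, 0.01],
              [0.07, 0.75, 0.07],
              [0.01, 0.07, 0.01]])
# full 1D convolution of each row with the difference mask [-1, 1]
Gx = np.array([np.convolve(row, [-1, 1]) for row in G])
print(np.round(Gx, 2))
```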
e. In order to locate intensity discontinuities in both the x and y directions
an image can be convolved with an x-derivative of Gaussian mask and
a y-derivative of Gaussian mask. Assuming the result of these two
convolutions are two images Ix and Iy of equal size, a single image
showing intensity discontinuities in all directions can be calculated by
taking the L2-norm of corresponding pixels in these two images. Write
a MATLAB function Ixy = l2norm(Ix, Iy) that will combine Ix and
Iy using the L2-norm.
[4 marks]
function Ixy = l2norm(Ix,Iy)
Ixy=sqrt(Ix.^2+Iy.^2);
Marking scheme
4 marks.
f. Derivative of Gaussian masks (in the x and y directions) are used by the
Canny edge detector. Describe briefly in words, or using pseudo-code,
each step performed by the Canny edge detection algorithm.
[6 marks]
1. convolve the image with each derivative of Gaussian mask, to gen-
erate Ix and Iy.
2. calculate the magnitude and direction of the intensity gradient:
M = √(Ix² + Iy²), D = tan⁻¹(Iy/Ix).
3. perform non-maximum suppression (thin multi-pixel wide edges
down to a single pixel by setting M to zero for all pixels that have a
neighbour, perpendicular to the direction of the edge, with a higher
magnitude).
4. perform hysteresis thresholding (pixels above the high threshold are
set to one, pixels below the low threshold are set to zero, pixels with
values between the two thresholds are set to one if they are connected
to a pixel with a magnitude over the high threshold, and set to zero
otherwise).
Marking scheme
6 marks
4.
a. Below are four simple images. For each image identify the “Gestalt
Law” that accounts for the observed grouping of the image elements.
i.
ii.
iii.
iv.
[8 marks]
i) similarity
ii) proximity
iii) common region
iv) closure
Marking scheme
2 marks for each correct definition.
b. One method of grouping image elements is clustering. Write pseudo-
code for the agglomerative hierarchical clustering algorithm.
[5 marks]
1. Assign each data point to a unique cluster
2. Compute the similarity between each pair of clusters (store this in
a proximity matrix)
3. Repeat
4. Merge the two closest clusters
5. Update the proximity matrix
6. Until only a single cluster remains (or an earlier stopping criterion
has been met)
Marking scheme
5 marks.
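The pseudo-code above can be sketched in Python. This is an illustrative implementation only (the name agglomerative_cluster is hypothetical), assuming SAD as the distance measure and centroid linkage, as used in question 4.c; for simplicity it recomputes the proximities each iteration rather than maintaining a proximity matrix:

```python
import numpy as np

def agglomerative_cluster(points, n_clusters=1):
    """Agglomerative hierarchical clustering (centroid linkage, SAD distance)."""
    pts = np.asarray(points, dtype=float)
    clusters = [[i] for i in range(len(pts))]  # 1. each point is its own cluster
    while len(clusters) > n_clusters:          # 3-6. repeat until stopping criterion
        best = None
        for a in range(len(clusters)):         # 2./5. (re)compute pairwise proximities
            for b in range(a + 1, len(clusters)):
                d = np.abs(pts[clusters[a]].mean(axis=0)
                           - pts[clusters[b]].mean(axis=0)).sum()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best                         # 4. merge the two closest clusters
        clusters[a] += clusters[b]
        del clusters[b]
    return clusters
```

Running it on the six pixels of question 4.c with n_clusters=3 reproduces the three regions found there.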
c. The array below shows feature vectors for each pixel in a 2-by-3 pixel
image.
(10, 15, 5) (15, 15, 15)
(5, 15, 10) (20, 10, 15)
(10, 20, 5) (10, 15, 5)
Apply the agglomerative hierarchical clustering algorithm to assign pix-
els into three regions. Assume that (1) the method used to assess
similarity is the sum of absolute differences (SAD), and (2) centroid
clustering is used to calculate the distance between clusters.
[8 marks]
Each point is a separate cluster initially. Compute the distance
between each pair of clusters:
                  c1  c2  c3  c4  c5  c6
c1: (10, 15, 5)    −
c2: (15, 15, 15)  15   −
c3: (5, 15, 10)   10  15   −
c4: (20, 10, 15)  25  10  25   −
c5: (10, 20, 5)    5  20  15  30   −
c6: (10, 15, 5)    0  15  10  25   5   −
Merge closest clusters (c1 and c6), and update the proximity matrix:
                     c1+c6  c2  c3  c4  c5
c1+c6: (10, 15, 5)      −
c2: (15, 15, 15)       15   −
c3: (5, 15, 10)        10  15   −
c4: (20, 10, 15)       25  10  25   −
c5: (10, 20, 5)         5  20  15  30   −
Merge closest clusters (c1+c6 and c5), and update the proximity matrix:
                           c1+c6+c5  c2  c3  c4
c1+c6+c5: (10, 16.67, 5)       −
c2: (15, 15, 15)           16.67     −
c3: (5, 15, 10)            11.67    15   −
c4: (20, 10, 15)           26.67    10  25   −
Merge closest clusters (c2 and c4). We have three regions, so stop.
Regions are: c1+c6+c5, c2+c4, and c3.
Marking scheme
8 marks. 3 marks for proximity matrices, 2 for calculating distances
correctly, 2 for merging closest clusters correctly, 1 mark for knowing
when to stop.
d. In question 4.c SAD was used to assess the similarity between clusters.
It is also possible to perform clustering using a number of other stan-
dard metrics. If a and b represent the feature vectors associated with
two clusters, write down the formulae for comparing these two vectors
using:
i. sum of squared differences
ii. correlation coefficient
[4 marks]
sum of squared differences = Σᵢ (aᵢ − bᵢ)²
correlation coefficient = Σᵢ (aᵢ − ā)(bᵢ − b̄) / ( √(Σᵢ (aᵢ − ā)²) √(Σᵢ (bᵢ − b̄)²) )
Marking scheme
2 marks for each correct definition.
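Both metrics are straightforward to implement; an illustrative Python sketch (function names are hypothetical):

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two feature vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.sum((a - b) ** 2)

def correlation(a, b):
    """Correlation coefficient between two feature vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    da, db = a - a.mean(), b - b.mean()
    return np.sum(da * db) / (np.sqrt(np.sum(da**2)) * np.sqrt(np.sum(db**2)))
```

Note that SSD is a distance (smaller = more similar) while correlation is a similarity (larger = more similar).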
5.
a. Define what is meant by the “aperture problem” and suggest how this
problem can be overcome.
[4 marks]
The aperture problem refers to the fact that the direction of motion
of a small image patch can be ambiguous.
In particular, for an edge, information is only available about the
motion component perpendicular to the edge; no information is
available about the component of motion parallel to the edge.
Overcoming the aperture problem might be achieved by
1. integrating information from many local motion detectors / image
patches, or
2. by giving preference to image locations where image structure pro-
vides unambiguous information about optic flow (e.g. corners).
Marking scheme
2 marks for describing problem, 1 mark each for possible solutions.
b. Two frames in a video sequence were taken at times t and t+0.04s.
The point (110,50,t) in the first image has been found to correspond to
the point (95,50,t+0.04) in the second image. Given that the camera
is moving at 0.5ms−1 along the camera x-axis, the focal length of the
camera is 35mm, and the pixel size of the camera is 0.1mm/pixel,
calculate the depth of the identified scene point.
[4 marks]
Answer: The depth is given by Z = −f·Vx/ẋ.
The velocity of the image point is (95 − 110)/0.04 = −375 pixels/s.
Given the pixel size this is equivalent to 0.0001 × −375 = −0.0375 m/s.
Hence, the depth is Z = −(0.035 × 0.5)/(−0.0375) = 0.467m.
Marking scheme
2 marks for equation, 2 marks for correct application.
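A quick numeric check of this calculation (SI units throughout):

```python
f = 0.035             # focal length: 35 mm in metres
Vx = 0.5              # camera velocity along x, m/s
pixel_size = 0.0001   # 0.1 mm/pixel in metres
x_dot = (95 - 110) / 0.04 * pixel_size   # image velocity in m/s (-0.0375)
Z = -f * Vx / x_dot                      # depth from lateral camera motion
print(round(Z, 3))  # 0.467
```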
c. Two frames in a video sequence were taken at times t and t+0.04s.
The point (140,100,t) in the first image has been found to correspond
to the point (145,100,t+0.04) in the second image. Given that the
camera is moving at 0.5ms−1 along the optical axis of the camera
(i.e., the z-axis), and the centre of the image is at pixel coordinates
(100,100), calculate the depth of the identified scene point.
[4 marks]
Answer: The depth is given by Z = x₁·Vz/ẋ, where x₁ is measured
relative to the image centre.
The coordinates of the points with respect to the centre of the image
are: (40, 0, t) and (45, 0, t+0.04).
The velocity of the image point is (45 − 40)/0.04 = 125 pixels/s.
Hence, the depth is Z = (40 × 0.5)/125 = 0.16m.
Marking scheme
2 marks for equation, 2 marks for correct application.
d. Give an equation for the time-to-collision of a camera and a scene point
which does not require the recovery of the depth of the scene point.
Using this equation, calculate the time-to-collision of the camera and
the scene point in question 5.c, assuming the camera velocity remains
constant.
[3 marks]
The time-to-collision is given by τ = x/ẋ, which requires neither the
depth nor the camera velocity.
Hence, time-to-collision = 40/125 = 0.32s.
Marking scheme
2 marks for equation, 1 mark for correct application.
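Both the depth from question 5.c and this time-to-collision can be checked numerically:

```python
Vz = 0.5                     # camera velocity along the optical axis, m/s
x = 140 - 100                # pixel position relative to the image centre (40)
x_dot = (145 - 140) / 0.04   # image velocity in pixels/s (125)
Z = x * Vz / x_dot           # depth (question 5.c)
ttc = x / x_dot              # time-to-collision: no depth or velocity needed
print(round(Z, 2), round(ttc, 2))  # 0.16 0.32
```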
e. In order to calculate depth or time-to-collision using video, it is neces-
sary to determine which image locations in two video frames correspond
to the same location in the world. Briefly describe two constraints typ-
ically applied to solving this video correspondence problem, and note
circumstances in which each constraint fails.
[6 marks]
• Spatial coherence (assume neighbouring points have similar optical
flow). Fails at discontinuities between surfaces at different depths.
• Small motion (assume optical flow vectors have small magnitude).
Fails if relative motion is fast or frame rate is slow.
Marking scheme
2 for each, plus 1 each for failure cases
f. There are many other cues to depth that can be obtained from a single
image. Name any four of these monocular cues to depth.
[4 marks]
Any four from:
• Interposition/Occlusion
• Size familiarity
• Linear perspective
• Aerial perspective
Marking scheme
1 mark for each.
6.
a. What are “geons”, and what is their hypothesised role in biological
object recognition?
[4 marks]
Geons are geometrical icons: simple volumes such as cubes, spheres,
cylinders, and wedges. The hypothesis is that object recognition in
biological systems is based on the ability to recognise a small set of
such shapes (geons) from which more complex objects are built up. The
visual system breaks down an object into geons and compares this
arrangement of geons with the arrangements of geons of known objects.
Marking scheme
2 marks for knowing what geons are. 2 marks for explaining how they
are used.
b. Below are shown three binary templates T1, T2 and T3 together with
a patch I of a binary image.
T1 =
1 1 1
1 1 1
1 1 1

T2 =
1 1 1
1 1 0
1 1 1

T3 =
1 1 1
1 0 0
1 1 1

I =
1 1 1
1 0 1
1 1 1

Determine which template best matches the image patch using the
following similarity measures:
i. cross-correlation,
[3 marks]
Page 20 SEE NEXT PAGE
—— SOLUTIONS ——
January 2018 7CCSMCVI
ii. normalised cross-correlation,
[3 marks]
iii. sum of absolute differences.
[3 marks]
i) Cross-correlation.
Similarity = Σᵢⱼ T(i, j) I(i, j)
For T1, Similarity = 8
For T2, Similarity = 7
For T3, Similarity = 7
T1 is the best match.
Marking scheme
2 for correct method, 1 for correct match
ii) Normalised cross-correlation.
Similarity = Σᵢⱼ T(i, j) I(i, j) / ( √(Σᵢⱼ T(i, j)²) √(Σᵢⱼ I(i, j)²) )
For T1, Similarity = 8/(√9 × √8) = 0.943
For T2, Similarity = 7/(√8 × √8) = 0.875
For T3, Similarity = 7/(√7 × √8) = 0.935
T1 is the best match.
Marking scheme
2 for correct method, 1 for correct match
iii) Sum of absolute differences.
Distance = Σᵢⱼ |T(i, j) − I(i, j)|
For T1, Distance = 8 × |1 − 1| + 1 × |1 − 0| = 1
For T2, Distance = 7 × |1 − 1| + 1 × |1 − 0| + 1 × |0 − 1| = 2
For T3, Distance = 7 × |1 − 1| + 1 × |1 − 0| + 1 × |0 − 0| = 1
T1 and T3 match equally well.
Marking scheme
2 for correct method, 1 for correct match
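The three measures can be checked with a short Python/NumPy sketch (function names are illustrative):

```python
import numpy as np

def cross_corr(T, I):
    """Cross-correlation: sum of elementwise products."""
    return np.sum(T * I)

def norm_cross_corr(T, I):
    """Normalised cross-correlation."""
    return np.sum(T * I) / (np.sqrt(np.sum(T**2)) * np.sqrt(np.sum(I**2)))

def sad(T, I):
    """Sum of absolute differences (a distance, not a similarity)."""
    return np.sum(np.abs(T - I))

T1 = np.ones((3, 3))
T2 = np.array([[1, 1, 1], [1, 1, 0], [1, 1, 1]], float)
T3 = np.array([[1, 1, 1], [1, 0, 0], [1, 1, 1]], float)
I  = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]], float)
for T in (T1, T2, T3):
    print(cross_corr(T, I), round(norm_cross_corr(T, I), 3), sad(T, I))
```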
c. Below are an edge template T and a binary image I which is the result
of pre-processing an image to locate the edges.
T =
1 1 1
1 0 1
1 1 1

I =
0 0 1 0
0 1 1 1
0 0 0 1
0 1 1 1

Calculate the result of performing edge matching on the image, and
hence, suggest the location of the object depicted in the edge template
assuming that there is exactly one such object in the image. Calculate
the distance between the template and the image as the average of
the minimum distances between points on the edge template (T ) and
points on the edge image (I). Only consider those locations where the
template fits entirely within the image.
[5 marks]
At pixel (2,2): Distance = (1/8)[√2 + 1 + 0 + 1 + 0 + √2 + 1 + 1] = 0.854
At pixel (3,2): Distance = (1/8)[1 + 0 + 1 + 0 + 0 + 1 + 1 + 0] = 0.5
At pixel (2,3): Distance = (1/8)[1 + 0 + 0 + √2 + 1 + 1 + 0 + 0] = 0.552
At pixel (3,3): Distance = (1/8)[0 + 0 + 0 + 1 + 0 + 0 + 0 + 0] = 0.125
Hence, the object is at location (3,3).
Marking scheme
5 marks.
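The edge-matching computation can be reproduced with a brute-force Python sketch. This is illustrative only (names like avg_min_distance are hypothetical); arrays are 0-indexed (row, col), so the exam's pixel (3,3) corresponds to centre (2,2) here:

```python
import numpy as np

# offsets of the ring of ones in the 3x3 edge template T, relative to its centre
T_offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

I = np.array([[0, 0, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 0, 1],
              [0, 1, 1, 1]])
edges = np.argwhere(I == 1)   # coordinates of edge pixels in the image

def avg_min_distance(centre):
    """Average, over template edge points, of the distance to the nearest image edge."""
    r, c = centre
    total = 0.0
    for dr, dc in T_offsets:
        p = np.array([r + dr, c + dc])
        total += np.min(np.linalg.norm(edges - p, axis=1))
    return total / len(T_offsets)

# only centres where the template fits entirely within the 4x4 image
scores = {(r, c): avg_min_distance((r, c)) for r in (1, 2) for c in (1, 2)}
best = min(scores, key=scores.get)
print(best, round(scores[best], 3))
```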
d. A production line produces two objects (A and B) which are sorted into
separate bins using a computer vision system controlling a robot arm.
The two objects have distinct shapes from most viewpoints. However,
when object A lies at orientation 1 it is indistinguishable from object
B lying at orientation 2.
It is known that the production line produces four times as many of
object A as object B. It is also known that the probability of object
A lying at orientation 1 is 0.02, while the probability of object B
lying at orientation 2 is 0.04.
Use Bayes’ theorem to determine the bin into which the robot should
sort an object which could be either object A at orientation 1 or
object B at orientation 2 in order to minimise the number of errors.
[7 marks]
p(objA) = 0.8
p(objB) = 0.2
p(I|objA) = 0.02
p(I|objB) = 0.04
p(objA|I) = p(I|objA)p(objA)/p(I) = k(0.02 × 0.8) = 0.016k
p(objB|I) = p(I|objB)p(objB)/p(I) = k(0.04 × 0.2) = 0.008k
Hence, indistinguishable objects are most likely to be object A, so the
robot should sort them into the bin for object A.
Marking scheme
3 marks for knowing Bayes’ theorem, 4 marks for knowing how to
correctly apply the theorem to this task.
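The comparison can be checked numerically; the posteriors are left unnormalised since the evidence p(I) cancels when comparing them:

```python
# priors and likelihoods from the question
p_A, p_B = 0.8, 0.2          # A is produced four times as often as B
p_I_given_A = 0.02           # probability of A lying at orientation 1
p_I_given_B = 0.04           # probability of B lying at orientation 2

# unnormalised posteriors: p(obj|I) is proportional to p(I|obj) * p(obj)
post_A = p_I_given_A * p_A   # 0.016
post_B = p_I_given_B * p_B   # 0.008
print('A' if post_A > post_B else 'B')  # A
```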