UNIVERSITY COLLEGE LONDON
Faculty of Engineering Sciences
BENG0095: Assignment 2
Dr. Dariush Hosseini ([email protected])
Monday 14th December 2020
Overview
• Assignment Release Date: 14th December 2020
• Assignment Hand-in Date: 5th January 2021 at 4.00 p.m.
• Weighting: 25% of module total
• Format: Problems
Guidelines
• You should answer both questions.
• Note that not all questions carry equal marks.
• You should submit your final report as a pdf via the module’s Moodle page.
• Within your report you should begin each question on a new page.
• You should preface your report with a single page containing, on two lines:
– The module code and assignment title: ‘BENG0095: Assignment 2’
– Your candidate number: ‘Candidate Number: [YOUR NUMBER]’
• Your report should be neat and legible.
• You must use LaTeX to format the report. A template, ‘BENG0095 Solution Template.tex’,
is provided on the module’s Moodle page for this purpose.
• Please attempt to express your answers as succinctly as possible.
• Please note that if your answer to a question or sub-question is illegible or incomprehensible
to the marker then you will receive no marks for that question or sub-question.
• Please remember to detail your working, and state clearly any assumptions which you
make.
• Unless a question specifies otherwise, please make use of the Notation section as a
guide to the definition of objects.
• Failure to adhere to any of the guidelines may result in question-specific deduction of
marks. If warranted these deductions may be punitive, and on occasion may result in no
marks being awarded for the assignment.
Notation & Formulae
Inputs:
$\mathbf{x} = [1, x_1, x_2, \dots, x_m]^T \in \mathbb{R}^{m+1}$
Outputs:
y ∈ R for regression problems
y ∈ {0, 1} for binary classification problems
Training Data:
$\mathcal{S} = \{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^{n}$
Input Training Data:
The design matrix, X, is defined as:
\[
\mathbf{X} =
\begin{bmatrix}
\mathbf{x}^{(1)T} \\
\mathbf{x}^{(2)T} \\
\vdots \\
\mathbf{x}^{(n)T}
\end{bmatrix}
=
\begin{bmatrix}
1 & x_1^{(1)} & \cdots & x_m^{(1)} \\
1 & x_1^{(2)} & \cdots & x_m^{(2)} \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_1^{(n)} & \cdots & x_m^{(n)}
\end{bmatrix}
\]
Output Training Data:
\[
\mathbf{y} =
\begin{bmatrix}
y^{(1)} \\
y^{(2)} \\
\vdots \\
y^{(n)}
\end{bmatrix}
\]
Data-Generating Distribution:
$\mathcal{S}$ is drawn i.i.d. from a data-generating distribution, $\mathcal{D}$
Derivatives of Traces:
\[
\frac{\partial}{\partial \mathbf{X}} \operatorname{tr}\left(\mathbf{B}\mathbf{X}\mathbf{X}^T\right) = \mathbf{B}\mathbf{X} + \mathbf{B}^T\mathbf{X}
\]
\[
\frac{\partial}{\partial \mathbf{X}} \operatorname{tr}\left(\mathbf{B}\mathbf{X}^T\mathbf{X}\right) = \mathbf{X}\mathbf{B}^T + \mathbf{X}\mathbf{B}
\]
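The two identities above can be sanity-checked numerically by comparing them against a central finite-difference gradient; the shapes and random data below are assumptions chosen purely for the demonstration:

```python
import numpy as np

# Illustrative check (not part of the assignment): verify
#   d/dX tr(B X X^T) = B X + B^T X   and   d/dX tr(B X^T X) = X B^T + X B
# by central finite differences on random matrices.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
B1 = rng.standard_normal((4, 4))   # B in tr(B X X^T) must be 4x4
B2 = rng.standard_normal((3, 3))   # B in tr(B X^T X) must be 3x3

def num_grad(f, X, eps=1e-6):
    """Central-difference gradient of a scalar function of a matrix."""
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

g1 = num_grad(lambda X: np.trace(B1 @ X @ X.T), X)
g2 = num_grad(lambda X: np.trace(B2 @ X.T @ X), X)

assert np.allclose(g1, B1 @ X + B1.T @ X, atol=1e-4)
assert np.allclose(g2, X @ B2.T + X @ B2, atol=1e-4)
```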
Properties of Rotation Matrices:
For any n-dimensional rotation matrix, $\mathbf{R} \in \mathbb{R}^{n \times n}$, its properties include:
\[
\mathbf{R}\mathbf{R}^T = \mathbf{I}, \qquad \det \mathbf{R} = 1
\]
1. (a) [3 marks]
State the Representer Theorem, and briefly explain its importance in kernel approaches
to machine learning.
(b) [4 marks]
Consider a version of the so-called ‘Support Vector Machine’ (SVM) classifier optimisation
problem:
\[
\min_{\mathbf{w}} \; \frac{1}{2}\left\lVert\mathbf{w}\right\rVert_2^2 + C \sum_{i=1}^{n} \max\left(0,\; 1 - y^{(i)}\,\mathbf{w}\cdot\mathbf{x}^{(i)}\right)
\]
Here:
$\{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^{n}$ represents a set of training data, where $\mathbf{x}^{(i)} \in \mathbb{R}^m$ are input attributes,
while $y^{(i)} \in \{-1, 1\}$ is the output label;
$\mathbf{w} \in \mathbb{R}^m$ is the weight vector of the linear discriminant which we seek, and;
C > 0 is some constant.
Is this problem susceptible to the ‘Kernel Trick’? Explain.
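As a purely illustrative aside, the objective stated in part (b) can be evaluated directly; the toy data, weight vector, and constant below are assumptions chosen for the sketch, not part of the question:

```python
import numpy as np

# Minimal sketch: evaluating the hinge-loss SVM objective
#   (1/2)||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i))
# for an arbitrary candidate w on random toy data.
rng = np.random.default_rng(4)
X = rng.standard_normal((8, 3))             # 8 points, 3 attributes
y = np.where(rng.random(8) < 0.5, -1, 1)    # labels in {-1, +1}
w = rng.standard_normal(3)
C = 1.0

margins = y * (X @ w)                       # y_i * (w . x_i) for each i
objective = 0.5 * w @ w + C * np.maximum(0.0, 1.0 - margins).sum()

# Hinge losses are non-negative, so the objective is at least the regulariser.
assert objective >= 0.5 * w @ w
```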
(c) [3 marks]
State Mercer’s Theorem, and briefly explain its importance in kernel approaches to
machine learning.
(d) Given: that $\kappa_1 : \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}$ and $\kappa_2 : \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}$ are valid Mercer kernels; that
$\tilde{\kappa}(\mathbf{u},\mathbf{v}) = \kappa_1(\mathbf{u},\mathbf{v})\,\kappa_2(\mathbf{u},\mathbf{v})$ is also a valid kernel; that $\alpha > 0$ is some scalar; and that
$p(\cdot)$ is a polynomial with non-negative coefficients, show that the following are
valid kernels (here $\mathbf{u},\mathbf{v} \in \mathbb{R}^m$ denote input attribute vectors):
(i) [3 marks]
κ(u,v) = κ1(u,v) + κ2(u,v)
(ii) [3 marks]
κ(u,v) = ακ1(u,v)
(iii) [3 marks]
κ(u,v) = p(κ1(u,v))
(iv) [3 marks]
κ(u,v) = exp(κ1(u,v))
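The four closure properties above can also be checked numerically (a sanity check on random data, not a proof); the particular kernels $\kappa_1$, $\kappa_2$, the scalar, and the polynomial below are assumptions chosen for the demonstration:

```python
import numpy as np

# Sanity check (not a proof): the Gram matrices of k1+k2, alpha*k1,
# p(k1) applied elementwise, and exp(k1) applied elementwise all remain
# positive semi-definite on random data.
rng = np.random.default_rng(1)
X = rng.standard_normal((6, 3))

k1 = X @ X.T                      # linear kernel Gram matrix (valid kernel)
k2 = (1.0 + X @ X.T) ** 2         # quadratic polynomial kernel (valid kernel)
alpha = 2.5

def min_eig(K):
    """Smallest eigenvalue of a symmetric matrix."""
    return np.linalg.eigvalsh(K).min()

# p(t) = 3 + 2t + 0.5 t^2 has non-negative coefficients.
for K in [k1 + k2, alpha * k1, 3.0 + 2.0 * k1 + 0.5 * k1**2, np.exp(k1)]:
    assert min_eig(K) > -1e-8  # PSD up to numerical tolerance
```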
(e) [3 marks]
Hence show that the following ‘Radial Basis Function’ (RBF) is a valid kernel:
\[
\kappa(\mathbf{u},\mathbf{v}) = \exp\left(-\frac{\left\lVert\mathbf{u}-\mathbf{v}\right\rVert_2^2}{2\sigma^2}\right)
\]
where $\sigma > 0$.
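As an illustrative numerical companion to this part (the data and $\sigma$ below are assumptions for the demo), the RBF Gram matrix factorises entrywise as $\exp(-\lVert\mathbf{u}\rVert^2/2\sigma^2)\,\exp(\mathbf{u}\cdot\mathbf{v}/\sigma^2)\,\exp(-\lVert\mathbf{v}\rVert^2/2\sigma^2)$, which is the structure the proof exploits:

```python
import numpy as np

# Check the factorisation behind the RBF-kernel validity argument:
# the middle factor is exp of a valid kernel, and the outer factors
# have the form f(u)f(v), so the product is itself a valid kernel.
rng = np.random.default_rng(2)
X = rng.standard_normal((5, 3))
sigma = 1.3

sq = np.sum(X**2, axis=1)                       # ||x||^2 for each row
D2 = sq[:, None] + sq[None, :] - 2 * X @ X.T    # pairwise squared distances
K_rbf = np.exp(-D2 / (2 * sigma**2))            # RBF Gram matrix

f = np.exp(-sq / (2 * sigma**2))                # f(u) = exp(-||u||^2/(2 sigma^2))
K_factored = f[:, None] * np.exp(X @ X.T / sigma**2) * f[None, :]

assert np.allclose(K_rbf, K_factored)           # same matrix, entry by entry
assert np.linalg.eigvalsh(K_rbf).min() > -1e-10 # PSD up to tolerance
```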
[Total for Question 1: 25 marks]
2. Assume an unlabelled dataset, $\{\mathbf{x}^{(i)} \in \mathbb{R}^m\}_{i=1}^{n}$, with sample mean $\bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{x}^{(i)}$.
In the ‘Projected Variance Maximisation’ approach to PCA we are interested in finding the
d-dimensional subspace spanned by $\{\mathbf{u}^{[j]} \in \mathbb{R}^m\}_{j=1}^{d}$, where $d < m$, such that $\mathbf{u}^{[i]} \cdot \mathbf{u}^{[j]} = \delta_{ij}\ \forall i,j$, for which the sum of the sample variance of the data projected onto this subspace
is maximised.
We let $\mathbf{X} = [(\mathbf{x}^{(1)} - \bar{\mathbf{x}}), (\mathbf{x}^{(2)} - \bar{\mathbf{x}}), \dots, (\mathbf{x}^{(n)} - \bar{\mathbf{x}})]^T$ and $\mathbf{U} = [\mathbf{u}^{[1)},\mathbf{u}^{[2]}, \dots, \mathbf{u}^{[d]}]$.
(a) [9 marks]
Consider the orthogonal projection, $P_{\mathbf{U}}$, of the unlabelled data onto the space spanned
by the columns of $\mathbf{U}$, such that the projection of the data point $\mathbf{x}^{(i)}$ is given by:
\[
P_{\mathbf{U}}\mathbf{x}^{(i)} = \mathbf{U}\left(\mathbf{U}^T\mathbf{U}\right)^{-1}\mathbf{U}^T\mathbf{x}^{(i)}
\]
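A quick numerical sketch of this projection (the random $\mathbf{U}$ below is an assumption, taken full column rank for the demo) confirms the two defining properties of an orthogonal projection:

```python
import numpy as np

# Illustrative check: P = U (U^T U)^{-1} U^T is idempotent and symmetric,
# as an orthogonal projection matrix must be.
rng = np.random.default_rng(5)
U = rng.standard_normal((6, 2))                 # 6-dim data, 2-dim subspace
P = U @ np.linalg.inv(U.T @ U) @ U.T

assert np.allclose(P @ P, P)    # idempotent: projecting twice changes nothing
assert np.allclose(P, P.T)      # symmetric

x = rng.standard_normal(6)
assert np.allclose(P @ (P @ x), P @ x)  # same property applied to a data point
```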
Using this expression, and without making any further assumptions regarding the
nature of U, demonstrate that the ‘Projected Variance Maximisation’ approach to PCA
leads to an optimisation problem which can be solved by finding the stationary points
associated with the following Lagrangian, where $\mathbf{H}$ is a symmetric matrix composed of
Lagrange multipliers associated with the problem. (Note that we denote the Lagrange
multiplier associated with the constraint $\mathbf{u}^{[i]} \cdot \mathbf{u}^{[j]} = \delta_{ij}$ by $\lambda^{[i,j]}$):
\[
\operatorname{tr}\left(\mathbf{U}^T\mathbf{S}\mathbf{U}\right) + \operatorname{tr}\left(\mathbf{H}\left(\mathbf{I} - \mathbf{U}^T\mathbf{U}\right)\right)
\]
In so doing you should show that:
\[
\mathbf{S} = \frac{1}{n}\mathbf{X}^T\mathbf{X}, \qquad
[\mathbf{H}]_{ij} =
\begin{cases}
\lambda^{[i,i]}, & i = j, \\
\dfrac{\lambda^{[i,j]}}{2}, & i \neq j.
\end{cases}
\]
(b) [7 marks]
Making use of the expressions for the derivatives of traces included in the ‘Notation &
Formulae’ section, and seeking critical points of the Lagrangian, we derive the following
problem:
\begin{align*}
\nabla_{\mathbf{U}}\mathcal{L} = \mathbf{0} &= \frac{\partial}{\partial \mathbf{U}}\operatorname{tr}\left(\mathbf{U}^T\mathbf{S}\mathbf{U}\right) + \frac{\partial}{\partial \mathbf{U}}\operatorname{tr}\left(\mathbf{H}\left(\mathbf{I} - \mathbf{U}^T\mathbf{U}\right)\right) \\
&= \mathbf{S}\mathbf{U} + \mathbf{S}^T\mathbf{U} - \mathbf{U}\mathbf{H} - \mathbf{U}\mathbf{H}^T \\
&= 2\left(\mathbf{S}\mathbf{U} - \mathbf{U}\mathbf{H}\right) \\
\implies \mathbf{S}\mathbf{U} &= \mathbf{U}\mathbf{H} \tag{1}
\end{align*}
One solution to this problem is given by $\mathbf{U} = \mathbf{U}_e$, where the columns of $\mathbf{U}_e$ are the
first d eigenvectors of $\mathbf{S}$, $\{\mathbf{u}_e^{[j]}\}_{j=1}^{d}$, with corresponding eigenvalues $\{\lambda_e^{[j]}\}_{j=1}^{d}$, which are
ordered by magnitude with the largest first.
In this case $\mathbf{H}_e$, the form of the matrix $\mathbf{H}$ associated with this solution, is characterised
by:
\[
\mathbf{H}_e = \operatorname{diag}\left[\lambda_e^{[1]}, \dots, \lambda_e^{[d]}\right]
\]
Typical algorithms for finding the eigenvectors of a k×k matrix have a computational
cost that scales like O(k3).
Using problem (1) and the solution given by $\mathbf{U} = \mathbf{U}_e$, derive the following equation
which characterises equivalent solutions $\{\mathbf{v}^{[j]} \in \mathbb{R}^n\}_{j=1}^{d}$:
\[
\frac{1}{n}\mathbf{X}\mathbf{X}^T\mathbf{v}^{[j]} = \lambda_e^{[j]}\mathbf{v}^{[j]}
\]
In so doing, characterise $\mathbf{v}^{[j]}$ in terms of quantities given so far.
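A numerical illustration of the relationship at the heart of this part (this is a sketch on random data under assumed dimensions, not a derivation): the $m \times m$ matrix $\frac{1}{n}\mathbf{X}^T\mathbf{X}$ and the $n \times n$ matrix $\frac{1}{n}\mathbf{X}\mathbf{X}^T$ share their non-zero eigenvalues.

```python
import numpy as np

# Check that (1/n) X^T X (m x m) and (1/n) X X^T (n x n) share their
# non-zero eigenvalues -- the fact that makes the n x n problem usable.
rng = np.random.default_rng(3)
n, m = 5, 40                      # few samples, many features (n << m)
X = rng.standard_normal((n, m))
X = X - X.mean(axis=0)            # centre, as in the definition of X

S_big = (X.T @ X) / n             # m x m sample covariance
S_small = (X @ X.T) / n           # n x n matrix from the equation above

eig_big = np.sort(np.linalg.eigvalsh(S_big))[::-1][:n]
eig_small = np.sort(np.linalg.eigvalsh(S_small))[::-1][:n]
assert np.allclose(eig_big, eig_small, atol=1e-8)
```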
(c) [2 marks]
Explain why this might be valuable when $n \ll m$.
(d) [7 marks]
Given a solution $\mathbf{v}^{[j]}$, find an expression for $\mathbf{u}_e^{[j]}$ in terms of this solution.
[Total for Question 2: 25 marks]