Alternative assessment:
Introduction to Machine Learning, COMP0088
Main Summer Examination period, 2019/20
Suitable for Cohorts: 2019/20, 2018/19, 2017/18
Distribution of Marks:
22: Support Vector Machines
10: Loss functions and margins
20: Loss-based learning and regularization
8: Neural Network training
40: Computational assignment
100: Total
Marks for each part of each question are indicated in square brackets.
There are NINE questions in total. Answer all questions.
For all True/False questions: always support your reply with a short justification, using at most
three sentences, or approximately 10-50 words in total (whichever suits you best). You may
find it easier to reply using equations - in that case there is no sentence or word count. If you
do not provide a justification, your answer will not be taken into consideration, whether
true or false.
Calculators are permitted.
Support Vector Machines [22 Marks]

1. Kernels can be composed by addition and multiplication, yielding new kernels that
satisfy the Mercer condition. This means that the sum of two kernels is a kernel, and the
product of two kernels is a kernel. You are asked to show this by expressing the resulting
kernels in terms of the inner products of the original kernels.

a. Suppose $K_1(x, x') = \psi_1(x)^T \psi_1(x')$ and $K_2(x, x') = \psi_2(x)^T \psi_2(x')$.
Let $K_S$ be the sum kernel, $K_S(x, x') = K_1(x, x') + K_2(x, x')$. Find a feature map
$\psi_S$ such that $K_S(x, x') = \psi_S(x)^T \psi_S(x')$.

[7 marks]

b. Suppose $K_1(x, x') = \psi_1(x)^T \psi_1(x')$ and $K_2(x, x') = \psi_2(x)^T \psi_2(x')$,
where $\psi_1(x) \in \mathbb{R}^D$ and $\psi_2(x) \in \mathbb{R}^K$ are vectors of different
dimensions. Let $K_P$ be the product kernel, $K_P(x, x') = K_1(x, x') K_2(x, x')$. Indicate
the dimension and the expression of a feature map $\psi_P$ such that
$K_P(x, x') = \psi_P(x)^T \psi_P(x')$.

[15 marks]
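As an illustration (not required for a full-marks answer), the numpy sketch below checks one standard construction for each part numerically: concatenation of the feature maps for the sum kernel, and the flattened outer product for the product kernel. The feature maps psi1 and psi2 are arbitrary examples chosen only for the check.

```python
import numpy as np

# Arbitrary example feature maps of different dimensions (for the check only):
# psi1 : R^2 -> R^3,  psi2 : R^2 -> R^2
def psi1(x):
    return np.array([x[0], x[1], x[0] * x[1]])

def psi2(x):
    return np.array([x[0] ** 2, x[1] ** 2])

def K1(x, xp):
    return psi1(x) @ psi1(xp)

def K2(x, xp):
    return psi2(x) @ psi2(xp)

x, xp = np.array([1.0, 2.0]), np.array([-0.5, 3.0])

# Sum kernel: concatenate the feature maps, psi_S(x) = [psi1(x); psi2(x)],
# giving a feature map of dimension D + K.
def psi_S(x):
    return np.concatenate([psi1(x), psi2(x)])

assert np.isclose(psi_S(x) @ psi_S(xp), K1(x, xp) + K2(x, xp))

# Product kernel: flatten the outer product, psi_P(x) = vec(psi1(x) psi2(x)^T),
# giving a feature map of dimension D * K; this works because
# vec(a c^T)^T vec(b d^T) = (a^T b)(c^T d).
def psi_P(x):
    return np.outer(psi1(x), psi2(x)).ravel()

assert np.isclose(psi_P(x) @ psi_P(xp), K1(x, xp) * K2(x, xp))
```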
Loss functions and margins [10 Marks]
2. Consider a point that is correctly classified and distant from the decision boundary. Why
would the SVM's decision boundary be unaffected by this point, while the boundary learned
by logistic regression would be affected? Argue mathematically in terms of the loss functions
used to train the two models, and how they penalize the margin associated with the
point in question. Specify what exactly 'distant' means in the claim above.
[10 marks]
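For intuition, the short sketch below evaluates both losses as functions of the functional margin m = y(w^T x) of a correctly classified point. It illustrates the contrast the question asks about: the hinge loss is exactly zero beyond a finite margin, while the logistic loss is strictly positive for every finite margin.

```python
import numpy as np

def hinge(m):
    # SVM hinge loss as a function of the functional margin m = y * (w^T x)
    return np.maximum(0.0, 1.0 - m)

def logistic(m):
    # Logistic-regression loss as a function of the same margin
    return np.log1p(np.exp(-m))

for m in [0.5, 1.0, 2.0, 10.0]:
    print(f"margin={m:5.1f}  hinge={hinge(m):.6f}  logistic={logistic(m):.6f}")

# margin=  0.5  hinge=0.500000  logistic=0.474077
# margin=  1.0  hinge=0.000000  logistic=0.313262
# margin=  2.0  hinge=0.000000  logistic=0.126928
# margin= 10.0  hinge=0.000000  logistic=0.000045
#
# The hinge loss (and hence its gradient) vanishes once m >= 1, so such a
# point contributes nothing to the SVM objective; the logistic loss keeps a
# nonzero gradient for every finite m, so the point still pulls on the
# logistic-regression boundary.
```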
Loss-based learning and regularization [20 Marks]
3. You are provided with a training set $X = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$.
Your training objective is expressed as the weighted sum of a data-dependent term, $D(w)$,
and an $\ell_2$ regularization term, $R(w)$:

$$J_\lambda(w) = \underbrace{\frac{1}{N} \sum_{i=1}^{N} l(x_i, y_i)}_{D(w)} + \lambda \underbrace{\|w\|_2^2}_{R(w)} \tag{1}$$
The data-dependent term penalizes the deviation of the model predictions from the target
values on the training set, e.g. for least-squares prediction we have
$l(x_i, y_i) = (y_i - w^T x_i)^2$. For a given value of $\lambda$ we denote by $w^*_\lambda$
the optimum of the associated optimization problem:

$$w^*_\lambda = \arg\min_w J_\lambda(w) \tag{2}$$

We assume that we can always compute the global optimum of $J_\lambda(w)$ with respect to
$w$ - this is indeed the case for linear regression, logistic regression, or SVM training.
We are interested in understanding how the learned parameter vector and the associated cost
terms are affected by changes in $\lambda$. You are asked to indicate whether each of the
following statements is true or false, while justifying your answers (please consult the
instructions at the beginning of this test - if you do not provide a valid justification, your
answer will not be taken into consideration).
a. Decreasing $\lambda$ will result in a decrease of $R(w^*_\lambda)$.
[5 marks]
b. Decreasing $\lambda$ will result in a decrease of $D(w^*_\lambda)$.
[5 marks]
c. Decreasing $\lambda$ will result in a decrease of $J_\lambda(w^*_\lambda)$. Here, rather
than a verbal justification, you need to prove this mathematically.
[5 marks]
d. Increasing $\lambda$ always improves generalization performance.
[5 marks]
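The trends probed in parts a-c can be checked empirically. The following ridge-regression sketch (with arbitrary synthetic data, for illustration only) solves problem (2) in closed form over a decreasing grid of $\lambda$ and reports $D(w^*_\lambda)$, $R(w^*_\lambda)$ and $J_\lambda(w^*_\lambda)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (arbitrary, for illustration only)
N, d = 100, 5
X = rng.normal(size=(N, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=N)

def ridge_solution(lam):
    # Closed-form minimizer of J_lambda(w) = (1/N) ||y - X w||^2 + lam ||w||^2:
    # setting the gradient to zero gives (X^T X / N + lam I) w = X^T y / N.
    A = X.T @ X / N + lam * np.eye(d)
    return np.linalg.solve(A, X.T @ y / N)

for lam in [10.0, 1.0, 0.1, 0.01]:          # decreasing lambda
    w = ridge_solution(lam)
    D = np.mean((y - X @ w) ** 2)            # data term D(w*_lambda)
    R = np.sum(w ** 2)                       # regularizer R(w*_lambda)
    J = D + lam * R                          # objective J_lambda(w*_lambda)
    print(f"lambda={lam:6.2f}  D={D:.4f}  R={R:.4f}  J={J:.4f}")
```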
Neural Network training [8 Marks]
4. Consider the task of setting hyperparameters for a neural network; in particular adjusting
the learning rate, momentum and minibatch size for SGD, weight decay, dropout, number
of layers and number of neurons per layer. Recall that hyperparameters are often tuned
using a validation set.
a. Among the above hyperparameters, indicate which can be tuned on the training set
(rather than on a validation set). Briefly explain your answer.
[3 marks]
b. Indicate which hyperparameters should be tuned on a validation set rather than on
the training set. Briefly explain your answer.
[5 marks]
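The validation-set protocol the question refers to is a held-out search: train one model per hyperparameter configuration and keep the configuration with the best validation score. A generic sketch follows; train_model and evaluate are hypothetical placeholders for an actual training loop and metric, and the grid values are arbitrary.

```python
import itertools

# Hypothetical placeholders: train_model fits a network on (X_tr, y_tr) with
# the given hyperparameters; evaluate returns a score (e.g. accuracy) of a
# fitted model on the given dataset.
def grid_search(X_tr, y_tr, X_val, y_val, train_model, evaluate):
    grid = {
        "learning_rate": [1e-3, 1e-2],   # optimization hyperparameter
        "weight_decay": [0.0, 1e-4],     # regularization strength
        "dropout": [0.0, 0.5],           # regularization strength
        "num_layers": [2, 4],            # model capacity
    }
    best_score, best_cfg = float("-inf"), None
    for values in itertools.product(*grid.values()):
        cfg = dict(zip(grid.keys(), values))
        model = train_model(X_tr, y_tr, **cfg)
        # Selection must use the validation score: capacity and regularization
        # settings can drive the *training* loss arbitrarily low, so training
        # performance cannot distinguish good generalization from overfitting.
        score = evaluate(model, X_val, y_val)
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_cfg, best_score
```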
Computational Assignment [40 Marks]
Please follow the instructions in the Jupyter notebook accompanying this exam and return your
answers to this part in the form of a Jupyter notebook.
The assignment has two parts, each comprising several questions. Their contributions to
the total mark are detailed below.
Part A: Autoencoders [25 Marks]
Question 1 Autoencoder-based PCA:
[10 marks]
Question 2 2-layer autoencoder:
[5 marks]
Question 3 3-layer autoencoder:
[10 marks]
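As background for Question 1 of Part A: a linear autoencoder with a K-dimensional bottleneck, trained to minimize reconstruction error, recovers the subspace spanned by the top K principal components. A minimal numpy sketch of this connection follows; the data, dimensions, and learning rate are illustrative assumptions, and the actual assignment is specified in the accompanying notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (illustrative assumption): N centered points in R^D lying close
# to a K-dimensional subspace.
N, D, K = 500, 10, 2
X = rng.normal(size=(N, K)) @ (rng.normal(size=(K, D)) / np.sqrt(D))
X += 0.05 * rng.normal(size=(N, D))
X -= X.mean(axis=0)

# Linear autoencoder with tied weights W (K x D): encode z = W x, decode
# x_hat = W^T z; train by gradient descent on the mean reconstruction error.
W = 0.1 * rng.normal(size=(K, D))
lr = 0.1
for _ in range(5000):
    E = X @ W.T @ W - X                         # reconstruction residual (N x D)
    grad = (2.0 / N) * W @ (E.T @ X + X.T @ E)  # gradient of (1/N) ||E||_F^2
    W -= lr * grad

# The rows of W should span the same subspace as the top-K principal
# components (the top-K right singular vectors of X).
_, _, Vt = np.linalg.svd(X, full_matrices=False)
P_ae = W.T @ np.linalg.pinv(W @ W.T) @ W        # projector onto row space of W
P_pca = Vt[:K].T @ Vt[:K]                       # projector onto top-K PC subspace
print("subspace mismatch:", np.linalg.norm(P_ae - P_pca))
```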
Part B: Latent space-based synthesis [15 Marks]
Question 1 Autoencoder-based PCA:
[10 marks]
Question 2 2-layer autoencoder:
[5 marks]
[Total 100 marks]
END OF PAPER