程序代写案例-MAY 2021

欢迎使用51辅导，51作业君孵化低价透明的学长辅导平台，服务保持优质，平均费用压低50%以上！ 51fudao.top

Page 1 of 7

MAY 2021
48 HOUR ASSESSMENT
SCHOOL OF COMPUTER SCIENCE

MODULE CODE:

CS5014
MODULE TITLE:

Machine Learning
TIME TO HAND IN: 48 hours

EXAM
INSTRUCTIONS
a. Answer all questions
b. Each question indicates the number of marks it
carries. The paper carries a total of 60 marks.

This assessment consists of exam-style questions and you should answer as you
would in an exam. As such, citations of sources are not expected, but your answers
should be from your own memory and understanding and significant stretches of text
should not be taken verbatim from sources. Any illustrations or diagrams you include
should be original (hand or computer drawn). You may word-process your answers,
or hand-write and scan them. In either case, please return your answers as a single
PDF. If you handwrite, please make sure the pages are legible, the right way up and
in the right order. Your submission should be your own unaided work. While you are
encouraged to work with your peers to understand the content of the course while
revising, once you have seen the questions you should avoid any further discussion
until you have submitted your results. You must submit your completed assessment
on MMS within 48 hours of it being sent to you. Assuming you have revised the module
contents beforehand, answering the questions should take no more than three hours.
Page 2 of 7

1. Machine Learning Concepts:

(a) A machine learning model is associated with the following cost function:
J() = !()")#"$! +(,% − & − '%0()%$! + (("(#"$!
where % is a vector representing the th training sample, % is a scalar
representing the target variable (output) for the ith sample, is the number
of training samples, and is the number of features. The |∙| operator
represents taking the absolute value.
Answer the following questions, always explaining your reasoning:
(i) Identify all loss terms and regularisation terms. [2 marks]
(ii) What is the equation of the model's prediction function f(x)? Is it a
linear or a non-linear model and why? [2 marks]
(iii) Is this a regression or classification model and why? [2 marks]
(iv) What are the parameters and hyperparameters of your model?
[2 marks]
(v) How would you determine the exact values of parameters and
hyperparameters to use in your final model? [2 marks]

(b) Do you agree or disagree with the following statements and why?
(i) Method A is better than method B if A's training accuracy is better
than B's training accuracy. [2 marks]
(ii) A neural network is a non-linear model as long as it has one or more
hidden layers. [2 marks]
(iii) By using Newton's method, both linear regression and logistic
regression will converge in one iteration. [2 marks]
(iv) The following approach is a correct way to reduce dimensionality
from % ∈ ℝ* to ℝ( while retaining as much variation as possible:
• zero-centre the data: % ← % − !)∑ %)%$!
• calculate the scatter matrix
• find the n pairs of eigenvalues/eigenvectors {% , %} of
• find the two smallest eigenvalues and corresponding
eigenvectors vectors !, (
• output % = @(%)'!, (%)'(A for = 1, . . . ,
[2 marks]
Page 3 of 7

(v) Given the loss function
(&, !) = F(,% − !% − &0()%$! H + 100000!&&&&&&(
the fitted model corresponds to the dashed red line in the following
plot: [2 marks]

[Total marks 20]

0 2 4 6 8 10
0
20
40
60
x
y
Page 4 of 7

2. Support Vector Machines
The plot shown in Figure 1 shows the results of applying a Support Vector
Machine classifier on a 2-dimensional training set. The solid line indicates the
decision boundary between the two classes, and dotted lines represent the
margin. Each data point is identified by a number.

(a) Based on this plot, answer the following questions, showing your
reasoning and any calculations:
(i) Calculate classification accuracy, precision, recall, and F1-score on the
training data in Figure 1. [4 marks]
(ii) Is this a soft-margin or a hard-margin solution and does it use a
kernel? [2 marks]
(iii) Identify all support vectors and all non-zero slack variables in
Figure 1. [2 marks]
(iv) Imagine that you retrain the model after removing point 10. What
effect would this have on the decision boundary and the margin?
[2 marks]
(b) The SVM classification equation is given by:
() = & +(,M%N, %O-%$! ,
where 〈, %〉 is the scalar (dot) product of vectors and %.
(i) How does the presence of the dot product allow us to use the kernel
trick? [2 marks]
(ii) Based on Figure 1, list all training points % which are needed to
classify a new data point . [2 marks]

1
2
3
4
6
7
8
5
9
10
11
Figure 1
Page 5 of 7

(c) The equation for a third-order polynomial kernel is given by: () = (1 + 〈, ′〉).,
where 〈, ′〉 is the scalar (dot) product of vectors and ′.
Assuming two-dimensional input samples = [!, (]' , this kernel
represents a polynomial expansion into a higher-dimensional feature space [ℎ!(),⋯ , ℎ/()]'.
(i) How can you verify that the equation above for () is a valid kernel
that can be used in a Support Vector Machine? [2 marks]
(ii) What is the number of dimensions R of the expanded features and
what is their form?
[2 marks]
(iii) Once a model has been trained using this kernel, will it be faster or
slower to evaluate on new data compared to a linear model? Why?
[2 marks]
[Total marks 20]

Page 6 of 7

3. Unsupervised Learning
(a) What is the relationship between K-means and the EM algorithm for finite
mixture of Gaussian models?
[4 marks]
(b) Give an example where you expect the EM algorithm to outperform K-
means, and an example where you expect K-means to perform better.
Explain your reasoning and show any plots you find useful.
[4 marks]
(c) James wants to cluster a dataset by using a finite mixture of Gaussians. He
has tried K=1, 2, 3, ..., 10 and found the following plot of log-likelihood
(Figure 2) as a function of K:

Figure 2 Log-likelihood of mixture Gaussian models with different K
Based on Figure 2, he decides to choose K=10 as the value with the highest
log-likelihood.
Do you agree or disagree with this decision? If you agree, explain why. If
you disagree, how would you choose K?
[2 marks]
(d) Write an EM algorithm with a hard assignment E-step in pseudo code for
a finite mixture of K Gaussians where the prior distribution of the
membership % is the same across the K components, i.e. ! = ( =. . . = 0.
Make sure you clearly state all the assumptions you make.
[6 marks]

2 4 6 8 10
−2
40
−2
30
−2
20
−2
10
−2
00
−1
90
K
log
lik
eli
ho
od
Page 7 of 7

(e) A von-Mises Fisher distribution is a popular distribution to model
directional vectors i.e. norm one unit vectors: ∈ ℝ* and ‖‖( = 1 . Its
density can be written as: () = ∙ exp('),
where c is a normalisation constant and u is the parameter of the
distribution called mean vector ( itself is a norm one directional vector,
i.e. ∈ ℝ* and ‖‖( = 1). Its maximum likelihood estimator is: 12 = ∑ %)%$!d∑ %)%$! d(,
where || ∙ ||( denotes the ( norm: ‖‖( = f!( + (( + . . . + *(. Its weighted
ML estimator is: 3 = ∑ %%)%$!d∑ %%)%$! d( .
Write the E step and M step in pseudo code for an EM algorithm that fits a
finite mixture of von Mises Fisher distributions (of the given type). Make
sure you clearly state all the assumptions you have made.
[4 marks]
[Total marks 20]

*** END OF PAPER ***

欢迎咨询51作业君