程序辅导案例 > Program >

程序代写案例-G53MLE

欢迎使用51辅导，51作业君孵化低价透明的学长辅导平台，服务保持优质，平均费用压低50%以上！ 51fudao.top

G53MLE-E1
G53MLE-E1 Turn over

The University of Nottingham

SCHOOL OF COMPUTER SCIENCE

A LEVEL 3 MODULE, AUTUMN SEMESTER 2016-2017

MACHINE LEARNING (G53MLE)

Time allowed TWO hours

Candidates may complete the front cover of their answer book and sign their desk card but must
NOT write anything else until the start of the examination period is announced

Answer All 3 Questions

Dictionaries are not allowed with one exception. Those whose first language is not English may
use a standard translation dictionary to translate between that language and English provided
that neither language is the subject of this examination. Subject specific translation dictionaries
are not permitted.

Only silent, self-contained calculators with a Single-Line Display or Dual-Line Display are
permitted in this examination

No electronic devices capable of storing and retrieving text, including electronic dictionaries, may
be used.

DO NOT turn your examination paper over until instructed to do so

INFORMATION FOR INVIGILATORS:
- Please ensure that Exam Paper and Answer Book are both submitted

G53MLE-E1
G53MLE-E1

G53MLE-E1
G53MLE-E1

Question 1: Foundations of Machine Learning [overall 34 marks]

(a) (i) give the formula for a multivariate linear regressor with a 3rd degree polynomial and 2
variables. (ii) See the data in the table below of paired values x1 and x2. If the goal is to
predict x2 from x1, what type of linear regressor would be used here? Give its general
formula (you don’t need to find the actual values of any weights or parameters). (iii) Give
the formula for the generalised linear basis function. (iv) Why would you want to introduce
non-linear basis functions in linear regression?

x1 x2
0 2
5 29.5
10 107
15 234.5
20 412
25 639.5
30 917

(8 marks)

(b) Consider a Linear Discriminant Analysis classifier with two parameters to learn a weight for –
one weight per parameter. Two different ways of learning the parameters of this algorithm
are brute force search and gradient descent. Describe how each works by (i) giving
pseudocode and the formula for the update of gradient descent and (ii) pseudocode for brute
force search, clearly addressing the range of values tested. (iii) Explain for both gradient
descent and brute force search what the biggest drawback is in using them. (iv) Include a
sketch illustrating how gradient descent works for a quadratic error function. Include in your
sketch a visualisation of the termination criterion.

(14 marks)

(c) Explain why in Data Mining one is never able to rely purely on machine-measurable objective
functions. Use in your explanation the criteria for interesting patterns.

(6 marks)

(a) (i) Draw a set of 2-dimensional linearly separable data points with binary labels, and clearly
indicate the decision boundary found by a maximum-margin classifier, and at least one decision
boundary that does not obtain a maximum margin but still represents a perfect classifier. (ii)
Explain how it’s possible to obtain such a non-maximum margin solution, using the concept of a
loss function.

(6 marks)

G53MLE-E1
G53MLE-E1
Question 2: Artificial Neural Networks and Deep Learning [overall 33 marks]

Below is given a training set of eruption duration and time to next eruption for two type of geyser
eruptions of The Old Faithful. This will be used for question 2a.

(a) Draw a diagram of an ANN’s topology that can learn this pattern based on the given data,
using a single hidden layer with three steerable units plus any additional necessary
nodes. Name all relevant elements using indices that indicate source and target layer
numbers, and ensure you account for biases. Initialise all weights to 1.

(6 marks)

(b) Design a Deep-Learning architecture using Convolutional Neural Networks as one of a
number of components for the task of classifying images into images that contain cats
and images that don’t. Name the different layers/components of the network; explain
what function they have, and provide the cost function used in the output layer to inform
back-propagation. Use at least one convolutional layer, one dimensionality reduction
layer, and any other layers you deem necessary.

(12 marks)

(c) Explain how the ReLU revolutionised Deep Learning, by relating it to the concept of the
vanishing gradient. Sketch the activation function of the ReLU and compare this with at
least one other activation function to illustrate your explanation.

(11 marks)

(d) Consider a CNN that applies a single channel convolutional layer with a 3x3 kernel to a
10x10 input image, followed by a MaxPool layer. How many weighted connections come
out of this network?

(4 marks)

X1: Eruption duration
(min)
X2: Time to next eruption
(min)
Y: Eruption
Type
3.6 79 1
1.8 54 2
3.3 74 1
2.3 62 3
4.5 85 1
2.9 55 2
4.7 88 1
3.6 85 1
2.0 51 3
4.4 85 3
1.8 54 2
G53MLE-E1
G53MLE-E1
Question 3: Graphs, Dynamics, and Decision Trees (33 marks)

(a) In learning decision trees, one has to find for every node the query that maximises the
change in entropy, where this change in entropy is given by:

For the partially learned tree displayed below, with two classes named c1 and c2, calculate
the values of , and the change in entropy for the split made at the
root node. The numbers in each node indicate how many examples of that class there are.

(12 marks)

i(N) = i(N) PLi(NL) (1 PL)i(NR)
PL, i(NL), i(NR)
40 x c1
60 x c2
10 x c1
50 x c2
30 x c1
10 x c2
G53MLE-E1
G53MLE-E1

(b) Below is drawn a second tree, with the decision rules written in each node. For this tree, (i)
draw the decision boundaries, clearly indicating for each region what class is associated
with it. (ii) is this a monothetic tree or a polythetic tree? Briefly describe the difference. (iii)
How would you change the decisions in the nodes to attain a more shallow version?

(8 marks)

(c) Below are drawn two trees, with equivalent functionality (that is, every data point passed
through the two trees results in exactly the same classification). Argue whether it is
possible to choose one tree over the other, referring to fundamental Machine Learning
concepts where possible.

(6 marks)

x2>3
x1>3
noyes
yes no
x2>2
yes
yes
c1 c2
x1>2
c2
c1
c2
no
no
c1 c2
c2
c1
c2
c1 c2c1 c2
Tree B Tree A
G53MLE-E1
G53MLE-E1

(d) Below are given the three sets of sequence labels below of a 4-class problem (labels
belong either to class 0, 1, 2, or 3). (i) draw a state transition diagram for this problem.
Include the probabilities of all transitions. Do not draw zero-probability transitions. (ii)
provide the prior probabilities for each class.

[0 0 0 0 1 1 1 1 1 1 1 2 2 2 2 3 3 3 0 0 0 0]
[0 0 0 1 1 1 2 2 2 2 2 2 3 3 3 3 3 0 0 0]
[0 0 0 1 1 1 2 2 1 1 1 2 2 2 3 3 3 3 0 0 0 ]

(7 marks)

END OF EXAM

欢迎咨询51作业君