Sample paper

ECS708P Machine Learning

Duration: 2 hours

This is an open-book exam, which should be completed in approximately 2 hours. You can refer to textbooks, notes and online materials to facilitate your working, but normal referencing and plagiarism rules apply and you must cite any sources used.

Calculators are permitted in this examination.

Answer FOUR questions

You MUST adhere to the word limits, where specified in the questions. Failing to do so will lead to those answers not being marked.

YOU MUST COMPLETE THE EXAM ON YOUR OWN, WITHOUT CONSULTING OTHERS

Examiners: Dr. Jesus Requena Carrion, Dr. Lin Wang

© Queen Mary, University of London, 2019

Question 1

a) In this question we explore regression in a problem involving two attributes $x$ and $y$, where $x$ is the predictor feature and $y$ is the prediction. The dataset that we will use for training contains four samples and is shown in Table 1.

x    y
2    1 + 0.1×D1
4    5 + 0.1×D2
1    2 + 0.1×D3
3    2 + 0.1×D4

Table 1

In Table 1, D1, D2, D3 and D4 represent the last four digits of your student ID (D1 being the last, D2 the second last, etc). Before continuing, calculate the numerical value of the prediction $y$ for each sample (for instance, if D1 = 1, then 1 + 0.1×D1 = 1.1).

The coefficients of the Minimum Mean Square Error (MMSE) solution of a simple linear model can be obtained as

$$\mathbf{w} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$$

where $\mathbf{X}$ is the design matrix and $\mathbf{y}$ is the prediction vector.

i) Obtain the MMSE coefficients of the simple linear model $y = w_0 + w_1 x$. You can use the following intermediate result:

$$(\mathbf{X}^T \mathbf{X})^{-1} = \begin{pmatrix} 1.5 & -0.5 \\ -0.5 & 0.2 \end{pmatrix}$$

ii) Calculate the training Mean Square Error (MSE) of the MMSE solution (use $w_0 = 0$ and $w_1 = 1$ if you did not obtain the MMSE solution).

[15 marks]

b) Consider the cubic model $y = w_0 + w_1 x + w_2 x^2 + w_3 x^3$ for the dataset in Table 1.

i) What would you expect the training MSE of this model to be?
ii) Assuming that the true model is $y = x + n$, where $n$ is zero-mean Gaussian noise, identify the main sources of error in the prediction of this cubic model during deployment.

[10 marks]

Question 2

a) In this question we explore classification in a problem involving two predictor features $x_A$ and $x_B$, and two classes, namely 〇 (positive class) and × (negative class). The dataset that will be used in this question is shown in Figure 1.

[Figure 1: the dataset plotted with $x_A$ on the horizontal axis (range -8 to 6) and $x_B$ on the vertical axis (range -3 to 3)]

Consider a family of linear classifiers defined by $\mathbf{w}^T \mathbf{x} = 0$, where $\mathbf{w} = [w_0, w_A, w_B]^T$ and $\mathbf{x} = [1, x_A, x_B]^T$. A sample such that $\mathbf{w}^T \mathbf{x} > 0$ will be labelled as 〇, otherwise it will be labelled as ×.

Given the linear classifier defined by the coefficients $w_B = 0$, $w_A = 1$ and $w_0 = 0.25 \times D$, where $D$ is the last digit of your student ID:

i) Obtain the classifier's decision regions.

ii) Obtain the classifier's confusion matrix for the dataset shown in Figure 1 and identify its sensitivity and specificity.

[10 marks]

b) We now want to build a Linear Discriminant Analysis classifier that uses $x_A$ as its predictor feature.

i) Obtain the priors for each class, namely P(〇) and P(×), and the means of the distributions P($x_A$ | 〇) and P($x_A$ | ×).

ii) Describe the corresponding Bayes classifier.

iii) If the standard deviations of P($x_A$ | 〇) and P($x_A$ | ×) are equal, how would a sample such that $x_A = -0.5$ be classified?

[15 marks]

Question 3

a) This question concerns model optimisation in machine learning. The k-means algorithm is said to converge to local minima, rather than to the global minimum.

(i) Use the notion of error function to explain the concepts of local minimum and global minimum.

(ii) Explain what is meant by the statement "the k-means algorithm converges to a local minimum".

(iii) Considering the risk of converging to a local minimum, design a strategy that would improve the solution provided by the k-means algorithm.
[15 marks]

b) The validation-set approach allows one to evaluate the performance of different models during model selection.

(i) After applying a validation-set approach, the validation errors of two models f1 and f2 are found to be E1 = 10 and E2 = 12, respectively. How would you use this result to inform your selection of either f1 or f2?

(ii) Due to the low number of samples in the available dataset, it is suggested that the whole dataset should be used for training models f1 and f2 and that both models should be compared based on their training errors. What is your view on this suggestion?

[10 marks]

Question 4

a) This question concerns neural networks.

i) Which layers offer greater flexibility, fully-connected layers or convolutional layers?

ii) Why are convolutional networks suitable for time series and image data?

iii) The number of feature maps in convolutional architectures usually increases as we move closer to the output layer. What is the idea behind this design?

[15 marks]

b) Consider a dataset consisting of grayscale images of size 100 x 100 pixels and a binary label. A deep neural network combining convolutional, pooling and fully-connected layers is chosen for building a classifier for this dataset.

i) The first hidden layer is a convolutional one and consists of two 100 x 100 feature maps. Each map is obtained by applying a different filter of dimensions 3 x 3. How many parameters need to be trained in the first layer?

ii) The second layer is a 2 x 2 max-pooling layer. How many feature maps does this layer have and what are their dimensions?

iii) The third hidden layer is also convolutional and consists of 8 feature maps defined by filters of dimensions 3 x 3 x D. What is the value of D?

[10 marks]

End of Paper