ICP3083 Page 1 of 6

Module: ICP3083
Department: School of Computer Science and Electronic Engineering
Module credit: 20
Organiser: Prof L I Kuncheva
Date out: 20 October 2020

Assessment 2
Deadline: 10th December 2020 (5pm)

Preliminaries

(1) Total number of points: 100

(2) Contribution of this assessment: 20% of the total mark for the module

(3) Submission procedure. Submit a zip file with your solutions. The zip file should contain one PDF file (your Report) and a file with your MATLAB code. Include output from your MATLAB code in the Report as appropriate. Include snippets of your MATLAB code as answers to the questions, where appropriate. Simply paste the code from the MATLAB Editor into a Word file. This way it will keep its formatting and the right font, e.g.:

    %% Practice 1 -------------------------------------------------------------
    a = randn(1,7)
    figure, hold on
    plot(a,zeros(1,7),'k.','markersize',15), grid on

This is tidy and easy to read. I will also need your MATLAB code as a separate file so that I can run it without having to assemble it from within your Report.

(4) Plagiarism and Unfair Practice
Plagiarised or unfairly sourced work will be given a mark of zero. Remember, when you submit your assignment, you agree to the following statement:

"This piece of work is a result of my own work except where it is a group assignment for which approved collaboration has been granted. Material from the work of others (from a book, a journal or the Web) used in this assignment has been acknowledged and quotations and paraphrasing suitably indicated. I appreciate that to imply that such work is mine could lead to a nil mark, failing the module or being excluded from the University. I also testify that no substantial part of this work has been previously submitted for assessment."

(5) Late Submission
Work submitted within a week after the stated deadline will be marked, but the mark will be capped at 40%.
Work submitted after that will be marked, but the mark will not be carried forward to the module mark.

(6) Marking Scheme
Full points are given to a complete and well-documented solution. Points may be deducted for an inaccurate or incomplete solution, an inappropriate method, formatting flaws, etc.

(7) Feedback details
The marks will be returned within 2 weeks of the latest allowed submission date. The assignment solutions will be posted on Blackboard.

Tasks

Part I. Clustering

1. Single linkage. Figure 1 shows a data set with 9 points.

Figure 1. Data for problem 1.

(a) Apply the single linkage clustering algorithm to these data. (Note: You don't have to write MATLAB code for this part. Do the clustering by hand.) Give the steps in the following format: [7]

    Iteration # | Clusters          | Points joined | # Clusters | Distance | Jump
    1           | 1,2,3,4,5,6,7,8,9 | -             | 9          | 0        | -
    …

(b) Plot the criterion function (the distance) and indicate where the largest jump occurs. Then propose a number of clusters for this data set. (You don't have to use MATLAB. You can plot your graphs with Excel or any other graphing software. However, handwritten solutions will score fewer marks for appearance.) [5]

(c) Finally, re-plot the data and mark the clusters with different markers. (You don't have to use MATLAB. However, as above, handwritten solutions will score fewer marks for appearance.) [5]

2. k-means. Table 1 shows an implementation of the k-means algorithm in MATLAB.

Table 1. k_means function.

    function a = k_means(c,q)
    y = size(q,1);
    g = randperm(y,c);
    l = q(g,:);
    f = 1;
    while f
        t = l;
        for i = 1:y
            m = q(i,:);
            u = sum((l - m).^2,2);
            [~,a(i,1)] = min(u);
        end
        for i = 1:c
            l(i,:) = mean(q(a == i,:),1);
        end
        f = ~all(t(:) == l(:));
    end
    end

(a) Explain the content of the following variables in the context of the k-means algorithm, and give the size of each variable, assuming that the function is called with a data set of N objects and n features: a, c, q, y, g, l, f, t, m, u. [13]

(b) There are two instances of "something,1)" in the code. Explain what these ",1)" are for. [6]

(c) Carry out one iteration of the k-means algorithm on the data in Figure 1, assuming that the means are at points 2 and 6. Format your answer as shown below:

    Iteration 1
    Old means: …
    Clusters: …
    Je value: …
    New means: …

You don't need to write MATLAB code for this problem if you don't want to. However, you are expected to show your calculations. You can either write down the calculations or, if you used MATLAB, give the code that does the calculations. [9]

(d) Modify the code in Table 1 and explore k-means on the data set in Figure 1 for two clusters. To do this, run k-means with all possible pairs of points including point 1 as the initial means. Check how many times the final clusters are the same (i.e., the same means are returned). Tally the results and present them in a bar chart. Give a comment. You don't have to write MATLAB code for the checking part. Instead, you can print the means in the command window and count the occurrences. [7]

3. Multi-layer perceptron. Figure 2 shows an MLP configuration with two neurons at the input layer, two at the hidden layer and three at the output layer. The MLP is used as a classifier, where the three neurons at the output layer give the values of three discriminant functions. All neurons in the hidden and output layers use the sigmoid activation function.

Figure 2. MLP configuration.
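The forward pass through a 2-2-3 sigmoid network of this kind can be sketched in a few lines of MATLAB. This is a minimal illustration, not the configuration in Figure 2: all weight values are hypothetical placeholders, the names W1, b1, W2, b2, sig and mlp are illustrative, and a bias is assumed on every hidden and output neuron (set b1 and b2 to zeros if there are no bias nodes). Taking the class label as the index of the largest output follows from reading the outputs as discriminant functions.

```matlab
% Minimal sketch of the forward pass of a 2-2-3 MLP with sigmoid neurons.
% All weight values below are HYPOTHETICAL placeholders, not those in Figure 2.
W1 = [1.0 -0.5; 0.5 1.0];          % input-to-hidden weights (row: input, column: hidden neuron)
b1 = [0.2 -0.3];                   % hidden biases (zeros if there are no bias nodes)
W2 = [1.0 -1.0 0.5; -0.5 1.0 1.0]; % hidden-to-output weights (row: hidden, column: output)
b2 = [0.0 0.1 -0.1];               % output biases (zeros if there are no bias nodes)

sig = @(v) 1 ./ (1 + exp(-v));     % logistic sigmoid, applied elementwise

% X is N-by-2 (one point per row); the result is N-by-3 (one discriminant per column).
% Adding the row vectors b1 and b2 relies on implicit expansion (MATLAB R2016b+ or Octave).
mlp = @(X) sig(sig(X*W1 + b1)*W2 + b2);

P = [2 -4];                        % a single point as a 1-by-2 row
z = mlp(P)                         % values of the three discriminant functions
[~, label] = max(z, [], 2)         % class label = index of the largest discriminant
```

Because the weights act on rows of X, the same anonymous function handles one point or an N-by-2 matrix of points without a loop.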
(a) Write a MATLAB function to calculate the output of the MLP NN for a given input [x1, x2]. Make sure that the calculation also works if the input is a matrix of size N × 2, not only a single point. (Note: You can hard-code the weights in this case.) Demonstrate your code by labelling point P(2,-4). [10]

(b) Write MATLAB code to show the classification regions of the MLP classifier for x1 ∈ [−2,5] and x2 ∈ [−2,5]. The expected output is shown in Figure 3. [4]

Figure 3. Expected output for 3 (b).

(c) Figure 4 shows an MLP configuration.

Figure 4. An MLP configuration.

The outputs represent the discriminant functions for three classes. There are 12 neurons in the first hidden layer and 3 neurons in the second hidden layer. Both hidden layers, as well as the output layer, use a bias. Calculate the total number of parameters of this MLP (the number of weights to train). Explain your calculation. [6]

(d) Derive the formula for the total number of weights of an MLP NN with L layers, of which one is the input layer, L − 2 are hidden layers and one is the output layer. There are M_i neurons at layer i, where i = 1, …, L. Give two versions of the formula: without bias nodes and with bias nodes. [4]

4. Radial Basis Function NNs (RBF) and Vector Quantisation (VQ)

(a) Consider an RBF NN with 30 hidden neurons and one output neuron, used as a classifier for two classes. The output neuron uses a sigmoid activation and has no bias input. To assign a class label to a point (x1, x2), round the output of the NN. The data are 2D. All hidden neurons have spread 0.04. Use the code from Lab 5 (generate_two_spirals_data.m) to generate a two-spirals data set (500 points, noise 0.02, 3 revolutions). Then use the Monte Carlo method to train your RBF NN to classify the two-spirals data.
The Monte Carlo method means that you generate a large number (say, 10,000) of random sets of centres and hidden-to-output weights, and choose the parameters that give the smallest classification error (resubstitution). You can generate both the centres and the weights using "randn". Plot a histogram of the errors obtained during the Monte Carlo run and show the smallest error in the title of the figure. [13]

(b) In SOM and VQ NNs, the learning rate alpha (α) decreases progressively with the training epochs. Prepare a plot to demonstrate this decrease if, after every epoch, we update α ← α × β, where both α and β are values between 0 and 1 (typically between 0.7 and 0.999). Show graphs for different starting values of α and different values of β. [9]
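Applying the update α ← α × β once per epoch gives α_t = α_0 β^t after t epochs, so the learning rate decays geometrically, and faster for smaller β. A minimal MATLAB sketch of such a plot follows. The 100-epoch horizon and the particular starting values alpha0 and decay factors beta are illustrative choices, not values prescribed by the task.

```matlab
% Sketch: learning-rate decay alpha <- alpha * beta, applied once per epoch.
% T, alpha0 and beta are illustrative choices, not prescribed values.
T      = 100;                      % number of training epochs to show
alpha0 = [0.9 0.5 0.1];            % starting learning rates
beta   = [0.999 0.99 0.95 0.7];    % decay factors

figure
for k = 1:numel(beta)
    subplot(1, numel(beta), k), hold on, grid on
    for j = 1:numel(alpha0)
        alpha = zeros(1, T + 1);   % alpha(t+1) = learning rate after t epochs
        alpha(1) = alpha0(j);
        for t = 1:T
            alpha(t + 1) = alpha(t) * beta(k);   % the update rule, applied literally
        end
        plot(0:T, alpha, 'linewidth', 1.5)
    end
    title(sprintf('\\beta = %g', beta(k)))
    xlabel('epoch'), ylabel('\alpha')
    legend(arrayfun(@(a) sprintf('\\alpha_0 = %g', a), alpha0, ...
        'UniformOutput', false))
end
```

The inner loop mirrors the update rule step by step; the closed form alpha0(j) * beta(k).^(0:T) produces the same curve and could replace it.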