CSCI433/933: Machine Learning Algorithms and Applications
Assignment Problem Set III (Individual)

Part A: General knowledge questions - 15 Marks

1. In what way are regression and neural network models similar?
2. Why is a deep neural network better than a shallow network?
3. What is the difference between a convolutional neural network (CNN) and a recurrent neural network (RNN)?
4. Explain the principles of the backpropagation algorithm.
5. Explain the concepts of over-fitting and under-fitting.
6. Explain the effects of ridge and LASSO regularization as used in regression models.

Part B: Design and Programming - 50 + 10 bonus Marks

Aims and Objectives

This assignment aims at evaluating basic familiarity with fundamental concepts and implementation of deep neural networks and statistical machine learning. On completion of this assignment, you should be able to demonstrate basic mastery of:
• concepts of deep autoencoder, feature extraction, SVM, convolutional autoencoder;
• implementation of machine learning algorithms using Tensorflow, Keras, SkLearn.

Introduction

An autoencoder is a popular unsupervised learning technique used to learn data representations. Specifically, a neural network architecture is designed with a bottleneck that forces a compressed knowledge representation of the original input. To build an autoencoder, we need three components: an encoder to compress the data, a decoder to decompress the data, and a loss function to measure the data reconstruction error. Different types of autoencoder models arise from variations of the encoder/decoder architectures. This assignment will focus mainly on two popular autoencoder models: the deep fully-connected autoencoder and the deep convolutional autoencoder.

What needs to be done

1. Implement a multi-layer fully-connected autoencoder using Tensorflow and Keras (15 + 5 bonus Marks):
• Load fashion mnist using keras.
Fashion mnist is a dataset of article images consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each image is a 28×28 grayscale image, associated with a label from 10 classes. Change the data type of each image to float32 and normalize the pixel values to [0, 1]. Hint: x_train, x_test = x_train.astype('float32')/255.0, x_test.astype('float32')/255.0.
• Use Tensorflow and keras to implement a six-layer fully-connected autoencoder on the fashion mnist dataset. The encoder and decoder of the autoencoder each consist of three layers. Each input image is flattened into a vector of dimensionality 784. The three encoder layers have output dimensionalities of 128, 64, and 32. The three decoder layers have output dimensionalities of 64, 128, and 784. After each of the six layers, the nonlinear function ReLU is applied.
• Train the network using mean squared error as the loss function and Adam as the optimizer. The batch size should be set to 256. Train the network for 30 epochs. Hint: for each epoch, the training data should be randomly shuffled.
• Print out the training error and testing error for each epoch. Randomly choose two test images, then display and compare the original images and the reconstructed images.
• Bonus (5 marks): Try to increase the depth (more layers) and width (higher-dimensional hidden layers) of the autoencoder and monitor the change in training and testing losses.

2. Train an SVM classifier based on the image representations extracted from the above autoencoder (15 marks):
• Once the above fully-connected autoencoder is trained, for each image, extract the 32-dimensional hidden vector (the output of the ReLU after the third encoder layer) as the image representation.
• Train a linear SVM classifier on the training images of fashion mnist based on the 32-dimensional features. Tune the hyper-parameter 'C' using cross-validation. Print out the training accuracy. SkLearn is recommended.
• Test the trained SVM on the test images of fashion mnist.
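To illustrate the shape of a possible solution for the fully-connected autoencoder and the SVM on the 32-dimensional codes, here is a minimal sketch, assuming TensorFlow 2.x with Keras and scikit-learn; the C grid is an illustrative choice, not part of the specification:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_autoencoder():
    """Six-layer fully-connected autoencoder: 784 -> 128 -> 64 -> 32 -> 64 -> 128 -> 784,
    with ReLU after each of the six layers, as the spec requires."""
    inp = keras.Input(shape=(784,))
    h = layers.Dense(128, activation="relu")(inp)
    h = layers.Dense(64, activation="relu")(h)
    code = layers.Dense(32, activation="relu")(h)   # 32-d image representation
    h = layers.Dense(64, activation="relu")(code)
    h = layers.Dense(128, activation="relu")(h)
    out = layers.Dense(784, activation="relu")(h)
    autoencoder = keras.Model(inp, out)
    encoder = keras.Model(inp, code)                # shares weights with the autoencoder
    autoencoder.compile(optimizer="adam", loss="mse")
    return autoencoder, encoder

if __name__ == "__main__":
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import LinearSVC

    (x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()
    x_train = x_train.astype("float32").reshape(-1, 784) / 255.0
    x_test = x_test.astype("float32").reshape(-1, 784) / 255.0

    autoencoder, encoder = build_autoencoder()
    # shuffle=True reshuffles each epoch; validation_data reports the test error per epoch
    autoencoder.fit(x_train, x_train, batch_size=256, epochs=30,
                    shuffle=True, validation_data=(x_test, x_test))

    # 32-d codes as features for the SVM, with C tuned by cross-validation
    z_train, z_test = encoder.predict(x_train), encoder.predict(x_test)
    svm = GridSearchCV(LinearSVC(), {"C": [0.01, 0.1, 1, 10]}, cv=5).fit(z_train, y_train)
    print("train acc:", svm.score(z_train, y_train), "test acc:", svm.score(z_test, y_test))
```

Displaying the two randomly chosen test images against their reconstructions (e.g. with matplotlib) and the bonus depth/width variations are left out of the sketch.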
Print out the testing accuracy.
• Try a kernel-based SVM and compare its performance with the linear SVM.

3. Train a deep convolutional autoencoder using Tensorflow and Keras (10 + 5 bonus Marks):
• Load and pre-process fashion mnist as for the fully-connected autoencoder.
• Train a six-layer convolutional autoencoder. Different from the fully-connected autoencoder, each encoder and decoder layer is now a convolutional layer instead of a fully-connected layer. Again, the autoencoder consists of three encoder layers and three decoder layers. Each input image to the autoencoder is now 3D (28 × 28 × 1). For each convolutional layer, use a convolutional kernel of 3 × 3, stride = 1, padding='same'. After the first and second encoder convolutional layers, use (2, 2) max pooling to downsample the feature maps, and after the first and second decoder convolutional layers, use the (2, 2) upsampling2d operation to upsample the feature maps. Choose a proper number of filters for each convolutional layer, e.g. 16, 24. Each convolutional layer uses the activation function ReLU.
• Train the network using mean squared error as the loss function and Adam as the optimizer. The batch size should be set to 256. Train the network for 15 epochs. Hint: for each epoch, the training data should be randomly shuffled.
• Print out the training error and testing error for each epoch. Randomly choose two test images, then display and compare the original images and the reconstructed images.
• Bonus (5 marks): Implement a denoising convolutional autoencoder using the same architecture as above.

4. Write a report of no more than three (3) pages to illustrate the experiments as well as your conclusions (10 Marks). After reading your report, others should be able to reproduce your experiments.

Part C: Numerical/Analytical - 35 Marks

A company manufactures personal protective equipment (PPE) for use in hospitals and personal use.
The company has two manufacturing facilities (factories) from which these PPE are made before they are transported to a warehouse and packed for export. After a few consignments were delivered, customers complained about defective equipment. The PPE are made from several components that could contribute to malfunctioning. It is desired to design a classifier that can identify which of the two factories produced a given defective PPE.

Let the components of the PPE be available and measurable/testable. They can be modelled as $d$ conditionally independent binary-valued features, $\mathbf{x} = (x_1, \ldots, x_d)^t$, where the components $x_i$ are either 0 or 1. The two factories are modelled as classes $\omega_1$ and $\omega_2$. Suppose the company has information about the reliability of the factories regarding each component. This information can be modelled as

$$p_i = \Pr[x_i = 1 \mid \omega_1], \qquad q_i = \Pr[x_i = 1 \mid \omega_2],$$

where $p_i$ and $q_i$ are respectively the probabilities of each factory making a non-defective component. Each feature thus gives us a yes/no answer about the pattern. Furthermore, if $p_i > q_i$, we expect the $i$-th feature to give a "yes" answer more frequently when the state of nature is $\omega_1$ than when it is $\omega_2$.

Now, the assumption of conditional independence allows the class-conditional probabilities to be written as products over the components of $\mathbf{x}$. For example, for class $\omega_1$ we can write:

$$P(\mathbf{x} \mid \omega_1) = \prod_{i=1}^{d} p_i^{x_i} (1 - p_i)^{1 - x_i} \tag{1}$$

The likelihood ratio is written as $\dfrac{P(\mathbf{x} \mid \omega_1)}{P(\mathbf{x} \mid \omega_2)}$. The discriminant function for each class can be written equivalently (i.e. they will produce the same result) as any one of the following three equations:

$$g_i(\mathbf{x}) = P(\omega_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \omega_i) P(\omega_i)}{\sum_{j=1}^{c} p(\mathbf{x} \mid \omega_j) P(\omega_j)} \tag{2}$$

$$g_i(\mathbf{x}) = p(\mathbf{x} \mid \omega_i) P(\omega_i)$$

$$g_i(\mathbf{x}) = \ln p(\mathbf{x} \mid \omega_i) + \ln P(\omega_i) \quad \text{(since the natural logarithm is a monotonic function)}$$

where $\ln$ is the natural logarithm and $c$ is the number of classes.
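As a sketch of the key algebraic step (not a substitute for your own derivation), taking the logarithm of the likelihood ratio above under the conditional-independence model and collecting the terms multiplying each $x_i$ gives:

```latex
\ln\frac{P(\mathbf{x}\mid\omega_1)}{P(\mathbf{x}\mid\omega_2)}
  = \sum_{i=1}^{d}\left[ x_i \ln\frac{p_i}{q_i}
      + (1 - x_i)\ln\frac{1 - p_i}{1 - q_i} \right]
  = \sum_{i=1}^{d} x_i \ln\frac{p_i(1 - q_i)}{q_i(1 - p_i)}
      + \sum_{i=1}^{d} \ln\frac{1 - p_i}{1 - q_i}
```

The log prior ratio $\ln\frac{P(\omega_1)}{P(\omega_2)}$ is then added when the log priors from the discriminant functions are included.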
For this binary classification problem we can combine the two discriminant functions and write

$$g(\mathbf{x}) \equiv g_1(\mathbf{x}) - g_2(\mathbf{x}) \tag{3}$$

and then decide class $\omega_1$ if $g(\mathbf{x}) > 0$, and class $\omega_2$ if $g(\mathbf{x}) \le 0$.

1. Using Equations (1), (2) and (3), derive a discriminant function of the form

$$g(\mathbf{x}) = \sum_{i=1}^{d} w_i x_i + w_0 \tag{4}$$

where

$$w_i = \ln \frac{p_i (1 - q_i)}{q_i (1 - p_i)}, \quad i = 1, \ldots, d$$

and

$$w_0 = \sum_{i=1}^{d} \ln \frac{1 - p_i}{1 - q_i} + \ln \frac{P(\omega_1)}{P(\omega_2)}$$

and $w_i$ is a weight and $w_0$ is the bias.

2. What is the relevance of the weights in determining a classification?

3. Assuming there are only three components in making a PPE, and $P(\omega_1) = P(\omega_2) = 0.5$, $p_i = 0.8$, $q_i = 0.5$ for $i = 1, 2, 3$, compute the values of the weights $w_i$ and the threshold $w_0$.

4. Using the values of $w_i$ and $w_0$ obtained in the last question, determine the factory ($\omega_1$ or $\omega_2$) from which a PPE with feature vector $(1, 0, 1)$ was manufactured.

Part D: Submission instructions

Submission:
1. Prepare your response to Part A in a PDF file of no more than 2 pages.
2. Prepare three executable python files, i.e. deep_autoencoder.py, svm.py, conv_autoencoder.py. All the implementations should be based on python3.
3. Prepare a PDF file containing your 3-page report.
4. Prepare your response to Part C in a PDF file of no more than 3 pages.
5. These files should be compressed as a ZIP (or RAR) file and submitted on Moodle on or before the due date and time.