Statistical Learning-Classification
STAT 841 / 441, CM 763
Assignment 2
Department of Statistics and Actuarial Science
University of Waterloo

Policy on Lateness: No assignments are accepted after the due date.

1. Face detection: Download faces.mat from the course webpage. The file faces.mat is composed of train_faces, train_nonfaces, test_faces, and test_nonfaces. Make a training and a test set as follows:

    training_data=[train_faces' train_nonfaces']; % (This will be a 361 by 4858 matrix.)
    test_data=[test_faces' test_nonfaces']; % (This will be a 361 by 944 matrix.)

   • Write a program to fit a logistic regression model to the training data. Report the first 5 components of the optimum value of the logistic parameter β, as well as the training error and the test error.

   Note: Attach your code to your assignment as an appendix, and submit the code to the assignment drop box in Learn as well.

2. Download the Ionosphere dataset from the course webpage:

   a) Write a program to fit a single hidden layer neural network via back-propagation and weight decay. You cannot use built-in functions or deep network frameworks (e.g. Keras, PyTorch, etc.). You need to implement backpropagation yourself.

   b) Apply your program in part a) to the data. Choose Ion.test as the test set, and Ion.trin as the training set. Plot the training and test error curves as a function of the number of epochs for four different values of the weight decay parameter. Discuss the overfitting behavior in each case.

   c) Set the value of weight decay equal to zero, then vary the number of hidden units in the network (starting from 1 unit), and determine the minimum number needed to perform well for this task. Plot the training and test error curves as a function of the number of hidden units.

   d) Select the best model (the optimum number of hidden nodes or the best value for weight decay) and classify the test data using the network and report the observed misclassification error rate.
      Construct a 2 by 2 table of the form

                   ĥ(x) = 0    ĥ(x) = 1
          y = 0       ?           ?
          y = 1       ?           ?

   Note 1: Attach your code to your assignment as an appendix and submit the code to the assignment drop box in D2L as well.

3. In a maximum likelihood problem, we can define an error function by taking the negative logarithm of the likelihood. Show that the error function for the logistic regression model is a convex function of β, and hence show that it has a unique minimum value.

Only for Grad Students

4. Consider a multiclass logistic regression model (multilogit model) applied to d-dimensional data with K classes. Let β be the (d + 1)(K − 1)-vector consisting of all the coefficients. Define a suitably enlarged version of the input vector x to accommodate this vectorized coefficient vector. Derive the Newton-Raphson algorithm for maximizing the log-likelihood, and describe how you would implement this algorithm.

5. Consider a classification model for two classes with prior class probabilities π_k, k = 1, 2. Suppose that the class-conditional densities are given by Gaussian distributions with a shared covariance matrix. Suppose we are given a training data set {(x_i, y_i)} where i = 1, ..., n, and y_i ∈ {0, 1} are class labels. Assume that the data points are drawn independently from this model.

   a) Compute the maximum-likelihood estimate of the prior probabilities.

   b) Compute the maximum-likelihood estimate of the mean of the Gaussian distribution for each class.

   c) Compute the maximum-likelihood estimate of the shared covariance matrix.
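
Illustration for the 2 by 2 table in Question 2 d): the layout can be read as rows indexed by the true label y and columns by the predicted label ĥ(x). The following is a minimal Python sketch of that tally, not a required implementation. The function names confusion_table and misclassification_rate are hypothetical, and the labels are toy values made up for the example, not taken from the Ionosphere data.

```python
def confusion_table(y_true, y_pred):
    """Tally a 2x2 table: row = true label y (0 or 1), column = prediction h(x) (0 or 1)."""
    table = [[0, 0], [0, 0]]
    for y, h in zip(y_true, y_pred):
        table[y][h] += 1
    return table


def misclassification_rate(table):
    """Fraction of points off the diagonal, i.e. where the prediction differs from y."""
    total = sum(sum(row) for row in table)
    wrong = table[0][1] + table[1][0]
    return wrong / total


# Toy labels, purely illustrative (assumed values, not real data).
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0]

table = confusion_table(y_true, y_pred)
rate = misclassification_rate(table)

print("          h(x)=0  h(x)=1")
print(f"y = 0     {table[0][0]:6d}  {table[0][1]:6d}")
print(f"y = 1     {table[1][0]:6d}  {table[1][1]:6d}")
print(f"misclassification rate: {rate:.3f}")
```

The off-diagonal entries are the two kinds of error, so the misclassification rate reported in 2 d) should equal their sum divided by the number of test points.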