Tutoring case: MTH6101
MTH6101 Introduction to Machine Learning

Coursework four: submit electronically by 09:00 (GMT) on Thursday 30th April 2020.

Read the following instructions carefully:

• Coursework is to be submitted individually, in the equivalent of a single A4 page, which may be written on both sides, so the maximum is two sides of A4. Submission will be by email; details of how to submit will be clarified closer to the submission date.
• Even though it is an electronic document, write your name and student number clearly at the top of the front side of your coursework.
• You are asked to submit answers depending on the LAST digit of your student number. Make sure you submit the answer to the correct question, as submitting an answer to a question not allocated to you will lead to zero marks. If in doubt, ask the Module organizer which question is yours.
• You will perform some computations numerically using R, and your submission could be written using markdown, typed with a word processor or even done by hand.
• Strictly speaking, all the activities described can be done without running R commands; after all, I have already trained the models. I give all the necessary data in case you want to run analyses, but R is not strictly needed.
• You are expected to include only material relevant to the question, and anything you include must be there for a reason. Do not include raw R output.
• This and each coursework contributes to the Module mark, so polish what you submit. Plagiarism will be punished.

Description of activities

1. Classification
The following are 0/1 classification data in variable y and associated variables x1 and x2. The data are given below in training and testing/validation splits.
With the training data, three models were fitted: M1, a logistic classifier, for which the coefficients of the linear predictor are given; M2, a classification tree, for which predicted classes are given; and M3, K-nearest neighbors, for which predicted classes are also given.
For model M1, compute predicted probabilities at each observation in the validation data. By thresholding, compute confusion matrices and the performance figures FPR and TPR. Summarize your results for M1 with a plot of its ROC curve, and mark on the ROC curve the point for the threshold equal to 1/2. For each of models M2 and M3, compute the performance figures FPR and TPR and add the corresponding points to the ROC curve for model M1.

2. Regression
The following data are measurements obtained in a series of experiments of a reaction rate y in terms of variables x1, x2, etc. Lasso was fitted to these data, and a table of coefficients is given. Complete the given table of coefficients by determining at each breakpoint the proportion of shrinkage $\|\hat\beta^L(\lambda)\|_1 / \max_\lambda \|\hat\beta^L(\lambda)\|_1$. With this information, produce a plot of the lasso path (coefficients vs shrinkage proportion). In your plot, clearly label the path for each variable, and choose a suitable scale for the vertical axis.

3. Report
For the classification data, report the fitted “predicted” probabilities for model M1. For all models, report the performance figures FPR and TPR. Include the ROC curve with the added points (FPR, TPR), and briefly comment on and compare the classifiers. For the regression problem, report the augmented table of coefficients and the plot of the lasso path.

Grading: Classification 50%, Regression 35%, Presentation 15%; Total 100%.

Data sets

ID ending in 0. Classification data, training and validation results.
Training data
     x1  x2  y
     -3  -1  0
-2.2941  -1  0
-1.5882  -1  0
-0.8824  -1  1
-0.1765  -1  1
 0.5294   1  1
 1.2353   1  0
 1.9412   1  1
 2.6471   1  1

Test/validation data
               Predictions
     x1  x2  y  M2  M3
-2.6471  -1  0   1   0
-1.9412  -1  1   1   0
-1.2353  -1  0   1   0
-0.5294  -1  0   1   1
 0.1765   1  1   1   1
 0.8824   1  1   1   1
 1.5882   1  1   1   1
 2.2941   1  1   1   1
      3   1  1   1   1

Training results from model M1: $\hat\beta_0 = 0.4945$, $\hat\beta_1 = 2.3612$, $\hat\beta_2 = -2.3958$.

ID ending in 0. Regression data and training results.

Training data
 x1   x2   x3   x4   x5    y
  0    1    0    1    1    0
  1    0    0    1    0    0
  0    0    1    1    0    1
  0    1    1    0    1   -1
  0   -1    0    1   -1    1
 -1   -1   -1   -1    1    0
  0   -1    0    1   -1   -1
  0    1    1    1    0    1
  0   -1    1    0   -1    1
  1    0    1    1   -1   -1
  1   -1    1    0   -1    1
  1    1   -1    0   -1   -1
  0    1   -1   -1   -1    0
 -3    0   -3   -5    4   -1

Lasso results
lambda       x1       x2       x3       x4       x5
     6        0        0        0        0        0
3.1351        0        0   0.1892   -0.027        0
2.7514        0        0   0.2144   -0.038  -0.0114
0.8008        0  -0.1816    0.314  -0.0467  -0.0231
0.4059  -0.2529  -0.1579   0.3595        0  -0.1343
0.3358  -0.4851  -0.1306   0.4366        0  -0.2575
     0  -0.7002  -0.1104   0.4753   0.0397  -0.3521

ID ending in 1. Classification data, training and validation results.

Training data
     x1  x2  y
     -2  -1  0
-1.5294  -1  0
-1.0588  -1  1
-0.5882  -1  1
-0.1176  -1  0
 0.3529   1  1
 0.8235   1  0
 1.2941   1  1
 1.7647   1  1

Test/validation data
               Predictions
     x1  x2  y  M2  M3
-1.7647  -1  1   1   0
-1.2941  -1  1   1   1
-0.8235  -1  0   1   1
-0.3529  -1  1   1   1
 0.1176   1  1   1   1
 0.5882   1  1   1   1
 1.0588   1  0   1   1
 1.5294   1  1   1   1
      2   1  1   1   1

Training results from model M1: $\hat\beta_0 = 0.3592$, $\hat\beta_1 = 1.0108$, $\hat\beta_2 = -0.261$.

ID ending in 1. Regression data and training results.

Training data
 x1   x2   x3   x4   x5    y
 -1   -1    1   -1    1   -1
  1    0    1   -1    0    1
  0    0   -1    1   -1   -1
  0    0    1    1    0    0
  0    1   -1    1    0    0
  1    0    1   -1    0    1
  0   -1   -1   -1    1   -1
  1    0   -1    1    1   -1
  1    0   -1   -1    0   -1
  1   -1    1    1    1    1
  1   -1    0    0    0   -1
  0    0    0    0    0    1
 -5    3    0    0   -3    2

Lasso results
lambda       x1       x2       x3       x4       x5
     9        0        0        0        0        0
     7  -0.0625        0        0        0        0
6.1429  -0.0536   0.0714        0        0        0
4.4286        0   0.2857   0.2143        0        0
0.7561        0   0.6098   0.6463        0        0
0.1111   0.1181   0.8056     0.75        0        0
0.0588   0.1287   0.8235   0.7574  -0.0074        0
     0   0.1362   0.8632   0.7665  -0.0165   0.0283

ID ending in 2. Classification data, training and validation results.
Training data
     x1  x2  y
     -8  -1  0
-6.3158  -1  0
-4.6316  -1  1
-2.9474  -1  0
-1.2632  -1  1
 0.4211   1  1
 2.1053   1  0
 3.7895   1  1
 5.4737   1  1
 7.1579   1  1

Test/validation data
               Predictions
     x1  x2  y  M2  M3
-7.1579  -1  0   0   0
-5.4737  -1  0   0   0
-3.7895  -1  0   0   1
-2.1053  -1  1   0   1
-0.4211  -1  1   0   1
 1.2632   1  1   1   1
 2.9474   1  1   1   1
 4.6316   1  1   1   1
 6.3158   1  1   1   1
      8   1  1   1   1

Training results from model M1: $\hat\beta_0 = 0.8802$, $\hat\beta_1 = 0.5544$, $\hat\beta_2 = -1.1189$.

ID ending in 2. Regression data and training results.

Training data
 x1   x2   x3   x4    y
 -1   -1    1    0   -1
 -1    0   -1    1    1
  0   -1   -1    1    0
  0   -1   -1   -1   -1
  0   -1    1   -1   -1
 -1    0    1    1   -1
 -1   -1   -1    1    0
  1   -1    1   -1   -1
  0    0    0   -1    1
  0    0    1    0    1
  3    6   -1    0    2

Lasso results
lambda       x1       x2       x3       x4
    16        0        0        0        0
3.1667        0   0.3056        0        0
1.0882        0   0.3272  -0.1949        0
0.5879        0   0.3335  -0.2261     0.05
0.1044  -0.1648   0.4138  -0.2742        0
0.0169  -0.1874   0.4252  -0.2806        0
     0  -0.2004   0.4312  -0.2846  -0.0108

ID ending in 3. Classification data, training and validation results.

Training data
     x1  x2  y
     -2  -1  0
-1.5789  -1  0
-1.1579  -1  1
-0.7368  -1  1
-0.3158  -1  1
 0.1053   1  1
 0.5263   1  1
 0.9474   1  0
 1.3684   1  1
 1.7895   1  1

Test/validation data
               Predictions
     x1  x2  y  M2  M3
-1.7895  -1  0   1   0
-1.3684  -1  1   1   1
-0.9474  -1  0   1   1
-0.5263  -1  0   1   1
-0.1053  -1  1   1   1
 0.3158   1  0   1   1
 0.7368   1  1   1   1
 1.1579   1  1   1   1
 1.5789   1  0   1   1
      2   1  1   1   1

Training results from model M1: $\hat\beta_0 = 1.4488$, $\hat\beta_1 = 2.2175$, $\hat\beta_2 = -1.6875$.

ID ending in 3. Regression data and training results.

Training data
 x1   x2   x3   x4   x5    y
 -1    1    0    1   -1    1
 -1    0    1    0   -1    1
  1    0   -1    0    0   -1
  1    1    1   -1    0   -1
 -1    1   -1   -1   -1   -1
 -1    0    0    0    0    1
  1    0    0   -1    1    1
 -1    0   -1   -1    1    1
  0    1    1    1    1    0
  1   -1    1    0   -1   -1
  1    0    0   -1    0   -1
  1   -1    0    1    0   -1
  0   -1   -1    0    1    0
 -1   -1    0    2    0    1

Lasso results
lambda       x1       x2       x3       x4       x5
     8        0        0        0        0        0
2.8571  -0.4286        0        0        0        0
1.5429  -0.5714        0        0        0      0.2
  1.36     -0.6        0     0.04        0     0.24
1.0587  -0.6443        0   0.1029    0.009   0.3056
0.8609  -0.6854  -0.0397   0.1556        0   0.3477
0.1379  -0.8224  -0.1676   0.3346        0   0.5011
     0  -0.8647  -0.2127   0.3853  -0.0393    0.531

ID ending in 4. Classification data, training and validation results.
Training data
     x1  x2  y
     -7  -1  0
-5.5263  -1  0
-4.0526  -1  0
-2.5789  -1  0
-1.1053  -1  0
 0.3684   1  1
 1.8421   1  1
 3.3158   1  1
 4.7895   1  0
 6.2632   1  1

Test/validation data
               Predictions
     x1  x2  y  M2  M3
-6.2632  -1  0   0   0
-4.7895  -1  0   0   0
-3.3158  -1  0   0   0
-1.8421  -1  0   0   0
-0.3684  -1  0   1   0
 1.1053   1  0   1   1
 2.5789   1  1   1   1
 4.0526   1  1   1   1
 5.5263   1  1   1   1
      7   1  1   1   1

Training results from model M1: $\hat\beta_0 = -9.8016$, $\hat\beta_1 = -0.4954$, $\hat\beta_2 = 13.1315$.

ID ending in 4. Regression data and training results.

Training data
 x1   x2   x3   x4   x5    y
  1    1    0   -1    1    0
  0    1   -1    0    0   -1
  0    0    1   -1    0    1
  1    0    0    1    0   -1
  1    1    1   -1    1    0
  1    1   -1    1    1    0
 -1    0    0    0    1    1
  1   -1    1    0    1    1
  1    0    0    1    1   -1
  1    0    0    0    1    0
  0    1   -1    1    1   -1
  1    0    1   -1    1   -1
  0    0   -1    0    1    0
 -7   -4    0    0  -10    2

Lasso results
lambda       x1       x2       x3       x4       x5
    21        0        0        0        0        0
8.0588        0        0        0        0  -0.1176
 4.486  -0.2011        0        0        0  -0.0112
4.3273  -0.2091  -0.0182        0        0        0
3.0145  -0.1812  -0.1159        0        0        0
2.9696  -0.1824  -0.1155   0.0061        0        0
 0.992  -0.1738  -0.2043      0.1  -0.1885        0
     0  -0.5022   -0.399   0.1913  -0.2555    0.312

ID ending in 5. Classification data, training and validation results.

Training data
     x1  x2  y
     -3  -1  0
   -2.2  -1  0
   -1.4  -1  0
   -0.6  -1  1
    0.2   1  0
      1   1  1
    1.8   1  1
    2.6   1  1

Test/validation data
               Predictions
     x1  x2  y  M2  M3
   -2.6  -1  0   1   0
   -1.8  -1  0   0   0
     -1  -1  0   1   0
   -0.2  -1  1   1   0
    0.6   1  1   0   1
    1.4   1  1   0   1
    2.2   1  1   0   1
      3   1  1   0   1

Training results from model M1: $\hat\beta_0 = 12.1138$, $\hat\beta_1 = 60.5692$, $\hat\beta_2 = -48.6892$.

ID ending in 5. Regression data and training results.

Training data
 x1   x2   x3   x4   x5    y
  0    0    1    1   -1    0
  0    0    1    0    0   -1
  0    0    0   -1    1    1
  1    0   -1   -1    0   -1
  0    0    1   -1    0    1
  0    1   -1   -1    1    1
  1    0    0   -1    1    1
  1    1   -1    1   -1   -1
  1    0    0    1   -1    0
  1    0    0   -1    0   -1
 -1    0    1   -1    1    1
 -4   -2   -1    4   -1   -1

Lasso results
lambda       x1       x2       x3       x4       x5
     8        0        0        0        0        0
4.5333        0        0        0  -0.1333        0
2.3929        0        0        0  -0.0595    0.369
   1.5        0        0    0.119        0   0.5476
0.7347        0        0   0.2041        0   0.6327
0.6303        0   0.0148   0.2162        0   0.6405
 0.323  -0.0952    0.206   0.2654        0   0.6487
     0  -0.0218   0.4304   0.4018   0.3012   1.0117

ID ending in 6. Classification data, training and validation results.
Training data
     x1  x2  y
     -4  -1  0
-3.1579  -1  0
-2.3158  -1  0
-1.4737  -1  0
-0.6316  -1  1
 0.2105   1  0
 1.0526   1  1
 1.8947   1  1
 2.7368   1  1
 3.5789   1  1

Test/validation data
               Predictions
     x1  x2  y  M2  M3
-3.5789  -1  0   0   0
-2.7368  -1  0   0   0
-1.8947  -1  0   0   0
-1.0526  -1  0   0   0
-0.2105  -1  1   1   0
 0.6316   1  0   1   1
 1.4737   1  1   1   1
 2.3158   1  1   1   1
 3.1579   1  1   1   1
      4   1  1   1   1

Training results from model M1: $\hat\beta_0 = 11.7791$, $\hat\beta_1 = 55.9443$, $\hat\beta_2 = -47.3233$.

ID ending in 6. Regression data and training results.

Training data
 x1   x2   x3   x4   x5    y
  0   -1    0    0    0   -1
 -1    1   -1   -1    1   -1
  1    0   -1    1   -1    0
  1    1    1    0    1    0
 -1    0    1    1    1    1
 -1    1   -1   -1   -1    1
 -1    0    0    0   -1   -1
  0    0    1    1   -1    0
  0    1   -1    0   -1    1
  1   -1    1    1   -1    1
  1    0    1    0    0   -1
  0    1   -1   -1   -1    0
  1    1    1   -1    1    1
 -1   -4   -1    0    3   -1

Lasso results
lambda       x1       x2       x3       x4       x5
     6        0        0        0        0        0
2.3077        0   0.1538        0        0        0
1.8512        0   0.1653        0        0  -0.0165
1.7632        0   0.1668   0.0075        0  -0.0205
0.7382        0   0.2492   0.0279    0.172  -0.0102
0.1204     0.03   0.2972    0.028   0.2742        0
0.0094    0.035   0.3047   0.0289    0.291        0
     0   0.0358   0.3065    0.028   0.2941    0.002

ID ending in 7. Classification data, training and validation results.

Training data
     x1  x2  y
     -7  -1  0
-5.3529  -1  1
-3.7059  -1  0
-2.0588  -1  1
-0.4118  -1  1
 1.2353   1  0
 2.8824   1  1
 4.5294   1  1
 6.1765   1  1

Test/validation data
               Predictions
     x1  x2  y  M2  M3
-6.1765  -1  0   1   0
-4.5294  -1  0   1   1
-2.8824  -1  0   1   1
-1.2353  -1  1   1   1
 0.4118   1  0   1   1
 2.0588   1  1   1   1
 3.7059   1  1   1   1
 5.3529   1  1   1   1
      7   1  1   1   1

Training results from model M1: $\hat\beta_0 = 1.3543$, $\hat\beta_1 = 1.012$, $\hat\beta_2 = -3.2558$.

ID ending in 7. Regression data and training results.
Training data
 x1   x2   x3   x4   x5    y
  0    0    0    1   -1    1
 -1    1    0    0   -1    1
 -1    0    0    0    0    0
  1   -1    0    1    1    1
  1    1   -1   -1    0    0
  1    0    0   -1    0    0
  0    1    0   -1    1    1
  0   -1    1   -1   -1   -1
  0   -1    0    1    0    1
  1   -1    1   -1    0    0
  1    1   -1    0    1    0
 -1    1    1   -1    1    0
 -2   -1   -1    3   -1   -4

Lasso results
lambda       x1       x2       x3       x4       x5
     9        0        0        0        0        0
7.3636        0        0        0  -0.0909        0
3.8175   0.2336        0        0  -0.1971        0
2.4414   0.3625   0.1523        0  -0.1726        0
2.0472   0.4961   0.3504   0.2756        0        0
   1.7    0.525      0.4     0.35        0        0
0.8735   0.5824    0.507   0.5234        0   0.0345
0.5414   0.8233   0.8496   0.9812   0.3459        0
0.1206    1.108   1.2613   1.5452   0.7739        0
     0    1.216   1.4081   1.7273   0.9095  -0.0562

ID ending in 8. Classification data, training and validation results.

Training data
     x1  x2  y
     -9  -1  0
-6.8824  -1  0
-4.7647  -1  0
-2.6471  -1  1
-0.5294  -1  0
 1.5882   1  1
 3.7059   1  1
 5.8235   1  0
 7.9412   1  1

Test/validation data
               Predictions
     x1  x2  y  M2  M3
-7.9412  -1  0   0   0
-5.8235  -1  0   0   0
-3.7059  -1  0   0   0
-1.5882  -1  1   0   0
 0.5294   1  1   0   1
 2.6471   1  1   0   1
 4.7647   1  1   0   1
 6.8824   1  1   0   1
      9   1  1   0   1

Training results from model M1: $\hat\beta_0 = -0.1495$, $\hat\beta_1 = 0.0941$, $\hat\beta_2 = 0.8121$.

ID ending in 8. Regression data and training results.

Training data
 x1   x2   x3   x4   x5    y
 -1    1   -1    1   -1   -1
  0    1    0    0    1    1
  0    0    0   -1    1    0
  0    0   -1    1   -1    0
  0    1    1    1    0    1
  1    0    0   -1    0    0
 -1    0    0    1    0    1
  0    0    0    0    1    1
  0    0    1    1    0   -1
  1    0   -1   -1   -1   -1
  1    0    0    0   -1   -1
 -1   -3    1   -2    1    0

Lasso results
lambda       x1       x2       x3       x4       x5
     5        0        0        0        0        0
2.3333        0        0        0        0   0.3333
   1.5        0        0        0    0.125      0.5
0.4314        0   0.0643        0   0.2435   0.7169
0.1329        0   0.0364  -0.1416   0.3205   0.8532
0.0741   0.0503        0  -0.1905    0.373   0.9101
0.0356   0.0656        0  -0.2079   0.3851   0.9335
     0   0.1136  -0.0459   -0.252   0.4391   0.9817

ID ending in 9. Classification data, training and validation results.
Training data
     x1  x2  y
     -3  -1  0
-2.3684  -1  0
-1.7368  -1  0
-1.1053  -1  0
-0.4737  -1  0
 0.1579   1  1
 0.7895   1  1
 1.4211   1  0
 2.0526   1  1
 2.6842   1  1

Test/validation data
               Predictions
     x1  x2  y  M2  M3
-2.6842  -1  0   0   0
-2.0526  -1  0   0   0
-1.4211  -1  0   0   0
-0.7895  -1  0   0   0
-0.1579  -1  1   1   0
 0.4737   1  1   1   1
 1.1053   1  1   1   1
 1.7368   1  1   1   1
 2.3684   1  1   1   1
      3   1  1   1   1

Training results from model M1: $\hat\beta_0 = -9.5899$, $\hat\beta_1 = 10^{-4}$, $\hat\beta_2 = 10.9763$.

ID ending in 9. Regression data and training results.

Training data
 x1   x2   x3   x4   x5    y
  1    1   -1   -1    1    0
  0   -1    1    0    1    1
 -1   -1    0    0    0    0
  0    1    1    0   -1    1
  1   -1    0   -1   -1   -1
  1    1    1    1    0    0
  0    0    0    1    1    1
 -1   -1   -1    1   -1    1
  0   -1    0    1   -1    1
 -1   -1    1   -1   -1    0
 -1    1   -1    1    1   -1
 -1    0    1    0   -1    1
 -1   -1    1    0    1   -1
  3    3   -3   -2    1   -3

Lasso results
lambda       x1       x2       x3       x4       x5
    11        0        0        0        0        0
  8.75        0        0    0.125        0        0
  8.05   -0.025        0     0.15        0        0
7.9269  -0.0261        0    0.154   0.0078        0
 3.849  -0.0097  -0.0885   0.2608   0.2772        0
1.5957        0  -0.1011   0.2952   0.4388  -0.1569
0.0242        0  -0.1067   0.3181   0.5488  -0.2663
     0   0.0069    -0.11   0.3196   0.5533   -0.268
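
Worked sketch of the computations. The arithmetic asked for in activities 1 and 2 (logistic probabilities, FPR and TPR by thresholding, and the shrinkage proportion as the L1 norm of each coefficient row divided by the largest L1 norm) can be sketched as follows. This is only an illustration, not the module's reference solution. It is written in Python rather than R, uses the data for ID ending in 9, and leaves the plots out. All function and variable names are hypothetical. Two conventions are assumptions: the printed value of the M1 coefficient on x1 is read as 1e-4, and an observation is classified as 1 when its probability is greater than or equal to the threshold.

```python
import math

# Data for "ID ending in 9", copied from the tables in this document.
# Validation rows: (x1, x2, y, M2 prediction, M3 prediction).
VALIDATION = [
    (-2.6842, -1, 0, 0, 0), (-2.0526, -1, 0, 0, 0), (-1.4211, -1, 0, 0, 0),
    (-0.7895, -1, 0, 0, 0), (-0.1579, -1, 1, 1, 0), (0.4737, 1, 1, 1, 1),
    (1.1053, 1, 1, 1, 1), (1.7368, 1, 1, 1, 1), (2.3684, 1, 1, 1, 1),
    (3.0, 1, 1, 1, 1),
]
# M1 coefficients (b0, b1, b2). Assumption: the printed b1 is read as 1e-4.
BETA = (-9.5899, 1e-4, 10.9763)
# Lasso coefficients for x1..x5 at each breakpoint, in table order.
LASSO_COEFS = [
    (0, 0, 0, 0, 0),
    (0, 0, 0.125, 0, 0),
    (-0.025, 0, 0.15, 0, 0),
    (-0.0261, 0, 0.154, 0.0078, 0),
    (-0.0097, -0.0885, 0.2608, 0.2772, 0),
    (0, -0.1011, 0.2952, 0.4388, -0.1569),
    (0, -0.1067, 0.3181, 0.5488, -0.2663),
    (0.0069, -0.11, 0.3196, 0.5533, -0.268),
]


def predicted_probabilities(beta, rows):
    """P(y = 1 | x1, x2) for the linear predictor b0 + b1*x1 + b2*x2."""
    b0, b1, b2 = beta
    return [1.0 / (1.0 + math.exp(-(b0 + b1 * r[0] + b2 * r[1]))) for r in rows]


def fpr_tpr(y_true, y_pred):
    """(FPR, TPR) from 0/1 labels and 0/1 predicted classes."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    positives = sum(y_true)
    negatives = len(y_true) - positives
    return fp / negatives, tp / positives


def roc_points(y_true, probs):
    """(FPR, TPR) per threshold, predicting 1 when probability >= threshold."""
    thresholds = [math.inf] + sorted(set(probs), reverse=True)
    return [fpr_tpr(y_true, [int(p >= t) for p in probs]) for t in thresholds]


def shrinkage_proportions(coef_rows):
    """L1 norm of each coefficient row divided by the largest L1 norm."""
    norms = [sum(abs(b) for b in row) for row in coef_rows]
    return [n / max(norms) for n in norms]


y = [r[2] for r in VALIDATION]
probs = predicted_probabilities(BETA, VALIDATION)
m1_half = fpr_tpr(y, [int(p >= 0.5) for p in probs])  # M1 at threshold 1/2
m2_point = fpr_tpr(y, [r[3] for r in VALIDATION])      # tree predictions
m3_point = fpr_tpr(y, [r[4] for r in VALIDATION])      # K-NN predictions
shrinkage = shrinkage_proportions(LASSO_COEFS)

print("M1 probabilities:", [round(p, 4) for p in probs])
print("M1 ROC points:", roc_points(y, probs))
print("M1 at 1/2:", m1_half, "M2:", m2_point, "M3:", m3_point)
print("Shrinkage proportions:", [round(s, 4) for s in shrinkage])
```

For another ID, replace the three data constants with the corresponding tables. The ROC curve is the list returned by `roc_points` plotted as TPR against FPR, and the lasso path is each coefficient column plotted against `shrinkage`.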