COMPSCI 361 Page 1 of 10 THE UNIVERSITY OF AUCKLAND SEMESTER ONE 2020 Campus: City COMPUTER SCIENCE Machine Learning NOTE: • This Final Assessment is out of 100 marks. • Attempt ALL questions. • You will need to put your answers into the Answer Booklet and save it as a pdf document. Upload the file on Canvas, like you do for the assignments. • This is an Open Book assessment. You may refer to and cite any written/printed material, including online sources. • You need to demonstrate your understanding of the subject matter and the ability to construct a well described solution or organised arguments to answer the question(s). Quotations (if used) should be used rarely and selectively. • You should include proper referencing of any material you have used (including author and year of publication). It is important that you do not just provide a list of quotations. Quotations should be used to support your own argument not replace it. • When a question requests you to explain your answer, that means you need to justify how you came up with the solution or why you made a certain choice. Be concise and as clear as possible. • You may choose to use diagram(s) to aid in your discussion. If you choose to do so, you may embed photo(s) of hand drawn diagram(s) into the answer booklet. • It is your responsibility to ensure that the diagrams are clear, legible, and have proper resolution. • Please use standard text processing tools to type the answers and avoid hand-written answers where possible. Use images only when it is required. • This Final Assessment has been designed so that a well-prepared student could complete it within 2 hours. • If you wish to raise concerns during the Final Assessment, please call the Contact Centre for advice: Auckland: 09 373 7513, Outside Auckland: 0800 61 62 63, International: +64 9 373 7513 • It is your responsibility to ensure your assessment is successfully submitted on time. Please don’t leave it to the last minute to submit your assessment. • For any Canvas issues, please use 24/7 help on Canvas by chat or phone. • If any corrections are made during the 24 hours, you will be notified by a Canvas Announcement. Please ensure your notifications are turned on during this period. COMPSCI 361 Page 2 of 10 Academic honesty declaration By completing this assessment, I agree to the following declaration: I understand the University expects all students to complete coursework with integrity and honesty. I promise to complete all online assessments with the same academic integrity standards and values. Any identified form of poor academic practice or academic misconduct will be followed up and may result in disciplinary action. As a member of the University’s student body, I will complete this assessment in a fair, honest, responsible and trustworthy manner. This means that: • I declare that this assessment is my own work. • I will not seek out any unauthorised help in completing this assessment. • I am aware the University of Auckland will use plagiarism detection tools to check my content. • I will not discuss the content of the assessment with anyone else in any form, including, Canvas, Piazza, Facebook, Twitter or any other social media or online platform within the assessment period. • I will not reproduce the content of this assessment anywhere in any form at any time. • I declare that I generated the calculations and data in this assessment independently, using only the tools and resources defined for use in this assessment. • I will not share or distribute any tools or resources I developed for completing this assessment. COMPSCI 361 Page 3 of 10 PART A: Association Rule Mining, Clustering, Data Stream Mining, Anomaly Detection Question 1 Association Rule Mining 15 marks Consider the following transactional database where 1, 2, 3, 4, 5, 6, 7, 8, 9 are items. ID Items t1 1, 2, 4, 5 t2 1, 2, 3, 6 t3 1, 2, 3 t4 1, 3, 6 t5 1, 2 t6 4, 5, 6, 7 t7 4, 5, 6 t8 1, 2, 8, 9 t9 4, 5, 6, 8 t10 1, 2, 3 Suppose the minimum support is 19%, describe the process of finding all the frequent itemsets in this database using the following techniques. We normally sort the items in a specific ordering before inserting it into the FP-Tree. Typically, this is in descending order of item frequency. Draw the FP-Tree. You must show your working. Do you believe that we have the most compressed (or compact) FP-Tree representation in this particular case? • If yes, please provide a different item ordering (i.e., not descending order of frequency), such that the FP-Tree has a less compact representation. • If no, please provide a different item ordering (i.e., not descending order of frequency), such that the FP-Tree has a more compact representation. Discuss your answers. You may choose to use diagram(s) to aid in your discussion. If you choose to do so, you may embed a photo of a hand drawn diagram. (15 marks) The text-based answer (excluding diagrams, equations and calculations) should be 400 words or less. Question 2 Clustering 13 marks a) Under what conditions would density-based clustering be more suitable than both partitioning-based clustering and hierarchical clustering? Illustrate an example of a dataset that would support your argument. You may choose to use diagram(s) to aid in your discussion. If you choose to do so, you may embed a photo of a hand drawn diagram. (8 marks) The text-based answer (excluding diagrams, equations and calculations) should be 300 words or less. COMPSCI 361 Page 4 of 10 b) Use single link agglomerative clustering to group the data described by the following distance matrix. Show your working and draw the dendrogram. You may embed a photo of a hand drawn diagram. A B C D E A 0 2 4 8 1 B 0 3 6 7 C 0 5 9 D 0 10 E 0 (5 marks) The text-based answer (excluding diagrams, equations and calculations) should be 100 words or less. Question 3 Data Stream Mining 12 marks (a) We processed a data stream through a predictive model, it produced a set of prediction errors. We show the prediction error along with the timestamp in the table below. T0 in the table indicates the oldest timestamp, and T32 indicates the latest timestamp. Timestamp T0 T1 T2 T3 T4 T5 T6 T7 T8 T9 T10 Prediction Error 0 1 0 0 0 0 0 0 0 0 1 Timestamp T11 T12 T13 T14 T15 T16 T17 T18 T19 T20 T21 Prediction Error 0 1 0 1 0 0 1 0 1 1 1 Timestamp T22 T23 T24 T25 T26 T27 T28 T29 T30 T31 T32 Prediction Error 1 1 1 1 1 1 1 1 1 1 1 Discuss how you would process the prediction error using any drift detector (please pick one that is appropriate) learnt in lectures. In your discussion you must reference how the process would perform on the data in the table. Please also explain whether you believe there is a drift in the data. (8 marks) The text-based answer (excluding diagrams, equations and calculations) should be 400 words or less. (b) The Adaptive Random Forest technique has the ability for background trees to be trained to replace the active trees. In a specific case, you observed that the background trees were frequently trained but only very few replaced the active trees. Based on this observation, what conclusions can you draw about the characteristics of the data stream? (4 marks) The text-based answer (excluding diagrams, equations and calculations) should be 100 words or less. COMPSCI 361 Page 5 of 10 Question 4 Outlier/Anomaly Detection 10 marks The following figure shows the monthly telecom usage activity in a specific area across several years. You were tasked to identify whether outliers exist in the data. (a) What outlier detection technique would you use to identify whether outliers exist for this case? Please discuss your choice of your outlier technique and discuss how the chosen technique is suitable for the data. Please link your answers to the data provided. (5 marks) The text-based answer (excluding diagrams, equations and calculations) should be 200 words or less. (b) Can you intuitively determine how many outlier points exist for the case above? If yes, specify how many outlier points exist. Discuss your answer. You do not have to perform any calculations. (5 marks) The text-based answer (excluding diagrams, equations and calculations) should be 200 words or less. 0 50 100 150 200 250 300 350 Feb-12 Jul-13 Nov-14 Apr-16 Aug-17 Dec-18 May-20 Sep-21 Feb-23 Jun-24 Nov-25 Te le co m U sa ge A ct iv ity Time COMPSCI 361 Page 6 of 10 PART B: Preprocessing, Instance-based Learning, SVM, Bayes Learning, Reinforcement Learning Question 5 12 marks Let us consider a dataset D of N instances with M+1 attributes, that is M input attributes X1,…,XM and the last attribute Y is the class. Additionally, let us consider the standard k- Nearest Neighbour (KNN) method that uses the Euclidean distance as a measure of closeness in the input space: (a) Assuming you have a function Euclidean(a,b) that calculates Euclidean the distance between two vectors a and b of the same size, write the pseudo code for KNN(k,D,t) and explain what it does line by line. What is the time complexity of your code? Explain your answer. You must show how you derived the solution. (4.5 marks) The text-based answer (excluding equations and calculations) should be 200 words or less. (b) You carried out clustering using the Euclidean distance on the input attributes, and obtained L clusters with approximately equal size. The result of the clustering is the set of clusters C={C1,...,CL} and the set of centroids C¢={c¢1,…,c¢L}, where c¢l is the centroid of cluster Cl and D=C1 U...U CL. Note that centroids are not necessarily part of D and therefore do not store a class label (c¢l=[x1c¢l,…,xMc¢l]). Consider the classification method METHODX that is described by the pseudo code below. Let us assume that M<
means “far smaller than”. • Explain what METHODX is doing. You can comment the pseudo code line by line. • For a given test instance ∉ and the assumptions above, will the outputs of the two classification methods be always the same, i.e. KNN(k,D,t)= METHODX(k,D,t,C,C¢)? • Is there are any advantage of using METHODX when compared to KNN? COMPSCI 361 Page 7 of 10 Explain, and if possible, illustrate your answers. (7.5 marks) The text-based answer (excluding equations and calculations) should be 200 words or less. Question 6 12 marks You are given a small training dataset with two input features F1 and F2. (a) If you train a linear hard margin support vector machine on this dataset, what will the decision boundary look like? Draw and properly annotate both the decision boundary and the margins in the input space. What are the support vectors for this dataset? How many instances will be misclassified on the training set with this hard margin classifier? Will this model perform well on unseen data? Explain your answers. (8 marks) The text-based answer (excluding equations and calculations) should be 200 words or less. (b) Now let us say you train using the same method but on different sub-samples (that will happen, for example, if you used cross-validation). How much will the decision boundary change for different sub-samples? Give two sub-samples that will change the decision boundary and another two sub-samples that will not change the decision boundary. Explain your answers. (4 marks) The text-based answer (excluding equations and calculations) should be 200 words or less. COMPSCI 361 Page 8 of 10 Question 7 16 marks You are given a toxicity data set that describes chemical compounds with five Boolean attributes water solubility (S), cytochrominhibitor (C), contains phosphate (P), and cancerogenic in the rat model (R), and the outcome of some toxicity test (O). For the given dataset, train a Naive Bayes classifier to predict the toxicity test outcome for new compounds. (a) Apply the trained classifier on the following two test instances (unseen compounds in the training process). new_comp_1(S=false,C=false,P=false,R=false) new_comp_2(S=true,C=true,P=true,R=false) What are the model predictions for both compounds? What is the probability that the model predicts a positive toxicity test for new_comp_2? What is the likelihood of new_comp_1, given that the positive class hypothesis is true? What is the probability that the model predicts a negative test result before seeing the test instances? Show your working step-by-step (you must show how you calculated all probabilities) and explain your answers. (8 marks) The text-based answer (excluding equations and calculations) should be 200 words or less. (b) The trained Naive Bayes model mostly predicts the negative class. Explain this effect. Explain which examples would be predicted as positive? (4 marks) The text-based answer (excluding equations and calculations) should be 200 words or less. (c) Could you learn a Bayesian network on the same dataset that will give the equivalent model predictions as the Naive Bayes classifier you have trained? If it is not possible, explain why. If possible, draw the network and comment why you designed it like that, COMPSCI 361 Page 9 of 10 as well as what the network represents. Moreover, how many conditional probability tables do you need to provide and what is their size. Give two example tables. Show your working and explain all your answers. (4 marks) The text-based answer (excluding equations and calculations) should be 200 words or less. Question 8 10 marks The following environment with nine grid cells defines a state diagram with eight states in which the agent can be. The black cell cannot be entered. In each state the agent can carry out one of the following actions A={left,right, up,down} which bring it deterministically to the neighbouring cell if it is available. If there is no cell available for that action, the action does not change the current state. That is, every action can be carried out, but for the transitions into non- existing cell the next state, s', is equal to the current state, s. There are no further actions available in the target state H. For each action the agent gets a negative reward of -1 except for actions that bring it into the target state (the reward is 0), and the black state (the reward is -2). Additionally, the partial policy, π, is given by the arrows in the grid. The discount factor, γ, is 0.5. Show your working step-by-step (you must show how you calculated all values) and explain all your answers. (a) Give for each state the value of the value function, Vπ, for the given policy. You can ignore states for which no policy is defined. (3 marks) COMPSCI 361 Page 10 of 10 The text-based answer (excluding equations and calculations) should be 100 words or less. (b) Assuming the initial values of the Q-table are -16, and the agent walks the path (F, right) → (G, up) → (G, right). Calculate the values for the Q-table after every step of the path. What is the value of V(G) after the last step? (3 marks) The text-based answer (excluding equations and calculations) should be 100 words or less. (c) Create an optimal policy, π*, and draw it on the given diagram. What are the corresponding values of V*? What is the most desirable state for the agent? (4 marks) The text-based answer (excluding equations and calculations) should be 100 words or less. 欢迎咨询51作业君