BUCI057H7 page 2 of 9

Question 1 (10 marks)
Provost & Fawcett have defined Data Science in terms of 9 computational problems. Define the Similarity problem in general and propose examples on multi-dimensional data.
Your answer:

Question 2 (10 marks)
Spectral analysis can be used to reduce data dimensionality. Explain why dimensionality reduction is desirable and how Spectral analysis can achieve it.
Your answer:

Question 3 (10 marks)
Over D = {a, b, c, d, e}, the frequency of observations gives us the following distribution: P = Pr[X = x_i] = [3/8, 3/16, 1/8, 1/8, 3/16]. To simplify calculations, however, we decide to adopt the “simpler” distribution Q = Pr[X = x_i] = [1/2, 1/8, 1/8, 1/8, 1/8]. Compute the Kullback-Leibler divergence between P and Q, defined as

$D_{KL}(P \| Q) = \sum_{i} P(x_i) \log_2 \frac{P(x_i)}{Q(x_i)}$

For the calculation, assume that $\log_2 3$ (the base-2 logarithm of 3) equals 1.585, and show the process by which you calculated the divergence.
Your answer:

Question 4 (10 marks)
Define the decision trees employed in the Supervised Segmentation task and describe in words how the CART algorithm can recursively build a decision tree for a given dataset of labeled Yes/No examples.
Your answer:

Question 5 (10 marks)
Sports Rating & Ranking: if a function S(i) measures the strength of a team/player i attending a tournament, how could we predict the outcome of a match between, say, team i and team j? What method would you use, among those seen in class, to extract the function S(i) from a dataset of past results?
Your answer:

Question 6 (10 marks)
Define the Kernel method for creating a feature space and discuss why it is used in combination with Support Vector Machines to classify data.
Your answer:

Question 7 (10 marks)
Define the Degree sequence of networks, explain why the sum of degrees is always even and discuss its usage in network analysis.
Your answer:

Question 8 (10 marks)
Ranking in Networks: what is the model of i) importance and ii) human navigation of Web pages that underpins PageRank?
Your answer: