Statistical Learning - Classification
STAT 441 / 841, CM 763
Assignment 4
Department of Statistics and Actuarial Science
University of Waterloo

Policy on Lateness: Late assignments are NOT accepted.

1. In a binary classification problem where $y \in \{0, 1\}$:

a) Assume $x = (x_1, \dots, x_d)^T$, where $x_j \in \{0, 1\}$ for $j = 1, \dots, d$. Define $P(y = 1) = p$, $P(x_j = 1 \mid y = 1) = p_{j1}$, and $P(x_j = 1 \mid y = 0) = p_{j0}$. Show that the Naive Bayes classifier is equivalent to a linear classification rule of the form $\hat{y} = \mathrm{sign}(w \cdot x - b)$. Write $w$ and $b$ in terms of $p$, $p_{j1}$, and $p_{j0}$.

b) Now suppose $x_j \in \mathbb{R}$. Assume $P(y = 1) = p$, $x_j \mid y = 1 \sim N(\mu_{j1}, \sigma_j^2)$, and $x_j \mid y = 0 \sim N(\mu_{j0}, \sigma_j^2)$. Show that the Naive Bayes classifier is equivalent to a linear classification rule of the form $\hat{y} = \mathrm{sign}(w \cdot x - b)$. Write $w$ and $b$ in terms of $p$, $\mu_{j1}$, $\mu_{j0}$, and $\sigma_j$.

2. Suppose $X_1, \dots, X_{10}$ are independent standard Gaussian random variables, and the target $y$ is defined by

$$y = \begin{cases} 1 & \text{if } \sum_{j=1}^{10} X_j^2 > 9.34, \\ -1 & \text{otherwise.} \end{cases}$$

Sample 2000 training cases, with approximately 1000 points in each class, and 10,000 test observations.

a) Write a program implementing AdaBoost with stumps.
b) Plot the training error as well as the test error, and discuss their behavior.
c) Investigate the number of iterations needed to make the test error finally start to rise.

3. In the maximum-margin hyperplane problem, let $\tau$ denote the value of the margin. Show that

$$\frac{1}{\tau^2} = 2\sum_{i=1}^{n} \alpha_i - \sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j k(x_i, x_j),$$

where $k(x_i, x_j)$ is a valid kernel.

Only for Grad Students

4. Kernel functions can be defined over objects as diverse as graphs, sets, and text documents. For instance, consider the space of all possible subsets $A$ of a given fixed set $D$. Show that the kernel function $k(A_1, A_2) = 2^{|A_1 \cap A_2|}$ corresponds to an inner product in a feature space of dimensionality $2^{|D|}$ defined by the mapping $\phi(A)$. Here $A$ is a subset of $D$, $A_1 \cap A_2$ denotes the intersection of sets $A_1$ and $A_2$, and $|A|$ denotes the number of elements in $A$.
Find a mapping $\phi(A)$ such that $k(A_1, A_2) = \phi(A_1)^T \phi(A_2)$.
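For Problem 1(a), a minimal sketch, assuming plain Python, of a reference Naive Bayes classifier that compares the two log joint probabilities directly. It can be used to sanity-check a derived linear rule by brute force over all $2^d$ binary inputs. The function names naive_bayes_predict and agrees_with_linear_rule are hypothetical helpers, not part of the assignment, and mapping a score of exactly zero to class 0 is an arbitrary tie-breaking convention.

```python
import itertools
import math


def naive_bayes_predict(x, p, p1, p0):
    """Reference Naive Bayes rule for binary features.

    x  : sequence of 0/1 feature values x_1..x_d
    p  : P(y = 1)
    p1 : sequence with p1[j] = P(x_j = 1 | y = 1)
    p0 : sequence with p0[j] = P(x_j = 1 | y = 0)
    Returns 1 if P(y = 1 | x) > P(y = 0 | x), else 0.
    """
    # Log joint probability log P(y) + sum_j log P(x_j | y), using the
    # Naive Bayes assumption that features are independent given y.
    log1 = math.log(p)
    log0 = math.log(1 - p)
    for xj, q1, q0 in zip(x, p1, p0):
        log1 += math.log(q1 if xj == 1 else 1 - q1)
        log0 += math.log(q0 if xj == 1 else 1 - q0)
    return 1 if log1 > log0 else 0


def agrees_with_linear_rule(w, b, p, p1, p0):
    """Check the rule sign(w.x - b) against Naive Bayes on every x in {0,1}^d.

    A positive score is read as class 1; zero or negative as class 0.
    """
    for x in itertools.product((0, 1), repeat=len(p1)):
        score = sum(wj * xj for wj, xj in zip(w, x)) - b
        if (1 if score > 0 else 0) != naive_bayes_predict(x, p, p1, p0):
            return False
    return True
```

Plugging a candidate $w$ and $b$ into agrees_with_linear_rule for a few random choices of $p$, $p_{j1}$, $p_{j0}$ is a quick numerical check; it does not replace the derivation.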
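For Problem 2, a minimal sketch of the data generation, assuming NumPy; make_data and the seed 0 are hypothetical choices, not specified by the assignment. The threshold 9.34 is approximately the median of the chi-squared distribution with 10 degrees of freedom (the distribution of $\sum_{j=1}^{10} X_j^2$), which is why plain i.i.d. sampling already yields roughly 1000 points in each class.

```python
import numpy as np


def make_data(n, d=10, threshold=9.34, rng=None):
    """Draw n points X ~ N(0, I_d); label y = 1 if sum_j X_j^2 > threshold, else -1."""
    rng = np.random.default_rng() if rng is None else rng
    X = rng.standard_normal((n, d))
    # The sum of d squared independent standard normals is chi-squared with d
    # degrees of freedom, so thresholding at its median splits the classes evenly.
    y = np.where((X ** 2).sum(axis=1) > threshold, 1, -1)
    return X, y


rng = np.random.default_rng(0)  # hypothetical seed, for reproducibility only
X_train, y_train = make_data(2000, rng=rng)
X_test, y_test = make_data(10_000, rng=rng)
print((y_train == 1).sum(), (y_train == -1).sum())
```

The labels are coded as $\pm 1$, matching the definition of $y$ above and the coding AdaBoost conventionally uses.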