CS 4340-5340 Practice Questions

1. Consider the perceptron learning algorithm (PLA) used for a binary classification problem.
   (i) What will the algorithm's output be if the training data are not linearly separable?
   (ii) Would you call the PLA supervised or unsupervised learning?

2. Obtain the linear regression of Y on X1, X2 (by hand calculation or by writing code, but do not use an off-the-shelf linear regression function):

   X1  X2  Y
   0   0   2
   0   1   2
   1   0   10
   1   1   10

   (i) Write the regression equation.
   (ii) If we use the result of this regression as a classifier, what will be the equation of the classifier?

3. Exercises 3.6 and 3.7 from the textbook (page 92).

4. In least-squares linear regression, we obtain the solution (the weights) analytically. In logistic regression, why don't we solve for the weights analytically by setting the partial derivatives of the (log-)likelihood expression (or its negative) to zero?

5. Explain the concepts of "training data," "test data," "training error," and "test error." Is it good to have as low a training error as possible?

6. What does "i.i.d." stand for? Why is it important in the machine learning / statistics literature?

7. True or false:
   (i) The decision (classification) boundary in the PLA is linear.
   (ii) The decision (classification) boundary in logistic regression is non-linear.
   (iii) The loss (error) function in linear regression is convex and therefore guarantees the existence of a unique (global) optimum solution.

8. Justify or refute: gradient descent/ascent guarantees convergence to the globally optimal solution.

Advice: Please read the textbook carefully. If you can devote further time, read the references, particularly the ISLR book.
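As a sanity check for question 2, one way to verify a hand calculation is to solve the normal equations directly. The sketch below uses NumPy for matrix arithmetic only (not a prepackaged regression routine, per the question's constraint); the variable names are illustrative.

```python
import numpy as np

# Data from question 2: columns X1, X2 and target Y
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]], dtype=float)
y = np.array([2, 2, 10, 10], dtype=float)

# Prepend a column of ones so the first weight is the intercept
A = np.hstack([np.ones((X.shape[0], 1)), X])

# Normal equations: (A^T A) w = A^T y
w = np.linalg.solve(A.T @ A, A.T @ y)
print(w)  # [2. 8. 0.]  i.e. Y = 2 + 8*X1 + 0*X2
```

Note that this small design matrix gives an exact fit, since Y depends only on X1 here; the classifier in part (ii) then comes from thresholding the fitted regression output.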