
Midterm Examination
COMP-4311-WA
Big Data
Winter 2020

Time: 50 minutes Total Points: 30


Name:
Last Name First Name


Student ID:


1) What is complete case deletion in the context of handling missing data? Why is it an
acceptable way to deal with data that is MCAR (Missing Completely at Random)? Explain
with an example. (3 points)












2) Let’s assume the population of a city consists of poor, middle-class, and rich people.
Someone collects a random sample from this population and somehow ends up collecting
data only from middle-class people. What is the Gini Impurity of this sample, which
contains only middle-class people? Explain. (2 points)












3) Explain the dynamic threshold approach to incorporate continuous-valued attributes in
decision tree learning. (2 points)








4) What is Bagging? Describe a machine learning method that uses bagging. (4 points)



















5) How does increasing or decreasing the number of randomly selected features considered
in the information-gain calculation at each node affect the performance of Random
Forests? (4 points)











6) What is the objective function in k-means clustering? Why is it not always suitable for
determining the appropriate value of k? (3 points)















7) What are single linkage and complete linkage methods in hierarchical clustering? Is the
single linkage method good for achieving compact clusters? Explain your answer using
an example/figure. (5 points)











8) When the size of the best subset of features is large, does sequential backward selection
perform better or worse than sequential forward selection? Why? Explain your answer.
(3 points)














9) Can replicated over-sampling cause overfitting? Why? Explain your answer. (3 points)











10) What is the difference between cost-sensitive learning and cost-sensitive prediction?
(1 point)




