Problem 1

Small: 1Y 3N
Medium: 4Y 2N
Large: 1Y 4N

Info(D) = I(6,9) = 0.97095
I(1,3) = -(1/4)log2(1/4) - (3/4)log2(3/4) = 0.81128
I(4,2) = -(4/6)log2(4/6) - (2/6)log2(2/6) = 0.91829
I(1,4) = -(1/5)log2(1/5) - (4/5)log2(4/5) = 0.72193
Info(A2) = (4/15)I(1,3) + (6/15)I(4,2) + (5/15)I(1,4) = 0.82430
Gain(A2) = Info(D) - Info(A2) = 0.14665

A2 has the larger information gain, so A2 is the test attribute at the root level.

Problem 2

(1)
A1:
  A1=low -> no, error rate = 4/9
  A1=high -> yes, error rate = 1/2
  A1 error rate = 7/15
A2:
  A2=hot -> yes, error rate = 2/5
  A2=mild -> yes, error rate = 2/5
  A2=cold -> no, error rate = 1/5
  A2 error rate = 5/15
A3:
  A3=medium -> yes, error rate = 2/6
  A3=large -> no, error rate = 1/5
  A3=small -> yes, error rate = 2/4
  A3 error rate = 5/15

A2 and A3 tie for the lowest error rate (5/15). Choose A2:
  A2=hot -> yes
  A2=mild -> yes
  A2=cold -> no

(2) Y N Y N

Problem 3

6Y and 9N
P(Y) = 6/15 = 0.4
P(N) = 9/15 = 0.6

P(A1=high|Y) = 2/6
P(A2=hot|Y) = 3/6
P(A3=large|Y) = 1/6
P(X|Y) = (2/6)(3/6)(1/6) = 0.0278

P(A1=high|N) = 4/9
P(A2=hot|N) = 1/9
P(A3=large|N) = 3/9
P(X|N) = (4/9)(1/9)(3/9) = 0.0165

P(Y)*P(X|Y) = 0.0111
P(N)*P(X|N) = 0.0099

Since P(Y)*P(X|Y) > P(N)*P(X|N), the instance belongs to class Y.

Problem 4

Problem 5

Null hypothesis: attributes A1 and A2 are independent (no correlation).
Using X^2(D,B) with DF = 3, the p-value is greater than 0.05, so we do not reject the null hypothesis.

Problem 6

(1)

Problem 7

(1)
(2)
(3) A1 and A2 have the strongest correlation.

Problem 8

(1)
(2)

Problem 9

(1)
(2)
lowerq = 23
upperq = 50
iqr = 50 - 23 = 27
mild.threshold.upper = (iqr * 1.5) + upperq = 90.5
mild.threshold.lower = lowerq - (iqr * 1.5) = -17.5
result = which(data > mild.threshold.upper | data < mild.threshold.lower)
which is: 19, 27, 92, 97, 99

(3) R code:

  # Read the data set and inspect the quartiles of A1
  data <- read.table("e1-p9.csv", header = TRUE, sep = ",")
  attach(data)
  quantile(data$A1)

  # Load the boxplot.with.outlier.label() helper function
  source("https://raw.githubusercontent.com/talgalili/R-code-snippets/master/boxplot.with.outlier.label.r")

  # Row numbers as character labels
  lab <- as.character(1:nrow(data))

  # Draw the boxplot, labelling each outlier with its A1 value
  boxplot.with.outlier.label(data$A1, label_name = data$A1)
  legend(x = "topright", legend = "Outliers", pch = 1, col = "black")

Problem 10
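Arithmetic check for Problems 1, 3 and 9 (2): the hand calculations in those answers can be re-derived with a few lines of R. This is only a minimal sketch that takes the counts and quartiles exactly as written in the answers; the names entropy, parts, info_split, score_y, score_n and fences are illustrative and not part of the assignment.

```r
# Entropy in bits of a vector of class counts, e.g. c(6, 9).
entropy <- function(counts) {
  p <- counts / sum(counts)
  p <- p[p > 0]                # treat 0 * log2(0) as 0
  -sum(p * log2(p))
}

# Problem 1: (Y, N) counts in each partition, copied from the answer.
parts <- list(small = c(1, 3), medium = c(4, 2), large = c(1, 4))
total <- Reduce(`+`, parts)                      # overall (Y, N) counts
info_d <- entropy(total)                         # Info(D)
weights <- sapply(parts, sum) / sum(total)       # partition sizes / 15
info_split <- sum(weights * sapply(parts, entropy))
gain <- info_d - info_split

# Problem 3: unnormalised naive Bayes scores P(class) * P(X | class).
score_y <- (6 / 15) * (2 / 6) * (3 / 6) * (1 / 6)
score_n <- (9 / 15) * (4 / 9) * (1 / 9) * (3 / 9)

# Problem 9 (2): 1.5 * IQR fences from the stated quartiles.
lowerq <- 23
upperq <- 50
iqr <- upperq - lowerq
fences <- c(lower = lowerq - 1.5 * iqr, upper = upperq + 1.5 * iqr)

print(round(c(info_d = info_d, info_split = info_split, gain = gain), 5))
print(round(c(score_y = score_y, score_n = score_n), 4))
print(fences)
```

The scores are compared without normalising because both share the same denominator P(X), so the larger product already identifies the predicted class.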