Tutoring case: ID3 Assignment 2


31005 Assignment 2 Full Name 12345678
Implementation of ID3 Decision Tree Algorithm
Project link: https://colab.research.google.com/drive/17iKQhg2ho_ldyK9omPm_kvs3fCs2QhbZ?usp=sharing

Introduction
Decision trees are commonly used supervised learning data models. If the data attributes
consist of $X_1, X_2, \ldots, X_p$, a decision tree $f_T$ maps an input data sample of $p$ observed attribute
values to the target of prediction: $f_T : [X_1, \ldots, X_p] \mapsto Y$. The tree $f_T$ performs the mapping by …

This report presents the implementation of a decision tree construction algorithm for binary
classification problems, where …

The machine learning algorithm to construct a decision tree takes …

ID3 Algorithm
The algorithm consists of XXX modules: entropy computation, information gain computation, …

…

 
Entropy
…

Information Gain
Information gain measures the reduction in uncertainty about a random variable $Y$
given knowledge of another random variable $X$.

In the supervised learning problem, $Y$ is the target of prediction, …

The computation of information gain in the ID3 algorithm is implemented in myid3.py, lines
45–60, “compute_info_gain”. A snippet of the essential computation is shown below.

45. def compute_info_gain([TODO]):
46.     …
47.     for v, fr in [TODO]:
48.         …
55.         [TODO]
56.
57.     # [TO-ADD-COMMENTS]
58.     return [TODO]
Line… implements the sum over $Y$-values given an $X$ (the attribute) in equation (2).

$$IG(Y; X) := H(Y) - \sum_{x_i \in \mathcal{X}} p(x_i) \left( \sum_{y_j \in \mathcal{Y}} p(y_j \mid x_i) \log \frac{1}{p(y_j \mid x_i)} \right) \qquad (2)$$
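
As a rough illustration of equation (2), the sketch below computes entropy and information gain from two parallel lists of attribute values and labels. It is not the myid3.py code quoted above: the function names `entropy` and `info_gain`, the list-based inputs, and the base-2 logarithm are all assumptions made for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """H(Y) = sum_j p(y_j) * log2(1 / p(y_j)), estimated from label counts."""
    n = len(labels)
    # n / c is 1 / p(y_j) for a label that occurs c times out of n.
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def info_gain(xs, ys):
    """IG(Y; X) = H(Y) - sum_i p(x_i) * H(Y | X = x_i), following equation (2)."""
    n = len(ys)
    # Group the labels by the attribute value they were observed with.
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    # Outer sum over attribute values x_i; the inner sum over y_j is entropy(g).
    conditional = sum((len(g) / n) * entropy(g) for g in groups.values())
    return entropy(ys) - conditional


# Hypothetical toy data: the attribute separates the two classes perfectly.
outlook = ["sunny", "sunny", "rain", "rain"]
play = ["no", "no", "yes", "yes"]
print(info_gain(outlook, play))
```

With an attribute that separates the classes perfectly, the conditional term vanishes and the gain equals H(Y). With an attribute that is independent of the label, the gain is zero.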
Link code with algorithm details.
- Number equations for cross-reference.
- Edit the code line numbers to match the actual line numbers in your code (take care of omissions in the quote, too).
- HINT: try "http://www.planetb.ca/syntax-highlight-word" to format your code.

A PLAIN TEXT and clickable link to Colab (recommended) / GitHub / other cloud-based code repository.
Your information.
Start with title, NO cover page.
SECTION: A short introduction to the problem and the method. Better with your comments on why you have chosen this method to address the problem.
SECTION: The algorithm technical details and your implementation.
A file easily identified in your project, or a cell marked in your notebook, e.g. you can comment the cell in the first line by #[cell:ig-comp], and cross-ref here.
SUBSECTIONs: Important aspects of the algorithm.
Legend: Recommendation and hints / Mandatory / Marking criteria
Model Evaluation
The model is evaluated on XXXX Dataset and YYYY Dataset.

…

 
Data Preparation
…

Experiment Design and Evaluation
In one round of testing, the data is split into two parts, one for training and another for evaluation … The
model is constructed on the training … There are a few hyperparameters to configure: XX, YY. …
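
One round of such a test could look like the sketch below, which shuffles the data, holds out a fraction for evaluation, and reports accuracy. The helper names `train_test_split` and `accuracy`, the `test_fraction` and `seed` parameters, and the toy rows are hypothetical. The source does not state the split ratio, the evaluation criterion, or the hyperparameters XX and YY, so accuracy on a 70/30 split is only one possible choice.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of rows and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict, test_rows):
    """Fraction of (features, label) rows where predict(features) == label."""
    if not test_rows:
        raise ValueError("test set is empty")
    correct = sum(1 for features, label in test_rows if predict(features) == label)
    return correct / len(test_rows)


# Hypothetical toy data: the label is fully determined by the single attribute.
rows = [({"x": i % 2}, "yes" if i % 2 else "no") for i in range(10)]
train, test = train_test_split(rows, test_fraction=0.3, seed=42)


def predict(features):
    """Stand-in for a tree built on the training part."""
    return "yes" if features["x"] == 1 else "no"


print(len(train), len(test), accuracy(predict, test))
```

Fixing the seed keeps the split reproducible, which makes it easier for an assessor to replicate the reported numbers.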

Evaluation Results
…

Conclusion
…

One or two datasets that are prepared in your project / notebook.
SECTION: evaluation report.
SUBSECTION: BRIEF intro of each dataset, including the observed attributes and task, the necessary preprocessing steps. No more than 100 words for each dataset (the shorter the better). No figures/tables/plots.
SUBSECTION: BRIEF intro of the overall design of the experiment and evaluation criteria and scheme. No more than 100 words.
SUBSECTION: the result of training/test (using the criteria you have discussed in the previous subsection). Include minimal program print-outs/plots.
SECTION: Conclusion
GOOD: The report contains a clear brief introduction to the algorithm. The input/output data formats are clearly stated. The implementation is correct. The program is well commented, explaining how the computer program codes correspond to algorithm steps. The implementation can be easily accessed from a cloud-based service such as Colab/GitHub and replicated/executed for assessment. The evaluation is clearly designed according to standard data analytics models (e.g. CRISP-DM). The report is clearly structured and well written.

PASS: The report contains an introduction to the algorithm. The input/output data formats are listed. The implementation is mostly correct. The program is commented to be readable. The implementation can be easily accessed from a cloud-based service such as Colab/GitHub and replicated/executed for assessment. The implementation may contain minor issues that can be addressed by a tutor without major revision. The report is written in clear English and is structured.

FAIL: The report misses an introduction to the algorithm, or the input/output data formats are not mentioned. The implementation cannot be replicated. The report is not in English or has no clear structure.
