“Machine Learning”
by Anuradha Srinivasaraghavan & Vincy Joseph
Copyright 2019 Wiley India Pvt. Ltd. All rights reserved.
Introduction to Classification and Decision Trees
Problem Solving Using Decision Trees
Basic Decision Tree Learning Algorithm
Iterative Dichotomiser 3 (ID3)
Popularity of Decision Tree Classifiers
Steps to Construct a Decision Tree
Issues in Decision Trees
Rule-Based Classification
A tree is built in which each leaf node contains an output category. The class of a new instance is predicted by traversing the tree from the root, testing the attribute named at each internal node against the instance's value, until a leaf is reached.
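For instance, a learned tree can be stored as nested dicts and a prediction made by descending from the root (a minimal Python sketch; the weather attributes are the classic illustrative example, not data from these slides):

```python
# Internal nodes map an attribute name to its branches; leaves are labels.
tree = {"outlook": {"sunny": {"humidity": {"high": "no", "normal": "yes"}},
                    "overcast": "yes",
                    "rain": {"wind": {"strong": "no", "weak": "yes"}}}}

def classify(node, instance):
    # Descend one attribute test at a time until a leaf label is reached.
    while isinstance(node, dict):
        attribute = next(iter(node))
        node = node[attribute][instance[attribute]]
    return node

print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # -> yes
```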
Decision trees can be used to solve problems that have the following features:
◦ Instances or tuples are represented as attribute–value pairs, where the attributes take a small number of disjoint possible values.
◦ The target function has discrete output values, such as yes or no.
◦ The target function may require disjunctive descriptions; a decision tree represents such disjunctions naturally, which is why its output can also be expressed as a rule-based classifier.
◦ Decision trees can be used when the training data contains errors and when it contains missing attribute values.
Two basic algorithms are the Iterative Dichotomiser 3 (ID3) algorithm and the C4.5 algorithm.
Attribute selection measures are also known as splitting rules because they determine how the tuples at a given node are to be split.
Information gain is the main selection measure used.
The attribute having the best score for the measure is chosen as the splitting attribute for the given tuples.
The common attribute selection measures are:
◦ Information gain
◦ Gain ratio
◦ Gini index
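For reference, the standard textbook definitions of the latter two are as follows (information gain itself is defined on the following slides); here $p_i$ is the fraction of tuples in data set $D$ belonging to class $i$ of $m$ classes, and $\mathrm{SplitInfo}_A(D)$ is the entropy of the partition that attribute $A$ induces:

\[
\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}_A(D)}, \qquad \mathrm{Gini}(D) = 1 - \sum_{i=1}^{m} p_i^{2}
\]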
Entropy is the quantity that controls the split of the data: it measures the homogeneity of a set of examples.
For a two-class problem, entropy ranges between 0 and 1:
◦ 0 if all members of S belong to the same class;
◦ 1 if S contains an equal number of positive and negative examples.
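In symbols, for a two-class set $S$ with proportions $p_+$ of positive and $p_-$ of negative examples:

\[
\mathrm{Entropy}(S) = -\,p_{+}\log_{2}p_{+} - p_{-}\log_{2}p_{-}
\]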
Information gain measures the effectiveness of an attribute in classifying the training data: it is the expected reduction in entropy caused by partitioning the examples according to that attribute.
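In symbols, where $\mathrm{Values}(A)$ is the set of values of attribute $A$ and $S_v$ is the subset of $S$ for which $A = v$:

\[
\mathrm{Gain}(S,A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, \mathrm{Entropy}(S_v)
\]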
In decision tree learning, the most popular
algorithm is the Iterative Dichotomiser 3 (ID3)
algorithm.
Stopping conditions of ID3:
◦ Every element in the subset belongs to the same class; the node is then turned into a leaf labeled with that class.
◦ There are no more attributes to be selected; the node is then turned into a leaf labeled with the majority class of its examples.
Both conditions are marked in the sketch below.
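To make the procedure concrete, here is a minimal Python sketch of ID3 over nominal attributes (our own function and variable names; a teaching sketch, not the book's code):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels: -sum over classes of p*log2(p)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Expected reduction in entropy from splitting on attribute index attr."""
    total = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    remainder = sum((len(sub) / total) * entropy(sub)
                    for sub in subsets.values())
    return entropy(labels) - remainder

def id3(rows, labels, attrs):
    """Build a tree of nested dicts: {attr_index: {value: subtree_or_leaf}}."""
    if len(set(labels)) == 1:          # stopping condition 1: one class left
        return labels[0]
    if not attrs:                      # stopping condition 2: no attributes left
        return Counter(labels).most_common(1)[0][0]   # majority-class leaf
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        keep = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = id3([rows[i] for i in keep],
                                [labels[i] for i in keep],
                                [a for a in attrs if a != best])
    return tree

# Tiny demo: attribute 0 separates the two classes perfectly.
rows = [("sunny", "high"), ("sunny", "normal"), ("rain", "high")]
labels = ["no", "no", "yes"]
print(id3(rows, labels, [0, 1]))   # -> {0: {'sunny': 'no', 'rain': 'yes'}}
```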
1. It maintains only a single current hypothesis as it searches through the space of decision trees.
2. It does not have the ability to consider alternative decision trees.
3. It does not perform backtracking in its search; hence, there is a chance of getting stuck in a local optimum.
4. It is less sensitive to errors in the training data because information gain, which is a statistical property computed over all examples, is used.
Decision tree classifiers are popular for several reasons:
◦ No domain knowledge is required, so they are well suited to exploratory knowledge discovery.
◦ Classification is based on probability alone.
◦ They can handle multidimensional data.
To construct a decision tree:
1. Compute the entropy for the given dataset.
2. For every attribute/feature, compute the entropy of each split it induces and, from that, its information gain.
3. Choose the attribute with the highest information gain as the splitting node.
4. Repeat the process on each branch until a stopping condition is met (a usage sketch with a library learner follows below).
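As a quick usage illustration (a sketch assuming scikit-learn is available; its CART-based learner is not ID3 itself, but criterion="entropy" ranks splits by the same information-gain idea as the steps above):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# criterion="entropy" makes the learner rank candidate splits by
# information gain, mirroring the entropy-based steps listed above.
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)
print(clf.predict(X[:5]))
```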
Practical issues in learning decision trees include:
◦ Handling continuous attributes.
◦ Choosing an appropriate attribute selection measure.
◦ Handling training data with missing attribute values.
◦ Handling attributes with differing costs.
◦ Improving computational efficiency.
Underfitting occurs when a machine learning algorithm cannot capture the underlying trend of the data.
Overfitting occurs when a machine learning algorithm captures the noise of the data.
Formally, given a hypothesis space H, a hypothesis h ∈ H is said to overfit the training data if there exists some alternative hypothesis h′ ∈ H such that h has a smaller error than h′ over the training examples, but h′ has a smaller error than h over the entire distribution of instances.
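Restating the definition symbolically: h overfits if

\[
\exists\, h' \in H:\; \mathrm{error}_{\mathrm{train}}(h) < \mathrm{error}_{\mathrm{train}}(h') \;\wedge\; \mathrm{error}_{\mathcal{D}}(h') < \mathrm{error}_{\mathcal{D}}(h),
\]

where $\mathrm{error}_{\mathrm{train}}$ is the error over the training examples and $\mathrm{error}_{\mathcal{D}}$ the error over the entire distribution of instances.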
There are two broad approaches to avoid overfitting:
◦ Approaches that stop the growth of the tree before it reaches the point where it perfectly classifies the training data (pre-pruning).
◦ Approaches that allow the tree to overfit the data and then post-prune it.
Whichever approach is taken, a criterion is needed to determine the correct final tree size:
◦ Use separate datasets for training and for evaluation. This is done by the training and validation set approach (a sketch follows below).
◦ Use the entire dataset for training, but apply a statistical test to estimate whether expanding (or pruning) a particular node is likely to produce an improvement beyond the training set.
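A minimal sketch of the training and validation set approach (our own choices of dataset, depth range, and split ratio; assumes scikit-learn is available):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3,
                                            random_state=0)

# Grow trees of increasing depth on the training set and keep the
# depth that scores best on the held-out validation set.
best_depth, best_score = None, 0.0
for depth in range(1, 10):
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    clf.fit(X_tr, y_tr)
    score = clf.score(X_val, y_val)
    if score > best_score:
        best_depth, best_score = depth, score
print(best_depth, best_score)
```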
Using IF–THEN Rules for Classification:
The coverage and accuracy of a rule measure how much of the data the rule applies to and how often it is correct when it applies. They are given by the expressions below.
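For a rule $R$ over a data set $D$, let $n_{\mathrm{covers}}$ be the number of tuples covered by $R$ and $n_{\mathrm{correct}}$ the number of those tuples that $R$ classifies correctly. Then

\[
\mathrm{coverage}(R) = \frac{n_{\mathrm{covers}}}{|D|}, \qquad \mathrm{accuracy}(R) = \frac{n_{\mathrm{correct}}}{n_{\mathrm{covers}}}.
\]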
The rule ordering scheme prioritizes the rules:
◦ Ordering based on class: the classes are sorted in decreasing order of importance (for example, by prevalence), and all the rules for the most important class come first; the rules within a single class need not be ordered.
To extract rules from a decision tree, one rule is created for each path from the root node to a leaf node.
Each splitting criterion along a path is logically ANDed to form the rule antecedent (the IF part); the leaf node holds the class prediction, which forms the rule consequent (the THEN part). An illustrative example follows below.
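For instance (an illustrative example; the attribute names are not from these slides), the path root → age = youth → student = yes → leaf buys_computer = yes yields the rule:
IF age = youth AND student = yes THEN buys_computer = yes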
For a given rule antecedent, any condition that does not improve the estimated accuracy of the rule can be pruned.
A sequential covering algorithm can be used for the same purpose; a toy sketch follows below.
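A toy sketch of sequential covering under strong simplifying assumptions (single-condition rules only, and a deliberately naive learn_one_rule heuristic of our own; illustrative, not the book's algorithm):

```python
def learn_one_rule(examples, target_class):
    """Toy rule inducer (purely illustrative): return the single
    attribute = value test with the highest accuracy for target_class."""
    best, best_acc = None, 0.0
    for attr in examples[0][0]:
        for value in {x[attr] for x, _ in examples}:
            covered = [(x, y) for x, y in examples if x[attr] == value]
            acc = sum(y == target_class for _, y in covered) / len(covered)
            if acc > best_acc:
                best, best_acc = (attr, value), acc
    return best

def sequential_covering(examples, classes):
    """Learn rules one class at a time, removing the tuples each new
    rule covers before learning the next rule."""
    rules = []
    for cls in classes:
        remaining = list(examples)
        while any(y == cls for _, y in remaining):
            attr, value = learn_one_rule(remaining, cls)
            rules.append(f"IF {attr} = {value} THEN class = {cls}")
            remaining = [(x, y) for x, y in remaining if x[attr] != value]
    return rules

data = [({"outlook": "sunny"}, "no"), ({"outlook": "rain"}, "yes")]
print(sequential_covering(data, ["no", "yes"]))
```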