
DECISION TREE

“Machine Learning”
by Anuradha Srinivasaraghavan & Vincy Joseph
Copyright © 2019 Wiley India Pvt. Ltd. All rights reserved.
 Introduction to Classification and Decision Tree
 Problem Solving Using Decision Trees
 Basic Decision Tree Learning Algorithm
 Iterative Dichotomiser 3 (ID3)
 Popularity of Decision Tree Classifiers
 Steps to Construct a Decision Tree
 Issues in Decision Trees
 Rule-Based Classification

 A tree is built in which the leaf nodes contain the output categories.
 The output class is predicted based on the rules generated from the tree structure.
 Learned trees can also be represented as a set of IF–THEN rules.

 Decision trees can be used to solve problems that have the following features (a small illustrative dataset follows this list):
◦ Instances or tuples are represented as attribute–value pairs, where each attribute takes a small number of disjoint possible values.
◦ The target function has discrete output values, such as yes or no.
◦ Decision trees can represent disjunctive descriptions, which implies that the output of a decision tree can be represented using a rule-based classifier.
◦ Decision trees can be used when the training data contains errors or missing attribute values.
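A minimal sketch of such a representation, using an illustrative play-tennis-style dataset (the attribute names, values, and target are assumptions for illustration, not taken from the book): each instance is a dictionary of attribute–value pairs with a discrete target.

# A tiny illustrative dataset: each instance is a dict of attribute-value
# pairs, and the target ("Play") takes only discrete values (yes/no).
dataset = [
    {"Outlook": "sunny",    "Humidity": "high",   "Wind": "weak",   "Play": "no"},
    {"Outlook": "sunny",    "Humidity": "normal", "Wind": "strong", "Play": "yes"},
    {"Outlook": "rainy",    "Humidity": "high",   "Wind": "strong", "Play": "no"},
    {"Outlook": "overcast", "Humidity": "normal", "Wind": "weak",   "Play": "yes"},
]

# All attributes take a small number of disjoint values, which is the
# setting described above.
attributes = ["Outlook", "Humidity", "Wind"]
target = "Play"

The later sketches in this section reuse this dataset shape (a list of dicts with a discrete target column).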
 Two basic algorithms are the Iterative Dichotomiser 3 (ID3) algorithm and the C4.5 algorithm.
 Attribute selection measures are also known as splitting rules because they determine how the tuples at a given node are to be split.
 Information gain is the main selection measure used.
 The attribute with the best score for the measure is chosen as the splitting attribute for the given tuples.
 Information gain
 Gain ratio
 Gini index

 Entropy is the measure that controls how the data are split.
 It quantifies the homogeneity of the examples.
 Entropy ranges between 0 and 1:
 0 if all members of S belong to the same class.
 1 if there is an equal number of positive and negative examples.

Entropy(S) = −p₊ log₂ p₊ − p₋ log₂ p₋

where p stands for the probability (proportion) of the various classes of instances under consideration.
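As a minimal sketch (not from the book), the entropy of a set of examples can be computed from its class labels:

import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

# Homogeneous subset -> entropy 0; balanced subset -> entropy 1.
print(entropy(["yes", "yes", "yes"]))        # 0.0
print(entropy(["yes", "yes", "no", "no"]))   # 1.0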

 Information gain measures the effectiveness of an attribute in classifying the training data (see the sketch below).
 Gain ratio overcomes the bias towards many-valued attributes that is present in information gain.
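A hedged sketch of information gain for a discrete attribute over the list-of-dicts dataset shape used earlier, following Gain(S, A) = Entropy(S) − Σ_v (|S_v|/|S|) · Entropy(S_v); the function names are illustrative.

import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Gain(S, A) = Entropy(S) - sum over values v of |Sv|/|S| * Entropy(Sv)."""
    base = entropy([row[target] for row in rows])
    total = len(rows)
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset)
    return base - remainder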

 In decision tree learning, the most popular algorithm is the Iterative Dichotomiser 3 (ID3) algorithm.
 Stopping condition of ID3:
 When every element in the subset belongs to the same class, the node is turned into a leaf node and labelled with the name of that class.

 Every element in the subset belongs to the same class, in which case the node is turned into a leaf node.
 There are no more attributes to be selected, but the examples still do not all belong to the same class.
 There are no more examples in the subset, which happens when no example in the parent set matches the attribute value of that branch.

1. It maintains only a single current hypothesis as
it searches through the space of decision trees.
2. It does not have the ability to determine
alternative decision trees.
3. It does not perform backtracking in search.
Hence, there are chances of getting stuck in local
optima.
4. It is less sensitive to errors because information gain, which is a statistical property, is used.

 No domain knowledge is required, so decision trees can be used in exploratory knowledge discovery.
 Classification is based on probability alone.
 They can handle multidimensional data.

 Compute the entropy for the given dataset.
 For every attribute/feature:
◦ Calculate the entropy for each of its categorical values.
◦ Take the weighted average information entropy of the current attribute.
◦ Calculate the gain for the current attribute.
 Pick the attribute with the highest gain as the split.
 Repeat until the desired tree is complete. (A recursive sketch of these steps follows this list.)
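A minimal, hedged ID3 sketch that follows these steps on the list-of-dicts dataset shape used earlier; the entropy and gain helpers are repeated so the block is self-contained, and all names are illustrative, not the book's code.

import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    base = entropy([r[target] for r in rows])
    total = len(rows)
    remainder = 0.0
    for value in set(r[attribute] for r in rows):
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset)
    return base - remainder

def id3(rows, attributes, target):
    labels = [r[target] for r in rows]
    # Stop: all examples share one class -> leaf labelled with that class.
    if len(set(labels)) == 1:
        return labels[0]
    # Stop: no attributes left -> leaf labelled with the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Pick the attribute with the highest information gain and split on it.
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    tree = {best: {}}
    for value in set(r[best] for r in rows):
        subset = [r for r in rows if r[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, remaining, target)
    return tree

On the tiny play-tennis-style dataset sketched earlier, id3(dataset, attributes, "Play") returns a nested dictionary whose internal keys are splitting attributes and whose leaves are class labels.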

 Handling continuous attributes.
 Choosing an appropriate attribute selection

measure.
 Handling training data with missing attribute

values.
 Handling attributes with differing costs.
 Improving computational efficiency.

 Underfitting occurs when a machine learning algorithm cannot capture the underlying trend of the data.
 Overfitting occurs when a machine learning algorithm captures the noise of the data.
 Given a hypothesis space H, a hypothesis h ∈ H is said to overfit the training data if there exists some alternative hypothesis h′ ∈ H such that h has smaller error than h′ over the training examples, but h′ has a smaller error than h over the entire distribution of instances.
 Approaches that stop the growth of the tree before it reaches the point where it perfectly classifies the training data.
 Approaches that allow the tree to overfit the data, and then post-prune it.

 Use separate datasets for training and for evaluation. This is the training and validation set approach (a sketch follows this list).
 Use the entire dataset for training, but apply a statistical test (e.g., the chi-square test) to estimate whether expanding a node is likely to improve performance beyond the training data.
 Use an explicit measure of the complexity of encoding the training examples and the decision tree, halting growth of the tree when this encoding size is minimized. This is done using the minimum description length principle.
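A hedged illustration of the training and validation set approach, assuming scikit-learn and its bundled iris dataset (the library, dataset, and the use of max_depth as the size control are assumptions, not from the book); the tree size is chosen by the validation score rather than the training score.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Hold out a validation set that plays no part in growing the tree.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

best_depth, best_score = None, -1.0
for depth in range(1, 11):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    score = tree.score(X_val, y_val)   # accuracy on data not used for training
    if score > best_score:
        best_depth, best_score = depth, score

print(best_depth, best_score)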
 Continuous-valued attributes: use a threshold-based Boolean attribute approach, i.e., split on a test such as A < c (see the sketch after this list).
 Missing attribute values: assign the value that is most common among the training examples at node n.
 Attributes with differing costs: low-cost attributes are preferred over high-cost attributes.
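A minimal sketch of the threshold-based Boolean attribute approach for a continuous attribute: candidate thresholds are taken as midpoints between adjacent sorted values and scored by information gain. The helper names and example values are illustrative.

import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def best_threshold(values, labels):
    """Turn a continuous attribute into the Boolean test value < c by trying
    midpoints between adjacent sorted values and keeping the best gain."""
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    best_c, best_gain = None, -1.0
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue
        c = (v1 + v2) / 2
        left = [lab for v, lab in pairs if v < c]
        right = [lab for v, lab in pairs if v >= c]
        gain = (base
                - (len(left) / len(pairs)) * entropy(left)
                - (len(right) / len(pairs)) * entropy(right))
        if gain > best_gain:
            best_c, best_gain = c, gain
    return best_c, best_gain

print(best_threshold([64, 65, 68, 70, 71, 72],
                     ["yes", "no", "yes", "yes", "no", "no"]))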

 Using IF–THEN rules for classification:
 Coverage and accuracy of a rule R over a dataset D are given by
coverage(R) = n_covers / |D|
accuracy(R) = n_correct / n_covers
where n_covers is the number of tuples covered by R, n_correct is the number of those tuples that R classifies correctly, and |D| is the number of tuples in D.
 Properties of the rules generated by a rule-based classifier:
◦ Mutually exclusive rules
◦ Exhaustive rules
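A small sketch computing coverage and accuracy for a single IF–THEN rule on a toy dataset (the rule, attributes, and values are illustrative):

# A rule is an antecedent (dict of attribute tests) plus a predicted class.
rule = {"if": {"Outlook": "sunny", "Humidity": "normal"}, "then": "yes"}

data = [
    {"Outlook": "sunny", "Humidity": "normal", "Play": "yes"},
    {"Outlook": "sunny", "Humidity": "high",   "Play": "no"},
    {"Outlook": "rainy", "Humidity": "normal", "Play": "yes"},
]

covered = [row for row in data
           if all(row[a] == v for a, v in rule["if"].items())]
correct = [row for row in covered if row["Play"] == rule["then"]]

coverage = len(covered) / len(data)                          # n_covers / |D|
accuracy = len(correct) / len(covered) if covered else 0.0   # n_correct / n_covers
print(coverage, accuracy)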

 The rule ordering scheme prioritizes the rules.
 Ordering based on class: the classes are sorted in order of decreasing "importance".
 With rule ordering, the triggering rule that appears first in the list has the highest priority, and it fires to return the class prediction (a first-match sketch follows).
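A minimal sketch of first-match evaluation over an ordered rule list, with a default class used when no rule triggers (the rules and default class are illustrative):

# Rules are checked in priority order; the first one whose antecedent
# matches the instance fires and returns its class prediction.
rules = [
    ({"Outlook": "overcast"}, "yes"),
    ({"Outlook": "sunny", "Humidity": "high"}, "no"),
    ({"Outlook": "rainy", "Wind": "strong"}, "no"),
]
default_class = "yes"

def classify(instance, rules, default):
    for antecedent, prediction in rules:
        if all(instance.get(a) == v for a, v in antecedent.items()):
            return prediction     # first triggering rule fires
    return default                # no rule covers the instance

print(classify({"Outlook": "sunny", "Humidity": "high", "Wind": "weak"},
               rules, default_class))   # -> "no"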

 To extract rules from a decision tree, one rule is created for each path from the root node to a leaf node (see the sketch below).
 The splitting criteria along each path are logically ANDed to form the rule antecedent; the leaf node holds the class prediction, which becomes the rule consequent.
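A hedged sketch that walks a nested-dictionary tree (the shape produced by the earlier ID3 sketch) and emits one IF–THEN rule per root-to-leaf path, ANDing the tests collected along the path:

def extract_rules(tree, path=()):
    """Yield (antecedent, class) pairs, one per root-to-leaf path."""
    if not isinstance(tree, dict):          # a leaf: the class prediction
        yield path, tree
        return
    (attribute, branches), = tree.items()   # one splitting attribute per node
    for value, subtree in branches.items():
        # AND the current test onto the path and descend.
        yield from extract_rules(subtree, path + ((attribute, value),))

# Example on a hand-written tree in the same nested-dict format.
tree = {"Outlook": {"overcast": "yes",
                    "sunny": {"Humidity": {"high": "no", "normal": "yes"}},
                    "rainy": "no"}}
for antecedent, prediction in extract_rules(tree):
    conditions = " AND ".join(f"{a} = {v}" for a, v in antecedent)
    print(f"IF {conditions} THEN Play = {prediction}")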

 For a given rule antecedent, any condition that does not improve the estimated accuracy of the rule can be pruned.
 A sequential covering algorithm can be used for this purpose.
