
Decision Tree

https://www.saedsayad.com/decision_tree.htm

Classification and Prediction
• Classification is the process of finding a model (or function) that describes and distinguishes data classes or concepts.
• The model is derived from the analysis of a set of training data (i.e., data objects for which the class labels are known).
• The model is then used to predict the class label of objects for which the class label is unknown.

Decision Tree Induction
• A decision tree is a flowchart-like tree structure in which each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf represents a class or class distribution.
• At each node, the algorithm chooses the "best" attribute for partitioning the data into individual classes.
• Constructing a decision tree classifier requires no domain knowledge or parameter setting, which makes it appropriate for exploratory knowledge discovery.
• Decision trees can easily be converted to classification rules.
• Decision trees can handle multidimensional data.

Decision Tree Induction: Training Dataset

This follows an example of Quinlan's ID3 (Playing Tennis).

age      income   student  credit_rating  buys_computer
<=30     high     no       fair           no
<=30     high     no       excellent      no
31…40    high     no       fair           yes
>40      medium   no       fair           yes
>40      low      yes      fair           yes
>40      low      yes      excellent      no
31…40    low      yes      excellent      yes
<=30     medium   no       fair           no
<=30     low      yes      fair           yes
>40      medium   yes      fair           yes
<=30     medium   yes      excellent      yes
31…40    medium   no       excellent      yes
31…40    high     yes      fair           yes
>40      medium   no       excellent      no
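
For the code sketches further down, this table can be held as a plain Python list of dicts. The variable name DATA and the dict encoding are just one convenient choice, not something prescribed by the slides; the values themselves are copied from the table above.

```python
# The 14 training tuples above, one dict per row.
DATA = [
    {"age": "<=30",   "income": "high",   "student": "no",  "credit_rating": "fair",      "buys_computer": "no"},
    {"age": "<=30",   "income": "high",   "student": "no",  "credit_rating": "excellent", "buys_computer": "no"},
    {"age": "31..40", "income": "high",   "student": "no",  "credit_rating": "fair",      "buys_computer": "yes"},
    {"age": ">40",    "income": "medium", "student": "no",  "credit_rating": "fair",      "buys_computer": "yes"},
    {"age": ">40",    "income": "low",    "student": "yes", "credit_rating": "fair",      "buys_computer": "yes"},
    {"age": ">40",    "income": "low",    "student": "yes", "credit_rating": "excellent", "buys_computer": "no"},
    {"age": "31..40", "income": "low",    "student": "yes", "credit_rating": "excellent", "buys_computer": "yes"},
    {"age": "<=30",   "income": "medium", "student": "no",  "credit_rating": "fair",      "buys_computer": "no"},
    {"age": "<=30",   "income": "low",    "student": "yes", "credit_rating": "fair",      "buys_computer": "yes"},
    {"age": ">40",    "income": "medium", "student": "yes", "credit_rating": "fair",      "buys_computer": "yes"},
    {"age": "<=30",   "income": "medium", "student": "yes", "credit_rating": "excellent", "buys_computer": "yes"},
    {"age": "31..40", "income": "medium", "student": "no",  "credit_rating": "excellent", "buys_computer": "yes"},
    {"age": "31..40", "income": "high",   "student": "yes", "credit_rating": "fair",      "buys_computer": "yes"},
    {"age": ">40",    "income": "medium", "student": "no",  "credit_rating": "excellent", "buys_computer": "no"},
]
```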

Output: A Decision Tree for "buys_computer"

The learned tree tests age at the root and then, depending on the branch, either predicts directly or tests one more attribute:

age?
  ├─ <=30   → student?
  │            ├─ no  → no
  │            └─ yes → yes
  ├─ 31..40 → yes
  └─ >40    → credit_rating?
               ├─ excellent → no
               └─ fair      → yes
Algorithm for Decision Tree Induction
• Basic algorithm (a greedy algorithm):
  – The tree is constructed in a top-down, recursive, divide-and-conquer manner.
  – At the start, all the training examples are at the root.
  – Attributes are categorical (continuous-valued attributes are discretized in advance).
  – Examples are partitioned recursively based on the selected attributes.
  – Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain).
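
A minimal sketch of this top-down, recursive, divide-and-conquer construction, assuming purely categorical attributes and a pluggable attribute-selection heuristic. The names build_tree, majority_class and select_best are illustrative, not part of any standard library.

```python
from collections import Counter

def majority_class(rows, target):
    """Most frequent class label among the remaining rows."""
    return Counter(r[target] for r in rows).most_common(1)[0][0]

def build_tree(rows, attributes, target, select_best):
    """Top-down recursive divide-and-conquer tree construction.

    rows        : list of dicts, one per training tuple
    attributes  : candidate test attributes (all categorical)
    select_best : heuristic (e.g., information gain) returning one attribute
    """
    labels = {r[target] for r in rows}
    if len(labels) == 1:                  # pure node -> leaf
        return labels.pop()
    if not attributes:                    # no tests left -> majority leaf
        return majority_class(rows, target)

    best = select_best(rows, attributes, target)
    tree = {best: {}}
    for value in {r[best] for r in rows}:              # one branch per value
        subset = [r for r in rows if r[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = build_tree(subset, remaining, target, select_best)
    return tree
```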

Attribute Selection Measure for ID3: Information Gain

• Select the attribute with the highest information gain.
• Let p_i be the probability that an arbitrary tuple in D belongs to class C_i (estimated by the class's relative frequency in D).
• Expected information (entropy) needed to classify a tuple in D:

  Info(D) = - Σ_{i=1..m} p_i log2(p_i)

• Information still needed to classify D after using A to split D into v partitions D_1, …, D_v:

  Info_A(D) = Σ_{j=1..v} (|D_j| / |D|) × Info(D_j)

• Information gained by branching on attribute A:

  Gain(A) = Info(D) - Info_A(D)
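
The three formulas translate directly into a few lines of Python. This is a sketch with illustrative function names (info, info_a, gain), assuming the data is a list of dicts keyed by attribute name, as in the DATA list shown after the training table.

```python
from collections import Counter
from math import log2

def info(labels):
    """Info(D): expected information (entropy) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_a(rows, attribute, target):
    """Info_A(D): weighted entropy after splitting on `attribute`."""
    n = len(rows)
    return sum(
        len(part) / n * info(part)
        for value in {r[attribute] for r in rows}
        for part in [[r[target] for r in rows if r[attribute] == value]]
    )

def gain(rows, attribute, target):
    """Gain(A) = Info(D) - Info_A(D)."""
    return info([r[target] for r in rows]) - info_a(rows, attribute, target)
```

With the DATA list defined earlier, gain(DATA, "age", "buys_computer") evaluates to about 0.247 (quoted as 0.246 in the worked example that follows, which subtracts rounded intermediate values).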
Attribute Selection: Information Gain

• Class P: buys_computer = "yes" (9 tuples); Class N: buys_computer = "no" (5 tuples)

  Info(D) = I(9,5) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940

• Splitting on age (counts taken from the training data above):

  age     p_i  n_i  I(p_i, n_i)
  <=30     2    3     0.971
  31…40    4    0     0
  >40      3    2     0.971

  Info_age(D) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2) = 0.694

  The term (5/14) I(2,3) means that "age <=30" covers 5 of the 14 samples, with 2 yes's and 3 no's.

• Hence Gain(age) = Info(D) - Info_age(D) = 0.246

• Similarly,
  Gain(income) = 0.029
  Gain(student) = 0.151
  Gain(credit_rating) = 0.048

  so age is chosen as the splitting attribute at the root.
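
The numbers above can be checked with a few lines of standard-library Python. This standalone sketch uses an illustrative helper I(p, n) for the entropy of a node with p positive and n negative tuples.

```python
from math import log2

def I(p, n):
    """I(p, n): entropy of a node with p positive and n negative tuples."""
    total = p + n
    return -sum(x / total * log2(x / total) for x in (p, n) if x)

info_D   = I(9, 5)
info_age = 5/14 * I(2, 3) + 4/14 * I(4, 0) + 5/14 * I(3, 2)

print(f"Info(D)     = {info_D:.3f}")             # -> 0.940
print(f"Info_age(D) = {info_age:.3f}")           # -> 0.694
print(f"Gain(age)   = {info_D - info_age:.3f}")  # -> 0.247
# The slide's 0.246 comes from subtracting the rounded 0.694 from the rounded 0.940.
```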
Gain Ratio for Attribute Selection (C4.5)

• The information gain measure is biased towards attributes with a large number of values.
• C4.5 (a successor of ID3) uses the gain ratio to overcome this problem:

  SplitInfo_A(D) = - Σ_{j=1..v} (|D_j| / |D|) log2(|D_j| / |D|)

  GainRatio(A) = Gain(A) / SplitInfo_A(D)

• Ex.: income splits D into 4 "high", 6 "medium" and 4 "low" tuples, so

  SplitInfo_income(D) = -(4/14) log2(4/14) - (6/14) log2(6/14) - (4/14) log2(4/14) = 1.557

  GainRatio(income) = 0.029 / 1.557 = 0.019

• The attribute with the maximum gain ratio is selected as the splitting attribute.
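
A quick check of the split information and gain ratio for income, as a standalone sketch. The partition sizes 4/6/4 are read off the training table above; split_info is an illustrative name.

```python
from math import log2

def split_info(partition_sizes):
    """SplitInfo_A(D) for a split producing partitions of the given sizes."""
    n = sum(partition_sizes)
    return -sum(s / n * log2(s / n) for s in partition_sizes)

# income splits the 14 tuples into 4 "high", 6 "medium" and 4 "low".
si = split_info([4, 6, 4])
print(f"{si:.3f}")          # -> 1.557
print(f"{0.029 / si:.3f}")  # -> 0.019  GainRatio(income)
```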
Comparing Attribute Selection Measures

• Both measures, in general, return good results, but:
  – Information gain: biased towards multivalued attributes.
  – Gain ratio: tends to prefer unbalanced splits in which one partition is much smaller than the others.

Thank you for your attention.

Any Questions?

