
Decision Trees

Decision trees are a representation for classification.


• The root is labelled by an attribute.
• Each edge is labelled by a value of that attribute.
• Each edge leads to a decision tree (a subtree) or a leaf.
• Each leaf is labelled by a class.

An example decision tree (??? marks leaves with no class assigned):

windy = true:
    outlook = sunny:    humidity = high: bad;   humidity = normal: good
    outlook = overcast: good
    outlook = rain:     bad
windy = false:
    temp = hot:  outlook = sunny: bad;  outlook = overcast: good;  outlook = rain: ???
    temp = mild: outlook = sunny: bad;  outlook = overcast: ???;   outlook = rain: good
    temp = cool: good
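A minimal sketch of this representation in Python (Node and classify are illustrative names, not from any particular library):

# A leaf stores only a class; an internal node also stores an attribute
# test and one subtree per attribute value.
class Node:
    def __init__(self, class_label, test=None):
        self.class_label = class_label   # most common class at this node
        self.test = test                 # attribute to test, or None for a leaf
        self.branches = {}               # attribute value -> child Node

def classify(node, example):
    # Follow the attribute tests from the root down to a leaf.
    while node.test is not None and example.get(node.test) in node.branches:
        node = node.branches[example[node.test]]
    return node.class_label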

TDIDT: Top-Down Induction of Decision Trees


Growth Phase: The tree is constructed top-down.
• Find the “best” attribute.
• Partition the examples based on that attribute’s values.
• Apply the method recursively to each partition.
Pruning Phase: The tree is pruned bottom-up.
• For each node, keep the subtree or replace it with a leaf.
• Choose by comparing estimated errors.
Algorithm for Growing Decision Trees
Grow-DT(examples)
1. N ← a new node
2. N.class ← most common class in examples
3. N.test ← best attribute (or test)
4. if N.test is not good enough
5.    then mark N as a leaf and return N
6. for each value v_j of N.test
7.    examples_j ← examples with N.test = v_j
8.    if examples_j is empty
9.       then N.branch_j ← a leaf labelled N.class
10.      else N.branch_j ← Grow-DT(examples_j)
11. return N
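A Python sketch of Grow-DT following the pseudocode above, reusing the Node class from the earlier sketch; choose_test is an assumed helper that returns the best attribute and its score (for example the information gain defined in the next section):

from collections import Counter

def grow_dt(examples, attributes, choose_test, min_gain=1e-6):
    # Steps 1-2: new node labelled with the most common class.
    node = Node(Counter(ex["class"] for ex in examples).most_common(1)[0][0])
    # Steps 3-5: pick the best test; keep the node as a leaf if it is not good enough.
    test, score = choose_test(examples, attributes)
    if test is None or score < min_gain:
        return node
    node.test = test
    # Steps 6-10: partition on the test's values and recurse. A value with no
    # examples simply gets no branch, so classify() falls back to node.class_label,
    # which matches step 9 of the pseudocode.
    for value in sorted({ex[test] for ex in examples}):
        subset = [ex for ex in examples if ex[test] == value]
        remaining = [a for a in attributes if a != test]
        node.branches[value] = grow_dt(subset, remaining, choose_test, min_gain)
    return node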

Measuring Information
Information gain is a popular way to select an attribute.
Let I(p, n) be the information in p positive examples and
n negative examples.
I(p, n) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}
Suppose there are p_i positive and n_i negative examples for the i-th value of
an attribute. For an attribute with two values, the information gain
G(p_1, n_1, p_2, n_2) can be defined as:

G(p_1, n_1, p_2, n_2) = I(p_1 + p_2, n_1 + n_2)
  - \frac{p_1 + n_1}{p_1 + n_1 + p_2 + n_2} I(p_1, n_1)
  - \frac{p_2 + n_2}{p_1 + n_1 + p_2 + n_2} I(p_2, n_2)
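A direct transcription of these formulas into Python (info and gain are illustrative names; gain is written for any number of attribute values, not just two):

import math

def info(p, n):
    # I(p, n): information in p positive and n negative examples.
    total = p + n
    return -sum((c / total) * math.log2(c / total) for c in (p, n) if c > 0)

def gain(branches):
    # Information gain of a split, given branch counts [(p1, n1), (p2, n2), ...].
    p = sum(pj for pj, nj in branches)
    n = sum(nj for pj, nj in branches)
    weighted = sum((pj + nj) / (p + n) * info(pj, nj) for pj, nj in branches)
    return info(p, n) - weighted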
This graph shows I(p, n) assuming p + n = 100.
[Plot: I(p, n = 100 - p) against p for p from 0 to 100; the value is 0 at p = 0 and p = 100 and reaches its maximum of 1 at p = 50.]

This graph shows G(p1, n1, p2, n2) assuming p1 + n1 = 50 and p2 + n2 = 50.

[Plot: surface of G(p1, n1 = 50 - p1, p2, n2 = 50 - p2) over p1 and p2 from 0 to 50, with values between 0 and 1.]
Example of Attribute Selection
All 14 examples: 9 good, 5 bad.

Outlook:   sunny 2 good / 3 bad;   overcast 4 good / 0 bad;   rain 3 good / 2 bad     G(Outlook)  ≈ 0.246
Temp:      cool 3 good / 1 bad;    mild 4 good / 2 bad;       hot 2 good / 2 bad      G(Temp)     ≈ 0.029
Humidity:  normal 6 good / 1 bad;  high 3 good / 4 bad                                G(Humidity) ≈ 0.152
Wind:      true 3 good / 3 bad;    false 6 good / 2 bad                               G(Wind)     ≈ 0.048

Outlook has the highest gain, so it becomes the root test.
The Overcast branch is pure (4 good, 0 bad).
Decision trees still need to be constructed for the Sunny and Rain branches.
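As a quick check, the gain sketch from earlier reproduces these values (up to rounding in the last digit) from the branch counts in the table above:

# Branch counts (good, bad) per attribute value, taken from the table above.
for name, branches in [("Outlook",  [(2, 3), (4, 0), (3, 2)]),
                       ("Temp",     [(3, 1), (4, 2), (2, 2)]),
                       ("Humidity", [(6, 1), (3, 4)]),
                       ("Wind",     [(3, 3), (6, 2)])]:
    print(f"G({name}) = {gain(branches):.3f}")   # Outlook comes out highest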
Comments on Growing Decision Trees
• Implicit preference for small trees.
• Handling numeric attributes: find the test “A ≤ x” that maximizes gain
  (see the sketch after this list).
• Handling missing values, two alternatives: treat “missing” as a separate
  attribute value, or weight the example across the branches.
• Addressing “costs” of attributes: attributes may differ in how costly their
  values are to obtain; include the cost in the attribute measure.
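A sketch of the threshold search for a numeric attribute, reusing the gain function sketched earlier; candidate thresholds are midpoints between consecutive distinct observed values, and examples are assumed to be dicts with a "class" key as in the Grow-DT sketch:

def best_numeric_split(examples, attribute, positive_class="good"):
    # Return (threshold, gain) for the best binary test "attribute <= threshold".
    values = sorted({ex[attribute] for ex in examples})
    best_threshold, best_gain = None, 0.0
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        parts = ([ex for ex in examples if ex[attribute] <= threshold],
                 [ex for ex in examples if ex[attribute] > threshold])
        counts = [(sum(ex["class"] == positive_class for ex in part),
                   sum(ex["class"] != positive_class for ex in part))
                  for part in parts]
        g = gain(counts)
        if g > best_gain:
            best_threshold, best_gain = threshold, g
    return best_threshold, best_gain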

There are alternative attribute measures.


• Information Gain Ratio (for > 2 branches):
  G(A) / I(p_1 + n_1, \ldots, p_v + n_v)

• Gini Index (use this in place of I; see the sketch after this list):
  \mathrm{Gini}(p, n) = 1 - \left(\frac{p}{p+n}\right)^2 - \left(\frac{n}{p+n}\right)^2

• Chi-Squared Statistic:
  \chi^2 = \sum_{j=1}^{v} \left[ \frac{(p_j - p\,s_j)^2}{p\,s_j} + \frac{(n_j - n\,s_j)^2}{n\,s_j} \right],
  \quad \text{where } s_j = \frac{p_j + n_j}{p + n}
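For instance, swapping the Gini index in for I only needs a drop-in replacement like this sketch:

def gini(p, n):
    # Gini(p, n): impurity of p positive and n negative examples.
    # Can be passed to gain() in place of info().
    total = p + n
    return 1.0 - (p / total) ** 2 - (n / total) ** 2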
Overfitting
A hypothesis h overfits the training data if there
is a hypothesis h′ that is worse on the training
data, but better over the whole distribution.
Reasons for overfitting:
• Noise
• Coincidence
• Lack of Data
• Boundary Approximation

This decision tree has no errors on the examples.


A Decision Tree for Glass2
[Figure: decision regions plotted as Aluminum (0 to 2.5) against Refractive Index (1.51 to 1.535).]
In this region (shown zoomed in below), note the boxes containing only one example.
A Decision Tree for Glass2
[Figure: zoomed view of the decision regions, Refractive Index 1.5155 to 1.5195 and Aluminum 1 to 1.8.]

Avoiding Overfitting by Pruning


For decision trees, one tries to avoid overfitting by accepting small increases
in training error in exchange for smaller trees.
Preferring smaller trees is justified by Occam’s Razor.
Modifying trees to make them smaller is called pruning.
• Prepruning: avoid creating subtrees in the first place, based on the number
  of examples or on attribute relevance.
• Postpruning: grow a (possibly overfitted) DT, then substitute subtrees with
  leaves whenever the estimated error is reduced.
Prepruning Example (c4.5)
Prepruned Decision Tree for Glass2
[Figure: decision regions plotted as Aluminum (0 to 2.5) against Refractive Index (1.51 to 1.535).]

Unpruned Example (c4.5 -m 1)


Unpruned Decision Tree for Iris2
[Figure: decision regions plotted as Petal Width (0 to 2.5) against Petal Length (1 to 7).]
Pruned Example (c4.5 -m 1)
Pruned Decision Tree for Iris2
[Figure: decision regions plotted as Petal Width (0 to 2.5) against Petal Length (1 to 7).]

Postpruning Algorithm
Prune-DT(N: node, examples)
1. leaferr ← number of examples whose class ≠ N.class
2. revise leaferr upward if examples were the training set
3. if N is a leaf then return leaferr
4. treeerr ← 0
5. for each value v_j of N.test
6.    examples_j ← examples with N.test = v_j
7.    suberr ← Prune-DT(N.branch_j, examples_j)
8.    treeerr ← treeerr + suberr
9. if leaferr ≤ treeerr
10.   then make N a leaf and return leaferr
11.   else return treeerr
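A Python sketch of Prune-DT over the Node representation used earlier. The pessimistic upward revision of step 2 is omitted here, which amounts to assuming the error is estimated on a separate set of examples rather than on the training set:

def prune_dt(node, examples):
    # Returns the estimated error of the (possibly pruned) subtree on `examples`.
    leaf_err = sum(ex["class"] != node.class_label for ex in examples)
    if node.test is None:                      # N is already a leaf
        return leaf_err
    tree_err = 0
    for value, child in node.branches.items():
        subset = [ex for ex in examples if ex.get(node.test) == value]
        tree_err += prune_dt(child, subset)    # prune each subtree bottom-up
    if leaf_err <= tree_err:
        node.test, node.branches = None, {}    # make N a leaf
        return leaf_err
    return tree_err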
Comments on Pruning
The training and validation set approach is:
• Remove the “validation” examples from the training examples.
• Grow the decision tree using the training examples.
• Prune the decision tree using the validation set (a sketch follows below).
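A minimal sketch of that workflow, wiring together the grow_dt and prune_dt sketches above (the 30% validation fraction and the choose_test argument are illustrative assumptions):

import random

def grow_and_prune(examples, attributes, choose_test, validation_fraction=0.3):
    shuffled = random.sample(examples, len(examples))   # shuffled copy
    cut = int(len(shuffled) * validation_fraction)
    validation, training = shuffled[:cut], shuffled[cut:]
    tree = grow_dt(training, attributes, choose_test)   # grow on training examples
    prune_dt(tree, validation)                          # prune using the validation set
    return tree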
Subtree raising is replacing a tree with one of
its subtrees.
Rule post-pruning as described in the book is
performed by the C4.5rules program, not C4.5.
