[Figure: candidate one-level decision trees for the weather data, splitting on windy (true/false), outlook (sunny/overcast/rain), temp (hot/mild/cool), and humidity (high/normal); leaves are labeled good, bad, or ??? where the branch is still mixed.]
Measuring Information
Information gain is a popular way to select an attribute.
Let I(p, n) be the information in p positive examples and
n negative examples.
I(p, n) = -\frac{p}{p+n} \log_2 \frac{p}{p+n} - \frac{n}{p+n} \log_2 \frac{n}{p+n}
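As a quick sanity check, here is a minimal Python sketch of this measure (the helper name info is ours):

    from math import log2

    def info(p, n):
        # Information, in bits, in p positive and n negative examples.
        # A zero count contributes nothing (0 * log 0 is taken as 0).
        total = p + n
        return sum(-c / total * log2(c / total) for c in (p, n) if c > 0)

    print(info(50, 50))   # 1.0  (even split: maximum information)
    print(info(100, 0))   # 0.0  (pure sample)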
Suppose there are p_i positive and n_i negative examples
for the i-th value of an attribute. Then, for a two-valued
attribute, the information gain G(p_1, n_1, p_2, n_2) can be
defined as:
G(p_1, n_1, p_2, n_2) = I(p_1 + p_2, n_1 + n_2)
    - \frac{p_1 + n_1}{p_1 + n_1 + p_2 + n_2} I(p_1, n_1)
    - \frac{p_2 + n_2}{p_1 + n_1 + p_2 + n_2} I(p_2, n_2)
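A sketch of this two-valued gain in the same style (gain2 is our name; info is redefined here so the snippet runs on its own):

    from math import log2

    def info(p, n):                      # as in the sketch above
        t = p + n
        return sum(-c / t * log2(c / t) for c in (p, n) if c > 0)

    def gain2(p1, n1, p2, n2):
        # Information gain of a two-valued attribute, per the formula above.
        t = p1 + n1 + p2 + n2
        return (info(p1 + p2, n1 + n2)
                - (p1 + n1) / t * info(p1, n1)
                - (p2 + n2) / t * info(p2, n2))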
This graph shows I(p, n) assuming p + n = 100.
[Figure: I(p, n = 100 - p) plotted for p from 0 to 100; the curve rises from 0, peaks at 1 when p = 50, and falls back to 0.]
[Figure: 3-D surface plot of the information measure over p1 and p2, each ranging from 0 to 50.]
Example of Attribute Selection
Root: 9 good, 5 bad.

Split on Outlook:             Split on Temp:
  Sunny:    2 good, 3 bad       Cool: 3 good, 1 bad
  Overcast: 4 good, 0 bad       Mild: 4 good, 2 bad
  Rain:     3 good, 2 bad       Hot:  2 good, 2 bad

G(Outlook) ≈ 0.246            G(Temp) ≈ 0.029
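These numbers can be reproduced by generalizing the gain above to any number of attribute values (a sketch; splits holds one (p_j, n_j) pair per value):

    from math import log2

    def info(p, n):
        t = p + n
        return sum(-c / t * log2(c / t) for c in (p, n) if c > 0)

    def gain(splits):
        # splits: one (p_j, n_j) pair per attribute value.
        p = sum(pj for pj, _ in splits)
        n = sum(nj for _, nj in splits)
        return info(p, n) - sum((pj + nj) / (p + n) * info(pj, nj)
                                for pj, nj in splits)

    print(gain([(2, 3), (4, 0), (3, 2)]))   # Outlook: ~0.246
    print(gain([(3, 1), (4, 2), (2, 2)]))   # Temp:    ~0.029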
• Chi-Squared Statistic

\chi^2 = \sum_{j=1}^{v} \left[ \frac{(p_j - p s_j)^2}{p s_j} + \frac{(n_j - n s_j)^2}{n s_j} \right],
    \quad \text{where } s_j = \frac{p_j + n_j}{p + n}
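A minimal sketch of this statistic, with splits in the same (p_j, n_j) form as the gain sketch above:

    def chi_squared(splits):
        # splits: one (p_j, n_j) pair per attribute value.
        p = sum(pj for pj, _ in splits)
        n = sum(nj for _, nj in splits)
        total = 0.0
        for pj, nj in splits:
            sj = (pj + nj) / (p + n)      # branch j's share of the examples
            total += ((pj - p * sj) ** 2 / (p * sj)
                      + (nj - n * sj) ** 2 / (n * sj))
        return total

    print(chi_squared([(2, 3), (4, 0), (3, 2)]))   # Outlook split: ~3.55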
Overfitting
A hypothesis h overfits the training data if there
is a hypothesis h′ that is worse on the training
data but better over the whole distribution.
Reasons for overfitting:
• Noise
• Coincidence
• Lack of Data
• Boundary Approximation
[Figure: decision-tree partition of the Glass2 data; axes Aluminum (0 to 1.5) and Refractive Index (1.51 to 1.535).]
In this region, note the boxes with only one example.
A Decision Tree for Glass2
[Figure: zoomed view of the Glass2 partition; axes Aluminum (1 to 1.8) and Refractive Index (1.5155 to 1.5195).]
[Figure: decision-tree partition of the Iris2 data; axes Petal Length (1 to 7) and Petal Width (0 to 2).]
Pruned Example (c4.5 -m 1)
Pruned Decision Tree for Iris2
[Figure: pruned decision-tree partition of the Iris2 data; axes Petal Length (1 to 7) and Petal Width (0 to 2.5).]
Postpruning Algorithm
Prune-DT(N : node, examples)
1. leaferr ← number of examples whose class ≠ N.class
2. revise leaferr upward if the examples are the training set
3. if N is a leaf then return leaferr
4. treeerr ← 0
5. for each value vj of N.test
6. examplesj ← examples with N.test = vj
7. suberr ← Prune-DT(N.branchj , examplesj )
8. treeerr ← treeerr + suberr
9. if leaferr ≤ treeerr
10. then make N a leaf and return leaferr
11. else return treeerr
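A minimal Python rendering of Prune-DT, under assumed representations (a Node with class_, test, and branches fields; examples as (attributes, label) pairs). The upward revision of step 2 (e.g. a pessimistic error estimate) is left out:

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        class_: str                  # majority class at this node
        test: str = None             # attribute tested; None at a leaf
        branches: dict = field(default_factory=dict)  # value -> child Node

    def prune_dt(node, examples):
        # Returns the error count of the (possibly pruned) subtree at node.
        leaf_err = sum(1 for attrs, label in examples if label != node.class_)
        # (Step 2: revise leaf_err upward here if examples is the training set.)
        if not node.branches:
            return leaf_err
        tree_err = 0
        for value, child in node.branches.items():
            subset = [(a, l) for a, l in examples if a[node.test] == value]
            tree_err += prune_dt(child, subset)
        if leaf_err <= tree_err:     # the leaf does at least as well: prune
            node.branches = {}
            node.test = None
            return leaf_err
        return tree_err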
Comments on Pruning
The training and validation set approach (sketched below):
• Remove the "validation" examples from the training examples.
• Grow the decision tree using the training examples.
• Prune the decision tree using the validation set.
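For comparison, a sketch of the same split-grow-prune recipe with scikit-learn, which implements cost-complexity pruning rather than the reduced-error scheme above; the pruning strength is chosen on the held-out validation set:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3,
                                                random_state=0)

    # Grow on the training examples, then try each candidate pruning level
    # and keep the tree that scores best on the validation set.
    path = DecisionTreeClassifier(random_state=0) \
        .cost_complexity_pruning_path(X_tr, y_tr)
    best = max(
        (DecisionTreeClassifier(ccp_alpha=a, random_state=0).fit(X_tr, y_tr)
         for a in path.ccp_alphas),
        key=lambda t: t.score(X_val, y_val),
    )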
Subtree raising replaces a subtree with one of its
own subtrees.
Rule post-pruning as described in the book is
performed by the C4.5rules program, not C4.5.