Tree-Based Methods
Principle:
1. Partition the feature space, X, into a set of rectangles
   (homogeneous regions), R_m, by recursive binary partitioning
2. Fit a simple model for Y (e.g. a constant) in each
   rectangle
• Disadvantage: instability
– An error made at an upper level is propagated to the lower
levels
• Criterion: sum of squared errors, $\sum_i (y_i - f(x_i))^2$
• Find the splitting variable j and split point s solving
  $$\min_{j,s}\Big[\min_{c_1}\sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 \;+\; \min_{c_2}\sum_{x_i \in R_2(j,s)} (y_i - c_2)^2\Big]$$
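The greedy split search above can be sketched as follows (a minimal illustration, not a production implementation; the function name `best_split` is ours). For a fixed (j, s), the optimal constants are the region means, so only the split pair needs to be searched:

```python
import numpy as np

def best_split(X, y):
    """Greedy search for the (j, s) minimizing the summed squared error
    of constant fits in R1 = {x : x_j <= s} and R2 = {x : x_j > s}."""
    n, p = X.shape
    best = (None, None, np.inf)  # (j, s, sse)
    for j in range(p):
        for s in np.unique(X[:, j])[:-1]:  # candidate split points
            left = X[:, j] <= s
            y1, y2 = y[left], y[~left]
            # optimal constants are the region means: c1 = mean(y1), c2 = mean(y2)
            sse = ((y1 - y1.mean()) ** 2).sum() + ((y2 - y2.mean()) ** 2).sum()
            if sse < best[2]:
                best = (j, s, sse)
    return best

# toy usage: a single feature with an obvious break between 0.3 and 0.8
X = np.array([[0.1], [0.2], [0.3], [0.8], [0.9]])
y = np.array([1.0, 1.1, 0.9, 5.0, 5.2])
j, s, sse = best_split(X, y)  # splits at x <= 0.3
```

Repeating this search inside each resulting region gives the recursive binary partition.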
How large should we grow the tree?
• Strategies:
– Cost-complexity criterion:
  $$C_\alpha(T) = \sum_{m=1}^{|T|} N_m Q_m(T) + \alpha |T|$$
  where the first term is the cost (sum of squared errors) and
  $\alpha |T|$ is the penalty on the complexity/size of the tree
– For each α, there is a unique smallest subtree Tα
– To find Tα: weakest-link pruning
  • Repeatedly collapse the internal node that adds the smallest error
  → a sequence of subtrees containing Tα
• Estimation of α: minimize the cross-validated sum of squared errors (p. 214)
• Choose from this tree sequence: $T_{\hat\alpha}$
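A hedged illustration of this procedure using scikit-learn, which implements weakest-link (minimal cost-complexity) pruning directly; the synthetic data and the 5-fold CV setup are our own choices, not from the slides:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

# synthetic regression data (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))
y = np.sin(4 * X[:, 0]) + rng.normal(scale=0.1, size=200)

# weakest-link pruning path: the effective alphas and their subtrees
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)

# estimate alpha by minimizing the cross-validated squared error
scores = [cross_val_score(DecisionTreeRegressor(ccp_alpha=a, random_state=0),
                          X, y, cv=5, scoring="neg_mean_squared_error").mean()
          for a in path.ccp_alphas]
best_alpha = path.ccp_alphas[int(np.argmax(scores))]

# the chosen subtree T_alpha-hat
tree_hat = DecisionTreeRegressor(ccp_alpha=best_alpha, random_state=0).fit(X, y)
```

Each value in `path.ccp_alphas` corresponds to collapsing the current weakest link, so the loop scans exactly the nested sequence of subtrees described above.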
9.2.3. Classification Trees
• Y = { 1, 2 …, k, …,K}
• Classify the observations in node m to the majority class in the
  node:
  $$\hat{k}(m) = \arg\max_k \hat{p}_{mk}$$
  with $\hat{p}_{mk}$ := proportion of class-k observations in node m
• Pruning: minimize $C_\alpha(T) = \sum_{m=1}^{|T|} N_m Q_m(T) + \alpha |T|$
• Measures of node impurity $Q_m(T)$:
  – Gini index: $\sum_{k \neq k'} \hat{p}_{mk}\,\hat{p}_{mk'} = \sum_{k=1}^{K} \hat{p}_{mk}(1 - \hat{p}_{mk})$
  – Cross-entropy: $-\sum_{k=1}^{K} \hat{p}_{mk} \log \hat{p}_{mk}$
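Both impurity measures can be computed directly from the class proportions of a node; a minimal sketch (the helper `node_impurities` is ours, assuming integer class labels 0..K-1):

```python
import numpy as np

def node_impurities(y_node, K):
    """Gini index and cross-entropy of a node, from proportions p_mk."""
    p = np.bincount(y_node, minlength=K) / len(y_node)
    gini = np.sum(p * (1 - p))   # = sum_{k != k'} p_mk * p_mk'
    nz = p[p > 0]                # drop empty classes to avoid log(0)
    entropy = -np.sum(nz * np.log(nz))
    return gini, entropy

# two-class node with p = 0.5: Gini = 2*0.5*0.5 = 0.5, entropy = log 2
gini, ent = node_impurities(np.array([0, 1, 0, 1]), K=2)
```

With two classes this reproduces the example above: Gini $= 2p(1-p)$ and cross-entropy $= -p\log p - (1-p)\log(1-p)$, both maximal at $p = 1/2$.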
• Ex: 2 classes of Y, p = the proportion in the second class:
  Gini $= 2p(1-p)$; cross-entropy $= -p\log p - (1-p)\log(1-p)$
• Categorical Predictors, X:
– Problem:
Consider splits of node t into t_L and t_R based on an
unordered categorical predictor x which has q
possible values: $2^{q-1} - 1$ possible splits!
– Solution:
– Order the predictor classes by increasing mean of the
outcome Y.
– Treat the categorical predictor as if it were ordered
→ gives the optimal split, in terms of squared error
or Gini index, among all $2^{q-1} - 1$ possible splits
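A minimal sketch of this ordering trick (the helper `order_categories` is our name): replace each category by the rank of its outcome mean, after which ordinary `x <= s` splits on the encoded values reach every subset split that matters:

```python
import numpy as np

def order_categories(x_cat, y):
    """Order the q category levels by increasing mean of the outcome y,
    then encode each observation by the rank of its category, so the
    predictor can be split as if it were ordered."""
    levels = np.unique(x_cat)
    means = np.array([y[x_cat == lv].mean() for lv in levels])
    order = levels[np.argsort(means)]            # levels sorted by mean(y)
    rank = {lv: r for r, lv in enumerate(order)}
    return np.array([rank[lv] for lv in x_cat]), order

# toy example: three levels whose outcome means satisfy b < a < c
x = np.array(["a", "b", "c", "a", "b", "c"])
y = np.array([2.0, 1.0, 5.0, 2.2, 0.8, 5.4])
encoded, order = order_categories(x, y)  # order is [b, a, c]
```

Searching thresholds on `encoded` now costs only q - 1 comparisons instead of enumerating all $2^{q-1} - 1$ subset splits.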
• Classification: The Loss Matrix
– The consequences of misclassification depend on the class
– Define a loss function L → K × K loss matrix with entries $L_{kk'}$,
  the loss of classifying a class-k observation as class k'
– Modify the Gini index as $\sum_{k \neq k'} L_{kk'}\,\hat{p}_{mk}\,\hat{p}_{mk'}$
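The loss-weighted Gini index is a direct double sum over class pairs; a small sketch (the function `weighted_gini` is our name), with a sanity check that the 0-1 loss recovers the plain Gini index:

```python
import numpy as np

def weighted_gini(p, L):
    """Gini index modified by a K x K loss matrix L, where L[k, k'] is the
    loss of classifying a class-k observation as k':
    sum over k != k' of L[k, k'] * p_k * p_k'."""
    K = len(p)
    total = 0.0
    for k in range(K):
        for kp in range(K):
            if k != kp:
                total += L[k, kp] * p[k] * p[kp]
    return total

# with 0-1 loss (all off-diagonal entries 1) this is the plain Gini index:
# sum_{k != k'} p_k p_k' = 1 - sum_k p_k^2
p = np.array([0.5, 0.3, 0.2])
L01 = np.ones((3, 3)) - np.eye(3)
g = weighted_gini(p, L01)  # 1 - (0.25 + 0.09 + 0.04) = 0.62
```

Raising $L_{kk'}$ for a costly confusion makes splits that separate classes k and k' look more attractive to the tree grower.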
• Instability of Trees
• Other trees:
  – C5.0: after growing the tree, conditions are dropped from the rules
    without changing the subset of observations covered