
Decision Tree: Introduction

A decision tree is a powerful method for classification and prediction and for facilitating decision making in sequential decision problems. This entry considers three types of decision trees in some detail. The first is an algorithm for a recommended course of action based on a sequence of information nodes; the second is classification and regression trees; and the third is survival trees.

Decision Trees

Often the medical decision maker will be faced with a sequential decision problem involving decisions that lead to different outcomes depending on chance. If the decision process involves many sequential decisions, then the decision problem becomes difficult to visualize and to implement. Decision trees are indispensable graphical tools in such settings. They allow for intuitive understanding of the problem and can aid in decision making.

A decision tree is a graphical model describing decisions and their possible outcomes. Decision trees consist of three types of nodes (see Figure 1):

1. Decision node: Often represented by squares showing decisions that can be made. Lines emanating from a square show all distinct options available at a node.

2. Chance node: Often represented by circles showing chance outcomes. Chance outcomes are events that can occur but are outside the ability of the decision maker to control.

3. Terminal node: Often represented by triangles or by lines having no further decision nodes or chance nodes. Terminal nodes depict the final outcomes of the decision-making process.

[Figure 1 is not reproduced here; the original shows a tree with nodes labeled 1, 2, A, B, and C leading to Outcomes 1 through 7.]

Figure 1 Decision trees are graphical models for describing sequential decision problems.

For example, a hospital performing esophagectomies (surgical removal of all or part of the esophagus) for patients with esophageal cancer wishes to define a protocol for what constitutes an adequate lymphadenectomy in terms of the total number of regional lymph nodes removed at surgery. The hospital believes that such a protocol should be guided by pathology (available to the surgeon prior to surgery). This information should include cell type (squamous or adenocarcinoma); histopathologic grade (a crude indicator of tumor biology); and depth of tumor invasion (pT classification). It is believed that the number of nodes to be removed should increase with more deeply invasive tumors when the histopathologic grade is poorly differentiated and that the number of nodes differs by cell type.

The decision tree in this case is composed predominantly of chance outcomes, these being the results from pathology (cell type, grade, and tumor depth). The surgeon's only decision is whether to perform the esophagectomy. If the decision is made to operate, then the surgeon follows this decision line on the graph, moving from left to right, using pathology data to eventually determine the terminal node. The terminal node, or final outcome, is the number of lymph nodes to be removed.

Decision trees can in some instances be used to make optimal decisions. To do so, the terminal nodes in the decision tree must be assigned terminal values (sometimes called payoff values or endpoint values). For example, one approach is to assign values to each decision branch and chance branch and to define a terminal value as the sum of the branch values leading to it. Once terminal values are assigned, tree values are calculated by folding back from the terminal values, moving from right to left. To calculate the value of a chance node, multiply the value of each chance outcome by its probability; the total of these products is the value of the chance node. To determine the value of a decision node, the cost of each option along each decision line is subtracted from the value already calculated; this value represents the benefit of the decision.
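This fold-back ("rollback") calculation is easy to express in code. The following sketch is illustrative only: the dictionary-based node representation and all values, probabilities, and costs are hypothetical rather than taken from the entry.

# Backward induction ("rollback") over a decision tree.
def tree_value(node):
    """Value of a node, computed by folding back from the terminal values."""
    if node["type"] == "terminal":
        return node["value"]
    if node["type"] == "chance":
        # Chance node: probability-weighted total of the branch values.
        return sum(p * tree_value(child) for p, child in node["branches"])
    if node["type"] == "decision":
        # Decision node: best option value after subtracting the option's cost.
        return max(tree_value(child) - cost for cost, child in node["options"])
    raise ValueError("unknown node type")

# Tiny hypothetical tree: operate (cost 10) leads to a chance node;
# do not operate (cost 0) leads directly to a terminal value of 0.
tree = {"type": "decision", "options": [
    (10, {"type": "chance", "branches": [
        (0.7, {"type": "terminal", "value": 50}),
        (0.3, {"type": "terminal", "value": 20})]}),
    (0, {"type": "terminal", "value": 0})]}

print(tree_value(tree))  # 0.7 * 50 + 0.3 * 20 - 10 = 31.0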

Classification Trees

In many medical settings, the medical decision maker may not know what the decision rule is. Rather, he or she would like to discover the decision rule by using data. In such settings, decision trees are often referred to as classification trees. Classification trees apply to data where the y-value (outcome) is a classification label, such as the disease status of a patient, and the medical decision maker would like to construct a decision rule that predicts the outcome using x-variables (independent variables) available in the data. Because the data set available is just one sample of the underlying population, one would like a decision rule that is accurate not only for the data at hand but over external data as well (i.e., the decision rule should have good prediction performance). At the same time, it is helpful to have a decision rule that is understandable. That is, it should not be so complex that the decision maker is left with a black box. Decision trees offer a reasonable way to resolve these two conflicting needs.


Background

The use of tree methods for classification has a history that dates back at least 40 years. Much of the early work emanated from the social sciences, starting in the late 1960s, and computational algorithms for automatic construction of classification trees began as early as the 1970s. Algorithms such as the THAID program developed at the Institute for Social Research, University of Michigan, laid the groundwork for recursive partitioning, the predominant algorithm used by modern-day tree classifiers, such as Classification and Regression Tree (CART).

Classification trees are decision trees derived using recursive partitioning algorithms that classify each incoming x-data point (case) into one of the class labels for the outcome. A classification tree consists of three types of nodes (see Figure 2):

1. Root node: The top node of the tree, comprising all the data.

2. Splitting node: A node that assigns data to a subgroup.

3. Terminal node: Final decision (outcome).

Figure 2 is a CART tree constructed using the breast cancer databases obtained from the University of Wisconsin Hospitals, Madison (available from http://archive.ics.uci.edu/ml). In total, the data comprise 699 patients classified as having either benign or malignant breast cancer. The goal here is to predict true disease status based on nine different variables collected from biopsy.

[Figure 2 is not reproduced here; the original shows the CART tree for these data, beginning from a root node of 458 benign and 241 malignant cases, with splits on the variables described in the text and barplots of class frequencies at each node.]

Note: Light-shaded and dark-shaded barplots show the frequency of data at each node for the two classes: benign (light shaded) and malignant (dark shaded). Terminal nodes are classified by majority voting (i.e., assignment is made to the class label having the largest frequency). Labels given above a splitting node show how data are split depending on a given variable. In some cases there are missing data, which are indicated by a question mark.

The first split of the tree (at the root node) is on the variable "unsize," measuring uniformity of cell size. All patients having values less than 2.5 for this variable are assigned to the left node (the left daughter node); otherwise they are assigned to the right node (the right daughter node). The left and right daughter nodes are then split (in this case, on the variable "unshape" for the right daughter node and on the variable "nuclei" for the left daughter node), and patients are assigned to subgroups defined by these splits. These nodes are then split, and the process is repeated recursively in a procedure called recursive partitioning. When the tree construction is completed, terminal nodes are assigned class labels by majority voting (the class label with the largest frequency). Each patient in a given terminal node is assigned the predicted class label for that terminal node. For example, the leftmost terminal node in Figure 2 is assigned the class label "benign" because 416 of the 421 cases in the node have that label. Looking at Figure 2, one can see that voting heavily favors one class over the other for all terminal nodes, showing that the decision tree is accurately classifying the data. However, it is important to assess accuracy using external data sets or by using cross-validation as well.
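As a concrete, hedged illustration of this workflow, the sketch below fits a CART-style tree with scikit-learn. Note that scikit-learn's bundled Wisconsin breast cancer data are the 569-case diagnostic version, not the 699-case biopsy data of Figure 2, so the variables and splits will differ; the procedure, however, is the same.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)
names = list(load_breast_cancer().feature_names)

# Grow a small tree by recursive partitioning (gini impurity splits).
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X, y)

# Print the tree as a sequence of if/then splitting rules.
print(export_text(tree, feature_names=names))

# As cautioned above, assess accuracy by cross-validation rather than
# on the training data alone.
print("10-fold CV accuracy:", cross_val_score(tree, X, y, cv=10).mean())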


In general, recursive partitioning works as follows. The classification tree is grown starting at the root node, which is the top node of the tree, comprising all the data. The root node is split into two daughter nodes: a left and a right daughter node. In turn, each daughter node is split, with each split giving rise to left and right daughters. The process is repeated in a recursive fashion until the tree cannot be partitioned further due to lack of data or some stopping criterion is reached, resulting in a collection of terminal nodes. The terminal nodes represent a partition of the predictor space into a collection of rectangular regions that do not overlap. It should be noted, though, that this partition may be quite different from what might be found by exhaustively searching over all partitions corresponding to the same number of terminal nodes. However, for many problems, exhaustive searches for globally optimal partitions (in the sense of producing the most homogeneous leaves) are not computationally feasible, and recursive partitioning represents an effective way of undertaking this task by using a one-step procedure instead.

A classification tree as described above is referred to as a binary recursively partitioned tree. Another type of recursively partitioned tree is the multiway recursively partitioned tree. Rather than splitting the parent node into two daughter nodes, such trees use multiway splits that define multiple daughter nodes. However, there is little evidence that multiway splits produce better classifiers, and for this reason, as well as for their simplicity, binary recursively partitioned trees are often favored.

Splitting Rules

The success of CART as a classifier can be largely attributed to the manner in which splits are formed in the tree construction. To define a good split, CART uses an impurity function to measure the decrease in tree impurity for a split. The purity of a tree is a measure of how similar observations in the leaves are to one another. The best split for a node is found by searching over all possible variables and all possible split values and choosing the variable and split that reduce impurity the most. Reduction of tree impurity is a good principle because it encourages the tree to push dissimilar cases apart. Eventually, as the number of nodes increases and cases are subdivided into daughter nodes, each node in the tree becomes homogeneous and is populated by cases with similar outcomes (recall Figure 2).

There are several impurity functions in use. These include the twoing criterion, the entropy criterion, and the gini index. The gini index is arguably the most popular. When the outcome has two class labels (the so-called two-class problem), the gini index corresponds to the variance of the outcome if the class labels are recoded as 0 and 1.
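The gini calculation and the exhaustive split search are simple enough to sketch directly; the code below uses simulated data and searches a single variable (a full CART implementation repeats the search over every variable).

import numpy as np

def gini(y):
    """Gini index of a label vector; for 0/1 labels this is 2p(1 - p),
    proportional to the variance p(1 - p) of the recoded outcome."""
    p = np.bincount(y) / len(y)
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    """Return the split value of x giving the largest impurity decrease,
    weighting each daughter node by its share of the cases."""
    best_s, best_dec = None, -np.inf
    for s in np.unique(x)[:-1]:  # skip the maximum so no daughter is empty
        left, right = y[x <= s], y[x > s]
        dec = gini(y) - (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if dec > best_dec:
            best_s, best_dec = s, dec
    return best_s, best_dec

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = (x + rng.normal(scale=0.5, size=200) > 0).astype(int)  # labels 0/1
print(best_split(x, y))  # the chosen split should fall near x = 0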

Stopping Rules

The size of the tree is crucial to the accuracy of the classifier. If the tree is too shallow, terminal nodes will not be pure (outcomes will be heterogeneous), and the accuracy of the classifier will suffer. If the tree is too deep (too many splits), then the number of cases within a terminal node will be small, and the predicted class label will have high variance—again undermining the accuracy of the classifier.

To strike a proper balance, pruning is employed in methodologies such as CART. To determine the optimal size of a tree, the tree is grown to full size (i.e., until all data are spent) and then pruned back. The optimal size is determined using a complexity measure that balances the accuracy of the tree, as measured by cost complexity, against the size of the tree.
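A hedged sketch of this grow-then-prune strategy: scikit-learn exposes CART-style minimal cost-complexity pruning through the ccp_alpha parameter, and cross-validation can choose the complexity setting. The data set is again scikit-learn's bundled version, used purely for illustration.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Grow the tree to full size, then recover the candidate pruning levels.
full = DecisionTreeClassifier(random_state=0).fit(X, y)
alphas = full.cost_complexity_pruning_path(X, y).ccp_alphas

# Score each pruned tree by cross-validation; keep the best alpha.
scores = [cross_val_score(DecisionTreeClassifier(random_state=0, ccp_alpha=a),
                          X, y, cv=10).mean() for a in alphas]
best_score, best_alpha = max(zip(scores, alphas))
print(f"best CV accuracy {best_score:.3f} at alpha {best_alpha:.5f}")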

Regression Trees

Decision trees can also be used to analyze data when the y-outcome is a continuous measurement (such as age, blood pressure, ejection fraction for the heart, etc.). Such trees are called regression trees. Regression trees can be constructed using recursive partitioning similar to classification trees. Impurity is measured using mean-square error. The terminal node value in a regression tree is defined as the mean value (average) of the outcomes for patients within the terminal node. This is the predicted value for the outcome.
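The regression-tree analogues (mean-square-error impurity and mean-valued terminal nodes) can be sketched in a few lines; the blood-pressure data below are simulated for illustration, and the split point is arbitrary.

import numpy as np

def mse(y):
    """Within-node impurity: mean squared deviation from the node mean."""
    return np.mean((y - np.mean(y)) ** 2)

rng = np.random.default_rng(1)
age = rng.uniform(30, 80, size=300)
bp = 90 + 0.6 * age + rng.normal(scale=8, size=300)  # simulated outcome

# Impurity before a candidate split on age versus the weighted impurity after.
left, right = bp[age <= 55], bp[age > 55]
after = (len(left) * mse(left) + len(right) * mse(right)) / len(bp)
print("impurity:", mse(bp), "->", after)

# Terminal node value = mean outcome of the patients in the node.
print("predictions:", left.mean(), right.mean())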

Survival Trees

Time-to-event data are often encountered in the medical sciences. For such data, the analysis focuses on understanding how time-to-event varies in terms of different variables that might be collected for a patient. Time-to-event can be time to death from a certain disease, time until recurrence (for cancer), time until first occurrence of a symptom, or simple all-cause mortality.

The analysis of time-to-event data is often complicated by the presence of censoring. Generally speaking, this means that the event times for some individuals in a study are not observed exactly and are only known to fall within certain time intervals. Right censoring is one of the most common types of censoring encountered. It occurs when the event of interest is observed only if it occurs prior to some prespecified time. For example, a patient might be monitored for 2 weeks without occurrence of a symptom and then released from a hospital. Such a patient is said to be right censored because the time-to-event must exceed 2 weeks, but the exact event time is unknown. Another example of right censoring occurs when patients enter a study at different times and the study is predetermined to end by a certain time. Then, all patients who do not experience an event within the study period are right censored.

Decision trees can be used to analyze right-censored survival data. Such trees are referred to as survival trees. Survival trees can be constructed using recursive partitioning. The measure of impurity plays a key role, as in CART, and it can be defined in many ways. One popular approach is to …

[Figure 3 is not reproduced here; the original shows a survival tree splitting on NKI70 (p = .007) and TSP (p = .298), with Kaplan-Meier curves (survival probability against time, 0 to 20) drawn beneath each terminal node.]

Note: The splitting variables NKI70 and TSP are gene signatures. For example, the extreme right terminal node (Node 5) corresponds to presence of both the NKI70 and TSP gene signatures. Underneath each terminal node are Kaplan-Meier survival curves for patients within that node.


As in CART, growing a tree by reducing impurity ensures that terminal nodes are populated by individuals with similar behavior. In the case of a survival tree, terminal nodes are composed of patients with similar survival. The terminal node value in a survival tree is the survival function, estimated using those patients within the terminal node. This differs from classification and regression trees, where terminal node values are a single value (the estimated class label or the predicted value for the response, respectively). Figure 3 shows an example of a survival tree.
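A minimal sketch of this terminal node estimate follows: the Kaplan-Meier estimator computed from the (possibly right-censored) patients falling into one node. The times and censoring indicators below are invented.

import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate; events is 1 if the event was
    observed and 0 if the time is right censored."""
    times, events = np.asarray(times, float), np.asarray(events, int)
    surv, curve = 1.0, []
    for t in np.unique(times[events == 1]):
        at_risk = np.sum(times >= t)                # still under observation
        deaths = np.sum((times == t) & (events == 1))
        surv *= 1.0 - deaths / at_risk              # conditional survival at t
        curve.append((t, surv))
    return curve

# Hypothetical patients in one terminal node (0 = censored).
times = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 1, 0]
for t, s in kaplan_meier(times, events):
    print(f"S({t:g}) = {s:.3f}")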

Hemant Ishwaran and J. Sunil Rao

See also Decision Trees, Advanced Techniques in Constructing; Recursive Partitioning

Further Readings

Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Belmont, CA: Wadsworth.

LeBlanc, M., & Crowley, J. (1993). Survival trees by goodness of split. Journal of the American Statistical Association, 88, 457–467.

Segal, M. R. (1988). Regression trees for censored data. Biometrics, 44, 35–47.

Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society, Series B, 36, 111–147.

Decision Trees, Advanced Techniques in Constructing

Decision trees such as classification, regression, and survival trees offer the medical decision maker a comprehensive way to calculate predictors and decision rules in a variety of commonly encountered data settings. However, performance of decision trees on external data sets can sometimes be poor. Aggregating decision trees is a simple way to improve performance—and in some instances, aggregated tree predictors can exhibit state-of-the-art performance.

Decision trees, by their very nature, are simple and intuitive to understand. For example, a binary classification tree assigns data by dropping a data point (case) down the tree and moving either left or right through nodes depending on the value of a given variable. The nature of a binary tree ensures that each case is assigned to a unique terminal node. The value for the terminal node (the predicted outcome) defines how the case is classified. By following the path as a case moves down the tree to its terminal node, the decision rule for that case can be read directly off the tree. Such a rule is simple to understand, as it is nothing more than a sequence of simple rules strung together.
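The sketch below makes this concrete: a hypothetical binary tree (loosely patterned on the prostate example discussed below, with only the leftmost node's split values taken from the text and all other terminal labels invented) classifies a case and returns the rule read off its path.

def classify(node, case, path=()):
    """Drop a case down the tree; return (label, rule along the path)."""
    if "label" in node:                  # terminal node reached
        return node["label"], " and ".join(path)
    var, cut = node["var"], node["cut"]
    if case[var] < cut:                  # move left
        return classify(node["left"], case, path + (f"{var} < {cut}",))
    return classify(node["right"], case, path + (f"{var} >= {cut}",))

tree = {"var": "log_tumor_volume", "cut": 7.851,
        "left": {"var": "log_psa", "cut": 2.549,
                 "left": {"label": "nondiseased"},
                 "right": {"label": "diseased"}},
        "right": {"label": "diseased"}}

print(classify(tree, {"log_tumor_volume": 5.0, "log_psa": 1.9}))
# ('nondiseased', 'log_tumor_volume < 7.851 and log_psa < 2.549')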

tered data settings. However, performance of deci- The right-hand side of Figure 1 displays the

sion trees on external data sets can sometimes be decision boundary for the tree. The dark-shaded

poor. Aggregating decision trees is a simple way to region is the space of all values for PSA and tumor
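One way to see this is to evaluate a fitted tree over a grid of predictor values, as in the sketch below; the two-variable data are simulated stand-ins (not the prostate data of Figure 1), and each constant-prediction rectangle of the grid corresponds to a terminal node.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(300, 2))              # two predictors
y = ((X[:, 0] > 5) & (X[:, 1] > 4)).astype(int)    # simulated class labels

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Predict over a fine grid; the pattern of predictions traces the boundary.
g1, g2 = np.meshgrid(np.linspace(0, 10, 200), np.linspace(0, 10, 200))
boundary = tree.predict(np.column_stack([g1.ravel(), g2.ravel()]))
boundary = boundary.reshape(g1.shape)

# To visualize (if matplotlib is available):
#   import matplotlib.pyplot as plt
#   plt.contourf(g1, g2, boundary, alpha=0.3)
#   plt.scatter(X[:, 0], X[:, 1], c=y, s=8)
print("grid points per class:", np.bincount(boundary.ravel()))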

Consider Figure 1. On the left-hand side is the classification tree for a prostate data set. Here, the outcome is presence or absence of prostate cancer, and the independent variables are prostate-specific antigen (PSA) and tumor volume, both having been transformed on the log scale. Each case in the data is classified uniquely depending on the value of these two variables. For example, the leftmost terminal node in Figure 1 is composed of those patients with tumor volumes less than 7.851 and PSA levels less than 2.549 (on the log scale). Terminal node values are assigned by majority voting (i.e., the predicted outcome is the class label with the largest frequency). For this node, there are 54 nondiseased patients and 16 diseased patients, and thus, the predicted class label is nondiseased.

The right-hand side of Figure 1 displays the decision boundary for the tree. The dark-shaded region is the space of all values for PSA and tumor volume that would be classified as nondiseased, whereas the light-shaded regions are those values classified as diseased. Superimposed on the figure, …