Abstract The main objective of this study is to compare the results of a decision
tree classifier and its ensembles for landslide susceptibility assessment along the
National Road 32 of Vietnam. First, a landslide inventory map with 262 landslide
locations was constructed using data from various sources, covering landslides that
occurred during the last 20 years. Second, ten landslide conditioning factors (slope,
aspect, relief amplitude, topographic wetness index, toposhape, distance to roads,
distance to rivers, distance to faults, lithology, and rainfall) were prepared. Third,
landslide susceptibility maps were constructed using the decision tree and two
ensemble techniques, Bagging and AdaBoost. Finally, the resultant landslide
susceptibility maps were validated and compared using a validation dataset not used
during model building. The results show that the decision tree with the Bagging
ensemble technique has the highest prediction capability (90.6 %), followed by the
decision tree (87.8 %) and the decision tree with AdaBoost (86.2 %).
Keywords Decision tree · Ensemble technique · Landslide · GIS · Spatial analysis · Vietnam
1 Introduction
Rainfall-triggered landslides are considered to be the most significant natural
hazards in the north-western mountainous region of Vietnam (Tien Bui et al.
2012b). They have caused different types of damage affecting people, organizations, infrastructure, and the environment. The identification of areas susceptible to
landslides is an essential task for assessing the landslide risk, and will contribute to
public safety and decision-making in land management (Gorsevski et al. 2006).
However, only a few landslide studies have been carried out, and thus the study of
landslides is an urgent task in Vietnam (Tien Bui et al. 2012b).
Over the years, various methods and techniques for landslide prediction have
been proposed, ranging from simple expert-based procedures to sophisticated
mathematical models (Chung and Fabbri 2008). A review of these methods and
techniques can be found in Chacon et al. (2006).
Since the quality of landslide susceptibility models is influenced by the method used
to produce them (Yilmaz 2010), the investigation of new methods and techniques
is highly necessary. In recent years, artificial intelligence techniques and
data mining approaches have been used in landslide studies, and in general they
outperform the conventional methods (Pradhan et al. 2010). More recently,
ensemble-based approaches have received much attention in many fields, including
landslide studies, because they can improve the prediction performance of models
(Rokach 2010). In ensemble methods, multiple classifiers are combined to produce
the final model.
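To make the idea of combining classifiers concrete, the following is a minimal sketch of Bagging: each base model is trained on a bootstrap resample of the training data, and the ensemble predicts by majority vote. The one-feature threshold "stump", the toy data, and the names `fit_stump` and `bagging` are hypothetical simplifications chosen for illustration; the study itself used decision trees in WEKA.

```python
import random
from collections import Counter

def fit_stump(samples):
    """Fit a one-feature threshold classifier that minimises training errors."""
    best = None
    for threshold in sorted({x for x, _ in samples}):
        for label_above in (0, 1):
            errors = sum(
                (label_above if x >= threshold else 1 - label_above) != y
                for x, y in samples
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, label_above)
    _, threshold, label_above = best
    return lambda x: label_above if x >= threshold else 1 - label_above

def bagging(samples, n_models=25, seed=0):
    """Train each base model on a bootstrap resample; predict by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw as many samples as the original set, with replacement.
        bootstrap = [rng.choice(samples) for _ in samples]
        models.append(fit_stump(bootstrap))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict

# Toy data (hypothetical): (feature value, label), 1 = landslide, 0 = no landslide.
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
        (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
ensemble = bagging(data)
```

AdaBoost differs in that the base models are trained in sequence on reweighted data and combined by a weighted vote rather than a simple majority.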
The main objective of this study is to apply the decision tree and its ensemble
techniques for landslide susceptibility assessment along the National Road 32 of
Vietnam. The difference between this study and the aforementioned literature is
that two ensemble techniques, Bagging and AdaBoost, were used. The computation
process was carried out using MATLAB 7.11 and WEKA ver. 3.6.6. Finally,
the results were compared to choose the best model.
of the study area. The main lithologies are sandstone, conglomerate, clay shale,
clayey limestone, siltstone, limestone, and clayey limestone.
In the study area, landslide locations were derived from the inventory map
compiled earlier by Ho (2008). The landslide inventory (Fig. 1) was used to derive
the quantitative relationships between the landslide occurrences and conditioning
factors. A total of 262 landslides depicted by polygons were registered. These
landslides occurred during the last 20 years. The smallest landslide is about
476 m², and the largest is 37,326 m².
In order to predict the location of future landslides, ten landslide conditioning
factors were considered in this study. Slope, aspect, relief amplitude, toposhape,
and topographic wetness index (TWI) (Figs. 3, 4) were extracted from a digital
elevation model (DEM) that was generated from national topographic maps at
1:50,000 scale. The resolution of the DEM is 20 m.
Distance to roads and distance to rivers maps (Figs. 5a, b) were constructed
based on the river and road networks from the national topographic maps.
The lithology map (Fig. 2) and the distance to faults map (Fig. 4c) were constructed from
the Geological and Mineral Resources Maps at 1:200,000 scale. Finally, a rainfall
map (Fig. 5c) was included in the analysis. The detailed classes for the ten
landslide conditioning factors are shown in Table 1.
Fig. 4 a Toposhape map; b topographic wetness index (TWI) map; c distance to faults map
(Table 1) using the Max-Min formula (Tien Bui et al. 2012a). The attribute value
was obtained based on the frequency ratio. In the landslide inventory map, a value
of 1 was assigned to landslide pixels, whereas 0 was assigned to pixels outside a
landslide, i.e., non-landslide pixels.
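As an illustration of how the frequency ratio and the normalized values in Table 1 relate to the pixel counts, here is a minimal sketch using the slope classes. The frequency ratio is computed as the share of landslide pixels in a class divided by the share of the area in that class. The target range of 0.1 to 0.9 for the Max-Min rescaling is an assumption inferred from the normalized column of Table 1; the exact formula is given in Tien Bui et al. (2012a), and a few published values (for example, classes with a zero ratio) differ slightly from this simple linear rescaling.

```python
def frequency_ratio(class_pixels, landslide_pixels):
    """Landslide share of each class divided by the area share of that class."""
    total_area = sum(class_pixels)
    total_slides = sum(landslide_pixels)
    return [
        (slides / total_slides) / (area / total_area)
        for area, slides in zip(class_pixels, landslide_pixels)
    ]

def max_min_normalize(values, low=0.1, high=0.9):
    """Rescale linearly so the minimum maps to `low` and the maximum to `high`.

    The [0.1, 0.9] range is an assumption inferred from Table 1.
    """
    v_min, v_max = min(values), max(values)
    return [low + (high - low) * (v - v_min) / (v_max - v_min) for v in values]

# Slope classes from Table 1: class pixels and landslide pixels.
slope_area = [1050626, 707774, 1949706, 2352056, 1379361, 431672]
slope_slides = [0, 197, 1226, 1471, 748, 150]

fr = frequency_ratio(slope_area, slope_slides)
normalized = max_min_normalize(fr)

for ratio, value in zip(fr, normalized):
    print(f"frequency ratio {ratio:.3f} -> normalized {value:.2f}")
```

A frequency ratio above 1 means that landslides are over-represented in the class relative to its area.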
To evaluate the prediction capability of a landslide model, a landslide inventory
should be split into two subsets: one is used for training and the other is used for
validation (Chung and Fabbri 2003). Since the dates of the past landslides are
unknown, a temporal division of the landslide inventory map is impossible. In
this study, the landslide inventory map was randomly split in a 70/30 ratio for
training and validation of the model, respectively (Fig. 1). In the next step, a total
of 2,781 non-landslide pixels were randomly sampled from the landslide-free area.
Finally, values for the ten conditioning factors were extracted to build a
training dataset.
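The random split and the sampling of non-landslide pixels can be sketched as follows. This is only an illustration under stated assumptions: the text does not say whether the 70/30 split was made per landslide polygon or per pixel (the sketch splits polygon IDs), and the IDs, the size of the landslide-free pool, the seeds, and the function names are hypothetical.

```python
import random

def split_inventory(landslide_ids, train_fraction=0.7, seed=42):
    """Shuffle the landslide IDs and split them into training and validation sets."""
    rng = random.Random(seed)
    shuffled = list(landslide_ids)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

def sample_non_landslide(candidate_pixels, n_samples, seed=42):
    """Draw non-landslide pixels at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(candidate_pixels, n_samples)

# Hypothetical IDs for the 262 landslide polygons of the inventory.
landslides = list(range(262))
train_ids, validation_ids = split_inventory(landslides)

# Hypothetical stand-in for the indices of landslide-free pixels.
free_pixels = list(range(100_000))
negatives = sample_non_landslide(free_pixels, 2781)

print(len(train_ids), len(validation_ids), len(negatives))
```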
Table 1
Factor                  Class                 Class pixels  Landslide pixels  Frequency ratio  Attribute  Normalized classes
Slope                   0–8                   1050626       0                 0.000            1          0.10
                        8–15                  707774        197               0.578            2          0.45
                        15–25                 1949706       1226              1.305            3          0.90
                        25–35                 2352056       1471              1.298            4          0.89
                        35–45                 1379361       748               1.126            5          0.79
                        >45                   431672        150               0.721            6          0.54
Aspect                  Flat                  370810        0                 0.000            1          0.10
                        North                 880893        282               0.665            2          0.39
                        Northeast             954851        217               0.472            3          0.31
                        East                  887194        468               1.095            4          0.58
                        Southeast             943832        595               1.309            5          0.68
                        South                 1016869       891               1.819            6          0.90
                        Southwest             1061249       622               1.217            7          0.64
                        West                  893222        252               0.586            8          0.36
                        Northwest             862275        465               1.119            9          0.59
Relief amplitude        0–50                  494988        85                0.356            1          0.30
                        50–200                3797449       2664              1.456            2          0.90
                        200–350               3106823       994               0.664            3          0.46
                        350–500               449532        49                0.226            4          0.22
                        >500                  27992         0                 0.000            5          0.10
TWI                     <5                    744           0                 0.000            1          0.10
                        5–10                  6246498       3538              1.176            2          0.90
                        10–15                 1328468       249               0.389            3          0.36
                        15–20                 233435        5                 0.044            4          0.13
                        >20                   20661         0                 0.000            5          0.11
Toposhape               Ridge                 1437448       744               1.074            1          0.81
                        Saddle                113672        0                 0.000            2          0.10
                        Flat                  374668        0                 0.000            3          0.11
                        Ravine                1399148       546               0.810            4          0.63
                        Convex hillside       1030755       517               1.041            5          0.78
                        Saddle hillside       2408283       1403              1.209            6          0.89
                        Slope hillside        16366         0                 0.000            7          0.12
                        Concave hillside      945577        555               1.218            8          0.90
                        Inflection hillside   60540         27                0.926            9          0.71
                        Unknown hillside      90484         0                 0.000            10         0.13
Lithology               Alluvium              239956        79                0.683            1          0.10
                        Conglomerate          789689        188               0.494            2          0.11
                        Dyke                  27674         23                1.725            3          0.21
                        Intermediate          42216         0                 0.000            4          0.24
                        K-Pluton              87073         0                 0.000            5          0.30
                        K-Volcanic            4030918       1725              0.888            6          0.38
                        Limestone             240162        39                0.337            7          0.43
                        P-Volcanic            5770          0                 0.000            8          0.46
                        Sandstone             237588        192               1.677            9          0.48
                        Schist                274344        125               0.946            10         0.78
                        Shale                 679581        269               0.822            11         0.80
                        Tuff                  1204892       1152              1.985            12         0.90
Distance to faults (m)  0–200                 2419662       1636              1.403            1          0.90
                        200–400               2027888       989               1.012            2          0.53
                        400–600               1442500       635               0.914            3          0.44
                        >600                  1986891       532               0.556            4          0.10
Distance to roads (m)   0–40                  273124        1139              8.656            1          0.90
                        40–80                 292995        958               6.787            2          0.72
                        80–120                288433        582               4.188            3          0.47
                        >120                  7022389       1113              0.329            4          0.10
Distance to rivers (m)  0–40                  541068        467               1.792            1          0.77
                        40–80                 581557        518               1.849            2          0.80
                        80–120                576604        555               1.998            3          0.90
                        >120                  6177712       2252              0.757            4          0.10
Rainfall (mm)           <1500                 892649        215               0.500            1          0.18
                        1500–1700             1397443       770               1.144            2          0.64
                        1700–1900             2272315       1644              1.502            3          0.90
                        1900–2200             2060447       925               0.932            4          0.49
                        >2200                 1254071       238               0.394            5          0.10
Table 2
Parameter                               Selected
Type of pruning
Confidence factor for tree pruning
Binary splits or multiple splits
Minimum number of instances per leaf
Using Laplace smoothing
The first step in constructing the landslide model using decision trees is to
determine the parameters that influence the size of the resulting tree. Therefore, a
test was carried out to find the most suitable parameters for the study area. The
most suitable parameters were selected based on classification accuracy. The results
are shown in Table 2.
Using the training dataset and the determined parameters, the model was
trained using the stratified tenfold cross-validation method. In each run, one
fold was used for testing, whereas the remaining nine folds were used for training
the model. Finally, the decision tree model for landslide susceptibility was
constructed. The size of the tree is 189 nodes, including the root node, 93 internal
nodes, and 95 leaves. The detailed accuracy by class and the performance of the
decision tree model are shown in Tables 3 and 4.
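The fold construction behind stratified tenfold cross-validation can be sketched as below. This is not the WEKA implementation used in the study. It only illustrates how each fold keeps roughly the overall class ratio, and how one fold is held out for testing while the other nine are used for training. The function names and the seed are hypothetical, and model fitting is omitted.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_folds=10, seed=0):
    """Assign sample indices to folds so each fold keeps roughly the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds

def cross_validation_splits(labels, n_folds=10, seed=0):
    """Yield (train, test) index lists: one fold for testing, the rest for training."""
    folds = stratified_folds(labels, n_folds, seed)
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test

# Toy labels: 1 = landslide pixel, 0 = non-landslide pixel (2,781 of each).
labels = [1] * 2781 + [0] * 2781
splits = list(cross_validation_splits(labels))

for train, test in splits:
    print(len(train), len(test))
```

In a full workflow, a classifier would be fitted on `train` and scored on `test` in each of the ten runs, and the ten scores would be averaged.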
Table 3 Performance of the decision tree, the decision tree with Bagging, and the decision tree
with AdaBoost
Model                        True positive rate (%)  False positive rate (%)  F-measure (%)  Class
Decision tree                0.919                   0.153                    0.887          Landslide
                             0.847                   0.081                    0.878          No-landslide
Decision tree with Bagging   0.925                   0.130                    0.900          Landslide
                             0.870                   0.075                    0.895          No-landslide
Decision tree with AdaBoost  0.941                   0.114                    0.916          Landslide
                             0.886                   0.059                    0.911          No-landslide
Table 4 Accuracy assessment by classes of the decision tree, the decision tree with Bagging, and
the decision tree with AdaBoost
Parameters                      Decision tree  Decision tree with Bagging  Decision tree with AdaBoost
Classification accuracy (%)     88.28          89.75                       91.33
Cohen's kappa index             0.765          0.795                       0.827
Root mean squared error (RMSE)  0.307          0.286                       0.273
model and reality. For the decision tree with AdaBoost, a Cohen's kappa index of 0.827
indicates a good agreement between the susceptibility model and reality.
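For readers who want to check the kappa values, the following sketch computes Cohen's kappa from a two-class confusion matrix. It uses the class-wise true positive rates listed under Table 3 and assumes equal numbers of landslide and non-landslide pixels in the training dataset; under that assumption the results agree with the kappa values in Table 4 to within rounding. The function name is hypothetical.

```python
def cohens_kappa(tp, fn, fp, tn):
    """Cohen's kappa for a 2x2 confusion matrix (counts or proportions)."""
    total = tp + fn + fp + tn
    observed = (tp + tn) / total
    # Agreement expected by chance, from the actual and predicted marginals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / total ** 2
    return (observed - expected) / (1 - expected)

# (true positive rate for landslide, true positive rate for no-landslide)
rates = {
    "Decision tree": (0.919, 0.847),
    "Decision tree with Bagging": (0.925, 0.870),
    "Decision tree with AdaBoost": (0.941, 0.886),
}

# ASSUMPTION: both classes have the same number of pixels, so the per-class
# rates can be used directly as the cells of the confusion matrix.
kappas = {
    model: cohens_kappa(tp=tpr, fn=1 - tpr, fp=1 - tnr, tn=tnr)
    for model, (tpr, tnr) in rates.items()
}

for model, kappa in kappas.items():
    print(f"{model}: kappa = {kappa:.3f}")
```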
The detailed accuracy assessment by classes is shown in Table 4. It can be seen
that the True Positive rates and the F-measures are higher for the landslide class
than for the no-landslide class for all three susceptibility models.
The success-rate curves for the three models were derived by comparing the 2,781
landslide grid cells in the training dataset with the three susceptibility maps. The
areas under the success-rate curves (AUC) were then estimated for all cases
(Table 5). The results show that all three models have a good fit with the training
dataset. The highest degree of fit is for the decision tree with AdaBoost (0.955),
followed by the decision tree with Bagging (0.933) and the single decision tree
(0.916).
Fig. 7 Landslide susceptibility map using the decision tree with Bagging
The closer the curve is to the upper-left corner, the better the model. For
quantitative comparison, the areas under the prediction-rate curves (AUC) were
also calculated. The closer the AUC value is to 1, the better the model. The
results (Fig. 9) show that the decision tree with Bagging has the highest prediction
capability for future landslides (AUC = 0.906), followed by the decision tree
(AUC = 0.878) and the decision tree with AdaBoost (AUC = 0.862).
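One common way to build such a rate curve is to rank all cells from most to least susceptible and plot the cumulative share of landslide cells against the cumulative share of the map area. The sketch below computes the area under that curve with the trapezoid rule. It assumes this construction; the exact procedure used in the study is not spelled out here, and the function name and toy data are hypothetical.

```python
def rate_curve_auc(susceptibility, is_landslide):
    """Area under the cumulative-landslide-share vs. cumulative-area-share curve."""
    # Rank cells from most to least susceptible (ties keep their input order).
    order = sorted(
        range(len(susceptibility)),
        key=lambda i: susceptibility[i],
        reverse=True,
    )
    total_cells = len(order)
    total_slides = sum(is_landslide)

    auc = 0.0
    captured = 0        # landslide cells found so far
    previous_y = 0.0
    for index in order:
        captured += is_landslide[index]
        y = captured / total_slides
        # Trapezoid over one cell, i.e. a step of 1 / total_cells on the x-axis.
        auc += (previous_y + y) / 2 / total_cells
        previous_y = y
    return auc

# Toy example: four cells, two of which are landslide cells.
scores = [0.9, 0.8, 0.4, 0.3]
perfect = rate_curve_auc(scores, [1, 1, 0, 0])           # landslides ranked first
reversed_ranking = rate_curve_auc(scores, [0, 0, 1, 1])  # landslides ranked last

print(perfect, reversed_ranking)
```

With the training landslides as input this gives a success-rate AUC; with the validation landslides it gives a prediction-rate AUC.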
Fig. 8 Landslide susceptibility map using the decision tree with AdaBoost
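Table 5 Areas under the success-rate curves (AUC)
Decision tree                0.916
Decision tree with Bagging   0.933
Decision tree with AdaBoost  0.955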
distance to rivers, this factor has a contribution to the two ensemble models.
However, it might have introduced slight noise, reducing the classification
accuracy by 0.66 % in the decision tree model.
Fig. 9 Prediction-rate curves and area under the curves (AUC) for the decision tree, the decision
tree with Bagging, and the decision tree with AdaBoost
Factors                   Decision tree  Decision tree with Bagging  Decision tree with AdaBoost
Minus slope               87.95          88.87                       91.06
Minus aspect              86.98          87.39                       88.20
Minus relief amplitude    87.55          88.90                       90.07
Minus TWI                 88.42          89.85                       91.38
Minus toposhape           88.38          89.73                       92.63
Minus lithology           87.36          88.17                       88.71
Minus distance to faults  86.33          87.36                       89.30
Minus distance to roads   81.01          83.08                       85.33
Minus distance to rivers  88.94          89.21                       90.85
Minus rainfall            87.07          87.95                       88.44
All                       88.28          89.75                       91.33
method. The final models were then applied to construct three landslide susceptibility maps. These maps only represent spatial predictions of future landslides.
They do not provide information about when and how frequently a landslide
will occur.
The performance evaluation results show that the classification accuracy of the
decision tree models with Bagging and AdaBoost increased by about 1.47 % and
3.05 %, respectively, compared to the single decision tree model. The evaluation
of the degree of fit of the models to the training dataset shows that the decision
trees with Bagging and AdaBoost fit slightly better than the single decision
tree model.
Using Cohen's kappa index, the reliability of the landslide models was
assessed. The index values, ranging from 0.765 to 0.827, show a substantial agreement
between the susceptibility model and reality for all the models. The results are
satisfactory compared with other works such as Saito et al. (2009) and Tien Bui et al.
(2012a).
The prediction capability of the susceptibility models was estimated using
landslide location data that were not used in the training phase. The results show
that the decision tree model with Bagging has the highest prediction capability.
The decision tree model with AdaBoost has the highest degree of fit to the training
data but the lowest prediction capability.
The relative importance of the ten conditioning factors for the three susceptibility
models shows that distance to roads, distance to faults, rainfall, lithology,
slope, and relief amplitude contribute strongly to all the models, with distance to
roads contributing the most. This result differs from studies such as
Pradhan and Lee (2010) and Van Den Eeckhaut et al. (2006), where slope is
indicated as the most important factor. The difference is due to the fact that this
study only focuses on landslides along the corridor of the National Road 32 of
Vietnam.
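One simple way to read the factor-removal results is to compute, for each factor, how much the classification accuracy drops when that factor is left out. The sketch below does this for the single decision tree, using the accuracies reported for the models built with one factor removed and with all factors. It is an illustrative reading of those numbers, not necessarily the procedure used to rank factor importance in this study.

```python
ALL_FACTORS = 88.28  # decision tree accuracy (%) with all ten factors

# Decision tree accuracy (%) when one factor is left out.
accuracy_without = {
    "slope": 87.95,
    "aspect": 86.98,
    "relief amplitude": 87.55,
    "TWI": 88.42,
    "toposhape": 88.38,
    "lithology": 87.36,
    "distance to faults": 86.33,
    "distance to roads": 81.01,
    "distance to rivers": 88.94,
    "rainfall": 87.07,
}

# A positive drop means that removing the factor lowers the accuracy,
# i.e. the factor contributes to the model.
drops = {
    factor: round(ALL_FACTORS - accuracy, 2)
    for factor, accuracy in accuracy_without.items()
}
ranking = sorted(drops, key=drops.get, reverse=True)

for factor in ranking:
    print(f"{factor}: {drops[factor]:+.2f}")
```

A negative value, as for distance to rivers, means the model was slightly more accurate without that factor.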
In conclusion, the findings of this study suggest that the decision tree
model with Bagging is the most preferable in this study. The results may be useful
for policy planning and decision making in areas prone to landslides.
Acknowledgments This research was supported by the Geomatics Section, Department of
Mathematical Sciences and Technology, Norwegian University of Life Sciences, Norway.
References
Breiman L (1996) Bagging predictors. Mach Learn 24:123–140
Chacon J, Irigaray C, Fernandez T, El Hamdouni R (2006) Engineering geology maps: landslides
and geographical information systems. Bull Eng Geol Environ 65:341–411
Chung CJF, Fabbri AG (2003) Validation of spatial prediction models for landslide hazard
mapping. Nat Hazards 30:451–472
Chung C-J, Fabbri AG (2008) Predicting landslides for risk analysis – spatial models tested by a
cross-validation technique. Geomorphology 94:438–452
Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an
application to boosting. J Comput Syst Sci 55:119–139
Gorsevski PV, Gessler PE, Boll J, Elliot WJ, Foltz RB (2006) Spatially and temporally
distributed modeling of landslide susceptibility. Geomorphology 80:178–198
Ho TC (2008) Application of structural geology methods, remote sensing, and GIS for the
assessment and prediction of landslide and flood along the National Road 32 in the Yen Bai
and Lai Chau provinces of Vietnam. Vietnam Institute of Geosciences and Mineral Resources,
Hanoi, p 118
Pradhan B, Lee S (2010) Delineation of landslide hazard areas on Penang Island, Malaysia, by
using frequency ratio, logistic regression, and artificial neural network models. Environ Earth
Sci 60:1037–1054