Natural Resources Research, Vol. 12, No. 4, December 2003 (© 2003)
This study compares the performance of favorability mappings by weights of evidence (WOE),
probabilistic neural networks (PNN), logistic regression (LR), and discriminant analysis (DA).
Comparisons are made by an objective measure of performance that is based on statistical
decision theory. The study further emphasizes out-of-sample inference, and quantifies the
extent to which outcome is influenced by optimum variable discretization with classification
and regression trees (CARTS).
Favorability mapping methodologies are evaluated systematically across three case studies
with contrasting scale and geologic information:
Case Study                      Carlin             Alamos              Nevada
                                sediment-hosted    intrusion-related   intrusion-related
                                gold               copper              copper
Scale                           deposit            district            regional
Cell size                       small (0.01 km²)   medium (1 km²)      large (7 km²)
Information level               high               moderate            low
Geovariables                    complex            simple              simple
Variable interdependency        moderate           low                 high
Asymmetry in frequency of       modest             considerable        severe
barren and mineralized cells
Estimated favorabilities for all cells then are represented by computed percent correct
classification, and expected loss of optimum decision.
The deposit-scale Carlin study reveals that the performances of the various methods from
lowest to highest expected decision loss are: PNN, nonparametric DA, binary PNN (WOE
variables), LR, and WOE. Moreover, the study indicates that approximately 40% of the
increase in expected decision loss using WOE instead of PNN is the result of information loss
from variable discretization. The remaining increases in losses using WOE are the result of its
lesser inferential power than PNN.
The district-scale Alamos study shows that the lowest expected decision loss is not by
PNN, but by canonical DA. CARTS discretization improves greatly the performance of WOE.
However, PNN and DA perform better than WOE.
Unlike findings from the Alamos and Carlin studies, results from the regional-scale Nevada
study indicate that decision losses by LR and DA are lower than those by WOE or PNN.
© 2003 International Association for Mathematical Geology
1520-7439/03/1200-0241/1
INTRODUCTION
Features of the Case Studies
The three case studies represent three different scales of exploration: regional (Nevada), mining district (Alamos), and individual mineral deposit
(Carlin). Accordingly, exploration completeness, cell
size, and specificity of geovariables differ greatly
across these three scales of exploration:
Carlin gold mineralization: High-level information, small cell size (10,000 m², or 0.01 km²), small geographic area with relatively homogeneous geology, a single deposit type, complete exploration, complex geovariables, low to moderate variable interdependency, moderate asymmetry in numbers of barren and mineralized cells.
Alamos District intrusion-related mineralization: Moderate-level information, cell size of 1 km², incomplete exploration, simple geovariables, low interdependency of variables, multiple deposit types, considerable asymmetry in number of mineralized and barren cells.
Nevada (north-central) intrusion-related mineralization: Low-level information, relatively large cell size (7 km², compared with 1 km² for Alamos and 0.01 km² for Carlin), incomplete exploration, simple geovariables, high geovariable interdependency, low homogeneity of geology, multiple deposit types, and severe asymmetry in numbers of mineralized and barren cells.
Each of these case studies followed the same
paradigm:
Divide the cells of the case study randomly into
two subsets: Training set and validation set;
train the method (estimate model parameters)
on the training set only;
use the trained model to estimate favorability
for all cells (training cells + validation cells);
compute the percent correct classification for
all cells; and
compute expected decision loss of optimum
decision.
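The paradigm above can be sketched in a few lines of Python. This is only an illustrative sketch, not the software used in the study: the cell data are hypothetical, and the "model" is a stand-in that estimates favorability as the mineralized fraction observed in the training set for each value of a single binary geovariable. The function names and the 0.5 cutoff are assumptions made for the example.

```python
import random

def split_cells(cells, n_train, seed=0):
    """Randomly divide the cells into a training set and a validation set."""
    rng = random.Random(seed)
    shuffled = cells[:]
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

def train_frequency_model(training):
    """Stand-in model (assumption): mineralized fraction per geovariable value."""
    counts = {}
    for x, mineralized in training:
        n, k = counts.get(x, (0, 0))
        counts[x] = (n + 1, k + mineralized)
    return {x: k / n for x, (n, k) in counts.items()}

def favorability(model, x, default=0.0):
    """Estimated favorability for a cell; unseen values fall back to `default`."""
    return model.get(x, default)

def percent_correct(model, cells, cutoff):
    """Percent of cells whose class is recovered when favorability >= cutoff means mineralized."""
    hits = sum((favorability(model, x) >= cutoff) == bool(m) for x, m in cells)
    return 100.0 * hits / len(cells)

# Hypothetical cells: (binary geovariable, 1 if mineralized else 0).
cells = [(1, 1)] * 30 + [(1, 0)] * 20 + [(0, 1)] * 5 + [(0, 0)] * 145

# Train on the training set only, then score all cells (training + validation).
training, validation = split_cells(cells, n_train=100, seed=42)
model = train_frequency_model(training)
pct_all = percent_correct(model, training + validation, cutoff=0.5)
```

As in the study design, the fraction of mineralized cells that lands in the training set is left entirely to the random draw.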
The number of randomly selected cells in the training set was constrained by the capacity of the PNN
software. Moreover, the fraction of those randomly
selected cells that were mineralized was determined
totally by the random sampling process.
It is important to emphasize that only one of these
case studies, Carlin Gold, satisfies the requirement for
unqualified comparison: All cells in the training and
validation areas have been drill-tested and determined
to be either barren or mineralized. Thus, although two
additional case studies are employed to examine performances on different scales and complexity of geology and mineralization, only the Carlin study permits
unqualified judgment.
Misclassified Cells: A Difficult Problem
The most problematic of the factors affecting performance is the misclassification of mineralized cells
in the training set as barren, reflecting the fact that
such cells have deposits, but they have not yet been
discovered. The presence of these misclassified cells
in the training set presents models with contradictory
or ambiguous information about geology and mineral occurrence. To the extent that the geological variables do capture important differences in the geology
of mineralized and barren cells, misclassification decreases the capability of the models to identify these
differences and utilize them in classification of unknown cells. The presence of misclassified cells in the
barren group causes the trained models to not fit their
training data as well as they would otherwise, and their
performances on the validation set are similarly confounded. Although misclassification of training cells
diminishes the performance of all methods, its effect
may not be equal across methods.
Measuring Performance
Although all investigated methods were selected
because they provide a Bayesian probability for mineralization, output is treated only as a scaled [0,1]
index. Accordingly, a method's performance is described by the percentage of mineralized cells or barren cells correctly classified when a specific index
value is used as the classification criterion (see Fig. 1).
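A minimal sketch of this measure follows. It assumes that a cell is classified as mineralized when its index is greater than or equal to the cutoff; the function name and the six example cells are hypothetical.

```python
def class_rates(scores, labels, cutoff):
    """Return (fraction of mineralized cells correct, fraction of barren cells correct).

    Assumption: a cell is classified as mineralized when score >= cutoff.
    """
    min_hits = min_total = bar_hits = bar_total = 0
    for score, label in zip(scores, labels):
        predicted_mineralized = score >= cutoff
        if label == 1:
            min_total += 1
            min_hits += int(predicted_mineralized)
        else:
            bar_total += 1
            bar_hits += int(not predicted_mineralized)
    return min_hits / min_total, bar_hits / bar_total

# Hypothetical index values for three mineralized and three barren cells.
scores = [0.95, 0.80, 0.30, 0.60, 0.10, 0.05]
labels = [1, 1, 1, 0, 0, 0]

rates_by_cutoff = {c: class_rates(scores, labels, c) for c in (0.0, 0.5, 1.0)}
```

Raising the cutoff trades correct classification of mineralized cells for correct classification of barren cells, which is why a single cutoff cannot summarize a method.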
correct classification for a range of cutoff probabilities for mineralization, as described in the following
section.
A DECISION-THEORY APPROACH TO PERFORMANCE VALUATION
           Fraction correct           Expected decision loss, by loss ratio
Cutoff   Mineralized   Barren          1        10        25        50        100
0.0      1             0         14612.0   14612.0   14612.0   14612.0    14612.0
0.1      0.9448        0.8716     1973.9    2853.8    4320.2    6764.1    11652.1
0.2      0.9428        0.8944     1644.3    2556.0    4075.6    6608.1    11673.1
0.3      0.9338        0.9116     1408.9    2464.1    4222.7    7153.7    13015.7
0.4      0.9196        0.9297     1169.6    2451.1    4586.9    8146.6    15266.1
0.5      0.893         0.9447      997.5    2703.0    5545.5   10282.9    19757.7
0.6      0.8596        0.9597      837.5    3075.3    6805.1   13021.3    25453.7
0.7      0.7917        0.9754      728.4    4048.4    9581.9   18804.4    37249.4
0.8      0.6858        0.9884      725.9    5734.0   14080.7   27991.9    55814.3
0.9      0.4977        0.9973      929.0    8935.2   22278.8   44518.1    88996.8
1.0      0.2635        1          1304.3   13043.4   32608.5   65217.1   130434.2
Optimum cutoff                       0.8       0.4       0.2       0.2        0.1
Minimum loss                       725.9    2451.1    4075.6    6608.1    11652.1
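The tabulated losses can be reproduced, to rounding, by a simple two-term loss. The sketch below assumes that the expected loss at a given cutoff is the acceptance loss times the number of barren cells misclassified, plus the rejection loss times the number of mineralized cells misclassified, with the loss ratio defined as rejection loss divided by acceptance loss. The number of barren cells (14,612) is read from the loss at cutoff 0.0, where every barren cell is misclassified. The number of mineralized cells (1,771) is not stated in the table; it is inferred here from the loss at cutoff 1.0 and should be treated as an assumption, as should the function names.

```python
CUTOFFS = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
MINERALIZED_CORRECT = [1, 0.9448, 0.9428, 0.9338, 0.9196, 0.893,
                       0.8596, 0.7917, 0.6858, 0.4977, 0.2635]
BARREN_CORRECT = [0, 0.8716, 0.8944, 0.9116, 0.9297, 0.9447,
                  0.9597, 0.9754, 0.9884, 0.9973, 1]

N_BARREN = 14612       # loss at cutoff 0.0: all barren cells accepted in error
N_MINERALIZED = 1771   # assumption: inferred as 1304.3 / (1 - 0.2635)

def expected_loss(p_min, p_bar, ratio, acceptance_loss=1.0):
    """Loss from barren cells accepted plus mineralized cells rejected."""
    barren_errors = N_BARREN * (1 - p_bar)
    mineralized_errors = N_MINERALIZED * (1 - p_min)
    return acceptance_loss * (barren_errors + ratio * mineralized_errors)

def optimum_cutoff(ratio):
    """Return (cutoff, loss) minimizing expected loss over the tabulated cutoffs."""
    losses = [expected_loss(pm, pb, ratio)
              for pm, pb in zip(MINERALIZED_CORRECT, BARREN_CORRECT)]
    best = min(range(len(CUTOFFS)), key=losses.__getitem__)
    return CUTOFFS[best], losses[best]

optima = {ratio: optimum_cutoff(ratio) for ratio in (1, 10, 25, 50, 100)}
```

Under these assumptions the optimum cutoff falls as the loss ratio rises, because rejecting a mineralized cell becomes progressively more costly than accepting a barren one.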
Generation of Variables
Each of the cells is described by the presence/absence of gold, as confirmed by drilling, and by measurements on eight geological variables:
tm      TM linear score
trend   regional structural trend score
vmag    vertical magnetic component
resis   resistivity ratio
tc      total radiometric count
au      soil Au in ppb
as      soil As in ppm
hg      soil Hg in ppb
Tm is a quantified structural score from a set of lineaments interpreted from Thematic Mapper (TM) images.
The quantification is based upon a set of structural
measurements generated from a moving window. Details of this quantification procedure are given in Pan
(1989). Trend is a quantified structural score based
upon regional structures interpreted by geologists.
The quantification procedure is the same as for tm.
Vmag is the first vertical derivative of the total magnetic field. Resis is the ratio of the resistivity value at
900 Hz frequency to the resistivity value at 7200 Hz
frequency. Tc is the total count of K, Th, and U from an
airborne radiometric survey. Au, as, and hg are interpolated, nontransformed, values of gold, arsenic, and
mercury, respectively, from a soil sample geochemical
survey.
[Table: binary coding of the geovariables for WOE. Only fragments survive: trend, "1 for absence of trend"; vmag, resis, tc, au, as, and hg, "1 otherwise".]
Tm was dropped, as there were no preferred subdivisions. Dropping of tm probably had little effect on
relative performance of WOE, as a subsequent analysis by CARTS (Classification And Regression Trees)
revealed that tm was next to last in importance in the
building of a classification tree.
WOE analysis of the 3,277 training cells revealed
that the most important variables are the geochemical
scores, as they have the highest contrasts:
Variable   Contrast
trend      0.174
vmag       0.762
resis      0.594
tc         0.771
au         1.665
as         1.116
hg         2.188
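For readers unfamiliar with the contrast statistic, the sketch below shows the standard weights-of-evidence calculation for one binary variable: W+ and W- are log ratios of the conditional probabilities of the pattern being present (or absent) in mineralized versus barren cells, and the contrast is C = W+ - W-. The counts are hypothetical and are not taken from the Carlin data; the function and argument names are likewise assumptions.

```python
import math

def weights_of_evidence(n_present_min, n_present_bar, n_min, n_bar):
    """Return (W+, W-, contrast) for one binary geovariable.

    n_present_min: mineralized cells with the pattern present
    n_present_bar: barren cells with the pattern present
    n_min, n_bar:  total mineralized and barren cells
    """
    p_present_min = n_present_min / n_min
    p_present_bar = n_present_bar / n_bar
    w_plus = math.log(p_present_min / p_present_bar)
    w_minus = math.log((1 - p_present_min) / (1 - p_present_bar))
    return w_plus, w_minus, w_plus - w_minus

# Hypothetical counts: pattern present in 80 of 100 mineralized cells
# and in 300 of 900 barren cells.
w_plus, w_minus, contrast = weights_of_evidence(80, 300, 100, 900)
```

A larger contrast indicates a stronger spatial association between the binary pattern and mineralization, which is the sense in which the geochemical variables rank highest above.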
decision loss by the binary PNN is lower than that for
LR (logistic regression) using the original variables.
Generally, for the Carlin case study, Figure 7 shows
that the favorability mapping methods, ranked from
lowest to highest expected decision loss are:
Generation of Variables
The study area contains thirty-seven prospects.
Two, or possibly three, prospects are related to a porphyry stock, seven are associated with skarn, and the
rest are polymetallic vein systems.
From oldest to youngest, explanatory variables include metasediments (TJseds), intermediate volcanics (Kande), limestone (Kls), batholith (KTgd),
felsic volcanics (Trhy), bimodal volcanics (Tvolcs),
conglomerate (Tbaucarit), alluvium (Qal), and fractures (Frac). The variables were quantified by their
areal extent within a given unit-area (1 km²) cell,
based on a digitized version of the map by Vazquez-Perez (1975). Because of their small outcrop proportion, porphyry intrusions were left out. Fractures were
quantified based on selected influence buffers. In the
initial analysis, a buffer of 300 meters was used to
quantify fractures (Frac300), as that buffer size maximized the WOE contrast. A buffer of 500 meters was
trices. Interestingly, this DA achieves the lowest decision loss even though it is parametric, being based
upon the assumption that discriminant scores are normally distributed.
A Second Analysis: An Investigation of Enhanced Discretization. The second analysis investigates the impact of (1) a more rigorous discretization than simple presence or absence, and (2) fewer
geovariables. The finding of the large decision loss by
WOE raises the question: How well would WOE perform when it is applied under conditions that enhance
its performance? Such conditions would include reduction of variables and a discretization scheme that
enhances the relationship between the resulting binary geovariables and mineral occurrence.
Although there are various methods for discretization (Pan and Harris, 2000), this study employed CARTS. The criterion used to rank candidate
splits was the heterogeneity of a tree node, which in
this study was described by the Gini coefficient. Analysis by CARTS determined the optimum tree to have
5 nodes, when prior probability is an average of relative frequency from the data and equal priors, and
when rejection loss is 3 times acceptance loss. Because a tree with 7 nodes had only a slightly higher
decision cost, it was selected to ensure that the favorability mapping methods were not overly constrained
by the methodology of CARTS, and to accommodate
differences in how the various methods employed the
geovariables. For a 7-node classification tree, the most
important variables were, in decreasing order, Kande,
KTgd, Trhy, TJseds, and Frac200. The cut points selected for discretization were early splits on the five
variables: Kande > 0.00478; KTgd > 0.0237; Trhy >
0.0796; Frac200 > 0.111; and TJseds > 0.158.
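The discretization step can be sketched as follows. The cut points are those listed above; everything else is an assumption made for illustration: the function names, the example cell, and the use of the plain two-class Gini index 2p(1 - p) without the prior-probability and loss weighting that the CARTS analysis applied.

```python
# Cut points reported for the 7-node tree; a cell is coded 1 when the
# areal proportion of the variable exceeds its cut point, else 0.
CUT_POINTS = {
    "Kande": 0.00478,
    "KTgd": 0.0237,
    "Trhy": 0.0796,
    "Frac200": 0.111,
    "TJseds": 0.158,
}

def discretize(cell):
    """Convert continuous geovariables to binary variables using CUT_POINTS."""
    return {name: int(cell[name] > cut) for name, cut in CUT_POINTS.items()}

def gini(labels):
    """Two-class Gini index of a node: 2p(1 - p), p = mineralized fraction."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def gini_decrease(values, labels, cut):
    """Reduction in node heterogeneity from splitting at `cut` (unweighted sketch)."""
    left = [y for v, y in zip(values, labels) if v <= cut]
    right = [y for v, y in zip(values, labels) if v > cut]
    n = len(labels)
    return gini(labels) - len(left) / n * gini(left) - len(right) / n * gini(right)

# Hypothetical cell and a hypothetical four-cell node split on Kande.
cell = {"Kande": 0.02, "KTgd": 0.0, "Trhy": 0.5, "Frac200": 0.111, "TJseds": 0.2}
binary_cell = discretize(cell)
decrease = gini_decrease([0.0, 0.001, 0.01, 0.02], [0, 0, 1, 1], CUT_POINTS["Kande"])
```

Candidate splits with a larger decrease in the Gini index separate mineralized from barren cells more cleanly, which is the basis on which splits are ranked.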
The new binary variables improved greatly the
performance by WOE. Figure 9 shows a great
Analyses
that of Carlin. A consequence of the larger cell size is
decreased specificity of geovariables in terms of their
relationship to mineral occurrence. The larger number of geovariables (15) describe a broad spectrum
of information, for example magnetics, radiometrics,
and geochemistry, in addition to lithologies and structure. Everything else being equal, a large number of
geovariables commonly results in increased variable
interdependency.
Known mineralization in the Nevada study area
is not nearly as abundant as in the Carlin and Alamos
areas. For example, mineralized cells constitute only
about 1.4% of the cells in the Nevada study area. This
is a small percentage when compared with Carlin,
for which 10.8% are mineralized, and with Alamos,
for which about 5% are mineralized. Thus, as statistical populations, the two classes of cells (mineralized
and barren) are asymmetric in number. This asymmetry could represent lower mineralization rates or increased post-mineralization cover. As with Alamos,
mineralized cells in the Nevada study area include
several types of intrusion-related Cu deposits.
Generation of Variables
Conforming to the field descriptor Genlith of
the digital version of the geologic map of Nevada
(Turner, Bawiec, and Ambroziak, 1991), variables
were delineated based on lithology and age associations. They include:
Uptec
Lowtev
Lowtep
Mzvol
Mzplut
Lmzsed
Upzseq
Lpzcarb
Lpztrans
Lpzclast
Thrusts
Rad      Radiometrics (Phillips, Duval, and Ambroziak, 1993) generalized from the frequency distribution to include high (Eu ≥ 2) and low (Eu < 2) values from an original cell size of 2.4 × 2.4 km
Mag      Total magnetic intensity (Kucks, 1999a) generalized from the frequency distribution to include high (Gammas ≥ 0) and low (Gammas < 0) values from an original cell size of 2.4 × 2.4 km
Grav     Bouguer gravity (Kucks, 1999b) generalized from the frequency distribution to include high (Milligals ≥ 220) and low (Milligals < 220) values from an original cell size of 4.8 × 4.8 km
Geochem  Principal components score generated from analysis of 1553 NURE (Grossman, 1998) base metal stream sediment geochemical data. The high-low contour map is based on the frequency distribution of the ln(Cu)-bearing factor.
Mineral deposit sites used for this analysis include 207 copper-bearing intrusion-related mines and prospects extracted from MRDS (Frank, 1999) based on the presence of Cu as a major commodity: replacement, skarn, disseminated, or porphyry copper.
Analyses
Initial Analysis. The initial analysis employed all
15 variables. Both WOE and Binary PNN employed a
simple presence and absence discretization. As shown
entire (training + validation) set, that ratio is extremely low, 0.08, indicating severe violation of the
conditional independency assumption even with the
CARTS variables.
The performance of WOE on the CARTS variables contrasts markedly with the relative performances of PNN and WOE in the Alamos and Carlin
studies. It is important to note, however, that even
though losses by the CARTS-based WOE are reduced, they are considerably larger than the decision
losses by LR and DA for all 15 original variables (compare Figs. 11 and 12).
Using the same CARTS binary variables, LR
and canonical DA were trained and used to estimate
a probability for mineralization for all 14,844 cells.
Decision losses by CARTS-based LR are somewhat
lower than for the CARTS-based WOE, but not as
low as LR on the original variables. However, decision losses by CARTS-based canonical DA are noticeably the lowest of all, being lower than those of the
CARTS-based WOE or LR and lower than those by
the LR and DA using the original variables (compare
Figs. 11 and 13).
Figure 12. Decision Loss for Nevada based upon CARTS variables:
WOE and PNN (continuous and binary variables).
The Carlin case study provides the rare opportunity to examine relative performances of favorability
mapping methodologies for a single deposit type
when exploration is complete. This circumstance is
important when comparing mapping methodologies.
Conclusions about relative performance of mapping
methods must be qualified whenever exploration is
incomplete because of possible conflicting or ambiguous information presented for training when mineralized cells are misidentified as belonging to the barren
set. Moreover, everything else being equal, the presence of multiple deposit types further compounds the
training of mapping models and the analysis of out-of-sample performance. This study determined the performance of WOE to be inferior to the other methods,
which included PNN, DA, and LR. Furthermore, this
study also demonstrated that only part of this inferior
performance by WOE is the result of the loss of information when the variables are discretized to satisfy
data requirement by WOE.
The relative inferential power of PNN and
WOE was examined by training PNN on the same
discretized variables used by WOE. Interestingly,
expected decision losses by this binary PNN are
considerably lower than those by WOE, thereby
demonstrating that part of the noticeably inferior performance by WOE reflects a less powerful use of
information than by PNN. Moreover, decision losses
by the binary PNN are less than those by Fisher
DA and LR based upon the original variables.
Favorability methods ranked by decision loss on
the Carlin case study from lowest to highest
are:
Cooley, W. W., and Lohnes, P. R., 1962, Multivariate procedures
for the behavioral sciences: John Wiley & Sons, New York,
211 p.
Frank, D. G., 1999, Mineral Resources Data System (MRDS) data
base: U.S. Geol. Survey digital data series DDS-52, 1 CD-ROM.
Grossman, J. N., 1998, National geochemical atlas: The geochemical landscape of the conterminous United States derived from
stream sediment and other solid sample media analyzed by the
National Uranium Resource Evaluation (NURE) Program
(version 3.01): U.S. Geol. Survey Open-File Rept. 980622,
1 CD-ROM.
Harris, D. P., 1984, Mineral resources appraisal; mineral endowment, resources, and potential supply; concepts, methods, and
cases: Oxford Univ. Press, New York, 455 p.
Harris, D. P., and Pan, G., 1999, Mineral favorability mapping: A
comparison of artificial neural networks, logistic regression,
and discriminant analysis: Natural Resources Research, v. 8,
no. 2, p. 93-109.
Kemp, L. D., Bonham-Carter, G. F., and Raines, G. L., 1999, ArcWofE: Arc View extension for weights of evidence mapping:
http://gis.nrcan.gc.ca/software/arcview/wofe.
Kucks, R. P., 1999a, Magnetic anomaly data grid: U.S. Geol. Survey
digital data series DDS-09, 1 CD-ROM.
Kucks, R. P., 1999b, Gravity anomaly data grid: U.S. Geol. Survey
digital data series DDS-09, 1 CD-ROM.
Mihalasky, M. J., and Bonham-Carter, G. F., 2001, Lithodiversity
and its spatial association with metallic mineral sites, Great
Basin of Nevada: Natural Resources Research, v. 10, no. 3,
p. 209-226.
Pan, G., 1989, Concepts and methods of multivariate information
synthesis for mineral resources estimation: unpubl. doctoral
dissertation, Univ. Arizona, 302 p.
Pan, G., and Harris, D. P., 2000, Information synthesis for mineral
exploration: Oxford Univ. Press, New York, 461 p.
Phillips, J. D., Duval, J. S., and Ambroziak, R. A., 1993, National
geophysical data grids: gamma-ray, gravity, magnetic, and topographic data for the conterminous United States: U.S. Geol.
Survey digital data series DDS-09, 1 CD-ROM.
Singer, D. A., and Kouda, R., 1999, A comparison of the weights-of-evidence method and probabilistic neural networks: Natural
Resources Research, v. 8, no. 4, p. 287-298.
Steinberg, D., and Colla, P. L., 1995, CART: Tree-structured
non-parametric data analysis: Salford Systems, San Diego,
California.
Turner, R. M., Bawiec, W. J., and Ambroziak, R. A., 1991, Geology
of Nevada; a digital representation of the 1978 geologic map
of Nevada: U.S. Geol. Survey digital data series DDS-02, 1
CD-ROM.
Vazquez-Perez, A., 1975, Economic geology of the Alamos mining
district, Sonora, Mexico: unpubl. master's thesis, Univ. Arizona,
170 p.
Wright, D. F., and Bonham-Carter, G. F., 1996, VHMS favourability mapping with GIS-based integration models, Chisel Lake-Anderson Lake area, in Bonham-Carter, G. F., Galley, A. G.,
and Hall, G. E. M., eds., EXTECH I; a multidisciplinary approach to massive sulphide research in the Rusty Lake-Snow
Lake greenstone belts, Manitoba: Geol. Survey Canada, Bull.
p. 339-379.