6, 1990
Three nonparametric techniques for the optimum discretization of quantitative geological features
are proposed and demonstrated. The three methods are: isolated weight, entropy information, and
rank correlation. Optimum discretization plays important roles in solutions to the following geoscience problems: (1) signal/noise separation and delineation of meaningful anomalies and other
geofields related to mineral targets; (2) selection of those geological variables that explain variations in mineral resources; (3) determination of the best subintervals of values for a variable with
respect to mineralization; (4) enhancement of certain complex and concealed information of a
geofeature about its correlation with magnitude of mineralization; and (5) unification of diverse
geodata so that these data can be integrated and analyzed.
KEY WORDS: optimum discretization, isolated weight, entropy information, rank correlation,
INTRODUCTION
In mineral exploration, geologists often consider many quantitative geological
measurements as discrete geological phenomena. For instance, the size of ore
deposits may be expressed in terms of qualitative categories, such as large,
medium, and small deposits. This transformation takes place in the mind of a
geologist and may be considered a type of subjective discretization. Although
such a transformation is rough and imprecise, it is useful in establishing the
idea of discretization. This paper describes nonparametric statistical techniques
for optimum discretization of a quantitative measurement.
Some major geoscience problems requiring discretization include anomaly
and background separation, delineation of mineral targets, selection of geological variables, and enhancement of geoinformation. In addition, discretization
1Manuscript received 31 July 1989; accepted 9 January 1990.
2Mineral Resources Estimation and Mineral Economics, Department of Mining and Geological
Engineering, University of Arizona, Tucson, Arizona 85721.
3Director of Mineral Economics, Department of Mining and Geological Engineering, University
of Arizona, Tucson, Arizona 85721.
0882-8121/90/0800-0699$06.00/1 © 1990 International Association for Mathematical Geology
is also useful for unifying diverse geodata for implementation of some statistical
techniques for mineral resource estimations [e.g., characteristic analysis (Botbol, 1971), pattern recognition (Agterberg, 1989; Bonham-Carter, et al., 1988)].
The common task of discretization problems is to find one or more critical
threshold values. Several often-used methods include: (1) the frequency methods [e.g., upper (lower) anomalous limit defined as $\mu + 3\sigma$ ($\mu - 3\sigma$), where
$\mu$ is the expectation of the feature and $\sigma$ the standard deviation]; (2) second
vertical derivative; and (3) trend analysis for residuals.
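As a point of comparison for the methods proposed later, the frequency method can be sketched in a few lines. This is only an illustrative sketch: the function name `anomaly_limits`, the sample data, and the use of the population standard deviation are assumptions, not part of the original paper.

```python
import statistics

def anomaly_limits(values, k=3.0):
    """Return (lower, upper) anomalous limits mu - k*sigma and mu + k*sigma.

    Assumption: sigma is the population standard deviation of the sample.
    """
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return mu - k * sigma, mu + k * sigma

# Hypothetical measurements with mean 5 and population standard deviation 2.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
lower, upper = anomaly_limits(values)
```

Note that the limits depend only on the measurement itself, which is the "self-definition" limitation discussed in the text.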
Although there does not exist a universal rule for discretization, a useful
basic principle is to discretize the quantitative measure so as to enhance as much
as possible that information of the measure that describes some other a priori
selected feature. This external feature is identified from the objective of the
analysis (e.g., estimation of resources or exploration targets of a particular
metal). Conventional approaches to discretization are not optimum in terms of
this principle, as they define the critical cutoff values only by some feature of
the measurement itself, without considering external information related to the
major objective. For instance, the upper (lower) anomalous limit method determines the threshold values simply by considering the statistical characteristics of density distribution of the feature of interest, and the second vertical
derivative method (Botbol et al., 1978) defines the critical cutoff values by the
inflection points of a curve characterizing the data measured in a profile or map.
Both of these ignore other important related information.
Three new nonparametric statistical techniques for the optimum discretization, which avoid major limitations of the "self-definition" problem associated with the traditional methods, are proposed in this paper: isolated weight
method, entropy information method, and rank correlation method. Each of
these methods is based upon associations between the variable to be discretized
and a selected feature characterizing the major objective.
ISOLATED WEIGHT METHOD
This approach is especially designed for finding a single critical threshold
value for a quantitative measurement. In other words, the goal of this method
is to transform a quantitative geodescriptor into a binary variable. Such analysis
is useful for anomalous background separation. This simple transformation also
is necessary for some statistical approaches which require binary input variables
[e.g., the characteristic analysis (Botbol, 1971; McCammon et al., 1983; Pan
and Wang, 1987; Pan and Harris, 1989a), the related information (Pan, 1985),
quantification theories (Dong et al., 1979), etc.].
Definition of Isolated Weight
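$$E^{(1)} = \{\, i \mid e_i = 1,\ i \in E \,\}, \qquad E^{(0)} = \{\, i \mid e_i = 0,\ i \in E \,\}$$
Let the numbers of elements contained in sets $E^{(1)}$ and $E^{(0)}$ be $n_1$ and $n_0$,
respectively. Then,
$$n_1 + n_0 = n, \qquad E^{(1)} \cup E^{(0)} = E, \qquad E^{(1)} \cap E^{(0)} = \varnothing$$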
$$d(e) = \sum_{p=1}^{n-1} \sum_{q=p+1}^{n} w(e_p, e_q)\,(q - p - 1) \tag{1}$$
Definition 2.
$$d_e(0, 1) = \frac{1}{\Delta d(n_1, n_0)} \left[\, d(e) - d_{\min}(e) \,\right] \tag{2}$$
where $\Delta d(n_1, n_0) = d_{\max}(e) - d_{\min}(e)$; $d_{\max}(e)$ and $d_{\min}(e)$ are the maximum
and minimum array distances in $e$ corresponding to the best and invalid DAs of
$y^*$, respectively.
On the basis of definitions 1, 2, and 3 and Theorem 1, the following results
are obvious.
Theorem 2.
(1)
(2)
(3)
(4)
dmax(e )
max [ d ( e ) ] = ~
~]
i = l t=nl--i
nOr/l(/'/
--
2)
nonl(n -
n--2
Z
2) i = l j = i + l
_<
w(ei, e j ) ( j _ i _ l ) _ 3 ` d m a x ( e ) ]
w(ei, ej)
- -
(j-
i-
1 - 3,
1)
(3)
1 - 3'
Procedure of Discretization
The basic rule of the isolated weight (IW) method for discretization is to
find a critical threshold value such that the IW defined in Eq. (3) is maximized.
Obviously, maximization of the IW is equivalent to maximizing the conformity
of the discretized variable and the selected objective feature. The term "conformity" may be illustrated by the following example. Suppose that the size of
sample (n) is 10 and number nl = 5 (no = 10 - 5 = 5). Suppose that the
x) m -
,j
= 1,2 ....
n-
de .... ( 0 , 1).
In the last step, determine the critical threshold that maximizes the isolated
weight, i.e.,
$$d_{e_k^*}(0, 1) = \max_{1 \le j \le n-1} \{\, d_{e_j}(0, 1) \,\}$$
where $e_k^*$ is one of the $e_j$'s. The optimum critical point is, therefore, $x^{(m)} = x^{(k)}$ and the optimally discretized binary array for variable $x$ is $e^* = e_k = (e_1^{(k)}, e_2^{(k)}, \ldots, e_n^{(k)})$.
It is worthwhile to point out that when variables x and y are highly correlated, the optimum critical point tends to be close to the mid-value of x. In
the extreme case, as one of the referees of this paper suggested, if the correlation between x and y is one or negative one, then the optimum critical value
determined by the above procedure will be the midpoint of x.
Case Study
Threshold    Value^a    d*(0, 1)
4.0          1 (>)      0.82
210.0        1 (>)      0.64
4.0          1 (>)      0.46
0.019        1 (>)      0.71
0.015        1 (>)      0.65
0.8          1 (>)      0.62
16.01        1 (>)      0.73
15.05        1 (>)      0.91
^a The column contains the value assignment: 1 if x > threshold, and 0 otherwise.
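$$H(y) = -\sum_{j=1}^{s} p(y_j) \ln p(y_j) \tag{4}$$
$$H_x(y) = -\sum_{j=1}^{s} p(y_j \mid x) \ln p(y_j \mid x) \tag{5}$$
$$H(y) = -\int f_y(y) \ln f_y(y)\, dy$$
and
$$H_x(y) = -\int f(y \mid x) \ln f(y \mid x)\, dy$$
$$\rho(x \to y) = \frac{\Delta I(x \to y)}{H(y)} \tag{6}$$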
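The discrete entropies in Eqs. (4) and (5) share one form, the negative sum of p ln p, so a single helper covers both. The sketch below is illustrative only: the function name `entropy` and the probability values are assumptions, and terms with zero probability are skipped by the usual convention 0 ln 0 = 0.

```python
import math

def entropy(probs):
    """Shannon entropy -sum p ln p (natural log); zero-probability terms are skipped."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical distribution of an objective variable y with s = 3 states.
p_y = [0.5, 0.25, 0.25]
# Hypothetical conditional distribution of y for one given value of x.
p_y_given_x = [0.9, 0.05, 0.05]

h_y = entropy(p_y)            # H(y), as in Eq. (4)
h_x_y = entropy(p_y_given_x)  # H_x(y), as in Eq. (5)
```

In this made-up case the conditional entropy is smaller than H(y), i.e., knowing x reduces the uncertainty about y.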
(7)
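$$\rho(x \to y) = 1 + \frac{\sum_{i=1}^{m} \sum_{j=1}^{s} p(y_j \mid x_i)\, g(x_i) \ln p(y_j \mid x_i)}{H(y)} \tag{8}$$
$$p(y_j \mid x_i) = \frac{p(x_i \mid y_j)\, p(y_j)}{g(x_i)}$$
such that
$$\rho(x \to y) = 1 - \frac{\sum_{i=1}^{m} \sum_{j=1}^{s} p(x_i \mid y_j)\, p(y_j) \ln \left( \dfrac{p(x_i \mid y_j)\, p(y_j)}{g(x_i)} \right)}{\sum_{j=1}^{s} p(y_j) \ln p(y_j)} \tag{9}$$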
The following properties of the entropy information are proven (see Appendix
B).
Theorem 3.
(1) $0 \le \rho(x \to y) \le 1$.
(2) $\rho(x \to y) = 0$ if and only if y is statistically independent of x.
(3) $\rho(x \to y) = 1$ if and only if $s \le m$ and y is a deterministic function of x,
i.e., $y = \phi(x)$, a.s.
(4) $\rho(y \to x) = 1$ if and only if $m \le s$ and x is a deterministic function of y,
i.e., $x = \psi(y)$, a.s.
(5) $\rho(x \to y) = \rho(y \to x) = 1$ if and only if $m = s$ and there exists a
deterministic function $\gamma$, such that $y = \gamma(x)$ and $x = \gamma^{-1}(y)$, a.s.
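The two boundary cases of Theorem 3 can be checked numerically. The following sketch computes the relative entropy information as one minus the ratio of the mean conditional entropy of y given x to H(y), starting from a joint probability table. The function name and the two example tables are assumptions made for illustration, and the function requires H(y) > 0.

```python
import math

def relative_entropy_information(joint):
    """rho(x -> y) from joint[i][j] = P(x = x_i, y = y_j); requires H(y) > 0."""
    g = [sum(row) for row in joint]          # marginal g(x_i)
    p_y = [sum(col) for col in zip(*joint)]  # marginal p(y_j)
    h_y = -sum(p * math.log(p) for p in p_y if p > 0)
    # Mean conditional entropy: -sum_ij P(x_i, y_j) ln p(y_j | x_i).
    h_cond = -sum(p * math.log(p / g[i])
                  for i, row in enumerate(joint) for p in row if p > 0)
    return 1.0 - h_cond / h_y

independent = [[0.25, 0.25], [0.25, 0.25]]   # y independent of x
deterministic = [[0.5, 0.0], [0.0, 0.5]]     # y is a function of x

rho_independent = relative_entropy_information(independent)
rho_deterministic = relative_entropy_information(deterministic)
```

Passing the transposed table gives the measure in the other direction, i.e., of y on x.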
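$$\hat{p}(x_i \mid y_j) = \frac{n_{ij}}{n_{\cdot j}}, \qquad \hat{p}(y_j) = \frac{n_{\cdot j}}{n}, \qquad \hat{g}(x_i) = \frac{n_{i \cdot}}{n} \tag{10}$$
$$\hat{\rho}(x \to y) = 1 - \frac{\sum_{i=1}^{m} \sum_{j=1}^{s} n_{ij} \ln \left( n_{ij} / n_{i \cdot} \right)}{\sum_{j=1}^{s} n_{\cdot j} \ln \left( n_{\cdot j} / n \right)} \tag{11}$$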
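A minimal sketch of the maximum likelihood estimate follows, assuming the contingency table is stored as a list of rows `counts[i][j]` (row i for the x class, column j for the y class). It evaluates one minus the ratio of the double sum of n_ij ln(n_ij / n_i.) to the sum of n_.j ln(n_.j / n). The function name and the example counts are hypothetical, empty cells are skipped, and at least two observed y classes are assumed so that the denominator is non-zero.

```python
import math

def rho_ml(counts):
    """ML estimate of the relative entropy information from counts[i][j]."""
    n = sum(sum(row) for row in counts)
    row_tot = [sum(row) for row in counts]        # n_i.
    col_tot = [sum(col) for col in zip(*counts)]  # n_.j
    num = sum(c * math.log(c / row_tot[i])
              for i, row in enumerate(counts) for c in row if c > 0)
    den = sum(c * math.log(c / n) for c in col_tot if c > 0)
    return 1.0 - num / den

# Hypothetical 2 x 2 table: x class (rows) against y class (columns).
counts = [[8, 2], [2, 8]]
rho = rho_ml(counts)
```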
For the purpose of robustness, it is necessary to use the Bayesian estimators for probabilities $p(y_j)$, $g(x_i)$, and $p(x_i \mid y_j)$:
$$\hat{p}(y_j) = \frac{n_{\cdot j} + 1}{n + s}, \qquad \hat{g}(x_i) = \frac{n_{i \cdot} + 1}{n + m}, \qquad \hat{p}(x_i \mid y_j) = \frac{n_{ij} + 1}{n_{\cdot j} + m} \tag{12}$$
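The add-one form of the estimators in (12) keeps every probability strictly positive even when a cell of the contingency table is empty. The sketch below is illustrative; the function name, the table layout `counts[i][j]`, and the example counts are assumptions.

```python
def bayes_estimates(counts):
    """Add-one estimates of p(y_j), g(x_i) and p(x_i | y_j) from counts[i][j]."""
    m, s = len(counts), len(counts[0])
    n = sum(sum(row) for row in counts)
    row_tot = [sum(row) for row in counts]        # n_i.
    col_tot = [sum(col) for col in zip(*counts)]  # n_.j
    p_y = [(col_tot[j] + 1) / (n + s) for j in range(s)]
    g_x = [(row_tot[i] + 1) / (n + m) for i in range(m)]
    p_x_given_y = [[(counts[i][j] + 1) / (col_tot[j] + m) for j in range(s)]
                   for i in range(m)]
    return p_y, g_x, p_x_given_y

# Hypothetical table with one empty cell.
counts = [[3, 0], [1, 4]]
p_y, g_x, p_x_given_y = bayes_estimates(counts)
```

Each estimated distribution still sums to one, and the empty cell receives probability 1/6 instead of zero.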
Given these estimators, the Bayesian estimate of the mean of the relative entropy information in (9) is given by:
y) = 1
~,
i=l j = l
( n j + 1)
In
-\n.j
j=l
q-
\nj +
+ 1
-~-
(13)
Implementation of Discretization
The basic objective of this approach is to choose a scheme for discretizing
the interval $x_0 < x < x_{\max}$ into $m$ ($m > 1$) subintervals. This is equivalent to
determining $m - 1$ threshold values. One criterion for such a performance is
to select a set of $m - 1$ critical cutoff values within the range $[x_0, x_{\max}]$ such
that the estimated relative entropy information of x on the objective variable y
is maximized. Such a scheme will be considered as the best discretization of x
into m subintervals with respect to y.
Conceptually, this optimum discretization may be cast as a nonlinear programming problem. The most convenient and practical method, however, is
still a trial-and-error search algorithm, particularly when the number of threshold values being determined is not large. At first glance, a thorough search
appears to be hopeless, as the possible schemes of discretization for a quantitative variable are infinite. Fortunately, this is not true. Denote the $m - 1$
threshold values by $x_0 = (x_0^{(1)}, x_0^{(2)}, \ldots, x_0^{(m-1)})$, where $x_0^{(1)} < x_0^{(2)} < \cdots < x_0^{(m-1)}$. Clearly, all of the threshold values must be selected on the interval $[x_0, x_{\max}]$. More precisely, each of these threshold values must be determined within
one of the $n - 1$ intervals $[x_i, x_{i+1}]$ for $i = 1, 2, \ldots, n - 1$ (here, x is
assumed to be ordered). Adopting an approach similar to that used in the isolated weight method, we consider the midvalues of these intervals, $x^{(1)}, x^{(2)}, \ldots, x^{(n-1)}$, as the possible candidates for the $m - 1$ optimum threshold values.
With this specification, a thorough search algorithm is feasible and effective.
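The reduction to a finite search can be made concrete with a short sketch. The function names and data are hypothetical, and dropping tied observations before taking midvalues is an added assumption (the text does not discuss ties).

```python
from itertools import combinations

def candidate_thresholds(x):
    """Midvalues between adjacent distinct ordered observations of x."""
    xs = sorted(set(x))  # assumption: tied values are collapsed
    return [(a + b) / 2 for a, b in zip(xs, xs[1:])]

def discretization_schemes(x, m):
    """All ordered sets of m - 1 candidate thresholds (m subintervals)."""
    return list(combinations(candidate_thresholds(x), m - 1))

x = [3.0, 1.0, 2.0, 5.0]
candidates = candidate_thresholds(x)
ternary_schemes = discretization_schemes(x, m=3)
```

With n distinct observations there are n - 1 candidates, so a binary search evaluates n - 1 schemes and a ternary search evaluates (n - 1)(n - 2)/2.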
In the majority of the practical cases, the binary or ternary discretization
of a quantitative measurement is satisfactory. For the ternary transformation, a
search procedure may be developed on the basis of the procedure suggested
above. Here, a detailed search algorithm is presented for the binary transformation only:
1. Select a qualitative feature ($y$) being of the most interest as the objective
variable, which takes $s$ possible values, $y_1 < y_2 < \cdots < y_s$.
2. Determine the minimum and maximum values ($x_0$ and $x_{\max}$) of $n$ observations on variable $x$ to be discretized. Compute the difference $\Delta = x_{\max} - x_0$ and
step length $\Delta x = \Delta / N$, where $N$ is the total number of discretizing schemes.
3. For any given scheme $k$, $x$ is discretized into a binary array by using the
quantity $x^{(k)} = x_0 + k\,\Delta x$ as the threshold value. On the basis of this discretization, construct a two-dimensional contingency table [3 × (s + 1)], containing frequencies $n_{ij}$ ($i = 1, 2$, and $j = 1, 2, \ldots, s$).
4. Based upon the data in the contingency table, estimate the relevant probabilities in (10) for the maximum likelihood method, or in (12) for the Bayesian
robust method. Then, using Eq. (11) or (13), compute the estimate of the average relative entropy information for the $k$th scheme, $\hat{\rho}_k(x \to y)$.
5. Repeating the steps above $N$ times, we obtain $N$ estimated means of the
relative entropy information, $\hat{\rho}_1(x \to y), \hat{\rho}_2(x \to y), \ldots, \hat{\rho}_N(x \to y)$. Then,
determine the largest mean: $\hat{\rho}_l(x \to y) = \max_k \{\hat{\rho}_k(x \to y)\}$ ($1 \le l \le N$).
Subsequently, the optimum threshold value is $x^{(l)} = x_0 + l\,\Delta x$.
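The binary search procedure above can be sketched as follows, using the maximum likelihood estimate. The function names, the toy data, and the convention that observations greater than the threshold receive the value 1 are assumptions for illustration. The table built here omits the marginal totals, and at least two observed classes of y are assumed.

```python
import math

def rho_ml(counts):
    """ML estimate of the relative entropy information from counts[i][j]."""
    n = sum(sum(row) for row in counts)
    row_tot = [sum(row) for row in counts]
    col_tot = [sum(col) for col in zip(*counts)]
    num = sum(c * math.log(c / row_tot[i])
              for i, row in enumerate(counts) for c in row if c > 0)
    den = sum(c * math.log(c / n) for c in col_tot if c > 0)
    return 1.0 - num / den

def best_binary_threshold(x, y, n_steps):
    """Scan thresholds x0 + k*dx (k = 1..n_steps) and keep the one maximizing rho."""
    x0, x_max = min(x), max(x)
    dx = (x_max - x0) / n_steps
    cats = sorted(set(y))
    best_t, best_rho = None, -math.inf
    for k in range(1, n_steps + 1):
        t = x0 + k * dx
        counts = [[0] * len(cats) for _ in range(2)]  # rows: x <= t, x > t
        for xi, yi in zip(x, y):
            counts[1 if xi > t else 0][cats.index(yi)] += 1
        r = rho_ml(counts)
        if r > best_rho:
            best_t, best_rho = t, r
    return best_t, best_rho

# Hypothetical data: y switches class between x = 4 and x = 5.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, rho = best_binary_threshold(x, y, n_steps=7)
```

In this toy case the scan selects the threshold 4.0, at which the binary variable separates the two classes of y completely.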
It should be noted that the foregoing discussion requires that the objective
variable y be qualitative. In order to satisfy this requirement in the cases where
rxy =
1 --
E (e,-
Hill
Yi)
Case Study
The entropy information method described above is applied to the problem
of delineating anomalous targets for the epithermal gold-silver deposits in the
Walker Lake quadrangle, which comprises the area between 38° and 39° North
latitude and 118° and 120° West longitude and includes parts of the states of
California and Nevada. The geochemical data used in this study were collected
from stream sediment samples. Among 30 elements analyzed, 14 elements,
including Au, Ag, Cu, Pb, Zn, Fe, Ca, Sb, Zr, V, Bi, Mo, Be, and B, were
employed in this analysis.
These elements were synthesized into a single measurement, the geochemical score, by using the model referred to as the weighted and targeted
multivariate criterion (Harris and Pan, 1987, 1989a). These scores were then
filtered for noise. The filtered scores are contoured and shown in Fig. 1.
Consider the objective of delineating anomalous targets for the exploration
of epithermal gold-silver deposits. This objective can be perceived as a problem
in optimum discretization of the synthesized geochemical measurement into a
binary variable representing anomaly and background. In order to enhance the
information of the scores about the gold-silver deposits, the sum of gold and
silver concentrates was selected as the objective variable. The entropy information method was then applied to the geochemical scores and the optimum
threshold value was found to be about 580 (Fig. 2). The value of 1 was assigned
to the scores greater than 580 and 0 to the scores less than or equal to 580.
Finally, exploration targets for the epithermal gold-silver deposits were delineated as those areas, such as Windmill, across the entire Walker Lake quadrangle that are represented by a value of 1 and do not have known deposits (Fig.
3).
Fig. 2. Optimal discretization for the filtered geochemical scores using the entropy information method (axis label: weighted scores).
Let us suppose x and y are the quantitative and objective variables, respectively, that have been observed on a sample of size n. Rearrange the elements in y and obtain $y_0 = (y_{i_1}, y_{i_2}, \ldots, y_{i_n})$, where $y_{i_1} \le y_{i_2} \le \cdots \le y_{i_n}$.
Divide the range $D = [y_{i_1}, y_{i_n}]$ into t mutually exclusive subintervals denoted
by $D_1, D_2, \ldots, D_t$ ($t < n$). Let $y^*$ be an array containing midvalues of these
subintervals: $y^* = (y_1^*, y_2^*, \ldots, y_t^*)$, which is referred to as the standard array
of y. Clearly, the rank of $y_k^*$ is k for $k = 1, 2, \ldots, t$. Select the minimum and
maximum values, $x_0$ and $x_{\max}$, for measurement x and then divide the region $E = [x_0, x_{\max}]$ into s mutually exclusive subregions denoted by $E_1, E_2, \ldots, E_s$ ($s < n$).
For any given subinterval $E_i$, probabilities $p_{ij} = P(x \in E_i,\ y \in D_j)$ ($j = 1, 2, \ldots, t$) may be computed:
$$p_{ij} = \int_{D_j} \int_{E_i} f(x, y)\, dx\, dy, \qquad j = 1, 2, \ldots, t \tag{14}$$
$$\rho_{xy}^{(i)} = 1 - \frac{6}{t(t^2 - 1)} \sum_{k=1}^{t} (r_{ik} - k)^2, \qquad i = 1, 2, \ldots, s \tag{15}$$
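As a rough illustration of the partial rank correlation, the sketch below assumes the standard Spearman form, one minus 6 times the sum of squared rank differences divided by t(t² - 1), with r_ik taken to be the rank of p_ik among the t joint probabilities of subregion E_i. Both the coefficient 6 and this reading of r_ik are assumptions, since the surrounding definitions are incomplete in this copy. Ties are broken by position, and t ≥ 2 is required.

```python
def partial_rank_correlation(p_row):
    """Spearman-type rank correlation between p_row and the ranks 1..t."""
    t = len(p_row)
    order = sorted(range(t), key=lambda k: p_row[k])
    rank = [0] * t
    for r, k in enumerate(order, start=1):
        rank[k] = r  # rank of p_row[k] among the t probabilities
    d2 = sum((rank[k] - (k + 1)) ** 2 for k in range(t))
    return 1 - 6 * d2 / (t * (t * t - 1))

# Hypothetical joint probabilities of one subregion with t = 4 classes of y.
increasing = partial_rank_correlation([0.1, 0.2, 0.3, 0.4])
decreasing = partial_rank_correlation([0.4, 0.3, 0.2, 0.1])
```

A value near 1 means the subregion co-occurs more often with the larger values of y; a value near -1 means the opposite.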
1) k=, ,=~+,
Wk,(Pk -- 0,) 2
(16)
where wkt = min(po~, p 0 / ) / m a x (Pok, P01), Poj = Eti= J Pik, and E represents
any discretization scheme for x.
Procedure for Determining Threshold Values
The basic strategy for finding the optimum discretizing scheme for the
entropy information method discussed in the last section is also appropriate for
the rank correlation method. The criterion employed here for determining critical threshold values for a quantitative measurement x is to maximize the quantity (16). Given $\Delta\rho^2(E)$, find $E^*$ such that
$$\pi(E^*) = \max_{E} \{\Delta\rho^2(E)\} \tag{17}$$
where E* is called the optimum discretization for x with respect to y*. Clearly,
for a division of s subintervals, s - 1 optimum threshold values are sought.
The rationale for criterion (17) is intuitive because it is based upon partial
rank correlation, PRCC in (15), which describes the rank consistency between
two sequences: the joint occurrence probabilities defined in (14) and the standard array $y^*$. Larger $\rho_{xy}^{(i)}$ values indicate that the subregion $E_i$ of measurement
x more likely corresponds with the larger values of the objective variable y,
and vice versa. Therefore, maximization of the partial rank correlation difference in (17) enhances the contrasts between the PRCCs as much as possible.
One consequence of this is the maximum separation of two groups of subregions. One of the groups contains those subregions having the most positive
PRCCs with $y^*$, meaning that these regions most likely co-occur with the largest values of y, whereas the other includes those subregions having the most
negative PRCCs, suggesting that they are most likely associated with the smallest values of y. For example, let s = 2, meaning that x is transformed into a
binary variable. Denote two subregions by 0 and 1, respectively, and assume
that $\hat{\rho}_{xy}^{(1)} > 0$ and $\hat{\rho}_{xy}^{(2)} < 0$. Then, criterion (17) would lead to an optimum
binary discretization of x in that value "1" represents information of x about
the largest values of y, while "0" represents information of x about the smallest
values of y. If y is the size of ore deposits, then observations of "1" on the
discretized variable indicate possible occurrence of large deposits, while "0"
indicates small or no deposits.
Value Assignment to the Discretized Variables
A general rule for value assignment does not exist for this method. Only the
principles for value assignments to binary and ternary variables are suggested below. These principles are useful when the discretization is motivated
by the objective of mineral resources estimation.
In the binary cases, two PRCCs are computed for two discretized subregions. Assign 1 and 0 to the two subregions according to the following rules.
* If the signs of $\hat{\rho}_{xy}^{(1)}$ and $\hat{\rho}_{xy}^{(2)}$ are opposite and $|\hat{\rho}_{xy}^{(1)} - \hat{\rho}_{xy}^{(2)}| > c$, where c
is a positive number, then the subregion corresponding to the positive $\hat{\rho}$
is assigned a value of 1; the other is given 0.
* If $\hat{\rho}_{xy}^{(1)}$ and $\hat{\rho}_{xy}^{(2)}$ have the same sign and $|\hat{\rho}_{xy}^{(1)} - \hat{\rho}_{xy}^{(2)}| > c$, then the subregion with the larger absolute value is assigned a value of 1, and the
other, 0.
* When $|\hat{\rho}_{xy}^{(1)} - \hat{\rho}_{xy}^{(2)}| \le c$, this variable may be deleted, if it is believed
to be not strongly correlated with the objective variable y.
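The three binary rules can be written as a small decision function. This is a sketch under assumptions: the function name and the return convention (a pair of values, or `None` when the variable is a candidate for deletion) are hypothetical, and because the inequality signs are partly illegible in this copy, the strict "greater than c" reading is used for the first two rules.

```python
def assign_binary(rho1, rho2, c):
    """Return the values for subregions 1 and 2, or None if the contrast is too weak."""
    if abs(rho1 - rho2) <= c:
        return None                      # weak contrast: variable may be deleted
    if rho1 * rho2 < 0:                  # opposite signs: the positive PRCC gets 1
        return (1, 0) if rho1 > 0 else (0, 1)
    # same sign: the PRCC with the larger absolute value gets 1
    return (1, 0) if abs(rho1) > abs(rho2) else (0, 1)

# Hypothetical PRCC pairs with c = 0.2.
opposite = assign_binary(0.6, -0.3, 0.2)
same_sign = assign_binary(-0.8, -0.2, 0.2)
weak = assign_binary(0.3, 0.25, 0.2)
```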
In ternary cases, three PRCCs are computed for the three discretized subregions. Assign value 1, 0, or -1 to each of the 3 subregions according to the
following rules.
* If different signs exist among the three PRCCs, and $\max_{i \ne j} |\hat{\rho}_{xy}^{(i)} - \hat{\rho}_{xy}^{(j)}| > c$, then the subregion with the largest positive PRCC is given a
value of 1, the subregion with the smallest PRCC a value of -1, and
the other a value of 0.
* When the PRCCs have the same sign, but $|\hat{\rho}_{xy}^{(i)} - \hat{\rho}_{xy}^{(j)}| > c$ for all $i \ne j$, then the subregion with the largest PRCC is given a value of 1 if
PRCCs are positive, and value -1 if the PRCCs are negative; other
subregions are given a value of 0.
* Except for the two cases above, when the variable x is believed to be
weakly correlated with the variation of the objective variable y, it may
be deleted.
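A corresponding sketch for the ternary rules follows. It reads the rules literally, in particular the same-sign rule, in which the subregion with the largest PRCC receives 1 (all PRCCs positive) or -1 (all negative). The function name, the `None` return for the deletion case, and the example values are assumptions.

```python
def assign_ternary(rhos, c):
    """Return values (1, 0 or -1) for three subregions, or None for the deletion case."""
    diffs = [abs(rhos[i] - rhos[j]) for i in range(3) for j in range(i + 1, 3)]
    mixed_signs = min(rhos) < 0 < max(rhos)
    values = [0, 0, 0]
    if mixed_signs and max(diffs) > c:
        values[rhos.index(max(rhos))] = 1    # largest positive PRCC
        values[rhos.index(min(rhos))] = -1   # smallest PRCC
        return tuple(values)
    if not mixed_signs and min(diffs) > c:   # |difference| > c for every pair
        values[rhos.index(max(rhos))] = 1 if max(rhos) > 0 else -1
        return tuple(values)
    return None

# Hypothetical PRCC triples with c = 0.3.
mixed = assign_ternary([-0.2, 0.1, 0.7], 0.3)
positive = assign_ternary([0.1, 0.5, 0.9], 0.3)
weak = assign_ternary([0.1, 0.2, 0.9], 0.3)
```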
The rank correlation method described above was applied to a set of data
collected in the Walker Lake 1° × 2° quadrangle. The data set consists of 9
integrated geofeatures, which are briefly described as follows (Harris and Pan,
1987, 1988, 1989a, b; Pan and Harris, 1989b): $x_1$, filtered geochemical scores
that were derived from synthesis of the 14 elements sampled from drainage
basins; $x_2$, high pass structural fields that were obtained by synthesis of the 10
structural descriptors related to faults; $x_3$, band pass gravity fields that were
derived from coherency analysis between high pass isostatic gravity fields and
filtered geochemical fields; $x_4$, band pass magnetic fields that were derived from
coherency analysis between high pass magnetic fields and filtered geochemical
fields; $x_5$, ratio of rock density to susceptibility contrast estimated by a Poisson
moving window, based upon high pass gravity and magnetic fields; $x_6$, correlation between high pass gravity and magnetic fields estimated by a Poisson
moving window; $x_7$, area of host rocks (in km²) outcropped within a cell for
epithermal gold-silver deposits; $x_8$, area of Tertiary intrusives that outcrop
within a cell; and $x_9$, area of hydrothermal alterations found within a cell.
Each of these geofeatures is valued on a 55 × 55 inter-grid matrix across
the Walker Lake region. In order to apply the discretization approach, a region
located chiefly in the Aurora 15' quadrangle and containing 324 sample locations was selected as a control region. Using the number of epithermal gold-silver mineral occurrences as the objective variable, these quantitative measures
were discretized optimally into ternary variables by the rank correlation method.
The basic results of this transformation are shown in Table 2, where $c_1^*$ and $c_2^*$
are the two optimum threshold values. The best subintervals are recognized in
terms of their correlations with mineral occurrences. For example, geochemical
scores (x~) greater than 558.5 (value o f 1) are most favorable for mineral oc-
p~3)
Ap*
c*
c*
<c*
[c*, c*]
> c~'
Xl
X2
0.031
3705.5
-0.179
-0.17
0.679
0.477
26.38
558.5
0
0
1
- 17.35
16.93
0.821
-0.857
0.714
0.440
--2.494
4.365
1
-1
1
X3
X4
X5
X6
X7
X8
X9
-0.864
0.000
0.000
0.000
0.845
6.330
0.880
0.840
-0.857 -0.821 -1.000 -0.750
0 . 6 4 3 0.679
0 . 1 7 9 0.607
0.893
0 . 8 5 7 0 . 6 7 9 0.750
0.664
0 . 5 7 7 0 . 7 0 3 0.599
--0.010
0.211
0.059
0.084
0.389
1.477
0 . 2 9 3 0.336
-1
-1
-1
-1
1
1
0
1
1
1
1
1
currence, while scores less than the same cutoff offer little evidence for the
occurrence of mineral deposits. Another interesting feature is that the mid-subintervals of some variables are most valuable for indicating existence of mineralization, e.g., the interval [0.146, 1.569] mgal of band pass gravity ($x_3$) and
the interval [-66.85, -39.86] gamma of band pass magnetics ($x_4$). The third
feature is that the discretization reveals operational directions of the variables
vis-a-vis objective variables. For example, geochemical scores, host rocks, hydrothermal alterations, etc., are positively associated with the number of mineral occurrences.
SUMMARY
The three techniques for the optimum discretization proposed and demonstrated in this paper are potentially useful for many geological problems.
Possible applications include the following:
1. Defining the optimum boundaries of geologic objects, geofields, and
various anomalous targets. These geological boundaries are important
in mineral exploration and mineral resource estimation, as mineral endowment units of various scales and kinds are closely related to these
geologic boundaries.
2. Revealing those subregions of a geologic variable carrying the most
information about the variations of mineral resources, although overall
the variable may insignificantly correlate with mineral resource descriptors.
3. Refining and selecting important and useful geological variables based
upon the maximum correlations between the geologic measurements
and some objective variable.
4. Unifying diverse geodata through transformation of quantitative geologic variables into binary or ternary data for methods (e.g., characteristic analysis)
that require binary or ternary input variables.
5. Recognizing the operational directions of variables in relation to variations of the objective variable, i.e., detecting whether a geological measurement is a positive or negative factor in terms of its influence on the
objective variable.
APPENDIX A
Proof for Theorem 1
1, 0,
0 .....
0)
(ekl,
ek2)
O)
+ (k, + k2 -
1)
1,(3, 1, 1 . . . . .
1, 0, 0, . . . .
0)
= (e~, + t, e ~ ) = (e~, O)
where
e~,+~ = (1, 1 . . . . .
1 , 0 , 1, 1 . . . . .
1), e~2 = ( 0 , 0 . . . .
0)
1,13, 1, 1 . . . . .
0 .....
0)
Accordingly, we have
d(e;+,)
= d(e~,) + (k2 -
1) + . . .
+ (kl + k2 - i -
+ (k, + k2 - i + 1) + . . .
+ (k, + k 2 -
1)
1)
= d ( e ~ ) + k2 + (k2 + l ) + . . .
+ (k, + k 2 -
1) - ( k -
+ (k I +
1) - (k - i -
= d ( e k + , ) - (k - i + 1) _<
k2 -
i + 1)
1)
d(ek+,)
For the case that $e_{k+1} = 1$, a similar result can also be obtained. Therefore,
by induction, the proof for the first part of
Theorem 1 is completed.
, 0 , 1)
The distance associated with this array is denoted by $d(e_k)$. Append an additional element, $e_{k+1} = 0$, to the end of the array $e_k$ and form a new array:
ek+ , = (0, 1, 0, 1 . . . . .
where e k_ 1 = (0, 1, 0, 1 . . . . .
pute the array distance:
0, 1, 0) = (ek, 0) = (e~_l, 1, 0)
0), containing the first k -
I elements. Com-
+ (k - 2 ) ]
However, if the additional element ek +1 is inserted into the array ek at the ith
position (1 < i __ k), it destroys the original configuration of the array (i.e.,
I1 :g 1 and l o :~ 1 ). In order to compute the new array distance, this array is
reversed (note that such a modification does not alter the array distance). Then,
the new array becomes:
e~+, = ( 1 , 0 , 1 , 0 . . . . .
1, 0 , 0 , 1 , 0 , . .
= d(e;_,)
+ (k-
+ [2 + 4 + . . .
i + 1) + ( k -
+ [2 + 4 + . . .
+ (k-
i -
+ (k-
i + 3) + . . .
+ (k-
1) + ( k -
i-
2)
+ (k-
1)]
+ (k-
3)]
i - 2)
i + 1) + . . .
= d(e~,_l) + 2 [ 2 + 4 + . .
+ (k-
2)]
Finally, we have
d(e~+,) - d(ek+,) = d(e_,
- d(ek_,)
>_ 0
The proof is given only for the cases where both x and y are discrete.
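Equation (8) can be written as $\rho(x \to y) = 1 - \nu_x(y)$, where
$$\nu_x(y) = \frac{\sum_{i=1}^{m} \sum_{j=1}^{s} p(y_j \mid x_i)\, g(x_i) \ln p(y_j \mid x_i)}{\sum_{j=1}^{s} p(y_j) \ln p(y_j)}$$
Clearly, $\nu_x(y) \ge 0$. Furthermore, if y is statistically independent of x, then $p(y_j \mid x_i) = p(y_j)$ and
$$\sum_{i=1}^{m} \sum_{j=1}^{s} p(y_j \mid x_i)\, g(x_i) \ln p(y_j \mid x_i) = \sum_{j=1}^{s} p(y_j) \ln p(y_j) = -H(y)$$
Thus, $\nu_x(y) = 1$, meaning that $\rho(x \to y) = 0$, which completes the proof for
the second part of the theorem.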
We know that $\rho(x \to y) = 1$ if and only if $\nu_x(y) = 0$, which suggests
that for any given i ($i = 1, 2, \ldots, m$), there exists a unique $j^*$, such that
$$p(y_j \mid x_i) = \begin{cases} 1, & j = j^* \\ 0, & j \ne j^* \end{cases} \qquad j = 1, 2, \ldots, s$$
This implies that there exists a function $\phi$, such that $y_{j^*} = \phi(x_i)$ with probability
1. Because i is arbitrary, we have
$$y = \phi(x), \quad \text{a.s.}, \quad s \le m$$
which completes the proof for the third part of the theorem. Similarly, we also
can prove the fourth part of the theorem, i.e.,
$$x = \psi(y), \quad \text{a.s.}, \quad m \le s$$
$$x = \psi(y), \quad \text{a.s.}, \quad s = m$$
Next, prove $\phi$ and $\psi$ are mutually reversible from one to the other. Suppose
that there exist $j^*$ and $i^*$, such that $y_{j^*} = \phi(x_{i^*})$, i.e., $p(y_{j^*} \mid x_{i^*}) = 1$. Then,
we have
j=1
> o
ACKNOWLEDGMENTS
Grateful acknowledgement is made of the support of the data provided by
various U.S. Geological Survey offices and personnel. Special thanks are given
to the guidance provided by Dr. David Menzie. I also wish to thank the referees
for their valuable comments and suggestions. Finally, we are appreciative of
the assistance of Alice Yelverton and Yinghong Miao in preparation of tables
and figures.
REFERENCES
Agterberg, F. P., 1989, Systematic approach to dealing with uncertainty of geoscience information in mineral exploration, in Weiss, A. (Ed.), 21st Application of Computers and Operations Research in the Mineral Industry, p. 165-178.
Bonham-Carter, G. F., Agterberg, F. P., and Wright, D. F., 1988, Integration of geological datasets for gold exploration in Nova Scotia: Photogram. Engin. Remote Sensing, v. 54, p. 1585-1592.
Botbol, J. M., 1971, An application of characteristic analysis to mineral exploration: Proc. 9th Int. Sym. on Techniques for Decision-Making in the Mineral Industry, Special v. 12, p. 92-99.
Botbol, J. M., Sinding-Larsen, R., and McCammon, R. B., 1978, A regionalized multivariate approach to target selection in geochemical exploration: Econ. Geol., v. 73, p. 534-546.
Dong, W. Q., Zhou, G. Y., and Xia, L. X., 1979, Theory of quantifications and their applications (in Chinese): Jilin People's Publisher, Changchun, 197 p.
Harris, D. P., and Pan, G. C., 1987, An investigation of quantification methods and multivariate relations designed explicitly to support the estimation of mineral resources--intrinsic samples: Report on Research Sponsored by U.S. Geological Survey Grant No. 14-08-0001-G1399, 200 p.
Harris, D. P., and Pan, G. C., 1988, Intrinsic sample methodology, in Gaal, G., and Merriam, D. F. (Eds.), Computer Applications in Resource Exploration: Prediction and Assessment for Petroleum, Metals and Nonmetals: v. 6, Computers and Geology Series, Pergamon Press, New York.
Harris, D. P., and Pan, G. C., 1989a, Updated concepts of intrinsic samples and methodology for simultaneous estimation of discovered resources and endowments: Report on Research Sponsored by U.S. Geological Survey, in preparation.
Harris, D. P., and Pan, G. C., 1989b, Information fields and exploration targets with a demonstration on the Walker Lake quadrangle of Nevada and California: Math. Geol., submitted.
McCammon, R. B., Botbol, J. M., Sinding-Larsen, R., and Bowen, R. W., 1983, Characteristic analysis--1981: Final program and a possible discovery: Math. Geol., v. 15, p. 59-83.
Pan, G. C., 1985, Quantitative mineral resource assessment on the pegmatitic Nb-Ta mineral deposits in Fujian Province of China--Method Investigation (in Chinese): M.S. thesis, Changchun College of Geology, 141 p.
Pan, G. C., and Harris, D. P., 1989a, Decomposed and weighted characteristic analysis in the quantitative evaluation of mineral resources with a case study on the pegmatitic Nb-Ta deposits in China: Math. Geol., submitted.
Pan, G. C., and Harris, D. P., 1989b, Quantitative analysis of anomalous sources and geochemical signatures in the Walker Lake quadrangle of Nevada and California: J. Geochem. Explor. (in press).
Pan, G. C., and Wang, Y., 1987, Weighted characteristic analysis and its applications in the assessment of the pegmatitic Nb-Ta mineral resources in Fujian Province: Geol. Prospect., v. 23, p. 34-42.
Pan, G. C., and Xia, L., 1988, Methods for quantification of association between variables by means of information theory: Math. Stat. Applied Prob., v. 3, p. 7-20.