Potential Modeling:
A Concise Survey of Models and Methods
Helmut Schaeben
Department of Geophysics and Geoinformatics
Potential modeling
Contents
Introduction
Objective
Prerequisites
Methods
Weights-of-evidence
Logistic regression
Artificial neural nets
Comparison
Examples
Conclusions
Potential Modeling
Objective
The ultimate goal of potential modeling or targeting is to
recognize locations for which the probability of a target event T
like a specific mineralization is a relative maximum.
The event must be sufficiently well understood in terms of cause
and effect to collect data on spatially referenced
factors $B_\ell$, $\ell = 0, \ldots, m$, in favor of or against the occurrence of the event $T$.
Then spatially referenced posterior probabilities given the factors
can be estimated by several approaches including
weights-of-evidence, logistic regression, fuzzy logic, artificial neural
nets, statistical learning, support vector machines, and others.
These methods require a training area to estimate the parameters
of the model $M(\theta_0, \ldots, \theta_m \mid (b_{0,i}, \ldots, b_{m,i}, t_i)_{i=1,\ldots,n})$.
Definition of Terms
Odds
For probabilities $P(T) \neq 1$, $T \in \mathcal{A}$, odds are defined as the ratio
\[ O(T) = \frac{P(T)}{1 - P(T)} = \frac{P(T)}{P(\complement T)}, \quad P(T) \in [0, 1). \]
Logits
Logits are defined as
\[ \operatorname{logit}(T) = \ln O(T) = \ln \frac{P(T)}{1 - P(T)}, \quad P(T) \in (0, 1). \]
Logistic function
The logistic function is defined as
\[ \Lambda(z) = \frac{1}{1 + \exp(-z)}, \quad z \in \mathbb{R}. \]
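As a minimal sketch of the three definitions above, the following Python functions compute odds, logits and the logistic function. The function names are chosen here for illustration and do not come from the slides. The sketch also shows that the logistic function is the inverse of the logit.

```python
import math

def odds(p: float) -> float:
    """Odds O = p / (1 - p), defined for p in [0, 1)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return p / (1.0 - p)

def logit(p: float) -> float:
    """Natural logarithm of the odds; requires p in (0, 1)."""
    return math.log(odds(p))

def logistic(z: float) -> float:
    """Logistic function 1 / (1 + exp(-z)), the inverse of the logit."""
    return 1.0 / (1.0 + math.exp(-z))

p = 0.25
round_trip = logistic(logit(p))  # recovers p up to floating-point error
```

A probability of 0.25 corresponds to odds of 1/3, and mapping its logit back through the logistic function returns 0.25.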
[Figure: the logistic function plotted for z from -10 to 10; vertical axis from 0.0 to 1.0.]
Conditional independence of random variables
Two random variables $B_1$ and $B_2$ are conditionally independent
given the random target variable $T$, $B_1 \perp\!\!\!\perp B_2 \mid T$, if the joint
conditional probability factorizes into the individual conditional
probabilities
\[ P_{B_1 B_2 \mid T} = P_{B_1 \mid T} \, P_{B_2 \mid T}. \]
Conditional independence in terms of irrelevance
Equivalently, but more instructively, two random variables $B_1$, $B_2$
are conditionally independent given the random variable $T$, if
knowing $T$ renders $B_2$ irrelevant for predicting $B_1$.
In terms of conditional probabilities,
\[ P_{B_1 \mid B_2 T} = P_{B_1 \mid T}. \]
Independence vs. conditional independence
Independence does not imply conditional independence and vice versa.
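The last statement can be checked on two small hypothetical distributions that are not taken from the slides. In the first, $B_1$ and $B_2$ are independent fair coins and $T$ is their exclusive-or. In the second, $T$ is a fair coin and $B_1$, $B_2$ are drawn independently given $T$. The helper names below are illustrative only.

```python
from itertools import product

def marginal(joint, keep):
    """Sum a joint pmf {(b1, b2, t): prob} over the variables not in `keep`."""
    out = {}
    for outcome, prob in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + prob
    return out

def independent(joint, tol=1e-12):
    """Check P(B1, B2) = P(B1) P(B2)."""
    p12 = marginal(joint, (0, 1))
    p1, p2 = marginal(joint, (0,)), marginal(joint, (1,))
    return all(abs(p12.get((a, b), 0.0) - p1[(a,)] * p2[(b,)]) < tol
               for a, b in product((0, 1), repeat=2))

def conditionally_independent(joint, tol=1e-12):
    """Check P(B1, B2 | T) = P(B1 | T) P(B2 | T) for every value of T."""
    pt = marginal(joint, (2,))
    p1t, p2t = marginal(joint, (0, 2)), marginal(joint, (1, 2))
    for a, b, t in product((0, 1), repeat=3):
        lhs = joint.get((a, b, t), 0.0) / pt[(t,)]
        rhs = (p1t.get((a, t), 0.0) / pt[(t,)]) * (p2t.get((b, t), 0.0) / pt[(t,)])
        if abs(lhs - rhs) > tol:
            return False
    return True

# Hypothetical case 1: B1, B2 independent fair coins, T = B1 XOR B2.
xor_joint = {(a, b, a ^ b): 0.25 for a, b in product((0, 1), repeat=2)}

# Hypothetical case 2: T is a fair coin; given T, B1 and B2 are independent
# with P(B = 1 | T = 1) = 0.9 and P(B = 1 | T = 0) = 0.1.
def p_b_given_t(b, t):
    p_one = 0.9 if t == 1 else 0.1
    return p_one if b == 1 else 1.0 - p_one

common_cause_joint = {(a, b, t): 0.5 * p_b_given_t(a, t) * p_b_given_t(b, t)
                      for a, b, t in product((0, 1), repeat=3)}
```

In the first case the predictors are independent but not conditionally independent given $T$; in the second case the reverse holds.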
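Weights-of-Evidence
Assuming conditional independence of the binary predictors
$B_\ell$, $\ell = 1, \ldots, m$, given the binary random target variable $T$ yields
the weights of evidence
\[ W_\ell^{(1)} := \ln \frac{P(B_\ell = 1 \mid T = 1)}{P(B_\ell = 1 \mid T = 0)}, \qquad W_\ell^{(0)} := \ln \frac{P(B_\ell = 0 \mid T = 1)}{P(B_\ell = 0 \mid T = 0)}, \]
the contrasts
\[ C_\ell = W_\ell^{(1)} - W_\ell^{(0)}, \quad \ell = 1, \ldots, m, \]
and the weights-of-evidence model
in terms of a logit
\[ \operatorname{logit} P(T = 1 \mid B_1, \ldots, B_m) = \operatorname{logit} P(T = 1) + W^{(0)} + \sum_{\ell=1}^{m} C_\ell B_\ell, \quad \text{with } W^{(0)} = \sum_{\ell=1}^{m} W_\ell^{(0)}, \]
in terms of a probability
\[ P(T = 1 \mid B_1, \ldots, B_m) = \Lambda\left( \operatorname{logit} P(T = 1) + W^{(0)} + \sum_{\ell=1}^{m} C_\ell B_\ell \right). \]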
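The step from the weights to the model follows from Bayes' theorem in odds form. The short derivation below is a sketch in the notation of this section, with $W_\ell^{(1)}$, $W_\ell^{(0)}$ the weights, $C_\ell$ their contrast, and $W^{(0)}$ the sum of the $W_\ell^{(0)}$; it is added for orientation and is not copied from the slides.

```latex
\begin{align*}
O(T = 1 \mid B_1, \ldots, B_m)
  &= O(T = 1)\, \frac{P(B_1, \ldots, B_m \mid T = 1)}{P(B_1, \ldots, B_m \mid T = 0)} \\
  &= O(T = 1) \prod_{\ell=1}^{m} \frac{P(B_\ell \mid T = 1)}{P(B_\ell \mid T = 0)}
     \quad \text{(conditional independence given } T\text{)} \\
\ln \frac{P(B_\ell \mid T = 1)}{P(B_\ell \mid T = 0)}
  &= W_\ell^{(0)} + C_\ell B_\ell
     \quad \text{for } B_\ell \in \{0, 1\} \\
\operatorname{logit} P(T = 1 \mid B_1, \ldots, B_m)
  &= \operatorname{logit} P(T = 1) + W^{(0)} + \sum_{\ell=1}^{m} C_\ell B_\ell
\end{align*}
```

The third line uses $W_\ell^{(1)} = W_\ell^{(0)} + C_\ell$, so that the log-ratio equals $W_\ell^{(1)}$ when $B_\ell = 1$ and $W_\ell^{(0)}$ when $B_\ell = 0$.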
Logistic Regression
Conditional expectation of a binary random target variable $T$ given
an $(m + 1)$-variate random predictor variable
$B = (B_0, B_1, \ldots, B_m)^\top$ with $B_0 \equiv 1$:
\[ E(T \mid B) = P(T = 1 \mid B). \]
Then the logistic regression model without interaction terms is
in terms of a logit
\[ \operatorname{logit} P(T = 1 \mid B) = \beta^\top B = \beta_0 + \sum_{\ell=1}^{m} \beta_\ell B_\ell, \]
in terms of a probability
\[ P(T = 1 \mid B) = \Lambda\left( \beta^\top B \right) = \Lambda\left( \beta_0 + \sum_{\ell=1}^{m} \beta_\ell B_\ell \right). \]
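For indicator predictor variables the weights-of-evidence model reads
\[ P(T = 1 \mid B) = \Lambda\left( \operatorname{logit} P(T = 1) + W^{(0)} + \sum_{\ell : B_\ell = 1} C_\ell \right) \quad \text{with} \quad W^{(0)} = \sum_{\ell=1}^{m} W_\ell^{(0)}, \]
and the logistic regression model without interaction terms reads
\[ P(T = 1 \mid B) = \Lambda\left( \beta_0 + \sum_{\ell : B_\ell = 1} \beta_\ell \right). \]
Comparing the two expressions term by term gives, under conditional independence,
\[ \beta_0 = \operatorname{logit} P(T = 1) + W^{(0)}, \qquad \beta_\ell = C_\ell, \quad \ell = 1, \ldots, m. \]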
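A model of the following form is called a single-layer
perceptron or single-layer ANN, with first-layer weights $\alpha_{j\ell}^{(1)}$ and second-layer weights $\alpha_{kj}^{(2)}$:
\[ A_j^{(1)} = \sum_{\ell=0}^{m} \alpha_{j\ell}^{(1)} B_\ell, \quad j = 1, \ldots, J, \]
hidden units:
\[ Z_j = h\left( A_j^{(1)} \right), \quad j = 1, \ldots, J, \]
\[ A_k^{(2)} = \sum_{j=0}^{J} \alpha_{kj}^{(2)} Z_j, \quad k = 1, \ldots, K, \]
outputs:
\[ \pi_k = S\left( A_k^{(2)} \right), \quad k = 1, \ldots, K. \]
Then
\[ \pi_k = S\Biggl( \sum_{j=0}^{J} \alpha_{kj}^{(2)} \underbrace{h\Bigl( \sum_{\ell=0}^{m} \alpha_{j\ell}^{(1)} B_\ell \Bigr)}_{\text{hidden layer}} \Biggr), \quad k = 1, \ldots, K. \]
For a single output,
\[ P(T = 1 \mid B) = S\Biggl( \sum_{j=0}^{J} \alpha_{j}^{(2)} \underbrace{h\Bigl( \sum_{\ell=0}^{m} \alpha_{j\ell}^{(1)} B_\ell \Bigr)}_{\text{hidden layer}} \Biggr). \]
If $S = \Lambda$, $h = \mathrm{id}$ and $K = 1$, then
\[ P(T = 1 \mid B) = \Lambda\Biggl( \sum_{j=0}^{J} \alpha_{j}^{(2)} \sum_{\ell=0}^{m} \alpha_{j\ell}^{(1)} B_\ell \Biggr) = \Lambda\Biggl( \beta_0 + \sum_{\ell=1}^{m} \beta_\ell B_\ell \Biggr). \]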
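The collapse of the network to logistic regression can be illustrated numerically. The sketch below assumes a constant hidden unit $Z_0 = 1$ that carries the output bias, and it uses made-up weights; neither the weights nor the function names come from the slides.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def ann_output(b, hidden_w, output_w):
    """Single hidden layer, identity activation h, logistic output S.

    b        : predictors (B_1, ..., B_m); B_0 = 1 is prepended here
    hidden_w : J rows of (m + 1) first-layer weights
    output_w : (J + 1) second-layer weights; output_w[0] multiplies Z_0 = 1
               (ASSUMPTION: Z_0 is a constant bias unit)
    """
    x = [1.0] + list(b)
    z = [1.0] + [sum(w * xi for w, xi in zip(row, x)) for row in hidden_w]
    return logistic(sum(a * zj for a, zj in zip(output_w, z)))

def collapse(hidden_w, output_w):
    """Equivalent logistic regression coefficients (beta_0, ..., beta_m)."""
    n_inputs = len(hidden_w[0])
    beta = [sum(a * row[l] for a, row in zip(output_w[1:], hidden_w))
            for l in range(n_inputs)]
    beta[0] += output_w[0]
    return beta

# Hypothetical weights for J = 2 hidden units and m = 2 predictors.
hidden_w = [[0.5, -1.0, 2.0],
            [-0.3, 0.8, 0.1]]
output_w = [0.2, 1.5, -2.0]

beta = collapse(hidden_w, output_w)
max_gap = max(
    abs(ann_output(b, hidden_w, output_w)
        - logistic(beta[0] + beta[1] * b[0] + beta[2] * b[1]))
    for b in [(0, 0), (1, 0), (0, 1), (1, 1)]
)
```

With the identity activation the two layers compose into one linear map, so the network output and the logistic regression with the collapsed coefficients agree up to floating-point error.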
[Figure: fabricated dataset with axes x and y; panels: indicator target T, indicator predictor B1, indicator predictor B2.]
Correlation matrix
Correlation matrix of the fabricated dataset with $B_1$, $B_2$, $T$

        B1           B2           T
B1      1.0000000   -0.0000000    0.3026050
B2     -0.0000000    1.0000000    0.2305562
T       0.3026050    0.2305562    1.0000000
χ²          df    P(> χ²)
5.484206    2     0.06443468
5.185943    2     0.07479743

χ²          df    P(> χ²)
0           0     1
0           0     1
[Figure: spatial distribution of $\hat{P}(T = 1 \mid B_1, B_2)$ according to elementary estimation (training set!); axes: rows, cols.]
                  $\hat{W}^{(1)}$   $\hat{W}^{(0)}$   $\hat{C}$
B1                1.1962           -0.5292            1.7255
B2                0.9679           -0.3819            1.3499
sum of weights    2.1642           -0.9111
With $\hat{O}(T = 1) = 0.1627$ and $\operatorname{logit} \hat{P}(T = 1) = -1.8152$, the
weights-of-evidence model reads explicitly
\[ \hat{P}(T = 1 \mid B_1, B_2) = \Lambda\left( -2.7264 + 1.7255\, B_1 + 1.3499\, B_2 \right). \]
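The explicit model can be evaluated for the four predictor combinations. The sketch below uses the rounded values reported on the slides; the logit of the prior is taken as negative, since the reported odds of 0.1627 are below 1. Variable and function names are illustrative.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Values reported on the slides (rounded to four decimals).
logit_prior = -1.8152                 # logit of the estimated prior P(T = 1)
w0_sum = -0.9111                      # sum of the weights W^(0)
contrast = {"B1": 1.7255, "B2": 1.3499}

# Intercept of the model: -2.7263 here, reported as -2.7264 (rounding).
intercept = logit_prior + w0_sum

def wofe_probability(b1, b2):
    """Weights-of-evidence estimate of P(T = 1 | B1 = b1, B2 = b2)."""
    z = intercept + contrast["B1"] * b1 + contrast["B2"] * b2
    return logistic(z)

predictions = {(b1, b2): wofe_probability(b1, b2)
               for b1 in (0, 1) for b2 in (0, 1)}
```

Up to rounding of the inputs, the four values agree with the WofE column of the comparison table (0.58636, 0.26875, 0.20156, 0.06142).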
Logistic regression without interaction term, estimated coefficients:

               Estimate   Std. Error   z value   P(> |z|)
(Intercept)    -2.8312    0.5045       -5.61     0.0000
B1              1.8736    0.6553        2.86     0.0042
B2              1.5354    0.6722        2.28     0.0224
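A fit of this kind can be sketched with a few Newton-Raphson steps on the log-likelihood. The cell counts below are an assumption: they are not given on the slides, but they are one set of counts on 100 cells that is consistent with the reported counting estimates (1/4, 3/8, 5/16, 1/32) and with the reported prior odds of about 0.163. The solver and all names are illustrative, not the author's implementation.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x

# ASSUMED cell counts (B1, B2, number of cells, number of cells with T = 1).
cells = [(0, 0, 64, 2), (1, 0, 16, 6), (0, 1, 16, 5), (1, 1, 4, 1)]

def fit_logistic(cells, n_iter=25):
    """Maximum likelihood fit of logit P = beta0 + beta1 B1 + beta2 B2."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(n_iter):
        grad = [0.0] * 3
        hess = [[0.0] * 3 for _ in range(3)]
        for b1, b2, n_cells, n_hits in cells:
            x = (1.0, float(b1), float(b2))
            p = logistic(sum(bi * xi for bi, xi in zip(beta, x)))
            w = n_cells * p * (1.0 - p)
            for i in range(3):
                grad[i] += (n_hits - n_cells * p) * x[i]   # score vector
                for j in range(3):
                    hess[i][j] += w * x[i] * x[j]          # Fisher information
        step = solve(hess, grad)
        beta = [bi + si for bi, si in zip(beta, step)]
    return beta

beta = fit_logistic(cells)
```

Under the assumed counts the fitted coefficients come out close to the reported estimates of -2.8312, 1.8736 and 1.5354.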
Logistic regression with interaction term, estimated coefficients:

               Estimate   Std. Error   z value   P(> |z|)
(Intercept)    -3.4340    0.7184       -4.78     0.0000
B1              2.9232    0.8848        3.30     0.0010
B2              2.6455    0.8984        2.94     0.0032
B1 B2          -3.2333    1.5515       -2.08     0.0372
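With two indicator predictors and their interaction, the model has four coefficients for four predictor combinations, so it reproduces the counting estimates exactly and its coefficients can be written in closed form. The sketch below starts from the counting estimates 0.25, 0.375, 0.3125 and 0.03125 reported in the comparison table of this deck; the variable names are illustrative.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

# Counting estimates of P(T = 1 | B1, B2) as reported on the slides.
p = {(1, 1): 0.25, (1, 0): 0.375, (0, 1): 0.3125, (0, 0): 0.03125}

# Saturated model: logit P = beta0 + beta1 B1 + beta2 B2 + beta12 B1 B2.
beta0 = logit(p[(0, 0)])
beta1 = logit(p[(1, 0)]) - beta0
beta2 = logit(p[(0, 1)]) - beta0
beta12 = logit(p[(1, 1)]) - beta0 - beta1 - beta2
```

The resulting values match the estimates in the coefficient table (-3.4340, 2.9232, 2.6455, -3.2333) to the reported precision.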
Weights of evidence with the product $B_1 B_2$ as a third evidence:

                  $\hat{W}^{(1)}$   $\hat{W}^{(0)}$   $\hat{C}$
B1                1.1963           -0.5293            1.7255
B2                0.9680           -0.3819            1.3499
B1 B2             0.7167           -0.0386            0.7553
sum of weights    2.8809           -0.9497
[Figure: two spatial distribution panels, labelled lR: $\hat{P}(T = 1 \mid B)$ and lRit: $\hat{P}(T = 1 \mid B)$; axes: rows, cols.]
[Figure: indicator target T, indicator predictors B1 and B2, and the spatial distribution of $\hat{P}(T = 1 \mid B_1, B_2)$ according to elementary estimation, weights-of-evidence, weights-of-3-evidences; axes: rows, cols.]
[Figure: indicator target T, indicator predictors B1 and B2, and the spatial distribution of $\hat{P}(T = 1 \mid B_1, B_2)$ according to elementary estimation, logistic regression, logistic regression with interaction; axes: rows, cols.]
[Figure: spatial distribution of $\hat{P}(T = 1 \mid B_1, B_2)$ according to elementary estimation, logistic regression with interaction term, R-ANNGA, weights-of-evidence, logistic regression without interaction; axes: rows, cols.]
Numbers
Comparison of predicted conditional probabilities $\hat{P}(T = 1 \mid B_1, B_2)$ for various
methods

                    counting   WofE      LogReg    LogRegwI   CompRegwI   ANNGA
B1 = 1, B2 = 1      0.25000    0.58636   0.64055   0.25000    0.25000     0.24992
B1 = 1, B2 = 0      0.37500    0.26875   0.27736   0.37500    0.37500     0.37502
B1 = 0, B2 = 1      0.31250    0.20156   0.21486   0.31250    0.31250     0.31250
B1 = 0, B2 = 0      0.03125    0.06142   0.05565   0.03125    0.03125     0.03124
Conclusions
Weights-of-evidence is the special case of logistic
regression with indicator predictor variables B, if they are
conditionally independent given the target variable T.
In this case, the contrasts of weights of evidence are identical
to the logistic regression coefficients.
Applying weights-of-evidence despite lacking conditional
independence corrupts not only the predicted conditional
probabilities but also their rank-transforms and thus the
pattern of the potential.
The canonical generalization of the naïve Bayes model
featuring weights of evidence to the case of lacking conditional
independence is logistic regression including interaction
terms corresponding to violations of conditional
independence and compensating exactly for them.
In case of indicator or discrete predictor variables, the logistic
regression model is optimum.
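The conclusion about rank-transforms can be checked directly on the numbers reported in the comparison table. The sketch below orders the four predictor combinations by predicted probability for three of the methods; the dictionary and function names are illustrative.

```python
# Predicted P(T = 1 | B1, B2) as reported in the comparison table,
# keyed by (B1, B2).
counting = {(1, 1): 0.25000, (1, 0): 0.37500, (0, 1): 0.31250, (0, 0): 0.03125}
wofe = {(1, 1): 0.58636, (1, 0): 0.26875, (0, 1): 0.20156, (0, 0): 0.06142}
logreg_interaction = {(1, 1): 0.25000, (1, 0): 0.37500,
                      (0, 1): 0.31250, (0, 0): 0.03125}

def ranking(pred):
    """Predictor combinations ordered from highest to lowest probability."""
    return sorted(pred, key=pred.get, reverse=True)

rank_counting = ranking(counting)
rank_wofe = ranking(wofe)
rank_logreg_interaction = ranking(logreg_interaction)
```

Counting and logistic regression with interaction rank the combination $B_1 = 1, B_2 = 1$ third, whereas weights-of-evidence ranks it first, so the pattern of the potential changes and not only its magnitude.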
DEPARTMENT OF CHEMISTRY,
NARULA INSTITUTE OF TECHNOLOGY,
WEST BENGAL, INDIA
Enzyme characteristics
Protein in nature
Catalytic activity and may act as activator
or inhibitor
Acts on a specific substance called
substrate to create a product
Names of enzymes end with -ase, e.g.
reverse transcriptase
A. Low [S]
Polymerization
Sedimentation and centrifugation
Flocculation and coagulation
Extraction
Precipitation
Chromatography
Protein modification
Starch conversion
Leather industry
Dairy industry
Textiles
Edible oils
Medicines
Biotransformation
Leonor Michaelis
(1875-1940)
Maud L. Menten
(1879-1960)
Title page of Michaelis & Menten's original paper in Biochemische Zeitschrift, 1913
\[ S + E \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} SE \overset{k_2}{\longrightarrow} P + E \]
S= substrate
E = enzyme
SE = substrate - enzyme complex
P = product
Here, ideal reaction conditions are considered, i.e. no time delay is observed
in the conformational change from ES to EP.
The magnitude of oscillation increases as time delay parameter increases from 0.6
For constant control input, the delayed system has a stable nature around its
interior equilibrium point
Applying optimal control the unstable system becomes stable after 2.5 hours
CONCLUSIONS
Delay induced mathematical model in the enzyme
Spraying
Optimal Control Theoretic Approach
Numerical Simulation
Concluding remarks
to spurge family.
It is a poisonous, semi-evergreen small tree reaching a height of 6 m.
The plant can be cultivated in wastelands and grows on almost any type of terrain,
even on sandy and saline soils.
The seeds of Jatropha curcas contain 27% to 40% oil, which can be processed
chemically to obtain biodiesel, a high-quality alternative fuel.
After extraction, the residue (press cake) can also be used as biomass
feedstock to power electric plants. This cake can also be used to produce
biogas.
Unfortunately, this natural resource, the Jatropha curcas plant, is severely affected, mainly
by the mosaic virus, which is carried by infected white-flies.
This causes a great socio-economic loss.
JATROPHA CURCAS
ABOUT WHITE-FLY
White-flies are tiny flying insects with a dry wax coating that protects them from
adverse conditions.
They are present year round in southern regions but remain dormant throughout
the winter months in northern regions of the world.
The species develop from eggs and grow through a series of instars.
Once the adult flies appear, they lay very small, almost invisible eggs
within less than a week, mainly on sheltered parts of the plant.
Development can take less than a month, but the life cycle can last
as long as a year if conditions are unfavorable.
As white-flies are extremely prolific, once they become established on
any part of a plant in the garden, they readily move on and
attack neighboring vegetation.
WHITE-FLY
INFECTED PLANT
ABOUT INSECTICIDE
o There are several organic substances that can be sprayed / applied over
INSECTICIDAL SOAP
OUR OBJECTIVES
A large population or dense feeding by white-flies can harm Jatropha
curcas plants, causing yellowing leaves and ultimately the death of the host
plants.
In view of this, we wish to spray the insecticidal soap on the host plant in
two different ways.
One is fixed optimal control (implicit form) and the other is the impulsive
approach (explicit form).
On that basis, we carry out a comparative analysis of the two
approaches to find the most effective method to control the
vector white-flies.
We wish to incorporate the impulsive approach to reduce the vector
population (white-fly), which carries the virus to the Jatropha
curcas plant.
I.
VII. We assume the rate a for the transfer of plants to the infected state from
the latent phase.
VIII. We consider the harvesting of infected plants at a rate g and also the
rate of normal plant loss is treated as
IX. We assume logistic growth of the non-infected vectors (white-flies),
with b as the maximum vector birth rate and m as the maximum vector
abundance, because the population does not grow unboundedly.
X. We consider the interaction between infected plants and non-infected
vectors. The non-infected vector population is reduced with as the
acquisition rate.
XI. The vector mortality rate is considered as c for both non-infected and
infected vectors.
MATHEMATICAL MODEL
One-Dimensional
Impulsive Differential Equation
BIOLOGICAL REMARKS
If the time interval be always less than some preassigned quantity i.e.,
then we are capable to restrict
the vector population under the threshold value
In
other words, this contributes the maximum interval for
spraying to restrain of the vectors that the infection lies
within
Numerical Simulation
Table. Values of the parameters used in the model equation
Figure 1: Population densities are plotted as a function of time and value of the parameters
are given in Table.
Figure 3: Population densities of Healthy Plants, Infected Plants, Non-Infected Vectors and
Infected Vectors are plotted as a function of time after applying the insecticide at an impulsive
mode and value of the parameters are given in Table.
Figure 4: Population densities of Healthy Plants (x), Infected Plants (y) and Infected
Vectors (v) are plotted as a function of time with and without control approach and value
of the parameters are given in Table.
imposing optimal control approach in 100 days. But when we apply the
insecticide spraying through the impulsive mode, the healthy plant
population reaches above 150 plants (approximately) in 100 days time
period.
In the case of infected plants, the population decreases, but not below 65
plants within the 100-day time span, if we apply the optimal control approach. The
infected population becomes stable within 10 days in the case of spraying
in impulsive mode and goes extinct in about 10 days with
spraying of 80% effectiveness.
sharply but does not go to extinction in 100 days span. On the other hand,
this population becomes extinct within more or less 10 days in case of
spraying by impulsive method.
Concluding Remarks
6. At the same time, the healthy Jatropha curcas plant population
increases when the insecticide is sprayed at 80% effectiveness.
7. The effectiveness of the insecticide (insecticidal soap) spraying drives the
system towards a larger healthy population and a smaller infected population.
8. If we communicate our research findings in the infected zone, a new
prospect will open up in the worldwide fight against white-flies.
(Université de Bordeaux)
[Figure: time series plot; vertical axis mc/s (500 to 3000), horizontal axis time (1960 to 2010).]
Outline
Robust estimation
\[ Y_{ns+\nu} = \sum_{k=1}^{p(\nu)} \phi_k(\nu)\, Y_{ns+\nu-k} + \varepsilon_{ns+\nu} \tag{1} \]
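PAR estimation
Consider $Y_{ns+\nu}$ with $n = 0, 1, \ldots, N-1$ and $\nu = 1, \ldots, s$.
The PAR model (1) is reformulated as:
\[ z(\nu) = X(\nu)\,\phi(\nu) + e(\nu) \]
where $\phi(\nu)$ is the $(p(\nu) \times 1)$ vector of model parameters
\[ \phi(\nu) = \left( \phi_1(\nu), \ldots, \phi_{p(\nu)}(\nu) \right)^\top \]
with the $(N \times 1)$ vectors
\[ z(\nu) = \left( Y_{\nu}, Y_{s+\nu}, \ldots, Y_{(N-1)s+\nu} \right)^\top, \qquad e(\nu) = \left( \varepsilon_{\nu}, \varepsilon_{s+\nu}, \ldots, \varepsilon_{(N-1)s+\nu} \right)^\top \]
and the $(N \times p(\nu))$ matrix
\[ X(\nu) = \begin{pmatrix}
Y_{\nu-1} & Y_{\nu-2} & \cdots & Y_{\nu-p(\nu)} \\
Y_{s+\nu-1} & Y_{s+\nu-2} & \cdots & Y_{s+\nu-p(\nu)} \\
\vdots & \vdots & \ddots & \vdots \\
Y_{(N-1)s+\nu-1} & Y_{(N-1)s+\nu-2} & \cdots & Y_{(N-1)s+\nu-p(\nu)}
\end{pmatrix} \]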
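For reference, the ordinary least squares estimator of the seasonal regression $z(\nu) = X(\nu)\phi(\nu) + e(\nu)$ has the usual closed form. This is standard regression algebra rather than a formula taken from the slides, and it assumes that $X(\nu)$ has full column rank.

```latex
\[
\hat{\phi}_{LS}(\nu)
  = \bigl( X(\nu)^{\top} X(\nu) \bigr)^{-1} X(\nu)^{\top} z(\nu),
\qquad \nu = 1, \ldots, s .
\]
```

Each season $\nu$ is fitted separately, so the estimation amounts to $s$ independent regressions with $p(\nu)$ regressors each.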
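The residuals
\[ \hat{\varepsilon}_{ns+\nu} = \begin{cases}
Y_{ns+\nu} - \sum_{k=1}^{p(\nu)} \hat{\phi}_k(\nu)\, Y_{ns+\nu-k}, & ns + \nu > p(\nu), \\
0, & ns + \nu \le p(\nu),
\end{cases} \]
are replaced by their modified residuals $\tilde{\varepsilon}_{ns+\nu}$:
\[ \tilde{\varepsilon}_{ns+\nu} = \psi\left( \frac{\hat{\varepsilon}_{ns+\nu}}{\hat{\sigma}(\nu)} \right) \]
with $\hat{\sigma}(\nu)$ the robust estimator of $\sigma(\nu)$ and the Huber function:
\[ \psi_{H,k}(x) = \operatorname{sgn}(x) \min\{|x|, k\} \]
where $k$ is a constant and $\operatorname{sgn}(x)$ is the signum function.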
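The Huber function simply clips its argument to the interval $[-k, k]$. A minimal sketch follows; the default $k = 1.345$ is a commonly used tuning constant and is an assumption here, as the slides do not state the value used, and the function names are illustrative.

```python
def huber_psi(x: float, k: float) -> float:
    """Huber function sgn(x) * min(|x|, k): x clipped to [-k, k]."""
    if k <= 0:
        raise ValueError("k must be positive")
    sign = (x > 0) - (x < 0)
    return sign * min(abs(x), k)

def modified_residuals(residuals, scale, k=1.345):
    """Apply the Huber function to residuals standardised by a robust scale.

    ASSUMPTION: `scale` is a robust scale estimate supplied by the caller,
    and k = 1.345 is only a conventional default.
    """
    return [huber_psi(r / scale, k) for r in residuals]

example = modified_residuals([2.0, -0.5, 10.0], scale=2.0, k=1.0)
```

Small standardised residuals pass through unchanged, while large ones are capped at $\pm k$, which limits the influence of outliers on the estimate.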
Coefficients $\phi_k(\nu)$ of the simulated PAR model:

           k = 1    k = 2    k = 3   k = 4   k = 5   k = 6   k = 7
ν = 1      0.30     0.50     0       0       0       0       0
ν = 2      0       -0.65     0       0.50    0       0       0
ν = 3     -0.80     0.30     0.35    0       0       0       0
ν = 4      0        0        0       0       0       0.81    0.70

July 8–11, 2014
Mean and MSE of the least squares (LS) and robust (RA) estimates for contamination levels d = 0, d = 5 and d = 10:

                                  d = 0               d = 5               d = 10
Estimate                Method    Mean      MSE       Mean      MSE       Mean      MSE
$\hat{\phi}_1(1)$       LS        0.2999    0.0317    0.2685    0.0474    0.2053    0.1051
                        RA        0.2990    0.0332    0.2887    0.0355    0.2849    0.0375
$\hat{\phi}_2(1)$       LS        0.4988    0.0243    0.4685    0.0437    0.4000    0.1109
                        RA        0.4980    0.0255    0.4893    0.0289    0.4876    0.0302
$\hat{\phi}_2(2)$       LS       -0.6500    0.0331   -0.5809    0.0817   -0.4457    0.2161
                        RA       -0.6508    0.0337   -0.6326    0.0405   -0.6278    0.0439
$\hat{\phi}_4(2)$       LS        0.4992    0.0349    0.4368    0.0756    0.3216    0.1872
                        RA        0.4985    0.0361    0.4800    0.0431    0.4761    0.0459
$\hat{\phi}_1(3)$       LS       -0.8016    0.0431   -0.7171    0.1080   -0.5731    0.2572
                        RA       -0.8010    0.0442   -0.7957    0.0487   -0.8044    0.0519
$\hat{\phi}_2(3)$       LS        0.3006    0.0378    0.2994    0.0496    0.2780    0.0772
                        RA        0.2993    0.0392    0.2964    0.0422    0.2946    0.0437
$\hat{\phi}_3(3)$       LS        0.3493    0.0385    0.3612    0.0532    0.3526    0.0860
                        RA        0.3499    0.0401    0.3432    0.0432    0.3381    0.0464
$\hat{\phi}_6(4)$       LS        0.8099    0.0352    0.6909    0.1297    0.4909    0.3291
                        RA        0.8107    0.0361    0.7827    0.0482    0.7751    0.0543
$\hat{\phi}_7(4)$       LS        0.6991    0.0371    0.5837    0.1275    0.3958    0.3132
                        RA        0.6984    0.0384    0.6708    0.0514    0.6656    0.0557
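\[ \mathrm{BIC}(\nu) = \log \hat{\sigma}^2(\nu) + \frac{\log(N)}{N}\, p(\nu), \]
where $\hat{\sigma}^2(\nu)$ is the residual variance estimate for season $\nu$, $\hat{\varepsilon}_{ns+\nu}$ denotes the residuals of the adjustment, $\hat{\phi}(\nu)$ is the
least squares estimator of $\phi(\nu)$, and $p(\nu)$ is the number of
autoregressive parameters in season $\nu$.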
Genetic algorithms
Simulation experiments
The selection algorithm is applied to 100 independent simulations with
N = 300 observations per period.
The seasonal order varies from 0 to 15 (the length of the chromosome is
L = 15). The size of the GA population is $N_p = 40$, the crossover
probability $P_c = 0.8$, the mutation probability $P_m = 0.05$, the number of
generations is $N_g = 50$ and the number of elite individuals is 1.
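One way to read the chromosome of length L = 15 is as a bit string in which bit i marks whether lag i + 1 enters the seasonal model. This encoding, the one-point crossover and the bit-flip mutation below are assumptions made for illustration; the slides give only the GA parameters, not the operators.

```python
import random

L = 15  # chromosome length = maximum lag considered

def decode(chromosome):
    """Map a 0/1 chromosome to the tuple of lags included in the model."""
    return tuple(i + 1 for i, gene in enumerate(chromosome) if gene == 1)

def encode(lags, length=L):
    """Inverse of decode: build the 0/1 chromosome for a set of lags."""
    return [1 if i + 1 in lags else 0 for i in range(length)]

def mutate(chromosome, pm=0.05, rng=random):
    """Flip each gene independently with probability pm (assumed operator)."""
    return [1 - g if rng.random() < pm else g for g in chromosome]

def crossover(a, b, pc=0.8, rng=random):
    """One-point crossover applied with probability pc (assumed operator)."""
    if rng.random() >= pc:
        return a[:], b[:]
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

parent = encode((1, 8, 11))
models_per_season = 2 ** L  # number of distinct lag subsets per season
```

Under this encoding each season has 2^15 = 32768 candidate models, and the chromosome for lags (1, 8, 11) has ones in positions 1, 8 and 11.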
TABLE: Frequencies of correct model selection using the GA for the
fourth season (random contamination)

d          Classic GA   Robust GA
d = 0      0.93         0.90
d = 5      0.51         0.87
d = 10     0.16         0.87
[Figure: flow (mc/s) by year, 1960 to 2010.]
Detection of outliers
[Figure: flow values (500 to 3000) by period; horizontal axis: Period.]
Garonne identification
The last year (12 observations) is omitted from the data set. The GA is
run for 50 generations with parameters L = 15, $N_p = 40$, $P_c = 0.8$,
$P_m = 0.05$ and with one elite individual (giving $4 \times 10^5$ possible models).

season     un-robust           robust
ν = 1      (1,15)              (1)
ν = 2      (1,6)               (1,6)
ν = 3      (1)                 (1)
ν = 4      (1)                 (1,10)
ν = 5      (1)                 (1,4)
ν = 6      (1)                 (1)
ν = 7      (1,8)               (1,8,11)
ν = 8      (1)                 (1)
ν = 9      (1)                 (1)
ν = 10     (1,2,3,6,10,11)     (1)
ν = 11     (1,4,5,8,15)        (1,4,5,8,15)
ν = 12     (1,8,11)            (1,8,11)
[Figure: observed flows, forecast and robust forecast (mc/s) by year-month, 2010-01 to 2010-12.]
Predictions
To evaluate the forecast accuracy we compute the following
measures: mean absolute percentage error (MAPE), median absolute
percentage error (MdAPE), mean absolute error (MAE) and root mean
square error (RMSE).

Garonne
criterion   unrobustified   robustified
MAPE        26.847          22.543
MdAPE       28.374          16.745
MAE         94.305          93.240
RMSE        126.663         119.332

TABLE: Accuracy of forecasts of Garonne river flows.
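The four accuracy measures can be computed as in the sketch below. It assumes the percentage errors are expressed on a 0 to 100 scale, which is consistent with the magnitudes reported for the Garonne forecasts, and it uses made-up numbers rather than the Garonne data; the function name is illustrative.

```python
import math
import statistics

def forecast_errors(observed, forecast):
    """MAPE, MdAPE (both in percent), MAE and RMSE of a forecast.

    ASSUMPTION: percentage errors are relative to the observed values
    and scaled by 100; observed values must be non-zero.
    """
    abs_err = [abs(o - f) for o, f in zip(observed, forecast)]
    pct_err = [100.0 * e / abs(o) for e, o in zip(abs_err, observed)]
    return {
        "MAPE": statistics.mean(pct_err),
        "MdAPE": statistics.median(pct_err),
        "MAE": statistics.mean(abs_err),
        "RMSE": math.sqrt(statistics.mean([e * e for e in abs_err])),
    }

# Hypothetical observed and forecast values, for illustration only.
scores = forecast_errors([100.0, 200.0, 400.0], [110.0, 180.0, 400.0])
```

The median-based MdAPE is less sensitive to a few badly forecast months than the MAPE, which may explain why the two criteria can differ markedly on the same forecasts.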
Conclusion