Documente Academic
Documente Profesional
Documente Cultură
1 Introduction
Mathematical models used for computing predictions of many geo- and environmental systems are traditionally of parametric nature. The model parameters,
usually, bear a large degree of uncertainty due to lack of knowledge about the
particular phenomenon. Conventionally, the model parameters can be fitted to the
available observations using various optimisation techniques, which raises the problem of the confidence of such model fit, e.g. in Demyanov et al., (2006).
Uncertainty of a forecast is based on the probabilistic analysis of prediction
model solutions. A probabilistic approach was applied to quantify uncertainty of
P.M. Atkinson and C.D. Lloyd (eds.), geoENV VII Geostatistics for Environmental
Applications, Quantitative Geology and Geostatistics 16,
c Springer Science+Business Media B.V. 2010
DOI 10.1007/978-90-481-2322-3 30,
345
346
V. Demyanov et al.
Fig. 1 Adaptive sampling in the search for good fitting models: blue colour corresponds to low
misfit (each model is displayed by a Voronoi cell following NA sampling)
347
Every misfit evaluation entails forward simulation of the reservoir model, which
is a computationally costly process and entails solving finite-difference flow equations on a fine grid. A guided sampling approach was proposed to increase the
computational efficiency of the adaptive sampling by reducing the number of forward reservoir simulations (Demyanov, 2007). The approach proposed using fast
misfit artificial neural network interpolation models to compute approximate misfit values instead of the exact ones from forward flow simulation. Artificial neural
networks were used as non-linear and non-parametric regression models to evaluate the misfit at chosen locations based on the previously simulated pool of models
(Christie et al., 2006).
The principal question in the search for good fitting models is whether to run a
flow simulation to evaluate the goodness of model fit. This splits all possible models into two classes the likely good fitting models for which we need to run the
flow simulation and the models for which no flow simulation run is needed as they
are unlikely to provide a reasonable match. Thus, we can reformulate the adaptive
sampling problem using classification to decide whether to run the flow simulation
at any particular location. The parameter space becomes separated by the classifier
into the areas where the flow simulation is to run and where it is not. Therefore,
instead of solving a misfit interpolation problem we have a classification problem.
The classification problem is focused on detection of the regions where evaluation
of the misfit is needed via flow simulation to find good history match models. Application of classification algorithms rather than regression algorithms to the guided
sampling allows more flexibility and it is less influenced by noise and inaccuracy of
the approximate interpolation solution.
There exists a wide selection of statistical and data driven algorithms to solve
the classification problem. Machine learning also includes a number of classification tools such as Support Vector Machines, probabilistic neural networks,
self-organising maps, k-nearest neighbours, etc. (Haykin, 1999). A comprehensive
review of applications of traditional and recent machine learning algorithms to spatial prediction problems is presented in Kanevski et al., 2009.
The high dimensionality of the parameter space is another problem that complicates the search for good fitting models. Even when large numbers of potential
models are evaluated the space remains fairly empty and poorly explored. Therefore,
the search for multiple low misfit models becomes difficult even for adaptive algorithms; this is true especially in the presence of local minima in the misfit surface.
The exploration of high dimensional space is burdened by the curse of dimensionality problem (Hastie et al., 2001). The problem of curse of dimensionality is in
exponential increase in volume with adding an extra dimension to the parameter
space and is illustrated in Fig. 2 (Kanevski et al., 2009). Suppose 1,000 data samples are drawn uniformly in the unit volume around the origin of the coordinate
system. Then, the distance from the origin to the nearest sample increases drastically with dimension. In high dimensional space, the notion of nearest neighbour
becomes obsolete all the samples are equally far or, in other words, are located in
the corners of the high dimensional space.
V. Demyanov et al.
Distance from the origin to the nearest data sample
348
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
0
20
40
60
Input space dimension
80
100
Fig. 2 Dependence of the distance from the origin to the nearest sample from the space dimension
1.1 Aims
The aim of the paper is to propose a way to improve the search for good fitting
models with a robust classification method, which can overcome the curse of the
dimensionality problem.
Support Vector Machine (SVM) is a data driven classification method which provides a non-linear and a non-parametric classification in high dimensional input
space and effectively handles the curse of dimensionality. As a machine learning
349
algorithm it is able to capture dependencies from the data in the model parameter
space whilst training. Based on the captured dependencies SVM is used to classify
models in parameter space according to their goodness of fit.
The purpose of using SVM classification is to improve the sampling efficiency
by reducing the computational effort spent on forward reservoir simulation for every
generated model. Guided sampling based on the classification results would be able
to find good fitting models faster with fewer forward simulations computed only
for the models from the regions classified as good fit. SVM classification would
improve sampling quality because this data-driven algorithm overcomes the curse
of dimensionality problem.
In this paper we propose the methodology of classification for sampling for the
good fitting models and illustrate it with a synthetic feasibility example. However,
the illustrative case study we used is of low dimension with just three model
parameters. A higher dimensional study, which would tackle the curse of dimensionality problem, will be the subject of future research.
V. Demyanov et al.
margin
wx
+b
=0
wx
+b
=1
y=+1
wx
+b
=1
350
Support Vectors
y=-1
xi
N
X
yi i xi :
(2)
i D1
The samples with non-zero weights are the only ones which contribute to this maximum margin solution. They are the closest samples to the decision boundary and are
called Support Vectors (SVs) (see Fig. 3). SVs are penalized such that 0 < < C
to allow for misclassification of training data (taking into account the mislabelled
samples or noise).
A so-called kernel trick is used to make this classifier non-linear. A kernel is a
symmetric semi-positive definite function K.x; x 0 /. According to the Mercer theorem, this implies that it corresponds to a dot product in some space (Reproducing
Kernel Hilbert Space, RKHS). Generally, given a (linear) algorithm, which includes
data samples in the form of dot products only, one can obtain a (non-linear) kernel
version of it by substituting the dot products with kernel functions. This is the case
for linear SVM, where the decision function (1) relies on the dot products between
samples, as clearly seen by substituting (2) into (1). The final classification model
is a kernel expansion:
f .x; / D
N
X
i K.x; xi / C b
(3)
i D1
The choice of the kernel function is an open research issue. Using some typical kernels like Gaussian RBF, one takes into account some knowledge like distance-based
similarity of the samples. The parameters of the kernel are the hyper-parameters of
SVM and have to be tuned using cross-validation or another similar technique.
351
1
;
.1 C exp.a f .x/ C b//
(4)
where a and b are constants. These constants are tuned using a maximum likelihood (usually, the negative log-likelihood to simplify the optimisation) on the testing
dataset. The value of a is negative, and if b is found to be close to zero, then the default SVM decision threshold f .x/ D 0 coincides with a confidence threshold level
of 0.5.
352
V. Demyanov et al.
Worse models
C
p(x) > 0.5
x
good models
+> 0
> 0
Fig. 4 The sampling point inside the class region of good models can be identified by
considering the predicted probability and ranked following its potential weight as if the sample
would become a Support Vector. Predicted worse models can be filtered out. The unexplored
regions can be identified as well
In our search for the prospective good models we are interested in the following
situations illustrated in Fig. 4. A sample lies inside the region of good models if it
simultaneously satisfies the following two conditions:
When labelled positively it obtains zero or some small weight C 0.
And when labelled negatively its weight is bounded with C. C/.
Such samples are subject to simulation. Although, it is well described by the available data set and does not provide new information about the parameter space. Those
samples with both C > 0 and > 0 are inside the unexplored regions of the parameter space and can be proposed for consideration.
Another important issue is the ranking by significance of those samples which
were found to be interesting. Since here we are only interested in a relative measure
in order to provide the ranking, the following value will be used:
8
<0; if C D 0; D C I or C D C; D 0;
k
k
k
.xk / D C Ck
: k k ; otherwise:
2C
(5)
Since it is time-consuming to explore the multi-dimensional parameter space by retraining an SVM classifier two times for every prospective sampling location, one
may carry out this procedure for the samples with sufficient probabilities p.y D
1jx/ > . The estimation of the latter involves a simple computation of (3) and (4)
which is reasonably fast.
353
354
V. Demyanov et al.
injector
0
producer
200
Distance FEET
400
600
800
1000
8320
8330
Depth FEET
8340
8350
8360
8370
8380
8390
PermX (MDARCY)
1.98
41.53
81.08
120.63
160.18
6.0e+01
a
1.00e+00
5.00e+00
1.00e+01
2.00e+01
3.00e+01
5.00e+01
7.00e+01
1.00e+02
5.00e+02
1.00e+03
5.00e+03
1.00e+04
5.00e+04
1.00e+05
3.0e+01
1.7e01
3.9e02
355
2.5e+01
5.0e+01
Fig. 6 Misfit for 200 initial models for training (a), model classes for 200 training models (b)
a handful of models with a misfit below 70 (see Fig. 6b). However, they are located
in different regions of the parameter space, which is important for maintaining good
generalisation ability of the classifier.
Validation with the exhaustive database of 160,000 models was carried out to
check whether SVM classification is accurate; i.e., do the SVM models, which were
classified to run the flow simulation for, match the production well? All 160,000
model locations were classified by the SVM to determine which misfit class they
belong to below 70 or above 70. SVM classified 15,000 models out of 160,000 as
the models with the misfit <70, for which the flow simulation is to be computed.
This is just under 10% of the total number of models. Once the actual misfit of the
classified 15,000 models was obtained based on the flow simulation runs, it became
possible to analyse how good were the classified modes in terms of the history
match. Figure 7a shows the cumulative histogram of the actual misfit of the 15,000
models classified to have the misfit below 70. It demonstrates that just under 90%
of the models classified for running the simulation have the actual misfit below 70,
which confirms that the SVM classifier is accurate.
The next stage entails checking if the SVM classifier has missed any of the good
models in the database; i.e. how many good fitting models from the data base have
not been classified to run the flow simulation for. Figure 7b shows the lower end
of the misfit histogram for all the models from the data base. The bars in the histogram indicate the proportion of the models classified for running the simulation
(according to SVM) in each interval of the misfit values. It can be seen that over
75% of the models with the misfit below 20, which can be considered as reasonably
low, were classified by SVM for running the simulation (see the first four bars in
Fig. 7b). As the model misfit increases the chance of a model to be classified by
SVM for running the flow simulation decreases. This is illustrated by the decrease
of the bar proportion representing the models classified for flow simulations out of
all the models from the database with the misfit above 30 (see Fig. 7b).
356
V. Demyanov et al.
10
20
30
40 50 60 70
True misfit
80
90 100
0.0
0.2
0.4
0.6
0.8
Number of models
2000 4000
6000
8000 10000
1.0
Data Base
SVM class
Low misfit
models
10
20
30
True misfit
40
50
Fig. 7 Cumulative histogram of the true misfit for the modes classified by SVM to have the misfit
below 70 (a); histogram for the misfit of models with low misfit: in the data base (DB) versus the
low misfit models classified by SVM to have the misfit below 70 (b)
Fig. 8 Good fitting models with the misfit below 30 from the three ensembles of models: exhaustive data set of 160,000 models (a), 500 samples based on SVM classification (b), conventional
NA sampling of 7,000 models (c)
Figure 8 shows the locations of the good fitting models with the misfit below 30
in the 3D parameter space. Good fitting models from the exhaustive database of
160,000 models shows how complicated the surface of the good models is (see
Fig. 8a). The SVM classifier was able to capture some of the basic structure of the
misfit surface with just 500 samples and detect diverse good fitting models in different areas (see Fig. 8b), which provided more robust predictions. NA sampling with
1,700 flow simulations concentrated on good fitting models in a particular cluster
away from the truth case model (see Fig. 8c), which resulted in deviation of the
forecasts from the truth case solution (see Fig. 9b) (Erbas, 2006).
The production profiles computed by flow simulation of the models inferred from
the generated ensembles are shown in Fig. 9, where the models are run into the forecasting period past the period where the fitted data were available. Inferred models
from both SVM classification based ensemble (500 flow simulations from Fig. 8b)
demonstrate fair spread of forecasts, which comfortably encompass the truth case
500
1000
Time (days)
1500
Oil production
Oil production
0
Oil production
500
1000
Time (days)
1500
357
500
1000
Time (days)
1500
Fig. 9 Good fitting models inferred from the three ensembles of models: ensemble of 500 models
based on SVM classification (a), conventional NA sampling of 1,700 models (b) and long GA
sampling run of 2,000 models (c). Vertical line shows the start of the forecasting period
(see Fig. 9a). The results were compared with conventional sampling methods
Neighborhood Approximation (NA) and Genetic Algorithms (GA) (Erbas, 2006).
The conventional NA run with a limited number of generated models (1,700 flow
simulations) demonstrates reduction of the spread in the predictions for the forecasting period (see Fig. 9b). The inferred ensemble of models generated by GA based
on 2,000 flow simulations provide a fair spread in the forecasting period, but some
of the models do not fit the history well (see Fig. 9c). For all approaches the truth
case model solution lies towards the edge of uncertainty envelope. This can be explained by a peculiar choice of the truth case model, which lies on the edge of the
good fitting model region (see Fig. 8a). This suggests a low posterior probability of
the truth case model.
4 Conclusions
An SVM classification model was used to detect the models for which to run flow
simulations. Thus classification is able to separate the regions in the parameter
space where to search for good fitting models. Classification is more robust than
regression, which aims at accurate estimation of the actual misfit value given by the
interpolation model. The SVM approach is beneficial in high dimensional space relative to other machine learning algorithms, because it provides the best prediction
in terms of error without compromising the complexity of the classification model.
The SVM classifier applied to a synthetic model demonstrated accurate results,
which captured better the pattern of the low misfit regions in the parameter space
than other data driven interpolation algorithms (e.g. a multi-layer perceptron). Models found with SVM classification based on just 200 forward simulations are as good
as the ones generated from a traditional adaptive NA sampling, which entailed more
forward flow simulations.
We have to admit that the observed performance of SVMs in terms of predicting
the low misfit models is not better than equivalent NA techniques reported for this
358
V. Demyanov et al.
case study by (Erbas, 2006). Rather, it is in the possibly wider application of SVMs
to the exploration of the parameter space and uncertainty characterization that we
see considerable potential. This view is justified since SVMs are specifically designed to handle high-dimensional data and extract a sparse set of support vectors
from them. Thus, SVM being a data-driven method is useful to describe the previously unexplored regions of the parameter space and find hidden dependencies in it.
Acknowledgements Funding of this work was provided by UK Engineering and Physical
Sciences Research Council (GR/T24838/01) and by the industrial sponsors of the Heriot-Watt
Uncertainty Project.
The authors acknowledge Swiss National Science Foundation funding GeoKernels: kernelbased methods for geo and environmental sciences project N 200021 113944.
The authors would like to thank J. Carter of Imperial College for providing the model and the
data for the IC Fault case study.
References
Carter JN, Ballester PJ, Tavassoli Z, King PR (2004) Our calibrated model has no predictive value:
an example from the petroleum industry. Proceedings of the Fourth International Conference
on Sensitivity Analysis
Christie M, Demyanov V, Erbas D (2006) Uncertainty quantification for porous media flows.
J Comput Phys 217:143158
Demyanov V (2007) Neural network guided sampling for uncertainty quantification of production
forecasts. Presentation at UFORDS, Scarborough
Demyanov V, Wood SN, Kedwards TJ (2006) Improving ecological impact assessment by statistical data synthesis using process based models. J Royal Stat Soc, Appl Stat (Ser C) 55(1,
Part 1):4162
Erbas D (2006) Sampling strategies for uncertainty quantification in oil recovery prediction. Thesis
for the Degree of Doctor of Philosophy, Heriot-Watt University, August 2006
Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. Springer, New York
Haykin S (1999) Neural networks: a comprehensive foundation. Pearson Higher Education,
New Delhi
Kanevski M, Pozdnoukhov A, Timonin V (2009) Machine learning algorithms for geoSpatial data.
369, EPFL Press, Switzerland, p 300
Platt J (1999) Probabilistic outputs for support vector machines and comparison to regularized
likelihood methods. In: Smola AJ, Bartlett P, Scholkopf B, Schuurmans (eds) Advances in
large margin classifiers. MIT Press, Cambridge, MA
Pozdnoukhov A, Kanevski M (2007). Interactive monitoring network optimization using support
vector machines. In Spatial Statistics and GIS conference (Stat-GIS 2007), Klagenfurt
Sambridge M (1999) Geophysical inversion with a neighbourhood algorithm I. Searching a parameter space. Geophys J Int 138:479494
Scholkopf B, Smola AJ (2002) Learning with kernels. MIT press, Cambridge, MA
Subbey S, Christie MA, Sambridge M (2002) Uncertainty reduction in reservoir modelling. In:
Chen Z, Ewing RE (eds) Fluid flow and transport in porous media: mathematical and numerical treatment. American Mathematical Society Contemporary Mathematics Monograph,
Providence, Rhode Island, pp 457467
Tavassoli Z, Carter JN, King PR (2004) Errors in history matching, SPE Journal, September 2004,
pp 352361
Vapnik V (1995) The nature of statistical learning theory. Springer, New York