Article info
Article history:
Received 5 January 2008
Received in revised form 19 January 2009
Accepted 4 June 2009
Available online 19 September 2009

Abstract
Economic evaluation of a new oil well is important for decision-making in the petroleum industry, and this evaluation depends on a good prediction of the well's production. However, it is difficult to predict a well's production accurately because of the complex subsurface conditions of reservoirs. The industry-standard approach is to use either curve-fitting methods or complex and time-consuming reservoir simulations. In this paper, an enhanced decision tree learning approach called the neural-based decision tree (NDT) model is applied to investigate its performance in predicting petroleum production. The primary strength of this model is that it can capture dependencies among attributes, and it is therefore likely to provide more accurate predictions (Lee and Yen, 2002). This paper presents an application of the NDT model to petroleum production prediction. Our models were developed from the five most significant parameters that affect oil production: permeability, porosity, first shut-in pressure, residual oil and water saturation. These five parameters were used as input variables, and oil production was the output variable. Four different models were generated in the modeling process, each using a different combination of parameters. First, an overall oil production model was developed using the three geoscience parameters of permeability, porosity and first shut-in pressure. Second, two models with different input parameters were developed to predict production in the post-water flooding stage only. The results of these models indicate that data-driven models may not be effective for classifying the data set. Hence, a trend model was developed in an attempt to improve the effectiveness and accuracy of the predictive model. The results show that the trend model provides improved performance, comparable to that of the artificial neural network.
© 2009 Elsevier Ltd. All rights reserved.
Keywords:
Decision tree
Neural network
Attribute dependency
Data mining
Petroleum production prediction
1. Introduction
Prediction of oil well production is important in estimating the economic benefit of a well. However, this prediction task is difficult because of the complex subsurface conditions of wells. Even two wells located side by side in the same reservoir may not have the same production (Mattar and Anderson, 2003). In addition, core analysis data obtained from oil fields are limited and tend to be biased. It is also difficult to adequately model the core analysis variables, which are significant factors in oil production. These variables are interdependent or correlated, but no equation describes their inter-relationships. The traditional approach to modeling the variables is curve fitting, which is complex and time consuming. The amount of petroleum data collected in databases today has far exceeded our ability to reduce and analyze data without the use of automated analysis
ARTICLE IN PRESS
X. Li, C.W. Chan / Engineering Applications of Artificial Intelligence 23 (2010) 102–109
2. Background literature
2.1. Artificial neural network in petroleum prediction
The perceptron, an early artificial neural network (ANN), was introduced by the psychologist Frank Rosenblatt in 1958 (Minsky, 1969). Since then, the ANN approach has been used to predict oil production from geological parameters such as porosity, permeability and fluid saturation obtained from conventional well logs and core analysis (Mohaghegh et al., 2005). Because the inter-relationships among these parameters are complicated, and prediction with conventional methods can be time consuming and expensive in terms of labor and computational resources, demand for effective production-prediction models has increased rapidly. Compared with the conventional methods, the ANN approach has been shown to generate more accurate and repeatable results (Mohaghegh et al., 1995; Chan and Nguyen, 2003). Some recent efforts at applying ANN methods in petroleum engineering include the following. Nguyen et al. (2004) used a multiple neural network model to make both short- and long-term time series predictions of petroleum production. Mohaghegh et al. (1994) applied ANN to predict formation permeability with good accuracy, using data from geophysical well logs. Wong et al. (1998) used ANN to predict permeability in the Ravva oil and gas field, offshore India. Nikravesh et al. (1996) used ANN to predict formation damage during fluid injection into fractured, low-permeability reservoirs.
2.2. Decision tree learning for petroleum engineering
A decision tree is a decision support tool that generally refers to a graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. In data mining, a decision tree is a predictive model: a mapping from observations about an item to conclusions about the item's target value (JiaWei and Micheline, 2001).
One advantage of decision trees is that petroleum engineers can easily interpret and use them. For example, the wells with the best potential oil production can be identified from the terminal nodes of the tree that have the highest percentage of production, and the user can then focus on the specific wells described by those nodes. In comparison with other methods, a decision tree can be constructed relatively easily and quickly.
Some research on the application of decision tree learning in petroleum engineering is discussed as follows. Perez et al. (2005) used decision trees to classify data for permeability prediction based on well logs. Jensen (1998) applied decision tree analysis to estimate the range of uncertainty in a reservoir production prognosis. Agbon et al. (2003) compared a decision tree model with a fuzzy model for ranking reservoirs in Venezuela by the amount of natural gas production. The decision tree learning research in the existing literature used implementations that support only univariate attribute tests at each node, a representation that is adequately expressive only if the attributes in the training data are assumed to be independent. In petroleum engineering, however, the available data sets typically contain interdependent attributes. Hence, possible dependencies among attributes must be taken into consideration when designing decision tree
Fig. 1. Design of the NDT Model. (Boxes in the flowchart: Nominal Type, Numeric Type, Mixed Type, Encode, Normalization, Neural Network (feed-forward back-propagation model), Collect hidden weights, Combine/Merge Attributes, Decision Rules.)
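The figure names the NDT stages but not how the hidden weights are turned into attribute groups. The Python sketch below is a hypothetical illustration of two of the stages, assuming min-max scaling for "Normalization" and, for "Collect hidden weights / Combine/Merge Attributes", a simple rule that groups the attributes whose weights into the same hidden node of an already-trained network are at least a fraction `threshold` of that node's strongest weight. The function names, the threshold rule and the weight values are assumptions for illustration; they are not taken from Lee and Yen (2002) or from the NDT implementation used in this paper.

```python
from typing import Dict, List, Sequence


def min_max_normalize(column: Sequence[float]) -> List[float]:
    """Scale one numeric attribute to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in column]


def group_attributes(
    weights: Sequence[Sequence[float]],  # weights[i][j]: input attribute i -> hidden node j
    names: Sequence[str],
    threshold: float = 0.5,  # hypothetical cut-off, relative to the strongest link
) -> Dict[int, List[str]]:
    """For each hidden node j, collect the attributes whose |weight| into j is at
    least `threshold` times the largest |weight| into j. Only groups with two or
    more attributes are returned, since only those would be merged."""
    if len(weights) != len(names):
        raise ValueError("one weight row is needed per attribute")
    groups: Dict[int, List[str]] = {}
    for j in range(len(weights[0])):
        magnitudes = [abs(row[j]) for row in weights]
        strongest = max(magnitudes)
        if strongest == 0:
            continue
        members = [n for n, m in zip(names, magnitudes) if m >= threshold * strongest]
        if len(members) > 1:
            groups[j] = members
    return groups


if __name__ == "__main__":
    print(min_max_normalize([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
    names = ["permeability", "porosity", "first_shut_in_pressure"]
    made_up_weights = [[2.1, 0.1], [1.8, 0.2], [0.1, 1.5]]  # illustrative values only
    print(group_attributes(made_up_weights, names))  # {0: ['permeability', 'porosity']}
```

In a real NDT pipeline the weight matrix would come from the trained feed-forward back-propagation network, and each returned group would become one combined attribute for the decision tree learner.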
The original data set used for modeling suffered from the commonplace inadequacies of being incomplete (lacking attribute values or certain attributes of interest, or containing only aggregate data), noisy (containing errors, or outlier values that deviate from the expected), and inconsistent (containing discrepancies in the tag names used to label attributes). In other words, errors, unusual values, and inconsistencies were reported in the data recorded for some transactions. Because of these inadequacies, and because a large amount of redundant data can slow down the analysis, data preprocessing was conducted to improve the efficiency of the analysis process. Since the geoscience values of permeability, porosity, water saturation, and residual oil are measured over different depth ranges at each well in the core analysis, we calculated a weighted average of each geoscience parameter for each well and used it as the corresponding input value.
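The weighting scheme behind the per-well averages is not stated. A minimal sketch, assuming each core measurement is weighted by the thickness of the depth interval it represents (a hypothetical choice), could look like this:

```python
from typing import Iterable, Tuple

# (top_depth, bottom_depth, measured_value) for one core interval of one well
Interval = Tuple[float, float, float]


def depth_weighted_average(intervals: Iterable[Interval]) -> float:
    """Average a core-analysis parameter over one well, weighting each
    measurement by the thickness of its depth interval (assumed weighting)."""
    total_thickness = 0.0
    weighted_sum = 0.0
    for top, bottom, value in intervals:
        thickness = bottom - top
        if thickness <= 0:
            raise ValueError(f"bottom depth must exceed top depth: {top}, {bottom}")
        total_thickness += thickness
        weighted_sum += thickness * value
    if total_thickness == 0:
        raise ValueError("no intervals supplied")
    return weighted_sum / total_thickness


if __name__ == "__main__":
    # Hypothetical permeability readings: 1 depth unit at 100 mD, 3 depth units at 200 mD.
    print(depth_weighted_average([(1000.0, 1001.0, 100.0), (1001.0, 1004.0, 200.0)]))  # 175.0
```

Thickness weighting keeps a long interval with one reading from counting the same as a short interval; if the core samples were taken at uniform spacing, a plain mean would give the same result.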
Fig. 2. Distribution of data in each class. (Bar chart of the number of samples per class, vertical axis 0–300; legible class labels: B, C, E, F.)
Table 1
Comparison of NDT model and C4.5 results.
Measures
131      117      67       53
52       45       24       17
320      320      320      320
86.88    85.31    80.94    79.69
13.12    14.69    19.06    20.31
0.13     0.15     0.19     0.20
5.51     6.90     11.63    13.20
15692    14070    28211    26468
Table 2
Comparison of NDT model and ANN results.
Measures    NDT cross-valid    ANN cross-valid
            35                 3
            122                122
            71.0744            71.1
            0.4033             0.3465
            0.2132             0.2277
            89.4032            95.5131
            116.8512           100.4011
increases after water flooding, and with more void space within a rock, the oil can be pushed to the surface more easily.
4.4.3. Trend predicting model
The error analysis of the post-water flooding model reported above led us to conclude that the mechanisms used so far were somewhat superficial, and that these configurations may not allow the data-driven models to classify oil production into physically interpretable classes.
In order to improve the effectiveness of the classification model, we attempted to predict the production trend of each oil well instead of directly estimating its production. We worked with petroleum
Table 3
Comparison of NDT model and ANN results.
Measures    NDT cross-valid    ANN cross-valid
            30                 3
            122                122
            76.8595            72.7273
            0.4233             0.3488
            0.2215             0.223
            92.884             93.5436
            122.6586           101.0489

Table 4
Experimental results of NDT.
Measures                           NDT qi prediction       NDT Di prediction
                                   Training    Testing     Training    Testing
Number of samples                  28          3           28          3
Root mean square error (RMSE)      17.17       42.31       0.0653      0.0094
Mean absolute error                13.48       38.32       0.0315      0.0086
Relative absolute error (%)        62.18       124         92.3        62.8
Root relative squared error (%)    61.39       127.86      94.6        51.3
Correlation coefficient            0.79        0.50        0.52        0.95
Table 5
Experimental results of ANN.
Measures                           ANN qi prediction       ANN Di prediction
                                   Training    Testing     Training    Testing
Number of samples                  28          3           28          3
Root mean square error (RMSE)      32.43       38.46       0.0607      0.0546
Mean absolute error                25.37       32.11       0.039       0.0395
Relative absolute error (%)        117.03      104         115.36      287.83
Root relative squared error (%)    115.96      116.23      87.82       298.03
Correlation coefficient            0.337       0.91        0.484       0.94
the accuracy of the model. Tables 4 and 5 show that, for testing, the relative errors of Di are higher for the ANN model than for the NDT model, which means that the output of the NDT model tends to lie fairly close to its average value and is therefore easier to predict than that of the ANN model. However, in qi prediction, the relative errors of both NDT and ANN are high, which indicates that the predictability of the model is low.
In summary, the comparison of NDT and ANN in predicting qi and Di is inconclusive: according to the Mean Absolute Error, the NDT is better in Di prediction, and the reverse is true for qi prediction.
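For reference, the error measures in Tables 4 and 5 can be computed as in the sketch below. It assumes the common definitions in which the relative errors compare the model with a baseline that always predicts the mean of the actual values, so a relative error above 100% means the model does worse than that baseline; some tools (Weka, for example) may take that mean from the training data instead, so their figures can differ slightly. The function name `regression_errors` is illustrative.

```python
import math
from typing import Dict, Sequence


def regression_errors(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """MAE, RMSE, relative absolute error (%), root relative squared error (%)
    and Pearson correlation, with the mean of `actual` as the relative baseline."""
    n = len(actual)
    if n < 2 or n != len(predicted):
        raise ValueError("need two equal-length sequences with at least two values")
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    abs_dev = sum(abs(a - mean_a) for a in actual)  # error of always predicting the mean
    sq_dev = sum((a - mean_a) ** 2 for a in actual)
    if sq_dev == 0:
        raise ValueError("relative errors are undefined for a constant target")
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    denom = math.sqrt(sq_dev * var_p)
    return {
        "MAE": abs_err / n,
        "RMSE": math.sqrt(sq_err / n),
        "RAE%": 100.0 * abs_err / abs_dev,
        "RRSE%": 100.0 * math.sqrt(sq_err / sq_dev),
        "corr": cov / denom if denom > 0 else float("nan"),  # undefined for constant output
    }


if __name__ == "__main__":
    actual = [1.0, 2.0, 3.0, 4.0]
    print(regression_errors(actual, [1.5, 2.0, 2.5, 4.5])["RAE%"])  # 37.5
    # Predicting the mean everywhere gives RAE% = RRSE% = 100 by construction.
    print(regression_errors(actual, [2.5] * 4)["RAE%"])  # 100.0
```

Read this way, the qi testing errors of 124% (NDT) and 104% (ANN) suggest that neither model beats a constant mean prediction on those three test wells, which is consistent with the low predictability noted above.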
5. Conclusions
This paper reports on an ongoing research program whose objective is to predict petroleum production using different machine intelligence techniques. The initial research goal is to perform prediction modeling for petroleum production, and the approach we used is the neural-based decision tree (NDT) model. The NDT model, being analogous to decision tree algorithms, is shown to have some advantages compared with the artificial neural network. From the experiments presented here, involving different strategies and different parameter combinations, the following conclusions can be drawn:
Firstly, an overall oil production model was developed using the three geoscience parameters of permeability, porosity and first shut-in pressure; this model has an average classification accuracy of 80.31%. Although this accuracy is about 5% lower than that of the regular C4.5 model, the NDT model reduces the tree size and the number of rules by half, which makes it easier for petroleum engineers to analyze the rules it generates. In addition, permeability is shown to be the most important variable in predicting petroleum production.
Secondly, post-water flooding models were developed using three and four geoscience parameters. Despite the low classification accuracy of these models, they show that porosity, rather than permeability, is the most significant factor in predicting petroleum production in the post-water flooding stage. This finding is consistent with the fact that the oil wells had different characteristics after water flooding was applied. The low classification accuracy obtained with only three attributes is likely due to insufficiently processed input data and a lack of domain knowledge about the post-water flooding conditions of the well.
Thirdly, a trend model was developed in order to improve the effectiveness of the prediction model. In this model, oil wells with a harmonic relationship between production rate and time were selected, and the parameters qi and Di of the empirical Arps decline equation were predicted. The model demonstrated improved prediction performance, with a Mean Absolute Error of 38.32 in predicting qi and 0.0086 in predicting Di. Compared with the value ranges of qi [23.798, 161.699] and Di [0, 0.38], these Mean Absolute Errors are relatively small.
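The harmonic case (b = 1) of the Arps (1945) decline equation is q(t) = qi / (1 + Di·t), where qi is the initial production rate and Di the initial decline rate per unit time. The text does not say how the target values of qi and Di were obtained for each well; one simple possibility, sketched below under that assumption, is to linearize the equation as 1/q = 1/qi + (Di/qi)·t and fit a straight line to (t, 1/q) by ordinary least squares.

```python
from typing import Sequence, Tuple


def harmonic_rate(qi: float, di: float, t: float) -> float:
    """Arps harmonic decline (b = 1): q(t) = qi / (1 + Di * t)."""
    return qi / (1.0 + di * t)


def fit_harmonic(times: Sequence[float], rates: Sequence[float]) -> Tuple[float, float]:
    """Estimate (qi, Di) by ordinary least squares on the linearized form
    1/q = 1/qi + (Di/qi) * t, i.e. intercept = 1/qi and slope = Di/qi."""
    n = len(times)
    if n < 2 or n != len(rates):
        raise ValueError("need at least two (time, rate) pairs")
    if any(q <= 0 for q in rates):
        raise ValueError("rates must be positive")
    ys = [1.0 / q for q in rates]
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("times must not all be equal")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    if intercept <= 0:
        raise ValueError("data are not consistent with harmonic decline")
    qi = 1.0 / intercept
    return qi, slope * qi  # Di = slope * qi


if __name__ == "__main__":
    # Synthetic, noise-free history generated with qi = 100 and Di = 0.05 per period.
    t = [float(k) for k in range(12)]
    q = [harmonic_rate(100.0, 0.05, tk) for tk in t]
    qi_hat, di_hat = fit_harmonic(t, q)
    print(round(qi_hat, 6), round(di_hat, 6))  # 100.0 0.05
```

Fitting in reciprocal space is convenient because it reduces to a straight-line fit, but it weights low-rate (late-time) points more heavily than a direct nonlinear fit of q(t) would; with noisy field data the two approaches can give different qi and Di.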
In addition, we found that the performance of the NDT model is comparable to that of the artificial neural network. The advantages of the NDT model when compared with the ANN model are
each class, obtaining one best rule for each class of the decision variable. It is important to separate the growing and pruning sets, because it would be misleading to evaluate a rule on the data used to form it, and it would lead to serious errors if rules that overfit the data were preferred (Witten and Frank, 2005).
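The separation of growing and pruning sets can be illustrated with a small sketch. It assumes a stratified split in which about two thirds of each class go to the growing set (the ratio used by RIPPER-style rule learners, assumed here rather than taken from the paper), and a rule's accuracy is then measured only on the held-out pruning set. The rule representation (a predicate plus a class label) and the function names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# One training example: (attribute values, class label)
Example = Tuple[Dict[str, float], str]


def grow_prune_split(
    data: Sequence[Example], grow_fraction: float = 2 / 3, seed: int = 0
) -> Tuple[List[Example], List[Example]]:
    """Stratified split: about `grow_fraction` of each class goes to the growing set."""
    rng = random.Random(seed)
    by_class: Dict[str, List[Example]] = defaultdict(list)
    for example in data:
        by_class[example[1]].append(example)
    grow: List[Example] = []
    prune: List[Example] = []
    for examples in by_class.values():
        shuffled = list(examples)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * grow_fraction)
        grow.extend(shuffled[:cut])
        prune.extend(shuffled[cut:])
    return grow, prune


def rule_accuracy(
    condition: Callable[[Dict[str, float]], bool], label: str, examples: Sequence[Example]
) -> float:
    """Share of the examples covered by the rule that belong to the rule's class;
    0.0 when the rule covers nothing."""
    covered = [cls for attrs, cls in examples if condition(attrs)]
    return sum(cls == label for cls in covered) / len(covered) if covered else 0.0


if __name__ == "__main__":
    data = [({"permeability": 10.0 * i}, "high" if i >= 6 else "low") for i in range(12)]
    grow, prune = grow_prune_split(data)
    # A rule would be grown on `grow`; its accuracy is judged only on `prune`.
    print(len(grow), len(prune))
    print(rule_accuracy(lambda a: a["permeability"] >= 60.0, "high", prune))  # 1.0
```

Stratifying by class keeps rare production classes represented in both sets, which matters for small, unevenly distributed data such as the class distribution in Fig. 2.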
Since the relations between the geoscience parameters are highly nonlinear, the prediction would likely be improved by using more geoscience parameters as input attributes. In this way, more dependencies among attributes would be considered and identified in the NDT model.
When data-driven models (NDT, ANN, etc.) are used, some domain knowledge of petroleum engineering is needed to analyze and validate the rules and results generated by these models.
6. Future work
It can be observed that the problem of predicting oil production is difficult. Although we have applied different approaches to different aspects of the problem, such as data preprocessing by the NDT model, different sets of input parameters, and different AI techniques, the prediction results are still not as accurate as desired. Oil production appears to depend on many factors, some of which have not been taken into account. Moreover, some factors, such as permeability, cannot be measured accurately because their values vary across locations and rock formations.
The NDT model has been applied to petroleum production prediction, a domain that involves primarily numeric data. In future work, the project can be extended to include more data of categorical and mixed types. Further research is needed to define formal processes for integrating attribute grouping into the construction of a multivariate decision tree for categorical data modeling. Without a defined process, the multivariate tree model generated depends entirely on the user's interpretation of the link weights, which are used to prune errors. A heuristic search based on link weights should also be considered for constructing a multivariate tree that favours low class entropy and supports meaningful attribute groupings. We believe that if all the dependencies among input attributes can be captured, further improvement in classification accuracy can be realized.
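To make the "low class entropy" criterion concrete, the sketch below scores a candidate multivariate test: each row is projected onto a weight vector (in an NDT setting this vector might be derived from the link weights, which is an assumption), and the threshold with the lowest weighted class entropy over the two branches is returned. The toy data show a class that depends on the sum of two attributes, which no single-attribute test separates perfectly but the combined test does. All names and data are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def class_entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (bits) of the class distribution; 0.0 for an empty or pure set."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values()) if n else 0.0


def best_linear_split(
    rows: Sequence[Sequence[float]], labels: Sequence[str], weights: Sequence[float]
) -> Tuple[float, float]:
    """Project each row onto `weights` (z = w . x) and try thresholds midway between
    consecutive distinct z values. Returns (threshold, weighted class entropy of the
    two branches) for the best split, or (nan, parent entropy) if no split helps."""
    z = [sum(w * x for w, x in zip(weights, row)) for row in rows]
    pairs = sorted(zip(z, labels))
    n = len(pairs)
    best = (float("nan"), class_entropy(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal projections
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        h = (len(left) * class_entropy(left) + len(right) * class_entropy(right)) / n
        if h < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, h)
    return best


if __name__ == "__main__":
    # Toy data: the class is "high" only when attr1 + attr2 > 3.
    rows = [[0.0, 3.0], [3.0, 0.0], [2.0, 2.0], [1.0, 1.0]]
    labels = ["low", "low", "high", "low"]
    print(best_linear_split(rows, labels, [1.0, 0.0]))  # univariate test: (1.5, 0.5)
    print(best_linear_split(rows, labels, [1.0, 1.0]))  # combined test: (3.5, 0.0)
```

A heuristic search would compare many candidate weight vectors in this way; the example only shows why a test on a combined attribute can reach lower class entropy than any univariate test when attributes are dependent.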
Some models with high reported errors need to be investigated further in different ways. It would also be of interest to apply the NDT model to the prediction of gas and water in oil wells, integrating other geoscience parameters from DST, well log, and core analysis data sources. It is necessary to collaborate with petroleum engineers who can help classify geoscience data into different rock formation groups, so that one model can be developed for each group. In that approach, permeability and porosity will not be averaged over all depths but only over values from a single formation. Also, the production of a well will be calculated as the sum of the production from its different formations.
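The per-formation scheme proposed above could be organized as in the following sketch, which averages each parameter within one formation (a plain arithmetic mean is assumed) and sums the formation-level production to obtain a well's total. The data layout and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple


def per_formation_means(samples: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average a parameter (e.g. porosity) separately within each formation of one
    well, instead of over all depths. `samples` holds (formation, value) pairs."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for formation, value in samples:
        sums[formation] += value
        counts[formation] += 1
    return {f: sums[f] / counts[f] for f in sums}


def well_production(production_by_formation: Mapping[str, float]) -> float:
    """Total well production as the sum of the formation-level productions."""
    return sum(production_by_formation.values())


if __name__ == "__main__":
    porosity_pct = [("formation_A", 10.0), ("formation_A", 20.0), ("formation_B", 5.0)]
    print(per_formation_means(porosity_pct))  # {'formation_A': 15.0, 'formation_B': 5.0}
    print(well_production({"formation_A": 120.0, "formation_B": 30.0}))  # 150.0
```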
Furthermore, the NDT model implemented in this project cannot deal with missing attribute values. This issue needs to be investigated so that an effective process for dealing with missing values can be defined. One possible approach is to fill in the missing value with the attribute mean, or with the attribute mean of all samples belonging to the same class as the given tuple. However, this method biases the data, and the filled-in value may not be correct. Another popular solution is to fill in the missing value with the most probable value (JiaWei and Micheline, 2001), which uses all the information in the present data set to predict the missing value. The judgment on the most probable value is made by the user.
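The two mean-based imputation strategies mentioned above can be sketched as follows; `None` marks a missing value, and a class with no observed value falls back to the overall attribute mean (a hypothetical choice). The "most probable value" strategy, which would require a predictive model, is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def impute_mean(values: Sequence[Optional[float]]) -> List[float]:
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("attribute has no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def impute_class_mean(values: Sequence[Optional[float]], labels: Sequence[str]) -> List[float]:
    """Replace each missing value with the mean of the observed values in the same
    class; a class with no observed value falls back to the overall mean."""
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for v, c in zip(values, labels):
        if v is not None:
            sums[c] += v
            counts[c] += 1
    overall = impute_mean(values)  # used only for classes without observed values
    return [
        v if v is not None
        else (sums[c] / counts[c] if counts[c] else overall[i])
        for i, (v, c) in enumerate(zip(values, labels))
    ]


if __name__ == "__main__":
    permeability = [10.0, None, 30.0, None]
    production_class = ["A", "A", "B", "C"]
    print(impute_mean(permeability))  # [10.0, 20.0, 30.0, 20.0]
    print(impute_class_mean(permeability, production_class))  # [10.0, 10.0, 30.0, 20.0]
```

Both functions illustrate the bias noted above: every filled-in value sits exactly at a mean, which shrinks the attribute's variance.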
Acknowledgements
The generous support of a grant from the Canada Research
Chairs Program for the first author is gratefully acknowledged. The
authors would also like to thank Hahn H. Nguyen and Jon Hromek
for their contributions to this work.
References
Agbon, I.S., Aldana, G.J., Araque, J.C., 2003. Fuzzy ranking of gas exploitation opportunities in mature oil fields in Eastern Venezuela. SPE paper 84337 presented at the SPE Annual Technical Conference and Exhibition, Denver, CO, USA, 5–8 October 2003.
Arps, J.J., 1945. Analysis of decline curves. Trans. AIME 160, 228–247.
Chan, C.W., Nguyen, H.H., 2003. An analysis of artificial neural networks versus curve estimation techniques in predicting petroleum production. Paper EIA03038, vol. 1, International Society for Environmental Information Sciences, pp. 375–385.
JiaWei, Han, Micheline, Kamber, 2001. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers.
Jensen, T.B., 1998. Estimation of production forecast uncertainty for a mature production license. SPE Annual Technical Conference and Exhibition, New Orleans, USA, SPE 49091, September 1998.
Lee, Y.-S., Yen, S.-J., 2002. Neural-based approaches for improving the accuracy of decision trees. In: Proceedings of the Data Warehousing and Knowledge Discovery Conference, pp. 114–123.
Li, X., Chan, C.W., 2007. Towards a neural-network-based decision tree learning algorithm for petroleum production prediction. Proceedings of IEEE CCECE 2007, Vancouver, BC, Canada, April 22–26, 2007.