
SPE-167869-MS

Enhancing Wellwork Efficiency with Data Mining and Predictive Analytics


Mohamed Sidahmed, SPE, Eric Ziegel, SPE, Shahryar Shirzadi, SPE, David Stevens, SPE, and Maria Marcano,
SPE, BP
Copyright 2014, Society of Petroleum Engineers
This paper was prepared for presentation at the SPE Intelligent Energy Conference and Exhibition held in Utrecht, The Netherlands, 1-3 April 2014.
This paper was selected for presentation by an SPE program committee following review of information contained in an abstract submitted by the author(s). Contents of the paper have not been
reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material does not necessarily reflect any position of the Society of Petroleum Engineers, its
officers, or members. Electronic reproduction, distribution, or storage of any part of this paper without the written consent of the Society of Petroleum Engineers is prohibited. Permission to
reproduce in print is restricted to an abstract of not more than 300 words; illustrations may not be copied. The abstract must contain conspicuous acknowledgment of SPE copyright.

Abstract
Effective well management and a productive wellwork program are valuable and integral business objectives. Wellwork
involves various well interventions and optimisation activities for enhancing and extending hydrocarbon production. These
remedial processes involve substantial CAPEX and OPEX, as well as other resource allocations.
Failure to prioritize objectives and improper selection of candidate wells can have significant implications on both derived
value and potential risk. A primary challenge is to ensure that wellwork is delivering production growth while maintaining cost
efficiency. Well-by-well reviews with actionable decision support information will provide the best method for identifying
potential production improvements. The selection and prioritisation of candidate jobs is a critical investment decision.
This paper addresses the business problem of reducing the uncertainty of wellwork program outcomes so that more informed
choices can be made from all the options, such that the benefits and value of an overall wellwork program are enhanced and
optimized.
It illustrates the use of data-driven models for estimating key performance indicators for wellwork jobs and predicting the
likely outcome for a new planned job using pre-determined success criteria. Nine different machine learning and advanced
analytics learning schemes were applied to the training dataset of wellwork history. The competing models' performance was
evaluated on a separate validation dataset for a balance between best fit and prediction accuracy.
The application of developed models provided intelligence augmentation for the decision-making process. This methodology
embeds learning from past wellwork activities to streamline and guide complex workflows. The business value for embedding
quantitative predictions into strategic and operational decision-making processes is realized in reducing less-favorable
investments and maximizing the value of wellwork.
Introduction
Wellwork comprises the complete end-to-end business process covering any operation on an oil or gas well during or at the
end of its productive life. Effective wellwork maintains the integrity of the well stock and/or alters the state of the well and/or
well geometry, provides well diagnostics, or manages the production of the well. The objective of the wellwork program is to
help manage the integrity, maintenance and optimization of the existing well inventory in support of ongoing production and
opportunity identification.
A significant percentage, 15-20%, of a field's total recovery is a result of reservoir complexity and fluid composition. This
makes planning workover activities asset-dependent. Well-by-well reviews, with good support information, remain the best
way of spotting large amounts of potential production. The ability to learn from large accumulated data and embed decision
analytics techniques in support of the decision-making process strengthens a business's competitive advantage. Data-driven
incremental learning models provide a set of intelligent tools that synthesize large volumes of data and make timely
recommendations based on learned historical behaviors and discovered hidden patterns across scattered heterogeneous
sources.


In response to the challenges and uncertainty facing expensive wellwork program decisions, BP recognized the value of
embedding data-driven analytics modeling to provide more informed outcome scenarios. The envisioned benefits result in an
overall efficient and optimized program.
This project is a groundbreaking initiative and represents a first endeavor to develop a comprehensive suite of predictive
analytics models in support of workover planning. In this project we examined a wide range of data mining and machine
learning algorithms capable of dealing with the large volume of data, data quality issues, and restrictive parameter constraints
inherent in the process. The resulting model uses variables already available at the planning stage for workover jobs as input to
predict the likely outcome of individual jobs. Enhancing the decision-making process with reduced uncertainty for the
wellwork portfolio maximizes the overall program value and its yield on investment.
This paper is structured as follows. An overview of the BP wellwork evaluation and tracking system is presented. Next,
current data mining and predictive analytics applications in oil and gas are discussed. The subsequent section introduces the
data-driven model development process established for this project and the results that were achieved. The conclusion section
summarizes the project lessons and looks forward toward deployment and sustainability.
Overview of Wellwork Evaluation Tracking System
Determining the optimal wellwork investment portfolio is a critical business decision affecting expected and derived value and
production gain in annualized mboed. This is only achieved through careful understanding and management of the program. It has
been argued that the lack of an efficient wellwork strategy could lead to an accelerated production decline, up to
five times faster than industry averages (King 2005).
BP's wellwork evaluation tracking system (WETS) tracks the company's workover activities and manages its large upstream
project investments. It involves all capital workovers, rig workovers, side-tracks, and other major completion activities.
WETS assesses the incremental benefits achieved by individual workover jobs and aggregates data from across all workover
activities to assess the whole program (Martins et al. 1995). The program is used globally to help optimize value from
hundreds of workover operations per year that cost several hundred million US dollars (Postnikov 2005). Cismoski et al.
(2008) described the process of wellwork planning and prioritizing the wellwork activity list for the North Slope.
In order to ensure sustained rate-enhancement wellwork, WETS embarked on an active surveillance program, scheduled
quality well and production technical limit (PTL) reviews, and encouraged active development and application of technology.
The program established a centralized database for managing and tracking all wellwork activities, which has grown to include
large volumes of data over the fourteen years since being deployed globally. The production data is pulled automatically from
asset production reporting systems. Accumulated data includes information about workover characteristics and post-job
evaluations.
BP embraced technology as a vehicle for deriving sustained levels of efficiency and managing risks. Significant research and
development (R&D) investments in new technologies have proven to yield significant returns on investment (ROI) within their
planned timeline and in some cases exceeded the expected value (Roberts 2013).
It has been demonstrated that using sophisticated algorithms that learn from the data and provide actionable insights has
significant potential in the energy industry (Stone 2007). The objective of this initiative is to expand data mining and machine
learning applications across upstream by applying state-of-the-art techniques to support wellwork programs. The goal is to
help unlock the value of massively collected data for developing predictive models capable of learning from historical data and
reasoning about uncertainty in the surrounding environment.
Value for Oil and Gas from Data Mining and Predictive Analytics
Examples of the value of data-driven analytics and the potential for its use in the oil and gas industry have been demonstrated
by BP in Shirzadi et al. (2013). Other studies have shown the application of data mining techniques for modeling production
optimization (Zangl and Oberwinkler 2004), history matching (Esmaili et al. 2012), and predicting flow and pressure. Liu and
Horne (2011) mined permanent downhole gauges (PDG) data and developed a kernalized linear model to predict flow rate.
Bhattacharya et al. (2013) used decision trees to gain insights into the occurrence of screenouts in wellwork jobs and to
present a visual analysis. Sharma et al. (2010) used multivariate linear regression to predict the recovery factor based on
reservoir information. Additional commercial implementation of multivariate state estimation techniques have been described
for machinery fault alerts and reliability (Rawi 2010).


One of the objectives of this project was to determine whether BP's WETS data could be leveraged to predict workover
outcomes during the planning phase. It did not appear that data-mining methods had been explored in the past within a
wellwork context. In this paper we focus on predicting the likelihood of a job falling in the bottom 30% of realized value from
all of the wellwork interventions within one asset across a five-year period. Not attempting jobs, or reconfiguring
jobs, that have a low likelihood of returning sufficient value will significantly lower BP's wellwork costs.
We adopted the job success criterion, designated "inefficiency", used in WETS to guide model training and evaluation. Figure
1 highlights the subset of wellwork jobs identified as relatively less successful compared to other workover projects. The
inefficiency measure accounts for both the short-term and long-term value of a workover and is not sensitive to superficial
temporary gains. The red points in the plot have high inefficiency levels and represent the jobs with the highest cost and least
productivity.

Figure 1: Success Criteria for Determining Wellwork Outcome (inefficiency, mm$/mboed, plotted against lift cost, $/boe)


The Data-Driven Predictive Analytics Approach
Data mining is the science of extracting valuable knowledge from large databases. It involves several iterative steps. We
applied an iterative development process, progressing from clarifying business objectives through data pre-processing, fitting
of learning models, validation of the models, and performance testing to examine the consistency between model results and
domain experts' knowledge.
Data Collection and Database Construction. One of the challenges for manipulating wellwork big data was developing a
systematic acquisition and assimilation strategy for the relevant input from multiple heterogeneous sources. There were several
functional and technical constraints that amplified the level of complexity for this task. To overcome the challenges associated
with acquiring relevant data, the project adopted an incremental phased approach by selecting one specific asset from the
wellwork portfolio.
The decision was made to retrieve data for a large Alaska asset with a substantial history of tracked wellwork. The project had
been championed by asset stakeholders. This limited approach was sufficient as a proof of concept. The asset selection
criteria were based on a number of factors, including workover history for the asset, regional support, and timely access to data.
One of the project recommendations, derived from the lessons learned, was the establishment of collaboration across data
governance, data custodians, and analytics teams to streamline the process of creating the database for data mining.


Data Preparation. Errors must be removed before data are collected and archived. Despite careful safeguard measures which may be
in place, there are still incidents where measurement sensors drift or malfunction. Human data entry is another common source
of data quality issues. The performance of any data-driven model is only as good as the input data from which it learns. Thus,
meticulously addressing data issues such as missing values, anomalies, outliers, and imbalanced datasets is a crucial step
towards achieving acceptable results.
While extreme data sanitization is considered to be both impractical and cost-prohibitive, ensuring adequate levels of
conformance with typical operating conditions and anticipated deployment environments must be taken into account when
preparing data for modeling. We applied various schemes of data cleansing relevant to each individual attribute. This approach
avoids the common mistaken assumption that all dataset attributes can be treated identically during preprocessing.
For this project, the raw data extraction had to be processed to prepare data for various modeling algorithms and learning
schemes. We applied several methods from statistics, signal processing, and machine learning to maximize the information
content in the raw data while diminishing the influence of poor or missing measurements. These methods include filtering,
spectral decomposition, estimation of dynamic characteristics, and missing data imputation. For efficiency purposes, we also
created grouped categories for variables with a large number of categories.
Domain experts helped rationalize some of the inconsistencies discovered in the data. Additional derived attributes
were created in the preprocessing step to provide additional predictive power for the models. This phase of the project
constitutes a major milestone and involves a significant amount of time and resources: 60-70% of the project timeline is
expended on the tasks in this phase.
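Two of the preparation steps described above, missing-value imputation and the grouping of sparse categorical levels, can be sketched in a few lines. This is a minimal Python illustration, not the project's actual pipeline; the function names and thresholds are hypothetical:

```python
from collections import Counter

def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = sorted(v for v in values if v is not None)
    mid = len(observed) // 2
    median = (observed[mid] if len(observed) % 2
              else (observed[mid - 1] + observed[mid]) / 2)
    return [median if v is None else v for v in values]

def group_rare_categories(labels, min_count=2, other="OTHER"):
    """Collapse infrequent category levels into a single 'OTHER' level."""
    counts = Counter(labels)
    return [lab if counts[lab] >= min_count else other for lab in labels]
```

In practice each attribute received its own cleansing scheme, so a function like this would be applied selectively rather than uniformly across columns.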
Exploratory Analysis Phase. Exploratory analysis phase (EAP) determines relationships among the variables. Bivariate
cross-correlation matrices help to identify causal variables (process inputs) that are potentially good predictors of response
variables of interest. This step involves determining the best noise-resilient predictors for the subsequent learning phase.
A previous study showed that noise levels in attribute data have a significant impact on the volatility of a learning model
(Sidahmed 2008). Attribute sensitivity to noise provides a practical method for determining attribute importance and feature
ranking. The EAP also identifies collinear input variables, which must be de-correlated when they are used to generate
multivariate predictive models.
We used multi-layer perceptron (MLP) artificial neural networks (ANN) to synthesize non-linear functions that would
optimally fit correlated multivariate data. We also performed sensitivity analysis to quantify the level of interaction between
various input variables. A visual representation of response surfaces generated by these functions greatly enhances
understanding how they represent the underlying correlation structure.
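Fitting an MLP to a nonlinear multivariate response can be sketched as follows, using scikit-learn as an assumed stand-in (the paper does not name its tooling) and a synthetic response surface in place of the wellwork data:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)
X = rng.uniform(-1, 1, (400, 2))
y = np.sin(3 * X[:, 0]) * X[:, 1]          # synthetic nonlinear interaction surface

# A small MLP synthesizes a flexible function fitted to the nonlinear response.
mlp = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=3000,
                   random_state=0).fit(X, y)
r2 = mlp.score(X, y)                        # goodness of fit on the training surface
```

Evaluating `mlp.predict` over a grid of the two inputs would produce the kind of response-surface visualization described above.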
The exploratory analysis phase involved the determination of natural groupings of wellwork job attributes. We applied a
robust version of the K-Means clustering algorithm to find two sets of groupings in the wellwork population that distinguished
between the high-success jobs and the low-success jobs. Figure 2 shows the grouping of the data that was discovered for the
wellwork jobs. The similarity between low-efficiency wellwork jobs and relatively high-efficiency ones is projected into
two-dimensional space using the first two principal components to simplify the interpretation of the outcome. The two groups of
attributes were used to guide the subsequent selection of the most relevant predictors for the supervised learning models.
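The clustering-plus-projection step can be sketched as below. This is a generic K-Means/PCA illustration on synthetic stand-in data (the paper used a robust K-Means variant on real job attributes, which is not reproduced here):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical stand-in for wellwork job attributes: two latent groups.
X = np.vstack([rng.normal(0, 1, (40, 6)), rng.normal(3, 1, (40, 6))])

# Find two natural groupings in the attribute space.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Project onto the first two principal components for a 2-D view of the clusters.
coords = PCA(n_components=2).fit_transform(X)
```

Plotting `coords` colored by `labels` gives a two-dimensional view analogous to Figure 2.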


Figure 2: Clusters of the Two Distinct Wellwork Outcomes


Predictive Modeling Development and Calibration. A predictive model is a virtual process that is developed directly
from the data created in data preparation and the correlation structures obtained in the exploratory analysis. It is usually
necessary to adjust the process to accommodate changes in the uncontrolled variables. Broadly, modeling and optimization can
be used for the following activities:
- Operating Practices - evaluate current and alternative practices for performance and cost.
- Fault Detection - a special model configuration can detect a faulty sensor or manual input value, predict a replacement from correlations with other variables, and generate an alarm.
- Forecasts - an inferential model for estimating spatio-temporal trends, with measurements that are normally observed or computed periodically over time and space.
Predictive Modeling Approach.
The focus of this project was on the third activity. We adopted a hybrid approach employing both unsupervised and supervised
learning techniques for the project. We applied nine different machine learning algorithms to "training data" that had been
organized into cases (vectors) of associated inputs and outputs. The algorithms synthesize (learn) a generalized mathematical
formula that predicts values of outputs for different input values. A portion of the modeling data was set aside to provide
"test data" that was used to evaluate how well the models predicted outcomes using data that was not part of the learning
process. This process was also used to determine the best of the learned algorithms.
Some supervised learning algorithms such as multi-layer perceptron (MLP) artificial neural networks (ANN) are fitted using
regressions similar to ones used for statistical models. However, in statistical modeling the forms of the functions that are
fitted are specified, e.g., a linear fit, whereas in statistical learning, an MLP ANN uses a flexible mathematical structure that is
automatically manipulated to provide a numerically optimized fit that is customized to the "shape" of the data. Recognizing the
nonlinear nature of the problem of predicting wellwork outcome, the cost functions for the competing algorithms were
assessed to determine the model quality and best fit for predictions.
Sensitivity Analysis of Various Models
A sensitivity analysis is conducted using a process that represents how much an output will change for a specified change in an
input. In a model having multiple inputs, the sensitivity is the slope of the model's response surface at coordinates defined by
the values of the model's inputs. In a linear model, the sensitivity of the output to a given input is constant; however, in a
nonlinear model, such as a support vector machine (SVM) or an ANN, the sensitivity can change with any change in any
input. It is commonly a goal of data mining projects to determine which parameters most influence an output, and under what
conditions their influence is greatest.
In this project, mean sensitivities (Sen) were calculated by incrementing each input across its range. Other inputs were fixed at
their midranges, and averages were calculated for how much the model's output changed across all of the increments. Sen
values were normalized so that their absolute values |Sen| summed to 1.0. The input parameters were ranked according to their
|Sen|.
The signs of the mean sensitivities indicated whether the relationships between the inputs and the output were proportionate
(positive) or inverse (negative). This determination was important for verifying that a model's input-output relationship was
consistent with known wellwork process parameters. Table 1 shows the top five mean sensitivities of input variables.
Because nonlinear model sensitivities vary whenever any input is changed, Sen values and rankings are merely suggestive and
not always a good representation of model behavior or process dynamics. Systematic model prototyping, ranging, and response
surface visualization are critical for precise interpretation.
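The one-at-a-time sensitivity calculation described above can be sketched as follows. This is a minimal interpretation of the procedure, assuming a linear sweep of each input with the others held at midrange; the exact increments used in the project are not stated:

```python
import numpy as np

def mean_sensitivities(model, lows, highs, n_steps=20):
    """One-at-a-time mean sensitivities: sweep each input across its range
    while holding the others at their midrange, average the change in model
    output per increment, then normalize so that |Sen| sums to 1."""
    lows, highs = np.asarray(lows, float), np.asarray(highs, float)
    mid = (lows + highs) / 2.0
    raw = np.empty(len(lows))
    for j in range(len(lows)):
        x = np.tile(mid, (n_steps, 1))
        x[:, j] = np.linspace(lows[j], highs[j], n_steps)
        y = np.array([model(row) for row in x])
        raw[j] = np.mean(np.diff(y))       # average output change per increment
    return raw / np.sum(np.abs(raw))       # signed, with |Sen| summing to 1.0

# Toy model: output rises with the first input and falls with the second.
sen = mean_sensitivities(lambda r: 2 * r[0] - r[1], lows=[0, 0], highs=[1, 1])
```

The signs of the returned values carry the proportionate/inverse interpretation used to sanity-check the models against known process behavior.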
Table 1: Mean Sensitivity for Top Five Input Variables

  Rank  Input Var    Sen
  1     WELLATTR3    0.165
  2     WELLATTR4   -0.111
  3     MTNATTR6     0.107
  4     JBATTR2     -0.082
  5     JBATTR1     -0.080
Models Training and Validation
We partitioned the modeling dataset into 75% for model training and 25% for model validation. The data partition preserved
the prevalence ratio of the two classes and employed a stratified sampling mechanism. A separate testing set was
withheld for final assessment and selection of the best model. We developed an automated process for encoding the supervised
learning target class outcome for historical wellwork jobs. The data partitions were derived directly from the results of the data
pre-processing step.
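A stratified 75/25 partition of this kind can be sketched with scikit-learn (an assumed stand-in; the data below are synthetic placeholders for the encoded job attributes and two-class target):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))             # hypothetical encoded job attributes
y = (rng.random(200) < 0.3).astype(int)   # 1 = bottom-30% outcome class

# 75/25 split; stratify=y preserves the class ratio in both partitions.
X_tr, X_va, y_tr, y_va = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
```

Without `stratify`, a rare class could be under-represented in the smaller partition, biasing the validation estimates.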
In order to determine the best model for prediction performance and to overcome some of the learning bias inherent in the
algorithms, several different models were trained on attributes of planned jobs to predict wellwork outcomes. The
first two in the list below are ensemble models of separately trained underlying models, independent of the other seven learned
models:

- Ensemble of weak learners - Boosting model
- Random Forest model
- Kernel learner - SVM model
- Decision Trees model
- Gradient descent model
- Multi-layer perceptron ANN model
- Logistic regression model
- Rule-based induction model
- Non-linear additive model
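Training several competing learners on the same data can be sketched as below, using scikit-learn as an assumed stand-in and a subset of the algorithm families listed above on synthetic data:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, 300) > 0).astype(int)

# A subset of the competing learning schemes, fitted on the same data.
models = {
    "boosting": GradientBoostingClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "svm": SVC(random_state=0),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "logistic": LogisticRegression(max_iter=1000),
}
scores = {name: m.fit(X, y).score(X, y) for name, m in models.items()}
```

In the project the comparison was of course made on withheld data, not training accuracy as in this toy loop.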

For all models, we carried out a variable selection process to include relevant predictors from the full set of inputs. The
feature selection procedure employed a forward stepwise least squares model using R-squared and chi-square as selection
criteria for drawing inferences about expected outcomes.
Since multiple models were developed in parallel, an interim decision tree model was generated to facilitate preliminary
discussions with the stakeholders and obtain feedback on some primary results. The graphical representation of the decision
tree model is a valuable tool for communicating results concisely and with less complexity relative to
other algorithms. Figure 3 shows a pruned decision tree. The decision tree model provided a useful mechanism for the
post-pruning that produced compact decision rules.


Figure 3: Pruned Decision Tree Model


The selection of the best model was based on having a minimal misclassification rate and the largest area under the
Receiver Operating Characteristic (ROC) curve (AUC). This curve plots the true positive rate against the false positive rate at
different threshold levels, representing the trade-off between correct and incorrect predictions of wellwork outcomes. The graph
highlights the balance between sensitivity and specificity. Curves close to the blue 45° diagonal line have small AUCs and
represent models that are less accurate; an AUC value closer to unity is desired. Model comparisons and performance are shown in
Figure 4. The best performing model has a balance between accurately predicting the least successful wellwork jobs and
incorrectly predicting that a job would fall into the unsuccessful category.
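Computing an ROC curve and its AUC from predicted scores can be sketched as follows (scikit-learn as an assumed stand-in, with a toy set of labels and scores rather than the project's predictions):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])                 # toy outcomes
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9])  # toy scores

# fpr/tpr trace the curve across all score thresholds; AUC summarizes it.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
```

Plotting `tpr` against `fpr` for each candidate model, with the 45° chance diagonal for reference, reproduces the comparison shown in Figure 4.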

Figure 4: ROC Curve Analysis - Assessing Competing Models' Performance (top model annotated)


Figure 5 shows the importance of the predictive variables ranked from highest to lowest for the top model (Random Forest).
Sub-figure 5(a) indicates that the top three variables contributing to prediction accuracy are intervention-, well-, and job-related
respectively. Sub-figure 5(b) ranks variables by their contribution to the Gini impurity criterion at each split on that particular variable.

The decrease in Gini value, a measure of inequality for splits based on discrete variables, is another indicator of the importance
of each variable.

Figure 5: Attributes Importance - 5(a) mean decrease in accuracy; 5(b) mean decrease in Gini


Based on model selection criteria, Table 2 shows the confusion matrix for the top model performance with the training and
validation sets. The overall model accuracy is 76%, with a misclassification rate of 24%. The model's average squared error is
0.17, and its AUC value is 0.8.
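The accuracy and confusion-matrix bookkeeping can be sketched as follows (scikit-learn as an assumed stand-in, on toy labels rather than the project's predictions):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0, 0, 0])  # toy actual outcomes
y_pred = np.array([0, 0, 1, 0, 1, 0, 1, 1, 0, 0])  # toy predicted outcomes

cm = confusion_matrix(y_true, y_pred)   # rows: actual class, cols: predicted
accuracy = np.trace(cm) / cm.sum()      # correct predictions / all predictions
```

Normalizing `cm` per class (rather than reporting raw counts) yields percentage tables of the kind shown in Table 2.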
Table 2: Model Performance on Training and Validation Datasets

Model performance on Training Set:
                          Actual Other   Actual Bottom 30
  Predicted Other             78.7%           29.2%
  Predicted Bottom 30         21.3%           70.8%

Model performance on Validation Set:
                          Actual Other   Actual Bottom 30
  Predicted Other             78.2%           33.3%
  Predicted Bottom 30         21.8%           66.7%
The top model accuracy was characterized by computing the margins. For an ensemble model with two possible outputs (least
efficient wellwork job vs. others), the margin measures the degree to which the average number of votes for the correct output
exceeds the average votes for the incorrect output. The larger the margin, the more confidence in the model.
In this situation there are two outcomes (two classes): jobs that fall within the bottom 30% in terms of
the success criterion, and those outside of the first group. The margin of each observation is computed as the proportion of votes
for the correct outcome adjusted for the highest proportion of votes among the wrong class. When the difference between the
two is positive, majority rule predicts the correct target class. Margins of predictions from the random forest model are
presented in Figure 6.

Figure 6: Top Model Margins for the Two-class Outcome
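The per-observation margin for a two-class forest can be sketched as below (scikit-learn as an assumed stand-in, with synthetic data; in the two-class case the "highest wrong-class vote" is simply the vote for the other class):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=250, n_features=6, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Margin = vote fraction for the true class minus vote fraction for the other
# class; a positive margin means the majority vote is correct.
proba = rf.predict_proba(X)                 # per-class vote fractions per job
margins = proba[np.arange(len(y)), y] - proba[np.arange(len(y)), 1 - y]
```

Plotting `margins` against observation index reproduces the kind of view shown in Figure 6, with points below zero marking misclassified jobs.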


Throughout the process of training models, we introduced several levels of complexity to the learned model. This process was
carefully examined for all the models developed for the project. Careful experimentation with various numbers of base
learners provided better trade-off choices between model performance and complexity. To account for operating environment
and deployment constraints, we favored the less complex model whenever two models performed similarly.

The top and selected model exhibits stable behavior throughout the training, learning, and validation process. This provided an
additional level of confidence in model robustness. Figure 7 shows the average error rate for the random forest model over the
training, testing, and validation datasets respectively, as a function of the number of trees. The graph reflects steady behavior
over the three different sets of data.

Figure 7: Top Model Error Rates (MSE)


We applied post-model learning using multi-dimensional scaling (MDS) to examine the underlying structure of the distance
measures between the wellwork cases. MDS is a set of techniques that help identify key dimensions that could represent the
underlying structure with minimal loss of information. They represent the structure of distance-like data in a geometrical
fashion. One of the advantages of applying this multivariate data analysis method is that it enables the visualization of the
similarity or dissimilarity among cases in a substantially lower-dimensional space.
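Embedding a proximity matrix into two dimensions with MDS can be sketched as follows (scikit-learn as an assumed stand-in; the dissimilarity matrix here is random, standing in for the forest-derived proximities between wellwork jobs):

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(3)
# Hypothetical symmetric dissimilarity matrix between 30 wellwork jobs.
d = rng.random((30, 30))
dissim = (d + d.T) / 2.0
np.fill_diagonal(dissim, 0.0)

# Embed the jobs in two dimensions while approximately preserving
# the pairwise distances encoded in the matrix.
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dissim)
```

Scattering `coords` gives the kind of two-dimensional similarity map shown in Figure 8.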


We projected wellwork jobs into two-dimensional space in Figure 8. We noticed that some of the wellwork jobs exhibit
marginal characteristics that do not allow conclusive classification into either the successful or unsuccessful categories.
Overlapping points represent jobs with partial similarity in some of their attribute values. We expect that the models will have
more discriminatory power as additional variables for wellwork planning become available as model inputs.

Figure 8: Multi-Dimensional Scaling of the Model (MDS of the proximity matrix for the B30 model)


In addition to variable importance determined by the model, we used a post-training assessment for each developed model.
We examined the contribution of each attribute used for predicting the outcome of the model. The value of the post-training
analysis was to provide the business with a short list of the most relevant variables for planning a new wellwork. We computed
the intrinsic proximity measure between successful (highly efficient) and unsuccessful (least efficient) wellwork jobs.
Figure 9 illustrates the proximity matrix of the fifteen predictor variables and the two-class target variable. The diagonal
panels carry the attribute names and the two MDS dimensions. The off-diagonal panels show, for example, that a particular
wellwork job can be similar to one job i yet highly dissimilar from another job j.


Figure 9: Top Model Predictor Variables' Contribution to Dissimilarities/Similarities between Jobs (scatterplot matrix of the fifteen predictors - WELLATTR1, WELLATTR3, WELLATTR4, JBATTR1, JBATTR2, PLNATTR1, and MTNATTR1 through MTNATTR9 - against the two MDS dimensions of the B30 model)


Conclusion
Investment decisions in the selection and prioritization of a wellwork portfolio have a significant impact on the potential for
production improvement and significant future cash flow. There are several challenges contributing to the risk of the
decision to undertake wellwork: lack of relevant information, lack of knowledge to do a proper risk assessment, and an ungainly
hurdle rate ($24 oil), among other boundary constraints. The application of data mining and predictive analytics has been
widely used across many disciplines, but it has been a less-exploited capability within oil and gas. In this paper we have shown
that these capabilities allow us to predict the outcome of planned wellwork activities with an adequate level of confidence.
This is a vital step towards mitigating the high risk associated with major annual expenditures and attaining the ultimate return
on investment.
This paper demonstrates the value of applying a data-driven predictive modeling approach to large historical wellwork data
and predicting the likely outcome for a new planned job against pre-determined success criteria. For this project we trained
nine different machine learning algorithms and tested their performance against wellwork history. The competing models'
performance was evaluated on a separate withheld testing set for best fit and prediction accuracy. Overall, the top model
achieved 76% accuracy for predicting wellwork outcome prior to job execution. The model relies on available information
without introducing any additional overhead to the established process.
Achieving more accurate predictions will be possible as additional readily available information about wellwork planning is
incorporated into predictive models. The project has highlighted issues around data quality that are prevalent within the
industry. As more streams of data are captured and stored, emphasis on improving data QA/QC versus missing and inaccurate
values will result in more accurately informed decisions.
Building on the promising results of predicting the least successful wellwork jobs, our plan is to extend predictive models to
support other business-defined criteria. While this project is limited to a particular asset, we will consider generalizing the
approach to more assets, hoping to realize the benefits of building intelligence augmentation into the wellwork
decision-making process.
Acknowledgements
The authors would like to thank BP for approving the publication of this work and recognize the Alaska team for their support
of this project.
Nomenclature
AUC - Area Under the Curve
MDS - Multi-dimensional Scaling
MSE - Mean Squared Error
ROC - Receiver Operating Characteristic
References
Bhattacharya, S., Mauec, M., Yarus, J., Fulton, D., Orth, J. and Singh, A. Causal Analysis and Data Mining of Well
Stimulation Data Using Classification and Regression Tree with Enhancements, SPE 166472, presented at the SPE Annual
Technical Conference and Exhibition, 30 Sep-2 Oct 2013, New Orleans, Louisiana, USA.
Cismoski, D. A., Rossberg, R. S., Julian, J. Y., Murphy, G., Scarpella, D., Zambrano, A., Meyer, C. A. High-Volume
Wellwork Planning and Execution on the North Slope, Alaska, SPE 113955, presented at SPE/ICoTA Coiled Tubing and Well
Intervention Conference and Exhibition, 1-2 April 2008, The Woodlands, TX. USA
Esmaili, S., Kalantari-Dahaghi, A., Mohaghegh, S. Modeling and History Matching of Hydrocarbon Production from
Marcellus Shale Using Data Mining and Pattern Recognition Technologies, SPE 161184, presented at the SPE Eastern
Regional Meeting, 3-5 Oct 2012, Lexington, Kentucky, USA
King, G. Wellwork Candidate Selection, BP internal report, 2005.
Liu, Y., Horne, R. Interpreting Pressure and Flow Rate Data from Permanent Downhole Gauges Using Data Mining
Approaches, SPE 147298, presented at the SPE Annual Technical Conference and Exhibition 30 Oct-2 Nov 2011, Denver,
CO, USA
Martins, J. P., MacDonald, J. M., Stewart, C. G., Phillips, C. J. The Management and Optimization of a Major Wellwork
Program at Prudhoe Bay, SPE 30649, presented at the SPE Annual Technical Conference and Exhibition, 22-25 October 1995,
Dallas, TX, USA
Postnikov, M. 2005. TNK-BP Wellwork Performance Management, SPE 94687, presented at the SPE European Formation
Damage Conference, Netherlands, 25-27 May 2005
Rawi, Z. Machinery Predictive Analytics, SPE 128559, presented at the SPE Intelligent Energy Conference and Exhibition,
23-25 March 2010, Utrecht, The Netherlands
Roberts, S. BP Field of the Future Flagship Technology Contribution to Target Production Increase by 2017, internal report,
2013.
Sharma, A., Srinivasan, S., Lake, L. Classification of Oil and Gas Reservoirs Based on Recovery Factor: A Data-Mining
Approach, SPE 130257, presented at SPE Annual Technical Conference and Exhibition 19-22 September 2010, Florence, Italy
Shirzadi, S., Ziegel, E., Bailey, R. Data Mining and Predictive Analytics Transforms Data to Barrels, SPE 163731, presented
at the SPE Digital Energy Conference and Exhibition, 5-7 Mar 2013, The Woodlands, TX, USA
Sidahmed, M. Attribute Noise-Sensitivity Impact: Model Performance and Feature Ranking, Proceedings of the 14th Americas
Conference on Information Systems 14-17 August 2008, Toronto, Ontario, Canada


Stone, P. Introducing Predictive Analytics: Opportunities, SPE 106865, presented at Digital Energy Conference and
Exhibition, 11-12 April 2007, Houston, Texas, USA.
Zangl, G., Oberwinkler, C. Predictive Data Mining Techniques for Production Optimization, SPE 90372, presented at SPE
Annual Technical Conference and Exhibition, 26-29 September 2004, Houston, Texas, USA
Field of the Future is a registered trademark of BP.
