
OTC-28990-MS

Increasing Production Efficiency via Compressor Failure Predictive Analytics Using Machine Learning

D. Pandya, Shell Global Solutions; A. Srivastava, Shell Business Operations; A. Doherty, Shell U.K. Limited;
S. Sundareshwar, Shell UK Ltd.; C. Needham, Shell U.K. Oil Products; A. Chaudry, Shell Global Solutions; S.
KrishnaIyer, Shell Business Operations

Copyright 2018, Offshore Technology Conference

This paper was prepared for presentation at the Offshore Technology Conference held in Houston, Texas, USA, 30 April–3 May 2018.

This paper was selected for presentation by an OTC program committee following review of information contained in an abstract submitted by the author(s). Contents of
the paper have not been reviewed by the Offshore Technology Conference and are subject to correction by the author(s). The material does not necessarily reflect any
position of the Offshore Technology Conference, its officers, or members. Electronic reproduction, distribution, or storage of any part of this paper without the written
consent of the Offshore Technology Conference is prohibited. Permission to reproduce in print is restricted to an abstract of not more than 300 words; illustrations may
not be copied. The abstract must contain conspicuous acknowledgment of OTC copyright.

Abstract
Unintended loss of uptime (trips) in gas compression systems is one of the top causes of unscheduled
deferment across hydrocarbon production facilities. Compression failures and the deferments they cause
have remained at similar levels for the past 5–10 years. Compressor failures can be attributed to a lack of,
or inappropriate, maintenance, incorrect operating practices and integrity issues, as identified
in the Oil and Gas UK compressor study. The focus of this research paper is on compressor systems on
production facilities that have major production deferments associated with them. In this paper, an advanced
machine learning approach is presented for determining anomalous behavior to predict a potential trip and
probable root cause with sufficient warning to allow for intervention. One-class support vector machines
(SVM) and rate-of-change-based outlier detection have been utilized to classify abnormal operation and
to detect the specific variables contributing to instability, respectively. Development of the algorithms started
with data from more than 2,000 sensors on the low-pressure compressor as well as processes tied to the
compressor. This initial data set was reduced to a more relevant set of tags by feature selection methodologies.
Two separate one-class SVM models were then trained on one year's normal working data to identify abnormal
behavior in a multivariate manner. An outlier detection algorithm was developed to identify
and rank major contributors for potential faulty behavior of the compressor. The algorithms are trained
and tested in R, and a near real-time, online implementation is scheduled on the Alteryx platform, which
provides new predictions every 10 minutes. The results are then visualized on a Spotfire dashboard and, when
triggered, the model flags abnormal behavior to the end user via automated email. At present, the algorithms
have improved identification efficiency, with a median detection time of seven hours. With an upper
detection time as high as a few days, investigation and remedial action are possible. As this field progresses
and identification times increase, the application of machine learning to compressor failure has the potential to
revolutionize maintenance strategies and mitigate the periodic downtime of compressors across
the industry. Current efforts to identify anomalous compressor behavior and degradation of performance
are limited to traditional exception-based surveillance, using pre-determined limits and manual univariate
observation of critical compressor variables. This paper presents the application of scalable machine learning as

a more advanced method of failure classification. We can now utilize the vast catalog of historical and real-time
data to build smart algorithms that allow engineers to go beyond basic anomaly detection mechanisms.
This predictive maintenance approach has huge potential to avoid production deferments caused by downtime
associated with rotating equipment failures.

Introduction
Predicting sufficiently far in advance when a piece of equipment or a system is going to fail, and determining
the potential cause of failure so that preventive action can be taken, can unlock significant value in terms of
deferment reduction. A typical production facility is now equipped with anywhere from a few thousand to more
than a hundred thousand sensors, which collect data at sub-second time scales. Advances in the Industrial
Internet of Things (IIoT) and Big Data technologies now enable these high volumes of data to be collected and
processed much more efficiently. Developments in predictive algorithms open up possibilities of identifying
patterns that were not detectable before. Peng et al. (2010) highlight the need for more complex approaches to
condition-based monitoring. Machinery and production systems have become much more complex due to
technological advances, and it has become almost impossible to identify and predict failure conditions in a
timely manner (Peng et al., 2010). Prognostics of rotating machines is critical for reliability and operational
safety. A systems-level approach to prognostics of rotating equipment, which has proved to be more effective
than component-level prognostics, is presented most recently by Li et al. (2018). Kumar et al. (2017) have
developed a big-data analytics framework for a condition-based monitoring (CBM) optimization approach that
outperforms traditional frameworks. The combination of a big-data-driven approach with systems-level
modeling is promising and is explored in this work.

Problem description and compressor layout


Discipline engineers generally have 1,000 system parameters or more to consider at any given moment.
It is not possible to monitor all of these and the engineer has to manage plant health monitoring by exception,
looking at situations when important parameters exceed pre-set boundaries. This approach is reasonably
effective, but if our aim is to reduce deferment and achieve top-class reliability, we need something more
robust.
A model that could give advance warning of equipment failure would enable engineers to intervene before
the equipment trips. The machine-learning modelling developed for gas compression systems is an important
step towards positioning predictive maintenance as a major enabler to minimise unscheduled deferment.
Machine learning explores the study and construction of algorithms that can learn from and make
predictions on data. One of the main attractions of this method is that it can extract useful information
from huge amounts of data: volumes that would be much too large for any human engineer to manage in
one go. For the compressor system considered here, for example, the analytics system was assessing
300 system parameters per minute. In this case, we are using machine learning models to help us find the
important correlations and weak signals, hidden in large volumes of noisy data, that are precursors to
compressor system trips. Data volumes here are measured in millions to billions of data points.
The first step in using a machine-learning system is to train the model on what constitutes normal and
abnormal operating conditions. The model can then classify real-time data from the equipment and indicate
when the equipment's performance strays outside the identified steady state. The ability to identify anomalies
is the big differentiator from traditional monitoring tools. In the past, data correlation was time consuming;
with the advancement of new digital technologies, correlations and warnings can be achieved in a matter
of minutes. When engineers receive a warning of predicted failure, they can take appropriate preventive
action well ahead of time. The oil and gas industry is progressing multiple advanced analytics (i.e., predictive)
opportunities, equipping operators and engineers to address operational issues faster.

We analysed historical data for 2016 to assess how effective the system would be at predicting failures.
The proof-of-concept system correctly predicted 11 trip events over the course of the year: almost 50% of
the 23 actual failures that occurred during that period.
One of the most important aspects of the study was that the machine-learning model predicted many
failures hours in advance. In one case, it gave 36 hours' notice. The median period of notice for all 11
events was around 10 hours, which creates a substantial window for remedial action before failure.
The current solution is a step forward from the rate-of-change-based proof of concept, giving better
predictions and greater stability. The median period of notice for the eight events that were
subsequently analyzed was around seven hours.

Figure 1—Schematic for compressor system analysed and associated systems

Support Vector Machines


One-class Support Vector Machines (SVM) are used in this study as a classifier for detecting the not-normal
state of the machine. SVMs were developed by Cortes and Vapnik (1995) for binary classification. They
have proved to be a powerful tool in classification, regression and novelty detection. The R
package e1071, developed by David Meyer, which mimics the implementation by Chang and Lin (2001)
in the LIBSVM package, is used for this study. The formulation of this model is based on work by Scholkopf
et al. (2000). Chen et al. (2011) have demonstrated an SVM-based method for equipment fault
detection in a thermal power plant. The authors argue that the SVM classifier gives better results than
techniques such as linear discriminant analysis (LDA) and back-propagation neural networks (BPN).
Li et al. (2003) discuss the problem of intrusion detection in information systems, where a one-class SVM
is used to identify attacks and misuse patterns for anomaly detection. Novelty detection in time-series data
is investigated by Ma and Perkins (2003), who use a one-class SVM to identify outliers and perform
experiments on both real and synthetic data to validate the performance of the model.
The compressor usually operates under normal working conditions. This makes the problem highly
unbalanced for two-class classification. Due to this highly unbalanced nature of the problem, one-class
classification using SVM is implemented. The algorithm is trained only on normal data and creates a

representation of this data. When new points are substantially different from the modeled class,
they are labelled as outliers. Linear as well as radial kernel functions are explored.
One of the properties of SVM is that it can create a non-linear decision boundary by projecting the data
through a non-linear function into a higher-dimensional space. The one-class SVM separates all the
data points from the origin in feature space and maximizes the distance from this separating hyperplane to
the origin. This creates a binary function that captures the region of the input space where most of the data
lives. The function returns +1 for the region defined by the training data points and -1 everywhere else.
Details of the formulation can be found in the work by Scholkopf et al. (2000).

Figure 2—Simple demonstration of SVM
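
As an illustration of this formulation, the following is a minimal sketch of a one-class SVM in R using the e1071 package mentioned above; the synthetic data, the nu value and the kernel choice are assumptions for illustration, not the tuned settings used in this study.

```r
# Minimal sketch of a one-class SVM with the e1071 package (R).
# The synthetic data and the nu value are illustrative assumptions only.
library(e1071)

set.seed(1)
# Stand-in for pre-processed, scaled sensor tags recorded during normal operation.
normal_data <- as.data.frame(matrix(rnorm(1000 * 5), ncol = 5))

oc_model <- svm(
  x      = normal_data,
  type   = "one-classification",  # train on the normal class only
  kernel = "radial",              # linear kernels were also explored in the study
  nu     = 0.05,                  # upper bound on the training-outlier fraction
  scale  = TRUE
)

# predict() returns TRUE for points consistent with the training data (normal)
# and FALSE for points flagged as anomalous.
drifted <- normal_data[1:10, ] + 4       # stand-in for a drifting machine state
state   <- predict(oc_model, drifted)    # logical vector: TRUE = normal
```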

Solution
The low-pressure compressor (LPC) is production-critical equipment, and ideally it should never trip. For that
reason, it is bypassed or manually shut down when engineers notice an anomaly. This is the primary
reason the data set contains few real trips but many manual shutdowns. For instance,
out of 31 documented deferments in 2017, four were manual shutdowns, seven were process standby and
19 were classified as breakdowns. Of the 19 breakdowns, only six were due to a fault on the LPC; the
remaining 13 were due to sub-systems upstream of the LPC and start-up failures. Hence it was not possible to
capture (a) all failure modes of the LPC or (b) repeated instances of the same failure mode. Accordingly,
one-class SVM was chosen as an appropriate strategy to model the normal working of the LPC. This enables
us to identify any event that is not normal, irrespective of the mode of failure.
For training the SVMs for the LPC and its associated processes, approximately 300 analogue tags were
identified. Analogue tags are fewer in number than digital tags, but their data points are continuous rather
than discrete. This helps the model observe every increment and flag an anomaly well before a threshold is
breached. These 300 tags were split between an LPC model and a Process model: the LPC model takes close
to 230 input tags and the Process model close to 70. The Process model covers sub-systems such as the 1st
Stage Separator, Amine Pre-coolers and Test Separator. The respective LPC and Process tags undergo certain
pre-processing steps before they are fed into their SVM models: removal of low-variance tags, linear
interpolation of missing data, removal of alarms and scaling of the data.
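
A hedged sketch of this pre-processing chain is given below; the tag names, the alarm-tag list and the variance threshold are hypothetical, and the zoo package is assumed for the linear interpolation step.

```r
# Hedged sketch of the pre-processing chain: removal of low-variance tags,
# linear interpolation of missing data, removal of alarm tags, and scaling.
# Tag names, the alarm list and the variance threshold are assumptions.
library(zoo)

set.seed(2)
raw <- data.frame(
  lp_flow  = rnorm(500, 100, 5),
  inlet_t  = rnorm(500, 60, 2),
  alarm_x  = rbinom(500, 1, 0.01),   # hypothetical alarm tag
  dead_tag = rep(3.2, 500)           # near-constant tag
)
raw$inlet_t[sample(500, 20)] <- NA   # simulate missing data

# 1. Remove low-variance tags (threshold is an assumption).
x <- raw[, apply(raw, 2, sd, na.rm = TRUE) > 1e-6, drop = FALSE]

# 2. Linear interpolation of missing data.
x <- as.data.frame(na.approx(as.matrix(x), na.rm = FALSE))

# 3. Remove hypothetical alarm tags.
x <- x[, setdiff(names(x), c("alarm_x")), drop = FALSE]

# 4. Scale to zero mean and unit variance.
x <- as.data.frame(scale(x))
```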
To identify the normal working condition of the LPC, two strategies were explored: (1) Events log – every
event on the LPC is recorded as an error and later classified into a breakdown or another type of deferment;
(2) Process steady state – based on the positioning of key valves and on pressure and temperature limits,
which indicate that the LPC is online and producing. Further, it was decided to create certain time margins
for the LPC and Process models. For the LPC, data points up to two hours before and after a deferment were
considered not normal. Similarly, for the Process model a margin of six hours was used. For modeling the

normal data, the downtime (strategy 1) / offline (strategy 2) data was removed. Also removed were the
margins identified for the LPC and Process models.
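
The following sketch illustrates how such a "normal" training set can be assembled by removing deferment windows plus the two-hour margin used for the LPC model; the timestamps and the event-log structure are hypothetical.

```r
# Hedged sketch: build the "normal" training set by removing deferment periods
# plus a two-hour margin either side (LPC model; six hours for the Process model).
# The event-log structure and timestamps below are hypothetical.
margin <- 2 * 3600                      # two hours, in seconds

ts  <- seq(as.POSIXct("2016-01-01", tz = "UTC"), by = "1 min", length.out = 10000)
dat <- data.frame(time = ts, lp_flow = rnorm(10000, 100, 5))

# Hypothetical event log of deferments (start and end of each downtime).
events <- data.frame(
  start = as.POSIXct(c("2016-01-02 04:00", "2016-01-05 18:30"), tz = "UTC"),
  end   = as.POSIXct(c("2016-01-02 09:00", "2016-01-05 23:00"), tz = "UTC")
)

# Flag every sample that falls inside any event window widened by the margin.
not_normal <- rep(FALSE, nrow(dat))
for (i in seq_len(nrow(events))) {
  not_normal <- not_normal |
    (dat$time >= events$start[i] - margin & dat$time <= events$end[i] + margin)
}
train_normal <- dat[!not_normal, ]      # data used to train the one-class SVM
```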
Feature Engineering: Several methods were adopted to generate features beyond the raw data to better
train the SVM models and improve their accuracy; the key variants are:

• Use of mean values as features

• Use of standard deviation as features

• Calculation of z-scores and use as features

• Range scaling, based on maximum and minimum

• Raw Rate of change as features

• Cumulative Rate of change as features

• Use of aggregated rate of change calculated over a rolling window as features

• Aggregation of data over rolling window as lag to calculate rate of change

The best-performing one-class SVM model uses the "Process steady state" approach to define the normal
working condition of the machine and "Aggregation of data over rolling window as lag to calculate rate
of change" as the feature to train the SVM models for the LPC and Process. The test data stream is fed in every
10 minutes, and the same pre-processing steps are performed as for training the model. A rolling window of one
week is identified as the best measure to smooth the data and to calculate the rate of change of the current
observation. The resulting models use a reduced list of tags – approximately 200 tags for the LPC model and 30
tags for the Process model. The reduction in the number of tags is explained by the pre-processing steps
"Removal of low-variance tags" and "Removal of alarms". The output of each SVM model is Boolean,
labeling a data point as normal or anomalous. This continuous stream of output indicates the state of the LPC
and its process in near real time.
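
A sketch of this best-performing feature is shown below: each tag is aggregated over a rolling window, and that aggregate is used as the lag when computing the rate of change of the current observation. The use of a rolling mean as the aggregate and the conversion of the one-week window into one-minute samples are assumptions for illustration.

```r
# Hedged sketch of the rolling-window rate-of-change feature.
# A rolling mean is assumed as the aggregate; the window is one week of
# one-minute samples. Data are synthetic.
library(zoo)

set.seed(3)
tag    <- rnorm(20000, 100, 5)          # one-minute samples of a single tag
window <- 7 * 24 * 60                   # one week, in one-minute samples

# Right-aligned rolling mean: the smoothed history up to each observation.
baseline <- rollmean(tag, k = window, fill = NA, align = "right")

# Rate of change of the current observation relative to the smoothed history.
roc_feature <- (tag - baseline) / baseline
```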
Whenever the one-class SVM model flips from normal to not normal (i.e. "TRUE" to "FALSE"), it is
considered an alert, or a change in the state of the LPC. Whenever a flip is encountered, the rate-of-change
values of all tags are ordered in descending order and the top ten tags are picked and reported. Root cause
identification is thus a derived mechanism that uses the SVM output as its only input.
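
A hedged sketch of this flip-detection and ranking mechanism is given below; "svm_state" and "roc_matrix" are synthetic stand-ins for the Boolean SVM output stream and the per-tag rate-of-change values.

```r
# Hedged sketch: detect a flip from normal (TRUE) to not-normal (FALSE) in the
# SVM output stream and report the ten tags with the highest rate of change.
# 'svm_state' and 'roc_matrix' are synthetic stand-ins.
set.seed(4)
svm_state  <- c(rep(TRUE, 50), rep(FALSE, 10))        # Boolean SVM output stream
roc_matrix <- matrix(rnorm(60 * 30), nrow = 60,
                     dimnames = list(NULL, paste0("tag_", 1:30)))

# A flip is a transition from TRUE to FALSE between consecutive predictions.
flips <- which(head(svm_state, -1) & !tail(svm_state, -1)) + 1

for (i in flips) {
  # Order the rate-of-change values in descending order and pick the top ten.
  top10 <- sort(roc_matrix[i, ], decreasing = TRUE)[1:10]
  message("Flip at index ", i, "; top contributors: ",
          paste(names(top10), collapse = ", "))
}
```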
The performance data for the low-pressure compressor is stored in a PI System™ time-series database
provided by OSIsoft. A workflow was built in Alteryx to carry out the end-to-end ETL and data processing
tasks; this workflow is deployed on a server and run every 10 minutes. Each time the workflow runs, Alteryx
calls the API to the PI database and ingests the previous 10 minutes of data, at one-minute frequency, for the
approximately 300 selected compressor and process tags. This data is passed through various Alteryx modules
to clean, transform and validate it. The data is then passed through R modules embedded within the Alteryx
workflow, where standard R libraries are used to generate the binary outlier classifications from the
compressor data with the SVM algorithm. Alteryx then outputs the results, which are simply a Boolean
"TRUE" or "FALSE" indicating the failure prediction, to a SQL database. Spotfire was used to build a
visualisation of these results connected directly to the database. When the results change from "TRUE" to
"FALSE" and stay "FALSE" for more than 10 minutes, Alteryx sends an email to a mailing list to alert that
there has been a failure prediction. The solution workflow diagram and solution architecture are presented in
Figure 3 and Figure 4, respectively.
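
As an illustration of this scoring cycle, the skeleton below reproduces the 10-minute loop in plain R. The production pipeline runs inside Alteryx against the PI database, SQL store and email server; those interfaces are replaced here by hypothetical stand-in functions, so this is a sketch of the logic rather than the deployed implementation.

```r
# Skeleton of the 10-minute scoring cycle. The PI ingest, SQL write and e-mail
# steps of the production workflow are replaced by hypothetical stand-ins.
library(e1071)

set.seed(5)
train    <- as.data.frame(matrix(rnorm(1000 * 5), ncol = 5))   # stand-in normal data
oc_model <- svm(train, type = "one-classification", nu = 0.05)

read_last_10min <- function() as.data.frame(matrix(rnorm(10 * 5), ncol = 5))  # stand-in for the PI API call
write_result    <- function(state) invisible(state)                           # stand-in for the SQL write
send_alert      <- function(msg) message("ALERT: ", msg)                      # stand-in for the e-mail step

score_cycle <- function() {
  batch <- read_last_10min()             # ten one-minute samples of the selected tags
  state <- predict(oc_model, batch)      # logical vector: TRUE = normal, FALSE = anomaly
  write_result(state)
  # Alert when the output has flipped to FALSE and stays FALSE for the full batch.
  if (all(!state)) send_alert("model predicts abnormal LPC behaviour")
  invisible(state)
}
score_cycle()
```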

Figure 3—Solution Workflow

Figure 4—Solution Architecture

Results
The continuous output from the two SVM models can be fed into any dashboard, such as Spotfire, for visual
inspection. An auto-generated email is also configured to alert the engineers about the state of the LPC
and the top ten tags that are potential root causes. Accuracy is also a derived measure, since the warning
time before a trip or breakdown varies: on some occasions the trip alert comes a few minutes or hours in
advance, while at others it can be a few days in advance. Figure 5 and Figure 6 show the dashboard on the
validation data set with results from the LPC model and Process model, respectively. The red lines mark the
timing of actual trips and the black bars represent triggers from our model. Accuracy is measured based on
the number of flips (switches from "TRUE" to "FALSE") generated by the model in each time period, say a
day or a week. The model should not flip or raise alerts so frequently that it generates random noise, and at
the same time it should flag trips sufficiently in advance for the engineers to act. The current best model for
the LPC generated

~70 flips/alerts over a period of six months, on average 11 alerts a month or roughly two alerts a week.
However, it should be noted that the alerts pile up towards an impending trip, indicating a build-up; hence
the random noise is less than two alerts a week. Similarly, the best Process model to date generates
approximately 80 alerts over a six-month period, averaging 13 alerts a month and three alerts a week.
A similar build-up toward the trip can be observed in the Process model as in the LPC model.
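
The alert-rate measure described above can be computed directly from the Boolean output stream, for example by counting TRUE-to-FALSE flips per week; the sketch below uses synthetic timestamps and states.

```r
# Hedged sketch of the alert-rate measure: count TRUE -> FALSE flips in the
# Boolean model output per calendar week. Timestamps and states are synthetic.
set.seed(6)
ts    <- seq(as.POSIXct("2017-01-01", tz = "UTC"), by = "10 min", length.out = 26000)
state <- runif(26000) > 0.002             # mostly TRUE (normal), occasional FALSE

# A flip is a TRUE followed immediately by a FALSE.
flip <- c(FALSE, head(state, -1) & !tail(state, -1))

flips_per_week <- table(format(ts[flip], "%Y-%U"))   # alerts grouped by week of year
head(flips_per_week)
```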

LPC Model:

Figure 5—Model output for LPC model

Process Model

Figure 6—Model output for Process model

The root cause identification has performed well and has shown clear patterns when similar types of failure or
trip were encountered. For instance, there was an issue with the "Feedback Arm", where it came loose and
flow could not be controlled properly, leading to a trip; in such instances "LP Comp Standard Flow" was
picked up by the algorithm as the top root cause. In one instance, the alert came almost 12 hours in
advance. In certain cases, it was only a few minutes; this is attributed to manual interventions that failed
and resulted in trips. In most cases there is sufficient time for the engineers to take preventive measures.
Performance for each validation trip is presented in Table 1.

Table 1—Model Performance on validation data set

Conclusion
We have demonstrated the development and deployment of an intelligent prognostic system that leverages a
machine learning method, one-class SVM, and looks at the data in a multivariate domain. Systems-level
thinking, knowledge of first-principles-based modeling and expertise in machine learning methods are key
to developing a successful predictive maintenance strategy. Oil and gas equipment, like a low-pressure
compressor, operates in unique conditions and typically in unique configurations designed for a specific
application. This leads to a sparse data problem. A predictive machine learning model that is able to provide
more information than what is already known from domain expertise and first principles has been developed.
This is achieved in three ways: (1) a machine learning model can identify patterns and correlations among a
large number of variables, and derived features from these variables, that are not considered in simulation
models; (2) it can identify anomalous behaviour in a multivariate fashion sufficiently in advance of the onset
of symptoms of abnormality in traditionally monitored parameters; and (3) it is able to provide potential root
causes of failure for taking corrective action. Further work on scalability and the development of a
generalizable model is suggested.

References
Chen, K., Chen, L., Chen, M. et al 2011. Using SVM based method for equipment fault detection in a thermal power plant.
Computers in Industry 62 (1): 42-50. doi: https://doi.org/10.1016/j.compind.2010.05.013.
Cortes, C. and Vapnik, V. 1995. Support-vector networks. Machine Learning 20 (3): 273-297.
Ho, S. L., Xie, M. and Goh, T. N. 2002. A comparative study of neural network and Box-Jenkins ARIMA
modeling in time series prediction. Computers & Industrial Engineering 42 (2): 371-375. doi: https://doi.org/10.1016/S0360-8352(02)00036-0.
Kumar, A., Shankar, R. and Thakur, L. S. 2017. A big data driven sustainable manufacturing framework for condition-
based maintenance prediction. Journal of Computational Science doi: https://doi.org/10.1016/j.jocs.2017.06.006.
Li, K., Huang, H., Tian, S. et al. 2003. Improving one-class SVM for anomaly detection. Presented at the 2003
International Conference on Machine Learning and Cybernetics.
Li, X., Duan, F., Mba, D. et al. 2018. Rotating machine prognostics using system-level models. In Engineering
Asset Management 2016, 123-141. Springer.
Ma, J. and Perkins, S. 2003. Time-series novelty detection using one-class support vector machines. Presented
at the 2003 International Joint Conference on Neural Networks.
Peng, Y., Dong, M. and Zuo, M. J. 2010. Current status of machine prognostics in condition-based maintenance: a review.
The International Journal of Advanced Manufacturing Technology 50 (1): 297-313.

Shin, H. J., Eom, D. and Kim, S. 2005. One-class support vector machines—an application in machine fault detection and
classification. Computers & Industrial Engineering 48 (2): 395-408. doi: https://doi.org/10.1016/j.cie.2005.01.009.
Susto, G. A., Schirru, A., Pampuri, S. et al 2015. Machine learning for predictive maintenance: A multiple classifier
approach. IEEE Transactions on Industrial Informatics 11 (3): 812-20.
Suykens, J. A. and Vandewalle, J. 1999. Least squares support vector machine classifiers. Neural Processing Letters 9
(3): 293-300.
Tran, V. T., Thom Pham, H., Yang, B. et al 2012. Machine performance degradation assessment and remaining useful life
prediction using proportional hazard model and support vector machine. Mechanical Systems and Signal Processing
32: 320-30. doi: https://doi.org/10.1016/j.ymssp.2012.02.015.
Widodo, A. and Yang, B. 2007. Support vector machine in machine condition monitoring and fault diagnosis. Mechanical
Systems and Signal Processing 21 (6): 2560-74. doi: https://doi.org/10.1016/j.ymssp.2006.12.007.
