PTE588 – Smart Oilfield Data Mining
December 12th, 2013
Rod Pump Failure Prediction
Team 2
William Chessum, Marco Jarrin, Max Solanki, Xin Jason Du &
Jeffrey Daniels
Table of Contents
1 Rod Pump Introduction
2 Exploring the Problem Space
2.1 Raw Rod Pump Data
2.2 Processed Rod Pump Data
3 Exploring the Solution Space
3.1 Preliminary Approach:
3.2 First Approach:
3.3 Second Approach:
4 Implementation Specification
5 Data Mining Development:
5.1 Preliminary Approach
5.1.1 Attribute Selection:
5.1.2 Classifiers:
5.2 First Approach:
5.2.1 Data Selection:
5.2.2 Clusterization:
5.2.3 Classification:
5.2.4 Future failure prediction algorithm:
5.2.5 Testing:
5.3 Second Approach
5.3.1 Data Surveying
5.3.2 Data Preparation
5.3.3 Clustering
5.3.4 Data Modeling
5.3.5 Testing Results
6 Conclusions
7 Works Cited
Tables
Table 1 - Future Failure Prediction from cluster transitions
Table 2 - Results for the models using the different performance evaluation methods for the failure examples
Table 3 - Results for the models using the different performance evaluation methods for the normal examples
Table of Figures
Figure 1 - Surface and pump card
Figure 2 - Problem definition: Data Sets
Figure 3 – Raw card area; normal and failure data
Figure 4 – Raw peak surface load; normal and failure data
Figure 5 – Raw minimum surface load; normal and failure data
Figure 6 – Raw daily run time; normal and failure data
Figure 7 - Processed card area; normal and failure data
Figure 8 - Processed peak surface load; normal and failure data
Figure 9 - Processed minimum surface load; normal and failure data
Figure 10 - Processed daily run time trend; normal and failure data
Figure 11 - WEKA visualization results
Figure 12 - Problem process flow diagram
Figure 13 - PCA Analysis
Figure 14 - ClassifierSubsetEval (K nearest neighbor) /GeneticSearch
Figure 15 - Classifier nearest Neighbor
Figure 16 - Classifier Decision/Regression Tree
Figure 17 - (a) CardArea, (b) Clusters, (c) New class labeled, (d) Run time for comparison
Figure 18 - Classification Results using K-nearest neighbor
Figure 19 - Classification Results using Decision tree
Figure 20 - Future failure prediction
Figure 21 - Card Area of Failure Example, Well 19 plotted showing missing data before preprocessing
Figure 22 - Card Area of Normal Example, Well 02 plotted showing noise before preprocessing
Figure 23 - Sample of data from Failure Example Well 11, showing daily runtime attribute as 0 but the other attributes having non-zero values
Figure 24 - Snippet of MATLAB preprocessing code showing how missing data was filled for Well 01
Figure 25 - Snippet of MATLAB preprocessing code showing the spans used for the various wells
Figure 26 - Snippet of MATLAB code used showing the implementation of the medfilt1 function
Figure 27 - Plot of smoothed CardArea attribute for failure example Well 19
Figure 28 - Plot of smoothed CardArea attribute for Well 02
Figure 29 - Snippet of MATLAB preprocessing code for normalizing the various well datasets
Figure 30 - Visualization of various clusters for Card Area plotted against History Date for Well 19
Figure 31 - Results for 10-Fold Cross Validation using Logistic Regression for failure examples dataset
Figure 32 - Results for Percentage Split using Logistic Regression for failure examples dataset
Figure 33 - Results for learning class of training set instances for Logistic Regression for failure examples dataset
Figure 34 - Results for predicting class of test set instances using Logistic Regression for unseen failure examples dataset
Figure 35 - Results for 10-Fold Cross Validation using Logistic Regression for normal examples dataset
Figure 36 - Results for Percentage Split using Logistic Regression for normal examples dataset
Figure 37 - Results for learning class of training set instances using Logistic Regression for normal examples
Figure 38 - Results for predicting class of test set instances using Logistic Regression for unseen normal examples
Figure 39 - Results for 10-Fold Cross Validation using Neural Network for failure examples dataset
Figure 40 - Results for Percentage Split using Neural Network for failure examples dataset
Figure 41 - Results for learning class of training set instances using Neural Network for failure examples dataset
Figure 42 - Results for predicting class of test set instances using Neural Network for unseen failure examples dataset
Figure 43 - Results for 10-Fold Cross Validation using Neural Network for normal examples dataset
Figure 44 - Results for Percentage Split using Neural Network for normal examples dataset
Figure 45 - Results for learning training set instances using Neural Network for normal examples dataset
Figure 46 - Results for predicting class of test set instances using Neural Network for unseen normal examples dataset
1 Rod Pump Introduction
Rod pumps are currently the most widely used artificial lift method in the oil and gas
industry. They are employed to bring fluid to the surface when the reservoir pressure is
insufficient. Rod pump failures, which include surface, tubing, and down-hole failures,
occur commonly and result in production loss and operational expenses. Detecting the
precursors of a well failure, or the failure itself, makes it possible to minimize downtime
and increase operational efficiency (Liu & Patel, March 26-28, 2013).
Many operating variables can be examined to determine a pump's operating status, but
the most telling is the dynamometer card [Figure 1]. It indicates the maximum and minimum
forces applied to the rod pump and the area enclosed by the pump cycle. For the purpose of
this report we use the pump card area to determine rod pump failures.
2 Exploring the Problem Space
The objective of this project is to develop a data mining technique that accurately predicts rod
pump failures from a set of data from an unknown oil field. Rod pump data were provided for
normal and failure wells, in both raw and processed form. Each well has four numeric
attributes: card area, peak surface load, minimum surface load and daily run time.
Figure 2 - Problem definition: Data Sets
Figure 1 - Surface and pump card
2.1 Raw Rod Pump Data
The four figures below (Figure 3, Figure 4, Figure 5 and Figure 6) depict the raw data from 20
different wells (10 failure and 10 normal) over the course of roughly one year. Four attributes
were given for each well: card area, peak surface load, minimum surface load and daily run time.
The data are quite noisy, with multiple outliers, and cannot be used directly for analysis.
Processing and normalization are required before data mining techniques can be applied.
Figure 3 – Raw card area; normal and failure data
[Plot: CardArea (0 to 30000) versus date, 1-Jan-16 to 10-Jan-17; series: Failure, Normal]
Figure 4 – Raw peak surface load; normal and failure data
[Plot: PeakSurfLoad (0 to 45000) versus date, 1-Jan-16 to 10-Jan-17; series: Failure, Normal]
Figure 5 – Raw minimum surface load; normal and failure data
[Plot: MinSurfLoad (0 to 35000) versus date, 1-Jan-16 to 10-Jan-17; series: Failure, Normal]
Figure 6 – Raw daily run time; normal and failure data
2.2 Processed Rod Pump Data
The four figures below (Figure 7, Figure 8, Figure 9 and Figure 10) depict the processed data from
19 different wells (9 failure and 10 normal) over the course of roughly one year. Four attributes
were given for each well: card area, peak surface load, minimum surface load and daily run time.
After processing and normalization, the data show distinct clustering, and an obvious contrast
can be seen between normal and failure wells. These data are used in MATLAB and WEKA
with various data mining techniques to predict rod pump failures.
[Plot for Figure 6: DailyRunTime, hrs (0 to 30) versus date, 1-Jan-16 to 10-Jan-17; series: Failure, Normal]
Figure 7 - Processed card area; normal and failure data
[Plot: NormalizedCardArea (0 to 1.4) versus date, 1-Jan-16 to 10-Jan-17; series: Failure, Normal]
Figure 8 - Processed peak surface load; normal and failure data
[Plot: NormalizedPeakSurfLoad (0 to 1.2) versus date, 1-Jan-16 to 10-Jan-17; series: Failure, Normal]
Figure 9 - Processed minimum surface load; normal and failure data
[Plot: NormalizedMinSurfLoad (0 to 1.6) versus date, 1-Jan-16 to 10-Jan-17; series: Failure, Normal]
Figure 10 - Processed daily run time trend; normal and failure data
[Plot: NormalizedDailyRunTime (0 to 1.6) versus date, 1-Jan-16 to 10-Jan-17; series: Failure, Normal]
Figure 11 shows the results from WEKA's visualization tool. This analysis indicates that card
area is the most important attribute for identifying potential rod pump failures. Daily run time,
on the other hand, is the least significant attribute, because it does not reveal why a well is
down (for example, regular well maintenance versus a true failure).
Figure 11 - WEKA visualization results
3 Exploring the Solution Space
3.1 Preliminary Approach:
• Analyze processed data globally
• Class: Run-time
• Attribute selection:
– Visual, PCA analysis, ClassifierSubsetEval/GeneticSearch
• Classifier:
– K-nearest neighbors, Decision Trees
• Need to define new class from data-set to interpret failures (adding class labels)
3.2 First Approach:
• Heuristic: Card Area trend seems to predict failures
• Only Card Area Clustering: K-means, 5 clusters.
• Define Class labels from cluster numbers
• Training and Test data Sets: holdout split 2/3 & 1/3
• Classifier: K-nearest neighbor, Naïve Bayes, decision/regression tree (predicts current well
status)
• Challenge: Learning from cluster transitions to predict future failures
3.3 Second Approach:
• Raw data for failure and normal wells were processed to handle missing data and outliers, then
normalized.
• A class label was added by clustering on all the attributes (except well name and
history) with the Expectation Maximization method.
• Models were learned using different algorithms, with 10-fold cross validation: Decision Tree,
Neural Network, Logistic Regression, AdaBoost (Decision Stump classifier), Naïve Bayes
4 Implementation Specification
Figure 12 - Problem process flow diagram
5 Data Mining Development:
5.1 Preliminary Approach
We analyzed the data from normal wells and from failure wells. At this point our objective was
to identify a model that indicates a well failure, defined as DailyRunTime = 0. Since we
needed to recognize a perturbation in a failure-type well, we decided to use only a unified
dataset built from the failure wells. Data preparation was not required since processed data was
already provided.
5.1.1 Attribute Selection:
PCA and ClassifierSubsetEval with GeneticSearch were evaluated against the DailyRunTime
class to determine whether dimension reduction makes sense.
Figure 13 and Figure 14 show the results from Weka.
Figure 13 - PCA Analysis
Figure 14 - ClassifierSubsetEval (K nearest neighbor) /GeneticSearch
5.1.2 Classifiers:
K-nearest neighbors and Decision/Regression Tree were used for evaluation, as shown in
Figure 15 and Figure 16.
Figure 15 - Classifier nearest Neighbor
Figure 16 - Classifier Decision/Regression Tree
From this analysis, we determined that a class label needed to be added to the data set.
5.2 First Approach:
Through a heuristic process based on graphical inspection, we determined that a declining trend
in card area could be used to predict well failures.
5.2.1 Data Selection:
We selected training and test data sets from the failure wells using a holdout split of 2/3 and 1/3.
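A holdout split of this kind can be sketched in a few lines of Python. This is an illustration only, not the procedure used in the report (which relied on Weka); the function name `holdout_split` and the choice to keep rows in their original order rather than shuffle them are assumptions.

```python
def holdout_split(rows, train_fraction=2 / 3):
    """Split rows into (train, test), keeping the original order.

    Hypothetical helper: the first train_fraction of the rows is used for
    training and the remainder for testing.
    """
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Toy example with nine rows: six go to training, three to testing.
train, test = holdout_split(list(range(9)))
```

Keeping the time order avoids training on readings that come after the test readings, which matters for trend-based failure prediction; a shuffled split would be the alternative if the rows were treated as independent.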
5.2.2 Clusterization:
We attacked this problem using clustering (K-means), labeling 5 clusters as follows:
1 = shutdown well, 2 = Prefailure, 4 = Normal, 3 and 5 = quasi normal states.
Figure 17.b shows the CardArea data colored by cluster once the 5 clusters were obtained.
Figure 17.c shows the new class label, which can be related to the run time to predict a failure.
The transitions between cluster labels could be used to anticipate a shutdown, once the
information is validated against the run time.
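The clustering step on a single attribute can be sketched as a plain one-dimensional K-means. This is a minimal illustration under stated assumptions, not the report's implementation: the function name `kmeans_1d`, the deterministic initialization from evenly spaced sorted values, and the toy data are all hypothetical.

```python
def kmeans_1d(values, k, iterations=100):
    """Cluster scalar values into k groups; return (labels, centroids)."""
    ordered = sorted(values)
    # Assumed initialization: pick k evenly spaced values from the sorted data.
    centroids = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:
            break  # converged
        centroids = updated
    return labels, centroids


# Toy normalized card-area readings with three visible levels.
labels, centroids = kmeans_1d([0.0, 0.1, 0.5, 0.55, 1.0, 1.1], k=3)
```

The cluster indices returned by K-means are arbitrary, so the mapping from cluster number to a state such as shutdown, prefailure or normal still has to be assigned by inspecting the clusters, as was done for Figure 17.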
Figure 17 - (a) CardArea, (b) Clusters, (c) New class labeled, (d) Run time for comparison
[Plot panels: (a) CardArea1; (b) clusters 1 to 5; (c) new attribute; (d) Run time]
5.2.3 Classification:
Once the new class label was added to the data set, we used 3 types of classifiers to predict the
current well status: K-nearest neighbor, Naïve Bayes, and decision/regression tree (with 10-fold
cross validation). Figure 18 and Figure 19 compare the K-nearest neighbor and decision tree
results.
K-nearest neighbor
Figure 18 - Classification Results using K-nearest neighbor
Decision/regression tree
Figure 19 - Classification Results using Decision tree
5.2.4 Future failure prediction algorithm:
We determined that simple logic applied to the transitions would allow us to predict future
failures. We first detected the edges (changes) in the class label. We then identified each
negative transition and generated 3 annunciator labels: Warning, Failure Probable and Failure
Imminent, as follows:
Table 1 - Future Failure Prediction from cluster transitions
State 1 | State 2 | Operator Label | Priority (0-100)
5 or 4 | 3 | Warning | 10
5 or 4 | 2 | Failure probable | 50
Warning | 2 | Failure Imminent | 100
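The transition rules in Table 1 can be sketched as a small state tracker over the sequence of cluster labels. This is one possible reading of the table, not the report's code: the function name `alarms_from_labels`, the treatment of "Warning" as a flag that stays active while the label remains at 3, and the reset on returning to state 4 or 5 are assumptions.

```python
def alarms_from_labels(labels):
    """Return (index, operator_label, priority) for each alarm-raising transition."""
    alarms = []
    warning_active = False  # assumed: set by a Warning, cleared on recovery or escalation
    for i in range(1, len(labels)):
        prev, curr = labels[i - 1], labels[i]
        if prev == curr:
            continue  # no edge in the class label
        if prev in (4, 5) and curr == 3:
            alarms.append((i, "Warning", 10))
            warning_active = True
        elif prev in (4, 5) and curr == 2:
            alarms.append((i, "Failure probable", 50))
            warning_active = False
        elif warning_active and prev == 3 and curr == 2:
            alarms.append((i, "Failure imminent", 100))
            warning_active = False
        elif curr in (4, 5):
            warning_active = False  # well returned to a normal state
    return alarms


# Toy label sequence: normal, quasi normal, prefailure, shutdown.
alarms = alarms_from_labels([4, 4, 3, 3, 2, 1])
```

In this sketch a drop straight from 4 or 5 to 2 raises "Failure probable", while the same drop after an active Warning raises "Failure imminent", which matches the priority ordering in the table.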
5.2.5 Testing:
Figure 20 shows the results for well 16. After the model and the prediction algorithm were
applied, the algorithm was able to predict a failure 13 days in advance.
Figure 20 - Future failure prediction
[Plot panels for Well-16: new attribute (cluster label, 1 to 5); Run time; Failure Prediction (priority, 0 to 100). Annotations: declining trend learned; prediction of failure alarm (13 days before real event); warnings (not imminent failures); real failure]
5.3 Second Approach
5.3.1 Data Surveying
When the unprocessed data from the normal and failure examples are visualized in MATLAB,
some values are seen to be missing from the data set, as shown in Figure 21.
Figure 21 - Card Area of Failure Example, Well 19 plotted showing missing data before preprocessing
Moreover, the curves have some noise, as seen in Figure 22.
Figure 22 - Card Area of Normal Example, Well 02 plotted showing noise before preprocessing
Lastly, when the csv file is checked, some instances have a Daily Run Time of zero while the
other attributes in those instances have non-zero values, as seen in the second row of
Figure 23.
Figure 23 - Sample of data from Failure Example Well 11, showing daily runtime attribute as 0 but the other
attributes having non-zero values
5.3.2 Data Preparation
The datasets were preprocessed individually using MATLAB, as shown in Figure 24. To prepare
the unprocessed data for modeling, the missing values were first filled in with 0. Because no
data exist for those instances, we assume that the rod pump had failed, so representing the
missing attribute values with zero is justifiable.
Figure 24 - Snippet of MATLAB preprocessing code showing how missing data was filled for Well 01
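The MATLAB snippet in Figure 24 is not reproduced in this text. The zero-fill step it describes can be sketched in Python as follows; the function name `fill_missing` and the convention that missing readings appear as `None` or NaN are assumptions made for illustration.

```python
import math


def fill_missing(values, fill_value=0.0):
    """Replace None or NaN entries with fill_value; keep other readings unchanged."""
    filled = []
    for v in values:
        is_missing = v is None or (isinstance(v, float) and math.isnan(v))
        filled.append(fill_value if is_missing else v)
    return filled


# Toy card-area series with two missing readings.
card_area = fill_missing([1.5, None, float("nan"), 0.2])
```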
After the missing data have been filled, the curves are smoothed using a moving median
with varying spans to remove noise. This helps to minimize overfitting of the data mining model
to be used. The normal examples are smoothed over spans of 1, 3 and 9, as seen in Figure 25;
the span was varied to avoid distorting the individual datasets. The failure examples, however,
were all smoothed using a moving median span of 11.
Figure 25 - Snippet of MATLAB preprocessing code showing the spans used for the various wells
After the spans have been chosen, the medfilt1 MATLAB function is applied to all the
numeric attributes of the dataset, as shown in Figure 26. Odd values were chosen for the moving
median span because the medfilt1 function works better with odd values (an odd span gives a
window centered on each sample).
Figure 26 - Snippet of MATLAB code used showing the implementation of the medfilt1 function
After smoothing the values, the attributes are less noisy, as shown in Figure 27 and
Figure 28 compared to Figure 21 and Figure 22.
Figure 27 - Plot of smoothed CardArea attribute for failure example Well 19
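The moving median smoothing described in this section can be sketched in Python as a centered window filter. This is not a port of the report's MATLAB code: the function name `moving_median` is hypothetical, and padding the ends of the series with zeros is an assumption chosen to loosely mirror the default edge handling of medfilt1.

```python
from statistics import median


def moving_median(values, span):
    """Centered moving median with zero padding at both ends (odd span only)."""
    if span < 1 or span % 2 == 0:
        raise ValueError("span must be a positive odd integer")
    half = span // 2
    # Assumed edge handling: treat samples beyond the series as zero.
    padded = [0.0] * half + list(values) + [0.0] * half
    return [median(padded[i:i + span]) for i in range(len(values))]


# A single spike (100) is removed by a span of 3.
smoothed = moving_median([1, 100, 2, 3, 4], span=3)
```

A span of 1 returns the series unchanged, which is consistent with using it for wells that needed no smoothing, while larger spans such as 9 or 11 suppress longer bursts of noise at the cost of flattening short genuine changes.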
Figure 28 - Plot of smoothed CardArea attribute for Well 02
After all the individual datasets have been smoothed, they are normalized in MATLAB to make
the attributes comparable to one another despite their varying ranges, as seen in Figure 29. This
is done using the Min-Max normalization method, which is plausible because most of the
attributes in the datasets are not skewed either to the left or to the right.
Figure 29 - Snippet of MATLAB preprocessing code for normalizing the various well datasets
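Min-Max normalization rescales each attribute to the range 0 to 1 using (x - min) / (max - min). A minimal Python sketch is given below; the function name `min_max_normalize` and the choice to return zeros for a constant series are assumptions, and the report's own MATLAB code in Figure 29 is not reproduced here.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumed convention: a constant series carries no range, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Toy peak-surface-load readings.
normalized = min_max_normalize([10, 20, 30])
```

Because the minimum and maximum set the scale, a single extreme outlier would compress the rest of the series, which is one reason the smoothing step is applied before normalization.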
5.3.3 Clustering
With the datasets for each well having better quality, they are then clustered individually using
Weka. The cluster method used here is the Expectation Maximization (EM) method, which is a
probabilistic clustering method. Rather than using an attribute selection method for clustering,
all the numeric attributes, namely CardArea, PeakSurfLoad, MinSurfLoad and DailyRunTime, are
used by the EM cluster algorithm to cluster the instances.
The number of clusters is set to 3 so that the instances are grouped as Normal, Pre-failure and
Failure, as shown in Figure 30. These 3 clusters then become the class attribute used for failure
prediction. Pre-failure serves as the warning that failure of the rod pump is imminent, so that
technicians can service the rod pump, which increases productivity and saves operational costs.
Figure 30 - Visualization of various clusters for Card Area plotted against History Date for Well 19
5.3.4 Data Modeling
After all the dataset instances have been assigned a class label, suitable data mining models are
selected for failure prediction. Five predictive models were compared, namely Naïve Bayes,
AdaBoost (Decision Stump classifier), Multilayer Perceptron (Neural Network), Decision Tree
and Logistic Regression.
To evaluate the performance of these models, three separate testing methods were used,
namely 10-Fold Cross Validation, a percentage split and a blind test.
In the 10-Fold Cross validation, all the well datasets for the failure examples, Wells 11-20 were
merged into one dataset and input into Weka for analysis. The datasets for the normal examples,
Wells 1-10 were also merged into a different single dataset, for analysis in Weka.
Secondly, in the Percentage Split, all the well datasets for the failure examples (Wells 11-20)
were again merged into one dataset and input into Weka for analysis, and the datasets for the
normal examples (Wells 1-10) were merged into a separate single dataset. However, we decided to
use an extreme case: only 5% of the dataset was used for training the models and the remaining
95% was used for testing. This was done to give us a feel for how well the datasets were
preprocessed and clustered.
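A 5%/95% percentage split can be sketched as below. Shuffling before the cut and the fixed seed are assumptions of this sketch, and the function name is hypothetical.

```python
import random


def percentage_split(instances, train_fraction=0.05, seed=0):
    """Shuffle the instances and return (train, test) lists for the given fraction."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one training instance even for very small datasets.
    n_train = max(1, round(len(shuffled) * train_fraction))
    return shuffled[:n_train], shuffled[n_train:]
```

With 200 instances, for example, this yields 10 training instances and 190 test instances.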
Finally, in the blind test, for the failure examples, Wells 11-16 were merged into one dataset and
used to train the models, while Wells 17-20 were merged into a single dataset and supplied to the
models as a test set. For the normal examples, Wells 1-6 were merged into a single dataset and
used to train the models, while Wells 7-10 were merged into a single dataset and supplied to the
models as a test set.
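The blind test differs from the other two methods in that whole wells are held out. A minimal sketch, assuming each instance is a record with a hypothetical "well" field, is:

```python
def split_by_well(rows, train_wells, test_wells):
    """Split instances into train/test sets according to the well they came from."""
    train = [r for r in rows if r["well"] in train_wells]
    test = [r for r in rows if r["well"] in test_wells]
    return train, test


# Hypothetical merged failure-example rows: three instances per well (Wells 11-20).
rows = [{"well": w, "CardArea": 0.5, "label": "Normal"}
        for w in range(11, 21) for _ in range(3)]
train, test = split_by_well(rows, set(range(11, 17)), set(range(17, 21)))
```

Because no instance from a test well is seen during training, this split gives a stricter estimate of how the models behave on unseen wells.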
5.3.5 Testing Results
After testing, we see that most of the models have very good predictive performance, above
95%. However, in the blind test for the failure examples, the Naïve Bayes model had a training
accuracy of 99.05% and a testing accuracy of 76.21%, which suggests the model is severely
overfitted. Likewise, in testing the normal examples, the Decision Tree model had a training
accuracy of 98.63% and a testing accuracy of 63.31%, which suggests it is also severely
overfitted.
Considering the performance of the models for both the normal wells and the failure wells, the
Neural Network and the Logistic Regression are the preferred models for failure prediction.
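The overfitting argument and the model preference above can be checked against the blind-test accuracies reported in Tables 2 and 3. The sketch below computes the gap between training and testing accuracy and the worst testing accuracy of each model across the two datasets. The 10-point threshold and the worst-case ranking criterion are assumptions chosen for illustration, not criteria stated in the report.

```python
# Blind-test accuracies in percent, (training, testing), from Tables 2 and 3.
BLIND_TEST = {
    "failure": {
        "AdaBoost": (99.12, 99.13),
        "NaiveBayes": (99.05, 76.21),
        "Neural Network": (100.0, 99.88),
        "Decision Tree": (99.9, 92.81),
        "Logistic Regression": (100.0, 92.69),
    },
    "normal": {
        "AdaBoost": (91.16, 81.97),
        "NaiveBayes": (86.65, 81.97),
        "Neural Network": (91.61, 86.02),
        "Decision Tree": (98.63, 63.31),
        "Logistic Regression": (94.21, 89.03),
    },
}


def overfit_gaps(results):
    """Return {(dataset, model): training accuracy minus testing accuracy}."""
    return {(dataset, model): round(train - test, 2)
            for dataset, models in results.items()
            for model, (train, test) in models.items()}


def worst_case_test_accuracy(results):
    """Return each model's lowest testing accuracy over all datasets."""
    model_names = next(iter(results.values())).keys()
    return {m: min(results[d][m][1] for d in results) for m in model_names}


GAP_THRESHOLD = 10.0  # hypothetical cut-off, in percentage points

gaps = overfit_gaps(BLIND_TEST)
flagged = sorted(key for key, gap in gaps.items() if gap > GAP_THRESHOLD)
ranking = sorted(worst_case_test_accuracy(BLIND_TEST).items(),
                 key=lambda item: item[1], reverse=True)
```

Under these assumptions the only flagged cases are Naïve Bayes on the failure examples and the Decision Tree on the normal examples, and the two models with the highest worst-case testing accuracy are Logistic Regression and the Neural Network.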
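Table 2 - Results for the models using the different performance evaluation methods for the failure examples
Method                   | AdaBoost | NaiveBayes | Neural Network | Decision Tree | Logistic Regression
-------------------------|----------|------------|----------------|---------------|--------------------
10-Fold Cross Validation | 99.31%   | 98.39%     | 99.86%         | 99.77%        | 99.82%
Percentage Split         | 99.32%   | 99.13%     | 99.32%         | 97.19%        | 99.32%
Blind Test (training)    | 99.12%   | 99.05%     | 100%           | 99.9%         | 100%
Blind Test (testing)     | 99.13%   | 76.21%     | 99.88%         | 92.81%        | 92.69%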
Table 3 - Results for the models using the different performance evaluation methods for the normal examples
Method                   | AdaBoost | NaiveBayes | Neural Network | Decision Tree | Logistic Regression
-------------------------|----------|------------|----------------|---------------|--------------------
10-Fold Cross Validation | 88.44%   | 84.91%     | 90.54%         | 96.11%        | 92.91%
Percentage Split         | 87.69%   | 84.2%      | 89.08%         | 87.46%        | 89.87%
Blind Test (training)    | 91.16%   | 86.65%     | 91.61%         | 98.63%        | 94.21%
Blind Test (testing)     | 81.97%   | 81.97%     | 86.02%         | 63.31%        | 89.03%
The Logistic Regression and Neural Network models are our preferred models for failure
prediction, so we will now take an in-depth look into the results of these models from our
Weka analysis. While these are our two preferred models, a look at the confusion matrices of the
different tests suggests that the Neural Network is the best of all the models tried.
5.3.5.1 Logistic Regression
The confusion matrices show that it was able to predict all the classes quite accurately.
Failure Examples Dataset
10-Fold Cross Validation
Figure 31 - Results for 10-Fold Cross Validation using Logistic Regression for failure examples dataset
Percentage Split
Figure 32 - Results for Percentage Split using Logistic Regression for failure examples dataset
Blind Test
Training Set
Figure 33 - Results for learning class of training set instances for Logistic Regression for failure examples
dataset
Test Set
Figure 34 - Results for predicting class of test set instances using Logistic Regression for unseen failure
examples dataset
Normal Examples Dataset
10-Fold Cross Validation
Figure 35 - Results for 10-Fold Cross Validation using Logistic Regression for normal examples dataset
Percentage Split
Figure 36 - Results for Percentage Split using Logistic Regression for normal examples dataset
Blind Test
Training Set
Figure 37 - Results for learning class of training set instances using Logistic Regression for normal examples
Test Set
Figure 38 - Results for predicting class of test set instances using Logistic Regression for unseen normal
examples
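The confusion matrices shown in the figures above can be reproduced from the actual and predicted class labels. The sketch below follows the usual layout in which rows are the actual class and columns are the predicted class; the label lists are hypothetical and are not results from the project.

```python
CLASSES = ("Normal", "Pre-failure", "Failure")


def confusion_matrix(actual, predicted, classes=CLASSES):
    """Count instances by (actual class, predicted class); rows are actual."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def per_class_recall(matrix):
    """Fraction of each actual class that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]


# Hypothetical labels for six instances (illustrative only).
actual = ["Normal", "Normal", "Pre-failure", "Failure", "Failure", "Normal"]
predicted = ["Normal", "Normal", "Pre-failure", "Pre-failure", "Failure", "Normal"]
matrix = confusion_matrix(actual, predicted)
recall = per_class_recall(matrix)
```

Per-class recall is useful here because overall accuracy can hide a model that rarely recognizes the Failure class.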
5.3.5.2 Neural Network (Multilayer Perceptron)
From the confusion matrices for the failure examples, the model performs well, predicting each
class accurately. However, when we look at the normal examples, it can be seen that it performs
badly in predicting the failure class. Going back to visualizing all the normal examples
datasets, it can be seen that there are hardly any failures, since the attributes are almost never
at zero for an extended amount of time as seen with the failure examples datasets. Therefore,
while we labeled those instances as failures from our clustering, the Neural Network actually
predicts the failure class rightfully as pre-failures. For this reason alone, it is the best model
for failure prediction.
Failure Examples Dataset
10-Fold Cross Validation
Figure 39 - Results for 10-Fold Cross Validation using Neural Network for failure examples dataset
Percentage Split
Figure 40 - Results for Percentage Split using Neural Network for failure examples dataset
Blind Test
Training Set
Figure 41 - Results for learning class of training set instances using Neural Network for failure examples
dataset
Test Set
Figure 42 - Results for predicting class of test set instances using Neural Network for unseen failure examples
dataset
Normal Examples Dataset
10-Fold Cross Validation
Figure 43 - Results for 10-Fold Cross Validation using Neural Network for normal examples dataset
Percentage Split
Figure 44 - Results for Percentage Split using Neural Network for normal examples dataset
Blind Test
Training Set
Figure 45 - Results for learning training set instances using Neural Network for normal examples dataset
Test Set
Figure 46 - Results for predicting class of test set instances using Neural Network for unseen normal
examples dataset
6 Conclusions
Wells should be pre-processed on an individual basis to ensure a consistent dataset. We
attempted to analyze the data on an individual and global basis and without a doubt, individual
analysis resulted in better prediction accuracy.
We evaluated multiple data mining algorithms, and in the end Neural Network and Logistic
Regression were the two preferred algorithms for failure prediction. However, a look at the
detailed results of the two models shows that the Neural Network is the best model for failure
prediction.
Machine learning with expert domain knowledge is critical for training the model. A lot of
assumptions were made regarding the data, especially when a failure was observed. Having the
knowledge expert on hand during training would result in the highest possible model accuracy.
For true validation of the model, additional testing is required with the corroboration of the
knowledge expert. This would ensure that all failures observed are actually well failures. Also,
additional well data from another field would be a great test set to validate the model.
7 Works Cited
Liu, F., & Patel, A. (March 26-28, 2013). Well Failure Detection for Rod Pump Artificial Lift
System through Pattern Recognition. Beijing, China: International Petroleum Technology
Conference.
Liu, Y. (December 2013). Failure Prediction for Rod Pump Artificial Lift Systems.
S. Liu, Y. L. Automatic Early Fault Detection for Rod Pump Systems. Society of Petroleum
Engineers (SPE).

 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 

Smart Oilfield Data Mining Final Project-Rod Pump Failure Prediction

  • 1. PTE588 – Smart Oilfield Data Mining, December 12th, 2013. Rod Pump Failure Prediction. Team 2: William Chessum, Marco Jarrin, Max Solanki, Xin Jason Du & Jeffrey Daniels
  • 2. Rod Pump Failure Prediction, Team 2. Table of Contents
      1 Rod Pump Introduction, p. 5
      2 Exploring the Problem Space, p. 5
        2.1 Raw Rod Pump Data, p. 6
        2.2 Processed Rod Pump Data, p. 8
      3 Exploring the Solution Space, p. 12
        3.1 Preliminary Approach, p. 12
        3.2 First Approach, p. 12
        3.3 Second Approach, p. 12
      4 Implementation Specification, p. 13
      5 Data Mining Development, p. 13
        5.1 Preliminary Approach, p. 13
          5.1.1 Attribute Selection, p. 13
          5.1.2 Classifiers, p. 14
        5.2 First Approach, p. 16
          5.2.1 Data Selection, p. 16
          5.2.2 Clusterization, p. 16
          5.2.3 Classification, p. 16
          5.2.4 Future failure prediction algorithm, p. 17
          5.2.5 Testing, p. 18
        5.3 Second Approach, p. 19
          5.3.1 Data Surveying, p. 19
          5.3.2 Data Preparation, p. 20
          5.3.3 Clustering, p. 22
          5.3.4 Data Modeling, p. 23
          5.3.5 Testing Results, p. 24
      6 Conclusions, p. 34
      7 Works Cited, p. 35
      Tables
      Table 1 - Future Failure Prediction from cluster transitions, p. 17
      Table 2 - Results for the models using the different performance evaluation methods for the failure examples, p. 24
      Table 3 - Results for the models using the different performance evaluation methods for the normal examples, p. 25
  • 3. Table of Figures
      Figure 2 - Problem definition: Data Sets, p. 5
      Figure 1 - Surface and pump card, p. 5
      Figure 3 – Raw card area; normal and failure data, p. 6
      Figure 4 – Raw peak surface load; normal and failure data, p. 7
      Figure 5 – Raw minimum surface load; normal and failure data, p. 7
      Figure 6 – Raw daily run time; normal and failure data, p. 8
      Figure 7 - Processed card area; normal and failure data, p. 9
      Figure 8 - Processed peak surface load; normal and failure data, p. 9
      Figure 9 - Processed minimum surface load; normal and failure data, p. 10
      Figure 10 - Processed daily run time trend; normal and failure data, p. 10
      Figure 11 - WEKA visualization results, p. 11
      Figure 12 - Problem process flow diagram, p. 13
      Figure 13 - PCA Analysis, p. 14
      Figure 14 - ClassifierSubsetEval (K nearest neighbor) / GeneticSearch, p. 14
      Figure 15 - Classifier nearest Neighbor, p. 15
      Figure 16 - Classifier Decision/Regression Tree, p. 15
      Figure 17 - (a) CardArea, (b) Clusters, (c) New class labeled, (d) Run time for comparison, p. 16
      Figure 18 - Classification Results using K-nearest neighbor, p. 17
      Figure 19 - Classification Results using Decision tree, p. 17
      Figure 20 - Future failure prediction, p. 18
      Figure 21 - Card Area of Failure Example, Well 19 plotted showing missing data before preprocessing, p. 19
      Figure 22 - Card Area of Normal Example, Well 02 plotted showing noise before preprocessing, p. 19
      Figure 23 - Sample of data from Failure Example Well11, showing daily runtime attribute as 0 but the other attributes having non-zero values, p. 20
      Figure 24 - Snippet of MatLab preprocessing code showing how missing data was filled for Well 01, p. 20
      Figure 25 - Snippet of MatLab preprocessing code showing the spans used for the various wells, p. 21
      Figure 26 - Snippet of MatLab code used showing the implementation of the medfilt1 function, p. 21
      Figure 27 - Plot of smoothed CardArea attribute for failure example Well 19, p. 21
      Figure 28 - Plot of smoothed CardArea attribute for Well 02, p. 22
      Figure 29 - Snippet of MatLab preprocessing code for normalizing the various well datasets, p. 22
      Figure 30 - Visualization of various clusters for Card Area plotted against History Date for Well 19, p. 23
      Figure 31 - Results for 10-Fold Cross Validation using Logistic Regression for failure examples dataset, p. 25
      Figure 32 - Results for Percentage Split using Logistic Regression for failure examples dataset, p. 26
      Figure 33 - Results for learning class of training set instances for Logistic Regression for failure examples dataset, p. 26
      Figure 34 - Results for predicting class of test set instances using Logistic Regression for unseen failure examples dataset, p. 27
  • 4. Table of Figures (continued)
      Figure 35 - Results for 10-Fold Cross Validation using Logistic Regression for normal examples dataset, p. 27
      Figure 36 - Results for Percentage Split using Logistic Regression for normal examples dataset, p. 28
      Figure 37 - Results for learning class of training set instances using Logistic Regression for normal examples, p. 28
      Figure 38 - Results for predicting class of test set instances using Logistic Regression for unseen normal examples, p. 29
      Figure 39 - Results for 10-Fold Cross Validation using Neural Network for failure examples dataset, p. 30
      Figure 40 - Results for Percentage Split using Neural Network for failure examples dataset, p. 30
      Figure 41 - Results for learning class of training set instances using Neural Network for failure examples dataset, p. 31
      Figure 42 - Results for predicting class of test set instances using Neural Network for unseen failure examples dataset, p. 31
      Figure 43 - Results for 10-Fold Cross Validation using Neural Network for normal examples dataset, p. 32
      Figure 44 - Results for Percentage Split using Neural Network for normal examples dataset, p. 32
      Figure 45 - Results for learning training set instances using Neural Network for normal examples dataset, p. 33
      Figure 46 - Results for predicting class of test set instances using Neural Network for unseen normal examples dataset, p. 33
  • 5. 1 Rod Pump Introduction
      Rod pumps are currently the most widely used artificial lift method in the oil and gas industry. They bring fluid to the surface when the reservoir pressure is insufficient. Rod pump failures, which include surface, tubing, and down-hole failures, occur commonly and result in production loss and operational expense. Detecting the precursors of a well failure, or the failure itself, makes it possible to minimize downtime and increase operational efficiency (Liu & Patel, March 26-28, 2013). Many operating variables can be examined to determine a pump's operating status, but the most telling is the dynamometer card [Figure 1]. It indicates the maximum and minimum forces applied to the rod pump and the area enclosed by the pump cycle. For the purpose of this report we use the pump card area to determine rod pump failures.
      2 Exploring the Problem Space
      The objective of this project is to develop a data mining technique that accurately predicts rod pump failures from a set of data from an unknown oil field. Rod pump data from normal and failure wells was provided in both raw and processed form. Each well has four numeric attributes: card area, peak surface load, minimum surface load and daily run time.
      Figure 2 - Problem definition: Data Sets
      Figure 1 - Surface and pump card [annotation: Card Area]
  • 6. 2.1 Raw Rod Pump Data
      Figures 3 to 6 depict the raw data from 20 different wells (10 failure and 10 normal) over the course of roughly one year. Four attributes were given for each well: card area, peak surface load, minimum surface load and daily run time. The data is quite noisy, with multiple outliers, and cannot be used for analysis as provided. Processing and normalization are required before data mining techniques can be applied.
      Figure 3 – Raw card area; normal and failure data [plot: CardArea, 0 to 30000, against date, 1-Jan-16 to 10-Jan-17; series: Failure, Normal]
  • 7. Figure 4 – Raw peak surface load; normal and failure data [plot: PeakSurfLoad, 0 to 45000, against date, 1-Jan-16 to 10-Jan-17; series: Failure, Normal]
      Figure 5 – Raw minimum surface load; normal and failure data [plot: MinSurfLoad, 0 to 35000, against date, 1-Jan-16 to 10-Jan-17; series: Failure, Normal]
  • 8. Figure 6 – Raw daily run time; normal and failure data [plot: DailyRunTime in hours, 0 to 30, against date, 1-Jan-16 to 10-Jan-17; series: Failure, Normal]
      2.2 Processed Rod Pump Data
      Figures 7 to 10 depict the processed data from 19 different wells (9 failure and 10 normal) over the course of roughly one year. Four attributes were given for each well: card area, peak surface load, minimum surface load and daily run time. After processing and normalization, the data shows distinct clustering, and an obvious contrast can be seen between normal and failure wells. This data is used in MatLab and WEKA with various data mining techniques to predict rod pump failures.
  • 9. Figure 7 - Processed card area; normal and failure data [plot: NormalizedCardArea, 0 to 1.4, against date, 1-Jan-16 to 10-Jan-17; series: Failure, Normal]
      Figure 8 - Processed peak surface load; normal and failure data [plot: NormalizedPeakSurfLoad, 0 to 1.2, against date, 1-Jan-16 to 10-Jan-17; series: Failure, Normal]
  • 10. Figure 9 - Processed minimum surface load; normal and failure data [plot: NormalizedMinSurfLoad, 0 to 1.6, against date, 1-Jan-16 to 10-Jan-17; series: Failure, Normal]
      Figure 10 - Processed daily run time trend; normal and failure data [plot: NormalizedDailyRunTime, 0 to 1.6, against date, 1-Jan-16 to 10-Jan-17; series: Failure, Normal]
      Figure 11 shows the results from the WEKA Visualization tool. This analysis shows that card area is the most important attribute for identifying potential rod pump failures.
  • 11. On the other hand, daily run time is the least significant attribute, because it cannot show why a well is down (for example, regular well maintenance versus a true failure).
      Figure 11 - WEKA visualization results
  • 12. 3 Exploring the Solution Space
      3.1 Preliminary Approach:
        • Analyze processed data globally
        • Class: Run-time
        • Attribute selection:
          – Visual, PCA analysis, ClassifierSubsetEval/GeneticSearch
        • Classifier:
          – K-nearest neighbors, Decision Trees
        • Need to define new class from data-set to interpret failures (adding class labels)
      3.2 First Approach:
        • Heuristic: Card Area trend seems to predict failures
        • Only Card Area Clustering: K-means, 5 clusters
        • Define Class labels from cluster numbers
        • Training and Test data Sets: holdout split 2/3 & 1/3
        • Classifier: K-nearest neighbor, Naïve Bayes, decision/regression tree (predicts current well status)
        • Challenge: Learning from cluster transitions to predict future failures
      3.3 Second Approach:
        • Raw data for failure and normal wells was processed for missing data and outliers, then normalized
        • Added a class label based on clusterization using all the attributes (except well name and history) by the Expectation Maximization method
        • Model learning using different algorithms, with 10-fold cross validation: Decision Tree, Neural Network, Logistic Regression, AdaBoost (Decision Stump classifier), Naïve Bayes
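The labeling step of the First Approach (cluster the card area with K-means into 5 groups, then use the cluster number as the class label) can be sketched as follows. This is a minimal illustration in Python rather than the team's own MatLab/WEKA workflow. The function name `kmeans_1d` and the evenly spaced seeding are assumptions made for the example, since the report does not say how the centroids were initialized.

```python
from statistics import mean


def kmeans_1d(values, k=5, iterations=100):
    """Lloyd's algorithm on scalar values; returns (centroids, labels)."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: deterministic seeding from k evenly spaced order statistics.
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(mean(members) if members else centroids[c])
        if updated == centroids:
            break  # converged: assignments can no longer change
        centroids = updated
    return centroids, labels
```

The cluster indices returned here are arbitrary (0 to 4, in ascending order of card area because of the seeding). Mapping them to meanings such as shutdown or prefailure is a separate, manual interpretation step, as in the report.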
  • 13. 4 Implementation Specification
      Figure 12 - Problem process flow diagram
      5 Data Mining Development:
      5.1 Preliminary Approach
      We analyzed the data from normal wells and from failure wells. At this point our objective was to identify a model that indicates a well failure, defined as DailyRunTime = 0. Because the model has to recognize the perturbation that characterizes a failure-type well, we decided to use only a unified dataset built from the failure wells. Data preparation was not required since processed data was already provided.
      5.1.1 Attribute Selection:
      PCA analysis and ClassifierSubsetEval/GeneticSearch were evaluated against the DailyRunTime class to determine whether a dimension reduction makes sense. Figure 13 and Figure 14 show the results from WEKA.
  • 14. Figure 13 - PCA Analysis
      Figure 14 - ClassifierSubsetEval (K nearest neighbor) / GeneticSearch
      5.1.2 Classifiers:
      K-nearest neighbors and Decision/Regression Tree were used for evaluation, as shown in Figure 15 and Figure 16.
  • 15. Figure 15 - Classifier nearest Neighbor
      Figure 16 - Classifier Decision/Regression Tree
      From this analysis, we determined that a class label needed to be added to the data set.
  • 16. 5.2 First Approach:
      By a heuristic process, using graphical inspection, we determined that a declining trend in card area could be used to predict well failures.
      5.2.1 Data Selection:
      We selected training and test data sets from the failure wells using a holdout split of 2/3 and 1/3.
      5.2.2 Clusterization:
      We approached this problem by clustering (K-means) the data into 5 clusters, labeled as follows: 1 = shutdown well, 2 = prefailure, 4 = normal, 3 and 5 = quasi-normal states. Figure 17.b shows the CardArea data colored by cluster once the 5 clusters were obtained. Figure 17.c, the new labeled class, can be related to the run time to predict a failure. The transitions between steps could be used to anticipate a shutdown, once the information is validated against the run time.
      Figure 17 - (a) CardArea, (b) Clusters, (c) New class labeled, (d) Run time for comparison [panel titles: CardArea1; clusters 1 to 5; new attribute; Run time]
      5.2.3 Classification:
      Once the new class label was added to the data set, we used 3 types of classifiers to predict the current well status: K-nearest neighbor, Naïve Bayes, and decision/regression tree (with 10-fold cross validation). Figure 18 and Figure 19 show the comparison between the K-nearest neighbor and decision tree results.
  • 17. K-nearest neighbor
      Figure 18 - Classification Results using K-nearest neighbor
      Decision/regression tree
      Figure 19 - Classification Results using Decision tree
      5.2.4 Future failure prediction algorithm:
      We determined that simple logic applied to the transitions between clusters would allow us to predict future failures. We first detected the edges (step changes) in the class label. Then we identified each negative transition and generated 3 annunciator labels: Warning, Failure Probable and Failure Imminent, as follows:
      Table 1 - Future Failure Prediction from cluster transitions
      State 1        State 2   Operator Label     Priority (0-100)
      From 5 or 4    to 3      Warning            10
      From 5 or 4    to 2      Failure probable   50
      From Warning   to 2      Failure Imminent   100
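The transition rules of Table 1 can be expressed as a small state machine over the sequence of cluster labels. The sketch below is in Python and is only one reading of the table, not the team's MatLab implementation. The function name `annunciate` is hypothetical, and two behaviors are assumptions the report does not state: a Warning stays active until the well returns to cluster 4 or 5, and transitions not listed in the table raise no alarm.

```python
def annunciate(labels):
    """Return (index, operator_label, priority) for each alarm along a label sequence."""
    alarms = []
    warning_active = False
    for i in range(1, len(labels)):
        prev, curr = labels[i - 1], labels[i]
        if curr == prev:
            continue  # no edge in the class label, nothing to evaluate
        if curr == 3 and prev in (4, 5):
            warning_active = True
            alarms.append((i, "Warning", 10))
        elif curr == 2 and warning_active:
            # "From Warning to 2": a prefailure state reached after a warning
            alarms.append((i, "Failure Imminent", 100))
        elif curr == 2 and prev in (4, 5):
            alarms.append((i, "Failure probable", 50))
        elif curr in (4, 5):
            warning_active = False  # assumption: recovery clears the warning
    return alarms
```

For example, the label sequence `[4, 4, 3, 3, 2, 1, 4, 2]` raises a Warning at index 2, Failure Imminent at index 4, and, after the well recovers to cluster 4, Failure probable at index 7.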
  • 18. 5.2.5 Testing:
      Figure 20 shows the results for well 16. With the model and the prediction algorithm applied, a failure was predicted 13 days in advance.
      Figure 20 - Future failure prediction [panel titles: new attribute (1 to 5); Run time; Failure Prediction Well-16. Annotations: Declining trend learned; Prediction of failure alarm (13 days before real event); Warnings (not imminent failures); Real Failure]
  • 19. 5.3 Second Approach
      5.3.1 Data Surveying
      When the unprocessed data from the normal and failure examples is visualized in MatLab, some values are missing from the data set, as seen in Figure 21.
      Figure 21 - Card Area of Failure Example, Well 19 plotted showing missing data before preprocessing
      Moreover, the curves contain noise, as seen in Figure 22.
      Figure 22 - Card Area of Normal Example, Well 02 plotted showing noise before preprocessing
      Lastly, inspection of the csv file shows that some instances have a Daily Run Time of zero while the other attributes in those instances are non-zero, as seen in the second row of Figure 23.
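The last check above (zero run time alongside non-zero card data) is easy to automate. The following Python sketch assumes each csv row has been read into a dictionary; the column names are taken from the axis labels of the report's plots and may not match the actual file headers, and the function name `inconsistent_rows` is hypothetical.

```python
# Assumed column names, borrowed from the plot axis labels in the report.
CARD_ATTRIBUTES = ("CardArea", "PeakSurfLoad", "MinSurfLoad")


def inconsistent_rows(rows):
    """Indices of rows with zero daily run time but non-zero card attributes."""
    return [
        i
        for i, row in enumerate(rows)
        if row["DailyRunTime"] == 0 and any(row[name] != 0 for name in CARD_ATTRIBUTES)
    ]
```

Rows flagged this way are candidates for manual review; the report does not say how they were treated.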
  • 20. Figure 23 - Sample of data from Failure Example Well11, showing daily runtime attribute as 0 but the other attributes having non-zero values. 5.3.2 Data Preparation The datasets were preprocessed individually using MatLab, as shown in Figure 24. To make the unprocessed data ready for modeling, the missing values are first filled in with 0. With no data recorded for those instances, we assume that the rod pumps had failed, so representing the missing attribute values with zero is justifiable. Figure 24 - Snippet of MatLab preprocessing code showing how missing data was filled for Well 01. After the missing data have been filled, the curves are smoothed using a moving median with varying spans to remove noise. This helps to minimize overfitting of the data mining model to be used. The normal examples are smoothed over varying spans, specifically 1, 3 and 9, as seen in Figure 25. The spans were varied to avoid distorting the individual datasets. The failure examples, however, were all smoothed using a moving median span of 11.
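The surveying check and the zero-fill step described above can be sketched as follows. This is a minimal illustration in Python rather than the MatLab code of Figure 24, which is not reproduced here. The attribute names follow the report; the sample values and the helper names (`is_missing`, `inconsistent_runtime`, `fill_missing_with_zero`) are assumptions made for illustration.

```python
import math

ATTRS = ["CardArea", "PeakSurfLoad", "MinSurfLoad", "DailyRunTime"]

# Hypothetical records: attribute names follow the report, values are invented.
rows = [
    {"CardArea": 5200.0, "PeakSurfLoad": 9100.0, "MinSurfLoad": 3100.0, "DailyRunTime": 24.0},
    {"CardArea": 5100.0, "PeakSurfLoad": 9000.0, "MinSurfLoad": 3000.0, "DailyRunTime": 0.0},
    {"CardArea": None, "PeakSurfLoad": None, "MinSurfLoad": None, "DailyRunTime": None},
    {"CardArea": 4900.0, "PeakSurfLoad": float("nan"), "MinSurfLoad": 2900.0, "DailyRunTime": 22.0},
]


def is_missing(value):
    """Treat None and NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def inconsistent_runtime(row):
    """True when DailyRunTime is 0 but some other attribute is non-zero."""
    run_time = row["DailyRunTime"]
    if is_missing(run_time) or run_time != 0:
        return False
    others = (row[a] for a in ATTRS if a != "DailyRunTime")
    return any(not is_missing(v) and v != 0 for v in others)


def fill_missing_with_zero(records):
    """Return copies of the records with every missing attribute set to 0."""
    return [{a: (0.0 if is_missing(r[a]) else r[a]) for a in ATTRS} for r in records]


flagged = [i for i, r in enumerate(rows) if inconsistent_runtime(r)]
filled = fill_missing_with_zero(rows)
print("rows with zero run time but non-zero attributes:", flagged)
print("third record after filling:", filled[2])
```

Flagging the inconsistent rows separately from the fill keeps the two data problems of Section 5.3.1 apart, so the zero-fill assumption is only applied to values that are actually absent.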
  • 21. Figure 25 - Snippet of MatLab preprocessing code showing the spans used for the various wells. After the spans have been chosen, the medfilt1 MatLab function is applied to all the numeric attributes of the dataset, as shown in Figure 26. Odd values were chosen for the moving median span because the medfilt1 MatLab function works better with odd values. Figure 26 - Snippet of MatLab code used showing the implementation of the medfilt1 function. After smoothing the values, the attributes are less noisy, as shown in Figure 27 and Figure 28 compared to Figure 21 and Figure 22. Figure 27 - Plot of smoothed CardArea attribute for failure example Well 19.
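A moving median of the kind applied above can be sketched in a few lines. This is a Python illustration, not the medfilt1 call of Figure 26. It assumes that the window is simply truncated at the ends of the series, which may differ from how medfilt1 treats the edges, and the sample values are invented. With an odd span the window is centred on the sample being smoothed, which is one reason odd spans are convenient.

```python
from statistics import median


def moving_median(values, span):
    """Smooth a series with a centred moving median of odd length `span`.

    Assumption: near the ends the window is truncated to the available
    samples instead of being padded.
    """
    if span < 1 or span % 2 == 0:
        raise ValueError("span must be a positive odd integer")
    half = span // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(median(values[lo:hi]))
    return smoothed


# Hypothetical noisy series with two isolated spikes (50 and 0).
noisy = [10, 11, 50, 12, 13, 0, 14, 15]
smoothed = moving_median(noisy, 3)
print(smoothed)
```

A span of 1 leaves the series unchanged, which matches the use of span 1 for normal wells that needed no smoothing; larger spans such as 9 or 11 remove longer runs of outliers at the cost of flattening short genuine changes.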
  • 22. Figure 28 - Plot of smoothed CardArea attribute for Well 02. After smoothing all the individual datasets, they are normalized in MatLab in order to make the attributes comparable to one another despite their varying ranges, as seen in Figure 29. This is done using the Min-Max normalization method, which is reasonable because most of the attributes in the datasets are not skewed either to the left or to the right. Figure 29 - Snippet of MatLab preprocessing code for normalizing the various well datasets. 5.3.3 Clustering With the datasets for each well now of better quality, they are clustered individually using Weka. 
The cluster method used here is the Expectation Maximization method, which is a probabilistic cluster method. Rather than using an attribute selection method for clustering, all
  • 23. the numeric attributes, namely CardArea, PeakSurfLoad, MinSurfLoad and DailyRunTime, are used by the EM Cluster algorithm to cluster the instances. The number of clusters is set to 3 so that the clusters correspond to Normal, Pre-failure and Failure, as shown in Figure 30. These 3 clusters then become the class attribute used for failure prediction. Pre-failure serves as the warning that failure of the rod pump is imminent, so that technicians can service the rod pump in time, which increases productivity and saves operational costs. Figure 30 - Visualization of various clusters for Card Area plotted against History Date for Well 19. 5.3.4 Data Modeling After all the dataset instances have been assigned a class label, suitable data mining models are selected for failure prediction. Five predictive models were compared, namely Naïve Bayes, AdaBoost (Decision Stump classifier), Multilayer Perceptron (Neural Network), Decision Tree and Logistic Regression. In order to evaluate the performance of these models, three separate methods of testing were used, namely 10-Fold Cross Validation, a percentage split and a blind test.
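The Min-Max normalization of Section 5.3.2 and the three-cluster Expectation Maximization step of Section 5.3.3 can be sketched together as follows. This is a simplified Python illustration, not Weka's EM clusterer: it assumes a Gaussian mixture with diagonal covariance, a deterministic initialization, only two of the four attributes, and invented sample values. Naming the clusters by ascending mean CardArea is also an assumption made for the example.

```python
import math


def min_max(column):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_cluster(points, k=3, iterations=50, min_var=1e-4):
    """Fit a k-component diagonal Gaussian mixture by EM; return labels and means."""
    n, d = len(points), len(points[0])
    # Deterministic start: pick means spread over the points sorted by attribute 0.
    ordered = sorted(points)
    means = [list(ordered[(2 * j + 1) * n // (2 * k)]) for j in range(k)]
    variances = [[0.05] * d for _ in range(k)]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iterations):
        # E-step: responsibility of each cluster for each point.
        resp = []
        for p in points:
            dens = [
                weights[j]
                * math.prod(gaussian_pdf(p[a], means[j][a], variances[j][a]) for a in range(d))
                for j in range(k)
            ]
            total = sum(dens) or 1e-300
            resp.append([v / total for v in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue
            weights[j] = nj / n
            for a in range(d):
                means[j][a] = sum(r[j] * p[a] for r, p in zip(resp, points)) / nj
                var = sum(r[j] * (p[a] - means[j][a]) ** 2 for r, p in zip(resp, points)) / nj
                variances[j][a] = max(var, min_var)  # floor avoids zero variance
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return labels, means


# Hypothetical well history: four normal, four degrading and four failed days.
card_area = [5000, 5100, 4900, 5050, 3000, 3100, 2900, 3050, 0, 0, 0, 0]
run_time = [24, 23, 24, 23, 15, 14, 16, 15, 0, 0, 0, 0]
points = list(zip(min_max(card_area), min_max(run_time)))

labels, means = em_cluster(points, k=3)
# Assumed naming rule: lowest mean CardArea is Failure, highest is Normal.
order = sorted(range(3), key=lambda j: means[j][0])
name_of = {order[0]: "Failure", order[1]: "Pre-failure", order[2]: "Normal"}
names = [name_of[label] for label in labels]
print(names)
```

Normalizing first matters here because the mixture uses one variance per attribute; without it, an attribute with a large numeric range would dominate the initial responsibilities.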
  • 24. In the 10-Fold Cross Validation, all the well datasets for the failure examples, Wells 11-20, were merged into one dataset and input into Weka for analysis. The datasets for the normal examples, Wells 1-10, were also merged into a separate single dataset for analysis in Weka. In the Percentage Split, the same two merged datasets were used (Wells 11-20 for the failure examples and Wells 1-10 for the normal examples). However, we decided to use an extreme case: 5% of the dataset for training the models and the remaining 95% for testing. This was done to give us a feel for how well the datasets had been preprocessed and clustered. Finally, in the blind test for the failure examples, Wells 11-16 were merged into one dataset and used to train the models, while Wells 17-20 were merged into a single dataset supplied to the models as a test set. For the normal examples, Wells 1-6 were merged into a single dataset and used to train the models, while Wells 7-10 were merged into a single dataset and supplied to the models as a test set. 5.3.5 Testing Results After testing, we see that most of the models have very good predictive performance, above 95%. However, in the blind test for the failure examples, the Naïve Bayes model had a training accuracy of 99.05% and a testing accuracy of 76.21%, which suggests that the model is severely overfitted. Likewise, in testing the normal examples, the Decision Tree model had a training accuracy of 98.63% and a testing accuracy of 63.31%, which suggests that it is also severely overfitted. Considering the performance of the models for both the normal wells and the failure wells, the Neural Network and the Logistic Regression are the preferred models for failure prediction. 
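The three evaluation protocols of Section 5.3.4 differ only in how the merged instances are divided between training and testing. A minimal Python sketch of the three splits is given below. It does not reproduce Weka's own shuffling or stratification, and the instance layout (a dictionary with a `well` key, 20 instances per well) is an assumption made for illustration.

```python
def blind_split(instances, train_wells, test_wells):
    """Split by well number so that test wells are never seen in training."""
    train = [r for r in instances if r["well"] in train_wells]
    test = [r for r in instances if r["well"] in test_wells]
    return train, test


def percentage_split(instances, train_fraction):
    """Use the first `train_fraction` of the instances for training."""
    cut = round(len(instances) * train_fraction)
    return instances[:cut], instances[cut:]


def k_folds(instances, k=10):
    """Yield (train, test) pairs; each instance is tested exactly once."""
    folds = [instances[i::k] for i in range(k)]
    for i in range(k):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, folds[i]


# Hypothetical merged failure dataset: Wells 11-20, 20 instances per well.
instances = [{"well": w, "day": d} for w in range(11, 21) for d in range(20)]

blind_train, blind_test = blind_split(instances, range(11, 17), range(17, 21))
pct_train, pct_test = percentage_split(instances, 0.05)
fold_sizes = [(len(tr), len(te)) for tr, te in k_folds(instances, k=10)]

print(len(blind_train), len(blind_test))  # instances from Wells 11-16 vs 17-20
print(len(pct_train), len(pct_test))      # 5% training, 95% testing
print(fold_sizes[0])
```

Only the blind test keeps whole wells out of training, which is why it is the protocol that exposes the overfitting discussed in Section 5.3.5.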
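Table 2 - Results for the models using the different performance evaluation methods for the failure examples
10-Fold Cross Validation: AdaBoost 99.31%, NaiveBayes 98.39%, Neural Network 99.86%, Decision Tree 99.77%, Logistic Regression 99.82%
Percentage Split: AdaBoost 99.32%, NaiveBayes 99.13%, Neural Network 99.32%, Decision Tree 97.19%, Logistic Regression 99.32%
Blind Test (training / testing): AdaBoost 99.12% / 99.13%, NaiveBayes 99.05% / 76.21%, Neural Network 100% / 99.88%, Decision Tree 99.9% / 92.81%, Logistic Regression 100% / 92.69%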
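The overfitting argument rests on the gap between training and testing accuracy in the blind test. The short Python sketch below computes that gap for the failure examples, assuming the blind-test figures of Table 2 pair up as (training, testing) for each model; the Naïve Bayes pair is the one quoted in the text. The 10-point threshold is a hypothetical cut-off chosen only for illustration.

```python
# Blind-test accuracies (%) for the failure examples, read from Table 2
# as (training, testing) pairs per model.
blind_test_failure = {
    "AdaBoost": (99.12, 99.13),
    "NaiveBayes": (99.05, 76.21),
    "Neural Network": (100.0, 99.88),
    "Decision Tree": (99.9, 92.81),
    "Logistic Regression": (100.0, 92.69),
}

GAP_THRESHOLD = 10.0  # hypothetical cut-off in percentage points


def overfit_gaps(results):
    """Training accuracy minus testing accuracy for each model."""
    return {model: round(train - test, 2) for model, (train, test) in results.items()}


gaps = overfit_gaps(blind_test_failure)
flagged = [model for model, gap in gaps.items() if gap > GAP_THRESHOLD]

for model, gap in gaps.items():
    print(f"{model}: {gap:+.2f} points")
print("flagged as overfitted:", flagged)
```

The same calculation can be applied to the normal examples to compare how each model generalizes to unseen wells.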
  • 25. Table 3 - Results for the models using the different performance evaluation methods for the normal examples
10-Fold Cross Validation: AdaBoost 88.44%, NaiveBayes 84.91%, Neural Network 90.54%, Decision Tree 96.11%, Logistic Regression 92.91%
Percentage Split: AdaBoost 87.69%, NaiveBayes 84.2%, Neural Network 89.08%, Decision Tree 87.46%, Logistic Regression 89.87%
Blind Test (training / testing): AdaBoost 91.16% / 81.97%, NaiveBayes 86.65% / 81.97%, Neural Network 91.61% / 86.02%, Decision Tree 98.63% / 63.31%, Logistic Regression 94.21% / 89.03%
The Logistic Regression and Neural Network models are our preferred models for failure prediction, so we now take an in-depth look at the results of these models from our Weka analysis. While these are our two preferred models, a look at the confusion matrices of the different tests suggests that the Neural Network is the best of all the models tried. 5.3.5.1 Logistic Regression The confusion matrices show that it was able to predict all the classes quite accurately. Failure Examples Dataset, 10-Fold Cross Validation: Figure 31 - Results for 10-Fold Cross Validation using Logistic Regression for failure examples dataset.
  • 26. Percentage Split: Figure 32 - Results for Percentage Split using Logistic Regression for failure examples dataset. Blind Test, Training Set: Figure 33 - Results for learning class of training set instances for Logistic Regression for failure examples dataset.
  • 27. Test Set: Figure 34 - Results for predicting class of test set instances using Logistic Regression for unseen failure examples dataset. Normal Examples Dataset, 10-Fold Cross Validation: Figure 35 - Results for 10-Fold Cross Validation using Logistic Regression for normal examples dataset.
  • 28. Percentage Split: Figure 36 - Results for Percentage Split using Logistic Regression for normal examples dataset. Blind Test, Training Set: Figure 37 - Results for learning class of training set instances using Logistic Regression for normal examples.
  • 29. Test Set: Figure 38 - Results for predicting class of test set instances using Logistic Regression for unseen normal examples. 5.3.5.2 Neural Network (Multilayer Perceptron) From the confusion matrices for the failure examples, the model performs well, predicting each class accurately. For the normal examples, however, it appears to perform badly at predicting the failure class. Going back to the visualization of the normal examples datasets, it can be seen that there are hardly any failures, since the attributes are almost never at zero for an extended amount of time as they are in the failure examples datasets. Therefore, while our clustering labeled those instances as failures, the Neural Network actually predicts the failure class rightfully as pre-failures. For this reason alone, it is the best model for failure prediction. Failure Examples Dataset, 10-Fold Cross Validation:
  • 30. Figure 39 - Results for 10-Fold Cross Validation using Neural Network for failure examples dataset. Percentage Split: Figure 40 - Results for Percentage Split using Neural Network for failure examples dataset. Blind Test, Training Set:
  • 31. Figure 41 - Results for learning class of training set instances using Neural Network for failure examples dataset. Test Set: Figure 42 - Results for predicting class of test set instances using Neural Network for unseen failure examples dataset.
  • 32. Normal Examples Dataset, 10-Fold Cross Validation: Figure 43 - Results for 10-Fold Cross Validation using Neural Network for normal examples dataset. Percentage Split: Figure 44 - Results for Percentage Split using Neural Network for normal examples dataset. Blind Test, Training Set:
  • 33. Figure 45 - Results for learning training set instances using Neural Network for normal examples dataset. Test Set: Figure 46 - Results for predicting class of test set instances using Neural Network for unseen normal examples dataset.
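The confusion matrices in the figures above are read class by class, which is what separates the two preferred models. The Python sketch below shows how a three-class confusion matrix and the per-class recall are obtained from actual and predicted labels. The labels are invented for illustration and are not taken from the Weka output; they mimic the situation described for the normal examples, where instances labeled Failure by the clustering are predicted as Pre-failure.

```python
from collections import Counter

CLASSES = ["Normal", "Pre-failure", "Failure"]


def confusion_matrix(actual, predicted):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in CLASSES] for a in CLASSES]


def per_class_recall(matrix):
    """Fraction of each actual class that was predicted correctly."""
    recall = {}
    for i, name in enumerate(CLASSES):
        row_total = sum(matrix[i])
        recall[name] = matrix[i][i] / row_total if row_total else 0.0
    return recall


def accuracy(matrix):
    correct = sum(matrix[i][i] for i in range(len(CLASSES)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical labels: two of the three "Failure" instances are predicted
# as "Pre-failure", everything else is predicted correctly.
actual = ["Normal"] * 6 + ["Pre-failure"] * 3 + ["Failure"] * 3
predicted = ["Normal"] * 6 + ["Pre-failure"] * 3 + ["Failure", "Pre-failure", "Pre-failure"]

matrix = confusion_matrix(actual, predicted)
recall = per_class_recall(matrix)
print(matrix)
print(recall)
print(round(accuracy(matrix), 3))
```

Overall accuracy hides where the errors fall; the per-class rows show whether a model confuses Failure with Pre-failure, which still raises a warning, or with Normal, which would miss the event entirely.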
  • 34. 6 Conclusions Wells should be pre-processed on an individual basis to ensure a consistent dataset. We analyzed the data on both an individual and a global basis, and individual analysis clearly resulted in better prediction accuracy. We evaluated multiple data mining algorithms, and in the end Neural Network and Logistic Regression were the two preferred algorithms for failure prediction. However, a look at the detailed results of the two models shows that the Neural Network is the best model for failure prediction. Combining machine learning with expert domain knowledge is critical for training the model. Many assumptions were made regarding the data, especially about when a failure was observed. Having the knowledge expert on hand during training would result in the highest possible model accuracy. For true validation of the model, additional testing is required with the corroboration of the knowledge expert. This would ensure that all failures observed are actual well failures. Additional well data from another field would also make a good test set for validating the model.
  • 35. 7 Works Cited Liu, F., & Patel, A. (March 26-28, 2013). Well Failure Detection for Rod Pump Artificial Lift System through Pattern Recognition. Beijing, China: International Petroleum Technology Conference. Liu, Y. (December 2013). Failure Prediction for Rod Pump Artificial Lift Systems. S. Liu, Y. L. Automatic Early Fault Detection for Rod Pump Systems. Society of Petroleum Engineers (SPE).