STUDENT PROJECT 1
WINNING SUBMISSION
Pexitics – Jigsaw Contest
Preventive Maintenance Case Study
Contents: Background, Approach, Exploration, Preparation, Modeling and validation, Model insights, Clustering and profiling, Recommendations, Model monitoring
Background
Defining the Problem Statement
Problem Statement:
▪ The machine in the diagram shows high-pressure air entering the machine and low-pressure air exiting, while water pressure is applied at the bottom.
▪ The range of pressure 2 is lower than that of pressure 1 and pressure 3, which suggests that pressure 2 may be the water pressure.
▪ The lower pressure applied here may be due to 2 reasons:
  - The viscosity of water is higher than that of air
  - Water is incompressible, unlike air
Problem Statement:
▪ Across 90,000 instances of maintenance work done by a company on various machines, the breakage rate over a year was 39.47%.
▪ The company now wants to build a model that predicts breakdown from factors such as pressure indicator points, machine lifetime, team usage and service provider, and to use preventive maintenance to reduce downtime.
▪ The company is also looking for rules across pressure levels and machine lifetime, and for segments across teams and service providers, to predict and prevent breakage.
Approach
Methodology & Stages to solve the Problem
Our Approach
Exploration
• Assess Data
• Descriptive Analysis
Preparation
• Missing Data
• Outlier Detection
• Qualitative to Quantitative Variables
• Dummies
• Dividing Datasets (50:50)
Model Building
• Logistic Regression
• Cluster (K Means)
Model Validation
• Perform Fit Statistics
• Confusion Matrix
• Confidence Intervals
• Gains Charts
• Rank Ordering
• AUC & ROC Curve
Model Insights
• Visualizations
• Recommendation
Solution Monitoring
• Assess Model Performance
Exploration
Insights & Descriptive Statistics
Machine Breakage
Total Breakage Rate = 39.4%
Breakage % by Teams - A,B,C
▪ Machines used by Team B have the highest
breakdown (15%) as compared to Team A
(12%) and Team C (12%)
Team B
– 15%
Breakage % by Service Providers – 1,2,3 & 4
▪ Manufacturer (Provider) 4 has the
lowest breakdown rate (8%) as
compared to manufacturers 1 (11%)
& manufacturer 3 (11%)
Manufacturer
4 – 8%
Distribution of Machine Life (in months)
▪ At an overall level, the average lifetime of a machine
is 55 months.
▪ All machines start breaking when lifetime reaches 60
months and continue to break till 93 months with peak
at 60 - 80 months.
[Chart: % of broken machines by lifetime band (within broken = 1): 0 - 59 months: 0; 60 - 70 months: 28.6; 71 - 80 months: 28.8; 81 - 90 months: 26.08; 91 - 93 months: 16.51]
Distribution of Pressure Indicator Points 1,2 & 3
▪ At pressure indicator 1, most of the machines break at readings between 63.0 and 112.5.
▪ At pressure indicator 2, most of the machines break at readings between 78.1 and 102.0.
▪ At pressure indicator 3, most of the machines break at readings between 70.1 and 110.
Exploration
Within providers and teams
Breakage % across Providers & Teams
▪ Amongst breakage, more than 60% of the
machines for team B are provided from
manufacturer 1 and manufacturer 3 (which are
most likely to break) hence this team has a
higher probability of breaking
▪ Provider 4 has the lowest breakdown-rate at
team A and team B whereas provider 2 should
be the preferred vendor for team C.
Within broken = 1
Current scenario across Providers & Teams
▪ Currently, team A is buying the highest number
of m/c’s from provider 2 followed by provider 1,
instead of their preferred vendor provider 4.
▪ Team B is using the lowest number of m/c’s
from provider 4 (their preferred vendor)
▪ Team C, meanwhile, is taking the highest number of m/c’s from provider 2 (their preferred vendor)
Within broken = 0
Exploration
Within pressure indicator points, providers and teams
Breakage Summary - Pressure
Indicator 1, Teams & Providers
• At pressure indicator 1, most of the
machines break between 63.0 to
112.5 of pressure
• In this faceted density graph, the
probability of breakage is highest in
Machines of Provider 3 used by
Team A
• Breakage Probability is also
noticeably high with machines of
Provider 4 to Teams B & C
Breakage Summary - Pressure
Indicator 2, Teams & Providers
• At pressure indicator 2, most of the
machines break between 78.1 to
102.0 of pressure.
• In this faceted graph, except for
machines provided by Provider 3 to
Team B, all other instances of
machines reflected a broken trend
between 80-120 psi for pressure
indicator 2.
• High Breakage is noticed for
Machines provided by Provider 2 to
Team A at 100 psi.
• Provider 3’s Machines have shown
a noticeable non breakage for all
teams at 90-110 psi
Breakage Summary - Pressure
Indicator 3, Teams & Providers
• At pressure indicator 3, most of the
machines break between 70.1 to
110 of pressure.
• In this faceted graph, a high
breakage probability is noticeable
in Machines provided by Provider 3
to Team A and Team C, when
pressure indicator 3 is between 90-
120 psi.
• Non - breakage probability is
relatively high when Team B uses
machines from Provider 4 at 120
psi for pressure indicator 3.
Breakage Summary - Machine
Life, Average Pressure & Provider
▪ Machines from provider 4 and 2 do not have breakage up to 80 months while machines from provider 1 & 3 have 100% breakage till the same point
▪ Machines from Provider 4 show a consistent life irrespective of pressure points
▪ Machines from Provider 2 have a longer lifetime; i.e. only 29% breakage happens till 89 months as compared to 100% of provider 4.
Exploration
Within machine life, providers and teams
Breakage Summary - Machine
Life, Teams & Providers
• Machines provided by manufacturer 3
have least lifetime; breakdown
completely at 66 months
• Also, machines used from Provider 1 &
3 show maximum breakage
• This graph also proves that, provider 4
should be the preferred vendor for
Team A and Team B while provider 2
should be the preferred vendor for
Team C basis likelihood of breakage.
Breakage Summary - Machine
Life > 60 Months, Teams &
Providers
• Machines provided by Provider 1
and 3 constitute more than 57% of
the breakage.
• 40% of machines from Team A &
Team B show a breakage between
88 - 93 months.
Preparation
Data Checks, Missing Data, Outliers, Converting Qualitative variables to Quantitative Variables
Dummy Variables for Modelling
▪ Binned Pressure Indicator Points &
Life Time based on Quartiles
▪ Converted Teams & Providers into
numeric variables
▪ Average of Three Pressure Points as
Variable created
▪ Training and Validation Datasets
split with ratio 50:50
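The preparation steps listed above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the SAS code used for the project; the field names (`lifetime`, `p1`, `p2`, `p3`, `team`, `provider`, `broken`) and the synthetic rows are assumptions made for the example. Team A and Provider 1 are taken as base levels, as in the model slides.

```python
import random
import statistics

def quartile_bin(value, cuts):
    """Return the quartile index (0-3) of value given three cut points."""
    return sum(value > c for c in cuts)

def prepare(rows, seed=0):
    """Bin, dummy-code and split rows 50:50 into training and validation."""
    cuts = {k: statistics.quantiles([r[k] for r in rows], n=4)
            for k in ("lifetime", "p1", "p2", "p3")}
    prepared = []
    for r in rows:
        out = {"broken": r["broken"],
               # average of the three pressure points as an extra variable
               "pressure_avg": (r["p1"] + r["p2"] + r["p3"]) / 3}
        for k, c in cuts.items():
            out[k + "_bin"] = quartile_bin(r[k], c)
        # dummy variables (Team A and Provider 1 are the base levels)
        for team in ("B", "C"):
            out["Team" + team] = int(r["team"] == team)
        for prov in (2, 3, 4):
            out["Provider%d" % prov] = int(r["provider"] == prov)
        prepared.append(out)
    random.Random(seed).shuffle(prepared)
    half = len(prepared) // 2
    return prepared[:half], prepared[half:]

# Hypothetical data, only to show the shape of the input
rng = random.Random(42)
rows = [{"lifetime": rng.randint(1, 93), "p1": rng.uniform(30, 170),
         "p2": rng.uniform(60, 140), "p3": rng.uniform(30, 170),
         "team": rng.choice("ABC"), "provider": rng.randint(1, 4),
         "broken": rng.randint(0, 1)} for _ in range(100)]
train, valid = prepare(rows)
print(len(train), len(valid))
```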
Dataset: 90,000 observations & 8 variables
Target Variable: Broken (binary variable)
Data Type: Character variables = 2, Numeric variables = 6
Missing Values: No missing values
Factors: Machine Life, Pressure Indicator Points, Teams, Providers
Outlier Detection - Box Plots
▪ Outliers are seen at the pressure points; they were retained and taken into consideration during modelling
▪ Data for the Lifetime variable looks consistent; breakage appears after 59 months
Modelling & Validation
Logistic Regression, Clustering & Validation Techniques
Explanation: Logistic Regression using SAS
▪ A binary logit model was fitted with Fisher’s scoring; 45,000 observations from the training dataset were read and used.
▪ The Response Profile reports how many observations fall in each outcome class. The model here estimates the probability that the target variable “Broken” equals 1.
▪ The Model Convergence Status is satisfied. Because the model is fitted with an iterative procedure, this indicates that the algorithm reached a stable set of estimates. Convergence by itself does not show that the independent variables are significant or sufficient in number; that is assessed by the tests below.
▪ Model Fit Statistics – AIC, SC and -2LogL (lack of fit): values range up to 60,431 after variable transformations and bucketing. These statistics grow with the number of observations, so the value is large mainly because of the size of the training dataset; they are most useful for comparing competing models fitted to the same data, where lower is better.
▪ Testing Global Null Hypothesis: Beta = 0 covers the likelihood ratio, score and Wald tests. The likelihood ratio test compares the goodness of fit of two statistical models, the null (intercept only) and the alternative (with predictors). For a good model, at least 1 of the 3 tests needs to be significant (p < 0.05 at the 5% level of significance) for us to reject the null hypothesis. For our model, all 3 tests are significant at the 5% level of significance.
▪ Maximum Likelihood Estimation (MLE) estimates the model coefficients by choosing the values that make the observed data most likely. The p-value of each coefficient tests whether that coefficient differs from zero; a small p-value (< 0.05) means the effect is unlikely to be due to chance alone. The estimate β1 tells us that a unit change in x1 changes the log odds of Y = 1 by β1. A positive estimate means the probability of the outcome rises with the variable, while a negative estimate means it falls.
▪ Odds Ratio Estimates give a point estimate, exp(β1), which is the factor by which the odds of Y = 1 are multiplied for a unit change in x1, together with the lower and upper limits of its 95% confidence interval.
▪ Concordant and Discordant Pairs: since this is a binary classification model, every observation with an actual outcome of 1 is paired with every observation with an actual outcome of 0. A pair is concordant if the model gives the higher predicted probability of the event to the observation where the event actually occurred, discordant if it gives it the lower one, and tied if the two probabilities are equal. The higher the percentage of concordant pairs, the better the model. Our model has a concordant percentage of 94.3.
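The pair counting behind the concordance figure can be illustrated with a short sketch. The function name and the six example observations are made up for illustration; they are not the project data, and SAS computes this statistic internally.

```python
def concordance(actual, predicted):
    """Percent concordant, discordant and tied over all (event, non-event) pairs."""
    events = [p for a, p in zip(actual, predicted) if a == 1]
    non_events = [p for a, p in zip(actual, predicted) if a == 0]
    counts = {"concordant": 0, "discordant": 0, "tied": 0}
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                counts["concordant"] += 1   # model ranks the broken machine higher
            elif p_event < p_non_event:
                counts["discordant"] += 1   # model ranks the unbroken machine higher
            else:
                counts["tied"] += 1
    total = len(events) * len(non_events)
    return {k: 100.0 * v / total for k, v in counts.items()}

# Hypothetical example: 3 broken and 3 unbroken machines
actual = [1, 1, 0, 0, 1, 0]
predicted = [0.9, 0.4, 0.3, 0.5, 0.8, 0.1]
result = concordance(actual, predicted)
print(result)
```

In this toy example 8 of the 9 pairs are concordant and 1 is discordant.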
Explanation: Logistic Regression using SAS
▪ Classification Table : The classification table is another method to evaluate the predictive accuracy of the
logistic regression model. In this table the observed values for the dependent outcome and the predicted
values (at a user defined cut-off value, for example p=0.50) are cross-classified. Our model correctly
predicts 86.7% of the cases at a cutoff value of 0.50.
▪ Gain chart / Lift curve : A popular way to measure the performance of a logistic regression model; it shows how many of the actual outcomes are captured when observations are ranked by predicted probability.
▪ Rank ordering : For our model, the rank ordering is well maintained since the response rate decreases from decile to decile. We have also captured 72% of the outcomes in the first 3 deciles. This is shown in the Excel file provided alongside this presentation.
▪ The entire dataset is divided into training and validation dataset in the ratio of 50:50. The model is first
built on the training dataset and then validated on the validation dataset. We have successfully validated our
model on the validation dataset with the same significant variables.
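The decile-based checks described above (rank ordering and cumulative capture) can be sketched as below. The scores and outcomes are synthetic and perfectly separable, chosen only so the arithmetic is easy to follow; the function names are illustrative.

```python
def gains_table(actual, predicted, bins=10):
    """Sort by predicted probability (descending) and summarise each decile."""
    ranked = sorted(zip(predicted, actual), key=lambda t: t[0], reverse=True)
    n = len(ranked)
    total_events = sum(a for _, a in ranked)
    table, cumulative = [], 0
    for b in range(bins):
        chunk = ranked[b * n // bins:(b + 1) * n // bins]
        events = sum(a for _, a in chunk)
        cumulative += events
        table.append({"decile": b + 1,
                      "response_rate": events / len(chunk),
                      "cum_capture": cumulative / total_events})
    return table

def is_rank_ordered(table):
    """True when the response rate never increases from one decile to the next."""
    rates = [row["response_rate"] for row in table]
    return all(a >= b for a, b in zip(rates, rates[1:]))

# Hypothetical scores: 100 machines, the 40 highest-scored ones are broken
predicted = [i / 100 for i in range(100)]
actual = [1 if i >= 60 else 0 for i in range(100)]
table = gains_table(actual, predicted)
print(table[2]["cum_capture"], is_rank_ordered(table))
```

With real model scores the same table gives the share of breakages captured in the first 3 deciles, the quantity the slide reports as 72%.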
Model Insights
Training dataset - Visualizations of the model
Model Insights
▪ The variables that are significant according to the model, and their explanations, are given below:
▪ Lifetime 80 – Lifetime of machines ranging from 61 months to 80 months. This binned variable has a positive estimate: for machines in this band, the log odds of broken = 1 are higher by 5.4286 than for machines in the reference lifetime band.
▪ Lifetime 93 – Lifetime of machines ranging from 81 months to 93 months. This binned variable also has a positive estimate: for machines in this band, the log odds of broken = 1 are higher by 8.2307 than for machines in the reference lifetime band.
▪ Insight - Hence the machines should be replaced by month 60.
Model Insights
▪ PressureInd_197 – Pressure indicator 1 ranging from 85.01 to 97.00. This binned variable has a negative estimate: for readings in this band, the log odds of broken = 1 are lower by 0.6753 than in the reference band.
▪ PressureInd_1115 – Pressure indicator 1 ranging from 97.01 to 115.00. This binned variable also has a negative estimate: for readings in this band, the log odds of broken = 1 are lower by 0.3858 than in the reference band.
Model Insights
▪ PressureInd_2100 – Pressure indicator 2 ranging from 88.01 to 100.00. This binned variable also has a negative estimate: for readings in this band, the log odds of broken = 1 are lower by 0.129 than in the reference band.
▪ PressureInd_3101 – Pressure indicator 3 ranging from 88.01 to 101.00. This binned variable also has a negative estimate: for readings in this band, the log odds of broken = 1 are lower by 0.2199 than in the reference band.
Model Insights
▪ PressureInd_3173 – Pressure indicator 3 ranging from 114.01 to 173.00. This binned variable has a positive estimate: for readings in this band, the log odds of broken = 1 are higher by 0.2485 than in the reference band.
▪ Insight – Machines operating in the pressure bands p197, p115, p2100 and p3101 are less likely to break (negative estimates) than machines in band p3173, which has a higher likelihood of breakage (positive estimate).
Model Insights
▪ Team B and Team C are significant variables, with Team A as the base. Both have positive estimates: relative to machines used by Team A, the log odds of broken = 1 are higher by 0.2079 for Team B and by 1.1111 for Team C.
▪ Insight – Machines used by Team C are more likely to break as compared to Team B
Model Insights
▪ Provider 3 is a significant variable, with Provider 1 as the base. It has a positive estimate: relative to machines from Provider 1, the log odds of broken = 1 are higher by 3.2723 for machines from Provider 3.
▪ Provider 4 is a significant variable, with Provider 1 as the base. It has a negative estimate: relative to machines from Provider 1, the log odds of broken = 1 are lower by 1.9197 for machines from Provider 4.
▪ Insight – Machines provided by provider 3 are more likely to break (high positive estimate value) as compared to machines provided by provider 4 (high negative estimate value).
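Because the estimates reported in these slides are on the log-odds scale, they can be easier to compare after exponentiation. The sketch below converts the reported estimates into odds ratios relative to each variable's base level; the dictionary keys are simply the labels used in the slides, and no values other than the reported estimates are assumed.

```python
import math

# Estimates as reported in the Model Insights slides (log-odds scale)
estimates = {
    "Lifetime 80": 5.4286,
    "Lifetime 93": 8.2307,
    "PressureInd_197": -0.6753,
    "PressureInd_1115": -0.3858,
    "PressureInd_2100": -0.129,
    "PressureInd_3101": -0.2199,
    "PressureInd_3173": 0.2485,
    "Team B": 0.2079,
    "Team C": 1.1111,
    "Provider 3": 3.2723,
    "Provider 4": -1.9197,
}

# Odds ratio = exp(estimate): >1 raises the odds of breakage, <1 lowers them
odds_ratios = {name: math.exp(beta) for name, beta in estimates.items()}

for name, ratio in sorted(odds_ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:18s} estimate={estimates[name]:+.4f}  odds ratio={ratio:.3f}")
```

An odds ratio above 1 corresponds to a positive estimate and below 1 to a negative one, which matches the direction of the insights given on the slides.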
Model Insights - Visualization
Top 6 factors affecting machine breakage:
• Lifetime 80 – 61 to 80 months
• Lifetime 93 – 81 to 93 months
• PressureInd_3173 – 114.01 to 173.00
• Team B
• Team C
• Provider 3
Clustering machines
Segmentation and Profiling
Profiling – Cluster 1
Variable N Cluster Mean Pop mean Pop std. dev Z-value
Lifetime 7744 42.87 55.09 26.51 -0.46
Broken 7744 0.01 0.39 0.49 -0.78
Pressure Indicator 1 7744 103.17 98.56 19.98 0.23
Pressure Indicator 2 7744 99.04 99.34 10.04 -0.03
Pressure Indicator 3 7744 95.01 100.59 19.62 -0.28
Team 7744 1.96 1.97 0.8 -0.01
Provider 7744 3.85 2.47 1.11 1.24
▪ 7744 machines, 9% of the total machines
▪ The key variable here is the dependent variable “broken”, whose mean is much lower in this cluster than in the population. These machines are therefore the least likely to break within their product lifecycle (PLC).
▪ However, the lifetime of the machines is lower than average, indicating that these m/c’s need to be replaced sooner than others.
▪ Provider, on the other hand, is much higher than average; since the mean is very close to 4, m/c’s bought from manufacturer 4 are the least likely to break.
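The Z-values in the profiling tables are consistent with the usual standardisation, (cluster mean - population mean) / population standard deviation. A short check against some of the Cluster 1 rows, using only numbers from the table above (the function name is illustrative):

```python
def z_value(cluster_mean, pop_mean, pop_std):
    """Standardised distance of a cluster mean from the population mean."""
    return (cluster_mean - pop_mean) / pop_std

# (cluster mean, population mean, population std. dev) from the Cluster 1 table
cluster_1 = {
    "Lifetime": (42.87, 55.09, 26.51),
    "Broken": (0.01, 0.39, 0.49),
    "Pressure Indicator 1": (103.17, 98.56, 19.98),
    "Provider": (3.85, 2.47, 1.11),
}

z_scores = {name: round(z_value(*vals), 2) for name, vals in cluster_1.items()}
print(z_scores)
```

The rounded results reproduce the tabulated Z-values for these rows (-0.46, -0.78, 0.23 and 1.24).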
Profiling – Cluster 2
Variable N Cluster Mean Pop mean Pop std. dev Z-value
Lifetime 11332 45.02 55.09 26.51 -0.38
Broken 11332 0.16 0.39 0.49 -0.47
Pressure Indicator 1 11332 97.29 98.56 19.98 -0.06
Pressure Indicator 2 11332 99.9 99.34 10.04 0.06
Pressure Indicator 3 11332 124.63 100.59 19.62 1.23
Team 11332 1.94 1.97 0.8 -0.04
Provider 11332 2.23 2.47 1.11 -0.22
▪ 11332 machines, 13% of the total machines
▪ Here too, the key variable is the dependent variable “broken”, whose mean is much lower in this cluster than in the population. These machines are therefore less likely to break within their product lifecycle (PLC).
▪ Here too, the lifetime of the machines is lower than average, indicating that these m/c’s need to be replaced sooner than others.
Profiling – Cluster 3
Variable N Cluster Mean Pop mean Pop std. dev Z-value
Lifetime 16231 38.55 55.09 26.51 -0.62
Broken 16231 0.22 0.39 0.49 -0.35
Pressure Indicator 1 16231 97.69 98.56 19.98 -0.04
Pressure Indicator 2 16231 98.75 99.34 10.04 -0.06
Pressure Indicator 3 16231 94.89 100.59 19.62 -0.29
Team 16231 2.43 1.97 0.8 0.58
Provider 16231 2.15 2.47 1.11 -0.29
▪ 16231 machines, 18% of the total machines
▪ Here again, the key variable is the dependent variable “broken”, whose mean is lower in this cluster than in the population. These machines are therefore less likely to break within their product lifecycle (PLC).
▪ Here too, the lifetime of the machines is lower than average, indicating that these m/c’s need to be replaced sooner than others.
▪ Pressure indicator 3 is lower than average (94.89 against a population mean of 100.59), signifying that m/c’s used here are more likely to break at pressure indicator 3. Team and provider also differ from the average, suggesting that manufacturer 2 (mean < 2.5) should supply m/c’s to Team 2.
Team-wise and provider-wise cluster distribution
▪ Team A and Team C use the majority of the machines in all the clusters, hence signifying the fact that machines used by Team A and Team C are least likely to break (among clusters with the least probability of break-down).
▪ Provider 4 is providing almost all the m/c’s in cluster 1, which is least likely to break
Cluster N Team A Team B Team C
1 7744 49.0 6.0 45.1
2 11332 47.5 10.6 41.9
3 16231 28.5 0 71.5
Cluster N Provider 1 Provider 2 Provider 3 Provider 4
1 7744 3.7 2.1 0 94.2
2 11332 35.0 31.8 8.1 25.1
3 16231 28.1 28.4 43.5 0
Recommendations
▪ Sensors Installation
  - We recommend installing sensors and software that can predict/flag breakage at a safe lifetime and safe pressure points.
▪ Service repair as a measure of Preventive Maintenance
  - Regular maintenance protects your investment against unplanned breakdowns; hence we recommend scheduled repair of machines at the following time intervals:
Recommendations
Provider | Time Interval (between months) | Total No. of Services | Flag-off Month (Sensors)
Provider 1 | 35-80 | 3 | 34
Provider 2 | 60-93 | 3 | 59
Provider 3 | 35-66 | 3 | 34
Provider 4 | 60-89 | 3 | 59
*The starting point for servicing m/c’s from providers 1 & 3 is 35 months, based on the cluster analysis (least likelihood of breakage), as they are most prone to breakage; for m/c’s from providers 2 & 4 it is 60 months, as there is no breakage until that point.
*The ending point for m/c’s from every provider is based on the entire life-span of the machine.
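The service schedule above can be expressed as a simple lookup. This is a sketch only: the status labels and the function name are assumptions made for illustration, and the slides do not say how the three services are spaced within each interval, so the code does not try to place them.

```python
# (service window start, service window end, sensor flag-off month) per provider,
# taken from the recommendations table
SCHEDULE = {
    "Provider 1": (35, 80, 34),
    "Provider 2": (60, 93, 59),
    "Provider 3": (35, 66, 34),
    "Provider 4": (60, 89, 59),
}

def maintenance_status(provider, age_months):
    """Classify a machine's age against its provider's service schedule."""
    start, end, flag_month = SCHEDULE[provider]
    if age_months < flag_month:
        return "no action"
    if age_months < start:
        return "flag"             # flag-off month reached, servicing starts next
    if age_months <= end:
        return "service window"
    return "beyond window"        # past the observed life-span for this provider

for provider, age in [("Provider 1", 34), ("Provider 2", 34), ("Provider 3", 50)]:
    print(provider, age, maintenance_status(provider, age))
```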
▪ Sensors and software at pressure points 1, 2 & 3 to predict breakage
  - If we can apply the sensors at all the 3 pressure points (at different pressure levels), we can easily predict/prevent a machine from breaking due to increase in pressure levels.
Recommendations
Pressure point | Pressure with highest likelihood of breakage | Flag-off pressure points (Sensors)
Pressure point 1 | 63.0 – 112.5 | 55.0
Pressure point 2 | 78.1 – 102.0 | 70.0
Pressure point 3 | 70.1 – 110.0 | 62.0
*The sensors are applied at the above mentioned pressure levels so as to give enough time for the teams to shut off the m/c or prevent excess pressure on the machines.
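A sensor rule based on the thresholds above might look like the following sketch. The three labels and the choice to flag every reading at or above the flag-off level (including readings above the high-risk band) are assumptions for illustration; the slides only give the threshold values.

```python
# Flag-off levels and high-risk bands from the recommendations table
FLAG_OFF = {1: 55.0, 2: 70.0, 3: 62.0}
HIGH_RISK = {1: (63.0, 112.5), 2: (78.1, 102.0), 3: (70.1, 110.0)}

def pressure_alerts(readings):
    """Map each pressure point (1, 2, 3) to an alert label for its reading."""
    alerts = {}
    for point, value in readings.items():
        low, high = HIGH_RISK[point]
        if low <= value <= high:
            alerts[point] = "high-risk range"
        elif value >= FLAG_OFF[point]:
            alerts[point] = "flagged"     # early warning (assumed rule)
        else:
            alerts[point] = "normal"
    return alerts

alerts = pressure_alerts({1: 50.0, 2: 72.0, 3: 90.0})
print(alerts)
```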
▪ Preferred Vendor to Teams
  - Provider 2 should be the preferred vendor for Team C, based on likelihood of breakage.
  - Provider 4 should be the preferred vendor for Team A and Team B.
▪ Based on the cluster analysis (clusters with the least likelihood of breakage), we believe that to successfully prevent a machine breakdown we need to start servicing the machines from their 35th month in operation.
▪ Finally, for the future, it would be advisable not to purchase machines from Provider 1 and Provider 3; for now, we have provided the necessary steps to prevent breakage of machines from these manufacturers.
▪ It would also be interesting to understand why machines were bought from these providers – Cost Cutting? Logistic Feasibility? Goodwill?
STUDENT PROJECT 2
PEXITICS – PROACTIVE
MAINTENANCE STRATEGY
• Lakshmi Kulkarni
• Meghashree R
• Mayur Lalwani
Provider 1 - Insights
Observations from given data :
• Max age of the machine is 80. The first breakdown is observed at 73, irrespective of other factors (pressure points).
• Broken (machine breaking down) is fully dependent on the lifetime of the machine.
• Pressure Indicator 1 (PI1) and Pressure Indicator 3 (PI3) have a negligible positive impact on machine breakdown.
• Pressure Indicator 2 (PI2) has a negligible negative impact on machine breakdown.
[Chart: Relationship of parameters with “Broken”; series: lifetime, pressureInd_1, pressureInd_2, pressureInd_3; y-axis from -0.1 to 0.6]
Insights from Analysis:
1. For mean values of PI1, PI2 and PI3, the Break Down Age (BDA) is 64 with p(BD) at approx. 5%.
2. For max values of PI1 & PI3 and min values of PI2, the p(BD) is 22% with BDA being 64 (Chart 1 below).
3. For min values of PI1 & PI3 and max values of PI2, the p(BD) is 1% with BDA being 64.
4. Based on points 1 through 3 above, we have used a Break Down Age (BDA) of 64 and a Maximum Cut Off Age (MCA) of 67 (mean values of PI1, PI2 and PI3).
Note: Use the RUL Estimator to calculate the values shown above in the screen print.
Input table for different variables and RUL estimator table (RUL based on max values of PI1 & PI3, and min values of PI2)
Provider | lifetime | pressureInd_1 | pressureInd_2 | pressureInd_3 | Probability of breakdown p(BD) | In percentage | Break Down Age (BDA) | RUL
Provider 1 | 64 | 151 | 74 | 172 | 0.218811901 | 21.88119006 | 64 | 0
[Chart 1 - % of BD v/s Life Time; x-axis: Life Time (61 to 64), y-axis: % of Break Down (0 to 25)]
Input table for different variables and RUL estimator table (RUL based on mean values of PI1, PI2 and PI3)
Provider | lifetime | pressureInd_1 | pressureInd_2 | pressureInd_3 | Probability of breakdown p(BD) | In percentage | Break Down Age (BDA) | RUL
Provider 1 | 64 | 99 | 99 | 100 | 0.057757457 | 5.775745659 | 64 | 0
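In every estimator row shown in this deck, the RUL column equals the Break Down Age minus the current lifetime, and the percentage column is p(BD) multiplied by 100. The sketch below reproduces only that arithmetic; the model that produces p(BD) itself is not described in the slides and is not reconstructed here. The function name is illustrative.

```python
def rul_row(lifetime, p_bd, breakdown_age):
    """Derive the percentage and RUL columns from the estimator inputs."""
    return {
        "p_bd_percent": p_bd * 100,        # probability expressed in percent
        "rul": breakdown_age - lifetime,   # remaining useful life in months
    }

# Rows taken from the estimator tables: (lifetime, p(BD), BDA)
provider_1 = rul_row(64, 0.218811901, 64)   # Provider 1, max PI1/PI3, min PI2
provider_2 = rul_row(66, 0.054603744, 77)   # Provider 2, reported RUL 11
provider_4 = rul_row(67, 0.300035998, 66)   # Provider 4, reported RUL -1

print(provider_1, provider_2["rul"], provider_4["rul"])
```

A negative RUL, as in the Provider 4 row, means the machine has already passed its Break Down Age.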
Provider 2 - Insights
Observations from given data :
• Max age of the machine is 93. The first breakdown is observed at 85, irrespective of other factors (pressure points).
• Broken (machine breaking down) is fully dependent on the lifetime of the machine.
• Pressure Indicator 1 (PI1) and Pressure Indicator 3 (PI3) have a negligible positive impact on machine breakdown.
• Pressure Indicator 2 (PI2) has a negligible negative impact on machine breakdown.
[Chart: Relationship of parameters with "Broken"; series: lifetime, pressureInd_1, pressureInd_2, pressureInd_3]
Insights from Analysis:
1. For mean values of PI1, PI2 and PI3, the Break Down Age (BDA) is 77 with p(BD) at approx. 5%.
2. For max values of PI1 & PI3 and min values of PI2, the p(BD) returned by the estimator is 3129.46% with BDA being 77 (Chart 1 below). A value above 100% cannot be a literal probability, so it is best read as an unbounded score from the estimator.
3. For the same max values of PI1 & PI3 and min values of PI2, the p(BD) is approx. 5% with BDA being 66.
4. For min values of PI1 & PI3 and max values of PI2, the p(BD) is 0.020% with BDA being 77.
5. Based on points 1 through 3 above, we have used a Break Down Age (BDA) of 66 and a Maximum Cut Off Age (MCA) of 69 (max values of PI1 & PI3, and min values of PI2).
Note: Use the RUL Estimator to calculate the values.
Input table for different variables and RUL estimator table (RUL based on max values of PI1 & PI3, and min values of PI2)
Provider | lifetime | pressureInd_1 | pressureInd_2 | pressureInd_3 | Probability of breakdown p(BD) | In percentage | Break Down Age (BDA) | RUL
Provider 2 | 66 | 173 | 70 | 155 | 0.054603744 | 5.460374442 | 77 | 11
Input table for different variables and RUL estimator table (RUL based on max values of PI1 & PI3, and min values of PI2)
Provider | lifetime | pressureInd_1 | pressureInd_2 | pressureInd_3 | Probability of breakdown p(BD) | In percentage | Break Down Age (BDA) | RUL
Provider 2 | 77 | 173 | 71 | 155 | 31.29465076 | 3129.465076 | 77 | 0
[Chart 1 - % of BD v/s Life Time; x-axis: Life Time (66 to 77), y-axis: % of Break Down (0 to 3500)]
Provider 3 - Insights
Observations from given data :
• Max age of the machine is 66. The first breakdown is observed at 60, irrespective of other factors (pressure points).
• Broken (machine breaking down) is fully dependent on the lifetime of the machine.
• Pressure Indicator 1 (PI1), Pressure Indicator 2 (PI2) and Pressure Indicator 3 (PI3) have a negligible positive impact on machine breakdown.
[Chart: Relationship of parameters with "Broken"; series: lifetime, pressureInd_1, pressureInd_2, pressureInd_3; y-axis from 0 to 0.8]
Insights from Analysis:
1. For mean values of PI1, PI2 and PI3, the Break Down Age (BDA) is 54 with p(BD) at approx. 5%.
2. For max values of PI1, PI2 & PI3, the p(BD) is 52.35% with BDA being 54 (Chart 1 below).
3. For min values of PI1, PI2 and PI3, the p(BD) is 0.7749% with BDA being 54.
4. Based on points 1 through 3 above, we have used a Break Down Age (BDA) of 50 (5%) and a Maximum Cut Off Age (MCA) of 53 (max values of PI1, PI2 and PI3).
Note: Use the RUL Estimator to calculate the values.
[Chart 1 - % of BD v/s Life Time; x-axis: Life Time (51 to 55), y-axis: % of Break Down (0 to 120)]
RUL based on max values of PI1, PI2 and PI3
Input table for different variables and RUL estimator table (RUL based on max values of PI1, PI2 & PI3)
Provider | lifetime | pressureInd_1 | pressureInd_2 | pressureInd_3 | Probability of breakdown p(BD) | In percentage | Break Down Age (BDA) | RUL
Provider 3 | 54 | 152 | 123 | 148 | 0.516152601 | 51.61526007 | 54 | 0
Input table for different variables and RUL estimator table
Provider | lifetime | pressureInd_1 | pressureInd_2 | pressureInd_3 | Probability of breakdown p(BD) | In percentage | Break Down Age (BDA) | RUL
Provider 3 | 51 | 152 | 123 | 148 | 0.065209324 | 6.520932365 | 54 | 3
Provider 4 - Insights
Observations from given data :
• Max age of the machine is 89. The first breakdown is observed at 81, irrespective of other factors (pressure points).
• Broken (machine breaking down) is fully dependent on the lifetime of the machine.
• Pressure Indicator 1 (PI1) and Pressure Indicator 2 (PI2) have a negligible negative impact on machine breakdown.
• Pressure Indicator 3 (PI3) has a negligible positive impact on machine breakdown.
[Chart: Relationship of parameters with "Broken"; series: lifetime, pressureInd_1, pressureInd_2, pressureInd_3; y-axis from -0.1 to 0.5]
Insights from Analysis:
1. For mean values of PI1, PI2 and PI3, the Break Down Age (BDA) is 71 with p(BD) at approx. 5%.
2. For min values of PI1 and PI2 and max values of PI3, the p(BD) is 5.9% with BDA being 63 (Chart 1 below).
3. Based on points 1 and 2 above, we have used a Break Down Age (BDA) of 63 and a Maximum Cut Off Age (MCA) of 67 (min values of PI1, PI2 and max PI3).
Note: Use the RUL Estimator to calculate the values.
RUL based on min values of PI1, PI2 & max PI3
[Chart 1 - % of BD v/s Life Time; x-axis: Life Time (63 to 67), y-axis: % of Break Down (0 to 35)]
Input table for different variables and RUL estimator table
Provider | lifetime | pressureInd_1 | pressureInd_2 | pressureInd_3 | Probability of breakdown p(BD) | In percentage | Break Down Age (BDA) | RUL
Provider 4 | 67 | 36 | 77 | 149 | 0.300035998 | 30.00359976 | 66 | -1
Input table for different variables and RUL estimator table
Provider | lifetime | pressureInd_1 | pressureInd_2 | pressureInd_3 | Probability of breakdown p(BD) | In percentage | Break Down Age (BDA) | RUL
Provider 4 | 63 | 36 | 77 | 149 | 0.059187032 | 5.918703175 | 66 | 3
Preventive Maintenance Strategy
Providers | Break Down Age | Max Cut-off Age
Provider 1 | 64 | 67
Provider 2 | 66 | 69
Provider 3 | 50 | 53
Provider 4 | 63 | 67
• Break Down Age: Lifetime at/after which the respective provider should be notified for maintenance.
• Max Cut-off Age: Lifetime by which maintenance should be completed; otherwise the machine may break down at any time.
• After the first cycle of maintenance is over, the same time period should be considered for the next cycle.
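The strategy table can be turned into a small decision rule. In this sketch the input is the number of months since the last maintenance cycle (or since installation for the first cycle), which is one reading of "the same time period should be considered"; that interpretation, the function name and the action labels are assumptions.

```python
# (Break Down Age, Max Cut-off Age) per provider from the strategy table
STRATEGY = {
    "Provider 1": (64, 67),
    "Provider 2": (66, 69),
    "Provider 3": (50, 53),
    "Provider 4": (63, 67),
}

def maintenance_action(provider, months_in_cycle):
    """Return the action for a machine given months elapsed in the current cycle."""
    breakdown_age, max_cutoff_age = STRATEGY[provider]
    if months_in_cycle < breakdown_age:
        return "run"
    if months_in_cycle <= max_cutoff_age:
        return "notify provider"   # maintenance should be completed by the cut-off
    return "overdue"               # machine may break down at any time

for months in (49, 50, 53, 54):
    print("Provider 3", months, maintenance_action("Provider 3", months))
```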
STUDENT PROJECT 3
Pexitics Preventive Maintenance Project
Submitted by Vicky Crasto & Arijit Mitra
Problem Statement – Data related to machine breakdowns is provided which must be used to predict future
occurrences and create a framework for preventive maintenance
Note – The entire R Code is available in the server - C:Jig1324221-Pexitics Case study
Overall Approach
• Understand the distribution of lifetime and breakdown across team and provider.
• Understand the distribution of lifetime and breakdown across team and provider on the basis of the 3 pressure indices.
• Examine the interaction between the 3 pressure indices and understand the distribution of breakdown.
• Examine the distribution of breakdown across teams and providers.
• Plot the Kaplan-Meier survival function for the machine breakdown.
• Plot the Kaplan-Meier survival function by team and provider.
• Test statistically whether the survival function differs across team and provider.
• Determine the parametric survival function and the appropriate distribution.
• Use the regression model to predict the lifetime of the machine and identify the machines to be replaced urgently.
• Fit a Cox proportional hazards model to determine the hazard rate and understand the relation between the covariates.
Distribution of lifetime basis machine status
We clearly see that the lifetime of machines that have broken is more than 60 months, with the median value around 79 months.
On the other hand, most of the machines that have not broken have a lifetime between 20 and 60 months.
Overview of the data
Total no. of observation - 90000
39.47% of machine are broken
Machine status Team A Team B Team C
Not broken 22% 21% 18%
broken 12% 15% 12%
Distribution of the machines basis status
Machine status Provider 1 Provider 2 Provider 3 Provider 4
Not broken 14% 18% 13% 16%
broken 11% 9% 11% 8%
Understand the distribution of life time and breakdown across teams and provider
The lifetime of the machines manufactured by Provider 3 is lower than that of the other providers. This needs to be further tested.
The lifetime of the machines belonging to Team C is lower than that of the other teams. This needs to be further tested.
Understand the distribution of life time and breakdown across teams and provider basis PressureInd1
Machines belonging to Team C have a lower lifetime with respect to pressureind_1.
As highlighted, machines from Provider 3 seem to have a lower lifetime, with pressureind_1 spread across the range.
Understand the distribution of life time and breakdown across teams and provider basis PressureInd2
Machines belonging to Team C have a lower lifetime with respect to pressureind_2.
As highlighted, machines from Provider 3 seem to have a lower lifetime, with pressureind_2 spread across the range.
Understand the distribution of life time and breakdown across teams and provider basis PressureInd3
Machines belonging to Team C have a lower lifetime with respect to pressureind_3, and they seem to be breaking down at 3 distinct levels.
As highlighted, machines from Provider 3 seem to have a lower lifetime, with pressureind_3 spread across the range; the breakdowns have also been occurring at two distinct levels.
Understand the distribution of breakdown basis the interaction between pressure indices
As we see in the plots, machines tend to break at a lower pressure for pressureind_3 than for the other 2 pressure indices.
Distribution of breakdown across teams and providers
As highlighted the proportion of
broken machines is higher in
• Provider 1 and Team A
• Provider 1 and Team B
• Provider 3 and Team B
This needs to be investigated
further.
Plotting the Kaplan-Meier survival function for machine breakdown
Things to note
• The survival probability decreases as the lifetime increases.
• At each level of lifetime, the number of machines at risk falls, because censored machines are
removed from the risk set along with the broken ones.
Censored machines are observations where
• the machine does not experience the event (breakdown) during the study,
• the machine is lost during the follow-up period, or
• the machine is withdrawn from the study.
Table showing the survival probability at each level of lifetime
Kaplan-Meier survival plot
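The estimator behind the table and plot can be sketched in a few lines. This is a minimal illustration, not the deck's own code: the function name `kaplan_meier` and the toy lifetimes are assumptions made for the example.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return [(time, n_at_risk, n_events, survival_probability), ...].

    durations: observed lifetime of each machine (e.g. months)
    events:    1 if the machine broke at that lifetime, 0 if censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e == 1)
    exits = Counter(durations)  # every machine leaves the risk set at its own time
    at_risk = len(durations)
    survival = 1.0
    table = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the share surviving this time point.
            survival *= 1.0 - d / at_risk
            table.append((t, at_risk, d, survival))
        at_risk -= exits[t]  # remove both broken and censored machines
    return table

# Toy data (hypothetical): lifetimes in months, 1 = broken, 0 = censored.
lifetimes = [60, 65, 65, 70, 80, 80, 90]
broken = [1, 0, 1, 1, 0, 1, 0]
for row in kaplan_meier(lifetimes, broken):
    print(row)
```

Censored machines never reduce the survival probability themselves; they only shrink the risk set for later time points, which is the behaviour described in the notes above.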
Plotting the Kaplan-Meier survival function for machine breakdown by team
Things to note
• The survival curve for Team C is different from those of Team A and Team B.
Table showing the survival probability at each level of lifetime
Kaplan-Meier survival plot across teams
Plotting the Kaplan-Meier survival function for machine breakdown by provider
Things to note
• The survival curve for each provider is different. This must be tested statistically.
Table showing the survival probability at each level of lifetime
Kaplan-Meier survival plot across providers
Testing statistically whether the survival function differs across teams and providers
Things to note
• The p-value is almost zero in both tests, so we reject the null hypothesis and conclude that the survival function differs across teams
and across providers.
• The p-value is not exactly zero. Because the dataset is very large, even a very small difference between groups is found to be statistically significant.
Using the log-rank test to check whether the survival function differs across groups
H0: Survival function is the same across the groups
H1: Survival function is different across the groups
Log-rank test output for team | Log-rank test output for provider
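For two groups, the log-rank statistic can be sketched as follows. This is an illustrative sketch only: the deck's outputs compare three teams and four providers, which needs the multi-group version of the test, and the function name `logrank_test` is an assumption.

```python
import math

def logrank_test(durations_a, events_a, durations_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value) on 1 degree of freedom."""
    event_times = sorted(
        {t for t, e in zip(durations_a, events_a) if e}
        | {t for t, e in zip(durations_b, events_b) if e}
    )
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for d in durations_a if d >= t)  # group A still at risk at t
        n_b = sum(1 for d in durations_b if d >= t)  # group B still at risk at t
        d_a = sum(1 for d, e in zip(durations_a, events_a) if d == t and e)
        d_b = sum(1 for d, e in zip(durations_b, events_b) if d == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n  # expected breakdowns in A if H0 holds
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0  # no information to separate the groups
    chi_square = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi_square / 2.0))  # chi-square(1) upper tail
    return chi_square, p_value
```

With 90,000 rows the variance term grows with the sample, so even a small gap between observed and expected breakdowns yields a large chi-square, which is the effect noted in the second bullet above.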
Determine the parametric survival function and the appropriate distribution
Parametric models assume that the survival or density function is known up to k unknown parameters.
However, we need to determine the distribution of the underlying survival function.
For this, we fit regression models with different distributions and compare their log-likelihood values to see which
fits the data best.
Below are the log-likelihood values for the various distributions
The lognormal distribution has the best log-likelihood value (a higher log-likelihood, or equivalently a lower -2 log-likelihood, indicates a better fit) and hence fits the data the best.
Using this regression model, we predict the lifetime of the machines that have not yet broken down and
determine their remaining lifetime.
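The comparison of distributions by log-likelihood can be illustrated with a simplified sketch. It makes strong assumptions that the deck's models do not: no covariates, no censoring, and only two candidate distributions with closed-form maximum-likelihood estimates. The lifetimes are made-up values.

```python
import math

def exponential_loglik(times):
    """Maximised log-likelihood of an exponential distribution (no censoring)."""
    n = len(times)
    rate = n / sum(times)  # closed-form MLE of the rate
    return n * math.log(rate) - rate * sum(times)

def lognormal_loglik(times):
    """Maximised log-likelihood of a lognormal distribution (no censoring)."""
    n = len(times)
    logs = [math.log(t) for t in times]
    mu = sum(logs) / n                             # MLE of the log-mean
    sigma2 = sum((x - mu) ** 2 for x in logs) / n  # MLE of the log-variance
    return -sum(logs) - 0.5 * n * math.log(2 * math.pi * sigma2) - 0.5 * n

# Hypothetical lifetimes in months, all treated as observed breakdowns.
lifetimes = [60, 62, 65, 70, 74, 79, 80, 85, 88, 93]
fits = {
    "exponential": exponential_loglik(lifetimes),
    "lognormal": lognormal_loglik(lifetimes),
}
best = max(fits, key=fits.get)  # the highest log-likelihood is the best fit
print(fits, best)
```

When the candidates have different numbers of parameters, an information criterion such as AIC is a common alternative to the raw log-likelihood.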
Identifying the machines to be replaced urgently
Using the remaining lifetime, we divide the machines into three groups
Remaining lifetime Label
Less than 15 months Need urgent attention
15 to 50 months Maintenance needed in Short term
Greater than 50 months Maintenance needed in Long term
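The labelling rule in the table can be written as a small function. This is a sketch under stated assumptions: remaining lifetime is taken as predicted lifetime minus current lifetime (floored at zero), and the boundary values 15 and 50 are placed in the middle group. The function names are hypothetical.

```python
def remaining_lifetime(predicted_months, current_months):
    """Assumption: remaining life = predicted lifetime minus current age, never negative."""
    return max(predicted_months - current_months, 0.0)

def maintenance_label(remaining_months):
    """Map remaining lifetime (months) to the three groups in the table above."""
    if remaining_months < 15:
        return "Need urgent attention"
    if remaining_months <= 50:  # assumption: 15 and 50 both fall in this group
        return "Maintenance needed in short term"
    return "Maintenance needed in long term"

# Example: a machine predicted to last 82 months that is already 70 months old.
print(maintenance_label(remaining_lifetime(82, 70)))
```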
As highlighted, machines
manufactured by Provider 3
and belonging to Team A
and Team B have a higher
proportion of machines in
need of immediate
maintenance.
Fit a Cox proportional hazards model to determine the hazard rate and understand the
relation between the covariates
The Cox proportional hazards model is a semi-parametric model: it does not assume any underlying
distribution for the baseline hazard function, but it does assume that the covariates act multiplicatively on the hazard.
The output of the model is shown below
We see that all 3 pressure indices are significant. In addition, the interaction between Team B and
Provider 2 and the interaction between Team B and Provider 3 are significant.
The exp(coefficient) values are very small, which makes them difficult to interpret as a meaningful equation.
On the whole, the model explains 71% of the variance in the data, which is good.
Significance of the individual covariates and their interaction | Performance of the model
Recommendations
• The management must look into the machines belonging to Team B and the machines manufactured by
Provider 3.
• Preventive maintenance must be carried out as per the labels provided from the parametric regression
model.
• Moreover, best practices of machine maintenance carried out by Team A and Team C must be documented
and shared with all the teams.
• Machine manufacture audit can be carried out to understand the quality of the spares used in the
machine, so that frequent breakdown of machine can be avoided.
Areas of improvement for the model
• Determine the performance of the parametric model by dividing the data into modelling and validation
datasets. Plot the lift chart to determine how well the model is working.
• Fine-tune the Cox proportional hazards model and determine the hazard ratio for each covariate.
STUDENT PROJECT 4
Pexitics Preventive Maintenance
Data Analysis Project
- Deepeshkumar Malviya
OBJECTIVE & SCOPE
Pexitics would like to build a model to predict machine breakdown and use preventive maintenance to reduce downtime. The insights
and results obtained from the predictive model should be indicative enough to create a framework for identifying
breakdowns and suggestive enough to enable stakeholders to take corrective action.
Understanding of the Data:
The dataset provides 90,000 instances of maintenance work done on various machines across 1 year. It appears to comprise periodic
sensor readings picked up from a random sample of observations during the period, where each record gives the
readings and status of the respective machine at that instance in the form of:
• Readings of pressure indicators at a particular instance
• The only machine-specific attribute, ‘Lifetime’
• Representative information about usage (the team handling the machine in the factory)
• Machine entity (manufacturer) information
• Random observation number
• Breakdown status
It is imperative to preprocess the given data to check data sanity, create derived variables and possibly restructure the data based
on assumptions for model building. At the outset, the data appear to be sensor data and are therefore well structured.
APPROACH
The major part of the analysis revolves around pressures, the stakeholders involved and lifetimes, in decreasing order of importance. Since the
machine functioning parameters are not given, they are assumed to be standard, so a lot depends on the interplay of external
environmental factors. There has to be a controllable element for changing the pressures, but finding the right combination for optimal
machine performance, given wear and tear and the handling of the machines, seems to be complex. Hence, the failure
behavior does not seem to be linear and cannot be predicted in a straightforward fashion.
Important Considerations:
Three types of interactions are important for understanding the breakdown phenomenon and for later model building, arising out of
segmented behaviors on these interactions:
A. Interactions among Stakeholders:
The interaction of the variables ‘Team’ and ‘Provider’, the rationale being two-fold:
- The probability of failure depends highly on the usage patterns, even when the machines are standard and reliable.
- Serious defects at the time of manufacturing will hamper the performance of the machines even when the usage patterns are
standard.
In practical business scenarios, the accountability for machine maintenance, and hence the financial obligation, lies with the
team/factory/department. Also, the SLAs and contracts for machines from a provider/company get reviewed after a holistic review across the
entire organizational setup, in which multiple teams/factories/departments provide their performance reports. Hence, the
variables ‘team’ and ‘provider’ cannot be viewed separately. The collective assessment will give better insight into the net effect.
B. Interactions among Pressures:
The pressures ought to have significant interactions among themselves, governed by domain-specific as well as process-specific laws
of applied physics and mathematics. These pressures can also work as environmental stressing conditions.
C. Interactions arising out of Lifetimes:
Although the lifetimes are in months, the records do not belong to equally spaced time periods, meaning that the random observations
do not carry time-series implications. The importance of the lifetime data is further reduced by the fact that the spread of the
random observations is not equal across lifetimes. Hence, no analysis is done with lifetime as the main driving factor; rather, it is used
as supplementary (though very useful) information to derive failure behavior based on age.
Broad Assumptions about the data:
• The variable ‘S.no’ only signifies chronological readings.
• Each machine breakdown reading has no dependency on the breakdown behavior of any subsequent reading.
• At any given instance of data recording, no conditional or joint probability exists between the pressures acting on one machine of ‘x’
lifetime and the pressures acting on another machine of ‘y’ lifetime from the same manufacturer (e.g. Provider1) and belonging to the same
factory (e.g. TeamA).
• A value of 1 in the variable ‘broken’ signifies complete breakdown and not a partial working condition.
Techniques/ Methods of Analysis:
• Cluster Analysis: Unsupervised Learning method for segmentation based on distance measure (proximity).
• Markov Chain Model: A stochastic (random) model for deriving sequence of events and then probability of events depending on
previously attained events.
• Exploratory Analysis: Involving Data Manipulation and Data visualizations to draw insights in the modeling process.
Cluster Analysis:
Rationale:
It aims to identify segments that exhibit similar behavior towards failure/machine breakdown conditions.
Premises, Importance & Thought Process:
The variable ‘broken’ records failures in a purely objective sense as 0 and 1. To deduce qualitative information about the data, it is
imperative to obtain patterns in the data beyond binary outcomes. Apart from the failure status, the given data is used to derive important metrics
(explained in detail in the next section) as a proxy for the interplay of factors influencing failure behavior.
Cluster analysis takes these metrics and analyzes unexpected fluctuations from normal conditions. It is aimed at finding distinct
segments based on working conditions without knowledge of a baseline threshold working model, purely based on variance in the data.
The cluster analysis considers only fluctuations of the pressures for the set of conditions, taking into account the net effect of ‘team’,
‘provider’ and ‘lifetime’. Since there is no benchmark available, fluctuations ‘above’ or ‘below’ standard working conditions
are obtained by numerically assessing the distribution of the given phenomenon. One of the key parts of the analysis is
‘standardization’ of the data about its mean, measured in terms of standard deviations.
The data for clustering comprises unique records for ‘lifetime’ and the three pressure values. This is a relatively small dataset (1000 records) but
includes all the possible values that the pressures can take for all given lifetimes.
The analysis required multi-phased sequential re-clustering to capture finer fluctuations. The rationale is that these finer fluctuations can
take different sizes on the complete data and hence none of them could be neglected.
Markov Chain Model:
Rationale:
It aims to evaluate and establish probabilistic nature of failure conditions.
Premises, Importance & Thought Process:
In the absence of a time element in the data, it is not possible to evaluate or establish any time-based metric. Hence, Age-to-Failure
(Mean Time Between Failures) analysis, Life Data analysis (parametric Weibull distribution) and the associated time-dependent
Reliability Analysis of machinery breakdowns cannot be performed. To facilitate analysis on ‘state-space’ as opposed to
‘time-parameter’, a Markov chain model is used. It aids prediction of future states solely based on the inter-relationships of sequential
occurrences of states in the past.
Conventionally, the inputs to a Markov chain model are the distinct states. The rationale behind the cluster analysis is to identify these states.
Since the data is chronologically arranged for each combination of ‘team’ & ‘provider’ along with ‘lifetime’, the cluster membership will
reflect the sequential states along which machine breakdown progresses through normal, possibly sub-optimal conditions and then
failures. The results from the Markov chain are ‘transition probabilities’, i.e. the likelihood of going from one state to another.
Since the cluster memberships are taken directly as states, there is a scenario in which two sequential states should not
be considered together. This happens when two non-comparable states come together, i.e. when two
consecutive rows belong to two different sets, e.g. ‘TeamA_Provider1’ and ‘Team_Provider2’. These cases are very few in the total data, as it is
sorted to avoid them. The resulting probabilities for such cases are very small and do not make an overall impact.
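The step from cluster memberships to transition probabilities can be sketched as below. This is a minimal illustration with hypothetical names, not the deck's R code. One deliberate difference: the sketch skips transitions between rows from different team_provider sets, whereas the deck leaves those few cases in because their probabilities are negligible.

```python
from collections import defaultdict

def transition_probabilities(groups, states):
    """Estimate P(next state | current state) from row-ordered data.

    groups: team_provider key of each row (rows already sorted chronologically)
    states: cluster membership of each row, used directly as the Markov state
    """
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(states) - 1):
        if groups[i] != groups[i + 1]:
            continue  # consecutive rows from different sets are not comparable
        counts[states[i]][states[i + 1]] += 1
    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: c / total for dst, c in row.items()}
    return probs

# Toy sequence (hypothetical): two team_provider sets, cluster ids as states.
groups = ["TeamA_Provider1"] * 4 + ["TeamA_Provider2"] * 3
states = [1, 1, 2, 1, 2, 2, 1]
print(transition_probabilities(groups, states))
```

Each row of the resulting matrix sums to 1, so the entries can be read directly as the likelihood of moving from one cluster state to the next.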
Exploratory Analysis:
Rationale:
It is used for unearthing insights about the failure behavior at various stages. Exploratory analysis guides the course of the analysis and
helps at critical junctures when evaluating parameters for statistical model customization. Visualizations are an important part of the
exploratory analysis and are used at specific points.
Functionally, it has the following importance in the analysis:
A. Insights-driven (both pre and post modeling) judgment specific:
Since the whole approach is oriented around derived metrics, exploratory analysis is used, mainly before modeling, to verify the suitability of such metrics
from a business point of view. Post modeling, it provides interpretability and helps to infer important
business-critical information.
B. Modeling Diagnostics (Model improvement) & Results (Analytical importance) specific:
The clustering diagnostics are guided by rules of thumb describing best practices for model-specific parameters.
Clustering results are shown for the performance of key metrics only with respect to failures. The results obtained from the Markov chain
model are included to provide better business context, i.e. the probabilities obtained are linked to cluster profiles exhibiting transient
(changing) failures.
STEPS IN ANALYSIS
A. Data Preparation:
The outliers in the dataset are identified by evaluating the derived metric, but no special treatment is applied, for two
reasons: a. Clustering is sensitive to outliers, so they will get separated out anyway. b. The Markov chain gives probabilities of
co-occurrence; outliers will have very low probabilities and hence will be ignored. Data preparation is carried out along the following lines.
Derived Variables creation:
Derived variables are created to help summarise the data and to define the units of aggregation. These are ultimately key to
drawing insights and building a model around them. In the course of data manipulation throughout the analysis many variables are
created, but only the important ones are listed below:
1. ‘team_provider’ & ‘life_pres_all’: Concatenated variables indicating interactions among the given variables.
2. ‘pr1_pr2_corres_pr3’, ‘pr2_pr3_corres_pr1’, ‘pr3_pr2_corres_pr1’: Calculated metrics capturing the interactions of the pressure values
with each other. Each is the product of two pressure values divided by the third, so that changes in any of them are captured.
3. ‘normz_int1’, ‘normz_int2’, ‘normz_int3’: Standardized scores, i.e. differences from the mean (for the data grouped on lifetime and
team_provider) in terms of standard deviations, for the three variables created in pt.2 above.
4. ‘std_int1’, ‘std_int2’, ‘std_int3’: Based on the standardized-scores philosophy, population statistics are compared with cluster statistics.
Interaction 1 (int1) stands for the first metric in pt.2 above, and likewise for the other two interactions.
5. ‘Int1_Inc_ge_0.45_1_stdev’, ‘Int1_Inc_ge_1_1.5_stdev’, ‘Int1_Inc_gt_1.5_stdev’: For ‘std_int1’ in pt.4 above, the magnitude of
increase in standard deviations at three distinct levels. Likewise for three levels of decrease, and then for the other two interactions. These are
used for cluster profiling and for naming states in Markov chain terminology.
6. Links & Nodes: Used mainly in the network diagram, and used extensively in data manipulation to get the desired matrices.
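Points 2 and 3 above can be sketched as follows. The sketch rests on assumptions: the third metric is taken to be pressure 1 times pressure 3 divided by pressure 2 (the source variable name is ambiguous), the population standard deviation is used, and the function and dictionary key names are illustrative.

```python
from collections import defaultdict
from statistics import mean, pstdev

def interaction_metrics(p1, p2, p3):
    """Product of two pressures divided by the third, for each of the three pairings.

    Assumption: the third pairing is p1 * p3 / p2.
    """
    return (p1 * p2 / p3, p2 * p3 / p1, p1 * p3 / p2)

def group_zscores(rows, value_key):
    """Standardise value_key within each (lifetime, team_provider) group."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["lifetime"], row["team_provider"])].append(row[value_key])
    scores = []
    for row in rows:
        values = groups[(row["lifetime"], row["team_provider"])]
        mu, sd = mean(values), pstdev(values)  # assumption: population std dev
        scores.append((row[value_key] - mu) / sd if sd > 0 else 0.0)
    return scores

# Toy rows (hypothetical values).
rows = [
    {"lifetime": 60, "team_provider": "TeamA_Provider1", "int1": 2.0},
    {"lifetime": 60, "team_provider": "TeamA_Provider1", "int1": 4.0},
    {"lifetime": 61, "team_provider": "TeamA_Provider1", "int1": 9.0},
]
print(interaction_metrics(2.0, 3.0, 6.0))
print(group_zscores(rows, "int1"))
```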
B. Broad Steps in Analysis:
1. Initial Exploratory Analysis to understand the data and know the distributions.
2. Creation of derived variables and detection (not removal) of potential outliers based on the distribution of the calculated metrics.
3. Cluster Analysis:
a. Creation of data for clustering and performing data checks/exploration.
b. Hierarchical clustering to find the optimal number of clusters, by first creating a dissimilarity matrix and then visually confirming
through a dendrogram, applying different methods of linkage between clusters.
c. Scaling the complete data for clustering and performing detailed clustering diagnostics on the scaled data to arrive at the optimal clusters.
d. Performing hierarchical clustering again for the optimal number of clusters to get the cluster centers for K-means clustering.
e. Performing K-means clustering on the scaled data, using the cluster centers obtained for the optimal number of clusters.
f. Re-clustering the data following the above steps and appending the cluster information.
g. Profiling the clusters by comparing population characteristics with cluster characteristics. Visualizing the data graphically.
4. Markov Chain Modeling:
a. Creation of Data for Markov Chain modeling and performing data checks/ exploration.
b. Create Sequence matrix based on cluster memberships and then create Transition Probabilities matrix.
c. Data manipulation to ascertain probabilities only for transient states i.e. changes between distinct states involving failures.
5. Visualization
a. Extensive Data Manipulation involving new metrics creation to arrive at right data for Visualizations.
b. Visualization 1 : To illustrate the magnitude of deviations in metrics for the clusters having transient failure conditions.
c. Visualization 2 : To illustrate the association between transient states depending upon transition probabilities.
C. Key Statistical Methodologies/Diagnostic Evaluation metrics used in Analysis:
• Scaling: Scaling is used for normalization of the data. It adopts the same standardization technique as used earlier for calculating the
pressure metrics. The rationale is to make sure that the complete data becomes comparable when calculating distances with
any distance (proximity) measure and cluster linkage method.
• Clustering - Average & Ward.D2: These denote the method of linkage between clusters. Average linkage uses the average
of the distances between all pairs of objects across two clusters. Ward.D2 is an improvement over the Ward method. The Ward method
minimizes the total within-cluster variance, i.e. at each iteration it finds the pair of clusters whose merging leads to the minimum
increase in total within-cluster variance. Ward.D2 implements the criterion in which dissimilarities are squared
before cluster updating.
• Pseudo F-statistic: The pseudo F-statistic is intended to capture the 'tightness' of clusters and is the ratio of between-cluster
variance to within-cluster variance. The optimal number of clusters is the one with the maximum value among all the cluster counts considered.
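The pseudo F-statistic can be sketched as below. This assumes the usual Calinski-Harabasz form, in which the between- and within-cluster sums of squares are each divided by their degrees of freedom; the deck's custom function is not shown, so this is an illustration rather than a reproduction of it.

```python
def pseudo_f(points, labels):
    """Pseudo F = (between-cluster SS / (k - 1)) / (within-cluster SS / (n - k))."""
    n, dims = len(points), len(points[0])
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)
    overall = [sum(p[j] for p in points) / n for j in range(dims)]
    between = within = 0.0
    for members in clusters.values():
        centre = [sum(p[j] for p in members) / len(members) for j in range(dims)]
        # Between: squared distance of the cluster centre from the overall mean.
        between += len(members) * sum((c - o) ** 2 for c, o in zip(centre, overall))
        # Within: squared distance of each member from its own cluster centre.
        within += sum((p[j] - centre[j]) ** 2 for p in members for j in range(dims))
    # Note: needs 1 < k < n and a non-zero within-cluster sum of squares.
    return (between / (k - 1)) / (within / (n - k))

# Toy data (hypothetical): two tight, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(pseudo_f(points, [0, 0, 1, 1]))  # tight, well-separated clusters
print(pseudo_f(points, [0, 1, 0, 1]))  # poor grouping of the same points
```

Evaluating this for each candidate number of clusters and taking the maximum is the selection rule described in the bullet above.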
D. Important Thresholds considered in the Analysis:
• Minimum proportion of failures of 40% for states that exhibit a machine breakdown tendency, implying that machines must have failed in at least 40% of
the instances for the given cluster across the complete period under consideration.
• Minimum proportion of 10% for states that exhibit transient machine breakdown behavior, implying
that in at least 10% of all failing conditions of machines for the cluster, the subsequent condition differs from the previous failing condition.
• 0.45 standard deviations as the lower limit for a fluctuation of the cluster mean from the
population mean to qualify, for each metric. The lower limit is 0.45 and not 0.50 (which would give equal differences among the three levels) since
some values hover around 0.50 and would not be considered if the lower limit were not relaxed a bit.
• 90% confidence for calculating the transition matrix probabilities. Since the data do not have an equal spread, it is relaxed to a 10% risk.
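The 0.45 lower limit and the three levels of deviation can be turned into a labelling rule, sketched below. The exact boundary handling (which side 1.0 and 1.5 fall on) is inferred from the variable names, and the label format is hypothetical; only the level and direction abbreviations come from the deck.

```python
def deviation_band(std_score, lower_limit=0.45):
    """Label a standardized deviation of a cluster mean from the population mean.

    Levels: Slight (Sli), Moderate (Mod), Extreme (Ext).
    Direction: High for an increase, Low for a decrease.
    """
    magnitude = abs(std_score)
    if magnitude < lower_limit:
        return "No_major_differentiator"
    direction = "High" if std_score > 0 else "Low"
    if magnitude < 1.0:
        level = "Sli"   # at least 0.45 and below 1 standard deviation
    elif magnitude <= 1.5:
        level = "Mod"   # between 1 and 1.5 standard deviations
    else:
        level = "Ext"   # more than 1.5 standard deviations
    return f"{level}_{direction}"

for score in (0.3, 0.47, -1.2, 2.0):
    print(score, deviation_band(score))
```

With a lower limit of 0.50, the value 0.47 in the example would be dropped, which is the situation the relaxed threshold is meant to avoid.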
The variable ‘avg_3way_int’ is the composite average of the variables ‘pr1_pr2_corres_pr3’, ‘pr2_pr3_corres_pr1’, ‘pr3_pr2_corres_pr1’ (explained in the earlier
section). As the name suggests, the variable indicates the average behavior of the three interactions. The distribution looks close to normal and symmetric about
the mean, implying that on average the fluctuations in one interaction get compensated by another. However, there seem to be some extreme cases;
on close inspection, 523 cases were found that fall in the extremes of the curve.
VISUALIZATIONS : EXPLORATORY ANALYSIS
The dendrogram shows the result of agglomerative hierarchical
clustering as a tree diagram. As previously mentioned, the rationale
of clustering is to get as many justifiable and distinct clusters as possible.
The red rectangles drawn suggest 22 as the number of clusters to be used as
the first-level clusters.
The pseudo F-statistic is calculated using a custom-built function for all cluster counts obtained with
k-means clustering. The plot suggests that the maximum pseudo F-statistic is obtained
at 22 clusters (the plot starts from a value of 2) and hence the optimal number of clusters is 22. No seed
is set in the custom function, deliberately, in order to check reliability rather than ensure
reproducibility, and hence it produces results in a completely randomized manner. A lot of
potential outliers (if considered for the reduced dataset of 1000 only) are included, and
hence many iterations were required to deduce that the optimal number of clusters lies in the
range of 16-22. To take all possibilities into consideration, 22 was chosen as the optimal
number of clusters, since it was also suggested by the hierarchical clustering above.
VISUALIZATIONS : CLUSTERING DIAGNOSTICS
Population statistics are compared with cluster statistics only for clusters meeting the chosen failure conditions (failing at least 40% and at least 10% transient).
The externally drawn red lines mark the lower limit (0.45) of the thresholds used for cluster profiling. The visualization suggests that only three clusters, viz. 13, 36
& 39, do not show any significant fluctuation beyond the threshold set.
VISUALIZATIONS : CLUSTER ANALYSIS RESULTS
VISUALIZATIONS : MARKOV CHAIN RESULTS
The interactive network diagram, known as a ‘Sankey diagram’, shows the flow of the transient states. The width of the band between two nodes denotes the
probability of changing from one state to another as the immediately subsequent state (the probabilities are visible in R but not here, this being an image). The three clusters
13, 36 & 39 identified previously, which do not show any significant increase or decrease, are represented as ‘No_major_differentiator’. The three levels (pt.5 in
data preparation) are labeled as Slight (Sli), Moderate (Mod) & Extreme (Ext), along with Increase (High) & Decrease (Low), for the interactions (Int1, Int2 & Int3).
SYNOPSIS OF THE ANALYSIS
In a nutshell, the broad philosophy of the analysis is:
• Identify the key players responsible for the phenomenon, i.e. machine breakdown, and then use them to measure aggregated and
comparable behavior; here, in decreasing order of importance, the three pressures, the stakeholders and then lifetime.
• In the absence of any business information about the process and the machines, create metrics that capture any important data
indicating failure behavior. Failures (owing to extrinsic factors) generally happen only when there are serious deviations
from normal conditions. Hence, standard scores are calculated, and any large deviation in them reflects
non-acceptable behavior.
• Since the business parameters are missing, the criticality of outliers cannot be ascertained. Hence, the algorithm
is chosen strategically so that outliers become part of the result and yet do not affect model behavior, unlike parametric
regression methods, where outliers can have a serious impact on the results (beta estimates). Cluster analysis is the chosen algorithm;
it provides the segments that show the different patterns in the fluctuations.
• To calculate the probabilistic nature of failures, the cluster memberships are used as input. A Markov chain calculates the
probabilities of sequential co-occurrence of states and is hence preferred.
SHORTCOMINGS
Shortcomings of the Analysis:
• No complete representation of the data, i.e. the data is not equally spread across all lifetimes, and hence predictive ability cannot be
measured precisely, although the Markov chain can predict the likely states. Hence, although the objective of developing
a framework for predicting failures is achieved, the model does not represent behavior holistically and cannot be deployed
in production. This also implies that the resulting failure probabilities need to be revisited
in light of complete data in which all lifetimes are considered for all stakeholders.
• No direct association with the business end of preventive maintenance, such as economic losses and strategic
implications (e.g. capacity planning), can be measured or benchmarked, as such representative information is not present. As a
result, no business success metrics or milestones can be defined or recommended. Only the analytical methodology is explained
here.
• No definitive domain intelligence can be integrated with the results, since information regarding the type and purpose of the machines,
their criticality and machine-specific attributes is not available.
Thanks!

More Related Content

What's hot

QbD by central composite design
QbD by central composite designQbD by central composite design
QbD by central composite design
sushmita rana
 
Ppt On S.Q.C.
Ppt On S.Q.C.Ppt On S.Q.C.
Ppt On S.Q.C.dvietians
 
STATISTICAL QUALITY CONTROL
STATISTICAL QUALITY CONTROLSTATISTICAL QUALITY CONTROL
Javier Garcia - Verdugo Sanchez - Six Sigma Training - W3 Sample Size
Javier Garcia - Verdugo Sanchez - Six Sigma Training - W3 Sample Size Javier Garcia - Verdugo Sanchez - Six Sigma Training - W3 Sample Size
Javier Garcia - Verdugo Sanchez - Six Sigma Training - W3 Sample Size
J. García - Verdugo
 
Quantifying Risk of End Result Specifications
Quantifying Risk of End Result SpecificationsQuantifying Risk of End Result Specifications
Quantifying Risk of End Result Specifications
California Asphalt Pavement Association
 
6 statistical quality control
6   statistical quality control6   statistical quality control
6 statistical quality control
Ijan Rahman Ode
 

What's hot (6)

QbD by central composite design
QbD by central composite designQbD by central composite design
QbD by central composite design
 
Ppt On S.Q.C.
Ppt On S.Q.C.Ppt On S.Q.C.
Ppt On S.Q.C.
 
STATISTICAL QUALITY CONTROL
STATISTICAL QUALITY CONTROLSTATISTICAL QUALITY CONTROL
STATISTICAL QUALITY CONTROL
 
Javier Garcia - Verdugo Sanchez - Six Sigma Training - W3 Sample Size
Javier Garcia - Verdugo Sanchez - Six Sigma Training - W3 Sample Size Javier Garcia - Verdugo Sanchez - Six Sigma Training - W3 Sample Size
Javier Garcia - Verdugo Sanchez - Six Sigma Training - W3 Sample Size
 
Quantifying Risk of End Result Specifications
Quantifying Risk of End Result SpecificationsQuantifying Risk of End Result Specifications
Quantifying Risk of End Result Specifications
 
6 statistical quality control
6   statistical quality control6   statistical quality control
6 statistical quality control
 

Similar to Jigsaw Academy Pexitics Student Projects

Recommendations for Preventive Maintenance - A Machine Learning Project
Recommendations for Preventive Maintenance - A Machine Learning ProjectRecommendations for Preventive Maintenance - A Machine Learning Project
Recommendations for Preventive Maintenance - A Machine Learning Project
Pranov Mishra
 
Lecture5 Applied Econometrics and Economic Modeling
Lecture5 Applied Econometrics and Economic ModelingLecture5 Applied Econometrics and Economic Modeling
Lecture5 Applied Econometrics and Economic Modelingstone55
 
Measurement system analysis Presentation.ppt
Measurement system analysis Presentation.pptMeasurement system analysis Presentation.ppt
Measurement system analysis Presentation.ppt
jawadullah25
 
Dilshod Achilov Gage R&R
Dilshod Achilov Gage R&RDilshod Achilov Gage R&R
Dilshod Achilov Gage R&R
ahmad bassiouny
 
Six sigma using minitab
Six sigma using minitabSix sigma using minitab
Six sigma using minitab
Vimal sam singh
 
Validation of lab instruments and quantitative test methods
Validation of lab instruments and quantitative test methods Validation of lab instruments and quantitative test methods
Validation of lab instruments and quantitative test methods
Mostafa Mahmoud
 
Continuous Sampling Planning
Continuous Sampling PlanningContinuous Sampling Planning
Continuous Sampling Planning
ahmad bassiouny
 
Javier Garcia - Verdugo Sanchez - Six Sigma Training - W2 Measurement System ...
Javier Garcia - Verdugo Sanchez - Six Sigma Training - W2 Measurement System ...Javier Garcia - Verdugo Sanchez - Six Sigma Training - W2 Measurement System ...
Javier Garcia - Verdugo Sanchez - Six Sigma Training - W2 Measurement System ...
J. García - Verdugo
 
Cmt learning objective 36 case study of s&amp;p 500
Cmt learning objective 36   case study of s&amp;p 500Cmt learning objective 36   case study of s&amp;p 500
Cmt learning objective 36 case study of s&amp;p 500
Professional Training Academy
 
Interactive GR&R Self-teach Presentation-1
Interactive GR&R Self-teach Presentation-1Interactive GR&R Self-teach Presentation-1
Interactive GR&R Self-teach Presentation-1Kea Jolicoeur
 
MSA R&R for training in manufacturing industry
MSA R&R for training in manufacturing industryMSA R&R for training in manufacturing industry
MSA R&R for training in manufacturing industry
abhishek558363
 
Case study of s&amp;p 500
Case study of s&amp;p 500Case study of s&amp;p 500
Case study of s&amp;p 500
Professional Training Academy
 
Critical Checks for Pharmaceuticals and Healthcare: Validating Your Data Inte...
Critical Checks for Pharmaceuticals and Healthcare: Validating Your Data Inte...Critical Checks for Pharmaceuticals and Healthcare: Validating Your Data Inte...
Critical Checks for Pharmaceuticals and Healthcare: Validating Your Data Inte...
Minitab, LLC
 
Statistical quality control .pdf
Statistical quality control .pdfStatistical quality control .pdf
Statistical quality control .pdf
UVAS
 
Process Control
Process ControlProcess Control
Process Control
Ronald Shewchuk
 
Quality Improvement Using Gr&R : A Case Study
Quality Improvement Using Gr&R : A Case StudyQuality Improvement Using Gr&R : A Case Study
Quality Improvement Using Gr&R : A Case Study
IRJET Journal
 
Statistical Process Control Part 2
Statistical Process Control Part 2Statistical Process Control Part 2
Statistical Process Control Part 2
Malay Pandya
 






Jigsaw Academy Pexitics Student Projects

  • 2. Pexitics – Jigsaw Contest Preventive Maintenance Case Study
  • 3. Contents: Background, Approach, Exploration, Preparation, Modeling and validation, Model insights, Clustering and profiling, Recommendations, Model monitoring
  • 5. Problem Statement: ▪ The diagram shows high-pressure air entering the machine and low-pressure air exiting, while water pressure is applied at the bottom. ▪ The range for pressure 2 is lower than for pressure 1 and pressure 3, which suggests that pressure 2 may be the water pressure. ▪ The lower pressure applied here could be due to 2 reasons: (a) the viscosity of water is higher than that of air; (b) water is incompressible, unlike air.
  • 6. Problem Statement: ▪ Across 90,000 instances of maintenance work done by a company on various machines, the breakage rate in a year was 39.47%. ▪ The company now wants to build a model that predicts breakdown from factors such as pressure indicator points, machine lifetime, team usage and service provider, and to use preventive maintenance to reduce downtime. ▪ The company also wants rules across pressure levels and machine lifetime, and segments across teams and service providers, to predict and prevent breakage.
  • 7. Approach Methodology & Stages to solve the Problem
  • 8. Our Approach ▪ Exploration: Assess Data; Descriptive Analysis ▪ Preparation: Missing Data; Outlier Detection; Qualitative to Quantitative Variables; Dummies; Dividing Datasets (50:50) ▪ Model Building: Logistic Regression; Cluster (K Means) ▪ Model Validation: Perform Fit Statistics; Confusion Matrix; Confidence Intervals; Gains Charts; Rank Ordering; AUC & ROC Curve ▪ Model Insights: Visualizations; Recommendation ▪ Solution Monitoring: Assess Model Performance
  • 10. Machine Breakage: Total Breakage Rate = 39.4%
  • 11. Breakage % by Teams - A, B, C ▪ Machines used by Team B have the highest breakdown rate (15%), compared with Team A (12%) and Team C (12%).
  • 12. Breakage % by Service Providers – 1, 2, 3 & 4 ▪ Manufacturer (Provider) 4 has the lowest breakdown rate (8%), compared with manufacturer 1 (11%) and manufacturer 3 (11%).
  • 13. Distribution of Machine Life (in months) ▪ At an overall level, the average lifetime of a machine is 55 months. ▪ Machines start breaking when lifetime reaches 60 months and continue to break until 93 months, with the peak at 60 - 80 months. ▪ Share of broken machines by lifetime (within broken = 1): 0 - 59 months: 0%; 60 - 70 months: 28.6%; 71 - 80 months: 28.8%; 81 - 90 months: 26.08%; 91 - 93 months: 16.51%.
  • 14. Distribution of Pressure Indicator Points 1, 2 & 3 ▪ Most of the machines break at pressure indicator 1 between 63.0 and 112.5. ▪ Most of the machines break at pressure indicator 2 between 78.1 and 102.0. ▪ Most of the machines break at pressure indicator 3 between 70.1 and 110.
  • 16. Breakage % across Providers & Teams
  • 17. Breakage % across Providers & Teams (within broken = 1) ▪ Amongst broken machines, more than 60% of team B's machines come from manufacturer 1 and manufacturer 3 (which are most likely to break), hence this team's machines have a higher probability of breaking. ▪ Provider 4 has the lowest breakdown rate for team A and team B, whereas provider 2 should be the preferred vendor for team C.
  • 18. Current scenario across Providers & Teams (within broken = 0) ▪ Currently, team A is buying the highest number of machines from provider 2, followed by provider 1, instead of from their preferred vendor, provider 4. ▪ Team B is using the lowest number of machines from provider 4 (their preferred vendor). ▪ Team C is taking the highest number of machines from provider 2 (their preferred vendor).
  • 19. Exploration Within pressure indicator points, providers and teams
  • 20. Breakage Summary - Pressure Indicator 1, Teams & Providers • At pressure indicator 1, most of the machines break between 63.0 and 112.5. • In this faceted density graph, the probability of breakage is highest for machines of Provider 3 used by Team A. • Breakage probability is also noticeably high for machines of Provider 4 used by Teams B & C.
  • 21. Breakage Summary - Pressure Indicator 2, Teams & Providers • At pressure indicator 2, most of the machines break between 78.1 and 102.0. • In this faceted graph, except for machines provided by Provider 3 to Team B, all other groups of machines show breakage between 80-120 psi for pressure indicator 2. • High breakage is noticed for machines provided by Provider 2 to Team A at 100 psi. • Provider 3's machines show noticeable non-breakage for all teams at 90-110 psi.
  • 22. Breakage Summary - Pressure Indicator 3, Teams & Providers • At pressure indicator 3, most of the machines break between 70.1 and 110. • In this faceted graph, a high breakage probability is noticeable for machines provided by Provider 3 to Team A and Team C when pressure indicator 3 is between 90-120 psi. • Non-breakage probability is relatively high when Team B uses machines from Provider 4 at 120 psi for pressure indicator 3.
  • 23. Breakage Summary - Machine Life, Average Pressure & Provider ▪ Machines from provider 4 and 2 do not have breakage up to 80 months, while machines from provider 1 & 3 have 100% breakage till the same point. ▪ Machines from Provider 4 show a consistent life irrespective of pressure points. ▪ Machines from Provider 2 have a longer lifetime; i.e. only 29% breakage happens till 89 months as compared to 100% of provider 4.
  • 24. Exploration Within machine life, providers and teams
  • 25. Breakage Summary - Machine Life, Teams & Providers • Machines provided by manufacturer 3 have the shortest lifetime; they break down completely at 66 months. • Machines from Provider 1 & 3 show maximum breakage. • This graph also supports the view that provider 4 should be the preferred vendor for Team A and Team B, while provider 2 should be the preferred vendor for Team C, on the basis of likelihood of breakage.
  • 26. Breakage Summary - Machine Life > 60 Months, Teams & Providers • Machines provided by Provider 1 and 3 constitute more than 57% of the breakage. • 40% of machines from Team A & Team B show a breakage between 88 - 93 months.
  • 27. Preparation Data Checks, Missing Data, Outliers, Converting Qualitative variables to Quantitative Variables
  • 28. Dummy Variables for Modelling ▪ Binned Pressure Indicator Points & Life Time based on quartiles ▪ Converted Teams & Providers into numeric variables ▪ Created the average of the three pressure points as a variable ▪ Split Training and Validation datasets in the ratio 50:50 ▪ Dataset: 90,000 observations & 8 variables ▪ Target Variable: Broken (binary variable) ▪ Data Type: Character Variables = 2, Numeric Variables = 6 ▪ Missing Value: No Missing Values ▪ Factors: Machine Life, Pressure Indicator Points, Teams, Providers
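The preparation steps on this slide (quartile binning, dummy variables, 50:50 split) can be sketched as follows. The deck itself used SAS and shows no code, so this is only an illustration in Python's standard library; the function names, the choice of the lowest bin as the base level, the random seed and the sample lifetimes are all assumptions, not values from the project.

```python
import random
import statistics

def quartile_bin(values):
    """Return a bin index 0..3 for each value, using the sample quartiles as cut points."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return [sum(v > cut for cut in (q1, q2, q3)) for v in values]

def dummies(bins, n_levels=4, base=0):
    """One 0/1 indicator per non-base level (base level chosen here by assumption)."""
    levels = [k for k in range(n_levels) if k != base]
    return [[int(b == k) for k in levels] for b in bins]

def split_half(rows, seed=0):
    """Shuffle the rows and split them 50:50 into training and validation sets."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

# Made-up lifetimes in months, for illustration only.
lifetimes = [12, 30, 45, 58, 61, 66, 72, 80, 88, 93]
bins = quartile_bin(lifetimes)
indicator_rows = dummies(bins)
train, valid = split_half(list(zip(lifetimes, bins)))
print(bins)
print(len(train), len(valid))
```

Leaving one level out as the base avoids perfectly collinear indicators, which is why the later slides report estimates relative to a base team and a base provider.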
  • 29. Outlier Detection - Box Plots ▪ Outliers seen at pressure points but they have been taken into consideration while doing the modelling ▪ Data for Lifetime variable looks consistent; breakage appears post 59 months
  • 30. Modelling & Validation Logistic Regression, Clustering & Validation Techniques
  • 31. Explanation: Logistic Regression using SAS ▪ A binary logit model was fitted using Fisher's scoring, with 45,000 observations read and used from the training dataset. ▪ The Response Profile tells us how many observations fall in each outcome level; the model here estimates the probability of the target variable "Broken" where broken = 1. ▪ The Model Convergence status is satisfied. Because the model is fitted by an iterative procedure, this indicates that the algorithm converged to stable estimates. Convergence by itself says nothing about whether the independent variables are significant; that is assessed by the tests below. ▪ Model Fit Statistics – AIC, SC and -2LogL (lack of fit) – range up to 60,431 after variable transformations and bucketing. These statistics have no absolute scale (they grow with the number of observations) and are mainly useful for comparing candidate models fitted to the same data, where lower is better. ▪ Testing Global Null Hypothesis: Beta = 0 reports the likelihood ratio, score and Wald tests. The likelihood ratio test compares the goodness of fit of two models, the null (intercept-only) model and the alternative (fitted) model. For a good model, at least 1 of the 3 tests needs to be significant (p < 0.05 at the 5% level of significance) for us to reject the null hypothesis. For our model, all 3 tests are significant at the 5% level of significance.
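As a rough illustration of how the fit statistics named on this slide relate to one another, the sketch below computes -2LogL, AIC and SC from a set of predicted probabilities, plus the likelihood ratio chi-square against an intercept-only model. It is written in Python rather than SAS, and the outcomes, probabilities and parameter count are made up for the example.

```python
import math

def neg2_log_likelihood(y, p):
    """-2 times the Bernoulli log-likelihood of outcomes y under predicted probabilities p."""
    return -2.0 * sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
                      for yi, pi in zip(y, p))

def fit_statistics(y, p, n_params):
    """AIC adds 2 per parameter; SC (Schwarz criterion) adds ln(n) per parameter."""
    m2ll = neg2_log_likelihood(y, p)
    return {
        "-2LogL": m2ll,
        "AIC": m2ll + 2 * n_params,
        "SC": m2ll + n_params * math.log(len(y)),
    }

def likelihood_ratio_chi2(y, p):
    """Drop in -2LogL from the intercept-only model to the fitted model."""
    base_rate = sum(y) / len(y)
    null_m2ll = neg2_log_likelihood(y, [base_rate] * len(y))
    return null_m2ll - neg2_log_likelihood(y, p)

# Hypothetical outcomes and fitted probabilities, not taken from the project data.
y = [1, 1, 1, 0, 0, 0, 1, 0]
p = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.6, 0.4]
stats = fit_statistics(y, p, n_params=2)
lr_chi2 = likelihood_ratio_chi2(y, p)
print(stats, lr_chi2)
```

Because every term in -2LogL is summed over observations, the statistic grows with the size of the dataset, which is one reason a value in the tens of thousands is unsurprising on 45,000 rows.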
  • 32. Explanation: Logistic Regression using SAS ▪ Maximum Likelihood Estimation (MLE) estimates the model parameters by choosing the values under which the observed data are most likely. The p-value for each estimate tests the null hypothesis that the coefficient is zero, i.e. that the apparent effect is due to chance alone. The estimate β1 tells us that a unit change in x1 leads to a β1 change in the log odds of success of Y. A positive estimate means the log odds increase with the variable, while a negative estimate means they decrease. ▪ Odds Ratio Estimates give a point estimate together with the lower and upper 95% confidence limits. The point estimate is exp(β1): a unit change in x1 multiplies the odds of success of Y by that factor. ▪ Concordant and Discordant Pairs: since this is a binary classification model, every possible pair made of one observed 1 and one observed 0 is reviewed. A pair is concordant if the model gives the observation that was a 1 in real life a higher predicted probability of 1 than the observation that was a 0; it is discordant if that predicted probability is lower, and tied if the two are equal. The higher the concordance percentage, the better the model. Our model has a concordant percentage of 94.3.
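The pair counting behind the concordance percentage can be sketched in a few lines. This is a minimal Python illustration that compares raw predicted probabilities for every (observed 1, observed 0) pair; the data are made up, and SAS's own output may differ in detail from this simplified version.

```python
def concordance(y, p):
    """Count concordant, discordant and tied pairs over all (event, non-event) pairs."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    concordant = discordant = tied = 0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1   # the observed 1 was scored higher
            elif p_event < p_non_event:
                discordant += 1   # the observed 0 was scored higher
            else:
                tied += 1
    total = len(events) * len(non_events)
    return {
        "concordant": concordant,
        "discordant": discordant,
        "tied": tied,
        "percent_concordant": 100.0 * concordant / total,
    }

# Hypothetical outcomes and predicted probabilities of broken = 1.
y = [1, 1, 0, 0, 1, 0]
p = [0.9, 0.4, 0.5, 0.4, 0.7, 0.1]
pairs = concordance(y, p)
print(pairs)
```

The double loop is quadratic in the number of rows, so on a dataset of the size described here a sorted or rank-based computation would be preferable in practice.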
  • 33. Explanation: Logistic Regression using SAS ▪ Classification Table: the classification table is another method to evaluate the predictive accuracy of the logistic regression model. In this table the observed values of the dependent outcome and the predicted values (at a user-defined cutoff value, for example p = 0.50) are cross-classified. Our model correctly predicts 86.7% of the cases at a cutoff value of 0.50. ▪ Gain chart / Lift curve: a popular technique to measure the performance of a logistic regression model. ▪ Rank ordering: for our model, the rank ordering is well maintained, since the response rate decreases from decile to decile. We have also captured 72% of the outcomes in the first 3 deciles. This is shown in the Excel file provided alongside this presentation. ▪ The entire dataset is divided into training and validation datasets in the ratio 50:50. The model is first built on the training dataset and then validated on the validation dataset. We have successfully validated our model on the validation dataset with the same significant variables.
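The classification table and the cumulative gains by rank-ordered group described on this slide can be sketched as below. This is an illustrative Python version, not the project's SAS or Excel workflow; the data are made up, five groups are used instead of ten to keep the example small, and treating a probability equal to the cutoff as a predicted event is an assumption.

```python
def classification_table(y, p, cutoff=0.5):
    """Cross-classify observed outcomes against predictions at the given cutoff."""
    predicted = [int(pi >= cutoff) for pi in p]  # assumption: >= cutoff counts as event
    tp = sum(1 for yi, hi in zip(y, predicted) if yi == 1 and hi == 1)
    fp = sum(1 for yi, hi in zip(y, predicted) if yi == 0 and hi == 1)
    fn = sum(1 for yi, hi in zip(y, predicted) if yi == 1 and hi == 0)
    tn = sum(1 for yi, hi in zip(y, predicted) if yi == 0 and hi == 0)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "accuracy": (tp + tn) / len(y)}

def cumulative_gains(y, p, n_groups=10):
    """Share of all events captured up to each group, ranked by descending probability."""
    ranked = [yi for _, yi in sorted(zip(p, y), key=lambda pair: pair[0], reverse=True)]
    total_events = sum(ranked)
    n = len(ranked)
    gains, captured = [], 0
    for g in range(n_groups):
        start, end = g * n // n_groups, (g + 1) * n // n_groups
        captured += sum(ranked[start:end])
        gains.append(captured / total_events)
    return gains

# Hypothetical outcomes and predicted probabilities of broken = 1.
y = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]
p = [0.95, 0.9, 0.8, 0.75, 0.6, 0.45, 0.4, 0.3, 0.2, 0.1]
table = classification_table(y, p, cutoff=0.5)
gains = cumulative_gains(y, p, n_groups=5)
print(table)
print(gains)
```

Rank ordering holds when the per-group response rate never rises as one moves down the ranked groups, which shows up as gains that flatten towards the end of the list.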
  • 34. Model Insights Training dataset - Visualizations of the model
  • 35. Model Insights ▪ The variables that are significant according to the model are explained below. ▪ Lifetime 80 – machines with a lifetime from 61 to 80 months. This variable has a positive estimate: compared with the reference lifetime bin, being in this range raises the log odds of broken = 1 by 5.4286. ▪ Lifetime 93 – machines with a lifetime from 81 to 93 months. This variable also has a positive estimate: compared with the reference lifetime bin, being in this range raises the log odds of broken = 1 by 8.2307. ▪ Insight – Hence the machines should be replaced by month 60.
  • 36. Model Insights ▪ PressureInd_197 – pressure indicator 1 ranging from 85.01 to 97.00. This variable has a negative estimate: compared with the reference bin, pressure indicator 1 in this range lowers the log odds of broken = 1 by 0.6753. ▪ PressureInd_1115 – pressure indicator 1 ranging from 97.01 to 115.00. This variable also has a negative estimate: compared with the reference bin, pressure indicator 1 in this range lowers the log odds of broken = 1 by 0.3858.
  • 37. Model Insights ▪ PressureInd_2100 – pressure indicator 2 ranging from 88.01 to 100.00. This variable also has a negative estimate: compared with the reference bin, pressure indicator 2 in this range lowers the log odds of broken = 1 by 0.129. ▪ PressureInd_3101 – pressure indicator 3 ranging from 88.01 to 101.00. This variable also has a negative estimate: compared with the reference bin, pressure indicator 3 in this range lowers the log odds of broken = 1 by 0.2199.
  • 38. Model Insights ▪ PressureInd_3173 – pressure indicator 3 ranging from 114.01 to 173.00. This variable has a positive estimate: compared with the reference bin, pressure indicator 3 in this range raises the log odds of broken = 1 by 0.2485. ▪ Insight – Machines in the lower pressure ranges (PressureInd_197, PressureInd_1115, PressureInd_2100 and PressureInd_3101) are less likely to break (negative estimates), whereas machines in the PressureInd_3173 range have a higher likelihood of breakage (positive estimate).
  • 39. Model Insights  Team B and Team C are significant variables with Team A as base with a positive estimate value signifying that with increase in the number of machines used by these teams there will be an increase in the log of the odds ratio of the probability of success of broken by 0.2079 and 1.1111 respectively  Insight – Machines used by Team C are more likely to break as compared to Team B Team CTeam B
  • 40. Model Insights  Provider 3 is a significant variables with Provider 1 as base with a positive estimate value signifying that with increase in the number of machines provided by provider 3 there will be an increase in the log of the odds ratio of the probability of success of broken by 3.2723.  Provider 4 is a significant variables with Provider 1 as base with a negative estimate value signifying that with increase in the number of machines provided by provider 4 there will be a decrease in the log of the odds ratio of the probability of success of broken by -1.9197.  Insight – Machines provided by provider 3 are more likely to break (high positive estimate value) as compared to machines provided by provider 4 (high negative estimate value). Provider 4 Provider 3
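    The estimates on the Model Insights slides are on the log-odds scale, which is hard to read directly. A minimal sketch of converting them to odds ratios with exp(estimate) is given below. The coefficient values are those reported on the slides; the dictionary and the function name `odds_ratio` are illustrative assumptions, not part of the original model code.

```python
import math

# Log-odds estimates as reported on the Model Insights slides.
# The dictionary itself is only an illustrative container.
estimates = {
    "Lifetime 80": 5.4286,
    "Lifetime 93": 8.2307,
    "PressureInd_197": -0.6753,
    "PressureInd_1115": -0.3858,
    "PressureInd_2100": -0.129,
    "PressureInd_3101": -0.2199,
    "PressureInd_3173": 0.2485,
    "Team B": 0.2079,
    "Team C": 1.1111,
    "Provider 3": 3.2723,
    "Provider 4": -1.9197,
}


def odds_ratio(estimate):
    """Convert a logistic-regression estimate (log-odds) to an odds ratio."""
    return math.exp(estimate)


for name, estimate in estimates.items():
    # An odds ratio above 1 raises the odds of breakage, below 1 lowers them.
    print(f"{name:18s} estimate={estimate:+.4f}  odds ratio={odds_ratio(estimate):.3f}")
```

    Read this way, a positive estimate corresponds to an odds ratio above 1 and a negative estimate to an odds ratio below 1, always relative to the base category of that variable.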
  • 41. Model Insights - Visualization
    Top 6 factors affecting machine breakage:
    • Lifetime 80: 61 to 80 months
    • Lifetime 93: 81 to 93 months
    • PressureInd_3173: 114.01 to 173.00
    • Team B
    • Team C
    • Provider 3
  • 43. Profiling – Cluster 1
    Variable              N     Cluster mean  Pop. mean  Pop. std. dev.  Z-value
    Lifetime              7744  42.87         55.09      26.51           -0.46
    Broken                7744  0.01          0.39       0.49            -0.78
    Pressure Indicator 1  7744  103.17        98.56      19.98           0.23
    Pressure Indicator 2  7744  99.04         99.34      10.04           -0.03
    Pressure Indicator 3  7744  95.01         100.59     19.62           -0.28
    Team                  7744  1.96          1.97       0.8             -0.01
    Provider              7744  3.85          2.47       1.11            1.24
    • 7744 machines, 9% of the total machines.
    • The main focus is the dependent variable "broken", which is much lower in this cluster than in the population. These machines are therefore the least likely to break within their product lifecycle (PLC).
    • The lifetime of these machines is lower than average, indicating that they need to be replaced sooner than others.
    • Provider is much higher than the population mean, signifying that machines bought from manufacturer 4 (the cluster mean is very close to 4) are the least likely to break.
  • 44. Profiling – Cluster 2
    Variable              N      Cluster mean  Pop. mean  Pop. std. dev.  Z-value
    Lifetime              11332  45.02         55.09      26.51           -0.38
    Broken                11332  0.16          0.39       0.49            -0.47
    Pressure Indicator 1  11332  97.29         98.56      19.98           -0.06
    Pressure Indicator 2  11332  99.9          99.34      10.04           0.06
    Pressure Indicator 3  11332  124.63        100.59     19.62           1.23
    Team                  11332  1.94          1.97       0.8             -0.04
    Provider              11332  2.23          2.47       1.11            -0.22
    • 11332 machines, 13% of the total machines.
    • Here too the main focus is the dependent variable "broken", which is much lower in this cluster than in the population. These machines are therefore less likely to break within their product lifecycle (PLC).
    • Here too the lifetime of the machines is lower than average, indicating that they need to be replaced sooner than others.
  • 45. Profiling – Cluster 3
    Variable              N      Cluster mean  Pop. mean  Pop. std. dev.  Z-value
    Lifetime              16231  38.55         55.09      26.51           -0.62
    Broken                16231  0.22          0.39       0.49            -0.35
    Pressure Indicator 1  16231  97.69         98.56      19.98           -0.04
    Pressure Indicator 2  16231  98.75         99.34      10.04           -0.06
    Pressure Indicator 3  16231  94.89         100.59     19.62           -0.29
    Team                  16231  2.43          1.97       0.8             0.58
    Provider              16231  2.15          2.47       1.11            -0.29
    • 16231 machines, 18% of the total machines.
    • Again the main focus is the dependent variable "broken", which is lower in this cluster than in the population. These machines are therefore less likely to break within their product lifecycle (PLC).
    • Here too the lifetime of the machines is lower than average, indicating that they need to be replaced sooner than others.
    • Pressure indicator 3 is lower than the population mean (z = -0.29), signifying that machines in this cluster are more likely to break at pressure indicator 3. Team and provider also differ from the average, suggesting that manufacturer 2 (mean < 2.5) should supply machines to Team 2.
  • 46. Team-wise and provider-wise cluster distribution
    • Team A and Team C use the majority of the machines in all the clusters, signifying that machines used by Team A and Team C are the least likely to break (among the clusters with the lowest probability of breakdown).
    • Provider 4 supplies almost all the machines in cluster 1, which is the least likely to break.
    Share of machines by team (%):
    Cluster  N      Team A  Team B  Team C
    1        7744   49.0    6.0     45.1
    2        11332  47.5    10.6    41.9
    3        16231  28.5    0       71.5
    Share of machines by provider (%):
    Cluster  N      Provider 1  Provider 2  Provider 3  Provider 4
    1        7744   3.7         2.1         0           94.2
    2        11332  35.0        31.8        8.1         25.1
    3        16231  28.1        28.4        43.5        0
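    The Z-values in the profiling tables are consistent with the usual definition (cluster mean minus population mean, divided by population standard deviation). A short sketch that reproduces the Cluster 1 column is given below; the function name `z_value` and the dictionary layout are assumptions made for illustration.

```python
def z_value(cluster_mean, pop_mean, pop_std):
    """Distance of a cluster mean from the population mean, in standard deviations."""
    return (cluster_mean - pop_mean) / pop_std


# (cluster mean, population mean, population std. dev.) from the Cluster 1 slide.
cluster_1 = {
    "Lifetime": (42.87, 55.09, 26.51),
    "Broken": (0.01, 0.39, 0.49),
    "Pressure Indicator 1": (103.17, 98.56, 19.98),
    "Pressure Indicator 2": (99.04, 99.34, 10.04),
    "Pressure Indicator 3": (95.01, 100.59, 19.62),
    "Team": (1.96, 1.97, 0.8),
    "Provider": (3.85, 2.47, 1.11),
}

for variable, (c_mean, p_mean, p_std) in cluster_1.items():
    print(f"{variable:22s} z = {z_value(c_mean, p_mean, p_std):+.2f}")
```

    A Z-value far from zero (for example Provider in Cluster 1) marks the variables that distinguish a cluster from the population as a whole.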
  • 48. Recommendations
    • Sensor installation: we recommend installing sensors and software that can predict or flag breakage at a safe lifetime and at safe pressure points.
    • Service repair as a preventive maintenance measure: regular maintenance protects the investment against unplanned breakdowns, so we recommend scheduled repair of machines at the following intervals:
    Provider    Time interval (months)  Total no. of services  Flag-off month (sensors)
    Provider 1  35-80                   3                      34
    Provider 2  60-93                   3                      59
    Provider 3  35-66                   3                      34
    Provider 4  60-89                   3                      59
    *The starting point for servicing machines from Providers 1 and 3 is 35 months, based on the cluster analysis (least likelihood of breakage), as they are the most prone to breakage. For machines from Providers 2 and 4 it is 60 months, as there is no breakage until that point.
    *The ending point for machines from all providers is based on the entire life span of the machine.
  • 49. Recommendations
    • Sensors and software at pressure points 1, 2 and 3 to predict breakage: if sensors are applied at all 3 pressure points (at different pressure levels), a machine can be prevented from breaking due to an increase in pressure levels.
    Pressure point    Pressure with highest likelihood of breakage  Flag-off pressure (sensors)
    Pressure point 1  63.0 – 112.5                                  55.0
    Pressure point 2  78.1 – 102.0                                  70.0
    Pressure point 3  70.1 – 110.0                                  62.0
    *The sensors are set at the above pressure levels so that the teams have enough time to shut off the machine or prevent excess pressure on it.
  • 50. Recommendations
    • Preferred vendor by team: Provider 2 should be the preferred vendor for Team C, based on the likelihood of breakage. Provider 4 should be the preferred vendor for Team A and Team B.
    • Based on the cluster analysis (clusters with the least likelihood of breakage), we believe that preventing breakdowns successfully requires servicing to start from a machine's 35th month in operation.
    • Last but not least, in future it would be advisable not to purchase machines from Provider 1 and Provider 3. For now, we have provided the steps needed to prevent breakage of machines from these manufacturers.
    • It would also be interesting to understand why machines were bought from these providers: cost cutting, logistic feasibility, or goodwill?
  • 52. PEXITICS – PROACTIVE MAINTENANCE STRATEGY • Lakshmi Kulkarni • Meghashree R • Mayur Lalwani
  • 53. Provider 1 - Insights
    Observations from the given data:
    • Max age of the machine is 80. The first breakdown is observed at 73, irrespective of other factors (pressure points).
    • Broken (machine breaking down) is fully dependent on the lifetime of the machine.
    • Pressure Indicator 1 (PI1) and Pressure Indicator 3 (PI3) have a negligible positive impact on machine breakdown.
    • Pressure Indicator 2 (PI2) has a negligible negative impact on machine breakdown.
    [Chart: Relationship of parameters with "Broken" (lifetime, pressureInd_1, pressureInd_2, pressureInd_3)]
  • 54. Insights from Analysis:
    1. For mean values of PI1, PI2 and PI3, the Break Down Age (BDA) is 64 with p(BD) at approx. 5%.
    2. For max values of PI1 & PI3 and min values of PI2, the p(BD) is 22% with BDA being 64 (Chart 1).
    3. For min values of PI1 & PI3 and max values of PI2, the p(BD) is 1% with BDA being 64.
    4. Based on points 1 through 3 above, we have used a Break Down Age (BDA) of 64 and a Maximum Cut Off Age (MCA) of 67 (mean values of PI1, PI2 and PI3).
    Note: use the RUL Estimator to calculate the values shown in the screen prints.
    RUL based on max values of PI1 & PI3 and min values of PI2:
    Provider    lifetime  pressureInd_1  pressureInd_2  pressureInd_3  p(BD)        p(BD) in %   BDA  RUL
    Provider 1  64        151            74             172            0.218811901  21.88119006  64   0
    RUL based on mean values of PI1, PI2 and PI3:
    Provider    lifetime  pressureInd_1  pressureInd_2  pressureInd_3  p(BD)        p(BD) in %   BDA  RUL
    Provider 1  64        99             99             100            0.057757457  5.775745659  64   0
    [Chart 1: % of breakdown vs. lifetime (months 61 to 64)]
  • 55. Provider 2 - Insights
    Observations from the given data:
    • Max age of the machine is 93. The first breakdown is observed at 85, irrespective of other factors (pressure points).
    • Broken (machine breaking down) is fully dependent on the lifetime of the machine.
    • Pressure Indicator 1 (PI1) and Pressure Indicator 3 (PI3) have a negligible positive impact on machine breakdown.
    • Pressure Indicator 2 (PI2) has a negligible negative impact on machine breakdown.
    [Chart: Relationship of parameters with "Broken" (lifetime, pressureInd_1, pressureInd_2, pressureInd_3)]
  • 56. Insights from Analysis:
    1. For mean values of PI1, PI2 and PI3, the Break Down Age (BDA) is 77 with p(BD) at approx. 5%.
    2. For max values of PI1 & PI3 and min values of PI2, the p(BD) is 3129.46% with BDA being 77 (Chart 1).
    3. For max values of PI1 & PI3 and min values of PI2, the p(BD) is approx. 5% with BDA being 66.
    4. For min values of PI1 & PI3 and max values of PI2, the p(BD) is 0.020% with BDA being 77.
    5. Based on points 1 through 3 above, we have used a Break Down Age (BDA) of 66 and a Maximum Cut Off Age (MCA) of 69 (max values of PI1 & PI3 and min values of PI2).
    Note: use the RUL Estimator to calculate the values.
    RUL based on max values of PI1 & PI3 and min values of PI2:
    Provider    lifetime  pressureInd_1  pressureInd_2  pressureInd_3  p(BD)        p(BD) in %   BDA  RUL
    Provider 2  66        173            70             155            0.054603744  5.460374442  77   11
    Provider 2  77        173            71             155            31.29465076  3129.465076  77   0
    [Chart 1: % of breakdown vs. lifetime (months 66 to 77)]
  • 57. Provider 3 - Insights
    Observations from the given data:
    • Max age of the machine is 66. The first breakdown is observed at 60, irrespective of other factors (pressure points).
    • Broken (machine breaking down) is fully dependent on the lifetime of the machine.
    • Pressure Indicator 1 (PI1), Pressure Indicator 2 (PI2) and Pressure Indicator 3 (PI3) have a negligible positive impact on machine breakdown.
    [Chart: Relationship of parameters with "Broken" (lifetime, pressureInd_1, pressureInd_2, pressureInd_3)]
  • 58. Insights from Analysis:
    1. For mean values of PI1, PI2 and PI3, the Break Down Age (BDA) is 54 with p(BD) at approx. 5%.
    2. For max values of PI1, PI2 & PI3, the p(BD) is 52.35% with BDA being 54 (Chart 1).
    3. For min values of PI1, PI2 and PI3, the p(BD) is 0.7749% with BDA being 54.
    4. Based on points 1 through 3 above, we have used a Break Down Age (BDA) of 50 (5%) and a Maximum Cut Off Age (MCA) of 53 (max values of PI1, PI2 and PI3).
    Note: use the RUL Estimator to calculate the values.
    RUL based on max values of PI1, PI2 & PI3:
    Provider    lifetime  pressureInd_1  pressureInd_2  pressureInd_3  p(BD)        p(BD) in %   BDA  RUL
    Provider 3  54        152            123            148            0.516152601  51.61526007  54   0
    Provider 3  51        152            123            148            0.065209324  6.520932365  54   3
    [Chart 1: % of breakdown vs. lifetime (months 51 to 55)]
  • 59. Provider 4 - Insights
    Observations from the given data:
    • Max age of the machine is 89. The first breakdown is observed at 81, irrespective of other factors (pressure points).
    • Broken (machine breaking down) is fully dependent on the lifetime of the machine.
    • Pressure Indicator 1 (PI1) and Pressure Indicator 2 (PI2) have a negligible negative impact on machine breakdown.
    • Pressure Indicator 3 (PI3) has a negligible positive impact on machine breakdown.
    [Chart: Relationship of parameters with "Broken" (lifetime, pressureInd_1, pressureInd_2, pressureInd_3)]
  • 60. Insights from Analysis:
    1. For mean values of PI1, PI2 and PI3, the Break Down Age (BDA) is 71 with p(BD) at approx. 5%.
    2. For min values of PI1 and PI2 and max values of PI3, the p(BD) is 5.9% with BDA being 63 (Chart 1).
    3. Based on the points above, we have used a Break Down Age (BDA) of 63 and a Maximum Cut Off Age (MCA) of 67 (min values of PI1 and PI2, and max value of PI3).
    Note: use the RUL Estimator to calculate the values.
    RUL based on min values of PI1 and PI2 and max value of PI3:
    Provider    lifetime  pressureInd_1  pressureInd_2  pressureInd_3  p(BD)        p(BD) in %   BDA  RUL
    Provider 4  67        36             77             149            0.300035998  30.00359976  66   -1
    Provider 4  63        36             77             149            0.059187032  5.918703175  66   3
    [Chart 1: % of breakdown vs. lifetime (months 63 to 67)]
  • 61. Preventive Maintenance Strategy
    Provider    Break Down Age  Max Cut-off Age
    Provider 1  64              67
    Provider 2  66              69
    Provider 3  50              53
    Provider 4  63              67
    • Break Down Age: the lifetime at or after which the respective provider should be notified for maintenance.
    • Max Cut-off Age: the lifetime by which maintenance should be completed; otherwise the machine may break down at any time.
    • After the first cycle of maintenance is over, the same time period should be considered for the next cycle.
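    The maintenance strategy table can be turned into a simple lookup. The sketch below assumes, as the RUL estimator tables suggest, that remaining useful life is the Break Down Age minus the current lifetime. The status labels, the function names and the choice to treat a lifetime above the Max Cut-off Age as overdue are illustrative assumptions rather than the authors' implementation.

```python
# (Break Down Age, Max Cut-off Age) in months, from the strategy table.
SCHEDULE = {
    "Provider 1": (64, 67),
    "Provider 2": (66, 69),
    "Provider 3": (50, 53),
    "Provider 4": (63, 67),
}


def remaining_useful_life(provider, lifetime):
    """Months left before the Break Down Age (negative once it has passed)."""
    break_down_age, _ = SCHEDULE[provider]
    return break_down_age - lifetime


def maintenance_status(provider, lifetime):
    """Hypothetical status labels derived from the two thresholds."""
    break_down_age, max_cut_off_age = SCHEDULE[provider]
    if lifetime < break_down_age:
        return "no action"
    if lifetime <= max_cut_off_age:
        return "notify provider"  # maintenance window is open
    return "overdue"              # past the Max Cut-off Age


for provider, lifetime in [("Provider 1", 60), ("Provider 1", 65), ("Provider 3", 54)]:
    print(provider, lifetime,
          remaining_useful_life(provider, lifetime),
          maintenance_status(provider, lifetime))
```

    Note that this uses the Break Down Ages from the strategy table, which for some providers differ from the BDA column printed in the RUL estimator tables.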
  • 63. Pexitics Preventive Maintenance Project
    Submitted by Vicky Crasto & Arijit Mitra
    Problem Statement: data related to machine breakdowns is provided, which must be used to predict future occurrences and create a framework for preventive maintenance.
    Note: the entire R code is available on the server - C:Jig1324221-Pexitics Case study
    Overall Approach
    • Understand the distribution of lifetime and breakdown across team and provider.
    • Understand the distribution of lifetime and breakdown across team and provider based on the 3 pressure indices.
    • Examine the interaction between the 3 pressure indices and understand the distribution of breakdown.
    • Examine the distribution of breakdown across teams and providers.
    • Plot the Kaplan-Meier survival function for machine breakdown.
    • Plot the Kaplan-Meier survival function by team and provider.
    • Test statistically whether the survival function differs across team and provider.
    • Determine the parametric survival function and the appropriate distribution.
    • Use the regression model to predict the lifetime of each machine and identify the machines to be replaced urgently.
    • Fit a Cox proportional hazards model to determine the hazard rate and understand the relation between the covariates.
  • 64. Distribution of lifetime based on machine status
    We clearly see that the lifetime of machines that have broken is more than 60 months, with a median of around 79 months. On the other hand, most of the machines that have not broken have a lifetime between 20 and 60 months.
    Overview of the data: total no. of observations is 90000; 39.47% of machines are broken.
    Distribution of the machines by status and team:
    Machine status  Team A  Team B  Team C
    Not broken      22%     21%     18%
    Broken          12%     15%     12%
    Distribution of the machines by status and provider:
    Machine status  Provider 1  Provider 2  Provider 3  Provider 4
    Not broken      14%         18%         13%         16%
    Broken          11%         9%          11%         8%
  • 65. Understand the distribution of lifetime and breakdown across teams and providers
    The lifetime of the machines manufactured by Provider 3 is lower than that of the remaining providers. This needs to be tested further.
    The lifetime of the machines belonging to Team C is lower than that of the remaining teams. This needs to be tested further.
  • 66. Understand the distribution of lifetime and breakdown across teams and providers based on pressureInd_1
    Machines belonging to Team C have a lower lifetime across the range of pressureInd_1.
    As highlighted, machines from Provider 3 seem to have a lower lifetime, with pressureInd_1 spread across the range.
  • 67. Understand the distribution of lifetime and breakdown across teams and providers based on pressureInd_2
    Machines belonging to Team C have a lower lifetime across the range of pressureInd_2.
    As highlighted, machines from Provider 3 seem to have a lower lifetime, with pressureInd_2 spread across the range.
  • 68. Understand the distribution of lifetime and breakdown across teams and providers based on pressureInd_3
    Machines belonging to Team C have a lower lifetime across the range of pressureInd_3, and they seem to be breaking down at 3 distinct levels.
    As highlighted, machines from Provider 3 seem to have a lower lifetime, with pressureInd_3 spread across the range, and the breakdowns have been occurring at two distinct levels.
  • 69. Understand the distribution of breakdown based on the interaction between pressure indices
    As we see in the plots, machines tend to break at a lower pressure for pressureInd_3 than for the other 2 pressure indices.
  • 70. Distribution of breakdown across teams and providers
    As highlighted, the proportion of broken machines is higher in:
    • Provider 1 and Team A
    • Provider 1 and Team B
    • Provider 3 and Team B
    This needs to be investigated further.
  • 71. Plotting the Kaplan-Meier survival function for machine breakdown
    Things to note:
    • The survival probability decreases as the lifetime increases.
    • At each lifetime level the number of machines at risk decreases, because censored machines are removed from the risk set along with the machines that broke.
    Censored machines are observations where:
    • the machine does not experience the event,
    • the machine is lost during the follow-up period, or
    • the machine has been withdrawn from the study.
    [Table: survival probability at each level of lifetime]
    [Plot: Kaplan-Meier survival curve]
  • 72. Plotting the Kaplan-Meier survival function for machine breakdown by team
    Things to note:
    • The survival curve for Team C is different from those of Team A and Team B.
    [Table: survival probability at each level of lifetime]
    [Plot: Kaplan-Meier survival curves by team]
  • 73. Plotting the Kaplan-Meier survival function for machine breakdown by provider
    Things to note:
    • The survival curve for each provider is different. This must be tested statistically.
    [Table: survival probability at each level of lifetime]
    [Plot: Kaplan-Meier survival curves by provider]
  • 74. Testing statistically whether the survival function is different across team and provider
    The log-rank test is used to check whether the survival function differs across groups:
    H0: the survival function is the same across the groups
    H1: the survival function is different across the groups
    Things to note:
    • The p-value is almost zero, so we reject the null hypothesis and conclude that the survival function differs across team and across provider.
    • The p-value is not exactly zero but very close to it. Since the dataset is huge, even a very small difference is magnified and found to be significant.
    [Output: log-rank test for team]
    [Output: log-rank test for provider]
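    To make the risk-set logic on this slide concrete, here is a minimal sketch of the Kaplan-Meier product-limit estimator in plain Python. It is not the authors' R code; the function name `kaplan_meier` and the six toy observations are made up for illustration (event = 1 means the machine broke, 0 means it was censored).

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at every time with at least one event."""
    deaths = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # machines leaving the risk set (broken or censored)
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the share of the risk set that survives t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Both broken and censored machines drop out of the risk set after t.
        at_risk -= leaving[t]
    return curve


# Toy data (hypothetical): lifetimes in months and breakdown indicators.
lifetimes = [60, 62, 62, 70, 75, 80]
broken = [1, 0, 1, 1, 0, 1]
for time, prob in kaplan_meier(lifetimes, broken):
    print(f"month {time}: S(t) = {prob:.4f}")
```

    Censored machines never lower the curve directly; they only shrink the risk set, which makes later breakdowns weigh more heavily.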
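    For readers who want to see what the log-rank statistic computes, a sketch for the two-group case follows. The slide compares three teams and four providers, which needs the multi-group version of the test; the two-group simplification, the function name `logrank_test` and the toy data are assumptions made here. The p-value uses the fact that for a chi-square variable with 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).

```python
import math


def logrank_test(durations_a, events_a, durations_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    event_times = sorted(
        {t for t, e in zip(durations_a, events_a) if e}
        | {t for t, e in zip(durations_b, events_b) if e}
    )
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for d in durations_a if d >= t)  # group A still at risk
        n_b = sum(1 for d in durations_b if d >= t)  # group B still at risk
        d_a = sum(1 for d, e in zip(durations_a, events_a) if e and d == t)
        d_b = sum(1 for d, e in zip(durations_b, events_b) if e and d == t)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n  # expected events in A under H0
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    # Assumes variance > 0, i.e. both groups overlap in the risk set at some event time.
    chi_square = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Toy data (hypothetical): group A breaks early, group B breaks late.
early = [1, 2, 3, 4, 5]
late = [10, 11, 12, 13, 14]
print(logrank_test(early, [1] * 5, late, [1] * 5))
```

    With 90000 observations, even small gaps between observed and expected counts produce a large statistic, which matches the remark on the slide about near-zero p-values.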
  • 75. Determine the parametric survival function and the appropriate distribution
    Parametric models assume that the survival (or density) function is known up to K unknown parameters. We still need to determine the distribution underlying the survival function. To do this, we fit regression models with different distributions and compare their log-likelihood values to see which fits the data best.
    [Output: log-likelihood values for the various distributions]
    The lognormal distribution has the best log-likelihood value (the largest, i.e. the smallest in magnitude when the values are negative) and hence fits the data best.
    Using this regression model, we predict the lifetime of the machines that have not yet broken down and determine their remaining lifetime.
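    The model-selection step can be illustrated with a small, deliberately simplified sketch: fit two candidate distributions by maximum likelihood and keep the one with the larger log-likelihood. It ignores censoring (which the survival regression on the slide would account for), compares only the exponential and lognormal distributions, and uses made-up lifetimes; all names are illustrative assumptions.

```python
import math


def exponential_loglik(times):
    """Log-likelihood of an exponential fit (MLE rate = n / sum of times)."""
    rate = len(times) / sum(times)
    return sum(math.log(rate) - rate * t for t in times)


def lognormal_loglik(times):
    """Log-likelihood of a lognormal fit (MLE mean and std. dev. of log-times)."""
    logs = [math.log(t) for t in times]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / len(logs))
    return sum(
        -math.log(t * sigma * math.sqrt(2 * math.pi))
        - (math.log(t) - mu) ** 2 / (2 * sigma ** 2)
        for t in times
    )


# Hypothetical uncensored lifetimes (months) of broken machines.
lifetimes = [60, 65, 66, 73, 79, 80, 85, 88, 93]

candidates = {
    "exponential": exponential_loglik(lifetimes),
    "lognormal": lognormal_loglik(lifetimes),
}
best = max(candidates, key=candidates.get)  # larger log-likelihood = better fit
for name, loglik in candidates.items():
    print(f"{name:12s} log-likelihood = {loglik:.2f}")
print("best fit:", best)
```

    When the candidate models have different numbers of parameters, a penalised criterion such as AIC is a common alternative to comparing raw log-likelihoods.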
  • 76. Identifying the machines to be replaced urgently
    Using the remaining lifetime, we divide the machines into three groups:
    Remaining lifetime      Label
    Less than 15 months     Need urgent attention
    15 to 50 months         Maintenance needed in short term
    Greater than 50 months  Maintenance needed in long term
    As highlighted, machines manufactured by Provider 3 and belonging to Team A and Team B have a higher proportion of machines in need of immediate maintenance.
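    The three labels on this slide amount to a simple threshold rule on the predicted remaining lifetime. A minimal sketch, with the function name `maintenance_label` chosen here for illustration:

```python
def maintenance_label(remaining_months):
    """Map a predicted remaining lifetime (months) to the slide's three labels."""
    if remaining_months < 15:
        return "Need urgent attention"
    if remaining_months <= 50:
        return "Maintenance needed in short term"
    return "Maintenance needed in long term"


for months in (3, 15, 50, 72):
    print(months, "->", maintenance_label(months))
```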
  • 77. Fit a Cox proportional hazards model to determine the hazard rate and understand the relation between the covariates
    The Cox proportional hazards model is a semi-parametric model: it does not assume any underlying distribution for the baseline hazard function, but it does assume a parametric form for the effect of the covariates.
    [Output: significance of the individual covariates and their interactions]
    [Output: performance of the model]
    We see that all 3 pressure indices are significant. In addition, the interaction between Team B and Provider 2 and the interaction between Team B and Provider 3 are significant.
    The exp(coefficient) values are very small, which makes it difficult to translate them into a meaningful equation.
    On the whole, the model explains 71% of the variance in the data, which is good.
  • 78. Recommendations
    • Management must look into the machines belonging to Team B and the machines manufactured by Provider 3.
    • Preventive maintenance must be carried out according to the labels provided by the parametric regression model.
    • Best practices of machine maintenance followed by Team A and Team C must be documented and shared with all the teams.
    • A machine manufacturing audit can be carried out to understand the quality of the spares used in the machines, so that frequent breakdowns can be avoided.
    Areas of improvement for the model
    • Determine the performance of the parametric model by dividing the data into model and validation datasets, and plot the lift chart to determine how well the model is working.
    • Fine-tune the Cox proportional hazards model and determine the hazard ratio for each covariate.
  • 80. Pexitics Preventive Maintenance Data Analysis Project - Deepeshkumar Malviya
  • 81. OBJECTIVE & SCOPE
    Pexitics would like to build a model to predict machine breakdown and use preventive maintenance to reduce downtime. The insights and results obtained from the predictive model should be indicative enough to create a framework for identifying breakdowns, and suggestive enough to enable stakeholders to take corrective action.
    Understanding of the data: the dataset provides 90000 instances of maintenance work done on various machines across 1 year. It appears to comprise periodic sensor readings picked up from a random sample of observations during the period, where each record gives an instance reading of the respective machine status in the form of:
    • Readings of the pressure indicators at that instance
    • The only machine-specific attribute, 'Lifetime'
    • Representative information about usage (the team handling the machine in the factory)
    • Machine entity (manufacturer) information
    • Random observation number
    • Breakdown status
    It is imperative to preprocess and rationalize the given data to check data sanity, create derived variables and possibly restructure the data based on assumptions for model building. At the outset, the data appears to be sensor data, so it is very structured.
  • 82. APPROACH
    The major part of the analysis revolves around pressures, the stakeholders involved and lifetimes, in decreasing order of importance. Since the machine functioning parameters are not given, they are assumed to be standard, and a lot depends on the interplay of external environmental factors. There has to be a controllable element for changing the pressures, but the right combination for optimal machine performance, given wear and tear and the handling of the machines, seems to be complex. Hence the failure behavior does not seem to be linear and cannot be predicted in a straightforward fashion.
    Important considerations: three types of interactions, arising from segmented behaviors, are important for understanding the breakdown phenomenon and for later model building.
    A. Interactions among stakeholders: the interaction of the variables 'Team' and 'Provider'. The rationale is two-fold:
    - The probability of failure depends heavily on usage patterns, even when the machines are standard and reliable.
    - Serious defects at the time of manufacturing hamper the performance of the machines even when usage patterns are standard.
    In practical business scenarios, the accountability for machine maintenance, and hence the financial obligation, lies with the team/factory/department. Also, SLAs and contracts for machines from a provider/company are reviewed after a holistic review across the entire organization, in which multiple teams/factories/departments provide their performance reports. Hence the variables 'team' and 'provider' cannot be viewed separately; a collective assessment gives better insight into the net effect.
    B. Interactions among pressures: the pressures ought to have significant interactions among themselves, governed by domain-specific as well as process-specific laws of applied physics and mathematics. These pressures can also act as environmental stress conditions.
  • 83. APPROACH
    C. Interactions arising out of lifetimes: although the lifetimes are in months, the records do not belong to equally spaced time periods, so the random observations do not have time series implications. The importance of the lifetime data is further reduced by the fact that the spread of the random observations is not equal for all lifetimes. Hence no analysis is done with lifetime as the main driving factor; rather it is used as supplementary (though very useful) information to derive failure behavior based on age.
    Broad assumptions about the data:
    • The variable 'S.no' only signifies chronological readings.
    • Each machinery breakdown reading has no dependency on the breakdown behavior of the subsequent breakdown reading.
    • At any given instance of data recording, no conditional or joint probability exists between pressures acting on one machine of lifetime 'x' and pressures acting on another machine of lifetime 'y' from the same manufacturer (e.g. Provider1) and belonging to the same factory (e.g. TeamA).
    • A value of 1 in the variable 'broken' signifies complete breakdown, not a partial working condition.
    Techniques/methods of analysis:
    • Cluster analysis: unsupervised learning method for segmentation based on a distance measure (proximity).
    • Markov chain model: a stochastic (random) model for deriving sequences of events, and then the probability of events depending on previously attained events.
    • Exploratory analysis: data manipulation and data visualization to draw insights in the modeling process.
  • 84. APPROACH
    Cluster Analysis:
    Rationale: it aims to identify segments that exhibit similar behavior towards failure/machine breakdown conditions.
    Premises, importance & thought process: the variable 'broken' describes failures in a very objective sense, as 0 and 1. To deduce qualitative information about the data, it is imperative to find patterns in the data beyond binary outcomes. Apart from failure status, the given data is used to derive important metrics (explained in detail in the next section) as a proxy for the interplay of factors influencing failure behavior. Cluster analysis takes these metrics to analyze unexpected fluctuations from normal conditions. It is aimed at finding distinct segments based on working conditions without knowledge of a baseline threshold working model, purely based on variance in the data.
    The cluster analysis considers only fluctuations of the pressures for the set of conditions, taking into account the net effect of 'team', 'provider' and 'lifetime'. Since no benchmark is available, fluctuations 'above' or 'below' standard working conditions are obtained by numerically assessing the distribution of the given phenomenon. A key part of the analysis is 'standardization' of the data around its mean, measured in terms of standard deviations.
    The data for clustering comprises unique records for 'lifetime' and the three pressure values. This is a relatively small dataset (1000 records) but includes all the possible values that the pressures can take for all given lifetimes. The analysis required multi-phased sequential re-clustering to capture finer fluctuations. The rationale is that these finer fluctuations can take different sizes on the complete data, and hence none of them could be neglected.
  • 85. APPROACH
    Markov Chain Model:
    Rationale: it aims to evaluate and establish the probabilistic nature of failure conditions.
    Premises, importance & thought process: in the absence of a time element in the analysis, it is not possible to evaluate or establish any time-based metric. Hence much of Age-to-Failure (Mean Time Between Failure) analysis and Life Data analysis (parametric Weibull distribution), along with the associated time-dependent reliability analysis of machinery breakdowns, cannot be performed. To facilitate analysis on 'state-space' as opposed to 'time-parameter', a Markov chain model is used. It aids prediction of future states based solely on the inter-relationships of the sequential occurrence of states in the past.
    Conventionally, the inputs to a Markov chain model are the distinct states. The rationale behind the cluster analysis is to identify these states. Since the data is chronologically arranged for each combination of 'team' & 'provider' along with 'lifetime', the cluster membership reflects the sequential states through which a machine progresses: normal, possibly sub-optimal conditions, and then failure.
    The results from the Markov chain are 'transition probabilities', i.e. the likelihood of going from one state to another. Since the cluster memberships are taken directly as states, there is a scenario in which two sequential states should not be considered. This happens when two non-comparable states come together, which arises when two consecutive rows belong to two different sets, e.g. 'TeamA_Provider1' and 'Team_Provider2'. These cases are very few in the total data, as it is sorted so as to avoid them. The resulting probabilities of such cases are very low and do not have an overall impact.
  • 86. APPROACH
    Exploratory Analysis:
    Rationale: it is used for unearthing insights about failure behavior at various stages. Exploratory analysis guides the course of the analysis and also helps at critical junctures when evaluating parameters for statistical model customization. Visualizations are an important part of the exploratory analysis and are used on specific occasions.
    Functionally, it has the following importance in the analysis:
    A. Insights-driven (both pre- and post-modeling) judgment: since the whole approach is oriented around derived metrics, exploratory analysis is used to verify the suitability of such metrics from a business point of view, mainly before modeling. After modeling, it provides interpretability and helps to infer important business-critical information.
    B. Modeling diagnostics (model improvement) and results (analytical importance): the clustering diagnostics are influenced by rules of thumb describing best practices for model-specific parameters. Clustering results are shown for the performance of key metrics with respect to failures only. The results from the Markov chain model are included to provide better business context, i.e. the probabilities obtained are linked to cluster profiles exhibiting transient (changing) failures.
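    As an illustration of the transition-probability idea, the sketch below counts consecutive state pairs in chronologically sorted records and skips pairs that straddle two different 'team_provider' groups, which is one way to handle the non-comparable states described above. The state names, the record layout and the function name `transition_probabilities` are hypothetical; the author's states come from cluster memberships.

```python
from collections import Counter, defaultdict


def transition_probabilities(records):
    """records: chronologically sorted list of (team_provider, state) tuples."""
    counts = defaultdict(Counter)
    for (group_prev, state_prev), (group_next, state_next) in zip(records, records[1:]):
        if group_prev != group_next:
            continue  # consecutive rows from different sets are not comparable
        counts[state_prev][state_next] += 1
    # Normalise each row of counts into probabilities.
    return {
        state: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for state, followers in counts.items()
    }


# Hypothetical example data.
records = [
    ("TeamA_Provider1", "normal"),
    ("TeamA_Provider1", "normal"),
    ("TeamA_Provider1", "sub-optimal"),
    ("TeamA_Provider1", "failure"),
    ("TeamA_Provider2", "normal"),
    ("TeamA_Provider2", "sub-optimal"),
    ("TeamA_Provider2", "sub-optimal"),
    ("TeamA_Provider2", "failure"),
]
probs = transition_probabilities(records)
for state, followers in probs.items():
    print(state, "->", followers)
```

    In this toy data the pair failure -> normal is never counted, because it only occurs across the boundary between the two groups.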
  • 87. STEPS IN ANALYSIS A. Data Preparation: Outliers in the dataset are identified from the derived metrics, but no special treatment is applied, for two reasons: a. Clustering is sensitive to outliers, so they will be filtered out in any case. b. The Markov chain gives probabilities of co-occurrence; outliers will have very low probabilities and hence will be ignored. Data preparation proceeds along the following lines. Derived Variables creation: Derived variables are created to help summarise the data and to define the units of aggregation. They are ultimately key to drawing insights and building a model around them. Many variables are created during data manipulation throughout the analysis, but only the important ones are listed below: 1. ‘team_provider’ & ‘life_pres_all’: Concatenated variables indicating interactions among the given variables. 2. ‘pr1_pr2_corres_pr3’, ‘pr2_pr3_corres_pr1’, ‘pr3_pr2_corres_pr1’: Calculated metrics capturing the interactions among the pressure values. Each is the product of the first two pressure values divided by the third, so that all changes are captured. 3. ‘normz_int1’, ‘normz_int2’, ‘normz_int3’: Standardized scores for the three variables in pt. 2 above, i.e. differences from the mean (of the data grouped on lifetime and team_provider) expressed in standard deviations. 4. ‘std_int1’, ‘std_int2’, ‘std_int3’: Following the same standardized-score philosophy, population statistics are compared with cluster statistics. Interaction 1 (int1) stands for the first metric in pt. 2 above, and likewise for the other two interactions. 5. ‘Int1_Inc_ge_0.45_1_stdev’, ‘Int1_Inc_ge_1_1.5_stdev’, ‘Int1_Inc_gt_1.5_stdev’: For ‘std_int1’ in pt. 4 above, the magnitude of increase in standard deviations at three distinct levels. Likewise for three levels of decrease, and then for the other two interactions. These are used for cluster profiling and for naming the states in Markov chain terminology. 6. Links & Nodes: Used mainly for the network diagram, and used extensively in data manipulation to obtain the required matrices.
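The derived variables in points 2, 3 and 5 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the third interaction is assumed to be pr3 × pr1 / pr2 (so that each pressure serves once as the divisor), the population standard deviation is assumed for the grouped scores, and the band boundaries and the "Dec" label are inferred from the variable names rather than stated in the deck.

```python
from collections import defaultdict
from statistics import mean, pstdev

def interactions(pr1, pr2, pr3):
    """Three pairwise-product-over-third metrics (third one is an assumption)."""
    return (pr1 * pr2 / pr3, pr2 * pr3 / pr1, pr3 * pr1 / pr2)

def grouped_zscores(values, groups):
    """Standard score of each value relative to its own group's mean and
    population standard deviation (0.0 when the group has no spread)."""
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    stats = {g: (mean(vs), pstdev(vs)) for g, vs in by_group.items()}
    scores = []
    for v, g in zip(values, groups):
        m, s = stats[g]
        scores.append((v - m) / s if s > 0 else 0.0)
    return scores

def deviation_level(z):
    """Map a deviation (in standard deviations) to one of the three bands."""
    size = abs(z)
    if size < 0.45:
        return "none"
    direction = "Inc" if z > 0 else "Dec"
    if size < 1.0:
        band = "ge_0.45_1"
    elif size <= 1.5:
        band = "ge_1_1.5"
    else:
        band = "gt_1.5"
    return f"{direction}_{band}_stdev"

# Hypothetical pressure readings and group key (lifetime + team_provider).
int1, int2, int3 = interactions(100.0, 80.0, 50.0)
z = grouped_zscores([1.0, 2.0, 3.0], ["g1", "g1", "g1"])
print(int1, int2, int3, z, deviation_level(z[2]))
```

The 0.45 lower bound mirrors the threshold the deck uses for cluster profiling; values below it are treated as showing no notable deviation.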
  • 88. STEPS IN ANALYSIS B. Broad Steps in Analysis: 1. Initial exploratory analysis to understand the data and its distributions. 2. Creation of derived variables and detection (not removal) of potential outliers based on the distribution of the calculated metrics. 3. Cluster Analysis: a. Create the data for clustering and perform data checks/exploration. b. Run hierarchical clustering to find the optimal number of clusters, first creating a dissimilarity matrix and then confirming visually through dendrograms built with different cluster linkage methods. c. Scale the complete data and perform detailed clustering diagnostics on the scaled data to arrive at the optimal clusters. d. Run hierarchical clustering again for the optimal number of clusters to obtain the cluster centers for K-means clustering. e. Perform K-means clustering on the scaled data, using the cluster centers obtained for the optimal number of clusters. f. Re-cluster the data following the above steps and append the cluster information. g. Profile the clusters by comparing population characteristics with cluster characteristics, and visualize the data graphically. 4. Markov Chain Modeling: a. Create the data for Markov chain modeling and perform data checks/exploration. b. Create the sequence matrix based on cluster memberships and then the transition probabilities matrix. c. Manipulate the data to obtain probabilities only for transient states, i.e. changes between distinct states involving failures. 5. Visualization: a. Extensive data manipulation, including creation of new metrics, to arrive at the right data for the visualizations. b. Visualization 1: Illustrates the magnitude of deviations in the metrics for the clusters with transient failure conditions. c. Visualization 2: Illustrates the association between transient states based on the transition probabilities.
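Steps 3d and 3e above (seeding K-means with centers derived from a hierarchical clustering) can be sketched as follows. This is a simplified pure-Python illustration, not the authors' R workflow: the hierarchical step is assumed to have already produced one label per point (the hypothetical `initial_labels`), the data are assumed to be scaled, and `centroids` and `kmeans` are illustrative helper names.

```python
from math import dist

def centroids(points, labels):
    """Mean point of each label, returned in sorted label order."""
    groups = {}
    for p, l in zip(points, labels):
        groups.setdefault(l, []).append(p)
    return [tuple(sum(c) / len(g) for c in zip(*g))
            for _, g in sorted(groups.items())]

def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm started from the supplied centers."""
    centers = list(centers)
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest center by Euclidean distance.
        labels = [min(range(len(centers)), key=lambda k: dist(p, centers[k]))
                  for p in points]
        # Update step: recompute each center; keep the old one if it is empty.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, l in zip(points, labels) if l == k]
            new_centers.append(
                tuple(sum(c) / len(members) for c in zip(*members))
                if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Hypothetical scaled data and labels taken from a dendrogram cut.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
initial_labels = [0, 0, 0, 1, 1, 1]
seeds = centroids(points, initial_labels)
labels, centers = kmeans(points, seeds)
print(labels, centers)
```

Seeding K-means this way removes the dependence on random starting centers, which is one plausible reason for combining the two methods.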
  • 89. STEPS IN ANALYSIS C. Key Statistical Methodologies/Diagnostic Evaluation metrics used in Analysis: • Scaling: Scaling is used to normalize the data. It adopts the same standardization technique used earlier to calculate the pressure metrics. The rationale is to make the complete data comparable when calculating distances, whatever the distance measure and cluster linkage method. • Clustering - Average & Ward.D2: These are the linkage methods used between groups during clustering. Average linkage uses the average of the distances between all pairs of objects across two clusters. Ward.D2 is an improvement over the Ward method. The Ward method minimizes the total within-cluster variance, i.e. at each iteration it merges the pair of clusters that leads to the minimum increase in total within-cluster variance. Ward.D2 implements the criterion in which dissimilarities are squared before cluster updating. • Pseudo F-statistic: The pseudo F-statistic is intended to capture the 'tightness' of clusters and is the ratio of between-cluster variance to within-cluster variance. The optimal number of clusters is the one with the maximum value among all the cluster counts considered. D. Important Thresholds considered in the Analysis: • Minimum proportion of failures of 40% for states that exhibit a machine breakdown tendency, implying that machines must have failed in at least 40% of the instances in a given cluster across the complete period under consideration. • Minimum proportion of failures of 10% for states that exhibit transient machine breakdown tendency, implying that in at least 10% of all failing conditions of machines in a cluster, the subsequent condition differs from the previous failing condition. • 0.45 standard deviations as the lower limit of the qualifying condition for fluctuation of cluster means from population means for the metrics. The lower limit is 0.45 and not 0.50 (which would give equal differences among the three levels) because some values hover around 0.50 and would be excluded if the lower limit were not relaxed slightly. • 90% confidence for calculating the transition matrix probabilities. Since the data is not evenly spread, the confidence level is relaxed to a 10% risk.
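The pseudo F-statistic described above can be written out explicitly. The sketch below assumes the common Calinski-Harabasz form, F = (BSS / (k − 1)) / (WSS / (n − k)), where BSS and WSS are the between-cluster and within-cluster sums of squares, k is the number of clusters and n the number of points. The deck does not show its custom function, so this is an assumption about its form rather than a reproduction of it, and `pseudo_f` is a hypothetical name.

```python
def pseudo_f(points, labels):
    """Calinski-Harabasz style pseudo F-statistic for a given partition.

    points: list of equal-length numeric tuples (assumed already scaled)
    labels: cluster label for each point; needs 2 <= k < n clusters
    """
    n = len(points)
    dims = range(len(points[0]))
    overall = [sum(p[d] for p in points) / n for d in dims]

    groups = {}
    for p, l in zip(points, labels):
        groups.setdefault(l, []).append(p)
    k = len(groups)

    bss = 0.0  # between-cluster sum of squares
    wss = 0.0  # within-cluster sum of squares
    for members in groups.values():
        center = [sum(p[d] for p in members) / len(members) for d in dims]
        bss += len(members) * sum((center[d] - overall[d]) ** 2 for d in dims)
        wss += sum((p[d] - center[d]) ** 2 for p in members for d in dims)

    return (bss / (k - 1)) / (wss / (n - k))

# Hypothetical one-dimensional example with a tight and a loose partition.
data = [(0.0,), (2.0,), (10.0,), (12.0,)]
tight = pseudo_f(data, [0, 0, 1, 1])
loose = pseudo_f(data, [0, 1, 0, 1])
print(tight, loose)
```

A tighter, better-separated partition gives a larger value, which is why the candidate number of clusters with the maximum statistic is preferred.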
  • 90. VISUALIZATIONS : EXPLORATORY ANALYSIS The variable ‘avg_3way_int’ is the composite average of the variables ‘pr1_pr2_corres_pr3’, ‘pr2_pr3_corres_pr1’ and ‘pr3_pr2_corres_pr1’ (explained in the earlier section). As the name suggests, it indicates the average behavior of the three interactions. The distribution looks close to normal and symmetric about the mean, implying that, on average, fluctuations in one interaction are compensated by another. However, there are some extreme cases; on close inspection, 523 cases were found in the tails of the curve.
  • 91. VISUALIZATIONS : CLUSTERING DIAGNOSTICS The dendrogram shows the agglomerative hierarchical clustering as a tree diagram. As previously mentioned, the rationale of the clustering is to get as many justifiable and distinct clusters as possible. The red rectangles suggest 22 as the number of clusters to use for the first-level clusters. The pseudo F-statistic is calculated with a custom-built function for every number of clusters obtained from K-means clustering. The plot suggests that the maximum pseudo F-statistic is obtained at 22 clusters (the plot starts from value 2), and hence the optimal number of clusters is 22. Deliberately, no seed is set in the custom function: the aim is to check reliability rather than to ensure reproducibility, so the results vary randomly from run to run. Many potential outliers are included (when only the reduced dataset of 1000 is considered), and hence many iterations were required to deduce that the optimal number of clusters lies in the range 16-22. To take all possibilities into consideration, 22 was chosen as the optimal number, since it was also suggested by the hierarchical clustering above.
  • 92. VISUALIZATIONS : CLUSTER ANALYSIS RESULTS Population statistics are compared with cluster statistics only for clusters meeting the chosen failure conditions (at least 40% failing and at least 10% transient). The externally drawn red lines mark the lower limit (0.45) of the thresholds used for cluster profiling. The visualization suggests that only three clusters, viz. 13, 36 & 39, show no significant fluctuation beyond the threshold set.
  • 93. VISUALIZATIONS : MARKOV CHAIN RESULTS The interactive network diagram, known as a ‘Sankey diagram’, shows the flow between the transient states. The width of the band between two nodes denotes the probability of changing from one state to another as the immediately subsequent state (the probabilities are visible in R; they are not visible here because this is an image). The three clusters 13, 36 & 39 identified previously, which show no significant increase or decrease, are represented as ‘No_major_differentiator’. The three levels (pt. 5 in data preparation) are labeled Slight (Sli), Moderate (Mod) & Extreme (Ext), along with Increase (High) & Decrease (Low), for the interactions (Int1, Int2 & Int3).
  • 94. SYNOPSIS OF THE ANALYSIS In a nutshell, the broad philosophy of the analysis is: • Identify the key players responsible for the phenomenon, i.e. machine breakdown, and then use them to measure aggregated and comparable behavior; here, in decreasing order: the three pressures, the stakeholders and then lifetime. • In the absence of any business information about the process and the machines, create metrics that capture any important data indicating failure behavior. Failures (owing to extrinsic factors) generally happen only when there are serious deviations from normal conditions. Hence standard scores are calculated, and any large deviation in them reflects non-acceptable behavior. • Since the business parameters are missing, the criticality of outliers cannot be ascertained. Hence the algorithm is chosen strategically so that outliers become part of the result and yet do not affect model behavior, unlike parametric regression methods, in which outliers can seriously affect the results (the beta estimates). Cluster analysis is the chosen algorithm; it provides segments that reveal the different patterns in the fluctuations. • To calculate the probabilistic nature of failures, the cluster memberships are used as input. A Markov chain calculates the probabilities of sequential co-occurrence of states and is hence preferred.
  • 95. SHORTCOMINGS Shortcomings of the Analysis: • The data is not fully representative, i.e. it is not equally spread across all lifetimes, so predictive ability cannot be measured with full precision, although the Markov chain can predict the likely states. Hence, although the objective of developing a framework for predicting failures is achieved, the model does not represent behavior holistically and cannot be deployed in production. This also implies that the resulting failure probabilities need to be revisited in light of complete data in which all lifetimes are covered for all stakeholders. • No direct association with the business side of preventive maintenance, such as economic losses and strategic implications (e.g. capacity planning), can be measured or benchmarked, as no such representative information is present. As a result, no business success metrics or milestones can be defined or recommended; only the analytical methodology is explained here. • No definitive domain intelligence can be integrated with the results, since information on the type and purpose of the machines, their criticality and machine-specific attribution is not available.