Jigsaw Academy conducted a case study contest for its students in collaboration with Pexitics Analytics. These are the submissions from the different student groups.
Jigsaw Corporate Contest: Pexitics Preventive Maintenance Case Study (Anupama Rathore)
The winning presentation of the Jigsaw Academy Corporate Contest.
Brief Description of the case study:
A company that produces corrugated boxes with a honeycomb pattern wants to build a model to predict breakdowns and use preventive maintenance to reduce downtime (time during which production is stopped, especially during setup for an operation or when making repairs). The dataset provided had 90,000 instances of maintenance work done over one year across various machines. Jigsaw students were expected to analyze the data and determine: whether the data was consistent across the various teams and providers; whether the analysis needs to be segmented (by team, provider or any other variable) and, if so, how the segmentation should be done; and how rules can be created across pressure levels and the lifetime of a machine to predict breakage.
Recommendations for Preventive Maintenance - A Machine Learning Project (Pranov Mishra)
A business problem - finding a way to reduce time wasted in the manufacturing unit due to machines breaking down - was solved by building a decision tree model using the CART algorithm. High-level details are below:
A thorough analysis was done to identify whether there are ways of knowing which machines have higher probabilities of breaking down. The ultimate goal of the management is to improve the productivity of the company by ensuring minimal or no stoppage of work at any point in time.
The idea of reviewing the data is to come up with an implementable framework and establish protocols that give visibility into machine health status, so that remedial steps can be taken proactively before an actual breakdown. The post-analysis summary and recommendations are given below:
Machines delivered by Provider 3 break down much earlier, as early as at 60 months. Management needs to discuss whether they should continue with Provider 3 and/or initiate discussions to get them to improve the quality of their delivered products.
In the interim, mandate a monthly review of all Provider 3 machines aged more than 60 months.
Mandate a monthly review of all machines older than 72.5 months that are provided by Providers 1, 2 and 4.
Essentially, all machines older than 72.5 months will need a monthly preventive maintenance review.
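The CART approach summarized above can be sketched with scikit-learn. The synthetic data below merely mirrors the stated findings (Provider 3 breaking after 60 months, the rest after 72.5 months); the column names and the generating rule are illustrative assumptions, not the contest dataset.

```python
# Minimal CART sketch of the preventive-maintenance rules above.
# Synthetic data; "lifetime" and "provider" are assumed column names.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 1000
lifetime = rng.uniform(0, 93, n)        # machine age in months
provider = rng.integers(1, 5, n)        # providers 1..4
# Toy labelling rule mirroring the findings: Provider 3 machines break
# after 60 months, machines from the other providers after 72.5 months.
broken = ((provider == 3) & (lifetime > 60)) | (lifetime > 72.5)

X = np.column_stack([lifetime, provider])
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, broken)

# A 70-month-old Provider 3 machine is flagged; a 50-month-old one is not.
flag_p3_70 = tree.predict([[70, 3]])[0]
flag_p3_50 = tree.predict([[50, 3]])[0]
```

On this toy rule the tree recovers the 60-month threshold for Provider 3 and the 72.5-month threshold for the other providers, which is exactly the kind of split structure the recommendations above describe.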
5. Problem Statement:
The machine in the diagram shows high-pressure air entering the machine and low-pressure air exiting, while water pressure is applied at the bottom.
We can clearly see that the range for pressure 2 is lower than for pressure 1 and pressure 3, indicating that pressure 2 is likely the water pressure.
The lower pressure applied here can be due to two reasons:
▪ The viscosity of water is higher than that of air
▪ Water is incompressible, unlike air
6. Problem Statement:
▪ In 90,000 instances of maintenance work done by a company across various machines, the breakage rate in a year was 39.47%.
▪ The company now wants to build a model to predict breakdowns by analyzing factors like pressure indicator points, machine lifetime, team usage and service provider, and to use preventive maintenance to reduce downtime.
▪ The company is also looking for measures in terms of creating rules across pressure levels and the lifetime of a machine, and creating segments across teams and service providers, to predict and prevent breakage.
11. Breakage % by Teams - A, B, C
▪ Machines used by Team B have the highest breakdown rate (15%), as compared to Team A (12%) and Team C (12%).
12. Breakage % by Service Providers - 1, 2, 3 & 4
▪ Manufacturer (Provider) 4 has the lowest breakdown rate (8%), as compared to manufacturer 1 (11%) & manufacturer 3 (11%).
13. Distribution of Machine Life (in months)
▪ At an overall level, the average lifetime of a machine is 55 months.
▪ Machines start breaking when lifetime reaches 60 months and continue to break until 93 months, with a peak at 60 - 80 months.
[Chart: % of broken machines by lifetime bucket (within broken = 1) - 0 - 59 months: 0; 60 - 70 months: 28.6; 71 - 80 months: 28.8; 81 - 90 months: 26.08; 91 - 93 months: 16.51]
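The lifetime bucketing behind the chart above can be sketched with `pandas.cut`; the toy lifetimes below are illustrative stand-ins, not the contest data.

```python
# Sketch of bucketing broken machines' lifetimes into the chart's bins.
import pandas as pd

broken_lifetime = pd.Series([55, 62, 65, 75, 78, 85, 92])  # toy values (months)
bins = [0, 59, 70, 80, 90, 93]
labels = ["0 - 59", "60 - 70", "71 - 80", "81 - 90", "91 - 93"]
buckets = pd.cut(broken_lifetime, bins=bins, labels=labels)
dist = buckets.value_counts(normalize=True).sort_index() * 100  # % per bucket
```

`dist` then holds the percentage of broken machines per lifetime bucket, the quantity plotted on the slide.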
14. Distribution of Pressure Indicator Points 1, 2 & 3
▪ Most of the machines break at pressure indicator 1 between 63.0 and 112.5.
▪ Most of the machines break at pressure indicator 2 between 78.1 and 102.0.
▪ Most of the machines break at pressure indicator 3 between 70.1 and 110.0.
17. Breakage % across Providers & Teams (within broken = 1)
▪ Among the broken machines, more than 60% of Team B's machines come from manufacturer 1 and manufacturer 3 (which are the most likely to break); hence this team has a higher probability of breakage.
▪ Provider 4 has the lowest breakdown rate for Team A and Team B, whereas Provider 2 should be the preferred vendor for Team C.
18. Current scenario across Providers & Teams (within broken = 0)
▪ Currently, Team A is buying the highest number of machines from Provider 2, followed by Provider 1, instead of from its preferred vendor, Provider 4.
▪ Team B is using the lowest number of machines from Provider 4 (its preferred vendor).
▪ Team C, meanwhile, is taking the highest number of machines from Provider 2 (its preferred vendor).
20. Breakage Summary - Pressure Indicator 1, Teams & Providers
• At pressure indicator 1, most of the machines break between 63.0 and 112.5.
• In this faceted density graph, the probability of breakage is highest for machines from Provider 3 used by Team A.
• Breakage probability is also noticeably high for machines from Provider 4 used by Teams B & C.
21. Breakage Summary - Pressure Indicator 2, Teams & Providers
• At pressure indicator 2, most of the machines break between 78.1 and 102.0.
• In this faceted graph, except for machines provided by Provider 3 to Team B, all other instances of machines reflected a broken trend between 80-120 psi for pressure indicator 2.
• High breakage is noticed for machines provided by Provider 2 to Team A at 100 psi.
• Provider 3's machines have shown noticeable non-breakage for all teams at 90-110 psi.
22. Breakage Summary - Pressure Indicator 3, Teams & Providers
• At pressure indicator 3, most of the machines break between 70.1 and 110.
• In this faceted graph, a high breakage probability is noticeable for machines provided by Provider 3 to Team A and Team C, when pressure indicator 3 is between 90-120 psi.
• Non-breakage probability is relatively high when Team B uses machines from Provider 4 at 120 psi for pressure indicator 3.
23. Breakage Summary - Machine Life, Average Pressure & Provider
• Machines from Providers 4 and 2 show no breakage up to 80 months, while machines from Providers 1 & 3 reach 100% breakage by the same point.
• Machines from Provider 4 show a consistent life irrespective of pressure points.
• Machines from Provider 2 have a longer lifetime; only 29% breakage happens by 89 months, as compared to 100% for Provider 4.
25. Breakage Summary - Machine Life, Teams & Providers
• Machines provided by manufacturer 3 have the shortest lifetime; they break down completely at 66 months.
• Machines from Providers 1 & 3 show the maximum breakage.
• This graph also supports the conclusion that Provider 4 should be the preferred vendor for Team A and Team B, while Provider 2 should be the preferred vendor for Team C, on the basis of likelihood of breakage.
26. Breakage Summary - Machine Life > 60 Months, Teams & Providers
• Machines provided by Providers 1 and 3 constitute more than 57% of the breakage.
• 40% of machines from Team A & Team B show a breakage between 88 - 93 months.
28. Dummy Variables for Modelling
▪ Binned pressure indicator points & lifetime based on quartiles
▪ Converted teams & providers into numeric variables
▪ Created the average of the three pressure points as a variable
▪ Split the data into training and validation datasets in a 50:50 ratio

Dataset: 90,000 observations & 8 variables
Target variable: Broken (binary variable)
Data types: character variables = 2, numeric variables = 6
Missing values: none
Factors: machine life, pressure indicator points, teams, providers
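The preparation steps listed above (quartile binning, converting categoricals, a 50:50 split) were done in SAS; an equivalent sketch in Python, on toy columns whose names are assumptions, could look like this.

```python
# Sketch of quartile binning, dummy coding and a 50:50 split (toy data).
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "pressureInd_1": [63.0, 85.5, 97.2, 110.4, 72.1, 101.9, 90.0, 115.0],
    "team": ["A", "B", "C", "A", "B", "C", "A", "B"],
    "broken": [0, 0, 1, 1, 0, 1, 0, 1],
})

# Bin the pressure indicator into quartiles, as per the slide.
df["pressure_q"] = pd.qcut(df["pressureInd_1"], q=4, labels=["Q1", "Q2", "Q3", "Q4"])

# Convert the team factor into dummy (indicator) variables.
df = pd.get_dummies(df, columns=["team"], prefix="team")

# Split into training and validation sets in a 50:50 ratio.
train, valid = train_test_split(df, test_size=0.5, random_state=1)
```

`pd.qcut` reproduces the "binned based on quartiles" step, and `get_dummies` the conversion of factors into numeric model inputs.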
29. Outlier Detection - Box Plots
▪ Outliers are seen at the pressure points, but they have been taken into consideration while doing the modelling
▪ Data for the lifetime variable looks consistent; breakage appears after 59 months
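The box-plot rule used above flags points lying more than 1.5 x IQR beyond the quartiles; a small sketch on toy pressure values (not the contest data):

```python
# IQR-based outlier detection, the rule underlying a box plot's whiskers.
import numpy as np

pressure = np.array([88.0, 92.5, 95.0, 99.0, 101.5, 104.0, 108.0, 170.0])
q1, q3 = np.percentile(pressure, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = pressure[(pressure < lower) | (pressure > upper)]
```

Here only the 170.0 reading falls outside the whiskers; as the slide notes, such points were kept in the modelling rather than dropped.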
31. Explanation: Logistic Regression using SAS
▪ Fisher's binary logit was performed, with 45,000 observations read & used from the training dataset.
▪ The Response Profile in the output tells us how many observations were "successful"; the model here estimates the probability of the target variable "Broken" where broken = 1.
▪ The model convergence status is satisfied, i.e., the iterative procedure used to fit the model converged well.
▪ Model fit statistics - AIC, SC and -2LogL (lack of fit) - range up to 60,431 after variable transformations and bucketing. We are likely getting a high number here because we do not have a high number of independent variables.
▪ Testing the global null hypothesis Beta = 0 corresponds to the likelihood ratio, score and Wald tests, where the likelihood ratio compares the goodness of fit of two statistical models - null and alternative. For a good model, at least 1 of the 3 values needs to be significant (<0.05 at the 5% level of significance) for us to reject the null hypothesis. For our model, all 3 tests are significant at the 5% level of significance.
32. Explanation: Logistic Regression using SAS
▪ Maximum likelihood estimation (MLE) is the method used to estimate the parameters that influence the response; the p-value tests whether an estimate could have arisen purely by randomness. The estimate tells us that a unit change in x1 leads to a β1 change in the log odds of success of Y. A positive estimate shows a directly proportional relationship with the dependent variable, while a negative estimate shows an inversely proportional relationship.
▪ The odds ratio estimates give a point estimate together with upper and lower confidence limits, signifying that 95% of the time the true odds ratio will lie in the given range; the point estimate, exp(β1), is the multiplicative change in the odds of success of Y for a unit change in x1.
▪ Concordant and discordant pairs: since this is a binary classification model, every possible pair of responses (one actual 1 and one actual 0) is reviewed and compared against the model's predicted probabilities. If the predicted probability of the observation that was actually 1 is higher than that of the observation that was actually 0, the pair is concordant; otherwise it is discordant. The higher the concordance percentage, the better the model. Our model has a concordant percentage of 94.3.
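The concordant/discordant pair count described above can be computed directly; the probabilities below are toy values, not the model's output.

```python
# Count concordant and discordant pairs for a binary classifier.
from itertools import product

actual = [1, 1, 0, 0, 0]
prob = [0.9, 0.6, 0.7, 0.3, 0.2]  # predicted P(broken = 1), toy values

ones = [p for a, p in zip(actual, prob) if a == 1]   # probs of actual 1s
zeros = [p for a, p in zip(actual, prob) if a == 0]  # probs of actual 0s

concordant = sum(p1 > p0 for p1, p0 in product(ones, zeros))
discordant = sum(p1 < p0 for p1, p0 in product(ones, zeros))
percent_concordant = 100 * concordant / (len(ones) * len(zeros))
```

Every (actual 1, actual 0) pair is compared once, exactly as the slide describes; ties (equal probabilities) count as neither concordant nor discordant.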
33. Explanation: Logistic Regression using SAS
▪ Classification table: another method to evaluate the predictive accuracy of the logistic regression model. In this table, the observed values of the dependent outcome and the predicted values (at a user-defined cutoff value, for example p = 0.50) are cross-classified. Our model correctly predicts 86.7% of the cases at a cutoff value of 0.50.
▪ Gain chart / lift curve: a popular technique to measure the performance of a logistic regression model.
▪ Rank ordering: for our model, the rank ordering is well maintained, since the response rate decreases by decile. We have also captured 72% of the outcomes in the first 3 deciles. This is shown in the Excel file provided alongside this presentation.
▪ The entire dataset is divided into training and validation datasets in a 50:50 ratio. The model is first built on the training dataset and then validated on the validation dataset. We have successfully validated our model on the validation dataset with the same significant variables.
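The slides fit the model with SAS PROC LOGISTIC; a minimal Python equivalent of the fit and the cutoff-0.5 classification step, on toy one-feature data, might look like this.

```python
# Logistic fit plus a p = 0.50 classification rule (toy 1-D data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
lifetime = rng.uniform(0, 93, 500)
broken = (lifetime > 60).astype(int)   # toy target: breakage after 60 months

X = lifetime.reshape(-1, 1)
model = LogisticRegression(max_iter=1000).fit(X, broken)

# Cross-classify observed vs predicted at the 0.50 cutoff.
pred = (model.predict_proba(X)[:, 1] >= 0.5).astype(int)
accuracy = (pred == broken).mean()
```

`accuracy` here plays the role of the "correctly predicts X% of cases at cutoff 0.50" figure on the slide; a full classification table would cross-tabulate `pred` against `broken`.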
35. Model Insights
The variables that are significant according to the model, and their explanations, are given below:
Lifetime 80 - lifetime of machines ranging from 61 to 80 months. This variable has a positive estimate, signifying that machines in this bucket have log odds of breaking that are higher by 5.4286.
Lifetime 93 - lifetime of machines ranging from 81 to 93 months. This variable also has a positive estimate, signifying that machines in this bucket have log odds of breaking that are higher by 8.2307.
Insight - Hence the machines should be replaced by month 60.
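The estimates above are on the log-odds scale; exponentiating converts them to odds ratios. For example, taking the Lifetime 80 estimate (5.4286) from the slide:

```python
# Convert a log-odds estimate reported on the slide into an odds ratio.
import math

beta_lifetime80 = 5.4286                 # estimate from the slide (log-odds)
odds_ratio = math.exp(beta_lifetime80)   # multiplicative change in the odds
```

A positive estimate always yields an odds ratio above 1, i.e. higher odds of breakage; here the odds are over two hundred times higher than for the base lifetime bucket.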
36. Model Insights
PressureInd_197 - pressure indicator 1 ranging from 85.01 to 97.00. This variable has a negative estimate (-0.6753), signifying that machines operating in this pressure range have lower log odds of breaking.
PressureInd_1115 - pressure indicator 1 ranging from 97.01 to 115.00. This variable also has a negative estimate (-0.3858), signifying that machines in this range have lower log odds of breaking.
37. Model Insights
PressureInd_2100 - pressure indicator 2 ranging from 88.01 to 100.00. This variable has a negative estimate (-0.129), signifying that machines in this range have lower log odds of breaking.
PressureInd_3101 - pressure indicator 3 ranging from 88.01 to 101.00. This variable also has a negative estimate (-0.2199), signifying that machines in this range have lower log odds of breaking.
38. Model Insights
PressureInd_3173 - pressure indicator 3 ranging from 114.01 to 173.00. This variable has a positive estimate (0.2485), signifying that machines in this range have higher log odds of breaking.
Insight - Machines in the lower pressure bins (PressureInd_197, PressureInd_1115, PressureInd_2100 and PressureInd_3101) are less likely to break (negative estimates), as compared to PressureInd_3173, which has a higher likelihood of breakage (positive estimate).
39. Model Insights
Team B and Team C are significant variables, with Team A as the base; their positive estimates signify that machines used by these teams have higher log odds of breaking, by 0.2079 and 1.1111 respectively.
Insight - Machines used by Team C are more likely to break than those used by Team B.
40. Model Insights
Provider 3 is a significant variable, with Provider 1 as the base; its positive estimate signifies that machines provided by Provider 3 have log odds of breaking that are higher by 3.2723.
Provider 4 is a significant variable, with Provider 1 as the base; its negative estimate (-1.9197) signifies that machines provided by Provider 4 have lower log odds of breaking.
Insight - Machines provided by Provider 3 are more likely to break (large positive estimate) than machines provided by Provider 4 (large negative estimate).
41. Model Insights - Visualization
Top 6 factors affecting machine breakage:
▪ Lifetime 80 (61 to 80 months)
▪ Lifetime 93 (81 to 93 months)
▪ PressureInd_3173 (114.01 to 173.00)
▪ Team B
▪ Team C
▪ Provider 3
43. Profiling – Cluster 1
Variable | N | Cluster mean | Pop. mean | Pop. std. dev | Z-value
Lifetime | 7744 | 42.87 | 55.09 | 26.51 | -0.46
Broken | 7744 | 0.01 | 0.39 | 0.49 | -0.78
Pressure Indicator 1 | 7744 | 103.17 | 98.56 | 19.98 | 0.23
Pressure Indicator 2 | 7744 | 99.04 | 99.34 | 10.04 | -0.03
Pressure Indicator 3 | 7744 | 95.01 | 100.59 | 19.62 | -0.28
Team | 7744 | 1.96 | 1.97 | 0.8 | -0.01
Provider | 7744 | 3.85 | 2.47 | 1.11 | 1.24
7,744 machines, 9% of the total machines.
Here we want to focus most on the dependent variable "Broken", which is much lower in this cluster than normal, showing that these machines are the least likely to break within their product lifecycle (PLC).
The lifetime of the machines, however, is lower than average, indicating that these machines need to be replaced sooner than others.
The Provider mean, though, is much higher than normal (very close to 4), signifying that machines bought from manufacturer 4 are the least likely to break.
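The Z-values in the profile tables are (cluster mean - population mean) / population standard deviation; reproducing the "Broken" row of cluster 1 from the figures above:

```python
# Z-value for the "Broken" row of cluster 1, using the slide's figures.
cluster_mean, pop_mean, pop_std = 0.01, 0.39, 0.49
z_value = (cluster_mean - pop_mean) / pop_std  # standardized deviation
```

The result matches the -0.78 reported in the table, confirming how the profiling columns were derived.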
44. Profiling – Cluster 2
Variable | N | Cluster mean | Pop. mean | Pop. std. dev | Z-value
Lifetime | 11332 | 45.02 | 55.09 | 26.51 | -0.38
Broken | 11332 | 0.16 | 0.39 | 0.49 | -0.47
Pressure Indicator 1 | 11332 | 97.29 | 98.56 | 19.98 | -0.06
Pressure Indicator 2 | 11332 | 99.9 | 99.34 | 10.04 | 0.06
Pressure Indicator 3 | 11332 | 124.63 | 100.59 | 19.62 | 1.23
Team | 11332 | 1.94 | 1.97 | 0.8 | -0.04
Provider | 11332 | 2.23 | 2.47 | 1.11 | -0.22
11,332 machines, 13% of the total machines.
Here also we want to focus most on the dependent variable "Broken", which is much lower in this cluster than normal, showing that these machines are less likely to break within their product lifecycle (PLC).
Here too, the lifetime of the machines is lower than average, indicating that these machines need to be replaced sooner than others.
45. Profiling – Cluster 3
Variable | N | Cluster mean | Pop. mean | Pop. std. dev | Z-value
Lifetime | 16231 | 38.55 | 55.09 | 26.51 | -0.62
Broken | 16231 | 0.22 | 0.39 | 0.49 | -0.35
Pressure Indicator 1 | 16231 | 97.69 | 98.56 | 19.98 | -0.04
Pressure Indicator 2 | 16231 | 98.75 | 99.34 | 10.04 | -0.06
Pressure Indicator 3 | 16231 | 94.89 | 100.59 | 19.62 | -0.29
Team | 16231 | 2.43 | 1.97 | 0.8 | 0.58
Provider | 16231 | 2.15 | 2.47 | 1.11 | -0.29
16,231 machines, 18% of the total machines.
Here again we want to focus most on the dependent variable "Broken", which is lower in this cluster than normal, showing that these machines are less likely to break within their product lifecycle (PLC).
Here too, the lifetime of the machines is lower than average, indicating that these machines need to be replaced sooner than others.
Pressure indicator 3, though, is much lower than normal, signifying that the machines here are more likely to break at pressure indicator 3. Team and provider also differ from the average, suggesting that manufacturer 2 (mean < 2.5) should supply machines to Team 2.
46. Team-wise and provider-wise cluster distribution
Team A and Team C use the majority of the machines in all the clusters, signifying that machines used by Team A and Team C are the least likely to break (these are the clusters with the lowest probability of breakdown).
Provider 4 provides almost all the machines in cluster 1, which is the least likely to break.
Cluster | N | Team A % | Team B % | Team C %
1 | 7744 | 49.0 | 6.0 | 45.1
2 | 11332 | 47.5 | 10.6 | 41.9
3 | 16231 | 28.5 | 0 | 71.5
Cluster | N | Provider 1 % | Provider 2 % | Provider 3 % | Provider 4 %
1 | 7744 | 3.7 | 2.1 | 0 | 94.2
2 | 11332 | 35.0 | 31.8 | 8.1 | 25.1
3 | 16231 | 28.1 | 28.4 | 43.5 | 0
48. Sensors Installation
We recommend installing sensors and software that can predict/flag breakage at a safe lifetime and at safe pressure points.
Service repair as a measure of preventive maintenance
Regular maintenance protects your investment against unplanned breakdowns; hence, we recommend scheduled repair of machines at the following time intervals:
Recommendations
Provider | Time interval (between months) | Total no. of services | Flag-off month (sensors)
Provider 1 | 35-80 | 3 | 34
Provider 2 | 60-93 | 3 | 59
Provider 3 | 35-66 | 3 | 34
Provider 4 | 60-89 | 3 | 59
*The starting point for servicing machines from Providers 1 & 3 is 35 months, on the basis of the cluster analysis (least likelihood of breakage up to that point), as they are the most prone to breakage; for machines from Providers 2 & 4 it is 60 months, as there is no breakage until that point.
*The ending point for machines from all providers is decided on the basis of the entire lifespan of the machine.
49. Sensors and software at pressure points 1,2 & 3 to predict breakage
If sensors are applied at all three pressure points (at different pressure levels), we can readily predict and prevent a machine breaking due to rising pressure levels.
Recommendations

Pressure point     Pressure with highest likelihood of breakage   Flag-off pressure (sensors)
Pressure point 1   63.0 – 112.5                                   55.0
Pressure point 2   78.1 – 102.0                                   70.0
Pressure point 3   70.1 – 110.0                                   62.0
*The sensors are applied at the above mentioned pressure levels so as to give enough time for the teams to shut off the m/c or prevent excess pressure on the
machines.
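As an illustration, the flag-off rule above can be sketched as a simple threshold check. The threshold values are taken from the table; the function and field names are our own and are not part of the submission:

```python
# Hypothetical sketch of the sensor flag-off rule; thresholds from the table.
FLAG_OFF = {"pressureInd_1": 55.0, "pressureInd_2": 70.0, "pressureInd_3": 62.0}

def flag_readings(reading):
    """Return the pressure points whose reading meets or exceeds its
    flag-off threshold, so the team has time to shut the machine off."""
    return [point for point, limit in FLAG_OFF.items()
            if reading.get(point, 0.0) >= limit]

alerts = flag_readings({"pressureInd_1": 58.2,
                        "pressureInd_2": 40.0,
                        "pressureInd_3": 62.0})
# alerts == ["pressureInd_1", "pressureInd_3"]
```

In practice the thresholds would be tuned per machine segment, but the rule itself stays this simple.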
50. Preferred Vendor to Teams
Provider 2 should be the preferred vendor for Team C, based on likelihood of breakage.
Provider 4 should be the preferred vendor for Team A and Team B.
Based on the cluster analysis (clusters with the least likelihood of breakage), we believe that to successfully prevent a machine breakdown, servicing should start from the 35th month of operation.
Last but not least, for the future it would be advisable not to purchase machines from Provider 1 and Provider 3; for now, we have provided the necessary steps to prevent breakage from these manufacturers.
It would also be interesting to understand why machines were bought from these providers
– Cost cutting? Logistic feasibility? Goodwill?
Recommendations
53. Provider 1 - Insights
Observations from given data :
• Max age of the machine is 80; the first breakdown is observed at 73, irrespective of the pressure points.
• Broken (machine breakdown) is fully dependent on the lifetime of the machine.
• Pressure Indicator 1 (PI1) and Pressure Indicator 3 (PI3) have a negligible positive impact on machine breakdown.
• Pressure Indicator 2 (PI2) has a negligible negative impact on machine breakdown.
[Chart: Relationship of parameters with “Broken” – correlation (approx. -0.1 to 0.6) of lifetime, pressureInd_1, pressureInd_2 and pressureInd_3 with breakdown]
54. Insights from Analysis:
1. For mean values of PI1, PI2 and PI3, the Break Down Age (BDA) is 64, with p(BD) at approx. 5%.
2. For max values of PI1 & PI3 and min value of PI2, p(BD) is 22% with BDA 64 (Chart 1 below).
3. For min values of PI1 & PI3 and max value of PI2, p(BD) is 1% with BDA 64.
4. Based on points 1 through 3 above, we have used a Break Down Age (BDA) of 64 and a Maximum Cut-off Age (MCA) of 67 (mean values of PI1, PI2 and PI3).
Note: use the RUL Estimator to calculate the values shown below.

RUL based on max values of PI1 & PI3 and min value of PI2:

Provider     lifetime   pressureInd_1   pressureInd_2   pressureInd_3   p(BD)         In percentage   BDA   RUL
Provider 1   64         151             74              172             0.218811901   21.88119006     64    0

[Chart 1: % of breakdown vs. life time, lifetimes 61–64]
RUL based on mean values of PI1, PI2 and PI3:

Provider     lifetime   pressureInd_1   pressureInd_2   pressureInd_3   p(BD)         In percentage   BDA   RUL
Provider 1   64         99              99              100             0.057757457   5.775745659     64    0
55. Provider 2 - Insights
Observations from given data :
• Max age of the machine is 93; the first breakdown is observed at 85, irrespective of the pressure points.
• Broken (machine breakdown) is fully dependent on the lifetime of the machine.
• Pressure Indicator 1 (PI1) and Pressure Indicator 3 (PI3) have a negligible positive impact on machine breakdown.
• Pressure Indicator 2 (PI2) has a negligible negative impact on machine breakdown.
[Chart: Relationship of parameters with “Broken” – correlation of lifetime, pressureInd_1, pressureInd_2 and pressureInd_3 with breakdown]
56. Insights from Analysis:
1. For mean values of PI1, PI2 and PI3, the Break Down Age (BDA) is 77, with p(BD) at approx. 5%.
2. For max values of PI1 & PI3 and min value of PI2, p(BD) is 3129.46% with BDA 77 (Chart 1 below).
3. For max values of PI1 & PI3 and min value of PI2, p(BD) is approx. 5% with BDA 66.
4. For min values of PI1 & PI3 and max value of PI2, p(BD) is 0.020% with BDA 77.
5. Based on points 1 through 3 above, we have used a Break Down Age (BDA) of 66 and a Maximum Cut-off Age (MCA) of 69 (max values of PI1 & PI3, min value of PI2).
Note: use the RUL Estimator to calculate the values.

RUL based on max values of PI1 & PI3 and min value of PI2 (lifetime 66):

Provider     lifetime   pressureInd_1   pressureInd_2   pressureInd_3   p(BD)         In percentage   BDA   RUL
Provider 2   66         173             70              155             0.054603744   5.460374442     77    11

RUL based on max values of PI1 & PI3 and min value of PI2 (lifetime 77):

Provider     lifetime   pressureInd_1   pressureInd_2   pressureInd_3   p(BD)         In percentage   BDA   RUL
Provider 2   77         173             71              155             31.29465076   3129.465076     77    0

[Chart 1: % of breakdown vs. life time, lifetimes 66–77]
57. Provider 3 - Insights
Observations from given data :
• Max age of the machine is 66; the first breakdown is observed at 60, irrespective of the pressure points.
• Broken (machine breakdown) is fully dependent on the lifetime of the machine.
• Pressure Indicator 1 (PI1), Pressure Indicator 2 (PI2) and Pressure Indicator 3 (PI3) have a negligible positive impact on machine breakdown.
[Chart: Relationship of parameters with “Broken” – correlation (approx. 0 to 0.8) of lifetime, pressureInd_1, pressureInd_2 and pressureInd_3 with breakdown]
58. Insights from Analysis:
1. For mean values of PI1, PI2 and PI3, the Break Down Age (BDA) is 54, with p(BD) at approx. 5%.
2. For max values of PI1, PI2 & PI3, p(BD) is 52.35% with BDA 54 (Chart 1 below).
3. For min values of PI1, PI2 and PI3, p(BD) is 0.7749% with BDA 54.
4. Based on points 1 through 3 above, we have used a Break Down Age (BDA) of 50 (at 5%) and a Maximum Cut-off Age (MCA) of 53 (max values of PI1, PI2 and PI3).
Note: use the RUL Estimator to calculate the values.

[Chart 1: % of breakdown vs. life time, lifetimes 51–55]

RUL based on max values of PI1, PI2 & PI3 (lifetime 54):

Provider     lifetime   pressureInd_1   pressureInd_2   pressureInd_3   p(BD)         In percentage   BDA   RUL
Provider 3   54         152             123             148             0.516152601   51.61526007     54    0

RUL based on max values of PI1, PI2 & PI3 (lifetime 51):

Provider     lifetime   pressureInd_1   pressureInd_2   pressureInd_3   p(BD)         In percentage   BDA   RUL
Provider 3   51         152             123             148             0.065209324   6.520932365     54    3
59. Provider 4 - Insights
Observations from given data :
• Max age of the machine is 89; the first breakdown is observed at 81, irrespective of the pressure points.
• Broken (machine breakdown) is fully dependent on the lifetime of the machine.
• Pressure Indicator 1 (PI1) and Pressure Indicator 2 (PI2) have a negligible negative impact on machine breakdown.
• Pressure Indicator 3 (PI3) has a negligible positive impact on machine breakdown.
[Chart: Relationship of parameters with “Broken” – correlation (approx. -0.1 to 0.5) of lifetime, pressureInd_1, pressureInd_2 and pressureInd_3 with breakdown]
60. Insights from Analysis:
1. For mean values of PI1, PI2 and PI3, the Break Down Age (BDA) is 71, with p(BD) at approx. 5%.
2. For min values of PI1 & PI2 and max value of PI3, p(BD) is 5.9% with BDA 63 (Chart 1 below).
3. Based on points 1 and 2 above, we have used a Break Down Age (BDA) of 63 and a Maximum Cut-off Age (MCA) of 67 (min values of PI1 & PI2, max value of PI3).
Note: use the RUL Estimator to calculate the values.

RUL based on min values of PI1 & PI2 and max value of PI3:

[Chart 1: % of breakdown vs. life time, lifetimes 63–67]

Provider     lifetime   pressureInd_1   pressureInd_2   pressureInd_3   p(BD)         In percentage   BDA   RUL
Provider 4   67         36              77              149             0.300035998   30.00359976     66    -1
Provider 4   63         36              77              149             0.059187032   5.918703175     66    3
61. Preventive Maintenance Strategy

Provider     Break Down Age   Max Cut-off Age
Provider 1   64               67
Provider 2   66               69
Provider 3   50               53
Provider 4   63               67

• Break Down Age: lifetime at/after which the respective provider should be intimated for maintenance.
• Max Cut-off Age: lifetime by which maintenance should be completed, else the machine may break down at any time.
• After the first maintenance cycle is over, the same time period should be applied again.
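A minimal sketch of the scheduling logic above, using only the BDA/MCA values from the table (the function and status strings are illustrative, not part of the submission):

```python
# Maintenance windows per provider: (Break Down Age, Max Cut-off Age).
WINDOWS = {
    "Provider 1": (64, 67),
    "Provider 2": (66, 69),
    "Provider 3": (50, 53),
    "Provider 4": (63, 67),
}

def maintenance_status(provider, lifetime):
    """Classify a machine by where its lifetime falls in the BDA..MCA window."""
    bda, mca = WINDOWS[provider]
    if lifetime < bda:
        return "ok"
    if lifetime <= mca:
        return "schedule maintenance"          # intimate the provider now
    return "overdue - may break down any time"  # past the max cut-off age

print(maintenance_status("Provider 3", 51))     # schedule maintenance
```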
63. Pexitics Preventive Maintenance Project
Submitted by Vicky Crasto & Arijit Mitra
Problem Statement – Data related to machine breakdowns is provided which must be used to predict future
occurrences and create a framework for preventive maintenance
Note – The entire R Code is available in the server - C:Jig1324221-Pexitics Case study
Overall Approach
• Understand the distribution of life time and breakdown across team and provider.
• Understand the distribution of life time and break down across team and provider basis the 3 Pressure indices
• Interaction between the 3 pressure indices and understand the distribution of break down
• Distribution of breakdown across teams and providers.
• Plotting the Kaplan-Meier survival function for machine breakdown
• Plotting the Kaplan-Meier survival function basis team and provider
• Testing statistically if the survival function is different across team and provider.
• Determine the parametric survival function and the appropriate distribution.
• Use the regression model to predict the life time of the machine and identify the machine to be replaced urgently.
• Determine the Cox proportional hazards model to obtain the hazard rate and understand the relation between the covariates.
64. Distribution of lifetime basis machine status
We clearly see that the lifetime of machines that have broken is more than 60 months, with a median value around 79 months.
On the other hand, most machines that have not broken have a lifetime between 20 and 60 months.
Overview of the data
Total no. of observation - 90000
39.47% of machine are broken
Machine status Team A Team B Team C
Not broken 22% 21% 18%
broken 12% 15% 12%
Distribution of the machines basis status
Machine status Provider 1 Provider 2 Provider 3 Provider 4
Not broken 14% 18% 13% 16%
broken 11% 9% 11% 8%
65. Understand the distribution of life time and breakdown across teams and provider
The lifetime of the machines manufactured by
Provider 3 is lower than the remaining. This needs to
be further tested
The lifetime of the machines belonging to Team C is
lower than the remaining. This needs to be further
tested
66. Understand the distribution of life time and breakdown across teams and provider basis PressureInd1
Machines belonging to Team C have a lower lifetime with respect to pressureind_1.
As highlighted, machines from Provider 3 seem to have a lower lifetime, with pressureind_1 spread across the range.
67. Understand the distribution of life time and breakdown across teams and provider basis PressureInd2
Machines belonging to Team C have a lower lifetime with respect to pressureind_2.
As highlighted, machines from Provider 3 seem to have a lower lifetime, with pressureind_2 spread across the range.
68. Understand the distribution of life time and breakdown across teams and provider basis PressureInd3
Machines belonging to Team C have a lower lifetime with respect to pressureind_3, and they seem to be breaking down at 3 distinct levels.
As highlighted, machines from Provider 3 seem to have a lower lifetime, with pressureind_3 spread across the range; breakdown also occurs at two distinct levels.
69. Understand the distribution of breakdown basis the interaction between pressure indices
As seen in the plots, machines tend to break at a lower pressure for pressureind_3 compared with the other two pressure indices.
70. Distribution of breakdown across teams and providers
As highlighted the proportion of
broken machines is higher in
• Provider 1 and Team A
• Provider 1 and Team B
• Provider 3 and Team B
This needs to be investigated
further.
71. Plotting the Kaplan-Meier survival function for machine breakdown
Things to note
• The survival probability decreases as lifetime increases.
• At each level of lifetime, the number of machines at risk is lower, since censored machines are also removed.
Censored machines are observations where:
• the machine does not experience the event
• the machine is lost during the follow-up period
• the machine is withdrawn from the study
Table showing the survival probability at each level of lifetime
Kaplan-Meier survival plot
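The Kaplan-Meier estimate multiplies, at each event time, the fraction of at-risk machines that survive. A minimal sketch in Python (the submission itself used R's survfit; the data values here are toys):

```python
def kaplan_meier(data):
    """Kaplan-Meier estimator. data: (lifetime, broken) pairs, broken=0
    marking a censored machine. Returns [(t, S(t))] at each event time,
    where S is updated as S *= (1 - deaths_t / at_risk_t)."""
    data = sorted(data)
    n_at_risk = len(data)
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:   # group ties at time t
            if data[i][1] == 1:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= deaths + censored             # censored leave the risk set
    return curve

curve = kaplan_meier([(20, 1), (30, 0), (40, 1), (50, 1), (60, 0)])
```

The censored machine at lifetime 30 contributes to the risk set before 30 but generates no drop in the curve, which is exactly the behaviour described in the bullets above.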
72. Plotting the Kaplan-Meier survival function for machine breakdown, basis teams
Things to note
• The survival curve for Team C is different from those of Teams A and B.
Table showing the survival probability at each level of lifetime
Kaplan-Meier survival plot across teams
73. Plotting the Kaplan-Meier survival function for machine breakdown, basis provider
Things to note
• The survival curve for each provider is different. This must be tested statistically.
Table showing the survival probability at each level of lifetime
Kaplan-Meier survival plot across providers
74. Testing statistically if the survival function is different across team and provider.
Things to note
• We see that the p-value is almost zero, so we reject the null hypothesis and conclude that the survival function differs across team and provider.
• Here the p-value is not exactly zero but almost zero. Since the dataset is huge, very small differences are magnified and found to be significant.
Using Log-rank test to check if the survival function is different across groups
H0 – Survival function is same across the groups
H1 – Survival function is different across the groups
Log Rank Test output for team Log Rank Test output for provider
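The log-rank test compares, at each event time, the observed events in a group against the events expected under H0. A from-scratch sketch for the two-group case (the submission ran the test in R; the data here are toys, and the p-value uses the fact that the chi-square(1) tail equals erfc(sqrt(x/2))):

```python
import math

def logrank(group1, group2):
    """Two-group log-rank test. Each group: (lifetime, broken) pairs,
    broken=0 censored. Returns (chi-square statistic, p-value, 1 d.f.)."""
    pooled = [(t, e, 1) for t, e in group1] + [(t, e, 2) for t, e in group2]
    event_times = sorted({t for t, e, _ in pooled if e == 1})
    o_minus_e = var = 0.0
    for t in event_times:
        n1 = sum(1 for tt, _, g in pooled if tt >= t and g == 1)  # at risk
        n2 = sum(1 for tt, _, g in pooled if tt >= t and g == 2)
        o1 = sum(1 for tt, e, g in pooled if tt == t and e == 1 and g == 1)
        o2 = sum(1 for tt, e, g in pooled if tt == t and e == 1 and g == 2)
        n, o = n1 + n2, o1 + o2
        o_minus_e += o1 - o * n1 / n           # observed minus expected
        if n > 1:                              # hypergeometric variance
            var += o * (n1 / n) * (1 - n1 / n) * (n - o) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

The team and provider comparisons in the slides have more than two groups; extending this to k groups needs the vector form of the statistic, which is best left to a survival library.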
75. Determine the parametric survival function and the appropriate distribution
Parametric models assume the knowledge of the survival or density function up to K unknown parameter.
However we need to determine the distribution of the underlying survival function.
For this we create regression models with different distribution and check basis the log likelihood value, which
fit the data the best.
Below are the log likelihood values for the various distribution
The lognormal distribution has the lowest log likelihood value and hence fits the data the best.
Using this regression model we predict the lifetime of the machines which has not been broken down and
determine the remaining lifetime of the machine.
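The distribution-selection step can be sketched by fitting candidate distributions by maximum likelihood and comparing log-likelihoods. This toy version ignores censoring (which the R model handled properly) and uses the two candidates with closed-form MLEs; a higher (less negative) log-likelihood indicates a better fit:

```python
import math

def ll_exponential(x):
    """Log-likelihood of x under exponential with MLE rate len(x)/sum(x)."""
    lam = len(x) / sum(x)
    return len(x) * math.log(lam) - lam * sum(x)

def ll_lognormal(x):
    """Log-likelihood of x under lognormal with MLE mu, sigma of log(x).
    At the MLE the quadratic term collapses to -n/2."""
    logs = [math.log(v) for v in x]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((l - mu) ** 2 for l in logs) / len(logs))
    return (-sum(logs)
            - len(x) * math.log(sigma * math.sqrt(2 * math.pi))
            - len(x) / 2)

lifetimes = [62, 71, 79, 80, 85, 90, 74, 77]   # toy broken-machine ages
print("exponential:", ll_exponential(lifetimes))
print("lognormal:  ", ll_lognormal(lifetimes))
```

On this toy sample the lognormal wins, mirroring the conclusion in the slide; Weibull and other candidates need iterative fitting and are omitted here.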
76. Identifying the machines to be replaced urgently
Using the remaining lifetime, we divide the machines into three groups
Remaining lifetime Label
Less than 15 months Need urgent attention
15 to 50 months Maintenance needed in Short term
Greater than 50 months Maintenance needed in Long term
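The labelling rule from the table above, as a small sketch (the function name is ours):

```python
def maintenance_label(remaining_months):
    """Map predicted remaining lifetime to the three maintenance buckets."""
    if remaining_months < 15:
        return "Need urgent attention"
    if remaining_months <= 50:
        return "Maintenance needed in Short term"
    return "Maintenance needed in Long term"
```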
As highlighted, machines manufactured by Provider 3 and belonging to Team A and Team B have a higher proportion of machines in need of immediate maintenance.
77. Determine Cox proportional hazard model to determine the hazard rate and understand the
relation between the covariates
The Cox proportional hazards model is a semi-parametric model which does not assume any underlying distribution for the hazard function, but assumes a form for the covariates.
The output of the model is shown below.
We see that all three pressure indices are significant. In addition, the interactions between Team B and Provider 2, and between Team B and Provider 3, are significant.
The exp(coefficient) values are very small, which makes them difficult to interpret as a meaningful equation.
On the whole, the model explains 71% of the variance in the data, which is good.
Significance of the individual covariates and their interaction Performance of the model
78. Recommendations
• The management must look into the machines belonging to Team B and the machines manufactured by
Provider 3.
• Preventive maintenance must be carried out as per the labels provided from the parametric regression
model.
• Moreover, best practices of machine maintenance carried out by Team A and Team C must be documented
and shared with all the teams.
• Machine manufacture audit can be carried out to understand the quality of the spares used in the
machine, so that frequent breakdown of machine can be avoided.
Areas of improvement for the model
• Determine the performance of the parametric model by dividing the data into model-building and validation datasets, and plot the lift chart to determine how well the model is working.
• Fine tune the Cox proportional hazard function and determine the hazard ratio for each covariate.
Reference
• Allison, P.D. (1995). Survival Analysis Using SAS: A Practical Guide. SAS Publishing.
• https://www.analyticsvidhya.com/blog/2015/05/comprehensive-guide-parametric-survival-analysis/
• http://www.sthda.com/english/wiki/survival-analysis-basics
• http://www.biostat.umn.edu/~wguan/class/PUBH7402/notes/lecture11.pdf
• Crawley, M.J. The R Book. Imperial College, Silwood Park, Ascot, Berks.
81. OBJECTIVE & SCOPE
Pexitics would like to build a model to predict machine breakdown and use preventive maintenance to reduce downtime. The insights and results obtained from the predictive model should be indicative enough to create a framework for identifying breakdowns, and suggestive enough to enable stakeholders to take corrective action.
Understanding of the Data:
The dataset provided 90000 instances of maintenance work done on various machines across 1 year. It appears to comprise periodic instances of sensor data picked up from a random sample of observations during the period, wherein each record provides an instance reading of the respective machine's status in the form of –
• Readings of pressure indicators at particular instance
• Only machine specific attribute in form of ‘Lifetime’
• Representative information about usage (team handing machine in the factory)
• Machine entity (manufacturer) information
• Random observation number
• Breakdown status
It is imperative to preprocess/rationalize the given data to check data sanity, create derived variables and possibly restructure the data, based on assumptions, for model building. At the outset the data appear to be sensor data, so they are highly structured.
82. APPROACH
A major part of the analysis revolves around pressures, the stakeholders involved and lifetimes, in decreasing order of importance. Since the machine functioning parameters are not given, they are assumed to be standard, so a lot depends on the interplay of external environmental factors. There has to be a controllable element for changing the pressures, but the right combination for optimal machine performance, given wear and tear and the handling of the machines, appears complex. Hence failure behavior does not seem to be linear and cannot be predicted in a straightforward fashion.
Important Considerations:
Three types of interactions are important for understanding the breakdown phenomenon and for later model building, arising out of segmented behaviors on these interactions –
A. Interactions among Stakeholders:
The interaction of variables ‘Team’ and ‘Provider’, the rationale being two-fold:
- The probability of failures will highly depend on the usage patterns; even when the machines are standard and reliable.
- Serious defects at the time of manufacturing will hamper the performance of the machines even though the usage patterns are
standard.
In practical business scenarios, accountability for machine maintenance (and hence the financial obligations) lies with the team/factory/department. Also, the SLAs and contracts for machines from a provider get reviewed holistically across the organizational setup, wherein multiple teams/factories/departments provide their performance reports. Hence the interaction between the variables 'team' and 'provider' cannot be viewed separately; a collective assessment gives better insight into the net effect.
B. Interactions among Pressures:
The pressures ought to have significant interactions among themselves governed by domain specific as well as process specific laws
of applied physics and mathematics. These pressures can also work as environmental stressing conditions.
83. APPROACH
C. Interactions arising out of Lifetimes :
Although the lifetimes are in months, the records do not belong to equally spaced time periods, meaning the random observations do not carry time-series implications. The importance of the lifetime data is further reduced by the fact that the spread of random observations is not equal across lifetimes. Hence no analysis is done with lifetime as the main driving factor; rather, it is used as supplementary (though very useful) information to derive age-based failure behavior.
Broad Assumptions about the data:
• The variable ‘S.no’ only signifies chronological readings.
• Each machinery breakdown reading has no dependency on the breakdown behavior of subsequent breakdown reading.
• At any given instance of recordings of data, no conditional or joint probability exists for pressures acting on one machine of ‘x’
lifetime with pressures acting of another machine of ‘y’ lifetime from same manufacturer (e.g. Provider1) and that belonging to same
factory (e.g. TeamA).
• The breakdown as in 1 in variable ‘broken’ signifies complete breakdown and not partial working condition.
Techniques/ Methods of Analysis:
• Cluster Analysis: Unsupervised Learning method for segmentation based on distance measure (proximity).
• Markov Chain Model: A stochastic (random) model for deriving sequence of events and then probability of events depending on
previously attained events.
• Exploratory Analysis: Involving Data Manipulation and Data visualizations to draw insights in the modeling process.
84. Cluster Analysis:
Rationale:
It aims to identify segments that exhibit similar behavior towards failure/machine breakdowns conditions.
Premises, Importance & Thought Process:
The variable ‘broken’ describes failures in a purely objective sense as 0 and 1. To deduce qualitative information about the data, it is imperative to obtain patterns beyond binary outcomes. Apart from failure status, the given data is used to derive important metrics (explained in detail in the next section) as a proxy for the interplay of factors influencing failure behavior.
Cluster analysis takes these metrics and looks for unexpected fluctuations from normal conditions. It aims to find distinct segments based on working conditions without knowledge of a baseline threshold working model, purely from the variance in the data.
The cluster analysis considers only fluctuations of the pressures for the set of conditions, taking into account the net effect of ‘team’, ‘provider’ and ‘lifetime’. Since no benchmark is available, the fluctuations indicating ‘above’ or ‘below’ standard working conditions are obtained by numerically assessing the distribution of the given phenomenon. A key part of the analysis is ‘standardization’ of the data about its mean, measured in terms of standard deviations.
The data for clustering comprises unique records of ‘lifetime’ and the three pressure values. This is a relatively small dataset (1000 records) but includes all the possible values the pressures can take for all given lifetimes.
The analysis required multi-phase sequential re-clustering to capture finer fluctuations. The rationale is that these finer fluctuations can take different sizes on the complete data, and hence none of them could be neglected.
APPROACH
85. Markov Chain Model:
Rationale:
It aims to evaluate and establish probabilistic nature of failure conditions.
Premises, Importance & Thought Process:
In the absence of a time element in the analysis, it is not possible to evaluate or establish any time-based metric. Hence much of Age-to-Failure (Mean Time Between Failures) analysis, Life Data analysis (parametric Weibull distribution) and the associated time-dependent reliability analysis of machinery breakdowns cannot be performed. To facilitate analysis on a ‘state space’ as opposed to a ‘time parameter’, a Markov chain model is used. It aids prediction of future states solely from the inter-relationships of the sequential occurrence of states in the past.
Conventionally, the inputs to a Markov chain model are the distinct states; the rationale behind the cluster analysis is to identify these states. Since the data is chronologically arranged for the combination of ‘team’ & ‘provider’ along with ‘lifetime’, the cluster memberships reflect the sequential states along which machine breakdown progresses: through normal, possibly sub-optimal conditions, and then failure. The results from the Markov chain are ‘transition probabilities’, i.e. the likelihood of going from one state to another.
Since the cluster memberships are taken directly as states, there is a scenario wherein two sequential states should not be considered: when two non-comparable states come together. This arises when two consecutive rows belong to two different sets, e.g. ‘TeamA_Provider1’ and ‘Team_Provider2’. These cases are very few in the total data, as it is sorted to avoid them; the resulting probabilities are too small to make an overall impact.
APPROACH
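The transition-probability step described above can be sketched directly: count consecutive state pairs in the cluster-membership sequence and normalise each row. The state names below are illustrative, not the actual cluster labels from the analysis:

```python
from collections import Counter, defaultdict

def transition_probabilities(states):
    """Estimate Markov transition probabilities from an observed state
    sequence: count each (current, next) pair, then normalise per state."""
    counts = defaultdict(Counter)
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    probs = {}
    for state, outgoing in counts.items():
        total = sum(outgoing.values())
        probs[state] = {t: c / total for t, c in outgoing.items()}
    return probs

seq = ["normal", "normal", "sub-optimal", "failure", "normal", "sub-optimal"]
tp = transition_probabilities(seq)
```

In the actual analysis, pairs spanning two different team/provider sets would be dropped before counting, as discussed above.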
86. Exploratory Analysis:
Rationale:
It is used for unearthing insights about failure behavior at various stages. Exploratory analysis aims at guiding the course of the analysis, and also helps at critical junctures while evaluating parameters for statistical model customization. Visualizations are an important part of the exploratory analysis and are used on specific occasions.
Functionally, it serves the analysis as follows:
A. Insights-driven (both pre and post modeling) judgment specific:
Since the whole approach of analysis is derived metrics oriented, it is used for verifying the suitability of the application of such metrics
from business point of view, mainly before modeling. Post modeling, it provides interpretability and aid to infer important business
critical information.
B. Modeling Diagnostics (Model improvement) & Results (Analytical importance) specific :
The results of the clustering diagnostics are guided by rules of thumb describing best practices for model-specific parameters. Clustering results are shown for the performance of the key metrics only with respect to failures. The results from the Markov chain model are included to provide better business context, i.e. the probabilities obtained are linked to cluster profiles exhibiting transient (changing) failures.
APPROACH
87. STEPS IN ANALYSIS
A. Data Preparation:
The outliers in the dataset are identified based on evaluation of the derived metrics, but no special treatment is applied, for two reasons: a. clustering is sensitive to outliers, so they will get filtered out anyway; b. Markov chains denote probabilities of co-occurrence, and outliers will have very small probabilities and hence be ignored. Data preparation is carried out along the following lines.
Derived Variables creation:
Derived variables are created to help summarize the data and to define the units of aggregation. They are ultimately key to drawing insights and building a model around them. In the course of data manipulation throughout the analysis many variables are created, but only the important ones are listed below:
1. ‘team_provider’ & ‘life_pres_all’: Concatenated variables indicating interactions among given variables.
2. ‘pr1_pr2_corres_pr3’, ‘pr2_pr3_corres_pr1’, ‘pr3_pr2_corres_pr1’: Calculated metrics capturing interactions in pressure values
among each other. It implies product of first two pressure values, divided by the third. So, all changes get captured.
3. ‘normz_int1’, ‘normz_int2’, ‘normz_int3’: Standardized scores implying differences from mean (for the grouped data on lifetime and
team_provider) in terms of standard deviations for the three variables created in pt.2 above.
4. ‘std_int1’, ‘std_int2’, ‘std_int3’: Based on standardized scores philosophy, population statistics are compared with cluster statistics.
Interaction 1 (int1) stand for first metric in pt.2 above and likewise for other two interactions.
5. ‘Int1_Inc_ge_0.45_1_stdev’, ‘Int1_Inc_ge_1_1.5_stdev’, ‘Int1_Inc_gt_1.5_stdev’: for ‘std_int1’ in pt. 4 above, the magnitude of increase in standard deviations at three distinct levels; likewise for three levels of decrease, and for the other two interactions. Used for cluster profiling and for naming states in Markov chain terminology.
6. Links & Nodes: It is used majorly in network diagram and has been used a lot in data manipulation to get desired matrices.
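The derived-metric and standardization steps in pts. 2 and 3 can be sketched as below. Variable naming follows the slide, but the grouping by lifetime and team_provider is omitted for brevity and the data values are toys:

```python
import statistics

def pressure_interaction(p1, p2, p3):
    """'pr1_pr2_corres_pr3'-style metric: product of two pressure values
    divided by the third, so changes in any pressure are captured."""
    return p1 * p2 / p3

def standardize(values):
    """z-scores: deviation from the mean in units of standard deviation
    (the 'normz_int*' variables apply this within each group)."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]

rows = [(100, 80, 90), (120, 70, 95), (95, 85, 88), (140, 60, 100)]
ints = [pressure_interaction(*row) for row in rows]
z = standardize(ints)   # deviations from the group mean, in std devs
```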
88. STEPS IN ANALYSIS
B. Broad Steps in Analysis:
1. Initial Exploratory Analysis to understand the data and know the distributions.
2. Creation of Derived Variables, potential outliers detection (not removal) based on distribution of the calculated metrics.
3. Cluster Analysis:
a. Creation of the data for clustering and performing data checks/exploration.
b. Hierarchical clustering to determine the optimal number of clusters, by creating a dissimilarity matrix and visually confirming through a dendrogram, applying different methods of linkage between clusters.
c. Scaling the complete data for clustering and performing detailed clustering diagnostics on the scaled data to arrive at the optimal clusters.
d. Performing hierarchical clustering again for the optimal number of clusters, to get the cluster centers for K-means clustering.
e. Performing K-means clustering on the scaled data, using the cluster centers obtained for the optimal number of clusters.
f. Re-clustering the data following the above steps and appending the cluster information.
g. Profiling the clusters by comparing population characteristics with cluster characteristics, and visualizing the data graphically.
4. Markov Chain Modeling:
a. Creation of Data for Markov Chain modeling and performing data checks/ exploration.
b. Create Sequence matrix based on cluster memberships and then create Transition Probabilities matrix.
c. Data manipulation to ascertain probabilities only for transient states i.e. changes between distinct states involving failures.
5. Visualization
a. Extensive Data Manipulation involving new metrics creation to arrive at right data for Visualizations.
b. Visualization 1 : To illustrate the magnitude of deviations in metrics for the clusters having transient failure conditions.
c. Visualization 2 : To illustrate the association between transient states depending upon transition probabilities.
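Steps 3(d) and 3(e), seeding K-means with the centers obtained from a prior hierarchical clustering, can be sketched as follows (toy 2-D data; the real analysis ran on the scaled pressure metrics in R):

```python
def kmeans(points, centers, iters=20):
    """Plain K-means started from the supplied centers (e.g. hierarchical
    cluster centers). Returns the final centers and the point groups."""
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        groups = [[] for _ in centers]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[d.index(min(d))].append(p)
        # update step: recompute each center as the mean of its group
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else c
                   for g, c in zip(groups, centers)]
    return centers, groups

pts = [(0.1, 0.2), (0.0, 0.1), (2.0, 2.1), (2.2, 1.9)]
centers, groups = kmeans(pts, centers=[(0.0, 0.0), (2.0, 2.0)])
```

Seeding from hierarchical centers, as in the slide, makes the result deterministic and usually speeds convergence compared with random initialization.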
89. STEPS IN ANALYSIS
C. Key Statistical Methodologies/Diagnostic Evaluation metrics used in Analysis:
• Scaling: Scaling is used for normalization of the data. It adopts the same standardization technique used earlier for calculating the pressure metrics. The rationale is to make the complete data comparable for calculating distances under any distance proximity measure and cluster linkage method.
• Clustering- Average & Ward.D2: It denotes method of linkage among each group of clustering. Average is used for the average
of distances between all pair of objects among clusters. Ward.D2 is an improvement over Ward method. Ward method
minimizes the total within-cluster variance i.e. at each iteration of clustering it finds the pair of clusters that leads to minimum
increase in total within-cluster variance after merging. Ward.D2 implements criterion wherein dissimilarities are squared
before cluster updating.
• Pseudo F-statistic : Pseudo F-statistic is intended to capture the 'tightness' of clusters and describes the ratio of between cluster
variance to within-cluster variance. Optimal number of clusters should have maximum value among all the clusters considered.
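The pseudo F-statistic (also known as the Calinski–Harabasz index) can be computed as (BGSS / (k − 1)) / (WGSS / (n − k)). A minimal sketch on hypothetical one-dimensional data; the slides describe a custom R function:

```python
# Pseudo F-statistic: ratio of between-cluster variance (BGSS, with
# k-1 degrees of freedom) to within-cluster variance (WGSS, with n-k).
def pseudo_f(points, labels):
    n = len(points)
    ks = sorted(set(labels))
    k = len(ks)
    grand = sum(points) / n
    bgss = wgss = 0.0
    for c in ks:
        members = [p for p, l in zip(points, labels) if l == c]
        mean = sum(members) / len(members)
        bgss += len(members) * (mean - grand) ** 2
        wgss += sum((p - mean) ** 2 for p in members)
    return (bgss / (k - 1)) / (wgss / (n - k))

# Two tight, well-separated toy clusters give a very large value:
points = [1.0, 1.1, 0.9, 10.0, 10.2, 9.8]
labels = [0, 0, 0, 1, 1, 1]
print(round(pseudo_f(points, labels), 1))  # → 4860.0
```

A tight, well-separated clustering drives WGSS toward zero and the statistic upward, which is why the maximum over candidate cluster counts is taken as optimal.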
D. Important Thresholds considered in the Analysis:
• Minimum proportion of failures of 40% for states that exhibit machine-breakdown tendency; i.e., in at least 40% of the instances for a given cluster, machines must have failed across the complete period under consideration.
• Minimum proportion of failures of 10% for states that exhibit transient machine-breakdown tendency; i.e., in at least 10% of all failing conditions of machines in a cluster, the subsequent condition differs from the previous failing condition.
• 0.45 standard deviations as the lower limit for the qualifying condition when assessing fluctuation of cluster means from population means for the metrics. The lower limit is 0.45 rather than 0.50 (assuming equal differences among the three levels) because some values hover around 0.50 and would be excluded if the limit were not relaxed slightly.
• 90% confidence for calculating the transition-matrix probabilities. Since the data do not have an equal spread, a 10% risk is allowed.
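The first and third thresholds can be expressed as simple rules. An illustrative check on hypothetical cluster summaries (field names and values here are made up for the sketch):

```python
# A cluster qualifies as a breakdown state if >= 40% of its instances
# are failures; a metric is flagged if the cluster mean deviates from
# the population mean by more than 0.45 population standard deviations.
FAIL_MIN, DEV_MIN = 0.40, 0.45

def qualifies(cluster):
    return cluster["fail_rate"] >= FAIL_MIN

def flagged_metrics(cluster, pop_mean, pop_sd):
    return [m for m, v in cluster["means"].items()
            if abs(v - pop_mean[m]) / pop_sd[m] > DEV_MIN]

# Hypothetical cluster summary and population statistics:
cluster = {"fail_rate": 0.55, "means": {"pr1": 6.2, "pr2": 5.0}}
pop_mean, pop_sd = {"pr1": 5.0, "pr2": 5.1}, {"pr1": 2.0, "pr2": 1.0}
print(qualifies(cluster), flagged_metrics(cluster, pop_mean, pop_sd))
# → True ['pr1']  (pr1 deviates by 0.6 sd, pr2 only by 0.1 sd)
```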
90. The variable ‘avg_3way_int’ is the composite average of the variables ‘pr1_pr2_corres_pr3’, ‘pr2_pr3_corres_pr1’ and ‘pr3_pr2_corres_pr1’ (explained in an earlier section). As the name suggests, the variable indicates the average behavior of the three interactions. The distribution looks normal and symmetric about the mean, implying that on average a fluctuation in one interaction is compensated by another. However, there are some extreme cases; on close inspection, 523 cases were found to fall in the extremes of the curve.
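The composite variable and its extreme cases can be sketched as follows, on simulated data with a hypothetical 2-standard-deviation cutoff (the slide does not state the exact cutoff used to find the 523 real cases):

```python
# Average three interaction metrics per record into avg_3way_int, then
# flag records in the tails of the roughly normal distribution.
# Simulated data; cutoff of 2 sd is an assumption for illustration.
import math
import random

random.seed(1)
interactions = [[random.gauss(0, 1) for _ in range(3)] for _ in range(1000)]
avg_3way_int = [sum(row) / 3 for row in interactions]

mean = sum(avg_3way_int) / len(avg_3way_int)
sd = math.sqrt(sum((x - mean) ** 2 for x in avg_3way_int) / len(avg_3way_int))
extremes = [x for x in avg_3way_int if abs(x - mean) > 2 * sd]
print(len(extremes))  # count of extreme records (depends on the data)
```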
VISUALIZATIONS : EXPLORATORY ANALYSIS
91. The dendrogram illustrates the agglomerative hierarchical
clustering as a tree diagram. As previously mentioned, the rationale
of clustering is to obtain as many justifiable and distinct clusters as possible.
The red rectangles drawn suggest 22 as the number of clusters to be used as
the first-level clusters.
The pseudo F-statistic is calculated using a custom-built function for all cluster counts obtained from k-means clustering. The plot suggests that the maximum pseudo F-statistic is obtained at 22 clusters (the plot starts from a value of 2), and hence the optimal number of clusters is 22. No seed is set in the custom function, deliberately, to check reliability rather than to ensure reproducibility, so its results are fully randomized. Many potential outliers (when considering the reduced dataset of only 1,000 records) are included, and hence many iterations were required to deduce that the optimal number of clusters lies in the range 16–22. To take all possibilities into consideration, 22 was chosen as the optimal number of clusters, since it was also suggested by the hierarchical clustering above.
VISUALIZATIONS : CLUSTERING DIAGNOSTICS
92. Population statistics are compared with cluster statistics only for clusters with the chosen failure conditions (at least 40% failing and 10% transient).
Externally drawn red lines indicate the lower limit (0.45) of the threshold used for cluster profiling. The visualization suggests that only three clusters, viz. 13, 36
& 39, do not show any significant fluctuation beyond the threshold set.
VISUALIZATIONS : CLUSTER ANALYSIS RESULTS
93. VISUALIZATIONS : MARKOV CHAIN RESULTS
The interactive network diagram known as a ‘Sankey diagram’ shows the flow of the transient states. The width of the band between two nodes denotes the
probability of change from one state to the immediately subsequent state (the probabilities are visible in R but not here, this being an image). The three clusters
13, 36 & 39, identified previously as showing no significant increase or decrease, are represented as ‘No_major_differentiator’. The three levels (pt. 5 in
data preparation) are labeled Slight (Sli), Moderate (Mod) & Extreme (Ext), along with Increase (High) & Decrease (Low) for the interactions (Int1, Int2 & Int3).
94. SYNOPSIS OF THE ANALYSIS
In a nutshell, the broad philosophy of the analysis is:
• Identify the key players responsible for the phenomenon, i.e. machine breakdown, and then use those to measure aggregated and
comparable behavior; here, in decreasing order of importance, the three pressures, the stakeholders and then lifetime.
• In the absence of any business information about the process and the machines, create metrics that capture any important data
indicating failure behavior. Failures (owing to extrinsic factors) generally happen only when there are serious deviations
from normal conditions. Hence, standard scores are calculated, and any large deviation in them is a reflection of non-acceptable behavior.
• Since the business parameters are missing, criticality in terms of outliers cannot be ascertained. Hence, the choice of algorithm
is made strategically so that outliers become part of the result and yet do not affect model behavior; unlike parametric
regression methods, where outliers can have serious impacts on the results (beta estimates). Cluster analysis is the chosen
algorithm, providing segments that reveal the different patterns in the fluctuations.
• To calculate the probabilistic nature of failures, the cluster memberships are used as input. A Markov chain calculates the
probabilities of sequential co-occurrence of states and is hence preferred.
95. SHORTCOMINGS
Shortcomings of the Analysis:
• The data is not a complete representation, i.e. it is not spread equally across all lifetimes, and hence predictive ability cannot be
measured with utmost precision, although the Markov chain can predict the likely states. So, while the objective of developing a
framework for predicting failures is achieved, the model does not represent behavior holistically and cannot be deployed
in production. This also implies that the resulting probabilities of associated failures need to be revisited
in light of complete data in which all lifetimes are considered for all stakeholders.
• No direct association with the business end of preventive maintenance, such as economic losses and strategic
implications (e.g. capacity planning), can be measured or benchmarked, as such representative information is not present. As a
result, no business success metric or milestones can be defined or recommended. Only the analytical methodology is explained
herewith.
• No definitive domain intelligence can be integrated with the results, since information regarding the type, purpose,
criticality and machine-specific attribution of the machines is not available.