The business problem was to reduce the time lost in the manufacturing unit to machine breakdowns. It was addressed by building a decision tree model with the CART algorithm. High-level details are below:
A thorough analysis was done to identify if there are ways of knowing which machines have higher probabilities of breaking down. The ultimate goal of the management is to improve the productivity of the company by ensuring minimum or no stoppage of work at any point of time.
The idea of reviewing the data is to come up with an implementable framework and establish protocols which will give visibility of machine health status and allow remedial steps to be taken proactively, before an actual breakdown. The post-analysis summary and recommendations are given below:
Machines delivered by Provider3 break down much earlier, as early as 60 months. Management needs to discuss whether to continue with Provider3 and/or initiate discussions with them to improve the quality of the products they deliver.
In the interim, mandate monthly review of all Provider3 machines aged more than 60 months.
Mandate monthly review of all machines older than 72.5 months that are provided by Providers 1, 2 and 4.
Essentially all machines older than 72.5 months will need monthly preventative maintenance review.
Recommendations for Preventive Maintenance - A Machine Learning Project
1. A Machine Learning Project by
Pranov Mishra
Preventive Maintenance
Recommendations
2. Executive Summary
A thorough analysis was done to identify if there are ways of knowing which machines have
higher probabilities of breaking down. The ultimate goal of the management is to improve
the productivity of the company by ensuring minimum or no stoppage of work at any point
of time.
The idea of reviewing the data is to come up with an implementable framework and establish
protocols which will give visibility of machine health status and allow remedial steps to be
taken proactively, before an actual breakdown. The post-analysis summary and recommendations
are given below:
I. Machines delivered by Provider3 break down much earlier, as early as 60 months.
Management needs to discuss whether to continue with Provider3 and/or initiate
discussions with them to improve the quality of the products they deliver.
II. In the interim, mandate monthly review of all Provider3 machines aged more than 60
months.
III. Mandate monthly review of all machines older than 72.5 months that are provided by
Providers 1, 2 and 4.
IV. Essentially all machines older than 72.5 months will need monthly preventative
maintenance review.
3. Data set Summary
The data-set has 90,000 observations.
The data-set constitutes historical information of whether a machine has broken
down or not and the various predictor variables which supposedly play a role in
deciding overall health and longevity of the machines in use.
There are 7 variables, with the variable "broken" indicating whether the machine
had broken down or not.
The variables and the key initial insights summarizing them are given below.
Variable Name    Data Type     Max      Min     Levels
lifetime         Numeric       93       1       NA
broken           Numeric*      1        0       Needs to be converted to a factor variable with 2 levels (0 and 1)
pressureInd_1    Numeric       173.28   33.48   NA
pressureInd_2    Numeric       128.60   58.55   NA
pressureInd_3    Numeric       172.54   42.28   NA
team             Categorical   NA       NA      3 (TeamA, TeamB and TeamC)
provider         Categorical   NA       NA      4 (Provider1, Provider2, Provider3 and Provider4)
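The factor conversion noted in the table can be sketched as follows. This is a minimal illustration on a made-up four-row data frame; the name Mydata and the values in it are assumptions, not the project's data.

```r
# Hypothetical miniature of the data set (the real one has 90,000 rows).
Mydata <- data.frame(
  lifetime = c(12, 65, 80, 93),
  broken   = c(0, 0, 1, 1),
  provider = c("Provider1", "Provider3", "Provider2", "Provider4")
)

# Store the target as a two-level factor instead of a number, so that a
# classification tree (rather than a regression tree) is fitted later.
Mydata$broken <- factor(Mydata$broken, levels = c(0, 1))

str(Mydata$broken)    # structure of the converted column
table(Mydata$broken)  # count of machines per level
```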
4. Approach to Solution
All machines undergo wear and tear over their lifetime and require constant monitoring to
ensure that their breakdown thresholds are not breached, thereby extending their longevity.
The goal here is to analyze the data to identify the variables that contribute to the wear and
tear of machines and so shorten the lifetime of a machine.
The next goal is to calculate the thresholds which will work as early warning
indicators, triggering timely repair and preventing an early breakdown.
The approach involves thorough exploratory analysis and building a predictive model
to call out early warning indicators:
I. Identify if there are distinct patterns that point to what specifically contributes to a
break down.
II. Check whether the distribution of the number of machines, broken down or otherwise, is
the same or different across all levels of teams and providers.
III. Check the lifetime of machines, both broken and otherwise, across all combinations
of teams and providers.
IV. Partition the data into Training and Testing datasets, to build the model on the former and
test it on the latter.
V. Build a model to identify which factors are statistically significant in terms of
contributing to the machine breakdown.
VI. Identify the thresholds of the combination of and/or individual factors that will trigger
inspection and appropriate work prior to a breakdown.
5. Approach to Solution – Initial Insights
The initial data exploration suggests that no machine with a lifetime of less than 60 months has broken down.
See below. Hence one approach is to select all observations with a lifetime greater than 60 months and
explore further to identify any significant factor contributing to machine breakdown.
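A check of this kind can be sketched in a few lines. The data frame below is a made-up toy example; only the column names lifetime and broken come from the data set summary.

```r
# Hypothetical toy data: lifetime in months, broken = 1 if the machine broke down.
Mydata <- data.frame(
  lifetime = c(10, 45, 59, 60, 72, 88),
  broken   = c(0, 0, 0, 1, 0, 1)
)

# Youngest age at which any machine has broken down.
min_broken_age <- min(Mydata$lifetime[Mydata$broken == 1])

# TRUE if at least one machine broke down before 60 months.
any_early_break <- any(Mydata$broken == 1 & Mydata$lifetime < 60)

print(min_broken_age)
print(any_early_break)
```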
6. Variable Profiling – Continuous Variables
Upon binning the lifetime, it was found that the highest percentage of machine breakdowns
occurs in the "lifetime" range of 88-93 months. However, the minimum age at which a
machine breaks down is 60 months, as seen in the previous slide. The breakdown rate is more
than 50% in every group once a machine has crossed 60 months.
A similar analysis of the pressure indicators showed no pattern. As the
average pressure increases across the groups, the breakdown percentage does not exhibit any
distinct pattern.
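The binning step can be illustrated roughly as below. The bin edges and the toy values are assumptions chosen for the example; the source does not state which bins were used apart from the 88-93 range.

```r
# Hypothetical toy data.
Mydata <- data.frame(
  lifetime = c(5, 30, 61, 64, 70, 75, 89, 92),
  broken   = c(0, 0, 1, 0, 1, 1, 1, 1)
)

# Bin lifetime into illustrative ranges (edges are assumed, not from the source).
Mydata$life_bin <- cut(Mydata$lifetime,
                       breaks = c(0, 60, 70, 80, 93),
                       include.lowest = TRUE)

# Percentage of machines broken within each lifetime bin.
break_pct <- tapply(Mydata$broken, Mydata$life_bin, mean) * 100
print(round(break_pct, 1))
```

The same pattern, with a pressure column in place of lifetime, would give the breakdown percentage per pressure group.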
7. Variable Profiling – Categorical Variables
Analysis of the categorical variables individually with the target variable is shown below.
Providers 1 and 3 seem to contribute more to machine breakdowns than the other providers.
Machines used by Team B seem to experience many more breakdowns than the
machines used by other teams.
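One way to produce such a profile is a row-wise proportion table per categorical variable. The sketch below uses made-up rows; only the column names and level names follow the data set summary.

```r
# Hypothetical toy data.
Mydata <- data.frame(
  provider = c("Provider1", "Provider1", "Provider2", "Provider2",
               "Provider3", "Provider3", "Provider4", "Provider4"),
  team     = c("TeamA", "TeamB", "TeamC", "TeamA",
               "TeamB", "TeamC", "TeamA", "TeamB"),
  broken   = c(1, 1, 0, 1, 1, 1, 0, 0)
)

# Share of broken (1) and not broken (0) machines within each provider.
by_provider <- prop.table(table(Mydata$provider, Mydata$broken), margin = 1)

# The same breakdown within each team.
by_team <- prop.table(table(Mydata$team, Mydata$broken), margin = 1)

print(round(by_provider, 2))
print(round(by_team, 2))
```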
8. Data Analysis - Exploration & Visualization
Lifetime Comparison of Machines
The average lifetime of machines that are broken is almost double that of
machines that are not broken. This is expected and reassuring: the machines that
are broken served the company for a long time before breaking down, and the newer
machines can be expected to serve for close to 78 months on average before breaking
down.
9. Data Analysis - Exploration & Visualization
Comparisons - Pressure Versus Machine Health Status
There does not seem to be any significant difference in the average pressures at
any of the pressure indicator points for machines that have broken down versus
machines that have not broken down. There needs to be further multivariate
analysis to understand if interaction of the pressure with other variables plays a
role or not.
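A group-wise comparison of this kind can be sketched with aggregate(). The values below are invented so that lifetime differs by status while average pressure does not; they are not the project's figures.

```r
# Hypothetical toy data.
Mydata <- data.frame(
  broken        = c(0, 0, 0, 1, 1, 1),
  lifetime      = c(30, 40, 50, 70, 80, 90),
  pressureInd_1 = c(95, 100, 105, 98, 100, 102)
)

# Mean lifetime and mean pressure for broken versus non-broken machines.
means <- aggregate(cbind(lifetime, pressureInd_1) ~ broken,
                   data = Mydata, FUN = mean)
print(means)
```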
10. Data Analysis - Exploration & Visualization
Defect Proportion comparisons by absolute Numbers
11. Data Analysis - Exploration & Visualization
Pressure Indicator1 vs Lifetime
Pressure Indicator1 does not give any major insight, as pressure values are consistent
across all combinations of providers and teams. A similar pattern is seen for both broken
and non-broken machines. However, what we can infer is that machines with Team C tend to
break down earlier than machines with Team A and Team B.
12. Data Analysis - Exploration & Visualization
Pressure Indicator1 vs Lifetime – Filtered by Machines = Broken
Subsetting the data to only the machines that have broken down, we see that the pressure
indicator is consistent throughout, but TeamC machines break down much earlier and the
lifetime values differ across the providers. Lifetime values are lowest
for Provider3, followed by Provider1 and Provider4.
13. Data Analysis - Exploration & Visualization
Pressure Indicators (2 & 3) vs Lifetime – Filtered by Machines = Broken
Exactly the same observation was made for the pressures at indicator points 2 and 3. The
graphs below demonstrate this.
14. Data Analysis - Exploration & Visualization
Data Split for further analysis
For further analysis, the data is subset by filtering out all machines with an age of less
than 60 months, since all such machines are found to be in good health across all
variables. After subsetting the data, the aim is to identify the significant factors
contributing to a machine breakdown and to develop a strategy that uses this
information to improve the longevity of the machines.
After the split we have 47,550 observations with the same 7 variables: 35,522
machines that have broken down and 12,028 machines that are in good health, so about 25%
of the total are in good health. The new distribution is shown below. It suggests
that Provider4 provides the best machines and that TeamA handles the
machines best.
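The subsetting step can be sketched as below. The toy rows are made up, and treating exactly 60 months as kept (>= 60) is an assumption based on the wording "less than 60"; MydataNew is the name used later in the model-building code.

```r
# Hypothetical toy data.
Mydata <- data.frame(
  lifetime = c(20, 59, 60, 75, 90),
  broken   = c(0, 0, 1, 0, 1)
)

# Drop machines younger than 60 months (boundary handling is an assumption).
MydataNew <- Mydata[Mydata$lifetime >= 60, ]

# Dimensions and class balance of the retained data.
print(dim(MydataNew))
print(prop.table(table(MydataNew$broken)))
```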
15. Data Analysis - Exploration & Visualization
Outlier Analysis of numeric variables
The box plots of all the numeric variables are shown below. The pressures at
indicator points 1, 2 and 3 have outliers. An analysis of these outliers found that
about two-thirds of them belong to machines that are broken, while about one-third
(32.85%) belong to machines which are not broken. The total number of
outlier observations is 1,385, which is less than 3% of total observations. The
client needs to be asked whether the outlier values are plausible or whether we need to
replace them with cut-off values. In the current scenario, assuming the outlier values are
possible (since the outliers do not show a specific pattern for broken status), we will build
the model without further treatment of them.
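An outlier check along these lines can be sketched with the box-plot rule from base R. The source does not say how the outliers were extracted, so boxplot.stats() is an assumed choice, and the vectors below are toy values.

```r
# Hypothetical toy readings for one pressure indicator and the machine status.
pressure <- c(98, 101, 99, 100, 102, 97, 100, 160, 35)
broken   <- c(0, 1, 0, 1, 1, 0, 1, 1, 0)

# Values lying beyond 1.5 times the hinge spread, i.e. the box-plot outliers.
out_vals   <- boxplot.stats(pressure)$out
is_outlier <- pressure %in% out_vals

# How the outliers split between broken and non-broken machines.
outlier_share <- prop.table(table(broken[is_outlier]))
print(out_vals)
print(outlier_share)
```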
16. Model Building
As mentioned earlier, for the model building effort the observations with a machine
lifetime of less than 60 months are eliminated. The dimensions of the new data are 47,550
obs. of 7 variables.
The data is further split into training and testing datasets in the ratio of 75:25. The code
for the split is below.
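# Number of rows in the filtered data
count.rows <- nrow(MydataNew)
# Last row of the 75% training partition and first row of the test partition
train.end.row <- round(count.rows * 0.75)
test.start.row <- train.end.row + 1
# Fix the seed so that the random shuffle is reproducible
set.seed(1234)
# Shuffle the rows, then cut the shuffled data at the 75% mark
Mydata_Random <- MydataNew[order(runif(count.rows)), ]
Train <- Mydata_Random[1:train.end.row, ]
Test <- Mydata_Random[test.start.row:count.rows, ]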
17. Model Building
A model is built using the CART technique to predict the likelihood of a machine breaking down.
The model is built on the training dataset. The first model is built with a reasonable
restriction: a node must contain at least 3,000 observations to qualify for splitting.
The tree is allowed to grow fully, with the complexity parameter set to zero. The tree looks as
below.
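A fit with these settings might look roughly like the sketch below. The source names CART but not the package, so rpart is an assumption; the data is synthetic, and minsplit is scaled down from 3,000 to 100 only because the toy training set has 2,000 rows.

```r
library(rpart)  # assumption: rpart is used as the CART implementation

set.seed(1234)
n <- 2000
# Synthetic training data; only the column names follow the data set summary.
Train <- data.frame(
  lifetime = sample(60:93, n, replace = TRUE),
  provider = factor(sample(paste0("Provider", 1:4), n, replace = TRUE))
)
# Synthetic target: machines older than 78 months are marked as broken.
Train$broken <- factor(as.integer(Train$lifetime > 78), levels = c(0, 1))

# Grow the full tree: cp = 0 disables complexity-based stopping,
# minsplit sets the minimum node size required before a split is attempted.
Model <- rpart(broken ~ lifetime + provider, data = Train, method = "class",
               control = rpart.control(minsplit = 100, cp = 0))
print(Model)
```

On the project's data, the control argument would presumably carry minsplit = 3000 as described above.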
18. Model Building
The model was then pruned by increasing the complexity parameter to 0.04000147. The
optimal cp is arrived at by looking at the cptable from the final model. The cptable shows
that the optimal number of terminal nodes is 5, which is reinforced by the scree plot below.
The cp value which corresponds to 5 terminal nodes (4 splits) is 0.04000147. See below.
          CP  nsplit  rel error     xerror         xstd
1 0.25237359       0  1.0000000  1.0000000  0.009075164
2 0.09295650       2  0.4952528  0.4952528  0.006913607
3 0.04000147       4  0.3093398  0.3093398  0.005609610
4 0.00000000       7  0.1893354  0.3093398  0.005609610
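Pruning from a cp table can be sketched as below, again assuming rpart and using synthetic data. Note that the sketch picks the cp with the lowest cross-validated error, which is a common rule but not necessarily the one used in the project, where the cp was read off the table and the scree plot.

```r
library(rpart)  # assumption: rpart is used as the CART implementation

set.seed(1234)
n <- 2000
# Synthetic data with a noisy age effect and an uninformative pressure column.
d <- data.frame(
  lifetime      = sample(60:93, n, replace = TRUE),
  pressureInd_1 = rnorm(n, mean = 100, sd = 20)
)
p_break  <- ifelse(d$lifetime > 78, 0.95, 0.10)
d$broken <- factor(rbinom(n, 1, p_break), levels = c(0, 1))

# Fully grown tree (cp = 0), then inspect its complexity table.
Full   <- rpart(broken ~ ., data = d, method = "class",
                control = rpart.control(minsplit = 20, cp = 0))
cp_tab <- Full$cptable
print(cp_tab)

# Illustrative selection rule: cp with the smallest cross-validated error.
best_cp <- cp_tab[which.min(cp_tab[, "xerror"]), "CP"]

# Prune the full tree back to that complexity.
Pruned <- prune(Full, cp = best_cp)
```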
19. Final Model
The final model is shown below. There are 5 terminal nodes, but the nodes of greatest
interest are the 3 that predict which machines will break down. Before extracting the
rules, let's assess the model performance (next slide).
20. Model Performance Assessment
The final model is tested by applying it to unseen data and measuring its accuracy. The
accuracy of the model is found to be 97.60%. Code below:
# Predicted class for each machine in the test set
Pred <- predict(ModelFinal, newdata = Test, type = "class")
# Confusion table of actual versus predicted status
CT_Cart <- table(Actual = Test$broken, Predicted = Pred)
# Accuracy = correctly classified machines / all machines
Acc_Cart <- sum(diag(CT_Cart)) / sum(CT_Cart)
The ROC curve for the model is shown below. It has an AUC of 98.45%, which is an excellent result.
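For reference, the AUC can also be computed directly from predicted probabilities with the rank-based (Mann-Whitney) formula. This base-R sketch is a substitute for the ROC tooling used in the project; the labels, probabilities and the helper name auc_rank are hypothetical.

```r
# Hypothetical actual labels and predicted breakdown probabilities.
actual <- c(0, 0, 1, 1, 0, 1, 1, 0)
prob   <- c(0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.70, 0.05)

# AUC as the probability that a broken machine is scored above a healthy one.
auc_rank <- function(actual, prob) {
  r     <- rank(prob)                 # average ranks handle tied scores
  n_pos <- sum(actual == 1)
  n_neg <- sum(actual == 0)
  (sum(r[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc <- auc_rank(actual, prob)
print(auc)
```

With a fitted tree, the probabilities would presumably come from predict() with type = "prob" rather than type = "class".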
21. Rules from the Model to Identify the Machines that may require Preventive care
The rules extracted from the model for the terminal nodes are given below. The focus
is mainly on the first 3 nodes, as they have a very high proportion of machines that break
down. The first 3 nodes make up 82% of the total observations and contain all the machines
that have a high probability of breaking down. Hence this is a very good split: by setting
aside the 17.542% of the total population that falls in the last 2 nodes, 100% of
the machines which have a high probability of breaking down can be identified. These machines
can then be reviewed for preventive maintenance to prevent a breakdown.
22. Recommendations
Create a framework which mandates a monthly preventative maintenance review for all
machines that reach 78.5 months of age (lifetime). The average age of machines that have
crossed 78.5 months and have broken down is 85.3 months, and there are instances of
machines breaking down at 79, 80 and 81 months. Hence preventive maintenance can either
increase longevity or prompt the management to replace a machine that is likely to
break down. Either way it is a proactive measure to prevent sudden stoppage of work due to
not knowing when a machine will break down.
The same framework needs to be applied to all machines that are provided by Provider3
and are older than 60 months (and less than 78.5 months).
For all machines in the range of 72.5 to 78.5 months whose provider is
not Provider3, consistent monitoring as described above is required. This essentially
combines Rule 3 and Rule 4 from the previous slide. Though Rule 4 states that there is no need
to monitor machines aged between 75 and 78.5 months if they are not from Provider3, it is
sensible to monitor them, as machines with a lifetime of less than 75 months have broken down,
as seen through Rule 2.
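Taken together, the recommendations above reduce to a simple flagging rule, sketched below. The function name needs_monthly_review is hypothetical, and whether the 60 and 72.5 month boundaries are inclusive is an assumption.

```r
# Flag machines for monthly preventive maintenance review:
#  - any Provider3 machine older than 60 months, or
#  - any machine, regardless of provider, older than 72.5 months.
needs_monthly_review <- function(lifetime, provider) {
  (provider == "Provider3" & lifetime > 60) | lifetime > 72.5
}

# Hypothetical fleet used only to demonstrate the rule.
fleet <- data.frame(
  lifetime = c(50, 65, 65, 75, 80),
  provider = c("Provider3", "Provider3", "Provider1", "Provider2", "Provider4")
)
fleet$review <- needs_monthly_review(fleet$lifetime, fleet$provider)
print(fleet)
```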