Predictive Vehicle Inspection
Matous Havlena
matous@havlena.net
Tim Ojo
timmyojo@gmail.com
Akin Alao
alaoraufu@yahoo.co.uk
Project Charter
Evaluate the feasibility of using Big Data analytics solutions for
Manufacturing to solve the problem of Predictive Vehicle
Inspection:
● Analyze vehicle production history to predict which cars coming
off the production line will fail inspection.
● Consider production shifts, specific employees, and other factors.
The two Big Data Analytics solutions to be evaluated:
● IBM BigInsights
● Datameer 2.1
Approach & Proposed Solution
● Recognized the problem as a classification problem
similar to credit scoring or fraud detection.
● Classification is the problem of identifying to which of a
set of categories a new observation belongs, on the basis
of a training set of data containing observations whose
category membership is known.
● Build a predictive model based on machine learning
classification (supervised learning) to identify whether a
vehicle can be classified as good (passes quality check
on 1st try) or bad (fails quality check on 1st try)
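The classification setup above can be sketched in a few lines. This is a minimal illustration, not the model built in the project: the records, the field names (`shift`, `station`), and the one-rule learner are all hypothetical stand-ins for a labeled training set and a real classifier.

```python
from collections import Counter, defaultdict

# Hypothetical training set: production attributes plus the known outcome
# of the first quality check ("good" = passed, "bad" = failed).
TRAINING = [
    ({"shift": "day", "station": "A"}, "good"),
    ({"shift": "day", "station": "B"}, "good"),
    ({"shift": "day", "station": "A"}, "good"),
    ({"shift": "night", "station": "A"}, "bad"),
    ({"shift": "night", "station": "B"}, "bad"),
    ({"shift": "night", "station": "B"}, "good"),
]

def train_one_rule(records, field):
    """Learn the most common label for each observed value of `field`."""
    counts = defaultdict(Counter)
    for features, label in records:
        counts[features[field]][label] += 1
    rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    # Fall back to the overall majority label for unseen field values.
    default = Counter(label for _, label in records).most_common(1)[0][0]
    return field, rule, default

def classify(model, features):
    """Assign a new observation to a category using the learned rule."""
    field, rule, default = model
    return rule.get(features[field], default)

model = train_one_rule(TRAINING, "shift")
prediction = classify(model, {"shift": "night", "station": "A"})
```

The learner only sees records whose category is known, and is then asked about a vehicle whose category is not, which is the supervised-learning pattern described on this slide.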
Proposed Solutions - Tools
● BigInsights + SPSS Modeler
○ Hadoop is used to store big data and execute data
processing jobs in an efficient and distributed
fashion. IBM provides BigInsights as a management
and operational interface to simplify working with
Hadoop without doing much coding.
○ SPSS Modeler is a data analytics workbench that
allows the user to build predictive models by
leveraging built-in algorithms and functions without
the need for programming.
Proposed Solutions - Tools
● Datameer
○ Like BigInsights, Datameer Analytics Solution (DAS) presents a
web-based spreadsheet interface on top of a Hadoop
cluster and provides analytics functions and
visualizations out of the box without the need for writing
code.
○ DAS also has a Smart Analytics suite. One of the tools
available in that suite is a decision tree model which is a
descriptive model that can identify important factors that
affect quality.
○ Datameer can also be extended to run predictive models
created in R, SAS, SPSS, etc.
IBM Solution Architecture
● SPSS Modeler Client (Windows only)
● SPSS Modeler Server (multiplatform)
● SPSS Analytic Server (multiplatform)
○ allows analysts to run predictive analytics over big data
○ data-centric architecture ensures scalability and performance
● SPSS Analytic Catalyst
○ automatically discovers statistically interesting relationships in data
○ helps close the analytics specialist gap
○ useful in the early dataset discovery stage (helps to focus on the important parts)
○ automates some parts of CRISP-DM
● Hadoop (BigInsights)
Prediction in SPSS Modeler

425 predictors
85.4% accuracy
(on the training dataset)
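Because the 85.4% figure is measured on the training dataset, it is likely an optimistic estimate of how the model would perform on new vehicles. The sketch below shows how accuracy is computed from original and predicted values, and one simple way to set aside a holdout set. The labels and the `split_holdout` helper are hypothetical and are not taken from the project data.

```python
def accuracy(actual, predicted):
    """Fraction of records whose predicted value equals the original value."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy of an empty dataset")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

def split_holdout(records, every=5):
    """Hypothetical helper: keep every `every`-th record out of training."""
    train = [r for i, r in enumerate(records) if (i + 1) % every != 0]
    holdout = [r for i, r in enumerate(records) if (i + 1) % every == 0]
    return train, holdout

# Hypothetical original vs. predicted values for five vehicles.
actual = ["good", "good", "bad", "good", "bad"]
predicted = ["good", "bad", "bad", "good", "good"]
score = accuracy(actual, predicted)  # 3 of 5 correct

train, holdout = split_holdout(list(range(1, 11)))
```

Reporting accuracy on the holdout records, which the model never saw during training, gives a fairer picture than the training figure alone.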
Model Outcome
Original value | Predicted value | Confidence
Predictor Importance
C5.0 Algorithm
● C5.0 is an algorithm used to generate a decision tree
that can be used for classification; it is therefore often
referred to as a statistical classifier.
● A C5.0 model works by splitting the sample based on the
field that provides the maximum information gain. Each
subsample defined by the first split is then split again,
usually based on a different field, and the process
repeats until the subsamples cannot be split any further.
Finally, the lowest-level splits are reexamined, and those
that do not contribute significantly to the value of the
model are removed or pruned.
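The split-selection step described above can be illustrated with a small calculation. This is a sketch of plain information gain on hypothetical records, assuming categorical fields named `shift` and `station`; the full algorithm includes refinements, such as pruning, that are not shown here.

```python
from collections import Counter
from math import log2

# Hypothetical labeled records: production attributes and inspection outcome.
RECORDS = [
    ({"shift": "day", "station": "A"}, "good"),
    ({"shift": "day", "station": "B"}, "good"),
    ({"shift": "night", "station": "A"}, "bad"),
    ({"shift": "night", "station": "B"}, "bad"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(records, field):
    """Entropy of the whole sample minus the weighted entropy after splitting."""
    labels = [label for _, label in records]
    groups = {}
    for features, label in records:
        groups.setdefault(features[field], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_split(records, fields):
    """Pick the field whose split yields the maximum information gain."""
    return max(fields, key=lambda f: information_gain(records, f))

chosen = best_split(RECORDS, ["shift", "station"])
```

In this toy sample, splitting on `shift` separates good from bad vehicles completely, while splitting on `station` tells us nothing, so `shift` is chosen for the first split. A tree builder would then repeat the same selection inside each subsample.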
C5.0 Algorithm
● C5.0 models are quite robust in the presence of
problems such as missing data and large numbers of
input fields.
● They usually do not require long training times.
Because of the algorithm’s recursive nature, it can
benefit from parallel processing.
● C5.0 offers boosting to increase classification
accuracy.
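To give a feel for how boosting can raise accuracy, here is a minimal AdaBoost-style sketch using one-threshold decision stumps on hypothetical one-dimensional data. It is an illustration of the general technique only; C5.0's own boosting procedure may differ in detail, and the data and function names are assumptions.

```python
from math import exp, log

# Hypothetical data: one numeric feature, label +1 ("good") or -1 ("bad").
XS = [1, 2, 3, 4, 5, 6]
YS = [1, 1, -1, -1, 1, 1]

def stump_predict(stump, x):
    """A stump predicts `sign` below its threshold and `-sign` otherwise."""
    threshold, sign = stump
    return sign if x < threshold else -sign

def best_stump(xs, ys, weights):
    """Find the stump with the lowest weighted training error."""
    best, best_err = None, float("inf")
    for threshold in [x + 0.5 for x in [min(xs) - 1] + xs]:
        for sign in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict((threshold, sign), x) != y)
            if err < best_err:
                best, best_err = (threshold, sign), err
    return best, best_err

def boost(xs, ys, rounds):
    """Train `rounds` stumps, re-weighting misclassified records each round."""
    weights = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * log((1 - err) / err)      # vote strength of this stump
        ensemble.append((alpha, stump))
        weights = [w * exp(-alpha * y * stump_predict(stump, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps in the ensemble."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

single = [predict(boost(XS, YS, 1), x) for x in XS]
boosted = [predict(boost(XS, YS, 3), x) for x in XS]
```

No single threshold can separate this toy data, so one stump misclassifies some records, while three boosted stumps together reproduce every training label. Each round concentrates weight on the records the previous stumps got wrong.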
Datameer Analysis
● As previously mentioned, Datameer has some built-in
advanced analytics tools, but most of them are in the
descriptive analytics area. Its sole predictive analytics
tool is a specialized recommendation engine.
● Datameer can be extended to include predictive models
generated in tools like R, SAS, SPSS, etc. These take the
form of functions in DAS similar to the concept of
functions in Excel.
○ The disadvantage of this approach is that the hard work
of building the model is done without the support of the
big data platform.
○ Another disadvantage is the lack of the tight integration
present in the IBM solution; however, you do get the
freedom to use any tool.
Project Challenges & Opportunities
● Data understanding and formatting
● Time constraints
● More interaction with people on the ground
● More predictor data (a diverse dataset is key!)
○ Plant environment (temperature, humidity,
pressure)
○ Specific employees
○ Supplier & parts data
○ Warranty data
Questions?

Predictive Analytics Project in Automotive Industry