Santander Bank Challenge
Duy Tran, Indranil Dey, Sriram RV, Sushir
Simkhada, Dane Arnesen
Agenda
› Santander Bank customer satisfaction dataset overview (Sushir)
› Data preprocessing (Sushir)
› Algorithms / Tools
– Random Forest using Python (Dane Arnesen)
– SVM using Matlab (Indranil Dey)
– Gradient Tree Boosting / XGBoost using R (Duy Tran)
– Neural Network using Spark with H2O (Sriram RV)
› Conclusions & Lessons Learned (Sushir)
› Q&A
Santander Bank Challenge
• The competition was hosted on www.kaggle.com.
• Santander Bank wants to identify dissatisfied customers early.
• This will help the bank take action to improve customer happiness.
• Which customers are unhappy?
– Happy = 0, Unhappy = 1
– 371 features including CustomerID & TargetAttr
– 76,020 rows in training data, only 3,008 rows where TargetAttr=1
Preprocessing
Issues:
 Many more happy customers than unhappy customers (a strong class imbalance).
 Variables were provided in Spanish, so their meanings were unclear to us.
 Data processing: how to remove highly correlated variables and zero-variance variables.
Solution (the zero-variance step is sketched below; the correlation filter is sketched in the SVM section):
• Removal of zero-variance attributes
• Removal of highly correlated attributes using a correlation matrix
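A minimal pandas sketch of the zero-variance step, assuming the Kaggle train.csv with the column names used in this deck (CustomerID, TargetAttr); the path and layout are assumptions, not confirmed by the slides.

```python
# Minimal sketch of the zero-variance filter, assuming the Kaggle
# train.csv and the column names used in this deck (hypothetical layout).
import pandas as pd

train = pd.read_csv("train.csv")
X = train.drop(columns=["CustomerID", "TargetAttr"])  # 369 predictors
y = train["TargetAttr"]                               # 0 = happy, 1 = unhappy

# Constant columns carry no information for the classifier.
zero_var = [c for c in X.columns if X[c].nunique() == 1]
X = X.drop(columns=zero_var)
print(f"Dropped {len(zero_var)} zero-variance attributes; {X.shape[1]} remain")
```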
Random Forest
Python
Python RandomForestClassifier
› Scikit-Learn, an open-source Python data science library
– Classification, regression, clustering, dimensionality reduction, visualization, etc.
– Recommended install via the Anaconda distribution: https://www.continuum.io/downloads
› RandomForestClassifier is part of the Ensemble family of classifiers
– Uses random subsets of features plus bagging
– Lots of parameters… (an illustrative sketch follows below)
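A minimal scikit-learn sketch of such a classifier, continuing from the preprocessing sketch above; the hyperparameter values are illustrative placeholders, not the settings the deck actually used.

```python
# Illustrative RandomForestClassifier setup; parameter values are
# placeholders, not the deck's actual settings.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

rf = RandomForestClassifier(
    n_estimators=500,         # number of random trees
    max_features="sqrt",      # random subset of features at each split
    class_weight="balanced",  # one way to soften the 0/1 imbalance
    n_jobs=-1,
    random_state=42,
)
rf.fit(X_train, y_train)

# Probability of class 1 (unhappy), thresholded at a cutoff such as 55%.
proba_unhappy = rf.predict_proba(X_test)[:, 1]
pred = (proba_unhappy >= 0.55).astype(int)
```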
Model Prediction Probability
Number of Random Trees
Model Feature Importance
› Of 371 total features…
– Only 13 features had a measurable impact on the Random Forest classifier (see the ranking sketch below)
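A short sketch of how such a ranking can be read off the fitted forest from the previous sketch; the 1% cutoff is a hypothetical threshold, not the one the deck used.

```python
# Rank features by impurity-based importance (threshold is hypothetical).
import pandas as pd

importances = pd.Series(rf.feature_importances_, index=X_train.columns)
ranked = importances.sort_values(ascending=False)
print(ranked.head(13))  # the handful of features that matter
print((ranked > 0.01).sum(), "features above a 1% importance share")
```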
AUC Curve & Confusion Matrix
| Actual \ Predicted | 1 | 0 |
|---|---|---|
| 1 | 1,603 (TP) | 405 (FN) |
| 0 | 586 (FP) | 1,408 (TN) |
› Using 55% probability cutoff:
– Accuracy: 75%
– TPR: 80%
– FPR: 29%
– Precision: 73%
– F1: 76%
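The reported figures follow directly from the confusion matrix above; a quick sanity check in plain Python:

```python
# Recompute the reported metrics from the confusion matrix above.
tp, fn, fp, tn = 1603, 405, 586, 1408

accuracy  = (tp + tn) / (tp + fn + fp + tn)      # 0.752 -> 75%
tpr       = tp / (tp + fn)                       # 0.798 -> 80%
fpr       = fp / (fp + tn)                       # 0.294 -> 29%
precision = tp / (tp + fp)                       # 0.732 -> 73%
f1 = 2 * precision * tpr / (precision + tpr)     # 0.764 -> 76%
```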
Support Vector Machine
Matlab
Support Vector Machine
› A Support Vector Machine (SVM) is a discriminative classifier formally defined by
a separating hyperplane. Given labeled training data (supervised learning), the
algorithm outputs an optimal hyperplane which categorizes new examples.
Advantages:
› SVMs produce a large-margin separating hyperplane and are efficient in high-dimensional spaces
› The margin between the points closest to the boundary is maximized
› SVMs only consider points near the margin (the support vectors), which makes them more robust
Disadvantages:
› Due to the complexity of the algorithm, SVMs require a large amount of memory and take a long time to train and to score the test data
› The model is sensitive to the choice of kernel and regularization parameters
Support Vector Machine
MODEL INFO:
Status: Trained
Training Time: 04:48:27
Classifier Options
Type: SVM
Kernel function: Linear
kernel scale: 1.0
Kernel scale mode: Auto
Box constraint level: 1.0
Multiclass method: One-vs-One
Standardize data: true
Cross Validation: 10 Folds
Feature Selection Options
Features Included: 369
Validation Results
Validation accuracy: 96%
› Model 1 : SVM using Linear Kernel – complete dataset with 369 predictors
| Class | Precision | Recall | F1 |
|---|---|---|---|
| 0 | 100% | 96.04% | 97.98% |
| 1 | 0% | 0% | -- |

AUC (both classes): 58.01%

Note the 0% precision and recall on class 1: with roughly 96% of rows in class 0, a model can reach 96% validation accuracy while failing to correctly identify a single unhappy customer.
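The deck built this model in MATLAB's Classification Learner; purely as a hedged illustration, an analogous setup in scikit-learn might look like the sketch below (linear kernel, box constraint 1.0, standardized data, 10-fold cross-validation).

```python
# Hedged scikit-learn analogue of the MATLAB setup above; not the
# deck's actual code.
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

svm = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
scores = cross_val_score(svm, X, y, cv=10, scoring="accuracy")
print(scores.mean())  # accuracy alone is misleading on imbalanced data
```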
Reducing the Number of Predictors
› By using MATLAB we created a correlation matrix for 369 predictors
› From the correlation matrix we identified predictors which are highly positively or
negatively correlated
– Highly positively correlated: Correlation greater than 0.75
– Highly negatively correlated: Correlation less than -0.75
› After removing the highly correlated predictors, the total number of predictors was reduced from 369 to 115 (a Python sketch of this filter follows the figure below)
Correlation Matrix with 369 Predictors
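A sketch of the same filter in pandas/NumPy (the deck did this step in MATLAB): drop one predictor from every pair whose absolute correlation exceeds 0.75, continuing the earlier preprocessing sketch.

```python
# Correlation filter: drop one predictor from each pair with |r| > 0.75.
import numpy as np

corr = X.corr().abs()
# Keep only the upper triangle so each pair is inspected once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.75).any()]
X_reduced = X.drop(columns=to_drop)   # 369 -> ~115 predictors
```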
Balancing the Dataset & Applying PCA
› After removing the correlated predictors, the SVM models became biased toward predicting class 0, which was not the desired outcome
› To overcome this, we balanced the training dataset, i.e. kept an equal number of records of both classes in the training data
– Using MATLAB, we randomly selected 3,008 records of class 0 and combined them with the 3,008 records of class 1
› To improve the SVM models further, we also applied PCA with 50 components (see the sketch below)
– Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
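A hedged Python sketch of the balancing and PCA steps (the deck used MATLAB), continuing the earlier pandas sketch; the random seeds are illustrative.

```python
# Undersample class 0 to 3,008 rows to match class 1, then fit PCA with
# 50 components on standardized predictors; seeds are illustrative.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

happy   = train[train["TargetAttr"] == 0].sample(n=3008, random_state=42)
unhappy = train[train["TargetAttr"] == 1]
balanced = pd.concat([happy, unhappy]).sample(frac=1, random_state=42)

X_bal = balanced.drop(columns=["CustomerID", "TargetAttr"])
pca = PCA(n_components=50)
X_pca = pca.fit_transform(StandardScaler().fit_transform(X_bal))
print(pca.explained_variance_ratio_[:3])  # leading components dominate
```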
Support Vector Machine
MODEL INFO:
Status: Trained
Training Time: 00:06:42
Classifier Options
Type: SVM
Kernel function: Linear
kernel scale: 1
Kernel scale mode: Auto
Box constraint level: 1.0
Multiclass method: One-vs-One
Standardize data: true
Cross Validation: 10 Folds
Feature Selection Options
Features Included: 115
PCA Options
Enable PCA: true
Maximum number of components: 50
Validation Results
Validation accuracy: 72.6%
› Model 6 : SVM using Linear kernel – PCA (50 components)
| Class | Precision | Recall | F1 |
|---|---|---|---|
| 0 | 72.47% | 72.67% | 72.57% |
| 1 | 72.74% | 72.55% | 72.64% |

AUC (both classes): 77.54%
PCA explained variances: 61.5%, 28.6%, 10.0%, …
Comparing the SVM Models
› Model 6 has the best balanced prediction accuracy across both classes
| Model | Description | Accuracy | Class | Precision | Recall | F1 | AUC |
|---|---|---|---|---|---|---|---|
| Model 1 | SVM Linear Kernel – complete dataset, 369 predictors | 96% | 0 | 100% | 96.04% | 97.98% | 58.01% |
| | | | 1 | 0% | 0% | -- | |
| Model 2 | SVM Linear Kernel – complete dataset, 115 predictors | 96% | 0 | 99.99% | 96.04% | 97.98% | 59.68% |
| | | | 1 | 0% | 0% | -- | |
| Model 3 | SVM Gaussian Kernel – complete dataset, 115 predictors | 96% | 0 | 99.99% | 96.04% | 97.98% | 51.07% |
| | | | 1 | 0% | 0% | -- | |
| Model 4 | SVM Linear Kernel – balanced dataset, 115 predictors | 70.8% | 0 | 67.75% | 72.14% | 69.88% | 78.64% |
| | | | 1 | 73.84% | 69.6% | 71.66% | |
| Model 5 | SVM Gaussian Kernel – balanced dataset, 115 predictors | 70.2% | 0 | 84.48% | 65.71% | 73.92% | 77.58% |
| | | | 1 | 55.92% | 78.27% | 65.23% | |
| Model 6 | SVM Linear Kernel (PCA) – balanced dataset, 115 predictors | 72.6% | 0 | 72.47% | 72.67% | 72.57% | 77.54% |
| | | | 1 | 72.74% | 72.55% | 72.64% | |

* All models built with 10-fold cross-validation
Learnings from Building the SVM Models
› Removing highly correlated predictors simplifies models
› PCA is also a good way to deal with correlated attributes in a dataset
› An unbalanced training dataset skews the model's predictions toward the class with more instances
› There is no single way to increase a model's prediction accuracy; multiple approaches should be tried to improve the predictive models iteratively
Gradient Tree Boosting
R
Performance Metrics - GBM
| Predicted \ Actual | Class 1 | Class 0 |
|---|---|---|
| Class 1 | 256 | 316 |
| Class 0 | 1,104 | 13,569 |
› Accuracy: 0.9069
› Precision: 0.44755
› TPR: 0.18824
› TNR: 0.97724
› F1: 0.26502 (= 2 · Precision · TPR / (Precision + TPR))
Training Process – GBM
The main knobs tuned during training (a mapping to xgboost parameters is sketched below):
› Number of trees
› Use all observations? (row subsampling)
› Use all predictors? (column subsampling)
› Maximum depth of each tree
› Learning rate
› Balance response classes? – this increases the true positive rate, but also increases the false positive rate!
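The deck tuned these knobs in R; as a hedged illustration, they map one-to-one onto xgboost's scikit-learn-style Python API. The values below are placeholders, not the tuned settings.

```python
# Each knob above has a direct xgboost counterpart; values are placeholders.
from xgboost import XGBClassifier

gbm = XGBClassifier(
    n_estimators=500,       # number of trees
    subsample=0.8,          # "use all observations?" -- row sampling
    colsample_bytree=0.8,   # "use all predictors?" -- column sampling
    max_depth=5,            # maximum depth of each tree
    learning_rate=0.05,     # shrinkage per boosting round
    scale_pos_weight=24,    # "balance response classes?" (~73,012 / 3,008)
    eval_metric="auc",
)
gbm.fit(X_train, y_train)
```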
Hyperparameter Optimization – Grid vs. Random
http://h2o-release.s3.amazonaws.com/h2o/rel-turchin/3/docs-website/h2o-docs/booklets/GBM_Vignette.pdf
› Grid search – exhaustive; suffers from the curse of dimensionality
› Random search – found to be more effective: http://jmlr.csail.mit.edu/papers/volume13/bergstra12a/bergstra12a.pdf
› Both parallelize easily (a sketch follows below)
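A hedged sketch of random search over the same knobs with scikit-learn's RandomizedSearchCV; the distributions and n_iter are illustrative, not the deck's settings.

```python
# Random search over the GBM knobs; candidates are evaluated in parallel.
from scipy.stats import randint, uniform
from sklearn.model_selection import RandomizedSearchCV

param_dist = {
    "max_depth":     randint(3, 10),
    "learning_rate": uniform(0.01, 0.2),
    "subsample":     uniform(0.5, 0.5),    # samples from [0.5, 1.0]
    "n_estimators":  randint(100, 1000),
}
search = RandomizedSearchCV(gbm, param_dist, n_iter=30,
                            scoring="roc_auc", cv=5, n_jobs=-1,
                            random_state=42)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```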
Neural Network
Spark with H2O
What is Deep Learning?
› Deep learning learns a hierarchy of non-linear transformations
› Neurons transform their input in a non-linear way
› There are three types of neurons: input, hidden, and output
› Input neurons are activated by the numbers in your dataset, and the output neurons produce the output you want to see (a toy forward pass is sketched below)
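A toy NumPy forward pass (not the model actually trained here) to make the hierarchy concrete: each layer applies a linear map followed by a non-linearity.

```python
# Toy forward pass: input -> hidden (ReLU) -> output (sigmoid).
import numpy as np

rng = np.random.default_rng(0)
x  = rng.normal(size=(1, 369))           # one customer's input neurons
W1 = rng.normal(size=(369, 64))          # weights are untrained here
W2 = rng.normal(size=(64, 1))

hidden = np.maximum(0.0, x @ W1)               # hidden neurons: ReLU
output = 1.0 / (1.0 + np.exp(-(hidden @ W2)))  # output neuron ~ P(unhappy)
```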
Why did I choose this model?
• Prediction is fast, and it produced fewer misclassification errors than the other algorithms we tried
• Handles lots of irrelevant features well (separates signal from noise)
• Automatically learns feature interactions
• H2O is a JVM-based platform that brings database-like interactiveness to Hadoop and is optimized for in-memory processing of distributed, parallel machine learning algorithms on clusters; it can be installed standalone or on top of an existing Hadoop installation (see the sketch below)
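The deck drove H2O from RStudio via Spark; as a hedged sketch, the same estimator is available through the h2o Python API. The file path, column names, and parameters below are illustrative assumptions.

```python
# Hedged sketch of H2O deep learning via the Python API (the deck used
# Spark/RStudio); path and parameters are illustrative.
import h2o
from h2o.estimators import H2ODeepLearningEstimator

h2o.init()                                # starts a local H2O JVM
frame = h2o.import_file("train.csv")
frame["TargetAttr"] = frame["TargetAttr"].asfactor()  # classification

dl = H2ODeepLearningEstimator(hidden=[200, 200], epochs=10,
                              balance_classes=True)
dl.train(y="TargetAttr",
         x=[c for c in frame.columns if c not in ("CustomerID", "TargetAttr")],
         training_frame=frame)
print(dl.model_performance())
```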
Performance Metrics – Deep Learning
| Actual \ Predicted | Class 0 | Class 1 |
|---|---|---|
| Class 0 | 64,856 | 8,156 |
| Class 1 | 1,673 | 1,335 |

› Error rate: 0.129295
› Accuracy: 0.70785
› F1: 0.31751
Training the Deep Learning Model
Spark Integration with RStudio
Drawbacks
› Needs a large data set.
› The training time is long.
› Needs a lot of parameter tuning (feature selection).
› Features need to be on the same scale.
Conclusions & Lessons Learned
Conclusions & Lessons Learned
› Understanding the concept of data mining using classification
› Python, R, Scala, and MATLAB are all useful tools for data mining
› Data preprocessing and removal of highly correlated variables help to identify the main variables
› We applied Random Forest, SVM, Gradient Tree Boosting, and Neural Network classifiers, together with PCA and confusion-matrix evaluation
› Combining various techniques helps to identify the factors related to unsatisfied customers
› The ROC curve was helpful for assessing model accuracy
› Gradient Tree Boosting gave us the best model
Q&A