This document summarizes a presentation given by Duy Tran, Indranil Dey, Sriram RV, Sushir Simkhada, and Dane Arnesen on their work for the Santander Bank customer satisfaction challenge. They tested several machine learning algorithms including random forest (Python), support vector machine (Matlab), gradient tree boosting (R), and neural network (Spark with H2O). Their goal was to identify dissatisfied customers. Through data preprocessing, model tuning, and comparing results, they found that gradient tree boosting performed best at predicting customer satisfaction. They concluded that combining multiple techniques helps identify key factors related to customer satisfaction.
2. Agenda
› Santander Bank customer satisfaction dataset overview (Sushir)
› Data preprocessing (Sushir)
› Algorithms / Tools
– Random Forest using Python (Dane Arnesen)
– SVM using Matlab (Indranil Dey)
– Gradient Tree Boosting / XGBoost using R (Duy Tran)
– Neural Network using Spark with H2O (Sriram RV)
› Conclusions & Lessons Learned (Sushir)
› Q&A
3. Santander Bank Challenge
• The competition was hosted on www.kaggle.com.
• Santander Bank wants to identify dissatisfied customers.
• This will help the bank take action to improve customer happiness.
• Which customers are unhappy?
– Happy = 0, Unhappy = 1
– 371 features including CustomerID & TargetAttr
– 76,020 rows in the training data, only 3,008 rows where TargetAttr = 1
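A minimal sketch of loading the training data and checking the class balance (file path and column names are assumptions; the public Kaggle files use ID and TARGET, whereas the slides refer to CustomerID and TargetAttr):

```python
import pandas as pd

# Assumed path and column name; adjust to the actual Kaggle download.
train = pd.read_csv("train.csv")
print(train.shape)                      # expected: (76020, 371)
print(train["TARGET"].value_counts())   # expected: ~73,012 zeros vs 3,008 ones
```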
4. Preprocessing Issues:
• Many more happy customers than unhappy customers (severe class imbalance).
• Variable names were provided in Spanish, so we did not understand the meaning of the variables.
Data processing:
• How to remove highly correlated variables and zero-variance variables?
Solution:
• Remove zero-variance attributes.
• Remove highly correlated attributes using a correlation matrix.
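As a rough illustration of the first step (the team worked across several tools, so this pandas sketch is not their exact code), constant columns can be dropped like this; the correlation-based removal is sketched after slide 14 below:

```python
import pandas as pd

def drop_zero_variance(df: pd.DataFrame) -> pd.DataFrame:
    """Drop columns that contain a single value (zero variance, no signal)."""
    constant_cols = [col for col in df.columns if df[col].nunique() <= 1]
    return df.drop(columns=constant_cols)
```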
6. Python RandomForestClassifier
› Python data science library scikit-learn
– Classification, regression, clustering, dimensionality reduction, visualization, etc.
– Open source
– The Anaconda distribution is recommended: https://www.continuum.io/downloads
› RandomForestClassifier is part of the ensemble family of classifiers
– Uses random subsets of features plus bagging
– Lots of parameters…
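A hedged scikit-learn sketch of the classifier described above (the specific parameter values are illustrative, not necessarily the ones the team used):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rf = RandomForestClassifier(
    n_estimators=500,         # number of trees (illustrative value)
    max_features="sqrt",      # random subset of features at each split
    class_weight="balanced",  # compensate for the 3,008 vs ~73,000 class imbalance
    n_jobs=-1,
    random_state=42,
)
# scores = cross_val_score(rf, X, y, cv=10, scoring="roc_auc")  # X, y prepared earlier
```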
12. Support Vector Machine
› A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. Given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples.
Advantages:
› SVMs produce a large-margin separating hyperplane and remain efficient in high dimensions
› The margin between the points closest to the boundary is maximized
› SVMs only consider points near the margin (the support vectors) – more robust
Disadvantages:
› Due to the complexity of the algorithm, it requires a large amount of memory and takes a long time to train the model and predict on the test data
› The model is sensitive to the choice of kernel and regularization parameters
13. Support Vector Machine
MODEL INFO:
Status: Trained
Training Time: 04:48:27
Classifier Options
Type: SVM
Kernel function: Linear
kernel scale: 1.0
Kernel scale mode: Auto
Box constraint level: 1.0
Multiclass method: One-vs-One
Standardize data: true
Cross Validation: 10 Folds
Feature Selection Options
Features Included: 369
Validation Results
Validation accuracy: 96%
› Model 1: SVM using linear kernel – complete dataset with 369 predictors
Class   Precision   Recall    F1
0       100%        96.04%    97.98%
1       0%          0%        --
AUC (Class 0 and Class 1): 58.01%
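The model above was built in MATLAB's Classification Learner; a rough scikit-learn equivalent of a standardized linear SVM with 10-fold cross-validation (C=1.0 mirrors the box constraint level) might look like this:

```python
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

svm = make_pipeline(StandardScaler(), LinearSVC(C=1.0))   # standardize data, linear kernel
# scores = cross_val_score(svm, X, y, cv=10)               # 10-fold cross-validation
```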
14. Reducing the Number of Predictors
› Using MATLAB, we created a correlation matrix for the 369 predictors
› From the correlation matrix we identified predictors that are highly positively or negatively correlated
– Highly positively correlated: correlation greater than 0.75
– Highly negatively correlated: correlation less than -0.75
› After removing the highly correlated predictors, the total number of predictors was reduced from 369 to 115
(Figure: correlation matrix with 369 predictors)
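The team did this in MATLAB; an equivalent pandas/NumPy sketch that drops one column from every pair with |correlation| > 0.75:

```python
import numpy as np
import pandas as pd

def drop_correlated(df: pd.DataFrame, threshold: float = 0.75) -> pd.DataFrame:
    """Drop one column of every pair whose absolute correlation exceeds the threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)
```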
15. Balancing the Dataset & Applying PCA
› After removal of the correlated predictors, the SVM models became even more biased toward predicting class 0, which was not the desired outcome
› To overcome this issue we had to balance the training dataset, i.e. keep an equal number of records of both classes in the training data
– Using MATLAB, we randomly selected 3,008 records of class 0 and combined them with the 3,008 records of class 1
› To improve the SVM models further, we also applied PCA with 50 components
– Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components
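A sketch of the balancing and PCA steps in Python (the team used MATLAB; the `train` frame and column names are assumptions carried over from the earlier loading sketch):

```python
import pandas as pd
from sklearn.decomposition import PCA

# Randomly pick 3,008 happy customers to match the 3,008 unhappy ones.
happy = train[train["TARGET"] == 0].sample(n=3008, random_state=1)
unhappy = train[train["TARGET"] == 1]
balanced = pd.concat([happy, unhappy]).sample(frac=1, random_state=1)  # shuffle rows

X = balanced.drop(columns=["ID", "TARGET"])
pca = PCA(n_components=50)                  # keep 50 principal components
X_pca = pca.fit_transform(X)
print(pca.explained_variance_ratio_[:3])    # leading components' explained variance
```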
16. Support Vector Machine
MODEL INFO:
Status: Trained
Training Time: 00:06:42
Classifier Options
Type: SVM
Kernel function: Linear
kernel scale: 1
Kernel scale mode: Auto
Box constraint level: 1.0
Multiclass method: One-vs-One
Standardize data: true
Cross Validation: 10 Folds
Feature Selection Options
Features Included: 115
PCA Options
Enable PCA: true
Maximum number of components: 50
Validation Results
Validation accuracy: 72.6%
› Model 6: SVM using linear kernel – PCA (50 components)
Class   Precision   Recall    F1
0       72.47%      72.67%    72.57%
1       72.74%      72.55%    72.64%
AUC (Class 0 and Class 1): 77.54%
PCA explained variances: 61.5%, 28.6%, 10.0%, …
17. Comparing the SVM Models
› Model 6 has the best prediction accuracy for both classes
Model    Description                                               Accuracy  Class 0 (Precision / Recall / F1)  Class 1 (Precision / Recall / F1)  AUC
Model 1  SVM linear kernel – complete dataset, 369 predictors      96%       100% / 96.04% / 97.98%             0% / 0% / --                       58.01%
Model 2  SVM linear kernel – complete dataset, 115 predictors      96%       99.99% / 96.04% / 97.98%           0% / 0% / --                       59.68%
Model 3  SVM Gaussian kernel – complete dataset, 115 predictors    96%       99.99% / 96.04% / 97.98%           0% / 0% / --                       51.07%
Model 4  SVM linear kernel – balanced dataset, 115 predictors      70.8%     67.75% / 72.14% / 69.88%           73.84% / 69.6% / 71.66%            78.64%
Model 5  SVM Gaussian kernel – balanced dataset, 115 predictors    70.2%     84.48% / 65.71% / 73.92%           55.92% / 78.27% / 65.23%           77.58%
Model 6  SVM linear kernel (PCA) – balanced dataset, 115 predictors 72.6%    72.47% / 72.67% / 72.57%           72.74% / 72.55% / 72.64%           77.54%
* All models built with 10-fold cross-validation
18. Learnings from building SVM Model
› Removing highly correlated predictors simplifies models
› PCA is also a good way to deal with correlated attributes in a dataset
› An unbalanced training dataset will skew the model's predictions toward the class with the larger number of instances
› There is no single way to increase the prediction accuracy of a model; we should take multiple approaches and improve the predictive models iteratively
20. Performance Metrics - GBM
Confusion matrix:
            Class 1   Class 0
Class 1     256       316
Class 0     1,104     13,569
› Accuracy : 0.9069
› Precision : 0.44755
› TPR : 0.18824
› TNR: 0.97724
› F1 : 0.51751
21. Training Process - GBM
› Number of trees
› Use all observations?
› Use all predictors?
› Maximum depth of each tree
› Learning rate
› Balance response classes? – this increases the true positive rate but also increases the false positive rate (see the sketch below)
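These knobs map roughly onto the XGBoost parameters shown below; a hedged Python sketch (the team tuned XGBoost in R, and the values here are illustrative only):

```python
import xgboost as xgb

model = xgb.XGBClassifier(
    n_estimators=300,       # number of trees
    subsample=0.8,          # fraction of observations used per tree
    colsample_bytree=0.8,   # fraction of predictors used per tree
    max_depth=5,            # maximum depth of each tree
    learning_rate=0.05,     # shrinkage
    scale_pos_weight=24,    # ~73,000 negatives / 3,008 positives, to balance classes
)
# model.fit(X_train, y_train)  # X_train / y_train assumed prepared earlier
```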
23. Hyperparameter optimization – Grid vs Random
Reference: http://h2o-release.s3.amazonaws.com/h2o/rel-turchin/3/docs-website/h2o-docs/booklets/GBM_Vignette.pdf
› Grid search – exhaustive, suffers from the curse of dimensionality
› Random search – found to be more effective: http://jmlr.csail.mit.edu/papers/volume13/bergstra12a/bergstra12a.pdf
› Easy parallelization
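A minimal scikit-learn sketch of random search over GBM hyperparameters (the distributions and iteration count are illustrative):

```python
from scipy.stats import randint, uniform
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

param_distributions = {
    "n_estimators": randint(100, 500),
    "max_depth": randint(2, 8),
    "learning_rate": uniform(0.01, 0.2),   # samples from [0.01, 0.21)
    "subsample": uniform(0.5, 0.5),        # samples from [0.5, 1.0)
}
search = RandomizedSearchCV(
    GradientBoostingClassifier(),
    param_distributions=param_distributions,
    n_iter=30,            # 30 random draws instead of an exhaustive grid
    scoring="roc_auc",
    cv=5,
    n_jobs=-1,
)
# search.fit(X_train, y_train)
```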
25. What is Deep Learning?
› Deep learning learns a hierarchy of non-linear transformations.
› Neurons transform their input in a non-linear way.
› There are three types of neurons: input, output and hidden neurons.
› Input neurons are activated by the numbers in your dataset, and the output neurons produce the output you want to see.
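As a toy illustration of that hierarchy (NumPy only, random weights, purely to show the shape of the computation, not a trained model):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)   # non-linear activation

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 369))                    # one customer row (369 inputs)
W1 = rng.normal(size=(369, 50))
W2 = rng.normal(size=(50, 50))
w_out = rng.normal(size=(50, 1))

h1 = relu(x @ W1)                                # first hidden layer
h2 = relu(h1 @ W2)                               # second hidden layer
p = 1.0 / (1.0 + np.exp(-(h2 @ w_out)))          # output neuron: P(unhappy)
print(float(p))
```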
26. Why did I choose this model?
• Prediction speed is fast, and the results were strong, with fewer misclassification errors than the other algorithms we tried.
• Handles lots of irrelevant features well (separates signal from noise).
• Automatically learns feature interactions.
• H2O is a JVM-based platform that brings database-like interactiveness to Hadoop and is optimized for in-memory processing of distributed, parallel machine learning algorithms on clusters. It can be installed standalone or on top of an existing Hadoop installation.
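A minimal sketch of training a deep learning model through H2O's Python API (the team drove H2O from Spark; the layer sizes, epochs, file path and column names here are assumptions):

```python
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()
train = h2o.import_file("train.csv")             # assumed path
train["TARGET"] = train["TARGET"].asfactor()     # classification target
predictors = [c for c in train.columns if c not in ("ID", "TARGET")]

dl = H2ODeepLearningEstimator(hidden=[50, 50], epochs=10, balance_classes=True)
dl.train(x=predictors, y="TARGET", training_frame=train)
print(dl.model_performance(train))
```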
27. Performance Metrics – Deep Learning
Confusion matrix:
            Class 0   Class 1
Class 0     64,856    8,156
Class 1     1,673     1,335
› Error rate: 0.129295
› Accuracy: 0.70785
› F1: 0.31751
31. Drawbacks
› Needs a large data set.
› The training time is long.
› Needs a lot of parameter tuning (feature selection).
› Features need to be on the same scale.
33. Conclusions & Lessons Learned
› Understanding the concept of data mining using classification
› Python/R/Scala/Matlab are useful tools for data mining
› Data preprocessing and removal of highly correlated variables help to identify the main variables
› Techniques used: random forest classifier, confusion matrix, PCA, SVM, neural network, gradient tree boosting
› Combining various techniques helps to identify the factors related to unsatisfied customers
› The ROC curve was helpful for assessing the accuracy of the models
› Gradient tree boosting gave us the best model