PRUDENTIAL LIFE INSURANCE RISK MODEL
A Kaggle competition for GA – PT Data Science ’15–’16
Patrick Kennedy – 2.15.16
patrick@structuredmotivation.com
What is the problem?
• Prudential life insurance has a 30-day process to establish risk
• What if we could make life insurance selection on-demand?
• Let’s build a model to predict levels of risk as measured by application status
Leaderboard
• Show the Kaggle leaderboard with scores (as measured by QWK)
• Goal? 30k
The Data – Anonymized:
– Train [59381, 128], Test [19765, 127]
– 13 continuous
– 65 categorical
– 4 discrete
– 48 other
– 1 Id, 1 Response
– Contains no a priori intuition
The real trick is that there are 8 classes of output… I chose to build models based on a continuous target and then use a function to provide cut points before submitting final predictions (…it seemed a little easier than building 8 separate models)
Initial
Exploration …
Roadmap
1. Find a model
2. Build a network of models
3. Tune
4. Results?
Baseline model (1/2)
• XGBoost – score of 0.669
• XGBoost stands for eXtreme Gradient Boosting
• Parallelized tree boosting / FAST
• Has Python wrappers for ease of use
Rank: 138 / 1970
Top 10%**
Baseline model (2/2)
• Process:
  1) train model
  2) train offsets
  3) apply offsets to the predicted test set
  (fmin_powell, quadratic weighted kappa)
• fmin_powell is an optimization method – it sequentially minimizes along each vector passed, updating iteratively
• QWK is an inter-rater agreement measure, except it takes into account how wrong the measures are and penalizes greater disagreement
Actual   Predicted   New Predictions
8        7.35        12.48
6        6.72        5.99
7        7.11        11.22
3        1.32        2.56
6        5.49        5.56
5        5.12        5.11
5        5.03        5.03
4        3.19        3.78
1        1.01        0.03
2        2.47        2.48
4        4.11        3.76
2        2.54        1.98
8        8.32        23.09
3        3.00        3.24
Offset guesses (applied per class) are optimized sequentially against -QWK; the adjusted predictions are then passed through np.clip(data, 1, 8) and rounded back to integer classes.
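The scoring metric and the clip/round step above can be sketched in plain Python. This is a minimal sketch, not the competition's exact evaluation code; `quadratic_weighted_kappa` and `to_classes` are illustrative names.

```python
def quadratic_weighted_kappa(actual, predicted, n_classes=8):
    """Inter-rater agreement where the penalty grows with the squared
    distance between the two labels, so big misses hurt more."""
    n = len(actual)
    observed = [[0.0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        observed[a - 1][p - 1] += 1
    hist_a = [sum(row) for row in observed]
    hist_p = [sum(observed[i][j] for i in range(n_classes))
              for j in range(n_classes)]
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = (i - j) ** 2 / (n_classes - 1) ** 2   # quadratic weight
            num += w * observed[i][j]                 # observed disagreement
            den += w * hist_a[i] * hist_p[j] / n      # expected by chance
    return 1.0 - num / den if den else 1.0

def to_classes(raw):
    """Mirror of the slide's np.clip(data, 1, 8) followed by rounding."""
    return [min(max(int(round(r)), 1), 8) for r in raw]
```

Perfect agreement scores 1.0, chance agreement 0, and systematic large misses go negative – which is what makes the cut-point placement worth optimizing.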
MOAR
models
• When one is good, how about 29?
Level 1: Model 1, Model 2, Model 3, Model 4, … Model 27
Level 2: XGBoost
Level 3: AdaBoost
Train / apply offset
Level 4: Weighted Predictions
Stacking: [...] stacked generalization is a means of non-linearly combining generalizers to make a new generalizer, to try to optimally integrate what each of the original generalizers has to say about the learning set. The more each generalizer has to say (which isn’t duplicated in what the other generalizers have to say), the better the resultant stacked generalization. — Wolpert (1992), Stacked Generalization
Blending: A word introduced by the Netflix winners. It is very close to stacked generalization, but a bit simpler and with less risk of an information leak. Some researchers use “stacked ensembling” and “blending” interchangeably. With blending, instead of creating out-of-fold predictions for the train set, you create a small holdout set of, say, 10% of the train set. The stacker model then trains on this holdout set only. (http://mlwave.com/kaggle-ensembling-guide/)
TRAIN / TEST / CV
1. Train model
2. Predict CV
3. Predict test
4. CV predictions become the new train set; averaged test predictions become the new test set
5. Iterate
…do this for each classifier.
Or you can use [stacked_generalization] @ https://github.com/dustinstansbury/stacked_generalization and do this automatically – and a lot faster!
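The per-level procedure above can be sketched schematically in plain Python. This is an illustrative sketch, not the stacked_generalization package: models are represented as fit functions that return predictors, and `kfold_indices` / `stack_level` are names I introduce here.

```python
def kfold_indices(n, k):
    """Yield (train_idx, val_idx) pairs over k roughly equal contiguous folds."""
    bounds = [i * n // k for i in range(k + 1)]
    for f in range(k):
        val = list(range(bounds[f], bounds[f + 1]))
        train = [i for i in range(n) if i < bounds[f] or i >= bounds[f + 1]]
        yield train, val

def stack_level(models, X, y, X_test, k=5):
    """One stacking level: out-of-fold predictions become the next level's
    train set; fold-averaged test predictions become its test set."""
    n = len(X)
    new_train = [[0.0] * len(models) for _ in range(n)]
    new_test = [[0.0] * len(models) for _ in range(len(X_test))]
    for m, fit_model in enumerate(models):
        for train_idx, val_idx in kfold_indices(n, k):
            # Fit on the out-of-fold rows only, to avoid leaking the target
            predict = fit_model([X[i] for i in train_idx],
                                [y[i] for i in train_idx])
            for i in val_idx:
                new_train[i][m] = predict(X[i])        # out-of-fold prediction
            for j, x_t in enumerate(X_test):
                new_test[j][m] += predict(x_t) / k     # fold-averaged test pred
    return new_train, new_test
```

Feeding `new_train` / `new_test` back in as the next level’s `X` / `X_test` is the “iterate” step.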
Stay tuned
• Grid search, random search
• hyperopt & BayesOpt
  (others: MOE and spearmint require a mongodb instance)
• Note: hyperopt also has the ability to select preprocessing and classifiers too … pretty cool

Method              Score   Time
GridSearchCV        n/a     Too long
RandomizedSearchCV  0.473   24.4 hours
Hyperopt            0.613   13 hours
BayesOpt            0.663   62 minutes
(scores for a single XGBRegressor model)
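What the random-search row above is doing can be sketched with the stdlib alone. This is a minimal sketch of the idea, not the RandomizedSearchCV or hyperopt APIs; `objective` and `space` are illustrative names, and in practice the objective would be a cross-validated QWK score for an XGBRegressor.

```python
import random

def random_search(objective, space, n_iter=20, seed=0):
    """Sample n_iter configurations uniformly from `space` and keep the best.
    `objective` maps a params dict to a score to maximize."""
    rng = random.Random(seed)
    best_score, best_params = float("-inf"), None
    for _ in range(n_iter):
        # Draw one value per hyperparameter, independently
        params = {name: rng.choice(values) for name, values in space.items()}
        score = objective(params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score
```

Grid search enumerates the full cross-product instead of sampling (hence “too long”), while hyperopt and BayesOpt replace the uniform sampling with a model of which regions look promising.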
Back to my models…
• Trying new params with the network of models (but fewer of them)… using an ensemble based on the optimizations
• What are the results? (score and time)
• What is the level system like?
Level 1: Model 1, Model 2, Model 3, Model 4, … Model 27
Level 2: XGBoost
Level 3: AdaBoost
Train / apply offset
Level 4: Weighted Predictions
Auto-sklearn
• Eh?
Final-ish
Results
Model                    Best Score  Time
Single XGBoost           0.669*      15 minutes
4-level stack            0.665       ~12 hours
Tuned single XGBoost     0.663       75 minutes
Auto-sklearn + XGBoost   0.667       60 minutes
* Lucky seed
In the meantime my position has gone from 138/1970 to 660/2695 – roughly the 24th percentile
Last ditch effort
• If model optimization is a dead end, what other aspects can be optimized?
• Offsets!
  – 1a) Initial offset guesses (fmin is sensitive to these)
  – 1b) The order in which the offsets are applied (fmin is sensitive to this too)
  – 2) Binning predictions instead of applying offsets?
• Are there really no intuitions about the data?
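The binning alternative in point 2 can be sketched in a few lines: instead of nudging predictions with per-class offsets, choose 7 ordered cut points and let them define the 8 classes directly (the cut positions themselves are what gets optimized against -QWK). A minimal sketch; `bin_predictions` is an illustrative name.

```python
import bisect

def bin_predictions(raw, cuts):
    """Map continuous predictions onto classes 1..8 using 7 ordered cut
    points: a prediction falls into the class whose interval contains it."""
    return [bisect.bisect_right(cuts, r) + 1 for r in raw]
```

With the naive midpoint cuts `[1.5, 2.5, ..., 7.5]` this reduces to plain rounding; moving the cuts is what buys extra QWK.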
Final Results
Model                     Best Score  Time
Single XGBoost            0.669       15 minutes
4-level stack             0.665       ~12 hours
Tuned single XGBoost      0.663       75 minutes
Auto-sklearn + XGBoost    0.667       60 minutes
Optimize XGBoost offsets  0.667       15 minutes + ~12 hrs for optimizations
Optimize XGBoost bins     0.664       15 minutes + ~4 hrs for optimizations
Roadmap
1. Find a model
2. Build a network of models
3. Tune
4. Results?
Next steps… • 5 days left to....
  – Explore potential structural intuitions
    • (count / sum / interaction effects)
  – Explore additional models like neural networks...
• Down the road...
  – Beef up stacking and blending skills (optimize time) – or build my own
  – Win a GD competition
• A note about insurance and risk...
PRUDENTIAL LIFE INSURANCE RISK MODEL
A Kaggle competition for GA – PT Data Science ’15–’16
Patrick Kennedy – 2.15.16
patrick@structuredmotivation.com