Importance of Domain Expertise for Building ML Based Models
Mark Seiss, PhD
Director, Advanced Analytic Services
Dun & Bradstreet
#H2OWORLD
AGENDA
– Introduction
– Questions Explored
– Data
– Analysis/Experiments
– Summary Recommendations
Special Thanks to Venkata Vipparthi from the D&B Chennai office for his
contributions to this research.
Introduction
MOTIVATION
– For traditional modeling methods, the Advanced Analytic Services (AAS) team at Dun & Bradstreet has developed its own steps and best practices.
– Our goal is to provide similar guidance for the Machine Learning models that our team increasingly uses to give our customers improved assessments of risk and targeting of prospects.
GENERAL STEPS FOR BUILDING TRADITIONAL MODELS
1. Segmentation Analysis
2. Univariate Analysis
3. Variable Selection
4. Explore Interactions
5. Model Validation
6. Finalize and Document Model
GUIDANCE FOR ML MODELS
– Anecdotal evidence often cited
– Limited literature
– How to answer customer queries on ML?
GOALS
– Create a rubric for implementing ML models for all D&B data scientists.
– Provide guidance on when ML models should be used versus traditional models.
Questions Explored
– How does the univariate performance distribution affect the lift provided by ML models?
– Does model segmentation improve the performance of ML models?
– Should the pool of predictor variables be filtered prior to input into ML models?
– How many records are needed for ML models to outperform traditional models?
Data Used in Study
This study uses analytic datasets previously aggregated by the D&B Advanced Analytic Services (AAS)
team for the development of various standard and custom models.
FINANCIAL STRESS SCORE (FSS)
– FSS is one of D&B's traditional standard risk scores, assessing the risk that a business will experience financial stress and declare bankruptcy in the next 12 months.
– The current form of the score is built on scorecard modeling methodology (a form of logistic regression).
– Data split into 2 segments: Small Companies (~1.1M records) and Large Companies (~300K records).
ALTERNATIVE LENDERS DATA
– Alternative lenders make financing available to US businesses, small ones in particular, when traditional loans are not available.
– Alternative lenders often make loan approvals much faster than traditional banks, requiring analytic solutions that quickly and accurately assess a company’s payment risk.
– The Alternative Lenders Credit Score assesses a small business's likelihood of being delinquent with its payments.
SBA LENDER PURCHASE RATING
– The mission of the Office of Credit Risk Management (OCRM) at the Small Business Administration (SBA) is to manage program credit risk, monitor lender performance, and enforce lending program requirements.
– The Lender Purchase Rating (LPR) predicts the performance of loans in a lender’s SBA portfolio over the next 12 months.
CANADIAN EXPORT PROPENSITY
– Canadian government and provincial ministries have a need to identify businesses that export for planning and economic development purposes.
– Export propensity assesses the likelihood that a business exports goods or services.
Hypothesis #1: How many records do we need?
Numerous clients in the past few years have asked a simple question:
How many records do we need for ML models to outperform traditional modeling methods?
METHODOLOGY
1. Randomly sample differing numbers of records from the FSS Small
Business datasets.
2. Fit models to random samples.
3. Assess fit on the Out-of-Time Validation dataset.
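The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the AAS team's code: it assumes GINI is computed as 2 × AUC − 1 (a common convention that the slides do not spell out), and the `fit` callable, the record layout, and the function names are all hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical record layout: (features, label), where label 1 = "bad", 0 = "good".
Record = Tuple[Sequence[float], int]
Scorer = Callable[[Sequence[float]], float]


def gini_coefficient(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Return 2 * AUC - 1, with AUC taken from average ranks (Mann-Whitney U)."""
    n = len(labels)
    n_bad = sum(labels)
    n_good = n - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("need at least one good and one bad record")

    # Sort indices by score and give tied scores their average 1-based rank.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    bad_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = bad_rank_sum - n_bad * (n_bad + 1) / 2
    auc = u_stat / (n_bad * n_good)
    return 2 * auc - 1


def gini_by_sample_size(
    dev: List[Record],
    oot: List[Record],
    sizes: Sequence[int],
    fit: Callable[[List[Record]], Scorer],
    seed: int = 0,
) -> Dict[int, float]:
    """Fit on random development samples of each size; score the out-of-time set."""
    rng = random.Random(seed)
    oot_labels = [y for _, y in oot]
    results: Dict[int, float] = {}
    for size in sizes:
        sample = rng.sample(dev, size)            # step 1: random sample
        scorer = fit(sample)                      # step 2: fit a model
        oot_scores = [scorer(x) for x, _ in oot]  # step 3: out-of-time fit
        results[size] = gini_coefficient(oot_labels, oot_scores)
    return results
```

Passing a different `fit` for each method (scorecard, XG Boost, Random Forest) would give one curve per method over the same sample sizes.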
FINDINGS
[Figure: GINI Coefficient by FSS Small Business Modeling Sample. X-axis: Number of Records Sampled (1K, 5K, 10K, 50K, 100K, 500K, 1M). Y-axis: GINI Coefficient (45 to 85). Series: Scorecard Methodology, XG Boost, Random Forest.]
1. ML models start to outperform the Scorecard model after around
50K records.
2. For smaller samples (5K and 10K), the Scorecard model
outperforms the ML models.
3. The XG Boost models generally outperform the Random Forest
models.
4. The performance of the Scorecard models peaks at 100K records
and then deteriorates as the sample size increases.
Hypothesis #1: How many records do we
need?
METHODOLOGY
1. Randomly sample differing numbers of “good” and “bad” records from the FSS Small Business dataset for varying numbers of
Total Records and Bad Rates.
2. Fit models to random samples.
3. Assess fit on the Out-of-Time Validation dataset.
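One way to draw a development sample with a chosen total size and bad rate is sketched below. It is an assumption-laden illustration: the slides do not describe the sampling code, and the function name, record layout, and seed handling are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical record layout: (features, label), where label 1 = "bad", 0 = "good".
Record = Tuple[Sequence[float], int]


def sample_with_bad_rate(
    records: List[Record], total: int, bad_rate: float, seed: int = 0
) -> List[Record]:
    """Draw `total` records without replacement so that about `bad_rate` are bad."""
    if not 0.0 <= bad_rate <= 1.0:
        raise ValueError("bad_rate must be between 0 and 1")
    n_bad = round(total * bad_rate)
    n_good = total - n_bad

    bads = [r for r in records if r[1] == 1]
    goods = [r for r in records if r[1] == 0]
    if n_bad > len(bads) or n_good > len(goods):
        raise ValueError("not enough good or bad records for this design point")

    rng = random.Random(seed)
    sample = rng.sample(bads, n_bad) + rng.sample(goods, n_good)
    rng.shuffle(sample)  # avoid an ordered block of bads followed by goods
    return sample
```

Under this sketch, a design point such as 100K total records with 10K bads corresponds to `total=100_000, bad_rate=0.10`, and 1M total records with 10K bads to `total=1_000_000, bad_rate=0.01`.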
More Appropriate Question: How many “Bads” are needed for ML models to outperform traditional models?
[Figure, four panels, each showing GINI Coefficient (60 to 85). TOTAL NUMBER OF RECORDS: XG Boost and Random Forest, each plotted against Number of Records Sampled (1,000 to 1,000,000). NUMBER OF “BAD” RECORDS: XG Boost and Random Forest, each plotted against Number of Bad Records Sampled (100 to 100,000).]
Hypothesis #1: How many records do we
need?
FINDINGS
– ML model performance has a
stronger dependence on the
number of “bad” records rather
than the total number of records.
– XG Boost generally outperforms
the traditional model in
development samples with over
1,000 “bads”.
– Random Forest performed
similarly to the traditional model
for more balanced samples.
– Traditional model performs
worse with more “goods” for a
given number of “bads”.
More Appropriate Question: How many “Bads” are needed for ML models to outperform traditional models?
[Figure: GINI Coefficient (60 to 85) by Number of "Bad" Records (100 to 10,000). Series: Scorecards, XG Boost, Random Forest.]
[Figure: GINI Coefficient (60 to 85) by Number of "Bad" Records (100 to 10,000) for the Scorecards models, with two points annotated: “1M Total Sample, 10K Bads” and “100K Total Sample, 10K Bads”.]
Hypothesis #2: Should we filter predictor
variables?
VARIABLE FILTERING METHOD | GINI: XG BOOST | GINI: RANDOM FOREST
All Relevant Variables | 21.2% | 23.2%
Univariate Performance Metrics, Top ~150 | 22.3% | 23.1%
Initial ML Model Run, Top ~150 | 20.5% | 21.0%
Traditional Filtering, Top ~150 (Univariate Analysis and Clustering) | 21.5% | 24.3%
The Dun and Bradstreet database has thousands of variables for predictive
modeling. Anecdotal guidance suggests that all variables should be input into
ML models with no filtering.
Is inputting all available variables into ML
algorithms the best approach?
METHODOLOGY
1. Analyze the Alternative Lenders developmental
dataset, which contains over 1,000 variables that
have not been previously filtered.
2. Apply 3 variable filtering methods to the more than 1,000
potential predictor variables.
3. Assess fit on an Out-of-Time Validation dataset.
FINDINGS
– For both the XG Boost and Random Forest models,
simply inputting all available variables was not the
best approach.
– Univariate performance metrics seem to be the
best approach, possibly as part of Traditional
filtering.
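A univariate filter of the kind described above might look like the following sketch. It assumes each variable is ranked by the absolute value of its single-variable GINI (2 × AUC − 1) and that the top k are kept; the slides do not state the exact metric or cutoff logic, and the function names and column layout are hypothetical.

```python
from typing import Dict, List, Sequence


def univariate_gini(values: Sequence[float], labels: Sequence[int]) -> float:
    """Absolute GINI of one raw variable used directly as a score.

    Uses a simple pairwise AUC (ties count as half), which is fine for a
    sketch but quadratic in the number of records.
    """
    bads = [v for v, y in zip(values, labels) if y == 1]
    goods = [v for v, y in zip(values, labels) if y == 0]
    if not bads or not goods:
        raise ValueError("need at least one good and one bad record")

    concordant = 0.0
    for b in bads:
        for g in goods:
            if b > g:
                concordant += 1.0
            elif b == g:
                concordant += 0.5
    auc = concordant / (len(bads) * len(goods))
    # Take the absolute value so inversely related variables still rank highly.
    return abs(2 * auc - 1)


def top_k_variables(
    columns: Dict[str, Sequence[float]], labels: Sequence[int], k: int
) -> List[str]:
    """Return the names of the k variables with the highest univariate GINI."""
    ranked = sorted(
        columns,
        key=lambda name: univariate_gini(columns[name], labels),
        reverse=True,
    )
    return ranked[:k]
```

With roughly 1,000 candidate variables, calling `top_k_variables(columns, labels, k=150)` would mirror the "Top ~150" rows of the table. The variable clustering step used in traditional filtering is not covered here.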
Hypothesis #3: Does model segmentation
apply?
METHOD | SEGMENTATION | GINI | KS
XG BOOST | Single Model | 76.9% | 61.2%
XG BOOST | Segmented Model (Business Size) | 78.7% | 62.6%
RANDOM FOREST | Single Model | 67.8% | 55.2%
RANDOM FOREST | Segmented Model (Business Size) | 75.5% | 59.6%
Model Segmentation analysis is the first step in building traditional models. For ML
models, general guidance is that segmentation is not required.
Can segmentation improve the
performance of ML models?
METHODOLOGY
1. Fit separate models for small and large businesses
in the FSS datasets and assess fit on the combined
FSS dataset.
2. Fit one model on both small and large businesses in
the FSS datasets and assess fit on the combined
FSS dataset.
FINDINGS
For both the XG Boost and Random Forest models,
segmentation provided improved performance over a
single model.
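To score a combined dataset with segment-specific models, each record can be routed to the model for its segment before a metric such as KS is computed. The sketch below assumes that each segment model returns a comparable score (for example, a probability of being “bad”) and that KS is the maximum gap between the cumulative score distributions of bads and goods. The names and record layout are hypothetical.

```python
from typing import Callable, Dict, Mapping, Sequence

# Hypothetical record layout: a mapping of field names to values.
Record = Mapping[str, object]
Scorer = Callable[[Record], float]


def make_segmented_scorer(
    scorers: Dict[str, Scorer], segment_of: Callable[[Record], str]
) -> Scorer:
    """Build one scorer that sends each record to its segment's model."""

    def score(record: Record) -> float:
        return scorers[segment_of(record)](record)

    return score


def ks_statistic(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Maximum absolute gap between the bad and good cumulative distributions."""
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("need at least one good and one bad record")

    pairs = sorted(zip(scores, labels))
    cum_bad = 0
    cum_good = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Consume every record tied at this score before comparing the CDFs.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            j += 1
        best = max(best, abs(cum_bad / n_bad - cum_good / n_good))
        i = j
    return best
```

The single-model baseline is then the same evaluation with one scorer applied to every record, so both rows of the comparison are assessed on the identical combined dataset.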
Hypothesis #4: Why isn’t my ML model
better?
DATASET | SCORECARD | XG BOOST | ML RELATIVE LIFT | TOP 20 CV
Alternative Lenders | 22.6% | 25.6% | 15% | 0.11
Canadian Export Propensity | 41.0% | 42.7% | 4% | 0.27
SBA LPR | 79.5% | 81.2% | 2% | 0.32
For custom modeling engagements, the D&B AAS team builds both traditional and ML
models to determine the amount of predictive lift that ML models would provide.
Under what conditions do ML
models provide more predictive lift
over traditional models?
– ML models were evaluated on the amount of lift provided relative to the
traditional model performance.
– Variable distributions were evaluated on the coefficient of variation
of the Top 20 variables.
Univariate performance distributions with more variation
of the top variables coincide with a decrease in ML lift.
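The two quantities compared above can be sketched as follows. Both definitions are assumptions, since the slides do not give formulas: relative lift is taken as (ML − traditional) / traditional, and the coefficient of variation as the population standard deviation divided by the mean of the 20 highest univariate GINI values. Rounded table values may not reproduce the slide's lift column exactly.

```python
import statistics
from typing import Sequence


def relative_lift(traditional_gini: float, ml_gini: float) -> float:
    """Lift of the ML model relative to the traditional model (assumed definition)."""
    if traditional_gini == 0:
        raise ValueError("traditional_gini must be non-zero")
    return (ml_gini - traditional_gini) / traditional_gini


def top_n_cv(univariate_ginis: Sequence[float], n: int = 20) -> float:
    """Coefficient of variation (std / mean) of the n highest univariate GINIs.

    Uses the population standard deviation; a sample standard deviation
    would be an equally plausible reading of the slide.
    """
    top = sorted(univariate_ginis, reverse=True)[:n]
    if not top:
        raise ValueError("need at least one univariate GINI value")
    mean = statistics.fmean(top)
    if mean == 0:
        raise ValueError("mean of the top values must be non-zero")
    return statistics.pstdev(top) / mean
```

A low value from `top_n_cv` means the strongest predictors perform about equally well, while a high value means a few predictors dominate, which is the situation the slide associates with smaller ML lift.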
[Figures: univariate GINI distributions for three datasets. GINI Distribution: Alternative Lenders (x-axis ticks 1 to 151, GINI 0 to 60). GINI Distribution: Canadian Export Propensity (x-axis ticks 1 to 217, GINI 0 to 30). GINI Distribution: SBA LPR (x-axis ticks 1 to 109, GINI 0 to 60).]
Summary Recommendations
Based on the results of the analysis explored in this presentation, we provide the following summary
recommendations for the implementation of ML models.
1. The performance of ML models relative to traditional models is more dependent on the number of “bad” records available than the total number of records available, where at least 1,000 “bads” are recommended for building ML models.
2. Variables should be filtered prior to building ML models, utilizing univariate performance metrics and variable clustering.
3. Model segmentation may provide lift to ML models and should be investigated in a manner similar to that of traditional models.
4. ML models showed less lift when a small number of predictors exhibit significantly higher performance metrics than the other predictors. In this case, traditional modeling methods may be preferred.

LLMOps: Match report from the top of the 5th
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation Journey
 
Scaling & Managing Production Deployments with H2O ModelOps
Scaling & Managing Production Deployments with H2O ModelOpsScaling & Managing Production Deployments with H2O ModelOps
Scaling & Managing Production Deployments with H2O ModelOps
 

Recently uploaded

UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1DianaGray10
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...CzechDreamin
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka DoktorováCzechDreamin
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoTAnalytics
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutesconfluent
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityScyllaDB
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyJohn Staveley
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCzechDreamin
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...Product School
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlPeter Udo Diehl
 
Introduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationIntroduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationZilliz
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...CzechDreamin
 
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsExpeed Software
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...CzechDreamin
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Alison B. Lowndes
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...Product School
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxDavid Michel
 

Recently uploaded (20)

UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
Introduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationIntroduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG Evaluation
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT Professionals
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 

Mark Seiss, Dun & Bradstreet - Importance of Domain Expertise for Building ML Based Models

  • 1. Importance of Domain Expertise for Building ML Based Models Mark Seiss, PhD Director, Advanced Analytic Services Dun & Bradstreet #H2OWORLD
  • 2. Agenda: Introduction; Questions Explored; Data; Analysis/Experiments; Summary Recommendations. Special thanks to Venkata Vipparthi from the D&B Chennai office for his contributions to this research.
  • 3. Introduction
    MOTIVATION
    – For traditional modeling methods, the Advanced Analytic Services (AAS) team at Dun & Bradstreet has an established set of steps and best practices.
    – Our goal is to provide similar guidance for Machine Learning models, which our team uses more and more to give customers improved assessments of risk and targeting of prospects.
    GENERAL STEPS FOR BUILDING TRADITIONAL MODELS
    1. Segmentation Analysis
    2. Univariate Analysis
    3. Variable Selection
    4. Explore Interactions
    5. Model Validation
    6. Finalize and Document Model
    GUIDANCE FOR ML MODELS
    – Anecdotal evidence often cited
    – Limited literature
    – How to answer customer queries on ML?
    GOALS
    – Create a rubric for implementing ML models for all D&B data scientists.
    – Provide guidance on when ML models should be used versus traditional models.
  • 4. Questions Explored
    – How does the univariate performance distribution affect the lift provided by ML models?
    – Does model segmentation improve the performance of ML models?
    – Should the pool of predictor variables be filtered prior to input into ML models?
    – How many records are needed for ML models to outperform traditional models?
  • 5. Data Used in Study
    This study uses analytic datasets previously aggregated by the D&B Advanced Analytic Services (AAS) team for the development of various standard and custom models.
    1. ALTERNATIVE LENDERS DATA
    – Alternative lenders make financing available to US businesses, small ones in particular, when traditional loans are not available.
    – Alternative lenders often make loan approvals much faster than traditional banks, requiring analytic solutions that quickly and accurately assess a company's payment risk.
    – The Alternative Lenders Credit Score assesses a small business's likelihood of being delinquent with its payments.
    2. FINANCIAL STRESS SCORE (FSS)
    – FSS is one of D&B's traditional standard risk scores, assessing the risk that a business will experience financial stress and declare bankruptcy in the next 12 months.
    – The current form of the score is built with scorecard modeling methodology (a form of logistic regression).
    – Data is split into 2 segments: Small Companies (~1.1M records) and Large Companies (~300K records).
    3. CANADIAN EXPORT PROPENSITY
    – The Canadian government and provincial ministries need to identify businesses that export, for planning and economic development purposes.
    – Export propensity assesses the likelihood that a business exports goods or services.
    4. SBA LENDER PURCHASE RATING
    – The mission of the Office of Credit Risk Management (OCRM) at the Small Business Administration (SBA) is to manage program credit risk, monitor lender performance, and enforce lending program requirements.
    – The Lender Purchase Rating (LPR) predicts the performance of loans in a lender's SBA portfolio over the next 12 months.
  • 6. Hypothesis #1: How many records do we need?
    Numerous clients in the past few years have asked a simple question: How many records do we need for ML models to outperform traditional modeling methods?
    METHODOLOGY
    1. Randomly sample differing numbers of records from the FSS Small Business dataset.
    2. Fit models to the random samples.
    3. Assess fit on the Out-of-Time Validation dataset.
    [Figure: GINI Coefficient by FSS Small Business Modeling Sample. x-axis: Number of Records Sampled (1K to 1M); y-axis: GINI Coefficient; series: Scorecard Methodology, XG Boost, Random Forest.]
    FINDINGS
    1. ML models start to outperform the Scorecard model after around 50K records.
    2. For smaller samples (5K and 10K), the Scorecard model outperforms the ML models.
    3. The XG Boost models generally outperform the Random Forest models.
    4. The performance of the Scorecard models peaks at 100K records and then deteriorates as the sample size increases.
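The slide above reports fit as a GINI coefficient without defining it. A minimal sketch of one common definition in credit scoring, Gini = 2 × AUC − 1, is given here; this is an assumption about the metric, not a formula stated in the deck. The function names are illustrative, and labels are assumed to be 1 for "bad" and 0 for "good".

```python
def auc(scores, labels):
    """Probability that a random 'bad' (label 1) scores above a random 'good' (label 0).

    Ties count as half a win. Computed from average ranks (Mann-Whitney U).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the block of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_bad = sum(1 for y in labels if y == 1)
    n_good = len(labels) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("need at least one 'bad' and one 'good' record")

    rank_sum_bad = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_bad - n_bad * (n_bad + 1) / 2) / (n_bad * n_good)


def gini(scores, labels):
    """Gini coefficient on a -1..1 scale; multiply by 100 for a percentage."""
    return 2 * auc(scores, labels) - 1
```

Under this definition a model that ranks every "bad" above every "good" scores 1.0 (100%), and a model with no ranking power scores about 0.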
  • 7. Hypothesis #1: How many records do we need?
    More Appropriate Question: How many "Bads" are needed for ML models to outperform traditional models?
    METHODOLOGY
    1. Randomly sample differing numbers of "good" and "bad" records from the FSS Small Business dataset for varying numbers of Total Records and Bad Rates.
    2. Fit models to the random samples.
    3. Assess fit on the Out-of-Time Validation dataset.
    [Figure, TOTAL NUMBER OF RECORDS: GINI Coefficient vs. Number of Records Sampled (1,000 to 1,000,000), one panel each for XG Boost and Random Forest.]
    [Figure, NUMBER OF "BAD" RECORDS: GINI Coefficient vs. Number of Bad Records Sampled (100 to 100,000), one panel each for XG Boost and Random Forest.]
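The sampling step in the methodology above can be sketched as follows. This is an illustrative sketch only: the deck does not show its sampling code, and the function name, the `bad` label key, and the dict-per-record layout are all assumptions.

```python
import random


def sample_by_bad_rate(records, n_total, bad_rate, label_key="bad", seed=0):
    """Draw n_total records with a chosen share of 'bad' records.

    records   : list of dicts; record[label_key] is 1 for bad, 0 for good
                (hypothetical layout, not taken from the deck)
    n_total   : total number of records to draw
    bad_rate  : desired fraction of 'bad' records in the sample
    """
    rng = random.Random(seed)
    bads = [r for r in records if r[label_key] == 1]
    goods = [r for r in records if r[label_key] == 0]

    n_bad = round(n_total * bad_rate)
    n_good = n_total - n_bad
    if n_bad > len(bads) or n_good > len(goods):
        raise ValueError("not enough records to draw the requested sample")

    # Sample without replacement within each class, then mix the two classes.
    sample = rng.sample(bads, n_bad) + rng.sample(goods, n_good)
    rng.shuffle(sample)
    return sample
```

Varying `n_total` and `bad_rate` independently gives a grid of development samples in which the number of "bads" and the total record count can be separated, which is the comparison the slide describes.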
  • 8. Hypothesis #1: How many records do we need?
    More Appropriate Question: How many "Bads" are needed for ML models to outperform traditional models?
    FINDINGS
    – ML model performance depends more strongly on the number of "bad" records than on the total number of records.
    – XG Boost generally outperforms the traditional model in development samples with over 1,000 "bads".
    – Random Forest performed similarly to the traditional model for more balanced samples.
    – The traditional model performs worse with more "goods" for a given number of "bads".
    [Figure: GINI Coefficient vs. Number of "Bad" Records (100 to 10,000); series: Scorecards, XG Boost, Random Forest.]
  • 9. Hypothesis #1: How many records do we need? (continued)
    More Appropriate Question: How many "Bads" are needed for ML models to outperform traditional models?
    FINDINGS: repeated from slide 8.
    [Figure: GINI Coefficient vs. Number of "Bad" Records (100 to 10,000) for Scorecards, with two points labeled "1M Total Sample, 10K Bads" and "100K Total Sample, 10K Bads".]
  • 10. Hypothesis #2: Should we filter predictor variables?
    The Dun and Bradstreet database has thousands of variables for predictive modeling. Anecdotal guidance suggests that all variables should be input into ML models with no filtering. Is inputting all available variables into ML algorithms the best approach?
    METHODOLOGY
    1. Analyze the Alternative Lenders development dataset, which contains over 1,000 variables that have not been previously filtered.
    2. Apply 3 variable filtering methods to 1,000 potential predictor variables.
    3. Assess fit on an Out-of-Time Validation dataset.
    RESULTS (GINI)
    VARIABLE FILTERING METHOD                                              XG BOOST   RANDOM FOREST
    All Relevant Variables                                                 21.2%      23.2%
    Univariate Performance Metrics, Top ~150                               22.3%      23.1%
    Initial ML Model Run, Top ~150                                         20.5%      21.0%
    Traditional Filtering, Top ~150 (Univariate Analysis and Clustering)   21.5%      24.3%
    FINDINGS
    – For both the XG Boost and Random Forest models, simply inputting all available variables was not the best approach.
    – Univariate performance metrics seem to be the best approach, possibly as part of Traditional filtering.
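One way to read "Univariate Performance Metrics, Top ~150" is to score each candidate variable on its own and keep the strongest ones. The sketch below does that with a univariate Gini. It is not the deck's procedure: the function names, the use of the absolute Gini (so that inversely related variables still rank highly), and the column layout are assumptions, and missing values are not handled.

```python
def univariate_gini(values, labels):
    """Gini of a single variable used directly as a score (2 * AUC - 1).

    Pairwise comparison of every 'bad' (label 1) with every 'good' (label 0);
    fine for a sketch, too slow for millions of records.
    """
    bads = [v for v, y in zip(values, labels) if y == 1]
    goods = [v for v, y in zip(values, labels) if y == 0]
    wins = sum((b > g) + 0.5 * (b == g) for b in bads for g in goods)
    return 2 * wins / (len(bads) * len(goods)) - 1


def top_k_variables(columns, labels, k):
    """Return the names of the k variables with the largest absolute Gini.

    columns : dict mapping variable name -> list of numeric values
    """
    ranked = sorted(
        columns,
        key=lambda name: abs(univariate_gini(columns[name], labels)),
        reverse=True,
    )
    return ranked[:k]
```

With roughly 1,000 candidates, `top_k_variables(columns, labels, 150)` would produce a shortlist of the size the slide mentions. The variable clustering that the "Traditional Filtering" row adds is not shown here.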
  • 11. Hypothesis #3: Does model segmentation apply?
    Model segmentation analysis is the first step in building traditional models. For ML models, general guidance is that segmentation is not required. Can segmentation improve the performance of ML models?
    METHODOLOGY
    1. Fit separate models for small and large businesses in the FSS datasets and assess fit on the combined FSS dataset.
    2. Fit one model on both small and large businesses in the FSS datasets and assess fit on the combined FSS dataset.
    RESULTS
    METHOD          SEGMENTATION                      GINI    KS
    XG BOOST        Single Model                      76.9%   61.2%
    XG BOOST        Segmented Model (Business Size)   78.7%   62.6%
    RANDOM FOREST   Single Model                      67.8%   55.2%
    RANDOM FOREST   Segmented Model (Business Size)   75.5%   59.6%
    FINDINGS
    For both the XG Boost and Random Forest models, segmentation provided improved performance over a single model.
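The segmented approach scores each business with the model for its own segment and then evaluates the pooled scores on the combined dataset. A rough sketch of that routing follows, together with a KS statistic, taken here as the maximum gap between the cumulative score distributions of "bads" and "goods". The deck does not define its segment boundary or its KS formula, so `segment_of`, the model callables, and this KS definition are assumptions.

```python
from bisect import bisect_right


def score_segmented(records, models, segment_of):
    """Score each record with the model fitted for its segment.

    models     : dict mapping segment name -> callable(record) -> score
    segment_of : callable(record) -> segment name (hypothetical rule)
    """
    return [models[segment_of(r)](r) for r in records]


def ks_statistic(scores, labels):
    """Largest absolute gap between the empirical CDFs of bad and good scores."""
    bads = sorted(s for s, y in zip(scores, labels) if y == 1)
    goods = sorted(s for s, y in zip(scores, labels) if y == 0)
    if not bads or not goods:
        raise ValueError("need at least one 'bad' and one 'good' record")

    best = 0.0
    for t in sorted(set(scores)):
        cdf_bad = bisect_right(bads, t) / len(bads)
        cdf_good = bisect_right(goods, t) / len(goods)
        best = max(best, abs(cdf_bad - cdf_good))
    return best
```

Because the pooled metric compares scores across segments, the segment models need to produce scores on a comparable scale (for example, predicted probabilities) for the combined GINI and KS to be meaningful.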
  • 12. Hypothesis #4: Why isn't my ML model better?
    For custom modeling engagements, the D&B AAS team builds both traditional and ML models to determine the amount of predictive lift that ML models would provide. Under what conditions do ML models provide more predictive lift over traditional models?
    – ML models were evaluated on the amount of lift provided relative to the traditional model performance.
    – Variable distributions were evaluated on the coefficient of variation (CV) of the Top 20 variables.
    RESULTS
    DATASET                      SCORECARD   XG BOOST   ML RELATIVE LIFT   TOP 20 CV
    Alternative Lenders          22.6%       25.6%      15%                0.11
    Canadian Export Propensity   41.0%       42.7%      4%                 0.27
    SBA LPR                      79.5%       81.2%      2%                 0.32
    Univariate performance distributions with more variation among the top variables coincide with a decrease in ML lift.
    [Figures: univariate GINI distributions by variable rank for Alternative Lenders (about 150 variables), Canadian Export Propensity (about 220 variables), and SBA LPR (about 110 variables).]
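The two quantities compared on this slide can be sketched as below. The deck does not state its formulas, so the sketch assumes the coefficient of variation is the standard deviation divided by the mean (population standard deviation here) and that relative lift is the ML improvement divided by the traditional model's GINI. The function names are illustrative, and the sketch is not expected to reproduce the slide's figures exactly, since those may come from unrounded inputs or a different convention.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population std is an assumption)."""
    return pstdev(values) / mean(values)


def top_n_cv(univariate_ginis, n=20):
    """CV of the n largest univariate GINI values."""
    top = sorted(univariate_ginis, reverse=True)[:n]
    return coefficient_of_variation(top)


def relative_lift(ml_gini, traditional_gini):
    """Improvement of the ML model as a fraction of the traditional model's GINI."""
    return (ml_gini - traditional_gini) / traditional_gini
```

A low Top 20 CV means the best predictors are all about equally strong; a high CV means a few predictors dominate, which is the situation where the slide observes less ML lift.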
  • 13. Summary Recommendations
    Based on the results of the analysis explored in this presentation, we provide the following summary recommendations for the implementation of ML models.
    1. The performance of ML models relative to traditional models is more dependent on the number of "bad" records available than the total number of records available, where at least 1,000 "bads" are recommended for building ML models.
    2. Variables should be filtered prior to building ML models, utilizing univariate performance metrics and variable clustering.
    3. Model segmentation may provide lift to ML models and should be investigated in a manner similar to that of traditional models.
    4. ML models showed less lift when a small number of predictors exhibit significantly higher performance metrics than the other predictors. In this case, traditional modeling methods may be preferred.