This document presents guidelines for building machine learning models using Dun & Bradstreet data. It explores several hypotheses through experiments on different datasets. The main findings are: (1) ML models outperform traditional models once the development sample contains at least 1,000 "bad" records; the count of "bads" matters more than the total record count; (2) filtering variables before ML modeling improves performance; (3) segmenting models can boost ML performance much as it does for traditional models; and (4) ML models provide less lift when a few variables are much more predictive than the rest. The recommendations: focus on the "bad" record count, filter variables, consider segmentation, and prefer traditional models when a few variables dominate predictions.
2. #H2OWORLD
AGENDA
– Introduction
– Questions Explored
– Data
– Analysis/Experiments
– Summary Recommendations
Special Thanks to Venkata Vipparthi from the D&B Chennai office for his
contributions to this research.
3. Introduction
MOTIVATION
– Anecdotal evidence often cited
– Limited literature
– How to answer customer queries on ML?
GENERAL STEPS FOR BUILDING TRADITIONAL MODELS
1. Segmentation Analysis
2. Univariate Analysis
3. Variable Selection
4. Explore Interactions
5. Model Validation
6. Finalize and Document Model
GUIDANCE FOR ML MODELS
– For traditional modeling methods, the Advanced Analytic Services (AAS) team at Dun and Bradstreet has steps and best practices developed by the team.
– Our goal is to provide similar guidance for Machine Learning models, which are utilized more and more by our team to provide our customers improved assessments of risk and targeting of prospects.
GOALS
– Create a rubric for implementing ML models for all D&B data scientists.
– Provide guidance on when ML models should be used versus traditional models.
4. Questions Explored
– How many records are needed for ML models to outperform traditional models?
– Should the pool of predictor variables be filtered prior to input into ML models?
– Does model segmentation improve the performance of ML models?
– How does the univariate performance distribution affect the lift provided by ML models?
5. Data Used in Study
This study uses analytic datasets previously aggregated by the D&B Advanced Analytic Services (AAS) team for the development of various standard and custom models.
1. FINANCIAL STRESS SCORE (FSS)
– FSS is one of D&B's traditional standard risk scores, assessing the risk that a business will experience financial stress and declare bankruptcy in the next 12 months.
– The current form of the score is built on scorecard modeling methodology (a form of logistic regression).
– Data split into 2 segments: Small Companies (~1.1M records) and Large Companies (~300K records).
2. ALTERNATIVE LENDERS DATA
– Alternative lenders make financing available to US businesses, small ones in particular, when traditional loans are not available.
– Alternative lenders often make loan approvals much faster than traditional banks, requiring analytic solutions that quickly and accurately assess a company's payment risk.
– The Alternative Lenders Credit Score assesses a small business's likelihood to be delinquent with its payments.
3. CANADIAN EXPORT PROPENSITY
– Canadian government and provincial ministries have a need to identify businesses that export for planning and economic development purposes.
– Export propensity assesses the likelihood that a business exports goods or services.
4. SBA LENDER PURCHASE RATING
– The mission of the Office of Credit Risk Management (OCRM) at the Small Business Administration (SBA) is to manage program credit risk, monitor lender performance, and enforce lending program requirements.
– The Lender Purchase Rating (LPR) predicts the performance of loans in a lender's SBA portfolio over the next 12 months.
6. Hypothesis #1: How many records do we need?
Numerous clients in the past few years have asked a simple question: How many records do we need for ML models to outperform traditional modeling methods?
METHODOLOGY
1. Randomly sample differing numbers of records from the FSS Small Business dataset.
2. Fit models to random samples.
3. Assess fit on the Out-of-Time Validation dataset.
FINDINGS
[Chart: GINI Coefficient by FSS Small Business Modeling Sample; GINI Coefficient (45 to 85) vs. Number of Records Sampled (1K to 1M) for the Scorecard Methodology, XG Boost, and Random Forest]
1. ML models start to outperform the Scorecard model after around
50K records.
2. For smaller samples (5K and 10K), the Scorecard model
outperforms the ML models.
3. The XG Boost models generally outperform the Random Forest
models.
4. The performance of the scorecard model peaks at 100K records and then deteriorates as the sample size increases.
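The GINI coefficient plotted here is equivalent to 2*AUC - 1. As a minimal sketch of how each fitted model's validation GINI could be computed (plain Python for illustration, not the study's actual evaluation code):

```python
def gini_coefficient(labels, scores):
    """GINI = 2*AUC - 1, with AUC from the rank-sum (Mann-Whitney)
    formulation; tied scores receive their average rank."""
    pairs = sorted(zip(scores, labels))          # sort by score ascending
    n = len(pairs)
    n_bad = sum(labels)                          # "bad" records are the 1s
    n_good = n - n_bad
    rank_sum = 0.0                               # sum of ranks of the "bads"
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1                               # tie group spans [i, j)
        avg_rank = (i + 1 + j) / 2.0             # average 1-based rank
        rank_sum += avg_rank * sum(lab for _, lab in pairs[i:j])
        i = j
    auc = (rank_sum - n_bad * (n_bad + 1) / 2.0) / (n_bad * n_good)
    return 2.0 * auc - 1.0
```

A perfectly separating score gives GINI = 1, a random one gives 0, which is why the charts report GINI on a 0 to 100 scale after multiplying by 100.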
7. Hypothesis #1: How many records do we
need?
M E T H O D O L O G Y
1. Randomly sample differing numbers of "good" and "bad" records from the FSS Small Business dataset for varying numbers of total records and bad rates.
2. Fit models to random samples.
3. Assess fit on the Out-of-Time Validation dataset.
More Appropriate Question: How many “Bads” are needed for ML models to outperform traditional models?
TOTAL NUMBER OF RECORDS vs. NUMBER OF "BAD" RECORDS
[Charts: GINI Coefficient (60 to 85) vs. Number of Records Sampled (1K to 1M) for XG Boost and Random Forest, alongside GINI Coefficient (60 to 85) vs. Number of Bad Records Sampled (100 to 100K) for XG Boost and Random Forest]
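The good/bad sampling scheme in the methodology can be sketched as below. The `bad` field name, the toy population, and the sample sizes are illustrative assumptions, not details from the study:

```python
import random

def sample_goods_and_bads(records, n_bad, bad_rate, seed=42):
    """Draw a development sample with a fixed number of "bad" records
    and a target bad rate (bads / total records)."""
    rng = random.Random(seed)
    bads = [r for r in records if r["bad"] == 1]
    goods = [r for r in records if r["bad"] == 0]
    # Implied "good" count for the requested bad rate.
    n_good = round(n_bad * (1 - bad_rate) / bad_rate)
    sample = rng.sample(bads, n_bad) + rng.sample(goods, n_good)
    rng.shuffle(sample)
    return sample

# Example: 100 bads at a 10% bad rate -> 900 goods, 1,000 records total.
population = [{"id": i, "bad": 1 if i < 500 else 0} for i in range(10_000)]
dev = sample_goods_and_bads(population, n_bad=100, bad_rate=0.10)
```

Holding `n_bad` fixed while varying `bad_rate` is what lets the study separate the effect of "bads" from the effect of total sample size.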
8. Hypothesis #1: How many records do we
need?
F I N D I N G S
– ML model performance depends more strongly on the number of "bad" records than on the total number of records.
– XG Boost generally outperforms
the traditional model in
development samples with over
1,000 “bads”.
– Random Forest performed similarly to the traditional model for more balanced samples.
– Traditional model performs
worse with more “goods” for a
given number of “bads”.
More Appropriate Question: How many “Bads” are needed for ML models to outperform traditional models?
[Chart: GINI Coefficient (60 to 85) vs. Number of "Bad" Records (100 to 10K) for Scorecards, XG Boost, and Random Forest]
9. Hypothesis #1: How many records do we need?
FINDINGS (continued)
More Appropriate Question: How many "Bads" are needed for ML models to outperform traditional models?
[Chart: GINI Coefficient (60 to 85) vs. Number of "Bad" Records (100 to 10K) for Scorecards, annotated with two samples that share 10K Bads: a 1M Total Sample and a 100K Total Sample]
10. Hypothesis #2: Should we filter predictor
variables?
VARIABLE FILTERING METHOD (GINI: XG Boost / Random Forest)
– All Relevant Variables: 21.2% / 23.2%
– Univariate Performance Metrics, Top ~150: 22.3% / 23.1%
– Initial ML Model Run, Top ~150: 20.5% / 21.0%
– Traditional Filtering (Univariate Analysis and Clustering), Top ~150: 21.5% / 24.3%
The Dun and Bradstreet database has thousands of variables for predictive
modeling. Anecdotal guidance suggests that all variables should be input into
ML models with no filtering.
Is inputting all available variables into ML
algorithms the best approach?
METHODOLOGY
1. Analyze the Alternative Lenders developmental
dataset, which contains over 1,000 variables that
have not been previously filtered.
2. Apply 3 variable filtering methods to 1,000 potential
predictor variables.
3. Assess fit on an Out-of-Time Validation dataset.
FINDINGS
– For both the XG Boost and Random Forest models,
simply inputting all available variables was not the
best approach.
– Univariate performance metrics seem to be the best approach, possibly as one component of traditional filtering.
11. Hypothesis #3: Does model segmentation
apply?
METHOD / SEGMENTATION (GINI, KS)
– XG Boost, Single Model: 76.9%, 61.2%
– XG Boost, Segmented Model (Business Size): 78.7%, 62.6%
– Random Forest, Single Model: 67.8%, 55.2%
– Random Forest, Segmented Model (Business Size): 75.5%, 59.6%
Model Segmentation analysis is the first step in building traditional models. For ML
models, general guidance is that segmentation is not required.
Can segmentation improve the
performance of ML models?
METHODOLOGY
1. Fit separate models for small and large businesses
in the FSS datasets and assess fit on the combined
FSS dataset.
2. Fit one model on both small and large businesses in
the FSS datasets and assess fit on the combined
FSS dataset.
FINDINGS
For both the XG Boost and Random Forest models,
segmentation provided improved performance over a
single model.
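The segmented-versus-single comparison can be sketched generically: fit one model per segment and score each record with its own segment's model. The `MeanModel` learner and the `segment`/`bad` field names below are illustrative stand-ins for the study's actual XG Boost and Random Forest fits:

```python
class MeanModel:
    """Toy stand-in for a real learner: predicts the training bad rate."""
    def fit(self, rows):
        self.rate = sum(r["bad"] for r in rows) / len(rows)
        return self
    def predict(self, row):
        return self.rate

def fit_segmented(rows, make_model):
    """Fit one model per segment; route each record to its segment's model."""
    models = {}
    for seg in {r["segment"] for r in rows}:
        seg_rows = [r for r in rows if r["segment"] == seg]
        models[seg] = make_model().fit(seg_rows)
    return lambda row: models[row["segment"]].predict(row)

rows = [
    {"segment": "small", "bad": 1}, {"segment": "small", "bad": 0},
    {"segment": "large", "bad": 0}, {"segment": "large", "bad": 0},
]
score = fit_segmented(rows, MeanModel)
```

Assessing both the segmented scorer and a single pooled model on the same combined validation set, as in the methodology above, makes the two GINI/KS numbers directly comparable.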
12. Hypothesis #4: Why isn’t my ML model
better?
DATASET (GINI: Scorecard / XG Boost; ML Relative Lift; Top 20 CV)
– Alternative Lenders: 22.6% / 25.6%; lift 15%; CV 0.11
– Canadian Export Propensity: 41.0% / 42.7%; lift 4%; CV 0.27
– SBA LPR: 79.5% / 81.2%; lift 2%; CV 0.32
For custom modeling engagements, the D&B AAS team builds both traditional and ML
models to determine the amount of predictive lift that ML models would provide.
Under what conditions do ML
models provide more predictive lift
over traditional models?
– ML models were evaluated on the amount of lift provided relative to the traditional model performance.
– Variable distributions were evaluated on the coefficient of variation
of the Top 20 variables.
Univariate performance distributions with more variation
of the top variables coincide with a decrease in ML lift.
[Charts: univariate GINI distributions of the candidate predictors, sorted from strongest to weakest, for Alternative Lenders, Canadian Export Propensity, and SBA LPR]
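The Top 20 CV reported in the table is the coefficient of variation (standard deviation divided by mean) of the univariate GINIs of the 20 strongest predictors. A sketch, with illustrative numbers rather than the study's actual values:

```python
import statistics

def top_k_cv(univariate_ginis, k=20):
    """Coefficient of variation (stdev / mean) of the k best univariate GINIs."""
    top = sorted(univariate_ginis, reverse=True)[:k]
    return statistics.pstdev(top) / statistics.mean(top)

# A flat distribution (every top variable similar) gives a low CV;
# a few dominant variables give a high CV.
flat = [0.30] * 10 + [0.29] * 10
spiky = [0.60, 0.55] + [0.10] * 18
```

Under this reading, the Alternative Lenders dataset (CV 0.11, flat distribution) left the most room for ML lift, while SBA LPR (CV 0.32, a few dominant predictors) left the least.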
13. Summary Recommendations
Based on the results of the analysis explored in this presentation, we provide the following summary
recommendations for the implementation of ML models.
1. The performance of ML models relative to traditional models depends more on the number of "bad" records available than on the total number of records available; at least 1,000 "bads" are recommended for building ML models.
2. Variables should be filtered prior to building ML models, utilizing univariate performance metrics and variable clustering.
3. Model segmentation may provide lift to ML models and should be investigated in a manner similar to that of traditional models.
4. ML models showed less lift when a small number of predictors exhibit significantly higher performance metrics than the other predictors. In this case, traditional modeling methods may be preferred.