Predicting delinquency on debt
What is the problem?
• X Store has a retail credit card available to customers
• There can be a number of sources of loss from this product, but one is customers defaulting on their debt
• This prevents the store from collecting payment for products and services rendered
Is this problem big enough to matter?
• Examining a slice of the customer database (150,000 customers), we find that 6.6% of customers were seriously delinquent in payment in the last two years
• If only 5% of their carried debt was on the store credit card, this is potentially an:
• Average loss of $8.12 per customer
• Potential overall loss of $1.2 million
What can be done?
• There are numerous models that can be used to predict which customers will default
• This could be used to decrease credit limits or cancel credit lines for current risky customers to minimize potential loss
• Or to better screen which customers are approved for the card
How will I do this?
• This is a basic classification problem with important business implications
• We'll examine a few simplistic models to get an idea of performance
• Then explore decision tree methods to achieve better performance
How will the models predict delinquency?
Each customer has a number of attributes:

John Smith
Delinquent: Yes
Age: 23
Income: $1600
Number of Lines: 4

Mary Rasmussen
Delinquent: No
Age: 73
Income: $2200
Number of Lines: 2

...

We will use the customer attributes to predict whether they were delinquent.
How do we make sure that our solution actually has predictive power?
We have two slices of the customer dataset:

Train: 150,000 customers, delinquency in dataset
Test: 101,000 customers, delinquency not in dataset

None of the customers in the test dataset are used to train the model.
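As a minimal sketch of this setup, assuming the Kaggle "Give Me Some Credit" file names (cs-training.csv, cs-test.csv) and its SeriousDlqin2yrs label column; the imputed values mirror the assumptions on the "Looking at the actual data" slide below, and their mapping to specific fields is itself an assumption:

```python
import pandas as pd

# Assumed file names and label column from the Kaggle "Give Me Some Credit" data.
train = pd.read_csv("cs-training.csv", index_col=0)
test = pd.read_csv("cs-test.csv", index_col=0)

# The train slice carries the delinquency label; the test slice does not.
y_train = train.pop("SeriousDlqin2yrs")
test = test.drop(columns=["SeriousDlqin2yrs"], errors="ignore")  # empty in test

# Assumed mapping of the "Assume $2,500" / "Assume 0" imputations to the
# two fields with missing values in this dataset.
for df in (train, test):
    df["MonthlyIncome"] = df["MonthlyIncome"].fillna(2500)
    df["NumberOfDependents"] = df["NumberOfDependents"].fillna(0)
```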
Internally we validate our model performance with cross-fold validation
Using only the train dataset we can get a sense of how well our model performs without externally validating it.

The train dataset is split into folds (Train 1, Train 2, Train 3). The algorithm is trained on all but one fold (e.g., Train 1 and Train 2) and then tested on the held-out fold (Train 3), rotating through the folds.
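A minimal sketch of this internal check with scikit-learn, reusing the train/y_train frames from the loading sketch above (the three folds mirror the Train 1/2/3 split; the classifier here is a placeholder):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Train on two folds, test on the held-out third, rotating through all folds.
model = RandomForestClassifier(n_estimators=150, random_state=0)
scores = cross_val_score(model, train, y_train, cv=3, scoring="accuracy")
print(scores.mean(), scores.std())
```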
What matters is how well we can predict the test dataset
We judge this using accuracy: the number of correct predictions out of the total number of predictions made.

So with 100,000 customers and 80% accuracy, we will have correctly predicted whether 80,000 customers will default or not in the next two years.
Putting accuracy in context
We could save $600,000 over two years if we correctly predicted 50% of the customers that would default and changed their accounts to prevent it.

The potential loss is minimized by ~$8,000 for every 100,000 customers with each percentage point increase in accuracy (roughly 1% of the $812,000 potential loss carried by 100,000 customers at $8.12 each).
Looking at the actual data
[Distribution plots of the customer attributes; annotations note the imputed values for missing data: "Assume $2,500" and "Assume 0".]
There is a continuum of algorithmic choices to tackle the problem
Simpler, Quicker ↔ Complex, Slower
Random Chance: 50%
Simple Classification
For simple classification we pick a single attribute and find the best split in the customers
[Histogram of Number of Customers vs. Times Past Due. As the split point moves from 1 to 2 and beyond, the regions on either side are labeled True Positive, True Negative, False Positive, and False Negative.]
We evaluate possible splits using accuracy, precision, and sensitivity
Acc = Number correct / Total Number
Prec = True Positives / Number of People Predicted Delinquent
Sens = True Positives / Number of People Actually Delinquent

[Plot of Accuracy, Precision, and Sensitivity (0-0.8) against the split point in Number of Times 30-59 Days Past Due (0-100).]

The best split gives 0.61 KGI on the test set.
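A minimal sketch of the split search, assuming the Kaggle column name NumberOfTime30-59DaysPastDueNotWorse for the attribute plotted above (sensitivity is recall in scikit-learn):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Predict "delinquent" when the attribute is at or above the split point,
# then sweep candidate splits and report the three metrics for each.
x = train["NumberOfTime30-59DaysPastDueNotWorse"].to_numpy()
for split in range(1, 11):
    pred = (x >= split).astype(int)
    print(split,
          accuracy_score(y_train, pred),
          precision_score(y_train, pred, zero_division=0),
          recall_score(y_train, pred))
```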
However, not all fields are as informative
Using the number of times past due 60-89 days we achieve a KGI of 0.5.

The approach is naive and could be improved, but our time is better spent on different algorithms.
Exploring algorithmic choices further
Simpler, Quicker ↔ Complex, Slower
Random Chance: 0.50
Simple Classification: 0.50-0.61
Random Forests
A random forest starts from a decision tree
Starting from the customer data, we find the best split in a set of randomly chosen attributes, e.g. "Is age < 30?"

Yes: 25,000 customers < 30
No: 75,000 customers > 30

...and each resulting branch is split again in the same way.
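As a minimal scikit-learn sketch of this building block, reusing the frames loaded earlier (max_features is the knob that restricts each split to a random subset of attributes):

```python
from sklearn.tree import DecisionTreeClassifier

# One decision tree: at each node, search a randomly chosen subset of
# attributes (max_features) for the best split, then split each branch again.
tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
tree.fit(train, y_train)
print(tree.tree_.node_count, "nodes")
```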
A random forest is composed of many decision trees
Many such trees are grown, each finding its own best splits on the customer data.

Class assignment of a customer is based on how many of the decision trees "vote" for each class.

We use a large number of trees so as not to over-fit to the training data.
The Random Forest algorithm is easily implemented
In Python or R for initial testing and validation.

It can also be parallelized with Mahout and Hadoop, since there is no dependence from one tree to the next.
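A minimal scikit-learn sketch of the Python route, again reusing the frames loaded earlier; the tree counts tried on the next slide slot into n_estimators, and n_jobs=-1 uses the tree-level independence noted above:

```python
from sklearn.ensemble import RandomForestClassifier

# Trees are independent, so they can be fit in parallel (n_jobs=-1).
forest = RandomForestClassifier(n_estimators=150, n_jobs=-1, random_state=0)
forest.fit(train, y_train)

# Probability of delinquency for each test customer.
p_default = forest.predict_proba(test)[:, 1]
```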
A random forest performs well on the test set
Random Forest
10 trees: 0.779 KGI
150 trees: 0.843 KGI
1000 trees: 0.850 KGI

[Bar chart of accuracy (0.4-0.9) for Random, Classification, and Random Forests.]
Exploring algorithmic choices further
Simpler, Quicker ↔ Complex, Slower
Random Chance: 0.50
Simple Classification: 0.50-0.61
Random Forests: 0.78-0.85
Gradient Tree Boosting
Boosting Trees is similar to a Random Forest
Each tree splits the customer data in the same way (e.g. "Is age < 30?"), but instead of searching a set of randomly chosen attributes, it does an exhaustive search for the best split.
How Gradient Boosting Trees differs from Random Forest
The first tree is optimized to minimize a loss function describing the data.

The next tree is then optimized to fit whatever variability the first tree didn't fit.

This is a sequential process, in comparison to the random forest.

We also run the risk of over-fitting to the data, thus the learning rate.
Implementing Gradient Boosted Trees
In Python or R it is easy for initial testing and validation.

There are implementations that use Hadoop, but it's more complicated to achieve the best performance.
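A minimal scikit-learn sketch, with the tree count and learning rate matching the settings reported on the next slide:

```python
from sklearn.ensemble import GradientBoostingClassifier

# Trees are added sequentially; learning_rate shrinks each tree's
# contribution to guard against over-fitting.
gbt = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 random_state=0)
gbt.fit(train, y_train)
p_default = gbt.predict_proba(test)[:, 1]
```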
Gradient Boosting Trees performs well on the dataset
100 trees, 0.1 learning rate: 0.865022 KGI
1000 trees, 0.1 learning rate: 0.865248 KGI

[Plot of KGI (0.75-0.85) against learning rate (0-0.8); bar chart of accuracy (0.4-0.9) for Random, Classification, Random Forests, and Boosting Trees.]
Moving one step further in complexity
Simpler, Quicker ↔ Complex, Slower
Random Chance: 0.50
Simple Classification: 0.50-0.61
Random Forests: 0.78-0.85
Gradient Tree Boosting: 0.71-0.8659
Blended Method
Or more accurately, an ensemble of ensemble methods
Algorithm progression: Random Forest → Extremely Random Forest → Gradient Tree Boosting

Each algorithm produces a probability of delinquency for every customer in the train data (e.g. 0.1, 0.5, 0.01, 0.8, 0.7, ... from one model; 0.15, 0.6, 0.0, 0.75, 0.68, ... from the next).
Combine all of the model information
Optimize the set of train probabilities to the known delinquencies.

Apply the same weighting scheme to the set of test data probabilities.
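A minimal sketch of one way to do this blend; the deck does not name the optimizer, so a logistic regression over out-of-fold train probabilities is assumed here, and "Extremely Random Forest" is taken to be scikit-learn's ExtraTreesClassifier:

```python
import numpy as np
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

models = [RandomForestClassifier(n_estimators=150, random_state=0),
          ExtraTreesClassifier(n_estimators=150, random_state=0),
          GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                     random_state=0)]

# Out-of-fold train probabilities from each model form the blend's features.
P_train = np.column_stack([
    cross_val_predict(m, train, y_train, cv=3, method="predict_proba")[:, 1]
    for m in models])

# Optimize the weighting against the known delinquencies ...
blender = LogisticRegression().fit(P_train, y_train)

# ... then apply the same weighting scheme to the test probabilities.
P_test = np.column_stack([m.fit(train, y_train).predict_proba(test)[:, 1]
                          for m in models])
p_blend = blender.predict_proba(P_test)[:, 1]
```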
Implementation can be done in a number of ways
Testing in Python or R is slower, due to the sequential nature of applying the algorithms.

It could be made faster by parallelizing: running each algorithm separately and combining the results.
Assessing model performance
Blending performance, 100 trees: 0.864394 KGI

But this performance, and the possibility of additional gains, comes at a distinct time cost.

[Bar chart of accuracy (0.4-0.9) for Random, Classification, Random Forests, Boosting Trees, and Blended.]
Examining the continuum of choices
Simpler, Quicker ↔ Complex, Slower
Random Chance: 0.50
Simple Classification: 0.50-0.61
Random Forests: 0.78-0.85
Gradient Tree Boosting: 0.71-0.8659
Blended Method: 0.864
What would be best to implement?
There is a large amount of optimization of the blended method that could still be done. However, this algorithm takes the longest to run, and that constraint will apply in testing and validation as well.

Random Forests returns a reasonably good result. It is quick and easily parallelized.

Gradient Tree Boosting returns the best result and runs reasonably fast. It is not as easily parallelized, though.
Increases in predictive performance have real business value
Using any of the more complex algorithms we achieve an increase of 35% in comparison to random.

Potential decrease of ~$420k in losses by identifying customers likely to default in the training set alone.
Thank you for your time

More Related Content

What's hot

Credit risk scoring model final
Credit risk scoring model finalCredit risk scoring model final
Credit risk scoring model finalRitu Sarkar
Ā 
Creditscore
CreditscoreCreditscore
Creditscorekevinlan
Ā 
Machine Learning Project - Default credit card clients
Machine Learning Project - Default credit card clients Machine Learning Project - Default credit card clients
Machine Learning Project - Default credit card clients Vatsal N Shah
Ā 
Machine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail SectorMachine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail SectorBigML, Inc
Ā 
Default of Credit Card Payments
Default of Credit Card PaymentsDefault of Credit Card Payments
Default of Credit Card PaymentsVikas Virani
Ā 
Book Recommendation Engine
Book Recommendation EngineBook Recommendation Engine
Book Recommendation EngineShravaniBheema
Ā 
Credit Card Business Plan
Credit Card Business PlanCredit Card Business Plan
Credit Card Business PlanRaghavendra L Rao
Ā 
AI powered decision making in banks
AI powered decision making in banksAI powered decision making in banks
AI powered decision making in banksPankaj Baid
Ā 
Credit Scoring
Credit ScoringCredit Scoring
Credit ScoringMABSIV
Ā 
Artificial Intelligence: a driver of innovation in the Banking Sector
Artificial Intelligence: a driver of innovation in the Banking Sector Artificial Intelligence: a driver of innovation in the Banking Sector
Artificial Intelligence: a driver of innovation in the Banking Sector Big Data Value Association
Ā 
Customer churn classification using machine learning techniques
Customer churn classification using machine learning techniquesCustomer churn classification using machine learning techniques
Customer churn classification using machine learning techniquesSindhujanDhayalan
Ā 
Default payment prediction system
Default payment prediction systemDefault payment prediction system
Default payment prediction systemAshish Arora
Ā 
Consumer Credit Scoring Using Logistic Regression and Random Forest
Consumer Credit Scoring Using Logistic Regression and Random ForestConsumer Credit Scoring Using Logistic Regression and Random Forest
Consumer Credit Scoring Using Logistic Regression and Random ForestHirak Sen Roy
Ā 
Mobile Banking in 2020 - Mobile World Congress Report
Mobile Banking in 2020 - Mobile World Congress ReportMobile Banking in 2020 - Mobile World Congress Report
Mobile Banking in 2020 - Mobile World Congress ReportNadejda Tatarciuc
Ā 
Churn prediction data modeling
Churn prediction data modelingChurn prediction data modeling
Churn prediction data modelingPierre Gutierrez
Ā 
K means Clustering
K means ClusteringK means Clustering
K means ClusteringEdureka!
Ā 
Taiwanese Credit Card Client Fraud detection
Taiwanese Credit Card Client Fraud detectionTaiwanese Credit Card Client Fraud detection
Taiwanese Credit Card Client Fraud detectionRavi Gupta
Ā 

What's hot (20)

Credit risk scoring model final
Credit risk scoring model finalCredit risk scoring model final
Credit risk scoring model final
Ā 
Creditscore
CreditscoreCreditscore
Creditscore
Ā 
Machine Learning Project - Default credit card clients
Machine Learning Project - Default credit card clients Machine Learning Project - Default credit card clients
Machine Learning Project - Default credit card clients
Ā 
Machine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail SectorMachine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail Sector
Ā 
Predicting the e-commerce churn
Predicting the e-commerce churnPredicting the e-commerce churn
Predicting the e-commerce churn
Ā 
Default of Credit Card Payments
Default of Credit Card PaymentsDefault of Credit Card Payments
Default of Credit Card Payments
Ā 
Book Recommendation Engine
Book Recommendation EngineBook Recommendation Engine
Book Recommendation Engine
Ā 
Credit Card Business Plan
Credit Card Business PlanCredit Card Business Plan
Credit Card Business Plan
Ā 
AI powered decision making in banks
AI powered decision making in banksAI powered decision making in banks
AI powered decision making in banks
Ā 
Credit Scoring
Credit ScoringCredit Scoring
Credit Scoring
Ā 
Artificial Intelligence: a driver of innovation in the Banking Sector
Artificial Intelligence: a driver of innovation in the Banking Sector Artificial Intelligence: a driver of innovation in the Banking Sector
Artificial Intelligence: a driver of innovation in the Banking Sector
Ā 
Customer churn prediction in banking
Customer churn prediction in bankingCustomer churn prediction in banking
Customer churn prediction in banking
Ā 
Customer churn classification using machine learning techniques
Customer churn classification using machine learning techniquesCustomer churn classification using machine learning techniques
Customer churn classification using machine learning techniques
Ā 
Default payment prediction system
Default payment prediction systemDefault payment prediction system
Default payment prediction system
Ā 
Consumer Credit Scoring Using Logistic Regression and Random Forest
Consumer Credit Scoring Using Logistic Regression and Random ForestConsumer Credit Scoring Using Logistic Regression and Random Forest
Consumer Credit Scoring Using Logistic Regression and Random Forest
Ā 
Mobile Banking in 2020 - Mobile World Congress Report
Mobile Banking in 2020 - Mobile World Congress ReportMobile Banking in 2020 - Mobile World Congress Report
Mobile Banking in 2020 - Mobile World Congress Report
Ā 
UNIT-4.pptx
UNIT-4.pptxUNIT-4.pptx
UNIT-4.pptx
Ā 
Churn prediction data modeling
Churn prediction data modelingChurn prediction data modeling
Churn prediction data modeling
Ā 
K means Clustering
K means ClusteringK means Clustering
K means Clustering
Ā 
Taiwanese Credit Card Client Fraud detection
Taiwanese Credit Card Client Fraud detectionTaiwanese Credit Card Client Fraud detection
Taiwanese Credit Card Client Fraud detection
Ā 

Viewers also liked

Kaggle presentation
Kaggle presentationKaggle presentation
Kaggle presentationHJ van Veen
Ā 
Feature Engineering
Feature EngineeringFeature Engineering
Feature EngineeringHJ van Veen
Ā 
Reading Birnbaum's (1962) paper, by Li Chenlu
Reading Birnbaum's (1962) paper, by Li ChenluReading Birnbaum's (1962) paper, by Li Chenlu
Reading Birnbaum's (1962) paper, by Li ChenluChristian Robert
Ā 
Q trade presentation
Q trade presentationQ trade presentation
Q trade presentationewig123
Ā 
Tree advanced
Tree advancedTree advanced
Tree advancedJinseob Kim
Ā 
MLHEP 2015: Introductory Lecture #3
MLHEP 2015: Introductory Lecture #3MLHEP 2015: Introductory Lecture #3
MLHEP 2015: Introductory Lecture #3arogozhnikov
Ā 
Comparison Study of Decision Tree Ensembles for Regression
Comparison Study of Decision Tree Ensembles for RegressionComparison Study of Decision Tree Ensembles for Regression
Comparison Study of Decision Tree Ensembles for RegressionSeonho Park
Ā 
Introduction to Some Tree based Learning Method
Introduction to Some Tree based Learning MethodIntroduction to Some Tree based Learning Method
Introduction to Some Tree based Learning MethodHonglin Yu
Ā 
Bias-variance decomposition in Random Forests
Bias-variance decomposition in Random ForestsBias-variance decomposition in Random Forests
Bias-variance decomposition in Random ForestsGilles Louppe
Ā 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Zihui Li
Ā 
Machine Learning and Data Mining: 16 Classifiers Ensembles
Machine Learning and Data Mining: 16 Classifiers EnsemblesMachine Learning and Data Mining: 16 Classifiers Ensembles
Machine Learning and Data Mining: 16 Classifiers EnsemblesPier Luca Lanzi
Ā 
Tda presentation
Tda presentationTda presentation
Tda presentationHJ van Veen
Ā 
Dr. Trevor Hastie: Data Science of GBM (October 10, 2013: Presented With H2O)
Dr. Trevor Hastie: Data Science of GBM (October 10, 2013: Presented With H2O)Dr. Trevor Hastie: Data Science of GBM (October 10, 2013: Presented With H2O)
Dr. Trevor Hastie: Data Science of GBM (October 10, 2013: Presented With H2O)Sri Ambati
Ā 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingTed Xiao
Ā 
Tree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptionsTree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptionsGilles Louppe
Ā 
Decision Tree Ensembles - Bagging, Random Forest & Gradient Boosting Machines
Decision Tree Ensembles - Bagging, Random Forest & Gradient Boosting MachinesDecision Tree Ensembles - Bagging, Random Forest & Gradient Boosting Machines
Decision Tree Ensembles - Bagging, Random Forest & Gradient Boosting MachinesDeepak George
Ā 
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Parth Khare
Ā 
Matrix Factorisation (and Dimensionality Reduction)
Matrix Factorisation (and Dimensionality Reduction)Matrix Factorisation (and Dimensionality Reduction)
Matrix Factorisation (and Dimensionality Reduction)HJ van Veen
Ā 

Viewers also liked (20)

Kaggle presentation
Kaggle presentationKaggle presentation
Kaggle presentation
Ā 
Feature Engineering
Feature EngineeringFeature Engineering
Feature Engineering
Ā 
Reading Birnbaum's (1962) paper, by Li Chenlu
Reading Birnbaum's (1962) paper, by Li ChenluReading Birnbaum's (1962) paper, by Li Chenlu
Reading Birnbaum's (1962) paper, by Li Chenlu
Ā 
Q trade presentation
Q trade presentationQ trade presentation
Q trade presentation
Ā 
Tree advanced
Tree advancedTree advanced
Tree advanced
Ā 
MLHEP 2015: Introductory Lecture #3
MLHEP 2015: Introductory Lecture #3MLHEP 2015: Introductory Lecture #3
MLHEP 2015: Introductory Lecture #3
Ā 
Comparison Study of Decision Tree Ensembles for Regression
Comparison Study of Decision Tree Ensembles for RegressionComparison Study of Decision Tree Ensembles for Regression
Comparison Study of Decision Tree Ensembles for Regression
Ā 
Introduction to Some Tree based Learning Method
Introduction to Some Tree based Learning MethodIntroduction to Some Tree based Learning Method
Introduction to Some Tree based Learning Method
Ā 
L4. Ensembles of Decision Trees
L4. Ensembles of Decision TreesL4. Ensembles of Decision Trees
L4. Ensembles of Decision Trees
Ā 
Bias-variance decomposition in Random Forests
Bias-variance decomposition in Random ForestsBias-variance decomposition in Random Forests
Bias-variance decomposition in Random Forests
Ā 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)
Ā 
Machine Learning and Data Mining: 16 Classifiers Ensembles
Machine Learning and Data Mining: 16 Classifiers EnsemblesMachine Learning and Data Mining: 16 Classifiers Ensembles
Machine Learning and Data Mining: 16 Classifiers Ensembles
Ā 
Tda presentation
Tda presentationTda presentation
Tda presentation
Ā 
Dr. Trevor Hastie: Data Science of GBM (October 10, 2013: Presented With H2O)
Dr. Trevor Hastie: Data Science of GBM (October 10, 2013: Presented With H2O)Dr. Trevor Hastie: Data Science of GBM (October 10, 2013: Presented With H2O)
Dr. Trevor Hastie: Data Science of GBM (October 10, 2013: Presented With H2O)
Ā 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to Stacking
Ā 
Tree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptionsTree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptions
Ā 
Decision Tree Ensembles - Bagging, Random Forest & Gradient Boosting Machines
Decision Tree Ensembles - Bagging, Random Forest & Gradient Boosting MachinesDecision Tree Ensembles - Bagging, Random Forest & Gradient Boosting Machines
Decision Tree Ensembles - Bagging, Random Forest & Gradient Boosting Machines
Ā 
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Ā 
Matrix Factorisation (and Dimensionality Reduction)
Matrix Factorisation (and Dimensionality Reduction)Matrix Factorisation (and Dimensionality Reduction)
Matrix Factorisation (and Dimensionality Reduction)
Ā 
Kaggle: Crowd Sourcing for Data Analytics
Kaggle: Crowd Sourcing for Data AnalyticsKaggle: Crowd Sourcing for Data Analytics
Kaggle: Crowd Sourcing for Data Analytics
Ā 

Similar to Kaggle "Give me some credit" challenge overview

Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectBoston Institute of Analytics
Ā 
Project crm submission sonali
Project crm submission sonaliProject crm submission sonali
Project crm submission sonaliSonali Gupta
Ā 
Important Terminologies In Statistical Inference
Important  Terminologies In  Statistical  InferenceImportant  Terminologies In  Statistical  Inference
Important Terminologies In Statistical InferenceZoha Qureshi
Ā 
Tpmg Manage Cust Prof Final
Tpmg Manage Cust Prof FinalTpmg Manage Cust Prof Final
Tpmg Manage Cust Prof FinalJohn Tyler
Ā 
Bank churn with Data Science
Bank churn with Data ScienceBank churn with Data Science
Bank churn with Data ScienceCarolyn Knight
Ā 
Customer Lifetime Value for Insurance Agents
Customer Lifetime Value for Insurance AgentsCustomer Lifetime Value for Insurance Agents
Customer Lifetime Value for Insurance AgentsScott Boren
Ā 
How to apply CRM using data mining techniques.
How to apply CRM using data mining techniques.How to apply CRM using data mining techniques.
How to apply CRM using data mining techniques.customersforever
Ā 
Real Estate Executive Summary (MKT460 Lab #5)
Real Estate Executive Summary (MKT460 Lab #5)Real Estate Executive Summary (MKT460 Lab #5)
Real Estate Executive Summary (MKT460 Lab #5)Mira McKee
Ā 
Being Right Starts By Knowing You're Wrong
Being Right Starts By Knowing You're WrongBeing Right Starts By Knowing You're Wrong
Being Right Starts By Knowing You're WrongData Con LA
Ā 
David apple typeform retention story - saa stock (1)
David apple   typeform retention story - saa stock (1)David apple   typeform retention story - saa stock (1)
David apple typeform retention story - saa stock (1)SaaStock
Ā 
Predictive Analytics for Customer Targeting: A Telemarketing Banking Example
Predictive Analytics for Customer Targeting: A Telemarketing Banking ExamplePredictive Analytics for Customer Targeting: A Telemarketing Banking Example
Predictive Analytics for Customer Targeting: A Telemarketing Banking ExamplePedro Ecija Serrano
Ā 
Playing the Long Game: Revitalize Your Digital Marketing by Spending on the R...
Playing the Long Game: Revitalize Your Digital Marketing by Spending on the R...Playing the Long Game: Revitalize Your Digital Marketing by Spending on the R...
Playing the Long Game: Revitalize Your Digital Marketing by Spending on the R...3 Birds Marketing LLC
Ā 
Stage Presentation
Stage PresentationStage Presentation
Stage PresentationSCI INFO
Ā 
Transform Your Credit and Collections with Predictive Analytics
Transform Your Credit and Collections with Predictive AnalyticsTransform Your Credit and Collections with Predictive Analytics
Transform Your Credit and Collections with Predictive AnalyticsFIS
Ā 
7 Sat Essay Score
7 Sat Essay Score7 Sat Essay Score
7 Sat Essay ScoreJackie Rojas
Ā 
7 Sat Essay Score
7 Sat Essay Score7 Sat Essay Score
7 Sat Essay ScoreBeth Hall
Ā 
Valiance Portfolio
Valiance PortfolioValiance Portfolio
Valiance PortfolioRohit Pandey
Ā 
Speech To Omega Scorebaord 2009 Conference 041509
Speech To Omega Scorebaord 2009 Conference 041509Speech To Omega Scorebaord 2009 Conference 041509
Speech To Omega Scorebaord 2009 Conference 041509gnorth
Ā 

Similar to Kaggle "Give me some credit" challenge overview (20)

Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
Ā 
Project crm submission sonali
Project crm submission sonaliProject crm submission sonali
Project crm submission sonali
Ā 
Important Terminologies In Statistical Inference
Important  Terminologies In  Statistical  InferenceImportant  Terminologies In  Statistical  Inference
Important Terminologies In Statistical Inference
Ā 
Tpmg Manage Cust Prof Final
Tpmg Manage Cust Prof FinalTpmg Manage Cust Prof Final
Tpmg Manage Cust Prof Final
Ā 
Bank churn with Data Science
Bank churn with Data ScienceBank churn with Data Science
Bank churn with Data Science
Ā 
Customer Lifetime Value for Insurance Agents
Customer Lifetime Value for Insurance AgentsCustomer Lifetime Value for Insurance Agents
Customer Lifetime Value for Insurance Agents
Ā 
How to apply CRM using data mining techniques.
How to apply CRM using data mining techniques.How to apply CRM using data mining techniques.
How to apply CRM using data mining techniques.
Ā 
Real Estate Executive Summary (MKT460 Lab #5)
Real Estate Executive Summary (MKT460 Lab #5)Real Estate Executive Summary (MKT460 Lab #5)
Real Estate Executive Summary (MKT460 Lab #5)
Ā 
Being Right Starts By Knowing You're Wrong
Being Right Starts By Knowing You're WrongBeing Right Starts By Knowing You're Wrong
Being Right Starts By Knowing You're Wrong
Ā 
David apple typeform retention story - saa stock (1)
David apple   typeform retention story - saa stock (1)David apple   typeform retention story - saa stock (1)
David apple typeform retention story - saa stock (1)
Ā 
Predictive Analytics for Customer Targeting: A Telemarketing Banking Example
Predictive Analytics for Customer Targeting: A Telemarketing Banking ExamplePredictive Analytics for Customer Targeting: A Telemarketing Banking Example
Predictive Analytics for Customer Targeting: A Telemarketing Banking Example
Ā 
PATH | WD
PATH | WDPATH | WD
PATH | WD
Ā 
Playing the Long Game: Revitalize Your Digital Marketing by Spending on the R...
Playing the Long Game: Revitalize Your Digital Marketing by Spending on the R...Playing the Long Game: Revitalize Your Digital Marketing by Spending on the R...
Playing the Long Game: Revitalize Your Digital Marketing by Spending on the R...
Ā 
Stage Presentation
Stage PresentationStage Presentation
Stage Presentation
Ā 
Transform Your Credit and Collections with Predictive Analytics
Transform Your Credit and Collections with Predictive AnalyticsTransform Your Credit and Collections with Predictive Analytics
Transform Your Credit and Collections with Predictive Analytics
Ā 
7 Sat Essay Score
7 Sat Essay Score7 Sat Essay Score
7 Sat Essay Score
Ā 
7 Sat Essay Score
7 Sat Essay Score7 Sat Essay Score
7 Sat Essay Score
Ā 
Valiance Portfolio
Valiance PortfolioValiance Portfolio
Valiance Portfolio
Ā 
Speech To Omega Scorebaord 2009 Conference 041509
Speech To Omega Scorebaord 2009 Conference 041509Speech To Omega Scorebaord 2009 Conference 041509
Speech To Omega Scorebaord 2009 Conference 041509
Ā 
Final presentation - Group10(ADS)
Final presentation - Group10(ADS)Final presentation - Group10(ADS)
Final presentation - Group10(ADS)
Ā 

More from Adam Pah

Why Python?
Why Python?Why Python?
Why Python?Adam Pah
Ā 
Quest overview
Quest overviewQuest overview
Quest overviewAdam Pah
Ā 
A quick overview of why to use and how to set up iPython notebooks for research
A quick overview of why to use and how to set up iPython notebooks for researchA quick overview of why to use and how to set up iPython notebooks for research
A quick overview of why to use and how to set up iPython notebooks for researchAdam Pah
Ā 
Pah res-potentia-netsci emailable-stagebuild
Pah res-potentia-netsci emailable-stagebuildPah res-potentia-netsci emailable-stagebuild
Pah res-potentia-netsci emailable-stagebuildAdam Pah
Ā 
D3 interactivity Linegraph basic example
D3 interactivity Linegraph basic exampleD3 interactivity Linegraph basic example
D3 interactivity Linegraph basic exampleAdam Pah
Ā 
Mercurial Tutorial
Mercurial TutorialMercurial Tutorial
Mercurial TutorialAdam Pah
Ā 
Introduction to Mercurial, or "Why we're switching from SVN no matter what"
Introduction to Mercurial, or "Why we're switching from SVN no matter what"Introduction to Mercurial, or "Why we're switching from SVN no matter what"
Introduction to Mercurial, or "Why we're switching from SVN no matter what"Adam Pah
Ā 

More from Adam Pah (7)

Why Python?
Why Python?Why Python?
Why Python?
Ā 
Quest overview
Quest overviewQuest overview
Quest overview
Ā 
A quick overview of why to use and how to set up iPython notebooks for research
A quick overview of why to use and how to set up iPython notebooks for researchA quick overview of why to use and how to set up iPython notebooks for research
A quick overview of why to use and how to set up iPython notebooks for research
Ā 
Pah res-potentia-netsci emailable-stagebuild
Pah res-potentia-netsci emailable-stagebuildPah res-potentia-netsci emailable-stagebuild
Pah res-potentia-netsci emailable-stagebuild
Ā 
D3 interactivity Linegraph basic example
D3 interactivity Linegraph basic exampleD3 interactivity Linegraph basic example
D3 interactivity Linegraph basic example
Ā 
Mercurial Tutorial
Mercurial TutorialMercurial Tutorial
Mercurial Tutorial
Ā 
Introduction to Mercurial, or "Why we're switching from SVN no matter what"
Introduction to Mercurial, or "Why we're switching from SVN no matter what"Introduction to Mercurial, or "Why we're switching from SVN no matter what"
Introduction to Mercurial, or "Why we're switching from SVN no matter what"
Ā 

Recently uploaded

PHX May 2024 Corporate Presentation Final
PHX May 2024 Corporate Presentation FinalPHX May 2024 Corporate Presentation Final
PHX May 2024 Corporate Presentation FinalPanhandleOilandGas
Ā 
HomeRoots Pitch Deck | Investor Insights | April 2024
HomeRoots Pitch Deck | Investor Insights | April 2024HomeRoots Pitch Deck | Investor Insights | April 2024
HomeRoots Pitch Deck | Investor Insights | April 2024Hector Del Castillo, CPM, CPMM
Ā 
Cannabis Legalization World Map: 2024 Updated
Cannabis Legalization World Map: 2024 UpdatedCannabis Legalization World Map: 2024 Updated
Cannabis Legalization World Map: 2024 UpdatedCannaBusinessPlans
Ā 
PARK STREET šŸ’‹ Call Girl 9827461493 Call Girls in Escort service book now
PARK STREET šŸ’‹ Call Girl 9827461493 Call Girls in  Escort service book nowPARK STREET šŸ’‹ Call Girl 9827461493 Call Girls in  Escort service book now
PARK STREET šŸ’‹ Call Girl 9827461493 Call Girls in Escort service book nowkapoorjyoti4444
Ā 
Call 7737669865 Vadodara Call Girls Service at your Door Step Available All Time
Call 7737669865 Vadodara Call Girls Service at your Door Step Available All TimeCall 7737669865 Vadodara Call Girls Service at your Door Step Available All Time
Call 7737669865 Vadodara Call Girls Service at your Door Step Available All Timegargpaaro
Ā 
The Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai Kuwait
The Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai KuwaitThe Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai Kuwait
The Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai Kuwaitdaisycvs
Ā 
Berhampur Call Girl Just Call 8084732287 Top Class Call Girl Service Available
Berhampur Call Girl Just Call 8084732287 Top Class Call Girl Service AvailableBerhampur Call Girl Just Call 8084732287 Top Class Call Girl Service Available
Berhampur Call Girl Just Call 8084732287 Top Class Call Girl Service Availablepr788182
Ā 
Falcon Invoice Discounting: Unlock Your Business Potential
Falcon Invoice Discounting: Unlock Your Business PotentialFalcon Invoice Discounting: Unlock Your Business Potential
Falcon Invoice Discounting: Unlock Your Business PotentialFalcon investment
Ā 
How to Get Started in Social Media for Art League City
How to Get Started in Social Media for Art League CityHow to Get Started in Social Media for Art League City
How to Get Started in Social Media for Art League CityEric T. Tung
Ā 
Ooty Call Gril 80022//12248 Only For Sex And High Profile Best Gril Sex Avail...
Ooty Call Gril 80022//12248 Only For Sex And High Profile Best Gril Sex Avail...Ooty Call Gril 80022//12248 Only For Sex And High Profile Best Gril Sex Avail...
Ooty Call Gril 80022//12248 Only For Sex And High Profile Best Gril Sex Avail...pujan9679
Ā 
Berhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Berhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDINGBerhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Berhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDINGpr788182
Ā 
Katrina Personal Brand Project and portfolio 1
Katrina Personal Brand Project and portfolio 1Katrina Personal Brand Project and portfolio 1
Katrina Personal Brand Project and portfolio 1kcpayne
Ā 
Uneak White's Personal Brand Exploration Presentation
Uneak White's Personal Brand Exploration PresentationUneak White's Personal Brand Exploration Presentation
Uneak White's Personal Brand Exploration Presentationuneakwhite
Ā 
Horngrenā€™s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
Horngrenā€™s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...Horngrenā€™s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
Horngrenā€™s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...ssuserf63bd7
Ā 
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...daisycvs
Ā 
Jual Obat Aborsi ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan Cytotec
Jual Obat Aborsi ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan CytotecJual Obat Aborsi ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan Cytotec
Jual Obat Aborsi ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan CytotecZurliaSoop
Ā 
Dr. Admir Softic_ presentation_Green Club_ENG.pdf
Dr. Admir Softic_ presentation_Green Club_ENG.pdfDr. Admir Softic_ presentation_Green Club_ENG.pdf
Dr. Admir Softic_ presentation_Green Club_ENG.pdfAdmir Softic
Ā 
GUWAHATI šŸ’‹ Call Girl 9827461493 Call Girls in Escort service book now
GUWAHATI šŸ’‹ Call Girl 9827461493 Call Girls in  Escort service book nowGUWAHATI šŸ’‹ Call Girl 9827461493 Call Girls in  Escort service book now
GUWAHATI šŸ’‹ Call Girl 9827461493 Call Girls in Escort service book nowkapoorjyoti4444
Ā 
Nashik Call Girl Just Call 7091819311 Top Class Call Girl Service Available
Nashik Call Girl Just Call 7091819311 Top Class Call Girl Service AvailableNashik Call Girl Just Call 7091819311 Top Class Call Girl Service Available
Nashik Call Girl Just Call 7091819311 Top Class Call Girl Service Availablepr788182
Ā 
Escorts in Nungambakkam Phone 8250092165 Enjoy 24/7 Escort Service Enjoy Your...
Escorts in Nungambakkam Phone 8250092165 Enjoy 24/7 Escort Service Enjoy Your...Escorts in Nungambakkam Phone 8250092165 Enjoy 24/7 Escort Service Enjoy Your...
Escorts in Nungambakkam Phone 8250092165 Enjoy 24/7 Escort Service Enjoy Your...meghakumariji156
Ā 

Recently uploaded (20)

PHX May 2024 Corporate Presentation Final
PHX May 2024 Corporate Presentation FinalPHX May 2024 Corporate Presentation Final
PHX May 2024 Corporate Presentation Final
Ā 
HomeRoots Pitch Deck | Investor Insights | April 2024
HomeRoots Pitch Deck | Investor Insights | April 2024HomeRoots Pitch Deck | Investor Insights | April 2024
HomeRoots Pitch Deck | Investor Insights | April 2024
Ā 
Cannabis Legalization World Map: 2024 Updated
Cannabis Legalization World Map: 2024 UpdatedCannabis Legalization World Map: 2024 Updated
Cannabis Legalization World Map: 2024 Updated
Ā 
PARK STREET šŸ’‹ Call Girl 9827461493 Call Girls in Escort service book now
PARK STREET šŸ’‹ Call Girl 9827461493 Call Girls in  Escort service book nowPARK STREET šŸ’‹ Call Girl 9827461493 Call Girls in  Escort service book now
PARK STREET šŸ’‹ Call Girl 9827461493 Call Girls in Escort service book now
Ā 
Call 7737669865 Vadodara Call Girls Service at your Door Step Available All Time
Call 7737669865 Vadodara Call Girls Service at your Door Step Available All TimeCall 7737669865 Vadodara Call Girls Service at your Door Step Available All Time
Call 7737669865 Vadodara Call Girls Service at your Door Step Available All Time
Ā 
The Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai Kuwait
The Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai KuwaitThe Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai Kuwait
The Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai Kuwait
Ā 
Berhampur Call Girl Just Call 8084732287 Top Class Call Girl Service Available
Berhampur Call Girl Just Call 8084732287 Top Class Call Girl Service AvailableBerhampur Call Girl Just Call 8084732287 Top Class Call Girl Service Available
Berhampur Call Girl Just Call 8084732287 Top Class Call Girl Service Available
Ā 
Falcon Invoice Discounting: Unlock Your Business Potential
Falcon Invoice Discounting: Unlock Your Business PotentialFalcon Invoice Discounting: Unlock Your Business Potential
Falcon Invoice Discounting: Unlock Your Business Potential
Ā 
How to Get Started in Social Media for Art League City
How to Get Started in Social Media for Art League CityHow to Get Started in Social Media for Art League City
How to Get Started in Social Media for Art League City
Ā 
Ooty Call Gril 80022//12248 Only For Sex And High Profile Best Gril Sex Avail...
Ooty Call Gril 80022//12248 Only For Sex And High Profile Best Gril Sex Avail...Ooty Call Gril 80022//12248 Only For Sex And High Profile Best Gril Sex Avail...
Ooty Call Gril 80022//12248 Only For Sex And High Profile Best Gril Sex Avail...
Ā 
Berhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Berhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDINGBerhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Berhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Ā 
Katrina Personal Brand Project and portfolio 1
Katrina Personal Brand Project and portfolio 1Katrina Personal Brand Project and portfolio 1
Katrina Personal Brand Project and portfolio 1
Ā 
Uneak White's Personal Brand Exploration Presentation
Uneak White's Personal Brand Exploration PresentationUneak White's Personal Brand Exploration Presentation
Uneak White's Personal Brand Exploration Presentation
Ā 
Horngrenā€™s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
Horngrenā€™s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...Horngrenā€™s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
Horngrenā€™s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
Ā 
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Ā 
Jual Obat Aborsi ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan Cytotec
Jual Obat Aborsi ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan CytotecJual Obat Aborsi ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan Cytotec
Jual Obat Aborsi ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan Cytotec
Ā 
Dr. Admir Softic_ presentation_Green Club_ENG.pdf
Dr. Admir Softic_ presentation_Green Club_ENG.pdfDr. Admir Softic_ presentation_Green Club_ENG.pdf
Dr. Admir Softic_ presentation_Green Club_ENG.pdf
Ā 
GUWAHATI šŸ’‹ Call Girl 9827461493 Call Girls in Escort service book now
GUWAHATI šŸ’‹ Call Girl 9827461493 Call Girls in  Escort service book nowGUWAHATI šŸ’‹ Call Girl 9827461493 Call Girls in  Escort service book now
GUWAHATI šŸ’‹ Call Girl 9827461493 Call Girls in Escort service book now
Ā 
Nashik Call Girl Just Call 7091819311 Top Class Call Girl Service Available
Nashik Call Girl Just Call 7091819311 Top Class Call Girl Service AvailableNashik Call Girl Just Call 7091819311 Top Class Call Girl Service Available
Nashik Call Girl Just Call 7091819311 Top Class Call Girl Service Available
Ā 
Escorts in Nungambakkam Phone 8250092165 Enjoy 24/7 Escort Service Enjoy Your...
Escorts in Nungambakkam Phone 8250092165 Enjoy 24/7 Escort Service Enjoy Your...Escorts in Nungambakkam Phone 8250092165 Enjoy 24/7 Escort Service Enjoy Your...
Escorts in Nungambakkam Phone 8250092165 Enjoy 24/7 Escort Service Enjoy Your...
Ā 

Kaggle "Give me some credit" challenge overview

  • 2. What is the problem?
  • 3. What is the problem? ā€¢ X Store has a retail credit card available to customers
  • 4. What is the problem? ā€¢ X Store has a retail credit card available to customers ā€¢ There can be a number of sources of loss from this product, but one is customerā€™s defaulting on their debt
  • 5. What is the problem? ā€¢ X Store has a retail credit card available to customers ā€¢ There can be a number of sources of loss from this product, but one is customerā€™s defaulting on their debt ā€¢ This prevents the store from collecting payment for products and services rendered
  • 10. Is this problem big enough to matter? • Examining a slice of the customer database (150,000 customers) we find that 6.6% of customers were seriously delinquent in payment in the last two years • If only 5% of their carried debt was on the store credit card, this potentially represents: • An average loss of $8.12 per customer • A potential overall loss of $1.2 million
  • 14. What can be done? • There are numerous models that can be used to predict which customers will default • This could be used to decrease credit limits or cancel credit lines for current risky customers to minimize potential loss • Or to better screen which customers are approved for the card
  • 18. How will I do this? • This is a basic classification problem with important business implications • We'll examine a few simplistic models to get an idea of performance • Explore decision tree methods to achieve better performance
  • 23. How will the models predict delinquency? Each customer has a number of attributes: John Smith (Delinquent: Yes; Age: 23; Income: $1600; Number of Lines: 4), Mary Rasmussen (Delinquent: No; Age: 73; Income: $2200; Number of Lines: 2), and so on. We will use the customer attributes to predict whether they were delinquent
  • 28. How do we make sure that our solution actually has predictive power? We have two slices of the customer dataset: Train (150,000 customers, delinquency included in the dataset) and Test (101,000 customers, delinquency not included). None of the customers in the test dataset are used to train the model
  • 32. Internally we validate our model performance with cross-fold validation. Using only the train dataset we can get a sense of how well our model performs without externally validating it: split Train into folds (Train 1, Train 2, Train 3), train the algorithm on Train 1 and Train 2, then test it on the held-out Train 3, as sketched below
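A minimal sketch of that cross-fold loop in Python, assuming the train slice has already been loaded as a NumPy feature matrix X and a 0/1 delinquency vector y (both names hypothetical), with a decision stump standing in for whichever algorithm is being validated:

```python
# Sketch: 3-fold cross-validation using only the train slice.
# X (customer attributes) and y (known delinquency, 0/1) are assumed
# to be NumPy arrays built from the 150,000-customer train data.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

kf = KFold(n_splits=3, shuffle=True, random_state=0)
fold_scores = []
for fit_idx, holdout_idx in kf.split(X):
    model = DecisionTreeClassifier(max_depth=1)   # placeholder algorithm
    model.fit(X[fit_idx], y[fit_idx])             # train on two folds
    preds = model.predict(X[holdout_idx])         # test on the third fold
    fold_scores.append(accuracy_score(y[holdout_idx], preds))
print("mean held-out accuracy:", np.mean(fold_scores))
```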
  • 33. What matters is how well we can predict the test dataset. We judge this using the accuracy, which is the number of our predictions correct out of the total number of predictions made. So with 100,000 customers and 80% accuracy we will have correctly predicted whether 80,000 customers will default or not in the next two years
  • 36. Putting accuracy in context: we could save $600,000 over two years if we correctly predicted 50% of the customers that would default and changed their accounts to prevent it. The potential loss shrinks by ~$8,000 for every 100,000 customers with each percentage-point increase in accuracy
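(As a rough check, using the $8.12 average potential loss per customer from above: one percentage point of accuracy on 100,000 customers is 1,000 customers, and 1,000 × $8.12 ≈ $8,100, consistent with the ~$8,000 figure.)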
  • 41. Looking at the actual data [screenshots of the raw customer table; two fields with missing values are imputed: one filled with an assumed $2,500, the other with an assumed 0]
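A sketch of that imputation step in pandas; the file and column names below (cs-training.csv, MonthlyIncome, NumberOfDependents) are assumptions about which fields carry the missing values:

```python
# Sketch: fill in missing values before modelling.
# File and column names are assumptions for illustration.
import pandas as pd

df = pd.read_csv("cs-training.csv")                            # train slice
df["MonthlyIncome"] = df["MonthlyIncome"].fillna(2500)         # assume $2,500
df["NumberOfDependents"] = df["NumberOfDependents"].fillna(0)  # assume 0
```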
  • 46. There is a continuum of algorithmic choices to tackle the problem, from simpler and quicker to more complex and slower. At the simple end: Random Chance (50%) and Simple Classification
  • 54. For simple classification we pick a single attribute and find the best split in the customers, as in the sketch below [histogram: Number of Customers vs. Times Past Due, with candidate split points 1, 2, ... marking off the true positives, true negatives, false positives, and false negatives]
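A sketch of that split search in Python, assuming past_due holds each customer's times-past-due count and y the known delinquency labels (names hypothetical); it simply scans every candidate threshold and keeps the most accurate one:

```python
# Sketch: find the best single-attribute split by brute force.
import numpy as np

def best_split(past_due, y):
    """Scan candidate thresholds; return the most accurate split."""
    best_t, best_acc = None, 0.0
    for t in np.unique(past_due):
        preds = (past_due >= t).astype(int)   # delinquent at or above t
        acc = (preds == y).mean()
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc
```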
  • 60. We evaluate possible splits using accuracy, precision, and sensitivity: Acc = (number correct) / (total number); Prec = (true positives) / (number of people predicted delinquent); Sens = (true positives) / (number of people actually delinquent). [Plot: accuracy, precision, and sensitivity as the split moves across the number of times 30-59 days past due.] The best split achieves 0.61 KGI on the test set
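Those three criteria written out directly, as a sketch (pred and y are 0/1 arrays; names hypothetical):

```python
import numpy as np

def split_metrics(pred, y):
    """Accuracy, precision, and sensitivity for a candidate split."""
    tp = np.sum((pred == 1) & (y == 1))       # true positives
    acc = np.mean(pred == y)                  # number correct / total
    prec = tp / max(np.sum(pred == 1), 1)     # TP / predicted delinquent
    sens = tp / max(np.sum(y == 1), 1)        # TP / actually delinquent
    return acc, prec, sens
```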
  • 62. However, not all fields are as informative. Using the number of times past due 60-89 days we achieve a KGI of only 0.5. The approach is naive and could be improved, but our time is better spent on different algorithms
  • 64. Exploring algorithmic choices further: Random Chance (0.50) → Simple Classification (0.50-0.61) → Random Forests
  • 70. A random forest starts from a decision tree. From the customer data, find the best split in a set of randomly chosen attributes, e.g. "Is age < 30?": the "No" branch takes the 75,000 customers over 30, the "Yes" branch the 25,000 customers under 30, and each branch is split again in turn
  • 74. A random forest is composed of many such decision trees, each grown on its own random splits of the customer data. Class assignment of a customer is based on how many of the decision trees "vote" for delinquent versus not. We use a large number of trees so as not to over-fit to the training data
  • 77. The Random Forest algorithm is easily implemented in Python or R for initial testing and validation. It is also easily parallelized with Mahout and Hadoop, since there is no dependence from one tree to the next
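A minimal scikit-learn sketch of such a run; X, y, and the held-out X_test are assumed to be the arrays prepared earlier, and the tree count matches the middle setting reported on the next slide:

```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=150,     # number of trees that "vote" on each customer
    max_features="sqrt",  # each split searches a random subset of attributes
    n_jobs=-1,            # trees are independent, so training parallelizes
    random_state=0,
)
rf.fit(X, y)
test_proba = rf.predict_proba(X_test)[:, 1]  # delinquency probabilities
```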
  • 82. A random forest performs well on the test set: 10 trees: 0.779 KGI; 150 trees: 0.843 KGI; 1000 trees: 0.850 KGI. [Bar chart: score by method for Random, Classification, and Random Forests, on a 0.4-0.9 scale.]
  • 84. Exploring algorithmic choices further: Random Chance (0.50) → Simple Classification (0.50-0.61) → Random Forests (0.78-0.85) → Gradient Tree Boosting
  • 86. Boosting Trees is similar to a Random Forest, except that each split is found by an exhaustive search over the attributes rather than a search over a randomly chosen subset
  • 90. How Gradient Boosting Trees differs from Random Forest: the first tree is optimized to minimize a loss function describing the data; the next tree is then optimized to fit whatever variability the first tree didn't fit; and so on. This is a sequential process, in contrast to the random forest. We also run the risk of over-fitting to the data, hence the learning rate
  • 92. Implementing Gradient Boosted Trees in Python or R is easy for initial testing and validation; see the sketch below. There are implementations that use Hadoop, but it is more complicated to achieve the best performance
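A scikit-learn sketch matching the settings quoted on the next slide (X, y, X_test as before):

```python
from sklearn.ensemble import GradientBoostingClassifier

gbt = GradientBoostingClassifier(
    n_estimators=100,   # trees fit sequentially, each on the previous residual
    learning_rate=0.1,  # shrinks each tree's contribution to limit over-fitting
    random_state=0,
)
gbt.fit(X, y)
test_proba = gbt.predict_proba(X_test)[:, 1]
```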
  • 96. Gradient Boosting Trees performs well on the dataset: 100 trees, 0.1 learning rate: 0.865022 KGI; 1000 trees, 0.1 learning rate: 0.865248 KGI. [Plot: KGI versus learning rate. Bar chart: score by method for Random, Classification, Random Forests, and Boosting Trees.]
  • 97. Moving one step further in complexity: Random Chance (0.50) → Simple Classification (0.50-0.61) → Random Forests (0.78-0.85) → Gradient Tree Boosting (0.71-0.8659) → Blended Method
  • 104. Or, more accurately, an ensemble of ensemble methods. The train data is run through an algorithm progression (Random Forest, Extremely Random Forest, Gradient Tree Boosting), each model producing its own column of per-customer delinquency probabilities (e.g. 0.1, 0.5, 0.01, 0.8, 0.7, ... from one model and 0.15, 0.6, 0.0, 0.75, 0.68, ... from another)
  • 107. Combine all of the model information: optimize a weighting of the train probability columns against the known delinquencies, then apply the same weighting scheme to the set of test data probabilities
  • 108. Implementation can be done in a number of ways. Testing in Python or R is slower due to the sequential nature of applying the algorithms; it could be made faster in parallel, running each algorithm separately and combining the results. A sketch follows below
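A sketch of the blending scheme under stated assumptions: out-of-fold train probabilities stand in for "optimize against the known delinquencies", scikit-learn's ExtraTreesClassifier stands in for the "Extremely Random Forest", and a logistic regression plays the role of the weighting scheme (X, y, X_test as before):

```python
import numpy as np
from sklearn.ensemble import (ExtraTreesClassifier,
                              GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

models = [RandomForestClassifier(n_estimators=100, random_state=0),
          ExtraTreesClassifier(n_estimators=100, random_state=0),
          GradientBoostingClassifier(random_state=0)]

# Out-of-fold delinquency probabilities on the train set, one column per
# model, so the blender is fit on predictions the base models never saw.
P_train = np.column_stack([
    cross_val_predict(m, X, y, cv=3, method="predict_proba")[:, 1]
    for m in models])

# Refit each model on the full train set to produce the test-side columns.
P_test = np.column_stack([
    m.fit(X, y).predict_proba(X_test)[:, 1] for m in models])

# Optimize the weighting against the known delinquencies, apply it to test.
blender = LogisticRegression()
blender.fit(P_train, y)
blended_proba = blender.predict_proba(P_test)[:, 1]
```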
  • 111. Assessing model performance: blending with 100 trees scores 0.864394 KGI. But this performance, and the possibility of additional gains, comes at a distinct time cost. [Bar chart: score by method for Random, Classification, Random Forests, Boosting Trees, and Blended.]
  • 112. Examining the continuum of choices: Random Chance (0.50) → Simple Classification (0.50-0.61) → Random Forests (0.78-0.85) → Gradient Tree Boosting (0.71-0.8659) → Blended Method (0.864)
  • 117. What would be best to implement? There is a large amount of optimization of the blended method that could still be done; however, that algorithm takes the longest to run, and this constraint will apply in testing and validation as well. Random Forests returns a reasonably good result, and it is quick and easily parallelized. Gradient Tree Boosting returns the best result and runs reasonably fast, though it is not as easily parallelized
  • 120. Increases in predictive performance have real business value. Using any of the more complex algorithms we achieve an increase of 35% in comparison to random, a potential decrease of ~$420k in losses by identifying customers likely to default in the training set alone
  • 121. Thank you for your time