Predicting delinquency on debt
What is the problem?
• X Store has a retail credit card available to customers
• There can be a number of sources of loss from this product, but one is customers defaulting on their debt
• This prevents the store from collecting payment for products and services rendered
Is this problem big enough to matter?
• Examining a slice of the customer database (150,000 customers), we find that 6.6% of customers were seriously delinquent in payment over the last two years
• If only 5% of their carried debt was on the store credit card, this is potentially:
• An average loss of $8.12 per customer
• A potential overall loss of $1.2 million
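
To make the arithmetic concrete, here is a minimal sketch of the loss estimate in Python, using only the figures quoted above (the per-customer loss already folds in the 5% carried-debt assumption):

```python
# Back-of-the-envelope loss estimate from the slide's figures.
n_customers = 150_000
delinquency_rate = 0.066        # 6.6% seriously delinquent in two years
avg_loss_per_customer = 8.12    # dollars; assumes 5% of carried debt is on the store card

n_delinquent = n_customers * delinquency_rate
total_loss = n_customers * avg_loss_per_customer

print(f"{n_delinquent:,.0f} delinquent customers")    # ~9,900
print(f"${total_loss:,.0f} potential overall loss")   # $1,218,000, i.e. ~$1.2 million
```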
What can be done?
• There are numerous models that can be used to predict which customers will default
• These predictions could be used to decrease credit limits or cancel credit lines for current risky customers, to minimize potential loss
• Or to better screen which customers are approved for the card
How will I do this?
• This is a basic classification problem with important business implications
• We’ll examine a few simplistic models to get an idea of baseline performance
• Then explore decision tree methods to achieve better performance
How will the models predict delinquency?
Each customer has a number of attributes:

John Smith
Delinquent: Yes
Age: 23
Income: $1600
Number of Lines: 4

Mary Rasmussen
Delinquent: No
Age: 73
Income: $2200
Number of Lines: 2

...

We will use the customer attributes to predict whether they were delinquent.
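
As a concrete starting point, the customer attributes can be loaded into a feature matrix and a label vector. This sketch assumes the Kaggle "Give Me Some Credit" training file (cs-training.csv), whose label column is SeriousDlqin2yrs:

```python
import pandas as pd

# Load the training slice; column names follow the Kaggle
# "Give Me Some Credit" data dictionary.
train = pd.read_csv("cs-training.csv", index_col=0)

y = train["SeriousDlqin2yrs"]                 # 1 = seriously delinquent within two years
X = train.drop(columns="SeriousDlqin2yrs")    # age, MonthlyIncome, number of lines, ...

print(X.shape)    # (150000, 10)
print(y.mean())   # ~0.066, matching the 6.6% delinquency rate above
```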
How do we make sure that our solution actually has predictive power?

We have two slices of the customer dataset:

Train: 150,000 customers, delinquency in dataset
Test: 101,000 customers, delinquency not in dataset

None of the customers in the test dataset are used to train the model.
Internally we validate our model performance with cross-fold validation

Using only the train dataset we can get a sense of how well our model performs without externally validating it:

The train set is split into folds (Train 1, Train 2, Train 3); the algorithm is trained on Train 1 and Train 2 and then tested on the held-out Train 3, with each fold taking a turn as the held-out set.
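
A minimal sketch of this internal check, using scikit-learn cross-validation with three folds to mirror the Train 1/2/3 split above (X and y as loaded earlier; the model here is just a placeholder, and missing values are filled with 0, matching the assumption noted later):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

model = RandomForestClassifier(n_estimators=100, random_state=0)

# Each of the 3 folds takes a turn as the held-out "Train 3" while
# the model is fit on the other two folds.
scores = cross_val_score(model, X.fillna(0), y, cv=3, scoring="accuracy")
print(scores, scores.mean())
```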
What matters is how well we can predict the test dataset

We judge this using accuracy: the number of correct predictions out of the total number of predictions made.

So with 100,000 customers and 80% accuracy, we will have correctly predicted whether 80,000 customers will default or not in the next two years.
Putting accuracy in context

We could save $600,000 over two years if we correctly predicted 50% of the customers that would default and changed their accounts to prevent it.

The potential loss shrinks by ~$8,000 for every 100,000 customers with each percentage-point increase in accuracy (1% of 100,000 customers is 1,000 customers, at ~$8.12 of potential loss each).
Looking at the actual data

[Distribution plots of the customer attributes; where values are missing we assume $2,500 and 0 respectively.]
There is a continuum of algorithmic choices to tackle the problem, from simpler and quicker to more complex and slower:

Random Chance: 50%
Simple Classification
For simple classification we pick a single attribute and find the best split in the customers

[Histogram of Number of Customers vs. Times Past Due, with candidate split points 1, 2, ... dividing it into true positives, true negatives, false positives, and false negatives]
We evaluate possible splits using accuracy, precision, and sensitivity:

Acc = Number correct / Total number of predictions
Prec = True positives / Number of people predicted delinquent
Sens = True positives / Number of people actually delinquent

[Plot of accuracy, precision, and sensitivity versus the split point in Number of Times 30-59 Days Past Due]

This split achieves 0.61 KGI on the test set.
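
A sketch of this single-attribute split search, scanning a few candidate thresholds (the column name is the Kaggle field for the 30-59-days-past-due count; X and y as loaded earlier):

```python
col = "NumberOfTime30-59DaysPastDueNotWorse"  # Kaggle column name

for t in range(0, 6):
    pred = (X[col] > t).astype(int)       # predict delinquent above the split point
    tp = ((pred == 1) & (y == 1)).sum()   # true positives
    acc = (pred == y).mean()
    prec = tp / max(pred.sum(), 1)        # TP / number predicted delinquent
    sens = tp / y.sum()                   # TP / number actually delinquent
    print(f"split > {t}: acc={acc:.3f} prec={prec:.3f} sens={sens:.3f}")
```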
However, not all fields are as informative

Using the number of times past due 60-89 days, we achieve a KGI of only 0.5.

This approach is naive and could be improved, but our time is better spent on different algorithms.
Exploring algorithmic choices further, from simpler and quicker to more complex and slower:

Random Chance: 0.50
Simple Classification: 0.50-0.61
Random Forests
A random forest starts from a decision tree

Customer Data: find the best split in a set of randomly chosen attributes, e.g. “Is age < 30?”
Yes: 25,000 customers < 30
No: 75,000 customers > 30
... and each branch is then split again in turn.
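
A single such tree in scikit-learn, as a sketch; max_features="sqrt" gives the random subset of attributes considered at each split (X and y as loaded earlier):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

tree = DecisionTreeClassifier(max_depth=2, max_features="sqrt", random_state=0)
tree.fit(X.fillna(0), y)

# Print the learned splits, e.g. "age <= 30"-style rules.
print(export_text(tree, feature_names=list(X.columns)))
```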
A random forest is composed of many decision trees

[Diagram: many such trees, each splitting the customer data into “Yes” and “No” subsets on its own randomly chosen attributes]

We use a large number of trees so as not to over-fit to the training data.

Class assignment of a customer is based on how many of the decision trees “vote” for each class.
The Random Forest algorithm is easily implemented in Python or R for initial testing and validation

It can also be parallelized with Mahout and Hadoop, since there is no dependence from one tree to the next.
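
For instance, a minimal scikit-learn sketch (the settings are illustrative; fillna(0) mirrors the missing-value assumption above):

```python
from sklearn.ensemble import RandomForestClassifier

# Each tree sees a bootstrap sample of customers and a random subset of
# attributes at each split; trees are independent, so fitting runs in
# parallel across cores (n_jobs=-1).
forest = RandomForestClassifier(n_estimators=150, n_jobs=-1, random_state=0)
forest.fit(X.fillna(0), y)

# The forest's class vote, expressed as a probability of delinquency.
p_delinquent = forest.predict_proba(X.fillna(0))[:, 1]
```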
A random forest performs well on the test set

Random Forest
10 trees: 0.779 KGI
150 trees: 0.843 KGI
1000 trees: 0.850 KGI

[Bar chart comparing accuracy (0.4-0.9) of Random Classification and Random Forests]
Exploring algorithmic choices further, from simpler and quicker to more complex and slower:

Random Chance: 0.50
Simple Classification: 0.50-0.61
Random Forests: 0.78-0.85
Gradient Tree Boosting
Boosting Trees is similar to a Random Forest

Customer Data: “Is age < 30?”
Yes: customers < 30 data
No: customers > 30 data
...

But it does an exhaustive search for the best split, rather than searching a random subset of attributes.
How Gradient Boosting Trees differs from Random Forest

[Diagram: a single tree splitting the customer data at its best split]

The first tree is optimized to minimize a loss function describing the data.

The next tree is then optimized to fit whatever variability the first tree didn’t fit.

This is a sequential process, in contrast to the random forest.

We also run the risk of over-fitting to the data; hence the learning rate.
Implementing Gradient Boosted Trees

In Python or R it is easy to do initial testing and validation.

There are implementations that use Hadoop, but it is more complicated to achieve the best performance.
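
A comparable scikit-learn sketch, with the tree count and learning rate matching the settings reported below:

```python
from sklearn.ensemble import GradientBoostingClassifier

# Trees are fit sequentially: each new tree fits the residual error left
# by the ensemble so far, shrunk by the learning rate to limit over-fitting.
gbt = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 random_state=0)
gbt.fit(X.fillna(0), y)
p_delinquent = gbt.predict_proba(X.fillna(0))[:, 1]
```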
Gradient Boosting Trees performs well on the dataset

100 trees, 0.1 learning rate: 0.865022 KGI
1000 trees, 0.1 learning rate: 0.865248 KGI

[Plot of KGI (0.75-0.85) versus learning rate (0-0.8)]
[Bar chart comparing accuracy (0.4-0.9) of Random Classification, Random Forests, and Boosting Trees]
Moving one step further in complexity, from simpler and quicker to more complex and slower:

Random Chance: 0.50
Simple Classification: 0.50-0.61
Random Forests: 0.78-0.85
Gradient Tree Boosting: 0.71-0.8659
Blended Method
Or more accurately, an ensemble of ensemble methods

Algorithm progression:
Random Forest
Extremely Random Forest
Gradient Tree Boosting

Each algorithm produces train-data probabilities of delinquency, one column per model:

0.1   0.15
0.5   0.6
0.01  0.0
0.8   0.75
0.7   0.68
...
Combine all of the model information

Train-data probabilities (one column per model):

0.1   0.15
0.5   0.6
0.01  0.0
0.8   0.75
0.7   0.68
...

Optimize a weighting of the train probabilities against the known delinquencies.

Apply the same weighting scheme to the set of test-data probabilities.
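
One possible weighting scheme is sketched below: collect each ensemble's train-data probabilities and learn weights against the known delinquencies with a logistic regression (a stacking choice for illustration, not necessarily the exact method used here). The fitted weights are then applied unchanged to the test-data probabilities:

```python
import numpy as np
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier)
from sklearn.linear_model import LogisticRegression

models = [RandomForestClassifier(n_estimators=150, random_state=0),
          ExtraTreesClassifier(n_estimators=150, random_state=0),
          GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)]

Xf = X.fillna(0)
# One column of train-data probabilities per ensemble.
train_probs = np.column_stack(
    [m.fit(Xf, y).predict_proba(Xf)[:, 1] for m in models])

# Optimize a weighting of the probabilities against known delinquencies.
blender = LogisticRegression().fit(train_probs, y)

# Apply the same weighting scheme to the test-data probabilities:
# test_probs = np.column_stack([m.predict_proba(X_test.fillna(0))[:, 1]
#                               for m in models])
# blended = blender.predict_proba(test_probs)[:, 1]
```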
Implementation can be done in a number of ways

Testing in Python or R is slower, due to the sequential nature of applying the algorithms.

It could be made faster by parallelizing: running each algorithm separately and combining the results.
Assessing model performance

Blending performance, 100 trees: 0.864394 KGI

But this performance, and the possibility of additional gains, comes at a distinct time cost.

[Bar chart comparing accuracy (0.4-0.9) of Random Classification, Random Forests, Boosting Trees, and the Blended method]
Examining the continuum of choices, from simpler and quicker to more complex and slower:

Random Chance: 0.50
Simple Classification: 0.50-0.61
Random Forests: 0.78-0.85
Gradient Tree Boosting: 0.71-0.8659
Blended Method: 0.864
What would be best to implement?

There is a large amount of optimization that could still be done on the blended method. However, this algorithm takes the longest to run, and that constraint applies in testing and validation as well.

Random Forests returns a reasonably good result. It is quick and easily parallelized.

Gradient Tree Boosting returns the best result and runs reasonably fast. It is not as easily parallelized, though.
Increases in predictive performance have real business value

Using any of the more complex algorithms, we achieve an increase of 35% in comparison to random chance.

That is a potential decrease of ~$420k in losses, by identifying customers likely to default, in the training set alone.
Thank you for your time

More Related Content

What's hot

Consumer Credit Scoring Using Logistic Regression and Random Forest
Consumer Credit Scoring Using Logistic Regression and Random ForestConsumer Credit Scoring Using Logistic Regression and Random Forest
Consumer Credit Scoring Using Logistic Regression and Random Forest
Hirak Sen Roy
 
Predictive Model for Loan Approval Process using SAS 9.3_M1
Predictive Model for Loan Approval Process using SAS 9.3_M1Predictive Model for Loan Approval Process using SAS 9.3_M1
Predictive Model for Loan Approval Process using SAS 9.3_M1
Akanksha Jain
 
Credit Risk Model Building Steps
Credit Risk Model Building StepsCredit Risk Model Building Steps
Credit Risk Model Building Steps
Venkata Reddy Konasani
 
Credit defaulter analysis
Credit defaulter analysisCredit defaulter analysis
Credit defaulter analysis
Nimai Chand Das Adhikari
 
Creditscore
CreditscoreCreditscore
Creditscorekevinlan
 
Customer churn prediction in banking
Customer churn prediction in bankingCustomer churn prediction in banking
Customer churn prediction in banking
BU - PG Master Computing Conference
 
Credit Risk Evaluation Model
Credit Risk Evaluation ModelCredit Risk Evaluation Model
Credit Risk Evaluation Model
Mihai Enescu
 
Estimation of the probability of default : Credit Rish
Estimation of the probability of default : Credit RishEstimation of the probability of default : Credit Rish
Estimation of the probability of default : Credit Rish
Arsalan Qadri
 
Data analytics telecom churn final ppt
Data analytics telecom churn final ppt Data analytics telecom churn final ppt
Data analytics telecom churn final ppt
Gunvansh Khanna
 
Telecom Churn Analysis
Telecom Churn AnalysisTelecom Churn Analysis
Telecom Churn Analysis
Vasudev pendyala
 
Case Study: Loan default prediction
Case Study: Loan default predictionCase Study: Loan default prediction
Case Study: Loan default prediction
ALTEN Calsoft Labs
 
Delopment and testing of a credit scoring model
Delopment and testing of a credit scoring modelDelopment and testing of a credit scoring model
Delopment and testing of a credit scoring model
Mattia Ciprian
 
Introduction to predictive modeling v1
Introduction to predictive modeling v1Introduction to predictive modeling v1
Introduction to predictive modeling v1
Venkata Reddy Konasani
 
Credit Risk Management ppt
Credit Risk Management pptCredit Risk Management ppt
Credit Risk Management ppt
Sneha Salian
 
Taiwanese Credit Card Client Fraud detection
Taiwanese Credit Card Client Fraud detectionTaiwanese Credit Card Client Fraud detection
Taiwanese Credit Card Client Fraud detection
Ravi Gupta
 
Early warning system_ white paper
Early warning system_ white paperEarly warning system_ white paper
Early warning system_ white paper
Federica Tasselli
 
Default Prediction & Analysis on Lending Club Loan Data
Default Prediction & Analysis on Lending Club Loan DataDefault Prediction & Analysis on Lending Club Loan Data
Default Prediction & Analysis on Lending Club Loan Data
Deep Borkar
 
Bank Customer Churn Prediction- Saurav Singh.pptx
Bank Customer Churn Prediction- Saurav Singh.pptxBank Customer Churn Prediction- Saurav Singh.pptx
Bank Customer Churn Prediction- Saurav Singh.pptx
Boston Institute of Analytics
 
Telecom Churn Prediction
Telecom Churn PredictionTelecom Churn Prediction
Telecom Churn Prediction
Anurag Mukhopadhyay
 

What's hot (20)

Consumer Credit Scoring Using Logistic Regression and Random Forest
Consumer Credit Scoring Using Logistic Regression and Random ForestConsumer Credit Scoring Using Logistic Regression and Random Forest
Consumer Credit Scoring Using Logistic Regression and Random Forest
 
Predictive Model for Loan Approval Process using SAS 9.3_M1
Predictive Model for Loan Approval Process using SAS 9.3_M1Predictive Model for Loan Approval Process using SAS 9.3_M1
Predictive Model for Loan Approval Process using SAS 9.3_M1
 
Credit Risk Model Building Steps
Credit Risk Model Building StepsCredit Risk Model Building Steps
Credit Risk Model Building Steps
 
Credit defaulter analysis
Credit defaulter analysisCredit defaulter analysis
Credit defaulter analysis
 
Creditscore
CreditscoreCreditscore
Creditscore
 
Customer churn prediction in banking
Customer churn prediction in bankingCustomer churn prediction in banking
Customer churn prediction in banking
 
Credit Risk Evaluation Model
Credit Risk Evaluation ModelCredit Risk Evaluation Model
Credit Risk Evaluation Model
 
Estimation of the probability of default : Credit Rish
Estimation of the probability of default : Credit RishEstimation of the probability of default : Credit Rish
Estimation of the probability of default : Credit Rish
 
Data analytics telecom churn final ppt
Data analytics telecom churn final ppt Data analytics telecom churn final ppt
Data analytics telecom churn final ppt
 
Telecom Churn Analysis
Telecom Churn AnalysisTelecom Churn Analysis
Telecom Churn Analysis
 
Case Study: Loan default prediction
Case Study: Loan default predictionCase Study: Loan default prediction
Case Study: Loan default prediction
 
Delopment and testing of a credit scoring model
Delopment and testing of a credit scoring modelDelopment and testing of a credit scoring model
Delopment and testing of a credit scoring model
 
Introduction to predictive modeling v1
Introduction to predictive modeling v1Introduction to predictive modeling v1
Introduction to predictive modeling v1
 
Credit Risk Management ppt
Credit Risk Management pptCredit Risk Management ppt
Credit Risk Management ppt
 
Taiwanese Credit Card Client Fraud detection
Taiwanese Credit Card Client Fraud detectionTaiwanese Credit Card Client Fraud detection
Taiwanese Credit Card Client Fraud detection
 
Credit risk models
Credit risk modelsCredit risk models
Credit risk models
 
Early warning system_ white paper
Early warning system_ white paperEarly warning system_ white paper
Early warning system_ white paper
 
Default Prediction & Analysis on Lending Club Loan Data
Default Prediction & Analysis on Lending Club Loan DataDefault Prediction & Analysis on Lending Club Loan Data
Default Prediction & Analysis on Lending Club Loan Data
 
Bank Customer Churn Prediction- Saurav Singh.pptx
Bank Customer Churn Prediction- Saurav Singh.pptxBank Customer Churn Prediction- Saurav Singh.pptx
Bank Customer Churn Prediction- Saurav Singh.pptx
 
Telecom Churn Prediction
Telecom Churn PredictionTelecom Churn Prediction
Telecom Churn Prediction
 

Viewers also liked

Kaggle presentation
Kaggle presentationKaggle presentation
Kaggle presentation
HJ van Veen
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature Engineering
HJ van Veen
 
Reading Birnbaum's (1962) paper, by Li Chenlu
Reading Birnbaum's (1962) paper, by Li ChenluReading Birnbaum's (1962) paper, by Li Chenlu
Reading Birnbaum's (1962) paper, by Li Chenlu
Christian Robert
 
Q trade presentation
Q trade presentationQ trade presentation
Q trade presentation
ewig123
 
Tree advanced
Tree advancedTree advanced
Tree advanced
Jinseob Kim
 
MLHEP 2015: Introductory Lecture #3
MLHEP 2015: Introductory Lecture #3MLHEP 2015: Introductory Lecture #3
MLHEP 2015: Introductory Lecture #3
arogozhnikov
 
Comparison Study of Decision Tree Ensembles for Regression
Comparison Study of Decision Tree Ensembles for RegressionComparison Study of Decision Tree Ensembles for Regression
Comparison Study of Decision Tree Ensembles for Regression
Seonho Park
 
Introduction to Some Tree based Learning Method
Introduction to Some Tree based Learning MethodIntroduction to Some Tree based Learning Method
Introduction to Some Tree based Learning Method
Honglin Yu
 
L4. Ensembles of Decision Trees
L4. Ensembles of Decision TreesL4. Ensembles of Decision Trees
L4. Ensembles of Decision Trees
Machine Learning Valencia
 
Bias-variance decomposition in Random Forests
Bias-variance decomposition in Random ForestsBias-variance decomposition in Random Forests
Bias-variance decomposition in Random Forests
Gilles Louppe
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)
Zihui Li
 
Machine Learning and Data Mining: 16 Classifiers Ensembles
Machine Learning and Data Mining: 16 Classifiers EnsemblesMachine Learning and Data Mining: 16 Classifiers Ensembles
Machine Learning and Data Mining: 16 Classifiers Ensembles
Pier Luca Lanzi
 
Tda presentation
Tda presentationTda presentation
Tda presentation
HJ van Veen
 
Dr. Trevor Hastie: Data Science of GBM (October 10, 2013: Presented With H2O)
Dr. Trevor Hastie: Data Science of GBM (October 10, 2013: Presented With H2O)Dr. Trevor Hastie: Data Science of GBM (October 10, 2013: Presented With H2O)
Dr. Trevor Hastie: Data Science of GBM (October 10, 2013: Presented With H2O)
Sri Ambati
 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to Stacking
Ted Xiao
 
Tree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptionsTree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptions
Gilles Louppe
 
Decision Tree Ensembles - Bagging, Random Forest & Gradient Boosting Machines
Decision Tree Ensembles - Bagging, Random Forest & Gradient Boosting MachinesDecision Tree Ensembles - Bagging, Random Forest & Gradient Boosting Machines
Decision Tree Ensembles - Bagging, Random Forest & Gradient Boosting Machines
Deepak George
 
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Parth Khare
 
Matrix Factorisation (and Dimensionality Reduction)
Matrix Factorisation (and Dimensionality Reduction)Matrix Factorisation (and Dimensionality Reduction)
Matrix Factorisation (and Dimensionality Reduction)
HJ van Veen
 
Kaggle: Crowd Sourcing for Data Analytics
Kaggle: Crowd Sourcing for Data AnalyticsKaggle: Crowd Sourcing for Data Analytics
Kaggle: Crowd Sourcing for Data Analytics
Jeffrey Funk Business Models
 

Viewers also liked (20)

Kaggle presentation
Kaggle presentationKaggle presentation
Kaggle presentation
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature Engineering
 
Reading Birnbaum's (1962) paper, by Li Chenlu
Reading Birnbaum's (1962) paper, by Li ChenluReading Birnbaum's (1962) paper, by Li Chenlu
Reading Birnbaum's (1962) paper, by Li Chenlu
 
Q trade presentation
Q trade presentationQ trade presentation
Q trade presentation
 
Tree advanced
Tree advancedTree advanced
Tree advanced
 
MLHEP 2015: Introductory Lecture #3
MLHEP 2015: Introductory Lecture #3MLHEP 2015: Introductory Lecture #3
MLHEP 2015: Introductory Lecture #3
 
Comparison Study of Decision Tree Ensembles for Regression
Comparison Study of Decision Tree Ensembles for RegressionComparison Study of Decision Tree Ensembles for Regression
Comparison Study of Decision Tree Ensembles for Regression
 
Introduction to Some Tree based Learning Method
Introduction to Some Tree based Learning MethodIntroduction to Some Tree based Learning Method
Introduction to Some Tree based Learning Method
 
L4. Ensembles of Decision Trees
L4. Ensembles of Decision TreesL4. Ensembles of Decision Trees
L4. Ensembles of Decision Trees
 
Bias-variance decomposition in Random Forests
Bias-variance decomposition in Random ForestsBias-variance decomposition in Random Forests
Bias-variance decomposition in Random Forests
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)
 
Machine Learning and Data Mining: 16 Classifiers Ensembles
Machine Learning and Data Mining: 16 Classifiers EnsemblesMachine Learning and Data Mining: 16 Classifiers Ensembles
Machine Learning and Data Mining: 16 Classifiers Ensembles
 
Tda presentation
Tda presentationTda presentation
Tda presentation
 
Dr. Trevor Hastie: Data Science of GBM (October 10, 2013: Presented With H2O)
Dr. Trevor Hastie: Data Science of GBM (October 10, 2013: Presented With H2O)Dr. Trevor Hastie: Data Science of GBM (October 10, 2013: Presented With H2O)
Dr. Trevor Hastie: Data Science of GBM (October 10, 2013: Presented With H2O)
 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to Stacking
 
Tree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptionsTree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptions
 
Decision Tree Ensembles - Bagging, Random Forest & Gradient Boosting Machines
Decision Tree Ensembles - Bagging, Random Forest & Gradient Boosting MachinesDecision Tree Ensembles - Bagging, Random Forest & Gradient Boosting Machines
Decision Tree Ensembles - Bagging, Random Forest & Gradient Boosting Machines
 
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
 
Matrix Factorisation (and Dimensionality Reduction)
Matrix Factorisation (and Dimensionality Reduction)Matrix Factorisation (and Dimensionality Reduction)
Matrix Factorisation (and Dimensionality Reduction)
 
Kaggle: Crowd Sourcing for Data Analytics
Kaggle: Crowd Sourcing for Data AnalyticsKaggle: Crowd Sourcing for Data Analytics
Kaggle: Crowd Sourcing for Data Analytics
 

Similar to Kaggle "Give me some credit" challenge overview

Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
Boston Institute of Analytics
 
Project crm submission sonali
Project crm submission sonaliProject crm submission sonali
Project crm submission sonali
Sonali Gupta
 
Important Terminologies In Statistical Inference
Important  Terminologies In  Statistical  InferenceImportant  Terminologies In  Statistical  Inference
Important Terminologies In Statistical InferenceZoha Qureshi
 
Tpmg Manage Cust Prof Final
Tpmg Manage Cust Prof FinalTpmg Manage Cust Prof Final
Tpmg Manage Cust Prof Final
John Tyler
 
Bank churn with Data Science
Bank churn with Data ScienceBank churn with Data Science
Bank churn with Data Science
Carolyn Knight
 
Customer Lifetime Value for Insurance Agents
Customer Lifetime Value for Insurance AgentsCustomer Lifetime Value for Insurance Agents
Customer Lifetime Value for Insurance Agents
Scott Boren
 
How to apply CRM using data mining techniques.
How to apply CRM using data mining techniques.How to apply CRM using data mining techniques.
How to apply CRM using data mining techniques.
customersforever
 
Real Estate Executive Summary (MKT460 Lab #5)
Real Estate Executive Summary (MKT460 Lab #5)Real Estate Executive Summary (MKT460 Lab #5)
Real Estate Executive Summary (MKT460 Lab #5)
Mira McKee
 
Being Right Starts By Knowing You're Wrong
Being Right Starts By Knowing You're WrongBeing Right Starts By Knowing You're Wrong
Being Right Starts By Knowing You're Wrong
Data Con LA
 
David apple typeform retention story - saa stock (1)
David apple   typeform retention story - saa stock (1)David apple   typeform retention story - saa stock (1)
David apple typeform retention story - saa stock (1)
SaaStock
 
Predictive Analytics for Customer Targeting: A Telemarketing Banking Example
Predictive Analytics for Customer Targeting: A Telemarketing Banking ExamplePredictive Analytics for Customer Targeting: A Telemarketing Banking Example
Predictive Analytics for Customer Targeting: A Telemarketing Banking Example
Pedro Ecija Serrano
 
PATH | WD
PATH | WDPATH | WD
PATH | WD
ChristineRonk
 
Playing the Long Game: Revitalize Your Digital Marketing by Spending on the R...
Playing the Long Game: Revitalize Your Digital Marketing by Spending on the R...Playing the Long Game: Revitalize Your Digital Marketing by Spending on the R...
Playing the Long Game: Revitalize Your Digital Marketing by Spending on the R...
3 Birds Marketing LLC
 
Stage Presentation
Stage PresentationStage Presentation
Stage PresentationSCI INFO
 
Transform Your Credit and Collections with Predictive Analytics
Transform Your Credit and Collections with Predictive AnalyticsTransform Your Credit and Collections with Predictive Analytics
Transform Your Credit and Collections with Predictive Analytics
FIS
 
7 Sat Essay Score
7 Sat Essay Score7 Sat Essay Score
7 Sat Essay Score
Jackie Rojas
 
7 Sat Essay Score
7 Sat Essay Score7 Sat Essay Score
7 Sat Essay Score
Beth Hall
 
Valiance Portfolio
Valiance PortfolioValiance Portfolio
Valiance PortfolioRohit Pandey
 
Speech To Omega Scorebaord 2009 Conference 041509
Speech To Omega Scorebaord 2009 Conference 041509Speech To Omega Scorebaord 2009 Conference 041509
Speech To Omega Scorebaord 2009 Conference 041509gnorth
 
Final presentation - Group10(ADS)
Final presentation - Group10(ADS)Final presentation - Group10(ADS)
Final presentation - Group10(ADS)
Varsha Chidambar Holennavar
 

Similar to Kaggle "Give me some credit" challenge overview (20)

Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Project crm submission sonali
Project crm submission sonaliProject crm submission sonali
Project crm submission sonali
 
Important Terminologies In Statistical Inference
Important  Terminologies In  Statistical  InferenceImportant  Terminologies In  Statistical  Inference
Important Terminologies In Statistical Inference
 
Tpmg Manage Cust Prof Final
Tpmg Manage Cust Prof FinalTpmg Manage Cust Prof Final
Tpmg Manage Cust Prof Final
 
Bank churn with Data Science
Bank churn with Data ScienceBank churn with Data Science
Bank churn with Data Science
 
Customer Lifetime Value for Insurance Agents
Customer Lifetime Value for Insurance AgentsCustomer Lifetime Value for Insurance Agents
Customer Lifetime Value for Insurance Agents
 
How to apply CRM using data mining techniques.
How to apply CRM using data mining techniques.How to apply CRM using data mining techniques.
How to apply CRM using data mining techniques.
 
Real Estate Executive Summary (MKT460 Lab #5)
Real Estate Executive Summary (MKT460 Lab #5)Real Estate Executive Summary (MKT460 Lab #5)
Real Estate Executive Summary (MKT460 Lab #5)
 
Being Right Starts By Knowing You're Wrong
Being Right Starts By Knowing You're WrongBeing Right Starts By Knowing You're Wrong
Being Right Starts By Knowing You're Wrong
 
David apple typeform retention story - saa stock (1)
David apple   typeform retention story - saa stock (1)David apple   typeform retention story - saa stock (1)
David apple typeform retention story - saa stock (1)
 
Predictive Analytics for Customer Targeting: A Telemarketing Banking Example
Predictive Analytics for Customer Targeting: A Telemarketing Banking ExamplePredictive Analytics for Customer Targeting: A Telemarketing Banking Example
Predictive Analytics for Customer Targeting: A Telemarketing Banking Example
 
PATH | WD
PATH | WDPATH | WD
PATH | WD
 
Playing the Long Game: Revitalize Your Digital Marketing by Spending on the R...
Playing the Long Game: Revitalize Your Digital Marketing by Spending on the R...Playing the Long Game: Revitalize Your Digital Marketing by Spending on the R...
Playing the Long Game: Revitalize Your Digital Marketing by Spending on the R...
 
Stage Presentation
Stage PresentationStage Presentation
Stage Presentation
 
Transform Your Credit and Collections with Predictive Analytics
Transform Your Credit and Collections with Predictive AnalyticsTransform Your Credit and Collections with Predictive Analytics
Transform Your Credit and Collections with Predictive Analytics
 
7 Sat Essay Score
7 Sat Essay Score7 Sat Essay Score
7 Sat Essay Score
 
7 Sat Essay Score
7 Sat Essay Score7 Sat Essay Score
7 Sat Essay Score
 
Valiance Portfolio
Valiance PortfolioValiance Portfolio
Valiance Portfolio
 
Speech To Omega Scorebaord 2009 Conference 041509
Speech To Omega Scorebaord 2009 Conference 041509Speech To Omega Scorebaord 2009 Conference 041509
Speech To Omega Scorebaord 2009 Conference 041509
 
Final presentation - Group10(ADS)
Final presentation - Group10(ADS)Final presentation - Group10(ADS)
Final presentation - Group10(ADS)
 

More from Adam Pah

Why Python?
Why Python?Why Python?
Why Python?
Adam Pah
 
Quest overview
Quest overviewQuest overview
Quest overview
Adam Pah
 
A quick overview of why to use and how to set up iPython notebooks for research
A quick overview of why to use and how to set up iPython notebooks for researchA quick overview of why to use and how to set up iPython notebooks for research
A quick overview of why to use and how to set up iPython notebooks for research
Adam Pah
 
Pah res-potentia-netsci emailable-stagebuild
Pah res-potentia-netsci emailable-stagebuildPah res-potentia-netsci emailable-stagebuild
Pah res-potentia-netsci emailable-stagebuild
Adam Pah
 
D3 interactivity Linegraph basic example
D3 interactivity Linegraph basic exampleD3 interactivity Linegraph basic example
D3 interactivity Linegraph basic example
Adam Pah
 
Mercurial Tutorial
Mercurial TutorialMercurial Tutorial
Mercurial Tutorial
Adam Pah
 
Introduction to Mercurial, or "Why we're switching from SVN no matter what"
Introduction to Mercurial, or "Why we're switching from SVN no matter what"Introduction to Mercurial, or "Why we're switching from SVN no matter what"
Introduction to Mercurial, or "Why we're switching from SVN no matter what"
Adam Pah
 

More from Adam Pah (7)

Why Python?
Why Python?Why Python?
Why Python?
 
Quest overview
Quest overviewQuest overview
Quest overview
 
A quick overview of why to use and how to set up iPython notebooks for research
A quick overview of why to use and how to set up iPython notebooks for researchA quick overview of why to use and how to set up iPython notebooks for research
A quick overview of why to use and how to set up iPython notebooks for research
 
Pah res-potentia-netsci emailable-stagebuild
Pah res-potentia-netsci emailable-stagebuildPah res-potentia-netsci emailable-stagebuild
Pah res-potentia-netsci emailable-stagebuild
 
D3 interactivity Linegraph basic example
D3 interactivity Linegraph basic exampleD3 interactivity Linegraph basic example
D3 interactivity Linegraph basic example
 
Mercurial Tutorial
Mercurial TutorialMercurial Tutorial
Mercurial Tutorial
 
Introduction to Mercurial, or "Why we're switching from SVN no matter what"
Introduction to Mercurial, or "Why we're switching from SVN no matter what"Introduction to Mercurial, or "Why we're switching from SVN no matter what"
Introduction to Mercurial, or "Why we're switching from SVN no matter what"
 

Recently uploaded

Introduction to Amazon company 111111111111
Introduction to Amazon company 111111111111Introduction to Amazon company 111111111111
Introduction to Amazon company 111111111111
zoyaansari11365
 
Agency Managed Advisory Board As a Solution To Career Path Defining Business ...
Agency Managed Advisory Board As a Solution To Career Path Defining Business ...Agency Managed Advisory Board As a Solution To Career Path Defining Business ...
Agency Managed Advisory Board As a Solution To Career Path Defining Business ...
Boris Ziegler
 
3.0 Project 2_ Developing My Brand Identity Kit.pptx
3.0 Project 2_ Developing My Brand Identity Kit.pptx3.0 Project 2_ Developing My Brand Identity Kit.pptx
3.0 Project 2_ Developing My Brand Identity Kit.pptx
tanyjahb
 
Call 8867766396 Satta Matka Dpboss Matka Guessing Satta batta Matka 420 Satta...
Call 8867766396 Satta Matka Dpboss Matka Guessing Satta batta Matka 420 Satta...Call 8867766396 Satta Matka Dpboss Matka Guessing Satta batta Matka 420 Satta...
Call 8867766396 Satta Matka Dpboss Matka Guessing Satta batta Matka 420 Satta...
bosssp10
 
Mastering B2B Payments Webinar from BlueSnap
Mastering B2B Payments Webinar from BlueSnapMastering B2B Payments Webinar from BlueSnap
Mastering B2B Payments Webinar from BlueSnap
Norma Mushkat Gaffin
 
Meas_Dylan_DMBS_PB1_2024-05XX_Revised.pdf
Meas_Dylan_DMBS_PB1_2024-05XX_Revised.pdfMeas_Dylan_DMBS_PB1_2024-05XX_Revised.pdf
Meas_Dylan_DMBS_PB1_2024-05XX_Revised.pdf
dylandmeas
 
Authentically Social by Corey Perlman - EO Puerto Rico
Authentically Social by Corey Perlman - EO Puerto RicoAuthentically Social by Corey Perlman - EO Puerto Rico
Authentically Social by Corey Perlman - EO Puerto Rico
Corey Perlman, Social Media Speaker and Consultant
 
Project File Report BBA 6th semester.pdf
Project File Report BBA 6th semester.pdfProject File Report BBA 6th semester.pdf
Project File Report BBA 6th semester.pdf
RajPriye
 
ModelingMarketingStrategiesMKS.CollumbiaUniversitypdf
ModelingMarketingStrategiesMKS.CollumbiaUniversitypdfModelingMarketingStrategiesMKS.CollumbiaUniversitypdf
ModelingMarketingStrategiesMKS.CollumbiaUniversitypdf
fisherameliaisabella
 
Brand Analysis for an artist named Struan
Brand Analysis for an artist named StruanBrand Analysis for an artist named Struan
Brand Analysis for an artist named Struan
sarahvanessa51503
 
Digital Transformation and IT Strategy Toolkit and Templates
Digital Transformation and IT Strategy Toolkit and TemplatesDigital Transformation and IT Strategy Toolkit and Templates
Digital Transformation and IT Strategy Toolkit and Templates
Aurelien Domont, MBA
 
Exploring Patterns of Connection with Social Dreaming
Exploring Patterns of Connection with Social DreamingExploring Patterns of Connection with Social Dreaming
Exploring Patterns of Connection with Social Dreaming
Nicola Wreford-Howard
 
Auditing study material for b.com final year students
Auditing study material for b.com final year  studentsAuditing study material for b.com final year  students
Auditing study material for b.com final year students
narasimhamurthyh4
 
Building Your Employer Brand with Social Media
Building Your Employer Brand with Social MediaBuilding Your Employer Brand with Social Media
Building Your Employer Brand with Social Media
LuanWise
 
Event Report - SAP Sapphire 2024 Orlando - lots of innovation and old challenges
Event Report - SAP Sapphire 2024 Orlando - lots of innovation and old challengesEvent Report - SAP Sapphire 2024 Orlando - lots of innovation and old challenges
Event Report - SAP Sapphire 2024 Orlando - lots of innovation and old challenges
Holger Mueller
 
Discover the innovative and creative projects that highlight my journey throu...
Discover the innovative and creative projects that highlight my journey throu...Discover the innovative and creative projects that highlight my journey throu...
Discover the innovative and creative projects that highlight my journey throu...
dylandmeas
 
Premium MEAN Stack Development Solutions for Modern Businesses
Premium MEAN Stack Development Solutions for Modern BusinessesPremium MEAN Stack Development Solutions for Modern Businesses
Premium MEAN Stack Development Solutions for Modern Businesses
SynapseIndia
 
Training my puppy and implementation in this story
Training my puppy and implementation in this storyTraining my puppy and implementation in this story
Training my puppy and implementation in this story
WilliamRodrigues148
 
Cracking the Workplace Discipline Code Main.pptx
Cracking the Workplace Discipline Code Main.pptxCracking the Workplace Discipline Code Main.pptx
Cracking the Workplace Discipline Code Main.pptx
Workforce Group
 
Cree_Rey_BrandIdentityKit.PDF_PersonalBd
Cree_Rey_BrandIdentityKit.PDF_PersonalBdCree_Rey_BrandIdentityKit.PDF_PersonalBd
Cree_Rey_BrandIdentityKit.PDF_PersonalBd
creerey
 

Recently uploaded (20)

Introduction to Amazon company 111111111111
Introduction to Amazon company 111111111111Introduction to Amazon company 111111111111
Introduction to Amazon company 111111111111
 
Agency Managed Advisory Board As a Solution To Career Path Defining Business ...
Agency Managed Advisory Board As a Solution To Career Path Defining Business ...Agency Managed Advisory Board As a Solution To Career Path Defining Business ...
Agency Managed Advisory Board As a Solution To Career Path Defining Business ...
 
3.0 Project 2_ Developing My Brand Identity Kit.pptx
3.0 Project 2_ Developing My Brand Identity Kit.pptx3.0 Project 2_ Developing My Brand Identity Kit.pptx
3.0 Project 2_ Developing My Brand Identity Kit.pptx
 
Call 8867766396 Satta Matka Dpboss Matka Guessing Satta batta Matka 420 Satta...
Call 8867766396 Satta Matka Dpboss Matka Guessing Satta batta Matka 420 Satta...Call 8867766396 Satta Matka Dpboss Matka Guessing Satta batta Matka 420 Satta...
Call 8867766396 Satta Matka Dpboss Matka Guessing Satta batta Matka 420 Satta...
 
Mastering B2B Payments Webinar from BlueSnap
Mastering B2B Payments Webinar from BlueSnapMastering B2B Payments Webinar from BlueSnap
Mastering B2B Payments Webinar from BlueSnap
 
Meas_Dylan_DMBS_PB1_2024-05XX_Revised.pdf
Meas_Dylan_DMBS_PB1_2024-05XX_Revised.pdfMeas_Dylan_DMBS_PB1_2024-05XX_Revised.pdf
Meas_Dylan_DMBS_PB1_2024-05XX_Revised.pdf
 
Authentically Social by Corey Perlman - EO Puerto Rico
Authentically Social by Corey Perlman - EO Puerto RicoAuthentically Social by Corey Perlman - EO Puerto Rico
Authentically Social by Corey Perlman - EO Puerto Rico
 
Project File Report BBA 6th semester.pdf
Project File Report BBA 6th semester.pdfProject File Report BBA 6th semester.pdf
Project File Report BBA 6th semester.pdf
 
ModelingMarketingStrategiesMKS.CollumbiaUniversitypdf
ModelingMarketingStrategiesMKS.CollumbiaUniversitypdfModelingMarketingStrategiesMKS.CollumbiaUniversitypdf
ModelingMarketingStrategiesMKS.CollumbiaUniversitypdf
 
Brand Analysis for an artist named Struan
Brand Analysis for an artist named StruanBrand Analysis for an artist named Struan
Brand Analysis for an artist named Struan
 
Digital Transformation and IT Strategy Toolkit and Templates
Digital Transformation and IT Strategy Toolkit and TemplatesDigital Transformation and IT Strategy Toolkit and Templates
Digital Transformation and IT Strategy Toolkit and Templates
 
Exploring Patterns of Connection with Social Dreaming
Exploring Patterns of Connection with Social DreamingExploring Patterns of Connection with Social Dreaming
Exploring Patterns of Connection with Social Dreaming
 
Auditing study material for b.com final year students
Auditing study material for b.com final year  studentsAuditing study material for b.com final year  students
Auditing study material for b.com final year students
 
Building Your Employer Brand with Social Media
Building Your Employer Brand with Social MediaBuilding Your Employer Brand with Social Media
Building Your Employer Brand with Social Media
 
Event Report - SAP Sapphire 2024 Orlando - lots of innovation and old challenges
Event Report - SAP Sapphire 2024 Orlando - lots of innovation and old challengesEvent Report - SAP Sapphire 2024 Orlando - lots of innovation and old challenges
Event Report - SAP Sapphire 2024 Orlando - lots of innovation and old challenges
 
Discover the innovative and creative projects that highlight my journey throu...
Discover the innovative and creative projects that highlight my journey throu...Discover the innovative and creative projects that highlight my journey throu...
Discover the innovative and creative projects that highlight my journey throu...
 
Premium MEAN Stack Development Solutions for Modern Businesses
Premium MEAN Stack Development Solutions for Modern BusinessesPremium MEAN Stack Development Solutions for Modern Businesses
Premium MEAN Stack Development Solutions for Modern Businesses
 
Training my puppy and implementation in this story
Training my puppy and implementation in this storyTraining my puppy and implementation in this story
Training my puppy and implementation in this story
 
Cracking the Workplace Discipline Code Main.pptx
Cracking the Workplace Discipline Code Main.pptxCracking the Workplace Discipline Code Main.pptx
Cracking the Workplace Discipline Code Main.pptx
 
Cree_Rey_BrandIdentityKit.PDF_PersonalBd
Cree_Rey_BrandIdentityKit.PDF_PersonalBdCree_Rey_BrandIdentityKit.PDF_PersonalBd
Cree_Rey_BrandIdentityKit.PDF_PersonalBd
 

Kaggle "Give me some credit" challenge overview

• We will use the customer attributes (age, income, number of credit lines, and so on) to predict whether they were delinquent.
• How do we make sure that our solution actually has predictive power? We have two slices of the customer dataset: a train set (150,000 customers, delinquency included in the dataset) and a test set (101,000 customers, delinquency not included). None of the customers in the test dataset are used to train the model.
• Internally we validate model performance with k-fold cross-validation. Using only the train dataset, we can get a sense of how well our model performs without externally validating it: the train set is split into folds (Train 1, Train 2, Train 3), the algorithm is trained on two folds (e.g. Train 1 and Train 2) and tested on the held-out fold (Train 3), rotating through the folds. A minimal sketch follows below.
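As a concrete illustration (not code from the deck), here is a minimal sketch of that three-fold rotation using scikit-learn's KFold; the placeholder array stands in for the 150,000-customer training slice:

```python
import numpy as np
from sklearn.model_selection import KFold

# Placeholder feature matrix standing in for the 150,000-customer train slice.
X = np.arange(9 * 4).reshape(9, 4)

# Three folds: train on two, validate on the held-out third, rotating.
for train_idx, test_idx in KFold(n_splits=3, shuffle=True, random_state=0).split(X):
    print("train on rows", train_idx, "-> validate on rows", test_idx)
```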
• What matters is how well we can predict the test dataset. We judge this using accuracy: the number of correct predictions out of the total number of predictions made. So with 100,000 customers and 80% accuracy, we will have correctly predicted for 80,000 customers whether they will default in the next two years.
• Putting accuracy in context: we could save $600,000 over two years if we correctly predicted 50% of the customers who would default and changed their accounts to prevent it. The potential loss is reduced by ~$8,000 for every 100,000 customers with each percentage-point increase in accuracy; the short calculation below makes the arithmetic explicit.
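Using only figures already quoted in the deck (150,000 customers, $8.12 average loss per customer), the savings claims check out:

```python
customers = 150_000
avg_loss_per_customer = 8.12           # from the slides

total_potential_loss = customers * avg_loss_per_customer
print(total_potential_loss)            # 1,218,000 -> the ~$1.2 million figure

# Preventing 50% of defaults would save about half of that.
print(0.5 * total_potential_loss)      # 609,000 -> the ~$600,000 figure

# Each percentage point of accuracy over 100,000 customers:
print(100_000 * 0.01 * avg_loss_per_customer)  # 8,120 -> the ~$8,000 figure
```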
• Looking at the actual data. (Slides showing the raw fields omitted.) For missing values we impute: assume $2,500 for one field and 0 for the other; in this dataset the fields with missing values are presumably monthly income and number of dependents. A hedged imputation sketch follows below.
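A minimal pandas sketch of that imputation. The file name and column names match the public Kaggle "Give Me Some Credit" dataset; mapping $2,500 to income and 0 to dependents is an assumption rather than something stated outright in the deck:

```python
import pandas as pd

# Load the Kaggle "Give Me Some Credit" training slice.
df = pd.read_csv("cs-training.csv")

# Assumption: the slide's "assume $2,500" refers to missing income
# and "assume 0" to missing number of dependents.
df["MonthlyIncome"] = df["MonthlyIncome"].fillna(2500)
df["NumberOfDependents"] = df["NumberOfDependents"].fillna(0)
```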
• There is a continuum of algorithmic choices to tackle the problem, from simpler and quicker to more complex and slower. At the simple end sits random chance, with an expected accuracy of 50%; the first step up is simple classification.
• For simple classification we pick a single attribute and find the best split in the customers. (Figure: histogram of number of customers versus times past due, with candidate split points; each split produces true positives, true negatives, false positives, and false negatives.)
• We evaluate possible splits using accuracy, precision, and sensitivity:
  Accuracy = number correct / total number of predictions
  Precision = true positives / number of people predicted delinquent
  Sensitivity = true positives / number of people actually delinquent
  (Figure: accuracy, precision, and sensitivity versus the split point on the number of times 30-59 days past due.) The best split achieves 0.61 KGI (the Kaggle leaderboard score) on the test set. A short sketch of these metrics follows below.
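For concreteness, a small sketch (placeholder data, not the deck's) of scoring a single-attribute split with these three metrics:

```python
import numpy as np

def split_metrics(attribute, delinquent, threshold):
    """Predict delinquent when attribute >= threshold, then score the split."""
    predicted = attribute >= threshold
    tp = np.sum(predicted & delinquent)          # true positives
    tn = np.sum(~predicted & ~delinquent)        # true negatives
    accuracy = (tp + tn) / len(delinquent)
    precision = tp / max(np.sum(predicted), 1)   # guard against empty split
    sensitivity = tp / max(np.sum(delinquent), 1)
    return accuracy, precision, sensitivity

# Placeholder example: times past due vs. delinquency flags.
times_past_due = np.array([0, 1, 3, 0, 5, 2, 0, 4])
delinquent = np.array([0, 1, 1, 0, 1, 0, 0, 1], dtype=bool)
print(split_metrics(times_past_due, delinquent, threshold=2))
# (0.75, 0.75, 0.75)
```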
• However, not all fields are as informative: using the number of times past due 60-89 days we achieve a KGI of only 0.5. The approach is naive and could be improved, but our time is better spent on different algorithms.
• Exploring algorithmic choices further: random chance 0.50; simple classification 0.50-0.61; next up, random forests.
• A random forest starts from a decision tree. From the customer data, find the best split in a set of randomly chosen attributes, e.g. "Is age < 30?": No → 75,000 customers aged 30 or over; Yes → 25,000 customers under 30; and so on down the tree.
• A random forest is composed of many such decision trees, each finding its best split on its own random subset of attributes. Class assignment of a customer is based on how many of the decision trees "vote" for each class. We use a large number of trees so as not to over-fit to the training data.
• The random forest algorithm is easily implemented in Python or R for initial testing and validation; a minimal sketch follows below. It can also be parallelized with Mahout and Hadoop, since there is no dependence from one tree to the next.
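A minimal scikit-learn sketch, reusing the imputation above; the file and column names match the public Kaggle dataset and are assumptions as far as this deck is concerned:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# First column of the Kaggle file is a row index.
df = pd.read_csv("cs-training.csv", index_col=0)
df = df.fillna({"MonthlyIncome": 2500, "NumberOfDependents": 0})

y = df["SeriousDlqin2yrs"]
X = df.drop(columns=["SeriousDlqin2yrs"])

# n_jobs=-1 uses all cores: the trees are independent, so training parallelizes.
forest = RandomForestClassifier(n_estimators=150, n_jobs=-1, random_state=0)
print(cross_val_score(forest, X, y, cv=3, scoring="roc_auc").mean())
```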
• A random forest performs well on the test set: 10 trees: 0.779 KGI; 150 trees: 0.843 KGI; 1,000 trees: 0.850 KGI. (Figure: bar chart of accuracy, roughly 0.4-0.9, for random chance, simple classification, and random forests.)
• Exploring algorithmic choices further: random chance 0.50; simple classification 0.50-0.61; random forests 0.78-0.85; next up, gradient tree boosting.
• Boosting trees is structured similarly to a random forest, with one key difference at each node: instead of finding the best split among a set of randomly chosen attributes, it does an exhaustive search for the best split.
• How gradient boosting trees differ from a random forest: the first tree is optimized to minimize a loss function describing the data; the next tree is then optimized to fit whatever variability the first tree didn't fit. This is a sequential process, in comparison to the random forest. We also run the risk of over-fitting to the data, hence the learning-rate parameter, which shrinks each tree's contribution.
• Implementing gradient boosted trees in Python or R is easy for initial testing and validation (see the sketch below). There are implementations that use Hadoop, but because boosting is sequential it is more complicated to achieve the best performance.
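A matching scikit-learn sketch with the slide's 100-tree, 0.1-learning-rate configuration; the data-loading assumptions are the same as in the random forest sketch:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("cs-training.csv", index_col=0)
df = df.fillna({"MonthlyIncome": 2500, "NumberOfDependents": 0})

y = df["SeriousDlqin2yrs"]
X = df.drop(columns=["SeriousDlqin2yrs"])

# 100 sequential trees, each contribution shrunk by the 0.1 learning rate.
gbt = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)
print(cross_val_score(gbt, X, y, cv=3, scoring="roc_auc").mean())
```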
• Gradient boosting trees performs well on the dataset: 100 trees, 0.1 learning rate: 0.865022 KGI; 1,000 trees, 0.1 learning rate: 0.865248 KGI. (Figures: KGI, roughly 0.75-0.85, as a function of learning rate from 0 to 0.8; bar chart of accuracy for random chance, simple classification, random forests, and boosting trees.)
• Moving one step further in complexity along the continuum: random chance 0.50; simple classification 0.50-0.61; random forests 0.78-0.85; gradient tree boosting 0.71-0.8659; next, a blended method.
• Or, more accurately, an ensemble of ensemble methods. The algorithm progression runs the training data through a random forest, an extremely random forest, and gradient tree boosting, each producing a vector of delinquency probabilities per customer (e.g. 0.1, 0.5, 0.01, 0.8, 0.7, ... from one model and 0.15, 0.6, 0.0, 0.75, 0.68, ... from another).
• Combine all of the model information: optimize a weighting of the train-set probabilities against the known delinquencies, then apply the same weighting scheme to the test-set probabilities.
• Implementation can be done in a number of ways. Testing in Python or R is slower due to the sequential nature of applying the algorithms; it could be made faster by parallelizing, running each algorithm separately and combining the results. A rough sketch of the blending step follows below.
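One common way to realize this blending is stacking: collect each model's out-of-fold probabilities as columns and fit a second-stage model against the known delinquencies. The deck does not name its optimizer, so the logistic regression below is an assumption:

```python
import pandas as pd
from sklearn.ensemble import (ExtraTreesClassifier,
                              GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

df = pd.read_csv("cs-training.csv", index_col=0)
df = df.fillna({"MonthlyIncome": 2500, "NumberOfDependents": 0})
y = df["SeriousDlqin2yrs"]
X = df.drop(columns=["SeriousDlqin2yrs"])

models = [RandomForestClassifier(n_estimators=100, n_jobs=-1),
          ExtraTreesClassifier(n_estimators=100, n_jobs=-1),
          GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)]

# Out-of-fold probabilities from each base model become the features
# of a second-stage model that learns the weighting.
stacked = pd.DataFrame({
    i: cross_val_predict(m, X, y, cv=3, method="predict_proba")[:, 1]
    for i, m in enumerate(models)})

blender = LogisticRegression().fit(stacked, y)
print("blend weights:", blender.coef_)
```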
• Assessing model performance: blending performance with 100 trees: 0.864394 KGI. (Figure: bar chart of accuracy for random chance, simple classification, random forests, boosting trees, and the blended method.) But this performance, and the possibility of additional gains, comes at a distinct time cost.
• Examining the continuum of choices: random chance 0.50; simple classification 0.50-0.61; random forests 0.78-0.85; gradient tree boosting 0.71-0.8659; blended method 0.864.
• What would be best to implement? There is a large amount of optimization in the blended method that could still be done; however, this algorithm takes the longest to run, and that constraint will apply in testing and validation as well. Random forests return a reasonably good result, and they are quick and easily parallelized. Gradient tree boosting returns the best result and runs reasonably fast, though it is not as easily parallelized.
• Increases in predictive performance have real business value. Using any of the more complex algorithms, we achieve an increase of 35 percentage points over random chance, a potential decrease of ~$420k in losses from identifying customers likely to default in the training set alone.
• Thank you for your time.