
Published on http://www.npcompleteheart.com/project/kaggle-give-me-some-credit/


- 1. Predicting delinquency on debt
- 5. What is the problem? • X Store has a retail credit card available to customers • There can be a number of sources of loss from this product, but one is customers defaulting on their debt • This prevents the store from collecting payment for products and services rendered
- 10. Is this problem big enough to matter? • Examining a slice of the customer database (150,000 customers) we find that 6.6% of customers were seriously delinquent in payment in the last two years • If only 5% of their carried debt was the store credit card, this is potentially an: • Average loss of $8.12 per customer • Potential overall loss of $1.2 million
- 14. What can be done? • There are numerous models that can be used to predict which customers will default • This could be used to decrease credit limits or cancel credit lines for current risky customers to minimize potential loss • Or to better screen which customers are approved for the card
- 18. How will I do this? • This is a basic classification problem with important business implications • We'll examine a few simplistic models to get an idea of performance • Then explore decision tree methods to achieve better performance
- 23. How will the models predict delinquency? Each customer has a number of attributes • John Smith (Delinquent: Yes; Age: 23; Income: $1600; Number of Lines: 4) • Mary Rasmussen (Delinquent: No; Age: 73; Income: $2200; Number of Lines: 2) • ... • We will use the customer attributes to predict whether they were delinquent
- 28. How do we make sure that our solution actually has predictive power? We have two slices of the customer dataset • Train: 150,000 customers, delinquency recorded in the dataset • Test: 101,000 customers, delinquency not in the dataset • None of the customers in the test dataset are used to train the model
- 32. Internally we validate our model performance with cross-validation • Using only the train dataset we can get a sense of how well our model performs without externally validating it • The train data is split into folds (Train 1, Train 2, Train 3); the algorithm is trained on two folds and tested on the held-out third
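The fold scheme above can be sketched in Python (scikit-learn is assumed here; the deck does not show its own code, and the data below is a synthetic stand-in, not the real customer slice):

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy stand-in for the training slice: 90 "customers", 4 attributes,
# roughly 6.6% delinquent, mirroring the deck's base rate.
rng = np.random.default_rng(0)
X = rng.normal(size=(90, 4))
y = (rng.random(90) < 0.066).astype(int)

# Split the train data into 3 folds: train on two, validate on the third.
kf = KFold(n_splits=3, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
    print(f"fold {fold}: train on {len(train_idx)}, validate on {len(test_idx)}")
```

Each customer appears in exactly one validation fold, so every row is scored by a model that never saw it during training.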
- 33. What matters is how well we can predict the test dataset • We judge this using accuracy: the number of correct predictions out of the total number of predictions made • So with 100,000 customers and 80% accuracy, we will have correctly predicted whether 80,000 customers will default in the next two years
- 36. Putting accuracy in context • We could save $600,000 over two years if we correctly predicted 50% of the customers that would default and changed their accounts to prevent it • The potential loss is reduced by ~$8,000 per 100,000 customers with each percentage-point increase in accuracy
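The arithmetic behind these figures checks out directly (the $8.12 average loss and 5% card share are the deck's own assumptions):

```python
customers = 150_000
avg_loss_per_customer = 8.12  # deck's assumed average loss at a 5% card share

total_potential_loss = customers * avg_loss_per_customer
print(f"potential overall loss: ${total_potential_loss:,.0f}")   # ~$1.2 million

saved_at_50pct = 0.5 * total_potential_loss
print(f"savings at 50% prevention: ${saved_at_50pct:,.0f}")      # ~$600,000

per_point = 0.01 * 100_000 * avg_loss_per_customer
print(f"per accuracy point per 100k customers: ${per_point:,.0f}")  # ~$8,000
```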
- 41. Looking at the actual data [Screenshots of the raw data, not reproduced here; for missing values, assume $2,500 and assume 0]
- 46. There is a continuum of algorithmic choices to tackle the problem (from simpler, quicker to complex, slower) • Random Chance: 50% • Simple Classification
- 54. For simple classification we pick a single attribute and find the best split in the customers [Histogram: number of customers vs. times past due, showing candidate split points (1, 2, ...) and the resulting true positive, true negative, false positive, and false negative regions]
- 60. We evaluate possible splits using accuracy, precision, and sensitivity • Acc = number correct / total number • Prec = true positives / number of people predicted delinquent • Sens = true positives / number of people actually delinquent [Plot: accuracy, precision, and sensitivity vs. number of times 30-59 days past due] • Best split: 0.61 KGI on the test set
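The three metrics for one candidate split can be computed directly. A sketch with made-up labels and a hypothetical threshold of 2 (not the deck's actual data or chosen split):

```python
import numpy as np

# Toy data: times past due for 10 customers, and whether each was delinquent.
times_past_due = np.array([0, 0, 1, 3, 0, 2, 5, 0, 1, 4])
delinquent     = np.array([0, 0, 0, 1, 0, 0, 1, 0, 1, 1])

# Single-attribute rule: predict delinquent if times past due >= threshold.
threshold = 2
predicted = (times_past_due >= threshold).astype(int)

tp = np.sum((predicted == 1) & (delinquent == 1))
tn = np.sum((predicted == 0) & (delinquent == 0))
fp = np.sum((predicted == 1) & (delinquent == 0))
fn = np.sum((predicted == 0) & (delinquent == 1))

accuracy    = (tp + tn) / len(delinquent)  # number correct / total number
precision   = tp / (tp + fp)               # TP / predicted delinquent
sensitivity = tp / (tp + fn)               # TP / actually delinquent
print(accuracy, precision, sensitivity)    # 0.8 0.75 0.75
```

Sweeping `threshold` over the attribute's range and recording these three numbers reproduces the kind of curves shown in the plot.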
- 62. However, not all fields are as informative • Using the number of times past due 60-89 days we achieve a KGI of only 0.5 • The approach is naive and could be improved, but our time is better spent on different algorithms
- 64. Exploring algorithmic choices further (from simpler, quicker to complex, slower) • Random Chance: 0.50 • Simple Classification: 0.50-0.61 • Random Forests
- 70. A random forest starts from a decision tree • From the customer data, find the best split in a set of randomly chosen attributes • Example: Is age <30? No: 75,000 customers >30; Yes: 25,000 customers <30; and so on down the tree
- 74. A random forest is composed of many decision trees [Diagram: many independent trees, each repeatedly splitting the customer data] • Class assignment of a customer is based on how many of the decision trees "vote" for each class • We use a large number of trees so as not to over-fit to the training data
- 77. The random forest algorithm is easily implemented in Python or R for initial testing and validation • It can also be parallelized with Mahout and Hadoop, since there is no dependence from one tree to the next
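A minimal Python sketch of the random forest step (scikit-learn and the synthetic data below are assumptions for illustration; the deck's actual code, features, and scores are not reproduced, and its "KGI" values are treated here as an AUC-style score):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the customer attributes (age, income, lines, ...).
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2000) > 1.5).astype(int)

# Each tree sees a bootstrap sample and a random subset of attributes at
# each split; the forest's prediction is the trees' majority vote.
forest = RandomForestClassifier(n_estimators=150, random_state=0)
scores = cross_val_score(forest, X, y, cv=3, scoring="roc_auc")
print(scores.mean())
```

Because the trees are independent, `n_jobs=-1` (or a Mahout/Hadoop deployment, as the deck notes) parallelizes training with no change to the result.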
- 82. A random forest performs well on the test set • 10 trees: 0.779 KGI • 150 trees: 0.843 KGI • 1000 trees: 0.850 KGI [Bar chart: accuracy of random chance, simple classification, and random forests]
- 84. Exploring algorithmic choices further (from simpler, quicker to complex, slower) • Random Chance: 0.50 • Simple Classification: 0.50-0.61 • Random Forests: 0.78-0.85 • Gradient Tree Boosting
- 86. Boosting trees is similar to a random forest, except we do an exhaustive search for the best split rather than splitting on a random subset of attributes [Diagram: Is age <30? Yes/No split of the customer data]
- 90. How gradient boosting trees differ from random forests • The first tree is optimized to minimize a loss function describing the data • The next tree is then optimized to fit whatever variability the first tree didn't fit • This is a sequential process, in contrast to the random forest • We also run the risk of over-fitting to the data, hence the learning rate
- 92. Implementing gradient boosted trees • In Python or R it is easy for initial testing and validation • There are implementations that use Hadoop, but it's more complicated to achieve the best performance
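A matching sketch for the boosting step (again assuming scikit-learn and synthetic stand-in data; the 100 trees and 0.1 learning rate mirror the deck's settings, but the data and score are illustrative):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] - X[:, 2] + rng.normal(scale=0.5, size=2000) > 1.0).astype(int)

# Trees are fit sequentially: each new tree fits the residual error of the
# ensemble so far, damped by the learning rate to limit over-fitting.
gbt = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 random_state=0)
scores = cross_val_score(gbt, X, y, cv=3, scoring="roc_auc")
print(scores.mean())
```

The sequential dependence between trees is exactly why, as the deck says, this method is harder to parallelize than a random forest.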
- 96. Gradient boosting trees performs well on the dataset • 100 trees, 0.1 learning rate: 0.865022 KGI • 1000 trees, 0.1 learning rate: 0.865248 KGI [Plot: KGI vs. learning rate] [Bar chart: accuracy of random chance, simple classification, random forests, and boosting trees]
- 97. Moving one step further in complexity (from simpler, quicker to complex, slower) • Random Chance: 0.50 • Simple Classification: 0.50-0.61 • Random Forests: 0.78-0.85 • Gradient Tree Boosting: 0.71-0.8659 • Blended Method
- 104. Or, more accurately, an ensemble of ensemble methods • Algorithm progression: Random Forest, Extremely Random Forest, Gradient Tree Boosting • Each algorithm produces a probability of delinquency for every customer in the train data (e.g. 0.1, 0.5, 0.01, 0.8, 0.7, ... from one model; 0.15, 0.6, 0.0, 0.75, 0.68, ... from another)
- 107. Combine all of the model information • Optimize a weighting of the train probabilities against the known delinquencies • Apply the same weighting scheme to the set of test-data probabilities
- 108. Implementation can be done in a number of ways • Testing in Python or R is slower, due to the sequential nature of applying the algorithms • It could be made faster by parallelizing: running each algorithm separately and combining the results
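One way to sketch the blend in Python (the deck does not specify its weighting scheme, so a logistic-regression stacker over out-of-fold probabilities is assumed here, along with scikit-learn and synthetic data):

```python
import numpy as np
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(1)
X = rng.normal(size=(1500, 5))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.7, size=1500) > 1.2).astype(int)

# The deck's progression: random forest, extremely random forest, boosting.
models = [RandomForestClassifier(n_estimators=50, random_state=0),
          ExtraTreesClassifier(n_estimators=50, random_state=0),
          GradientBoostingClassifier(n_estimators=50, random_state=0)]

# Out-of-fold probability of delinquency from each base model ...
train_probs = np.column_stack([
    cross_val_predict(m, X, y, cv=3, method="predict_proba")[:, 1]
    for m in models])

# ... then optimize a weighting against the known delinquencies.
blender = LogisticRegression().fit(train_probs, y)
print(blender.coef_)  # one learned weight per base model
```

The same fitted weights are then applied to the base models' probabilities on the test customers. The three base models run independently, so that stage parallelizes; only the final weighting step is sequential.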
- 111. Assessing model performance • Blending performance, 100 trees: 0.864394 KGI • But this performance, and the possibility of additional gains, comes at a distinct time cost [Bar chart: accuracy of random chance, simple classification, random forests, boosting trees, and the blended method]
- 112. Examining the continuum of choices (from simpler, quicker to complex, slower) • Random Chance: 0.50 • Simple Classification: 0.50-0.61 • Random Forests: 0.78-0.85 • Gradient Tree Boosting: 0.71-0.8659 • Blended Method: 0.864
- 117. What would be best to implement? • There is a large amount of optimization in the blended method that could still be done • However, this algorithm takes the longest to run, and this constraint will also apply in testing and validation • Random forests return a reasonably good result; they are quick and easily parallelized • Gradient tree boosting returns the best result and runs reasonably fast, though it is not as easily parallelized
- 120. Increases in predictive performance have real business value • Using any of the more complex algorithms, we achieve an increase of 35% in comparison to random • Potential decrease of ~$420k in losses by identifying customers likely to default, in the training set alone
- 121. Thank you for your time
