2012 predictive clusters

Presentation at the SAS Global Forum 2012, Orlando, FL.

Presenters:
Alejandro Correa Bahnsen
Andres Felipe Gonzalez Montoya

  1. Constructing a Credit Risk Scorecard using Predictive Clusters. Darwin Amezquita, Colpatria-Scotia Bank; Alejandro Correa, Colpatria-Scotia Bank; Andrés González, Colpatria-Scotia Bank; Catherine Nieto, Colpatria-Scotia Bank.
  2. Contents: Introduction. Data description. General Concepts. Modeling. Results. Conclusions.
  3. Introduction: Innovation – Competitiveness. New solutions using known techniques. Cluster Analysis. SAS®.
  4. Objective: Improve credit risk scorecards. Cluster analysis. Descriptive classification technique. Predictive process.
  5. How? Methodology 1: Total population -> Credit risk scorecards (Logistic Regression, MLP Neural Network). Methodology 2: Total population -> Cluster analysis -> Predictive allocation algorithm -> Credit risk scorecards -> Final classification score.
  6. Data Description: Four different databases – Financial products. Specific default definition. Variables from X1 to Xn.
       Data         Number of Goods   Number of Bads   Total     Bad Rate   Number of Variables
       Database 1   81,659             5,394            87,053    6.2%       7
       Database 2   12,065             2,258            14,323   15.8%      29
       Database 3   50,670             3,797            54,467    7.0%      25
       Database 4   71,127            54,430           125,557   43.4%       7
     Product labels on the slide: Payroll, C.C., Vehicle, No Experience.
  7. General Concepts: Cluster Analysis (K-means, Kohonen SOM). Logistic Regression. Multinomial Logistic Regression. Multi-layer Perceptron Neural Network. Classification distances (Minimum Euclidean Distance, Minimum Adjusted Distance, Minimum Mahalanobis Distance). F1 Score.
  9. General Concepts - Cluster Analysis. What is it? Descriptive process. Creates groups of objects that are more similar to each other than to those in other clusters. Objectives: Characterize the population. Understand behaviors. Identify opportunities. Apply different treatments.
  10. General Concepts - Cluster Analysis
  11. General Concepts - Cluster Analysis. Different clustering algorithms. Define the measures of similarity. Algorithms: K-means. Kohonen Self-organizing maps (SOM).
  12. General Concepts - Cluster Analysis: K-means. Iterative technique. Assigns a set of n observations to k clusters. Each observation is assigned to the nearest centroid. Each observation belongs to one cluster. Unsupervised algorithm.
  13. General Concepts - Cluster Analysis: K-means
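The K-means steps listed on slide 12 can be sketched as follows. This is a minimal illustration in plain Python, not the SAS procedure used in the presentation; the function name `kmeans`, the toy `points`, and the random initialization from observed points are assumptions made for the example.

```python
import math
import random


def euclid(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed init: k observed points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each observation joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: euclid(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # nothing moved: converged
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical toy data: two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Each observation ends up with exactly one label, which matches the "each observation belongs to one cluster" point on the slide.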
  14. General Concepts - Cluster Analysis: Kohonen SOM. Unsupervised and iterative algorithm. Kohonen's learning law. Winning cluster. Training case. Learning rate.
  15. General Concepts - Cluster Analysis: Kohonen SOM. K variables.
  16. General Concepts - Cluster Analysis: Kohonen SOM. Define a learning rate -> Initialize the weights for each node -> Select one training case and calculate the winning cluster -> Update the codebook of the winning cluster -> Convergence check -> STOP.
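The flow on slide 16 can be sketched as the loop below. It is a simplified illustration that updates only the winning node, as the slide describes; a full self-organizing map would also update the winner's grid neighbors. The function name `train_kohonen`, the decay schedule, the tolerance, and the toy `cases` are assumptions, not details from the presentation.

```python
import math
import random


def euclid(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train_kohonen(cases, n_clusters, learning_rate=0.5, decay=0.9,
                  max_epochs=50, tol=1e-4, seed=0):
    """Winner-only Kohonen training; returns one codebook vector per cluster."""
    rng = random.Random(seed)
    # Initialize the weights (codebook) of each node from random training cases.
    codebook = [list(c) for c in rng.sample(cases, n_clusters)]
    for _ in range(max_epochs):
        largest_move = 0.0
        for case in rng.sample(cases, len(cases)):  # one training case at a time
            # Winning cluster: the node whose codebook is closest to the case.
            win = min(range(n_clusters), key=lambda j: euclid(case, codebook[j]))
            # Kohonen's learning law: move the winner toward the case.
            for d, x in enumerate(case):
                step = learning_rate * (x - codebook[win][d])
                codebook[win][d] += step
                largest_move = max(largest_move, abs(step))
        learning_rate *= decay  # assumed schedule: shrink the rate every epoch
        if largest_move < tol:  # convergence check
            break
    return codebook


# Hypothetical toy data: two well-separated groups.
cases = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
codebook = train_kohonen(cases, n_clusters=2)
print(codebook)
```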
  18. General Concepts - Logistic Regression. Probability of occurrence of an event. Determines the relationship between the independent variables and the dichotomous dependent variable. Fits output values between 0 and 1. Widely used in credit risk scorecards: Simplicity. Interpretability.
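For reference, the standard logistic model behind slide 18 can be written as below. The coefficient symbols are assumptions of this note (the slides show no formula); the inputs correspond to the deck's variables X1 to Xn.

```latex
P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}}
```

The right-hand side always lies between 0 and 1, which is why the output can be read as the probability of the event.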
  20. General Concepts - Multinomial Logistic Regression. Generalization of the logistic regression. Dependent variable is not dichotomous. Predicts more than two possibilities. Parameter estimation through maximum likelihood.
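In its usual textbook form, the multinomial model assigns a probability to each of K outcomes as shown below. The notation is an assumption of this note rather than something shown on the slides; in this deck the outcomes would be the clusters a client can be allocated to.

```latex
P(y = k \mid x) = \frac{\exp(\beta_{k0} + \beta_k^{\top} x)}{\sum_{l=1}^{K} \exp(\beta_{l0} + \beta_l^{\top} x)}, \qquad k = 1, \dots, K
```

One category is normally fixed as the reference (its coefficients set to zero) so that the parameters are identifiable; with K = 2 this reduces to ordinary logistic regression.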
  22. General Concepts – MLP neural network. Abstraction of the nervous system. Collection of interconnected units. [Diagram: input layer (X1 ... Xn), hidden layers 1 to i (H11 ... Hij) with bias units (Bias1 ... Biasi), output layer (Yi).]
  24. General Concepts – Classification Distances. Three algorithms: Minimum Euclidean distance. Minimum adjusted distance. Minimum Mahalanobis distance. Classify a new client into a corresponding cluster. Assign to the cluster with the minimum distance.
  25. General Concepts – Classification Distances: Minimum Euclidean distance. Assign based on the distance to the centroids of the defined clusters. Distances shown in the example: 3.7, 3.3, 8.9, 6.5.
  26. General Concepts – Classification Distances: Minimum adjusted distance. Defined by the authors. Assign based on the distance to the cluster radius. Cluster radius: average distance of all observations within a cluster to its centroid. Distances shown in the example: 2.1, 2.9, 8.2, 6.0.
  27. General Concepts – Classification Distances: Minimum adjusted distance
  28. General Concepts – Classification Distances: Minimum Mahalanobis distance. Considers the correlation between variables. Scale-invariant. When the covariance matrix is the identity matrix, it reduces to the Euclidean distance.
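The three allocation rules from slides 24 to 28 can be sketched as below. The slides do not give a formula for the adjusted distance, so this sketch assumes it is the Euclidean distance to the centroid minus the cluster radius; that reading, all function names, and the toy clusters are assumptions. The Mahalanobis helper takes an already inverted covariance matrix to stay dependency-free.

```python
import math


def euclidean(x, centroid):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, centroid)))


def cluster_radius(members, centroid):
    """Average distance of a cluster's observations to its centroid."""
    return sum(euclidean(m, centroid) for m in members) / len(members)


def adjusted(x, centroid, radius):
    # ASSUMPTION: "distance to the cluster radius" is read here as the
    # centroid distance minus the radius; the slides give no formula.
    return euclidean(x, centroid) - radius


def mahalanobis(x, centroid, inv_cov):
    """sqrt((x - c)^T S^-1 (x - c)) for a given inverse covariance S^-1."""
    diff = [a - b for a, b in zip(x, centroid)]
    n = len(diff)
    return math.sqrt(sum(diff[i] * inv_cov[i][j] * diff[j]
                         for i in range(n) for j in range(n)))


def assign(distances):
    """Index of the cluster with the minimum distance."""
    return min(range(len(distances)), key=distances.__getitem__)


# Hypothetical clusters: one tight, one wide.
tight = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
wide = [(6.0, 0.0), (6.0, 8.0), (14.0, 0.0), (14.0, 8.0)]
centroids = [(0.5, 0.5), (10.0, 4.0)]
radii = [cluster_radius(tight, centroids[0]), cluster_radius(wide, centroids[1])]

client = (4.5, 4.0)
by_euclidean = assign([euclidean(client, c) for c in centroids])
by_adjusted = assign([adjusted(client, c, r) for c, r in zip(centroids, radii)])
identity = [[1.0, 0.0], [0.0, 1.0]]
print(by_euclidean, by_adjusted)
print(mahalanobis(client, centroids[0], identity), euclidean(client, centroids[0]))
```

In this toy case the client is slightly closer to the tight cluster's centroid, but falls inside the wide cluster's radius, so the two rules can disagree. With the identity matrix the Mahalanobis value equals the Euclidean one, as slide 28 states.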
  30. General Concepts – The F1 score. Two concepts: Precision. Recall. Standard measure of comparison when the outcome is binary. Classification accuracy. Callouts: percentage of true observations that were correctly predicted; percentage of real good observations that were correctly predicted. Harmonic mean between precision and recall.
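As a small worked example of the harmonic mean named on slide 30, the sketch below uses the standard definitions: precision is the share of predicted positives that are truly positive, and recall is the share of actual positives that were predicted as such. The function name and the toy labels are assumptions.

```python
def f1_score(actual, predicted, positive=1):
    """F1 = harmonic mean of precision and recall for a binary outcome."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 3 true positives, 1 false positive, 1 false negative.
actual = [1, 1, 1, 0, 0, 1]
predicted = [1, 0, 1, 1, 0, 1]
f1 = f1_score(actual, predicted)
print(f1)  # precision = recall = 0.75, so F1 = 0.75
```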
  31. Modeling. Logistic regression or Neural network model developed for each resulting cluster. Weighted average of the scores, weighted by the probability of belonging to each cluster. Vote of each score, weighted by the probability of belonging to each cluster.
  32. Modeling. F1 score is calculated for every model. 248 models. SAS® base and SAS Enterprise Miner™ procedures.
  33. Modeling: Example of model 13
  34. Modeling: Example of model 13. Cluster 2. Cluster scores: 0.90, 0.55, 0.35, 0.80. Score: 0.90.
  35. Modeling: Example of model 40
  36. Modeling: Example of model 40. Cluster scores: 0.90, 0.55, 0.35, 0.80. Inverse distances: 1/2.9, 1/6.0, 1/8.2, 1/2.1. Score: 0.74. $\text{Score} = \sum_i \text{score}_{\text{cluster } i} \cdot P(\text{being in cluster } i)$
  37. Modeling: Example of model 59
  38. Modeling: Example of model 59. Cluster scores and classifiers: 0.90 - 1, 0.55 - 0, 0.35 - 0, 0.80 - 1. Inverse distances: 1/3.2, 1/6.4, 1/7.8, 1/2.7. Classifier: 1. $\text{Classifier} = \operatorname{round}\left(\sum_i \text{classifier}_i \cdot P(\text{being in cluster } i)\right)$
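The two ensemble examples (models 40 and 59) can be checked numerically with the sketch below. It assumes that the probability of belonging to a cluster is proportional to the inverse of the distance to that cluster, normalized to sum to 1; the slides show the inverse distances but do not state the normalization, so treat that as an assumption. Function names are hypothetical.

```python
def cluster_probabilities(distances):
    # ASSUMPTION: membership probability is proportional to 1 / distance.
    inverse = [1.0 / d for d in distances]
    total = sum(inverse)
    return [w / total for w in inverse]


def score_ensemble(scores, probabilities):
    """Weighted average of the cluster scores."""
    return sum(s * p for s, p in zip(scores, probabilities))


def vote_ensemble(votes, probabilities):
    """Weighted vote of the 0/1 classifiers, rounded to the nearest class."""
    weighted = sum(v * p for v, p in zip(votes, probabilities))
    return 1 if weighted >= 0.5 else 0


# Values taken from the slides for model 40.
scores = [0.90, 0.55, 0.35, 0.80]
score_40 = score_ensemble(scores, cluster_probabilities([2.9, 6.0, 8.2, 2.1]))

# Values taken from the slides for model 59.
votes = [1, 0, 0, 1]
class_59 = vote_ensemble(votes, cluster_probabilities([3.2, 6.4, 7.8, 2.7]))

print(round(score_40, 2), class_59)  # 0.74 1
```

Under this assumption the computed values agree with the final score of 0.74 and the final classifier of 1 shown on the slides.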
  39. Results. 240 predictive clustering models. 8 scorecards for the entire population. Top 25% and Top 10% of the models. Contrast on each stage of the modeling process.
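One way to account for these counts is to enumerate the options listed in the stage-by-stage results. The factorization below is an assumption of this note (the slides do not spell it out), but it is consistent with the 240 clustering models, the 8 entire-population scorecards, and the 248 models mentioned on slide 32.

```python
from itertools import product

databases = ["Database 1", "Database 2", "Database 3", "Database 4"]
clustering = ["K-Means", "Kohonen SOM"]
allocation = [
    "Multinomial Logistic Regression",
    "MLP Neural Network",
    "Minimum Euclidean Distance",
    "Minimum Adjusted Distance",
    "Minimum Mahalanobis Distance",
]
scoring = ["Logistic Regression", "MLP Neural Network"]
final_score = ["Cluster Score", "Score Ensemble", "Classifier Average Vote Ensemble"]

# Assumed design: every combination of options is built for every database.
cluster_models = list(product(databases, clustering, allocation, scoring, final_score))
population_models = list(product(databases, scoring))

print(len(cluster_models), len(population_models),
      len(cluster_models) + len(population_models))  # 240 8 248
```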
  40. Results. Stage 1: Clustering methodology.
       Clustering Methodology   Top 25%   Top 10%
       K-Means                  46%       52%
       Kohonen SOM              54%       48%
     There is no significant difference between the techniques. Callout: Opposite relationship.
  41. Results. Stage 2: Predictive clusters methodology.
       Predictive Clusters Methodology   Top 25%   Top 10%
       Multinomial Logistic Regression    4%        0%
       MLP Neural Network                15%        0%
       Minimum Euclidean Distance        24%       28%
       Minimum Adjusted Distance         28%       44%
       Minimum Mahalanobis Distance      28%       28%
     Distance methodologies account for 81% of the Top 25% and 100% of the Top 10%. Distance methodologies are more powerful.
  42. Results. Stage 3: Credit scoring methodology.
       Credit Scoring Methodology   Top 25%   Top 10%
       Logistic Regression          49%       64%
       MLP Neural Network           51%       36%
     Callouts: Approximately the same percentage (Top 25%). Logistic regression exceeds the N.N. (Top 10%). Logistic regression scorecards perform better on segmented populations.
  43. Results. Stage 4: Final score methodology.
       Final Score Methodology            Top 25%   Top 10%
       Cluster Score                      21%        8%
       Score Ensemble                     28%       20%
       Classifier Average Vote Ensemble   51%       72%
     Callout: Undisputed winner. The best method to define the final score is the classifier average vote ensemble.
  44. Results. Entire population models ranking.
       Database     Logistic Regression position (F1 Score Ranking)   MLP Neural Network position (F1 Score Ranking)
       Database 1   34th                                              13th
       Database 2   4th                                               21st
       Database 3   32nd                                              16th
       Database 4   13th                                              24th
     Callout: Best scenario. The best models according to the F1 score statistic are the ones developed using the predictive clusters methodology.
  45. Conclusions. Clustering methods lead to similar results. For cluster assignment, distance methods perform better. Logistic regression could have a higher predictive power after using cluster analysis. The classifier average vote ensemble produces superior results on the task of defining the final score. Predictive clusters provide better results than a single scorecard.
  46. Thank You!
  47. Contact information.
       Darwin Amézquita, Colpatria – Scotia Bank, Bogotá, Colombia, (+57) 301-3372763, amezqud@colpatria.com
       Alejandro Correa, Colpatria – Scotia Bank, Bogotá, Colombia, (+57) 320-8306606, correaal@colpatria.com
       Andrés González, Colpatria – Scotia Bank, Bogotá, Colombia, (+57) 310-3595239, gonzalean@colpatria.com
       Catherine Nieto, Colpatria – Scotia Bank, Bogotá, Colombia, (+57) 315-7426533, nietoa@colpatria.com
