Predictive Credit Risk Scoring using SAS Enterprise Miner
Presentation Transcript

  • CREDIT SCORING USING SAS ENTERPRISE MINER
    AMAL SHANKER, DESHBANDHU PACHAURI
  • LENDING CLUB: INTRODUCTION
      • Founded in 2007
      • An online financial community bringing together creditworthy borrowers and savvy investors
    LATEST COMPANY STATISTICS
      • Loans funded to date: $2,595,182,275
      • Loans funded last month: $203,355,750
      • Interest paid to investors since inception: $229,080,795
    Image from https://www.lendingclub.com/public/how-peer-lending-works.action
  • AIM
      • To build predictive decision models in SAS Enterprise Miner that best indicate creditworthiness.
      • To compare regression analysis with a decision tree model and select the one that predicts more accurately.
    DATA
      • Data on 42,539 customers
      • 2007 to 2011
      • 59 variables
  • VARIABLE SELECTION
      • TARGET VARIABLE = Bad Flag (given)
      • The remaining 58 were INPUT VARIABLES
      • 58 was too many to work with directly.
      • The first goal was to reduce this number to the best 4 or 5.
      • METHODOLOGY USED:
          • INTUITIVE METHODS
          • VARIABLE CATEGORIZATION
          • SAS FUNCTIONS
  • VARIABLE SELECTION (Contd.)
    INTUITIVE METHODS
    All variables were checked for preliminary data inconsistencies.
    VARIABLES DISCARDED
      • VARIABLES WITH "SIGNIFICANTLY HIGH" MISSING VALUES
          • Months since last delinquency (26,929)
          • Months since last record (38,887)
      • VARIABLE WITH DIFFICULTY IN ROLE ASSIGNMENT
          • Employment length
      • NON-USEFUL VARIABLES
          • State, Member ID, etc.
      • VARIABLES WITH THE SAME VALUE FOR ALL OR ALMOST ALL ROWS
          • Tax liens
          • Charge off within 12 months
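The two mechanical screens on this slide (too many missing values, and the same value in almost every row) can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure: the thresholds, the function name `screen_columns`, and the sample column names are all assumptions, since the slides do not say what counted as "significantly high".

```python
from collections import Counter


def screen_columns(rows, max_missing_share=0.5, max_dominant_share=0.99):
    """Return {column: reason} for columns failing a simple screen.

    Both thresholds are illustrative assumptions, not values from the slides.
    """
    dropped = {}
    if not rows:
        return dropped
    n = len(rows)
    for col in rows[0].keys():
        present = [r.get(col) for r in rows if r.get(col) is not None]
        missing_share = 1 - len(present) / n
        # Screen 1: too many missing values (or nothing present at all).
        if not present or missing_share > max_missing_share:
            dropped[col] = "missing"
            continue
        # Screen 2: one value accounts for (almost) every row.
        top_count = Counter(present).most_common(1)[0][1]
        if top_count / len(present) >= max_dominant_share:
            dropped[col] = "near-constant"
    return dropped


# Hypothetical toy rows; the column names are made up for the example.
rows = [
    {"fico_range_high": 700, "mths_since_last_record": None, "tax_liens": 0},
    {"fico_range_high": 680, "mths_since_last_record": None, "tax_liens": 0},
    {"fico_range_high": 720, "mths_since_last_record": 12, "tax_liens": 0},
    {"fico_range_high": 690, "mths_since_last_record": None, "tax_liens": 0},
]
dropped = screen_columns(rows)
print(dropped)
```

Variables discarded for other reasons on the slide (role assignment difficulty, identifiers such as Member ID) need human judgment and are not covered by this kind of screen.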
  • VARIABLE SELECTION (Contd.) VARIABLES CATEGORIZATION PRE-APPROVAL VARIABLES  FICO range high, Annual Income etc. DERIVATIVE VARIABLES  Credit grade, Credit Sub-grade, Interest rate POST-APPROVAL VARIABLES  Principal paid, Interest paid  FINAL OUTCOME • 16 pre-approval variables
  • VARIABLE SELECTION (Contd.)
    SAS FUNCTIONS
    The 16 pre-approval variables were then assessed using the following SAS functions:
      • STATEXPLORE: to check the worth of the variables
      • VARIABLE CLUSTERING: to identify and group highly correlated variables
    [Diagram: INPUT VARIABLES, STATEXPLORE, VARIABLE CLUSTERING]
  • STATEXPLORE: WORTH ANALYSIS
    WORTH OF VARIABLES ANALYZED
       1. SUBGRADE
       2. FICO RANGE HIGH
       3. FICO RANGE LOW
       4. PURPOSE
       5. REVOLVING UTILITY
       6. PUBLIC RECORD BANKRUPTCIES
       7. ANNUAL INCOME
       8. PUBLIC RECORDS
       9. OPEN ACCOUNT
      10. TOTAL ACCOUNT
      11. DTI
      12. REVOLVING BALANCE
      13. LOAN AMOUNT
      14. HOME OWNERSHIP
      15. DELINQUENCY IN 2 YEARS
      16. DELINQUENCY AMOUNT
  • VARIABLE CLUSTERING: CLUSTER ANALYSIS 1
    CLUSTER 0
      1. DELINQUENCY AMOUNT
    CLUSTER 1
      1. FICO RANGE HIGH
      2. FICO RANGE LOW
      3. REVOLVING UTILITY BALANCE
    CLUSTER 2
      1. OPEN ACCOUNTS
      2. TOTAL ACCOUNTS
      3. DTI
    CLUSTER 3
      1. PUBLIC RECORDS BANKRUPTCY
      2. PUBLIC RECORDS
    CLUSTER 4
      1. ANNUAL INCOME
      2. LOAN AMOUNT
      3. REVOLVING BALANCE
    CLUSTER 5
      1. DELINQUENCY SINCE 2 YEARS
    PURPOSE AND HOME OWNERSHIP TOO!
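The idea of grouping highly correlated inputs can be illustrated with a small Python sketch. This is a simplified stand-in, not the algorithm behind the SAS Variable Clustering node: it greedily attaches each variable to the first cluster whose seed variable it correlates with above a threshold. The threshold, the function names, and the toy values are assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def group_correlated(columns, threshold=0.8):
    """Greedy grouping: join the first cluster whose seed correlates strongly."""
    clusters = []
    for name, values in columns.items():
        for cluster in clusters:
            seed = cluster[0]
            if abs(pearson(values, columns[seed])) >= threshold:
                cluster.append(name)
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append([name])
    return clusters


# Hypothetical toy data; not values from the Lending Club file.
columns = {
    "fico_range_high": [664, 699, 724, 744, 679],
    "fico_range_low": [660, 695, 720, 740, 675],
    "annual_inc": [40000, 85000, 52000, 30000, 120000],
    "loan_amnt": [8000, 17000, 10000, 6500, 25000],
}
clusters = group_correlated(columns)
print(clusters)
```

Once variables are grouped this way, keeping one representative per cluster is one way to shrink the input list, which is the role the clustering step plays in the slides.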
  • VARIABLE CLUSTERING: CLUSTER ANALYSIS 2
    CLUSTER 1
      1. FICO RANGE HIGH
      2. DELINQUENCY SINCE 2 YEARS
      3. PUBLIC RECORDS BANKRUPTCY
    CLUSTER 2
      1. ANNUAL INCOME
      2. OPEN ACCOUNTS
    PURPOSE AND HOME OWNERSHIP STILL UNDER CONSIDERATION!
  • SUMMARY: VARIABLE SELECTION
      Stage                     Variables remaining
      BEGINNING                 58
      PRE-APPROVAL VARIABLES    16
      WORTH ANALYSIS            15
      CLUSTER ANALYSIS 1        7
      CLUSTER ANALYSIS 2        4
  • FINAL 4 INPUT VARIABLES
      • FICO RANGE HIGH
      • PURPOSE
      • ANNUAL INCOME
      • HOME OWNERSHIP
  • SAS DIAGRAM WORKSPACE
    DIAGRAM: LOAN V3
    [Nodes shown: INPUT VARIABLES, IMPUTE, DATA PARTITION, DECISION TREE, REGRESSION, MODEL COMPARISON]
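As a rough picture of what a data partition step does before the two models are trained, here is a minimal Python sketch of a stratified train/validation split. The slides do not state the partition settings; the 60/40 share, the stratification on the target, the seed, and the function name are assumptions chosen for illustration.

```python
import random


def stratified_partition(rows, target, train_share=0.6, seed=42):
    """Split rows into train/validation sets, preserving the target mix."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[target], []).append(row)
    train, valid = [], []
    for members in by_class.values():
        members = members[:]  # copy so the caller's list is not reordered
        rng.shuffle(members)
        cut = round(len(members) * train_share)
        train.extend(members[:cut])
        valid.extend(members[cut:])
    return train, valid


# Hypothetical toy data: 90 good and 10 bad customers.
rows = [{"id": i, "bad_flag": 1 if i < 10 else 0} for i in range(100)]
train, valid = stratified_partition(rows, target="bad_flag")
print(len(train), len(valid))
```

Stratifying on the target keeps the share of bad loans similar in both sets, which matters when bad loans are a small minority.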
  • DECISION TREE ANALYSIS
    [Chart: CUMULATIVE LIFT]
    MODEL PROFITABILITY CALCULATIONS
                Customers   Amount per customer   Total
      Good      6373        1500                  9,559,500.00
      Bad       565         10000                 5,650,000.00
      Total     6938                              3,909,500.00
    EARNINGS PER CUSTOMER: 563.49
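The arithmetic behind this profitability table can be reproduced with a short Python sketch. It assumes, from the figures on the slide, that 1500 is the gain on each good customer and 10000 the loss on each bad customer, and that the total is the difference between the two; the function and parameter names are made up for the example.

```python
def earnings_per_customer(good, bad, gain_per_good=1500, loss_per_bad=10000):
    """Return (net earnings, net earnings per customer).

    gain_per_good and loss_per_bad are read off the slide's table;
    treating them as a gain and a loss is an assumption.
    """
    net = good * gain_per_good - bad * loss_per_bad
    return net, net / (good + bad)


net, per_customer = earnings_per_customer(good=6373, bad=565)
print(net, round(per_customer, 2))
```

With the decision tree counts from the slide this gives a net of 3,909,500 over 6,938 customers, or about 563.49 per customer.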
  • DECISION TREE OUTPUT
    VARIABLE IMPORTANCE
      OBS  NAME                LABEL                NRULES  IMPORTANCE  VIMPORTANCE  RATIO
      1    IMP_fico_rangehigh  Imp: fico_rangehigh  1       1           1            1
      2    IMP_annual_inc      Imp: annual_inc      1       0.2694      0.2016       0.7484
      3    IMP_purpose         Imp: purpose         1       0.193       0            0
      4    IMP_homeownership   Imp: home_ownership  1       0.1075      0.1139       1.0601
  • REGRESSION ANALYSIS
      Depth  Gain     Lift     Cumulative  % Response  Cumulative  Number of     Mean Posterior  PRODUCT
                               Lift                    % Response  Observations  Probability
      5      124.595  2.24595  2.24595     27.9671     27.9671     851           0.28826         245.3093
      10     106.193  1.87791  2.06193     23.3843     25.6757     851           0.2125          180.8375
      15     88.735   1.53819  1.88735     19.1539     23.5018     851           0.19381         164.9323
      20     79.298   1.50988  1.79298     18.8014     22.3267     851           0.17844         151.8524
      25     69.107   1.2834   1.69107     15.9812     21.0576     851           0.16588         141.1639
      30     57.751   1.00973  1.57751     12.5734     19.6436     851           0.15506         131.9561
      35     50.204   1.04871  1.50204     13.0588     18.7038     850           0.14529         123.4965
      40     45.111   1.09466  1.45111     13.631      18.0696     851           0.13596         115.702
      45     39.263   0.9248   1.39263     11.5159     17.3413     851           0.12699         108.0685
      50     33.734   0.83987  1.33734     10.4583     16.653      851           0.11909         101.3456
      55     31.099   1.04748  1.31099     13.0435     16.3248     851           0.11126         94.68226
      60     27.33    0.85874  1.2733      10.6933     15.8555     851           0.10387         88.39337
      65     23.85    0.821    1.2385      10.2233     15.4222     851           0.09661         82.21511
      70     19.867   0.68025  1.19867     8.4706      14.9261     850           0.08954         76.109
      75     17.286   0.81156  1.17286     10.1058     14.6047     851           0.0824          70.1224
      80     12.746   0.44667  1.12746     5.5621      14.0395     851           0.0746          63.4846
      85     9.481    0.5725   1.09481     7.1289      13.6329     851           0.06686         56.89786
      90     6.754    0.60395  1.06754     7.5206      13.2933     851           0.05864         49.90264
      95     3.42     0.43409  1.0342      5.4054      12.8781     851           0.04906         41.75006
      100    0        0.34957  1           4.3529      12.4523     850           0.03406         28.951
    PRODUCT is Number of Observations multiplied by Mean Posterior Probability.
    Totals shown on the slide: 11911 observations, 1101.121 product. These match the sums over depths 35 to 100.
    [Chart: CUMULATIVE LIFT]
    MODEL PROFITABILITY CALCULATIONS
                Customers   Amount per customer   Total
      Good      10809.88    1500                  16214819
      Bad       1101.121    10000                 11011210
      Total     11911                             5203609
    EARNINGS PER CUSTOMER: 436.8742
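The regression totals can be traced back to the table with a short Python sketch. The slide does not state a cutoff, but the totals of 11911 customers and 1101.121 expected bad loans equal the sums over depths 35 to 100, so the sketch assumes the riskiest 30% (depths 5 to 30) are rejected. The expected number of bad customers in each bin is taken as observations times mean posterior probability, as in the PRODUCT column; the 1500 gain and 10000 loss are the slide's figures, and all names are illustrative.

```python
# (depth %, number of observations, mean posterior probability) per bin,
# copied from the regression table above.
BINS = [
    (5, 851, 0.28826), (10, 851, 0.21250), (15, 851, 0.19381),
    (20, 851, 0.17844), (25, 851, 0.16588), (30, 851, 0.15506),
    (35, 850, 0.14529), (40, 851, 0.13596), (45, 851, 0.12699),
    (50, 851, 0.11909), (55, 851, 0.11126), (60, 851, 0.10387),
    (65, 851, 0.09661), (70, 850, 0.08954), (75, 851, 0.08240),
    (80, 851, 0.07460), (85, 851, 0.06686), (90, 851, 0.05864),
    (95, 851, 0.04906), (100, 850, 0.03406),
]


def accepted_summary(bins, reject_through_depth=30,
                     gain_per_good=1500, loss_per_bad=10000):
    """Summarise the customers kept after rejecting the riskiest bins.

    reject_through_depth=30 is an assumption inferred from the slide totals.
    """
    accepted = [(n, p) for depth, n, p in bins if depth > reject_through_depth]
    customers = sum(n for n, _ in accepted)
    expected_bad = sum(n * p for n, p in accepted)  # sum of the PRODUCT column
    expected_good = customers - expected_bad
    net = expected_good * gain_per_good - expected_bad * loss_per_bad
    return customers, expected_bad, net / customers


customers, expected_bad, per_customer = accepted_summary(BINS)
print(customers, round(expected_bad, 3), round(per_customer, 2))
```

Under that assumption the sketch recovers the slide's 11911 customers, roughly 1101.12 expected bad loans, and earnings of roughly 436.87 per customer.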
  • MODEL COMPARISON
    Event Classification Table
    Model selection based on Valid: Misclassification Rate (_VMISC_)
      Model Node  Model Description  Data Role  Target    False Negative  True Negative  False Positive  True Positive
      Tree2       Decision Tree      TRAIN      Bad_Flag  3176            22345          0               0
      Tree2       Decision Tree      VALIDATE   Bad_Flag  2119            14898          0               0
      Reg2        Regression         TRAIN      Bad_Flag  3176            22344          1               0
      Reg2        Regression         VALIDATE   Bad_Flag  2119            14898          .               0
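Since the comparison is based on misclassification rate, a small Python sketch shows how that rate follows from the four classification counts. It assumes the flattened table reads as 3176 false negatives and 22345 true negatives for the decision tree on training data, and 2119 and 14898 on validation data; the function name is made up for the example.

```python
def misclassification_rate(false_negative, true_negative,
                           false_positive, true_positive):
    """Share of cases whose predicted class differs from the actual class."""
    total = false_negative + true_negative + false_positive + true_positive
    return (false_negative + false_positive) / total


# Decision tree counts as read from the slide's table (an assumed reading).
tree_train = misclassification_rate(3176, 22345, 0, 0)
tree_valid = misclassification_rate(2119, 14898, 0, 0)
print(round(tree_train, 4), round(tree_valid, 4))
```

With no predicted positives at all, the validation rate of about 12.45% is simply the share of bad loans in the validation data, which agrees with the 12.4523 cumulative % response at depth 100 in the regression table.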
  • MODEL COMPARISON
  • REGRESSION VS DECISION TREE MODEL
                               DECISION TREE ANALYSIS   REGRESSION ANALYSIS
      Good customers           6373                     10809.88
      Amount per good          1500                     1500
      Good total               9,559,500.00             16214819
      Bad customers            565                      1101.121
      Amount per bad           10000                    10000
      Bad total                5,650,000.00             11011210
      Total customers          6938                     11911
      Total                    3,909,500.00             5203609
      EARNINGS PER CUSTOMER    563.49                   436.87
  • CONCLUSION
      • The decision tree model is the better indicator of creditworthiness.
      • The most significant factor to consider is the credit score.
      • The regression analysis shows higher total earnings (5,203,609 vs. 3,909,500).
      • The decision tree analysis shows higher earnings per customer (563.49 vs. 436.87).