Credit Defaulter Analysis
By:
Nimai Chand Das Adhikari
Guided By:
Mr. Aashish Soni (Data Scientist, IL and FS)
Introduction
Objective: Predicting whether a borrower will default in the future is a
challenging task for credit card companies. The main objective is therefore to
develop models that predict future borrower default by taking advantage of
available technology.
Terms used in this project:
• Credit Card: A credit card is a type of financial account. With a credit card, customers use a bank’s money instead
of their own to pay for a product or service today and repay the bank over time. For the benefit of using someone
else’s money, customers often need to pay interest, as with other types of loans.
• Default: When payments are not made on time and according to the agreement signed by the cardholder, the account is
said to be in default.
• Delinquency: A borrower is delinquent when a credit payment is late or overdue. If the borrower is in default, the
bank that issued the card is entitled to charge a higher interest rate for that billing period as a penalty.
• Credit Score: A credit score is a numerical expression based on an analysis of a person’s credit files. Put simply, it
represents the creditworthiness of a borrower.
• Utilization: Utilization is the percentage of a consumer’s available credit that the borrower has used. The credit
utilization ratio is a key component of a borrower’s credit score.
Scenarios…
• Is it beneficial to lend to someone who is at
risk of not repaying the money?
• Is it “HUMANITY” to lend money to
someone who is in deep trouble?
How can these borrowers be separated
from the pool of beneficial parties?
Why model, and why invest money in
software?
• Insights from the data that are very difficult to obtain from the dataset
without software
• Predictions of defaulters backed by a measured accuracy
About The Product
• Name: Def_Catch
• Brief: Trained on a credit dataset
with 100,000 training examples
and 11 attributes.
(Diagram labels: Data, Training, Algorithm/Model, New Entry, Prediction)
Insight of Def_Catch
• Reading the data
• Pre-processing the data
• Missing-value treatment: all missing values are assumed to be zero
• Other methods include: mean, median, most frequent, deleting the rows
• Duplicate-data treatment: using the DataFrame attribute
• Data scaling
• Imbalance treatment: SMOTE is used here
• Predicting the output
• ASSUMPTIONS:
• No records are deleted.
• Default hyperparameters are used for each algorithm.
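The pre-processing steps listed above can be sketched in a few lines of dependency-free Python. This is only an illustration under stated assumptions, not the Def_Catch code: the helper names, the toy rows, and the choice to drop exact duplicates are all hypothetical, and `smote_like` is a simplified stand-in for SMOTE (interpolating between a minority row and one of its nearest minority neighbours) rather than a library implementation.

```python
import math
import random

def fill_missing_with_zero(rows):
    """Replace missing values (None or NaN) with 0.0, as the slide assumes."""
    def fix(v):
        missing = v is None or (isinstance(v, float) and math.isnan(v))
        return 0.0 if missing else float(v)
    return [[fix(v) for v in row] for row in rows]

def drop_duplicates(rows):
    """Keep the first occurrence of each identical row, preserving order."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def standardize(rows):
    """Scale each column to zero mean and unit variance (constant columns become 0)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, means)]
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
            for row in rows]

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority rows by interpolating between a random
    minority row and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [r for r in minority if r is not base]
        neighbours = sorted(others, key=lambda r: math.dist(base, r))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

# Toy rows (hypothetical): utilization, age, monthly income, label (1 = defaulter)
raw = [
    [0.10, 60, 9000.0, 0],
    [0.10, 60, 9000.0, 0],   # exact duplicate of the row above
    [0.20, 55, None, 0],     # missing income, treated as zero
    [0.30, 45, 7000.0, 0],
    [0.15, 50, 8000.0, 0],
    [0.90, 25, 3000.0, 1],
    [0.80, 30, 2500.0, 1],
    [0.70, 28, 2000.0, 1],
]
clean = drop_duplicates(fill_missing_with_zero(raw))
X = standardize([r[:-1] for r in clean])
y = [int(r[-1]) for r in clean]
minority = [x for x, label in zip(X, y) if label == 1]
X_new = smote_like(minority, n_new=1)  # one synthetic row balances 4 vs 3
```

In practice the same steps are usually done with a data-frame library and a dedicated oversampling package; the sketch only makes the order of operations concrete.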
Brief Description of the Features:
Categorical
• NumberOfTime30-59DaysPastDueNotWorse
• NumberOfOpenCreditLinesAndLoans
• NumberOfTimes90DaysLate
• NumberRealEstateLoansOrLines
• NumberOfTime60-89DaysPastDueNotWorse
• NumberOfDependents
Continuous
• RevolvingUtilizationOfUnsecuredLines
• Age
• DebtRatio
• MonthlyIncome
Output: SeriousDlqin2yrs (Y/N, i.e. defaulter or not)
Feature_Name                            count    mean       std        min  max
SeriousDlqin2yrs                        100000   0.0666     0.2494     0    1
RevolvingUtilizationOfUnsecuredLines    100000   6.011181   261.604    0    50708
age                                     100000   52.25465   14.75762   0    107
NumberOfTime30-59DaysPastDueNotWorse    100000   0.4234     4.222      0    98
DebtRatio                               100000   355.8093   2116.25    0    329664
MonthlyIncome                           100000   6645.647   12874.63   0    3008750
NumberOfOpenCreditLinesAndLoans         100000   8.44       5.13       0    58
NumberOfTimes90DaysLate                 100000   0.268      4.198      0    98
NumberRealEstateLoansOrLines            100000   1.020419   1.131665   0    54
NumberOfTime60-89DaysPastDueNotWorse    100000   0.24232    4.18529    0    98
NumberOfDependents                      100000   0.75       1.117      0    20
Predicted Variable: SeriousDlqin2yrs
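Because SeriousDlqin2yrs is a 0/1 variable, its mean in the summary statistics is simply the share of defaulters, which is what motivates the imbalance treatment. A short worked calculation, using only the count and mean reported above (the variable names are illustrative):

```python
n_rows = 100_000        # count column from the summary statistics
default_rate = 0.0666   # mean of SeriousDlqin2yrs

# Mean of a 0/1 variable = fraction of rows equal to 1
n_defaulters = round(n_rows * default_rate)
n_non_defaulters = n_rows - n_defaulters
imbalance_ratio = n_non_defaulters / n_defaulters

# Standard deviation of a 0/1 variable is sqrt(p * (1 - p))
implied_std = (default_rate * (1 - default_rate)) ** 0.5

print(n_defaulters, n_non_defaulters, round(imbalance_ratio, 1), round(implied_std, 4))
```

Roughly one row in fifteen is a defaulter, and the implied standard deviation agrees with the reported 0.2494 to about three decimal places.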
Feature Importance
feature 0: RevolvingUtilizationOfUnsecuredLines
feature 1: age
feature 2: NumberOfTime30-59DaysPastDueNotWorse
feature 3: DebtRatio
feature 4: MonthlyIncome
feature 5: NumberOfOpenCreditLinesAndLoans
feature 6: NumberOfTimes90DaysLate
feature 7: NumberRealEstateLoansOrLines
feature 8: NumberOfTime60-89DaysPastDueNotWorse
feature 9: NumberOfDependents
Predicting the Output
The following algorithms are used for prediction and compared on accuracy:
• Logistic Regression
• Naïve Bayes
• Multi-Layer Perceptron
• Decision Trees
• Random Forest
Accuracy Comparison (bar chart, accuracy in %)
• Logistic Regression: 80.32
• Naïve Bayes: 67.223
• Decision Trees: 88.91
• Random Forest: 93.143
• MLP: 76.086
CONFUSION MATRIX
Model                 Actual Class   Predicted 0   Predicted 1
Naïve Bayes           0              19013         9039
                      1              794           1154
Logistic Regression   0              22898         5154
                      1              750           1198
Decision Trees        0              26155         1897
                      1              1397          551
Random Forest         0              27520         532
                      1              1532          416
MLP                   0              22554         5498
                      1              1676          272
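Accuracy, precision and recall can be read directly off each confusion matrix. The sketch below assumes class 1 means "defaulter" and that each matrix is laid out as (actual 0 / predicted 0, actual 0 / predicted 1, actual 1 / predicted 0, actual 1 / predicted 1); the function and variable names are illustrative.

```python
# (TN, FP, FN, TP) per model, taken from the confusion matrix slide
matrices = {
    "Naive Bayes": (19013, 9039, 794, 1154),
    "Logistic Regression": (22898, 5154, 750, 1198),
    "Decision Trees": (26155, 1897, 1397, 551),
    "Random Forest": (27520, 532, 1532, 416),
    "MLP": (22554, 5498, 1676, 272),
}

def metrics(tn, fp, fn, tp):
    """Return (accuracy, precision, recall) for the positive class."""
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

results = {name: metrics(*cm) for name, cm in matrices.items()}
for name, (acc, prec, rec) in results.items():
    print(f"{name:20s} accuracy={acc:.2%} precision={prec:.2%} recall={rec:.2%}")
```

Each matrix sums to 30,000 rows, of which 1,948 are actual defaulters. The Naïve Bayes and Logistic Regression matrices reproduce the charted accuracies of 67.223% and 80.32%. With so few defaulters in the evaluation rows, recall on class 1 is worth reading alongside accuracy.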
Accuracy Comparison:
• Ensemble model (Modeling Consumer Loan Default Prediction Using Ensemble
Neural Networks, IEEE): 92% accuracy with the Levenberg-Marquardt (LM)
algorithm
• Def_Catch: 93.143% accuracy (Random Forest)
References:
• https://www.ecb.europa.eu/pub/pdf/scpwps/ecbwp968.pdf?6de72016c144b4ec28db66bb7c5da541
• Amira Kamil Ibrahim Hassan, Ajith Abraham, Modeling Consumer Loan Default
Prediction Using Ensemble Neural Networks
• https://en.wikipedia.org/wiki/Credit_risk
• http://credfinrisk.com/basics.html
• http://smartdrill.com/pdf/Credit%20Risk%20Analysis.pdf
• René M. Stulz, Credit Default Swaps and the Credit Crisis, National Bureau
of Economic Research
THANK YOU!!

Editor's Notes

  • #10 Feature definitions:
    • SeriousDlqin2yrs (output): Person experienced 90 days past due delinquency or worse.
    • RevolvingUtilizationOfUnsecuredLines: Total balance on credit cards and personal lines of credit (excluding real estate and installment debt such as car loans) divided by the sum of credit limits.
    • NumberOfTime30-59DaysPastDueNotWorse: Number of times the borrower has been 30-59 days past due but no worse in the last 2 years.
    • DebtRatio: Monthly debt payments, alimony and living costs divided by monthly gross income.
    • MonthlyIncome: Monthly income.
    • NumberOfOpenCreditLinesAndLoans: Number of open loans (installment loans such as a car loan or mortgage) and lines of credit (e.g. credit cards).
    • NumberOfTimes90DaysLate: Number of times the borrower has been 90 days or more past due.
    • NumberRealEstateLoansOrLines: Number of mortgage and real estate loans, including home equity lines of credit.
    • NumberOfTime60-89DaysPastDueNotWorse: Number of times the borrower has been 60-89 days past due but no worse in the last 2 years.
    • NumberOfDependents: Number of dependents in the family, excluding the borrower (spouse, children, etc.).