Loan Risk Assessment & Scoring Model
January 2017
PKC
Loan Risk Assessment & Scoring Model
Conclusion
• Probability scores can be assigned to each client to predict loan defaults:
$$P(\text{default}) = \frac{e^{z}}{1 + e^{z}}, \qquad e \approx 2.718$$
$$z = -1.56 + 0.02\,\text{Age} - 0.0083\,\text{Average Salary} - 0.0018\,\text{Total Assets} - 0.00001\,\text{Total Loan} - 4.33\,\text{Assets to Loan Ratio} + 0.75\,\text{Male} - 3.3\,\text{Private} + 2.1\,\text{EU\&CA} + 4.1\,\text{North America} + 1.8\,\text{Sub-Saharan Africa}$$
Objective
• To develop a statistical prediction model based on historical data of loan defaulters (Bad Loans) and non-defaulters (Good Loans).
• To calculate the probability of default for each client.
Data Source
• Banking data of all clients with a loan balance in November 2015.
Data Limitations
• Payment history, credit utilization, credit history, credit use and client assets held with other banks are not available, although they would be useful for a loan scoring model.
• The default percentage in the available data differs from the actual default percentage, which suggests the data is incomplete.
Approach
• To predict whether borrowers are likely to default, clients are classified into two groups: Bad Loans and Good Loans.
• A statistical model for this binary variable was then built with logistic regression, using all available demographic and banking variables as attributes.
Key Findings & Statistical Measures
• The data suggests that some countries carry a higher weighting in the bank's current active loan-clearance model, yet clients from these countries had very high default rates in November 2015. One possible explanation is that these clients obtain loans more easily than clients from other countries and later default without any effect on their credit rating in their home country.
Some examples of default rates by country:
• United States: 50% (11 defaults of 22 loan clients)
• United Kingdom: 36% (5 defaults of 14 loan clients)
• Somalia: 43% (3 defaults of 7 loan clients)
• Romania: 33% (2 defaults of 6 loan clients)
• Maldives: 29% (4 defaults of 14 loan clients)
• Canada: 19% (3 defaults of 16 loan clients)
• Lebanon: 11% (24 defaults of 213 loan clients)
• Germany: 50% (1 default of 2 loan clients)
However, the overall default rate in the available data is only 2.05% by client count and 2.53% by loan amount. Several of the country rates above rest on very few clients (Germany's 50% is based on two), so they are indicative rather than conclusive.
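The country comparison above is simple arithmetic on counts. A minimal sketch, assuming the per-country counts listed on this slide and the portfolio totals of 351 defaults among 17,096 clients that appear in the cut-off tables; the function names are illustrative only:

```python
# Defaults and loan clients per country, as listed on this slide (November 2015).
COUNTRY_COUNTS = {
    "United States": (11, 22),
    "United Kingdom": (5, 14),
    "Somalia": (3, 7),
    "Romania": (2, 6),
    "Maldives": (4, 14),
    "Canada": (3, 16),
    "Lebanon": (24, 213),
    "Germany": (1, 2),
}

# Portfolio totals: 351 defaults among 17,096 loan clients.
PORTFOLIO_DEFAULTS = 351
PORTFOLIO_CLIENTS = 17096


def default_rate(defaults: int, clients: int) -> float:
    """Fraction of loan clients who defaulted."""
    if clients <= 0:
        raise ValueError("clients must be positive")
    return defaults / clients


def rate_vs_portfolio(defaults: int, clients: int) -> float:
    """How many times the country's default rate exceeds the portfolio rate."""
    return default_rate(defaults, clients) / default_rate(PORTFOLIO_DEFAULTS, PORTFOLIO_CLIENTS)


if __name__ == "__main__":
    print(f"Portfolio default rate: {default_rate(PORTFOLIO_DEFAULTS, PORTFOLIO_CLIENTS):.2%}")
    for country, (d, n) in COUNTRY_COUNTS.items():
        # Print n alongside the rate so small samples are visible.
        print(f"{country:15s} {default_rate(d, n):6.1%}  n={n:4d}  "
              f"{rate_vs_portfolio(d, n):5.1f}x portfolio")
```

Printing the client count next to each rate keeps the sample size in view: a rate from two clients moves by 50 percentage points with a single default.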
• Monthly salary is one of the most important factors for predicting loan defaults. A risk indicator can be generated for clients whose salary is not credited every month, because the loan scoring model assigns these customers a high probability of default.
• One interesting finding is that government-sector employees are more likely to default than private and semi-government loan clients.
• Assets (demand deposits and time deposits) available in the data are one of the key variables for predicting the likelihood of default: as the value of a client's assets increases, the probability of default decreases significantly.
Employment Type    Total Loan Clients   Defaults   Default Percentage
Government         4291                 127        2.96%
Private            12158                78         0.64%
Semi-Government    65                   0          0.00%

                        Non-Kuwait   Kuwait
Defaults                198          153
Salary Missing          198          142
Salary Missing (%)      100%         92.81%
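The salary-based risk indicator suggested above could be derived from each client's salary-credit history. A minimal sketch, assuming a hypothetical list of monthly salary amounts per client (oldest to newest, 0.0 when nothing was credited) and an illustrative 3-month look-back window; neither the data layout nor the window comes from the deck:

```python
def salary_risk_flag(monthly_salary_credits: list, months_to_check: int = 3) -> int:
    """Return 1 if any of the most recent `months_to_check` months lacks a salary credit.

    `monthly_salary_credits` is a hypothetical per-client history ordered oldest to
    newest, holding the salary amount credited each month (0.0 when nothing arrived).
    The default 3-month window is an illustrative assumption.
    """
    if months_to_check <= 0:
        raise ValueError("months_to_check must be positive")
    recent = monthly_salary_credits[-months_to_check:]
    if len(recent) < months_to_check:
        return 1  # too little history: treat as "salary not credited"
    return int(any(amount <= 0 for amount in recent))


if __name__ == "__main__":
    print(salary_risk_flag([850.0, 850.0, 850.0]))  # regular salary
    print(salary_risk_flag([850.0, 0.0, 850.0]))    # one month without a credit
```

Treating a short history as missing errs on the side of caution, which fits the finding that clients without salary data dominate the defaults.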
Conclusion & Future Prospects
• The loan scoring model built on the available data shows a statistically meaningful ability to predict loan defaults and provides a significant risk-assessment measure for differentiating between good and bad loans.
• Enriching the data with payment history, credit utilization, credit history, credit use and client assets held with other banks would further improve the loan scoring model.
• We could also build a loan scoring model for prospective customers and target those with a very low probability of default, thereby reducing default risk and maximizing the bank's profit.
• With all demographic, credit and asset data available, we could build individual models by geography, loan amount and number of clients.
Example:
i. Home Client Loan Scoring Model (clients of Kuwait)
ii. Indian Client Loan Scoring Model (Indian loan clients are the largest group by count)
iii. Out-of-Home Client Loan Scoring Model (for other countries and geographies)
Annexure: Conceptualizing the Modeling Steps of Logistic Regression
Conceptualization: Modeling steps of Logistic Regression
A. Data Access & Management
B. Target Variable Creation
C. Variable Transformation
D. Dummy Variable Creation
E. Data Partition into Training & Validation
F. Model Calibration
G. Lift Chart Comparison
H. Model Creation on Entire Data
I. Decide Probability Cut-off
J. Model Validation
A. Data Access and Management in SAS
• Import and merge (join) the available banking loan data for November 2015 with client-level data, client salary data and employment data.
B. Variable Creation (Target)
• Creation of the target variable: clients for whom a Legal Loan is available are considered Bad Loans and assigned the value '1'; clients for whom no Legal Loan is available are considered Good Loans and assigned the value '0'.
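The merge and target-creation steps can be sketched outside SAS as a left join onto the loan records. This is a minimal illustration, not the author's SAS code: the record layouts, field names (`client_id`, `legal_loan`, and so on) and sample values are all hypothetical:

```python
# Hypothetical extracts standing in for the four SAS tables; all field names are assumptions.
LOANS_NOV_2015 = [
    {"client_id": "C001", "total_loan": 12000.0, "legal_loan": False},
    {"client_id": "C002", "total_loan": 8000.0, "legal_loan": True},
    {"client_id": "C003", "total_loan": 5000.0, "legal_loan": False},
]
CLIENTS = {
    "C001": {"gender": "Male", "country": "Kuwait"},
    "C002": {"gender": "Female", "country": "India"},
    "C003": {"gender": "Male", "country": "Canada"},
}
SALARIES = {"C001": 1500.0, "C003": 2200.0}             # C002 has no salary record
EMPLOYMENT = {"C001": "Government", "C002": "Private"}  # C003 has no employment record


def build_modeling_table(loans, clients, salaries, employment):
    """Left-join client, salary and employment data onto the loan records and add the target."""
    rows = []
    for loan in loans:
        cid = loan["client_id"]
        row = dict(loan)
        row.update(clients.get(cid, {}))
        row["average_salary"] = salaries.get(cid)       # None when salary is missing
        row["employment_type"] = employment.get(cid)    # None when employment is missing
        row["target"] = 1 if loan["legal_loan"] else 0  # 1 = Bad Loan, 0 = Good Loan
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in build_modeling_table(LOANS_NOV_2015, CLIENTS, SALARIES, EMPLOYMENT):
        print(row)
```

A left join keeps every loan client even when salary or employment data is missing, which matters here because missing salary is itself a strong default signal.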
C. Variable Transformation: the variables Country Name and Profession are grouped into fewer categories.
• Country names into country categories by region: East Asia & Pacific, Europe & Central Asia, Latin America & Caribbean, Middle East & North Africa, North America, South Asia, Sub-Saharan Africa and Others.
• Profession into professional categories:
  • Blue-Collar (e.g. clerks, technicians, drivers and others)
  • High-risk (e.g. police officers, army personnel and others)
  • White-Collar (e.g. doctors, economists, auditors and others)
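The grouping step amounts to lookup tables with a fallback category. A minimal sketch, assuming partial, illustrative mappings: the country-to-region entries follow the regional categories listed above, but the deck does not give the full lists, and the "Unmapped" label for unlisted professions is a hypothetical placeholder:

```python
# Illustrative and partial mappings; the deck names the categories but not every member.
COUNTRY_TO_REGION = {
    "United States": "North America",
    "Canada": "North America",
    "United Kingdom": "Europe & Central Asia",
    "Germany": "Europe & Central Asia",
    "Romania": "Europe & Central Asia",
    "Kuwait": "Middle East & North Africa",
    "Lebanon": "Middle East & North Africa",
    "India": "South Asia",
    "Maldives": "South Asia",
    "Somalia": "Sub-Saharan Africa",
}

PROFESSION_TO_CATEGORY = {
    "Clerk": "Blue-Collar",
    "Technician": "Blue-Collar",
    "Driver": "Blue-Collar",
    "Policeman": "High-risk",
    "Army": "High-risk",
    "Doctor": "White-Collar",
    "Economist": "White-Collar",
    "Auditor": "White-Collar",
}


def country_category(country):
    """Region for a country name; anything not listed falls into 'Others'."""
    return COUNTRY_TO_REGION.get(country, "Others")


def profession_category(profession):
    """Professional category; 'Unmapped' (hypothetical label) flags values needing review."""
    return PROFESSION_TO_CATEGORY.get(profession, "Unmapped")


if __name__ == "__main__":
    print(country_category("Canada"), profession_category("Auditor"))
```

Exact string matching is assumed; real data would likely need case and spelling normalization before the lookup.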
D. Conversion of categorical character variables into numeric dummy variables for Gender, Professional Category, Employment Type and Country Classification
Example (Employment Type):
Employment Type    Government   Private   Semi-Government
Government         1            0         0
Private            0            1         0
Semi-Government    0            0         1
Missing            0            0         0
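The encoding in the table can be reproduced with one indicator per level. In this minimal sketch a missing or unrecognized value maps to all zeros, matching the "Missing" row. The function name is illustrative:

```python
EMPLOYMENT_LEVELS = ("Government", "Private", "Semi-Government")


def dummy_encode(value, levels=EMPLOYMENT_LEVELS):
    """One 0/1 indicator per level; a missing or unrecognised value gives all zeros."""
    return {level: int(value == level) for level in levels}


if __name__ == "__main__":
    for value in ("Government", "Private", "Semi-Government", None):
        print(value, dummy_encode(value))
```

The final model keeps only the Private dummy, because Government and Semi-Government are dropped in the collinearity step. Clients who are government, semi-government or missing therefore share the baseline.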
E. Data partition: the entire available client data set of 17,096 clients is divided into two parts.
• Training Data-Set (10,258): 60% of all observations.
• Validation Data-Set (6,838): 40% of all observations.
• A uniform random distribution was used to randomly assign observations to each data set.
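One way to produce an exact 60/40 split with uniform random numbers is to give each record a uniform draw, sort by it, and take the first 60%. This is a minimal sketch; the exact SAS procedure is not shown in the deck, and the seed is arbitrary. A per-record rule such as "uniform < 0.6" would give only approximately 60%, whereas this version reproduces the 10,258 / 6,838 counts for 17,096 records:

```python
import random


def partition(records, train_share=0.6, seed=2017):
    """Split records into training and validation sets.

    Each record gets a uniform random number; records are ordered by that number and the
    first round(train_share * n) go to training. The seed value is an arbitrary choice.
    """
    rng = random.Random(seed)
    keys = [rng.random() for _ in records]
    order = sorted(range(len(records)), key=keys.__getitem__)
    n_train = round(train_share * len(records))
    train = [records[i] for i in order[:n_train]]
    valid = [records[i] for i in order[n_train:]]
    return train, valid


if __name__ == "__main__":
    train, valid = partition(list(range(17096)))
    print(len(train), len(valid))
```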
Figure: Full Data split into Training Data (60% of population) and Validation Data (40% of population).
F. Model Calibration:
i. Logistic regression is fitted on the Training Data-Set to estimate a coefficient for each independent variable along with the intercept.
ii. The binary logistic model estimates the probability of a binary response (1 = Bad Loan, 0 = Good Loan) from predictor (independent) variables used as attributes or features.
iii. Multicollinearity among independent variables is checked with the Variance Inflation Factor (VIF); variables with a VIF higher than 5 are removed from the model:
$$\text{VIF}_i = \frac{1}{1 - R_i^2}$$
where $R_i^2$ is the coefficient of determination from regressing variable $i$ on all other independent variables.
iv. Highly collinear continuous or dummy variables such as Female, Government, Semi-Government, South Asia, Middle East & North Africa, East Asia & Pacific, Loan Available (Salary*15 or 15000 - Existing Loan) and all Age-Groups are removed from the model to eliminate multicollinearity.
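The VIF check can be sketched without any statistics library. It uses the standard identity that VIF_i equals the i-th diagonal element of the inverse of the predictors' correlation matrix. This is a minimal, standard-library illustration, not the SAS procedure used in the deck. It assumes no constant columns, and the function names are illustrative:

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    centred = [[x - m for x in col] for col, m in zip(columns, means)]
    norms = [math.sqrt(sum(x * x for x in col)) for col in centred]
    if any(norm == 0.0 for norm in norms):
        raise ValueError("constant column: correlation undefined")
    k = len(columns)
    return [[sum(a * b for a, b in zip(centred[i], centred[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("perfect collinearity: correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def variance_inflation_factors(columns):
    """VIF_i = 1 / (1 - R_i^2), read off the diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[i][i] for i in range(len(columns))]


def columns_to_drop(names, columns, threshold=5.0):
    """Names whose VIF exceeds the threshold (5, as on this slide)."""
    vifs = variance_inflation_factors(columns)
    return [name for name, v in zip(names, vifs) if v > threshold]
```

This removes every variable above the threshold in one pass, as the slide describes. In practice, analysts often drop the single worst variable and recompute, because removing one collinear variable can bring the others below 5.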
v. The model is built with the stepwise method, using a significance level of 0.05 (95% confidence level) both to enter and to stay in the model. Below are the estimates of all significant variables along with the intercept.
$$\operatorname{logit}(\text{Target}) = \ln\frac{P(\text{Target}=1 \mid X_1,\dots,X_n)}{P(\text{Target}=0 \mid X_1,\dots,X_n)} = b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n$$
where $b_0, b_1, \dots, b_n$ are the estimates (betas).
Parameter                  Estimate    Pr > ChiSq
Intercept                  -1.46170    0.0004
Age                         0.01870    0.0271
Average Salary             -0.00829    <.0001
Total Loan                 -0.00001    0.0038
Assets to Loan Ratio       -7.46580    <.0001
Male                        0.72240    0.0013
Private                    -3.41490    <.0001
Europe and Central Asia     1.90480    0.0113
North America               3.57640    <.0001
$$P(\text{default} = 1) = \frac{e^{z}}{1 + e^{z}}$$
where $e \approx 2.71828$ and $z = b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n$.
vi. The parameter estimates are used to calculate the probability of default for each loan client in the Training Data-Set (60%):
$$P(\text{default} = 1) = \frac{e^{z}}{1 + e^{z}}$$
$$z = -1.46 + 0.019\,\text{Age} - 0.0083\,\text{Average Salary} - 0.00001\,\text{Total Loan} - 7.47\,\text{Assets to Loan Ratio} + 0.72\,\text{Male} - 3.4\,\text{Private} + 1.9\,\text{EU\&CA} + 3.6\,\text{North America}$$
vii. The same parameter estimates are also used to calculate the probability of default for each loan client in the Validation Data-Set (40%).
Below are the reference files of Training and Validation data sets with all calculations.
G. Model Validation: Comparing Training & Validation with Lift Chart:
Figure: Lift Chart. x-axis: groups 1 to 11; y-axis: cumulative percent; series: Cumulative Percent (Training), Cumulative Percent (Validation), Cumulative (Without Model).
The lift chart suggests that the training and validation datasets are closely aligned under the same parameter estimates, which supports the validity of the model.
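A lift (cumulative gains) chart like this one can be computed by ranking clients by predicted probability, splitting them into equal groups, and tracking the cumulative share of all defaults captured. A minimal sketch; the use of 10 equal-sized groups is an assumption, since the chart's exact grouping is not stated:

```python
def cumulative_gains(probabilities, actuals, groups=10):
    """Cumulative share of all defaults captured in the top 1..groups groups by predicted PD."""
    total_defaults = sum(actuals)
    if total_defaults == 0:
        raise ValueError("no defaults in the data")
    # Highest predicted probability of default first.
    ranked = sorted(zip(probabilities, actuals), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    gains, captured = [], 0
    for g in range(1, groups + 1):
        start, end = round((g - 1) * n / groups), round(g * n / groups)
        captured += sum(actual for _, actual in ranked[start:end])
        gains.append(captured / total_defaults)
    return gains


def baseline_gains(groups=10):
    """Without a model, the top g groups capture g/groups of the defaults on average."""
    return [g / groups for g in range(1, groups + 1)]
```

Running `cumulative_gains` separately on the training and validation scores and plotting both against `baseline_gains()` gives the three curves in the chart. Closely overlapping model curves indicate that the estimates generalize.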
Association of Predicted Probabilities and Observed
Responses
Percent Concordant 95.6 Somers' D 0.930
Percent Discordant 2.6 Gamma 0.946
Percent Tied 1.7 Tau-a 0.038
Pairs 2166770 c 0.965
A Percent Concordant of 95.6% means that, in 95.6% of all pairs of one defaulter and one non-defaulter, the defaulter received the higher predicted probability. Somers' D (0.930) and Gamma (0.946) indicate strong predictive power, and the area under the ROC curve, c = 0.965, is close to 1.0.
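The association measures in the table come from comparing every (defaulter, non-defaulter) pair. This minimal sketch follows the standard definitions. Statistical packages may round or bin predicted probabilities before counting ties, so results can differ slightly from SAS output:

```python
def association_stats(probabilities, actuals):
    """Concordance measures over all (default, non-default) pairs of clients.

    A pair is concordant when the defaulter has the higher predicted probability,
    discordant when it has the lower one, and tied when both are equal.
    """
    defaulters = [p for p, y in zip(probabilities, actuals) if y == 1]
    non_defaulters = [p for p, y in zip(probabilities, actuals) if y == 0]
    if not defaulters or not non_defaulters:
        raise ValueError("need at least one defaulter and one non-defaulter")
    concordant = discordant = tied = 0
    for p_bad in defaulters:
        for p_good in non_defaulters:
            if p_bad > p_good:
                concordant += 1
            elif p_bad < p_good:
                discordant += 1
            else:
                tied += 1
    pairs = concordant + discordant + tied
    n = len(actuals)
    untied = concordant + discordant
    return {
        "percent_concordant": 100.0 * concordant / pairs,
        "percent_discordant": 100.0 * discordant / pairs,
        "percent_tied": 100.0 * tied / pairs,
        "pairs": pairs,
        "somers_d": (concordant - discordant) / pairs,
        "gamma": (concordant - discordant) / untied if untied else float("nan"),
        "tau_a": (concordant - discordant) / (n * (n - 1) / 2),
        "c": (concordant + 0.5 * tied) / pairs,
    }
```

The reported figures are consistent with these definitions: Somers' D = 0.956 - 0.026 = 0.930, and c = 0.956 + 0.5 × 0.017 ≈ 0.965. The double loop is quadratic in the number of pairs. It handles roughly two million pairs, but a rank-based method scales better.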
H. Model creation on the entire data set (17,096): the parameter estimates are re-estimated on the entire data set by following all the previous steps. Below are the estimates of all significant variables along with the intercept.
$$z = -1.56 + 0.02\,\text{Age} - 0.0083\,\text{Average Salary} - 0.0018\,\text{Total Assets} - 0.00001\,\text{Total Loans} - 4.33\,\text{Assets to Loan Ratio} + 0.75\,\text{Male} - 3.3\,\text{Private} + 2.1\,\text{EU\&CA} + 4.1\,\text{North America} + 1.8\,\text{Sub-Saharan Africa}$$
Parameter                  Estimate    Pr > ChiSq
Intercept                  -1.5571     <.0001
Age                         0.0201     0.0028
Average Salary             -0.00831    <.0001
Total Assets               -0.0018     0.0009
Total Loans                -0.00001    0.0001
Assets to Loan Ratio       -4.3266     0.0155
Male                        0.7518     <.0001
Private                    -3.2798     <.0001
Europe and Central Asia     2.0924     0.0009
North America               4.0713     <.0001
Sub-Saharan Africa          1.7926     0.0114
$$P(\text{default} = 1) = \frac{e^{z}}{1 + e^{z}}$$
where $e \approx 2.71828$ and $z = b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n$.
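Scoring a client with the full-data estimates is a direct application of the two formulas above. This minimal sketch uses the estimates from the table. The input units (for example, the currency and scale of salary, assets and loans) follow the bank's data and are not stated in the deck. Dummy inputs are 0/1, and the feature names are illustrative:

```python
import math

# Full-data estimates from the table above.
ESTIMATES = {
    "age": 0.0201,
    "average_salary": -0.00831,
    "total_assets": -0.0018,
    "total_loans": -0.00001,
    "assets_to_loan_ratio": -4.3266,
    "male": 0.7518,
    "private": -3.2798,
    "europe_central_asia": 2.0924,
    "north_america": 4.0713,
    "sub_saharan_africa": 1.7926,
}
INTERCEPT = -1.5571


def linear_predictor(client):
    """z = b0 + b1*X1 + ... + bn*Xn; every feature must be supplied."""
    missing = set(ESTIMATES) - set(client)
    if missing:
        raise KeyError(f"missing features: {sorted(missing)}")
    return INTERCEPT + sum(beta * client[name] for name, beta in ESTIMATES.items())


def probability_of_default(client):
    """e^z / (1 + e^z), computed in a form that avoids overflow for large |z|."""
    z = linear_predictor(client)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

Requiring every feature explicitly avoids silently scoring a client with missing data as if the value were zero.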
I. Probability cut-off to accept or reject a loan application: decide the probability level above which loan applications are rejected and below which they are accepted.
Cut-off P > 0.40
            Predicted 0   Predicted 1   Total    % Correct
Actual 0    16594         151           16745    99.1%
Actual 1    184           167           351      47.6%
Total       16778         318           17096

Cut-off P > 0.30
            Predicted 0   Predicted 1   Total    % Correct
Actual 0    16458         287           16745    98.3%
Actual 1    96            255           351      72.6%
Total       16554         542           17096

Cut-off P > 0.25
            Predicted 0   Predicted 1   Total    % Correct
Actual 0    16412         333           16745    98.0%
Actual 1    84            267           351      76.1%
Total       16496         600           17096
These tables show the frequency distribution of correct and incorrect predictions at probability cut-offs of 0.40, 0.30 and 0.25.
The best probability cut-off should be chosen to minimize risk while maximizing profit.
If the goal is to acquire more customers, the 0.30 cut-off is appropriate (542 applications predicted as bad, 72.6% of bad loans identified); otherwise, the 0.25 cut-off (600 predicted as bad, 76.1% of bad loans identified) is better for reducing risk.
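The tables above can be regenerated for any cut-off from scored data. This minimal sketch follows the deck's rule that a loan is predicted bad (rejected) when p > cut-off, with the "% Correct" column computed per actual class. The function names are illustrative:

```python
def confusion_at_cutoff(probabilities, actuals, cutoff):
    """Counts keyed by (actual, predicted); predicted = 1 (reject) when p > cutoff."""
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for p, y in zip(probabilities, actuals):
        counts[(y, 1 if p > cutoff else 0)] += 1
    return counts


def percent_correct(counts):
    """Percent of actual good loans (0) and actual bad loans (1) classified correctly."""
    good_total = counts[(0, 0)] + counts[(0, 1)]
    bad_total = counts[(1, 0)] + counts[(1, 1)]
    return 100.0 * counts[(0, 0)] / good_total, 100.0 * counts[(1, 1)] / bad_total


if __name__ == "__main__":
    # Counts from the P > 0.30 table above.
    good, bad = percent_correct({(0, 0): 16458, (0, 1): 287, (1, 0): 96, (1, 1): 255})
    print(f"{good:.1f}% of good loans and {bad:.1f}% of bad loans correctly classified")
```

Looping `confusion_at_cutoff` over candidate cut-offs, then attaching a profit per accepted good loan and a loss per accepted bad loan, is one way to make the risk and profit trade-off explicit.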
J. Model Validation: ROC Curve
The model's ability to discriminate between good and bad loans is measured by the area under the ROC curve (AUC). An area of 1 represents a perfect test and 0.5 a random one. For our model, the area under the ROC curve at the last step is 0.96, which would generally be considered "very good" at separating good loans from bad loans.
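The ROC curve can be traced by lowering the cut-off through every distinct predicted probability and recording the false-positive and true-positive rates. This minimal sketch treats tied scores as a single step and integrates with the trapezoidal rule. With ties handled this way, the area equals the c statistic from the concordance table:

```python
from itertools import groupby


def roc_points(probabilities, actuals):
    """(false-positive rate, true-positive rate) as the cut-off falls through every score."""
    positives = sum(actuals)
    negatives = len(actuals) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both defaulters and non-defaulters")
    ranked = sorted(zip(probabilities, actuals), key=lambda pair: pair[0], reverse=True)
    points, tp, fp = [(0.0, 0.0)], 0, 0
    # Clients sharing a score cross the cut-off together, producing one diagonal step.
    for _, group in groupby(ranked, key=lambda pair: pair[0]):
        for _, actual in group:
            if actual == 1:
                tp += 1
            else:
                fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

A single sorted pass keeps this O(n log n), which is comfortable for the 17,096 scored clients.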

定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 

Loan Risk Assessment & Scoring Model

  • 1. Loan Risk Assessment & Scoring Model • January 2017
  • 2. Loan Risk Assessment & Scoring Model (PKC)
    Conclusion
    - A probability-of-default score can be assigned to each client to predict loan default:
      $$P(\text{default}) = \frac{e^{z}}{1 + e^{z}}, \qquad e \approx 2.718$$
      $$z = -1.56 + 0.02\,\text{Age} - 0.0083\,\text{Average Salary} - 0.0018\,\text{Total Assets} - 0.00001\,\text{Total Loan} - 4.33\,\text{Assets to Loan Ratio} + 0.75\,\text{Male} - 3.3\,\text{Private} + 2.1\,\text{Europe \& Central Asia} + 4.1\,\text{North America} + 1.8\,\text{Sub-Saharan Africa}$$
    Objective
    - Develop a statistical prediction model from historical data on loan defaulters (Bad Loans) and non-defaulters (Good Loans).
    - Calculate the probability of default for each client.
    Data Source
    - Banking data for all clients with a loan balance in November 2015.
    Data Limitations
    - Payment history, credit utilization, credit history, credit use and the client's assets with other banks are not available, although they would be useful for a loan scoring model.
    - The default percentage in the available data differs from the actual default percentage, which suggests the data is incomplete.
    Approach
    - To predict whether borrowers are likely to default, loans are classified into two classes: Bad Loans and Good Loans.
    - A logistic regression model was then built for this binary variable, using all available demographic and banking variables as attributes.
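A minimal Python sketch of the scoring equation above, assuming the rounded full-data coefficients and that each attribute is supplied in the same units as the bank's data (the deck does not state units). The dictionary keys such as `average_salary` and `eu_ca` are hypothetical names chosen for illustration; any attribute that is not supplied is treated as 0, which is natural for the 0/1 dummies but is only an assumption for continuous variables.

```python
import math

# Rounded full-data coefficients from the slide. Keys are hypothetical names.
COEFFICIENTS = {
    "intercept": -1.56,
    "age": 0.02,
    "average_salary": -0.0083,
    "total_assets": -0.0018,
    "total_loan": -0.00001,
    "assets_to_loan_ratio": -4.33,
    "male": 0.75,                 # dummy: 1 = male
    "private": -3.3,              # dummy: 1 = private-sector employee
    "eu_ca": 2.1,                 # dummy: Europe & Central Asia
    "north_america": 4.1,         # dummy
    "sub_saharan_africa": 1.8,    # dummy
}

def linear_score(client: dict) -> float:
    """z = b0 + sum(b_i * x_i); missing attributes default to 0 (an assumption)."""
    z = COEFFICIENTS["intercept"]
    for name, beta in COEFFICIENTS.items():
        if name != "intercept":
            z += beta * client.get(name, 0.0)
    return z

def probability_of_default(client: dict) -> float:
    """e^z / (1 + e^z), evaluated in a form that cannot overflow."""
    z = linear_score(client)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

print(round(probability_of_default({"age": 40, "north_america": 1}), 3))
```

Because e^z / (1 + e^z) equals 1 / (1 + e^(-z)), the function picks whichever form keeps the exponent non-positive, so very large or very small scores do not overflow.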
  • 3. Key Findings & Statistical Measures
    - The data suggests that clients from some countries carry a higher weight in the bank's current loan-approval (clearance) model, yet the default rate among these clients was very high in November 2015. One possible explanation is that these clients obtain loans more easily than clients from other countries and later default without affecting their credit rating in their home country.
    - Examples of default rates by country:
        United States    50%  (11 defaults of 22 loan clients)
        United Kingdom   36%  (5 defaults of 14 loan clients)
        Somalia          43%  (3 defaults of 7 loan clients)
        Romania          33%  (2 defaults of 6 loan clients)
        Maldives         29%  (4 defaults of 14 loan clients)
        Canada           19%  (3 defaults of 16 loan clients)
        Lebanon          11%  (24 defaults of 213 loan clients)
        Germany          50%  (1 default of 2 loan clients)
    - However, the overall default rate in the available data is only 2.05% by number of clients and 2.53% by loan amount.
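The country rates above are simple ratios of defaults to loan clients. A short sketch that recomputes them from the counts on the slide, together with the overall count-based rate using 351 defaulters out of 17,096 clients (both figures appear in the cut-off tables later in the deck):

```python
# (defaults, loan clients) per country, copied from the slide.
COUNTRY_DEFAULTS = {
    "United States": (11, 22),
    "United Kingdom": (5, 14),
    "Somalia": (3, 7),
    "Romania": (2, 6),
    "Maldives": (4, 14),
    "Canada": (3, 16),
    "Lebanon": (24, 213),
    "Germany": (1, 2),
}

# Overall counts taken from the deck's cut-off tables (351 actual defaults, 17,096 clients).
OVERALL_DEFAULTS, OVERALL_CLIENTS = 351, 17096

def default_rate(defaults: int, clients: int) -> float:
    """Default rate as a percentage of loan clients."""
    if clients <= 0:
        raise ValueError("clients must be positive")
    return 100.0 * defaults / clients

for country, (d, n) in COUNTRY_DEFAULTS.items():
    print(f"{country:15s} {default_rate(d, n):5.1f}%  ({d}/{n})")
print(f"{'Overall':15s} {default_rate(OVERALL_DEFAULTS, OVERALL_CLIENTS):5.2f}%")
```

The denominators are very small for some countries (Germany rests on 2 clients, Romania on 6), so a single extra default would move those rates by many percentage points.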
  • 4. Key Findings & Statistical Measures
    - Each client's monthly salary is one of the most important predictors of loan default. A risk indicator can be generated for clients whose salary is not credited each month, because the loan scoring model assigns these clients a high probability of default.
    - An interesting finding is that government-sector employees are more likely to default than private-sector and semi-government loan clients.
    - Assets (demand deposits and time deposits) available in the data are among the key variables for predicting default: as the value of a client's assets increases, the probability of default decreases significantly.

      Employment Type   Total Loan Clients   Defaults   Default %
      Government                      4291        127       2.96%
      Private                        12158         78       0.64%
      Semi-Government                   65          0       0.00%

      Defaulters            Non-Kuwait   Kuwait
      Defaults                     198      153
      Salary missing               198      142
      Salary missing (%)          100%   92.81%
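A sketch that recomputes the employment-type default rates and the share of defaulters with a missing salary record from the counts in the two tables. The `salary_risk_flag` helper is hypothetical: the deck proposes the indicator but does not define its exact rule, so here a month with no recorded salary credit (or a non-positive one) is assumed to set the flag.

```python
from typing import Optional

# (loan clients, defaults) per employment type, copied from the slide.
EMPLOYMENT = {
    "Government": (4291, 127),
    "Private": (12158, 78),
    "Semi-Government": (65, 0),
}

# Defaulters by nationality and how many of them had no salary record.
DEFAULTERS = {"Non-Kuwait": 198, "Kuwait": 153}
SALARY_MISSING = {"Non-Kuwait": 198, "Kuwait": 142}

def pct(part: int, whole: int) -> float:
    return 100.0 * part / whole

employment_rates = {k: pct(d, n) for k, (n, d) in EMPLOYMENT.items()}
salary_missing_share = {k: pct(SALARY_MISSING[k], DEFAULTERS[k]) for k in DEFAULTERS}

def salary_risk_flag(monthly_salary_credit: Optional[float]) -> int:
    """Hypothetical indicator: 1 when no positive salary credit is recorded for the month."""
    return int(monthly_salary_credit is None or monthly_salary_credit <= 0)

print(employment_rates)
print(salary_missing_share)
```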
  • 5. Conclusion & Future Prospects
    - The loan scoring model built on the available data shows that loan defaults can be predicted statistically and provides a meaningful risk-assessment measure for distinguishing good loans from bad loans.
    - Enriching the data with payment history, credit utilization, credit history, credit use and the client's assets with other banks would further improve the loan scoring model.
    - A loan scoring model could also be built for prospective customers, to target customers with a very low probability of default, thereby reducing default risk and maximizing the bank's profit.
    - With all demographic, credit and asset data available, separate models could be built by geography, loan amount and number of clients. For example:
        i.   Home Client Loan Scoring Model (clients from Kuwait)
        ii.  Indian Client Loan Scoring Model (Indian loan clients are the largest group by count)
        iii. Out-of-Home Client Loan Scoring Model (other countries and geographies)
  • 6. Annexure: Conceptualizing the Modeling Steps of Logistic Regression
  • 7. Conceptualization: Modeling Steps of Logistic Regression
    Data Access & Management → Target Variable Creation → Variable Transformation → Dummy Variable Creation → Data Partition into Training & Validation → Model Calibration → Lift Chart Comparison → Model Creation on Entire Data → Decide Probability Cut-off → Model Validation
  • 8. A. Data Access and Management in SAS
    - Import and merge (join) the available banking loan data for November 2015, client-level data, client salary data and employment data.
    B. Target Variable Creation
    - Clients with a Legal Loan record are classified as Bad Loans and assigned the value 1; clients without a Legal Loan record are classified as Good Loans and assigned the value 0.
    C. Variable Transformation
    - The Country Name and Profession variables are grouped into fewer categories.
    - Country names into regional categories: East Asia & Pacific, Europe & Central Asia, Latin America & Caribbean, Middle East & North Africa, North America, South Asia, Sub-Saharan Africa and Others.
    - Professions into professional categories:
        Blue-Collar (e.g., Clerk, Technician, Driver and others)
        High-risk (e.g., Policeman, Army man and others)
        White-Collar (e.g., Doctor, Economist, Auditor and others)
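Steps B and C can be sketched as simple lookups. The lookup tables below are hypothetical and partial: the deck names the target categories and a few example professions but does not publish the full mappings. The region assignments shown follow the World Bank regional grouping, which the category names match; treating that grouping as the deck's source is an assumption.

```python
# Step B: target variable. A Legal Loan record marks a Bad Loan (1).
def make_target(has_legal_loan: bool) -> int:
    return 1 if has_legal_loan else 0

# Step C: hypothetical, partial lookup tables for illustration only.
COUNTRY_TO_REGION = {
    "United States": "North America",
    "Canada": "North America",
    "United Kingdom": "Europe & Central Asia",
    "Germany": "Europe & Central Asia",
    "Romania": "Europe & Central Asia",
    "Lebanon": "Middle East & North Africa",
    "Kuwait": "Middle East & North Africa",
    "India": "South Asia",
    "Maldives": "South Asia",
    "Somalia": "Sub-Saharan Africa",
}

PROFESSION_TO_CATEGORY = {
    "clerk": "Blue-Collar",
    "technician": "Blue-Collar",
    "driver": "Blue-Collar",
    "policeman": "High-risk",
    "army man": "High-risk",
    "doctor": "White-Collar",
    "economist": "White-Collar",
    "auditor": "White-Collar",
}

def region_of(country: str) -> str:
    """Map a country name to its regional category; unknown countries go to 'Others'."""
    return COUNTRY_TO_REGION.get(country, "Others")

def profession_category(profession: str) -> str:
    """Map a profession to its category; 'Unclassified' is a hypothetical fallback."""
    return PROFESSION_TO_CATEGORY.get(profession.strip().lower(), "Unclassified")
```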
  • 9. D. Dummy Variable Creation
    - Categorical character variables (Gender, Professional Category, Employment Type and Country Classification) are converted into numeric dummy variables. Example for Employment Type:

      Employment Type   Government   Private   Semi-Government
      Government                 1         0                 0
      Private                    0         1                 0
      Semi-Government            0         0                 1
      Missing                    0         0                 0

    E. Data Partition
    - The full client data set of 17,096 clients is divided into two parts:
        Training data set (10,258): 60% of all observations.
        Validation data set (6,838): 40% of all observations.
    - A uniform random number is used to assign each observation to one of the two data sets.
    [Figure: Full data divided into Training data (60% of population) and Validation data (40% of population)]
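Steps D and E in a short Python sketch. The dummy coding follows the Employment Type table above, with a missing value (`None`) mapped to all zeros. For the partition, each record receives a uniform random number and the records with the smallest 60% of draws form the training set; this is one assumed way to reproduce the exact 10,258 / 6,838 split, because drawing independently per record would only give approximately 60/40. The seed value is arbitrary.

```python
import random

EMPLOYMENT_LEVELS = ("Government", "Private", "Semi-Government")

def employment_dummies(employment_type):
    """One 0/1 column per level; None (missing) yields all zeros, as in the slide's table."""
    return {level: int(employment_type == level) for level in EMPLOYMENT_LEVELS}

def partition(records, train_share=0.6, seed=2015):
    """Give every record a uniform random number and put the records with the
    smallest round(n * train_share) draws in the training set (assumed procedure).
    The original record order is preserved inside each part."""
    rng = random.Random(seed)
    keyed = sorted((rng.random(), i) for i in range(len(records)))
    n_train = round(len(records) * train_share)
    train_idx = {i for _, i in keyed[:n_train]}
    train = [r for i, r in enumerate(records) if i in train_idx]
    valid = [r for i, r in enumerate(records) if i not in train_idx]
    return train, valid

train, valid = partition(list(range(17096)))
print(len(train), len(valid))
```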
  • 10. F. Model Calibration
    i.   Logistic regression is fitted on the Training data set to estimate a coefficient for each independent variable, plus the intercept.
    ii.  The binary logistic model estimates the probability of a binary response (1 = Bad Loan, 0 = Good Loan) from predictor (independent) variables used as attributes or features.
    iii. Multicollinearity among the independent variables is checked with the Variance Inflation Factor (VIF); variables with a VIF higher than 5 are removed from the model:
         $$\mathrm{VIF}_i = \frac{1}{1 - R_i^2}$$
         where $R_i^2$ is the coefficient of determination of the regression of variable $i$ on the other independent variables.
    iv.  Highly collinear continuous or dummy variables, such as Female, Government, Semi-Government, South Asia, Middle East & North Africa, East Asia & Pacific, Loan Available (Salary*15 or 15000 - Existing Loan) and all Age-Group variables, are removed from the model to eliminate multicollinearity.
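A NumPy sketch of the VIF check in step iii. Each column is regressed on the remaining columns (with an intercept) by least squares to obtain $R_i^2$. Dropping the worst variable one at a time and recomputing is an assumed procedure; the deck only states the threshold of 5.

```python
import numpy as np

def vif(X: np.ndarray) -> np.ndarray:
    """VIF of each column: 1 / (1 - R_i^2), where R_i^2 comes from an OLS
    regression of column i on the other columns plus an intercept."""
    n, k = X.shape
    out = np.empty(k)
    for i in range(k):
        y = X[:, i]
        A = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        ss_res = float(np.sum((y - A @ beta) ** 2))
        ss_tot = float(np.sum((y - y.mean()) ** 2))
        if ss_tot == 0.0:  # constant column: collinear with the intercept
            out[i] = np.inf
            continue
        r2 = 1.0 - ss_res / ss_tot
        out[i] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out

def drop_high_vif(X: np.ndarray, names, threshold: float = 5.0):
    """Repeatedly drop the column with the largest VIF until every VIF <= threshold."""
    names = list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names
```

In the check below the data contains both Male and Female dummies; because they always sum to 1, each is perfectly explained by the other plus the intercept, so one of them is dropped. This mirrors why Female appears in the removed list above.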
  • 11. F. Model Calibration (continued)
    v. The model is built with the stepwise method, using a significance level of 0.05 (95% confidence) both to enter and to stay in the model:
       $$\operatorname{logit}(\text{Target}) = \log\frac{P(\text{Target}=1 \mid X_1,\dots,X_n)}{P(\text{Target}=0 \mid X_1,\dots,X_n)} = b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n$$
       where $b_0, b_1, \dots, b_n$ are the estimated parameters (betas). The estimates of all significant variables, with the intercept, are:

       Parameter                  Estimate   Pr > ChiSq
       Intercept                  -1.46170       0.0004
       Age                         0.01870       0.0271
       Average Salary             -0.00829       <.0001
       Total Loan                 -0.00001       0.0038
       Assets to Loan Ratio       -7.46580       <.0001
       Male                        0.72240       0.0013
       Private                    -3.41490       <.0001
       Europe and Central Asia     1.90480       0.0113
       North America               3.57640       <.0001

       $$P(\text{default}) = P(\text{Target}=1) = \frac{e^{z}}{1 + e^{z}}, \qquad e = 2.71828, \quad z = b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n$$
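The probability formula follows from the logit definition in a few lines of algebra, writing $p = P(\text{Target}=1)$. Exponentiating a coefficient also gives its odds ratio, which makes the estimates table easier to read; the two odds ratios below are computed from the training estimates and rounded, and each compares clients with the dummy equal to 1 against clients with it equal to 0, holding the other variables fixed.

```latex
\log\frac{p}{1-p} = z
\;\Rightarrow\; \frac{p}{1-p} = e^{z}
\;\Rightarrow\; p = e^{z}(1-p)
\;\Rightarrow\; p\,(1+e^{z}) = e^{z}
\;\Rightarrow\; p = \frac{e^{z}}{1+e^{z}} = \frac{1}{1+e^{-z}}

\text{Odds ratio for } X_j = e^{b_j}: \qquad
e^{0.72240} \approx 2.06 \ (\text{Male}), \qquad
e^{-3.41490} \approx 0.033 \ (\text{Private})
```

So, in the training model, male clients have roughly twice the odds of default, and private-sector clients about 3% of the odds of clients with Private = 0.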
  • 12. F. Model Calibration (continued)
    vi.  The parameter estimates are used to calculate each loan client's probability of default in the Training data set (60%):
         $$P(\text{default}) = \frac{e^{z}}{1 + e^{z}}$$
         $$z = -1.46 + 0.019\,\text{Age} - 0.0083\,\text{Average Salary} - 0.00001\,\text{Total Loan} - 7.47\,\text{Assets to Loan Ratio} + 0.72\,\text{Male} - 3.4\,\text{Private} + 1.9\,\text{Europe \& Central Asia} + 3.6\,\text{North America}$$
    vii. The same parameter estimates are also used to calculate each loan client's probability of default in the Validation data set (40%). The reference files for the Training and Validation data sets, with all calculations, are attached to this slide.
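The deck does not show the fitting procedure itself. As a hedged illustration of what a logistic regression fit does, the NumPy sketch below estimates maximum-likelihood coefficients by Newton-Raphson on a training partition and then applies them, unchanged, to a validation partition as in steps vi and vii. It omits stepwise selection, standard errors and the Pr > ChiSq tests, and it will not converge if the two classes are perfectly separated.

```python
import numpy as np

def sigmoid(z):
    """Logistic function e^z / (1 + e^z) written as 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.
    X must already contain a leading column of ones for the intercept."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = sigmoid(X @ beta)
        w = p * (1.0 - p)                    # Bernoulli variances
        grad = X.T @ (y - p)                 # score vector
        hess = X.T @ (X * w[:, None])        # observed information matrix
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def predict_proba(X, beta):
    """Probability of the event (target = 1) for each row of X."""
    return sigmoid(X @ beta)

# Synthetic example: fit on the first 60% of rows, score the remaining 40%.
rng = np.random.default_rng(1)
n = 20000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([-1.0, 0.8])
y = (rng.random(n) < sigmoid(X @ true_beta)).astype(float)
beta_hat = fit_logistic(X[:12000], y[:12000])
p_valid = predict_proba(X[12000:], beta_hat)
print(beta_hat)
```

At convergence the score equations force the mean predicted probability on the training rows to equal the observed event rate, which is a quick sanity check on any fit.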
  • 13. G. Model Validation: Comparing Training & Validation with a Lift Chart
    [Figure: Lift Chart comparing Cumulative Percent (Training), Cumulative Percent (Validation) and Cumulative (Without Model)]
    - The training and validation lift curves are closely aligned, which indicates that the parameter estimates generalize from the training data to the validation data and supports the validity of the model.

      Association of Predicted Probabilities and Observed Responses
      Percent Concordant     95.6      Somers' D   0.930
      Percent Discordant      2.6      Gamma       0.946
      Percent Tied            1.7      Tau-a       0.038
      Pairs               2166770      c           0.965

    - Percent Concordant is 95.6%: in 95.6% of all (defaulter, non-defaulter) pairs, the defaulter receives the higher predicted probability.
    - Somers' D (0.930) and Gamma (0.946) indicate that the model has strong predictive power.
    - The c statistic, i.e. the area under the ROC curve, is 0.965, close to the maximum of 1.0.
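The association measures in the table are computed over all pairs formed by one defaulter and one non-defaulter. A pure-Python sketch using the standard definitions, with $n_c$, $n_d$, $n_t$ the concordant, discordant and tied pairs, $t$ their total and $N$ the number of observations: Somers' D = $(n_c - n_d)/t$, Gamma = $(n_c - n_d)/(n_c + n_d)$, Tau-a = $(n_c - n_d)/(N(N-1)/2)$ and c = $(n_c + n_t/2)/t$. It compares probabilities exactly, so results may differ slightly from software that rounds probabilities before comparing.

```python
def association_stats(prob, actual):
    """Concordance measures between predicted probabilities and 0/1 outcomes."""
    events = [p for p, a in zip(prob, actual) if a == 1]
    nonevents = [p for p, a in zip(prob, actual) if a == 0]
    nc = nd = nt = 0
    for pe in events:            # every (event, non-event) pair
        for pn in nonevents:
            if pe > pn:
                nc += 1          # concordant: event scored higher
            elif pe < pn:
                nd += 1          # discordant
            else:
                nt += 1          # tied
    t = nc + nd + nt
    n = len(prob)
    return {
        "pairs": t,
        "pct_concordant": 100.0 * nc / t,
        "pct_discordant": 100.0 * nd / t,
        "pct_tied": 100.0 * nt / t,
        "somers_d": (nc - nd) / t,
        "gamma": (nc - nd) / (nc + nd),
        "tau_a": (nc - nd) / (0.5 * n * (n - 1)),
        "c": (nc + 0.5 * nt) / t,
    }

stats = association_stats([0.9, 0.8, 0.3, 0.2, 0.8], [1, 1, 0, 0, 0])
print(stats)
```

The reported values are mutually consistent: Somers' D = 0.956 - 0.026 = 0.930, c = 0.956 + 0.017/2 ≈ 0.965, and Tau-a = 0.930 × 2,166,770 / (10,258 × 10,257 / 2) ≈ 0.038, which is consistent with the table having been computed on the 10,258-client training set.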
  • 14. H. Model Creation on the Entire Data Set (17,096)
    - The parameter estimates are now re-created on the entire data set by following all the previous steps. The estimates of all significant variables, with the intercept, are:

       Parameter                  Estimate   Pr > ChiSq
       Intercept                   -1.5571       <.0001
       Age                          0.0201       0.0028
       Average Salary             -0.00831       <.0001
       Total Assets                -0.0018       0.0009
       Total Loans                -0.00001       0.0001
       Assets to Loan Ratio        -4.3266       0.0155
       Male                         0.7518       <.0001
       Private                     -3.2798       <.0001
       Europe and Central Asia      2.0924       0.0009
       North America                4.0713       <.0001
       Sub-Saharan Africa           1.7926       0.0114

       $$P(\text{default}) = \frac{e^{z}}{1 + e^{z}}, \qquad e = 2.71828$$
       $$z = -1.56 + 0.02\,\text{Age} - 0.0083\,\text{Average Salary} - 0.0018\,\text{Total Assets} - 0.00001\,\text{Total Loan} - 4.33\,\text{Assets to Loan Ratio} + 0.75\,\text{Male} - 3.3\,\text{Private} + 2.1\,\text{Europe \& Central Asia} + 4.1\,\text{North America} + 1.8\,\text{Sub-Saharan Africa}$$
  • 15. I. Probability Cut-off to Accept or Reject a Loan Application
    - Decide the probability level above which loan applications are rejected; applications below it are accepted.

      P > 0.40      Predicted 0   Predicted 1   Total   Correct %
      Actual 0            16594           151   16745       99.1%
      Actual 1              184           167     351       47.6%
      Total               16778           318   17096

      P > 0.30      Predicted 0   Predicted 1   Total   Correct %
      Actual 0            16458           287   16745       98.3%
      Actual 1               96           255     351       72.6%
      Total               16554           542   17096

      P > 0.25      Predicted 0   Predicted 1   Total   Correct %
      Actual 0            16412           333   16745       98.0%
      Actual 1               84           267     351       76.1%
      Total               16496           600   17096

    - These tables give the frequency distribution of correct and incorrect predictions at cut-offs of 0.40, 0.30 and 0.25. The best cut-off should be chosen to minimize risk and maximize profit: if the goal is to acquire more customers, a 0.30 cut-off is appropriate; otherwise, a 0.25 cut-off is better for reducing risk.
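A sketch of how the cut-off tables can be produced and compared. `confusion_at_cutoff` builds the 2×2 counts from predicted probabilities (predicted 1 when the probability exceeds the cut-off, matching the "P > 0.40" labels), and `expected_cost` weighs missed defaults against rejected good loans. The cost values used with it are hypothetical placeholders; the deck leaves the profit/risk trade-off open.

```python
def confusion_at_cutoff(prob, actual, cutoff):
    """Counts keyed by (actual, predicted), where predicted = 1 if prob > cutoff."""
    counts = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
    for pr, a in zip(prob, actual):
        counts[(a, int(pr > cutoff))] += 1
    return counts

def percent_correct(counts):
    """Row-wise percent correct for Good (actual 0) and Bad (actual 1) loans."""
    good = counts[(0, 0)] / (counts[(0, 0)] + counts[(0, 1)])
    bad = counts[(1, 1)] / (counts[(1, 0)] + counts[(1, 1)])
    return 100.0 * good, 100.0 * bad

def expected_cost(counts, cost_missed_default, cost_rejected_good):
    """Total cost of accepted defaulters plus rejected good clients (hypothetical costs)."""
    return cost_missed_default * counts[(1, 0)] + cost_rejected_good * counts[(0, 1)]

# Counts copied from the slide's three tables: {(actual, predicted): n}.
SLIDE_TABLES = {
    0.40: {(0, 0): 16594, (0, 1): 151, (1, 0): 184, (1, 1): 167},
    0.30: {(0, 0): 16458, (0, 1): 287, (1, 0): 96, (1, 1): 255},
    0.25: {(0, 0): 16412, (0, 1): 333, (1, 0): 84, (1, 1): 267},
}

def best_cutoff(tables, cost_missed_default, cost_rejected_good):
    """Cut-off with the lowest total cost under the given cost ratio."""
    return min(tables, key=lambda c: expected_cost(tables[c], cost_missed_default, cost_rejected_good))

print(best_cutoff(SLIDE_TABLES, 10, 1), best_cutoff(SLIDE_TABLES, 2, 1))
```

With a hypothetical 10:1 cost ratio (a missed default costs ten times a rejected good loan) the 0.25 cut-off has the lowest cost on the slide's counts, while at 2:1 the 0.30 cut-off wins, which mirrors the trade-off described above.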
  • 16. J. Model Validation: ROC Curve
    - The accuracy of the model is measured by the area under the ROC curve; an area of 1 represents a perfect test. The area under the ROC curve at the last step of this model is 0.96, which would be considered "very good" at separating good loans from bad loans.
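A pure-Python sketch of the ROC curve and its area: the cut-off is lowered from the highest predicted probability, observations with tied scores are processed together, and the area is accumulated with the trapezoidal rule. With ties handled this way, the area equals the c statistic from the concordance table on the same data.

```python
def roc_points(prob, actual):
    """(false-positive rate, true-positive rate) pairs obtained by lowering the
    cut-off through every distinct predicted probability; tied scores move together."""
    pos = sum(1 for a in actual if a == 1)
    neg = len(actual) - pos
    ranked = sorted(zip(prob, actual), key=lambda t: t[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Consume every observation sharing this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in zip(points, points[1:]))

print(auc_trapezoid(roc_points([0.9, 0.8, 0.3, 0.2, 0.8], [1, 1, 0, 0, 0])))
```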