By:
Harsha Sinha (16125018)
Kriti Doneria (16125022)
Prakhar Barole (16125028)
CREDIT RISK MODELLING USING
LOGISTIC REGRESSION
STATISTICAL METHODS FOR BUSINESS ANALYTICS PROJECT REPORT
MBA652A Course Instructor: Dr. Devlina Chatterjee April 2017
ACKNOWLEDGEMENTS
On completion of this project, we would like to thank our faculty, Dr. Devlina Chatterjee for
giving us the opportunity to pursue the project as a part of the curriculum and also being a
constant source of support throughout the project.
We would also like to thank our classmates and friends, who helped us in the
conceptualization of the problem statement.
Lastly, we thank all the researchers, bloggers and people from the community at large for
providing us a starting point for our project through their documentation, research and
articles.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
OBJECTIVE
INTRODUCTION
  CIBIL SCORE
METHODOLOGY
  LOGISTIC REGRESSION
    REGRESSION EQUATION
    ASSUMPTIONS IN LOGISTIC REGRESSION
TOOLS, TECHNOLOGIES AND DATASET
  TOOLS AND TECHNOLOGIES
  DATASET
    DATASET DESCRIPTION
MODELLING PROCESS, SELECTION AND FINE-TUNING
  PROCESS
  SELECTION
  FINE TUNING
OBSERVATIONS
  SELECTING THE MODEL
  SELECTING THE CUT-OFF
QUALITATIVE ANALYSIS OF THE RESULTS
  DIRECT AND INVERSE VARIATIONS
  LEVEL OF SIGNIFICANCE
LIMITATIONS OF THE MODEL
  Reject Inference
  Omitted Variable bias
  Over fitting
CONCLUSION
REFERENCES
APPENDIX
  R CODE
  R CODE OUTPUT
OBJECTIVE
To explore, qualitatively and quantitatively, the risks associated with extending credit for personal and
commercial purposes, and to model the risk of default using a widely used machine learning classification
method: logistic regression.
INTRODUCTION
Credit risk modelling tries to answer the question:
Assuming past behavior is predictive of future behavior, what is the probability that a
debtor will not repay the debt-holder?
The analysis of credit risk is of utmost importance for financial institutions. Historically, it was done by
weighing the net assets a borrower had against the debt, to judge whether they were enough to cover it.
Being manual in nature, the process was prone to human bias and corruption. In the past two decades,
technology has transformed and automated the process, making it easier for banks to deal with both the
volume of debtors and the variety of debt.
A milestone has been the development of the CIBIL score in India.
CIBIL SCORE
A credit score, or CIBIL score, is a three-digit numeric summary of an individual's credit history. The
score is derived from the credit history found in the Credit Information Report (CIR), which records an
individual's credit payment history across loan types and credit institutions over a period of time. In
general, credit scores range from 300 to 900. The minimum CIBIL score for a personal loan is generally
750; a score above this indicates that the applicant is creditworthy, and such applications are processed
without hassle.
METHODOLOGY
Model used: Standard logistic heteroskedastic robust regression model.
LOGISTIC REGRESSION
Logistic regression is the type of regression used for a response variable (Y) that follows a binomial
distribution.
• Y ~ Binomial(n, p)
• n = number of independent trials
• p = probability of success on each trial
• Y = number of successes out of n trials (e.g., Y = number of heads)
REGRESSION EQUATION
$$p = \frac{\exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n)}{1 + \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n)}$$
• p is the probability of default
• x_i is the explanatory factor i
• β_i is the regression coefficient of the explanatory factor i
• n is the number of explanatory variables
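As a minimal sketch of how the equation turns coefficients into a probability, the R snippet below plugs the Model 1 estimates printed in the appendix into the formula. The applicant values are made up for illustration, and `default_probability` is a hypothetical helper name, not part of the project code.

```r
# Coefficients of Model 1, copied from summary(result) in the appendix
beta <- c(intercept = -3.265, loan_amnt = 1.762e-07, int_rate = 1.517e-01,
          annual_inc = -6.935e-06, age = -5.271e-03)

# Hypothetical helper: logistic transform of the linear predictor
default_probability <- function(beta, x) {
  # eta = beta_0 + beta_1 * x_1 + ... + beta_n * x_n
  eta <- beta[["intercept"]] + sum(beta[names(x)] * x)
  exp(eta) / (1 + exp(eta))
}

# Made-up applicant, for illustration only
applicant <- c(loan_amnt = 10000, int_rate = 12, annual_inc = 50000, age = 30)
p <- default_probability(beta, applicant)
print(round(p, 3))
```

Whatever values are supplied, the result always lies strictly between 0 and 1, which is what makes it usable as a probability of default.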
The reasons why logistic regression is better suited to credit risk analysis are:
1. The independent variables (credit type and duration, income, etc.) are categorical in
nature. Categories make better predictors in this analysis than actual values.
2. The end result has to be a probability or percentage (e.g. person A is x% likely to default
on the given credit), which is not possible with a linear regression model, since its predicted
values can range over the entire number line.
3. The variance of the dependent variable (Y) is not constant. The variance of a binomial
distribution is npq (with q = 1 - p), so it changes with p, whereas a linear regression model
inherently assumes normally distributed errors with a constant standard deviation.
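The non-constant variance in point 3 can be checked numerically. The sketch below, with a hypothetical helper `binomial_variance` and an arbitrary grid of probabilities, evaluates npq for a single trial (n = 1).

```r
# Variance of Y ~ Binomial(n, p) is n * p * q, with q = 1 - p
binomial_variance <- function(n, p) n * p * (1 - p)

p_grid <- c(0.05, 0.25, 0.50, 0.75, 0.95)  # arbitrary illustrative probabilities
v <- binomial_variance(1, p_grid)
print(data.frame(p = p_grid, variance = v))
```

The variance peaks at p = 0.5 and shrinks towards 0 and 1, so it cannot be treated as constant across borrowers with different default probabilities.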
ASSUMPTIONS IN LOGISTIC REGRESSION
• Absence of perfect multicollinearity
• No outliers
• Independence of errors
• Ratio of cases to variables: using discrete variables requires that there are enough responses in
every given category
• Not many missing values
TOOLS, TECHNOLOGIES AND DATASET:
TOOLS AND TECHNOLOGIES
R scripting Language, RStudio IDE for Windows.
DATASET
The dataset is a bank's record of loan defaults together with the profiles of its customers. It
contains information such as age, annual income, home ownership and grade of employee, which affect
the loan-paying capacity of the customer.
DATASET DESCRIPTION
This data is taken from https://www.biz.uiowa.edu/faculty/jledolter/datamining/dataexercises.html
1. Contains 29092 rows and 8 columns.
2. Contains 2043 rows with missing data.
3. The columns are:
loan_status: 0 if successful, 1 if defaulted
loan_amnt: total amount of loan taken
int_rate: interest rate
grade: grade of employment
emp_length: duration of employment
home_ownership: type of ownership of house
annual_inc: annual income
age: age of loan taker
4. Of these columns, loan_status is a binary variable; loan_amnt, int_rate, emp_length, annual_inc
and age are numeric variables; grade and home_ownership are categorical variables with 7 and 4
categories respectively.
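A quick way to confirm such a description is to inspect the structure and the missing values of the data frame. The sketch below uses a tiny made-up stand-in (`loans`) with the same column names, since the real file is not bundled here; all values are illustrative.

```r
# Toy stand-in for the loan data: same column names, made-up values
loans <- data.frame(
  loan_status    = c(0, 0, 1, 0, 1),
  loan_amnt      = c(5000, 2400, 10000, 5000, 3000),
  int_rate       = c(10.65, NA, 13.49, NA, 12.69),
  grade          = factor(c("B", "C", "C", "A", "B")),
  emp_length     = c(10, 25, 13, NA, 1),
  home_ownership = factor(c("RENT", "RENT", "RENT", "RENT", "OWN")),
  annual_inc     = c(24000, 12252, 49200, 36000, 48000),
  age            = c(33, 31, 24, 39, 24)
)

str(loans)                                   # column types and factor levels
na_per_column <- colSums(is.na(loans))       # missing values in each column
rows_with_na  <- sum(!complete.cases(loans)) # rows with at least one NA
print(na_per_column)
print(rows_with_na)
```

On the real data the same calls would show which columns carry the missing values that glm() later drops.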
MODELLING PROCESS, SELECTION AND FINE-TUNING
PROCESS
Three logistic regression models were built by including and excluding different independent variables.
The dataset was divided into a training set (75%) and a testing set (25%). The coefficients were
estimated on the training data, and the objective of modelling was to minimize the residual deviance
when those coefficients were applied to the testing data.
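The 75/25 split can be sketched in base R by sampling row indices. The data frame `toy` and the seed below are assumptions made purely for illustration.

```r
set.seed(1)  # arbitrary seed so the illustration is repeatable

# Made-up data frame standing in for the loan data
toy <- data.frame(id = 1:20, y = rep(c(0, 1), 10))

# Draw 75% of the row indices without replacement for training
train_idx <- sample(nrow(toy), floor(0.75 * nrow(toy)))
train <- toy[train_idx, ]   # training rows
test  <- toy[-train_idx, ]  # all remaining rows form the test set

print(c(train = nrow(train), test = nrow(test)))
```

Sampling indices rather than the data frame itself guarantees that no row lands in both sets.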
SELECTION
Model selection was based on the lowest AIC (Akaike information criterion), the median deviance
residual closest to zero and the highest number of terms significant at the 5% level (p < 0.05, i.e. a
confidence of 95% and above). Model 3 did best on all three criteria.
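An AIC comparison of nested logit models can be sketched as follows. The data are simulated (an assumption, not the loan data), and both models are fitted on the same complete rows, because AIC values are only comparable when the models see identical observations.

```r
set.seed(42)  # arbitrary seed for the simulation
n <- 500
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- rbinom(n, 1, plogis(-1 + 0.8 * sim$x1 + 0.5 * sim$x2))
sim$x2[sample(n, 50)] <- NA  # inject missing values into one predictor

# Restrict to complete rows first so both fits use the same observations
complete <- sim[complete.cases(sim), ]
m_small <- glm(y ~ x1,      family = "binomial", data = complete)
m_full  <- glm(y ~ x1 + x2, family = "binomial", data = complete)

comparison <- AIC(m_small, m_full)  # data frame with df and AIC per model
print(comparison)
```

Fitting each model directly on the raw data would let glm() drop a different number of rows per model, which makes the AIC figures harder to compare.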
FINE TUNING
The predictions obtained for the test dataset were probabilities (decimal values). To turn them into a
categorical outcome, several cut-off values were tried, and an accuracy of 77.4% was reached at a
cut-off of .25. To avoid over-fitting and to limit the potential loss of profit, the cut-off was not
raised beyond this value.
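The thresholding step can be sketched on a handful of made-up probabilities; `classify` is a hypothetical helper, and the NA entry mimics a test row whose predictors were missing.

```r
# Made-up predicted probabilities; NA stands for a row that could not be scored
prob <- c(0.05, 0.18, 0.22, 0.31, NA)

# Hypothetical helper: 1 = predicted default, 0 = predicted repayment
classify <- function(prob, cutoff) ifelse(prob > cutoff, 1, 0)

# One column of predicted classes per cut-off
classes <- sapply(c(0.15, 0.20, 0.25), function(k) classify(prob, k))
colnames(classes) <- c("cut_0.15", "cut_0.20", "cut_0.25")
print(classes)
```

Raising the cut-off flags fewer applicants as defaulters, and rows without a prediction stay NA at every cut-off.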
OBSERVATIONS
SELECTING THE MODEL
INDEPENDENT VARIABLES
MODEL 1: loan_amnt, int_rate, annual_inc, age
MODEL 2: loan_amnt, int_rate, annual_inc, age, home_ownership
MODEL 3: loan_amnt, int_rate, grade (gradeB to gradeG), emp_length, home_ownership
         (home_ownershipOTHER, home_ownershipOWN, home_ownershipRENT), annual_inc, age

                                              MODEL 1    MODEL 2    MODEL 3
NUMBER OF STATISTICALLY SIGNIFICANT TERMS
(p < 0.05, INCLUDING THE INTERCEPT)           3          4          10
MEDIAN DEVIANCE RESIDUAL                      -0.4331    -0.4321    -0.4312
AIC                                           13236      13235      12667
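So, the third model is better than the other two. Note that Model 3 was fitted on fewer rows (2617
observations dropped for missingness, against 2043 for the other two), so its AIC is not strictly
comparable with theirs.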
SELECTING THE CUT-OFF
Setting the cutoff at .x means that an applicant whose predicted probability of default exceeds x% is
classified as a defaulter (1), and as a non-defaulter (0) otherwise.
A confusion matrix is a table used to describe the performance of a classification model on a set of test
data for which the true values are known.
Its general structure (rows are actual classes, columns are predicted classes) is:
              Predicted 0        Predicted 1
Actual 0      True negative      False positive
Actual 1      False negative     True positive
The accuracy of a model is computed as (true positives + true negatives) / number of rows in the test data.
confmat1 (cut-off .15)
     cutoff1
         0     1
  0   4494  1173
  1    446   256
Accuracy: 65.31%

confmat2 (cut-off .20)
     cutoff2
         0     1
  0   5363   304
  1    614    88
Accuracy: 74.94%

confmat3 (cut-off .25)
     cutoff3
         0     1
  0   5605    62
  1    674    28
Accuracy: 77.45%
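The accuracy figures can be re-derived from the counts in the three matrices. In the sketch below, `n_test` assumes the test set is the 25% left over from the 29092 rows; it is larger than the number of rows in each matrix because rows with missing predictors get no prediction and are dropped by table(). Sensitivity and specificity are added only as an illustration of what else the matrices contain.

```r
# Counts copied from the three confusion matrices
# (rows: actual loan_status, columns: predicted class)
make_cm <- function(counts) {
  matrix(counts, nrow = 2,
         dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))
}
confmats <- list(
  "0.15" = make_cm(c(4494, 446, 1173, 256)),
  "0.20" = make_cm(c(5363, 614, 304, 88)),
  "0.25" = make_cm(c(5605, 674, 62, 28))
)

# Assumed test-set size: rows left after a 75% training split
n_test <- 29092 - floor(0.75 * 29092)

metrics <- t(sapply(confmats, function(m) c(
  reported_accuracy = sum(diag(m)) / n_test,  # denominator used in the report
  scored_accuracy   = sum(diag(m)) / sum(m),  # only rows that received a prediction
  sensitivity       = m["1", "1"] / sum(m["1", ]),  # defaulters correctly flagged
  specificity       = m["0", "0"] / sum(m["0", ])   # non-defaulters correctly passed
)))
print(round(metrics, 4))
```

Dividing by the number of scored rows gives a higher accuracy than the reported one at every cut-off, and the share of actual defaulters that is flagged falls as the cut-off rises.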
QUALITATIVE ANALYSIS OF THE RESULTS
DIRECT AND INVERSE VARIATIONS
The coefficients of the following are positive:
loan_amnt, int_rate, gradeB, gradeC, gradeD, gradeE, gradeF, gradeG, emp_length,
home_ownershipOTHER
This means the probability of defaulting on the given credit varies directly with these factors, i.e. the
higher the value, the higher the risk of default. Common sense suggests the same.
The grade and home_ownership terms are dummy variables, so their coefficients are measured relative to
the omitted reference category. For the OTHER type of home ownership (other than owning or renting),
the probability of defaulting is higher than for the reference category.
The following have negative coefficients:
home_ownershipOWN, home_ownershipRENT, annual_inc, age
This means that the probability of defaulting decreases as annual_inc and age increase, and is lower
for the OWN and RENT categories than for the reference category. Intuitively too, it makes sense.
LEVEL OF SIGNIFICANCE
Variables with at least one star in the coefficients table are significant at the 5% level (p < 0.05).
A positive coefficient means that the higher the value of that variable, the higher the risk of default,
and vice versa. The significance levels are determined using standard z-tests on the coefficients.
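The z value and p-value in the coefficients table can be reproduced by hand. The sketch below uses the int_rate row of Model 3 from the appendix; the odds ratio is an added illustration of how to read the size of a coefficient.

```r
# int_rate row of Model 3, copied from summary(result2) in the appendix
estimate  <- 8.519e-02
std_error <- 2.314e-02

z_value    <- estimate / std_error     # z statistic: estimate over standard error
p_value    <- 2 * pnorm(-abs(z_value)) # two-sided p-value from the standard normal
odds_ratio <- exp(estimate)            # multiplicative change in odds per unit of int_rate

print(c(z = z_value, p = p_value, odds_ratio = odds_ratio))
```

Small differences from the printed table can arise because the estimate and standard error are rounded in the output.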
LIMITATIONS OF THE MODEL
Reject Inference: The data held by banks covers only applicants who were granted credit, so it is
inherently biased by the exclusion of rejected applications and isn't a true representation of a client
who comes through the door. Stratified sampling can help take care of this.
Omitted Variable bias: This can never be fully eliminated from any type of regression, because of the
uncertainties in the real world.
In logistic regression, no assumptions are made about the distribution of the explanatory variables, and
the model used here assumes the absence of a high degree of interaction between them.
Over fitting: Logistic regression sometimes tends to over-fit the sample, appearing to be more confident
than it really is. In this case it is acceptable, but in other cases it might be undesirable.
CONCLUSION
• Three logit models were used to predict the loan status, and the model with the least residual error
was selected. The first model had an Akaike information criterion score of 13236, the second 13235 and
the third 12667, a significant improvement over the other two. Hence the most precise model was
selected.
• Different cut-offs were used to decide whether the loan should be granted or not: a cut-off of .15 gave
an accuracy of 65.31%, a cut-off of .20 gave 74.94% and a cut-off of .25 gave 77.45%. Hence the most
accurate cut-off was chosen. The decision to set a cut-off is arbitrary, and a higher cut-off increases
the risk, so a level of .25 was decided to be optimum. The area under the curve also gives a measure of
accuracy, which came out to be approximately 64%.
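The area under the ROC curve can be computed without extra packages through its rank (Mann-Whitney) form. The sketch below uses four made-up observations; `auc_rank` is a hypothetical helper, not the routine used for the 64% figure.

```r
# Hypothetical helper: AUC as the probability that a randomly chosen defaulter
# receives a higher score than a randomly chosen non-defaulter
auc_rank <- function(labels, scores) {
  r <- rank(scores)            # average ranks, so ties count as one half
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Made-up labels and predicted probabilities
labels <- c(0, 0, 1, 1)
scores <- c(0.1, 0.4, 0.35, 0.8)
auc <- auc_rank(labels, scores)
print(auc)
```

Unlike accuracy, this measure does not depend on the choice of cut-off.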
APPENDIX
R CODE
data1 <- readRDS("Loandata.rds")  # reading data
head(data1)                       # showing the first few rows of the dataset
# splitting the rows 75/25 into training and test data
train_idx <- sample(nrow(data1), floor(0.75 * nrow(data1)))
traindata <- data1[train_idx, ]   # preparing training data
testdata <- data1[-train_idx, ]   # preparing test data (the remaining rows)
# model 1 with loan amount, interest rate, annual income, age
result <- glm(loan_status ~ loan_amnt + int_rate + annual_inc + age,
              family = "binomial", data = traindata)
summary(result)
# model 2 with loan amount, interest rate, annual income, age and home ownership
result1 <- glm(loan_status ~ loan_amnt + int_rate + annual_inc + age + home_ownership,
               family = "binomial", data = traindata)
summary(result1)
# model 3 with loan amount, interest rate, grade, employment length, annual income, age,
# home ownership
result2 <- glm(loan_status ~ ., family = "binomial", data = traindata)
summary(result2)
# Least residual deviance
# predicting the probability of default on test data
pred1 <- predict(result, testdata, type = "response")
pred2 <- predict(result1, testdata, type = "response")
pred <- predict(result2, testdata, type = "response")
# Varying cut off for the best predictor on the model with least residual deviance
# a predicted probability above the cut-off is classed as default (1, declined),
# otherwise as non-default (0, accepted)
cutoff1 <- ifelse(pred > .15, 1, 0)  # cut-off .15
cutoff2 <- ifelse(pred > .2, 1, 0)   # cut-off .20
cutoff3 <- ifelse(pred > .25, 1, 0)  # cut-off .25
# confusion matrix (rows: actual status, columns: predicted class) to show Type 1 and 2 errors
confmat1 <- table(testdata$loan_status, cutoff1)
confmat1
confmat2 <- table(testdata$loan_status, cutoff2)
confmat2
confmat3 <- table(testdata$loan_status, cutoff3)
confmat3
# checking accuracy at the different cut-offs
# note: nrow(testdata) also counts rows whose prediction is NA (missing predictors),
# which table() drops; dividing by sum(confmat1) would count scored rows only
logit1 <- sum(diag(confmat1)) / nrow(testdata)
logit1
logit2 <- sum(diag(confmat2)) / nrow(testdata)
logit2
logit3 <- sum(diag(confmat3)) / nrow(testdata)
logit3
R CODE OUTPUT
summary(result)
Call:
glm(formula = loan_status ~ loan_amnt + int_rate + annual_inc +
age, family = "binomial", data = traindata)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.0794 -0.5334 -0.4331 -0.3421 3.7236
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.265e+00 1.400e-01 -23.318 <2e-16 ***
loan_amnt 1.762e-07 4.127e-06 0.043 0.966
int_rate 1.517e-01 7.257e-03 20.902 <2e-16 ***
annual_inc -6.935e-06 7.700e-07 -9.005 <2e-16 ***
age -5.271e-03 3.843e-03 -1.372 0.170
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 13800 on 19775 degrees of freedom
Residual deviance: 13226 on 19771 degrees of freedom
(2043 observations deleted due to missingness)
AIC: 13236
Number of Fisher Scoring iterations: 5
summary(result1)
Call:
glm(formula = loan_status ~ loan_amnt + int_rate + annual_inc +
age + home_ownership, family = "binomial", data = traindata)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.0816 -0.5339 -0.4321 -0.3420 3.7963
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.217e+00 1.442e-01 -22.311 <2e-16 ***
loan_amnt -2.785e-08 4.133e-06 -0.007 0.9946
int_rate 1.527e-01 7.329e-03 20.837 <2e-16 ***
annual_inc -7.265e-06 8.070e-07 -9.002 <2e-16 ***
age -5.120e-03 3.843e-03 -1.332 0.1828
home_ownershipOTHER 6.196e-01 3.072e-01 2.017 0.0437 *
home_ownershipOWN -1.487e-01 9.310e-02 -1.597 0.1103
home_ownershipRENT -6.259e-02 5.185e-02 -1.207 0.2274
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 13800 on 19775 degrees of freedom
Residual deviance: 13219 on 19768 degrees of freedom
(2043 observations deleted due to missingness)
AIC: 13235
Number of Fisher Scoring iterations: 5
summary(result2)
Call:
glm(formula = loan_status ~ ., family = "binomial", data = traindata)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.0905 -0.5315 -0.4312 -0.3321 3.7253
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.830e+00 2.166e-01 -13.066 < 2e-16 ***
loan_amnt 2.691e-07 4.230e-06 0.064 0.949276
int_rate 8.519e-02 2.314e-02 3.681 0.000232 ***
gradeB 3.390e-01 1.092e-01 3.104 0.001909 **
gradeC 5.366e-01 1.581e-01 3.394 0.000688 ***
gradeD 6.203e-01 2.010e-01 3.086 0.002031 **
gradeE 7.253e-01 2.507e-01 2.893 0.003819 **
gradeF 9.959e-01 3.345e-01 2.977 0.002911 **
gradeG 1.192e+00 4.401e-01 2.707 0.006783 **
emp_length 3.406e-03 3.718e-03 0.916 0.359671
home_ownershipOTHER 6.501e-01 3.085e-01 2.107 0.035129 *
home_ownershipOWN -1.740e-01 9.798e-02 -1.776 0.075728 .
home_ownershipRENT -5.825e-02 5.383e-02 -1.082 0.279175
annual_inc -6.929e-06 8.191e-07 -8.460 < 2e-16 ***
age -6.457e-03 3.963e-03 -1.629 0.103211
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 13214 on 19201 degrees of freedom
Residual deviance: 12637 on 19187 degrees of freedom
(2617 observations deleted due to missingness)
AIC: 12667
Number of Fisher Scoring iterations: 5
We then set different cut-offs to decide which loan applications are to be denied and which are to be accepted.
confmat1
cutoff1
0 1
0 4494 1173
1 446 256
confmat2
cutoff2
0 1
0 5363 304
1 614 88
confmat3
cutoff3
0 1
0 5605 62
1 674 28
Here, cutoff1=.15, cutoff2=.20 and cutoff3=.25.
The accuracies at the different cut-offs were:
logit1
[1] 0.6531005
logit2
[1] 0.7494844
logit3
[1] 0.7745085
*

More Related Content

What's hot

Prediction of potential customers for term deposit
Prediction of potential customers for term depositPrediction of potential customers for term deposit
Prediction of potential customers for term deposit
Pranov Mishra
 
Default of Credit Card Payments
Default of Credit Card PaymentsDefault of Credit Card Payments
Default of Credit Card Payments
Vikas Virani
 
Delopment and testing of a credit scoring model
Delopment and testing of a credit scoring modelDelopment and testing of a credit scoring model
Delopment and testing of a credit scoring model
Mattia Ciprian
 
Default Credit Card Prediction
Default Credit Card PredictionDefault Credit Card Prediction
Default Credit Card Prediction
Alexandre Pinto
 
Credit scorecard
Credit scorecardCredit scorecard
Credit scorecard
Tuhin AI Advisory
 
Credit Card Fraud Detection Using ML In Databricks
Credit Card Fraud Detection Using ML In DatabricksCredit Card Fraud Detection Using ML In Databricks
Credit Card Fraud Detection Using ML In Databricks
Databricks
 
Adaptive Machine Learning for Credit Card Fraud Detection
Adaptive Machine Learning for Credit Card Fraud DetectionAdaptive Machine Learning for Credit Card Fraud Detection
Adaptive Machine Learning for Credit Card Fraud Detection
Andrea Dal Pozzolo
 
Credit risk scoring model final
Credit risk scoring model finalCredit risk scoring model final
Credit risk scoring model final
Ritu Sarkar
 
AI powered decision making in banks
AI powered decision making in banksAI powered decision making in banks
AI powered decision making in banks
Pankaj Baid
 
Credit eda case study presentation
Credit eda case study presentation  Credit eda case study presentation
Credit eda case study presentation
DeboraJasmin S
 
Taiwanese Credit Card Client Fraud detection
Taiwanese Credit Card Client Fraud detectionTaiwanese Credit Card Client Fraud detection
Taiwanese Credit Card Client Fraud detection
Ravi Gupta
 
Model building in credit card and loan approval
Model building in credit card and loan approval Model building in credit card and loan approval
Model building in credit card and loan approval Venkata Reddy Konasani
 
FinTech, AI, Machine Learning in Finance
FinTech, AI, Machine Learning in FinanceFinTech, AI, Machine Learning in Finance
FinTech, AI, Machine Learning in Finance
Sanjiv Das
 
Credit card fraud detection using python machine learning
Credit card fraud detection using python machine learningCredit card fraud detection using python machine learning
Credit card fraud detection using python machine learning
Sandeep Garg
 
Lead scoring case study presentation
Lead scoring case study presentationLead scoring case study presentation
Lead scoring case study presentation
Mithul Murugaadev
 
Lead Scoring Case Study_Final.pptx
Lead Scoring Case Study_Final.pptxLead Scoring Case Study_Final.pptx
Lead Scoring Case Study_Final.pptx
RachnaGoel10
 
Data mining and analysis of customer churn dataset
Data mining and analysis of customer churn datasetData mining and analysis of customer churn dataset
Data mining and analysis of customer churn dataset
Rohan Choksi
 
Customer Churn Analysis and Prediction
Customer Churn Analysis and PredictionCustomer Churn Analysis and Prediction
Customer Churn Analysis and Prediction
SOUMIT KAR
 
Measuring and Managing Credit Risk With Machine Learning and Artificial Intel...
Measuring and Managing Credit Risk With Machine Learning and Artificial Intel...Measuring and Managing Credit Risk With Machine Learning and Artificial Intel...
Measuring and Managing Credit Risk With Machine Learning and Artificial Intel...
accenture
 

What's hot (20)

Prediction of potential customers for term deposit
Prediction of potential customers for term depositPrediction of potential customers for term deposit
Prediction of potential customers for term deposit
 
Default of Credit Card Payments
Default of Credit Card PaymentsDefault of Credit Card Payments
Default of Credit Card Payments
 
Delopment and testing of a credit scoring model
Delopment and testing of a credit scoring modelDelopment and testing of a credit scoring model
Delopment and testing of a credit scoring model
 
Default Credit Card Prediction
Default Credit Card PredictionDefault Credit Card Prediction
Default Credit Card Prediction
 
Credit scorecard
Credit scorecardCredit scorecard
Credit scorecard
 
Credit Card Fraud Detection Using ML In Databricks
Credit Card Fraud Detection Using ML In DatabricksCredit Card Fraud Detection Using ML In Databricks
Credit Card Fraud Detection Using ML In Databricks
 
Adaptive Machine Learning for Credit Card Fraud Detection
Adaptive Machine Learning for Credit Card Fraud DetectionAdaptive Machine Learning for Credit Card Fraud Detection
Adaptive Machine Learning for Credit Card Fraud Detection
 
Credit risk scoring model final
Credit risk scoring model finalCredit risk scoring model final
Credit risk scoring model final
 
AI powered decision making in banks
AI powered decision making in banksAI powered decision making in banks
AI powered decision making in banks
 
Credit eda case study presentation
Credit eda case study presentation  Credit eda case study presentation
Credit eda case study presentation
 
Taiwanese Credit Card Client Fraud detection
Taiwanese Credit Card Client Fraud detectionTaiwanese Credit Card Client Fraud detection
Taiwanese Credit Card Client Fraud detection
 
Model building in credit card and loan approval
Model building in credit card and loan approval Model building in credit card and loan approval
Model building in credit card and loan approval
 
FinTech, AI, Machine Learning in Finance
FinTech, AI, Machine Learning in FinanceFinTech, AI, Machine Learning in Finance
FinTech, AI, Machine Learning in Finance
 
report
reportreport
report
 
Credit card fraud detection using python machine learning
Credit card fraud detection using python machine learningCredit card fraud detection using python machine learning
Credit card fraud detection using python machine learning
 
Lead scoring case study presentation
Lead scoring case study presentationLead scoring case study presentation
Lead scoring case study presentation
 
Lead Scoring Case Study_Final.pptx
Lead Scoring Case Study_Final.pptxLead Scoring Case Study_Final.pptx
Lead Scoring Case Study_Final.pptx
 
Data mining and analysis of customer churn dataset
Data mining and analysis of customer churn datasetData mining and analysis of customer churn dataset
Data mining and analysis of customer churn dataset
 
Customer Churn Analysis and Prediction
Customer Churn Analysis and PredictionCustomer Churn Analysis and Prediction
Customer Churn Analysis and Prediction
 
Measuring and Managing Credit Risk With Machine Learning and Artificial Intel...
Measuring and Managing Credit Risk With Machine Learning and Artificial Intel...Measuring and Managing Credit Risk With Machine Learning and Artificial Intel...
Measuring and Managing Credit Risk With Machine Learning and Artificial Intel...
 

Similar to Credit risk modelling using logistic regression in R

Cr risk model
Cr risk modelCr risk model
Cr risk model
Tulsi Chandan
 
Neural Network Model
Neural Network ModelNeural Network Model
Neural Network ModelEric Esajian
 
A Review on Credit Card Default Modelling using Data Science
A Review on Credit Card Default Modelling using Data ScienceA Review on Credit Card Default Modelling using Data Science
A Review on Credit Card Default Modelling using Data Science
YogeshIJTSRD
 
Loan Approval Prediction Using Machine Learning
Loan Approval Prediction Using Machine LearningLoan Approval Prediction Using Machine Learning
Loan Approval Prediction Using Machine Learning
Souma Maiti
 
PBA.docx ( Credit Risk Analysis of loans )
PBA.docx ( Credit Risk Analysis of loans )PBA.docx ( Credit Risk Analysis of loans )
PBA.docx ( Credit Risk Analysis of loans )
Supriyasingh459171
 
Applying Convolutional-GRU for Term Deposit Likelihood Prediction
Applying Convolutional-GRU for Term Deposit Likelihood PredictionApplying Convolutional-GRU for Term Deposit Likelihood Prediction
Applying Convolutional-GRU for Term Deposit Likelihood Prediction
VandanaSharma356
 
Risk Based Loan Approval Framework
Risk Based Loan Approval FrameworkRisk Based Loan Approval Framework
Risk Based Loan Approval Framework
Ramkumar Ravichandran
 
LOAN APPROVAL PRDICTION SYSTEM USING MACHINE LEARNING.
LOAN APPROVAL PRDICTION SYSTEM USING MACHINE LEARNING.LOAN APPROVAL PRDICTION SYSTEM USING MACHINE LEARNING.
LOAN APPROVAL PRDICTION SYSTEM USING MACHINE LEARNING.
Souma Maiti
 
Credit scoring i financial sector
Credit scoring i financial  sector Credit scoring i financial  sector
Credit scoring i financial sector
Chandrasekhar Subramanyam
 
Improving the credit scoring model of microfinance
Improving the credit scoring model of microfinanceImproving the credit scoring model of microfinance
Improving the credit scoring model of microfinance
eSAT Publishing House
 
Credit risk assessment with imbalanced data sets using SVMs
Credit risk assessment with imbalanced data sets using SVMsCredit risk assessment with imbalanced data sets using SVMs
Credit risk assessment with imbalanced data sets using SVMs
IRJET Journal
 
6317ijite01
6317ijite016317ijite01
6317ijite01
IJITE
 
Big data analytics in financial market
Big data analytics in financial marketBig data analytics in financial market
Big data analytics in financial market
eSAT Journals
 
Machine Learning Regression Analysis of EDX 2012-13 Data for Identifying the ...
Machine Learning Regression Analysis of EDX 2012-13 Data for Identifying the ...Machine Learning Regression Analysis of EDX 2012-13 Data for Identifying the ...
Machine Learning Regression Analysis of EDX 2012-13 Data for Identifying the ...
IJITE
 
Data Analytics Using R - Report
Data Analytics Using R - ReportData Analytics Using R - Report
Data Analytics Using R - Report
Akanksha Gohil
 
PPT-SORAJ LAMSAL.pptx
PPT-SORAJ LAMSAL.pptxPPT-SORAJ LAMSAL.pptx
PPT-SORAJ LAMSAL.pptx
GitaGautam
 
Predicting an Applicant Status Using Principal Component, Discriminant and Lo...
Predicting an Applicant Status Using Principal Component, Discriminant and Lo...Predicting an Applicant Status Using Principal Component, Discriminant and Lo...
Predicting an Applicant Status Using Principal Component, Discriminant and Lo...
inventionjournals
 

Credit risk modelling using logistic regression in R

By:
Harsha Sinha (16125018)
Kriti Doneria (16125022)
Prakhar Barole (16125028)
CREDIT RISK MODELLING USING LOGISTIC REGRESSION
STATISTICAL METHODS FOR BUSINESS ANALYTICS PROJECT REPORT
MBA652A Course Instructor: Dr. Devlina Chatterjee April 2017

ACKNOWLEDGEMENTS
On completion of this project, we would like to thank our faculty, Dr. Devlina Chatterjee, for giving us the opportunity to pursue the project as a part of the curriculum and for being a constant source of support throughout the project.
We would also like to thank our classmates and friends, who helped us in the conceptualization of the problem statement.
Lastly, we thank all the researchers, bloggers and people from the community at large for providing us a starting point for our project through their documentation, research and articles.

TABLE OF CONTENTS
ACKNOWLEDGEMENTS
OBJECTIVE
INTRODUCTION
  CIBIL SCORE
METHODOLOGY
  LOGISTIC REGRESSION
    REGRESSION EQUATION
    ASSUMPTIONS IN LOGISTIC REGRESSION
TOOLS, TECHNOLOGIES AND DATASET
  TOOLS AND TECHNOLOGIES
  DATASET
    DATASET DESCRIPTION
MODELLING PROCESS, SELECTION AND FINE-TUNING
  PROCESS
  SELECTION
  FINE TUNING
OBSERVATIONS
  SELECTING THE MODEL
  SELECTING THE CUT-OFF
QUALITATIVE ANALYSIS OF THE RESULTS
  DIRECT AND INVERSE VARIATIONS
  LEVEL OF SIGNIFICANCE
LIMITATIONS OF THE MODEL
  Reject Inference
  Omitted Variable bias
  Over fitting
CONCLUSION
REFERENCES
APPENDIX
  R CODE
  R CODE OUTPUT

OBJECTIVE
To explore, qualitatively and quantitatively, the risks associated with giving out credit for personal and commercial purposes, and to model the risk factor using a widely used machine learning classification method: logistic regression.

INTRODUCTION
Credit risk modelling tries to answer the question: assuming past behavior is predictive of future behavior, what is the probability that a debtor will not repay the debt-holder?
The analysis of credit risk is of utmost importance for financial institutions. Historically, it was done by checking whether a borrower's net assets were enough to cover the debt. Being manual in nature, the process was prone to human biases and corruption. In the past two decades, technology has transformed and automated the process, making it easier to deal with the volume of debtors (for banks) as well as the variety of debt. A milestone has been the development of the CIBIL score in India.

CIBIL SCORE
A Credit Score or CIBIL Score is a three-digit numeric summary of your credit history. The score is derived using the credit history found in the CIR (Credit Information Report). A CIR is an individual's credit payment history across loan types and credit institutions over a period of time.
In general, credit scores range from 300 to 900. The minimum CIBIL score for a personal loan is generally 750. Anything above this would mean that the applicant is creditworthy, and such applications are processed without hassle.

METHODOLOGY
Model used: Standard logistic heteroskedastic robust regression model.

LOGISTIC REGRESSION
Logistic regression is the type of regression we use for a response variable (Y) that follows a binomial distribution.
- Y ~ Binomial(n, p)
- n independent trials
- p = probability of success on each trial
- Y = number of successes out of n trials (e.g., Y = number of heads)

REGRESSION EQUATION
$$p = \frac{\exp(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}{1 + \exp(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}$$
- $p$ is the probability of default
- $x_i$ is the explanatory factor $i$
- $\beta_i$ is the regression coefficient of the explanatory factor $i$
- $n$ is the number of explanatory variables

The reasons why logistic regression is better suited to credit risk analysis are:
1. The independent variables (credit type and duration, income, etc.) are categorical in nature. Categories make better predictors in this analysis than actual values.
2. The end result has to be a probability or percentage (for example, person A is x% likely to default on the given credit). A linear regression model cannot guarantee this, since its predicted values range over the whole number line.
3. The variability of the dependent variable (Y) is not constant. The variance of a binomial distribution is npq (with q = 1 - p), so it changes with p, whereas the linear regression model inherently assumes a normal distribution with constant standard deviation.

ASSUMPTIONS IN LOGISTIC REGRESSION
- Absence of perfect multicollinearity
- No outliers
- Independence of errors
- Ratio of cases to variables: using discrete variables requires that there are enough responses in every given category
- Not many missing values

TOOLS, TECHNOLOGIES AND DATASET
TOOLS AND TECHNOLOGIES
R scripting language, RStudio IDE for Windows.

DATASET
The dataset is taken from a bank's records on the status of loan defaults and the profile of customers. It contains information such as age, annual income, home ownership and grade of employee, which affect the loan-paying capacity of the customer.

DATASET DESCRIPTION
This data is taken from https://www.biz.uiowa.edu/faculty/jledolter/datamining/dataexercises.html
1. Contains 29092 rows and 8 columns.
2. Contains 2043 rows with missing data.
3. The columns are:
   loan_status: 0 if successful, 1 if defaulted
   loan_amnt: total amount of loan taken
   int_rate: interest rate
   grade: grade of employment
   emp_length: duration of employment
   home_ownership: type of ownership of house
   annual_inc: annual income
   age: age of loan taker
4. loan_status is a binary variable; loan_amnt, int_rate, annual_inc and age are numeric continuous variables; grade and home_ownership are categorical variables with 7 and 4 categories respectively.
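The regression equation given under METHODOLOGY can be evaluated directly in R. The following is a minimal sketch, not the authors' code: the coefficient values are copied from the summary(result) output for model 1 in the appendix, while the applicant's values and the helper name default_probability are hypothetical and chosen only for illustration.

```r
# Coefficients of model 1 as reported in the appendix output (summary(result)).
beta <- c(intercept  = -3.265e+00,
          loan_amnt  =  1.762e-07,
          int_rate   =  1.517e-01,
          annual_inc = -6.935e-06,
          age        = -5.271e-03)

# Hypothetical helper: p = exp(eta) / (1 + exp(eta)),
# where eta = beta_0 + beta_1 * x_1 + ... + beta_n * x_n.
default_probability <- function(beta, x) {
  eta <- beta[["intercept"]] + sum(beta[names(x)] * x)
  plogis(eta)  # base R logistic function, equal to exp(eta) / (1 + exp(eta))
}

# Hypothetical applicant (values invented for illustration).
applicant <- c(loan_amnt = 10000, int_rate = 12, annual_inc = 50000, age = 30)

p <- default_probability(beta, applicant)
print(round(p, 3))
```

Whatever values are supplied, the result stays strictly between 0 and 1, which is the property that reason 2 above relies on.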

MODELLING PROCESS, SELECTION AND FINE-TUNING
PROCESS
Three logistic regression models were built by including and excluding some of the independent variables. The dataset was divided into a training set (75%) and a testing set (25%). The objective of modelling was to minimize the residual deviance on the testing data, using the coefficients estimated from the training data.

SELECTION
Model selection was done on the basis of the lowest AIC (Akaike information criterion), the median deviance residual closest to zero, and the highest number of significant variables at the 5% significance level (95% confidence). Model 3 did well on all three parameters.
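The split-and-compare procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the data frame sim is simulated as a stand-in for the loan data, and the seed, variable names and coefficients used to generate it are arbitrary.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Simulated stand-in for the loan data (not the real dataset).
n <- 1000
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- rbinom(n, size = 1, prob = plogis(-1 + 0.8 * sim$x1))

# 75% / 25% split on row indices.
train_rows <- sample(n, size = floor(0.75 * n))
train <- sim[train_rows, ]
test  <- sim[-train_rows, ]

# Two candidate models, fitted on the training rows only.
m1 <- glm(y ~ x1,      family = binomial, data = train)
m2 <- glm(y ~ x1 + x2, family = binomial, data = train)

# For a 0/1 response, AIC = residual deviance + 2 * (number of coefficients).
comparison <- data.frame(
  model    = c("m1", "m2"),
  deviance = c(deviance(m1), deviance(m2)),
  aic      = c(AIC(m1), AIC(m2))
)
print(comparison)
```

The same relation is visible in the appendix output for model 1, where a residual deviance of 13226 and 5 coefficients give an AIC of 13236.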

FINE TUNING
The predictions obtained for the test dataset were decimal values (probabilities). To make them categorical, different cut-off limits were tried, and an accuracy of 77.4% was reached. To avoid over-fitting and to limit the potential loss of profit, cut-offs were not increased beyond this limit.

OBSERVATIONS
SELECTING THE MODEL
INDEPENDENT VARIABLES
MODEL 1: loan_amnt, int_rate, annual_inc, age
MODEL 2: loan_amnt, int_rate, annual_inc, age, home_ownership
MODEL 3: loan_amnt, int_rate, gradeB, gradeC, gradeD, gradeE, gradeF, gradeG, emp_length, home_ownershipOTHER, home_ownershipOWN, home_ownershipRENT

                                                          MODEL 1    MODEL 2    MODEL 3
NUMBER OF STATISTICALLY SIGNIFICANT INDEPENDENT
VARIABLES (at least at the .05 level)                     3          4          10
MEDIAN DEVIANCE RESIDUALS                                 -0.4331    -0.4321    -0.4312
AIC                                                       13236      13235      12667

So, the third model is better than the other two.

SELECTING THE CUT-OFF
Setting the cutoff at .x means that an applicant is classified as a defaulter when the predicted probability of default exceeds x%.
A confusion matrix is a table used to describe the performance of a classification model on a set of test data for which the true values are known. Its general structure is a two-by-two table of actual classes (rows) against predicted classes (columns): true negatives and false positives in the first row, false negatives and true positives in the second.
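Applying a cutoff and tabulating the result can be sketched in a few lines of R. The vectors actual and prob below are made-up values, and classify and confusion are hypothetical helper names; only ifelse() and table() mirror what the appendix code uses.

```r
# Made-up outcomes (1 = default) and predicted default probabilities.
actual <- c(0, 0, 0, 1, 1, 0, 1, 0)
prob   <- c(0.05, 0.18, 0.30, 0.22, 0.12, 0.10, 0.40, 0.16)

# Hypothetical helper: predicted class is 1 (default) when prob exceeds the cutoff.
classify <- function(prob, cutoff) ifelse(prob > cutoff, 1, 0)

# Hypothetical helper: rows are actual classes, columns are predicted classes.
confusion <- function(actual, predicted) {
  table(actual    = factor(actual, levels = c(0, 1)),
        predicted = factor(predicted, levels = c(0, 1)))
}

for (cutoff in c(0.15, 0.20, 0.25)) {
  cm <- confusion(actual, classify(prob, cutoff))
  cat("cutoff", cutoff, "accuracy", sum(diag(cm)) / sum(cm), "\n")
  print(cm)
}

confmat <- confusion(actual, classify(prob, 0.15))
```

Fixing the factor levels keeps the table two-by-two even when a cutoff assigns every case to the same class.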
The accuracy of a model is computed as (true positives + true negatives) / number of rows in the test data. Here the number of rows is 7273, which includes test rows that have no prediction because of missing values and therefore do not appear in the matrices. In each matrix, rows are the actual loan_status and columns are the predicted class.

confmat1  # cutoff .15
   cutoff1
       0    1
  0 4494 1173
  1  446  256
Accuracy: 65.31%

confmat2  # cutoff .20
   cutoff2
       0    1
  0 5363  304
  1  614   88
Accuracy: 74.94%

confmat3  # cutoff .25
   cutoff3
       0    1
  0 5605   62
  1  674   28
Accuracy: 77.45%

QUALITATIVE ANALYSIS OF THE RESULTS
DIRECT AND INVERSE VARIATIONS
The coefficients of the following are positive:
loan_amnt, int_rate, gradeB, gradeC, gradeD, gradeE, gradeF, gradeG, emp_length, home_ownershipOTHER
This means the probability of defaulting on the given credit varies directly with these factors, i.e. the higher the value, the higher the risk of the credit not being repaid. Common sense suggests the same.
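The accuracy figures reported for the three cutoffs can be re-derived from the confusion matrices. The sketch below is not part of the original analysis: the counts are copied from the matrices in this section, the test-set size is inferred from the row counts in the report and the appendix output, and make_confmat and the metric names are hypothetical.

```r
# Hypothetical helper: rows = actual loan_status, columns = predicted class.
make_confmat <- function(tn, fp, fn, tp) {
  matrix(c(tn, fp, fn, tp), nrow = 2, byrow = TRUE,
         dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))
}

# Counts copied from confmat1, confmat2 and confmat3.
confmats <- list(
  "0.15" = make_confmat(4494, 1173, 446, 256),
  "0.20" = make_confmat(5363,  304, 614,  88),
  "0.25" = make_confmat(5605,   62, 674,  28)
)

# Inferred test-set size: 29092 rows in total minus the 21819 training rows
# (19776 fitted + 2043 dropped for missing values in the appendix output).
n_test <- 29092 - 21819

metrics <- t(sapply(confmats, function(m) {
  c(accuracy_all_rows    = sum(diag(m)) / n_test,   # denominator used in the report
    accuracy_scored_rows = sum(diag(m)) / sum(m),   # only rows that were classified
    sensitivity          = m["1", "1"] / sum(m["1", ]),  # defaulters correctly flagged
    specificity          = m["0", "0"] / sum(m["0", ]))  # non-defaulters correctly passed
}))
print(round(100 * metrics, 2))
```

Sensitivity and specificity are shown alongside accuracy because defaults are the minority class; at the .25 cutoff, for instance, 28 of the 702 actual defaulters in the matrix are flagged.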
For the OTHER type of home ownership (other than home or rent, like a demolished/mortgaged home), the probability of defaulting increases relative to the baseline category.
The following have negative coefficients:
home_ownershipOWN, home_ownershipRENT, annual_inc, age
This means that the probability of defaulting varies inversely with the factors mentioned above. Intuitively too, it makes perfect sense.

LEVEL OF SIGNIFICANCE
Variables having at least one star in the coefficients table are significant (p < 0.05). A positive coefficient means the higher the value of that variable, the higher the risk of default, and vice versa. The significance levels are determined using standard z-tests.

LIMITATIONS OF THE MODEL
Reject Inference
The data given by banks covers only the applications that were accepted. Rejected applications are absent, so the data isn't a true representation of the clients who come through the door. Stratified sampling can help take care of this.
Omitted Variable bias
Omitted variable bias can never be fully eliminated from any type of regression, because of the uncertainties in the real world. Logistic regression makes no assumptions about a linear distribution of the variables or about the absence of high-degree interactions between the explanatory variables.
Over fitting
Logistic regression sometimes tends to over-fit the sample, appearing to be more confident than it really is. In this case it is fine, but in other cases it might be undesirable.

CONCLUSION
- Three logit models were used to predict the loan status, and the model with the least residual error was selected. The first model had an Akaike information criterion score of 13236, the second 13235 and the third 12667, a significant improvement over the other two. Hence the third model, the most precise, was selected.
- Different cut-offs were used to decide whether the loan should be granted or not: a cut-off of .15 gave an accuracy of 65.31%, .20 gave 74.94% and .25 gave 77.45%. Hence the most accurate cut-off was chosen. The decision to set a cutoff is arbitrary, and a higher cut-off increases the risk, so a level of .25 was decided to be optimum. The area under the curve also gives a measure of accuracy; it came out to be approximately 64%.
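The report does not show how the area under the curve was obtained. One common way to compute it, sketched here as an assumption rather than as the authors' method, is the rank-based formula: the AUC equals the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter. The helper name auc_rank and the example vectors are hypothetical.

```r
# Hypothetical helper: rank-based AUC; ties receive average ranks.
auc_rank <- function(actual, score) {
  keep   <- !is.na(actual) & !is.na(score)  # drop rows without a prediction
  actual <- actual[keep]
  score  <- score[keep]
  n_pos  <- sum(actual == 1)
  n_neg  <- sum(actual == 0)
  r      <- rank(score)
  (sum(r[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Made-up outcomes (1 = default) and predicted default probabilities.
actual <- c(0, 0, 0, 1, 1, 0, 1, 0)
score  <- c(0.05, 0.18, 0.30, 0.22, 0.12, 0.10, 0.40, 0.16)

auc <- auc_rank(actual, score)
print(auc)
```

Unlike accuracy, this measure does not depend on the choice of cutoff, so it complements the cutoff comparison above. With the appendix objects it would be called as auc_rank(testdata$loan_status, pred).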
  • 12. MBA652A Course Instructor: Dr. Devlina Chatterjee April 2017 APPENDIX R CODE data1<- readRDS("Loandata.rds") #reading data head(data1) #reading the first few lines off the dataset traindata<- sample(data1,0.75*nrow(data1))#preparing training data testdata<-sample(data1,-.75*nrow(data1))#preparing test data #model 1 with loan amount, interest rate, annual income, age result<- glm(formula=loan_status~loan_amnt+int_rate+annual_inc+age,family="binomial",data=traindata) summary(result) #model 2 with loan amount, interest amount, annual income, age and home ownership result1<- glm(formula=loan_status~loan_amnt+int_rate+annual_inc+age+home_ownership,family="binomial",da ta=traindata) summary(result1) #model 3 with loan amount, interest rate, grade, employment length, annual income, age, home ownership result2<-glm(loan_status~.,family="binomial",data=traindata) summary (result2) #Least residual deviance #predicting the result on test data pred1<-predict(result,testdata,type="response") pred2<-predict(result1,testdata,type="response") pred<-predict(result2,testdata,type="response") #Varying cut off for the best predictor on the model with least residual deviance #at if value below .15 then it is declined else excepted cutoff1<-ifelse(pred>.15,1,0) #at if value below .2 then it is declined else excepted cutoff2<-ifelse(pred>.2,1,0) #at if value below .25 then it is declined else excepted cutoff3<-ifelse(pred>.25,1,0) #confusion matrix to show Type 1 and 2 errors confmat1<-table(testdata$loan_status,cutoff1) confmat1 confmat2<-table(testdata$loan_status,cutoff2) confmat2
confmat3 <- table(testdata$loan_status, cutoff3)
confmat3
# accuracy at each cutoff
# note: nrow(testdata) also counts test rows with missing predictors, which table() leaves out of the confusion matrix
logit1 <- sum(diag(confmat1)) / nrow(testdata)
logit1
logit2 <- sum(diag(confmat2)) / nrow(testdata)
logit2
logit3 <- sum(diag(confmat3)) / nrow(testdata)
logit3

R CODE OUTPUT
summary(result)
Call:
glm(formula = loan_status ~ loan_amnt + int_rate + annual_inc + age, family = "binomial", data = traindata)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.0794  -0.5334  -0.4331  -0.3421   3.7236

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.265e+00  1.400e-01 -23.318   <2e-16 ***
loan_amnt    1.762e-07  4.127e-06   0.043    0.966
int_rate     1.517e-01  7.257e-03  20.902   <2e-16 ***
annual_inc  -6.935e-06  7.700e-07  -9.005   <2e-16 ***
age         -5.271e-03  3.843e-03  -1.372    0.170
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 13800 on 19775 degrees of freedom
Residual deviance: 13226 on 19771 degrees of freedom
  (2043 observations deleted due to missingness)
AIC: 13236

Number of Fisher Scoring iterations: 5
summary(result1)
Call:
glm(formula = loan_status ~ loan_amnt + int_rate + annual_inc + age + home_ownership, family = "binomial", data = traindata)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.0816  -0.5339  -0.4321  -0.3420   3.7963

Coefficients:
                      Estimate Std. Error z value Pr(>|z|)
(Intercept)         -3.217e+00  1.442e-01 -22.311   <2e-16 ***
loan_amnt           -2.785e-08  4.133e-06  -0.007   0.9946
int_rate             1.527e-01  7.329e-03  20.837   <2e-16 ***
annual_inc          -7.265e-06  8.070e-07  -9.002   <2e-16 ***
age                 -5.120e-03  3.843e-03  -1.332   0.1828
home_ownershipOTHER  6.196e-01  3.072e-01   2.017   0.0437 *
home_ownershipOWN   -1.487e-01  9.310e-02  -1.597   0.1103
home_ownershipRENT  -6.259e-02  5.185e-02  -1.207   0.2274
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 13800 on 19775 degrees of freedom
Residual deviance: 13219 on 19768 degrees of freedom
  (2043 observations deleted due to missingness)
AIC: 13235

Number of Fisher Scoring iterations: 5

summary(result2)
Call:
glm(formula = loan_status ~ ., family = "binomial", data = traindata)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.0905  -0.5315  -0.4312  -0.3321   3.7253

Coefficients:
                      Estimate Std. Error z value Pr(>|z|)
(Intercept)         -2.830e+00  2.166e-01 -13.066  < 2e-16 ***
loan_amnt            2.691e-07  4.230e-06   0.064 0.949276
int_rate             8.519e-02  2.314e-02   3.681 0.000232 ***
gradeB               3.390e-01  1.092e-01   3.104 0.001909 **
gradeC               5.366e-01  1.581e-01   3.394 0.000688 ***
gradeD               6.203e-01  2.010e-01   3.086 0.002031 **
gradeE               7.253e-01  2.507e-01   2.893 0.003819 **
gradeF               9.959e-01  3.345e-01   2.977 0.002911 **
gradeG               1.192e+00  4.401e-01   2.707 0.006783 **
emp_length           3.406e-03  3.718e-03   0.916 0.359671
home_ownershipOTHER  6.501e-01  3.085e-01   2.107 0.035129 *
home_ownershipOWN   -1.740e-01  9.798e-02  -1.776 0.075728 .
home_ownershipRENT  -5.825e-02  5.383e-02  -1.082 0.279175
annual_inc          -6.929e-06  8.191e-07  -8.460  < 2e-16 ***
age                 -6.457e-03  3.963e-03  -1.629 0.103211
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 13214 on 19201 degrees of freedom
Residual deviance: 12637 on 19187 degrees of freedom
  (2617 observations deleted due to missingness)
AIC: 12667

Number of Fisher Scoring iterations: 5
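Because the model is fitted on the log-odds scale, a coefficient is often easier to read after exponentiating it into an odds ratio. The sketch below does this for three estimates copied by hand from the summary(result2) output; the vector name `coefs` and the 10,000-unit income step are choices made here for illustration only.

```r
# Estimates copied from the summary(result2) output (log-odds scale)
coefs <- c(int_rate = 8.519e-02, gradeG = 1.192e+00, annual_inc = -6.929e-06)

# Odds ratio for a one-unit increase in each predictor
odds_ratio <- exp(coefs)

# annual_inc is measured in single currency units, so a larger step
# (here 10,000 units, an arbitrary choice) is easier to interpret
or_income_10k <- exp(10000 * coefs[["annual_inc"]])

round(odds_ratio, 3)
round(or_income_10k, 3)
```

Read this way, each additional percentage point of interest rate multiplies the estimated odds of default by roughly 1.09, holding the other predictors fixed. With the fitted object available, `exp(coef(result2))` would give the same ratios for every term.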
We then set different cutoffs to decide which loan applications should be denied and which should be accepted. In each matrix the rows are the actual loan status and the columns are the predicted status.

confmat1
   cutoff1
       0    1
  0 4494 1173
  1  446  256

confmat2
   cutoff2
       0    1
  0 5363  304
  1  614   88

confmat3
   cutoff3
       0    1
  0 5605   62
  1  674   28

Here, cutoff1 = 0.15, cutoff2 = 0.20 and cutoff3 = 0.25.

The accuracies at the different cutoffs were:

logit1
[1] 0.6531005
logit2
[1] 0.7494844
logit3
[1] 0.7745085
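The confusion matrices support more than a single accuracy figure. The sketch below re-enters the reported counts by hand and computes, for each cutoff, the number of classified cases, the accuracy among those cases, the sensitivity (share of actual defaults flagged) and the specificity (share of non-defaults accepted). The helper names `make_cm` and `metrics` are hypothetical. Each matrix sums to 6369 cases, so accuracy computed this way (about 0.746, 0.856 and 0.884) is higher than the figures reported above, which divide by nrow(testdata); that denominator appears to include test rows for which no prediction was available.

```r
# Build a 2x2 matrix with rows = actual status, columns = predicted status
make_cm <- function(tn, fp, fn, tp) {
  matrix(c(tn, fn, fp, tp), nrow = 2,
         dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))
}

# Counts copied from the reported confusion matrices
cms <- list(
  "0.15" = make_cm(tn = 4494, fp = 1173, fn = 446, tp = 256),
  "0.20" = make_cm(tn = 5363, fp = 304,  fn = 614, tp = 88),
  "0.25" = make_cm(tn = 5605, fp = 62,   fn = 674, tp = 28)
)

metrics <- function(cm) {
  c(n           = sum(cm),
    accuracy    = sum(diag(cm)) / sum(cm),
    sensitivity = cm["1", "1"] / sum(cm["1", ]),
    specificity = cm["0", "0"] / sum(cm["0", ]))
}

res <- t(sapply(cms, metrics))  # one row per cutoff
round(res, 4)
```

Sensitivity falls from about 0.36 at a cutoff of 0.15 to about 0.04 at 0.25, so the gain in accuracy at higher cutoffs comes with far fewer actual defaults being caught.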