This document describes using logistic regression to model credit risk. It discusses CIBIL scores, the methodology of logistic regression including the regression equation and assumptions. It details the tools, technologies and dataset used which contains loan applicant information. The modeling process is described including variable selection, fine tuning the model, and observations around selecting the best model and cut-off value. Limitations of the model and conclusions are also summarized.
Credit risk modelling using logistic regression in R
CREDIT RISK MODELLING USING LOGISTIC REGRESSION
STATISTICAL METHODS FOR BUSINESS ANALYTICS PROJECT REPORT
By:
Harsha Sinha (16125018)
Kriti Doneria (16125022)
Prakhar Barole (16125028)
MBA652A | Course Instructor: Dr. Devlina Chatterjee | April 2017
ACKNOWLEDGEMENTS
On completion of this project, we would like to thank our faculty, Dr. Devlina Chatterjee, for
giving us the opportunity to pursue this project as part of the curriculum and for being a
constant source of support throughout.
We would also like to thank our classmates and friends, who helped us conceptualize the
problem statement.
Lastly, we thank the researchers, bloggers and the community at large whose documentation,
research and articles gave us a starting point for our project.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
OBJECTIVE
INTRODUCTION
CIBIL SCORE
METHODOLOGY
LOGISTIC REGRESSION
REGRESSION EQUATION
ASSUMPTIONS IN LOGISTIC REGRESSION
TOOLS, TECHNOLOGIES AND DATASET
TOOLS AND TECHNOLOGIES
DATASET
DATASET DESCRIPTION
MODELLING PROCESS, SELECTION AND FINE-TUNING
PROCESS
SELECTION
FINE TUNING
OBSERVATIONS
SELECTING THE MODEL
SELECTING THE CUT-OFF
QUALITATIVE ANALYSIS OF THE RESULTS
DIRECT AND INVERSE VARIATIONS
LEVEL OF SIGNIFICANCE
LIMITATIONS OF THE MODEL
Reject Inference
Omitted Variable Bias
Overfitting
CONCLUSION
REFERENCES
APPENDIX
R CODE
R CODE OUTPUT
OBJECTIVE
To explore, qualitatively and quantitatively, the risks associated with extending credit for personal and
commercial purposes, and to model the risk factor using a widely used machine learning classification
method: logistic regression.
INTRODUCTION
Credit risk modelling tries to answer the question:
Assuming past behavior is predictive of future behavior, what is the probability that a
debtor will not repay the debt-holder?
The analysis of credit risk is of utmost importance for financial institutions. Historically, it was done by
checking whether a borrower's net assets were enough to cover the debt. Being manual in nature, this
process was prone to human biases and corruption. In the past two decades, technology has
transformed and automated the process, making it easier to handle both the volume of debtors (for
banks) and the variety of debt.
A milestone has been the development of CIBIL score in India.
CIBIL SCORE
A Credit Score, or CIBIL Score, is a three-digit numeric summary of your credit history. The score is
derived from the Credit Information Report (CIR): an individual's credit payment history across loan
types and credit institutions over a period of time. In general, credit scores range from 300 to 900. The
minimum CIBIL score for a personal loan is generally 750; anything above this means the applicant is
considered creditworthy, and applications are processed without hassle.
METHODOLOGY
Model used: a standard logistic regression model (with heteroskedasticity-robust inference).
LOGISTIC REGRESSION
Logistic regression is the type of regression we use for a response variable (Y) that follows a binomial
distribution.
Y ~ Binomial(n, p), where
n = number of independent trials
p = probability of success on each trial
Y = number of successes out of n trials (e.g., Y = number of heads)
REGRESSION EQUATION
p = exp(β0 + β1·x1 + ⋯ + βn·xn) / (1 + exp(β0 + β1·x1 + ⋯ + βn·xn))
p is the probability of default
xi is the explanatory factor i
βi is the regression coefficient of the explanatory factor i
n is the number of explanatory variables
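The equation above is the standard logistic (sigmoid) transform of a linear predictor. A minimal Python sketch (illustrative only; the report's own code, in the appendix, is in R):

```python
import math

def default_probability(beta0, betas, xs):
    """Logistic regression probability: p = exp(z) / (1 + exp(z)),
    where z = beta0 + sum of beta_i * x_i."""
    z = beta0 + sum(b * x for b, x in zip(betas, xs))
    return math.exp(z) / (1.0 + math.exp(z))

# With all coefficients zero, z = 0 and p = 0.5
print(default_probability(0.0, [0.0], [1.0]))  # 0.5
```

Because of the exponentials, the output is always strictly between 0 and 1, which is exactly what a default probability requires.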
The reasons why logistic regression is better suited to credit risk analysis are:
1. The independent variables (credit type and duration, income, etc.) are largely categorical in
nature, and categories make better predictors in this analysis than raw values.
2. The end result has to be a probability or percentage (e.g., person A is x% likely to default
on the given credit), which a linear regression model cannot guarantee, since its predictions
can take any value on the real line.
3. The variability of the dependent variable (Y) is not constant. The variance of a binomial
distribution is np(1 − p), which changes with p, whereas the linear regression model
inherently assumes a constant (normal) error variance.
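To see why that variance is not constant, note that Var(Y) = np(1 − p) peaks at p = 0.5 and shrinks toward the extremes. A quick illustrative check:

```python
def binomial_variance(n, p):
    # Var(Y) = n * p * (1 - p): largest at p = 0.5, shrinking toward 0 and 1
    return n * p * (1 - p)

for p in (0.1, 0.5, 0.9):
    print(p, round(binomial_variance(100, p), 4))
```

For n = 100 this gives 9, 25, and 9: a borrower pool with default probability near 0.5 is far noisier than one near 0.1 or 0.9, violating the homoskedasticity assumed by linear regression.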
ASSUMPTIONS IN LOGISTIC REGRESSION
Absence of perfect multicollinearity
No influential outliers
Independence of errors
Adequate ratio of cases to variables: using discrete variables requires enough responses in
every given category
Not many missing values
TOOLS, TECHNOLOGIES AND DATASET:
TOOLS AND TECHNOLOGIES
R scripting Language, RStudio IDE for Windows.
DATASET
The dataset is a bank's record of loan default status and customer profiles. It contains information such
as age, annual income, home ownership, and employment grade, all of which affect a customer's
capacity to repay a loan.
DATASET DESCRIPTION
This data is taken from https://www.biz.uiowa.edu/faculty/jledolter/datamining/dataexercises.html
1. Contains 29092 rows and 8 columns.
2. Contains 2043 rows with missing data.
3. The columns are:
loan_status: 0 if repaid successfully, 1 if defaulted
loan_amnt: total amount of the loan taken
int_rate: interest rate
grade: grade of employment
emp_length: duration of employment
home_ownership: type of home ownership
annual_inc: annual income
age: age of the borrower
4. loan_status is a binary variable; loan_amnt, int_rate, annual_inc, and age are numeric
continuous variables; grade and home_ownership are categorical variables with 7 and 4
categories respectively.
MODELLING PROCESS, SELECTION AND FINE-TUNING
PROCESS
Three logistic regression models were built by including and excluding independent variables.
The dataset was divided into training (75%) and testing (25%) sets. The objective of modelling was to
minimize the residual deviance on the testing data, using coefficients estimated from the training
data.
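Residual deviance is -2 times the Bernoulli log-likelihood of the observed outcomes under the predicted probabilities. A hypothetical Python sketch of the computation (the report obtains it via R's glm):

```python
import math

def residual_deviance(y_true, p_hat):
    """Deviance = -2 * sum of Bernoulli log-likelihoods."""
    ll = sum(y * math.log(p) + (1 - y) * math.log(1 - p)
             for y, p in zip(y_true, p_hat))
    return -2.0 * ll

# An uninformative model predicting p = 0.5 for 4 observations
# contributes 2*ln(2) per row, i.e. about 5.545 in total
print(round(residual_deviance([0, 1, 0, 1], [0.5] * 4), 3))  # 5.545
```

Lower deviance means the fitted probabilities sit closer to the observed 0/1 outcomes, which is why it serves as the selection objective here.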
SELECTION
Model selection was done on the basis of the lowest AIC (Akaike information criterion), the lowest
median residual deviance, and the highest number of variables significant at the 0.05 level or better.
Model 3 did well on all three parameters.
FINE TUNING
The predictions obtained on the test dataset were probabilities. To make them categorical, different
cut-off limits were applied, and an accuracy of 77.45% was reached. To avoid over-fitting and the
potential loss of profit, the cut-off was not increased beyond this limit.
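The cut-off step mirrors the `ifelse(pred > cutoff, 1, 0)` calls in the appendix R code; an equivalent Python sketch:

```python
def classify(probs, cutoff):
    # Predicted default probability above the cut-off -> classified as default (1)
    return [1 if p > cutoff else 0 for p in probs]

preds = [0.10, 0.18, 0.22, 0.40]
print(classify(preds, 0.15))  # [0, 1, 1, 1]
print(classify(preds, 0.25))  # [0, 0, 0, 1]
```

Raising the cut-off flags fewer applicants as likely defaulters, trading missed defaults against rejected good customers.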
OBSERVATIONS
SELECTING THE MODEL
                              MODEL 1         MODEL 2          MODEL 3
Independent variables         loan_amnt       loan_amnt        loan_amnt
                              int_rate        int_rate         int_rate
                              annual_inc      annual_inc       grade (B to G)
                              age             age              emp_length
                                              home_ownership   home_ownership
                                                               annual_inc
                                                               age
Statistically significant
independent variables
(p < .05)                     3               4                10
Median deviance residuals     -0.4331         -0.4321          -0.4312
AIC                           13236           13235            12667
So, the third model is better than the other two.
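The AIC values can be cross-checked against the appendix outputs: AIC = residual deviance + 2k, where k is the number of estimated parameters (5, 8, and 15 coefficients for the three models, per the summaries). An illustrative Python check:

```python
def aic(residual_deviance, n_params):
    # AIC = residual deviance + 2 * number of estimated parameters
    return residual_deviance + 2 * n_params

print(aic(13226, 5))   # 13236 (model 1)
print(aic(13219, 8))   # 13235 (model 2)
print(aic(12637, 15))  # 12667 (model 3)
```

Model 3's lower AIC shows its extra parameters are more than paid for by the drop in deviance.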
SELECTING THE CUT-OFF
Setting the cut-off at x means that an applicant whose predicted probability of default exceeds x is
classified as a likely defaulter.
A confusion matrix is a table used to describe the performance of a classification model on a set of test
data for which the true values are known.
Its general structure is:

             Predicted: 0      Predicted: 1
Actual: 0    True negatives    False positives
Actual: 1    False negatives   True positives
The accuracy of a model is computed as (true positives + true negatives) / number of rows in the test data.
confmat1 #.15
cutoff1
0 1
0 4494 1173
1 446 256
Accuracy: 65.31%
confmat2 #.20
cutoff2
0 1
0 5363 304
1 614 88
Accuracy: 74.94%
confmat3 #.25
cutoff3
0 1
0 5605 62
1 674 28
Accuracy: 77.45%
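These accuracies follow directly from the matrices above, assuming a test set of 7273 rows (25% of 29092; rows with missing values stay in the denominator, as in the appendix code). An illustrative Python check:

```python
def accuracy(conf, n_rows):
    # (true negatives + true positives) / total test rows
    return (conf[0][0] + conf[1][1]) / n_rows

n = 7273  # 0.25 * 29092
print(round(accuracy([[4494, 1173], [446, 256]], n), 4))  # 0.6531 (cut-off .15)
print(round(accuracy([[5363, 304], [614, 88]], n), 4))    # 0.7495 (cut-off .20)
print(round(accuracy([[5605, 62], [674, 28]], n), 4))     # 0.7745 (cut-off .25)
```

Note how the higher cut-offs gain accuracy mostly by predicting "no default" more often, at the cost of catching fewer true defaulters (256 down to 28).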
QUALITATIVE ANALYSIS OF THE RESULTS
DIRECT AND INVERSE VARIATIONS
The coefficients of the following are positive:
loan_amnt, int_rate, gradeB, gradeC, gradeD, gradeE, gradeF, gradeG, emp_length,
home_ownershipOTHER
This means the probability of defaulting on the given credit varies directly with these factors, i.e., the
higher the value, the higher the risk of default. Common sense suggests the same.
For Other types of home ownership (other than home or rent, like a demolished/mortgaged home), the
probability of defaulting increases.
And the following have negative coefficients:
home_ownershipOWN, home_ownershipRENT, annual_inc, age
This means that the probability of defaulting varies inversely with these factors, which is also
intuitive.
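The sign interpretation can be made quantitative through odds ratios, exp(β). For example, using model 3's coefficients from the appendix output (int_rate 0.08519, home_ownershipOWN -0.1740), an illustrative sketch:

```python
import math

def odds_ratio(coef):
    # exp(beta): multiplicative change in the odds of default per unit increase
    return math.exp(coef)

print(round(odds_ratio(0.08519), 3))  # 1.089: each 1-point rise in interest
                                      # rate multiplies default odds by ~1.09
print(round(odds_ratio(-0.1740), 3))  # 0.84: owning a home lowers default odds
```

An odds ratio above 1 corresponds to a positive coefficient (direct variation), below 1 to a negative one (inverse variation).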
LEVEL OF SIGNIFICANCE
Variables with at least one star in the coefficients table are significant. A positive coefficient means the
higher the value of that variable, the higher the risk of default, and vice versa. The significance levels
are determined using standard z-tests.
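Each z value in the summary outputs is simply the coefficient estimate divided by its standard error; for example, int_rate in model 1 (estimate 1.517e-01, std. error 7.257e-03). An illustrative check:

```python
def z_value(estimate, std_error):
    # Wald z statistic: coefficient estimate / standard error
    return estimate / std_error

print(round(z_value(1.517e-01, 7.257e-03), 1))  # 20.9, matching the summary
```

A |z| above roughly 1.96 corresponds to significance at the 0.05 level under the normal approximation.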
LIMITATIONS OF THE MODEL
Reject Inference: Bank data covers only accepted applications; rejected applicants never enter the
sample, so it is not a true representation of every client who comes through the door. Stratified
sampling can help take care of this.
Omitted Variable Bias: This can never be fully eliminated from any type of regression, because of the
uncertainties in the real world. In addition, the model makes no allowance for non-linear effects or
high-degree interactions between the explanatory variables.
Over-fitting: Logistic regression sometimes tends to over-fit the sample, appearing more confident
than it really is. In this case it is acceptable, but in other settings it might be undesirable.
CONCLUSION
Three logit models were used to predict loan status, and the model with the least residual deviance
was selected. The first model had an Akaike information criterion (AIC) score of 13236, the second
13235, and the third 12667, a significant improvement over the other two; hence the most precise
model was selected.
Different cut-offs were then used to decide whether a loan should be granted: a cut-off of .15 gave an
accuracy of 65.31%, .20 gave 74.94%, and .25 gave 77.45%, so the most accurate setting was chosen.
The choice of cut-off is ultimately arbitrary, and since a higher cut-off increases the risk, a level of .25
was judged optimum. The area under the ROC curve, another measure of accuracy, came out to be
approximately 64%.
REFERENCES
[1] www.wikihow.com/Check-Your-Credit-Score-Online-in-India
[2] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1065119/
[3] https://www2.deloitte.com/
[4] Hackerearth.com
[5] Analyticsvidya.com
APPENDIX
R CODE
data1 <- readRDS("Loandata.rds")  # reading data
head(data1)  # inspecting the first few rows of the dataset

# preparing training (75%) and test (25%) data
# note: sample row indices; calling sample() on the data frame itself
# would sample columns rather than rows
idx <- sample(nrow(data1), 0.75 * nrow(data1))
traindata <- data1[idx, ]
testdata <- data1[-idx, ]

# model 1 with loan amount, interest rate, annual income, age
result <- glm(formula = loan_status ~ loan_amnt + int_rate + annual_inc + age,
              family = "binomial", data = traindata)
summary(result)

# model 2 with loan amount, interest rate, annual income, age and home ownership
result1 <- glm(formula = loan_status ~ loan_amnt + int_rate + annual_inc + age +
                 home_ownership, family = "binomial", data = traindata)
summary(result1)

# model 3 with loan amount, interest rate, grade, employment length,
# home ownership, annual income, age
result2 <- glm(loan_status ~ ., family = "binomial", data = traindata)
summary(result2)  # least residual deviance

# predicting on the test data
pred1 <- predict(result, testdata, type = "response")
pred2 <- predict(result1, testdata, type = "response")
pred <- predict(result2, testdata, type = "response")

# varying the cut-off for the model with least residual deviance:
# predictions above the cut-off are classified as defaults (1), else accepted (0)
cutoff1 <- ifelse(pred > .15, 1, 0)
cutoff2 <- ifelse(pred > .2, 1, 0)
cutoff3 <- ifelse(pred > .25, 1, 0)

# confusion matrices to show Type 1 and Type 2 errors
confmat1 <- table(testdata$loan_status, cutoff1)
confmat1
confmat2 <- table(testdata$loan_status, cutoff2)
confmat2
confmat3 <- table(testdata$loan_status, cutoff3)
confmat3

# checking accuracy at the different cut-offs
logit1 <- sum(diag(confmat1)) / nrow(testdata)
logit1
logit2 <- sum(diag(confmat2)) / nrow(testdata)
logit2
logit3 <- sum(diag(confmat3)) / nrow(testdata)
logit3
R CODE OUTPUT
summary(result)
Call:
glm(formula = loan_status ~ loan_amnt + int_rate + annual_inc +
age, family = "binomial", data = traindata)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.0794 -0.5334 -0.4331 -0.3421 3.7236
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.265e+00 1.400e-01 -23.318 <2e-16 ***
loan_amnt 1.762e-07 4.127e-06 0.043 0.966
int_rate 1.517e-01 7.257e-03 20.902 <2e-16 ***
annual_inc -6.935e-06 7.700e-07 -9.005 <2e-16 ***
age -5.271e-03 3.843e-03 -1.372 0.170
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 13800 on 19775 degrees of freedom
Residual deviance: 13226 on 19771 degrees of freedom
(2043 observations deleted due to missingness)
AIC: 13236
Number of Fisher Scoring iterations: 5
summary(result1)
Call:
glm(formula = loan_status ~ loan_amnt + int_rate + annual_inc +
age + home_ownership, family = "binomial", data = traindata)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.0816 -0.5339 -0.4321 -0.3420 3.7963
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.217e+00 1.442e-01 -22.311 <2e-16 ***
loan_amnt -2.785e-08 4.133e-06 -0.007 0.9946
int_rate 1.527e-01 7.329e-03 20.837 <2e-16 ***
annual_inc -7.265e-06 8.070e-07 -9.002 <2e-16 ***
age -5.120e-03 3.843e-03 -1.332 0.1828
home_ownershipOTHER 6.196e-01 3.072e-01 2.017 0.0437 *
home_ownershipOWN -1.487e-01 9.310e-02 -1.597 0.1103
home_ownershipRENT -6.259e-02 5.185e-02 -1.207 0.2274
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 13800 on 19775 degrees of freedom
Residual deviance: 13219 on 19768 degrees of freedom
(2043 observations deleted due to missingness)
AIC: 13235
Number of Fisher Scoring iterations: 5
summary(result2)
Call:
glm(formula = loan_status ~ ., family = "binomial", data = traindata)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.0905 -0.5315 -0.4312 -0.3321 3.7253
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.830e+00 2.166e-01 -13.066 < 2e-16 ***
loan_amnt 2.691e-07 4.230e-06 0.064 0.949276
int_rate 8.519e-02 2.314e-02 3.681 0.000232 ***
gradeB 3.390e-01 1.092e-01 3.104 0.001909 **
gradeC 5.366e-01 1.581e-01 3.394 0.000688 ***
gradeD 6.203e-01 2.010e-01 3.086 0.002031 **
gradeE 7.253e-01 2.507e-01 2.893 0.003819 **
gradeF 9.959e-01 3.345e-01 2.977 0.002911 **
gradeG 1.192e+00 4.401e-01 2.707 0.006783 **
emp_length 3.406e-03 3.718e-03 0.916 0.359671
home_ownershipOTHER 6.501e-01 3.085e-01 2.107 0.035129 *
home_ownershipOWN -1.740e-01 9.798e-02 -1.776 0.075728 .
home_ownershipRENT -5.825e-02 5.383e-02 -1.082 0.279175
annual_inc -6.929e-06 8.191e-07 -8.460 < 2e-16 ***
age -6.457e-03 3.963e-03 -1.629 0.103211
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 13214 on 19201 degrees of freedom
Residual deviance: 12637 on 19187 degrees of freedom
(2617 observations deleted due to missingness)
AIC: 12667
Number of Fisher Scoring iterations: 5
Different cut-offs were then set to determine which loan applications should be denied and which accepted:
confmat1
cutoff1
0 1
0 4494 1173
1 446 256
confmat2
cutoff2
0 1
0 5363 304
1 614 88
confmat3
cutoff3
0 1
0 5605 62
1 674 28
Here, cutoff1 = .15, cutoff2 = .20 and cutoff3 = .25.
The accuracies at the different cut-offs were:
logit1
[1] 0.6531005
logit2
[1] 0.7494844
logit3
[1] 0.7745085