Yashwantrao Chavan Institute of Science, Satara.
Department of Statistics
M.Sc. II 2018-2019
Seminar on
“Should This Loan be Approved or Denied?”: A Large
Dataset with Class Assignment Guidelines
Min Li, Amy Mickel, and Stanley Taylor
Presented by
Patil Pooja Rajaram
Roll No. 115
Content:
Introduction
Methodology
Background and Description of Datasets
Procedure
Statistical tools used
Analysis using logistic regression, Artificial
neural network and Support vector machine
Conclusion
References
Introduction:
In this article, a large and rich dataset from the U.S.
Small Business Administration (SBA) and an accompanying
assignment designed to teach statistics as an investigative
process of decision making are presented. Guidelines for the
assignment titled “Should This Loan Be Approved or Denied?,”
along with a subset of the larger dataset, are provided.
For this case-study assignment, students assume the role of
loan officer at a bank and are asked to approve or deny a loan by
assessing its risk of default using logistic regression. The dataset
accompanying this article is a real dataset from the U.S. Small
Business Administration (SBA).
Methodology:
By analysing real data, students experience statistics as an
investigative process of decision making. Each student is required
to answer the following question: As a representative of the bank,
should I grant a loan to a particular small business (Company X)?
Why or why not? The student makes this decision by assessing the
loan’s risk.
The assessment is accomplished by estimating the loan’s
default probability through analyzing this historical dataset and then
classifying the loan into one of two categories:
(a) higher risk—likely to default on the loan (i.e., be charged
off/failure to pay in full) or
(b) lower risk—likely to pay off the loan in full.
Background and Description of Datasets:
The U.S. SBA was founded in 1953 on the principle of
promoting and assisting small enterprises in the U.S. credit market.
The SBA acts much like an insurance provider: it reduces the risk
for a bank by taking on some of that risk through guaranteeing a
portion of the loan.
Two datasets are provided:
(a) “National SBA” dataset (named SBAnational.csv) from the
U.S. SBA which includes historical data from 1987 through 2014
(899,164 observations) and
(b) “SBA Case” dataset (named SBAcase.csv) which is used in
the assignment described in this paper (2102 observations).
The “SBA Case” dataset is a subset of the “National SBA.”
The variable name, the data type, and a brief description of each
variable are provided for the 27 variables in the two datasets. For the
“SBA Case” dataset, an additional eight variables were generated by
the authors as part of the assignment.
PROCEDURE:
The steps involved in the investigative process of analysing
these data to make an informed decision as to whether a loan
should be approved or denied are:
Step 1: Identifying indicators of potential risk
Step 2: Understanding the case study
Step 3: Building the model, creating decision rules, and validating
the logistic regression model and
Step 4: Using the model to make decisions.
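Step 3 includes validating the model on loans that were not used to fit it. The classification tables in this deck each total 1051 loans, which is half of the 2102 observations in the "SBA Case" dataset, so an even split is a plausible reading; the slides do not say how the split was made. A minimal Python sketch under that assumption (the analysis itself was done in R; the helper name `split_half` and the seed are illustrative, not from the source):

```python
import random


def split_half(rows, seed=0):
    """Shuffle a copy of rows and return two halves: (training, testing)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    middle = len(shuffled) // 2
    return shuffled[:middle], shuffled[middle:]


# Stand-in for the 2102 rows of the "SBA Case" dataset: row indices only.
case_rows = range(2102)
train, test = split_half(case_rows)
print(len(train), len(test))  # 1051 1051
```

Fitting on one half and building the classification table on the other keeps the reported accuracy from being measured on the same loans the model has already seen.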
STATISTICAL TOOLS USED:
The statistical analysis was carried out in R using the
following tools:
1] Logistic regression
2] Artificial neural network (ANN)
3] Support vector machine (SVM)
Step 1: Identifying Explanatory Variables (Indicators or
Predictors) of Potential Risk
1) Location (State)
2) Industry
3) Gross Disbursement
4) New versus Established Businesses
5) Loans Backed by Real Estate
6) Economic Recession
7) SBA’s Guaranteed Portion of Approved Loan
Step 2: Understanding the Case Study and Dataset:
As loan officers for Bank of America, students have received
loan applications from two small businesses: Carmichael Realty
(a commercial real estate agency) and SV Consulting (a real
estate consulting firm). They need to determine whether to grant
or deny each application and explain “why or why not.” To make
this decision, they need to assess each loan’s risk by calculating
the estimated probability of default using logistic regression.
Step 4: Using the Model to Make Decisions:
Table 1:
              Estimate   Std. Error   z value   Pr(>|z|)
(Intercept)    0.61170      0.09462     6.465   1.02e-10
Real Estate    2.12822      0.34500     6.169   6.89e-10
Portion        0.55722      0.10058     5.540   3.03e-08
Recession     -0.50412      0.24121    -2.090   0.0366
Classification (logistic regression):
                 State of nature: Reality
Classification   Loans charged off   Loans paid in full   Total
Higher risk                     31                   14      45
Lower risk                     324                  682    1006
Total                          355                  696    1051
Model Accuracy = 0.6784015 = 67.84 %
Misclassification rate = 0.3215985 = 32.16 %
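The accuracy and misclassification rate above follow directly from the four cells of the classification table: correctly classified loans sit on the diagonal (higher risk and charged off, lower risk and paid in full). A short Python sketch of that arithmetic, using the counts from the table; the function name and the dictionary layout are illustrative choices, not part of the original R analysis:

```python
def accuracy_from_table(table):
    """Return accuracy for a 2x2 table laid out as table[predicted][actual]."""
    correct = table["higher"]["charged_off"] + table["lower"]["paid_in_full"]
    total = sum(sum(row.values()) for row in table.values())
    return correct / total


# Counts copied from the logistic regression classification table.
logistic_table = {
    "higher": {"charged_off": 31, "paid_in_full": 14},
    "lower": {"charged_off": 324, "paid_in_full": 682},
}

acc = accuracy_from_table(logistic_table)
print(round(acc, 7), round(1 - acc, 7))  # 0.6784015 0.3215985
```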
The final model with the risk indicators in Table 1 is used to
estimate the probability of default for the two loan applications.
The estimated probability of default is 0.05 for Carmichael Realty
(Loan 1) and 0.55 for SV Consulting (Loan 2). Applying the decision
rules with a cut-off probability of 0.5, Loan 1 is classified as
“lower risk” and should be approved, while Loan 2 is classified as
“higher risk” and should be denied.
Loan   Name                Date      Loan amount   SBA-guaranteed amount   Real Estate   Est. Prob. of Default   Approve
1      Carmichael Realty   Current   $1,000,000    $750,000                Yes           0.05                    Yes
2      SV Consulting       Current   $100,000      $40,000                 No            0.55                    No
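The decision rule applied to the two applications can be sketched in a few lines of Python. The estimated probabilities and the 0.5 cut-off come from the slide. The class and field names are illustrative, the two dollar columns are assumed to be the loan amount and the SBA-guaranteed amount, and treating a probability exactly equal to the cut-off as higher risk is an assumption the slides do not address.

```python
from dataclasses import dataclass

CUTOFF = 0.5  # cut-off probability stated on the slide


@dataclass
class LoanApplication:
    name: str
    loan_amount: float        # assumed meaning of the "Loan" column
    sba_guaranteed: float     # assumed meaning of the "SBA" column
    est_prob_default: float   # estimated probability of default from the slide

    @property
    def portion(self):
        """Share of the loan guaranteed by the SBA."""
        return self.sba_guaranteed / self.loan_amount

    def risk_class(self, cutoff=CUTOFF):
        # Assumption: a probability exactly at the cut-off counts as higher risk.
        return "higher risk" if self.est_prob_default >= cutoff else "lower risk"

    def approve(self, cutoff=CUTOFF):
        return self.risk_class(cutoff) == "lower risk"


loans = [
    LoanApplication("Carmichael Realty", 1_000_000, 750_000, 0.05),
    LoanApplication("SV Consulting", 100_000, 40_000, 0.55),
]
for loan in loans:
    print(loan.name, loan.portion, loan.risk_class(), loan.approve())
# Carmichael Realty 0.75 lower risk True
# SV Consulting 0.4 higher risk False
```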
Artificial neural network:
                 State of nature: Reality
Classification   Loans charged off   Loans paid in full   Total
Higher risk                     31                   12      43
Lower risk                     324                  684    1008
Total                          355                  696    1051
Model Accuracy = 0.6803045 = 68.03 %
Misclassification rate = 0.3196955 = 31.97 %
Support vector machine:
                 State of nature: Reality
Classification   Loans charged off   Loans paid in full   Total
Higher risk                     20                   39      59
Lower risk                     335                  657     992
Total                          355                  696    1051
Model Accuracy = 0.6441484 = 64.41 %
Misclassification rate = 0.3558516 = 35.58 %
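Overall accuracy mixes two kinds of error, so one further way to read the three classification tables is to look at each actual outcome separately: what share of charged-off loans was classified as higher risk, and what share of paid-in-full loans was classified as lower risk. These rates are not reported on the slides; the Python sketch below only recomputes them from the counts shown, and the function and variable names are illustrative.

```python
# Each tuple: (higher risk & charged off, higher risk & paid in full,
#              lower risk & charged off,  lower risk & paid in full)
tables = {
    "Logistic regression": (31, 14, 324, 682),
    "ANN": (31, 12, 324, 684),
    "SVM": (20, 39, 335, 657),
}


def per_class_rates(hc, hp, lc, lp):
    """Return (share of charged-off loans classified higher risk,
               share of paid-in-full loans classified lower risk)."""
    flagged_defaults = hc / (hc + lc)
    cleared_paid = lp / (hp + lp)
    return flagged_defaults, cleared_paid


rates = {model: per_class_rates(*counts) for model, counts in tables.items()}
for model, (flagged, cleared) in rates.items():
    print(f"{model}: {flagged:.3f} of charged-off loans flagged, "
          f"{cleared:.3f} of paid-in-full loans cleared")
```

With these counts, each of the three models classifies fewer than 10% of the charged-off loans as higher risk, which is worth keeping in mind when comparing the accuracy figures.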
[Result table for the Carmichael Realty and SV Consulting loan applications; entries not recoverable]
Conclusion:
Model                 Accuracy   Misclassification rate
Logistic regression   67.84 %    32.16 %
ANN                   68.03 %    31.97 %
SVM                   64.41 %    35.58 %
• The misclassification rate for the support vector machine was
higher than those for logistic regression and the neural network.
• Logistic regression is equivalent to a neural network with no
hidden nodes.
• If the objective is to separate loans that are likely to be paid
in full from loans that are likely to default, without needing the
predicted probability of default, then neural networks and SVM are
good choices.
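The equivalence between logistic regression and a neural network with no hidden node can be illustrated with a small Python sketch: a network whose only unit is a sigmoid output node computes the same quantity as the logistic regression formula. The parameter values and the input vector below are hypothetical and chosen only for illustration; they are not the fitted values from Table 1.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_regression_prob(intercept, coefs, x):
    """Logistic regression: sigmoid of the linear predictor."""
    eta = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return sigmoid(eta)


def single_neuron_output(bias, weights, x):
    """Network with no hidden layer: one output unit with sigmoid activation."""
    activation = bias
    for w, xi in zip(weights, x):
        activation += w * xi
    return sigmoid(activation)


# Hypothetical parameters and one hypothetical input vector.
intercept, coefs = 0.2, [1.5, -0.7, 0.3]
x = [1.0, 0.5, 0.0]

p_lr = logistic_regression_prob(intercept, coefs, x)
p_nn = single_neuron_output(intercept, coefs, x)
print(abs(p_lr - p_nn) < 1e-12)  # True
```

The two functions differ only in naming (intercept and coefficients versus bias and weights), which is the sense in which the models coincide when the network has no hidden node.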
References:
Journal of Statistics Education (Taylor and Francis Group)
Introduction to Linear Regression Analysis: Douglas C. Montgomery,
Elizabeth A. Peck, G. Geoffrey Vining
Data Mining: Concepts and Techniques: Jiawei Han, Micheline Kamber,
Jian Pei
Thank you.
