THE CREDIT RISK ANALYTICS
EDA Case Study By,
• Mr. Prathmesh Pise
• Mr. Vishal Patil
 CONTENTS
 Problem statement
 Flow Chart
 Importing and Cleaning1
 Importing and Cleaning2
 Approach
 Data Visualization
 Significant Insights
 PROBLEM STATEMENT:
1. The aim is to identify patterns that indicate whether a client had difficulty paying their installments, which will help the bank take the following actions:
• Denying the loan
• Reducing the amount of the loan
• Lending (to risky applicants) at a higher interest rate, etc.
2. Identifying the correlation between the independent variables and the target variable
3. Ensuring that consumers capable of repaying the loan are not rejected
 FLOW CHART
DATA LOADING -> DETECT TARGET VARIABLE -> DATA CLEANING -> HANDLING MISSING VALUES -> UNIVARIATE ANALYSIS -> BIVARIATE ANALYSIS -> MULTIVARIATE ANALYSIS -> DATA VISUALIZATION -> DATA INSIGHTS
EDA Process Followed
 IMPORTING AND CLEANING1:
1. Imported the pandas, matplotlib and seaborn libraries for loading and visualizing the data.
2. The target variable is a flag indicating whether a client pays installments on time or not.
3. Two data frames were created from CSV files, namely,
• Application data - contains all the information about the client at the time of application
• Previous application data - contains information about the client's previous loans
4. Dropped unnecessary columns, such as those describing the dimensions of the client's house.
5. Achieved a 40% reduction in memory usage by changing the data types of categorical variables from object to category.
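The object-to-category conversion can be sketched as follows. This is a minimal illustration, not the authors' notebook: the CSV is a small in-memory stand-in (the real file path is not given in the slides), and the size of the saving depends on the data, so the 40% figure is not reproduced here.

```python
import io
import pandas as pd

# In-memory stand-in for the application CSV (assumption: the real data
# would be loaded with pd.read_csv on the actual file).
rows = [
    f"{'Cash loans' if i % 3 else 'Revolving loans'},{'F' if i % 2 else 'M'},{100000 + i},{int(i % 10 == 0)}"
    for i in range(1000)
]
csv_text = "NAME_CONTRACT_TYPE,CODE_GENDER,AMT_INCOME_TOTAL,TARGET\n" + "\n".join(rows)
application_data = pd.read_csv(io.StringIO(csv_text))

before = application_data.memory_usage(deep=True).sum()

# Convert every non-numeric (text) column to the category dtype.
text_cols = [
    col for col in application_data.columns
    if not pd.api.types.is_numeric_dtype(application_data[col])
]
for col in text_cols:
    application_data[col] = application_data[col].astype("category")

after = application_data.memory_usage(deep=True).sum()
reduction_pct = 100 * (before - after) / before
print(f"memory: {before} -> {after} bytes ({reduction_pct:.1f}% smaller)")
```

The saving comes from storing each distinct label once and keeping only small integer codes per row, so it is largest for columns with few distinct values.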
 IMPORTING AND CLEANING2:
1. Imported the previous application data set:
• Previous application data set as previous_app
2. Cleaned the data by removing columns that were less significant for the analysis and prone to containing erroneous data, namely,
• WEEKDAY_APPR_PROCESS_START
• HOUR_APPR_PROCESS_START, etc.
3. Achieved a 40% reduction in memory usage by changing the data types of categorical variables from object to category and dropping unnecessary columns.
HANDLING DATA AND MISSING VALUES:
1. Checked for null values in application_data and found that:
• OWN_CAR_AGE had 65.99%, OCCUPATION_TYPE had 31.35% and EXT_SOURCE_1
had 56.38% missing values
• Hence decided to drop these columns
2. We also checked for null values in previous_app and found that:
• RATE_INTEREST_PRIMARY had 99.64% null values
• RATE_INTEREST_PRIVILEGED had 99.64% null values
• Hence we dropped them
3. The external source data had some missing values. We imputed them with zero because the external agencies had not provided a score for these customers, which was taken to mean that the client's account was not prone to default. Hence the score was assumed to be zero.
4. Took the average of the EXT_SOURCE_1, EXT_SOURCE_2 and EXT_SOURCE_3 columns to create the ext_sources column.
5. In previous_app, NAME_TYPE_SUITE had 49% missing values and does not affect whether the client will default or not. Hence we dropped this column.
6. Defined a function null_percentage to calculate the percentage of null values in the columns of both data sets.
7. Since the data is imbalanced, we analysed the proportions within each category and used stacked bar plots, which make the comparison easier to read.
8. Defined a function called stacker. It compares a categorical column with the Target variable, accounts for the data imbalance by converting each category into percentages, and plots a stacked chart of the proportions.
9. Merged the previous_app data set with the application data set to compare it with the Target variable.
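The null-handling steps above can be sketched as follows. The slides name null_percentage but do not show its body, so this version is an assumption, as are the toy values and the 60% drop threshold (the slides do not state a cut-off).

```python
import numpy as np
import pandas as pd

def null_percentage(df: pd.DataFrame) -> pd.Series:
    """Percentage of missing values per column, highest first."""
    return (df.isnull().mean() * 100).round(2).sort_values(ascending=False)

# Toy frame using column names from the slides; the values are made up.
application_data = pd.DataFrame({
    "OWN_CAR_AGE":  [np.nan, np.nan, 4.0, np.nan],
    "EXT_SOURCE_1": [0.5, np.nan, 0.3, np.nan],
    "EXT_SOURCE_2": [0.6, 0.4, np.nan, 0.2],
    "EXT_SOURCE_3": [0.7, 0.5, 0.3, np.nan],
    "TARGET":       [0, 1, 0, 1],
})

nulls = null_percentage(application_data)
print(nulls)

# Drop columns whose missing share exceeds a hypothetical threshold.
THRESHOLD = 60.0  # assumption, not taken from the slides
to_drop = nulls[nulls > THRESHOLD].index
application_data = application_data.drop(columns=to_drop)

# Impute missing external scores with zero, then average the three scores.
ext_cols = ["EXT_SOURCE_1", "EXT_SOURCE_2", "EXT_SOURCE_3"]
application_data[ext_cols] = application_data[ext_cols].fillna(0)
application_data["ext_sources"] = application_data[ext_cols].mean(axis=1)
print(application_data)
```

Imputing zero before averaging pulls the average down for clients with missing scores; taking the mean over only the available scores would be a different design choice.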
 DATA VISUALIZATION
• Univariate analysis on following variables,
1. Target
2. Income
3. Children count
• Bi-variate analysis on Target variable against the following,
1. Gender & age
2. Contract type
3. Average external score
4. Income & occupation type
5. Education type etc
• Multi-variate analysis on Target variable against the following,
1. Income and education type
2. Income and previous application status
 TARGET V/S GENDER
Inference:
• A higher percentage of male clients than female clients pay installments late.
• A higher percentage of female clients than male clients pay installments on time.
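Percentages like these come from normalising within each category, which is what the stacker function described earlier does. The sketch below shows only the table behind such a chart. The function name stacker_table, the toy data and the encoding TARGET = 1 for late payment are assumptions.

```python
import pandas as pd

def stacker_table(df: pd.DataFrame, column: str, target: str = "TARGET") -> pd.DataFrame:
    """Percentage of each target value within every category of `column`."""
    return pd.crosstab(df[column], df[target], normalize="index") * 100

# Made-up data: 8 female and 4 male clients, one late payer in each group.
df = pd.DataFrame({
    "CODE_GENDER": ["F"] * 8 + ["M"] * 4,
    "TARGET":      [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
})

table = stacker_table(df, "CODE_GENDER")
print(table)
# table.plot(kind="bar", stacked=True) would draw the stacked chart
# (requires matplotlib).
```

Each row sums to 100, so groups of different sizes can be compared directly even though the raw counts are imbalanced.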
 TARGET V/S CONTRACT TYPE
Inference:
• Clients with cash loans are more likely to pay late than clients with revolving loans.
 TARGET V/S CAR
Inference:
• The percentage of clients without a car who pay installments late is slightly higher than that of clients with a car.
 TARGET V/S AVG_EXT_SCORE
Inference:
• About 50% of the clients who delay their installment payments have a low average external score, ranging from approximately 0.2 to 0.4.
• The clients who pay their installments on time have a moderate average score, ranging from approximately 0.3 to 0.5.
• Some clients who received a very high score still delay their installments.
 TARGET V/S AMT INCOME
Inference:
• Among the income classes, clients with an income of less than 2 lakhs per annum tend to pay installments late.
• Clients with an income of more than 6 lakhs per annum, i.e. the rich class, are more likely to pay on time than the other classes.
 TARGET V/S INCOME TYPE
Inference:
• Amongst all the income types, the Others (maternity leave, students, unemployed clients, etc.) are the ones who tend to pay installments late.
• Clients of the Businessman income type do not pay installments late.
• The working class also has a relatively high percentage of late payers, at 10%.
 TARGET V/S FAMILY STATUS
Inference:
• Clients who are single/not married and those in a civil marriage tend to pay installments late.
 TARGET V/S HOUSING TYPE
Inference:
• Clients who live in rented apartments or with their parents tend to pay installments late.
• Clients who stay in office apartments pay installments on time.
 TARGET V/S DOCUMENT 2
Inference:
• Clients who do not provide Document 2 tend to pay installments late. Hence it is advisable to make this document mandatory.
 TARGET V/S CLIENTS PROVIDING MOBILE NUMBERS
Inference:
• Clients who provide a mobile number tend to pay installments on time.
• Hence it is advisable to collect the mobile number of the clients.
 TARGET V/S AGE
Inference:
• Clients below the age of 25 tend to pay installments late.
• Clients aged 65 and above pay their installments on time.
• A possible reason is that clients below 25 are less financially stable than those above 65.
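Age bands like these can be derived from DAYS_BIRTH. The sketch below assumes DAYS_BIRTH holds a negative day count relative to the application date and that TARGET = 1 marks a late payer. The bin edges and the toy values are illustrative only.

```python
import pandas as pd

# Made-up clients; DAYS_BIRTH is assumed to be negative days before application.
df = pd.DataFrame({
    "DAYS_BIRTH": [-8000, -9000, -15000, -24000, -25000],
    "TARGET":     [1, 0, 0, 0, 0],
})

# Convert days to years, then bucket into left-closed age bands.
df["AGE_YEARS"] = -df["DAYS_BIRTH"] / 365.25
bins = [0, 25, 35, 45, 55, 65, 120]
labels = ["<25", "25-34", "35-44", "45-54", "55-64", "65+"]
df["AGE_GROUP"] = pd.cut(df["AGE_YEARS"], bins=bins, labels=labels, right=False)

# Share of late payers (in percent) within each observed age band.
late_rate = df.groupby("AGE_GROUP", observed=True)["TARGET"].mean() * 100
print(late_rate)
```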
 TARGET V/S OCCUPATION TYPE
Inference:
• Low skill laborers, waiters/barmen staff, security staff, cooking staff, cleaning staff, drivers and laborers tend to pay installments late.
• Most accountants, high skill tech staff and HR staff pay their installments on time.
• The likely reason is that they represent sectors with higher salaries.
 TARGET V/S CNT_CHILDREN
Inference:
• Clients with more than 5 children tend to pay installments late.
• Most clients with 2 or 3 children pay installments on time.
 TARGET V/S NAME_EDUCATION_TYPE
Inference:
• Clients with an academic degree pay installments on time.
• Clients with lower secondary education pay installments late.
 MULTIVARIATE ANALYSIS ON NUMERIC VARIABLES
Inference:
• A high positive correlation is seen between goods price and credit amount
• A high positive correlation is seen between annuity amount and credit amount
• A high positive correlation is seen between annuity amount and goods price
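Pairwise correlations like these can be read off a correlation matrix. A minimal sketch follows. The column names AMT_CREDIT, AMT_GOODS_PRICE and AMT_ANNUITY are assumptions (the slides name the variables only in words), and the values are made up.

```python
import pandas as pd

# Made-up loan amounts; column names are hypothetical stand-ins.
df = pd.DataFrame({
    "AMT_CREDIT":      [100000, 250000, 400000, 600000, 900000],
    "AMT_GOODS_PRICE": [90000, 225000, 380000, 540000, 850000],
    "AMT_ANNUITY":     [10000, 20000, 27000, 35000, 48000],
})

# Pearson correlation between every pair of numeric columns.
corr = df.corr()
print(corr.round(2))
# seaborn.heatmap(corr, annot=True) is one common way to plot this matrix.
```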
 PROPORTIONS OF CLIENTS BASED ON PREVIOUS APPLICATION STATUS
Inference:
• Out of the total loan applications, only 63% were approved.
• 17% of the applications were refused and 19% were cancelled by the clients.
 HANDLING OUTLIERS
Inference:
• Outliers were observed in the annual income variable.
• 99% of the clients had an income of less than 4.75 LPA.
• Hence the analysis of annual income was limited to clients with an annual income of less than 4.75 LPA.
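A percentile cut-off of this kind can be sketched as follows. The income values are made up, so the cut-off here is not the 4.75 LPA reported above.

```python
import pandas as pd

# 99 ordinary incomes plus one extreme value (all made up).
income = pd.Series(
    list(range(100_000, 199_000, 1_000)) + [11_700_000],
    name="AMT_INCOME_TOTAL",
)

# 99th percentile of income, used as the upper limit for the analysis.
cutoff = income.quantile(0.99)
trimmed = income[income < cutoff]

print(f"cut-off: {cutoff:.0f}, kept {len(trimmed)} of {len(income)} rows")
```

The rows above the cut-off are only excluded from the income analysis; they are not deleted from the data set.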
 TARGET V/S INCOME V/S EDUCATION TYPE
Inference:
• Clients with an academic degree and an income in the range of 3-3.6 lakhs are more likely to pay installments late than those with a low income.
 TARGET V/S NAME_CASH_LOAN_PURPOSE
Inference:
• Clients who previously took a loan to make payments on other loans pay installments late.
• They are followed by clients with Home/Office/Land loans and loans for personal household expenses, who also pay installments late.
 TARGET V/S INCOME V/S PREVIOUS APPLICATION STATUS
Inference:
• Clients who took a loan for Business Development and have an annual income above 2.6 LPA pay installments late.
 TARGET V/S PREVIOUS LOAN STATUS
Inference:
• Clients whose previous loan was refused pay their installments late.
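Comparing previous loan status with the Target variable relies on the merge of previous_app with the application data. A minimal sketch follows. The key SK_ID_CURR and the column NAME_CONTRACT_STATUS are assumed names that do not appear in the slides, TARGET = 1 is assumed to mark a late payer, and the rows are made up.

```python
import pandas as pd

# Made-up current applications, one row per client.
application_data = pd.DataFrame({
    "SK_ID_CURR": [1, 2, 3, 4],
    "TARGET":     [0, 1, 0, 1],
})

# Made-up previous applications; a client can have several.
previous_app = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 2, 3, 4, 4],
    "NAME_CONTRACT_STATUS": [
        "Approved", "Approved", "Refused", "Approved", "Refused", "Canceled",
    ],
})

# Attach the current TARGET flag to every previous application.
merged = previous_app.merge(
    application_data[["SK_ID_CURR", "TARGET"]], on="SK_ID_CURR", how="inner"
)

# Share of each previous status, and share of late payers per status (percent).
status_share = previous_app["NAME_CONTRACT_STATUS"].value_counts(normalize=True) * 100
late_by_status = merged.groupby("NAME_CONTRACT_STATUS")["TARGET"].mean() * 100
print(status_share.round(1))
print(late_by_status)
```

Because one client can have several previous applications, the merged frame counts applications rather than clients.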
 KEY INSIGHTS
• The following are strong indicators of default
1. NAME_HOUSING_TYPE : Clients living in rented apartments
2. NAME_FAMILY_STATUS : Clients in a civil marriage and those who are single/not married
3. NAME_INCOME_TYPE : Clients on maternity leave, students and unemployed clients
4. FLAG_DOCUMENT_2 : Clients who do not provide document 2
5. FLAG_MOBIL : Clients who do not provide a mobile number
6. OCCUPATION_TYPE : Low skill laborers, laborers, waiters/barmen staff, security staff
7. CNT_CHILDREN : Positive correlation between the number of children and the chance of the client being a defaulter
8. NAME_EDUCATION_TYPE : Clients with lower secondary, secondary/secondary special and incomplete higher education
9. EDUCATION_TYPE : Clients with an academic degree and an annual income between 3-3.6 lakhs
10. CASH_LOAN_PURPOSE : Clients whose previous loan purpose was payments on other loans
• The following clients should be targeted
1. CODE_GENDER : Females
2. NAME_CONTRACT_TYPE : Clients with revolving loans
3. FLAG_CODE_CAR : Clients with car
4. AVG_EXT_SCORE : Clients with moderate external score
5. AMT_INCOME_TOTAL : Clients with annual income
greater than 6 lakhs
6. NAME_INCOME_TYPE : The businessmen and pensioners
7. FLAG_MOBIL :Clients who provide mobile number
8. DAYS_BIRTH :Clients with age of 65 and above
9. OCCUPATION_TYPE : Accountants, high skill tech staff and HR staff
10. NAME_EDUCATION_TYPE : Clients with academic degree
 CONCLUSION
• Based on the inferences obtained, a credit score can be set
• Variables which contribute towards the chances of a client being a defaulter will be rated with a low score
• Variables which contribute towards the chances of a client paying the installments on time will be rated with a high score
• Based on the final credit score, the bank can take the following decisions,
1. Grant the loan to clients with a healthy overall credit score
2. Grant the loan at a higher interest rate to clients with a comparatively low credit score
3. Reject the loan for clients with an extremely low credit score
THANK YOU!

Exploratory Data Analysis For Credit Risk Assesment

  • 1. THE CREDIT RISK ANALYTICS: EDA Case Study by Mr. Prathmesh Pise and Mr. Vishal Patil
  • 2. CONTENTS: Problem statement; Flow chart; Importing and Cleaning 1; Importing and Cleaning 2; Approach; Data visualization; Significant insights
  • 3. PROBLEM STATEMENT: 1. The aim is to identify patterns that indicate whether a client had difficulty paying their installments, which will help the bank take actions such as denying the loan, reducing the amount of the loan, or lending (to risky applicants) at a higher interest rate. 2. Identify the correlation between the independent variables and the target variable. 3. Ensure that consumers capable of repaying the loan are not rejected.
  • 5. IMPORTING AND CLEANING 1: 1. Imported the pandas, matplotlib and seaborn libraries for loading and visualizing the data. 2. The target variable is a flag indicating whether a client pays installments on time or not. 3. Two data frames were created from CSV files: application data, which contains all the information about the client at the time of application, and previous application data, which contains information about the client's previous loans. 4. Dropped unnecessary columns, such as those describing the dimensions of the client's house. 5. Achieved a 40% reduction in memory usage by changing the data types of categorical variables from object to category.
  • 6. IMPORTING AND CLEANING 2: 1. Imported the previous application data set as previous_app. 2. Cleaned the data by removing columns that were less significant for the analysis and prone to containing erroneous data, namely WEEKDAY_APPR_PROCESS_START, HOUR_APPR_PROCESS_START, etc. 3. Achieved a 40% reduction in memory usage by changing the data types of categorical variables from object to category and dropping unnecessary columns.
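The cleaning steps on slides 5 and 6 can be sketched roughly as follows. This is a minimal illustration on a toy frame, not the authors' notebook: the sample rows and the `categorical_cols` list are assumptions, and the saving you see depends on how repetitive the string columns are.

```python
import pandas as pd

# Toy stand-in for the previous application CSV (values are made up).
previous_app = pd.DataFrame({
    "NAME_CONTRACT_STATUS": ["Approved", "Refused", "Canceled", "Approved"] * 2500,
    "NAME_CONTRACT_TYPE": ["Cash loans", "Revolving loans"] * 5000,
    "WEEKDAY_APPR_PROCESS_START": ["MONDAY", "TUESDAY"] * 5000,
    "HOUR_APPR_PROCESS_START": [10, 14] * 5000,
    "AMT_CREDIT": [100000.0, 250000.0] * 5000,
})

before = int(previous_app.memory_usage(deep=True).sum())

# Drop the columns the slide names as low-value for the analysis.
previous_app = previous_app.drop(
    columns=["WEEKDAY_APPR_PROCESS_START", "HOUR_APPR_PROCESS_START"]
)

# Convert the remaining string columns to the category dtype.
# categorical_cols is a hand-written, hypothetical list for this toy frame.
categorical_cols = ["NAME_CONTRACT_STATUS", "NAME_CONTRACT_TYPE"]
for col in categorical_cols:
    previous_app[col] = previous_app[col].astype("category")

after = int(previous_app.memory_usage(deep=True).sum())
print(f"memory: {before} -> {after} bytes ({1 - after / before:.0%} saved)")
```

Passing `deep=True` makes pandas count the actual string payloads, which is where most of the saving from the category dtype comes from.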
  • 7. HANDLING DATA AND MISSING VALUES: 1. Checked for null values in application_data and found that OWN_CAR_AGE had 65.99%, OCCUPATION_TYPE had 31.35% and EXT_SOURCE_1 had 56.38% missing values. Hence we decided to drop these columns. 2. We also checked for null values in previous_app and found that RATE_INTEREST_PRIMARY had 99.64% and RATE_INTEREST_PRIVILEGED had 99.64% null values. Hence we dropped them. 3. The external source data had some missing values. We imputed them with zero: the external agencies had not provided a score for these customers, which was taken to mean that the client's account was not prone to default, so the score was assumed to be zero. 4. Took the average of the EXT_SOURCE_1, EXT_SOURCE_2 and EXT_SOURCE_3 columns to create the ext_sources column. 5. In previous_app, NAME_TYPE_SUITE had 49% missing values and was judged not to affect whether the client will default. Hence we dropped this column.
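A rough sketch of the missing-value handling described on this slide, on made-up data. The 50% cutoff is a hypothetical threshold chosen for illustration (the slides do not state one), and the EXT_SOURCE columns are kept here only because they feed the average.

```python
import numpy as np
import pandas as pd

def null_percentage(df: pd.DataFrame) -> pd.Series:
    """Return the percentage of missing values per column, highest first."""
    return (df.isnull().mean() * 100).round(2).sort_values(ascending=False)

# Toy stand-in for application_data (values are made up).
application_data = pd.DataFrame({
    "OWN_CAR_AGE": [np.nan, np.nan, 4.0, np.nan],
    "EXT_SOURCE_1": [0.5, np.nan, np.nan, 0.3],
    "EXT_SOURCE_2": [0.6, 0.2, np.nan, 0.4],
    "EXT_SOURCE_3": [np.nan, 0.4, 0.1, 0.5],
    "AMT_INCOME_TOTAL": [100.0, 200.0, 300.0, 400.0],
})

pct = null_percentage(application_data)
print(pct)

# Impute missing external scores with zero, as the slide describes,
# then average the three scores row by row.
ext_cols = ["EXT_SOURCE_1", "EXT_SOURCE_2", "EXT_SOURCE_3"]
application_data["ext_sources"] = application_data[ext_cols].fillna(0).mean(axis=1)

# Drop other columns whose missing share exceeds the (hypothetical) 50% cutoff.
to_drop = [col for col in pct[pct > 50].index if col not in ext_cols]
application_data = application_data.drop(columns=to_drop)
```

Filling with zero pulls the row average down for clients with missing scores; calling `mean(axis=1)` on the unfilled columns would instead skip the missing values.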
  • 8. 6. Defined a function null_percentage to calculate the percentage of null values in the columns of both data sets. 7. Since the data is imbalanced, we took the proportion of each category to analyse the data and used stacked bar plots, as they make the comparison easier to read. 8. Defined a function called stacker that compares a categorical column with our Target variable. It accounts for the data imbalance by converting each category into percentages and plots a stacked chart of the proportions. 9. Merged the previous_app data set with the application data set to compare it with our Target variable.
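The proportion table behind a stacked chart of this kind, and the merge in step 9, might look like the sketch below. `stacker_table` is a hypothetical helper, not the authors' `stacker`; the join key `SK_ID_CURR` and the coding TARGET = 1 for late payment are assumptions, since the slides state neither.

```python
import pandas as pd

def stacker_table(df: pd.DataFrame, column: str, target: str = "TARGET") -> pd.DataFrame:
    """Percentage split of the target within each category of `column`."""
    # normalize="index" makes every row sum to 1, so a large category
    # cannot dominate the comparison just because it has more clients.
    return pd.crosstab(df[column], df[target], normalize="index") * 100

# Toy frames (values are made up).
application_data = pd.DataFrame({
    "SK_ID_CURR": [1, 2, 3, 4, 5, 6],
    "CODE_GENDER": ["F", "F", "F", "F", "M", "M"],
    "TARGET": [0, 0, 0, 1, 0, 1],
})
previous_app = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 4, 6],
    "NAME_CONTRACT_STATUS": ["Approved", "Refused", "Refused", "Approved"],
})

gender_split = stacker_table(application_data, "CODE_GENDER")
print(gender_split)
# With matplotlib installed, gender_split.plot(kind="bar", stacked=True)
# draws the stacked chart from this table.

# Attach the target to each previous application via the assumed client key.
merged = application_data.merge(previous_app, on="SK_ID_CURR", how="inner")
status_split = stacker_table(merged, "NAME_CONTRACT_STATUS")
print(status_split)
```

An inner join keeps only clients that appear in both frames; a client with several previous applications contributes one row per application.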
  • 9. DATA VISUALIZATION: Univariate analysis on the following variables: 1. Target 2. Income 3. Children count. Bivariate analysis on the Target variable against the following: 1. Gender and age 2. Contract type 3. Average external score 4. Income and occupation type 5. Education type, etc. Multivariate analysis on the Target variable against the following: 1. Income and education type 2. Income and previous application status.
  • 10. TARGET V/S GENDER Inference: • The percentage of males who pay installments late is higher than that of females. • The percentage of females who pay on time is higher than that of males.
  • 11. TARGET V/S CONTRACT TYPE Inference: • Clients with Cash loans tend to pay late compared to clients with Revolving loans.
  • 12. TARGET V/S CAR Inference: • The percentage of clients without a car who pay installments late is slightly higher than that of clients with a car.
  • 13. TARGET V/S AVG_EXT_SCORE Inference: • About 50% of the clients who delay their installment payments have a low average external score, ranging from approximately 0.2 to 0.4. • The clients who pay their installments on time have a moderate average score, ranging from approximately 0.3 to 0.5. • Some clients who received a very high score still delay their installments.
  • 14. TARGET V/S AMT INCOME Inference: • Among these classes, clients with an income of less than 2 lakhs per annum tend to pay installments late. • Clients with an income of more than 6 lakhs per annum, i.e. the Rich class, are more likely to pay on time than the other classes.
  • 15. TARGET V/S INCOME TYPE Inference: • Among all the income types, the Others (maternity leave, students, unemployed clients, etc.) are the ones who tend to pay installments late. • Clients of the Businessman income type do not pay installments late. • The Working class also has a relatively high percentage of clients paying installments late, at 10%.
  • 16. TARGET V/S FAMILY STATUS Inference: • Clients who are Single/not married and those in the Civil marriage class tend to pay installments late.
  • 17. TARGET V/S HOUSING TYPE Inference: • Clients who live in rented apartments or with parents tend to pay installments late. • Clients who stay in office apartments pay installments on time.
  • 18. TARGET V/S DOCUMENT 2 Inference: • Clients who do not provide Document 2 tend to pay installments late. Hence it is advisable to make this document mandatory.
  • 19. TARGET V/S CLIENTS PROVIDING MOBILE NUMBERS Inference: • Clients who provide a mobile number tend to pay installments on time. • Hence it is advisable to collect the mobile numbers of clients.
  • 20. TARGET V/S AGE Inference: • Clients below the age of 25 tend to pay installments late. • Clients aged 65 and above pay their installments on time. • A possible reason is that clients below 25 are less financially stable than those above 65.
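One way to produce age groups like these is to derive age from DAYS_BIRTH and bin it with `pd.cut`. This sketch assumes DAYS_BIRTH stores a negative count of days before the application and that TARGET = 1 marks late payment; the bin edges and sample rows are illustrative, not the authors'.

```python
import pandas as pd

# Toy rows (values are made up).
application_data = pd.DataFrame({
    "DAYS_BIRTH": [-8000, -9000, -15000, -20000, -24000, -25000],
    "TARGET": [1, 0, 0, 0, 0, 0],
})

# Convert negative day counts to age in years.
application_data["AGE_YEARS"] = -application_data["DAYS_BIRTH"] / 365.25

# Left-closed bins: [0, 25), [25, 35), ..., [65, 120).
bins = [0, 25, 35, 45, 55, 65, 120]
labels = ["<25", "25-34", "35-44", "45-54", "55-64", "65+"]
application_data["AGE_GROUP"] = pd.cut(
    application_data["AGE_YEARS"], bins=bins, labels=labels, right=False
)

# Share of late payers (in percent) within each age group.
late_rate = application_data.groupby("AGE_GROUP", observed=False)["TARGET"].mean() * 100
print(late_rate)
```

Groups with no clients show up as NaN, which is a useful reminder to check group sizes before reading much into a rate.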
  • 21. TARGET V/S OCCUPATION TYPE Inference: • Low skill laborers, waiters/barmen staff, security staff, cooking staff, cleaning staff, drivers and laborers tend to pay installments late. • Most accountants, high skill tech staff and HR staff pay their installments on time. • The likely reason is that they represent sectors with higher salaries.
  • 22. TARGET V/S CNT_CHILDREN Inference: • Clients with more than 5 children tend to pay installments late. • Most clients with 2 or 3 children pay installments on time.
  • 23. TARGET V/S NAME_EDUCATION_TYPE Inference: • Clients with an academic degree pay installments on time. • Clients with lower secondary education pay installments late.
  • 24. MULTIVARIATE ANALYSIS ON NUMERIC VARIABLES Inference: • A high positive correlation is seen between goods price and credit amount. • A high positive correlation is seen between annuity amount and credit amount. • A high positive correlation is seen between annuity amount and goods price.
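Correlations like these are typically read off a correlation matrix. The sketch below uses synthetic data built to be correlated, so it shows the mechanics only and does not reproduce the slide's numbers; the column names AMT_CREDIT, AMT_GOODS_PRICE and AMT_ANNUITY are assumed spellings of the variables named above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic amounts: goods price and annuity are built from the credit
# amount plus noise, so a strong positive correlation is expected.
amt_credit = rng.uniform(50_000, 1_000_000, size=500)
numeric = pd.DataFrame({
    "AMT_CREDIT": amt_credit,
    "AMT_GOODS_PRICE": amt_credit * 0.9 + rng.normal(0, 10_000, size=500),
    "AMT_ANNUITY": amt_credit * 0.05 + rng.normal(0, 2_000, size=500),
    "CNT_CHILDREN": rng.integers(0, 4, size=500),  # unrelated by construction
})

# Pairwise Pearson correlation (the pandas default).
corr = numeric.corr()
print(corr.round(2))
```

With seaborn available, `sns.heatmap(corr, annot=True)` is a common way to display such a matrix.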
  • 25. PROPORTIONS OF CLIENTS BASED ON PREVIOUS APPLICATION STATUS Inference: • Out of all loan applications, only 63% were Approved. • 17% were Refused and 19% were cancelled by the clients.
  • 26. HANDLING OUTLIERS Inference: • Outliers were observed in the annual income variable. • 99% of clients had an income of less than 4.75 LPA. • Hence the analysis of annual income was limited to clients with an annual income of less than 4.75 LPA.
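A cutoff like this can be computed with a quantile. The sketch below uses made-up incomes in lakhs per annum; it illustrates the 99th-percentile filter and is not meant to reproduce the 4.75 LPA figure.

```python
import pandas as pd

# Toy incomes in lakhs per annum; the last value is a deliberate outlier.
application_data = pd.DataFrame({
    "AMT_INCOME_TOTAL": [1.2, 1.5, 1.8, 2.0, 2.2, 2.5, 3.0, 3.5, 4.0, 1170.0],
})

# 99th percentile of income (linear interpolation by default).
cap = application_data["AMT_INCOME_TOTAL"].quantile(0.99)

# Keep only clients strictly below the cutoff for the income analysis.
trimmed = application_data[application_data["AMT_INCOME_TOTAL"] < cap]
print(f"99th percentile: {cap:.2f}; rows kept: {len(trimmed)} of {len(application_data)}")
```

Filtering into a separate frame keeps the full data available for analyses that do not involve income.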
  • 27. TARGET V/S INCOME V/S EDUCATION TYPE Inference: • Clients with an academic degree and an income in the range of 3-3.6 lakhs pay installments late compared to those with a low income.
  • 28. TARGET V/S NAME_CASH_LOAN_PURPOSE Inference: • Clients who previously took a loan to make payments on other loans pay installments late. • They are followed by clients with Home/Office/Land loans and loans for personal household expenses, who also pay installments late.
  • 29. TARGET V/S INCOME V/S PREVIOUS APPLICATION STATUS Inference: • Clients who took a loan for Business Development and have an annual income above 2.6 LPA pay installments late.
  • 30. TARGET V/S PREVIOUS LOAN STATUS Inference: • Clients whose previous loan was Refused pay their installments late.
  • 31. KEY INSIGHTS • The following are strong indicators of default: 1. NAME_HOUSING_TYPE: Clients living in rented apartments 2. NAME_FAMILY_STATUS: Clients in a Civil marriage and those who are Single/not married 3. NAME_INCOME_TYPE: Maternity leave, students, unemployed clients 4. FLAG_DOCUMENT_2: Clients who do not provide Document 2 5. FLAG_MOBIL: Clients who do not provide a mobile number 6. OCCUPATION_TYPE: Low skill laborers, laborers, waiters, barmen, security staff 7. CNT_CHILDREN: Positive correlation between the number of children and the chance of the client being a defaulter 8. NAME_EDUCATION_TYPE: Clients with lower secondary, secondary/secondary special and incomplete higher education 9. EDUCATION_TYPE: Clients with an academic degree and an annual income between 3 and 3.6 lakhs 10. CASH_LOAN_PURPOSE: Clients whose previous loan purpose was payment on other loans • The following clients should be targeted: 1. CODE_GENDER: Females 2. NAME_CONTRACT_TYPE: Clients with revolving loans 3. FLAG_CODE_CAR: Clients with a car 4. AVG_EXT_SCORE: Clients with a moderate external score 5. AMT_INCOME_TOTAL: Clients with an annual income greater than 6 lakhs 6. NAME_INCOME_TYPE: Businessmen and pensioners 7. FLAG_MOBIL: Clients who provide a mobile number 8. DAYS_BIRTH: Clients aged 65 and above 9. OCCUPATION_TYPE: Accountants, high skill tech staff and HR staff 10. NAME_EDUCATION_TYPE: Clients with an academic degree
  • 32. CONCLUSION • Based on the inferences obtained, a credit score can be set. • Variables that contribute to the chances of a client being a defaulter will be rated with a low score. • Variables that contribute to the chances of a client paying installments on time will be rated with a high score. • Based on the final credit score, the bank can take the following decisions: 1. Grant the loan to clients with a healthy overall credit score 2. Grant the loan at a higher interest rate to clients with a comparatively low credit score 3. Reject the loan for clients with an extremely low credit score
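As a purely illustrative sketch of how such a score could be wired up: the point values and decision thresholds below are hypothetical, chosen only to show the mechanics, and the category labels are assumed spellings. The slides do not specify any weights or cutoffs.

```python
# Hypothetical points per (column, value): negative for indicators of
# default from the key insights, positive for on-time indicators.
POINTS = {
    ("NAME_HOUSING_TYPE", "Rented apartment"): -2,
    ("NAME_EDUCATION_TYPE", "Lower secondary"): -2,
    ("NAME_INCOME_TYPE", "Unemployed"): -3,
    ("NAME_CONTRACT_TYPE", "Revolving loans"): 1,
    ("NAME_EDUCATION_TYPE", "Academic degree"): 2,
}

def credit_score(client: dict) -> int:
    """Sum the points for every attribute of the client that has a rule."""
    return sum(POINTS.get((column, value), 0) for column, value in client.items())

def decision(score: int, approve_at: int = 2, reject_below: int = -3) -> str:
    """Map a score to one of the three actions listed in the conclusion."""
    if score >= approve_at:
        return "grant"
    if score < reject_below:
        return "reject"
    return "grant at higher interest rate"

client_a = {"NAME_EDUCATION_TYPE": "Academic degree", "NAME_CONTRACT_TYPE": "Revolving loans"}
client_b = {
    "NAME_HOUSING_TYPE": "Rented apartment",
    "NAME_EDUCATION_TYPE": "Lower secondary",
    "NAME_INCOME_TYPE": "Unemployed",
}
client_c = {"NAME_HOUSING_TYPE": "Rented apartment", "NAME_CONTRACT_TYPE": "Revolving loans"}

for name, client in [("a", client_a), ("b", client_b), ("c", client_c)]:
    score = credit_score(client)
    print(name, score, decision(score))
```

In practice the weights would be fitted to data rather than set by hand; this table only mirrors the three-way decision described above.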