Credit EDA Case Study
Prepared by:
Santh Raul
Ramlal Naik
Date: 22/06/2020
Columns having null values:
• Number of columns with more than 50% null values: 41. These columns should be dropped.
• Number of columns with less than 15% null values: 13. These columns are imputed with suitable values, as explained in the following slides.
• Seven variables were selected for the imputation analysis.
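The null-value screening above can be sketched in pandas as follows. This is a minimal illustration under assumptions: the toy DataFrame, its column names and the helper `split_by_null_share` are hypothetical, and only the 50% and 15% thresholds come from the slide.

```python
import numpy as np
import pandas as pd


def split_by_null_share(df, drop_above=50.0, impute_below=15.0):
    """Return (columns to drop, columns to impute) based on the % of nulls."""
    null_pct = df.isnull().mean() * 100  # percentage of nulls per column
    to_drop = null_pct[null_pct > drop_above].index.tolist()
    # Columns that have some nulls, but fewer than the imputation threshold
    to_impute = null_pct[(null_pct > 0) & (null_pct < impute_below)].index.tolist()
    return to_drop, to_impute


# Toy data (hypothetical): 10 rows, three columns with different null shares
toy = pd.DataFrame({
    "mostly_null": [1.0] + [np.nan] * 9,   # 90% null
    "few_nulls": [np.nan] + [2.0] * 9,     # 10% null
    "complete": list(range(10)),           # 0% null
})

drop_cols, impute_cols = split_by_null_share(toy)
cleaned = toy.drop(columns=drop_cols)
```

Columns that fall between the two thresholds are returned in neither list, since the slide does not say how they were handled.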
Data imputation analysis for columns having <15% null values:
• Continuous variables: 'EXT_SOURCE_2', 'AMT_GOODS_PRICE'
• Categorical variables: 'OBS_30_CNT_SOCIAL_CIRCLE', 'OBS_60_CNT_SOCIAL_CIRCLE', 'DEF_30_CNT_SOCIAL_CIRCLE', 'DEF_60_CNT_SOCIAL_CIRCLE', 'NAME_TYPE_SUITE'
• 'EXT_SOURCE_2' has no outliers, so its missing values can be imputed with the mean or the median (median: 0.565).
• 'AMT_GOODS_PRICE' has a high number of outliers, so it is recommended to impute with the median value, i.e. 450000.
• For categorical variables, the imputed value should be the most frequent value (the mode). The values to be imputed are:
NAME_TYPE_SUITE: Unaccompanied
OBS_30_CNT_SOCIAL_CIRCLE: 0.0
DEF_30_CNT_SOCIAL_CIRCLE: 0.0
OBS_60_CNT_SOCIAL_CIRCLE: 0.0
DEF_60_CNT_SOCIAL_CIRCLE: 0.0
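The imputation rule described above (median for a continuous column with outliers, most frequent value for a categorical column) might look like this in pandas. The sample values are hypothetical toy data; only the column names and the median/mode choice come from the slide.

```python
import numpy as np
import pandas as pd

# Toy sample (hypothetical values); column names follow the slide
df = pd.DataFrame({
    "AMT_GOODS_PRICE": [450000.0, 225000.0, np.nan, 675000.0, 4050000.0],
    "NAME_TYPE_SUITE": ["Unaccompanied", "Family", None,
                        "Unaccompanied", "Unaccompanied"],
})

# Continuous variable with outliers: the median is robust to extreme values
goods_median = df["AMT_GOODS_PRICE"].median()
df["AMT_GOODS_PRICE"] = df["AMT_GOODS_PRICE"].fillna(goods_median)

# Categorical variable: impute with the most frequent value (mode)
suite_mode = df["NAME_TYPE_SUITE"].mode()[0]
df["NAME_TYPE_SUITE"] = df["NAME_TYPE_SUITE"].fillna(suite_mode)
```

The median is preferred over the mean here because a single extreme value (4050000 in the toy data) would pull the mean upwards but leaves the median almost unchanged.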
Checking for outliers in numerical variables:
• The first quartile is almost missing from the box plot for CNT_CHILDREN, which means most of the data lie in the first quartile.
• A single very high data point is present as an outlier in AMT_INCOME_TOTAL and in DAYS_EMPLOYED. Removing this point will drastically change the box plot for further analysis.
• The first quartile is slim compared to the third quartile for AMT_CREDIT, AMT_ANNUITY and DAYS_REGISTRATION. This means the data are skewed towards the first quartile.
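The outliers above were read off box plots. A numerical counterpart can be sketched with the common 1.5 × IQR rule, which is the usual box-plot whisker convention. The slides do not state which rule was used, so the rule, the helper name `iqr_outliers` and the toy incomes below are assumptions.

```python
import pandas as pd


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]


# Toy incomes (hypothetical) with one extreme value
income = pd.Series(
    [90000, 112500, 135000, 157500, 180000, 202500, 117000000],
    name="AMT_INCOME_TOTAL",
)

outliers = iqr_outliers(income)
```

With one extreme point like this, the box plot axis stretches to fit it, which is why removing the point changes the plot so much.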
Univariate analysis for categorical variables
AMT_INCOME_RANGE:
• People with an income of 100000-200000 have the highest number of loans and also the highest number of defaulters.
• The income segment above 500000 has fewer defaulters.
AMT_CREDIT_RANGE:
• People with loans below 100000 default less.
• Credit ranges above 100000 have an almost equal percentage of loan defaulters.
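The range variables above are derived by binning a continuous amount. A minimal sketch with `pd.cut`, assuming a `TARGET` column where 1 marks payment difficulties; the bin edges, labels and toy rows are illustrative and are not the authors' exact bins.

```python
import pandas as pd

# Toy applications (hypothetical); TARGET = 1 marks payment difficulties
df = pd.DataFrame({
    "AMT_INCOME_TOTAL": [80000, 150000, 150000, 180000, 250000, 600000],
    "TARGET": [0, 1, 0, 1, 0, 0],
})

# Illustrative bin edges echoing the ranges named on the slide.
# Intervals are right-inclusive, so exactly 100000 falls in the first bin.
bins = [0, 100000, 200000, 500000, float("inf")]
labels = ["<100000", "100000-200000", "200000-500000", ">500000"]
df["AMT_INCOME_RANGE"] = pd.cut(df["AMT_INCOME_TOTAL"], bins=bins, labels=labels)

# Number of loans and share of defaulters per income range
summary = df.groupby("AMT_INCOME_RANGE", observed=True)["TARGET"].agg(
    loans="count", default_rate="mean"
)
summary["default_pct"] = summary["default_rate"] * 100
```

Reporting both the count and the percentage keeps the two observations apart: a range can hold the most loans without having the highest default percentage.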
NAME_INCOME_TYPE:
• Students, pensioners and businessmen have a higher percentage of loan repayment.
• Working, State servant and Commercial associate have a higher default percentage.
• The Maternity category has significantly more problems with repayment.
NAME_CONTRACT_TYPE:
• The contract type 'Cash loans' has a higher number of credits than 'Revolving loans'.
• The graphs show that revolving loans are a small share compared to cash loans, but the percentage of non-payment for revolving loans is comparatively high.
CODE_GENDER:
• The percentage of defaulters is higher among males than females.
FLAG_OWN_CAR:
• People who own a car have a higher percentage of defaulters.
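Percentages of defaulters per category, as compared above, can be obtained from a row-normalised cross-tabulation. This is a sketch on hypothetical toy rows, assuming a `TARGET` column where 1 marks payment difficulties. It is not the authors' code.

```python
import pandas as pd

# Toy data (hypothetical); TARGET = 1 marks payment difficulties
df = pd.DataFrame({
    "CODE_GENDER": ["M", "M", "M", "M", "F", "F", "F", "F"],
    "TARGET": [1, 1, 0, 0, 1, 0, 0, 0],
})

# Row-normalised cross-tabulation: share of each TARGET value within a category
pct = pd.crosstab(df["CODE_GENDER"], df["TARGET"], normalize="index") * 100
```

Normalising by row matters when categories differ in size, because raw counts of defaulters would favour the larger category.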
Univariate analysis for continuous variables
• DAYS_BIRTH: older people have a higher probability of repayment.
• Some outliers are observed in 'AMT_ANNUITY', 'AMT_GOODS_PRICE', 'DAYS_EMPLOYED' and 'DAYS_LAST_PHONE_CHANGE'.
• Fewer outliers are observed in DAYS_BIRTH.
• DAYS_EMPLOYED has a single high-value outlier. Removing this point will drastically change the box plot for further analysis.
• Fewer outliers are observed in DAYS_ID_PUBLISH.
• The first quartile is smaller than the third quartile in 'AMT_ANNUITY', 'AMT_GOODS_PRICE' and 'DAYS_LAST_PHONE_CHANGE'.
• DAYS_ID_PUBLISH: people who changed their ID recently are relatively more prone to default.
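Quartile comparisons like those above can also be made numerically by computing quartiles per target group. The sketch assumes that `DAYS_BIRTH` is stored as negative days relative to the application date and that `TARGET` = 1 marks payment difficulties. The rows are hypothetical toy values.

```python
import pandas as pd

# Toy data (hypothetical); DAYS_BIRTH assumed to be negative days before application
df = pd.DataFrame({
    "DAYS_BIRTH": [-20000, -18000, -16000, -14000, -12000, -10000, -9000, -8000],
    "TARGET": [0, 0, 0, 0, 1, 1, 1, 1],
})

# Convert to a positive age in years for readability
df["AGE_YEARS"] = -df["DAYS_BIRTH"] / 365

# One row per TARGET value, one column per quartile
quartiles = (
    df.groupby("TARGET")["AGE_YEARS"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
```

The resulting table has the same information as side-by-side box plots, minus the whiskers and outlier points.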
Bivariate analysis for numerical variables – Target 0 (clients with no payment difficulties)
• Clients with an Academic degree and a family status of 'civil marriage', 'marriage' or 'separated' have a higher number of credits than others.
• Also, Higher education clients with a family status of 'marriage', 'single' or 'civil marriage' have more outliers.
• For Academic degree with civil marriage, most of the credits are in the third quartile.
• For education type 'Higher education', the income amount is mostly equal across family statuses, and it contains many outliers.
• Academic degree has fewer outliers, but its income amount is a little higher than that of Higher education.
• Lower secondary clients with a civil marriage family status have a lower income amount than others.
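The bivariate view above compares an amount across two categorical variables at once. A pivot table of medians is one numerical way to do that. The column names `NAME_EDUCATION_TYPE`, `NAME_FAMILY_STATUS` and `TARGET` are assumptions, since the slides name only the categories, and the rows are hypothetical.

```python
import pandas as pd

# Toy data (hypothetical); column names are assumed, categories follow the slide
df = pd.DataFrame({
    "NAME_EDUCATION_TYPE": ["Higher education", "Higher education", "Higher education",
                            "Academic degree", "Academic degree", "Lower secondary"],
    "NAME_FAMILY_STATUS": ["marriage", "marriage", "civil marriage",
                           "civil marriage", "marriage", "civil marriage"],
    "AMT_INCOME_TOTAL": [180000, 220000, 200000, 260000, 240000, 90000],
    "TARGET": [0, 0, 0, 0, 0, 0],
})

# Restrict to clients with no payment difficulties, then take the median
# income for every (education type, family status) combination
target0 = df[df["TARGET"] == 0]
median_income = target0.pivot_table(
    index="NAME_EDUCATION_TYPE",
    columns="NAME_FAMILY_STATUS",
    values="AMT_INCOME_TOTAL",
    aggfunc="median",
)
```

Combinations with no rows show up as missing values in the table, which is a reminder that some education/family-status groups may be too small to interpret.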
Bivariate analysis for numerical variables – Target 1 (clients with payment difficulties)
• The observations are quite similar to those for Target 0.
• Clients with an Academic degree and a family status of 'civil marriage', 'marriage' or 'separated' have a higher number of credits than others.
• Most of the outliers come from education types 'Higher education' and 'Secondary'.
• For Academic degree with civil marriage, most of the credits are in the third quartile.
• There is also some similarity with Target 0.
• For education type 'Higher education', the income amount is mostly equal across family statuses.
• Academic degree has fewer outliers, but its income amount is a little higher than that of Higher education.
• Lower secondary clients have a lower income amount than others.
Correlation
[Figure panels: Target 0, Target 1]
• From the correlation analysis, the highest correlation (1.0) is between OBS_60_CNT_SOCIAL_CIRCLE and OBS_30_CNT_SOCIAL_CIRCLE, and between FLOORSMAX_MEDI and FLOORSMAX_AVG. This is the same for both datasets.
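Finding the most strongly correlated column pairs, as reported above, can be sketched as follows. The helper `top_correlations` and the toy values are hypothetical. In practice it would be run once on the Target 0 subset and once on the Target 1 subset.

```python
import numpy as np
import pandas as pd


def top_correlations(df, n=3):
    """Return the n column pairs with the highest absolute correlation."""
    corr = df.select_dtypes("number").corr().abs()
    # Keep the upper triangle only: each pair appears once, diagonal dropped
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    return pairs.sort_values(ascending=False).head(n)


# Toy data (hypothetical values)
toy = pd.DataFrame({
    "OBS_30_CNT_SOCIAL_CIRCLE": [0, 1, 2, 3, 4],
    "OBS_60_CNT_SOCIAL_CIRCLE": [0, 1, 2, 3, 4],  # identical, so correlation is 1.0
    "AMT_CREDIT": [5, 1, 4, 2, 3],
})

top = top_correlations(toy, n=1)
```

A correlation of 1.0 between two columns means they carry the same information, so one of each pair is a candidate for removal before modelling.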
Univariate analysis for combined dataset (Distribution of contract status with purpose)
Most loan rejections came from the purpose 'Repairs'. For education purposes, the numbers of approvals and rejections are equal. 'Paying other loans' and 'Buying a new car' have significantly more rejections than approvals.
Univariate analysis for combined dataset (Distribution of the purpose with target)
Loans with the purpose 'Repairs' face more difficulties in on-time payment. For a few purposes, on-time payment is significantly more common than payment difficulties: 'Buying a garage', 'Business development', 'Buying land', 'Buying a new car' and 'Education'. Hence we can focus on these purposes, for which clients have minimal payment difficulties.
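The slides do not say how the combined dataset was built. One plausible sketch is an inner join of current and previous applications on a client id, followed by cross-tabulations of purpose against contract status and against target. The column names `SK_ID_CURR`, `NAME_CASH_LOAN_PURPOSE` and `NAME_CONTRACT_STATUS`, the join type and all rows are assumptions.

```python
import pandas as pd

# Toy stand-ins (hypothetical) for the current and previous application tables
current = pd.DataFrame({"SK_ID_CURR": [1, 2, 3], "TARGET": [0, 1, 0]})
previous = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 2, 3, 3],
    "NAME_CASH_LOAN_PURPOSE": ["Repairs", "Education", "Repairs",
                               "Repairs", "Education"],
    "NAME_CONTRACT_STATUS": ["Refused", "Approved", "Refused",
                             "Approved", "Refused"],
})

# Inner join on the client id keeps clients present in both tables
combined = current.merge(previous, on="SK_ID_CURR", how="inner")

# Counts of contract status per purpose, and of target per purpose
status_by_purpose = pd.crosstab(combined["NAME_CASH_LOAN_PURPOSE"],
                                combined["NAME_CONTRACT_STATUS"])
target_by_purpose = pd.crosstab(combined["NAME_CASH_LOAN_PURPOSE"],
                                combined["TARGET"])
```

Because one client can have several previous applications, the join repeats the client's target on every matching row, so the counts are per previous application rather than per client.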
Bivariate analysis for combined dataset
• The credit amount is higher for loan purposes such as 'Buying a home', 'Buying a land', 'Buying a new car' and 'Building a house'.
• The income type State servant has a significant amount of credit applied.
• 'Money for a third person' and 'Hobby' have fewer credits applied.
• For housing type, office apartment has higher credit for Target 0 and co-op apartment has higher credit for Target 1.
• So we can conclude that the bank should avoid giving loans to clients with housing type 'Co-op apartment', as they have difficulties in payment.
• The bank can focus mostly on housing types 'With parents', 'House/apartment' and 'Municipal apartment' for successful payments.
Conclusion / Recommendation:
• Banks should focus more on income types 'Student', 'Pensioner' and 'Businessman' with a housing type other than 'Co-op apartment' for successful payments.
• Banks should focus less on income type 'Working', as it has the highest number of unsuccessful payments.
• For loan purpose 'Repairs':
  • Although loans for 'Repairs' have a higher number of rejections, difficulties in on-time payment are still observed.
  • There are a few cases where the loan payment delay is significantly high.
  • The bank should continue to be cautious when giving loans for this purpose.
• The bank should avoid giving loans to clients with housing type 'Co-op apartment', as they have difficulties in payment.
• The bank can focus mostly on housing types 'With parents', 'House/apartment' and 'Municipal apartment' for successful payments.
