Loan EDA Case Study
Prepared by:
Amit Kumar Das
Date: 30/10/2022
Problem Statement:
• Aims to give you an idea of applying EDA in a real business scenario.
• Applying the techniques that you have learnt in the EDA module, you will also develop a basic understanding of risk analytics in banking and financial services.
• Understand how data is used to minimize the risk of losing money while lending to customers.
• Present the overall approach of the analysis in a presentation.
• Mention the problem statement and the analysis approach briefly.
• Identify the missing data and use an appropriate method to deal with it (remove columns or replace missing values with an appropriate value).
• Identify whether there are outliers in the dataset, and mention why you think each one is an outlier. For this exercise it is not necessary to remove any data points.
• Identify whether there is data imbalance in the data, and find the imbalance ratio.
• Explain the results of univariate, segmented univariate, bivariate analysis, etc. in business terms.
• Include visualizations and summarize the most important results in the presentation. You are free to choose the graphs that explain the numerical/categorical variables.
• Insights should explain why the variable is important for differentiating clients with payment difficulties from all other cases.
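The data imbalance ratio asked for in the problem statement could be computed along the following lines. This is a minimal sketch, not the author's code: the column name `TARGET` (0 = no payment difficulties, 1 = payment difficulties) and the toy values are assumptions made for illustration.

```python
import pandas as pd


def imbalance_ratio(target: pd.Series) -> float:
    """Return count(target == 0) / count(target == 1)."""
    counts = target.value_counts()
    minority = counts.get(1, 0)
    if minority == 0:
        raise ValueError("no rows with payment difficulties (target == 1)")
    return counts.get(0, 0) / minority


# Toy data (not from the case study): 9 clients without difficulties, 1 with.
toy_target = pd.Series([0] * 9 + [1], name="TARGET")  # "TARGET" is an assumed name
ratio = imbalance_ratio(toy_target)
print(f"Imbalance ratio (Target 0 : Target 1) = {ratio:.1f} : 1")
```

A ratio far from 1 indicates that plain counts will be dominated by Target 0, which is why the later slides compare percentages within each segment.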
Columns having null values:
• Number of columns with more than 50% null values: 41. These columns should be dropped.
• Number of columns with less than 15% null values: 13. These columns shall be imputed with suitable values, as explained subsequently.
• For the imputation analysis, 7 variables were selected.
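The null-value screening described above can be sketched as follows. The helper names and the toy DataFrame are hypothetical. Only the 50% threshold comes from the slide.

```python
import numpy as np
import pandas as pd


def null_percentage(df: pd.DataFrame) -> pd.Series:
    """Percentage of missing values per column."""
    return df.isnull().mean() * 100


def drop_sparse_columns(df: pd.DataFrame, threshold: float = 50.0) -> pd.DataFrame:
    """Drop every column whose null percentage exceeds `threshold`."""
    pct = null_percentage(df)
    return df.drop(columns=pct[pct > threshold].index)


# Toy data (illustrative only): one column is 75% null, one 25%, one complete.
toy = pd.DataFrame({
    "mostly_missing": [1.0, np.nan, np.nan, np.nan],
    "some_missing": [1.0, 2.0, 3.0, np.nan],
    "complete": [1, 2, 3, 4],
})
pct = null_percentage(toy)
cleaned = drop_sparse_columns(toy)
print(pct)
print(list(cleaned.columns))
```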
Data imputation analysis for columns having <15% null values:
• Continuous variables: 'EXT_SOURCE_2', 'AMT_GOODS_PRICE'
• Categorical variables: 'OBS_30_CNT_SOCIAL_CIRCLE', 'OBS_60_CNT_SOCIAL_CIRCLE', 'DEF_60_CNT_SOCIAL_CIRCLE', 'DEF_30_CNT_SOCIAL_CIRCLE', 'NAME_TYPE_SUITE'
• For 'EXT_SOURCE_2' no outliers are present, so missing values can be imputed with the mean or the median (median: 0.565).
• There is a high number of outliers in the AMT_GOODS_PRICE data. Hence it is recommended to impute with the median value, i.e. 450000.
• For categorical variables, the value to impute should be the one with the maximum frequency (the mode). So the values to be imputed are:
NAME_TYPE_SUITE: Unaccompanied
OBS_30_CNT_SOCIAL_CIRCLE: 0.0
DEF_30_CNT_SOCIAL_CIRCLE: 0.0
OBS_60_CNT_SOCIAL_CIRCLE: 0.0
DEF_60_CNT_SOCIAL_CIRCLE: 0.0
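The imputation rule above (median for continuous variables with outliers, most frequent value for categorical variables) might look like this in pandas. The helper names and the toy rows are assumptions for illustration. The toy values are not the case-study data.

```python
import numpy as np
import pandas as pd


def impute_median(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Fill missing values in each listed column with that column's median."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].fillna(out[col].median())
    return out


def impute_mode(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Fill missing values in each listed column with its most frequent value."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


# Toy rows (illustrative only).
toy = pd.DataFrame({
    "AMT_GOODS_PRICE": [100000.0, 450000.0, 900000.0, np.nan],
    "NAME_TYPE_SUITE": ["Unaccompanied", "Unaccompanied", "Family", None],
})
filled = impute_mode(impute_median(toy, ["AMT_GOODS_PRICE"]), ["NAME_TYPE_SUITE"])
print(filled)
```

The median is preferred over the mean for AMT_GOODS_PRICE because a few very large values pull the mean upwards but barely move the median.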
Checking outliers for numerical variables:
• The first quartile is almost missing for CNT_CHILDREN, which means most of the data lies in the first quartile.
• There is a single high-value data point present as an outlier in AMT_INCOME_TOTAL and DAYS_EMPLOYED. Removing this point will drastically change the box plot for further analysis.
• The first quartile is slim compared to the third quartile for AMT_CREDIT, AMT_ANNUITY and DAYS_REGISTRATION. This means the data are skewed towards the first quartile.
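The slides identify outliers visually from box plots. A numeric counterpart is the usual 1.5 x IQR whisker rule, sketched below. The slides do not state which rule was used, so treating it as 1.5 x IQR is an assumption, and the income values are toy numbers.

```python
import pandas as pd


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    mask = (series < q1 - k * iqr) | (series > q3 + k * iqr)
    return series[mask]


# Toy incomes (illustrative only): one extreme value, as described for
# AMT_INCOME_TOTAL on the slide.
toy_income = pd.Series([100, 120, 130, 150, 160, 10000], name="AMT_INCOME_TOTAL")
outliers = iqr_outliers(toy_income)
print(outliers)
```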
Univariate analysis for categorical variables
AMT_INCOME_RANGE:
• People with income in the 100000-200000 range hold the highest number of loans and also account for more defaulters.
• The income segment above 500000 has fewer defaulters.
AMT_CREDIT_RANGE:
• People with loans below 100000 are less likely to default.
• Segments above 100000 show an almost equal percentage of loan defaulters.
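Range variables such as AMT_INCOME_RANGE are typically derived by binning the raw amount and then computing the default percentage per bin. The sketch below assumes a `TARGET` column and uses hypothetical bin edges. The slides do not list the exact edges that were used.

```python
import numpy as np
import pandas as pd


def default_rate_by_range(df, value_col, bins, labels, target_col="TARGET"):
    """Bin `value_col` and return the percentage of target == 1 per bin."""
    ranges = pd.cut(df[value_col], bins=bins, labels=labels)
    # observed=True keeps only bins that actually contain rows.
    return df.groupby(ranges, observed=True)[target_col].mean() * 100


# Toy data (illustrative only).
toy = pd.DataFrame({
    "AMT_INCOME_TOTAL": [50000, 150000, 150000, 150000, 150000, 600000],
    "TARGET": [0, 1, 0, 0, 0, 0],
})
bins = [0, 100000, 200000, 500000, np.inf]          # hypothetical edges
labels = ["<100000", "100000-200000", "200000-500000", ">500000"]
rates = default_rate_by_range(toy, "AMT_INCOME_TOTAL", bins, labels)
print(rates)
```

Comparing percentages rather than raw counts matters here: a segment can hold the most defaulters simply because it holds the most loans.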
Univariate analysis for categorical variables
NAME_INCOME_TYPE:
• Student, Pensioner and Businessman income types have a higher percentage of loan repayment.
• Working, State servant and Commercial associate have a higher default percentage.
• The Maternity category has significantly more problems in repayment.
NAME_CONTRACT_TYPE:
• Contract type 'Cash loans' has a higher number of credits than contract type 'Revolving loans'.
• From the graphs we can see that Revolving loans are a small amount compared to Cash loans, but the % of non-payment for Revolving loans is comparatively high.
Univariate analysis for categorical variables
CODE_GENDER:
• The % of defaulters is higher among males than females.
FLAG_OWN_CAR:
• People owning a car have a higher percentage of defaulters.
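The per-category default percentages discussed on these slides can be obtained with a row-normalized cross-tabulation. This is a sketch under the assumption that the target column is called `TARGET`. The rows are toy data.

```python
import pandas as pd


def default_share(df: pd.DataFrame, category_col: str, target_col: str = "TARGET") -> pd.DataFrame:
    """Percentage of target 0 / target 1 within each category (rows sum to 100)."""
    return pd.crosstab(df[category_col], df[target_col], normalize="index") * 100


# Toy data (illustrative only): 2 of 4 males and 1 of 4 females have difficulties.
toy = pd.DataFrame({
    "CODE_GENDER": ["M", "M", "M", "M", "F", "F", "F", "F"],
    "TARGET": [1, 1, 0, 0, 1, 0, 0, 0],
})
share = default_share(toy, "CODE_GENDER")
print(share)
```

The same call works for any categorical column, for example FLAG_OWN_CAR or NAME_INCOME_TYPE.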
Univariate analysis for continuous variables
• Days Birth: older people have a higher probability of repayment.
• Some outliers are observed in 'AMT_ANNUITY', 'AMT_GOODS_PRICE', 'DAYS_EMPLOYED' and 'DAYS_LAST_PHONE_CHANGE' in the dataset.
• Fewer outliers are observed in Days Birth.
• DAYS_EMPLOYED: removing the single high-value outlier point will drastically change the box plot for further analysis.
Univariate analysis for continuous variables
• Fewer outliers are observed in DAYS_ID_PUBLISH.
• The 1st quartile is smaller than the third quartile in 'AMT_ANNUITY', 'AMT_GOODS_PRICE' and 'DAYS_LAST_PHONE_CHANGE'.
• In DAYS_ID_PUBLISH: people who changed their ID recently are relatively more prone to default.
Bivariate analysis for numerical variables – Target 0 (Client having no payment difficulties)
• Family statuses 'civil marriage', 'marriage' and 'separated' with Academic degree education have a higher number of credits than others.
• Also, Higher education with family statuses 'marriage', 'single' and 'civil marriage' has more outliers.
• Civil marriage with Academic degree has most of its credits in the third quartile.
Bivariate analysis for numerical variables – Target 0 (Client having no payment difficulties)
• For education type 'Higher education', the income amount is mostly equal across family statuses. It does contain many outliers.
• Academic degree has fewer outliers, but its income amount is a little higher than that of Higher education.
• Lower secondary with civil marriage family status has a lower income amount than others.
Bivariate analysis for numerical variables – Target 1 (Client having payment difficulties)
• Observations are quite similar to Target 0.
• Family statuses 'civil marriage', 'marriage' and 'separated' with Academic degree education have a higher number of credits than others.
• Most of the outliers are from education types 'Higher education' and 'Secondary'.
• Civil marriage with Academic degree has most of its credits in the third quartile.
Bivariate analysis for numerical variables – Target 1 (Client having payment difficulties)
• There is also some similarity with Target 0.
• For education type 'Higher education', the income amount is mostly equal across family statuses.
• Academic degree has fewer outliers, but its income amount is a little higher than that of Higher education.
• Lower secondary has a lower income amount than others.
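The box-plot comparisons above (credit or income by education type and family status, split by target) can be cross-checked numerically with grouped quartiles. In this sketch the column names `NAME_EDUCATION_TYPE`, `NAME_FAMILY_STATUS` and `TARGET` are assumptions, since the slides do not name them, and the rows are toy data.

```python
import pandas as pd


def grouped_quartiles(df, target_value, value_col="AMT_CREDIT",
                      group_cols=("NAME_EDUCATION_TYPE", "NAME_FAMILY_STATUS"),
                      target_col="TARGET"):
    """Q1, median and Q3 of `value_col` per group, for one target class."""
    subset = df[df[target_col] == target_value]
    quartiles = subset.groupby(list(group_cols))[value_col].quantile([0.25, 0.5, 0.75])
    # Move the quantile level into columns: one row per group.
    return quartiles.unstack()


# Toy data (illustrative only).
toy = pd.DataFrame({
    "NAME_EDUCATION_TYPE": ["Higher education"] * 3 + ["Academic degree"] * 3,
    "NAME_FAMILY_STATUS": ["Married"] * 3 + ["Civil marriage"] * 3,
    "AMT_CREDIT": [100000, 200000, 300000, 400000, 500000, 600000],
    "TARGET": [0] * 6,
})
q_target0 = grouped_quartiles(toy, target_value=0)
print(q_target0)
```

Running the same function with `target_value=1` gives the table to compare against, which is how the "quite similar to Target 0" observation could be checked.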
Correlation
[Figure panels: Target 0, Target 1]
• From the correlation analysis it is inferred that the highest correlation (1.0) is between OBS_60_CNT_SOCIAL_CIRCLE and OBS_30_CNT_SOCIAL_CIRCLE, and between FLOORSMAX_MEDI and FLOORSMAX_AVG, which is the same for both data sets.
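One way to list the most strongly correlated column pairs, as reported above, is to take the upper triangle of the absolute correlation matrix and sort it. The function name and toy columns below are hypothetical. In the case study this would be run once on the Target 0 rows and once on the Target 1 rows.

```python
import numpy as np
import pandas as pd


def top_correlations(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """Return the n column pairs with the highest absolute Pearson correlation."""
    corr = df.select_dtypes("number").corr().abs()
    # Keep only the strict upper triangle so each pair appears once
    # and the diagonal (always 1.0) is excluded.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return upper.stack().dropna().sort_values(ascending=False).head(n)


# Toy data (illustrative only): b is an exact multiple of a.
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 3, 4, 1, 2],
})
top = top_correlations(toy, n=3)
print(top)
```

A correlation of about 1.0 between two columns suggests they carry nearly the same information, so one of each pair can usually be set aside in later analysis.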
Univariate analysis for combined dataset (Distribution of contract status with purpose)
Most loan rejections came from the purpose 'Repairs'. For education purposes there is an equal number of approvals and rejections. 'Paying other loans' and 'Buying a new car' have significantly more rejections than approvals.
Univariate analysis for combined dataset (Distribution of the purpose with target)
Loans with the purpose 'Repairs' face more difficulties in on-time payment. There are a few purposes where on-time payment is significantly higher than payment difficulties: 'Buying a garage', 'Business development', 'Buying land', 'Buying a new car' and 'Education'. Hence we can focus on these purposes, for which clients have minimal payment difficulties.
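The combined dataset used on these slides presumably joins the current applications with each client's previous applications. A minimal sketch of that join and of the purpose-by-status counts follows. The key `SK_ID_CURR` and the column names `NAME_CASH_LOAN_PURPOSE` and `NAME_CONTRACT_STATUS` are assumptions, since the slides do not name them, and the rows are toy data.

```python
import pandas as pd


def combine(applications: pd.DataFrame, previous: pd.DataFrame,
            key: str = "SK_ID_CURR") -> pd.DataFrame:
    """Inner-join current applications with previous applications on the client key."""
    return applications.merge(previous, on=key, how="inner")


def purpose_vs_status(combined: pd.DataFrame,
                      purpose_col: str = "NAME_CASH_LOAN_PURPOSE",
                      status_col: str = "NAME_CONTRACT_STATUS") -> pd.DataFrame:
    """Count previous applications per (purpose, contract status)."""
    return pd.crosstab(combined[purpose_col], combined[status_col])


# Toy data (illustrative only).
applications = pd.DataFrame({"SK_ID_CURR": [1, 2, 3], "TARGET": [0, 1, 0]})
previous = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 2, 3],
    "NAME_CASH_LOAN_PURPOSE": ["Repairs", "Repairs", "Education", "Education"],
    "NAME_CONTRACT_STATUS": ["Refused", "Refused", "Approved", "Refused"],
})
combined = combine(applications, previous)
table = purpose_vs_status(combined)
print(table)
```

Replacing `status_col` with the target column gives the purpose-by-target distribution in the same way.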
Bivariate analysis for combined dataset
• The credit amount for loan purposes like 'Buying a home', 'Buying a land', 'Buying a new car' and 'Building a house' is higher.
• The state servant income type has a significant amount of credit applied for.
• 'Money for a third person' or 'Hobby' has fewer credits applied for.
Bivariate analysis for combined dataset
• For housing type, office apartment has higher credit for Target 0 and co-op apartment has higher credit for Target 1.
• So we can conclude that the bank should avoid giving loans to clients with housing type co-op apartment, as they have difficulties in payment.
• The bank can focus mostly on housing types 'With parents', 'House apartment' or 'Municipal apartment' for successful payments.
Conclusion/Recommendation:
• Banks should focus more on income types 'Student', 'Pensioner' and 'Businessman' with a housing type other than 'Co-op apartment' for successful payments.
• Banks should focus less on income type 'Working', as it has the highest number of unsuccessful payments.
• For loan purpose 'Repairs':
   • Despite the higher number of rejections for loan purpose 'Repairs', difficulties in on-time payment are still observed.
   • There are a few places where the delay in loan payment is significantly high.
   • The bank should continue to exercise caution while giving loans for this purpose.
• The bank should avoid giving loans to clients with housing type co-op apartment, as they have difficulties in payment.
• The bank can focus mostly on housing types 'With parents', 'House apartment' and 'Municipal apartment' for successful payments.

EDA_Case_Study_PPT.pptx

  • 2.  Aims to give you an idea of applying EDA in a real business scenario.  Applying the techniques that you have learnt in the EDA module, you will also develop a basic understanding of risk analytics in banking and financial services.  Understand how data is used to minimize the risk of losing money while lending to customers.  Present the overall approach of the analysis in a presentation.  Mention the problem statement and the analysis approach briefly.  Identify the missing data and use appropriate method to deal with it. (Remove columns/or replace it with an appropriate value)  Identify if there are outliers in the dataset. Also, mention why do you think it is an outlier. Again, remember that for this exercise, it is not necessary to remove any data points  Identify if there is data imbalance in the data. Find the ratio of data imbalance.  Explain the results of univariate, segmented univariate, bivariate analysis, etc. in business terms  Include visualizations and summarize the most important results in the presentation. You are free to choose the graphs which explain the numerical/categorical variables  Insights should explain why the variable is important for differentiating the clients with payment difficulties with all other cases ProblemStatement:
  • 3.  Number of columns having null value more than 50%: 41 Nos These columns should be dropped.  Number of columns having null value less than 15%:13Nos These columns shall beimputed with suitable values which shall be explained subsequently For analysis ofimputation selected 7 variables. Columnshaving Nullvalue:
  • 4.  Continuous variables: 'EXT_SOURCE_2‘ 'AMT_GOODS_PRICE‘ • Categorical variables: 'OBS_30_CNT_SOCIAL_CIRCLE',' OBS_60_CNT_SOCIAL_CIRCLE', 'DEF_60_CNT_SOCIAL_CIRCLE', ‘DEF_30_CNT_SOCIAL_CIRCLE',' NAME_TYPE_SUITE • For 'EXT_SOURCE_2' there is no outliers present and missing values canbe imputed with mean or median (median:0.565)  There are high number of outliers present in the AMT_GOODS_PRICEdata. Hence it is recommended to impute data withMedian value i.e. 450000 For categorical variable the value which should be imputed should be the maximum infrequency. So the value to be imputed are: NAME_TYPE_SUITE: Unaccompanied OBS_30_CNT_SOCIAL_CIRCLE: 0.0 DEF_30_CNT_SOCIAL_CIRCLE: 0.0 OBS_60_CNT_SOCIAL_CIRCLE: 0.0 DEF_60_CNT_SOCIAL_CIRCLE: 0.0 Data Imputation analysis for columnshaving <15%nullvalue:
  • 5. Checkingthe outlierfor numerical variables:  The first quartile almost missing for CNT_CHILDRENthat means most of the data are present in the firstquartile.  There is single high value data point asoutlier present in AMT_INCOME_TOTALand DAYS_EMPLOYED. Removal this point will drastically impact the box plot for further analysis.  The first quartiles is slim compare to third quartile for AMT_CREDIT,AMT_ANNUITY , DAYS_REGISTRATION.This mean data are skewedtowards first quartile.
  • 6. Univariate analysis for categorical variables
      AMT_INCOME_RANGE:
      ◦ People with an income of 100000-200000 hold the highest number of loans and also account for the highest number of defaulters.
      ◦ The income segment above 500000 has fewer defaulters.
      AMT_CREDIT_RANGE:
      ◦ People with loans below 100000 default less.
      ◦ Ranges above 100000 show an almost equal percentage of loan defaulters.
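A range column such as AMT_INCOME_RANGE is typically derived by binning the raw amount and then comparing the default percentage per bin. The following sketch assumes bin edges every 100000 with the upper edge inclusive; the edges, labels and records are hypothetical, and the slide does not state how the ranges were actually built.

```python
from bisect import bisect_left
from collections import defaultdict

# Assumed bin edges; each bin includes its upper edge, e.g. (100000, 200000].
EDGES = [100000, 200000, 300000, 400000, 500000]
LABELS = ["0-100000", "100000-200000", "200000-300000",
          "300000-400000", "400000-500000", ">500000"]


def income_range(amount):
    """Map a raw income amount to its range label."""
    return LABELS[bisect_left(EDGES, amount)]


def default_rate_by_range(records):
    """Return {label: (number of loans, share with TARGET == 1)} for non-empty bins."""
    totals, defaults = defaultdict(int), defaultdict(int)
    for income, target in records:
        label = income_range(income)
        totals[label] += 1
        defaults[label] += target
    return {l: (totals[l], defaults[l] / totals[l]) for l in LABELS if l in totals}


# Hypothetical (income, TARGET) pairs.
records = [(150000, 1), (180000, 0), (120000, 0), (160000, 1),
           (90000, 0), (600000, 0), (250000, 0), (250000, 1)]

for label, (n_loans, rate) in default_rate_by_range(records).items():
    print(f"{label:>15}: {n_loans} loans, {rate:.0%} defaulters")
```

Reporting both the count and the percentage keeps the two observations on the slide separate: a bin can have the most loans without having the highest default rate.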
  • 7. Univariate analysis for categorical variables
      NAME_INCOME_TYPE:
      ◦ Student, Pensioner and Businessman have a higher percentage of loan repayment.
      ◦ Working, State servant and Commercial associate have a higher default percentage.
      ◦ The Maternity category has significantly more problems in repayment.
      NAME_CONTRACT_TYPE:
      ◦ Contract type 'Cash loans' has a higher number of credits than contract type 'Revolving loans'.
      ◦ From the graphs we can see that Revolving loans account for a small amount compared to Cash loans, but the percentage of non-payment for Revolving loans is comparatively high.
  • 8. Univariate analysis for categorical variables
      CODE_GENDER:
      ◦ The percentage of defaulters is higher among males than females.
      FLAG_OWN_CAR:
      ◦ Clients who own a car have a higher percentage of defaulters.
  • 9. Univariate analysis for continuous variables
      ◦ DAYS_BIRTH: older people have a higher probability of repayment.
      ◦ Some outliers are observed in 'AMT_ANNUITY', 'AMT_GOODS_PRICE', 'DAYS_EMPLOYED' and 'DAYS_LAST_PHONE_CHANGE'.
      ◦ Fewer outliers are observed in DAYS_BIRTH.
      ◦ DAYS_EMPLOYED has a single high-value outlier. Removing this point would drastically change the box plot for further analysis.
  • 10. Univariate analysis for continuous variables
      ◦ Fewer outliers are observed in DAYS_ID_PUBLISH.
      ◦ The first quartile is smaller than the third quartile in 'AMT_ANNUITY', 'AMT_GOODS_PRICE' and 'DAYS_LAST_PHONE_CHANGE'.
      ◦ DAYS_ID_PUBLISH: people who changed their ID recently are relatively more prone to default.
  • 11. Bivariate analysis for numerical variables – Target 0 (clients having no payment difficulties)
      ◦ For Academic degree education, the family statuses 'civil marriage', 'marriage' and 'separated' have a higher number of credits than others.
      ◦ Also, for Higher education, the family statuses 'marriage', 'single' and 'civil marriage' have more outliers.
      ◦ Civil marriage with Academic degree has most of its credits in the third quartile.
  • 12. Bivariate analysis for numerical variables – Target 0 (clients having no payment difficulties)
      ◦ For education type 'Higher education', the income amount is mostly equal across family statuses. It does contain many outliers.
      ◦ Academic degree has fewer outliers, but its income amount is a little higher than that of Higher education.
      ◦ Lower secondary with civil marriage family status has a lower income amount than others.
  • 13. Bivariate analysis for numerical variables – Target 1 (clients having payment difficulties)
      ◦ Observations are quite similar to Target 0.
      ◦ For Academic degree education, the family statuses 'civil marriage', 'marriage' and 'separated' have a higher number of credits than others.
      ◦ Most of the outliers are from education types 'Higher education' and 'Secondary'.
      ◦ Civil marriage with Academic degree has most of its credits in the third quartile.
  • 14. Bivariate analysis for numerical variables – Target 1 (clients having payment difficulties)
      ◦ There is also some similarity with Target 0.
      ◦ For education type 'Higher education', the income amount is mostly equal across family statuses.
      ◦ Academic degree has fewer outliers, but its income amount is a little higher than that of Higher education.
      ◦ Lower secondary has a lower income amount than others.
  • 15. Correlation (Target 0 and Target 1)
      ◦ From the correlation analysis, the highest correlation (1.0) is between OBS_60_CNT_SOCIAL_CIRCLE and OBS_30_CNT_SOCIAL_CIRCLE, and between FLOORSMAX_MEDI and FLOORSMAX_AVG. This is the same for both datasets.
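Finding the most strongly correlated column pairs can be sketched as below. The example assumes Pearson correlation (the slide does not name the coefficient), and the helper names and sample values are hypothetical; in the real analysis the same ranking would be run once per target subset.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def top_correlations(columns, n=2):
    """Return the n column pairs with the largest absolute correlation."""
    pairs = [
        (abs(pearson(columns[a], columns[b])), a, b)
        for a, b in combinations(sorted(columns), 2)
    ]
    return sorted(pairs, reverse=True)[:n]


# Hypothetical rows for one target subset.
target0 = {
    "OBS_30_CNT_SOCIAL_CIRCLE": [0.0, 1.0, 2.0, 0.0, 4.0],
    "OBS_60_CNT_SOCIAL_CIRCLE": [0.0, 1.0, 2.0, 0.0, 4.0],
    "AMT_CREDIT": [300000.0, 150000.0, 500000.0, 250000.0, 200000.0],
}

for corr, a, b in top_correlations(target0):
    print(f"{a} vs {b}: {corr:.2f}")
```

A correlation of about 1.0 between two columns suggests they carry nearly the same information, so one of them is usually enough for further analysis.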
  • 16. Univariate analysis for combined dataset (distribution of contract status with purpose)
      ◦ Most loan rejections came from the purpose 'repairs'.
      ◦ For education purposes there is an equal number of approvals and rejections.
      ◦ 'Paying other loans' and 'Buying a new car' have significantly more rejections than approvals.
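The comparison of approvals and rejections per purpose amounts to a two-way count table. A minimal sketch follows, assuming status labels 'Approved' and 'Refused' and a handful of made-up applications; the real combined dataset and its counts are not reproduced here.

```python
from collections import Counter

# Hypothetical (loan purpose, contract status) pairs.
applications = [
    ("Repairs", "Refused"), ("Repairs", "Refused"), ("Repairs", "Approved"),
    ("Education", "Approved"), ("Education", "Refused"),
    ("Buying a new car", "Refused"), ("Buying a new car", "Refused"),
    ("Buying a new car", "Approved"),
]

counts = Counter(applications)
purposes = sorted({purpose for purpose, _ in applications})

# One row per purpose with the number of approved and refused applications.
table = {
    p: {"Approved": counts[(p, "Approved")], "Refused": counts[(p, "Refused")]}
    for p in purposes
}

# Purposes where rejections outnumber approvals.
more_refused = [p for p in purposes if table[p]["Refused"] > table[p]["Approved"]]

for purpose, row in table.items():
    print(f"{purpose:>18}: {row}")
print("More rejections than approvals:", more_refused)
```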
  • 17. Univariate analysis for combined dataset (distribution of the purpose with target)
      ◦ Loans with purpose 'Repairs' face more difficulties in paying on time.
      ◦ There are a few purposes where successful loan payment is significantly higher than payment difficulties. They are 'Buying a garage', 'Business development', 'Buying land', 'Buying a new car' and 'Education'.
      ◦ Hence we can focus on these purposes, for which clients have minimal payment difficulties.
  • 18. Bivariate analysis for combined dataset
      ◦ The credit amount for loan purposes like 'Buying a home', 'Buying a land', 'Buying a new car' and 'Building a house' is higher.
      ◦ The income type 'State servant' has a significant amount of credit applied for.
      ◦ 'Money for a third person' and 'Hobby' have fewer credits applied for.
  • 19. Bivariate analysis for combined dataset
      ◦ For housing type, office apartment has higher credit for Target 0 and co-op apartment has higher credit for Target 1.
      ◦ So we can conclude that the bank should avoid giving loans to the co-op apartment housing type, as these clients have difficulties in payment.
      ◦ The bank can focus mostly on housing types 'With parents', 'House apartment' or 'Municipal apartment' for successful payments.
  • 20. Conclusion/Recommendation:
      ◦ Banks should focus more on income types 'Student', 'Pensioner' and 'Businessman' with a housing type other than 'Co-op apartment' for successful payments.
      ◦ Banks should focus less on income type 'Working', as it has the highest number of unsuccessful payments.
      ◦ For loan purpose 'Repairs':
          ◦ Although loans with purpose 'Repairs' have a higher number of rejections, difficulties in paying on time are still observed.
          ◦ There are a few places where the delay in loan payment is significantly high.
          ◦ The bank should continue to exercise caution when giving loans for this purpose.
      ◦ The bank should avoid giving loans to the co-op apartment housing type, as these clients have difficulties in payment.
      ◦ The bank can focus mostly on housing types 'With parents', 'House apartment' and 'Municipal apartment' for successful payments.