E.D.A
By
Adithi – E19002
Bhaswani – E19009
Neha – E19018
BRIEF OVERVIEW:
• Objective: identify the attributes that most influence the decision to accept or reject a loan application.
• Context of the data set: the original dataset contains 1000 entries with 20 categorical/symbolic attributes. Each entry represents a person who takes credit from a bank. Each person is classified as a good or bad credit risk according to the set of attributes.
ATTRIBUTES:
1. Credibility (Categorical)
   1: credit-worthy (good risk)
   0: not credit-worthy (bad risk)
2. Balance of current account (Categorical)
   1: no running account
   2: no balance or debit
   3: 0 <= ... < 200 DM
   4: ... >= 200 DM, or checking account for at least 1 year
3. Duration in months, metric (Numerical)
   [<= 12]: up to 1 year
   [12 < ... <= 24]: 1-2 years
   [24 < ... <= 36]: 2-3 years
   [36 < ... <= 48]: 3-4 years
   [48 < ... <= 60]: 4-5 years
   [60 < ... <= 72]: 5-6 years
4. Payment of previous credits (Categorical)
   0: hesitant payment of previous credits
   1: problematic running account / there are further credits running but at other banks
   2: no previous credits / paid back all previous credits
   3: no problems with current credits at this bank
   4: paid back previous credits at this bank
5. Purpose of credit (Categorical)
   0: other
   1: new car
   2: used car
   3: items of furniture
   4: radio / television
   5: household appliances
   6: repair
   7: education
   8: vacation
   9: retraining
   10: business
6. Amount of credit in DM (Numerical)
   1: <= 1500
   2: 1500 < ... <= 4500
   3: 4500 < ... <= 7500
   4: 7500 < ... <= 10500
   5: 10500 < ... <= 13500
   6: 13500 < ... <= 16500
   7: > 16500
7. Value of savings or stocks (Categorical)
   1: not available / no savings
   2: < 100
   3: 100 <= ... < 500
   4: 500 <= ... < 1000
   5: >= 1000
8. Has been employed by current employer for (Categorical)
   1: unemployed
   2: < 1 year
   3: 1 <= ... < 4 years
   4: 4 <= ... < 7 years
   5: >= 7 years
9. Instalment rate in % of available income (Categorical)
   1: >= 35
   2: 25 <= ... < 35
   3: 20 <= ... < 25
   4: < 20
10. Marital status / sex (Categorical)
   1: male, divorced / living apart
   2: male, single
   3: male, married / widowed
   4: female
11. Further debtors / guarantors (Categorical)
   1: none
   2: co-applicant
   3: guarantor
12. Living in current household for (Categorical)
   1: < 1 year
   2: 1 <= ... < 4 years
   3: 4 <= ... < 7 years
   4: >= 7 years
13. Most valuable available assets (Categorical)
   1: not available / no assets
   2: car / other
   3: savings contract with a building society / life insurance
   4: ownership of house or land
14. Age in years, categorized (Numerical)
   1: 0 <= ... <= 25
   2: 26 <= ... <= 39
   3: 40 <= ... <= 59
   4: 60 <= ... <= 64
   5: >= 65
15. Further running credits (Categorical)
   1: at other banks
   2: at department store or mail order house
   3: no further running credits
16. Type of apartment (Categorical)
   1: rented
   2: owned
   3: free
17. Number of previous credits at this bank, including the running one (Categorical)
   1: one
   2: two or three
   3: four or five
   4: six and above
18. Occupation (Categorical)
   1: unemployed / unskilled with no permanent residence
   2: unskilled with permanent residence
   3: skilled worker / skilled employee / minor civil servant
   4: executive / self-employed / higher civil servant
19. Number of persons entitled to maintenance (Numerical)
   1: 3 and more
   2: 0 to 2
20. Telephone (Categorical)
   1: no
   2: yes
21. Foreign worker (Categorical)
   1: yes
   2: no
From the data:
• The population is distributed in a proportion of 70:30 risk-wise.
• We have 4 numeric and 16 categorical features.
• A few variables have little influence and may not contribute to decision making.
• To find the most influential variables, we adopted the WOE-IV technique.
WEIGHT OF EVIDENCE - INFORMATION VALUE (WOE - IV)
WOE and IV are simple yet powerful techniques for variable transformation and selection.
They are widely used in credit scoring to measure the separation of good vs bad customers.
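For reference, the age-group example in this deck is consistent with the definitions sketched here. The formulas are inferred from the numbers in that table (the slides do not state them explicitly), so treat the sign convention as an assumption; the index i runs over the groups of one variable.

```latex
\begin{align*}
DG_i &= \frac{\text{good}_i}{\sum_j \text{good}_j}, &
DB_i &= \frac{\text{bad}_i}{\sum_j \text{bad}_j} \\
WOE_i &= \ln\!\left(\frac{DG_i}{DB_i}\right), &
IV &= \sum_i \left(DG_i - DB_i\right)\, WOE_i
\end{align*}
```

With this convention, a positive WOE marks a group that holds a larger share of the good loans than of the bad loans.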
COMPUTATION & INTERPRETATION
DB = distribution of bad loans, DG = distribution of good loans
Age group  Total loans  Bad loans  Good loans  % Bad loans  Group  DB     DG     WOE     DG - DB  (DG - DB) * WOE
21 - 30    4821         206        4615        4.3%         G1     0.135  0.078  -0.553  -0.057   0.0318
30 - 36    10266        357        9909        3.5%         G2     0.235  0.167  -0.339  -0.067   0.0228
36 - 48    32926        776        32150       2.4%         G3     0.510  0.542  0.062   0.032    0.0020
48 - 60    12788        183        12605       1.4%         G4     0.120  0.213  0.570   0.092    0.0527
Total      60801        1522       59279                                  Information Value -->   0.1093
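The age-group table above can be reproduced with a few lines of Python. This is a minimal sketch, assuming WOE = ln(DG / DB) as implied by the table values; the function name `woe_iv` and the tuple layout are illustrative choices, not part of the original deck.

```python
import math

def woe_iv(groups):
    """Compute WOE per group and the total IV.

    groups: list of (name, bad_count, good_count) tuples.
    Returns (rows, iv), where each row is (name, db, dg, woe, contribution).
    Assumes every group has at least one good and one bad loan,
    otherwise the logarithm is undefined.
    """
    total_bad = sum(bad for _, bad, _ in groups)
    total_good = sum(good for _, _, good in groups)
    rows = []
    iv = 0.0
    for name, bad, good in groups:
        db = bad / total_bad        # share of all bad loans in this group
        dg = good / total_good      # share of all good loans in this group
        woe = math.log(dg / db)     # assumed convention: ln(DG / DB)
        contribution = (dg - db) * woe
        rows.append((name, db, dg, woe, contribution))
        iv += contribution
    return rows, iv

# Counts taken from the age-group table in the slides.
age_groups = [
    ("21 - 30", 206, 4615),
    ("30 - 36", 357, 9909),
    ("36 - 48", 776, 32150),
    ("48 - 60", 183, 12605),
]

rows, iv = woe_iv(age_groups)
for name, db, dg, woe, contribution in rows:
    print(f"{name}: DB={db:.3f} DG={dg:.3f} WOE={woe:.3f} part={contribution:.4f}")
print(f"Information Value: {iv:.4f}")
```

Each group's contribution is non-negative because (DG - DB) and WOE always share the same sign, so IV only grows as groups separate good from bad loans more strongly.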
Higher the age, higher the credibility.
But above sixty years, i.e. after retirement, credibility is reduced.
IV: 0.093
Weak predictive power
Females have good credibility.
Among males, married men have high credibility.
IV: 0.045
Weak predictive power
Higher the balance in the account, higher the probability of falling into good risk.
IV:
Savings account: 0.196 (medium predictive power)
Current account: 0.666 (suspicious predictive power / too good to rely on)
Predictive power: current account (CA) > savings account (SB)
Duration In Months
Lower the duration, lower the bad risk.
IV: 0.166
Medium predictive power
Amount of credit
Lower the amount, lower the bad risk.
Amounts <= 1500 also show a slight increase in bad risk.
IV: 0.165
Medium predictive power
PURPOSE OF CREDIT
If the purpose of the loan is to create an asset, good risk should be high.
Whereas if the purpose is an expenditure, bad risk should be high.
But vacation shows high good risk.
On further observation, only 9 loans were given for vacation, not even 1% (0.9%) of all loans.
Hence this category is ignored.
IV: 0.166
Medium predictive power
PURPOSE        0    1    2    3    4    5    6    8    9    10
NOT CREDIBLE   89   17   58   62   4    8    22   1    34   5
CREDIBLE       145  86   123  218  8    14   28   8    63   7
Grand Total    234  103  181  280  12   22   50   9    97   12
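The reasoning about the vacation category can be checked directly from the purpose counts above. The sketch below computes each purpose code's share of all loans and its share of credible customers, then flags codes that cover less than 1% of the loans. The 1% cut-off (`MIN_SHARE`) is an assumption taken from the "not even 1%" remark, not a rule stated in the deck.

```python
# Purpose code -> (not credible, credible), copied from the table in the slides.
purpose_counts = {
    0: (89, 145), 1: (17, 86), 2: (58, 123), 3: (62, 218), 4: (4, 8),
    5: (8, 14), 6: (22, 28), 8: (1, 8), 9: (34, 63), 10: (5, 7),
}

MIN_SHARE = 0.01  # assumed cut-off: groups under 1% of loans are too small to trust

total_loans = sum(bad + good for bad, good in purpose_counts.values())

summary = {}
for code, (bad, good) in purpose_counts.items():
    n = bad + good
    summary[code] = {
        "loans": n,
        "share": n / total_loans,   # fraction of all loans with this purpose
        "good_rate": good / n,      # fraction of credible customers in this group
    }

too_small = [code for code, s in summary.items() if s["share"] < MIN_SHARE]

for code, s in summary.items():
    print(f"purpose {code}: {s['loans']} loans, "
          f"{s['share']:.1%} of total, {s['good_rate']:.1%} credible")
print("Groups below the cut-off:", too_small)
```

Only code 8 (vacation) falls below the cut-off, which matches the slide's decision to disregard its unusually high share of credible customers.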
Higher the number of years of employment, higher the credibility.
IV: 0.086
Weak predictive power
People with no assets have a high probability of falling into the credible category.
IV: 0.113
Medium predictive power
Payment Of Previous Credits
Bad risk is observed in people who are hesitant to pay previous credits.
IV: 0.293
Medium predictive power
Bad risk is observed in people whose instalment is a lower % of their income, which is contrary to expectation.
However, the pattern closely resembles the overall population distribution.
IV: 0.026
Weak predictive power
Higher the number of credits availed, higher the credibility, but not beyond 6 credit facilities.
IV: 0.013
Not useful for prediction
People with no current credits have high credibility.
IV: 0.085
Weak predictive power
If the loan is secured by a guarantor, it shows high credibility.
IV: 0.032
Weak predictive power
People who work abroad are given high credibility.
IV: 0.087
Weak predictive power
People in rented housing show high credibility.
IV: 0.085
Weak predictive power
17.9% 71.4% 10.7%
96.3 % 3.7 %
Non-influencing variables: they simply reproduce the 70:30 population distribution.
IV VALUES:
Further analysis…!
ATTRIBUTE                            IV        INTERPRETATION
Current Account Balance              0.666     Suspicious predictive power
Payment Status Of Previous Credit    0.293     Medium predictive power
Value Savings/Stocks                 0.196     Medium predictive power
Purpose                              0.166     Medium predictive power
Duration Of Credit (Month)           0.165     Medium predictive power
Credit Amount                        0.119     Medium predictive power
Most Valuable Available Asset        0.113     Medium predictive power
Age                                  0.093     Weak predictive power
Foreign Worker                       0.087     Weak predictive power
Length Of Current Employment         0.086     Weak predictive power
Housing                              0.085     Weak predictive power
Concurrent Credit                    0.058     Weak predictive power
Sex & Marital Status                 0.045     Weak predictive power
Guarantor / Debtor                   0.032     Weak predictive power
Instalment Per Cent                  0.026     Weak predictive power
No Of Credits                        0.013     Not useful for prediction
Telephone                            0.01      Not useful for prediction
Occupation                           0.009     Not useful for prediction
Duration In Current House            0.004     Not useful for prediction
Dependents                           0.00004   Not useful for prediction
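The interpretation column above follows a pattern that can be written as a small lookup. The cut-offs below (0.02, 0.1, 0.3, 0.5) are a commonly quoted rule of thumb and are an assumption here: the deck never states its thresholds, and the "strong" band does not appear in its table. They do reproduce every label in the table.

```python
def interpret_iv(iv):
    """Map an Information Value to a label (assumed rule-of-thumb bands)."""
    if iv < 0.02:
        return "Not useful for prediction"
    if iv < 0.1:
        return "Weak predictive power"
    if iv < 0.3:
        return "Medium predictive power"
    if iv <= 0.5:
        return "Strong predictive power"   # band not present in the deck's table
    return "Suspicious predictive power"

# IV values copied from the table in the slides.
iv_values = {
    "Current Account Balance": 0.666,
    "Payment Status Of Previous Credit": 0.293,
    "Value Savings/Stocks": 0.196,
    "Purpose": 0.166,
    "Duration Of Credit (Month)": 0.165,
    "Credit Amount": 0.119,
    "Most Valuable Available Asset": 0.113,
    "Age": 0.093,
    "Foreign Worker": 0.087,
    "Length Of Current Employment": 0.086,
    "Housing": 0.085,
    "Concurrent Credit": 0.058,
    "Sex & Marital Status": 0.045,
    "Guarantor / Debtor": 0.032,
    "Instalment Per Cent": 0.026,
    "No Of Credits": 0.013,
    "Telephone": 0.01,
    "Occupation": 0.009,
    "Duration In Current House": 0.004,
    "Dependents": 0.00004,
}

labels = {name: interpret_iv(iv) for name, iv in iv_values.items()}
medium = [name for name, label in labels.items() if label.startswith("Medium")]

for name, label in labels.items():
    print(f"{name}: {iv_values[name]} -> {label}")
print("Medium-band attributes:", medium)
```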
CHOOSING MODEL
• When a customer applies for a loan, the bank accepts or rejects the application based on the predicted risk (probability of default) for that application.
• Since this is an objective segmentation, we need a target (dependent) variable. Here it is whether a customer is a good or bad risk for the loan.
• In an objective segmentation problem, the aim is to find conditions that define segments whose members are very similar in target variable value.
• Decision trees are a commonly used objective segmentation technique.
• Based on WOE-IV, we chose the variables with good predictive power to build a decision tree.
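To illustrate what "segments that are similar in target value" means for a tree, the sketch below scores one candidate split with Gini impurity. Gini is one common splitting criterion; the deck does not say which criterion was used, so this is an assumption. The counts come from the purpose table (code 3 versus all other codes), and the split itself is only an illustration.

```python
def gini(bad, good):
    """Gini impurity of a segment with two classes (0 = pure, 0.5 = evenly mixed)."""
    n = bad + good
    p_bad = bad / n
    return 2 * p_bad * (1 - p_bad)

def weighted_gini(segments):
    """Size-weighted average impurity over a list of (bad, good) segments."""
    total = sum(bad + good for bad, good in segments)
    return sum((bad + good) / total * gini(bad, good) for bad, good in segments)

# Whole population: 300 not credible, 700 credible (totals of the purpose table).
parent_impurity = gini(300, 700)

# Illustrative split: purpose code 3 (62 bad, 218 good) versus everything else.
split_impurity = weighted_gini([(62, 218), (300 - 62, 700 - 218)])

print(f"Impurity before split: {parent_impurity:.4f}")
print(f"Impurity after split:  {split_impurity:.4f}")
print(f"Reduction:             {parent_impurity - split_impurity:.4f}")
```

A tree-building algorithm evaluates many such candidate conditions and keeps the one with the largest reduction, then repeats inside each resulting segment up to the chosen depth.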
DECISION TREE:
• Interpretation:
• Train-test split: 70:30
• Class 1: credible
• Class 0: not credible
• Depth: 3
• Accuracy: 0.76
• Precision: 0.77
• Sensitivity: 0.92
• Specificity: 0.35
• F1 score: 0.84
• Interpretation:
• Train-test split: 70:30
• Class 1: credible
• Class 0: not credible
• Depth: 4
• Accuracy: 0.74
• Precision: 0.77
• Sensitivity: 0.89
• Specificity: 0.37
• F1 score: 0.83
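The metrics listed above all derive from a confusion matrix with class 1 (credible) as the positive class. The sketch below shows the arithmetic. The four counts are hypothetical, chosen only to resemble a 300-row test set; they are not the deck's actual confusion matrix, which is not given.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard binary metrics, with 'credible' (class 1) as the positive class."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)        # predicted credible that really are credible
    sensitivity = tp / (tp + fn)      # credible customers correctly identified
    specificity = tn / (tn + fp)      # non-credible customers correctly identified
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }

# Hypothetical counts for illustration only (210 credible, 90 not credible).
metrics = classification_metrics(tp=193, fn=17, tn=32, fp=58)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

High sensitivity combined with low specificity, as reported on the slides, means the tree rarely rejects a credible customer but often accepts a non-credible one.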
FURTHER ANALYSIS TO BE CONTD..
THANK YOU…!
Queries..?

exploratory data analysis on german credit data

  • 1. E.D.A By Adithi – E19002 Bhaswani – E19009 Neha – E19018
  • 2. BRIEF OVERVIEW:  To identify the attributes having influential power in decision making to either reject or accept loan application.  Context of the data set: The original dataset contains 1000 entries with 20 categorical/symbolic attributes. In this dataset, each entry represents a person who takes a credit by a bank. Each person is classified as good or bad credit risks according to the set of attributes.
  • 3. S.No Variable Description Data type 1 Credibility 1 : credit-worthy; [good risk] 0 : not credit-worthy [ bad risk ] Categorical 2 Balance of current account no running account - 1 No balance or debit -2; 0 <= ... < 200 DM – 3; ... >= 200 DM or checking account for at least 1 year-4; Categorical 3 Duration in months (metric) [<=12] – up to 1 year [12< ... <= 24] – 1-2 years [24 < ... <= 36] – 2-3 years [36 < ... <= 48] – 3- 4 years [48< ... <= 60] – 4-5 years [60 < ... <= 72] – 5-6 years NUMERICAL 4 Payment of previous credits no previous credits / paid back all previous credits - 2 paid back previous credits at this bank - 4 no problems with current credits at this bank - 3 problematic running account / there are further credits running but at other banks – 1 hesitant payment of previous credits - 0 CATEGORICAL 5 Purpose of credit new car - 1 used car - 2 items of furniture - 3 radio / television - 4 household appliances- 5 Repair -6 Education - 7 Vacation- 8 Retraining -9 Business- 10 Other -0 CATEGORICAL ATTRIBUTES:
  • 4. S.No) Variable (Data type): Description
      6) Amount of credit in DM (Numerical): 1 = [<= 1500]; 2 = [1500 < ... <= 4500]; 3 = [4500 < ... <= 7500]; 4 = [7500 < ... <= 10500]; 5 = [10500 < ... <= 13500]; 6 = [13500 < ... <= 16500]; 7 = [> 16500]
      7) Value of savings or stocks (Categorical): 1 = not available / no savings; 2 = [< 100]; 3 = [100 <= ... < 500]; 4 = [500 <= ... < 1000]; 5 = [>= 1000]
      8) Has been employed by current employer for (Categorical): 1 = unemployed; 2 = [<= 1]; 3 = [1 <= ... < 4]; 4 = [4 <= ... < 7]; 5 = [>= 7]
      9) Instalment rate in % of available income (Categorical): 1 = [>= 35]; 2 = [25 <= ... < 35]; 3 = [20 <= ... < 25]; 4 = [< 20]
      10) Marital status / sex (Categorical): 1 = male: divorced / living apart; 2 = male: single; 3 = male: married / widowed; 4 = female
      11) Further debtors / guarantors (Categorical): 1 = none; 2 = co-applicant; 3 = guarantor
      12) Living in current household for (Categorical): 1 = [< 1 year]; 2 = [1 <= ... < 4] years; 3 = [4 <= ... < 7] years; 4 = [>= 7] years
  • 5. S.No) Variable (Data type): Description
      13) Most valuable available assets (Categorical): 1 = not available / no assets; 2 = car / other; 3 = savings contract with a building society / life insurance; 4 = ownership of house or land
      14) Age in years, categorized (Numerical): 1 = [0 <= ... <= 25]; 2 = [26 <= ... <= 39]; 3 = [40 <= ... <= 59]; 4 = [60 <= ... <= 64]; 5 = [>= 65]
      15) Further running credits (Categorical): 1 = at other banks; 2 = at department store or mail order house; 3 = no further running credits
      16) Type of apartment (Categorical): 1 = rented; 2 = owned; 3 = free
      17) Number of previous credits at this bank, including the running one (Categorical): 1 = one; 2 = two or three; 3 = four or five; 4 = six and above
      18) Occupation (Categorical): 1 = unemployed / unskilled with no permanent residence; 2 = unskilled with permanent residence; 3 = skilled worker / skilled employee / minor civil servant; 4 = executive / self-employed / higher civil servant
      19) Number of persons entitled to maintenance (Numerical): 1 = 3 and more; 2 = 0 to 2
      20) Telephone (Categorical): 1 = no; 2 = yes
      21) Foreign worker (Categorical): 1 = yes; 2 = no
  • 6. From the data:
      • The population is split 70:30 by risk.
      • We have 4 numeric and 16 categorical features.
      • A few non-influencing variables may not contribute to decision making.
      • To find the most influential variable, we adopted the WOE-IV technique.
  • 8. WOE (Weight of Evidence) and IV (Information Value) are simple yet powerful techniques for variable transformation and selection. They are widely used in credit scoring to measure the separation of good vs bad customers.
  • 10. DB = Distribution Bad, DG = Distribution Good
      Age Group  Total Loans  Bad Loans  Good Loans  % Bad Loans  Group  DB     DG     WOE     DG - DB  (DG - DB) * WOE
      21 - 30    4821         206        4615        4.3%         G1     0.135  0.078  -0.553  -0.057   0.0318
      30 - 36    10266        357        9909        3.5%         G2     0.235  0.167  -0.339  -0.067   0.0228
      36 - 48    32926        776        32150       2.4%         G3     0.510  0.542   0.062   0.032   0.0020
      48 - 60    12788        183        12605       1.4%         G4     0.120  0.213   0.570   0.092   0.0527
      Total      60801        1522       59279
      Information Value --> 0.1093
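The figures in the age-group table can be reproduced with a few lines of Python. This is a minimal sketch, assuming WOE = ln(DG / DB) and IV = sum of (DG - DB) * WOE, which is the convention the table's numbers follow; the function name `woe_iv` is illustrative and not from the slides.

```python
import math

# Counts copied from the age-group table: (group, bad loans, good loans)
age_groups = [
    ("21 - 30", 206, 4615),
    ("30 - 36", 357, 9909),
    ("36 - 48", 776, 32150),
    ("48 - 60", 183, 12605),
]

def woe_iv(groups):
    """Return (rows, iv); each row is (name, DB, DG, WOE, IV contribution)."""
    total_bad = sum(bad for _, bad, _ in groups)
    total_good = sum(good for _, _, good in groups)
    rows = []
    iv = 0.0
    for name, bad, good in groups:
        db = bad / total_bad        # share of all bad loans in this group
        dg = good / total_good      # share of all good loans in this group
        woe = math.log(dg / db)     # assumed convention: ln(DG / DB)
        contribution = (dg - db) * woe
        rows.append((name, db, dg, woe, contribution))
        iv += contribution
    return rows, iv

rows, iv = woe_iv(age_groups)
for name, db, dg, woe, contribution in rows:
    print(f"{name}: DB={db:.3f} DG={dg:.3f} WOE={woe:.3f} part={contribution:.4f}")
print(f"Information Value = {iv:.4f}")
```

A group with zero good or zero bad loans would make the logarithm undefined, so sparse categories usually need merging or smoothing before this calculation.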
  • 11. Age: the higher the age, the higher the credibility, but above sixty years (i.e., after retirement) credibility is reduced. IV: 0.093, weak predictive power. Sex & marital status: females have good credibility; among males, married men have high credibility. IV: 0.045, weak predictive power.
  • 12. Account balance: the higher the balance in the account, the higher the probability of falling in the good-risk class. IV: Savings Account 0.196, medium predictive power; Current Account 0.666, suspicious predictive power (too good to rely on). Predictive power: CA > SB.
  • 13. Duration in months: the lower the duration, the lower the bad risk. IV: 0.166, medium predictive power. Amount of credit: the lower the amount, the lower the bad risk, although amounts <= 1500 also show a slight increase in bad risk. IV: 0.165, medium predictive power.
  • 14. PURPOSE OF CREDIT: If the purpose of the loan is to create an asset, good risk should be high, whereas if the purpose is an expenditure, bad risk should be high. Vacation, however, shows high good risk. On further observation, only 9 loans (0.9%, under 1%) were given for vacation, so this category was ignored. IV: 0.166, medium predictive power.
      PURPOSE        0    1    2    3    4   5   6   8   9   10
      NOT CREDIBLE   89   17   58   62   4   8   22  1   34  5
      CREDIBLE       145  86   123  218  8   14  28  8   63  7
      Grand Total    234  103  181  280  12  22  50  9   97  12
  • 15. Length of current employment: the more years of employment, the higher the credibility. IV: 0.086, weak predictive power. Most valuable available asset: people with no assets have a high probability of falling into the credible category. IV: 0.113, medium predictive power.
  • 16. Payment of previous credits: bad risk is observed in people who are hesitant to pay previous credits. IV: 0.293, medium predictive power. Instalment per cent: bad risk is observed in people whose instalment is a lower % of income, which is counter-intuitive, though the pattern closely resembles the population distribution. IV: 0.026, weak predictive power.
  • 17. No. of credits: the more credits availed, the higher the credibility, but not beyond 6 credit facilities. IV: 0.013, not useful for prediction. Further running credits: people with no current credits have high credibility. IV: 0.085, weak predictive power.
  • 18. Guarantor: if the loan is secured by a guarantor, it shows high credibility. IV: 0.032, weak predictive power. Foreign worker: people who work abroad show high credibility. IV: 0.087, weak predictive power. Housing: people in rented housing show high credibility. IV: 0.085, weak predictive power. Chart labels: 17.9%, 71.4%, 10.7%; 96.3%, 3.7%.
  • 19. IV VALUES: Non-influencing variables, as they mirror the 70:30 proportion of the population.
  • 20. Further analysis…!
      ATTRIBUTE                          IV       INTERPRETATION
      Current Account Balance            0.666    Suspicious predictive power
      Payment Status Of Previous Credit  0.293    Medium predictive power
      Value Savings/Stocks               0.196    Medium predictive power
      Purpose                            0.166    Medium predictive power
      Duration Of Credit (Month)         0.165    Medium predictive power
      Credit Amount                      0.119    Medium predictive power
      Most Valuable Available Asset      0.113    Medium predictive power
      Age                                0.093    Weak predictive power
      Foreign Worker                     0.087    Weak predictive power
      Length Of Current Employment       0.086    Weak predictive power
      Housing                            0.085    Weak predictive power
      Concurrent Credit                  0.058    Weak predictive power
      Sex & Marital Status               0.045    Weak predictive power
      Guarantor / Debtor                 0.032    Weak predictive power
      Instalment Per Cent                0.026    Weak predictive power
      No Of Credits                      0.013    Not useful for prediction
      Telephone                          0.01     Not useful for prediction
      Occupation                         0.009    Not useful for prediction
      Duration In Current House          0.004    Not useful for prediction
      Dependents                         0.00004  Not useful for prediction
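The interpretation column above can be reproduced mechanically. The sketch below is illustrative only: the cut-offs (0.02, 0.1, 0.3, 0.5) are a commonly quoted rule of thumb and an assumption here, since the slides do not state their thresholds, and the "Strong" band does not appear in the slides at all. The `>= 0.1` selection rule is likewise hypothetical; the deck does not say exactly which variables were kept.

```python
# IV values copied from the table on slide 20
iv_values = {
    "Current Account Balance": 0.666,
    "Payment Status Of Previous Credit": 0.293,
    "Value Savings/Stocks": 0.196,
    "Purpose": 0.166,
    "Duration Of Credit (Month)": 0.165,
    "Credit Amount": 0.119,
    "Most Valuable Available Asset": 0.113,
    "Age": 0.093,
    "Foreign Worker": 0.087,
    "Length Of Current Employment": 0.086,
    "Housing": 0.085,
    "Concurrent Credit": 0.058,
    "Sex & Marital Status": 0.045,
    "Guarantor / Debtor": 0.032,
    "Instalment Per Cent": 0.026,
    "No Of Credits": 0.013,
    "Telephone": 0.01,
    "Occupation": 0.009,
    "Duration In Current House": 0.004,
    "Dependents": 0.00004,
}

def interpret_iv(iv):
    """Map an IV to a label. Cut-offs are an assumed rule of thumb."""
    if iv < 0.02:
        return "Not useful for prediction"
    if iv < 0.1:
        return "Weak predictive power"
    if iv < 0.3:
        return "Medium predictive power"
    if iv < 0.5:
        return "Strong predictive power"  # band not used in the slides
    return "Suspicious predictive power"

# Hypothetical selection rule: keep everything at medium strength or above.
selected = [name for name, iv in iv_values.items() if iv >= 0.1]

for name, iv in iv_values.items():
    print(f"{name}: {iv} -> {interpret_iv(iv)}")
print("Selected:", selected)
```

Under this rule the suspiciously strong Current Account Balance is still selected; whether to keep such a variable is a judgment call the thresholds alone do not settle.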
  • 21. CHOOSING MODEL
      • When a customer applies for a loan, the bank accepts or rejects the application based on its predicted risk (probability of default).
      • Since this is an objective segmentation, we need a target/dependent variable. In this case it is whether a customer is a bad or good risk for the loan.
      • In an objective segmentation problem, the aim is to find conditions that identify segments whose members are very similar in target variable value.
      • The decision tree is one of the most commonly used objective segmentation techniques.
      • Based on WOE-IV, we chose the variables with good predictive power for building a decision tree.
  • 22. DECISION TREE: Interpretation:
      • Train-test split: 70:30
      • Class 1: credible
      • Class 0: not credible
      • Depth: 3
      • Accuracy: 0.76
      • Precision: 0.77
      • Sensitivity: 0.92
      • Specificity: 0.35
      • F1 score: 0.84
  • 23. Interpretation:
      • Train-test split: 70:30
      • Class 1: credible
      • Class 0: not credible
      • Depth: 4
      • Accuracy: 0.74
      • Precision: 0.77
      • Sensitivity: 0.89
      • Specificity: 0.37
      • F1 score: 0.83
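For reference, the metrics on slides 22 and 23 all derive from a confusion matrix in which class 1 (credible) is the positive class. The sketch below shows the standard definitions. The confusion-matrix counts are hypothetical, chosen only to land near the depth-3 figures on a 300-row test set; they are not the authors' actual results, and the function names are illustrative.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)   # recall on the credible class
    specificity = tn / (tn + fp)   # recall on the not-credible class
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }

def f1_from(precision, sensitivity):
    """F1 is the harmonic mean of precision and sensitivity."""
    return 2 * precision * sensitivity / (precision + sensitivity)

# Hypothetical counts (NOT from the slides): 210 credible, 90 not credible.
m = classification_metrics(tp=193, fp=58, tn=32, fn=17)
for name, value in m.items():
    print(f"{name}: {value:.2f}")

# Consistency check using the precision and sensitivity reported on the slides.
print(round(f1_from(0.77, 0.92), 2))  # depth-3 tree
print(round(f1_from(0.77, 0.89), 2))  # depth-4 tree
```

Both trees score much higher on sensitivity than on specificity: they recognise most credible applicants but miss the majority of non-credible ones, which matters when a bad loan costs more than a rejected good one.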
  • 24. FURTHER ANALYSIS TO BE CONTD.. THANK YOU…! Queries..?