Performed exploratory data analysis to find relationships between the predictors and the target variable in the dataset. Used the WOE-IV technique to identify the influential variables and then fit a decision tree model on them. Eight predictors were used to build the model. The decision tree achieved an accuracy of 0.76.
2. BRIEF OVERVIEW:
Objective: to identify the attributes that most influence the decision to accept or reject a loan application.
Context of the data set: the original dataset contains 1000 entries with 20 categorical/symbolic attributes. Each entry represents a person who takes credit from a bank, and each person is classified as a good or bad credit risk according to the set of attributes.
3. ATTRIBUTES:
S.No | Variable | Description | Data type
1 | Credibility | 1: credit-worthy [good risk]; 0: not credit-worthy [bad risk] | Categorical
2 | Balance of current account | 1: no running account; 2: no balance or debit; 3: 0 <= ... < 200 DM; 4: ... >= 200 DM or checking account for at least 1 year | Categorical
3 | Duration in months (metric) | [<= 12]: up to 1 year; [12 < ... <= 24]: 1-2 years; [24 < ... <= 36]: 2-3 years; [36 < ... <= 48]: 3-4 years; [48 < ... <= 60]: 4-5 years; [60 < ... <= 72]: 5-6 years | Numerical
4 | Payment of previous credits | 0: hesitant payment of previous credits; 1: problematic running account / there are further credits running but at other banks; 2: no previous credits / paid back all previous credits; 3: no problems with current credits at this bank; 4: paid back previous credits at this bank | Categorical
5 | Purpose of credit | 0: other; 1: new car; 2: used car; 3: items of furniture; 4: radio / television; 5: household appliances; 6: repair; 7: education; 8: vacation; 9: retraining; 10: business | Categorical
4. S.No | Variable | Description | Data type
6 | Amount of credit in DM | 1: <= 1500; 2: 1500 < ... <= 4500; 3: 4500 < ... <= 7500; 4: 7500 < ... <= 10500; 5: 10500 < ... <= 13500; 6: 13500 < ... <= 16500; 7: > 16500 | Numerical
7 | Value of savings or stocks | 1: not available / no savings; 2: < 100; 3: 100 <= ... < 500; 4: 500 <= ... < 1000; 5: >= 1000 | Categorical
8 | Has been employed by current employer for | 1: unemployed; 2: <= 1; 3: 1 <= ... < 4; 4: 4 <= ... < 7; 5: >= 7 | Categorical
9 | Instalment rate in % of available income | 1: >= 35; 2: 25 <= ... < 35; 3: 20 <= ... < 25; 4: < 20 | Categorical
10 | Marital Status / Sex | 1: male, divorced / living apart; 2: male, single; 3: male, married / widowed; 4: female | Categorical
11 | Further debtors / Guarantors | 1: none; 2: co-applicant; 3: guarantor | Categorical
12 | Living in current household for | 1: < 1 year; 2: 1 <= ... < 4 years; 3: 4 <= ... < 7 years; 4: >= 7 years | Categorical
5. S.No | Variable | Description | Data type
13 | Most valuable available assets | 1: not available / no assets; 2: car / other; 3: savings contract with a building society / life insurance; 4: ownership of house or land | Categorical
14 | Age in years (categorized) | 1: 0 <= ... <= 25; 2: 26 <= ... <= 39; 3: 40 <= ... <= 59; 4: 60 <= ... <= 64; 5: >= 65 | Numerical
15 | Further running credits | 1: at other banks; 2: at department store or mail order house; 3: no further running credits | Categorical
16 | Type of apartment | 1: rented; 2: owned; 3: free | Categorical
17 | Number of previous credits at this bank (including the running one) | 1: one; 2: two or three; 3: four or five; 4: six and above | Categorical
18 | Occupation | 1: unemployed / unskilled with no permanent residence; 2: unskilled with permanent residence; 3: skilled worker / skilled employee / minor civil servant; 4: executive / self-employed / higher civil servant | Categorical
19 | Number of persons entitled to maintenance | 1: 3 and more; 2: 0 to 2 | Numerical
20 | Telephone | 1: no; 2: yes | Categorical
21 | Foreign worker | 1: yes; 2: no | Categorical
6. From the data:
• The population is distributed in a 70:30 proportion by risk class.
• We have 4 numeric and 16 categorical features.
• A few variables are non-influential and may not contribute to decision making.
• To find the most influential variables, we adopted the WOE-IV technique.
8. WOE (Weight of Evidence) and IV (Information Value) are simple yet powerful techniques for variable transformation and selection.
They are widely used in credit scoring to measure the separation of good vs bad customers.
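For reference, the two quantities can be written as below. This is the standard formulation, stated here as an assumption about notation: the deck only implies it through the column headers of its worked table (DB, DG, WOE, (DG - DB) * WOE), where DG_i and DB_i are the shares of all good and all bad loans that fall into group i.

```latex
DG_i = \frac{\text{good}_i}{\sum_j \text{good}_j}, \qquad
DB_i = \frac{\text{bad}_i}{\sum_j \text{bad}_j}, \qquad
\mathrm{WOE}_i = \ln\!\left(\frac{DG_i}{DB_i}\right), \qquad
\mathrm{IV} = \sum_i \left(DG_i - DB_i\right)\,\mathrm{WOE}_i
```

Each term of the sum is non-negative, because DG_i - DB_i and WOE_i always have the same sign, so IV grows as the good and bad distributions diverge.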
10. Age Group | Total Number of Loans | Number of Bad Loans | Number of Good Loans | % Bad Loans | Name of Group | Distribution Bad (DB) | Distribution Good (DG) | WOE | DG - DB | (DG - DB) * WOE
21 - 30 | 4821 | 206 | 4615 | 4.3% | G1 | 0.135 | 0.078 | -0.553 | -0.057 | 0.0318
30 - 36 | 10266 | 357 | 9909 | 3.5% | G2 | 0.235 | 0.167 | -0.339 | -0.067 | 0.0228
36 - 48 | 32926 | 776 | 32150 | 2.4% | G3 | 0.510 | 0.542 | 0.062 | 0.032 | 0.0020
48 - 60 | 12788 | 183 | 12605 | 1.4% | G4 | 0.120 | 0.213 | 0.570 | 0.092 | 0.0527
Total | 60801 | 1522 | 59279 | Information Value --> 0.1093
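The arithmetic in this table can be reproduced with a few lines of Python. This is a minimal sketch, not the author's code: the function name `woe_iv` and the tuple layout are illustrative assumptions, and the counts are copied from the age-group rows of the table.

```python
import math

# (group, number of bad loans, number of good loans), copied from the table
age_groups = [
    ("21 - 30", 206, 4615),
    ("30 - 36", 357, 9909),
    ("36 - 48", 776, 32150),
    ("48 - 60", 183, 12605),
]

def woe_iv(groups):
    """Return (rows, iv); each row is (name, DB, DG, WOE, contribution).

    Assumes every group has at least one good and one bad loan,
    otherwise the logarithm is undefined.
    """
    total_bad = sum(bad for _, bad, _ in groups)
    total_good = sum(good for _, _, good in groups)
    rows = []
    for name, bad, good in groups:
        db = bad / total_bad        # Distribution Bad
        dg = good / total_good      # Distribution Good
        woe = math.log(dg / db)     # Weight of Evidence
        rows.append((name, db, dg, woe, (dg - db) * woe))
    iv = sum(row[4] for row in rows)
    return rows, iv

rows, iv = woe_iv(age_groups)
for name, db, dg, woe, contrib in rows:
    print(f"{name}: DB={db:.3f} DG={dg:.3f} WOE={woe:.3f} contrib={contrib:.4f}")
print(f"Information Value = {iv:.4f}")
```

Groups with a negative WOE hold a larger share of the bad loans than of the good loans. In this example that is the two youngest groups.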
11. Age
The higher the age, the higher the credibility. Above sixty years, i.e. after retirement, credibility is reduced.
IV: 0.093
Weak predictive power
Sex & Marital Status
Females have good credibility. Among males, married men have high credibility.
IV: 0.045
Weak predictive power
12. The higher the balance in the account, the higher the probability of falling into the good-risk category.
IV:
Savings Account: 0.196
Medium predictive power
Current Account: 0.666
Suspicious predictive power / too good to rely on
Predictive power: Current Account (CA) > Savings Account (SB)
13. Duration In Months
The shorter the duration, the lower the bad risk.
IV: 0.166
Medium predictive power
Amount of credit
The lower the amount, the lower the bad risk. However, amounts <= 1500 also show a slight increase in bad risk.
IV: 0.165
Medium predictive power
14. PURPOSE OF CREDIT
If the purpose of the loan is to create an asset, good risk should be high, whereas if the purpose is an expenditure, bad risk should be high.
Vacation, however, shows high good risk. On further observation, only 9 loans were given for the purpose of vacation, not even 1% (0.9%) of the total, so this category is ignored.
IV: 0.166
Medium predictive power
PURPOSE | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 8 | 9 | 10
NOT CREDIBLE | 89 | 17 | 58 | 62 | 4 | 8 | 22 | 1 | 34 | 5
CREDIBLE | 145 | 86 | 123 | 218 | 8 | 14 | 28 | 8 | 63 | 7
Grand Total | 234 | 103 | 181 | 280 | 12 | 22 | 50 | 9 | 97 | 12
15. Length of current employment
The higher the number of years of employment, the higher the credibility.
IV: 0.086
Weak predictive power
Most valuable available assets
People with no assets have a high probability of falling into the credible category.
IV: 0.113
Medium predictive power
16. Payment Of Previous Credits
Bad risk is observed in people who are hesitant to pay previous credits.
IV: 0.293
Medium predictive power
Instalment per cent
Bad risk is observed in people whose instalment is a lower % of their income, which is counterintuitive, although the pattern almost mirrors the overall population.
IV: 0.026
Weak predictive power
17. Number of credits
The higher the number of credits availed, the higher the credibility, but this does not hold beyond 6 credit facilities.
IV: 0.013
Not useful for prediction
Further running credits
People with no current credits have high credibility.
IV: 0.085
Weak predictive power
18. Guarantor
If the loan is secured by a guarantor, it shows high credibility.
IV: 0.032
Weak predictive power
Foreign worker
People who work abroad show high credibility.
IV: 0.087
Weak predictive power
Housing
People in rented housing show high credibility.
IV: 0.085
Weak predictive power
[Chart labels: 17.9% / 71.4% / 10.7%; 96.3% / 3.7%]
20. Further analysis
ATTRIBUTE | IV | INTERPRETATION
Current Account Balance | 0.666 | Suspicious predictive power
Payment Status Of Previous Credit | 0.293 | Medium predictive power
Value Savings/Stocks | 0.196 | Medium predictive power
Purpose | 0.166 | Medium predictive power
Duration Of Credit (Month) | 0.165 | Medium predictive power
Credit Amount | 0.119 | Medium predictive power
Most Valuable Available Asset | 0.113 | Medium predictive power
Age | 0.093 | Weak predictive power
Foreign Worker | 0.087 | Weak predictive power
Length Of Current Employment | 0.086 | Weak predictive power
Housing | 0.085 | Weak predictive power
Concurrent Credit | 0.058 | Weak predictive power
Sex & Marital Status | 0.045 | Weak predictive power
Guarantor / Debtor | 0.032 | Weak predictive power
Instalment Per Cent | 0.026 | Weak predictive power
No Of Credits | 0.013 | Not useful for prediction
Telephone | 0.01 | Not useful for prediction
Occupation | 0.009 | Not useful for prediction
Duration In Current House | 0.004 | Not useful for prediction
Dependents | 0.00004 | Not useful for prediction
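The interpretation column is consistent with a commonly used rule of thumb for IV bands. The deck does not state its cut-offs, so the thresholds below (0.02, 0.1, 0.3, 0.5) and the "Strong" band, which no attribute in this table falls into, are assumptions. The function name `interpret_iv` is illustrative.

```python
def interpret_iv(iv):
    """Map an Information Value to a rule-of-thumb label (assumed cut-offs)."""
    if iv < 0.02:
        return "Not useful for prediction"
    if iv < 0.1:
        return "Weak predictive power"
    if iv < 0.3:
        return "Medium predictive power"
    if iv <= 0.5:
        return "Strong predictive power"  # band not used in the deck's table
    return "Suspicious predictive power"

# A few IV values taken from the table
sample = {
    "Current Account Balance": 0.666,
    "Payment Status Of Previous Credit": 0.293,
    "Most Valuable Available Asset": 0.113,
    "Age": 0.093,
    "Instalment Per Cent": 0.026,
    "No Of Credits": 0.013,
}
labels = {name: interpret_iv(value) for name, value in sample.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```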
21. CHOOSING MODEL
When a customer applies for a loan, the bank accepts or rejects the application based on the predicted risk (probability of default) for that application.
Since this is an objective segmentation, we need a target/dependent variable. In this case, it is whether a customer is a bad or good risk for the loan.
In an objective segmentation problem, the aim is to find conditions that yield segments whose members are very similar in target variable value.
The decision tree is one of the most commonly used objective segmentation techniques.
Based on WOE-IV, we have chosen the variables with good predictive power for building a decision tree.
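The deck does not show how the tree was built. As an illustration of "finding conditions that yield similar segments", the sketch below implements only the core step of a decision tree: choosing the single split that minimises the weighted Gini impurity. The choice of Gini as the criterion, the function names, and the toy data are all assumptions, not the author's model or the German Credit data.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure segment)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(rows, labels):
    """Return (feature_index, threshold, impurity) of the best single split.

    A row goes to the left segment when row[feature_index] <= threshold.
    If no split improves on the unsplit data, feature_index is None.
    """
    n = len(rows)
    best = (None, None, gini(labels))
    for f in range(len(rows[0])):
        # The largest value is skipped: it would leave the right side empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, t, score)
    return best

# Hypothetical toy data: (account balance code, payment status code)
rows = [(1, 0), (1, 2), (2, 1), (2, 2), (3, 2), (4, 4), (4, 2), (3, 4)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = good risk, 0 = bad risk

feature, threshold, impurity = best_split(rows, labels)
print(f"split on feature {feature} at <= {threshold}, impurity {impurity:.3f}")
```

A full decision tree applies this step recursively to each resulting segment until a stopping rule is met, such as a maximum depth or a minimum segment size.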