Name: Kunal Kashyap
College: Indian Institute of Management Kashipur
Case:
Round 3: Grand Finale
Personal Loan Risk Assessment on
Two-Wheeler Loan Customer Base
Business Problem Snapshot
Business Problem | Approach taken | Analyzing data | Modelling | Cost-Benefit Analysis | Recommendation & Deployment
Problem Statement
Objectives
• To identify the segment of customers who have a higher tendency to default if they are offered a Personal Loan (PL)
• To leverage the existing Two-Wheeler Loan (TW) customer base to cross-sell the Personal Loan product
• To develop a prediction model that classifies the customer base into Risky and Non-Risky categories, so that Risky customers are rejected and Non-Risky customers are considered for the PL offer
Credit Process Flow
Available data: Payment History of 1.2 Lakh Customers
Live loans, Closed loans, Enquiries, Gender, Age, Interest rate, Tenure, EMI, MOB, First EMI Bounce, Total down payment, Total Loan amount, Two-Wheeler loans, Employment type, Number of times defaulted, Cost of Asset, Bounces with TVS Credit, Bounces in last 3 months
Prediction Model will help in the classification
Methodology Used | Research Insights
Key Highlight: Team Data Science Process (TDSP) methodology has been used for solving this case
Approach: TDSP methodology
Start -> Business Understanding -> Data acquisition & Understanding -> Modelling -> Deployment -> End
Research Insights
• Small ticket personal loans (STPL) are personal loans of ticket size less than Rs 50,000
• STPL market: Rs 12,000 Cr as of Aug 2020 | Half of it is for loans below Rs 5,000
• 140% growth in FY 2019 | Driven by STPL segment
• TG (target group): young, low-income, digitally savvy customers with small-ticket, short-term credit needs and no or limited credit history
• Demand driver: millennials and young borrowers in the age group 18-30 years
• End-use: home renovation, wedding, higher education or travel costs; meeting a medical emergency, etc.
• Alternative data: digital footprint of customers such as social media profile and mobile bill; social scoring by psychometric analysis of digital footprints
Sources: Microsoft TDSP methodology | Paisabazaar | BCG report | Financial Express
Data Wrangling, Exploration & Cleaning
Data Source: The data consists of the past loan history of 119,529 customers; it has 30 features from various sources
Data preparation (Steps 1 to 7)
• Step 1: We are left with 119,486 customers after removing rows with incomplete data
• Features V1, V11, V13, and V17 have not been used for modelling
• Transformation: data has been normalized using the Min-Max method
• One-hot encoding has been applied on features V15 (Gender) and V16 (Employment)
• Dataset was split in equal proportion for testing and training purposes
• Four features (V21, V22, V28, and V29) were removed due to missing values or very little data
• A new feature named 'Age' has been created from V18, and V18 is removed
• Random oversampling of the minority class and random undersampling of the majority class were performed due to the imbalanced nature of the dataset
Key Highlight: Ensemble algorithm to be used for future work to achieve higher accuracy and enhanced business opportunities
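The transformation steps on this slide (deriving Age from the date of birth in V18, Min-Max normalization, and one-hot encoding of a categorical feature such as V16) can be sketched in plain Python. This is only a minimal illustration and not the attached code: the sample records, the reference date and all helper names are assumptions.

```python
from datetime import date

def age_in_years(dob, as_of):
    """Completed years between a date of birth and a reference date."""
    had_birthday = (as_of.month, as_of.day) >= (dob.month, dob.day)
    return as_of.year - dob.year - (0 if had_birthday else 1)

def min_max(values):
    """Scale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(values):
    """Return (categories, rows) with one 0/1 indicator per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

# Hypothetical records: (V18 date of birth, V8 EMI, V16 employment type)
records = [
    (date(1990, 5, 17), 2100.0, "SAL"),
    (date(1985, 11, 2), 3400.0, "SELF"),
    (date(1998, 1, 30), 1800.0, "SAL"),
]
as_of = date(2020, 12, 31)  # assumed reference date, not from the source

ages = [age_in_years(dob, as_of) for dob, _, _ in records]
emi_scaled = min_max([emi for _, emi, _ in records])
employment_categories, employment_rows = one_hot([emp for _, _, emp in records])

print(ages)
print(emi_scaled)
print(employment_categories, employment_rows)
```

Min-Max scaling keeps every numeric feature in the same [0, 1] range, and one-hot encoding avoids implying an order between categories such as SAL and SELF.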
Modeling Architecture
Datasets: Given dataset, Loaded dataset, Modified dataset
Resampling: Random oversampling of minority class; Random undersampling of majority class
Model: Random Forest Model
Evaluation metrics on: Training set, Test set, Overall dataset
Classification: Good Customer (Non-default), Bad Customer (Default)
Other Models: Logistic regression; Deep Neural Network; Random Forest Model with SMOTE (using KNN) for minority class generation
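The resampling and splitting stages of the architecture can be sketched as follows. This is a minimal illustration using only the standard library: the toy data, the target class sizes and the helper names are assumptions, since the slides do not state the ratios that were used.

```python
import random

def random_oversample(rows, target_size, rng):
    """Grow `rows` to `target_size` by drawing extra rows with replacement."""
    extra = [rng.choice(rows) for _ in range(target_size - len(rows))]
    return rows + extra

def random_undersample(rows, target_size, rng):
    """Shrink `rows` to `target_size` by sampling without replacement."""
    return rng.sample(rows, target_size)

def split_in_half(rows, rng):
    """Shuffle and split rows into equal-sized training and test sets."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical toy data: (customer_id, V30 label), 1 = bad, 0 = good
bad = [(i, 1) for i in range(20)]
good = [(i, 0) for i in range(20, 1000)]

# Assumed target sizes; the source does not state the ratios it used
bad_resampled = random_oversample(bad, 200, rng)
good_resampled = random_undersample(good, 400, rng)

train, test = split_in_half(bad_resampled + good_resampled, rng)
print(len(bad_resampled), len(good_resampled), len(train), len(test))
```

This sketch resamples before splitting, which is how the architecture appears to be ordered. Because oversampled rows are duplicates, copies of the same customer can then land in both sets; resampling only the training portion is a common alternative.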
Architecture
Classification Model
Evaluation Metrics
Modelling: Random Forest
Classification Model
Since our objective is to segregate the customers into two categories, we will use a Classification Model to achieve this.
Random Forest Classifier
This method is an ensemble technique used for classification that constructs a multitude of decision trees on the training set (we trained the model with 1000 trees and reached 99.9% accuracy on the training set).
Below are the top 11 variables with the highest importance in building the model.
Output Snapshot
Feature | Description | Importance | Cumulative score
V27 | Number of times defaulted in last 12 months | 0.128 | 0.128
V26 | Number of times defaulted in last 6 months | 0.103 | 0.232
Age | Age of customers | 0.083 | 0.314
V25 | Number of times defaulted in last 3 months | 0.076 | 0.390
V7 | Total down payment of existing loan | 0.068 | 0.457
V8 | EMI of existing loan | 0.066 | 0.523
V6 | Cost of Asset (existing loan) | 0.064 | 0.588
V9 | Total Loan amount of existing loan | 0.061 | 0.649
V23 | Number of closed loans | 0.059 | 0.708
V14 | Rate of interest for existing loan | 0.054 | 0.761
V4 | MOB (Month of business with TVS Credit) | 0.051 | 0.813
From the Random Forest model, we identified the parameters that contribute significantly to classifying risky and non-risky customers. The Importance column in the table shows the significance of each parameter: the higher the value, the higher the impact.
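The Cumulative score column is a running total of the Importance column after ranking the features. A minimal sketch, using the rounded importances printed on the slide (the variable names are illustrative):

```python
from itertools import accumulate

# Importance values as printed on the slide (rounded to three decimals)
importances = {
    "V27": 0.128, "V26": 0.103, "Age": 0.083, "V25": 0.076,
    "V7": 0.068, "V8": 0.066, "V6": 0.064, "V9": 0.061,
    "V23": 0.059, "V14": 0.054, "V4": 0.051,
}

# Rank features from most to least important, then take a running total
ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
cumulative = list(accumulate(score for _, score in ranked))

for (name, score), running in zip(ranked, cumulative):
    print(f"{name:>4}  {score:.3f}  {running:.3f}")
```

Because the inputs here are already rounded, individual running totals can differ from the slide in the last digit; the final total agrees at 0.813, meaning the top 11 variables carry about 81% of the total importance.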
Note: Python code files and API files are attached on Annexure slide
Evaluation Metrics: The confusion matrix provides a performance summary of the classifier
Evaluation metrics on Training set
Accuracy | Sensitivity | Precision | Specificity | F1 Score | MCC
99.94% | 100% | 99.82% | 99.91% | 99.91% | 99.87%
True Negative (TN) | True Positive (TP) | False Positive (FP) | False Negative (FN)
34942 | 17618 | 31 | 0
• 99.94% of customers were correctly labelled by the model
• Of all the customers predicted to default on loan payment, 99.82% defaulted
• The model correctly identified 100% of the customers who defaulted on loan payment
• 99.91% of non-defaulters were correctly labelled by the model
Evaluation metrics on Test set
Accuracy | Sensitivity | Precision | Specificity | F1 Score | MCC
98.75% | 99.82% | 96.53% | 98.22% | 98.15% | 97.24%
True Negative (TN) | True Positive (TP) | False Positive (FP) | False Negative (FN)
34524 | 17412 | 625 | 31
• 98.75% of customers were correctly labelled by the model
• Of all the customers predicted to default on loan payment, 96.53% defaulted
• The model correctly identified 99.82% of the customers who defaulted on loan payment
• 98.22% of non-defaulters were correctly labelled by the model
Notes: MCC = Matthews Correlation Coefficient
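All six reported metrics follow directly from the four confusion-matrix counts. The sketch below recomputes them for the test set using the standard definitions; the function name and the dictionary keys are illustrative.

```python
from math import sqrt

def classification_metrics(tn, tp, fp, fn):
    """Return the six reported metrics as fractions in [0, 1]."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tn + tp + fp + fn),
        "sensitivity": sensitivity,
        "precision": precision,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }

# Test-set confusion matrix from the slide
test_metrics = classification_metrics(tn=34524, tp=17412, fp=625, fn=31)
for name, value in test_metrics.items():
    print(f"{name}: {value:.2%}")
```

The same function can be applied to the training-set or full-dataset counts to cross-check the other tables.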
Business Metrics
Evaluation metrics on original full dataset
Accuracy | Sensitivity | Precision | Specificity | F1 Score | MCC
98.80% | 99.89% | 64.69% | 98.78% | 78.53% | 79.89%
True Negative (TN) | True Positive (TP) | False Positive (FP) | False Negative (FN)
115446 | 2611 | 1425 | 3
• 98.80% of customers were correctly labelled by the model
• Of all the customers predicted to default on loan payment, 64.69% defaulted
• The model correctly identified 99.89% of the customers who defaulted on loan payment
• 98.78% of non-defaulters were correctly labelled by the model
Business metrics
Particulars | Business Value
Avg. loan amount (V9) | 39322
No. of defaults (V30) | 2614
Total loss (without model) | 102787708
Avg. loan amount | 39322
No. of defaults missed by the model (FN) | 3
Total loss from defaults with model | 117966
Opportunity loss: no. of customers wrongly rejected (FP) | 1425
Value lost per rejected customer (V10*V8 + V7 - V6) | 258
Opportunity loss with model | 367650
Total loss with model | 485616
Loss saved with modelling | 102302092
Percentage of loss saved | 99.53%
Note: V6: Cost of Asset (existing loan); V7: Total down payment of existing loan; V8: EMI of existing loan; V10: Tenure of existing loan
Particulars | Without Model | With Proposed Model
Total Profit | 30152718 | 29785068
Total Loss | 102787708 | 485616
Net Profit | -72634990 | 29299452
• With the proposed model, net profit moves from approx. -7 crore to approx. +3 crore
• Using the RF model saves around 99.5% of the losses
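The cost-benefit figures can be reproduced with a few lines of arithmetic. The sketch below takes its inputs from the business-metrics figures on this slide; the variable names are illustrative, and the per-customer value of 258 is used as given rather than recomputed from V6, V7, V8 and V10.

```python
# Inputs taken from the business-metrics figures on this slide
avg_loan_amount = 39322          # average of V9
defaults_without_model = 2614    # bad customers in V30
missed_defaults = 3              # false negatives (FN)
rejected_good_customers = 1425   # false positives (FP)
value_per_good_customer = 258    # V10*V8 + V7 - V6, as given on the slide
total_profit_without_model = 30152718

# Losses with and without the model
loss_without_model = avg_loan_amount * defaults_without_model
default_loss_with_model = avg_loan_amount * missed_defaults
opportunity_loss = rejected_good_customers * value_per_good_customer
loss_with_model = default_loss_with_model + opportunity_loss
loss_saved = loss_without_model - loss_with_model
share_saved = loss_saved / loss_without_model

# Rejecting good customers also removes their profit contribution
total_profit_with_model = total_profit_without_model - opportunity_loss
net_without_model = total_profit_without_model - loss_without_model
net_with_model = total_profit_with_model - loss_with_model

print(f"Loss without model: {loss_without_model}")
print(f"Loss with model:    {loss_with_model}")
print(f"Loss saved:         {loss_saved} ({share_saved:.2%})")
print(f"Net profit:         {net_without_model} -> {net_with_model}")
```

Laid out this way, the two costs of the model are explicit: defaults it misses (FN) and good customers it turns away (FP).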
Deployment
Recommendations
• It is recommended to use an analytical model like the proposed one to reduce losses for this initiative
• Alternative data should be used: the digital footprint of customers, such as social media profile and social scoring by psychometric analysis of digital footprints
Call POST: The created API is called using POST; it displays an HTML page for entering the feature inputs. After execution, the console shows the output based on the model.
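A deployment of this kind could look roughly like the sketch below. The slides do not name the web framework or the endpoint, so this uses Python's standard http.server; the form fields, the port, the handler names and especially placeholder_model (a stand-in rule, not the trained Random Forest) are assumptions for illustration only.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

# Minimal input form; only three of the features are shown (assumed layout)
FORM_HTML = b"""<form method="post">
V27 <input name="V27"> V26 <input name="V26"> Age <input name="Age">
<button type="submit">Score</button></form>"""

def placeholder_model(features):
    """Stand-in for the trained classifier: NOT the author's model."""
    return "Risky" if features.get("V27", 0.0) > 0 else "Non-Risky"

def parse_form(body):
    """Turn a URL-encoded form body into a {feature: float} dict."""
    fields = parse_qs(body)
    return {name: float(values[0]) for name, values in fields.items()}

class RiskHandler(BaseHTTPRequestHandler):
    def _send(self, payload, content_type):
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_GET(self):
        # Show the HTML page used to enter the feature values
        self._send(FORM_HTML, "text/html")

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        features = parse_form(self.rfile.read(length).decode("utf-8"))
        label = placeholder_model(features)
        print(f"Prediction for {features}: {label}")  # console output
        body = json.dumps({"prediction": label}).encode("utf-8")
        self._send(body, "application/json")

def serve(port=8000):
    """Start the server on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), RiskHandler).serve_forever()
```

Calling serve() starts the server; opening http://127.0.0.1:8000 shows the form, and submitting it sends a POST whose prediction is printed on the console. In a real deployment, placeholder_model would be replaced by a call to the trained classifier.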
THANK YOU
“It always seems impossible until it’s done.”
- Nelson Mandela
Annexures
Data Dictionary
Feature | Feature Definition
V1 | Customer's ID
V2 | First EMI Bounce (0: No, 1: Yes) (existing loan)
V3 | Number of bounces in last 3 months outside TVS Credit
V4 | MOB (Month of business with TVS Credit)
V5 | Number of bounces with TVS Credit
V6 | Cost of Asset (existing loan)
V7 | Total down payment of existing loan
V8 | EMI of existing loan
V9 | Total Loan amount of existing loan
V10 | Tenure of existing loan
V11 | Customer's Geographical Area Code
V12 | Customer's TW Dealer's Code
V13 | Customer's TW Model’s Code
V14 | Rate of interest for existing loan
V15 | Gender
V16 | Employment type of customer (SAL: Salaried, SELF: Self-employed, HOUSEWIFE, PENS: Pensioner, STUDENT)
V17 | Pin code
V18 | Date of Birth
V19 | Number of Live loans
V20 | Number of Two-Wheeler loans
V21 | Maximum sanction amount of Live Loans
V22 | Number of new loans taken in last 3 months
V23 | Number of closed loans
V24 | Number of enquiries
V25 | Number of times defaulted in last 3 months
V26 | Number of times defaulted in last 6 months
V27 | Number of times defaulted in last 12 months
V28 | Maximum loan amount sanctioned for any Gold loan
V29 | Maximum loan amount sanctioned for any personal loan
V30 | Target variable (1: Bad Customer / 0: Good Customer)
Assumptions:
• The complete EMI duration has been taken, irrespective of the point at which a customer defaults, due to lack of information
• This conservative approach should be offset by the depreciation of assets
• Avg. loan amount and avg. tenure are considered for the calculation
Python Code

More Related Content

What's hot

Super shampoo products and the indian mass market case study
Super shampoo products and the indian mass market case studySuper shampoo products and the indian mass market case study
Super shampoo products and the indian mass market case study
Mustahid Ali
 
5 FORCE ANALYSIS OF THE CEMENT INDUSTRY IN INDIA
5 FORCE ANALYSIS OF THE  CEMENT INDUSTRY IN INDIA5 FORCE ANALYSIS OF THE  CEMENT INDUSTRY IN INDIA
5 FORCE ANALYSIS OF THE CEMENT INDUSTRY IN INDIA
Rohit Digra
 
A marketing project report on tanishq
A marketing project report on tanishqA marketing project report on tanishq
A marketing project report on tanishq
Projects Kart
 
Merger & acquisition, Hindalco Novelis
Merger & acquisition, Hindalco NovelisMerger & acquisition, Hindalco Novelis
Merger & acquisition, Hindalco Novelis
Gagan Pareek, PMP
 
A HBR case study on Depreciation at delta airlines and singapore airlines
A HBR case study on Depreciation at delta airlines and singapore airlinesA HBR case study on Depreciation at delta airlines and singapore airlines
A HBR case study on Depreciation at delta airlines and singapore airlines
Swaraj Mishra
 
portfolio risk
portfolio riskportfolio risk
portfolio riskAttiq Khan
 
CUSTOMER PROFITABILIY AND CUSTOMER RELATIONSHIP MANAGAEMTN AT RBC FINANCIAL G...
CUSTOMER PROFITABILIY AND CUSTOMER RELATIONSHIP MANAGAEMTN AT RBC FINANCIAL G...CUSTOMER PROFITABILIY AND CUSTOMER RELATIONSHIP MANAGAEMTN AT RBC FINANCIAL G...
CUSTOMER PROFITABILIY AND CUSTOMER RELATIONSHIP MANAGAEMTN AT RBC FINANCIAL G...KRISHNA SOWJANYA
 
Om case agarwal
Om case agarwalOm case agarwal
Om case agarwal
Mainan Ray
 
Accountancy Comprehensive Project For Class - 12th on Partnership Firm
Accountancy Comprehensive Project For Class - 12th on Partnership FirmAccountancy Comprehensive Project For Class - 12th on Partnership Firm
Accountancy Comprehensive Project For Class - 12th on Partnership Firm
Priyanka Sahu
 
Project titles for mba research project
Project titles for mba research projectProject titles for mba research project
Project titles for mba research projectEzhil Arasan
 
Chad cameroon pipeline project
Chad cameroon pipeline projectChad cameroon pipeline project
Chad cameroon pipeline project
Ujjwal Joshi
 
A project report on consumer’s preference among the branded and non branded j...
A project report on consumer’s preference among the branded and non branded j...A project report on consumer’s preference among the branded and non branded j...
A project report on consumer’s preference among the branded and non branded j...
Projects Kart
 
Dell's Working Capital
Dell's Working CapitalDell's Working Capital
Dell's Working Capital
Rohit Patidar
 
Presentation Case Tri Star - Final
Presentation Case Tri Star - FinalPresentation Case Tri Star - Final
Presentation Case Tri Star - FinalSpencer Cheung
 
Dividend theories
Dividend theoriesDividend theories
Dividend theoriesice456
 
Case study biovail
Case study biovailCase study biovail
Case study biovail
Anurag Joshi
 
Natureview Farm - Harvard Case Study
Natureview Farm - Harvard Case StudyNatureview Farm - Harvard Case Study
Natureview Farm - Harvard Case Study
Santhosh Kumar
 
PORTFOLIO PERFORMANCE EVALUATION
PORTFOLIO PERFORMANCE EVALUATIONPORTFOLIO PERFORMANCE EVALUATION
PORTFOLIO PERFORMANCE EVALUATION
Dinesh Kumar
 
Capital Asset Pricing Model (CAPM)
Capital Asset Pricing Model (CAPM)Capital Asset Pricing Model (CAPM)
Capital Asset Pricing Model (CAPM)
Heickal Pradinanta
 

What's hot (20)

Super shampoo products and the indian mass market case study
Super shampoo products and the indian mass market case studySuper shampoo products and the indian mass market case study
Super shampoo products and the indian mass market case study
 
5 FORCE ANALYSIS OF THE CEMENT INDUSTRY IN INDIA
5 FORCE ANALYSIS OF THE  CEMENT INDUSTRY IN INDIA5 FORCE ANALYSIS OF THE  CEMENT INDUSTRY IN INDIA
5 FORCE ANALYSIS OF THE CEMENT INDUSTRY IN INDIA
 
A marketing project report on tanishq
A marketing project report on tanishqA marketing project report on tanishq
A marketing project report on tanishq
 
Merger & acquisition, Hindalco Novelis
Merger & acquisition, Hindalco NovelisMerger & acquisition, Hindalco Novelis
Merger & acquisition, Hindalco Novelis
 
A HBR case study on Depreciation at delta airlines and singapore airlines
A HBR case study on Depreciation at delta airlines and singapore airlinesA HBR case study on Depreciation at delta airlines and singapore airlines
A HBR case study on Depreciation at delta airlines and singapore airlines
 
portfolio risk
portfolio riskportfolio risk
portfolio risk
 
CUSTOMER PROFITABILIY AND CUSTOMER RELATIONSHIP MANAGAEMTN AT RBC FINANCIAL G...
CUSTOMER PROFITABILIY AND CUSTOMER RELATIONSHIP MANAGAEMTN AT RBC FINANCIAL G...CUSTOMER PROFITABILIY AND CUSTOMER RELATIONSHIP MANAGAEMTN AT RBC FINANCIAL G...
CUSTOMER PROFITABILIY AND CUSTOMER RELATIONSHIP MANAGAEMTN AT RBC FINANCIAL G...
 
Om case agarwal
Om case agarwalOm case agarwal
Om case agarwal
 
Ratio analysis project presentation
Ratio analysis project presentationRatio analysis project presentation
Ratio analysis project presentation
 
Accountancy Comprehensive Project For Class - 12th on Partnership Firm
Accountancy Comprehensive Project For Class - 12th on Partnership FirmAccountancy Comprehensive Project For Class - 12th on Partnership Firm
Accountancy Comprehensive Project For Class - 12th on Partnership Firm
 
Project titles for mba research project
Project titles for mba research projectProject titles for mba research project
Project titles for mba research project
 
Chad cameroon pipeline project
Chad cameroon pipeline projectChad cameroon pipeline project
Chad cameroon pipeline project
 
A project report on consumer’s preference among the branded and non branded j...
A project report on consumer’s preference among the branded and non branded j...A project report on consumer’s preference among the branded and non branded j...
A project report on consumer’s preference among the branded and non branded j...
 
Dell's Working Capital
Dell's Working CapitalDell's Working Capital
Dell's Working Capital
 
Presentation Case Tri Star - Final
Presentation Case Tri Star - FinalPresentation Case Tri Star - Final
Presentation Case Tri Star - Final
 
Dividend theories
Dividend theoriesDividend theories
Dividend theories
 
Case study biovail
Case study biovailCase study biovail
Case study biovail
 
Natureview Farm - Harvard Case Study
Natureview Farm - Harvard Case StudyNatureview Farm - Harvard Case Study
Natureview Farm - Harvard Case Study
 
PORTFOLIO PERFORMANCE EVALUATION
PORTFOLIO PERFORMANCE EVALUATIONPORTFOLIO PERFORMANCE EVALUATION
PORTFOLIO PERFORMANCE EVALUATION
 
Capital Asset Pricing Model (CAPM)
Capital Asset Pricing Model (CAPM)Capital Asset Pricing Model (CAPM)
Capital Asset Pricing Model (CAPM)
 

Similar to Personal Loan Risk Assessment

AI powered decision making in banks
AI powered decision making in banksAI powered decision making in banks
AI powered decision making in banks
Pankaj Baid
 
Loan Analysis Predicting Defaulters
Loan Analysis Predicting DefaultersLoan Analysis Predicting Defaulters
Loan Analysis Predicting Defaulters
IRJET Journal
 
Customer_Churn_prediction.pptx
Customer_Churn_prediction.pptxCustomer_Churn_prediction.pptx
Customer_Churn_prediction.pptx
Aniket Patil
 
Customer_Churn_prediction.pptx
Customer_Churn_prediction.pptxCustomer_Churn_prediction.pptx
Customer_Churn_prediction.pptx
patilaniket2418
 
Use of Analytics to recover from COVID19 hit economy
Use of Analytics to recover from COVID19 hit economyUse of Analytics to recover from COVID19 hit economy
Use of Analytics to recover from COVID19 hit economy
Amit Parija
 
Data science vs real world: friends or foes - Pavle Kecman
Data science vs real world: friends or foes - Pavle KecmanData science vs real world: friends or foes - Pavle Kecman
Data science vs real world: friends or foes - Pavle Kecman
Institute of Contemporary Sciences
 
Day 1 (Lecture 2): Business Analytics
Day 1 (Lecture 2): Business AnalyticsDay 1 (Lecture 2): Business Analytics
Day 1 (Lecture 2): Business Analytics
Aseda Owusua Addai-Deseh
 
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015
Roger Barga
 
1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop
Rising Media, Inc.
 
HWZ-Darden Konferenz: Building a Sustainable Analytics Orientation
HWZ-Darden Konferenz: Building a Sustainable Analytics OrientationHWZ-Darden Konferenz: Building a Sustainable Analytics Orientation
HWZ-Darden Konferenz: Building a Sustainable Analytics Orientation
HWZ Hochschule für Wirtschaft
 
Using AI and ML Solutions for Proactive Customer Retention.pptx
Using AI and ML Solutions for Proactive Customer Retention.pptxUsing AI and ML Solutions for Proactive Customer Retention.pptx
Using AI and ML Solutions for Proactive Customer Retention.pptx
VOZIQ
 
Alhuda CIBE -Mfi insight analytics
Alhuda CIBE -Mfi insight analyticsAlhuda CIBE -Mfi insight analytics
Alhuda CIBE -Mfi insight analytics
Alhuda Centre of Islamic Banking & Economics
 
Emvigo Data Visualization - E Commerce Deck
Emvigo Data Visualization - E Commerce DeckEmvigo Data Visualization - E Commerce Deck
Emvigo Data Visualization - E Commerce Deck
Emvigo Technologies
 
Operationalizing Customer Analytics with Azure and Power BI
Operationalizing Customer Analytics with Azure and Power BIOperationalizing Customer Analytics with Azure and Power BI
Operationalizing Customer Analytics with Azure and Power BI
CCG
 
Integrated Contact Center (Final)
Integrated Contact Center (Final)Integrated Contact Center (Final)
Integrated Contact Center (Final)
Anand Rao
 
Wooing the Best Bank Deposit Customers
Wooing the Best Bank Deposit CustomersWooing the Best Bank Deposit Customers
Wooing the Best Bank Deposit Customers
Lucinda Linde
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
Boston Institute of Analytics
 
Presentation Title
Presentation TitlePresentation Title
Presentation Titlebutest
 
Credit risk assessment with imbalanced data sets using SVMs
Credit risk assessment with imbalanced data sets using SVMsCredit risk assessment with imbalanced data sets using SVMs
Credit risk assessment with imbalanced data sets using SVMs
IRJET Journal
 
Creditscore
CreditscoreCreditscore
Creditscorekevinlan
 

Similar to Personal Loan Risk Assessment (20)

AI powered decision making in banks
AI powered decision making in banksAI powered decision making in banks
AI powered decision making in banks
 
Loan Analysis Predicting Defaulters
Loan Analysis Predicting DefaultersLoan Analysis Predicting Defaulters
Loan Analysis Predicting Defaulters
 
Customer_Churn_prediction.pptx
Customer_Churn_prediction.pptxCustomer_Churn_prediction.pptx
Customer_Churn_prediction.pptx
 
Customer_Churn_prediction.pptx
Customer_Churn_prediction.pptxCustomer_Churn_prediction.pptx
Customer_Churn_prediction.pptx
 
Use of Analytics to recover from COVID19 hit economy
Use of Analytics to recover from COVID19 hit economyUse of Analytics to recover from COVID19 hit economy
Use of Analytics to recover from COVID19 hit economy
 
Data science vs real world: friends or foes - Pavle Kecman
Data science vs real world: friends or foes - Pavle KecmanData science vs real world: friends or foes - Pavle Kecman
Data science vs real world: friends or foes - Pavle Kecman
 
Day 1 (Lecture 2): Business Analytics
Day 1 (Lecture 2): Business AnalyticsDay 1 (Lecture 2): Business Analytics
Day 1 (Lecture 2): Business Analytics
 
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015
 
1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop
 
HWZ-Darden Konferenz: Building a Sustainable Analytics Orientation
HWZ-Darden Konferenz: Building a Sustainable Analytics OrientationHWZ-Darden Konferenz: Building a Sustainable Analytics Orientation
HWZ-Darden Konferenz: Building a Sustainable Analytics Orientation
 
Using AI and ML Solutions for Proactive Customer Retention.pptx
Using AI and ML Solutions for Proactive Customer Retention.pptxUsing AI and ML Solutions for Proactive Customer Retention.pptx
Using AI and ML Solutions for Proactive Customer Retention.pptx
 
Alhuda CIBE -Mfi insight analytics
Alhuda CIBE -Mfi insight analyticsAlhuda CIBE -Mfi insight analytics
Alhuda CIBE -Mfi insight analytics
 
Emvigo Data Visualization - E Commerce Deck
Emvigo Data Visualization - E Commerce DeckEmvigo Data Visualization - E Commerce Deck
Emvigo Data Visualization - E Commerce Deck
 
Operationalizing Customer Analytics with Azure and Power BI
Operationalizing Customer Analytics with Azure and Power BIOperationalizing Customer Analytics with Azure and Power BI
Operationalizing Customer Analytics with Azure and Power BI
 
Integrated Contact Center (Final)
Integrated Contact Center (Final)Integrated Contact Center (Final)
Integrated Contact Center (Final)
 
Wooing the Best Bank Deposit Customers
Wooing the Best Bank Deposit CustomersWooing the Best Bank Deposit Customers
Wooing the Best Bank Deposit Customers
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
Presentation Title
Presentation TitlePresentation Title
Presentation Title
 
Credit risk assessment with imbalanced data sets using SVMs
Credit risk assessment with imbalanced data sets using SVMsCredit risk assessment with imbalanced data sets using SVMs
Credit risk assessment with imbalanced data sets using SVMs
 
Creditscore
CreditscoreCreditscore
Creditscore
 

Recently uploaded

Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
James Polillo
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
alex933524
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 

Recently uploaded (20)

Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单

Personal Loan Risk Assessment

  • 1. Personal Loan Risk Assessment on Two-Wheeler Loan Customer Base
       Name: Kunal Kashyap
       College: Indian Institute of Management Kashipur
       Case: Round 3: Grand Finale
  • 2. Business Problem Snapshot
       Business Problem | Approach taken | Analyzing data | Modelling | Cost-Benefit Analysis | Recommendation & Deployment
       Problem Statement: Objectives
       • To identify the segment of customers who have a higher tendency to default if they are offered a Personal Loan
       • To leverage the existing Two-Wheeler Loan (TW) customer base to cross-sell the Personal Loan product
       • To develop a prediction model to classify the customer base into Risky and Non-Risky categories, for rejecting and considering them for a PL offer respectively
       Available data: Payment History of 1.2 Lakh Customers
       • Live loans, closed loans, enquiries, Two-Wheeler loans
       • Gender, Age, Employment type
       • Interest rate, Tenure, EMI, MOB, First EMI Bounce
       • Total down payment, Total Loan amount, Cost of Asset
       • Number of times defaulted, bounces with TVS Credit, bounces in last 3 months
       Credit Process Flow
       Prediction Model will help in the classification
  • 3. Methodology Used | Research Insights
       Key Highlight: Team Data Science Process (TDSP) methodology has been used for solving this case
       Approach (TDSP methodology): Start, Business Understanding, Data acquisition & Understanding, Modelling, Deployment, End
       Research Insights
       • Small ticket personal loans (STPL) are personal loans with a ticket size of less than Rs 50,000
       • STPL market: 12000 Cr as of Aug 2020 | Half of it is for loans below Rs 5000
       • 140% growth in FY 2019 | Driven by the STPL segment
       • TG: young, low-income, digitally savvy customers who have small-ticket and short-term credit needs, and customers with no or limited credit history
       • Demand driver: millennials and young borrowers in the age group 18-30 years
       • End-use: home renovation, wedding, higher education or travel costs; meeting a medical emergency, etc.
       • Alternative data: digital footprint of customers such as social media profile and mobile bill; social scoring by psychometric analysis through digital footprints
       Sources: Microsoft TDSP methodology | Paisa bazaar | BCG report | Financial Express
  • 4. Data Wrangling, Exploration & Cleaning
       Key Highlight: Ensemble algorithm to be used for future work to achieve higher accuracy and enhanced business opportunities
       Data Source: The data consists of the past loan history of 119,529 customers; it has 30 features from various sources
       Step 1: We are left with 119,486 customers after removing rows with incomplete data
       Steps 2 to 7:
       • Features V1, V11, V13, and V17 have not been used for modelling
       • Transformation: data has been normalized using the Min-Max method
       • One-hot encoding has been applied on features V15 (Gender) and V16 (Employment)
       • The dataset was split in equal proportion for training and testing
       • Four features (V21, V22, V28, and V29) were removed due to missing values or very little data
       • A new feature named 'Age' has been created from V18, and V18 has been removed
       • Random oversampling of the minority class and random undersampling of the majority class were performed because the dataset is imbalanced
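The transformations named on slide 4 (deriving Age from the date of birth, Min-Max normalization, one-hot encoding) can be illustrated in plain Python. This is a minimal sketch, not the author's code: the helper names, the reference date used to compute Age, and the sample rows are all assumptions made up for the example.

```python
from datetime import date

def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1] (Min-Max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(values):
    """One-hot encode a categorical column; returns (categories, encoded rows)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

def age_in_years(dob, as_of):
    """Completed years between a date of birth (V18) and a reference date."""
    had_birthday = (as_of.month, as_of.day) >= (dob.month, dob.day)
    return as_of.year - dob.year - (0 if had_birthday else 1)

# Hypothetical sample values, for illustration only.
emi = [1500, 2500, 3500]                                         # V8
employment = ["SAL", "SELF", "SAL"]                              # V16
dob = [date(1990, 6, 15), date(1985, 1, 2), date(2000, 12, 31)]  # V18
as_of = date(2020, 8, 1)                                         # assumed reference date

emi_scaled = min_max_scale(emi)
emp_categories, emp_encoded = one_hot(employment)
ages = [age_in_years(d, as_of) for d in dob]
```

In practice the scaling bounds and the category list would be learned on the training half only and then reused on the test half, so that both halves are transformed identically.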
  • 5. Modeling Architecture
       Architecture | Classification Model | Evaluation Metrics
       Datasets: Given dataset, Loaded dataset, Modified dataset, Overall dataset, Training set, Test set
       Resampling: Random Oversampling of minority class, Random Undersampling of majority class
       Model: Random Forest Model, Evaluation metrics
       Classification: Good Customer (Non-default), Bad Customer (Default)
       Other Models: Logistic regression, Deep Neural Network, SMOTE using KNN for minority class generation, Random Forest Model
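The resampling and splitting boxes in the architecture can be sketched as follows. This is an illustrative sketch under stated assumptions, not the author's pipeline: the function names, the target class sizes, the seed, and the order of the two operations (resample first, then split) are hypothetical, since the slide does not spell them out.

```python
import random

def resample(rows, label_of, n_minority, n_majority, seed=0):
    """Randomly oversample the minority class (with replacement) and
    undersample the majority class (without replacement)."""
    rng = random.Random(seed)
    bad = [r for r in rows if label_of(r) == 1]
    good = [r for r in rows if label_of(r) == 0]
    minority, majority = (bad, good) if len(bad) <= len(good) else (good, bad)
    over = [rng.choice(minority) for _ in range(n_minority)]
    under = rng.sample(majority, min(n_majority, len(majority)))
    combined = over + under
    rng.shuffle(combined)
    return combined

def split_half(rows, seed=0):
    """Shuffle the rows and split them into two equal halves (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

# Toy data: 100 customers, 5 of them defaulters (label 1). Values are made up.
customers = [{"id": i, "V30": 1 if i < 5 else 0} for i in range(100)]

balanced = resample(customers, lambda r: r["V30"], n_minority=30, n_majority=60)
train, test = split_half(balanced)
```

Because oversampling duplicates minority rows, resampling before the split lets copies of the same customer land in both halves; resampling only the training half avoids that overlap.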
  • 6. Modelling: Random Forest
       Classification Model
       Since our objective is to segregate the customers into two categories, we use a classification model.
       Random Forest Classifier: an ensemble technique that classifies by constructing a multitude of decision trees on the training set (we trained the model with 1000 trees and reached 99.9% accuracy on the training set).
       From the Random Forest model, we identified the parameters that contribute significantly to classifying risky and non-risky customers. The Importance column in the table shows the significance of each parameter: the higher the value, the higher the impact.
       Output Snapshot: the top 11 variables by importance in building the model
       Feature | Description | Importance | Cumulative score
       V27 | Number of times defaulted in last 12 months | 0.128 | 0.128
       V26 | Number of times defaulted in last 6 months | 0.103 | 0.232
       Age | Age of customers | 0.083 | 0.314
       V25 | Number of times defaulted in last 3 months | 0.076 | 0.390
       V7 | Total down payment of existing loan | 0.068 | 0.457
       V8 | EMI of existing loan | 0.066 | 0.523
       V6 | Cost of Asset (existing loan) | 0.064 | 0.588
       V9 | Total Loan amount of existing loan | 0.061 | 0.649
       V23 | Number of closed loans | 0.059 | 0.708
       V14 | Rate of interest for existing loan | 0.054 | 0.761
       V4 | MOB (Month of business with TVS Credit) | 0.051 | 0.813
       Note: Python code files and API files are attached on the Annexure slide
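The Cumulative score column is a running sum of the Importance column. The short sketch below recomputes it from the rounded importances printed on the slide; the variable and function names are assumptions for illustration. Because the inputs are already rounded to three decimals, a running sum can differ from the slide's figure by 0.001 in a few rows (the slide presumably accumulated unrounded values).

```python
from itertools import accumulate

# Importance values as printed on slide 6 (rounded to three decimals).
importances = [
    ("V27", 0.128), ("V26", 0.103), ("Age", 0.083), ("V25", 0.076),
    ("V7", 0.068), ("V8", 0.066), ("V6", 0.064), ("V9", 0.061),
    ("V23", 0.059), ("V14", 0.054), ("V4", 0.051),
]

# Running sum of the importance scores, in ranked order.
cumulative = list(accumulate(score for _, score in importances))

def features_needed(threshold):
    """Number of top-ranked features whose summed importance reaches threshold."""
    for count, total in enumerate(cumulative, start=1):
        if total >= threshold:
            return count
    return None  # threshold not reached by the listed features

for (name, score), total in zip(importances, cumulative):
    print(f"{name:>4}  {score:.3f}  {total:.3f}")
```

Read this way, the table says the 11 listed variables account for about 81% of the total importance, and the first six already cover more than half.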
  • 7. Evaluation Metrics
       The confusion matrix provides a performance summary of the classifier.
       Evaluation metrics on the Training set
       Accuracy | Sensitivity | Precision | Specificity | F1 Score | MCC
       99.94% | 100% | 99.80% | 99.91% | 99.91% | 99.87%
       True Negative (TN) | True Positive (TP) | False Positive (FP) | False Negative (FN)
       34942 | 17618 | 31 | 0
       • Accuracy: 99.94% of customers were correctly labelled by the model
       • Precision: of all customers predicted to default on loan payment, 99.80% actually defaulted
       • Sensitivity: the model correctly identified 100% of the customers who defaulted
       • Specificity: 99.91% of non-defaulters were correctly labelled by the model
       Evaluation metrics on the Test set
       Accuracy | Sensitivity | Precision | Specificity | F1 Score | MCC
       98.75% | 99.82% | 96.53% | 98.22% | 98.15% | 97.24%
       True Negative (TN) | True Positive (TP) | False Positive (FP) | False Negative (FN)
       34524 | 17412 | 625 | 31
       • Accuracy: 98.75% of customers were correctly labelled by the model
       • Precision: of all customers predicted to default on loan payment, 96.53% actually defaulted
       • Sensitivity: the model correctly identified 99.82% of the customers who defaulted
       • Specificity: 98.22% of non-defaulters were correctly labelled by the model
       Note: MCC is the Matthews Correlation Coefficient
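All six metrics on slide 7 follow from the four confusion-matrix counts. The sketch below applies the standard definitions to the test-set counts reported on the slide; the function and key names are assumptions for illustration, and defaulters are treated as the positive class.

```python
from math import sqrt

def classification_metrics(tn, tp, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)      # recall on defaulters
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)      # recall on non-defaulters
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }

# Test-set counts as reported on slide 7.
test_metrics = classification_metrics(tn=34524, tp=17412, fp=625, fn=31)

for name, value in test_metrics.items():
    print(f"{name:<12} {value * 100:.2f}%")
```

MCC is the most conservative of the six on imbalanced data, because it only approaches 100% when all four cells of the confusion matrix look good at once.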
  • 8. Business Metrics
       Evaluation metrics on the Original full dataset
       Accuracy | Sensitivity | Precision | Specificity | F1 Score | MCC
       98.80% | 99.89% | 64.69% | 98.78% | 78.53% | 79.89%
       True Negative (TN) | True Positive (TP) | False Positive (FP) | False Negative (FN)
       115446 | 2611 | 1425 | 3
       • Accuracy: 98.80% of customers were correctly labelled by the model
       • Precision: of all customers predicted to default on loan payment, 64.69% actually defaulted
       • Sensitivity: the model correctly identified 99.89% of the customers who defaulted
       • Specificity: 98.78% of non-defaulters were correctly labelled by the model
       Business metrics
       Particulars | Business Value
       Avg. loan amount (V9) | 39322
       No. of defaults (V30) | 2614
       Total loss (without model) | 102787708
       Avg. loan amount | 39322
       No. of defaults (model_FN) | 3
       Total loss with defaults_model | 117966
       Opportunity loss (# customers)_FP | 1425
       Value lost (V10*V8 + V7 - V6) | 258
       Opportunity loss with model | 367650
       Total loss with model | 485616
       Loss saved with modelling | 102302092
       Percentage of loss saved | 99.53%
       Note: V6: Cost of Asset (existing loan); V7: Total down payment of existing loan; V8: EMI of existing loan; V10: Tenure of existing loan
       Scenario | Total Profit | Total Loss | Net Profit
       Without Model | 30152718 | 102787708 | -72634990
       With Proposed Model | 29785068 | 485616 | 29299452
       With the proposed model, net profit moves from approx. -7 crore to approx. +3 crore. Using the RF model saves around 99.5% of the losses.
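The cost-benefit figures on slide 8 can be reproduced with a few lines of arithmetic. The sketch below is a reconstruction under stated assumptions: the function and parameter names are hypothetical, and the total-profit rule (number of funded non-defaulters times the per-customer value of 258) is not stated on the slide; it is an inference that happens to reproduce the slide's totals.

```python
AVG_LOAN_AMOUNT = 39322      # average loan amount (V9), from the slide
VALUE_PER_CUSTOMER = 258     # V10*V8 + V7 - V6, from the slide

def cost_benefit(defaults_funded, good_funded, good_rejected=0):
    """Profit and loss for one scenario.

    defaults_funded: defaulters who receive a loan (all defaulters without
                     a model, false negatives with a model)
    good_funded:     non-defaulters who receive a loan
    good_rejected:   non-defaulters wrongly rejected (false positives)
    """
    default_loss = defaults_funded * AVG_LOAN_AMOUNT
    opportunity_loss = good_rejected * VALUE_PER_CUSTOMER
    total_loss = default_loss + opportunity_loss
    total_profit = good_funded * VALUE_PER_CUSTOMER  # inferred rule, see lead-in
    return {
        "total_profit": total_profit,
        "total_loss": total_loss,
        "net_profit": total_profit - total_loss,
    }

# Counts from the full-dataset confusion matrix: TN=115446, TP=2611, FP=1425, FN=3.
without_model = cost_benefit(defaults_funded=2611 + 3, good_funded=115446 + 1425)
with_model = cost_benefit(defaults_funded=3, good_funded=115446, good_rejected=1425)

loss_saved = without_model["total_loss"] - with_model["total_loss"]
pct_loss_saved = 100 * loss_saved / without_model["total_loss"]
```

The comparison makes the trade-off explicit: the model gives up the value of 1425 wrongly rejected good customers in exchange for avoiding all but 3 of the 2614 defaults.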
  • 9. Deployment | Recommendations
       Recommendations
       • It is recommended to use an analytical model like the proposed one to reduce losses for this initiative
       • Alternative data to be used: digital footprint of customers such as social media profile, and social scoring by psychometric analysis through digital footprints
       Deployment
       • Call POST: The created API is called using POST; it displays an HTML page for entering the feature inputs. After execution, the console shows the output based on the model.
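The deployment flow described above (an HTML page for entering features, a POST call, and the result printed to the console) might look roughly like the sketch below, written with the Python standard library only. Everything here is an assumption for illustration: the author's API files are not reproduced in the transcript, the handler and helper names are hypothetical, the form exposes a single feature, and `stand_in_model` is a placeholder rule, not the trained Random Forest.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

FORM_HTML = b"""<form method="post">
V27 (times defaulted in last 12 months): <input name="V27"><br>
<input type="submit" value="Score">
</form>"""

def stand_in_model(features):
    """Placeholder for the trained classifier (NOT the Random Forest)."""
    return "Risky" if features.get("V27", 0.0) > 0 else "Non-Risky"

def parse_form(body):
    """Turn a URL-encoded form body into a {feature name: float} dict."""
    return {key: float(values[0]) for key, values in parse_qs(body).items()}

class ScoringHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the input form.
        self._send(FORM_HTML, "text/html")

    def do_POST(self):
        # Read the submitted form, score it, and log the result to the console.
        length = int(self.headers.get("Content-Length", 0))
        features = parse_form(self.rfile.read(length).decode("utf-8"))
        label = stand_in_model(features)
        print("Prediction:", label)
        self._send(label.encode("utf-8"), "text/plain")

    def _send(self, payload, content_type):
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

def make_server(port=8000):
    """Build (but do not start) a local scoring server."""
    return HTTPServer(("127.0.0.1", port), ScoringHandler)
```

Calling `make_server().serve_forever()` would start the server locally; in a real deployment the placeholder would be replaced by loading the trained model and applying the same preprocessing used in training.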
  • 10. THANK YOU “It always seems impossible until it’s done.” - Nelson Mandela
  • 12. Data Dictionary
       Feature | Feature Definition
       V1 | Customer's ID
       V2 | First EMI Bounce (0: No, 1: Yes) (existing loan)
       V3 | Number of bounces in last 3 months outside TVS Credit
       V4 | MOB (Month of business with TVS Credit)
       V5 | Number of bounces with TVS Credit
       V6 | Cost of Asset (existing loan)
       V7 | Total down payment of existing loan
       V8 | EMI of existing loan
       V9 | Total Loan amount of existing loan
       V10 | Tenure of existing loan
       V11 | Customer's Geographical Area Code
       V12 | Customer's TW Dealer's Code
       V13 | Customer's TW Model's Code
       V14 | Rate of interest for existing loan
       V15 | Gender
       V16 | Employment type of customer (SAL: Salaried, SELF: Self-employed, HOUSEWIFE, PENS: Pensioner, STUDENT)
       V17 | Pin code
       V18 | Date of Birth
       V19 | Number of Live loans
       V20 | Number of Two-Wheeler loans
       V21 | Maximum sanction amount of Live Loans
       V22 | Number of new loans taken in last 3 months
       V23 | Number of closed loans
       V24 | Number of enquiries
       V25 | Number of times defaulted in last 3 months
       V26 | Number of times defaulted in last 6 months
       V27 | Number of times defaulted in last 12 months
       V28 | Maximum loan amount sanctioned for any Gold loan
       V29 | Maximum loan amount sanctioned for any personal loan
       V30 | Target variable (1: Bad Customer / 0: Good Customer)
       Assumptions:
       • The complete EMI duration has been taken regardless of the point at which a customer defaults, due to lack of information
       • This conservative approach should be offset by the depreciation of assets
       • Avg. loan amount and avg. tenure are considered for the calculation
       Python Code