Personal Loan Risk Assessment

Name: Kunal Kashyap
College: Indian Institute of Management Kashipur
Round 3: Grand Finale
Case: Personal Loan Risk Assessment on Two-Wheeler Loan Customer Base

Business Problem Snapshot
Business Problem | Approach taken | Analyzing data | Modelling | Cost-Benefit Analysis | Recommendation & Deployment
Objectives
• To identify the segment of customers who have a higher tendency to default if they are offered a Personal Loan (PL)
• To leverage the existing Two-Wheeler (TW) Loan customer base to cross-sell the Personal Loan product
• To develop a prediction model that classifies the customer base into Risky and Non-Risky categories, so that Risky customers are rejected and Non-Risky customers are considered for the PL offer
Problem Statement
Credit Process Flow
Available data: payment history of 1.2 lakh customers, covering live loans, closed loans, enquiries, gender, age, interest rate, tenure, EMI, MOB, first EMI bounce, total down payment, total loan amount, two-wheeler loans, employment type, number of times defaulted, cost of asset, bounces with TVS Credit and bounces in the last 3 months
The prediction model will help classify these customers.

Methodology Used | Research Insights
Key Highlight: The Team Data Science Process (TDSP) methodology has been used for solving this case.
Approach (TDSP methodology): Start -> Business Understanding -> Data Acquisition & Understanding -> Modelling -> Deployment -> End
• Small-ticket personal loans (STPL) are personal loans with a ticket size below Rs 50,000
• STPL market: Rs 12,000 Cr as of Aug 2020 | Half of them are loans below Rs 5,000
• Target group (TG): young, low-income, digitally savvy customers who have small-ticket and short-term credit needs and no or limited credit history
• Demand driver: millennials and young borrowers in the age group 18-30 years
140% growth in FY 2019 | Driven by the STPL segment
End-use:
• Home renovation, wedding, higher education or travel costs
• Medical emergencies, etc.
Research Insights
• Alternative data: digital footprint of customers, such as social media profile, mobile bill and social scoring by psychometric analysis of digital footprints
Sources: Microsoft TDSP methodology | Paisa bazaar | BCG report | Financial Express

Data Wrangling, Exploration & Cleaning
Key Highlight: An ensemble algorithm is to be used in future work to achieve higher accuracy and enhanced business opportunities.
Data Source: The data consists of the past loan history of 119,529 customers, with 30 features from various sources.
• Rows with incomplete data were removed, leaving 119,486 customers.
• Four features (V21, V22, V28 and V29) were removed due to missing values or very little data.
• A new feature named 'Age' was created from V18 (Date of Birth), and V18 was removed.
• Features V1, V11, V13 and V17 were not used for modelling.
• One-hot encoding was applied to features V15 (Gender) and V16 (Employment type).
• Transformation: the data was normalized using the Min-Max method.
• Random oversampling of the minority class and random undersampling of the majority class were performed due to the imbalanced nature of the dataset.
• The dataset was split in equal proportion into training and test sets.
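The wrangling steps above can be sketched in plain Python as follows. This is a minimal illustration, not the author's code (which is on the Annexure slide). It assumes rows are dicts keyed by the Data Dictionary names, the reference date used to compute Age is an assumption, the Gender codes for V15 are assumed, and the helper names (preprocess, split_half, age_on, min_max) are hypothetical.

```python
import random
from datetime import date

# Feature names follow the Data Dictionary; the V15 Gender codes are assumed.
DROP_SPARSE = ["V21", "V22", "V28", "V29"]   # removed: missing values / too little data
NOT_USED = ["V1", "V11", "V13", "V17"]       # ID, area code, TW model code, pin code
CATEGORICAL = {
    "V15": ["M", "F"],                                        # Gender (assumed codes)
    "V16": ["SAL", "SELF", "HOUSEWIFE", "PENS", "STUDENT"],   # Employment type
}
ONE_HOT_PREFIXES = tuple(f"{col}_" for col in CATEGORICAL)


def age_on(dob, as_of):
    """Age in completed years on the reference date (derived from V18)."""
    years = as_of.year - dob.year
    if (as_of.month, as_of.day) < (dob.month, dob.day):
        years -= 1
    return years


def min_max(values):
    """Min-Max scaling to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def preprocess(rows, as_of):
    """rows: list of dicts keyed by feature name. Returns (X, y, columns)."""
    # Drop rows with incomplete data in any column that is kept.
    rows = [r for r in rows
            if all(v is not None for k, v in r.items() if k not in DROP_SPARSE)]
    records = []
    for r in rows:
        rec = {k: v for k, v in r.items()
               if k not in DROP_SPARSE + NOT_USED + ["V18"]}
        rec["Age"] = age_on(r["V18"], as_of)          # new feature replaces V18
        for col, levels in CATEGORICAL.items():      # one-hot encode V15 and V16
            value = rec.pop(col)
            for level in levels:
                rec[f"{col}_{level}"] = 1 if value == level else 0
        records.append(rec)
    columns = sorted(k for k in records[0] if k != "V30")
    # Min-Max normalize numeric columns; one-hot columns are left as 0/1.
    numeric = [c for c in columns if not c.startswith(ONE_HOT_PREFIXES)]
    scaled = {c: min_max([rec[c] for rec in records]) for c in numeric}
    X = [[scaled[c][i] if c in scaled else rec[c] for c in columns]
         for i, rec in enumerate(records)]
    y = [rec["V30"] for rec in records]               # 1: Bad, 0: Good customer
    return X, y, columns


def split_half(X, y, seed=0):
    """Shuffle, then split into equal-sized training and test sets."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    train, test = idx[: len(idx) // 2], idx[len(idx) // 2:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])
```

Fitting the Min-Max ranges on the full data before splitting follows the order listed on the slide. A production pipeline would usually fit the scaler on the training set only.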

Modeling Architecture
Given dataset -> Loaded dataset -> Modified dataset (random oversampling of minority class, random undersampling of majority class) -> Training set and Test set -> Random Forest Model -> Evaluation metrics (Test set, Overall dataset)
Classification: Good Customer (Non-default) | Bad Customer (Default)
Other Models: Logistic regression | Deep Neural Network | SMOTE using KNN for minority class generation, with a Random Forest Model
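The resampling stage of this architecture can be sketched as below. This is a minimal sketch, not the author's code. It assumes feature rows are lists of floats and that label 1 marks a Bad Customer. The target class sizes are parameters because the slides do not state them. The smote function is a plain SMOTE illustration that interpolates towards one of the k nearest minority neighbours, standing in for the "SMOTE using KNN" variant listed under Other Models. All function names are hypothetical.

```python
import math
import random


def random_resample(X, y, n_minority, n_majority, minority=1, seed=0):
    """Random oversampling of the minority class (sampling with replacement)
    plus random undersampling of the majority class (without replacement)."""
    rng = random.Random(seed)
    mino = [x for x, t in zip(X, y) if t == minority]
    majo = [x for x, t in zip(X, y) if t != minority]
    majority_label = next(t for t in y if t != minority)
    # Keep every original minority row and add random duplicates up to n_minority.
    mino += [rng.choice(mino) for _ in range(max(n_minority - len(mino), 0))]
    majo = rng.sample(majo, min(n_majority, len(majo)))
    pairs = [(x, minority) for x in mino] + [(x, majority_label) for x in majo]
    rng.shuffle(pairs)
    return [x for x, _ in pairs], [t for _, t in pairs]


def smote(minority_rows, n_new, k=5, seed=0):
    """SMOTE: each synthetic row lies on the line segment between a random
    minority row and one of its k nearest minority neighbours (Euclidean)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority_rows))
        base = minority_rows[i]
        others = [r for j, r in enumerate(minority_rows) if j != i]
        neighbours = sorted(others, key=lambda r: math.dist(base, r))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic
```

Random oversampling only duplicates existing defaulters. SMOTE instead creates new points between them, which reduces exact duplicates in the training data.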

Modelling: Random Forest
Features Description Importance Cumulative score
V27 Number of times defaulted in last 12 months 0.128 0.128
Age Age of customers 0.083 0.314
V7 Total down payment of existing loan 0.068 0.457
V8 EMI of existing loan 0.066 0.523
V6 Cost of Asset (existing loan) 0.064 0.588
V9 Total Loan amount of existing loan 0.061 0.649
V23 Number of closed loans 0.059 0.708
V14 Rate of interest for existing loan 0.054 0.761
V4 MOB (Month of business with TVS Credit) 0.051 0.813
Since our objective is to separate the customers into two categories, we use a classification model.
Random Forest Classifier
This method is an ensemble technique for classification that constructs a multitude of decision trees on the training set and aggregates the predictions of the individual trees. We trained the model with 1,000 trees, reaching 99.9% accuracy on the training set.
The table above lists the top 11 variables by importance in building the model.
From the Random Forest model, we identified the parameters that contribute significantly to classifying risky and non-risky customers. The Importance column in the table shows the significance of each parameter: the higher the value, the higher the impact.
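The classifier step can be sketched using scikit-learn's RandomForestClassifier with 1,000 trees. This sketch assumes the Importance column is the impurity-based feature_importances_ attribute. Synthetic data from make_classification stands in for the real customer data, and the helper names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def fit_forest(X, y, n_trees=1000, seed=0):
    """Train a Random Forest; n_trees=1000 matches the slide."""
    model = RandomForestClassifier(n_estimators=n_trees, random_state=seed, n_jobs=-1)
    return model.fit(X, y)


def importance_table(model, names):
    """Rows of (feature, importance, cumulative score), highest first."""
    ranked = sorted(zip(names, model.feature_importances_),
                    key=lambda pair: pair[1], reverse=True)
    rows, running = [], 0.0
    for name, importance in ranked:
        running += float(importance)
        rows.append((name, round(float(importance), 3), round(running, 3)))
    return rows


def demo(n_trees=1000):
    """Synthetic stand-in for the customer data (6 features, 2 classes)."""
    X, y = make_classification(n_samples=400, n_features=6, n_informative=3,
                               random_state=0)
    model = fit_forest(X, y, n_trees=n_trees)
    names = [f"f{i}" for i in range(1, 7)]   # placeholder feature names
    return model, importance_table(model, names), model.score(X, y)
```

Sorting the importances in descending order and taking a running sum gives the layout of the Importance and Cumulative score columns.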
Output Snapshot
Note: Python code files and API files are attached on the Annexure slide

Evaluation Metrics: the confusion matrix provides a performance summary of the classifier
Evaluation metrics on Training set
Accuracy | Sensitivity | Precision | Specificity | F1 Score | MCC
99.94% | 100% | 99.82% | 99.91% | 99.91% | 99.87%
True Negative (TN): 34942 | True Positive (TP): 17618 | False Positive (FP): 31 | False Negative (FN): 0
Evaluation metrics on Test set
Accuracy | Sensitivity | Precision | Specificity | F1 Score | MCC
98.75% | 99.82% | 96.53% | 98.22% | 98.15% | 97.24%
True Negative (TN): 34524 | True Positive (TP): 17412 | False Positive (FP): 625 | False Negative (FN): 31
On the training set:
• Accuracy: 99.94% of customers were correctly labelled by the model
• Precision: of all the customers who were predicted to default on loan payment, 99.82% defaulted
• Sensitivity: the model correctly identified 100% of the customers who defaulted on loan payment
• Specificity: 99.91% of non-defaulters were correctly labelled by the model
On the test set, the model correctly identified 99.82% of the customers who defaulted.
Note: MCC = Matthews Correlation Coefficient
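The metric definitions can be checked directly against the confusion matrices. This is a minimal sketch that treats default (Bad Customer) as the positive class. The function name is hypothetical.

```python
import math


def confusion_metrics(tn, tp, fp, fn):
    """Metrics in percent (2 d.p.); positive class = default (Bad Customer)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)      # share of actual defaulters caught
    precision = tp / (tp + fp)        # share of predicted defaulters who default
    specificity = tn / (tn + fp)      # share of non-defaulters labelled correctly
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))  # Matthews corr. coeff.
    metrics = {"accuracy": accuracy, "sensitivity": sensitivity,
               "precision": precision, "specificity": specificity,
               "f1": f1, "mcc": mcc}
    return {name: round(100 * value, 2) for name, value in metrics.items()}
```

For the test-set counts (TN 34524, TP 17412, FP 625, FN 31), this reproduces the reported metrics to two decimal places.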

Business Metrics
Note:
V6: Cost of Asset (existing loan)
V7: Total down payment of existing loan
V8: EMI of existing loan
V10: Tenure of existing loan
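Evaluation metrics on original full dataset
Accuracy | Sensitivity | Precision | Specificity | F1 Score | MCC
98.80% | 99.89% | 64.69% | 98.78% | 78.53% | 79.89%
True Negative (TN): 115446 | True Positive (TP): 2611 | False Positive (FP): 1425 | False Negative (FN): 3
The model correctly identified 99.89% of the customers who defaulted.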
Business metrics
Particulars | Business Value
Avg. loan amount (V9) | 39322
No. of defaults (V30) | 2614
Total loss (without model) | 102787708
Avg. loan amount | 39322
No. of defaults with model (FN) | 3
Total loss from defaults with model | 117966
Opportunity loss, no. of customers (FP) | 1425
Value lost per customer (V10*V8 + V7 - V6) | 258
Opportunity loss with model | 367650
Total loss with model | 485616
Loss saved with modelling | 102302092
Percentage of loss saved | 99.53%
Without Model: Total Profit 30152718 | Total Loss 102787708 | Net Profit -72634990
With Proposed Model: Total Profit 29785068 | Total Loss 485616 | Net Profit 29299452
With the proposed model, net profit moves from approx. -7 crore to +3 crore.
Using the RF model saves around 99.5% of the losses.
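The loss calculation in the Business metrics table can be reproduced as below. This sketch follows the stated assumptions: the average loan amount is lost for each missed default, and a fixed value is lost for each wrongly rejected good customer. The function names are hypothetical.

```python
def value_lost_per_customer(tenure, emi, down_payment, asset_cost):
    """Value lost on one wrongly rejected good customer: V10*V8 + V7 - V6."""
    return tenure * emi + down_payment - asset_cost


def cost_benefit(avg_loan, defaults, fn, fp, value_lost):
    """Loss figures from the Business metrics table."""
    loss_without = avg_loan * defaults          # every default loses the avg. loan
    default_loss_with = avg_loan * fn           # defaults the model misses (FN)
    opportunity_loss = fp * value_lost          # good customers rejected (FP)
    loss_with = default_loss_with + opportunity_loss
    saved = loss_without - loss_with
    return {"loss_without": loss_without,
            "loss_with": loss_with,
            "saved": saved,
            "pct_saved": round(100 * saved / loss_without, 2)}


def net_profit(total_profit, total_loss):
    """Net Profit = Total Profit - Total Loss, as in the profit comparison."""
    return total_profit - total_loss
```

With avg_loan=39322, defaults=2614, fn=3, fp=1425 and value_lost=258, cost_benefit returns a loss of 102787708 without the model, 485616 with it, and 99.53% saved.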

Deployment
Recommendations
• Use an analytical model like the proposed one to reduce losses in this initiative
• Use alternative data, i.e. the digital footprint of customers, such as social media profile and social scoring by psychometric analysis of digital footprints
Calling the API via POST: the created API is called using POST and displays an HTML page for entering the feature values. After execution, the console shows the model's output.
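This is a minimal sketch of such an endpoint using only Python's standard library (http.server). The authors' actual API files are on the Annexure slide and are not reproduced here. The feature subset, the predict stub and all names below are hypothetical placeholders for the trained Random Forest model.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

# Hypothetical subset of model inputs; the real API takes the full feature set.
FEATURES = ["V27", "Age", "V7", "V8", "V6"]

FORM = ("<html><body><form method='post'>"
        + "".join(f"{name}: <input name='{name}'><br>" for name in FEATURES)
        + "<input type='submit' value='Predict'></form></body></html>")


def predict(values):
    """Placeholder for the trained Random Forest (e.g. model.predict)."""
    return 1 if values["V27"] > 0 else 0   # hypothetical rule for illustration


def handle_form(body):
    """Parse a URL-encoded POST body and return the predicted category."""
    fields = parse_qs(body.decode("utf-8"))
    values = {name: float(fields[name][0]) for name in FEATURES}
    return "Risky" if predict(values) == 1 else "Non-Risky"


class PredictHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self._send_html(FORM)                        # page for entering features

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        label = handle_form(self.rfile.read(length))
        print(f"Model output: {label}")              # result reported on console
        self._send_html(f"<html><body>Prediction: {label}</body></html>")

    def _send_html(self, html):
        data = html.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def run(port=8000):
    """Serve on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

GET serves the input form, and the form submits via POST. handle_form is kept separate from the handler so it can be tested without starting a server. Call run() to serve on port 8000.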

THANK YOU
“It always seems impossible until it’s done.”
- Nelson Mandela

Feature Definition
V1 Customer's ID
V2 First EMI Bounce (0: No, 1: Yes) (existing loan)
V3 Number of bounces in last 3 months outside TVS Credit
V4 MOB (Month of business with TVS Credit)
V5 Number of bounces with TVS Credit
V6 Cost of Asset (existing loan)
V7 Total down payment of existing loan
V8 EMI of existing loan
V9 Total Loan amount of existing loan
V10 Tenure of existing loan
V11 Customer's Geographical Area Code
V12 Customer's TW Dealer's Code
V13 Customer's TW Model’s Code
V14 Rate of interest for existing loan
V15 Gender
V16 Employment type of customer (SAL: Salaried, SELF: Self-employed, HOUSEWIFE, PENS: Pensioner, STUDENT)
V17 Pin code
V18 Date of Birth
V19 Number of Live loans
V20 Number of Two-Wheeler loans
V21 Maximum sanction amount of Live Loans
V22 Number of new loans taken in last 3 months
V23 Number of closed loans
V24 Number of enquiries
V25 Number of times defaulted in last 3 months
V27 Number of times defaulted in last 12 months
V28 Maximum loan amount sanctioned for any Gold loan
V29 Maximum loan amount sanctioned for any personal loan
V30 Target variable (1: Bad Customer / 0: Good Customer)
Assumptions:
• Due to lack of information, the complete EMI duration has been taken, irrespective of the point at which a customer defaults
• This conservative approach should be offset by the depreciation of assets
• Average loan amount and average tenure are used for the calculations
Data Dictionary
Python Code
