Delve into the realm of predictive modeling for loan approval. Learn how data science is revolutionizing the lending industry, making the loan approval process faster, more accurate, and fairer. Discover the key factors that influence loan decisions and how predictive modeling is shaping the future of lending.
Decoding Loan Approval: Predictive Modeling in Action
1. Thera Bank has a growing customer base. The bank wants to expand its borrower (asset
customer) base to bring in more loan business and earn more through interest on loans.
So the bank wants to convert its liability customers into personal loan customers.
A campaign that the bank ran last year for liability customers showed a healthy conversion
rate of over 9%. The retail marketing department is now developing campaigns with
better target marketing to increase the success rate on a minimal budget.
CAPSTONE PROJECT
Topic : Predictive Modelling for Loan Approval
Overview of the Project:
2. Problem statement:
Source: Kaggle
The classification goal is to predict the likelihood of a liability customer buying a personal loan.
In this project, I have built a predictive model to predict whether a customer will buy a personal loan or not.
3. The models which are used are as follows:
The bank wants to set up a new marketing campaign; hence, it needs information about
the relationships between the different attributes given in the data.
I have implemented four different classification models; most of
them achieved accuracies around 95%.
How my model will help :
Models:
Logistic Regression Classifier
Complement Naïve Bayes
K-Nearest Neighbors (KNN)
AdaBoosting (Adaptive Boosting)
I have used the confusion matrix to understand and compare model performance.
4. Steps to follow:
• Loading, Understanding and Cleaning the Data
• EDA (Exploratory Data Analysis)
• Preprocessing
• Train Test Split
• Algorithm
• Evaluation
5. Loading, Understanding and Cleaning the data
1. Libraries imported:
Numpy
Pandas
Matplotlib
Seaborn
Plotly Express
2. Loaded the Data:
3. Understanding the data:
df.shape = There are 5,000 rows and 14 columns.
df.describe().transpose() = describes the statistical
details of each column.
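The loading and inspection steps above can be sketched as follows. The original Kaggle CSV is not reproduced here, so this sketch synthesizes a small frame with the same 14 columns (column names assumed from the dataset description) in place of the `pd.read_csv` call.

```python
import numpy as np
import pandas as pd

# Stand-in for loading the Kaggle file, e.g. df = pd.read_csv("loan.csv");
# here we synthesize 100 rows with the dataset's 14 columns for illustration.
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "ID": np.arange(1, n + 1),
    "Age": rng.integers(23, 67, n),
    "Experience": rng.integers(-3, 43, n),   # raw data contains some negatives
    "Income": rng.integers(8, 225, n),
    "ZIP Code": rng.integers(90000, 96700, n),
    "Family": rng.integers(1, 5, n),
    "CCAvg": rng.uniform(0, 10, n).round(2),
    "Education": rng.integers(1, 4, n),
    "Mortgage": rng.integers(0, 600, n),
    "Personal Loan": rng.integers(0, 2, n),
    "Securities Account": rng.integers(0, 2, n),
    "CD Account": rng.integers(0, 2, n),
    "Online": rng.integers(0, 2, n),
    "CreditCard": rng.integers(0, 2, n),
})

print(df.shape)                    # (rows, 14) — note shape is an attribute, not a call
print(df.describe().transpose())  # per-column summary statistics
```

On the real file, `df.shape` would report (5000, 14) rather than the toy row count used here.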
6. df.duplicated().sum() = There are no duplicate rows.
df.isnull().sum() = There are no null values.
negative_counts = (df < 0).sum() = There are 52 negative values in the Experience column, so they are converted
to positive.
df.info() = To get information about the data types.
df.drop() = ID and ZIP Code are redundant
columns which do not hold predictive power, so they
can be dropped.
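The cleaning checks above can be sketched in a few lines. This is a minimal illustration on a toy frame (the real data has no duplicates or nulls, but 52 negative Experience values); column names follow the dataset.

```python
import pandas as pd

# Toy frame standing in for the bank data: one duplicated row and one
# impossible negative Experience value, to exercise each cleaning step.
df = pd.DataFrame({
    "ID": [1, 2, 3, 3],
    "ZIP Code": [91107, 90089, 94720, 94720],
    "Experience": [10, -3, 5, 5],
    "Income": [49, 34, 11, 11],
    "Personal Loan": [0, 0, 1, 1],
})

print(df.duplicated().sum())                    # fully duplicated rows
print(df.isnull().sum())                        # nulls per column
print((df.select_dtypes("number") < 0).sum())   # negative values per column

df = df.drop_duplicates()
df["Experience"] = df["Experience"].abs()       # convert negatives to positive
df = df.drop(columns=["ID", "ZIP Code"])        # redundant, no predictive power
print(df.shape)
```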
7. • EDA (Exploratory Data Analysis):
Detecting Outliers:
The boxplots above show some outliers in the Income, CCAvg and Mortgage features.
However, we have no strong reason to exclude them from the dataset,
so we do not remove them.
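Beyond the boxplots, the outlier counts can be quantified with the standard 1.5×IQR rule. This sketch uses made-up values for the three columns named above; the helper function name is my own.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the three skewed columns from the slides.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "Income": np.append(rng.normal(70, 20, 97), [220, 250, 300]),
    "CCAvg": np.append(rng.normal(2, 0.8, 98), [9.5, 10.0]),
    "Mortgage": rng.exponential(60, 100),
})

def iqr_outliers(s: pd.Series) -> int:
    """Count points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    return int(((s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)).sum())

for col in df.columns:
    print(col, iqr_outliers(df[col]))
# The flagged points are plausible high earners/spenders, so we keep them.
```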
8. Family:
The count of Families per personal loan shows that families with 4 people have the
highest frequency of accepting the personal loan and families with 2 people have the
lowest frequency of accepting the personal loan.
Education:
The count of Education per personal loan shows that customers
with (Advanced / Professional) education levels have the highest frequency of
accepting personal loans and customers with (Undergrad) education levels
have the lowest frequency of accepting personal loans.
Securities Accounts:
The count of Securities Accounts per personal loan shows that customers
who don't have a securities account with the bank accept personal loans
more frequently than those who do.
9. CD Accounts:
The count of CD Accounts per personal loan shows that customers who don't have a
certificate of deposit (CD) account with the bank accept personal loans more
frequently than those who do.
Online:
The count of Online per personal loan shows that customers who use internet
banking facilities accept personal loans more frequently than
those who don't.
Credit card:
The count of Credit card per personal loan shows that customers who don't use a
credit card issued by the bank accept personal loans more frequently
than those who do.
10. The distributions of Age and Experience are approximately normal
(no skewness) for both customers who accept and who do not
accept the personal loan.
The distribution of Income is positively skewed for customers
who don't accept the personal loan and approximately normal for
those who do.
The distribution of Mortgage is positively skewed for
both customers who accept and who do not accept the personal loan.
11. Personal Loan (the target) is positively correlated with
Income, CCAvg and CD Account, but shows almost no correlation
with Experience and Age.
There is a moderate correlation between CCAvg and
Income.
And there is a strong correlation between Age and
Experience.
Multivariate Analysis: to check the correlations of the attributes with the target variable.
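The correlation pattern described above can be sketched numerically. This uses synthetic columns engineered to mimic the stated findings (Age/Experience nearly collinear, Income/CCAvg moderately related, Income driving the target); the real matrix comes from the Kaggle data.

```python
import numpy as np
import pandas as pd

# Synthetic columns shaped like the slide's findings.
rng = np.random.default_rng(0)
n = 500
age = rng.integers(23, 67, n).astype(float)
experience = age - 22 + rng.normal(0, 1, n)      # strong correlation with Age
income = rng.gamma(4, 18, n)
ccavg = income * 0.02 + rng.normal(0, 1.0, n)    # moderate correlation with Income
loan = (income + rng.normal(0, 40, n) > 110).astype(int)

df = pd.DataFrame({"Age": age, "Experience": experience,
                   "Income": income, "CCAvg": ccavg, "Personal Loan": loan})

corr = df.corr()
print(corr["Personal Loan"].sort_values(ascending=False))
# A heatmap view, e.g. sns.heatmap(corr, annot=True), visualizes the same table.
```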
12. Model Building
Libraries imported for building models:
• Separating the data into X and y.
• Divided the data into X_train, X_test, y_train and y_test to fit the model and also test the model on test
data.
Logistic Regression
13. Made predictions on the test data and calculated the model
accuracy score, which is 95%.
Data standardization using StandardScaler and MinMaxScaler.
Training the Logistic Regression model.
The confusion matrix has provided insights into the model's performance and
helped me evaluate various metrics such as accuracy, precision, recall, and
F1 score, which are given below.
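The logistic regression step above can be sketched end to end. Since the bank data isn't reproduced here, this uses a synthetic imbalanced dataset from `make_classification`; the accuracy printed is therefore illustrative, not the slide's 95%.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data standing in for the ~9% loan-acceptance class.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9, 0.1],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)   # fit on train only to avoid leakage

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

print("accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))  # precision, recall, F1 per class
```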
14. • Because the data is imbalanced, I have used the Complement Naive Bayes
Classifier, using the following steps:
• Transformed continuous features into categorical features
• Converted int64 columns into categorical type
• Used dummy encoding to convert categorical variables into binary variables.
Complement Naïve Bayes
• Separating the data into X and y.
• Divided the data into X_train, X_test, y_train and y_test to fit the model and also
test the model on test data.
15. Made predictions on the test data and calculated the model accuracy
score, which is around 87%.
Data standardization using MinMaxScaler.
Training the Complement Naïve Bayes model.
The confusion matrix has provided insights into the model's performance and helped
me evaluate various metrics such as accuracy, precision, recall, and F1 score,
which are given below.
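The Complement Naive Bayes pipeline above (bin continuous features, dummy-encode, fit) can be sketched as follows. The feature names and binning choices here are illustrative stand-ins, not the project's exact preprocessing.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import ComplementNB

# Synthetic continuous features and an imbalanced target.
rng = np.random.default_rng(0)
n = 2000
income = rng.gamma(4, 18, n)
ccavg = rng.exponential(2, n)
y = (income + 20 * rng.standard_normal(n) > 110).astype(int)

# Step 1: transform continuous features into categorical bins.
df = pd.DataFrame({
    "income_bin": pd.cut(income, bins=5, labels=False),
    "ccavg_bin": pd.cut(ccavg, bins=5, labels=False),
})
# Step 2: dummy-encode the categorical bins into binary indicator columns
# (ComplementNB requires non-negative features).
X = pd.get_dummies(df.astype("category"))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
model = ComplementNB().fit(X_train, y_train)
y_pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```

ComplementNB adapts standard multinomial NB weights using complement-class statistics, which tends to behave better on imbalanced targets like this one.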
16. K-Nearest Neighbors (KNN)
• Separating the data into X and y.
• Divided the data into X_train, X_test, y_train and y_test to fit the model and also test the model on test data.
Visualizing the training and test accuracy on a plot to find the best k for the KNN classifier.
Maximum neighbors = 50
The best accuracy is about 0.93 (93%) at the best value of k.
17. Made predictions on the test data and calculated the model accuracy
score, which is around 95%.
Data standardization using StandardScaler.
Training the K-Nearest Neighbors (KNN) model.
The confusion matrix has provided insights into the model's performance and
helped me evaluate various metrics such as accuracy, precision, recall, and
F1 score, which are given below.
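The best-k search described above can be sketched as a loop over k = 1..50. This runs on synthetic data (the real accuracies come from the bank dataset); in the notebook, the collected accuracies would also be plotted with matplotlib.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1500, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# KNN is distance-based, so standardize features first (fit on train only).
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Try every k up to the maximum of 50 neighbors and record test accuracy.
test_acc = []
for k in range(1, 51):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    test_acc.append(accuracy_score(y_test, knn.predict(X_test)))

best_k = int(np.argmax(test_acc)) + 1
print("best k:", best_k, "test accuracy:", max(test_acc))
```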
18. AdaBoost (Adaptive Boosting)
• Separating the data into X and y.
• Divided the data into X_train, X_test, y_train and y_test to fit the model and also test the
model on test data.
• Created a new dataframe.
19. Made predictions on the test data and calculated the model accuracy
score, which is around 98%.
Data standardization using MinMaxScaler.
Training the AdaBoost model.
The confusion matrix has provided insights into the model's performance and
helped me evaluate various metrics such as accuracy, precision, recall, and
F1 score, which are given below.
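The AdaBoost step above can be sketched as follows, again on synthetic data. MinMax scaling is kept to mirror the slides, though tree-based boosting doesn't strictly require it.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9, 0.1],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# AdaBoost fits a sequence of weak learners (decision stumps by default),
# reweighting misclassified samples at each round.
model = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)

print("accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```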
20. Conclusion
Comparing and concatenating the performance metrics of the different classification models, shown above in
tabular format.
After analysing and calculating the performance of the different classification models, it has been observed that
the AdaBoost Classifier gives the best result. Hence, we can use the AdaBoost Classifier as our model for
determining the likelihood of a customer buying a personal loan from the bank.
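The comparison table described in the conclusion can be built by collecting each model's metrics into one DataFrame. This sketch fits three of the models on shared synthetic data, so the numbers illustrate the mechanism rather than reproduce the slide's results.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9, 0.1],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
scaler = StandardScaler()
X_train, X_test = scaler.fit_transform(X_train), scaler.transform(X_test)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

# One row of metrics per model, concatenated into a single comparison table.
rows = []
for name, m in models.items():
    y_pred = m.fit(X_train, y_train).predict(X_test)
    rows.append({"Model": name,
                 "Accuracy": accuracy_score(y_test, y_pred),
                 "Precision": precision_score(y_test, y_pred),
                 "Recall": recall_score(y_test, y_pred),
                 "F1": f1_score(y_test, y_pred)})

comparison = pd.DataFrame(rows).set_index("Model").round(3)
print(comparison)
```

Sorting this table by F1 (not just accuracy) is a sensible final check given the class imbalance.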