Bank Loan Approval Analysis: A Comprehensive Data Analysis Project

Project Title: Bank Loan Approval Analysis
Presented by: Shiva G Waghe

Project Contents
1. Introduction
2. Library Import
3. Loading Data
4. Data Exploration (EDA)
5. Data Cleaning
6. Data Visualization
7. Data Preprocessing
8. Train and Test Split
9. Model Building and Evaluation
10. Model Comparison
11. Power BI Dashboard
12. Observation

Introduction to Bank Loan Approval Analysis
Finance companies deal in various kinds of home loans. They
may have a presence across urban, semi-urban, and rural areas.
A customer first applies for a home loan, and the company then validates
the customer's eligibility for the loan.
Companies typically want to automate the loan eligibility process
(in real time) based on the details a customer provides while filling in the
online application form. These details include Gender, Marital Status, Education,
Number of Dependents, Income, Loan Amount, Credit History, and
others. To automate this process, a data set has been provided to identify
the customer segments that are eligible for a loan amount, so that the company
can specifically target these customers.

Library Import
• Import the libraries required for data processing and visualization.
• Read the CSV file from the provided directory and assign it to the pandas DataFrame
'data'.
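
The loading step could look roughly like the sketch below. The slides do not show the file path or the exact column names, so the example reads a small in-memory CSV instead; every column name and value here is an illustrative placeholder, not the project's data.

```python
import io

import pandas as pd

# Stand-in for the real CSV file. The actual path and columns are not shown
# in the slides, so these rows are made up for illustration only.
csv_text = """Loan_ID,Gender,Education,ApplicantIncome,Loan_Status
LP001,Male,Graduate,5849,Y
LP002,Female,Not Graduate,3000,N
LP003,Male,Graduate,4583,Y
"""

# With a file on disk, a path string would be passed to pd.read_csv instead.
data = pd.read_csv(io.StringIO(csv_text))
print(data)
```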

• data.shape: this DataFrame attribute returns a tuple (rows, columns)
describing its dimensionality.
• data.isnull().sum() returns the count of null values in
each column of the DataFrame.
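
As a small illustration of these two calls, the sketch below uses a toy DataFrame. The column names and values are assumptions for demonstration and do not come from the project's data set.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values; column names are assumptions.
data = pd.DataFrame({
    "Gender": ["Male", "Female", None, "Male"],
    "LoanAmount": [128.0, np.nan, 66.0, np.nan],
    "Loan_Status": ["Y", "N", "Y", "Y"],
})

# shape is an attribute (no parentheses) holding (rows, columns).
n_rows, n_cols = data.shape

# isnull() marks missing cells; sum() counts them per column.
null_counts = data.isnull().sum()

print(n_rows, n_cols)
print(null_counts)
```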
EDA

• The head() function displays the first five rows of a DataFrame by default, providing a quick
overview of its structure and content.
• The tail() function shows the last five rows of a DataFrame by default.

• data.duplicated().sum() returns the total number of duplicate rows in the data set. There
are no duplicate rows in this data set.
• The unique() function returns the unique values of a single column, while data.nunique()
returns a Series with the number of unique values in each column.
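
The exploration calls above can be sketched on a toy DataFrame as follows. The columns and rows are invented for illustration; unlike the project's data set, this toy frame deliberately contains repeated rows so that the duplicate count is not zero.

```python
import pandas as pd

# Made-up rows; column names are assumptions, not the project's schema.
data = pd.DataFrame({
    "Property_Area": ["Urban", "Rural", "Semiurban", "Urban",
                      "Rural", "Urban", "Semiurban"],
    "Loan_Status": ["Y", "N", "Y", "Y", "N", "Y", "N"],
})

first_rows = data.head()       # first five rows by default
last_rows = data.tail(2)       # last two rows

# duplicated() flags each row that repeats an earlier row.
n_duplicates = int(data.duplicated().sum())

# unique() works on one column; nunique() summarizes every column.
areas = sorted(data["Property_Area"].unique())
distinct_counts = data.nunique()

print(first_rows)
print(last_rows)
print(n_duplicates, areas)
print(distinct_counts)
```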

Data Cleaning
• For better readability, we convert Y to Yes and N to No in the
Loan Status column using the replace() function.
• We fill null values using the fillna() function.
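
A minimal sketch of these cleaning steps is shown below. The slides do not state which fill values were used, so filling a categorical column with its mode and a numeric column with its median is an assumption, as are the column names and data.

```python
import numpy as np
import pandas as pd

# Toy data with made-up values; column names are assumptions.
data = pd.DataFrame({
    "Gender": ["Male", None, "Female", "Male"],
    "LoanAmount": [120.0, np.nan, 100.0, 140.0],
    "Loan_Status": ["Y", "N", "Y", "N"],
})

# Map the short codes to readable labels.
data["Loan_Status"] = data["Loan_Status"].replace({"Y": "Yes", "N": "No"})

# Assumed strategy: most frequent value for categorical columns,
# median for numeric columns.
data["Gender"] = data["Gender"].fillna(data["Gender"].mode()[0])
data["LoanAmount"] = data["LoanAmount"].fillna(data["LoanAmount"].median())

print(data)
```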

Data Preprocessing
• The Loan ID column is only an identifier and is not important for our analysis, so we drop it.
• Most machine learning models cannot work with categorical values directly, so we convert
the categorical columns into numerical form.

• We split the data into independent features (x) and the dependent feature (y).
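
The preprocessing steps above might be sketched as follows. The slides do not say which encoding technique was used, so the explicit dictionary mapping here is an assumption (a label encoder or one-hot encoding would be alternatives), and the column names and values are illustrative.

```python
import pandas as pd

# Toy data with made-up values; column names are assumptions.
data = pd.DataFrame({
    "Loan_ID": ["LP001", "LP002", "LP003", "LP004"],
    "Gender": ["Male", "Female", "Male", "Female"],
    "Education": ["Graduate", "Not Graduate", "Graduate", "Graduate"],
    "ApplicantIncome": [5849, 3000, 4583, 2583],
    "Loan_Status": ["Yes", "No", "Yes", "Yes"],
})

# Drop the identifier column.
data = data.drop(columns=["Loan_ID"])

# Assumed encoding: explicit mappings from category to integer.
data["Gender"] = data["Gender"].map({"Male": 1, "Female": 0})
data["Education"] = data["Education"].map({"Graduate": 1, "Not Graduate": 0})
data["Loan_Status"] = data["Loan_Status"].map({"Yes": 1, "No": 0})

# Independent features (x) and dependent feature (y).
x = data.drop(columns=["Loan_Status"])
y = data["Loan_Status"]

print(x)
print(y)
```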

Splitting data into Training and Testing
• This code randomly splits the dataset x (features) and y (labels) into two separate sets: the
training set (x_train and y_train) and the testing set (x_test and y_test). The split is done with
a test size of 0.3, meaning that 30% of the data is allocated for testing, while the
remaining 70% is used for training. The random_state parameter is set to 0 to
ensure the split is reproducible.
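
The split described here matches scikit-learn's train_test_split. The sketch below applies it with the stated parameters (test_size=0.3, random_state=0) to a small synthetic array that stands in for the project's x and y.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the project's features and labels.
x = np.arange(20).reshape(10, 2)
y = np.array([0, 1, 0, 1, 1, 0, 1, 1, 0, 1])

# 30% of the rows go to the test set, 70% to the training set.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0
)

# Repeating the call with the same random_state gives the same split.
x_train_again, x_test_again, _, _ = train_test_split(
    x, y, test_size=0.3, random_state=0
)

print(x_train.shape, x_test.shape)
print(np.array_equal(x_test, x_test_again))
```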

Model Building and Evaluation
Models used:
1. Logistic Regression: Logistic regression is often used for binary classification
problems. For this dataset, it can be used to predict a binary outcome, such as
whether a loan is approved or not.
2. Support Vector Classifier: SVC (Support Vector Classification), the classification
variant of the SVM (Support Vector Machine) model, is used on this dataset to perform
the classification task. SVC is useful for both binary and multiclass
classification. For this dataset, we can use SVC to predict a categorical result,
such as whether a customer's loan was approved or not.
3. K-Nearest Neighbors (KNN): KNN is a simple, instance-based learning method
used for classification and regression. It categorizes a data point according to how its
neighbors are classified. In classification, the data point is assigned to the class that is
most common among its k nearest neighbors.
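
A hedged sketch of fitting and scoring the three models with scikit-learn follows. It runs on synthetic data from make_classification, not the loan data set, and the hyperparameters (max_iter=1000, n_neighbors=5, default SVC settings) are assumptions, so the accuracies it prints are unrelated to the figures reported in these slides.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic binary classification data standing in for the loan data.
x, y = make_classification(n_samples=200, n_features=6, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0
)

# Hyperparameters are assumptions; the slides do not list them.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVC": SVC(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

# Fit each model and record accuracy on both the training and test sets.
scores = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    scores[name] = {
        "train": accuracy_score(y_train, model.predict(x_train)),
        "test": accuracy_score(y_test, model.predict(x_test)),
    }
    print(name, scores[name])
```

Comparing the train and test accuracy side by side is one simple way to spot overfitting before choosing a model.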

Model Comparison
Selection of Model:
• After evaluating three models (Logistic Regression, SVC, and KNN), it is
clear that Logistic Regression outperforms the others, with an accuracy score of 79%
on the training set and 82% on the test set.

Observation
• The majority of customers (68.7%) get their loan approved (Yes).
• Educated applicants are more likely to get their loans approved.
• The majority of customers whose loans are approved are located
in semi-urban areas.
• Married people take more loans than unmarried
people.
• The majority of graduates come from semi-urban areas.
• People whose salary is above 5446 have a strong
chance of getting a loan approved.
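
Observations of this kind can be derived with simple pandas aggregations. The sketch below shows one possible way, using a toy DataFrame with invented rows and assumed column names, so its numbers do not reproduce the percentages above.

```python
import pandas as pd

# Made-up rows; column names are assumptions, not the project's schema.
data = pd.DataFrame({
    "Education": ["Graduate", "Graduate", "Not Graduate",
                  "Graduate", "Not Graduate", "Graduate"],
    "Property_Area": ["Semiurban", "Urban", "Rural",
                      "Semiurban", "Urban", "Rural"],
    "Loan_Status": ["Yes", "Yes", "No", "Yes", "No", "No"],
})

# Overall share of approved and rejected loans, in percent.
approval_share = data["Loan_Status"].value_counts(normalize=True) * 100

# Approval rate within each education group, in percent.
approval_by_education = pd.crosstab(
    data["Education"], data["Loan_Status"], normalize="index"
) * 100

# Where the approved customers are located.
approved_by_area = data.loc[
    data["Loan_Status"] == "Yes", "Property_Area"
].value_counts()

print(approval_share)
print(approval_by_education)
print(approved_by_area)
```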
