Project crm submission sonali

Predicting Customer Churn in Banking Industry
Sonali Gupta
X01527245
MSc in Data Analytics
National College of Ireland
Abstract-- The aim of this project is to predict customer churn in the banking industry using different data mining techniques.
Data mining turns large sets of data into useful information by means of different algorithms. It also helps to explain
banking problems by finding relations, correlations and causality in corporate data that are not visible because they are
concealed in a large amount of data. In this paper, we use data mining techniques such as Logistic Regression, Decision Tree,
Support Vector Machine (SVM), K-Nearest Neighbours (KNN) and Artificial Neural Network (ANN). We also compare
the accuracy of the models to show their performance.
Keywords: Data mining, Support vector machine, Logistic regression, Artificial neural network.
I. INTRODUCTION
Customer churn is the tendency of customers to stop doing
business with an organization within a given period of time,
and it is a critical concern for every company. Many
researchers have studied the problem from their own
perspectives in search of ways to retain churners. Many banks
face a churn problem. Every type of churn leads to losses, and
the loss of loyal, high-value customers is especially damaging
for an organization.
Customers have always been a significant part of the growth of
any business. With intense competition in every market, it is
critical to retain loyal, long-term customers [1]. Customer
churn is a key factor in the success or failure of any
industry. Consequently, banks now direct retention efforts at
the customers who are most valuable to the company in order to
prevent churn.
Customer churn prediction is an important tool for identifying
the customers who are most likely to leave, and data mining
plays a crucial role in it. Techniques such as Logistic
Regression (LR), Decision Tree (DT), K-Nearest Neighbor (KNN),
Support Vector Machine (SVM) and Artificial Neural Network
(ANN) can be used to predict churners.
II. TYPES OF CHURNERS
Churners fall into two categories: voluntary and involuntary.
Voluntary churn occurs when a customer decides to stop doing
business with the company, and it is further divided into
deliberate and incidental churn. Incidental churn occurs when
changes in customers' own lives, such as a change in financial
condition or a change of location, cause them to leave.
Deliberate churn occurs when a customer chooses to leave, for
example in search of a new service or better quality, or
because of social factors. Involuntary churn is the easiest to
identify: it occurs when the organization decides to remove
customers, for example for fraud or non-payment.
Fig-1 Churn Type
III. PROPOSED MODEL
The proposed model consists of five steps: first, identify the
problem; second, select the required dataset; third,
investigate the dataset; fourth, apply the data mining
techniques; and fifth, evaluate and interpret the results.
Fig-2 Proposed model
IV. HYPOTHESIS
Which customers are at higher risk of leaving the bank?
V. BACKGROUND OF THE DATASET
The dataset describes customers of a large international bank
and was taken from the Kaggle website. It contains 10,000
records and 13 attributes of the customers.
The attributes of this dataset are explained below:
1.CustomerID: The unique ID assigned to the customer by the
bank.
2.Surname: The customer’s surname, used to identify the
customer.
3.CreditScore: A number that reflects the likelihood that the
customer will repay debt. Lenders such as banks and credit
card companies examine credit history to calculate the credit
score, which shows them the level of risk.
4.Geography: Location of the bank (France, Spain, Germany).
5.Gender: Male or Female.
6.Age: Age of the customer.
7.Tenure: Number of years the customer has been with the bank.
8.Balance: The customer’s account balance.
9.NoOfProducts: The number of bank products the customer
holds.
10.HasCrCard: Whether the customer has a credit card.
11.IsActiveMember: Whether the customer is an active member
of the bank.
12.EstimatedSalary: The customer’s estimated salary.
13.Exited: Whether the customer has left the bank.
VI. DATA PREPARATION DETAILS
First, we checked the dataset for missing values and found
none.
Second, we checked which columns were useful. CustomerID and
Surname were not useful for prediction, so we excluded these
columns from the dataset using R.
Third, Geography and Gender were stored as character values
and were converted to numeric values.
Last, we chose the outcome (dependent) variable that answers
the hypothesis. The Exited attribute was selected as the
dependent variable, where “0” denotes “Non Exit” and “1”
denotes “Exit”. All columns were then normalized, and the
dataset was ready for applying the techniques for predicting
customer churn.
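The preparation steps above can be sketched as follows. This is a minimal Python illustration, not the R code used in the project: the sample rows are invented, the integer codes assigned to Geography and Gender are an assumption, and min-max scaling is assumed because the text does not say which normalization was applied. The 80/20 split matches the ratio used in the methodology sections.

```python
import random

# Invented sample rows; CustomerID and Surname are already dropped.
COLUMNS = ["CreditScore", "Geography", "Gender", "Age", "Exited"]
RAW = [
    (610, "France", "Female", 41, 1),
    (595, "Spain", "Female", 40, 0),
    (510, "France", "Female", 45, 1),
    (700, "France", "Female", 38, 0),
    (845, "Spain", "Female", 46, 0),
    (650, "Spain", "Male", 47, 1),
    (820, "France", "Male", 52, 0),
    (380, "Germany", "Female", 30, 1),
    (505, "France", "Male", 43, 0),
    (690, "France", "Male", 28, 0),
]
rows = [dict(zip(COLUMNS, r)) for r in RAW]


def encode(rows, column):
    """Replace each distinct string in `column` with an integer code."""
    codes = {v: i for i, v in enumerate(sorted({r[column] for r in rows}))}
    for r in rows:
        r[column] = codes[r[column]]
    return codes


def min_max_normalize(rows, column):
    """Rescale `column` in place to the range [0, 1] (assumed method)."""
    values = [r[column] for r in rows]
    lo, hi = min(values), max(values)
    for r in rows:
        r[column] = (r[column] - lo) / (hi - lo) if hi > lo else 0.0


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut it into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


FEATURES = ("CreditScore", "Geography", "Gender", "Age")
geography_codes = encode(rows, "Geography")
gender_codes = encode(rows, "Gender")
for col in FEATURES:
    min_max_normalize(rows, col)
train, test = train_test_split(rows)
```

Integer codes impose an artificial order on Geography; one-hot encoding is a common alternative when that order could mislead a distance-based method such as KNN.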
VII. RELATED WORK
Vafeiadis et al. [2] predicted customer churn in the telecom
industry using cross-validation and compared the accuracy of
boosted versions of the methods with non-boosted versions.
Semrl et al. [3] proposed churn prediction to improve gym
member retention using Logistic Regression and Neural Networks.
Shaaban et al. [4] proposed a churn prediction model using
SVM and clustering with the WEKA software. Oyeniyi et al. [5]
proposed churn prediction in the banking sector using the
k-means clustering algorithm to determine patterns and develop
customer retention services. Zoric et al. [6] presented a case
study of churn analysis in the banking industry using a neural
network built with Alyuda NeuroIntelligence and concluded that
customers who use more services are less likely to leave,
while clients who use fewer services are more likely to leave
the bank.
VIII. METHODOLOGIES
Data mining plays a significant role in every Customer
Relationship Management (CRM) framework: it makes it easier to
detect customer behavior, to build and evaluate answers to
business problems, and to reduce the churn rate in the banking
industry.
In this project, five different techniques are used to predict
churn, and their accuracy is compared to determine the
best-fitting model. The main aim is to predict which customers
are likely to leave the bank based on the diverse attributes
of the dataset.
A. DECISION TREE
The decision tree is a supervised learning technique with a
tree-like structure consisting of a root and nodes. Its output
is easy to understand, and it is commonly used in CRM-related
problems.
Here, all attributes are used as predictors of the Exited
attribute, so the tree shows how the response attribute
(Exited) depends on the independent attributes. 80% of the
data is used as the training set and 20% as the test set. The
code for the decision tree is shown in Fig-3.
Fig-3
From Fig-4, the accuracy on the test data is 87%. The
confusion matrix also shows that 160 customers who exited and
1575 customers who did not exit were correctly predicted.
Fig- 4
From the decision tree in Fig-5, we can understand the following:
1.If the customer’s age is less than or equal to the age
threshold, there is a 71% chance of exit; if these customers
have more than 2.5 products, there is a 2% chance of exit, and
if they have fewer than 2.5 products, there is a 69% chance
that they will not exit the bank.
2.If the customer’s age is greater than 42, there is a 29%
chance that the customer will not exit; if the customer is an
active member, there is a 13% chance of exit, and if the
customer is not active, there is a 16% chance of staying.
Fig- 5 Decision Tree
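A split value such as 2.5 products can look odd for an integer attribute. Tree learners commonly place a numeric split midway between two adjacent observed values and keep the split that leaves the purest child nodes. The Python sketch below illustrates that idea with Gini impurity on invented toy data. The impurity measure and the data are assumptions; the project's own tree was built in R and its settings are not given in the text.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_threshold(values, labels):
    """Return (threshold, weighted Gini) of the best binary split on one feature."""
    distinct = sorted(set(values))
    best = (None, float("inf"))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2  # candidate split midway between adjacent values
        left = [y for x, y in zip(values, labels) if x < t]
        right = [y for x, y in zip(values, labels) if x >= t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Invented toy data: number of products held and whether the customer exited.
products = [1, 1, 2, 2, 2, 3, 3, 4]
exited = [0, 0, 0, 0, 0, 1, 1, 1]
threshold, impurity = best_threshold(products, exited)
```

On this toy data the chosen threshold is 2.5, because splitting between 2 and 3 products separates the two classes completely.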
B. SVM (SUPPORT VECTOR MACHINE)
In this technique, all attributes are used as predictors of
the Exited attribute, with an 80:20 split of the data into
training and test sets. Different kernels, namely rbfdot,
laplacedot, besseldot and splinedot, were used to check the
accuracy of the model.
Fig-6
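The kernel names above match kernel functions in the R package kernlab. As a rough illustration of how two of them differ, the Python sketch below assumes kernlab's parameterization, in which the Gaussian RBF kernel is exp(-sigma * ||x - y||^2) and the Laplacian kernel is exp(-sigma * ||x - y||). The sigma value and the sample points are hypothetical.

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel: exp(-sigma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sigma * sq_dist)


def laplace_kernel(x, y, sigma=1.0):
    """Laplacian kernel: exp(-sigma * Euclidean distance)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return math.exp(-sigma * dist)


# Hypothetical points at Euclidean distance 5 from each other.
a = [0.0, 0.0]
b = [3.0, 4.0]
rbf_value = rbf_kernel(a, b, sigma=0.1)          # exp(-0.1 * 25)
laplace_value = laplace_kernel(a, b, sigma=0.1)  # exp(-0.1 * 5)
```

Both kernels equal 1 for identical points and decay with distance. The RBF kernel decays faster for distant points because it uses the squared distance.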
From Fig-7, the accuracy on the test data using the “rbfdot”
kernel is 85.1%; the model correctly predicted 162 customers
who exited and 1541 customers who did not exit.
Fig-7
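Accuracy is the number of correct predictions divided by the size of the test set. The short Python sketch below reproduces the SVM figure: the two correct counts come from the text above, and the two error counts (271 and 26) are taken from the SVM rows of the confusion matrix table. The function name is illustrative only.

```python
def accuracy(confusion):
    """Fraction of correct predictions in a square confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# SVM counts: the diagonal holds the correct predictions (1541 and 162),
# the off-diagonal cells hold the misclassified customers (271 and 26).
svm = [[1541, 271],
       [26, 162]]
svm_accuracy = accuracy(svm)  # 1703 correct out of 2000 test records
```

The result, 1703 / 2000 = 0.8515, is in line with the 85.1% reported for the rbfdot kernel.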
C. KNN (K-NEAREST NEIGHBORS)
In this technique, all independent attributes are compared
with the response variable (Exited). 80% of the data is used
as the training set and 20% as the test set, and two values of
the number of nearest neighbors, k=3 and k=9, are checked. The
code for this technique is shown in Fig-8.
Fig-8
From Fig-9, the accuracy on the test data when k=3 is 81.1%,
with 144 customers correctly predicted as Exit and 1478 as
Not-Exited. When k=9 the accuracy is 82.1%, with 98 customers
correctly predicted as likely to leave and 1544 predicted as
Not-Exited.
Fig-9
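The effect of k can be seen in a minimal Python sketch of the nearest-neighbor vote. This is an illustration under stated assumptions, not the R code of Fig-8: the points are invented, the features are assumed to be normalized already, and Euclidean distance is assumed.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Predict the majority label among the k training points nearest to query."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Invented, already-normalized points: five non-churners (0) in one corner
# and four churners (1) elsewhere.
train_X = [
    (0.10, 0.20), (0.20, 0.10), (0.15, 0.30), (0.25, 0.25), (0.30, 0.15),
    (0.60, 0.60), (0.65, 0.55), (0.55, 0.65), (0.90, 0.90),
]
train_y = [0, 0, 0, 0, 0, 1, 1, 1, 1]
query = (0.58, 0.58)

prediction_k3 = knn_predict(train_X, train_y, query, k=3)  # local vote
prediction_k9 = knn_predict(train_X, train_y, query, k=9)  # vote over all points
```

With k=3 only the three closest points vote and the query is labeled as a churner. With k=9 every point votes and the majority class wins, which is how a larger k can reduce the number of customers predicted to exit.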
D. ANN (ARTIFICIAL NEURAL NETWORK)
In this technique, all independent attributes are compared
with the response variable (Exited); 80% of the data is used
for training and 20% for testing. The code for this technique
is shown in Fig-10.
Fig-10
Fig-11
From Fig-11, the accuracy on the test data when hidden=1 is
83.2%; the model correctly predicted 107 customers as likely
to exit and 1544 customers as Not-Exited.
Fig-12 Neural Network
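As a rough picture of what a network with hidden=1 computes, the Python sketch below runs a forward pass through a single hidden unit. It assumes that hidden=1 denotes one hidden neuron and that both layers use a sigmoid activation; the weights and inputs are hypothetical, whereas a trained network would learn its weights from the data.

```python
import math


def sigmoid(z):
    """Logistic activation mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: inputs -> one sigmoid hidden unit -> sigmoid output."""
    h = sigmoid(sum(w * xi for w, xi in zip(w_hidden, x)) + b_hidden)
    return sigmoid(w_out * h + b_out)


# Hypothetical weights for two normalized inputs.
w_hidden, b_hidden = [2.0, -1.0], 0.0
w_out, b_out = 4.0, -2.0

p_exit = forward([1.0, 0.0], w_hidden, b_hidden, w_out, b_out)
predicted = 1 if p_exit >= 0.5 else 0  # 1 = Exit, 0 = Not-Exited
```

The output is read as the probability of exit and thresholded at 0.5 to obtain the class label.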
E. LOGISTIC REGRESSION
In this technique, we compared all the independent variables
with the dependent variable and divided the data into training
and test sets in an 80:20 ratio. The code for this technique
is shown in Fig-13.
Fig-13
Fig-14
From Fig-14, the accuracy on the test data is 83.3%; 154
customers were correctly predicted as Exit and 1512 customers
were predicted as Non-Exited.
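Logistic regression models the probability of exit as a sigmoid of a weighted sum of the attributes. The Python sketch below fits such a model by batch gradient descent on one invented, normalized feature. It illustrates the technique only: the data, learning rate, number of epochs and the 0.5 decision threshold are assumptions, and the project's own model was fitted in R.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 (Exit) if the modeled probability reaches the threshold."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1 if p >= threshold else 0


# Invented data: one normalized feature and the churn label.
X = [[0.1], [0.2], [0.3], [0.4], [0.6], [0.7], [0.8], [0.9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(X, y)
```

After fitting, predict(w, b, [0.1]) returns 0 and predict(w, b, [0.9]) returns 1, and the sign of each weight shows whether the attribute raises or lowers the modeled probability of exit.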
IX. INTERPRETATION OF RESULT
A. Confusion Matrix table result
B. Comparison of DT, SVM, KNN, LR, ANN techniques
Method                             Accuracy
Decision Tree                      86.4 %
SVM (Support Vector Machine)       85.7 %
KNN (K Nearest Neighbour)          82.1 %
Logistic Regression                83.3 %
ANN (Artificial Neural Network)    79.1 %
X. CUSTOMER RETENTION SOLUTION
In today’s banking industry, customer retention is a very
important task because the industry’s profit is based on
transaction volume rather than margin: there is little profit
margin on a single transaction in banking products. It is
therefore very important for banks to have a large number of
customers to work with and so increase their profit base. The
banking industry employs many customer retention techniques,
including the following:
1.Customizing products to customer needs and demand.
2.Extending credit to high-end customers as they require,
after analyzing their credit history and income.
3.Conducting surveys to set customer expectations.
4.Setting up an R&D division to look for solutions to the
problems of today and tomorrow.
5.Building relationships with customers based on trust and
understanding.
6.Being customer friendly and ready to go the extra mile
(within legal boundaries), because the banking industry runs
on customers who return to the bank.
Today the banking scenario is changing around the globe, and
so are customers and their needs. Earlier, banking was all
about deposits and withdrawals, but over time banks moved into
lending, insurance, currency exchange, business development,
investment and more.
Banks should therefore keep an eye out for any new sector
opening up in order to retain customers within their banking
system, because the market is very competitive and everyone is
fighting for a piece of the pie.
XI. CONCLUSION
In this customer churn prediction study, we compared the
accuracy of different supervised learning techniques: Decision
Tree, SVM, KNN, ANN and Logistic Regression.
The Decision Tree gave the best accuracy, which suggests that
it is the best of these models for predicting customer churn
in the banking industry.
REFERENCES
[1] Ahn, “Customer churn analysis: churn determinants and
mediation effects of partial defection in the Korean mobile
telecommunication service industry”.
[2] T. Vafeiadis, “A Comparison of machine learning
techniques for customer churn prediction”.
[3] Semrl, “Churn Prediction Model for Effective Gym
Customer Retention”.
[4] Shaaban, “A Proposed Churn Prediction Model”.
[5] Oyeniyi, “Customer Churn Analysis in Banking Sector
using Data Mining Techniques”.
[6] Zoric, “Predicting Customer Churn in Banking Industry
using Neural Network”.
Confusion matrix table result (Section IX.A)
Method                Actual Class   Prediction: Not Exit   Prediction: Exit
Decision Tree         Not Exit       1575                   222
                      Exit           43                     165
SVM                   Not Exit       1541                   271
                      Exit           26                     162
KNN                   Not Exit       1478                   132
                      Exit           292                    144
Logistic Regression   Not Exit       1512                   69
                      Exit           265                    154
ANN                   Not Exit       1557                   47
                      Exit           289                    107