Customer churn has become one of the major problems for financial organizations. Incessant market competition and the high cost of acquiring new customers have driven organizations to focus on more effective customer retention strategies.
A Review on Credit Card Default Modelling using Data Science - YogeshIJTSRD
In the last few years, credit cards have become one of the major consumer lending products in the U.S. as well as several other developed nations of the world, representing roughly 30% of total consumer lending (USD 3.6 tn in 2016). Credit cards issued by banks hold the majority of the market share, with approximately 70% of the total outstanding balance. Banks' credit card charge-offs have stabilized after the financial crisis at around 3% of the total outstanding balance. However, there are still differences in credit card charge-off levels between competitors. Harsh Nautiyal | Ayush Jyala | Dishank Bhandari "A Review on Credit Card Default Modelling using Data Science" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Special Issue | International Conference on Advances in Engineering, Science and Technology - 2021 , May 2021, URL: https://www.ijtsrd.com/papers/ijtsrd42461.pdf Paper URL : https://www.ijtsrd.com/engineering/computer-engineering/42461/a-review-on-credit-card-default-modelling-using-data-science/harsh-nautiyal
Predicting Credit Card Defaults using Machine Learning Algorithms - Sagar Tupkar
This is a project that I worked on as a Capstone for my Masters in Business Analytics program at the University of Cincinnati. In this project, I performed an end-to-end data mining exercise, including data cleaning, distribution analysis, exploratory data analysis and model building, to identify and predict credit card defaults using customers' data on past payments and general profile. While building the machine learning models, I fit and compared the performance of multiple models and algorithms such as Logistic Regression, PCA, Classification Tree, AdaBoost Classifier, ANN and LDA.
Machine Learning Project - Default credit card clients Vatsal N Shah
- The model built here uses all available factors to predict which customers will be defaulters and non‐defaulters next month.
- The goal is to find whether the clients are able to pay their next month's credit amount.
- Identify potential customers for the bank who can settle their credit balance.
- Determine whether customers can make their credit card payments on‐time.
- Default is the failure to pay interest or principal on a loan or credit card payment.
High level overview of Predictive Analytics techniques - Decision Trees, Regressions, Time Series Forecasting, Exponential Smoothing, etc.
Put together to train friends and mentees. Based on personal learnings and research, with no proprietary information and no claim of 100% accuracy. Every institution, organization and team uses its own steps and methodologies, so please use the ones relevant to you and treat this as training material only.
Default Probability Prediction using Artificial Neural Networks in R Programming - Vineet Ojha
The objective of the project is to analyze the ability of the Artificial Neural Network model
developed to forecast the credit risk profile of retail banking loan consumers and credit card
customers.
From a theoretical point of view, this project introduces a literature review on the detailed
working and the application of Artificial Neural Networks for credit risk management.
Practically, the aim of this project is to present a model for estimating the Probability of Default
using an Artificial Neural Network, to accrue the benefits of non-linear models.
CHURN ANALYSIS AND PLAN RECOMMENDATION FOR TELECOM OPERATORS - Journal For Research
With the increasing number of mobile operators, users are free to switch from one mobile operator to another if they are not satisfied with the service or pricing. This trend is bad for operators, as they lose revenue whenever a customer switches. To address it, operators are looking for machine learning tools that can predict well in advance which customers may churn, so that they can offer alternative plans to satisfy and retain them. In this paper, we design a hybrid machine learning classifier to predict whether a customer will churn based on the CDR parameters, and we also propose a rule engine to suggest the best plans.
This project aims at predicting defaulters of credit card payments. R is used for exploratory data analysis, and R and Azure ML are used for model building.
IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com
This was a part of smartcubes resolvr case study competition
Blue Delta Airways, a well-known budget airline company operating primarily in the US and Europe, has observed a decline in the number of flyers over the past year. In view of the declining customer base, the company has collected data on customer satisfaction – based on several personal and services-based attributes, such as in-flight services, cleanliness, and legroom. Blue Delta Airways has approached your team with a historical dataset of c.130,000 flyers and would like you to come up with a model to predict customer satisfaction and the parameters it is influenced by. To aid related decision-making, provide the client useful information and insights based on questions on the following slide
Predicting credit defaulters is a perilous task for financial institutions such as banks. Ascertaining non-payers
before granting a loan is a significant and conflict-ridden task for the banker. Classification techniques
are a good choice for predictive analysis such as determining whether a claimant is a genuine
customer or a cheat. Identifying the best classifier is a risky assignment for any industrialist like a
banker. This allows computer science researchers to carry out research by evaluating
different classifiers and finding the best classifier for such predictive problems. This research
work investigates the performance of the LADTree and REPTree classifiers for credit risk prediction
and compares their fitness through various measures. The German credit dataset was used
to predict credit risk with the help of an open source machine learning tool.
PROBABILISTIC CREDIT SCORING FOR COHORTS OF BORROWERS - Andresz26
This working paper discusses the level of credit risk. It recognizes that the risk of a group arises from the diversity of its members, proposes a methodology for measuring credit risk, and allows the population to be ranked by risk level. The methodology distinguishes between the different rankings of the population by risk level and takes into account, in the ranking, the risks of their preferences in the decisions they have made.
http://www.udla.edu.ec/
Customer churn classification using machine learning techniques - SindhujanDhayalan
Advanced data mining project on classifying customer churn
using machine learning algorithms such as random forest,
C5.0, decision tree, KNN, ANN, and SVM. The CRISP-DM approach was followed for developing the project. Accuracy rate, error rate, precision, recall, F1 and ROC curves were generated using R, and the most efficient model was found by comparing these values.
Data Mining on Customer Churn Classification - Kaushik Rajan
Implemented multiple classifiers to classify if a customer will leave or stay with the company based on multiple independent variables.
Tools used:
> RStudio for Exploratory data analysis, Data Pre-processing and building the models
> Tableau and RStudio for Visualization
> LATEX for documentation
Machine learning models used:
> Random Forest
> C5.0
> Decision tree
> Neural Network
> K-Nearest Neighbour
> Naive Bayes
> Support Vector Machine
Methodology: CRISP-DM
Customers often switch or unsubscribe (churn) from their telecom providers for a variety of reasons. These range from unsatisfactory service and better pricing from competitors to customers moving to different cities. Telecom companies are therefore interested in analyzing the patterns of customers who churn and in using the resulting analysis to determine which customers are more likely to unsubscribe in the future. One such company is Telco Systems, which is interested in identifying the precise patterns of its churning customers and has provided the customer data for this project.
Dive deep into the world of insurance churn prediction with this captivating data analysis project presented by Boston Institute of Analytics. Our talented students embark on a journey to unravel the mysteries behind customer churn in the insurance industry, leveraging advanced data analysis techniques to forecast and anticipate customer behavior. From analyzing historical data and customer demographics to identifying predictive indicators and developing churn prediction models, this project offers a comprehensive exploration of the factors influencing insurance churn dynamics. Gain valuable insights and actionable recommendations derived from rigorous data analysis, presented in an engaging and informative format. Don't miss this opportunity to delve into the fascinating realm of data analysis and unlock new perspectives on insurance churn prediction. Explore the project now and embark on a journey of discovery with Boston Institute of Analytics. To learn more about our data science and artificial intelligence programs, visit https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/.
Explore our students' cutting-edge project on predicting bank customer churn using advanced analytics techniques. This project employs machine learning algorithms to analyze customer data and forecast the likelihood of churn, offering valuable insights for financial institutions. Gain insights into customer retention strategies, predictive modeling, and the potential impact on banking operations. To learn more, do check out https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
The data set used in this project is available in the Kaggle and contains nineteen columns (independent variables) that indicate the characteristics of the clients of a fictional telecommunications corporation. The Churn column (response variable) indicates whether the customer departed within the last month or not. The class No includes the clients that did not leave the company last month, while the class YES contains the clients that decided to terminate their relations with the company. The objective of the analysis is to obtain the relation between the customer’s characteristics and the churn.
Dive into the intricate world of fraud detection with this comprehensive presentation featuring an unique student project. Explore the project's objectives, methodologies, and innovative solutions developed to combat fraudulent activities within financial transactions. From data analysis to model implementation, witness the journey our student has undertaken to create a robust fraud detection system. Whether you're a fellow student, industry professional, or enthusiast, this showcase provides valuable insights into the challenges and advancements in fraud detection technology. To learn more, do check out https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Predicting Customer Churn in Banking Industry
Sonali Gupta
X01527245
MSc in Data Analytics
National College of Ireland
Abstract-- The aim of this project is to predict customer churn in the banking industry using different data mining techniques.
Data mining turns large sets of data into useful information by applying different algorithms. It also helps to explain
banking problems by finding relations, correlations and causality in corporate data that are not visible because they are
concealed in a large amount of data. In this paper, we use different data mining techniques such as Logistic Regression,
Support Vector Machine (SVM), K-Nearest Neighbours (KNN) and Artificial Neural Network (ANN). We also compare
the accuracy of the models to show their performance.
Keywords: Data mining, Support vector machine, logistic regression, Artificial neural network.
I. INTRODUCTION
Customer churn is the tendency of customers to stop
doing business with an organization within a certain period of
time. It is a critical concern for every company, and many
researchers have approached the problem from their own
perspective to find solutions for churners. Many banks face a
churn problem; every type of churn leads to losses, and the
loss of loyal, high-value customers creates a problem for the
organization.
Customers are always a significant part of the growth of
any business. With intense competition in every market, it
is critical to retain loyal, long-term customers. [1]
Customer churn is a key factor in the success or failure of any
industry. Consequently, banks now direct retention efforts at
those customers who are valuable to the company in order to
prevent churn.
Customer churn prediction is an important tool for identifying
customers who are likely to leave, and data mining plays a
crucial role in it. Data mining techniques that can be used to
predict churners include Logistic Regression (LR), Decision
Tree (DT), K-Nearest Neighbor (KNN), Support Vector
Machine (SVM) and Artificial Neural Network (ANN).
II. TYPES OF CHURNERS
Churners fall into two categories: voluntary and involuntary.
Voluntary churn occurs when a customer decides to leave the
company, and it is divided into two types, deliberate and
incidental. Incidental churn happens when customers face
changes in their own lives, such as a change in financial
condition or a change of location. Deliberate churn occurs
when a customer chooses to leave, for example because the
customer wants a new service or better quality, or because of
social factors. Involuntary churn is the easiest to identify:
the organization decides to remove the customers, for example
because of fraud or non-payment.
Fig-1 Churn Type
III. PROPOSED MODEL
The proposed model consists of five steps: first, identify the
problem; second, select the required dataset; third, investigate
the dataset; fourth, apply the data mining techniques; and
fifth, interpret and evaluate the result.
Fig-2
IV. HYPOTHESIS
Which customers are at higher risk of leaving the bank?
V. BACKGROUND OF THE DATASET
This dataset concerns a large international bank and was
taken from the Kaggle website. It contains 10,000
records and 13 customer attributes.
The attributes of this dataset are explained below:
1. CustomerID: The unique ID of the customer, provided by
the bank.
[Fig-2 flowchart: Identify the Problem -> Data Selection -> Investigate Dataset -> Data Mining Techniques (KNN, ANN, Decision Tree, SVM) -> Interpret & Evaluate Result]
2. Surname: The surname of the customer, used to identify the
customer.
3. CreditScore: A credit score is a number that reflects the
likelihood that the customer will pay back. Lenders such as banks
and credit card companies look at credit history and calculate a
credit score, which shows them the level of risk.
4. Geography: Location of the bank (France, Spain, Germany).
5. Gender: Male or Female.
6. Age: Age of the customer.
7. Tenure: Number of years the customer has been with the bank.
8. Balance: The customer’s account balance.
9. NoOfProducts: The number of bank products the customer
holds.
10. HasCrCard: Whether the customer has a credit card.
11. IsActiveMember: Whether the customer is an active member
of the bank.
12. EstimatedSalary: The customer’s estimated salary.
13. Exited: Whether the customer has exited from the bank.
VI. DATA PREPARATION DETAILS
First, we checked the dataset for missing values and found
that there were none.
Second, we checked which columns were useful. Some columns,
such as CustomerID and Surname, were not, so we excluded
them from the dataset using R.
Third, Geography and Gender were stored as character values
and were converted into numeric values.
Last, we chose the outcome (dependent) variable that could
answer the hypothesis. The Exited attribute was selected as the
dependent variable, where “0” denotes “Non Exit” and “1”
denotes “Exit”. All the columns were then normalized. The
dataset was now ready for applying the techniques for
predicting customer churn.
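The preparation steps above can be illustrated with a minimal Python sketch. It is not the author's R code: the sample rows are made up, the integer coding of categories is one possible choice, and min-max scaling is assumed because the paper does not say which normalization was used.

```python
# Hypothetical sample rows using column names from Section V (values are made up).
rows = [
    {"CustomerID": 1, "Surname": "A", "Geography": "France", "Gender": "Female", "Age": 42, "Exited": 1},
    {"CustomerID": 2, "Surname": "B", "Geography": "Spain", "Gender": "Female", "Age": 41, "Exited": 0},
    {"CustomerID": 3, "Surname": "C", "Geography": "Germany", "Gender": "Male", "Age": 50, "Exited": 0},
]

def drop_columns(rows, columns):
    """Remove identifier columns that carry no predictive information."""
    return [{k: v for k, v in r.items() if k not in columns} for r in rows]

def encode_categories(rows, column):
    """Replace each distinct string in `column` with an integer code (sorted order)."""
    codes = {value: i for i, value in enumerate(sorted({r[column] for r in rows}))}
    return [{**r, column: codes[r[column]]} for r in rows]

def min_max_normalize(rows, column):
    """Rescale `column` to [0, 1] (assumed normalization); a constant column maps to 0.0."""
    values = [r[column] for r in rows]
    lo, hi = min(values), max(values)
    span = hi - lo
    return [{**r, column: (r[column] - lo) / span if span else 0.0} for r in rows]

prepared = drop_columns(rows, {"CustomerID", "Surname"})
for col in ("Geography", "Gender"):
    prepared = encode_categories(prepared, col)
for col in ("Geography", "Gender", "Age"):
    prepared = min_max_normalize(prepared, col)
```

Integer coding imposes an artificial order on Geography; one-hot encoding is a common alternative when that order matters to the model.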
VII. RELATED WORK
T. Vafeiadis et al. [2] predicted customer churn in the telecom
industry using cross-validation and compared the accuracy of
boosted and non-boosted versions of the methods. Semrl et
al. [3] proposed a churn prediction approach aimed at increasing
gym membership, using Logistic Regression and a neural network.
Shaaban et al. [4] proposed a churn prediction model using
SVM and clustering with the WEKA software. Oyeniyi et al. [5]
proposed churn prediction in the banking sector using the
k-means clustering algorithm to determine patterns and develop
a customer retention service. Zoric et al. [6] presented a case
study of churn analysis in the banking industry using a neural
network built with Alyuda NeuroIntelligence and
concluded that customers who use more services are less
likely to leave, while clients who use fewer services are more
likely to leave the bank.
VIII. METHODOLOGIES
Data mining plays a significant role in every Customer
Relationship Management (CRM) framework: it makes it easier
to detect customer behavior, to build and evaluate answers to
business problems, and to reduce the churn rate in the banking
industry.
In this project, five different techniques were used to
predict churn, and their accuracies were compared
to determine the best-fitting model. The main aim is to predict
which customers are likely to leave the bank, based on the
diverse attributes of the dataset.
A. DECISION TREE
The decision tree is a supervised learning technique with a
tree-like structure consisting of a root and nodes. Its output is
easy to understand, and it is commonly used in CRM-related
problems.
In this illustration, all the attributes are used with respect to
the Exited attribute, showing how the response attribute (Exited)
depends on the other, independent attributes. 80% of the
data was used as the training set and 20% as the test set. The
code for the decision tree is shown in Fig-3.
Fig-3
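The 80:20 split used for every model in this paper can be sketched as follows. This is an illustrative Python version, not the R code from the figure; the function name, the seed and the use of a random shuffle are assumptions.

```python
import random

def train_test_split(records, train_fraction=0.8, seed=42):
    """Shuffle a copy of `records` and cut it into training and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Stand-in for the 10,000 customer records described in Section V.
records = list(range(10_000))
train, test = train_test_split(records)
print(len(train), len(test))  # 8000 2000
```

Using the same seed for every technique would keep the test set identical across models, so their accuracies are compared on the same customers.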
From Fig-4, we can see that the accuracy on the test data is
87%. The confusion matrix also shows that 160 customers who
exited and 1575 customers who did not exit were predicted
correctly.
Fig-4
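As a quick check of how the reported accuracy follows from the confusion matrix counts, the arithmetic can be written out in Python. The test set size of 2,000 rows is an assumption derived from the 20% share of the 10,000 records; it is not stated next to the figure.

```python
def accuracy_from_counts(correct_exited, correct_not_exited, total):
    """Accuracy = correctly classified rows / all rows in the test set."""
    return (correct_exited + correct_not_exited) / total

# Assumption: the test set holds 20% of the 10,000 records, i.e. 2,000 rows.
TEST_ROWS = 2000

decision_tree_accuracy = accuracy_from_counts(160, 1575, TEST_ROWS)
print(f"{decision_tree_accuracy:.4f}")  # 0.8675, which rounds to the reported 87%
```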
From the decision tree in Fig-5, we can understand the following:
1. If the customer's age is less than or equal to 42, there is a
71% chance of exit; if these customers have more than 2.5
products, there is a 2% chance of exit, and if they have fewer
than 2.5 products, there is a 69% chance that they will not exit
from the bank.
2. If the customer's age is greater than 42, there is a 29% chance
that the customer will not exit; if the customer is an active
member, there is a 13% chance of exit, and if the customer is
not active, there is a 16% chance of staying.
Fig- 5 Decision Tree
B. SVM (SUPPORT VECTOR MACHINE)
In this technique, all attributes are used to predict the
Exited attribute, with the data split 80:20 into training and
testing sets. Different kernels, namely rbfdot, laplacedot,
besseldot and splinedot, are used to check the accuracy of
the model.
Fig-6
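To make the role of the kernel concrete, the sketch below
writes out two of the kernels named above. It assumes that
rbfdot and laplacedot refer to the kernel functions of the R
kernlab package, where they are defined as
exp(-sigma * ||x - y||^2) and exp(-sigma * ||x - y||). The
value of sigma and the example vectors are arbitrary
illustrations, not settings from this study.

```python
import math

def squared_distance(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel: exp(-sigma * ||x - y||^2)."""
    return math.exp(-sigma * squared_distance(x, y))

def laplace_kernel(x, y, sigma=1.0):
    """Laplacian kernel: exp(-sigma * ||x - y||)."""
    return math.exp(-sigma * math.sqrt(squared_distance(x, y)))

# Two hypothetical feature vectors at Euclidean distance 5.
a, b = [0.0, 0.0], [3.0, 4.0]
print(rbf_kernel(a, b, sigma=0.1))
print(laplace_kernel(a, b, sigma=0.1))
```

Both kernels equal 1 for identical points and decay towards 0
as the points move apart; the RBF kernel decays faster at
large distances because the distance is squared.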
From Fig-7, the accuracy on the test data using the “rbfdot”
kernel is 85.1%; the model correctly predicts 162 customers
who Exited and 1541 customers who did not exit.
Fig-7
C. KNN (K-NEAREST NEIGHBORS)
In this technique, all the independent attributes are used to
predict the response variable (Exited). 80% of the data is
used as the training set and 20% as the testing set, and two
numbers of nearest neighbors are tried, k=3 and k=9. The
code of this technique is given in Fig-8.
Fig-8
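The idea behind the technique can be sketched in a few lines:
a query point receives the majority label of its k closest
training points. The sketch assumes Euclidean distance and
features already scaled to a comparable range; the points and
labels are toy values, and the function name is hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Predict the majority label among the k nearest training points."""
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy, pre-scaled feature vectors (hypothetical); label 1 means Exited.
points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3),
          (0.8, 0.9), (0.9, 0.8), (0.85, 0.7)]
labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(points, labels, (0.2, 0.2), k=3))  # 0
print(knn_predict(points, labels, (0.8, 0.8), k=3))  # 1
```

An odd k avoids tied votes in a two-class problem, which is
one reason values such as 3 and 9 are convenient choices.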
From Fig-9, the accuracy on the test data when k=3 is 81.1%,
with 144 customers correctly predicted as Exited and 1478 as
Not-Exited. When k=9, the accuracy is 82.1%, with 98
customers correctly predicted as likely to leave and 1544
customers predicted as Not-Exited.
Fig-9
D. ANN (ARTIFICIAL NEURAL NETWORK)
In this technique, all the independent attributes are used to
predict the response variable (Exited), with 80% of the data
used for training and 20% for testing. The code of this
technique is given in Fig-10.
Fig-10
Fig-11
From Fig-11, the accuracy on the test data when hidden=1 is
83.2%; the model correctly predicted 107 customers as likely
to exit and 1544 customers as Not-Exited.
Fig-12 Neural Network
E. LOGISTIC REGRESSION
In this technique, all the independent variables are used to
predict the dependent variable, with the training and testing
data divided in an 80:20 ratio. The code of this technique is
given in Fig-13.
Fig-13
Fig-14
From Fig-14, the accuracy on the test data is 83.3%, with 154
customers correctly predicted as exiting customers and 1512
customers predicted as Non-Exited.
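The reported accuracy follows directly from the confusion
matrix: it is the share of test cases on the diagonal. A
small sketch of that arithmetic is given below, using the
logistic regression counts from the confusion matrix table
of this paper. The function name is hypothetical.

```python
def accuracy(confusion):
    """Correctly classified cases divided by all cases (square matrix)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

# Logistic regression counts; rows and columns are ordered Not Exit, Exit,
# so the diagonal holds the correctly predicted customers.
logistic_regression = [[1512, 69],
                       [265, 154]]

print(f"{accuracy(logistic_regression):.1%}")  # 83.3%
```

Here 1512 + 154 = 1666 of the 2000 test customers are
classified correctly, which gives the 83.3% quoted above.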
IX. INTERPRETATION OF RESULT
A. Confusion Matrix table result
B. Comparison of DT, SVM, KNN, LR, ANN techniques
Method                             Accuracy
Decision Tree                      86.4 %
SVM (Support Vector Machine)       85.7 %
KNN (K Nearest Neighbour)          82.1 %
Logistic Regression                83.3 %
ANN (Artificial Neural Network)    79.1 %
X. CUSTOMER RETENTION SOLUTION
In today’s banking industry, customer retention is a very
important task because the industry’s profit is based on
transaction volume rather than margin: there is not much
profit margin on a single transaction on banking products.
The banking industry relies on volume, so it is very important
for banks to have a large number of customers to work with
and to grow their profit base. The banking industry employs
many customer retention techniques, which are as follows:
1. Customizing the product as per customer need and demand.
2. Extending credit for high-end customers as per their
requirements after analyzing their credit history and income.
3. Conducting surveys to set customer expectations.
4. Setting up an R&D division to look for solutions to the
problems of today and tomorrow.
5. Building a relationship with the customer based on trust
and understanding.
6. The banking industry runs on customers who return to the
bank. For this to happen flawlessly, banks need to be
customer friendly and ready to go the extra mile (within
legal boundaries) with the customer.
Today the banking scenario is changing around the globe, and
so are the customers and their needs. Earlier, banking was
all about deposits and withdrawals, but as time passed banks
moved into lending, insurance, currency exchange, business
development, investment and many more.
So, banks should keep an eye out for any new sector opening
up in order to retain customers within their banking system,
because it is a very competitive market and everyone is
fighting for a piece of the pie.
XI. CONCLUSION
In this customer churn prediction study, we compared the
accuracy of different supervised learning techniques:
decision tree, SVM, KNN, ANN and logistic regression.
The decision tree gives the best accuracy, which suggests
that, of the models compared, it is the best for predicting
customer churn in the banking industry.
REFERENCES
[1] Ahn, “Customer churn analysis: churn determinants and
mediation effects of partial defection in the Korean mobile
telecommunication service industry”.
[2] T. Vafeiadis, “A Comparison of machine learning
techniques for customer churn prediction”.
[3] Semrl, “Churn Prediction Model for Effective Gym
Customer Retention”.
[4] Shaaban, “A Proposed Churn Prediction Model”.
[5] Oyeniyi, “Customer Churn Analysis in Banking Sector
using Data Mining Techniques”.
[6] Zoric, “Predicting Customer Churn in Banking Industry
using Neural Network”.
Confusion matrix table result (Section IX-A)

                                    Actual Prediction
Method                Actual Class  Not Exit   Exit
Decision Tree         Not Exit      1575       222
                      Exit          43         165
SVM                   Not Exit      1541       271
                      Exit          26         162
KNN                   Not Exit      1478       132
                      Exit          292        144
Logistic Regression   Not Exit      1512       69
                      Exit          265        154
ANN                   Not Exit      1557       47
                      Exit          289        107