Churn customer, one of the most important issues in customer relationship management and marketing is especially in industries such as telecommunications, the financial and insurance. In recent decades much
research has been done in this area. In this research, the index set for the reasons set reason churn customers for our customers is of particular importance. In this study we are intended to provide a formula for the index churn customers, the better to understand the reasons for customers to provide churn. Therefore, in order to evaluate the formula provided through six Classification methods (Decision tree QUEST, Decision tree C5.0, Decision tree CHAID, Decision trees CART, Bayesian network, Neural network) to evaluate the formula will be involved with individual indicators
PROVIDING A METHOD FOR DETERMINING THE INDEX OF CUSTOMER CHURN IN INDUSTRY
1. International Journal of Information Technology, Control and Automation (IJITCA) Vol.3, No.1, January 2013
DOI:10.5121/ijitca.2013.3105 61
PROVIDING A METHOD FOR DETERMINING THE
INDEX OF CUSTOMER CHURN IN INDUSTRY
Keyvan Vahidy Rodpysh1
1
Department of e-commerce, Nooretouba University, Tehran, Iran
keyvanvahidy@yahoo.com
ABSTRACT
Churn customer, one of the most important issues in customer relationship management and marketing is
especially in industries such as telecommunications, the financial and insurance. In recent decades much
research has been done in this area. In this research, the index set for the reasons set reason churn
customers for our customers is of particular importance. In this study we are intended to provide a formula
for the index churn customers, the better to understand the reasons for customers to provide churn.
Therefore, in order to evaluate the formula provided through six Classification methods (Decision tree
QUEST, Decision tree C5.0, Decision tree CHAID, Decision trees CART, Bayesian network, Neural
network) to evaluate the formula will be involved with individual indicators
KEYWORDS
Classification method, Customer churn, insurance, data mining
1. INTRODUCTION
In the present competitive world market where the customer is the main reason for selling is
growing day by day. Identify customer needs in order to understand the behaviour of customers in
each industry will recognize the new Policy.
Churn customers, one of the most important things for a company that may arise in connection
with customers. Avoid Churn customer will irreparable damage to a reputable company.
Prediction of customer churn important because of the negative effects of losing customers
because of high costs of attracting new customers, decreased revenue, and negative impact on a
company Reputation[7]
One of the tools in the areas of marketing and customer relationship management wide use in
order to extract hidden patterns in data and understanding of such issues is a data mining.
Classifications of data mining methods are most commonly used methods. For the assessment,
prediction and discovery of data patterns are used. Areas of study include the predicted churn
customers. Fields of study in churn customers can expect the following:
• Identifying churn customers and determining churn rate [11]
• Identifying the reason of customer churn [12].
• Get reduction strategies against customer churn. [13]
• Presenting indicator to determine customer churn
2. International Journal of Information Technology, Control and Automation (IJITCA) Vol.3, No.1, January 2013
62
In this study we followed, Formula compared with other indicators to provide optimal customer
churn.
Mainly two factors that break with the company's current or future pay levels tend to break with
the company .In order to evaluate each of the indicators together with the indicators we used the
tools Classification include Decision tree QUEST, Decision tree C5.0, Decision tree CHAID,
Decision trees CART, Bayesian network, Neural network.
2. LITERATURE REVIEW
In the last decade,customer churn prediction have received a growing attention [14]. Customer
churn, customer orientation in the direction of motion for the transition from a competitive
supplier to another supplier [15].
Table 1 Review of literature churn customers, churn index and consumer shows. As you can see,
most research has been done through data mining tools. Churn index is intended for customers to
predict and churn.
As seen in Table 1, Customers in various industries such as telecommunications, which predicts
churn, include banking, insurance and the journalist has a special significance. Churn customers
in order to study various indicators Identifying churn customers and determining churn rate [11],
identifying the reason of customer churn [12], Get reduction strategies against customer churn.
[13].
Table. 1 Literature review of anticipated customer churn and the index churn
3. RESEARCH METHODOLOGY
As you can see in Figure 1,five variable (May terminate the contract and service,Extension of
Contract and services,Advised the company to others,Satisfaction of rival companies,Overall
satisfaction of the Company) method with the formula provided by the six Classification method
(Decision tree QUEST,Decision tree C5.0,Decision tree CHAID,Decision trees CART,Bayesian
network,Neural network) on the test data are evaluated to determine if a formulation is desirable
or not results
3. International Journal of Information Technology, Control and Automation (IJITCA) Vol.3, No.1, January 2013
63
3.1. Data and variables used
Population includes customers who used third party policy in Guilan province in time interval of
23JUL-23SEP 2011 After primary setting of questionnaire and field survey, final data obtained
from a database of customers have been refined in order to data mining and analysis.
In this study, seven variables (answer questions / problems, Customer satisfaction of the insurer's
behaviour, The Company's obligations to pay compensation, Iran Insurance Company
Figure 1.Research methodology
Of premiums, How to pay premiums to the insurer, Credit insurer or insurance company in Iran,
Find out which insurance companies use insurance) as the main variables and five variable (May
terminate the contract and service,Extension of Contract and services,Advised the company to
others,Satisfaction of rival companies,Overall satisfaction of the Company)as the variables
involved in the proposed formula is used to determine the index of customer churn
3.2. Methods of classification
As noted in the study of methods to evaluate the classification of the formulation. The following
provides a brief overview of the classification methods we used.
Decision tree CART
CART decision tree analysis in 1984 by a group classification was developed. The above
algorithm for a comprehensive study on decision trees, providing technical innovations, a
complex discussion on the data structure and a strong management on large sample theory for
trees is important. CART decision tree recursive segmentation procedure that is able to handle
continuous and discrete attributes property values.The data rows are administered in the form of
binary operation is not necessary. Trees have been grown without the use of any law to the fullest
extent possible and then get to the root pruning. [2]
4. International Journal of Information Technology, Control and Automation (IJITCA) Vol.3, No.1, January 2013
64
Decision tree QUEST
QUEST decision tree is a decision tree approach that statistical methods used in CART decision
tree provides the same functionality but with more restrictions and are usually the result of lower
.QUEST tree uses surrogate splitting to handle missing values [8]
Decision tree CHAID
CHAID is a statistical technique for effective segmentation, or tree growth, which was developed
in 1980 by kass. The result of a statistical test is used as a criterion. CHAID evaluates all
potential predictive value of a field is built. This method of statistical values that have been
assessed as homogeneous values consider the target variables are combined and all other amounts
that are heterogeneous (non-identical), they will maintain. The best predictor is chosen from the
first cluster of the decision tree, so each node has children from a group of selected field values
are uniform. As this process continues recursively until the tree is fully grown. [8]
Decision tree C5.0
The C5.0 decision tree algorithm, the new algorithm is capable of classifying variable costs gives.
The equal treatment of all errors is present algorithm but in some practical applications of
classification errors and other errors are more critical than others. Create a classification C5.0
algorithm to minimize expected cost of classification error rates, rather than making whether it is
possible that the case was not of equal importance. C5.0 algorithm provides facilities for feature
weight to calculate the importance of each case deals with this feature, the algorithm tries to
reduce the error rate is predicted [1]
Neural network
For the best rated models, are models of neural network a simplified model of the field of neural
networks and brain nerve cells? It is designed for computers. The main objective of this study is
to find a suitable set of weights for different categories of participants The neural network
learning the way that the records are tested And when an incorrect estimate of the weight
adjustment is performed This process continues to improve so estimates are conditional upon
Neural networks are powerful estimators as well as other methods of estimating And sometimes
they do best. The main disadvantage of this method is much time spent on various parameters is
selected [1]
Bayesian network
The Bayesian networks through mathematical rules based on new information combined with
knowledge exists.The Bayesian network is based Bayes theory, uncertainty is a powerful tool for
determining circumstances a very simple form of a New Bayesian classification [5]
Evaluation methods
Evaluate the accuracy of the factions do we classification the evaluation criteria we use include:
• Overall accuracy: Percent of the records are correctly predicted the entire record
TP: the number of true positives are defined as positive
FP : the number that have been incorrectly classified as positive.
5. International Journal of Information Technology, Control and Automation (IJITCA) Vol.3, No.1, January 2013
65
FN: the number of false positive elements that have been classified as negative.
TN: the number of correctly identified its negative
• Profit: Profit function of a set of weighting coefficients values is with the revenue and
expense. Appropriate model for this function is zero at the beginning minded reached a point
of maximum return in the form of a poor model ascending or descending directly be seen.
• ROC curve: ROC curve is an indicator for measuring the performance of a model of the area
under the curve is more accurate indication.
• Index Lift: Sample rate depending on the sort of confidence, predict the parameters of the
basic units of society as a whole shows Lift [3]
4. Provide a method and its evaluation with other indicators
As the literature review were presented in most research on predicting customer churn of second
factor likely to break with the company in the future, or break with the company as churn index is
used. Hence we have the following five variables with the normalization Min-Max, which is used
for data normalization, which is presented as follows X=(X-Min(X))/(Max(X)-Min(X)) [9].
Formulation for the index following relationship is obtained from churn index:
: Using the services of companies competing products (if you have other insurance in an
insurance company 1 and otherwise 0)
: Satisfaction with services companies competing products (toward normalization satisfaction
and convert the data to range between 0 and 1, instead of simply very little = 0, little = 0.25,
normal = 0.5, much = 0.75, too much=1)
: Overall satisfaction with services provided in the company's products (in the normalization
satisfaction and convert the data to range between 0 and 1, instead of simply very little = 1, little
= 0.75, normal = 0.5, much =0.25, too much= 0)
: More likely to use the services of the company products (in the normalization satisfaction and
convert the data to range between 0 and 1, instead of simply very little = 1, little = 0.75, normal =
0.5, much =0.25, too much = 0)
: The company recommends to others (in the normalization satisfaction and convert the data to
range between 0 and 1, instead of simply very little = 1, little = 0.75, normal = 0.5, much =0.25,
too much = 0)
6. International Journal of Information Technology, Control and Automation (IJITCA) Vol.3, No.1, January 2013
66
: The probability of discontinuation of service products in the future (toward normalization
satisfaction and convert the data to range between 0 and 1, instead of simply very little = 0, little
= 0.25, normal = 0.5, much = 0.75 , too much=1)
customers have a high rate of 0.5 as customers churn and customers with lower rates of 0.5
customers are not supposed to churn.
the formula presented in this study to verify the correctness of each of the indicators presented
Method to evaluate the six Classification methods (Decision tree QUEST, Decision tree
C5.0,Decision tree CHAID, Decision trees CART, Bayesian network, Neural network,) on the
seven original variables (variables (answer questions / problems, Customer satisfaction of the
insurer's behavior, The Company's obligations to pay compensation, Iran Insurance Company of
premiums, How to pay premiums to the insurer, Credit insurer or insurance company in Iran, Find
out which insurance companies use insurance) that has been mentioned, has been paid.
Evaluation results classification of the index has been used in the tables and figures below
Table.2 Formulation of indicators and criteria for determining Churn Customer by CART
decision tree
Table.3 Formulation of indicators and criteria for determining Churn Customer by Bayesian
network
7. International Journal of Information Technology, Control and Automation (IJITCA) Vol.3, No.1, January 2013
67
Table.4 Formulation of indicators and criteria for determining Churn Customer by Neural
network
Table.5 Formulation of indicators and criteria for determining Churn Customer by Quest
decision tree
Table.6 Formulation of indicators and criteria for determining Churn Customer by C5.0
decision tree
8. International Journal of Information Technology, Control and Automation (IJITCA) Vol.3, No.1, January 2013
68
Table.7 Formulation of indicators and criteria for determining Churn Customer by CHAID
decision tree
Figure 2. Formulation of criteria for determining Churn Customer (profit)
Figure 3. Formulation of criteria for determining Churn Customer (Overall accuracy)
0
50
100
150
200
250
CART decision tree
Bayesian network
Neuralnetwork
Questdecision tree
C5.0 decision tree
CHAID decision tree
85
90
95
100
105
CART
decision
tree
Bayesian
network
9. International Journal of Information Technology, Control and Automation (IJITCA) Vol.3, No.1, January 2013
69
Figure 4. Formulation of criteria for determining Churn Customer (ROC)
Figure 5. Formulation of criteria for determining Churn Customer (LIFT)
As originally observed in the above tables Accuracy of the method in 6 classification on database
performance indicator formulas are presented.
5. Conclusion
In the present study were followed, Indicator for the customer to consider churn. The present
results suggest that indicators presented in this study, that formula index churn has outperformed
each index In this study, a method to evaluate the indices of the 6 classification have been churn
customers. The formula shows better performance than the individual indicators have been
provided. From what was said can be concluded formula stated that the accuracy and reliability
than individual factors in the formula and indicates the correct formula is presented.
REFERENCES
1. Micheal J. A .Berry and Gordon S.linoff.92004) EBook Data Mining Technique for marketing Sales
and CRM: Wiley Publishing, Inc., Indianapolis. Indiana
2. Daniel Westreich , Justin Lessler and Michele Jonsson Funk . (2010) Propensity score estimation:
neural networks(nn), support vector machines(SVM), decision trees (CART) and meta-classifiers as
alternatives to logistic regression. Journal of Clinical Epidemiology, vol.63, pp. 826-833
3. Chih-Ping Wei and I-Tang Chiu. (2002)Turning telecommunications to Churn prediction:a data
mining approach. Expert Systems with Applications, vol. 23, pp. 103–112
0
0.2
0.4
0.6
0.8
1
1.2
CART decision tree
Bayesian network
Neural network
Questdecision tree
C5.0 decision tree
CHAID decision tree
0
0.5
1
1.5
2
2.5
3
3.5
CART decision tree
Bayesian network
Neural network
Questdecision tree
C5.0 decision tree
CHAID decision tree
10. International Journal of Information Technology, Control and Automation (IJITCA) Vol.3, No.1, January 2013
70
4. Saradh and Palshikar. (2011)Employees churn prediction. Expert Systems with Applications,vol. 38,
pp. 1999–2006
5. Han and Kamber.(2006) Data Mining: Concepts and Techniques, Second. Morgan Kaufman
Publisher, pp. 383-407 .
6. Coussement, Benoit and Van den Poel.(2010) Improved marketing decision making in a customer
churn prediction context using generalized additive models , Expert Systems with Applications,,
vol.37 ,pp. 2132–2143
7. K.M.Osei-Bryson.(2004), Evaluation of decision trees: a multi-criteria approach. Computers &
Operations Researchvol.31 , pp.1933-1945
8. SPSS Inc, Clementine 12.0 Algorithms Guide, (2007)
9. Buckinx, Van den Poel. (2005), Customer base analysis: partial defection of behaviorally loyal clients
in a non-contractual FMCG retail setting. European Journal of Operational Research, vol.164,pp.252-
268
10. Wai-Ho Au, Keith C. C. Chan and Xin Yao . (2003), A novel evolutionary data mining algorithm
with applications to churn prediction. IEEE Transactions on Evolutionary computation. vol. 7, No.
6pp. 532-545
11. Ganesh,Arnold and Reynolds. (2000), Understanding the customer base of service providers:An
examination of the differences between switchers and stayers.Journal of Marketing, vol.64,pp.65-87
12. Hadden , Tiwari , R.Roy and D. Ruta . (2005), Computer assisted customer churn management:
State-of-the-art and future trends. Computers&OperationsResearch, vol. 34,pp.2902-2917
13. Z.G.Qian, W.Jiang and K.L.Tsui. ( 2006), Churn detection via customer profile modelling .
INTERNATIONAL JOURNAL OF PRODUCTION RESEARCH, vol.44, pp.2913-2933
14. Dirk Van den Poel and Bart Lariviere. (2004), Customer attrition analysis for financial services using
proportional hazard models. (European Journal of Operational Research, vol.157, pp. 196-217
15. Wouter Verbeke , David Martens ,Christophe Mues and Bart Baesens. (2011), Building
comprehensible customer churn prediction models with advanced rule induction techniques. Expert
Systems with Applications, vol.38 ,pp. 2354-2364
16. Larose. (2005) EBook discovering knowledge in data an introduction to data mining: John Wiley &
Sons, Inc., Hoboken, New Jersey,
17. C.F.Tsai and M.Y. Chen .(2010) ,Variable selection by association rules for customer churn
prediction of multimedia on demand.Expert Systems with Applications, (2010),vol.37,pp.2006–2015
18. Hur and Lim.(2005), Customer churning prediction using support vector machines in online auto
insurance service. ADVANCES IN NEURAL NETWORKS - ISNN 2005, PT 2, PROCEEDINGS
Book Series: LECTURE NOTES IN COMPUTER SCIENCE vol. 3497 pp. 928-933