With the growth of the e-commerce sector, customers have more choices, which encourages them to split their purchases across several e-commerce sites and compare competitors' products, increasing the risk of churn. A review of the literature on customer churn models reveals that no prior research has considered both partial and total defection in non-contractual online environments; instead, studies focused on either total or partial defection. This study proposes a customer churn prediction model for the e-commerce context, in which a clustering phase based on the integration of the k-means method and the Length-Recency-Frequency-Monetary (LRFM) model is used to define churn, followed by a multi-class prediction phase based on three classification techniques: simple decision tree, artificial neural networks, and decision tree ensemble. The dependent variable classifies each customer as continuing loyal buying patterns (Non-churned), a partial defector (Partially-churned), or a total defector (Totally-churned). Macro-averaging measures, including average accuracy and the macro-averages of precision, recall, and F1, are used to evaluate classifier performance under 10-fold cross-validation. Using real data from an online store, the results show that the decision tree ensemble outperforms the other models in identifying both future partial and total defection.
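The clustering phase described above can be sketched as follows; the transaction log, field layout, and cluster count are illustrative stand-ins, not the study's actual data.

```python
# Sketch: segment customers by clustering LRFM features with k-means.
# Hypothetical transaction log: (customer_id, day_index, amount).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
transactions = [(int(rng.integers(0, 50)), int(rng.integers(0, 365)),
                 float(rng.uniform(5, 200))) for _ in range(2000)]

# Build the LRFM table: Length, Recency, Frequency, Monetary per customer.
lrfm = {}
for cid, day, amount in transactions:
    first, last, freq, money = lrfm.get(cid, (day, day, 0, 0.0))
    lrfm[cid] = (min(first, day), max(last, day), freq + 1, money + amount)

X = np.array([[last - first,   # Length: span of activity in days
               365 - last,     # Recency: days since last purchase
               freq,           # Frequency: number of purchases
               money]          # Monetary: total spend
              for first, last, freq, money in lrfm.values()])

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(X))
print(sorted(set(labels)))  # three behavioural segments
```

The resulting segments would then be mapped to churn labels (e.g. by inspecting recency per cluster) before the supervised phase.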
Identification of important features and data mining classification technique... (IJECEIAES)
Employee absenteeism at work costs organizations billions a year. Predicting employees' absenteeism and the reasons behind their absence helps organizations reduce expenses and increase productivity. Data mining turns the vast volume of human-resources data into information that can support decision-making and prediction. Although feature selection is a critical step in data mining for enhancing the efficiency of the final prediction, it is not yet known which feature selection method is best. Therefore, this paper compares the performance of three well-known feature selection methods in absenteeism prediction: relief-based, correlation-based, and information-gain feature selection. In addition, the paper seeks the best combination of feature selection method and data mining technique for enhancing absenteeism prediction accuracy. Seven classification techniques were used as prediction models, and cross-validation was applied to obtain more realistic and reliable results. The dataset was built at a courier company in Brazil from records of absenteeism at work. In the experiments, correlation-based feature selection surpassed the other methods across the performance measures, and the bagging classifier was the best-performing data mining technique when features were selected using correlation-based feature selection, with an accuracy of 92%.
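Correlation-based selection of the kind compared above can be illustrated as below; the feature names and the synthetic target are invented for the sketch, not taken from the Brazilian absenteeism dataset.

```python
# Sketch of correlation-based feature selection: rank features by the
# absolute Pearson correlation with the target and keep the top k.
import numpy as np

rng = np.random.default_rng(1)
n = 500
distance = rng.uniform(0, 50, n)   # hypothetical commute distance
age = rng.uniform(20, 60, n)       # hypothetical employee age
noise = rng.normal(size=n)         # an uninformative column
# Toy target: absence hours driven by distance, weakly by age.
absence = 2.0 * distance + 1.0 * age + noise

features = {"distance": distance, "age": age, "noise": noise}
scores = {name: abs(np.corrcoef(col, absence)[0, 1])
          for name, col in features.items()}
top = sorted(scores, key=scores.get, reverse=True)[:2]
print(top)
```

The selected columns would then feed whichever classifier is being evaluated.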
Biometric Identification and Authentication Providence using Fingerprint for... (IJECEIAES)
The rise in recent cloud computing security incidents makes securing data a key challenge. To address this problem, this paper presents mobile biometric authentication in cloud computing, integrating mobile devices with the cloud. Since mobile cloud computing is popular among mobile users, biometric authentication is used to enhance security. The paper examines how mobile cloud computing (MCC) handles this security issue with a fingerprint biometric authentication model. From the fingerprint biometric, a secret code is generated using an entropy value, enabling a person to request access to data on a desktop computer. When the person requests access from the authorized user over Bluetooth on a mobile device, the authorized user grants access through the fingerprint secret code. Finally, the fingerprint is verified against the database on the desktop computer; if it matches, the requesting person can access the computer.
The use of genetic algorithm, clustering and feature selection techniques in... (IJMIT JOURNAL)
Decision tree modelling, one of the data mining techniques, is used for credit scoring of bank customers. The main problem is the construction of decision trees that classify customers optimally. This study presents a new hybrid mining approach to designing an effective and appropriate credit scoring model. It is based on a genetic algorithm for credit scoring of bank customers, in order to offer credit facilities to each class of customers. A genetic algorithm can help banks score customers' credit by selecting appropriate features and building optimal decision trees. The proposed hybrid classification model combines clustering, feature selection, decision trees, and genetic algorithm techniques. Clustering and feature selection are used to pre-process the input samples before the decision trees in the credit scoring model are constructed. The proposed hybrid model then selects and combines the best decision trees based on the optimality criteria and constructs the final decision tree for credit scoring of customers. Using one credit dataset, the results confirm that the classification accuracy of the proposed hybrid model exceeds that of almost all the classification models compared in this paper. Furthermore, the number of leaves and the size of the constructed decision tree (i.e., its complexity) are smaller than in other decision tree models. One financial dataset, the Bank Mellat credit dataset, was chosen for the experiments.
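One plausible reading of the genetic-algorithm step is evolving a population of feature bitmasks whose fitness is the cross-validated accuracy of a decision tree; the population size, rates, and dataset below are illustrative only, not the paper's configuration.

```python
# GA sketch: evolve boolean feature masks; fitness is 3-fold CV accuracy
# of a shallow decision tree trained on the masked features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           n_redundant=0, random_state=2)

def fitness(mask):
    if not mask.any():
        return 0.0
    tree = DecisionTreeClassifier(max_depth=4, random_state=0)
    return cross_val_score(tree, X[:, mask], y, cv=3).mean()

pop = rng.integers(0, 2, size=(12, X.shape[1])).astype(bool)
for _ in range(15):                          # generations
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[::-1][:6]]   # truncation selection
    children = []
    for _ in range(6):
        a, b = parents[rng.integers(0, 6, 2)]
        cut = rng.integers(1, X.shape[1])
        child = np.concatenate([a[:cut], b[cut:]])   # one-point crossover
        flip = rng.random(X.shape[1]) < 0.1          # bit-flip mutation
        children.append(child ^ flip)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", np.flatnonzero(best))
```

A final tree trained on the winning mask would then serve as the scoring model.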
Prediction of Default Customer in Banking Sector using Artificial Neural Network (rahulmonikasharma)
The aim of this article is to present a prediction and risk-accuracy analysis of default customers in the banking sector. A neural network is a learning model inspired by biological neurons; it is used to estimate and predict outcomes that can depend on a large number of inputs. The bank customer dataset from the UCI repository is used, with data analysis methods applied to extract an informative subset from the large volume of data. This dataset feeds the neural network as training and testing data. During training, the dataset is iterated over until the desired output is reached, and the trained model is then cross-checked against the test data. This paper focuses on predicting default customers using a deep neural network (DNN) algorithm.
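A hedged sketch of the prediction setup, with scikit-learn's MLPClassifier standing in for the paper's deep network and synthetic data standing in for the UCI bank dataset:

```python
# Small feed-forward network flagging defaulting customers on synthetic,
# imbalanced data (roughly 20% defaulters).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=16, weights=[0.8],
                           random_state=3)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=3, stratify=y)

model = make_pipeline(
    StandardScaler(),                       # nets train better on scaled inputs
    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=3))
model.fit(X_train, y_train)
print("test accuracy: %.2f" % model.score(X_test, y_test))
```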
Augmentation of Customer’s Profile Dataset Using Genetic Algorithm (RSIS International)
Data is the lifeblood of every type of business: clean, accurate and complete data is a prerequisite for decision-making in business processes, and data is one of the most valuable assets of any organization. It is immensely important that businesses focus on the quality of their data, as it can improve business performance by increasing efficiency, streamlining operations and consolidating data sources. Good-quality data helps to improve and simplify processes and eliminate time-consuming rework internally, and to enhance the user's experience externally, which further translates into significant financial and operational benefits [1] [2]. All organizations and businesses strive to retain their existing customers and gain new ones, and accurate data enables a business to improve the customer experience. Data augmentation adds value to base data by enhancing the information derived from the existing source. It can reduce the manual intervention required to develop meaningful information and insight from business data, and can significantly enhance data quality; the business can then provide a unique customer experience and deliver above and beyond expectations. Data augmentation is immensely important because it helps improve the overall productivity of the business and makes the most accurate and relevant information quickly available for decision-making.
This work focuses on augmenting a customer dataset using a Genetic Algorithm (GA). The augmented data are used for customer behavioural analysis. The dataset consists of the different factors inherent in each customer's situation, used to understand the market strategy. This behavioural data was used in earlier analysis work [13], where it was found that manually collecting a very large amount of such data is very cumbersome. It is also inferred from the earlier work [13] that more data may give more accurate results. Hence it was decided to enrich the dataset using a Genetic Algorithm.
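A minimal sketch of GA-style dataset enrichment, breeding new customer rows from existing ones by crossover and mutation; the field layout and operator choices are hypothetical, not those of [13].

```python
# Enrich a small customer table: children inherit fields from two parents
# (uniform crossover) and receive small multiplicative mutations.
import numpy as np

rng = np.random.default_rng(4)
base = np.array([[25, 30000, 4],     # hypothetical (age, income, visits/month)
                 [40, 52000, 2],
                 [31, 41000, 6],
                 [55, 75000, 1]], dtype=float)

def breed(parents, n_children, mutation_scale=0.05):
    children = []
    for _ in range(n_children):
        a, b = parents[rng.choice(len(parents), 2, replace=False)]
        mask = rng.random(parents.shape[1]) < 0.5          # uniform crossover
        child = np.where(mask, a, b)
        child = child * (1 + rng.normal(0, mutation_scale, child.shape))
        children.append(child)
    return np.array(children)

augmented = np.vstack([base, breed(base, 20)])
print(augmented.shape)  # (24, 3)
```

Plausibility constraints (e.g. clipping ages to a valid range) would be added in practice.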
CHURN ANALYSIS AND PLAN RECOMMENDATION FOR TELECOM OPERATORS (Journal For Research)
With an increasing number of mobile operators, users are free to switch from one operator to another if they are not satisfied with service or pricing. This trend hurts operators, who lose revenue when customers switch. To address it, operators are looking for machine learning tools that can predict well in advance which customers may churn, so that they can offer alternative plans to satisfy and retain them. In this paper, we design a hybrid machine learning classifier to predict whether a customer will churn based on CDR parameters, and we also propose a rule engine to suggest the best plans.
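The two pieces described, a hybrid classifier and a rule engine, might be sketched as follows; the CDR-style features are synthetic and the plan thresholds are invented for illustration.

```python
# Hybrid classifier: soft-voting ensemble of logistic regression and a
# random forest, plus a toy rule engine for plan suggestions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=800, n_features=6, random_state=5)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=5)

hybrid = VotingClassifier([("lr", LogisticRegression(max_iter=1000)),
                           ("rf", RandomForestClassifier(random_state=5))],
                          voting="soft")
hybrid.fit(X_tr, y_tr)

def suggest_plan(minutes_used, data_gb_used):
    """Toy rule engine: thresholds are illustrative only."""
    if data_gb_used > 10:
        return "unlimited-data plan"
    if minutes_used > 600:
        return "unlimited-calls plan"
    return "basic plan"

print(hybrid.score(X_te, y_te), suggest_plan(700, 3))
```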
Improving Credit Card Fraud Detection: Using Machine Learning to Profile and... (Melissa Moody)
Researchers Navin Kasa, Andrew Dahbura, and Charishma Ravoori undertook a capstone project—part of the UVA Data Science Institute Master of Science in Data Science program—that addresses credit card fraud detection through a semi-supervised approach, in which clusters of account profiles are created and used for modeling classifiers.
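The cluster-then-classify idea can be sketched as below; the data is synthetic and the per-cluster logistic regressions are a stand-in for whatever classifiers the project actually used.

```python
# Semi-supervised-style sketch: group accounts into behavioural profiles
# with k-means, then fit one fraud classifier per profile.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=600, n_features=8, weights=[0.9],
                           random_state=6)        # ~10% "fraud" class
profiles = KMeans(n_clusters=3, n_init=10, random_state=6).fit(X)

classifiers = {}
for c in range(3):
    idx = profiles.labels_ == c
    if len(np.unique(y[idx])) > 1:               # need both classes to fit
        classifiers[c] = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])

def predict(x):
    c = profiles.predict(x.reshape(1, -1))[0]
    clf = classifiers.get(c)
    return clf.predict(x.reshape(1, -1))[0] if clf is not None else 0

preds = np.array([predict(x) for x in X])
print("training agreement: %.2f" % (preds == y).mean())
```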
Projection pursuit Random Forest using discriminant feature analysis model fo... (IJECEIAES)
A major and demanding issue in the telecommunications industry is the prediction of customer churn. Churn describes a customer who leaves the current provider for a competitor in search of better service offers. Companies in the telco sector typically run customer relationship management offices whose main objective is to win back defecting clients, because preserving long-term customers can be much more beneficial than gaining newly recruited ones. Researchers and practitioners are paying great attention to developing robust customer churn prediction models, especially in the telecommunication business, and have proposed numerous machine learning approaches. Many classification approaches exist, but tree-based methods have recently proven among the most effective. The main contribution of this research is to predict churners and non-churners in the telecom sector using projection pursuit Random Forest (PPForest), which applies discriminant feature analysis as a novel extension of the conventional Random Forest for learning oblique projection pursuit trees (PPtree). The proposed methodology leverages two discriminant analysis methods to calculate the projection index used in constructing the PPtree: the first uses Support Vector Machines (SVM), while the second uses Linear Discriminant Analysis (LDA) to achieve linear splitting of variables during oblique PPtree construction, producing individual classifiers that are more robust and diverse than those of the classical Random Forest. The proposed methods achieve the best performance measurements, e.g., accuracy, hit rate, ROC curve, lift, H-measure, and AUC; moreover, PPForest based on LDA delivers an effective prediction model.
The recruitment of new personnel is one of the most essential business processes, affecting the quality of human capital within any company. It is highly essential for companies to recruit the right talent to maintain a competitive edge in the market. However, IT companies often face a problem when recruiting new people for their ongoing projects due to the lack of a proper framework defining criteria for the selection process. In this paper, we aim to develop a framework that allows a project manager to take the right decision in selecting new talent by correlating performance parameters with the candidates' other domain-specific attributes. Another important motivation behind this project is to check the validity of the selection procedure often followed by various big companies in both the public and private sectors, which focuses only on academic scores, GPA/grades, and other academic background. We test whether such a decision produces optimal results in industry, or whether there is a need for a change that offers a more holistic approach to recruiting new talent in software companies. The scope of this work extends beyond the IT domain, and a similar procedure can be adopted to develop recruitment frameworks in other fields as well. Data mining techniques provide useful information from historical projects on which the hiring manager can base decisions for recruiting a high-quality workforce. This study aims to bridge this hiatus by developing a data mining framework based on an ensemble-learning technique to refocus the criteria for personnel selection. The results from this research clearly demonstrate the need to refocus the selection criteria on quality objectives.
An efficient feature selection algorithm for health care data analysis (journalBEEI)
Diabetes is a silent killer that will slowly kill a person if it goes undetected. Existing systems for checking whether a person has diabetes, which use the F-score method and k-means clustering, are not 100% accurate, and anything short of 100% is problematic in the medical field, as it could cost lives. Our proposed system aims to take some of the best features of the existing algorithms for predicting diabetes and combine them into a novel algorithm intended to be 100% accurate in its prediction. With the surge in technological advancements, we can use data mining to predict when a person will be diagnosed with diabetes. Specifically, we analyze the best features of the chi-square algorithm and the advanced clustering algorithm (ACA). This work uses the Pima Indian Diabetes dataset provided by the National Institute of Diabetes and Digestive and Kidney Diseases. Using classification methods, we consider factors such as age, BMI, and blood pressure, single out the attributes given the most importance overall, and use them for the prediction of diabetes.
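Chi-square feature scoring of the kind analyzed above can be sketched with scikit-learn; the columns below imitate Pima-style attributes but are generated, not the real dataset.

```python
# Score features against a binary diabetes label with the chi-square test
# and keep the single strongest one.
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(7)
n = 400
glucose = rng.uniform(70, 200, n)    # hypothetical plasma glucose
bmi = rng.uniform(18, 45, n)         # hypothetical BMI
age = rng.uniform(21, 70, n)         # hypothetical age
# Toy label driven mainly by glucose.
y = (glucose + rng.normal(0, 15, n) > 140).astype(int)

X = np.column_stack([glucose, bmi, age])
selector = SelectKBest(chi2, k=1).fit(X, y)   # chi2 requires non-negative X
print(["glucose", "bmi", "age"][selector.get_support().argmax()])
```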
Millennials and the use of social networking sites as a job searching tool (journalBEEI)
This research examines the factors that influence the behavioural intention of millennials in using social networking sites (SNSs) when seeking a job. The data were collected from respondents from the Generation Y demographic who were actively looking for jobs and had some experience using SNSs when job hunting. The data were analyzed using partial least squares (PLS), covering both the measurement and structural models of the study. The findings revealed that three of the constructs from TAM are statistically significant to behavioural intention: perceived usefulness, perceived ease of use, and privacy concerns all contribute to and have a significant relationship with job seekers' intention to use SNSs as a job search tool, as verified using PLS data analysis. Recruiters or employers who intend to adopt SNSs in the recruitment process are advised to design recruitment plans that make the use of SNSs more convenient and user-friendly. This study provides insight and knowledge regarding the impact of technology on online job application and hiring processes.
A survey on discrimination deterrence in data mining (eSAT Journals)
Abstract
Data mining is a very important technology for extracting useful knowledge hidden in large datasets. However, there are negative perceptions about data mining, including the unfair treatment of people who belong to specific groups. Classification rule mining has paved the way for automated decisions such as loan granting/denial and insurance premium computation, based on automated data collection and data mining techniques. If the training datasets are biased with respect to discriminatory attributes, discriminatory decisions may ensue. Anti-discrimination techniques in data mining therefore include both discrimination discovery and discrimination prevention, where discrimination can be direct or indirect: it is indirect when decisions are based on non-sensitive attributes that are strongly correlated with biased sensitive ones. The surveyed system addresses discrimination prevention in data mining and proposes new, improved techniques applicable to direct or indirect discrimination prevention, individually or both simultaneously. It discusses how to clean training datasets and outsourced datasets so that direct and/or indirect discriminatory decision rules are transformed into legitimate classification rules, proposes new metrics to evaluate the utility of the suggested methods, and compares these methods.
Keywords: Anti-discrimination, data mining, direct and indirect discrimination prevention, rule protection, rule generalization, privacy.
Corporate bankruptcy prediction using Deep learning techniques (Shantanu Deshpande)
Corporate bankruptcy prediction using recurrent neural networks: the aim is to build a recurrent neural-network-based model to predict whether a company will become bankrupt, using financial ratios of Polish companies.
Methodologies & tools: CRISP-DM, SMOTE-ENN, Genetic Algorithm (GA), LSTM network (a type of RNN)
DEEP-LEARNING-BASED HUMAN INTENTION PREDICTION WITH DATA AUGMENTATION (ijaia)
Data augmentation has been broadly applied in training deep-learning models to increase the diversity of
data. This study investigates the effectiveness of different data augmentation methods for deep-learning-based human intention prediction when only limited training data is available. A human participant pitches
a ball to nine potential targets in our experiment. We expect to predict which target the participant pitches
the ball to. Firstly, the effectiveness of 10 data augmentation groups is evaluated on a single-participant
data set using RGB images. Secondly, the best data augmentation method (i.e., random cropping) on the
single-participant data set is further evaluated on a multi-participant data set to assess its generalization
ability. Finally, the effectiveness of random cropping on fusion data of RGB images and optical flow is
evaluated on both single- and multi-participant data sets. Experiment results show that: 1) Data
augmentation methods that crop or deform images can improve the prediction performance; 2) Random
cropping can be generalized to the multi-participant data set (prediction accuracy is improved from 50%
to 57.4%); and 3) Random cropping with fusion data of RGB images and optical flow can further improve
the prediction accuracy from 57.4% to 63.9% on the multi-participant data set.
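The best-performing augmentation, random cropping, is straightforward to sketch in numpy; the crop size and image shape below are illustrative, not the study's settings.

```python
# Random cropping: cut a fixed-size window from a random position in the
# frame (resizing back to the network's input size is omitted for brevity).
import numpy as np

rng = np.random.default_rng(8)

def random_crop(image, crop_h, crop_w):
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return image[top:top + crop_h, left:left + crop_w]

image = rng.random((224, 224, 3))             # stand-in for an RGB frame
crops = [random_crop(image, 200, 200) for _ in range(4)]
print(crops[0].shape)  # (200, 200, 3)
```

The same crop offsets would be applied to paired optical-flow frames to keep the two modalities aligned.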
COMPARISON OF BANKRUPTCY PREDICTION MODELS WITH PUBLIC RECORDS AND FIRMOGRAPHICS (cscpconf)
Many business operations and strategies rely on bankruptcy prediction. In this paper, we study the impact of public records and firmographics, predicting bankruptcy over a 12-month-ahead period with different classification models and adding value to the traditionally used financial ratios. Univariate analysis shows the statistical association and significance of public records and firmographics indicators with bankruptcy. Seven statistical models and machine learning methods were developed: logistic regression, decision tree, random forest, gradient boosting, support vector machine, Bayesian network, and neural network. The performance of the models was evaluated and compared based on classification accuracy, Type I error, Type II error, and ROC curves on the hold-out dataset. Moreover, an experiment was set up to show the importance of oversampling for rare-event prediction. The results also show that the Bayesian network is comparatively more robust than the other models without oversampling.
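The oversampling experiment can be illustrated with simple resampling of the minority class; this uses sklearn's `resample` as a plain stand-in for whatever scheme the paper applied, on synthetic data.

```python
# Balance a rare bankruptcy class by resampling minority rows with
# replacement before fitting a classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.utils import resample

X, y = make_classification(n_samples=1000, weights=[0.95], random_state=9)
X_min, y_min = X[y == 1], y[y == 1]                 # rare "bankrupt" class
X_up, y_up = resample(X_min, y_min, n_samples=int((y == 0).sum()),
                      random_state=9)               # sample with replacement

X_bal = np.vstack([X[y == 0], X_up])
y_bal = np.concatenate([y[y == 0], y_up])
print(np.bincount(y_bal))  # equal class counts
```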
REVIEWING PROCESS MINING APPLICATIONS AND TECHNIQUES IN EDUCATION (ijaia)
Process Mining (PM) emerged from business process management but has recently been applied to
educational data and has been found to facilitate the understanding of the educational process.
Educational Process Mining (EPM) bridges the gap between process analysis and data analysis, based on
the techniques of model discovery, conformance checking and extension of existing process models. We
present a systematic review of the recent and current status of research in the EPM domain, focusing on
application domains, techniques, tools and models, to highlight the use of EPM in comprehending and
improving educational processes.
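The model-discovery technique EPM builds on can be illustrated by mining a directly-follows graph from a toy event log; the log below is invented for the sketch.

```python
# Discover the directly-follows relation: how often activity a is
# immediately followed by activity b across traces of an event log.
from collections import Counter

event_log = [
    ["enrol", "lecture", "exam"],
    ["enrol", "lecture", "resit", "exam"],
    ["enrol", "exam"],
]

dfg = Counter()
for trace in event_log:
    for a, b in zip(trace, trace[1:]):
        dfg[(a, b)] += 1

print(dfg[("enrol", "lecture")])  # 2
```

Conformance checking would then compare such discovered relations against a reference process model.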
MOVIE SUCCESS PREDICTION AND PERFORMANCE COMPARISON USING VARIOUS STATISTICAL... (ijaia)
Movies are among the most prominent contributors to the global entertainment industry today and, from a commercial standpoint, among the biggest revenue-generating industries. It is useful to divide films into two categories: successful and unsuccessful. To categorize the movies in this research, a variety of models were used, including regression models such as simple linear, multiple linear, and logistic regression, techniques such as SVM and k-means clustering, time-series analysis, and an artificial neural network. The models were compared on a variety of factors, including their accuracy on the training, validation, and testing datasets, the availability of new movie characteristics, and a variety of other statistical metrics. During the course of this study, it was discovered that certain characteristics have a greater impact on the likelihood of a film's success than others; for example, the presence of the action genre may have a significant impact on the forecasts, while another genre, such as sport, may not. The testing dataset for the models and classifiers was taken from the IMDb website for the year 2020. The artificial neural network, with an accuracy of 86 percent, is the best-performing model of all the models discussed.
Associative Regressive Decision Rule Mining for Predicting Customer Satisfact... (csandit)
Opinion mining, also known as sentiment analysis, involves customers' satisfaction patterns, sentiments and attitudes toward entities, products, services and their attributes. With the rapid development of the Internet, potential customers provide a substantial volume of product/service reviews. High volumes of customer reviews have been handled through taxonomy-aware processing, but it was difficult to identify the best reviews. In this paper, an Associative Regression Decision Rule Mining (ARDRM) technique is developed to predict patterns for service providers and to improve customer satisfaction based on review comments. Associative regression based decision rule mining performs two steps to improve the customer satisfaction level. Initially, a Machine Learning Bayes Sentiment Classifier (MLBSC) is used to assign class labels to each service review. After that, the regressive factor of the opinion words and the class labels are checked for association between the words using various probabilistic rules. Based on these rules, the effect of opinions and sentiments on customer reviews is analyzed to arrive at the specific set of services preferred by the customers along with their review comments. The associative regressive decision rule helps the service provider take decisions on improving the customer satisfaction level. The experimental results reveal that the ARDRM technique improves performance in terms of true positive rate, associative regression factor, regressive decision rule generation time, and review detection accuracy for similar patterns.
INTEGRATION OF MACHINE LEARNING TECHNIQUES TO EVALUATE DYNAMIC CUSTOMER SEGME...IJDKP
The telecommunications industry is highly competitive, which means that the mobile providers need a
business intelligence model that can be used to achieve an optimal level of churners, as well as a minimal
level of cost in marketing activities. Machine learning applications can be used to provide guidance on
marketing strategies. Furthermore, data mining techniques can be used in the process of customer
segmentation. The purpose of this paper is to provide a detailed analysis of the C.5 algorithm, within naive
Bayesian modelling for the task of segmenting telecommunication customers behavioural profiling
according to their billing and socio-demographic aspects. Results have been experimentally implemented.
Improving Credit Card Fraud Detection: Using Machine Learning to Profile and ...Melissa Moody
Researchers Navin Kasa, Andrew Dahbura, and Charishma Ravoori undertook a capstone project—part of the UVA Data Science Institute Master of Science in Data Science program—that addresses credit card fraud detection through a semi-supervised approach, in which clusters of account profiles are created and used for modeling classifiers.
Projection pursuit Random Forest using discriminant feature analysis model fo...IJECEIAES
Churn prediction is a major and pressing issue in the telecommunications industry. Churn describes a customer who defects from the current provider to a competitor in search of better service offers. Telecom companies typically maintain customer relationship management offices whose main objective is winning back defecting clients, because preserving long-term customers can be far more profitable than acquiring newly recruited ones. Researchers and practitioners have paid great attention to developing robust customer churn prediction models, especially in the telecommunication business, proposing numerous machine learning approaches. Many classification approaches have been established, but among the most effective in recent times are tree-based methods. The main contribution of this research is to predict churners/non-churners in the telecom sector using projection pursuit Random Forest (PPForest), which applies discriminant feature analysis as a novel extension of the conventional Random Forest for learning oblique projection pursuit trees (PPtree). The proposed methodology leverages two discriminant analysis methods to calculate the projection index used in constructing each PPtree: the first uses Support Vector Machines (SVM), while the second uses Linear Discriminant Analysis (LDA) to achieve linear splits of variables during oblique PPtree construction, producing individual classifiers that are more robust and diverse than those of the classical Random Forest. The proposed methods achieve the best performance on measurements such as accuracy, hit rate, ROC curve, lift, H-measure and AUC; moreover, PPForest based on LDA delivers an effective prediction model.
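A minimal sketch of the projection-index idea on a two-class, two-feature toy problem: the LDA direction w = Sw^-1 (mu1 - mu0) is computed by hand and used for one oblique split. This is an illustration of the concept only, with invented "churner"/"stayer" points, not the authors' PPForest implementation:

```python
# Sketch (not the authors' PPForest): an LDA-style projection direction
# w = Sw^-1 (mu1 - mu0) for two classes with two features, then one
# oblique split by thresholding the projected values.

def mean(rows):
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(len(rows[0]))]

def lda_direction(class0, class1):
    """Direction separating two classes relative to within-class scatter (2D case)."""
    m0, m1 = mean(class0), mean(class1)
    # Pooled within-class scatter matrix (2x2)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in ((class0, m0), (class1, m1)):
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            s[0][0] += d[0] * d[0]; s[0][1] += d[0] * d[1]
            s[1][0] += d[1] * d[0]; s[1][1] += d[1] * d[1]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def project(w, x):
    return w[0] * x[0] + w[1] * x[1]

# Two toy groups separable along an oblique axis (invented numbers)
churners = [[1.0, 2.1], [1.2, 2.0], [0.9, 1.8]]
stayers  = [[3.0, 0.5], [3.2, 0.9], [2.8, 0.7]]
w = lda_direction(churners, stayers)
threshold = (project(w, mean(churners)) + project(w, mean(stayers))) / 2
labels = [project(w, x) > threshold for x in churners + stayers]
```

A PPtree-style method repeats this kind of projected split recursively at each node; the sketch shows only a single oblique cut.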
The recruitment of new personnel is one of the most essential business processes affecting the quality of
human capital within any company. It is vital for companies to ensure the recruitment of the
right talent to maintain a competitive edge over others in the market. However, IT companies often face
a problem when recruiting new people for their ongoing projects due to the lack of a proper framework
that defines criteria for the selection process. In this paper we aim to develop a framework that would allow
any project manager to take the right decision for selecting new talent by correlating performance
parameters with the other domain-specific attributes of the candidates. Also, another important motivation
behind this project is to check the validity of the selection procedure often followed by various big
companies in both public and private sectors which focus only on academic scores, GPA/grades of students
from colleges and other academic backgrounds. We test if such a decision will produce optimal results in
the industry or whether there is a need for a change that offers a more holistic approach to recruiting new talent
in the software companies. The scope of this work extends beyond the IT domain and a similar procedure
can be adopted to develop a recruitment framework in other fields as well. Data-mining techniques provide
useful information from the historical projects depending on which the hiring-manager can make decisions
for recruiting high-quality workforce. This study aims to bridge this hiatus by developing a data-mining
framework based on an ensemble-learning technique to refocus on the criteria for personnel selection. The
results from this research clearly demonstrated that there is a need to refocus on the selection-criteria for
quality objectives.
An efficient feature selection algorithm for health care data analysisjournalBEEI
Diabetes is a silent killer, which will slowly harm a person if it goes undetected. Existing systems that use the F-score method and K-means clustering to check whether a person has diabetes are not 100% accurate, and anything short of 100% is hard to accept in the medical field, as it could cost the lives of many people. Our proposed system aims to use some of the best features of the existing algorithms to predict diabetes; this research work combines them into a novel algorithm, which aims to be 100% accurate in its prediction. With the surge in technological advancements, we can use data mining to predict when a person would be diagnosed with diabetes. Specifically, we analyze the best features of the chi-square algorithm and the advanced clustering algorithm (ACA). This research work uses the Pima Indian Diabetes dataset provided by the National Institute of Diabetes and Digestive and Kidney Diseases. Using classification methods, we consider factors such as age, BMI and blood pressure, weigh the overall importance given to these attributes, single them out, and use them for the prediction of diabetes.
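A hedged sketch of the chi-square scoring such a feature selector ranks attributes by, on invented binary data (not the Pima pipeline itself):

```python
# Sketch: chi-square score for one binary feature against a binary outcome,
# the kind of statistic a chi-square feature selector ranks attributes by.
# Toy data only; real selectors handle multi-valued/binned features too.

def chi_square(feature, outcome):
    """Chi-square statistic over the 2x2 contingency table of two binary lists."""
    n = len(feature)
    observed = {}
    for f, o in zip(feature, outcome):
        observed[(f, o)] = observed.get((f, o), 0) + 1
    stat = 0.0
    for f in (0, 1):
        for o in (0, 1):
            row = sum(1 for x in feature if x == f)
            col = sum(1 for y in outcome if y == o)
            expected = row * col / n
            if expected:
                stat += (observed.get((f, o), 0) - expected) ** 2 / expected
    return stat

# A feature perfectly aligned with the outcome scores high; a flat one scores 0
outcome       = [0, 0, 0, 0, 1, 1, 1, 1]
aligned       = [0, 0, 0, 0, 1, 1, 1, 1]
uninformative = [0, 1, 0, 1, 0, 1, 0, 1]
```

Ranking features by this score and keeping the top k is the essence of chi-square feature selection.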
Millennials and the use of social networking sites as a job searching tooljournalBEEI
This research examines the factors that influence the behavioural intention of millennials to use SNSs when seeking a job. The data was collected from respondents from the Generation Y demographic who are actively looking for jobs; respondents had to possess some experience in using SNSs when job hunting. The data was then analyzed using partial least squares (PLS), covering both the measurement and structural models of the study. The findings revealed that three of the constructs applied in TAM are statistically significant to behavioural intention. The three factors that influenced job seekers' intention to use SNSs as a job-search tool are perceived usefulness, perceived ease of use and privacy concerns. All of these factors contribute to and have a significant relationship with job seekers' intention to use SNSs, as verified using PLS data analysis. Recruiters or employers who intend to adopt SNSs in the recruitment process are advised to design recruitment plans that make the use of SNSs more convenient and user-friendly. This study provides insight and knowledge regarding the impact of technology on online job application and hiring processes.
A survey on discrimination deterrence in data miningeSAT Journals
Abstract
Data mining is a very important technology for extracting useful knowledge hidden in large data sets. There are, however, some negative perceptions about data mining, among them the potential for unfairly treating people who belong to specific groups. Classification rule mining has paved the way for automated decisions such as loan granting/denial and insurance premium computation, based on automated data collection and data mining techniques. If the training data sets are biased with respect to discriminatory attributes, discriminatory decisions may ensue. Antidiscrimination techniques, covering both discrimination discovery and discrimination prevention, have therefore been included in data mining. Discrimination can be direct or indirect: it is direct when decisions are made based on sensitive attributes, and indirect when decisions are made based on non-sensitive attributes that are strongly correlated with biased sensitive ones. The surveyed work addresses discrimination prevention in data mining and proposes new, improved techniques applicable to direct or indirect discrimination prevention, individually or both simultaneously. It discusses how to clean training data sets and outsourced data sets in such a way that direct and/or indirect discriminatory decision rules are converted to legitimate classification rules, proposes new metrics to evaluate the utility of the suggested methods, and compares these methods.
Keywords: Antidiscrimination, data mining, direct and indirect discrimination prevention, rule protection, rule generalization, privacy.
Corporate bankruptcy prediction using Deep learning techniquesShantanu Deshpande
Corporate Bankruptcy prediction using Recurrent neural networks – Aim is to build a recurrent neural network-based model to predict whether company will become bankrupt or not using financial ratios of Polish companies.
Methodologies & Tools: CRISP-DM, SMOTE-ENN, GA Algorithm, LSTM network (type of RNN)
DEEP-LEARNING-BASED HUMAN INTENTION PREDICTION WITH DATA AUGMENTATIONijaia
Data augmentation has been broadly applied in training deep-learning models to increase the diversity of
data. This study investigates the effectiveness of different data augmentation methods for deep-learning-based human intention prediction when only limited training data is available. A human participant pitches
a ball to nine potential targets in our experiment. We expect to predict which target the participant pitches
the ball to. Firstly, the effectiveness of 10 data augmentation groups is evaluated on a single-participant
data set using RGB images. Secondly, the best data augmentation method (i.e., random cropping) on the
single-participant data set is further evaluated on a multi-participant data set to assess its generalization
ability. Finally, the effectiveness of random cropping on fusion data of RGB images and optical flow is
evaluated on both single- and multi-participant data sets. Experiment results show that: 1) Data
augmentation methods that crop or deform images can improve the prediction performance; 2) Random
cropping can be generalized to the multi-participant data set (prediction accuracy is improved from 50%
to 57.4%); and 3) Random cropping with fusion data of RGB images and optical flow can further improve
the prediction accuracy from 57.4% to 63.9% on the multi-participant data set.
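The random-cropping augmentation found most effective above can be sketched on an image stored as a nested list; real pipelines act on tensors and typically resize the crop back to the input shape, but the sampling idea is the same (image and crop sizes below are invented):

```python
import random

# Sketch of random-cropping augmentation: sample a crop origin uniformly,
# then slice out a fixed-size window. Each call yields a different view
# of the same image, increasing training diversity.

def random_crop(image, crop_h, crop_w, rng=random):
    h, w = len(image), len(image[0])
    top = rng.randrange(h - crop_h + 1)
    left = rng.randrange(w - crop_w + 1)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]

image = [[r * 10 + c for c in range(6)] for r in range(4)]   # toy 4x6 "pixels"
crops = [random_crop(image, 2, 3) for _ in range(5)]
```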
COMPARISON OF BANKRUPTCY PREDICTION MODELS WITH PUBLIC RECORDS AND FIRMOGRAPHICScscpconf
Many business operations and strategies rely on bankruptcy prediction. In this paper, we aim to
study the impacts of public records and firmographics and predict bankruptcy over a 12-month-ahead
period using different classification models, adding value to traditionally used financial
ratios. Univariate analysis shows the statistical association and significance of
public records and firmographics indicators with the bankruptcy. Further, seven statistical
models and machine learning methods were developed, including Logistic Regression, Decision
Tree, Random Forest, Gradient Boosting, Support Vector Machine, Bayesian Network, and
Neural Network. The performance of the models was evaluated and compared based on
classification accuracy, Type I error, Type II error, and ROC curves on the hold-out dataset.
Moreover, an experiment was set up to show the importance of oversampling for rare event
prediction. The result also shows that Bayesian Network is comparatively more robust than
other models without oversampling.
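The oversampling experiment for rare-event prediction can be illustrated with plain random oversampling of the minority class, replicating rare (bankrupt) examples until classes are balanced; the class ratio below is invented:

```python
import random

# Sketch of random oversampling for rare events: replicate minority-class
# rows until every class has as many rows as the largest class.

def oversample(rows, labels, rng=random):
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    target = max(len(v) for v in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        out_rows.extend(members)
        out_labels.extend([y] * len(members))
        for _ in range(target - len(members)):
            out_rows.append(rng.choice(members))   # duplicate a minority row
            out_labels.append(y)
    return out_rows, out_labels

rows = [[i] for i in range(10)]
labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]   # 10% "bankrupt" minority (toy)
bal_rows, bal_labels = oversample(rows, labels)
```

Oversampling is applied only to the training split, never to the hold-out data used for evaluation.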
REVIEWING PROCESS MINING APPLICATIONS AND TECHNIQUES IN EDUCATIONijaia
Process Mining (PM) emerged from business process management but has recently been applied to
educational data and has been found to facilitate the understanding of the educational process.
Educational Process Mining (EPM) bridges the gap between process analysis and data analysis, based on
the techniques of model discovery, conformance checking and extension of existing process models. We
present a systematic review of the recent and current status of research in the EPM domain, focusing on
application domains, techniques, tools and models, to highlight the use of EPM in comprehending and
improving educational processes.
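The model-discovery side of EPM can be given a concrete flavor with a toy directly-follows count, the relation that discovery algorithms such as the alpha miner build on; the event log below is invented, not taken from the review:

```python
from collections import Counter

# Sketch of one building block of process-model discovery: count how often
# activity a is immediately followed by activity b across traces.

def directly_follows(log):
    df = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            df[(a, b)] += 1
    return df

# Each trace is one student's path through course activities (toy log)
log = [
    ["enrol", "watch", "quiz", "exam"],
    ["enrol", "watch", "watch", "quiz", "exam"],
    ["enrol", "quiz", "exam"],
]
df = directly_follows(log)
```

From these counts a discovery algorithm infers ordering, choice and loop constructs; conformance checking then compares new traces against the discovered model.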
MOVIE SUCCESS PREDICTION AND PERFORMANCE COMPARISON USING VARIOUS STATISTICAL...ijaia
Movies are among the most prominent contributors to the global entertainment industry today, and they
are among the biggest revenue-generating industries from a commercial standpoint. It's vital to divide
films into two categories: successful and unsuccessful. To categorize the movies in this research, a variety
of models were utilized, including regression models such as Simple Linear, Multiple Linear, and Logistic
Regression, clustering techniques such as SVM and K-Means, Time Series Analysis, and an Artificial
Neural Network. The models stated above were compared on a variety of factors, including their accuracy
on the training and validation datasets as well as the testing dataset, the availability of new movie
characteristics, and a variety of other statistical metrics. During the course of this study, it was discovered
that certain characteristics have a greater impact on the likelihood of a film's success than others. For
example, the existence of the genre action may have a significant impact on the forecasts, although another
genre, such as sport, may not. The testing dataset for the models and classifiers has been taken from the
IMDb website for the year 2020. The Artificial Neural Network, with an accuracy of 86 percent, is the best
performing model of all the models discussed.
Associative Regressive Decision Rule Mining for Predicting Customer Satisfact...csandit
Opinion mining, also known as sentiment analysis, involves customer satisfaction patterns, sentiments and attitudes toward entities, products, services and their attributes. With the rapid development of the Internet, potential customers provide large volumes of product/service reviews. Taxonomy-aware processing has been applied to this high volume of customer reviews, but it remained difficult to identify the best reviews. In this paper, an Associative Regression Decision Rule Mining (ARDRM) technique is developed to predict patterns for service providers and to improve customer satisfaction based on review comments. ARDRM performs two steps for improving customer satisfaction. Initially, the Machine Learning Bayes Sentiment Classifier (MLBSC) is used to assign class labels to each service review. After that, the regressive factor of the opinion words and the class labels are checked for association between words using various probabilistic rules. Based on these probabilistic rules, the effect of opinions and sentiments on customer reviews is analyzed to arrive at the specific set of services preferred by customers along with their review comments. The associative regressive decision rules help the service provider take decisions on improving customer satisfaction. The experimental results reveal that the ARDRM technique improves performance in terms of true positive rate, associative regression factor, regressive decision rule generation time and review detection accuracy for similar patterns.
INTEGRATION OF MACHINE LEARNING TECHNIQUES TO EVALUATE DYNAMIC CUSTOMER SEGME...IJDKP
The telecommunications industry is highly competitive, which means that the mobile providers need a
business intelligence model that can be used to achieve an optimal level of churners, as well as a minimal
level of cost in marketing activities. Machine learning applications can be used to provide guidance on
marketing strategies. Furthermore, data mining techniques can be used in the process of customer
segmentation. The purpose of this paper is to provide a detailed analysis of the C5.0 algorithm combined
with naive Bayesian modelling for segmenting telecommunication customers' behavioural profiles
according to their billing and socio-demographic aspects. The results have been experimentally validated.
Analytical CRM - Ecommerce analysis of customer behavior to enhance sales Shrikant Samarth
Task: You are required to choose a dataset (or related datasets) in an area of interest suitable for analyzing customer relationships.
Approach: Topic is chosen – Customer behavior Analysis in Ecommerce Industry for Enhancing Sales. Brazilian E-commerce public dataset was downloaded, cleaned and performed multiple regression in SPSS to check the relationship between the dependent variable and multiple independent variables.
Findings: Customers can be retained if the product is delivered on time; if there is a delay in delivery, it is the seller's duty to inform the customer. The payment method has proven to be an important parameter for enhancing sales over time. The analysis suggests that on-time delivery, flexibility in payment methods and good customer service help the seller gain customer trust, which in turn converts more sales.
Tools: IBM SPSS , Excel (pivot tables and charts), Tableau
APPLYING DATA MINING IN CUSTOMER RELATIONSHIP MANAGEMENTIJITCA Journal
In this article we define customer relationship management (CRM) and data mining overall, the factors linking data mining techniques and software to CRM, and the interaction between the two concepts. For this purpose we draw on past studies and reports on data mining, CRM and the relationship between them. Data mining's ability to extract latent, valuable customer information from large databases supports customer identification and retention, a step forward in attracting customers and ultimately achieving profitability and efficiency.
Proposed ranking for point of sales using data mining for telecom operatorsijdms
This study helps telecom companies make decisions that optimize their sales points to reduce costs, and
to identify profitable customers and churners. The study builds two research models: a physical model for
continuous mining of the database wherever it resides (i.e., just as we have On-Line Analytic Processing (OLAP),
we should have On-Line Data Mining (OLDM)), and a logical model using the Technology Acceptance Model.
Previous studies showed that using customers' basic information, call details and customer-service-related
data, a model can effectively achieve accurate predictions.
This research gives a new definition and classification of telecommunication services from the data
mining point of view. It then proposes a formula for the total rank of a shop, in which each term of the
formula contributes a sub-rank. The worked example shows that even a shop with lower population and
visitor numbers can still achieve a higher rank.
This research suggests that telecom operators should concentrate more on their e-shopping and e-payment
channels, as they are more cost-effective, and should use data from shops for marketing purposes. Some assumptions made
in this study need to be validated using surveys, and the proposed ranking should be applied to a live database.
A Study on CLTV Model in E Commerce Domains using Pythonijtsrd
A customer relationship management (CRM) system is an information management and analysis tool that can help businesses and other organizations manage their interactions with customers. CRMs were originally designed for large corporations, but the internet has allowed small business owners to take advantage of these tools as well. Customer data is collected in a CRM database, which allows advanced analysis such as customer segmentation and contact-history analysis. In this article we explain how an e-commerce company can apply its CRM system to analyze its customer base by CLTV, a key marketing metric for evaluating the impact and outcomes of the firm's customer relationship management strategies and tactics, in order to increase revenue through better marketing campaigns. E-commerce companies consider customers their most important asset and regard it as essential to estimate the potential value of this asset; hence, a model for calculating customer value is essential in these domains. We describe a general modeling approach, based on the BG/NBD and Gamma-Gamma models, for calculating customer value in the e-commerce domain. This model extends existing models from the field of direct marketing by taking into account a sample set of variables required for evaluating customer value in an e-commerce environment. In addition, we present an algorithm for generating this model from historical data, as well as an application of this modeling approach to the creation of a model for e-commerce. The model provides more accurate predictions than existing models regarding the future income generated by customers, implemented in Python.
Rasamallu Sai Bharath Reddy | Dr. T. Narayana Reddy, "A Study on CLTV Model in E-Commerce Domains using Python", International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-6, Issue-6, October 2022. URL: https://www.ijtsrd.com/papers/ijtsrd51952.pdf Paper URL: https://www.ijtsrd.com/other-scientific-research-area/other/51952/a-study-on-cltv-model-in-ecommerce-domains-using-python/rasamallu-sai-bharath-reddy
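As a simplified illustration of the customer-value idea: the sketch below is a historical-average heuristic, not the BG/NBD or Gamma-Gamma models the paper uses (those require likelihood-based parameter fitting); all figures are invented:

```python
# Hypothetical, simplified CLTV heuristic: average order value times purchase
# frequency times a projection horizon. BG/NBD additionally models dropout
# probability and Gamma-Gamma models spend heterogeneity; neither is shown here.

def simple_cltv(order_values, first_period, last_period, horizon_periods):
    """Historical-average CLTV estimate from one customer's order history."""
    avg_order = sum(order_values) / len(order_values)
    active_span = max(last_period - first_period, 1)
    frequency = len(order_values) / active_span      # orders per period
    return avg_order * frequency * horizon_periods

# A customer with 4 orders over 8 months, projected 12 months ahead (toy data)
cltv = simple_cltv([40.0, 55.0, 50.0, 35.0], first_period=0,
                   last_period=8, horizon_periods=12)
```

A probabilistic model would discount this figure by the customer's estimated probability of still being "alive", which is exactly what BG/NBD contributes.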
A widely used approach for gaining insight into the heterogeneity of consumers' buying behavior is market segmentation. Conventional market segmentation models often ignore the fact that consumers' behavior may evolve over time; as a result, retailers spend limited resources attempting to service unprofitable consumers. This study looks into the integration of enhanced Recency, Frequency, Monetary (RFM) scores with a Consumer Lifetime Value (CLV) matrix for a medium-size retailer in the State of Kuwait. A modified regression algorithm investigates the consumer purchase trend, gaining knowledge from a point-of-sales data warehouse. In addition, this study applies an enhanced normal distribution formula to remove outliers, followed by the soft-clustering Fuzzy C-Means and hard-clustering Expectation Maximization (EM) algorithms for the analysis of consumer buying behavior. Cluster quality assessment shows that the EM algorithm scales much better than the Fuzzy C-Means algorithm, with its ability to assign good initial points in the smaller dataset.
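The RFM scoring step that precedes clustering can be sketched as follows, using simple tercile scores on invented transactions (the study's enhanced scoring and outlier removal are not reproduced here):

```python
# Sketch: bucket each customer's recency, frequency and monetary value into
# rank-based scores (terciles here for brevity; quintiles are common).
# Recency is reversed because fewer days since last purchase is better.

def tercile_score(value, sorted_values, reverse=False):
    """1-3 score by rank among all observed values."""
    rank = sorted_values.index(value)
    score = 1 + (3 * rank) // len(sorted_values)
    return 4 - score if reverse else score

# customer -> (days since last purchase, number of orders, total spend) - toy data
customers = {
    "a": (5, 12, 900.0),
    "b": (40, 4, 250.0),
    "c": (90, 1, 60.0),
}
rec = sorted(v[0] for v in customers.values())
freq = sorted(v[1] for v in customers.values())
mon = sorted(v[2] for v in customers.values())
rfm = {c: (tercile_score(r, rec, reverse=True),
           tercile_score(f, freq),
           tercile_score(m, mon)) for c, (r, f, m) in customers.items()}
```

The resulting (R, F, M) triples are the feature vectors a clustering algorithm such as Fuzzy C-Means or EM would then group.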
Prediction of Corporate Bankruptcy using Machine Learning Techniques Shantanu Deshpande
The aim is to build a classification model to predict whether a company will become bankrupt using financial ratios of Polish companies. Applied various machine learning models such as Random Forest, KNN, AdaBoost and Decision Tree, with pre-processing techniques such as SMOTE-ENN (to deal with class imbalance) and feature selection (to identify the most relevant features), trained on the Polish bankruptcy dataset with a prediction accuracy of 89%.
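The SMOTE half of SMOTE-ENN can be sketched as interpolation between a minority point and one of its minority-class nearest neighbours; the ENN cleaning step (removing points misclassified by their neighbours) is omitted, and the feature values are invented:

```python
import random

# Sketch of SMOTE: each synthetic sample lies on the segment between a random
# minority point and one of its k nearest minority neighbours.

def smote_samples(minority, n_new, k=2, rng=random):
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()                       # position along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic

# Toy minority class (e.g. "bankrupt" rows as two financial ratios)
bankrupt = [[0.1, 0.9], [0.2, 1.1], [0.15, 1.0]]
new_points = smote_samples(bankrupt, n_new=4)
```

Because the new points are interpolations rather than copies, classifiers see a denser minority region instead of repeated rows.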
Making Analytics Actionable for Financial Institutions (Part I of III)Cognizant
To maximize ROI from their analytics platforms, financial institutions must build solutions that explicitly, visibly and sustainably enable real-time translation of data into meaningful and continuous improvements in their products, services, operating models and supporting infrastructures.
CUSTOMERS PERCEPTION TOWARDS CRM PRACTICES ADOPTED BY PUBLIC SECTOR BANKS IN ...IAEME Publication
The CRM practices are adopted to generate a better understanding of customers for product development, segmentation, appropriate targeting, campaign management and the maintenance of long-term, profitable and mutually beneficial relationships with customers, yet only a very small proportion of this potential has been utilized. Today's banking is known as innovative banking: driven by new technologies, changing customer preferences and increased competition, banks have made heavy investments in new distribution channels such as advanced automated teller machines, telephone systems and online banking. This research work is an empirical study intended to explore the technological revolution that commercial banks have witnessed and how far it has helped public sector banks build better customer relationship management (CRM) services.
Data Mining on Customer Churn ClassificationKaushik Rajan
Implemented multiple classifiers to classify if a customer will leave or stay with the company based on multiple independent variables.
Tools used:
> RStudio for Exploratory data analysis, Data Pre-processing and building the models
> Tableau and RStudio for Visualization
> LATEX for documentation
Machine learning models used:
> Random Forest
> C5.0
> Decision tree
> Neural Network
> K-Nearest Neighbour
> Naive Bayes
> Support Vector Machine
Methodology: CRISP-DM
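As a language-neutral illustration of one model from the list above, here is a toy k-nearest-neighbour vote on whether a customer churns; the original project was built in RStudio with full preprocessing, and the features, labels and query below are invented:

```python
from collections import Counter

# Sketch: majority vote among the k training rows closest to the query.

def knn_predict(train, labels, query, k=3):
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(range(len(train)), key=lambda i: dist(train[i], query))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# (tenure_years, monthly_spend) with churn labels - toy data
train = [(0.5, 80.0), (1.0, 75.0), (6.0, 30.0), (7.0, 25.0), (5.5, 35.0)]
labels = ["churn", "churn", "stay", "stay", "stay"]
prediction = knn_predict(train, labels, query=(0.8, 78.0))
```

In practice features are scaled before computing distances, since raw Euclidean distance is dominated by the attribute with the largest range.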
Similar to Clustering Prediction Techniques in Defining and Predicting Customers Defection: The Case of E-Commerce Context
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...IJECEIAES
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to precisely delineate tumor boundaries from magnetic resonance imaging (MRI) scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating the state-of-the-art DeepLabv3+ architecture with the ResNet18 backbone. The model is rigorously trained and evaluated, exhibiting remarkable performance metrics, including an impressive global accuracy of 99.286%, a class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of the proposed model. These findings underscore the model's competence in precise brain tumor localization and its potential to revolutionize medical image analysis and enhance healthcare outcomes. This research paves the way for future exploration and optimization of advanced CNN models in medical imaging, with emphasis on addressing false positives and resource efficiency.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Advanced control scheme of doubly fed induction generator for wind turbine us...IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
Neural network optimizer of proportional-integral-differential controller par...IJECEIAES
Wide application of proportional-integral-differential (PID)-regulator in industry requires constant improvement of methods of its parameters adjustment. The paper deals with the issues of optimization of PID-regulator parameters with the use of neural network technology methods. A methodology for choosing the architecture (structure) of neural network optimizer is proposed, which consists in determining the number of layers, the number of neurons in each layer, as well as the form and type of activation function. Algorithms of neural network training based on the application of the method of minimizing the mismatch between the regulated value and the target value are developed. The method of back propagation of gradients is proposed to select the optimal training rate of neurons of the neural network. The neural network optimizer, which is a superstructure of the linear PID controller, allows increasing the regulation accuracy from 0.23 to 0.09, thus reducing the power consumption from 65% to 53%. The results of the conducted experiments allow us to conclude that the created neural superstructure may well become a prototype of an automatic voltage regulator (AVR)-type industrial controller for tuning the parameters of the PID controller.
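For reference, the discrete PID update whose gains (kp, ki, kd) such a neural optimizer would tune can be sketched as follows; the gains, plant and setpoint are arbitrary illustrations, not the paper's AVR system:

```python
# Plain discrete PID controller: u = kp*e + ki*integral(e) + kd*de/dt.
# A neural superstructure like the paper's would adjust kp, ki, kd online.

class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measured):
        error = setpoint - measured
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Drive a trivial first-order plant (dy/dt = u - y) toward a setpoint of 1.0
pid = PID(kp=2.0, ki=0.5, kd=0.1, dt=0.1)
y = 0.0
for _ in range(500):
    u = pid.update(1.0, y)
    y += 0.1 * (u - y)
```

The integral term is what removes the steady-state error a proportional-only loop would leave; tuning trades off that correction speed against overshoot, which is the optimization target of the neural layer.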
An improved modulation technique suitable for a three level flying capacitor ...IJECEIAES
This research paper introduces an innovative modulation technique for controlling a 3-level flying capacitor multilevel inverter (FCMLI), aiming to streamline the modulation process in contrast to conventional methods. The proposed
simplified modulation technique paves the way for more straightforward and
efficient control of multilevel inverters, enabling their widespread adoption and
integration into modern power electronic systems. Through the amalgamation of
sinusoidal pulse width modulation (SPWM) with a high-frequency square wave
pulse, this controlling technique attains energy equilibrium across the coupling
capacitor. The modulation scheme incorporates a simplified switching pattern
and a decreased count of voltage references, thereby simplifying the control
algorithm.
A review on features and methods of potential fishing zoneIJECEIAES
This review focuses on the importance of identifying potential fishing zones in seawater for sustainable fishing practices. It explores features like sea surface temperature (SST) and sea surface height (SSH), along with classification methods such as classifiers. The features like SST, SSH, and different classifiers used to classify the data, have been figured out in this review study. This study underscores the importance of examining potential fishing zones using advanced analytical techniques. It thoroughly explores the methodologies employed by researchers, covering both past and current approaches. The examination centers on data characteristics and the application of classification algorithms for classification of potential fishing zones. Furthermore, the prediction of potential fishing zones relies significantly on the effectiveness of classification algorithms. Previous research has assessed the performance of models like support vector machines, naïve Bayes, and artificial neural networks (ANN). In the previous result, the results of support vector machine (SVM) were 97.6% more accurate than naive Bayes's 94.2% to classify test data for fisheries classification. By considering the recent works in this area, several recommendations for future works are presented to further improve the performance of the potential fishing zone models, which is important to the fisheries community.
Electrical signal interference minimization using appropriate core material f...IJECEIAES
As demand for smaller, quicker, and more powerful devices rises, Moore's law is strictly followed. The industry has worked hard to make little devices that boost productivity. The goal is to optimize device density. Scientists are reducing connection delays to improve circuit performance. This helped them understand three-dimensional integrated circuit (3D IC) concepts, which stack active devices and create vertical connections to diminish latency and lower interconnects. Electrical involvement is a big worry with 3D integrates circuits. Researchers have developed and tested through silicon via (TSV) and substrates to decrease electrical wave involvement. This study illustrates a novel noise coupling reduction method using several electrical involvement models. A 22% drop in electrical involvement from wave-carrying to victim TSVs introduces this new paradigm and improves system performance even at higher THz frequencies.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces the hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system including the required equation to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram which sets the priorities and requirements of the system is presented. The proposed approach allows setup to advance their power stability, especially during power outages. The presented information supports researchers and plant owners to complete the necessary analysis while promoting the deployment of clean energy. The result of a case study that represents a dairy milk farmer supports the theoretical works and highlights its advanced benefits to existing plants. The short return on investment of the proposed approach supports the paper's novelty approach for the sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line which enhances the safety of the electrical network
Bibliometric analysis highlighting the role of women in addressing climate ch...IJECEIAES
Fossil fuel consumption increased quickly, contributing to climate change
that is evident in unusual flooding and draughts, and global warming. Over
the past ten years, women's involvement in society has grown dramatically,
and they succeeded in playing a noticeable role in reducing climate change.
A bibliometric analysis of data from the last ten years has been carried out to
examine the role of women in addressing the climate change. The analysis's
findings discussed the relevant to the sustainable development goals (SDGs),
particularly SDG 7 and SDG 13. The results considered contributions made
by women in the various sectors while taking geographic dispersion into
account. The bibliometric analysis delves into topics including women's
leadership in environmental groups, their involvement in policymaking, their
contributions to sustainable development projects, and the influence of
gender diversity on attempts to mitigate climate change. This study's results
highlight how women have influenced policies and actions related to climate
change, point out areas of research deficiency and recommendations on how
to increase role of the women in addressing the climate change and
achieving sustainability. To achieve more successful results, this initiative
aims to highlight the significance of gender equality and encourage
inclusivity in climate change decision-making processes.
Voltage and frequency control of microgrid in presence of micro-turbine inter...IJECEIAES
The active and reactive load changes have a significant impact on voltage
and frequency. In this paper, in order to stabilize the microgrid (MG) against
load variations in islanding mode, the active and reactive power of all
distributed generators (DGs), including energy storage (battery), diesel
generator, and micro-turbine, are controlled. The micro-turbine generator is
connected to MG through a three-phase to three-phase matrix converter, and
the droop control method is applied for controlling the voltage and
frequency of MG. In addition, a method is introduced for voltage and
frequency control of micro-turbines in the transition state from gridconnected mode to islanding mode. A novel switching strategy of the matrix
converter is used for converting the high-frequency output voltage of the
micro-turbine to the grid-side frequency of the utility system. Moreover,
using the switching strategy, the low-order harmonics in the output current
and voltage are not produced, and consequently, the size of the output filter
would be reduced. In fact, the suggested control strategy is load-independent
and has no frequency conversion restrictions. The proposed approach for
voltage and frequency regulation demonstrates exceptional performance and
favorable response across various load alteration scenarios. The suggested
strategy is examined in several scenarios in the MG test systems, and the
simulation results are addressed.
Enhancing battery system identification: nonlinear autoregressive modeling fo...IJECEIAES
Precisely characterizing Li-ion batteries is essential for optimizing their
performance, enhancing safety, and prolonging their lifespan across various
applications, such as electric vehicles and renewable energy systems. This
article introduces an innovative nonlinear methodology for system
identification of a Li-ion battery, employing a nonlinear autoregressive with
exogenous inputs (NARX) model. The proposed approach integrates the
benefits of nonlinear modeling with the adaptability of the NARX structure,
facilitating a more comprehensive representation of the intricate
electrochemical processes within the battery. Experimental data collected
from a Li-ion battery operating under diverse scenarios are employed to
validate the effectiveness of the proposed methodology. The identified
NARX model exhibits superior accuracy in predicting the battery's behavior
compared to traditional linear models. This study underscores the
importance of accounting for nonlinearities in battery modeling, providing
insights into the intricate relationships between state-of-charge, voltage, and
current under dynamic conditions.
Smart grid deployment: from a bibliometric analysis to a surveyIJECEIAES
Smart grids are one of the last decades' innovations in electrical energy.
They bring relevant advantages compared to the traditional grid and
significant interest from the research community. Assessing the field's
evolution is essential to propose guidelines for facing new and future smart
grid challenges. In addition, knowing the main technologies involved in the
deployment of smart grids (SGs) is important to highlight possible
shortcomings that can be mitigated by developing new tools. This paper
contributes to the research trends mentioned above by focusing on two
objectives. First, a bibliometric analysis is presented to give an overview of
the current research level about smart grid deployment. Second, a survey of
the main technological approaches used for smart grid implementation and
their contributions are highlighted. To that effect, we searched the Web of
Science (WoS), and the Scopus databases. We obtained 5,663 documents
from WoS and 7,215 from Scopus on smart grid implementation or
deployment. With the extraction limitation in the Scopus database, 5,872 of
the 7,215 documents were extracted using a multi-step process. These two
datasets have been analyzed using a bibliometric tool called bibliometrix.
The main outputs are presented with some recommendations for future
research.
Use of analytical hierarchy process for selecting and prioritizing islanding ...IJECEIAES
One of the problems that are associated to power systems is islanding
condition, which must be rapidly and properly detected to prevent any
negative consequences on the system's protection, stability, and security.
This paper offers a thorough overview of several islanding detection
strategies, which are divided into two categories: classic approaches,
including local and remote approaches, and modern techniques, including
techniques based on signal processing and computational intelligence.
Additionally, each approach is compared and assessed based on several
factors, including implementation costs, non-detected zones, declining
power quality, and response times using the analytical hierarchy process
(AHP). The multi-criteria decision-making analysis shows that the overall
weight of passive methods (24.7%), active methods (7.8%), hybrid methods
(5.6%), remote methods (14.5%), signal processing-based methods (26.6%),
and computational intelligent-based methods (20.8%) based on the
comparison of all criteria together. Thus, it can be seen from the total weight
that hybrid approaches are the least suitable to be chosen, while signal
processing-based methods are the most appropriate islanding detection
method to be selected and implemented in power system with respect to the
aforementioned factors. Using Expert Choice software, the proposed
hierarchy model is studied and examined.
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi...IJECEIAES
The power generated by photovoltaic (PV) systems is influenced by
environmental factors. This variability hampers the control and utilization of
solar cells' peak output. In this study, a single-stage grid-connected PV
system is designed to enhance power quality. Our approach employs fuzzy
logic in the direct power control (DPC) of a three-phase voltage source
inverter (VSI), enabling seamless integration of the PV connected to the
grid. Additionally, a fuzzy logic-based maximum power point tracking
(MPPT) controller is adopted, which outperforms traditional methods like
incremental conductance (INC) in enhancing solar cell efficiency and
minimizing the response time. Moreover, the inverter's real-time active and
reactive power is directly managed to achieve a unity power factor (UPF).
The system's performance is assessed through MATLAB/Simulink
implementation, showing marked improvement over conventional methods,
particularly in steady-state and varying weather conditions. For solar
irradiances of 500 and 1,000 W/m2
, the results show that the proposed
method reduces the total harmonic distortion (THD) of the injected current
to the grid by approximately 46% and 38% compared to conventional
methods, respectively. Furthermore, we compare the simulation results with
IEEE standards to evaluate the system's grid compatibility.
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b...IJECEIAES
Photovoltaic systems have emerged as a promising energy resource that
caters to the future needs of society, owing to their renewable, inexhaustible,
and cost-free nature. The power output of these systems relies on solar cell
radiation and temperature. In order to mitigate the dependence on
atmospheric conditions and enhance power tracking, a conventional
approach has been improved by integrating various methods. To optimize
the generation of electricity from solar systems, the maximum power point
tracking (MPPT) technique is employed. To overcome limitations such as
steady-state voltage oscillations and improve transient response, two
traditional MPPT methods, namely fuzzy logic controller (FLC) and perturb
and observe (P&O), have been modified. This research paper aims to
simulate and validate the step size of the proposed modified P&O and FLC
techniques within the MPPT algorithm using MATLAB/Simulink for
efficient power tracking in photovoltaic systems.
Adaptive synchronous sliding control for a robot manipulator based on neural ...IJECEIAES
Robot manipulators have become important equipment in production lines, medical fields, and transportation. Improving the quality of trajectory tracking for
robot hands is always an attractive topic in the research community. This is a
challenging problem because robot manipulators are complex nonlinear systems
and are often subject to fluctuations in loads and external disturbances. This
article proposes an adaptive synchronous sliding control scheme to improve trajectory tracking performance for a robot manipulator. The proposed controller
ensures that the positions of the joints track the desired trajectory, synchronize
the errors, and significantly reduces chattering. First, the synchronous tracking
errors and synchronous sliding surfaces are presented. Second, the synchronous
tracking error dynamics are determined. Third, a robust adaptive control law is
designed,the unknown components of the model are estimated online by the neural network, and the parameters of the switching elements are selected by fuzzy
logic. The built algorithm ensures that the tracking and approximation errors
are ultimately uniformly bounded (UUB). Finally, the effectiveness of the constructed algorithm is demonstrated through simulation and experimental results.
Simulation and experimental results show that the proposed controller is effective with small synchronous tracking errors, and the chattering phenomenon is
significantly reduced.
Remote field-programmable gate array laboratory for signal acquisition and de...IJECEIAES
A remote laboratory utilizing field-programmable gate array (FPGA) technologies enhances students’ learning experience anywhere and anytime in embedded system design. Existing remote laboratories prioritize hardware access and visual feedback for observing board behavior after programming, neglecting comprehensive debugging tools to resolve errors that require internal signal acquisition. This paper proposes a novel remote embeddedsystem design approach targeting FPGA technologies that are fully interactive via a web-based platform. Our solution provides FPGA board access and debugging capabilities beyond the visual feedback provided by existing remote laboratories. We implemented a lab module that allows users to seamlessly incorporate into their FPGA design. The module minimizes hardware resource utilization while enabling the acquisition of a large number of data samples from the signal during the experiments by adaptively compressing the signal prior to data transmission. The results demonstrate an average compression ratio of 2.90 across three benchmark signals, indicating efficient signal acquisition and effective debugging and analysis. This method allows users to acquire more data samples than conventional methods. The proposed lab allows students to remotely test and debug their designs, bridging the gap between theory and practice in embedded system design.
Detecting and resolving feature envy through automated machine learning and m...IJECEIAES
Efficiently identifying and resolving code smells enhances software project quality. This paper presents a novel solution, utilizing automated machine learning (AutoML) techniques, to detect code smells and apply move method refactoring. By evaluating code metrics before and after refactoring, we assessed its impact on coupling, complexity, and cohesion. Key contributions of this research include a unique dataset for code smell classification and the development of models using AutoGluon for optimal performance. Furthermore, the study identifies the top 20 influential features in classifying feature envy, a well-known code smell, stemming from excessive reliance on external classes. We also explored how move method refactoring addresses feature envy, revealing reduced coupling and complexity, and improved cohesion, ultimately enhancing code quality. In summary, this research offers an empirical, data-driven approach, integrating AutoML and move method refactoring to optimize software project quality. Insights gained shed light on the benefits of refactoring on code quality and the significance of specific features in detecting feature envy. Future research can expand to explore additional refactoring techniques and a broader range of code metrics, advancing software engineering practices and standards.
Smart monitoring technique for solar cell systems using internet of things ba...IJECEIAES
Rapidly and remotely monitoring and receiving the solar cell systems status parameters, solar irradiance, temperature, and humidity, are critical issues in enhancement their efficiency. Hence, in the present article an improved smart prototype of internet of things (IoT) technique based on embedded system through NodeMCU ESP8266 (ESP-12E) was carried out experimentally. Three different regions at Egypt; Luxor, Cairo, and El-Beheira cities were chosen to study their solar irradiance profile, temperature, and humidity by the proposed IoT system. The monitoring data of solar irradiance, temperature, and humidity were live visualized directly by Ubidots through hypertext transfer protocol (HTTP) protocol. The measured solar power radiation in Luxor, Cairo, and El-Beheira ranged between 216-1000, 245-958, and 187-692 W/m 2 respectively during the solar day. The accuracy and rapidity of obtaining monitoring results using the proposed IoT system made it a strong candidate for application in monitoring solar cell systems. On the other hand, the obtained solar power radiation results of the three considered regions strongly candidate Luxor and Cairo as suitable places to build up a solar cells system station rather than El-Beheira.
An efficient security framework for intrusion detection and prevention in int...IJECEIAES
Over the past few years, the internet of things (IoT) has advanced to connect billions of smart devices to improve quality of life. However, anomalies or malicious intrusions pose several security loopholes, leading to performance degradation and threat to data security in IoT operations. Thereby, IoT security systems must keep an eye on and restrict unwanted events from occurring in the IoT network. Recently, various technical solutions based on machine learning (ML) models have been derived towards identifying and restricting unwanted events in IoT. However, most ML-based approaches are prone to miss-classification due to inappropriate feature selection. Additionally, most ML approaches applied to intrusion detection and prevention consider supervised learning, which requires a large amount of labeled data to be trained. Consequently, such complex datasets are impossible to source in a large network like IoT. To address this problem, this proposed study introduces an efficient learning mechanism to strengthen the IoT security aspects. The proposed algorithm incorporates supervised and unsupervised approaches to improve the learning models for intrusion detection and mitigation. Compared with the related works, the experimental outcome shows that the model performs well in a benchmark dataset. It accomplishes an improved detection accuracy of approximately 99.21%.
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has fewer comprehensive studies and sustainability assessments.
Understanding Inductive Bias in Machine LearningSUTEJAS
This presentation explores the concept of inductive bias in machine learning. It explains how algorithms come with built-in assumptions and preferences that guide the learning process. You'll learn about the different types of inductive bias and how they can impact the performance and generalizability of machine learning models.
The presentation also covers the positive and negative aspects of inductive bias, along with strategies for mitigating potential drawbacks. We'll explore examples of how bias manifests in algorithms like neural networks and decision trees.
By understanding inductive bias, you can gain valuable insights into how machine learning models work and make informed decisions when building and deploying them.
A review on techniques and modelling methodologies used for checking electrom...nooriasukmaningtyas
The proper function of the integrated circuit (IC) in an inhibiting electromagnetic environment has always been a serious concern throughout the decades of revolution in the world of electronics, from disjunct devices to today’s integrated circuit technology, where billions of transistors are combined on a single chip. The automotive industry and smart vehicles in particular, are confronting design issues such as being prone to electromagnetic interference (EMI). Electronic control devices calculate incorrect outputs because of EMI and sensors give misleading values which can prove fatal in case of automotives. In this paper, the authors have non exhaustively tried to review research work concerned with the investigation of EMI in ICs and prediction of this EMI using various modelling methodologies and measurement setups.
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTjpsjournal1
The rivalry between prominent international actors for dominance over Central Asia's hydrocarbon
reserves and the ancient silk trade route, along with China's diplomatic endeavours in the area, has been
referred to as the "New Great Game." This research centres on the power struggle, considering
geopolitical, geostrategic, and geoeconomic variables. Topics including trade, political hegemony, oil
politics, and conventional and nontraditional security are all explored and explained by the researcher.
Using Mackinder's Heartland, Spykman Rimland, and Hegemonic Stability theories, examines China's role
in Central Asia. This study adheres to the empirical epistemological method and has taken care of
objectivity. This study analyze primary and secondary research documents critically to elaborate role of
china’s geo economic outreach in central Asian countries and its future prospect. China is thriving in trade,
pipeline politics, and winning states, according to this study, thanks to important instruments like the
Shanghai Cooperation Organisation and the Belt and Road Economic Initiative. According to this study,
China is seeing significant success in commerce, pipeline politics, and gaining influence on other
governments. This success may be attributed to the effective utilisation of key tools such as the Shanghai
Cooperation Organisation and the Belt and Road Economic Initiative.
Literature Review Basics and Understanding Reference Management.pptxDr Ramhari Poudyal
Three-day training on academic research focuses on analytical tools at United Technical College, supported by the University Grant Commission, Nepal. 24-26 May 2024
Online aptitude test management system project report.pdfKamal Acharya
The purpose of on-line aptitude test system is to take online test in an efficient manner and no time wasting for checking the paper. The main objective of on-line aptitude test system is to efficiently evaluate the candidate thoroughly through a fully automated system that not only saves lot of time but also gives fast results. For students they give papers according to their convenience and time and there is no need of using extra thing like paper, pen etc. This can be used in educational institutions as well as in corporate world. Can be used anywhere any time as it is a web based application (user Location doesn’t matter). No restriction that examiner has to be present when the candidate takes the test.
Every time when lecturers/professors need to conduct examinations they have to sit down think about the questions and then create a whole new set of questions for each and every exam. In some cases the professor may want to give an open book online exam that is the student can take the exam any time anywhere, but the student might have to answer the questions in a limited time period. The professor may want to change the sequence of questions for every student. The problem that a student has is whenever a date for the exam is declared the student has to take it and there is no way he can take it at some other time. This project will create an interface for the examiner to create and store questions in a repository. It will also create an interface for the student to take examinations at his convenience and the questions and/or exams may be timed. Thereby creating an application which can be used by examiners and examinee’s simultaneously.
Examination System is very useful for Teachers/Professors. As in the teaching profession, you are responsible for writing question papers. In the conventional method, you write the question paper on paper, keep question papers separate from answers and all this information you have to keep in a locker to avoid unauthorized access. Using the Examination System you can create a question paper and everything will be written to a single exam file in encrypted format. You can set the General and Administrator password to avoid unauthorized access to your question paper. Every time you start the examination, the program shuffles all the questions and selects them randomly from the database, which reduces the chances of memorizing the questions.
Water billing management system project report.pdfKamal Acharya
Our project entitled “Water Billing Management System” aims is to generate Water bill with all the charges and penalty. Manual system that is employed is extremely laborious and quite inadequate. It only makes the process more difficult and hard.
The aim of our project is to develop a system that is meant to partially computerize the work performed in the Water Board like generating monthly Water bill, record of consuming unit of water, store record of the customer and previous unpaid record.
We used HTML/PHP as front end and MYSQL as back end for developing our project. HTML is primarily a visual design environment. We can create a android application by designing the form and that make up the user interface. Adding android application code to the form and the objects such as buttons and text boxes on them and adding any required support code in additional modular.
MySQL is free open source database that facilitates the effective management of the databases by connecting them to the software. It is a stable ,reliable and the powerful solution with the advanced features and advantages which are as follows: Data Security.MySQL is free open source database that facilitates the effective management of the databases by connecting them to the software.
ACEP Magazine edition 4th launched on 05.06.2024Rahul
This document provides information about the third edition of the magazine "Sthapatya" published by the Association of Civil Engineers (Practicing) Aurangabad. It includes messages from current and past presidents of ACEP, memories and photos from past ACEP events, information on life time achievement awards given by ACEP, and a technical article on concrete maintenance, repairs and strengthening. The document highlights activities of ACEP and provides a technical educational article for members.
Int J Elec & Comp Eng, Vol. 8, No. 4, August 2018: 2367-2383 (ISSN: 2088-8708)
concurrent. This is a common challenge for any online company. To retain customers, we therefore address the following question: How can we keep the customer retention rate from falling? Put differently, how can we reduce customer churn in the e-commerce context? According to Neslin et al. [3] and Burez [4], two basic approaches exist for resolving this issue. On the one hand, 'untargeted approaches' rely on superior products and mass advertising to increase brand loyalty and retain customers; a good example is AOL's effort to decrease churn through better software and content [5]. On the other hand, 'targeted approaches' rely on identifying potential churners and preventing their defection by targeting such customers with direct incentives [4], [6]-[9]. In this study, we are concerned with the second approach. Accordingly, we investigate whether we can identify the moment when customers begin to discontinue their relationship with an e-commerce website, so that retention programs can target them before they defect completely.
Customer relationship management, and customer churn prediction in particular, have received growing attention during the last decade. Table 1 summarizes the customer churn prediction models reported in the literature in recent years, giving the distinctive characteristics of each study in terms of sector, environment setting, defection type, and churn definition. Two major remarks emerge from Table 1. (1) Environment settings: a great number of studies address the contractual setting, which is characterized by the existence of a contract between the firm and the customer; in such a case, the date of churn is clearly known and matches the contract cancellation date. (2) Partial or total defection: most of those studies consider total defection, whilst only a few use prediction models to identify partial defection [6], [10]-[12]. Moreover, each of those studies defines customer churn differently, which raises the following question: Which definition is more appropriate?
Table 1 reveals that the churn issue has been under-researched in the e-commerce sector. Moreover,
all analyses in this sector consider total defection (defection column). To discover both partial and total
defection in the e-commerce sector, this study contributes to the extant literature in two important ways. First, it
combines the LRFM model and clustering techniques during a calibration period (T1) to segment all customers
into homogeneous clusters, after which an LRFM pattern is assigned to each cluster [13]. A change in the LRFM
pattern (a customer moving from a cluster with an important value in T1 to a group of lower value in the
prediction period T2) may be a signal of partial or total defection. Second, it introduces classification
techniques for building prediction models that predict both partial and total defection in order to minimize the
risk of churn.
On the other hand, contrary to research that seeks to retain only profitable customers [6], [7], [14],
[15] or that devotes its efforts to the entire customer base [9], [16], [17], our study centers not
only on the customers who belong to the clusters representing the core customers, but also on those who
demonstrate a positive change in their purchase behavior even if they are grouped in clusters that do not
contribute positively to profits.
Creating a retention program that targets all types of customers would be very costly for the
company. Conversely, by adopting a method that focuses only on profitable customers, companies, especially
those operating in the e-commerce field, can lose some customers: excluded from the retention programs,
these customers disengage, which increases the customer churn rate and in turn decreases profits. Such
customers really deserve the company's attention; they should not be eliminated, but rather placed in a
separate category. This is an important point because no company wants to miss the opportunity of
converting a previously dissatisfied customer into a loyal one. These are the customers who demonstrate a
positive change in their purchase behavior even though they are grouped in clusters that do not contribute
positively to profits. The identification of these customers is discussed in the following sections.
For example, consider a company whose goal is to retain only profitable customers. The company
should discover why customers leave for competitors. Suppose a churn analysis of its profitable customer
segment shows that some customers leave the e-commerce website because delivery is not free; the company
then decides to reduce delivery costs for the most profitable customers in order to retain them. However, the
less profitable customers do not benefit from this reduction; only the profitable customers are satisfied.
Targeting only profitable customers is therefore not an optimal strategy for increasing the retention rate,
because the fact that a group of customers was profitable in the past does not mean it will continue to be so in
the future [18].
The rest of this paper is organized in four further sections. The research method, including the
segmentation methods and data mining techniques used in this study, is briefly described in Section 2,
followed by an empirical study in Section 3 that demonstrates how the proposed prediction approach works
in practice, while Section 4 discusses the results. The paper finishes with the conclusion, limitations, and
some issues for future research.
Int J Elec & Comp Eng, Vol. 8, No. 4, August 2018: 2367 – 2383, ISSN: 2088-8708
Clustering Prediction Techniques in Defining and Predicting Customers Defection: … (Ait Daqud Rachid)
Table 1. Literature Review

Literature review | Sectors | Environment setting | Defection | Churn definition
Buckinx and van den Poel (2005) [6] | Retailing | Offline / Non-contractual | Partial | Customer shifts transaction pattern.
Shin-Yuan Hung, et al. (2006) [14] | Telecommunications | Offline / Contractual | Total | A churner is defined as a subscriber who leaves voluntarily.
Jae-Hyeon Ahn, et al. (2006) [12] | Telecommunications | Offline / Contractual | Partial and Total | Change in customer's status.
Burez and van den Poel (2007) [4] | Pay-TV subscription | Offline / Contractual | Total | Subscriber does not renew or pay.
Xie, et al. (2009) [19] | Financial services | Offline / Contractual | Total | Not mentioned.
Xiaobing Yu, et al. (2011) [17] | E-commerce | Online / Non-contractual | Total | Customer stops doing business.
V.L. Miguéis, et al. (2012) [10] | Retailing | Offline / Non-contractual | Partial | Customers who, from a certain period, did not buy anything, or who in all subsequent periods spent less than 40% of the amount spent in the reference period.
Bingquan Huang, et al. (2012) [20] | Telecommunications | Offline / Contractual | Total | Not mentioned.
Kristof Coussement, et al. (2013) [57] | Online gambling | Online / Non-contractual | Total | Gambler does not play during a period.
M. Clemente-Císcar, et al. (2014) [11] | Retailing | Offline / Non-contractual | Partial | Set of definitions of churn.
Faris (2014) [21] | Telecommunications | Offline / Contractual | Total | User leaves company.
A.T. Jahromi, et al. (2014) [7] | E-commerce (B2B) | Online / Non-contractual | Total | Inactivity in second half of the year.
K. Kyoungok, et al. (2014) [22] | Telecommunications | Offline / Contractual | Total | Customers who quit the service or transfer to competitors at a date close to a specific date.
M.A.H. Farquad, et al. (2014) [15] | Financial services | Offline / Contractual | Total | Shifting loyalties from one service provider to another.
Ozden Gur Ali and Umut Arıturk (2014) [8] | Financial services | Offline / Contractual | Total | Customer declared churned if her portfolio size falls below a specific threshold value and stays that way for six consecutive months.
Ssu-Han Chen (2016) [23] | E-commerce | Online / Non-contractual | Total | Customer with few sessions during the 8–12 months after registration.
Niccolò Gordini and Valerio Veglio (2017) [9] | E-commerce (B2B) | Online / Non-contractual | Total | The company has been active (i.e. has at least one transaction in the year) but has no activity (i.e. purchase) in the year.
N. Holtrop, et al. (2017) [24] | Financial services | Offline / Contractual | Total | Customer is with the insurer at the start of the year but no longer at the end of the year.
This study | E-commerce | Online / Non-contractual | Partial and Total | Customer shifts LRFM pattern, i.e. changes in purchase behavior.
2. RESEARCH METHOD
The purpose of this study is to build a customer churn prediction model for the e-commerce sector,
using clustering and prediction techniques to identify the customers who are likely to churn in the near
future, in order to minimize the risk of churn.
2.1. Customer profiling
Market segmentation is the process of identifying key groups within the general market that share
specific characteristics and consuming habits [25]. The RFM model, proposed by Hughes (1994) [26],
is one of the most common methods for segmenting customers and identifying their value. Clustering
techniques have been widely used to segment customers under the RFM model [13], [25], [27]-[29]. In this
section, we discuss k-means as the clustering technique and the LRFM model, the extended version of RFM
that considers the length (L) of the customer relationship, which we use for the customer profiling task.
2.1.1. RFM and LRFM models
The RFM model is an effective segmentation method and likewise a behavioral analysis that can be
employed for market segmentation [30], [31]. A. Hughes [30] describes the main asset of the RFM
method as, on the one hand, obtaining a behavioral analysis of customers in order to group them into
homogeneous clusters and, on the other hand, developing a marketing plan tailored to each specific market
segment. RFM analysis improves market segmentation by examining when a customer buys (recency), how
often (frequency), and how much money is spent (monetary) on a particular item or service [32]. A. Yang [32]
concluded that customers who had bought most recently, most frequently, and had spent the most money would
be much more likely to react to future promotions. Some researchers have tried to develop new RFM models by
adding parameters to the basic model, so as to examine whether they achieve better results than the basic
RFM model [33]-[35]. For example, Chang and Tsay [36] propose the LRFM model, which takes the
length of the customer relationship into account in order to resolve the RFM model's difficulty in
distinguishing between customers who have long-term and short-term relationships with the company. In
addition, S. Chow and R. Holden [37] suggest that a customer's loyalty and profitability depend on the
relationship between the company and its customers. In this regard, in order to identify the most loyal customers,
it is necessary to consider the length (L) of the customer relationship, where L is defined as the number of time
periods (such as days) from the first purchase to the last purchase in the database.
2.1.2. K-means method
K-means is the most common clustering algorithm; it partitions n vectors into k clusters, where
k < n, based on their attributes. The name comes from the fact that k clusters are identified, and the center
of each cluster is the mean of all vectors within it. The algorithm starts by choosing k random initial
centroids, then assigns each vector to the nearest centroid using the Euclidean distance and recalculates the
new centroids as the means of the assigned vectors. This process is repeated until vectors no longer change
clusters between iterations [38].
However, in the k-means technique the number of clusters must be specified in advance, which means that
the clustering result becomes unreliable if the supposed number of clusters is incorrect [39], [40]. This
raises the following fundamental question: How do we choose the right number of clusters (k)?
Efficient clustering quality indexes can help determine the best number. In this study, we use two
methods for determining the optimal number of clusters for k-means, each based on optimizing a criterion:
the within-cluster sum of squares and the average silhouette, corresponding to the elbow and silhouette
methods, respectively. Specifically, the sum of squared errors (SSE) and the average silhouette coefficient,
shown in Equations (1) and (2) respectively, are combined to measure the quality of the clustering and to
determine the optimal number of clusters. We applied the k-means technique under different values of k and
plotted the curves of the SSE and the average silhouette coefficient against the number of clusters, then
analyzed the two curves to identify the optimal number of clusters. The optimal number can be found by
looking for the number of clusters at which a knee, peak, or dip exists in the plot of the evaluation measure
against the number of clusters [41].
SSE = Σ_{i=1}^{k} Σ_{y_j ∈ C_i} ||y_j − c_i||²  (1)

where k is the number of clusters, y_j is the jth object in cluster C_i, and c_i is the center of cluster C_i.

s_i = (b_i − a_i) / max(a_i, b_i)  (2)
where a_i is the average distance from object i to all other objects in its cluster; for object i and any cluster not
containing it, the average distance from object i to all objects in that cluster is computed, and b_i is the
minimum of these values over all such clusters. The details of SSE and the silhouette can be found in [42]
and [43], respectively.
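To make the two criteria concrete, the following Python sketch (our own illustration; the study itself runs k-means in KNIME) implements plain k-means, the SSE of Equation (1), and the average silhouette coefficient of Equation (2), and evaluates both criteria for several values of k on synthetic data standing in for the LRFM vectors:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means: random initial centroids, Euclidean assignment,
    centroid update as cluster means, until assignments stabilize."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    sse = float(((X - centroids[labels]) ** 2).sum())   # Equation (1)
    return labels, centroids, sse

def mean_silhouette(X, labels):
    """Average of s_i = (b_i - a_i) / max(a_i, b_i), Equation (2)."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    s = np.zeros(n)
    for i in range(n):
        same = labels == labels[i]
        a = D[i, same & (np.arange(n) != i)].mean() if same.sum() > 1 else 0.0
        b = min(D[i, labels == c].mean() for c in set(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return float(s.mean())

# Three well-separated synthetic clusters of 4-dimensional (LRFM-like) vectors.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.3, size=(50, 4)) for m in (0, 5, 10)])
for k in (2, 3, 4, 5):
    labels, _, sse = kmeans(X, k)
    print(k, round(sse, 1), round(mean_silhouette(X, labels), 3))
```

Plotting the printed SSE and silhouette values against k reproduces the elbow/peak reading described above.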
This study combines k-means and the LRFM model in the e-commerce sector to divide the customer base
into homogeneous clusters according to their L, R, F and M values. Similarly to Chang and Tsay [36], we
compare the average L, R, F and M values of each cluster with the total averages over all clusters. If the
average value of an attribute in a cluster is greater than the total average, the attribute is marked with an up
arrow (↑); if it is less than the total average, it is marked with a down arrow (↓). For example, R↑ (higher R
value) means the customers have recently made a purchase, while R↓ (lower R value) means they have not
bought on the online store for a long time.
Building on Ha and Park [44], Chang and Tsay [36] further proposed a customer classification that
groups the sixteen combinations of the LRFM model into five kinds of customer groups according to their
LRFM patterns: core customers, potential customers, lost customers, new customers, and resource-
consumption customers. Specifically, core customers include L↑R↑F↑M↑, L↑R↑F↑M↓, and L↑R↑F↓M↑.
Potential customers consist of L↑R↓F↑M↑, L↑R↓F↑M↓, and L↑R↓F↓M↑. Lost customers are composed of
L↓R↓F↑M↑, L↓R↓F↑M↓, L↓R↓F↓M↑, and L↓R↓F↓M↓. New customers comprise L↓R↑F↓M↓, L↓R↑F↑M↓,
L↓R↑F↓M↑, and L↓R↑F↑M↑. Finally, resource-consumption customers are L↑R↑F↓M↓ and L↑R↓F↓M↓.
When different LRFM combinations are identified during a period T, customers can be classified
into the appropriate groups: core customers, potential customers, lost customers, new customers, and
resource-consumption customers. We focus, first, on the customers belonging to the core customer and new
customer groups during period T (no company wants to miss new customers) and, second, on those
belonging during period T to the other remaining groups who are subsequently converted into core
customers in T+1. More specifically, the customers in our clusters of attention are:
a. Those whose LRFM pattern belongs to the core or new customer patterns during period T.
b. Those who do not belong in period T to the patterns listed in (a), but whose LRFM pattern is transformed
into one of those patterns in period T+1.
Customers who were clustered during period T with the potential, lost, or resource-consumption customers,
and who stay in the same group or move to a lower-value group in T+1, are removed.
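As an illustration, the sixteen pattern-to-group assignments above can be encoded directly (a Python sketch of our own; '+' and '-' stand in for ↑ and ↓, and the dictionary-based layout is an assumption, not the paper's implementation):

```python
# The five customer groups of Chang and Tsay [36], as sets of LRFM patterns.
CORE      = {"L+R+F+M+", "L+R+F+M-", "L+R+F-M+"}
POTENTIAL = {"L+R-F+M+", "L+R-F+M-", "L+R-F-M+"}
LOST      = {"L-R-F+M+", "L-R-F+M-", "L-R-F-M+", "L-R-F-M-"}
NEW       = {"L-R+F-M-", "L-R+F+M-", "L-R+F-M+", "L-R+F+M+"}
RESOURCE  = {"L+R+F-M-", "L+R-F-M-"}

def lrfm_pattern(cluster_mean, overall_mean):
    """'+' when the cluster average exceeds the total average, '-' otherwise."""
    return "".join(f"{v}{'+' if cluster_mean[v] > overall_mean[v] else '-'}"
                   for v in ("L", "R", "F", "M"))

def customer_group(pattern):
    """Map one of the sixteen LRFM patterns to its customer group."""
    for name, patterns in [("core", CORE), ("potential", POTENTIAL),
                           ("lost", LOST), ("new", NEW),
                           ("resource", RESOURCE)]:
        if pattern in patterns:
            return name
    raise ValueError(f"unknown pattern: {pattern}")
```

The five sets partition all sixteen combinations, so every cluster's pattern maps to exactly one group.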
2.2. Partial and total churning
One of the first main hurdles faced in customer churn prediction in non-contractual
businesses is the difficulty of defining churn, because the characteristics that should be observed to say
that a customer has totally or partially defected are not clearly defined [11].
To solve this problem of defining customer churn, the LRFM model and the clustering
technique (k-means) are combined. This study proposes a new procedure that feeds the quantitative values of
the LRFM attributes, extracted during a period T, into the k-means algorithm to identify the different types of
customer profiles (different LRFM patterns). We then define a customer's LRFM pattern change from a core
or a new customer to a potential customer or to a low-consuming resource customer group as partial
defection. By the same token, if a customer's LRFM pattern changes from a core or new customer to a lost
customer or to a high-consuming resource customer group, we are talking about total defection. This
indicates that a change in a customer's LRFM pattern is an early signal of either partial or total defection,
whereas customers who stay true to their existing positive patterns are likely to stay.
For this purpose, as shown in Figure 1, we consider two equal sub-periods, T1 and T2. T1 is used to
determine the different customer groups (different LRFM patterns) and assign each customer to its
appropriate group. T2 is used to determine partial or total defection. Figure 2 illustrates our
proposed approach to defining partial and total defection, and the full process is summarized in Figure 3.
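The T1-to-T2 transition rules can be summarized in a few lines (a sketch under our reading of the definitions above; the group names are the five LRFM-based groups, with the resource-consumption group split into hypothetical low- and high-consuming variants as in the text):

```python
ATTENTION = {"core", "new"}   # the clusters of attention in T1

def churn_status(group_t1, group_t2):
    """Three-class target from a customer's T1 -> T2 group transition."""
    if group_t1 not in ATTENTION:
        return None                                # outside the clusters of attention
    if group_t2 in ATTENTION:
        return "Non-churned"                       # positive pattern retained
    if group_t2 in {"potential", "resource_low"}:
        return "Partially-churned"                 # partial defection signal
    if group_t2 in {"lost", "resource_high"}:
        return "Totally-churned"                   # total defection signal
    raise ValueError(f"unknown group: {group_t2}")
```

For example, a core customer in T1 assigned to the potential group in T2 is labeled Partially-churned, while one assigned to the lost group is labeled Totally-churned.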
Figure 1. Period of observation. T1, a period of eight months (November 2013 to June 2014), was used to
derive the independent variables of the model (calibration period); T2, a period of eight months (July 2014 to
February 2015), was used to derive the dependent variable (prediction period)
Figure 2. Deviation of customers' LRFM patterns over time to define customer churn
Figure 3. Methodology for defining churn in non-contractual settings based on the LRFM model and the
k-means technique: customers with sessions in T1 and in T2 are selected and their L, R, F and M values
extracted; T1 values are Z-score normalized and T2 values are normalized with the parameters obtained in
T1; the best number of clusters (K), determined with the SSE and silhouette methods, is used by k-means to
segment the T1 customers into K clusters, and a cluster assigner then assigns the T2 customers to the nearest
existing T1 cluster; an LRFM pattern change from a core or new customer (T1) to a potential or
low-consuming resource customer group (T2) marks partial defection, and a change to a lost or
high-consuming resource customer group (T2) marks total defection
2.3. Classification techniques
The objective of this research is to develop a predictive model for customer churn in a non-
contractual setting, able to distinguish between customers who are likely to partially or totally churn in the
near future and those who are likely to stay with the company, based on a customer's historical transactions
and characteristics. To reach this goal, three models are proposed: the first based on decision trees (DT), the
second on artificial neural networks (ANN), and the third on an ensemble of decision trees. All our models
are constructed using KNIME Analytics Platform 3.3.2. The following is a short description of these
well-known data mining techniques.
2.3.1. Artificial neural networks (ANN)
Unlike conventional statistical methods, artificial neural networks do not need any hypothesis on
the variables; they are well suited to handling unstructured, complex problems, i.e., problems for which the
form of the relationships between variables cannot be specified a priori.
Neural networks can be divided into the single-layer perceptron and the multilayer perceptron (MLP);
in this paper we use the MLP structure, which supports the most diverse applications. An MLP network
is generally composed of a finite set of cells (neurons), organized in successive layers. The first layer,
comprising several neurons, is called the input layer, the last layer is the output layer, and the intermediate
layers (if any) are the hidden layers. Neurons in adjacent layers are connected by weighted links, and
nonlinear functions such as the sigmoid or the hyperbolic tangent are used as activation functions in the
multilayer perceptron. The details of the MLP can be found in [45].
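For illustration, a forward pass through such a network can be sketched as follows (a minimal example of our own; the study itself trains its MLP inside KNIME, and the toy weights below are arbitrary, not trained values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, weights, biases):
    """Propagate an input vector through successive layers; each layer
    applies a weighted sum followed by the sigmoid activation."""
    a = np.asarray(x, dtype=float)
    for W, b in zip(weights, biases):
        a = sigmoid(a @ W + b)
    return a

# A toy 4-input, 3-hidden, 3-output network (e.g. LRFM values in,
# three churn classes out).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 3)), np.zeros(3)
out = mlp_forward([0.5, -1.0, 0.2, 0.8], [W1, W2], [b1, b2])
```

Each output lies in (0, 1) because of the sigmoid, so the three values can be compared to pick the most likely class.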
2.3.2. Simple decision tree (DT)
The decision tree (DT) is one of the most widely used data mining techniques for knowledge discovery,
usually employed for classification and prediction [46]. The simplicity and ease of interpreting the
observed results by decision makers are the main reasons for its popularity in business compared to other
prediction techniques [47]. DT development usually consists of two distinct stages: tree building and tree
pruning. First, the technique searches the training set for the attribute offering the best information gain at
the root node, and then splits the tree into sub-trees. The same procedure is applied recursively to partition
each sub-tree, and the partitioning stops when a leaf node is reached. Once the tree is created, rules can be
extracted by traversing the tree until a leaf node is reached. Several algorithms, such as C4.5, C5.0, CHAID
and CART, are used to produce the trees; in this study we consider the C4.5 algorithm. The details of DT can
be found in [48], [49].
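The attribute-selection step can be illustrated with a small information-gain computation (a sketch of the criterion only, not KNIME's implementation; note that C4.5 actually normalizes this quantity into the gain ratio):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, split_values):
    """Entropy reduction from splitting `labels` by a categorical attribute
    whose value for each instance is given in `split_values`."""
    n = len(labels)
    groups = {}
    for y, v in zip(labels, split_values):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

An attribute that separates the classes perfectly has gain equal to the parent entropy, while an uninformative attribute has gain zero; the tree builder picks the attribute with the highest gain at each node.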
2.3.3. Decision tree ensemble (DTE)
Despite the advantages of the decision tree method mentioned above, it also has some
disadvantages: Dudoit, Fridlyand, and Speed [50], for example, note its suboptimal performance and lack of
robustness. One of the best ways to overcome these weaknesses is to build an ensemble of trees and vote for
the most popular class [51].
In this regard, we use both the Tree Ensemble Learner and the Tree Ensemble Predictor nodes of
KNIME to build our third model, which is based on a decision tree ensemble.
The Tree Ensemble Learner node builds an ensemble of decision trees as a variant of the random
forest. Each decision tree model is trained on a different subset of rows and/or a different subset of columns,
randomly selected at each iteration; the output model is thus an ensemble of differently trained decision tree
models. The learning parameters of the decision trees are similar to those of the random forest classifier
described by Leo Breiman [51]. The Tree Ensemble Predictor node applies all decision trees to each data row
and uses the simple majority vote for prediction.
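The prediction step amounts to a simple majority vote, which can be sketched as follows (illustrative only; the `trees` here are any fitted classifiers exposed as callables, not KNIME objects):

```python
from collections import Counter

def ensemble_predict(trees, row):
    """Apply every tree to the data row and return the most popular class."""
    votes = Counter(tree(row) for tree in trees)
    return votes.most_common(1)[0][0]

# Three stand-in "trees" voting on one customer row.
trees = [lambda r: "Totally-churned",
         lambda r: "Totally-churned",
         lambda r: "Non-churned"]
```

With an odd number of trees and a binary or three-class target, the vote always produces a single winning class (ties are broken by first occurrence in `Counter`).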
3. EMPIRICAL STUDY
3.1. General
The data analyzed in this research were provided by one of the biggest online retailers in Morocco,
specialized in electronics, fashion, home appliances and children's items. When customers visit
the website, the system records their login, logout, shopping process and the final state of each session. A
customer can generate four types of events, namely "Session with Product Views", "Session with Add to Cart",
"Session with Check-Out", and "Session with Transactions". The dataset consists of 2783 customers who
visited the e-commerce website. Specifically, the dataset contains information at the individual customer
level, such as customer registration, login, session, transaction and web log data from the e-commerce website.
Transactional records of customers for the period November 1, 2013 through February 28, 2015 were
utilized.
Customers have four modes of payment: cash on delivery, online credit card, bank transfer and
payment in three installments.
The transactional records for each customer must be transformed into a usable format for the LRFM
model. From the integrated dataset, the L, R, F and M variables were extracted for each customer.
The definition of the LRFM model used in this study is shown in Table 2, and the descriptive statistics for
the LRFM variables in T1 are provided in Table 3.
Table 2. The Definitions of the LRFM Model
Attribute name | Data content
Length (L)     | The number of days from the first to the last purchase.
Recency (R)    | The number of days between the first day of the study period and the day of the last purchase.
Frequency (F)  | The number of transactions observed in the period analyzed.
Monetary (M)   | The total amount spent by the customer in the period analyzed (Moroccan dirhams).
Table 3. The Descriptions of Length, Recency, Frequency and Monetary in T1
Variables Max Min Average Standard deviation
Length (L) 813 2 656.68 192.87
Recency (R) 241 1 164.77 76.05
Frequency (F) 17 1 8.67 4.99
Monetary (M) 13,723.00 87.00 4431.15 4327.72
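The transformation of a customer's transaction log into the four variables of Table 2 can be sketched as follows (our own helper; `purchases` is a hypothetical list of (date, amount) pairs for one customer, not the retailer's actual schema):

```python
from datetime import date

def lrfm(purchases, period_start):
    """Compute L, R, F, M per the definitions in Table 2."""
    dates = sorted(d for d, _ in purchases)
    L = (dates[-1] - dates[0]).days             # first to last purchase
    R = (dates[-1] - period_start).days         # study-period start to last purchase
    F = len(purchases)                          # number of transactions
    M = sum(amount for _, amount in purchases)  # total spent (Moroccan dirhams)
    return L, R, F, M

# One customer with three purchases inside the T1 window.
t1_start = date(2013, 11, 1)
example = [(date(2013, 11, 10), 250.0),
           (date(2014, 1, 5), 120.0),
           (date(2014, 6, 2), 400.0)]
```

Note that with this definition a higher R means a more recent last purchase, consistent with the R↑ reading used for the LRFM patterns.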
9. Int J Elec & Comp Eng ISSN: 2088-8708
Clustering Prediction Techniques in Defining and Predicting Customers Defection: … (Ait Daqud Rachid)
2375
3.2. Clustering by K-means based on LRFM variables
The first eight-month period of the available data, from November 2013 to June 2014 (T1), is
used to identify the different customer groups (different LRFM patterns). Consequently, the 2692 customers
who visited the e-commerce website in this period are selected.
The proposed model described in Section 2 is applied using KNIME Analytics Platform 3.3.2.
We find seven clusters of customers with different LRFM behaviors. The optimal number
of clusters (k=7) is obtained with the elbow and silhouette methods. Figure 4 shows the plots of the SSE and
the average silhouette coefficient versus the number of clusters for k-means: a distinct knee in the SSE and a
distinct peak in the silhouette coefficient are present when the number of clusters is equal to 7.
Figure 4. Elbow and average silhouette methods for determining the optimal number of clusters
Table 4 summarizes these seven clusters, each with its number of customers, average length (L),
average recency (R), average frequency (F), average monetary (M) and, in the last column, the LRFM pattern
of the cluster. Most of the customers are in clusters 1, 3 and 5, whereas cluster 6 includes the minimum
number of customers (only 77).
As mentioned earlier, we focus our study on the customers belonging to the core customer patterns
(clusters 2, 3 and 4) and the high-value new customers (cluster 0), which together represent 51.23%
of the total available customer database.
Table 4. Descriptive Statistics of the Seven Clusters based on the K-Means Method in T1
Cluster   | Count | Mean(L) | Mean(R) | Mean(F) | Mean(M)  | Pattern
cluster_0 | 332   | 282.31  | 211.26  | 12.41   | 8204.28  | L↓R↑F↑M↑
cluster_1 | 760   | 752.87  | 48.31   | 4.47    | 857.62   | L↑R↓F↓M↓
cluster_2 | 375   | 707.71  | 206.59  | 13.34   | 2187.08  | L↑R↑F↑M↓
cluster_3 | 509   | 742.81  | 210.91  | 14.65   | 10817.28 | L↑R↑F↑M↑
cluster_4 | 210   | 741.10  | 209.95  | 6.97    | 8266.72  | L↑R↑F↓M↑
cluster_5 | 428   | 699.51  | 212.89  | 3.93    | 1063.69  | L↑R↑F↓M↓
cluster_6 | 77    | 35.52   | 214.66  | 2.91    | 405.12   | L↓R↑F↓M↓
In the second period, T2 (from July 2014 to February 2015), we introduce the cluster assigner node
(which assigns the customers existing in T2 to the groups obtained by k-means in T1) to determine which
customers moved from the core customer groups in T1 to defector groups during the subsequent eight-month
period. Applying our partial/total churn definition described in Section 2.2 results in 254 partial defections
(17.81% = 254/1426) and 363 total defections (25.45% = 363/1426), where 1426 is the number of customers
under investigation (cluster_0 + cluster_2 + cluster_3 + cluster_4 = 1426).
3.3. Variables operationalization
3.3.1. Predictors (independent variables)
A major part of the existing studies on customer churn prediction incorporates two groups of
information: behavioral information and customer demographics. According to several studies, such as
Coussement and Van den Poel [52], Guadagni and Little [53], Rossi et al. [54], and Tamaddoni Jahromi et
al. [7], demographic data (i.e., gender, age, address, profession, etc.) have less impact on churn prediction.
For this reason, our study is based only on behavioral information at the level of the individual customer (the
independent variables); this allows us, on the one hand, to keep the models in their simplest form and, on the
other hand, to maximize their predictive power.
Compared with traditional transaction channels, the biggest advantage of e-commerce is that the
navigation data of all visits made by customers to the e-commerce site are stored on its servers. From this
behavioral and transactional information at the level of the individual customer (pages viewed, sequence of
visits, purchase process, number of transactions, etc.), and in addition to the RFM variables, many indicators
can be extracted [55] and used as predictor variables by our models, improving their power to distinguish
between customers who totally churn, those who partially defect, and those who remain loyal. An overview
of all the extracted variables used in this study is presented in Table 5.
Table 6 summarizes the behavioural independent variables supported by previous research in both
offline and online environments. The recency, frequency and monetary variables are the most popular
predictors of customer churn in online environments. The variables describing the dropout rates at each step
of the buying process, the length of the relationship (L), the average interpurchase time (ITP) and the mode
of payment (Mopayment) have been used infrequently in prior research; therefore, in order to assess their
importance in predicting customer churn, we take them into account.
Table 5. Predictor Variables
Variable type | Variable name | Description
Recency | R | Number of days between the first day of the study period and the day of the last purchase in the calibration period (0 <= R <= T1).
Frequency | F | Number of purchases observed during the calibration period (T1).
Frequency | R_change.F | Relative change in the number of purchases in the second half of the calibration period (F.T1.2) compared with the first half (F.T1.1), i.e. R_change.F = (F.T1.2 - F.T1.1) / F.T1.1
Monetary | M | Total monetary amount of purchases in the calibration period (T1).
Length of relationship | L | Number of days from the first to the last purchase.
Interpurchase time | ITP | Average number of days between purchases.
Interpurchase time | R_change.ITP | Relative change in the interpurchase time in the second half of the calibration period (ITP.T1.2) compared with the first half (ITP.T1.1), i.e. R_change.ITP = (ITP.T1.2 - ITP.T1.1) / ITP.T1.1
Mode of payment | Mopayment | The most frequent mode of payment used in the last three transactions.
Dropout rate | Last_session_abandoned | Whether the last session was abandoned at the check-out step (yes, no).
Dropout rate | aband_rate(allvisit to productviews)T1.2 | Percentage of sessions that abandoned the buying process at the "Product views" step in the second half of the calibration period (T1.2).
Dropout rate | aband_rate(productviews to addcart)T1.2 | Abandonment rate for a customer when moving from the "Product views" step to the "Add to cart" step in T1.2.
Dropout rate | aband_rate(addcart to checkout)T1.2 | Abandonment rate for a customer when moving from the "Add to cart" step to the "Check-out" step in T1.2.
Dropout rate | aband_rate(checkout to transaction)T1.2 | Abandonment rate for a customer when moving from the "Check-out" step to the "Transaction" step in T1.2.
Dropout rate | aband_rate(allvisit to transaction)T1.2 | Percentage of sessions that abandoned the buying process at the "Check-out" step in T1.2.
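The two change variables and the abandonment-rate family in Table 5 reduce to simple ratios, sketched here with hypothetical helper names of our own:

```python
def relative_change(second_half, first_half):
    """R_change.F and R_change.ITP from Table 5:
    (value in T1.2 - value in T1.1) / value in T1.1."""
    return (second_half - first_half) / first_half

def abandonment_rate(sessions_reaching_step, sessions_completing_step):
    """Share of sessions that drop out between two consecutive steps of the
    buying process (the aband_rate(...) predictors)."""
    return 1.0 - sessions_completing_step / sessions_reaching_step
```

For example, a customer with 4 purchases in T1.1 and 6 in T1.2 has R_change.F = 0.5, and 10 sessions reaching "Add to cart" of which 7 reach "Check-out" give an abandonment rate of 0.3 for that step.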
Table 6. Behavioural Predictors of Defection and Type of Target Variable in Prior Research
Variable type / Study | Predictor variables (L, R, F, M, IPT, Product categories, Mode of payment, Failure, Dropout rate) | Target variable
Offline environment:
Buckinx and van den Poel (2005) [6] | X X X X X X X | Binary (Churner, Non-churner)
V.L. Miguéis, et al. (2012) [10] | X X X X | Binary (Partially churned, Non-churner)
Mozer, et al. (2000) [16] | X X X X X X | Binary (Churner, Non-churner)
Online environment:
Keaveney and Parthasarathy (2001) [56] | X X | Binary (Switchers, Continuers)
K. Coussement and K. W. De Bock (2013) [57] | X X X X X | Binary (Churner, Non-churner)
A.T. Jahromi, et al. (2014) [7] | X X X | Binary (Churner, Non-churner)
Ssu-Han Chen (2016) [23] | X X | Binary (Churner, Non-churner)
N. Gordini and V. Veglio (2017) [9] | X X X X X X | Binary (Churner, Non-churner)
This study | X X X X X X X | Multi-class (Partially-churned, Totally-churned, Non-churner)
3.3.2. Target variable (dependent variable)
The target variable in the current study is 'status', a categorical variable with three values: Partially-churned, Totally-churned and Non-churner, which is predicted on the basis of the customer's event history on the e-commerce website.
3.4. Performance Measures
Table 6 reveals that existing churn studies focus on binary classification models, in which the model predicts the status of a customer as churner or non-churner. This study contributes to the literature by addressing instead a multi-class classification problem, in which the dependent variable classifies a particular customer as a customer continuing a loyal buying pattern (Non-churned), a partial defector (Partially-churned), or a total defector (Totally-churned).
For multi-class classification problems, micro-average and macro-average measures are commonly used to evaluate performance. However, micro-averaging does not provide an accurate measure of performance when the instances are not equally distributed over the classes (most instances belong to one class). Unlike micro-averaging, macro-averaging remains a meaningful performance measure even when the data are not equally representative of all classes (imbalanced classes) [58]. Therefore, macro-averaging is used to evaluate the multi-class model performance in this study.
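A small numeric illustration of this point (toy labels, not the paper's data): with one dominant class, pooled (micro) recall looks healthy even when a minority class is never detected, while the macro average exposes the failure.

```python
# Toy illustration of micro- vs macro-averaged recall under class imbalance.

def per_class_recall(y_true, y_pred, cls):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    return tp / (tp + fn)

y_true = [0] * 90 + [1] * 5 + [2] * 5   # class 0 heavily dominates
y_pred = [0] * 90 + [0] * 5 + [2] * 5   # minority class 1 is never detected

micro = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)  # pooled
macro = sum(per_class_recall(y_true, y_pred, c) for c in (0, 1, 2)) / 3

print(micro)            # 0.95  - looks fine
print(round(macro, 3))  # 0.667 - reveals the missed minority class
```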
Table 7 gives a typical confusion matrix (a table that shows, for each class in the test set, the number of correct and incorrect predictions) for a problem with three classes, where Nij represents the number of instances of actual class i that are predicted as class j (i = 1, 2, 3; j = 1, 2, 3).
Table 7. A Typical Resulting Confusion Matrix

|                | Predicted Classi | Predicted Classj | Predicted Classk |
|----------------|------------------|------------------|------------------|
| Actual Classi  | Nii              | Nij              | Nik              |
| Actual Classj  | Nji              | Njj              | Njk              |
| Actual Classk  | Nki              | Nkj              | Nkk              |
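From a confusion matrix laid out as in Table 7, the per-class counts TPi, FPi, TNi and FNi used later follow mechanically; a minimal sketch (the matrix values are made up for illustration):

```python
# Deriving per-class counts from a Table 7-style confusion matrix N,
# where N[i][j] = number of instances of actual class i predicted as class j.

def class_counts(N, i):
    total = sum(sum(row) for row in N)
    tp = N[i][i]
    fn = sum(N[i]) - tp                  # actual i, predicted as another class
    fp = sum(row[i] for row in N) - tp   # predicted i, actually another class
    tn = total - tp - fn - fp
    return tp, fp, tn, fn                # same column order as Tables 10-12

N = [[50, 3, 2],
     [4, 60, 1],
     [2, 2, 76]]

print(class_counts(N, 0))  # (50, 6, 139, 5)
```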
Table 8 presents the measures most often used for multi-class classification, based on the values of the confusion matrix.
In addition, we consider 10-fold cross-validation, in which the initial data are randomly divided into 10 equal parts; 9 parts are used as training data to build the prediction model, while the remaining part is reserved as the test set. The procedure is repeated ten times so that each part serves exactly once as the test set, and the average accuracy rate is obtained over the ten runs.
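The 10-fold procedure can be sketched as pure index bookkeeping (the model-fitting step is a placeholder comment, and the dataset size is illustrative):

```python
# Minimal sketch of 10-fold cross-validation: shuffle, split into 10
# near-equal parts, and let each part serve exactly once as the test set.
import random

def kfold_indices(n, k=10, seed=42):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]                 # k near-equal parts
    for i in range(k):
        train = [j for m in range(k) if m != i for j in folds[m]]
        yield train, folds[i]                             # 9 parts train, 1 part tests

splits = list(kfold_indices(100))
# for train, test in splits: fit the model on the train rows, score on the test rows
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 90 10
```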
Clustering Prediction Techniques in Defining and Predicting Customers Defection (Ait Daoud Rachid). Int J Elec & Comp Eng, Vol. 8, No. 4, August 2018: 2367-2383. ISSN: 2088-8708.
Table 8. Measures for multi-class classification used in this study, based on TPi (instances of Classi correctly predicted), TNi (instances of the other classes correctly predicted), FPi (instances of the other classes predicted as instances of Classi) and FNi (instances of Classi incorrectly predicted as other classes). The subscript M denotes macro-averaging, and l is the number of classes (here l = 3) [59].

| Measure | Formula | Description |
|---|---|---|
| Average Accuracy | (1/l) Σ_{i=1..l} (TPi + TNi)/(TPi + FNi + FPi + TNi) | The average per-class effectiveness of a classifier |
| Error Rate | (1/l) Σ_{i=1..l} (FPi + FNi)/(TPi + FNi + FPi + TNi) | The average per-class classification error |
| PrecisionM | (1/l) Σ_{i=1..l} TPi/(TPi + FPi) | An average per-class agreement of the data class labels with those of a classifier |
| RecallM | (1/l) Σ_{i=1..l} TPi/(TPi + FNi) | An average per-class effectiveness of a classifier in identifying class labels |
| F-1M | 2 · PrecisionM · RecallM/(PrecisionM + RecallM) | Relation between the data's positive labels and those given by a classifier, based on a per-class average |
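The Table 8 formulas can be implemented directly from per-class counts listed in (TP, FP, TN, FN) order; a minimal sketch with synthetic counts for three classes and 200 instances:

```python
# The macro-averaged measures of Table 8, computed from per-class counts.

def macro_measures(counts):
    """counts: one (tp, fp, tn, fn) tuple per class."""
    l = len(counts)
    avg_acc = sum((tp + tn) / (tp + fn + fp + tn) for tp, fp, tn, fn in counts) / l
    error   = sum((fp + fn) / (tp + fn + fp + tn) for tp, fp, tn, fn in counts) / l
    prec_m  = sum(tp / (tp + fp) for tp, fp, tn, fn in counts) / l
    rec_m   = sum(tp / (tp + fn) for tp, fp, tn, fn in counts) / l
    f1_m    = 2 * prec_m * rec_m / (prec_m + rec_m)
    return avg_acc, error, prec_m, rec_m, f1_m

# synthetic (tp, fp, tn, fn) for each of three classes
counts = [(50, 6, 139, 5), (60, 5, 130, 5), (76, 3, 117, 4)]
acc, err, p, r, f1 = macro_measures(counts)
print(round(acc, 3), round(err, 3))            # 0.953 0.047
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.926 0.927 0.927
```

Note that Average Accuracy and Error Rate sum to 1 by construction.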
4. RESULTS AND DISCUSSION
The actual fractions of partially-churned, totally-churned, and non-churned customers in the dataset are 17.81%, 25.45%, and 56.73% respectively. In order to evaluate the quality of the predictions of the churn prediction models, including the decision tree, the artificial neural network and the decision tree ensemble, macro-average measures and the 10-fold cross-validation method are considered.
Table 9. Prediction performances of the three models with 10-fold cross validation

| Fold | DT | ANN | DTE |
|--------|-------|-------|-------|
| fold 1 | 93.71 | 90.21 | 93.71 |
| fold 2 | 95.10 | 95.10 | 98.60 |
| fold 3 | 95.80 | 97.20 | 98.60 |
| fold 4 | 93.01 | 90.91 | 95.10 |
| fold 5 | 92.31 | 91.61 | 95.10 |
| fold 6 | 96.50 | 97.90 | 99.30 |
| fold 7 | 96.48 | 93.66 | 97.18 |
| fold 8 | 95.07 | 92.25 | 97.18 |
| fold 9 | 92.25 | 93.66 | 97.18 |
| fold 10 | 95.07 | 92.25 | 96.48 |
| Avg | 94.53 | 93.48 | 96.84 |
Table 9 shows the prediction performance of the three models based on 10-fold cross-validation. On average, the prediction models provide higher than 93% accuracy. When comparing the different classification techniques, the Decision Tree Ensemble offers the best accuracy in all test folds. However, accuracy alone can be misleading as a confirmation of prediction quality [60]. Additional performance measures, such as recall and precision, are therefore required to identify the better-performing churn prediction model. Accordingly, based on the confusion matrix tables, we calculate the Recalli, Precisioni and F-1i values for each class to assess the performance with respect to each of the three classes in our dataset. The detailed results, summarizing the overall accuracy, recall, precision and F-1 values for each of the three classes for the three classification techniques with 10-fold cross validation, are presented in Table 10, Table 11 and Table 12.
Table 10. The overall accuracy, Recall, Precision and F-1 values for each of the three classes for the ANN classifier with 10-fold cross validation

| ANN | Overall accuracy | TP | FP | TN | FN | Recall | Precision | F-1 |
|---|---|---|---|---|---|---|---|---|
| Partially-churned | 93.48 % | 192 | 29 | 1143 | 62 | 0.756 | 0.869 | 0.808 |
| Totally-churned | | 350 | 27 | 1036 | 13 | 0.964 | 0.928 | 0.946 |
| Non-churned | | 791 | 37 | 580 | 18 | 0.978 | 0.955 | 0.966 |

Table 11. The overall accuracy, Recall, Precision and F-1 values for each of the three classes for the DT classifier with 10-fold cross validation

| DT | Overall accuracy | TP | FP | TN | FN | Recall | Precision | F-1 |
|---|---|---|---|---|---|---|---|---|
| Partially-churned | 94.53 % | 218 | 42 | 1130 | 36 | 0.858 | 0.838 | 0.848 |
| Totally-churned | | 346 | 13 | 1050 | 17 | 0.953 | 0.964 | 0.958 |
| Non-churned | | 784 | 23 | 594 | 25 | 0.969 | 0.971 | 0.970 |

Table 12. The overall accuracy, Recall, Precision and F-1 values for each of the three classes for the DTE classifier with 10-fold cross validation

| DTE | Overall accuracy | TP | FP | TN | FN | Recall | Precision | F-1 |
|---|---|---|---|---|---|---|---|---|
| Partially-churned | 96.84 % | 218 | 9 | 1163 | 36 | 0.858 | 0.960 | 0.906 |
| Totally-churned | | 357 | 14 | 1049 | 6 | 0.983 | 0.962 | 0.973 |
| Non-churned | | 806 | 22 | 595 | 3 | 0.996 | 0.973 | 0.985 |
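As a consistency check, the per-class values in Tables 10 to 12 follow directly from the listed counts; a minimal sketch using the Partially-churned row of the DTE table (TP = 218, FP = 9, TN = 1163, FN = 36):

```python
# Recomputing the per-class measures of Table 12 (DTE, Partially-churned)
# from the listed counts.

def per_class(tp, fp, tn, fn):
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return recall, precision, f1, accuracy

rec, prec, f1, acc = per_class(tp=218, fp=9, tn=1163, fn=36)
print(round(rec, 3), round(prec, 3), round(f1, 3))  # 0.858 0.96 0.906
print(round(acc * 100, 2))  # 96.84 - matches the overall accuracy column
```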
Table 13. Macro-averaging measures for the three classifiers

| Classifier | Average Accuracy | RecallM | PrecisionM | F-1M |
|---|---|---|---|---|
| Artificial Neural Networks (ANN) | 0.957 | 0.899 | 0.917 | 0.907 |
| Simple Decision Tree (DT) | 0.964 | 0.927 | 0.924 | 0.925 |
| Decision Tree Ensemble (DTE) | 0.979 | 0.946 | 0.965 | 0.955 |
The results shown in Table 10, Table 11 and Table 12 indicate that, for the three classes of our database, the DTE model offers the best results in terms of precision, recall and F-1.
Moreover, as stated in Table 13, the DTE shows better predictive performance than the other models in terms of macro-averaging measures. Compared to DT and ANN, the improvement is +1.90 and +4.63 points respectively in terms of RecallM, +4.06 and +4.76 points respectively in terms of PrecisionM, and +2.93 and +4.80 points respectively in terms of F-1M. Based on these results, we conclude that the DTE model performs best in identifying customers who totally defect, those who partially defect, and those who remain loyal. Consequently, we are able to follow both partial and total defection, in contrast with past research that focused on either total or partial defection. This contribution is important for several reasons. First, the degree of risk related to partial defection is different from that of total defection; given the costs associated with retention strategies, it is therefore advisable not to concentrate churn management efforts in the same way on both [61]. In other words, a customer predicted by the model as likely to (partially) churn in the future should not be targeted with the same incentive program as those predicted likely to leave the company definitively in the future, and vice versa. This helps managers make the right interventions at the right time to retain these customers without wasting resources. Secondly, the model makes it possible to check whether total churn is always preceded by partial attrition, or whether there are cases in which the customer leaves definitively without showing any signs of dissatisfaction. This will allow companies to think about solutions to such situations.
Finally, we consider which predictors contribute most to predicting partial and total customer defection using the three models. For this purpose, we create a KNIME workflow that calculates variable importance. The basic idea is as follows: to compute the importance of variable k, we exclude it from the prediction; if the prediction accuracy of the model decreases in the absence of this variable, the variable is important to the prediction process. The same procedure is applied to all variables and, finally, the importance of each predictor is normalized to the range 0 to 1 to obtain an indicator of how important each variable is. The results are presented in Table 14. Knowledge of the most important churn predictors in the e-commerce sector is of particular interest to marketing managers, because these predictors provide actionable information for targeting the customers most likely to churn in the near future with tailored incentives that minimize the risk of churn [9].
Table 14 shows the importance of each predictor for the DT, ANN and DTE models, where the first column indicates the rank of importance and, for each model, the remaining columns show the name of the variable and its normalized importance based on the accuracy of the model without it.
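The leave-one-variable-out procedure can be sketched as follows. The model here is a tiny nearest-centroid classifier on synthetic data (the paper computes the same quantity in a KNIME workflow with its three classifiers), so all names and values are illustrative:

```python
# Sketch of leave-one-variable-out (drop-column) importance.
import random

def centroid_predict(X_tr, y_tr, X_te, cols):
    """Nearest-centroid classification using only the feature indices in cols."""
    cents = {}
    for c in set(y_tr):
        rows = [x for x, lab in zip(X_tr, y_tr) if lab == c]
        cents[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    return [min(cents, key=lambda c: sum((x[j] - m) ** 2
                for j, m in zip(cols, cents[c]))) for x in X_te]

def acc(X_tr, y_tr, X_te, y_te, cols):
    preds = centroid_predict(X_tr, y_tr, X_te, cols)
    return sum(p == t for p, t in zip(preds, y_te)) / len(y_te)

rng = random.Random(0)
# feature 0 separates the two classes; feature 1 is pure noise
X = [[rng.gauss(cls, 0.3), rng.gauss(0, 1)] for cls in (0, 1) for _ in range(50)]
y = [cls for cls in (0, 1) for _ in range(50)]
X_tr, y_tr, X_te, y_te = X[::2], y[::2], X[1::2], y[1::2]

base = acc(X_tr, y_tr, X_te, y_te, [0, 1])
drops = [base - acc(X_tr, y_tr, X_te, y_te, [j for j in (0, 1) if j != k])
         for k in (0, 1)]                        # accuracy lost when k is excluded
importance = [max(d, 0.0) / max(drops) for d in drops]  # normalized to [0, 1]
print(importance[0] > importance[1])  # True: the informative feature ranks first
```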
Table 14. Importance of Variables

| Rank | Decision tree ensemble | NormImp | Decision tree | NormImp | Artificial Neural Networks | NormImp |
|---|---|---|---|---|---|---|
| 1 | aband_rate(checkout-transaction)T1.2 | 1.000 | aband_rate(checkout-transaction)T1.2 | 1.000 | aband_rate(checkout-transaction)T1.2 | 1.000 |
| 2 | aband_rate(productviews-addcart)T1.2 | 0.562 | aband_rate(productviews-addcart)T1.2 | 0.287 | aband_rate(productviews-addcart)T1.2 | 0.652 |
| 3 | aband_rate(allv-transaction)T1.2 | 0.210 | aband_rate(addcart-checkout)T1.2 | 0.181 | aband_rate(allv-transaction)T1.2 | 0.277 |
| 4 | aband_rate(addcart-checkout)T1.2 | 0.181 | aband_rate(allv-productviews)T1.2 | 0.070 | aband_rate(addcart-checkout)T1.2 | 0.241 |
| 5 | L | 0.038 | R | 0.064 | L | 0.143 |
| 6 | aband_rate(allv-productviews)T1.2 | 0.038 | Last_session_abandoned | 0.058 | R | 0.098 |
| 7 | ITP | 0.029 | aband_rate(allv-transaction)T1.2 | 0.053 | R_change.F | 0.089 |
| 8 | R_change.ITP | 0.029 | L | 0.053 | ITP | 0.071 |
| 9 | F | 0.019 | ITP | 0.053 | R_change.ITP | 0.071 |
| 10 | M | 0.010 | R_change.F | 0.053 | F | 0.071 |
| 11 | Mode of payment | 0.010 | M | 0.047 | aband_rate(allv-productviews)T1.2 | 0.063 |
| 12 | Last_session_abandoned | 0.010 | F | 0.035 | M | 0.027 |
| 13 | R | 0.010 | Mode of payment | 0.029 | Mode of payment | 0.027 |
| 14 | R_change.F | 0.000 | R_change.ITP | 0.000 | Last_session_abandoned | 0.000 |
It is clear from the rankings of variable importance that the variables describing dropout rates at the steps of the buying process, such as aband_rate(checkout-transaction)T1.2, aband_rate(productviews-addcart)T1.2 and aband_rate(addcart-checkout)T1.2, are indeed powerful predictors of partial and total churn in the e-commerce field. This view is confirmed by the similarity in the importance rankings of these variables: they occupy the top four positions for all models and appear to outperform the other variables. The main difference between the DTE model and the other models concerns the variable describing the recency of the last purchase (R), which appears at the bottom of the ranking for DTE but is quite important in the other two models. However, when comparing the results of this study with recent research dealing with churn prediction in online environments, namely A.T. Jahromi et al. [7], N. Gordini and V. Veglio [9] and K. Coussement and K.W. De Bock [57], it appears that, in terms of the importance of the variables used in the learning phase, the recency and frequency variables seem less important for predicting churn. This contrasts sharply with the expectations formulated from existing research, which strongly emphasizes the predictive power of the R and F variables of the RFM models. This result occurs mainly because the clients involved in the prediction models are those representing the core customers and the new high-value customers, i.e. customers belonging to four of the seven LRFM clusters. Considering Table 4, one observes that the average F and R values of these four clusters are almost equal. In addition, the descriptive statistics presented in Table 3 indicate that the standard deviation of F and R is low, which means that the values of these two variables show little dispersion around the average. This makes their contribution less important in distinguishing between total defectors, partial defectors and loyal customers.
Another explanation could reside in the fact that these studies have not fully exploited the large amount of data generated by online environments, but have remained limited to variables that characterize offline environments, such as recency, frequency and monetary value. Indeed, from the events customers generate on the merchant site, we can easily extract many predictor variables with explanatory power for understanding customers' behaviour and for analyzing their buying experience, which starts with product consultation and ends with validation of the transaction. For example, we can derive variables that describe, for each customer, the rate at which sessions are dropped at the different stages of the buying process.
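As an illustration of this point, funnel-dropout predictors can be derived from raw session event logs; the event names and list-of-events log format below are assumptions, not the store's actual schema:

```python
# Illustrative sketch: deriving session-dropout predictors from clickstream logs.

FUNNEL = ["product_views", "add_cart", "checkout", "transaction"]

def deepest_step(events):
    """Index of the furthest funnel step reached in one session (-1 if none)."""
    return max((FUNNEL.index(e) for e in events if e in FUNNEL), default=-1)

def dropout_features(sessions):
    """Funnel abandonment rates over one customer's sessions in a sub-period."""
    deepest = [deepest_step(s) for s in sessions]
    feats = {}
    for i, (step, nxt) in enumerate(zip(FUNNEL, FUNNEL[1:])):
        at_step = sum(1 for d in deepest if d >= i)       # reached this step
        past = sum(1 for d in deepest if d >= i + 1)      # went on to the next
        feats[f"aband_rate({step} to {nxt})"] = (
            0.0 if at_step == 0 else (at_step - past) / at_step)
    return feats

sessions = [
    ["visit", "product_views"],                                         # left early
    ["visit", "product_views", "add_cart"],                             # left at cart
    ["visit", "product_views", "add_cart", "checkout"],                 # left at checkout
    ["visit", "product_views", "add_cart", "checkout", "transaction"],  # bought
]
feats = dropout_features(sessions)
print(feats["aband_rate(checkout to transaction)"])  # 0.5
```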
5. CONCLUSION
In order to address the crucial problem of churn definition in non-contractual (e-commerce) settings, the LRFM model and a clustering technique (k-means) are combined in the first stage to identify different types of customer profiles (different LRFM patterns) based on the first sub-period (T1). We find seven clusters of customers with distinct LRFM behaviours, and we then define a customer's change of LRFM pattern over time as an early signal of either partial or total defection. In our opinion, the proposed methodology for churn definition can be a useful decision tool for companies operating in non-contractual settings, where customers and companies have no contract between them. After resolving the problem of churn definition, we have proposed three
predictive models (Artificial Neural Networks, Simple Decision Tree, and Decision Tree Ensemble) for partial/total customer churn in the e-commerce sector.
In order to test the proposed models in a real context, we used an online store as a case study, utilizing the clickstream behaviour records of its customers for the period from November 1, 2013 through February 28, 2015. The reported results reveal that the three proposed models can provide an individual-level prediction of the probability of partially or totally defecting in the future, which enables us to follow both partial and total defectors. A comparative analysis of the different models is also presented; its results show the beneficial impact of the Decision Tree Ensemble over the other models (simple decision tree and artificial neural networks) in terms of prediction quality.
This prediction is very useful for marketing managers, because it will greatly help them implement new tailored incentive solutions (retention actions) according to the degree of defection (partial or total), in order to convince customers to stay.
Finally, the variables that contribute most to predicting partial and total customer defection in the e-commerce sector have been identified.
Our findings also indicate some limitations and issues for further research.
Firstly, this study is limited to the e-commerce sector and is difficult to apply in the offline world, because it is based on analyzing Web browsing behaviour (page views, sequences of visits, the buying process, and session dropout rates at each stage of the buying process).
Secondly, we have used only a small number of variables in the clustering phase; further studies may utilize additional variables, such as variables related to product category.
Finally, the predictive power of the model is significantly influenced by the choice of classification technique. In future work, other classification techniques, such as genetic algorithms, naïve Bayes tree (NBTree), rough set approaches and fuzzy logic, will be explored.
REFERENCES
[1] Interbank Electronic Banking Center, Morocco, “Activité monétique 1er semestre 2017 au Maroc”, [https://www.cmi.co.ma/].
[2] Hongsheng Xu, et al., “Construction of Ecommerce Recommendation System based on Semantic Annotation of Ontology and User Preference”, TELKOMNIKA (Telecommunication Computing, Electronics and Control), vol. 12, no. 3, pp. 2028-2035, 2014.
[3] Neslin, S.A., Gupta, S., Kamakura, W., Lu, J., & Mason, C, “Defection detection: improving predictive accuracy of
customer churn models”, Working Paper, Teradata Center at Duke University, 2004.
[4] Burez, J., and Van den Poel, D., “Crm at a pay-TV company: Using analytical models to reduce customer attrition
by targeted marketing for subscription services”, Expert Systems with Applications, vol. 32, pp. 277-288, 2007.
[5] Y. Catherine, “AOL: Scrambling to Halt the Exodus,” Business Week, 62, August 4, 2003.
[6] Buckinx, W. and Van den Poel, D., “Customer base analysis: partial defection of behaviorally loyal clients in a
non-contractual FMCG retail setting”, European Journal of Operational Research, vol. 164, no. 1, pp. 252-268,
2005.
[7] A.T. Jahromi, et al., “Managing B2B customer churn, retention and profitability”, Industrial Marketing
Management, vol. 43, no. 7, pp. 1258-1268, October 2014
[8] Özden Gür Ali, Umut Arıtürk, “Dynamic churn prediction framework with more effective use of rare event data:
The case of private banking”, Expert Systems with Applications, vol. 41, no. 17, pp. 7889-7903, 2014.
[9] N. Gordini and V. Veglio, “Customers churn prediction and marketing retention strategies. An application of
support vector machines based on the AUC parameter-selection technique in B2B e-commerce industry”, Industrial
Marketing Management, vol. 62, pp. 100-107, April 2017.
[10] V.L. Miguéis, et al., “Modeling partial customer churn: On the value of first product-category purchase sequences”,
Expert Systems with Applications, vol. 39, no. 12 and 15, pp. 11250-11256, September 2012.
[11] M. Clemente-Císcar, et al., “A methodology based on profitability criteria for defining the partial defection of
customers in non-contractual settings”, European Journal of Operational Research, vol. 239, no. 1, 16 November
2014.
[12] Jae-Hyeon Ahn, Sang-Pil Han, Yung-Seop Lee, “Customer churn analysis: Churn determinants and mediation effects of partial defection in the Korean mobile telecommunications service industry”, Telecommunications Policy, vol. 30, no. 10-11, pp. 552-568, 2006.
[13] R. Ait daoud, et al., “Customer Segmentation Model in E-commerce Using Clustering Techniques and LRFM Model: The Case of Online Stores in Morocco”, World Academy of Science, Engineering and Technology, International Journal of Computer, Electrical, Automation, Control and Information Engineering, vol. 9, no. 8, 2015.
[14] Shin-Yuan Hung, et al., “Applying data mining to telecom churn management”, Expert Systems with Applications,
vol. 31, pp. 515-524, 2006.
[15] M.A.H. Farquad, Vadlamani Ravi, S. Bapi Raju, “Churn prediction using comprehensible support vector machine:
An analytical CRM application”, Applied Soft Computing, vol. 19, pp. 31-40, 2014
[16] Mozer, et al., “Predicting subscriber dissatisfaction and improving retention in the wireless telecommunications
industry”, IEEE Transactions on Neural Networks, vol. 11, no. 3, pp. 690-696, 2000.
[17] Xiaobing Yu, et al., “An extended support vector machine forecasting framework for customer churn in e-
commerce”, Expert Systems with Applications, vol. 38, pp. 1425-1430, 2011.
[18] Reinartz, W. and Kumar, V., “The mismanagement of customer loyalty”, Harvard Business Review, vol. 80, no. 7, pp. 86-94, 2002.
[19] Xie, et al., “Customer churn prediction using improved balanced random forests”, Expert Systems with
Applications, vol. 36, no. 3, pp.5445-5449, 2009.
[20] Bingquan, et al., “Customer churn prediction in telecommunications”, Expert Systems with Applications, vol. 39,
no. 1, pp. 1414-1425, January 2012.
[21] Faris, H., “Neighborhood cleaning rules and particle swarm optimization for predicting customer churn behavior in
telecom industry”. Int. J. Adv. Sci. Technol., vol. 68, pp. 11-22, 2014.
[22] Kyoungok Kim, Chi-Hyuk Jun and Jaewook Lee, “Improved churn prediction in telecommunication industry by analyzing a large network”, Expert Systems with Applications, vol. 41, no. 15, pp. 6575-6584, 2014.
[23] Ssu-Han Chen, “The gamma CUSUM chart method for online customer churn prediction”, Electronic Commerce
Research and Applications, vol. 17, pp. 99-111, May–June 2016.
[24] Niels Holtrop, Jaap E. Wieringa, Maarten J. Gijsenberg, Peter C. Verhoef, “No future without the past? Predicting
churn in the face of customer privacy”, International Journal of Research in Marketing, vol. 34, no. 1, pp. 154-172,
2017.
[25] H. H. Wu, et al., “Analyzing Patients' Values by Applying Cluster Analysis and LRFM Model in a Pediatric Dental Clinic in Taiwan”, The Scientific World Journal (Hindawi Publishing Corporation), vol. 2014, Article ID 685495, 7 pages, 2014.
[26] A.M. Hughes, “Strategic database marketing”, Probus Publishing, 1994.
[27] R. Ait daoud, et al., “Combining RFM model and clustering techniques for customer value analysis of a company
selling online”. IEEE/ACS 12th International Conference of Computer Systems and Applications (AICCSA), 1-6,
2015.
[28] Jo-Ting Wei, et al., “Customer relationship management in the hairdressing industry: An application of data mining
techniques”, Expert Systems with Applications, vol. 40, no. 18, pp. 7513-7518, 15 December 2013.
[29] Der-Chiang Li, et al., “A two-stage clustering method to analyze customer characteristics to build discriminative
customer management: A case of textile manufacturing business”, Expert Systems with Applications, vol. 38, no. 6,
pp. 7186-7191, June 2011.
[30] A.M. Hughes, “Boosting response with RFM”. Marketing Tools, vol. 3, no. 3, pp. 4-7, 1996.
[31] G.M. Marakas, “Decision Support Systems in the 21st Century”, Second Edition. Prentice Hall, Upper Saddle
River, NJ, 2003.
[32] A.X. Yang, “How to develop new approaches to RFM segmentation”, Journal of Targeting, Measurement and
Analysis for Marketing, vol. 13, no. 1, pp. 50-60, 2004.
[33] S.M.S. Hosseini, et al., “Cluster analysis using data mining approach to develop CRM methodology to assess the
customer loyalty”, Journal of Expert Systems with Applications, vol. 37, no. 7, pp. 5259-5264, 2010.
[34] I.C. Yeh, et al., “Knowledge discovery on RFM model using Bernoulli sequence”, Expert Systems with
Applications, vol. 36, no. 3, pp. 5866-5871, 2009.
[35] H.C. Chang and H.P. Tsai, “Group RFM analysis as a novel framework to discover better customer consumption
behavior”, Expert Systems with Applications, vol. 38, no. 12, pp.14499-14513, 2011.
[36] H.H. Chang and S.F. Tsay., “Integrating of SOM and K-mean in data mining clustering: an empirical study of
CRM and profitability evaluation”, Journal of Information Management, vol. 11, no. 4, pp. 161-203, 2004.
[37] S. Chow. and R. Holden., “Toward an understanding of loyalty: The moderating role of trust”, Journal of
Management issues, vol. 9, no. 3, pp. 275-298, 1997.
[38] D. Birant, “Data Mining Using RFM Analysis, Knowledge-Oriented Applications in Data Mining”, InTech,
ISBN: 978-953-307-154-1, 2011.
[39] Se-Hoon Jung, et al., “ Prediction Data Processing Scheme using an Artificial Neural Network and Data Clustering
for Big Data”, International Journal of Electrical and Computer Engineering, vol. 6, no. 1, pp. 330-336,
February 2016.
[40] R.J. Kuo, et al., “Integration of self-organizing feature map and K-means algorithm for market segmentation”,
Computers & Operations Research, vol. 29, no. 11, pp. 1475-1493, 2002.
[41] Tan P.N., Steinbach M., Kumar V. “Introduction to Data Mining”, Pearson Addison Wesley; Boston, MA,
pp. 487-556 ,USA: 2006.
[42] Jiawei Han and Micheline Kamber, “Data Mining: Concepts and Techniques, 2nd ed.”, Morgan Kaufmann Publishers, ISBN 1-55860-901-6, March 2006.
[43] Peter J. Rousseeuw, “Silhouettes: A graphical aid to the interpretation and validation of cluster analysis”, Journal of
Computational and Applied Mathematics, vol. 20, pp. 53-65, 1987.
[44] S.H. Ha and S.C. Park., “Application of data mining tools to hotel data mart on the Intranet for database
marketing”, Expert Systems with Applications, vol. 15, no. 1, pp. 1-31, 1998.
[45] Rumelhart, et al., “Learning internal representations by error propagation”, (vol. 1). MA: MIT Press Cambridge,
1986.
[46] Y. L. Chen, et al., “Constructing a multivalued and multi-labeled decision tree”, Expert Systems with Applications, vol. 25, no. 2, pp. 199-209, 2003.
[47] Wei, C. -P., & Chiu, I. -T., “Turning telecommunications call details to churn prediction: A data mining approach”.
Expert Systems with Applications, vol. 23, no. 2, pp. 103-112, 2002.
[48] Quinlan, J.R., “C4.5: Programs for machine learning”, Morgan Kaufman Publishers, 1993.
[49] Quinlan, J. R., “Improved use of continuous attributes in C4.5”, Journal of Artificial Intelligence Research, vol. 4, pp. 77-90, 1996.
[50] Dudoit, et al., “Comparison of discrimination methods for the classification of tumors using gene expression data”,
Journal of the American Statistical Association, vol. 97, no. 457, pp. 77-87, 2002.
[51] L. Breiman, “Random forests”, Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
[52] K. Coussement and D. Van den Poel, “Churn prediction in subscription services: An application of support vector
machines while comparing two parameter-selection techniques”, Expert systems with applications, vol. 34, no. 1,
pp. 313-327, 2008.
[53] P.M. Guadagni, and J.D.C. Little., “A Logit Model of Brand Choice Calibrated on Scanner Data”, Marketing
Science, vol. 2, no. 3, pp. 203-238, 1983.
[54] P. Rossi, et al., “On the value of household purchase history information in target marketing”, Marketing Science, vol. 15, no. 4, pp. 321-340, 1996.
[55] Guofang Kuang and Yuanchen Li, "Using Fuzzy Association Rules to Design Ecommerce Personalized
Recommendation System", TELKOMNIKA (Telecommunication Computing, Electronics and Control), vol. 12,
no. 2, pp. 1519-1527, 2014
[56] Keaveney, S.M., & Parthasarathy, M. Customer switching behavior in online services: An exploratory study of the
role of selected attitudinal, behavioral, and demographic factors. Journal of the Academy of Marketing Science,
vol. 29, no. 4, pp. 374-390, 2001.
[57] K. Coussement and K.W. De Bock, “Customer churn prediction in the online gambling industry: The beneficial
effect of ensemble learning”, Journal of Business Research, vol. 66, pp. 1629-1636, 2013.
[58] Ligang Zhou, et al., “One versus one multi-class classification fusion using optimizing decision directed acyclic
graph for predicting listing status of companies”, Information Fusion, vol. 36, pp. 80-89, 2017.
[59] M. Sokolova and G. Lapalme., “A systematic analysis of performance measures for classification tasks”,
Information Processing & Management, vol. 45, no. 4, pp. 427-437, 2009.
[60] B. L. Sturm, “Classification accuracy is not enough”, Journal of Intelligent Information Systems, vol. 41, no. 3, pp. 371-406, December 2013.
[61] J. Hadden, et al., “Computer assisted customer churn management: State-of-the-art and future trends”, Computers
& Operations Research, vol. 34, no. 10, pp. 2902-2917, 2007.