SlideShare a Scribd company logo
1 of 4
Download to read offline
On Identifying Potential Direct Marketing Consumers
using Adaptive Boosted Support Vector Machine
Armin Lawi Department of
Computer Science Universitas
Hasanuddin
Makassar, Indonesia
armin@unhas.ac.id
Ali Akbar Velayaty*)
Post-graduate Program of Electrical
Engineering, Universitas Hasanuddin
Makassar, Indonesia
arvasit@gmail.com
Zahir Zainuddin
Dept. Electrical Engineering
Universitas Hasanuddin
Makassar, Indonesia
zainuddinzahir@gmail.com
Abstract— Identifying potential consumers for direct
marketing to very large data is a difficult and impossible
task to do manually. Therefore, the machine learning
approach needs to help analyze the data to contribute in
determining the marketing strategy policy. In this paper,
vector machine support methods using the Adaboost
algorithm are investigated to classify potential customers
for direct marketing of large bank data marketing. The
adaboost algorithm aims to build a better model than the
model generated from a single classifier. Data obtained
from UCI machine learning repository. Total data is
processed as many as 9280 data with the number of
attributes of 20 classes and 1 target class. Training data and
test data are divided into 70% and 30%. This classification
predicts the prospects for a deposit subscription. The results
show that the SVM method using Adaboost algorithm
obtained accuracy is 95.07% and the sensitivity is 91.65%
higher than the ordinary SVM approach.
Keywords—Bank Direct Marketing; Support Vector Machine;
Adaboost.
I. INTRODUCTION
Business Intelligence is a term related to the use of
information space and Intelligence mechanisms to support a
decision [1]. One sector that implements this is the banking
sector that receives and processes information from customers
every day. One area of banking that collects information from
customers is the marketing sector. There are two ways of
banking in introduce the goods or services that is with the
general advertising either through television, radio, newspapers,
etc. or by targeting customers in particular or called the bank
direct marketing [2].
Direct marketing is a marketing technique applied to the
bank to introduce a product either in the form of goods or
services to customers by phone, email and others [3]–[5].
However, the disadvantages of direct marketing are those
customers who feel disturbed that can cause negative ratings for
the bank and the cost and time spent by telemarketers is not
proportional to the number of customers contacted [2]
Over time, the amount of data that contains information from
customers will increase. The increasing amount of data can be
used to find information that serves to assist in making a
decision. One of the computational methods that discuss the
above problem is Data Mining [1].
Data mining is a field in the computational process that
allows finding a pattern of a data that is often ignored by using
a particular method. The pattern can be used to help take in a
decision. One method of data mining is a classification method
that is a technique of grouping data obtained based on the pattern
produced previously. There are several models of classification
among them is Support Vector Machine (SVM) method.
Support Vector Machine is the method that forms the best
separator in hyperplane field on the data that separates the two
classes by maximizing the margin value of the two classes.
According to [6] Although ordinary SVM shows good
performance in classification, but is sensitive to sample and
parameter settings. Some research indicates that the accuracy
and performance of ordinary SVM can be improved by using the
Ensemble method [7]. The Ensemble method is a method that
combines the predictions generated by some classifier. One of
the Ensemble Methods of a classification is boosting algorithm.
which is more popular used compared to bagging algorithm [7].
So, in this research will compare the performance of ordinary
SVM with SVM using adaboost algorithm.
II. LITERATURE
As for some related research that discusses the predictions of
potential customers to subscribe deposits are as follows. Moro
(2011), et al. [2] proposed Cross Industry Standard Process for
Data Mining (CRISP-DM) method indicates that the support
vector machine gets better AUC and ALIFT results than the
Decision Tree and Naive Bayes methods. Feng, et al. [7]
proposed a Bayesian networks method and their application
shows that a single support vector machine method only
produces an average level of accuracy that is not much different
than other classification methods, and thus they conducted
ensemble method to improve the accuracy of the classification
method. Moro (2014), et al. [8] proposed a data-driven approach
to predict the success of bank telemarketing by comparing four
methods of data mining classification, i.e., Decision Tree,
Neural Networks, Support Vector Machine and Logistic
Regression. Their result showed that neural network is the best
*)
Corresponding Author: arvasit@gmail.com
Name Type
Age Numeric
Job Categorical : "Admin.", "Blue-Collar",
"Entrepreneur", "Housemaid",
"Management", "Retired", "Self-
Employed", "Services", "Student",
"Technician", "Unemployed", "Unknown"
Marital Categorical : "Divorced", "Married",
"Single", "Unknown";
Education Categorical : "Basic.4y", "Basic.6y",
"Basic.9y", "High.School", "Illiterate",
"Professional.Course",
"University.Degree", "Unknown"
Default Categorical: "No","Yes","Unknown"
Housing Categorical: "No","Yes","Unknown"
Loan Categorical: "No","Yes","Unknown"
Contact Categorical: "Cellular","Telephone"
Month Categorical: "Jan", "Feb", "Mar", ...,
"Nov", "Dec"
Day_Of_Week Categorical: "Mon", "Tue", "Wed", "Thu",
"Fri"
Duration Numeric
Campaign Numeric
Pdays Numeric
Previous Numeric
Poutcome Categorical : "Failure", "Nonexistent",
"Success"
Emp.Var.Rate Numeric
Cons.Price.Idx Numeric
Cons.Conf.Idx Numeric
Euribor3m Numeric
Nr.Employed Numeric
Y Binary: "Yes","No"
performance, and support vector machines and neural networks
are more flexible and have better learning skills among other
classification methods. Alhakbani and Rifaie [9] proposed a
handling class imbalance in direct marketing dataset using a
hybrid data. An algorithmic level solution using the Syntethic
Minority oversampling technique is implemented to balance the
dataset and then use grid search to optimize the c, gamma, and
kernel parameters. The models used the support vector machine
to build the HybridDA model. The results obtained indicate that
the technique is better than the techniques used by previous
research.
III. PRELIMINARIES
The datasets used in this paper is taken from the University
of California at Irvine (UCI) Machine Learning Repository. It is
a bank marketing data collected from March 2008 to November
2010. The number of attributes is 20 attributes with the addition
of 1 target class, and the total observation is 41,188 data [8].
Table I. Attributes Description
A. Support Vector Machine
Support Vector Machine (SVM) is one of the methods used
in the problem of pattern classification (pattern classification).
The basic idea of the SVM method is to maximize the
boundary-field separator (hyperplane) that separates the data
into two classes in a feature space. Support vector machine
method is suitable for use on linearly separable data [10].
To classifying the data cannot be separated linearly and it
should be transformed into feature space dimension such that
the data can be separated linearly from the feature space.
Feature space in practice usually has a higher dimension of
input space. Suppose there is a dataset whose data has two
attributes and two classes of positive and negative classes when
depicted in two-dimensional space of data cannot be separated
linearly. Therefore, the data is transformed to a higher
dimension so that it can be separated linearly. This method is
called kernel methods, which transform a linear SVM into non-
linear SVM
B. Adaboost
Boosting is a common and effective method for building an
accurate classifier by combining weak classifiers [7] The use of
boosting is preferred because it focuses on misclassified issues
and has a tendency to increase higher accuracy compared to
bagging. The commonly used boosting algorithm is the adaboost
algorithm
Adaboost trains the basic classifier sequentially (iteratively)
each iteration. The basic classification is trained by using data
training with weight coefficients that depend on the classifier
performance. On the previous iteration to give greater weight to
the wrongly classified data (misclassified). If the classifier has
been trained as much as desired, then the classifier is combined
to form a final decision on the model that shows the best
performance
The steps of the Adaboost algorithm are [11]
1. Input : Training data along with its label
, Component Learner, And the
number of iterations of T.
2. Initialization of training data weight
. (1)
3. For iteration
a. Use the Component Learner algorithm to train a
classification component, , On training weights.
b. Calculate the weight of classification error on
. (2)
The learning trust index is calculated as:
. (3)
c. Renew the sample training weight
. (4)
d. Testing model with testing data
Method Iteration Accuracy Sensitivity
Ordinary SVM - 91,67 % 83,80 %
Adaboost SVM
10 92,53 % 85,93 %
20 94,30 % 89,68 %
30 95,07 % 91,65 %
40 94,81 % 91,05 %
50 94,86 % 91,05 %
4. The last learning output
Combination of all classifications
Table III. Classification results of SVM Adaboost
C. Performance Evaluation
. (5)
The performance result of the proposed method is evaluated
based on the degree of accuracy and sensitivity. The
measurement evaluation is obtained from the confusion matrix
which consists of 4 parts of the classification results, i.e., True
Positive (TP), True Negative (TN), False Positive (FP) and False
Negative (FN). The confusion matrix is given in Table II.
Table II. Confusion Matrix/ Contingency Table
Actual  Prediction True False
True TP FN
False FP TN
The accuracy level is defined as the degree of proximity
between the predicted data with the actual data or the ratio of the
amount of data that is correctly classified as shown by equation
(6).
Table III shows that the adaboost SVM method gives higher
accuracy rate since the algorithm updates the weights of
incorrect classified data to the specified iteration limit. The table
indicates that the increasing number of iterations is not
guaranteed the performance obtained will be better. In this
result, the iteration 30th
gives the best performance.
VI. CONCLUSION
In this paper proposed adaboost SVM method to improve
prediction results of direct marketing of bank dataset compare
to the ordinary SVM. The Adaboost algorithm successfully
Accuracy rate . (6) improves the performance of the SVM method. The results
The sensitivity level is defined as the ratio of data classified
positively [12]given by the equation (7).
obtained shows that the accuracy of the ordinary SVM only get
91.67% and sensitivity of 83.80%, whereas our proposed
adaboost SVM gives accuracy to 95,07% and sensitivity
th
Sensitivity . (7)
91,65% in the 30 iteration.
A. Dataset
IV. EXPERIMENTAL DESIGN
REFERENCES
[1] S. Abbas, “Deposit subscribe Prediction using Data Mining Techniques
based Real Marketing Dataset,” Int. J. Comput. Appl., vol. 110, no. 3, pp.
The total of 41,188 data will gets better accuracy results;
however, since the amount of data on the target deposit
subscribed customers is only 4,640 data which is less than
compared with customers who do not subscribed of 36,548 data.
This sitation will tend to predict the new data more incline to
deposit unsubscribe costumer, and therefore, in this experiment,
the amount of data is balanced to obtain the same probability
outcome. Hence, in this experiment use 9,280 total data with the
amount of comparison of training data is 70% and testing data is
30%.
B. Normalization data
For each attribute of type "categorical" will be changed to
"numerical". This is because data other than the "numerical"
type cannot be processed. The next type of target is changed to
1 for "yes" class and -1 for "No" class
C. Implementation
In this research will compare the performance results from a
single vector support method with support vector machine using
the adaboost algorithm.
V. RESULTS
The performance comparison result of performance of
ordinary SVM and adaboost SVM is given in Table III.
975–8887, 2015.
[2] S. Moro and R. M. S. Laureano, “Using Data Mining for Bank Direct
Marketing: An application of the CRISP-DM methodology,” Eur. Simul.
Model. Conf., no. Figure 1, pp. 117–121, 2011.
[3] C. Vajiramedhin and A. Suebsing, “Feature Selection with Data Balancing
for Prediction of Bank Telemarketing,” Appl. Math. Sci., vol. 8, no. 114,
pp. 5667–5672, 2014.
[4] H. A. Elsalamony, “Bank Direct Marketing Analysis of Data Mining
Techniques,” Int. J. Comput. Appl., vol. 85, no. 7, pp. 12–22, 2014.
[5] H. Elsalamony and A. Elsayad, “Bank Direct Marketing Based on Neural
Network,” Int. J. Eng. Adv. Technol., vol. 2, no. 6, pp. 392–400, 2013.
[6] L. Zhou, K. K. Lai, and L. Yu, “Least squares support vector machines
ensemble models for credit scoring,” Expert Syst. Appl., vol. 37, no. 1, pp.
127–133, 2010.
[7] G. Feng, J.-D. Zhang, and S. Shaoyi Liao, “A novel method for combining
Bayesian networks, theoretical analysis, and its applications,” Pattern
Recognit., vol. 47, no. 5, pp. 2057–2069, 2014.
[8] S. Moro, P. Cortez, and P. Rita, “A data-driven approach to predict the
success of bank telemarketing,” Decis. Support Syst., vol. 62, pp. 22–31,
2014.
[9] H. A. Alhakbani and M. M. Al-Rifaie, “Handling Class Imbalance in Direct
Marketing Dataset using A Hybrid Data and Algorithmic Level Solutions,”
SAI Comput. Conf. 2016, pp. 1–6, 2016.
[10]C. J. C. Burges, “A Tutorial on Support Vector Machines for Pattern
Recognition,” Data Min. Knowl. Discov., vol. 2, no. 2, pp. 121–167, 1998.
[11]Y. Freund, “A more robust boosting algorithm,” vol. arXiv:0905, 2009.
[12]G. Nisbet, R., Elder, J., Miner, Handbook of statistical analysis and data
mining applications. 2009.

More Related Content

What's hot

An enhanced kernel weighted collaborative recommended system to alleviate spa...
An enhanced kernel weighted collaborative recommended system to alleviate spa...An enhanced kernel weighted collaborative recommended system to alleviate spa...
An enhanced kernel weighted collaborative recommended system to alleviate spa...
IJECEIAES
 
A Two Stage Classification Model for Call Center Purchase Prediction
A Two Stage Classification Model for Call Center Purchase PredictionA Two Stage Classification Model for Call Center Purchase Prediction
A Two Stage Classification Model for Call Center Purchase Prediction
TELKOMNIKA JOURNAL
 
Internship project report,Predictive Modelling
Internship project report,Predictive ModellingInternship project report,Predictive Modelling
Internship project report,Predictive Modelling
Amit Kumar
 
The use of genetic algorithm, clustering and feature selection techniques in ...
The use of genetic algorithm, clustering and feature selection techniques in ...The use of genetic algorithm, clustering and feature selection techniques in ...
The use of genetic algorithm, clustering and feature selection techniques in ...
IJMIT JOURNAL
 

What's hot (19)

An enhanced kernel weighted collaborative recommended system to alleviate spa...
An enhanced kernel weighted collaborative recommended system to alleviate spa...An enhanced kernel weighted collaborative recommended system to alleviate spa...
An enhanced kernel weighted collaborative recommended system to alleviate spa...
 
Internship project presentation_final_upload
Internship project presentation_final_uploadInternship project presentation_final_upload
Internship project presentation_final_upload
 
When deep learners change their mind learning dynamics for active learning
When deep learners change their mind  learning dynamics for active learningWhen deep learners change their mind  learning dynamics for active learning
When deep learners change their mind learning dynamics for active learning
 
Opinion mining framework using proposed RB-bayes model for text classication
Opinion mining framework using proposed RB-bayes model for text classicationOpinion mining framework using proposed RB-bayes model for text classication
Opinion mining framework using proposed RB-bayes model for text classication
 
A Two Stage Classification Model for Call Center Purchase Prediction
A Two Stage Classification Model for Call Center Purchase PredictionA Two Stage Classification Model for Call Center Purchase Prediction
A Two Stage Classification Model for Call Center Purchase Prediction
 
A Study on Machine Learning and Its Working
A Study on Machine Learning and Its WorkingA Study on Machine Learning and Its Working
A Study on Machine Learning and Its Working
 
Customer churn classification using machine learning techniques
Customer churn classification using machine learning techniquesCustomer churn classification using machine learning techniques
Customer churn classification using machine learning techniques
 
Internship project report,Predictive Modelling
Internship project report,Predictive ModellingInternship project report,Predictive Modelling
Internship project report,Predictive Modelling
 
Data Mining on Customer Churn Classification
Data Mining on Customer Churn ClassificationData Mining on Customer Churn Classification
Data Mining on Customer Churn Classification
 
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...
 
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
 
IRJET- Aspect based Sentiment Analysis on Financial Data using Transferred Le...
IRJET- Aspect based Sentiment Analysis on Financial Data using Transferred Le...IRJET- Aspect based Sentiment Analysis on Financial Data using Transferred Le...
IRJET- Aspect based Sentiment Analysis on Financial Data using Transferred Le...
 
Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...
Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...
Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...
 
Accounting for variance in machine learning benchmarks
Accounting for variance in machine learning benchmarksAccounting for variance in machine learning benchmarks
Accounting for variance in machine learning benchmarks
 
PROVIDING A METHOD FOR DETERMINING THE INDEX OF CUSTOMER CHURN IN INDUSTRY
PROVIDING A METHOD FOR DETERMINING THE INDEX OF CUSTOMER CHURN IN INDUSTRYPROVIDING A METHOD FOR DETERMINING THE INDEX OF CUSTOMER CHURN IN INDUSTRY
PROVIDING A METHOD FOR DETERMINING THE INDEX OF CUSTOMER CHURN IN INDUSTRY
 
The use of genetic algorithm, clustering and feature selection techniques in ...
The use of genetic algorithm, clustering and feature selection techniques in ...The use of genetic algorithm, clustering and feature selection techniques in ...
The use of genetic algorithm, clustering and feature selection techniques in ...
 
MACHINE LEARNING TOOLBOX
MACHINE LEARNING TOOLBOXMACHINE LEARNING TOOLBOX
MACHINE LEARNING TOOLBOX
 
E-commerce online review for detecting influencing factors users perception
E-commerce online review for detecting influencing factors users perceptionE-commerce online review for detecting influencing factors users perception
E-commerce online review for detecting influencing factors users perception
 
On the benefit of logic-based machine learning to learn pairwise comparisons
On the benefit of logic-based machine learning to learn pairwise comparisonsOn the benefit of logic-based machine learning to learn pairwise comparisons
On the benefit of logic-based machine learning to learn pairwise comparisons
 

Similar to direct marketing in banking using data mining

Predicting churn with filter-based techniques and deep learning
Predicting churn with filter-based techniques and deep learningPredicting churn with filter-based techniques and deep learning
Predicting churn with filter-based techniques and deep learning
IJECEIAES
 

Similar to direct marketing in banking using data mining (20)

Analysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through ApplicationAnalysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through Application
 
ANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATION
ANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATIONANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATION
ANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATION
 
Analysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through ApplicationAnalysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through Application
 
Predicting reaction based on customer's transaction using machine learning a...
Predicting reaction based on customer's transaction using  machine learning a...Predicting reaction based on customer's transaction using  machine learning a...
Predicting reaction based on customer's transaction using machine learning a...
 
A Comparative Study on Identical Face Classification using Machine Learning
A Comparative Study on Identical Face Classification using Machine LearningA Comparative Study on Identical Face Classification using Machine Learning
A Comparative Study on Identical Face Classification using Machine Learning
 
Post Graduate Admission Prediction System
Post Graduate Admission Prediction SystemPost Graduate Admission Prediction System
Post Graduate Admission Prediction System
 
IRJET - An User Friendly Interface for Data Preprocessing and Visualizati...
IRJET -  	  An User Friendly Interface for Data Preprocessing and Visualizati...IRJET -  	  An User Friendly Interface for Data Preprocessing and Visualizati...
IRJET - An User Friendly Interface for Data Preprocessing and Visualizati...
 
CUSTOMER CHURN PREDICTION
CUSTOMER CHURN PREDICTIONCUSTOMER CHURN PREDICTION
CUSTOMER CHURN PREDICTION
 
churn_detection.pptx
churn_detection.pptxchurn_detection.pptx
churn_detection.pptx
 
Machine Learning Approaches to Predict Customer Churn in Telecommunications I...
Machine Learning Approaches to Predict Customer Churn in Telecommunications I...Machine Learning Approaches to Predict Customer Churn in Telecommunications I...
Machine Learning Approaches to Predict Customer Churn in Telecommunications I...
 
IRJET - Customer Churn Analysis in Telecom Industry
IRJET - Customer Churn Analysis in Telecom IndustryIRJET - Customer Churn Analysis in Telecom Industry
IRJET - Customer Churn Analysis in Telecom Industry
 
A method for missing values imputation of machine learning datasets
A method for missing values imputation of machine learning datasetsA method for missing values imputation of machine learning datasets
A method for missing values imputation of machine learning datasets
 
trialFinal report7th sem.pdf
trialFinal report7th sem.pdftrialFinal report7th sem.pdf
trialFinal report7th sem.pdf
 
A Modified CNN-Based Face Recognition System
A Modified CNN-Based Face Recognition SystemA Modified CNN-Based Face Recognition System
A Modified CNN-Based Face Recognition System
 
A Modified CNN-Based Face Recognition System
A Modified CNN-Based Face Recognition SystemA Modified CNN-Based Face Recognition System
A Modified CNN-Based Face Recognition System
 
A Modified CNN-Based Face Recognition System
A Modified CNN-Based Face Recognition SystemA Modified CNN-Based Face Recognition System
A Modified CNN-Based Face Recognition System
 
IRJET- Web based Hybrid Book Recommender System using Genetic Algorithm
IRJET- Web based Hybrid Book Recommender System using Genetic AlgorithmIRJET- Web based Hybrid Book Recommender System using Genetic Algorithm
IRJET- Web based Hybrid Book Recommender System using Genetic Algorithm
 
Predicting churn with filter-based techniques and deep learning
Predicting churn with filter-based techniques and deep learningPredicting churn with filter-based techniques and deep learning
Predicting churn with filter-based techniques and deep learning
 
IRJET- Breast Cancer Relapse Prognosis by Classic and Modern Structures o...
IRJET-  	  Breast Cancer Relapse Prognosis by Classic and Modern Structures o...IRJET-  	  Breast Cancer Relapse Prognosis by Classic and Modern Structures o...
IRJET- Breast Cancer Relapse Prognosis by Classic and Modern Structures o...
 
EVALUTION OF CHURN PREDICTING PROCESS USING CUSTOMER BEHAVIOUR PATTERN
EVALUTION OF CHURN PREDICTING PROCESS USING CUSTOMER BEHAVIOUR PATTERNEVALUTION OF CHURN PREDICTING PROCESS USING CUSTOMER BEHAVIOUR PATTERN
EVALUTION OF CHURN PREDICTING PROCESS USING CUSTOMER BEHAVIOUR PATTERN
 

Recently uploaded

INCOME TAXATION- CHAPTER 13-A ADDITIONAL CLAIMABLE COMPENSATION EXPENSE
INCOME TAXATION- CHAPTER 13-A ADDITIONAL CLAIMABLE COMPENSATION EXPENSEINCOME TAXATION- CHAPTER 13-A ADDITIONAL CLAIMABLE COMPENSATION EXPENSE
INCOME TAXATION- CHAPTER 13-A ADDITIONAL CLAIMABLE COMPENSATION EXPENSE
RicaAbellanosa
 
Abortion Clinic in Alberton +27791653574 WhatsApp Abortion Clinic Services in...
Abortion Clinic in Alberton +27791653574 WhatsApp Abortion Clinic Services in...Abortion Clinic in Alberton +27791653574 WhatsApp Abortion Clinic Services in...
Abortion Clinic in Alberton +27791653574 WhatsApp Abortion Clinic Services in...
mikehavy0
 

Recently uploaded (20)

Gain potential customers through Lead Generation
Gain potential customers through Lead GenerationGain potential customers through Lead Generation
Gain potential customers through Lead Generation
 
Macro PDF - How a Global Partner Marketing Concierge Team Can Drive Channel R...
Macro PDF - How a Global Partner Marketing Concierge Team Can Drive Channel R...Macro PDF - How a Global Partner Marketing Concierge Team Can Drive Channel R...
Macro PDF - How a Global Partner Marketing Concierge Team Can Drive Channel R...
 
Personal Brand Exploration Selk_Ingrid_DMBS_PB1_2024-01.pptx
Personal Brand Exploration Selk_Ingrid_DMBS_PB1_2024-01.pptxPersonal Brand Exploration Selk_Ingrid_DMBS_PB1_2024-01.pptx
Personal Brand Exploration Selk_Ingrid_DMBS_PB1_2024-01.pptx
 
SEO: A Beginner's Guide to Ranking Higher
SEO: A Beginner's Guide to Ranking HigherSEO: A Beginner's Guide to Ranking Higher
SEO: A Beginner's Guide to Ranking Higher
 
INCOME TAXATION- CHAPTER 13-A ADDITIONAL CLAIMABLE COMPENSATION EXPENSE
INCOME TAXATION- CHAPTER 13-A ADDITIONAL CLAIMABLE COMPENSATION EXPENSEINCOME TAXATION- CHAPTER 13-A ADDITIONAL CLAIMABLE COMPENSATION EXPENSE
INCOME TAXATION- CHAPTER 13-A ADDITIONAL CLAIMABLE COMPENSATION EXPENSE
 
Social Media Marketing Portfolio - Maharsh Benday
Social Media Marketing Portfolio - Maharsh BendaySocial Media Marketing Portfolio - Maharsh Benday
Social Media Marketing Portfolio - Maharsh Benday
 
How consumers use technology and the impacts on their lives
How consumers use technology and the impacts on their livesHow consumers use technology and the impacts on their lives
How consumers use technology and the impacts on their lives
 
Iranian Strikes on Israel Assessing the Impact and Implications.pptx
Iranian Strikes on Israel  Assessing the Impact and Implications.pptxIranian Strikes on Israel  Assessing the Impact and Implications.pptx
Iranian Strikes on Israel Assessing the Impact and Implications.pptx
 
Abortion Clinic in Alberton +27791653574 WhatsApp Abortion Clinic Services in...
Abortion Clinic in Alberton +27791653574 WhatsApp Abortion Clinic Services in...Abortion Clinic in Alberton +27791653574 WhatsApp Abortion Clinic Services in...
Abortion Clinic in Alberton +27791653574 WhatsApp Abortion Clinic Services in...
 
Aiizennxqc Digital Marketing | SEO & SMM
Aiizennxqc Digital Marketing | SEO & SMMAiizennxqc Digital Marketing | SEO & SMM
Aiizennxqc Digital Marketing | SEO & SMM
 
ch05.ppt Strategies in marketing channel
ch05.ppt Strategies in marketing channelch05.ppt Strategies in marketing channel
ch05.ppt Strategies in marketing channel
 
Cartona.pptx. Marketing how to present your project very well , discussed a...
Cartona.pptx.   Marketing how to present your project very well , discussed a...Cartona.pptx.   Marketing how to present your project very well , discussed a...
Cartona.pptx. Marketing how to present your project very well , discussed a...
 
Key Social Media Marketing Trends for 2024
Key Social Media Marketing Trends for 2024Key Social Media Marketing Trends for 2024
Key Social Media Marketing Trends for 2024
 
2024 Social Trends Report V4 from Later.com
2024 Social Trends Report V4 from Later.com2024 Social Trends Report V4 from Later.com
2024 Social Trends Report V4 from Later.com
 
Elevate Your Brand with Billion Broadcaster's Horizontal Lift Advertising
Elevate Your Brand with Billion Broadcaster's Horizontal Lift AdvertisingElevate Your Brand with Billion Broadcaster's Horizontal Lift Advertising
Elevate Your Brand with Billion Broadcaster's Horizontal Lift Advertising
 
Meta­ unveils­ enhanced­ gen-AI­ tools­ catering­ to­marketers.pdf
Meta­ unveils­ enhanced­ gen-AI­ tools­ catering­ to­marketers.pdfMeta­ unveils­ enhanced­ gen-AI­ tools­ catering­ to­marketers.pdf
Meta­ unveils­ enhanced­ gen-AI­ tools­ catering­ to­marketers.pdf
 
HOW TO HANDLE SALES OBJECTIONS | SELLING AND NEGOTIATION
HOW TO HANDLE SALES OBJECTIONS | SELLING AND NEGOTIATIONHOW TO HANDLE SALES OBJECTIONS | SELLING AND NEGOTIATION
HOW TO HANDLE SALES OBJECTIONS | SELLING AND NEGOTIATION
 
Social Media Marketing Portfolio - Maharsh Benday
Social Media Marketing Portfolio - Maharsh BendaySocial Media Marketing Portfolio - Maharsh Benday
Social Media Marketing Portfolio - Maharsh Benday
 
Resumé Karina Perez | Digital Strategist
Resumé Karina Perez | Digital StrategistResumé Karina Perez | Digital Strategist
Resumé Karina Perez | Digital Strategist
 
Global Trends in Market Reserch & Insights - Ray Poynter - May 2023.pdf
Global Trends in Market Reserch & Insights - Ray Poynter - May 2023.pdfGlobal Trends in Market Reserch & Insights - Ray Poynter - May 2023.pdf
Global Trends in Market Reserch & Insights - Ray Poynter - May 2023.pdf
 

direct marketing in banking using data mining

  • 1. On Identifying Potential Direct Marketing Consumers using Adaptive Boosted Support Vector Machine Armin Lawi Department of Computer Science Universitas Hasanuddin Makassar, Indonesia armin@unhas.ac.id Ali Akbar Velayaty*) Post-graduate Program of Electrical Engineering, Universitas Hasanuddin Makassar, Indonesia arvasit@gmail.com Zahir Zainuddin Dept. Electrical Engineering Universitas Hasanuddin Makassar, Indonesia zainuddinzahir@gmail.com Abstract— Identifying potential consumers for direct marketing to very large data is a difficult and impossible task to do manually. Therefore, the machine learning approach needs to help analyze the data to contribute in determining the marketing strategy policy. In this paper, vector machine support methods using the Adaboost algorithm are investigated to classify potential customers for direct marketing of large bank data marketing. The adaboost algorithm aims to build a better model than the model generated from a single classifier. Data obtained from UCI machine learning repository. Total data is processed as many as 9280 data with the number of attributes of 20 classes and 1 target class. Training data and test data are divided into 70% and 30%. This classification predicts the prospects for a deposit subscription. The results show that the SVM method using Adaboost algorithm obtained accuracy is 95.07% and the sensitivity is 91.65% higher than the ordinary SVM approach. Keywords—Bank Direct Marketing; Support Vector Machine; Adaboost. I. INTRODUCTION Business Intelligence is a term related to the use of information space and Intelligence mechanisms to support a decision [1]. One sector that implements this is the banking sector that receives and processes information from customers every day. One area of banking that collects information from customers is the marketing sector. There are two ways of banking in introduce the goods or services that is with the general advertising either through television, radio, newspapers, etc. or by targeting customers in particular or called the bank direct marketing [2]. Direct marketing is a marketing technique applied to the bank to introduce a product either in the form of goods or services to customers by phone, email and others [3]–[5]. However, the disadvantages of direct marketing are those customers who feel disturbed that can cause negative ratings for the bank and the cost and time spent by telemarketers is not proportional to the number of customers contacted [2] Over time, the amount of data that contains information from customers will increase. The increasing amount of data can be used to find information that serves to assist in making a decision. One of the computational methods that discuss the above problem is Data Mining [1]. Data mining is a field in the computational process that allows finding a pattern of a data that is often ignored by using a particular method. The pattern can be used to help take in a decision. One method of data mining is a classification method that is a technique of grouping data obtained based on the pattern produced previously. There are several models of classification among them is Support Vector Machine (SVM) method. Support Vector Machine is the method that forms the best separator in hyperplane field on the data that separates the two classes by maximizing the margin value of the two classes. According to [6] Although ordinary SVM shows good performance in classification, but is sensitive to sample and parameter settings. Some research indicates that the accuracy and performance of ordinary SVM can be improved by using the Ensemble method [7]. The Ensemble method is a method that combines the predictions generated by some classifier. One of the Ensemble Methods of a classification is boosting algorithm. which is more popular used compared to bagging algorithm [7]. So, in this research will compare the performance of ordinary SVM with SVM using adaboost algorithm. II. LITERATURE As for some related research that discusses the predictions of potential customers to subscribe deposits are as follows. Moro (2011), et al. [2] proposed Cross Industry Standard Process for Data Mining (CRISP-DM) method indicates that the support vector machine gets better AUC and ALIFT results than the Decision Tree and Naive Bayes methods. Feng, et al. [7] proposed a Bayesian networks method and their application shows that a single support vector machine method only produces an average level of accuracy that is not much different than other classification methods, and thus they conducted ensemble method to improve the accuracy of the classification method. Moro (2014), et al. [8] proposed a data-driven approach to predict the success of bank telemarketing by comparing four methods of data mining classification, i.e., Decision Tree, Neural Networks, Support Vector Machine and Logistic Regression. Their result showed that neural network is the best *) Corresponding Author: arvasit@gmail.com
  • 2. Name Type Age Numeric Job Categorical : "Admin.", "Blue-Collar", "Entrepreneur", "Housemaid", "Management", "Retired", "Self- Employed", "Services", "Student", "Technician", "Unemployed", "Unknown" Marital Categorical : "Divorced", "Married", "Single", "Unknown"; Education Categorical : "Basic.4y", "Basic.6y", "Basic.9y", "High.School", "Illiterate", "Professional.Course", "University.Degree", "Unknown" Default Categorical: "No","Yes","Unknown" Housing Categorical: "No","Yes","Unknown" Loan Categorical: "No","Yes","Unknown" Contact Categorical: "Cellular","Telephone" Month Categorical: "Jan", "Feb", "Mar", ..., "Nov", "Dec" Day_Of_Week Categorical: "Mon", "Tue", "Wed", "Thu", "Fri" Duration Numeric Campaign Numeric Pdays Numeric Previous Numeric Poutcome Categorical : "Failure", "Nonexistent", "Success" Emp.Var.Rate Numeric Cons.Price.Idx Numeric Cons.Conf.Idx Numeric Euribor3m Numeric Nr.Employed Numeric Y Binary: "Yes","No" performance, and support vector machines and neural networks are more flexible and have better learning skills among other classification methods. Alhakbani and Rifaie [9] proposed a handling class imbalance in direct marketing dataset using a hybrid data. An algorithmic level solution using the Syntethic Minority oversampling technique is implemented to balance the dataset and then use grid search to optimize the c, gamma, and kernel parameters. The models used the support vector machine to build the HybridDA model. The results obtained indicate that the technique is better than the techniques used by previous research. III. PRELIMINARIES The datasets used in this paper is taken from the University of California at Irvine (UCI) Machine Learning Repository. It is a bank marketing data collected from March 2008 to November 2010. The number of attributes is 20 attributes with the addition of 1 target class, and the total observation is 41,188 data [8]. Table I. Attributes Description A. Support Vector Machine Support Vector Machine (SVM) is one of the methods used in the problem of pattern classification (pattern classification). The basic idea of the SVM method is to maximize the boundary-field separator (hyperplane) that separates the data into two classes in a feature space. Support vector machine method is suitable for use on linearly separable data [10]. To classifying the data cannot be separated linearly and it should be transformed into feature space dimension such that the data can be separated linearly from the feature space. Feature space in practice usually has a higher dimension of input space. Suppose there is a dataset whose data has two attributes and two classes of positive and negative classes when depicted in two-dimensional space of data cannot be separated linearly. Therefore, the data is transformed to a higher dimension so that it can be separated linearly. This method is called kernel methods, which transform a linear SVM into non- linear SVM B. Adaboost Boosting is a common and effective method for building an accurate classifier by combining weak classifiers [7] The use of boosting is preferred because it focuses on misclassified issues and has a tendency to increase higher accuracy compared to bagging. The commonly used boosting algorithm is the adaboost algorithm Adaboost trains the basic classifier sequentially (iteratively) each iteration. The basic classification is trained by using data training with weight coefficients that depend on the classifier performance. On the previous iteration to give greater weight to the wrongly classified data (misclassified). If the classifier has been trained as much as desired, then the classifier is combined to form a final decision on the model that shows the best performance The steps of the Adaboost algorithm are [11] 1. Input : Training data along with its label , Component Learner, And the number of iterations of T. 2. Initialization of training data weight . (1) 3. For iteration a. Use the Component Learner algorithm to train a classification component, , On training weights. b. Calculate the weight of classification error on . (2) The learning trust index is calculated as: . (3) c. Renew the sample training weight . (4) d. Testing model with testing data
  • 3. Method Iteration Accuracy Sensitivity Ordinary SVM - 91,67 % 83,80 % Adaboost SVM 10 92,53 % 85,93 % 20 94,30 % 89,68 % 30 95,07 % 91,65 % 40 94,81 % 91,05 % 50 94,86 % 91,05 % 4. The last learning output Combination of all classifications Table III. Classification results of SVM Adaboost C. Performance Evaluation . (5) The performance result of the proposed method is evaluated based on the degree of accuracy and sensitivity. The measurement evaluation is obtained from the confusion matrix which consists of 4 parts of the classification results, i.e., True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN). The confusion matrix is given in Table II. Table II. Confusion Matrix/ Contingency Table Actual Prediction True False True TP FN False FP TN The accuracy level is defined as the degree of proximity between the predicted data with the actual data or the ratio of the amount of data that is correctly classified as shown by equation (6). Table III shows that the adaboost SVM method gives higher accuracy rate since the algorithm updates the weights of incorrect classified data to the specified iteration limit. The table indicates that the increasing number of iterations is not guaranteed the performance obtained will be better. In this result, the iteration 30th gives the best performance. VI. CONCLUSION In this paper proposed adaboost SVM method to improve prediction results of direct marketing of bank dataset compare to the ordinary SVM. The Adaboost algorithm successfully Accuracy rate . (6) improves the performance of the SVM method. The results The sensitivity level is defined as the ratio of data classified positively [12]given by the equation (7). obtained shows that the accuracy of the ordinary SVM only get 91.67% and sensitivity of 83.80%, whereas our proposed adaboost SVM gives accuracy to 95,07% and sensitivity th Sensitivity . (7) 91,65% in the 30 iteration. A. Dataset IV. EXPERIMENTAL DESIGN REFERENCES [1] S. Abbas, “Deposit subscribe Prediction using Data Mining Techniques based Real Marketing Dataset,” Int. J. Comput. Appl., vol. 110, no. 3, pp. The total of 41,188 data will gets better accuracy results; however, since the amount of data on the target deposit subscribed customers is only 4,640 data which is less than compared with customers who do not subscribed of 36,548 data. This sitation will tend to predict the new data more incline to deposit unsubscribe costumer, and therefore, in this experiment, the amount of data is balanced to obtain the same probability outcome. Hence, in this experiment use 9,280 total data with the amount of comparison of training data is 70% and testing data is 30%. B. Normalization data For each attribute of type "categorical" will be changed to "numerical". This is because data other than the "numerical" type cannot be processed. The next type of target is changed to 1 for "yes" class and -1 for "No" class C. Implementation In this research will compare the performance results from a single vector support method with support vector machine using the adaboost algorithm. V. RESULTS The performance comparison result of performance of ordinary SVM and adaboost SVM is given in Table III. 975–8887, 2015. [2] S. Moro and R. M. S. Laureano, “Using Data Mining for Bank Direct Marketing: An application of the CRISP-DM methodology,” Eur. Simul. Model. Conf., no. Figure 1, pp. 117–121, 2011. [3] C. Vajiramedhin and A. Suebsing, “Feature Selection with Data Balancing for Prediction of Bank Telemarketing,” Appl. Math. Sci., vol. 8, no. 114, pp. 5667–5672, 2014. [4] H. A. Elsalamony, “Bank Direct Marketing Analysis of Data Mining Techniques,” Int. J. Comput. Appl., vol. 85, no. 7, pp. 12–22, 2014. [5] H. Elsalamony and A. Elsayad, “Bank Direct Marketing Based on Neural Network,” Int. J. Eng. Adv. Technol., vol. 2, no. 6, pp. 392–400, 2013. [6] L. Zhou, K. K. Lai, and L. Yu, “Least squares support vector machines ensemble models for credit scoring,” Expert Syst. Appl., vol. 37, no. 1, pp. 127–133, 2010. [7] G. Feng, J.-D. Zhang, and S. Shaoyi Liao, “A novel method for combining Bayesian networks, theoretical analysis, and its applications,” Pattern Recognit., vol. 47, no. 5, pp. 2057–2069, 2014. [8] S. Moro, P. Cortez, and P. Rita, “A data-driven approach to predict the success of bank telemarketing,” Decis. Support Syst., vol. 62, pp. 22–31, 2014. [9] H. A. Alhakbani and M. M. Al-Rifaie, “Handling Class Imbalance in Direct Marketing Dataset using A Hybrid Data and Algorithmic Level Solutions,” SAI Comput. Conf. 2016, pp. 1–6, 2016.
  • 4. [10]C. J. C. Burges, “A Tutorial on Support Vector Machines for Pattern Recognition,” Data Min. Knowl. Discov., vol. 2, no. 2, pp. 121–167, 1998. [11]Y. Freund, “A more robust boosting algorithm,” vol. arXiv:0905, 2009. [12]G. Nisbet, R., Elder, J., Miner, Handbook of statistical analysis and data mining applications. 2009.