Yarlagadda, Merla-
Introduction
The support vector machine (SVM) is a powerful state-of-the-art algorithm that
has both strong theoretical foundations and strong regularization properties.
These properties allow a fitted SVM model to generalize well to new data. The
SVM is a kernel-based algorithm: a kernel implicitly maps the input data into
a high-dimensional space, and quadratic programming is used to find the global
optimum. The banking industry is a key element in any industrial economy. For
banks to be healthy, they must use marketing campaigns to attract and retain
customers. The success rate of marketing campaigns can be predicted with
different models. The goal of this paper is to show that an effectively used
SVM is the best choice for data with a large number of variables, where the
campaign outcomes occupy four different quadrants of the feature space and
cannot be separated by a linear boundary.
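As a rough illustration of the kernel idea, the three kernel families compared
on this poster (radial basis, sigmoid, polynomial) can be written as plain
functions of two input vectors. This is a minimal Python sketch; the parameter
names and default values (gamma, coef0, degree) are assumptions chosen for
illustration and are not the settings used in SAS Enterprise Miner.

```python
import math


def dot(x, z):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x, z, gamma=0.5):
    """Radial basis kernel: exp(-gamma * ||x - z||^2). gamma is illustrative."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, z, gamma=0.5, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * <x, z> + coef0)."""
    return math.tanh(gamma * dot(x, z) + coef0)


def polynomial_kernel(x, z, gamma=1.0, coef0=1.0, degree=2):
    """Polynomial kernel: (gamma * <x, z> + coef0) ** degree."""
    return (gamma * dot(x, z) + coef0) ** degree


def gram_matrix(points, kernel):
    """Matrix of pairwise kernel values, the input to the SVM quadratic program."""
    return [[kernel(p, q) for q in points] for p in points]
```

Each function returns the similarity of two observations in the implicit
high-dimensional space without ever constructing that space explicitly.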
Data Preparation
The data were collected from the UCI Machine Learning Repository. The data set
consists of 397783 observations and 25 variables, such as job, education,
marital status, contact type, day, month, duration, last contact date, type of
campaign, and response, most of which contain multiple categories. For each
categorical variable, dummy variables taking the values 0 and 1 are created to
indicate the category. For example, a categorical variable with 4 levels
(A, B, C, D) is represented by {A(1,0,0,0), B(0,1,0,0), C(0,0,1,0),
D(0,0,0,1)}. Several imputation techniques, such as tree imputation and
synthetic distribution, are used to replace some of the missing values. The
distribution of each variable was examined, and some variables were
transformed toward a normal distribution using the Transform Variables node in
order to improve the performance of the model.
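The dummy coding described above can be sketched in a few lines of Python.
The helper name one_hot is hypothetical and only illustrates the idea; the
actual coding for this poster was done inside SAS Enterprise Miner.

```python
def one_hot(value, levels):
    """Return a 0/1 indicator tuple with a single 1 at the position of `value`."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return tuple(1 if level == value else 0 for level in levels)


# The four-level example from the text: A, B, C, D.
levels = ["A", "B", "C", "D"]
encoded = {level: one_hot(level, levels) for level in levels}

for level, indicators in encoded.items():
    print(level, indicators)
```

Many regression tools drop one of the indicator columns to avoid perfect
collinearity with the intercept; the full coding is shown here because it
matches the representation given in the text.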
Model building and evaluation
Regression, Neural Network, Decision Tree, and Radial Basis, Sigmoid, and
Polynomial SVM models are built on this data (% training, % validation) and
compared. The validation average squared error, misclassification rate, ROC
curve, and cumulative lift are used to evaluate the performance of the models.
The Radial Basis and Sigmoid SVM models turned out to be the best models, with
validation average squared errors of 0.06 and 0.07, misclassification rates of
0.56 and 0.57, and a cumulative lift of 1.76. The success of a campaign
depends mainly on the last contact and on the outcome of the previous
marketing campaign. March and November appear to be the best months for bank
marketing campaigns.
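To make the evaluation statistics concrete, the sketch below computes average
squared error, misclassification rate, and cumulative lift for a binary
target in plain Python. The definitions are common textbook ones and may
differ in detail from what SAS Enterprise Miner reports; the 0.5 threshold,
the lift depth, and the toy data are all assumptions made for illustration.

```python
def average_squared_error(y_true, p_hat):
    """Mean of (actual - predicted probability)^2 over all observations."""
    return sum((y - p) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)


def misclassification_rate(y_true, p_hat, threshold=0.5):
    """Share of observations whose thresholded prediction disagrees with the target."""
    wrong = sum(1 for y, p in zip(y_true, p_hat) if (p >= threshold) != (y == 1))
    return wrong / len(y_true)


def cumulative_lift(y_true, p_hat, depth=0.1):
    """Response rate in the top `depth` share of scores divided by the overall rate."""
    n = len(y_true)
    top_n = max(1, int(round(n * depth)))
    ranked = sorted(zip(p_hat, y_true), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(y for _, y in ranked[:top_n]) / top_n
    base_rate = sum(y_true) / n
    return top_rate / base_rate


# Toy validation data (made up): 1 = responded to the campaign, 0 = did not.
y_valid = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
p_valid = [0.9, 0.2, 0.8, 0.6, 0.1, 0.3, 0.4, 0.2, 0.1, 0.1]

print(average_squared_error(y_valid, p_valid))
print(misclassification_rate(y_valid, p_valid))
print(cumulative_lift(y_valid, p_valid, depth=0.2))
```

A cumulative lift above 1 means that the highest-scored customers respond
more often than the customer base as a whole.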
Figure 1. Model building
Discussion:
1) With a large number of variables, a binary target may occupy a non-linear
region of the feature space. SVMs are the best type of model in this
situation: they map the data into a higher-dimensional space and give
unbiased results with higher accuracy.
2) The low average squared error and the large departure of the ROC curve
from the diagonal for the Radial Basis and Sigmoid SVM models show that
SVMs are best at classifying data that occupy a non-linear shape in the
feature space.
3) Banks can try SVMs whenever the response to their campaigns or offers
occupies a non-linear shape in the feature space because of the presence
of hundreds or thousands of variables.
4) Neural networks, decision trees, and regression might also fit
non-linearly separable data well in training, but they might fail badly
when the model is validated.
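The "four quadrants" situation can be illustrated with a toy example in
Python. In this sketch a plain perceptron stands in for a linear classifier
and an explicit product feature x1*x2 stands in for the implicit mapping of a
kernel; neither is the SVM solver used for the poster, and the four points
are made up for illustration.

```python
def train_perceptron(samples, labels, epochs=100):
    """Plain perceptron. Returns (weights, bias, converged)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # wrong side of (or exactly on) the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: the data are separated
            return w, b, True
    return w, b, False


# One point per quadrant; opposite quadrants share a label (an XOR pattern).
points = [(1.0, 1.0), (-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0)]
labels = [1, 1, -1, -1]

# No straight line separates these labels, so the perceptron never converges.
_, _, linear_ok = train_perceptron(points, labels)

# Adding the product x1 * x2 as a third coordinate makes the classes separable.
mapped = [(x1, x2, x1 * x2) for x1, x2 in points]
_, _, mapped_ok = train_perceptron(mapped, labels)

print(linear_ok, mapped_ok)
```

The product term is one of the features that a degree-2 polynomial kernel
supplies implicitly, which is why a kernel method can separate data that a
linear boundary in the original variables cannot.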
Effective use of Support Vector Machines for the evaluation of the Banking
campaigns using SAS Enterprise Miner 12.1
Krishna Chaitanya Yarlagadda
Data Mining and Reporting Analyst, IQR Consulting, Oklahoma State University (Alumni), Stillwater, OK 74078
Faculty Advisor: Dr. Goutam Chakraborty
Figure 2. Prediction Accuracy Results
Figure 3. ROC curve of the different models
Results
Acknowledgement
The authors wish to thank Dr.
Goutam Chakraborty for his
guidance and advice on this
project.
Authors Information:
Krishna Chaitanya Yarlagadda
E-mail: krishna.chaitanya.yarlagadda@okstate.edu
Work Phone: (269)365-1975
Figure 4. Best Models
Figure 5. Regularization parameter used by the Radial Basis SVM model: 0.775
(neither too large nor too small)
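For orientation, the role of a regularization parameter in a soft-margin SVM
can be written out. The formulation below is the standard textbook one, with
C denoting the penalty, \phi the kernel-induced feature map, and \xi_i the
slack variables; that the value 0.775 reported by SAS Enterprise Miner plays
exactly the role of C here is an assumption, not something the poster states.

```latex
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i,
\qquad \xi_i \ge 0,\quad i = 1,\dots,n .
```

A very large C penalizes every margin violation heavily and can lead to
overfitting, while a very small C tolerates many violations and can lead to
underfitting; a moderate value balances the two.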
