Krishna Chaitanya Yarlagadda Main Poster- Support Vector machines
1. Yarlagadda, Merla-
Introduction
The support vector machine (SVM) is a powerful, state-of-the-art algorithm that
has both strong theoretical foundations and strong regularization properties.
These properties allow an SVM to generalize well to new data. The SVM is a
kernel-based algorithm: a kernel function implicitly maps the input data into a
high-dimensional space, and quadratic programming is used to find the global
optimum in that space. The banking industry is a key element in any industrial
economy. To stay healthy, banks must use marketing campaigns to attract and
retain customers. The success rate of marketing campaigns can be predicted with
different models. The goal of this paper is to show that effective use of SVM is
the best approach for data with a large number of variables, where the outcome
of the campaign occupies four different quadrants of the dimensional space and
cannot be fitted in a linear fashion.
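The kernel mapping described above can be illustrated with a minimal sketch. The three kernel functions correspond to the Radial Basis, Sigmoid, and Polynomial SVM variants named in this poster; the parameter names and default values (gamma, coef0, degree), the helper quadratic_features, and the four toy points are assumptions chosen for illustration and are not taken from the study. The toy points place one class in two opposite quadrants and the other class in the remaining two, so no straight line separates them, yet a single quadratic feature does.

```python
import math


def dot(x, z):
    """Ordinary inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x, z, gamma=0.5):
    """Radial basis kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, z, gamma=0.5, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * <x, z> + coef0)."""
    return math.tanh(gamma * dot(x, z) + coef0)


def polynomial_kernel(x, z, degree=2, coef0=0.0):
    """Polynomial kernel: (<x, z> + coef0) ** degree."""
    return (dot(x, z) + coef0) ** degree


def quadratic_features(x):
    """Explicit feature map whose inner product equals the degree-2
    polynomial kernel with coef0 = 0 (two-dimensional inputs only)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


# Hypothetical toy data: class +1 in quadrants I and III, class -1 in II and IV.
points = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
labels = [1, 1, -1, -1]

# The sign of the cross term x1 * x2 alone separates the two classes.
cross_term_signs = [1 if quadratic_features(p)[1] > 0 else -1 for p in points]

# The kernel value matches the inner product in the mapped space,
# so the mapping never has to be computed explicitly.
x, z = (1.0, 2.0), (3.0, -1.0)
kernel_value = polynomial_kernel(x, z)
explicit_value = dot(quadratic_features(x), quadratic_features(z))
```

In this sketch the separating feature is visible only because the data is two-dimensional; with many variables the same idea applies, but the mapped space is reached only through kernel evaluations.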
Data Preparation
The data was collected from the UCI Machine Learning Repository website.
The data set consists of 397783 observations and 25 variables such as job,
education, marital status, contact type, day, month, duration, last contact
date, type of campaign, and response, with most of the variables containing
multiple categories. For each categorical variable, dummy variables with the
values 0 and 1 are created to indicate the category. For example, a categorical
variable with 4 levels (A, B, C, D) is represented by
{A(1,0,0,0), B(0,1,0,0), C(0,0,1,0), D(0,0,0,1)}.
Several imputation techniques, such as tree imputation and synthetic
distribution, are used to replace some of the missing values. The distribution
of each variable was examined, and some of the variables were transformed
toward a normal distribution using the Transform Variables node in order to
improve the performance of the model.
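Two of the preparation steps above can be sketched in a few lines. The function dummy_code reproduces the 0/1 coding of the four-level example. The log transform is an assumption: the poster does not state which transformation was applied to each variable, so log1p stands in here as one common way to reduce right skew, and the raw values are made up for illustration.

```python
import math


def dummy_code(value, levels):
    """Return a 0/1 indicator list with a single 1 at the position of value."""
    return [1 if value == level else 0 for level in levels]


def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Dummy coding of the four-level example from the text.
levels = ["A", "B", "C", "D"]
encoded = {level: dummy_code(level, levels) for level in levels}

# Hypothetical right-skewed variable (not taken from the data set).
raw = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]

# Assumed transformation: log(1 + x) pulls in the long right tail.
transformed = [math.log1p(v) for v in raw]

skew_before = skewness(raw)
skew_after = skewness(transformed)
```

Comparing skew_before with skew_after gives a quick check of whether a transformation moved a variable closer to a symmetric shape before it is passed to the models.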
Model building and evaluation
Regression, Neural Network, Decision Tree, and Radial Basis, Sigmoid, and
Polynomial SVM models are built on this data (%training, %validation) and
compared. The validation average squared error, misclassification rate, ROC
curve, and cumulative lift statistics are used to evaluate the performance of
the models. The Radial Basis and Sigmoid SVM models turned out to be the best
models, with validation average squared errors of 0.06 and 0.07 and
misclassification rates of 0.56 and 0.57, respectively, and a cumulative lift
of 1.76. The success of a campaign depends mainly on the last contact and on
the outcome of the previous marketing campaign. March and November appear to
be the best months for bank marketing campaigns.
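The evaluation statistics named above can be computed as in the following sketch. The definitions are common textbook forms and are assumed to match the way the modeling tool reports them; the cutoff of 0.5, the lift depth of 20%, and the ten toy observations are hypothetical and are not the study's data or results.

```python
def average_squared_error(y_true, p_event):
    """Mean of (actual - predicted probability) ** 2 over all observations."""
    return sum((y - p) ** 2 for y, p in zip(y_true, p_event)) / len(y_true)


def misclassification_rate(y_true, p_event, cutoff=0.5):
    """Share of observations whose predicted class differs from the actual one."""
    wrong = sum(1 for y, p in zip(y_true, p_event) if (p >= cutoff) != (y == 1))
    return wrong / len(y_true)


def cumulative_lift(y_true, p_event, depth=0.2):
    """Response rate among the top-scored fraction divided by the overall rate."""
    ranked = sorted(zip(p_event, y_true), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, int(round(depth * len(ranked))))
    top_rate = sum(y for _, y in ranked[:top_n]) / top_n
    overall_rate = sum(y_true) / len(y_true)
    return top_rate / overall_rate


# Hypothetical validation data: 1 = customer responded, 0 = did not respond.
y_valid = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
p_valid = [0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3, 0.2, 0.1]

ase = average_squared_error(y_valid, p_valid)
misclassification = misclassification_rate(y_valid, p_valid)
lift = cumulative_lift(y_valid, p_valid)
```

A cumulative lift above 1 means that contacting the highest-scored customers first yields more responders than contacting customers at random, which is how the lift statistic supports the comparison of models.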
Figure 1. Model building
Discussion:
1) When a dataset has a large number of variables, a binary target may occupy
a non-linear shape in the dimensional space. SVMs are the best type of model
for this case: they map the data into a higher-dimensional space and give
unbiased results with higher accuracy.
2) The low average squared error and the long departure of the ROC curve from
the diagonal for the Radial Basis SVM and Sigmoid SVM models in the above
results show that SVMs are best at classifying data that occupies a non-linear
shape in the dimensional space.
3) Banks can try SVMs whenever the response to their campaigns or offers
occupies a non-linear shape in the dimensional space due to the presence of
hundreds or thousands of variables.
4) Neural networks, decision trees, and regression might also work well in
training on non-linearly separable data, but they might fail badly when the
model is validated.
Effective use of Support Vector Machines for the evaluation of the Banking
campaigns using SAS Enterprise Miner 12.1
Krishna Chaitanya Yarlagadda
Data Mining and Reporting Analyst, IQR Consulting, Oklahoma State University (Alumni), Stillwater, OK 74078
Faculty Advisor: Dr. Goutam Chakraborty
Figure 2. Prediction Accuracy Results
Figure 3. ROC curve of the different models
Results
References
1) An Introduction to Support Vector Machines and Other Kernel-based Learning
Methods, by Nello Cristianini and John Shawe-Taylor
2) A tutorial on – http://www.cs.columbia.edu/
3) Support Vector Machines, by Ingo Steinwart and Andreas Christmann
4) Support Vector Machines: Optimization Based Theory, Algorithms, and
Extensions, by Naiyang Deng, Yingjie Tian, and Chunhua Zhang
Acknowledgement
The authors wish to thank Dr. Goutam Chakraborty for his guidance and advice
on this project.
Authors Information:
Krishna Chaitanya Yarlagadda
E-mail: krishna.chaitanya.yarlagadda@okstate.edu
Work Phone: (269)365-1975
Figure 4. Best Models
Figure 5. Regularization parameter used by the Radial Basis Support Vector
Machine model: 0.775 (neither too large nor too small)
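The role of a regularization parameter can be seen in the standard soft-margin SVM formulation, given here as a sketch. The symbols w, b, the slack variables, the feature map, and C are textbook notation and are not taken from the poster, and it is an assumption that the reported value of 0.775 corresponds to C in the software's parameterization.

```latex
\min_{w,\,b,\,\xi} \quad \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\qquad \text{subject to} \qquad
y_i \left( w^{\top} \phi(x_i) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n
```

A very large C penalizes every margin violation heavily and tends to fit the training data too closely, while a very small C tolerates many violations and tends to underfit, which is why a moderate value is usually preferred.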