International Journal in Foundations of Computer Science & Technology (IJFCST), Vol. 3, No. 5, September 2013
DOI: 10.5121/ijfcst.2013.3503
GA-CFS APPROACH TO INCREASE THE ACCURACY
OF ESTIMATES IN ELECTIONS PARTICIPATION
Seyyed Reza Khaze 1, Ramin Jafarzadeh 2, Laya Ebrahimi 3, Isa Maleki 4
1,4 Department of Computer Engineering, Dehdasht Branch, Islamic Azad University, Dehdasht, Iran
2 Department of Computer Engineering, Khoy Branch, Islamic Azad University, Khoy, Iran
3 Department of Computer Engineering, Science and Research Branch, Islamic Azad University, West Azerbaijan, Iran
ABSTRACT
Predicting the main indicators of participation in elections and its influencing factors is a serious challenge for political and social planners. Given the uncertain nature of the analyses offered by political scientists, data mining researchers have tried to overcome the shortcomings of other methods by discovering the knowledge hidden in data. In this paper, we present a combined method of a Genetic Algorithm (GA) and Correlation-based Feature Selection (CFS) that identifies and removes noisy features from the full feature set, thereby increasing the precision of data mining classification methods for predicting participation in elections. Our results indicate that the proposed method can increase the prediction precision of the other methods.
KEYWORDS
Election Participation, Data Mining, Genetic Algorithm, Correlation-based Feature Selection, Classification
1. INTRODUCTION
Participation is the voluntary contribution of a group of people, movements, and organizations to political and social plans [1]; it springs from the sentiment of people and groups for solving their own problems and can play a major role in national development and in strengthening the credibility of a government's political and social leadership. Every person and group wants to live in a system that can answer their mental, political, ideological, and even financial needs, and without taking part in decisions they can find neither their position nor such a system. On the other hand, participation in political and social life lays the groundwork for stability in political systems. An election is the kind of event that defines and measures the amount of political participation in a social system [2] and can be the best example of people's political participation. Therefore, analyzing participation in elections can serve as a true analysis of people's political participation as a whole. Given this importance, scientists and analysts in the political and social sciences worldwide are developing models for determining the factors behind people's political participation, from which programs can be prepared based on the most important causes of participation or non-participation [3]. Governments also try to address the shortages and factors that keep people away, in addition to reinforcing the incentives to participate, based on the plans offered. To prevent the harm that low participation can do to a government and to reduce public dissatisfaction, it is better that participation be predicted before an election and that the encouraging and discouraging factors be determined, which leads to lower costs for a political system. Political experience has shown that a leadership that believes in democracy, together with people who live in such a society and are satisfied with their politicians, has a popular force against internal and external threats and earns the respect and cooperation of other societies and international organizations.
So far, analysts have offered various methods for predicting people's participation in elections and its influencing factors, but these methods have a probabilistic and uncertain nature and do not offer a high confidence coefficient [4]. On the other hand, political and social experts have proposed many factors for participation; their range is wide, and attending to all of them is time-consuming, if not impossible. Today, most scientists and analysts believe that data mining can discover the rules hidden in data [5]. In this paper, we try to overcome the problems of this probabilistic and uncertain nature with data mining methods and, by offering a trustworthy method, to discover the most important factors affecting people's participation in elections. Such a model can minimize the cost of the many factors considered by different experts and, by concentrating investment on the most important factors discovered from the data, help them increase participation in social decisions. In this paper, we first review the prediction of participation and its influencing factors with respect to previous work; we then try to increase the recognition rate and prediction precision of previous methods with a combined data mining model.
2. PREVIOUS WORK
Amin Babazadeh Sangar et al. [6] applied a classification tree, K-Nearest Neighbor (KNN), and naive Bayes to predict the amount of participation in an election and to discover its influencing factors, using the Orange toolkit and the CRISP-DM process, which provides an overall schema for architecting a system to predict participation in a presidential election; their results with the KNN algorithm were good. The KNN decision rule has been a ubiquitous classification tool with good scalability. Past experience has shown that the optimal choice of K depends on the data, making it laborious to tune the parameter for different applications [6]. The performance of a KNN classifier is primarily determined by the choice of K as well as the distance metric applied.
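As an illustration only (not the implementation used in [6]), a minimal KNN classifier can be sketched as follows; the training points and labels are hypothetical toy data, and Euclidean distance with majority voting is assumed:

```python
from collections import Counter
import math

def knn_predict(train, labels, query, k=3):
    """Classify `query` by majority vote among the k nearest
    training points under Euclidean distance."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train, labels))
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical data: two clusters labelled "yes"/"no".
train = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["yes", "yes", "yes", "no", "no", "no"]
print(knn_predict(train, labels, (2, 2), k=3))  # prints "yes"
```

The sensitivity to K mentioned above is visible even here: with k equal to the dataset size, every query collapses to the majority class.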
To test and assess the proposed model, they surveyed the opinions of 100 people eligible to vote in the 11th presidential election of the Islamic Republic of Iran in the town of Kohgiluyeh and predicted their participation in the election. In their paper they introduced the factors listed in Table 1 as the effective parameters for determining the amount of people's participation in the election.
Table1. Effective parameters introduced in [6] for people's participation in election
No. Important Feature
1 Age
2 Educational degree
3 Job
4 Political orientation
5 Attitude toward government services
6 Opinion about the type of participation in elections
7 Opinion about general policy in international affairs
8 Opinion about the election officials
9 Opinion about the candidates
They showed that, compared with the two other algorithms, the KNN algorithm predicts and classifies participation with greater precision. In this paper, we try to increase the precision of the algorithms used.
3. THE PROPOSED METHOD
In this paper, we present a combined method of GA and CFS that, by recognizing the noisy and unrelated features in the feature set of Amin Babazadeh Sangar et al. [6], helps the nearest neighbor algorithm, the decision tree, and naive Bayes classify with a better percentage. The classification tree is one possible approach to multistage decision making [7]. In fact, the decision tree is a proper tool for classifying data, making estimates, and providing predictions based on the features and characteristics the data have exhibited so far. Naive Bayes models are popular in machine learning applications because of their simplicity: each attribute contributes to the final decision equally and independently of the other attributes [8]. This simplicity translates into computational efficiency, which makes naive Bayes techniques attractive and suitable for many domains. Bayesian theory is a basic statistical method for solving problems of pattern recognition and classification.
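The independence assumption described above can be sketched in a few lines. This is a generic categorical naive Bayes with Laplace smoothing, not the classifier from [6]; the attribute values and labels below are hypothetical:

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Fit a categorical naive Bayes model: class priors plus
    per-class, per-attribute value counts."""
    priors = Counter(labels)
    counts = defaultdict(Counter)  # (class, attr_index) -> value counts
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            counts[(y, i)][v] += 1
    return priors, counts

def predict_nb(model, row):
    """Score each class as prior * product of per-attribute likelihoods,
    with Laplace smoothing (+1 count, observed values + 1 as vocabulary)."""
    priors, counts = model
    total = sum(priors.values())
    best, best_p = None, -1.0
    for y, n in priors.items():
        p = n / total
        for i, v in enumerate(row):
            c = counts[(y, i)]
            p *= (c[v] + 1) / (sum(c.values()) + len(c) + 1)
        if p > best_p:
            best, best_p = y, p
    return best

# Hypothetical rows: (age group, job) -> participation label.
rows = [("young", "student"), ("young", "student"),
        ("old", "farmer"), ("old", "farmer")]
labels = ["no", "no", "yes", "yes"]
model = train_nb(rows, labels)
print(predict_nb(model, ("old", "farmer")))  # prints "yes"
```

Each attribute multiplies the score independently, which is exactly the equal-and-independent contribution noted in [8].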
The GA was first proposed by John Holland, based on Darwin's theory of natural evolution [9]. A GA is an iterative stochastic process that does not guarantee convergence, but with certain strategies the chance of convergence increases, and the algorithm is implemented accordingly. The stopping condition can be a fixed, predetermined quantity, such as a number of generations or reaching a target fitness level. The GA is an effective optimization method that collects many of the positive features of stochastic methods, although changes to the GA can always be proposed with respect to the features of the problem at hand [10]. The first step in a GA is to create the initial population, either randomly or heuristically. Each member of the population is called a chromosome and represents a candidate solution to the problem. Chromosomes are improved over repeated cycles, and each cycle is called a generation: in each cycle the population is changed and a new generation is created that moves toward the optimal answer. To keep the best answer of each generation and prevent its loss, elitism can be used. Chromosomes are reformed and evolved in two ways. In the first, crossover, a number of chromosomes are randomly selected and combined to create new individuals. The second is called mutation, in which, in each iteration, one or more chromosomes are randomly selected, one of their genes is also randomly selected and changed by a specific mechanism, and new chromosomes are thus created. In the final step, the next population is selected from the extended population from among the best members and is considered the new generation. There are different methods for the selection step, among which roulette-wheel selection is a common choice. GAs differ in their crossover, mutation, and selection methods and in their application steps [11].
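The steps above can be sketched as a minimal bit-string GA. This is a generic illustration, not the authors' implementation; the population size, crossover probability, and mutation probability default to the values of Table 2, while the generation count and the stand-in fitness function (OneMax, maximizing the number of 1-bits) are chosen only for the demo:

```python
import random

def run_ga(fitness, length, pop_size=20, generations=100,
           p_cross=0.6, p_mut=0.033, seed=0):
    """Minimal GA over bit-string chromosomes: roulette-wheel selection,
    one-point crossover, bit-flip mutation, and elitism."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]

    def roulette(scores):
        # Spin a wheel whose slices are proportional to fitness.
        pick, acc = rng.uniform(0, sum(scores)), 0.0
        for ind, s in zip(pop, scores):
            acc += s
            if acc >= pick:
                return ind
        return pop[-1]

    best = max(pop, key=fitness)
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        new_pop = [best[:]]                      # elitism: best always survives
        while len(new_pop) < pop_size:
            a, b = roulette(scores)[:], roulette(scores)[:]
            if rng.random() < p_cross:           # one-point crossover
                cut = rng.randrange(1, length)
                a = a[:cut] + b[cut:]
            for i in range(length):              # bit-flip mutation
                if rng.random() < p_mut:
                    a[i] = 1 - a[i]
            new_pop.append(a)
        pop = new_pop
        best = max(pop + [best], key=fitness)
    return best

best = run_ga(fitness=sum, length=12, seed=1)
print(sum(best))
```

Interpreting each bit as "keep this feature or not" turns the same loop into the feature-subset search used later in this paper, with the fitness function swapped for CFS.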
Feature subset selection has a broad scope in research in statistics and pattern recognition [12]. Feature selection can reduce the economic cost and the human effort of data collection, implementation, and analysis [13]. Selecting a feature subset is important in data mining: a high number of features makes the training and testing of classification methods difficult [14] and lowers the precision of prediction [15]. In feature selection we try to identify unneeded and unrelated data in a preprocessing step and remove them from the feature domain [16]. A main problem in machine learning is recognizing the set of features that matters most for building a classification and prediction model [17]. One method for feature selection is CFS [18]. CFS has so far been used on many natural and artificial datasets. Experiments have indicated that, compared with learning algorithms such as the decision tree, nearest neighbor, and naive Bayes used by Amin Babazadeh Sangar et al., this method is faster at exposing noisy and unrelated features, and that in most cases the resulting classification and prediction are equal to or better than those of other methods. The CFS evaluator scores a subset of attributes with respect to the individual predictive ability of each feature and the degree of redundancy among them. A correlation coefficient is used to estimate the correlation between the attribute subset and the class label, and the inter-correlation among the considered features [19]. CFS searches for the best subset of attributes and can be paired with search methods such as backward elimination, forward selection, GA, best-first search, and bi-directional search. In this paper, we used the GA as the search method over the features affecting election participation, and CFS as the assessment mechanism, i.e., the fitness function, for the candidate subsets; the overall scheme is shown in Figure 1.
Figure 1. The proposed model based on the GA-CFS approach
We set the effective GA parameters according to the details in Table 2.
Table 2. Characteristics of the GA for Proposed Model
Parameter Value
Population size 20
Number of generations 100000
Probability of crossover 0.6
Probability of mutation 0.033
For the fitness function we used the CFS merit, given in equation (1):

r_ZC = k · r_zi / sqrt(k + k(k − 1) · r_ii)    (1)
[Figure 1 shows the flow of the proposed model: the general set of features for election participation forms the initial generation; crossover and mutation produce new feature subsets; the fitness of each subset is computed using CFS; selection follows; and the loop repeats until the maximum number of generations is reached, after which the best set of features is chosen.]
Here r_ZC is the correlation between the feature subset and the class variable, k is the number of features in the subset, r_zi is the average correlation between the features and the class variable, and r_ii is the average inter-correlation among the features of the subset. Combining the two algorithms, our method showed that by the 19728th generation crossover and mutation no longer changed the fitness value, so the search and assessment stopped. In the end, three features from the feature set prepared by Amin Babazadeh Sangar et al. [6] were identified as ineffective and noisy, and we removed them from the dataset. These features are age, job, and opinion about the candidates.
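Equation (1) can be transcribed directly into code. The sketch below assumes Pearson correlation as the association measure for numeric toy data (Hall's CFS [18] uses other measures, such as symmetric uncertainty, for discrete attributes), and the feature vectors are hypothetical:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = math.sqrt(sum((a - mx) ** 2 for a in x))
    vy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (vx * vy) if vx and vy else 0.0

def cfs_merit(features, target):
    """CFS merit of a subset, equation (1): k*r_zi / sqrt(k + k(k-1)*r_ii),
    where r_zi is the mean feature-class correlation and r_ii the mean
    feature-feature inter-correlation."""
    k = len(features)
    r_zi = sum(abs(pearson(f, target)) for f in features) / k
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ii = (sum(abs(pearson(features[i], features[j])) for i, j in pairs)
            / len(pairs)) if pairs else 0.0
    return k * r_zi / math.sqrt(k + k * (k - 1) * r_ii)

# Hypothetical check: padding a subset with a redundant copy of a feature
# raises r_ii and lowers the merit, which is what drives CFS.
y  = [0, 0, 1, 1, 1, 0]
f1 = [0, 1, 1, 1, 1, 0]
f2 = [0, 0, 1, 1, 0, 0]
print(cfs_merit([f1, f2], y) > cfs_merit([f1, f2, f1[:]], y))  # prints True
```

This penalty on redundancy is why the GA search converges to a small subset: adding a noisy or correlated feature increases the denominator faster than the numerator.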
4. EVALUATION OF THE PROPOSED METHOD
This section presents the results obtained with the method described in the paper. The experiments used the dataset of [6]. We classified the data after removing the noisy features from the feature set in Table 1; the results of the proposed method are shown in Figure 2.
Figure 2. The proposed approach compared with the other methods
It can be observed that the proposed method performed better with the decision tree and naive Bayes and increased the classification precision, although compared with KNN it is 1% weaker. Thus, in average precision, the proposed method was able to recognize the noisy data and increase the prediction precision by removing them from the dataset; combined with the GA, the classifiers perform prediction and classification with high accuracy.
5. CONCLUSIONS
In this paper, the GA is combined with classification trees, KNN, and naive Bayes. A combined method of GA and CFS is introduced that, by recognizing noisy and unrelated features, classifies with a better percentage when predicting participation in elections. We used the GA as the search method over the features affecting participation and CFS as the assessment mechanism for the candidate subsets; after removing the unrelated and noisy features, we classified the datasets with the other algorithms. The results indicated that the average prediction of the algorithms under the proposed method is better. The obtained results therefore demonstrate the good accuracy of the proposed GA-based method combined with the other algorithms.
REFERENCES
[1] Day, N. (1992). Political participation and democracy in Britain. Cambridge University Press.
[2] Feddersen, T., & Sandroni, A. (2006). A theory of participation in elections. The American Economic
Review, 96(4), 1271-1282.
[3] Brady, H. E., Verba, S., & Schlozman, K. L. (1995). Beyond SES: A resource model of political
participation. American Political Science Review, 271-294.
[4] Gill, G. S. (2008). Election Result Forecasting Using Two Layer Perceptron Network. Journal of Theoretical and Applied Information Technology, 47(11), 1019-1024.
[5] Gharehchopogh, F. S., & Khaze, S. R. (2012). Data Mining Application for Cyber Space Users Tendency in Blog Writing: A Case Study. International Journal of Computer Applications, 47(18), 40-46.
[6] Sangar, A. B., Khaze, S. R., & Ebrahimi, L. (2013). Participation Anticipating in Elections Using Data Mining Methods. International Journal on Cybernetics & Informatics, 2(2), 47-60.
[7] Kumari, M., & Godara, S. (2011). Comparative Study of Data Mining Classification Methods in Cardiovascular Disease Prediction. International Journal of Computer Science and Technology, 2(2), 304-308.
[8] McCallum, A., & Nigam, K. (1998). A Comparison of Event Models for Naive Bayes Text Classification. AAAI-98 Workshop on Learning for Text Categorization, Vol. 752, 41-48.
[9] Goldberg, D. E., & Holland, J. H. (1988). Genetic algorithms and machine learning. Machine
learning, 3(2), 95-99.
[10] Gharehchopogh, F.S., Maleki, I., & Farahmandian, M. (2012). New Approach for Solving Dynamic
Traveling Salesman Problem with Hybrid Genetic Algorithms and Ant Colony Optimization.
International Journal of Computer Applications, 53(1), 39-44.
[11] Grefenstette, J. J. (1986). Optimization of control parameters for genetic algorithms. Systems, Man
and Cybernetics, IEEE Transactions on, 16(1), 122-128.
[12] Dash, M., & Liu, H. (1997). Feature selection for classification. Intelligent data analysis, 1(3), 131-
156.
[13] Jain, A., & Zongker, D. (1997). Feature selection: Evaluation, application, and small sample
performance. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 19(2), 153-158.
[14] Sun, Z., Bebis, G., Yuan, X., & Louis, S. J. (2002). Genetic feature subset selection for gender
classification: A comparison study. In Applications of Computer Vision, 2002.(WACV 2002).
Proceedings. Sixth IEEE Workshop on (pp. 165-170). IEEE.
[15] Singhi, S. K., & Liu, H. (2006, June). Feature subset selection bias for classification learning. In
Proceedings of the 23rd international conference on Machine learning (pp. 849-856). ACM.
[16] Yang, J., & Honavar, V. (1998). Feature subset selection using a genetic algorithm. In Feature
extraction, construction and selection (pp. 117-136). Springer US.
[17] Hall, M. A., & Smith, L. A. (1999, May). Feature Selection for Machine Learning: Comparing a
Correlation-Based Filter Approach to the Wrapper. In FLAIRS Conference (pp. 235-239).
[18] Hall, M. A. (1999). Correlation-based feature selection for machine learning (Doctoral dissertation,
The University of Waikato).
[19] Dash, M., & Liu, H. (2003). Consistency-based search in feature selection. Artificial intelligence,
151(1), 155-176.
Authors
Seyyed Reza Khaze is a lecturer and member of the research committee of the Department of Computer Engineering, Dehdasht Branch, Islamic Azad University, Iran. He is a member of the editorial board and review board of several international journals and national conferences. His research interests are software cost estimation, machine learning, data mining, optimization, and artificial intelligence.
Ramin Jafarzadeh is a faculty member of the Department of Computer Engineering, Khoy Branch, Islamic Azad University, Iran. He is currently a PhD candidate in the Department of Computer Engineering at the Science and Research Branch, Islamic Azad University, Iran. His research interests are cloud computing, data mining, and artificial intelligence.
Laya Ebrahimi is an M.Sc. student in the Computer Engineering Department, Science and Research Branch, Islamic Azad University, West Azerbaijan, Iran. Her research interests are metaheuristic algorithms, data mining, and machine learning techniques.
Isa Maleki is a lecturer and member of the research committee of the Department of Computer Engineering, Dehdasht Branch, Islamic Azad University, Dehdasht, Iran. He also collaborates with the Dehdasht Universities Research Association NGO. He is a member of the review board of several national conferences. His research interests are software cost estimation, machine learning, data mining, optimization, and artificial intelligence.