IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 6, Ver. VI (Nov – Dec. 2015), PP 45-51
www.iosrjournals.org
DOI: 10.9790/0661-17664551 www.iosrjournals.org 45 | Page
Filter-Wrapper Approach to Feature Selection Using PSO-GA for
Arabic Document Classification with Naive Bayes Multinomial
Indriyani 1), Wawan Gunawan 2), Ardhon Rakhmadi 3)
1, 2, 3)
Department of Informatics, Faculty of Information Technology, Institut Teknologi Sepuluh Nopember,
Jl. Teknik Kimia, Sukolilo, Surabaya, 60117, Indonesia
Abstract: Text categorization and feature selection are two of the many problems in text data mining. In
text categorization, a document containing a collection of text is converted to a dataset format
consisting of features and a class: words become the features and document categories become the
class. Too many features can degrade classifier performance, because many features are redundant
or suboptimal, so feature selection is required to select the optimal features. This paper proposes a
feature selection strategy based on Particle Swarm Optimization (PSO) and Genetic Algorithm (GA)
methods for Arabic document classification with Naive Bayes Multinomial (NBM). PSO is adopted in
the first phase to eliminate insignificant features and pass the reduced feature set to the next phase.
In the second phase, the reduced features are optimized using a Genetic Algorithm (GA). These
methods greatly reduce the number of features and achieve higher classification accuracy than using
the full feature set without selection. The experiments yield accuracies of 85.31% for NBM, 83.91%
for NBM-PSO, and 90.20% for NBM-PSO-GA.
Keywords: Document Classification, Feature Selection, Particle Swarm Optimization (PSO), Genetic
Algorithm (GA).
I. Introduction
Text data mining[1] is a research domain involving many research areas, such as natural
language processing, machine learning, information retrieval[2], and data mining. Text categorization
and feature selection are two of the many text data mining problems. The text document
categorization problem has been studied by many researchers[3][4][5]; Yang and Pedersen, for example,
presented a comparative study of feature selection in text categorization[5]. Many machine learning methods
have been used for the text categorization problem, but the problem of feature selection is to find a
subset of features for optimal classification. Even some noise features may sharply reduce the
classification accuracy. Furthermore, a high number of features can slow down the classification
process or even make some classifiers inapplicable.
According to John, Kohavi, and Pfleger[6], there are mainly two types of feature selection
methods in machine learning: wrappers and filters. Wrappers use the classification accuracy of some
learning algorithm as their evaluation function. Since wrappers have to train a classifier for each
feature subset to be evaluated, they are usually much more time consuming especially when the
number of features is high. So wrappers are generally not suitable for text classification.
Hence, feature selection is commonly used in text classification to reduce the dimensionality
of feature space and improve the efficiency and accuracy of classifiers. Text classification tasks can
usually be accurately performed with less than 100 words in simple cases, and do best with words in
the thousands in the complex ones[7].
As opposed to wrappers, filters perform feature selection independently of the learning
algorithm that will use the selected features. In order to evaluate a feature, filters use an evaluation
metric that measures the ability of the feature to differentiate each class.
For this reason, our motivation is to build a good text classifier by investigating a hybrid filter
and wrapper method for feature selection. Our proposed algorithm uses Particle Swarm Optimization
for filtering; in a second phase, we optimize the filtered features using a Genetic Algorithm (GA)
wrapper, and classify with Naive Bayes Multinomial for Arabic document text classification. The
evaluation used an Arabic corpus of 478 documents from www.shamela.ws, which are independently
classified into seven categories.
II. Methodology
A general description of the research method is shown in Figure 1. The stages and methods used to
underpin this study are as follows:
A. DataSet Document
The documents used as experimental data were taken from www.shamela.ws, in the seven most
frequently discussed categories: Sholat, Zakat, Shaum, Hajj, Mutazawwij, Baa'a and Isytaro,
and Wakaf.
Fig 1. Research Stage
B. Preprocessing
In preprocessing, each document is converted to a document vector in the following sequence:
- Filtering: elimination of illegal characters (numbers and symbols) from the document's contents.
- Stoplist removal: removal of words in the stopword category, i.e. high-frequency words contained
in the stoplist data.
- Term extraction: extracting the terms (words) of each document to be processed and compiling
them into a vector of terms that represents the document.
- Stemming: reducing each term found in the document vector to its base form and grouping
similar terms.
- TF-IDF weighting: applying TF-IDF weighting to every term in the document vector.
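As a concrete illustration, the filtering, stoplist removal, term extraction, and TF-IDF weighting steps above can be sketched in a few lines of Python. This is a toy pipeline, not the authors' implementation: stemming is omitted (a real pipeline would plug in an Arabic stemmer after term extraction) and the stoplist is a placeholder.

```python
import math
import re

def tfidf_vectors(documents, stoplist=frozenset({"the", "of"})):
    """Toy preprocessing pipeline: filter out illegal characters
    (numbers and symbols), drop stopwords, extract terms, and weight
    each term with TF-IDF.  Stemming is intentionally omitted."""
    # Filtering + term extraction: keep alphabetic tokens only.
    tokenized = [
        [t for t in re.findall(r"[^\W\d_]+", doc.lower()) if t not in stoplist]
        for doc in documents
    ]
    n_docs = len(tokenized)
    # Document frequency of each term.
    df = {}
    for terms in tokenized:
        for t in set(terms):
            df[t] = df.get(t, 0) + 1
    # TF-IDF weighting: raw term frequency times log(N / df).
    vectors = []
    for terms in tokenized:
        vec = {}
        for t in terms:
            vec[t] = vec.get(t, 0) + 1
        for t in vec:
            vec[t] *= math.log(n_docs / df[t])
        vectors.append(vec)
    return vectors
```

With this weighting, a term appearing in every document gets weight zero, while rarer terms are boosted, which is what makes the resulting features discriminative for classification.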
C. Filter - Particle Swarm Optimization (PSO)
Particle swarm optimization (PSO) is an evolutionary computation technique developed by
Kennedy and Eberhart [11]. The Particle Swarms find optimal regions of the complex search space
through the interaction of individuals in the population. PSO is attractive for feature selection in that
particle swarms discover the best feature combinations as they fly within the subset space. During
movement, the current position of particle i is represented by a vector xi = (xi1, xi2, ..., xiD), where D
is the dimensionality of the search space. The velocity of particle i is represented as vi = (vi1, vi2, ...,
viD) and it must be in a range defined by parameters vmin and vmax. The personal best position of the
particle local best is the best previous position of that particle and the best position obtained by the
population thus far is called global best. According to the following equations, PSO searches for the
optimal solution by updating the velocity and the position of each particle:
xid(t+1) = xid(t) + vid(t+1) (1)
vid(t+1) = w * vid(t) + c1 * r1i * (pid - xid(t)) + c2 * r2i * (pgd - xid(t)) (2)
where t denotes the t-th iteration, d ∈ D denotes the d-th dimension of the search space, w is the inertia
weight, c1 and c2 are the learning factors, called the cognitive parameter and the social parameter
respectively, r1i and r2i are random values uniformly distributed in [0, 1], and pid and pgd represent the
elements of the local best and global best in the d-th dimension. The personal best position of particle i is
kept unless the new position has better fitness:

pi(t+1) = xi(t+1) if f(xi(t+1)) is better than f(pi(t)), otherwise pi(t+1) = pi(t) (3)
In this work, PSO is a filter with CFS (Correlation-based Feature Selection) as a fitness function.
Like the majority of feature selection techniques, CFS uses a search algorithm along with a function
to evaluate the worth of feature subsets. CFS measures the usefulness of each feature for predicting
the class label along with the level of intercorrelation among them, based on the hypothesis that good
feature subsets contain features highly correlated with (predictive of) the class, yet uncorrelated with
(not predictive of) each other [12].
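A minimal sketch of the filter phase follows, implementing the velocity and position updates of eqs. (1)-(3). The caller-supplied `fitness` function stands in for the CFS merit, and the threshold-at-0.5 encoding of real-valued positions into feature subsets is our illustrative choice, not necessarily the paper's.

```python
import random

def pso_filter(n_features, fitness, n_particles=10, iters=30,
               w=0.7, c1=1.5, c2=1.5, vmax=4.0, seed=0):
    """Minimal PSO for feature selection.  A particle's position is
    thresholded at 0.5 to obtain a candidate feature subset; `fitness`
    plays the role of the CFS merit and is supplied by the caller."""
    rnd = random.Random(seed)
    def subset(x):
        return [i for i, xi in enumerate(x) if xi > 0.5]
    X = [[rnd.random() for _ in range(n_features)] for _ in range(n_particles)]
    V = [[0.0] * n_features for _ in range(n_particles)]
    P = [x[:] for x in X]                        # personal bests (local best)
    pfit = [fitness(subset(x)) for x in X]
    g = max(range(n_particles), key=lambda i: pfit[i])
    G, gfit = P[g][:], pfit[g]                   # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rnd.random(), rnd.random()
                V[i][d] = (w * V[i][d]
                           + c1 * r1 * (P[i][d] - X[i][d])
                           + c2 * r2 * (G[d] - X[i][d]))   # eq. (2)
                V[i][d] = max(-vmax, min(vmax, V[i][d]))   # clamp to [vmin, vmax]
                X[i][d] += V[i][d]                          # eq. (1)
            f = fitness(subset(X[i]))
            if f > pfit[i]:                                 # eq. (3)
                P[i], pfit[i] = X[i][:], f
                if f > gfit:
                    G, gfit = X[i][:], f
    return subset(G), gfit
```

A caller would pass a fitness that rewards class-correlated, mutually uncorrelated features; here any subset-scoring function works.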
D. Design of the Wrapper Phase
After filtering, the next phase is the wrapper. Wrapper methods used previously usually adopt
random search strategies, such as the Adaptive Genetic Algorithm (AGA), Chaotic Binary Particle
Swarm Optimization (CBPSO), and the Clonal Selection Algorithm (CSA). The wrapper approach
consists of methods that choose a minimum subset of features satisfying an evaluation criterion.
It has been shown that the wrapper approach produces the best results among feature selection
methods [13], although it is time-consuming since each feature subset considered must be
evaluated with the classifier algorithm. To overcome this, we perform filtering with PSO in
advance so that we can reduce the execution time. In the wrapper method, the feature subset selection
algorithm exists as a wrapper around the data mining algorithm and its outcome evaluation. The induction
algorithm is used as a black box: the feature selection algorithm conducts a search for a proper subset
using the induction algorithm itself as part of the evaluation function. GA-based wrapper methods
involve a Genetic Algorithm (GA) as the search method over feature subsets.
GA is a random search method that effectively explores large search spaces [14]. The basic idea
of GA is to evolve a population of individuals (chromosomes), where an individual is a possible
solution to a given problem. In the case of searching for an appropriate subset of features, a population
consists of different subsets evolved by mutation, crossover, and selection operations. After
reaching the maximum number of generations, the algorithm returns the chromosome with the highest
fitness, i.e. the subset of features with the highest accuracy.
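The GA wrapper loop described above can be sketched as follows. This is a toy implementation: the `fitness` argument stands in for the classification accuracy of the induction algorithm (Naive Bayes Multinomial in this paper), and the operator choices, elitism plus tournament selection with one-point crossover and bit-flip mutation, are illustrative assumptions rather than the authors' exact configuration.

```python
import random

def ga_wrapper(candidates, fitness, pop_size=20, generations=40,
               p_cross=0.8, p_mut=0.02, seed=1):
    """GA wrapper phase sketch: chromosomes are bit-vectors over the
    PSO-reduced feature set `candidates`; `fitness` scores a decoded
    subset (e.g. classifier accuracy used as a black box)."""
    rnd = random.Random(seed)
    n = len(candidates)
    def decode(chrom):
        return [f for f, bit in zip(candidates, chrom) if bit]
    pop = [[rnd.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda c: fitness(decode(c)), reverse=True)
        next_pop = scored[:2]                      # elitism: keep the two best
        while len(next_pop) < pop_size:
            # Tournament selection of two parents.
            a, b = (max(rnd.sample(scored, 3),
                        key=lambda c: fitness(decode(c))) for _ in range(2))
            child = a[:]
            if rnd.random() < p_cross:             # one-point crossover
                cut = rnd.randrange(1, n)
                child = a[:cut] + b[cut:]
            for i in range(n):                     # bit-flip mutation
                if rnd.random() < p_mut:
                    child[i] = 1 - child[i]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=lambda c: fitness(decode(c)))
    return decode(best)
```

In the paper's setting, `fitness` would train and evaluate NBM on the decoded subset, which is exactly why filtering first with PSO (fewer candidate features) keeps this loop affordable.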
Figure 2. GA-based wrapper feature selection with Naive Bayes Multinomial as an induction
algorithm evaluating feature subset.
E. Naive Bayes Multinomial
Multinomial Naive Bayes (MNB) is the version of Naive Bayes that is commonly used for
text categorization problems. In the MNB classifier, each document is viewed as a collection of words
and the order of words is considered irrelevant.
The probability of a class value c given a test document d is computed as

P(c|d) = P(c) * Πw P(w|c)^nwd / P(d) (4)
where nwd is the number of times word w occurs in document d, P (w|c) is the probability of
observing word w given class c, P (c) is the prior probability of class c, and P (d) is a constant that
makes the probabilities for the different classes sum to one. P (c) is estimated by the proportion of
training documents pertaining to class c, and P(w|c) is estimated with a Laplace correction as

P(w|c) = (1 + nwc) / (k + Nc) (5)

where nwc is the number of times word w occurs in the training documents of class c, Nc is the total
number of words in those documents, and k is the size of the vocabulary.
Many text categorization problems are unbalanced. This can cause problems because of the
Laplace correction used in (5). Consider a word w in a two-class problem with classes c1 and c2, where w
is completely irrelevant to the classification. This means the odds ratio for that particular word should
be one, i.e. P(w|c1)/P(w|c2) = 1, so that this word does not influence the class probability. Assume the word
occurs with equal relative frequency 0.1 in the text of each of the two classes. Assume there are
20,000 words in the vocabulary (k = 20,000) and the total size of the corpus is 100,000 words. If one
class has far more training text than the other, the Laplace correction shifts the two smoothed
estimates of P(w|c) by different amounts, so the odds ratio for the irrelevant word deviates from one.
Fortunately, it turns out that there is a simple remedy: we can normalize the word counts in each
class so that the total size of the classes is the same for both classes after normalization[17].
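For illustration, eqs. (4) and (5) with the Laplace correction can be implemented in a few lines. This is a sketch, not the authors' code; the per-class count normalization remedy of [17] is omitted, and the toy training data in the usage below is invented.

```python
import math
from collections import Counter, defaultdict

def train_mnb(docs, labels):
    """Multinomial Naive Bayes: estimates the prior P(c) and, per eq. (5),
    the Laplace-smoothed likelihood P(w|c) = (1 + n_wc) / (k + N_c)."""
    vocab = {w for d in docs for w in d}
    k = len(vocab)
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for d, c in zip(docs, labels):
        word_counts[c].update(d)
    model = {}
    for c in class_docs:
        total = sum(word_counts[c].values())          # N_c
        prior = math.log(class_docs[c] / len(docs))   # log P(c)
        likelihood = {w: math.log((1 + word_counts[c][w]) / (k + total))
                      for w in vocab}
        unseen = math.log(1.0 / (k + total))          # word outside vocab
        model[c] = (prior, likelihood, unseen)
    return model

def classify(model, doc):
    """Eq. (4) in log space: argmax_c log P(c) + sum_w n_wd * log P(w|c).
    P(d) is a constant across classes, so it can be ignored."""
    scores = {}
    for c, (prior, likelihood, unseen) in model.items():
        scores[c] = prior + sum(likelihood.get(w, unseen) for w in doc)
    return max(scores, key=scores.get)
```

Working in log space avoids underflow from multiplying many small probabilities, which is the standard trick for MNB classifiers.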
III. Experimental results
In order to evaluate the performance of the proposed method, our algorithm has been tested
using 478 documents from www.shamela.ws, independently classified into seven categories, with
70% of the data used for training and 30% for testing. The number of words (features) in this
dataset is 1578. In our experiments, we first reduced the number of features to 618 using the
CFS-PSO filtering algorithm; in the second phase, we optimized the filtered features with the
Genetic Algorithm (GA) wrapper, reducing them to 266. We compared the accuracy of our proposed
method, NBM-PSO-GA, with NBM without filtering and with NBM using PSO filtering on the same dataset.
3.1. The classification results of the NBM, NBM-PSO, and NBM-PSO-GA
The classification results of the NBM, NBM-PSO, and NBM-PSO-GA Classifier are shown
in Tables 3.2, 3.3, and 3.4 respectively.
Table 3.1. Number of documents selected from Dataset
Class Name Doc Num.
Sholat صلاة 136
Zakat الزكاة 36
Shaum صيام 42
Hajj الحاج 64
Mutazawwij متزوج 46
Baa'a and Isytaro البيع والشراء 109
Wakaf الأوقاف 45
As shown in Table 3.2, when the Naive Bayes Multinomial classifier is used, the overall Accuracy
Rate of the classification result is 81.8%. The table indicates that Recall is 0.82, Precision is
0.82, and F-Measure is 0.82. Moreover, the table indicates that the "Sholat" classification accuracy
reaches 97.4%, while the Accuracy Rate of the "Shaum" class is as low as 66.7%, which influences
the entire classification result.
As shown in Table 3.3, when the Naive Bayes Multinomial classifier with filter feature selection
(PSO) is used, the overall Accuracy Rate is again 81.8%. The table indicates that Recall is 0.82,
Precision is 0.84, and F-Measure is 0.82. Moreover, the "Sholat" classification accuracy reaches
97.4%, while the Accuracy Rate of the "Baa'a and Isytaro" class is as low as 68.8%, which strongly
influences the entire classification result.
Finally, for the method we propose, shown in Table 3.4, Naive Bayes Multinomial with hybrid
feature selection combining the filter (PSO) and wrapper (GA): the overall Accuracy Rate is 90.2%,
Recall is 0.90, Precision is 0.91, and F-Measure is 0.90. Moreover, the "Sholat" classification
accuracy reaches 97.4%, while the Accuracy Rate of the "Mutazawwij" class is as low as 71.4%,
which strongly influences the entire classification result.
Table 3.2. Classification results of the Naive Bayes Multinomial
Class Name Recall Precision F-Measure Accuracy (%)
Sholat صلاة 0.97 0.93 0.95 97.4
Zakat الزكاة 0.82 0.69 0.75 81.8
Shaum صيام 0.67 0.75 0.71 66.7
Hajj الحاج 0.86 0.67 0.75 85.7
Mutazawwij متزوج 0.71 0.83 0.77 71.4
Baa'a and Isytaro البيع والشراء 0.75 0.84 0.79 75.0
Wakaf الأوقاف 0.73 0.79 0.76 73.3
Total 0.82 0.82 0.82 81.8
Table 3.3. Classification results of the Naive Bayes Multinomial with PSO filtering
Class Name Recall Precision F-Measure Accuracy (%)
Sholat صلاة 0.97 0.93 0.95 97.4
Zakat الزكاة 0.91 0.71 0.80 90.9
Shaum صيام 0.78 0.70 0.74 77.8
Hajj الحاج 0.93 0.65 0.77 92.9
Mutazawwij متزوج 0.71 0.71 0.71 71.4
Baa'a and Isytaro البيع والشراء 0.69 0.92 0.79 68.8
Wakaf الأوقاف 0.73 0.73 0.73 73.3
Total 0.82 0.84 0.82 81.8
Table 3.4. Classification results of the Naive Bayes Multinomial with PSO filtering + GA wrapper
Class Name Recall Precision F-Measure Accuracy (%)
Sholat صلاة 0.97 0.93 0.95 97.4
Zakat الزكاة 0.91 0.83 0.87 90.9
Shaum صيام 0.89 0.89 0.89 88.9
Hajj الحاج 0.93 0.81 0.87 92.9
Mutazawwij متزوج 0.71 0.71 0.71 71.4
Baa'a and Isytaro البيع والشراء 0.88 0.96 0.91 87.5
Wakaf الأوقاف 0.87 0.93 0.90 86.7
Total 0.90 0.91 0.90 90.2
The three tables above show that the method we propose achieves the highest accuracy, 90.2%,
compared with NBM and NBM-PSO, both of which reach 81.8%.
Figure 3. Accuracy with number of features
From Figure 3 it can be seen that feature selection using PSO filtering successfully reduces the
features from 1578 to 493 with the same classification accuracy as without any filtering; our
proposed method then optimizes the result of PSO filtering using GA, further reducing the number
of features to 266 while also increasing accuracy by 8.4%.
IV. Conclusion and future work
This paper proposed the integration of NBM-PSO-GA methods for Arabic document classification.
Our experiments revealed the importance of the feature selection process in building a classifier.
The integration of PSO + GA greatly reduced the features, keeping resource usage to a minimum
while at the same time improving the classification accuracy; the accuracy rate for our method is
90.2%. For future work, we will try to combine the classification method with other feature
selection techniques for optimal results.
Reference
[1]. Chakrabarti, S., 2000. Data mining for hypertext: A tutorial survey. SIGKDD explorations 1(2), ACM SIGKDD.
[2]. Salton, G., 1989. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley, Reading, MA.
[3]. Rifkin, R.M., 2002. Everything Old is New Again: A Fresh Look at Historical Approaches in Machine Learning, Ph.D.
Dissertation, Sloan School of Management Science, September, MIT.
[4]. Salton, G., 1989. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley, Reading, MA.
[5]. Yang, Y., Pedersen, J., 1997. A comparative study on feature selection in text categorization. In: Proceedings of ICML-97,
Fourteenth International Conference on Machine Learning.
[6]. John, G. H., Kohavi, R., & Pfleger, K. (1994). Irrelevant Features and the Subset
Selection Problem. In Proceedings of the 11th International Conference on machine
learning (pp. 121–129). San Francisco: Morgan Kaufmann
[7]. A. McCallum, K. Nigam, A comparison of event models for naive Bayes text classification, in: AAAI-98 Workshop on Learning
for Text Categorization, 752, 1998, pp. 41–48.
[8]. G. Forman, An extensive empirical study of feature selection metrics for text classification, J. Mach. Learn. Res. 3 (2003) 1289–
1305.
[9]. M. Rogati, Y. Yang, High-performing variable selection for text classification, in:
CIKM ‟02 Proceedings of the 11th International Conference on Information and
Knowledge Management, 2002, pp. 659–661.
[10]. Y. Yang, J.O. Pedersen, A comparative study on feature selection in text categorization, in: The Fourteenth International
Conference on Machine Learning (ICML 97), 1997, pp. 412–420.
[11]. J. Kennedy, R. Eberhart, "Particle Swarm Optimization", in: Proc. IEEE Int. Conf. on Neural Networks, Perth, 1995, pp. 1942-1948.
[12]. Matthew Settles, “An Introduction to Particle Swarm Optimization”, November 2007.
[13]. X. Zhiwei and W. Xinghua, "Research for information extraction based on wrapper model algorithm," in 2010 Second International Conference on Computer Research and Development, Kuala Lumpur, Malaysia, 2010, pp. 652-655.
[14]. D. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, MA, 1989.
[15]. Asha Gowda Karegowda, M.A. Jayaram, A.S. Manjunath, "Feature Subset Selection using Cascaded GA & CFS: A Filter Approach in Supervised Learning", International Journal of Computer Applications (0975-8887), Volume 23, No. 2, June 2011.
[16]. Hall, Mark A. and Smith, Lloyd A., "Feature subset selection: a correlation based filter approach", Springer, 1997.
[17]. Eibe Frank and Remco R. Bouckaert, "Naive Bayes for Text Classification with Unbalanced Classes", Computer Science Department, University of Waikato, 2006.
 
Machine learning for text document classification-efficient classification ap...
Machine learning for text document classification-efficient classification ap...Machine learning for text document classification-efficient classification ap...
Machine learning for text document classification-efficient classification ap...IAESIJAI
 
Survey on Efficient Techniques of Text Mining
Survey on Efficient Techniques of Text MiningSurvey on Efficient Techniques of Text Mining
Survey on Efficient Techniques of Text Miningvivatechijri
 
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILES
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILESAN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILES
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILEScscpconf
 
An efficient feature selection in
An efficient feature selection inAn efficient feature selection in
An efficient feature selection incsandit
 
Answer extraction and passage retrieval for
Answer extraction and passage retrieval forAnswer extraction and passage retrieval for
Answer extraction and passage retrieval forWaheeb Ahmed
 
AN EFFICIENT FEATURE SELECTION MODEL FOR IGBO TEXT
AN EFFICIENT FEATURE SELECTION MODEL FOR IGBO TEXTAN EFFICIENT FEATURE SELECTION MODEL FOR IGBO TEXT
AN EFFICIENT FEATURE SELECTION MODEL FOR IGBO TEXTIJDKP
 
IRJET- Survey of Feature Selection based on Ant Colony
IRJET- Survey of Feature Selection based on Ant ColonyIRJET- Survey of Feature Selection based on Ant Colony
IRJET- Survey of Feature Selection based on Ant ColonyIRJET Journal
 
Automatic Feature Subset Selection using Genetic Algorithm for Clustering
Automatic Feature Subset Selection using Genetic Algorithm for ClusteringAutomatic Feature Subset Selection using Genetic Algorithm for Clustering
Automatic Feature Subset Selection using Genetic Algorithm for Clusteringidescitation
 
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACHTEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACHIJDKP
 
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION cscpconf
 
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET Journal
 
Classification of text data using feature clustering algorithm
Classification of text data using feature clustering algorithmClassification of text data using feature clustering algorithm
Classification of text data using feature clustering algorithmeSAT Publishing House
 
APPLYING GENETIC ALGORITHMS TO INFORMATION RETRIEVAL USING VECTOR SPACE MODEL
APPLYING GENETIC ALGORITHMS TO INFORMATION RETRIEVAL USING VECTOR SPACE MODEL APPLYING GENETIC ALGORITHMS TO INFORMATION RETRIEVAL USING VECTOR SPACE MODEL
APPLYING GENETIC ALGORITHMS TO INFORMATION RETRIEVAL USING VECTOR SPACE MODEL IJCSEA Journal
 
Applying genetic algorithms to information retrieval using vector space model
Applying genetic algorithms to information retrieval using vector space modelApplying genetic algorithms to information retrieval using vector space model
Applying genetic algorithms to information retrieval using vector space modelIJCSEA Journal
 
An Evaluation of Preprocessing Techniques for Text Classification
An Evaluation of Preprocessing Techniques for Text ClassificationAn Evaluation of Preprocessing Techniques for Text Classification
An Evaluation of Preprocessing Techniques for Text ClassificationIJCSIS Research Publications
 
Improved feature selection using a hybrid side-blotched lizard algorithm and ...
Improved feature selection using a hybrid side-blotched lizard algorithm and ...Improved feature selection using a hybrid side-blotched lizard algorithm and ...
Improved feature selection using a hybrid side-blotched lizard algorithm and ...IJECEIAES
 

Similar to G017664551 (20)

Filter Based Approach for Genomic Feature Set Selection (FBA-GFS)
Filter Based Approach for Genomic Feature Set Selection (FBA-GFS)Filter Based Approach for Genomic Feature Set Selection (FBA-GFS)
Filter Based Approach for Genomic Feature Set Selection (FBA-GFS)
 
Filter Based Approach for Genomic Feature Set Selection (FBA-GFS)
Filter Based Approach for Genomic Feature Set Selection (FBA-GFS)Filter Based Approach for Genomic Feature Set Selection (FBA-GFS)
Filter Based Approach for Genomic Feature Set Selection (FBA-GFS)
 
Machine learning for text document classification-efficient classification ap...
Machine learning for text document classification-efficient classification ap...Machine learning for text document classification-efficient classification ap...
Machine learning for text document classification-efficient classification ap...
 
Survey on Efficient Techniques of Text Mining
Survey on Efficient Techniques of Text MiningSurvey on Efficient Techniques of Text Mining
Survey on Efficient Techniques of Text Mining
 
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILES
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILESAN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILES
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILES
 
An efficient feature selection in
An efficient feature selection inAn efficient feature selection in
An efficient feature selection in
 
I017235662
I017235662I017235662
I017235662
 
Answer extraction and passage retrieval for
Answer extraction and passage retrieval forAnswer extraction and passage retrieval for
Answer extraction and passage retrieval for
 
AN EFFICIENT FEATURE SELECTION MODEL FOR IGBO TEXT
AN EFFICIENT FEATURE SELECTION MODEL FOR IGBO TEXTAN EFFICIENT FEATURE SELECTION MODEL FOR IGBO TEXT
AN EFFICIENT FEATURE SELECTION MODEL FOR IGBO TEXT
 
T0 numtq0n tk=
T0 numtq0n tk=T0 numtq0n tk=
T0 numtq0n tk=
 
IRJET- Survey of Feature Selection based on Ant Colony
IRJET- Survey of Feature Selection based on Ant ColonyIRJET- Survey of Feature Selection based on Ant Colony
IRJET- Survey of Feature Selection based on Ant Colony
 
Automatic Feature Subset Selection using Genetic Algorithm for Clustering
Automatic Feature Subset Selection using Genetic Algorithm for ClusteringAutomatic Feature Subset Selection using Genetic Algorithm for Clustering
Automatic Feature Subset Selection using Genetic Algorithm for Clustering
 
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACHTEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH
 
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION
 
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
 
Classification of text data using feature clustering algorithm
Classification of text data using feature clustering algorithmClassification of text data using feature clustering algorithm
Classification of text data using feature clustering algorithm
 
APPLYING GENETIC ALGORITHMS TO INFORMATION RETRIEVAL USING VECTOR SPACE MODEL
APPLYING GENETIC ALGORITHMS TO INFORMATION RETRIEVAL USING VECTOR SPACE MODEL APPLYING GENETIC ALGORITHMS TO INFORMATION RETRIEVAL USING VECTOR SPACE MODEL
APPLYING GENETIC ALGORITHMS TO INFORMATION RETRIEVAL USING VECTOR SPACE MODEL
 
Applying genetic algorithms to information retrieval using vector space model
Applying genetic algorithms to information retrieval using vector space modelApplying genetic algorithms to information retrieval using vector space model
Applying genetic algorithms to information retrieval using vector space model
 
An Evaluation of Preprocessing Techniques for Text Classification
An Evaluation of Preprocessing Techniques for Text ClassificationAn Evaluation of Preprocessing Techniques for Text Classification
An Evaluation of Preprocessing Techniques for Text Classification
 
Improved feature selection using a hybrid side-blotched lizard algorithm and ...
Improved feature selection using a hybrid side-blotched lizard algorithm and ...Improved feature selection using a hybrid side-blotched lizard algorithm and ...
Improved feature selection using a hybrid side-blotched lizard algorithm and ...
 

More from IOSR Journals (20)

A011140104
A011140104A011140104
A011140104
 
M0111397100
M0111397100M0111397100
M0111397100
 
L011138596
L011138596L011138596
L011138596
 
K011138084
K011138084K011138084
K011138084
 
J011137479
J011137479J011137479
J011137479
 
I011136673
I011136673I011136673
I011136673
 
G011134454
G011134454G011134454
G011134454
 
H011135565
H011135565H011135565
H011135565
 
F011134043
F011134043F011134043
F011134043
 
E011133639
E011133639E011133639
E011133639
 
D011132635
D011132635D011132635
D011132635
 
C011131925
C011131925C011131925
C011131925
 
B011130918
B011130918B011130918
B011130918
 
A011130108
A011130108A011130108
A011130108
 
I011125160
I011125160I011125160
I011125160
 
H011124050
H011124050H011124050
H011124050
 
G011123539
G011123539G011123539
G011123539
 
F011123134
F011123134F011123134
F011123134
 
E011122530
E011122530E011122530
E011122530
 
D011121524
D011121524D011121524
D011121524
 

Recently uploaded

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 

Recently uploaded (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 

G017664551

Algorithm (GA).

I. Introduction

Text data mining [1] is a research domain that draws on several areas, including natural language processing, machine learning, information retrieval [2], and data mining. Text categorization and feature selection are two of its central problems. The text document categorization problem has been studied by many researchers [3][4][5]; Yang et al. presented a comparative study of feature selection in text classification [5]. Many machine learning methods have been applied to text categorization, but the feature selection problem remains: finding a subset of features that yields optimal classification. Even a few noisy features can sharply reduce classification accuracy, and a very large number of features can slow down the classification process or make some classifiers inapplicable altogether.

According to John, Kohavi, and Pfleger [6], there are two main types of feature selection methods in machine learning: wrappers and filters. Wrappers use the classification accuracy of some learning algorithm as their evaluation function. Since wrappers must train a classifier for every candidate feature subset, they are usually far more time-consuming, especially when the number of features is high, so wrappers alone are generally unsuitable for text classification. Feature selection is therefore commonly used in text classification to reduce the dimensionality of the feature space and improve the efficiency and accuracy of classifiers. Simple text classification tasks can often be performed accurately with fewer than 100 words, while complex ones do best with thousands of words [7]. In contrast to wrappers, filters perform feature selection independently of the learning algorithm that will use the selected features; to evaluate a feature, they use a metric that measures the feature's ability to discriminate between the classes.
Our motivation, therefore, is to build a good text classifier by investigating a hybrid filter-wrapper method for feature selection. The proposed algorithm applies Particle Swarm Optimization (PSO) as a filter in the first phase; in the second phase, the filtered features are further optimized with a Genetic Algorithm (GA) wrapper for classification with Naive Bayes Multinomial on Arabic text documents. The evaluation used an Arabic corpus of 478 documents from www.shamela.ws, independently classified into seven categories.

II. Methodology

The general flow of the research method is shown in Figure 1. The stages and methods that underpin this study are as follows.

A. Dataset

The documents used as experimental data were taken from www.shamela.ws, covering the seven most frequently discussed categories: Sholat, Zakat, Shaum, Hajj, Mutazawwij, Baa' and Isytaro, and Wakaf.

B. Preprocessing

Fig 1. Research Stage

In preprocessing, each document is converted into a document vector in the following order:
- Filtering: elimination of illegal characters (numbers and symbols) from the document's contents.
- Stoplist removal: removal of words in the stopword category, i.e. high-frequency words contained in the stoplist data.
- Terms extraction: extraction of the terms (words) of each document, compiled into a vector of terms that represents the document.
- Stemming: reduction of each term in the document vector to its base form, grouping similar terms together.
- TF-IDF weighting: weighting of every term in the document vector with TF-IDF.

C. Filter: Particle Swarm Optimization (PSO)

Particle swarm optimization (PSO) is an evolutionary computation technique developed by Kennedy and Eberhart [11].
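Before detailing the PSO filter, the preprocessing pipeline of Section B can be sketched in Python (an illustration only; the paper does not specify an implementation). The stoplist here is a tiny stand-in for a real Arabic stopword list, and the stemming step is omitted for brevity:

```python
import math
import re
from collections import Counter

# Stand-in stoplist; a real Arabic stopword list is much larger.
STOPWORDS = {"في", "من", "على"}

def preprocess(text):
    # Filtering: drop illegal characters (symbols and digits).
    text = re.sub(r"[^\w\s]|\d", " ", text)
    # Terms extraction + stoplist removal (stemming omitted in this sketch).
    return [t for t in text.split() if t not in STOPWORDS]

def tf_idf(docs):
    """docs: list of token lists -> list of {term: tf-idf weight} vectors."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in docs for t in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Raw tf * idf; a term present in every document gets weight 0.
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors
```

A term occurring in all documents receives zero weight under this (unsmoothed) idf, which is exactly the behavior wanted for uninformative high-frequency terms.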
Particle swarms find optimal regions of a complex search space through the interaction of the individuals in the population. PSO is attractive for feature selection because the swarm discovers good feature combinations as the particles fly within the subset space. During movement, the current position of particle i is represented by a vector xi = (xi1, xi2, ..., xiD), where D is the dimensionality of the search space. The velocity of particle i is represented as vi = (vi1, vi2, ..., viD) and must lie in a range defined by the parameters vmin and vmax. The best previous position of a particle is its personal (local) best, and the best position obtained by the population so far is called the global best. PSO searches for the optimal solution by updating the position and velocity of each particle according to:

xid(t+1) = xid(t) + vid(t+1)  (1)

vid(t+1) = w * vid(t) + c1 * r1i * (pid - xid(t)) + c2 * r2i * (pgd - xid(t))  (2)

where t denotes the t-th iteration, d ∈ D denotes the d-th dimension of the search space, w is the inertia weight, c1 and c2 are the learning factors (the cognitive and social parameters, respectively), r1i and r2i are random values uniformly distributed in [0, 1], and pid and pgd are the elements of the local best and global best in the d-th dimension. The personal best position of particle i is calculated as in Eq. (3).

In this work, PSO acts as a filter with CFS (Correlation-based Feature Selection) as its fitness function. Like most feature selection techniques, CFS uses a search algorithm together with a function that evaluates the worth of feature subsets. CFS measures the usefulness of each feature for predicting the class label, along with the level of intercorrelation among the features, based on the hypothesis that good feature subsets contain features highly correlated with (predictive of) the class, yet uncorrelated with (not predictive of) each other [12].
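Update equations (1) and (2) can be sketched as a single PSO step in Python (an illustration; the parameter values w, c1, c2, vmin, vmax below are arbitrary defaults, not the paper's settings):

```python
import random

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, vmin=-4.0, vmax=4.0):
    """One PSO update of a particle: velocity per Eq. (2), position per Eq. (1)."""
    new_x, new_v = [], []
    for d in range(len(x)):
        r1, r2 = random.random(), random.random()  # r1i, r2i ~ U[0, 1]
        vd = (w * v[d]
              + c1 * r1 * (pbest[d] - x[d])   # cognitive term
              + c2 * r2 * (gbest[d] - x[d]))  # social term
        vd = max(vmin, min(vmax, vd))          # keep velocity in [vmin, vmax]
        new_v.append(vd)
        new_x.append(x[d] + vd)                # Eq. (1)
    return new_x, new_v
```

When a particle already sits at both its personal and the global best, the random attraction terms vanish and only the inertia term w * vid(t) remains, so the particle coasts and slows down.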
D. Design of the Wrapper Phase

After filtering, the next phase is the wrapper. Wrapper methods usually adopt random search strategies, such as the Adaptive Genetic Algorithm (AGA), Chaotic Binary Particle Swarm Optimization (CBPSO), and the Clonal Selection Algorithm (CSA). The wrapper approach chooses a minimal subset of features that satisfies an evaluation criterion. It has been shown that the wrapper approach produces the best results among feature selection methods [13], although it is time-consuming, since each candidate feature subset must be evaluated with the classifier algorithm. To mitigate this cost, we filter with PSO first, which reduces the execution time. In the wrapper method, the feature subset selection algorithm wraps around the data mining algorithm and its outcome evaluation: the induction algorithm is used as a black box, and the feature selection algorithm searches for a proper subset using the induction algorithm itself as part of the evaluation function.

GA-based wrapper methods use a Genetic Algorithm (GA) as the search method over feature subsets. GA is a random search method that effectively explores large search spaces [14]. The basic idea of GA is to evolve a population of individuals (chromosomes), where each individual is a possible solution to the given problem. When searching for an appropriate subset of features, the population consists of different subsets evolved through mutation, crossover, and selection operations. After the maximum number of generations is reached, the algorithm returns the chromosome with the highest fitness, i.e. the subset of features with the highest accuracy.
Figure 2. GA-based wrapper feature selection with Naive Bayes Multinomial as the induction algorithm evaluating feature subsets.

E. Naive Bayes Multinomial
Multinomial Naive Bayes (MNB) is the version of Naive Bayes commonly used for text categorization problems. In the MNB classifier, each document is viewed as a collection of words and the order of words is considered irrelevant. The probability of a class value c given a test document d is computed as

P(c|d) = P(c) * Πw P(w|c)^nwd / P(d)   (4)

where nwd is the number of times word w occurs in document d, P(w|c) is the probability of observing word w given class c, P(c) is the prior probability of class c, and P(d) is a constant that makes the probabilities for the different classes sum to one. P(c) is estimated by the proportion of training documents pertaining to class c, and P(w|c) is estimated with the Laplace correction as

P(w|c) = (1 + nwc) / (k + nc)   (5)

where nwc is the number of times word w occurs in the training documents of class c, nc is the total number of word occurrences in class c, and k is the vocabulary size. Many text categorization problems are unbalanced. This can cause problems because of the Laplace correction used in (5). Consider a word w in a two-class problem with classes c1 and c2, where w is completely irrelevant to the classification. This means the odds ratio for that particular word should be one, i.e. P(w|c1)/P(w|c2) = 1, so that this word does not influence the class probability. Assume the word occurs with equal relative frequency 0.1 in the text of each of the two classes, that there are 20,000 words in the vocabulary (k = 20,000), and that the total size of the corpus is 100,000 words. If the two classes contain unequal numbers of words, the Laplace-corrected estimates from (5) differ between the classes, so the odds ratio deviates from one and this irrelevant word biases predictions toward one class. Fortunately, it turns out that there is a simple remedy: we can normalize the word counts in each class so that the total size of the classes is the same for both classes after normalization [17]. III. 
Experimental Results
In order to evaluate the performance of the proposed method, our algorithm was tested on 478 documents from www.shamela.ws, independently classified into seven categories, with 70% of the data used for training and 30% for testing. The number of words (features) in this dataset is 1578. In our experiments, we successfully reduced the number of features to
618 using the CFS-PSO filtering algorithm; in the second phase, we reduced the filtered features further using the Genetic Algorithm (GA) wrapper, optimizing them down to 266 features. We compared the accuracy of our proposed method, NBM-PSO-GA, with NBM without filtering and with NBM using PSO filtering on the same dataset.

3.1. The classification results of the NBM, NBM-PSO, and NBM-PSO-GA
The classification results of the NBM, NBM-PSO, and NBM-PSO-GA classifiers are shown in Tables 3.2, 3.3, and 3.4 respectively.

Table 3.1. Number of documents selected from the dataset
Class Name                         | Doc Num.
Sholat (صلاة)                      | 136
Zakat (الزكاة)                     | 36
Shaum (صيام)                       | 42
Hajj (الحاج)                       | 64
Mutazawwij (متزوج)                 | 46
Baa' and Isytaro (البيع والشراء)   | 109
Wakaf (الأوقاف)                    | 45

As shown in Table 3.2, when the Naive Bayes Multinomial classifier is used, the overall Accuracy Rate is 81.8%. The table indicates that Recall is 0.82, Precision is 0.82, and F-Measure is 0.82. Moreover, the table indicates that the "Sholat" classification accuracy reaches 97.4%, while the accuracy rate of the "Shaum" class is as low as 66.7%, which lowers the overall classification result.

As shown in Table 3.3, when the Naive Bayes Multinomial classifier with PSO filter feature selection is used, the overall Accuracy Rate is 81.8%. The table indicates that Recall is 0.82, Precision is 0.84, and F-Measure is 0.82. Moreover, the table indicates that the "Sholat" classification accuracy reaches 97.4%, while the accuracy rate of the "Baa' and Isytaro" class is as low as 68.8%, which strongly influences the overall classification result.

Finally, the results of the method we propose are shown in Table 3.4.
When the Naive Bayes Multinomial classifier with hybrid feature selection, Filter (PSO) plus Wrapper (GA), is used, the overall Accuracy Rate is 90.2%, Recall is 0.90, Precision is 0.91, and F-Measure is 0.90. Moreover, the table indicates that the "Sholat" classification accuracy reaches 97.4%, while the accuracy rate of the "Mutazawwij" class is as low as 71.4%, which strongly influences the overall classification result.

Table 3.2. Classification results of the Naive Bayes Multinomial
Class Name                         | Recall | Precision | F-Measure | Accuracy (%)
Sholat (صلاة)                      | 0.97   | 0.93      | 0.95      | 97.4
Zakat (الزكاة)                     | 0.82   | 0.69      | 0.75      | 81.8
Shaum (صيام)                       | 0.67   | 0.75      | 0.71      | 66.7
Hajj (الحاج)                       | 0.86   | 0.67      | 0.75      | 85.7
Mutazawwij (متزوج)                 | 0.71   | 0.83      | 0.77      | 71.4
Baa' and Isytaro (البيع والشراء)   | 0.75   | 0.84      | 0.79      | 75.0
Wakaf (الأوقاف)                    | 0.73   | 0.79      | 0.76      | 73.3
Total                              | 0.82   | 0.82      | 0.82      | 81.8

Table 3.3. Classification results of the Naive Bayes Multinomial with PSO filtering
Class Name                         | Recall | Precision | F-Measure | Accuracy (%)
Sholat (صلاة)                      | 0.97   | 0.93      | 0.95      | 97.4
Zakat (الزكاة)                     | 0.91   | 0.71      | 0.80      | 90.9
Shaum (صيام)                       | 0.78   | 0.70      | 0.74      | 77.8
Hajj (الحاج)                       | 0.93   | 0.65      | 0.77      | 92.9
Mutazawwij (متزوج)                 | 0.71   | 0.71      | 0.71      | 71.4
Baa' and Isytaro (البيع والشراء)   | 0.69   | 0.92      | 0.79      | 68.8
Wakaf (الأوقاف)                    | 0.73   | 0.73      | 0.73      | 73.3
Total                              | 0.82   | 0.84      | 0.82      | 81.8

Table 3.4. Classification results of the Naive Bayes Multinomial with PSO filtering + GA wrapper
Class Name                         | Recall | Precision | F-Measure | Accuracy (%)
Sholat (صلاة)                      | 0.97   | 0.93      | 0.95      | 97.4
Zakat (الزكاة)                     | 0.91   | 0.83      | 0.87      | 90.9
Shaum (صيام)                       | 0.89   | 0.89      | 0.89      | 88.9
Hajj (الحاج)                       | 0.93   | 0.81      | 0.87      | 92.9
Mutazawwij (متزوج)                 | 0.71   | 0.71      | 0.71      | 71.4
Baa' and Isytaro (البيع والشراء)   | 0.88   | 0.96      | 0.91      | 87.5
Wakaf (الأوقاف)                    | 0.87   | 0.93      | 0.90      | 86.7
Total                              | 0.90   | 0.91      | 0.90      | 90.2

The tables above show that the method we propose achieves the highest accuracy, 90.2%, compared with NBM and NBM-PSO, which both reach 81.8%.

Figure 3. Accuracy versus number of features

From Figure 3 it can be seen that feature selection using PSO filtering successfully reduced the features from 1578 to 493 while keeping the same classification accuracy as the unfiltered classifier, and that our proposed method then optimizes the PSO filtering result using GA, further reducing the number of features to 266 while also increasing the accuracy by 8.4%.

IV. Conclusion and Future Work
This paper proposed the integration of the NBM-PSO-GA methods for Arabic document classification. Our experiments revealed the importance of the feature selection process in building a classifier. The integration of PSO + GA greatly reduced the number of features, keeping resource usage to a minimum while at the same time improving classification accuracy; the accuracy rate of our method is 90.2%. For future work, we will try to combine the classifier with other feature selection methods to obtain optimal results.
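For completeness, per-class figures of the kind reported in Tables 3.2-3.4 can be recomputed directly from raw predictions. The sketch below uses made-up labels, not the paper's data; note that in the tables the per-class "Accurate" column appears to equal recall expressed as a percentage.

```python
def per_class_metrics(y_true, y_pred):
    """Recall, precision and F-measure for each class label."""
    out = {}
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        rec = tp / (tp + fn) if tp + fn else 0.0
        prec = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        out[c] = (rec, prec, f1)
    return out

# Toy example (not the paper's data): 6 documents, 2 classes.
y_true = ["sholat", "sholat", "zakat", "zakat", "zakat", "sholat"]
y_pred = ["sholat", "zakat", "zakat", "zakat", "sholat", "sholat"]
m = per_class_metrics(y_true, y_pred)
```

For "sholat" this yields recall 2/3 (two of the three sholat documents recovered), precision 2/3 (one zakat document misassigned to sholat), and F-measure 2/3.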
References
[1]. Chakrabarti, S., 2000. Data mining for hypertext: A tutorial survey. SIGKDD Explorations 1(2), ACM SIGKDD.
[2]. Salton, G., 1989. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley, Reading, MA.
[3]. Rifkin, R.M., 2002. Everything Old is New Again: A Fresh Look at Historical Approaches in Machine Learning. Ph.D. Dissertation, Sloan School of Management Science, MIT, September.
[4]. Salton, G., 1989. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley, Reading, MA.
[5]. Yang, Y., Pedersen, J., 1997. A comparative study on feature selection in text categorization. In: Proceedings of ICML-97, Fourteenth International Conference on Machine Learning.
[6]. John, G.H., Kohavi, R., Pfleger, K., 1994. Irrelevant features and the subset selection problem. In: Proceedings of the 11th International Conference on Machine Learning, pp. 121-129. Morgan Kaufmann, San Francisco.
[7]. McCallum, A., Nigam, K., 1998. A comparison of event models for naive Bayes text classification. In: AAAI-98 Workshop on Learning for Text Categorization, 752, pp. 41-48.
[8]. Forman, G., 2003. An extensive empirical study of feature selection metrics for text classification. Journal of Machine Learning Research 3, pp. 1289-1305.
[9]. Rogati, M., Yang, Y., 2002. High-performing feature selection for text classification. In: CIKM '02, Proceedings of the 11th International Conference on Information and Knowledge Management, pp. 659-661.
[10]. Yang, Y., Pedersen, J.O., 1997. A comparative study on feature selection in text categorization. In: The Fourteenth International Conference on Machine Learning (ICML 97), pp. 412-420.
[11]. Kennedy, J., Eberhart, R., 1995. Particle swarm optimization. In: Proceedings of the IEEE International Conference on Neural Networks, Perth, pp. 1942-1948.
[12]. Settles, M., 2007. An Introduction to Particle Swarm Optimization. November 2007.
[13]. Zhiwei, X., Xinghua, W., 2010. Research for information extraction based on wrapper model algorithm. In: 2010 Second International Conference on Computer Research and Development, Kuala Lumpur, Malaysia, pp. 652-655.
[14]. Goldberg, D., 1989. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, MA.
[15]. Karegowda, A.G., Jayaram, M.A., Manjunath, A.S., 2011. Feature subset selection using cascaded GA & CFS: a filter approach in supervised learning. International Journal of Computer Applications (0975-8887), Volume 23, No. 2, June 2011.
[16]. Hall, M.A., Smith, L.A., 1997. Feature subset selection: a correlation-based filter approach. Springer.
[17]. Frank, E., Bouckaert, R.R., 2006. Naive Bayes for text classification with unbalanced classes. Computer Science Department, University of Waikato.