Spam is defined as redundant and unwanted electronic mail, and it now causes many problems in business life, such as
consuming network bandwidth and filling users' mailboxes. Because of these problems, much research has been carried out in this
area using classification techniques. Recent research shows that feature selection can have a positive effect on the efficiency of
machine learning algorithms. Most algorithms try to build a model of the data that depends on a small set of features.
Unrelated features in the model-building process result in weak estimates and more computation. This research evaluates
the detection of spam among legitimate emails, and the effect of feature selection on several machine learning algorithms, by presenting a
feature selection method based on a genetic algorithm. Bayesian network and KNN classifiers are used in the
classification phase, and the Spambase dataset is used.
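To make the idea of genetic-algorithm feature selection concrete, the following is a minimal sketch, not the authors' implementation. It assumes a bit-mask encoding (1 = keep the feature), leave-one-out 3-NN accuracy as the fitness function, truncation selection, single-point crossover, and bit-flip mutation. The tiny synthetic dataset, the function names, and all parameter values are hypothetical; the abstract itself only states that a genetic algorithm, a Bayesian network, KNN, and the Spambase dataset were used.

```python
import random


def knn_predict(train_X, train_y, x, mask, k=3):
    """Predict the label of x by majority vote of its k nearest neighbours,
    measuring squared Euclidean distance over the selected features only."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b, m in zip(row, x, mask) if m), label)
        for row, label in zip(train_X, train_y)
    )
    top = [label for _, label in dists[:k]]
    return max(set(top), key=top.count)


def fitness(mask, X, y, k=3):
    """Leave-one-out KNN accuracy using only the features where mask is 1."""
    if not any(mask):
        return 0.0  # an empty feature set cannot classify anything
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], mask, k) == y[i]:
            correct += 1
    return correct / len(X)


def select_features(X, y, generations=20, pop_size=12, mutation_rate=0.1, seed=0):
    """Evolve feature masks; return the best mask found and its fitness."""
    rng = random.Random(seed)
    n = len(X[0])
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the fitter half survives unchanged.
        ranked = sorted(population, key=lambda m: fitness(m, X, y), reverse=True)
        parents = ranked[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    best = max(population, key=lambda m: fitness(m, X, y))
    return best, fitness(best, X, y)


# Hypothetical toy data: feature 0 tracks the label, features 1-3 are noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(20)]
X = [
    [label + data_rng.gauss(0, 0.1), data_rng.random(), data_rng.random(), data_rng.random()]
    for label in y
]

best_mask, best_score = select_features(X, y)
print(best_mask, best_score)
```

Because the fitter half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next. On real data the fitness would normally be measured on a held-out split rather than on the same samples used for selection.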
Proposed efficient algorithm to filter spam using machine learning techniques
Ali Shafigh Aski a, *, Navid Khalilzadeh Sourati b
a Department of Computer Engineering, Islamic Azad University, Sari Branch, Islamic Republic of Iran
b Islamic Azad University of Amol, Ayatollah Amoli Branch, Islamic Republic of Iran
Identification of Spam Emails from Valid Emails by Using Voting
Editor IJCATR
In recent years, the increasing use of e-mail has led to the emergence and growth of problems caused by mass unwanted
messages, commonly known as spam. This study provides a new method for identifying and classifying spam using decision trees,
a support vector machine, Naïve Bayes, and a voting algorithm. To verify the proposed method, a set of
e-mails is chosen for testing. First, the three algorithms each try to detect spam; then spam is identified by voting on their outputs. The
advantage of this method is that it combines three algorithms at the same time: decision tree, support vector machine, and
Naïve Bayes. For the evaluation, a data set is analyzed with the Weka software. The resulting spam
detection charts indicate improved accuracy compared to previous methods.
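The voting step described above can be sketched as a simple majority vote over the labels produced by the three classifiers. This is an illustrative assumption: the abstract does not say whether the vote is weighted, and the classifier outputs below are hypothetical hard-coded labels rather than results from Weka.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by most classifiers for one message."""
    return Counter(predictions).most_common(1)[0][0]


def vote_all(per_classifier_labels):
    """Combine label lists (one list per classifier) message by message."""
    return [majority_vote(votes) for votes in zip(*per_classifier_labels)]


# Hypothetical outputs of three classifiers on the same four messages
tree_out = ["spam", "ham", "spam", "ham"]
svm_out = ["spam", "spam", "ham", "ham"]
nb_out = ["ham", "spam", "spam", "ham"]

combined = vote_all([tree_out, svm_out, nb_out])
print(combined)  # ['spam', 'spam', 'spam', 'ham']
```

With three voters and two classes a tie cannot occur, which is one practical reason to combine an odd number of classifiers.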
Analysis of an image spam in email based on content analysis
ijnlc
Researchers initially addressed spam detection as a text classification or
categorization problem. However, as spammers continue to develop new techniques and email
content becomes more diverse, text-based anti-spam approaches alone are not sufficient to
prevent spam. In an attempt to defeat anti-spam technologies, spammers have recently
adopted the image spam trick, which makes scrutiny of an email's body text ineffective. The main idea behind
this project is to design a spam detection system that can analyze the content of
emails, in particular artificially generated images sent as email attachments. The system
analyzes the image content, classifies the embedded image as spam or legitimate, and classifies the email
accordingly.
A Deep Analysis on Prevailing Spam Mail Filteration Machine Learning Approaches
ijtsrd
This work reviews the issue of spam mail, a major problem on the Internet. The growing volume of unsolicited bulk e-mail, or spam, has created the need for a dependable anti-spam filter. Nowadays, machine learning (ML) procedures are employed to filter spam e-mail automatically and effectively. We review some of the prevalent ML approaches, such as rough sets, Bayesian classification, SVMs, k-NN, ANNs, and artificial immune systems, and their usefulness for spam e-mail classification. We describe the procedures and compare their performance on the SpamAssassin corpus.
Anu | Ms. Preeti "A Deep Analysis on Prevailing Spam Mail Filteration Machine Learning Approaches" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4 | Issue-6 , October 2020, URL: https://www.ijtsrd.com/papers/ijtsrd33261.pdf Paper Url: https://www.ijtsrd.com/computer-science/data-processing/33261/a-deep-analysis-on-prevailing-spam-mail-filteration-machine-learning-approaches/anu
Empirical analysis of ensemble methods for the classification of robocalls in...
IJECEIAES
With the advent of technology, the use of cellular phones has become pervasive. Cellular phones have made life more convenient. However, individuals and groups have subverted telecommunication devices to deceive unwary victims. Robocalls are quite prevalent these days; they can be legal, or they can be used by scammers to trick people out of their money. The methodology proposed in the paper is to experiment with two ensemble models on a dataset acquired from the Federal Trade Commission (DNC Dataset). The call records are analyzed and, based on the patterns found, each call is classified as a robocall or not a robocall. Two algorithms, Random Forest and XGBoost, are combined in two ways and compared in terms of accuracy, sensitivity, and time taken.
A Novel Approach for Developing Paraphrase Detection System using Machine Lea...
Rudradityo Saha
Plagiarism detection is difficult because a sentence can be changed at several levels, namely lexical, semantic, and syntactic, to construct a paraphrased or plagiarized sentence posing as original. This project presents a novel supervised machine learning paraphrase detection system, developed through experiments on the Microsoft Research Paraphrase (MSRP) Corpus and assessed on the same corpus. The proposed system achieves performance comparable with existing paraphrase detection systems. The major contributions of this project are the use of a unique combination of lexical, semantic, and syntactic features, the use of Shapley Additive Explanations (SHAP) feature importance plots in XGBoost, and the application of a soft voting classifier comprising the top 3 performing standalone machine learning classifiers on the training dataset of the MSRP Corpus. Another major contribution is the finding that applying data augmentation techniques degrades the performance of machine learning classifiers.
Identifying Valid Email Spam Emails Using Decision Tree
Editor IJCATR
With the increasing use of e-mail and the growing number of Internet users sending unsolicited bulk e-mail, a need for anti-spam
filtering has arisen. A large number of filters have been produced in this area, each with its own method and parameters
for recognizing spam. The advantage of this method is the simultaneous use of two algorithms: the ID3-Mamdani decision tree and fuzzy Naive
Bayes. The two algorithms are first used to detect spam, and then a Bagging approach is used to identify spam. In the evaluation, a
dataset containing a thousand e-mails was analyzed with the Weka software; the charts provided indicate improved spam detection accuracy compared with previous
methods.
03 fauzi indonesian 9456 11nov17 edit septian
IAESIJEECS
Since the rise of the WWW, the information available online has been growing rapidly. One example is Indonesian online news. Automatic text classification has therefore become a very important task for information filtering. One of the major issues in text classification is the high dimensionality of the feature space. Most of the features are irrelevant, noisy, or redundant, which may reduce the accuracy of the system. Hence, feature selection is needed. Maximal Marginal Relevance for Feature Selection (MMR-FS) has been proven to be a good feature selection method for text with many redundant features, but it has high computational complexity. In this paper, we propose a two-phase feature selection method. In the first phase, to lower the complexity of MMR-FS, we use Information Gain to reduce the feature set. In the second phase, MMR-FS selects features from this reduced set. The experimental results show that the new method reaches a best accuracy of 86%. The new method lowers the complexity of MMR-FS while retaining its accuracy.
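As an illustration of the first phase, Information Gain for a single binary term feature is the class entropy minus the entropy that remains after splitting the documents on whether the term is present. The sketch below is a generic textbook computation under that assumption, not the paper's code; the labels and term-presence vectors are hypothetical toy data, and the second phase (MMR-FS) is not shown.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (labels.count(c) / total) * math.log2(labels.count(c) / total)
        for c in set(labels)
    )


def information_gain(feature_present, labels):
    """Entropy of the labels minus the weighted entropy after splitting
    the documents on whether the term is present."""
    total = len(labels)
    with_term = [lab for f, lab in zip(feature_present, labels) if f]
    without_term = [lab for f, lab in zip(feature_present, labels) if not f]
    remaining = 0.0
    for subset in (with_term, without_term):
        if subset:  # skip empty splits to avoid dividing by zero
            remaining += len(subset) / total * entropy(subset)
    return entropy(labels) - remaining


# Hypothetical toy corpus: four documents in two classes
labels = ["sports", "sports", "politics", "politics"]
perfect_term = [1, 1, 0, 0]   # appears only in sports documents
useless_term = [1, 0, 1, 0]   # appears equally in both classes

print(information_gain(perfect_term, labels))  # 1.0
print(information_gain(useless_term, labels))  # 0.0
```

Ranking terms by this score and keeping only the highest-scoring ones gives the reduced feature set that a more expensive second-phase selector would then work on.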
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Indonesian language email spam detection using N-gram and Naïve Bayes algorithm
journalBEEI
Indonesia is ranked 8th among all countries in the world for global spammers. A web-based spam filter service with a REST API can be used to detect Indonesian-language email spam on an email server or in various types of email server applications. With a REST API, applications exchange data in JSON format using existing HTTP commands. One commonly used type of spam filter is Bayesian filtering, in which the Naïve Bayes algorithm serves as the classification algorithm. In this study, the N-gram method is used to increase the accuracy of the Naïve Bayes implementation. The N-gram and Naïve Bayes algorithms for detecting Indonesian-language spam email were implemented successfully, with accuracy from 0.615 to 0.94, precision from 0.566 to 0.924, recall from 0.96 to 1.00, and F-measure from 0.721 to 0.942. The best result is obtained with the 5-gram method: accuracy of 0.94, precision of 0.924, recall of 0.96, and F-measure of 0.942.
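The combination of N-grams and Naïve Bayes can be sketched as a multinomial Naïve Bayes classifier whose features are word N-grams. The abstract does not state whether the N-grams are taken over words or characters, so word N-grams are an assumption here, as are the Laplace smoothing, the class name `NgramNaiveBayes`, and the English toy messages (the study itself uses Indonesian email).

```python
import math
from collections import Counter


def ngrams(text, n=2):
    """Split text into lowercase word n-grams."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over word n-grams with Laplace smoothing."""

    def __init__(self, n=2):
        self.n = n
        self.counts = {}             # label -> Counter of n-gram frequencies
        self.doc_counts = Counter()  # label -> number of training messages
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = ngrams(text, self.n)
            self.counts.setdefault(label, Counter()).update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        grams = ngrams(text, self.n)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, gram_counts in self.counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(gram_counts.values()) + len(self.vocab)
            for g in grams:
                score += math.log((gram_counts[g] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training set
train_texts = [
    "win free prize now",
    "free prize claim now",
    "meeting schedule for monday",
    "project meeting notes attached",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = NgramNaiveBayes(n=2).fit(train_texts, train_labels)
print(model.predict("claim your free prize"))     # spam
print(model.predict("notes for monday meeting"))  # ham
```

Larger values of `n` capture longer phrases but make the feature space sparser, so the best setting depends on the corpus.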
An incremental learning based framework for image spam filtering
IJCSEA Journal
Image spam remains an unsolved problem for two reasons. One is the diversity of
spamming tricks. The other is the evolving nature of image spam: as new spam constantly
emerges, filters' effectiveness drops over time. In this paper, we present an effective anti-spam approach
to solve both problems. First, a novel clustering filter is proposed. By exploiting a density-based
clustering algorithm, the proposed filter is robust to spamming tricks. Then, we present a hierarchical
framework that combines the clustering filter with other machine learning based classifiers to further
improve the filtering capacity. Moreover, an incremental learning mechanism is integrated to ensure that the
proposed framework can adjust itself to overcome new image spamming tricks. We evaluate
the proposed framework on two public spam corpora. The experimental results show that the proposed
framework achieves high precision along with a low false positive rate.
A Heuristic Approach for Network Data Clustering
idescitation
In this growing world of technology, every area of computer networks faces many
security threats. Most of the time, network security threats
produce high false positive and false negative ratios, which prevents a security
system from working properly. The overwhelming number of threats makes it challenging to understand
and manage the network data.
To address this problem, we present a novel approach that makes sense of the
network data by clustering it, without background knowledge of any threats, according to
various parameters such as source IP and destination IP. This approach saves
the administrator's time and energy in processing a large number of threats.
International Journal of Engineering Research and Development (IJERD)
IJERD Editor
Extraction of Data Using Comparable Entity Mining
iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
A Survey on: Sound Source Separation Methods
IJCERT
Nowadays, multimedia databases are growing rapidly and on a large scale. Singer identification technology has been developed for the effective management and exploration of large amounts of music data. With the help of this technology, songs performed by a particular singer can be clustered automatically. To improve the performance of singer identification, technologies have emerged that separate the singing voice from the music accompaniment. One of the methods used for this separation is non-negative matrix partial co-factorization. This paper studies different techniques for separating the singing voice from the music accompaniment.
Efficient Filtering Algorithms for Location-Aware Publish/subscribe
IJSRD
Location-based services have been widely used in many systems. Previous systems use a pull model, or user-initiated model, in which a user issues a query to a server, which responds with location-aware answers. To provide users with fast responses, a push model, or server-initiated model, is becoming an important computing model in next-generation location-based services. In the push model, subscribers register spatio-textual subscriptions to capture their interests, and publishers send spatio-textual messages. This calls for a high-performance location-aware publish/subscribe system that delivers publishers' messages to relevant subscribers. In this paper, we study the research challenges that arise in designing a location-aware publish/subscribe system. We propose an R-tree based index that merges textual descriptions into R-tree nodes. We design efficient filtering algorithms and effective pruning techniques to achieve high performance. The method also supports conjunctive queries and ranking queries.
Feature Selection Approach based on Firefly Algorithm and Chi-square
IJECEIAES
The dimensionality problem is a well-known challenge for most classifiers when datasets have an unbalanced number of samples and features. Features may contain unreliable data, which may lead the classification process to produce undesirable results. Feature selection is considered a solution to this kind of problem. In this paper, an enhanced firefly algorithm is proposed as a feature selection method for reducing dimensionality and picking the most informative features for classification. The main purpose of the proposed model is to improve classification accuracy by using the features it selects, so that classification errors decrease. The firefly is modeled by simulating its position with a cell chi-square value, which changes after every move, and simulating its intensity by calculating a set of different fitness functions as a weight for each feature. K-nearest neighbor and discriminant analysis are used as classifiers to test the proposed firefly algorithm for selecting features. Experimental results show that the proposed enhanced algorithm, based on the firefly algorithm with chi-square and different fitness functions, can provide better results than others. The results also show that reducing the dataset is useful for gaining higher classification accuracy.
Improved spambase dataset prediction using svm rbf kernel with adaptive boost
eSAT Journals
Abstract: Spam is no longer mere garbage but a risk, as it includes virus attachments and spyware agents that can ruin recipients' systems; there is therefore an emerging need for spam detection. Many spam detection techniques based on machine learning algorithms have been proposed. As the amount of spam has increased tremendously through bulk mailing tools, spam detection techniques must keep up with it. In this paper we propose a hybrid classifier, adaptive boosting with a support vector machine using an RBF kernel, on the Spambase dataset. The features are first extracted by principal component analysis. General Terms: Email spam classification. Keywords: Adaboost, classifier, ensemble, machine learning, spam email, SVM.
A Deep Analysis on Prevailing Spam Mail Filteration Machine Learning Approachesijtsrd
In this work, we have reviewed the issue of spam mail which is a big problem in the area of Internet. The growing size of uncalled mass e mail or spam has produced the requirement of a dependable anti spam filter. Now a days the Machine learning ML proedures are being employed to spontaneously filter the spam e mail in an effective manner. In this work, we have reviewed some of the prevalent ML approaches such as Rough sets, Bayesian classification, SVMs, k NN, ANNs and Artificial immune system and of their use fullness in the issue of spam Email taxonomy. We have provided the depictions of the procedures and the divergence of their enactment on the basis of the quantity of Spam Assassin. Anu | Ms. Preeti "A Deep Analysis on Prevailing Spam Mail Filteration Machine Learning Approaches" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4 | Issue-6 , October 2020, URL: https://www.ijtsrd.com/papers/ijtsrd33261.pdf Paper Url: https://www.ijtsrd.com/computer-science/data-processing/33261/a-deep-analysis-on-prevailing-spam-mail-filteration-machine-learning-approaches/anu
Empirical analysis of ensemble methods for the classification of robocalls in...IJECEIAES
With the advent of technology, there has been an excessive use of cellular phones. Cellular phones have made life convenient in our society. However, individuals and groups have subverted the telecommunication devices to deceive unwary victims. Robocalls are quite prevalent these days and they can either be legal or used by scammers to trick one out of their money. The proposed methodology in the paper is to experiment two ensemble models on the dataset acquired from the Federal Trade Commission(DNC Dataset). It is imperative to analyze the call records and based on the patterns the calls can classify as a robocall or not a robocall. Two algorithms Random Forest and XgBoost are combined in two ways and compared in the paper in terms of accuracy, sensitivity and the time taken.
A Novel Approach for Developing Paraphrase Detection System using Machine Lea...Rudradityo Saha
Plagiarism detection is difficult since there can be changes made to a sentence at several levels, namely, lexical, semantic, and syntactic level, to construct a paraphrased or plagiarized sentence posing as original. This project presents a novel Supervised Machine Learning Classification Paraphrase Detection System developed by conducting experiments using Microsoft Research Paraphrase (MSRP) Corpus and assessed on the same. The proposed paraphrase detection system has achieved comparable performance with existing paraphrase detection systems. The major contributions of this project are the utilization of a unique combination of lexical, semantic, and syntactic features, utilization of Shapley Additive Explanations (SHAP) Feature Importance Plots in XGBoost, and application of a soft voting classifier comprising of the top 3 performing standalone machine learning classifiers on the training dataset of MSRP Corpus. Another major contribution of the project is the finding that applying data augmentation techniques degrades the performance of machine learning classifiers.
Identifying Valid Email Spam Emails Using Decision TreeEditor IJCATR
The increasing use of e-mail and the growing trend of Internet users sending unsolicited bulk e-mail, the need for an antispam
filtering or have created, Filter large poster have been produced in this area, each with its own method and some parameters are
to recognize spam. The advantage of this method is the simultaneous use of two algorithms decision tree ID3 - Mamdani and Naive
Bayesian is fuzzy. The first two algorithms are then used to detect spam Bagging approach is to identify spam. In the evaluation of this
dataset contains a thousand letters have been analyzed by the software Weka charts provided in spam detection accuracy than previous
methods of improvement
03 fauzi indonesian 9456 11nov17 edit septianIAESIJEECS
Since the rise of WWW, information available online is growing rapidly. One of the example is Indonesian online news. Therefore, automatic text classification became very important task for information filtering. One of the major issue in text classification is its high dimensionality of feature space. Most of the features are irrelevant, noisy, and redundant, which may decline the accuracy of the system. Hence, feature selection is needed. Maximal Marginal Relevance for Feature Selection (MMR-FS) has been proven to be a good feature selection for text with many redundant features, but it has high computational complexity. In this paper, we propose a two-phased feature selection method. In the first phase, to lower the complexity of MMR-FS we utilize Information Gain first to reduce features. This reduced feature will be selected using MMR-FS in the second phase. The experiment result showed that our new method can reach the best accuracy by 86%. This new method could lower the complexity of MMR-FS but still retain its accuracy.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Indonesian language email spam detection using N-gram and Naïve Bayes algorithmjournalBEEI
Indonesia is ranked the top 8th out of the total country population in the world for the global spammers. Web-based spam filter service with the REST API type can be used to detect email spam in the Indonesian language on the email server or various types of email server applications. With REST API, then there will be data exchange between the applications with JSON data type using existing HTTP commands. One type of spam filter commonly used is Bayesian Filtering, where the Naïve Bayes algorithm is used as a classification algorithm. Meanwhile, the N-gram method is used to increase the accuracy of the implementation of the Naïve Bayes algorithm in this study. N-gram and Naïve Bayes algorithms to detect spam email in the Indonesian language have successfully been implemented with accuracy around 0.615 until 0.94, precision at 0.566 until 0.924, recall at 0.96 until 1.00, and F-measure at 0.721 until 0.942. The best solution is found by using the 5-gram method with the highest score of accuracy at 0.94, precision at 0.924, recall at 0.96, and F-measure value at 0.942.
An incremental learning based framework for image spam filteringIJCSEA Journal
Nowadays, an image spam is an unsolved problem because of two reasons. One is due to the diversity of
spamming tricks. The other reason is due to the evolving nature of image spam. As new spam constantly
emerging, filters’ effectiveness drops over time. In this paper, we present an effective anti-spam approach
to solve the two problems. First, a novel clustering filter is proposed. By exploring the density-based
clustering algorithm, the proposed filter is robust to spamming tricks. Then, we present a hierarchical
framework by combining the clustering filter with other machine learning based classifiers to further
improve the filtering capacity. Moreover, incremental learning mechanism is integrated to ensure the
proposed framework be capable of adjusting itself to overcome new image spamming tricks. We evaluate
the proposed framework on two public spam corpora. The experiment results show that the proposed
framework achieves high precision along with low false positive rate.
A Heuristic Approach for Network Data Clusteringidescitation
In this growing world of technology there are lots of security threats received by
each and every area of computer networks. Most of the time the network security threats
produce high false positive and negative ratios, this creates an obstacle for any security
system to work improperly. The overwhelming threats make it challenging to understand
and manage the network data.
To address this problem we present a novel approach which eventually understand the
network data by clustering them without background knowledge of any threats according to
various parameters like source IP, Destination IP etc. And this approach saves
administrator’s time and energy in processing of large amount threats.
International Journal of Engineering Research and Development (IJERD)IJERD Editor
journal publishing, how to publish research paper, Call For research paper, international journal, publishing a paper, IJERD, journal of science and technology, how to get a research paper published, publishing a paper, publishing of journal, publishing of research paper, reserach and review articles, IJERD Journal, How to publish your research paper, publish research paper, open access engineering journal, Engineering journal, Mathemetics journal, Physics journal, Chemistry journal, Computer Engineering, Computer Science journal, how to submit your paper, peer reviw journal, indexed journal, reserach and review articles, engineering journal, www.ijerd.com, research journals,
yahoo journals, bing journals, International Journal of Engineering Research and Development, google journals, hard copy of journal
Extraction of Data Using Comparable Entity Miningiosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
A Survey on: Sound Source Separation MethodsIJCERT
now a day’s multimedia databases are growing rapidly on large scale. For the effective management and exploration of large amount of music data the technology of singer identification is developed. With the help of this technology songs performed by particular singer can be clustered automatically. To improve the Performance of singer identification the technologies are emerged that can separate the singing voice from music accompaniment. One of the methods used for separating the singing voice from music accompaniment is non-negative matrix partial co factorization. This paper studies the different techniques for separation of singing voice from music accompaniment.
Efficient Filtering Algorithms for Location- Aware Publish/subscribeIJSRD
Location-based services have been mostly used in many systems. preceding systems uses a pull model or user-initiated model, where a user arrival a query to a server which gives response with location-aware answers. To offer outcomes to users with fast responses, a push model or server-initiated model is flattering an important computing model in the next-generation location-based services. In the push model, subscribers arrive spatio-textual subscriptions to fastening their curiosities, and publishers send spatio-textual messages. It is used for a high-performance location-aware publish/subscribe system to send publishers’ messages to valid subscribers. In this paper, we find the exploration happenstances that start in manipulative a location-aware publish/subscribe system. We recommend an R-tree based index by merging textual descriptions into R-tree nodes. We design efficient filtering algorithms and effective pruning techniques to accomplish high performance. This method can support likewise conjunctive queries and ranking queries.
International Journal of Engineering Research and Development (IJERD)IJERD Editor
journal publishing, how to publish research paper, call for research paper, international journal, publishing a paper, IJERD, journal of science and technology, how to get a research paper published, publishing of journal, publishing of research paper, research and review articles, IJERD Journal, how to publish your research paper, publish research paper, open access engineering journal, engineering journal, mathematics journal, physics journal, chemistry journal, computer engineering, computer science journal, how to submit your paper, peer review journal, indexed journal, www.ijerd.com, research journals, yahoo journals, bing journals, International Journal of Engineering Research and Development, google journals, hard copy of journal
Feature Selection Approach based on Firefly Algorithm and Chi-square IJECEIAES
The dimensionality problem is a well-known challenge for most classifiers when datasets have an unbalanced number of samples and features. Features may contain unreliable data, which can lead the classification process to produce undesirable results. Feature selection is considered a solution to this kind of problem. In this paper, an enhanced firefly algorithm is proposed as a feature selection method that reduces dimensionality and picks the most informative features for classification. The main purpose of the proposed model is to improve classification accuracy by using the features it selects, thereby decreasing classification errors. The firefly is modeled by simulating its position with a cell chi-square value, which changes after every move, and by simulating its intensity with a set of different fitness functions that act as a weight for each feature. K-nearest neighbor and discriminant analysis are used as classifiers to test the proposed firefly algorithm in selecting features. Experimental results showed that the proposed enhanced algorithm, based on the firefly algorithm with chi-square and different fitness functions, can provide better results than others. Results also showed that reducing the dataset is useful for gaining higher classification accuracy.
Improved spambase dataset prediction using svm rbf kernel with adaptive boosteSAT Journals
Abstract: Spam is no longer mere garbage but a risk, as it includes virus attachments and spyware agents that can ruin recipients’ systems; there is therefore a growing need for spam detection. Many spam detection techniques based on machine learning algorithms have been proposed. As the amount of spam has increased tremendously through bulk mailing tools, spam detection techniques must keep pace with it. In this paper we propose a hybrid classifier, Adaptive Boost with a support vector machine RBF kernel, on the Spambase dataset. We first extract the features by principal component analysis. General Terms: Email Spam classification. Keywords: Adaboost, classifier, ensemble, machine learning, spam email, SVM.
EMAIL SPAM CLASSIFICATION USING HYBRID APPROACH OF RBF NEURAL NETWORK AND PAR...IJNSA Journal
Email is one of the most popular communication media of the current century; it has become an effective and fast method of sharing and exchanging information all over the world. In recent years, email users have faced the problem of spam. Spam emails are unsolicited bulk emails sent by spammers. They consume mail server storage, waste time and consume network bandwidth. Many methods are used for spam filtering to classify email messages into two groups, spam and non-spam. In general, one of the most powerful tools for data classification is the Artificial Neural Network (ANN), which can handle a huge amount of high-dimensional data with better accuracy. One important type of ANN is the Radial Basis Function Neural Network (RBFNN), which is used in this work to classify spam messages. In this paper, we present a new spam filtering approach that combines RBFNN with the Particle Swarm Optimization (PSO) algorithm (HC-RBFPSO). The proposed approach uses PSO to optimize the RBFNN parameters, relying on the evolutionary heuristic search process of PSO. PSO is used to optimize the positions of the RBFNN centers c. The radii r are optimized using the K-Nearest Neighbors algorithm, and the weights w are optimized using the Singular Value Decomposition algorithm within each iteration of PSO, according to the fitness (error) function. The experiments are conducted on the SPAMBASE spam dataset downloaded from the UCI Machine Learning Repository. The experimental results show that our approach performs well in accuracy compared with other approaches that use the same dataset.
Congestion Control in Wireless Sensor Networks Using Genetic AlgorithmEditor IJCATR
A sensor network consists of a large number of small nodes that interact strongly with the physical environment, collect environmental data through sensors, and react after processing the information. Wireless network technologies are widely used in most applications. Because wireless sensor networks carry a great deal of information traffic, network congestion cannot be avoided. New methods are therefore needed to control congestion and to use existing resources to better serve traffic demands. Congestion increases packet loss and the retransmission of dropped packets, and it also wastes energy. In this paper, a novel method is presented for congestion control in wireless sensor networks using a genetic algorithm. The simulation results show that the proposed method, in comparison with the LEACH algorithm, can significantly improve congestion control at high speeds.
An Ensemble of Filters and Wrappers for Microarray Data Classification mlaij
The development of microarray technology has supplied a large volume of data to many fields. Gene microarray analysis and classification have proved an effective route to the diagnosis of diseases and cancers. Because the data obtained from microarray technology is very noisy and has thousands of features, feature selection plays an important role in removing irrelevant and redundant features and in reducing computational complexity. There are two important approaches to gene selection in microarray data analysis: filters and wrappers. To select a concise subset of informative genes, we introduce a hybrid feature selection method that combines the two approaches. Candidate features are first selected from the original set via several effective filters. The candidate feature set is then refined by more accurate wrappers. Thus, we can take advantage of both filters and wrappers. Experimental results based on 11 microarray datasets show that our mechanism can be effective with a smaller feature set. Moreover, these feature subsets can be obtained in a reasonable time.
A Threshold fuzzy entropy based feature selection method applied in various b...IJMER
Large amounts of data are stored and manipulated using various database technologies. Processing all the attributes for a particular purpose is a difficult task. To avoid such difficulties, a feature selection process is applied. In this paper, we collect eight benchmark datasets from the UCI repository. Feature selection is carried out using a fuzzy entropy based relevance measure algorithm and follows three selection strategies: the Mean selection strategy, the Half selection strategy, and the Neural network for threshold selection strategy. After the features are selected, they are evaluated using the Radial Basis Function (RBF) network, Stacking, Bagging, AdaBoostM1 and Ant-miner classification methodologies. The test results show that the Neural network for threshold selection strategy works well in selecting features, and that the Ant-miner methodology achieves better accuracy with the selected features than with the original dataset. The results of this experiment clearly show that Ant-miner is superior to the other classifiers. Thus, the proposed Ant-miner algorithm could be a more suitable method for producing good results with fewer features than the original datasets.
This paper presents a set of methods that use a genetic algorithm for automatic test-data generation in software testing. Over several years, researchers have proposed several methods for generating test data, each with different drawbacks. In this paper, we present various Genetic Algorithm (GA) based test methods, with different parameters, that automate structure-oriented test data generation on the basis of internal program structure. The factors discovered are used in evaluating the fitness function of the genetic algorithm to select the best possible test method. These methods take the test populations as input and then evaluate the test cases for that program. This integration will help improve the overall performance of the genetic algorithm in search space exploration and exploitation, with a better convergence rate.
In this research, a hybrid wrapper model is proposed to identify the featured gene subset from gene expression data. To balance exploration and exploitation, a hybrid model is applied that combines a popular meta-heuristic algorithm, the spider monkey optimizer (SMO), with simulated annealing (SA). In the proposed model, ReliefF is used as a filter to obtain the relevant gene subset from the dataset by removing noise and outliers before the data is fed to the SMO wrapper. To enhance the quality of the solution, simulated annealing is deployed as a local search with the SMO in the second phase, which guides the search toward the best feature subset. To evaluate the performance of the proposed model, a support vector machine (SVM) is used as the fitness function to recognize the most informative biomarker genes from the cancer datasets, along with University of California, Irvine (UCI) datasets. To further evaluate the model, four different classifiers (SVM, naïve Bayes (NB), decision tree (DT), and k-nearest neighbors (KNN)) are used. The experimental results and analysis indicate that ReliefF-SMO-SA-SVM performs relatively better than its state-of-the-art counterparts. For cancer datasets, our model performs better in terms of accuracy, with a maximum of 99.45%.
OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA...IJNSA Journal
Electronic mail, commonly known as email, is a crucial technology that enables streamlined operations and communications in corporate environments. By enabling swift and dependable transactions, email is a driving force behind heightened productivity and organizational effectiveness. However, its versatility also renders it susceptible to misuse by cybercriminals engaging in activities such as hacking, spoofing, phishing, email bombing, whaling, and spamming. As a result, effective and efficient data analysis is important for preventing and detecting cyber-attacks and crime in time. To overcome these challenges, this paper uses a novel approach named Aquila Optimization (AO) to find the best set of hyperparameters for the Stacked Auto Encoder (SAE) classifier. The purpose of tuning the hyperparameters of the SAE with AO is to obtain higher text classification accuracy. The optimized SAE then classifies the selected features into different classes. The experimental results showed that the proposed AO-SAE model outperforms existing models such as Logistic Regression (LR) and Long Short-Term Memory based Gated Recurrent Unit (LSTM based GRU) in terms of accuracy.
A Novel Framework For Numerical Character Recognition With Zoning Distance Fe...IJERD Editor
Advancements in computer technology have led every organization to implement automatic processing systems for its activities. One example is the recognition of handwritten characters, which has always been a challenging task in image processing and pattern recognition. In this paper we propose zone-based features for the recognition of handwritten characters. In this zoning approach, a digit image is divided into 8x8 zones and the centre pixel is computed for each zone. This procedure is repeated sequentially for every zone. Finally, features are extracted for classification and recognition.
AN EFFICIENT FEATURE SELECTION MODEL FOR IGBO TEXTIJDKP
Developments in Information Technology (IT) have encouraged the use of the Igbo language in text creation, online news reporting, online searching and article publication. As the amount of information stored in text format in this language increases, there is a need for an intelligent text-based system for proper management of the data. The selection of an optimal set of features for processing plays a vital role in a text-based system. This paper analyzes the structure of Igbo text and designs an efficient feature selection model for an intelligent Igbo text-based system. It adopts the Mean TF-IDF measure to select the most relevant features from Igbo text documents represented with two word-based n-gram text representation models (unigram and bigram). The model is designed with Object-Oriented Methodology and implemented in the Python programming language with tools from the Natural Language Toolkit (NLTK). The results show that bigram-represented text gives more relevant features based on the language semantics.
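The Mean TF-IDF selection described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`ngrams`, `mean_tfidf`, `select_features`), the `+ 1` IDF smoothing, and the placeholder tokens are all hypothetical, and plain Python is used in place of NLTK.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the word-based n-grams of a token list (n=1 unigram, n=2 bigram)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def mean_tfidf(docs, n=1):
    """Average TF-IDF of every n-gram over all documents.

    `docs` is a list of token lists. The IDF form log(N / df) + 1 is an
    assumption chosen for this sketch; other variants are common.
    """
    grams = [ngrams(d, n) for d in docs]
    n_docs = len(grams)
    # Document frequency: in how many documents each n-gram occurs.
    df = Counter(g for doc in grams for g in set(doc))
    totals = Counter()
    for doc in grams:
        tf = Counter(doc)
        for g, count in tf.items():
            idf = math.log(n_docs / df[g]) + 1.0
            totals[g] += (count / len(doc)) * idf
    # Documents that lack an n-gram contribute zero to its mean.
    return {g: total / n_docs for g, total in totals.items()}


def select_features(docs, n=1, top_k=3):
    """Keep the top_k n-grams by mean TF-IDF (ties broken alphabetically)."""
    scores = mean_tfidf(docs, n)
    return sorted(scores, key=lambda g: (-scores[g], g))[:top_k]


# Placeholder tokens stand in for real tokenised documents.
docs = [["a", "b", "a"], ["a", "c"]]
print(select_features(docs, n=1, top_k=2))
print(select_features(docs, n=2, top_k=2))
```

Passing `n=2` switches the same scoring to the bigram representation, which is the comparison the abstract reports on.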
A Threshold Fuzzy Entropy Based Feature Selection: Comparative StudyIJMER
Feature selection is one of the most common and critical tasks in database classification. It reduces the computational cost by removing insignificant and unwanted features. Consequently, it makes the diagnosis process accurate and comprehensible. This paper presents the measurement of feature relevance based on fuzzy entropy, tested with a Radial Basis Function (RBF) network classifier, Bagging (Bootstrap Aggregating), Boosting and Stacking on datasets from various fields. Twenty benchmark datasets available in the UCI Machine Learning Repository and KDD have been used for this work. The accuracy obtained from these classification processes shows that the proposed method is capable of producing good and accurate results with fewer features than the original datasets.
Similar to Spam filtering by using Genetic based Feature Selection (20)
Text Mining in Digital Libraries using OKAPI BM25 ModelEditor IJCATR
The emergence of the internet has made vast amounts of information available and easily accessible online. As a result, most libraries have digitized their content in order to remain relevant to their users and to keep pace with the advancement of the internet. However, these digital libraries have been criticized for using inefficient information retrieval models that do not rank the retrieved results by relevance. This paper proposes the use of the Okapi BM25 model in text mining as a means of improving the relevance ranking of digital libraries. The Okapi BM25 model was selected because it is a probability-based relevance ranking algorithm. A case study was conducted, and the model design was based on information retrieval processes. The performance of the Boolean, vector space, and Okapi BM25 models was compared for data retrieval. Relevance-ranked documents were retrieved and displayed on the OPAC framework search page. The results revealed that Okapi BM25 outperformed the Boolean model and the vector space model. Therefore, this paper proposes the use of the Okapi BM25 model to reward terms according to their relative frequencies in a document, so as to improve the performance of text mining in digital libraries.
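For readers unfamiliar with Okapi BM25, the ranking idea can be sketched in a few lines. This is a minimal illustration, not the system described in the abstract: the function name `bm25_scores`, the parameter defaults `k1=1.5` and `b=0.75`, the non-negative IDF variant, and the toy documents are all assumptions made for this sketch.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenised document in `docs` against the tokenised `query`.

    k1 controls term-frequency saturation and b controls document-length
    normalisation; the defaults here are common choices, not values taken
    from the paper.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # IDF variant that stays non-negative (an assumption of this sketch).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["digital", "library", "search"],
    ["library", "library", "catalog"],
    ["weather", "report"],
]
scores = bm25_scores(["library"], docs)
# Rank document indices from most to least relevant.
ranking = sorted(range(len(docs)), key=lambda i: -scores[i])
print(ranking)
```

Unlike a Boolean match, the score grows with how often a query term appears in a document, but with diminishing returns, and long documents are penalised relative to the average length.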
Green Computing, eco trends, climate change, e-waste and eco-friendlyEditor IJCATR
This study focused on the practice of using computing resources more efficiently while maintaining or increasing overall performance. Sustainable IT services require the integration of green computing practices such as power management, virtualization, improving cooling technology, recycling, electronic waste disposal, and optimization of the IT infrastructure to meet sustainability requirements. Studies have shown that costs of power utilized by IT departments can approach 50% of the overall energy costs for an organization. While there is an expectation that green IT should lower costs and the firm’s impact on the environment, there has been far less attention directed at understanding the strategic benefits of sustainable IT services in terms of the creation of customer value, business value and societal value. This paper provides a review of the literature on sustainable IT, key areas of focus, and identifies a core set of principles to guide sustainable IT service design.
Policies for Green Computing and E-Waste in NigeriaEditor IJCATR
Computers today are an integral part of individuals’ lives all around the world, but unfortunately these devices are toxic to the environment, given the materials used, their limited battery life and technological obsolescence. Individuals are concerned about the hazardous materials present in computers, even if the importance they attach to various attributes differs, and a more environment-friendly attitude can be fostered through exposure to educational materials. In this paper, we aim to delineate the problem of e-waste in Nigeria, highlight a series of measures and the advantages they offer our country, and propose a series of action steps to develop these areas further. It is possible for Nigeria to gain an immediate economic stimulus and create jobs while moving quickly to comply with climate change legislation and energy efficiency directives. The costs of implementing energy efficiency and renewable energy measures are minimal, as they are not cash expenditures but rather investments paid back by future, continuous energy savings.
Performance Evaluation of VANETs for Evaluating Node Stability in Dynamic Sce...Editor IJCATR
Vehicular ad hoc networks (VANETs) are a promising research area that enables communication among moving vehicles and between vehicles and roadside units (RSUs). In VANETs, mobile vehicles can be organized into clusters to support communication links. The arrangement of clusters, in terms of size and geographical extent, has a serious influence on the quality of communication. VANETs are a subclass of mobile ad hoc networks with more complex mobility patterns. Because of this mobility, the topology changes very frequently, which raises a number of technical challenges, including network stability. A cluster configuration that leads to a more stable, realistic network is therefore needed. The paper investigates various simulation scenarios in which clusters are generated with the k-means algorithm and their number is varied to find the most stable configuration in a realistic road scenario.
Optimum Location of DG Units Considering Operation ConditionsEditor IJCATR
The optimal sizing and placement of Distributed Generation (DG) units is attracting growing attention from researchers. In this paper, a two-stage approach is used for the allocation and sizing of DGs in a distribution system with a time-varying load model. The strategic placement of DGs can help reduce energy losses and improve the voltage profile. The proposed work discusses time-varying loads that can be useful for selecting the location and optimizing DG operation. The method has the potential to be used for integrating the available DGs by identifying the best locations in a power system. The proposed method has been demonstrated on a 9-bus test system.
Analysis of Comparison of Fuzzy Knn, C4.5 Algorithm, and Naïve Bayes Classifi...Editor IJCATR
Early detection of diabetes mellitus (DM) can prevent or inhibit complications. Several laboratory tests must be done to detect DM. The results of these laboratory tests are then converted into training data. The training data used in this study was generated from the UCI Pima database, with 6 attributes used to classify diabetes as positive or negative. Various classification methods are commonly used; this study compared three of them, fuzzy KNN, the C4.5 algorithm and the Naïve Bayes Classifier (NBC), on one identical case. The objective of this study was to create software that classifies DM using the tested methods and to compare the three methods based on accuracy, precision, and recall. The results showed that the best method was fuzzy KNN, with average and maximum accuracy reaching 96% and 98%, respectively. In second place, the NBC method had average and maximum accuracy of 87.5% and 90%, respectively. Lastly, the C4.5 algorithm had average and maximum accuracy of 79.5% and 86%, respectively.
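The three comparison criteria named in this abstract (accuracy, precision, and recall) can be computed from predicted and true labels as in the sketch below. The function name `classification_metrics`, the label encoding (1 for a positive diagnosis), and the toy labels are assumptions for illustration; none of the numbers come from the study.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for a binary classifier.

    `positive` marks the label treated as the positive class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    accuracy = correct / len(pairs)
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Toy labels only, to show the call.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 1]
print(classification_metrics(y_true, y_pred))
```

Reporting precision and recall alongside accuracy matters when the positive and negative classes are unevenly represented, since accuracy alone can hide missed positive cases.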
Web Scraping for Estimating new Record from Source SiteEditor IJCATR
Studies in the field of competitive intelligence and studies in the field of web scraping have a mutually beneficial relationship. In today's information age, websites serve as a main source of data. This research focuses on how to get data from websites and how to reduce the frequency of downloads. One problem is that source websites are autonomous, so the structure of their content is liable to change at any time. Another problem is the Snort intrusion detection system installed on servers to detect bot crawlers. The researchers therefore propose using the Mining Data Records (MDR) method and the Exponential Smoothing method, so that the scraper adapts to changes in content structure and browses or fetches automatically, following the pattern of news occurrences. In tests with a threshold of 0.3 for MDR and a similarity threshold score of 0.65 for STM, the recall and precision values yield an average F-measure of 92.6%. The tests of exponential smoothing estimation using α = 0.5 produce an MAE of 18.2 duplicate data records. Duplicates fall to 3.6 data records, from 21.8 data records under a fixed download/fetch schedule, at the average time of news occurrence.
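Simple exponential smoothing with a smoothing factor of 0.5, together with the mean absolute error (MAE) used to judge it, can be sketched as follows. This is a generic illustration of the technique, not the authors' estimator: the function names and the sample counts are hypothetical, and the one-step-ahead forecast convention (the smoothed value at step t predicts step t+1) is an assumption of this sketch.

```python
def exponential_smoothing(series, alpha=0.5):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}.

    The first smoothed value is initialised to the first observation.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def mean_absolute_error(actual, forecast):
    """Average absolute difference between paired actual and forecast values."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical counts of new records seen at successive fetches.
counts = [10, 20, 30]
smoothed = exponential_smoothing(counts, alpha=0.5)
# The smoothed value at step t serves as the forecast for step t + 1.
forecasts = smoothed[:-1]
print(smoothed)
print(mean_absolute_error(counts[1:], forecasts))
```

A larger `alpha` makes the estimate follow recent observations more closely, while a smaller one smooths out short-term fluctuations in how often new records appear.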
Evaluating Semantic Similarity between Biomedical Concepts/Classes through S...Editor IJCATR
Most existing semantic similarity measures that use ontology structure as their primary source can measure semantic similarity between concepts/classes using a single ontology. Ontology-based semantic similarity techniques, such as structure-based techniques (the Path Length measure, Wu and Palmer’s measure, and Leacock and Chodorow’s measure), information content-based techniques (Resnik’s measure, Lin’s measure), and biomedical domain ontology techniques (Al-Mubaid and Nguyen’s measure (SemDist)), were evaluated relative to human experts’ ratings and compared on sets of concepts using the ICD-10 “V1.0” terminology within the UMLS. The experimental results validate the efficiency of the SemDist technique in a single ontology, and demonstrate that SemDist, compared with the existing techniques, gives the best overall correlation with experts’ ratings.
Semantic Similarity Measures between Terms in the Biomedical Domain within f...Editor IJCATR
Techniques and tests are the tools used to measure the quality of an ontology or its resources. Measuring the similarity between biomedical classes/concepts is an important task for biomedical information extraction and knowledge discovery. Most semantic similarity techniques can be adapted for use in the biomedical domain (UMLS), and many experiments have been conducted to check the applicability of these measures. In this paper, we measure semantic similarity between two terms within a single ontology or multiple ontologies, using ICD-10 “V1.0” as the primary source, and compare our results with human experts' scores by correlation coefficient.
A Strategy for Improving the Performance of Small Files in Openstack Swift Editor IJCATR
Adding an aggregate storage module is an effective way to improve the storage access performance of small files in OpenStack Swift. Because Swift performs too many disk operations when querying metadata, the transfer performance for large numbers of small files is low. In this paper, we propose an aggregated storage strategy (ASS) and implement it in Swift. ASS comprises two parts: merge storage and index storage. In the first stage, ASS arranges the write request queue in chronological order and then stores objects in volumes. These volumes are the large files that are actually stored in Swift. In the second stage, during the short encounter time, the object-to-volume mapping information is stored in a key-value store. The experimental results show that ASS can effectively improve Swift's small file transfer performance.
Integrated System for Vehicle Clearance and RegistrationEditor IJCATR
Efficient management and control of a government's cash resources rely on government banking arrangements. Nigeria, like many low-income countries, employed fragmented systems for handling government receipts and payments. In 2016, Nigeria implemented a unified structure, as recommended by the IMF, in which all government funds are collected in one account; this would reduce borrowing costs, extend credit and improve the government's fiscal policy, among other benefits. This situation motivated us to design and implement an integrated system for vehicle clearance and registration. The system complies with the new Treasury Single Account policy and enables proper interaction and collaboration among the five agencies at different levels (NCS, FRSC, SBIR, VIO and NPF) charged with vehicular administration and activities in Nigeria. Since the system is web based, the Object Oriented Hypermedia Design Methodology (OOHDM) is used. Tools such as PHP, JavaScript, CSS, HTML, AJAX and other web development technologies were used. The result is a web-based system that gives proper information about a vehicle, from the exact date of importation to registration and licence renewal. Vehicle owner information, customs duty information, plate number registration details and so on can also be retrieved efficiently from the system by any of the agencies, without contacting another agency at any point. In addition, the number plate will no longer be the only means of vehicle identification, as is presently the case in Nigeria, because the unified system will automatically generate and assign a Unique Vehicle Identification Pin Number (UVIPN) to the vehicle on payment of duty, and the UVIPN will be linked to the various agencies in the management information system.
Assessment of the Efficiency of Customer Order Management System: A Case Stu...Editor IJCATR
The Supermarket Management System deals with the automation of buying and selling goods and services. It covers both sales and purchases of items. The Supermarket Management System project is to be developed with the objective of making the system reliable, easier to use, fast, and more informative.
Energy-Aware Routing in Wireless Sensor Network Using Modified Bi-Directional A*Editor IJCATR
Energy is a key component in the Wireless Sensor Network (WSN)[1]. The system cannot function without adequate power. One of the characteristics of a wireless sensor network is limited energy[2]. A lot of research has been done to develop strategies to overcome this problem. One of them is clustering. A popular clustering technique is Low Energy Adaptive Clustering Hierarchy (LEACH)[3]. In LEACH, clustering is used to determine the Cluster Head (CH), which is then assigned to forward packets to the Base Station (BS). In this research, we propose another clustering technique, which uses Betweenness Centrality (BC) from Social Network Analysis and is implemented in the Setup phase. In the Steady-State phase, a heuristic search algorithm, Modified Bi-Directional A* (MBDA*), is implemented. The experiment deployed 100 static nodes in a 100x100 area, with one Base Station at coordinates (50,50). To assess the reliability of the system, the experiment was run for 5000 rounds. The performance of the designed routing protocol is evaluated in terms of network lifetime, throughput, and residual energy. The results show that BC-MBDA* performs better than LEACH. This is because LEACH determines the CH dynamically, so that it changes in every data transmission process. This consumes energy, because a computation to determine the CH is performed in every transmission process. In BC-MBDA*, by contrast, the CH is determined statically, which reduces energy usage.
Security in Software Defined Networks (SDN): Challenges and Research Opportun...Editor IJCATR
In networks, the rapidly changing traffic patterns of search engines, Internet of Things (IoT) devices, Big Data and data centers have thrown up new challenges for legacy (existing) networks and prompted the need for a more intelligent and innovative way to dynamically manage traffic and allocate limited network resources. Software Defined Networking (SDN), which decouples the control plane from the data plane through network virtualization, aims to address these challenges. This paper explores the SDN architecture and its implementation with the OpenFlow protocol. It also assesses some of its benefits over traditional network architectures, its security concerns, and how they can be addressed in future research and related work in emerging economies such as Nigeria.
Measure the Similarity of Complaint Document Using Cosine Similarity Based on...Editor IJCATR
Report handling in the "LAPOR!" (Laporan, Aspirasi dan Pengaduan Online Rakyat) system depends on the system administrator, who manually reads every incoming report [3]. Manual reading can lead to errors in handling complaints [4]: if the data flow is huge and grows rapidly, at least three days are needed to prepare a confirmation, and the process is sensitive to inconsistencies [3]. In this study, the authors propose a model that measures the similarity of the Query (Incoming) with the Document (Archive). The authors employed the Class-Based Indexing term weighting scheme and Cosine Similarity to analyse document similarities. The CoSimTFIDF, CoSimTFICF and CoSimTFIDFICF values are used as features for a K-Nearest Neighbour (K-NN) classifier. The best evaluation result is obtained with 75% of the data used for training and 25% for testing, together with the CoSimTFIDF feature, which delivers a high accuracy of 84%. With k = 5, the accuracy obtained is 84.12%.
Hangul Recognition Using Support Vector MachineEditor IJCATR
The recognition of Hangul Image is more difficult compared with that of Latin. It could be recognized from the structural arrangement. Hangul is arranged from two dimensions while Latin is only from the left to the right. The current research creates a system to convert Hangul image into Latin text in order to use it as a learning material on reading Hangul. In general, image recognition system is divided into three steps. The first step is preprocessing, which includes binarization, segmentation through connected component-labeling method, and thinning with Zhang Suen to decrease some pattern information. The second is receiving the feature from every single image, whose identification process is done through chain code method. The third is recognizing the process using Support Vector Machine (SVM) with some kernels. It works through letter image and Hangul word recognition. It consists of 34 letters, each of which has 15 different patterns. The whole patterns are 510, divided into 3 data scenarios. The highest result achieved is 94,7% using SVM kernel polynomial and radial basis function. The level of recognition result is influenced by many trained data. Whilst the recognition process of Hangul word applies to the type 2 Hangul word with 6 different patterns. The difference of these patterns appears from the change of the font type. The chosen fonts for data training are such as Batang, Dotum, Gaeul, Gulim, Malgun Gothic. Arial Unicode MS is used to test the data. The lowest accuracy is achieved through the use of SVM kernel radial basis function, which is 69%. The same result, 72 %, is given by the SVM kernel linear and polynomial.
Application of 3D Printing in EducationEditor IJCATR
This paper provides a review of literature concerning the application of 3D printing in the education system. The review identifies that 3D Printing is being applied across the Educational levels [1] as well as in Libraries, Laboratories, and Distance education systems. The review also finds that 3D Printing is being used to teach both students and trainers about 3D Printing and to develop 3D Printing skills.
Survey on Energy-Efficient Routing Algorithms for Underwater Wireless Sensor ...Editor IJCATR
In underwater environment, for retrieval of information the routing mechanism is used. In routing mechanism there are three to four types of nodes are used, one is sink node which is deployed on the water surface and can collect the information, courier/super/AUV or dolphin powerful nodes are deployed in the middle of the water for forwarding the packets, ordinary nodes are also forwarder nodes which can be deployed from bottom to surface of the water and source nodes are deployed at the seabed which can extract the valuable information from the bottom of the sea. In underwater environment the battery power of the nodes is limited and that power can be enhanced through better selection of the routing algorithm. This paper focuses the energy-efficient routing algorithms for their routing mechanisms to prolong the battery power of the nodes. This paper also focuses the performance analysis of the energy-efficient algorithms under which we can examine the better performance of the route selection mechanism which can prolong the battery power of the node
Comparative analysis on Void Node Removal Routing algorithms for Underwater W...Editor IJCATR
The designing of routing algorithms faces many challenges in underwater environment like: propagation delay, acoustic channel behaviour, limited bandwidth, high bit error rate, limited battery power, underwater pressure, node mobility, localization 3D deployment, and underwater obstacles (voids). This paper focuses the underwater voids which affects the overall performance of the entire network. The majority of the researchers have used the better approaches for removal of voids through alternate path selection mechanism but still research needs improvement. This paper also focuses the architecture and its operation through merits and demerits of the existing algorithms. This research article further focuses the analytical method of the performance analysis of existing algorithms through which we found the better approach for removal of voids
Decay Property for Solutions to Plate Type Equations with Variable CoefficientsEditor IJCATR
In this paper we consider the initial value problem for a plate type equation with variable coefficients and memory in
$\mathbb{R}^n$ ($n \ge 1$), which has the regularity-loss property. By using spectral resolution, we study pointwise estimates in the spectral
space of the fundamental solution to the corresponding linear problem. Appealing to these pointwise estimates, we obtain the global
existence and the decay estimates of solutions to the semilinear problem by employing the fixed point theorem.
Spam filtering by using Genetic based Feature Selection
1. International Journal of Computer Applications Technology and Research
Volume 3– Issue 12, 839 - 843, 2014, ISSN:- 2319–8656
www.ijcat.com 839
Spam filtering by using Genetic based Feature Selection
Sorayya Mirzapour Kalaibar
Department of Computer, Shabestar Branch,
Islamic Azad University,
Shabestar, Iran
Seyed Naser Razavi
Computer Engineering Department,
Faculty of Electrical and Computer Engineering,
University of Tabriz, Iran
Abstract:
Spam is defined as redundant and unwanted electronic mail. Nowadays it causes many problems in business life, such as
occupying network bandwidth and the space of users' mailboxes. Because of these problems, much research has been carried out on
spam detection using classification techniques. Recent research shows that feature selection can have a positive effect on the efficiency of
machine learning algorithms. Most algorithms try to build a data model that depends on detecting a small set of features.
Irrelevant features in the model-building process result in weak estimation and more computation. This research
evaluates the detection of spam among legitimate e-mails, and the effect of feature selection on several machine learning algorithms, by presenting a
feature selection method based on a genetic algorithm. Bayesian network and KNN classifiers are used in the
classification phase, and the Spambase data set is used.
Keywords: Email spam, feature selection, genetic algorithm, classification.
1. INTRODUCTION
Nowadays, e-mail is becoming one of the fastest and
most economical forms of communication. It is therefore
prone to misuse. One such misuse is the sending of
unsolicited, unwanted e-mails known as spam or junk
e-mails [1]. Spam is becoming an increasingly large problem.
Many Internet Service Providers (ISPs) receive over a billion
spam messages per day. Many of these e-mails are filtered
before they reach end users. Content-based filtering is a key
technological method for e-mail filtering. Spam e-mails
usually contain common words, called features. The
frequency of occurrence of these features inside an e-mail
indicates whether the e-mail is spam or legitimate
[2,3,4]. Spam is sent for various purposes, such as
economic gain. Some spam consists of unwanted
advertising and commercial messages, while other spam deceives
users into revealing their private information (phishing) or
temporarily disables the mail server by sending malicious
software to the user's computer. Spam also creates traffic and
distributes immoral messages. It is therefore necessary to find
ways to filter these troublesome and annoying e-mails
automatically. To detect spam, methods such as
parameter optimization and feature selection have been
proposed in order to reduce processing overhead and to
ensure a high detection rate [16]. Spam filtering is a highly
sensitive application of the text classification (TC) task. A main
problem in text classification tasks, which is more serious in
e-mail filtering, is the large number of features. To
address this issue, various feature selection methods have been
considered, which extract a lower-dimensional feature space
from the original one and offer it as input to the classifier [5]. In this
paper, we use a genetic algorithm to find an optimal
subset of the features of the Spambase data set. The selected
features are then used to classify the Spambase e-mails.
2. LITERATURE REVIEW
Feature selection approaches are usually employed to reduce
the size of the feature set by selecting a subset of the original
features. Over the past years, population-based algorithms
such as the genetic algorithm (GA), particle swarm
optimization (PSO) and ant colony optimization (ACO) have been
considered for selecting important features and removing
irrelevant and redundant ones. Several algorithms have been
developed to classify and filter e-mails. The RIPPER
algorithm [6] uses a rule-based approach to
filter e-mails. Drucker et al. [7] proposed an SVM
algorithm for spam categorization. Sahami et al. [8]
proposed a Bayesian junk e-mail filter using a bag-of-words
representation and the Naïve Bayes algorithm. Clark et al. [9]
used the bag-of-words representation and an ANN for an automated
spam filtering system. Branke [10] discussed how the
genetic algorithm can be used to assist in designing and
training neural networks. Riley [11] described a method of using genetic
algorithms to train fixed-architecture feed-forward and
recurrent neural networks. Yao and Liu [12] reviewed
the different combinations of ANN and GA, and used
GA to evolve ANN connection weights, architectures,
learning rules, and input features. Wang et al. presented
a feature selection method based on a genetic algorithm combined with a
support vector machine based on SRM to distinguish spam from
legitimate e-mails. The presented method achieved better results
than the plain SVM [13]. Zhu developed a new method based on
rough sets and SVM in order to improve
classification. Rough sets were used for feature selection to
decrease the number of features, and SVM was used as the classifier [14].
Fagboula et al. used GA to select an appropriate
subset of features and SVM as the classifier. To
improve classification accuracy and computation time,
experiments were carried out on the SpamAssassin
data set [15]. Patwardhan and Ozarkar presented
the random forest algorithm and partial decision trees for spam
classification. Several feature selection methods were used
as a preprocessing stage, such as Correlation-based feature
selection, Chi-square, Entropy, Information Gain, Gain Ratio,
Mutual Information, Symmetrical Uncertainty, One R and
Relief. Using these methods to select
more efficient and useful features decreases time complexity
and increases accuracy [17].
3. GENETIC ALGORITHMS
A genetic algorithm (GA) is one of a number of heuristic
techniques based on natural selection among
population members that attempt to find high-quality solutions to
large and complex optimization problems. The algorithm can
identify and exploit regularities in the environment, and
converges on solutions (which can also be regarded as locating
local maxima) that are globally optimal or close to it [18]. This method is
very effective and widely used to find optimal or near-optimal
solutions to a wide variety of problems. The genetic
algorithm repeatedly modifies a population of individual
solutions. At each step it tries to select the
best individuals. From the current "parent" population, the genetic
algorithm creates "children", which constitute the next generation.
Over successive generations the population evolves toward an
optimal solution. The genetic algorithm uses three main rules
at each step to create the next generation. Selection rules select the
individuals, called parents, that contribute to the population of
the next generation. Crossover rules combine two parents
to form children for the next generation. Mutation rules apply
random changes to individual parents to form children.
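The three rules above can be sketched as a generic loop. This is a minimal illustration and not the authors' implementation: the function name `evolve`, binary tournament selection, keeping the current best individual, per-gene bit flipping, and the toy fitness (the number of ones in the chromosome) are all assumptions made for the example.

```python
import random

def evolve(fitness, n_genes, pop_size=20, generations=30,
           crossover_rate=0.7, mutation_rate=0.05, seed=0):
    """Generic GA loop: selection, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]

    def select():
        # Binary tournament: an assumed selection rule for this sketch.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        # Carry the current best individual over unchanged (assumption).
        children = [max(population, key=fitness)[:]]
        while len(children) < pop_size:
            p1, p2 = select(), select()
            if rng.random() < crossover_rate:
                point = rng.randint(1, n_genes - 1)
                child = p1[:point] + p2[point:]
            else:
                child = p1[:]
            # Mutation: flip each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
    return max(population, key=fitness)

# Toy problem: maximise the number of ones in a 12-bit chromosome.
best = evolve(fitness=sum, n_genes=12)
print(best, sum(best))
```

Because the best individual is always carried over, the best fitness in the population never decreases from one generation to the next in this sketch.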
4. FEATURE SELECTION
Feature selection approaches are usually employed to
reduce the size of the feature set by selecting a subset
of the original features. We use the proposed genetic
algorithm to select the features that contribute
significantly to the classification.
4.1. Feature Selection Using Proposed
Genetic Algorithm
In this section, the method of feature selection using
the proposed genetic algorithm is presented. The
procedure of the proposed method is described in
detail in the following subsections.
4.1.1. Initialize population
In the genetic algorithm, each solution to the feature
selection problem is a string of binary numbers, called a
chromosome. In this algorithm the initial population is
generated randomly. In the representation of features as a
chromosome, if the value of chromosome[i] is 1, the ith
feature is selected for classification, while if it is 0,
the feature is removed [19,20]. Figure 1 shows
the representation of features as a chromosome.
In this research, we used the weighted F-score to calculate
the fitness value of each chromosome. The algorithm
starts by randomly initializing a population of N
chromosomes.
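The chromosome encoding described above can be illustrated with a short sketch. The helper names `init_population`, `selected_features` and `apply_mask` are hypothetical. The population size of 80 and the 57 features are taken from the values reported later in the paper, and indices are 0-based here.

```python
import random

def init_population(pop_size, n_features, rng):
    """Random binary chromosomes; gene i == 1 means feature i is kept."""
    return [[rng.randint(0, 1) for _ in range(n_features)]
            for _ in range(pop_size)]

def selected_features(chromosome):
    """Indices of the features switched on by the chromosome."""
    return [i for i, gene in enumerate(chromosome) if gene == 1]

def apply_mask(row, chromosome):
    """Keep only the selected feature values of one sample."""
    return [value for value, gene in zip(row, chromosome) if gene == 1]

rng = random.Random(42)
population = init_population(pop_size=80, n_features=57, rng=rng)

print(selected_features([1, 0, 1, 0, 1]))      # [0, 2, 4]
print(apply_mask([0.5, 0.1, 0.9], [1, 0, 1]))  # [0.5, 0.9]
```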
4.1.2. Crossover
Crossover is the most important operation in GA.
As the name suggests, crossover is a process of
recombining bit strings through an exchange of
segments between pairs of chromosomes. There are
various kinds of crossover. In one-point crossover, a bit
position is selected at random as the cut point. In
this process, a random number
(less than or equal to the chromosome length) is generated
as the crossover position [21]. Once the crossover point
is selected, the binary string from the beginning of the
chromosome to the crossover point is copied from one
parent, and the rest is copied from the second parent [22].
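One-point crossover as described above can be sketched as follows. The function name `one_point_crossover` is hypothetical. The cut point is drawn from 1 to length minus 1 so that both parents always contribute, which is a slightly narrower range than the "less than or equal to the chromosome length" stated in the text.

```python
import random

def one_point_crossover(parent1, parent2, rng):
    """Copy parent1 up to a random cut point, then parent2 after it."""
    point = rng.randint(1, len(parent1) - 1)  # assumed range, see lead-in
    child = parent1[:point] + parent2[point:]
    return child, point

rng = random.Random(7)
p1 = [1, 1, 1, 1, 1, 1]
p2 = [0, 0, 0, 0, 0, 0]
child, point = one_point_crossover(p1, p2, rng)
print(point, child)  # the child has `point` ones followed by zeros
```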
4.1.3. Proposed mutation
Mutation helps keep all possible chromosomes reachable
while good genes are maintained in the newly
generated chromosomes. In our approach, the mutation
operator is a two-step process that combines the
random and substitution mutation operators, and it
operates on the basis of two different mutation rates.
The mutation operator first performs the substitution step with
a probability of 0.03. In each generation, the best
chromosome, which has better features and higher fitness,
is selected and substituted for the weakest
chromosome, which has lower fitness than the others. (This step
transfers the best chromosome of the current generation to the
next generation, which also leads to fast convergence of the
algorithm.) Otherwise, it
enters the second mutation step with a probability of 0.02.
This step changes some genes of a chromosome randomly
by inverting their binary values. The second step is
intended to prevent a loss of exploration capability in the
search space by keeping diversity in the other chromosomes.
Overall, the mutation probability is equal to 0.05.
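The two-step mutation can be sketched as below. This is one possible reading of the description, not the authors' code. A single random draw selects the substitution step with probability 0.03 and the random step with probability 0.02, which gives the overall rate of 0.05. The function name `two_step_mutation`, the choice of a random target chromosome, and the number of flipped genes (`n_flips`) are assumptions.

```python
import random

SUBSTITUTION_RATE = 0.03  # first-step rate given in the text
RANDOM_RATE = 0.02        # second-step rate given in the text

def two_step_mutation(population, fitness, rng, n_flips=2):
    """Apply at most one mutation step to the population, in place."""
    r = rng.random()
    indices = range(len(population))
    if r < SUBSTITUTION_RATE:
        # Step 1: the best chromosome replaces the weakest one.
        best = max(indices, key=lambda i: fitness(population[i]))
        worst = min(indices, key=lambda i: fitness(population[i]))
        population[worst] = population[best][:]
        return "substitution"
    if r < SUBSTITUTION_RATE + RANDOM_RATE:
        # Step 2: invert a few randomly chosen genes of one chromosome.
        target = rng.randrange(len(population))
        positions = rng.sample(range(len(population[target])), n_flips)
        for pos in positions:
            population[target][pos] = 1 - population[target][pos]
        return "random"
    return "none"

rng = random.Random(1)
pop = [[1, 1, 1, 0], [0, 0, 0, 0], [1, 0, 0, 0]]
print(two_step_mutation(pop, fitness=sum, rng=rng))
```

With these rates most calls return "none", so in a full run the operator would be invoked once per generation or once per offspring, depending on how the rates are meant to apply.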
5. RESULTS SIMULATION
In order to investigate the impact of our approach on
e-mail spam classification, the Spambase data set
downloaded from the UCI Machine Learning
Repository is used [23]. The Spambase data set,
which contains 4601 e-mails, was created by Mark Hopkins
and his colleagues. In this data set, which is divided into
two classes, 1 indicates spam and 0 indicates non-spam.
The data set has 57 features with continuous
values. In the simulation of the proposed method, a training
set comprising 70% of the data set and two separate
test sets, one for feature selection and one for
classification, were used. Each test set comprises
15% of the data set. After feature
selection was performed using the training set, the test set was used to
evaluate the selected subset of features. The evaluation
of the overall process was based on the weighted F-score,
which is a suitable measure for the spam classification
problem.
Figure 1. Chromosome (bit string 01...101) and feature subset {F1, F3, …, Fn-1}
The performance of spam filtering techniques
is determined by two well-known measures used in text
classification: precision and recall
[24, 25]. Here, four metrics are used to evaluate
the performance of the proposed method: precision,
accuracy, recall and F1 score. These metrics are
computed as follows:
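$$\mathrm{Precision}_i = \frac{TP_i}{TP_i + FP_i} \tag{1}$$
$$\mathrm{Recall}_i = \frac{TP_i}{TP_i + FN_i} \tag{2}$$
$$F_1 = \frac{2 \cdot \mathrm{Precision}_i \cdot \mathrm{Recall}_i}{\mathrm{Precision}_i + \mathrm{Recall}_i} \tag{3}$$
$$\mathrm{Accuracy} = \frac{TP_i + TN_i}{TP_i + FP_i + TN_i + FN_i} \tag{4}$$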
where:
TP_i = the number of test samples that have been
correctly classified into class c_i.
FP_i = the number of test samples that have been
incorrectly classified into class c_i.
TN_i = the number of test samples not belonging to class c_i
that have been correctly classified into other classes.
FN_i = the number of test samples belonging to class c_i
that have been incorrectly classified into other classes.
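As a worked illustration of the four measures, the sketch below computes them for the spam class (label 1) from a list of true and predicted labels. The function name `classification_metrics` and the eight toy labels are made up for the example and are not results from the paper.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1 and accuracy for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

# Toy labels: 1 = spam, 0 = non-spam (TP = 3, FP = 1, FN = 1, TN = 3).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)  # every measure equals 0.75 for these labels
```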
Bayesian network and k-nearest
neighbors (KNN) classifiers were used for
classification. Each program was executed 8 times and the
averaged results were compared to assess the
performance of each classifier. The results obtained
with the proposed feature selection method were
compared with those obtained without feature selection.
The best performance of GA-based feature selection (GAFS)
was observed with the parameter values
listed in Table 1.
Table 1: The parameters of feature selection using the
genetic algorithm
Parameter             Value
Initial population    80
Mutation rate 1       0.03
Mutation rate 2       0.02
Crossover rate        0.7
Generations           100
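How a candidate feature subset might be scored with a nearest-neighbour classifier (N=1) can be sketched as follows. This is an illustration under stated assumptions: the paper's fitness is a weighted F-score whose exact definition is not given, so the plain F1 score of the spam class is used here instead. The function names and the tiny data set (two informative features and one noisy feature) are made up for the example.

```python
def nearest_neighbour_predict(train_x, train_y, sample):
    """1-NN: label of the training row with the smallest squared distance."""
    distances = [sum((a - b) ** 2 for a, b in zip(row, sample))
                 for row in train_x]
    return train_y[distances.index(min(distances))]

def subset_f1(chromosome, train_x, train_y, test_x, test_y, positive=1):
    """F1 of the positive class using only the features the chromosome keeps."""
    if not any(chromosome):
        return 0.0  # no feature selected, nothing to classify with

    def mask(row):
        return [v for v, g in zip(row, chromosome) if g == 1]

    masked_train = [mask(row) for row in train_x]
    predictions = [nearest_neighbour_predict(masked_train, train_y, mask(row))
                   for row in test_x]
    pairs = list(zip(test_y, predictions))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy data: the third feature is noise on a larger scale.
train_x = [[0.0, 0.1, 9.0], [0.1, 0.0, 1.0], [1.0, 0.9, 8.0], [0.9, 1.0, 0.0]]
train_y = [0, 0, 1, 1]
test_x = [[0.05, 0.05, 8.5], [0.95, 0.95, 1.5]]
test_y = [0, 1]

print(subset_f1([1, 1, 1], train_x, train_y, test_x, test_y))  # 0.0
print(subset_f1([1, 1, 0], train_x, train_y, test_x, test_y))  # 1.0
```

In this toy case, dropping the noisy feature lets the nearest neighbour be decided by the informative features, so the subset `[1, 1, 0]` scores higher than the full feature set.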
6. RESULT EVALUATION
In this section, the results of the experiments are
presented to evaluate the efficiency of the proposed
method. The accuracy results for the Bayesian Network and KNN
classifiers are shown in Table 2, and the column graph of
the number of selected features shows the effect of the
proposed method on the reduction of redundant features.
These results indicate
that feature selection by the GA technique improves e-mail
spam classification. GA FS and the full feature set (All Feature)
were compared for both classifiers in terms of
accuracy, number of selected features, and the recall, precision
and F score of the spam class. For the Bayesian network
classifier, the proposed method increased the classification
accuracy while also removing a considerable number of
features (Table 2). For the KNN classifier, despite the removal
of features, the accuracy remained close to that obtained
before feature selection. The results for the three other
measures are shown in Table 3. According to that table, the
Bayesian network classifier improved considerably on all
three measures, whereas the KNN classifier reached values
close to its previous ones, with a negligible difference. In
comparing the two classifiers with the proposed GA FS, the
Bayesian network gave better results than KNN.
Table 2: Comparing feature selection methods in
terms of accuracy
Classifier          All Feature   GA FS
Bayesian network    0.891         0.918
KNN (N=1)           0.9           0.891
Figure 3: Column graph comparing the number
of selected features
7. CONCLUSION
In this paper, a GA-based feature selection
method was presented and evaluated using the
Spambase data set. The results obtained with the proposed
method were compared with those obtained without feature
selection. The results show that, considering the number of
removed features, the proposed method achieves accuracy
comparable to that obtained without feature selection. In
addition, the Bayesian network classifier gave better results
than the other classifier, and
all its evaluation criteria improved considerably.
The proposed method therefore has a considerable effect on
reducing the number of features and improving accuracy.
In future work, parameter
optimization can be applied, and the proposed algorithm
can be combined with other classification algorithms.
REFERENCES
[1] GOWEDER, A. M., RASHED, T., ELBEKAIE, A., &
ALHAMMI, H. A. (2008). An Anti-Spam System Using
Artificial Neural Networks and Genetic Algorithms.
Paper presented at the Proceedings of the 2008
International Arab Conference on Information
Technology.
[2] Bruening, P.(2004). Technological Responses to the
Problem of Spam: Preserving Free Speech and Open
Internet Values. First Conference on E-mail and Anti-
Spam.
[3] Graham, P.(2003). A Plan for Spam. MIT Conference on
Spam.
[4] William, S., et. al. (2005). A Unified Model of Spam
Filtration, MIT Spam Conference, Cambridge.
[5] GOWEDER, A. M., RASHED, T., ELBEKAIE, A., &
ALHAMMI, H. A. (2008). An Anti-Spam System Using
Artificial Neural Networks and Genetic Algorithms.
Paper presented at the Proceedings of the 2008
International Arab Conference on Information
Technology.
[6] Cohen, W. (1996). Learning Rules that Classify E-mail,
In AAAI Spring Symposium on Machine Learning in
Information Access, California.
[7] Drucker, H., et. al.(1999) Support Vector Machines for
Spam Categorization, In IEEE Transactions on Neural
Networks.
Sahami, M., et. al.,(1998). A Bayesian Approach to
Filtering Junk E-Mail, In Learning for Text
Categorization, AAAI Technical Report, U.S.A.
[8] Riley, J. (2002). An evolutionary approach to training
Feed-Forward and Recurrent Neural Networks, Master's
thesis of Applied Science in Information Technology,
Department of Computer Science, Royal Melbourne
Institute of Technology, Australia.
[9] Clark, et. al. (2003). A Neural Network Based Approach
to Automated E-Mail Classification, IEEE/WIC
International Conference on Web Intelligence.
[10] Branke, J. (1995). Evolutionary algorithms for neural
network design and training, In Proceedings 1st Nordic
Workshop on Genetic Algorithms and its Applications,
Finland.
[11] Yao, X., Liu, Y. (1997). A new evolutionary system for
evolving artificial neural networks, IEEE Transactions
on Neural Networks.
[12] Wang, H.-b., Y. Yu, and Z. Liu. (2005) SVM classifier
incorporating feature selection using GA for spam
detection, in Embedded and Ubiquitous Computing–
EUC 2005., Springer. p. 1147-1154.
Table 3: Comparing feature selection methods
              KNN (N=1)                Bayesian network
Measure       All Feature   GA FS      All Feature   GA FS
Precision     0.892         0.886      0.89          0.935
Recall        0.871         0.860      0.851         0.869
F1 score      0.882         0.871      0.87          0.900
[13] Zhu, Z. (2008). An email classification model based on
rough set and support vector machine. in Fuzzy Systems
and Knowledge Discovery.
[14] Temitayo, F., O. Stephen, and A. Abimbola. (2012).
Hybrid GA-SVM for efficient feature selection in e-mail
classification. Computer Engineering and Intelligent
Systems. 3(3): p. 17-28.
[15] Stern, H. (2008) A Survey of Modern Spam Tools. in
CEAS. Citeseer.
[16] Ozarkar, P. and M. Patwardhan. (2013).
INTERNATIONAL JOURNAL OF COMPUTER
ENGINEERING & TECHNOLOGY (IJCET). Journal
Impact Factor. 4(3): p. 123-139.
[17] Zhang, L., Zhu, J., & Yao, T. (2004). An evaluation of
statistical spam filtering techniques. ACM Transactions
on Asian Language Information Processing (TALIP),
3(4), 243-269.
[18] Vafaie H, De Jong K. (1992). Genetic algorithms as a
tool for feature selection in machine learning. In
Proceedings of Fourth International Conference on Tools
with Artificial Intelligence (TAI '92). 200-203.
[19] Yang J, Honavar V. (1998). Feature subset selection
using a genetic algorithm. Intelligent Systems and their
Applications, IEEE, 13(2):44-49.
[20] Shrivastava, J. N., & Bindu, M. H. (2014). E-mail Spam
Filtering Using Adaptive Genetic Algorithm.
International Journal of Intelligent Systems &
Applications, 6(2).
[21] Karimpour, J., A.A. Noroozi, and A. Abadi. (2012). The
Impact of Feature Selection on Web Spam Detection.
International Journal of Intelligent Systems and
Applications (IJISA), 4(9): p. 61.
[22] UCI repository of Machine learning Databases. (1998).
Department of Information and Computer Science,
University of California, Irvine, CA,
http://www.ics.uci.edu/~mlearn/MLRepository.html,
Hettich, S., Blake, C. L., and Merz, C. J.
[23] Liao, C., Alpha, S., Dixon.P. (2004). Feature
Preparation in Text Categorization, Oracle Corporation.
[24] Clark, et. al. (2003). A Neural Network Based Approach
to Automated E-Mail Classification, IEEE/WIC
International Conference on Web Intelligence