Genetic algorithms are often used in information retrieval (IR) systems to enhance the retrieval process and to increase the efficiency of retrieval, so that users can find exactly what they need among the growing volume of available information. Improving adaptive genetic algorithms helps retrieve the information the user needs accurately, increasing the number of relevant files retrieved while excluding irrelevant ones. In this study, the researcher explored the problems embedded in this process, attempted to find solutions, such as how to choose the mutation probability and the fitness function, and chose the Cranfield English test collection. This collection was compiled by Cyril Cleverdon and used at the University of Cranfield in 1960; it contains 1400 documents and 225 queries, used here for simulation purposes. The researcher also used cosine and Jaccard similarity to compute the similarity between the query and the documents, and used two proposed adaptive fitness functions, mutation operators, and adaptive crossover. The process aimed at evaluating the effectiveness of the results according to the measures of precision and recall. Finally, the study concluded that adaptive genetic algorithms can yield several improvements.
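As a rough illustration of the two similarity measures mentioned above, the sketch below (with hypothetical term weights, not the study's data) computes cosine and Jaccard similarity between a query and a document represented as term-weight dictionaries:

```python
from math import sqrt

def cosine_similarity(q, d):
    """Cosine of the angle between two term-weight vectors (dicts term -> weight)."""
    common = set(q) & set(d)
    dot = sum(q[t] * d[t] for t in common)
    norm_q = sqrt(sum(w * w for w in q.values()))
    norm_d = sqrt(sum(w * w for w in d.values()))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0

def jaccard_similarity(q, d):
    """Jaccard coefficient on the sets of terms present in query and document."""
    q_terms, d_terms = set(q), set(d)
    union = q_terms | d_terms
    return len(q_terms & d_terms) / len(union) if union else 0.0

query = {"genetic": 1.0, "algorithm": 1.0, "retrieval": 1.0}
doc   = {"genetic": 0.7, "algorithm": 0.5, "information": 0.9}
print(cosine_similarity(query, doc))
print(jaccard_similarity(query, doc))   # 2 shared terms out of 4 -> 0.5
```

Cosine weighs the actual term weights, while Jaccard only looks at term overlap, which is why IR studies often report both.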
Applying Genetic Algorithms to Information Retrieval Using Vector Space Model (IJCSEA Journal)
Particle Swarm Optimization based K-Prototype Clustering Algorithm (IOSR-JCE)
IOSR Journal of Computer Engineering (IOSR-JCE) is a double-blind peer-reviewed international journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes high-quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high-quality technical notes are invited for publication.
Improving the effectiveness of information retrieval system using adaptive ge... (ijcsit)
The traditional genetic algorithm used in previous studies depends on fixed control parameters, especially the crossover and mutation probabilities; in this research we instead use an adaptive genetic algorithm.
Genetic algorithms began to be applied in information retrieval systems in order to optimize the query. A good query is a set of terms that accurately expresses the information need while remaining usable within the collection corpus; the last part of this specification is critical for the matching process to be efficient, which is why most research effort is put toward query improvement.
We investigated the use of an adaptive genetic algorithm (AGA) under the vector space model, the extended Boolean model, and the language model in information retrieval (IR). The algorithm uses crossover and mutation operators with variable probabilities, whereas a traditional genetic algorithm (GA) uses fixed values that remain unchanged during execution. The GA is extended to support adaptive adjustment of the mutation and crossover probabilities, which allows faster attainment of better solutions. The approach was tested using 242 Arabic abstracts collected from the proceedings of the Saudi Arabian National Conference.
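The adaptive adjustment described above can be sketched roughly as follows. This is not necessarily the paper's exact formula; it follows the spirit of the widely cited Srinivas-Patnaik scheme, in which fitter-than-average individuals receive lower crossover and mutation probabilities so good solutions are preserved while weak ones keep exploring:

```python
def adaptive_rates(f, f_avg, f_max, pc_max=0.9, pm_max=0.1):
    """Return (crossover, mutation) probabilities for an individual of fitness f.
    Above-average individuals get rates scaled down toward 0 as f approaches
    f_max; below-average individuals get the maximum rates."""
    if f_max == f_avg:          # degenerate population: fall back to max rates
        return pc_max, pm_max
    if f >= f_avg:
        scale = (f_max - f) / (f_max - f_avg)
        return pc_max * scale, pm_max * scale
    return pc_max, pm_max

# A near-best individual is left almost untouched; a below-average one is not.
print(adaptive_rates(f=9.5, f_avg=6.0, f_max=10.0))   # low rates
print(adaptive_rates(f=4.0, f_avg=6.0, f_max=10.0))   # maximum rates (0.9, 0.1)
```

The fixed-parameter GA would simply return `(pc_max, pm_max)` for every individual in every generation.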
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering & Science aims to provide a platform for researchers, engineers, scientists, and educators to publish their original research results, exchange new ideas, and disseminate information on innovative designs, engineering experiences, and technological skills. It is also the journal's objective to promote engineering and technology education. All papers submitted to the journal are blind peer-reviewed, and only original articles are published.
GPCODON ALIGNMENT: A GLOBAL PAIRWISE CODON BASED SEQUENCE ALIGNMENT APPROACH (ijdms)
The alignment of two DNA sequences is a basic step in the analysis of biological data, and aligning long DNA sequences is one of the most interesting problems in bioinformatics. Several techniques, such as dynamic programming and heuristic algorithms, have been developed to solve this sequence alignment problem. In this paper we introduce GPCodon alignment, a pairwise DNA-DNA method for global sequence alignment that improves the accuracy of pairwise alignment. To produce the final alignment we use a new scoring matrix, the empirical codon substitution matrix. Using this matrix in our technique enabled the discovery of relationships between sequences that could not be discovered using traditional matrices. In addition, we present experimental results that show the performance of the proposed technique on eleven datasets with an average length of 2967 bps, comparing the efficiency and accuracy of our technique against a comparable tool called "Pairwise Align Codons" [1].
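For readers unfamiliar with global pairwise alignment, a minimal dynamic programming sketch in the Needleman-Wunsch style is shown below. The paper's method differs in that it scores codons with an empirical codon substitution matrix rather than single characters with fixed match/mismatch scores, but the dynamic programming skeleton is the same:

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Score of the best global alignment of strings a and b
    (Needleman-Wunsch with simple per-character scoring)."""
    n, m = len(a), len(b)
    # dp[i][j] = best score aligning a[:i] with b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + s,   # substitution
                           dp[i - 1][j] + gap,     # gap in b
                           dp[i][j - 1] + gap)     # gap in a
    return dp[n][m]

print(global_align("GATTACA", "GCATGCU"))
```

A codon-based variant would step through the sequences three bases at a time and look up `s` in the 64x64 codon substitution matrix instead of comparing single characters.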
COMPARISON OF HIERARCHICAL AGGLOMERATIVE ALGORITHMS FOR CLUSTERING MEDICAL DO... (ijseajournal)
The extensive amount of data stored in medical documents requires methods that help users find what they are looking for effectively by organizing large amounts of information into a small number of meaningful clusters, each containing objects that are more similar to each other than to the members of any other group. Thus, the aim of high-quality document clustering algorithms is to determine a set of clusters in which inter-cluster similarity is minimized and intra-cluster similarity is maximized. An important feature of many clustering algorithms is that they treat clustering as an optimization process, that is, maximizing or minimizing a particular criterion function defined over the whole clustering solution. The only real difference between agglomerative algorithms is how they choose which clusters to merge. The main purpose of this paper is to compare hierarchical agglomerative clustering algorithms that use different criterion functions for the problem of clustering medical documents, based on an evaluation of the quality of the clusters they produce. Our experimental results showed that the agglomerative algorithm that uses I1 as its criterion function for choosing which clusters to merge produced better cluster quality than the other criterion functions in terms of the external measures entropy and purity.
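The point that agglomerative variants differ only in how they choose which clusters to merge can be illustrated with a naive sketch on 1-D points; note that this toy uses simple distance-based linkages rather than the I1 criterion function the paper evaluates:

```python
def agglomerative(points, k, linkage=min):
    """Naive hierarchical agglomerative clustering of 1-D points.
    Repeatedly merge the pair of clusters with the smallest linkage
    distance until k clusters remain. Swapping `linkage` (min =
    single-link, max = complete-link) changes only the merge choice,
    which is the sole difference between agglomerative variants."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(abs(p - q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge the closest pair
        del clusters[j]
    return clusters

print(agglomerative([1.0, 1.2, 5.0, 5.1, 9.0], k=3))
```

A criterion-function variant such as I1 would instead pick the merge that maximizes the change in a global objective over the whole clustering solution, rather than a pairwise distance.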
Feature selection in high-dimensional datasets is considered a complex and time-consuming problem. To enhance classification accuracy and reduce execution time, Parallel Evolutionary Algorithms (PEAs) can be used. In this paper we review the most recent works that use PEAs for feature selection in large datasets. We classify the algorithms in these papers into four main classes: Genetic Algorithms (GA), Particle Swarm Optimization (PSO), Scatter Search (SS), and Ant Colony Optimization (ACO). Accuracy is adopted as the measure for comparing the efficiency of these PEAs. Parallel Genetic Algorithms (PGAs) prove to be the most suitable algorithms for feature selection in large datasets, since they achieve the highest accuracy. On the other hand, we found that parallel ACO is time-consuming and less accurate compared with the other PEAs.
ATTRIBUTE REDUCTION-BASED ENSEMBLE RULE CLASSIFIERS METHOD FOR DATASET CLASSI... (csandit)
Attribute reduction and classification are essential when dealing with large data sets that comprise a large number of input attributes. Many search methods and classifiers have been used to find the optimal number of attributes. The aim of this paper is to find the optimal set of attributes and improve classification accuracy by adopting an ensemble rule classifiers method. The research process involves two phases: finding the optimal set of attributes, and applying the ensemble classifiers method to the classification task. Results are reported in terms of percentage accuracy and the number of selected attributes and generated rules. Six datasets were used for the experiment. The final output is an optimal set of attributes together with the ensemble rule classifiers method. The experimental results, conducted on public real datasets, demonstrate that ensemble rule classifiers methods consistently improve classification accuracy on the selected datasets; significant improvement in accuracy and an optimal set of selected attributes are achieved by adopting the ensemble rule classifiers method.
A novel population-based local search for nurse rostering problem (IJECE IAES)
Population-based approaches are generally better than single-solution (local search) approaches at exploring the search space; their drawback lies in exploiting it. Several hybrid approaches have proven their efficiency across different domains of optimization problems by integrating the strengths of population and local search approaches, but hybrid methods have the drawback of increased parameter tuning. Recently, a population-based local search (PB-LS) with fewer parameters than existing approaches was proposed for the university course timetabling problem and proved its effectiveness. The approach employs two operators to intensify and diversify the search: the first is applied to a single solution, while the second is applied to all solutions. This paper investigates the performance of population-based local search on the nurse rostering problem. The INRC2010 database, with a dataset composed of 69 instances, is used to test the performance of PB-LS, and a comparison is made between PB-LS and other existing approaches in the literature. Results show good performance of the proposed approach compared to the others: population-based local search provided the best results in 55 of the 69 instances used in the experiments.
Feature selection for multiple water quality status: integrated bootstrapping... (IJECE IAES)
STORET is one method to determine river water quality and to classify it into four classes (very good, good, medium, and bad) based on the water data for each attribute or feature. The success of building a pattern recognition model depends greatly on the quality of the data, and two issues concern this research: data with a disproportionate amount among the classes (class imbalance), and noise found in the attributes. Therefore, this research integrates the SMOTE technique and bootstrapping to handle the class imbalance problem, while an experiment is conducted to eliminate attribute noise using several feature selection algorithms with a filter approach (information gain, rule, derivation, correlation, and chi-square). The research has the following stages: data understanding, pre-processing, class imbalance handling, feature selection, classification, and performance evaluation. Testing with 10-fold cross-validation shows that the SMOTE-bootstrapping technique increases accuracy from 83.3% to 98.8%, while eliminating noise in the data attributes further increases accuracy to 99.5% (using the feature subset produced by the information gain algorithm and the decision tree classification algorithm).
Filter Based Approach for Genomic Feature Set Selection (FBA-GFS) (IJCSEA Journal)
Feature selection is an effective method used in text categorization for sorting a set of documents into a certain number of predefined categories. It is an important method for improving the efficiency and accuracy of text categorization algorithms by removing redundant terms from the corpus. A genome contains the total genetic information in the chromosomes of an organism, including its genes and DNA sequences. In this paper a hierarchical clustering technique is used to categorize the features from the genome documents, and a framework is proposed for genomic feature set selection. Filter-based feature selection methods such as the χ² statistic and the CHIR statistic are used to select the feature set. The selected feature set is verified using the F-measure and is validated for biological relevance using the BLAST tool.
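As an illustration of the χ² filter mentioned above (with made-up counts, not the paper's data), the score of a term with respect to one category can be computed from a 2x2 contingency table:

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for one category from a 2x2 contingency
    table: a = in-category docs containing the term, b = out-of-category
    docs containing it, c = in-category docs without it, d = the rest."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den if den else 0.0

# A term that appears almost only inside the category scores high;
# a term distributed evenly across categories scores zero.
print(chi_square(a=40, b=5, c=10, d=45))
print(chi_square(a=10, b=10, c=10, d=10))
```

Ranking all terms by this score and keeping the top-scoring ones is the essence of a filter-based selection step.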
Leave one out cross validated Hybrid Model of Genetic Algorithm and Naïve Bay... (IJERA Editor)
This paper presents a new approach to selecting a reduced number of features in databases. Every database has a given number of features, but some of these features can be redundant, even harmful, and can confuse the classification process. The proposed method first applies a binary-coded genetic algorithm to select a small subset of features; the importance of these features is judged by applying the Naïve Bayes (NB) classification method, and the reduced subset of features with the highest classification accuracy on the given databases is adopted. The classification accuracy obtained by the proposed method is compared with that recently reported in publications on eight databases. The proposed method performs satisfactorily on these databases and achieves higher classification accuracy with a smaller number of features.
A Threshold Fuzzy Entropy Based Feature Selection: Comparative Study (IJMER)
Feature selection is one of the most common and critical tasks in database classification. It reduces computational cost by removing insignificant and unwanted features, and consequently makes the diagnosis process accurate and comprehensible. This paper presents the measurement of feature relevance based on fuzzy entropy, tested with a Radial Basis Function (RBF) network, Bagging (Bootstrap Aggregating), Boosting, and stacking on datasets from various fields. Twenty benchmark datasets available in the UCI Machine Learning Repository and KDD were used for this work. The accuracy obtained from these classification processes shows that the proposed method is capable of producing good and accurate results with fewer features than the original datasets.
ONTOLOGY-DRIVEN INFORMATION RETRIEVAL FOR HEALTHCARE INFORMATION SYSTEM: ... (IJNSA Journal)
In health research, one of the major tasks is to retrieve and analyze heterogeneous databases containing a single patient's information gathered from a large volume of data over a long period of time. The main objective of this paper is to present our ontology-based information retrieval approach for a clinical information system. We performed a case study in a real-life hospital setting. The results obtained illustrate the feasibility of the proposed approach, which significantly improved the information retrieval process on a large volume of data over a long period, from August 2011 until January 2012.
Trust Enhanced Role Based Access Control Using Genetic Algorithm (IJECE IAES)
Improvements in technological innovation have become a boon for business organizations, firms, institutions, etc., and system applications are being developed for organizations whether small-scale or large-scale. Given the hierarchical nature of large organizations, security is an important factor that needs to be taken into account. For any healthcare organization, maintaining the confidentiality and integrity of patients' records is of utmost importance while ensuring that they are available only to authorized personnel. The paper discusses the technique of Role-Based Access Control (RBAC) and its different aspects, and suggests a trust-enhanced model of RBAC implemented with a selection-and-mutation-only genetic algorithm. A practical scenario involving a healthcare organization is considered: a model is developed to capture the policies of different health departments and how they affect the permissions of a particular role. The purpose of the algorithm is to allocate tasks to every employee in an automated manner and to ensure that they are not over-burdened with the work assigned. In addition, the trust records of the employees ensure that malicious users do not gain access to confidential patient data.
Survey on Efficient Techniques of Text Mining (vivatechijri)
In the current era, with the advancement of technology, more and more data is available in digital form, most of it (approx. 85%) in unstructured textual form. It has therefore become essential to develop better techniques and algorithms to extract useful and interesting information from this large amount of textual data. Text mining is the process of extracting useful data from unstructured text. Each algorithm used for text mining has advantages and disadvantages, and the issues in the field of text mining that affect the accuracy and relevance of the results are identified.
An Ensemble of Filters and Wrappers for Microarray Data Classification mlaij
The development of microarray technology has suppli
ed a large volume of data to many fields. The gene
microarray analysis and classification have demonst
rated an effective way for the effective diagnosis
of
diseases and cancers. In as much as the data achiev
ing from microarray technology is very noisy and al
so
has thousands of features, feature selection plays
an important role in removing irrelevant and redund
ant
features and also reducing computational complexity
. There are two important approaches for gene
selection in microarray data analysis, the filters
and the wrappers. To select a concise subset of inf
ormative
genes, we introduce a hybrid feature selection whic
h combines two approaches. The fact of the matter i
s
that candidate’s features are first selected from t
he original set via several effective filters. The
candidate
feature set is further refined by more accurate wra
ppers. Thus, we can take advantage of both the filt
ers
and wrappers. Experimental results based on 11 micr
oarray datasets show that our mechanism can be
effected with a smaller feature set. Moreover, thes
e feature subsets can be obtained in a reasonable t
ime
For years, the machine learning community has focused on developing efficient algorithms that can produce very accurate classifiers. However, it is often much easier to find several good classifiers based on dataset combination instead of a single classifier applied to different datasets. The advantages of using classifier-dataset combinations instead of a single one are twofold: they help lower computational complexity by using simpler models, and they can improve classification accuracy and performance. Most data mining applications are based on pattern matching algorithms, so improving the performance of classification has a positive impact on the quality of the overall data mining task. Since combination strategies have proved very useful in improving performance, these techniques have become very important in applications such as cancer detection, speech technology, and natural language processing. The aim of this paper is to propose a proprietary metric, the Normalized Geometric Index (NGI), based on the latent properties of datasets, for improving the accuracy of data mining tasks.
Parallel Genetic Algorithms for University Scheduling Problem (IJECE IAES)
The university scheduling (timetabling) problem falls into the class of NP-hard problems. Researchers have tried many techniques to find the most suitable and fastest way of solving it. With the emergence of multi-core systems, parallel implementations have been considered for finding the solution. Our approaches attempt to combine several techniques in two algorithms: a coarse-grained algorithm and a multi-thread tournament algorithm. The results obtained from the two algorithms are compared using an algorithm evaluation function. Considering execution time, the coarse-grained algorithm performed twice as well as the multi-thread algorithm.
Data mining, or knowledge discovery in databases (KDD), is the extraction of unknown (hidden) and useful knowledge from data. Data mining is widely used in many areas such as retail, sales, e-commerce, remote sensing, and bioinformatics. Student performance has become one of the most complex puzzles for universities and colleges in the recent past, given their tremendous growth. In this paper, the authors deployed data mining techniques such as classification, association rules, and chi-square for knowledge discovery. For this study, the authors used a data set containing the results of approximately 180 MCA (postgraduate) students from 3 colleges. The study found that one can apply data mining functionalities such as chi-square, association rules, and lift in education and discover areas of improvement.
A Survey and Comparative Study of Filter and Wrapper Feature Selection Techni... (theijes)
Feature selection is considered a problem of global combinatorial optimization in machine learning; it reduces the number of features and removes irrelevant, noisy, and redundant data. However, identifying useful features among hundreds or even thousands of related features is not an easy task. Selecting relevant genes from microarray data is even more challenging owing to the high dimensionality of the features, the multiclass categories involved, and the usually small sample size. To improve prediction accuracy and to avoid incomprehensibility due to the number of features, different feature selection techniques can be implemented. This survey classifies and analyzes different approaches, aiming not only to provide a comprehensive presentation but also to discuss challenges and various performance parameters. The techniques are generally classified into three groups: filter, wrapper, and hybrid.
This paper presents a set of methods that use a genetic algorithm for automatic test-data generation in software testing. For several years researchers have proposed methods for generating test data, each with different drawbacks. In this paper, we present various genetic algorithm (GA) based test methods with different parameters to automate structure-oriented test data generation on the basis of the internal program structure. The factors discovered are used in evaluating the fitness function of the genetic algorithm for selecting the best possible test method. These methods take the test populations as input and then evaluate the test cases for the program. This integration helps improve the overall performance of the genetic algorithm in search space exploration and exploitation, with a better convergence rate.
COMPARISON BETWEEN THE GENETIC ALGORITHMS OPTIMIZATION AND PARTICLE SWARM OPT... (IAEME Publication)
Close-range photogrammetry network design refers to the process of placing a set of cameras so as to achieve photogrammetric tasks. The main objective of this paper is to find the best locations for two or three camera stations. Genetic algorithm optimization and particle swarm optimization are developed to determine the optimal camera stations for computing three-dimensional coordinates. In this research, a mathematical model representing genetic algorithm optimization and particle swarm optimization for the close-range photogrammetry network is developed. The paper also gives the sequence of field operations and computational steps for this task, and a test field is included to reinforce the theoretical aspects.
Testing Different Log Bases for Vector Model Weighting Technique (kevig)
Information retrieval systems retrieve relevant documents based on a query submitted by the user. The documents are initially indexed, and the words in the documents are assigned weights using a weighting technique called TF-IDF, the product of term frequency (TF) and inverse document frequency (IDF). TF represents the number of occurrences of a term in a document; IDF measures whether the term is common or rare across all documents, and is computed by dividing the total number of documents in the system by the number of documents containing the term and then taking the logarithm of the quotient. By default, base 10 is used for the logarithm. In this paper, we test this weighting technique using a range of log bases from 0.1 to 100.0 to calculate the IDF. Testing different log bases for the vector model weighting technique highlights the importance of understanding the performance of the system at different weighting values. We use the documents of the MED, CRAN, NPL, LISA, and CISI test collections, which scientists assembled explicitly for experiments in information retrieval systems.
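A minimal sketch of the TF-IDF weighting with a configurable IDF log base (toy corpus, not the test collections named above):

```python
from math import log

def tf_idf(term, doc, corpus, base=10.0):
    """TF-IDF weight of `term` in `doc`, with a configurable IDF log base.
    `corpus` is a list of token lists; raw term count is used for TF."""
    tf = doc.count(term)
    df = sum(1 for d in corpus if term in d)   # document frequency
    if tf == 0 or df == 0:
        return 0.0
    idf = log(len(corpus) / df, base)          # IDF in the chosen base
    return tf * idf

corpus = [["genetic", "algorithm"], ["vector", "model"], ["genetic", "query"]]
# By the change-of-base rule, switching the base rescales every IDF by a
# constant factor, so a smaller base (> 1) inflates all weights uniformly.
print(tf_idf("genetic", corpus[0], corpus, base=10.0))
print(tf_idf("genetic", corpus[0], corpus, base=2.0))
```

Bases below 1, which the paper also tests, make the logarithm negative and therefore invert the sign of every IDF, which is why system behavior at those extremes is worth measuring rather than assuming.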
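The TF-IDF computation described above can be sketched in a few lines of Python; the function name and the toy numbers are illustrative, not taken from the paper.

```python
import math

def tfidf(tf, docs_with_term, total_docs, base=10.0):
    """TF-IDF weight: term frequency times the IDF, where the IDF is the
    log (in a configurable base) of total documents over documents
    containing the term."""
    idf = math.log(total_docs / docs_with_term, base)
    return tf * idf

# A term occurring 3 times in a document and appearing in 10 of
# 1000 documents, with the default base 10: 3 * log10(100), about 6.0
w = tfidf(3, 10, 1000)
```

Changing `base` rescales every weight monotonically for bases above 1, which is why the paper sweeps bases from 0.1 to 100.0 to observe the effect on ranking behaviour.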
Feature selection in high-dimensional datasets is considered a complex and time-consuming problem. To enhance classification accuracy and reduce execution time, Parallel Evolutionary Algorithms (PEAs) can be used. In this paper, we review the most recent works that apply PEAs to feature selection in large datasets. We have classified the algorithms in these papers into four main classes: Genetic Algorithms (GA), Particle Swarm Optimization (PSO), Scatter Search (SS), and Ant Colony Optimization (ACO). Accuracy is adopted as the measure for comparing the efficiency of these PEAs. It is noticeable that Parallel Genetic Algorithms (PGAs) are the most suitable algorithms for feature selection in large datasets, since they achieve the highest accuracy. On the other hand, we found that parallel ACO is time-consuming and less accurate compared with the other PEAs.
ATTRIBUTE REDUCTION-BASED ENSEMBLE RULE CLASSIFIERS METHOD FOR DATASET CLASSI... (csandit)
Attribute reduction and classification are essential tasks when dealing with large data sets that comprise a large number of input attributes. Many search methods and classifiers have been used to find the optimal number of attributes. The aim of this paper is to find the optimal set of attributes and improve classification accuracy by adopting an ensemble rule classifiers method. The research process involves two phases: finding the optimal set of attributes, and applying the ensemble classifiers method to the classification task. Results are reported as percentage accuracy, the number of selected attributes, and the rules generated. Six datasets were used for the experiment. The final output is an optimal set of attributes together with the ensemble rule classifiers method. The experimental results, conducted on public real datasets, demonstrate that the ensemble rule classifiers method consistently improves classification accuracy on the selected datasets. Significant improvement in accuracy and an optimal set of selected attributes are achieved by adopting the ensemble rule classifiers method.
A novel population-based local search for nurse rostering problem (IJECEIAES)
Population-based approaches are generally better than single-solution (local search) approaches at exploring the search space; however, their drawback lies in exploiting it. Several hybrid approaches have proven their efficiency across different domains of optimization problems by integrating the strengths of population and local search approaches, but hybrid methods carry the drawback of increased parameter tuning. Recently, a population-based local search (PB-LS) was proposed for the university course-timetabling problem with fewer parameters than existing approaches, and it proved effective. The approach employs two operators to intensify and diversify the search: the first is applied to a single solution, while the second is applied to all solutions. This paper investigates the performance of population-based local search on the nurse rostering problem. The INRC2010 database, with a dataset composed of 69 instances, is used to test the performance of PB-LS. A comparison was made between PB-LS and other existing approaches in the literature. Results show good performance of the proposed approach: population-based local search provided the best results in 55 of the 69 instances used in the experiments.
Feature selection for multiple water quality status: integrated bootstrapping... (IJECEIAES)
STORET is a method to determine river water quality and classify it into four classes (very good, good, medium, and bad) based on the water data for each attribute or feature. The success of building a pattern recognition model depends greatly on the quality of the data. Two issues concern this research: data with a disproportionate amount among the classes (class imbalance), and noise found in the attributes. This research therefore integrates the SMOTE technique and bootstrapping to handle the class imbalance problem, while an experiment is conducted to eliminate noise in the attributes using several filter-approach feature selection algorithms (information gain, rule, derivation, correlation, and chi-square). The research has the following stages: data understanding, pre-processing, class imbalance handling, feature selection, classification, and performance evaluation. Based on testing with 10-fold cross-validation, the SMOTE-bootstrapping technique increases accuracy from 83.3% to 98.8%, while eliminating noise in the data attributes further increases accuracy to 99.5% (using the feature subset produced by the information gain algorithm with the decision tree classification algorithm).
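SMOTE's core idea, interpolating a synthetic point between a minority-class sample and one of its nearest minority neighbours, can be sketched as follows; the function and the toy data are illustrative, not taken from the paper.

```python
import random

def smote_sample(minority, k=2, seed=0):
    """Create one synthetic minority-class point by interpolating between
    a random minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    x = rng.choice(minority)
    # k nearest neighbours of x by squared Euclidean distance
    neighbours = sorted((p for p in minority if p is not x),
                        key=lambda p: sum((a - b) ** 2 for a, b in zip(p, x)))[:k]
    n = rng.choice(neighbours)
    gap = rng.random()  # random position along the segment x -> n
    return tuple(a + gap * (b - a) for a, b in zip(x, n))

# Four minority points; the synthetic point lies inside their convex hull.
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2)]
synthetic = smote_sample(minority)
```

Repeating this until the class sizes match is what rebalances the training set before classification.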
Filter Based Approach for Genomic Feature Set Selection (FBA-GFS) (IJCSEA Journal)
Feature selection is an effective method used in text categorization for sorting a set of documents into a number of predefined categories. It is an important method for improving the efficiency and accuracy of text categorization algorithms by removing redundant terms from the corpus. A genome contains the total genetic information in the chromosomes of an organism, including its genes and DNA sequences. In this paper a clustering technique, hierarchical clustering, is used to categorize the features from the genome documents, and a framework is proposed for genomic feature set selection. Filter-based feature selection methods, such as the chi-square (χ2) statistic and the CHIR statistic, are used to select the feature set. The selected feature set is verified using the F-measure and is biologically validated for relevance using the BLAST tool.
Leave one out cross validated Hybrid Model of Genetic Algorithm and Naïve Bay... (IJERA Editor)
This paper presents a new approach to selecting a reduced number of features in databases. Every database has a given number of features, but it is observed that some of these features can be redundant, even harmful, and can confuse the process of classification. The proposed method first applies a binary-coded genetic algorithm to select a small subset of features. The importance of these features is judged by applying the Naïve Bayes (NB) method of classification. The best reduced subset of features, with high classification accuracy on the given databases, is adopted. The classification accuracy obtained by the proposed method is compared with that reported recently in publications on eight databases. It is noted that the proposed method performs satisfactorily on these databases and achieves higher classification accuracy with a smaller number of features.
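A binary-coded GA for feature selection of the kind described above can be sketched as follows. The fitness function here is a toy stand-in for the paper's Naïve Bayes accuracy, and all names and parameter values are illustrative assumptions.

```python
import random

def ga_feature_select(n_features, fitness, pop_size=20, gens=30,
                      p_mut=0.05, seed=1):
    """Binary-coded GA: each chromosome is a bit mask over the features.
    `fitness` scores a mask (e.g. classifier accuracy minus a size penalty)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(gens):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[:pop_size // 2]          # truncation selection (elitist)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)    # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Toy fitness: features 0 and 3 are informative; every selected bit costs 0.5,
# so small subsets containing the informative features score highest.
def fitness(mask):
    return 2 * mask[0] + 2 * mask[3] - 0.5 * sum(mask)

best = ga_feature_select(8, fitness)
```

In the paper's setting, `fitness` would instead train and score an NB classifier on the columns the mask selects.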
A Threshold Fuzzy Entropy Based Feature Selection: Comparative Study (IJMER)
Feature selection is one of the most common and critical tasks in database classification. It reduces the computational cost by removing insignificant and unwanted features, and consequently makes the diagnosis process more accurate and comprehensible. This paper presents the measurement of feature relevance based on fuzzy entropy, tested with a Radial Basis Function (RBF) network classifier, Bagging (Bootstrap Aggregating), Boosting, and stacking on datasets from various fields. Twenty benchmark datasets available in the UCI Machine Learning Repository and KDD have been used for this work. The accuracy obtained from these classification processes shows that the proposed method is capable of producing good and accurate results with fewer features than the original datasets.
ONTOLOGY-DRIVEN INFORMATION RETRIEVAL FOR HEALTHCARE INFORMATION SYSTEM: ... (IJNSA Journal)
In health research, one of the major tasks is to retrieve and analyze heterogeneous databases containing a single patient's information gathered from a large volume of data over a long period of time. The main objective of this paper is to present our ontology-based information retrieval approach for a clinical information system. We have performed a case study in real-life hospital settings. The results obtained illustrate the feasibility of the proposed approach, which significantly improved the information retrieval process on a large volume of data covering the period from August 2011 until January 2012.
Trust Enhanced Role Based Access Control Using Genetic Algorithm IJECEIAES
Technological innovations have become a boon for business organizations, firms, institutions, etc. System applications are being developed for organizations, whether small-scale or large-scale. Taking into consideration the hierarchical nature of large organizations, security is an important factor which needs to be taken into account. For any healthcare organization, maintaining the confidentiality and integrity of patients' records is of utmost importance, while ensuring that they are only available to authorized personnel. The paper discusses the technique of Role-Based Access Control (RBAC) and its different aspects. The paper also suggests a trust-enhanced model of RBAC implemented with a selection-and-mutation-only genetic algorithm. A practical scenario involving a healthcare organization has also been considered. A model has been developed to consider the policies of different health departments and how they affect the permissions of a particular role. The purpose of the algorithm is to allocate tasks to every employee in an automated manner and to ensure that they are not over-burdened with the work assigned. In addition, the trust records of the employees ensure that malicious users do not gain access to confidential patient data.
Survey on Efficient Techniques of Text Mining (vivatechijri)
In the current era, with the advancement of technology, more and more data is available in digital form, and most of it (approximately 85%) is unstructured text. It has therefore become essential to develop better techniques and algorithms to extract useful and interesting information from this large amount of textual data. Text mining is the process of extracting useful data from unstructured text. The algorithms used for text mining have advantages and disadvantages; moreover, the issues in the field of text mining that affect the accuracy and relevance of the results are identified.
An Ensemble of Filters and Wrappers for Microarray Data Classification (mlaij)
The development of microarray technology has supplied a large volume of data to many fields. Gene microarray analysis and classification have demonstrated an effective way for the diagnosis of diseases and cancers. Inasmuch as the data obtained from microarray technology is very noisy and also has thousands of features, feature selection plays an important role in removing irrelevant and redundant features and in reducing computational complexity. There are two important approaches for gene selection in microarray data analysis: the filters and the wrappers. To select a concise subset of informative genes, we introduce a hybrid feature selection which combines the two approaches. Candidate features are first selected from the original set via several effective filters; the candidate feature set is then further refined by more accurate wrappers. Thus, we can take advantage of both the filters and the wrappers. Experimental results based on 11 microarray datasets show that our mechanism can be effective with a smaller feature set. Moreover, these feature subsets can be obtained in a reasonable time.
For years, the Machine Learning community has focused on developing efficient algorithms that can produce very accurate classifiers. However, it is often much easier to find several good classifiers based on dataset combination than a single classifier applied to different datasets. The advantages of using classifier-dataset combinations instead of a single one are twofold: they help lower the computational complexity by using simpler models, and they can improve classification accuracy and performance. Most data mining applications are based on pattern matching algorithms, so improving the performance of the classification has a positive impact on the quality of the overall data mining task. Since combination strategies have proved very useful in improving performance, these techniques have become important in applications such as cancer detection, speech technology and natural language processing. The aim of this paper is to propose a proprietary metric, the Normalized Geometric Index (NGI), based on the latent properties of datasets, for improving the accuracy of data mining tasks.
Parallel Genetic Algorithms for University Scheduling Problem (IJECEIAES)
The university scheduling (timetabling) problem falls into the class of NP-hard problems. Researchers have tried many techniques to find the most suitable and fastest way of solving it. With the emergence of multi-core systems, parallel implementations were considered for finding the solution. Our approach attempts to combine several techniques in two algorithms: a coarse-grained algorithm and a multi-thread tournament algorithm. The results obtained from the two algorithms are compared using an algorithm evaluation function. Considering execution time, the coarse-grained algorithm performed twice as well as the multi-thread algorithm.
Data mining, or knowledge discovery in databases (KDD), is the extraction of unknown (hidden) and useful knowledge from data. Data mining is widely used in many areas such as retail, sales, e-commerce, remote sensing, and bioinformatics. Student performance has become one of the most complex puzzles for universities and colleges in the recent past, given their tremendous growth. In this paper, the authors deployed data mining techniques such as classification, association rules, and chi-square for knowledge discovery. For this study, the authors used a data set containing the results of approximately 180 MCA (postgraduate) students from 3 colleges. The study found that one can apply data mining functionalities such as chi-square, association rules, and lift in education and discover areas of improvement.
A Survey and Comparative Study of Filter and Wrapper Feature Selection Techni... (theijes)
Feature selection is considered a problem of global combinatorial optimization in machine learning: it reduces the number of features and removes irrelevant, noisy and redundant data. However, identifying useful features among hundreds or even thousands of related features is not an easy task. Selecting relevant genes from microarray data becomes even more challenging owing to the high dimensionality of the features, the multiple class categories involved and the usually small sample size. To improve prediction accuracy and to avoid incomprehensibility due to the number of features, different feature selection techniques can be implemented. This survey classifies and analyzes different approaches, aiming not only to provide a comprehensive presentation but also to discuss challenges and various performance parameters. The techniques are generally classified into three groups: filter, wrapper and hybrid.
This paper presents a set of methods that use a genetic algorithm for automatic test-data generation in software testing. For several years researchers have proposed methods for generating test data, each with different drawbacks. In this paper, we present various Genetic Algorithm (GA) based test methods with different parameters to automate structure-oriented test data generation on the basis of the internal program structure. The factors discovered are used in evaluating the fitness function of the genetic algorithm for selecting the best possible test method. These methods take the test populations as input and then evaluate the test cases for the program. This integration will help improve the overall performance of the genetic algorithm in search-space exploration and exploitation, with a better convergence rate.
Proposing a scheduling algorithm to balance the time and cost using a genetic... (Editor IJCATR)
Grid computing is a hardware and software infrastructure that provides affordable, sustainable, and reliable access; its aim is to create a supercomputer using free resources. One of the challenges in Grid computing is the scheduling problem, which is regarded as a tough issue. Since scheduling is a non-deterministic problem in the Grid, deterministic algorithms cannot be used to improve it. In this paper, a combination of genetic algorithms and binary gravitational attraction is used to solve the scheduling problem, where the reduction of task execution time and the cost-effective use of simultaneous resources are investigated. In this case, the user determines the execution-time parameter and the cost-effective use of resources. The algorithm also uses a new approach to the selection of resources that leads to a balanced load on the resources. Experimental results reveal that our proposed algorithm reaches better results than other algorithms in terms of cost, time, and selection of the best resource.
Single parent mating in genetic algorithm for real robotic system identification (IAESIJAI)
System identification (SI) is a method of determining a mathematical model for a system given a set of input-output data. A representation is made using a mathematical model based on certain specified assumptions. In SI, model structure selection is the step where a model structure perceived as an adequate system representation is selected. A typical rule is that the final model must strike a good balance between parsimony and accuracy. As a popular search method, the genetic algorithm (GA) is used for selecting a model structure; however, the optimality of the final model depends greatly on the effectiveness of the GA operators. This paper presents a mating technique named single parent mating (SPM) in GA for use in real robotic SI. The technique is based on the chromosome structure of the parents, such that a single parent is sufficient to achieve mating, which eases the search for the optimal model. The results show that, using three different objective functions (Akaike information criterion, Bayesian information criterion and parameter magnitude-based information criterion 2), the GA with the mating technique is able to find more optimal models than without it. Validations show that the models selected using the mating technique are acceptable.
EVOLUTIONARY COMPUTING TECHNIQUES FOR SOFTWARE EFFORT ESTIMATION (ijcsit)
Reliable and accurate estimation of software effort has always been a matter of concern for industry and academia. Numerous estimation models have been proposed by researchers, but no model is suitable for all types of datasets and environments. Since the motive of an estimation model is to minimize the gap between actual and estimated effort, the effort estimation process can be viewed as an optimization problem of tuning the parameters. In this paper, evolutionary computing techniques, including Bee colony optimization, Particle swarm optimization and Ant colony optimization, have been employed to tune the parameters of the COCOMO model. The performance of these techniques has been analysed using established performance measures. The results obtained have been validated using data from Interactive Voice Response (IVR) projects. The evolutionary techniques have been found to be more accurate than existing estimation models.
Survey on evolutionary computation techniques and their application in dif... (ijitjournal)
In computer science, evolutionary computation is an algorithmic tool based on evolution. It implements random variation, reproduction and selection by altering and moving data within a computer, and it helps in building, applying and studying algorithms based on the Darwinian principles of natural selection. In this paper, different evolutionary computation techniques used in several applications, specifically image processing, cloud computing and grid computing, are studied briefly. This work is an effort to help researchers from different fields gain knowledge of the evolutionary computation techniques applicable in the above-mentioned areas.
Using particle swarm optimization to solve test functions problems (riyaniaes)
In this paper, benchmark functions are used to evaluate and check the particle swarm optimization (PSO) algorithm. The functions utilized are two-dimensional but were selected with different levels of difficulty and different models. To demonstrate the capability of PSO, it is compared with a genetic algorithm (GA); the two algorithms are compared in terms of objective function values and standard deviation. Several runs were made to obtain convincing results, and the parameters were chosen properly, using Matlab software. The suggested algorithm can solve different engineering problems of different dimensions and outperforms the other in terms of accuracy and speed of convergence.
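A minimal PSO of the kind evaluated above can be sketched in Python on the classic two-dimensional sphere test function; the inertia and acceleration coefficients are common textbook defaults, not the paper's settings.

```python
import random

def pso_minimize(f, dim=2, n_particles=20, iters=100, seed=0):
    """Minimal PSO: each particle tracks its personal best position and is
    pulled toward it and toward the swarm's global best."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5          # inertia, cognitive, social coefficients
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    gbest = min(pbest, key=f)
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if f(pos[i]) < f(pbest[i]):
                pbest[i] = pos[i][:]
        gbest = min(pbest + [gbest], key=f)  # keep the best ever seen
    return gbest

sphere = lambda x: sum(c * c for c in x)  # global minimum 0 at the origin
best = pso_minimize(sphere)
```

On a unimodal function like the sphere, the swarm contracts around the optimum; harder multimodal benchmarks are where the GA comparison in the paper becomes interesting.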
An Automatic Clustering Technique for Optimal Clusters (IJCSEA Journal)
This paper proposes a simple, automatic and efficient clustering algorithm, Automatic Merging for Optimal Clusters (AMOC), which aims to generate nearly optimal clusters for a given dataset automatically. AMOC is an extension of standard k-means: a two-phase iterative procedure that combines certain validation techniques with automated merging of clusters in order to find optimal clusters. Experiments on both synthetic and real data have shown that the proposed algorithm finds nearly optimal clustering structures in terms of number of clusters, compactness and separation.
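The idea of running k-means and then automatically merging clusters can be sketched as follows. This is a simplified illustration that merges by a fixed centroid-distance threshold; AMOC's actual validation criteria are more elaborate, and all names and values here are assumptions.

```python
import random

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(pts):
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def kmeans_merge(points, k, merge_dist=1.0, iters=20, seed=0):
    """Run k-means from k seeds, then repeatedly merge the clusters whose
    centroids are closer than merge_dist."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)), key=lambda j: dist2(p, centroids[j]))
            clusters[j].append(p)
        centroids = [mean(c) for c in clusters if c]   # drop empty clusters
        clusters = [c for c in clusters if c]
    merged = True
    while merged and len(centroids) > 1:
        merged = False
        for a in range(len(centroids)):
            for b in range(a + 1, len(centroids)):
                if dist2(centroids[a], centroids[b]) < merge_dist ** 2:
                    clusters[a] += clusters[b]
                    centroids[a] = mean(clusters[a])
                    del clusters[b], centroids[b]
                    merged = True
                    break
            if merged:
                break
    return centroids

# Six points in two well-separated groups; starting from k=4, the merge
# phase collapses the over-split clusters back to the two natural ones.
pts = [(0, 0), (0.2, 0.1), (0.1, 0.3), (5, 5), (5.1, 4.9), (4.9, 5.2)]
centres = kmeans_merge(pts, k=4)
```

Starting with a deliberately large k and merging downward is what lets this style of algorithm avoid asking the user for the true number of clusters.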
Performance Comparison of Machine Learning Algorithms (Dinusha Dilanka)
In this paper we compare the performance of two classification algorithms. It is useful to differentiate algorithms based on computational performance rather than classification accuracy alone: although classification accuracy between the algorithms is similar, computational performance can differ significantly and can affect the final results. The objective of this paper is therefore to perform a comparative analysis of two machine learning algorithms, namely K-Nearest Neighbor classification and Logistic Regression. We consider a large dataset of 7981 data points and 112 features, and examine the performance of the above-mentioned machine learning algorithms. The processing time and accuracy of the different machine learning techniques are estimated on the collected data set, using 60% for training and the remaining 40% for testing. The paper is organized as follows. Section I includes the introduction and background analysis of the research; Section II, the problem statement. Section III briefly describes our application, the data analysis process, the testing environment, and the methodology of our analysis. Section IV comprises the results of the two algorithms. Finally, the paper concludes with a discussion of future directions for research that address the problems existing in the current research methodology.
The World Wide Web holds a large amount of varied information, and while searching it, users do not always obtain the type of information they expect. In information extraction, extracting semantic relationships between terms in documents is a challenge. This paper proposes a system that helps retrieve documents based on query expansion and tackles the extraction of semantic relationships from biological documents. The system retrieves documents relevant to the input terms and then extracts any relationship present. We use the Boolean model and pattern recognition, which help in determining the relevant documents and locating the relationship within a biological document. The system constructs a term-relation table that accelerates the relation-extraction step. The proposed method offers another use of the system: researchers can use it to discover the relationship between two biological terms through the information available in biological documents. For the retrieved documents, the system also measures precision and recall.
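The Boolean-model retrieval step described above, matching the documents that contain every query term, can be sketched with a small inverted index; the function name and toy documents are illustrative, not taken from the paper.

```python
def boolean_and(query_terms, docs):
    """Boolean model: a document matches when it contains every query term."""
    # Build an inverted index: term -> set of document ids containing it.
    index = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    # Intersect the posting sets of all query terms.
    result = None
    for term in query_terms:
        postings = index.get(term.lower(), set())
        result = postings if result is None else result & postings
    return result or set()

docs = {1: "gene regulates protein expression",
        2: "protein folding study",
        3: "the gene encodes a protein"}
hits = boolean_and(["gene", "protein"], docs)  # -> {1, 3}
```

Query expansion would enlarge `query_terms` with synonyms before the intersection, and a relation extractor would then scan only the matched documents.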
Similar to APPLYING GENETIC ALGORITHMS TO INFORMATION RETRIEVAL USING VECTOR SPACE MODEL (20)
Cosmetic shop management system project report.pdf (Kamal Acharya)
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it is tough to interpret those ingredient lists unless you have a background in chemistry. Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. The program includes various functions to perform the tasks mentioned above, and data file handling has been used effectively. The automated cosmetic shop management system deals with the automation of the general workflow and administration process of the shop. The main processes of the system focus on customer requests, where the system is able to search for the most appropriate products and deliver them to the customers. It helps the employees quickly identify the cosmetic products that have reached their minimum quantity, keeps track of the expiry date of each product, and helps employees find the rack number in which a product is placed. It is also a faster and more efficient way of working.
Water scarcity is the lack of fresh water resources to meet the standard water demand. There are two type of water scarcity. One is physical. The other is economic water scarcity.
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...Amil Baba Dawood bangali
Contact with Dawood Bhai Just call on +92322-6382012 and we'll help you. We'll solve all your problems within 12 to 24 hours and with 101% guarantee and with astrology systematic. If you want to take any personal or professional advice then also you can call us on +92322-6382012 , ONLINE LOVE PROBLEM & Other all types of Daily Life Problem's.Then CALL or WHATSAPP us on +92322-6382012 and Get all these problems solutions here by Amil Baba DAWOOD BANGALI
#vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore#blackmagicformarriage #aamilbaba #kalajadu #kalailam #taweez #wazifaexpert #jadumantar #vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore #blackmagicforlove #blackmagicformarriage #aamilbaba #kalajadu #kalailam #taweez #wazifaexpert #jadumantar #vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore #Amilbabainuk #amilbabainspain #amilbabaindubai #Amilbabainnorway #amilbabainkrachi #amilbabainlahore #amilbabaingujranwalan #amilbabainislamabad
Democratizing Fuzzing at Scale by Abhishek Aryaabh.arya
Presented at NUS: Fuzzing and Software Security Summer School 2024
This keynote talks about the democratization of fuzzing at scale, highlighting the collaboration between open source communities, academia, and industry to advance the field of fuzzing. It delves into the history of fuzzing, the development of scalable fuzzing platforms, and the empowerment of community-driven research. The talk will further discuss recent advancements leveraging AI/ML and offer insights into the future evolution of the fuzzing landscape.
Quality defects in TMT Bars, Possible causes and Potential Solutions.PrashantGoswami42
Maintaining high-quality standards in the production of TMT bars is crucial for ensuring structural integrity in construction. Addressing common defects through careful monitoring, standardized processes, and advanced technology can significantly improve the quality of TMT bars. Continuous training and adherence to quality control measures will also play a pivotal role in minimizing these defects.
Courier management system project report.pdfKamal Acharya
It is now-a-days very important for the people to send or receive articles like imported furniture, electronic items, gifts, business goods and the like. People depend vastly on different transport systems which mostly use the manual way of receiving and delivering the articles. There is no way to track the articles till they are received and there is no way to let the customer know what happened in transit, once he booked some articles. In such a situation, we need a system which completely computerizes the cargo activities including time to time tracking of the articles sent. This need is fulfilled by Courier Management System software which is online software for the cargo management people that enables them to receive the goods from a source and send them to a required destination and track their status from time to time.
Event Management System Vb Net Project Report.pdfKamal Acharya
In present era, the scopes of information technology growing with a very fast .We do not see any are untouched from this industry. The scope of information technology has become wider includes: Business and industry. Household Business, Communication, Education, Entertainment, Science, Medicine, Engineering, Distance Learning, Weather Forecasting. Carrier Searching and so on.
My project named “Event Management System” is software that store and maintained all events coordinated in college. It also helpful to print related reports. My project will help to record the events coordinated by faculties with their Name, Event subject, date & details in an efficient & effective ways.
In my system we have to make a system by which a user can record all events coordinated by a particular faculty. In our proposed system some more featured are added which differs it from the existing system such as security.
TECHNICAL TRAINING MANUAL GENERAL FAMILIARIZATION COURSEDuvanRamosGarzon1
AIRCRAFT GENERAL
The Single Aisle is the most advanced family aircraft in service today, with fly-by-wire flight controls.
The A318, A319, A320 and A321 are twin-engine subsonic medium range aircraft.
The family offers a choice of engines
Explore the innovative world of trenchless pipe repair with our comprehensive guide, "The Benefits and Techniques of Trenchless Pipe Repair." This document delves into the modern methods of repairing underground pipes without the need for extensive excavation, highlighting the numerous advantages and the latest techniques used in the industry.
Learn about the cost savings, reduced environmental impact, and minimal disruption associated with trenchless technology. Discover detailed explanations of popular techniques such as pipe bursting, cured-in-place pipe (CIPP) lining, and directional drilling. Understand how these methods can be applied to various types of infrastructure, from residential plumbing to large-scale municipal systems.
Ideal for homeowners, contractors, engineers, and anyone interested in modern plumbing solutions, this guide provides valuable insights into why trenchless pipe repair is becoming the preferred choice for pipe rehabilitation. Stay informed about the latest advancements and best practices in the field.
Welcome to WIPAC Monthly the magazine brought to you by the LinkedIn Group Water Industry Process Automation & Control.
In this month's edition, along with this month's industry news to celebrate the 13 years since the group was created we have articles including
A case study of the used of Advanced Process Control at the Wastewater Treatment works at Lleida in Spain
A look back on an article on smart wastewater networks in order to see how the industry has measured up in the interim around the adoption of Digital Transformation in the Water Industry.
International Journal of Computer Science, Engineering and Applications (IJCSEA) Vol.5, No.1, February 2015
DOI: 10.5121/ijcsea.2015.5102
APPLYING GENETIC ALGORITHMS TO
INFORMATION RETRIEVAL USING
VECTOR SPACE MODEL
Laith Mohammad Qasim Abualigah
Al al-Bayt University, Mafraq, Jordan
Essam S. Hanandeh
Zarqa University, Zarqa, Jordan
ABSTRACT
Genetic algorithms are usually used in information retrieval systems (IRs) to enhance the information retrieval process and to increase the efficiency of optimal information retrieval, in order to meet users' needs and help them find exactly what they want among the growing amount of available information. The improvement of adaptive genetic algorithms helps to retrieve the information needed by the user accurately, increases the number of relevant files retrieved and excludes irrelevant files. In this study, the researcher explored the problems embedded in this process, attempted to find solutions such as the way of choosing the mutation probability and the fitness function, and chose the Cranfield English Corpus test collection on mathematics. The collection was compiled by Cyril Cleverdon and used at the University of Cranfield in 1960; it contains 1400 documents and 225 queries, used here for simulation purposes. The researcher also used cosine similarity and Jaccard's coefficient to compute the similarity between the query and the documents, and used two proposed adaptive fitness functions and mutation operators as well as adaptive crossover. The process aimed at evaluating the effectiveness of the results according to the measures of precision and recall. Finally, the study concluded that several improvements can be obtained when using adaptive genetic algorithms.
KEYWORDS:
Information Retrieval, Genetic Algorithm, Adaptive Genetic Algorithm, Vector Space Model, Precision, Recall
1. INTRODUCTION
Due to the increasing amount of information and documents created by millions of authors and organizations on the Internet, information can be retrieved using an information retrieval system. However, users may encounter several problems during the information retrieval process. Thus, several studies have tackled these problems, which affect the accuracy of the system, in order to find the best solutions; in them, researchers attempted to increase the efficiency and accuracy of the system [13].
In this study, the results showed some improvements in information retrieval system performance using adaptive genetic algorithms, achieved by implementing some queries, using several methods to obtain relevant information, and sorting and ranking those queries by similarity measure [13].
As should be clear by now, the study aims at investigating information retrieval models. In this study, the researcher used two models, the Vector Space Model and the Extended Boolean Model, to compute the similarity between the query and the documents [5].
The researcher also proposed two fitness functions, cosine and Jaccard's, and used a variable ratio of mutation operators in order to obtain better results, comparing the first fitness function (cosine) with variable-probability mutation and variable-probability crossover, as well as the second fitness function (Jaccard's) with variable-probability mutation and variable-probability crossover [5].
The corpus of the study consists of 1400 English documents on mathematics and 225 queries to evaluate the effectiveness of the results according to the measures of precision and recall [4] [7].
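Precision and recall, the two evaluation measures used throughout, can be computed per query as in this minimal sketch (the document IDs are hypothetical, chosen only for illustration):

```python
# Precision and recall for one query, given the system's retrieved list and a
# set of relevance judgments. Hypothetical document IDs for illustration.

def precision_recall(retrieved, relevant):
    """Return (precision, recall) of a retrieved list against a relevant set."""
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

retrieved = [12, 7, 33, 90, 4]   # ranked output of the IR system (hypothetical)
relevant = {7, 33, 51, 60}       # judged relevant documents (hypothetical)
p, r = precision_recall(retrieved, relevant)
print(p, r)  # 2 of 5 retrieved are relevant; 2 of 4 relevant are retrieved
```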
2. INFORMATION RETRIEVAL
There are fundamental processes in information retrieval systems for getting the information that meets the needs of the user by finding the similarity between the query and the existing documents. This section shows some of the basic processes of matching the query with documents [9], depending on the Vector Space Model (VSM), in which both documents and queries are represented as vectors [10].
The processes of an information retrieval system are as follows:
1- Removing punctuation marks from each document.
2- Removing stop words from each document, such as prepositions, articles and other words that appear frequently in a document without adding any meaning to it [7] [8].
3- Stemming the words using the Porter stemmer, which is the most commonly used [8].
4- Building an inverted index, giving each document a unique serial number, known as the document identifier (doc ID), and connecting it with the terms appearing in the document [7].
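The four steps above can be sketched as follows. The stop-word list and the toy suffix-stripping stemmer are simplified stand-ins for illustration only; a real system would use a full stop list and the actual Porter stemmer:

```python
# Sketch of the preprocessing pipeline: punctuation removal, stop-word
# removal, simplified stemming, and inverted-index construction.
import string
from collections import defaultdict

STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is"}  # toy list

def stem(word):
    """Toy suffix stripper standing in for the Porter stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    # Step 1: strip punctuation; steps 2-3: drop stop words, then stem.
    text = text.translate(str.maketrans("", "", string.punctuation)).lower()
    return [stem(w) for w in text.split() if w not in STOP_WORDS]

def build_inverted_index(docs):
    # Step 4: map each term to the set of doc IDs that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in preprocess(text):
            index[term].add(doc_id)
    return index

docs = {1: "Solving equations in algebra.", 2: "Matrix equations and vectors."}
print(dict(build_inverted_index(docs)))
```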
3. GENETIC ALGORITHM
The genetic algorithm is an important technique for randomly searching for the best solution among a group of solutions in the available data [8]. In a genetic algorithm, a population of candidate solutions to an optimization problem is evolved toward better solutions. Each candidate solution has a set of properties (its chromosome) which can be mutated and altered; traditionally, solutions are represented in binary as strings of 0s and 1s, but other encodings are also possible [2].
The genetic algorithm uses an encoded representation of solutions equivalent to the chromosomes of individuals in nature. It is assumed that a potential solution to a problem may be represented as a chromosome [7].
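The process described above can be sketched as a minimal binary-string genetic algorithm. The bit-count fitness below is only a placeholder for illustration; in the paper's setting the fitness would be one of the similarity-based functions of section 3.1:

```python
# Minimal binary-string GA: tournament selection, one-point crossover,
# bit-flip mutation. Placeholder fitness (count of 1-bits) for illustration.
import random

def evolve(pop_size=20, length=16, generations=50, pc=0.9, pm=0.01):
    random.seed(1)                                   # deterministic demo
    fitness = lambda c: sum(c)                       # placeholder fitness
    pop = [[random.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        new_pop = []
        while len(new_pop) < pop_size:
            # tournament selection of two parents
            p1 = max(random.sample(pop, 3), key=fitness)
            p2 = max(random.sample(pop, 3), key=fitness)
            c1, c2 = p1[:], p2[:]
            if random.random() < pc:                 # one-point crossover
                cut = random.randrange(1, length)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (c1, c2):                   # bit-flip mutation
                for i in range(length):
                    if random.random() < pm:
                        child[i] = 1 - child[i]
                new_pop.append(child)
        pop = new_pop[:pop_size]
    return max(pop, key=fitness)

best = evolve()
print(sum(best))
```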
3.1 Proposed Fitness Function
The proposed fitness functions after modification are:

1- F(cosine):

\[
F_{cosine} = \frac{1}{2}\left(\frac{\sum_{i=1}^{n} w_{i,j}\, w_{i,k}}{\sqrt{\sum_{i=1}^{n} w_{i,j}^{2}}\,\sqrt{\sum_{i=1}^{n} w_{i,k}^{2}}} + \frac{1}{n}\sum_{i=1}^{n} w_{i,k}\right) \qquad (1)
\]

• Where wi,j = the weight of term i in document j.
• wi,k = the weight of term i in query k.

The researcher's modification of the traditional fitness function (cosine) was to add, to the traditional cosine equation, a new term that computes the average weight of all query terms, adding it to the main equation and dividing the sum by two. This addition gives more weight to the query terms so as to retrieve relevant documents.
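The printed form of equation (1) did not survive extraction, but the prose describes it exactly: traditional cosine similarity plus the average query-term weight, divided by two. A minimal sketch of that computation, with hypothetical weight vectors:

```python
# Proposed cosine fitness as described in the text:
# (cosine similarity + average query-term weight) / 2.
import math

def cosine(d, q):
    """Traditional cosine similarity between weight vectors d and q."""
    dot = sum(wd * wq for wd, wq in zip(d, q))
    norm = math.sqrt(sum(w * w for w in d)) * math.sqrt(sum(w * w for w in q))
    return dot / norm if norm else 0.0

def fitness_proposed_cosine(d, q):
    """Equation (1) as described: (cosine + average query weight) / 2."""
    avg_query_weight = sum(q) / len(q)
    return (cosine(d, q) + avg_query_weight) / 2

d = [0.2, 0.0, 0.5]   # term weights w_{i,j} in document j (hypothetical)
q = [0.4, 0.1, 0.3]   # term weights w_{i,k} in query k (hypothetical)
print(fitness_proposed_cosine(d, q))
```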
2- F(jaccard's):

\[
F_{jaccard} = \frac{1}{2}\left(\frac{\sum_{i=1}^{n} w_{i,j}\, w_{i,k}}{\sum_{i=1}^{n} w_{i,j}^{2} + \sum_{i=1}^{n} w_{i,k}^{2} - \sum_{i=1}^{n} w_{i,j}\, w_{i,k}} + \frac{1}{n}\sum_{i=1}^{n} w_{i,k}\right) \qquad (2)
\]

• Where wi,j = the weight of term i in document j.
• wi,k = the weight of term i in query k.

The researcher's modification of the traditional fitness function (Jaccard's) was likewise to add, to the traditional Jaccard's equation, a new term that computes the average weight of all query terms, adding it to the main equation and dividing the sum by two; this addition gives more weight to the query terms.
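Equation (2) can be sketched the same way, assuming the traditional part is the standard extended (weighted) Jaccard coefficient; the weight vectors are again hypothetical:

```python
# Proposed Jaccard's fitness as described in the text:
# (extended Jaccard coefficient + average query-term weight) / 2.

def jaccard(d, q):
    """Extended Jaccard coefficient between weight vectors d and q."""
    dot = sum(wd * wq for wd, wq in zip(d, q))
    denom = sum(w * w for w in d) + sum(w * w for w in q) - dot
    return dot / denom if denom else 0.0

def fitness_proposed_jaccard(d, q):
    """Equation (2) as described: (Jaccard + average query weight) / 2."""
    return (jaccard(d, q) + sum(q) / len(q)) / 2

d = [0.2, 0.0, 0.5]   # term weights w_{i,j} in document j (hypothetical)
q = [0.4, 0.1, 0.3]   # term weights w_{i,k} in query k (hypothetical)
print(fitness_proposed_jaccard(d, q))
```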
3.2 Equation of crossover probability
The researcher used the crossover probability (Pc) equation (3) to obtain a variable (not fixed) value of the crossover probability, trying many probability ratios to get the best solution. Applying crossover to a chromosome swaps some genes to produce a new chromosome (offspring) that is intended to be better than the original chromosome [11].
----------------------------(3)
Where k1 = 0.5 and k2 = 0.9 are crossover probability constants chosen from many experiments with the system.
Fmax = the maximum fitness value among the chosen chromosomes.
F' = the fitness value of the chromosome.
Favg = the average fitness value over all chromosomes.
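The body of equation (3) was lost in extraction. The symbols defined above (k1, k2, Fmax, F', Favg) match the well-known Srinivas-Patnaik adaptive form, so the sketch below assumes that form; it should not be read as the paper's exact equation:

```python
# Assumed form of equation (3), in the Srinivas-Patnaik style: scale k1 by
# how close the parent's fitness is to the population maximum, and fall back
# to the fixed rate k2 for below-average parents.

def adaptive_pc(f_prime, f_max, f_avg, k1=0.5, k2=0.9):
    """Adaptive crossover probability (assumed form of equation (3))."""
    if f_prime >= f_avg and f_max > f_avg:
        return k1 * (f_max - f_prime) / (f_max - f_avg)
    return k2

# Integer fitness values, purely for illustration.
print(adaptive_pc(f_prime=8, f_max=10, f_avg=6))   # 0.5 * 2/4 = 0.25
print(adaptive_pc(f_prime=5, f_max=10, f_avg=6))   # below average -> 0.9
```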
3.3 Adaptive mutation
The probabilities of crossover (pc) and mutation (pm) largely determine the solution accuracy and convergence speed that genetic algorithms can reach. In this study, the researcher obtained the best results by choosing among the ratios used. The researcher tried many mutation probability ratios, but (0.2, 0.001) were chosen, based on many experiments with the system, to calculate the probability ratios with the fitness function; the more accurate setting is then determined for use in future research [2] [5].
3.4 Equation of mutation operator probability
The researcher used the mutation probability (Pm) equation (4) to obtain a variable (not fixed) value of the mutation probability, trying many probability ratios to get a good solution. Applying mutation to a chromosome produced by crossover flips some genes, yielding a new chromosome (mutated offspring) that is intended to be better than the original chromosome (original offspring) [11].
--------------------------(4)
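Equation (4) is likewise missing from the extraction. By analogy with the crossover case, a common adaptive scheme scales the mutation rate down for fitter chromosomes; the sketch below assumes that scheme, with k3 and k4 standing in for the ratios (0.2, 0.001) mentioned in section 3.3 (the exact pairing and form are assumptions):

```python
# Assumed form of equation (4): fitter chromosomes receive a lower mutation
# rate, clamped to a small floor; below-average chromosomes keep the full
# rate. k3 = 0.2 and k4 = 0.001 stand in for the ratios in section 3.3.
import random

def adaptive_pm(f, f_max, f_avg, k3=0.2, k4=0.001):
    """Adaptive mutation probability (assumed form of equation (4))."""
    if f >= f_avg and f_max > f_avg:
        return max(k4, k3 * (f_max - f) / (f_max - f_avg))
    return k3

def mutate(chromosome, pm):
    """Bit-flip mutation: flip each gene independently with probability pm."""
    return [1 - g if random.random() < pm else g for g in chromosome]

random.seed(0)
pm = adaptive_pm(f=5, f_max=8, f_avg=4)     # 0.2 * 3/4 = 0.15
print(mutate([0, 1, 1, 0, 1, 0, 1, 1], pm))
```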
4. LITERATURE REVIEW
In 2013, Wafa Zaal worked on adapting the genetic algorithm using several models, such as the Vector Space Model, the logical model and the language model. She used variable crossover and mutation probabilities instead of the fixed values of traditional genetic algorithms, and extended both crossover and mutation to get the best results. The thesis used an Arabic corpus. Her study showed that using the Vector Space Model with cosine similarity was the best solution, with an improvement rate of 13.0% [10].
In 2013, Korejo and Khuhro studied adaptive mutation and proposed four mutation operators for the genetic algorithm, despite the difficulty of selecting an operator during the application process. They addressed the problem by adapting the mutation percentage, choosing each mutation operator according to the behavior of the initial population of each generation. Their study showed that adaptive mutation gave the best results [2].
In 2012, Ammar Sami proposed a method based on a genetic algorithm to improve information retrieval from websites, dividing the work into two units, a document-indexing unit and a genetic-algorithm unit, using crossover, a mutation operator and a specialized fitness function. It obtained an improvement rate of up to 90% [13].
In 2009, Noha Marwan used four different islands in data retrieval, with Jaccard's and Ochiai's coefficients as fitness functions, applying each measure to the islands independently. She used query expansion and compared the results of the four islands with and without expansion. She found that expansion improved the search results with respect to user needs, that Jaccard's outperformed Ochiai's under random selection, and that Ochiai's outperformed Jaccard's under unbiased tournament selection [14].
5. EXPERIMENTAL RESULTS
5.1 Results
In this study, the researcher used ten queries. Every query tended to have 8 statements; in other words, the researcher had 80 results. After obtaining the results, the researcher analyzed them using the evaluation criteria. The researcher chose to show the results of query number one.
• In table 1, results showed the adaptive genetic algorithm (AGA) used in the information retrieval system (IRs) with the Vector Space Model (VSM) and the cosine fitness function, listing the relevant documents in order. The researcher found that the result of using cosine in table 1 was worse than the result of using the proposed cosine with VSM in table 2, and that recall increased as precision decreased in all cases because of the increasing number of samples.
Table 1: value of precision and Recall for query number1 by using (VSM) and (AGA) using the cosine
fitness function.
Recall Precision (%)
0.1 85
0.2 75
0.3 72
0.4 64
0.5 50
0.6 38
0.7 32
0.8 22
0.9 20
• In table 2, results showed the adaptive genetic algorithm (AGA) used in the information retrieval system (IRs) with the Vector Space Model (VSM) and the proposed cosine fitness function, listing the relevant documents in order. The researcher found better evaluation results, a better degree of similarity and an average precision greater than with the traditional cosine. This result topped the table with a precision of 91% at a recall value of 0.1, because the AGA gave more weight to the query terms and had better crossover and mutation probabilities.
Table 2: value of precision and Recall for query number1 by using (VSM) with (AGA) using the
proposed cosine fitness function.
Recall Precision (%)
0.1 91
0.2 85
0.3 79
0.4 65
0.5 54
0.6 45
0.7 39
0.8 28
0.9 22
• In table 3, results showed the adaptive genetic algorithm (AGA) used in the information retrieval system (IRs) with the Vector Space Model (VSM) and the jaccard's fitness function. The researcher found better results in table 4 when using the proposed jaccard's with VSM. In this table, recall increased as precision decreased in all cases, but the result was worse, with a precision of 80% at a recall value of 0.1, compared to the next case.
Table 3: value of precision and Recall for query number1 by using (VSM) and (AGA) using the jaccard's fitness function.
Recall Precision (%)
0.1 80
0.2 70
0.3 60
0.4 58
0.5 43
0.6 33
0.7 29
0.8 21
0.9 19
• In table 4, results showed the adaptive genetic algorithm (AGA) used in the information retrieval system (IRs) with the Vector Space Model (VSM) and the proposed jaccard's fitness function. When comparing the proposed jaccard's with the traditional jaccard's fitness function, the researcher found better evaluation results, a better degree of similarity and a higher average precision with the proposed jaccard's. However, the best case among the four cases in (VSM) was the proposed cosine.
Table 4: value of precision and Recall for query number1 by using (VSM) with (AGA) using
proposed jaccard's fitness function.
Recall Precision (%)
0.1 87
0.2 76
0.3 71
0.4 64
0.5 50
0.6 35
0.7 30
0.8 25
0.9 21
Table 5: Average value of precision for all queries using VSM with cosine
Recall  Average precision Cosine (%)  Average precision proposed cosine (%)  AGA Improvement (%)
0.1 85 92 7
0.2 76 85 9
0.3 72 80 8
0.4 65 68 3
0.5 51 55 4
0.6 39 45 6
0.7 33 38 5
0.8 23 28 5
0.9 20 23 3
average 51.5 57.1 5.6
Figure 1: Average value of precision for all queries using VSM-cosine
• In this case, the Vector Space Model was used with cosine and the proposed cosine. As can be seen, the average precision when using the proposed cosine was greater than with cosine, and precision decreased as recall increased. Moreover, the degree of improvement was good at the top recall value, because taking a small sample makes the precision at a recall value of 0.1 good for all queries, whereas the recall value of 0.9 was bad. This result was good because the modification of the fitness function gave weight to each query term and obtained better probability ratios through adaptive crossover and mutation. As shown in figure 1, using VSM with cosine made a greater improvement than using the jaccard's fitness function.
Table 6: Average value of precision for all queries using VSM with jaccard’s
Recall  Average precision Jaccard's (%)  Average precision proposed Jaccard's (%)  AGA Improvement (%)
0.1 80 87 7
0.2 71 76 5
0.3 62 70 8
0.4 57 65 8
0.5 42 47 5
0.6 32 35 3
0.7 30 31 1
0.8 21 25 4
0.9 19 20 1
average 46 50.6 4.6
Figure 2: Average value of precision for all queries using VSM-jaccard's
• In this case, the Vector Space Model was used with jaccard's and the proposed jaccard's. As can be seen, the average precision when using the proposed jaccard's was greater than with jaccard's, and precision decreased as recall increased. The degree of improvement, the top recall, the average precision and the precision at a recall value of 0.1 were good for all queries, whereas the recall value of 0.9 was bad. This result was good because of the modification of the fitness function and the use of adaptive crossover and mutation. However, using VSM with cosine produced a greater improvement than using jaccard's.
Table 7: using Vector Space Model with fitness function option.
option Cosine Proposed
cosine
Jaccard's Proposed
Jaccard's
Average
Precision(%)
51.5 57.1 46 50.6
Figure 3: Using VSM with fitness function option
The researcher got the best solution when modifying the fitness functions and using the probability crossover and mutation operators.
In table 8, results show the comparison between improvement strategies and the degree of improvement for each IR model with each fitness function.
Table 8: Compare between improvements
Model  Average Improvement proposed cosine (%)  Average Improvement proposed jaccard's (%)
VSM    5.6                                      4.6
6. CONCLUSION
The researcher proposed an adaptive genetic algorithm (AGA) to enhance information retrieval systems (IRs) using several fitness functions (cosine, proposed cosine, jaccard's, proposed jaccard's) alongside the Vector Space Model (VSM). The researcher concluded the following:
1. The best result appeared when using the adaptive crossover and mutation operators with VSM and the proposed cosine, retrieving relevant documents with an average precision of 57.1%.
2. A better generation of the initial population gave the best result in retrieving relevant documents.
3. When using VSM with the proposed cosine, the researcher saw a high degree of similarity for each relevant document.
4. The best result of using VSM appeared when using the proposed cosine, with an average precision of 57.1%.
5. When comparing the use of cosine (average precision 51.5%) with the use of the proposed cosine (average precision 57.1%) alongside VSM, the best result appeared when using the proposed cosine, with an improvement of 5.6%.
6. When comparing the use of jaccard's (average precision 46%) with the use of the proposed jaccard's (average precision 50.6%) alongside VSM, the best result appeared when using the proposed jaccard's, with an improvement of 4.6%.
7. FUTURE WORK
After finishing this study, which mainly aims at improving information retrieval using adaptive genetic algorithms, the researcher offers the following suggestions:
1. Models other than the Vector Space Model, such as the Extended Boolean Model and the Probabilistic Model, may be used in future studies.
2. Methods other than adaptive crossover and adaptive mutation can be used in future studies.
REFERENCES
[1] Priya Borkar and Leena Patil, "Web Information Retrieval Using Genetic Algorithm Particle Swarm Optimization", International Journal of Future Computer and Communication, Vol. 2, No. 6, pp. 595-599, (2013).
[2] Korejo and Khuhro, "Genetic Algorithm Using an Adaptive Mutation Operator for Numerical Optimization Functions", University of Sindh, Vol. 45, pp. 41-48, (2013).
[3] Taisir Eldos, "Mutative Genetic Algorithms", Journal of Computations & Modelling, Vol. 3, No. 2, pp. 111-124, (2013).
[4] Mohammad Nassar, Feras AL Mshagba and Eman AL Mshagba, "Improving the User Query for the Boolean Model Using Genetic Algorithms", IJCSI International Journal of Computer Science Issues, Vol. 8, Issue 5, No. 1, pp. 66-70, (2011).
[5] Imtiaz Korejo, Shengxiang Yang and Changhe Li, "A Comparative Study of Adaptive Mutation Operators for Genetic Algorithms", The VIII Metaheuristics International Conference, (2009).
[6] Huifang Cheng, Handan, China, "Improved Genetic Programming Algorithm", International Asia Symposium on Intelligent Interaction and Affective Computing, IEEE, pp. 168-177, (2009).
[7] Ahmed Radwan, Bahgat Abdel Latef, Abdel Ali and Osman Sadek, "Using Genetic Algorithm to Improve Information Retrieval Systems", World Academy of Science, Engineering and Technology, pp. 1021-1027, (2008).
[8] Detelin Luchev, "Applying Genetic Algorithm in Query Improvement Problem", International Journal "Information Technologies and Knowledge", Vol. 1, No. 1, pp. 309-216, (2007).
[9] Nir Oren, "Reexamining tf.idf based information retrieval with Genetic Programming", University of the Witwatersrand, pp. 1-10, (2002).
[10] Wafa Maitah, Mamoun Al-Rababaa and Ghasan Kannan, "Improving the Effectiveness of Information Retrieval System Using Adaptive Genetic Algorithm", International Journal of Computer Science & Information Technology (IJCSIT), Vol. 5, No. 5, pp. 91-105, (2013).
[11] Sima Uyar, Sanem Sariel and Gulsen Eryigit, "A Gene Based Adaptive Mutation Strategy for Genetic Algorithms", Istanbul Technical University, Electrical and Electronics Faculty, Department of Computer Engineering, LNCS 3103, pp. 271-281, (2004).
[12] Sima Etaner and Gulsen Cebiroglu, "An Adaptive Mutation Scheme in Genetic Algorithms for Fastening the Convergence to the Optimum", Istanbul Technical University, Computer Engineering Department, (2005).
[13] "Information retrieval", http://www.dsoergel.com/NewPublications/HCIEncyclopediaIRShortEForDS.pdf, pp. 1-11.
[14] Kalayanasaravan and Thangamani, "Document Retrieval System using Genetic Algorithm", Kongu Engineering College, Perundurai, Vol. 2, No. 10, pp. 943-946, (2013).