This document presents a model of textual similarity for information retrieval systems in wide-area networks. It evaluates four similarity functions (Jaccard, Cosine, Dice, Overlap) using correlation coefficients and proposes three approaches to combining them: (1) combining the Cosine and Overlap scores, which performed best; (2) combining the Cosine, Dice, and Overlap scores; and (3) combining all four similarity functions. The model is represented as a triangle whose vertices are the results of the three proposed approaches, used to measure textual similarity between retrieved documents.
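All four similarity functions under evaluation are set-based measures over document term sets, so they can be sketched in a few lines of Python. The simple averaging used for the combined score below is an illustrative assumption, not the paper's actual combination rule:

```python
import math

def jaccard(a, b):
    return len(a & b) / len(a | b)                   # |A∩B| / |A∪B|

def cosine(a, b):
    return len(a & b) / math.sqrt(len(a) * len(b))   # set-based cosine

def dice(a, b):
    return 2 * len(a & b) / (len(a) + len(b))

def overlap(a, b):
    return len(a & b) / min(len(a), len(b))

doc1 = set("the quick brown fox".split())
doc2 = set("the lazy brown dog".split())

# approach 1: combine the Cosine and Overlap scores (here, a plain average)
combined = (cosine(doc1, doc2) + overlap(doc1, doc2)) / 2
```

With two shared terms out of four per document, Cosine, Dice, and Overlap all give 0.5 here while Jaccard gives 1/3, which illustrates why the functions correlate but do not coincide.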
Data mining is knowledge discovery in databases: its goal is to extract patterns and knowledge from large amounts of data. An important branch of data mining is text mining, which extracts high-quality information from text, typically through statistical pattern learning. "High quality" in text mining refers to some combination of relevance, novelty, and interestingness. Typical text-mining tasks include text categorization, text clustering, entity extraction, and sentiment analysis. Natural language processing and analytical methods are widely applied to turn unstructured text into data suitable for analysis.
Performance Evaluation of Query Processing Techniques in Information Retrieval (idescitation)
The first element of the search process is the query. Because the average user query is restricted to two or three keywords, it is often ambiguous to the search engine. Given the user query, the goal of an Information Retrieval (IR) system is to retrieve information that may be useful or relevant to the user's information need; query processing therefore plays an important role in an IR system. Query processing can be divided into four categories: query expansion, query optimization, query classification, and query parsing. This paper evaluates the performance of query processing algorithms in each category. The evaluation is based on the dataset specified by the Forum for Information Retrieval Evaluation [FIRE15], with precision and relative recall as the evaluation criteria, and the analysis considers the importance of each step in query processing. The experimental results show the significance of each step, as well as the relevance of web semantics and spelling correction in handling the user query.
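The two evaluation criteria can be computed as follows. This is a sketch; relative recall here is measured against the pool of relevant documents retrieved by all compared systems, a common convention when the complete relevant set is unknown for a web-scale collection:

```python
def precision(retrieved, relevant):
    # fraction of retrieved documents that are relevant
    return len(retrieved & relevant) / len(retrieved)

def relative_recall(retrieved, relevant, all_retrieved):
    # recall relative to the relevant documents found by *any* compared system
    pooled = relevant & set().union(*all_retrieved)
    return len(retrieved & relevant) / len(pooled)

relevant = {"d1", "d2", "d3", "d4"}
system_a = {"d1", "d2", "d5"}
system_b = {"d2", "d3"}

p_a = precision(system_a, relevant)                               # 2 of 3 retrieved are relevant
rr_a = relative_recall(system_a, relevant, [system_a, system_b])  # 2 of 3 pooled relevant found
```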
SEARCH OF INFORMATION BASED CONTENT IN SEMI-STRUCTURED DOCUMENTS USING INTERF... (ijcsitcejournal)
This paper proposes a semi-structured information retrieval model based on a new method for calculating similarity. We have developed the CASISS (Calculation of Similarity of Semi-Structured documents) method to quantify how similar two given texts are. The method identifies the elements of semi-structured documents using element descriptors: each document is pre-processed before a set of descriptors characterizing the content of each element is extracted. CASISS can be used to increase the accuracy of the information retrieval process by taking into account not only the presence of query terms in a given document but also the topology (position continuity) of those terms.
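The idea of combining term presence with position continuity can be illustrated with a toy scoring function. This is a hypothetical stand-in, not the actual CASISS descriptor-based formula, which the abstract does not give:

```python
def topo_score(query_terms, doc_tokens):
    # presence: one point per query term found in the document
    present = [t for t in query_terms if t in doc_tokens]
    score = len(present)
    # topology: bonus when consecutive query terms are also adjacent in the document
    pos = {t: doc_tokens.index(t) for t in present}
    for t1, t2 in zip(query_terms, query_terms[1:]):
        if t1 in pos and t2 in pos and pos[t2] - pos[t1] == 1:
            score += 1
    return score / (2 * len(query_terms) - 1)   # normalise to [0, 1]

q = ["information", "retrieval"]
s1 = topo_score(q, "semi structured information retrieval model".split())  # terms adjacent
s2 = topo_score(q, "retrieval of information".split())                     # present, not adjacent
```

Both documents contain both query terms, but only the first preserves their order and adjacency, so it scores strictly higher; that is the gain position continuity buys over pure term presence.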
A Topic map-based ontology IR system versus Clustering-based IR System: A Com... (tmra)
Due to the increasing amount and complexity of digital resources, several critical issues arise in digital environments, such as ill-structured and poorly managed digital information. Different information-organization approaches have been used to address these issues. In particular, the Semantic Web has been explored for ten years, yet there are still few practical applications. This is partly because much attention has been given to the creation of new data rather than the migration of existing data. In addition, the lack of guidelines for choosing the right migration approach, whether Topic Maps or the Resource Description Framework (RDF), needs to be addressed. This paper presents a comparison of the Semantic Web data models (Topic Maps and RDF), followed by an example of migrating existing metadata into ontology-based data for the Semantic Web.
An Enhanced Suffix Tree Approach to Measure Semantic Similarity between Multi... (iosrjce)
Multi Similarity Measure based Result Merging Strategies in Meta Search Engine (IDES Editor)
Result merging is the key component of a Meta Search Engine. Meta Search Engines provide a uniform query interface for Internet users to search for information: depending on the user's needs, they select relevant sources, map the user query onto the target search engines, and then merge the results. The effectiveness of a Meta Search Engine is closely related to the result-merging algorithm it employs. In this paper we propose a Meta Search Engine with two distinct steps: (1) searching through surface and deep search engines, and (2) ranking the results with the designed ranking algorithm. The user's query is first submitted to both the deep and surface search engines. The proposed method uses two distinct algorithms for ranking the search results: a concept-similarity-based method and a cosine-similarity-based method. Once the results from the various search engines are ranked, the proposed Meta Search Engine merges them into a single ranked list. Finally, experiments demonstrate the efficiency of the proposed visible- and invisible-web-based Meta Search Engine in merging the relevant pages; TSAP is used as the evaluation criterion for the algorithms.
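The merging step can be sketched minimally, assuming each engine returns raw relevance scores for its results. The min-max normalisation and the max-combination rule below are illustrative choices; the paper's concept-similarity ranking itself is not reproduced here:

```python
def normalize(scores):
    # min-max normalise one engine's scores to [0, 1] so engines are comparable
    lo, hi = min(scores.values()), max(scores.values())
    return {d: (s - lo) / (hi - lo) if hi > lo else 1.0 for d, s in scores.items()}

def merge(*engine_scores):
    merged = {}
    for scores in engine_scores:
        for doc, s in normalize(scores).items():
            merged[doc] = max(merged.get(doc, 0.0), s)   # keep the best normalised score
    return sorted(merged, key=merged.get, reverse=True)  # single ranked list

surface = {"p1": 8.2, "p2": 5.1, "p3": 2.0}   # hypothetical surface-engine scores
deep    = {"p2": 0.9, "p4": 0.7}              # hypothetical deep-engine scores
ranking = merge(surface, deep)
```

Note that "p2" benefits from appearing in both engines: its deep-engine score normalises to 1.0, lifting it to the top of the merged list alongside "p1".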
Text document clustering and similarity detection are a major part of document management, where every document should be identified by its key terms and domain knowledge, and documents are grouped into clusters based on their similarity. Several approaches to document similarity calculation have been proposed, but existing systems are either term-based or pattern-based, and both kinds suffer from several problems. The proposed system presents a model for document similarity based on a backpropagation-through-time (BPTT) algorithm. It discovers patterns in text documents as higher-level features and creates a network for fast grouping; it also selects the most appropriate patterns based on their weights, and BPTT performs the document similarity measurement, so documents can be categorized easily. This approach helps reduce problems in the training process. The BPTT framework has been implemented and evaluated on the .NET platform with different datasets.
Text preprocessing is a vital stage in text classification (TC) in particular and text mining in general. Preprocessing tools reduce multiple forms of a word to a single form, and preprocessing techniques have been given considerable significance and are widely studied in machine learning. The basic phase of text classification involves preprocessing features and extracting the relevant features against those in a database; preprocessing has a great impact on reducing the time and computational resources required. The effect of preprocessing tools on English text classification is an active area of research. This paper provides an evaluation study of several preprocessing tools for English text classification, covering raw text, tokenization, stop-word removal, and stemming. Two feature-extraction methods, chi-square and TF-IDF with a cosine-similarity score, are evaluated on the BBC English dataset. The experimental results show that text preprocessing affects the feature-extraction methods and enhances the performance of English text classification, especially for small threshold values.
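The TF-IDF-with-cosine-similarity pipeline used for feature extraction can be sketched in a few lines. This is a bare-bones version: real systems add smoothing and normalisation, plus the tokenization, stop-word, and stemming steps the paper evaluates:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    # docs: list of token lists; returns one sparse dict vector per document
    df = Counter(t for doc in docs for t in set(doc))   # document frequency
    n = len(docs)
    vectors = []
    for doc in docs:
        tf = Counter(doc)                               # raw term frequency
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = [["sport", "match"], ["sport", "team"], ["music", "band"]]
vecs = tfidf_vectors(docs)
# documents sharing the term "sport" score higher than unrelated documents
```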
A study and survey on various progressive duplicate detection mechanisms (eSAT Journals)
Abstract: Duplicate detection is one of the serious problems faced in applications such as personal-details management, customer-relationship management, and data mining. This survey covers duplicate-record detection techniques for both small and large datasets. To detect duplicates with less execution time and without disturbing dataset quality, methods such as Progressive Blocking and the Progressive Sorted Neighborhood Method (PSNM) are used: PSNM finds duplicates in a parallel fashion, while Progressive Blocking targets large datasets where finding duplicates would otherwise require immense time. These algorithms enhance a duplicate-detection system, and their efficiency can be double that of the conventional duplicate-detection method.
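The sorted-neighborhood idea behind PSNM can be sketched as follows: sort records by a key, then compare each record only with its neighbours, visiting the smallest neighbour distances first so that likely duplicates surface early. The sort key and the duplicate test here are placeholder assumptions, and PSNM's actual partitioning and parallelisation are omitted:

```python
def psnm(records, key, window=3):
    # sort once, then scan neighbour pairs at increasing distance
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    pairs = []
    for dist in range(1, window + 1):            # smallest distances first
        for i in range(len(order) - dist):
            a, b = records[order[i]], records[order[i + dist]]
            if similar(a, b):
                pairs.append((a, b))
    return pairs

def similar(a, b):
    # placeholder duplicate test (assumption): exact match ignoring case
    return a.lower() == b.lower()

people = ["Ann Smith", "ann smith", "Bob Jones", "B. Jones"]
dups = psnm(people, key=str.lower, window=2)
```

Because the records are sorted first, each record is compared with at most `window` neighbours instead of all other records, which is what makes the method scale to large datasets.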
Clustering the results of a search helps the user get an overview of the information returned. In this paper, we treat the clustering task as cataloguing the search results; by catalogue we mean a structured label list that helps the user understand the labels and search results. Cluster labelling is crucial, because meaningless or confusing labels may mislead users into checking the wrong clusters for their query and losing time; labels should also accurately reflect the contents of the documents within the cluster. To label clusters effectively, a new cluster-labelling method is introduced, with emphasis on producing comprehensible and accurate cluster labels in addition to discovering the document clusters. We also present a new metric for assessing the success of cluster labelling, and we adopt a comparative evaluation strategy to derive the relative performance of the proposed method with respect to two prominent search-result clustering methods: Suffix Tree Clustering and Lingo. We perform the experiments using the publicly available Ambient and ODP-239 datasets.
Elevating forensic investigation system for file clustering (eSAT Journals)
Abstract: In a computer forensic investigation, thousands of files are usually surveyed. Much of the data in those files consists of unstructured text, which is very difficult for computer examiners to analyse. Clustering is the unsupervised organization of patterns (data items, observations, or feature vectors) into groups (clusters), and finding a good way to automate this analysis is of great interest. In particular, algorithms such as K-means, K-medoids, Single Link, Complete Link, and Average Link can simplify the discovery of new and valuable information in the documents under investigation. This paper presents an approach that applies text-clustering algorithms, using multithreading, to the forensic examination of computers seized in police investigations. Keywords: clustering, forensic computing, text mining, multithreading.
Semantics-based clustering approach for similar research area detection (TELKOMNIKA Journal)
The manual process of finding individuals in an existing research field is cumbersome and time-consuming: prominent and rookie researchers alike tend to seek existing publications in a research field of interest before formulating a thesis. In the extant literature, automated similar-research-area detection systems have been developed to solve this problem; however, most use keyword-matching techniques, which do not sufficiently capture the implicit semantics of keywords and thereby leave out some research articles. In this study, we propose the use of ontology-based preprocessing, Latent Semantic Indexing, and K-Means clustering to develop a prototype similar-research-area detection system that can determine publications in similar research domains. Our proposed system addresses the high dimensionality and data sparsity faced by traditional document-clustering techniques. The system is evaluated with randomly selected publications from faculties in Nigerian universities, and the results show that integrating ontologies in preprocessing provides more accurate clustering results.
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION (IJDKP)
This article introduces approaches for improving text-categorization models by integrating previously imported ontologies. From the Reuters Corpus Volume I (RCV1) dataset, categories very similar in content and related to the telecommunications, Internet, and computer areas were selected for the model experiments. Several domain ontologies covering these areas were built and integrated into the categorization models to improve them.
ESTIMATION OF REGRESSION COEFFICIENTS USING GEOMETRIC MEAN OF SQUARED ERROR F... (ijaia)
Regression models and their statistical analyses are among the most important tools used by scientists and practitioners. The aim of a regression model is to fit parametric functions to data. The true regression function is unknown, and specific methods are created and used strictly for the problem at hand. For the pioneering work on procedures for fitting functions, we refer to the methods of least absolute deviations, least squared deviations, and minimax absolute deviations. Today's widely celebrated method of least squares for function fitting is credited to the published works of Legendre and Gauss. However, least-squares-based models may fail to provide optimal results in non-Gaussian situations, especially when the errors follow fat-tailed distributions. This paper explores an unorthodox method of estimating linear regression coefficients by minimizing the geometric mean of squared errors (GMSE). Although GMSE is used to compare models, it is rarely used to obtain the coefficients themselves: such a method is tedious to handle because minimization of the loss function yields a large number of roots, and this paper offers a way to tackle that problem. The application is illustrated with the 'Advertising' dataset from ISLR, and the results are compared with those of the method of least squares for a single-index linear regression model.
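The GMSE objective, and a brute-force way to minimise it, can be sketched as follows. The grid search is purely illustrative and is not the root-finding procedure the paper develops; the epsilon guard against log(0) is an added assumption needed because the geometric mean collapses when any residual is exactly zero:

```python
import math

def gmse(a, b, xs, ys, eps=1e-12):
    # geometric mean of squared errors for the line y = a + b*x,
    # computed via logs to avoid overflow/underflow of the product
    logs = [math.log((y - a - b * x) ** 2 + eps) for x, y in zip(xs, ys)]
    return math.exp(sum(logs) / len(logs))

def fit_gmse(xs, ys, steps=100):
    # crude grid search over (intercept, slope) in [-10, 10) with step 0.1;
    # a real implementation must handle the many local minima carefully
    return min(
        ((a / 10, b / 10) for a in range(-steps, steps) for b in range(-steps, steps)),
        key=lambda p: gmse(p[0], p[1], xs, ys),
    )

xs, ys = [1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]   # toy data, roughly y = 2x
a_hat, b_hat = fit_gmse(xs, ys)
```

Note how GMSE rewards lines passing very close to individual points (a near-zero residual drags the geometric mean down sharply), which is exactly the source of the many roots the paper discusses.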
USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT... (IJDKP)
Many applications of automatic document classification require learning accurately from little training data. Semi-supervised classification uses both labeled and unlabeled data for training; this technique has been shown to be effective in some cases, but the use of unlabeled data is not always beneficial. Meanwhile, the emergence of web technologies has given rise to the collaborative development of ontologies. In this paper, we propose using ontologies to improve the accuracy and efficiency of semi-supervised document classification. We use support vector machines, one of the most effective algorithms studied for text, and our algorithm enhances the performance of transductive support vector machines through the use of ontologies. We report experimental results applying our algorithm to three different datasets; our experiments show an accuracy increase of 4% on average, and up to 20%, in comparison with the traditional semi-supervised model.
Design and implementation of Parallel Prefix Adders using FPGAs (IOSR Journals)
Abstract: Adders are among the most frequently used components in VLSI designs. In digital design, half adders and full adders are the basic building blocks, from which ripple-carry adders (RCAs) can be implemented to perform addition of any width. However, the RCA is a serial adder and suffers from carry-propagation delay: as the number of half and full adders increases, the delay increases as well. This motivates parallel prefix adders, such as the Kogge-Stone (KS) adder, the sparse Kogge-Stone (SKS) adder, the spanning-tree adder, and the Brent-Kung adder. These adders were designed and implemented on a Spartan-3E FPGA kit, simulated with ModelSim 6.4b, and synthesized with Xilinx ISE 10.1.
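The prefix-carry idea that removes the ripple delay can be illustrated behaviourally in software: generate/propagate signals are combined in log2(width) stages instead of rippling the carry bit by bit. This is a bit-level Python sketch of the Kogge-Stone scheme, not the HDL implementation synthesised in the paper:

```python
def kogge_stone_add(a, b, width=8):
    ai = [(a >> i) & 1 for i in range(width)]
    bi = [(b >> i) & 1 for i in range(width)]
    p = [x ^ y for x, y in zip(ai, bi)]   # propagate: carry passes through bit i
    g = [x & y for x, y in zip(ai, bi)]   # generate: bit i creates a carry
    P, G = p[:], g[:]
    d = 1
    while d < width:                      # log2(width) prefix stages
        # iterate downward so each step reads previous-stage values
        for i in range(width - 1, d - 1, -1):
            G[i] = G[i] | (P[i] & G[i - d])
            P[i] = P[i] & P[i - d]
        d *= 2
    carry = [0] + G[:-1]                  # carry into each bit (carry-in = 0)
    s = [pi ^ ci for pi, ci in zip(p, carry)]
    return sum(bit << i for i, bit in enumerate(s)) + (G[-1] << width)
```

An 8-bit ripple-carry adder needs 8 sequential carry steps; the loop above needs only 3 stages (d = 1, 2, 4), which is the logarithmic depth that makes prefix adders fast in hardware.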
Crash Analysis of Front under Run Protection Device using Finite Element Anal... (IOSR Journals)
Under-running by passenger vehicles is one of the important parameters to be considered during the design and development of a truck chassis. A Front Under-run Protection Device (FUPD) plays an important role in preventing vehicles from under-running the front of a truck. The explicit finite element software Altair RADIOSS is used in the FUPD analysis for impact loading. The deformation of the FUPD bar and the plastic strains in the FUPD components are determined in the impact analysis to predict failure of the system against the compliance requirements of IS 14812-2005. Additionally, failure of the FUPD attachment points to the chassis is analysed. Physical testing can be reduced significantly with this approach, which ultimately reduces both the total cycle time and the cost involved in product development.
International Journal of Computational Engineering Research (IJCER) is dedicated to protecting personal information and will make every reasonable effort to handle collected information appropriately. All information collected, as well as related requests, will be handled as carefully and efficiently as possible in accordance with IJCER standards for integrity and objectivity.
An Enhanced Suffix Tree Approach to Measure Semantic Similarity between Multi...iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Multi Similarity Measure based Result Merging Strategies in Meta Search EngineIDES Editor
In Meta Search Engine result merging is the key
component. Meta Search Engines provide a uniform query
interface for Internet users to search for information.
Depending on users’ needs, they select relevant sources and
map user queries into the target search engines, subsequently
merging the results. The effectiveness of a Meta Search
Engine is closely related to the result merging algorithm it
employs. In this paper, we have proposed a Meta Search
Engine, which has two distinct steps (1) searching through
surface and deep search engine, and (2) Ranking the results
through the designed ranking algorithm. Initially, the query
given by the user is inputted to the deep and surface search
engine. The proposed method used two distinct algorithms
for ranking the search results, concept similarity based
method and cosine similarity based method. Once the results
from various search engines are ranked, the proposed Meta
Search Engine merges them into a single ranked list. Finally,
the experimentation will be done to prove the efficiency of
the proposed visible and invisible web-based Meta Search
Engine in merging the relevant pages. TSAP is used as the
evaluation criteria and the algorithms are evaluated based on
these criteria.
Text document clustering and similarity detection is the major part of document management, where every document should be identified by its key terms and domain knowledge. Based on the similarity, the documents are grouped into clusters. For document similarity calculation there are several approaches were proposed in the existing system. But the existing system is either term based or pattern based. And those systems suffered from several problems. To make a revolution in this challenging environment, the proposed system presents an innovative model for document similarity by applying back propagation time stamp algorithm. It discovers patterns in text documents as higher level features and creates a network for fast grouping. It also detects the most appropriate patterns based on its weight and BPTT performs the document similarity measures. Using this approach, the document can be categorized easily. In order to perform the above, a new approach is used. This helps to reduce the training process problems. The above framework is named as BPTT. The BPTT has implemented and evaluated using dot net platform with different set of datasets.
Text preprocessing is a vital stage in text classification (TC) particularly and text mining generally. Text preprocessing tools is to reduce multiple forms of the word to one form. In addition, text preprocessing techniques are provided a lot of significance and widely studied in machine learning. The basic phase in text classification involves preprocessing features, extracting relevant features against the features in a database. However, they have a great impact on reducing the time requirement and speed resources needed. The effect of the preprocessing tools on English text classification is an area of research. This paper provides an evaluation study of several preprocessing tools for English text classification. The study includes using the raw text, the tokenization, the stop words, and the stemmed. Two different methods chi-square and TF-IDF with cosine similarity score for feature extraction are used based on BBC English dataset. The Experimental results show that the text preprocessing effect on the feature extraction methods that enhances the performance of English text classification especially for small threshold values.
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
The papers for publication in The International Journal of Engineering& Science are selected through rigorous peer reviews to ensure originality, timeliness, relevance, and readability.
Theoretical work submitted to the Journal should be original in its motivation or modeling structure. Empirical analysis should be based on a theoretical framework and should be capable of replication. It is expected that all materials required for replication (including computer programs and data sets) should be available upon request to the authors.
A study and survey on various progressive duplicate detection mechanisms (eSAT Journals)
Abstract: Duplicate detection is one of the serious problems faced in several applications, including personal data management, customer relationship management, and data mining. This survey covers duplicate record detection techniques for both small and large datasets. To detect duplicates in less execution time without degrading dataset quality, methods such as Progressive Blocking and the Progressive Sorted Neighborhood Method (PSNM) are used. PSNM detects duplicates in a parallel, progressive fashion, while the Progressive Blocking algorithm targets large datasets where finding duplicates would otherwise require immense time. These algorithms enhance the duplicate detection system, and their efficiency can be doubled over conventional duplicate detection methods.
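The classic (non-progressive) Sorted Neighborhood Method underlying PSNM can be sketched briefly: sort the records by a blocking key, then compare only records that fall within a sliding window. PSNM extends this by scheduling the window comparisons so the most promising pairs are inspected first; the key function and window size below are illustrative assumptions:

```python
def sorted_neighborhood(records, key, window=3):
    """Sorted Neighborhood Method: sort records by a blocking key and
    return candidate duplicate pairs (as indices into the original
    list) that fall within a sliding window over the sorted order."""
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    pairs = []
    for i in range(len(order)):
        # compare each record with the next (window - 1) records in sort order
        for j in range(i + 1, min(i + window, len(order))):
            pairs.append((order[i], order[j]))
    return pairs
```

Only O(n * window) comparisons are made instead of O(n^2); true duplicates are found as long as the blocking key sorts them near each other.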
Clustering the results of a search helps the user get an overview of the information returned. In this paper, we treat the clustering task as cataloguing the search results. By catalogue we mean a structured label list that helps the user make sense of the labels and search results. Cluster labelling is crucial because meaningless or confusing labels may mislead users into checking the wrong clusters for the query and losing extra time. Additionally, labels should accurately reflect the contents of the documents within the cluster. To label clusters effectively, a new cluster labelling method is introduced, with particular emphasis on producing comprehensible and accurate cluster labels in addition to discovering document clusters. We also present a new metric for assessing the success of cluster labelling. We adopt a comparative evaluation strategy to derive the relative performance of the proposed method with respect to two prominent search result clustering methods, Suffix Tree Clustering and Lingo, and we perform the experiments using the publicly available Ambient and ODP-239 datasets.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together scientists, academicians, field engineers, scholars and students of related fields of Engineering and Technology.
Elevating forensic investigation system for file clustering (eSAT Journals)
Abstract: In computer forensic investigation, thousands of files are usually examined. Much of the data in those files consists of unstructured text, which is very hard for computer examiners to analyze. Clustering is the unsupervised organization of patterns (data items, observations, or feature vectors) into groups (clusters), and finding a good solution for this automated method of analysis is of great interest. In particular, algorithms such as K-means, K-medoids, Single Link, Complete Link and Average Link can simplify the discovery of new and valuable information in the documents under investigation. This paper presents an approach that applies text clustering algorithms, using a multithreading technique, to the forensic examination of computers seized in police investigations. Keywords: clustering, forensic computing, text mining, multithreading.
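Of the algorithms listed, K-means is the simplest to sketch. Below is a minimal Lloyd's-iteration version over dense feature vectors; in a forensic pipeline the documents would first be converted to numeric vectors (e.g. TF-IDF), and the parameters here are our own toy choices, not the paper's configuration:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means over dense vectors (lists of floats).
    Returns the cluster index assigned to each point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest centroid by squared Euclidean distance
        for i, p in enumerate(points):
            assign[i] = min(range(k),
                            key=lambda c: sum((a - b) ** 2
                                              for a, b in zip(p, centroids[c])))
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign
```

On well-separated data the assignment stabilizes within a few iterations; the result depends on the random initialization, which is why forensic tools often run several restarts.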
Semantics-based clustering approach for similar research area detection (TELKOMNIKA Journal)
The manual process of searching out individuals in an already existing
research field is cumbersome and time-consuming. Prominent and rookie
researchers alike are predisposed to seek existing research publications in
a research field of interest before coming up with a thesis. From
extant literature, automated similar research area detection systems have
been developed to solve this problem. However, most of them use
keyword-matching techniques, which do not sufficiently capture the implicit
semantics of keywords thereby leaving out some research articles. In this
study, we propose the use of ontology-based pre-processing, Latent Semantic
Indexing and K-Means Clustering to develop a prototype similar research area
detection system that can be used to determine similar research domain
publications. Our proposed system solves the challenge of high dimensionality
and data sparsity faced by the traditional document clustering technique. Our
system is evaluated with randomly selected publications from faculties
in Nigerian universities and results show that the integration of ontologies
in preprocessing provides more accurate clustering results.
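The Latent Semantic Indexing step the abstract relies on can be sketched compactly: take the SVD of the term-document matrix and keep only the top k latent dimensions, which is what mitigates the high dimensionality and sparsity mentioned above. The matrix layout and k below are illustrative; the ontology-based preprocessing and the full pipeline are not reproduced:

```python
import numpy as np

def lsi_embed(term_doc, k):
    """Latent Semantic Indexing: project the documents (columns of the
    term-document matrix) into a k-dimensional latent topic space.
    Returns one k-dimensional row per document."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k]).T
```

Documents that use related (even if not identical) terms end up close in the latent space, so a clustering algorithm such as K-means can then group them by topic.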
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION (IJDKP)
This article introduces some approaches for improving text categorization models by integrating previously imported ontologies. From the Reuters Corpus Volume I (RCV1) dataset, categories very similar in content and related to the telecommunications, Internet and computer areas were selected for the model experiments. Several domain ontologies covering these areas were built and integrated into the categorization models to improve them.
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews within the whole field of Engineering, Science and Technology, new teaching methods, assessment, validation and the impact of new technologies, and it will continue to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected through double peer review to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
ESTIMATION OF REGRESSION COEFFICIENTS USING GEOMETRIC MEAN OF SQUARED ERROR F... (ijaia)
Regression models and their statistical analyses are among the most important tools used by scientists and practitioners. The aim of a regression model is to fit parametric functions to data. The true regression is unknown, and specific methods are created and used strictly pertaining to the problem at hand. For the pioneering work on procedures for fitting functions, we refer to the methods of least absolute deviations, least squares deviations and minimax absolute deviations. Today's widely celebrated procedure of the method of least squares for function fitting is credited to the published works of Legendre and Gauss. However, least-squares-based models may fail in practice to provide optimal results in non-Gaussian situations, especially when the errors follow distributions with fat tails. In this paper an unorthodox method of estimating linear regression coefficients by minimising the GMSE (geometric mean of squared errors) is explored. Though the GMSE is used to compare models, it is rarely used to obtain the coefficients; such a method is tedious to handle due to the large number of roots obtained by minimisation of the loss function, and this paper offers a way to tackle that problem. The application is illustrated with the 'Advertising' dataset from ISLR, and the obtained results are compared with those of the method of least squares for a single-index linear regression model.
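To make the loss concrete for a single-index model y ≈ a + b·x, the GMSE objective and a deliberately naive coordinate-descent search can be sketched as below. The 1e-12 floor on squared residuals and the search scheme are our own illustrative choices, not the paper's procedure; note how the objective rewards driving any single residual toward zero, which is exactly the many-roots difficulty the abstract describes:

```python
import math

def gmse(params, xs, ys):
    """Geometric mean of squared errors for the model y ~ a + b*x,
    computed via logs to avoid under/overflow; a tiny floor keeps
    log() defined at exact fits."""
    a, b = params
    log_sum = sum(math.log((y - (a + b * x)) ** 2 + 1e-12)
                  for x, y in zip(xs, ys))
    return math.exp(log_sum / len(xs))

def fit_gmse(xs, ys, steps=200):
    """Naive coordinate-descent search for (a, b) minimising GMSE,
    shrinking the step size whenever no move improves the loss."""
    a, b, step = 0.0, 0.0, 1.0
    best = gmse((a, b), xs, ys)
    for _ in range(steps):
        improved = False
        for da, db in ((step, 0), (-step, 0), (0, step), (0, -step)):
            cand = gmse((a + da, b + db), xs, ys)
            if cand < best:
                a, b, best = a + da, b + db, cand
                improved = True
        if not improved:
            step /= 2
    return a, b
```

Because a line through any two data points makes two residuals vanish, the GMSE surface has many deep local minima; the search above only demonstrates descent, not the paper's root-selection strategy.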
USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT... (IJDKP)
Many applications of automatic document classification require learning accurately with little training
data. The semi-supervised classification technique uses labeled and unlabeled data for training. This
technique has shown to be effective in some cases; however, the use of unlabeled data is not always
beneficial.
On the other hand, the emergence of web technologies has originated the collaborative development of
ontologies. In this paper, we propose the use of ontologies in order to improve the accuracy and efficiency
of the semi-supervised document classification.
We used support vector machines, one of the most effective algorithms studied for text. Our algorithm enhances the performance of transductive support vector machines through the use of ontologies. We report experimental results applying our algorithm to three different datasets. Our experiments show an accuracy improvement of 4% on average, and up to 20%, in comparison with the traditional semi-supervised model.
Design and implementation of Parallel Prefix Adders using FPGAs (IOSR Journals)
Abstract: Adders are among the most frequently used components in VLSI designs. In digital design we have the half adder and full adder as the basic adders; using these we can implement the ripple carry adder (RCA), which can perform addition of any width. However, the RCA is a serial adder and suffers from carry-propagation delay: as the number of half and full adders increases, the delay also increases. That is why we turn to parallel prefix adders, such as the Kogge-Stone (KS) adder, the sparse Kogge-Stone (SKS) adder, the spanning tree adder and the Brent-Kung adder. These adders are designed and implemented on an FPGA Spartan-3E kit, and simulated and synthesized with ModelSim 6.4b and Xilinx ISE 10.1.
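The Kogge-Stone idea can be illustrated in software: generate and propagate bits are combined in a prefix tree, so all carries are ready after log2(width) combine steps instead of rippling bit by bit. This is a behavioural sketch of the arithmetic, not the HDL the paper synthesizes:

```python
def kogge_stone_add(a, b, width=8):
    """Simulate a Kogge-Stone parallel-prefix adder on width-bit ints.
    g holds generate bits, p holds propagate bits; the loop runs the
    log2(width) prefix-combine levels of the Kogge-Stone tree."""
    mask = (1 << width) - 1
    g = a & b          # generate: both operand bits are 1
    p = a ^ b          # propagate: exactly one operand bit is 1
    d = 1
    while d < width:
        g = g | (p & (g << d)) & mask   # carry generated within the span
        p = (p & (p << d)) & mask       # span propagates only if both halves do
        d <<= 1
    carries = (g << 1) & mask           # carry into bit i is g of bit i-1
    return ((a ^ b) ^ carries) & mask   # final sum bits
```

A hardware RCA would need `width` sequential carry stages; here the loop body runs only log2(width) times, which is the delay advantage the abstract refers to.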
Crash Analysis of Front under Run Protection Device using Finite Element Anal... (IOSR Journals)
Under-running of passenger vehicles is one of the important parameters to be considered during
design and development of truck chassis. Front Under-run Protection Device (FUPD) plays an important role
in avoiding under-running of vehicles from the front side of a truck. The explicit finite element software Altair RADIOSS is used for the FUPD impact analysis. The deformation of the FUPD bar and plastic strains in
FUPD components are determined in the impact analysis for predicting failure of the system to meet the
compliance requirements as per IS 14812-2005. Additionally, failure analysis of the FUPD attachment points
with chassis is determined. Physical testing can be reduced significantly with this approach which ultimately
reduces the total cycle time as well as the cost involved in product development.
IOSR Journal of Humanities and Social Science is an International Journal edited by International Organization of Scientific Research (IOSR). The Journal provides a common forum where all aspects of humanities and social sciences are presented. IOSR-JHSS publishes original papers, review papers, conceptual framework, analytical and simulation models, case studies, empirical research, technical notes etc.
Analysis of Peak to Average Power Ratio Reduction Techniques in SFBC OFDM System (IOSR Journals)
Abstract: Orthogonal Frequency Division Multiplexing (OFDM) has become the most popular modulation technique for high-speed data transmission, but its great disadvantage is a high Peak to Average Power Ratio (PAPR). In this paper, the Selected Mapping (SLM) technique and Clipping and Differential Scaling are applied to Space Frequency Block Coded (SFBC) OFDM systems to reduce the PAPR with the Alamouti coding scheme. In the SLM technique, different representations of the OFDM symbols are generated by rotating the original OFDM frame by different phase sequences, and the signal with the minimum PAPR is selected and transmitted. To compensate for the effect of the phase rotation at the receiver, it is necessary to transmit the index of the selected phase sequence as side information (SI); additionally, a suboptimum detection method that does not need SI is introduced at the receiver side. In the Clipping and Differential Scaling technique, the amplitude of the complex OFDM signal is clipped and then scaled so that the PAPR is reduced without causing much degradation in bit error rate (BER). The threshold values for clipping and scaling are determined using Monte Carlo simulations. Simulation results show that both the SLM method and the Clipping and Scaling method effectively reduce the PAPR. Keywords: Orthogonal frequency division multiplexing (OFDM), peak to average power ratio (PAPR), selected mapping (SLM), space frequency block coded (SFBC), side information (SI), high power amplifiers (HPA), complementary cumulative density function (CCDF), inter-symbol interference (ISI).
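The SLM selection loop described above is simple to sketch with an IFFT: rotate the frequency-domain symbols by candidate phase sequences and keep the candidate with the lowest PAPR. The candidate count, random phases, and the identity first candidate are our own illustrative choices (the paper's phase sequences and SFBC coding are not reproduced):

```python
import numpy as np

def papr_db(x):
    """Peak-to-average power ratio of a complex baseband signal, in dB."""
    power = np.abs(x) ** 2
    return 10 * np.log10(power.max() / power.mean())

def slm(symbols, n_candidates=8, seed=0):
    """Selected Mapping: rotate frequency-domain symbols by candidate
    phase sequences, IFFT each, and keep the lowest-PAPR signal.
    Candidate 0 uses identity phases, so the result is never worse
    than the unmodified OFDM signal.  Returns (signal, chosen index);
    the index is the side information (SI) the receiver needs."""
    rng = np.random.default_rng(seed)
    best_x, best_idx, best_p = None, -1, np.inf
    for i in range(n_candidates):
        if i == 0:
            phases = np.ones(len(symbols))
        else:
            phases = np.exp(1j * rng.uniform(0, 2 * np.pi, len(symbols)))
        x = np.fft.ifft(symbols * phases)
        p = papr_db(x)
        if p < best_p:
            best_x, best_idx, best_p = x, i, p
    return best_x, best_idx
```

Since the phase rotations do not change the per-subcarrier power, the data rate is unaffected; only the SI index must reach the receiver.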
Performance optimization of thermal systems in textile industries (IOSR Journals)
Energy consumption has been growing very rapidly due to population growth, rapid urbanization
and industrial development. For future planning, it is important to know the current specific energy
consumption and the energy intensity in order to estimate future energy consumption. In the industrial sector, especially in textile units, a large amount of energy is wasted due to improper operation and a lack of energy-saving measures. In this study, the various factors that cause the main energy losses are investigated. In the textile dyeing sector, the main processes involve steam in various proportions. For steam production, boilers of different types are used for different applications. Boiler efficiency varies from one industry to another based on parameters such as water quality, fuel used, blow-down quantity, etc. Steam traps also influence boiler efficiency: in the case of a steam leak, system performance decreases as fuel consumption increases to produce more steam. Thus, by adopting various recovery methods we can reduce energy losses and fuel consumption.
Semantic Search of E-Learning Documents Using Ontology Based System (ijcnes)
The keyword searching mechanism is traditionally used for information retrieval from Web-based systems. However, this mechanism fails to meet the requirements of Web searching over expert knowledge bases built on popular semantic systems. Semantic search of E-learning documents based on ontology is increasingly adopted in information retrieval systems. An ontology-based system simplifies the task of finding correct information on the Web by building a search system based on the meaning of a keyword instead of the keyword itself. The major function of the ontology-based system is the development of a specification of conceptualization, which enhances the connection between the information present in Web pages and the background knowledge. The semantic gap existing between the keywords found in documents and those in a query can be bridged using an ontology-based system. This paper provides a detailed account of the semantic search of E-learning documents using ontology-based systems by comparing various ontology systems. Based on this comparison, the survey attempts to identify possible directions for future research.
Annotation Approach for Document with Recommendation (ijmpict)
An enormous number of organizations generate and share textual descriptions of their products, facilities, and activities. Such collections of textual data contain a significant amount of structured information, which remains buried in the unstructured text. While information extraction systems facilitate the extraction of structured relations, they are frequently expensive and inaccurate, particularly when working on top of text that does not contain any instances of the targeted structured data. We propose an alternative methodology that facilitates structured metadata generation by recognizing documents that are likely to contain information of interest; this data is then useful for querying the database. Moreover, we present algorithms to extract attribute-value pairs, and devise new mechanisms to map such pairs to manually created schemes. We apply a clustering technique to the item content information to complement the user rating information, which improves the accuracy of collaborative similarity and addresses the cold-start problem.
Vertical intent prediction approach based on Doc2vec and convolutional neural... (IJECEIAES)
Vertical selection is the task of selecting the most relevant verticals for a given query in order to improve the diversity and quality of web search results. This task requires not only predicting relevant verticals but also ensuring these verticals are the ones the user expects to be relevant for his particular information need. Most existing works have focused on using traditional machine learning techniques to combine multiple types of features for selecting several relevant verticals. Although these techniques are very efficient, handling vertical selection with high accuracy is still a challenging research task. In this paper, we propose an approach for improving vertical selection in order to satisfy the user's vertical intent and reduce the user's browsing time and effort. First, it generates query embedding vectors using the doc2vec algorithm, which preserves syntactic and semantic information within each query. Second, these vectors are used as input to a convolutional neural network model to enrich the representation of the query with multiple levels of abstraction, including rich semantic information, and to create a global summarization of the query features. We demonstrate the effectiveness of our approach through comprehensive experimentation using various datasets. Our experimental findings show that our system achieves high accuracy and makes accurate predictions on new unseen data.
Computing semantic similarity measure between words using web search engine (csandit)
Semantic similarity measures between words play an important role in information retrieval, natural language processing and various tasks on the web. In this paper, we propose a Modified Pattern Extraction Algorithm to compute a supervised semantic similarity measure between words by combining both the page count method and the web snippets method. Four association measures are used to find the semantic similarity between words in the page count method using web search engines. We use Sequential Minimal Optimization (SMO) support vector machines (SVM) to find the optimal combination of page-count-based similarity scores and top-ranking patterns from the web snippets method. The SVM is trained to classify synonymous word-pairs and non-synonymous word-pairs. The proposed Modified Pattern Extraction Algorithm achieves a correlation value of 89.8 percent, outperforming existing methods.
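The abstract does not name its four page-count association measures, but the web variants commonly used for this task (WebJaccard, WebOverlap, WebDice, WebPMI) can be sketched as follows; here p, q and pq are the page counts for queries P, Q and "P AND Q", and the low-count threshold guards against search-engine noise (the threshold and N values are illustrative assumptions):

```python
import math

def web_jaccard(p, q, pq, threshold=5):
    """Jaccard association from page counts H(P), H(Q), H(P AND Q)."""
    return 0.0 if pq <= threshold else pq / (p + q - pq)

def web_overlap(p, q, pq, threshold=5):
    """Overlap (Simpson) association from page counts."""
    return 0.0 if pq <= threshold else pq / min(p, q)

def web_dice(p, q, pq, threshold=5):
    """Dice association from page counts."""
    return 0.0 if pq <= threshold else 2 * pq / (p + q)

def web_pmi(p, q, pq, n=1e10, threshold=5):
    """Pointwise mutual information over page counts; n is an assumed
    total number of indexed pages."""
    if pq <= threshold:
        return 0.0
    return math.log2((pq / n) / ((p / n) * (q / n)))
```

In the supervised setup described above, these four scores become features that the SVM combines with the snippet-pattern features.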
Object surface segmentation, Image segmentation, Region growing, X-Y-Z image,... (cscpconf)
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGY (cscpconf)
A digital library is a type of information retrieval (IR) system. Existing information retrieval methodologies generally have problems with keyword searching. We propose a model to solve this problem using a concept-based approach (ontology) and a metadata case base. The model consists of identifying domain concepts in the user's query and applying expansion to them. The system aims to improve the relevance of results retrieved from digital libraries by proposing a conceptual query expansion for intelligent concept-based retrieval. We import the concept of ontology, making use of its abundant semantics and standard concepts. A domain-specific ontology can lift information retrieval from the traditional keyword level to the knowledge (concept) level, changing the retrieval process from traditional keyword matching to semantic matching. One approach is query expansion using a domain ontology; the other introduces a case-based similarity measure for metadata information retrieval using the Case-Based Reasoning (CBR) approach. Results show improvements over the classic method, query expansion using a general-purpose ontology, and a number of other approaches.
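The conceptual query expansion step can be sketched very simply if the ontology is modelled as a mapping from terms to related concepts (synonyms, subclasses); a real system would query an OWL/RDF ontology, so this flat-dict representation is an illustrative assumption:

```python
def expand_query(terms, ontology):
    """Conceptual query expansion: augment each query term with its
    related concepts from a domain ontology, modelled here as a simple
    term -> related-terms mapping.  Preserves order, skips duplicates."""
    expanded = list(terms)
    for t in terms:
        for concept in ontology.get(t, ()):
            if concept not in expanded:
                expanded.append(concept)
    return expanded
```

The expanded term list is then submitted to the retrieval engine, so documents phrased with related concepts (rather than the user's exact keywords) can still match.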
A Novel Multi-Viewpoint based Similarity Measure for Document Clustering (IJMER)
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
International Journal of Modern Engineering Research (IJMER) covers all the fields of engineering and science: Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Computer Engineering, Agricultural Engineering, Aerospace Engineering, Thermodynamics, Structural Engineering, Control Engineering, Robotics, Mechatronics, Fluid Mechanics, Nanotechnology, Simulators, Web-based Learning, Remote Laboratories, Engineering Design Methods, Education Research, Students' Satisfaction and Motivation, Global Projects, and Assessment…. And many more.
Application of hidden markov model in question answering systems (ijcsa)
With the increase in the volume of information stored on the web, Question Answering (QA) systems have become very important for Information Retrieval (IR). QA systems are a specialized form of information retrieval: given a collection of documents, a QA system attempts to retrieve correct answers to questions posed in natural language. A web QA system is a QA system in which the answers are retrieved from the web environment. In contrast to databases, the information stored on the web does not follow a distinct structure and is generally undefined. Web QA is the task of automatically answering questions posed in natural language using Natural Language Processing (NLP); NLP techniques are used in applications that query databases, extract information from text, retrieve relevant documents from a collection, translate from one language to another, generate text responses, or recognize spoken words by converting them into text. To find the needed information in a mass of unstructured information, we have to use techniques in which the accuracy and retrieval factors are implemented well. In this paper, in order to achieve good IR in the web environment, a QA system is designed and implemented based on the Hidden Markov Model (HMM).
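The abstract does not give its HMM formulation, but the core decoding step any HMM-based system relies on is the Viterbi algorithm, which recovers the most likely hidden state sequence for an observation sequence. A self-contained sketch (the dict-based model layout is our own):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Viterbi decoding: most likely hidden state sequence for obs.
    Each trellis layer maps state -> (best probability, predecessor)."""
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for o in obs[1:]:
        prev = V[-1]
        V.append({s: max(((prev[r][0] * trans_p[r][s] * emit_p[s][o], r)
                          for r in states), key=lambda t: t[0])
                  for s in states})
    # backtrack from the best final state
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for layer in reversed(V[1:]):
        path.append(layer[path[-1]][1])
    return list(reversed(path))
```

In a QA setting the hidden states might correspond to answer-bearing versus non-answer-bearing segments; the classic toy model in the test below illustrates the mechanics only.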
Cosmetic shop management system project report.pdf
Correlation Coefficient Based Average Textual Similarity Model for Information Retrieval System in Wide Area Networks
IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 17, Issue 1, Ver. VI (Jan – Feb. 2015), PP 19-25
www.iosrjournals.org
DOI: 10.9790/0661-17161925
Jaswinder Singh¹, Parvinder Singh², Yogesh Chaba³
¹ ³ Department of Computer Science & Engineering, Guru Jambheshwar University of Science & Technology, Hisar, Haryana, India
² Department of Computer Science & Engineering, Deenbandhu Chhotu Ram University of Science & Technology, Murthal, Sonepat, Haryana, India
Abstract: In wide area networks, retrieving relevant text is a challenging task for information retrieval because most information requests are text based. The focus of this paper is on the similarity measurement, performance evaluation and design of information retrieval techniques using four similarity functions, i.e. Jaccard, Cosine, Dice and Overlap. The performance evaluation of these similarity functions has been done for the similarity between the documents retrieved by the search engine for the entered text, using the vector space model. The correlation coefficient was applied to evaluate the performance of the similarity functions. All possible combinations of the similarity functions have been explored, and a textual similarity model has been proposed for the information retrieval system in wide area networks.
Keywords: Information Retrieval System, Similarity Functions, Proposed Model of Textual Similarity, Wide Area Networks.
I. Introduction
The large amount of information available from wide area networks is in the form of text, images, videos and songs, i.e. there is a variety of data available in the web world [1], [2], [3], [4]. As the major content available from the web is in the form of text, retrieving the relevant text is still a challenge for any information retrieval system in wide area networks. The user usually types his or her query as text in the search box of an information retrieval system, which in most cases is a search engine. In some cases the search results for the entered keyword might not display the required documents, which might be due to the user's search method or a lack of knowledge of how to use the keyword. The goal of this paper is to design information retrieval techniques using four similarity functions, i.e. the Jaccard, Cosine, Dice and Overlap similarity functions, for enhancing the textual similarity between the documents retrieved for a query entered as text in the chosen search system. This paper is organized as follows.
The first section gives a brief introduction to the heterogeneity of the data, and the second section introduces the information retrieval system and the information retrieval techniques used in wide area networks. The third section covers the similarity functions and the related work. The fourth section describes the steps of the experimentation. The fifth section describes the results obtained from the experiment. The sixth section presents the proposed model of textual similarity, in which three approaches are proposed for the similarity scores of the documents retrieved for the entered query; the model is represented as a triangle whose three vertices represent the results obtained from the three proposed approaches of the information retrieval techniques using the four similarity functions, i.e. Jaccard, Cosine, Dice and Overlap. The seventh section concludes the results obtained from the three proposed approaches.
II. Information Retrieval System and Information Retrieval Techniques in Wide Area Networks
There is a vast amount of information available in the form of text in the web world. To retrieve relevant information from the web, an information retrieval system is used which delivers relevant information to the user. Any information retrieval system contains three main components, i.e. the query subsystem, the matching mechanism and the document database [1], [5]. Fig. 1 shows the block diagram of a typical information retrieval system. The matching mechanism retrieves those documents that are judged to be relevant through the use of similarity functions or similarity measures. Similarity functions (also called similarity coefficients or similarity measures) are defined as functions which measure the degree of similarity between the query entered by the user and the documents retrieved using the search system [1]. The technique for comparing the query and the document is called the retrieval technique, and Nicholas J. Belkin et al. [6] described two types of information retrieval techniques, i.e. exact match techniques and partial match techniques. Partial match techniques have the advantage over exact match techniques that the retrieved documents also include those documents that do not exactly match the query. The next level of classification of retrieval techniques distinguishes techniques that compare the query with an individual document representation from techniques based on the representation of a network of documents. Individual-representation-based techniques were further classified by Nicholas J. Belkin et al. [6] as structure-based and feature-based techniques. In the feature-based techniques, queries and documents are represented as sets of features such as terms. This category includes techniques based upon formal models, which include the vector space model, the probabilistic model and others.
Figure 1: Block diagram of a typical information retrieval system
III. Similarity functions and related work
In information retrieval, similarity functions are used to measure the similarity between a user query and documents. Retrieving documents in response to a user query is the most common text retrieval task; for this reason, most text similarity functions have been developed to take a query as input and retrieve the matching documents. Various similarity functions have been developed, but how they are best applied in information retrieval, and how similarity values or rankings should be interpreted, has not yet been answered. It is therefore difficult to decide which similarity function should be used for a particular application, as a wide range of similarity functions has been developed for use in different fields such as information retrieval [7], image retrieval [8], genetics and molecular biology [9] and chemistry [10]. Several similarity functions were surveyed by McGill et al. [11]. Sung-Hyuk Cha [12] classified similarity measures for comparing the nominal type of histograms. The vector space model was used by William P. Jones et al. [7] for the geometric representation of similarity measures, i.e. Inner Product, Cosine, Dice and Overlap. String-based, corpus-based and knowledge-based are the three categories of textual similarity functions described by Wael H. Gomaa et al. [13]. It was further described that the character-based approach and the term-based approach are the two sub-categories of the string-based approach. The term-based approach includes the Jaccard, Cosine, Dice and Overlap similarity functions. Suphakit Niwattanakul et al. [14] concluded that the Jaccard similarity coefficient is sufficiently suitable to be employed in word similarity measurement. Wael Musa Hadi et al. [15] concluded that the Cosine similarity measure outperforms the Jaccard and Dice similarity functions using the vector space model.
From this literature survey it was found that there is a wide range of similarity functions and that various authors have used them differently in different domains. Our work differs in that we have explored all the combinations of the four similarity functions, i.e. Jaccard, Cosine, Dice and Overlap, and proposed a model for the design of information retrieval techniques using similarity functions in wide area networks under the vector space model.
IV. Experimentation
In the experiment, the Google search engine was used as the search tool to retrieve web pages for the entered keyword, and ten queries were considered for the similarity measurement using the four similarity functions, i.e. Jaccard, Cosine, Dice and Overlap. For the performance evaluation and design of information retrieval techniques with the said similarity functions using the vector space model in wide area networks, binary weights were used for the representation of the query and documents, which means that the weight of a term is '1' if the term occurs in the document and '0' if it does not occur in the document. The similarity was then measured with each of the four similarity functions.
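Under binary weighting, the four similarity functions reduce to set operations on the term sets of the query and documents. A minimal sketch in Python (the term sets below are hypothetical examples, not the paper's retrieved documents):

```python
import math

def jaccard(a: set, b: set) -> float:
    # |A ∩ B| / |A ∪ B|
    return len(a & b) / len(a | b)

def cosine(a: set, b: set) -> float:
    # With binary weights, cosine reduces to |A ∩ B| / sqrt(|A| * |B|)
    return len(a & b) / math.sqrt(len(a) * len(b))

def dice(a: set, b: set) -> float:
    # 2 |A ∩ B| / (|A| + |B|)
    return 2 * len(a & b) / (len(a) + len(b))

def overlap(a: set, b: set) -> float:
    # |A ∩ B| / min(|A|, |B|)
    return len(a & b) / min(len(a), len(b))

# Hypothetical term sets of two retrieved documents
d1 = {"terrorist", "attack", "mumbai", "india"}
d2 = {"attack", "mumbai", "news"}
scores = {f.__name__: f(d1, d2) for f in (jaccard, cosine, dice, overlap)}
```

For term sets of unequal size, Overlap ≥ Cosine, Dice ≥ Jaccard, consistent with the ordering of the averages observed in Table 1.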
The experiment was divided into the following steps.
Step 1: Similarity measurement using the similarity functions.
Step 2: Analysis of the similarity functions based upon the similarity scores.
Step 3: Correlation coefficient measurement for the similarity scores obtained in Step 1.
Step 4: Exploring all the combinations of the similarity functions.
Step 5: Performance evaluation of the similarity functions based upon the correlation coefficient.
Step 6: Proposing the model for textual similarity using the similarity functions.
V. Results Obtained From Experimentation
Step 1: Similarity measurement using the similarity functions
The similarity between the documents retrieved for the entered query in the search engine was measured; the process was repeated for ten different queries, similarity scores were obtained using the Jaccard, Cosine, Dice and Overlap similarity functions, and the average similarity value was computed over the values obtained for each query [16]. The results are shown in Table 1.
Table 1: Average Similarity for Jaccard, Cosine, Dice and Overlap Similarity Functions for Different Queries.
Query No. | Query Entered in Search Engine | Jaccard Similarity (A) | Cosine Similarity (B) | Dice Similarity (C) | Overlap Similarity (D)
Q1 | Terrorist Attack Mumbai | 0.3111 | 0.4280 | 0.4218 | 0.4863
Q2 | Cloud Burst India | 0.2277 | 0.3112 | 0.3085 | 0.3427
Q3 | Moist Attack India | 0.2443 | 0.3345 | 0.3262 | 0.3960
Q4 | Corruption Cricket India | 0.2906 | 0.4093 | 0.4047 | 0.4592
Q5 | Pollution River Ganga | 0.4493 | 0.5969 | 0.5914 | 0.6645
Q6 | Power Generation India | 0.2800 | 0.3823 | 0.3784 | 0.4269
Q7 | Sand Mining India | 0.3898 | 0.5210 | 0.5176 | 0.5675
Q8 | Mid Day Meal India | 0.3111 | 0.4278 | 0.4198 | 0.4949
Q9 | Sikh Riots India | 0.3536 | 0.4784 | 0.4763 | 0.5141
Q10 | Moist Attack Train | 0.3760 | 0.5116 | 0.5070 | 0.5627
Step 2: Analysis of the similarity functions based upon the similarity scores
From Table 1 it is clear that the similarity scores of the Overlap similarity function outperform those obtained using the Cosine, Dice and Jaccard similarity functions, and the Cosine similarity outperforms the Dice and Jaccard similarities.
Step 3: Correlation Coefficient measurement for the similarity scores obtained using similarity functions
The linear association between the similarity scores obtained using the four similarity functions was measured using the correlation coefficient. The correlation coefficient measures the strength of the linear association between two variables and always lies between -1.0 and +1.0; if it is positive there is a positive relationship, and if it is negative the relationship is negative. In this step of the experiment, the average Jaccard similarity is represented as A, the average Cosine similarity as B, the average Dice similarity as C and the average Overlap similarity as D. The general formula of the correlation coefficient between two scores A and B for N values is given below.

Correlation Coefficient = [NΣAB - (ΣA)(ΣB)] / Sqrt([NΣA² - (ΣA)²][NΣB² - (ΣB)²])

where N = number of values, A = first score, B = second score,
ΣAB = sum of the products of the first and second scores,
ΣA = sum of the first scores, ΣB = sum of the second scores,
ΣA² = sum of the squares of the first scores, ΣB² = sum of the squares of the second scores.
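The formula above can be transcribed directly. A minimal sketch in Python, assuming the scores are given as plain lists (the example values are illustrative, not the paper's data):

```python
import math

def correlation_coefficient(a, b):
    # Pearson's r: [N·ΣAB - ΣA·ΣB] / sqrt([N·ΣA² - (ΣA)²][N·ΣB² - (ΣB)²])
    n = len(a)
    s_ab = sum(x * y for x, y in zip(a, b))
    s_a, s_b = sum(a), sum(b)
    s_a2 = sum(x * x for x in a)
    s_b2 = sum(y * y for y in b)
    return (n * s_ab - s_a * s_b) / math.sqrt(
        (n * s_a2 - s_a ** 2) * (n * s_b2 - s_b ** 2)
    )

# Perfectly linearly related score lists give r = 1.0;
# perfectly inversely related lists give r = -1.0.
r = correlation_coefficient([0.1, 0.2, 0.3], [0.2, 0.4, 0.6])
```

Applying this function pairwise to the score columns A–D of Table 1 is how the coefficients reported in Table 2 were obtained.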
In the experiment, the similarity scores obtained using the different similarity functions, i.e. Jaccard, Cosine, Dice and Overlap, were evaluated by measuring the correlation coefficient [17]. The results are summarized in Table 2.
Table 2: Correlation Coefficient between Jaccard and Cosine, Jaccard and Dice, Jaccard and Overlap, Cosine and Dice, Cosine and Overlap, and Dice and Overlap Similarity Functions
Correlation Between | Correlation Coefficient
A and B (Jaccard and Cosine) | 0.974
A and C (Jaccard and Dice) | 0.972
A and D (Jaccard and Overlap) | 0.963
B and C (Cosine and Dice) | 0.999
B and D (Cosine and Overlap) | 0.992
C and D (Dice and Overlap) | 0.988
Step 4: Exploring all the combinations of similarity functions.
In this step of the experimentation, all the possible combinations of the four similarity functions have been explored. If two similarity functions are combined, there are six combinations, i.e. JaccardCosine, JaccardDice, JaccardOverlap, CosineDice, CosineOverlap and DiceOverlap. If three similarity functions are combined, there are four combinations, i.e. JaccardCosineDice, JaccardCosineOverlap, JaccardDiceOverlap and CosineDiceOverlap. If all four similarity functions are combined, there is only one combination, i.e. JaccardCosineDiceOverlap.
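This enumeration can be checked mechanically; a short sketch in Python using the standard library:

```python
from itertools import combinations

functions = ["Jaccard", "Cosine", "Dice", "Overlap"]

# All subsets of size 2, 3 and 4 of the four similarity functions
combos = {k: list(combinations(functions, k)) for k in (2, 3, 4)}

# 6 pairs, 4 triples and 1 quadruple, matching the enumeration above
counts = {k: len(v) for k, v in combos.items()}
```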
Step 5: Performance evaluation of the similarity functions based upon the correlation coefficient.
It was proposed in [17] that, of the six possible combinations of two similarity functions described in the step above, combining the similarity scores of Cosine similarity (B), obtained using the Cosine similarity function, with the similarity scores of Overlap similarity (D), obtained using the Overlap similarity function, gives higher average values than the other combinations, as shown in Table 3. From Table 2 it is clear that the correlation coefficient between the Cosine and Dice similarity scores is highest, i.e. 0.999, that between the Cosine and Overlap scores is 0.992, and that between the Dice and Overlap scores is 0.988. In the proposed approach [17], the CosineOverlap combination was chosen because the average of the Cosine and Overlap scores is strongly correlated with the scores of the Cosine and Dice similarity functions, while the combined similarity score is higher than that of Cosine or Dice individually.
Step 6: In this step, the other possible combinations described in Step 4 are evaluated on the basis of the correlation coefficient, and a model for textual similarity is proposed.
VI. Proposed Model of Textual Similarity Using Similarity Functions
A model of textual similarity is proposed for the information retrieval system in which all possible combinations of the four similarity functions have been explored.
Figure 2: Three approaches for the textual similarity using the Jaccard, Cosine, Dice and Overlap similarity functions.
From the six possible combinations of two similarity functions, i.e. JaccardCosine, JaccardDice, JaccardOverlap, CosineDice, CosineOverlap and DiceOverlap, the best one is the Avg. CosineOverlap combination. From the four possible combinations of three similarity functions, i.e. JaccardCosineDice, JaccardCosineOverlap, JaccardDiceOverlap and CosineDiceOverlap, the best one is Avg. CosineDiceOverlap. The last possible combination is that of all four similarity functions, i.e. Avg. JaccardCosineDiceOverlap. In the proposed model all three approaches are explored.
(1) First approach based on the combination of the Cosine and Overlap similarity functions (Avg. CosineOverlap):
It was proposed in [17] that on combining the similarity scores of Cosine similarity (B) and Overlap similarity (D), obtained using the Cosine and Overlap similarity functions, higher average values are obtained than for the other combinations, as shown in Table 3 and Figure 2. The results obtained are highly correlated with the similarity scores of the Cosine, Dice and Overlap similarity functions.
Evaluation of the first approach: On evaluation of the scores of Jaccard similarity (A), Cosine similarity (B), Dice similarity (C) and Overlap similarity (D), obtained using the Jaccard, Cosine, Dice and Overlap similarity functions respectively, it was found from Table 1 that the Overlap similarity outperforms the Cosine, Dice and Jaccard similarities, but from Table 2 it was found that the correlation coefficient between the Cosine similarity scores (B) and the Dice similarity scores (C) is highest, i.e. 0.999. We therefore proposed taking the average of the Cosine similarity scores (B) and the Overlap similarity scores (D), obtained using the Cosine and Overlap similarity functions; the results show higher average values for this combination than for the other combinations, as shown in Table 3 [17] and Fig. 3 [17]. The results obtained are correlated with the similarity scores of the Cosine, Dice and Overlap similarity functions.
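Each combined score is simply the arithmetic mean of the selected per-function scores. A small sketch in Python, using the Q1 row of Table 1 (the function name and dictionary layout are illustrative):

```python
# Table 1, query Q1 ("Terrorist Attack Mumbai"):
# A = Jaccard, B = Cosine, C = Dice, D = Overlap
q1 = {"A": 0.3111, "B": 0.4280, "C": 0.4218, "D": 0.4863}

def avg_combination(scores: dict, keys: str) -> float:
    # Arithmetic mean of the selected similarity scores,
    # e.g. "BD" for the Avg. CosineOverlap combination
    return sum(scores[k] for k in keys) / len(keys)

avg_bd = avg_combination(q1, "BD")      # CosineOverlap entry for Q1 in Table 3
avg_bcd = avg_combination(q1, "BCD")    # CosineDiceOverlap entry for Q1 in Table 4
avg_abcd = avg_combination(q1, "ABCD")  # JaccardCosineDiceOverlap entry for Q1 in Table 5
```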
Table 3: Average of JaccardCosine, JaccardDice, JaccardOverlap, CosineDice, CosineOverlap and DiceOverlap.
Query | JaccardCosine (Avg. AB) | JaccardDice (Avg. AC) | JaccardOverlap (Avg. AD) | CosineDice (Avg. BC) | CosineOverlap (Avg. BD) | DiceOverlap (Avg. CD)
Q1 | 0.36955 | 0.36645 | 0.3987 | 0.4249 | 0.45715 | 0.45405
Q2 | 0.26945 | 0.2681 | 0.2852 | 0.30985 | 0.32695 | 0.3256
Q3 | 0.2894 | 0.28525 | 0.32015 | 0.33035 | 0.36525 | 0.3611
Q4 | 0.34995 | 0.34765 | 0.3749 | 0.407 | 0.43425 | 0.43195
Q5 | 0.5231 | 0.52035 | 0.5569 | 0.59415 | 0.6307 | 0.62795
Q6 | 0.33115 | 0.3292 | 0.35345 | 0.38035 | 0.4046 | 0.40265
Q7 | 0.4554 | 0.4537 | 0.47865 | 0.5193 | 0.54425 | 0.54255
Q8 | 0.36945 | 0.36545 | 0.403 | 0.4238 | 0.46135 | 0.45735
Q9 | 0.416 | 0.41495 | 0.43385 | 0.47735 | 0.49625 | 0.4952
Q10 | 0.4438 | 0.4415 | 0.46935 | 0.5093 | 0.53715 | 0.53485
Avg. Value | 0.381725 | 0.37926 | 0.407415 | 0.437635 | 0.46579 | 0.463325
Figure 3: Values of similarity for Avg. JaccardCosine, Avg. JaccardDice, Avg. JaccardOverlap, Avg. CosineDice, Avg. CosineOverlap and Avg. DiceOverlap for the different queries, and the average values over all the queries.
(2) Second approach based on the combination of the Cosine, Dice and Overlap similarity functions (Avg. CosineDiceOverlap):
From Table 2 it was found that the correlation coefficient is maximum between the Cosine and Dice similarity scores, i.e. 0.999; it is 0.992 for Cosine and Overlap, and 0.988 for Dice and Overlap. From this evaluation of the correlation coefficients we propose another approach: combining Cosine, Dice and Overlap gives optimal results. The results of the combination are shown in Table 4. The Jaccard similarity function was ignored because, from Table 2, the correlation coefficient between the Jaccard and Cosine similarity scores was 0.974, between Jaccard and Dice 0.972, and between Jaccard and Overlap 0.963.
Table 4: Similarity scores of JaccardCosineDice, JaccardCosineOverlap, JaccardDiceOverlap and CosineDiceOverlap
Query | Avg. JaccardCosineDice (Avg. ABC) | Avg. JaccardCosineOverlap (Avg. ABD) | Avg. JaccardDiceOverlap (Avg. ACD) | Avg. CosineDiceOverlap (Avg. BCD)
Q1 | 0.386967 | 0.408467 | 0.4064 | 0.445367
Q2 | 0.282467 | 0.293867 | 0.292967 | 0.3208
Q3 | 0.301667 | 0.324933 | 0.322167 | 0.352233
Q4 | 0.3682 | 0.386367 | 0.384833 | 0.4244
Q5 | 0.545867 | 0.570233 | 0.5684 | 0.6176
Q6 | 0.3469 | 0.363067 | 0.361767 | 0.395867
Q7 | 0.476133 | 0.492767 | 0.491633 | 0.535367
Q8 | 0.386233 | 0.411267 | 0.4086 | 0.4475
Q9 | 0.4361 | 0.4487 | 0.448 | 0.4896
Q10 | 0.464867 | 0.483433 | 0.4819 | 0.5271
Avg. Value | 0.3994 | 0.41831 | 0.4166667 | 0.455583
(3) Third approach based on the combination of the Jaccard, Cosine, Dice and Overlap similarity functions (Avg. JaccardCosineDiceOverlap):
In the last proposed approach, the similarity scores obtained using all four similarity functions, i.e. Jaccard, Cosine, Dice and Overlap, are combined and the average is taken; this is represented as Avg. JaccardCosineDiceOverlap and the results obtained are shown in Table 5.

Table 5: Similarity scores using the Avg. JaccardCosineDiceOverlap approach
Query | Avg. JaccardCosineDiceOverlap (Avg. ABCD)
Q1 | 0.4118
Q2 | 0.297525
Q3 | 0.32525
Q4 | 0.39095
Q5 | 0.575525
Q6 | 0.3669
Q7 | 0.498975
Q8 | 0.4134
Q9 | 0.4556
Q10 | 0.489325
Avg. Value | 0.422525

Comparative analysis of the proposed design approaches of information retrieval techniques using the four similarity functions:
Based on the above three proposed design approaches, the experiment was repeated with the different queries Q1, Q2, ..., Q10 and the average of all the scores was taken to obtain the average values for the ten entered queries, as shown in Table 6. Three average values have been obtained from the three proposed design approaches using the different combinations of similarity functions.

Table 6: Similarity scores using the three proposed approaches
Avg. values for ten queries (Q1, Q2, ..., Q10) using | Results
Avg. CosineOverlap approach | 0.46579
Avg. CosineDiceOverlap approach | 0.455583
Avg. JaccardCosineDiceOverlap approach | 0.422525

Representation of the proposed model:
These three average values represent the three vertices of a triangle in the proposed model for textual similarity, as shown in Figure 4. In the proposed model, R1, R2 and R3 are the vertices of the triangle, where R1 (Result 1) is the average value of the CosineOverlap combination, i.e. the first approach; R2 (Result 2) is the average value of the CosineDiceOverlap combination, i.e. the second approach; and R3 (Result 3) is the average value of the JaccardCosineDiceOverlap combination, i.e. the third approach.

Figure 4: The proposed model of textual similarity using similarity functions.

VII. Conclusions
A model is proposed for the textual similarity between the documents retrieved for the entered query in an information retrieval system using similarity functions in wide area networks. The model is based upon the correlation coefficient. While proposing the model for the matching mechanism of the information retrieval system, all the possible combinations of the similarity functions were explored, and it was found that there are sixteen possible combinations including the empty set. On evaluation of the Jaccard, Cosine, Dice and Overlap similarity functions, it was found from Table 2 that the correlation coefficient between the Cosine and Dice similarity scores, i.e. 0.999, is higher than the others. But from Table 1 it is clear that the similarity scores of the Overlap similarity function outperform those of the Cosine, Dice and Jaccard similarity functions. From Table 3 it is concluded that the first proposed approach, taking the average of the Cosine and Overlap similarity scores, outperforms the averages of the other combinations, and from Fig. 3 it is clear that the Avg. CosineOverlap combination gives better results than the averages of the other combinations of two similarity functions, i.e. JaccardCosine, JaccardDice, JaccardOverlap, CosineDice and DiceOverlap. It is also concluded from the second proposed approach that Avg. CosineDiceOverlap gives better results than the averages of the other combinations of three similarity functions, i.e. JaccardCosineDice, JaccardCosineOverlap and JaccardDiceOverlap. The last approach combines the similarity scores of the Jaccard, Cosine, Dice and Overlap similarity functions and takes the average. In the proposed model, R1, R2 and R3 are the average values over all the said queries resulting from the three proposed approaches, and they are represented by a triangle as shown in Figure 4.
References
[1]. R. Baeza-Yates, B. Ribiero-Neto, Modern Information Retrieval, Pearson Education, 1999.
[2]. M.P.S. Bhatia, Akshi Kumar Khalid, “A Primer on the Web Information Retrieval Paradigm”, Journal of Theoretical and Applied
Information Technology, Vol.4, No.2, pp.657-662, 2008.
[3]. Nicholas J. Belkin, W. Bruce Croft, “Information Filtering and Information Retrieval: Two Sides of the Same Coin?”
Communications of the ACM, Vol.35, No.12, p29 (10), 1992.
[4]. Jaswinder Singh, Parvinder Singh, Yogesh Chaba, "A Study of Similarity Functions Used in Textual Information Retrieval in Wide Area Networks", International Journal of Computer Science and Information Technologies, Vol. 5, Issue 6, pp. 7880-7884, 2014.
[5]. G. Salton and M.H. McGill, Introduction to Modern Information Retrieval, McGraw Hill, 1983.
[6]. Nicholas J. Belkin, W. Bruce Croft, “Retrieval Techniques,” Annual Review of Information Science & Technology, M.E Williams,
Ed. Chapter. 4, pp.109-145 Elsevier, 1987.
[7]. William P. Jones, George W. Furnas, "Pictures of Relevance: A Geometric Analysis of Similarity Measures," Journal of the American Society for Information Science, Vol. 38, No. 6, pp. 420-442, 1987.
[8]. Siti Salwa Salleh, Noor Aznimah Abdul Aziz, Daud Mohamad and Megawati Omar, “Combining Mahalanobis and Jaccard
Distance to Overcome Similarity Measurement Constriction on Geometrical Shapes,” International Journal of Computer Science
Issues, Vol. 9, Issue 4, pp. 124-132, 2012.
[9]. Jair Moura Duarte, Joao Bosco dos Santos and Leonardo Cunha Melo, “Comparison of Similarity Coefficients Based on RAPD
Markers in the Common Bean,” Genetics and Molecular Biology, Vol. 22, Issue 3, pp. 427-432, 1999.
[10]. P. Willett, J. M. Barnard and G. M. Downs, "Chemical Similarity Searching," Journal of Chemical Information and Computer Sciences, Vol. 38, No. 6, pp. 983-996, 1998.
[11]. McGill, Koll and Noreault, “An Evaluation of Factors Affecting Document Ranking by Information Retrieval Systems,” Project
report, Syracuse University, 1979.
[12]. Sung-Hyuk Cha, “Comprehensive Survey on the Distance/Similarity Measures between Probability Density Functions,”
International Journal of Mathematical Models and Methods in Applied Sciences, Vol. 1, Issue 4, pp. 300-307, 2007.
[13]. Wael H. Gomaa, Aly A. Fahmy, “A Survey of Text Similarity Approaches,” International Journal of Computer Applications, Vol.
68, No. 13, pp. 13-18, 2013.
[14]. Suphakit Niwattanakul, Jatsada Singhthongchai, Ekkachai Naenudorn and Supachanun Wanapu, “Using of Jaccard Coefficient for
Keywords Similarity,” Proc. of International Multi Conference of Engineers and Computer Scientists, IMECS 2013, 2013, Hong
Kong.
[15]. Wael Musa Hadi, Fadi Thabtah, Hussein Abdel-jaber, "A Comparative Study Using Vector Space Model with K-Nearest Neighbor on Text Categorization Data," Proc. of the World Congress on Engineering, WCE 2007, Vol. 1, London, U.K., 2007.
[16]. Jaswinder Singh, Parvinder Singh, Yogesh Chaba, "Performance Modeling of Information Retrieval Techniques Using Similarity Functions in Wide Area Networks", International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 4, Issue 12, pp. 786-793, 2014.
[17]. Jaswinder Singh, Parvinder Singh, Yogesh Chaba, "Performance Evaluation and Design of Optimized Information Retrieval Techniques Using Similarity Functions in Wide Area Networks", International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 5, Issue 1, pp. 461-469, 2015.