This document surveys methods for measuring semantic similarity between words. It first notes that traditional lexical similarity measures do not consider semantics, then reviews several approaches that measure semantic similarity using web search engines and text snippets. These approaches compute word co-occurrence statistics from page counts and analyze lexical patterns extracted from snippets, with pattern clustering used to group semantically similar patterns. The approaches are evaluated on benchmark datasets using metrics such as precision and recall. Finally, the document proposes a method that combines page-count statistics, lexical pattern extraction and clustering, and support vector machines to measure semantic similarity.
Computing semantic similarity measure between words using web search engine (csandit)
Semantic similarity measures between words play an important role in information retrieval, natural language processing, and various tasks on the web. In this paper, we propose a Modified Pattern Extraction Algorithm to compute a supervised semantic similarity measure between words by combining the page count method and the web snippets method. Four association measures are used to find semantic similarity between words in the page count method using web search engines. We use Sequential Minimal Optimization (SMO) support vector machines (SVM) to find the optimal combination of page-count-based similarity scores and top-ranking patterns from the web snippets method. The SVM is trained to classify synonymous and non-synonymous word pairs. The proposed Modified Pattern Extraction Algorithm achieves a correlation value of 89.8 percent.
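The abstract does not name the four page-count association measures; in this line of work they are commonly WebJaccard, WebOverlap, WebDice, and WebPMI. A sketch of computing them from raw page counts follows; the co-occurrence threshold c and assumed index size n are illustrative values, not the paper's:

```python
import math

def web_jaccard(p, q, pq, c=5):
    # pq / (p + q - pq); returns 0 when the co-occurrence count is below threshold c
    return 0.0 if pq < c else pq / (p + q - pq)

def web_overlap(p, q, pq, c=5):
    # Overlap (Simpson) coefficient over page counts
    return 0.0 if pq < c else pq / min(p, q)

def web_dice(p, q, pq, c=5):
    # Dice coefficient over page counts
    return 0.0 if pq < c else 2 * pq / (p + q)

def web_pmi(p, q, pq, n=10**10, c=5):
    # Pointwise mutual information, normalized to [0, 1] by log2(n / c);
    # n is an assumed total number of indexed pages
    if pq < c:
        return 0.0
    return math.log2((pq / n) / ((p / n) * (q / n))) / math.log2(n / c)
```

Here p and q are the page counts for each word alone and pq the count for their conjunctive query.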
WEB SEARCH ENGINE BASED SEMANTIC SIMILARITY MEASURE BETWEEN WORDS USING PATTE... (cscpconf)
Semantic similarity measures play an important role in information retrieval, natural language processing, and various web tasks such as relation extraction, community mining, document clustering, and automatic metadata extraction. In this paper, we propose a Pattern Retrieval Algorithm (PRA) to compute the semantic similarity measure between words by combining the page count method and the web snippets method. Four association measures are used to find semantic similarity between words in the page count method using web search engines. We use Sequential Minimal Optimization (SMO) support vector machines (SVM) to find the optimal combination of page-count-based similarity scores and top-ranking patterns from the web snippets method. The SVM is trained to classify synonymous and non-synonymous word pairs. The proposed approach aims to improve correlation, precision, recall, and F-measure compared to existing methods, and achieves a correlation value of 89.8%.
The growing number of datasets published on the Web as linked data brings opportunities for high data availability. As the data grows, the challenges of querying it also grow: searching linked data with structured query languages is difficult, so we use keyword query searching instead. In this paper, we propose different approaches to keyword query routing, through which the efficiency of keyword search can be greatly improved. By routing keywords only to the relevant data sources, the processing cost of keyword search queries can be greatly reduced. We contrast and compare four models: keyword level, element level, set level, and query expansion using semantic and linguistic analysis. These models are used for keyword query routing in keyword search.
Extracting and Reducing the Semantic Information Content of Web Documents to ... (ijsrd.com)
Ranking and optimization of web service compositions represent challenging areas of research, with significant implications for realizing the "Web of Services" vision. On the semantic web, semantic information is expressed in machine-processable languages such as the Web Ontology Language (OWL). Semantic web services use formal semantic descriptions of web service functionality and enable automated reasoning over web service compositions; such services can then be automatically discovered, composed into more complex services, and executed. We automate web service composition by using semantic technologies to calculate the semantic similarities between the outputs and inputs of connected constituent services, and we aggregate these values into a measure of semantic quality for the composition. We propose a novel and extensible model that balances this new dimension of semantic quality (as a functional quality metric) with QoS metrics, using them together as ranking and optimization criteria. We also demonstrate the utility of genetic algorithms for optimization over the large number of services foreseen by the "Web of Services" vision. Finally, the semantics of web documents are reduced to support semantic document retrieval using the Network Ontology Language (NOL) and to improve QoS-based ranking and optimization.
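As a hedged illustration of the genetic-algorithm step, the sketch below evolves a composition plan whose fitness balances a semantic-quality score against a QoS score. The candidate services, their scores, the weights, and the GA parameters are all hypothetical, not taken from the paper:

```python
import random

# Hypothetical candidates per composition slot: (semantic_quality, qos), both in [0, 1]
CANDIDATES = [
    [(0.9, 0.4), (0.5, 0.9), (0.7, 0.7)],   # slot 0
    [(0.6, 0.8), (0.95, 0.3), (0.8, 0.6)],  # slot 1
]

def fitness(plan, w_sem=0.5, w_qos=0.5):
    # Average the chosen services' semantic quality and QoS, then weight them.
    sem = sum(CANDIDATES[i][g][0] for i, g in enumerate(plan)) / len(plan)
    qos = sum(CANDIDATES[i][g][1] for i, g in enumerate(plan)) / len(plan)
    return w_sem * sem + w_qos * qos

def evolve(pop_size=20, generations=30, seed=1):
    rng = random.Random(seed)
    n = len(CANDIDATES)
    pop = [[rng.randrange(len(CANDIDATES[i])) for i in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # keep the fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]             # one-point crossover
            if rng.random() < 0.1:                # occasional mutation
                i = rng.randrange(n)
                child[i] = rng.randrange(len(CANDIDATES[i]))
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)
```

With this tiny search space the GA quickly converges to the best-balanced plan (fitness 0.7 for these made-up scores).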
Multi Similarity Measure based Result Merging Strategies in Meta Search Engine (IDES Editor)
In a Meta Search Engine, result merging is the key component. Meta Search Engines provide a uniform query interface for Internet users to search for information: depending on users' needs, they select relevant sources, map user queries onto the target search engines, and subsequently merge the results. The effectiveness of a Meta Search Engine is closely related to the result merging algorithm it employs. In this paper, we propose a Meta Search Engine with two distinct steps: (1) searching through surface and deep search engines, and (2) ranking the results with the designed ranking algorithm. Initially, the user's query is submitted to both the deep and surface search engines. The proposed method uses two distinct algorithms for ranking the search results: a concept similarity based method and a cosine similarity based method. Once the results from the various search engines are ranked, the proposed Meta Search Engine merges them into a single ranked list. Finally, experiments are conducted to demonstrate the efficiency of the proposed visible and invisible web-based Meta Search Engine in merging relevant pages; TSAP is used as the evaluation criterion for the algorithms.
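A minimal sketch of the cosine-similarity ranking and merging step described above; the concept-similarity ranker is omitted, and simple token-count vectors stand in for whatever document representation the paper actually uses:

```python
import math
from collections import Counter

def cosine(a: str, b: str) -> float:
    # Cosine similarity between token-count vectors of two texts.
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def merge_results(query, surface_hits, deep_hits):
    # Score every result against the query, drop duplicates,
    # and return one list ranked by similarity.
    seen, merged = set(), []
    for doc in surface_hits + deep_hits:
        if doc not in seen:
            seen.add(doc)
            merged.append((cosine(query, doc), doc))
    merged.sort(key=lambda p: p[0], reverse=True)
    return [doc for _, doc in merged]
```

A real meta search engine would merge per-engine rank positions and scores as well; this shows only the similarity-based merge.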
Existing work on keyword search relies on an element-level model (data graphs) to compute keyword query results: elements mentioning the keywords are retrieved from this model, and paths between them are explored to compute Steiner graphs. A Keyword Element Relationship Graph (KRG) captures relationships at the keyword level; the relationships captured by a KRG are not direct edges between tuples but stand for paths between keywords.
We propose routing keywords only to relevant sources in order to reduce the high cost of processing keyword search queries over all sources. A multilevel scoring mechanism is proposed for computing the relevance of routing plans, based on scores at the level of keywords, data elements, element sets, and the subgraphs that connect these elements.
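A toy sketch of such a multilevel scoring mechanism, assuming a simple weighted sum over per-level scores; the weights, score fields, and the folding of the subgraph score into the set level are illustrative assumptions, not the paper's actual formulation:

```python
def plan_score(plan, weights=(0.4, 0.3, 0.3)):
    """Combine scores at the keyword, element, and element-set levels
    into one relevance score for a routing plan (the subgraph score is
    folded into the set level here for brevity)."""
    w_kw, w_el, w_set = weights
    return (w_kw * plan["keyword_score"]
            + w_el * plan["element_score"]
            + w_set * plan["set_score"])

def route(plans, top_k=2):
    # Keep only the top-k routing plans by combined score,
    # so keywords are sent to the most relevant sources.
    return sorted(plans, key=plan_score, reverse=True)[:top_k]
```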
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Comparison of Semantic and Syntactic Information Retrieval System on the basi... (Waqas Tariq)
This paper discusses an information retrieval system for local databases. The approach is to search the web both semantically and syntactically. The proposal handles search queries from a user interested in focused results for a product with specific characteristics. The objective is to find and retrieve accurate information from an information warehouse containing related data with common keywords; the system can eventually be used for accessing the internet as well. Achieving accuracy in information retrieval, i.e. both high precision and high recall, is difficult, so semantic and syntactic search engines are compared for information retrieval using two parameters: precision and recall.
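The two evaluation parameters used above can be computed directly from the sets of retrieved and relevant documents:

```python
def precision_recall(retrieved, relevant):
    # Precision: fraction of retrieved documents that are relevant.
    # Recall:    fraction of relevant documents that were retrieved.
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, retrieving four documents of which two are among three relevant ones gives precision 0.5 and recall 2/3.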
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
An Improved Web Explorer using Explicit Semantic Similarity with ontology and... (INFOGAIN PUBLICATION)
The Improved Web Explorer aims at extracting and selecting the best possible hyperlinks and retrieving more accurate search results for the entered query. Hyperlinks most relevant to the query are identified by weighting the frequencies of the query words that appear in the anchor texts and in the plain text of the title and body tags of the candidate pages. The concept of ontology is then used to gain insight into the query words by finding their hypernyms, hyponyms, and synsets, reaching the source and context of each word. Explicit Semantic Similarity analysis together with a Naïve Bayes method is used to find the semantic similarity between lexically different terms, using Wikipedia and Google as explicit semantic analysis tools and calculating the probabilities of word occurrence in anchor and body texts. The Vector Space Model is used to calculate term frequency and inverse document frequency values and then compute cosine similarities between the entered query and the extracted relevant hyperlinks, yielding relevance-ranked search results for the entered search string.
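The Vector Space Model step described above (TF-IDF weighting followed by cosine similarity) can be sketched as follows, with whitespace tokenization standing in for whatever preprocessing the system actually applies:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    # Weight each term's frequency by its inverse document frequency
    # across the corpus; rare terms get higher weight.
    n = len(docs)
    df = Counter(t for d in docs for t in set(d.split()))
    vecs = []
    for d in docs:
        tf = Counter(d.split())
        vecs.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vecs

def cosine(u, v):
    # Cosine similarity between two sparse TF-IDF vectors.
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Ranking then amounts to sorting hyperlink texts by their cosine similarity to the query vector.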
Annotation Approach for Document with Recommendation (ijmpict)
An enormous number of organizations generate and share textual descriptions of their products, facilities, and activities. Such collections of textual data contain a significant amount of structured information that remains buried in the unstructured text. While information extraction systems simplify the extraction of structured relations, they are frequently expensive and inaccurate, particularly when working over text that does not contain any examples of the targeted structured data. We propose an alternative methodology that simplifies structured metadata generation by recognizing documents likely to contain information of interest; this data is then useful for querying the database. Moreover, we design algorithms to extract attribute-value pairs and devise new mechanisms to map such pairs to manually created schemas. We apply a clustering technique to the item content to complement the user rating information, which improves the accuracy of collaborative similarity and addresses the cold-start problem.
Semantic similarity, and the semantic relatedness measure in particular, is very important in the current scenario due to the huge demand for natural language processing applications such as chatbots and for information retrieval systems such as knowledge-base-driven FAQ systems. Current approaches generally use similarity measures that ignore the context-sensitive relationships between words. This leads to erroneous similarity predictions and is of little use in real-life applications. This work proposes a novel approach that gives an accurate relatedness measure for any two words in a sentence by taking their context into consideration. This context correction yields more accurate similarity predictions and therefore higher accuracy in information retrieval systems.
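One simple way to make relatedness context-sensitive, sketched here as an assumption rather than as this paper's actual method, is to represent each word by the words it co-occurs with in a small context window and compare those context profiles:

```python
from collections import Counter, defaultdict

def context_vectors(sentences, window=2):
    """Represent each word by the words appearing within `window`
    positions of it across the corpus (a crude distributional proxy
    for context-sensitive relatedness)."""
    vecs = defaultdict(Counter)
    for s in sentences:
        toks = s.lower().split()
        for i, w in enumerate(toks):
            lo, hi = max(0, i - window), min(len(toks), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vecs[w][toks[j]] += 1
    return vecs

def relatedness(w1, w2, vecs):
    # Cosine similarity over the two words' shared context words.
    u, v = vecs[w1], vecs[w2]
    dot = sum(u[t] * v[t] for t in u)
    nu = sum(x * x for x in u.values()) ** 0.5
    nv = sum(x * x for x in v.values()) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0
```

Words that appear in similar contexts ("bank" and "loan" in financial sentences) score higher than words that never share context.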
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ... (inventionjournals)
Information on the Internet is overloaded due to its unrestrained growth, which makes information search a complicated process. Recommendation systems (RS) are now widely used in many areas to surface items of interest to users. With the development of e-commerce and information access, recommender systems have become a popular technique for pruning large information spaces so that users are directed toward the items that best meet their needs and preferences, and as the content generated on the Web explodes, recommendation techniques have become increasingly indispensable. Web recommendation systems help users find exact information and make search easier. Web recommendation is a web personalization technique that recommends web pages or items to the user based on previous browsing history. However, the tremendous growth in available information and in the number of website visitors in recent years poses key challenges for recommender systems: producing high-quality recommendations from large amounts of information without returning unwanted items in place of the targeted ones, and performing many recommendations per second for millions of users and items. To meet these challenges, new recommender technologies are needed that can quickly produce high-quality recommendations even for very large-scale problems. We address these issues with a two-stage recommender process using fuzzy clustering and collaborative filtering. Fuzzy clustering predicts the items or products that will be accessed in the future based on the user's previous browsing behavior, and collaborative filtering then produces the user's expected results from the fuzzy clustering output and the collection of web database items.
With this new recommendation system, the user's expected product or item is returned in minimal time; the system reduces unrelated and unwanted results and provides results within the user's domain of interest.
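A minimal sketch of the collaborative-filtering stage, assuming user-based filtering with cosine similarity over rating vectors (the fuzzy-clustering stage and all data are illustrative assumptions):

```python
def user_similarity(ra, rb):
    # Cosine similarity computed over the items two users have both rated.
    common = set(ra) & set(rb)
    if not common:
        return 0.0
    dot = sum(ra[i] * rb[i] for i in common)
    na = sum(v * v for v in ra.values()) ** 0.5
    nb = sum(v * v for v in rb.values()) ** 0.5
    return dot / (na * nb)

def predict(user, item, ratings):
    """Predict `user`'s rating of `item` as the similarity-weighted
    average of other users' ratings for that item."""
    num = den = 0.0
    for other, r in ratings.items():
        if other != user and item in r:
            w = user_similarity(ratings[user], r)
            num += w * r[item]
            den += w
    return num / den if den else 0.0
```

Users whose past ratings resemble the target user's contribute more to the prediction.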
Cluster Based Web Search Using Support Vector Machine (CSCJournals)
Nowadays, searches for the web pages of a person with a given name constitute a notable fraction of queries to web search engines. This method exploits a variety of semantic information extracted from web pages. The rapid growth of the Internet has made the Web a popular place for collecting information, and Internet users today access billions of web pages online using search engines. Information on the Web comes from many sources, including the websites of companies, organizations, and communities, personal homepages, and so on. Effective representation of web search results remains an open problem in the information retrieval community. For ambiguous queries, a traditional approach is to organize search results into groups (clusters), one for each meaning of the query. These groups are usually constructed according to the topical similarity of the retrieved documents, but documents can be totally dissimilar and still correspond to the same meaning of the query; to overcome this problem, we exploit the fact that relevant web pages are often located close to each other in the Web's hyperlink graph. The paper presents a graphical approach to entity resolution, complementing the traditional methodology with an analysis of the entity-relationship (ER) graph constructed for the dataset being analyzed, and demonstrates a technique that measures the degree of interconnectedness between pairs of nodes in the graph, which can significantly improve the quality of entity resolution. Support vector machines (SVMs), a set of related supervised learning methods, are used to classify and distribute the load of user queries from the server machine to different client machines so that the system remains stable; web pages are clustered based on their capacities, and the whole database is stored on the server machine. Keywords: SVM, cluster, ER.
Object surface segmentation, Image segmentation, Region growing, X-Y-Z image,... (cscpconf)
A web content mining application for detecting relevant pages using Jaccard ... (IJECEIAES)
The tremendous growth in the availability of text data from a variety of sources creates a slew of concerns and obstacles to discovering meaningful information. This advancement of digital technology has dispersed texts over millions of web sites, and unstructured texts are densely packed with textual information. Discovering valuable and intriguing relationships in unstructured text demands substantial computer processing, so text mining has developed into an attractive area of study for obtaining organized and useful data. One purpose of this research is to discuss text pre-processing in the automobile marketing domain in order to create a structured database: regular expressions were used to extract data from unstructured vehicle advertisements, resulting in a well-organized database, and we manually develop rule-based ways of extracting structured data from unstructured web pages. From the information retrieved from these advertisements, a systematic search for certain noteworthy attributes is performed. There are numerous approaches to query recommendation, and it is vital to understand which one should be employed; this research therefore attempts to determine the optimal similarity value for query suggestions based on user-supplied parameters by comparing MySQL pattern matching and Jaccard similarity.
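Token-set Jaccard similarity for query suggestion, the measure compared against MySQL pattern matching above, can be sketched as follows (the threshold value is an illustrative assumption):

```python
def jaccard(a: str, b: str) -> float:
    # |A ∩ B| / |A ∪ B| over the two queries' token sets.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def suggest(query, past_queries, threshold=0.3):
    # Return stored queries whose Jaccard similarity to the new
    # query meets the threshold, most similar first.
    scored = [(jaccard(query, q), q) for q in past_queries]
    return [q for s, q in sorted(scored, reverse=True) if s >= threshold]
```

Unlike MySQL LIKE-pattern matching, this scores partial token overlap rather than requiring a contiguous substring.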
Semantic Search of E-Learning Documents Using Ontology Based System (ijcnes)
The keyword searching mechanism is traditionally used for information retrieval from web-based systems. However, it fails to meet the requirements of searching expert knowledge bases built on popular semantic systems. Ontology-based semantic search of e-learning documents is increasingly adopted in information retrieval systems: an ontology-based system simplifies the task of finding correct information on the Web by building a search system based on the meaning of a keyword instead of the keyword itself. The major function of an ontology-based system is the development of a specification of conceptualization, which strengthens the connection between the information present in web pages and the background knowledge. The semantic gap between keywords found in documents and those in a query can be bridged using an ontology-based system. This paper provides a detailed account of the semantic search of e-learning documents using ontology-based systems by comparing various ontology systems; based on this comparison, the survey attempts to identify possible directions for future research.
Comparison of Semantic and Syntactic Information Retrieval System on the basi...Waqas Tariq
In this paper information retrieval system for local databases are discussed. The approach is to search the web both semantically and syntactically. The proposal handles the search queries related to the user who is interested in the focused results regarding a product with some specific characteristics. The objective of the work will be to find and retrieve the accurate information from the available information warehouse which contains related data having common keywords. This information retrieval system can eventually be used for accessing the internet also. Accuracy in information retrieval that is achieving both high precision and recall is difficult. So both semantic and syntactic search engine are compared for information retrieval using two parameters i.e. precision and recall.
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
An Improved Web Explorer using Explicit Semantic Similarity with ontology and...INFOGAIN PUBLICATION
The Improved Web Explorer aims at extraction and selection of the best possible hyperlinks and retrieving more accurate search results for the entered search query. The hyperlinks that are more preferable to the entered search query are evaluated by taking into account weighted values of frequencies of words in search string that are present in anchor texts and plain texts available in title and body tags of various hyperlink pages respectively to retrieve relevant hyperlinks from all available links. Then the concept of ontology is used to gain insights of words in search string by finding their hypernyms, hyponyms and synsets to reach to the source and context of the words in search string. The Explicit Semantic Similarity analysis along with Naïve Bayes method is used to find the semantic similarity between lexically different terms using Wikipedia and Google as explicit semantic analysis tools and calculating the probabilities of occurrence of words in anchor and body texts .Vector Space Model is being used to calculate Term frequency and Inverse document frequency values, and then calculate cosine similarities between the entered Search query and extracted relevant hyperlinks to get the most appropriate relevance wise ranked search results to the entered search string
Annotation Approach for Document with Recommendation ijmpict
An enormous number of organizations generate and share textual descriptions of their products, facilities, and activities. Such collections of textual data comprise a significant amount of controlled information, which residues buried in the unstructured text. Whereas information extraction systems simplify the extraction of structured associations, they are frequently expensive and incorrect, particularly when working on top of text that does not comprise any examples of the targeted structured data. Projected an alternative methodology that simplifies the structured metadata generation by recognizing documents that are possible to contain information of awareness and this data will be beneficial for querying the database. Moreover, we intend algorithms to extract attribute-value pairs, and similarly devise new mechanisms to map such pairs to manually created schemes. We apply clustering technique to the item content information to complement the user rating information, which improves the correctness of collaborative similarity, and solves the cold start problem.
To Get any Project for CSE, IT ECE, EEE Contact Me @ 09666155510, 09849539085 or mail us - ieeefinalsemprojects@gmail.com-Visit Our Website: www.finalyearprojects.org
Semantic similarity, and semantic relatedness in particular, is very important in the current scenario due to the huge demand for natural language processing applications such as chatbots, and for information retrieval systems such as knowledge-base-driven FAQ systems. Current approaches generally use similarity measures that do not exploit the context-sensitive relationships between words. This leads to erroneous similarity predictions and is of little use in real-life applications. This work proposes a novel approach that gives an accurate relatedness measure for any two words in a sentence by taking their context into consideration. This context correction results in a more accurate similarity prediction and thus higher accuracy for information retrieval systems.
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ... (inventionjournals)
Information on the Internet is overloaded due to its unstable growth, which makes information search a complicated process. A Recommendation System (RS) is a tool widely used nowadays in many areas to suggest items of interest to users. With the development of e-commerce and information access, recommender systems have become a popular technique to prune large information spaces so that users are directed toward the items that best meet their needs and preferences. With the exponential explosion of content generated on the Web, recommendation techniques have become increasingly indispensable. Web recommendation systems assist users in getting exact information and make information search easier. Web recommendation is a web personalization technique that recommends web pages or items to the user based on previous browsing history. But the tremendous growth in the amount of available information and in the number of visitors to web sites in recent years poses key challenges for recommender systems: producing high-quality recommendations from large amounts of information without returning unwanted items in place of the targeted item or product, and performing many recommendations per second for millions of users and items. To meet these challenges, new recommender system technologies are needed that can quickly produce high-quality recommendations even for very large-scale problems. To address these issues, we use a two-stage recommendation process based on fuzzy clustering and collaborative filtering algorithms. Fuzzy clustering is used to predict the items or products that will be accessed in the future from the user's previous browsing behavior. The collaborative filtering process then produces the results the user expects from the output of the fuzzy clustering and the collection of web database items. Using this new recommendation system, the expected product or item is returned to the user in minimum time; the system reduces unrelated and unwanted items and provides results within the user's domain of interest.
Cluster Based Web Search Using Support Vector Machine (CSCJournals)
Nowadays, searches for the web pages of a person with a given name constitute a notable fraction of queries to web search engines. This method exploits a variety of semantic information extracted from web pages. The rapid growth of the Internet has made the Web a popular place for collecting information; today, Internet users access billions of web pages online using search engines. Information on the Web comes from many sources, including the websites of companies, organizations, and communities, personal homepages, etc. Effective representation of web search results remains an open problem in the Information Retrieval community. For ambiguous queries, a traditional approach is to organize search results into groups (clusters), one for each meaning of the query. These groups are usually constructed according to the topical similarity of the retrieved documents, but it is possible for documents to be totally dissimilar and still correspond to the same meaning of the query. To overcome this problem, note that relevant web pages are often located close to each other in the web graph of hyperlinks. This work presents a graphical approach to entity resolution that complements the traditional methodology with an analysis of the entity-relationship (ER) graph constructed for the dataset being analyzed. It also demonstrates a technique that measures the degree of interconnectedness between pairs of nodes in the graph, which can significantly improve the quality of entity resolution. Support vector machines (SVMs), a set of related supervised learning methods used for classification, distribute the load of user queries from the server machine to different client machines so that the system remains stable; the system clusters web pages based on their capacities and stores the whole database on the server machine. Keywords: SVM, cluster, ER.
A web content mining application for detecting relevant pages using Jaccard ... (IJECEIAES)
The tremendous growth in the availability of enormous text data from a variety of sources creates a slew of concerns and obstacles to discovering meaningful information. This advancement of technology in the digital realm has resulted in the dispersion of texts over millions of web sites. Unstructured texts are densely packed with textual information. The discovery of valuable and intriguing relationships in unstructured texts demands more computer processing. So, text mining has developed into an attractive area of study for obtaining organized and useful data. One of the purposes of this research is to discuss text pre-processing of automobile marketing domains in order to create a structured database. Regular expressions were used to extract data from unstructured vehicle advertisements, resulting in a well-organized database. We manually develop unique rule-based ways of extracting structured data from unstructured web pages. As a result of the information retrieved from these advertisements, a systematic search for certain noteworthy qualities is performed. There are numerous approaches for query recommendation, and it is vital to understand which one should be employed. Additionally, this research attempts to determine the optimal value similarity for query suggestions based on user-supplied parameters by comparing MySQL pattern matching and Jaccard similarity.
Semantic Search of E-Learning Documents Using Ontology Based System (ijcnes)
The keyword searching mechanism is traditionally used for information retrieval from web-based systems. However, this mechanism fails to meet the requirements of web searching over expert knowledge bases built on popular semantic systems. Semantic search of e-learning documents based on ontology is increasingly adopted in information retrieval systems. An ontology-based system simplifies the task of finding correct information on the Web by building a search system based on the meaning of a keyword instead of the keyword itself. The major function of the ontology-based system is the development of a specification of conceptualization, which enhances the connection between the information present in web pages and the background knowledge. The semantic gap between keywords found in documents and those in a query can be bridged suitably using an ontology-based system. This paper provides a detailed account of the semantic search of e-learning documents using ontology-based systems by making comparisons between various ontology systems. Based on this comparison, the survey attempts to identify possible directions for future research.
A NEAR-DUPLICATE DETECTION ALGORITHM TO FACILITATE DOCUMENT CLUSTERING (IJDKP)
Web mining faces huge problems due to duplicate and near-duplicate web pages. Detecting near-duplicates is very difficult in a large collection of data like the Internet. The presence of such web pages plays an important role in performance degradation when integrating data from heterogeneous sources. These pages either increase the index storage space or increase serving costs. Detecting them has many potential applications; for example, it may indicate plagiarism or copyright infringement. This paper concerns detecting, and optionally removing, duplicate and near-duplicate documents, which is then used to perform document clustering. We demonstrate our approach in the web news articles domain. The experimental results show that our algorithm performs well in terms of similarity measures, and the identification of duplicate and near-duplicate documents has resulted in reduced memory use in repositories.
Computing the semantic similarity between two words can be approached in a variety of ways. It is essential for applications such as text analysis and text understanding. In traditional systems, search engines are used to compute the similarity between words, but these search engines are keyword based; one drawback is that the user must know exactly what they are looking for. There are two main approaches to the computation: knowledge-based and corpus-based. A drawback of both is that they are not suitable for computing the similarity between multi-word expressions. This system provides an efficient and effective approach for computing term similarity using a semantic network, and a clustering approach is used to improve the accuracy of the semantic similarity. The approach is more efficient than other computing algorithms and can also be applied to large-scale datasets to compute term similarity.
Semantic Information Retrieval Using Ontology in University Domain (dannyijwest)
Today's conventional search engines hardly provide the essential content relevant to the user's search query, because the context and semantics of the user's request are not analyzed to the full extent. Hence the need for semantic web search (SWS), an emerging area of web search which combines Natural Language Processing and Artificial Intelligence. The objective of the work presented here is to design, develop, and implement a semantic search engine, SIEU (Semantic Information Extraction in University Domain), confined to the university domain. SIEU uses an ontology as the knowledge base for the information retrieval process. It is not a mere keyword search; it is one layer above what Google or any other search engine retrieves by analyzing just the keywords. Here the query is analyzed both syntactically and semantically. The developed system retrieves web results more relevant to the user query through keyword expansion. The results obtained are accurate enough to satisfy the user's request, and the level of accuracy is enhanced because the query is analyzed semantically. The system will be of great use to developers and researchers who work on the web. The Google results are re-ranked and optimized to provide the relevant links; for ranking, an algorithm has been applied that fetches more apt results for the user query.
Ontological approach for improving semantic web search results (eSAT Journals)
Abstract — We propose a personalized information system to provide more user-oriented information by considering context information such as personal profiles, location, and clicks. Our system can provide associated search results from the relations between objects, using context ontologies modeled from categorized layers of data. Based on the similarities between item descriptions and user profiles, and the semantic relations between concepts, content-based and collaborative recommendation models are supported by the system. User-defined rules are based on the ontology description of the service and interoperate within any service domain that has an ontology description. Keywords: Ontology; Semantic Web; RDF; OWL.
A Survey on Approaches of Web Mining in Varied Areas (inventionjournals)
There has been a lot of research in recent years on efficient web searching. Several papers have proposed algorithms for user feedback sessions to evaluate the performance of inferring user search goals. When information is retrieved, the user clicks on a particular URL; based on the click rate, ranking is done automatically by clustering the feedback sessions. Web search engines have made enormous contributions to the web and society. They make finding information on the web quick and easy. However, they are far from optimal: a major deficiency of generic search engines is that they follow the "one size fits all" model and are not adaptable to individual users.
Conceptual Similarity Measurement Algorithm For Domain Specific Ontology (Zac Darcy)
This paper presents a similarity measurement algorithm for domain-specific terms collected in an ontology-based data integration system. The algorithm can be used in ontology mapping and in the query service of an ontology-based data integration system. In this paper we focus on applying the proposed algorithm to the web query service. Concept similarity is important for the web query service because the words in a user's input query are not wholly the same as the concepts in the ontology. So we need to extract the possible concepts that match or relate to the input words with the help of the machine-readable dictionary WordNet. Sometimes, we use the generated mapping rules in the query generation procedure for words whose similarity cannot be confirmed by WordNet. We demonstrate the effect of this algorithm on two-degree semantic results of web mining by generating concept results from the input query.
ISSN: 2278 – 1323
International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Volume 2, Issue 6, June 2013, pp. 2016–2020
www.ijarcet.org
Abstract— The growth of information on the web is enormous, so search engines play an increasingly critical role in finding the relation between input keywords. Semantic similarity measures are widely used in Information Retrieval (IR) and are an important component of various tasks on the web such as relation extraction, community mining, document clustering, and automatic metadata extraction. This paper presents an empirical method to estimate the semantic similarity of two words using page counts and text snippets retrieved from a web search engine. Specifically, it defines several word co-occurrence measures using page counts and integrates them with lexical patterns extracted from text snippets. Pattern clustering is used to identify the numerous semantic relations that exist between two given words. The optimal combination of page-counts-based co-occurrence measures and lexical pattern clusters is learned using support vector machines. The proposed method, Context Aware Semantic Association Ranking, discovers complex and meaningful relationships, which we call Semantic Associations.
Index Terms— Semantic Similarity, Pattern Extraction, Pattern Clustering, Page Count, Snippet, Semantic Association.
I. INTRODUCTION
Due to the rapid development of technologies, we all receive much more information than we did years ago. Organizing and comparing information effectively has become a real problem. The string comparison functions available in most high-level languages make it easy to compare the lexical similarity between text passages. But in most cases, to avoid information overload and to improve the quality of search results, we want to consider the semantics rather than the lexical form of these text passages. Traditional lexical similarity measurements do not adapt well to the requirements of semantic contexts. A web search normally looks for matching documents that contain the keywords specified by the user. Web mining is the use of data mining techniques to extract information from web documents. Keywords are the input for searching and retrieving documents on the web. For example, when we give "Tablet" and "PC" as keywords, the search engine displays documents that contain the words Tablet and PC, but it does not relate the words semantically.
Manuscript received June, 2013.
Ilakiya P, Department Of Computer Science and Engineering, SNS
College Of Technology, Coimbatore, India, 9677466553
So we get unwanted information: a user focused on Tablet products such as Tablet PCs also receives information related to Tablet and PC separately on the ordinary web, which can be avoided through the semantic web. This unwanted information is referred to as information overload. [8]
Semantic similarity is the degree to which text passages have the same meaning. It is typically established by analyzing a set of documents or a list of terms and assigning a metric based on the likeness of their meaning or the concept they represent. To be semantically similar, two terms do not have to be synonyms. Instead, two terms are given a number that indicates the semantic distance between them, i.e., how term A relates to term B. Semantic similarity provides a common way to build ontologies, which in turn provide the knowledge base for the semantic web. [8]
Semantic similarity between words is achieved on the web with the help of the semantic web, an extension of the current web in which information carries well-defined meaning. It describes the relationships between keywords and the properties of those keywords. Some examples of semantic web search engines are Swoogle, Hakia, and Yebola.
A. Lexical based search
Searching for a keyword in the semantic web is totally different from a text-based search engine such as Google. When a keyword is given, Google uses the Cashing Built and the Bit Table. The Cashing Built is like the call log in a mobile phone; it is a temporary buffer that stores recent search history. The Bit Table is an algorithm-centric database that contains a huge collection of data and indexes of all the keywords in each document except the stop words. Both the Cashing Built and the Bit Table are searched simultaneously; when the search engine finds the keyword, it automatically disconnects the other connection and displays the result.
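The Bit Table's index of every keyword except stop words can be pictured as a toy inverted index. The sketch below is only an illustration of that idea, with a made-up stop-word list and documents, not the actual Bit Table structure:

```python
# Toy inverted index: map each non-stop-word to the IDs of documents containing it
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}

def build_index(docs):
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            if word in STOP_WORDS:
                continue  # stop words are not indexed
            index.setdefault(word, set()).add(doc_id)
    return index

docs = {
    1: "the tablet pc is a portable device",
    2: "a pc in the office",
}
index = build_index(docs)
print(sorted(index["pc"]))  # [1, 2]
```

Looking up a keyword then becomes a single dictionary access over precomputed document lists, which is the property that makes index-based search fast.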
The search results in Google are mainly based on four criteria: architecture, site content, credibility and back links, and engagement. However, it displays results based on hits and the PageRank algorithm, so there is a high chance of extracting irrelevant documents as well. This is a major disadvantage of Google search, and it paved the way for the semantic web.
Discovering Semantic Similarity between Words Using Web Document and Context Aware Semantic Association Ranking
P. Ilakiya

B. Semantic based search
The semantic web consists of three parts: RDF (Resource Description Framework), Ontology, and OWL (Web Ontology Language). Here the keyword is taken as a snippet. Snippets are useful for search because most of the time a user can read the snippet and decide whether a particular result is relevant without opening the URL [14]. First, the snippet is processed into RDF. Like XML, RDF divides the snippet into three parts, subject, predicate, and object, then constructs the combinations between them, and the results are stored in a database.
Second, ontology is applied to the database: it helps to find entities in the database and groups the same entities together; after this, it finds the number of relationships between the entities. Finally, OWL cuts the minimal number of semantic measures and displays the result to the user. This is the searching process behind a semantic web search engine.
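The subject–predicate–object decomposition can be pictured with a toy example. The naive splitting rule below (first word as subject, second as predicate, the remainder as object) is a deliberate oversimplification of real RDF processing and only shows the shape of a triple:

```python
# Toy subject-predicate-object decomposition of short snippets.
# Real RDF extraction relies on parsers and vocabularies; this only shows the triple shape.
def to_triple(snippet):
    words = snippet.split()
    subject, predicate = words[0], words[1]
    obj = " ".join(words[2:])
    return (subject, predicate, obj)

triples = [to_triple(s) for s in (
    "Swoogle indexes semantic web documents",
    "RDF describes web resources",
)]
for t in triples:
    print(t)
# ('Swoogle', 'indexes', 'semantic web documents')
# ('RDF', 'describes', 'web resources')
```

Stored triples of this shape are what the ontology step then groups by entity, as described above.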
II. RELATED WORKS
A. Relation Based Page Rank Algorithm
The relation-based page rank algorithm is used in conjunction with semantic web search engines and relies only on information that can be extracted from user queries and from annotated resources [7]. It helps to return results in an ordered manner, that is, the document that best fits the user query is displayed first. The flow of query extraction and result presentation is shown in Figure 1. First, the user posts a query to the search engine, and the web crawler downloads documents from the web page database. Then the initial result set, containing the query keywords and concepts, is constructed.
Figure 1: Flow chart for query extraction and presentation of results
For that result set, the search engine constructs the query sub-graph for each page and calculates a page score. Based on the page score, the final result set is constructed and given to the user. [11]
Advantage
i. Effectively manages the search space.
ii. Reduces the complexity associated with the ranking task.
Disadvantage
i. Less scalable.
B. Unsupervised Semantic Similarity between terms
Page-count and context-based metrics are discussed in unsupervised semantic similarity computation. Page-count metrics consider the hits returned by a search engine; this is done in three sequential steps.
1) The Jaccard and Dice coefficients are used to find the similarity between a pair of words. For example, if W1 and W2 are two words and the co-occurrence score equals one, W1 and W2 are the same; a co-occurrence score of zero means W1 and W2 are not the same.
2) Mutual information is used to calculate the number of documents that contain the words W1 and W2; these counts are assigned to random variables X and Y [12]. The mutual dependence between the two words is then calculated, which can be evaluated in two cases: i) if X and Y are independent, they do not share any information, so the mutual information is 0; ii) if X and Y are equal, they share the same values and the mutual information is 1.
3) Google-based semantic relatedness calculates the similarity between two words and compares it with the distance between them; if the similarity is high, the distance between the words will be low [5]. The context-based semantic similarity algorithm downloads the top-ranked documents for the user query and computes the frequent co-occurrence of the words.
Advantage
i. Provides good performance.
ii. Fully automatic and requires little computational power.
Disadvantage
i. Feature selection for selecting related documents is not discussed.
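The page-count measures discussed in this subsection can be written down directly. A minimal sketch follows, where the total page count N and the example counts are made-up placeholders rather than real search-engine figures, and pointwise mutual information is given in one common log-based formulation:

```python
import math

N = 10_000_000_000  # assumed total number of indexed pages (placeholder)

def jaccard(h_p, h_q, h_pq):
    # Jaccard coefficient: co-occurrence count over the union of the counts
    return h_pq / (h_p + h_q - h_pq)

def dice(h_p, h_q, h_pq):
    # Dice coefficient: twice the co-occurrence count over the sum of counts
    return 2 * h_pq / (h_p + h_q)

def pmi(h_p, h_q, h_pq, n=N):
    # Pointwise mutual information estimated from page counts
    if h_pq == 0:
        return 0.0
    return math.log2((h_pq / n) / ((h_p / n) * (h_q / n)))

# Hypothetical page counts for "tablet", "pc", and the conjunctive query "tablet pc"
h_p, h_q, h_pq = 5_000_000, 20_000_000, 1_000_000
print(round(jaccard(h_p, h_q, h_pq), 4))  # 0.0417
print(round(dice(h_p, h_q, h_pq), 4))     # 0.08
```

Identical words give a Jaccard and Dice score of 1, and words whose co-occurrence matches what independence predicts give a PMI near 0, matching the behaviour described above.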
C. Measure Word-Group Similarity Using Web Search Results
A distributional similarity measure is used for word-group similarity [1]. Search counts are measured over a small number of word pairs, based on a dataset that contains a limited number of documents. When the page-count-based measure is applied to this dataset, the resulting similarity scores are always high, which is possible because of the limited number of documents.
Advantage
i. The proposed method can be utilized to measure the similarity of entire sets of words in addition to word pairs.
Disadvantage
i. Optimal values are not obtained
D. A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches
WordNet is a database of English words; it contains synonyms for most of the words frequently used by users. Some versions of WordNet, such as the MCR (Multilingual Central Repository), contain other languages like Spanish and Italian [10].
A cross-lingual similarity method is applied to compare the same word across different languages [6]. For example, "car" and "coche" have the same meaning: car is English and coche is Spanish. This can be analyzed from the MCR, and a graph is constructed for calculating the similarity and page rank.
Advantage
i. Effective way to compute cross-lingual similarity with
minor losses.
E. Measuring Semantic Similarity between Words Using
Web Documents
One approach measures the semantic similarity between words using
the snippets returned by Wikipedia. First the snippets are extracted
from Wikipedia for the user query; then they are preprocessed,
because the search engine sometimes returns semantically unrelated
snippets. Finally, stop words and suffixes are removed from the
snippets [12]. If stop words and suffixes are not removed, the total
number of keywords for a particular document becomes high, and the
time taken to find the similarity also increases.
Five measures of association are used: simple similarity, Jaccard
similarity, Dice similarity, cosine similarity, and overlap
similarity. Jaccard similarity compares the similarity and diversity
of the given sample sets. Dice similarity is closely related to the
Jaccard measure. The overlap method finds the overlap between two
sets. Cosine similarity measures the similarity between two vectors
of n dimensions by finding the angle between them [12].
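These set-based measures can be sketched in Python over token sets (a minimal illustration; the function names and the set representation are assumptions, not the cited system's code):

```python
import math

def jaccard(a, b):
    # Similarity vs. diversity of two sets: |A ∩ B| / |A ∪ B|.
    return len(a & b) / len(a | b) if a | b else 0.0

def dice(a, b):
    # Closely related to Jaccard: 2|A ∩ B| / (|A| + |B|).
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0

def overlap(a, b):
    # Overlap between the two sets: |A ∩ B| / min(|A|, |B|).
    return len(a & b) / min(len(a), len(b)) if a and b else 0.0

def cosine(a, b):
    # Angle between binary occurrence vectors: |A ∩ B| / sqrt(|A||B|).
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0

a = {"cricket", "sport", "team"}
b = {"sport", "game", "team"}
print(jaccard(a, b))  # 0.5
```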
F. A Web Search Engine-Based Approach to Measure
Semantic Similarity between Words
This approach combines page counts and snippets, and the resulting
feature set is given to an SVM. When keywords are entered into the
search engine, they are passed to both the page-count component and
the snippet component, as shown in Figure 2. The page count is the
number of pages that contain the keyword [13]; it captures the
global co-occurrence of the keywords. Page counts for each word
separately and for both words together are combined using four
similarity scores, or co-occurrence measures: WebJaccard,
WebOverlap, WebDice, and WebPMI. Page counts alone are not
sufficient to measure semantic similarity, because they give no
information about the position of a word in a document and therefore
cannot capture its sense.
Text snippets also help to find the similarity between the given
keywords. For example, for the query words cricket and sport, the
snippet "Cricket is a sport played between two teams" retrieved from
the search engine lets the user easily see whether a result is
related to the query without viewing the whole document. In this
example "is a" is the semantic relationship; other such
relationships include "also known as" and "part of".
Lexical pattern extraction and pattern clustering are then applied.
An extracted lexical pattern must be a subsequence containing one
occurrence of each keyword, X and Y; the maximum length of the
subsequence is L words, and the subsequence is allowed to skip some
words [4].
Lexical pattern clustering groups a set of patterns into classes of
similar patterns and thereby characterizes the features of the
words, although the notion of similarity varies from one cluster
model to another.
Figure 2: Search engine working outline diagram (example query
words: gem and jewel; output: semantic similarity result).
Finally, the pattern-cluster result and the similarity-score result
are given to the SVM. A Support Vector Machine is a supervised
learning model used for classification; it analyses the data,
recognizes the patterns, and here classifies word pairs as
synonymous or non-synonymous.
Advantage
i. Precision and recall are improved by this approach.
III. PROPOSED SEMANTIC SIMILARITY METHOD
Figure: outline of the proposed method. The search engine output is
split into page counts, over which the similarity measures are
taken, and snippets, over which lexical pattern clustering is
applied; both results feed an SVM (Support Vector Machine).
Handling word senses in a search engine is an interesting task. A
user who searches for apple on the web might be interested in one
sense of apple (the company) and not apple as a fruit. New words are
constantly being created, and new senses are assigned to existing
words; manually maintaining ontologies to capture these new words
and senses is costly, if not impossible. By this method the most
similar documents are extracted and ranked.
A. Dataset Preprocessing
Dataset preprocessing can have a significant impact on the
generalization performance of any data mining algorithm. The
elimination of noise instances is one of the most difficult problems
in such scenarios. Usually the removed instances are excessively
deviating instances that have too many null feature values; such
instances are also referred to as outliers. In addition, a common
approach to cope with the infeasibility of learning from very large
datasets is to select a single sample from the large dataset.
Missing data handling is another issue often dealt with in the data
preparation step. The WordSimilarity-353 (WS) dataset has been used
throughout this work, and it has been preprocessed accordingly.
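The two steps described here — dropping null-heavy (outlier-like) instances and subsampling a large dataset — can be sketched as follows; the function name, threshold, and seed are illustrative assumptions:

```python
import random

def preprocess(rows, max_nulls=1, sample_size=None, seed=0):
    """Drop instances with too many missing feature values, then
    optionally draw a single random sample from a large dataset."""
    kept = [r for r in rows if sum(v is None for v in r) <= max_nulls]
    if sample_size is not None and len(kept) > sample_size:
        random.seed(seed)
        kept = random.sample(kept, sample_size)
    return kept

rows = [[1, 2, 3], [None, None, 3], [1, None, 3]]
print(preprocess(rows))  # the null-heavy second row is dropped
```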
B. Lexical Pattern Extraction
The snippets returned by a search engine for the conjunctive query
of two words provide useful clues about the semantic relations that
exist between those words. A snippet contains a window of text
selected from a document that includes the queried words. Snippets
are useful for search because, most of the time, a user can read the
snippet and decide whether a particular search result is relevant
without even opening the URL. Using snippets as contexts is also
computationally efficient because it obviates the need to download
the source documents from the web, which can be time-consuming for
large documents. For example, consider a snippet returned for the
query cricket AND sport, such as "Cricket is a sport." Here, the
phrase "is a" indicates a semantic relationship between cricket and
sport, and many such phrases indicate semantic relationships.
Alongside pattern extraction, the page counts of each single word
and of the two words together are combined using four similarity
measures: Jaccard, Overlap (Simpson), Dice, and pointwise mutual
information (PMI).
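Pattern extraction over such snippets — subsequences containing one occurrence each of the query words X and Y, up to a maximum length, with skips allowed — can be sketched as below; the brute-force enumeration and the max_len default are my assumptions, suitable only for short snippets:

```python
import re
from itertools import combinations

def extract_patterns(snippet, word_x, word_y, max_len=5):
    """Return lexical patterns: ordered subsequences of the snippet
    tokens, at most max_len long, containing exactly one X and one Y."""
    tokens = re.findall(r"\w+", snippet.lower())
    tokens = ["X" if t == word_x else "Y" if t == word_y else t
              for t in tokens]
    patterns = set()
    for size in range(2, max_len + 1):
        for idxs in combinations(range(len(tokens)), size):
            sub = [tokens[i] for i in idxs]
            if sub.count("X") == 1 and sub.count("Y") == 1:
                patterns.add(" ".join(sub))
    return patterns

print(sorted(extract_patterns("Cricket is a sport", "cricket", "sport")))
```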
The Web Jaccard coefficient measures the similarity between two
words (or multiword phrases) P and Q from their page counts. Web
Overlap is a natural modification of the Overlap coefficient, i.e.,
the overlap of the two keywords. The Web Dice coefficient is a
variant of the Dice coefficient. Web PMI is a variant of pointwise
mutual information, a measure motivated by information theory that
reflects the dependence between two probabilistic events.
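Given the page counts H(P), H(Q), and H(P ∩ Q), the four measures can be sketched in Python; the cutoff c = 5 and the index size N are assumed values for illustration, not figures from this paper:

```python
import math

C = 5        # lower cutoff on co-occurrence page counts (assumed)
N = 10**10   # assumed number of documents indexed by the search engine

def web_jaccard(hp, hq, hpq):
    return 0.0 if hpq <= C else hpq / (hp + hq - hpq)

def web_overlap(hp, hq, hpq):
    return 0.0 if hpq <= C else hpq / min(hp, hq)

def web_dice(hp, hq, hpq):
    return 0.0 if hpq <= C else 2 * hpq / (hp + hq)

def web_pmi(hp, hq, hpq):
    # log2 of observed vs. expected co-occurrence probability.
    return 0.0 if hpq <= C else math.log2((hpq / N) / ((hp / N) * (hq / N)))

print(web_jaccard(1000, 2000, 500))  # 0.2
```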
C. Lexical Pattern Clustering
Typically, a semantic relation can be expressed
using more than one pattern. For example, consider the two
distinct patterns "X is a Y" and "X is a large Y". Both these
patterns indicate that there exists an is-a relation between X
and Y. Identifying the different patterns that express the same
semantic relation enables us to represent the relation between
two words accurately. According to the distributional
hypothesis, words that occur in the same context have similar
meanings. The distributional hypothesis has been used in
various related tasks, such as identifying related words, and
extracting paraphrases. If we consider the word pairs that
satisfy (i.e., co-occur with) a particular lexical pattern as the
context of that lexical pair, then from the distributional
hypothesis, it follows that the lexical patterns which are
similarly distributed over word pairs must be semantically
similar.
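One simple realization of this idea — treating each pattern's distribution over the word pairs it co-occurs with as a vector and greedily clustering the vectors by cosine similarity — is sketched below; the single-pass strategy and the threshold are assumptions, and the paper's exact clustering algorithm may differ:

```python
import math

def cosine_sim(u, v):
    # Cosine of two sparse vectors stored as dicts.
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in set(u) | set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster_patterns(pattern_vectors, threshold=0.7):
    """Greedy clustering: a pattern joins the most similar existing
    cluster if the similarity exceeds threshold, else starts its own."""
    clusters = []
    for name, vec in pattern_vectors.items():
        best, best_sim = None, threshold
        for c in clusters:
            s = cosine_sim(vec, c["centroid"])
            if s > best_sim:
                best, best_sim = c, s
        if best is None:
            clusters.append({"centroid": dict(vec), "members": [name]})
        else:
            best["members"].append(name)
            for k, v in vec.items():
                best["centroid"][k] = best["centroid"].get(k, 0) + v
    return clusters

vecs = {
    "X is a Y":       {"cricket,sport": 3, "car,vehicle": 2},
    "X is a large Y": {"cricket,sport": 2, "car,vehicle": 3},
    "X and Y":        {"paris,france": 5},
}
print(len(cluster_patterns(vecs)))  # 2
```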
D. Measuring Semantic Similarity
The proposed method defines four co-occurrence measures using page
counts and shows how to extract clusters of lexical patterns from
snippets to represent the numerous semantic relations that exist
between two words. A machine learning approach then combines the
page-counts-based co-occurrence measures and the snippets-based
lexical pattern clusters to construct a robust semantic similarity
measure.
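A minimal sketch of this combination step, using scikit-learn's SVC as a stand-in for the paper's SMO-trained SVM; every feature value below is made up for illustration (each row holds hypothetical WebJaccard, WebOverlap, WebDice, and WebPMI scores for one word pair):

```python
from sklearn.svm import SVC

# Hypothetical training data: synonymous pairs (label 1) tend to have
# high co-occurrence scores, non-synonymous pairs (label 0) low ones.
X_train = [
    [0.80, 0.90, 0.85, 6.0],
    [0.70, 0.80, 0.75, 5.5],
    [0.90, 0.95, 0.90, 6.5],
    [0.10, 0.20, 0.15, 1.0],
    [0.05, 0.10, 0.08, 0.5],
    [0.20, 0.25, 0.20, 1.5],
]
y_train = [1, 1, 1, 0, 0, 0]

clf = SVC(kernel="rbf", gamma="scale")
clf.fit(X_train, y_train)

# Classify a new, unseen word pair from its four co-occurrence scores.
print(clf.predict([[0.75, 0.85, 0.80, 5.8]])[0])
```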
E. Context Aware Semantic Association Ranking
Context-Aware Semantic Association Ranking is applied
to enhance the results with ranking. Discovering complex
and meaningful relationships, which we call Semantic
Associations, is an important challenge [2]. Just as ranking of
documents is a critical component of today’s search results,
ranking of relationships will be essential in tomorrow's semantic
search results that would support discovery and mining of the
Semantic Web. Building upon the work proposed above, a framework is
proposed in which ranking techniques are used to identify more
interesting and more relevant Semantic Associations. These
techniques utilize alternative ways of specifying the context using
an ontology, which captures users' interests more precisely and
yields better-quality results in relevance ranking.
The four page-counts-based co-occurrence measures used in the
proposed method are defined as follows, where H(P) denotes the page
count for the query P, H(P ∩ Q) the page count for the conjunctive
query P AND Q, c a lower cutoff that suppresses noise from random
co-occurrences, and N the number of documents indexed by the search
engine:

WebJaccard(P, Q) = 0 if H(P ∩ Q) ≤ c,
                   H(P ∩ Q) / (H(P) + H(Q) − H(P ∩ Q)) otherwise.

WebOverlap(P, Q) = 0 if H(P ∩ Q) ≤ c,
                   H(P ∩ Q) / min{H(P), H(Q)} otherwise.

WebDice(P, Q) = 0 if H(P ∩ Q) ≤ c,
                2 H(P ∩ Q) / (H(P) + H(Q)) otherwise.

WebPMI(P, Q) = 0 if H(P ∩ Q) ≤ c,
               log2( (H(P ∩ Q)/N) / ((H(P)/N)(H(Q)/N)) ) otherwise.
IV. CONCLUSION
The World Wide Web is a huge, widely distributed, global information
service centre, and retrieving accurate information for users
through a search engine therefore faces many problems. Accurately
measuring the semantic similarity between words is an important
problem, and efficient estimation of semantic similarity between
words is critical for various natural language processing tasks such
as word sense disambiguation, textual entailment, and automatic text
summarization. In information retrieval, one of the main problems is
to retrieve a set of documents that is semantically related to a
given user query. For example, the word "apple" has two meanings:
one indicates the fruit and the other the Apple company. Retrieving
accurate information for such ambiguous words is challenging.
Existing systems have proposed architectures and methods to measure
semantic similarity between words using snippets, page counts, and a
two-class support vector machine. The proposed approach to computing
the semantic similarity between words or entities using text
snippets performs well, and Context-Aware Semantic Association
Ranking is applied to enhance the results with ranking.
ACKNOWLEDGMENT
The authors would like to thank the Editor-in-Chief, the
Associate Editor and anonymous Referees for their
comments.
REFERENCES
[1] A. Gledson and J. Keane, "Using Web-Search Results to Measure
Word-Group Similarity," Proc. 22nd Int'l Conf. Computational
Linguistics (COLING '08), pp. 281-288, Manchester, Aug. 2008.
[2] B. Aleman-Meza, C. Halaschek, I. B. Arpinar, and A. Sheth,
"Context-Aware Semantic Association Ranking," Semantic Web and
Databases Workshop Proc., Berlin, Sept. 7-8, 2003.
[3] D. Bollegala, Y. Matsuo, and M. Ishizuka, "Measuring Semantic
Similarity between Words Using Web Search Engines," Proc. Int'l
Conf. World Wide Web (WWW '07), pp. 757-766, 2007.
[4] D. Bollegala, Y. Matsuo, and M. Ishizuka, "A Web Search
Engine-Based Approach to Measure Semantic Similarity between
Words," IEEE Trans. Knowledge and Data Engineering, vol. 23,
no. 7, July 2011.
[5] E. Iosif and A. Potamianos, "Unsupervised Semantic Similarity
Computation between Terms Using Web Documents," IEEE Trans.
Knowledge and Data Engineering, vol. 22, no. 11, Nov. 2010.
[6] E. Agirre, E. Alfonseca, K. Hall, J. Kravalova, M. Pasca, and
A. Soroa, "A Study on Similarity and Relatedness Using
Distributional and WordNet-based Approaches," Proc. 2009 Ann.
Conf. North American Chapter of the ACL, pp. 19-27, Boulder,
Colorado, June 2009.
[7] F. Lamberti, A. Sanna, and C. Demartini, "A Relation-Based Page
Rank Algorithm for Semantic Web Search Engines," IEEE Trans.
Knowledge and Data Engineering, vol. 21, no. 1, Jan. 2009.
[8] G. Polcicova and P. Navrat, "Semantic Similarity in
Content-Based Filtering," ADBIS 2002, LNCS 2435, pp. 80-85,
Springer-Verlag, Berlin Heidelberg, 2002.
[9] S. A. Takale and S. S. Nandgaonkar, "Measuring Semantic
Similarity between Words Using Web Documents," (IJACSA) Int'l J.
Advanced Computer Science and Applications, vol. 1, no. 4,
Oct. 2010.
[10] A. Gledson and J. Keane, "Using Web-Search Results to Measure
Word-Group Similarity," Proc. Int'l Conf. Computational
Linguistics (COLING '08), pp. 281-288, 2008.
[11] J. Jiang and D. Conrath, "Semantic Similarity Based on Corpus
Statistics and Lexical Taxonomy," Proc. Int'l Conf. Research in
Computational Linguistics (ROCLING X), 1997.
[12] M. Strube and S. P. Ponzetto, "WikiRelate! Computing Semantic
Relatedness Using Wikipedia," Proc. Nat'l Conf. Artificial
Intelligence (AAAI '06), pp. 1419-1424, 2006.
[13] Z. Wu and M. Palmer, "Verb Semantics and Lexical Selection,"
Proc. Ann. Meeting Assoc. for Computational Linguistics
(ACL '94), pp. 133-138, 1994.
Ms P. Ilakiya is currently pursuing her
Master of Engineering in Software
Engineering at SNS College of
Technology, affiliated to Anna
University Chennai, Tamilnadu,
India. She received her B.E degree
from Arunai Engineering College,
affiliated to Anna University
Chennai. Her research interests
include data mining and
networking.