International Association of Scientific Innovation and Research (IASIR)
(An Association Unifying the Sciences, Engineering, and Applied Research)
International Journal of Emerging Technologies in Computational
and Applied Sciences (IJETCAS)
www.iasir.net
IJETCAS 14-446; © 2014, IJETCAS All Rights Reserved Page 407
ISSN (Print): 2279-0047
ISSN (Online): 2279-0055
ONTOLOGY BASED RANKING WEB DOCUMENTS USING
SEMANTIC SIMILARITY
M.Mahalaksmi1
R.Anusuya2
Dr.S.Srinivasan
Computer Science and Engineering
Anna University Madurai Regional, Chennai, Tamilnadu, INDIA.
Abstract: Many web search engines retrieve enormous amounts of irrelevant information in answer to users’
queries. The semantic web provides a promising approach to improving search. This paper shows how to measure
the closeness (relevancy) of retrieved web sites to the concepts of a user query and how to re-rank the sites
accordingly. To this end, the paper proposes a new relevancy measure for re-ranking retrieved documents. We
term the approach ‘‘ontology concepts’’ and apply it to the domain of electronic commerce. The results suggest
that the retrieved documents (web sites) can be re-ranked according to their relevancy to the search query. The
proposed method depends on the frequency of the ‘‘ontology concepts’’ in the retrieved documents and uses
this frequency to compute their relevancy.
Keywords: Ontology, Ontology concepts, Ranking, Semantic web, Electronic commerce
I. Introduction
The semantic web uses ontologies as a tool to capture the concepts of specific domains. As a result, computers can
process the data of those domains semantically. An ontology language can be used to generate class and property
descriptions based on their names, along with some axioms about them. Ontologies have many benefits. First,
they capture concepts, their properties, and their relationships. Second, they represent the domain data
semantically and define the knowledge embedded in the domain. Third, they can be used to analyze the
domain independently of any application requirements. Fourth, they support the vision of the next
generation of the WWW, the semantic web. Fifth, they can be used to build web data in a structured way.
One of the main challenges for search engines is to provide a good ranking for documents that are retrieved as
relevant to the user’s query [2]. Our approach uses the ontology to build a relevancy measure that checks how
close the content of a document is to the user query. The ‘‘ontology concepts’’ approach differs from a
‘‘keyword concepts’’ approach because it searches on the semantics of the user’s query, not merely on its
keywords. Ontology concepts and relations were used to define hyperlink relationships that indicate the
important entities, so that unimportant entities are less likely to be selected. The ontology concepts and their
frequencies are the key measures used to rank a specific document.
Figure 1 Methodology of building Ontologies.

M.Mahalaksmi et al., International Journal of Emerging Technologies in Computational and Applied Sciences, 8(5), March-May, 2014, pp.
407-410
II. Ranking method and search engine results
1. The ranking method
1.1. The first phase: building ‘‘ontology concepts’’
We split the methodologies for building ontologies around three major stages of the ontology life cycle:
Building, Manipulating, and Maintaining (see Fig. 1). These three stages overlap. Ellipses in Fig. 1
represent the inner steps of each stage. Building the ‘‘ontology concepts’’ is a prerequisite for using them
in the second phase.
The electronic commerce domain was selected for this research. The key motivation for this choice was the
increasing number of web documents that discuss electronic commerce. The common and most frequent terms in
specific domains have been pointed out in earlier work [3]. The input is a set of documents collected from several
sources, such as online reports, news, banking, teleconferences, and academic research. The extracted ‘‘ontology
concepts’’ for electronic commerce consisted of terms that are not only the most frequent but also have high
ontological relevance.
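The frequency criterion above can be illustrated with a short sketch. It is a minimal Python example under stated assumptions, not the authors' extraction tool: the tokenizer, the tiny stop-word list, and the names `candidate_concepts` and `top_k` are hypothetical. It only ranks terms by frequency across a domain corpus; the second criterion, high ontological relevance, is not given as an algorithm in the paper and is left here to a human reviewer.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def candidate_concepts(corpus, top_k=10):
    """Return the top_k most frequent non-stop-word terms in a domain corpus.

    The result is only a list of *candidate* ontology concepts; filtering
    for ontological relevance is assumed to be done manually afterwards.
    """
    counts = Counter()
    for text in corpus:
        # Assumption: a term is a lower-cased run of letters, digits or hyphens.
        words = re.findall(r"[a-z0-9-]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return [term for term, _ in counts.most_common(top_k)]


corpus = [
    "Online payment and online shopping",
    "Secure payment for e-commerce shopping",
    "E-commerce banking and payment",
]
print(candidate_concepts(corpus, top_k=1))  # ['payment']
```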
1.2. The second phase: using the ‘‘ontology concepts’’ to measure relevance
Documents/sites in the domain of interest (here, e-commerce) are retrieved using the specified search engines,
and the ranks of these documents are stored according to each search engine’s (e.g., Google or Yahoo) ordering.
This step is divided into two parts: the first converts the retrieved documents/sites into text format while saving
their original ranking; in the second, the retrieved documents are input into our algorithm, which gives each one a
new rank based on its ‘‘distance’’ from the ontology.
The ranks produced by this method were compared with those of the search engines. Domain experts also ranked
the documents in order of their relevancy to the user query. Only the first thirty documents were selected because
it was difficult to find domain experts to rank more. Moreover, the relevancy ordering would likely be
inaccurate after the first twenty.
For each document, the distance between its position under the proposed method and its original position is
calculated; this distance is the document’s ranking error. The average ranking error is the average of these
distances over all documents [8].
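One way to write this measure, assuming "distance" means the absolute difference between the two positions, is $E = \frac{1}{N}\sum_{d} \lvert r_o(d) - r_m(d) \rvert$, where $r_o(d)$ is the original (search engine) rank of document $d$, $r_m(d)$ its rank under the proposed method, and $N$ the number of documents compared. The Python sketch below computes it; the function name and the dictionary representation of a ranking are illustrative choices, and the exact formulation used in [8] may differ.

```python
def average_ranking_error(original_rank, new_rank):
    """Mean absolute difference between two rankings of the same documents.

    Both arguments map a document id to its 1-based rank.
    """
    if set(original_rank) != set(new_rank):
        raise ValueError("both rankings must cover the same documents")
    if not original_rank:
        raise ValueError("rankings must not be empty")
    total = sum(abs(original_rank[d] - new_rank[d]) for d in original_rank)
    return total / len(original_rank)


google = {"d1": 1, "d2": 2, "d3": 3}
ontology = {"d1": 2, "d2": 1, "d3": 3}
print(average_ranking_error(google, ontology))  # 0.6666666666666666
```

Here d1 and d2 each move by one position and d3 stays put, so the error is (1 + 1 + 0) / 3.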
Figure 2 Flow of the process
III. Procedure for Ranking method
The ranking method
Part one: Obtain the documents and their ranks
Step A:
Retrieve documents using search engines. The query ‘‘e-commerce’’ was used to retrieve the relevant web
documents.
Step B:
Save the first 30 (or any desired number) documents in text format. These are the data source for
testing.

Step C:
Save the original ranking of each document as retrieved by each search engine. Thus, the Nth document
returned by a search engine is given rank number N.
Here, the original ranks were saved for comparison with our measure.
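Steps B and C amount to keeping a bounded prefix of each engine's result list together with each document's position. A minimal Python sketch, assuming the documents have already been fetched and converted to plain text (that conversion is not shown); `RetrievedDoc` and `save_original_ranks` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass
class RetrievedDoc:
    engine: str         # search engine that returned the document, e.g. "Google"
    original_rank: int  # 1-based position in that engine's result list
    text: str           # document content after conversion to plain text


def save_original_ranks(engine, texts, limit=30):
    """Keep the first `limit` results; the Nth result receives rank N."""
    return [RetrievedDoc(engine, n, t) for n, t in enumerate(texts[:limit], start=1)]


docs = save_original_ranks("Google", ["first page text", "second page text"])
print([d.original_rank for d in docs])  # [1, 2]
```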
Part two: The ranking method is based on the ‘‘ontology concepts’’. The algorithm splits each document for
each search engine into words and computes the occurrences of these words in the proposed ontology concepts;
it then re-ranks these documents according to the number of occurrences.
IV. Procedure for Re-ranking method
This procedure will be run separately for each search engine.
Step A:
For each text document, store its words in an array. Read the text files to divide each document into words,
then store the words in a string array called split.
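A minimal sketch of Step A in Python, assuming a "word" is a lower-cased run of letters, digits, or hyphens (the paper does not define its tokenizer); `split_words` is a hypothetical name for the routine that fills the `split` array:

```python
import re


def split_words(text):
    """Step A: divide a document into words (the `split` array).

    Assumption: words are lower-cased runs of letters, digits or hyphens,
    so "E-commerce" stays a single token.
    """
    return re.findall(r"[a-z0-9-]+", text.lower())


print(split_words("E-commerce: online Payment, online!"))
# ['e-commerce', 'online', 'payment', 'online']
```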
Step B:
Store only one occurrence of each word in an array. Remove repeated words from each document and store
each word once, without its frequency, in a string array called unique Split.
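Step B can be sketched in Python with a dictionary, which drops repeated words while keeping the order in which words first appear (order preservation is a choice made here; the paper does not require it). `unique_split` is a hypothetical name:

```python
def unique_split(split):
    """Step B: keep one occurrence of each word (the `unique Split` array).

    dict.fromkeys keeps the first occurrence of each key, in insertion order.
    """
    return list(dict.fromkeys(split))


print(unique_split(["online", "payment", "online", "shopping"]))
# ['online', 'payment', 'shopping']
```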
Step C:
Eliminate the stop words and stem the remaining words with the Porter stemming algorithm. Store the stop words
in an array and remove them from each document, so that they are ignored during the comparison process.
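The stop-word part of Step C can be sketched as a simple filter. The stop-word list below is a hypothetical placeholder, and stemming is not shown; if stemming is wanted, a Porter stemmer implementation such as NLTK's `PorterStemmer` could be applied to the remaining words, and the ontology concepts would then need the same stemming for the comparison in Step D to match.

```python
# Hypothetical stop-word list; a real system would use a much longer one.
STOP_WORDS = frozenset({"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"})


def remove_stop_words(words, stop_words=STOP_WORDS):
    """Step C: drop stop words so they are ignored during the comparison."""
    return [w for w in words if w not in stop_words]


print(remove_stop_words(["the", "online", "payment", "of", "goods"]))
# ['online', 'payment', 'goods']
```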
Step D:
Determine the ‘‘ontology concepts’’ for each document. Words in the unique Split array of each document are
compared with the words of the ‘‘ontology concepts’’. Store only the words in the document that are included as
‘‘ontology concepts’’.
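Step D reduces to a set-membership test. In this Python sketch the ontology concepts are assumed to be single lower-cased words; multi-word concepts such as "electronic commerce" would not be matched by a word-level comparison and would need phrase matching. `ontology_concepts_in` is a hypothetical name:

```python
def ontology_concepts_in(unique_words, ontology_concepts):
    """Step D: keep only the document words that are also ontology concepts."""
    concepts = set(ontology_concepts)  # set lookup is O(1) per word
    return [w for w in unique_words if w in concepts]


concepts = {"payment", "online", "shopping", "e-commerce"}
print(ontology_concepts_in(["online", "payment", "weather"], concepts))
# ['online', 'payment']
```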
Step E:
Count the frequency of ‘‘ontology concepts’’ for each document.
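The term frequency of an ontology concept $t$ in document $d$ is
$$\mathrm{tf}(t,d) = \frac{f(t,d)}{\max_{t'} f(t',d)}$$
where $f(t,d)$ is the frequency of concept $t$ in $d$ and $\max_{t'} f(t',d)$ is the frequency of the most
repeated ontology concept in $d$.
The inverse document frequency of $t$ is
$$\mathrm{idf}(t) = \log \frac{D}{\lvert \{ d \in \text{web doc set} : t \in d \} \rvert}$$
where $D$ is the total number of documents in the web doc set.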
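The following Python sketch computes these quantities, taking $\mathrm{tf}(t,d) = f(t,d) / \max_{t'} f(t',d)$ over ontology concepts only and $\mathrm{idf}(t) = \log(D / n_t)$, where $n_t$ is the number of documents containing concept $t$. The paper does not state how tf and idf are combined into a single document score, so `tfidf_score`, which sums $\mathrm{tf} \times \mathrm{idf}$ over the concepts a document contains, is an assumption; all function names are hypothetical. Counting uses the full word list of a document (after stop-word removal), since the unique-word array no longer carries frequencies.

```python
import math
from collections import Counter


def concept_tf(words, concepts):
    """tf(t, d) = f(t, d) / max_t' f(t', d), counted over ontology concepts only."""
    counts = Counter(w for w in words if w in concepts)
    if not counts:
        return {}
    max_f = max(counts.values())  # frequency of the most repeated concept in d
    return {t: f / max_f for t, f in counts.items()}


def concept_idf(docs, concepts):
    """idf(t) = log(D / n_t) for each concept that occurs in at least one document."""
    D = len(docs)
    doc_sets = [set(words) for words in docs]
    idf = {}
    for t in concepts:
        n_t = sum(1 for s in doc_sets if t in s)
        if n_t:
            idf[t] = math.log(D / n_t)
    return idf


def tfidf_score(words, docs, concepts):
    """Hypothetical document score: sum of tf * idf over the concepts it contains."""
    tf = concept_tf(words, concepts)
    idf = concept_idf(docs, concepts)
    return sum(tf[t] * idf.get(t, 0.0) for t in tf)


docs = [["payment", "payment", "online"], ["payment", "shopping"], ["weather"]]
concepts = {"payment", "online", "shopping"}
print(concept_tf(docs[0], concepts))  # {'payment': 1.0, 'online': 0.5}
```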
Step F:
Re-rank the documents according to their frequency. Using the array of ontology-concept frequencies (frequency of
Exist Term), give rank one to the document with the highest frequency, rank two to the document with the
second-highest frequency, and so on.
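Step F is a descending sort on the per-document frequency. In this Python sketch `doc_scores` is a hypothetical mapping from document id to the value computed in Step E (a raw concept count or a tf-idf score); ties keep their input order because Python's `sorted` is stable even with `reverse=True`.

```python
def rerank(doc_scores):
    """Step F: rank 1 for the highest frequency, rank 2 for the next, and so on."""
    ordered = sorted(doc_scores, key=lambda d: doc_scores[d], reverse=True)
    return {doc: rank for rank, doc in enumerate(ordered, start=1)}


print(rerank({"d1": 3, "d2": 7, "d3": 3}))
# {'d2': 1, 'd1': 2, 'd3': 3}
```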
V. Implementation and Results
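Evaluation metrics are used to assess the re-ranking of the documents. After the documents are re-ranked according
to their frequency, the performance is evaluated using precision and recall, which are calculated using the
following formulas:
$$\text{Precision} = \frac{\lvert \text{relevant} \cap \text{retrieved} \rvert}{\lvert \text{retrieved} \rvert}, \qquad \text{Recall} = \frac{\lvert \text{relevant} \cap \text{retrieved} \rvert}{\lvert \text{relevant} \rvert}$$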
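The standard definitions of precision and recall can be sketched in Python as follows. The set of relevant documents is an assumption here (for example, documents that the domain experts judged relevant); the paper does not state how it was built. The `_at_k` variant, a common way to evaluate only the top of a ranking, is an addition for illustration and not necessarily how the reported results were computed.

```python
def precision_recall(retrieved, relevant):
    """Precision = |relevant & retrieved| / |retrieved|;
    recall = |relevant & retrieved| / |relevant|."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def precision_recall_at_k(ranking, relevant, k):
    """Evaluate only the first k documents of a ranking (best first)."""
    return precision_recall(ranking[:k], relevant)


print(precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"]))
# (0.5, 0.6666666666666666)
```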

Figure 3 Fairness distance evaluation graph
In the resulting curves, the blue curve shows the average difference between each document’s position in Google
and its position under our re-ranking method. The pink curve shows the average difference between each
document’s position under our method and its position according to the three experts.
VI. Conclusions
We have proposed a new approach, the use of ‘‘ontology concepts’’, as a relevancy measure to re-rank retrieved
web documents, and we showed its value in the electronic commerce domain. The re-ranking of documents
enhanced their relevancy. Our results showed that the average ranking error of our method was lower than that of
several search engines.
VII. References
[1] A. Kayed, R. Colomb, Extracting ontological concepts for tendering conceptual structures, Data and Knowledge
Engineering 40 (1), 2002, pp. 71–89.
[2] A. Kayed, N. Hirzallah, L. Al-Shalabi, M. Najjar, Building ontological relationships: a new approach, Journal of the
American Society for Information Science and Technology (ISSN: 1532-2882), John Wiley & Sons Inc., 2008,
pp. 1801–1809.
[3] L. Ding, R. Pan, T. Finin, A. Joshi, Y. Peng, P. Kolari, Finding and ranking knowledge on the semantic web, in:
Proceedings of the 4th International Semantic Web Conference, 2005, pp. 156–170.
[4] H. Alani, C. Brewster, Ontology ranking based on the analysis of concept structures, Dept. of Electronics &
Computer Science, University of Southampton, UK, and Dept. of Computer Science, University of Sheffield, UK.
[5] R. Ozcan, Y. A. Aslandogan, Concept based information access using ontologies and latent semantic analysis.
[6] S. M. Patil, D. M. Jadhav, Semantic search using ontology and RDBMS for cricket, Information Technology
Department, BVCOE, Navi Mumbai, and Information Technology Department, PIIT, New Panvel, Maharashtra, India.
[7] S. Peroni, E. Motta, M. d’Aquin, Identifying key concepts in an ontology, through the integration of cognitive
principles with statistical and topological measures, Knowledge Media Institute, The Open University, Milton
Keynes, United Kingdom.
[8] A. Kayed, E. El-Qawasmeh, Z. Qawaqneh, Ranking web sites using domain ontology concepts, ScienceDirect, 2010.
