Enriching search results using ontology
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 2, March – April (2013), pp. 500-507, © IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2013): 6.1302 (Calculated by GISI), www.jifactor.com

Shobha B. Patil (1), S. K. Shirgave (2)
(1) ME student, D. Y. Patil College of Engineering & Technology, Kolhapur, Maharashtra, India.
(2) Associate Professor, Dept. of I.T., D.K.T.E. Society's Textile & Engineering Institute, Ichalkaranji, Maharashtra, India.

ABSTRACT
The content of the World Wide Web grows every day. Keyword based search is used for finding documents that are relevant to a search query. Because of the tremendous amount of information available on the internet, it becomes very difficult to get relevant documents by using keyword based search alone. The search results are based solely on the frequency of keywords, without any extra intelligence. The meaning of the keywords is not considered in traditional search techniques.
It is possible to increase the rate of relevant documents by using ontology, which allows sophisticated semantic search. An ontology is a representation of the concepts in a domain of interest and their relationships. This paper presents a method for enriching search result documents with documents that are semantically relevant to the search query. In existing systems, the ranking of documents in a search result is determined by the pages having the highest frequency of the queried words. The method proposed in this paper also describes how to rank documents in search results according to their semantic relevance to the search query.

Keywords: Ontology, OWL, OWLAPI, Protégé.

1. INTRODUCTION
The World Wide Web is one of the fastest growing areas of information. There is a dynamic and explosive growth of information on the internet every day. Traditionally, keyword based search is used for finding documents that are relevant to a search query. Keyword based search uses the concept of TF-IDF to find relevant documents. Term Frequency Inverse Document Frequency (TF-IDF) determines which words in a set of documents might be more relevant to use in a query. TF-IDF assigns each word in a document a value that grows with the frequency of the word in that document and shrinks with the proportion of documents in which the word appears. Words with a high TF-IDF value have a strong relationship with the document they appear in, suggesting that if such a word appears in a query, the document could be relevant to the search query terms.
Because of the tremendous amount of information available on the internet, it becomes very difficult to get relevant documents by using keyword based search alone. Current keyword based search returns only a few relevant documents and a lot of irrelevant documents. The search results are based solely on the frequency of keywords, without any extra intelligence.
Ontology can help to find documents that are semantically relevant to a search query. An ontology describes concepts and their relationships in a domain of interest. Ontologies are structural frameworks for organizing information. An ontology is a "formal, explicit specification of a shared conceptualization". Various tools are available to design, implement and maintain ontologies, such as Protégé, OntoEdit and SWOOP. Ontology languages such as RDF and OWL allow users to write explicit, formal conceptualizations of domain models. OWL builds on RDF and RDF Schema, and uses RDF's XML syntax.
The method proposed in this paper enriches keyword based search by using ontology. The documents returned by keyword based search are augmented with the documents obtained by mapping the search query to the ontology.
All documents in the search result are ranked by considering both the frequency of keywords in each document and the ontology. Documents that are semantically relevant to the search query are given more weight.
The organization of this paper is as follows: Section 2 defines the terminology used in this paper. Section 3 focuses on related work. Section 4 discusses the architecture and implementation of the proposed method. Section 5 presents results and an evaluation of the proposed method. Finally, Section 6 presents the conclusion of this paper.

2. TERMINOLOGIES
2.1 Ontology:
An ontology is a "formal, explicit specification of a shared conceptualization". It is an explicit representation of the concepts of some domain of interest, with their characteristics and their relationships. An ontology represents knowledge in terms of concepts defined by classes, properties and individuals, and a set of axioms that assert how those concepts are to be interpreted.
2.2 Domain ontology:
A domain ontology (or domain-specific ontology) models a specific domain, which represents part of the world. A domain ontology provides the particular meanings of terms as they apply to that domain.
2.3 OWL:
Web Ontology Language (OWL) is one of the knowledge representation languages for creating, manipulating and processing ontologies. OWL is used to describe the classes, and the relations between them, that are inherent in web documents and applications.
2.4 OWL API:
There are currently three variations of OWL: OWL Full, OWL DL and OWL Lite. The OWL API is targeted primarily at representing OWL DL. OWLAPI is a Java-based API for semantic web ontologies.
2.5 Search terms:
Search terms consist of one or more keywords, which represent the information needed by the user.
2.6 Protégé:
Protégé is a free, open-source ontology editor and knowledge-based framework, produced by Stanford University. It is a tool that enables the construction of domain ontologies and of customized data entry forms for entering data. Protégé allows the definition of classes, class hierarchies, variables, variable-value restrictions, and the relationships between classes and the properties of these relationships.

3. LITERATURE REVIEW
Traditionally, in the keyword based search technique, documents are represented using the vector space model. The TF/IDF method is used to weight terms by their frequency within and across documents. The documents having a high TF/IDF value are returned as relevant documents.
As the amount of information available from various information sources such as the World Wide Web increases dynamically, it becomes more difficult to find relevant information in large information spaces.
In this situation, an ontology that describes concepts and their relationships in a domain of interest can help, since the terms provided by the ontology can guide novices within a specific domain or people who are not familiar with searching. Ontology can also be utilized for effective navigation.
Ontology is referred to as the explicit and formal specification of a shared conceptualization. As an engineering artifact, it consists of terms and relationships that describe a certain reality, plus a set of explicit assumptions regarding the intended meaning of the vocabulary. Ontology can serve as background knowledge, and it can help users refine search results from domains that they are not familiar with. Constructing a formal ontology generally relies on an interactive process that makes knowledge explicit and formalizes it.
An overview of some editing tools for ontology is given by Escórcio and Cardoso. The comparative study of ontology building tools published in IJWesT reviews freely available ontology management tools (Protégé 3.4, Apollo, IsaViz and SWOOP) in terms of: a) interoperability, b) openness, c) ease of update and maintenance, d) market status and penetration.
The method presented in this paper uses keyword based search (TF/IDF data) and ontology to provide the documents that are most relevant to the search query.

4. PROPOSED METHOD
Fig. 1 The structure of proposed method.
[Figure 1 blocks: Search query; Document searching with TF/IDF data; Ontology; Enriched search result documents; Ranking module; Ranked documents]

The implementation of the method is divided into five modules:
A. Extraction of terms from documents
B. TF/IDF data
C. Ontology
D. Enriching search result using ontology
E. Ranking the result

A. Extraction of terms from documents:
All HTML files from the web site are processed to obtain terms in the following way.
1. Tokenization: the process of splitting a string containing text into individual tokens.
2. Stop words are removed from the obtained terms.
3. Stemming is applied to the terms.
Stemming is the reduction of words to abbreviated word roots, which allows similar words to be compared easily for equality.

B. TF/IDF data:
1. Vector space model:
The vector space model represents each document as a vector with one entry per term. If term j appears k times in document i, the document vector for i contains the value k in position j. The document vector for i contains the value 0 in positions corresponding to terms that do not appear in document i.
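As an illustration, module A and the vector space model described above could be sketched as follows. This is a minimal sketch, not the authors' implementation: the stop-word list, the toy suffix-stripping `stem` function and all function names are assumptions made for the example, and a real system would use a full stemmer and stop-word list.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to"}


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Toy suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def extract_terms(text):
    """Tokenize, drop stop words, then stem the remaining tokens."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def term_count_vectors(docs):
    """Return (vocabulary, vectors); vectors[i][j] counts term j in document i."""
    term_lists = [extract_terms(d) for d in docs]
    vocabulary = sorted({t for terms in term_lists for t in terms})
    vectors = []
    for terms in term_lists:
        counts = Counter(terms)  # missing terms count as 0
        vectors.append([counts[t] for t in vocabulary])
    return vocabulary, vectors


docs = ["The film and the music of films", "Dancing to music"]
vocabulary, vectors = term_count_vectors(docs)
print(vocabulary)
print(vectors)
```

Each row of `vectors` is one document vector, with one position per vocabulary term and 0 where the term does not occur.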
The value for a term in a document vector is its term frequency (TF), i.e. the number of occurrences of that term in the given document.
2. TF/IDF data:
TF is the term frequency, which is taken from the vector space model.
IDF (inverse document frequency): the IDF of a term j is defined as
$\mathrm{IDF}_j = \log(N / n_j)$
where N is the total number of documents and n_j is the number of documents in which term j appears.
The document vector is refined by using the weight w_ij:
$w_{ij} = t_{ij} \times \mathrm{IDF}_j$
The value associated with term j in the document vector for document i, denoted w_ij, is obtained by multiplying the term frequency t_ij by the IDF of term j in the document collection. IDF effectively increases the weight given to rare terms.

C. Ontology:
The domain ontology is designed by using the Protégé tool. Protégé ontologies can be exported into a variety of formats including RDF(S), OWL, and XML Schema. Concepts and their relationships are modeled as an ontology using Protégé and stored in an OWL file.

D. Enriching search result using ontology:
1. TF/IDF data is used to return the documents relevant to the search query. The text of the search query is tokenized to get terms. Stop words are then removed from the query terms, and stemming is applied to get stemmed terms. The resulting terms are looked up in the TF/IDF data to get the relevant documents containing the search query terms.
2. OWLAPI is used to map the search query to the domain ontology. The text of the search query is tokenized to get terms, and stop words are removed from the terms. Each class of the ontology is retrieved and checked against the search terms. If a class is present in the search query, its subclasses are retrieved from the ontology, and the web pages related to those subclasses are added to the relevant documents.
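The TF/IDF weighting of module B and the two enrichment steps of module D could be sketched as below. This is only a sketch under stated assumptions. The paper uses OWLAPI (a Java library) on an OWL file, whereas here the ontology is replaced by two hypothetical dictionaries, `SUBCLASSES` and `PAGES`. The function names, the sample data, the natural logarithm and the restriction to direct subclasses are all assumptions made for the example.

```python
import math


def tfidf_weights(term_counts):
    """term_counts: {doc: {term: count}} -> {doc: {term: t_ij * IDF_j}}."""
    n_docs = len(term_counts)
    doc_freq = {}
    for counts in term_counts.values():
        for term in counts:
            doc_freq[term] = doc_freq.get(term, 0) + 1
    # Natural log is an assumption; the text does not fix the base.
    return {
        doc: {t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        for doc, counts in term_counts.items()
    }


def keyword_search(query_terms, weights):
    """Step D.1: documents whose TF/IDF data contains a query term."""
    return {d for d, w in weights.items() if any(t in w for t in query_terms)}


# Hypothetical stand-in for the OWL file: class -> direct subclasses,
# and subclass -> related web pages.
SUBCLASSES = {"music": ["jazz", "opera"], "film": ["documentary"]}
PAGES = {"jazz": ["jazz.html"], "opera": ["opera.html"], "documentary": ["docu.html"]}


def ontology_search(query_terms, subclasses, pages):
    """Step D.2: pages of subclasses of every class named in the query."""
    found = set()
    for cls, subs in subclasses.items():
        if cls in query_terms:
            for sub in subs:
                found.update(pages.get(sub, []))
    return found


def enriched_search(query_terms, weights, subclasses, pages):
    """Map each result document to the module that produced it."""
    result = {d: "tfidf" for d in keyword_search(query_terms, weights)}
    for d in ontology_search(query_terms, subclasses, pages):
        result[d] = "ontology"  # record ontology hits for the ranking step
    return result


term_counts = {
    "music.html": {"music": 3, "film": 1},
    "dance.html": {"dance": 2},
    "film.html": {"film": 2},
}
weights = tfidf_weights(term_counts)
enriched = enriched_search(["music"], weights, SUBCLASSES, PAGES)
print(enriched)
```

Keeping the source of each document ("tfidf" or "ontology") alongside the result makes it possible for a later ranking step to treat the two kinds of hit differently.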

E. Ranking of documents in the search result:
The documents returned by semantic search are ranked by using cosine similarity. The cosine similarity between each document in the result and the query terms is calculated.
$$\mathrm{sim}(d_j, q) = \frac{\sum_{i=1}^{N} W_{i,j} \, W_{i,q}}{\sqrt{\sum_{i=1}^{N} W_{i,j}^{2}} \; \sqrt{\sum_{i=1}^{N} W_{i,q}^{2}}}$$
where
d_j: document from user profile
q: query document
N: number of terms
W_{i,j}: weight of the ith term in document j (term frequency)
W_{i,q}: weight of the ith term in document q (term frequency)
The documents obtained from ontology mapping are given higher priority, whereas the documents obtained from searching the TF/IDF data are given second priority. The resulting cosine similarity is boosted by 0.9 if the document was obtained from ontology mapping, and by 0.7 if it was obtained from the TF/IDF data. The resulting web pages are then sorted in descending order of boosted similarity.

5. RESULTS AND EVALUATION
The results of this system are evaluated on the following metrics:
1. Precision:
Precision is the ratio of the number of relevant documents retrieved to the total number of documents retrieved.
2. Recall:
Recall is the ratio of the number of relevant documents retrieved to the total number of relevant documents in the collection.
3. F1-measure:
$$F_1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$

The following search queries are tested for one web site:
Q1: "film music"
Q2: "film magic"
Q3: "poetry literature"
Q4: "dance"
Q5: "comedy magic"

Table 1: Performance of document searching using TF/IDF data
Query   Precision   Recall   F1-measure
Q1      0.67        0.45     0.54
Q2      1           0.5      0.67
Q3      0.83        0.34     0.47
Q4      0.83        0.62     0.71
Q5      1           0.28     0.48
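The ranking rule of module E and the three metrics defined above could be sketched together as follows. This is a hedged sketch rather than the authors' code: the function names and sample vectors are hypothetical, documents are represented as sparse dictionaries of term weights, and applying the 0.9 and 0.7 boosts by multiplication is an assumption, since the text only says the similarity is "boosted by" these values.

```python
import math

# Boost values from the text; applying them by multiplication is an assumption.
BOOST = {"ontology": 0.9, "tfidf": 0.7}


def cosine_similarity(doc_vec, query_vec):
    """Cosine similarity of two sparse term-weight vectors given as dicts."""
    dot = sum(w * query_vec.get(t, 0.0) for t, w in doc_vec.items())
    norm_d = math.sqrt(sum(w * w for w in doc_vec.values()))
    norm_q = math.sqrt(sum(w * w for w in query_vec.values()))
    if norm_d == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_d * norm_q)


def rank(doc_vectors, sources, query_vec):
    """Sort document ids by boosted cosine similarity, highest first."""
    scored = [
        (BOOST[sources[d]] * cosine_similarity(vec, query_vec), d)
        for d, vec in doc_vectors.items()
    ]
    return [d for _, d in sorted(scored, reverse=True)]


def precision_recall_f1(retrieved, relevant):
    """Precision, recall and F1-measure for a list of retrieved documents."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical sample data for illustration only.
doc_vectors = {
    "a.html": {"film": 1, "music": 1},
    "b.html": {"film": 1, "music": 1},
    "c.html": {"film": 2},
}
sources = {"a.html": "tfidf", "b.html": "ontology", "c.html": "tfidf"}
query_vec = {"film": 1, "music": 1}

ranking = rank(doc_vectors, sources, query_vec)
p, r, f1 = precision_recall_f1(
    ["a.html", "b.html"], ["a.html", "b.html", "d.html", "e.html"]
)
print(ranking, p, r, round(f1, 2))
```

In this sample, `a.html` and `b.html` have the same raw similarity, so the boost alone places the ontology hit `b.html` first. With precision 1 and recall 0.5, the F1-measure is about 0.67, which agrees with the Q2 row of Table 1.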

Table 2: Performance of proposed method
Query   Precision   Recall   F1-measure
Q1      0.75        0.67     0.70
Q2      0.75        0.75     0.75
Q3      0.86        0.8      0.88
Q4      0.86        0.75     0.80
Q5      0.86        0.86     0.86

Fig. 2 Graph of existing method and proposed method for F1-measure
[Figure 2: F1-measure (0 to 1) for queries Q1 to Q5; series: Existing method, Proposed method]

6. CONCLUSION
A keyword based system for searching documents returns only a few relevant documents for the search query. The search results are based only on the frequency of keywords, without any extra intelligence.
The method proposed in this paper enriches keyword based search by using ontology. Ontology can help to find documents that are semantically relevant to the search query. A keyword based search system with ontology gives more relevant documents for the search query.

REFERENCES:
Webpage: http://www.en.wikipedia.org/ontology
Escórcio, L. and Cardoso, J., "Editing Tools for Ontology Construction", in "Semantic Web Services: Theory, Tools and Applications", Idea Group, 2007.
Grigoris Antoniou and Frank van Harmelen, "Web Ontology Language: OWL".
Web Ontology Language (OWL): http://www.w3.org/2004/OWL/
"A Comparative Study Ontology Building Tools for Semantic Web Applications", International journal of Web & Semantic Technology (IJWesT), Vol. 1, Num. 3, July 2010.
http://protege.stanford.edu
Groth, K., Lannerö, P., "Context browser: ontology based navigation in information spaces". Proc. 1st Int. Conf. on information interaction in Context, IIiX: Vol. 176, p. 75-78, 2006.
Schroeder, M., Burger, A., Kostkova, P., Stevens, R., Habermann, B., Dieng-Kuntz, R., "Sealife: A Semantic Grid Browser for the Life Sciences Applied to the Study of Infectious Diseases". Vol. 120, p. 167-178, 2006.
Bonomi, A., Mosca, A., Palmonari, M., Vizzari, G., "Integrating a Wiki in an Ontology Driven Web Site: Approach, Architecture and Application in the Archaeological Domain". 3rd Semantic Wiki Workshop, 2008.
Book: "Database management systems", McGraw Hill, 3rd edition, by Raghu Ramakrishnan, Gehrke.
Zhuhadar, L., Nasraoui, O., Wyatt, R., "Dual representation of the semantic user profile for personalized web search in an evolving domain". In: Proceedings of the AAAI 2009 Spring Symposium on Social Semantic Web, Where Web 2.0 meets Web 3.0 (2009), 84-89.
Ahu Sieg, Bamshad Mobasher, Robin Burke, "Learning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search", IEEE Int. Informatics Bulletin, Nov. 2007.
Anna Huang, "Similarity Measures for Text Document Clustering", NZCSRSC 2008, April 2008, Christchurch, New Zealand.
C. Santhosh Kumar, D. Palanikkumar, "Dynamic Customization In The Business Process Service Composition Using Ontology", International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 2, 2012, pp. 138-149, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
Vinu P.V., Sherimon P.C., Reshmy Krishnan, "Development Of Seafood Ontology For Semantically Enhanced Information Retrieval", International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 1, 2012, pp. 154-162, ISSN Print: 0976-6367, ISSN Online: 0976-6375.