International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 1, January-February (2013), pp. 182-194, © IAEME: www.iaeme.com/ijcet.asp, Journal Impact Factor (2012): 3.9580 (Calculated by GISI), www.jifactor.com

A SURVEY ON VARIOUS ARCHITECTURES, MODELS AND METHODOLOGIES FOR INFORMATION RETRIEVAL

Prakasha S (sprakashjpg@yahoo.co.in), Shashidhar HR (shashi_dhara@yahoo.com), Dr. G T Raju (gtraju1990@yahoo.com), RNSIT, Bengaluru 560098

ABSTRACT

The typical Information Retrieval (IR) model of the search process consists of three essentials: the query, the documents and the search results. A user looking to fulfill an information need has to formulate a query, usually consisting of a small set of keywords summarizing that need. The goal of an IR system is to retrieve documents containing information which might be useful or relevant to the user. Throughout the search process there is a loss of focus, because keyword queries entered by users often do not suitably summarize their complex information needs, and IR systems do not sufficiently interpret the contents of documents, leading to result lists containing irrelevant and redundant information. The short keyword query used as input to the retrieval system can be supplemented with topic categories from structured Web resources. The topic categories can be used as query context to retrieve documents that are not only relevant to the query but also belong to a relevant topic category. Category information is especially useful for the task of entity ranking, where the user is searching for a certain type of entity such as companies or persons. Category information can help to improve the search results by promoting in the ranking pages belonging to relevant topic categories, or to categories similar to the relevant ones. Users may raise various queries to describe the same information need. For example, to search for the National Board of Accreditation, the queries "National Board of Accreditation (NBA)" or "NB Accreditation" may be formulated. Directly using individual queries to describe context cannot capture contexts concisely and accurately. Ambiguous queries may also arise: "NBA" can expand to either "National Basketball Association" or "National Board of Accreditation". Hence it becomes extremely important to build a context-based query from the user's history and the user's present requirements in that context. In this paper, an extensive survey has been made of the different architectures, models and methodologies that have been used in IR by various researchers, along with a comparison of their results against various performance metrics, also highlighting the need for context-based queries.
Keywords: Query Model, Ranking Model, Feedback Model, Retrieval Model, Query Context

1. INTRODUCTION

Given the constantly increasing information overflow of the digital age, the importance of IR has become critical. Web search is one of the most challenging problems of the Internet today, striving to provide users with search results most relevant to their information needs. IR deals with the representation, storage, organization of, and access to information items such as documents, Web pages, online catalogues, structured and semi-structured records, and multimedia objects [Baeza-Yates and Ribeiro-Neto, 2011].

Web search engines are by far the most popular and heavily used IR applications. The next step in the search process is to translate the information need into a query, which can be easily processed by the search engine. The primary goal of an IR system is to retrieve all the documents which are relevant to a user query while retrieving as few non-relevant documents as possible. To achieve this goal IR systems must somehow interpret the contents of the documents in a collection, and rank them according to a degree of relevance to the user query. The interpretation of a document involves extracting syntactic and semantic information from the document and using this information to match the user information need. The notion of relevance is at the centre of IR. While for simple navigational information needs the search process is straightforward, for more complex information needs we need focused retrieval methods. Focused retrieval can be defined as providing more direct access to relevant information by locating the relevant information inside the retrieved documents [Trotman et al., 2007].

The first element of the search process is the query. In an ideal situation this short keyword query is a suitable summarization of the information need, and the user will only have to inspect the first few search results to fulfill his information need. To overcome the shallowness of the query, i.e., users entering only a few keywords that poorly summarize the information need, we add context to the query to focus the search results on the relevant context. We define context as all available information about the user's information need besides the query itself. Different forms of context can be considered to implicitly or explicitly gather more information on the user's search request. Potential forms of query context are document relevance and category information.

The second element of search we examine is the documents. Documents on the Web are rich in structure. Documents can contain HTML structure, link structure, different types of classification schemes, etc. Most of the structural elements, however, are not used consistently throughout the Web. A key question is how to deal with all this (semi-)structured information, that is, how IR systems can interpret these documents to reduce the shallowness in the document representation.

A problem in Web search is the large amount of redundant and duplicate information on the Web. Web pages can have many duplicates or near-duplicates. Web pages containing redundant information can be hard to recognize for a search engine, but users easily recognize redundant information, and it will usually not help them in their search.
Most structured Web resources have organized their information in such a way that they do not contain, or significantly reduce, redundant information [Anna Maria Kaptein 2011].

Structured resources provide two interesting opportunities: documents categorized into a category structure, and the absence of redundant information. Category information is of vital importance to a special type of search, namely entity ranking. Entity ranking is the task of finding documents representing entities of an appropriate entity type that are relevant to a query. Entities can be almost anything, from broad categories such as persons, locations and organizations to more specific types such as churches, science-fiction writers or CDs. Searchers looking for entities are arguably better served by a ranked list of entities directly, rather than a list of Web pages with relevant but also potentially redundant information about these entities. Category information can be used to favor pages belonging to appropriate entity types [Anna Maria Kaptein 2011].
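The promotion of pages from relevant topic categories can be illustrated with a small re-ranking routine. The following Python sketch is purely illustrative and rests on assumptions of our own (the function names, the weight alpha and the overlap measure are not taken from any of the surveyed systems): each document's retrieval score is boosted in proportion to the overlap between its categories and the categories judged relevant for the query.

def category_overlap(doc_categories, relevant_categories):
    """Fraction of the document's categories that are also relevant to the query."""
    if not doc_categories:
        return 0.0
    return len(set(doc_categories) & set(relevant_categories)) / len(set(doc_categories))

def rerank_by_category(results, relevant_categories, alpha=0.3):
    """results: list of (doc_id, base_score, doc_categories) tuples.
    Returns the list re-sorted after adding a category boost to each base score."""
    boosted = []
    for doc_id, base_score, doc_categories in results:
        boost = alpha * category_overlap(doc_categories, relevant_categories)
        boosted.append((doc_id, base_score + boost))
    return sorted(boosted, key=lambda pair: pair[1], reverse=True)

# Toy usage: documents retrieved for "NBA" with different topic categories.
results = [
    ("d1", 0.82, ["Basketball", "Sports leagues"]),
    ("d2", 0.80, ["Accreditation", "Higher education in India"]),
    ("d3", 0.64, ["Accreditation"]),
]
print(rerank_by_category(results, relevant_categories=["Accreditation", "Education"]))

With this toy data the two accreditation pages are promoted above the higher-scoring basketball page, which is exactly the behaviour that category context is intended to produce.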
Search intent and context are important criteria in catering to the user's query. Suppose a user raises the query "apple": it is hard to determine the user's search intent, that is, whether the user is interested in the history of Apple Inc. or in the fruit. Without looking at the context of the search, the existing methods often suggest many queries for the various possible intents, and thus achieve a low accuracy in query suggestion. The query context, which consists of the search intent expressed by the user's recent queries, can help to better understand the search intent and make more meaningful suggestions.

2. DIFFERENT MODELS USED IN IR

For effectively retrieving relevant documents by IR strategies, the documents are typically transformed into a suitable representation. Each retrieval strategy incorporates a specific model for its document representation purposes. Keke Cai et al. use a retrieval process based on a context-based retrieval model that employs the KL-divergence retrieval model for the initial retrieval [9]. Tangjian Deng et al. present a brain-memory-inspired, context-based information re-finding framework, which enables users to re-find results accessed before through relevant contexts [16]. Yunping Huang et al. propose a new query model refinement approach, a random walk smoothing method that exploits expanded terms and term relationships based on the feedback documents [13]. Xiaohui Yan et al. address the problem of context-aware query recommendation; unlike existing approaches which leverage query sequence patterns in query sessions, they use the click-through of the given query as the major clue to user search intent to provide context-aware recommendations [22]. Chang Liu and Nicholas J. Belkin propose a personalized IR model based on implicit acquisition of task type and document preferences as search context, by observing and analyzing user behaviors, and then use implicit relevance feedback to re-rank or reformulate user queries to help users search effectively and efficiently [4]. Huanhuan Cao et al. propose modeling search context by CRF [28]. Ji-Rong Wen et al. propose four models for contextual retrieval [31]. Protima Banerjee et al. build on the Aspect Model, which forms the foundation of the Probabilistic Latent Semantic Analysis (PLSA) method; they also put forward a technique that estimates a relevance model from the query alone, without the need for training data [25]. Yan Qi et al. propose query-driven, feedback-based conflict resolution; they have developed data structures and algorithms to enable feedback-based conflict resolution during query processing on imperfectly aligned data [15].
The various models listed above are used for query expansion with the help of various feedback techniques; by expanding the query, they add context to it. These models are also used for ranking. A sketch of the common feedback-based expansion pattern is given below, and a comparison of the models is presented in Table 1.
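The feedback-based models above (for example the KL-divergence retrieval model and the query model refinement approach) share a common skeleton: build a unigram query language model, expand or smooth it with terms from feedback documents, and score each document by how well its own language model matches the expanded query model. The Python sketch below is a generic, simplified illustration of that pattern under our own assumptions (add-one smoothing, a single interpolation weight lam); it is not the exact estimation procedure of any single paper surveyed here.

from collections import Counter
import math

def language_model(text_tokens, vocab, mu=1.0):
    """Unigram model with additive smoothing over a fixed vocabulary."""
    counts = Counter(text_tokens)
    total = len(text_tokens) + mu * len(vocab)
    return {w: (counts[w] + mu) / total for w in vocab}

def expand_query_model(query_tokens, feedback_docs, vocab, lam=0.5):
    """Interpolate the original query model with a model of the feedback documents.
    lam controls the weight of the original query (cf. the lambda parameters in Table 1)."""
    q_model = language_model(query_tokens, vocab)
    fb_model = language_model([t for d in feedback_docs for t in d], vocab)
    return {w: lam * q_model[w] + (1 - lam) * fb_model[w] for w in vocab}

def kl_score(query_model, doc_tokens, vocab):
    """Negative cross-entropy of the query model against the document model
    (rank-equivalent to negative KL divergence): higher means a better match."""
    d_model = language_model(doc_tokens, vocab)
    return sum(query_model[w] * math.log(d_model[w]) for w in vocab)

# Toy usage: expand "apple" with a feedback document about the company, then rank.
docs = {"d1": "apple iphone company earnings".split(),
        "d2": "apple fruit orchard harvest".split()}
vocab = sorted({t for d in docs.values() for t in d} | {"apple"})
qm = expand_query_model(["apple"], [docs["d1"]], vocab, lam=0.6)
print(sorted(docs, key=lambda d: kl_score(qm, docs[d], vocab), reverse=True))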
Model: KL-divergence retrieval model [9]. Author: Keke Cai. Approach: Markov Random Field (MRF)-based sentence retrieval and a Bayesian network. Inputs: top-ranked documents. Inference: the MRR of the top-ranked document list and of the ranked list are improved by 19.7%, 25.5% and 24.1%, respectively.

Model: Query model refinement approach [13]. Author: Yunping Huang. Approach: random walk smoothing method. Parameters: score of each vertex; λ controls the weight of the initial query model. Inputs: query. Inference: setting λ to 0.1 or 0.2 usually yields the best retrieval performance.

Model: Probabilistic model [22]. Author: Xiaohui Yan. Approach: high-order method. Inputs: feedback documents.

Model: Modeling search context by CRF [28]. Author: Huanhuan Cao. Approach: intuitive model, context model, and an eliminate-noisy-elements model. Parameters: dmax. Inputs: query and the query's URL clicks. Inference: 51.1% of the query occurrences and 51.7% of the URL clicks remained.

Model: Aspect Model [25]. Author: Protima Banerjee. Approach: PLSA method. Parameters: smoothing parameter λ, probability p(d). Inputs: documents. Inference: improvement in precision and recall (no percentage specified).

Model: Query-driven feedback-based conflict resolution [15]. Author: Yan Qi et al. Approach: concept matching (QUEST), constraint analysis and system feedback. Parameters: k simple paths. Inputs: user query and user's feedback. Inference: the FICSR pre-processing module (stabbed version) was 60% faster.

Model: Query model and ranking model [33]. Author: Liang Jeff Chen. Parameters: vk, the aggregate set of document keywords, and the parameter sc(Qk). Inputs: query keywords. Inference: mean precision 10.2 for 30 queries.

Table 1. Comparison of various models used by different authors for IR

3. THE VARIOUS ARCHITECTURES OF IR

Various architectures for query context have been defined, since the existing systems do not all rank a query pattern according to context. Some of these architectures are outlined below. Giorgio Orsi et al. propose the SAFE architecture, which receives as input a sequence of keywords and produces, as output, a ranking over a set of query patterns, possibly with a suggested assignment for their parameters [19]. They also propose the Context Model, an instantiation of the context vocabulary that defines the context model for the given application; in particular, the context model specifies the (possibly hierarchical) context dimensions for the specific application, along with their possible values.
A K Sharma et al. propose the Query Semantic Search System (QUESEM, pronounced 'Qu-sem') to improve search quality; QUESEM maintains a database of definitions (referred to as the Definition Repository) as the core of the system to accomplish its desired task [26]. Haizhou Fu et al. propose the CoSi system architecture, which consists of three core components: an indexer, a context-sensitive cost model and a query interpreter [23]. Christian Sengstock and Michael Gertz propose the CONQUER system, whose architecture is composed of a model generation component, a model index, and a suggestion service [37]. Reiner Kraft et al. propose the overall Y!Q system design and architecture; the Y!Q back-end comprises three major system components for processing contextual search queries: Content Analysis (CA), the Query Planning and Rewriting Framework (QPW), and Contextual Ranking (CR) [29]. Liang Jeff Chen et al. propose a query model and a ranking model; in the query model a document, denoted by d, is modeled as a tuple of fields, each consisting of a bag of words [33].

The architectures mentioned above aim to improve the retrieval process by enhancing the context of the query. A comparison of these architectures is presented in Table 2.

Architecture: SAFE architecture [19]. Author: Giorgio Orsi. Inputs: keyword search. Methods: the Context Model. Inference: 65% of queries were found at the top of the ranked list; in 25% of cases, users found the query in the second position.

Architecture: CoSi system architecture [23]. Author: Haizhou Fu. Inputs: keyword queries. Methods: indexer, context-sensitive cost model, query interpreter. Inference: CoSi learns what the user is asking for and ranks the intended interpretation higher, so that end users can find it more easily.

Architecture: Architecture of the CONQUER system [37]. Author: Christian Sengstock. Inputs: patterns and their synopses. Methods: model generator, model index, suggestion service. Inference: space complexity of O(1) per node in the FP-tree and O(1) runtime-complexity overhead for each node update operation.

Architecture: Y!Q system design and architecture [29]. Author: Reiner Kraft. Methods: CA component, QPW, CR. Inference: Y!Q is superior to Yahoo! WS for 32.3% of the context and query pairs, while Yahoo! WS is better for only 8.3% of them (with 59.4% tied).

Table 2. Comparison of various architectures proposed by different authors for IR
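The architectures in Table 2 differ in their details, but several of them, Y!Q most explicitly, follow the same three-stage pattern: analyse the context to extract salient terms, rewrite the keyword query with those terms, and rank results with the context taken into account. The Python sketch below only illustrates that general pattern under our own simplifying assumptions; the term-extraction heuristic, the weight beta and the search_fn callable are placeholders and do not reproduce the CA, QPW or CR components of Y!Q or of any other surveyed system.

from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for"}

def analyse_context(context_text, top_k=3):
    """Content analysis step: pick the most frequent non-stopword terms from the context."""
    tokens = [t.lower().strip(".,:;") for t in context_text.split()]
    counts = Counter(t for t in tokens if t and t not in STOPWORDS)
    return [term for term, _ in counts.most_common(top_k)]

def rewrite_query(query, context_terms):
    """Query planning/rewriting step: augment the keyword query with context terms."""
    extra = [t for t in context_terms if t not in query.lower().split()]
    return query + " " + " ".join(extra) if extra else query

def contextual_search(query, context_text, search_fn, beta=0.25):
    """Contextual ranking step: combine the engine score with overlap against context terms.
    search_fn(query) is assumed to return a list of (doc_id, score, doc_text) tuples."""
    context_terms = analyse_context(context_text)
    rewritten = rewrite_query(query, context_terms)
    ranked = []
    for doc_id, score, doc_text in search_fn(rewritten):
        overlap = sum(1 for t in context_terms if t in doc_text.lower())
        ranked.append((doc_id, score + beta * overlap))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

# Toy usage with a stub engine.
def stub_engine(q):
    return [("d1", 1.0, "Apple releases new iPhone"), ("d2", 0.9, "Apple pie recipe")]

print(contextual_search("apple", "iphone ipad and macbook reviews", stub_engine))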
4. METHODOLOGIES PROPOSED BY DIFFERENT AUTHORS

A K Sharma et al. propose two algorithms, Local Site Search for Query and Definition Generation & Annotation; as the response pages are retrieved from dictionary-based sites, it is assumed that they will contain the direct thesaurus entries and synonyms of the query terms [26]. Lidong Bing et al. propose a scoring algorithm together with a latent topic analysis and training algorithm [32]. Wenwei Xue et al. propose algorithms for context attribute matching and context schema matching [27]. Reiner Kraft et al. propose two algorithms for ranking and filtering of documents: rank averaging and MC4 [29]. Liang Jeff Chen et al. propose data-mining-based selection and a graph decomposition algorithm [33]. Huanhuan Cao et al. propose an algorithm for clustering queries; in their method, a cluster C is a set of queries [36]. Ziming Zhuang and Silviu Cucerzan propose a re-ranking algorithm, Q-Rank, based on a straightforward yet very effective rationale: the most frequently seen query extensions of a target query (terms extracted from queries that contain the target query as an affix) and adjacent queries (queries that immediately precede or follow a query in a user search session) provide important hints about users' search intents [35]. Zhen Liao et al. propose Query Stream Clustering with Iterative Scanning (QSC-IS), Query Stream Clustering with a Master-Slave Model (QSC-MS), and a query suggestion algorithm [1]. Mariam Daoud et al. propose a session-based personalized search algorithm that drives the overall process of their session-based personalized search [30]. Minmin Chen et al. propose an adaptive self-training algorithm; self-training is a very commonly used technique to wrap complex models for semi-supervised learning [8].

The various algorithms used in IR range from query clustering and query ranking to query suggestion and query expansion. Query clustering usually groups similar queries that lead to a similar or the same set of documents viewed by the user. In query ranking algorithms, queries are ranked according to the frequency with which users raise them. The algorithms that use query expansion apply some kind of feedback or probability technique to expand the query. A minimal query clustering sketch follows, and a comparison of these methodologies is presented in Table 3.
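Several of the methodologies above group queries before suggesting or ranking them. A simple way to do this, in the spirit of the click-through based approaches, is to represent each query by the set of URLs its users clicked and merge queries whose click sets are similar. The Python sketch below uses a toy greedy, threshold-based procedure with Jaccard similarity purely for illustration; the actual algorithms surveyed (for example the diameter-bounded clustering of Cao et al. [36] or the query stream clustering of Liao et al. [1]) are considerably more elaborate.

def jaccard(a, b):
    """Similarity of two clicked-URL sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def cluster_queries(click_log, threshold=0.5):
    """click_log: dict mapping query -> set of clicked URLs.
    Greedily assigns each query to the first cluster whose URL set is similar enough."""
    clusters = []  # each cluster: {"queries": [...], "urls": set of clicked URLs}
    for query, urls in click_log.items():
        for cluster in clusters:
            if jaccard(urls, cluster["urls"]) >= threshold:
                cluster["queries"].append(query)
                cluster["urls"] |= urls
                break
        else:
            clusters.append({"queries": [query], "urls": set(urls)})
    return [c["queries"] for c in clusters]

# Toy usage: two queries with overlapping clicks end up in the same cluster.
log = {
    "nba accreditation": {"nbaind.org", "aicte-india.org"},
    "national board of accreditation": {"nbaind.org"},
    "nba scores": {"nba.com", "espn.com"},
}
print(cluster_queries(log, threshold=0.3))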
Author: A K Sharma et al. [26]. Technique/Methodology: Definition_Generator_Annotator (D), Local_Site_Searching. Parameters considered: keywords, query. Outcome/Results: relevant results improve from 0.6 lakh to 1.6 lakh out of 2.5 lakh results.

Author: Lidong Bing et al. [32]. Technique/Methodology: scoring algorithm; latent topic analysis and training algorithm. Parameters considered: query, ranking. Outcome/Results: the differences between the performance of their method and CTA are significant at significance level 0.05.

Author: Wenwei Xue et al. [27]. Technique/Methodology: context attribute matching; context schema matching. Parameters considered: a pair of context attributes; a schema matcher that integrates a local schema into the current set of global schemas. Outcome/Results: CAMSUBSYN achieved as high as 100% precision and 64% recall on their dataset.

Author: Reiner Kraft et al. [29]. Technique/Methodology: rank averaging algorithm; MC4 algorithm. Parameters considered: a score assigned to every position in a rank list; the input is k ranked lists, which are the top few results of k sub-queries. Outcome/Results: 95% confidence interval of [2.873, 2.972], compared to an average of 2.54 ([2.45, 2.66]) based on ComScore (which includes MSN, Google, and Yahoo).

Author: Liang Jeff Chen et al. [33]. Technique/Methodology: data-mining-based selection algorithm; graph decomposition algorithm. Parameters considered: keyword combinations (e.g., two keyword combinations P1, P2). Outcome/Results: the average number of MeSH terms in a citation after the inheritance is 44; better ranking in 21 out of 30 queries.

Author: Huanhuan Cao et al. [36]. Technique/Methodology: algorithm for clustering queries. Parameters considered: diameter parameter Dmax. Outcome/Results: the average overall precision of CRF-B, CRF-B-C and CRF-B-C-T is improved across different K by 50%, 52% and 57%, respectively.

Author: Ziming Zhuang et al. [35]. Technique/Methodology: re-ranking algorithm (Q-Rank). Parameters considered: interpolation parameter (γ), adjacent queries. Outcome/Results: when varying γ, on average, Q-Rank improved the rankings for 75.8% of the re-ranked queries.

Author: Zhen Liao et al. [1]. Technique/Methodology: query stream clustering with iterative scanning; query stream clustering with master-slave model; query suggestion. Parameters considered: the M1-th query, x mod M = ω; preceding queries. Outcome/Results: total response time remains small, about 0.3 millisecond.

Author: Mariam Daoud [30]. Technique/Methodology: session-based personalized search algorithm. Parameters considered: query. Outcome/Results: the setting r = 0.3 produces the best improvement in personalized search, giving the highest precision improvement at P@5 (11.63%).

Author: Minmin Chen [8]. Technique/Methodology: adaptive self-training with conditional random fields. Parameters considered: unlabeled queries. Outcome/Results: 51.38% precision with only 10% of the training data labeled.

Table 3. Comparison of different methodologies for IR

5. APPLICATIONS OF IR

The applications of IR can be broadly classified into general applications and domain-specific applications. General applications include digital libraries, search engines, etc.; domain-specific applications include expert search finding, genomic IR, geographic IR, etc.

5.1 General applications of IR

Digital libraries: A digital library is a library in which collections are stored in digital formats (as opposed to print, microform, or other media) and are accessible by computers.
The digital content may be stored locally, or accessed remotely via computer networks. A digital library is a type of IR system.

Search engines:
  - Desktop search: the field of search tools which search the contents of a user's own computer files, rather than searching the Internet. These tools are designed to find information on the user's PC, including web browser histories, e-mail archives, text documents, sound files, images and video.
  - Enterprise search: the practice of making content from multiple enterprise-type sources, such as databases and intranets, searchable to a defined audience.
  - Federated search: an IR technology that allows the simultaneous search of multiple searchable resources. A user makes a single query request, which is distributed to the search engines participating in the federation. The federated search then aggregates the results received from the search engines for presentation to the user (a minimal score-merging sketch is given at the end of this subsection).
  - Mobile search: an evolving branch of IR services centred on the convergence of mobile platforms and mobile devices. Web search capability in a mobile form allows users to find mobile content on websites which are available to mobile devices on mobile networks.
  - Social search: a type of web search that takes into account the social graph of the person initiating the search query. When applied to web search, this social-graph approach to relevance is in contrast to established algorithmic or machine-based approaches, where relevance is determined by analyzing the text of each document or the link structure of the documents.

Web search: designed to search for information on the World Wide Web. The search results are generally presented in a line of results, often referred to as Search Engine Results Pages (SERPs). The information may be web pages, images or other types of files. Some search engines also mine data available in databases or open directories.
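Federated search has to merge result lists whose scores come from different engines and are therefore not directly comparable. The Python sketch below shows one common, simple approach, min-max normalisation of each list followed by summing the normalised scores of duplicate documents (a CombSUM-style merge); the engine names and scores are invented, and real federated systems use more sophisticated merging strategies.

def merge_federated_results(result_lists, k=10):
    """result_lists: mapping engine_name -> ranked list of (doc_id, score).
    Normalises each engine's scores to [0, 1] and sums them per document."""
    merged = {}
    for engine, results in result_lists.items():
        if not results:
            continue
        top = max(score for _, score in results)
        bottom = min(score for _, score in results)
        span = (top - bottom) or 1.0
        for doc_id, score in results:
            merged[doc_id] = merged.get(doc_id, 0.0) + (score - bottom) / span
    ranked = sorted(merged.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Toy usage: the same query sent to two engines with different score scales.
print(merge_federated_results({
    "engine_a": [("u1", 12.0), ("u2", 9.5), ("u3", 4.0)],
    "engine_b": [("u2", 0.91), ("u4", 0.88)],
}))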
5.2 Domain specific applications of IR

In domain-specific IR, the information is based on a particular domain and is classified according to that domain. The domain may be a legal system, a geographic system, etc.

Expert search finding: Expert search is a task of growing importance in enterprise settings. An expert search system predicts and ranks the expertise of a set of candidate persons with respect to the user's query.

Genomic IR: The in-silico revolution has changed how biologists characterise DNA and protein sequences. As a first step to exploring the structure and function of an unknown sequence, biologists search large genomic databases for similar sequences. This process of genomic IR has allowed significant advances in biology and led to advancements in critical areas such as cancer research.

Geographic IR: Geographic IR (GIR) is the augmentation of IR with geographic metadata. GIR involves extracting and resolving the meaning of locations in unstructured text, a process known as geo-parsing. After identifying location references in text, a GIR system must index this information for search and retrieval.

Legal IR: Legal IR is the science of IR applied to legal text, including legislation, case law, and scholarly works. Accurate legal IR is important to provide access to the law for laymen and legal professionals.

Vertical search: A vertical search engine, as distinct from a general web search engine, focuses on a specific segment of online content. The vertical content area may be based on topicality, media type, or genre of content. Common verticals include shopping, the automotive industry, legal information, medical information, and travel.

5.3 Other applications of IR

IR has also been applied in other fields such as adversarial IR, automatic summarization, question answering, etc.

Adversarial IR: Adversarial IR is a topic in IR related to strategies for working with a data source where some portion of it has been manipulated maliciously. Tasks can include gathering, indexing, filtering, retrieving and ranking information from such a data source. Adversarial IR includes the study of methods to detect, isolate, and defeat such manipulation.

Automatic summarization: Automatic summarization is the creation of a shortened version of a text by a computer program. The phenomenon of information overload has meant that access to coherent and correctly developed summaries is vital, and as access to data has increased, so has interest in automatic summarization. Summarization technology is employed, for example, in the Google search engine.

Multi-document summarization: Multi-document summarization is an automatic procedure aimed at the extraction of information from multiple texts written about the same topic.

Compound term processing: Compound term processing is the name used for a category of techniques in IR applications that perform matching on the basis of compound terms. Compound terms are built by combining two (or more) simple terms; for example, "triple" is a single-word term but "triple heart bypass" is a compound term.

Cross-lingual retrieval: Cross-Language IR (CLIR) is a subfield of IR dealing with retrieving information written in a language different from the language of the user's query.

Document classification: The task of document classification is to assign a document to one or more classes or categories. This may be done manually (or "intellectually") or algorithmically. The intellectual classification of documents has mostly been the province of library science, while the algorithmic classification of documents is used mainly in information science and computer science.

Spam filtering: Spam filtering is a statistical technique of e-mail filtering. It makes use of a naive Bayes classifier to identify spam e-mail (a minimal classifier sketch is given at the end of this section).

Question answering: Question Answering (QA) is a computer science discipline within the fields of IR and Natural Language Processing (NLP) which is concerned with building systems that automatically answer questions posed by humans in a natural language.
A QA implementation, usually a computer program, may construct its answers by querying a structured database of knowledge or information, usually a knowledge base.
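The spam filtering application above is described as a statistical technique built on a naive Bayes classifier. The Python sketch below shows that idea in its simplest form, with per-class word counts, Laplace smoothing and a log-probability comparison; the training messages are invented toy data, and production filters use many more features and safeguards.

from collections import Counter
import math

def train_naive_bayes(spam_msgs, ham_msgs):
    """Count word frequencies per class; return everything needed to score new messages."""
    spam_counts = Counter(w for m in spam_msgs for w in m.lower().split())
    ham_counts = Counter(w for m in ham_msgs for w in m.lower().split())
    vocab = set(spam_counts) | set(ham_counts)
    return {
        "spam": spam_counts, "ham": ham_counts, "vocab": vocab,
        "spam_total": sum(spam_counts.values()), "ham_total": sum(ham_counts.values()),
        "spam_prior": len(spam_msgs) / (len(spam_msgs) + len(ham_msgs)),
    }

def is_spam(model, message):
    """Classify by comparing log P(spam, words) with log P(ham, words)."""
    v = len(model["vocab"])
    log_spam = math.log(model["spam_prior"])
    log_ham = math.log(1 - model["spam_prior"])
    for w in message.lower().split():
        log_spam += math.log((model["spam"][w] + 1) / (model["spam_total"] + v))
        log_ham += math.log((model["ham"][w] + 1) / (model["ham_total"] + v))
    return log_spam > log_ham

# Toy usage.
model = train_naive_bayes(
    spam_msgs=["win money now", "cheap money fast"],
    ham_msgs=["meeting agenda attached", "see you at lunch"],
)
print(is_spam(model, "win cheap money"))  # True
print(is_spam(model, "lunch meeting"))    # False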
6. OPEN ISSUES / CHALLENGES

Although the discussed models implement their stated objectives efficiently, they still lack an efficient retrieval process when context is to be considered. When a user submits a query for the first time, the search engine is unable to find a context for the query; however, if some events from previously viewed web pages can be captured, this problem can be resolved. Some of the open challenges in this area are:

  - Reducing the volume of documents considered for retrieval, i.e., improving the quality of the candidate documents by filtering out irrelevant and redundant documents.
  - Ranking of structured and unstructured documents for better accuracy in retrieval.
  - Context awareness in both the modeling and the scaling up of query suggestion.
  - Visualization and presentation of search results with in-depth summarized analysis.

To address the above challenges, we propose a novel retrieval technique in which the query is based on the context along with the concept, enhancing the retrieval operation through exploitation of unstructured documents; this can increase the focused retrieval of documents, especially from the Web, by capturing the user's recent browsing sessions.

The snippets used in modern Web search are query-based and are proven to be better than static document summaries. For instance, word clouds can be examined with respect to the following:

Depth on the query side: the lack of depth on the user side is a bottleneck for delivering more accurate retrieval results; users provide only 2 to 3 keywords on average to search the complete Web.

Depth in the document representation: documents on the Web are rich in structure, but most of the structural elements are not used consistently throughout the Web. A key question is how to cope with this semi-structured information.

Depth on the result side: while a query can have thousands of relevant results, only the first 10 or 20 results will get any attention in a Web search interface, and often these first n results will still contain redundant information.

Our main objective is to exploit query context and document structure to address the following challenges:

  - Ambiguity in the query from the user.
  - Appropriate feedback from the user's search logs.
  - Effective use and exploitation of structured and unstructured documents for better query formulation and search results.
7. CONCLUSION

In this paper, we have discussed and analyzed various models, algorithms and architectures that have been used by researchers in IR, together with their reported performance. The models discussed are used for query ranking and for query expansion with the help of various feedback techniques that add context to the query. The architectures discussed are either completely new architectures or variations on existing ones that improve the retrieval process by enhancing the context of the query. The algorithms used in IR range from query clustering and query ranking to query suggestion and query expansion. Query clustering usually groups similar queries that lead to a similar set of documents viewed by the user; in query ranking algorithms, the queries are ranked according to the frequency with which users submit them; and the algorithms that use query expansion apply some kind of feedback or probability technique to expand the query. Although the discussed models implement their stated objectives efficiently, they still lack an efficient retrieval process when context is to be considered. Hence the exploitation of structured and unstructured documents, which can increase the focused retrieval of documents from the Web, remains a challenging problem.

REFERENCES

[1] Zhen Liao, Daxin Jiang, Enhong Chen, Jian Pei, Huanhuan Cao and Hang Li, "Mining Concept Sequences from Large-Scale Search Logs for Context-Aware Query Suggestion", ACM Transactions, October 2011.
[2] Mario Cataldi, Claudio Schifanella, K. Selçuk Candan, Maria Luisa Sapino and Luigi Di Caro, "CoSeNa: a Context-based Search and Navigation System", ACM, October 2009.
[3] Michal Kajaba and Pavol Navrat, "Personalized Web Search Using Context Enhanced Query", International Conference on Computer Systems and Technologies (CompSysTech '09).
[4] Chang Liu and Nicholas J. Belkin, "Implicit Acquisition of Context for Personalization of Information Retrieval Systems", CaRR 2011, February 13, 2011, Stanford, CA, USA.
[5] Ziv Bar-Yossef and Naama Kraus, "Context-Sensitive Query Auto-Completion", CIKM '10, October 26-30, 2010, Toronto, Ontario, Canada, ACM.
[6] Rianne Kaptein, "Effective Focused Retrieval by Exploiting Query Context and Document Structure", University of Amsterdam, ACM, October 6, 2011.
[7] Zheng Ye, Xiangji Huang and Hongfei Lin, "A Bayesian Network Approach to Context Sensitive Query Expansion", SAC '11, March 21-25, 2011, TaiChung, Taiwan, ACM.
[8] Minmin Chen, Jian-Tao Sun, Xiaochuan Ni and Yixin Chen, "Improving Context-Aware Query Classification via Adaptive Self-training", October 24-28, 2011, Glasgow, Scotland, UK, ACM.
[9] Keke Cai, Chun Chen, Jiajun Bu, Peng Huang and Zhiming Kang, "Exploration of Query Context for Information Retrieval", May 8-12, 2007, Banff, Alberta, Canada, ACM.
[10] Lev Finkelstein, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman and Eytan Ruppin, "Placing Search in Context: The Concept Revisited", ACM Transactions on Information Systems, Vol. 20, No. 1, January 2002.
[11] Raymond Y.K. Lau, Peter D. Bruza and Dawei Song, "Belief Revision for Adaptive Information Retrieval", July 25-29, 2004, Sheffield, South Yorkshire, UK, ACM.
[12] Jiang Bian, Tie-Yan Liu, Tao Qin and Hongyuan Zha, "Ranking with Query-Dependent Loss for Web Search", February 4-6, 2010, New York City, New York, USA, ACM.
[13] Yunping Huang, Le Sun and Jian-Yun Nie, "Query Model Refinement Using Word Graphs", October 26-30, 2010, Toronto, Ontario, Canada, ACM.
[14] Jing Bai, Jian-Yun Nie, Hugues Bouchard and Guihong Cao, "Using Query Contexts in Information Retrieval", July 23-27, 2007, Amsterdam, The Netherlands, ACM.
[15] Yan Qi, K. Selçuk Candan and Maria Luisa Sapino, "FICSR: Feedback-based InConSistency Resolution and Query Processing on Misaligned Data Sources", June 12-14, 2007, Beijing, China, ACM.
[16] Tangjian Deng, Liang Zhao, Ling Feng and Wenwei Xue, "Information Re-finding by Context: A Brain Memory Inspired Approach", October 24-28, 2011, Glasgow, Scotland, UK, ACM.
[17] Xing Wei, Fuchun Peng, Huihsin Tseng, Yumao Lu and Benoit Dumoulin, "Context Sensitive Synonym Discovery for Web Search Queries", November 2-6, 2009, Hong Kong, China, ACM.
[18] Ivan T. Bowman and Kenneth Salem, "Optimization of Query Streams Using Semantic Prefetching", June 13-18, 2004, Paris, France, ACM.
[19] Giorgio Orsi, Letizia Tanca and Eugenio Zimeo, "Keyword-based, Context-aware Selection of Natural Language Query Patterns", March 22-24, 2011, Uppsala, Sweden, ACM.
[20] Huanhuan Cao, Daxin Jiang, Jian Pei, Enhong Chen and Hang Li, "Towards Context-Aware Search by Learning A Very Large Variable Length Hidden Markov Model from Search Logs", April 20-24, 2009, Madrid, Spain, ACM.
[21] Carla Teixeira Lopes and Cristina Ribeiro, "Context Effect on Query Formulation and Subjective Relevance in Health Searches", August 18-21, 2010, New Brunswick, New Jersey, USA, ACM.
[22] Xiaohui Yan, Jiafeng Guo and Xueqi Cheng, "Context-Aware Query Recommendation by Learning High-Order Relation in Query Logs", October 24-28, 2011, Glasgow, Scotland, UK, ACM.
[23] Haizhou Fu, Sidan Gao and Kemafor Anyanwu, "CoSi: Context-Sensitive Keyword Query Interpretation on RDF Databases", March 28-April 1, 2011, Hyderabad, India, ACM.
[24] Ying-Hsang Liu and Nicholas J. Belkin, "Query Reformulation, Search Performance, and Term Suggestion Devices in Question-Answering Tasks", Information Interaction in Context, 2008, London, UK, ACM.
[25] Protima Banerjee and Hyoil Han, "Incorporation of Corpus-Specific Semantic Information into Question Answering Context", October 30, 2008, Napa Valley, California, USA, ACM.
[26] A. K. Sharma, Neelam Duhan and Bharti Sharma, "A Semantic Search System using Query Definitions", December 28-30, 2010, Allahabad, UP, India, ACM.
[27] Wenwei Xue, Hungkeng Pung, Paulito P. Palmes and Tao Gu, "Schema Matching for Context-Aware Computing", September 21-24, 2008, Seoul, Korea, ACM.
[28] Huanhuan Cao, Derek Hao Hu, Dou Shen, Daxin Jiang, Jian-Tao Sun, Enhong Chen and Qiang Yang, "Context-Aware Query Classification", July 19-23, 2009, Boston, Massachusetts, USA, ACM.
[29] Reiner Kraft, Chi Chao Chang, Farzin Maghoul and Ravi Kumar, "Searching with Context".
[30] Mariam Daoud, Lynda Tamine-Lechani and Mohand Boughanem, "Learning user interests for a session-based personalized search", Information Interaction in Context, 2008, London, UK, ACM.
[31] Ji-Rong Wen, Ni Lao and Wei-Ying Ma, "Probabilistic Model for Contextual Retrieval", July 25-29, 2004, Sheffield, South Yorkshire, UK, ACM.
[32] Lidong Bing, Wai Lam and Tak-Lam Wong, "Using Query Log and Social Tagging to Refine Queries Based on Latent Topics", October 24-28, 2011, Glasgow, Scotland, UK, ACM.
[33] Liang Jeff Chen and Yannis Papakonstantinou, "Context-sensitive Ranking for Document Retrieval", June 12-16, 2011, Athens, Greece, ACM.
[34] Reiner Kraft, Farzin Maghoul and Chi Chao Chang, "Y!Q: Contextual Search at the Point of Inspiration", October 31-November 5, 2005, Bremen, Germany, ACM.
[35] Ziming Zhuang and Silviu Cucerzan, "Re-Ranking Search Results Using Query Logs", November 5-11, 2006, Arlington, Virginia, USA, ACM.
[36] Huanhuan Cao, Daxin Jiang, Jian Pei, Qi He, Zhen Liao, Enhong Chen and Hang Li, "Context-Aware Query Suggestion by Mining Click-Through and Session Data", August 24-27, 2008, Las Vegas, Nevada, USA, ACM.
[37] Christian Sengstock and Michael Gertz, "CONQUER: A System for Efficient Context-aware Query Suggestions", March 28-April 1, 2011, Hyderabad, India, ACM.