Sharmili C, Rexie J. A. M / International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, Vol. 3, Issue 1, January-February 2013, pp. 349-353

Efficient Keyword Search Methods In Relational Databases

Sharmili C (1), Rexie J. A. M (2)
(1) Post-Graduate Student, Department of Computer Science and Engineering, Karunya University, India
(2) Assistant Professor, Department of Computer Science and Engineering, Karunya University, India

Abstract
Nowadays keyword search is a mechanism used by all types of organizations. In relational databases, keyword search is used to find tuples by issuing queries. However, most existing methods suffer from low performance and need more storage space for storing the results. An efficient method for keyword search in relational databases is therefore needed, and to that end we present a comparative study of keyword search in relational databases.

Keywords: relational databases, keyword search, IR ranking, BANKS, DISCOVER.

I. INTRODUCTION
Data mining is the process that attempts to discover patterns in large data sets. It utilizes methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.

The actual data mining task is the automatic or semi-automatic analysis of large quantities of data to extract previously unknown interesting patterns such as groups of data records (cluster analysis), unusual records (anomaly detection) and dependencies (association rule mining). This usually involves using database techniques such as spatial indexes. These patterns can then be seen as a kind of summary of the input data, and may be used in further analysis or, for example, in machine learning and predictive analytics. For example, the data mining step might identify multiple groups in the data, which can then be used to obtain more accurate prediction results by a decision support system. Neither data collection and data preparation nor result interpretation and reporting are part of the data mining step, but they do belong to the overall KDD (knowledge discovery in databases) process as additional steps.

Keyword search is a proven and widely accepted mechanism for querying in textual document systems and the World Wide Web. The database research community has recently recognized the benefits of keyword search and has been introducing keyword-search capabilities into relational databases, XML databases, graph databases, and heterogeneous data sources.

Keyword search provides an alternative means of querying relational databases, which is simple for people who are familiar with using web search engines. One important advantage of keyword search is that it enables users to search for information without having to know complex structured query languages (e.g., SQL) or to have prior knowledge about the structure of the underlying data.

Keyword search over relational databases finds the answers as sets of tuples in the database that are connected through primary/foreign keys and contain the query keywords. Keyword search provides a simple and user-friendly query interface to access the data in web and scientific applications. It is considered to be an effective information discovery method for structured and semi-structured data, because it allows users to search without prior knowledge of the schema or of a query language.

Three types of existing methods are considered: candidate-network-based methods, Steiner-tree-based algorithms, and backward expanding keyword search approaches. These methods are explained in the comparative study.

II. KEY CONCEPTS
A. Keyword searching using BANKS
Relational databases are commonly searched using structured query languages. The user needs to know the data schema to be able to ask suitable queries. Search engines on the Web have popularized an alternative unstructured querying and browsing paradigm that is simple and user-friendly. Users type in keywords and follow hyperlinks to navigate from one document to the other. No knowledge of the schema is needed. In relational databases, the information needed to answer a keyword query is often split across tables/tuples due to normalization. The BANKS system enables data and schema browsing together with keyword-based search for relational databases.

BANKS enables a user to get information by typing a few keywords and interacting with controls on the results; absolutely no query language or programming is required. BANKS greatly reduces
the effort involved in publishing relational data on the Web and making it searchable.

B. Discover: keyword search in relational databases
DISCOVER returns qualified joining networks of tuples, that is, sets of tuples that are associated because they join on their primary and foreign keys and collectively contain all the keywords of the query. DISCOVER proceeds in two steps. First, the Candidate Network Generator generates all candidate networks of relations, that is, join expressions that generate the joining networks of tuples. Then the Plan Generator builds plans for the efficient evaluation of the set of candidate networks, exploiting the opportunities to reuse common subexpressions of the candidate networks. DISCOVER provides a simple interface where the user simply types the keywords as they would on a search engine.

According to DISCOVER, an association exists between two keywords if they are contained in two associated tuples, i.e., two tuples that join through foreign key to primary key relationships, which potentially involve more tuples. As the amount of information stored in databases increases, so does the need for efficient information discovery. Keyword search enables information discovery without requiring the user to know the schema of the database.

C. IR Ranking
With the amount of available text data in relational databases growing rapidly, the need for ordinary users to search such information is increasing dramatically. Even though the major RDBMSs provide full-text search capabilities, they still require users to have knowledge of the database schemas and to use a structured query language to search for information.

This search model is complicated for most ordinary users. Inspired by the big success of information retrieval (IR) style keyword search on the web, keyword search in relational databases has recently emerged as a new research topic.

The differences between text databases and relational databases result in three new challenges: (1) Answers needed by users are not limited to individual tuples; results assembled by joining tuples from multiple tables are used to form answers in the form of tuple trees. (2) A single score for each answer (i.e., a tuple tree) is needed to estimate its relevance to a given query. These scores are used to rank the most relevant answers as high as possible. (3) Relational databases have much richer structures than text databases. Existing IR strategies are inadequate for ranking relational outputs, so the IR ranking approach proposes a novel ranking strategy for effective keyword search.

III. COMPARATIVE STUDY
This section studies several algorithms that search for tuples in the database but are not efficient in their performance.

A. Backward expanding search
The Backward Expanding Search algorithm offers a heuristic solution for incrementally computing query results. It assumes that the graph fits in memory. This is not unreasonable, even for moderately large databases, because the in-memory node representation need not store any attribute of the corresponding tuple other than the RID. The only other in-memory structure is an index that maps RIDs to graph nodes. Indices that map keywords to RIDs can be disk resident. As a result, the graphs of even large databases with millions of nodes and edges can fit in modest amounts of memory. BANKS models tuples as nodes in a graph, connected by links induced by foreign key and other relationships. Answers to a query are modeled as rooted trees connecting tuples that match individual keywords in the query. Answers are ranked using a notion of proximity coupled with a notion of prestige of nodes based on inlinks, similar to techniques developed for Web search. BANKS presents an efficient heuristic algorithm for finding and ranking query results.

B. Candidate Network Generator
The Candidate Network Generator takes as input the set of keywords k1, ..., km, the non-empty tuple sets R_i^K and the maximum candidate network size T, and outputs a complete and non-redundant set of candidate networks. The key challenge is to avoid the generation of redundant joining networks of tuple sets. The solution to this problem requires an analysis of the conditions that force a joining network of tuples to be non-minimal; the condition for the totality of the network is straightforward.

As the amount of information stored in databases increases, so does the need for efficient information discovery. Keyword search enables information discovery without requiring the user to know the schema of the database, SQL or some QBE-like interface, or the roles of the various entities and terms used in the query. DISCOVER is a system that performs keyword search in relational databases without requiring knowledge of the database schema or of a query language. Overall, it proceeds in three steps. First, it generates the smallest set of candidate networks. Then a greedy algorithm creates a near-optimal execution plan to evaluate the set of candidate networks. Finally, the execution plan is executed by the DBMS.

C. Steiner-tree-based search
A relational database can be modeled as a database graph G = (V, E) such that there is a one-to-one mapping between a tuple in the database and a
node in V. G can be considered as a directed graph with two edge types: a forward edge (u, v) ∈ E iff there is a foreign key from u to v, and a back edge (v, u) iff (u, v) is a forward edge in E. An edge (u, v) indicates a close relationship between tuples u and v (i.e., they can be directly joined together), and the introduction of two edge types allows differentiating the importance of u to v and vice versa. When such separation is not necessary for some applications, G becomes an undirected graph.

To support keyword search over relational data, G is typically modeled as a weighted graph, with a node weight ω(v) to represent the "prestige" level of each node v ∈ V and an edge weight ω(u, v) for each edge in E to represent the strength of the closeness relationship between the two tuples.

Most existing methods of keyword search over relational databases find the Steiner trees composed of relevant tuples as the answers. They identify the Steiner trees by discovering the rich structural relationships between tuples, and neglect the fact that such structural relationships can be precomputed and indexed. Tuple units, which are composed of the most relevant tuples, have been proposed to address this problem. Tuple units can be precomputed and indexed. Existing methods identify a single tuple unit to answer keyword queries. They may, however, involve false negatives, as in many cases a single tuple unit cannot answer a keyword query. Instead, multiple tuple units should be integrated to answer keyword queries.

To address this problem, the tuple-unit approach studies how to integrate multiple related tuple units to effectively answer keyword queries. It devises novel indices and incorporates the structural relationships between different tuple units into the indices. It uses the indices to efficiently and progressively identify the top-k relevant answers. The method has been implemented in real database systems, and the reported experimental results show that it achieves high search efficiency and accuracy and outperforms state-of-the-art methods significantly.

D. Joined Tuple Tree Algorithm
This work addresses the effectiveness and efficiency issues of answering top-k keyword queries in relational database systems. It proposes a new ranking formula by adapting existing IR techniques based on a natural notion of virtual document. Compared with previous approaches, the new ranking method is simple yet effective, and agrees with human perception. Extensive experiments have been conducted on large-scale real databases using two popular RDBMSs.

The work focuses on the problem of supporting effective and efficient top-k keyword search in relational databases. While many RDBMSs support full-text search, they only allow retrieving relevant tuples from within the same relation.

A unique feature of keyword search over RDBMSs is that search results are often assembled from relevant tuples in several relations such that they are inter-connected and collectively relevant to the keyword query [1], [3]. Supporting such a feature has a number of advantages. Firstly, data may have to be split and stored in different relations due to database normalization requirements. Such data will not be returned if keyword search is limited to single relations. Secondly, it lowers the barrier for casual users to search databases, as it does not require users to have knowledge about query languages and the database schema. Thirdly, it helps to reveal interesting or unexpected relationships among entities [19]. Lastly, for websites with database back-ends, it provides a more flexible search method than the existing solution that uses a fixed set of pre-built template queries.

E. Tastier Approach
Tastier is a novel approach to keyword search in the relational world. A Tastier system can bring instant gratification to users by supporting type-ahead search, which finds answers "on the fly" as the user types in query keywords.

A main challenge is how to achieve a high interactive speed for large amounts of data in multiple tables, so that a query can be answered efficiently within milliseconds. Tastier proposes efficient index structures and algorithms for finding relevant answers on the fly by joining tuples in the database. It devises a partition-based method to improve query performance by grouping relevant tuples and pruning irrelevant tuples efficiently. It also develops a technique to answer a query efficiently by predicting highly relevant complete queries for the user. A thorough experimental evaluation of the proposed techniques on real data sets has been conducted to demonstrate the efficiency and practicality of this new search paradigm.

IV. FUTURE RESEARCH DIRECTIONS
Future work should emphasize the efficiency and effectiveness of keyword search over relational databases by summarizing and indexing the tuples in the underlying relational databases. The concept of tuple units has been proposed to effectively answer keyword queries. The approach generates and materializes the tuple units, which are composed of relevant tuples connected by primary/foreign-key relationships. It identifies the most relevant and meaningful tuple units to answer keyword queries. Moreover, it examines techniques of indexing and ranking to enhance search efficiency and search accuracy.

A tuple unit is a set of highly relevant tuples which contain query keywords. Moreover, tuple units can be precomputed and indexed, and the indexed tuple units can be used to efficiently answer a keyword query.
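The idea of precomputing tuple units and indexing them can be illustrated with a small sketch. The Python below is not the method of any of the cited papers. It assumes a hypothetical two-table schema (authors and papers joined on a foreign key), treats each author together with the papers that reference it as one tuple unit, and builds a plain inverted index from keywords to tuple-unit ids. All names (`build_tuple_units`, `build_index`, `search`) and the toy data are made up for illustration, and no ranking is attempted.

```python
from collections import defaultdict

# Hypothetical toy tables: papers.author_id is a foreign key to authors.id.
AUTHORS = [
    {"id": 1, "name": "Alice Smith"},
    {"id": 2, "name": "Bob Jones"},
]
PAPERS = [
    {"id": 10, "author_id": 1, "title": "Keyword search in databases"},
    {"id": 11, "author_id": 1, "title": "Graph indexing"},
    {"id": 12, "author_id": 2, "title": "Ranking keyword queries"},
]


def build_tuple_units(authors, papers):
    """Group each author tuple with the paper tuples that reference it."""
    units = {a["id"]: {"author": a, "papers": []} for a in authors}
    for p in papers:
        units[p["author_id"]]["papers"].append(p)
    return units


def unit_text(unit):
    """Concatenate the text attributes of all tuples in one unit."""
    parts = [unit["author"]["name"]] + [p["title"] for p in unit["papers"]]
    return " ".join(parts).lower()


def build_index(units):
    """Inverted index: keyword -> set of tuple-unit ids containing it."""
    index = defaultdict(set)
    for uid, unit in units.items():
        for word in unit_text(unit).split():
            index[word].add(uid)
    return index


def search(index, keywords):
    """Return ids of tuple units that contain every query keyword."""
    postings = [index.get(k.lower(), set()) for k in keywords]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Both structures are built once, ahead of query time.
units = build_tuple_units(AUTHORS, PAPERS)
index = build_index(units)

print(search(index, ["alice", "graph"]))  # keywords from two different tables
print(search(index, ["keyword"]))         # keyword occurring in several units
```

In this toy setup the query "alice graph" is answered by a single precomputed unit even though the two keywords come from different tables, with no join at query time. A query such as "bob graph" returns nothing, because no single unit contains both keywords.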
A structure-aware, index-based method is proposed to integrate multiple related tuple units to effectively answer keyword queries. It discovers the structural relationships between different tuple units and stores them in structure-aware indices, and progressively finds the top-k answers using such indices. The method, named Saint (Structure-Aware INdexing for finding and ranking Tuple units), answers keyword queries over relational databases.

Two structure-aware indexes are devised, a single-keyword-based structure-aware index (SKSA-Index) and a keyword-pair-based structure-aware index (KPSA-Index), to capture the structural relationships between tuple units. The indexes are used to integrate multiple tuple units on the fly to answer a keyword query.

Given a keyword query K = {k1, k2, ..., kn} and a relational database, first identify the tuple units with the top-k highest scores (using the scoring function SCORE(K, u)). Then, construct the subtrees which are rooted at each identified tuple unit and contain the paths from the tuple unit to the corresponding pivotal tuple units for each keyword.

Finally, take the subtrees as the answers. Note that each subtree is composed of multiple highly relevant tuple units. Indexing in keyword search is used to find the keywords in the tuples in an efficient manner.

To efficiently retrieve IR scores, a single-keyword-based structure-aware index, called SKSA-Index, is proposed, which is similar to the traditional inverted index. The entries of SKSA-Index are also the keywords that are contained in the underlying database.

Different from inverted indexes, which only maintain the tuple units that directly contain the keyword, each entry of SKSA-Index preserves the tuple units that directly or indirectly contain the keyword in the form of a triple <TupleUnit, Score, TupleUnitLists>, where Score is the assigned score of the keyword in the tuple unit TupleUnit, and TupleUnitLists preserves the tuple unit lists from TupleUnit to the corresponding pivotal tuple units, which can be obtained from the pivotal tuple unit matrix.

The tuple units are sorted by the corresponding scores in descending order. Most importantly, SKSA-Index captures the rich structural relationships, as each entry preserves the paths from a given tuple unit to the corresponding pivotal tuple unit and also keeps the structure-aware scores of tuple units that directly or indirectly contain keywords.

V. CONCLUSION
The algorithms covered in the above comparative study are not efficient because of their disadvantages: they need more space for storing information, and finding the tuples in the relational database is complex. Effective keyword search over relational databases remains a major issue. The proposal here is to integrate multiple relevant tuple units to effectively answer keyword queries.

Two novel structure-aware indexes, SKSA-Index and KPSA-Index, are devised, which incorporate the structural relationships between tuple units and the textual relevancy between input keywords into the indexes. A novel ranking mechanism is suggested that takes into consideration both the textual relevancy from the IR literature and the structural compactness of tuple units from the DB viewpoint.

REFERENCES
[1] Jianhua Feng, Guoliang Li, and Jianyong Wang, “Finding Top-k Answers in Keyword Search over Relational Databases Using Tuple Units,” IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 12, December 2011.
[2] Balmin A., Hristidis V., and Papakonstantinou Y. (2004) “Objectrank: Authority-Based Keyword Search in Databases,” Int’l Conf. Very Large Data Bases (VLDB), 564-575.
[3] Bhalotia G., Hulgeri A., Nakhe C., Chakrabarti S., and Sudarshan S. (2002) “Keyword Searching and Browsing in Databases Using Banks,” Int’l Conf. Data Eng. (ICDE), 431-440.
[4] Ding B. et al. (2007) “Finding Top-k Min-Cost Connected Trees in Databases,” IEEE Int’l Conf. Data Eng. (ICDE).
[5] Feng J., Li G., Wang J., and Zhou L. (2010) “Finding and Ranking Compact Connected Trees for Effective Keyword Proximity Search in XML Documents,” Information Systems, vol. 35, no. 2, 186-203.
[6] He H., Wang H., Yang J., and Yu P. (2007) “Blinks: Ranked Keyword Searches on Graphs,” ACM SIGMOD Int’l Conf. Management of Data.
[7] Hristidis V., and Papakonstantinou Y. (2002a) “Discover: Keyword Search in Relational Databases,” Int’l Conf. Very Large Data Bases (VLDB), 670-681.
[8] Li G., Ji S., Li C., and Feng J. (2009a) “Efficient Type-Ahead Search on Relational Data: A Tastier Approach,” SIGMOD Int’l Conf. Management of Data, 695-706.
[9] Li G., Ooi B.C., Feng J., Wang J., and Zhou L. (2008b) “Ease: An Effective 3-in-1 Keyword Search Method for Unstructured, Semi-Structured and Structured Data,” ACM SIGMOD Int’l Conf. Management of Data, 903-914.
[10] Li G., Zhou X., Feng J., and Wang J. (2009c) “Progressive Keyword Search in Relational Databases,” IEEE Int’l Conf. Data Eng. (ICDE), 1183-1186.
[11] Liu F., Yu C., Meng W., and Chowdhury A. (2006) “Effective Keyword Search in Relational Databases,” ACM SIGMOD Int’l Conf. Management of Data, 563-574.
[12] Luo L., Lin X., Wang W., and Zhou X. (2007) “Spark: Top-k Keyword Query in Relational Databases,” ACM SIGMOD Int’l Conf. Management of Data.
[13] Li G., Feng J., and Zhou L. (2008) “Retune: Retrieving and Materializing Tuple Units for Effective Keyword Search over Relational Databases,” Int’l Conf. Conceptual Modeling (ER), 469-483.
[14] Li G., Feng J., and Wang J. (2009a) “Structure-Aware Indexing for Keyword Search in Databases,” ACM Conf. Information and Knowledge Management (CIKM), 1453-1456.
[15] Li G., Feng J., Zhou X., and Wang J. (2011b) “Providing Built-in Keyword Search Capabilities in RDBMS,” The VLDB J., vol. 20, no. 1, 1-19.
[16] Markowetz A., Yang Y., and Papadias D. (2007) “Keyword Search on Relational Data Streams,” ACM SIGMOD Int’l Conf. Management of Data.
[17] Qin L., Yu J.X., and Chang L. (2009) “Keyword Search in Databases: The Power of RDBMS,” SIGMOD Int’l Conf. Management of Data, 681-694.
[18] Sayyadian M., LeKhac H., Doan A., and Gravano L. (2007) “Efficient Keyword Search across Heterogeneous Relational Databases,” IEEE Int’l Conf. Data Eng. (ICDE).
[19] Yu B., Li G., Sollins K., and Tung A.K.H. (2007) “Effective Keyword-Based Selection of Relational Databases,” ACM SIGMOD Int’l Conf. Management of Data, 139-150.