KEYWORDS SEARCH ON STRUCTURED DATABASE
Xiaoyu Chen, Min Li, Yihan Gao, Tianning Xu
Introduction
 Structured data
 Schema as a summary of the data
 Retrieve through structured language
 What would big data bring to structured data retrieval?
Introduction
 In terms of high volume of data
 Hadoop + Pig Latin came to rescue
 However, is this enough?
 Recall how you write a selection query. What do you need to know?
 Can you remember this?
Introduction
 Big data-> big and complicated schema
 Hard to remember and operate!
 May not even fit in main memory!
 What should we do about it?
 How does information retrieval deal with this?
Introduction
 Search based on keywords
 No need for schema
 Efficiency guaranteed using index
 It all seems straightforward and easy
 What are the challenges?
Introduction
 Search for “Apple + company”
 Match to “apple (fruit)”, “Apple Inc.”, “Adam’s apple”
 Which one is correct? How to filter?
Challenge 1: Filtering and disambiguation
Introduction
 Search for “Steve Jobs + Apple”
 Normalization. What to return ?
ID | Name | Gender | Employer | Location
ID | Company | Location | Type | Product
ID | Street | City | State | Country
Challenge 2: Automatic join back
Introduction
 Search for “Jordan”
 Match “Jordan (brand)” , ”Michael Jordan (player)”,
“Michael Jordan (professor)” etc.
 All of them should match. Which one is better ?
 Ranking
Challenge 3: Ranking of the result
Literature Overview
 Two kinds of approaches
 1. Interpretative approach
 Reuse database query language and index
 Translate the keywords into queries
 Will introduce 3 papers
 2. Un-interpretative approach (focus)
 Typically build own index and data structure
 Model as graph and use graph-based analysis
 Will introduce 3 papers
Literature Overview – Interpretative approach
 DBXplorer, Sanjay Agrawal et al.
 General: two steps
 Publish step: pre-computation, indexing etc.
 Search step: lookup, enumerate over join tree, generate SQL etc.
 Efficiency:
 Symbol table (index) design
 Symbol table compaction
Literature Overview –
Interpretative approach
 Publish step:
 1: A database is identified, along with the set of
tables and columns within the database to be
published.
 2: Auxiliary tables are created for supporting
keyword searches. E.g. index table
 But, how to build efficient index ?
Literature Overview –
Interpretative approach
 Index goal: for each keyword, find the row_id and column_id where it occurs.
 If the column (attribute) already has an index, we need only a column_id index (reuse the database index)
[Figure: table with columns ID, Name, Gender, Addr, Org and rows 1 to 3, annotated with a column index and a row index]
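The index idea above can be sketched as a small inverted index. This is only an illustration under stated assumptions: the sample rows and the helper name `build_symbol_table` are hypothetical and are not taken from DBXplorer itself.

```python
from collections import defaultdict

# Toy table: row_id -> {column: value}. The sample data is invented for illustration.
PERSON = {
    1: {"Name": "Alice Smith", "Gender": "F", "Org": "Apple"},
    2: {"Name": "Bob Jones", "Gender": "M", "Org": "Acme"},
    3: {"Name": "Apple Martin", "Gender": "F", "Org": "Acme"},
}

def build_symbol_table(table_name, rows):
    """Map each lowercased keyword to the (table, column, row_id) cells containing it."""
    index = defaultdict(set)
    for row_id, row in rows.items():
        for column, value in row.items():
            for token in str(value).lower().split():
                index[token].add((table_name, column, row_id))
    return index

symbol_table = build_symbol_table("Person", PERSON)

# Cell-level lookup: which rows and columns contain "apple"?
hits = sorted(symbol_table["apple"])

# Column-level lookup: enough when the column already has a database index.
columns = sorted({(table, column) for table, column, _ in symbol_table["apple"]})
print(hits, columns)
```

The column-level variant stores far fewer entries, which is the trade-off the slide points at: row lookups are then delegated to the database's own index.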
Literature Overview –
Interpretative approach
 Compress index table
 Foreign key constraint etc.
 General Algorithm -- CP-Comp
[Figure: Sells table (Name, Product, …) and Person table (Name, Gender, …); Table 1: compressed table; Table 2: uncompressed table]
Literature Overview –
Interpretative approach
 Search step
 Step 1: look up the index to find the columns/rows of the database that contain the query keywords.
 Step 2: All potential subsets of tables in the database that, if joined, might contain rows having all keywords, are identified and enumerated as join trees.
 Step 3: For each enumerated join tree, a SQL
statement is constructed (and executed) that joins
the tables in the tree and selects those rows that
contain all keywords. The final rows are ranked
and presented to the user.
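Step 3 can be sketched for a single join tree as follows. This is a minimal sketch, not DBXplorer's generator: the function `build_sql`, the two-table schema and the sample rows are assumptions made up for the example, and SQLite stands in for the target database.

```python
import sqlite3

def build_sql(join_edges, keyword_columns):
    """Build one parameterized SQL statement for a single join tree.

    join_edges: list of (table_a, column_a, table_b, column_b) join conditions.
    keyword_columns: {keyword: (table, column)} as found by the index lookup.
    Table and column names are assumed to come from the schema, not from the user.
    """
    tables = []
    for table_a, _, table_b, _ in join_edges:
        for table in (table_a, table_b):
            if table not in tables:
                tables.append(table)
    for table, _ in keyword_columns.values():
        if table not in tables:
            tables.append(table)
    conditions = [f"{ta}.{ca} = {tb}.{cb}" for ta, ca, tb, cb in join_edges]
    params = []
    for keyword, (table, column) in keyword_columns.items():
        conditions.append(f"{table}.{column} LIKE ?")  # keyword passed as a parameter
        params.append(f"%{keyword}%")
    sql = f"SELECT * FROM {', '.join(tables)} WHERE {' AND '.join(conditions)}"
    return sql, params

# Hypothetical schema and data for the query "Steve Jobs + Apple".
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Person (ID INTEGER PRIMARY KEY, Name TEXT, Employer INTEGER REFERENCES Company(ID));
INSERT INTO Company VALUES (1, 'Apple Inc.'), (2, 'Pixar');
INSERT INTO Person VALUES (1, 'Steve Jobs', 1), (2, 'Tim Cook', 1), (3, 'Jobs Seeker', 2);
""")

sql, params = build_sql(
    join_edges=[("Person", "Employer", "Company", "ID")],
    keyword_columns={"jobs": ("Person", "Name"), "apple": ("Company", "Name")},
)
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

Only the row that joins a matching person to a matching company survives, which is the "rows that contain all keywords" condition of Step 3.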
Literature Overview –
Interpretative approach
 Join Tree example:
Literature Overview –
Interpretative approach
 Keyword Search in Databases: The Power of
RDBMS
 Lu Qin et al.
 SIGMOD 09
Integrating IR and DB
 DB techniques provide users with efficient
ways to access structured data in RDBMSs
 IR techniques allow users to use keywords to
access unstructured data
 E.g. structural keyword search finds how tuples that contain keywords in an RDB are interconnected (the structure). Three types:
Schema-based approach
Connected Tree Semantics: query results in a minimal total joining network of tuples; adjacent tuples are joined by a foreign key reference, #tuples <= Tmax
Connected Tree Semantics
 1. Candidate Network (CN) generation: relational algebra expressions that create trees containing all keywords, up to a certain size
 2. CN evaluation: evaluate the generated CNs using SQL
Schema-based approach
Distinct Root Semantics: query results in a collection of tuples all reachable from a root; the root uniquely defines the tuples, distance(any tuple, root) <= Dmax
Schema-based approach
Distinct Core Semantics: query results in multi-center subgraphs (communities); the keyword tuples uniquely define a community, distance(any keyword tuple, any center tuple) <= Dmax
Distinct Core/Root Semantics
 1. Create pairs between each tuple containing a keyword and every other tuple, recording the shortest distance between them
 2. Generate graphs with distinct cores/roots using SQL
Literature Overview –
Interpretative approach
 Keyword search over relational databases: a metadata approach.
 Bergamaschi et al.
 SIGMOD 11
Problem Definition
 A database D is a collection of relational tables. Each relational table contains its name, attributes and value domains. All these elements together form the vocabulary.
 A keyword query q is an ordered list of keywords. Each keyword specifies an element of interest.
 A configuration of a keyword query on a database is an injective mapping from the keywords to the vocabulary of the database.
 Task: first derive the top configurations based on some metrics, then interpret each as an SQL query (select-project-join interpretations).
From Keywords to Queries
 Need to consider the inter-dependency of the query keywords: introduce two different kinds of weights, the intrinsic weights and the contextual weights.
 Need to give a ranked list of all the configurations: develop an algorithm that is based on and extends the Hungarian (a.k.a. Munkres) algorithm.
 Need to separate the process of evaluating the schema terms and the value terms: evaluate the value weights based on the schema mapping.
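To make "ranked list of configurations" concrete, the sketch below scores every injective keyword-to-term mapping and sorts them. It deliberately swaps in brute-force enumeration for the extended Hungarian algorithm and uses intrinsic weights only (no contextual weights), so it is only practical for tiny inputs. The weights, term names and the function `rank_configurations` are invented for illustration.

```python
from itertools import permutations

def rank_configurations(keywords, terms, weight, top_k=3):
    """Enumerate injective keyword -> term mappings and rank them by total weight.

    Brute force stands in for the Hungarian (Munkres) algorithm here; the
    result is the same ranking on small inputs, at exponential cost.
    """
    scored = []
    for chosen in permutations(terms, len(keywords)):  # each term used at most once
        total = sum(weight[(kw, term)] for kw, term in zip(keywords, chosen))
        scored.append((total, dict(zip(keywords, chosen))))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

keywords = ["jobs", "apple"]
terms = ["Person.Name", "Company.Name", "Company.Product"]
# Hypothetical intrinsic weights on a 0-10 scale.
weight = {
    ("jobs", "Person.Name"): 9, ("jobs", "Company.Name"): 2, ("jobs", "Company.Product"): 1,
    ("apple", "Person.Name"): 1, ("apple", "Company.Name"): 8, ("apple", "Company.Product"): 3,
}

best = rank_configurations(keywords, terms, weight)
for total, mapping in best:
    print(total, mapping)
```

A contextual weight would adjust `weight[(kw, term)]` depending on the terms already chosen for the other keywords, which is what makes the real problem harder than a plain assignment.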
Contributions and Insights
 Formally define the problem of keyword querying over relational databases that lack a-priori access to the database instance.
 Introduce the notion of a weight as a measure of the likelihood that the semantics of a keyword are represented by a database structure. Need to consider both intrinsic weights and contextual weights.
 Extend and exploit the Hungarian (a.k.a. Munkres) algorithm to generate a ranking of different interpretations.
Literature Overview
 Two kinds of approaches
 1. Interpretative approach
 Reuse database query language and index
 Translate the keywords into queries
 2. Un-interpretative approach
 Typically build own index and data structure
 Model as graph and use graph-based analysis
Literature Overview –
Un-interpretative approach
 Effective Keyword Search in Relational
Databases
 Fang Liu et al.
 SIGMOD 06
Difficulties of Keyword Search
 Keyword search in text databases only needs to compute a score for each document
 Keyword search on an RDBMS is more complicated (relations, attributes, tuples):
 1. Generate tuple trees (answers) by joining tuples from different tables
 2. Rank the answers by computing a score
Generate Answer Tuple Trees
 Tuple tree answer rules:
1. Each leaf node in a tuple tree must contain at least one keyword
2. Each tuple appears at most once in a tree
 Separate tuples into tuple sets that contain keywords and tuple sets that contain all tuples for each relation; join adjacent sets from the schema graph within the constraints of answer trees
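The two answer rules can be checked mechanically. The sketch below assumes a hypothetical representation (a list of tuple ids, a list of undirected edges, and the text of each tuple) and assumes the edges already form a tree; it is not the paper's generation algorithm, only a validity check.

```python
def is_valid_tuple_tree(nodes, edges, text, keywords):
    """Check the two answer rules; assumes `edges` already connect `nodes` into a tree."""
    # Rule 2: each tuple appears at most once in the tree.
    if len(nodes) != len(set(nodes)):
        return False
    degree = {node: 0 for node in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    leaves = [node for node in nodes if degree[node] <= 1]
    # Rule 1: every leaf must contain at least one query keyword.
    wanted = set(keywords)
    return all(set(text[node].lower().split()) & wanted for node in leaves)

# Invented example tuples: an author, a linking "writes" tuple, and a paper.
text = {"a1": "S. Sudarshan", "w1": "", "p1": "BANKS keyword search"}
query = ["sudarshan", "banks"]

good = is_valid_tuple_tree(["a1", "w1", "p1"], [("a1", "w1"), ("w1", "p1")], text, query)
bad = is_valid_tuple_tree(["a1", "w1"], [("a1", "w1")], text, query)  # leaf w1 has no keyword
print(good, bad)
```

Rule 1 is what prunes trees that end in a keyword-free tuple such as `w1`: that tuple adds length to the answer without adding a match.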
Ranking Tuple Trees
 Treat the text of each tuple within an answer set as a “document”
 Assign a similarity rating between each document and the query, normalizing for:
 Term Frequency
 Document Frequency
 Document Length
 Compute the score for a tuple tree as the average over all documents
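A rough sketch of this scoring scheme follows. The formula is a generic TF-IDF with pivoted length normalization, used here as a stand-in and not the exact formula from Liu et al.; the slope of 0.2, the helper names and the three sample tuples are all assumptions.

```python
import math

def tuple_score(tokens, query, doc_freq, n_docs, avg_len, slope=0.2):
    """Score one tuple ("document") against the query with a simplified TF-IDF."""
    score = 0.0
    for term in query:
        tf = tokens.count(term)
        if tf == 0:
            continue
        tf_part = 1 + math.log(tf)                                # term frequency, dampened
        idf = math.log((n_docs + 1) / doc_freq[term])             # document frequency
        length_norm = (1 - slope) + slope * len(tokens) / avg_len  # document length
        score += tf_part / length_norm * idf
    return score

def tree_score(tree, docs, query, doc_freq, n_docs, avg_len):
    """Score of a tuple tree = average score over its tuples."""
    return sum(tuple_score(docs[t], query, doc_freq, n_docs, avg_len) for t in tree) / len(tree)

# Invented tuples, already tokenized and lowercased.
docs = {"p1": ["steve", "jobs"], "c1": ["apple", "inc"], "c2": ["pixar", "animation", "studios"]}
query = ["jobs", "apple"]
n_docs = len(docs)
avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
doc_freq = {term: sum(term in tokens for tokens in docs.values()) for term in query}

score_a = tree_score(["p1", "c1"], docs, query, doc_freq, n_docs, avg_len)  # both tuples match
score_b = tree_score(["p1", "c2"], docs, query, doc_freq, n_docs, avg_len)  # only one matches
print(score_a > score_b)
```

Averaging rather than summing means a tree does not win merely by being larger, which is why the non-matching tuple in the second tree halves its score.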
Focused work
 Keyword Searching and Browsing in
Databases using BANKS
 Gaurav Bhalotia et al.
 ICDE 02
BANKS (Browsing And Keyword Searching)
 a system which enables keyword-based search on relational databases, together with data and schema browsing
[Figure: architecture: User, HTTP, BANKS System, JDBC, Database]
Database and Query Model
 Relational Database -> Directed Graph
 Each Tuple in Database -> Node in Graph
 Foreign Key -> Directed Edge
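The mapping above can be sketched in a few lines. The tuple representation (a dict keyed by `(table, primary_key)` whose foreign-key attributes hold the key of the referenced tuple) and the function `build_graph` are assumptions for illustration, not the BANKS data structures.

```python
# Hypothetical tuples: (table, primary key) -> attributes.
# A foreign-key attribute holds the (table, primary key) of the tuple it references.
tuples = {
    ("Paper", 1): {"title": "BANKS: Keyword search"},
    ("Author", 1): {"name": "S. Sudarshan"},
    ("Author", 2): {"name": "Prasan Roy"},
    ("Writes", 1): {"author": ("Author", 1), "paper": ("Paper", 1)},
    ("Writes", 2): {"author": ("Author", 2), "paper": ("Paper", 1)},
}

def build_graph(tuples):
    """One node per tuple; one directed edge per foreign-key reference."""
    graph = {node: [] for node in tuples}
    for node, attrs in tuples.items():
        for value in attrs.values():
            if value in tuples:  # the attribute value is a reference to another tuple
                graph[node].append(value)
    return graph

graph = build_graph(tuples)
for node, targets in graph.items():
    print(node, "->", targets)
```

Edges point from the referencing tuple to the referenced one, so the `Writes` tuples have outgoing edges while `Paper` and `Author` tuples only have incoming ones.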
Database and Query Model
 An answer to a query should be a subgraph connecting nodes matching the keywords.
 The importance of a link depends upon the type of the link, i.e. what relations it connects, and on its semantics
 Ignoring directionality would cause problems because of “hubs” which are connected to a large number of nodes.
Database and Query Model
 We may restrict the information node to be from a selected set of nodes of the graph
 We incorporate another interesting feature, namely node weights, inspired by prestige rankings
 Node weights and tree weights need to be combined to get an overall relevance score
Formal Model
 Node Weight : N(u)
Depends on the prestige
Set the node prestige = the in-degree of
the node
Nodes that have multiple pointers to
them get a higher prestige
Formal Model
 Edge Weights
Some popular tuples can be connected to many other tuples -> edges with forward and backward edge weights
Weight of a forward link = the strength of the proximity relationship between two tuples (set to 1 by default)
Weight of a backward link = in-degree of edges pointing to the node
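The node and edge weights described above can be computed directly from the forward edges. This is a simplified sketch: it uses the raw in-degree for both prestige and backward weights, and when a forward and a backward edge coincide it keeps the smaller weight, which is an assumption of this example. The graph and node names are invented.

```python
def compute_weights(graph):
    """Return (node_weight, edge_weight) for a graph of forward foreign-key edges.

    graph: {node: [referenced nodes]}; every referenced node must also be a key.
    """
    in_degree = {node: 0 for node in graph}
    for targets in graph.values():
        for v in targets:
            in_degree[v] += 1

    node_weight = dict(in_degree)  # prestige N(u) = in-degree of u

    edge_weight = {}
    def put(edge, value):
        # Keep the smaller weight if the same directed edge is produced twice.
        edge_weight[edge] = min(edge_weight.get(edge, value), value)

    for u, targets in graph.items():
        for v in targets:
            put((u, v), 1)             # forward link: default weight 1
            put((v, u), in_degree[v])  # backward link: in-degree of the referenced node
    return node_weight, edge_weight

graph = {
    "writes1": ["sudarshan", "paper"],
    "writes2": ["roy", "paper"],
    "sudarshan": [], "roy": [], "paper": [],
}
node_weight, edge_weight = compute_weights(graph)
print(node_weight["paper"], edge_weight[("paper", "writes1")])
```

Backward edges out of a heavily referenced node become expensive, which discourages answers that connect keywords only through a hub.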
Formal Model

Result
Result of query “sudarshan soumen”
Searching for the best answer
 Backward Expanding Search Algorithm
Intuition: find vertices from which a forward path exists to at least one node from each Si (the set of nodes matching keyword i).
Run concurrent single source shortest path algorithms, one from each node matching a keyword
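The intuition can be sketched with Dijkstra's algorithm run over reversed edges. This simplification runs one multi-source search per keyword set to completion, instead of interleaving concurrent iterators and emitting answers incrementally as BANKS does. The graph, node names and function names are hypothetical.

```python
import heapq

def backward_distances(reverse_adj, sources):
    """Multi-source Dijkstra over reversed edges.

    dist[v] is the cost of the cheapest forward path from v to some source node.
    """
    dist = {s: 0 for s in sources}
    heap = [(0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in reverse_adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def answer_roots(edges, keyword_sets):
    """Return (total cost, root) for every node that reaches all keyword sets."""
    reverse_adj = {}
    for u, v, w in edges:  # forward edge u -> v with weight w
        reverse_adj.setdefault(v, []).append((u, w))
    per_keyword = [backward_distances(reverse_adj, s) for s in keyword_sets]
    common = set(per_keyword[0]).intersection(*per_keyword[1:])
    return sorted((sum(d[root] for d in per_keyword), root) for root in common)

# Forward edges only; backward edges would be added as extra weighted entries.
edges = [
    ("writes1", "sudarshan", 1), ("writes1", "paper", 1),
    ("writes2", "roy", 1), ("writes2", "paper", 1),
]
roots = answer_roots(edges, [{"sudarshan"}, {"paper"}])
print(roots)
```

Here `writes1` is the only node with a forward path to both keyword nodes, so it becomes the root of the single answer tree.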
Searching for the best answer
[Figure: example graph with nodes “S. Sudarshan”, “Prasan Roy”, “Charuta”, “writes”, “author”, “paper” and “BANKS: Keyword search…”]
As an extension of BANKS
 BLINKS: ranked keyword searches on
graphs.
 He H et al.
 SIGMOD 07
Introduction
 Efficient ranked keyword searches on schemaless node-labeled graphs.
 Challenges:
 Lack of schema for optimization
 Hard to guarantee strong performance
 Proposed technique
 Backward search algorithm
 SLINKS: single-level index search *
 Extension for scalability: BLINKS (bi-level index search)
 Contributions
 Cost-balanced, expansion-based backward search
 Combining indexing with search
 Partition-based indexing (bi-level indexing)
Problem Formulation

Backward search algorithm

A single level index

A single level index

SLINKS Algorithm

BLINKS (brief idea)
 The index is too large to store and too expensive to construct for large graphs?
Use a divide-and-conquer approach to create a bi-level index
 Partition the data graph into multiple subgraphs, or blocks.
 Intra-Block Index
 indexes information inside a block
 4 kinds of index, 2 for separator nodes (important, so specially considered)
 Block Index
 2 simple indexes
Conclusion
 Keywords search challenges:
 Filtering and disambiguation
 Automatic join back
 Ranking of the result
 Additional consideration:
 Efficiency
 Space
Thank you and have fun

More Related Content

What's hot

Relational model
Relational modelRelational model
Relational model
Sabana Maharjan
 
Relational model
Relational modelRelational model
Relational model
Dabbal Singh Mahara
 
COMPUTERS Database
COMPUTERS Database COMPUTERS Database
COMPUTERS Database
Rc Os
 
Some Information Retrieval Models and Our Experiments for TREC KBA
Some Information Retrieval Models and Our Experiments for TREC KBASome Information Retrieval Models and Our Experiments for TREC KBA
Some Information Retrieval Models and Our Experiments for TREC KBA
Patrice Bellot - Aix-Marseille Université / CNRS (LIS, INS2I)
 
Object Oriented Dbms
Object Oriented DbmsObject Oriented Dbms
Object Oriented Dbms
maryeem
 
Lecture 07 relational database management system
Lecture 07 relational database management systemLecture 07 relational database management system
Lecture 07 relational database management systememailharmeet
 
Development of a new indexing technique for XML document retrieval
Development of a new indexing technique for XML document retrievalDevelopment of a new indexing technique for XML document retrieval
Development of a new indexing technique for XML document retrievalAmjad Ali
 
08. Object Oriented Database in DBMS
08. Object Oriented Database in DBMS08. Object Oriented Database in DBMS
08. Object Oriented Database in DBMSkoolkampus
 
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASESSCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
ijwscjournal
 
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASESSCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
ijwscjournal
 
Unit 3 rdbms study_materials-converted
Unit 3  rdbms study_materials-convertedUnit 3  rdbms study_materials-converted
Unit 3 rdbms study_materials-converted
gayaramesh
 
Intro to relational model
Intro to relational modelIntro to relational model
Intro to relational model
ATS SBGI MIRAJ
 
Text Analytics Presentation
Text Analytics PresentationText Analytics Presentation
Text Analytics PresentationSkylar Ritchie
 
Mi0034 database management systems
Mi0034  database management systemsMi0034  database management systems
Mi0034 database management systemssmumbahelp
 
Object relationship mapping and hibernate
Object relationship mapping and hibernateObject relationship mapping and hibernate
Object relationship mapping and hibernateJoe Jacob
 
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...
ijseajournal
 
Linked List Problems
Linked List ProblemsLinked List Problems
Linked List ProblemsSriram Raj
 
Denormalization
DenormalizationDenormalization
Denormalization
Sohail Haider
 
Database Management System-session1-2
Database Management System-session1-2Database Management System-session1-2
Database Management System-session1-2
Infinity Tech Solutions
 

What's hot (20)

Relational model
Relational modelRelational model
Relational model
 
Bc0041
Bc0041Bc0041
Bc0041
 
Relational model
Relational modelRelational model
Relational model
 
COMPUTERS Database
COMPUTERS Database COMPUTERS Database
COMPUTERS Database
 
Some Information Retrieval Models and Our Experiments for TREC KBA
Some Information Retrieval Models and Our Experiments for TREC KBASome Information Retrieval Models and Our Experiments for TREC KBA
Some Information Retrieval Models and Our Experiments for TREC KBA
 
Object Oriented Dbms
Object Oriented DbmsObject Oriented Dbms
Object Oriented Dbms
 
Lecture 07 relational database management system
Lecture 07 relational database management systemLecture 07 relational database management system
Lecture 07 relational database management system
 
Development of a new indexing technique for XML document retrieval
Development of a new indexing technique for XML document retrievalDevelopment of a new indexing technique for XML document retrieval
Development of a new indexing technique for XML document retrieval
 
08. Object Oriented Database in DBMS
08. Object Oriented Database in DBMS08. Object Oriented Database in DBMS
08. Object Oriented Database in DBMS
 
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASESSCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
 
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASESSCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
 
Unit 3 rdbms study_materials-converted
Unit 3  rdbms study_materials-convertedUnit 3  rdbms study_materials-converted
Unit 3 rdbms study_materials-converted
 
Intro to relational model
Intro to relational modelIntro to relational model
Intro to relational model
 
Text Analytics Presentation
Text Analytics PresentationText Analytics Presentation
Text Analytics Presentation
 
Mi0034 database management systems
Mi0034  database management systemsMi0034  database management systems
Mi0034 database management systems
 
Object relationship mapping and hibernate
Object relationship mapping and hibernateObject relationship mapping and hibernate
Object relationship mapping and hibernate
 
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...
 
Linked List Problems
Linked List ProblemsLinked List Problems
Linked List Problems
 
Denormalization
DenormalizationDenormalization
Denormalization
 
Database Management System-session1-2
Database Management System-session1-2Database Management System-session1-2
Database Management System-session1-2
 

Viewers also liked

Interactive Query and Search for your Big Data
Interactive Query and Search for your Big DataInteractive Query and Search for your Big Data
Interactive Query and Search for your Big Data
DataWorks Summit
 
Keyword-based Search and Exploration on Databases (SIGMOD 2011)
Keyword-based Search and Exploration on Databases (SIGMOD 2011)Keyword-based Search and Exploration on Databases (SIGMOD 2011)
Keyword-based Search and Exploration on Databases (SIGMOD 2011)weiw_oz
 
Structured Document Search and Retrieval
Structured Document Search and RetrievalStructured Document Search and Retrieval
Structured Document Search and Retrieval
Optum
 
Information retrival system and PageRank algorithm
Information retrival system and PageRank algorithmInformation retrival system and PageRank algorithm
Information retrival system and PageRank algorithm
Rupali Bhatnagar
 
Naive Bayesian Text Classifier Event Models
Naive Bayesian Text Classifier Event ModelsNaive Bayesian Text Classifier Event Models
Naive Bayesian Text Classifier Event Models
DKALab
 
E-Learning Baseline, UCL
E-Learning Baseline, UCLE-Learning Baseline, UCL
E-Learning Baseline, UCL
Jisc
 
Text classification & sentiment analysis
Text classification & sentiment analysisText classification & sentiment analysis
Text classification & sentiment analysis
M. Atif Qureshi
 
Keyword proximity search in xml trees andrada astefanoaie - presentation
Keyword proximity search in xml trees   andrada astefanoaie - presentationKeyword proximity search in xml trees   andrada astefanoaie - presentation
Keyword proximity search in xml trees andrada astefanoaie - presentationAndrada Astefanoaie
 

Viewers also liked (8)

Interactive Query and Search for your Big Data
Interactive Query and Search for your Big DataInteractive Query and Search for your Big Data
Interactive Query and Search for your Big Data
 
Keyword-based Search and Exploration on Databases (SIGMOD 2011)
Keyword-based Search and Exploration on Databases (SIGMOD 2011)Keyword-based Search and Exploration on Databases (SIGMOD 2011)
Keyword-based Search and Exploration on Databases (SIGMOD 2011)
 
Structured Document Search and Retrieval
Structured Document Search and RetrievalStructured Document Search and Retrieval
Structured Document Search and Retrieval
 
Information retrival system and PageRank algorithm
Information retrival system and PageRank algorithmInformation retrival system and PageRank algorithm
Information retrival system and PageRank algorithm
 
Naive Bayesian Text Classifier Event Models
Naive Bayesian Text Classifier Event ModelsNaive Bayesian Text Classifier Event Models
Naive Bayesian Text Classifier Event Models
 
E-Learning Baseline, UCL
E-Learning Baseline, UCLE-Learning Baseline, UCL
E-Learning Baseline, UCL
 
Text classification & sentiment analysis
Text classification & sentiment analysisText classification & sentiment analysis
Text classification & sentiment analysis
 
Keyword proximity search in xml trees andrada astefanoaie - presentation
Keyword proximity search in xml trees   andrada astefanoaie - presentationKeyword proximity search in xml trees   andrada astefanoaie - presentation
Keyword proximity search in xml trees andrada astefanoaie - presentation
 

Similar to Presentation

Dbms Lec Uog 02
Dbms Lec Uog 02Dbms Lec Uog 02
Dbms Lec Uog 02smelltulip
 
Database_Introduction.pdf
Database_Introduction.pdfDatabase_Introduction.pdf
Database_Introduction.pdf
Satyanarayan Shenoy
 
No sql – rise of the clusters
No sql – rise of the clustersNo sql – rise of the clusters
No sql – rise of the clusters
responseteam
 
Kskv kutch university DBMS unit 1 basic concepts, data,information,database,...
Kskv kutch university DBMS unit 1  basic concepts, data,information,database,...Kskv kutch university DBMS unit 1  basic concepts, data,information,database,...
Kskv kutch university DBMS unit 1 basic concepts, data,information,database,...
Dipen Parmar
 
Bt0066 dbms
Bt0066 dbmsBt0066 dbms
Bt0066 dbms
smumbahelp
 
Database Management System, Lecture-1
Database Management System, Lecture-1Database Management System, Lecture-1
Database Management System, Lecture-1
Sonia Mim
 
Bca examination 2015 dbms
Bca examination 2015 dbmsBca examination 2015 dbms
Bca examination 2015 dbms
Anjaan Gajendra
 
Codds rules & keys
Codds rules & keysCodds rules & keys
Codds rules & keys
Balasingham Karthiban
 
Dbms relational model
Dbms relational modelDbms relational model
Dbms relational model
Chirag vasava
 
Databases and its representation
Databases and its representationDatabases and its representation
Databases and its representation
Ruhull
 
NIF as a Multi-Model Semantic Information System
NIF as a Multi-Model Semantic Information SystemNIF as a Multi-Model Semantic Information System
NIF as a Multi-Model Semantic Information System
Neuroscience Information Framework
 
EFFICIENT SCHEMA BASED KEYWORD SEARCH IN RELATIONAL DATABASES
EFFICIENT SCHEMA BASED KEYWORD SEARCH IN RELATIONAL DATABASESEFFICIENT SCHEMA BASED KEYWORD SEARCH IN RELATIONAL DATABASES
EFFICIENT SCHEMA BASED KEYWORD SEARCH IN RELATIONAL DATABASES
IJCSEIT Journal
 
nosql.pptx
nosql.pptxnosql.pptx
nosql.pptx
Prakash Zodge
 
Database Concepts & SQL(1).pdf
Database Concepts & SQL(1).pdfDatabase Concepts & SQL(1).pdf
Database Concepts & SQL(1).pdf
rsujeet169
 
Implementing the Database Server session 01
Implementing the Database Server  session 01Implementing the Database Server  session 01
Implementing the Database Server session 01
Guillermo Julca
 
D.dsgn + dbms
D.dsgn + dbmsD.dsgn + dbms
D.dsgn + dbms
Dori Dorian
 
For project
For projectFor project
For project
jesalnmistry
 

Similar to Presentation (20)

Dbms Lec Uog 02
Dbms Lec Uog 02Dbms Lec Uog 02
Dbms Lec Uog 02
 
Database_Introduction.pdf
Database_Introduction.pdfDatabase_Introduction.pdf
Database_Introduction.pdf
 
Week 1
Week 1Week 1
Week 1
 
No sql – rise of the clusters
No sql – rise of the clustersNo sql – rise of the clusters
No sql – rise of the clusters
 
Kskv kutch university DBMS unit 1 basic concepts, data,information,database,...
Kskv kutch university DBMS unit 1  basic concepts, data,information,database,...Kskv kutch university DBMS unit 1  basic concepts, data,information,database,...
Kskv kutch university DBMS unit 1 basic concepts, data,information,database,...
 
Bt0066 dbms
Bt0066 dbmsBt0066 dbms
Bt0066 dbms
 
Database Management System, Lecture-1
Database Management System, Lecture-1Database Management System, Lecture-1
Database Management System, Lecture-1
 
Bca examination 2015 dbms
Bca examination 2015 dbmsBca examination 2015 dbms
Bca examination 2015 dbms
 
2 rel-algebra
2 rel-algebra2 rel-algebra
2 rel-algebra
 
Codds rules & keys
Codds rules & keysCodds rules & keys
Codds rules & keys
 
Dbms relational model
Dbms relational modelDbms relational model
Dbms relational model
 
Databases and its representation
Databases and its representationDatabases and its representation
Databases and its representation
 
NIF as a Multi-Model Semantic Information System
NIF as a Multi-Model Semantic Information SystemNIF as a Multi-Model Semantic Information System
NIF as a Multi-Model Semantic Information System
 
EFFICIENT SCHEMA BASED KEYWORD SEARCH IN RELATIONAL DATABASES
EFFICIENT SCHEMA BASED KEYWORD SEARCH IN RELATIONAL DATABASESEFFICIENT SCHEMA BASED KEYWORD SEARCH IN RELATIONAL DATABASES
EFFICIENT SCHEMA BASED KEYWORD SEARCH IN RELATIONAL DATABASES
 
nosql.pptx
nosql.pptxnosql.pptx
nosql.pptx
 
Database Concepts & SQL(1).pdf
Database Concepts & SQL(1).pdfDatabase Concepts & SQL(1).pdf
Database Concepts & SQL(1).pdf
 
Implementing the Database Server session 01
Implementing the Database Server  session 01Implementing the Database Server  session 01
Implementing the Database Server session 01
 
Ch10
Ch10Ch10
Ch10
 
D.dsgn + dbms
D.dsgn + dbmsD.dsgn + dbms
D.dsgn + dbms
 
For project
For projectFor project
For project
 

Recently uploaded

Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
ThomasParaiso2
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
Rohit Gautam
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 


Presentation

  • 1. KEYWORDS SEARCH ON STRUCTURED DATABASE Xiaoyu Chen, Min Li, Yihan Gao, Tianning Xu
  • 2. Introduction
      - Structured data
      - Schema as a summary of the data
      - Retrieve through structured language
      - What would big data bring to structured data retrieval?
  • 3. Introduction
      - In terms of high volume of data
      - Hadoop + Pig Latin came to the rescue
      - However, is this enough?
      - Recall how you write a selection. What do you need to know?
      - Can you remember this?
  • 4. Introduction
      - Big data -> big and complicated schema
      - Hard to remember and operate!
      - May not even fit in main memory!
      - What should we do about it?
      - How does information retrieval deal with this?
  • 5. Introduction
      - Search based on keywords
      - No need for schema
      - Efficiency guaranteed using index
      - All seems to be straightforward and easy
      - What are the challenges?
  • 6. Introduction
      - Search for “Apple + company”
      - Match to “apple (fruit)”, “Apple Inc.”, “Adam's apple”
      - Which one is correct? How to filter?
      - Challenge 1: Filtering and disambiguation
  • 7. Introduction
      - Search for “Steve Jobs + Apple”
      - Normalization. What to return?
      - Example table schemas: (ID, Name, Gender, Employer, Location); (ID, Company, Location, Type, Product); (ID, Street, City, State, Country)
      - Challenge 2: Automatic join back
  • 8. Introduction
      - Search for “Jordan”
      - Match “Jordan (brand)”, “Michael Jordan (player)”, “Michael Jordan (professor)”, etc.
      - All of them should match. Which one is better?
      - Ranking
      - Challenge 3: Ranking of the result
  • 9. Literature Overview
      - Two kinds of approaches
      - 1. Interpretative approach
          - Reuse database query language and index
          - Translate the keywords into queries
          - Will introduce 3 papers
      - 2. Un-interpretative approach (focus)
          - Typically build own index and data structure
          - Model as graph and use graph-based analysis
          - Will introduce 3 papers
  • 10. Literature Overview – Interpretative approach
      - DBXplorer, Sanjay Agrawal et al.
      - General: two steps
          - Publish step: pre-computation, indexing, etc.
          - Search step: lookup, enumerate over join tree, generate SQL, etc.
      - Efficiency:
          - Symbol table (index) design
          - Symbol table compaction
  • 11. Literature Overview – Interpretative approach
      - Publish step:
          - 1: A database is identified, along with the set of tables and columns within the database to be published.
          - 2: Auxiliary tables are created for supporting keyword searches, e.g. an index table.
      - But how to build an efficient index?
  • 12. Literature Overview – Interpretative approach
      - Index goal: for each keyword, find the row_id and column_id where it occurs.
      - If the column (attribute) already has an index, we need only a column_id index (reuse the database index).
      - [Figure: table with columns ID, Name, Gender, Addr, Org and rows 1 to 3, illustrating the column index and the row index]
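The index goal on slide 12 can be illustrated with a small in-memory symbol table. This is only a sketch under stated assumptions: the sample rows, the `DATABASE` layout, and the function name `build_symbol_table` are hypothetical and are not DBXplorer's actual data structures. The `column_level` flag mimics the case where the database already has an index on the column, so only the column needs to be recorded.

```python
from collections import defaultdict

# Hypothetical toy database: table name -> list of rows.
DATABASE = {
    "Person": [
        {"id": 1, "name": "Steve Jobs", "employer": "Apple"},
        {"id": 2, "name": "Tim Cook", "employer": "Apple"},
    ],
    "Company": [
        {"id": 10, "company": "Apple", "type": "Technology"},
    ],
}


def build_symbol_table(database, column_level=False):
    """Map each lowercase token to the cells (or columns) where it occurs."""
    table = defaultdict(set)
    for table_name, rows in database.items():
        for row_id, row in enumerate(rows):
            for column, value in row.items():
                for token in str(value).lower().split():
                    if column_level:
                        # Column-only entry: rows are found later via the DB's own index.
                        table[token].add((table_name, column))
                    else:
                        table[token].add((table_name, column, row_id))
    return table


cell_index = build_symbol_table(DATABASE)
column_index = build_symbol_table(DATABASE, column_level=True)
```

The column-level variant stores far fewer entries per keyword, which is the space saving the slide alludes to.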
  • 13. Literature Overview – Interpretative approach
      - Compress index table
      - Foreign key constraint etc.
      - General algorithm: CP-Comp
      - [Figure: Sells table (Name, Product, …) and Person table (Name, Gender, …); Table 1: compressed table; Table 2: uncompressed table]
  • 14. Literature Overview – Interpretative approach
      - Search step
          - Step 1: Look up the index to find the columns/rows of the database that contain the query keywords.
          - Step 2: All potential subsets of tables in the database that, if joined, might contain rows having all keywords are identified and enumerated (join trees).
          - Step 3: For each enumerated join tree, a SQL statement is constructed (and executed) that joins the tables in the tree and selects those rows that contain all keywords. The final rows are ranked and presented to the user.
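Step 3 of the search step can be sketched with Python's built-in `sqlite3` module. The schema, the sample rows, and the helper `join_tree_sql` are assumptions made for illustration; the sketch handles a single, already-enumerated join tree and does not attempt DBXplorer's enumeration or ranking.

```python
import sqlite3


def join_tree_sql(tables, joins, matches):
    """Build one SQL statement for a join tree.

    tables:  table names in the tree
    joins:   (left_table, left_col, right_table, right_col) foreign-key edges
    matches: (table, column, keyword) triples produced by the index lookup
    Table and column names are assumed to come from the trusted schema.
    """
    conditions = [f"{lt}.{lc} = {rt}.{rc}" for lt, lc, rt, rc in joins]
    conditions += [f"{t}.{c} LIKE ?" for t, c, _ in matches]
    sql = f"SELECT * FROM {', '.join(tables)} WHERE {' AND '.join(conditions)}"
    params = [f"%{kw}%" for _, _, kw in matches]
    return sql, params


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT,
                         employer_id INTEGER REFERENCES company(id));
    INSERT INTO company VALUES (1, 'Apple'), (2, 'Pixar');
    INSERT INTO person VALUES (1, 'Steve Jobs', 1), (2, 'Tim Cook', 1),
                              (3, 'Ed Catmull', 2);
""")

# Join tree for the query "Jobs Apple": person -- company via employer_id.
sql, params = join_tree_sql(
    tables=["person", "company"],
    joins=[("person", "employer_id", "company", "id")],
    matches=[("person", "name", "Jobs"), ("company", "name", "Apple")],
)
rows = conn.execute(sql, params).fetchall()
```

Keywords are passed as bound parameters, so only schema identifiers are interpolated into the statement text.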
  • 15. Literature Overview – Interpretative approach
      - Join tree example:
  • 16. Literature Overview – Interpretative approach
      - Keyword Search in Databases: The Power of RDBMS
      - Lu Qin et al.
      - SIGMOD 09
  • 17. Integrating IR and DB
      - DB techniques provide users with efficient ways to access structured data in RDBMSs
      - IR techniques allow users to use keywords to access unstructured data
      - E.g., structural keyword search finds how the tuples that contain keywords in an RDB are interconnected (the structure); there are three types:
  • 18. Schema-based approach
      - Connected Tree Semantics: a query results in minimal total joining networks of tuples; adjacent tuples are joined by a foreign key reference; #tuples <= Tmax
  • 19. Connected Tree Semantics
      - 1. Candidate Network (CN) generation: relational algebra expressions that create trees containing all keywords, up to a certain size
      - 2. CN evaluation: evaluate the generated CNs using SQL
  • 20. Schema-based approach
      - Distinct Root Semantics: a query results in a collection of tuples all reachable from a root; the root uniquely defines the tuples; distance(any tuple, root) <= Dmax
  • 21. Schema-based approach
      - Distinct Core Semantics: a query results in multi-center subgraphs (communities); the keyword tuples uniquely define a community; distance(any keyword tuple, any center tuple) <= Dmax
  • 22. Distinct Core/Root Semantics
      - 1. Create pairs between each tuple containing a keyword and every other tuple, recording the shortest distance between them
      - 2. Generate graphs with distinct cores/roots using SQL
  • 23. Literature Overview – Interpretative approach
      - Keyword search over relational databases: a metadata approach
      - Bergamaschi et al.
      - SIGMOD 11
  • 24. Problem Definition
      - A database D is a collection of relational tables. Each relational table has a name, attributes, and value domains. All these elements together form the vocabulary.
      - A keyword query q is an ordered list of keywords. Each keyword specifies an element of interest.
      - A configuration of a keyword query on a database is an injective mapping from the keywords to the vocabulary of the database.
      - Task: first derive the top configurations based on some metrics, then interpret them as SQL queries (select-project-join interpretations).
  • 25. From Keywords to Queries
      - Need to consider the inter-dependency of the query keywords: introduce two kinds of weights, the intrinsic weights and the contextual weights
      - Need to give a ranked list of all the configurations: develop an algorithm that is based on and extends the Hungarian (a.k.a. Munkres) algorithm
      - Need to separate the process of evaluating the schema terms and the value terms: evaluate the value weights based on the schema mapping
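To make the notion of ranked configurations concrete, the sketch below enumerates every injective keyword-to-term mapping and sorts them by total weight. This brute-force enumeration is a stand-in, not the extended Hungarian algorithm the slides refer to; it uses only intrinsic weights (contextual weights are omitted), and the keywords, vocabulary terms, and weight values are all hypothetical.

```python
from itertools import permutations


def rank_configurations(keywords, terms, weight):
    """Return all injective keyword->term mappings, best total weight first."""
    scored = []
    # permutations() never repeats a term, so each mapping is injective.
    for chosen in permutations(terms, len(keywords)):
        total = sum(weight[(k, t)] for k, t in zip(keywords, chosen))
        scored.append((total, dict(zip(keywords, chosen))))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


keywords = ["jobs", "apple"]
terms = ["Person.name", "Company.name", "Product.name"]
# Hypothetical intrinsic weights: how likely a keyword refers to a term.
weight = {
    ("jobs", "Person.name"): 0.9,
    ("jobs", "Company.name"): 0.1,
    ("jobs", "Product.name"): 0.2,
    ("apple", "Person.name"): 0.1,
    ("apple", "Company.name"): 0.8,
    ("apple", "Product.name"): 0.5,
}

ranking = rank_configurations(keywords, terms, weight)
best_score, best_config = ranking[0]
```

Enumeration grows factorially with the number of keywords, which is why an assignment algorithm such as Hungarian/Munkres is attractive for anything beyond toy inputs.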
  • 27. Contributions and Insights
      - Formally define the problem of keyword querying over relational databases that lack a priori access to the database instance
      - Introduce the notion of a weight as a measure of the likelihood that the semantics of a keyword are represented by a database structure. Need to consider both intrinsic weights and contextual weights
      - Extend and exploit the Hungarian (a.k.a. Munkres) algorithm to generate a ranking of different interpretations
  • 28. Literature Overview
      - Two kinds of approaches
      - 1. Interpretative approach
          - Reuse database query language and index
          - Translate the keywords into queries
      - 2. Un-interpretative approach
          - Typically build own index and data structure
          - Model as graph and use graph-based analysis
  • 29. Literature Overview – Un-interpretative approach
      - Effective Keyword Search in Relational Databases
      - Fang Liu et al.
      - SIGMOD 06
  • 30. Difficulties of Keyword Search
      - Keyword search in text databases only needs to compute a score for each document
      - Keyword search on an RDBMS is more complicated (relations, attributes, tuples):
          - 1. Generate tuple trees (answers) by joining tuples from different tables
          - 2. Rank the answers by computing a score
  • 31. Generate Answer Tuple Trees
      - Tuple tree answer rules:
          - 1. Each leaf node in a tuple tree must contain at least one keyword
          - 2. Each tuple appears at most once in a tree
      - Separate tuples into tuple sets that contain keywords and tuple sets that contain all tuples for each relation; join adjacent sets from the schema graph within the constraints of answer trees
  • 32. Ranking Tuple Trees
      - Treat the text of each tuple within an answer set as a “document”
      - Assign a similarity rating between each document and the query, normalizing for:
          - Term frequency
          - Document frequency
          - Document length
      - Compute the score for a tuple tree as the average over all documents
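The ranking idea on slide 32 can be sketched as follows. The scoring formula here is a simplified TF-IDF variant with a length penalty, chosen for illustration; it is an assumption and not the exact formula from the paper. The function names and the toy corpus are hypothetical.

```python
import math


def document_score(doc_tokens, query, doc_freq, num_docs, avg_len):
    """Score one tuple ("document") against the query."""
    score = 0.0
    for term in query:
        tf = doc_tokens.count(term)
        if tf == 0 or doc_freq.get(term, 0) == 0:
            continue
        idf = math.log((num_docs + 1) / doc_freq[term])  # rarer terms weigh more
        length_norm = len(doc_tokens) / avg_len           # penalize long tuples
        score += (1 + math.log(tf)) * idf / length_norm
    return score


def tuple_tree_score(tree, query, corpus):
    """Average the per-document scores over all tuples in the tree."""
    num_docs = len(corpus)
    avg_len = sum(len(doc) for doc in corpus) / num_docs
    doc_freq = {t: sum(1 for doc in corpus if t in doc) for t in query}
    scores = [document_score(doc, query, doc_freq, num_docs, avg_len) for doc in tree]
    return sum(scores) / len(scores)


# Each tuple's text is tokenized into one "document" (toy data).
corpus = [
    ["steve", "jobs"],
    ["apple", "inc"],
    ["apple", "pie", "recipe"],
    ["tim", "cook"],
]
query = ["jobs", "apple"]
score_a = tuple_tree_score([corpus[0], corpus[1]], query, corpus)
score_b = tuple_tree_score([corpus[2], corpus[3]], query, corpus)
```

The tree that covers both keywords with short tuples scores higher than the tree that matches only one keyword in a longer tuple.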
  • 33. Focused work
      - Keyword Searching and Browsing in Databases using BANKS
      - Gaurav Bhalotia et al.
      - ICDE 02
  • 34. BANKS (Browsing And Keyword Searching)
      - A system which enables keyword-based search on relational databases, together with data and schema browsing
      - [Figure: User, HTTP, BANKS System, JDBC, Database]
  • 35. Database and Query Model
      - Relational database -> directed graph
      - Each tuple in the database -> node in the graph
      - Foreign key -> directed edge
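The mapping on slide 35 (tuples become nodes, foreign keys become directed edges) can be sketched in a few lines. The table layout, the sample rows, and the name `build_data_graph` are hypothetical; this is a minimal illustration rather than the BANKS implementation.

```python
def build_data_graph(tables, foreign_keys):
    """Turn a relational database into a directed graph.

    tables:       {table_name: {primary_key: row_dict}}
    foreign_keys: [(table, column, referenced_table)]
    Nodes are (table, primary_key) pairs; each foreign-key value yields a
    directed edge from the referencing tuple to the referenced tuple.
    """
    nodes = {(name, pk) for name, rows in tables.items() for pk in rows}
    edges = set()
    for table, column, ref_table in foreign_keys:
        for pk, row in tables[table].items():
            ref = row.get(column)
            if ref in tables[ref_table]:  # skip NULL or dangling references
                edges.add(((table, pk), (ref_table, ref)))
    return nodes, edges


# Toy bibliographic database (sample data is made up).
tables = {
    "author": {"a1": {"name": "S. Sudarshan"}},
    "paper": {"p1": {"title": "BANKS"}},
    "writes": {1: {"author_id": "a1", "paper_id": "p1"}},
}
foreign_keys = [("writes", "author_id", "author"), ("writes", "paper_id", "paper")]
nodes, edges = build_data_graph(tables, foreign_keys)
```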
  • 37. Database and Query Model
      - An answer to a query should be a subgraph connecting nodes matching the keywords.
      - The importance of a link depends upon the type of the link, i.e. what relations it connects, and on its semantics.
      - Ignoring directionality would cause problems because of “hubs”, which are connected to a large number of nodes.
  • 38. Database and Query Model
      - We may restrict the information node to be from a selected set of nodes of the graph
      - We incorporate another interesting feature, namely node weights, inspired by prestige rankings
      - Node weights and tree weights need to be combined to get an overall relevance score
  • 39. Formal Model
      - Node weight: N(u)
          - Depends on the prestige
          - Set the node prestige = the in-degree of the node
          - Nodes that have multiple pointers to them get a higher prestige
  • 40. Formal Model
      - Edge weights
          - Some popular tuples can be connected to many other tuples
      - Edges with forward and backward edge weights
          - Weight of a forward link = the strength of the proximity relationship between two tuples (set to 1 by default)
          - Weight of a backward link = in-degree of edges pointing to the node
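The node and edge weights from slides 39 and 40 can be sketched as below. Two simplifications are assumed here: the backward weight is taken to be the plain in-degree of the referenced node, and when both directions already exist between two nodes the cheaper weight is kept. The function name `weight_edges` and the sample edges are hypothetical.

```python
from collections import Counter


def weight_edges(edges, forward_weight=1.0):
    """Return (prestige, weights) for a list of forward (src, dst) edges."""
    prestige = Counter(dst for _, dst in edges)  # node prestige = in-degree
    weights = {}

    def add(u, v, w):
        # If an edge u -> v is already present, keep the cheaper weight.
        weights[(u, v)] = min(w, weights.get((u, v), float("inf")))

    for src, dst in edges:
        add(src, dst, forward_weight)
        # Backward edge: leaving a popular ("hub") node is made expensive.
        add(dst, src, float(prestige[dst]))
    return prestige, weights


# Three "writes" tuples point at one author; one also points at a paper.
edges = [("w1", "author1"), ("w2", "author1"), ("w3", "author1"), ("w1", "paper1")]
prestige, weights = weight_edges(edges)
```

With these weights, a path that passes backward through `author1` costs 3 per hop, so answers that connect keywords only via a hub are ranked lower.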
  • 42. Result
      - Result of query “sudarshan soumen”
  • 43. Searching for the best answer
      - Backward Expanding Search Algorithm
          - Intuition: find vertices from which a forward path exists to at least one node from each Si.
          - Run a concurrent single-source shortest path algorithm from each node matching a keyword
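The intuition behind backward expanding search can be sketched with Dijkstra's algorithm run over reversed edges. This is a simplification under stated assumptions: instead of interleaving one shortest-path iterator per keyword node as described on the slide, it runs one multi-source search per keyword set to completion and then intersects the reached vertices. The graph, the node names, and the function names are hypothetical, and only forward edges are included for brevity.

```python
import heapq


def distances_to_set(graph, sources):
    """Cost of the cheapest forward path from each vertex to any source node.

    graph: {u: {v: weight}} of directed edges u -> v. The search walks the
    edges backwards, starting from the keyword nodes.
    """
    reverse = {}
    for u, neighbours in graph.items():
        for v, w in neighbours.items():
            reverse.setdefault(v, {})[u] = w
    dist = {s: 0.0 for s in sources}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for u, w in reverse.get(v, {}).items():
            nd = d + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return dist


def backward_search(graph, keyword_sets):
    """Return (total_cost, root) pairs for roots that reach every keyword set."""
    per_keyword = [distances_to_set(graph, s) for s in keyword_sets]
    roots = set(per_keyword[0])
    for dist in per_keyword[1:]:
        roots &= set(dist)
    return sorted((sum(d[r] for d in per_keyword), r) for r in roots)


# Toy graph for the query "sudarshan soumen" (simplified, made-up data).
graph = {
    "paper1": {"sudarshan": 1.0, "soumen": 1.0},
    "paper2": {"sudarshan": 1.0},
    "paper3": {"soumen": 1.0},
}
answers = backward_search(graph, [{"sudarshan"}, {"soumen"}])
```

Only `paper1` has a forward path to a node in each keyword set, so it is the single answer root in this example.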
  • 44. Searching for the best answer
      - [Figure: graph with author nodes S. Sudarshan, Prasan Roy, and Charuta, a paper node “BANKS: Keyword search…”, and the labels writes, author, paper]
  • 45. As an extension of BANKS
      - BLINKS: ranked keyword searches on graphs
      - He H et al.
      - SIGMOD 07
  • 46. Introduction
      - Efficient ranked keyword searches on schemaless node-labeled graphs
      - Challenges:
          - Lack of schema for optimization
          - Hard to guarantee strong performance
      - Proposed technique
          - Backward search algorithm
          - SLINKS: single-level index search *
          - Extension for scalability: BLINKS (bi-level index search)
      - Contributions
          - Cost-balanced expansion based backward search
          - Combining indexing with search
          - Partition-based indexing (bi-level indexing)
  • 49. A single-level index
  • 50. A single-level index
  • 52. BLINKS (brief idea)
      - Is the index too large to store and too expensive to construct for large graphs? Use a divide-and-conquer approach to create a bi-level index
      - Partition the data graph into multiple subgraphs, or blocks
      - Intra-block index
          - Indexes information inside a block
          - 4 kinds of index, 2 of them for separator nodes (important, so specially considered)
      - Block index
          - 2 simple indexes
  • 53. Conclusion
      - Keyword search challenges:
          - Filtering and disambiguation
          - Automatic join back
          - Ranking of the result
      - Additional considerations:
          - Efficiency
          - Space
  • 54. Thank you and have fun