Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-Shaped (RDF) Data

ICDE paper presentation


  1. Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-Shaped (RDF) Data
     Thanh Tran¹, Haofen Wang², Sebastian Rudolph¹, Philipp Cimiano³
     ¹ Institute AIFB, University Karlsruhe, Germany
     ² APEX Lab, Shanghai Jiao Tong University, China
     ³ Web Information Systems, TU Delft, Netherlands
  2. Motivation
     • Semantic search
       – Access to KB facts and semantically described documents
       – Support for expressive / precise information needs
     • How to capture the user's information need?
       – Expressive queries with difficult syntax (SQL, SPARQL) vs. limited but intuitive queries (keywords)
       – Expressive power is crucial!
       – Supporting the user in specifying information needs in an intuitive way is also crucial!
     • Goal: interpreting complex information needs by translating keywords to expressive formal queries
  3. Related Work
     • Translation of NL questions
       – Can the user specify a precise question when the information need is vague?
     • Relaxed-structure query models
       – Require some knowledge about the query syntax and the structure of the underlying data
     • Labeled query models
       – Require some knowledge about schema elements
     • In keyword search, the user does not need to know the query syntax or the data schema
       – Crucial for environments like the Web, where most data sources to be queried are unknown to the user
  4. Scenario – Interpreting Information Needs
     [Figure: pipeline from User Information Need through Query Specification, Query Translation, and Query Processing over the RDF Data Graph]
     Keyword query: "2006 Philipp Cimiano X-Media"
     Translated query:
       SELECT ?x, ?y, ?z WHERE {
         ?x type Publication . ?x year 2006 .
         ?x author ?y . ?y name 'P. Cimiano' .
         ?y worksAt ?z . ?z name 'AIFB' }
  5. Keyword Search – An Overview
     • Mapping of keywords to "labels" of data elements
       – Results in a set of keyword elements
       – Through imprecise matching, the user does not even need to know the labels of data elements (cf. precise matching in [G. Bhalotia et al.])
     • Data graph exploration
       – Search for substructures (query graphs) connecting keyword elements
       – Query graphs vs. answer trees [H. He et al.]
       – Exploration of query graphs operates on a summary of the data graph only
     • Top-k computation
       – Search guided by a scoring function to output only the top-k results
       – Guaranteed top-k vs. approximate top-k [V. Kacholia et al.]
     • Mapping the query graph to a conjunctive query
     • Processing the conjunctive query using a standard query engine
  6. Keyword Search – The Workflow
     • Offline: Summarization, Scoring, Term Expansion
     • Online: Query Computation, Query Processing
  7. Graph Summarization
     • Goal: preserve sufficient information to compute the elements and structure of the query, while reducing the exploration space
     • The summary graph captures relations between entity classes, thus preserving the structural information of the original data graph
     [Figure: example RDF graph and its summary graph]
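The summarization idea on this slide could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the triple layout, the predicate name `type`, the function name `summarize`, and the fallback class `Thing` for untyped entities are all assumptions made for the example.

```python
from collections import defaultdict

def summarize(triples):
    """Collapse an RDF-like data graph into a class-level summary graph.

    triples: iterable of (subject, predicate, object) strings.
    Returns a set of (class, predicate, class) edges.
    """
    # First pass: record the classes of each entity from its 'type' triples.
    classes = defaultdict(set)
    for s, p, o in triples:
        if p == "type":
            classes[s].add(o)

    # Second pass: lift every entity-to-entity edge to the class level.
    summary = set()
    for s, p, o in triples:
        if p == "type" or o not in classes:
            continue  # skip type edges and attribute (value) edges
        # Assumption: untyped subjects are grouped under a generic 'Thing'.
        for cs in classes.get(s, {"Thing"}):
            for co in classes[o]:
                summary.add((cs, p, co))
    return summary
```

Two publications by the same researcher collapse into a single `Publication author Researcher` edge, which is what shrinks the exploration space.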
  8. Keyword Mapping & Graph Augmentation
     • Summary graph captures information for exploration of query structure
     • Online augmentation with elements & scores obtained from keyword mapping
     • Augmented graph contains further information for exploration of query elements
     [Figure: keyword query "2006 Philipp Cimiano AIFB", summary graph, and augmented summary graph]
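A rough sketch of keyword mapping followed by augmentation is given below. It assumes a small in-memory label index and uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for the imprecise matching score; the slides do not say which matching technique or threshold is used, and the names `map_keyword`, `augment`, and `threshold` are hypothetical.

```python
from difflib import SequenceMatcher

def map_keyword(keyword, labelled_elements, threshold=0.7):
    """Return (label, class, attribute, score) hits for one keyword.

    labelled_elements: dict mapping a value label to (class, attribute).
    The similarity measure and the threshold are illustrative assumptions.
    """
    hits = []
    for label, (cls, attr) in labelled_elements.items():
        score = SequenceMatcher(None, keyword.lower(), label.lower()).ratio()
        if score >= threshold:
            hits.append((label, cls, attr, score))
    # Best matches first.
    return sorted(hits, key=lambda h: -h[3])

def augment(summary_edges, hits):
    """Add one attribute edge (class, attribute, value) per keyword hit."""
    augmented = set(summary_edges)
    scores = {}
    for label, cls, attr, score in hits:
        augmented.add((cls, attr, label))
        scores[label] = score  # kept for use by a scoring function
    return augmented, scores
```

The summary graph itself stays keyword-independent; only the small set of matched value vertices is attached at query time.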
  9. Top-k Graph Exploration
     • Cost-directed exploration of the graph, starting from the keyword elements N_k
     • Explore all possible distinct paths starting from each n_k ∈ N_k
     • At each step, take the cursor ("path") with the lowest cost from the queues for exploration
     • When a connecting element n_c is found,
       – Paths from n_k to n_c are merged to construct the query graph
       – Top-k is invoked to add the query graph to the candidate list
     • Top-k terminates when the highest cost in the candidate list (the cost of the k-ranked query graph) is found to be lower than the lowest possible cost that can be achieved with paths in the queues yet to be explored
     [Figure: augmented summary graph with explored paths]
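The exploration loop described on this slide could look roughly like the sketch below. It is a simplified illustration under several assumptions that are not in the slides: the graph is undirected, every step costs 1, the cost of a merged query graph is the sum of its path costs, a single priority queue replaces the per-keyword queues, and paths are cut off at `max_len` steps. The function name `top_k_subgraphs` is hypothetical.

```python
import heapq
from itertools import product

def top_k_subgraphs(graph, keyword_elements, k, max_len=4):
    """graph: dict node -> list of neighbours (undirected, unit step cost).
    keyword_elements: list of sets, one set of start nodes per keyword.
    Returns up to k (cost, sorted edge list) pairs, cheapest first."""
    m = len(keyword_elements)
    # One cursor (cost, keyword index, path) per keyword element.
    queue = [(0, i, (n,)) for i, elems in enumerate(keyword_elements)
             for n in sorted(elems)]
    heapq.heapify(queue)
    reached = {}     # node -> per-keyword lists of (cost, path) ending there
    candidates = {}  # (nodes, edges) -> lowest cost found for that subgraph

    while queue:
        cost, i, path = heapq.heappop(queue)  # cheapest cursor first
        ranked = sorted(candidates.values())
        # Any future candidate costs at least `cost`, so stop once the
        # k-ranked candidate is already cheaper than that bound.
        if len(ranked) >= k and ranked[k - 1] < cost:
            break
        node = path[-1]
        per_kw = reached.setdefault(node, [[] for _ in range(m)])
        per_kw[i].append((cost, path))
        if all(per_kw):  # node is a connecting element
            choices = [[(cost, path)] if j == i else per_kw[j]
                       for j in range(m)]
            for combo in product(*choices):
                nodes = frozenset(n for _, p in combo for n in p)
                edges = frozenset(frozenset(e) for _, p in combo
                                  for e in zip(p, p[1:]))
                total = sum(c for c, _ in combo)
                key = (nodes, edges)
                candidates[key] = min(total, candidates.get(key, total))
        if len(path) <= max_len:
            for nb in graph.get(node, ()):
                if nb not in path:  # distinct, cycle-free paths only
                    heapq.heappush(queue, (cost + 1, i, path + (nb,)))

    best = sorted(candidates.items(), key=lambda kv: kv[1])[:k]
    return [(c, sorted(tuple(sorted(e)) for e in edges))
            for (_, edges), c in best]
```

Because cursors are popped in order of increasing cost and costs never decrease along a path, the cost of the cursor just popped is a valid lower bound for every query graph still to be found, which is what makes the early termination safe in this simplified setting.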
  10. Mapping Query Graph to Conjunctive Query
     • Conjunctive query obtained by exhaustive application of mapping rules
       – Every value vertex vvertex → a term
       – Every class vertex cvertex → a distinct variable
       – Every A-edge e(cvertex, vvertex) → a query predicate e[var(cvertex), term(vvertex)]
       – Every R-edge e(cvertex1, cvertex2) → a query predicate e[var(cvertex1), var(cvertex2)]
     • Treat all query variables as distinguished
     • Specific mechanisms can be provided for the user to choose distinguished variables
     • The query chosen by the user is finally translated to the query formalism supported by the query engine (SPARQL) for retrieving query answers
     [Figure: query graph and the resulting conjunctive query]
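The four mapping rules above can be illustrated with a short sketch. The edge tuple layout, the `?xN` variable naming, and the textual output format are assumptions for the example; only the rules listed on the slide are applied, so no `type` atoms are generated for class vertices.

```python
def to_conjunctive_query(a_edges, r_edges):
    """a_edges: (label, class_vertex, value_vertex) attribute edges.
    r_edges: (label, class_vertex1, class_vertex2) relation edges.
    Returns the conjunctive query as a string."""
    variables = {}

    def var(cvertex):
        # Rule: every class vertex maps to one distinct variable.
        return variables.setdefault(cvertex, f"?x{len(variables)}")

    atoms = []
    for label, cvertex, vvertex in a_edges:
        # Rule: A-edge -> predicate over a variable and a term (constant).
        atoms.append(f"{label}({var(cvertex)}, '{vvertex}')")
    for label, c1, c2 in r_edges:
        # Rule: R-edge -> predicate over two variables.
        atoms.append(f"{label}({var(c1)}, {var(c2)})")

    # All query variables are treated as distinguished.
    head = ", ".join(variables.values())
    return f"({head}) <- " + " AND ".join(atoms)
```

For the query graph of the running example, the call below yields one atom per edge:

```python
print(to_conjunctive_query(
    [("year", "Publication", "2006"),
     ("name", "Researcher", "P. Cimiano"),
     ("name", "Institute", "AIFB")],
    [("author", "Publication", "Researcher"),
     ("worksAt", "Researcher", "Institute")],
))
```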
  11. Rich Client Demo – xXploreKnow! http://ontoware.org/projects/xxplore/
  12. Web Demo – Q2Semantic http://q2semantic.apexlab.org/UI.html
  13. Evaluation – Effectiveness
     • 12 users provide 30 keyword queries on DBLP, along with an NL description of the information need
     • Reciprocal Rank = 1/r, where r is the rank of the correct query
     • A query is correct if it matches the information need
     • The information need can be interpreted in most cases, in particular when path length, matching score, and popularity of graph elements are all incorporated into the scoring function (C3)
     [Figure: MRRs of different scoring functions (C1, C2, C3) on DBLP, per query, on a scale from 0 to 1]
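The reciprocal rank measure defined on this slide can be written down directly. In the sketch below, returning 0 when no correct query appears in the list is a common convention and an assumption here, since the slide does not cover that case; the function names are hypothetical.

```python
def reciprocal_rank(ranked_queries, is_correct):
    """1/r for the rank r (1-based) of the first correct query, else 0."""
    for r, query in enumerate(ranked_queries, start=1):
        if is_correct(query):
            return 1.0 / r
    return 0.0  # assumption: no correct query in the list counts as 0

def mean_reciprocal_rank(runs):
    """runs: list of (ranked_queries, is_correct) pairs, one per test query."""
    return sum(reciprocal_rank(q, ok) for q, ok in runs) / len(runs)
```

A scoring function that always ranks the correct interpretation first reaches a mean reciprocal rank of 1.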
  14. Evaluation – Usability of Query Interpretation
     • Standard approaches return top-k results
     • Our approach is based on the interpretation of keywords as queries, i.e. it computes top-k queries instead of top-k answer trees [V. Kacholia et al.] [H. He et al.]
     • Queries are then transformed to simple natural language and presented to the user
     • 90% of users prefer to obtain the question first, since it facilitates understanding of the results
     • All users prefer to do refinement on the structured query rather than on the keywords, since the structured query can be manipulated in a more precise and predictable way
  15. Evaluation – Efficiency
     • Comparison with bidirectional search [V. Kacholia et al.] and search based on graph indexing (1000 BFS, 1000 METIS, 300 BFS, 300 METIS in [H. He et al.])
     • We measure time for query computation + time for processing several queries until finding 10 answers
     • Outperforms bidirectional search by at least one order of magnitude
     • Performs fairly well when compared to indexing-based approaches
     [Figure: query performance on DBLP data for Q1 to Q10; series: Our Solution, Bidirect, 1000 BFS, 1000 METIS, 300 BFS, 300 METIS; axis ticks from 1 to 100000]
  16. Conclusions and Future Work
     • Conclusions
       – A new approach for keyword search on graph-structured data, RDF in particular
       – Novel algorithms for the top-k exploration of subgraphs to compute queries as an additional intermediate step
       – Query computation is performed on an aggregated graph, while query processing can leverage the optimization capabilities of the database
     • Future Work
       – Indexing connectivity and scores for further speed-up
       – Consider special query operations (e.g. filters) as keywords
  17. Thank you for your attention! Q&A
