Top-k Exploration of Query Candidates for Efficient Keyword Search
on Graph-Shaped (RDF) Data

    Thanh Tran¹, Haofen Wang², Sebastian Rudolph¹, Philipp Cimiano³
      ¹ Institute AIFB, University Karlsruhe, Germany
      ² APEX Lab, Shanghai Jiao Tong University, China
      ³ Web Information Systems, TU Delft, Netherlands
Motivation
• Semantic search
   – Access to KB facts and semantically described documents
   – Support for expressive / precise information need
• How to capture the user’s information need?
   – Expressive queries with difficult syntax (SQL, SPARQL) vs.
     limited but intuitive queries (Keywords)
   – Expressive power is crucial!
   – Supporting the user in specifying information needs in an
     intuitive way is also crucial!
• Goal: Interpreting Complex Information Needs by
  Translating Keywords to Expressive Formal Queries
Related Work
• Translation of NL questions
  – Can the user specify a precise question when the
    information need is vague?
• Relaxed-structure query models
  – Require some knowledge about the query syntax and
    the structure of the underlying data
• Labeled query models
  – Require some knowledge about schema elements
• In keyword search, the user needs to know neither the query
  syntax nor the data schema
  – Crucial for environments like the Web, where most of the data
    sources to be queried are unknown to the user
Scenario – Interpreting Information Needs
[Figure: User Information Need; RDF Data Graph; Query Specification with the keyword query "2006 Philipp Cimiano X-Media"; Query Translation; Query Processing]
PREFIX : <http://example.org/>  # hypothetical namespace, not given on the slide
SELECT ?x ?y ?z WHERE {
  ?x a :Publication . ?x :year 2006 .
  ?x :author ?y . ?y :name "P. Cimiano" .
  ?y :worksAt ?z . ?z :name "AIFB" }
Keyword Search – An Overview
• Mapping of keywords to "labels" of data elements
   – Results in a set of keyword elements
   – Through imprecise matching, the user does not even need to know the
     exact labels of data elements (cf. precise matching in [G. Bhalotia et al.])
• Data graph exploration
   – Search for substructures (query graphs) connecting keyword elements
   – Query graphs vs. answer trees [H. He et al.]
   – Exploration of query graphs operates on a summary of the data graph only
• Top-k computation
   – Search guided by a scoring function to output only the top-k results
   – Guaranteed top-k vs. approximate top-k [V. Kacholia et al.]
• Mapping the query graph to a conjunctive query
• Processing the conjunctive query using a standard query engine
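The keyword-mapping step can be illustrated with a small sketch. It assumes a hypothetical in-memory label index (`LABELS`) and uses Python's `difflib.SequenceMatcher` as a stand-in similarity measure; the slides do not say which matching technique the system actually uses, so both the index and the measure are illustrative assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical label index: data element id -> label of that element.
LABELS = {
    "Publication": "Publication",
    "year:2006": "2006",
    "person:cimiano": "P. Cimiano",
    "org:aifb": "AIFB",
    "worksAt": "works at",
}


def match_score(keyword, label):
    """Case-insensitive string similarity in [0, 1] (stand-in for the real matcher)."""
    return SequenceMatcher(None, keyword.lower(), label.lower()).ratio()


def map_keywords(keywords, labels, threshold=0.5):
    """Map each keyword to the data elements whose labels match it imprecisely.

    Returns {keyword: [(element, score), ...]}, best match first.
    """
    result = {}
    for kw in keywords:
        scored = [(elem, match_score(kw, lab)) for elem, lab in labels.items()]
        kept = [(elem, s) for elem, s in scored if s >= threshold]
        result[kw] = sorted(kept, key=lambda es: es[1], reverse=True)
    return result


KEYWORD_ELEMENTS = map_keywords(["Cimiano", "AIFB", "2006"], LABELS)
```

Here the keyword "Cimiano" still reaches the element labeled "P. Cimiano" (similarity 14/17, about 0.82), which is the point of imprecise matching: the user does not have to type the exact label.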
Keyword Search – The Workflow
• Offline: Summarization, Scoring, Term Expansion
• Online: Query Computation, Query Processing
Graph Summarization
• Goal: preserve sufficient information to compute elements and
  structure of the query, while reducing the exploration space
• The summary graph captures relations between entity classes, thus
  preserving the structural information of the original data graph

[Figure: Example RDF Graph and the corresponding Summary Graph]
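A minimal sketch of class-based summarization, assuming triples are (subject, predicate, object) string tuples and that the predicate `"type"` stands in for `rdf:type`. In this sketch, entities without a type and literal values are simply ignored; the actual summarization may treat them differently.

```python
from collections import defaultdict

TYPE = "type"  # stand-in for rdf:type


def summarize(triples):
    """Collapse entities into their classes.

    An edge (C1, p, C2) is kept whenever some entity of class C1 is linked
    by p to some entity of class C2. Type triples and triples whose object
    has no class (e.g. literal values) do not produce summary edges.
    """
    classes = defaultdict(set)  # entity -> its classes
    for s, p, o in triples:
        if p == TYPE:
            classes[s].add(o)
    summary = set()
    for s, p, o in triples:
        if p == TYPE or o not in classes:
            continue
        for c1 in classes.get(s, ()):
            for c2 in classes[o]:
                summary.add((c1, p, c2))
    return summary


EXAMPLE = [
    ("pub1", "type", "Publication"), ("pub2", "type", "Publication"),
    ("cimiano", "type", "Researcher"), ("aifb", "type", "Institute"),
    ("pub1", "author", "cimiano"), ("pub2", "author", "cimiano"),
    ("cimiano", "worksAt", "aifb"), ("pub1", "year", "2006"),
]
SUMMARY = summarize(EXAMPLE)
```

The two `author` triples collapse into a single summary edge between `Publication` and `Researcher`, which is how the exploration space shrinks while the structure between classes is kept.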
Keyword Mapping & Graph Augmentation
•   The summary graph captures the information needed to explore the query structure
•   Online augmentation with elements & scores obtained from keyword mapping
•   The augmented graph contains further information for exploring the query elements

[Figure: Keyword query "2006 Philipp Cimiano AIFB"; Summary Graph and Augmented Summary Graph]
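The online augmentation step might look like the sketch below. It assumes (hypothetically) that the keyword index returns, for each match, the matched value, the attribute it belongs to, the class of the entities carrying it, and a matching score; the scores in the example are made up.

```python
def augment(summary_edges, matches):
    """Return (edges, scores): summary R-edges plus one A-edge per matched value.

    summary_edges: set of (class1, relation, class2) tuples.
    matches: iterable of (value, attribute, class, score) tuples
             (hypothetical output format of the keyword index).
    """
    edges = {(c1, rel, c2, "R") for c1, rel, c2 in summary_edges}
    scores = {}
    for value, attr, cls, score in matches:
        # A-edge from the class vertex to a new value vertex for the match.
        edges.add((cls, attr, value, "A"))
        # Keep the best matching score per value vertex (usable later for costs).
        scores[value] = max(score, scores.get(value, 0.0))
    return edges, scores


SUMMARY_EDGES = {("Publication", "author", "Researcher"),
                 ("Researcher", "worksAt", "Institute")}
MATCHES = [("2006", "year", "Publication", 1.0),
           ("P. Cimiano", "name", "Researcher", 0.82),  # made-up score
           ("AIFB", "name", "Institute", 1.0)]
AUGMENTED, SCORES = augment(SUMMARY_EDGES, MATCHES)
```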
Top-k Graph Exploration
• Cost-directed exploration of the graph, starting from the keyword elements N_k
• Explore all possible distinct paths starting from each n_k ∈ N_k
• At each step, take the cursor ("path") with the lowest cost from the queues for exploration
• When a connecting element n_c is found,
   • the paths from the keyword elements n_k to n_c are merged to construct the query graph
   • top-k is invoked to add the query graph to the candidate list
• Top-k terminates when the highest cost in the candidate list (the cost of the
  k-ranked query graph) is lower than the lowest possible cost that can be
  achieved with the paths in the queues yet to be explored

[Figure: Augmented Summary Graph and Explored Paths]
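The exploration loop can be sketched with a single priority queue under simplifying assumptions: the augmented summary graph is an undirected adjacency dict with hypothetical non-negative edge costs, a path costs the sum of its edge costs, and a query graph costs the sum of its distinct edges. The scoring used in the talk (path length, matching scores, popularity) is richer; `top_k_query_graphs`, `undirected`, and the example costs are illustrative only. The sketch shows cursor expansion, merging at connecting elements, and the top-k stopping test.

```python
import heapq
import itertools
from collections import defaultdict


def top_k_query_graphs(graph, keyword_elements, k=3, max_len=4):
    """Simplified cost-directed top-k exploration.

    graph: undirected {node: {neighbor: edge_cost}} with non-negative costs.
    keyword_elements: one set of matching graph nodes per keyword.
    Returns up to k (cost, edges) pairs, cheapest first; edges are sorted node pairs.
    """
    m = len(keyword_elements)
    tie = itertools.count()  # tie-breaker so heapq never compares paths
    queue = [(0, next(tie), i, (n,))
             for i, elements in enumerate(keyword_elements) for n in elements]
    heapq.heapify(queue)
    reached = defaultdict(dict)  # reached[node][i] = paths from keyword i's elements
    candidates = {}              # frozenset of edges -> query graph cost

    while queue:
        best = heapq.nsmallest(k, candidates.values())
        # Stop once the k-th best candidate is cheaper than any cursor left:
        # with non-negative costs, no later candidate can beat it.
        if len(best) == k and best[-1] < queue[0][0]:
            break
        cost, _, i, path = heapq.heappop(queue)
        node = path[-1]
        reached[node].setdefault(i, []).append(path)
        if len(reached[node]) == m:  # node connects all keywords
            others = [reached[node][j] for j in range(m) if j != i]
            for combo in itertools.product(*others):
                # Merge the new path with one stored path per other keyword.
                edges = frozenset(frozenset(e) for p in (path, *combo)
                                  for e in zip(p, p[1:]))
                candidates[edges] = sum(graph[a][b] for a, b in map(tuple, edges))
        if len(path) <= max_len:  # bound path length (in edges)
            for nbr, w in graph.get(node, {}).items():
                if nbr not in path:  # only distinct (cycle-free) paths
                    heapq.heappush(queue, (cost + w, next(tie), i, path + (nbr,)))

    top = sorted(candidates.items(), key=lambda kv: kv[1])[:k]
    return [(c, sorted(tuple(sorted(e)) for e in edges)) for edges, c in top]


def undirected(edge_list):
    """Build a symmetric adjacency dict from (a, b, cost) triples."""
    g = defaultdict(dict)
    for a, b, w in edge_list:
        g[a][b] = w
        g[b][a] = w
    return g


# Augmented summary graph of the running example; all costs are made up.
G = undirected([("2006", "Publication", 1), ("Publication", "Researcher", 1),
                ("Researcher", "P. Cimiano", 1), ("Researcher", "Institute", 1),
                ("Institute", "AIFB", 1), ("Publication", "Institute", 3)])
TOP = top_k_query_graphs(G, [{"2006"}, {"P. Cimiano"}, {"AIFB"}], k=2)
```

Because edge costs are non-negative, any cursor still in the queue can only yield query graphs at least as expensive as its own path. That is why comparing the k-th candidate cost with the cheapest queued cursor is a safe stopping rule in this sketch.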
Mapping Query Graph to Conjunctive Query

•   The conjunctive query is obtained by exhaustive application of the mapping rules
     • Every value vertex v_vertex → a term
     • Every class vertex c_vertex → a distinct variable
     • Every A-edge e(c_vertex, v_vertex) → a query predicate e[var(c_vertex), term(v_vertex)]
     • Every R-edge e(c_vertex1, c_vertex2) → a query predicate e[var(c_vertex1), var(c_vertex2)]
•   All query variables are treated as distinguished
•   Specific mechanisms can be provided for the user to choose the distinguished variables
•   The query chosen by the user is finally translated into the query formalism supported
    by the query engine (SPARQL) for retrieving query answers

[Figure: Query Graph and the corresponding Conjunctive Query]
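The mapping rules can be applied mechanically. The sketch below turns A-edges and R-edges of a query graph into a SPARQL string, treating all variables as distinguished. The prefix `http://example.org/`, the function name `to_sparql`, and the extra type atom per class vertex (mirroring the scenario query, not listed among the rules above) are assumptions made for illustration.

```python
def to_sparql(a_edges, r_edges, prefix="http://example.org/"):
    """Map a query graph to a SPARQL query using the rules above (sketch)."""
    var = {}  # class vertex -> distinct query variable

    def v(cls):
        # Every class vertex gets its own variable, in order of first use.
        return var.setdefault(cls, f"?x{len(var)}")

    atoms = []
    for cls, attr, value in a_edges:
        # A-edge e(c, v) -> e[var(c), term(v)]; values become string literals.
        atoms.append(f'{v(cls)} :{attr} "{value}"')
    for c1, rel, c2 in r_edges:
        # R-edge e(c1, c2) -> e[var(c1), var(c2)]
        atoms.append(f"{v(c1)} :{rel} {v(c2)}")
    # Assumption: one type atom per class vertex, as in the scenario query.
    types = [f"{x} a :{cls}" for cls, x in var.items()]
    head = " ".join(var.values())  # all variables are distinguished
    body = " .\n  ".join(types + atoms)
    return f"PREFIX : <{prefix}>\nSELECT {head} WHERE {{\n  {body} .\n}}"


QUERY = to_sparql(
    a_edges=[("Publication", "year", "2006"), ("Researcher", "name", "P. Cimiano"),
             ("Institute", "name", "AIFB")],
    r_edges=[("Publication", "author", "Researcher"),
             ("Researcher", "worksAt", "Institute")],
)
print(QUERY)
```

Restricting the projection to a subset of `var.values()` would correspond to letting the user choose the distinguished variables.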
Rich Client Demo – xXploreKnow!




      http://ontoware.org/projects/xxplore/
Web Demo – Q2Semantic




   http://q2semantic.apexlab.org/UI.html
Evaluation – Effectiveness
• 12 users provide 30 keyword queries on DBLP, each along with a
  natural-language (NL) description of the information need
• Reciprocal Rank = 1/r, where r is the rank of the correct query
• A query is correct if it matches the information need
• The information need can be interpreted in most cases, in
  particular when path length, matching score, and the
  popularity of graph elements are all incorporated into the scoring
  function (C3)
[Figure: MRRs of different Scoring Functions on DBLP; series C1, C2, C3; x-axis: queries Q1 to Q29; y-axis: MRR from 0 to 1]
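The effectiveness metric can be computed as follows. This is a minimal sketch; the convention that a ranking containing no correct query scores 0 is an assumption, since the slide only defines the reciprocal rank as 1/r.

```python
def reciprocal_rank(ranked_queries, is_correct):
    """Return 1/r for the rank r (1-based) of the first correct query, else 0."""
    for r, query in enumerate(ranked_queries, start=1):
        if is_correct(query):
            return 1.0 / r
    return 0.0  # assumption: no correct query in the ranking scores 0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_queries, is_correct) pairs."""
    return sum(reciprocal_rank(ranked, ok) for ranked, ok in runs) / len(runs)
```

For example, if the correct query is ranked first for one keyword query and second for another, the MRR is (1 + 1/2) / 2 = 0.75.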
Evaluation – Usability of Query Interpretation
- Standard approaches return the top-k results
- Our approach is based on interpreting keywords as
  queries, i.e. it computes the top-k queries instead of the top-k
  answer trees [V. Kacholia et al.] [H. He et al.]
- The queries are then transformed into simple natural
  language and presented to the user
- 90% of the users prefer to see the question first, since it
  facilitates understanding of the results
- All users prefer to refine the structured query
  rather than the keywords, since the
  structured query can be manipulated in a more
  precise and predictable way
Evaluation – Efficiency
• Comparison with bidirectional search [V. Kacholia et al.] and with search based on
  graph indexing (1000 BFS, 1000 METIS, 300 BFS, 300 METIS in [H. He et al.])
• We measure the time for query computation plus the time for processing several
  queries until 10 answers are found
• Outperforms bidirectional search by at least one order of magnitude
• Performs fairly well when compared to indexing-based approaches

[Figure: Query Performance on DBLP Data; x-axis: queries Q1 to Q10; log-scale y-axis from 1 to 100000; series: Our Solution, Bidirect, 1000 BFS, 1000 METIS, 300BFS, 300METIS]
Conclusions and Future Work
• Conclusions
   – A new approach for keyword search on graph-structured
     data, RDF in particular
   – Novel algorithms for the top-k exploration of subgraphs to
     compute queries as an additional intermediate step
   – Query computation is performed on an aggregated (summary) graph,
     while query processing can leverage the optimization
     capabilities of the database
• Future Work
   – Indexing connectivity and scores for further speed-up
   – Considering special query operations (e.g. filters) as keywords
Thank you for your attention!

            Q&A

More Related Content

Viewers also liked

Tips on how to use AdBlue for fleet operators and drivers by air1_yara
Tips on how to use AdBlue for fleet operators and drivers by air1_yaraTips on how to use AdBlue for fleet operators and drivers by air1_yara
Tips on how to use AdBlue for fleet operators and drivers by air1_yaraYara International
 
Heterogeneous Web Data Search Using Relevance-based On The Fly Data Integration
Heterogeneous Web Data Search Using Relevance-based On The Fly Data IntegrationHeterogeneous Web Data Search Using Relevance-based On The Fly Data Integration
Heterogeneous Web Data Search Using Relevance-based On The Fly Data IntegrationThanh Tran
 
Faculty forum presentation march 2012
Faculty forum presentation  march 2012Faculty forum presentation  march 2012
Faculty forum presentation march 2012Jeff Simmons
 
TYPifier: Inferring the Type Semantics of Structured Data (icde2013)
TYPifier: Inferring the Type Semantics of Structured Data (icde2013)TYPifier: Inferring the Type Semantics of Structured Data (icde2013)
TYPifier: Inferring the Type Semantics of Structured Data (icde2013)Thanh Tran
 
Benefit Professor Sr. AE and Recruiting Process
Benefit Professor Sr. AE and Recruiting ProcessBenefit Professor Sr. AE and Recruiting Process
Benefit Professor Sr. AE and Recruiting Processbene_professor
 
The Information Workbench -
The Information Workbench -  The Information Workbench -
The Information Workbench - Thanh Tran
 
Diesel Exhaust Fluid (DEF) Fact Sheets
Diesel Exhaust Fluid (DEF) Fact SheetsDiesel Exhaust Fluid (DEF) Fact Sheets
Diesel Exhaust Fluid (DEF) Fact SheetsYara International
 
20120410 aiming水口 ドイツゲームのすゝめ
20120410 aiming水口 ドイツゲームのすゝめ20120410 aiming水口 ドイツゲームのすゝめ
20120410 aiming水口 ドイツゲームのすゝめTakeo Mizuguchi
 

Viewers also liked (9)

Tips on how to use AdBlue for fleet operators and drivers by air1_yara
Tips on how to use AdBlue for fleet operators and drivers by air1_yaraTips on how to use AdBlue for fleet operators and drivers by air1_yara
Tips on how to use AdBlue for fleet operators and drivers by air1_yara
 
Heterogeneous Web Data Search Using Relevance-based On The Fly Data Integration
Heterogeneous Web Data Search Using Relevance-based On The Fly Data IntegrationHeterogeneous Web Data Search Using Relevance-based On The Fly Data Integration
Heterogeneous Web Data Search Using Relevance-based On The Fly Data Integration
 
Faculty forum presentation march 2012
Faculty forum presentation  march 2012Faculty forum presentation  march 2012
Faculty forum presentation march 2012
 
Genetically Modified Food
Genetically Modified FoodGenetically Modified Food
Genetically Modified Food
 
TYPifier: Inferring the Type Semantics of Structured Data (icde2013)
TYPifier: Inferring the Type Semantics of Structured Data (icde2013)TYPifier: Inferring the Type Semantics of Structured Data (icde2013)
TYPifier: Inferring the Type Semantics of Structured Data (icde2013)
 
Benefit Professor Sr. AE and Recruiting Process
Benefit Professor Sr. AE and Recruiting ProcessBenefit Professor Sr. AE and Recruiting Process
Benefit Professor Sr. AE and Recruiting Process
 
The Information Workbench -
The Information Workbench -  The Information Workbench -
The Information Workbench -
 
Diesel Exhaust Fluid (DEF) Fact Sheets
Diesel Exhaust Fluid (DEF) Fact SheetsDiesel Exhaust Fluid (DEF) Fact Sheets
Diesel Exhaust Fluid (DEF) Fact Sheets
 
20120410 aiming水口 ドイツゲームのすゝめ
20120410 aiming水口 ドイツゲームのすゝめ20120410 aiming水口 ドイツゲームのすゝめ
20120410 aiming水口 ドイツゲームのすゝめ
 

Similar to Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-Shaped (RDF) Data

An introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxAn introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxElasticsearch
 
An introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxAn introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxElasticsearch
 
Predicting SPARQL query execution time and suggesting SPARQL queries based on...
Predicting SPARQL query execution time and suggesting SPARQL queries based on...Predicting SPARQL query execution time and suggesting SPARQL queries based on...
Predicting SPARQL query execution time and suggesting SPARQL queries based on...Rakebul Hasan
 
Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...HPCC Systems
 
Information Exploitation at BBN
Information Exploitation at BBNInformation Exploitation at BBN
Information Exploitation at BBNPlamen Petrov
 
The Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-SystemThe Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-Systeminside-BigData.com
 
A machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companiesA machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companiesDataWorks Summit
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveXu Jiang
 
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and RSpark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and RDatabricks
 
ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...
ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...
ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...eswcsummerschool
 
A Machine Learning Approach to SPARQL Query Performance Prediction
A Machine Learning Approach to SPARQL Query Performance PredictionA Machine Learning Approach to SPARQL Query Performance Prediction
A Machine Learning Approach to SPARQL Query Performance PredictionRakebul Hasan
 
Efficient Query Processing Infrastructures
Efficient Query Processing InfrastructuresEfficient Query Processing Infrastructures
Efficient Query Processing InfrastructuresCrai Macdonald
 
AnzoGraph DB: Driving AI and Machine Insights with Knowledge Graphs in a Conn...
AnzoGraph DB: Driving AI and Machine Insights with Knowledge Graphs in a Conn...AnzoGraph DB: Driving AI and Machine Insights with Knowledge Graphs in a Conn...
AnzoGraph DB: Driving AI and Machine Insights with Knowledge Graphs in a Conn...Cambridge Semantics
 
Apache Spark sql
Apache Spark sqlApache Spark sql
Apache Spark sqlaftab alam
 
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4jTransforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4jDatabricks
 
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4jTransforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4jFred Madrid
 
Informational Referential Integrity Constraints Support in Apache Spark with ...
Informational Referential Integrity Constraints Support in Apache Spark with ...Informational Referential Integrity Constraints Support in Apache Spark with ...
Informational Referential Integrity Constraints Support in Apache Spark with ...Databricks
 
Using graphs for recommendations
Using graphs for recommendationsUsing graphs for recommendations
Using graphs for recommendationsRik Van Bruggen
 

Similar to Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-Shaped (RDF) Data (20)

An introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxAn introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolbox
 
An introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxAn introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolbox
 
Predicting SPARQL query execution time and suggesting SPARQL queries based on...
Predicting SPARQL query execution time and suggesting SPARQL queries based on...Predicting SPARQL query execution time and suggesting SPARQL queries based on...
Predicting SPARQL query execution time and suggesting SPARQL queries based on...
 
Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...
 
Information Exploitation at BBN
Information Exploitation at BBNInformation Exploitation at BBN
Information Exploitation at BBN
 
The Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-SystemThe Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-System
 
A machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companiesA machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companies
 
19CS3052R-CO1-7-S7 ECE
19CS3052R-CO1-7-S7 ECE19CS3052R-CO1-7-S7 ECE
19CS3052R-CO1-7-S7 ECE
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
 
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and RSpark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
 
ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...
ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...
ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...
 
A Machine Learning Approach to SPARQL Query Performance Prediction
A Machine Learning Approach to SPARQL Query Performance PredictionA Machine Learning Approach to SPARQL Query Performance Prediction
A Machine Learning Approach to SPARQL Query Performance Prediction
 
Efficient Query Processing Infrastructures
Efficient Query Processing InfrastructuresEfficient Query Processing Infrastructures
Efficient Query Processing Infrastructures
 
AnzoGraph DB: Driving AI and Machine Insights with Knowledge Graphs in a Conn...
AnzoGraph DB: Driving AI and Machine Insights with Knowledge Graphs in a Conn...AnzoGraph DB: Driving AI and Machine Insights with Knowledge Graphs in a Conn...
AnzoGraph DB: Driving AI and Machine Insights with Knowledge Graphs in a Conn...
 
Apache Spark sql
Apache Spark sqlApache Spark sql
Apache Spark sql
 
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4jTransforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
 
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4jTransforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
 
Informational Referential Integrity Constraints Support in Apache Spark with ...
Informational Referential Integrity Constraints Support in Apache Spark with ...Informational Referential Integrity Constraints Support in Apache Spark with ...
Informational Referential Integrity Constraints Support in Apache Spark with ...
 
HDF5 FastQuery
HDF5 FastQueryHDF5 FastQuery
HDF5 FastQuery
 
Using graphs for recommendations
Using graphs for recommendationsUsing graphs for recommendations
Using graphs for recommendations
 

Recently uploaded

SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 

Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-Shaped (RDF) Data

  • 1. Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph- Shaped (RDF) Data Thanh Tran1, Haofen Wang2, Sebastian Rudolph1, Philipp Cimiano3 1Institute AIFB, University Karlsruhe, Germany 2APEX Lab, Shanghai Jiao Tong University, China 3Web Information Systems, TU Delft, Netherlands
  • 2. Motivation • Semantic search – Access to KB facts and semantically described documents – Support for expressive / precise information need • How to capture the user’s information need? – Expressive queries with difficult syntax (SQL, SPARQL) vs. limited but intuitive queries (Keywords) – Expressive power is crucial! – Support the user in specifying information needs in an intuitive way is also crucial! • Goal: Interpreting Complex Information Needs by Translating Keywords to Expressive Formal Queries
  • 3. Related Work • Translation of NL questions – Can the user specify a precise question when the information need is vague? • Relaxed-structure query models – Require some knowledge about the query syntax and the structure of the underlying data • Labeled query models – Require some knowledge about schema elements • In keyword search, the user does not need to know about the query syntax and data schema – Crucial for environment like the Web where most data sources to be queried are unknown to the user
  • 4. Scenario – Interpreting Information Needs (Figure: User Information Need, Query Specification, Query Translation, Query Processing, RDF Data Graph) Keyword query: „2006 Philipp Cimiano X-Media“ Translated query: SELECT ?x ?y ?z WHERE { ?x type Publication . ?x year 2006 . ?x author ?y . ?y name 'P. Cimiano' . ?y worksAt ?z . ?z name 'AIFB' }
  • 5. Keyword Search – An Overview • Mapping of keywords to "labels" of data elements – Results in a set of keyword elements – Through imprecise matching, the user does not even need to know the labels of data elements (cf. precise matching in [G. Bhalotia et al.]) • Data graph exploration – Search for substructures (query graphs) connecting keyword elements – Query graphs vs. answer trees [H. He et al.] – Exploration of query graphs operates on a summary of the data graph only • Top-k computation – Search guided by a scoring function to output only the top-k results – Guaranteed top-k vs. approximate top-k [V. Kacholia et al.] • Mapping the query graph to a conjunctive query • Processing the conjunctive query using a standard query engine
  • 6. Keyword Search – The Workflow • Offline: Summarization, Scoring, Term Expansion • Online: Query Computation, Query Processing
  • 7. Graph Summarization • Goal: preserve sufficient information to compute the elements and structure of the query while reducing the exploration space • The summary graph captures relations between entity classes, thus preserving the structural information of the original data graph (Figure: example RDF graph and its summary graph)
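The summarization on slide 7 can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the authors' implementation: triples are plain (subject, predicate, object) strings, class membership comes from a predicate we call `type` (as in the scenario query), objects without a class are treated as attribute values and dropped from the summary, and untyped subjects fall back to a hypothetical default class `Thing`. The entity IDs and the classes `Researcher` and `Institute` are hypothetical.

```python
from collections import defaultdict

TYPE = "type"  # assumed class-membership predicate, as in the scenario query


def summarize(triples, default_class="Thing"):
    """Collapse an RDF data graph into a class-level summary graph (sketch).

    triples: iterable of (subject, predicate, object) strings.
    Returns the set of R-edges (class1, predicate, class2): one edge per
    predicate that links some entity of class1 to some entity of class2.
    """
    triples = list(triples)
    # Collect the classes of every typed entity.
    classes = defaultdict(set)
    for s, p, o in triples:
        if p == TYPE:
            classes[s].add(o)

    summary = set()
    for s, p, o in triples:
        # Skip typing triples and attribute values (objects that are not entities).
        if p == TYPE or o not in classes:
            continue
        # Untyped subjects fall back to a default class (our assumption).
        for c1 in classes.get(s, {default_class}):
            for c2 in classes[o]:
                summary.add((c1, p, c2))
    return summary


# Hypothetical data graph modeled on the scenario query.
data = [
    ("pub1", "type", "Publication"), ("pub1", "year", "2006"),
    ("pub1", "author", "res1"),
    ("pub2", "type", "Publication"), ("pub2", "author", "res1"),
    ("res1", "type", "Researcher"), ("res1", "name", "P. Cimiano"),
    ("res1", "worksAt", "inst1"),
    ("inst1", "type", "Institute"), ("inst1", "name", "AIFB"),
]
print(sorted(summarize(data)))
# [('Publication', 'author', 'Researcher'), ('Researcher', 'worksAt', 'Institute')]
```

Ten data triples collapse into two class-level edges; both publications contribute the same `author` edge, which is what keeps the exploration space small.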
  • 8. Keyword Mapping & Graph Augmentation • The summary graph captures the information needed to explore the query structure • Online augmentation with elements & scores obtained from keyword mapping • The augmented graph contains further information for exploring query elements (Figure: keyword query „2006 Philipp Cimiano AIFB“, summary graph, augmented summary graph)
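Slide 8's keyword mapping and augmentation can be illustrated as follows. As a stand-in for the imprecise matching used in the system, this sketch scores keywords against value labels with Python's `difflib.SequenceMatcher.ratio()`; the 0.6 threshold, the `value_index` layout (value, attribute, class), and the function names are hypothetical. Only value vertices are matched here; matching keywords to class or edge labels would follow the same pattern.

```python
from difflib import SequenceMatcher


def match_score(keyword, label):
    """Imprecise matching: case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, keyword.lower(), label.lower()).ratio()


def augment(summary, value_index, keywords, threshold=0.6):
    """Augment a summary graph with the value vertices that match the keywords.

    summary: set of R-edges (class1, predicate, class2).
    value_index: list of (value, attribute, class) tuples from the data graph.
    Returns (edges, keyword_elements): the augmented edge set, with one
    A-edge (class, attribute, value) per matching value, and a dict mapping
    each keyword to its matching value vertices and their scores.
    """
    edges = set(summary)
    keyword_elements = {}
    for kw in keywords:
        matches = {}
        for value, attribute, cls in value_index:
            score = match_score(kw, value)
            if score >= threshold:
                edges.add((cls, attribute, value))  # A-edge to the value vertex
                matches[value] = score
        keyword_elements[kw] = matches
    return edges, keyword_elements


summary = {("Publication", "author", "Researcher"), ("Researcher", "worksAt", "Institute")}
values = [("2006", "year", "Publication"), ("P. Cimiano", "name", "Researcher"),
          ("AIFB", "name", "Institute"), ("X-Media", "name", "Project")]
edges, elements = augment(summary, values, ["2006", "Cimiano", "AIFB"])
print(elements["Cimiano"])
```

The keyword "Cimiano" matches the label "P. Cimiano" without being identical to it, which is the point of imprecise matching; the returned scores can then feed the cost function used during exploration.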
  • 9. Top-k Graph Exploration • Cost-directed exploration of the graph, starting from the keyword elements N_k • Explore all possible distinct paths starting from each n_k ∈ N_k • At each step, take the cursor ("path") with the lowest cost from the queues for exploration • When a connecting element n_c is found: • The paths from each n_k to n_c are merged to construct the query graph • Top-k is invoked to add the query graph to the candidate list • Top-k terminates when the highest cost in the candidate list (the cost of the k-th ranked query graph) is found to be lower than the lowest possible cost that can be achieved with the paths in the queues yet to be explored (Figure: augmented summary graph, explored paths)
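The exploration on slide 9 can be sketched with a priority queue. Assumptions, all ours: the cost of a query graph is simply the sum of its path costs (the system's actual scoring also weighs matching scores and element popularity), a single heap stands in for the per-keyword queues (popping it yields the cheapest cursor across all of them), and duplicate query graphs reached through different connecting elements are not removed. Edge costs must be non-negative for the termination test to be sound. The function name and the edge format are hypothetical.

```python
import heapq
from itertools import product


def top_k_query_graphs(edges, keyword_elements, k):
    """Cost-directed top-k exploration over an augmented summary graph (sketch).

    edges: iterable of (source, label, target, cost) with non-negative costs;
           edges are traversed in both directions.
    keyword_elements: one start vertex per keyword (the n_k in N_k).
    Returns up to k (cost, connecting_element, paths) tuples, cheapest first;
    paths holds one vertex path per keyword, each ending at the connecting element.
    """
    graph = {}
    for s, _, t, w in edges:
        graph.setdefault(s, []).append((t, w))
        graph.setdefault(t, []).append((s, w))

    m = len(keyword_elements)
    # A cursor is (path cost, keyword index, path); one heap holds the
    # cursors of all keywords, so popping it takes the cheapest cursor overall.
    queue = [(0.0, i, (n,)) for i, n in enumerate(keyword_elements)]
    heapq.heapify(queue)
    reached = {}     # vertex -> per-keyword list of (cost, path) ending there
    candidates = []  # (cost, connecting element, paths), kept sorted by cost

    while queue:
        # Stop once the k-th candidate is cheaper than any cursor left: every
        # unseen query graph contains an unexplored path costing at least that.
        if len(candidates) >= k and candidates[k - 1][0] < queue[0][0]:
            break
        cost, i, path = heapq.heappop(queue)
        node = path[-1]
        per_kw = reached.setdefault(node, [[] for _ in range(m)])
        per_kw[i].append((cost, path))
        if all(per_kw):  # node is a connecting element n_c
            # Merge the new path with every known path of the other keywords.
            choices = [per_kw[j] if j != i else [(cost, path)] for j in range(m)]
            for combo in product(*choices):
                total = sum(c for c, _ in combo)
                candidates.append((total, node, tuple(p for _, p in combo)))
            candidates.sort(key=lambda cand: cand[0])
        for neighbor, w in graph.get(node, []):
            if neighbor not in path:  # explore distinct (cycle-free) paths only
                heapq.heappush(queue, (cost + w, i, path + (neighbor,)))
    return candidates[:k]


edges = [
    ("Publication", "year", "2006", 1.0),
    ("Publication", "author", "Researcher", 1.0),
    ("Researcher", "name", "P. Cimiano", 1.0),
    ("Researcher", "worksAt", "Institute", 1.0),
    ("Institute", "name", "AIFB", 1.0),
]
best = top_k_query_graphs(edges, ["2006", "P. Cimiano", "AIFB"], k=1)
print(best[0][:2])  # (5.0, 'Researcher')
```

Because the sample graph is a tree, each connecting element yields exactly one candidate; the cheapest one joins all three keywords at the `Researcher` vertex, which mirrors the structure of the scenario query on slide 4.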
  • 10. Mapping Query Graph to Conjunctive Query • The conjunctive query is obtained by exhaustive application of mapping rules: • Every value vertex v_vertex → a term • Every class vertex c_vertex → a distinct variable • Every A-edge e(c_vertex, v_vertex) → a query predicate e[var(c_vertex), term(v_vertex)] • Every R-edge e(c_vertex1, c_vertex2) → a query predicate e[var(c_vertex1), var(c_vertex2)] • All query variables are treated as distinguished • Specific mechanisms can be provided for the user to choose distinguished variables • The query chosen by the user is finally translated into the query formalism supported by the query engine (SPARQL) for retrieving query answers (Figure: query graph and the corresponding conjunctive query)
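The mapping rules on slide 10 translate directly into code. This sketch assumes that vertex identifiers are unique class or value names and that every variable is distinguished; the extra `type` atom per class vertex is our assumption, modeled on the scenario query on slide 4, and the function names are hypothetical. The output uses the same prefix-free notation as the slides, so a real SPARQL engine would additionally need full IRIs or PREFIX declarations, and literal values containing quotes would need escaping.

```python
def to_conjunctive_query(query_graph, class_vertices):
    """Apply the mapping rules exhaustively to a query graph (sketch).

    query_graph: list of (source, label, target) edges.
    class_vertices: vertices that stand for classes; all others are values.
    Returns (variables, atoms): class vertex -> variable, and atoms
    (label, arg1, arg2) read as label[arg1, arg2].
    """
    variables = {}

    def term(vertex):
        if vertex in class_vertices:  # class vertex -> distinct variable
            if vertex not in variables:
                variables[vertex] = f"?x{len(variables) + 1}"
            return variables[vertex]
        return f'"{vertex}"'  # value vertex -> (literal) term

    # A-edges and R-edges both become binary query predicates.
    atoms = [(label, term(s), term(t)) for s, label, t in query_graph]
    return variables, atoms


def to_sparql(variables, atoms):
    """Serialize with every variable distinguished, in the slides' prefix-free notation."""
    # Assumption: one type atom per class vertex, as in the scenario query.
    patterns = [f"{var} type {cls} ." for cls, var in variables.items()]
    patterns += [f"{a} {label} {b} ." for label, a, b in atoms]
    return f"SELECT {' '.join(variables.values())} WHERE {{ {' '.join(patterns)} }}"


graph = [("Publication", "year", "2006"), ("Publication", "author", "Researcher"),
         ("Researcher", "name", "P. Cimiano"), ("Researcher", "worksAt", "Institute"),
         ("Institute", "name", "AIFB")]
variables, atoms = to_conjunctive_query(graph, {"Publication", "Researcher", "Institute"})
print(to_sparql(variables, atoms))
```

Keeping the conjunctive query as a plain list of atoms separates the rule application from the serialization step, so the same atoms could be emitted for a different query engine.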
  • 11. Rich Client Demo – xXploreKnow! http://ontoware.org/projects/xxplore/
  • 12. Web Demo – Q2Semantic http://q2semantic.apexlab.org/UI.html
  • 13. Evaluation – Effectiveness • 12 users provide 30 keyword queries on DBLP, along with a natural-language description of the information need • Reciprocal Rank = 1/r, where r is the rank of the correct query • A query is correct if it matches the information need • The information need can be interpreted in most cases, in particular when path length, matching score, and popularity of graph elements are all incorporated into the scoring function (C3) (Figure: MRRs of different scoring functions C1, C2, C3 on DBLP for the 30 queries; y-axis from 0 to 1)
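The reciprocal-rank measure on slide 13 is easy to state in code. This is a generic sketch: the rankings and correctness checks below are made-up placeholders, not the study's data, and scoring 0 when no correct query is returned is a common convention that we assume rather than one stated on the slide.

```python
def reciprocal_rank(ranked_queries, is_correct):
    """1/r for the first correct query at rank r (1-based); 0.0 if none is correct."""
    for r, query in enumerate(ranked_queries, start=1):
        if is_correct(query):
            return 1.0 / r
    return 0.0


def mean_reciprocal_rank(rankings):
    """Average reciprocal rank over (ranked_queries, is_correct) pairs."""
    if not rankings:
        raise ValueError("need at least one ranking")
    return sum(reciprocal_rank(q, ok) for q, ok in rankings) / len(rankings)


# Hypothetical rankings of computed queries for three keyword queries.
rankings = [
    (["q_a", "q_b"], lambda q: q == "q_a"),         # correct at rank 1 -> 1.0
    (["q_c", "q_d", "q_e"], lambda q: q == "q_e"),  # correct at rank 3 -> 1/3
    (["q_f"], lambda q: False),                     # not found -> 0.0
]
print(mean_reciprocal_rank(rankings))
```

A correct query at rank 1 contributes 1.0 and one at rank 3 only 1/3, so the measure rewards scoring functions that put the intended interpretation first.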
  • 14. Evaluation – Usability of Query Interpretation - Standard approaches return top-k results - Our approach interprets keywords as queries, i.e., it computes top-k queries instead of top-k answer trees [V. Kacholia et al.] [H. He et al.] - The queries are then transformed into simple natural language and presented to the user - 90% of users prefer to see the question first, since it facilitates understanding of the results - All users prefer to refine the structured query rather than the keywords, since the structured query can be manipulated in a more precise and predictable way
  • 15. Evaluation – Efficiency • Comparison with bidirectional search [V. Kacholia et al.] and search based on graph indexing (1000 BFS, 1000 METIS, 300 BFS, 300 METIS in [H. He et al.]) • We measure the time for query computation plus the time for processing several queries until 10 answers are found • Outperforms bidirectional search by at least one order of magnitude • Performs fairly well compared to indexing-based approaches (Figure: Query Performance on DBLP Data; log-scale y-axis from 1 to 100000 over queries Q1 to Q10; series: Our Solution, Bidirect, 1000 BFS, 1000 METIS, 300 BFS, 300 METIS)
  • 16. Conclusions and Future Work • Conclusions – A new approach for keyword search on graph-structured data, RDF in particular – Novel algorithms for the top-k exploration of subgraphs to compute queries as an additional intermediate step – Query computation is performed on an aggregated graph, while query processing can leverage the optimization capabilities of the database • Future Work – Indexing connectivity and scores for further speed-up – Considering special query operations (e.g., filters) as keywords
  • 17. Thank you for your attention! Q&A