Crossing the Vocabulary Gap for Querying Complex and Heterogeneous Databases

    1. © Copyright 2009 Digital Enterprise Research Institute. All rights reserved. Digital Enterprise Research Institute, www.deri.ie. Crossing the Vocabulary Gap for Querying Complex and Heterogeneous Databases: A Distributional-Compositional Semantics Perspective. André Freitas, Sean O’Riain, Edward Curry. DEOS 2013, Oxford, UK.
    2. Big Data
        • Big Data: a more complete, data-based picture of the world.
    3. Growing Schema Size
        • Schemas grow from 10s–100s of attributes to 1,000s–1,000,000s of attributes.
        • Heterogeneous, complex and large-scale databases.
        • Very large and dynamic “schemas”.
    4. Growing Semantic Heterogeneity
        • Multiple perspectives (conceptualizations) of reality.
        • Ambiguity, vagueness, inconsistency.
    5. Problem
        • Structured queries are still the primary way to query databases.
    6. Structured query
        [Figure: query construction time (low to high) against schema size and heterogeneity (low to high), from 10s–100s of attributes to 10^3–10^6 attributes.]
    7. Vocabulary Problem for Databases
        • Example query: “Who is the daughter of Bill Clinton married to?”
        [Figure: schema-agnostic queries and possible representations.]
    8. Vocabulary Problem for Databases
        • Example query: “Who is the daughter of Bill Clinton married to?”
        [Figure: the semantic gap at the lexical, abstraction and structural levels.]
    9. Vocabulary Problem for Databases
        • Example query: “Who is the daughter of Bill Clinton married to?”
        [Figure: the semantic gap between query and data at the lexical, abstraction and structural levels.]
    10. Solution: Schema-agnostic queries
        • Levels of the gap: lexical, abstraction, structural.
        • Distributional semantics: based on the statistical analysis of large unstructured corpora.
        • Compositional semantics: query processing and planning.
    11. [Figure: statistical analysis; datasets.]
    12. [Figure: statistical analysis; datasets.]
    13. Core Elements of the Proposed Approach
        • A hybrid database/IR/QA model.
        • Ranked query results.
        • Existing IR approaches, i.e. traditional Vector Space Models (VSMs), were not able to:
            (i) capture the structure of the data;
            (ii) support precise and comprehensive semantic matching.
        • A VSM that supports both requirements was formulated: Ƭ-Space.
        • The ranking function is based on a distributional semantic relatedness measure.
    14. Does it work?
        • Dataset: DBpedia 3.7 + YAGO, as an Entity-Attribute-Value (EAV) dataset with 45,767 predicates, 5,556,492 classes and 9,434,677 instances.
        • Queries: 102 natural language queries (QALD 2011).
    19. http://treo.deri.ie
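
The distributional semantics component on slide 10 rests on one idea: a word's meaning can be estimated from the contexts it occurs in across a corpus. A minimal Python sketch of that idea, assuming a plain count-based co-occurrence model with cosine similarity (not the authors' model); the four-line CORPUS, already lower-cased with stop words removed, and the WINDOW size are hypothetical:

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus, already lower-cased with stop words removed.
# A real system would use a large unstructured corpus instead.
CORPUS = [
    "couple married wedding ceremony",
    "spouse wedding ceremony couple",
    "child born parents family",
    "daughter born parents family",
]

WINDOW = 2  # hypothetical context window: +/- 2 tokens


def cooccurrence_vectors(sentences, window=WINDOW):
    """Map each word to a Counter of the words seen within +/- window tokens."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


vectors = cooccurrence_vectors(CORPUS)
print(round(cosine(vectors["married"], vectors["spouse"]), 3))
print(round(cosine(vectors["married"], vectors["child"]), 3))
```

With this toy corpus, married and spouse share the contexts wedding and ceremony, so they score as more related than married and child, which share no context at all.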

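Slide 13 ranks results with a distributional semantic relatedness measure, and slide 10 pairs this with compositional query processing. The sketch below shows one way the two could combine for the example question on slides 7–9, under heavy assumptions: the question is decomposed by hand into a pivot entity and an ordered list of terms, the tiny GRAPH and the hand-made VECTORS are hypothetical stand-ins for an EAV dataset and corpus-derived vectors, and plain cosine similarity replaces Ƭ-Space. The names rank_predicates and answer are illustrative only, not part of any published system.

```python
import math

# Hypothetical toy entity-attribute-value data: entity -> {predicate: value}.
GRAPH = {
    "Bill_Clinton": {
        "child": "Chelsea_Clinton",
        "spouse": "Hillary_Clinton",
        "birthPlace": "Hope,_Arkansas",
    },
    "Chelsea_Clinton": {
        "spouse": "Marc_Mezvinsky",
        "almaMater": "Stanford_University",
        "birthPlace": "Little_Rock,_Arkansas",
    },
}

# Hypothetical hand-made context vectors standing in for corpus-derived ones.
VECTORS = {
    "daughter": {"born": 2, "parents": 2, "family": 1},
    "child": {"born": 2, "parents": 3, "family": 2},
    "married": {"wedding": 3, "couple": 2, "ceremony": 1},
    "spouse": {"wedding": 2, "couple": 3, "family": 1},
    "birthplace": {"born": 3, "city": 2, "hospital": 1},
    "almamater": {"university": 3, "graduated": 2, "degree": 1},
}


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def rank_predicates(entity, term):
    """Return (predicate, score) pairs of `entity`, most related to `term` first."""
    scored = [
        (pred, cosine(VECTORS[term], VECTORS.get(pred.lower(), {})))
        for pred in GRAPH[entity]
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def answer(pivot, terms):
    """Start at the pivot entity and follow the best-ranked predicate for each term."""
    entity = pivot
    for term in terms:
        best_pred, _ = rank_predicates(entity, term)[0]
        entity = GRAPH[entity][best_pred]
    return entity


# "Who is the daughter of Bill Clinton married to?", decomposed by hand (assumption).
print(rank_predicates("Bill_Clinton", "daughter"))
print(answer("Bill_Clinton", ["daughter", "married"]))  # prints Marc_Mezvinsky
```

At each hop every predicate of the current entity is scored against the query term and the best one is followed, so daughter is matched to child and married to spouse even though neither query word appears in the data. Keeping the full ranked list at each hop, rather than only the top predicate, is what makes ranked results possible.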