A Compositional-distributional
Semantic Model over
Structured Data
André Freitas, Edward Curry
Insight @ NUI Galway
Insigh...
Talking to your Data

André Freitas, Edward Curry
Insight @ NUI Galway
Outline
 Motivation & Context
 Compositional-distributional Semantic Model: Distributional
Relational Networks (DRNs)
 ...
Motivation & Context
Big Data
 Vision: More complete data-based picture of the world for
systems and users.
Shift in the Database Landscape
 Heterogeneous, complex and large-scale
Bases (KBs).
 Very-large and dynamic “schemas”.
...
Semantic Heterogeneity
 Decentralized content generation.
 Multiple perspectives (conceptualizations) of the reality.
 ...
Databases for a Complex World
How do you query and do reasoning over data on this
scenario?
Vocabulary Problem for Databases
Query: Who is the daughter of Bill Clinton married to?
Semantic Gap
Possible representati...
Semantics for a Complex World
 “Most semantic models have dealt with particular types of
constructions, and have been car...
Compositional-Distributional Semantic Models
 Principled and effective semantic models for coping with
real world semanti...
Addressing the Vocabulary Problem for
Databases (with Distributional Semantics)
Solution (Video)
More Complex Queries (Video)
Treo Answers Jeopardy Queries (Video)
Evaluation
 102 natural language queries (Test Collection: QALD 2011).
 Avg. query execution time: 8.53 s.

Dataset (DBp...
Evaluation
Compositional-Distributional Model:
Distributional Relational Networks
(DRNs)
Distributional Hypothesis
“Words occurring in similar (linguistic) contexts are
semantically similar.”

 If we can equate...
Vector Space Model
function (number of times that the words occur in c1)

c1
0.7
0.5

husband
spouse

cn
c2

child
Semantic Similarity/Relatedness
c1

husband
spouse
θ

cn
c2

child
Approach Overview
Querying &
Reasoning
Core semantic approximation &
composition operations

Distributional
Relational
Net...
Approach Overview
Querying &
Reasoning
Core semantic approximation &
composition operations

Distributional
Relational
Net...
Approach Overview
Querying &
Reasoning
Core semantic approximation &
composition operations

Distributional
Relational
Net...
DRN (Ƭ-Space)
predicate

KB

instance

e

p

DRN
r
DRN (Ƭ-Space)

Space is segmented by the
instances
Core Operations
Query

DRN
Core Operations
Query

DRN

Core
Operations

Query &
Reasoning
Algorithms
Core Principles
 Minimize the impact of Ambiguity, Vagueness, Synonymy.
 Address the simplest matchings first (heuristic...
Specificity Ordering
Use specificity (grammatical class + (IDF)) as a heuristc measure

Ambiguity, vagueness, synonimy

Lo...
Core Operations
 Instance search
- Proper nouns
- String similarity + node cardinality

 Class (unary predicate) search
...
Core Operations
 Navigation
 Extensional expansion
- Expands the instances associated with a class.

 Disambiguation di...
Use Case: Question
Answering over Linked Data
Question Analysis
Transform natural language queries into triple
patterns
“Who is the daughter of Bill Clinton married to?...
Query Plan
Map query features into a query plan.
A query plan contains a sequence of core operations.
(INSTANCE)

(PREDICA...
Instance Search
Query:

Bill Clinton

daughter

Instance Search
Linked
Data:

:Bill_Clinton

married to
Predicate Search
Query:

Linked
Data:

Bill Clinton

daughter

married to

:child

:Bill_Clinton

:Chelsea_Clinton
:religi...
Predicate Search
Query:

Bill Clinton

daughter

married to

Which properties are semantically related to „daughter‟?

Lin...
Navigate
Query:

Linked
Data:

Bill Clinton

daughter

:child

:Bill_Clinton

:Chelsea_Clinton

married to
Navigate
Query:

Linked
Data:

Bill Clinton

daughter

:child

:Bill_Clinton

:Chelsea_Clinton

(PIVOT ENTITY)

married to
Predicate Search
Query:

Linked
Data:

Bill Clinton

daughter

:spouse

:child

:Bill_Clinton

married to

:Chelsea_Clinto...
Results
From Exact to Approximate: User Feedback
Distributional Semantics @ NUI Galway
Vector Spaces in Information Extraction, B. Qasemizadeh
• Random Projection is one s...
Distributional Semantics @ NUI Galway
Vector Spaces in Information Extraction, B. Qasemizadeh
• Application: Extraction of...
Conclusions
 Distributional Relational Networks (DRNs) provide:
- A Knowledge Representation model.
- Built-in semantic a...
References
Querying
Freitas & Curry, Natural Language Queries over Heterogeneous
Linked Data Graphs: A Distributional-Comp...
http://andrefreitas.org
Upcoming SlideShare
Loading in …5
×

A Compositional-distributional Semantic Model over Structured Data

718
-1

Published on

Published in: Technology, Education
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
718
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
11
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

A Compositional-distributional Semantic Model over Structured Data

  1. 1. A Compositional-distributional Semantic Model over Structured Data André Freitas, Edward Curry Insight @ NUI Galway Insight Workshop on Latent Space Methods (UCD)
  2. 2. Talking to your Data André Freitas, Edward Curry Insight @ NUI Galway
  3. 3. Outline  Motivation & Context  Compositional-distributional Semantic Model: Distributional Relational Networks (DRNs)  Use Case: Question Answering over Linked Data  Conclusions
  4. 4. Motivation & Context
  5. 5. Big Data  Vision: More complete data-based picture of the world for systems and users.
  6. 6. Shift in the Database Landscape  Heterogeneous, complex and large-scale Bases (KBs).  Very-large and dynamic “schemas”. Knowledge circa 2013 circa 2000 10s-100s attributes 1,000s-1,000,000s attributes
  7. 7. Semantic Heterogeneity  Decentralized content generation.  Multiple perspectives (conceptualizations) of the reality.  Ambiguity, vagueness, inconsistency.
  8. 8. Databases for a Complex World How do you query and do reasoning over data on this scenario?
  9. 9. Vocabulary Problem for Databases Query: Who is the daughter of Bill Clinton married to? Semantic Gap Possible representations Semantic approximation = Commonsense Knowledge
  10. 10. Semantics for a Complex World  “Most semantic models have dealt with particular types of constructions, and have been carried out under very simplifying assumptions, in true lab conditions.”  “If these idealizations are removed it is not clear at all that modern semantics can give a full account of all but the simplest sentences.” Baroni et al. Frege in Space Formal World Real World
  11. 11. Compositional-Distributional Semantic Models  Principled and effective semantic models for coping with real world semantic conditions.  Based on vector space models.  Practical way to automatically harvest word “meanings” on a large-scale.  Focus on semantic approximation.  Applications - Semantic search. - Approximate semantic inference. - Paraphrase detection. - Semantic anomaly detection.
  12. 12. Addressing the Vocabulary Problem for Databases (with Distributional Semantics)
  13. 13. Solution (Video)
  14. 14. More Complex Queries (Video)
  15. 15. Treo Answers Jeopardy Queries (Video)
  16. 16. Evaluation  102 natural language queries (Test Collection: QALD 2011).  Avg. query execution time: 8.53 s. Dataset (DBpedia 3.7 + YAGO): 45,767 predicates, 5,556,492 classes and 9,434,677 instances
  17. 17. Evaluation
  18. 18. Compositional-Distributional Model: Distributional Relational Networks (DRNs)
  19. 19. Distributional Hypothesis “Words occurring in similar (linguistic) contexts are semantically similar.”  If we can equate meaning with context, we can simply record the contexts in which a word occurs in a collection of texts (a corpus).  This can then be used as a surrogate of its semantic representation.
  20. 20. Vector Space Model function (number of times that the words occur in c1) c1 0.7 0.5 husband spouse cn c2 child
  21. 21. Semantic Similarity/Relatedness c1 husband spouse θ cn c2 child
  22. 22. Approach Overview Querying & Reasoning Core semantic approximation & composition operations Distributional Relational Network (DRN) Structured/logic Knowledge Bases Distributional semantics Large-scale unstructured data Commonsense knowledge
  23. 23. Approach Overview Querying & Reasoning Core semantic approximation & composition operations Distributional Relational Network (DRN) Graph Data (RDF) Explicit Semantic Analysis (ESA) Wikipedia Commonsense knowledge
  24. 24. Approach Overview Querying & Reasoning Core semantic approximation & composition operations Distributional Relational Networks Large-scale unstructured data Structured/logic Knowledge Bases Commonsense knowledge
  25. 25. DRN (Ƭ-Space) predicate KB instance e p DRN r
  26. 26. DRN (Ƭ-Space) Space is segmented by the instances
  27. 27. Core Operations Query DRN
  28. 28. Core Operations Query DRN Core Operations Query & Reasoning Algorithms
  29. 29. Core Principles  Minimize the impact of Ambiguity, Vagueness, Synonymy.  Address the simplest matchings first (heuristics).  Semantic Relatedness as a primitive operation.  Distributional semantics as commonsense knowledge.
  30. 30. Specificity Ordering Use specificity (grammatical class + (IDF)) as a heuristc measure Ambiguity, vagueness, synonimy Low High
  31. 31. Core Operations  Instance search - Proper nouns - String similarity + node cardinality  Class (unary predicate) search - Nouns, adjectives and adverbs - String similarity + Distributional semantic relatedness  Property (binary predicate) search - Nouns, adjectives, verbs and adverbs - Distributional semantic relatedness
  32. 32. Core Operations  Navigation  Extensional expansion - Expands the instances associated with a class.  Disambiguation dialog (instance, predicate)
  33. 33. Use Case: Question Answering over Linked Data
  34. 34. Question Analysis Transform natural language queries into triple patterns “Who is the daughter of Bill Clinton married to?” Bill Clinton daughter married to PODS (INSTANCE) (PREDICATE) (PREDICATE) Query Features
  35. 35. Query Plan Map query features into a query plan. A query plan contains a sequence of core operations. (INSTANCE) (PREDICATE) (PREDICATE)  (1) INSTANCE SEARCH (Bill Clinton)  (2) p1 <- SEARCH PREDICATE (Bill Clintion, daughter)  (3) e1 <- NAVIGATE (Bill Clintion, p1)  (4) p2 <- SEARCH PREDICATE (e1, married to)  (5) e2 <- NAVIGATE (e1, p2) Query Features Query Plan
  36. 36. Instance Search Query: Bill Clinton daughter Instance Search Linked Data: :Bill_Clinton married to
  37. 37. Predicate Search Query: Linked Data: Bill Clinton daughter married to :child :Bill_Clinton :Chelsea_Clinton :religion :Baptists :almaMater ... (PIVOT ENTITY) :Yale_Law_School (ASSOCIATED TRIPLES)
  38. 38. Predicate Search Query: Bill Clinton daughter married to Which properties are semantically related to „daughter‟? Linked Data: :child :Bill_Clinton :Chelsea_Clinton :religion ... :Baptists sem_rel(daughter,child)=0.054 sem_rel(daughter,child)=0.004 :almaMater :Yale_Law_School sem_rel(daughter,alma mater)=0.001
  39. 39. Navigate Query: Linked Data: Bill Clinton daughter :child :Bill_Clinton :Chelsea_Clinton married to
  40. 40. Navigate Query: Linked Data: Bill Clinton daughter :child :Bill_Clinton :Chelsea_Clinton (PIVOT ENTITY) married to
  41. 41. Predicate Search Query: Linked Data: Bill Clinton daughter :spouse :child :Bill_Clinton married to :Chelsea_Clinton (PIVOT ENTITY) :Mark_Mezvinsky
  42. 42. Results
  43. 43. From Exact to Approximate: User Feedback
  44. 44. Distributional Semantics @ NUI Galway Vector Spaces in Information Extraction, B. Qasemizadeh • Random Projection is one solution: • Estimate a VSM by a random projection matrix consists a set of randomly created vectors. • i.e. based on the Johnson-Lindenstrauss lemma • verified by the results reported in (Hecht-Nielsen, 1994) * The above figure is copyrighted by Alex Clemmer (http://nullspace.io/)
  45. 45. Distributional Semantics @ NUI Galway Vector Spaces in Information Extraction, B. Qasemizadeh • Application: Extraction of Technology Terms (term classification) • Random Projection • Data Size: 10,000 publicationd • Contexts: words and their position in the neighbourhood of terms • Original Dimension: • approximately 5 million • Reducing the dimension to Random Projection 2000 using Behrang‟s research evolves around classification and finding the optimal contexts in random vector spaces for the extraction of technology terms and their relation. If you are interested please email him at behrangatoffice@gmail.com
  46. 46. Conclusions  Distributional Relational Networks (DRNs) provide: - A Knowledge Representation model. - Built-in semantic approximation. - Associated large-scale commonsense knowledge base (KB) with no manual construction effort.  The compositional-distributional model supports schema-agnostic QA system over a database: - Better recall and query coverage compared to baseline systems. a
  47. 47. References Querying Freitas & Curry, Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributional-Compositional Semantics Approach, IUI 2014. Reasoning Pereira da Silva & Freitas, Towards An Approximative OntologyAgnostic Approach for Logic Programs, In FoIKS 2014. KR Novacek et al. Getting the Meaning Right, ISWC 2011. Freitas et al. Distributional Relational Networks, AAAI Fall Symposium 2013.
  48. 48. http://andrefreitas.org
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×