Semantics at Scale:
A Distributional Approach
André Freitas
UFRJ
Rio de Janeiro, March 2015
Motivation
Semantic Computing for coping with the long tail of data variety
[Figure: long-tail curve. Axes: frequency of use vs. # of entities and attributes. Regions along the tail: relational, NoSQL, schema-less, unstructured. Annotations: more knowledge; full data coverage; full automation; full knowledge.]
Structure/Semantics
Unstructured Data: easy to generate
Structured Data: easy to analyze (consistent, comparable, processable)
Semantic Computing
Distributional Semantics
Robust Semantic Model
• Semantic intelligent behavior is highly dependent on knowledge scale (commonsense, semantic)
Semantics = Formal meaning representation model (lots of data) + inference model
Robust Semantic Model
• Not scalable!
1st Hard problem: Acquisition
2nd Hard problem: Consistency
3rd Hard problem: Performance
Semantics for a Complex World
Formal World / Real World
• “Most semantic models have dealt with particular types of constructions, and have been carried out under very simplifying assumptions, in true lab conditions.”
• “If these idealizations are removed it is not clear at all that modern semantics can give a full account of all but the simplest models/statements.”
Baroni et al. 2013
Distributional Semantic Models
• Semantic model with low acquisition effort (automatically built from text)
- Simplification of the representation
• Enables the construction of comprehensive commonsense/semantic KBs
• What is the cost?
- Some level of noise (semantic best-effort)
- Limited semantic model
Distributional Hypothesis
“Words occurring in similar (linguistic) contexts tend to be semantically similar”
• “He filled the wampimuk with the substance, passed it around and we all drunk some”
McDonald & Ramscar, 2001; Baroni & Boleda, 2010; Harris, 1954
Distributional Semantic Models (DSMs)
“The dog barked in the park. The owner of the dog put him on the leash since he barked.”
contexts = nouns and verbs in the same sentence
Context counts for “dog”:
bark : 2
park : 1
leash : 1
owner : 1
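The counting step on this slide can be sketched in a few lines of Python. This is only an illustration: the lemmatized noun/verb lists are written by hand (an assumption; a real pipeline would use a POS tagger and lemmatizer), and the function name `context_vector` is hypothetical. Note that the verb "put" also shares a sentence with "dog", so it appears in the counts here even though the slide does not list it.

```python
from collections import Counter

# Hand-lemmatized nouns and verbs per sentence (an assumption: a real
# pipeline would obtain these with a POS tagger and a lemmatizer).
SENTENCES = [
    ["dog", "bark", "park"],
    ["owner", "dog", "put", "leash", "bark"],
]

def context_vector(target, sentences):
    """Count the words that share a sentence with `target`."""
    counts = Counter()
    for sentence in sentences:
        if target in sentence:
            counts.update(word for word in sentence if word != target)
    return counts

dog = context_vector("dog", SENTENCES)
print(dict(dog))
```

The resulting counter is the sparse distributional vector for "dog": each key is a context dimension and each value is a co-occurrence count.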
Semantic Relatedness
[Figure: word vectors car, dog and cat in a space with context dimensions bark, run and leash; θ is the angle between two vectors.]
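A common way to turn the angle θ into a relatedness score is the cosine between two context vectors. The sketch below assumes that measure (the slides do not name one) and uses made-up counts over the three dimensions shown in the figure; `cosine` and `VECTORS` are illustrative names.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two sparse vectors (dicts of counts)."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy counts over the dimensions bark, run, leash (made-up numbers).
VECTORS = {
    "dog": {"bark": 5, "run": 3, "leash": 4},
    "cat": {"bark": 1, "run": 2, "leash": 2},
    "car": {"run": 2},
}

for other in ("cat", "car"):
    print("dog", other, round(cosine(VECTORS["dog"], VECTORS[other]), 3))
```

With these toy counts, "dog" comes out closer to "cat" than to "car", which is the kind of ranking the figure suggests.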
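DSMs as Commonsense Reasoning
[Figure: word vectors car, dog and cat in a space with context dimensions bark, run and leash; θ is the angle between two vectors.]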
First Application
Schema-agnostic Query Approach
Shift in the Database Landscape
• Very large and dynamic “schemas”.
before 2000: 10s-100s attributes
circa 2015: 1,000s-1,000,000s attributes
Brodie & Liu, 2010
Databases for a Complex World
How do you query data in this scenario?
Schema-agnosticism
Abstraction Layer
Query: Who is the daughter of Bill Clinton?
Data: Bill Clinton -- child --> Chelsea Clinton
Vocabulary Problem for Databases
Who is the daughter of Bill Clinton married to?
Semantic Gap
• Abstraction level differences
• Lexical variation
• Structural (compositional) differences
Schema-agnostic query mechanisms
Proposed Approach
Who is the daughter of Bill Clinton married to?
• Abstraction level differences
• Lexical variation
• Structural (compositional) differences
Ƭ-Space: Hybrid Distributional-Relational Semantic Model
A Distributional Structured Semantic Space for Querying RDF Graph Data, IJSC 2012
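Approach Overview
[Figure: a schema-agnostic query passes through Query Analysis, which yields query features; the Query Planner turns them into a query plan over the Ƭ-Space, which combines large-scale unstructured data with the database.]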
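As a rough illustration of how a query plan could navigate the data without knowing the schema, the sketch below starts from a pivot entity and, for each remaining query term, follows the property that is most related to it. It is not the Ƭ-Space implementation: the triples are a tiny hand-written sample, the `RELATEDNESS` scores are made-up stand-ins for a distributional measure, and `answer` and `threshold` are hypothetical names.

```python
# Tiny hand-written sample of (subject, property, object) triples.
TRIPLES = [
    ("Bill Clinton", "child", "Chelsea Clinton"),
    ("Bill Clinton", "birthPlace", "Hope, Arkansas"),
    ("Chelsea Clinton", "spouse", "Marc Mezvinsky"),
    ("Chelsea Clinton", "almaMater", "Stanford University"),
]

# Made-up scores standing in for a distributional relatedness measure.
RELATEDNESS = {
    ("daughter", "child"): 0.8,
    ("daughter", "birthPlace"): 0.1,
    ("married", "spouse"): 0.9,
    ("married", "almaMater"): 0.1,
}

def relatedness(term, prop):
    """Look up the toy score; unknown pairs count as unrelated."""
    return RELATEDNESS.get((term, prop), 0.0)

def answer(pivot, terms, threshold=0.5):
    """For each query term, follow the most related property of the current node."""
    node = pivot
    for term in terms:
        candidates = [(relatedness(term, p), o) for s, p, o in TRIPLES if s == node]
        if not candidates:
            return None
        score, target = max(candidates)
        if score < threshold:
            return None  # no property is related enough to the query term
        node = target
    return node

# "Who is the daughter of Bill Clinton married to?"
print(answer("Bill Clinton", ["daughter", "married"]))
```

The query never mentions `child` or `spouse`; the mapping from "daughter" and "married" to those properties is carried entirely by the relatedness scores.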
Addressing the Vocabulary Problem for Databases (with Distributional Semantics)
Gaelic: direction
Dataset
Dataset (DBpedia 3.6 + YAGO classes):
45,768 properties
288,316 classes
9,434,677 instances
128,071,259 triples
Simple Queries (Video)
More Complex Queries (Video)
Treo Answers Jeopardy Queries (Video)
http://bit.ly/1hWcch9
Relevance
Comparative Analysis
• Better recall and query coverage compared to baselines with equivalent precision.
• More comprehensive semantic matching.
Distributional Semantics vs WordNet
• Distributional semantics provides a more comprehensive semantic matching
A Distributional Approach for Terminological Semantic Search on the Linked Data Web, ACM SAC, 2012
Large-scale Querying
[Figure: long-tail curve of frequency of use vs. # of entities and attributes (relational, NoSQL, schema-less, unstructured), annotated with schema-agnostic querying.]
Schema-agnostic Database will be released in April 2015
Large-Scale Graph Extraction
Relation/Graph Extraction
• Now that we are schema-agnostic ...
• From Text to Knowledge Graph
• Relations + Context + Entity Linking
• Ontology-agnostic
• RDF serialization
Relation/Graph Extraction
In 2002, GE acquired the wind power assets of Enron.
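One possible shape for the output of such an extraction is a relation plus a context map, flattened into triples around a statement node. The sketch below is an assumption made for illustration: the relation is written by hand rather than extracted, and the `Relation` class, its fields, and the `stmt1` identifier are hypothetical, not the format used in the cited work.

```python
from dataclasses import dataclass, field

@dataclass
class Relation:
    """An extracted relation with optional context (e.g. time)."""
    subject: str
    predicate: str
    obj: str
    context: dict = field(default_factory=dict)

    def to_triples(self, statement_id):
        """Flatten into triples, attaching the context to a statement node."""
        triples = [
            (statement_id, "subject", self.subject),
            (statement_id, "predicate", self.predicate),
            (statement_id, "object", self.obj),
        ]
        triples += [(statement_id, key, value)
                    for key, value in sorted(self.context.items())]
        return triples

# Hand-written reading of the example sentence (not produced by an extractor).
rel = Relation("GE", "acquired", "the wind power assets of Enron", {"time": "2002"})
for triple in rel.to_triples("stmt1"):
    print(triple)
```

Keeping the context on the statement node, rather than on the subject or object, lets the temporal qualifier "In 2002" stay attached to the acquisition itself.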
Relation/Graph Extraction
General Electric Company, or GE, is an American multinational conglomerate corporation incorporated in Schenectady, New York
A Semantic Best-Effort Approach for Extracting Structured Discourse Graphs from Wikipedia, WoLE 2012
Large-scale Extraction
[Figure: long-tail curve of frequency of use vs. # of entities and attributes (relational, NoSQL, schema-less, unstructured), annotated with large-scale graph extraction.]
Approximate & Selective Reasoning
Commonsense Reasoning
• Coping with KB incompleteness (Acquisition)
- Supporting semantic approximation
• Selective (focussed) reasoning (Scalability)
- Selecting the relevant facts in the context of the inference
Strategy: Using distributional semantics to solve both the acquisition and scalability problems
Commonsense Reasoning
Does John Smith have a degree?
Instance-level: John Smith -- occupation --> Engineer
Commonsense KB, built up step by step:
- Engineer -- subject of --> learn
- learn / memorization (is a) [Selective reasoning]
- learn -- have or involve --> education
- education -- at location --> university
- university ~ college [Coping with KB incompleteness]
- college -- gives --> degree
A Distributional Semantics Approach for Selective Reasoning on Commonsense Graph Knowledge Bases, NLDB 2014.
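The John Smith example can be approximated with a greedy walk over a toy commonsense graph: at each node only the edge most related to the target term is followed (selective reasoning), and when a node has no outgoing edge the walk jumps to a distributionally similar node (coping with incompleteness). This is a simplified sketch, not the algorithm of the cited paper; the edge directions are one reading of the slide, the scores in `SCORES` are made up, and `navigate`, `rel`, and `approx_threshold` are hypothetical names.

```python
# Toy commonsense KB: node -> list of (edge label, neighbour).
KB = {
    "engineer": [("subject of", "learn")],
    "learn": [("is a", "memorization"), ("have or involve", "education")],
    "memorization": [],
    "education": [("at location", "university")],
    "university": [],  # the KB has no edge from here: incompleteness
    "college": [("gives", "degree")],
    "degree": [],
}

# Made-up symmetric scores standing in for distributional relatedness.
SCORES = {
    frozenset(("learn", "degree")): 0.4,
    frozenset(("memorization", "degree")): 0.1,
    frozenset(("education", "degree")): 0.6,
    frozenset(("university", "degree")): 0.7,
    frozenset(("college", "degree")): 0.7,
    frozenset(("university", "college")): 0.9,
}

def rel(a, b):
    """Toy relatedness: 1.0 for identical terms, 0.0 for unknown pairs."""
    return 1.0 if a == b else SCORES.get(frozenset((a, b)), 0.0)

def navigate(start, target, max_steps=10, approx_threshold=0.8):
    """Greedy walk from `start` towards `target`; returns the path or None."""
    node, path, visited = start, [start], {start}
    for _ in range(max_steps):
        if node == target:
            return path
        steps = [(label, nxt) for label, nxt in KB.get(node, []) if nxt not in visited]
        if not steps:
            # Dead end in the KB: approximate with a similar node instead.
            steps = [("similar to", other) for other in KB
                     if other not in visited and rel(node, other) >= approx_threshold]
        if not steps:
            return None
        # Selective step: keep only the neighbour most related to the target.
        label, node = max(steps, key=lambda step: rel(step[1], target))
        visited.add(node)
        path += [label, node]
    return path if node == target else None

print(navigate("engineer", "degree"))
```

With these toy scores the walk skips "memorization", bridges from "university" to "college" by similarity, and ends at "degree".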
Programming in a Schema-agnostic World
Towards An Approximative Ontology-Agnostic Approach for Logic Programs, FOIKS 2014.
Semantics at Scale: When Distributional Semantics meets Logic Programming, ALP Newsletter, 2014
[Figure: long-tail curve of frequency of use vs. # of entities and attributes (relational, NoSQL, schema-less, unstructured), annotated with schema-agnostic programs.]
Concluding Remarks
Take-away Message
• Existing semantic technologies can address major data management problems today
• Multi-disciplinarity is one key:
- NLP + IR + Semantic Web + Databases
• Schema-agnosticism is a central property/functionality/goal!
• Distributional Semantics + semantics of structured data = schema-agnosticism
• Schema-agnosticism has a major impact on information systems.
• We can tame the long tail of data variety!
• The wave is just starting. Be a part of it!
Want to play with Distributional Semantics?
http://easy-esa.org
