A Distributional Semantics Approach for
Selective Reasoning on Commonsense
Graph Knowledge Bases
André Freitas, João C. Pereira Da Silva, Edward Curry, Paul Buitelaar
Insight Centre for Data Analytics
NLDB 2014
Montpellier, France
Applying Distributional Semantics to
Commonsense Reasoning
André Freitas, João C. Pereira Da Silva, Edward Curry, Paul Buitelaar
Insight Centre for Data Analytics
NLDB 2014
Montpellier, France
Outline
 Motivation
 Distributional Semantics
 Distributional Navigational Algorithm (DNA)
 Evaluation
 Take-away message
Motivation
4
Semantic Systems & Commonsense Knowledge Bases
Knowledge Representation
Model
Commonsense Data
Expected Result: Intelligent behavior
Semantic flexibility, predictive power, automation ...
Acquisition
Inference Model Scalability
Consistency
5
Formal Representation of Meaning
6
 Most semantic models have dealt with particular types of
constructions, and have been carried out under very simplifying
assumptions, in true lab conditions.
 If these idealizations are removed it is not clear at all that modern
semantics can give a full account of all but the simplest
models/statements.
Formal World Real World
Baroni et al. 2013
Semantics for a Complex World
7
Commonsense Reasoning
 Coping with KB incompleteness
- Supporting semantic approximation
 Selective reasoning
- Selecting the relevant facts in the context of the inference
Acquisition
Scalability
Strategy: Using distributional semantics to solve both the acquisition and
scalability problems
10
Example
Does John Smith have a degree?
11
Example
Does John Smith have a degree?
12
Example
Does John Smith have a degree?
Selective reasoning
Coping with KB
Incompleteness
13
Applications
 Semantic search
 Question answering
 Approximate semantic inference
 Word sense disambiguation
 Paraphrase detection
 Text entailment
 Semantic anomaly detection
...
14
Distributional Semantics
15
Distributional Hypothesis
“Words occurring in similar (linguistic) contexts tend
to be semantically similar”
 He filled the wampimuk with the substance, passed it
around and we all drunk some
16
Distributional Semantic Models (DSMs)
car
dog
cat
bark
run
leash
17
Context
Semantic Similarity & Relatedness
θ
car
dog
cat
bark
run
leash
18
DSMs as Commonsense Reasoning
Commonsense is here
θ
car
dog
cat
bark
run
leash
19
DSMs as Commonsense Reasoning
θ
car
dog
cat
bark
run
leash
...
vs.
Semantic best-effort
Distributional Navigational
Algorithm (DNA)
21
Approach Overview
Distributional
Navigational
Algorithm (DNA)
Ƭ-Space
Large-scale
unstructured data
Unstructured
Commonsense KB
Structured
Commonsense KB
Distributional
semantics
Reasoning
Context
22
Ƭ-Space
Distributional
heuristics
23
Distributional semantic relatedness as a
Selectivity Heuristics
Distributional
heuristics
24
target
source
Distributional
heuristics
25
Distributional semantic relatedness as a
Selectivity Heuristics
target
source
Distributional
heuristics
26
Distributional semantic relatedness as a
Selectivity Heuristics
targettarget
source
Distributional Navigational Algorithm
(DNA)
Input:
Reasoning context: Source and target word pairs
Structured Knowledge Base (KB)
Distributional Semantic Model (DSM)
Output:
Meaningful paths in the KB connecting source and target
27
Distributional Navigational Algorithm
(DNA)
28
Distributional Navigational Algorithm
(DNA)
Does John Smith have a degree?
Structured
Commonsense KB
Distributional
Commonsense KB
John
Smith
29
Step: Resoning context = <John Smith, degree>
Distributional Navigational Algorithm
(DNA)
Does John Smith have a degree?
Structured
Commonsense KB
Distributional
Commonsense KB
30
occupation
Step: Get neighboring relations
engineer
John
Smith
John
Smith
catholic
religion
...
Distributional Navigational Algorithm
(DNA)
Does John Smith have a degree?
Structured
Commonsense KB
Distributional
Commonsense KB
31
Step: Calculate the distributional semantic relatedness
between the target term and the neighboring entities
John
Smith
John
Smith
catholicoccupation
engineer
religion
...
sem rel (catholic, degree)
= 0.004
sem rel (engineer, degree) =
0.07
Distributional Navigational Algorithm
(DNA)
Does John Smith have a degree?
Structured
Commonsense KB
Distributional
Commonsense KB
32
John
Smith
John
Smith
catholicoccupation
engineer
religion
...
sem rel (catholic, degree)
= 0.004
sem rel (engineer, degree) =
0.01
Step: Filter the elements below the threshold
Distributional Navigational Algorithm
(DNA)
Does John Smith have a degree?
Structured
Commonsense KB
Distributional
Commonsense KB
33
John
Smith
John
Smith
occupation
engineer
Step: Navigate to the next nodes
Distributional Navigational Algorithm
(DNA)
Does John Smith have a degree?
Structured
Commonsense KB
Distributional
Commonsense KB
34
John
Smith
John
Smith
occupation
engineer
Step: redefine the reasoning context:
<engineer, degree>
Distributional Navigational Algorithm
(DNA)
Does John Smith have a degree?
Structured
Commonsense KB
Distributional
Commonsense KB
Step: Get neighboring relations
John
Smith
engineer learn
subjectof
bridge a
rivercapableof
dam
creates
35
occupation
Distributional Navigational Algorithm
(DNA)
Does John Smith have a degree?
Structured
Commonsense KB
Distributional
Commonsense KB
sem rel (dam, degree) = 0.002
Step: Calculate distributional semantic relatedness
between the target term and the neighboring entities
sem rel (brdge a river, degree)
= 0.004
sem rel (learn, degree) = 0.01
John
Smith
engineer learn
subjectof
bridge a
rivercapableof
dam
creates
36
occupation
Distributional Navigational Algorithm
(DNA)
Does John Smith have a degree?
Structured
Commonsense KB
Distributional
Commonsense KB
sem rel (dam, degree) = 0.002
Step: Filter the elements below the threshold
sem rel (brdge a river, degree)
= 0.004
sem rel (learn, degree) = 0.01
John
Smith
engineer learn
subjectof
bridge a
rivercapableof
dam
creates
37
occupation
Distributional Navigational Algorithm
(DNA)
Does John Smith have a degree?
Structured
Commonsense KB
Distributional
Commonsense KB
Step: Search highly related entities in the KB not connected
(distributional semantics)
John
Smith
engineer learn
subjectof
Reasoning context: ‘learn degree’
38
occupation
Distributional Navigational Algorithm
(DNA)
Does John Smith have a degree?
Structured
Commonsense KB
Distributional
Commonsense KB
Step: Navigate to the elements above the threshold
John
Smith
engineer learn
subjectof
39
occupation
Distributional Navigational Algorithm
(DNA)
Does John Smith have a degree?
Structured
Commonsense KB
Distributional
Commonsense KB
Step: Repeat the steps
John
Smith
engineer learn
subjectof
education
have or
involve
40
occupation
Distributional Navigational Algorithm
(DNA)
Does John Smith have a degree?
Structured
Commonsense KB
Distributional
Commonsense KB
Step: Repeat the steps
John
Smith
engineer learn
subjectof
education
have or
involve
at location
university
41
occupation
Distributional Navigational Algorithm
(DNA)
Does John Smith have a degree?
Structured
Commonsense KB
Distributional
Commonsense KB
Step: Search highly related entities in the KB not connected
(distributional semantics)
John
Smith
engineer learn
subjectof
education
have or
involve
at location
university
Reasoning context: ‘university degree’
42
occupation
Distributional Navigational Algorithm
(DNA)
Structured
Commonsense KB
Distributional
Commonsense KB
John
Smith
engineer learn
subjectof
education
have or
involve
at location
universitycollege
Does John Smith have a degree?
Step: Search highly related entities in the KB not connected
(distributional semantics)
Reasoning context: ‘university degree’
43
occupation
Distributional Navigational Algorithm
(DNA)
Structured
Commonsense KB
Distributional
Commonsense KB
John
Smith
engineer learn
subjectof
education
have or
involve
at location
universitycollege
Does John Smith have a degree?
Step: Repeat the steps
degree
gives
44
occupation
Examples of Selected Paths
Reasoning context: < battle, war >
45
Examples of Selected Paths
46
Improving the DNA Algorithm:
Semantic Differential Δ
47
Closer to the target
Evaluation
48
Evaluating Semantic Selectivity
How does the semantic selectivity scale with the increase in the
number of candidate paths?
How does the accuracy of the semantic selectivity scale with the
increase in the number of candidate paths?
49
Experimental Setup
 Query set: 102 word pairs (derived from Question
Answering over Linked Data queries 2011/2012)
E.g.
- What is the highest mountain?
- Mount Everest elevation 8848.0
 Distributional Semantic Model: ESA
 Threshold: η = 0.05
 Dataset: ConceptNet
 Gold standard: Manual validation with two independent
annotators
50
ConceptNet
 Number of clauses x per relation type:
x = 1 (45,311)
1 < x < 10 (11,804)
10 <= x < 20 (906)
20<= x < 500 (790)
x >= 500 (50)
51
Semantic Selectivity
total number of paths (path length n)
number of paths selected
selectivity =
 The semantic selectivity for the DNA approach scales with the
increasing in the number of candidate paths
 How does the semantic selectivity scale with the increase in
the number of candidate paths?
52
Semantic Relevance
number of returned paths
number of relevant paths
accuracy =
 What is the semantic relevance of the returned paths?
 How does the accuracy of the semantic selectivity scale with
the increase in the number of candidate paths?
 There is a significant reduction in the accuracy with the increase
in the number of paths. However the accuracy value remains
high.
53
Evaluating Semantic Incompleteness
How does distributional semantics support increasing the KB
completeness?
54
Incompleteness
 39 <source, target> query pairs
 Over all ConceptNet entities
 Example:
- Query: < mayor, city >
- Returned entities:
council
municipality
downtown
ward
incumbent
borough
reelected
metropolitan
elect
candidate
politician
democratic
55
Incompleteness
 Avg. KB completion precision = 0.568
 Avg. # of strongly related entities returned per query = 19.21
number of retrieved entities
number of strongly related entities
KB completion precision =
 How does distributional semantics support increasing the KB
completeness?
 Distributional semantics supports improving the completeness of
the KB
 However, further investigation is necessary to improve the
precision of distributional models
56
Take-away message
 Distributional Semantics provides in the selection of
meaningful paths:
- high selectivity
- high selectivity scalability
- medium-high accuracy
 Distributional semantics supports improving the completeness
of the KB
- However, further investigation is necessary to improve the
precision of distributional models in this context
57
EasyESA: Do-it-yourself
http://treo.deri.ie/easyesa/
58

A Distributional Semantics Approach for Selective Reasoning on Commonsense Graph Knowledge Bases

  • 1.
    A Distributional SemanticsApproach for Selective Reasoning on Commonsense Graph Knowledge Bases André Freitas, João C. Pereira Da Silva, Edward Curry, Paul Buitelaar Insight Centre for Data Analytics NLDB 2014 Montpellier, France
  • 2.
    Applying Distributional Semanticsto Commonsense Reasoning André Freitas, João C. Pereira Da Silva, Edward Curry, Paul Buitelaar Insight Centre for Data Analytics NLDB 2014 Montpellier, France
  • 3.
    Outline  Motivation  DistributionalSemantics  Distributional Navigational Algorithm (DNA)  Evaluation  Take-away message
  • 4.
  • 5.
    Semantic Systems &Commonsense Knowledge Bases Knowledge Representation Model Commonsense Data Expected Result: Intelligent behavior Semantic flexibility, predictive power, automation ... Acquisition Inference Model Scalability Consistency 5
  • 6.
  • 7.
     Most semanticmodels have dealt with particular types of constructions, and have been carried out under very simplifying assumptions, in true lab conditions.  If these idealizations are removed it is not clear at all that modern semantics can give a full account of all but the simplest models/statements. Formal World Real World Baroni et al. 2013 Semantics for a Complex World 7
  • 8.
    Commonsense Reasoning  Copingwith KB incompleteness - Supporting semantic approximation  Selective reasoning - Selecting the relevant facts in the context of the inference Acquisition Scalability Strategy: Using distributional semantics to solve both the acquisition and scalability problems 10
  • 9.
    Example Does John Smithhave a degree? 11
  • 10.
    Example Does John Smithhave a degree? 12
  • 11.
    Example Does John Smithhave a degree? Selective reasoning Coping with KB Incompleteness 13
  • 12.
    Applications  Semantic search Question answering  Approximate semantic inference  Word sense disambiguation  Paraphrase detection  Text entailment  Semantic anomaly detection ... 14
  • 13.
  • 14.
    Distributional Hypothesis “Words occurringin similar (linguistic) contexts tend to be semantically similar”  He filled the wampimuk with the substance, passed it around and we all drunk some 16
  • 15.
    Distributional Semantic Models(DSMs) car dog cat bark run leash 17 Context
  • 16.
    Semantic Similarity &Relatedness θ car dog cat bark run leash 18
  • 17.
    DSMs as CommonsenseReasoning Commonsense is here θ car dog cat bark run leash 19
  • 18.
    DSMs as CommonsenseReasoning θ car dog cat bark run leash ... vs. Semantic best-effort
  • 19.
  • 20.
    Approach Overview Distributional Navigational Algorithm (DNA) Ƭ-Space Large-scale unstructureddata Unstructured Commonsense KB Structured Commonsense KB Distributional semantics Reasoning Context 22
  • 21.
  • 22.
    Distributional semantic relatednessas a Selectivity Heuristics Distributional heuristics 24 target source
  • 23.
  • 24.
    Distributional heuristics 26 Distributional semantic relatednessas a Selectivity Heuristics targettarget source
  • 25.
    Distributional Navigational Algorithm (DNA) Input: Reasoningcontext: Source and target word pairs Structured Knowledge Base (KB) Distributional Semantic Model (DSM) Output: Meaningful paths in the KB connecting source and target 27
  • 26.
  • 27.
    Distributional Navigational Algorithm (DNA) DoesJohn Smith have a degree? Structured Commonsense KB Distributional Commonsense KB John Smith 29 Step: Resoning context = <John Smith, degree>
  • 28.
    Distributional Navigational Algorithm (DNA) DoesJohn Smith have a degree? Structured Commonsense KB Distributional Commonsense KB 30 occupation Step: Get neighboring relations engineer John Smith John Smith catholic religion ...
  • 29.
    Distributional Navigational Algorithm (DNA) DoesJohn Smith have a degree? Structured Commonsense KB Distributional Commonsense KB 31 Step: Calculate the distributional semantic relatedness between the target term and the neighboring entities John Smith John Smith catholicoccupation engineer religion ... sem rel (catholic, degree) = 0.004 sem rel (engineer, degree) = 0.07
  • 30.
    Distributional Navigational Algorithm (DNA) DoesJohn Smith have a degree? Structured Commonsense KB Distributional Commonsense KB 32 John Smith John Smith catholicoccupation engineer religion ... sem rel (catholic, degree) = 0.004 sem rel (engineer, degree) = 0.01 Step: Filter the elements below the threshold
  • 31.
    Distributional Navigational Algorithm (DNA) DoesJohn Smith have a degree? Structured Commonsense KB Distributional Commonsense KB 33 John Smith John Smith occupation engineer Step: Navigate to the next nodes
  • 32.
    Distributional Navigational Algorithm (DNA) DoesJohn Smith have a degree? Structured Commonsense KB Distributional Commonsense KB 34 John Smith John Smith occupation engineer Step: redefine the reasoning context: <engineer, degree>
  • 33.
    Distributional Navigational Algorithm (DNA) DoesJohn Smith have a degree? Structured Commonsense KB Distributional Commonsense KB Step: Get neighboring relations John Smith engineer learn subjectof bridge a rivercapableof dam creates 35 occupation
  • 34.
    Distributional Navigational Algorithm (DNA) DoesJohn Smith have a degree? Structured Commonsense KB Distributional Commonsense KB sem rel (dam, degree) = 0.002 Step: Calculate distributional semantic relatedness between the target term and the neighboring entities sem rel (brdge a river, degree) = 0.004 sem rel (learn, degree) = 0.01 John Smith engineer learn subjectof bridge a rivercapableof dam creates 36 occupation
  • 35.
    Distributional Navigational Algorithm (DNA) DoesJohn Smith have a degree? Structured Commonsense KB Distributional Commonsense KB sem rel (dam, degree) = 0.002 Step: Filter the elements below the threshold sem rel (brdge a river, degree) = 0.004 sem rel (learn, degree) = 0.01 John Smith engineer learn subjectof bridge a rivercapableof dam creates 37 occupation
  • 36.
    Distributional Navigational Algorithm (DNA) DoesJohn Smith have a degree? Structured Commonsense KB Distributional Commonsense KB Step: Search highly related entities in the KB not connected (distributional semantics) John Smith engineer learn subjectof Reasoning context: ‘learn degree’ 38 occupation
  • 37.
    Distributional Navigational Algorithm (DNA) DoesJohn Smith have a degree? Structured Commonsense KB Distributional Commonsense KB Step: Navigate to the elements above the threshold John Smith engineer learn subjectof 39 occupation
  • 38.
    Distributional Navigational Algorithm (DNA) DoesJohn Smith have a degree? Structured Commonsense KB Distributional Commonsense KB Step: Repeat the steps John Smith engineer learn subjectof education have or involve 40 occupation
  • 39.
    Distributional Navigational Algorithm (DNA) DoesJohn Smith have a degree? Structured Commonsense KB Distributional Commonsense KB Step: Repeat the steps John Smith engineer learn subjectof education have or involve at location university 41 occupation
  • 40.
    Distributional Navigational Algorithm (DNA) DoesJohn Smith have a degree? Structured Commonsense KB Distributional Commonsense KB Step: Search highly related entities in the KB not connected (distributional semantics) John Smith engineer learn subjectof education have or involve at location university Reasoning context: ‘university degree’ 42 occupation
  • 41.
    Distributional Navigational Algorithm (DNA) Structured CommonsenseKB Distributional Commonsense KB John Smith engineer learn subjectof education have or involve at location universitycollege Does John Smith have a degree? Step: Search highly related entities in the KB not connected (distributional semantics) Reasoning context: ‘university degree’ 43 occupation
  • 42.
    Distributional Navigational Algorithm (DNA) Structured CommonsenseKB Distributional Commonsense KB John Smith engineer learn subjectof education have or involve at location universitycollege Does John Smith have a degree? Step: Repeat the steps degree gives 44 occupation
  • 43.
    Examples of SelectedPaths Reasoning context: < battle, war > 45
  • 44.
  • 45.
    Improving the DNAAlgorithm: Semantic Differential Δ 47 Closer to the target
  • 46.
  • 47.
    Evaluating Semantic Selectivity Howdoes the semantic selectivity scale with the increase in the number of candidate paths? How does the accuracy of the semantic selectivity scale with the increase in the number of candidate paths? 49
  • 48.
    Experimental Setup  Queryset: 102 word pairs (derived from Question Answering over Linked Data queries 2011/2012) E.g. - What is the highest mountain? - Mount Everest elevation 8848.0  Distributional Semantic Model: ESA  Threshold: η = 0.05  Dataset: ConceptNet  Gold standard: Manual validation with two independent annotators 50
  • 49.
    ConceptNet  Number ofclauses x per relation type: x = 1 (45,311) 1 < x < 10 (11,804) 10 <= x < 20 (906) 20<= x < 500 (790) x >= 500 (50) 51
  • 50.
    Semantic Selectivity total numberof paths (path length n) number of paths selected selectivity =  The semantic selectivity for the DNA approach scales with the increasing in the number of candidate paths  How does the semantic selectivity scale with the increase in the number of candidate paths? 52
  • 51.
    Semantic Relevance number ofreturned paths number of relevant paths accuracy =  What is the semantic relevance of the returned paths?  How does the accuracy of the semantic selectivity scale with the increase in the number of candidate paths?  There is a significant reduction in the accuracy with the increase in the number of paths. However the accuracy value remains high. 53
  • 52.
    Evaluating Semantic Incompleteness Howdoes distributional semantics support increasing the KB completeness? 54
  • 53.
    Incompleteness  39 <source,target> query pairs  Over all ConceptNet entities  Example: - Query: < mayor, city > - Returned entities: council municipality downtown ward incumbent borough reelected metropolitan elect candidate politician democratic 55
  • 54.
    Incompleteness  Avg. KBcompletion precision = 0.568  Avg. # of strongly related entities returned per query = 19.21 number of retrieved entities number of strongly related entities KB completion precision =  How does distributional semantics support increasing the KB completeness?  Distributional semantics supports improving the completeness of the KB  However, further investigation is necessary to improve the precision of distributional models 56
  • 55.
    Take-away message  DistributionalSemantics provides in the selection of meaningful paths: - high selectivity - high selectivity scalability - medium-high accuracy  Distributional semantics supports improving the completeness of the KB - However, further investigation is necessary to improve the precision of distributional models in this context 57
  • 56.