Learning Multilingual Semantic Parsers for Question Answering over Linked Data - A comparison of neural and probabilistic graphical model architectures
This document summarizes a PhD dissertation defense talk on learning multilingual semantic parsers for question answering over linked data. It compares neural and probabilistic graphical model architectures for semantic parsing, which maps natural language to formal meaning representations. The talk introduces dependency parse tree-based approaches, evaluates different model architectures, and addresses the challenges of building multilingual question answering systems over structured knowledge bases.
Applications of Word Vectors in Text Retrieval and Classification (shakimov)
Applications of word vectors (word2vec, BERT, etc.) to problems such as text retrieval and the classification of textual documents for tasks such as sentiment analysis and spam detection.
Framester: A Wide Coverage Linguistic Linked Data Hub (Mehwish Alam)
Framester is a linguistic linked data hub that aims to improve coverage of FrameNet by extending mappings between FrameNet and other resources like WordNet and BabelNet. Framester represents over 40 million triples linking linguistic and factual resources and aligning frames, roles, and types to foundational ontologies. It provides a word frame disambiguation service and was evaluated on annotated corpora, showing improved performance over previous approaches.
Interactive Knowledge Discovery over Web of Data (Mehwish Alam)
This document describes research on classifying and exploring data from the Web of Data. It discusses building a classification structure over RDF data by classifying triples based on RDF Schema and creating views through SPARQL queries. This structure can then be used for data completion and interactive knowledge discovery through data analysis and visualization. Formal concept analysis and pattern structures are introduced as techniques for dealing with complex data types from the Web of Data like graphs and linked data. Range minimum queries are also proposed as a way to compute the lowest common ancestor for structured attribute sets in the pattern structures.
This document discusses cross-language information retrieval (CLIR). It presents the goals of allowing users to query for domain-specific information in their native language and presenting relevant search results in the target language. It describes the key components of CLIR including bilingual corpus extraction from multiple sources, corpus indexing, querying and string matching. Preliminary evaluation results of sample queries are provided, along with conclusions that machine translation based CLIR is often more useful than the proposed method and that future work could focus on automated evaluation and fuzzy matching.
This document discusses cross-language information retrieval (CLIR). It defines CLIR as retrieving information written in a language different from the user's query language. It describes approaches to CLIR such as dictionary-based query translation and pseudo-relevance feedback. Dictionary-based query translation uses bilingual dictionaries but requires disambiguation due to ambiguity. Pseudo-relevance feedback assumes top documents are relevant and selects terms from them to expand the query. The document also discusses using parallel corpora to estimate cross-lingual relevance models and evaluate CLIR using conferences like TREC and CLEF.
The document describes cross-language information retrieval (CLIR) and summarizes an English-Chinese information retrieval system called ECIRS. ECIRS allows users to input queries in English and retrieves relevant Chinese documents through translation. It includes dictionaries, document indexes, and a Chinese search engine. Screenshots show the user interface where a user can enter an English keyword, view its Chinese translation, and see search results in Chinese.
The document discusses test-driven quality assessment of RDF data. It proposes a methodology called the Test-driven Quality Assessment Methodology (TDQAM) where test cases are generated automatically from the RDF schema to validate data constraints. Test cases are written as SPARQL queries and can check for issues like a person having a birthdate after a deathdate. Pattern-based test generators analyze the schema to instantiate test cases. The methodology provides a unified way to validate RDF data against different schema languages to improve data quality.
This document provides a quick tour of data mining. It begins with an overview of the evolution of data management techniques from manual record keeping to modern big data and data science. It then discusses what data mining is, focusing on algorithms for discovering patterns in existing data. Various examples of data mining applications are also presented, as well as the origins of data mining in fields such as machine learning and databases. Finally, an overview of the key steps in the knowledge discovery process is given, including data preprocessing, data mining, and pattern evaluation.
Natural Language Processing in R (rNLP) (fridolin.wild)
The introductory slides of a workshop given to the doctoral school at the Institute of Business Informatics of the Goethe University Frankfurt. The tutorials are available on http://crunch.kmi.open.ac.uk/w/index.php/Tutorials
Named entity recognition (NER) with NLTK (Janu Jahnavi)
https://www.learntek.org/blog/named-entity-recognition-ner-with-nltk/
Learntek is a global online training provider offering courses on Big Data Analytics, Hadoop, Machine Learning, Deep Learning, IoT, AI, Cloud Technology, DevOps, Digital Marketing, and other IT and management topics.
The document discusses cross-language information retrieval (CLIR). It notes that while there are over 6,000 languages, 80% of websites are in English, creating a need for CLIR. CLIR aims to retrieve relevant documents in languages different from the query language. It is an important area as it allows for global information exchange and knowledge sharing, with applications in national security, access to foreign patents and medical information. CLIR draws on multiple disciplines including information retrieval, natural language processing and machine translation.
R is a free software environment for statistical analysis and graphics. This document discusses using R for text mining, including preprocessing text data through transformations like stemming, stopword removal, and part-of-speech tagging. It also demonstrates building term document matrices and classifying text with k-nearest neighbors (KNN) algorithms. Specifically, it shows classifying speeches from Obama and Romney with over 90% accuracy using KNN classification in R.
Text mining and natural language processing techniques can be used to extract useful information from text data. Common text mining tasks include text categorization to classify documents into predefined categories, document clustering to group similar documents without predefined categories, and keyword-based association analysis to find frequent patterns and relationships between keywords in a collection of documents. Text classification algorithms such as support vector machines, k-nearest neighbors, naive Bayes, and neural networks can be applied to categorize documents based on their contents.
Text mining and natural language processing techniques can be used to extract useful information from text data. Common text mining tasks include text categorization to classify documents into predefined categories, document clustering to group similar documents without predefined categories, and keyword-based association analysis to find frequently co-occurring terms. Text classification algorithms such as support vector machines, naive Bayes classifiers, and neural networks are often applied to categorize documents into topics. The vector space model is commonly used to represent documents as vectors of term weights to enable similarity comparisons between documents.
Text mining and natural language processing techniques can be used to extract useful information from text data. Common text mining tasks include text categorization to classify documents into predefined categories, document clustering to group similar documents without predefined categories, and keyword-based association analysis to find frequently co-occurring terms. Text classification algorithms such as support vector machines, naive Bayes classifiers, and neural networks are often applied to categorize documents. The vector space model is commonly used to represent documents as vectors of term weights.
TermPicker: Enabling the Reuse of Vocabulary Terms by Exploiting Data from th... (JohannWanja)
Deciding which RDF vocabulary terms to use when modeling data as Linked Open Data (LOD) is far from trivial. We propose "TermPicker" as a novel approach enabling vocabulary reuse by recommending vocabulary terms based on various features of a term. These features include the term’s popularity, whether it is from an already used vocabulary, and the so-called schema-level pattern (SLP) feature that exploits which terms other data providers on the LOD cloud use to describe their data. We apply Learning To Rank to establish a ranking model for vocabulary terms based on the utilized features. The results show that using the SLP-feature improves the recommendation quality by 29% to 36% considering the Mean Average Precision and the Mean Reciprocal Rank at the first five positions compared to recommendations based on solely the term’s popularity and whether it is from an already used vocabulary.
RDF2Vec: RDF Graph Embeddings for Data Mining (Petar Ristoski)
Linked Open Data has been recognized as a valuable source for background information in data mining. However, most data mining tools require features in propositional form, i.e., a vector of nominal or numerical features associated with an instance, while Linked Open Data sources are graphs by nature. In this paper, we present RDF2Vec, an approach that uses language modeling approaches for unsupervised feature extraction from sequences of words, and adapts them to RDF graphs. We generate sequences by leveraging local information from graph sub-structures, harvested by Weisfeiler-Lehman Subtree RDF Graph Kernels and graph walks, and learn latent numerical representations of entities in RDF graphs. Our evaluation shows that such vector representations outperform existing techniques for the propositionalization of RDF graphs on a variety of different predictive machine learning tasks, and that feature vector representations of general knowledge graphs such as DBpedia and Wikidata can be easily reused for different tasks.
DS2014: Feature selection in hierarchical feature spaces (Petar Ristoski)
The document describes a proposed approach called SHSEL for hierarchical feature selection in machine learning. SHSEL exploits the hierarchical structure of feature spaces, where more specific features imply more general ones. It initially selects ranges of similar features in each branch based on relevance similarity. It then prunes the set further by selecting only the most relevant remaining features. The authors evaluate SHSEL on real and synthetic datasets compared to other feature selection methods, finding it achieves comparable or improved accuracy while significantly reducing the feature space.
This document discusses rules and the Semantic Web Rule Language (SWRL). It defines rules as a means of representing knowledge similar to if-then statements. SWRL combines OWL and rule-based languages by allowing users to write rules that can refer to OWL classes, properties, individuals and datatypes. SWRL has an abstract and XML syntax and supports built-in predicates for manipulating data types. Rules provide more expressivity than RDFS and OWL in some cases, such as defining application behaviors, but rule-based reasoning is less performant so they should not be overused when RDFS/OWL suffice.
This document provides an overview of SPARQL 1.0, the W3C recommendation for querying RDF data. It describes the main components of SPARQL queries including graph patterns used to match subgraphs, basic graph patterns using triple patterns, and optional, union, and constraint graph patterns. It provides examples of SPARQL queries and describes how variables, blank nodes, and filter expressions are used in constraints on query solutions.
Information Extraction from the Web - Algorithms and Tools (Benjamin Habegger)
This document provides an overview of algorithms and tools for information extraction from the web. It discusses document representations, approaches like wrappers that can extract semi-structured data from websites, and algorithms such as Wien, Stalker, DIPRE and IERel that learn wrappers. It also presents tools like WetDL for describing workflows and WebSource for executing them to extract and transform web data. Finally, it discusses applications of information extraction like semantic search engines and linking extracted data to schemas for data integration.
An overview of existing solutions for link discovery, looking into some state-of-the-art algorithms for the rapid execution of link discovery tasks, focusing on algorithms that guarantee result completeness.
(HOBBIT project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 688227.)
This document provides an overview of hands-on tasks for a link discovery tutorial using the Limes framework. It describes a test dataset, and three tasks: 1) executing a provided Limes configuration to detect duplicate authors, 2) creating a configuration to find similar publications based on keywords, and 3) using the Limes GUI.
Performance of graph query languages:
Analysis of the performance of graph query languages: a comparative study of Cypher, Gremlin and native access in Neo4j
Enriching the semantic web tutorial session 1 (Tobias Wunner)
The document discusses challenges and opportunities in natural language processing for the multilingual semantic web. It provides examples of how content on the web and semantic web exhibits linguistic variations within and across languages. It also summarizes several NLP applications like information extraction and natural language generation that utilize ontologies, and notes that these applications require domain and multilingual adaptation of lexicons and extraction rules. The document argues that efficient adaptation and sharing of linguistic resources between ontology-based NLP applications is needed.
This document provides an overview of deep learning for information retrieval. It begins with background on the speaker and discusses how the data landscape is changing with increasing amounts of diverse data types. It then introduces neural networks and how deep learning can learn hierarchical representations from data. Key aspects of deep learning that help with natural language processing tasks like word embeddings and modeling compositionality are discussed. Several influential papers that advanced word embeddings and recursive neural networks are also summarized.
The web of interlinked data and knowledge stripped (Sören Auer)
Linked Data approaches can help solve enterprise information integration (EII) challenges by complementing text on web pages with structured, linked open data from different sources. This allows for intelligently combining, integrating, and joining structured information across heterogeneous systems. A distributed, iterative, bottom-up integration approach using Linked Data may help solve the EII problem in large companies by taking a pay-as-you-go approach.
Similar to Learning Multilingual Semantic Parsers for Question Answering over Linked Data - A comparison of neural and probabilistic graphical model architectures
A system called a natural language interface, which transforms a user's natural language question into a SPARQL query.
Find related papers here: https://sites.google.com/site/fadhlinams81/publication
The document provides an overview of knowledge graphs and the metaphactory knowledge graph platform. It defines knowledge graphs as semantic descriptions of entities and relationships using formal knowledge representation languages like RDF, RDFS and OWL. It discusses how knowledge graphs can power intelligent applications and gives examples like Google Knowledge Graph, Wikidata, and knowledge graphs in cultural heritage and life sciences. It also provides an introduction to key standards like SKOS, SPARQL, and Linked Data principles. Finally, it describes the main features and architecture of the metaphactory platform for creating and utilizing enterprise knowledge graphs.
Schema-agnostic queries over large-schema databases: a distributional semanti... (Andre Freitas)
This document provides an overview and summary of André Freitas' PhD thesis defense presentation on schema-agnostic queries for large schema databases using distributional semantics. The presentation motivates the need for schema-agnostic queries due to the rise of very large and dynamic database schemas. It proposes using distributional semantics to provide an accurate, comprehensive and low maintenance approach to cope with semantic heterogeneity in schema-agnostic queries. The key aspects of the approach include semantic pivoting to reduce semantic complexity, distributional semantic models to enable semantic matching, and a hybrid distributional-relational semantic model called τ-Space to support the development of a schema-agnostic query mechanism.
DODDLE-OWL: A Domain Ontology Construction Tool with OWL (Takeshi Morita)
In this paper, we propose a domain ontology construction tool with OWL. The advantage of our tool is focusing the quality refinement phase of ontology construction. Through interactive support for refining the initial ontology, OWL-Lite level ontology, which consists of taxonomic relationships (defined as classes) and non-taxonomic relationships (defined as properties), is constructed effectively. The tool also provides semi-automatic generation of the initial ontology using domain specific documents and general ontologies.
Identifying Topics in Social Media Posts using DBpedia (Óscar Muñoz García)
This document discusses a method for identifying topics in social media posts using DBpedia. It begins with an introduction that outlines the task of topic identification, applications for social media, and challenges with short, misspelled texts. It then reviews related work exploiting Wikipedia and DBpedia for tasks like text categorization. The method section describes the process of part-of-speech tagging, context selection, disambiguation against DBpedia, and language filtering. An evaluation on 10,000 Spanish posts finds high coverage rates and precision varying by channel from 59-89%. The conclusions discuss achieving good coverage while noting precision depends on the channel and no single context approach works best across all channels.
Understanding Natural Language Queries over Relational Databases (Ashis Kumar Chanda)
This document summarizes a proposed method for understanding natural language queries over relational databases. The method aims to accept natural language questions from users and help them retrieve results from a database using SQL. It works by transforming the natural language into a query tree, verifying the transformation interactively, and translating the query tree into an SQL statement. The method is evaluated based on effectiveness and usability based on a dataset from Microsoft Academic Search, showing improved results over the website. However, criticisms note that users need domain knowledge, natural language variety is not discussed in depth, and experimental results do not fully support the claims.
Neural Text Embeddings for Information Retrieval (WSDM 2017) (Bhaskar Mitra)
The document describes a tutorial on using neural networks for information retrieval. It discusses an agenda for the tutorial that includes fundamentals of IR, word embeddings, using word embeddings for IR, deep neural networks, and applications of neural networks to IR problems. It provides context on the increasing use of neural methods in IR applications and research.
Scalable Cross-lingual Document Similarity through Language-specific Concept ... (Carlos Badenes-Olmedo)
This document proposes an unsupervised algorithm to relate similar documents in multilingual corpora without requiring translations. It represents documents as distributions over topics derived from language-specific concept hierarchies (like WordNet synsets). Documents in different languages are aligned in a single representation space based on shared synsets from their main topics. The algorithm is evaluated on document classification and retrieval tasks using legislative text corpora in English, Spanish and French, showing it performs comparably to supervised methods while not requiring parallel or comparable training data.
Presented by Ted Xiao at RobotXSpace on 4/18/2017. This workshop covers the fundamentals of Natural Language Processing, crucial NLP approaches, and an overview of NLP in industry.
PhD thesis defense.
This manuscript describes a methodology designed and implemented to recommend vocabularies based on the content of a given website. The goal of the proposed approach is to generate vocabularies by reusing existing schemas. The automatic recommendation helps websites become self-described web entities in the Web of Data, understandable by both humans and machines. The implemented approach is wrapped within a broader methodology for turning a website into a machine-understandable node using technologies developed in the scope of the Semantic Web vision. Transforming a website into a machine-understandable entity is the first step required on the website's side to narrow the gap with web agents and enable structured content consumption without the need to implement an Application Programming Interface (API) providing read-write functionality. The motivation of the thesis stems from the fact that, in most cases, the data provided via an API is already presented on the corresponding website.
LSI latent (by HATOUM Saria and DONGO ESCALANTE Irvin Franco) (rchbeir)
This document introduces latent semantic indexing (LSI), a technique for information retrieval that overcomes some limitations of the vector space model. LSI represents documents and queries in a semantic space of concepts derived from word co-occurrence patterns in the original text. It uses singular value decomposition to project documents and queries into a concept space of lower dimensionality than the original word space. This addresses problems with synonymy and polysemy. An example shows how LSI can retrieve a document based on conceptual similarity rather than direct word matching. Advantages of LSI include capturing synonymy and polysemy, while disadvantages include increased storage and computational requirements.
The document discusses using Linked Open Data from DBpedia to help with Unicode localization interoperability (ULI). DBpedia extracts structured data from Wikipedia and makes it available as Linked Data. It describes how ULI aims to standardize localization data exchange between tools. DBpedia data on abbreviations in over 100 languages was extracted and evaluated, finding it could help improve text segmentation precision and recall. The extracted data is being considered for inclusion in the Common Locale Data Repository (CLDR) to further standardization efforts.
The document discusses using semantics to understand social media data. It covers using both implicit and explicit semantics. For implicit semantics, it discusses topic models like Latent Dirichlet Allocation (LDA) that can be used to extract topics from unlabeled text data by modeling each document as a mixture of topics. For explicit semantics, it discusses representing social media data, conversations, and user behavior using ontologies. The tutorial provides an overview of using semantics to better understand social media information through techniques like topic modeling, semantic representation, and knowledge extraction.
Keynote @ SEMANTICS 2017 (Amsterdam, September 2017) on convergences between NLP and KE in the era of the Semantic Web, with a focus on semantic relation extraction from text.
Learning Multilingual Semantic Parsers for Question Answering over Linked Data - A comparison of neural and probabilistic graphical model architectures
1. Learning Multilingual Semantic Parsers for Question Answering over Linked Data
A comparison of neural and probabilistic graphical model architectures
PhD Dissertation Defense Talk
March 2019
Sherzod Hakimov
Semantic Computing Group, CITEC
Bielefeld University
7.
What is Semantic Parsing?
Give me the route to Jahnplatz
• mapping a natural language sentence to a detailed meaning representation
8.
What is Semantic Parsing?
Give me the route to Jahnplatz
route($LOC, "Jahnplatz")
route(StartLocation, EndLocation)
• mapping a natural language sentence to a detailed meaning representation
9.
What is Semantic Parsing?
Give me the route to Jahnplatz
route($LOC, "Jahnplatz")
• mapping a natural language sentence to a detailed meaning representation
• the meaning representation can be modelled using a formal language
10.
What is Semantic Parsing?
Give me the route to Jahnplatz
route($LOC, "Jahnplatz")
• mapping a natural language sentence to a detailed meaning representation
• the meaning representation can be modelled using a formal language, e.g. lambda calculus
• an ontology with properties, classes, entities, etc. (route, create_calendar_event, set_alarm)
• supports automated execution or reasoning
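As a worked illustration (notation mine, not from the slides): in lambda calculus the request above can be assigned the term λy.route($LOC, y); applying it to the constant Jahnplatz, (λy.route($LOC, y))(Jahnplatz), beta-reduces to route($LOC, Jahnplatz), i.e. exactly the representation route($LOC, "Jahnplatz") shown on the slide.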
11.
Why do we need Semantic Parsers?
Give me the route to Jahnplatz -> Query: route($LOC, "Jahnplatz") -> Knowledge Base -> Answer

12.
Why do we need Semantic Parsers?
Give me the route to Jahnplatz -> Query: route($LOC, "Jahnplatz") -> Knowledge Base -> Answer
14. Motivation
• Building semantic parsers with application on Question Answering
• Building multilingual solutions that can be applied to multiple languages
Which German politicians were born in Bielefeld?
Which metal has a liquid form?
Welche deutschen Politiker wurden in Bielefeld geboren?
Welches Metall hat eine flüssige Form?
¿Qué políticos alemanes nacieron en Bielefeld?
¿Qué metal tiene una forma líquida?
15. Motivation
• Building semantic parsers with application on Question Answering
• Building multilingual solutions that can be extended for other languages
• Comparison and evaluation of different model architectures
16. Motivation
• Building semantic parsers with application on Question Answering
• Building multilingual solutions that can be extended for other languages
• Comparison and evaluation of different model architectures
• Highlight the challenges of building Question Answering systems
17. DBpedia
• based on structured content from Wikipedia
• more than 130 languages supported
• 760 classes, 1105 object & 1622 data type properties
• ca. 9 million resources
21. Question Answering on RDF Data
Natural Language: Dan Brown is the author of Inferno
Triple: dbr:Inferno_(novel) dbo:author dbr:Dan_Brown
22. Question Answering on RDF Data
Natural Language: Dan Brown is the author of Inferno
Triple: dbr:Inferno_(novel) dbo:author dbr:Dan_Brown
Question format
Natural Language: Who is the author of Inferno?
23. Question Answering on RDF Data
Natural Language: Dan Brown is the author of Inferno
Triple: dbr:Inferno_(novel) dbo:author dbr:Dan_Brown
Question format
Natural Language: Who is the author of Inferno?
SPARQL Query: SELECT ?x WHERE {dbr:Inferno_(novel) dbo:author ?x}
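A minimal sketch (mine, not from the talk) of executing such a query: the snippet below sends the SPARQL query above to the public DBpedia endpoint via the Python SPARQLWrapper library. The endpoint URL and the use of the full IRI (which avoids escaping the parentheses in a prefixed name) are choices of this sketch.

from SPARQLWrapper import SPARQLWrapper, JSON

# query the public DBpedia endpoint (assumed reachable)
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?x WHERE { <http://dbpedia.org/resource/Inferno_(novel)> dbo:author ?x }
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["x"]["value"])  # expected: http://dbpedia.org/resource/Dan_Brown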
25. Research Questions
How to map natural language phrases into knowledge base entries for multiple languages? Which linguistic resources can be used?
Who is the author of Inferno? / Who wrote Inferno? / Who is the writer of Inferno?
-> Triple: dbr:Inferno_(novel) dbo:author dbr:Dan_Brown
-> SELECT ?x WHERE {dbr:Inferno_(novel) dbo:author ?x}
26. Research Questions
How to map natural language phrases into knowledge base entries for multiple languages? Which linguistic resources can be used?
Who is the author of Inferno? / Who wrote Inferno? / Who is the writer of Inferno?
-> Triple: dbr:Inferno_(novel) dbo:author dbr:Dan_Brown
-> SELECT ?x WHERE {dbr:Inferno_(novel) dbo:author ?x}
Lexical Gap: write -> dbo:author
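One common way to bridge such a lexical gap is to compare distributed word representations. The toy sketch below is my illustration only: the vectors are made up and the property-label lexicon is an assumption, not the thesis lexicon; in practice the vectors would come from word2vec/GloVe-style embeddings.

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# toy embeddings standing in for pre-trained word vectors
vectors = {
    "write":  np.array([0.9, 0.1, 0.3]),
    "author": np.array([0.8, 0.2, 0.4]),  # label of dbo:author
    "spouse": np.array([0.1, 0.9, 0.2]),  # label of dbo:spouse
}
candidates = {"dbo:author": "author", "dbo:spouse": "spouse"}

# rank candidate properties by similarity between the question verb and the label
scores = {uri: cosine(vectors["write"], vectors[label])
          for uri, label in candidates.items()}
print(max(scores, key=scores.get))  # dbo:author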
27. Research Questions
How to disambiguate URIs when multiple candidates are retrieved from mapping natural language tokens into knowledge base entries?
When was Inferno released?
Candidates: dbr:Inferno_(2016_film), dbr:Inferno_(novel)
SELECT ?x WHERE {dbr:Inferno_(novel) dbo:releaseDate ?x}
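To make the ambiguity concrete, the toy ranker below (my illustration; the similarity values, popularity counts, and weights are invented) scores the two candidate URIs. A popularity prior alone would favour the film, while the gold query above uses the novel, which is exactly why context-sensitive disambiguation is needed.

# (URI, string similarity of label to the mention "Inferno", popularity prior)
candidates = [
    ("dbr:Inferno_(novel)",     1.0, 1200),
    ("dbr:Inferno_(2016_film)", 1.0, 3400),
]

def score(similarity, popularity, w_sim=0.7, w_pop=0.3, max_pop=5000.0):
    # linear combination of label match and popularity; weights are invented
    return w_sim * similarity + w_pop * (popularity / max_pop)

for uri, sim, pop in sorted(candidates, key=lambda c: score(c[1], c[2]), reverse=True):
    print(uri, round(score(sim, pop), 3))
# prints the film first: popularity alone cannot pick the novel here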
28. Research Questions
How to use syntactic information of a natural language question together with semantic representations of entries in a knowledge base?
Who wrote Inferno?
Dependency parse: Who (PRON) <-nsubj- wrote (VERB) -dobj-> Inferno (PROPN)
Triple: dbr:Inferno_(novel) dbo:author dbr:Dan_Brown
SELECT ?x WHERE { dbr:Inferno_(novel) dbo:author ?x }
29. Research Questions
RQ4: What are the advantages and the disadvantages of a multilingual QA system vs. a monolingual system built for each language?
Who is the author of Inferno? (English)
Wer ist der Autor von Inferno? (German)
¿Quién es el autor de Inferno? (Spanish)
all map to: SELECT ?x WHERE { dbr:Inferno_(novel) dbo:author ?x } (answer: dbr:Dan_Brown)
30. Research Questions
RQ5: What effort is required to adapt our QA pipelines to another language?
Who is the author of Inferno? (English)
Qui est l'auteur de Inferno? (French)
Infernoning muallifi kim? (Uzbek)
all map to: SELECT ?x WHERE { dbr:Inferno_(novel) dbo:author ?x } (answer: dbr:Dan_Brown)
33. Preliminaries
• Logical Form: DUDES, a formalism for specifying meaning representations for dependency tree structures
• Semantic Composition: acquiring the meaning representation using the syntax of the question
35. Logical Form
• DUDES: Dependency-based Underspecified Discourse Representation Structures (Cimiano [1])
• a formalism for specifying meaning representations
• flexible semantic composition w.r.t. the order of application
• built on semantic dependencies, e.g. suitable for working with dependency-based syntactic analysis
[1] Cimiano, P. (2009). "Flexible semantic composition with DUDES". In: Proceedings of the Eighth International Conference on Computational Semantics, pp. 272-276. Association for Computational Linguistics.
36. DUDES
v : the main variable
vs : the projection variables
l : the label of the main DRS
drs : a DRS (Discourse Representation Structure)
slots : a set of semantic dependencies
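As a rough illustration of this data structure, here is a schematic Python sketch with deliberately simplified types; it is not the dissertation's implementation:

from dataclasses import dataclass, field

@dataclass
class DRS:
    # Simplified Discourse Representation Structure: variables plus conditions
    variables: set
    conditions: list          # e.g. [("dbo:author", "x", "y")]

@dataclass
class Dudes:
    # Schematic rendering of the DUDES tuple described above
    v: str                    # main variable
    vs: list                  # projection variables
    l: int                    # label of the main DRS
    drs: DRS                  # the main DRS
    slots: list = field(default_factory=list)   # semantic dependencies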
38. Semantic Composition with DUDES
Input: a natural language question and its dependency parse tree, e.g. "Who created Wikipedia?"
Output: a meaning representation grounded in a given domain, e.g. dbr:Wikipedia dbo:author ?x
39. Semantic Composition with DUDES
Each node gets a pair of assignments (DUDES type + knowledge base ID), provided by an oracle.
51. Dependency parse tree-based Semantic Parsing Approach
• multilingual semantic parsing approach: English, German & Spanish [1]
• uses language-independent dependency parse trees from Universal Dependencies
• combines different types of lexical information: DBpedia Ontology labels, the M-ATOLL [2] lexicon & word embeddings
[1] Hakimov S, Jebbara S, Cimiano P. "AMUSE: Multilingual Semantic Parsing for Question Answering over Linked Data". ISWC 2017
[2] Walter S, Unger C, Cimiano P. "M-ATOLL: A Framework for the Lexicalization of Ontologies in Multiple Languages". ISWC 2014
[3] Hakimov S, Walter S, Unger C, Cimiano P. "Applying semantic parsing to question answering over linked data: Addressing the lexical gap". NLDB 2015
55. Inference
• Metropolis-Hastings sampling: exploring a huge search space (ca. 10 million resources, 2000 properties)
• Linking to Knowledge Base (L2KB)
  • objective: compare the set of predicted URIs to the expected set of URIs
• Query Construction (QC)
  • objective: compare the constructed query to the expected query
Input: initial state
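As a rough sketch of the sampling loop (assuming a symmetric proposal function and strictly positive scores; propose and score are hypothetical placeholders for the state-mutation and scoring steps described on these slides):

import random

def metropolis_hastings(initial_state, propose, score, steps=1000):
    # Explore the state space: better states are always accepted,
    # worse ones with probability proportional to the score ratio
    state, best = initial_state, initial_state
    for _ in range(steps):
        candidate = propose(state)
        accept = min(1.0, score(candidate) / max(score(state), 1e-9))
        if random.random() < accept:
            state = candidate
            if score(state) > score(best):
                best = state
    return best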
57. L2KB Sampling
Explore the edges and assign knowledge base IDs based on the lemmas of the nodes
Check the triple pattern: ?x dbo:author dbr:Wikipedia (Slot 2) vs. dbr:Wikipedia dbo:author ?x (Slot 1)
Inverted index: ontology labels, the M-ATOLL lexicon & word embeddings
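A toy version of such an inverted index (the entries shown are hypothetical; in the approach it is populated from ontology labels, the M-ATOLL lexicon and embedding-based entries):

from collections import defaultdict

inverted_index = defaultdict(set)
inverted_index["author"].add("dbo:author")       # from an ontology label
inverted_index["write"].add("dbo:author")        # from a lexicon entry such as "written by"
inverted_index["wikipedia"].add("dbr:Wikipedia")

def candidates(lemma):
    # Return all KB IDs whose lexical entries match a node's lemma
    return inverted_index.get(lemma, set())

print(candidates("write"))   # {'dbo:author'}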
66. Evaluation
Dataset: Question Answering over Linked Data (QALD), 6th challenge
Languages: English, German, Spanish, Italian, French, Dutch, Romanian, Farsi
350 questions for training, 100 for testing
Unger, Christina, Axel-Cyrille Ngonga Ngomo, and Elena Cabrio (2016). "6th open challenge on question answering over linked data (QALD-6)". In: Semantic Web Evaluation Challenge.
68. Evaluation
DBP: lexicon from DBpedia Ontology labels & WordNet
M-ATOLL: lexicon induced by M-ATOLL (Walter et al. 2014)
Embed: lexicon added using pre-trained word embeddings (Mikolov et al. 2013)
Dict: manually defined lexicon
Walter, Sebastian, Christina Unger, and Philipp Cimiano. "M-ATOLL: A Framework for the Lexicalization of Ontologies in Multiple Languages". ISWC 2014
Mikolov, Tomas et al. "Distributed representations of words and phrases and their compositionality". NIPS 2013
72. Outline
• SimpleQuestions dataset: 74k samples, Freebase data
• Question: "Who wrote Mildred Pierced?"
• Fact: (mildred_pierced, book.written_work.author, stuart_kaminsky)
• Answer pattern: (mildred_pierced, book.written_work.author, ?x)
• Systematic comparison of different model architectures
Hakimov S, Jebbara S, Cimiano P. "Evaluating Architectural Choices for Deep Learning Approaches for Question Answering over Knowledge Bases". ICSC 2019
73. Named Entity Recognition
• Used by all models to predict the entity span
• Character & word embeddings
• Trained using weak supervision: an inference is counted as correct if the expected entity has been found
75. Architectures
Model 1: BiLSTM-Softmax, Model 2: BiLSTM-KB, Model 3: BiLSTM-Binary, Model 4: fastText [1]
[1] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov. "Bag of Tricks for Efficient Text Classification". 2016. arXiv.
82. Discussion
• Manual Effort
• Syntax and Semantics
• Multilinguality
• Cross-domain Transferability
• Training Data Size and Search Space
83. Discussion

CCG-based (Chapter 6)
  • Manual effort: CCG combination rules, manual lexicon
  • Syntax & semantics: learned in tandem; CCG for syntax, lambda calculus for semantics
  • Multilinguality: manual effort is required
  • Cross-domain transferability: manual effort is required
  • Training data & number of KB IDs: 600 training instances, 750 entities

Dependency-based (Chapter 7)
  • Manual effort: feature templates
  • Syntax & semantics: syntax is given; DUDES as the semantic formalism
  • Multilinguality: an adaptable solution
  • Cross-domain transferability: a dependency parser is required, e.g. for the biomedical domain
  • Training data & number of KB IDs: 300 training instances, <= 10 mil. entities, >= 2000 predicates

BiLSTM-Softmax (Chapter 8)
  • Manual effort: none
  • Syntax & semantics: word & character embeddings for lexical & contextual information; semantics is limited to a single predicate and a subject entity
  • Multilinguality: an adaptable solution (only word & character embeddings)
  • Cross-domain transferability: an adaptable solution (only word & character embeddings)
  • Training data & number of KB IDs: >= 75K instances, <= 2 mil. entities
85. Research Questions
RQ1: How to map natural language phrases into knowledge base entries for multiple languages? Which linguistic resources can be used?
• ontology lexicalisations, e.g. M-ATOLL (Walter et al. 2014)
• ontology labels, e.g. DBpedia labels
• dictionaries
• WordNet synsets
• lexicon induced from contextual embeddings of words
86. Research Questions
RQ2: How to disambiguate URIs when multiple candidates are retrieved from mapping natural language tokens into knowledge base entries?
• supervised models with a disambiguation objective
  • CCG-based model: uses lexical and syntactic information as features
  • Dependency tree-based model: syntactic dependencies between words, lexical similarity, ontology restrictions
  • Neural network-based model: ranking objective over predicates
87. Research Questions
RQ3: How to use syntactic information of a natural language question together with semantic representations of entries in a knowledge base?
• semantic parsing with bottom-up composition
  • CCG-based model: learns the syntax and semantics together
  • Dependency tree-based model: learns to compose semantics based on dependency trees
88. Research Questions
RQ4: What are the advantages and the disadvantages of a multilingual QA system vs. a monolingual system built for each language?
• Advantages
  • multilingual: broader coverage
  • monolingual: higher performance, e.g. Xser (Xu et al. 2014), 0.7 F1 on QALD-4
• Disadvantages
  • multilingual: lower performance, e.g. AMUSE, 0.3 F1 on QALD-6
  • monolingual: requires expertise, e.g. CCG rules, lexicon
89. Research Questions
RQ5: What effort is required to adapt our QA pipelines to another language?
• CCG-based model: grammar rules and a manually defined lexicon; language-specific
• Dependency parse tree-based model: a dependency parse tree generator and a lexicon
• Neural network-based model: depends on the training data
90. Conclusion
• Addressed the lexical gap for QA systems
• Incorporated ontology lexicalisations to reduce the lexical gap
• Used Universal Dependencies to build a language-independent QA pipeline
• Multilingual semantic parsing for Question Answering
• Evaluated different QA models under the same conditions
• Highlighted the importance of the building blocks of a pipeline for a fair comparison
92. GENLEX
Example sentence: Barack Obama is married to Michelle Obama
[1] Zettlemoyer, Luke S. and Michael Collins (2005). "Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars". In: 21st Conference on Uncertainty in Artificial Intelligence
[2] Hakimov, Sherzod et al. (2015). "Applying semantic parsing to question answering over linked data: Addressing the lexical gap". In: International Conference on Applications of Natural Language to Information Systems
96. Lexicon
During sampling, compute the cosine similarity between question words and the ontology labels of properties
Vectors of multi-word labels are summed, e.g. V(population) + V(total)
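A minimal sketch of this similarity computation (emb is assumed to map words to pre-trained vectors, e.g. word2vec; all names are illustrative):

import numpy as np

def phrase_vector(words, emb):
    # Multi-word labels are represented by summing their word vectors,
    # e.g. V(population) + V(total)
    return np.sum([emb[w] for w in words], axis=0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# score = cosine(phrase_vector(["population", "total"], emb), emb["inhabitants"])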
103. Semantic Composition
• recursively computing the meaning of each node from the meanings of its child nodes
• building the meaning representation bottom-up
ComposeSemantics rendered as Python (lexical_meaning and combine are placeholders for the formalism-specific lookup and composition, here DUDES):

def compose_semantics(node):
    # Terminal node (a word): return its atomic lexical meaning
    if not node.children:
        return lexical_meaning(node.word)
    # Recursively compute the meaning representation (MR) of each child subtree
    child_mrs = [compose_semantics(child) for child in node.children]
    # Combine the children's MRs into an MR for the overall parse tree
    return combine(node, child_mrs)
108. Model Representation
Observed variables: the dependency parse tree
Hidden variables: KB IDs, slots, DUDES types
• States can be ranked by
  • objective score: comparison to the ground truth
  • model score: computed using feature weights
• Training procedure
  • switch between the model & objective scores after every iteration
111. Model 1: BiLSTM-Softmax
• softmax layer predicts the predicates seen during training
• encoding layer: word & character embeddings
• BiLSTM: two LSTM layers (forward and backward)
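A compact PyTorch sketch of this architecture (sizes are illustrative and the character-embedding channel is omitted for brevity; this is not the exact configuration from the thesis):

import torch
import torch.nn as nn

class BiLSTMSoftmax(nn.Module):
    def __init__(self, vocab_size, num_predicates, emb_dim=100, hidden_dim=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim,
                              bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden_dim, num_predicates)

    def forward(self, token_ids):                # (batch, seq_len)
        x = self.embed(token_ids)                # (batch, seq_len, emb_dim)
        _, (h_n, _) = self.bilstm(x)             # final hidden states
        h = torch.cat([h_n[0], h_n[1]], dim=-1)  # concat forward & backward
        return self.out(h)                       # logits over predicates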
112. Model 2: BiLSTM-KB
• learns embeddings of the predicates in the KB
• encoding layer: word & character embeddings
• BiLSTM: two LSTM layers (forward and backward)
• output layer computes the cosine similarity to all predicate embeddings and chooses the closest
113. Model 3: BiLSTM-Binary
• encoding layer: encodes the input question with word & character embeddings
• encoding layer: encodes the input predicate with word & character embeddings
• output layer: binary decision, i.e. does the predicate match the question?
114. Model 4: fastText
• document classification tool developed by Facebook*
• uses word & character n-gram embeddings
• softmax layer predicts the expected predicate
* http://fasttext.cc
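Using the fastText library for this setup could look as follows (the training-file name and label format are assumptions following fastText's __label__ convention):

import fasttext

# Each line of the file: "__label__<predicate> <question>", e.g.
# "__label__book.written_work.author who wrote mildred pierced"
model = fasttext.train_supervised(input="questions.train")

labels, probs = model.predict("who wrote mildred pierced")
print(labels[0], probs[0])   # most likely predicate and its probability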
116. Generative vs. Discriminative Models
Generative models compute the joint probability distribution p(x, y)
• HMM: y_t depends on y_{t-1}, and x_t depends on y_t, i.e. the model describes how the output label y_t generates the input vector x_t
• factorization: p(x, y) = prod_t p(y_t | y_{t-1}) * p(x_t | y_t)
Discriminative models compute the conditional probability distribution p(y | x) directly
• CRF: does not make these independence assumptions and models how the feature vector x is assigned the label y_t
118. Manual Effort
• CCG-based model
  • define CCG grammar rules and a hand-crafted lexicon for domain-independent phrases
• Dependency parse tree-based model
  • feature functions
• Neural network-based model (BiLSTM-Softmax)
  • not required
119. Syntax and Semantics
• CCG-based model
  • syntax and semantics are learned in tandem
  • CCG for syntax and lambda calculus for semantics
  • syntax guides the semantics of the sentence
• Dependency parse tree-based model
  • syntax is given and the semantics is learned
  • DUDES as the semantic formalism; syntax is based on dependency trees from Universal Dependencies
• Neural network-based model (BiLSTM-Softmax)
  • syntactic information is learned implicitly, e.g. word and character embeddings provide contextual information
  • semantics is limited to a single subject and predicate, a simpler task
120. Multilinguality
• CCG-based model
  • CCG grammar rules need to be extended
• Dependency parse tree-based model
  • a multilingual solution
• Neural network-based model (BiLSTM-Softmax)
  • can be adapted to other languages, e.g. words & characters as features
121. Cross-domain Transferability
• CCG-based model
  • manual effort is required: CCG rules, lexicon
• Dependency parse tree-based model
  • requires dependency parse trees for the target domain, e.g. the biomedical domain
• Neural network-based model (BiLSTM-Softmax)
  • can be easily adapted
122. Training Data Size and Search Space
• CCG-based model
  • 600 training instances, 750 entities (see the Discussion table)
• Dependency parse tree-based model
  • 300 training instances, up to 10 million entities, 2000+ predicates (see the Discussion table)
• Neural network-based model (BiLSTM-Softmax)
  • heavily depends on the data: 75K+ instances