Encoding Linguistic Structures with Graph Convolutional Networks
1. Encoding Linguistic Structures with
Graph Convolutional Networks
Diego Marcheggiani
Joint work with Ivan Titov and Joost Bastings
University of Amsterdam
University of Edinburgh
@South England NLP Meetup
2. Structured (Linguistic) Priors
[figure: "Sequa makes and repairs jet engines." annotated with semantic roles (creator, creation, repairer, entity repaired) and syntactic dependencies (ROOT, SBJ, OBJ, COORD, CONJ, NMOD)]
“I voted for Palpatine because he was most aligned with my values,” she said.
2
4. Sequence to Sequence
} Language is not (only) a sequence of words
} We have linguistic knowledge
4
[Sutskever et al., 2014]
[figure: encoder-decoder reading "the black cat" and producing "le chat noir"]
5. Sequence to Sequence
} Language is not (only) a sequence of words
} We have linguistic knowledge
Encode structured linguistic knowledge into NN using
Graph Convolutional Networks
5
[figure: encoder-decoder reading "the black cat" and producing "le chat noir"]
6. Outline
} Semantic Role Labeling
} Graph Convolutional Networks (GCN)
} Syntactic GCN for Semantic Role Labeling (SRL)
} SRL Model
} Exploiting Semantics in Neural Machine Translation with GCNs
Encoding Sentences with Graph Convolutional Networks for Semantic Role Labeling
Diego Marcheggiani, Ivan Titov. In Proceedings of EMNLP, 2017.
Exploiting Semantics in Neural Machine Translation with Graph Convolutional Networks
Diego Marcheggiani, Joost Bastings, Ivan Titov. In Proceedings of NAACL-HLT, 2018.
6
8. Semantic Role Labeling
} Predicting the predicate-argument structure of a sentence
} Discover and disambiguate predicates
8
[figure: "Sequa makes and repairs jet engines." with predicates disambiguated as make.01 and repair.01]
11. Semantic Role Labeling
} Predicting the predicate-argument structure of a sentence
} Discover and disambiguate predicates
} Identify arguments and label them with their semantic roles
[figure: "Sequa makes and repairs jet engines." with predicates make.01 and repair.01 and roles Creator, Creation, Repairer, Entity repaired]
11
12. Semantic Role Labeling
} Only the head of an argument is labeled
} Sequence labeling task for each predicate
} Focus on argument identification and labeling
12
[figure: "Sequa makes and repairs jet engines." with predicates make.01 and repair.01 and roles Creator, Creation, Repairer, Entity repaired]
13. Semantic Role Labeling
} Question answering: Narayanan and Harabagiu 2004; Shen and Lapata 2007; Khashabi et al. 2018
} Machine translation: Wu and Fung 2009; Aziz et al. 2011
} Information extraction: Surdeanu et al. 2003; Christensen et al. 2010
13
15. Related work
} SRL systems that use syntax with simple NN architectures
} [FitzGerald et al., 2015]
} [Roth and Lapata, 2016]
} Recent models ignore linguistic bias
} [Zhou and Xu, 2014]
} [He et al., 2017]
} [Marcheggiani et al., 2017]
15
Tutorial on Semantic Role
Labeling at EMNLP 2017
17. Motivations
} Some semantic dependencies are mirrored in the syntactic graph
} Not all of them – syntax-semantics interface is not trivial
[figure: "Sequa makes and repairs jet engines." with semantic roles (creator, creation, repairer, entity repaired) and syntactic dependencies (ROOT, SBJ, OBJ, COORD, CONJ, NMOD)]
17
18. Outline
} Semantic Role Labeling
} Graph Convolutional Networks (GCN)
} Syntactic GCN for Semantic Role Labeling (SRL)
} SRL Model
} Exploiting Semantics in Neural Machine Translation with GCNs
18
20. Graph Convolutional Networks (message passing)
[figure: undirected graph; update of the blue node by aggregating messages from its neighbors]
[Gori et al., 2005; Scarselli et al., 2009; Kipf and Welling, 2016]
20
21. Graph Convolutional Networks (message passing)
[figure: undirected graph; update of the blue node]
h_i = ReLU( W_0 h_i + Σ_{j ∈ N(i)} W_1 h_j )
where W_0 h_i is the self-loop term and the sum aggregates messages from the neighborhood N(i)
[Kipf and Welling, 2016]
21
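As a concrete illustration, the node update above can be written in a few lines of NumPy. This is our own minimal sketch (the names `gcn_layer`, `adj`, `W0`, `W1` are ours, not from the slides), not a reference implementation:

```python
import numpy as np

def gcn_layer(H, adj, W0, W1):
    """One GCN layer: h_i = ReLU(W0 h_i + sum_{j in N(i)} W1 h_j).

    H:   (n_nodes, d_in) node features
    adj: (n_nodes, n_nodes) symmetric 0/1 adjacency (no self loops)
    W0:  (d_in, d_out) self-loop weights
    W1:  (d_in, d_out) neighbor weights
    """
    self_msg = H @ W0            # W0 h_i for every node at once
    neigh_msg = adj @ (H @ W1)   # row v sums W1 h_j over neighbors j of v
    return np.maximum(0.0, self_msg + neigh_msg)  # ReLU

# toy 3-node path graph: 0 - 1 - 2
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
H = np.eye(3)                    # one-hot initial features
rng = np.random.default_rng(0)
W0, W1 = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
out = gcn_layer(H, adj, W0, W1)
print(out.shape)                 # (3, 4)
```

Stacking several such layers (feeding `out` back in as `H`) lets information flow between nodes that are several hops apart.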
23. GCNs Pipeline
Input X = H^(0) → hidden layers H^(1), H^(2), … → output Z = H^(n)
} X = H^(0): initial feature representation of the nodes
} Z = H^(n): representation informed by each node's neighborhood
Extend GCNs for syntactic dependency trees
[Kipf and Welling, 2016]
23
24. Outline
} Semantic Role Labeling
} Graph Convolutional Networks (GCN)
} Syntactic GCN for Semantic Role Labeling (SRL)
} SRL Model
} Exploiting Semantics in Neural Machine Translation with GCNs
24
36. Syntactic GCNs
} Syntactic neighborhood; the self-loop is included in N(v)
} Messages are direction- and label-specific:
h_v^(k+1) = ReLU( Σ_{u ∈ N(v)} W^(k)_{L(u,v)} h_u^(k) + b^(k)_{L(u,v)} )
} Overparametrized: one matrix for each label-direction pair, so the matrices are factored to depend only on edge direction, while the biases stay label-specific:
W^(k)_{L(u,v)} = V^(k)_{dir(u,v)}
[Marcheggiani and Titov, 2017]
36
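A minimal NumPy sketch of this layer (our illustration; the variable names and the toy dependency arcs are assumptions, not the authors' code). Each arc contributes a message through a direction-specific matrix plus a label-specific bias, and the self-loop is treated as just another arc:

```python
import numpy as np

def syntactic_gcn_layer(H, edges, V, b):
    """One syntactic GCN layer (sketch of Marcheggiani and Titov, 2017).

    H:     (n, d) node states h_u^(k)
    edges: (u, v, label, direction) arcs; self-loops are included
    V:     direction -> (d, d) matrix, i.e. W^(k)_{L(u,v)} = V^(k)_{dir(u,v)}
    b:     label -> (d,) bias, one vector per dependency label
    """
    out = np.zeros_like(H)
    for u, v, label, direction in edges:
        out[v] += H[u] @ V[direction] + b[label]  # message from u to v
    return np.maximum(0.0, out)  # ReLU

# toy graph for "Lane disputed estimates": disputed (index 1) heads both nouns
d = 4
rng = np.random.default_rng(0)
V = {k: rng.normal(size=(d, d)) for k in ("in", "out", "self")}
b = {k: rng.normal(size=d) for k in ("SBJ", "OBJ", "self")}
edges = [(1, 0, "SBJ", "out"), (0, 1, "SBJ", "in"),   # disputed -SBJ-> Lane
         (1, 2, "OBJ", "out"), (2, 1, "OBJ", "in"),   # disputed -OBJ-> estimates
         (0, 0, "self", "self"), (1, 1, "self", "self"), (2, 2, "self", "self")]
H = rng.normal(size=(3, d))
H1 = syntactic_gcn_layer(H, edges, V, b)
print(H1.shape)  # (3, 4)
```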
40. Edge-wise Gates
} Not all edges are equally important for the final task
} We should not blindly rely on predicted syntax
} Gates decide the “importance” of each message
} Gates depend on nodes and edges
[figure: gated GCN over "Lane disputed those estimates" with dependency arcs SBJ, OBJ, NMOD; each message passes through a scalar gate g before the ReLU]
[Marcheggiani and Titov, 2017]
40
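The gating mechanism can be sketched as follows (our illustration: the scalar-gate form g = σ(h_u · v_dir + b_label) follows Marcheggiani and Titov (2017), but all names here are ours). A gate near 0 effectively prunes an unreliable predicted arc:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_gcn_layer(H, edges, V, b, v_gate, b_gate):
    """Syntactic GCN layer with scalar edge-wise gates (sketch).

    Each message V[direction] h_u + b[label] is scaled by
    g = sigmoid(h_u . v_gate[direction] + b_gate[label]) in (0, 1),
    so the model can down-weight messages along dubious arcs.
    """
    out = np.zeros_like(H)
    for u, v, label, direction in edges:
        g = sigmoid(H[u] @ v_gate[direction] + b_gate[label])  # scalar gate
        out[v] += g * (H[u] @ V[direction] + b[label])
    return np.maximum(0.0, out)

# same toy arcs as for the ungated layer: "Lane disputed estimates"
d = 4
rng = np.random.default_rng(0)
V = {k: rng.normal(size=(d, d)) for k in ("in", "out", "self")}
b = {k: rng.normal(size=d) for k in ("SBJ", "OBJ", "self")}
v_gate = {k: rng.normal(size=d) for k in ("in", "out", "self")}
b_gate = {k: rng.normal() for k in ("SBJ", "OBJ", "self")}
edges = [(1, 0, "SBJ", "out"), (0, 1, "SBJ", "in"),
         (1, 2, "OBJ", "out"), (2, 1, "OBJ", "in"),
         (0, 0, "self", "self"), (1, 1, "self", "self"), (2, 2, "self", "self")]
H = rng.normal(size=(3, d))
H1 = gated_gcn_layer(H, edges, V, b, v_gate, b_gate)
print(H1.shape)  # (3, 4)
```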
41. Outline
} Semantic Role Labeling
} Graph Convolutional Networks (GCN)
} Syntactic GCN for Semantic Role Labeling (SRL)
} SRL Model
} Exploiting Semantics in Neural Machine Translation with GCNs
41
42. Our Model
} Word representation
} Bidirectional LSTM encoder
} GCN Encoder
} Local role classifier
[Marcheggiani and Titov, 2017]
42
43. Word Representation
} Pretrained word embeddings
} Word embeddings
} POS tag embeddings
} Predicate lemma embeddings
[figure: word representations of "Lane disputed those estimates"]
[Marcheggiani and Titov, 2017]
43
44. BiLSTM Encoder
} Encode each word with its left and right context
} Stacked BiLSTM
[figure: J-layer BiLSTM over word representations of "Lane disputed those estimates"]
[Marcheggiani and Titov, 2017]
44
45. GCN Encoder
} Syntactic GCNs after the BiLSTM encoder
} Add syntactic information
} Skip connections
} Longer dependencies are captured
[figure: K GCN layers (arcs nsubj, dobj, nmod) stacked on a J-layer BiLSTM over word representations of "Lane disputed those estimates"]
[Marcheggiani and Titov, 2017]
45
46. Semantic Role Classifier
} Local log-linear classifier over the predicate representation t_p and the candidate-argument representation t_i:
p(r | t_i, t_p, l) ∝ exp( W_{l,r} (t_i ∘ t_p) )
[figure: classifier predicting role A1 for "estimates" given predicate "disputed", on top of the BiLSTM+GCN encoder]
46
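A NumPy sketch of this classifier (our illustration; we read ∘ as concatenation of the two representations, and all names are assumptions). The scores for all roles r, given the lemma-specific weights W_l, are normalized with a softmax:

```python
import numpy as np

def role_probabilities(t_i, t_p, W_l):
    """Local log-linear role classifier (sketch).

    t_i: (d,) candidate-argument representation
    t_p: (d,) predicate representation
    W_l: (n_roles, 2d) role weights for the predicate lemma l
    Returns p(r | t_i, t_p, l) as a softmax over roles.
    """
    scores = W_l @ np.concatenate([t_i, t_p])  # one score per role
    scores -= scores.max()                     # numerical stability
    p = np.exp(scores)
    return p / p.sum()

rng = np.random.default_rng(0)
d, n_roles = 4, 5
p = role_probabilities(rng.normal(size=d), rng.normal(size=d),
                       rng.normal(size=(n_roles, 2 * d)))
print(p.shape, round(p.sum(), 6))  # (5,) 1.0
```

Being local, the classifier scores each candidate argument independently; there is no joint decoding over the whole role sequence.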
47. Experiments
} Data
} CoNLL-2009 dataset - English and Chinese
} F1 evaluation measure
} Model
} Hyperparameters tuned on English development set
} State-of-the-art predicate disambiguation models
[Marcheggiani and Titov, 2017]
47
55. Syntactic Graph Convolutional Networks
} Fast and simple
} Can be seamlessly applied to other tasks
} Improvements on English-to-German and English-to-Czech translation
Graph Convolutional Encoders for Syntax-aware Machine Translation
Joost Bastings, Ivan Titov, Wilker Aziz, Diego Marcheggiani, Khalil Sima'an. In Proceedings of EMNLP, 2017.
55
56. Multi-document Question Answering
56
[De Cao et al., 2018]
• Nodes are entities and edges are co-reference links
• Inference on a graph representing the document collection
61. Outline
} Semantic Role Labeling
} Graph Convolutional Networks (GCN)
} Syntactic GCN for Semantic Role Labeling (SRL)
} SRL Model
} Exploiting Semantics in Neural Machine Translation with GCNs
61
63. Motivations
SRL helps to generalize over different surface realizations
of the same underlying “meaning”.
[Marcheggiani et al., 2018]
63
John gave his wonderful wife a nice present .
Giver
Thing given
Entity given to
John gave a nice present to his wonderful wife .
Giver
Entity given to
Thing given
66. Related work
} Semantics in statistical MT
} [Wu and Fung, 2009]
} [Liu and Gildea, 2010]
} [Aziz et al., 2011]
} ...
} Syntax in neural MT
} [Sennrich and Haddow, 2016]
} [Aharoni and Goldberg, 2017]
} [Bastings et al., 2017]
} …
} Semantics in neural MT
} ???
[Marcheggiani et al., 2018]
66
67. Predicate-argument encoding
[figure: semantic GCN over "John gave his wonderful wife a nice present": role edges W_A0 (Giver), W_A1 (Thing given), W_A2 (Entity given to) from the predicate to the argument heads, inverse edges W_A0', W_A1', W_A2', and a W_self self-loop at every word]
67
68. Our Model
} Standard sequence-to-sequence model with attention
} Semantic GCN encoder on top of a bidirectional RNN
} RNN decoder
[Marcheggiani et al., 2018]
68
69. Our model
[figure: BiRNN/CNN encoder over "John gave his wonderful wife a nice present", a semantic GCN layer on top (role edges W_A0, W_A1, W_A2 for Giver, Thing given, Entity given to, their inverses W_A0', W_A1', W_A2', and W_self self-loops), an attention mechanism, and an RNN decoder producing the translation]
[Marcheggiani et al., 2018]
69
71. Experiments
} Data
} WMT‘16 English-German dataset (~4.5 million sentence pairs)
} BLEU as evaluation measure
} Model
} Hyperparameters tuned on News Commentary En-De (~226K sentence pairs)
} GRU as RNN
[Marcheggiani et al., 2018]
71
79. Analysis
SOURCE: John sold the car to Mark . (roles: Seller, Thing sold, Buyer)
BiRNN: John verkaufte das Auto nach Mark .
SEM GCN: John verkaufte das Auto an Mark .
BiRNN mistranslates “to” as “nach” (directionality)
SOURCE: The boy walking down the dusty road is drinking a beer (roles: Walker, AM-DIR; Drinker, Liquid)
BiRNN: Der Junge zu Fuß die staubige Straße ist ein Bier trinken .
SEM GCN: Der Junge , der die staubige Straße hinunter geht , trinkt ein Bier .
[Marcheggiani et al., 2018]
79
82. Analysis
SOURCE: The boy sitting on a bench in the park plays chess . (roles: Thing sitting, AM-LOC Location; Player, Game)
BiRNN: Der Junge auf einer Bank im Park spielt Schach .
SEM GCN: Der Junge sitzt auf einer Bank im Park Schach .
Both translations are wrong, but the BiRNN’s one is grammatically correct
[Marcheggiani et al., 2018]
82
85. Conclusion
} GCNs for encoding linguistic structures into NNs
} Semantics, coreference, discourse
} Fast
} Cheap
} State-of-the-art model for dependency-based SRL
} First to exploit semantics in NMT
85