Should Neural Network Architecture Reflect Linguistic Structure?
CoNLL, August 3, 2017
Adhi Kuncoro (Oxford)
Phil Blunsom (DeepMind)
Dani Yogatama (DeepMind)
Chris Dyer (DeepMind/CMU)
Miguel Ballesteros (IBM)
Wang Ling (DeepMind)
Noah A. Smith (UW)
Ed Grefenstette (DeepMind)
Should Neural Network Architecture Reflect Linguistic Structure?
Yes.
But why? And how?
Modeling language
Sentences are hierarchical
(1) a. The talk I gave did not appeal to anybody.
    b. *The talk I gave appealed to anybody.
(anybody is a negative polarity item, NPI)
Generalization hypothesis: not must come before anybody
(2) *The talk I did not give appealed to anybody.
Examples adapted from Everaert et al. (TICS 2015)
Language is hierarchical
Generalization: not must "structurally precede" anybody
From Everaert et al. (TICS 2015): the constituent dominating not must also dominate the constituent containing anybody. (This structural configuration is called c(onstituent)-command in the linguistics literature [31].) When the relationship between not and anybody adheres to this structural configuration, the sentence is well-formed. In the ungrammatical sentence (3), by contrast, not sequentially precedes anybody, but the triangle dominating not in Figure 1B fails to also dominate the structure containing anybody. Consequently, the sentence is not well-formed.
[Figure 1. Negative Polarity. (A) Negative polarity licensed: the negative element c-commands the negative polarity item ("The book that I bought did not appeal to anybody"). (B) Negative polarity not licensed: the negative element does not c-command the negative polarity item ("The book that I did not buy appealed to anybody").]
[Tree sketches on the slide: "The talk I gave" with "did not appeal to anybody" (licensed) vs. "The talk I did not give" with "appealed to anybody" (not licensed).]
- the psychological reality of structural sensitivity is not empirically controversial
- there are many different theories of the details of structure
- more controversial hypothesis: kids learn language easily because they don't consider many "obvious" structure-insensitive hypotheses
Recurrent neural networks
Good models of language?
• Recurrent neural networks are incredibly powerful models of sequences (e.g., of words)
• In fact, RNNs are Turing complete! (Siegelmann, 1995)
• But do they make good generalizations from finite samples of data?
• What inductive biases do they have?
• Part 1: Using syntactic annotations to improve inductive bias.
• Part 2: Inferring better compositional structure without explicit annotation.
Recurrent neural networks
Inductive bias
• Understanding the biases of neural networks is tricky
• We have enough trouble understanding the representations they learn in specific cases, much less general cases!
• But there is lots of evidence that RNNs have a bias for sequential recency
• Evidence 1: Gradients become attenuated across time
  • Functional analysis; experiments with synthetic datasets (yes, LSTMs/GRUs help, but they also forget)
• Evidence 2: Training regimes like reversing sequences in seq2seq learning
• Evidence 3: Modeling enhancements to use attention (direct connections back to remote time steps)
• Evidence 4: Linzen et al. (2017) findings on English subject-verb agreement:
The keys to the cabinet in the closet is/are on the table.
AGREE: the verb must agree with the structurally related subject head "keys", not the sequentially closer noun "closet".
Linzen, Dupoux, Goldberg (2017)
Chomsky (crudely paraphrasing 60 years of work):
Sequential recency is not the right bias for modeling human language.
An alternative
Recurrent Neural Net Grammars (Adhi Kuncoro)
• Generate symbols sequentially using an RNN
• Add some control symbols to rewrite the history occasionally
• Occasionally compress a sequence into a constituent
• The RNN predicts the next terminal/control symbol based on the history of compressed elements and non-compressed terminals
• This is a top-down, left-to-right generation of a tree + sequence
Example derivation
The hungry cat meows loudly

stack                                     action         probability
(empty)                                   NT(S)          p(nt(S) | top)
(S                                        NT(NP)         p(nt(NP) | (S)
(S (NP                                    GEN(The)       p(gen(The) | (S, (NP)
(S (NP The                                GEN(hungry)    p(gen(hungry) | (S, (NP, The)
(S (NP The hungry                         GEN(cat)       p(gen(cat) | ...)
(S (NP The hungry cat                     REDUCE         p(reduce | ...)
(S (NP The hungry cat)                    NT(VP)         p(nt(VP) | (S, (NP The hungry cat))
(S (NP The hungry cat) (VP                GEN(meows)     p(gen(meows) | ...)
(S (NP The hungry cat) (VP meows          REDUCE         p(reduce | ...)
(S (NP The hungry cat) (VP meows)         GEN(.)         p(gen(.) | ...)
(S (NP The hungry cat) (VP meows) .       REDUCE         p(reduce | ...)
(S (NP The hungry cat) (VP meows) .)

REDUCE compresses the just-completed constituent (e.g. "The hungry cat") into a single composite symbol on the stack.
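To make the action semantics concrete, here is a minimal Python sketch (not the authors' implementation; all names are illustrative) that replays the NT/GEN/REDUCE sequence above with an explicit stack and recovers the final bracketing:

```python
# Minimal sketch (not the RNNG implementation): replay the NT/GEN/REDUCE
# actions from the derivation above with an explicit stack and rebuild the
# bracketed tree.

def is_open(sym):
    # an open nonterminal looks like "(NP"; a closed constituent ends with ")"
    return sym.startswith("(") and not sym.endswith(")")

def replay(actions):
    stack = []
    for act in actions:
        if act[0] == "NT":                    # open a constituent: NT(S) pushes "(S"
            stack.append("(" + act[1])
        elif act[0] == "GEN":                 # generate a terminal word
            stack.append(act[1])
        else:                                 # REDUCE: pop the children back to the
            children = []                     # open nonterminal and compress them
            while not is_open(stack[-1]):     # into one closed, composite symbol
                children.append(stack.pop())
            opener = stack.pop()
            stack.append(opener + " " + " ".join(reversed(children)) + ")")
    return stack

actions = [("NT", "S"), ("NT", "NP"), ("GEN", "The"), ("GEN", "hungry"),
           ("GEN", "cat"), ("REDUCE",), ("NT", "VP"), ("GEN", "meows"),
           ("REDUCE",), ("GEN", "."), ("REDUCE",)]
print(replay(actions))   # ['(S (NP The hungry cat) (VP meows) .)']
```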
Some things you can show
• Valid (tree, string) pairs are in bijection with valid sequences of actions (specifically, the left-to-right, depth-first traversal of the tree).    (prop 1)
• Every stack configuration perfectly encodes the complete history of actions.    (prop 2)
• Therefore, the probability decomposition is justified by the chain rule, i.e.

p(x, y) = p(actions(x, y))                              (prop 1)
p(actions(x, y)) = ∏_i p(a_i | a_<i)                    (chain rule)
                 = ∏_i p(a_i | stack(a_<i))             (prop 2)
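One direction of the bijection in prop 1 is easy to see: the action sequence is just the left-to-right, depth-first traversal of the bracketed tree. A minimal sketch, assuming the tree comes tokenized with separate closing brackets:

```python
# Sketch: reading off the action sequence as the DFS, left-to-right traversal
# of a bracketed tree (the inverse of the replay sketch above).

def tree_to_actions(tokens):
    """tokens: a tokenized bracketing, e.g.
    ['(S', '(NP', 'The', 'hungry', 'cat', ')', '(VP', 'meows', ')', '.', ')']"""
    actions = []
    for tok in tokens:
        if tok.startswith("("):
            actions.append("NT(%s)" % tok[1:])   # opening bracket -> NT
        elif tok == ")":
            actions.append("REDUCE")             # closing bracket -> REDUCE
        else:
            actions.append("GEN(%s)" % tok)      # terminal -> GEN
    return actions

print(tree_to_actions("(S (NP The hungry cat ) (VP meows ) . )".split()))
# ['NT(S)', 'NT(NP)', 'GEN(The)', 'GEN(hungry)', 'GEN(cat)', 'REDUCE',
#  'NT(VP)', 'GEN(meows)', 'REDUCE', 'GEN(.)', 'REDUCE']
```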
Modeling the next action
p(a_i | (S (NP The hungry cat) (VP meows )
1. Unbounded depth → recurrent neural nets
   [the stack prefix is read by an RNN whose states h1, h2, h3, h4, ... summarize it]
2. Arbitrarily complex trees → recursive neural nets
Syntactic composition
Need a representation for: (NP The hungry cat)
[Figure: a composition network reads the nonterminal label NP together with the constituent's elements ( The hungry cat ) and emits a single vector for the whole phrase. What head type? The nonterminal label (NP).]
Recursion: composed phrases can themselves be elements of larger phrases, e.g. (NP The (ADJP very hungry) cat), where (ADJP very hungry) is first composed into a vector v and that vector is then used when composing the NP. A sketch of such a composition function follows.
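A hedged sketch of one way such a composition function can be implemented: a bidirectional LSTM reads the nonterminal label and the children and is projected back to a single vector. The dimensions, the projection, and the PyTorch choices are illustrative, not the exact architecture from the paper.

```python
import torch
import torch.nn as nn

class Composer(nn.Module):
    """Illustrative syntactic-composition sketch: a bidirectional LSTM reads
    [nonterminal label; children...] and is squashed back into one vector that
    stands for the whole constituent, e.g. the NP for "(NP The hungry cat)"."""
    def __init__(self, dim):
        super().__init__()
        self.birnn = nn.LSTM(dim, dim, bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, label_vec, child_vecs):
        # sequence = label embedding followed by the children, left to right
        seq = torch.stack([label_vec] + child_vecs).unsqueeze(0)   # (1, n, dim)
        states, _ = self.birnn(seq)
        half = states.size(-1) // 2
        fwd_last = states[0, -1, :half]     # forward LSTM after reading everything
        bwd_first = states[0, 0, half:]     # backward LSTM after reading everything
        return torch.tanh(self.proj(torch.cat([fwd_last, bwd_first])))

dim = 8
compose = Composer(dim)
NP, the, hungry, cat = (torch.randn(dim) for _ in range(4))
np_vec = compose(NP, [the, hungry, cat])   # one vector for (NP The hungry cat)
# Recursion: a composed vector can itself be a child, e.g. for
# (NP The (ADJP very hungry) cat) compose the ADJP first, then reuse its vector.
```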
Modeling the next action
p(a_i | (S (NP The hungry cat) (VP meows )
1. Unbounded depth → recurrent neural nets
2. Arbitrarily complex trees → recursive neural nets
⇐ REDUCE: after the reduce, the next action is predicted from the shorter, partially compressed stack:
p(a_{i+1} | (S (NP The hungry cat) (VP meows) )
3. Limited updates to state → stack RNNs (a minimal sketch of a stack RNN follows)
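A minimal sketch of the stack-RNN idea, assuming a PyTorch LSTM cell: recurrent states live on a stack, PUSH advances the RNN by one step, and POP simply discards the top state, so nothing below the top is ever recomputed or overwritten.

```python
import torch
import torch.nn as nn

class StackRNN:
    """Illustrative stack RNN: the recurrent state is a stack of (h, c) pairs.
    push() runs the cell one step from the current top; pop() reverts to the
    state the stack had before the popped element was pushed."""
    def __init__(self, dim):
        self.cell = nn.LSTMCell(dim, dim)
        zero = torch.zeros(1, dim)
        self.states = [(zero, zero)]            # state of the empty stack

    def push(self, x):                          # x: (1, dim) embedding
        self.states.append(self.cell(x, self.states[-1]))

    def pop(self):
        return self.states.pop()

    def top(self):                              # summary of the whole stack
        return self.states[-1][0]

dim = 8
stack = StackRNN(dim)
for _ in range(3):                              # e.g. push "(NP", "The", "hungry"
    stack.push(torch.randn(1, dim))
stack.pop()                                     # REDUCE pops items ...
# ... and would push back one composed vector; the states for everything
# below the popped items are exactly what they were before the pushes.
```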
RNNGs
Inductive bias?
• If we accept the following two propositions…
  • Sequential RNNs have a recency bias
  • Syntactic composition learns to represent trees by endocentric heads
• …then we can say that RNNGs have a bias for syntactic recency rather than sequential recency.

RNNG stack when the verb is predicted: (S (NP The keys (PP to the cabinet (PP in the closet))) (VP is/are — the subject NP has been composed into a single symbol headed by "keys", so it is the most recent relevant element.
Sequential RNN: The keys to the cabinet in the closet is/are — the most recent noun is "closet".
RNNGs
A few empirical results
• Generative models are great! We have defined a joint distribution p(x, y) over strings (x) and parse trees (y).
• We can ask two questions:
  • What is p(x) for a given x? [language modeling]
  • What is max_y p(y | x) for a given x? [parsing]
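Evaluating p(x) requires summing over all trees for x. The "(IS)" in the language-modeling table below refers to an importance-sampling estimate of this sum; the following is a generic sketch of such an estimator, assuming a tractable proposal q(y | x) such as a discriminative parser, not the paper's exact recipe:

p(x) = Σ_y p(x, y) = E_{q(y|x)} [ p(x, y) / q(y | x) ]
     ≈ (1/N) Σ_{k=1..N} p(x, y^(k)) / q(y^(k) | x),   y^(k) ~ q(· | x)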
English PTB (Parsing)

Model                                        Type                    F1
Petrov and Klein (2007)                      Gen                     90.1
Shindo et al. (2012), single model           Gen                     91.1
Vinyals et al. (2015), PTB only              Disc                    90.5
Shindo et al. (2012), ensemble               Gen                     92.4
Vinyals et al. (2015), semisupervised        Disc + SemiSup          92.8
Discriminative RNNG, PTB only                Disc                    91.7
Generative RNNG, PTB only                    Gen                     93.6
Choe and Charniak (2016), semisupervised     Gen + SemiSup           93.8
Fried et al. (2017)                          Gen + Semi + Ensemble   94.7

English PTB (LM)

Model                   Perplexity
5-gram IKN              169.3
LSTM + Dropout          113.4
Generative RNNG (IS)    102.4
Summary
• RNNGs are effective for modeling language and parsing
• Finding 1: Generative parser > discriminative parser
  • Better sample complexity
  • No label bias in modeling the action sequence
• Finding 2: RNNGs > RNNs
  • PTB phrase structure is a useful inductive bias for making good linguistic generalizations!
• Open question: Do RNNGs solve the Linzen, Dupoux, and Goldberg problem?
Recurrent neural networks
Good models of language?
• Recurrent neural networks are incredibly powerful models of sequences (e.g., of words)
• In fact, RNNs are Turing complete! (Siegelmann, 1995)
• But do they make good generalizations from finite samples of data?
• What inductive biases do they have?
• Part 1: Using syntactic annotations to improve inductive bias.
• Part 2: Inferring better compositional structure without explicit annotation.
Unsupervised structure
Do we need syntactic annotation? (Dani Yogatama)
• We have compared two end points: a sequence model and a PTB-based syntactic model
• If we search for structure to obtain the best performance on downstream tasks, what will it find?
• In this part of the talk, I focus on representation learning for sentences.
Background
• Advances in deep learning have led to three predominant approaches for constructing representations of sentences:
  • Convolutional neural networks (Kim, 2014; Kalchbrenner et al., 2014; Ma et al., 2015)
  • Recurrent neural networks (Cho et al., 2014; Sutskever et al., 2014; Bahdanau et al., 2014)
  • Recursive neural networks (Socher et al., 2011; Socher et al., 2013; Tai et al., 2015; Bowman et al., 2016)
Recurrent Encoder
[Figure: the word embeddings of "A boy drags his sleds through the snow" (input) are read left to right by an RNN, and the final state produces the output.]
L = log p(y | x)
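For concreteness, a minimal sketch of the recurrent encoder in the figure (PyTorch; layer sizes and the classifier head are illustrative assumptions):

```python
import torch
import torch.nn as nn

class RecurrentEncoder(nn.Module):
    """Illustrative recurrent encoder: embed the words, run an LSTM left to
    right, and use the final state to predict the label, i.e. log p(y | x)."""
    def __init__(self, vocab_size, dim, n_classes):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, n_classes)

    def forward(self, word_ids):                      # (batch, seq_len)
        _, (h, _) = self.lstm(self.embed(word_ids))
        return torch.log_softmax(self.out(h[-1]), dim=-1)   # log p(y | x)

enc = RecurrentEncoder(vocab_size=1000, dim=16, n_classes=2)
x = torch.randint(0, 1000, (1, 8))                    # "A boy drags his sleds ..."
loss = -enc(x)[0, 1]                                  # L = -log p(y | x) for y = 1
```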
Recursive Encoder
[Figure: the word embeddings of "A boy drags his sleds through the snow" (input) are composed bottom-up along a tree, and the root representation produces the output.]
• Prior work on tree-structured neural networks assumed that the trees are either provided as input or predicted based on explicit human annotations (Socher et al., 2013; Bowman et al., 2016; Dyer et al., 2016)
• When children learn a new language, they are not given parse trees
• Instead, infer the tree structure as a latent variable that is marginalized out during learning of a downstream task that depends on the composed representation:

L = log p(y | x) = log Σ_{z ∈ Z(x)} p(y, z | x)

• Hierarchical structures can also be encoded into a word embedding model to improve performance and interpretability (Yogatama et al., 2015)
Model
• Shift-reduce parsing (Aho and Ullman, 1972), illustrated on "A boy drags his sleds":
  • The sentence starts in a queue; the stack starts empty.
  • SHIFT moves the next word from the front of the queue onto the stack.
  • REDUCE pops the top two elements of the stack, composes them with a Tree LSTM (Tai et al., 2015; Zhu et al., 2015), and pushes the composed node back onto the stack.

Example: SHIFT A; SHIFT boy; REDUCE → Tree-LSTM(a, boy); SHIFT drags; ...; after the final REDUCE the whole sentence is a single vector:
Tree-LSTM(Tree-LSTM(a, boy), Tree-LSTM(drags, Tree-LSTM(his, sleds)))

Tree LSTM composition of the top two stack elements (h_i, c_i) and (h_j, c_j):

i   = σ(W_I [h_i, h_j] + b_I)
o   = σ(W_O [h_i, h_j] + b_O)
f_L = σ(W_{F_L} [h_i, h_j] + b_{F_L})
f_R = σ(W_{F_R} [h_i, h_j] + b_{F_R})
g   = tanh(W_G [h_i, h_j] + b_G)
c   = f_L ⊙ c_i + f_R ⊙ c_j + i ⊙ g
h   = o ⊙ c

A sketch of this composition in code follows.
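A hedged PyTorch sketch of the REDUCE composition above; folding the five affine maps into a single linear layer is an implementation convenience, equivalent to the separate W_I, W_O, W_{F_L}, W_{F_R}, W_G in the equations:

```python
import torch
import torch.nn as nn

class BinaryTreeLSTM(nn.Module):
    """Sketch of the REDUCE composition (Tai et al., 2015; Zhu et al., 2015):
    combine the top two stack elements (h_i, c_i) and (h_j, c_j) into one
    parent (h, c)."""
    def __init__(self, dim):
        super().__init__()
        # one linear map producing all four gates and the candidate at once
        self.W = nn.Linear(2 * dim, 5 * dim)

    def forward(self, left, right):
        (h_i, c_i), (h_j, c_j) = left, right
        i, o, f_l, f_r, g = self.W(torch.cat([h_i, h_j], dim=-1)).chunk(5, dim=-1)
        i, o, f_l, f_r = map(torch.sigmoid, (i, o, f_l, f_r))
        g = torch.tanh(g)
        c = f_l * c_i + f_r * c_j + i * g
        h = o * c
        return h, c

dim = 8
compose = BinaryTreeLSTM(dim)
leaf = lambda: (torch.randn(dim), torch.zeros(dim))     # (h, c) for a word
a, boy, drags, his, sleds = (leaf() for _ in range(5))
# REDUCE steps from the example: ((a boy) (drags (his sleds)))
sent = compose(compose(a, boy), compose(drags, compose(his, sleds)))
```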
Model
• Shift-reduce parsing (Aho and Ullman, 1972)
Different SHIFT/REDUCE sequences lead to different tree structures (demonstrated in the sketch below).
[Figure: four different binary trees over the same four leaves, produced by the action sequences
S, S, R, S, S, R, R    S, S, S, R, R, S, R    S, S, R, S, R, S, R    S, S, S, S, R, R, R]
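For example, two of the action sequences above applied to the same four tokens (a small sketch; the tokens are chosen arbitrarily):

```python
# Sketch: the same four tokens under two different SHIFT/REDUCE sequences
# yield two different binary trees.

def parse(tokens, actions):
    stack, queue = [], list(tokens)
    for a in actions:
        if a == "S":                        # SHIFT: move next token onto the stack
            stack.append(queue.pop(0))
        else:                               # REDUCE: combine the top two elements
            right, left = stack.pop(), stack.pop()
            stack.append((left, right))
    return stack[0]

words = ["A", "boy", "drags", "sleds"]
print(parse(words, list("SSRSSRR")))   # (('A', 'boy'), ('drags', 'sleds'))
print(parse(words, list("SSSSRRR")))   # ('A', ('boy', ('drags', 'sleds')))
```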
Learning
• How do we learn the policy for the shift-reduce sequence?
• Supervised learning! Imitation learning!
• But we need a treebank.
• What if we don't have labels?
Reinforcement Learning
• Given a state and a set of possible actions, an agent needs to decide which action is best to take
• There is no supervision, only rewards, which may not be observed until after several actions have been taken
• A shift-reduce "agent":
  • State: embeddings of the top two elements of the stack and the embedding of the head of the queue
  • Action: SHIFT or REDUCE
  • Reward: log likelihood on a downstream task given the produced representation
Reinforcement Learning
ā€¢ We use a simple policy gradient method, REINFORCE (Williams, 1992)
R(W) = Eā‡”(a,s;WR)
" TX
t=1
rtat
#
ā‡”(a | s; WR)ā€¢ The goal of training is to train a policy network
to maximize rewards
ā€¢ Other techniques for approximating the marginal likelihood are available
(Guu et al., 2017)
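A minimal REINFORCE sketch for this setup (PyTorch). For brevity the shift-reduce states are given up front, whereas in the real model each state depends on the actions already taken; the two-layer policy and the reward wiring are illustrative stand-ins, not the paper's exact training loop.

```python
import torch
import torch.nn as nn

def reinforce_step(policy, states, downstream_logprob, optimizer):
    """states: tensor (T, state_dim), one shift-reduce state per decision;
    downstream_logprob: scalar reward r (log-likelihood of the correct label)."""
    logits = policy(states)                        # (T, 2): scores for SHIFT/REDUCE
    dist = torch.distributions.Categorical(logits=logits)
    actions = dist.sample()                        # one sampled action per state
    reward = downstream_logprob.detach()           # treat the reward as a constant
    loss = -(reward * dist.log_prob(actions).sum())  # scale log-probs by reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return actions

state_dim = 8
policy = nn.Sequential(nn.Linear(state_dim, 32), nn.Tanh(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
states = torch.randn(7, state_dim)                 # e.g. 7 decisions for 4 words
reward = torch.tensor(-0.3)                        # stand-in for log p(y | composed x)
reinforce_step(policy, states, reward, optimizer)
```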
Stanford Sentiment Treebank (Socher et al., 2013)

Method                                                      Accuracy
Naive Bayes (from Socher et al., 2013)                      81.8
SVM (from Socher et al., 2013)                              79.4
Average of Word Embeddings (from Socher et al., 2013)       80.1
Bayesian Optimization (Yogatama et al., 2015)               82.4
Weighted Average of Word Embeddings (Arora et al., 2017)    82.4
Left-to-Right LSTM                                          84.7
Right-to-Left LSTM                                          83.9
Bidirectional LSTM                                          84.7
Supervised Syntax                                           85.3
Semi-supervised Syntax                                      86.1
Latent Syntax                                               86.5
Examples of Learned Structures
[Figure: trees induced by the latent-syntax model; some are intuitive structures, others are non-intuitive structures.]

Grammar Induction
Summary
• The induced trees look "non-linguistic"
• But downstream performance is great!
• They ignore the generation of the sentence in favor of structures that are optimal for interpretation
• Natural languages must balance both.

Linguistic Structure
Summary
• Do we need better bias in our models?
  • Yes! They are making the wrong generalizations, even from large data.
• Do we have to have the perfect model?
  • No! Small steps in the right direction can pay big dividends.
Thanks!
Questions?
ą¦¬ą¦¾ą¦‚ą¦²ą¦¾ą¦¦ą§‡ą¦¶ ą¦…ą¦°ą§ą¦„ą¦Øą§ˆą¦¤ą¦æą¦• ą¦øą¦®ą§€ą¦•ą§ą¦·ą¦¾ (Economic Review) ą§Øą§¦ą§Øą§Ŗ UJS App.pdfą¦¬ą¦¾ą¦‚ą¦²ą¦¾ą¦¦ą§‡ą¦¶ ą¦…ą¦°ą§ą¦„ą¦Øą§ˆą¦¤ą¦æą¦• ą¦øą¦®ą§€ą¦•ą§ą¦·ą¦¾ (Economic Review) ą§Øą§¦ą§Øą§Ŗ UJS App.pdf
ą¦¬ą¦¾ą¦‚ą¦²ą¦¾ą¦¦ą§‡ą¦¶ ą¦…ą¦°ą§ą¦„ą¦Øą§ˆą¦¤ą¦æą¦• ą¦øą¦®ą§€ą¦•ą§ą¦·ą¦¾ (Economic Review) ą§Øą§¦ą§Øą§Ŗ UJS App.pdf
eBook.com.bd (ą¦Ŗą§ą¦°ą¦Æą¦¼ą§‹ą¦œą¦Øą§€ą¦Æą¦¼ ą¦¬ą¦¾ą¦‚ą¦²ą¦¾ ą¦¬ą¦‡)
Ā 
Digital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental DesignDigital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental Design
amberjdewit93
Ā 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
tarandeep35
Ā 
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective UpskillingYour Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Excellence Foundation for South Sudan
Ā 
What is the purpose of studying mathematics.pptx
What is the purpose of studying mathematics.pptxWhat is the purpose of studying mathematics.pptx
What is the purpose of studying mathematics.pptx
christianmathematics
Ā 
Group Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana BuscigliopptxGroup Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana Buscigliopptx
ArianaBusciglio
Ā 
Advantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO PerspectiveAdvantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO Perspective
KrisztiƔn SzƔraz
Ā 
Digital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments UnitDigital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments Unit
chanes7
Ā 
Delivering Micro-Credentials in Technical and Vocational Education and Training
Delivering Micro-Credentials in Technical and Vocational Education and TrainingDelivering Micro-Credentials in Technical and Vocational Education and Training
Delivering Micro-Credentials in Technical and Vocational Education and Training
AG2 Design
Ā 
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
National Information Standards Organization (NISO)
Ā 
Aficamten in HCM (SEQUOIA HCM TRIAL 2024)
Aficamten in HCM (SEQUOIA HCM TRIAL 2024)Aficamten in HCM (SEQUOIA HCM TRIAL 2024)
Aficamten in HCM (SEQUOIA HCM TRIAL 2024)
Ashish Kohli
Ā 
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama UniversityNatural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Akanksha trivedi rama nursing college kanpur.
Ā 
Lapbook sobre os Regimes TotalitƔrios.pdf
Lapbook sobre os Regimes TotalitƔrios.pdfLapbook sobre os Regimes TotalitƔrios.pdf
Lapbook sobre os Regimes TotalitƔrios.pdf
Jean Carlos Nunes PaixĆ£o
Ā 
kitab khulasah nurul yaqin jilid 1 - 2.pptx
kitab khulasah nurul yaqin jilid 1 - 2.pptxkitab khulasah nurul yaqin jilid 1 - 2.pptx
kitab khulasah nurul yaqin jilid 1 - 2.pptx
datarid22
Ā 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
Peter Windle
Ā 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
Dr. Shivangi Singh Parihar
Ā 
Azure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHatAzure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHat
Scholarhat
Ā 
The Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collectionThe Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collection
Israel Genealogy Research Association
Ā 
MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...
MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...
MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...
NelTorrente
Ā 

Recently uploaded (20)

The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
Ā 
ą¦¬ą¦¾ą¦‚ą¦²ą¦¾ą¦¦ą§‡ą¦¶ ą¦…ą¦°ą§ą¦„ą¦Øą§ˆą¦¤ą¦æą¦• ą¦øą¦®ą§€ą¦•ą§ą¦·ą¦¾ (Economic Review) ą§Øą§¦ą§Øą§Ŗ UJS App.pdf
ą¦¬ą¦¾ą¦‚ą¦²ą¦¾ą¦¦ą§‡ą¦¶ ą¦…ą¦°ą§ą¦„ą¦Øą§ˆą¦¤ą¦æą¦• ą¦øą¦®ą§€ą¦•ą§ą¦·ą¦¾ (Economic Review) ą§Øą§¦ą§Øą§Ŗ UJS App.pdfą¦¬ą¦¾ą¦‚ą¦²ą¦¾ą¦¦ą§‡ą¦¶ ą¦…ą¦°ą§ą¦„ą¦Øą§ˆą¦¤ą¦æą¦• ą¦øą¦®ą§€ą¦•ą§ą¦·ą¦¾ (Economic Review) ą§Øą§¦ą§Øą§Ŗ UJS App.pdf
ą¦¬ą¦¾ą¦‚ą¦²ą¦¾ą¦¦ą§‡ą¦¶ ą¦…ą¦°ą§ą¦„ą¦Øą§ˆą¦¤ą¦æą¦• ą¦øą¦®ą§€ą¦•ą§ą¦·ą¦¾ (Economic Review) ą§Øą§¦ą§Øą§Ŗ UJS App.pdf
Ā 
Digital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental DesignDigital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental Design
Ā 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
Ā 
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective UpskillingYour Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Ā 
What is the purpose of studying mathematics.pptx
What is the purpose of studying mathematics.pptxWhat is the purpose of studying mathematics.pptx
What is the purpose of studying mathematics.pptx
Ā 
Group Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana BuscigliopptxGroup Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana Buscigliopptx
Ā 
Advantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO PerspectiveAdvantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO Perspective
Ā 
Digital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments UnitDigital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments Unit
Ā 
Delivering Micro-Credentials in Technical and Vocational Education and Training
Delivering Micro-Credentials in Technical and Vocational Education and TrainingDelivering Micro-Credentials in Technical and Vocational Education and Training
Delivering Micro-Credentials in Technical and Vocational Education and Training
Ā 
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Ā 
Aficamten in HCM (SEQUOIA HCM TRIAL 2024)
Aficamten in HCM (SEQUOIA HCM TRIAL 2024)Aficamten in HCM (SEQUOIA HCM TRIAL 2024)
Aficamten in HCM (SEQUOIA HCM TRIAL 2024)
Ā 
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama UniversityNatural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Ā 
Lapbook sobre os Regimes TotalitƔrios.pdf
Lapbook sobre os Regimes TotalitƔrios.pdfLapbook sobre os Regimes TotalitƔrios.pdf
Lapbook sobre os Regimes TotalitƔrios.pdf
Ā 
kitab khulasah nurul yaqin jilid 1 - 2.pptx
kitab khulasah nurul yaqin jilid 1 - 2.pptxkitab khulasah nurul yaqin jilid 1 - 2.pptx
kitab khulasah nurul yaqin jilid 1 - 2.pptx
Ā 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
Ā 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
Ā 
Azure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHatAzure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHat
Ā 
The Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collectionThe Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collection
Ā 
MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...
MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...
MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...
Ā 

Chris Dyer - 2017 - CoNLL Invited Talk: Should Neural Network Architecture Reflect Linguistic Structure?

  • 1. Should Neural Network Architecture Reļ¬‚ect Linguistic Structure? CoNLLAugust 3, 2017 Adhi Kuncoroā€Ø (Oxford) Phil Blunsomā€Ø (DeepMind) Dani Yogatamaā€Ø (DeepMind) Chris Dyerā€Ø (DeepMind/CMU) Miguel Ballesterosā€Ø (IBM) Wang Lingā€Ø (DeepMind) Noah A. Smithā€Ø (UW) Ed Grefenstetteā€Ø (DeepMind)
  • 2. Should Neural Network Architecture Reļ¬‚ect Linguistic Structure? Yes. CoNLLAugust 3, 2017
  • 4. (1) a. The talk I gave did not appeal to anybody. Modeling languageā€Ø Sentences are hierarchical Examples adapted from Everaert et al. (TICS 2015)
  • 5. (1) a. The talk I gave did not appeal to anybody. b. *The talk I gave appealed to anybody. Modeling languageā€Ø Sentences are hierarchical Examples adapted from Everaert et al. (TICS 2015)
  • 6. (1) a. The talk I gave did not appeal to anybody. b. *The talk I gave appealed to anybody. Modeling languageā€Ø Sentences are hierarchical NPI Examples adapted from Everaert et al. (TICS 2015)
  • 7. (1) a. The talk I gave did not appeal to anybody. b. *The talk I gave appealed to anybody. Generalization hypothesis: not must come before anybody Modeling languageā€Ø Sentences are hierarchical NPI Examples adapted from Everaert et al. (TICS 2015)
  • 8. (1) a. The talk I gave did not appeal to anybody. b. *The talk I gave appealed to anybody. Generalization hypothesis: not must come before anybody (2) *The talk I did not give appealed to anybody. Modeling languageā€Ø Sentences are hierarchical NPI Examples adapted from Everaert et al. (TICS 2015)
  • 9. Examples adapted from Everaert et al. (TICS 2015) containing anybody. (This structural conļ¬guration is called c(onstituent)-command in the linguistics literature [31].) When the relationship between not and anybody adheres to this structural conļ¬guration, the sentence is well-formed. In sentence (3), by contrast, not sequentially precedes anybody, but the triangle dominating not in Figure 1B fails to also dominate the structure containing anybody. Consequently, the sentence is not well-formed. (A) (B) X X X X X X The book X X X The book X appealed to anybody did not that I bought appeal to anybody that I did not buy Figure 1. Negative Polarity. (A) Negative polarity licensed: negative element c-commands negative polarity item. (B) Negative polarity not licensed. Negative element does not c-command negative polarity item. - the psychological reality of structural sensitivtyā€Ø is not empirically controversial - many different theories of the details of structure - more controversial hypothesis: kids learn languageā€Ø easily because they donā€™t consider many ā€œobviousā€ struct insensitive hypotheses Language is hierarchical The talk I gave did not appeal to anybody appealed to anybodyThe talk I did not give
  • 10. Examples adapted from Everaert et al. (TICS 2015) containing anybody. (This structural conļ¬guration is called c(onstituent)-command in the linguistics literature [31].) When the relationship between not and anybody adheres to this structural conļ¬guration, the sentence is well-formed. In sentence (3), by contrast, not sequentially precedes anybody, but the triangle dominating not in Figure 1B fails to also dominate the structure containing anybody. Consequently, the sentence is not well-formed. (A) (B) X X X X X X The book X X X The book X appealed to anybody did not that I bought appeal to anybody that I did not buy Figure 1. Negative Polarity. (A) Negative polarity licensed: negative element c-commands negative polarity item. (B) Negative polarity not licensed. Negative element does not c-command negative polarity item. Generalization: not must ā€œstructurally precedeā€ anybody - the psychological reality of structural sensitivtyā€Ø is not empirically controversial - many different theories of the details of structure - more controversial hypothesis: kids learn languageā€Ø easily because they donā€™t consider many ā€œobviousā€ struct insensitive hypotheses Language is hierarchical The talk I gave did not appeal to anybody appealed to anybodyThe talk I did not give
  • 11. Examples adapted from Everaert et al. (TICS 2015) containing anybody. (This structural conļ¬guration is called c(onstituent)-command in the linguistics literature [31].) When the relationship between not and anybody adheres to this structural conļ¬guration, the sentence is well-formed. In sentence (3), by contrast, not sequentially precedes anybody, but the triangle dominating not in Figure 1B fails to also dominate the structure containing anybody. Consequently, the sentence is not well-formed. (A) (B) X X X X X X The book X X X The book X appealed to anybody did not that I bought appeal to anybody that I did not buy Figure 1. Negative Polarity. (A) Negative polarity licensed: negative element c-commands negative polarity item. (B) Negative polarity not licensed. Negative element does not c-command negative polarity item. Generalization: not must ā€œstructurally precedeā€ anybody - the psychological reality of structural sensitivtyā€Ø is not empirically controversial - many different theories of the details of structure - more controversial hypothesis: kids learn languageā€Ø easily because they donā€™t consider many ā€œobviousā€ struct insensitive hypotheses Language is hierarchical The talk I gave did not appeal to anybody appealed to anybodyThe talk I did not give
  • 12. Examples adapted from Everaert et al. (TICS 2015) containing anybody. (This structural conļ¬guration is called c(onstituent)-command in the linguistics literature [31].) When the relationship between not and anybody adheres to this structural conļ¬guration, the sentence is well-formed. In sentence (3), by contrast, not sequentially precedes anybody, but the triangle dominating not in Figure 1B fails to also dominate the structure containing anybody. Consequently, the sentence is not well-formed. (A) (B) X X X X X X The book X X X The book X appealed to anybody did not that I bought appeal to anybody that I did not buy Figure 1. Negative Polarity. (A) Negative polarity licensed: negative element c-commands negative polarity item. (B) Negative polarity not licensed. Negative element does not c-command negative polarity item. Generalization: not must ā€œstructurally precedeā€ anybody - the psychological reality of structural sensitivtyā€Ø is not empirically controversial - many different theories of the details of structure - more controversial hypothesis: kids learn languageā€Ø easily because they donā€™t consider many ā€œobviousā€ struct insensitive hypotheses Language is hierarchical The talk I gave did not appeal to anybody appealed to anybodyThe talk I did not give
  • 13. Examples adapted from Everaert et al. (TICS 2015) containing anybody. (This structural conļ¬guration is called c(onstituent)-command in the linguistics literature [31].) When the relationship between not and anybody adheres to this structural conļ¬guration, the sentence is well-formed. In sentence (3), by contrast, not sequentially precedes anybody, but the triangle dominating not in Figure 1B fails to also dominate the structure containing anybody. Consequently, the sentence is not well-formed. (A) (B) X X X X X X The book X X X The book X appealed to anybody did not that I bought appeal to anybody that I did not buy Figure 1. Negative Polarity. (A) Negative polarity licensed: negative element c-commands negative polarity item. (B) Negative polarity not licensed. Negative element does not c-command negative polarity item. Generalization: not must ā€œstructurally precedeā€ anybody - the psychological reality of structural sensitivtyā€Ø is not empirically controversial - many different theories of the details of structure - more controversial hypothesis: kids learn languageā€Ø easily because they donā€™t consider many ā€œobviousā€ā€Ø structural insensitive hypotheses Language is hierarchical The talk I gave did not appeal to anybody appealed to anybodyThe talk I did not give
  • 14. ā€¢ Recurrent neural networks are incredibly powerful models of sequences (e.g., of words) ā€¢ In fact, RNNs are Turing complete!ā€Ø (Siegelmann, 1995) ā€¢ But do they make good generalizations from ļ¬nite samples of data? ā€¢ What inductive biases do they have? ā€¢ Using syntactic annotations to improve inductive bias. ā€¢ Inferring better compositional structure without explicit annotation Recurrent neural networksā€Ø Good models of language? (part 1) (part 2)
  • 15. ā€¢ Recurrent neural networks are incredibly powerful models of sequences (e.g., of words) ā€¢ In fact, RNNs are Turing complete!ā€Ø (Siegelmann, 1995) ā€¢ But do they make good generalizations from ļ¬nite samples of data? ā€¢ What inductive biases do they have? ā€¢ Using syntactic annotations to improve inductive bias. ā€¢ Inferring better compositional structure without explicit annotation Recurrent neural networksā€Ø Good models of language? (part 1) (part 2)
  • 16. ā€¢ Understanding the biases of neural networks is tricky ā€¢ We have enough trouble understanding the representations they learn in speciļ¬c cases, much less general cases! ā€¢ But, there is lots of evidence RNNs prefer sequential recency ā€¢ Evidence 1: Gradients become attenuated across time ā€¢ Analysis; experiments with synthetic datasetsā€Ø (yes, LSTMs help but they have limits) ā€¢ Evidence 2: Training regimes like reversing sequences in seq2seq learning ā€¢ Evidence 3: Modeling enhancements to use attention (direct connections back in remote time) ā€¢ Evidence 4: Linzen et al. (2017) ļ¬ndings on English subject-verb agreement. Recurrent neural networksā€Ø Inductive bias
  • 17. ā€¢ Understanding the biases of neural networks is tricky ā€¢ We have enough trouble understanding the representations they learn in speciļ¬c cases, much less general cases! ā€¢ But, there is lots of evidence RNNs have a bias for sequential recency ā€¢ Evidence 1: Gradients become attenuated across time ā€¢ Functional analysis; experiments with synthetic datasetsā€Ø (yes, LSTMs/GRUs help, but they also forget) ā€¢ Evidence 2: Training regimes like reversing sequences in seq2seq learning ā€¢ Evidence 3: Modeling enhancements to use attention (direct connections back in remote time) ā€¢ Evidence 4: Linzen et al. (2017) ļ¬ndings on English subject-verb agreement. Recurrent neural networksā€Ø Inductive bias
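Evidence 1 in the slide above (gradients become attenuated across time) is easy to see numerically. Below is a minimal sketch, not from the talk: a vanilla tanh RNN with random weights, accumulating the Jacobian of the current hidden state with respect to the first input; its norm typically shrinks roughly geometrically with distance. All sizes and weight scales are arbitrary choices for illustration.

import numpy as np

rng = np.random.default_rng(0)
d = 50                                            # hidden size (illustrative)
W = rng.normal(0, 1.0 / np.sqrt(d), (d, d))       # recurrent weights
U = rng.normal(0, 1.0 / np.sqrt(d), (d, d))       # input weights
T = 60                                            # sequence length

h = np.zeros(d)
jac = None                                        # d h_t / d x_1, built up by the chain rule
for t in range(T):
    x = rng.normal(size=d)
    h = np.tanh(W @ h + U @ x)
    D = np.diag(1.0 - h ** 2)                     # derivative of tanh at this step
    if t == 0:
        jac = D @ U                               # d h_1 / d x_1
    else:
        jac = (D @ W) @ jac                       # chain rule through time
    if t % 10 == 0:
        print(f"t={t:2d}  ||d h_t / d x_1|| = {np.linalg.norm(jac):.3e}")

LSTMs and GRUs slow this decay down but, as the slide notes, do not eliminate it.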
  • 18. The keys to the cabinet in the closet is/are on the table. AGREE Linzen, Dupoux, Goldberg (2017)ā€Ø
  • 19. ā€¢ Understanding the biases of neural networks is tricky ā€¢ We have enough trouble understanding the representations they learn in speciļ¬c cases, much less general cases! ā€¢ But, there is lots of evidence RNNs have a bias for sequential recency ā€¢ Evidence 1: Gradients become attenuated across time ā€¢ Functional analysis; experiments with synthetic datasetsā€Ø (yes, LSTMs/GRUs help, but they also forget) ā€¢ Evidence 2: Training regimes like reversing sequences in seq2seq learning ā€¢ Evidence 3: Modeling enhancements to use attention (direct connections back in remote time) ā€¢ Evidence 4: Linzen et al. (2017) ļ¬ndings on English subject-verb agreement. Recurrent neural networksā€Ø Inductive bias Chomsky (crudely paraphrasing 60 years of work):ā€Ø ā€Ø Sequential recency is not the right bias forā€Ø modeling human language.
  • 20. An alternativeā€Ø Recurrent Neural Net Grammars ā€¢ Generate symbols sequentially using an RNN ā€¢ Add some control symbols to rewrite the history occasionally ā€¢ Occasionally compress a sequence into a constituent ā€¢ RNN predicts next terminal/control symbol based on the history of compressed elements and non-compressed terminals ā€¢ This is a top-down, left-to-right generation of a tree+sequence Adhi Kuncoro
  • 21. An alternativeā€Ø Recurrent Neural Net Grammars ā€¢ Generate symbols sequentially using an RNN ā€¢ Add some control symbols to rewrite the history occasionally ā€¢ Occasionally compress a sequence into a constituent ā€¢ RNN predicts next terminal/control symbol based on the history of compressed elements and non-compressed terminals ā€¢ This is a top-down, left-to-right generation of a tree+sequence Adhi Kuncoro
  • 22. ā€¢ Generate symbols sequentially using an RNN ā€¢ Add some control symbols to rewrite the history occasionally ā€¢ Occasionally compress a sequence into a constituent ā€¢ RNN predicts next terminal/control symbol based on the history of compressed elements and non-compressed terminals ā€¢ This is a top-down, left-to-right generation of a tree+sequence An alternativeā€Ø Recurrent Neural Net Grammars Adhi Kuncoro
  • 23. The hungry cat meows loudly Example derivationā€Ø
  • 27. stack action (S probability NT(S) p(nt(S) | top) NT(NP) p(nt(NP) | (S)
  • 28. stack action (S (S (NP probability NT(S) p(nt(S) | top) NT(NP) p(nt(NP) | (S)
  • 29. stack action (S (S (NP probability NT(S) p(nt(S) | top) GEN(The) p(gen(The) | (S, (NP) NT(NP) p(nt(NP) | (S)
  • 30. stack action (S (S (NP (S (NP The probability NT(S) p(nt(S) | top) GEN(The) p(gen(The) | (S, (NP) NT(NP) p(nt(NP) | (S)
  • 31. stack action (S (S (NP (S (NP The probability NT(S) p(nt(S) | top) GEN(The) p(gen(The) | (S, (NP) NT(NP) p(nt(NP) | (S) GEN(hungry) p(gen(hungry) | (S, (NP, The)
  • 32. stack action (S (S (NP (S (NP The hungry (S (NP The probability NT(S) p(nt(S) | top) GEN(The) p(gen(The) | (S, (NP) NT(NP) p(nt(NP) | (S) GEN(hungry) p(gen(hungry) | (S, (NP, The)
  • 33. stack action (S (S (NP (S (NP The hungry (S (NP The probability NT(S) p(nt(S) | top) GEN(The) p(gen(The) | (S, (NP) NT(NP) p(nt(NP) | (S) GEN(hungry) p(gen(hungry) | (S, (NP, The) GEN(cat) p(gen(cat) | . . .)
  • 34. stack action (S (S (NP (S (NP The hungry cat (S (NP The hungry (S (NP The probability NT(S) p(nt(S) | top) GEN(The) p(gen(The) | (S, (NP) NT(NP) p(nt(NP) | (S) GEN(hungry) p(gen(hungry) | (S, (NP, The) GEN(cat) p(gen(cat) | . . .)
  • 35. stack action (S (S (NP (S (NP The hungry cat (S (NP The hungry (S (NP The probability NT(S) p(nt(S) | top) GEN(The) p(gen(The) | (S, (NP) NT(NP) p(nt(NP) | (S) GEN(hungry) p(gen(hungry) | (S, (NP, The) GEN(cat) p(gen(cat) | . . .) REDUCE p(reduce | . . .)
  • 36. stack action (S (S (NP (S (NP The hungry cat (S (NP The hungry (S (NP The (S (NP The hungry cat ) probability NT(S) p(nt(S) | top) GEN(The) p(gen(The) | (S, (NP) NT(NP) p(nt(NP) | (S) GEN(hungry) p(gen(hungry) | (S, (NP, The) GEN(cat) p(gen(cat) | . . .) REDUCE p(reduce | . . .)
  • 37. stack action (S (S (NP (S (NP The hungry cat (S (NP The hungry (S (NP The (S (NP The hungry cat ) (S (NP The hungry cat) Compress ā€œThe hungry catā€ā€Ø into a single composite symbol probability NT(S) p(nt(S) | top) GEN(The) p(gen(The) | (S, (NP) NT(NP) p(nt(NP) | (S) GEN(hungry) p(gen(hungry) | (S, (NP, The) GEN(cat) p(gen(cat) | . . .) REDUCE p(reduce | . . .)
  • 38. stack action (S (NP The hungry cat) NT(S) p(nt(S) | top) GEN(The) p(gen(The) | (S, (NP) NT(NP) p(nt(NP) | (S) GEN(hungry) p(gen(hungry) | (S, (NP, The) GEN(cat) p(gen(cat) | . . .) REDUCE p(reduce | . . .) (S (S (NP (S (NP The hungry cat (S (NP The hungry (S (NP The probability
  • 39. stack action (S (NP The hungry cat) NT(S) p(nt(S) | top) GEN(The) p(gen(The) | (S, (NP) NT(NP) p(nt(NP) | (S) GEN(hungry) p(gen(hungry) | (S, (NP, The) GEN(cat) p(gen(cat) | . . .) REDUCE p(reduce | . . .) (S (S (NP (S (NP The hungry cat (S (NP The hungry (S (NP The NT(VP) p(nt(VP) | (S, (NP The hungry cat) probability
  • 40. stack action (S (NP The hungry cat) (VP (S (NP The hungry cat) NT(S) p(nt(S) | top) GEN(The) p(gen(The) | (S, (NP) NT(NP) p(nt(NP) | (S) GEN(hungry) p(gen(hungry) | (S, (NP, The) GEN(cat) p(gen(cat) | . . .) REDUCE p(reduce | . . .) (S (S (NP (S (NP The hungry cat (S (NP The hungry (S (NP The NT(VP) p(nt(VP) | (S, (NP The hungry cat) probability
  • 41. stack action GEN(meows)(S (NP The hungry cat) (VP (S (NP The hungry cat) NT(S) p(nt(S) | top) GEN(The) p(gen(The) | (S, (NP) NT(NP) p(nt(NP) | (S) GEN(hungry) p(gen(hungry) | (S, (NP, The) GEN(cat) p(gen(cat) | . . .) REDUCE p(reduce | . . .) (S (S (NP (S (NP The hungry cat (S (NP The hungry (S (NP The NT(VP) p(nt(VP) | (S, (NP The hungry cat) probability
  • 42. stack action GEN(meows) REDUCE (S (NP The hungry cat) (VP meows) GEN(.) REDUCE (S (NP The hungry cat) (VP meows) .) (S (NP The hungry cat) (VP meows) . (S (NP The hungry cat) (VP meows (S (NP The hungry cat) (VP (S (NP The hungry cat) NT(S) p(nt(S) | top) GEN(The) p(gen(The) | (S, (NP) NT(NP) p(nt(NP) | (S) GEN(hungry) p(gen(hungry) | (S, (NP, The) GEN(cat) p(gen(cat) | . . .) REDUCE p(reduce | . . .) (S (S (NP (S (NP The hungry cat (S (NP The hungry (S (NP The NT(VP) p(nt(VP) | (S, (NP The hungry cat) probability
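The derivation above can be replayed mechanically. The sketch below is illustrative rather than the talk's implementation: it applies the NT/GEN/REDUCE actions to a stack and rebuilds the bracketed tree for the example sentence; in the real model each REDUCE would additionally run the syntactic composition function, and every action would be scored by the RNN over the stack.

def is_open_nt(item):
    # open nonterminals look like "(NP"; completed constituents end with ")"
    return item.startswith("(") and not item.endswith(")")

def replay(actions):
    stack = []
    for act in actions:
        kind = act[0]
        if kind == "NT":
            stack.append("(" + act[1])        # push an open nonterminal, e.g. "(NP"
        elif kind == "GEN":
            stack.append(act[1])              # generate a terminal word
        else:                                 # "REDUCE"
            children = []
            while not is_open_nt(stack[-1]):
                children.append(stack.pop())  # pop completed children
            label = stack.pop()               # pop the open nonterminal
            children.reverse()
            # in the full model, syntactic composition would produce a vector here;
            # this sketch just rebuilds the bracketed string
            stack.append(label + " " + " ".join(children) + ")")
    return stack

actions = [("NT", "S"), ("NT", "NP"), ("GEN", "The"), ("GEN", "hungry"), ("GEN", "cat"),
           ("REDUCE",), ("NT", "VP"), ("GEN", "meows"), ("REDUCE",), ("GEN", "."), ("REDUCE",)]
print(replay(actions))  # ['(S (NP The hungry cat) (VP meows) .)']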
  • 43. • Valid (tree, string) pairs are in bijection with valid sequences of actions (specifically, the DFS, left-to-right traversal of the trees) • Every stack configuration perfectly encodes the complete history of actions. • Therefore, the probability decomposition is justified by the chain rule, i.e. p(x, y) = p(actions(x, y)) (prop 1); p(actions(x, y)) = Π_i p(a_i | a_<i) (chain rule) = Π_i p(a_i | stack(a_<i)) (prop 2) Some things you can show
  • 44. Modeling the next actionā€Ø (S (NP The hungry cat) (VP meowsp(ai | )
  • 45. Modeling the next actionā€Ø (S (NP The hungry cat) (VP meowsp(ai | ) 1. unbounded depth
  • 46. Modeling the next actionā€Ø (S (NP The hungry cat) (VP meowsp(ai | ) 1. unbounded depth 1. Unbounded depth ā†’ recurrent neural nets h1 h2 h3 h4
  • 47. Modeling the next actionā€Ø (S (NP The hungry cat) (VP meowsp(ai | ) 1. Unbounded depth ā†’ recurrent neural nets h1 h2 h3 h4
  • 48. Modeling the next actionā€Ø (S (NP The hungry cat) (VP meowsp(ai | ) 1. Unbounded depth ā†’ recurrent neural nets h1 h2 h3 h4 2. arbitrarily complex trees 2. Arbitrarily complex trees ā†’ recursive neural nets
  • 49. (NP The hungry cat)Need representation for: Syntactic compositionā€Ø
  • 50. (NP The hungry cat)Need representation for: NP What head type? Syntactic compositionā€Ø
  • 51. The (NP The hungry cat)Need representation for: NP What head type? Syntactic compositionā€Ø
  • 52. The hungry (NP The hungry cat)Need representation for: NP What head type? Syntactic compositionā€Ø
  • 53. The hungry cat (NP The hungry cat)Need representation for: NP What head type? Syntactic compositionā€Ø
  • 54. The hungry cat ) (NP The hungry cat)Need representation for: NP What head type? Syntactic compositionā€Ø
  • 55. TheNP hungry cat ) (NP The hungry cat)Need representation for: Syntactic compositionā€Ø
  • 56. TheNP hungry cat ) NP (NP The hungry cat)Need representation for: Syntactic compositionā€Ø
  • 57. TheNP hungry cat ) NP (NP The hungry cat)Need representation for: ( Syntactic compositionā€Ø
  • 58. TheNP hungry cat ) NP( (NP The hungry cat)Need representation for: Syntactic compositionā€Ø
  • 59. TheNP cat ) NP( (NP The (ADJP very hungry) cat) Need representation for: (NP The hungry cat) hungry Syntactic compositionā€Ø Recursion
  • 60. TheNP cat ) NP( (NP The (ADJP very hungry) cat) Need representation for: (NP The hungry cat) | {z } v v Syntactic compositionā€Ø Recursion
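The composition slides above read the open nonterminal and its completed children and squash them into a single vector, which can itself serve as a child of a later composition (the (NP The (ADJP very hungry) cat) example). A minimal numpy sketch of such a composition function follows, assuming a simple bidirectional tanh RNN rather than the LSTMs used in the actual model; all weights and embeddings are random stand-ins.

import numpy as np

rng = np.random.default_rng(1)
d = 16  # embedding size (illustrative)

def rnn(seq, W, U, b):
    """Run a simple tanh RNN over a list of vectors; return the final state."""
    h = np.zeros(d)
    for x in seq:
        h = np.tanh(W @ h + U @ x + b)
    return h

# Composition-function parameters (random stand-ins for learned weights).
Wf, Uf, bf = rng.normal(0, 0.3, (d, d)), rng.normal(0, 0.3, (d, d)), np.zeros(d)
Wb, Ub, bb = rng.normal(0, 0.3, (d, d)), rng.normal(0, 0.3, (d, d)), np.zeros(d)
V = rng.normal(0, 0.3, (d, 2 * d))

def compose(label_vec, child_vecs):
    """Squash a nonterminal label and its children into one vector by reading
    the sequence [label, child_1, ..., child_n] forward and backward."""
    seq = [label_vec] + child_vecs
    fwd = rnn(seq, Wf, Uf, bf)
    bwd = rnn(seq[::-1], Wb, Ub, bb)
    return np.tanh(V @ np.concatenate([fwd, bwd]))

# Random stand-ins for the embeddings of NP, ADJP, and the words.
NP, ADJP = rng.normal(size=d), rng.normal(size=d)
the, very, hungry, cat = (rng.normal(size=d) for _ in range(4))

v1 = compose(NP, [the, hungry, cat])                         # (NP The hungry cat)
v2 = compose(NP, [the, compose(ADJP, [very, hungry]), cat])  # (NP The (ADJP very hungry) cat)
print(v1.shape, v2.shape)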
  • 61. Modeling the next actionā€Ø (S (NP The hungry cat) (VP meowsp(ai | ) 1. Unbounded depth ā†’ recurrent neural nets 2. Arbitrarily complex trees ā†’ recursive neural nets h1 h2 h3 h4
  • 62. Modeling the next actionā€Ø (S (NP The hungry cat) (VP meowsp(ai | ) 1. Unbounded depth ā†’ recurrent neural nets 2. Arbitrarily complex trees ā†’ recursive neural nets ā‡ REDUCE h1 h2 h3 h4
  • 63. Modeling the next actionā€Ø (S (NP The hungry cat) (VP meowsp(ai | ) 1. Unbounded depth ā†’ recurrent neural nets 2. Arbitrarily complex trees ā†’ recursive neural nets ā‡ REDUCE (S (NP The hungry cat) (VP meows)p(ai+1 | )
  • 64. Modeling the next actionā€Ø (S (NP The hungry cat) (VP meowsp(ai | ) 1. Unbounded depth ā†’ recurrent neural nets 2. Arbitrarily complex trees ā†’ recursive neural nets ā‡ REDUCE (S (NP The hungry cat) (VP meows)p(ai+1 | ) 3. limited updates 3. Limited updates to state ā†’ stack RNNs
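Point 3 above, limited updates to state, is the idea behind stack RNNs: keep one RNN state per stack element, so a push extends the encoding from the current top and a pop simply discards the top state instead of re-encoding the history. The sketch below is a bare-bones illustration (not the stack LSTM used in the talk); it also shows the top state feeding a softmax over the next action, as in the factorization p(a_i | stack(a_<i)).

import numpy as np

rng = np.random.default_rng(2)
d, n_actions = 16, 3                      # hidden size and number of action types (illustrative)
W = rng.normal(0, 0.3, (d, d))
U = rng.normal(0, 0.3, (d, d))
A = rng.normal(0, 0.3, (n_actions, d))    # scores actions from the stack summary

class StackRNN:
    """One RNN state per stack element: push extends from the current top,
    pop discards the top state rather than recomputing anything."""
    def __init__(self):
        self.states = [np.zeros(d)]
        self.embs = []
    def push(self, x):
        self.states.append(np.tanh(W @ self.states[-1] + U @ x))
        self.embs.append(x)
    def pop(self):
        self.states.pop()
        return self.embs.pop()
    def summary(self):
        return self.states[-1]            # encodes everything currently on the stack

def next_action_probs(stack):
    scores = A @ stack.summary()
    e = np.exp(scores - scores.max())
    return e / e.sum()                    # p(a_i | stack(a_<i))

s = StackRNN()
for emb in rng.normal(size=(4, d)):       # push a few random symbol embeddings
    s.push(emb)
print(next_action_probs(s))
s.pop()                                   # part of a REDUCE; a composed vector would be pushed next
print(next_action_probs(s))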
  • 65. ā€¢ If we accept the following two propositionsā€¦ ā€¢ Sequential RNNs have a recency bias ā€¢ Syntactic composition learns to represent trees by endocentric heads ā€¢ then we can say that they have a bias for syntactic recency rather than sequential recency RNNGsā€Ø Inductive bias?
  • 66. ā€¢ If we accept the following two propositionsā€¦ ā€¢ Sequential RNNs have a recency bias ā€¢ Syntactic composition learns to represent trees by endocentric heads ā€¢ then we can say that they have a bias for syntactic recency rather than sequential recency RNNGsā€Ø Inductive bias? (S (NP The keys (PP to the cabinet (PP in the closet))) (VP is/are
  • 67. ā€¢ If we accept the following two propositionsā€¦ ā€¢ Sequential RNNs have a recency bias ā€¢ Syntactic composition learns to represent trees by endocentric heads ā€¢ then we can say that they have a bias for syntactic recency rather than sequential recency RNNGsā€Ø Inductive bias? (S (NP The keys (PP to the cabinet (PP in the closet))) (VP is/are ā€œkeysā€ z }| {
  • 68. ā€¢ If we accept the following two propositionsā€¦ ā€¢ Sequential RNNs have a recency bias ā€¢ Syntactic composition learns to represent trees by endocentric heads ā€¢ then we can say that they have a bias for syntactic recency rather than sequential recency RNNGsā€Ø Inductive bias? (S (NP The keys (PP to the cabinet (PP in the closet))) (VP is/are ā€œkeysā€ z }| {
  • 69. ā€¢ If we accept the following two propositionsā€¦ ā€¢ Sequential RNNs have a recency bias ā€¢ Syntactic composition learns to represent trees by endocentric heads ā€¢ then we can say that they have a bias for syntactic recency rather than sequential recency RNNGsā€Ø Inductive bias? (S (NP The keys (PP to the cabinet (PP in the closet))) (VP is/are The keys to the cabinet in the closet is/areSequential RNN: ā€œkeysā€ z }| {
  • 70. ā€¢ Generative models are great! We have deļ¬ned a joint distribution p(x,y) over strings (x) and parse trees (y) ā€¢ We can look at two questions: ā€¢ What is p(x) for a given x? [language modeling] ā€¢ What is max p(y | x) for a given x? [parsing] RNNGsā€Ø A few empirical results
  • 71-74. English PTB (Parsing)
      Model | Type | F1
      Petrov and Klein (2007) | Gen | 90.1
      Shindo et al. (2012), single model | Gen | 91.1
      Vinyals et al. (2015), PTB only | Disc | 90.5
      Shindo et al. (2012), ensemble | Gen | 92.4
      Vinyals et al. (2015), semisupervised | Disc+SemiSup | 92.8
      Discriminative (RNNG), PTB only | Disc | 91.7
      Generative (RNNG), PTB only | Gen | 93.6
      Choe and Charniak (2016), semisupervised | Gen+SemiSup | 93.8
      Fried et al. (2017) | Gen+Semi+Ensemble | 94.7
  • 75. English PTB (LM)
      Model | Perplexity
      5-gram IKN | 169.3
      LSTM + Dropout | 113.4
      Generative (IS) | 102.4
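IS in the table above stands for importance sampling: p(x) = Σ_y p(x, y) cannot be summed over all trees exactly, so trees are sampled from a proposal q(y | x) (in practice the discriminative parser) and reweighted. A minimal sketch of the estimator, with schematic numbers rather than real model scores:

import math

def log_sum_exp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def is_estimate_log_px(samples):
    """Importance-sampling estimate of log p(x).
    `samples` holds (log p(x, y_k), log q(y_k | x)) for trees y_k sampled from q(. | x)."""
    log_weights = [lp_xy - lq_y for lp_xy, lq_y in samples]
    return log_sum_exp(log_weights) - math.log(len(samples))

# Schematic numbers, NOT real model scores: three sampled trees for one sentence.
samples = [(-42.1, -3.2), (-43.0, -1.1), (-44.5, -0.4)]
print(is_estimate_log_px(samples))  # per-sentence log p(x); corpus perplexity averages these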
  • 76. ā€¢ RNNGs are effective for modeling language and parsing ā€¢ Finding 1: Generative parser > discriminative parser ā€¢ Better sample complexity ā€¢ No label bias in modeling action sequence ā€¢ Finding 2: RNNGs > RNNs ā€¢ PTB phrase structure is a useful inductive bias for making good linguistic generalizations! ā€¢ Open question: Do RNNGs solve the Linzen, Dupoux, and Goldberg problem? Summary
  • 77. ā€¢ Recurrent neural networks are incredibly powerful models of sequences (e.g., of words) ā€¢ In fact, RNNs are Turing complete!ā€Ø (Siegelmann, 1995) ā€¢ But do they make good generalizations from ļ¬nite samples of data? ā€¢ What inductive biases do they have? ā€¢ Using syntactic annotations to improve inductive bias. ā€¢ Inferring better compositional structure without explicit annotation Recurrent neural networksā€Ø Good models of language? (part 1) (part 2)
  • 78. • We have compared two endpoints: a sequence model and a PTB-based syntactic model • If we search for structure to obtain the best performance on downstream tasks, what will it find? • In this part of the talk, I focus on representation learning for sentences. Unsupervised structure: Do we need syntactic annotation? Dani Yogatama
  • 79. ā€¢ Advances in deep learning have led to three predominant approaches for constructing representations of sentences Background ā€¢ Convolutional neural networks ā€¢ Recurrent neural networks ā€¢ Recursive neural networks Kim, 2014; Kalchbrenner et al., 2014; Ma et al., 2015 Cho et al., 2014; Sutskever et al., 2014; Bahdanau et al., 2014 Socher et al., 2011; Socher et al., 2013;Tai et al., 2015; Bowman et al., 2016
  • 80. ā€¢ Advances in deep learning have led to three predominant approaches for constructing representations of sentences Background ā€¢ Convolutional neural networks ā€¢ Recurrent neural networks ā€¢ Recursive neural networks Kim, 2014; Kalchbrenner et al., 2014; Ma et al., 2015 Cho et al., 2014; Sutskever et al., 2014; Bahdanau et al., 2014 Socher et al., 2011; Socher et al., 2013;Tai et al., 2015; Bowman et al., 2016
  • 81. Recurrent Encoder A boy drags his sleds through the snow output input (word embeddings) L = log p(y | x)
  • 82. Recursive Encoder A boy drags his sleds through the snow output input (word embeddings)
  • 83. Recursive Encoder • Prior work on tree-structured neural networks assumed that the trees are either provided as input or predicted based on explicit human annotations (Socher et al., 2013; Bowman et al., 2016; Dyer et al., 2016) • When children learn a new language, they are not given parse trees • Infer tree structure as a latent variable that is marginalized during learning of a downstream task that depends on the composed representation. • Hierarchical structures can also be encoded into a word embedding model to improve performance and interpretability (Yogatama et al., 2015) L = log p(y | x) = log Σ_{z ∈ Z(x)} p(y, z | x)
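For very short sentences the marginalization in the objective above can be carried out exactly by enumerating every binary bracketing. The sketch below does exactly that with a toy stand-in for log p(y, z | x); the tree scorer is invented for illustration and is not the model described in the talk, where the score would come from the composed Tree-LSTM representation fed to a classifier.

import math

def bracketings(tokens):
    """All binary bracketings of a token span, as nested tuples."""
    if len(tokens) == 1:
        return [tokens[0]]
    trees = []
    for i in range(1, len(tokens)):
        for left in bracketings(tokens[:i]):
            for right in bracketings(tokens[i:]):
                trees.append((left, right))
    return trees

def depth(tree):
    return 0 if isinstance(tree, str) else 1 + max(depth(tree[0]), depth(tree[1]))

def log_joint(tree):
    """Toy stand-in for log p(y, z | x): prefers shallower trees."""
    return -0.5 * depth(tree)

def log_marginal(tokens):
    scores = [log_joint(t) for t in bracketings(tokens)]
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))

tokens = "A boy drags his sleds".split()
print(len(bracketings(tokens)))  # Catalan(4) = 14 binary trees over five words
print(log_marginal(tokens))      # log of the sum over all 14 trees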
  • 84. Model ā€¢ Shift reduce parsing A boy drags his sleds (Aho and Ullman, 1972) Stack Queue
  • 85. Model ā€¢ Shift reduce parsing A boy drags his sleds (Aho and Ullman, 1972) Stack Queue
  • 86. Model ā€¢ Shift reduce parsing boy drags his sleds (Aho and Ullman, 1972) Stack Queue SHIFT A A Tree
  • 87. Model ā€¢ Shift reduce parsing boy drags his sleds (Aho and Ullman, 1972) Stack Queue A A Tree
  • 88. Model ā€¢ Shift reduce parsing drags his sleds (Aho and Ullman, 1972) Stack Queue SHIFT A boy A Tree boy
  • 89. Model ā€¢ Shift reduce parsing drags his sleds (Aho and Ullman, 1972) Stack Queue A boy A Tree boy
  • 90. Model ā€¢ Shift reduce parsing drags his sleds (Aho and Ullman, 1972) Stack Queue A boy REDUCE A Tree boy
  • 91. Model • Shift-reduce parsing (Aho and Ullman, 1972) Stack: A, boy | Queue: drags his sleds | Action: REDUCE • Compose the top two elements of the stack with a Tree-LSTM (Tai et al., 2015; Zhu et al., 2015): i = σ(W_I [h_i, h_j] + b_I), o = σ(W_O [h_i, h_j] + b_O), f_L = σ(W_{F_L} [h_i, h_j] + b_{F_L}), f_R = σ(W_{F_R} [h_i, h_j] + b_{F_R}), g = tanh(W_G [h_i, h_j] + b_G), c = f_L ⊙ c_i + f_R ⊙ c_j + i ⊙ g, h = o ⊙ c
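The gate equations above translate almost line for line into code. A minimal numpy sketch of the binary Tree-LSTM composition, with random illustrative weights; note the final line follows the slide's h = o ⊙ c, whereas Tai et al. (2015) apply a tanh to c at that point.

import numpy as np

rng = np.random.default_rng(3)
d = 16  # hidden size (illustrative)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One weight matrix and bias per gate; each sees the concatenation [h_i, h_j].
W = {g: rng.normal(0, 0.3, (d, 2 * d)) for g in ("I", "O", "FL", "FR", "G")}
b = {g: np.zeros(d) for g in ("I", "O", "FL", "FR", "G")}

def tree_lstm(child_i, child_j):
    """Compose two children (h_i, c_i) and (h_j, c_j) into a parent (h, c),
    following the gate equations on the slide."""
    (h_i, c_i), (h_j, c_j) = child_i, child_j
    x = np.concatenate([h_i, h_j])
    i = sigmoid(W["I"] @ x + b["I"])
    o = sigmoid(W["O"] @ x + b["O"])
    fL = sigmoid(W["FL"] @ x + b["FL"])
    fR = sigmoid(W["FR"] @ x + b["FR"])
    g = np.tanh(W["G"] @ x + b["G"])
    c = fL * c_i + fR * c_j + i * g
    h = o * c  # as written on the slide; Tai et al. (2015) use o * tanh(c) here
    return h, c

def leaf(word_vec):
    """A leaf contributes its embedding as h and a zero memory cell."""
    return word_vec, np.zeros(d)

a, boy = leaf(rng.normal(size=d)), leaf(rng.normal(size=d))
h, c = tree_lstm(a, boy)  # representation of the reduced constituent "A boy"
print(h.shape, c.shape)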
  • 92. Model ā€¢ Shift reduce parsing drags his sleds (Aho and Ullman, 1972) Stack Queue REDUCE ā€¢ Compose top two elements of the stack with Tree LSTM (Tai et al., 2015 Zhu et al., 2015) Tree-LSTM(a, boy) ā€¢ Push the result back onto the stack A Tree boy
  • 93. Model ā€¢ Shift reduce parsing drags his sleds (Aho and Ullman, 1972) Stack Queue Tree-LSTM(a, boy) A Tree boy
  • 94. Model ā€¢ Shift reduce parsing his sleds (Aho and Ullman, 1972) Stack Queue Tree-LSTM(a, boy) SHIFT drags A Tree boy drags
  • 95. Model ā€¢ Shift reduce parsing (Aho and Ullman, 1972) Stack Queue REDUCE Tree-LSTM(Tree-LSTM(a,boy), Tree-LSTM(drags, Tree-LSTM(his,sleds))) A Tree boy drags his sleds
  • 96. ā€¢ Shift reduce parsing Model 1 2 4 5 3 6 7 1 2 4 6 3 7 5 1 2 3 4 7 5 6 1 2 3 6 4 5 7 S, S, R, S, S, R, R S, S, S, R, R, S, R S, S, R, S, R, S, R S, S, S, S, R, R, R Different Shift/Reduce sequences lead to different tree structures (Aho and Ullman, 1972)
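The four action sequences on the slide above (S,S,R,S,S,R,R and so on) each build a different tree over the same four leaves. The small sketch below makes that mapping explicit over four generic tokens; at each R the model would invoke the Tree-LSTM composition, while here the reduce step just nests tuples.

def parse(tokens, actions):
    """Apply a shift ('S') / reduce ('R') string to a list of tokens.
    SHIFT moves the next queue token onto the stack; REDUCE combines the top two
    stack items (where the model would call the Tree-LSTM composition)."""
    stack, queue = [], list(tokens)
    for a in actions:
        if a == "S":
            stack.append(queue.pop(0))
        else:
            right, left = stack.pop(), stack.pop()
            stack.append((left, right))
    assert len(stack) == 1 and not queue, "ill-formed action sequence"
    return stack[0]

tokens = ["w1", "w2", "w3", "w4"]
for seq in ["SSRSSRR", "SSSRRSR", "SSRSRSR", "SSSSRRR"]:
    print(seq, parse(tokens, seq))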
  • 97. Learning ā€¢ How do we learn the policy for the shift reduce sequence? ā€¢ Supervised learning! Imitation learning! ā€¢ But we need a treebank. ā€¢ What if we donā€™t have labels?
  • 98. Reinforcement Learning • Given a state and a set of possible actions, an agent needs to decide which action is the best to take • There is no supervision, only rewards, which may not be observed until several actions have been taken
  • 99. Reinforcement Learning • Given a state and a set of possible actions, an agent needs to decide which action is the best to take • There is no supervision, only rewards, which may not be observed until several actions have been taken • A Shift-Reduce "agent": State = embeddings of the top two elements of the stack and the embedding of the head of the queue; Actions = shift, reduce; Reward = log likelihood on a downstream task given the produced representation
  • 100. Reinforcement Learning • We use a simple policy gradient method, REINFORCE (Williams, 1992) • The goal of training is to train a policy network π(a | s; W_R) to maximize the expected reward R(W) = E_{π(a,s;W_R)} [ Σ_{t=1}^{T} r_t a_t ] • Other techniques for approximating the marginal likelihood are available (Guu et al., 2017)
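A minimal sketch of the REINFORCE update for the shift/reduce policy follows. The state features, the legality masking, and the reward are stand-ins (the real reward is the downstream log likelihood given the produced representation); only the score-function gradient and the reward-scaled update reflect the method named above.

import numpy as np

rng = np.random.default_rng(4)
d, n_actions = 8, 2                       # feature size; actions are SHIFT=0, REDUCE=1
W_R = rng.normal(0, 0.1, (n_actions, d))  # policy parameters
lr = 0.01

def policy(features):
    scores = W_R @ features
    e = np.exp(scores - scores.max())
    return e / e.sum()                    # pi(a | s; W_R)

def run_episode(n_tokens):
    """Sample one complete shift/reduce sequence; return (grad of log pi, reward)."""
    stack_size, remaining = 0, n_tokens
    grad = np.zeros_like(W_R)
    while remaining > 0 or stack_size > 1:
        s = rng.normal(size=d)            # stand-in for stack-top and queue-head embeddings
        if remaining == 0:                # queue empty: must REDUCE
            a = 1
        elif stack_size < 2:              # too few stack items: must SHIFT
            a = 0
        else:
            probs = policy(s)
            a = rng.choice(n_actions, p=probs)
            one_hot = np.zeros(n_actions)
            one_hot[a] = 1.0
            grad += np.outer(one_hot - probs, s)  # grad of log pi(a | s) for a linear softmax policy
        if a == 0:
            stack_size += 1
            remaining -= 1
        else:
            stack_size -= 1
    reward = rng.normal()                 # stand-in for the downstream log likelihood
    return grad, reward

for _ in range(5):
    grad, reward = run_episode(n_tokens=5)
    W_R += lr * reward * grad             # REINFORCE: scale the score-function gradient by the reward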
  • 101-102. Stanford Sentiment Treebank (Socher et al., 2013)
      Method | Accuracy
      Naive Bayes (from Socher et al., 2013) | 81.8
      SVM (from Socher et al., 2013) | 79.4
      Average of Word Embeddings (from Socher et al., 2013) | 80.1
      Bayesian Optimization (Yogatama et al., 2015) | 82.4
      Weighted Average of Word Embeddings (Arora et al., 2017) | 82.4
      Left-to-Right LSTM | 84.7
      Right-to-Left LSTM | 83.9
      Bidirectional LSTM | 84.7
      Supervised Syntax | 85.3
      Semi-supervised Syntax | 86.1
      Latent Syntax | 86.5
  • 103. Examples of Learned Structures intuitive structures
  • 105. • The learned trees look "non-linguistic" • But downstream performance is great! • They ignore how the sentence is generated in favor of structures that are optimal for interpretation • Natural languages must balance both. Grammar Induction Summary
  • 106. ā€¢ Do we need better bias in our models? ā€¢ Yes! They are making the wrong generalizations, even from large data. ā€¢ Do we have to have the perfect model? ā€¢ No! Small steps in the right direction can pay big dividends. Linguistic Structure Summary