Noah A Smith - 2017 - Invited Keynote: Squashing Computational Linguistics

Invited Keynote by Professor Noah A Smith (University of Washington) for ACL 2017. 1 Aug 2017. Vancouver, Canada.

Also available at https://homes.cs.washington.edu/~nasmith/slides/acl-8-1-17.pdf

License: CC BY 4.0



  1. Squashing Computational Linguistics Noah A. Smith Paul G. Allen School of Computer Science & Engineering University of Washington Seattle, USA @nlpnoah Research supported in part by: NSF, DARPA DEFT, DARPA CWC, Facebook, Google, Samsung, University of Washington.
  2. data
  3. Applications of NLP in 2017 • Conversation, IE, MT, QA, summarization, text categorization
  4. Applications of NLP in 2017 • Conversation, IE, MT, QA, summarization, text categorization • Machine-in-the-loop tools for (human) authors Elizabeth Clark Collaborate with an NLP model through an “exquisite corpse” storytelling game Chenhao Tan Revise your message with help from NLP tremoloop.com
  5. Applications of NLP in 2017 • Conversation, IE, MT, QA, summarization, text categorization • Machine-in-the-loop tools for (human) authors • Analysis tools for measuring social phenomena Lucy Lin Sensationalism in science news bit.ly/sensational-news … bookmark this survey! Dallas Card Track ideas, propositions, frames in discourse over time
  6. data ?
  7. Squash
  8. Squash Networks • Parameterized differentiable functions composed out of simpler parameterized differentiable functions, some nonlinear
  9. Squash Networks • Parameterized differentiable functions composed out of simpler parameterized differentiable functions, some nonlinear From Jack (2010), Dynamic System Modeling and Control, goo.gl/pGvJPS *Yes, rectified linear units (relus) are only half-squash; hat-tip Martha White.
  10. Squash Networks • Parameterized differentiable functions composed out of simpler parameterized differentiable functions, some nonlinear • Estimate parameters using Leibniz (1676) From existentialcomics.com
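A minimal sketch of the "squash network" idea in Python/numpy (all names and sizes here are illustrative, not from the talk): simple parameterized differentiable functions, squashed by tanh, composed into a bigger one; the chain rule (the Leibniz joke) gives the gradients for parameter estimation.

```python
import numpy as np

rng = np.random.default_rng(0)

def squash_layer(W, b, x):
    # One parameterized differentiable function; tanh squashes
    # every coordinate into (-1, 1).
    return np.tanh(W @ x + b)

# Compose simpler squash functions into a bigger one.
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)

x = rng.normal(size=4)
y = squash_layer(W2, b2, squash_layer(W1, b1, x))

# Parameters (W1, b1, W2, b2) are estimated by gradient descent;
# the gradient of the composition comes from the chain rule.
```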
  11. Who wants an all-squash diet? wow very Cucurbita much festive many dropout
  12. Linguistic Structure Prediction input (text) output (structure)
  13. Linguistic Structure Prediction input (text) output (structure) sequences, trees, graphs, …
  14. “gold” output Linguistic Structure Prediction input (text) output (structure) sequences, trees, graphs, …
  15. “gold” output Linguistic Structure Prediction input representation input (text) output (structure) sequences, trees, graphs, …
  16. “gold” output Linguistic Structure Prediction input representation input (text) output (structure) clusters, lexicons, embeddings, … sequences, trees, graphs, …
  17. “gold” output Linguistic Structure Prediction input representation input (text) output (structure) training objective clusters, lexicons, embeddings, … sequences, trees, graphs, …
  18. “gold” output Linguistic Structure Prediction input representation input (text) output (structure) training objective clusters, lexicons, embeddings, … sequences, trees, graphs, … probabilistic, cost-aware, …
  19. “gold” output Linguistic Structure Prediction input representation input (text) part representations output (structure) training objective clusters, lexicons, embeddings, … sequences, trees, graphs, … probabilistic, cost-aware, …
  20. “gold” output Linguistic Structure Prediction input representation input (text) part representations output (structure) training objective clusters, lexicons, embeddings, … segments/spans, arcs, graph fragments, … sequences, trees, graphs, … probabilistic, cost-aware, …
  21. “gold” output Linguistic Structure Prediction input representation input (text) part representations output (structure) training objective
  22. “gold” output Linguistic Structure Prediction input representation input (text) part representations output (structure) training objective error definitions & weights regularization annotation conventions & theory constraints & independence assumptions data selection “task”
  23. Inductive Bias • What does your learning algorithm assume? • How will it choose among good predictive functions? See also: No Free Lunch Theorem (Mitchell, 1980; Wolpert, 1996)
  24. data bias
  25. Three New Models • Parsing sentences into predicate-argument structures • Fillmore frames • Semantic dependency graphs • Language models that dynamically track entities
  26. When Democrats wonder why there is so much resentment of Clinton, they don’t need to look much further than the Big Lie about philandering that Stephanopoulos, Carville helped to put over in 1992. Original story on Slate.com: http://goo.gl/Hp89tD
  27-28. Frame-Semantic Analysis When Democrats wonder why there is so much resentment of Clinton, they don’t need to look much further than the Big Lie about philandering that Stephanopoulos, Carville helped to put over in 1992. FrameNet: https://framenet.icsi.berkeley.edu
  29. Frame-Semantic Analysis When Democrats wonder why there is so much resentment of Clinton, they don’t need to look much further than the Big Lie about philandering that Stephanopoulos, Carville helped to put over in 1992. cognizer: Democrats topic: why … Clinton FrameNet: https://framenet.icsi.berkeley.edu
  30. Frame-Semantic Analysis When Democrats wonder why there is so much resentment of Clinton, they don’t need to look much further than the Big Lie about philandering that Stephanopoulos, Carville helped to put over in 1992. cognizer: Democrats topic: why … Clinton explanation: why degree: so much content: of Clinton experiencer: ? helper: Stephanopoulos … Carville goal: to put over time: in 1992 benefited_party: ? FrameNet: https://framenet.icsi.berkeley.edu landmark event: Democrats … Clinton trajector event: they … 1992 entity: so … Clinton degree: so mass: resentment of Clinton time: When … Clinton required situation: they … to look … 1992 time: When … Clinton cognizer agent: they ground: much … 1992 sought entity: ? topic: about … 1992 trajector event: the Big Lie … over landmark period: 1992
  31. FrameNet: https://framenet.icsi.berkeley.edu brood, consider, contemplate, deliberate, … appraise, assess, evaluate, … commit to memory, learn, memorize, … agonize, fret, fuss, lose sleep, … translate bracket, categorize, class, classify
  32. Frame-Semantic Analysis When Democrats wonder why there is so much resentment of Clinton, they don’t need to look much further than the Big Lie about philandering that Stephanopoulos, Carville helped to put over in 1992. cognizer: Democrats topic: why … Clinton explanation: why degree: so much content: of Clinton experiencer: ? helper: Stephanopoulos … Carville goal: to put over time: in 1992 benefited_party: ? FrameNet: https://framenet.icsi.berkeley.edu landmark event: Democrats … Clinton trajector event: they … 1992 entity: so … Clinton degree: so mass: resentment of Clinton time: When … Clinton required situation: they … to look … 1992 time: When … Clinton cognizer agent: they ground: much … 1992 sought entity: ? topic: about … 1992 trajector event: the Big Lie … over landmark period: 1992
  33. When Democrats wonder[Cogitation] why there is so much resentment of Clinton, they don’t need … words + frame
  34. When Democrats wonder[Cogitation] why there is so much resentment of Clinton, they don’t need … biLSTM (contextualized word vectors) words + frame
  35. parts: segments up to length d scored by another biLSTM, with labels When Democrats wonder[Cogitation] why there is so much resentment of Clinton, they don’t need … … biLSTM (contextualized word vectors) words + frame
  36. parts: segments up to length d scored by another biLSTM, with labels When Democrats wonder[Cogitation] why there is so much resentment of Clinton, they don’t need … … biLSTM (contextualized word vectors) words + frame output: covering sequence of nonoverlapping segments
  37. Segmental RNN (Lingpeng Kong, Chris Dyer, N.A.S., ICLR 2016) biLSTM (contextualized word vectors) input sequence parts: segments up to length d scored by another biLSTM, with labels training objective: log loss output: covering sequence of nonoverlapping segments, recovered in O(Ldn); see Sarawagi & Cohen, 2004 …
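A sketch of the O(Ldn) decode the slide refers to: a semi-Markov Viterbi recurrence over labeled segments of length at most d (after Sarawagi & Cohen, 2004). The scorer here is a random stand-in; in the model it is the segment biLSTM.

```python
import numpy as np

def segment_viterbi(n, d, labels, score):
    """Best-scoring covering sequence of nonoverlapping labeled segments.
    score(i, j, y) gives the score of segment [i, j) with label y; any
    scorer plugs in here (in the model it is the segment biLSTM).
    Runtime is O(L * d * n) for L labels and maximum segment length d."""
    best = np.full(n + 1, -np.inf)
    best[0] = 0.0
    back = [None] * (n + 1)
    for j in range(1, n + 1):
        for i in range(max(0, j - d), j):
            for y in labels:
                s = best[i] + score(i, j, y)
                if s > best[j]:
                    best[j], back[j] = s, (i, y)
    segments, j = [], n
    while j > 0:                      # walk backpointers to recover segments
        i, y = back[j]
        segments.append((i, j, y))
        j = i
    return best[n], segments[::-1]

# Toy run with a random stand-in scorer (labels from the running example):
rng = np.random.default_rng(0)
value, segments = segment_viterbi(
    n=6, d=3, labels=["cognizer", "topic", "∅"],
    score=lambda i, j, y: float(rng.normal()))
```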
  38-39. When Democrats wonder[Cogitation] why there is so much resentment of Clinton, they don’t need …
  40. When Democrats wonder[Cogitation] why there is so much resentment of Clinton, they don’t need … cognizer topic ∅ ∅ ∅
  41-44. When Democrats wonder[Cogitation] why there is so much resentment of Clinton, they don’t need … cognizer topic ∅ wonder[Cogitation] ∅ wonder[Cogitation] wonder[Cogitation] wonder[Cogitation] ∅
  45. Inference via dynamic programming in O(Ldn) When Democrats wonder[Cogitation] why there is so much resentment of Clinton, they don’t need … cognizer topic ∅ wonder[Cogitation] ∅ wonder[Cogitation] wonder[Cogitation] wonder[Cogitation] ∅
  46. Open-SESAME (Swabha Swayamdipta, Sam Thomson, Chris Dyer, N.A.S., arXiv:1706.09528) biLSTM (contextualized word vectors) words + frame parts: segments with role labels, scored by another biLSTM training objective: recall-oriented softmax margin (Gimpel et al., 2010) output: labeled argument spans When Democrats wonder[Cogitation] why there is so much resentment of Clinton, they don’t need … cognizer: Democrats topic: why there is so much resentment of Clinton …
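A toy version of the recall-oriented softmax-margin objective (Gimpel et al., 2010) over an enumerated candidate set; in Open-SESAME the sum runs over all segmentations via the dynamic program, and the cost values below are made up for illustration.

```python
import numpy as np

def softmax_margin_loss(scores, costs, gold):
    """Softmax-margin loss (Gimpel et al., 2010) over an enumerated
    candidate set: -score(gold) + logsumexp_y [score(y) + cost(y, gold)].
    A recall-oriented cost charges more for dropping a gold argument
    (false negative) than for predicting a spurious one, nudging the
    model toward recall."""
    return -scores[gold] + np.logaddexp.reduce(scores + costs)

# Toy numbers (illustrative): candidate 1 misses a gold argument,
# candidate 2 adds a spurious one; the miss costs more.
scores = np.array([2.0, 1.5, 1.0])
costs = np.array([0.0, 2.0, 1.0])
print(softmax_margin_loss(scores, costs, gold=0))
```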
  47. When Democrats wonder[Cogitation] why there is so much resentment of Clinton, they don’t need … cognizer topic ∅ wonder[Cogitation] ∅ wonder[Cogitation] wonder[Cogitation] wonder[Cogitation] ∅
  48. When Democrats wonder[Cogitation] why there is so much resentment of Clinton, they don’t need … cognizer topic ∅ wonder[Cogitation] ∅ wonder[Cogitation] wonder[Cogitation] wonder[Cogitation] ∅ syntax features?
  49. When Democrats wonder[Cogitation] why there is so much resentment of Clinton, they don’t need … yes no yes yes no Penn Treebank (Marcus et al., 1993)
  50. When Democrats wonder[Cogitation] why there is so much resentment of Clinton, they don’t need … cognizer topic ∅ wonder[Cogitation] ∅ wonder[Cogitation] wonder[Cogitation] wonder[Cogitation] ∅ main task scaffold task yes no yes yes no
  51. Multitask Representation Learning (Caruana, 1997) “gold” output input representation input (text) part representations output (structure) training objective “gold” output input representation input (text) part representations output (structure) training objective main task: find and label semantic arguments; scaffold task: predict syntactic constituents; shared; training datasets need not overlap; output structures need not be consistent
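A minimal numpy sketch of the parameter sharing on this slide (all shapes and names invented for illustration): one shared input encoder, one head per task; each task has its own data and loss, and only the encoder's parameters are shared.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared input representation (a stand-in for the shared biLSTM encoder).
W_shared = 0.1 * rng.normal(size=(16, 8))
# One head per task; only W_shared is common to both.
W_main = 0.1 * rng.normal(size=(5, 16))      # main task: argument labels
W_scaffold = 0.1 * rng.normal(size=(2, 16))  # scaffold: constituent yes/no

def task_loss(W_head, x, gold):
    h = np.tanh(W_shared @ x)                # shared across tasks
    z = W_head @ h
    logp = z - z.max() - np.log(np.exp(z - z.max()).sum())
    return -logp[gold]

# One example from each dataset; the datasets need not overlap, and the
# two output structures need not be consistent. Gradients from both
# losses flow into W_shared during training.
x_main, x_scaffold = rng.normal(size=8), rng.normal(size=8)
total = task_loss(W_main, x_main, 2) + task_loss(W_scaffold, x_scaffold, 1)
```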
  52-53. [Bar chart] F1 on frame-semantic parsing (frames & arguments), FrameNet 1.5 test set: SEMAFOR 1.0 (Das et al., 2014); SEMAFOR 2.0 (Kshirsagar et al., 2015); Open-SESAME (ours), … with syntactic scaffold, … with syntax features; Framat (Roth, 2016); FitzGerald et al. (2015), single model and ensemble; open-source systems marked.
  54. biLSTM (contextualized word vectors) words + frame training objective: recall-oriented softmax margin (Gimpel et al., 2010) output: labeled argument spans When Democrats wonder[Cogitation] why there is so much resentment of Clinton, they don’t need … cognizer: Democrats topic: why there is so much resentment of Clinton … segments get scores Bias? parts: segments with role labels, scored by another biLSTM
  55. biLSTM (contextualized word vectors) words + frame training objective: recall-oriented softmax margin (Gimpel et al., 2010) output: labeled argument spans When Democrats wonder[Cogitation] why there is so much resentment of Clinton, they don’t need … cognizer: Democrats topic: why there is so much resentment of Clinton … segments get scores syntactic scaffold Bias? parts: segments with role labels, scored by another biLSTM
  56. biLSTM (contextualized word vectors) words + frame training objective: recall-oriented softmax margin (Gimpel et al., 2010) output: labeled argument spans When Democrats wonder[Cogitation] why there is so much resentment of Clinton, they don’t need … cognizer: Democrats topic: why there is so much resentment of Clinton … segments get scores recall-oriented cost syntactic scaffold Bias? parts: segments with role labels, scored by another biLSTM
  57. Semantic Dependency Graphs (DELPH-IN minimal recursion semantics-derived representation; “DM”) Oepen et al. (SemEval 2014; 2015), see also http://sdp.delph-in.net
  58. When Democrats wonder why there is so much resentment of Clinton, they don’t need … Democrats wonder arg1 tanh(C[h_wonder; h_Democrats] + b) · arg1
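A toy rendering of the slide's part scorer, tanh(C[h_wonder; h_Democrats] + b) · arg1, in numpy (dimensions and the label-vector dictionary are illustrative): concatenate the two contextualized word vectors, squash, and dot with a label vector.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8  # toy dimensionality

C = 0.1 * rng.normal(size=(dim, 2 * dim))
b = np.zeros(dim)
label_vectors = {"arg1": 0.1 * rng.normal(size=dim)}  # one vector per label

def arc_score(h_head, h_dep, label):
    # tanh(C [h_head; h_dep] + b) · label, as on the slide.
    hidden = np.tanh(C @ np.concatenate([h_head, h_dep]) + b)
    return hidden @ label_vectors[label]

h_wonder, h_Democrats = rng.normal(size=dim), rng.normal(size=dim)
s = arc_score(h_wonder, h_Democrats, "arg1")  # score of wonder -arg1-> Democrats
```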
  59. When Democrats wonder why there is so much resentment of Clinton, they don’t need … [Graph figure: labeled bilexical dependencies over word pairs such as Democrats → wonder, resentment → of, of → Clinton, with labels arg1, arg2, comp_so, …]
  60. When Democrats wonder why there is so much resentment of Clinton, they don’t need … [Graph figure: as above, with further arcs wonder → why, wonder → there, wonder → is added]
  61. When Democrats wonder why there is so much resentment of Clinton, they don’t need … Inference via AD3 (alternating directions dual decomposition; Martins et al., 2014) [Graph figure: the full labeled semantic dependency graph]
  62. Neurboparser (Hao Peng, Sam Thomson, N.A.S., ACL 2017) biLSTM (contextualized word vectors) words parts: labeled bilexical dependencies training objective: structured hinge loss output: labeled semantic dependency graph with constraints When Democrats wonder why there is so much resentment of Clinton, they don’t need … …
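A toy version of the structured hinge objective named on this slide, over an enumerated candidate set; in the parser the max is a cost-augmented decode over whole graphs (via AD3), not an enumeration, and the numbers below are made up.

```python
import numpy as np

def structured_hinge(scores, costs, gold):
    """Structured hinge loss over an enumerated candidate set:
    max_y [score(y) + cost(y, gold)] - score(gold).
    In the parser the max is a cost-augmented decode over graphs
    (not an enumeration); these toy numbers are illustrative."""
    return max(float(np.max(scores + costs) - scores[gold]), 0.0)

scores = np.array([3.0, 2.5, 1.0])   # candidate graph scores
costs = np.array([0.0, 1.0, 2.0])    # Hamming-style cost against gold
print(structured_hinge(scores, costs, gold=0))
```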
  63. When Democrats wonder why there is so much resentment of Clinton, they don’t need … Three Formalisms, Three Separate Parsers Democrats wonder arg1 formalism 3 (PSD) formalism 2 (PAS) formalism 1 (DM) Democrats wonder arg1 Democrats wonder act
  64. When Democrats wonder why there is so much resentment of Clinton, they don’t need … Shared Input Representations (Daumé, 2007) Democrats wonder arg1 shared across all formalism 3 (PSD) formalism 2 (PAS) formalism 1 (DM) Democrats wonder arg1 Democrats wonder act
  65. When Democrats wonder why there is so much resentment of Clinton, they don’t need … Cross-Task Parts Democrats wonder arg1 Democrats wonder arg1 Democrats wonder act shared across all
  66. When Democrats wonder why there is so much resentment of Clinton, they don’t need … Both: Shared Input Representations & Cross-Task Parts Democrats wonder arg1 shared across all formalism 3 (PSD) formalism 2 (PAS) formalism 1 (DM) Democrats wonder arg1 Democrats wonder act
  67. Multitask Learning: Many Possibilities • Shared input representations, parts? Which parts? • Joint decoding? • Overlapping training data? • Scaffold tasks? “gold” output “gold” output “gold” output formalism 1 (DM) formalism 2 (PAS) formalism 3 (PSD)
  68. [Bar chart] F1 averaged over three semantic dependency parsing formalisms, SemEval 2015 test set, WSJ (in-domain) and Brown (out-of-domain): Du et al., 2015; Almeida & Martins, 2015 (no syntax); Almeida & Martins, 2015 (syntax); Neurboparser (ours); … with shared input representations and cross-task parts.
  69. [Bar chart] Neurboparser F1 on three semantic dependency parsing formalisms (DM, PAS, PSD), SemEval 2015 test set, WSJ (in-domain) vs. Brown (out-of-domain). good enough?
  70. biLSTM (contextualized word vectors) words training objective: structured hinge loss output: labeled semantic dependency graph with constraints When Democrats wonder why there is so much resentment of Clinton, they don’t need … … Bias? cross-formalism sharing parts: labeled bilexical dependencies
  71. Text
  72. Text ≠ Sentences larger context
  73-74. Generative Language Models history next word p(W | history)
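A generic sketch of p(W | history) with an RNN language model (sizes are toy; this is the standard setup the slide gestures at, not code from the talk): the history is squashed into a state vector, from which the next-word distribution is a softmax.

```python
import numpy as np

rng = np.random.default_rng(0)
V, dim = 1000, 32                      # toy vocabulary and state sizes
E = 0.1 * rng.normal(size=(V, dim))    # word embeddings
W_h = 0.1 * rng.normal(size=(dim, dim))
W_out = 0.1 * rng.normal(size=(V, dim))

def next_word_distribution(history_ids):
    """p(W | history): squash the history into a state, then softmax."""
    h = np.zeros(dim)
    for w in history_ids:
        h = np.tanh(W_h @ h + E[w])    # recurrent state update
    z = W_out @ h
    p = np.exp(z - z.max())
    return p / p.sum()

p = next_word_distribution([3, 17, 42])   # toy word ids
```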
  75. Entity Language Model (Yangfeng Ji, Chenhao Tan, Sebastian Martschat, Yejin Choi, N.A.S., EMNLP 2017) When Democrats wonder why there is so much resentment of
  76. Entity Language Model (Yangfeng Ji, Chenhao Tan, Sebastian Martschat, Yejin Choi, N.A.S., EMNLP 2017) When Democrats wonder why there is so much resentment of entity 1
  77. Entity Language Model (Yangfeng Ji, Chenhao Tan, Sebastian Martschat, Yejin Choi, N.A.S., EMNLP 2017) When Democrats wonder why there is so much resentment of Clinton, 1. new entity with new vector 2. mention word will be “Clinton” entity 1 entity 2
  78. Entity Language Model (Yangfeng Ji, Chenhao Tan, Sebastian Martschat, Yejin Choi, N.A.S., EMNLP 2017) When Democrats wonder why there is so much resentment of Clinton, entity 1 entity 2
  79. Entity Language Model (Yangfeng Ji, Chenhao Tan, Sebastian Martschat, Yejin Choi, N.A.S., EMNLP 2017) When Democrats wonder why there is so much resentment of Clinton, they 1. coreferent of entity 1 (previously known as “Democrats”) 2. mention word will be “they” 3. embedding of entity 1 will be updated entity 1 entity 2
  80. Entity Language Model (Yangfeng Ji, Chenhao Tan, Sebastian Martschat, Yejin Choi, N.A.S., EMNLP 2017) When Democrats wonder why there is so much resentment of Clinton, they entity 1 entity 2
  81. Entity Language Model (Yangfeng Ji, Chenhao Tan, Sebastian Martschat, Yejin Choi, N.A.S., EMNLP 2017) When Democrats wonder why there is so much resentment of Clinton, they don’t 1. not part of an entity mention 2. “don’t” entity 1 entity 2
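A toy rendering of the entity-tracking machinery walked through on these slides (the gated update below is illustrative, not the paper's exact parameterization): one vector per entity, created at a new mention, retrieved for a coreferent mention, and updated after every mention.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16  # toy entity-vector size

class EntityMemory:
    """A vector per entity: created at a new mention, retrieved for a
    coreferent mention, updated after every mention."""
    def __init__(self):
        self.vectors = []

    def new_entity(self):
        self.vectors.append(0.1 * rng.normal(size=dim))
        return len(self.vectors) - 1          # entity index

    def update(self, idx, mention_vec, gate=0.5):
        # Blend the old entity vector with the new mention's vector.
        v = gate * self.vectors[idx] + (1 - gate) * mention_vec
        self.vectors[idx] = v / np.linalg.norm(v)

memory = EntityMemory()
e1 = memory.new_entity()                  # “Democrats” starts entity 1
e2 = memory.new_entity()                  # “Clinton” starts entity 2
memory.update(e1, rng.normal(size=dim))   # “they” corefers with entity 1
```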
  82. [Bar charts] Perplexity on CoNLL 2012 test set: 5-gram LM, RNN LM, entity language model. CoNLL 2012 coreference evaluation (MUC, B3, CEAF, CoNLL F1): Martschat and Strube (2015) vs. reranked with entity LM. Accuracy on InScript: always new, shallow features, Modi et al. (2017), entity LM, human.
  83. history next word p(W | history) entities Bias?
  84. Bias in the Future? • Linguistic scaffold tasks.
  85. Bias in the Future? • Linguistic scaffold tasks. • Language is by and about people.
  86. Bias in the Future? • Linguistic scaffold tasks. • Language is by and about people. • NLP is needed when texts are costly to read.
  87. Bias in the Future? • Linguistic scaffold tasks. • Language is by and about people. • NLP is needed when texts are costly to read. • Polyglot learning.
  88. data bias
  89. Thank you!
