K-SRL: Instance-based Learning for Semantic Role Labeling
Alan Akbik and Yunyao Li
IBM Research | Almaden
COLING 2016, 12/13/2016
SystemT
In Short
• K-SRL utilizes a simple but effective approach for SRL
– Instance-based learning
– kNN: k-nearest neighbors
• State-of-the-art
– CoNLL-09 shared task data
– Outperforms previous SotA systems based on logistic regression and NNs
– Both in-domain and out-of-domain data
Outline
• Problem Statement
• Proposed Approach
• Evaluation
• Discussion
Semantic Role Labeling
• Identify predicate-argument structures in sentences with shallow semantic labels
• Identify which predicates evoke which semantic frames
• Identify constituents that take semantic roles within these frames
• SRL focuses on semantics, not syntax
• Useful for many applications (Shen and Lapata, 2007; Maqsud et al., 2014)
• Example frames:
– Break.01: A0 – Breaker, A1 – Thing broken, A2 – Instrument, A3 – Pieces
– Break.15: A0 – Journalist, exposer; A1 – Story, thing exposed
• Example annotations (Break.01):
– Dirk [A0] broke [Break.01] the window [A1] with a hammer [A2].
– The window [A1] was broken [Break.01] by Dirk [A0].
– The window [A1] broke [Break.01].
SRL for NLP tasks
• Question Answering
– “What did Dirk break?”
– (Maqsud et al., 2014)
• Information Extraction
– “Who wants to buy what?”
– (Shen and Lapata, 2007)
• At IBM Research:
– Multilingual SRL
– Multilingual text analytics
SRL vs. Dependency Parsing
• LAS of state-of-the-art dependency parsers
– 92.05 (Weiss et al., 2015)
– 92.36 (Alberti et al., 2015)
– 92.79 (Andor et al., 2016)
• F1 of state-of-the-art SRL
– 87.79 (Roth and Woodsend, 2014)
– 88.19 (Roth and Lapata, 2016)
• SotA SRL quality lower than SotA dependency parsing
SRL Challenges
• So, what makes SRL so difficult?
• Heavy-tailed distribution of class labels
– Common frames
• say.01 (8243), have.01 (2040), sell.01 (1009)
– Many uncommon frames
• swindle.01, feed.01, hum.01, toast.01
– Almost half of all frames seen fewer than 3 times in training data
• Many low-frequency exceptions
– Difficult to capture in model
[Chart: Distribution of frame labels in the training data]
Low-frequency Exceptions
• Strong correlation between the syntactic function of an argument and its role
• Example: passive subject
• CoNLL-09 Shared Task data:
– 86% of passive subjects are labeled A1
– Remaining 14% irregular, low-frequency exceptions
• Examples (passive subjects):
– The window [A1, SBJ] was broken by Dirk.
– The silver [A1, SBJ] was sold by the man.
– Creditors [SBJ] were told to hold off.
• TELL.01: A0 – speaker (agent), A1 – utterance (topic), A2 – hearer (recipient)
Local Bias
• 86% of passive subjects are labeled A1 (over 4,000 times in training data)
• 87% of passive subjects of Tell.01 are labeled A2 (53 times in training data)
• Most classifiers:
– Bag-of-features
– Learn weights mapping features to classes
– Perform generalization
• Question: How do we explicitly capture low-frequency exceptions?
Outline
• Problem Statement
• Proposed Approach
• Evaluation
• Discussion
Instance-based Learning
• Proposed approach: Instance-based learning
– kNN: k-Nearest Neighbors classification
– Find the k most similar instances in training data
– Derive class label from nearest neighbors
[Figure: training instances labeled A0, A1, A2, ordered by distance 1, 2, 3, …, n from the query instance]
• Example: Creditors were told to hold off. ("Creditors" is the passive subject of TELL.01)
• Composite features and their distance from the query:
– Distance 1: "creditor" passive subject of TELL.01
– Distance 2: noun passive subject of TELL.01
– …
– Distance n: any passive subject of any agentive verb
– Main idea: Back off to a composite feature seen at least k times (see the sketch below)
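The backoff step can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's implementation: training_index (a map from composite-feature values to the role labels observed with them in training), backoff_sequence (feature extractors ordered from most specific to most generic) and the fallback return value are assumed names.

from collections import Counter

def classify_argument(instance, training_index, backoff_sequence, k=3):
    """Label one argument by backing off through composite features.

    backoff_sequence: feature extractors, most specific first (e.g. head
    lemma + voice + path + frame) down to the most generic composite.
    training_index: composite-feature value -> list of role labels seen
    with that value in the training data.
    """
    for extract in backoff_sequence:
        feature = extract(instance)
        neighbors = training_index.get(feature, [])
        if len(neighbors) >= k:                     # enough support: stop backing off
            counts = Counter(neighbors)
            label, freq = counts.most_common(1)[0]  # majority label in the neighborhood
            return label, freq / len(neighbors)     # label and a confidence estimate
    return None, 0.0                                # nothing was seen at least k times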
System Outline
• kNN classifier for both tasks
– Predicate labeling: Subcategorization features (subject, object, PPs)
– Argument labeling: Predicate frame, predicate POS, predicate voice, argument head lemma, argument head POS, syntactic function of argument
• Additionally model global argument constraints
– Easy-First Argument Labeling
• Two steps: 1. predicate labeling, 2. argument labeling
– Example: Creditors were told to hold off. → TELL.01 (Creditors = A2, "to hold off" = A1) and HOLD.01 (Creditors = A0)
Easy-First Argument Labeling
• Core roles can only be assigned once per predicate
• Example (TELL.01: A0 – speaker/agent, A1 – utterance/topic, A2 – hearer/recipient):
– Creditors were told to hold off. → passive subject: A2 – 86%, A1 – 14%
– Interesting stories were told to creditors. → passive subject: A2 – 86%, A1 – 14%; "to creditors": A2 – 100%
• Easy-first:
– Order argument classifications by confidence
– Make most confident prediction first
– Remove assigned labels from the remaining options (see the sketch below)
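A minimal Python sketch of easy-first labeling under this constraint. The classify(arg, allowed) callback and the toy scores, which simply mirror the percentages on this slide, are illustrative assumptions rather than the system's actual model.

CORE_ROLES = frozenset({"A0", "A1", "A2", "A3", "A4"})

def easy_first_label(arguments, classify):
    """Assign roles to the arguments of one predicate, most confident first.

    classify(arg, allowed) returns (label, confidence) restricted to the
    labels still in `allowed`; each core role is removed once assigned, so
    it is used at most once per predicate (modifier roles, omitted here,
    would stay reusable).
    """
    allowed = set(CORE_ROLES)
    assigned, remaining = {}, list(arguments)
    while remaining:
        # re-score every still-unlabeled argument against the allowed labels
        scored = [(arg, *classify(arg, allowed)) for arg in remaining]
        arg, label, _conf = max(scored, key=lambda t: t[2])
        assigned[arg] = label
        allowed.discard(label)      # core role used up for this predicate
        remaining.remove(arg)
    return assigned

# Toy scores mirroring the slide: "to creditors" is clearly A2, while the
# passive subject prefers A2 (86%) over A1 (14%) before constraints apply.
def toy_classify(arg, allowed):
    prefs = {"interesting stories": [("A2", 0.86), ("A1", 0.14)],
             "to creditors": [("A2", 1.0), ("A1", 0.1)]}
    return max((p for p in prefs[arg] if p[0] in allowed), key=lambda p: p[1])

print(easy_first_label(["interesting stories", "to creditors"], toy_classify))
# -> {'to creditors': 'A2', 'interesting stories': 'A1'}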
Outline
• Problem Statement
• Proposed Approach
• Evaluation
• Discussion
Evaluation
• Comparison to previous SotA approaches:
– Chen (Zhao et al., 2009), maximum entropy
– Che (Che et al., 2009), support vector machines
– MatePlus (Roth and Woodsend, 2014), logistic regression and word embeddings
– PathLSTM (Roth and Lapata, 2016), logistic regression for predicates and NNs with word embeddings for arguments
• Additional questions:
– Impact of choice of k?
– Overfitting?
– Impact of modeling global argument constraints?
• Dataset: CoNLL-09 Shared Task data
– Penn Treebank with PropBank SRL annotations
– NomBank annotations not evaluated
– Out-of-domain data: Brown corpus
Results
[Table: in-domain and out-of-domain results compared to previous systems]
Results
• Significantly outperform previous approaches
– Especially on out-of-domain data
• Small neighborhoods suffice
• Very simple feature set
– No word embeddings or verb classes
Discussion
• Pros:
– Simple, highly transparent classification approach
• Composite features make feature interactions explicit
• Nearest neighborhood makes classification decision transparent
– Very fast to train and test
– Fast and transparent: Great for feature engineering
• Cons:
– Instance-based learning requires more feature engineering
• Define composites
• “Rank” composites into backoff sequences
Current and Future Work
• Learning instance similarity measures
– Manual definition of composites, but automatic ranking
– Improves quality over published numbers
• Better generalization features
– Word-level: Embeddings
– Predicate-level: “Frame classes”
• Special focus on raised arguments
• Examples:
– The man [A0, SBJ] broke the window [OBJ].
– The rock [A2, SBJ] broke the window [OBJ].
– Creditors [SBJ] were told to hold off. (raised argument)
Current Related Work at IBM Research
• Multilingual SRL
– Creating “Universal Proposition Banks” on top of Universal Dependencies
– Version 1.0 released: https://github.com/System-T/UniversalPropositions/
– COLING paper “Multilingual Aliasing for Auto-Generating Proposition Banks”
• Poster session on Friday
• Multilingual text analytics
– COLING demo “Multilingual Information Extraction with PolyglotIE”
• Demo session on Friday
Questions?
Instance-based Semantic Role Labeling
• Problem: Current SRL is error-prone
– Strong local bias
– Many low-frequency exceptions, difficult to generalize
• K-SRL: Instance-based learning
– K-nearest neighbors
• Composite features
• Significantly outperforms all other current systems
[Figure: in-domain and out-of-domain results]
Instance-based Learning
• Most ML algorithms
– Bag of features: Syntactic function, predicate, voice …
– Learn weights for features to classes
• Strong weight from “passive subject” to A1
– Perform generalization
• Proposed approach: Instance-based learning
– kNN: k-Nearest Neighbors classification
– Find the k most similar instances in training data
• “all instances in which argument is a passive subject of TELL.01”
• Derive class label from majority in neighborhood
– No generalization
– Explicitly capture local bias using composite features
Instance-based Semantic Role Labeling
• Problem: Current SRL is error-prone
– Strong local bias, many low-frequency exceptions, difficult to generalize
Composite Features and Support
• Example: Creditors were told to hold off. ("Creditors" is the passive subject of TELL.01)
• TELL.01: A0 – speaker (agent), A1 – utterance (topic), A2 – hearer (recipient)
• Atomic features of the argument "Creditors" (see the sketch below):
– voice (V): passive
– predicate (P): tell
– path: VC SBJ
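These atomic features can be combined into the ordered sequence of composites that the neighborhood lookup backs off through. The sketch below is an assumption for illustration: the field names and the particular composites are not the paper's exact feature set.

def backoff_composites(arg):
    """Composite-feature values for one argument, most specific first.
    Each value could be looked up in a support index as in the earlier sketch."""
    return [
        ("lemma+voice+path+frame", arg["lemma"], arg["voice"], arg["path"], arg["frame"]),
        ("pos+voice+path+frame", arg["pos"], arg["voice"], arg["path"], arg["frame"]),
        ("voice+path+frame", arg["voice"], arg["path"], arg["frame"]),
        ("voice+path", arg["voice"], arg["path"]),
    ]

# The argument "Creditors" from the example sentence on this slide:
creditors = {"lemma": "creditor", "pos": "NNS", "voice": "passive",
             "path": "VC SBJ", "frame": "tell.01"}
for composite in backoff_composites(creditors):
    print(composite)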
ActionAPI - Demo
Semantic Role Labeling
• SRL: Who did what to whom, when, where and how?
• Example: John [A0] hastily [AM-MNR] ordered [order.02] a dozen dandelions [A1] for Mary [A2] from Amazon’s Flower Shop [A3].
• order.02 (request to be delivered): A0 – Orderer, A1 – Thing ordered, A2 – Benefactive (ordered-for), A3 – Source, AM-MNR – Manner
• Problem: What type of labels are valid across languages?
– Lexical, morphological and syntactic labels differ greatly
– But shallow semantic labels remain stable
• Core idea: Predict English Semantic Role Labels for arbitrary languages
SRL Resources
• English SRL
– FrameNet
– PropBank
• Frame set (examples):
– Buy.01: A0 – Buyer, A1 – Thing bought, A2 – Seller, A3 – Price paid, A4 – Benefactive
– Pay.01: A0 – Payer, A1 – Money, A2 – Being paid, A3 – Commodity
• Corpus of annotated text data (examples):
– The station [A0] bought [Buy.01] Cosby reruns [A1] for record prices [A3].
– … spurred [Spur.01] dollar [A1] buying [Buy.01] by Japanese institutions [A0].
– The company [A0] paid [Pay.01] 2 million US$ [A1] to buy [Buy.01] ….