K-SRL: Instance-based Learning for Semantic Role Labeling
Alan Akbik and Yunyao Li
IBM Research | Almaden
COLING 2016, 12/13/2016
SystemT
In Short
• K-SRL utilizes a simple but effective approach for SRL
– Instance-based learning
– kNN: k-nearest neighbors
• State-of-the-art
– CoNLL-09 shared task data
– Outperforms previous SotA systems based on logistic regression and NNs
– Both in-domain and out-of-domain data
Outline
• Problem Statement
• Proposed Approach
• Evaluation
• Discussion
Semantic Role Labeling
• Identify predicate-argument structures in sentences with shallow semantic labels
• Identify which predicates evoke which semantic frames
• Identify constituents that take semantic roles within these frames
• SRL focuses on semantics, not syntax
• Useful for many applications (Shen and Lapata, 2007; Maqsud et al., 2014)
• Example frames:
– Break.01: A0 – Breaker, A1 – Thing broken, A2 – Instrument, A3 – Pieces
– Break.15: A0 – Journalist, exposer; A1 – Story, thing exposed
• Example annotations (Break.01):
– Dirk [A0] broke [Break.01] the window [A1] with a hammer [A2].
– The window [A1] was broken [Break.01] by Dirk [A0].
– The window [A1] broke [Break.01].
SRL for NLP tasks
• Question Answering
– “What did Dirk break?”
– (Maqsud et al., 2014)
• Information Extraction
– “Who wants to buy what?”
– (Shen and Lapata, 2007)
• At IBM Research:
– Multilingual SRL
– Multilingual text analytics
SRL vs. Dependency Parsing
• LAS of state-of-the-art dependency parsers
– 92.05 (Weiss et al., 2015)
– 92.36 (Alberti et al., 2015)
– 92.79 (Andor et al., 2016)
• F1 of state-of-the-art SRL
– 87.79 (Roth and Woodsend, 2014)
– 88.19 (Roth and Lapata, 2016)
• SotA SRL quality lower than SotA dependency parsing
SRL Challenges
• So, what makes SRL so difficult?
• Heavy-tailed distribution of class labels
– Common frames
• say.01 (8243), have.01 (2040), sell.01 (1009)
– Many uncommon frames
• swindle.01, feed.01, hum.01, toast.01
– Almost half of all frames seen fewer than 3 times in training data
• Many low-frequency exceptions
– Difficult to capture in model
[Chart: Distribution of frame labels in the training data]
Low-frequency Exceptions
• Strong correlation between the syntactic function of an argument and its role
• Example: passive subject
• CoNLL-09 Shared Task data:
– 86% of passive subjects are labeled A1
– Remaining 14% irregular, low-frequency exceptions
• Examples (passive subjects):
– The window [A1, SBJ] was broken by Dirk.
– The silver [A1, SBJ] was sold by the man.
– Creditors [SBJ] were told to hold off.
• TELL.01: A0 – speaker (agent), A1 – utterance (topic), A2 – hearer (recipient)
Local Bias
• 86% of passive subjects are labeled A1 (over 4,000 times in training data)
• 87% of passive subjects of Tell.01 are labeled A2 (53 times in training data)
• Most classifiers:
– Bag-of-features
– Learn weights mapping features to classes
– Perform generalization
• Question: How do we explicitly capture low-frequency exceptions?
Outline
• Problem Statement
• Proposed Approach
• Evaluation
• Discussion
Instance-based Learning
• Proposed approach: Instance-based learning
– kNN: k-Nearest Neighbors classification
– Find the k most similar instances in training data
– Derive class label from nearest neighbors
[Figure: training instances labeled A0, A1, A2, ordered by distance 1, 2, 3, …, n from the query instance]
• Example: Creditors were told to hold off. ("Creditors" is the passive subject of TELL.01)
• Composite features and their distance from the query:
– Distance 1: "creditor" passive subject of TELL.01
– Distance 2: noun passive subject of TELL.01
– …
– Distance n: any passive subject of any agentive verb
– Main idea: Back off to a composite feature seen at least k times (see the sketch below)
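The backoff step can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's implementation: training_index (a map from composite-feature values to the role labels observed with them in training), backoff_sequence (feature extractors ordered from most specific to most generic) and the fallback return value are assumed names.

from collections import Counter

def classify_argument(instance, training_index, backoff_sequence, k=3):
    """Label one argument by backing off through composite features.

    backoff_sequence: feature extractors, most specific first (e.g. head
    lemma + voice + path + frame) down to the most generic composite.
    training_index: composite-feature value -> list of role labels seen
    with that value in the training data.
    """
    for extract in backoff_sequence:
        feature = extract(instance)
        neighbors = training_index.get(feature, [])
        if len(neighbors) >= k:                     # enough support: stop backing off
            counts = Counter(neighbors)
            label, freq = counts.most_common(1)[0]  # majority label in the neighborhood
            return label, freq / len(neighbors)     # label and a confidence estimate
    return None, 0.0                                # nothing was seen at least k times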
System Outline
• kNN classifier for both tasks
– Predicate labeling: Subcategorization features (subject, object, PPs)
– Argument labeling: Predicate frame, predicate POS, predicate voice, argument head lemma, argument head POS, syntactic function of argument
• Additionally model global argument constraints
– Easy-First Argument Labeling
• Two steps: 1. predicate labeling, 2. argument labeling
– Example: Creditors were told to hold off. → TELL.01 (Creditors = A2, "to hold off" = A1) and HOLD.01 (Creditors = A0)
Easy-First Argument Labeling
• Core roles can only be assigned once per predicate
• Example (TELL.01: A0 – speaker/agent, A1 – utterance/topic, A2 – hearer/recipient):
– Creditors were told to hold off. → passive subject: A2 – 86%, A1 – 14%
– Interesting stories were told to creditors. → passive subject: A2 – 86%, A1 – 14%; "to creditors": A2 – 100%
• Easy-first:
– Order argument classifications by confidence
– Make most confident prediction first
– Remove assigned labels from the remaining options (see the sketch below)
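A minimal Python sketch of easy-first labeling under this constraint. The classify(arg, allowed) callback and the toy scores, which simply mirror the percentages on this slide, are illustrative assumptions rather than the system's actual model.

CORE_ROLES = frozenset({"A0", "A1", "A2", "A3", "A4"})

def easy_first_label(arguments, classify):
    """Assign roles to the arguments of one predicate, most confident first.

    classify(arg, allowed) returns (label, confidence) restricted to the
    labels still in `allowed`; each core role is removed once assigned, so
    it is used at most once per predicate (modifier roles, omitted here,
    would stay reusable).
    """
    allowed = set(CORE_ROLES)
    assigned, remaining = {}, list(arguments)
    while remaining:
        # re-score every still-unlabeled argument against the allowed labels
        scored = [(arg, *classify(arg, allowed)) for arg in remaining]
        arg, label, _conf = max(scored, key=lambda t: t[2])
        assigned[arg] = label
        allowed.discard(label)      # core role used up for this predicate
        remaining.remove(arg)
    return assigned

# Toy scores mirroring the slide: "to creditors" is clearly A2, while the
# passive subject prefers A2 (86%) over A1 (14%) before constraints apply.
def toy_classify(arg, allowed):
    prefs = {"interesting stories": [("A2", 0.86), ("A1", 0.14)],
             "to creditors": [("A2", 1.0), ("A1", 0.1)]}
    return max((p for p in prefs[arg] if p[0] in allowed), key=lambda p: p[1])

print(easy_first_label(["interesting stories", "to creditors"], toy_classify))
# -> {'to creditors': 'A2', 'interesting stories': 'A1'}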
Outline
• Problem Statement
• Proposed Approach
• Evaluation
• Discussion
Evaluation
• Comparison to previous SotA approaches:
– Chen (Zhao et al., 2009), maximum entropy
– Che (Che et al., 2009), support vector machines
– MatePlus (Roth and Woodsend, 2014), logistic regression and word embeddings
– PathLSTM (Roth and Lapata, 2016), logistic regression for predicates and NNs with word embeddings for arguments
• Additional questions:
– Impact of choice of k?
– Overfitting?
– Impact of modeling global argument constraints?
• Dataset: CoNLL-09 Shared Task data
– Penn Treebank with PropBank SRL annotations
– NomBank annotations not evaluated
– Out-of-domain data: Brown corpus
Results
[Table: in-domain and out-of-domain results compared to previous systems]
Results
• Significantly outperform previous approaches
– Especially on out-of-domain data
• Small neighborhoods suffice
• Very simple feature set
– No word embeddings or verb classes
Discussion
• Pros:
– Simple, highly transparent classification approach
• Composite features make feature interactions explicit
• Nearest neighborhood makes classification decision transparent
– Very fast to train and test
– Fast and transparent: Great for feature engineering
• Cons:
– Instance-based learning requires more feature engineering
• Define composites
• “Rank” composites into backoff sequences
Current and Future Work
• Learning instance similarity measures
– Manual definition of composites, but automatic ranking
– Improves quality over published numbers
• Better generalization features
– Word-level: Embeddings
– Predicate-level: “Frame classes”
• Special focus on raised arguments
• Examples:
– The man [A0, SBJ] broke the window [OBJ].
– The rock [A2, SBJ] broke the window [OBJ].
– Creditors [SBJ] were told to hold off. (raised argument)
Current Related Work at IBM Research
• Multilingual SRL
– Creating “Universal Proposition Banks” on top of Universal Dependencies
– Version 1.0 released: https://github.com/System-T/UniversalPropositions/
– COLING paper “Multilingual Aliasing for Auto-Generating Proposition Banks”
• Poster session on Friday
• Multilingual text analytics
– COLING demo “Multilingual Information Extraction with PolyglotIE”
• Demo session on Friday
Questions?
Instance-based Semantic Role Labeling
• Problem: Current SRL is error-prone
– Strong local bias
– Many low-frequency exceptions, difficult to generalize
• K-SRL: Instance-based learning
– K-nearest neighbors
• Composite features
• Significantly outperforms all other current systems
[Figure: in-domain and out-of-domain results]
Instance-based Learning
• Most ML algorithms
– Bag of features: Syntactic function, predicate, voice …
– Learn weights for features to classes
• Strong weight from “passive subject” to A1
– Perform generalization
• Proposed approach: Instance-based learning
– kNN: k-Nearest Neighbors classification
– Find the k most similar instances in training data
• “all instances in which argument is a passive subject of TELL.01”
• Derive class label from majority in neighborhood
– No generalization
– Explicitly capture local bias using composite features
Instance-based Semantic Role Labeling
• Problem: Current SRL is error-prone
– Strong local bias, many low-frequency exceptions, difficult to generalize
Composite Features and Support
• Example: Creditors were told to hold off. ("Creditors" is the passive subject of TELL.01)
• TELL.01: A0 – speaker (agent), A1 – utterance (topic), A2 – hearer (recipient)
• Atomic features of the argument "Creditors" (see the sketch below):
– voice (V): passive
– predicate (P): tell
– path: VC SBJ
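These atomic features can be combined into the ordered sequence of composites that the neighborhood lookup backs off through. The sketch below is an assumption for illustration: the field names and the particular composites are not the paper's exact feature set.

def backoff_composites(arg):
    """Composite-feature values for one argument, most specific first.
    Each value could be looked up in a support index as in the earlier sketch."""
    return [
        ("lemma+voice+path+frame", arg["lemma"], arg["voice"], arg["path"], arg["frame"]),
        ("pos+voice+path+frame", arg["pos"], arg["voice"], arg["path"], arg["frame"]),
        ("voice+path+frame", arg["voice"], arg["path"], arg["frame"]),
        ("voice+path", arg["voice"], arg["path"]),
    ]

# The argument "Creditors" from the example sentence on this slide:
creditors = {"lemma": "creditor", "pos": "NNS", "voice": "passive",
             "path": "VC SBJ", "frame": "tell.01"}
for composite in backoff_composites(creditors):
    print(composite)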
ActionAPI - Demo
Semantic Role Labeling
• SRL: Who did what to whom, when, where and how?
• Example: John [A0] hastily [AM-MNR] ordered [order.02] a dozen dandelions [A1] for Mary [A2] from Amazon’s Flower Shop [A3].
• order.02 (request to be delivered): A0 – Orderer, A1 – Thing ordered, A2 – Benefactive (ordered-for), A3 – Source, AM-MNR – Manner
• Problem: What type of labels are valid across languages?
– Lexical, morphological and syntactic labels differ greatly
– But shallow semantic labels remain stable
• Core idea: Predict English Semantic Role Labels for arbitrary languages
SRL Resources
• English SRL
– FrameNet
– PropBank
• Frame set (examples):
– Buy.01: A0 – Buyer, A1 – Thing bought, A2 – Seller, A3 – Price paid, A4 – Benefactive
– Pay.01: A0 – Payer, A1 – Money, A2 – Being paid, A3 – Commodity
• Corpus of annotated text data (examples):
– The station [A0] bought [Buy.01] Cosby reruns [A1] for record prices [A3].
– … spurred [Spur.01] dollar [A1] buying [Buy.01] by Japanese institutions [A0].
– The company [A0] paid [Pay.01] 2 million US$ [A1] to buy [Buy.01] ….