English Proposition Bank Frames
Generating Training Data for Multilingual SRL
Semantic Parsing of 9 Languages
English Chinese Spanish
buy.01
Roles:
A0: buyer (agent)
A1: thing bought (theme)
A2: seller (source)
A3: price paid (asset)
A4: benefactive (beneficiary)
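As a minimal sketch of how a frame such as buy.01 could be held in code, the block below stores the frame identifier together with its numbered roles. The `Frame` class and the `describe` helper are hypothetical names used only for illustration; they are not part of PropBank or of the Polyglot system.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Frame:
    """A PropBank-style frame: a predicate sense plus its numbered roles."""
    frame_id: str              # e.g. "buy.01"
    roles: Dict[str, str]      # role label -> description


# Role inventory copied from the buy.01 frame listed above.
BUY_01 = Frame(
    frame_id="buy.01",
    roles={
        "A0": "buyer (agent)",
        "A1": "thing bought (theme)",
        "A2": "seller (source)",
        "A3": "price paid (asset)",
        "A4": "benefactive (beneficiary)",
    },
)


def describe(frame: Frame, label: str) -> str:
    """Return the description of a role label, or a fallback if it is not defined."""
    return frame.roles.get(label, "unknown role")


print(describe(BUY_01, "A2"))
```

Because role labels are only meaningful relative to a frame, the lookup always goes through the frame object rather than through a global table of labels.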
German Japanese Russian
Training data generation pipeline
• Optional: Manual aliasing of target-language (TL) verbs to English frames
• Filtered annotation projection (Akbik et al., 2015)
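The projection step in the pipeline above can be sketched roughly as follows: English labels are copied to target-language tokens through word alignments, and a filter discards projections that look unreliable. The function name, the data layout, and the specific filter used here (keep a frame only if its predicate aligns to a target token tagged `VERB`) are assumptions made for illustration, not a description of the authors' implementation.

```python
from typing import Dict, List


def project_labels(
    en_labels: Dict[int, str],
    alignment: Dict[int, int],
    tl_pos: List[str],
    predicate_index: int,
) -> Dict[int, str]:
    """Project English SRL labels onto target-language (TL) token indices.

    en_labels: English token index -> label (frame id for the predicate,
               role label for each argument head).
    alignment: English token index -> aligned TL token index.
    tl_pos:    part-of-speech tag for each TL token.
    """
    tl_pred = alignment.get(predicate_index)
    # Assumed filter: drop the whole frame unless the predicate aligns to a TL verb.
    if tl_pred is None or tl_pos[tl_pred] != "VERB":
        return {}

    projected = {}
    for en_idx, label in en_labels.items():
        tl_idx = alignment.get(en_idx)
        if tl_idx is not None:      # unaligned tokens are simply skipped
            projected[tl_idx] = label
    return projected


# "Mary bought a book" -> "Maria kaufte ein Buch" (toy one-to-one alignment)
en_labels = {0: "A0", 1: "buy.01", 3: "A1"}
alignment = {0: 0, 1: 1, 2: 2, 3: 3}
tl_pos = ["PROPN", "VERB", "DET", "NOUN"]
print(project_labels(en_labels, alignment, tl_pos, predicate_index=1))
```

Filtering at the frame level trades recall for precision: a sentence whose predicate fails the check contributes no training labels at all.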
like.01
Roles:
A0: liker (experiencer)
A1: object of affection (theme)
give.01
Roles:
A0: giver (agent)
A1: thing given (theme)
A2: entity given to (recipient)
sell.01
Roles:
A0: seller (agent)
A1: thing sold (theme)
A2: buyer (recipient)
A3: price paid
A4: benefactive
Challenges and open questions
• Source-language SRL errors
• Coverage: Do appropriate English frames exist for all TL verbs?
• pouvoir (to be able to), sollen (to be supposed to)
• Crowdsourced data curation (Akbik et al., 2016)
• Design of crowdsourcing task
Alan Akbik and Yunyao Li
IBM Research - Almaden
POLYGLOT
Multilingual Semantic Role Labeling with Unified Labels
Idea: Use English Proposition Bank Frames and Roles as universal semantic labels
annehmen.01
(accept)
Roles:
A0: acceptor (agent)
A1: thing accepted (theme)
A2: accepted-from (source)
A3: attribute (attribute)
annehmen.02
(assume)
Roles:
A0: thinker (agent)
A1: thought (theme)
A2: attributive (source)
Predicate: annehmen
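To illustrate how one target-language predicate can carry several senses, each tied to a different English meaning, the sketch below stores the two annehmen frames in a lookup table and groups them by predicate. The table layout and the `senses_for` helper are hypothetical; only the frame identifiers, glosses, and role descriptions come from the poster.

```python
# Frame id -> English gloss and role inventory, as listed for annehmen.
TL_FRAMES = {
    "annehmen.01": {
        "gloss": "accept",
        "roles": {
            "A0": "acceptor (agent)",
            "A1": "thing accepted (theme)",
            "A2": "accepted-from (source)",
            "A3": "attribute (attribute)",
        },
    },
    "annehmen.02": {
        "gloss": "assume",
        "roles": {
            "A0": "thinker (agent)",
            "A1": "thought (theme)",
            "A2": "attributive (source)",
        },
    },
}


def senses_for(predicate: str) -> list:
    """Return all frame ids whose lemma part (before the dot) equals the predicate."""
    return sorted(fid for fid in TL_FRAMES if fid.split(".")[0] == predicate)


for frame_id in senses_for("annehmen"):
    print(frame_id, "->", TL_FRAMES[frame_id]["gloss"])
```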
Example Target Language Frame
Annotation Projection
English, German, Chinese, French, Japanese, Spanish, Russian, Hindi and Arabic
[Figure: annotation projection. Over a parallel corpus, semantic labels (predicates + roles) on the EN side are projected onto the unlabeled TL side, yielding projected semantic labels for the TL.]
Future work: Crowdsourced and expert data curation
[Figure: curation workflow. Input: TL semantic labels, which go to crowdsourced data curation. Crowd agrees? Yes: semantic labels (curated, final). No: semantic labels (crowd cannot curate), which go to expert data curation.]
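The yes/no decision in the curation workflow can be sketched as a simple routing function: if enough crowd workers give the same answer, the label is accepted as final, otherwise the item is passed on to an expert. The function name and the agreement threshold of 0.8 are assumptions for illustration; the poster does not specify how agreement is measured.

```python
from collections import Counter
from typing import List, Optional, Tuple


def route(crowd_answers: List[str], min_agreement: float = 0.8) -> Tuple[str, Optional[str]]:
    """Decide whether a label is final or must go to expert curation.

    Returns ("final", label) when the most frequent crowd answer reaches the
    agreement threshold, and ("expert", None) otherwise.
    """
    if not crowd_answers:
        return ("expert", None)     # no crowd judgment available

    label, votes = Counter(crowd_answers).most_common(1)[0]
    if votes / len(crowd_answers) >= min_agreement:
        return ("final", label)
    return ("expert", None)


print(route(["A1", "A1", "A1", "A1", "A1"]))
print(route(["A1", "A1", "A2", "A0", "A1"]))
```

Under this sketch, a stricter threshold sends more items to experts, which raises cost but reduces the chance that a contested label is accepted as final.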
Multilingual
aliases
Generating High Quality Proposition Banks for Multilingual Semantic Role Labeling. Alan Akbik, Laura Chiticariu, Marina Danilevsky, Yunyao Li, Shivakumar Vaithyanathan and Huaiyu Zhu. ACL 2015.
Towards Semi-Automatic Generation of Proposition Banks for Low-Resource Languages. Alan Akbik and Yunyao Li. EMNLP 2016.
Evaluation
