Transition-based Semantic Role Labeling Using Predicate Argument Clustering

This paper suggests two ways of improving semantic role labeling (SRL). First, we introduce a novel transition-based SRL algorithm that takes a quite different approach to SRL. The algorithm is inspired by shift-reduce parsing and brings the advantages of the transition-based approach to SRL. Second, we present a self-learning clustering technique that effectively improves labeling accuracy in the test domain. For better generalization of the statistical models, we cluster verb predicates by comparing their predicate argument structures and apply the clustering information to the final labeling decisions. All approaches are evaluated on the CoNLL'09 English data. The new algorithm shows results comparable to another state-of-the-art system, and the clustering technique improves labeling accuracy for both in-domain and out-of-domain tasks.


  1. Transition-based Semantic Role Labeling Using Predicate Argument Clustering
     Workshop on Relational Models of Semantics
     Jinho D. Choi & Martha Palmer, University of Colorado at Boulder
     June 23rd, 2011
  2. Dependency-based SRL
     • Semantic role labeling
       - The task of identifying the arguments of each predicate and labeling them with semantic roles in relation to the predicate.
     • Dependency-based semantic role labeling
       - Advantages over constituent-based semantic role labeling:
         • Dependency parsing is faster (2.29 milliseconds / sentence).
         • Dependency structure is more similar to predicate argument structure.
       - Labels headwords instead of phrases.
         • The original semantic chunks can still be recovered most of the time (Choi and Palmer, LAW 2010).
  3. Dependency-based SRL
     • Constituent-based vs. dependency-based SRL
       [Figure: constituent tree for "He opened the door with his foot at ten", with phrases labeled Agent (He), Theme (the door), Instrument (with his foot), and Temporal (at ten).]
  4. Dependency-based SRL
     • Constituent-based vs. dependency-based SRL
       [Figure: dependency tree for the same sentence; "opened" syntactically heads He (SBJ), door (OBJ), with (ADV), and at (TMP), and the semantic dependencies label the headwords He (ARG0), door (ARG1), with (ARG2), and at (TMP).]
  5. Motivations
     • Do argument identification and classification need to be in separate steps?
       - They may require two different feature sets.
       - Training them in a pipeline takes less time than as a joint-inference task.
       - We have seen the advantages of treating them as a joint-inference task in dependency parsing, so why not in SRL?
  6. Transition-based SRL
     • Dependency parsing vs. dependency-based SRL
       - Both try to find relations between word pairs.
       - Dependency-based SRL is a special kind of dependency parsing:
         • It restricts the search to top-down relations between predicate (head) and argument (dependent) pairs.
         • It allows multiple predicates for each argument.
     • Transition-based SRL algorithm
       - Top-down, bidirectional search → more suitable for SRL.
       - Easier to develop a joint-inference system between dependency parsing and semantic role labeling.
  7. Transition-based SRL
     • Parsing states: (λ1, λ2, p, λ3, λ4, A)
       - p: index of the current predicate candidate.
       - λ1: indices of lefthand-side argument candidates.
       - λ4: indices of righthand-side argument candidates.
       - λ2, λ3: indices of processed tokens.
       - A: labeled arcs with semantic roles.
     • Initialization: ([ ], [ ], 1, [ ], [2, ..., n], ∅)
     • Termination: (λ1, λ2, ∄, [ ], [ ], A)
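A minimal sketch of this parsing state in Python; the class and field names are ours, not the paper's implementation, and A is kept as an ordered list so the dynamic features on slide 11 can read arcs in prediction order:

```python
from dataclasses import dataclass, field

@dataclass
class SRLState:
    """Parsing state (λ1, λ2, p, λ3, λ4, A); token indices are 1-based."""
    l1: list                 # λ1: lefthand-side argument candidates
    l2: list                 # λ2: processed lefthand-side tokens
    p: int                   # current predicate candidate (None = ∄)
    l3: list                 # λ3: processed righthand-side tokens
    l4: list                 # λ4: righthand-side argument candidates
    arcs: list = field(default_factory=list)   # A: (pred, arg, role) triples

def initial_state(n):
    # Initialization: ([ ], [ ], 1, [ ], [2, ..., n], ∅)
    return SRLState([], [], 1, [], list(range(2, n + 1)), [])

def is_terminal(state):
    # Termination: (λ1, λ2, ∄, [ ], [ ], A)
    return state.p is None and not state.l3 and not state.l4
```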
  8. Transition-based SRL
     • Transitions
       - No-Pred: finds the next predicate candidate.
       - No-Arc←: rejects the lefthand-side argument candidate.
       - No-Arc→: rejects the righthand-side argument candidate.
       - Left-Arc←: accepts the lefthand-side argument candidate.
       - Right-Arc→: accepts the righthand-side argument candidate.
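A hedged sketch of these transitions against the SRLState above, under one plausible reading of the slides: the arc transitions consume the candidate in λ1 or λ4 nearest to p, and No-Pred re-splits the candidate lists around the next predicate candidate. The exact list bookkeeping is our assumption, not the paper's code:

```python
def left_arc(s, label):
    # Accept the nearest lefthand-side candidate as an argument of p.
    arg = s.l1.pop()
    s.arcs.append((s.p, arg, label))
    s.l2.append(arg)

def no_arc_left(s):
    # Reject the nearest lefthand-side candidate for this predicate.
    s.l2.append(s.l1.pop())

def right_arc(s, label):
    # Accept the nearest righthand-side candidate as an argument of p.
    arg = s.l4.pop(0)
    s.arcs.append((s.p, arg, label))
    s.l3.append(arg)

def no_arc_right(s):
    # Reject the nearest righthand-side candidate for this predicate.
    s.l3.append(s.l4.pop(0))

def no_pred(s, n):
    # Move on to the next predicate candidate; all other tokens become
    # left- or righthand-side argument candidates for it again.
    s.p = s.p + 1 if s.p < n else None
    if s.p is not None:
        s.l1, s.l2 = list(range(1, s.p)), []
        s.l4, s.l3 = list(range(s.p + 1, n + 1)), []
```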
  9. Example: John1 wants2 to3 buy4 a5 car6, where "wants" takes A0 (John) and A1 (to buy), and "buy" takes A0 (John) and A1 (car).
     [Figure: table tracing the contents of λ1, λ2, λ3, λ4, and A after each transition.]
     • No-Pred
     • No-Pred
     • Left-Arc: John ← wants
     • No-Arc ×2
     • Right-Arc: wants → to
     • Left-Arc: John ← buy
     • No-Arc ×3
     • No-Arc
     • Shift
     • Right-Arc: buy → car
  10. Features
      • Baseline features
        - N-gram and binary features (similar to those in Johansson and Nugues, EMNLP 2008).
        - Structural features, e.g., for "John wants to buy ...":
          • Subcategorization of "wants": SBJ ← V → OPRD
          • Path from "John" to "buy": PRP ↑ LCA ↓ TO ↓ VB (POS tags); SBJ ↑ LCA ↓ OPRD ↓ IM (dependency labels)
          • Depth from "John" to "buy": 1 ↑ LCA ↓ 2
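To make the path feature concrete, here is a small sketch assuming the dependency tree is stored as a head array indexed by token; the function name and data layout are ours:

```python
def lca_path(heads, tags, a, b):
    """Tag path from token a to token b through their lowest common
    ancestor (LCA), e.g. 'PRP ↑ LCA ↓ TO ↓ VB' from "John" to "buy"."""
    def chain(i):                  # token i and its ancestors, bottom-up
        out = [i]
        while heads[i] != 0:       # 0 = artificial root
            i = heads[i]
            out.append(i)
        return out

    up, down = chain(a), chain(b)
    lca = next(i for i in up if i in set(down))
    ups = [tags[i] for i in up[:up.index(lca)]]
    downs = [tags[i] for i in down[:down.index(lca)]][::-1]
    path = ' ↑ '.join(ups + ['LCA'])
    return (path + ' ↓ ' + ' ↓ '.join(downs)) if downs else path
```

Calling it with POS tags gives the first path above, calling it with dependency labels gives the second, and the depth feature is just the lengths of the two halves.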
  11. Features
      • Dynamic features
        - Derived from previously identified arguments.
        - The argument label previously predicted for w_arg; e.g., in "John1 wants2 to3 buy4 a5 car6", John has already been labeled A0 of "wants" by the time "buy" is processed.
        - The label of the very last predicted numbered argument of w_pred.
        - These features can narrow down the scope of expected arguments of w_pred.
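Both features can be read straight off the arcs accumulated in A; a sketch reusing the SRLState above, with feature names of our choosing:

```python
def dynamic_features(s, pred, arg):
    feats = {}
    # Label previously assigned to this argument token, possibly by
    # another predicate (e.g., John already labeled A0 of "wants").
    prev = [r for (p, a, r) in s.arcs if a == arg]
    if prev:
        feats['arg_prev_label'] = prev[-1]
    # Very last numbered argument (A0-A5) predicted for this predicate.
    numbered = [r for (p, a, r) in s.arcs
                if p == pred and r[:1] == 'A' and r[1:].isdigit()]
    if numbered:
        feats['pred_last_numbered'] = numbered[-1]
    return feats
```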
  12. Experiments
      • Corpora
        - CoNLL'09 English data.
        - In-domain task: the Wall Street Journal.
        - Out-of-domain task: the Brown corpus.
      • Input to our semantic role labeler
        - Automatically generated dependency trees.
        - Used our open-source dependency parser, ClearParser.
      • Machine learning algorithm
        - Liblinear L2-regularization, L1-loss SVM.
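For reference, an equivalent classifier setup is available through scikit-learn's LinearSVC, which wraps Liblinear; loss='hinge' selects the L2-regularized, L1-loss objective. The toy feature dictionaries and the value of C below are placeholders, not the paper's settings:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

# Toy stand-ins for the extracted feature dictionaries and labels.
train_feats = [{'pos': 'NNP', 'deprel': 'SBJ'}, {'pos': 'NN', 'deprel': 'OBJ'}]
train_labels = ['A0', 'A1']

vec = DictVectorizer()                      # sparse one-hot encoding
X = vec.fit_transform(train_feats)
clf = LinearSVC(loss='hinge', C=0.1)        # L2-regularized, L1-loss SVM
clf.fit(X, train_labels)
print(clf.predict(vec.transform([{'pos': 'NNP', 'deprel': 'SBJ'}])))
```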
  13. Experiments
      • Results (AI = Argument Identification, AC = Argument Classification)

                               In-domain               Out-of-domain
        System       Task    P      R      F1        P      R      F1
        Baseline     AI      92.57  88.44  90.46     90.96  81.57  86.01
                     AI+AC   87.20  83.31  85.21     77.11  69.14  72.91
        + Dynamic    AI      92.38  88.76  90.54     90.90  82.25  86.36
                     AI+AC   87.33  83.91  85.59     77.41  70.05  73.55
        JN'08        AI+AC   88.46  83.55  85.93     77.67  69.63  73.43
  14. Summary
      • Introduced a transition-based SRL algorithm showing near state-of-the-art results.
        - No need to design separate systems for argument identification and classification.
        - Makes it easier to develop a joint-inference system between dependency parsing and semantic role labeling.
      • Future work
        - Several techniques designed to improve transition-based parsing can be applied (e.g., dynamic programming, k-best ranking).
        - More features, such as clustering information, can be applied to improve labeling accuracy.
  15. Predicate Argument Clustering
      • Verb clusters can give more generalization to the statistical models.
        - Clustering verbs using bag-of-words or syntactic structure.
        - Clustering verbs using predicate argument structure.
      • Self-learning clustering (sketched in code below)
        - Cluster verbs in the test data using automatically generated predicate argument structures.
        - Cluster verbs in the training data using the verb clusters found in the test data as seeds.
        - Re-run our semantic role labeler on the test data using the clustering information.
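The three-step procedure as a driver loop; every name here is a hypothetical placeholder for a component the slides only describe, so the clustering functions are passed in rather than invented:

```python
def self_learning_srl(labeler, cluster_test_verbs, cluster_train_verbs,
                      train_data, test_data):
    # 1. Auto-label the test data and cluster its verbs by their
    #    automatically generated predicate argument structures.
    auto_labeled = labeler.label(test_data)
    test_clusters = cluster_test_verbs(auto_labeled)    # slide 17: k-best HAC
    # 2. Cluster the training verbs, seeded by the test-side clusters.
    train_clusters = cluster_train_verbs(train_data,
                                         seeds=test_clusters)  # k-means
    # 3. Re-run the labeler with cluster membership as extra features.
    labeler.use_cluster_features(train_clusters, test_clusters)
    return labeler.label(test_data)
```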
  16. Predicate Argument Clustering
      [Figure 2: projecting the predicate argument structure of each verb into vector space.]
      • Vector representation: one dimension per semantic role label and per lemma:label pair; e.g., for "John wants to buy a car":

        Verb    A0   A1   ...   john:A0   to:A1   car:A1   ...
        want    1    1    ...   1         1       0        ...
        buy     1    1    ...   1         0       1        ...

      • Some labels are predicted with higher confidence, or are more important, than others; e.g., ARG0 and ARG1 are generally predicted with higher confidence than modifiers, and nouns give more important information than some other grammatical categories. Instead of binary values, we assign each existing feature a value computed by the following equations:

        s(l_j | v_i) = 1 / (1 + exp(-score(l_j | v_i)))

        s(m_j, l_j) = exp( count(m_j, l_j) / Σ_∀k count(m_k, l_k) ),  where w_j is a noun

        - v_i is the current verb, l_j is the j'th label of v_i, and m_j is l_j's corresponding lemma.
        - score(l_j | v_i) is the score of l_j being a correct argument label of v_i; it is always 1 for the training data and is provided by our labeler for the test data.
        - s(m_j, l_j) is the maximum-likelihood estimate of m_j co-occurring with l_j.
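A direct transcription of the two equations, assuming the arguments of a verb are available as (lemma, label, score) triples; the data layout is ours:

```python
import math
from collections import Counter

def feature_values(arguments):
    """arguments: iterable of (lemma, label, score) triples, where score
    is 1.0 for gold (training) arguments and the labeler's classification
    score for automatically labeled test arguments."""
    count = Counter((m, l) for m, l, _ in arguments)
    total = sum(count.values())
    # s(l_j | v_i) = 1 / (1 + exp(-score(l_j | v_i)))
    s_label = {(m, l): 1.0 / (1.0 + math.exp(-sc)) for m, l, sc in arguments}
    # s(m_j, l_j) = exp(count(m_j, l_j) / Σ_k count(m_k, l_k))
    s_lemma = {(m, l): math.exp(count[m, l] / total) for m, l, _ in arguments}
    return s_label, s_lemma
```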
  17. Predicate Argument Clustering
      • Clustering verbs in the test data (see the sketch below)
        - K-best hierarchical agglomerative clustering:
          • Merges the k best pairs at each iteration.
          • Uses a threshold to dynamically determine the top k clusters.
        - We set another threshold for early break-out.
      • Clustering verbs in the training data
        - K-means clustering:
          • Starts with centroids estimated from the clusters found in the test data.
          • Uses a threshold to filter out verbs not close enough to any cluster.
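A hedged sketch of the k-best agglomerative step; both thresholds stand in for the two the slide mentions, and cosine similarity over the verb vectors is our choice, not necessarily the paper's:

```python
import numpy as np

def kbest_agglomerative(vectors, k, sim_threshold, stop_threshold):
    clusters = [[i] for i in range(len(vectors))]
    centroid = lambda c: np.mean([vectors[i] for i in c], axis=0)
    cosine = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    while len(clusters) > 1:
        cents = [centroid(c) for c in clusters]
        pairs = sorted(((cosine(cents[i], cents[j]), i, j)
                        for i in range(len(clusters))
                        for j in range(i + 1, len(clusters))), reverse=True)
        # Keep only the k best pairs that clear the similarity threshold.
        best = [(sim, i, j) for sim, i, j in pairs[:k] if sim >= sim_threshold]
        if not best or best[0][0] < stop_threshold:
            break                           # early break-out
        merged = set()
        for _, i, j in best:
            if i in merged or j in merged:
                continue                    # each cluster merges once per round
            clusters[i] += clusters[j]
            merged.add(j)
        clusters = [c for idx, c in enumerate(clusters) if idx not in merged]
    return clusters
```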
  18. Experiments
      • Results

                               In-domain               Out-of-domain
        System       Task    P      R      F1        P      R      F1
        Baseline     AI      92.57  88.44  90.46     90.96  81.57  86.01
                     AI+AC   87.20  83.31  85.21     77.11  69.14  72.91
        + Dynamic    AI      92.38  88.76  90.54     90.90  82.25  86.36
                     AI+AC   87.33  83.91  85.59     77.41  70.05  73.55
        + Cluster    AI      92.62  88.90  90.72     90.87  82.43  86.44
                     AI+AC   87.43  83.92  85.64     77.47  70.28  73.70
        JN'08        AI+AC   88.46  83.55  85.93     77.67  69.63  73.43
  19. Conclusion
      • Introduced a self-learning clustering technique with potential for improving labeling accuracy in a new domain.
        - Needs to be tried on large-scale data to see a clear impact of the clustering.
        - Can also be improved by using different features or clustering algorithms.
      • ClearParser open-source project
        - http://code.google.com/p/clearparser/
  20. Acknowledgements
      • We gratefully acknowledge the support of the National Science Foundation Grant CISE-IIS-RI-0910992, Richer Representations for Machine Translation; a subcontract from the Mayo Clinic and Harvard Children's Hospital based on a grant from the ONC, 90TR0002/01, Strategic Health Advanced Research Project Area 4: Natural Language Processing; and a grant from the Defense Advanced Research Projects Agency (DARPA/IPTO) under the GALE program, DARPA/CMO Contract No. HR0011-06-C-0022, subcontract from BBN, Inc. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
