Optimization of NLP Components for Robustness and Scalability

  1. Optimization of NLP Components for Robustness and Scalability. Dissertation Defense, August 15, 2012. Jinho D. Choi (choijd@colorado.edu), Department of Computer Science, University of Colorado Boulder.
  2. Ever since I came, Boulder has been ... • #1: Top 10 College Towns (Livability, 2012) • #1: Top 10 Least Obese Metro Areas (Gallup Healthways, 2012) • #1: Top 10 Happiest Cities (Gallup Healthways, 2012) • #1: The 10 Most Educated U.S. Cities (US News, 2011) • #1: America's 15 Most Active Cities (Time - Healthland, 2011) • #1: Best Quality of Life in America (Portfolio, 2011) • #1: 20 Brainiest Cities in America (Daily Beast, 2010) • #1: Western Cities Fare Best in Well-being (USA Today, 2010) • #1: America's Foodiest Town (Bon Appétit, 2010) • #1: The Best Cities to Raise an Outdoor Kid (Backpacker, 2009) • #1: America's Top 25 Towns to Live Well (Forbes, 2009) • #1: America's Smartest Cities (Forbes, 2008) • #1: Top Heart-Friendly Cities (American Heart Association, 2008)
  3. Contents • Introduction • Dependency conversion • Experimental setup • Part-of-speech tagging • Dependency parsing • Semantic role labeling • Conclusion
  4. Introduction • The application of NLP has ... - Expanded to everyday computing. - Broadened to a general audience. ‣ More attention is drawn to the practical aspects of NLP. • NLP components should be tested for - Robustness in handling heterogeneous data. • Need to be evaluated on data from several different sources. - Scalability in handling a large amount of data. • Need to be evaluated for speed and complexity.
  5. Introduction • Research question - How to improve the robustness and scalability of standard NLP components. • Goals - To prepare gold-standard data from several different sources for in-genre and out-of-genre experiments. - To develop a POS tagger, a dependency parser, and a semantic role labeler showing robust results across this data. - To reduce the average complexities of these components while retaining good accuracy.
  6. Introduction • Thesis statement 1. We improve the robustness of three NLP components: • POS tagger: by building a generalized model. • Dependency parser: by bootstrapping parse information. • Semantic role labeler: by applying higher-order argument pruning. 2. We improve the scalability of these three components: • POS tagger: by adapting dynamic model selection. • Dependency parser: by optimizing the engineering of transition-based parsing algorithms. • Semantic role labeler: by applying conditional higher-order argument pruning.
  7. Introduction [System pipeline: Constituent Treebanks and PropBanks go through dependency conversion, producing training and evaluation sets of dependency trees with semantic roles; a part-of-speech trainer, a dependency trainer, and a semantic role trainer then build the corresponding models, which are used by the part-of-speech tagger, the dependency parser, and the semantic role labeler.]
  8. Contents • Introduction • Dependency conversion • Experimental setup • Part-of-speech tagging • Dependency parsing • Semantic role labeling • Conclusion
  9. Dependency Conversion • Motivation - A small amount of manually annotated dependency trees (Rambow et al., 2002; Čmejrek et al., 2004). - A large amount of manually annotated constituent trees (Marcus et al., 1993; Weischedel et al., 2011). - Converting constituent trees into dependency trees → a large amount of pseudo-annotated dependency trees. • Previous approaches - Penn2Malt (stp.lingfil.uu.se/~nivre/research/Penn2Malt.html). - LTH converter (Johansson and Nugues, 2007). - Stanford converter (de Marneffe and Manning, 2008a).
  10. Dependency Conversion • Comparison - The Stanford and CLEAR approaches generate 3.62% and 0.23% unclassified dependencies, respectively. - Our conversion produces 3.69% non-projective trees. - Converter comparison (Penn2Malt / LTH / Stanford / CLEAR): labels Malt / CoNLL / Stanford / Stanford+; support for the new Treebank format no / no / no / yes; active maintenance no / no / yes / yes. The table also compares support for long-distance dependencies, secondary dependencies, and function tags.
  11. Dependency Conversion (1/6) 1. Input a constituent tree. • Penn, OntoNotes, CRAFT, MiPACQ, and SHARP Treebanks. [Example: a constituent tree for "Peace and joy that we want", in which the object of "want" is an empty category *T*-1 co-indexed with WHNP-1 ("that").]
  12. Dependency Conversion (2/6) 2. Reorder constituents related to empty categories. • *T*: wh-movement and topicalization. • *RNR*: right node raising. • *ICH* and *PPA*: discontinuous constituents. [Example: the empty category *T*-1 is replaced by the co-indexed WHNP-1, so the tree for "Peace and joy that we want" is reordered as "Peace and joy we want that".]
  13. Dependency Conversion (3/6) 3. Handle special cases. • Apposition, coordination, and small clauses. [Example: in the reordered tree, cc and conj arcs are added for "and" and "joy".] The original word order is preserved in the converted dependency tree.
  14. Dependency Conversion (4/6) 4. Handle general cases. • Head-finding rules and heuristics (a sketch of such rules follows below). [Example: head rules add the remaining arcs (nsubj, dobj, rcmod, root) to the tree for "Peace and joy that we want".]
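As a rough illustration of what head-finding rules look like (the rule entries below are abridged and illustrative, not the actual CLEAR rule set), each constituent label can map to a search direction and a priority list of child labels:

```python
# Illustrative head-finding rules (abridged; not the actual CLEAR rule set).
# Each constituent label maps to a search direction and a priority list of
# child labels; the first matching child becomes the head.

HEAD_RULES = {
    "VP": ("left",  ["VBD", "VBN", "MD", "VBZ", "VB", "VBG", "VBP", "VP"]),
    "NP": ("right", ["NN", "NNP", "NNPS", "NNS", "NP", "PRP"]),
    "S":  ("left",  ["VP", "S"]),
}

def find_head_child(label, children):
    """children: list of (child_label, node) pairs in surface order."""
    direction, priorities = HEAD_RULES.get(label, ("left", []))
    ordered = children if direction == "left" else list(reversed(children))
    for tag in priorities:
        for child_label, node in ordered:
            if child_label == tag:
                return node
    return ordered[0][1]   # fallback heuristic: first child in search order
```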
  15. Dependency Conversion (5/6) 5. Add secondary dependencies. • Gapping, referent, right node raising, open clausal subject. [Example: a secondary ref(erent) dependency is added between the relativizer "that" and the noun it refers to in "Peace and joy that we want".]
  16. Dependency Conversion (6/6) 6. Add function tags (Table A.1; tags followed by * are not typical Penn Treebank tags but are used in some other Treebanks). Syntactic roles: ADV adverbial, CLF it-cleft, CLR closely related constituent, DTV dative, LGS logical subject in passive, NOM nominalization, PUT locative complement of put, PRD non-VP predicate, RED* reduced auxiliary, SBJ surface subject, TPC topicalization. Semantic roles: BNF benefactive, DIR direction, EXT extent, LOC locative, MNR manner, PRP purpose or reason, TMP temporal, VOC vocative. Text and speech categories: ETC et cetera, FRM* formula, HLN headline, IMP imperative, SEZ direct speech, TTL title, UNF unfinished constituent.
  17. Contents • Introduction • Dependency conversion • Experimental setup • Part-of-speech tagging • Dependency parsing • Semantic role labeling • Conclusion
  18. Experimental Setup • The Wall Street Journal (WSJ) models - Train • WSJ sections 2-21 in OntoNotes (Weischedel et al., 2011). • Total: 30,060 sentences, 731,677 tokens, 77,826 predicates. - In-genre evaluation (Avgi) • WSJ section 23 in OntoNotes. • Total: 1,640 sentences, 39,590 tokens, 4,138 predicates. - Out-of-genre evaluation (Avgo) • 5 genres in OntoNotes, 2 genres in MiPACQ (Nielsen et al., 2010), 1 genre in SHARP. • Total: 19,368 sentences, 265,337 tokens, 32,142 predicates.
  19. Experimental Setup • The OntoNotes models - Train • 6 genres in OntoNotes. • Total: 96,406 sentences, 1,983,012 tokens, 213,695 predicates. - In-genre evaluation (Avgi) • 6 genres in OntoNotes. • Total: 13,337 sentences, 201,893 tokens, 25,498 predicates. - Out-of-genre evaluation (Avgo) • Same 2 genres in MiPACQ, same 1 genre in SHARP. • Total: 7,671 sentences, 103,034 tokens, 10,782 predicates.
  20. Experimental Setup • Accuracy - Part-of-speech tagging • Accuracy. - Dependency parsing • Labeled attachment score (LAS). • Unlabeled attachment score (UAS). - Semantic role labeling • F1-score of argument identification. • F1-score of both argument identification and classification.
  21. Experimental Setup • Speed - All experiments are run on an Intel Xeon 2.57GHz machine. - Each model is run 5 times, and the average speed is measured by taking the mean of the middle 3 runs. • Machine learning algorithm - LibLinear L2-regularized, L1-loss SVM classification (Hsieh et al., 2008). - Designed to handle large-scale, high-dimensional vectors. - Runs fast with accurate performance. - Our implementation of LibLinear is publicly available.
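As a hedged illustration only (the thesis uses its own Java port of LibLinear, not scikit-learn), the same training objective, an L2-regularized, L1-loss (hinge) linear SVM over sparse feature vectors, can be sketched as:

```python
# Analogous setup in scikit-learn, for illustration only: L2-regularized,
# L1-loss (hinge) linear SVM over sparse, high-dimensional feature vectors.
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

vec = DictVectorizer()
X = vec.fit_transform([{"w=the": 1, "w+1=dog": 1},   # toy feature dictionaries
                       {"w=dog": 1, "w-1=the": 1}])
y = ["DT", "NN"]                                     # toy POS labels

clf = LinearSVC(penalty="l2", loss="hinge", C=0.1)   # L2 regularization, L1 loss
clf.fit(X, y)
print(clf.predict(vec.transform([{"w=the": 1}])))
```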
  22. Contents • Introduction • Dependency conversion • Experimental setup • Part-of-speech tagging • Dependency parsing • Semantic role labeling • Conclusion
  23. Part-of-Speech Tagging • Motivation - Supervised learning approaches do not perform well in out-of-genre experiments. - Domain adaptation approaches require knowledge of the incoming data. - Complicated tagging or learning approaches often run slowly during decoding. • Dynamic model selection - Build two models, generalized and domain-specific, given one set of training data. - Dynamically select one of the models during decoding.
  24. Part-of-Speech Tagging • Training (a sketch follows below) 1. Group the training data into documents (e.g., sections in the WSJ). 2. Get the document frequency of each simplified word form. • In simplified word forms, all numerical expressions, with or without special characters, are converted to 0. 3. Build a domain-specific model using features extracted only from tokens whose DF(SW) > 1. 4. Build a generalized model using features extracted only from tokens whose DF(SW) > 2. 5. Find the cosine similarity threshold for dynamic model selection.
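A minimal Python sketch of steps 2-4, assuming tokenized documents as nested lists; the function names and the regex-based simplification are illustrative, not ClearNLP code:

```python
# A sketch of steps 2-4 (illustrative names, not ClearNLP code).
import re
from collections import Counter

def simplify(word):
    """Simplified word form: numerical expressions (with or without
    special characters) are collapsed to '0', as described on the slide."""
    return "0" if re.search(r"\d", word) else word.lower()

def document_frequency(documents):
    """DF of each simplified word form; documents are grouped token lists
    (e.g., one document per WSJ section)."""
    df = Counter()
    for doc in documents:
        df.update({simplify(tok) for sent in doc for tok in sent})
    return df

def select_tokens(documents, df, cutoff):
    """Keep only tokens whose simplified form exceeds the DF cutoff;
    model features are extracted from these tokens only."""
    return [[[tok for tok in sent if df[simplify(tok)] > cutoff]
             for sent in doc] for doc in documents]

# Hypothetical usage: DF > 1 for the domain-specific model,
# DF > 2 for the generalized model.
# df = document_frequency(documents)
# domain_tokens  = select_tokens(documents, df, cutoff=1)
# general_tokens = select_tokens(documents, df, cutoff=2)
```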
  25. Part-of-Speech Tagging • Cosine similarity threshold - During cross-validation, collect the cosine similarities between the simplified word forms used for building the domain-specific model and the input sentences on which the domain-specific model shows an advantage. - The cosine similarity at the boundary of the first 5% of this distribution becomes the threshold for dynamic model selection. [Histogram: occurrence vs. cosine similarity (0-0.06), with the threshold marked at the first 5% of the area.]
  26. Part-of-Speech Tagging • Decoding - Measure the cosine similarity between the simplified word forms used for building the domain-specific model and each input sentence. - If the similarity is greater than the threshold, use the domain-specific model. - If the similarity is less than or equal to the threshold, use the generalized model. - Runs as fast as a single-model approach.
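A minimal sketch of the decoding-time selection, assuming the threshold and the domain vocabulary counts were produced as above; names are illustrative:

```python
# A sketch of dynamic model selection at decoding time (illustrative names).
import math
import re
from collections import Counter

def simplify(word):                     # same simplification as in training
    return "0" if re.search(r"\d", word) else word.lower()

def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-word count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_model(sentence, domain_vocab, threshold, domain_model, general_model):
    """Compare the input sentence against the simplified word forms used to
    build the domain-specific model; pick one model per sentence."""
    sent = Counter(simplify(tok) for tok in sentence)
    sim = cosine_similarity(sent, domain_vocab)
    return domain_model if sim > threshold else general_model
```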
  27. Part-of-Speech Tagging • Experiments - Baseline: using the original word forms. - Baseline+: using lowercase simplified word forms. - Domain: the domain-specific model. - General: the generalized model. - ClearNLP: dynamic model selection. - Stanford: Toutanova et al., 2003. - SVMTool: Giménez and Màrquez, 2004.
  28. Part-of-Speech Tagging • Accuracy - WSJ models (Avgi and Avgo). [Bar charts comparing Baseline, Baseline+, Domain, General, ClearNLP, Stanford, and SVMTool: in-domain accuracies lie roughly between 96.9 and 97.4; out-of-domain accuracies lie roughly between 88.3 and 90.8.]
  29. Part-of-Speech Tagging • Accuracy - OntoNotes models (Avgi and Avgo). [Bar charts comparing the same systems: in-domain accuracies lie roughly between 96.2 and 96.6; out-of-domain accuracies lie roughly between 86.8 and 89.3.]
  30. Part-of-Speech Tagging • Speed comparison - WSJ models: ClearNLP 32,654 tokens/sec (0.44 ms/sentence), ClearNLP+ 39,491 tokens/sec (0.37 ms/sentence), Stanford 250 tokens/sec (58.06 ms/sentence), SVMTool 1,058 tokens/sec (13.71 ms/sentence). - OntoNotes models: ClearNLP 32,206 tokens/sec (0.45 ms/sentence), ClearNLP+ 39,882 tokens/sec (0.36 ms/sentence), Stanford 136 tokens/sec (106.34 ms/sentence), SVMTool 924 tokens/sec (15.71 ms/sentence). • ClearNLP: as reported in the thesis. • ClearNLP+: new, improved results.
  31. Contents • Introduction • Dependency conversion • Experimental setup • Part-of-speech tagging • Dependency parsing • Semantic role labeling • Conclusion
  32. Dependency Parsing • Goals 1. To improve the average parsing complexity for non-projective dependency parsing. 2. To reduce the discrepancy between dynamic features used for training on gold trees and for decoding automatic trees. 3. To ensure well-formed dependency graph properties. • Approach 1. Combine transitions from both projective and non-projective dependency parsing algorithms. 2. Bootstrap dynamic features during training. 3. Post-process.
  33. Dependency Parsing • Transition decomposition - Decompose the transitions in: • Nivre's arc-eager algorithm (projective; worst-case parsing complexity O(n); Nivre, 2003). • Nivre's list-based algorithm (non-projective; worst-case complexity O(n²) without backtracking; Covington, 2001; Nivre, 2008). - Decomposed transitions, grouped into the Arc and List operations (Table 5.1):
  • Left-*l: ([λ1|i], λ2, [j|β], A) ⇒ ([λ1|i], λ2, [j|β], A ∪ {i ← j})
  • Right-*l: ([λ1|i], λ2, [j|β], A) ⇒ ([λ1|i], λ2, [j|β], A ∪ {i → j})
  • No-*: ([λ1|i], λ2, [j|β], A) ⇒ ([λ1|i], λ2, [j|β], A)
  • *-Shift(d|n): ([λ1|i], λ2, [j|β], A) ⇒ ([λ1|i|λ2|j], [ ], β, A)
  • *-Reduce: ([λ1|i], λ2, [j|β], A) ⇒ (λ1, λ2, [j|β], A)
  • *-Pass: ([λ1|i], λ2, [j|β], A) ⇒ (λ1, [i|λ2], [j|β], A)
  - Preconditions: Left-*l requires i ≠ 0, i has no head, and j is not a descendant of i; Right-*l requires that j has no head and that i is not a descendant of j; No-* applies when neither arc transition is permitted. This decomposition makes it easier to integrate transitions from different parsing algorithms.
  34. Dependency Parsing • Transition recomposition - Any combination of two decomposed transitions, one from each operation, can be recomposed into a new transition; for instance, combining Left-*l with *-Reduce makes Left-Reducel, which performs Left-*l and *-Reduce sequentially. - The Arc operation is always performed before the List operation. - Recomposed transitions (Table 5.3): Left-Reducel, Left-Passl, Right-Shiftl, Right-Passl, No-Shiftd, No-Shiftn, No-Reduce, No-Pass; the table shows which of these correspond to the transitions used in Nivre (2003), Covington (2001), Nivre (2008), and Choi and Palmer (2011a), while our parsing algorithm uses all of them. A sketch of the underlying transition system follows below.
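For concreteness, a minimal Python sketch (illustrative names, not ClearNLP code) of the parser configuration and the six decomposed transitions; the classifier that chooses and labels transitions, the preconditions, and the recomposition into the transitions of Table 5.3 are omitted:

```python
# A minimal sketch (illustrative names, not ClearNLP code) of the parser
# configuration and the six decomposed transitions.

class Config:
    def __init__(self, n_tokens):
        self.l1 = [0]                          # lambda_1; 0 is the artificial root
        self.l2 = []                           # lambda_2
        self.beta = list(range(1, n_tokens + 1))
        self.arcs = {}                         # dependent -> (head, label)

def left_arc(c, label):   # Left-*: j (front of beta) becomes the head of i (top of lambda_1)
    c.arcs[c.l1[-1]] = (c.beta[0], label)

def right_arc(c, label):  # Right-*: i (top of lambda_1) becomes the head of j (front of beta)
    c.arcs[c.beta[0]] = (c.l1[-1], label)

def no_arc(c):            # No-*: add no arc
    pass

def shift(c):             # *-Shift: move lambda_2 and j onto lambda_1, empty lambda_2
    c.l1 += c.l2 + [c.beta.pop(0)]
    c.l2 = []

def reduce_(c):           # *-Reduce: pop i from lambda_1
    c.l1.pop()

def pass_(c):             # *-Pass: move i from lambda_1 to the front of lambda_2
    c.l2.insert(0, c.l1.pop())
```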
  35. Dependency Parsing • Average parsing complexity - The number of transitions performed per sentence. [Charts: transitions per sentence vs. sentence length for Covington'01, Nivre'08, CP'11, and this work; this work performs the fewest transitions, growing roughly linearly with sentence length.]
  36. Dependency Parsing • Bootstrapping - Transition-based dependency parsing can take advantage of dynamic features (e.g., the head of a token, its leftmost/rightmost dependent). [Figure: dynamic feature context around the tokens wi and wj.] - Features extracted from gold-standard trees during training can be different from features extracted from automatic trees during decoding. - By bootstrapping these dynamic features, we can significantly improve parsing accuracy.
  37. Dependency Parsing [Bootstrapping flow: gold-standard features and labels from the training data train an initial statistical model; the dependency parser then re-parses the training data to produce automatic features, which are fed back into the machine learning algorithm; whether to stop is determined by cross-validation.]
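A minimal sketch of this bootstrapping loop, with the trainer, parser, feature extractor, and cross-validation passed in as assumed interfaces:

```python
# A minimal sketch (assumed interfaces, not ClearNLP's API) of bootstrapping:
# retrain using dynamic features extracted from the parser's own automatic
# trees until the cross-validation score stops improving.

def bootstrap_training(train_sents, gold_trees, train_model, parse_all,
                       extract_features, cross_validate, max_iter=10):
    # Iteration 0: features (including dynamic ones) come from gold trees.
    feats = [extract_features(s, t) for s, t in zip(train_sents, gold_trees)]
    model = train_model(feats, gold_trees)
    best_score = cross_validate(model)

    for _ in range(max_iter):
        # Re-parse the training data with the current model so that dynamic
        # features reflect automatic (possibly erroneous) decisions.
        auto_trees = parse_all(model, train_sents)
        feats = [extract_features(s, t) for s, t in zip(train_sents, auto_trees)]
        new_model = train_model(feats, gold_trees)   # labels still come from gold
        score = cross_validate(new_model)
        if score <= best_score:        # stopping point chosen by cross-validation
            break
        model, best_score = new_model, score
    return model
```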
  38. Dependency Parsing • Post-processing - Transition-based dependency parsing does not guarantee that the parse output is a tree. - After parsing, we find the head of each headless token by comparing it against all other tokens using the same model. - The predicted head with the highest score that does not break the tree properties becomes the head of this token. - This post-processing technique significantly improves parsing accuracy in out-of-genre experiments.
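A minimal sketch of the post-processing step, assuming a head map and a scoring function from the same parsing model:

```python
# A minimal sketch (illustrative interfaces, not ClearNLP's API) of the
# post-processing step: every token left without a head is attached to its
# highest-scoring candidate head that keeps the output a well-formed tree.

def creates_cycle(dep, head, heads):
    """Attaching dep -> head creates a cycle iff dep already dominates head."""
    k = head
    while True:
        if k == dep:
            return True
        if k == 0 or k not in heads:   # reached the artificial root
            return False
        k = heads[k]

def postprocess(tokens, heads, score_head):
    """tokens: token ids (1..n); heads: dict dependent -> head id (0 = root);
    score_head(dep, head) returns the same model's score for the attachment."""
    for dep in tokens:
        if dep in heads:
            continue                   # token already has a head
        candidates = [h for h in [0] + list(tokens)
                      if h != dep and not creates_cycle(dep, h, heads)]
        heads[dep] = max(candidates, key=lambda h: score_head(dep, h))
    return heads
```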
  39. Dependency Parsing • Experiments - Baseline: using all recomposed transitions. - Baseline+: Baseline with post-processing. - ClearNLP: Baseline+ with bootstrapping. - CN'09: Choi and Nicolov, 2009. - CP'11: Choi and Palmer, 2011a. - MaltParser: Nivre, 2009. - MSTParser: McDonald et al., 2005. • Only 1st-order features are used; with 2nd-order features, accuracy is expected to be higher and speed slower.
  40. Dependency Parsing • Accuracy - WSJ models (Avgi and Avgo). [Bar charts of LAS and UAS for Baseline, Baseline+, ClearNLP, CN'09, CP'11, MaltParser, and MSTParser: in-genre scores lie roughly between 86 and 89.7; out-of-genre scores lie roughly between 74 and 79.4.]
  41. Dependency Parsing • Accuracy - OntoNotes models (Avgi and Avgo). [Bar charts of LAS and UAS for the same systems: in-genre scores lie roughly between 83.7 and 87.8; out-of-genre scores lie roughly between 72.4 and 78.1.]
  42. Dependency Parsing • Speed comparison - WSJ models - Average parsing time: ClearNLP 1.61 ms, ClearNLP+ 1.16 ms, CN'09 1.25 ms, CP'11 1.08 ms, MaltParser 2.14 ms. [Chart: milliseconds vs. sentence length (10-80).]
  43. Dependency Parsing • Speed comparison - OntoNotes models - Average parsing time: ClearNLP 1.89 ms, ClearNLP+ 1.28 ms, CN'09 1.26 ms, CP'11 1.12 ms, MaltParser 2.14 ms. [Chart: milliseconds vs. sentence length (10-80).]
  44. Contents • Introduction • Dependency conversion • Experimental setup • Part-of-speech tagging • Dependency parsing • Semantic role labeling • Conclusion
  45. Semantic Role Labeling • Motivation - Not all tokens need to be visited for semantic role labeling. - A typical pruning algorithm does not work as well when automatically generated trees are provided. - An enhanced pruning algorithm could improve argument coverage while maintaining a low average labeling complexity. • Approach - Higher-order argument pruning. - Conditional higher-order argument pruning. - Positional feature separation.
  46. Semantic Role Labeling • Semantic roles in dependency trees. [Example dependency tree annotated with ARG0, ARG1, ARG2, and ARGM-TMP.]
  47. Semantic Role Labeling • First-order argument pruning (1st) - Originally designed for constituent trees. • Considers only siblings of the predicate, the predicate's ancestors, and siblings of the predicate's ancestors as argument candidates (Xue and Palmer, 2004). - Redesigned for dependency trees. • Considers only dependents of the predicate, the predicate's ancestors, and dependents of the predicate's ancestors as argument candidates (Johansson and Nugues, 2008). - Covers over 99% of all arguments using gold-standard trees. - Covers only 93% of all arguments using automatic trees.
  48. Semantic Role Labeling • Higher-order argument pruning (High) - Considers all descendants of the predicate, the predicate's ancestors, and dependents of the predicate's ancestors as argument candidates (a sketch of both candidate sets follows below). - Significantly improves argument coverage when automatically generated trees are used. [Bar chart of argument coverage for WSJ-1st, ON-1st, WSJ-High, ON-High, Gold-1st, Gold-High: on automatic trees, coverage rises from roughly 91-93% with first-order pruning to roughly 97.6-98.2% with higher-order pruning; on gold trees, coverage stays above 99.4%.]
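A minimal sketch of the first-order and higher-order candidate sets over a simple head-map representation (illustrative, not ClearNLP code):

```python
# First-order vs. higher-order argument pruning on a dependency tree.
# `heads` maps each token id to its head id (0 = artificial root).

def children(heads, node):
    return [t for t, h in heads.items() if h == node]

def ancestors(heads, node):
    path = []
    while node in heads and heads[node] != 0:
        node = heads[node]
        path.append(node)
    return path

def descendants(heads, node):
    out, stack = [], children(heads, node)
    while stack:
        t = stack.pop()
        out.append(t)
        stack.extend(children(heads, t))
    return out

def first_order_candidates(heads, pred):
    # dependents of the predicate, the predicate's ancestors,
    # and dependents of those ancestors
    cands = set(children(heads, pred))
    for a in ancestors(heads, pred):
        cands.add(a)
        cands.update(children(heads, a))
    cands.discard(pred)
    return cands

def higher_order_candidates(heads, pred):
    # all descendants of the predicate, plus ancestors and their dependents
    cands = set(descendants(heads, pred))
    for a in ancestors(heads, pred):
        cands.add(a)
        cands.update(children(heads, a))
    cands.discard(pred)
    return cands
```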
  49. Semantic Role Labeling • Conditional higher-order argument pruning (High+) - Reduces argument candidates using path rules. - Before training, • Collect the paths between predicates and their descendants whose subtrees contain arguments of the predicates. • Collect the paths between predicates and their ancestors whose direct dependents or ancestors are arguments of the predicates. • Cut off paths whose counts are below thresholds. - During training and decoding, skip tokens, together with their subtrees or ancestors, whose paths to the predicates were not seen.
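A minimal sketch of the path-rule collection and pruning for the downward (descendant) direction only; the upward (ancestor) paths described above would be handled analogously. The path encoding and the cutoff value are assumptions for illustration:

```python
# A sketch of conditional higher-order pruning via path rules (downward
# direction only; illustrative representations, not ClearNLP code).
from collections import Counter

def _descendants(heads, node):
    """All tokens in node's subtree (heads: token id -> head id)."""
    kids = [t for t, h in heads.items() if h == node]
    out = list(kids)
    for k in kids:
        out.extend(_descendants(heads, k))
    return out

def down_path(heads, deprels, pred, node):
    """Hypothetical path encoding: dependency labels from the predicate
    down to node (node assumed to lie in the predicate's subtree)."""
    labels, k = [], node
    while k != pred and k in heads:
        labels.append(deprels[k])
        k = heads[k]
    return "v".join(reversed(labels))

def collect_path_rules(instances, cutoff=3):
    """instances: iterable of (heads, deprels, pred, arg_tokens). Count paths
    from each predicate to descendants whose subtrees contain arguments;
    keep the paths whose counts reach the cutoff."""
    counts = Counter()
    for heads, deprels, pred, args in instances:
        for d in _descendants(heads, pred):
            subtree = {d} | set(_descendants(heads, d))
            if subtree & set(args):
                counts[down_path(heads, deprels, pred, d)] += 1
    return {p for p, c in counts.items() if c >= cutoff}

def keep_candidate(heads, deprels, pred, node, path_rules):
    """During training and decoding, visit node's subtree only if its path
    from the predicate was seen among the collected rules."""
    return down_path(heads, deprels, pred, node) in path_rules
```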
  50. Semantic Role Labeling • Average labeling complexity - The number of tokens visited per predicate. [Chart for the WSJ models (the OntoNotes graph is similar): tokens visited vs. sentence length for All, High, High+, and 1st; All visits the most tokens, followed by High, then High+, with 1st visiting the fewest.]
  51. Semantic Role Labeling • Positional feature separation - Group features by the arguments' positions with respect to their predicates. - Two sets of features are extracted. • All features derived from arguments on the left-hand side of the predicates are grouped into one set, SL. • All features derived from arguments on the right-hand side of the predicates are grouped into another set, SR. - During training, build two models, ML and MR, for SL and SR. - During decoding, use ML and MR for argument candidates on the left-hand and right-hand sides of the predicates, respectively.
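A minimal sketch of positional feature separation, with the underlying trainer and models passed in as assumed interfaces:

```python
# A minimal sketch of positional feature separation (assumed interfaces).

def split_by_position(instances):
    """instances: iterable of (features, label, arg_index, pred_index)."""
    left, right = [], []
    for feats, label, arg_i, pred_i in instances:
        (left if arg_i < pred_i else right).append((feats, label))
    return left, right

def train_positional_models(instances, train_model):
    """Build M_L and M_R from the feature sets S_L and S_R."""
    left, right = split_by_position(instances)
    return train_model(left), train_model(right)

def classify_argument(arg_i, pred_i, feats, model_left, model_right):
    """Dispatch to M_L or M_R depending on which side of the predicate
    the candidate argument is on."""
    model = model_left if arg_i < pred_i else model_right
    return model.predict(feats)
```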
  52. Semantic Role Labeling • Experiments - Baseline: 1st-order argument pruning. - Baseline+: Baseline with positional feature separation. - High: higher-order argument pruning. - All: no argument pruning. - ClearNLP: conditional higher-order argument pruning (previously called High+). - ClearParser: Choi and Palmer, 2011b.
  53. Semantic Role Labeling • Accuracy - WSJ models (Avgi and Avgo). [Bar charts of F1-scores for Baseline, Baseline+, High, All, ClearNLP, and ClearParser: in-domain scores lie roughly between 81.9 and 82.5; out-of-domain scores lie roughly between 71.1 and 72.0.]
  54. Semantic Role Labeling • Accuracy - OntoNotes models (Avgi and Avgo). [Bar charts of F1-scores for the same systems: in-domain scores lie roughly between 80.7 and 81.7; out-of-domain scores lie roughly between 70.0 and 70.8.]
  55. Semantic Role Labeling • Speed comparison - WSJ models - Milliseconds for finding all arguments of each predicate. [Chart: milliseconds vs. sentence length for ClearNLP, ClearNLP+, Baseline+, High, All, and ClearParser.]
  56. Semantic Role Labeling • Speed comparison - OntoNotes models. [Chart: milliseconds vs. sentence length for ClearNLP, ClearNLP+, Baseline+, High, All, and ClearParser.]
  57. Contents • Introduction • Dependency conversion • Experimental setup • Part-of-speech tagging • Dependency parsing • Semantic role labeling • Conclusion
  58. Conclusion • Our dependency conversion gives rich dependency representations and can be applied to most English Treebanks. • The dynamic model selection runs fast and shows robust POS tagging accuracy across different genres. • Our parsing algorithm shows linear-time average parsing complexity for generating both projective and non-projective trees. • The bootstrapping technique gives a significant improvement in parsing accuracy. • The higher-order argument pruning gives a significant improvement in argument coverage. • The conditional higher-order argument pruning reduces the average labeling complexity without compromising the F1-score.
  59. Conclusion • Contributions - The first time that these three components have been evaluated together on such a wide variety of English data. - Maintained a high level of accuracy while improving the efficiency, modularity, and portability of these components. - Dynamic model selection and bootstrapping are generally applicable to tagging and parsing, respectively. - Processing all three components takes about 2.49 - 2.69 ms per sentence (tagging: 0.36 - 0.37, parsing: 1.16 - 1.28, labeling: 0.97 - 1.04). - All components are publicly available as an open-source project, called ClearNLP (clearnlp.googlecode.com).
  60. Conclusion • Future work - Integrate the dynamic model selection approach with more sophisticated tagging algorithms. - Evaluate our parsing approach on languages containing more non-projective dependency trees. - Improve semantic role labeling when the quality of the input parse trees is poor (e.g., using joint inference).
  61. Acknowledgment • We gratefully acknowledge the support of the following grants. Any contents expressed in this material are those of the authors and do not necessarily reflect the views of any grant agency. - The National Science Foundation Grants IIS-0325646, Domain Independent Semantic Parsing; CISE-CRI-0551615, Towards a Comprehensive Linguistic Annotation; CISE-CRI 0709167, Collaborative: A Multi-Representational and Multi-Layered Treebank for Hindi/Urdu; CISE-IIS-RI-0910992, Richer Representations for Machine Translation. - A grant from the Defense Advanced Research Projects Agency (DARPA/IPTO) under the GALE program, DARPA/CMO Contract No. HR0011-06-C-0022, subcontract from BBN, Inc. - A subcontract from the Mayo Clinic and Harvard Children's Hospital based on a grant from the ONC, 90TR0002/01. - Strategic Health Advanced Research Project Area 4: Natural Language Processing.
  62. Acknowledgment • Special thanks are due to - Martha Palmer for practically being my mom for 5 years. - James Martin for always encouraging me when I'm low. - Wayne Ward for wonderful smiles. - Bhuvana Narasimhan for bringing Hindi to my life. - Joakim Nivre for suffering under millions of my questions. - Nicolas Nicolov for making me feel normal when others call me "workaholic". - All CINC folks for letting me live (literally) at my cube.
  63. References • Jinho D. Choi and Nicolas Nicolov. K-best, Locally Pruned, Transition-based Dependency Parsing Using Robust Risk Minimization. In Recent Advances in Natural Language Processing V, pages 205–216. John Benjamins, 2009. • Jinho D. Choi and Martha Palmer. Getting the Most out of Transition-based Dependency Parsing. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL:HLT'11, pages 687–692, 2011a. • Jinho D. Choi and Martha Palmer. Transition-based Semantic Role Labeling Using Predicate Argument Clustering. In Proceedings of the ACL workshop on Relational Models of Semantics, RELMS'11, pages 37–45, 2011b. • M. Čmejrek, J. Cuřín, and J. Havelka. Prague Czech-English Dependency Treebank: Any Hopes for a Common Annotation Scheme? In HLT-NAACL'04 workshop on Frontiers in Corpus Annotation, pages 47–54, 2004. • Jesús Giménez and Lluís Màrquez. SVMTool: A general POS tagger generator based on Support Vector Machines. In Proceedings of the 4th International Conference on Language Resources and Evaluation, LREC'04, 2004. • Richard Johansson and Pierre Nugues. Dependency-based Semantic Role Labeling of PropBank. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (EMNLP'08), pages 69–78, 2008.
  64. References • Cho-Jui Hsieh, Kai-Wei Chang, Chih-Jen Lin, S. Sathiya Keerthi, and S. Sundararajan. A Dual Coordinate Descent Method for Large-scale Linear SVM. In Proceedings of the 25th International Conference on Machine Learning, ICML'08, pages 408–415, 2008. • Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330, 1993. • Marie-Catherine de Marneffe and Christopher D. Manning. The Stanford typed dependencies representation. In Proceedings of the COLING workshop on Cross-Framework and Cross-Domain Parser Evaluation, 2008a. • Ryan McDonald, Fernando Pereira, Kiril Ribarov, and Jan Hajič. Non-projective Dependency Parsing using Spanning Tree Algorithms. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT-EMNLP'05), pages 523–530, 2005. • Rodney D. Nielsen, James Masanz, Philip Ogren, Wayne Ward, James H. Martin, Guergana Savova, and Martha Palmer. An architecture for complex clinical question answering. In Proceedings of the 1st ACM International Health Informatics Symposium, IHI'10, pages 395–399, 2010. • Joakim Nivre. An Efficient Algorithm for Projective Dependency Parsing. In Proceedings of the 8th International Workshop on Parsing Technologies, IWPT'03, pages 149–160, 2003. • Joakim Nivre. Algorithms for deterministic incremental dependency parsing. Computational Linguistics, 34(4):513–553, 2008.
  65. References • Joakim Nivre. Non-Projective Dependency Parsing in Expected Linear Time. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP (ACL-IJCNLP'09), pages 351–359, 2009. • Owen Rambow, Cassandre Creswell, Rachel Szekely, Harriet Taber, and Marilyn Walker. A Dependency Treebank for English. In Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC'02), 2002. • Ralph Weischedel, Eduard Hovy, Martha Palmer, Mitch Marcus, Robert Belvin, Sameer Pradhan, Lance Ramshaw, and Nianwen Xue. OntoNotes: A Large Training Corpus for Enhanced Processing. In Joseph Olive, Caitlin Christianson, and John McCary, editors, Handbook of Natural Language Processing and Machine Translation. Springer, 2011. • Kristina Toutanova, Dan Klein, Christopher D. Manning, and Yoram Singer. Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, NAACL'03, pages 173–180, 2003. • Nianwen Xue and Martha Palmer. Calibrating Features for Semantic Role Labeling. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2004.
