Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text

139 views

Published on

NAACL 2016 - N16-1085

Published in: Education
  • Be the first to comment

  • Be the first to like this

Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text

  1. 1. Contributions Dataset Models Experiments Conclusion Weak Semi-Markov CRFs for NP Chunking in Informal Text Aldrian Obaja Muis and Wei Lu Singapore University of Technology and Design
  2. 2. Contributions Dataset Models Experiments Conclusion Paper Contributions In this paper, we contributed: 1 Noun phrase-annotated SMS corpus1 1 Tao Chen and Min-Yen Kan (2013). “Creating a live, public short message service corpus: the NUS SMS corpus”. In: Language Resources and Evaluation. Vol. 47. Springer Netherlands, pp. 299–335. 2 / 13
  3. 3. Contributions Dataset Models Experiments Conclusion Paper Contributions In this paper, we contributed: 1 Noun phrase-annotated SMS corpus1 2 Weak semi-Markov CRF 1 Tao Chen and Min-Yen Kan (2013). “Creating a live, public short message service corpus: the NUS SMS corpus”. In: Language Resources and Evaluation. Vol. 47. Springer Netherlands, pp. 299–335. 2 / 13
  4. 4. Contributions Dataset Models Experiments Conclusion NP-annotated SMS Corpus 3 / 13
  5. 5. Contributions Dataset Models Experiments Conclusion NP-annotated SMS Corpus We used Brat Rapid Annotation Tool (BRAT)2 for annotations, recruiting undergraduate students to annotate the noun phrases. 2 http://brat.nlplab.org/ 4 / 13
  6. 6. Contributions Dataset Models Experiments Conclusion NP-annotated SMS Corpus We used Brat Rapid Annotation Tool (BRAT)2 for annotations, recruiting undergraduate students to annotate the noun phrases. Examples: 2 http://brat.nlplab.org/ 4 / 13
  7. 7. Contributions Dataset Models Experiments Conclusion NP-annotated SMS Corpus We used Brat Rapid Annotation Tool (BRAT)2 for annotations, recruiting undergraduate students to annotate the noun phrases. Examples: 2 http://brat.nlplab.org/ 4 / 13
  8. 8. Contributions Dataset Models Experiments Conclusion Annotations Statistics 64 annotators 5 / 13
  9. 9. Contributions Dataset Models Experiments Conclusion Annotations Statistics 64 annotators 26,500 SMS messages 5 / 13
  10. 10. Contributions Dataset Models Experiments Conclusion Annotations Statistics 64 annotators 26,500 SMS messages 76,490 noun phrases 5 / 13
  11. 11. Contributions Dataset Models Experiments Conclusion Annotations Statistics 64 annotators 26,500 SMS messages 76,490 noun phrases 359,009 tokens 5 / 13
  12. 12. Contributions Dataset Models Experiments Conclusion Models 6 / 13
  13. 13. Contributions Dataset Models Experiments Conclusion Models Comparison n : # words in the sentence, |Y| : # labels, L : max segment length B B B I I I O O O said Dr Teh Fig. 1: Linear CRF: O(n|Y|2 ) 7 / 13
  14. 14. Contributions Dataset Models Experiments Conclusion Models Comparison n : # words in the sentence, |Y| : # labels, L : max segment length B B B I I I O O O said Dr Teh Fig. 1: Linear CRF: O(n|Y|2 ) 7 / 13
  15. 15. Contributions Dataset Models Experiments Conclusion Models Comparison n : # words in the sentence, |Y| : # labels, L : max segment length B B B I I I O O O said Dr Teh Fig. 1: Linear CRF: O(n|Y|2 ) 7 / 13
  16. 16. Contributions Dataset Models Experiments Conclusion Models Comparison n : # words in the sentence, |Y| : # labels, L : max segment length B B B I I I O O O said Dr Teh Fig. 1: Linear CRF: O(n|Y|2 ) N N N O O O said Dr Teh Fig. 2: Semi-CRF: O(nL |Y| 2 ) 7 / 13
  17. 17. Contributions Dataset Models Experiments Conclusion Models Comparison n : # words in the sentence, |Y| : # labels, L : max segment length B B B I I I O O O said Dr Teh Fig. 1: Linear CRF: O(n|Y|2 ) N N N O O O said Dr Teh Fig. 2: Semi-CRF: O(nL |Y| 2 ) 7 / 13
  18. 18. Contributions Dataset Models Experiments Conclusion Models Comparison n : # words in the sentence, |Y| : # labels, L : max segment length B B B I I I O O O said Dr Teh Fig. 1: Linear CRF: O(n|Y|2 ) N N N O O O said Dr Teh Fig. 2: Semi-CRF: O(nL |Y| 2 ) N N NN N N O O OO O O said Dr Teh Fig. 3: Weak Semi-CRF: O(n |Y| 2 + nL |Y|) 7 / 13
  19. 19. Contributions Dataset Models Experiments Conclusion Models Comparison n : # words in the sentence, |Y| : # labels, L : max segment length B B B I I I O O O said Dr Teh Fig. 1: Linear CRF: O(n|Y|2 ) N N N O O O said Dr Teh Fig. 2: Semi-CRF: O(nL |Y| 2 ) N N NN N N O O OO O O said Dr Teh Fig. 3: Weak Semi-CRF: O(n |Y| 2 + nL |Y|) 7 / 13
  20. 20. Contributions Dataset Models Experiments Conclusion Models Comparison n : # words in the sentence, |Y| : # labels, L : max segment length B B B I I I O O O said Dr Teh Fig. 1: Linear CRF: O(n|Y|2 ) N N N O O O said Dr Teh Fig. 2: Semi-CRF: O(nL |Y| 2 ) N N NN N N O O OO O O said Dr Teh Fig. 3: Weak Semi-CRF: O(n |Y| 2 + nL |Y|) 7 / 13
  21. 21. Contributions Dataset Models Experiments Conclusion Models Comparison n : # words in the sentence, |Y| : # labels, L : max segment length B B B I I I O O O said Dr Teh Fig. 1: Linear CRF: O(n|Y|2 ) N N N O O O said Dr Teh Fig. 2: Semi-CRF: O(nL |Y| 2 ) N N NN N N O O OO O O said Dr Teh Fig. 3: Weak Semi-CRF: O(n |Y| 2 + nL |Y|) 7 / 13
  22. 22. Contributions Dataset Models Experiments Conclusion Empirical Verification 8 / 13
  23. 23. Contributions Dataset Models Experiments Conclusion F1-Score Basic features +affixes All features 50 60 70 80 71.19 72.49 72.68 74.37 74.69 74.5874.39 74.60 74.31 F1-Score(%) Linear CRF Semi-CRF Weak Semi-CRF 9 / 13
  24. 24. Contributions Dataset Models Experiments Conclusion Training Speed 5,000 10,000 15,000 20,000 0.5 1 1.5 2 # training instances (SMS) Avg.timeperiteration(s) Linear-CRF Semi-CRF Weak Semi-CRF 10 / 13
  25. 25. Contributions Dataset Models Experiments Conclusion Conclusion 11 / 13
  26. 26. Contributions Dataset Models Experiments Conclusion Conclusion We have created a new NP-annotated dataset on informal text 12 / 13
  27. 27. Contributions Dataset Models Experiments Conclusion Conclusion We have created a new NP-annotated dataset on informal text We can split the decisions of selecting segment length and segment type to improve the training time, while maintaining similar accuracy 12 / 13
  28. 28. Contributions Dataset Models Experiments Conclusion Thank You Code and data available at: http://statnlp.org/research/ie/ Aldrian Obaja Muis and Wei Lu Singapore University of Technology and Design 13 / 13

×