Overview of the 2013 ALTA Shared Task
Proceedings of the Australasian Language Technology Association Workshop 2013 (ALTA 2013), pp132-136, Brisbane, Australia. http://aclweb.org/anthology/U/U13/


Published in: Technology, Education

Transcript

  • 1. Overview of the 2013 ALTA Shared Task
       Diego Mollá, Macquarie University
       Australasian Language Technology Association Workshop (ALTA 2013), Brisbane, Australia
  • 2. Contents
       • The ALTA Shared Tasks
       • The 2013 ALTA Shared Task
       • Kaggle in Class
       • Results
       • Use in University of Melbourne (Karin Verspoor)
  • 4. The ALTA Shared Tasks
       Aims
         • Target university students with programming experience.
         • No background in text processing required.
         • Aim to expose potential researchers to NLP-related problems.
       Format
         • All participants attempt to solve the same problem.
         • The training and test data are common to all.
         • Any tools and external resources can be used.
         • The solution must be completely automated.
  • 6. The 2013 Shared Task
       Task: Case and punctuation restoration
       Categories: student, open
       Prize: $350
       Framework: Kaggle in Class
       Student Category
         • All members are university students.
         • No members are full-time employed.
         • No members have a PhD.
       Open Category
         • Any other teams.
  • 8. Case and Punctuation Restoration
       Input
         ... stored at the ucla television archives the archived episodes were telecast march 8 16 and 24 1971 april 1 and ...
       Output
         ... stored at the UCLA Television Archives. The archived episodes were telecast: March 8, 16, and 24, 1971, April 1 and ...
  • 9. Motivation
       In some situations, English text does not have information about capitalisation or punctuation.
         • Automated text transcriptions.
         • Quick notes.
         • Text messages, tweets.
       In some applications, a preliminary stage of case and punctuation restoration improves outcomes.
         • Machine translation.
         • Information extraction.
  • 11. Case and Punctuation Restoration as a Classification Task
       Baldwin and Joseph (2009): multi-label classification. Each label indicates the information to restore.
         • COMMA: Word is followed by a comma.
         • CAPi: Character i is in uppercase.
         • ALLCAPS: All characters in uppercase.
         • NOCHANGE: No special restoration needed.
         • ...
       Example: corp/CAP1+FULLSTOP+COMMA → Corp.
  • 12. Simplification for the ALTA Shared Task
       Only two labels
         • Case: The word has at least one character in uppercase.
         • Punct: The word is followed by at least one punctuation mark.
       Punctuation marks
         , . ; : ? !
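The two labels defined on this slide can be derived from ordinary text with a few lines of code. The following is a minimal sketch, not the organisers' actual preprocessing: the function name `derive_labels` and the whitespace tokenisation are assumptions made for illustration, and tokens such as brackets are left untouched.

```python
PUNCTUATION = ",.;:?!"  # the six punctuation marks listed on the slide


def derive_labels(text):
    """Return (word, case, punct) triples for whitespace-separated tokens.

    Assumption: tokens are split on whitespace only; this is a simplification
    and may differ from how the shared-task data was actually prepared.
    """
    rows = []
    for token in text.split():
        word = token.rstrip(PUNCTUATION)
        if not word:
            # The token consisted only of punctuation marks; nothing to label.
            continue
        punct = len(word) < len(token)           # at least one trailing mark was stripped
        case = any(ch.isupper() for ch in word)  # at least one uppercase character
        rows.append((word.lower(), case, punct))
    return rows


rows = derive_labels("stored at the UCLA Television Archives. The archived episodes")
for row in rows:
    print(row)
```

Lowercasing the word and dropping the punctuation gives the kind of input the task starts from, while the two booleans are the targets to predict.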
  • 13. Training Set
       CAPITALIZED   PUNCTUATION   WORD
       True          False         positive
       False         False         pressure
       False         False         ventilation
       False         False         (
       True          False         ppv
       False         False         )
       False         False         consists
       False         False         of
       False         False         using
       False         False         a
       False         False         fan
       False         False         to
       False         False         create
  • 14. Test Set
       Input
         ID    WORD
         255   stored
         256   at
         257   the
         258   ucla
         259   television
         260   archives
         261   the
         262   archived
         263   episodes
         264   were
       Output
         Id,documents
         Case,258 259 260 261 266 272
         Punct,260 265 267 268 270 271
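The output format on this slide is a three-line file: a header, then one line per label listing the IDs of the words that carry it. A minimal sketch of writing predictions in that shape follows; the function name `format_submission` is hypothetical, and the example covers only the ten words shown in the input, so its ID lists are shorter than those on the slide.

```python
def format_submission(ids, case_flags, punct_flags):
    """Build the three-line submission text from per-word boolean predictions."""
    case_ids = [str(i) for i, flag in zip(ids, case_flags) if flag]
    punct_ids = [str(i) for i, flag in zip(ids, punct_flags) if flag]
    return "\n".join([
        "Id,documents",                 # header line as shown on the slide
        "Case," + " ".join(case_ids),   # IDs of words predicted to need uppercase
        "Punct," + " ".join(punct_ids), # IDs of words predicted to precede punctuation
    ])


# Illustrative predictions for word IDs 255 to 264 only.
ids = list(range(255, 265))
predicted_case = {258, 259, 260, 261}
predicted_punct = {260}
submission = format_submission(
    ids,
    [i in predicted_case for i in ids],
    [i in predicted_punct for i in ids],
)
print(submission)
```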
  • 15. Data Sources
       Test set
         Data collected by Baldwin & Joseph (2009) from the AP Newswire (APW) and New York Times (NYT) sections of the English Gigaword Corpus.
         1. Public test set: available to participants during the competition.
         2. Private test set: released at the last minute.
       Training set
         • A third partition of the data by Baldwin & Joseph (2009).
         • An extract of Wikipedia.
  • 16. Data Sizes
       Wikipedia extract for training
         • 18 files.
         • 306,445 words in total.
       Data from Baldwin & Joseph (2009)
         • Training: 66,371 words.
         • Public test: 64,072 words.
         • Private test: 66,371 words.
  • 18. Kaggle in Class
       Kaggle
         • Kaggle offers a Web-based framework for data-driven competitions.
         • A large base of potential participants.
         • Potentially large prizes for the participants.
         • Fee-based for the organisers; free for the participants.
       Kaggle in Class
         • Free for organisers and participants.
         • Limited user support by Kaggle.
         • Used by course-based competitions.
  • 19. ALTA Shared Task in Kaggle in Class
  • 20. Features of Kaggle in Class
       • Public leaderboard: all participants can submit and compare their results with those of other participants.
       • Automated evaluation: organisers can choose among several evaluation metrics.
       • Public and private partitions: a private partition of the test data is withheld for the final ranking. However, this feature does not work well with some evaluation metrics.
       • Discussion forum: for communication among participants.
  • 22. Evaluation Metric: Macro-Averaged F1
       Output
         Id,documents
         Case,258 259 260 262 270
         Punct,259 260 265 270
       Target
         Id,documents
         Case,258 259 260 261 266 272
         Punct,260 265 267 268 270 271
       Case: P = 3/5; R = 3/6; F1 = 0.54
       Punct: P = 3/4; R = 3/6; F1 = 0.6
       Final score: (0.54 + 0.6)/2 = 0.57
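The worked example on this slide can be reproduced in a few lines. This is a sketch of macro-averaged F1 over the two labels as described here, not Kaggle's own implementation, which may differ in detail; the function names are hypothetical.

```python
def f1(predicted, target):
    """F1 between two collections of word IDs, compared as sets."""
    predicted, target = set(predicted), set(target)
    true_positives = len(predicted & target)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(target)
    return 2 * precision * recall / (precision + recall)


def macro_f1(output, target):
    """Unweighted mean of the per-label F1 scores."""
    scores = [f1(output.get(label, []), target[label]) for label in target]
    return sum(scores) / len(scores)


# The IDs from the slide.
output = {"Case": [258, 259, 260, 262, 270], "Punct": [259, 260, 265, 270]}
target = {"Case": [258, 259, 260, 261, 266, 272],
          "Punct": [260, 265, 267, 268, 270, 271]}

case_f1 = f1(output["Case"], target["Case"])
punct_f1 = f1(output["Punct"], target["Punct"])
score = macro_f1(output, target)
print(case_f1, punct_f1, score)
```

Without intermediate rounding, the Case F1 is about 0.545 and the final score about 0.573, which agrees with the slide's 0.54 and 0.57 once the values are truncated to two decimals.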
  • 23. A Baseline
       Training data       F1 (public)   F1 (private)
       Train data          0.4355        0.2895
       Wikipedia 0-5       0.4077        0.2761
       Wikipedia 0-10      0.4173        0.2791
       Wikipedia 0-1       0.42267       0.2789
       Train + Wikipedia   0.4493        0.2876
       • Single-label task: Each of the 4 combinations of possible labels forms a single label.
       • Trained NLTK's Hidden Markov Model (HMM).
       • Results improved as we added more training data.
       • Large difference between "public" and "private" test sets.
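The single-label reformulation mentioned on this slide maps each (Case, Punct) pair to one of four tags, so that a standard sequence tagger can be trained on (word, tag) pairs. The sketch below shows only that encoding and its inverse; the tag names are hypothetical, and the HMM training step itself is omitted.

```python
# Hypothetical tag names for the four (case, punct) combinations.
ENCODE = {
    (False, False): "NONE",
    (True, False): "CASE",
    (False, True): "PUNCT",
    (True, True): "CASE+PUNCT",
}
DECODE = {tag: flags for flags, tag in ENCODE.items()}


def to_tagged(rows):
    """Convert (word, case, punct) rows into (word, tag) pairs for a tagger."""
    return [(word, ENCODE[(case, punct)]) for word, case, punct in rows]


def from_tags(tags):
    """Convert predicted tags back into (case, punct) boolean pairs."""
    return [DECODE[tag] for tag in tags]


rows = [("the", False, False), ("ucla", True, False), ("archives", True, True)]
tagged = to_tagged(rows)
print(tagged)
print(from_tags([tag for _, tag in tagged]))
```

Collapsing the two labels into one lets the tagger model them jointly, at the cost of four classes instead of two binary decisions.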
  • 24. Results
       Public data
         Rank   Team            Score
         1      Winner          0.73763
         2      Second          0.68360
         3      ?               0.63232
         4      ?               0.63109
         5      ?               0.60251
         6      ?               0.60147
         7      ?               0.59517
         8      ?               0.58332
         9      ?               0.56832
         10     ?               0.56747
         11     ?               0.55793
         12     ?               0.55606
         13     ?               0.55087
         14     ?               0.52261
         15     ?               0.51954
         16     ?               0.51167
         17     ?               0.49311
         18     ?               0.47622
         19     (test system)   0.46667
         20     ?               0.46490
         21     ?               0.45986
         22     ?               0.45291
         23     (8 systems)     0.44930
         32     ?               0.44914
         33     ?               0.42710
         34     ?               0.42257
         35     ?               0.41692
         36     ?               0.40239
         37     ?               0.38812
         38     ?               0.38113
         39     ?               0.32594
         40     ?               0.32320
         41     ?               0.30988
         42     ?               0.29891
         43     ?               0.29304
         44     ?               0.27642
         45     ?               0.23504
         46     Team A          0.23108
         47     ?               0.21930
         48     ?               0.21771
         49     ?               0.21291
         50     ?               0.20226
         51     ?               0.13397
         52     ?               0.00000
       Baseline (public data): 0.44930
       Private data
         Rank   Team            Score
         1      Winner          0.73660
         2      Second          0.64934
         3      ?               0.30037
         4      Team A          0.07656
  • 26. The ALTA Shared Task in Class at UniMelb
       • Students in the UniMelb Knowledge Technologies subject were assigned the shared task as a class project.
       • Blended learning: augmenting classroom learning with on-line opportunities.
       • Some adaptations were made to the class context:
           Stage 1: Data pre-processing
           Stage 2: Feature and method exploration; report write-up
           Stage 3: Peer review
       • Emphasis on critical analysis of methods and results.
  • 27. ALTA Kaggle in Class at UniMelb
       • Students were given the option of participating on-line through Kaggle in Class.
       • Participating in the on-line forum gave immediate feedback on performance.
       • Open "competition" through the leaderboard stimulated experimentation.
       • Anecdotal observation suggested better overall marks for students who participated on-line.
  • 28. Conclusions
       • Larger participation than in past tasks.
       • Used as an assignment in a Masters unit at the University of Melbourne.
       • Many participants did much better than our baseline.
       • Easy to produce training data.
       • Larger training data from other domains (Wikipedia) improves the results.
       • Kaggle in Class was useful, though we had to use a second "final" submission that had very few participants.
       Questions?