SemEval-2018 task 5: Counting events and participants in the long tail

Slides from the presentation of SemEval-2018 task 5: Counting events and participants within highly ambiguous data covering a very long tail


  1. Counting Events and Participants within Highly Ambiguous Data covering a very long tail. SemEval-2018 Task 5. Marten Postma, Filip Ilievski, Piek Vossen ({m.c.postma, f.ilievski, piek.vossen}@vu.nl)
  2. Input: Event properties
     "2-5109": {
       "event_types": "injuring",
       "location": {"state": "http://dbpedia.org/resource/Iowa"},
       "subtask": 2,
       "time": {"year": "2017"},
       "verbose_question": "How many 'injuring' events happened in 2017 (year) in ('Iowa') (state) ?"
     }
  3. Input: Event types
     The same question entry as above, now with the "event_types" field ("injuring") in focus. A minimal parsing sketch in Python follows the slide list.
  4. Subtasks
     ● Subtask S1: Event Questions with Answer=1
       ○ Which killing incident happened in 2014 in Columbus, OH?
     ● Subtask S2: Event Questions with Answer=any number
       ○ How many killing incidents happened in 2016 in Columbus, MS?
     ● Subtask S3: Participant Questions with Answer=any number
       ○ How many people were killed in 2016 in Columbus, MS?
     Optionally, participants could also provide the text mentions of events in the documents.
  5. Data
     Three domains: gun violence, fire disasters, business. The data consists of local news articles and reports on small-world events and participants. The data is split into trial and test parts.
  6. Evaluation
     1. Incident-level evaluation – Did you get the question right?
        a. Accuracy (exact answer matching)
        b. RMSE
     2. Document-level evaluation – How many of the gold documents did you retrieve?
        a. Precision, recall and F1 score
     3. Mention-level evaluation – Did you extract the correct coreference chains?
        a. BLANC, CEAF_E, CEAF_M, MUC, BCUB
     A minimal sketch of the incident-level metrics follows the slide list.
  7. Participating systems
     Task 5 had four participating systems. NewsReader and ID-DE built a knowledge graph of incidents, which was then queried for each question based on its constraints (a small constraint-filtering sketch follows the slide list). NAI-SEA performed clustering on the document level. FEUP applied a supervised approach that addresses participants and locations separately. All systems aided their clustering by extracting temporal expressions, word senses, participants, locations, semantic roles, ...
  8. Results: Subtask 2 (system rank in parentheses)
     Team          Incident-level accuracy norm   Document-level accuracy norm
     FEUP          26.38 (1)                      30.51 (4)
     *NewsReader   21.87 (2)                      36.91 (3)
     NAI-SEA       17.35 (4)                      50.52 (1)
     ID-DE         13.74 (5)                      37.24 (2)
     Baseline      18.25 (3)                      26.38 (5)
  9. See the full paper for more analysis of the system results.
  10. Thank you for your attention.
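
A short illustration of the input format on slides 2-3: the questions are plain JSON keyed by question id, so extracting the constraints a system has to satisfy takes only a few lines of Python. The file name questions.json and the helper parse_question below are hypothetical, not part of the official task kit.

    import json

    def parse_question(qid, q):
        """Collect the constraints encoded in one Task 5 question entry."""
        return {
            "id": qid,
            "subtask": q["subtask"],               # 1, 2, or 3
            "event_type": q["event_types"],        # e.g. "injuring"
            "location": q.get("location", {}),     # e.g. {"state": ".../Iowa"}
            "time": q.get("time", {}),             # e.g. {"year": "2017"}
            "question": q["verbose_question"],
        }

    # Hypothetical file holding entries like the one shown on slides 2-3.
    with open("questions.json") as f:
        questions = json.load(f)

    constraints = [parse_question(qid, q) for qid, q in questions.items()]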
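The incident-level metrics on slide 6 are standard exact-match accuracy and RMSE over the numeric answers. A minimal sketch, assuming gold and predicted answers are dicts from question id to an integer count (the names and toy numbers are illustrative, not the official scorer):

    from math import sqrt

    def incident_level_scores(gold, pred):
        """Exact-match accuracy and RMSE over numeric answers.
        gold, pred: dicts mapping question id -> integer answer."""
        ids = [qid for qid in gold if qid in pred]
        if not ids:
            return 0.0, 0.0
        correct = sum(1 for qid in ids if pred[qid] == gold[qid])
        accuracy = correct / len(ids)
        rmse = sqrt(sum((pred[qid] - gold[qid]) ** 2 for qid in ids) / len(ids))
        return accuracy, rmse

    # Toy example (made-up answers, not task data).
    gold = {"2-5109": 3, "2-5110": 1}
    pred = {"2-5109": 3, "2-5110": 4}
    print(incident_level_scores(gold, pred))   # (0.5, ~2.12)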
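Slide 7 mentions systems that first build a knowledge graph of incidents and then query it with each question's constraints. The participants' own pipelines are described in their papers; the sketch below only illustrates that final querying step for a subtask-2-style question, over a hypothetical list of incident records and the parse_question output from the first sketch.

    def answer_event_question(incidents, constraints):
        """Count incidents matching the question constraints and collect their documents.
        incidents: list of dicts with 'event_type', 'year', 'state', 'documents' (hypothetical schema).
        constraints: one entry produced by parse_question above."""
        matches = [
            inc for inc in incidents
            if inc["event_type"] == constraints["event_type"]
            and inc["year"] == constraints["time"].get("year")
            and inc["state"] == constraints["location"].get("state")
        ]
        answer = len(matches)                                              # numeric answer
        documents = sorted({doc for inc in matches for doc in inc["documents"]})  # supporting docs
        return answer, documents

For subtask 1 the same filter applies, with the expectation that exactly one incident matches; for subtask 3 one would sum a per-incident participant count instead of counting incidents.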
