Poster for our extended abstract presented at the NewsKDD workshop at the KDD 2014 conference and at the ESWC Summer School 2014, where it won 3rd place for best student poster.
Funded under: FP7
Area: Language Technologies (ICT-2011.4.2)
Project reference: 288342
Coordinator: Marko Grobelnik
www.xlike.org
INTERFACE
Aljaž Košmerlj, Jenya Belyaeva, Gregor Leban, Blaž Fortuna, Marko Grobelnik
Artificial Intelligence Laboratory, Jožef Stefan Institute, Ljubljana, Slovenia
ABSTRACT
We present a system for manually extracting structured event information from free-form newswire text. Extraction is performed on news articles preprocessed by services developed within the XLike project and is guided by suggestions the system generates using machine learning techniques. Results of tests performed with human annotators show that the system can produce meaningful data and suggest several avenues for its improvement.

INPUT
Sets of articles about the same event from the Event Registry service (http://eventregistry.org).

PIPELINE
1. A list of articles about the event and a list of entities (i.e. noun phrases) found in the articles.
2. The type of event described in the articles, defined by the user or selected from suggestions.
3. A list of filled and unfilled roles defined by users for the selected event type.
4. Entity role selection using a dropdown list, either in the text or in the entity list.
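The annotation produced this way amounts to a structured record per event. A minimal sketch of what such a record might look like — the field names, role names, and values below are illustrative assumptions, not the system's actual schema:

```python
# Illustrative sketch of a structured event record produced by the
# annotation pipeline; field and role names are assumptions, not the
# system's actual schema.
event = {
    "event_type": "earthquake",        # defined by the user or picked from suggestions
    "articles": [
        "Strong quake shakes central Italy",
        "Magnitude 6.2 earthquake hits near Norcia",
    ],
    "roles": {                         # roles defined for this event type
        "location": "Norcia, Italy",   # filled: entity chosen from a dropdown
        "magnitude": "6.2",
        "casualties": None,            # unfilled role
    },
}

# Unfilled roles are those still mapped to None.
unfilled = [r for r, v in event["roles"].items() if v is None]
print(unfilled)  # -> ['casualties']
```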
EVENT TYPE SUGGESTION
Suggestions are generated by an SVM classifier built using the QMiner data analytics platform (http://qminer.ijs.si). The classifier was built on a dataset of 100 events annotated into 5 event types (road accident, product launch, protest, earthquake and bombing) by an expert annotator. Features include concepts found in the event by Event Registry as well as bag-of-words features computed on article titles and the event summary. Leave-one-out testing yields a classification accuracy of CA = 0.67.

EVALUATION
- 11 annotators annotated the same 10 events.
- 12.1% ± 3.1% of proposed entities annotated per event.
- 6.2 ± 0.9 roles filled per event.
- Average pairwise event type agreement: 5.9 ± 2.0.
- Average pairwise Jaccard index of roles with the same annotation: 0.25 ± 0.09.
- Average number of successful event type suggestions per user: 6.6 ± 1.9.
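The average pairwise Jaccard agreement reported in the evaluation can be computed as sketched below; the role annotations are made-up toy data, and this is only an illustration of the metric, not the project's evaluation code.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index of two sets: |A ∩ B| / |A ∪ B| (1.0 if both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

# Toy data: (role, entity) annotations by three annotators for one event.
annotations = [
    {("location", "Norcia"), ("magnitude", "6.2")},
    {("location", "Norcia"), ("casualties", "2")},
    {("location", "Norcia"), ("magnitude", "6.2"), ("casualties", "2")},
]

# Average the Jaccard index over all annotator pairs.
pairs = list(combinations(annotations, 2))
avg = sum(jaccard(a, b) for a, b in pairs) / len(pairs)
print(round(avg, 3))  # -> 0.556
```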