1. CrowdTruth for Digital Hermeneutics
Human-assisted computing for
understanding of events in cultural heritage
Lora Aroyo
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
5. DIGITAL
HERMENEUTICS
theory of interpretation: relation parts of wholes
events as context for interpretation of online collections
intersection of hermeneutics & Web technology
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
9. Issues
• How
to
get
(extract)
the
events?
– needs
to
be
scalable
and
automated
• How
to
collect
ground
truth
for
training?
– experts
don’t
do
as
good
job
as
they
think
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
10. Flickr: vanilllaph
What
are
Events?
events perdure = their parts exist at different time points
objects endure = they have all their parts at all points in time
objects are wholly present at any point in time,
events unfold over time
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
11. Events
are
Vague
humans have no clear notion of what events are
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
13. “event is a significant
happening or gathering
of people. I would
define a happening as
an event if the group of
people gathered were
united in one common
goal.”
We asked the crowd what an EVENT is ...
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
14. We asked the crowd what an EVENT is ...
“Event is a happening,
which can be scheduled
or unscheduled. An
earthquake or fire
happens (unscheduled). A
wedding or birthday
party (scheduled). It is an
occasion that is unusual
and tends to be
memorable.”
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
15. “An event would be any
occurrence where physical
action has taken place. It may
be a single, momentary
instance (I sneezed), or it may
span a period of time (the
festival ran for four hours). An
event may also be made up of a
number of smaller events,
such as a day at school is an
event, but each individual class
is also an event itself. Basically
an event must have a physical
action over any delimited time
span.”
We asked the crowd what an EVENT is ...
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
16. “A planned public or social get together or occasion.”
“an event is an incident that's very important or monumental”
“An event is something occurring at a specific time and/or
date to celebrate or recognize a particular occurrence.”
“a location where something like a function is held. you could tell
if something is an event if there people gathering for a purpose.”
“Event can refer to many things such as: An observable
occurrence, phenomenon or an extraordinary occurrence.”
We asked the crowd what an EVENT is ...
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
17. “an
event
is
the
exemplifica:on
of
a
property
by
a
substance
at
a
given
:me” Jaegwon
Kim,
1966
“events
are
changes
that
physical
objects
undergo”
Lawrence
Lombard,
1981
“events
are
proper:es
of
spa:otemporal
regions”,
David
Lewis,
1986
under30ceo.com
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
18. Gold Standard
Assumption
• Systems
need
to
be
told
what
is
right
&
what
is
wrong
with
a
gold
standard
or
ground
truth
• Performance
is
measured
on
test
sets
veHed
by
human
experts
à
never
perfect,
always
improving
against
test
data
• Historically,
gold
standards
are
created
assuming
that
for
each
annotated
instance
there
is
a single right answer
• Gold
standard
quality
is
measured
in
inter-annotator agreement à
does
not
account
for
perspec:ves,
for
reasonable
alterna:ve
interpreta:ons
19. HOW
DO
WE
SCALE
&
AUTOMATE
SOMETHING
FOR
WHICH
THERE
IS
SO
MUCH
DISAGREEMENT?
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
20. Position
Annotator disagreement is not noise, but signal.
Not a problem to overcome but a source of information for machines
Artificially restricting humans does not help machines to learn.
They will learn better from diversity
21. Crowd Truth
Annotator disagreement is indicative of the variation
in human semantic interpretation of signs, and can
indicate ambiguity, vagueness, over-generality, etc.
http://www.freefoto.com/preview/01-47-44/Flock-of-Birds
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
22. HOW DO WE COLLECT & REPRESENT
DISAGREEMENT SO THAT IT CAN BE
HARNESSED?
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
23. • Use crowdsourcing to get multiple perspectives (in
the collection)
• Automatically generate examples (for scalability)
• Multiple people annotate each example
• Represent the annotations (the result) in a way that
captures the disagreement
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
24. CrowdTruth Framework for Medical Relation Extraction
Aroyo, L., Welty, C.: Crowd Truth: Harnessing disagreement in crowdsourcing a relation extraction gold standard. WebSci2013. ACM, 2013
Aroyo, L., Welty, C.: Truth is a Lie: 7 Myths about Human Annotation, AI Magazine, 2014 (in print)
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
25. CrowdTruth Framework for News Event Extraction
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
26. The police came to Apple’s glass cube on Fifth Avenue
on Tuesday to enforce order after activists released
black balloons inside the cube to [protest] the
company’s environmental policies.
The police came to Apple’s glass cube on Fifth Avenue
on Tuesday [to enforce] order after activists released
black balloons inside the cube to protest the company’s
environmental policies.
The police came to Apple’s glass cube on Fifth Avenue
on Tuesday [to enforce order] after activists released
black balloons inside the cube to protest the company’s
environmental policies.
The police came to Apple’s glass cube on Fifth Avenue
on Tuesday to enforce order after activists [released]
black [balloons] inside the cube to protest the
company’s environmental policies.
The police [came]to Apple’s glass cube on Fifth Avenue
on Tuesday [to enforce] order after activists released
black balloons inside the cube to protest the company’s
environmental policies.
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
27. Event Semantics are Hard
event type: Semafor
event location type: GeoNames
event time type: Allen’s time theory, KSL time ontology
event participant type: based on proper nouns classes
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
Inel, O., Aroyo, L., et al (2013). Domain-independent Quality Measures for Crowd Truth Disagreement, DeRIVE2013
28. Events have multiple DIMENSIONS
Micro-taskTemplate
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
29. Each DIMENSION has different GRANULARITY
Micro-taskTemplate
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
30. People have different POINTS OF VIEWS
Micro-taskTemplate
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
31. Why do people disagree?
Sign
Reference Observer
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
32. Why do people disagree?
Sentence
Annotation Task Worker
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
33. Disagreement Analytics
• sentence metrics: sentence clarity, sentence-relation score
• annotation task metrics: event clarity, event type similarity, relation ambiguity
• worker metrics:
o worker-sentence disagreement
o worker-worker disagreement
o avg number of annotations per sentence
o valid words in explanation text
o same explanation across contributions
o “[OTHER]” + different type
o time to complete, number of sentences, etc.
Soberón G.,Aroyo, L., et al (2013): Crowd truth metrics. CrowdSem2013 Workshop
Aroyo, L., Welty, C.: (2013) Measuring crowd truth for medical relation extraction. AAAI Fall Symposium on Semantics for Big Data
http://www.americanprogress.org/wp-content/uploads/2012/12/multiple_measures_onpage.jpghttp://lora-aroyo.org http://slideshare.net/laroyo @laroyo
34. Experimental Setting
• 70 putative events
• 8 experiments 2 for each
event role filler
• annotations 15 workers
per putative event
• max annot./worker 10
• workers native English
speakers on CF
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
35. Annotation Example
Around 2:30 p.m., as if delivering birthday greetings, several Greenpeace
demonstrators [ENTERED] the cube clutching helium-filled balloons, which
were the shape and color of charcoal briquettes.
Overall annotation & granularity distribution:
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
36. EventTypeDisagreement
[ENTERED]ACTION (18.2%)
MOTION (9.1%)
ARRIVING_OR_
DEPARTING (54.5%)
PURPOSE (18.2%)
Around 2:30 p.m., as if delivering birthday greetings, several Greenpeace
demonstrators [ENTERED] the cube clutching helium-filled balloons, which
were the shape and color of charcoal briquettes.
type
type
type
type
Sentence
Ontology Worker
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
38. EventTimeDisagreement
[ENTERED]
Around
(9.1%)
Around
2:30 p.m.
(45.45%)
2:30 p.m.
(45.45%)
TIMESTAM
P (100%)
TIMESTAM
P (100%)
type
type
TIMESTAM
P (100%)
type
Around 2:30 p.m., as if delivering birthday greetings, several Greenpeace
demonstrators [ENTERED] the cube clutching helium-filled balloons, which
were the shape and color of charcoal briquettes.
Sentence
Ontology Worker
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
40. Comparative Annotation Distribution
Event Type Distribution Time Type Distribution
The high disagreement for event type across all sentences likely indicates problems with the ontology. These
event types are difficult to distinguish between. The event classes may overlap, be confusable, too vague, etc.
Sentence
Ontology Worker
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
41. Comparative Annotation Distribution
Location Type Distribution Participant Type Distribution
Sentence
Ontology Worker
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
42. Sentence Clarity
Identifies sentences that are unclear or ambiguous based on the distribution of types
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
43.
44. Spam Detection
From the TRIANGLE we aim to efficiently remove the spam &
low-quality contributors:
o we filter sentences based on their clarity score first in order to avoid
penalizing workers for contributing on difficult or ambiguous sentences;
When bad sentences are identified we remove them and see significant
increase of accuracy on spam detection
o apply the worker metrics to analyze worker agreement to identify
workers who systematically disagree (1) with the opinion of the majority
(worker-sentence disagreement), or (2) with the rest of their co-workers
(worker-worker disagreement); When spammers are identified we remove
their annotations and the accuracy of the sentence metrics improves
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
45. What more …
Understand human disagreement on event extraction with
focus on ambiguity:
○ Would different classification (ontology) of putative
events perform better?
○ Does the overlapping of the types (ontology) influence
the results?
○ Identify the right role fillers (per event) for multiple
putative events.
○ Would event clustering help with determining the most
appropriate structure of the event and its role fillers?
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
46. Conclusions
● Capturing events is important for digital hermeneutics
● Understanding disagreement helps understand event
semantics
● Considering the interdependence of the different aspects of
the annotations improves their quality
● Disagreement metrics adaptable across domains - helped
us to understand the vagueness and the clarity of a sentence/
putative event
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
Sentence
Annotation task Worker
47. The Crowd Truth Crew
• Lora
Aroyo,
PI
(VU)
• Chris
Welty,
PI
(IBM)
• Robert-‐Jan
Sips
(IBM)
• Anca
Dumitrache,
PhD
candidate
(VU-‐IBM)
• Oana
Inel,
Lukasz
Romasko,
researchers
(VU)
• Students:
Khalid,
Rens,
Benjamin,
TaXana,
HarrieYe
• Engineers:
Jelle,
Arne
(IBM)
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
49. AGORA
Eventing History
Susan Legêne Chiel van den Akker
VU History department
VU Computer Science
Guus Schreiber Lora Aroyo
Geertje Jacobs
Johan Oomen
http://agora.cs.vu.nl
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
50. Events @
• Agora: Historical Events in Cultural Heritage Collections
– http://agora.cs.vu.nl/
• Extractivism: Activist Events in Newspapers
– http://mona-project.org/
• Semantics of History
– http://www2.let.vu.nl/oz/cltl/semhis/
• BiographyNet: Events Change in Perspective over Time
• NewsReader: Multilingual Events & Storylines in Newspapers
– http://www.newsreader-project.eu/
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo