CrowdTruth for User-Centric Relevance

CrowdTruth
Human-assisted computing for understanding
semantic interpretation & user-centric relevance
Lora Aroyo
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo

“The
Gallery
of
Cornelis
van
der
Geest”
(Willem
van
Haecht)

hun<ng
vs.
dogs

events
vs.
people

DIGITAL
HERMENEUTICS
theory of interpretation: relation parts of wholes
events as context for interpretation of online collections
intersection of hermeneutics & Web technology

Linking to Events

Generating
Events
Narrative

So
far
so
good,
but
….
L.
Aroyo,
C.
Welty:
Truth
is
a
Lie:
7
Myths
about
Human
Annota;on,
AI
Magazine
2014
(in
press).

Events
are
Vague
people have no clear notion of what events are

Events
have
Perspec+ves
and people don’t always agree

“Event can refer to many things such as: An observable
occurrence, phenomenon or an extraordinary occurrence.”
“an event is an incident that's very important or monumental”
“A planned public or social get together or occasion.”
“An event is something occurring at a specific time and/or
date to celebrate or recognize a particular occurrence.”
“a location where something like a function is held. you could tell
if something is an event if there people gathering for a purpose.”
If you ask the crowd ...

“an
event
is
the
exemplifica;on
of
a
property
by
a
substance
at
a
given
;me” Jaegwon
Kim,
1966
“events
are
changes
that
physical
objects
undergo”
Lawrence
Lombard,
1981
“events
are
proper;es
of
spa;otemporal
regions”,
David
Lewis,
1986
under30ceo.com
If you ask the experts ...

under30ceo.com
People are the ones who search
& determine relevance

Experts
vs.
Crowd?
Medical Relation Extraction Task (Aroyo, Welty 2014)
• 91% of expert annotations covered by the crowd
• expert annotators agree only in 30%
• popular crowd vote covers 95% of expert agreement
Waisda? Video Tagging (Gligorov et al. 2011)
• 14% tags in search logs are in professional vocab (GTAA)
• huge gap between expert & lay users’ views on what’s
important
Steve.Museum Project (Leason 2009)
• 14% user tags are in expert-curated documentation
under30ceo.com

CrowdTruth
Based on annotator disagreement as an indication of
the variation in human semantic interpretation of
signs, and can indicate ambiguity, vagueness, over-generality,
etc.
crowdtruth.org
L. Aroyo, C. Welty: Crowd Truth: Harnessing disagreement in crowdsourcing a relation extraction gold standard. ACM WebSci 2013.

CrowdTruth Framework for News Event Extraction
O.Inel, K.Khamkham, T.Cristea, A.Rutjes, J.van der Ploeg, L.Aroyo, R. Sips, A.Dumitrache, L.Romaszko: CrowdTruth: Machine-
Human Computation Framework for Harnessing Disagreement in Gathering Annotated Data. ISWC 2014.

News Event Extraction
The police came to Apple’s glass cube on Fifth Avenue
on Tuesday to enforce order after activists released
black balloons inside the cube to [protest] the
company’s environmental policies.
on Tuesday [to enforce] order after activists released
black balloons inside the cube to protest the company’s
environmental policies.
on Tuesday [to enforce order] after activists released
on Tuesday to enforce order after activists [released]
black [balloons] inside the cube to protest the
company’s environmental policies.
The police [came]to Apple’s glass cube on Fifth Avenue
on Tuesday [to enforce] order after activists released

Video Event Extraction
Following the grandeur of Baroque, Rococo
art is often dismissed as frivolous and
unserious, but Waldemar Januszczak
disagrees. […] The first episode is about
travel in the 18th century and how it
impacted greatly on some of the finest art
ever made. The world was getting smaller and
took on new influences shown in the glorious
Bavarian pilgrimage architecture,
Canaletto's romantic Venice and the
blossoming of exotic designs and tastes all
over Europe.
Rococo: Travel, pleasure, madness

Events have multiple DIMENSIONS
Micro-task Template

Each DIMENSION has different GRANULARITY
Micro-task Template

People have different POINTS OF VIEWS
Micro-task Template

Triangle of Reference
Sign
Reference Observer

Triangle of Reference
to Capture Disagreement
Sentence
Annotation Task Worker

CrowdTruth Metrics
Event Extraction
Three parts to understand human interpretations:
Sentence
How good is a sentence for the event extraction task?
Workers
How well does a worker understand the sentence?
Relations
Is the meaning of the event type clear?
How ambiguous/confusable is it?
Aroyo, L., Welty, C.: (2014) The Three Sides of CrowdTruth. Journal of Human Computation
Lora Aroyo Crowd Truth for Cognitive Computing Chris Welty

Crowd Truth Metrics
based on the Triangle of Reference
Three parts to understand human interpretations:
Sign
How good is a sign for conveying information?
People
How well does a person understand the sign?
Ontology
Are the distinctions of the ontology clear?
How ambiguous/confusable are they?
Aroyo, L., Welty, C.: (2014) The Three Sides of CrowdTruth. Journal of Human Computation
Lora Aroyo Crowd Truth for Cognitive Computing Chris Welty

Disagreement Analytics
• sentence metrics: sentence clarity, sentence-relation score
• annotation task metrics: event clarity, type similarity, relation
ambiguity
• worker metrics:
o worker-sentence disagreement
o worker-worker disagreement
o avg number of annotations per sentence
o valid words in explanation text
o same explanation across contributions
o “[OTHER]” + different type
o time to complete, number of sentences, etc.
Aroyo, L., Welty, C.: (2013) Measuring crowd truth for medical relation extraction. AAAI Fall Symposium on Semantics for Big Data
http://www.americanprogress.org/wp-content/uploads/2012/12/multiple_http://lora-aroyo.org http://slideshare.net/laroyo @laroyo measures_onpage.jpg

Spam Detection
o filter sentences on their clarity score: to avoid penalizing workers
for contributing on ambiguous sentences
o bad sentences are removed = increase of accuracy on spam
detection
o apply worker metrics to analyze worker agreement: workers
who systematically disagree
o with majority (worker-sentence disagreement)
o with rest of co-workers (worker-worker disagreement)
o spammers annotations are removed = improvement of accuracy of
sentence metrics

Annotation Example
Around 2:30 p.m., as if delivering birthday greetings, several Greenpeace
demonstrators [ENTERED] the cube clutching helium-filled balloons, which
were the shape and color of charcoal briquettes.
Overall annotation & granularity distribution:

Event Type Disagreement
[ENTERED]
ACTION (18.2%)
MOTION (9.1%)
ARRIVING_OR_
DEPARTING (54.5%)
PURPOSE (18.2%)
type
type
Sentence
type type
Ontology Worker

Event Location Disagreement
[ENTERED]
the cube
(38.5%)
cube
(38.5%)
none
(23%)
OTHER (100%)
NOT
APPLICABLE
(100%)
type
type
COMMERCIAL
(40%)
OTHER (40%)
INDUSTRIAL
(20%)
type
type
type
Sentence
Ontology Worker

[ENTERED]
Event Time Disagreement
Around
(9.1%)
Around
2:30 p.m.
(45.45%)
2:30 p.m.
(45.45%)
TIMESTAM
P (100%)
TIMESTAM
P (100%)
type
type
TIMESTAM
P (100%)
type
Sentence
Ontology Worker

Event Participant Disagreement
[ENTERED]
Greenpeace
(15.39%)
demonstrators
(15.39%)
Greenpeace
demonstrators
(69.23%)
ORGANIZATION
(100%)
PERSON (100%)
type
type
ORGANIZATION
(77.77%)
PERSON
(22.22%)
type
type
Sentence
Ontology Worker

Comparative Annotation Distribution
Event Type Distribution Time Type Distribution
Sentence
Ontology Worker
The high disagreement for event type across all sentences likely indicates problems with the ontology. These
event types are difficult to distinguish between. The event classes may overlap, be confusable, too vague, etc.

Comparative Annotation Distribution
Location Type Distribution Participant Type Distribution
Sentence
Ontology Worker

Challenges
● Defining relevance, e.g. relevant
or related events, entities, videos
● Depicted vs. associated
relevance, e.g. in video, in audio
● Deal with reliability, e.g.
provenance
● Visualize quality analytics, e.g.
multidimensionality

Events Cultural Heritage Exploration
http://dive.beeldengeluid.nl/

Events @
• Agora: Historical Events in Cultural Heritage Collections
– http://agora.cs.vu.nl/
• Extractivism: Activist Events in Newspapers
– http://mona-project.org/
• Semantics of History
– http://www2.let.vu.nl/oz/cltl/semhis/
• BiographyNet: Events Change in Perspective over Time
• NewsReader: Multilingual Events & Storylines in Newspapers
– http://www.newsreader-project.eu/

Conclusions
● Events are just one example for diversity of human
interpretations
● Understanding crowd disagreement helps understand event
semantics
● Considering the interdependence of the different aspects of
the annotations improves their quality
● Disagreement metrics adaptable across domains - helped
us to understand the vagueness and the clarity of a sentence/
putative event
Sentence
Annotation task Worker

Questions?

CrowdTruth for User-Centric Relevance

More Related Content

Viewers also liked

Similar to CrowdTruth for User-Centric Relevance

More from Lora Aroyo

Recently uploaded

CrowdTruth for User-Centric Relevance