The document discusses using both experts and crowdsourcing to enrich cultural heritage knowledge. It describes four research projects: 1) extracting knowledge from social tagging, 2) harnessing non-expert medical knowledge, 3) obtaining expert knowledge through nichesourcing, and 4) capturing event knowledge. The goal is to develop scalable and reliable methods of capturing human knowledge to improve machine systems.
Crowdsourcing & Nichesourcing: Enriching Cultural Heritage with Experts & Crowds
1. Crowdsourcing & Nichesourcing: Enriching Cultural Heritage with Experts & Crowds
Lora Aroyo
http://lora-aroyo.org ! http://slideshare.net/laroyo ! @laroyo
2. software systems are ever more intelligent
but they don’t actually understand people
3. focus on human knowledge in machine-readable form
but there are types of human knowledge that can’t be captured by machines
5. classical AI involves human experts to manually provide training knowledge for machines
human expert-based ground truth does not scale for current demand for machines to deal with wide ranges of real-world tasks and contexts
7. Quantity is the new Quality
Human Computation adopts human intelligence at scale to improve purely machine-based systems
9. humans accurately perform interpretation tasks
can their effort be adequately harnessed in a scientifically reliable manner that scales across tasks, contexts & data modalities?
13. Research Projects
1. extracting knowledge from social annotation
2. harnessing of non-expert human knowledge
3. nichesourcing of expert knowledge
14. Research Projects
1. extracting knowledge from social annotation
2. harnessing of non-expert human knowledge
3. nichesourcing of expert knowledge
4. capturing event knowledge
15. 1. Human Knowledge from Social Tagging
16. Waisda? @ Beeld en Geluid
– an online video tagging game, e.g. NCRV and VARA
– power of the crowd for Web-scale enrichment of A/V collections
– significant accuracy for improving IR beyond the state-of-the-art of expert-based annotations
– sustainable means for engaging large volunteer crowds
1. Human Knowledge from Social Tagging
18. Following the grandeur of Baroque, Rococo art is
often dismissed as frivolous and unserious, but
Waldemar Januszczak disagrees. […] The first
episode is about travel in the 18th century and how
it impacted greatly on some of the finest art ever
made. The world was getting smaller and took on
new influences shown in the glorious Bavarian
pilgrimage architecture, Canaletto's romantic
Venice and the blossoming of exotic designs and
tastes all over Europe.
Rococo: Travel, pleasure, madness
A boarding school where boys from the Dutch East Indies receive a vocational education has been set up by the ministry of Social Affairs near Batavia. Trumpeter sounds the reveille; - the boys get out of bed; - the muster is held in front of the building; afterwards the boys stand in line with a food bowl; they get rice pudding and eat this around long tables in the open air; - the boys get instructions and practice, among other things, drawing shapes, forging, metalworking, filing and woodworking;
News from Indonesia: youth care
http://vista-tv.eu
http://dive.beeldengeluid.nl
Crowdsourcing for Video Analysis
19. 2. Harnessing Non-expert Human Knowledge in Medical Domain
20. 2. Harnessing Non-expert Human Knowledge in Medical Domain
Training IBM Watson
– gathering higher quality ground truth data from the crowd than from experts
– open source human-machine computation framework for accurately measuring quality of resulting data
– CrowdTruth.org
25. 3. Nichesourcing Expert Knowledge
Cultural Heritage
– Different layers and domains of expertise
– Intrinsic motivation of knowledgeable crowds
– Accurator.nl @ Rijksmuseum
semantic enrichment with LOD can be complemented with domain knowledge from niches of experts
26. there are massive niches of experts online …
30. 4. Capturing Event Knowledge
interpretation support framework
Digital Hermeneutics
– text, images and videos, e.g. newspapers, tweets, TV and radio programs, and images
– events carry ambiguity, bias & wide range of perspectives
– knowledge representation challenges, e.g. information granularity, temporal and provenance modeling
– Agora, DIVE+ @ KB, Beeld en Geluid, Amsterdam Museum
– Crowddriven.nl for TV show Game of Thrones
32. “Digital Hermeneutics: Agora and the online understanding of cultural heritage” In proceedings of Web Science Conference (ACM: New York, 2011)
Interpretation Support for Scholars
35. understanding perspectives
– diversity of opinions from the crowd
– multitude of contexts
– ambiguity in language, images or videos
– independent interpretations
– aggregated view – the big picture
– a new approach to understanding semantics by harnessing the power & diversity (disagreement on the correct interpretation) of the crowd
– human disagreement is essential in helping machines with semantic interpretation
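
The idea of keeping crowd disagreement as a signal, instead of collapsing it into a single "correct" answer, can be illustrated with a minimal Python sketch. It is not taken from the slides and is not the CrowdTruth API: the function names, worker ids and relation labels below are hypothetical, and the cosine-style score is only one plausible way to turn independent interpretations into an aggregated view.

```python
from collections import Counter
from math import sqrt


def aggregate(annotations):
    """Sum the labels chosen independently by each worker for one item.

    `annotations` maps a (hypothetical) worker id to the set of labels
    that worker selected.
    """
    counts = Counter()
    for labels in annotations.values():
        counts.update(labels)
    return counts


def label_scores(counts):
    """Cosine similarity between the item's label vector and each single label."""
    norm = sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {label: c / norm for label, c in counts.items()}


def clarity(counts):
    """Highest label score: 1.0 when all workers agree, lower when they disagree."""
    return max(label_scores(counts).values(), default=0.0)


# Assumed example data: two items annotated with made-up relation labels.
unanimous = {"w1": {"treats"}, "w2": {"treats"}, "w3": {"treats"}}
split = {
    "w1": {"treats"},
    "w2": {"treats"},
    "w3": {"treats", "prevents"},
    "w4": {"prevents"},
}

for name, item in [("unanimous", unanimous), ("split", split)]:
    counts = aggregate(item)
    print(name, dict(counts), round(clarity(counts), 3))
```

In this sketch a low clarity value does not mark the workers as wrong; it flags the item itself as ambiguous, which is the kind of information a majority vote would discard.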