Harnessing Human Semantics at Scale (updated)

http://lora-aroyo.org @laroyo
Harnessing Human Semantics at Scale
Measurable, Reproducible, Engaging, Sustainable
Crowdsourcing & Nichesourcing
Lora Aroyo
Join to participate in the CATS4ML Data Challenge
cats4ml.humancomputation.com

1998 2006 2007 2009
the data science journey of an online DVD rental

1998 2006 2007 2009
streaming experiments

1998 2006 2007 2009
Netflix Prize

1998 2006 2007 2009
Team BellKor wins Netflix Prize

1998 2006 2007 2014
data science for personalization

1994 2003 2006 2016 2017
the data science journey of an online bookstore

https://data-flair.training/blogs/data-science-use-cases/

data is at the centre of every process

data is essential to evolve with users

Ceci n'est pas … la mona lisa
Louvre’s Mona Lisa
is only #14

the battle of two worlds
Louvre visitors 2019: 9.6 million
website visitors: 14 million
social media: 2.3 million

in the (very near) future
most visitors will be digital-born
not bound by time or location
native to new forms of co-makership
native to new media
Siebe Weide, Max Meijer and Marieke Krabshuis (2012).
Agenda 2026: Study on the Future of the Dutch Museum Sector

variety of meanings
multitude of perspectives
abundance of sources
endless contexts
know your data

crowdsourcing to know your data at scale

variety of types
multitude of platforms
abundance of interactions
endless characteristics
know your crowds

https://www.rijksmuseum.nl/en/rijksstudio
Engage with Co-creation

Engage with Co-creativity

Engage with Co-curation

Engage the Expert Niche
http://annotate.accurator.nl

expertise of Rijksmuseum professionals is
in annotating their collection
with art-historical information, e.g. when the works
were created, by whom, etc.

detailed domain-specific information
about depicted objects, e.g. which species the
animal or plant belongs to,
is in most cases not available

use nichesourcing, i.e. niches of people with
the right expertise, to add more specific
information

Keep Reproducing
http://annotate.accurator.nl

Engage with Games
training the general crowd to be a niche:
game in which players can carry out expert
annotation tasks with some assistance

http://waisda.nl
Engage with Games

http://spotvogel.vroegevogels.vara.nl
Keep Reproducing

CrowdTruth.org
Experiment with Paid Crowds

http://crowdtruth.org/

http://data.crowdtruth.org/

Challenges

Low reproducibility rates
Difficult to estimate & control the time to complete
Difficult to assess & compare quality
Demands continuous promotional effort
Active learning (human-in-the-loop) needs different expertise
Difficult to incorporate results into existing content infrastructure
Crowdsourcing typically undertaken in isolation

Assess Impact of Task Design

Instructions
Layout
Sequence
Crowds
Payment
Campaign
experiment with different designs

for example
mapping music to mood

Goal: Which is the mood most appropriate for each song? (Lee and Hu 2012)
Choose one:
1 song - 1 mood???

If “One Truth” & “No Disagreement”
Worker Mood-C1 Mood-C2 Mood-C3 Mood-C4 Mood-C5
W1 1
W2 1
W3 1
W4 1
W5 1
W6 1
W7
W8
W9 1
W10 1
Totals 1 3 1 2 1

If “Many Truths” & “Disagreement”
Worker Mood-C1 Mood-C2 Mood-C3 Mood-C4 Mood-C5 Other
W1 1 1 1
W2 1 1 1
W3 1 1 1
W4 1 1
W5 1 1
W6 1 1 1
W7 1 1 1
W8 1 1 1
W9 1 1
W10 1 1 1 1 1
Totals 3 5 6 5 2 8
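The Totals rows of the two worker tables can be compared with a simple spread measure. The sketch below is illustrative only: it assumes normalized Shannon entropy as the disagreement measure (chosen here for convenience, not a metric named on the slides), and the function names `vote_shares` and `normalized_entropy` are hypothetical. Only the per-mood totals are taken from the tables.

```python
import math

def vote_shares(totals):
    """Fraction of all votes that each label received."""
    n = sum(totals.values())
    if n == 0:
        return {}
    return {label: count / n for label, count in totals.items()}

def normalized_entropy(totals):
    """Shannon entropy of the vote distribution, scaled to [0, 1].

    0 means every vote went to one label; 1 means votes are spread
    evenly over all labels. Assumed here as a disagreement measure.
    """
    shares = [s for s in vote_shares(totals).values() if s > 0]
    if len(shares) <= 1 or len(totals) <= 1:
        return 0.0
    entropy = -sum(s * math.log(s) for s in shares)
    return entropy / math.log(len(totals))

# Totals rows copied from the two tables.
one_truth = {"Mood-C1": 1, "Mood-C2": 3, "Mood-C3": 1, "Mood-C4": 2, "Mood-C5": 1}
many_truths = {"Mood-C1": 3, "Mood-C2": 5, "Mood-C3": 6, "Mood-C4": 5,
               "Mood-C5": 2, "Other": 8}

for name, totals in [("one truth", one_truth), ("many truths", many_truths)]:
    print(name, vote_shares(totals), round(normalized_entropy(totals), 3))
```

Keeping the full distribution per song, rather than only the majority label, preserves the information that several moods were considered plausible.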

Web & Media Group
this all results in
simplification of context


Assess Impact of Results
● Identify Crowdsourcing Goals through user log analysis
○ # queries, # unique queries, # queries of specific type
○ ranked by popularity
○ ranked by popularity and with error, e.g.
■ # queries entered over 50 times with 0 results
■ # queries of specific type with 0 results
○ which will have biggest impact
○ which has biggest urgency
● … or through other user analysis
○ museum visits, external channels
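The log-based counts listed on this slide can be sketched in a few lines. This is a minimal illustration, not the analysis used in the talk: the log format (pairs of query text and number of results), the function name `summarize_query_log`, and the sample queries are all assumptions. The threshold of 50 comes from the slide.

```python
from collections import Counter

def summarize_query_log(log, min_count=50):
    """Summarize a search log given as (query_text, result_count) pairs.

    The pair format is a hypothetical stand-in for a real log schema.
    """
    counts = Counter()     # how often each query was entered
    zero_hits = Counter()  # how often each query returned 0 results
    for query, n_results in log:
        q = query.strip().lower()  # light normalization before counting
        counts[q] += 1
        if n_results == 0:
            zero_hits[q] += 1
    return {
        "total_queries": sum(counts.values()),
        "unique_queries": len(counts),
        "by_popularity": counts.most_common(),
        # queries that failed more than min_count times, most frequent first
        "frequent_zero_result": [
            (q, c) for q, c in zero_hits.most_common() if c > min_count
        ],
    }

# Made-up sample log for illustration.
sample_log = (
    [("rembrandt", 12)] * 3
    + [("Nachtwacht ", 0)] * 51
    + [("vermeer", 0)] * 2
)
summary = summarize_query_log(sample_log)
print(summary["total_queries"], summary["unique_queries"])
print(summary["frequent_zero_result"])
```

Queries that are both popular and consistently empty are natural candidates for a crowdsourcing campaign, since filling those gaps affects the most users.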

for example
in video search

people search for fragments
experts annotate full videos
35% of search queries return no results

Measure Quality
“On the Role of User-Generated Metadata in A/V Collections”, Riste Gligorov et al. KCAP2011

Measure Quality
time-based annotation
bernhard
88% of the tags useful
for specific genres
describe short segments
often not very specific
don’t describe program as a whole

for example
in video search
video annotation is time-consuming
5 times the video duration
experts use a specific vocabulary
that is unknown to general audiences

Measure Quality
user vocabulary
8% in professional vocabulary
23% in Dutch lexicon
89% found on Google
locations (7%)
engeland
persons (31%)
objects (57%)

human subjectivity, ambiguity & uncertainty of expression
natural part of human semantics

measure quality
quality is not just about spam
quality is typically multi-dimensional
understand the diversity in crowd answers
do not ignore multitude of interpretations
understand the variety of contexts
identify cases with high ambiguity, similarity, …
experiment with explicit metrics
experiment with different designs
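One way to experiment with an explicit metric is to score how closely each worker's answers align with everyone else's on the same item. The sketch below assumes cosine similarity between a worker's annotation vector and the summed vectors of the other workers. It is loosely in the spirit of disagreement-aware approaches, not a reproduction of any particular framework's metric, and the names `cosine`, `worker_agreement`, and the example vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def worker_agreement(annotations):
    """Score each worker against the aggregate of all other workers.

    annotations maps a worker id to a 0/1 list with one slot per label,
    all for the same item (an assumed input format).
    """
    if not annotations:
        return {}
    n_labels = len(next(iter(annotations.values())))
    total = [sum(vec[i] for vec in annotations.values()) for i in range(n_labels)]
    scores = {}
    for worker, vec in annotations.items():
        rest = [t - a for t, a in zip(total, vec)]  # leave this worker out
        scores[worker] = cosine(vec, rest)
    return scores

# Made-up answers for one item with three candidate labels.
example = {"W1": [1, 1, 0], "W2": [1, 1, 0], "W3": [0, 0, 1]}
scores = worker_agreement(example)
print(scores)
```

A low score alone does not indicate spam: it may equally point to an ambiguous item or a legitimate minority interpretation, which is why such scores are best read per item and per worker together.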

Measure Progress
6 months                  2 years
340,551 tags              36,981 tags
137,421 matches
602 items                 1,782 items
555 registered players    2,017 users (taggers)
thousands of anonymous players
12,279 visits (3+ min online)
44,362 pageviews
Riste Gligorov, Michiel Hildebrand, Jacco van Ossenbruggen, Guus Schreiber, Lora Aroyo (2011).
On the role of user-generated metadata in audio visual collections. International conference
on Knowledge capture K-CAP '11, Pages 145-152
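Because the two campaigns on the Measure Progress slide ran for different lengths of time and over different numbers of items, raw tag counts are hard to compare directly. The sketch below normalizes the paired figures from the slide per month and per item. It assumes that "1.782 items" uses a thousands separator (1,782), and the dictionary layout and the name `normalized_progress` are hypothetical.

```python
def rate(count, denominator):
    """Safe division that returns NaN for a zero denominator."""
    return count / denominator if denominator else float("nan")

# Paired figures from the slide; months derived from the column headings.
campaigns = {
    "6 months": {"months": 6, "tags": 340_551, "items": 602},
    "2 years": {"months": 24, "tags": 36_981, "items": 1_782},  # 1,782 assumed
}

def normalized_progress(campaign):
    """Tags per month and tags per item for one campaign."""
    return {
        "tags_per_month": rate(campaign["tags"], campaign["months"]),
        "tags_per_item": rate(campaign["tags"], campaign["items"]),
    }

for name, data in campaigns.items():
    print(name, normalized_progress(data))
```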

campaign, campaign, campaign

Goals
Measurable quality
Reproducible results
Sustainable settings
Engaging interaction

Crowdsourcing AI Blindspots
CATS4ML Data Challenge

Your AI model is as good as your evaluation data
… but is your evaluation data missing relevant examples?
… and how can we find such examples, especially if they are AI blindspots (i.e. unknown unknowns)?
CATS4ML Challenge offers a crowdsourced red team for finding blindspots of your AI models

AI Blindspots
real images with visual patterns that confuse AI models
in ways humans might find meaningful
Lipstick?
Airplane?
Car?
Construction worker? Thanksgiving?
https://opensource.google/projects/open-images-dataset

Inspired by Bug Bounty
this is a data challenge to find
the blindspots in our AI models
Challenge running
until mid Jan 2021
Join, and start hunting
for AI Blindspots, and
spread the word to other
teams that might be
interested!
