2. Crowdsourcing Web semantics
• Crowdsourcing is increasingly used to augment the results of algorithms solving Semantic Web problems
• Great challenges:
  – Which form of crowdsourcing for which task?
  – How to design the crowdsourcing exercise effectively?
  – How to combine human- and machine-driven approaches?
4. Typology of crowdsourcing
• Macrotasks
• Microtasks
• Challenges
• Self-organized crowds
• Crowdfunding
Source: (Prpić, Shukla, Kietzmann and McCarthy 2015)
5. Dimensions of crowdsourcing
• What to crowdsource?
– Tasks you can’t complete in-house or using computers
– A question of time, budget, resources, ethics etc.
• Who is the crowd?
– Crowdsourcing ≠ ‘turkers’
– Open call, biased through use of platforms and promotion channels
– No traditional means to manage and incentivize
– Crowd has little to no context about the project
• How to crowdsource?
– Macro vs. microtasks
– Complex workflows
– Assessment and aggregation
– Aligning incentives
– Hybrid systems
– Learn to optimize from previous interactions
6. Hybrid systems (or ‘social machines’)
Tutorial@ISWC2013, 06-Mar-15
[Diagram: the physical world (people and devices) and the virtual world (a network of social interactions), connected via design and composition, participation and data supply, and a model of social interaction]
Source: Dave Robertson
8. Overview
• Compare two forms of crowdsourcing (challenges and paid microtasks) in the context of data quality assessment and repair
• Experiments on DBpedia
  – Identify potential errors, classify, and repair them
  – TripleCheckMate challenge vs. Mechanical Turk
9. What to crowdsource
• Incorrect object
  Example: dbpedia:Dave_Dobbyn dbprop:dateOfBirth "3" .
• Incorrect data type or language tag
  Example: dbpedia:Torishima_Izu_Islands foaf:name "鳥島"@en .
• Incorrect link to external Web pages
  Example: dbpedia:John-Two-Hawks dbpedia-owl:wikiPageExternalLink <http://cedarlakedvd.com/> .
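Error classes like these can be pre-filtered with simple heuristics before handing triples to the crowd. A minimal sketch, assuming hand-written checks (these rules and thresholds are illustrative, not the authors' pipeline):

```python
# Hypothetical heuristic pre-filters for two of the error classes above.
# These checks are assumptions for illustration, not the DBpedia
# extraction framework's actual validation logic.

import re

def looks_like_bad_date(literal: str) -> bool:
    """Flag date values that cannot be a real date, e.g. the literal "3"."""
    return not re.fullmatch(r"\d{4}(-\d{2}(-\d{2})?)?", literal)

def looks_like_bad_lang_tag(literal: str, tag: str) -> bool:
    """Flag @en-tagged literals containing CJK characters, e.g. "鳥島"@en."""
    has_cjk = any("\u4e00" <= ch <= "\u9fff" or "\u3040" <= ch <= "\u30ff"
                  for ch in literal)
    return tag == "en" and has_cjk

# Triples flagged by such filters would be candidates for the
# crowdsourced Find/Verify stages described below.
print(looks_like_bad_date("3"))              # the Dave_Dobbyn example
print(looks_like_bad_lang_tag("鳥島", "en"))  # the Torishima example
```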
10. Who is the crowd
• Find stage – challenge (TripleCheckMate [Kontoskostas2013]): LD experts, difficult task, final prize
• Verify stage – microtasks (Mechanical Turk, http://mturk.com): workers, easy task, micropayments
Adapted from [Bernstein2010]
12. How to crowdsource: microtask design
• Selection of foaf:name or rdfs:label to extract human-readable descriptions
• Values extracted automatically from Wikipedia infoboxes
• Link to the Wikipedia article via foaf:isPrimaryTopicOf
• Preview of external pages by embedding an HTML iframe
[Screenshots: task interfaces for incorrect object, incorrect data type or language tag, and incorrect outlink]
13. Experiments
• Crowdsourcing approaches
  – Find stage: challenge with LD experts
  – Verify stage: microtasks (5 assignments per triple)
• Gold standard
  – Two of the authors independently generated a gold standard for all contest triples
  – Conflicts resolved via mutual agreement
• Validation metric: precision
14. Overall results
                                  LD experts             Microtask workers
Number of distinct participants   50                     80
Total time                        3 weeks (predefined)   4 days
Total triples evaluated           1,512                  1,073
Total cost                        ~US$ 400 (predefined)  ~US$ 43
15. Precision results: Incorrect objects
• Turkers can reduce the error rates of LD experts from the Find stage
• 117 DBpedia triples had predicates related to dates with incorrect/incomplete values, e.g.:
  "2005 Six Nations Championship"  Date  12 .
• 52 DBpedia triples had erroneous values originating from the source, e.g.:
  "English (programming language)"  Influenced by  ? .
• Experts classified all these triples as incorrect
• Workers compared the values against Wikipedia and correctly classified these triples as "correct"

Triples compared   LD Experts   MTurk (majority voting, n=5)
509                0.7151       0.8977
16. Precision results: Incorrect data types
[Bar chart: number of triples per data type (Date, English, Millimetre, Nanometre, Number, Number with decimals, Second, Volt, Year, Not specified/URI), split into expert true/false positives and crowd true/false positives]

Triples compared   LD Experts   MTurk (majority voting, n=5)
341                0.8270       0.4752
17. Precision results: Incorrect links
• We analyzed the 189 misclassifications by the experts
  [Pie chart: Freebase links 50%, Wikipedia images 39%, external links 11%]
• The 6% of misclassifications by the workers correspond to pages in a language other than English

Triples compared   Baseline   LD Experts   MTurk (majority voting, n=5)
223                0.2598     0.1525       0.9412
18. Conclusions
• The effort of LD experts should be invested in tasks that demand domain-specific skills
  – Experts seem less motivated to perform tasks that come with additional overhead (e.g., checking an external link, going back to Wikipedia, repairing data)
• The MTurk crowd was exceptionally good at object and link checks
  – But seems not to understand the intricacies of data types
19. Future work
• Additional experiments with data types
• Workflows for new types of errors, e.g., tasks which experts cannot answer on the fly
• Training the crowd
• Building the DBpedia ‘social machine’
  – Add automatic QA tools
  – Close the feedback loop (from crowd to experts)
  – Change the parameters of the ‘Find’ step to achieve seamless integration
  – Change the crowdsourcing model of ‘Find’ to improve retention
20. CROWDSOURCING IMAGE ANNOTATION
Improving paid microtasks through gamification and adaptive furtherance incentives
O. Feyisetan, E. Simperl, M. Van Kleek, N. Shadbolt
WWW 2015, to appear
21. Overview
• Make paid microtasks more cost-effective through gamification*
  *use of game elements and mechanics in a non-game context
23. Who is the crowd: paid microtask platforms
[Kaufmann, Schulze, Viet, 2011]
24. Research hypotheses
• Contributors will perform better if tasks are more engaging
• Increased accuracy through higher inter-annotator agreement
• Cost savings through reduced unit costs
• Micro-targeting incentives when players attempt to quit improves retention
25. How to crowdsource: microtask design
• Image labeling tasks published on a microtask platform
  – Free-text labels, varying numbers of labels per image, taboo words
  – 1st setting: ‘standard’ tasks, including basic spam control
  – 2nd setting: the same requirements and rewards, but contributors were asked to carry out the task in Wordsmith
26. How to crowdsource: gamification
• Levels – 9 levels, from newbie to Wordsmith, as a function of images tagged
• Badges – function of the number of images tagged
• Bonus points – for new tags
• Treasure points – for multiples of bonus points
• Feedback alerts – related to badges, points, levels
• Leaderboard – hourly scores and level of the top 5 players
• Activities widget – real-time updates on other players
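These mechanics can be sketched as simple functions of the tag count. The concrete thresholds and intermediate level names below are invented; the deck only states that levels and badges are a function of images tagged:

```python
# Hypothetical sketch of the level and bonus-point mechanics.
# Thresholds and intermediate level names are assumptions.

import bisect

LEVEL_THRESHOLDS = [0, 1, 5, 10, 25, 50, 100, 150, 200]  # 9 levels, assumed
LEVEL_NAMES = ["Newbie", "Apprentice", "Novice", "Adept", "Skilled",
               "Seasoned", "Expert", "Master", "Wordsmith"]

def level_for(images_tagged: int) -> str:
    """Map a player's tag count onto one of the 9 levels."""
    idx = bisect.bisect_right(LEVEL_THRESHOLDS, images_tagged) - 1
    return LEVEL_NAMES[idx]

def bonus_points(tags: list, known_tags: set) -> int:
    """One bonus point per previously unseen tag (assumed rate)."""
    return sum(1 for t in tags if t not in known_tags)

print(level_for(0), level_for(12), level_for(200))
```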
27. How to crowdsource: incentives
• Money – 5 cents extra for the effort
• Power – see how other players tagged the images
• Leaderboard – advance to the ‘Global’ leaderboard seen by everyone
• Badges – receive the ‘Ultimate’ badge and a shiny new avatar
• Levels – advance straight to the next level
• Access – quicker access to treasure points

Incentives assigned at random or targeted based on Bayes inference (number of tagged images as feature)
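The targeting rule can be sketched with Bayes' rule over the stated feature: estimate how likely a player with a given tag count is to quit, from counts observed in earlier sessions. The bucketing and the training counts below are invented for illustration:

```python
# Sketch of Bayes-rule quit-risk estimation from historical sessions.
# Feature bucketing and counts are assumptions, not the paper's data.

def quit_probability(bucket, history):
    """P(quit | bucket) = P(bucket | quit) * P(quit) / P(bucket)."""
    n = len(history)
    quits = [h for h in history if h["quit"]]
    p_quit = len(quits) / n
    p_bucket = sum(1 for h in history if h["bucket"] == bucket) / n
    p_bucket_given_quit = (sum(1 for h in quits if h["bucket"] == bucket)
                           / len(quits))
    return p_bucket_given_quit * p_quit / p_bucket if p_bucket else 0.0

history = ([{"bucket": "low", "quit": True}] * 6 +
           [{"bucket": "low", "quit": False}] * 2 +
           [{"bucket": "high", "quit": True}] * 1 +
           [{"bucket": "high", "quit": False}] * 7)

# A furtherance incentive would be offered when quit risk is high.
print(quit_probability("low", history))
print(quit_probability("high", history))
```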
29. Experiments (2)
• 1st experiment: ‘newcomers’
  – Tag one image with two keywords ~ complete Wordsmith level 1
  – 200 images, US$ 0.02 per task, 3 judgements per task
• 2nd experiment: ‘novice’
  – Tag 11 images ~ complete Wordsmith level 2
  – 2,200 images, US$ 0.10 per task, 600 workers
  – Furtherance incentives: none, random, or targeted (Bayes inference based on number of images tagged)
30. Results: 1st experiment
Metric               CrowdFlower   Wordsmith
Total workers        600           423
Total keywords       1,200         41,206
Unique keywords      111           5,708
Avg. agreement       5.72%         37.7%
Mean images/person   1             32
Max images/person    1             200
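One plausible reading of the "Avg. agreement" metric is the mean pairwise overlap of the keyword sets that different workers gave the same image; the exact definition is an assumption here, and the labels below are made up:

```python
# Hedged sketch: average pairwise Jaccard overlap of worker keyword
# sets for one image. The paper may define agreement differently.

from itertools import combinations

def pairwise_agreement(keyword_sets):
    """Mean Jaccard similarity over all pairs of workers' keyword sets."""
    pairs = list(combinations(keyword_sets, 2))
    if not pairs:
        return 0.0
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

labels = [{"dog", "grass"}, {"dog", "park"}, {"dog"}]
print(round(pairwise_agreement(labels), 3))
```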
31. Results: 2nd experiment
Metric               CrowdFlower   Wordsmith (no furtherance incentives)
Total workers        600           514
Total keywords       13,200        35,890
Unique keywords      1,323         4,091
Avg. agreement       6.32%         10.9%
Mean images/person   11            27
Max images/person    1             351
32. Results: 2nd experiment, incentives
• Incentives led to increased participation
  – Power: mean images/person = 132
  – People came back (20 times!) and played longer (43 hours, vs. 3 hours without incentives)
33. Conclusions and future work
• Task design matters as much as payment
• Gamification achieves high accuracy at lower cost and with improved engagement
• Workers appreciate social features, but their main motivation is still task-driven
  – Feedback mechanisms, peer learning, training