
Crowdsourcing Linked Data management

Transcript of "Crowdsourcing Linked Data management"

  1. HUMAN COMPUTATION IN THE LINKED DATA MANAGEMENT LIFE CYCLE
     ELENA SIMPERL, UNIVERSITY OF SOUTHAMPTON
     7/18/2013, 1st PRELIDA workshop
  2. HUMAN COMPUTATION
     Outsourcing tasks that machines find difficult to solve to humans (accuracy, efficiency, costs)
  3. SEMANTIC TECHNOLOGIES ARE ALL ABOUT AUTOMATION
     …but many tasks rely on human input
     • Modeling a domain
     • Integrating data sources originating from different contexts
     • Producing semantic markup for various types of digital artifacts
     • ...
  4. DIMENSIONS OF HUMAN COMPUTATION SYSTEMS
     • What: Tasks that require basic human skills
     • How: Distribution, Coordination, Aggregation
     • Quality: Closed vs open answers, Ground truth, Quantitative vs qualitative, Who is the evaluator?
     • Optimize!: Incentives, Reduce problem size, Task assignment
  5. GAMES WITH A PURPOSE (GWAP)
     Human computation disguised as casual games
     Tasks are divided into parallelizable atomic units (challenges) solved (consensually) by players
     Game models
     • Single vs. multi-player
     • Selection agreement vs. input agreement vs. inversion-problem games
  6. MICROTASK CROWDSOURCING
     Similar types of tasks, but a different incentive model (monetary reward, PPP)
     Successfully applied to transcription, classification, content generation, data collection, image tagging, website feedback, usability tests…
  7. THE SAME, BUT DIFFERENT
     • Tasks leveraging common human skills, appealing to large audiences
     • Selection of domain and task more constrained in games to create typical UX
     • Tasks decomposed into smaller units of work to be solved independently
     • Complex workflows
     • Creating a casual game experience vs. patterns in microtasks
     • Quality assurance
     • Synchronous interaction in games
     • Levels of difficulty and near-real-time feedback in games
     • Many methods applied in both cases (redundancy, votes, statistical techniques)
     • Different set of incentives and motivators
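The redundancy and voting methods mentioned on this slide can be illustrated with a minimal Python sketch. It assumes each task is answered by several contributors and that an answer is accepted only when more than half of them agree; the function name `majority_vote`, the `min_agreement` threshold, and the sample answers are hypothetical and not taken from the slides.

```python
from collections import Counter


def majority_vote(answers, min_agreement=0.5):
    """Return the most common answer if its share exceeds min_agreement, else None."""
    if not answers:
        return None
    counts = Counter(answers)
    label, votes = counts.most_common(1)[0]
    # Require a strict majority so that ties are left unresolved.
    return label if votes / len(answers) > min_agreement else None


# Hypothetical redundant answers collected for two microtasks.
answers_by_task = {
    "task-1": ["yes", "yes", "no"],
    "task-2": ["yes", "no"],
}

aggregated = {task: majority_vote(a) for task, a in answers_by_task.items()}
print(aggregated)
```

Tasks that come back as `None` could be sent to additional contributors; statistical techniques that weight contributors by estimated reliability are a common alternative to plain voting.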
  8. HYBRID SYSTEMS
     • Physical world (people and devices)
     • Design and composition
     • Participation and data supply
     • Model of social interaction
     • Virtual world (network of social interactions)
     [Dave Robertson]
  9. EXAMPLE: HYBRID DATA INTEGRATION
       paper              conf
       Data integration   VLDB-01
       Data mining        SIGMOD-02

       title          author   email    venue
       OLAP           Mike     mike@a   ICDE-02
       Social media   Jane     jane@b   PODS-05
     Generate plausible matches
     • paper = title, paper = author, paper = email, paper = venue
     • conf = title, conf = author, conf = email, conf = venue
     Ask users to verify
     • Does attribute paper match attribute author? Yes / No / Not sure
     [McCann, Shen, Doan, ICDE 2008]
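The two steps on this slide (generate plausible matches, then ask users to verify) can be sketched in Python as follows. This is an illustration only, not the method of the cited ICDE 2008 paper: it assumes every attribute pair is a candidate, as the slide lists, and the user answers in `answers` are hypothetical.

```python
from itertools import product

# Attribute names taken from the two tables on the slide.
source_attrs = ["paper", "conf"]
target_attrs = ["title", "author", "email", "venue"]


def candidate_matches(source, target):
    """Enumerate every source/target attribute pair as a plausible match."""
    return list(product(source, target))


def verification_question(src, tgt):
    """Phrase one candidate match as the yes/no question shown to users."""
    return f"Does attribute {src} match attribute {tgt}?"


def accepted_matches(candidates, answers):
    """Keep only candidates that users confirmed with "Yes"."""
    return [pair for pair in candidates if answers.get(pair) == "Yes"]


candidates = candidate_matches(source_attrs, target_attrs)
questions = [verification_question(s, t) for s, t in candidates]

# Hypothetical user answers; unanswered pairs are treated as unconfirmed.
answers = {
    ("paper", "title"): "Yes",
    ("paper", "author"): "No",
    ("conf", "venue"): "Yes",
}
matches = accepted_matches(candidates, answers)
print(questions[1])
print(matches)
```

In practice an automatic matcher would usually prune the candidate list first, so that users only see the pairs the machine is unsure about.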
  10. EXAMPLES FROM THE LINKED DATA WORLD
      ELENA SIMPERL, UNIVERSITY OF SOUTHAMPTON, UK
  11. WHAT IS DIFFERENT ABOUT SEMANTIC SYSTEMS?
      Semantic Web tools vs. applications
      • Intelligent (specialized) Web sites (portals) with improved (local) search based on vocabularies and ontologies
      • X2X integration (often combined with Web services)
      • Knowledge representation, communication and exchange
  12. TASKS NAMED IN METHODOLOGIES ARE TOO HIGH-LEVEL
      Crowdsource very specific tasks that are (highly) divisible
      • Labeling (in different languages)
      • Finding relationships
      • Populating the ontology
      • Aligning and interlinking
      • Ontology-based annotation
      • Validating the results of automatic methods
      • …
      Think about the context of the application (social structure) and about how to hide tasks behind existing practices and tools
      (Tutorial@ESWC2013)
  13. TASTE IT! TRY IT!
      • Restaurant review Android app developed in the Insemtives project
      • Uses DBpedia concepts to generate structured reviews
      • Uses mechanism design/gamification to configure incentives
      • User study
      • 2274 reviews by 180 reviewers referring to 900 restaurants, using 5667 DBpedia concepts
      https://play.google.com/store/apps/details?id=insemtives.android&hl=en
      [Chart: number of reviews, number of semantic annotations (type of cuisine), and number of semantic annotations (dishes), by venue type: CAFE, FASTFOOD, PUB, RESTAURANT]
  14. LODREFINE
      http://research.zemanta.com/crowds-to-the-rescue/
  15. DBPEDIA CURATION
      http://aksw.org/Projects/TripleCheckMate.html
  16. CROWDMAP
      Experiments using MTurk, CrowdFlower and established benchmarks
      Enhancing the results of automatic techniques
      Fast, accurate, cost-effective
      [Sarasua, Simperl, Noy, ISWC2012]
                   CartP     100R50P       100R50P       100R50P    100R50P       Imp
                   301-304   Edas-Iasted   Ekaw-Iasted   Cmt-Ekaw   ConfOf-Ekaw   301-304
        PRECISION  0.53      0.8           1.0           1.0        0.93          0.73
        RECALL     1.0       0.42          0.7           0.75       0.65          1.0
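Precision and recall figures such as those reported on the CrowdMap slide are conventionally computed by comparing the mappings a method produces against a reference alignment. The Python sketch below shows that standard calculation; the mapping pairs are invented for illustration and do not come from the CrowdMap experiments.

```python
def precision_recall(found, reference):
    """Compute precision and recall of found mappings against a reference alignment."""
    found, reference = set(found), set(reference)
    correct = found & reference
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical reference alignment between two ontologies "a" and "b".
reference = {
    ("a:Paper", "b:Article"),
    ("a:Author", "b:Writer"),
    ("a:Chair", "b:Chairman"),
    ("a:Review", "b:Evaluation"),
}

# Hypothetical mappings accepted by the crowd: two correct, one wrong.
found = {
    ("a:Paper", "b:Article"),
    ("a:Author", "b:Writer"),
    ("a:Paper", "b:Review"),
}

p, r = precision_recall(found, reference)
print(f"precision={p:.2f} recall={r:.2f}")
```

A configuration that shows the crowd many candidate pairs tends to raise recall at the cost of precision, which is one way to read the spread of values on the slide.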
  17. ONTOLOGY POPULATION
  18. LINKED DATA CURATION
  19. PROBLEMS AND CHALLENGES
      • What is feasible and how can tasks be optimally translated into microtasks?
        • Examples: data quality assessment for technical and contextual features; subjective vs objective tasks (also in modeling); open-ended questions
      • What to show to users
        • Natural language descriptions of Linked Data/SPARQL
        • How much context
        • What form of rendering
        • How about links?
      • How to combine with automatic tools
        • Which results to validate
        • Low precision (no fun for gamers...)
        • Low recall (vs all possible questions)
      • How to embed it into an existing application
        • Tasks are fine-grained and perceived as an additional burden on top of the actual functionality
      • What to do with the resulting data?
        • Integration into existing practices
        • Vocabularies!
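One way to approach the "natural language descriptions of Linked Data" question is to render each triple as a plain yes/no question before showing it to contributors. The Python sketch below is a deliberately naive assumption: it derives labels from the last segment of each URI, whereas a real system would more likely use `rdfs:label` values. The helper names and the example triple are illustrative only.

```python
def local_name(uri):
    """Take the part after the last '#' or '/' and make it readable."""
    name = uri.rstrip("/").rsplit("#", 1)[-1].rsplit("/", 1)[-1]
    return name.replace("_", " ")


def triple_to_question(subject, predicate, obj):
    """Render one RDF triple as a yes/no verification question."""
    return (
        f'Is it correct that "{local_name(subject)}" '
        f'has {local_name(predicate)} "{local_name(obj)}"?'
    )


# Illustrative triple using DBpedia-style URIs.
question = triple_to_question(
    "http://dbpedia.org/resource/Southampton",
    "http://dbpedia.org/ontology/country",
    "http://dbpedia.org/resource/United_Kingdom",
)
print(question)
```

How much surrounding context to add (types, neighbouring triples, links to the source page) remains the open design question the slide raises.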