Crowdsourcing for research libraries

Invited talk at LIBER2014



  1. CROWDSOURCING CONTENT MANAGEMENT: CHALLENGES AND OPPORTUNITIES. ELENA SIMPERL, UNIVERSITY OF SOUTHAMPTON. LIBER2014, 3 July 2014
  2. EXECUTIVE SUMMARY. Crowdsourcing helps with content management tasks. However, • there is crowdsourcing and crowdsourcing → pick your faves and mix them • human intelligence is a valuable resource → experiment design is key • sustaining engagement is an art → crowdsourcing analytics may help • computers are sometimes better than humans → the age of ‘social machines’
  3. CROWDSOURCING: PROBLEM SOLVING VIA OPEN CALLS. “Simply defined, crowdsourcing represents the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call. This can take the form of peer-production (when the job is performed collaboratively), but is also often undertaken by sole individuals. The crucial prerequisite is the use of the open call format and the large network of potential laborers.” [Howe, 2006]
  4. THE MANY FACES OF CROWDSOURCING
  5. CROWDSOURCING AND RESEARCH LIBRARIES. CHALLENGES: understand what drives participation; design systems to reach critical mass and sustain engagement. OPPORTUNITIES: better ‘customer’ experience; enhanced information management; capitalize on crowdsourced scientific workflows
  6. IN THIS TALK: CROWDSOURCING AS ‘HUMAN COMPUTATION’. Outsourcing to humans tasks that machines find difficult to solve
  7. IN THIS TALK: CROWDSOURCING DATA CITATION. ‘The USEWOD experiment’ • Goal: collect information about the usage of Linked Data sets in research papers • Explore different crowdsourcing methods • Online tool to link publications to data sets (and their versions): http://prov.usewod.org/ (9650 publications) • 1st feasibility study with 10 researchers in May 2014
  8. DIMENSIONS OF CROWDSOURCING
  9. DIMENSIONS OF CROWDSOURCING. WHAT IS OUTSOURCED: tasks based on human skills not easily replicable by machines • Visual recognition • Language understanding • Knowledge acquisition • Basic human communication • ... WHO IS THE CROWD • Open call (crowd accessible through a platform) • Call may target specific skills and expertise (qualification tests) • Requester typically knows less about the ‘workers’ than in other ‘work’ environments. See also [Quinn & Bederson, 2012]
  10. DIMENSIONS OF CROWDSOURCING (2). HOW IS THE TASK OUTSOURCED • Explicit vs. implicit participation • Tasks broken down into smaller units undertaken in parallel by different people • Coordination required to handle cases with more complex workflows • Partial or independent answers consolidated and aggregated into a complete solution. See also [Quinn & Bederson, 2012]
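A minimal sketch of the decomposition step described on this slide: split a job into atomic units and assign each unit to several people in parallel so the answers can later be aggregated. The item and worker names are illustrative, not part of any USEWOD tooling.

```python
import random
from collections import defaultdict

def build_assignments(items, workers, redundancy=3):
    """Assign each atomic unit to `redundancy` distinct workers,
    so partial answers can later be consolidated."""
    assignments = defaultdict(list)  # worker -> list of items to annotate
    for item in items:
        for worker in random.sample(workers, k=min(redundancy, len(workers))):
            assignments[worker].append(item)
    return assignments

# Example: each paper is annotated by three different people in parallel.
papers = ["paper-001", "paper-002", "paper-003"]
crowd = ["w1", "w2", "w3", "w4", "w5"]
print(build_assignments(papers, crowd))
```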
  11. EXAMPLE: CITIZEN SCIENCE. WHAT IS OUTSOURCED • Object recognition, labeling, categorization in media content. WHO IS THE CROWD • Anyone. HOW IS THE TASK OUTSOURCED • Highly parallelizable tasks • Every item is handled by multiple annotators • Every annotator provides an answer • Consolidated answers solve scientific problems
  12. USEWOD EXPERIMENT: TASK AND CROWD. WHAT IS OUTSOURCED: annotating research papers with data set information • Alternative representations of the domain • What if the paper is not available? • What if the domain is not known in advance or is infinite? • Do we know the list of potential answers? • Is there only one correct solution to each atomic task? • How many people would solve the same task? WHO IS THE CROWD • People who know the papers or the data sets • Experts in the (broader) field • Casual gamers • Librarians • Anyone (knowledgeable of English, with a computer/cell phone…) • Combinations thereof…
  13. USEWOD EXPERIMENT: TASK DESIGN. HOW IS THE TASK OUTSOURCED: ALTERNATIVE MODELS • Use the data collected here to train an information extraction (IE) algorithm • Use paid microtask workers for a first screening, then an expert crowd to sort out challenging cases • What if you have very long documents potentially mentioning different/unknown data sets? • Competition via Twitter • ‘Which version of DBpedia does this paper use?’ • One question a day, prizes • Needs a gold standard to bootstrap, plus redundancy • Involve the authors • Use crowdsourcing to find out Twitter accounts, then launch a campaign on Twitter • Write an email to the authors… • Change the task • Which papers use DBpedia 3.X? • Competition to find all papers
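One way to realise the "paid microtask first pass, expert crowd for hard cases" model from this slide is to route items by agreement. A hedged sketch; the threshold, labels, and data shapes are assumptions for illustration, not part of the experiment.

```python
from collections import Counter

def route_by_agreement(microtask_answers, threshold=0.8):
    """microtask_answers: {item_id: [answers from paid workers]}.
    Items whose majority answer reaches `threshold` agreement are accepted;
    the rest are escalated to the expert crowd."""
    accepted, escalated = {}, []
    for item, answers in microtask_answers.items():
        label, votes = Counter(answers).most_common(1)[0]
        if votes / len(answers) >= threshold:
            accepted[item] = label
        else:
            escalated.append(item)
    return accepted, escalated

answers = {"paper-001": ["DBpedia 3.8"] * 4 + ["DBpedia 3.9"],
           "paper-002": ["DBpedia 3.9", "DBpedia 3.8", "LinkedGeoData"]}
print(route_by_agreement(answers))  # paper-002 goes to the experts
```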
  14. DIMENSIONS OF CROWDSOURCING (3). HOW ARE THE RESULTS VALIDATED • Solution space closed vs. open • Performance measurements/ground truth • Statistical techniques employed to predict accurate solutions • May take into account confidence values of algorithmically generated solutions. HOW CAN THE PROCESS BE OPTIMIZED • Incentives and motivators • Assigning tasks to people based on their skills and performance (as opposed to random assignment) • Symbiotic combinations of human- and machine-driven computation, including combinations of different forms of crowdsourcing. See also [Quinn & Bederson, 2012]
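As an illustration of the "performance measurements/ground truth" point, one common statistical technique is to estimate each worker's accuracy on known gold items and weight their votes accordingly. A sketch under those assumptions; the field names and sample data are invented.

```python
from collections import defaultdict

def worker_accuracy(gold, answers):
    """gold: {item: true_label}; answers: [(worker, item, label), ...].
    Returns each worker's accuracy on the gold items they answered."""
    hits, seen = defaultdict(int), defaultdict(int)
    for worker, item, label in answers:
        if item in gold:
            seen[worker] += 1
            hits[worker] += int(label == gold[item])
    return {w: hits[w] / seen[w] for w in seen}

def weighted_vote(answers, accuracy, default=0.5):
    """Aggregate labels per item, weighting each vote by worker accuracy."""
    scores = defaultdict(lambda: defaultdict(float))
    for worker, item, label in answers:
        scores[item][label] += accuracy.get(worker, default)
    return {item: max(labels, key=labels.get) for item, labels in scores.items()}

gold = {"q1": "DBpedia 3.8"}
answers = [("w1", "q1", "DBpedia 3.8"), ("w2", "q1", "DBpedia 3.9"),
           ("w1", "q2", "DBpedia 3.9"), ("w2", "q2", "DBpedia 3.8")]
print(weighted_vote(answers, worker_accuracy(gold, answers)))
```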
  15. USEWOD EXPERIMENT: VALIDATION • Domain is fairly restricted • Spam and obviously wrong answers can be detected easily • When are two answers the same? Can there be more than one correct answer per question? • Redundancy may not be the final answer • Most people will be able to identify the data set, but sometimes the actual version is not trivial to reproduce • Make an educated version guess based on time intervals and other features
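A sketch of the "educated version guess" idea: pick the latest data set release that predates the paper. The release dates below are placeholders for illustration; a real run would use the data set's published release history.

```python
from datetime import date

# Illustrative release dates only (assumption, not authoritative).
RELEASES = {
    "DBpedia 3.7": date(2011, 9, 1),
    "DBpedia 3.8": date(2012, 8, 1),
    "DBpedia 3.9": date(2013, 9, 1),
}

def guess_version(paper_date, releases=RELEASES):
    """Return the most recent release already available when the paper
    was written; None if the paper predates all known releases."""
    candidates = [(d, v) for v, d in releases.items() if d <= paper_date]
    return max(candidates)[1] if candidates else None

print(guess_version(date(2013, 5, 15)))  # -> "DBpedia 3.8" under these dates
```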
  16. ALIGNING INCENTIVES IS ESSENTIAL. Successful volunteer crowdsourcing is difficult to predict or replicate • Highly context-specific • Not applicable to arbitrary tasks. Reward models are often easier to study and control (if performance can be reliably measured) • Different models: pay-per-time, pay-per-unit, winner-takes-all • Not always easy to abstract from social aspects (free-riding, social pressure) • May undermine intrinsic motivation
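To make the three reward models concrete, a toy payout calculator; the rates, prize, and contribution counts are invented for illustration.

```python
def pay_per_time(hours, hourly_rate):
    """Reward proportional to time worked."""
    return {w: h * hourly_rate for w, h in hours.items()}

def pay_per_unit(units, unit_rate):
    """Reward proportional to completed units of work."""
    return {w: u * unit_rate for w, u in units.items()}

def winner_takes_all(scores, prize):
    """Only the top contributor receives the prize."""
    best = max(scores, key=scores.get)
    return {w: (prize if w == best else 0.0) for w in scores}

contributions = {"w1": 120, "w2": 45, "w3": 80}  # e.g. annotated papers
print(pay_per_unit(contributions, unit_rate=0.05))
print(winner_takes_all(contributions, prize=100.0))
```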
  17. IT’S NOT ALWAYS JUST ABOUT MONEY. http://www.crowdsourcing.org/editorial/how-to-motivate-the-crowd-infographic/ http://www.oneskyapp.com/blog/tips-to-motivate-participants-of-crowdsourced-translation/ [Source: Kaufmann, Schulze, Veit, 2011] [Source: Ipeirotis, 2008]
  18. CROWDSOURCING ANALYTICS. [Chart: active users (%) by month since registration] See also [Luczak-Rösch et al. 2014]
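The chart on this slide plots the share of users still active N months after registering. A sketch of how such a curve can be computed from an activity log; the field names and sample data are assumptions.

```python
from collections import defaultdict

def retention_curve(registration_month, activity):
    """registration_month: {user: month index at registration};
    activity: set of (user, month) pairs with at least one action.
    Returns {months since registration: % of users active}."""
    active, cohort_size = defaultdict(set), len(registration_month)
    for user, month in activity:
        offset = month - registration_month[user]
        if offset >= 0:
            active[offset].add(user)
    return {k: 100.0 * len(v) / cohort_size for k, v in sorted(active.items())}

regs = {"u1": 0, "u2": 0, "u3": 1}
acts = {("u1", 0), ("u2", 0), ("u1", 1), ("u3", 1), ("u1", 3)}
print(retention_curve(regs, acts))
```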
  19. USEWOD EXPERIMENT: OTHER INCENTIVE MODELS • Who benefits from the results • Who owns the results • Twitter-based contest • ‘Which version of DBpedia does this paper use?’ • One question a day, prizes • If a question is not answered correctly, increase the prize • If participation is low, re-focus the audience or change the incentive • Altruism: for every ten papers annotated we send a student to ESWC… [Source: Nature.com]
  20. DIFFERENT CROWDS FOR DIFFERENT TASKS. Find: contest, Linked Data experts, difficult task, final prize (TripleCheckMate [Kontokostas, 2013]). Verify: microtasks, workers, easy task, micropayments (MTurk, http://mturk.com). See also [Acosta et al., 2013]
  21. COMBINING HUMAN AND COMPUTATIONAL INTELLIGENCE. EXAMPLE: BIBLIOGRAPHIC DATA INTEGRATION. Source A (paper, conf): (Data integration, VLDB-01), (Data mining, SIGMOD-02). Source B (title, author, email, venue): (OLAP, Mike, mike@a, ICDE-02), (Social media, Jane, jane@b, PODS-05). Generate plausible matches: paper = title, paper = author, paper = email, paper = venue; conf = title, conf = author, conf = email, conf = venue. Ask users to verify, e.g. ‘Does attribute paper match attribute author?’ Yes / No / Not sure. See also [McCann, Shen, Doan, 2008]
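A sketch of the generate-then-verify pattern shown on this slide: enumerate plausible attribute correspondences between the two schemas, then turn each candidate into a yes/no/not-sure question for the crowd. The schemas come from the slide; the candidate generation is deliberately simplified (a real matcher would first rank pairs by name or value similarity).

```python
from itertools import product

def candidate_matches(schema_a, schema_b):
    """Pair every attribute of one schema with every attribute of the other."""
    return list(product(schema_a, schema_b))

def verification_questions(candidates):
    """Phrase each candidate correspondence as a crowd verification task."""
    return [f"Does attribute '{a}' match attribute '{b}'?  Yes / No / Not sure"
            for a, b in candidates]

schema_a = ["paper", "conf"]
schema_b = ["title", "author", "email", "venue"]
for question in verification_questions(candidate_matches(schema_a, schema_b)):
    print(question)
```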
  22. SUMMARY AND FINAL REMARKS [Source: Dave de Roure]
  23. SUMMARY • There is crowdsourcing and crowdsourcing → pick your faves and mix them • Human intelligence is a valuable resource → experiment design is key • Sustaining engagement is an art → crowdsourcing analytics may help • Computers are sometimes better than humans → the age of ‘social machines’
  24. THE AGE OF SOCIAL MACHINES
  25. E.SIMPERL@SOTON.AC.UK @ESIMPERL WWW.SOCIAM.ORG WWW.PLANET-DATA.EU THANKS TO MARIBEL ACOSTA, LAURA DRAGAN, MARKUS LUCZAK-RÖSCH, RAMINE TINATI, AND MANY OTHERS
