Tutorial: Social Semantic Web and Crowdsourcing - E. Simperl - ESWC SS 2014
  1. SOCIAL SEMANTIC WEB AND CROWDSOURCING
     Elena Simperl, University of Southampton, UK
     E.SIMPERL@SOTON.AC.UK, @ESIMPERL
     With contributions from “Combining the Social and the Semantic Web”, tutorial @ ESWC 2011 by Sheila Kinsella, DERI, IE and Denny Vrandecic, KIT, DE, HTTP://SEMANTICWEB.ORG/WIKI/COMBINING_THE_SOCIAL_AND_THE_SEMANTIC_WEB
  2. FUNDAMENTALS OF THE SOCIAL WEB
  3. THE SOCIAL WEB
     Software deployed over the Web, designed and developed to foster and leverage social interaction. Users are connected via common digital artifacts they share and through social ties.
     • Sharing platforms such as Flickr
     • Collaborative platforms such as Wikipedia, discussion forums, Quora, Groupon, etc.
     • Social networks such as Facebook, combining both aspects
  4. DEVELOPMENT
     Early ‘90s
     • Read-only, static Web sites
     • Interaction via posts on personal Web sites, possibly responding to someone else’s statements
     Mid ‘90s
     • Read-write applications
     • Gradual increase in interaction between site owners and their readers/customers through various communication features
     Late ‘90s to present
     • First social sites, increasingly ubiquitous deployment of social features in non-social Web sites
  5. CHALLENGES OF SOCIAL SOFTWARE
     Many isolated communities
     • Data silos
     Challenges
     • Data cannot be shared and reused easily
     • No control over the management and transfer of personal data
     • User engagement and incentives engineering
  6. SEMANTIC SOCIAL SOFTWARE
  7. THE ROLE OF SEMANTIC TECHNOLOGIES
     Semantics for…
     • Advanced information management
     • Data interoperability
     • Data privacy
     • Data liberation
     Scenarios
     1. Semantic technologies as an integral part of the functionality of social software, enabling interoperability
     2. Enhanced information management through the usage of machine-processable semantics
  8. ONTOLOGIES
     • FOAF (Friend-of-a-Friend)
     • SIOC (Semantically-Interlinked Online Communities)
     • OPO (Online Presence Ontology)
     • Tagging ontologies
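As a concrete illustration of the first vocabulary (not part of the original slides), here is a minimal sketch that builds a tiny FOAF description of a person and a social tie with the Python rdflib library; the people and URIs are invented for illustration.

```python
# Minimal sketch: describing a person and a social tie with FOAF using rdflib.
# Names and URIs below are illustrative only.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import FOAF, RDF

g = Graph()
g.bind("foaf", FOAF)

alice = URIRef("http://example.org/people/alice#me")  # hypothetical URI
bob = URIRef("http://example.org/people/bob#me")      # hypothetical URI

g.add((alice, RDF.type, FOAF.Person))
g.add((alice, FOAF.name, Literal("Alice Example")))
g.add((alice, FOAF.mbox, URIRef("mailto:alice@example.org")))
g.add((alice, FOAF.knows, bob))                        # the social tie

print(g.serialize(format="turtle"))
```

Serialised as Turtle, such profiles can be published alongside ordinary Web pages and merged across sites, which is exactly the interoperability argument of the previous slides.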
  9. OPEN GRAPH
     Represent Web content in a social graph in an interoperable way
     Design decisions: see http://www.scribd.com/doc/30715288/The-Open-Graph-Protocol-Design-Decisions
     Used by Facebook (‘stories’), Google (snippets), IMDb, etc.
     Facebook: actors, apps, objects with metadata to create stories
     • Example: Elena has finished reading ‘The Economist’, an object of type Newspaper
     • Types with attributes, extensions allowed
     • Pre-defined and custom actions on objects
     http://ogp.me/
     Image from https://developers.facebook.com/docs/opengraph/creating-custom-stories
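To make the protocol tangible (not from the slides), the sketch below emits the four basic Open Graph properties listed at ogp.me (og:title, og:type, og:url, og:image) as HTML meta tags; the page values are invented examples.

```python
# Sketch: rendering basic Open Graph metadata (see http://ogp.me/) as <meta> tags.
# The values are invented; og:title, og:type, og:url and og:image are the four
# properties ogp.me expects every described page to carry.
page = {
    "og:title": "The Economist",
    "og:type": "article",                        # assumed generic object type
    "og:url": "http://example.org/economist",    # hypothetical URL
    "og:image": "http://example.org/cover.jpg",  # hypothetical URL
}

meta_tags = "\n".join(
    f'<meta property="{prop}" content="{value}" />' for prop, value in page.items()
)
print(meta_tags)
```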
  10. SOFTWARE PLATFORMS
     • Drupal (RDFa), https://www.drupal.org/
     • Semantic MediaWiki, https://semantic-mediawiki.org/
     • Semantic blogs, http://www.zemanta.com/
  11. COLLABORATIVE TECHNOLOGIES FOR THE SEMANTIC WEB
  12. BASICS
     Idea: leverage Social Web technologies and platforms to create and manage Semantic Web content
     Two classes of approaches
     • Mining user contributions, e.g., turning folksonomies into OWL ontologies (not part of this tutorial)
     • Crowdsourcing Semantic Web tasks, e.g., GWAPs, paid microtasks, campaigns, contests
  13. CROWDSOURCING: PROBLEM SOLVING VIA OPEN CALLS
     “Simply defined, crowdsourcing represents the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call. This can take the form of peer-production (when the job is performed collaboratively), but is also often undertaken by sole individuals. The crucial prerequisite is the use of the open call format and the large network of potential laborers.” [Howe, 2006]
  14. THE MANY FACES OF CROWDSOURCING
     Image from http://dailycrowdsource.com/
  15. RELATED: HUMAN COMPUTATION
     Outsourcing to humans tasks that machines find difficult to solve
  16. IN THIS TALK: CROWDSOURCING DATA CITATION
     ‘The USEWOD experiment’
     • Goal: collect information about the usage of Linked Data sets in research papers
     • Explore different crowdsourcing methods
     • Online tool to link publications to data sets (and their versions)
     • First feasibility study with 10 researchers in May 2014
     http://prov.usewod.org/
     9650 publications
  17. HOW TO CROWDSOURCE
     What: Goal
     Who: Contributors
     How: Process
     Why: Motivation and incentives
     http://sloanreview.mit.edu/article/the-collective-intelligence-genome/
     Image at http://jasonbiv.files.wordpress.com/2012/08/collectiveintelligence_genome.jpg
  18. EXAMPLE: CITIZEN SCIENCE
     WHAT IS OUTSOURCED
     • Object recognition, labeling, categorization in media content
     WHO IS THE CROWD
     • Anyone
     HOW IS THE TASK OUTSOURCED
     • Highly parallelizable tasks
     • Every item is handled by multiple annotators
     • Every annotator provides an answer
     • Consolidated answers solve scientific problems
     WHY DO PEOPLE CONTRIBUTE
     • Fun, interest in science, desire to contribute to science
     zooniverse.org
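The "consolidated answers" step is often implemented as simple redundancy plus majority voting; below is a minimal sketch of that idea (item IDs and labels are invented, not taken from Zooniverse, and real projects typically use more elaborate aggregation).

```python
# Sketch: consolidating redundant annotations by majority vote.
# Item IDs and labels are invented examples.
from collections import Counter

# Each item was shown to several independent annotators.
annotations = {
    "image_001": ["galaxy", "galaxy", "star", "galaxy"],
    "image_002": ["star", "star", "star"],
    "image_003": ["galaxy", "star"],   # a tie: flagged for further review
}

def consolidate(labels):
    """Return the majority label, or None if there is no clear majority."""
    (top, top_count), *rest = Counter(labels).most_common()
    if rest and rest[0][1] == top_count:
        return None
    return top

for item, labels in annotations.items():
    print(item, consolidate(labels))
```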
  19. WHY
     Money (rewards), love, glory
     • Love and glory reduce costs
     • Money and glory make the crowd move faster
     Intrinsic vs. extrinsic motivation
     • Rewards/incentives influence motivation; they are granted by an external ‘judge’
     Successful volunteer crowdsourcing is difficult to predict or replicate
     • Highly context-specific
     • Not applicable to arbitrary tasks
     Reward models are often easier to study and control (if performance can be reliably measured)
     • Not always easy to abstract from social aspects (free-riding, social pressure)
     • May undermine intrinsic motivation
  20. WHY (2)
     In our example:
     Who benefits from the results, and who owns the results
     Money
     • Different models: pay-per-time, pay-per-unit, winner-takes-it-all
     • Define the rewards, analyze trade-offs (accuracy vs. costs), avoid spam
     Love
     • Authors, researchers, DBpedia community, open access advocates, publishers, casual gamers, etc.
     Glory
     • Competitions, mentions, awards
     Image from Nature.com
  21. WHAT
     In general: something you cannot do (fully) automatically
     In our example: annotating research papers with data set information
     • Alternative representations of the domain
     • What if the paper is not available?
     • What if the domain is not known in advance or is infinite?
     • Do we know the list of potential answers?
     • Is there only one correct solution to each atomic task?
     • How many people would solve the same task?
  22. WHO
     In general
     • An open (‘unknown’) crowd
     • Scale helps
     • Qualifications might be a prerequisite
     In our example
     • People who know the papers or the data sets
     • Experts in the (broader) field
     • Casual gamers
     • Librarians
     • Anyone (knowledgeable of English, with a computer/cell phone, etc.)
     • Combinations thereof
  23. HOW: PROCESS
     In general:
     • Explicit vs. implicit participation
       • Affects motivation (see also [Quinn & Bederson, 2012])
     • Tasks broken down into smaller units undertaken in parallel by different people
       • Does not apply to all forms of crowdsourcing
     • Coordination required to handle cases with more complex workflows
       • Sometimes using different crowds for workflow parts
       • Example: text summarization
     • Task assignment to match skills, preferences, and contribution history
       • Example: random assignment vs. meritocracy vs. full autonomy
     • Partial or independent answers consolidated and aggregated into a complete solution
       • Example: challenges (e.g., Netflix) vs. aggregation (e.g., Wikipedia)
  24. HOW: PROCESS (2)
     In our example:
     • Use the data collected here to train an IE (information extraction) algorithm
     • Use paid microtask workers to do a first screening, then an expert crowd to sort out challenging cases
       • What if you have very long documents potentially mentioning different/unknown data sets?
     • Competition via Twitter
       • ‘Which version of DBpedia does this paper use?’
       • One question a day, prizes
       • Needs a gold standard to bootstrap, plus redundancy
     • Involve the authors
       • Use crowdsourcing to find out Twitter accounts, then launch a campaign on Twitter
       • Write an email to the authors…
     • Change the task
       • Which papers use DBpedia 3.X?
       • Competition to find all papers
  25. HOW: VALIDATION
     In general:
     • Solution space: closed vs. open
     • Redundancy vs. iterations/peer review
     • Performance measurements/ground truth
       • Available and accepted
       • Automated techniques employed to predict accurate solutions
     In our example:
     • The domain is fairly restricted
       • Spam and obviously wrong answers can be detected easily
     • When are two answers the same?
     • Can there be more than one correct answer per question?
     • Redundancy may not be the final answer
       • Most people will be able to identify the data set, but sometimes the actual version is not trivial to reproduce
       • Make an educated version guess based on time intervals and other features
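One common way to operationalise the ground-truth point above is to seed a few questions with known answers among the real tasks and score each contributor against them. The sketch below illustrates that idea under invented question IDs, answers, and an arbitrary accuracy threshold; it is not the USEWOD setup.

```python
# Sketch: estimating contributor accuracy from gold-standard ("ground truth")
# questions mixed into the task stream. All IDs and answers are invented.
gold_answers = {"q1": "DBpedia 3.8", "q2": "DBpedia 3.9"}   # known solutions

worker_answers = {
    "worker_a": {"q1": "DBpedia 3.8", "q2": "DBpedia 3.9", "q7": "LinkedGeoData"},
    "worker_b": {"q1": "DBpedia 3.7", "q2": "DBpedia 3.9", "q7": "DBpedia 3.9"},
}

def gold_accuracy(answers: dict, gold: dict) -> float:
    """Share of gold questions a contributor answered correctly."""
    scored = [qid for qid in gold if qid in answers]
    if not scored:
        return 0.0
    correct = sum(answers[qid] == gold[qid] for qid in scored)
    return correct / len(scored)

# Keep only contributors above an accuracy threshold (0.7 is an arbitrary choice).
trusted = {w for w, ans in worker_answers.items()
           if gold_accuracy(ans, gold_answers) >= 0.7}
print(trusted)   # {'worker_a'}
```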
  26. EXAMPLES
     Image courtesy of M. Acosta
  27. TASTE IT! TRY IT!
     Restaurant review Android app developed in the Insemtives project
     • Uses DBpedia concepts to generate structured reviews
     • Uses mechanism design/gamification to configure incentives
     User study
     • 2274 reviews by 180 reviewers referring to 900 restaurants, using 5667 DBpedia concepts
     https://play.google.com/store/apps/details?id=insemtives.android&hl=en
     [Chart: number of reviews and number of semantic annotations (type of cuisine, dishes) per venue type: cafe, fast food, pub, restaurant]
  28. ONTOPRONTO
     GWAP ~ game with a purpose
     Players’ contributions are used to improve the results of technically challenging tasks
     OntoPronto
     • Task: map Wikipedia topics to classes of an ontology
     • Selection-agreement game: players have to agree on answers
     • Step 1: decide whether a topic is a class or an instance
     • Step 2: classify the topic in the ontology
  29. URBANOPOLY
     http://www.cefriel.com/en/urbanopoly-game/
  30. CROWDMAP
     See also [Sarasua, Simperl, Noy, ISWC 2012]
     • Experiments using MTurk, CrowdFlower and established benchmarks
     • Enhancing the results of automatic techniques
     • Fast, accurate, cost-effective
     Precision and recall per benchmark:
     • CartP 301-304: precision 0.53, recall 1.0
     • 100R50P Edas-Iasted: precision 0.8, recall 0.42
     • 100R50P Ekaw-Iasted: precision 1.0, recall 0.7
     • 100R50P Cmt-Ekaw: precision 1.0, recall 0.75
     • 100R50P ConfOf-Ekaw: precision 0.93, recall 0.65
     • Imp 301-304: precision 0.73, recall 1.0
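The figures above are standard set-based precision and recall over alignments. As a reminder of how such numbers are computed against an OAEI-style reference alignment, here is a minimal sketch with invented mappings (not the actual CrowdMap data).

```python
# Sketch: set-based precision and recall for an ontology alignment,
# as commonly used with OAEI-style benchmarks. Mappings are invented examples.
crowd_alignment = {("cmt:Paper", "ekaw:Paper"), ("cmt:Author", "ekaw:Author"),
                   ("cmt:Review", "ekaw:Session")}            # one wrong mapping
reference_alignment = {("cmt:Paper", "ekaw:Paper"), ("cmt:Author", "ekaw:Author"),
                       ("cmt:Review", "ekaw:Review")}          # gold standard

true_positives = crowd_alignment & reference_alignment
precision = len(true_positives) / len(crowd_alignment)        # correct / returned
recall = len(true_positives) / len(reference_alignment)       # correct / expected

print(f"precision={precision:.2f} recall={recall:.2f}")       # 0.67 and 0.67
```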
  31. DBPEDIA CURATION
     See also [Acosta et al., ISWC 2013]
     Find: contest, Linked Data experts, difficult task, final prize
     Verify: microtasks, workers, easy task, micropayments
     TripleCheckMate [Kontokostas et al., 2013], MTurk (http://mturk.com)
  32. CROWDQ
     • Understand the meaning of a keyword query
     • Build a structured (SPARQL) query template
     • Answer the query over Linked Open Data
     Gianluca Demartini, Beth Trushkowsky, Tim Kraska, and Michael Franklin. CrowdQ: Crowdsourced Query Understanding. In: 6th Biennial Conference on Innovative Data Systems Research (CIDR 2013)
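For the last step, answering a structured query over Linked Open Data, the sketch below runs a SPARQL query against the public DBpedia endpoint with the SPARQLWrapper library. The query is an invented example of a filled-in template, not actual CrowdQ output, and result availability depends on the live endpoint.

```python
# Sketch: answering a SPARQL query over Linked Open Data (public DBpedia endpoint).
# The query is an invented example of a filled-in template, not CrowdQ output.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?city ?name WHERE {
        ?city a dbo:City ;
              dbo:country <http://dbpedia.org/resource/Italy> ;
              rdfs:label ?name .
        FILTER (lang(?name) = "en")
    } LIMIT 5
""")

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["name"]["value"])
```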
  33. LODREFINE
     http://research.zemanta.com/crowds-to-the-rescue/
  34. PROBLEMS AND CHALLENGES
     • What is feasible, and how can tasks be optimally crowdsourced, e.g., translated into microtasks?
       • Examples: data quality assessment for technical and contextual features; subjective vs. objective tasks (also in modeling); open-ended questions
     • What to show to users
       • Natural language descriptions of Linked Data/SPARQL
       • How much context
       • What form of rendering
       • How about links?
     • How to combine with automatic tools
       • Which results to validate
       • Low precision (no fun for gamers...)
       • Low recall (vs. all possible questions)
     • How to embed crowdsourcing into an existing application
       • Tasks are fine-grained and perceived as an additional burden on top of the actual functionality
     • What to do with the resulting data?
       • Integration into existing practices
       • Vocabularies
  35. FURTHER READING
  36. E.SIMPERL@SOTON.AC.UK, @ESIMPERL
     WWW.SOCIAM.ORG
     WWW.PLANET-DATA.EU
