Using crowdsourcing for Semantic Web applications and tools
Elena Simperl, Karlsruhe Institute of Technology, Germany
Talk at SWAT4LS Summer School, Aveiro, Portugal
May 2012




6/7/2012                               www.insemtives.eu                1
Semantic technologies are mainly
    about automation…
• …but many tasks in semantic content
  authoring fundamentally rely on human input
  – Modeling a domain
  – Understanding text and media content (in all their
    forms and languages)
  – Integrating data sources originating from different
    contexts
Incentives and motivators
• What motivates people to engage with an application?
• Which rewards are effective and when?

• Motivation is the driving force that makes humans
  achieve their goals
• Incentives are ‘rewards’ assigned by an external ‘judge’
  to a performer for undertaking a specific task
   – Common belief (among economists): incentives can be
     translated into a sum of money for all practical purposes
• Incentives can be related to extrinsic and intrinsic
  motivations
Examples of applications




Incentives and motivators (2)
• Successful volunteer crowdsourcing is difficult to
  predict or replicate
   – Highly context-specific
   – Not applicable to arbitrary tasks


• Reward models often easier to study and control
  (if performance can be reliably measured)
   – Different models: pay-per-time, pay-per-unit, winner-takes-all…
   – Not always easy to abstract from social aspects (free-riding,
     social pressure…)
   – May undermine intrinsic motivation
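
The three reward models named above can be compared with a small calculation. This is only a sketch under stated assumptions: the `Submission` record, its fields, and the rates are hypothetical, and the `quality` score is assumed to come from some external performance measure that the slide does not specify.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """One worker's contribution (hypothetical record, not from the talk)."""
    worker: str
    units_done: int       # e.g. number of items labelled
    minutes_spent: float
    quality: float        # 0..1, assumed to be measured externally


def pay_per_time(subs, rate_per_minute):
    # Everyone is paid for the time they report, regardless of output.
    return {s.worker: round(s.minutes_spent * rate_per_minute, 2) for s in subs}


def pay_per_unit(subs, rate_per_unit):
    # Everyone is paid for each completed unit, regardless of time.
    return {s.worker: round(s.units_done * rate_per_unit, 2) for s in subs}


def winner_takes_all(subs, prize):
    # Only the best submission (by quality, then volume) receives the prize.
    best = max(subs, key=lambda s: (s.quality, s.units_done))
    return {s.worker: (prize if s is best else 0.0) for s in subs}
```

All three functions need a reliable measurement of time, units, or quality, which is the precondition the slide attaches to reward models.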
Examples (2)




Mason & Watts: Financial incentives and the performance of the crowds, HCOMP 2009.
Amazon's Mechanical Turk
  • Successfully applied to transcription, classification,
    content generation, data collection, image tagging, website
    feedback, usability tests…*
  • Increasingly used by academia for evaluation purposes
  • Extensions for quality assurance, complex workflows,
    resource management, vertical domains…




* http://behind-the-enemy-lines.blogspot.com/2010/10/what-tasks-are-posted-on-mechanical.html
What tasks can be (microtask-)
    crowdsourced?
• Best case
  – Routine work requiring common knowledge,
    decomposable into simpler, independent sub-tasks,
    performance easily measurable, no spam
• Ongoing research in task design, quality
  assurance, estimated time of completion…

• Example: open-scale tasks in MTurk
  – Generate, then vote.
  – Introduce random noise to identify potential issues in
    the second step

[Figure: two-step workflow. Generate answer ("Label image"), then vote on answers ("Correct or not?")]
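
The generate-then-vote pattern with injected noise can be sketched as follows. This is an illustrative assumption about how the noise might be used, not the workflow from the talk: deliberately wrong "decoy" answers are mixed into the voting step, and voters who approve a decoy are excluded from the tally. All function and parameter names are hypothetical.

```python
from collections import Counter


def vote_stage(candidates, decoys, votes):
    """Aggregate votes on generated answers.

    candidates: answers produced in the generate step
    decoys:     deliberately wrong answers mixed in (the injected noise)
    votes:      {voter: {answer: True/False}}, True meaning "correct"
    """
    # A voter who approves any decoy is treated as unreliable.
    flagged = {
        voter
        for voter, ballot in votes.items()
        if any(ballot.get(decoy, False) for decoy in decoys)
    }

    # Tally the remaining voters: +1 for approval, -1 for rejection.
    tally = Counter()
    for voter, ballot in votes.items():
        if voter in flagged:
            continue
        for answer, approved in ballot.items():
            if answer in candidates:
                tally[answer] += 1 if approved else -1

    accepted = [answer for answer in candidates if tally[answer] > 0]
    return accepted, flagged
```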
Examples (2)




GWAPs and gamification
 • GWAPs: human computation disguised as casual games
 • Gamification/game mechanics: integrate game elements
   into applications*
     – Accelerated feedback cycles
          • Annual performance appraisals vs immediate feedback to maintain
            engagement
     – Clear goals and rules of play
          • Players feel empowered to achieve goals vs a fuzzy, complex system of
            rules in the real world
     – Compelling narrative
          • Gamification builds a narrative that engages players to participate and
            achieve the goals of the activity
     – But in the end it’s about what tasks users
     want to get better at

*http://www.gartner.com/it/page.jsp?id=1629214
What tasks can be gamified?*
•    Work is decomposable into simpler tasks
•    Tasks are nested
•    Performance is measurable
•    One can define an obvious rewarding scheme
•    Skills can be arranged in a smooth learning
     curve


    *http://www.lostgarden.com/2008/06/what-actitivies-that-can-be-turned-into.html
What is different about semantic
      systems?
• It's still about the context
  of the actual application

• User engagement with
  semantic tasks is needed to
   – Ensure knowledge is
     relevant and up to date
   – Ensure people accept the new
     solution and understand its
     benefits
   – Avoid cold-start problems
   – Optimize maintenance
     costs
What do you want your
    users to do?
• Semantic applications
  – Context of the actual application
  – Need to involve users in knowledge engineering tasks?
     • Incentives are related to organizational and social factors
     • Seamless integration of new features
• Semantic tools
  – Game mechanics
  – Paid crowdsourcing (possibly integrated with the tool)
• Using results of casual games

        http://gapingvoid.com/2011/06/07/pixie-dust-the-mountain-of-mediocrity/
Crowdsourcing knowledge
       engineering
• Activities are typically too
  coarse-grained to crowdsource directly
• Further splitting is needed
• Crowdsource very specific tasks
  that are (highly) divisible
   – Labeling (in different languages)
   – Finding relationships
   – Populating the ontology
   – Aligning and interlinking
   – Ontology-based annotation
   – Validating the results of automatic
     methods
   – …
Example: ontology building




Example: relationship finding
Example: video annotation




Example: ontology alignment




Example: ontology evaluation




OntoGame API
• API that provides several methods that are shared by the
  OntoGame games, such as
   – Different agreement types (e.g., selection agreement)
   – Input matching (e.g., majority)
   – Game modes (multi-player, single-player)
   – Player reliability evaluation
   – Player matching (e.g., finding the optimal partner to play)
   – Resource (i.e., data needed for games) management
   – Creating semantic content
• http://insemtives.svn.sourceforge.net/viewvc/insemtives/generic-gaming-toolkit
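
Two of the listed features, majority-based input matching and player reliability evaluation, can be illustrated with a minimal sketch. This is not the OntoGame API: the function names, the 0.5 starting score, and the fixed update step are all assumptions made for illustration.

```python
from collections import Counter


def majority_match(inputs, threshold=0.5):
    """Return the consensus answer, or None if no answer has a majority.

    inputs: {player: answer} for one game round (hypothetical structure).
    """
    if not inputs:
        return None
    answer, count = Counter(inputs.values()).most_common(1)[0]
    return answer if count / len(inputs) > threshold else None


def update_reliability(reliability, inputs, consensus, step=0.1):
    """Raise the score of players who matched the consensus, lower the rest.

    Scores start at 0.5 for unknown players and are clamped to [0, 1].
    """
    updated = dict(reliability)
    for player, answer in inputs.items():
        score = updated.get(player, 0.5)
        score += step if answer == consensus else -step
        updated[player] = min(1.0, max(0.0, round(score, 2)))
    return updated
```

A player-matching step could then pair players using these scores, but how the toolkit does that is not described in the slides.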


Lessons learned
• Approach is feasible for mainstream domains, where a
  knowledge corpus is available
• Approach is by design less applicable to Semantic Web tasks
   – Knowledge-intensive tasks are not easily nestable
   – Repetitive tasks → players' retention?
• Knowledge corpus has to be large enough to allow for a rich
  game experience
   – But you need a critical mass of players to validate the results
• Advertisement is essential
• Game design vs useful content
   – Reusing well-known game paradigms
   – Reusing game outcomes and integration in existing workflows and
     tools
• Cost-benefit analysis
General guidelines
• Focus on the actual goal and incentivize related
  actions
  – Write posts, create graphics, annotate pictures, reply
    to customers in a given time…
• Build a community around the intended actions
  – Reward helping each other in performing the task and
    interaction
  – Reward recruiting new contributors
• Reward repeated actions
  – Actions become part of the daily routine
Games vs Mechanical Turk
Combining human and computational
       intelligence
           Give me the German names of all commercial airports in Baden-
           Württemberg, ordered by their most informative description.


"Retrieve the labels in German of commercial airports located
in Baden-Württemberg, ordered by the better human-readable
description of the airport given in the comment."


• This query cannot be optimally answered automatically
   – Incorrect/missing classification of entities (e.g. classification as airports
     instead of commercial airports)
   – Missing information in data sets (e.g. German labels)
   – It is not possible to optimally perform subjective operations (e.g. comparisons
     of pictures or NL comments)
What tasks should be
      crowdsourced?
"Retrieve the labels in German of commercial airports
located in Baden-Württemberg, ordered by the better
human-readable description of the airport given in the
comment."

SPARQL query (numbered comments mark the candidate crowdsourcing tasks):
SELECT ?label WHERE {
  ?x a metar:CommercialHubAirport;                 # 1 Classification
    rdfs:label ?label;
    rdfs:comment ?comment .
  ?x geonames:parentFeature ?z .
  ?z owl:sameAs <http://dbpedia.org/resource/Baden-Wuerttemberg> .   # 2 Identity resolution
  FILTER (LANG(?label) = "de")                     # 3 Missing information
} ORDER BY CROWD(?comment, "Better description of %x")   # 4 Ordering
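
Because `ORDER BY CROWD` is an extension rather than standard SPARQL, a hybrid processor has to separate it from the part a regular endpoint can evaluate. The sketch below shows one naive way to do that with a regular expression. It is an assumption for illustration only: the helper name is hypothetical, and a real implementation would more likely extend a SPARQL parser.

```python
import re

# Matches: ORDER BY CROWD(?var, "prompt text")
CROWD_ORDER = re.compile(
    r'ORDER\s+BY\s+CROWD\(\s*(\?\w+)\s*,\s*"([^"]*)"\s*\)',
    re.IGNORECASE,
)


def split_crowd_order(query):
    """Split a query into its machine-evaluable part and the crowd ordering task.

    Returns (machine_query, crowd_task); crowd_task is None when the query
    contains no ORDER BY CROWD clause.
    """
    match = CROWD_ORDER.search(query)
    if match is None:
        return query.strip(), None
    machine_query = (query[:match.start()] + query[match.end():]).strip()
    crowd_task = {"variable": match.group(1), "prompt": match.group(2)}
    return machine_query, crowd_task
```

The machine part would still need to project the ordering variable (here `?comment`) so that its values can be shown to workers; that rewriting step is left out of the sketch.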
Crowdsourced query processing
• Extensions to VoID and
  SPARQL
• Formal, declarative
  description of data and
  tasks using SPARQL
  patterns as a basis for the
  automatic design of HITs.
• Hybrid query processing
  (adaptive techniques,
  caching, semantically
  driven task design)
HITs design: Classification
  • It is not always possible to automatically infer classification
    from the properties.
  • Example: Retrieve the names (labels) of METAR stations that correspond to
     commercial airports.
SELECT ?label WHERE {
  ?station a metar:CommercialHubAirport;
    rdfs:label ?label .}
 Input:   {?station a metar:Station;
             rdfs:label ?label;
             wgs84:lat ?lat;
             wgs84:long ?long}

 Output: {?station a ?type.
          ?type rdfs:subClassOf metar:Station}
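
The Input and Output patterns above can be read as a recipe: each input binding becomes one HIT, and the workers' answers become a `?station a ?type` triple. A minimal sketch of that reading follows; the HIT structure, the question wording, and the majority-vote aggregation are assumptions, not the design from the talk.

```python
from collections import Counter


def build_classification_hit(binding, candidate_types):
    """Turn one binding of the Input pattern into a HIT description.

    binding: dict with keys "station", "label", "lat", "long".
    candidate_types: subclasses of metar:Station offered as options.
    """
    return {
        "station": binding["station"],
        "question": (
            f"Which type best describes '{binding['label']}' "
            f"(lat {binding['lat']}, long {binding['long']})?"
        ),
        "options": list(candidate_types),
    }


def hit_to_triple(hit, answers):
    """Aggregate worker answers by majority into one Output triple."""
    chosen, _count = Counter(answers).most_common(1)[0]
    return (hit["station"], "rdf:type", chosen)
```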
HITs design: Ordering
  • Orderings defined via less straightforward built-ins; for instance,
    the ordering of pictorial representations of entities.
  • SPARQL extension: ORDER BY CROWD
  • Example: Retrieve all airports and their pictures, with the pictures
         ordered by how representative they are of the given airport.

SELECT ?airport ?picture WHERE {
  ?airport a metar:Airport;
    foaf:depiction ?picture .
} ORDER BY CROWD(?picture,
"Most representative image for %airport")

Input:     {?airport foaf:depiction ?x, ?y}

Output: {{(?x ?y) a rdf:List} UNION {(?y ?x) a rdf:List}}
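
The Output pattern says each HIT returns one ordered pair, either (?x ?y) or (?y ?x). Turning many such pairs into a full ordering needs an aggregation step that the slide does not specify. The sketch below assumes the simplest option, counting how often each item was preferred; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_hits(items):
    """One comparison HIT per unordered pair of items."""
    return list(combinations(items, 2))


def rank_by_wins(items, judgements):
    """Order items by how many pairwise comparisons they won.

    judgements: list of (preferred, other) pairs returned by workers.
    Ties keep the original order of `items` because sorted() is stable.
    """
    wins = defaultdict(int)
    for preferred, _other in judgements:
        wins[preferred] += 1
    return sorted(items, key=lambda item: -wins[item])
```

The number of pairs grows quadratically with the number of items, which is one reason the granularity of HITs appears among the open challenges.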
Challenges
• Appropriate level of granularity for HITs design for specific
  SPARQL constructs
• Caching
   – Naively we can materialise HIT results into
     datasets
   – How to deal with partial coverage and dynamic
     datasets
• Optimal user interfaces for graph-like content

• Pricing and workers’ assignment
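
The caching challenge can be made concrete with a small in-memory sketch. It assumes HIT results are keyed by task name and input values, which is an illustrative choice rather than the approach from the talk; the class and method names are hypothetical.

```python
class HitCache:
    """Naive materialisation of HIT results, keyed by (task, inputs)."""

    def __init__(self):
        self._store = {}

    def put(self, task, inputs, result):
        self._store[(task, tuple(inputs))] = result

    def split(self, task, all_inputs):
        """Partition requested inputs into cached results and missing ones.

        The missing list is the partial-coverage case: only these inputs
        need to be sent to the crowd.
        """
        cached, missing = {}, []
        for inputs in all_inputs:
            key = (task, tuple(inputs))
            if key in self._store:
                cached[tuple(inputs)] = self._store[key]
            else:
                missing.append(tuple(inputs))
        return cached, missing

    def invalidate(self, task, inputs):
        """Drop a result when the underlying data changes (dynamic datasets)."""
        self._store.pop((task, tuple(inputs)), None)
```

Deciding when to call `invalidate` is the hard part for dynamic datasets, and the sketch leaves it to the caller.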
Thank you
  e: elena.simperl@kit.edu, t: @esimperl

Publications available at www.insemtives.org

   Team: Maribel Acosta, Barry Norton, Katharina Siorpaes,
       Stefan Thaler, Stephan Wölger and many others
