Using crowdsourcing for Semantic Web applications and tools
Elena Simperl, Karlsruhe Institute of Technology, Germany
Talk at SWAT4LS Summer School, Aveiro, Portugal
May 2012




Semantic technologies are mainly about automation…
• …but many tasks in semantic content
  authoring fundamentally rely on human input
  – Modeling a domain
  – Understanding text and media content (in all their
    forms and languages)
  – Integrating data sources originating from different
    contexts
Incentives and motivators
• What motivates people to engage with an application?
• Which rewards are effective and when?

• Motivation is the driving force that makes humans
  achieve their goals
• Incentives are ‘rewards’ assigned by an external ‘judge’
  to a performer for undertaking a specific task
   – Common belief (among economists): incentives can be
     translated into a sum of money for all practical purposes
• Incentives can be related to extrinsic and intrinsic
  motivations
Examples of applications




Incentives and motivators (2)
• Successful volunteer crowdsourcing is difficult to
  predict or replicate
   – Highly context-specific
   – Not applicable to arbitrary tasks


• Reward models often easier to study and control
  (if performance can be reliably measured)
   – Different models: pay-per-time, pay-per-unit, winner-takes-all…
   – Not always easy to abstract from social aspects (free-riding,
     social pressure…)
   – May undermine intrinsic motivation
Examples (2)




Mason & Watts: Financial incentives and the 'performance of crowds', HCOMP 2009.
Amazon's Mechanical Turk
  • Successfully applied to transcription, classification, content
    generation, data collection, image tagging, website feedback,
    usability tests…*
  • Increasingly used by academia for evaluation purposes
  • Extensions for quality assurance, complex workflows,
    resource management, vertical domains…




* http://behind-the-enemy-lines.blogspot.com/2010/10/what-tasks-are-posted-on-mechanical.html
What tasks can be (microtask-)crowdsourced?
• Best case
  – Routine work requiring common knowledge, decomposable into simpler,
    independent sub-tasks, performance easily measurable, no spam
• Ongoing research in task design, quality
  assurance, estimated time of completion…

• Example: open-ended tasks in MTurk (a minimal sketch of this workflow
  follows below)
  – Generate, then vote: one group of workers generates candidate answers
    (e.g., a label for an image), a second group votes on whether each
    answer is correct
  – Introduce random noise among the candidates to identify unreliable
    voters in the second step
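A minimal Python sketch of this generate-then-vote workflow, assuming
hypothetical post_hit and collect_answers helpers in place of a real
crowdsourcing client (they are not part of any actual MTurk API):

import random

def generate_then_vote(task, post_hit, collect_answers, n_generators=3, n_voters=5):
    # Step 1: generation - several workers propose candidate answers.
    gen_hit = post_hit(question=task, kind="generate", assignments=n_generators)
    candidates = collect_answers(gen_hit)

    # Inject random noise so obviously wrong options exist; voters who pick
    # them are likely spamming and can be filtered out.
    noise = ["<random noise answer>"]
    options = candidates + noise
    random.shuffle(options)

    # Step 2: voting - a second crowd selects the best candidate.
    vote_hit = post_hit(question=task, kind="vote", options=options,
                        assignments=n_voters)
    votes = collect_answers(vote_hit)

    # Discard voters who chose a noise option, then take the majority answer.
    valid = [v for v in votes if v not in noise]
    return max(set(valid), key=valid.count) if valid else None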
Examples (3)




GWAPs and gamification
 • GWAPs: human computation disguised as casual games
 • Gamification/game mechanics: integrate game elements
   into applications*
     – Accelerated feedback cycles
          • Annual performance appraisals vs immediate feedback to maintain
            engagement
     – Clear goals and rules of play
          • Players feel empowered to achieve goals vs the fuzzy, complex
            rules of the real world
     – Compelling narrative
          • Gamification builds a narrative that engages players to participate and
            achieve the goals of the activity
     – But in the end it’s about what tasks users want to get better at

*http://www.gartner.com/it/page.jsp?id=1629214
What tasks can be gamified?*
•    Work is decomposable into simpler tasks
•    Tasks are nested
•    Performance is measurable
•    One can define an obvious rewarding scheme
•    Skills can be arranged in a smooth learning
     curve


    *http://www.lostgarden.com/2008/06/what-actitivies-that-can-be-turned-into.html
What is different about semantic systems?
• It's still about the context
  of the actual application

• User engagement with semantic tasks helps to
   – Ensure knowledge is relevant and up-to-date
   – Ensure people accept the new solution and understand its benefits
   – Avoid cold-start problems
   – Optimize maintenance costs
What do you want your users to do?
• Semantic applications
  – Context of the actual application
  – Need to involve users in knowledge engineering tasks?
     • Incentives are related to organizational and social factors
     • Seamless integration of new features
• Semantic tools
  – Game mechanics
  – Paid crowdsourcing (possibly integrated with the tool)
• Using results of casual games

        http://gapingvoid.com/2011/06/07/pixie-dust-the-mountain-of-mediocrity/
Crowdsourcing knowledge engineering
• The granularity of knowledge engineering activities is typically too
  coarse for crowdsourcing
• Further splitting is needed
• Crowdsource very specific tasks that are (highly) divisible
  (a decomposition sketch follows after this list)
   – Labeling (in different languages)
   – Finding relationships
   – Populating the ontology
   – Aligning and interlinking
   – Ontology-based annotation
   – Validating the results of automatic methods
   – …
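As a concrete illustration of the divisibility point above, a minimal
Python sketch of splitting a labeling activity into independent
microtasks; the task format and the example.org URIs are purely
illustrative assumptions:

def labeling_microtasks(concepts, target_languages):
    """Split an ontology-labeling activity into one microtask per
    (concept, language) pair, each answerable without seeing the whole
    ontology. A real deployment would add instructions, examples and
    gold questions for quality control."""
    tasks = []
    for concept in concepts:
        for lang in target_languages:
            tasks.append({
                "type": "labeling",
                "concept_uri": concept,
                "language": lang,
                "question": f"Provide a short label in '{lang}' for {concept}",
            })
    return tasks

# Example: two concepts, two languages -> four independent microtasks.
tasks = labeling_microtasks(
    ["http://example.org/onto#Airport", "http://example.org/onto#Runway"],
    ["en", "de"],
)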
Example: ontology building




Example: relationship finding
Example: video annotation




Example: ontology alignment




Example: ontology evaluation




OntoGame API
• API that provides several methods that are shared by the OntoGame
  games (an illustrative sketch follows below), such as
   – Different agreement types (e.g., selection agreement)
   – Input matching (e.g., majority)
   – Game modes (multi-player, single-player)
   – Player reliability evaluation
   – Player matching (e.g., finding the optimal partner to play)
   – Resource (i.e., data needed for games) management
   – Creating semantic content
• http://insemtives.svn.sourceforge.net/viewvc/insemtives/generic-gaming-toolkit
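A hypothetical Python sketch of what such a shared game toolkit might
look like; the class and method names below are illustrative only, not
the actual OntoGame API (see the repository link above for the real code):

from dataclasses import dataclass

@dataclass
class Player:
    name: str
    reliability: float = 0.5   # updated as the player's answers are validated

class GameToolkit:
    """Illustrative sketch of a shared GWAP toolkit: player matching, a
    simple agreement type and reliability tracking."""

    def __init__(self, resources):
        self.resources = resources   # data items the games are played on
        self.waiting = []            # players waiting for a partner

    def match_player(self, player):
        """Pair a new player with the waiting player of closest reliability,
        or return None to fall back to single-player mode."""
        if not self.waiting:
            self.waiting.append(player)
            return None
        partner = min(self.waiting,
                      key=lambda p: abs(p.reliability - player.reliability))
        self.waiting.remove(partner)
        return partner

    def selection_agreement(self, answer_a, answer_b):
        """Simplest agreement type: both players selected the same item."""
        return answer_a == answer_b

    def update_reliability(self, player, agreed, rate=0.1):
        """Move a player's reliability towards 1 on agreement, towards 0 otherwise."""
        target = 1.0 if agreed else 0.0
        player.reliability += rate * (target - player.reliability)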


Lessons learned
• The approach is feasible for mainstream domains, where a knowledge
  corpus is available
• The approach is by design less applicable to Semantic Web tasks
   – Knowledge-intensive tasks are not easily nestable
   – Repetitive tasks → players' retention?
• The knowledge corpus has to be large enough to allow for a rich game
  experience
   – But you need a critical mass of players to validate the results
• Advertising is essential
• Game design vs useful content
   – Reusing well-known game paradigms
   – Reusing game outcomes and integrating them into existing workflows
     and tools
• Cost-benefit analysis
General guidelines
• Focus on the actual goal and incentivize related
  actions
  – Write posts, create graphics, annotate pictures, reply
    to customers in a given time…
• Build a community around the intended actions
  – Reward contributors for helping each other with the task and for
    interacting
  – Reward recruiting new contributors
• Reward repeated actions
  – Actions become part of the daily routine
Games vs Mechanical Turk
Combining human and computational intelligence
           Give me the German names of all commercial airports in
           Baden-Württemberg, ordered by their most informative description.

"Retrieve the labels in German of commercial airports located in
Baden-Württemberg, ordered by the better human-readable description of the
airport given in the comment."


• This query cannot be optimally answered automatically
   – Incorrect/missing classification of entities (e.g. classification as airports
     instead of commercial airports)
   – Missing information in data sets (e.g. German labels)
   – It is not possible to optimally perform subjective operations (e.g. comparisons
     of pictures or NL comments)
What tasks should be crowdsourced?
"Retrieve the labels in German of commercial airports located in
Baden-Württemberg, ordered by the better human-readable description of the
airport given in the comment."

SPARQL query, with the steps that may need crowd input marked:
(1) classification, (2) identity resolution, (3) missing information,
(4) ordering

SELECT ?label WHERE {
  ?x a metar:CommercialHubAirport;                                    # (1) Classification
    rdfs:label ?label;
    rdfs:comment ?comment .
  ?x geonames:parentFeature ?z .
  ?z owl:sameAs <http://dbpedia.org/resource/Baden-Wuerttemberg> .   # (2) Identity resolution
  FILTER (LANG(?label) = "de")                                        # (3) Missing information
} ORDER BY CROWD(?comment, "Better description of %x")               # (4) Ordering
Crowdsourced query processing
• Extensions to VoID and
  SPARQL
• Formal, declarative
  description of data and
  tasks using SPARQL
  patterns as a basis for the
  automatic design of HITs.
• Hybrid query processing
  (adaptive techniques, caching,
  semantically driven task design);
  a minimal sketch follows below
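A minimal sketch of how such a hybrid processor might split the airport
query: the machine-answerable graph pattern goes to a SPARQL endpoint,
the CROWD ordering goes to workers, and crowd answers are cached. The
run_sparql and ask_crowd callables are hypothetical stand-ins for a
SPARQL client and a microtask platform:

def answer_crowd_query(machine_query, crowd_instruction, run_sparql, ask_crowd, cache):
    """Hybrid processing sketch for a query with ORDER BY CROWD:
    1. evaluate the machine-answerable part on a standard SPARQL endpoint,
    2. send the returned bindings to workers for the subjective ordering,
    3. cache crowd answers so repeated queries are not re-crowdsourced."""
    bindings = run_sparql(machine_query)

    # Materialised HIT results act as a cache keyed by task and inputs.
    cache_key = (crowd_instruction, tuple(sorted(str(b) for b in bindings)))
    if cache_key not in cache:
        cache[cache_key] = ask_crowd(instruction=crowd_instruction, items=bindings)
    return cache[cache_key]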
HITs design: Classification
  • It is not always possible to automatically infer classification
    from the properties.
  • Example: retrieve the names (labels) of METAR stations that correspond
    to commercial airports; a sketch of the generated HIT follows after the
    patterns below.
SELECT ?label WHERE {
  ?station a metar:CommercialHubAirport;
    rdfs:label ?label .
}

Input:  {?station a metar:Station;
           rdfs:label ?label;
           wgs84:lat ?lat;
           wgs84:long ?long}

Output: {?station a ?type .
         ?type rdfs:subClassOf metar:Station}
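A small Python sketch of how a classification HIT could be generated from
the input/output patterns above; the field names, the station identifier
and the candidate types other than metar:CommercialHubAirport are
illustrative assumptions:

def classification_hit(binding, candidate_types):
    """Builds one classification microtask: the worker sees a METAR station
    (label plus coordinates, e.g. shown on a map) and picks the subclass of
    metar:Station it belongs to; the answer instantiates '?station a ?type'."""
    return {
        "title": "What kind of station is this?",
        "station": binding["station"],
        "label": binding["label"],
        "map_hint": (binding["lat"], binding["long"]),
        "options": candidate_types,
    }

# Illustrative usage with made-up values:
hit = classification_hit(
    {"station": "metar:EDDS", "label": "Stuttgart Airport",
     "lat": 48.69, "long": 9.22},
    ["metar:CommercialHubAirport", "metar:MilitaryAirport", "metar:WeatherStation"],
)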
HITs design: Ordering
  • Orderings are defined via less straightforward built-ins; for instance,
    the ordering of pictorial representations of entities.
  • SPARQL extension: ORDER BY CROWD
  • Example: retrieve all airports and their pictures, ordered by the most
    representative image of each airport; a sketch of aggregating the
    pairwise crowd answers follows below.

SELECT ?airport ?picture WHERE {
  ?airport a metar:Airport;
    foaf:depiction ?picture .
} ORDER BY CROWD(?picture,
"Most representative image for %airport")

Input:     {?airport foaf:depiction ?x, ?y}

Output: {{(?x ?y) a rdf:List} UNION {(?y ?x) a rdf:List}}
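A minimal sketch of aggregating the pairwise crowd answers (the rdf:List
output pattern above) into a ranking, assuming a hypothetical ask_pair
callable that runs one comparison HIT and returns the preferred element:

from collections import Counter
from itertools import combinations

def crowd_order(items, ask_pair):
    """Turn pairwise crowd comparisons into a total order: each pair
    (?x, ?y) is shown to workers, the preferred element 'wins', and items
    are ranked by their number of wins."""
    wins = Counter({item: 0 for item in items})
    for x, y in combinations(items, 2):
        wins[ask_pair(x, y)] += 1
    return sorted(items, key=lambda item: wins[item], reverse=True)

# Illustrative usage: order three candidate pictures of an airport.
# ordering = crowd_order(["pic1.jpg", "pic2.jpg", "pic3.jpg"], ask_pair=my_hit_runner)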
Challenges
• Appropriate level of granularity in HIT design for specific SPARQL
  constructs
• Caching
   – Naively, we can materialise HIT results into datasets
   – How to deal with partial coverage and dynamic datasets?
• Optimal user interfaces for graph-like content

• Pricing and worker assignment
Thank you
  e: elena.simperl@kit.edu, t: @esimperl

Publications available at www.insemtives.org

   Team: Maribel Acosta, Barry Norton, Katharina Siorpaes,
       Stefan Thaler, Stephan Wölger and many others
