
Human Computation for VGI Management


Invited talk about my experience with Human Computation and Games with a Purpose for VGI, given at the Workshop on "Volunteered Geographic Information: Enabling VGI creation, management and sharing" at CNR in Milano, April 16th, 2018.


  1. HUMAN COMPUTATION FOR VGI MANAGEMENT
     Irene Celino – irene.celino@cefriel.com
     Cefriel, Viale Sarca 226, 20126 Milano
     Workshop on Volunteered Geographic Information – Milano, April 16th, 2018
  2. AGENDA
     1. Introduction
     2. Human Computation and Games with a Purpose
     3. GWAP examples for VGI Management
     4. Indirect people involvement
     5. Future perspectives
  3. 1. INTRODUCTION
     Relation between VGI and Human Computation
  4. VGI AND HUMAN COMPUTATION
     • VGI is carried out by volunteers, so by definition it implies human intervention
     • Still, VGI suffers from all the issues related to that:
       • Varying participation → impact on sustainability (long-tail effect)
       • Reliability of volunteers → impact on information quality
       • Uneven distribution of contributions → impact on coverage
     • Human Computation is an approach that can bring benefits to VGI…
     • …and VGI can reveal more than you could expect!
  5. WISDOM OF CROWDS
     • “Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations”
     • Criteria for a wise crowd:
       • Diversity of opinion (importance of interpretation)
       • Independence (not a “single mind”)
       • Decentralization (importance of local knowledge)
       • Aggregation (aim to get a collective decision)
     • There are also failures/risks in crowd decisions:
       • Homogeneity, centralization, division, imitation, emotionality
     James Surowiecki. The Wisdom of Crowds. Anchor, 2005
  6. 2. HUMAN COMPUTATION & GAMES WITH A PURPOSE
     What is Human Computation? What goals can humans help machines achieve? How can we involve a crowd of people? What extrinsic rewards (money, prizes, etc.) or intrinsic incentives can we adopt to motivate them?
  7. HUMAN COMPUTATION
     • Human Computation is a computer science technique in which a computational process is performed by outsourcing certain steps to humans. Unlike traditional computation, in which a human delegates a task to a computer, in Human Computation the computer asks a person or a large group of people to solve a problem; it then collects, interprets and integrates their solutions
     • The original concept of Human Computation by its inventor Luis von Ahn derived from the common-sense observation that people are intrinsically very good at solving some kinds of tasks which are, on the other hand, very hard for a computer to address; this is the case for a number of targets of Artificial Intelligence (like image recognition or natural language understanding) for which research is still open
     Edith Law and Luis von Ahn. Human Computation. Synthesis Lectures on Artificial Intelligence and Machine Learning, 2011
  8. HUMAN COMPUTATION
     Problem: an Artificial Intelligence algorithm is unable to achieve an adequate result with a satisfactory level of confidence
     Solution: ask people to intervene when the AI system fails, “masking” the task within another human process
     Example: https://www.google.com/recaptcha/
  9. WHY HUMAN COMPUTATION FOR VGI?
     • Collection of new data – as a complement to VGI itself, exploiting the redundancy of multiple contributions
     • Validation of collected data or automatic processing – as a “third party” to solve discrepancies
     • Completion of data, to fill in “missing pieces”
     • Identification of mistakes/outdated information and their “correction”
  10. GAMES WITH A PURPOSE
     • A GWAP makes it possible to outsource some steps of a computational process to humans in an entertaining way
     • The application has a “collateral effect”, because players’ actions are exploited to solve a hidden task
     • The application *IS* a fully-fledged game (as opposed to gamification, which is the use of game-like features in non-gaming environments)
     • The players are (usually) unaware of the hidden purpose; they simply meet game challenges
     Luis von Ahn. Games with a purpose. Computer, 39(6):92–94, 2006
     Luis von Ahn and Laura Dabbish. Designing games with a purpose. Communications of the ACM, 51(8):58–67, 2008
  11. GAMES WITH A PURPOSE (GWAP)
     Problem: the same as in Human Computation (ask humans when AI fails)
     Solution: hide the task within a game, so that users are motivated by game challenges, often remaining unaware of the hidden purpose; the task solution comes from agreement between players
  12. SOME “VARIATIONS” OF HUMAN COMPUTATION
     • Other terms have been used to indicate approaches and methods that are similar to Human Computation and sometimes mistaken for it
     • While there is of course quite a large overlap, it is useful to distinguish them:
       • Crowdsourcing
       • Citizen Science
  13. CROWDSOURCING
     • Crowdsourcing is the process of outsourcing tasks to a “crowd” of distributed people. The possibility of exploiting the Internet as a vehicle to recruit contributors and to assign tasks led to the rise of micro-work platforms, thus often (but not always) implying a monetary reward. The term Crowdsourcing, although quite recent, is used to indicate a wide range of practices; however, its most common meaning implies that the “crowd” of workers involved in the solution of tasks is different from the traditional or intended groups of task solvers
     Jeff Howe. Crowdsourcing: How the Power of the Crowd Is Driving the Future of Business. Random House, 2008
  14. CROWDSOURCING
     Problem: a company needs to execute a lot of simple tasks, but cannot afford to hire a person to do that job
     Solution: pack tasks into batches (Human Intelligence Tasks, or HITs) and outsource them to a very cheap workforce through an online platform
     Example: https://www.mturk.com/
  15. CITIZEN SCIENCE
     • Citizen Science is the involvement of volunteers to collect or process data as part of a scientific or research experiment; those volunteers can be the scientists and researchers themselves, but more often the name of this discipline “implies a form of science developed and enacted by citizens” including those “outside of formal scientific institutions”, thus representing a form of public participation in science. Formally, Citizen Science has been defined as “the systematic collection and analysis of data; development of technology; testing of natural phenomena; and the dissemination of these activities by researchers on a primarily avocational basis”
     Alan Irwin. Citizen Science: A Study of People, Expertise and Sustainable Development. Psychology Press, 1995
  16. CITIZEN SCIENCE
     Problem: a scientific experiment requires the execution of a lot of simple tasks, but researchers are busy
     Solution: engage the general audience in solving those tasks, explaining that they are contributing to science, research and the public good
     Example: https://www.zooniverse.org/
  17. SPOT THE DIFFERENCE…
     • Similarities:
       • Involvement of people
       • Aggregation of multiple contributions
       • No automatic replacement
     • Variations:
       • Motivation
       • Reward (glory, money, passion/need)
     • Hybrid or parallel approaches are possible!
     [Diagram comparing Citizen Science, Crowdsourcing and Human Computation]
  18. 3. GWAP EXAMPLES FOR VGI MANAGEMENT
     Can we embed VGI management tasks within Games with a Purpose?
  19. 3 EXAMPLES OF GAMES WITH A PURPOSE FOR VGI
     • Collection of missing data: GWAP enabler for OSM Restaurants
     • Validation of automatically collected information: LCV game
     • Collection, validation and correction of data: Urbanopoly
  20. GWAP ENABLER TUTORIAL FOR OSM RESTAURANTS
     • Restaurant POIs (amenity=restaurant) from OSM may miss the cuisine type (cuisine key)
     • Input: OSM restaurants in a given area with/without the cuisine tag (those with the tag are used for assessing player reliability)
     • Goal: assign a score σ to each restaurant-cuisine pair to discover the “right” category
     • The score σ of each pair is updated on the basis of players’ choices (incremented if the link is selected)
     • When the score overcomes the threshold (σ ≥ t), the restaurant’s category is considered “true” (and removed from the game)
     • Pure GWAP with double-player game mechanics
     • Points, badges and leaderboard as intrinsic reward
     • A player scores if he/she chooses the same cuisine as his/her gameplay “mate”
     • Data validation is a result of the “agreement” between players (see the sketch below)
     https://github.com/STARS4ALL/gwap-enabler-tutorial
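The agreement-plus-threshold scoring can be made concrete with a short sketch. The Python below is purely illustrative and is not the gwap-enabler code: the function names, the agreement rule and the threshold value are assumptions.

```python
# Minimal sketch of agreement-based GWAP scoring (illustrative, not the
# actual gwap-enabler implementation). Each round pairs two players on the
# same restaurant; the chosen cuisine's score sigma is incremented only
# when the two players agree, and a restaurant is promoted to "true" once
# its score reaches the threshold t (value chosen arbitrarily here).
from collections import defaultdict

T = 5                        # assumed threshold; the real value is a design parameter
scores = defaultdict(int)    # (restaurant_id, cuisine) -> score sigma
confirmed = {}               # restaurant_id -> cuisine considered "true"

def play_round(restaurant_id, choice_player1, choice_player2):
    """Update scores after a double-player round; return the confirmed
    cuisine if the threshold is reached, else None."""
    if restaurant_id in confirmed:
        return confirmed[restaurant_id]   # already solved, removed from game
    if choice_player1 != choice_player2:
        return None                       # no agreement, no score update
    pair = (restaurant_id, choice_player1)
    scores[pair] += 1                     # increment sigma on agreement
    if scores[pair] >= T:                 # sigma >= t: accept as "true"
        confirmed[restaurant_id] = choice_player1
    return confirmed.get(restaurant_id)

# Example: five agreeing rounds on the same restaurant
for _ in range(5):
    result = play_round("osm:node/123", "pizza", "pizza")
print(result)  # -> "pizza" once the threshold is reached
```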
  21. LAND COVER VALIDATION GAME
     • Two automatic land cover classifications in disagreement: DUSAF (Lombardy Region) and GlobeLand30 (Chinese governmental agency)
     • Input: the set of pixels where the two classifications “disagree”
     • Goal: assign a score σ to each pixel-category pair to discover the “right” land cover class
     • The score σ of each pair is updated on the basis of players’ choices (incremented if selected, decremented if not selected; see the sketch below)
     • When the score overcomes the threshold (σ ≥ t), the pixel’s category is considered “true” (and removed from the game)
     • Pure GWAP with a not-so-hidden purpose (played by “experts”)
     • Points, badges and leaderboard as intrinsic reward
     • A player scores if he/she guesses one of the two disagreeing classifications
     • Data validation is a result of the “agreement” between players
     https://youtu.be/Q0ru1hhDM9Q – http://bit.ly/foss4game
     Maria Antonia Brovelli, Irene Celino, Andrea Fiano, Monia Elisa Molinari, Vijaycharan Venkatachalam. A crowdsourcing-based game for land cover validation. Applied Geomatics, 2017
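A similarly hedged sketch of the increment/decrement variant: each vote raises the score of the selected land cover class and lowers the scores of the non-selected candidates. Class names, deltas and the threshold are illustrative assumptions, not the game's actual implementation.

```python
# Sketch of LCV-style scoring: for each disputed pixel, the selected class
# gains score and the competing (non-selected) classes lose score.
# Candidate classes, deltas and threshold are illustrative assumptions.
from collections import defaultdict

T = 3
scores = defaultdict(int)   # (pixel_id, land_cover_class) -> sigma

def record_vote(pixel_id, candidates, selected):
    """candidates: the classes shown to the player (e.g. the two
    disagreeing classifications); selected: the player's guess."""
    for cls in candidates:
        scores[(pixel_id, cls)] += 1 if cls == selected else -1
    # return the class, if any, whose score overcame the threshold
    for cls in candidates:
        if scores[(pixel_id, cls)] >= T:
            return cls
    return None

# Example: DUSAF says "urban", GlobeLand30 says "agriculture"
for _ in range(3):
    winner = record_vote("px_42", ["urban", "agriculture"], "urban")
print(winner)  # -> "urban" after enough agreeing votes
```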
  22. URBANOPOLY
     • POI information from OSM to be collected or validated/corrected
     • Input: data from OSM
     • Goal: if data doesn’t exist, collect it; if data exists, validate it; if data is wrong, correct it (see the dispatch sketch below)
     • Complex game embedding “mini-games” for data collection, validation and correction
     • Same score mechanism, with the score σ updated on the basis of players’ choices
     • When the score overcomes the threshold (σ ≥ t), data is considered “true” (and can be sent back to OSM)
     • Monopoly-like game to win venues in the real world
     • Wheel of fortune and mini-games to acquire venues and become “rich” in the game
     • Data acquisition challenges as contributions for missing data
     • Data validation challenges to check pre-existing data
     • Results from players’ “agreement”
     Irene Celino. Geospatial dataset curation through a location-based game. Semantic Web Journal, Volume 6, Number 2, IOS Press, 2015
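The collect/validate/correct logic can be summarised as a simple dispatch. The sketch below is a hypothetical simplification: the "cuisine" attribute and the "flagged_wrong" marker are invented for illustration, and Urbanopoly's real mini-game selection is richer.

```python
# Illustrative routing of an OSM venue to an Urbanopoly-style mini-game.
# Attribute names and the "flagged_wrong" marker are hypothetical.
def choose_minigame(venue: dict) -> str:
    if venue.get("cuisine") is None:
        return "collect"      # data doesn't exist -> acquisition challenge
    if venue.get("flagged_wrong"):
        return "correct"      # data is wrong -> correction challenge
    return "validate"         # data exists -> validation challenge

print(choose_minigame({"name": "Da Mario"}))                      # collect
print(choose_minigame({"name": "Da Mario", "cuisine": "pizza"}))  # validate
```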
  23. LESSONS LEARNED FROM DESIGNING AND RUNNING THOSE GAMES
     • Designing and developing a full game is expensive
     • The simpler the game, the better its acceptance by players and its “throughput”
     • Different players are motivated by different incentives
     • Fun is not always enough to engage people, especially in the long term
     • Data collected via games can be enough to train automatic models
     Gloria Re Calegari, Gioele Nasi, Irene Celino. Human Computation vs. Machine Learning: an Experimental Comparison for Image Classification. Human Computation Journal, 2018
  24. 4. INDIRECT PEOPLE INVOLVEMENT
     Are there indirect ways to involve humans in data processing?
  25. HUMANS AS A SOURCE OF INFORMATION
     • People are not only task executors, they are also information providers!
     • Open content and cooperative knowledge
       • Data explicitly provided by people, like VGI, can “hide” further information
       • e.g., logs of wiki editing, statistical distribution of contributions
     • Opportunistic sensing
       • Voluntary or involuntary digital traces of human-related activities
       • e.g., phone call logs, GPS traces, social media activities
  26. FROM SPATIAL ANALYTICS TO GEO-SPATIAL “SEMANTICS”
     • The spatial distribution and conglomeration of specific points of interest (POI) from OpenStreetMap can give hints about the geographical space
     • Re-engineering of spatial features through comparison between areas: the same POI type showing different distributions → evidence for different semantics (e.g. what a pub is in Milano vs. London)
     • Semantic specification of spatial neighbourhoods:
       • Emerging neighbourhoods from spatial clustering of POIs (as opposed to administrative divisions)
       • A spatial version of tf-idf to compare different areas (e.g. central or peripheral areas in different cities) and to characterise neighbourhoods (e.g. a shopping district); see the sketch below
     Gloria Re Calegari, Emanuela Carlino, Irene Celino, Diego Peroni. Supporting Geo-Ontology Engineering through Spatial Data Analytics. 13th Extended Semantic Web Conference, 2016
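To illustrate the spatial tf-idf idea, here is a minimal sketch that treats each neighbourhood as a "document" and POI types as "terms". It assumes POIs are already assigned to neighbourhoods and mirrors standard tf-idf; the exact variant used in the cited paper may differ.

```python
# Sketch of a "spatial tf-idf": a POI type that is frequent in one
# neighbourhood but rare across the city gets a high weight (e.g. many
# clothes shops -> "shopping district"). Standard tf-idf formula; the
# cited paper's variant may differ.
import math
from collections import Counter

def spatial_tfidf(neighbourhoods):
    """neighbourhoods: dict mapping name -> list of POI types found there."""
    n_docs = len(neighbourhoods)
    # document frequency: in how many neighbourhoods does each type appear?
    df = Counter()
    for poi_types in neighbourhoods.values():
        df.update(set(poi_types))
    weights = {}
    for name, poi_types in neighbourhoods.items():
        tf, total = Counter(poi_types), len(poi_types)
        weights[name] = {
            t: (tf[t] / total) * math.log(n_docs / df[t]) for t in tf
        }
    return weights

w = spatial_tfidf({
    "centre": ["clothes", "clothes", "pub", "bank"],
    "suburb": ["pub", "supermarket", "pub", "bank"],
})
# "clothes" gets a high weight in "centre" because it is concentrated there
```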
  27. FROM POI INFORMATION AND PHONE CALL LOGS TO LAND USE
     • General topic: exploit “low-cost” information about a geographic area as features to train a predictive model that outputs “expensive” information about the same area (see the sketch below)
     • “Inexpensive” input information:
       • Geo-information about points of interest, processed to characterize space (distance from the nearest POI of type X)
       • Mobile traffic data, processed using different time series techniques (smoothing, decomposition, filtering, time-windowing)
     • “Expensive” output information:
       • Land use characterization (usually collected through long and expensive workflows that mix machine processing and costly human labour)
     Gloria Re Calegari, Emanuela Carlino, Diego Peroni, Irene Celino. Extracting Urban Land Use from Linked Open Geospatial Data. IJGI, 2015
     Gloria Re Calegari, Emanuela Carlino, Diego Peroni, Irene Celino. Filtering and Windowing Mobile Traffic Time Series for Territorial Land Use Classification. COMCOM, 2016
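A minimal sketch of the predictive setup, under loud assumptions: synthetic features and labels stand in for the real POI-distance and traffic-derived features, and scikit-learn's random forest is just one plausible model choice, not necessarily the one used in the cited papers.

```python
# Sketch: train a land-use classifier from "inexpensive" features
# (e.g. distance to the nearest POI of each type, summaries of mobile
# traffic time series). Data is synthetic; the model is an assumption.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# One row per territorial cell:
# [dist_nearest_restaurant, dist_nearest_office, mean_day_traffic, mean_night_traffic]
X = rng.random((200, 4))
# "Expensive" labels, normally produced by costly human workflows
y = rng.integers(0, 3, size=200)  # e.g. 0=residential, 1=commercial, 2=rural

model = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(model, X, y, cv=5).mean())  # baseline on synthetic data
```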
  28. 5. FUTURE PERSPECTIVES
     Are we there yet?!?
  29. FUTURE PERSPECTIVES
     • VGI management is still an open issue
     • Human Computation methods (and the like) can be employed to support VGI management
       • Parallel/joint adoption of different methods to get the best out of them
     • Research challenges are still the same:
       • Collection, completion/coverage, quality, (in)homogeneity, update/sustainability, …
     • Human-in-the-loop is an emerging trend and paradigm also in Machine Learning research (e.g. active learning; see the sketch below)
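As a pointer to what human-in-the-loop can mean in practice, here is a minimal uncertainty-sampling sketch of active learning: the model repeatedly asks a "human oracle" to label only the items it is least confident about. Data, model and strategy are illustrative assumptions.

```python
# Minimal uncertainty-sampling sketch of active learning. Synthetic data;
# the "oracle" array is a stand-in for a real human annotator.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pool = rng.random((500, 5))
oracle = (X_pool[:, 0] + X_pool[:, 1] > 1).astype(int)  # stand-in for humans

labelled = list(range(20))                 # small human-labelled seed set
for _ in range(5):                         # a few query rounds
    model = LogisticRegression().fit(X_pool[labelled], oracle[labelled])
    proba = model.predict_proba(X_pool)[:, 1]
    uncertainty = np.abs(proba - 0.5)      # near 0.5 = least confident
    uncertainty[labelled] = np.inf         # don't re-query labelled items
    labelled.append(int(np.argmin(uncertainty)))  # "ask the human" here
```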
  30. THANKS FOR YOUR ATTENTION! Any questions?
     Irene Celino – Knowledge Technologies, Digital Interaction Division – irene.celino@cefriel.com
     Cefriel.com
     MILANO: viale Sarca 226, 20126 Milano – Italy
     LONDON: 4th floor, 57 Rathbone Place, London W1T 1JU – UK
     NEW YORK: One Liberty Plaza, 165 Broadway, 23rd Floor, New York City, New York 10006 – USA
