
Crowdsourcing Linked Data Quality Assessment


Presented at Data Quality Tutorial: 2016.semantics.cc/satellite-events/data-quality-tutorial


  1. Crowdsourcing Linked Data Quality Assessment. Amrapali Zaveri. Data Quality Tutorial, September 12, 2016. www.kit.edu
  2. Linked Data: over a billion facts. But what about the quality?
  3. Motivation: Linked Data Quality. Linked Data sources vary in quality, owing to the source itself, extraction, integration, etc. Some quality issues (incompleteness, incorrectness, semantic accuracy) require interpretation that can easily be performed by humans.
  4. Motivation: Linked Data Quality. Solution: include human verification in the process of LD quality assessment via crowdsourcing, a large-scale problem-solving approach in which a problem is divided into smaller tasks that are independently solved by a large group of people. Key ingredients: Human Intelligence Tasks (HITs), a labor market, monetary rewards/incentives; time- and cost-effective.
  5. Research questions. RQ1: Is it possible to detect quality issues in LD data sets via crowdsourcing mechanisms? RQ2: What type of crowd is most suitable for each type of quality issue? RQ3: Which types of errors are made by lay users and experts when assessing RDF triples?
  6. Related work. Crowdsourcing & Linked Data: ZenCrowd (entity resolution), CrowdMAP (ontology alignment), GWAP (games with a purpose) for LD (assessing LD mappings). Web of Data quality assessment: automatic and semi-automatic approaches (quality characteristics of LD data sources, DBpedia) and manual ones (WIQA, Sieve). Our work combines both strands.
  7. OUR APPROACH
  8. Find-Verify phases of crowdsourcing (adapted from [Bernstein2010]). Find: a contest among LD experts; a difficult task rewarded with a final prize; tool: TripleCheckMate [Kontokostas2013]. Verify: microtasks for workers; an easy task rewarded with micropayments; platform: MTurk (http://mturk.com).
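As a minimal sketch (not the authors' implementation), the Find-Verify workflow on this slide could be wired together roughly as follows; the type and function names (Triple, Finding, find_stage, verify_stage) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Triple:
    subject: str
    predicate: str
    obj: str

@dataclass
class Finding:
    triple: Triple
    issue_type: str  # e.g. "incorrect object", "incorrect data type/language tag"

def find_stage(resources: List[str],
               expert_review: Callable[[str], List[Finding]]) -> List[Finding]:
    """Find phase: LD experts browse whole resources (e.g. via TripleCheckMate)
    and flag triples together with a quality-issue category."""
    findings: List[Finding] = []
    for resource in resources:
        findings.extend(expert_review(resource))
    return findings

def verify_stage(findings: List[Finding],
                 worker_vote: Callable[[Finding], bool]) -> List[Finding]:
    """Verify phase: each flagged triple becomes an easy microtask (e.g. on MTurk);
    a finding is kept only if the crowd confirms it is indeed a quality issue."""
    return [f for f in findings if worker_vote(f)]
```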
  9. Difference between LD experts and workers. Type: contest-based vs. Human Intelligence Tasks (HITs). Participants: Linked Data (LD) experts vs. a labor market. Task: detect and classify quality issues in resources vs. detect quality issues in triples. Reward: prize for the most resources evaluated vs. payment per task/triple. Tool: TripleCheckMate vs. Amazon Mechanical Turk, CrowdFlower, etc.
  10. Methodology. Crowdsource using: • Linked Data experts (Find phase) • Amazon Mechanical Turk workers (Verify phase)
  11. Crowdsourcing using Linked Data Experts — Methodology. Phase I: creation of a quality problem taxonomy. Phase II: launching a contest.
  12. Crowdsourcing using Linked Data Experts — Quality Problem Taxonomy. Legend: D = Detectable, F = Fixable, E = Extraction Framework, M = Mappings Wiki. Based on Zaveri et al., Quality assessment methodologies for Linked Open Data, Semantic Web Journal, 2015.
  13. Crowdsourcing using Linked Data Experts — Quality Problem Taxonomy (continued). Legend: D = Detectable, F = Fixable, E = Extraction Framework, M = Mappings Wiki.
  14. Crowdsourcing using Linked Data Experts — Contest. Demo: http://nl.dbpedia.org:8080/TripleCheckMate-Demo/
  15. Crowdsourcing using Linked Data Experts — Results
  16. Methodology. Crowdsource using: • Linked Data experts (Find phase) • Amazon Mechanical Turk workers (Verify phase)
  17. Crowdsourcing using AMT Workers. Steps: (1) selecting LD quality issues to crowdsource; (2) designing and generating the microtasks that present the data to the crowd. [Diagram: triples {s p o .} from the dataset are judged as correct or incorrect, together with the quality issue.]
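A sketch of step (2), assuming flagged triples are exported as an MTurk-style CSV batch file with one microtask per triple; the column names are assumptions, not the actual HIT template used in the study.

```python
import csv

def write_hit_batch(flagged_triples, path="hits.csv"):
    """Write one microtask row per flagged triple (MTurk-style CSV batch input).
    Each item is a (subject, predicate, object, issue_type) tuple; workers will
    later judge the triple as 'correct' or 'incorrect'."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["subject", "predicate", "object", "issue_type"])
        writer.writerows(flagged_triples)
```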
  18. Selecting LD quality issues to crowdsource (step 1). Three categories of quality problems occur pervasively in DBpedia and can be crowdsourced: Incorrect/incomplete object ▪ Example: dbpedia:Dave_Dobbyn dbprop:dateOfBirth “3”. Incorrect data type or language tag ▪ Example: dbpedia:Torishima_Izu_Islands foaf:name “ ”@en. Incorrect link to “external Web pages” ▪ Example: dbpedia:John-Two-Hawks dbpedia-owl:wikiPageExternalLink <http://cedarlakedvd.com/>
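For illustration only, the three categories above can be approximated with simple heuristic checks; the function names and regular expressions are assumptions and far looser than what a real assessment would need.

```python
import re

def suspicious_date_value(value: str) -> bool:
    """Flag date-typed objects that are clearly incomplete, e.g. dateOfBirth "3"."""
    return not re.match(r"^\d{4}(-\d{2}(-\d{2})?)?$", value)

def suspicious_language_tag(value: str, tag: str) -> bool:
    """Flag literals tagged @en that contain no Latin letters at all
    (as in the Torishima_Izu_Islands foaf:name example)."""
    return tag == "en" and not re.search(r"[A-Za-z]", value)

def suspicious_external_link(url: str) -> bool:
    """Flag wikiPageExternalLink values that are bare domains with no path,
    a weak indicator of low-value outlinks such as http://cedarlakedvd.com/."""
    return re.match(r"^https?://[^/]+/?$", url) is not None
```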
  19. Presenting the data to the crowd (step 2). • Selection of foaf:name or rdfs:label to extract human-readable descriptions • Values extracted automatically from Wikipedia infoboxes • Link to the Wikipedia article via foaf:isPrimaryTopicOf • Preview of external pages embedded as an HTML iframe. Microtask interfaces: MTurk tasks for incorrect object, incorrect data type or language tag, and incorrect outlink.
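A sketch of how the human-readable label and Wikipedia link mentioned above could be retrieved from the public DBpedia SPARQL endpoint, assuming the SPARQLWrapper library; the exact query and function name are assumptions, not the ones used in the study.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

def presentation_data(resource_uri, endpoint="https://dbpedia.org/sparql"):
    """Fetch a human-readable name (foaf:name, falling back to an English rdfs:label)
    and the Wikipedia article (foaf:isPrimaryTopicOf) used to render one microtask."""
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(f"""
        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?name ?label ?wikipedia WHERE {{
            OPTIONAL {{ <{resource_uri}> foaf:name ?name }}
            OPTIONAL {{ <{resource_uri}> rdfs:label ?label . FILTER(lang(?label) = "en") }}
            OPTIONAL {{ <{resource_uri}> foaf:isPrimaryTopicOf ?wikipedia }}
        }} LIMIT 1
    """)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()

# Hypothetical usage:
# presentation_data("http://dbpedia.org/resource/Dave_Dobbyn")
```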
  20. EXPERIMENTAL STUDY
  21. Experimental design. • Crowdsourcing approaches: Find stage: contest with LD experts; Verify stage: microtasks. • Creation of a gold standard: two of the authors (MA, AZ) generated the gold standard for all triples obtained from the contest; each author independently evaluated the triples; conflicts were resolved via mutual agreement. • Metric: precision.
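A minimal sketch of the precision metric named above, assuming judgments are recorded as booleans per triple id (True = quality issue found); the variable names are chosen here only for illustration.

```python
def precision(judgments, gold):
    """judgments and gold map a triple id to True (quality issue) or False.
    Precision = TP / (TP + FP), computed over the triples flagged as issues."""
    flagged = [t for t, is_issue in judgments.items() if is_issue]
    if not flagged:
        return 0.0
    true_positives = sum(1 for t in flagged if gold.get(t, False))
    return true_positives / len(flagged)
```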
  22. Overall results. Number of distinct participants: 50 LD experts vs. 80 microtask workers. Total time: 3 weeks (predefined) vs. 4 days. Total triples evaluated: 1,512 vs. 1,073. Total cost: ~US$ 400 (predefined) vs. ~US$ 43.
  23. Precision results: Incorrect object task. • MTurk workers can be used to reduce the error rates of LD experts for the Find stage. • 117 DBpedia triples had date-related predicates with incorrect/incomplete values, e.g. ”2005 Six Nations Championship” Date 12. • 52 DBpedia triples had erroneous values originating from the source, e.g. ”English (programming language)” Influenced by ?. • Experts classified all these triples as incorrect, whereas workers compared the values against Wikipedia and successfully classified these triples as “correct”. Results on 509 compared triples: precision 0.7151 for LD experts vs. 0.8977 for MTurk (majority voting, n=5).
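The "majority voting: n=5" aggregation in this and the following result tables could be implemented as in this small sketch; the tie-handling behavior is an arbitrary assumption.

```python
from collections import Counter

def majority_vote(answers):
    """Collapse the answers of n workers (e.g. five 'correct'/'incorrect' labels
    for one triple) into a single crowd judgment; ties fall to the label seen first."""
    return Counter(answers).most_common(1)[0][0]

# Example: five workers judging one triple
majority_vote(["incorrect", "correct", "incorrect", "incorrect", "correct"])  # -> "incorrect"
```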
  24. Precision results: Incorrect data type task. [Chart: number of triples (0–150) per data type (Date, Millimetre, Number, Second, Year), broken down into expert true/false positives and crowd true/false positives.] Results on 341 compared triples: precision 0.8270 for LD experts vs. 0.4752 for MTurk (majority voting, n=5).
  25. Precision results: Incorrect link task. • We analyzed the 189 misclassifications by the experts: [chart: breakdown into Freebase links, Wikipedia images, and external links (11%, 39%, 50%)]. • The misclassifications by the workers correspond to pages with a language different from English. Results on 223 compared triples: precision 0.2598 for the baseline, 0.1525 for LD experts, and 0.9412 for MTurk (majority voting, n=5).
  26. Final discussion. RQ1: Is it possible to detect quality issues in LD data sets via crowdsourcing mechanisms? Yes: LD experts performed best on incorrect datatypes, AMT workers on incorrect/incomplete object values and incorrect interlinks. RQ2: What type of crowd is most suitable for each type of quality issue? The effort of LD experts should be spent on tasks demanding domain-specific skills; AMT workers were exceptionally good at performing data comparisons. RQ3: Which types of errors are made by lay users and experts? Lay users lack the skills to solve domain-specific tasks, while experts' performance is very low on tasks that demand extra effort (e.g., checking an external page).
  27. CONCLUSIONS, CHALLENGES & FUTURE WORK
  28. Conclusions. A crowdsourcing methodology for LD quality assessment: Find stage with LD experts, Verify stage with AMT workers. The methodology and tool are generic enough to be applied to other scenarios. Crowdsourcing approaches are feasible for detecting the studied quality issues.
  29. Challenges. Lack of a gold standard. Crowdsourcing design — how many workers? how many tasks? what reward? Microtask design.
  30. Future work. Combining semi-automated and crowdsourcing methods: predicted vs. crowdsourced metadata. Conducting new experiments (other domains): entity, dataset, and experimental metadata. Fixing/improving quality using crowdsourcing: Find-Fix-Verify phases.
  31. References. D. Kontokostas, A. Zaveri, S. Auer, J. Lehmann: TripleCheckMate: A Tool for Crowdsourcing the Quality Assessment of Linked Data. ISWC 2013. M. Acosta, A. Zaveri, E. Simperl, D. Kontokostas, S. Auer, J. Lehmann: Crowdsourcing Linked Data Quality Assessment. ISWC 2013. A. Zaveri, D. Kontokostas, M. A. Sherif, L. Bühmann, M. Morsey, S. Auer, J. Lehmann: User-driven Quality Evaluation of DBpedia. I-SEMANTICS 2013. A. Zaveri, A. Rula, A. Maurino, R. Pietrobon, J. Lehmann, S. Auer: Quality Assessment for Linked Data: A Survey. Semantic Web Journal, 2015. M. Acosta, A. Zaveri, E. Simperl, D. Kontokostas, F. Flöck, J. Lehmann: Detecting Linked Data Quality Issues via Crowdsourcing: A DBpedia Study. Semantic Web Journal, 2016. U. Hassan, A. Zaveri, E. Marx, E. Curry, J. Lehmann: ACRyLIQ: Leveraging DBpedia for Adaptive Crowdsourcing in Linked Data Quality Assessment. EKAW 2016.
  32. Thank you. Questions? amrapali@stanford.edu @AmrapaliZ
