Leveraging DBpedia for Adaptive Crowdsourcing in Linked Data Quality Assessment

Crowdsourcing has emerged as a powerful paradigm for assessing and improving the quality of Linked Data. A major challenge in employing crowdsourcing for Linked Data quality assessment is the cold-start problem: how can the reliability of crowd workers be estimated so that the most reliable workers are assigned to tasks? We address this challenge by proposing a novel approach that generates test questions from DBpedia based on the topics associated with the quality assessment tasks. These test questions are used to estimate the reliability of new workers. Subsequently, tasks are dynamically assigned to reliable workers to improve the accuracy of the collected responses. Our proposed approach, ACRyLIQ, is evaluated on two real-world Linked Data datasets using workers hired from Amazon Mechanical Turk. We validate the proposed approach in terms of accuracy and compare it against a baseline that estimates reliability using gold-standard tasks. The results demonstrate that our proposed approach achieves high accuracy without using gold-standard tasks.
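
The dynamic assignment idea can be made concrete with a minimal sketch, assuming per-topic reliability scores have already been estimated from the test questions; the function name and the simple greedy policy below are illustrative assumptions, not the exact algorithm from the paper.

```python
# Minimal sketch of adaptive task assignment: route each LDQA task to the
# available worker with the highest estimated reliability for its topic.
# Names and the greedy policy are illustrative assumptions only.

def assign_task(task_topic, workers, reliability):
    """Pick the available worker with the best reliability estimate
    for this task's topic.

    workers     -- iterable of available worker ids
    reliability -- dict mapping (worker_id, topic) -> score estimated
                   from the DBpedia-based test questions
    """
    return max(workers, key=lambda w: reliability.get((w, task_topic), 0.0))

# Example: worker "w2" is more reliable on the "Geography" topic.
reliability = {("w1", "Geography"): 0.6, ("w2", "Geography"): 0.9}
assert assign_task("Geography", ["w1", "w2"], reliability) == "w2"
```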

  1. EKAW 2016
     ACRyLIQ: Leveraging DBpedia for Adaptive Crowdsourcing in Linked Data Quality Assessment
     Umair ul Hassan, Amrapali Zaveri, Edgard Marx, Edward Curry, Jens Lehmann
  2. Background
     • Linked Data Quality Assessment (LDQA)
       – Incomplete, inaccurate, and inconsistent data in the LOD cloud
     • Crowdsourcing LDQA (a minimal sketch of this loop follows this slide):
       1. Generate micro-tasks to assess the quality of a Linked Data dataset
       2. Recruit crowd workers to perform the LDQA tasks
       3. Update the dataset based on the crowd's answers
     [Figure: Linked Dataset → LDQA tasks → Crowd Workers → Answers → Updates]
     Zaveri, Amrapali, et al. "Quality assessment for linked data: A survey." Semantic Web 7.1 (2015): 63-93.
     Acosta, Maribel, et al. "Crowdsourcing linked data quality assessment." International Semantic Web Conference. Springer Berlin Heidelberg, 2013.
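A minimal sketch of the three-step loop above, in Python; MicroTask and the helper functions are hypothetical names for illustration, not the authors' implementation.

```python
# Minimal sketch of the crowdsourcing LDQA loop from the slide above.
# MicroTask and the helpers are hypothetical, for illustration only.
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class MicroTask:
    triple: tuple                                 # (subject, predicate, object) to verify
    answers: list = field(default_factory=list)   # collected crowd answers

def generate_micro_tasks(triples):
    """Step 1: one verification micro-task per suspect triple."""
    return [MicroTask(triple=t) for t in triples]

def majority_answer(task):
    """Step 3: aggregate crowd answers, e.g. by majority vote."""
    votes = Counter(task.answers)
    return votes.most_common(1)[0][0] if votes else None
```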
  3. Research Challenge
     • Workers have varying reliability and expertise depending on the domain and topics of a dataset
     How can we estimate the reliability of crowd workers to achieve high accuracy on LDQA tasks through adaptive task assignment?
     [Figure: Linked Dataset → Crowdsourced LDQA tasks]
  4. Existing Approach
     • Use domain experts to create gold-standard tasks (GSTs)
     • Estimate worker reliability from their correct responses to GSTs and assign tasks accordingly (a minimal sketch follows this slide)
     [Figure: Domain Experts → Gold-standard LDQA tasks; 1) GST Selection → Correct Responses; 2) Task Assignment → Crowdsourced LDQA tasks on the Linked Dataset]
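For contrast with the KBQ approach, here is a minimal sketch of GST-based reliability estimation, assuming reliability is simply the fraction of gold-standard tasks a worker answers correctly; the helper name is hypothetical.

```python
def gst_reliability(worker_answers, gold_answers):
    """Fraction of gold-standard tasks answered correctly.

    worker_answers -- dict task_id -> worker's answer
    gold_answers   -- dict task_id -> expert-provided correct answer
    """
    graded = [worker_answers.get(tid) == ans for tid, ans in gold_answers.items()]
    return sum(graded) / len(graded) if graded else 0.0

# Example: 2 of 3 gold-standard tasks answered correctly -> ~0.67
print(gst_reliability({"t1": "yes", "t2": "no", "t3": "no"},
                      {"t1": "yes", "t2": "yes", "t3": "no"}))
```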
  5. Proposed Approach
     • Leverage DBpedia to generate knowledge-based questions (KBQs)
     • Estimate worker reliability from KBQ answers and assign tasks accordingly (a minimal sketch follows this slide)
     [Figure: DBpedia facts (i.e., triples) → KBQs; 1) KBQ Selection; 2) Task Assignment → Crowdsourced LDQA tasks on the Linked Dataset]
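A minimal sketch of generating KBQs from DBpedia facts, using the public SPARQL endpoint via the SPARQLWrapper library; the query shape and the yes/no question wording are assumptions for illustration, not the paper's exact generator.

```python
# Minimal sketch: pull facts for a topic from DBpedia and phrase them as
# yes/no knowledge-based questions (KBQs). Query shape and question
# wording are illustrative assumptions.
from SPARQLWrapper import SPARQLWrapper, JSON

def fetch_topic_facts(category, limit=10):
    """Fetch (subject, predicate, object) triples for a DBpedia category."""
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery(f"""
        SELECT ?s ?p ?o WHERE {{
          ?s <http://purl.org/dc/terms/subject>
             <http://dbpedia.org/resource/Category:{category}> .
          ?s ?p ?o .
          FILTER(isIRI(?o))
        }} LIMIT {limit}
    """)
    sparql.setReturnFormat(JSON)
    rows = sparql.query().convert()["results"]["bindings"]
    return [(r["s"]["value"], r["p"]["value"], r["o"]["value"]) for r in rows]

def to_kbq(fact):
    """Render a triple as a yes/no test question."""
    s, p, o = fact
    return f"Is this statement correct? {s} -- {p} --> {o}  (yes/no)"

for fact in fetch_topic_facts("Geography", limit=3):
    print(to_kbq(fact))
```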
  6. Evaluation Methodology
     Languages dataset:
       – LDQA tasks: verify language tags for entities in the LinkedSpending dataset (25 tasks)
       – Topics: Chinese, English, French, Japanese, Russian
       – KBQs: verify the language of DBpedia facts (10 KBQs)
     Interlinks dataset:
       – LDQA tasks: verify relationships between entities as generated by OAEI (25 tasks)
       – Topics: Anatomy, Books, Economics, Geography, Nature
       – KBQs: verify DBpedia facts based on SKOS relationships (10 KBQs)
  7. Evaluation Methodology
     • Crowd workers
       – 60 workers hired from Amazon Mechanical Turk
       – Paid $1.50 for 30 minutes
       – Each provided answers to 10 KBQs and 25 tasks for both datasets
       – Diverse reliability on the Languages tasks
       – Low reliability on the Interlinks tasks
  8. Results: Compared Approaches
     The KBQ approach generates reliability estimates similar to those of the GST approach.
  9. Results: Algorithm Parameters
  10. Summary
      • Strengths
        – KBQs provide a quick and inexpensive method of estimating the reliability and expertise of workers
        – The approach is particularly suited to complex, knowledge-intensive tasks
      • Limitations
        – Assumes that LDQA tasks and KBQs are partitioned according to the same set of topics
        – Assumes that all facts in DBpedia are correct
        – Assumes that dataset topics are mutually exclusive
      • Future work
        – Validate the scalability of the proposed approach
        – Evaluate on a wider range of tasks and datasets
  11. Thank you
      Umair Ul Hassan, Amrapali Zaveri, Edgard Marx, Edward Curry, and Jens Lehmann. "ACRyLIQ: Leveraging DBpedia for Adaptive Crowdsourcing in Linked Data Quality Assessment". In: 20th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2016). Springer International Publishing, 2016.
      Questions: umair.ulhassan@insight-centre.org
