TripleCheckMate

Presentation of the TripleCheckMate tool (http://aksw.org/Projects/TripleCheckMate.html) at KESW 2013 (kesw.ifmo.ru/kesw2013/)


  1. TripleCheckMate: A Tool for Crowdsourcing the Quality Assessment of Linked Data
     Dimitris Kontokostas, Amrapali Zaveri, Sören Auer and Jens Lehmann
     KESW 2013, Oct 08, 2013
  2. Outline
     ❏ Data Quality
     ❏ Data Quality Assessment Methodology
     ❏ Evaluation Methodology - Manual
       ❏ Phase I: Quality Problem Taxonomy
       ❏ Phase II: Crowdsourcing Quality Assessment
     ❏ TripleCheckMate
       ❏ Architecture
       ❏ Demo
     ❏ Conclusion & Future Work
  3. Data Quality
     ● Data Quality (DQ) is defined as:
       ○ fitness for a certain use case*
     ● On the Data Web - varying quality of information covering various domains
     ● High quality datasets
       ○ curated over decades - life science domain
       ○ crowdsourcing process - extracted from unstructured and semi-structured information, e.g. DBpedia
     * J. Juran. The Quality Control Handbook. McGraw-Hill, New York, 1974.
  4. Data Quality Assessment Methodology
     4 Step Methodology:
     ❏ Step 1: Resource selection
       ❏ Per Class
       ❏ Completely random
       ❏ Manual
     ❏ Step 2: Evaluation mode selection
       ❏ Manual
       ❏ Semi-automatic
       ❏ Automatic
     ❏ Step 3: Resource evaluation
     ❏ Step 4: DQ improvement
       ❏ Direct
       ❏ Indirect
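
The three resource selection modes of Step 1 could be sketched roughly as follows. This is only an illustration in Python, not the tool's actual code (TripleCheckMate itself is built on Java / GWT); the function name `select_resources`, its parameters, and the in-memory dictionary of resources are assumptions made for the example.

```python
import random


def select_resources(resources_by_class, mode, count,
                     chosen_class=None, manual=None, seed=None):
    """Pick resources to evaluate (hypothetical helper, not from the tool).

    mode is one of "per_class", "random" or "manual", mirroring Step 1.
    """
    rng = random.Random(seed)
    if mode == "manual":
        # The evaluator names the resources directly.
        return list(manual or [])
    if mode == "per_class":
        # Restrict the pool to instances of one ontology class.
        pool = list(resources_by_class[chosen_class])
    elif mode == "random":
        # Completely random: ignore class membership entirely.
        pool = sorted({r for rs in resources_by_class.values() for r in rs})
    else:
        raise ValueError(f"unknown selection mode: {mode}")
    # Never ask for more resources than the pool contains.
    return rng.sample(pool, min(count, len(pool)))


# Example data (illustrative resource names only).
resources = {
    "Person": ["dbr:Ada_Lovelace", "dbr:Alan_Turing"],
    "Place": ["dbr:Leipzig"],
}
picked = select_resources(resources, "per_class", 1, chosen_class="Person", seed=0)
```

Passing a `seed` makes a selection repeatable, which may help when several evaluators should be given the same sample.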
  5. Evaluation Methodology - Manual
     ❏ Phase I: Creation of quality problem taxonomy
     ❏ Phase II: Crowdsourcing quality assessment
  6. Phase I: Quality Problem Taxonomy
     A. Zaveri, A. Rula, A. Maurino, R. Pietrobon, J. Lehmann, and S. Auer. Quality assessment methodologies for Linked Open Data: A Review. Under review, available at http://www.semantic-web-journal.net/content/quality-assessment-methodologies-linked-open-data.
  7. Phase II: Crowdsourcing Quality Assessment
                    | Crowdsourcing                           | Our Approach
     Type           | Human Intelligence Tasks (HITs)         | Contest-based
     Participants   | Labor market                            | Linked Data (LD) experts
     Task           | Detect quality issues in triples        | Detect & classify quality issues in resources
     Reward         | Per task/triple                         | Most no. of resources evaluated
     Tool           | Amazon Mechanical Turk, CrowdFlower etc. | TripleCheckMate
  8. TripleCheckMate - Architecture (1/2)
  9. TripleCheckMate - Architecture (2/2)
     ● Built on Java / GWT
       ○ GWT compiles to native cross-browser HTML/JS
     ● Tomcat / Jetty & MySQL as minimal backend
       ○ store/retrieve evaluation data only
     ● Application logic is built on the client
       ○ SPARQL executed on client
       ○ Portable
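
To illustrate what "SPARQL executed on client" might involve, the sketch below builds the request a client could send to a SPARQL endpoint to fetch every triple of one resource. It is written in Python for brevity (the tool's client is GWT-compiled Java), and the function names, the query shape, and the endpoint URL used in the example are assumptions, not taken from the slides. No network call is made.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_resource_query(resource_uri):
    """Return a SPARQL query listing all predicate/object pairs of a resource.

    The URI is interpolated as-is; a real client would validate it first.
    """
    return f"SELECT ?p ?o WHERE {{ <{resource_uri}> ?p ?o }}"


def build_request_url(endpoint, resource_uri):
    """Encode the query as a GET request using the standard `query` parameter."""
    return endpoint + "?" + urlencode({"query": build_resource_query(resource_uri)})


# Assumed endpoint and resource, for illustration only.
url = build_request_url("https://dbpedia.org/sparql",
                        "http://dbpedia.org/resource/Leipzig")
# Decoding the URL again recovers the original query text.
decoded_query = parse_qs(urlparse(url).query)["query"][0]
```

Because the client talks to the endpoint directly, the backend only has to store and retrieve evaluation data, which matches the "minimal backend" point above.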
  10. Evaluation storage schema
      ● Designed to support multiple campaigns and different ontologies
      ● Quality taxonomy is stored in the database which makes it easy to adapt
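
The actual schema is not reproduced in this transcript, so the following is only a hypothetical sketch of how a store with these two properties could look. All table and column names are invented for illustration, and SQLite stands in for the MySQL backend so that the example is self-contained.

```python
import sqlite3

# Hypothetical schema: every name below is an assumption, not the tool's schema.
SCHEMA = """
CREATE TABLE campaign (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    ontology TEXT NOT NULL            -- each campaign can target its own ontology
);
CREATE TABLE error_category (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES error_category(id),  -- taxonomy as a tree
    label TEXT NOT NULL
);
CREATE TABLE evaluation (
    id INTEGER PRIMARY KEY,
    campaign_id INTEGER NOT NULL REFERENCES campaign(id),
    user_name TEXT NOT NULL,
    resource TEXT NOT NULL,
    predicate TEXT NOT NULL,
    object_value TEXT NOT NULL,
    error_category_id INTEGER REFERENCES error_category(id)
);
"""


def create_store():
    """Create an in-memory database holding the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


conn = create_store()
conn.execute("INSERT INTO campaign (id, name, ontology) VALUES (1, 'demo', 'DBpedia')")
# Placeholder taxonomy rows; adapting the taxonomy means editing rows, not code.
conn.execute("INSERT INTO error_category (id, parent_id, label) VALUES (1, NULL, 'example dimension')")
conn.execute("INSERT INTO error_category (id, parent_id, label) VALUES (2, 1, 'example problem')")
conn.execute(
    "INSERT INTO evaluation (campaign_id, user_name, resource, predicate, object_value, error_category_id) "
    "VALUES (1, 'alice', 'dbr:Leipzig', 'dbo:country', 'dbr:Germany', 2)"
)
flagged = conn.execute(
    "SELECT COUNT(*) FROM evaluation WHERE campaign_id = 1 AND error_category_id IS NOT NULL"
).fetchone()[0]
```

Keeping the taxonomy in a table rather than in code is what would let a campaign swap in a different set of quality problems without redeploying the application.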
  11. TripleCheckMate - Demo
      http://tinyurl.com/TCM-Demo
      http://tinyurl.com/TCM-Screencast
  12. Conclusion & Future Work
      ● TripleCheckMate
        ○ Tool for crowdsourcing quality assessment
        ○ Linked Data quality assessment
        ○ Supports inter-rater agreement
        ○ Can be used with any Linked Dataset
      ● Future Work
        ○ Directly integrating semi-automatic methods
        ○ Improve efficiency of quality assessment
        ○ Include support for Patch Ontology* as output format
      * M. Knuth, J. Hercher, and H. Sack. Collaboratively patching linked data. CoRR, 2012.
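
The slides do not say which inter-rater agreement statistic the tool computes. As one common choice for two evaluators, Cohen's kappa could be calculated as in the sketch below; the function name and the example labels are assumptions for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labelled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each rater's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Two evaluators judging the same four triples (illustrative labels).
rater_1 = ["correct", "correct", "incorrect", "incorrect"]
rater_2 = ["correct", "incorrect", "incorrect", "incorrect"]
kappa = cohens_kappa(rater_1, rater_2)  # observed 0.75, expected 0.5, kappa 0.5
```

With more than two evaluators per resource, a multi-rater statistic such as Fleiss' kappa would be the usual alternative.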
  13. Thank You
      Questions?
      http://nl.dbpedia.org:8080/TripleCheckMate-Demo/
      https://github.com/AKSW/TripleCheckMate
      http://aksw.org/AmrapaliZaveri
      zaveri@informatik.uni-leipzig.de
      Twitter: @amrapaliz
