TripleCheckMate

Presentation of the TripleCheckMate tool (http://aksw.org/Projects/TripleCheckMate.html) at KESW 2013 (kesw.ifmo.ru/kesw2013/).

  1. TripleCheckMate: A Tool for Crowdsourcing the Quality Assessment of Linked Data
     Dimitris Kontokostas, Amrapali Zaveri, Sören Auer and Jens Lehmann
     KESW 2013, Oct 08, 2013
  2. Outline
     ❏ Data Quality
     ❏ Data Quality Assessment Methodology
     ❏ Evaluation Methodology - Manual
        ❏ Phase I: Quality Problem Taxonomy
        ❏ Phase II: Crowdsourcing Quality Assessment
     ❏ TripleCheckMate
        ❏ Architecture
        ❏ Demo
     ❏ Conclusion & Future Work
  3. Data Quality
     ● Data Quality (DQ) is defined as:
        ○ fitness for a certain use case*
     ● On the Data Web, information of varying quality covers various domains
     ● High-quality datasets:
        ○ curated over decades, e.g. in the life science domain
        ○ produced by a crowdsourcing process, extracted from unstructured and semi-structured information, e.g. DBpedia
     * J. Juran. The Quality Control Handbook. McGraw-Hill, New York, 1974.
  4. Data Quality Assessment Methodology
     4-step methodology:
     ❏ Step 1: Resource selection
        ❏ Per class
        ❏ Completely random
        ❏ Manual
     ❏ Step 2: Evaluation mode selection
        ❏ Manual
        ❏ Semi-automatic
        ❏ Automatic
     ❏ Step 3: Resource evaluation
     ❏ Step 4: DQ improvement
        ❏ Direct
        ❏ Indirect
  5. Evaluation Methodology - Manual
     ❏ Phase I: Creation of a quality problem taxonomy
     ❏ Phase II: Crowdsourcing quality assessment
  6. Phase I: Quality Problem Taxonomy
     A. Zaveri, A. Rula, A. Maurino, R. Pietrobon, J. Lehmann, and S. Auer. Quality assessment methodologies for Linked Open Data: A Review. Under review, available at http://www.semantic-web-journal.net/content/quality-assessment-methodologies-linked-open-data.
  7. Phase II: Crowdsourcing Quality Assessment
     Crowdsourcing vs. our approach:
     ● Type: Human Intelligence Tasks (HITs) vs. contest-based
     ● Participants: labor market vs. Linked Data (LD) experts
     ● Task: detect quality issues in triples vs. detect & classify quality issues in resources
     ● Reward: per task/triple vs. for evaluating the most resources
     ● Tool: Amazon Mechanical Turk, CrowdFlower etc. vs. TripleCheckMate
  8. TripleCheckMate - Architecture (1/2)
  9. TripleCheckMate - Architecture (2/2)
     ● Built on Java / GWT
        ○ GWT compiles to native cross-browser HTML/JS
     ● Tomcat / Jetty & MySQL as a minimal backend
        ○ stores/retrieves evaluation data only
     ● Application logic is built on the client
        ○ SPARQL is executed on the client
        ○ Portable
  10. Evaluation storage schema
     ● Designed to support multiple campaigns and different ontologies
     ● The quality taxonomy is stored in the database, which makes it easy to adapt
  11. TripleCheckMate - Demo
     http://tinyurl.com/TCM-Demo
     http://tinyurl.com/TCM-Screencast
  12. Conclusion & Future Work
     ● TripleCheckMate
        ○ Tool for crowdsourcing quality assessment
        ○ Linked Data quality assessment
        ○ Supports inter-rater agreement
        ○ Can be used with any Linked Dataset
     ● Future Work
        ○ Directly integrate semi-automatic methods
        ○ Improve the efficiency of quality assessment
        ○ Include support for the Patch Ontology* as output format
     * M. Knuth, J. Hercher, and H. Sack. Collaboratively patching linked data. CoRR, 2012.
  13. Thank You
     Questions?
     http://nl.dbpedia.org:8080/TripleCheckMate-Demo/
     https://github.com/AKSW/TripleCheckMate
     http://aksw.org/AmrapaliZaveri
     zaveri@informatik.uni-leipzig.de
     Twitter: @amrapaliz
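The three resource-selection options of Step 1 of the four-step methodology (slide 4) can be illustrated with a small sketch. It is a hypothetical Python illustration, not TripleCheckMate code: the in-memory `RESOURCES` map, the mode names and `select_resources` are assumptions standing in for the classes and resources the real tool would fetch from the dataset via SPARQL.

```python
import random

# Hypothetical stand-in for a dataset: resource URI -> set of classes.
# A real deployment would obtain these via SPARQL queries against the dataset.
RESOURCES = {
    "http://dbpedia.org/resource/Leipzig": {"dbo:City", "dbo:Place"},
    "http://dbpedia.org/resource/Berlin": {"dbo:City", "dbo:Place"},
    "http://dbpedia.org/resource/Johann_Sebastian_Bach": {"dbo:Person"},
    "http://dbpedia.org/resource/Saint_Petersburg": {"dbo:City", "dbo:Place"},
}


def select_resources(mode, k=1, cls=None, manual=None, seed=None):
    """Step 1 sketch: pick resources to evaluate.

    mode is one of the illustrative names "per_class", "random" or "manual".
    """
    rng = random.Random(seed)  # seedable so a selection can be reproduced
    uris = sorted(RESOURCES)
    if mode == "per_class":
        # Sample only among resources that belong to the requested class.
        candidates = [u for u in uris if cls in RESOURCES[u]]
        return rng.sample(candidates, min(k, len(candidates)))
    if mode == "random":
        # Completely random: sample from all known resources.
        return rng.sample(uris, min(k, len(uris)))
    if mode == "manual":
        # The evaluator names the resources; keep only ones in the dataset.
        return [u for u in (manual or []) if u in RESOURCES]
    raise ValueError(f"unknown selection mode: {mode}")
```

For example, `select_resources("per_class", k=2, cls="dbo:City", seed=1)` returns two of the three city resources.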
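The contest-based approach of Phase II (slide 7) rewards the participants who evaluate the most resources. A minimal leaderboard sketch follows, assuming a hypothetical log of `(user, resource)` pairs and assuming that repeated evaluations of the same resource by one user count only once; the slides specify neither detail.

```python
from collections import defaultdict


def leaderboard(evaluations):
    """Rank participants by the number of distinct resources they evaluated.

    evaluations: iterable of (user, resource_uri) pairs (hypothetical log format).
    Returns (user, count) pairs sorted by count descending, then by user name.
    """
    seen = defaultdict(set)
    for user, resource in evaluations:
        seen[user].add(resource)  # re-evaluating a resource does not add to the score
    return sorted(((user, len(res)) for user, res in seen.items()),
                  key=lambda pair: (-pair[1], pair[0]))
```

Ties are broken alphabetically here only to make the ordering deterministic; how the actual contest resolved ties is not stated.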
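Slide 9 states that SPARQL is executed on the client. The sketch below only illustrates how such a request could be built for one resource: a `SELECT` query for all of its triples, encoded as a GET URL using the `query` parameter of the SPARQL 1.1 Protocol. It is written in Python for brevity, whereas the tool itself runs as GWT-compiled JavaScript; `describe_query`, `query_url`, the `LIMIT` default and the IRI check are assumptions, and no request is actually sent.

```python
from urllib.parse import urlencode

# Characters that must not appear inside an IRI written as <...> in SPARQL.
_UNSAFE_IRI_CHARS = '<>"{}|^`\\ '


def describe_query(resource_uri, limit=1000):
    """Build a SELECT query returning every (predicate, object) of one resource."""
    if any(c in resource_uri for c in _UNSAFE_IRI_CHARS):
        raise ValueError("not a safe IRI for embedding: " + resource_uri)
    return (
        "SELECT ?p ?o WHERE {\n"
        f"  <{resource_uri}> ?p ?o .\n"
        f"}} LIMIT {int(limit)}"
    )


def query_url(endpoint, query):
    """Encode the query as an HTTP GET URL ('query' parameter, SPARQL 1.1 Protocol)."""
    return endpoint + "?" + urlencode({"query": query})
```

For instance, `query_url("http://dbpedia.org/sparql", describe_query("http://dbpedia.org/resource/Leipzig"))` yields a URL that a browser-based client could fetch directly, which is what keeps the backend limited to storing evaluation data.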
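Slide 10 describes a storage schema that supports multiple campaigns and ontologies and keeps the quality taxonomy in the database. The following SQLite sketch shows one possible layout; every table and column name is hypothetical, since the slides do not give the actual MySQL schema. Because the taxonomy is a self-referencing `error_type` table, adapting it means inserting or updating rows rather than changing application code.

```python
import sqlite3

# Hypothetical schema; names are illustrative, not TripleCheckMate's own.
SCHEMA = """
CREATE TABLE campaign (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    endpoint TEXT NOT NULL,   -- SPARQL endpoint of the dataset under review
    ontology TEXT NOT NULL    -- ontology namespace used for class selection
);
CREATE TABLE error_type (     -- quality taxonomy stored as a tree
    id INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES error_type(id),
    title TEXT NOT NULL
);
CREATE TABLE evaluation (
    id INTEGER PRIMARY KEY,
    campaign_id INTEGER NOT NULL REFERENCES campaign(id),
    user_name TEXT NOT NULL,
    resource TEXT NOT NULL
);
CREATE TABLE flagged_triple (
    evaluation_id INTEGER NOT NULL REFERENCES evaluation(id),
    predicate TEXT NOT NULL,
    object TEXT NOT NULL,
    error_type_id INTEGER NOT NULL REFERENCES error_type(id)
);
"""


def create_db():
    """Create an in-memory database with the sketch schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def add_error_type(db, title, parent_id=None):
    """Add a taxonomy node; parent_id=None makes it a top-level category."""
    cur = db.execute("INSERT INTO error_type (parent_id, title) VALUES (?, ?)",
                     (parent_id, title))
    return cur.lastrowid


def children(db, parent_id):
    """List the titles of a node's direct children (None = top level)."""
    rows = db.execute("SELECT title FROM error_type WHERE parent_id IS ? ORDER BY id",
                      (parent_id,))
    return [title for (title,) in rows]
```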
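Slide 12 notes that the tool supports inter-rater agreement but does not say which measure is used. As one common choice, swapped in here purely for illustration, Cohen's kappa compares the observed agreement p_o of two raters with the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The labels in the sketch (e.g. whether a triple is flagged as erroneous) are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items in the same order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used the same single label throughout
    return (observed - expected) / (1 - expected)
```

For example, two raters agreeing on 5 of 6 triples, with label counts 4/2 and 3/3, give p_o = 5/6, p_e = 1/2 and thus kappa = 2/3.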

×