A Template-Based Approach for Annotating Long-Tailed Datasets

An increasing amount of data is shared on the Web through heterogeneous spreadsheets and CSV files. In order to homogenize and query these data, the scientific community has developed Extract, Transform and Load (ETL) tools and services that help make these files machine-readable as Knowledge Graphs (KGs). However, tabular data may be complex, and the level of expertise required by existing ETL tools makes it difficult for users to describe their own data. In this paper we propose a simple annotation schema to guide users when transforming complex tables into KGs. We have implemented our approach by extending T2WML, a table annotation tool designed to help users annotate their data and upload the results to a public KG. We have evaluated our approach with six non-expert users, obtaining promising preliminary results.

  1. Information Sciences Institute. A Template-Based Approach for Annotating Long-Tailed Datasets. Daniel Garijo, Ke-Thia Yao, Amandeep Singh and Pedro Szekely ({dgarijo, kyao, amandeep, szeke}@isi.edu, @dgarijov). This work was funded by the Defense Advanced Research Projects Agency (DARPA).
  2. Transforming tabular data into KGs... (figure label: Expert)
  3. Transforming tabular data into KGs... How can we ease the process for non-experts? (figure label: Expert)
  4. Challenges: Annotation. Figure labels: Oil production; Subject to annotate; Variable (predicate); Object (values); Time Qualifiers
  5. Challenges: Annotation. Figure labels: Oil production; Oil price; Units!; Time
     - Multiple variables, missing values, etc.
  6. Challenges: Summary. How to create a way for non-experts to annotate their data…
     - Without having to learn a mapping language
     - Capturing qualifiers of described variables
     - Ignoring undesired columns/incomplete cells
     - Share the results as part of a public KG
  7. Proposed workflow. Users should be able to:
     1. Annotate their data
     2. Preview their progress
     3. Share their results (KG)
  8. Annotation schema
     - We adopt the Wikidata data model (s,p,o,q,r)
     - Add 7 rows to define metadata
     - https://t2wml-annotation.readthedocs.io
  9. T2WML Extension
     - Load data
     - Link and review
     - Preview and upload (or save)
     - https://github.com/usc-isi-i2/t2wml
  10. Sharing annotated datasets with Datamart
      - https://github.com/usc-isi-i2/datamart-api
      - Implementation (password protected): https://dsbox02.isi.edu/datamart-api/
      - REST API
      - Datamart:
        - Metadata catalog (search variables, datasets, locations, etc.)
        - Data catalog (time series data)
  11. (Very) preliminary results
      - Evaluation with users:
        - 6 users (not familiar with Semantic Web technologies)
        - Knowledge in Data Science/Scripting
        - 1 hour training in T2WML/schema
        - 3 datasets (each dataset was assigned to two users)
      - Results:
        - All users were able to describe and upload their data
        - Trouble understanding differences between variables and qualifiers
  12. Conclusions and future work
      - Non-experts should be empowered to populate existing KGs with their own data.
      - We propose a simple workflow to let users annotate, preview and share their data as a KG
      - Next steps: incorporate table understanding approaches in the annotation process
        - Less effort from users required
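
The annotation idea in the slides (mark which columns hold the subject, the variables and the qualifiers, then skip undesired columns and incomplete cells) can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not T2WML's implementation or its annotation file format: the `Statement` class, the `table_to_statements` function, the role names and the toy table are all hypothetical. It also assumes that (s,p,o,q,r) stands for subject, property, object, qualifiers and references, and leaves references out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One (subject, predicate, value, qualifiers) statement (hypothetical type)."""
    subject: str
    predicate: str
    value: float
    qualifiers: tuple = ()  # tuple of (qualifier name, qualifier value) pairs


def table_to_statements(header, roles, rows):
    """Turn table rows into statements using a column -> role mapping.

    Roles (hypothetical names): "subject", "variable", "qualifier".
    Columns without a role are ignored, and empty variable cells are skipped.
    """
    statements = []
    for row in rows:
        cells = dict(zip(header, (cell.strip() for cell in row)))
        subjects = [cells[c] for c in header if roles.get(c) == "subject"]
        if not subjects or not subjects[0]:
            continue  # nothing to attach the statements to
        qualifiers = tuple(
            (c, cells[c])
            for c in header
            if roles.get(c) == "qualifier" and cells[c]
        )
        for c in header:
            if roles.get(c) == "variable" and cells[c]:
                statements.append(
                    Statement(subjects[0], c, float(cells[c]), qualifiers)
                )
    return statements


# Toy table with made-up values; the "notes" column has no role and is ignored.
header = ["country", "year", "oil_production", "oil_price", "notes"]
roles = {
    "country": "subject",
    "year": "qualifier",
    "oil_production": "variable",
    "oil_price": "variable",
}
rows = [
    ["CountryA", "2019", "10.5", "60.0", "n/a"],
    ["CountryA", "2020", "", "40.0", "n/a"],  # missing oil_production value
]

statements = table_to_statements(header, roles, rows)
for statement in statements:
    print(statement)
```

Each variable column becomes its own statement, so a row with two variables yields two statements that share the same time qualifier. The row with the missing production value yields only a price statement.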
