SlideShare a Scribd company logo
1 of 12
Download to read offline
Information Sciences Institute
A Template-Based Approach for Annotating
Long-Tailed Datasets
Daniel Garijo, Ke-Thia Yao, Amandeep Singh and
Pedro Szekely
{dgarijo, kyao, amandeep, szeke}@isi.edu
@dgarijov
This work was funded by the Defense Advanced Research Projects Agency (DARPA)
Information Sciences Institute
Transforming tabular data into KGs...
Expert
Information Sciences Institute
Transforming tabular data into KGs...
How can we ease the process for non-experts?
Expert
Information Sciences Institute
Challenges: Annotation
Oil production
Subject to annotate
Variable (predicate)
Object (values)
Time
Qualifiers
Information Sciences Institute
Challenges: Annotation
Oil production
Oil price Units!Time
● Multiple variables, missing values, etc.
Information Sciences Institute
Challenges: Summary
How to create a way for non-experts to annotate their data…
- Without having to learn a mapping language
- Capturing qualifiers of described variables
- Ignoring undesired columns/incomplete cells
- Share the results as part of a public KG
Information Sciences Institute
Proposed workflow
Users should be able to
1. Annotate their data
2. Preview their progress
3. Share their results (KG)
?
Information Sciences Institute
Annotation schema
• We adopt the Wikidata data model (s,p,o,q,r)
• Add 7 rows to define metadata
https://t2wml-annotation.readthedocs.io
Information Sciences Institute
T2WML Extension
Load data
Link and
review
Preview and
upload (or save)
https://github.com/usc-isi-i2/t2wml
Information Sciences Institute
Sharing annotated datasets with
Datamart
https://github.com/usc-isi-i2/datamart-api
Implementation (password protected): https://dsbox02.isi.edu/datamart-api/
REST
API
Datamart:
- Metadata catalog (search variables, datasets, locations,
etc.)
- Data catalog (time series data)
Information Sciences Institute
(Very) preliminary results
Evaluation with users:
- 6 users (not familiar with Semantic Web technologies)
- Knowledge in Data Science/Scripting
- 1 hour training in T2WML/schema
- 3 datasets (each dataset was assigned to two users)
Results:
- All users were able to describe and upload their data
- Trouble understanding differences between variables
and qualifiers
Information Sciences Institute
Conclusions and future work
- Non experts should be empowered to populate existing
KGs with their own data.
- We propose a simple workflow to let users annotate,
preview and share their data as a KG
- Next steps: incorporate table understanding approaches
in the annotation process
- Less effort from users required

More Related Content

What's hot

OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...dgarijo
 
A Comparative analysis of Graph Databases vs Relational Database
A Comparative analysis of Graph Databases vs Relational Database A Comparative analysis of Graph Databases vs Relational Database
A Comparative analysis of Graph Databases vs Relational Database Darroch Greally
 
Research Object Community Update
Research Object Community UpdateResearch Object Community Update
Research Object Community UpdateCarole Goble
 
Mining Sociotechnical Information From Software Repositories
Mining Sociotechnical Information From Software RepositoriesMining Sociotechnical Information From Software Repositories
Mining Sociotechnical Information From Software RepositoriesMarco Aurelio Gerosa
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer SchoolCarole Goble
 
Computer Science PhD Project Guidance
Computer Science PhD Project GuidanceComputer Science PhD Project Guidance
Computer Science PhD Project GuidancePhD Services
 
FAIR Data, Operations and Model management for Systems Biology and Systems Me...
FAIR Data, Operations and Model management for Systems Biology and Systems Me...FAIR Data, Operations and Model management for Systems Biology and Systems Me...
FAIR Data, Operations and Model management for Systems Biology and Systems Me...Carole Goble
 
FAIR Data and Model Management for Systems Biology (and SOPs too!)
FAIR Data and Model Management for Systems Biology(and SOPs too!)FAIR Data and Model Management for Systems Biology(and SOPs too!)
FAIR Data and Model Management for Systems Biology (and SOPs too!)Carole Goble
 
Make your data great again - Ver 2
Make your data great again - Ver 2Make your data great again - Ver 2
Make your data great again - Ver 2Daniel JACOB
 
Findability through Traceability - A Realistic Application of Candidate Tr...
Findability through Traceability  - A Realistic Application of Candidate Tr...Findability through Traceability  - A Realistic Application of Candidate Tr...
Findability through Traceability - A Realistic Application of Candidate Tr...Markus Borg
 
Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015
Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015
Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015Stuart Chalk
 
Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Carole Goble
 
TienResumeFinalV22016
TienResumeFinalV22016TienResumeFinalV22016
TienResumeFinalV22016Nora Tien
 
Download Python for R Users pdf for free
Download Python for R Users pdf for freeDownload Python for R Users pdf for free
Download Python for R Users pdf for freeAjay Ohri
 
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)Carole Goble
 

What's hot (20)

OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
 
Coming to terms to FAIR semantics
Coming to terms to FAIR semanticsComing to terms to FAIR semantics
Coming to terms to FAIR semantics
 
Resume
ResumeResume
Resume
 
A Comparative analysis of Graph Databases vs Relational Database
A Comparative analysis of Graph Databases vs Relational Database A Comparative analysis of Graph Databases vs Relational Database
A Comparative analysis of Graph Databases vs Relational Database
 
OKE2018 Challenge @ ESWC2018
OKE2018 Challenge @ ESWC2018OKE2018 Challenge @ ESWC2018
OKE2018 Challenge @ ESWC2018
 
Research Object Community Update
Research Object Community UpdateResearch Object Community Update
Research Object Community Update
 
Mining Sociotechnical Information From Software Repositories
Mining Sociotechnical Information From Software RepositoriesMining Sociotechnical Information From Software Repositories
Mining Sociotechnical Information From Software Repositories
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
 
Computer Science PhD Project Guidance
Computer Science PhD Project GuidanceComputer Science PhD Project Guidance
Computer Science PhD Project Guidance
 
FAIR Data, Operations and Model management for Systems Biology and Systems Me...
FAIR Data, Operations and Model management for Systems Biology and Systems Me...FAIR Data, Operations and Model management for Systems Biology and Systems Me...
FAIR Data, Operations and Model management for Systems Biology and Systems Me...
 
FAIR Data and Model Management for Systems Biology (and SOPs too!)
FAIR Data and Model Management for Systems Biology(and SOPs too!)FAIR Data and Model Management for Systems Biology(and SOPs too!)
FAIR Data and Model Management for Systems Biology (and SOPs too!)
 
Make your data great again - Ver 2
Make your data great again - Ver 2Make your data great again - Ver 2
Make your data great again - Ver 2
 
Findability through Traceability - A Realistic Application of Candidate Tr...
Findability through Traceability  - A Realistic Application of Candidate Tr...Findability through Traceability  - A Realistic Application of Candidate Tr...
Findability through Traceability - A Realistic Application of Candidate Tr...
 
Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015
Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015
Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015
 
Brian_Thomas_Resume_20160215
Brian_Thomas_Resume_20160215Brian_Thomas_Resume_20160215
Brian_Thomas_Resume_20160215
 
Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017
 
Iwesep19.ppt
Iwesep19.pptIwesep19.ppt
Iwesep19.ppt
 
TienResumeFinalV22016
TienResumeFinalV22016TienResumeFinalV22016
TienResumeFinalV22016
 
Download Python for R Users pdf for free
Download Python for R Users pdf for freeDownload Python for R Users pdf for free
Download Python for R Users pdf for free
 
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
 

Similar to A Template-Based Approach for Annotating Long-Tailed Datasets

Resume_Shalini_Ajmera_Latest
Resume_Shalini_Ajmera_LatestResume_Shalini_Ajmera_Latest
Resume_Shalini_Ajmera_LatestShalini Ajmera
 
Data Science & Big Data - Theory.pdf
Data Science & Big Data - Theory.pdfData Science & Big Data - Theory.pdf
Data Science & Big Data - Theory.pdfRAKESHG79
 
Nitin - Data Specialist
Nitin - Data SpecialistNitin - Data Specialist
Nitin - Data SpecialistNitin singhal
 
DataScience SG | Undergrad Series | 26th Sep 19
DataScience SG | Undergrad Series | 26th Sep 19DataScience SG | Undergrad Series | 26th Sep 19
DataScience SG | Undergrad Series | 26th Sep 19Yong Siang (Ivan) Tan
 
Scientific
Scientific Scientific
Scientific marpierc
 
Using Application Skeletons to Improve eScience Infrastructure
Using Application Skeletons to Improve eScience InfrastructureUsing Application Skeletons to Improve eScience Infrastructure
Using Application Skeletons to Improve eScience InfrastructureDaniel S. Katz
 
MODERN DATA PIPELINE
MODERN DATA PIPELINEMODERN DATA PIPELINE
MODERN DATA PIPELINEIRJET Journal
 
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...Denodo
 
Towards a Community-driven Data Science Body of Knowledge – Data Management S...
Towards a Community-driven Data Science Body of Knowledge – Data Management S...Towards a Community-driven Data Science Body of Knowledge – Data Management S...
Towards a Community-driven Data Science Body of Knowledge – Data Management S...Research Data Alliance
 
Big Data and Computer Science Education
Big Data and Computer Science EducationBig Data and Computer Science Education
Big Data and Computer Science EducationJames Hendler
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data scienceMahir Haque
 
IRJET- Natural Language Query Processing
IRJET- Natural Language Query ProcessingIRJET- Natural Language Query Processing
IRJET- Natural Language Query ProcessingIRJET Journal
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET Journal
 

Similar to A Template-Based Approach for Annotating Long-Tailed Datasets (20)

Resume_Shalini_Ajmera_Latest
Resume_Shalini_Ajmera_LatestResume_Shalini_Ajmera_Latest
Resume_Shalini_Ajmera_Latest
 
Data Science & Big Data - Theory.pdf
Data Science & Big Data - Theory.pdfData Science & Big Data - Theory.pdf
Data Science & Big Data - Theory.pdf
 
Nitin - Data Specialist
Nitin - Data SpecialistNitin - Data Specialist
Nitin - Data Specialist
 
2015 03-28-eb-final
2015 03-28-eb-final2015 03-28-eb-final
2015 03-28-eb-final
 
DataScience SG | Undergrad Series | 26th Sep 19
DataScience SG | Undergrad Series | 26th Sep 19DataScience SG | Undergrad Series | 26th Sep 19
DataScience SG | Undergrad Series | 26th Sep 19
 
Jeevananthan_Informatica
Jeevananthan_InformaticaJeevananthan_Informatica
Jeevananthan_Informatica
 
Aditya java nov_2015
Aditya java nov_2015Aditya java nov_2015
Aditya java nov_2015
 
Resume
ResumeResume
Resume
 
Sas - Introduction to designing the data mart
Sas - Introduction to designing the data martSas - Introduction to designing the data mart
Sas - Introduction to designing the data mart
 
Resume (1)
Resume (1)Resume (1)
Resume (1)
 
Scientific
Scientific Scientific
Scientific
 
Using Application Skeletons to Improve eScience Infrastructure
Using Application Skeletons to Improve eScience InfrastructureUsing Application Skeletons to Improve eScience Infrastructure
Using Application Skeletons to Improve eScience Infrastructure
 
MODERN DATA PIPELINE
MODERN DATA PIPELINEMODERN DATA PIPELINE
MODERN DATA PIPELINE
 
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
 
Towards a Community-driven Data Science Body of Knowledge – Data Management S...
Towards a Community-driven Data Science Body of Knowledge – Data Management S...Towards a Community-driven Data Science Body of Knowledge – Data Management S...
Towards a Community-driven Data Science Body of Knowledge – Data Management S...
 
Big Data and Computer Science Education
Big Data and Computer Science EducationBig Data and Computer Science Education
Big Data and Computer Science Education
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Vishnu_Gowthem_Resume
Vishnu_Gowthem_ResumeVishnu_Gowthem_Resume
Vishnu_Gowthem_Resume
 
IRJET- Natural Language Query Processing
IRJET- Natural Language Query ProcessingIRJET- Natural Language Query Processing
IRJET- Natural Language Query Processing
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
 

More from dgarijo

WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular DataWDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular Datadgarijo
 
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Sciencedgarijo
 
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...dgarijo
 
WIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting OntologiesWIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting Ontologiesdgarijo
 
Towards Automating Data Narratives
Towards Automating Data NarrativesTowards Automating Data Narratives
Towards Automating Data Narrativesdgarijo
 
Automated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific WorkflowsAutomated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific Workflowsdgarijo
 
OntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific SoftwareOntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific Softwaredgarijo
 
OEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology EngineeringOEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology Engineeringdgarijo
 
Software Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciencesSoftware Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciencesdgarijo
 
Reproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An OverviewReproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An Overviewdgarijo
 
PhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflowsPhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflowsdgarijo
 
Publicación de datos y métodos científicos en investigación
Publicación de datos y métodos científicos en investigaciónPublicación de datos y métodos científicos en investigación
Publicación de datos y métodos científicos en investigacióndgarijo
 
EDBT 2015: Summer School Overview
EDBT 2015: Summer School OverviewEDBT 2015: Summer School Overview
EDBT 2015: Summer School Overviewdgarijo
 
Similarity in Wikipedia Articles (EDBT Summer School)
Similarity in Wikipedia Articles (EDBT Summer School)Similarity in Wikipedia Articles (EDBT Summer School)
Similarity in Wikipedia Articles (EDBT Summer School)dgarijo
 
Semantic web 101: Benefits for geologists
Semantic web 101: Benefits for geologistsSemantic web 101: Benefits for geologists
Semantic web 101: Benefits for geologistsdgarijo
 
Is preserving data enough? Towards the preservation of scientific methods
Is preserving data enough? Towards the preservation of scientific methods Is preserving data enough? Towards the preservation of scientific methods
Is preserving data enough? Towards the preservation of scientific methods dgarijo
 
Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015dgarijo
 
Towards Workflow Ecosystems Through Semantic and Standard Representations
Towards Workflow Ecosystems Through Semantic and Standard RepresentationsTowards Workflow Ecosystems Through Semantic and Standard Representations
Towards Workflow Ecosystems Through Semantic and Standard Representationsdgarijo
 
Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users
Workflow Reuse in Practice: A Study of Neuroimaging Pipeline UsersWorkflow Reuse in Practice: A Study of Neuroimaging Pipeline Users
Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Usersdgarijo
 
Frag Flow: Automated Fragment Detection in Scientific Workflows
Frag Flow: Automated Fragment Detection in Scientific WorkflowsFrag Flow: Automated Fragment Detection in Scientific Workflows
Frag Flow: Automated Fragment Detection in Scientific Workflowsdgarijo
 

More from dgarijo (20)

WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular DataWDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
 
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
 
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
 
WIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting OntologiesWIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting Ontologies
 
Towards Automating Data Narratives
Towards Automating Data NarrativesTowards Automating Data Narratives
Towards Automating Data Narratives
 
Automated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific WorkflowsAutomated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific Workflows
 
OntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific SoftwareOntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific Software
 
OEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology EngineeringOEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology Engineering
 
Software Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciencesSoftware Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciences
 
Reproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An OverviewReproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An Overview
 
PhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflowsPhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflows
 
Publicación de datos y métodos científicos en investigación
Publicación de datos y métodos científicos en investigaciónPublicación de datos y métodos científicos en investigación
Publicación de datos y métodos científicos en investigación
 
EDBT 2015: Summer School Overview
EDBT 2015: Summer School OverviewEDBT 2015: Summer School Overview
EDBT 2015: Summer School Overview
 
Similarity in Wikipedia Articles (EDBT Summer School)
Similarity in Wikipedia Articles (EDBT Summer School)Similarity in Wikipedia Articles (EDBT Summer School)
Similarity in Wikipedia Articles (EDBT Summer School)
 
Semantic web 101: Benefits for geologists
Semantic web 101: Benefits for geologistsSemantic web 101: Benefits for geologists
Semantic web 101: Benefits for geologists
 
Is preserving data enough? Towards the preservation of scientific methods
Is preserving data enough? Towards the preservation of scientific methods Is preserving data enough? Towards the preservation of scientific methods
Is preserving data enough? Towards the preservation of scientific methods
 
Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015
 
Towards Workflow Ecosystems Through Semantic and Standard Representations
Towards Workflow Ecosystems Through Semantic and Standard RepresentationsTowards Workflow Ecosystems Through Semantic and Standard Representations
Towards Workflow Ecosystems Through Semantic and Standard Representations
 
Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users
Workflow Reuse in Practice: A Study of Neuroimaging Pipeline UsersWorkflow Reuse in Practice: A Study of Neuroimaging Pipeline Users
Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users
 
Frag Flow: Automated Fragment Detection in Scientific Workflows
Frag Flow: Automated Fragment Detection in Scientific WorkflowsFrag Flow: Automated Fragment Detection in Scientific Workflows
Frag Flow: Automated Fragment Detection in Scientific Workflows
 

Recently uploaded

Javier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptxJavier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptxJavier Fernández Muñoz
 
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...arifengg7
 
Comprehensive energy systems.pdf Comprehensive energy systems.pdf
Comprehensive energy systems.pdf Comprehensive energy systems.pdfComprehensive energy systems.pdf Comprehensive energy systems.pdf
Comprehensive energy systems.pdf Comprehensive energy systems.pdfalene1
 
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptx
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptxTriangulation survey (Basic Mine Surveying)_MI10412MI.pptx
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptxRomil Mishra
 
CS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfCS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfBalamuruganV28
 
March 2024 - Top 10 Read Articles in Artificial Intelligence and Applications...
March 2024 - Top 10 Read Articles in Artificial Intelligence and Applications...March 2024 - Top 10 Read Articles in Artificial Intelligence and Applications...
March 2024 - Top 10 Read Articles in Artificial Intelligence and Applications...gerogepatton
 
Introduction of Object Oriented Programming Language using Java. .pptx
Introduction of Object Oriented Programming Language using Java. .pptxIntroduction of Object Oriented Programming Language using Java. .pptx
Introduction of Object Oriented Programming Language using Java. .pptxPoonam60376
 
Turn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxTurn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxStephen Sitton
 
Livre Implementing_Six_Sigma_and_Lean_A_prac([Ron_Basu]_).pdf
Livre Implementing_Six_Sigma_and_Lean_A_prac([Ron_Basu]_).pdfLivre Implementing_Six_Sigma_and_Lean_A_prac([Ron_Basu]_).pdf
Livre Implementing_Six_Sigma_and_Lean_A_prac([Ron_Basu]_).pdfsaad175691
 
Submerged Combustion, Explosion Flame Combustion, Pulsating Combustion, and E...
Submerged Combustion, Explosion Flame Combustion, Pulsating Combustion, and E...Submerged Combustion, Explosion Flame Combustion, Pulsating Combustion, and E...
Submerged Combustion, Explosion Flame Combustion, Pulsating Combustion, and E...Ayisha586983
 
Module-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdfModule-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdfManish Kumar
 
ADM100 Running Book for sap basis domain study
ADM100 Running Book for sap basis domain studyADM100 Running Book for sap basis domain study
ADM100 Running Book for sap basis domain studydhruvamdhruvil123
 
Indian Tradition, Culture & Societies.pdf
Indian Tradition, Culture & Societies.pdfIndian Tradition, Culture & Societies.pdf
Indian Tradition, Culture & Societies.pdfalokitpathak01
 
SOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATIONSOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATIONSneha Padhiar
 
Theory of Machine Notes / Lecture Material .pdf
Theory of Machine Notes / Lecture Material .pdfTheory of Machine Notes / Lecture Material .pdf
Theory of Machine Notes / Lecture Material .pdfShreyas Pandit
 
The Satellite applications in telecommunication
The Satellite applications in telecommunicationThe Satellite applications in telecommunication
The Satellite applications in telecommunicationnovrain7111
 
10 AsymmetricKey Cryptography students.pptx
10 AsymmetricKey Cryptography students.pptx10 AsymmetricKey Cryptography students.pptx
10 AsymmetricKey Cryptography students.pptxAdityaGoogle
 
Curve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptxCurve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptxRomil Mishra
 
1- Practice occupational health and safety procedures.pptx
1- Practice occupational health and safety procedures.pptx1- Practice occupational health and safety procedures.pptx
1- Practice occupational health and safety procedures.pptxMel Paras
 

Recently uploaded (20)

Javier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptxJavier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptx
 
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...
 
Comprehensive energy systems.pdf Comprehensive energy systems.pdf
Comprehensive energy systems.pdf Comprehensive energy systems.pdfComprehensive energy systems.pdf Comprehensive energy systems.pdf
Comprehensive energy systems.pdf Comprehensive energy systems.pdf
 
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptx
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptxTriangulation survey (Basic Mine Surveying)_MI10412MI.pptx
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptx
 
CS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfCS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdf
 
March 2024 - Top 10 Read Articles in Artificial Intelligence and Applications...
March 2024 - Top 10 Read Articles in Artificial Intelligence and Applications...March 2024 - Top 10 Read Articles in Artificial Intelligence and Applications...
March 2024 - Top 10 Read Articles in Artificial Intelligence and Applications...
 
Introduction of Object Oriented Programming Language using Java. .pptx
Introduction of Object Oriented Programming Language using Java. .pptxIntroduction of Object Oriented Programming Language using Java. .pptx
Introduction of Object Oriented Programming Language using Java. .pptx
 
Turn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxTurn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptx
 
Livre Implementing_Six_Sigma_and_Lean_A_prac([Ron_Basu]_).pdf
Livre Implementing_Six_Sigma_and_Lean_A_prac([Ron_Basu]_).pdfLivre Implementing_Six_Sigma_and_Lean_A_prac([Ron_Basu]_).pdf
Livre Implementing_Six_Sigma_and_Lean_A_prac([Ron_Basu]_).pdf
 
Submerged Combustion, Explosion Flame Combustion, Pulsating Combustion, and E...
Submerged Combustion, Explosion Flame Combustion, Pulsating Combustion, and E...Submerged Combustion, Explosion Flame Combustion, Pulsating Combustion, and E...
Submerged Combustion, Explosion Flame Combustion, Pulsating Combustion, and E...
 
Module-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdfModule-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdf
 
Versatile Engineering Construction Firms
Versatile Engineering Construction FirmsVersatile Engineering Construction Firms
Versatile Engineering Construction Firms
 
ADM100 Running Book for sap basis domain study
ADM100 Running Book for sap basis domain studyADM100 Running Book for sap basis domain study
ADM100 Running Book for sap basis domain study
 
Indian Tradition, Culture & Societies.pdf
Indian Tradition, Culture & Societies.pdfIndian Tradition, Culture & Societies.pdf
Indian Tradition, Culture & Societies.pdf
 
SOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATIONSOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATION
 
Theory of Machine Notes / Lecture Material .pdf
Theory of Machine Notes / Lecture Material .pdfTheory of Machine Notes / Lecture Material .pdf
Theory of Machine Notes / Lecture Material .pdf
 
The Satellite applications in telecommunication
The Satellite applications in telecommunicationThe Satellite applications in telecommunication
The Satellite applications in telecommunication
 
10 AsymmetricKey Cryptography students.pptx
10 AsymmetricKey Cryptography students.pptx10 AsymmetricKey Cryptography students.pptx
10 AsymmetricKey Cryptography students.pptx
 
Curve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptxCurve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptx
 
1- Practice occupational health and safety procedures.pptx
1- Practice occupational health and safety procedures.pptx1- Practice occupational health and safety procedures.pptx
1- Practice occupational health and safety procedures.pptx
 

A Template-Based Approach for Annotating Long-Tailed Datasets

  • 1. Information Sciences Institute A Template-Based Approach for Annotating Long-Tailed Datasets Daniel Garijo, Ke-Thia Yao, Amandeep Singh and Pedro Szekely {dgarijo, kyao, amandeep, szeke}@isi.edu @dgarijov This work was funded by the Defense Advanced Research Projects Agency (DARPA)
  • 2. Information Sciences Institute Transforming tabular data into KGs... Expert
  • 3. Information Sciences Institute Transforming tabular data into KGs... How can we ease the process for non-experts? Expert
  • 4. Information Sciences Institute Challenges: Annotation Oil production Subject to annotate Variable (predicate) Object (values) Time Qualifiers
  • 5. Information Sciences Institute Challenges: Annotation Oil production Oil price Units!Time ● Multiple variables, missing values, etc.
  • 6. Information Sciences Institute Challenges: Summary How to create a way for non-experts to annotate their data… - Without having to learn a mapping language - Capturing qualifiers of described variables - Ignoring undesired columns/incomplete cells - Share the results as part of a public KG
  • 7. Information Sciences Institute Proposed workflow Users should be able to 1. Annotate their data 2. Preview their progress 3. Share their results (KG) ?
  • 8. Information Sciences Institute Annotation schema • We adopt the Wikidata data model (s,p,o,q,r) • Add 7 rows to define metadata https://t2wml-annotation.readthedocs.io
  • 9. Information Sciences Institute T2WML Extension Load data Link and review Preview and upload (or save) https://github.com/usc-isi-i2/t2wml
  • 10. Information Sciences Institute Sharing annotated datasets with Datamart https://github.com/usc-isi-i2/datamart-api Implementation (password protected): https://dsbox02.isi.edu/datamart-api/ REST API Datamart: - Metadata catalog (search variables, datasets, locations, etc.) - Data catalog (time series data)
  • 11. Information Sciences Institute (Very) preliminary results Evaluation with users: - 6 users (not familiar with Semantic Web technologies) - Knowledge in Data Science/Scripting - 1 hour training in T2WML/schema - 3 datasets (each dataset was assigned to two users) Results: - All users were able to describe and upload their data - Trouble understanding differences between variables and qualifiers
  • 12. Information Sciences Institute Conclusions and future work - Non experts should be empowered to populate existing KGs with their own data. - We propose a simple workflow to let users annotate, preview and share their data as a KG - Next steps: incorporate table understanding approaches in the annotation process - Less effort from users required