A Generic Scientific Data Model and Ontology for Representation of Chemical Data
Stuart J. Chalk, Department of Chemistry
University of North Florida
schalk@unf.edu
CINF Paper 171 – 251st ACS Meeting Spring 2016
#ACSCINFDataSummit
Scientific Data Should be Open
 Simple: Openness as the norm not the exception
 Data made available, without restriction, so it's useful
 Mechanisms/tools to make data available
 Formats to allow others to get the data…
 …but also so it's easy to use
 Annotate the data to make it easy to find
 Community driven promotion of and action on this issue
 Research Notebook
 Spectral Files (JCAMP-DX, proprietary)
 Excel Spreadsheets
 Personal Databases
 Online Databases
 PDF Files No!
 RDF Yes!
Resource Description Framework
Options for Storing Data?
 W3C Recommendation 2015
Specification - https://www.w3.org/TR/ldp/
Primer - https://www.w3.org/TR/ldp-primer/
The Linked Data Platform
From: http://www.dataversity.net/introduction-linked-data-platform/
 Use JavaScript Object Notation (JSON) as a text format for storing data and metadata so it can be converted to RDF
JSON for Linked Data (JSON-LD)
{
  "@context": {
    "name": "http://schema.org/name",
    "isAlive": "http://example.org/isAlive",
    "age": "http://example.org/age",
    "height": "http://schema.org/height",
    "@base": "http://www.unf.edu/chemistry/stuart_chalk.aspx"
  },
  "@id": "",
  "name": "Stuart Chalk",
  "isAlive": true,
  "age": 49,
  "height": 188.0
}
http://json-ld.org/playground/
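How a flat JSON-LD document like the one above maps onto RDF triples can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a JSON-LD processor: it assumes a flat document, a context that maps each key straight to an IRI, and an "@id" that resolves to "@base" unchanged. The helper names `literal` and `to_ntriples` are hypothetical.

```python
import json

XSD = "http://www.w3.org/2001/XMLSchema#"

doc = {
    "@context": {
        "name": "http://schema.org/name",
        "isAlive": "http://example.org/isAlive",
        "age": "http://example.org/age",
        "height": "http://schema.org/height",
        "@base": "http://www.unf.edu/chemistry/stuart_chalk.aspx",
    },
    "@id": "",
    "name": "Stuart Chalk",
    "isAlive": True,
    "age": 49,
    "height": 188.0,
}


def literal(value):
    """Serialize a Python value as an N-Triples literal (simplified typing rules)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return f'"{str(value).lower()}"^^<{XSD}boolean>'
    if isinstance(value, int) or (isinstance(value, float) and value.is_integer()):
        # Numbers without a fractional part are written as xsd:integer
        return f'"{int(value)}"^^<{XSD}integer>'
    if isinstance(value, float):
        # Simplification: real processors emit a canonical xsd:double form
        return f'"{value!r}"^^<{XSD}double>'
    return json.dumps(str(value), ensure_ascii=False)  # plain string literal


def to_ntriples(document):
    """Turn a flat JSON-LD-style dict into a sorted list of N-Triples lines."""
    context = document["@context"]
    # Assumption: naive IRI resolution, base + relative "@id"
    subject = context.get("@base", "") + document.get("@id", "")
    lines = []
    for key, value in document.items():
        if key.startswith("@"):
            continue  # keywords are not properties
        predicate = context[key]
        lines.append(f"<{subject}> <{predicate}> {literal(value)} .")
    return sorted(lines)


for line in to_ntriples(doc):
    print(line)
```

A full processor, such as the JSON-LD Playground linked above, also handles nested objects, compact IRIs, and language tags, which this sketch leaves out.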
JSON for Linked Data (JSON-LD)
<http://www.unf.edu/chemistry/stuart_chalk.aspx> <http://example.org/age> "49"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://www.unf.edu/chemistry/stuart_chalk.aspx> <http://example.org/isAlive> "true"^^<http://www.w3.org/2001/XMLSchema#boolean> .
<http://www.unf.edu/chemistry/stuart_chalk.aspx> <http://schema.org/height> "188"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://www.unf.edu/chemistry/stuart_chalk.aspx> <http://schema.org/name> "Stuart Chalk" .
 Nice idea, but because anything can be linked to anything else, the resulting graph has a variable structure…
 …which makes it difficult to search and hard to maintain
 OK, use a regular relational database? It has a rigid schema, and it is not good to try to make data fit the schema…
 Use a hybrid approach!
 Encode some structure in RDF using a framework…
 …then add data to the structured graph in an organized way
Store all Scientific Data in RDF?
 Consider FAIR Principles (http://www.datafairport.org)
 To be Findable:
 F1. (meta)data are assigned a globally unique and persistent identifier
 F2. data are described with rich metadata (defined by R1 below)
 F3. metadata clearly and explicitly include the identifier of the data it describes
 F4. (meta)data are registered or indexed in a searchable resource
 To be Accessible:
 A1. (meta)data are retrievable by their identifier using a standardized communications protocol
 A2. metadata are accessible, even when the data are no longer available
 To be Interoperable:
 I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
 I2. (meta)data use vocabularies that follow FAIR principles
 I3. (meta)data include qualified references to other (meta)data
 To be Reusable:
 R1. (meta)data are richly described with a plurality of accurate and relevant attributes
 R1.1. (meta)data are released with a clear and accessible data usage license
 R1.2. (meta)data are associated with detailed provenance
 R1.3. (meta)data meet domain-relevant community standards
What Metadata is Important for Data?
 Define scope as data obtained from an experiment, a series of experiments, or a project
 Who did the work and where are they?
 Metadata about the data “packet”
 The raw data…
 …its associated metadata (enough to properly contextualize the data)
 Access rights
 Published location
What Should a Data Model Represent?
General Framework
 SciData – Scientific Data Model (SDM)
 Overview – http://stuchalk.github.io/scidata/
 GitHub Repo – https://github.com/stuchalk/scidata
General Framework - The Context
 “@context” contains the context definition
 Refers to other context files
 Namespace abbreviations
 Default vocabulary “@vocab”
 “@id” links ontology term
 “@type” states data type
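The roles of the context entries listed above can be illustrated with a short Python sketch. It is a simplified illustration, not the JSON-LD expansion algorithm: the context below, including the `ex` namespace, the `ph` term, and the default vocabulary IRI, is hypothetical and not taken from the SciData context files, and the helpers `expand_iri` and `expand_term` are invented for this example.

```python
# Hypothetical context: every IRI and term here is an assumption for illustration.
context = {
    "@vocab": "http://example.org/vocab#",          # default vocabulary
    "xsd": "http://www.w3.org/2001/XMLSchema#",      # namespace abbreviation
    "ex": "http://example.org/ontology#",            # namespace abbreviation
    "ph": {"@id": "ex:pH", "@type": "xsd:float"},    # term -> ontology IRI + data type
}


def expand_iri(value, ctx):
    """Expand a compact IRI such as 'xsd:float' using prefixes defined in ctx."""
    if "://" in value:
        return value  # already an absolute IRI
    prefix, sep, suffix = value.partition(":")
    if sep and isinstance(ctx.get(prefix), str):
        return ctx[prefix] + suffix
    return value


def expand_term(term, ctx):
    """Return (property IRI, data type IRI or None) for a key used in a document."""
    definition = ctx.get(term)
    if definition is None:
        if ":" in term:
            return expand_iri(term, ctx), None
        # Undefined plain terms fall back to the default vocabulary
        return ctx.get("@vocab", "") + term, None
    if isinstance(definition, str):
        return expand_iri(definition, ctx), None
    datatype = definition.get("@type")
    return (
        expand_iri(definition["@id"], ctx),
        expand_iri(datatype, ctx) if datatype else None,
    )


print(expand_term("ph", context))
print(expand_term("temperature", context))
```

Here "ph" resolves through its "@id" and "@type" entries, while "temperature", which has no definition, picks up the "@vocab" default.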
Methodology, System, and Dataset
Example Data - pH
Example Data - Literature Value
 “scope” provides internal link to “@id” value
 Each value of a name-value pair has a default data type that can be overridden by expanding the value to a JSON object and adding “@value” and “@type”
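The default-versus-overridden data type described above can be sketched as follows. This is an assumption-laden illustration: the function name `normalize_value`, the choice of default type, and the sample values are hypothetical and not taken from the SciData examples.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def normalize_value(value, default_type):
    """Return an expanded {'@value', '@type'} object for a name-value pair.

    A bare value takes the default data type; a value already expanded to a
    JSON object keeps its own '@type' if it declares one.
    """
    if isinstance(value, dict) and "@value" in value:
        return {
            "@value": value["@value"],
            "@type": value.get("@type", default_type),
        }
    return {"@value": value, "@type": default_type}


# Bare value: the default type applies (placeholder number, not real data)
plain = normalize_value("7.4", XSD + "string")

# Expanded value: the explicit "@type" overrides the default
typed = normalize_value({"@value": "7.4", "@type": XSD + "float"}, XSD + "string")

print(plain)
print(typed)
```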
Example Data - NMR Spectrum
 “dataseries” are JSON arrays of data on one axis
 Bring them together with “datagroup” and we can represent a spectrum
 “parameter” is a generic container for data or metadata
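A rough Python sketch of the idea: two “dataseries”, one per axis, are grouped in a “datagroup” and zipped into (x, y) points. Only the names “datagroup” and “dataseries” come from the slide; the other keys ("title", "axis", "unit", "values"), the helper `points`, and the numbers are placeholders assumed for illustration, not real spectral data or the SciData layout.

```python
# Placeholder spectrum: keys other than "datagroup"/"dataseries" are assumptions.
spectrum = {
    "datagroup": {
        "title": "1H NMR spectrum",
        "dataseries": [
            {"axis": "x", "label": "chemical shift", "unit": "ppm",
             "values": [7.26, 3.71, 1.25]},
            {"axis": "y", "label": "intensity", "unit": "arbitrary",
             "values": [0.12, 1.00, 0.64]},
        ],
    }
}


def points(datagroup):
    """Combine the x and y dataseries of a datagroup into a list of (x, y) pairs."""
    by_axis = {series["axis"]: series["values"] for series in datagroup["dataseries"]}
    x_values, y_values = by_axis["x"], by_axis["y"]
    if len(x_values) != len(y_values):
        raise ValueError("x and y dataseries must have the same length")
    return list(zip(x_values, y_values))


for x, y in points(spectrum["datagroup"]):
    print(x, y)
```

Keeping each axis as its own array means a series can be typed and annotated once (label, unit) rather than per point.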
Example Data – CC Calculation
 “datagroup”s are structures to aggregate data at any level
 “datagroup”s can be infinitely nested
 “uid” is optional and can be used to uniquely define any piece of data
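Because datagroups can nest to any depth, locating a piece of data by its “uid” is naturally a recursive walk. The sketch below assumes a hypothetical nested structure; the "uid" strings, the "title" and "value" keys, and the helper `find_by_uid` are invented for illustration, and the value is a placeholder.

```python
# Hypothetical nested datagroups; identifiers and values are placeholders.
calculation = {
    "datagroup": [
        {
            "uid": "group:1",
            "title": "geometry optimization",
            "datagroup": [
                {"uid": "group:1.1", "title": "step 1", "value": -1.0},
            ],
        },
        {"uid": "group:2", "title": "frequencies"},
    ]
}


def find_by_uid(node, uid):
    """Depth-first search through nested dicts/lists for the object with this uid."""
    if isinstance(node, dict):
        if node.get("uid") == uid:
            return node
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None  # scalars cannot contain further data
    for child in children:
        found = find_by_uid(child, uid)
        if found is not None:
            return found
    return None


print(find_by_uid(calculation, "group:1.1"))
print(find_by_uid(calculation, "missing"))
```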
The SDM Ontology
 SciData Ontology – Scientific Data Model Ontology (SDMO)
 OWL File – https://github.com/stuchalk/scidata/blob/master/ontology/scidata.owl
 Get community feedback, refine/extend/standardize
 Generate large corpus of disparate data in JSON-LD, ingest into triple store and query (SPARQL)
 Evaluate inferencing on the triple store data
 Push adoption through collaboration
 Run hackathons to build developer implementations
 Develop Electronic Laboratory Notebook (ELN) to generate data in JSON-LD
 Get feedback from data community, RDA - https://rd-alliance.org/
 Test using the NDS - http://www.nationaldataservice.org/
Future Work
 Pain Points
 Challenges
 Opportunities
 Normalization
 Tools to generate metadata automatically
 User Perspective
 Gaps in Data
 Gaps in Ontology Coverage
Pain Points?
 Gather stakeholders to work on standards
 Broad knowledge domain representation
 IUPAC, RDA Chemistry Research Data IG
 Priorities?
 Data annotation and representation
 Data exchange (repo <-> repo, user <-> user)
 Structure representation (chiral centers)
 Curation infrastructures
 Domain vocabulary translations
 Units of measure
Reality Check
“to err is human; to forgive, divine”
Alexander Pope
“to err is human; to really screw things up requires a computer”
Paul Ehrlich
“to err is human; all hell will break loose if you don’t provide accurate semantics to a computer”
Stuart Chalk
 schalk@unf.edu
 Phone: 904-620-1938
 Skype: stuartchalk
 LinkedIn/SlideShare: https://www.linkedin.com/in/stuchalk
 ORCID: http://orcid.org/0000-0002-0703-7776
 ResearcherID: http://www.researcherid.com/rid/D-8577-2013
Questions?

 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx
 
PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and Vertical
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
 

A Generic Scientific Data Model and Ontology for Representation of Chemical Data

  • 1. A Generic Scientific Data Model and Ontology for Representation of Chemical Data Stuart J. Chalk, Department of Chemistry University of North Florida schalk@unf.edu CINF Paper 171 – 251st ACS Meeting Spring 2016 #ACSCINFDataSummit
  • 2. Scientific Data Should be Open
      - Simple: openness as the norm, not the exception
      - Data made available, without restriction, so it's useful
      - Mechanisms/tools to make data available
      - Formats to allow others to get the data…
      - …but also so it's easy to use
      - Annotate the data to make it easy to find
      - Community-driven promotion of and action on this issue
  • 3. Options for Storing Data?
      - Research Notebook
      - Spectral Files (JCAMP-DX, proprietary)
      - Excel Spreadsheets
      - Personal Databases
      - Online Databases
      - PDF Files: No!
      - RDF (Resource Description Framework): Yes!
  • 4. The Linked Data Platform
      - W3C Recommendation 2015
      - Specification - https://www.w3.org/TR/ldp/
      - Primer - https://www.w3.org/TR/ldp-primer/
      - From: http://www.dataversity.net/introduction-linked-data-platform/
  • 5. JSON for Linked Data (JSON-LD)
      - Use JavaScript Object Notation (JSON) as a text format for storing data and metadata so it can be converted to RDF
      - http://json-ld.org/playground/

        {
          "@context": {
            "name": "http://schema.org/name",
            "isAlive": "http://example.org/isAlive",
            "age": "http://example.org/age",
            "height": "http://schema.org/height",
            "@base": "http://www.unf.edu/chemistry/stuart_chalk.aspx"
          },
          "@id": "",
          "name": "Stuart Chalk",
          "isAlive": true,
          "age": 49,
          "height": 188.0
        }
  • 6. JSON for Linked Data (JSON-LD)

        <http://www.unf.edu/chemistry/stuart_chalk.aspx> <http://example.org/age> "49"^^<http://www.w3.org/2001/XMLSchema#integer> .
        <http://www.unf.edu/chemistry/stuart_chalk.aspx> <http://example.org/isAlive> "true"^^<http://www.w3.org/2001/XMLSchema#boolean> .
        <http://www.unf.edu/chemistry/stuart_chalk.aspx> <http://schema.org/height> "188"^^<http://www.w3.org/2001/XMLSchema#integer> .
        <http://www.unf.edu/chemistry/stuart_chalk.aspx> <http://schema.org/name> "Stuart Chalk" .
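The conversion from the JSON-LD document on slide 5 to the triples on slide 6 can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a full JSON-LD processor: it assumes a flat document whose context maps every key directly to an IRI, and the function names `literal` and `to_triples` are made up for this sketch.

```python
import json
from urllib.parse import urljoin

XSD = "http://www.w3.org/2001/XMLSchema#"

# The JSON-LD document from slide 5.
DOC = json.loads("""
{
  "@context": {
    "name": "http://schema.org/name",
    "isAlive": "http://example.org/isAlive",
    "age": "http://example.org/age",
    "height": "http://schema.org/height",
    "@base": "http://www.unf.edu/chemistry/stuart_chalk.aspx"
  },
  "@id": "",
  "name": "Stuart Chalk",
  "isAlive": true,
  "age": 49,
  "height": 188.0
}
""")


def literal(value):
    """Render a JSON scalar as an RDF literal (simplified)."""
    if isinstance(value, bool):  # bool is a subclass of int, so test it first
        return '"' + str(value).lower() + '"^^<' + XSD + 'boolean>'
    if isinstance(value, float) and value.is_integer():
        value = int(value)  # a number with no fractional part becomes an integer
    if isinstance(value, int):
        return '"' + str(value) + '"^^<' + XSD + 'integer>'
    if isinstance(value, float):
        # Canonical xsd:double formatting is out of scope for this sketch.
        raise NotImplementedError("non-integer numbers are not handled here")
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def to_triples(doc):
    """Turn a flat JSON-LD document into sorted triple lines."""
    context = doc["@context"]
    # An empty "@id" resolves to the base IRI itself.
    subject = urljoin(context.get("@base", ""), doc["@id"])
    lines = []
    for key, value in doc.items():
        if key.startswith("@"):
            continue  # skip JSON-LD keywords such as @context and @id
        lines.append("<" + subject + "> <" + context[key] + "> " + literal(value) + " .")
    return sorted(lines)


triples = to_triples(DOC)
for line in triples:
    print(line)
```

A real processor, such as the one behind the playground linked on slide 5, also handles nested objects, arrays, language tags, and type coercion declared in the context.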
  • 7. Store all Scientific Data in RDF?
      - Nice idea, but because anything can be linked to anything else to form a graph of variable structure…
      - …it is difficult to search and hard to maintain
      - OK, use a regular relational database? Rigid schema: not good to try to make data fit the schema…
      - Use a hybrid approach!
      - Encode some structure in RDF using a framework…
      - …add data to the structured graph in an organized way
  • 8. What Metadata is Important for Data?
      - Consider the FAIR Principles (http://www.datafairport.org)
      - To be Findable:
          - F1. (meta)data are assigned a globally unique and persistent identifier
          - F2. data are described with rich metadata (defined by R1 below)
          - F3. metadata clearly and explicitly include the identifier of the data it describes
          - F4. (meta)data are registered or indexed in a searchable resource
      - To be Accessible:
          - A1. (meta)data are retrievable by their identifier using a standardized communications protocol
          - A2. metadata are accessible, even when the data are no longer available
      - To be Interoperable:
          - I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
          - I2. (meta)data use vocabularies that follow FAIR principles
          - I3. (meta)data include qualified references to other (meta)data
      - To be Reusable:
          - R1. (meta)data are richly described with a plurality of accurate and relevant attributes
          - R1.1. (meta)data are released with a clear and accessible data usage license
          - R1.2. (meta)data are associated with detailed provenance
          - R1.3. (meta)data meet domain-relevant community standards
  • 9. What Should a Data Model Represent?
      - Define scope as data obtained from an experiment, a series of experiments, a project
      - Who did the work and where are they?
      - Metadata about the data “packet”
      - The raw data…
      - …its associated metadata (enough to properly contextualize the data)
      - Access rights
      - Published location
  • 10. General Framework
      - SciData – Scientific Data Model (SDM)
      - Overview – http://stuchalk.github.io/scidata/
      - GitHub Repo – https://github.com/stuchalk/scidata
  • 11. General Framework - The Context
      - “@context” contains the context definition
      - Refers to other context files
      - Namespace abbreviations
      - Default vocabulary “@vocab”
      - “@id” links ontology term
      - “@type” states data type
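How namespace abbreviations, “@vocab”, “@id” and “@type” work together can be sketched as a small term-expansion function. This is a simplified reading of the JSON-LD expansion rules, not the SciData context itself: the `example.org` namespaces, the `sci` prefix, and the `temperature` term are placeholders invented for illustration.

```python
# Hypothetical context: the example.org IRIs, the "sci" prefix and the
# "temperature" term are illustrative, not taken from the SciData files.
CONTEXT = {
    "@vocab": "http://example.org/vocab#",           # default vocabulary
    "sci": "http://example.org/scidata#",            # namespace abbreviation
    "xsd": "http://www.w3.org/2001/XMLSchema#",      # namespace abbreviation
    "temperature": {"@id": "sci:temperature",        # links the ontology term
                    "@type": "xsd:float"},           # states the data type
}


def expand_term(term, context):
    """Expand a term or compact IRI to a full IRI (simplified rules)."""
    if term in context:
        definition = context[term]
        iri = definition["@id"] if isinstance(definition, dict) else definition
        return expand_term(iri, context) if iri != term else iri
    if ":" in term:
        prefix, suffix = term.split(":", 1)
        if prefix in context and not suffix.startswith("//"):
            return expand_term(prefix, context) + suffix
        return term  # already an absolute IRI such as http://...
    # Unknown plain terms fall back to the default vocabulary.
    return context.get("@vocab", "") + term


term_iri = expand_term("temperature", CONTEXT)
type_iri = expand_term(CONTEXT["temperature"]["@type"], CONTEXT)
fallback_iri = expand_term("pressure", CONTEXT)
print(term_iri)
print(type_iri)
print(fallback_iri)
```

The sketch ignores contexts that refer to other context files; a real processor would fetch and merge those before expanding any term.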
  • 14. Example Data - Literature Value
      - “scope” provides internal link to “@id” value
      - Each value of a name/value pair has a default data type that can be overridden by expanding the value to a JSON object and adding “@value” and “@type”
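The override described on this slide can be illustrated with a short Python sketch. The default types follow the standard JSON-LD mapping for native JSON values; the helper name `typed_value` and the sample number are assumptions made for this example, and type coercion declared in a context is deliberately left out.

```python
# Default datatypes JSON-LD assigns to native JSON values.
DEFAULT_TYPES = {
    bool: "xsd:boolean",
    int: "xsd:integer",
    float: "xsd:double",
    str: "xsd:string",
}


def typed_value(value):
    """Return (raw value, datatype), honouring an explicit @value/@type object."""
    if isinstance(value, dict) and "@value" in value:
        raw = value["@value"]
        # An explicit "@type" wins; otherwise fall back to the default.
        return raw, value.get("@type", DEFAULT_TYPES[type(raw)])
    return value, DEFAULT_TYPES[type(value)]


# Plain form: the string keeps its default type.
plain = typed_value("298.15")

# Expanded form: the same value, now explicitly typed as a decimal.
overridden = typed_value({"@value": "298.15", "@type": "xsd:decimal"})

print(plain)
print(overridden)
```

Keeping the number as a string in the expanded form avoids any loss of precision or trailing zeros when the JSON is parsed.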
  • 15. Example Data - NMR Spectrum
      - “dataseries” are JSON arrays of data on one axis
      - Bring them together with “datagroup” and we can represent a spectrum
      - “parameter” is a generic container for data or metadata
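As a rough sketch of the idea, two “dataseries” held in one “datagroup” can be paired into the points of a spectrum. Only the names “datagroup” and “dataseries” come from the slide; the inner keys (`axis`, `label`, `values`) and the numbers are hypothetical and do not reproduce the actual SciData layout.

```python
# Hypothetical datagroup: key names inside each series and all numbers
# are invented for illustration.
nmr_group = {
    "datagroup": "1H NMR spectrum",
    "dataseries": [
        {"axis": "x", "label": "chemical shift (ppm)", "values": [7.26, 2.17, 0.00]},
        {"axis": "y", "label": "relative intensity", "values": [0.12, 1.00, 0.05]},
    ],
}


def spectrum_points(group):
    """Pair the x and y dataseries of a datagroup into (x, y) points."""
    series = {s["axis"]: s["values"] for s in group["dataseries"]}
    if len(series["x"]) != len(series["y"]):
        raise ValueError("x and y dataseries must have the same length")
    return list(zip(series["x"], series["y"]))


points = spectrum_points(nmr_group)
print(points)
```

Storing each axis as its own array keeps the series independently typed and annotated, while the enclosing group records that they belong together.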
  • 16. Example Data - CC Calculation
      - “datagroup”s are structures to aggregate data at any level
      - “datagroup”s can be infinitely nested
      - “uid” is optional and can be used to uniquely define any piece of data
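Nested “datagroup”s lend themselves to recursive traversal. The sketch below walks a hypothetical calculation result and indexes every piece of data that carries a “uid”. The nesting layout, the uid strings, and the values are assumptions made for illustration, not the SciData schema.

```python
# Hypothetical nested structure for a calculation; layout, uids and
# numbers are illustrative only.
calculation = {
    "uid": "group:calc",
    "datagroup": [
        {
            "uid": "group:geometry",
            "dataseries": [{"uid": "series:x", "values": [0.0, 0.96]}],
            "datagroup": [
                {
                    # This innermost group has no uid: the field is optional.
                    "dataseries": [{"uid": "series:energy", "values": [-76.4]}],
                }
            ],
        }
    ],
}


def index_by_uid(node, index=None):
    """Recursively collect every group or series that has a uid."""
    if index is None:
        index = {}
    if "uid" in node:
        index[node["uid"]] = node
    for series in node.get("dataseries", []):
        if "uid" in series:
            index[series["uid"]] = series
    for child in node.get("datagroup", []):
        index_by_uid(child, index)  # groups may nest to any depth
    return index


uid_index = index_by_uid(calculation)
print(sorted(uid_index))
print(uid_index["series:energy"]["values"])
```

With such an index, any piece of data can be referenced directly by its uid regardless of how deeply it is nested.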
  • 17. The SDM Ontology
      - SciData Ontology – Scientific Data Model Ontology (SDMO)
      - OWL File – https://github.com/stuchalk/scidata/blob/master/ontology/scidata.owl
  • 18. Future Work
      - Get community feedback, refine/extend/standardize
      - Generate large corpus of disparate data in JSON-LD, ingest into triple store and query (SPARQL)
      - Evaluate inferencing on the triple store data
      - Push adoption through collaboration
      - Run hackathons to build developer implementations
      - Develop Electronic Laboratory Notebook (ELN) to generate data in JSON-LD
      - Get feedback from data community, RDA - https://rd-alliance.org/
      - Test using the NDS - http://www.nationaldataservice.org/
  • 19. Pain Points?
      - Pain Points
      - Challenges
      - Opportunities
      - Normalization
      - Tools to generate metadata automatically
      - User Perspective
      - Gaps in Data
      - Gaps in Ontology Coverage
      - Gather stakeholders to work on standards
      - Broad knowledge domain representation
      - IUPAC, RDA Chemistry Research Data IG
      - Priorities?
      - Data annotation and representation
      - Data exchange (repo <-> repo, user <-> user)
      - Structure representation (chiral centers)
      - Curation infrastructures
      - Domain vocabulary translations
      - Units of measure
  • 20. Reality Check
      - “to err is human; to forgive, divine” (Alexander Pope)
      - “to err is human; to really screw things up requires a computer” (Paul Ehrlich)
      - “to err is human; all hell will break loose if you don’t provide accurate semantics to a computer” (Stuart Chalk)
  • 21. Questions?
      - schalk@unf.edu
      - Phone: 904-620-1938
      - Skype: stuartchalk
      - LinkedIn/SlideShare: https://www.linkedin.com/in/stuchalk
      - ORCID: http://orcid.org/0000-0002-0703-7776
      - ResearcherID: http://www.researcherid.com/rid/D-8577-2013