Daniel Jacob – INRA - 2018
How to ensure that open data
works for research
Make your data great again
Daniel Jacob
INRA UMR 1332 BFP – Metabolism Group
Bordeaux Metabolomics Facility
Oct 2018
Following on from: https://fr.slideshare.net/danieljacob771282/make-your-data-great-now
Give open access to your data
and make it ready to be mined
Open Data for Access and Mining
ODAM Framework
Develop lightweight tools if needed
- R scripts, lightweight GUI (R Shiny), …
Minimal effort, maximal efficiency
Use existing tools
- Spreadsheets, RStudio, BioStatFlow, Galaxy, Cytoscape, …
Data format: TSV
EDTMS / ODAM
FAIR: Findable, Accessible, INTEROPERABLE, Reusable
Experiment Data Tables + 2 metadata files
Research question → Project → Experiment → Experimental set-up
→ Data emancipation regarding tools
Data → API → Tools
DataTools
Following on from: https://fr.slideshare.net/danieljacob771282/make-your-data-great-now
Develop lightweight tools if needed
- R scripts, lightweight GUI (R Shiny), …
Use existing tools
- Spreadsheets, RStudio, BioStatFlow, Galaxy, Cytoscape, …
Data format: TSV
Multi-species data integration
Data integration
Towards Linked Data
Phenotype Information System
FAIR: Findable, Accessible, INTEROPERABLE, Reusable
"Plant Physiology and Metabolism" (https://www.quora.com/What-is-plant-physiology-and-metabolism)
"Plant Growth"
Resource Description Framework (RDF)
http://cgi.di.uoa.gr/~pms509/past_lectures/introduction-to-rdf.pdf
Resource Description Framework (RDF)
Creation of the metadata files - Subsets
s_subsets.tsv: this metadata file associates a key concept with each data subset file (Plants, Harvests, Samples, Compounds, …)
- Optional: an ontology-based annotation (CV Term)
a_attributes.tsv: this metadata file annotates each attribute (variable) with minimal but relevant metadata
- Optional: an ontology-based annotation (CV Term)
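The role of the two metadata files can be illustrated with a minimal sketch. Only the file names s_subsets.tsv and a_attributes.tsv, the subset names and the attribute categories come from the slides; the column names (subset, identifier, file, attribute, category, CV_term_id) and the example rows are assumptions made for illustration, and the optional CV Term column is left empty.

```python
import csv
import io

# Hypothetical content of s_subsets.tsv (column names are assumptions).
S_SUBSETS_TSV = (
    "subset\tidentifier\tfile\tCV_term_id\n"
    "plants\tPlanteID\tplants.tsv\t\n"
    "samples\tSampleID\tsamples.tsv\t\n"
)

# Hypothetical content of a_attributes.tsv (column names are assumptions).
A_ATTRIBUTES_TSV = (
    "subset\tattribute\tcategory\tCV_term_id\n"
    "plants\tPlanteID\tidentifier\t\n"
    "plants\tTreatment\tfactor\t\n"
    "samples\tSampleID\tidentifier\t\n"
    "samples\tFruitWeight\tquantitative\t\n"
)


def read_tsv(text):
    """Parse tab-separated text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def attributes_by_subset(subsets, attributes):
    """Group the annotated attributes under the subset they belong to."""
    grouped = {row["subset"]: [] for row in subsets}
    for row in attributes:
        grouped[row["subset"]].append((row["attribute"], row["category"]))
    return grouped


subsets = read_tsv(S_SUBSETS_TSV)
attributes = read_tsv(A_ATTRIBUTES_TSV)
grouped = attributes_by_subset(subsets, attributes)
for subset, attrs in grouped.items():
    print(subset, attrs)
```

Keeping the annotations in plain TSV files next to the data means any tool that reads tabular text can recover which variable belongs to which subset.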
Resource Description Framework (RDF)
Data / Metadata
- Entities (s_subsets.tsv): subsets CV Term
- Attributes (a_attributes.tsv): categories, attributes CV Term
CV Term?
Resource Description Framework (RDF)
Data / Metadata
- Entities (s_subsets.tsv): subsets CV Term
- Attributes (a_attributes.tsv): categories, attributes CV Term
Entity + Attribute = Trait (characteristic / feature)
CV Term
Resource Description Framework (RDF)
Ontologies (http://agroportal.lirmm.fr/ontologies):
- TO: Plant Trait Ontology
- EO: Plant Environment Ontology
- PO: Plant Structure & Development Stage Ontology
- CHEBI Ontology
- GO Ontology
- …
Entity + Attribute = Trait (characteristic / feature)
Plant Trait Ontology as the core / kernel of all ontologies
"Plant Physiology and Metabolism"
"Plant Growth"
Resource Description Framework (RDF)
TBox: reference ontologies
A TBox is a "terminological component": a conceptualization associated with a set of facts.
Entities (data subsets and their identifiers):
- Plants: plants.tsv (PlanteID)
- Harvests: harvests.tsv (Lot)
- Samples: samples.tsv (SampleID)
- Compounds: compounds.tsv (SampleID)
- Enzymes: enzymes.tsv (SampleID)
Attribute categories: identifier, factor, quantitative, qualitative
CV Terms for entities and attributes (http://agroportal.lirmm.fr/ontologies):
- TO: Plant Trait Ontology
- EO: Plant Environment Ontology
- PO: Plant Structure & Development Stage Ontology
- GO Ontology
- CHEBI Ontology
- …
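Because the subsets share identifiers, they can be combined with a plain key-based join. The sketch below is illustrative only: the file names and the SampleID key come from the slide, while the columns Lot and Glucose and every value are made up, and the assumption that samples carry the Lot of their harvest is not confirmed by the source.

```python
import csv
import io

# Hypothetical extracts of samples.tsv and compounds.tsv (values are made up).
SAMPLES_TSV = "SampleID\tLot\n1\tL1\n2\tL1\n3\tL2\n"
COMPOUNDS_TSV = "SampleID\tGlucose\n1\t1.5\n2\t2.0\n"


def read_tsv(text):
    """Parse tab-separated text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def join_on(left, right, key):
    """Inner join of two lists of dict rows on a shared identifier column."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


merged = join_on(read_tsv(SAMPLES_TSV), read_tsv(COMPOUNDS_TSV), "SampleID")
for row in merged:
    print(row)
```

Sample 3 has no compound measurement in this toy data, so the inner join drops it; a left join would be the alternative if unmeasured samples must be kept.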
Resource Description Framework (RDF)
ABox: application ontologies
An ABox is an "assertion component": a fact associated with a conceptual model or ontologies within a knowledge base.
Data / Metadata: Entities, Attributes, Category, CV Term
Diagram nodes: Attribute, Subset, CV Term, Category, Species
Entity + Attribute = Trait
Typical queries:
- Search for a particular Trait
Resource Description Framework (RDF)
[Diagram: RDF Schema built for each dataset, linking the ABox (application ontologies) to the TBox (reference ontologies)]
- Nodes: attribute node (rdf:type Attribute) and subset node (rdf:type Entity), grouped under Attributes and Subsets
- Properties: #hasEntity, #hasAttribute, #hasCategory, #hasSpecies, #description
- RDF vocabulary shown: rdf:type, rdf:Bag, rdf:resource, rdfs:label <description>, rdfs:range, xsd:string
- Categories: identifier, factor, quantitative, qualitative
- CV Term targets: TO, EO, PO, CHEBI, GO, …, Taxon
- Dataset description: https://schema.org/Dataset (measurementTechnique, variableMeasured)
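As a rough illustration of how metadata rows could be turned into RDF statements, the sketch below emits N-Triples-style lines using the property names from the diagram (hasEntity, hasAttribute, hasCategory, hasSpecies). The namespace http://example.org/odam#, the direction of each property, the use of plain literals for category and species, and the sample rows are all assumptions, not the ODAM implementation.

```python
# Hypothetical namespace; the real one is not given in the slides.
BASE = "http://example.org/odam#"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

# Hypothetical metadata rows (values are made up).
subsets = [{"subset": "samples", "species": "Tomato"}]
attributes = [
    {"subset": "samples", "attribute": "FruitWeight", "category": "quantitative"},
]


def uri(name):
    """Wrap a local name as a full URI reference in the assumed namespace."""
    return f"<{BASE}{name}>"


def literal(text):
    """Wrap simple text (no quotes or escapes) as a plain string literal."""
    return f'"{text}"'


def build_triples(subsets, attributes):
    """Return (subject, predicate, object) tuples for subsets and attributes."""
    triples = []
    for s in subsets:
        node = uri(s["subset"])
        triples.append((node, RDF_TYPE, uri("Entity")))
        triples.append((node, uri("hasSpecies"), literal(s["species"])))
    for a in attributes:
        node = uri(a["attribute"])
        triples.append((node, RDF_TYPE, uri("Attribute")))
        triples.append((node, uri("hasEntity"), uri(a["subset"])))
        triples.append((uri(a["subset"]), uri("hasAttribute"), node))
        triples.append((node, uri("hasCategory"), literal(a["category"])))
    return triples


triples = build_triples(subsets, attributes)
for s, p, o in triples:
    print(f"{s} {p} {o} .")
```

In a real knowledge base the category, species and CV Term objects would more likely be resources pointing into the reference ontologies rather than literals.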
Towards a Phenotype Information System
Automatic population of the knowledge base from the metadata files defined within ODAM data subsets
Data / Metadata: Entities, Attributes, Category, CV Term
Phenotype (observed) = Traits + Values
[Diagram: attribute node (rdf:type Attribute) and subset node (rdf:type Entity), grouped under Attributes and Subsets, with the properties #hasEntity, #hasAttribute, #hasCategory (Category) and #hasSpecies (Species)]
Towards a Phenotype Information System
Typical queries: search for a particular Trait, with or without Constraints
- Trait: Fruit (Entity) + weight (Attribute) = Fruit weight
- Constraint: Species = Tomato (hasSynonym: Tomato)
Towards a Phenotype Information System
Typical queries: search for a particular Trait, with or without Constraints
- Trait: Fruit + weight = Fruit weight
- Constraint: Species = Tomato
Phenotype (observed) = (Entity + Attribute) + Values
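The typical query (a Trait, with or without a Constraint) can be sketched over an in-memory list of records. This is only an illustration of the query logic: the record layout and the extra records (Grape, Leaf area) are made up, and a real system would presumably query the RDF knowledge base instead.

```python
from typing import Optional

# Hypothetical knowledge-base records (made up for illustration).
TRAITS = [
    {"entity": "Fruit", "attribute": "weight", "species": "Tomato"},
    {"entity": "Fruit", "attribute": "weight", "species": "Grape"},
    {"entity": "Leaf", "attribute": "area", "species": "Tomato"},
]


def find_traits(records, entity, attribute, species: Optional[str] = None):
    """Return records matching Entity + Attribute, optionally constrained by species."""
    matches = []
    for record in records:
        if record["entity"] != entity or record["attribute"] != attribute:
            continue
        if species is not None and record["species"] != species:
            continue
        matches.append(record)
    return matches


unconstrained = find_traits(TRAITS, "Fruit", "weight")
constrained = find_traits(TRAITS, "Fruit", "weight", species="Tomato")
print(len(unconstrained), len(constrained))
```

Without the constraint the trait "Fruit weight" is found for every species that records it; adding Species = Tomato narrows the result to a single record here.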
Data capture (EDTMS): Values
Data mapping: Entities, Attributes, Category, CV Term
Data linking
Entity + Attribute = Trait (characteristic / feature)
[Diagram: attribute node (rdf:type Attribute) and subset node (rdf:type Entity), grouped under Attributes and Subsets, with the properties #hasEntity, #hasAttribute, #hasCategory (Category) and #hasSpecies (Species)]
Develop lightweight tools if needed
- R scripts (Galaxy), lightweight GUI (R Shiny)
Towards a Phenotype Information System
Data capture (EDTMS): Values
Data mapping: Entities, Attributes, Category, CV Term
Data linking
Data exploration
Entity + Attribute = Trait (characteristic / feature)
Phenotype (observed) = Traits + Values
Data = Phenotypic data + Molecular data + Environment data
Phenotypic metadata = Descriptors of Traits (Entity-Attribute) + Environment Factors
Data accumulation → Knowledge Base
[Diagram: attribute node (rdf:type Attribute) and subset node (rdf:type Entity), grouped under Attributes and Subsets, with the properties #hasEntity, #hasAttribute, #hasCategory (Category) and #hasSpecies (Species)]
Bayes' theorem, the general formula:
$y$: data; $\theta$: parameters
$[y, \theta] = [y \mid \theta]\,[\theta] = [\theta \mid y]\,[y]$
where $[\cdot]$ denotes a density or a probability:
- $[\theta \mid y]$: posterior density, or simply the "posterior"
- $[\theta]$: prior density of $\theta$, or simply the "prior"
- $[y \mid \theta]$: likelihood (function of $\theta$)
- $[y]$: marginal density (data, model)
Model-Based Bayesian Inference
Data mining
Phenotype Information System
Machine Learning
Example: model for phenotypic variance and biomass prediction (Y) based on environmental parameters ($\theta$)
"Plant Growth"
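Bayes' theorem as stated above can be tried out numerically with a grid approximation: the posterior is the likelihood times the prior, divided by the marginal. The sketch below is a toy example and not the biomass model mentioned on the slide; the normal likelihood with known standard deviation, the flat prior, the grid and the three data values are all assumptions chosen for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def grid_posterior(data, grid, prior, sd=1.0):
    """Posterior weights on a grid: likelihood * prior, normalised by their sum.

    The sum over the grid plays the role of the marginal [y] (up to the grid spacing).
    """
    unnormalised = []
    for theta, prior_weight in zip(grid, prior):
        likelihood = math.prod(normal_pdf(y, theta, sd) for y in data)
        unnormalised.append(likelihood * prior_weight)
    marginal = sum(unnormalised)
    return [u / marginal for u in unnormalised]


grid = [i / 10 for i in range(0, 101)]   # candidate theta values from 0 to 10
prior = [1 / len(grid)] * len(grid)      # flat prior over the grid
data = [4.8, 5.1, 5.3]                   # made-up observations

posterior = grid_posterior(data, grid, prior)
best = grid[posterior.index(max(posterior))]
print(best)
```

With a flat prior the posterior mode sits at the grid point closest to the sample mean; a more informative prior would pull it towards the prior's own peak.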
Make your data great again
Making open data work for research
- Metadata: not just at the "top" level, linked to datasets, but linked more deeply, to the variables.
- The data management system becomes completely independent of data usage: one dataset → several applications, and one application → several datasets.
- Data accumulation → Knowledge Base
- Keep data "alive" within the data processing loop, in a similar way to DNA/protein sequences, which can be integrated into annotation pipelines.
Machine Learning
Model-Based Bayesian Inference
Make your data great again - Ver 2

  • 1. Daniel Jacob – INRA - 2018. How to ensure that open data works for research: Make your data great again. Daniel Jacob, INRA UMR 1332 BFP – Metabolism Group, Bordeaux Metabolomics Facility, Oct 2018. Follow-up to https://fr.slideshare.net/danieljacob771282/make-your-data-great-now. Give open access to your data and make them ready to be mined: Open Data for Access and Mining (ODAM) Framework.
  • 2. Research question → Project → Experiment → Experimental set-up → Data emancipation with regard to tools (Data API → Tools). Experiment data tables + 2 metadata files; data format: TSV. Use existing tools: spreadsheets, RStudio, BioStatFlow, Galaxy, Cytoscape, … Develop lightweight tools if needed: R scripts, lightweight GUI (R Shiny), … Minimal effort, maximal efficiency. EDTMS, ODAM; FAIR (F, A, INTEROPERABLE, R). Follow-up to https://fr.slideshare.net/danieljacob771282/make-your-data-great-now
  • 3. Data integration; multi-species data integration; towards Linked Data; Phenotype Information System (« Plant Physiology and Metabolism », https://www.quora.com/What-is-plant-physiology-and-metabolism; « Plant Growth »). Data format: TSV. Use existing tools: spreadsheets, RStudio, BioStatFlow, Galaxy, Cytoscape, … Develop lightweight tools if needed: R scripts, lightweight GUI (R Shiny), … FAIR (F, A, INTEROPERABLE, R).
  • 4. Resource Description Framework (RDF). Reference: http://cgi.di.uoa.gr/~pms509/past_lectures/introduction-to-rdf.pdf
  • 5. Resource Description Framework (RDF). Creation of the metadata files: Subsets. s_subsets.tsv: this metadata file associates a key concept (Plants, Harvests, Samples, Compounds, …) with each data subset file; optional: an ontology-based annotation (CV Term). a_attributes.csv: this metadata file annotates each attribute (variable) with some minimal but relevant metadata; optional: an ontology-based annotation (CV Term).
  • 6. Resource Description Framework (RDF). Data / Metadata diagram. Entities: subsets (s_subsets.tsv), CV Term. Attributes: attributes (a_attributes.tsv), CV Term. Categories: CV Term ?
  • 7. Resource Description Framework (RDF). Data / Metadata diagram. Entities: subsets (s_subsets.tsv), CV Term. Attributes: attributes and categories (a_attributes.tsv), CV Term. Entity + Attribute = Trait (characteristic / feature).
  • 8. Resource Description Framework (RDF). Ontologies (http://agroportal.lirmm.fr/ontologies): TO (Plant Trait Ontology), EO (Plant Env. Ontology), PO (Plant Structure & Dev. Stage Ontology), CHEBI Ontology, GO Ontology, … Entity + Attribute = Trait (characteristic / feature). Plant Trait Ontology as the core / kernel of all ontologies. « Plant Physiology and Metabolism », « Plant Growth ».
  • 9. Resource Description Framework (RDF). TBox (reference ontologies): a TBox is a "terminological component", a conceptualization associated with a set of facts. Entities and their identifiers: Plants (plants.tsv, PlanteID), Harvests (harvests.tsv, Lot), Samples (samples.tsv, SampleID), Compounds (compounds.tsv, SampleID), Enzymes (enzymes.tsv, SampleID). Attribute categories: factor, quantitative, qualitative, identifier. Entities and Attributes carry CV Terms from the reference ontologies (http://agroportal.lirmm.fr/ontologies): TO (Plant Trait Ontology), EO (Plant Env. Ontology), PO (Plant Structure & Dev. Stage Ontology), GO Ontology, CHEBI Ontology, …
  • 10. Resource Description Framework (RDF). ABox (application ontologies): an ABox is an "assertion component", a fact associated with a conceptual model or ontologies within a knowledge base. Data / Metadata: Entities (Subset, CV Term), Attributes (Attribute, CV Term), Category, Species. Entity + Attribute = Trait. Typical queries: search for a particular Trait.
  • 11. Resource Description Framework (RDF). RDF Schema, for each Dataset (https://schema.org/Dataset: measurementTechnique, variableMeasured). ABox (application ontologies): Attributes and Subsets (rdf:Bag); attribute node and subset node (rdf:type Attribute, rdf:type Entity); properties #hasEntity, #hasAttribute, #hasCategory, #hasSpecies, #description; rdfs:label <description>; xsd:string literals; Category, with rdfs:range categories (factor, quantitative, qualitative, identifier); Species. TBox (reference ontologies): CV Terms linked via rdf:resource to TO, EO, PO, CHEBI, GO, …, Taxon.
  • 12. Towards a Phenotype Information System. Data / Metadata: Entities, Attributes, Category, CV Term; Traits, Values. Phenotype (observed) = Traits + Values. Automatic population of the knowledge base from the metadata files defined within ODAM data subsets. RDF graph: Attributes, Subsets; attribute node, subset node (rdf:type Attribute, rdf:type Entity); #hasEntity, #hasAttribute, #hasCategory, #hasSpecies; Category, Species.
  • 13. Towards a Phenotype Information System. Typical queries: search for a particular Trait, with or without Constraints. Trait: Fruit + weight = Fruit weight (Entities + Attributes). Constraint: and Species = Tomato (hasSynonym Tomato).
  • 14. Towards a Phenotype Information System. Typical queries: search for a particular Trait, with or without Constraints. Trait: Fruit + weight = Fruit weight. Constraint: and Species = Tomato. Phenotype (observed) = (Entity + Attribute) + Values.
  • 15. Steps: data capture (EDTMS), data mapping, data linking. Data mapping: Entities, Attributes, Category, CV Term, Values; Entity + Attribute = Trait (characteristic / feature). Data linking: Attributes, Subsets; attribute node, subset node (rdf:type Attribute, rdf:type Entity); #hasEntity, #hasAttribute, #hasCategory, #hasSpecies; Category, Species. Develop lightweight tools if needed: R scripts (Galaxy), lightweight GUI (R Shiny).
  • 16. Towards a Phenotype Information System. Steps: data capture (EDTMS), data mapping, data linking, data exploration. Data mapping: Entities, Attributes, Category, CV Term, Values; Entity + Attribute = Trait (characteristic / feature); Phenotype (observed) = Traits + Values. Data linking: Attributes, Subsets; attribute node, subset node (rdf:type Attribute, rdf:type Entity); #hasEntity, #hasAttribute, #hasCategory, #hasSpecies; Category, Species. Data = Phenotypic data + Molecular data + Environment data. Phenotypic metadata = Descriptors of Traits (Entity-Attribute) + Environment Factors. Data accumulation → Knowledge Base.
  • 17. Data mining on the Phenotype Information System: Machine Learning; Model-Based Bayesian Inference. Bayes' theorem, the general formula, with y: data and θ: parameters: [y, θ] = [y | θ]·[θ] = [θ | y]·[y], where [.] means a density or a probability. [θ | y]: posterior density, or simply the "posterior". [θ]: prior density of θ, or simply the "prior". [y | θ]: likelihood (function of θ). [y]: marginal density (data, model). Example: model for phenotypic variance and biomass prediction (Y) based on environmental parameters (θ). « Plant Growth ».
  • 18. Make your data great again: making open data work for research. Metadata: not just at the "top" level, linked to datasets, but linked more deeply, to the variables. The data management system becomes completely independent of data usage: one dataset → several applications, and one application → several datasets. Data accumulation → Knowledge Base (Machine Learning, Model-Based Bayesian Inference). Keep data "alive" in the data processing loop, in a similar way to DNA/protein sequences, which can be integrated into annotation pipelines.
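
The joint-density identity on slide 17 can be rearranged to give the posterior explicitly. The sketch below keeps the slide's bracket notation and assumes θ is the symbol for the parameters (it is not legible in the transcript); the integral form of the marginal is the standard one for continuous parameters and is not stated on the slide.

```latex
\[
[\theta \mid y] \;=\; \frac{[y \mid \theta]\,[\theta]}{[y]},
\qquad
[y] \;=\; \int [y \mid \theta]\,[\theta]\,\mathrm{d}\theta .
\]
```

Because the marginal [y] does not depend on θ, the posterior is proportional to the likelihood times the prior.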

Editor's Notes

  1. Trait vs Phenotype: Entity + Attribute = Trait (observable); Entity + (Attribute + Value) = Phenotype (observed).
  2. An ABox is an "assertion component": a fact associated with a terminological vocabulary within a knowledge base.
  3. TBox statements describe a system in terms of controlled vocabularies, for example, a set of classes and properties. ABox statements are TBox-compliant statements about that vocabulary.
  4. Question types: What is the set of "Traits" (quantitative/qualitative) for a given sample (identifier)? What is the set of "Traits" (quantitative/qualitative) for one or more given CVs { subset type, e.g. CV subset in (metabolite, enzyme) (CHEBI); attribute type, e.g. CV attribute == tissue == "fruit pericarp" (PO) }, with or without an additional constraint, e.g. factor type?
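
The Trait and Phenotype definitions in note 1 and the question types in note 4 can be sketched with a small in-memory set of triples. This is only an illustrative Python sketch, not ODAM's implementation: the predicate names mirror the slide labels (#hasEntity, #hasAttribute, #hasCategory, #hasSpecies), while the node identifiers, the example values and the helper functions are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical ABox: every identifier and value here is invented for illustration.
TRIPLES = [
    Triple("attr:fruit_weight", "hasEntity", "Fruit"),
    Triple("attr:fruit_weight", "hasAttribute", "weight"),
    Triple("attr:fruit_weight", "hasCategory", "quantitative"),
    Triple("attr:fruit_weight", "hasSpecies", "Tomato"),
    Triple("attr:leaf_color", "hasEntity", "Leaf"),
    Triple("attr:leaf_color", "hasAttribute", "color"),
    Triple("attr:leaf_color", "hasCategory", "qualitative"),
    Triple("attr:leaf_color", "hasSpecies", "Grape"),
    Triple("attr:treatment", "hasEntity", "Plant"),
    Triple("attr:treatment", "hasAttribute", "treatment"),
    Triple("attr:treatment", "hasCategory", "factor"),
    Triple("attr:treatment", "hasSpecies", "Tomato"),
]


def objects(subject, predicate):
    """All objects linked to `subject` through `predicate`."""
    return {t.obj for t in TRIPLES if t.subject == subject and t.predicate == predicate}


def trait_name(node):
    """Entity + Attribute = Trait."""
    (entity,) = objects(node, "hasEntity")
    (attribute,) = objects(node, "hasAttribute")
    return f"{entity} {attribute}"


def find_traits(categories=("quantitative", "qualitative"), **constraints):
    """Traits whose category is in `categories` and that satisfy every
    constraint, given as predicate=value (e.g. hasSpecies="Tomato")."""
    result = []
    for node in sorted({t.subject for t in TRIPLES}):
        if not objects(node, "hasCategory") & set(categories):
            continue
        if all(value in objects(node, pred) for pred, value in constraints.items()):
            result.append(trait_name(node))
    return result


def phenotype(node, value):
    """Phenotype (observed) = Trait + Value."""
    return (trait_name(node), value)


print(find_traits())                         # ['Fruit weight', 'Leaf color']
print(find_traits(hasSpecies="Tomato"))      # ['Fruit weight']
print(phenotype("attr:fruit_weight", 52.3))  # ('Fruit weight', 52.3)
```

A real knowledge base would hold these statements in an RDF store and express the same questions as graph queries; the list scan above only makes the Entity + Attribute + constraint logic explicit.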