Odam: Open Data, Access and Mining

Daniel JACOB, Research Engineer at INRA
Give open access to your data
and make it ready to be mined
Daniel Jacob
UMR 1332 BFP – Metabolism Group
Bordeaux Metabolomics Facility
May 2016
Open Data for Access and Mining
A data explorer as a bonus
EDTMS
ODAM
Daniel Jacob – INRA UMR 1332 –May 2016
The experimental context: needs / wishes
Experiment workflow: seeding, harvesting, samples preparation, samples analysis
• Sample identifiers centrally managed
• Data sharing & data availability
• Facilitate the subsequent data mining
• Make both metadata and data (Experiment Data Tables, Experiment Design) available for data mining through a web API
• Develop, if needed, lightweight tools
- R scripts (Galaxy), lightweight GUI (R Shiny)
Open Data for Access and Mining: The core idea in one shot
Implementation of an Experiment Data Tables Management System (EDTMS)
• Data capture with minimal effort (PUT): merely dropping data files in a data repository (e.g. a local NAS or remote storage space) mounted on the server (myhost.org) should allow users to access them through a web API (GET http://myhost.org/)
• Data can be downloaded, explored and mined
• No database schema, no programming code and no additional configuration on the server side
Preparation and cleaning of the data sub-sets of files
Data subset files: plants.tsv, harvests.tsv, samples.tsv, compounds.tsv, enzymes.tsv
• Whatever the kind of experiment, this assumes a design of experiment (DoE) involving individuals, samples or other entities as the main objects of study (e.g. plants, tissues, bacteria, …)
• This also assumes the observation of dependent variables resulting from the effects of some controlled experimental factors.
• Moreover, each object of study usually has an identifier, and the variables can be quantitative or qualitative.
• There can be either one type of object of study or several, but in the latter case there must be a relationship between object types, which we assume to be of the "obtainedFrom" type.
Classification of each column within its right category
Data subset files: plants.tsv, harvests.tsv, samples.tsv, compounds.tsv, enzymes.tsv
Column categories: factor, quantitative, qualitative, identifier, link
Data subset files and their associated metadata files must be compliant with the TSV standard (Tab-Separated Values)
• You have to organize your data subsets so that links can be established between them.
• In practice, this means adding a column containing the identifiers of the entity to which you want to connect the subset, implying an ‘obtainedFrom’ relation.
• Note that this duplication of identifiers must be the only redundant information across all data subsets.
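As a minimal sketch of how an identifier column links two subsets, the Python snippet below joins a hypothetical samples table to a hypothetical harvests table through a shared Lot column. The column names, the values and the helper functions are invented for illustration; they are not taken from the FRIM1 dataset or from the ODAM code.

```python
import csv
import io

# Hypothetical TSV content; column names and values are illustrative only.
HARVESTS_TSV = "Lot\tHarvestDate\n1\t2010-06-01\n2\t2010-06-15\n"
SAMPLES_TSV = "SampleID\tLot\tTissue\n10\t1\tpericarp\n11\t2\tpericarp\n"


def read_tsv(text):
    """Parse TSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def join_on_link(child_rows, parent_rows, link):
    """Attach parent columns to each child row through the shared link column."""
    parents = {row[link]: row for row in parent_rows}
    merged = []
    for child in child_rows:
        parent = parents[child[link]]  # raises KeyError if the link is broken
        merged.append({**parent, **child})
    return merged


merged_rows = join_on_link(read_tsv(SAMPLES_TSV), read_tsv(HARVESTS_TSV), "Lot")
for row in merged_rows:
    print(row)
```

Only the Lot column is repeated in both tables, which matches the rule that identifiers are the only redundant information.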
Connections between the dataset files based on identifiers
Data subset files and their entities (concepts):
- plants.tsv: Plants
- harvests.tsv: Harvests
- samples.tsv: Samples
- compounds.tsv: Compounds
- enzymes.tsv: Enzymes
Legend:
- Identifier of the central entity of the subset
- Link between 2 subsets, carried by identifiers (implies an ‘obtainedFrom’ relation)
- Column categories: factor, quantitative, qualitative, identifier, link
Creation of the metadata files
Supplementary files
To allow data to be explored and mined, we have to add some minimal but relevant metadata. For that, 2 metadata files are required:
• s_subsets.tsv: a file that associates each data subset with a key concept corresponding to the main entity of the subset, and records the "obtainedFrom" relations between these concepts
• a_attributes.tsv: a file that annotates each attribute (concept/variable) with some minimal but relevant metadata
Data subset files and their associated metadata files must be compliant with the TSV standard (Tab-Separated Values)
Note: TSV is an alternative to the common comma-separated values (CSV) format, which often causes difficulties because of the need to escape commas
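To illustrate the note on TSV versus CSV, the short Python sketch below writes the same record in both formats with the standard csv module. The record is invented; the point is only that a field containing a comma has to be quoted in CSV, while it is written unchanged in TSV.

```python
import csv
import io

# A hypothetical record whose second field contains a comma.
record = ["S1", "leaf, young", "3.2"]

csv_buffer = io.StringIO()
csv.writer(csv_buffer).writerow(record)

tsv_buffer = io.StringIO()
csv.writer(tsv_buffer, delimiter="\t").writerow(record)

csv_line = csv_buffer.getvalue().strip()
tsv_line = tsv_buffer.getvalue().strip()
print(csv_line)  # the comma-containing field is wrapped in quotes
print(tsv_line)  # fields are separated by tabs, no quoting is needed
```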
s_subsets.tsv: this metadata file associates a key concept with each data subset file
Each entry gives:
- the unique rank number of the data subset
- the key concept (i.e. the main entity) associated with the subset, in the form of a short name
- the identifier of the central entity of the subset
- the data file name
- the link between 2 subsets (implies an ‘obtainedFrom’ relation)
Example entry: 1, Plants, PlanteID, plants.tsv
Subsets in the example (ranks 1 to 5): Plants (plants.tsv, identifier PlanteID), Harvests (harvests.tsv, identifier Lot), Samples (samples.tsv, identifier SampleID), Compounds (compounds.tsv, linked through SampleID), Enzymes (enzymes.tsv, linked through SampleID)
a_attributes.tsv: this metadata file allows each attribute (variable) to be annotated with some minimal but relevant metadata
Attribute categories: factor, quantitative, qualitative, identifier
Subsets covered: Plants, Harvests, Samples, Compounds, …
Updating the metadata files
Files concerned: s_subsets.tsv, a_attributes.tsv
Additional subsets/attributes can be added step by step, as soon as data are produced.
Uploading your datasets in the data repository
• Merely dropping data files on the data repository (e.g. NAS) should allow users to access them through the web API
• Your data subset files go into your dataset entry (named ‘frim1’ as an example) within the data repository, here mounted as Z: (Storage)
• Data subset files and their associated metadata files must be compliant with the TSV standard (Tab-Separated Values)
• No database schema, no programming code and no additional configuration on the server side
Data capture with minimal effort (PUT); access through myhost.org (GET)
Checking online if your data subset files are consistent
http://myhost.org/check/frim1
Data repository: NAS storage served by myhost.org
Many checks can be done automatically for you
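The slides do not detail which checks the online service runs. As an illustration only, the Python sketch below shows two checks of this kind: identifiers must be unique within a subset, and every link value in a subset must exist in the subset it is obtained from. The function names and the sample rows are hypothetical and do not describe the actual service.

```python
def find_duplicate_ids(rows, id_column):
    """Return identifier values that occur more than once in a subset."""
    seen, duplicates = set(), set()
    for row in rows:
        value = row[id_column]
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return sorted(duplicates)


def find_broken_links(child_rows, parent_rows, link_column):
    """Return link values of the child subset that are absent from the parent."""
    known = {row[link_column] for row in parent_rows}
    return sorted({row[link_column] for row in child_rows} - known)


# Hypothetical rows, as they could be read from harvests.tsv and samples.tsv.
harvests = [{"Lot": "1"}, {"Lot": "2"}]
samples = [
    {"SampleID": "10", "Lot": "1"},
    {"SampleID": "10", "Lot": "3"},  # duplicated identifier and unknown lot
]

print(find_duplicate_ids(samples, "SampleID"))
print(find_broken_links(samples, harvests, "Lot"))
```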
Open Data, Access and Mining: web API
Minimal effort (PUT), maximal efficiency (GET)
Workflow: seeding, harvesting, samples preparation, samples analysis, then data storage (preparation and cleaning of the data subset files, TSV format, data linking, check)
After depositing your complete dataset as described previously:
• Open access is given to your data through the web API
• They are ready to be mined
• No specific code or additional configuration is needed
Example dataset: FRIM1 (*)
(*) https://www.erasysbio.net/index.php?index=266
Open Data, Access and Mining: web API
REST Services: hierarchical tree of resource naming (URL)
GET http://myhost.org/getdata/<data format>/<dataset name>/< … >/< … >
Levels of the tree, used for retrieving data and retrieving metadata:
- <data format>: xml/tsv/json
- <dataset name>: e.g. frim1
- <subset> or (<subset>)
- <entry> or <category>
- <value>
Attribute categories: factor, quantitative, qualitative, identifier, link
Example dataset: FRIM1 (*)
(*) https://doi.org/10.5281/zenodo.154041
REST Services: description of the URL fields
GET http://myhost.org/getdata/<data format>/<dataset name>/< … >/< … >
Field: description. Example
• <data format>: format of the retrieved data; possible values are 'xml', 'tsv' or 'json'. Example: xml
• <dataset name>: short name (tag) of your dataset. Example: frim1
• <subset>: short name of a data subset. Example: samples
• <entry>: name of an attribute entry, defined by the user in the a_attributes.tsv file (column ‘entry’). Example: sampleid
• <category>: name of the attribute category, assigned by the user in the a_attributes.tsv file (column ‘category’); possible values are ‘identifier’, ‘factor’, ‘qualitative’, ‘quantitative’. Example: quantitative
• (<subset>): set of data subsets obtained by merging all the subsets with a lower rank than the specified subset, following the pathway defined by the "obtainedFrom" links. Example: (samples), i.e. plants + harvests + samples
• <value>: exact value of the desired entry or category. Examples: 1, factor
REST Services: URL patterns
GET http://myhost.org/getdata/<data format>/<dataset name>/< … >/< … >
• Get the subset list of a dataset
http://myhost.org/getdata/<data format>/<dataset name>
• Get all values within a data subset
http://myhost.org/getdata/<data format>/<dataset name>/<subset>
• Get values within a data subset for a specific value of an entry
http://myhost.org/getdata/<data format>/<dataset name>/<subset>/<entry>/<value>
• Get all values within a set of data subsets
http://myhost.org/getdata/<data format>/<dataset name>/(<subset>)
• Get values within a set of data subsets for a specific value of an entry
http://myhost.org/getdata/<data format>/<dataset name>/(<subset>)/<entry>/<value>
• Get the attribute list within a set of data subsets for a specific category
http://myhost.org/getdata/<data format>/<dataset name>/(<subset>)/<category>
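The URL patterns of the getdata service can be assembled programmatically. The helper below is a sketch, not part of ODAM: the function name build_url and its parameters are hypothetical, and myhost.org is the placeholder host used in the slides.

```python
from urllib.parse import quote

BASE_URL = "http://myhost.org/getdata"  # placeholder host used in the slides


def build_url(data_format, dataset, subset=None, merged=False,
              entry=None, value=None, category=None):
    """Assemble a getdata URL from its hierarchical fields (hypothetical helper)."""
    parts = [BASE_URL, data_format, dataset]
    if subset is not None:
        # Parentheses request the merged set of subsets instead of one subset.
        parts.append(f"({subset})" if merged else subset)
        if category is not None:
            parts.append(category)
        elif entry is not None and value is not None:
            parts.extend([entry, quote(str(value))])
    return "/".join(parts)


print(build_url("xml", "frim1"))
print(build_url("xml", "frim1", "harvests", entry="lot", value=1))
print(build_url("xml", "frim1", "compounds", merged=True, category="quantitative"))
```

The three calls reproduce the subset list, a filtered subset and a category query for the frim1 dataset.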
Open Data Access via web API: Examples based on FRIM1
Metadata:
- http://myhost.org/getdata/xml/frim1
- http://myhost.org/getdata/xml/frim1/(compounds)/quantitative
Data:
- http://myhost.org/getdata/xml/frim1/plants
- http://myhost.org/getdata/xml/frim1/harvests/lot/1
http://myhost.org/getdata/xml/frim1/(samples)/treatment/Control
(samples): set of data subsets obtained by merging all the subsets with a lower rank than the specified subset, following the pathway defined by the "obtainedFrom" links, i.e. plants + harvests + samples
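The way a parenthesized subset such as (samples) expands into a chain of subsets can be sketched in a few lines of Python. The dictionary below is a hypothetical description of the "obtainedFrom" links, loosely modelled on the FRIM1 example; it is an assumption for illustration, not the content of the real s_subsets.tsv file.

```python
# Hypothetical "obtainedFrom" links: each subset names the subset it is
# obtained from (None for the root subset).
OBTAINED_FROM = {
    "plants": None,
    "harvests": "plants",
    "samples": "harvests",
    "compounds": "samples",
    "enzymes": "samples",
}


def subset_chain(subset, obtained_from):
    """List the subsets merged by '(subset)', from the root down to subset."""
    chain = []
    current = subset
    while current is not None:
        chain.append(current)
        current = obtained_from[current]
    return list(reversed(chain))


print(" + ".join(subset_chain("samples", OBTAINED_FROM)))
```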
Open Data Access via web API: Application layer
Minimal effort, maximal efficiency
Data stored in TSV format with data linking (example dataset: FRIM1)
Use existing tools
- Spreadsheets, RStudio, BioStatFlow, Galaxy, Cytoscape, …
Retrieving data within R: the R package Rodam
Open Data Access via web API: Rodam package
GET http://www.bordeaux.inra.fr/pmb/getdata/tsv/frim1/(samples)/sample/365
URL fields: <data format> = tsv, <dataset name> = frim1, (<subset>) = (samples), <entry> = sample, <value> = 365
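The slides retrieve data with the R package Rodam. For readers outside R, the same GET request could be issued with plain Python, as sketched below. The helper names are hypothetical, UTF-8 encoding of the response is an assumption, and the server may no longer be reachable, so the parsing step is demonstrated on invented inline text rather than on a live response.

```python
import csv
import io
from urllib.request import urlopen

URL = "http://www.bordeaux.inra.fr/pmb/getdata/tsv/frim1/(samples)/sample/365"


def parse_tsv(text):
    """Turn TSV text into a list of dictionaries, one per data row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def fetch_tsv(url, timeout=30):
    """Download a TSV resource and parse it (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        # Assumption: the service answers with UTF-8 encoded text.
        return parse_tsv(response.read().decode("utf-8"))


# Offline demonstration with invented content; a real call would be:
# rows = fetch_tsv(URL)
sample_text = "sample\ttreatment\n365\tControl\n"
rows = parse_tsv(sample_text)
print(rows)
```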
Read metadata, i.e. category types within the data
Get the data subset ‘activome’ along with its metadata
GET http://www.bordeaux.inra.fr/pmb/getdata/tsv/frim1/(activome)/factor
URL fields: <data format> = tsv, <dataset name> = frim1, (<subset>) = (activome), <category> = factor
ODAM facilitates the subsequent data mining
Make both metadata and data available for data mining.
From experimentation / analysis to data / metadata, then to data mining (MFA, rCCA, pLDA, …) with the Rodam package
[Figures: subsets activome and qNMR_metabo (log10 transformed); Control vs Water Stress; all development stages, all treatments]
Open Data Access via web API: Application layer
Minimal effort, maximal efficiency
Use existing tools
- Spreadsheets, RStudio, BioStatFlow, Galaxy, Cytoscape, …
Develop, if needed, lightweight tools
- R scripts (Galaxy), lightweight GUI (R Shiny)
FRIM - Fruit Integrative Modelling
Data explorer: http://www.bordeaux.inra.fr/pmb/dataexplorer/?ds=frim1
• To remove an item from the selection: i) click on it, and then ii) press the ‘Suppr’ (Delete) key
• Explore several possibilities by interacting with the graph
To summarize
1. Preparation and cleaning of the data sub-sets of files
2. Classification of each column within its right category
3. Connections between the dataset files based on identifiers
4. Creation of the definition files, namely s_subsets.tsv and a_attributes.tsv
5. Deposit of the dataset files in the data repository
6. Checking online if your data subset files are consistent
7. Testing online the web-services on your dataset
8. Use of the web API through an application layer (R scripts, data explorer, ... )
Data subset files and their associated metadata files must be compliant with the TSV standard (Tab-Separated Values)
Note: TSV is an alternative to the common comma-separated values (CSV) format, which often causes difficulties because of the need to escape commas
(See https://en.wikipedia.org/wiki/Tab-separated_values)
Advantages of this approach
Data sharing & data availability
- The "plants" table may be created even before planting the seeds.
- Similarly, the "harvests" table can be created as soon as the harvests are done, before any analysis.
- Thus, these tables are generated only once in the project, and sharing can be set up as early as seed planting. Each analysis then complements the dataset as soon as it produces its own sub-dataset.
- Data are accessible to everyone as soon as they are produced.
Identifiers centrally managed
- Data are archived and compiled, so there is no need for a laborious investigation to find out who holds the right identifiers, etc.
Facilitate the subsequent publication of data
- Data are already available online through the web API.
- Nothing prevents using these data to populate existing databases, adding more elaborate annotations.
- Neither administrator privileges nor any programming skills are required.
Data capture with minimal effort (PUT); data analysis/mining with maximum efficiency (GET)
Minimal effort, maximum efficiency
Format the data
- Based on TSV: the choice is to keep the scientists' familiar habit of using worksheets, thus i) the same tool is used for both data files and metadata definition files, ii) no programming skills are required
Give access through a web services layer
- Based on current standards (REST)
Use existing tools
- Spreadsheets, RStudio, BioStatFlow (biostatflow.org), Galaxy, Cytoscape, …
Develop, if needed, lightweight tools
- R scripts, lightweight GUI (R Shiny)
Have fun!
Daniel Jacob
UMR 1332 BFP – Metabolism Group
Bordeaux Metabolomics Facility
May 2016
Open Data for Access and Mining
https://hub.docker.com/r/odam/getdata/
http://www.bordeaux.inra.fr/pmb/dataexplorer/
https://github.com/INRA/ODAM
https://cran.r-project.org/package=Rodam
https://zenodo.org/record/154041
An online example
OPPOTUS - Malaysians on Malaysia 3Q2023.pdf by Oppotus
OPPOTUS - Malaysians on Malaysia 3Q2023.pdfOPPOTUS - Malaysians on Malaysia 3Q2023.pdf
OPPOTUS - Malaysians on Malaysia 3Q2023.pdf
Oppotus27 views
[DSC Europe 23] Matteo Molteni - Implementing a Robust CI Workflow with dbt f... by DataScienceConferenc1
[DSC Europe 23] Matteo Molteni - Implementing a Robust CI Workflow with dbt f...[DSC Europe 23] Matteo Molteni - Implementing a Robust CI Workflow with dbt f...
[DSC Europe 23] Matteo Molteni - Implementing a Robust CI Workflow with dbt f...
[DSC Europe 23][Cryptica] Martin_Summer_Digital_central_bank_money_Ideas_init... by DataScienceConferenc1
[DSC Europe 23][Cryptica] Martin_Summer_Digital_central_bank_money_Ideas_init...[DSC Europe 23][Cryptica] Martin_Summer_Digital_central_bank_money_Ideas_init...
[DSC Europe 23][Cryptica] Martin_Summer_Digital_central_bank_money_Ideas_init...

Odam: Open Data, Access and Mining

  • 1. ODAM: Open Data for Access and Mining (EDTMS / ODAM). Give open access to your data and make them ready to be mined, with a data explorer as a bonus. Daniel Jacob, UMR 1332 BFP – Metabolism Group, Bordeaux Metabolomics Facility, May 2016.
  • 2. Daniel Jacob – INRA UMR 1332 – May 2016. The experimental context: needs and wishes.
      Workflow: seeding, harvesting, sample preparation and sample analysis, with sample identifiers followed throughout.
      Building blocks: experiment design, experiment data tables, web API.
      1. Identifiers centrally managed.
      2. Data sharing and data availability.
      3. Facilitate the subsequent data mining: make both metadata and data available for data mining, and develop lightweight tools if needed (R scripts (Galaxy), lightweight GUI (R Shiny)).
  • 3. Open Data for Access and Mining: the core idea in one shot. Implementation of an Experiment Data Tables Management System (EDTMS).
      - Data capture with minimal effort (PUT): the experiment data tables are dropped into a data repository that is mounted on the server (myhost.org).
      - Data access (GET): http://myhost.org/
      - Merely dropping data files in a data repository (e.g. a local NAS or distant storage space) should allow users to access them through a web API.
      - Data can be downloaded, explored and mined.
      - No database schema, no programming code and no additional configuration on the server side.
  • 4. Preparation and cleaning of the data subset files (plants.tsv, harvests.tsv, samples.tsv, compounds.tsv, enzymes.tsv).
      - Whatever the kind of experiment, this assumes a design of experiment (DoE) involving individuals, samples or other things as the main objects of study (e.g. plants, tissues, bacteria, …).
      - This also assumes the observation of dependent variables resulting from the effects of some controlled experimental factors.
      - Moreover, each object of study usually has an identifier, and the variables can be quantitative or qualitative.
      - There can be a single type of object of study or several; in the latter case, a relationship must exist between the object types, which we assume to be of the "obtainedFrom" type.
  • 5. Classification of each column within its right category (data subset files: plants.tsv, harvests.tsv, samples.tsv, compounds.tsv, enzymes.tsv). Legend of the column categories: factor, quantitative, qualitative, identifier, link.
      - Data subset files and their associated metadata files must be compliant with the TSV (tab-separated values) standard.
      - You have to organize your data subsets so that links can be established between them.
      - In practice, this means adding a column containing the identifiers of the entity to which you want to connect the subset, implying an 'obtainedFrom' relation.
      - Note that this duplication of identifiers must be the only redundant information across all data subsets.
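As an illustration of the linking rule on slide 5, the following minimal Python sketch joins two subsets through a shared identifier column. Only the file and identifier names (plants.tsv / PlanteID, harvests.tsv / Lot) come from the slides; the rows, the `Treatment` and `HarvestDate` columns and the helper names are made up for the example.

```python
import csv
import io

# Made-up rows for illustration; only the identifier names PlanteID and Lot
# come from the slides. Treatment and HarvestDate are hypothetical columns.
PLANTS_TSV = "PlanteID\tTreatment\nP1\tControl\nP2\tWaterStress\n"
HARVESTS_TSV = (
    "Lot\tPlanteID\tHarvestDate\n"
    "L1\tP1\t2016-05-02\n"
    "L2\tP2\t2016-05-02\n"
    "L3\tP1\t2016-05-16\n"
)


def read_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def join_on_link(children, parents, link_column):
    """Attach to each child row the parent row it was obtained from."""
    parents_by_id = {row[link_column]: row for row in parents}
    merged = []
    for child in children:
        parent = parents_by_id[child[link_column]]
        merged.append({**parent, **child})
    return merged


# harvests.tsv carries a PlanteID column: the only duplicated information.
rows = join_on_link(read_tsv(HARVESTS_TSV), read_tsv(PLANTS_TSV), "PlanteID")
for row in rows:
    print(row["Lot"], row["PlanteID"], row["Treatment"])
```

Because the link column is the only redundant information, every other attribute of a plant is stored once in plants.tsv and recovered by the join.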
  • 6. Connections between the dataset files based on identifiers. Each data subset file corresponds to an entity (concept): plants.tsv to Plants, harvests.tsv to Harvests, samples.tsv to Samples, compounds.tsv to Compounds, enzymes.tsv to Enzymes. Each subset carries the identifier of its central entity, and a link between two subsets is made through identifiers (implying an 'obtainedFrom' relation).
  • 7. Creation of the metadata files (supplementary files). To allow data to be explored and mined, some minimal but relevant metadata must be added. Two metadata files are required:
      - s_subsets.tsv: associates each data subset with a key concept corresponding to the main entity of the subset, and records the "obtainedFrom" relations between these concepts.
      - a_attributes.tsv: annotates each attribute (concept/variable) with some minimal but relevant metadata.
      Note: TSV is an alternative to the common comma-separated values (CSV) format, which often causes difficulties because of the need to escape commas.
  • 8. Creation of the metadata files: s_subsets.tsv. This metadata file associates a key concept with each data subset file. For each subset it gives:
      - the unique rank number of the data subset (1 to 5 in the example);
      - the key concept (i.e. the main entity) associated with the subset, as a short name;
      - the identifier of the central entity of the subset;
      - the link between two subsets (implies an 'obtainedFrom' relation);
      - the data file name.
      Example: rank 1, concept Plants, identifier PlanteID, file plants.tsv. The other subsets of the example are Harvests (harvests.tsv, Lot), Samples (samples.tsv, SampleID), Compounds (compounds.tsv, SampleID) and Enzymes (enzymes.tsv, SampleID).
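To make the role of the rank numbers and 'obtainedFrom' links concrete, here is a hedged Python sketch that walks from one subset back to the root. The column names (`rank`, `parent_rank`, `subset`, `identifier`, `file`) and the parent ranks are assumptions made for this example; the slides do not spell out the real layout of s_subsets.tsv.

```python
import csv
import io

# Hypothetical s_subsets.tsv content. Column names and parent ranks are
# assumptions for this sketch; parent_rank 0 means "no parent".
S_SUBSETS_TSV = (
    "rank\tparent_rank\tsubset\tidentifier\tfile\n"
    "1\t0\tplants\tPlanteID\tplants.tsv\n"
    "2\t1\tharvests\tLot\tharvests.tsv\n"
    "3\t2\tsamples\tSampleID\tsamples.tsv\n"
    "4\t3\tcompounds\tSampleID\tcompounds.tsv\n"
    "5\t3\tenzymes\tSampleID\tenzymes.tsv\n"
)


def load_subsets(text):
    """Index the metadata rows by their rank number."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {int(row["rank"]): row for row in reader}


def obtained_from_chain(subsets, name):
    """List the subsets from the root down to `name`, following parent links."""
    rank_by_name = {row["subset"]: rank for rank, row in subsets.items()}
    rank = rank_by_name[name]
    chain = []
    while rank != 0:
        chain.append(subsets[rank]["subset"])
        rank = int(subsets[rank]["parent_rank"])
    return list(reversed(chain))


subsets = load_subsets(S_SUBSETS_TSV)
print(obtained_from_chain(subsets, "samples"))
```

Under these assumed links, the chain for `samples` is plants, harvests, samples, which matches the merged set that the API slides write as (samples).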
  • 9. Creation of the metadata files: a_attributes.tsv. This metadata file allows each attribute (variable) of each subset (Plants, Harvests, Samples, Compounds, …) to be annotated with some minimal but relevant metadata, including its category (identifier, factor, quantitative or qualitative).
  • 10. Updating the metadata files (s_subsets.tsv and a_attributes.tsv). Additional subsets and attributes can be added step by step, as soon as data are produced.
  • 11. Uploading your datasets to the data repository. Your data subset files go into your dataset entry (named 'frim1' as an example) within the data repository, here the storage volume Z: (Storage). Merely dropping data files on the data repository (e.g. a NAS) should allow users to access them through the web API: data capture takes minimal effort (PUT), and the repository mounted on myhost.org is then read by GET. No database schema, no programming code and no additional configuration on the server side.
  • 12. Checking online whether your data subset files are consistent: http://myhost.org/check/frim1 (served by myhost.org from the data repository on the NAS storage). Many checks can be done automatically for you.
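The slides do not list which checks the online service runs. As a rough local analogue, the sketch below shows two consistency rules implied by the linking convention of slide 5: identifiers should be unique within a subset, and every link value should exist in the parent subset. The function names and sample rows are hypothetical.

```python
def duplicate_ids(rows, id_column):
    """Return identifier values that occur more than once in a subset."""
    seen = set()
    duplicates = set()
    for row in rows:
        value = row[id_column]
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return sorted(duplicates)


def missing_links(children, parents, link_column):
    """Return link values used by the child subset but absent from the parent."""
    known = {row[link_column] for row in parents}
    return sorted({row[link_column] for row in children} - known)


# Made-up rows: lot L2 appears twice, and P3 is not a known plant.
plants = [{"PlanteID": "P1"}, {"PlanteID": "P2"}]
harvests = [
    {"Lot": "L1", "PlanteID": "P1"},
    {"Lot": "L2", "PlanteID": "P3"},
    {"Lot": "L2", "PlanteID": "P2"},
]

print("duplicate identifiers:", duplicate_ids(harvests, "Lot"))
print("unresolved links:", missing_links(harvests, plants, "PlanteID"))
```

Running such checks before dropping the files in the repository can catch the most common linking mistakes early; the online check remains the reference.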
  • 13. Open Data, Access and Mining: web API. Minimal effort (PUT), maximal efficiency (GET). After depositing your complete dataset as described previously (preparation and cleaning of the data subset files, TSV format, data linking, check):
      - open access is given to your data through a web API;
      - they are ready to be mined;
      - no specific code or additional configuration is needed.
      Example dataset: FRIM1 (*). (*) https://www.erasysbio.net/index.php?index=266
  • 14. Open Data, Access and Mining: web API. REST services: hierarchical tree of resource naming (URL), used for retrieving both data and metadata.
      GET http://myhost.org/getdata/<data format>/<dataset name>/< … >/< … >
      The tree starts with <data format> (xml/tsv/json) and <dataset name> (e.g. frim1), continues with <subset> or (<subset>), then with <entry> or <category>, and ends with <value>.
      Example dataset: FRIM1 (*). (*) https://doi.org/10.5281/zenodo.154041
  • 15. Open Data, Access and Mining: web API. REST services: hierarchical tree of resource naming (URL).
      GET http://myhost.org/getdata/<data format>/<dataset name>/< … >/< … >
      Fields (description; example):
      - <data format>: format of the retrieved data, possible values are 'xml' or 'csv'; example: xml
      - <dataset name>: short name (tag) of your dataset; example: frim1
      - <subset>: short name of a data subset; example: samples
      - <entry>: name of an attribute entry, defined by the user in the a_attributes.tsv file (column 'entry'); example: sampleid
      - <category>: name of the attribute category, assigned by the user in the a_attributes.tsv file (column 'category'), possible values are 'identifier', 'factor', 'qualitative', 'quantitative'; example: quantitative
      - (<subset>): set of data subsets obtained by merging all the subsets with lower rank than the specified subset, following the pathway defined by the "obtainedFrom" links; example: (samples) → plants + harvests + samples
      - <value>: exact value of the desired entry or category; examples: 1, factor
  • 16. Open Data, Access and Mining: web API. REST services: hierarchical tree of resource naming (URL).
      GET http://myhost.org/getdata/<data format>/<dataset name>/< … >/< … >
      - Get the subset list of a dataset: http://myhost.org/getdata/<data format>/<dataset name>
      - Get all values within a data subset: http://myhost.org/getdata/<data format>/<dataset name>/<subset>
      - Get values within a data subset for a specific value of an entry: http://myhost.org/getdata/<data format>/<dataset name>/<subset>/<entry>/<value>
      - Get all values within a set of data subsets: http://myhost.org/getdata/<data format>/<dataset name>/(<subset>)
      - Get values within a set of data subsets for a specific value of an entry: http://myhost.org/getdata/<data format>/<dataset name>/(<subset>)/<entry>/<value>
      - Get the attribute list within a set of data subsets for a specific category: http://myhost.org/getdata/<data format>/<dataset name>/(<subset>)/<category>
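The URL patterns of slide 16 can be assembled programmatically. The helper below is only a sketch: `getdata_url` and its `merged` flag are names invented here, and `http://myhost.org/getdata` is the placeholder host used in the slides, not a real endpoint.

```python
from urllib.parse import quote

BASE_URL = "http://myhost.org/getdata"  # placeholder host from the slides


def getdata_url(data_format, dataset, subset=None, *path, merged=False, base=BASE_URL):
    """Build a getdata URL following the resource tree of slide 16.

    path   -- optional trailing segments: (entry, value) or (category,)
    merged -- if True, wrap the subset in parentheses to request the merged
              set of subsets, written (<subset>) in the slides
    """
    parts = [data_format, dataset]
    if subset is not None:
        parts.append(f"({subset})" if merged else subset)
        parts.extend(str(segment) for segment in path)
    # Keep the parentheses literal, as they appear in the slide examples.
    return base + "/" + "/".join(quote(part, safe="()") for part in parts)


print(getdata_url("xml", "frim1"))
print(getdata_url("xml", "frim1", "plants"))
print(getdata_url("xml", "frim1", "harvests", "lot", 1))
print(getdata_url("xml", "frim1", "compounds", "quantitative", merged=True))
```

The four calls reproduce the four example URLs given for FRIM1 on slide 17.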
  • 17. Open data access via web API: examples based on FRIM1.
      - Metadata: http://myhost.org/getdata/xml/frim1 (subset list of the dataset)
      - Data: http://myhost.org/getdata/xml/frim1/plants (all values of the 'plants' subset)
      - Data: http://myhost.org/getdata/xml/frim1/harvests/lot/1 (values of the 'harvests' subset where lot is 1)
      - Metadata: http://myhost.org/getdata/xml/frim1/(compounds)/quantitative (quantitative attributes within the set of subsets leading to 'compounds')
  • 18. Open data access via web API: examples based on FRIM1.
      http://myhost.org/getdata/xml/frim1/(samples)/treatment/Control
      (samples) denotes the set of data subsets obtained by merging all the subsets with lower rank than the specified subset, following the pathway defined by the "obtainedFrom" links: (samples) → plants + harvests + samples.
  • 19. Open data access via web API: application layer. Minimal effort, maximal efficiency. On top of the TSV data format, data linking and the web API, use existing tools: spreadsheets, RStudio, BioStatFlow, Galaxy, Cytoscape, …
  • 20. Open data access via web API: application layer. Retrieving data within R with the R package Rodam.
  • 21. Open data access via web API: Rodam package. Example with <data format> = tsv, <dataset name> = frim1, (<subset>) = (samples), <entry> = sample and <value> = 365:
      GET http://www.bordeaux.inra.fr/pmb/getdata/tsv/frim1/(samples)/sample/365
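The slides retrieve data from R with the Rodam package. Since the service is a plain HTTP GET, the same request can be sketched from any language; the Python version below is not part of ODAM, and it assumes that the 'tsv' format returns UTF-8 text with a header row. `fetch_tsv` and `parse_tsv` are hypothetical helper names.

```python
import csv
import io
from urllib.request import urlopen


def parse_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def fetch_tsv(url, timeout=30):
    """GET a getdata URL and parse the body as TSV (assumes UTF-8 text)."""
    with urlopen(url, timeout=timeout) as response:
        text = response.read().decode("utf-8")
    return parse_tsv(text)


# Offline demonstration with made-up content; against a live server one
# would call, for example:
#   fetch_tsv("http://www.bordeaux.inra.fr/pmb/getdata/tsv/frim1/(samples)/sample/365")
# (URL taken from the slide; it may no longer be reachable).
sample_rows = parse_tsv("SampleID\tTreatment\n365\tControl\n")
print(sample_rows)
```

Every value comes back as a string, so quantitative columns need an explicit conversion before analysis.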
  • 22. Open data access via web API: Rodam package. Read metadata, i.e. the category types within the data, and get the data subset 'activome' along with its metadata. Example with <data format> = tsv, <dataset name> = frim1, (<subset>) = (activome) and <category> = factor:
      GET http://www.bordeaux.inra.fr/pmb/getdata/tsv/frim1/(activome)/factor
  • 23. Open data access via web API: Rodam package.
  • 24. Open data access via web API: Rodam package. Data mining: make both metadata and data available for data mining. ODAM facilitates the subsequent data mining, from experimentation and analysis, through data and metadata, to methods such as MFA, rCCA, pLDA, … [Figures: 'activome' and 'qNMR_metabo' subsets (log10 transformed); Control vs Water Stress; all developmental stages, all treatments.]
  • 25. Open data access via web API: application layer. Minimal effort, maximal efficiency. On top of the TSV data format, data linking and the web API:
      - Use existing tools: spreadsheets, RStudio, BioStatFlow, Galaxy, Cytoscape, …
      - Develop lightweight tools if needed: R scripts (Galaxy), lightweight GUI (R Shiny).
  • 26. FRIM - Fruit Integrative Modelling. Data explorer: http://www.bordeaux.inra.fr/pmb/dataexplorer/?ds=frim1
  • 27. FRIM - Fruit Integrative Modelling. Data explorer: http://www.bordeaux.inra.fr/pmb/dataexplorer/?ds=frim1
  • 28. FRIM - Fruit Integrative Modelling.
  • 29. FRIM - Fruit Integrative Modelling. To remove an item from the selection: (i) click on it, then (ii) press the Delete ('Suppr') key.
  • 30. FRIM - Fruit Integrative Modelling.
  • 31. FRIM - Fruit Integrative Modelling. Explore several possibilities by interacting with the graph.
  • 32. To summarize:
      1. Preparation and cleaning of the data subset files
      2. Classification of each column within its right category
      3. Connections between the dataset files based on identifiers
      4. Creation of the definition files, namely s_subsets.tsv and a_attributes.tsv
      5. Deposit of the dataset files in the data repository
      6. Checking online whether your data subset files are consistent
      7. Testing the web services online on your dataset
      8. Use of the web API through an application layer (R scripts, data explorer, ...)
      Data subset files and their associated metadata files must be compliant with the TSV (tab-separated values) standard. Note: TSV is an alternative to the common comma-separated values (CSV) format, which often causes difficulties because of the need to escape commas (see https://en.wikipedia.org/wiki/Tab-separated_values).
  • 33. Advantages of this approach (workflow: seeding, harvesting, sample preparation, sample analysis, with sample identifiers).
      Data sharing and data availability:
      - The "plants" table may be created even before planting the seeds.
      - Similarly, the "harvests" table can be created as soon as the harvests are done, before any analysis.
      - Thus, these tables are generated only once in the project, and sharing can be set up as soon as the seeds are planted. Each analysis then complements the dataset as soon as it produces its own data subset.
      - Data are accessible to everyone as soon as they are produced.
      Identifiers centrally managed:
      - Data are archived and compiled, so there is no longer any need for a laborious investigation to find out who holds the right identifiers, etc.
  • 34. Advantages of this approach. Facilitate the subsequent publication of data:
      - Data are already readily available online through the web API.
      - Nothing prevents using these data to fill in existing databases, adding more elaborate annotations.
      - Neither administrator privileges nor programming skills are required.
      Data capture (PUT): minimal effort. Data analysis/mining (GET): maximum efficiency. Both rely on the TSV data format and data linking.
  • 35. Advantages of this approach: minimal effort, maximum efficiency.
      - Format the data: based on TSV, a choice that keeps the scientists' familiar habit of using worksheets, so that (i) the same tool serves for both data files and metadata definition files, and (ii) no programming skills are required.
      - Give access through a web services layer: based on current standards (REST).
      - Use existing tools: spreadsheets, RStudio, BioStatFlow (biostatflow.org), Galaxy, Cytoscape, …
      - Develop lightweight tools if needed: R scripts, lightweight GUI (R Shiny).
  • 36. Have fun! Daniel Jacob, UMR 1332 BFP – Metabolism Group, Bordeaux Metabolomics Facility, May 2016. Open Data for Access and Mining.
      - https://hub.docker.com/r/odam/getdata/
      - http://www.bordeaux.inra.fr/pmb/dataexplorer/ (an online example)
      - https://github.com/INRA/ODAM
      - https://cran.r-project.org/package=Rodam
      - https://zenodo.org/record/154041