Data Sharing Infrastructures to Foster Data Reuse
David Johnson
david.johnson@oerc.ox.ac.uk
@NuDataScientist
GARNet workshop on Integrating Large Data into Plant Science
21st April 2016
Philippe Rocca-Serra, PhD: Senior Research Lecturer
Alejandra Gonzalez-Beltran, PhD: Research Lecturer
Milo Thurston, PhD: Research Software Engineer
Massimiliano Izzo, PhD: Research Software Engineer
Peter McQuilton, PhD: Knowledge Engineer
Allyson Lister, PhD: Knowledge Engineer
Eamonn Maguire, DPhil: Software Engineer contractor
David Johnson, PhD: Research Software Engineer
Susanna-Assunta Sansone, PhD: Principal Investigator, Associate Director (consultant for Nature Publishing Group)
Our main areas of research and activity:
• Data collection, curation, representation etc.
• Data publication
• Data provenance
• Development of software, infrastructure
• Open, community ontologies and standards
• Semantic web
• Training
Communities we work with/for:
Notes in lab books (information for humans)
Spreadsheets and tables (the compromise)
Facts as RDF statements (information for machines)
From notes and narrative, through spreadsheets and tables, to linked data and data publication
Enabling reproducible research and open science, driving science and discoveries
Increase the level of annotation at the source, tracking provenance and using community standards
Maximize data discoverability and reuse
Applied research approach
Two well-established products with a large user base, embedded in many funded projects
Several community-driven ontologies and other standards, embedded in many funded projects
In the life sciences there are > 600 content standards:
• Guidelines, e.g. MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, ARRIVE, MIAPPE, MIASE, MIQE, MISFISHIE, REMARK, CONSORT…
• Formats, e.g. MAGE-Tab, GCDML, SRA XML, SOFT, FASTA, DICOM, mzML, SBRML, SED-ML, GelML, ISA-Tab, ISA-JSON, nmrML, CML, MITAB…
• Terminologies, e.g. AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO, TEDDY, PRO, XAO, DO, VO, CO…
Developed by standard organizations (de jure) and by grass-roots groups (de facto), e.g. the Nanotechnology Working Group
Databases and tools implementing standards; also training material on and around standards
Community-developed content standards (formats, terminologies, guidelines):
• To structure, enrich and report the description of datasets and the experimental context under which they were produced
Mapping the landscape of ‘standards’ in the life sciences
A web-based, curated and searchable registry ensuring that standards and databases are registered, informative and discoverable; monitoring the development and evolution of standards, their use in databases, and the adoption of both in data policies
1,400 records and growing
Also operating as a WG in …; run at …; also a contribution to …
Is there a database, implementing standards, where I can deposit my metagenomics dataset?
My funder’s data sharing policy recommends the use of established standards, but which ones are widely endorsed and applicable to my toxicological and clinical data?
Am I using the most up-to-date version of this terminology to annotate cell-based assays?
I understand this format has been deprecated; what has it been replaced by, and who is leading the work?
Are there databases implementing this exchange format, whose development we have funded?
What are the mature standards and standards-compliant databases we should recommend to our authors?
But how do we help users to make informed decisions?
The International Conference on Systems Biology (ICSB), 22-28 August, 2008. Susanna-Assunta Sansone, www.ebi.ac.uk/net-project
Search and filter to find what is relevant to your type of data
From simple and advanced search interfaces to the recommender system, powered by curated descriptions of each standard and database record, and their relations
Tracking evolution, e.g. deprecations and substitutions
Cross-linking standards to standards and databases:
• a model/format formalizing a reporting guideline
• a reporting guideline used by a model/format
We link (descriptions of) standards to related standards and to the databases implementing them
Standards and databases cross-linked
The ISA model and related formats
These tools and formats will help you to:
ISA powers data collection and curation resources and repositories.
Initiated in 2003, ISA continues to work with and for many domains.
ISA in a nutshell
Why the ISA format and tools?
ISA metadata specifications are:
• workflow and process orientated
• compatible with checklist enforcement
• compatible with external vocabulary resources
• compatible by design with existing schemas
1. Essentials about ISA-Tab syntax
● Investigation file: cardinality 1..1
– purpose: think “executive summary”
– layout: rows of key-value pairs organized in blocks
– content:
• Why? general study description
• How? methods / protocol declarations
• How? variable declarations (predictor and response variables)
• Who? contact and affiliation information
● Study file: cardinality 1..n
– layout: a true table, with a header row and one record per row (think “sorting and filtering samples”)
– content:
• What? all biological materials collected over the course of the study, and their treatments
● Assay file: cardinality 1..n
– layout: a true table, with a header row and one record per row (think “sorting and filtering data files”)
– content:
• What? all data acquisition events and data files collected by a given assay, and subsequent data transformations
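The file roles above can be illustrated with a minimal Python sketch that splits a simplified Investigation file into its key-value blocks and checks the cardinalities listed on the slide. It assumes a hypothetical, heavily trimmed file and uses only the standard library; a real Investigation file carries further sections (for example ONTOLOGY SOURCE REFERENCE and STUDY PROTOCOLS), and in practice the ISA tools themselves would do this parsing.

```python
import csv
import io
from collections import defaultdict

# Hypothetical, heavily trimmed Investigation file: an upper-case section
# header on its own row, then tab-separated "key<TAB>value(s)" rows.
INVESTIGATION_TXT = (
    "INVESTIGATION\n"
    "Investigation Identifier\tINV1\n"
    "Investigation Title\tExample investigation\n"
    "STUDY\n"
    "Study Identifier\tS1\n"
    "Study File Name\ts_study1.txt\n"
    "STUDY ASSAYS\n"
    "Study Assay File Name\ta_assay1.txt\ta_assay2.txt\n"
)


def parse_investigation(text):
    """Group key/value rows under the most recent upper-case section header."""
    sections = []  # list of (section name, {key: [values]})
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or not row[0]:
            continue  # skip blank rows
        key, values = row[0], row[1:]
        if key.isupper() and not values:
            sections.append((key, defaultdict(list)))  # start a new block
        elif sections:
            sections[-1][1][key].extend(v for v in values if v)
    return sections


sections = parse_investigation(INVESTIGATION_TXT)
investigations = [s for s in sections if s[0] == "INVESTIGATION"]
studies = [s for s in sections if s[0] == "STUDY"]
assays = [f for name, block in sections if name == "STUDY ASSAYS"
          for f in block["Study Assay File Name"]]

# Cardinalities from the slide: one Investigation, 1..n Study and Assay files.
assert len(investigations) == 1
assert len(studies) >= 1 and len(assays) >= 1
print(studies[0][1]["Study File Name"], assays)
```

The Investigation file only points to the Study and Assay files by name; those two are ordinary tables and can be read with `csv.DictReader` once their names are known.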
Protocols act on materials or data, defining workflows:
– The inputs and outputs of protocols are Material Nodes (Source Name, Sample Name, Extract Name, Labeled Extract Name) or Data Nodes (Raw Data File or Derived Data File)
Diagram: Material Nodes (Source, Sample, Extract) and Data Nodes (Raw Data File, Derived Data File) linked by Protocol Applications (e.g. a material transformation). Annotations shown include Characteristics[…], Factor Value[…] (independent variables), Material Type, Comment[…], Parameter Value[…], Date (day effect) and Performer (operator effect).
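One way to picture this node-and-protocol model is as a handful of Python dataclasses. This is a sketch under our own assumptions: the class and field names (MaterialNode, DataNode, ProtocolApplication) are hypothetical and do not reproduce the object model of any ISA library.

```python
from dataclasses import dataclass, field


@dataclass
class MaterialNode:
    """Hypothetical Material Node: Source, Sample, Extract or Labeled Extract."""
    name: str
    kind: str  # e.g. "Source Name" or "Sample Name"
    characteristics: dict = field(default_factory=dict)
    factor_values: dict = field(default_factory=dict)


@dataclass
class DataNode:
    """Hypothetical Data Node: a Raw Data File or a Derived Data File."""
    name: str
    kind: str  # "Raw Data File" or "Derived Data File"


@dataclass
class ProtocolApplication:
    """One application of a declared protocol, linking input nodes to output nodes."""
    protocol: str
    inputs: list
    outputs: list
    parameter_values: dict = field(default_factory=dict)
    date: str = ""       # recorded to expose a possible day effect
    performer: str = ""  # recorded to expose a possible operator effect


source = MaterialNode("AT1", "Source Name", {"organism": "Arabidopsis thaliana"})
sample = MaterialNode("AT1-sample1", "Sample Name", {"organ": "flower"})
collection = ProtocolApplication(
    "sample collection", [source], [sample],
    parameter_values={"storage condition": "liquid nitrogen"},
)
image = DataNode("AT1-sample1.png", "Raw Data File")  # hypothetical file name
imaging = ProtocolApplication("imaging", [sample], [image])

# Each protocol application is an edge from its inputs to its outputs.
for step in (collection, imaging):
    print(step.protocol, ":", step.inputs[0].name, "->", step.outputs[0].name)
```

Chaining applications this way yields a directed graph, which is exactly what the Study and Assay tables have to flatten.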
2. Basic coding patterns with ISA syntax
The task: rendering a graph in a table
– Branching events:
Graph: one source material (A. thaliana 1) branching into three sample materials (flower, mature leaf, root)
Source Name | Characteristics[organism] | Protocol REF | Parameter Value[storage condition] | Sample Name | Characteristics[organ]
AT1 | A. thaliana | sample collection | liquid nitrogen | AT1-sample1 | flower
AT1 | A. thaliana | sample collection | liquid nitrogen | AT1-sample2 | mature leaf
AT1 | A. thaliana | sample collection | liquid nitrogen | AT1-sample3 | root
– Pooling events:
Graph: three source materials (plant 1, plant 2, plant 3) pooled into one sample material (fruit)
Source Name | Characteristics[organism] | Protocol REF | Parameter Value[storage condition] | Sample Name | Characteristics[organ]
plant 1 | Fragaria ananassa | sample collection | liquid nitrogen | pool1 | fruit
plant 2 | Fragaria ananassa | sample collection | liquid nitrogen | pool1 | fruit
plant 3 | Fragaria ananassa | sample collection | liquid nitrogen | pool1 | fruit
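Read in reverse, the repeated Sample Name is what encodes the pooling event: grouping rows by Sample Name recovers how many sources feed each sample. A minimal sketch, using the three rows above (organism and protocol columns omitted) plus one hypothetical unpooled row (plant 4) for contrast:

```python
from collections import defaultdict

rows = [
    {"Source Name": "plant 1", "Sample Name": "pool1"},
    {"Source Name": "plant 2", "Sample Name": "pool1"},
    {"Source Name": "plant 3", "Sample Name": "pool1"},
    {"Source Name": "plant 4", "Sample Name": "sample4"},  # hypothetical, not pooled
]


def inputs_per_sample(rows):
    """Map each Sample Name to the set of Source Names that feed into it."""
    inputs = defaultdict(set)
    for row in rows:
        inputs[row["Sample Name"]].add(row["Source Name"])
    return inputs


# A sample with more than one distinct source is a pool.
pools = {sample: sorted(srcs) for sample, srcs in inputs_per_sample(rows).items()
         if len(srcs) > 1}
print(pools)  # {'pool1': ['plant 1', 'plant 2', 'plant 3']}
```

Branching and pooling are mirror images in the table: branching repeats the Source Name, pooling repeats the Sample Name.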
– Representing interventions and treatments
• Treatments are expressed as sets of factor levels
• Example: exposure to different doses of a systemic herbicide
• The factors are ‘compound’, ‘dose’ and ‘duration’ (what? how much? for how long?)
• Column order carries implicit meaning, but this convention is independent of the ISA syntax specification:
Source Name | Characteristics[organism] | Protocol REF | Factor Value[compound] | Factor Value[dose] | Factor Value[duration]
Plant 1 | Zea mays | treatment | glyphosate | 250 mg/day | 12 weeks
Plant 2 | Zea mays | treatment | glyphosate | 250 mg/day | 12 weeks
Plant 3 | Zea mays | treatment | glyphosate | 20 mg/day | 12 weeks
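Because each treatment is a set of factor levels, rows that share the same combination of Factor Value columns belong to the same treatment group. A minimal Python sketch over the three rows above; the tuple key follows the column order chosen here, which is an assumption of the sketch rather than anything the ISA syntax prescribes:

```python
from collections import defaultdict

# Rows from the treatment table above, keyed by column header.
rows = [
    {"Source Name": "Plant 1", "Factor Value[compound]": "glyphosate",
     "Factor Value[dose]": "250 mg/day", "Factor Value[duration]": "12 weeks"},
    {"Source Name": "Plant 2", "Factor Value[compound]": "glyphosate",
     "Factor Value[dose]": "250 mg/day", "Factor Value[duration]": "12 weeks"},
    {"Source Name": "Plant 3", "Factor Value[compound]": "glyphosate",
     "Factor Value[dose]": "20 mg/day", "Factor Value[duration]": "12 weeks"},
]

# What? How much? For how long?
FACTORS = ["Factor Value[compound]", "Factor Value[dose]", "Factor Value[duration]"]


def treatment_groups(rows, factors):
    """Group Source Names by their combination of factor levels."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[f] for f in factors)].append(row["Source Name"])
    return dict(groups)


for levels, members in treatment_groups(rows, FACTORS).items():
    print(levels, members)
```

Here Plant 1 and Plant 2 fall into one group and Plant 3 into another, since only the dose differs.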
– Tagging with terminologies
• ISA tools (ISAcreator, ISAconfigurator) provide ontology term selection and term tagging facilities to help users.
Source Name | Characteristics[ORGANISM] | Term Source REF | Term Accession Number | Characteristics[AGE] | Unit | Term Source REF | Term Accession Number | Factor Value[COMPOUND (http://purl…)] | Term Source REF | Term Accession Number
individual1 | Homo sapiens | NCBITax | 9606 | 12 | week | UO | UO:0000034 | aspirin | CHEBI | 15365
The same record without ontology annotation:
Source Name | Characteristics[ORGANISM] | Characteristics[AGE] | Factor Value[COMPOUND]
individual1 | human | 12 weeks | aspirin
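The annotated column pattern, where each tagged value is followed by its Term Source REF and Term Accession Number (attached to the unit when a Unit column is present), can be generated with a small helper. The helper name tagged_cells is hypothetical, and the accession values are given for illustration only; check them against the source ontologies before reuse.

```python
def tagged_cells(header, value, term, unit=None):
    """Header and row cells for one annotated column group.

    term is a (Term Source REF, Term Accession Number) pair; when a unit is
    given, the ontology tag describes the unit rather than the value.
    """
    source, accession = term
    if unit is None:
        return ([header, "Term Source REF", "Term Accession Number"],
                [value, source, accession])
    return ([header, "Unit", "Term Source REF", "Term Accession Number"],
            [value, unit, source, accession])


groups = [
    tagged_cells("Characteristics[ORGANISM]", "Homo sapiens", ("NCBITax", "9606")),
    tagged_cells("Characteristics[AGE]", "12", ("UO", "UO:0000034"), unit="week"),
    tagged_cells("Factor Value[COMPOUND]", "aspirin", ("CHEBI", "15365")),
]

# Concatenate the column groups after the Source Name column.
header = ["Source Name"] + [h for hs, _ in groups for h in hs]
row = ["individual1"] + [c for _, cs in groups for c in cs]
print(" | ".join(header))
print(" | ".join(row))
```

Keeping the tag in separate columns leaves the human-readable label intact while giving machines an unambiguous identifier.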
ISA syntax boundaries
● Any model is a compromise between granularity and simplicity
● Some cases are hard to represent:
– crossover designs with dissimilar arms
– mixtures of chemicals
– loops (with donors and recipients)
● These cases reach the limits of how efficiently graphs can be represented in tables
Plant H-T Phenotyping worked example
– A simple case of non-destructive high-throughput phenotyping (HTP):
– 60 genotypes × 5 replicates: 12 trays of 25 pots each
– 1 seed per pot gives us 300 individual plants
– experiment duration: 35 days
– single daily data acquisition:
• visible light: 3 angles + top view = 4 images
• near infrared: 3 angles + top view = 4 images
• fluorescence: 1 angle = 1 image
• TOTAL: 9 images per plant per day
– Grand total: 300 plants × 35 days × 9 images = 94,500 files to store and track
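The file count follows from simple arithmetic, which a few lines of Python make explicit and also break down per imaging modality:

```python
# Figures from the worked example.
GENOTYPES, REPLICATES, DAYS = 60, 5, 35
TRAYS, POTS_PER_TRAY = 12, 25
IMAGES_PER_DAY = {"visible light": 4, "near infrared": 4, "fluorescence": 1}

plants = GENOTYPES * REPLICATES  # 1 seed per pot
assert plants == TRAYS * POTS_PER_TRAY  # the trays hold exactly these pots

per_modality = {m: plants * DAYS * n for m, n in IMAGES_PER_DAY.items()}
total = sum(per_modality.values())

print(plants)        # 300
print(per_modality)  # {'visible light': 42000, 'near infrared': 42000, 'fluorescence': 10500}
print(total)         # 94500
```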
– Decomposing the experiment in terms of ISA elements
– Identifying key experimental variables:
• independent variables => used to define ISA Factors and/or Characteristics
– Factor = {genotype}, Factor Values[G1..G60] = 60 distinct values
– Factor = {day}, Factor Values[day1..day35] = 35 distinct values
• response variables => used to define 3 distinct ISA Assays
– morphology using visible light imaging
» ISA parameter to track ‘camera position’ {top, left, right, centre}
– water content using near infrared imaging
» ISA parameter to track ‘camera position’ {top, left, right, centre}
– photosynthetic pigment concentration using fluorescence imaging
» ISA parameter to track ‘camera position’ {top}
• Automatic creation and filling of the ISA Study sample file
– 60 × 35 = 2,100 factor-value combinations
– 5 replicates per factor combination => 10,500 rows (the 300 single-seed pots, each observed on each of the 35 days)
– Translated into:
» 1 ISA Study file with 10,500 rows following the pattern below
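The Study file can be generated rather than typed by hand by taking the Cartesian product of the factor levels and replicates. A minimal sketch, assuming hypothetical naming schemes for plants and samples and a hypothetical Protocol REF value:

```python
import itertools

GENOTYPES = [f"G{i}" for i in range(1, 61)]  # Factor Values G1..G60
DAYS = [f"day{i}" for i in range(1, 36)]     # Factor Values day1..day35
REPLICATES = range(1, 6)                     # 5 replicates


def study_rows():
    """Yield one Study-file row per genotype x day x replicate."""
    for genotype, day, rep in itertools.product(GENOTYPES, DAYS, REPLICATES):
        plant = f"{genotype}-plant{rep}"  # hypothetical naming scheme
        yield {
            "Source Name": plant,
            "Protocol REF": "growth and sampling",  # hypothetical protocol name
            "Sample Name": f"{plant}-{day}",
            "Factor Value[genotype]": genotype,
            "Factor Value[day]": day,
        }


rows = list(study_rows())
print(len(rows))                               # 10500
print(len({r["Source Name"] for r in rows}))   # 300
```

The same plant (Source Name) reappears once per day, which is how the non-destructive design shows up in the table.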
Figure: the Study file pattern, annotated with: declaring and annotating an ISA Source Node; an ISA Protocol Application with sets of Parameter Values resulting in an ISA Sample Node; reporting of independent variables as ISA Factor Values.
Figure: describing a data acquisition event, annotated with: an ISA Protocol Application of type Data Transformation with sets of Parameter Values resulting in an ISA Derived Data File; reporting of independent variables as ISA Factor Values.
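Each data acquisition event yields one assay row per camera position, which a later data transformation turns into a derived data file. A minimal sketch, in which the protocol names and the file naming scheme are hypothetical:

```python
# Camera positions per imaging assay, as listed in the worked example.
CAMERA_POSITIONS = {
    "visible light": ["top", "left", "right", "centre"],
    "near infrared": ["top", "left", "right", "centre"],
    "fluorescence": ["top"],
}

# ISA-Tab tables may repeat a header such as Protocol REF, so rows are kept
# as lists aligned with this header rather than as dicts.
HEADER = ["Sample Name", "Protocol REF", "Parameter Value[camera position]",
          "Raw Data File", "Protocol REF", "Derived Data File"]


def assay_rows(sample_name, modality):
    """One row per image: acquisition -> raw file -> transformation -> derived file."""
    tag = modality.replace(" ", "-")
    rows = []
    for position in CAMERA_POSITIONS[modality]:
        raw = f"{sample_name}_{tag}_{position}.png"  # hypothetical file naming
        derived = f"{sample_name}_{tag}_{position}_features.csv"  # hypothetical
        rows.append([sample_name, f"{modality} imaging", position,
                     raw, "image analysis", derived])
    return rows


print(" | ".join(HEADER))
for row in assay_rows("G1-plant1-day1", "fluorescence"):
    print(" | ".join(row))
```

Across the three assays this gives 9 rows per sample, matching the 9 images per plant per day.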
Collaborative Open Plant Omics
ISA tools in the Cloud
You can email us...
isatools@googlegroups.com
View our blog
http://isatools.org/blog
Follow us on Twitter
@isatools
@biosharing
View our websites
http://www.isa-tools.org
http://www.biosharing.org
View our Git repo & contribute
http://github.com/ISA-tools

More Related Content

What's hot

FAIR Data and Model Management for Systems Biology (and SOPs too!)
FAIR Data and Model Management for Systems Biology(and SOPs too!)FAIR Data and Model Management for Systems Biology(and SOPs too!)
FAIR Data and Model Management for Systems Biology (and SOPs too!)
Carole Goble
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Carole Goble
 
Let’s go on a FAIR safari!
Let’s go on a FAIR safari!Let’s go on a FAIR safari!
Let’s go on a FAIR safari!
Carole Goble
 
From data to knowledge – the Ondex System for integrating Life Sciences data ...
From data to knowledge – the Ondex System for integrating Life Sciences data ...From data to knowledge – the Ondex System for integrating Life Sciences data ...
From data to knowledge – the Ondex System for integrating Life Sciences data ...
Catherine Canevet
 
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
Carole Goble
 

What's hot (20)

Better Data for a Better World
Better Data for a Better WorldBetter Data for a Better World
Better Data for a Better World
 
Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017
 
FAIR data and model management for systems biology.
FAIR data and model management for systems biology.FAIR data and model management for systems biology.
FAIR data and model management for systems biology.
 
Reproducible and citable data and models: an introduction.
Reproducible and citable data and models: an introduction.Reproducible and citable data and models: an introduction.
Reproducible and citable data and models: an introduction.
 
Citing data in research articles: principles, implementation, challenges - an...
Citing data in research articles: principles, implementation, challenges - an...Citing data in research articles: principles, implementation, challenges - an...
Citing data in research articles: principles, implementation, challenges - an...
 
AgriSchemas: Sharing Agrifood data with Bioschemas
AgriSchemas: Sharing Agrifood data with BioschemasAgriSchemas: Sharing Agrifood data with Bioschemas
AgriSchemas: Sharing Agrifood data with Bioschemas
 
Overview of the NIH BD2K CEDAR centre, on metadata and standards
Overview of the NIH BD2K CEDAR centre, on metadata and standardsOverview of the NIH BD2K CEDAR centre, on metadata and standards
Overview of the NIH BD2K CEDAR centre, on metadata and standards
 
FAIR Data and Model Management for Systems Biology (and SOPs too!)
FAIR Data and Model Management for Systems Biology(and SOPs too!)FAIR Data and Model Management for Systems Biology(and SOPs too!)
FAIR Data and Model Management for Systems Biology (and SOPs too!)
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
 
Introduction to FAIRDOM
Introduction to FAIRDOMIntroduction to FAIRDOM
Introduction to FAIRDOM
 
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
 
Let’s go on a FAIR safari!
Let’s go on a FAIR safari!Let’s go on a FAIR safari!
Let’s go on a FAIR safari!
 
Improving the Management of Computational Models -- Invited talk at the EBI
Improving the Management of Computational Models -- Invited talk at the EBIImproving the Management of Computational Models -- Invited talk at the EBI
Improving the Management of Computational Models -- Invited talk at the EBI
 
From data to knowledge – the Ondex System for integrating Life Sciences data ...
From data to knowledge – the Ondex System for integrating Life Sciences data ...From data to knowledge – the Ondex System for integrating Life Sciences data ...
From data to knowledge – the Ondex System for integrating Life Sciences data ...
 
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
 
FAIR data management in biomedicine
FAIR data management  in biomedicineFAIR data management  in biomedicine
FAIR data management in biomedicine
 
Franz et al tdwg 2016 new developments for libraries of life
Franz et al tdwg 2016 new developments for libraries of lifeFranz et al tdwg 2016 new developments for libraries of life
Franz et al tdwg 2016 new developments for libraries of life
 
is there life between standards? Data interoperability for AI.
is there life between standards? Data interoperability for AI.is there life between standards? Data interoperability for AI.
is there life between standards? Data interoperability for AI.
 
Data cycle microbes
Data cycle microbesData cycle microbes
Data cycle microbes
 
NCBO Overview and Biositemaps
NCBO Overview and BiositemapsNCBO Overview and Biositemaps
NCBO Overview and Biositemaps
 

Viewers also liked

Viewers also liked (7)

Reproducible research: theory
Reproducible research: theoryReproducible research: theory
Reproducible research: theory
 
Importance and Challenges of Reproducible Research
Importance and Challenges of Reproducible ResearchImportance and Challenges of Reproducible Research
Importance and Challenges of Reproducible Research
 
Preparing Data for Sharing: The FAIR Principles
Preparing Data for Sharing: The FAIR PrinciplesPreparing Data for Sharing: The FAIR Principles
Preparing Data for Sharing: The FAIR Principles
 
FAIR data overview
FAIR data overviewFAIR data overview
FAIR data overview
 
Wikidata: Verifiable, Linked Open Knowledge That Anyone Can Edit
Wikidata: Verifiable, Linked Open Knowledge That Anyone Can EditWikidata: Verifiable, Linked Open Knowledge That Anyone Can Edit
Wikidata: Verifiable, Linked Open Knowledge That Anyone Can Edit
 
Towards metrics to assess and encourage FAIRness
Towards metrics to assess and encourage FAIRnessTowards metrics to assess and encourage FAIRness
Towards metrics to assess and encourage FAIRness
 
LIBER Webinar: Are the FAIR Data Principles really fair?
LIBER Webinar: Are the FAIR Data Principles really fair?LIBER Webinar: Are the FAIR Data Principles really fair?
LIBER Webinar: Are the FAIR Data Principles really fair?
 

Similar to GARNet workshop on Integrating Large Data into Plant Science

Scientific Data overview of Data Descriptors - WT Data-Literature integration...
Scientific Data overview of Data Descriptors - WT Data-Literature integration...Scientific Data overview of Data Descriptors - WT Data-Literature integration...
Scientific Data overview of Data Descriptors - WT Data-Literature integration...
Susanna-Assunta Sansone
 
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
Susanna-Assunta Sansone
 
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
Susanna-Assunta Sansone
 
Semantic Technologies for Big Sciences including Astrophysics
Semantic Technologies for Big Sciences including AstrophysicsSemantic Technologies for Big Sciences including Astrophysics
Semantic Technologies for Big Sciences including Astrophysics
Artificial Intelligence Institute at UofSC
 

Similar to GARNet workshop on Integrating Large Data into Plant Science (20)

NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data
NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific DataNIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data
NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data
 
Metadata challenges research and re-usable data - BioSharing, ISA and STATO
Metadata challenges research and re-usable data - BioSharing, ISA and STATOMetadata challenges research and re-usable data - BioSharing, ISA and STATO
Metadata challenges research and re-usable data - BioSharing, ISA and STATO
 
Scientific Data overview of Data Descriptors - WT Data-Literature integration...
Scientific Data overview of Data Descriptors - WT Data-Literature integration...Scientific Data overview of Data Descriptors - WT Data-Literature integration...
Scientific Data overview of Data Descriptors - WT Data-Literature integration...
 
FAIR data and NPG Scientific Data: RIKEN Yokohama, 25 June, 2014
FAIR data and NPG Scientific Data: RIKEN Yokohama, 25 June, 2014FAIR data and NPG Scientific Data: RIKEN Yokohama, 25 June, 2014
FAIR data and NPG Scientific Data: RIKEN Yokohama, 25 June, 2014
 
The Diversity of Biomedical Data, Databases and Standards (Research Data Alli...
The Diversity of Biomedical Data, Databases and Standards (Research Data Alli...The Diversity of Biomedical Data, Databases and Standards (Research Data Alli...
The Diversity of Biomedical Data, Databases and Standards (Research Data Alli...
 
ISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, Japan
ISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, JapanISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, Japan
ISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, Japan
 
RDA BioSharing WG + RDA Metabolomics IG OVERVIEWS
RDA BioSharing WG + RDA Metabolomics IG OVERVIEWSRDA BioSharing WG + RDA Metabolomics IG OVERVIEWS
RDA BioSharing WG + RDA Metabolomics IG OVERVIEWS
 
INSERM - Data Management & Reuse of Health Data - May 2017
INSERM - Data Management & Reuse of Health Data - May 2017INSERM - Data Management & Reuse of Health Data - May 2017
INSERM - Data Management & Reuse of Health Data - May 2017
 
FAIR: standards and services
FAIR: standards and servicesFAIR: standards and services
FAIR: standards and services
 
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
 
FAIR and metadata standards - FAIRsharing and Neuroscience
FAIR and metadata standards - FAIRsharing and NeuroscienceFAIR and metadata standards - FAIRsharing and Neuroscience
FAIR and metadata standards - FAIRsharing and Neuroscience
 
HKU Data Curation MLIM7350 Class 9
HKU Data Curation MLIM7350 Class 9 HKU Data Curation MLIM7350 Class 9
HKU Data Curation MLIM7350 Class 9
 
BioSharing - Update - Feb2016
BioSharing - Update - Feb2016BioSharing - Update - Feb2016
BioSharing - Update - Feb2016
 
NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
NC3Rs Publication Bias workshop - Sansone - Better Data = Better ScienceNC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
 
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
 
Sansone mibbi-intro
Sansone mibbi-introSansone mibbi-intro
Sansone mibbi-intro
 
Research data life cycle
Research data life cycleResearch data life cycle
Research data life cycle
 
Big Data Standards - Workshop, ExpBio, Boston, 2015
Big Data Standards - Workshop, ExpBio, Boston, 2015Big Data Standards - Workshop, ExpBio, Boston, 2015
Big Data Standards - Workshop, ExpBio, Boston, 2015
 
COPO kick-off meeting
COPO kick-off meetingCOPO kick-off meeting
COPO kick-off meeting
 
Semantic Technologies for Big Sciences including Astrophysics
Semantic Technologies for Big Sciences including AstrophysicsSemantic Technologies for Big Sciences including Astrophysics
Semantic Technologies for Big Sciences including Astrophysics
 

Recently uploaded

FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
shivangimorya083
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
Lars Albertsson
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 

Recently uploaded (20)

FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 

GARNet workshop on Integrating Large Data into Plant Science

  • 1. Data Sharing Infrastructures to Foster Data Reuse David Johnson david.johnson@oerc.ox.ac.uk @NuDataScientist Integrating Large Data into Plant Science workshop 21st April 2016
  • 2. Philippe Rocca-Serra, PhD, Senior Research Lecturer; Alejandra Gonzalez-Beltran, PhD, Research Lecturer; Milo Thurston, PhD, Research Software Engineer; Massimiliano Izzo, PhD, Research Software Engineer; Peter McQuilton, PhD, Knowledge Engineer; Allyson Lister, PhD, Knowledge Engineer; Eamonn Maguire, DPhil, Software Engineer (contractor); David Johnson, PhD, Research Software Engineer; Susanna-Assunta Sansone, PhD, Principal Investigator, Associate Director (consultant for Nature Publishing Group).
    Our main areas of research and activity: • Data collection, curation, representation etc. • Data publication • Data provenance • Development of software, infrastructure • Open, community ontologies and standards • Semantic web • Training
    Communities we work with/for:
  • 3. Notes in lab books (information for humans); spreadsheets and tables (the compromise); facts as RDF statements (information for machines). That is: notes and narrative; spreadsheets and tables; linked data and data publication.
    Enabling reproducible research and open science, driving science and discoveries: • Increase the level of annotation at the source, tracking provenance and using community standards • Maximize data discoverability and reuse
    Applied research approach: • Two well-established products with a large user base, embedded in many funded projects • Several community-driven ontologies and other standards, embedded in many funded projects
  • 4. In the life sciences there are more than 600 content standards (counts shown on the slide: 86, 349 and 200), implemented by databases and tools; there is also training material on and around standards.
    Guidelines: MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, ARRIVE, MIAPPE, MIASE, MIQE, MISFISHIE, …, REMARK, CONSORT
    Formats: MAGE-Tab, GCDML, SRA XML, SOFT, FASTA, DICOM, mzML, SBRML, SED-ML, …, GelML, ISA-Tab, CML, MITAB, nmrML, ISA-JSON
    Terminologies: AAO, ChEBI, OBI, PATO, ENVO, MOD, BTO, IDO, …, TEDDY, PRO, XAO, DO, VO, CO
  • 5. Community-developed content standards (formats, terminologies, guidelines), ranging from de facto to de jure, and from grass-roots groups to standards organizations (e.g. the Nanotechnology Working Group) • To structure, enrich and report the description of the datasets and the experimental context under which they were produced
  • 6. Mapping the landscape of ‘standards’ in the life sciences: a web-based, curated and searchable registry ensuring that standards and databases are registered, informative and discoverable; monitoring the development and evolution of standards, their use in databases, and the adoption of both in data policies. 1,400 records and growing.
  • 7. Mapping the landscape of ‘standards’ in the life sciences. 1,400 records and growing; also operating as a WG in …; run at …; is also a contribution to …
  • 8. Is there a database that implements standards, in which I can deposit my metagenomics dataset? My funder’s data sharing policy recommends the use of established standards, but which ones are widely endorsed and applicable to my toxicological and clinical data? Am I using the most up-to-date version of this terminology to annotate cell-based assays? I understand this format has been deprecated; what has it been replaced by, and who is leading the work? Are there databases implementing this exchange format, whose development we have funded? What are the mature standards and standards-compliant databases we should recommend to our authors? But how do we help users to make informed decisions?
  • 9. The International Conference on Systems Biology (ICSB), 22-28 August, 2008; Susanna-Assunta Sansone; www.ebi.ac.uk/net-project. Search and filter to find what is relevant to your type of data.
  • 10. From simple and advanced search interfaces to… the recommender system, powered by curated descriptions of each standard and database record, and of their relations.
  • 11. Tracking evolution, e.g. deprecations and substitutions
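Answering "this format has been deprecated; what replaced it?" amounts to following substitution links until reaching a record that has not been superseded. A minimal sketch, assuming each record stores an optional `replaced_by` name (a hypothetical field, not the registry's actual data model) and using made-up record names:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical registry record; field names are illustrative only."""
    name: str
    deprecated: bool = False
    replaced_by: str | None = None  # name of the substituting record, if any


def current_version(name: str, registry: dict[str, Record]) -> str:
    """Follow replaced_by links until reaching a record with no substitute."""
    seen: set[str] = set()
    while True:
        if name in seen:
            raise ValueError(f"substitution loop involving {name!r}")
        seen.add(name)
        record = registry[name]
        if record.replaced_by is None:
            return name
        name = record.replaced_by


# Made-up records: format-A was superseded by format-B, then by format-C.
registry = {
    r.name: r
    for r in [
        Record("format-A", deprecated=True, replaced_by="format-B"),
        Record("format-B", deprecated=True, replaced_by="format-C"),
        Record("format-C"),
    ]
}
print(current_version("format-A", registry))  # format-C
```

The `seen` set guards against a curation error that would otherwise send the lookup round a loop forever.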
  • 12. Cross-linking standards to standards and databases: model/format formalizing a reporting guideline --> <-- reporting guideline used by a model/format. We link (descriptions of) standards to related standards and to the databases implementing them.
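Bidirectional cross-links like these can be stored once and read from either end by deriving the inverse relation. A minimal sketch, assuming a plain list of (subject, relation, object) triples and made-up relation names (not the registry's schema); MIAME, MAGE-Tab and ArrayExpress serve only as familiar illustrative records:

```python
from collections import defaultdict

# Each link is stored once, in one direction; relation names are hypothetical.
LINKS = [
    ("MAGE-Tab", "formalizes", "MIAME"),
    ("ArrayExpress", "implements", "MAGE-Tab"),
]
INVERSE = {"formalizes": "formalized by", "implements": "implemented by"}


def build_index(links):
    """Index every link under both of its ends, adding the inverse relation."""
    index = defaultdict(list)
    for subject, relation, obj in links:
        index[subject].append((relation, obj))
        index[obj].append((INVERSE[relation], subject))
    return dict(index)


index = build_index(LINKS)
print(index["MIAME"])     # [('formalized by', 'MAGE-Tab')]
print(index["MAGE-Tab"])  # [('formalizes', 'MIAME'), ('implemented by', 'ArrayExpress')]
```

Storing a single direction avoids the two halves of a link drifting apart when a record is edited.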
  • 13. Standards and databases cross-linked
  • 14.
  • 15. ISA model and related formats. These tools and formats will help you to:
  • 16. ISA model and related formats: ISA powers data collection and curation resources, and repositories. Initiated in 2003, it continues to work with/for many domains.
  • 17. ISA in a nutshell
  • 18. Why the ISA format and tools? ISA metadata specifications are: • workflow and process oriented • compatible with checklist enforcement • compatible with external vocabulary resources • compatible by design with existing schemas
  • 19. 1. Essentials about ISA-Tab syntax
    ● Investigation File: cardinality 1..1
      – purpose: think “executive summary”
      – layout: rows of key-value pairs organized in blocks
      – content: • Why? general study description • How? methods/protocol declarations • How? variable declarations (predictor and response variables) • Who? contact and affiliation information
    ● Study File: cardinality 1..n
      – layout: a true table, with a header row and one record per row (think “sorting, filtering of samples”)
      – content: • What? all biological materials collected over the course of the study, and their treatments
    ● Assay File: cardinality 1..n
      – layout: a true table, with a header row and one record per row (think “sorting, filtering of data files”)
      – content: • What? all data acquisition events and data files collected by a given assay, and subsequent data transformations
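To make the investigation-file layout concrete, here is a minimal sketch that groups tab-separated key/value rows under all-caps section headers and checks the 1..1 investigation / 1..n study cardinalities. The section and key names are modelled on ISA-Tab conventions but should be treated as assumptions; this is not the official ISA tools parser:

```python
from __future__ import annotations

from collections import defaultdict


def parse_investigation(text: str) -> dict[str, list[dict[str, list[str]]]]:
    """Group tab-separated key/value rows into blocks keyed by section header."""
    blocks: dict[str, list[dict[str, list[str]]]] = defaultdict(list)
    current: dict[str, list[str]] | None = None
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if "\t" not in line and line.isupper():
            # An all-caps line such as INVESTIGATION or STUDY opens a new block.
            current = {}
            blocks[line.strip()].append(current)
        elif current is None:
            raise ValueError(f"key/value row before any section header: {line!r}")
        else:
            key, *values = line.split("\t")
            current[key] = values
    return dict(blocks)


def check_cardinality(blocks: dict) -> list[str]:
    """Check the 1..1 investigation and 1..n study cardinalities."""
    problems = []
    if len(blocks.get("INVESTIGATION", [])) != 1:
        problems.append("expected exactly one INVESTIGATION block")
    studies = blocks.get("STUDY", [])
    if not studies:
        problems.append("expected at least one STUDY block")
    for study in studies:
        if not study.get("Study File Name"):
            problems.append("a STUDY block names no study file")
    return problems


# Hypothetical minimal investigation file.
example = "\n".join([
    "INVESTIGATION",
    "Investigation Title\tPlant phenotyping demo",
    "STUDY",
    "Study File Name\ts_study.txt",
    "STUDY ASSAYS",
    "Study Assay File Name\ta_imaging.txt",
])
blocks = parse_investigation(example)
print(sorted(blocks))             # ['INVESTIGATION', 'STUDY', 'STUDY ASSAYS']
print(check_cardinality(blocks))  # []
```

Keeping every value as a list reflects that one key can carry several tab-separated values (for example, one per contact).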
  • 20. 1. Essentials about ISA syntax: Protocols act on Materials or Data, defining workflows.
    – Inputs and outputs of Protocols are Material Nodes (Source Name, Sample Name, Extract Name, Labeled Extract Name) or Data Nodes (Raw Data File, Derived Data File).
    – Node annotations: Characteristics[…], Factor Value[…] (independent variables), Material Type, Comment[…].
    – Protocol Application annotations: Date (day effect), Performer (operator effect), Parameter Value[…].
    – Example chain: Sample → Extract (a Material Transformation) → Raw Data File → Derived Data File.
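Because every node is linked to the protocol application that produced it, provenance can be recovered by walking the workflow backwards. A minimal sketch, assuming hypothetical in-memory classes (not the ISA tools API) and made-up node names for the Sample → Extract → Raw Data File → Derived Data File chain:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A Material Node or Data Node, identified by its name and column type."""
    name: str
    kind: str  # e.g. "Sample Name", "Extract Name", "Raw Data File"


@dataclass
class ProtocolApplication:
    """A protocol acting on input nodes to produce output nodes."""
    protocol: str
    inputs: list[Node]
    outputs: list[Node]
    date: str | None = None        # recorded to detect day effects
    performer: str | None = None   # recorded to detect operator effects
    parameter_values: dict[str, str] = field(default_factory=dict)


def upstream(target: Node, applications: list[ProtocolApplication]) -> list[Node]:
    """List every node upstream of target, in depth-first discovery order."""
    produced_by = {out: app for app in applications for out in app.outputs}
    seen: set[Node] = set()
    order: list[Node] = []
    stack = [target]
    while stack:
        app = produced_by.get(stack.pop())
        if app is None:
            continue  # a root node: no protocol application produced it
        for node in app.inputs:
            if node not in seen:
                seen.add(node)
                order.append(node)
                stack.append(node)
    return order


# Hypothetical chain with made-up names.
sample = Node("AT1-sample1", "Sample Name")
extract = Node("AT1-extract1", "Extract Name")
raw = Node("AT1-raw1.txt", "Raw Data File")
derived = Node("AT1-derived1.txt", "Derived Data File")
applications = [
    ProtocolApplication("extraction", [sample], [extract], date="day 1", performer="operator A"),
    ProtocolApplication("data acquisition", [extract], [raw]),
    ProtocolApplication("normalization", [raw], [derived]),
]
print([node.name for node in upstream(derived, applications)])
# ['AT1-raw1.txt', 'AT1-extract1', 'AT1-sample1']
```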
  • 21. 2. Basic coding patterns with ISA syntax. The task: rendering a graph in a table.
  • 22. – Branching events: one source material (AT1, A. thaliana) yields three sample materials (flower, mature leaf, root).
    Source Name | Characteristics[organism] | Protocol REF | Parameter Value[storage condition] | Sample Name | Characteristics[organ]
    AT1 | A. thaliana | sample collection | liquid nitrogen | AT1-sample1 | flower
    AT1 | A. thaliana | sample collection | liquid nitrogen | AT1-sample2 | mature leaf
    AT1 | A. thaliana | sample collection | liquid nitrogen | AT1-sample3 | root
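The branching pattern renders a one-to-many graph as one row per source-to-sample edge, repeating the source and protocol columns on every row. A minimal sketch using the values from the branching example above; the dictionary layout is a hypothetical in-memory representation:

```python
import csv
import io

# Values from the branching example; the dict layout itself is hypothetical.
SOURCE = {"Source Name": "AT1", "Characteristics[organism]": "A. thaliana"}
PROTOCOL = {"Protocol REF": "sample collection",
            "Parameter Value[storage condition]": "liquid nitrogen"}
SAMPLES = [
    {"Sample Name": "AT1-sample1", "Characteristics[organ]": "flower"},
    {"Sample Name": "AT1-sample2", "Characteristics[organ]": "mature leaf"},
    {"Sample Name": "AT1-sample3", "Characteristics[organ]": "root"},
]


def branch_rows(source, protocol, samples):
    """Render a one-to-many branch as one row per edge (source -> sample)."""
    return [{**source, **protocol, **sample} for sample in samples]


def to_tsv(rows):
    """Write rows as tab-separated text, taking column order from the first row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]),
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_tsv(branch_rows(SOURCE, PROTOCOL, SAMPLES)))
```

Dict-based rows are convenient here, but they cannot hold ISA-Tab's repeated headers (such as several `Term Source REF` columns); a list-of-lists layout is needed once terms are tagged.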
  • 23. – Pooling events: three source materials (plant 1, plant 2, plant 3; Fragaria ananassa) are pooled into one sample material (pool1, fruit).
    Source Name | Characteristics[organism] | Protocol REF | Parameter Value[storage condition] | Sample Name | Characteristics[organ]
    plant 1 | Fragaria ananassa | sample collection | liquid nitrogen | pool1 | fruit
    plant 2 | Fragaria ananassa | sample collection | liquid nitrogen | pool1 | fruit
    plant 3 | Fragaria ananassa | sample collection | liquid nitrogen | pool1 | fruit
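Going the other way, a reader of the table can recover the pooling edges by grouping rows on `Sample Name`: any sample fed by more than one source is a pool. A minimal sketch over the pooling table above, written as tab-separated text:

```python
import csv
import io
from collections import defaultdict

# Pooling table from the slide, as tab-separated text.
STUDY_TSV = "\n".join([
    "Source Name\tCharacteristics[organism]\tProtocol REF\t"
    "Parameter Value[storage condition]\tSample Name\tCharacteristics[organ]",
    "plant 1\tFragaria ananassa\tsample collection\tliquid nitrogen\tpool1\tfruit",
    "plant 2\tFragaria ananassa\tsample collection\tliquid nitrogen\tpool1\tfruit",
    "plant 3\tFragaria ananassa\tsample collection\tliquid nitrogen\tpool1\tfruit",
])


def sources_by_sample(tsv_text):
    """Rebuild the many-to-one edges: which source materials feed each sample."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    edges = defaultdict(list)
    for row in reader:
        edges[row["Sample Name"]].append(row["Source Name"])
    return dict(edges)


def pooled_samples(tsv_text):
    """Samples fed by more than one source are pools."""
    return {sample: sources
            for sample, sources in sources_by_sample(tsv_text).items()
            if len(sources) > 1}


print(pooled_samples(STUDY_TSV))  # {'pool1': ['plant 1', 'plant 2', 'plant 3']}
```

`csv.DictReader` only works while column headers are unique; real ISA-Tab files repeat headers such as `Term Source REF`, so a production reader would work from column positions instead.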
  • 24. – Representing interventions and treatments
    • expressing treatments as sets of factor levels
    • example: exposure to different doses of a systemic herbicide
    • the factors are ‘compound’, ‘dose’ and ‘duration’ (what? how much? for how long?)
    • column order carries implicit meaning, but this is independent of the ISA syntax specification:
    Source Name | Characteristics[organism] | Protocol REF | Factor Value[compound] | Factor Value[dose] | Factor Value[duration]
    Plant 1 | Zea mays | treatment | glyphosate | 250 mg/day | 12 weeks
    Plant 2 | Zea mays | treatment | glyphosate | 250 mg/day | 12 weeks
    Plant 3 | Zea mays | treatment | glyphosate | 20 mg/day | 12 weeks
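Expressing treatments as sets of factor levels means each distinct combination of `Factor Value[…]` columns identifies one treatment group. A minimal sketch over the herbicide table above (the grouping helper is a hypothetical illustration, not part of the ISA tools):

```python
from collections import defaultdict

# Treatment table from the slide (header plus one list per row).
HEADER = ["Source Name", "Characteristics[organism]", "Protocol REF",
          "Factor Value[compound]", "Factor Value[dose]", "Factor Value[duration]"]
TABLE = [
    ["Plant 1", "Zea mays", "treatment", "glyphosate", "250 mg/day", "12 weeks"],
    ["Plant 2", "Zea mays", "treatment", "glyphosate", "250 mg/day", "12 weeks"],
    ["Plant 3", "Zea mays", "treatment", "glyphosate", "20 mg/day", "12 weeks"],
]
ROWS = [dict(zip(HEADER, values)) for values in TABLE]


def treatment_groups(rows, header):
    """Group sources by their combination of factor levels (one group per treatment)."""
    factor_columns = [c for c in header if c.startswith("Factor Value[")]
    groups = defaultdict(list)
    for row in rows:
        level = tuple(row[c] for c in factor_columns)
        groups[level].append(row["Source Name"])
    return dict(groups)


for level, sources in treatment_groups(ROWS, HEADER).items():
    print(level, sources)
# ('glyphosate', '250 mg/day', '12 weeks') ['Plant 1', 'Plant 2']
# ('glyphosate', '20 mg/day', '12 weeks') ['Plant 3']
```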
  • 25. – Tagging with terminologies
    • ISA tools (ISAcreator, ISAconfigurator) provide ontology term selection and term tagging facilities to help users.
    Tagged:
    Source Name | Characteristics[ORGANISM] | Term Source REF | Term Accession Number | Characteristics[AGE] | Unit | Term Source REF | Term Accession Number | Factor Value[COMPOUND (http://purl…)] | Term Source REF | Term Accession Number
    individual1 | Homo sapiens | NCBITax | 9606 | 12 | week | UO | UO:wwer wta | aspirin | CHEBI | 1231354
    Untagged:
    Source Name | Characteristics[ORGANISM] | Characteristics[AGE] | Factor Value[COMPOUND]
    individual1 | human | 12 weeks | aspirin
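Tagging turns one free-text cell into a value plus `Term Source REF` and `Term Accession Number` columns. A minimal sketch of that expansion, assuming a hypothetical `OntologyAnnotation` class (not the ISA tools API) and using only the organism term from the slide:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class OntologyAnnotation:
    """A term plus the terminology it comes from; field names are illustrative."""
    term: str
    term_source: str  # e.g. "NCBITax"
    accession: str    # e.g. "9606"


def render(columns: list[tuple[str, OntologyAnnotation | str]]):
    """Expand tagged values into value / Term Source REF / Term Accession Number."""
    header: list[str] = []
    row: list[str] = []
    for column, value in columns:
        header.append(column)
        if isinstance(value, OntologyAnnotation):
            header += ["Term Source REF", "Term Accession Number"]
            row += [value.term, value.term_source, value.accession]
        else:
            row.append(value)  # free text: a single column, no tagging
    return header, row


tagged = render([("Source Name", "individual1"),
                 ("Characteristics[ORGANISM]",
                  OntologyAnnotation("Homo sapiens", "NCBITax", "9606"))])
untagged = render([("Source Name", "individual1"),
                   ("Characteristics[ORGANISM]", "human")])
print(tagged)
print(untagged)
```

The tagged form is what makes "human" and "Homo sapiens" recognisable as the same organism to software.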
  • 26. ISA syntax boundaries ● Any model is a compromise between granularity and simplicity ● Some cases are hard to represent: – crossover designs with dissimilar arms – mixtures of chemicals – loops (with donors and recipients) ● We reach the limits of how efficiently graphs can be represented in tables
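One way to see why loops are hard: rendering a graph as one row per path only terminates for an acyclic graph, since a cycle yields unboundedly long paths. A minimal sketch, assuming the workflow is held as a hypothetical adjacency dict, that finds such a loop with a depth-first search before any attempt to flatten (node names are made up):

```python
from __future__ import annotations


def find_cycle(graph: dict[str, list[str]]) -> list[str] | None:
    """Return one cycle as a list of node names, or None if the graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour: dict[str, int] = {}
    path: list[str] = []

    def visit(node: str) -> list[str] | None:
        colour[node] = GREY
        path.append(node)
        for nxt in graph.get(node, []):
            state = colour.get(nxt, WHITE)
            if state == GREY:  # back edge: nxt is already on the current path
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for start in list(graph):
        if colour.get(start, WHITE) == WHITE:
            found = visit(start)
            if found:
                return found
    return None


print(find_cycle({"donor": ["material"], "material": ["recipient"], "recipient": ["donor"]}))
# ['donor', 'material', 'recipient', 'donor']
print(find_cycle({"AT1": ["AT1-sample1", "AT1-sample2"]}))  # None
```

The recursive search is fine for workflow-sized graphs; very deep graphs would need an iterative version to stay within Python's recursion limit.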
  • 27. Plant H-T Phenotyping worked example
    – A case of simple, non-destructive HTP:
    – 60 genotypes × 5 replicates: 12 trays of 25 pots each
    – 1 seed per pot gives us 300 individual plants
    – experiment duration: 35 days
    – single daily data acquisition: • visible light: 3 angles + top view = 4 images • near infrared: 3 angles + top view = 4 images • fluorescence: 1 angle = 1 image • TOTAL: 9 images per plant per day
    – Grand total: 94,500 files to store and track
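The file count follows directly from the design parameters: 300 plants × 35 days × 9 images per plant per day. A short check, using only the numbers given on the slide:

```python
# Design parameters copied from the slide.
GENOTYPES, REPLICATES = 60, 5
TRAYS, POTS_PER_TRAY = 12, 25
DAYS = 35
IMAGES_PER_PLANT_PER_DAY = {
    "visible light": 3 + 1,  # 3 angles + top view
    "near infrared": 3 + 1,  # 3 angles + top view
    "fluorescence": 1,       # 1 angle
}

plants = GENOTYPES * REPLICATES
assert plants == TRAYS * POTS_PER_TRAY, "one seed per pot: plants must fill the trays"
images_per_day = sum(IMAGES_PER_PLANT_PER_DAY.values())
total_files = plants * DAYS * images_per_day

print(plants, images_per_day, f"{total_files:,}")  # 300 9 94,500
```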
  • 28. – Decomposing the experiment in terms of ISA elements
    – Identifying key experimental variables:
      • independent variables => used to define ISA Factors and/or Characteristics
        – Factor = {genotype}, Factor Values[G1..G60] = 60 distinct values
        – Factor = {day}, Factor Values[day1..day35] = 35 distinct values
      • response variables => used to define 3 distinct ISA Assays
        – morphology using visible light imaging » ISA parameters to track ‘camera position’ {top, left, right, centre}
        – water content using near infrared imaging » ISA parameters to track ‘camera position’ {top, left, right, centre}
        – photosynthetic pigment concentration using fluorescence imaging » ISA parameters to track ‘camera position’ {top}
  • 29. – Decomposing the experiment in terms of ISA elements
    – Identifying key experimental variables:
      • independent variables => used to define ISA Factors and/or Characteristics
        – Factor = {genotype}, Factor Values[G1..G60] = 60 distinct values
        – Factor = {day}, Factor Values[day1..day35] = 35 distinct values
      • Automatic creation and filling of ISA Study Sample files
        – 60 × 35 = 2,100 factor combinations
        – 5 replicates per factor combination => 10,500 samples (the 300 plants, 1 seed per pot, each sampled non-destructively on each of the 35 days)
        – Translated into: » 1 ISA study file with 10,500 rows, following this pattern:
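Automatic filling of the study file amounts to enumerating every genotype × replicate × day combination. A minimal sketch; the column set, the plant naming scheme and the protocol name are hypothetical, chosen only to show that 300 plants sampled on 35 days give 10,500 rows covering 2,100 factor combinations:

```python
from itertools import product

GENOTYPES = [f"G{i}" for i in range(1, 61)]  # Factor Values G1..G60
DAYS = [f"day{i}" for i in range(1, 36)]     # Factor Values day1..day35
REPLICATES = range(1, 6)                     # 5 replicates


def study_rows():
    """Yield one hypothetical study-file row per genotype x replicate x day."""
    for genotype, replicate, day in product(GENOTYPES, REPLICATES, DAYS):
        plant = f"{genotype}-rep{replicate}"  # hypothetical naming scheme
        yield {
            "Source Name": plant,
            "Protocol REF": "plant imaging session",  # hypothetical protocol name
            "Sample Name": f"{plant}-{day}",
            "Factor Value[genotype]": genotype,
            "Factor Value[day]": day,
        }


rows = list(study_rows())
print(len(rows))                              # 10500
print(len({r["Source Name"] for r in rows}))  # 300
print(len({(r["Factor Value[genotype]"], r["Factor Value[day]"]) for r in rows}))  # 2100
```

`csv.DictWriter` with `delimiter="\t"` can then serialize these rows into the study file.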
  • 30. Declaring and annotating an ISA Source Node • ISA Protocol Application with sets of Parameter Values resulting in an ISA Sample Node • Reporting of independent variables as ISA Factor Values
  • 31. – Decomposing the experiment in terms of ISA elements
    – Identifying key experimental variables:
      • response variables => used to define 3 distinct ISA Assays
        – morphology using visible light imaging » ISA parameters to track ‘camera position’ {top, left, right, centre}
        – water content using near infrared imaging » ISA parameters to track ‘camera position’ {top, left, right, centre}
        – photosynthetic pigment concentration using fluorescence imaging » ISA parameters to track ‘camera position’ {top}
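Each assay file then needs one row per sample and camera position, with `camera position` recorded as a Parameter Value. A minimal sketch; the short assay codes, column set, plant names and file naming are hypothetical, while the camera positions come from the slide:

```python
from itertools import product

# 'camera position' Parameter Values per assay, as listed on the slide.
ASSAYS = {
    "vis": ["top", "left", "right", "centre"],   # morphology, visible light imaging
    "nir": ["top", "left", "right", "centre"],   # water content, near infrared imaging
    "fluo": ["top"],                             # photosynthetic pigments, fluorescence
}
PLANTS = [f"G{g}-rep{r}" for g in range(1, 61) for r in range(1, 6)]  # hypothetical names
DAYS = [f"day{d}" for d in range(1, 36)]


def assay_rows(assay, positions):
    """Yield one hypothetical assay-file row per sample and camera position."""
    for plant, day, position in product(PLANTS, DAYS, positions):
        sample = f"{plant}-{day}"
        yield {
            "Sample Name": sample,
            "Parameter Value[camera position]": position,
            "Raw Data File": f"{assay}/{sample}-{position}.png",  # hypothetical naming
        }


counts = {assay: sum(1 for _ in assay_rows(assay, positions))
          for assay, positions in ASSAYS.items()}
print(counts)                # {'vis': 42000, 'nir': 42000, 'fluo': 10500}
print(sum(counts.values()))  # 94500
```

Deriving the rows from the declared parameter values, rather than listing files by hand, keeps each assay file consistent with the experimental design.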
  • 32. Describing a data acquisition event • ISA Protocol Application of type Data Transformation with sets of Parameter Values resulting in an ISA Derived Data File • Reporting of independent variables as ISA Factor Values
  • 33.
  • 35. ISA tools in the Cloud
  • 36. You can email us: isatools@googlegroups.com
    View our blog: http://isatools.org/blog
    Follow us on Twitter: @isatools, @biosharing
    View our websites: http://www.isa-tools.org, http://www.biosharing.org
    View our Git repo and contribute: http://github.com/ISA-tools