ISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, Japan
1. ISA-Tab as a COSMOS standard
Metabolomics Data Standards and Capture Workshop
Metabolomics Society Meeting 2014, Tsuruoka, Japan
Philippe Rocca-Serra (PhD)
University of Oxford e-Research Centre
2. Data exchange: let information flow!
• Tenets of science: reproducibility of results and findings
• justifies the right to access data
• publishing a manuscript is no longer enough
• data should be published and released alongside it
• A GEO or an ArrayExpress for metabolomics data
• What would you do if you had access to 25,000 studies in metabolomics today?
3. Data Provenance and Preservation
It is all about structuring experimental information to make it available to computer and software agents to enable:
Notes in Lab Books (information for humans)
Spreadsheets and Tables (the compromise)
Facts as RDF statements (information for machines)
4. Exchange as Main Goal
• Exchange of experimental description: the Study Plan
• description of subjects and perturbations: ISA-Tab
• Exchange of spectral acquisition files: the Raw Data
• enables review, assessment, appraisal, reuse: mzML, nmrML
• Exchange of findings: the Results and Interpretation
• identified metabolites: mzTab and Metabolite Annotation File
5. The essential value of Contextual Data or Metadata
• “Data about the Data”
– description of the data (descriptive metadata)
• Lazy way: the “it is all in the file name” approach
CNL_MOA1_C2_LD_TP1_EWR.cdf
• Is this enough to understand what this experiment is about... 5 years from now?
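To make the point concrete, here is a minimal Python sketch. It splits the file name shown on the slide into its underscore-separated tokens. The slides do not say what any token means, so the sketch deliberately does not guess: all it can recover is an ordered list of unlabeled codes.

```python
from pathlib import PurePath

# File name taken from the slide; the meaning of its tokens is not documented.
filename = "CNL_MOA1_C2_LD_TP1_EWR.cdf"

path = PurePath(filename)
tokens = path.stem.split("_")  # name without the ".cdf" extension
extension = path.suffix

# No field names, no units, no link to a protocol or a study design:
# the reader has to know the naming convention by heart.
for position, token in enumerate(tokens, start=1):
    print(f"token {position}: {token!r} (meaning unknown)")
print(f"extension: {extension!r}")
```

Explicit, labeled metadata columns avoid this dependence on an unwritten naming convention.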
6. ISA-Tab format in a nutshell (I)
ISA metadata specifications:
• workflow and process orientated
• compatible with checklist enforcement
• compatible with external vocabulary resources
• compatible by design with existing schemas
7. ISA-Tab format in a nutshell (II)
• Investigation File: cardinality: 1..1
– purpose: think “executive summary”
– layout: rows of key-value pairs organized in blocks
– content:
• Why? general study description
• How? methods / protocol declaration
• How? variable declarations (predictor and response variables)
• Who? contact and affiliation information
• Study File: cardinality: 1..n
– layout: true header/row-of-records table (think “sorting, filtering of samples”)
– content:
• What? Listing all biological materials collected over the course of the study and their treatments.
• Assay File: cardinality: 1..n
– layout: true header/row-of-records table (think “sorting, filtering of data files”)
– content:
• What? Listing all data acquisition events and data files collected by a given assay, and subsequent data transformations
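The "rows of key-value pairs organized in blocks" layout of the Investigation File can be sketched in a few lines of Python. This is only an illustration, assuming tab-separated text in which a line written in capitals opens a block. The titles, identifiers and file names below are made up, and the function is not the ISAcreator parser.

```python
import csv
import io

# Hypothetical, heavily trimmed Investigation file (tab-separated).
# Upper-case lines open a block; the other lines are key/value rows.
INVESTIGATION_TEXT = (
    "INVESTIGATION\n"
    "Investigation Title\tExample metabolomics investigation\n"
    "STUDY\n"
    "Study Identifier\tS1\n"
    "Study File Name\ts_study.txt\n"
    "STUDY ASSAYS\n"
    "Study Assay File Name\ta_assay.txt\n"
)


def parse_investigation(text):
    """Group key/value rows under the block heading that precedes them."""
    blocks = {}
    current = None
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip empty lines
        label = row[0].strip()
        if len(row) == 1 and label.isupper():
            current = blocks.setdefault(label, {})
        elif current is not None:
            # Keep a list: one key may carry several values on its row.
            current[label] = row[1:]
    return blocks


blocks = parse_investigation(INVESTIGATION_TEXT)
print(blocks["STUDY"]["Study File Name"])
print(blocks["STUDY ASSAYS"]["Study Assay File Name"])
```

The Investigation File acts as the entry point: it names the Study and Assay files, which are then read as ordinary header/row tables.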
8. Worked example - ISA Study Sample File: describing study subjects and their features
ISA syntax: Characteristics[<tag>], declaring and annotating an ISA Source Name or Sample Name
ISA syntax: Protocol REF with sets of Parameter Value[<tag>], resulting in an ISA node Sample Name
ISA syntax: Factor Value[<tag>], for reporting treatments or study groups as a set of levels of independent variables
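A small Python sketch may help to see how these column headers fit together in one study sample table. The header names follow the ISA syntax named on the slide. The tags (organism, anticoagulant, treatment) and every value are hypothetical and chosen only for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical study sample table (tab-separated); all values are made up.
STUDY_TABLE = (
    "Source Name\tCharacteristics[organism]\tProtocol REF\t"
    "Parameter Value[anticoagulant]\tSample Name\tFactor Value[treatment]\n"
    "subject1\tHomo sapiens\tsample collection\tEDTA\tsample1\tcontrol\n"
    "subject2\tHomo sapiens\tsample collection\tEDTA\tsample2\tcompound A\n"
    "subject3\tHomo sapiens\tsample collection\tEDTA\tsample3\tcompound A\n"
)

rows = list(csv.DictReader(io.StringIO(STUDY_TABLE), delimiter="\t"))

# Group samples by the level of the independent variable (the study factor).
groups = defaultdict(list)
for row in rows:
    groups[row["Factor Value[treatment]"]].append(row["Sample Name"])

for level, samples in groups.items():
    print(level, samples)
```

Because the treatment is an explicit Factor Value column, study groups can be recovered by a plain sort or filter rather than by decoding sample names.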
9. Worked example - ISA Assay File: reporting signal acquisition events
ISA pattern for LC-MS: split into 2 distinct assay tables, one per scan polarity
ISA pattern for GC-MS: report derivatization as an extra sample preparation step
ISA pattern for NMR (figure not recovered)
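The LC-MS pattern (one assay table per scan polarity) can be sketched as follows. This is a minimal illustration in Python, not the output of any ISA tool. The column names are assumptions modelled on the ISA syntax used in these slides, and the sample names and file names are invented.

```python
import csv
import io

# Hypothetical LC-MS acquisition records; names and values are made up.
ACQUISITIONS = (
    "Sample Name\tParameter Value[scan polarity]\tRaw Spectral Data File\n"
    "sample1\tpositive\tsample1_pos.mzML\n"
    "sample1\tnegative\tsample1_neg.mzML\n"
    "sample2\tpositive\tsample2_pos.mzML\n"
    "sample2\tnegative\tsample2_neg.mzML\n"
)


def split_by_polarity(text):
    """Return the header and one list of rows per scan polarity."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    tables = {}
    for row in reader:
        tables.setdefault(row["Parameter Value[scan polarity]"], []).append(row)
    return reader.fieldnames, tables


def to_tsv(fieldnames, rows):
    """Serialize rows back to tab-separated text with the same header."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


fieldnames, tables = split_by_polarity(ACQUISITIONS)
for polarity, rows in tables.items():
    print(f"--- assay table for {polarity} mode ---")
    print(to_tsv(fieldnames, rows))
```

Each resulting table keeps the full header, so both can be sorted and filtered independently.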
10.
11. Dealing with Diversity: ISA configurations for ISAcreator
• Different kinds of experiments, different annotation needs
• CIMR ISA configurations to deal with biological specifics
• Clinical context (humans as subjects)
• Non-clinical context (animals as subjects)
• Plant context (plants as subjects)
• In-vitro context (cells as subjects)
12. Dealing with Diversity: Refining CIMR ISA configurations
30/06/2013
In-vitro study, Plant study, Clinical study
https://github.com/ISA-tools/Configuration-Files
13. Dealing with Diversity: Refining CIMR ISA configurations
• Different kinds of experiments, different annotation needs
• additional ISA assay table definitions to deal with technology needs
• Targeted profiling or global metabolomics analysis
• liquid chromatography mass spectrometry
• gas chromatography mass spectrometry
• direct infusion mass spectrometry
• 1D / 2D NMR spectroscopy
• Metabolic Flux Analysis (ongoing work with Prof. Marta Cascante)
14. The ISAcreator: an editor for ISA-Tab format
Developed to be a user-friendly way to enter standards-compliant metadata: it has lots of features...
But these are just some of them... we also have a data entry wizard and an import utility...
https://github.com/ISA-tools/ISAcreator
15. The ISAcreator: an editor for ISA-Tab format
https://github.com/ISA-tools/ISAcreator
16. ISAcreator features: visualizing experimental workflows
Work completed while investigating a new approach to glyph design that uses a taxonomy for guidance. See Maguire et al., Taxonomy-Based Glyph Design – with a Case Study on Visualizing Workflows of Biological Experiments, IEEE Transactions on Visualization and Computer Graphics, 2012
17. ISAcreator features: API
This bit of code indicates that you need to invoke the ISA configuration, which defines the expected table layout, in order to proceed
https://github.com/ISA-tools/ISAcreator/wiki/API
20. ISA patterns for reporting QC samples
Annotation rule of thumb: does the reported value satisfy the “is_a” rule?
In this representation, QC1 would be interpreted as an instance of organism whose type is “vanillic acid” => incorrect
Improved representation: QC1 would be interpreted as an instance of chemical compound whose type is “vanillic acid”, acting as “positive control” => correct
Furthermore, only 2 actual study subjects will be accounted for
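The "is_a" rule of thumb can be expressed as a simple check, sketched here in Python. The tiny hierarchy is a hand-written stand-in for a real ontology, and the record layout (material, characteristic category, reported value) is an assumption made for this illustration only.

```python
# Toy "is_a" hierarchy; a real annotation would use terms from an ontology.
IS_A = {
    "vanillic acid": "chemical compound",
    "Homo sapiens": "organism",
}


def satisfies_is_a(category, value):
    """True when `value` is declared to be a kind of `category`."""
    return IS_A.get(value) == category


# Each record: (material, Characteristics[<category>], reported value)
annotations = [
    ("QC1", "organism", "vanillic acid"),           # first representation
    ("QC1", "chemical compound", "vanillic acid"),  # improved representation
    ("subject1", "organism", "Homo sapiens"),
]

for material, category, value in annotations:
    verdict = "ok" if satisfies_is_a(category, value) else "violates is_a"
    print(f"{material}: Characteristics[{category}] = {value!r} -> {verdict}")
```

With the improved representation, counting the materials annotated with an organism no longer includes the QC sample.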
21. Why does it matter?
It is all about structuring experimental information to make it available to computer and software agents to enable:
Notes in Lab Books (information for humans)
Spreadsheets and Tables (the compromise)
Facts as RDF statements (information for machines)
22. RDF representation of Metabolomics Experimental information
• Query Expansion and Data Discovery
https://github.com/ISA-tools/isa2owl
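Query expansion can be illustrated with a short Python sketch: a search for a general term also returns studies annotated with more specific terms. The class hierarchy and the three studies are hypothetical and written by hand. Nothing here is taken from isa2owl or from an ontology file.

```python
# Toy class hierarchy (child -> parent); written by hand for illustration.
PARENT = {
    "liquid chromatography mass spectrometry": "mass spectrometry",
    "gas chromatography mass spectrometry": "mass spectrometry",
    "mass spectrometry": "assay",
    "NMR spectroscopy": "assay",
}

# Hypothetical studies, each annotated with one technology term.
STUDIES = {
    "study1": "liquid chromatography mass spectrometry",
    "study2": "gas chromatography mass spectrometry",
    "study3": "NMR spectroscopy",
}


def expand(term):
    """Return the term plus every term that descends from it."""
    expanded = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in PARENT.items():
            if parent in expanded and child not in expanded:
                expanded.add(child)
                changed = True
    return expanded


def find_studies(term):
    """Find studies annotated with the term or any of its descendants."""
    terms = expand(term)
    return sorted(s for s, technology in STUDIES.items() if technology in terms)


print(find_studies("mass spectrometry"))
print(find_studies("assay"))
```

A plain string match on "mass spectrometry" would only work by accident of naming; the hierarchy makes the relationship explicit.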
23. RDF representation of Metabolomics Experimental information
• Conversion of 80% of public datasets
• Tests against case queries report partial success
• Points to the need to enforce stricter curation rules in order to fully benefit from the RDF representation
• Existing conversion already enables easy cohort creation
• Ongoing work: converting MAF files to RDF
• enabling querying from experimental metadata to chemical identities and vice versa.
https://github.com/ISA-tools/isa2owl
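The idea of cohort creation and of querying from experimental metadata to chemical identities can be sketched with facts held as (subject, predicate, object) statements. This Python example uses plain tuples rather than an RDF library. The "ex:" names and all the facts are hypothetical and do not reproduce the vocabulary produced by isa2owl.

```python
# Facts as (subject, predicate, object) statements; all names are made up.
TRIPLES = {
    ("ex:sample1", "ex:derivesFrom", "ex:subject1"),
    ("ex:sample2", "ex:derivesFrom", "ex:subject2"),
    ("ex:subject1", "ex:hasTreatment", "compound A"),
    ("ex:subject2", "ex:hasTreatment", "control"),
    ("ex:sample1", "ex:hasIdentifiedMetabolite", "vanillic acid"),
}


def match(subject=None, predicate=None, obj=None):
    """Return the triples that agree with every non-None position."""
    return [
        (s, p, o)
        for (s, p, o) in sorted(TRIPLES)
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Cohort: all subjects that received a given treatment.
cohort = [s for s, _, _ in match(predicate="ex:hasTreatment", obj="compound A")]

# From metadata to chemistry: metabolites reported for samples of that cohort.
samples = [s for s, _, o in match(predicate="ex:derivesFrom") if o in cohort]
metabolites = [
    o for s, _, o in match(predicate="ex:hasIdentifiedMetabolite") if s in samples
]

print(cohort, samples, metabolites)
```

The reverse direction works the same way: start from a metabolite, find the samples that report it, then follow the links back to subjects and treatments.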
24. Contributing to MetaboLights and ISA
• BBSRC UK-China Award & BGI funded Hackathon
• venue: BGI Hong Kong
• Participants:
• MetaboLights/BGI/ISA/Birmingham/Hong Kong University
• Outcome:
• ISA-Tab web viewer code
• Functional Specifications & Code for DoE Wizard API
25. Contributing to MetaboLights and ISA
• BBSRC UK-China Award funded Hackathon will be back!
• 2nd meeting to be organised
• Fancy participating? Get in touch!
• isatools@googlegroups.com
26. Don’t miss out
• 2 main publishers involved in developing Data Journals
• Scott Edmunds (GigaScience) and Susanna Sansone (NPG Scientific Data)
• Representatives from Metabolomics Repository
• Get all the help you need for depositing your data and increase the visibility of your research!
28. Questions??
You can email us...
isatools@googlegroups.com
View our blog
http://isatools.wordpress.com
Follow us on Twitter
@isatools
View our website
http://www.isa-tools.org
Thanks for listening...
View our Git repo & contribute
http://github.com/ISA-tools
Editor's Notes
Applying a Protocol is reported by adding a “Protocol REF” field, which can be qualified by associated “Parameter Value” fields as well as a “Performer” field to track the operator effect and a “Date” field to track the “day effect”.
Ecosystem revolving around the ISA-TAB format
Support for massively parallel datasets
Focus on a couple of the tools – OntoMaton
Gradient from left to right – configuration (annotation guidelines), curation tools, to analysis and usage – people can choose the path that is most convenient for their use case
Once a configuration has been defined, the ISAcreator editor can read it, and the spreadsheet will be aware of the terminology restrictions set by the super user in charge of defining annotation requirements. In this screenshot, you can see the allowed values for reporting the flow cytometry instrument using OBI classes in a Flow Cytometry Assay as defined in ISAconfigurator. Note the metadata pulled from OBI and readily available for people to check that the term they select is correct.