Alejandra González-Beltrán, Ph.D
University of Oxford e-Research Centre, UK
From experimental planning to data publication:
the ISA infrastructure
and case studies in toxicology
alejandra.gonzalezbeltran@oerc.ox.ac.uk
OpenTox Europe - Mainz, Germany - 30th September, 2013
The data workflow
[Figure: the workflow around the Data Scientist, with stages Planning, Data Collection, Data Management, Analysis, Visualization and Publication; the scientist either uses existing data or performs a new experiment]
The data workflow
[Figure: the data workflow (Planning, Data Collection, Data Management, Analysis, Visualization, Publication) with metadata attached at each stage, supported by a metadata tracking infrastructure]
[Figure: the data workflow with metadata at each stage; labels: Traceability, Assessment, Accountability, Evidence, Reusability, Reproducibility, Storage, Mining, Provenance]
semantics
structure
investigation
study
assay
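The investigation, study, assay nesting named on this slide can be pictured as a small data structure. The following is only a minimal sketch in Python: the class names mirror the three slide terms, but every field name and example value is a hypothetical illustration, not the ISA specification or any ISA tool's API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    # One measurement type applied to samples (field names are assumptions).
    measurement_type: str
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    # A unit of research with its own samples and the assays run on them.
    identifier: str
    samples: List[str] = field(default_factory=list)
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    # The overall project context that groups related studies.
    title: str
    studies: List[Study] = field(default_factory=list)

    def all_data_files(self) -> List[str]:
        """Walk investigation -> study -> assay and collect every data file."""
        return [f for s in self.studies for a in s.assays for f in a.data_files]


# Hypothetical example values, not taken from the talk.
inv = Investigation(
    title="Example investigation",
    studies=[
        Study(
            identifier="S1",
            samples=["SAMPLE 1", "SAMPLE 2"],
            assays=[Assay("metabolite profiling", ["FILE 1", "FILE 2"])],
        )
    ],
)
print(inv.all_data_files())
```

Keeping the three levels as separate types means a question such as "which files belong to this investigation?" becomes a plain traversal of the hierarchy.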
The ISA infrastructure
• generic format for experimental description and data exchange
• open source software tools
• community engagement
[Figure: example experiment on Arabidopsis thaliana with the steps Experiment Design, Treatment groups (70%, 90%, 100%), Collect Samples, Run Assays and Analysis; SAMPLE 1 to SAMPLE 11 are shown together with their data files]
• Parses ISA-Tab datasets into R objects, which can be updated and saved again after analysis.
• Bridges the ISA-Tab metadata to the analysis pipelines of specific assay types by building objects for use in other R packages downstream: currently mass spectrometry (xcms package, xcmsSet) and DNA microarray (Biobase package, ExpressionSet).
• Suggests Bioconductor packages that might be relevant for an assay type, according to their biocViews annotations.
Gonzalez-Beltran et al. The Risa R/Bioconductor package: integrative data analysis from experimental metadata and back again. In press.
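The idea of reading tab-delimited experimental metadata into objects that an analysis can use can be illustrated with a short sketch. This is not Risa and not its API (Risa is an R package); it is a Python illustration using only the standard library. The table content is hypothetical and heavily simplified, and the function name `files_by_sample` is an assumption made for this example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical, heavily simplified assay table in tab-delimited form.
ASSAY_TABLE = (
    "Sample Name\tRaw Data File\n"
    "SAMPLE 1\tFILE 1\n"
    "SAMPLE 2\tFILE 2\n"
    "SAMPLE 2\tFILE 3\n"
)


def files_by_sample(table_text):
    """Map each sample name to the raw data files recorded for it."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    mapping = defaultdict(list)
    for row in reader:
        mapping[row["Sample Name"]].append(row["Raw Data File"])
    return dict(mapping)


sample_files = files_by_sample(ASSAY_TABLE)
print(sample_files)
```

Once the sample-to-file mapping is held in an ordinary object, it can be handed to whichever analysis routine suits the assay type, which is the bridging role described above.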
Data Publication with Scientific Data
• New open-access, online-only publication for descriptions of scientifically valuable datasets
• Only content type: Data Descriptor, narrative + structured parts
• Initially focused on the life, environmental and biomedical sciences
• Data Descriptor will be complementary to traditional research journals and data repositories
• Designed to foster data sharing and reuse, and ultimately to accelerate scientific discovery
www.nature.com/scientificdata
Data Publication with
http://www.nature.com/scientificdata/
http://gigasciencejournal.com
A growing ecosystem of over 30 public and internal resources
using the ISA metadata tracking framework (ISA-Tab and/or
format) to facilitate standards-compliant collection, curation,
management and reuse of investigations in an increasingly diverse set
of life science domains, including:
• stem cell discovery
• systems biology
• transcriptomics
• toxicogenomics
• also by communities working to build a library of cellular
signatures
• environmental health
• environmental genomics
• metabolomics
• metagenomics
• nanotechnology
• proteomics
Toxicity data
http://xkcd.com/1260/
Suter et al 2011. EU Framework 6 Project: Predictive Toxicology (PredTox)—overview and outcome.
Boitier et al 2011. A comparative integrated transcript analysis and functional characterization of differential mechanisms for induction of liver hypertrophy in the rat
InnoMed PredTox Project
Goal: earlier pre-clinical safety evaluation by combining results from ‘omics
technologies and conventional toxicology methods
2-week systemic rat study using male Wistar rats (N=15 per dose group)
14 proprietary drug candidates from participating companies and 2 reference toxic compounds
http://www.ebi.ac.uk/bioinvindex/study.seam?studyId=BII-S-8
Data Infrastructure for Chemical Safety
http://www.dixa-fp7.eu/about
Kohonen et al. 2013 The ToxBank Data Warehouse: a research cluster of 7
EU FP7 Health systems toxicology and toxicogenomics projects.
Safety Evaluation Ultimately Replacing Animal Testing-1 (SEURAT-1): looking at improving safety
assessment without the need for animal experiments
ToxBank: cross-cluster infrastructure project
http://toxbank.net
https://wiki.nci.nih.gov/display/ICR/ISA-TAB-Nano
Nanotechnology Informatics Working Group
Thomas et al. 2013 ISA-TAB-Nano: A specification for sharing nanomaterial research data in spreadsheet-based format
Baker et al. 2013 Standardizing data
ISA-TAB-Nano
Extension of ISA-TAB format to represent nano-materials, small molecules and biological specimens along with their assay characterisation data
[Figure: the data workflow around the Data Scientist, with stages Planning, Data Collection, Data Management, Analysis, Visualization and Publication]
Questions?
You can email us...
isatools@googlegroups.com
View our blog
http://isatools.wordpress.com
Follow us on Twitter
@isatools
View our website
http://www.isa-tools.org
View our Git repo & contribute
http://github.com/ISA-tools
