1. The ISA infrastructure: supporting bio-scientists from experimental design to data publication
Alejandra González-Beltrán, Ph.D
University of Oxford e-Research Centre, UK
alejandra.gonzalezbeltran@oerc.ox.ac.uk
4to. Congreso Argentino de Bioinformática y Biología Computacional (4CAB2C) &
4ta. Conferencia Internacional de la Sociedad Iberoamericana de Bioinformática (SolBio)
29-31 October 2013, Rosario, Argentina
18. Experimental workflow
Figure: the experimental workflow centred on the Data Scientist, with the stages Planning, Data Collection (use existing data or perform a new experiment), Data Management, Analysis, Visualization and Publication, and Data Reusability as a cross-cutting theme.
19. Experimental workflow
Figure: the same workflow, annotated with Evidence, Provenance, Traceability, Data Reusability, Science Reproducibility, Assessment, Accountability, Retrieval and Mining.
36. Experimental workflow - graph representation
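Figure: the experimental workflow as a graph. The sources H1 (H. sapiens, 35 years) and H2 (H. sapiens, 33 years) lead to the samples H1.sample1, H1.sample2 and H2.sample1; Labeling produces the labeled extracts H1.sample1.labeled and H2.sample1.labeled; Scanning produces the data files h1-s1.cel, h1-s2.cel and h2-s1.cel.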
37. Experimental workflow - graph representation
Figure: the graph from slide 36 shown next to its spreadsheet form.
Spreadsheets for end-users: the same workflow laid out as a table, one row per path through the graph, running from the source (H1 or H2, H. sapiens, age in years) through the sample, Labeling and the labeled extract to Scanning and the .cel data file.
vocabulary for the description of the experimental workflow
38. Experimental workflow - graph representation
syntactic interoperability across biological experiments of different types
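To make the graph-to-spreadsheet idea concrete, here is a minimal Python sketch, assuming a simplified model in which each edge is an (input, protocol, output) triple. The edges are a hypothetical subset loosely based on the figure; the "Sampling" protocol name and all function names are illustrative and are not part of the ISA tools. Each root-to-leaf path becomes one spreadsheet row, so a node with two outgoing edges yields two rows.

```python
from collections import defaultdict

# Hypothetical workflow graph loosely based on the figure (not the ISA tools' model).
# Each edge is (input node, protocol applied, output node); "Sampling" is an assumed name.
EDGES = [
    ("H1", "Sampling", "H1.sample1"),
    ("H1.sample1", "Labeling", "H1.sample1.labeled"),
    ("H1.sample1.labeled", "Scanning", "h1-s1.cel"),
    ("H2", "Sampling", "H2.sample1"),
    ("H2.sample1", "Labeling", "H2.sample1.labeled"),
    ("H2.sample1.labeled", "Scanning", "h2-s1.cel"),
]

# Characteristics attached to the source nodes: (organism, age in years).
SOURCES = {
    "H1": ("H. sapiens", "35"),
    "H2": ("H. sapiens", "33"),
}


def build_graph(edges):
    """Index the edges by input node so each node's successors can be looked up."""
    graph = defaultdict(list)
    for src, protocol, dst in edges:
        graph[src].append((protocol, dst))
    return graph


def paths_from(graph, node):
    """Yield every path from `node` to a leaf as [node, protocol, node, ..., leaf]."""
    if not graph[node]:
        yield [node]
        return
    for protocol, nxt in graph[node]:
        for rest in paths_from(graph, nxt):
            yield [node, protocol] + rest


def to_rows(edges, sources):
    """Flatten the graph into spreadsheet rows, one row per source-to-leaf path."""
    graph = build_graph(edges)
    rows = []
    for source, (organism, age) in sources.items():
        for path in paths_from(graph, source):
            rows.append([path[0], organism, age] + path[1:])
    return rows


if __name__ == "__main__":
    for row in to_rows(EDGES, SOURCES):
        print("\t".join(row))
```

Adding a second sample under H1 (for example the edge `("H1", "Sampling", "H1.sample2")`) adds one more row, which is how branching in the graph turns into repeated source columns in the spreadsheet.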
40. A growing ecosystem of over 30 public and internal resources
using the ISA metadata tracking framework (ISA-Tab and/or
format) to facilitate standards-compliant collection, curation,
management and reuse of investigations in an increasingly diverse set
of life science domains, including:
• environmental health
• environmental genomics
• metabolomics
• metagenomics
• nanotechnology
• proteomics
• stem cell discovery
• systems biology
• transcriptomics
• toxicogenomics
• also by communities working to build a library of cellular
signatures
46. Create template(s) to fit the type of
experiments to be described!
Create templates detailing the steps to be reported for different investigations, complying with community standards, e.g. configuring the value(s) allowed for each field to be:
• text (with/without regular expression testing),
• ontology terms,
• numbers etc.
We now have GSC-compliant configurations for submission to ENA.
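A minimal sketch of how such a template might constrain field values, written in Python with hypothetical names (`FieldSpec`, `is_valid`, `validate_row` and the example `TEMPLATE`). It is not the ISA configuration format, and a fixed set of accepted terms stands in for a real ontology lookup.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    """One field of a hypothetical template and the kind of value it accepts."""
    name: str
    kind: str                       # "text", "ontology" or "number"
    pattern: Optional[str] = None   # optional regular expression for text fields
    terms: frozenset = frozenset()  # accepted terms for ontology fields


def is_valid(spec: FieldSpec, value: str) -> bool:
    """Check a single value against its field specification."""
    if spec.kind == "text":
        # Text with or without regular-expression testing.
        return spec.pattern is None or re.fullmatch(spec.pattern, value) is not None
    if spec.kind == "ontology":
        # Stand-in for an ontology lookup: a fixed set of accepted terms.
        return value in spec.terms
    if spec.kind == "number":
        try:
            float(value)
        except ValueError:
            return False
        return True
    raise ValueError(f"unknown field kind: {spec.kind}")


def validate_row(template, row):
    """Return the names of the fields in `row` that violate the template."""
    return [spec.name for spec in template if not is_valid(spec, row.get(spec.name, ""))]


# Hypothetical template for a sample-description step.
TEMPLATE = [
    FieldSpec("Sample Name", "text", pattern=r"[A-Za-z0-9.\-]+"),
    FieldSpec("Organism", "ontology", terms=frozenset({"Homo sapiens", "Arabidopsis thaliana"})),
    FieldSpec("Age", "number"),
]
```

For example, `validate_row(TEMPLATE, {"Sample Name": "H1 sample1", "Organism": "human", "Age": "thirty-five"})` flags all three fields, while a row with `"H1.sample1"`, `"Homo sapiens"` and `"35"` passes.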
47. Or describe, curate your experiment
using a desktop-based tool!
Report and edit the description using this tool (also customized using the templates), which has a spreadsheet-like look and feel and is packed with functionality such as:
• ontology search (access via )
• term-tagging features
• import from spreadsheets etc.
58. Analysis
The interesting bit... doing something with our data and metadata...
Analysis of ISA-Tab data in the R language. Brings together the context and data to enable more meaningful analysis. Also suggests packages to use for analysis based on the data types in the ISA-Tab file.
Analysis of ISA-Tab data in the Galaxy environment.
Analysis of ISA-Tab data in the GenomeSpace environment.
Creates Galaxy Library objects from ISA-Tab files.
Load and edit files stored on distributed servers.
Created by Brad Chapman at the Harvard School of Public Health
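As a rough illustration of "bringing together the context and data", the Python sketch below reads two heavily simplified, hypothetical ISA-Tab-style tables (a study table and an assay table) and attaches each data file to the characteristics of the sample it came from. The column headers follow ISA-Tab naming (Source Name, Characteristics[...], Sample Name, Protocol REF, Labeled Extract Name, Array Data File), but the table contents and the `files_with_context` function are assumptions for illustration; the tools above handle the full specification.

```python
import csv
import io

# Hypothetical, heavily simplified ISA-Tab-style tables (tab-separated).
STUDY = (
    "Source Name\tCharacteristics[organism]\tCharacteristics[age]\tSample Name\n"
    "H1\tHomo sapiens\t35\tH1.sample1\n"
    "H2\tHomo sapiens\t33\tH2.sample1\n"
)
ASSAY = (
    "Sample Name\tProtocol REF\tLabeled Extract Name\tProtocol REF\tArray Data File\n"
    "H1.sample1\tLabeling\tH1.sample1.labeled\tScanning\th1-s1.cel\n"
    "H2.sample1\tLabeling\tH2.sample1.labeled\tScanning\th2-s1.cel\n"
)

PREFIX = "Characteristics["


def read_table(text):
    """Return (header, rows). csv.reader keeps repeated column names such as
    'Protocol REF', which csv.DictReader would collapse into one key."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    return rows[0], rows[1:]


def files_with_context(study_text, assay_text):
    """Map every data file to the characteristics of the sample it came from."""
    s_head, s_rows = read_table(study_text)
    a_head, a_rows = read_table(assay_text)
    sample_col = s_head.index("Sample Name")
    # Characteristics[...] columns describe the biological material.
    char_cols = [i for i, h in enumerate(s_head) if h.startswith(PREFIX)]
    context = {
        row[sample_col]: {s_head[i][len(PREFIX):-1]: row[i] for i in char_cols}
        for row in s_rows
    }
    a_sample = a_head.index("Sample Name")
    a_file = a_head.index("Array Data File")
    return {row[a_file]: context[row[a_sample]] for row in a_rows}
```

Here `files_with_context(STUDY, ASSAY)["h1-s1.cel"]` gives `{"organism": "Homo sapiens", "age": "35"}`, i.e. the data file now carries its experimental context into the analysis.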
59.
Figure: workflow steps from Experiment Design and Collect Samples through Run Assays to Analysis, applied to Arabidopsis thaliana samples (SAMPLE 1 to SAMPLE 11) split into treatment groups, each sample linked to a data file (FILE 1, FILE 2, ...).
Parses ISA-Tab datasets into R objects, allowing them to be updated and saved after analysis.
Bridges the ISA-Tab metadata to analysis pipelines for specific assay types by building objects for use in other R packages downstream: currently mass spectrometry (xcms package, xcmsSet) and DNA microarray (Biobase package, ExpressionSet).
Suggests Bioconductor packages that might be relevant for an assay type, according to the biocViews annotations.
Gonzalez-Beltran et al. The Risa R/Bioconductor package: integrative data analysis from experimental metadata and back again. In press.
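The assay-type bridging described above can be sketched as a lookup from an investigation's assay technology types to the downstream R package and class to build. This is a Python illustration under stated assumptions, not Risa's R code: the investigation excerpt is hypothetical, the lower-case labels are illustrative rather than an official vocabulary, and a fixed table built from the two assay types listed above stands in for the biocViews-based suggestions.

```python
import csv
import io

# Assumed lookup table built from the two assay types Risa is described as
# supporting; keys are illustrative lower-case technology-type labels.
DOWNSTREAM = {
    "mass spectrometry": ("xcms", "xcmsSet"),
    "dna microarray": ("Biobase", "ExpressionSet"),
}

# Hypothetical excerpt of an ISA-Tab investigation file: each line is a field
# label followed by one tab-separated, quoted value per assay.
INVESTIGATION = (
    'Study Assay Measurement Type\t"transcription profiling"\t"metabolite profiling"\n'
    'Study Assay Technology Type\t"DNA microarray"\t"mass spectrometry"\n'
)


def investigation_field(text, label):
    """Return the values listed on the line that starts with `label`."""
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if row and row[0] == label:
            return row[1:]
    return []


def suggest_targets(technology_types):
    """Map each technology type to (R package, class), or None if unsupported."""
    return {t: DOWNSTREAM.get(t.lower()) for t in technology_types}
```

Calling `suggest_targets(investigation_field(INVESTIGATION, "Study Assay Technology Type"))` pairs "DNA microarray" with Biobase/ExpressionSet and "mass spectrometry" with xcms/xcmsSet; any other technology type maps to `None`.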
65. Publication
Getting your work out there...
Share, link and reason over experiments with linked data
Publish along with your research articles & in specialised community repositories
68.
• New open-access, online-only publication for descriptions of scientifically valuable datasets
• Only content type: Data Descriptor, narrative + structured parts
• Initially focused on the life, environmental and biomedical sciences
• Data Descriptors will be complementary to traditional research journals and data repositories
• Designed to foster data sharing and reuse, and ultimately to accelerate scientific discovery
www.nature.com/scientificdata
69. Data Descriptors served by Scientific Data
Narrative Section: a brief article-like document with:
• Title
• Abstract
• Background & Summary
• Methods
• Technical Validation
• Usage Notes
• Figures & Tables
• References
Structured Section: detailed descriptions of the experimental procedures used to produce the data
• Following community-defined minimum information requirements
• For a level of detail sufficient to reproduce the experiments
• Using ontologies & controlled vocabularies
• To maximise consistency of the descriptions
74. Thanks for your attention!
Questions?
You can email us...
isatools@googlegroups.com
View our website
http://www.isa-tools.org
View our Git repo & contribute
http://github.com/ISA-tools
View our blog
http://isatools.wordpress.com
Follow us on Twitter
@isatools