BCU 2013

The Investigation/Study/Assay (ISA) metadata framework for reproducible and reusable bioscience research

Alejandra González-Beltrán, PhD
on behalf of the ISA Team
Oxford e-Research Centre, University of Oxford

Faculty of Technology, Environment and Engineering
Birmingham City University
12th March 2013

Ioannidis et al., Repeatability of published microarray gene expression analyses. Nature Genetics 41(2), 149-55 (2009) doi:10.1038/ng.295

http://www.nature.com/news/2011/110111/full/469139a.html
http://www.economist.com/node/21528593
http://www.nytimes.com/2011/07/08/health/research/08genes.html

Contextual information (metadata):

•  Sample characteristics
•  Technology and measurement types
•  Instrument parameters
•  …

Need for a generic representation, applied to:

•  microarray based experiments (MAGE)
•  sequencing based experiments (SRA)
•  flow cytometry based experiments (FuGE-Flow Cyt)
•  mass spectrometry and NMR spectroscopy experiments (Metabolights and PRIDE)

Roadmap (diagram, built up over several slides):

•  Community Standards, Software Tools, User community
•  Well-annotated & Structured Data
•  exchange, retrieval, analysis, browsing, integration, reasoning, visualization
•  Reproducible & Reusable Bioscience Research

Bioscience is multi-domain…

health, env, agro, tox/pharma

Interdisciplinary and integrative in character

•  need to deal with new and existing datasets
•  deal with a variety of data types

Source of the figure: EBI website
Multiple communities, multiple norms and standards, e.g.:

•  use the same term to refer to the same ‘thing’
•  allow data to flow from one system to another
•  report the same core, essential information

Challenges: lack of interaction and coordination, duplication of effort,
fragmentation and uneven coverage…hinders interoperability

Growing number of bioscience reporting standards

•  303+, 150+ and 130+ standards counted across categories (sources: MIBBI, EQUATOR; BioPortal; estimated)
•  Databases, annotation, curation tools

MAGE-Tab, AAO, MIAME, GCDML, MIAPA, CHEBI, GIATE, SRAxml, OBI, MIRIAM, VO, SOFT, MIQAS, FASTA, PATO, MIX, CML, ENVO, REMARK, DICOM, MIGEN, GELML, MOD, SBRML, MIAPE, MIQE, TEDDY, MITAB, mzML, XAO, CIMR, CONSORT, BTO, ISA-Tab, SEDML, DO, PRO, IDO, MIASE, MISFISHIE, …

But… what do we know about them and how they are related?

•  I use high throughput sequencing technologies, which ones are relevant to me?
•  Which tools and databases implement which standards?
•  How can I get involved to propose extensions or modifications?
•  What are the criteria to evaluate their status and value?
•  Which ones are mature enough for me to use or recommend?
•  Which formats support specific minimum information guidelines?
•  I work on plants, are these standards just for biomedical applications?

A coherent, curated and searchable catalogue of data sharing resources

•  Bioscience standards and associated data-sharing policies, publications, tools and databases
•  Assessment criteria for usability and popularity of standards
•  Relationships among standards
•  Encouragement for communication & interaction among groups
•  Promoting interoperability & informed decisions about standards

ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level

Rocca-Serra et al., 2010, Bioinformatics
•  Assist in the annotation and management of experimental metadata at source, supporting data provenance tracking
•  Deal with high-throughput studies using one or a combination of omics and other technologies
•  Empower users to take up community-defined checklists and ontologies
•  Facilitate data sharing, re-use, comparison and reproducibility of experiments, and submission to international public repositories

faahKO dataset

•  Available in Bioconductor
•  Subset of the original data on global metabolite profiling (Saghatelian et al., Biochemistry, 2004)
•  LC/MS peaks from the spinal cords of 6 wild-type and 6 FAAH (fatty acid amide hydrolase) knockout mice

faahKO investigation

-  Define key entities (e.g. factors, protocols, parameters)
-  Grouping of studies
-  Relate studies and assays

faahKO study

-  Subjects studied: source(s), sampling methodology, characteristics
-  treatments/manipulations performed to prepare the specimens
-  Terminology sources shown: NEWT UniProt Taxonomy Database, Mouse Genome Informatics, Mouse Adult Gross Anatomy

faahKO assay

-  measurement type, e.g. metabolite profiling
-  technology, e.g. mass spectrometry
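The nesting of investigation, study and assay described above can be pictured as a small data model. The Python sketch below is illustrative only: the class names, field names and the "genotype" factor are assumptions made for this example and are not the API of the ISA tools; the remaining values come from the faahKO slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    # An assay pairs a measurement type with the technology used to take it.
    measurement_type: str
    technology: str


@dataclass
class Study:
    # A study describes the subjects and the treatments applied to them.
    title: str
    organism: str
    factors: List[str] = field(default_factory=list)
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    # An investigation groups related studies.
    title: str
    studies: List[Study] = field(default_factory=list)


faahko = Investigation(
    title="faahKO investigation",
    studies=[
        Study(
            title="faahKO study",
            organism="mouse",
            factors=["genotype"],  # hypothetical factor name (wild-type vs FAAH knockout)
            assays=[Assay("metabolite profiling", "mass spectrometry")],
        )
    ],
)

# Walk the hierarchy from investigation down to each assay.
for study in faahko.studies:
    for assay in study.assays:
        print(study.title, "->", assay.measurement_type, "/", assay.technology)
```

Keeping assays inside studies, and studies inside an investigation, mirrors the "relate studies and assays" role of the investigation level.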

Create template(s) to fit the type of
experiments to be described

Create templates detailing the steps to be reported for different investigations, complying with community standards, e.g. configuring the value(s) allowed for each field to be

•  text (with/without regular expression testing),
•  ontology terms,
•  numbers etc.
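The idea of configuring the allowed value(s) per field can be sketched as a set of small validators. This is a minimal illustration, not the ISA configuration format: the function names, the example template and the list of accepted organisms are all assumptions made for this example.

```python
import re
from typing import Callable, Dict, Iterable, Optional

Validator = Callable[[str], bool]


def text_field(pattern: Optional[str] = None) -> Validator:
    """Free text, optionally checked against a regular expression."""
    if pattern is None:
        return lambda value: True
    compiled = re.compile(pattern)
    return lambda value: compiled.fullmatch(value) is not None


def ontology_field(allowed_terms: Iterable[str]) -> Validator:
    """Value must be one of a fixed set of ontology term labels."""
    terms = set(allowed_terms)
    return lambda value: value in terms


def number_field() -> Validator:
    """Value must parse as a number."""
    def check(value: str) -> bool:
        try:
            float(value)
            return True
        except ValueError:
            return False
    return check


# Hypothetical template for a sample sheet (field names follow ISA-Tab style).
template: Dict[str, Validator] = {
    "Sample Name": text_field(r"[A-Za-z0-9_]+"),
    "Characteristics[organism]": ontology_field({"Mus musculus", "Homo sapiens"}),
    "Factor Value[dose]": number_field(),
}


def validate(row: Dict[str, str]) -> Dict[str, bool]:
    """Return, for each templated field, whether the row's value is allowed."""
    return {name: check(row.get(name, "")) for name, check in template.items()}


result = validate({
    "Sample Name": "sample_1",
    "Characteristics[organism]": "Mus musculus",
    "Factor Value[dose]": "high",  # free text where a number is required
})
print(result)
```

In this sketch the free-text dose "high" is flagged because the template asks for a number, which is the kind of check a template makes possible at data-entry time.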

Describe and curate your experiment using a
desktop-based tool

Report and edit the description using this tool
(also customized using the templates), with a
spreadsheet-like look and feel, packed with
functionalities such as

•  ontology search (access via )

•  term-tagging features

•  import from spreadsheets etc…

OntoMaton: a BioPortal-powered ontology widget for Google Spreadsheets

•  Ontology search and automated tagging (relying on NCBO BioPortal services) on Google Spreadsheets
•  Collaborative annotation; support for distributed users
•  Version control & history

Maguire et al., 2013, Bioinformatics
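As a rough illustration of what an ontology search request to a BioPortal-style service looks like, the Python sketch below only builds the request URL and sends nothing over the network. It assumes the public NCBO BioPortal REST search endpoint and its `q`, `ontologies` and `apikey` parameters; the helper name and the placeholder API key are hypothetical, and this is not how OntoMaton itself is implemented.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumed endpoint of the NCBO BioPortal REST search service.
BIOPORTAL_SEARCH = "https://data.bioontology.org/search"


def build_search_url(term, ontologies=(), api_key="YOUR_API_KEY"):
    """Build a search URL for a term, optionally restricted to some ontologies."""
    params = {"q": term, "apikey": api_key}
    if ontologies:
        # Several ontology acronyms are passed as one comma-separated value.
        params["ontologies"] = ",".join(ontologies)
    return BIOPORTAL_SEARCH + "?" + urlencode(params)


url = build_search_url("aspirin", ontologies=["CHEBI"])
print(url)
```

A tagging tool would send this request, let the user pick one of the returned terms, and store the chosen label together with its source ontology and accession.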

•  R package available in Bioconductor 2.11
   http://bioconductor.org/packages/release/bioc/html/Risa.html
•  ISAtab class
•  Read ISAtab files into ISAtab objects and write ISAtab files back to disk
•  Extend the metadata with definitions of factors/treatments/groups
•  Build xcmsSet (xcms package) objects from mass spectrometry assays
•  Augment the ISAtab dataset after analysis
•  Source & issue tracking: https://github.com/ISA-tools/Risa

•  faahKO package v. 2.12 contains ISAtab files describing the experiment

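library(Risa)

# Read the ISAtab files shipped with the faahKO package
faahkoISA = readISAtab(find.package("faahKO"))

# Take the first assay file listed in the ISAtab object
assay.filename <- faahkoISA["assay.filenames"][[1]]

# Build an xcmsSet object from the mass spectrometry assay
xset = processAssayXcmsSet(faahkoISA, assay.filename)

…

# Record the derived data file in the assay metadata
updateAssayMetadata(faahkoISA, assay.filename, "Derived Spectral Data File", "faahkoDSDF.txt")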

•  MTBLS2 processing and analysis using the Risa, xcms and CAMERA Bioconductor packages

Metabolights – an open access general-purpose repository for
metabolomics studies and associated meta-data

Haug et al, 2012

Nucleic Acids Research

The implicit semantics of the syntax

Sample Name | Material Type | Hybridization Assay Name | Array Design REF | Array Data File | Protocol REF | Derived Array Data File
sample1 | genomic DNA | assay1 | A-AFFY-107 | assay1.cel | data normalization | assay1.txt
sample2 | genomic DNA | assay2 | | | data normalization | assay2.txt
sample3 | genomic DNA | assay3 | | | data normalization | assay3.txt

Material transformations...

Figure: table columns read as Material Nodes and Data File Nodes (Raw Data File, Derived Data File) linked by Processes (Protocol). Annotations attached to them: Characteristics[…], Material Type, Factor Value[…] (independent variables), Parameter Value[…], Comment[…], Performer (operator effect), Date (day effect).
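The node-and-process reading of the table can be made concrete with a short Python sketch. It treats some columns as graph nodes and `Protocol REF` as the process linking two nodes. The split of columns into node types is a simplifying assumption for this example (the Array Design REF column is left out), and the function name is hypothetical.

```python
import csv
import io

# Column headers treated as graph nodes; other columns (e.g. Material Type)
# are attributes and are ignored in this sketch.
NODE_COLUMNS = {
    "Sample Name": "material",
    "Hybridization Assay Name": "assay",
    "Array Data File": "data",
    "Derived Array Data File": "data",
}
PROCESS_COLUMN = "Protocol REF"

# First row of the table shown on the slide, tab-delimited as in ISA-Tab.
ASSAY_TABLE = (
    "Sample Name\tMaterial Type\tHybridization Assay Name\tArray Data File\t"
    "Protocol REF\tDerived Array Data File\n"
    "sample1\tgenomic DNA\tassay1\tassay1.cel\tdata normalization\tassay1.txt\n"
)


def row_to_edges(header, row):
    """Turn one table row into (source node, protocol, target node) edges."""
    edges = []
    previous = None  # last node seen on this row
    protocol = None  # protocol applied since that node, if any
    for column, value in zip(header, row):
        if not value:
            continue
        if column == PROCESS_COLUMN:
            protocol = value
        elif column in NODE_COLUMNS:
            node = (NODE_COLUMNS[column], value)
            if previous is not None:
                edges.append((previous, protocol, node))
            previous, protocol = node, None
    return edges


reader = csv.reader(io.StringIO(ASSAY_TABLE), delimiter="\t")
header = next(reader)
edges = [edge for row in reader for edge in row_to_edges(header, row)]
for source, protocol, target in edges:
    print(source, "--[%s]-->" % protocol, target)
```

The left-to-right column order carries the meaning: a `Protocol REF` between two node columns says which process turned the first into the second.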

Tagging: from free text to ontology-based

•  single intervention representation, free text annotation

Source Name | Characteristics[organism] | Factor Value[perturbation agent] | Factor Value[dose] | Factor Value[duration]
individual1 | human | aspirin | high dose | 12 weeks

•  single intervention, ontology-based annotation

Source Name | Characteristics[organism (obi:0100026)] | Term Source REF | Term Accession Number | Factor Value[chemical compound (CHEBI_37577)] | Term Source REF | Term Accession Number
individual1 | Homo sapiens | NCBITax | 9606 | aspirin | CHEBI | 1231354

Factor Value[dose (OBI_0000984)] | Term Source REF | Term Accession Number | Factor Value[time (PATO_0000165)] | Unit | Term Source REF | Term Accession Number
low dose | LNC | LP30872-3 | 12 | week | UO | 0000034
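The move from free text to ontology-based annotation can be sketched as a lookup that expands each value into a label, a term source and an accession. This Python sketch is a toy: the lookup table is hard-coded from the values shown on the slide (accessions copied as shown, not checked), the function names are hypothetical, and a real tool would query an ontology service instead.

```python
# Free-text value -> (preferred label, Term Source REF, Term Accession Number),
# copied from the slide's example row.
TERM_LOOKUP = {
    "human": ("Homo sapiens", "NCBITax", "9606"),
    "aspirin": ("aspirin", "CHEBI", "1231354"),
}


def tag(value):
    """Return (label, source, accession); unknown values stay untagged."""
    return TERM_LOOKUP.get(value.lower(), (value, "", ""))


def tag_row(row):
    """Expand each annotated column into value, source and accession columns."""
    tagged = []
    for header, value in row:
        label, source, accession = tag(value)
        tagged.append((header, label))
        if header != "Source Name":  # identifiers are not ontology terms
            tagged.append(("Term Source REF", source))
            tagged.append(("Term Accession Number", accession))
    return tagged


free_text_row = [
    ("Source Name", "individual1"),
    ("Characteristics[organism]", "human"),
    ("Factor Value[perturbation agent]", "aspirin"),
]
tagged_row = tag_row(free_text_row)
for header, value in tagged_row:
    print(header, "=", value)
```

The row is kept as a list of pairs rather than a dictionary because the `Term Source REF` and `Term Accession Number` headers repeat after every annotated column.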

ToxBank effort, developed by Nina Jeliazkova
Kohonen et al., The ToxBank Data Warehouse
A research cluster of 7 EU FP7 Health projects; systems toxicology and toxicogenomics

Health Care & Life Sciences Interest Group

•  Make the semantics of ISAtab explicit, including materials & data entities & processes & their relationships
•  Provide incentives for provision of ontology-based annotations in ISA-TAB datasets; exploit those annotations
•  Augment ISA syntax with new elements (e.g. groups), facilitating the understanding & querying of experimental design
•  Facilitate data integration & knowledge discovery/reasoning

architecture

Figure: components shown are ISA-TAB parser, graph analysis, mapping parser, configuration file and isa2owl.

Implementation:

-  Java-based
-  Using the OWL API
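The mapping-driven conversion can be illustrated with a short Python sketch that turns mapped table cells into RDF statements in N-Triples form. This is not isa2owl, which is Java-based; the mapping dictionary, the `example.org` namespace and the function name are assumptions made for this example. The two ontology class identifiers are the ones shown on the annotation slide, written as OBO PURLs.

```python
OBO = "http://purl.obolibrary.org/obo/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BASE = "http://example.org/isa/"  # placeholder namespace for generated individuals

# Hypothetical mapping from ISA-TAB column headers to ontology classes.
MAPPING = {
    "Characteristics[organism]": OBO + "OBI_0100026",
    "Factor Value[chemical compound]": OBO + "CHEBI_37577",
}


def row_to_triples(row_id, row):
    """Emit a type statement and a label statement for each mapped cell."""
    triples = []
    for index, (header, value) in enumerate(row.items()):
        cls = MAPPING.get(header)
        if cls is None:
            continue  # unmapped columns are ignored in this sketch
        subject = "%s%s/%d" % (BASE, row_id, index)
        triples.append("<%s> <%s> <%s> ." % (subject, RDF_TYPE, cls))
        # Values are assumed to contain no quotes; real code must escape them.
        triples.append('<%s> <%s> "%s" .' % (subject, RDFS_LABEL, value))
    return triples


triples = row_to_triples("individual1", {
    "Source Name": "individual1",
    "Characteristics[organism]": "Homo sapiens",
    "Factor Value[chemical compound]": "aspirin",
})
for triple in triples:
    print(triple)
```

Keeping the header-to-class mapping in data rather than in code is what lets one converter serve different conversion modes or target ontologies.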

vocabularies

Chemical domain, Biomolecular domain, Information domain, Experimental domain

Open Biological and Biomedical Ontologies (OBO) Foundry: BFO, ChEBI, GO, IAO, OBI

faahKO dataset

•  Available in Bioconductor (with ISA-TAB metadata)
•  Global metabolite profiling
•  Data subset: LC/MS peaks from the spinal cords of 6 wild-type and 6 FAAH (fatty acid amide hydrolase) knockout mice

•  support different conversion modes (different levels of granularity)
•  querying for ISA-TAB datasets, across multiple experiment types
•  reasoning exploiting ontology annotations
   –  semantic validation of ISA-TAB datasets
•  augmented annotation over native ISA syntax
   –  identification of gaps in ontological representations
   –  feedback of findings to community ontologies

Increasing level of structure for experimental metadata

•  Notes in Lab books
•  Spreadsheets & Tables (ISAtab metadata)
•  Facts as RDF statements

Towards interoperable bioscience data

Sansone et al., 2012, Nature Genetics

A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains.

Implementation at Harvard
http://discovery.hsci.harvard.edu/

Implementation at the European Bioinformatics Institute
http://www.ebi.ac.uk/metabolights

Reproducible & Reusable Bioscience Research: exchange, retrieval, analysis, browsing, integration, reasoning, visualization

@isatools  @biosharing
isa-tools.org
isacommons.org
biosharing.org
