Overview of three areas where the ENCODE DCC is facilitating the integration of diverse datasets: (1) defining a metadata standard, (2) using ontologies for annotation, and (3) creating a RESTful interface for data access.
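As a hedged sketch of point (3): the ENCODE portal serves machine-readable JSON when `format=json` is requested, so a query URL can be composed programmatically. The `/search/` endpoint and `format=json` parameter reflect the portal's public JSON interface, but the specific query fields used below are illustrative examples, not a definitive API reference.

```python
# Sketch of building a query against the ENCODE portal's REST interface.
# The /search/ endpoint and format=json parameter are the portal's public
# JSON interface; the query fields below are illustrative examples.
from urllib.parse import urlencode

ENCODE_BASE = "https://www.encodeproject.org"

def build_search_url(**params: str) -> str:
    """Compose a portal search URL, always requesting a JSON response."""
    query = {**params, "format": "json"}
    return f"{ENCODE_BASE}/search/?{urlencode(query)}"

print(build_search_url(type="Experiment", assay_title="ChIP-seq"))
# https://www.encodeproject.org/search/?type=Experiment&assay_title=ChIP-seq&format=json
```

Fetching such a URL with any HTTP client (or sending an `Accept: application/json` header) returns metadata records instead of HTML, which is what makes scripted access to the metadata standard practical.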
Findable, Accessible, Interoperable, Reusable <data | models | SOPs | samples | articles | *>. FAIR is a mantra; a meme; a myth; a mystery; a moan. For the past 15 years I have been working on FAIR across a range of Life Science projects and initiatives. Some are top-down, like the Life Science European Research Infrastructures ELIXIR and ISBE; some are bottom-up, supporting research projects in Systems and Synthetic Biology (FAIRDOM), Biodiversity (BioVel), and Pharmacology (Open PHACTS), for example. Some have become movements, like Bioschemas, the Common Workflow Language and Research Objects. Others focus on cross-cutting approaches in reproducibility, computational workflows, metadata representation, and scholarly sharing & publication. In this talk I will relate a series of FAIRy tales. Some of them are Grimm. Some have happy endings. Who are the villains and who are the heroes? What are the morals we can draw from these stories?
Research Objects: more than the sum of the parts - Carole Goble
Workshop on Managing Digital Research Objects in an Expanding Science Ecosystem, 15 Nov 2017, Bethesda, USA
https://www.rd-alliance.org/managing-digital-research-objects-expanding-science-ecosystem
Research output is more than just the rhetorical narrative. The experimental methods, computational codes, data, algorithms, workflows, Standard Operating Procedures, samples and so on are the objects of research that enable reuse and reproduction of scientific experiments, and they too need to be examined and exchanged as research knowledge.
A first step is to think of Digital Research Objects as a broadening out to embrace these artefacts or assets of research. The next is to recognise that investigations use multiple, interlinked, evolving artefacts. Multiple datasets and multiple models support a study; each model is associated with datasets for construction, validation and prediction; an analytic pipeline has multiple codes and may be made up of nested sub-pipelines, and so on. Research Objects (http://researchobject.org/) is a framework by which the many, nested and contributed components of research can be packaged together in a systematic way, and their context, provenance and relationships richly described.
Keynote: SemSci 2017: Enabling Open Semantic Science
1st International Workshop co-located with ISWC 2017, October 2017, Vienna, Austria,
https://semsci.github.io/semSci2017/
Abstract
We have all grown up with the research article and article collections (let’s call them libraries) as the prime means of scientific discourse. But research output is more than just the rhetorical narrative. The experimental methods, computational codes, data, algorithms, workflows, Standard Operating Procedures, samples and so on are the objects of research that enable reuse and reproduction of scientific experiments, and they too need to be examined and exchanged as research knowledge.
We can think of "Research Objects" as coming in different types and as packaging all the components of an investigation. If we stop thinking of publishing papers and start thinking of releasing Research Objects (the way we release software), then scholarly exchange is a new game: ROs and their content evolve; they are multi-authored and their authorship evolves; they are a mix of virtual and embedded components, and so on.
But first, some baby steps before we get carried away with a new vision of scholarly communication. Many journals (e.g. eLife, F1000, Elsevier) are just figuring out how to package together the supplementary materials of a paper. Data catalogues are figuring out how to virtually package multiple datasets scattered across many repositories to keep the integrated experimental context.
Research Objects [1] (http://researchobject.org/) is a framework by which the many, nested and contributed components of research can be packaged together in a systematic way, and their context, provenance and relationships richly described. The brave new world of containerisation provides the containers and Linked Data provides the metadata framework for the container manifest construction and profiles. It’s not just theory, but also in practice with examples in Systems Biology modelling, Bioinformatics computational workflows, and Health Informatics data exchange. I’ll talk about why and how we got here, the framework and examples, and what we need to do.
[1] Sean Bechhofer, Iain Buchan, David De Roure, Paolo Missier, John Ainsworth, Jiten Bhagat, Philip Couch, Don Cruickshank, Mark Delderfield, Ian Dunlop, Matthew Gamble, Danius Michaelides, Stuart Owen, David Newman, Shoaib Sufi, Carole Goble, Why linked data is not enough for scientists, In Future Generation Computer Systems, Volume 29, Issue 2, 2013, Pages 599-611, ISSN 0167-739X, https://doi.org/10.1016/j.future.2011.08.004
Metagenomic Data Provenance and Management using the ISA infrastructure: overview, implementation patterns & software tools - Alejandra Gonzalez-Beltran
Slides presented at EBI Metagenomics Bioinformatics course: http://www.ebi.ac.uk/training/course/metagenomics2014
Cross-linked metadata standards, repositories and the data policies - The Bio... - Peter McQuilton
A 20 minute presentation given in Denver (CO) on the 17th September as part of the Biosharing Registry WG, Metadata Standards Catalog WG, and Publishing Data Workflows WG joint session at the Research Data Alliance 8th Plenary (part of International Data Week).
This presentation covers the explosion of metadata standards and databases in the life, biomedical and environmental sciences and how BioSharing is helping to understand this landscape, both in terms of the relationship between standards and other standards and databases, and the life cycle and evolution of each resource. BioSharing also links these resources to the data policies that recommend them (for example, from funding agencies or journal publishers), enabling an understanding of the entire data cycle, from conception to publishing and storage.
Implementation of GPU-based bioinformatic tools at the ENCODE DCC - ENCODE-DCC
An overview of the assays performed and distributed by the ENCODE DCC as well as a summary of the uniform processing pipelines that are being implemented by the ENCODE Consortium. Here, we talk about the impact using GPUs has on speed of running the ChIP-seq pipeline.
I have evidence that using git and GitHub for documentation, together with community documentation techniques, can give us 300 doc changes in a month. I've bet my career on these methods and I want to share them with you.
The Role of Metadata in Reproducible Computational Research - Jeremy Leipzig
Reproducible computational research (RCR) provides the keystone to the scientific method, packaging the transformation of raw data to published results in a manner that can be communicated to others. Developing RCR standards has been a growing concern of statisticians, data scientists, and informatics professionals. Metadata provides context and provenance for raw data, and is essential to both the discovery and validation of RCR. This presentation gives an overview of emerging metadata standards in data, analysis, pipelines, tools, and publications.
Rafael C Jimenez presents the Omics Discovery Index | OSFair2017 Workshop
Workshop title: How FAIR friendly is your data catalogue?
Workshop overview:
This workshop will build upon the work planned by the EOSCpilot data interoperability task and the BlueBridge workshop held on April 3 at the RDA meeting. We will investigate common mechanisms for interoperation of data catalogues that preserve established community standards, norms and resources, while simplifying the process of being/becoming FAIR. Can we have a simple interoperability architecture based on a common set of metadata types? What are the minimum metadata requirements to expose FAIR data to EOSC services and EOSC users?
DAY 3 - PARALLEL SESSION 6 & 7
DeepBlue epigenomic data server: programmatic data retrieval and analysis of ... - Felipe Albrecht
Short description and updates about DeepBlue Epigenomic Data Server that I presented during the last Blueprint (http://www.blueprint-epigenome.eu/) Jamboree in Madrid (June 2016)
How to make your published data findable, accessible, interoperable and reusable - Phoenix Bioinformatics
Seminar Presentation for PMB Department, UC Berkeley for Love Data Week. Subject is how to prepare publications and associated data sets for maximum reuse.
Open innovation contributions from RSC resulting from the Open PHACTS project - Ken Karapetyan
The Royal Society of Chemistry was pleased to contribute to Open PHACTS, a three-year project funded by the European Union's Innovative Medicines Initiative. Over those three years we developed our existing platforms, created new and innovative widgets and data platforms to handle chemistry data, extended existing chemistry ontologies, and embraced open semantic web standards; as a result, RSC served as the centralized chemistry data hub for the project. With the conclusion of Open PHACTS, we report on our experiences and provide an overview of the tools, capabilities and data released to the community, and how this may influence future projects. This includes the Open PHACTS open chemistry data dump, with chemistry-related data in both chemistry and semantic-web consumable formats, as well as some of the resulting chemistry software. The project made significant contributions to the chemistry community as well as to the supporting pharmaceutical companies and the biomedical community.
KnetMiner, with a silent "K" and standing for Knowledge Network Miner, is a suite of open-source software tools developed at Rothamsted Research for integrating and visualising large biological datasets in order to accelerate gene discovery. The software mines the myriad databases that describe an organism's biology to present links between relevant pieces of information, such as genes, biological pathways, phenotypes or publications. The aim is to provide leads for scientists who are investigating the molecular basis for a particular trait or ways of improving the organism's performance.
KnetMiner provides an easy-to-use web interface to visualisation and data mining tools for the discovery and evaluation of candidate genes from large-scale integrations of public and private data sets. It addresses the needs of scientists who generally lack the time and technical expertise to review all relevant information available in the literature, from key model species and from a potentially wide range of related biological databases. We have previously developed genome-scale knowledge networks (GSKNs) for multiple crop and animal species (Hassani-Pak et al. 2016). The KnetMiner web server searches and evaluates millions of relations and concepts within the GSKNs in real time to determine whether direct or indirect links between genes and trait-based keywords can be established. KnetMiner accepts as user inputs search terms in combination with a gene list and/or genomic regions. It produces a table of ranked candidate genes and allows users to explore the output in interactive genome and network map visualisation tools that have been optimised for web use on desktop and mobile devices. The KnetMiner web server and the GSKNs provide a step forward towards systematic and evidence-based gene discovery.
Observation of Io's Resurfacing via Plume Deposition Using Ground-based Adaptive Optics - Sérgio Sacani
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io's surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io's trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high-resolution imaging of Io's surface using adaptive optics at visible wavelengths.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ... - Sérgio Sacani
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4-0.9 µm) and novel JWST images with 14 filters spanning 0.8-5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at >2.3 µm to construct an ultradeep image, reaching as deep as ≈31.4 AB mag in the stack and 30.3-31.0 AB mag (5σ, r = 0.1" circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5-15. These objects show compact half-light radii of R1/2 ~ 50-200 pc, stellar masses of M⋆ ~ 10^7-10^8 M⊙, and star-formation rates of SFR ~ 0.1-1 M⊙ yr^-1. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward-modeling approach to infer the properties of the evolving luminosity function without binning in redshift or luminosity, which marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ~2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for the evolution of the dark matter halo mass function.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN - Sérgio Sacani
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
Brief information about the SCOP protein database used in bioinformatics.
The Structural Classification of Proteins (SCOP) database is a comprehensive and authoritative resource for the structural and evolutionary relationships of proteins. It provides a detailed and curated classification of protein structures, grouping them into families, superfamilies, and folds based on their structural and sequence similarities.
Seminar on U.V. Spectroscopy by Samir Panda
Spectroscopy is a branch of science dealing with the study of the interaction of electromagnetic radiation with matter.
Ultraviolet-visible spectroscopy refers to absorption or reflectance spectroscopy in the UV-VIS spectral region.
Ultraviolet-visible spectroscopy is an analytical method that measures the amount of light absorbed by the analyte.
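The amount of light absorbed is usually quantified through the Beer-Lambert law, the standard quantitative relation behind UV-VIS absorption measurements (not stated explicitly in the text above): A = ε·c·l. A minimal worked example:

```python
# Worked Beer-Lambert example: A = epsilon * c * l, where A is absorbance,
# epsilon the molar absorptivity (L mol^-1 cm^-1), c the concentration
# (mol/L), and l the cuvette path length (cm). Numbers are illustrative.

def absorbance(epsilon: float, conc_mol_per_l: float, path_cm: float = 1.0) -> float:
    """Absorbance predicted by the Beer-Lambert law."""
    return epsilon * conc_mol_per_l * path_cm

def transmittance(a: float) -> float:
    """Fraction of incident light transmitted: T = 10**(-A)."""
    return 10 ** (-a)

a = absorbance(epsilon=15000.0, conc_mol_per_l=5e-5)  # a strongly absorbing dye
print(round(a, 3))                  # 0.75
print(round(transmittance(a), 3))   # 0.178
```

The same relation is what lets an instrument report concentration from a measured absorbance once ε and the path length are known.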
Introduction:
RNA interference (RNAi), or post-transcriptional gene silencing (PTGS), is an important biological process for modulating eukaryotic gene expression.
It is a highly conserved process of post-transcriptional gene silencing in which double-stranded RNA (dsRNA) causes sequence-specific degradation of mRNA sequences.
dsRNA-induced gene silencing (RNAi) has been reported in a wide range of eukaryotes, including worms, insects, mammals and plants.
This process mediates resistance to both endogenous parasitic and exogenous pathogenic nucleic acids, and regulates the expression of protein-coding genes.
What are small ncRNAs?
micro RNA (miRNA)
short interfering RNA (siRNA)
Properties of small non-coding RNA:
Involved in silencing mRNA transcripts.
Called “small” because they are usually only about 21-24 nucleotides long.
Synthesized by first cutting up longer precursor sequences (like the 61nt one that Lee discovered).
Silence an mRNA by base pairing with some sequence on the mRNA.
Discovery of siRNA?
The first small RNA:
In 1993 Rosalind Lee (Victor Ambros lab) was studying a non-coding gene in C. elegans, lin-4, that was involved in silencing another gene, lin-14, at the appropriate time in the development of the worm.
Two small transcripts of lin-4 (22nt and 61nt) were found to be complementary to a sequence in the 3' UTR of lin-14.
Because lin-4 encoded no protein, she deduced that these transcripts must be causing the silencing through RNA-RNA interactions.
Types of RNAi (non-coding RNA):
miRNA: length 23-25 nt; trans-acting; binds its target mRNA with mismatches; causes translation inhibition.
siRNA: length 21 nt; cis-acting; binds its target mRNA as a perfectly complementary sequence.
piRNA: length 25-36 nt; expressed in germ cells; regulates transposon activity.
MECHANISM OF RNAI:
First the double-stranded RNA teams up with a protein complex named Dicer, which cuts the long RNA into short pieces.
Then another protein complex called RISC (RNA-induced silencing complex) discards one of the two RNA strands.
The RISC-docked, single-stranded RNA then pairs with the homologous mRNA and destroys it.
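The three steps above can be sketched as a toy model. This is purely illustrative: real Dicer cleavage and RISC target recognition are far more involved, and every sequence below is made up.

```python
# Toy model of the RNAi mechanism (illustrative only; real Dicer cleavage
# and RISC target recognition are far more involved, and the sequences
# here are made up).

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna: str) -> str:
    """Antisense strand of an RNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(rna))

def dicer(long_rna: str, size: int = 21) -> list:
    """Step 1: cut a long RNA strand into siRNA-sized fragments."""
    return [long_rna[i:i + size] for i in range(0, len(long_rna) - size + 1, size)]

def risc_silences(guide: str, mrna: str) -> bool:
    """Steps 2-3: RISC keeps one (guide) strand; a perfectly complementary
    stretch on the mRNA marks it for cleavage."""
    return reverse_complement(guide) in mrna

mrna = "AUGGCUACGUACGAUCGAUCGAUGCUAGCUAGGCAUGCUAA"
antisense = reverse_complement(mrna)   # one strand of the dsRNA trigger
guides = dicer(antisense)              # Dicer products loaded into RISC
silenced = any(risc_silences(g, mrna) for g in guides)
print(silenced)  # True: the mRNA is targeted for degradation
```

An unrelated guide (e.g. a run of 21 A's) would not match, modeling the sequence specificity that the text emphasizes.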
THE RISC COMPLEX:
RISC is a large (>500 kDa) RNA-protein complex that triggers degradation of the target mRNA.
The double-stranded siRNA is unwound by an ATP-independent helicase.
The active component of RISC is the Argonaute (Ago) protein, an endonuclease that cleaves the target mRNA.
DICER: an endonuclease of the RNase III family.
Argonaute: Central Component of the RNA-Induced Silencing Complex (RISC)
One strand of the dsRNA produced by Dicer is retained in the RISC complex in association with Argonaute
ARGONAUTE PROTEIN:
1. PAZ (PIWI/Argonaute/Zwille): recognition of the target mRNA.
2. PIWI (P-element induced wimpy testis): breaks the phosphodiester bond of the mRNA (RNase H activity).
miRNA:
Double-stranded RNAs are naturally produced in eukaryotic cells during development, and they have a key role in regulating gene expression.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... - Ana Luísa Pinho
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich in features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and their quality of enabling complex behavior compounded of discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization.
To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
Nutraceutical market, scope and growth: Herbal drug technology - Lokesh Patil
As consumer awareness of health and wellness rises, the nutraceutical market—which includes goods like functional meals, drinks, and dietary supplements that provide health advantages beyond basic nutrition—is growing significantly. As healthcare expenses rise, the population ages, and people want natural and preventative health solutions more and more, this industry is increasing quickly. Further driving market expansion are product formulation innovations and the use of cutting-edge technology for customized nutrition. With its worldwide reach, the nutraceutical industry is expected to keep growing and provide significant chances for research and investment in a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
This PDF is about schizophrenia.
Richard's adventures in two entangled wonderlands - Richard Gill
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
What are greenhouse gases and how many gases are there that affect the Earth - moosaasad1975
What are greenhouse gases, how do they affect the Earth and its environment, what is the future of the environment and the Earth, and how are the weather and the climate affected?
ENCODE-DCC-metadata-standard-Biocurator 2014
1. The
ENCODE
metadata
standard
to
integrate
diverse
experimental
data
sets
Eurie
L.
Hong,
Ph.D.
(@elhong)
Project
Manager,
ENCODE
DCC
Department
of
GeneFcs
•
Stanford
University
School
of
Medicine
Intro
to
the
DCC
Metadata
definiFon
Using
ontologies
Accessing
metadata
2. ENCODE DCC
Galt Barber, Morgan Maddren, Nikhil Podduturi, Greg Roe, Kate Rosenbloom, Laurence Rowe
Esther Chan, Venkat Malladi, Cricket Sloan, Seth Strattan
Eurie Hong, Mike Cherry (PI), Jim Kent (co-PI), Ben Hitz
Brian Lee, Stuart Miyasato, Matt Simison, Zhenhua Wang
Not pictured: Tim Dreszer, Jorge Garcia, Donna Karolchik, Katrina Learned, Forrest Tanaka, Marcus Ho
Data wranglers • Software engineers • QA, sysadmins, admin
@encodedcc • encode-help@lists.stanford.edu
https://github.com/ENCODE-DCC/encoded
3. Role of the Data Coordination Center
• Production labs & analysis groups — role: data generation; tasks: perform assays, perform analyses, validate data, submit data files, submit metadata
• DCC (ENCODE portal) — role: data organization; tasks: data processing & validation, data file storage, metadata curation
• Scientific community & integrative websites — role: data access; tasks: web-based searches, data downloads, Genome Browser
Data files and metadata flow from the production labs and analysis groups through the DCC to the scientific community.
4. Challenge: How do you define a metadata standard for diverse assays in multiple species?
(Figure modified from PLoS Biol 9:e1001046, 2011; M. Pazin)
5. Principles driving metadata definition
• Provide transparency about how experiments were performed
• Capture data provenance during analyses
• Communicate key experimental variables of an experiment
• Communicate quality metrics about the data
• Help analyze and interpret the data
• Help organize and find the data
6. Capture the experimental design
An Experiment contains Biological replicates 1 and 2, each with Technical replicates 1 and 2; Controls 1 and 2 are modelled as experiments of their own. Data files are attached to technical replicates, and results files to the experiment.
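The replicate hierarchy described here can be sketched as a nested structure. This is a minimal illustration only; the field and file names below are hypothetical, not the actual ENCODE schema.

```python
# Toy representation of the experimental design: two biological
# replicates, each with two technical replicates carrying data files.
# All names are illustrative, not real ENCODE fields.
experiment = {
    "replicates": [
        {"biological_replicate": 1, "technical_replicate": 1, "data_files": ["rep1_tech1.fastq"]},
        {"biological_replicate": 1, "technical_replicate": 2, "data_files": ["rep1_tech2.fastq"]},
        {"biological_replicate": 2, "technical_replicate": 1, "data_files": ["rep2_tech1.fastq"]},
        {"biological_replicate": 2, "technical_replicate": 2, "data_files": ["rep2_tech2.fastq"]},
    ],
    "controls": ["control-1", "control-2"],
    "results_files": ["peaks.bed"],
}

# Group data files by biological replicate to recover the hierarchy.
by_bio = {}
for rep in experiment["replicates"]:
    by_bio.setdefault(rep["biological_replicate"], []).extend(rep["data_files"])
print(sorted(by_bio))  # → [1, 2]
```

Capturing the design this way makes it trivial to answer questions such as "which files belong to biological replicate 2?" without re-reading lab notebooks.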
7. Identify reusable experimental variables (selected subset of all metadata)
For an experiment with replicates:
• Biosamples: type (e.g. tissue, cell line); ontology term name; source, product id, lot id; treatments; knockdown; fusion construct information; donor or strain information; dates (e.g. growth, harvest, procurement); passage number; starting amount; lab-assigned IDs
• Antibodies: source, product id, lot id; isotype; antigen; host; purification method; validation status; NHGRI approval status; target; species; dbxrefs
• Libraries: library preparation protocol; strand specificity; size selection method; validation document; lysis method; sonication method; extraction method; nucleic acid type; nucleic acid size range
• Files
• Peak calls / alignments: reference genome version; alignment software; software parameters; software version; quality metrics (e.g. NRF, FRiP)
8. Accession them
Each of these objects (biosamples, antibodies, libraries, files, peak calls/alignments) receives its own unique accession, e.g.:
• Experiment with replicates: ENCSR000DRY
• Biosample: ENCBS095DKV
• Donor: ENCDO826IFN
• Antibody: ENCAB964IAU
• Library: ENCLB239KAN
• File: ENCFF254TDA
9. Define their relationship to each other
An Experiment has Replicates; each Replicate has an Antibody and a Library (plus Files); each Library has a Biosample; each Biosample has a Donor.
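The object relationships above can be sketched as a small linked data model, wired up with the accessions from the previous slide. This is an illustrative sketch; the class and field names are mine, not the actual ENCODE schema.

```python
from dataclasses import dataclass, field

# Hypothetical object model mirroring the Experiment → Replicate →
# Library → Biosample → Donor relationships; not the real ENCODE schema.

@dataclass
class Donor:
    accession: str

@dataclass
class Biosample:
    accession: str
    donor: Donor

@dataclass
class Library:
    accession: str
    biosample: Biosample

@dataclass
class Replicate:
    library: Library
    antibody_accession: str
    files: list = field(default_factory=list)

@dataclass
class Experiment:
    accession: str
    replicates: list = field(default_factory=list)

# Wire up the example accessions from the slides.
donor = Donor("ENCDO826IFN")
biosample = Biosample("ENCBS095DKV", donor)
library = Library("ENCLB239KAN", biosample)
replicate = Replicate(library, "ENCAB964IAU", files=["ENCFF254TDA"])
experiment = Experiment("ENCSR000DRY", replicates=[replicate])

# Walking the graph from experiment down to donor:
print(experiment.replicates[0].library.biosample.donor.accession)  # → ENCDO826IFN
```

Because each object is accessioned and linked, metadata such as a donor or antibody lot can be reused across many experiments instead of being re-entered per dataset.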
10. Challenge: Find common biosamples from data generated by two consortia
• 356 terms: http://encodeproject.org/ENCODE/cellTypes.html
• 314 terms: GEO characteristics (common_name, tissue_type, cell_type, lines)
Projects are internally consistent…
11. … but only 3 biosample names match exactly between projects
• 360 cell-type terms vs. 314 GEO terms
• Exact matches: IMR90, PBMC, Th17
12. Challenge: Find all heart-related tissues?
• Heart_OC, HCF, HCFaa, HCM, others?
• Fetal Heart, Heart, Right Atrium, Right Ventricle, others?
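This is where ontology annotation helps: if every biosample term is mapped to a tissue ontology, heart-related samples can be found by collecting all descendants of the "heart" term rather than guessing at free-text names. A minimal sketch, using a tiny hand-built hierarchy whose terms and edges are purely illustrative (not actual ontology content):

```python
# Toy term hierarchy; in practice this would come from an ontology
# such as UBERON, and the edges below are illustrative only.
children = {
    "heart": ["fetal heart", "right atrium", "right ventricle", "cardiac fibroblast"],
    "fetal heart": [],
    "right atrium": [],
    "right ventricle": [],
    "cardiac fibroblast": [],
}

def descendants(term, tree):
    """Collect a term plus every term beneath it in the hierarchy."""
    found = {term}
    for child in tree.get(term, []):
        found |= descendants(child, tree)
    return found

heart_terms = descendants("heart", children)
# Any biosample annotated with one of these terms is heart-related.
print(sorted(heart_terms))
```

The same traversal answers the cross-consortium matching problem: two projects that use different display names but the same ontology term ids can be joined on those ids.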
15. Challenge: Provide user-friendly *AND* programmatic access to the data
The DCC metadata database stores metadata as JSON-LD. The same records can be viewed as web pages (and linked from the Genome Browser), or queried by scripts through a REST API using GET, PATCH, and POST commands.
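A script can therefore retrieve the same record a browser renders as a page by requesting JSON instead. The sketch below shows the general shape of such a GET request; the parsing demo runs on a simplified mock response, since the real records are far richer than the fields shown here.

```python
import json
import urllib.request

BASE = "https://www.encodeproject.org"

def fetch_experiment(accession):
    """GET one experiment record from the REST API as JSON."""
    url = f"{BASE}/experiments/{accession}/?format=json"
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Offline illustration of parsing such a response; the fields below
# are a simplified mock, not a complete record.
mock_response = json.loads("""
{"accession": "ENCSR000DRY", "assay_term_name": "ChIP-seq",
 "replicates": [{"biological_replicate_number": 1},
                {"biological_replicate_number": 2}]}
""")
bio_reps = [r["biological_replicate_number"] for r in mock_response["replicates"]]
print(mock_response["accession"], bio_reps)  # → ENCSR000DRY [1, 2]
```

Because the human-readable pages and the API serve the same underlying JSON-LD objects, anything findable by clicking is also scriptable.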
17. Future directions
• Metadata definition: finalize software and file provenance
• Ontology-based searches: implement searches for ChIP-seq targets using GO annotations
• Programmatic access: implement additional validations upon data submission
18. Conclusions (Intro to the DCC • Metadata definition • Using ontologies • Accessing metadata)
• We developed a single data model that reflects the experimental process to store the 30+ assays done by the ENCODE production labs.
• Using ontologies to annotate metadata provides instant interoperability with other datasets and search functionality.
• An application built on a REST API and JSON-LD supports programmatic querying across other scientific resources.
19. Acknowledgements
Brian Lee, Nikhil Podduturi, Greg Roe, Laurence Rowe
Esther Chan, Venkat Malladi, Cricket Sloan, Seth Strattan
Eurie Hong, Mike Cherry (PI), Jim Kent (co-PI), Ben Hitz
@encodedcc • encode-help@lists.stanford.edu