Presented during the 34th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC'12). Part of the workshop 'New Models and Modes for Data Sharing: Experiences from Neuroscience'. Presented by Jeffrey S. Grethe, Ph.D. from the Center for Research in Biological Systems at the University of California, San Diego.
This workshop featured several large-scale efforts to establish data sharing platforms, standards and tools to promote data-intensive analysis in the neurosciences. As we head into the second decade of the 21st century, many scientists realize that current methods for publishing and accessing data are outmoded and inefficient. Neuroscience, with its large, diverse and highly competitive community, has been slow to adopt more open sharing of data and has lacked effective tools to do so. There has been a significant investment in databases and tools for biological science, and frequent calls for more of them, but few calls to the biological community to adopt practices and frameworks for making their resources more easily discoverable and data more accessible. Data are contained within diverse sources, from web pages, databases and literature to personal lab systems, making for a haphazard mechanism for data and tool discovery. Although these mechanisms are effective for small communities, they are parochial with respect to the totality of resources available, leading to fragmentation in the resource ecosystem. Neuroscience, with its diverse subdisciplines, complex data types and broad domain, presents the perfect exemplar of the current practices, bottlenecks and issues surrounding open access to data. This situation is changing, however, as groups have started to work together to define new models and tools for sharing and analyzing neuroscience data on an international scale. In this workshop, we bring together experts from national and international projects to discuss issues of data access and progress towards establishing platforms and best practices for effective sharing of neuroscience data in support of basic and clinical neuroscience.
Where are the Data? Perspectives from the Neuroscience Information Framework.
1. Where are the Data? Perspectives from the Neuroscience Information Framework
Jeffrey S. Grethe, Ph.D.
Center for Research in Biological Systems, University of California, San Diego
3. “Neural Choreography”
“A grand challenge in neuroscience is to elucidate brain function in relation to its multiple layers of organization that operate at different spatial and temporal scales. Central to this effort is tackling “neural choreography” -- the integrated functioning of neurons into brain circuits -- their spatial organization, local and long-distance connections, their temporal orchestration, and their dynamic features. Neural choreography cannot be understood via a purely reductionist approach. Rather, it entails the convergent use of analytical and synthetic tools to gather, analyze and mine information from each level of analysis, and capture the emergence of new layers of function (or dysfunction) as we move from studying genes and proteins, to cells, circuits, thought, and behavior.... However, the neuroscience community is not yet fully engaged in exploiting the rich array of data currently available, nor is it adequately poised to capitalize on the forthcoming data explosion.”
Akil et al., Science, Feb 11, 2011
4. “We speak piously of taking measurements and making small studies that will add another brick to the temple of science. Most such bricks just lie around the brickyard.”
Platt, J.R. (1964) Strong Inference. Science. 146: 347-353.
“We now have unprecedented ability to collect data about nature… but there is now a crisis developing in biology, in that completely unstructured information does not enhance understanding”
Sidney Brenner
5. The Data Federation Problem
No single technology serves these all equally well.
→ Multiple data types; multiple scales; multiple databases
• Whole brain data (20 um microscopic MRI)
• Mosaic LM images (1 GB+)
• Conventional LM images
• Individual cell morphologies
• EM volumes & reconstructions
• Solved molecular structures
Neuroscience is unlikely to be served by a few large databases like the genomics and proteomics community
7. What do you mean by data?
Databases come in many shapes and sizes
• Primary data:
  – Data available for reanalysis, e.g., microarray data sets from GEO; brain images from XNAT; microscopic images (CCDB/CIL)
• Secondary data
  – Data features extracted through data processing and sometimes normalization, e.g., brain structure volumes (IBVD), gene expression levels (Allen Brain Atlas); brain connectivity statements (BAMS)
• Tertiary data
  – Claims and assertions about the meaning of data
    • E.g., gene upregulation/downregulation, brain activation as a function of task
• Registries:
  – Metadata
  – Pointers to data sets or materials stored elsewhere
• Data aggregators
  – Aggregate data of the same type from multiple sources, e.g., Cell Image Library, SUMSdb, Brede
• Single source
  – Data acquired within a single context, e.g., Allen Brain Atlas
8. Data, not just stories about them!
47/50 major preclinical published cancer studies could not be replicated
• “The scientific community assumes that the claims in a preclinical study can be taken at face value -- that although there might be some errors in detail, the main message of the paper can be relied on and the data will, for the most part, stand the test of time. Unfortunately, this is not always the case.”
• “There are no guidelines that require all data sets to be reported in a paper; often, original data are removed during the peer review and publication process.”
• Getting data out sooner in a form where they can be exposed to many eyes and many analyses, and easily compared, may allow us to expose errors and develop better metrics to evaluate the validity of data
Begley and Ellis, 29 MARCH 2012 | VOL 483 | NATURE | 531
9. In an ideal world...
We’d like to be able to find
• What is known:
  – What is the average diameter of a Purkinje neuron?
  – Is GRM1 expressed in cerebral cortex?
  – What are the projections of hippocampus?
  – What genes have been found to be upregulated in chronic drug abuse in adults?
  – Find images showing dendritic spines containing membrane bound organelles
  – What animal models have similar phenotypes to Parkinson’s disease?
  – What studies used my polyclonal antibody against GABA in humans?
• What is not known:
  – Connections among data
  – Gaps in knowledge
Without some sort of framework, this is very difficult to do
10. The Problems Researchers Face
• We are not publishing data in a form that is easy to find or integrate
• What we mean isn’t clear to a search engine (or even to a human)
• NIF Registry: A catalog of neuroscience-relevant resources (> 4700 currently described; > 2000 databases)
• Searching and navigating across individual resources takes an inordinate amount of human effort
11. But we have Google!
• Current web is designed to share documents
  – Documents are unstructured data
• Much of the content of digital resources is part of the “hidden web”
• Wikipedia: The Deep Web (also called Deepnet, the invisible Web, DarkNet, Undernet or the hidden Web) refers to World Wide Web content that is not part of the Surface Web, which is indexed by standard search engines.
12. But we have PubMed!
• Bulk of neuroscience data is published as part of papers
  – > 20,000,000
• Structured vs. unstructured information
  – Author, year, journal, keywords
“...it is a growing challenge to ensure that data produced during the course of reported research are appropriately described, standardized, archived, and available to all.”
Lead Science editorial (Science 11 February 2011: Vol. 331 no. 6018 p. 649)
13. NIF: A New Type of Entity for New Modes of Scientific Dissemination
• NIF’s mission is to maximize the awareness of, access to and utility of digital resources produced worldwide to enable better science and promote efficient use
  – NIF is the only neuroscience information entity that views resources globally without respect to domain, funding agency, institute or community
  – NIF is like a “PubMed” for all neuroscience resources
  – Aggregates all the different databases, tools and resources now produced by the scientific community
  – Makes them searchable from a single interface
  – A practical approach to the data deluge
  – The “authority” on resources for neuroscience
  – Educate neuroscientists and students about effective data sharing
14. People use NIF to...
• Find resources
  – “Where can I find a translation of Talairach to MNI coordinates?” - NIF Forum
  – “What biospecimen banks are available with tissues from opiate addicts?” - NIH
• Find answers
  – “What is the amount of data published on males vs females?” - NIH
  – “What projects to the ventral lateral geniculate nucleus?” - UCSD researcher
  – “What is known about the choroid plexus?” - Small business owner
• NIF is listed in the library guides of > 85 research universities worldwide (up 70% from last year)
• NIF receives hits from > 350 colleges and universities every month
• NIF receives hits from pharmaceutical companies
• Listed as link on 4 societies: Society for Neuroscience, American Association of Anatomists, Society of Immune Pharmacology, American Academy of Neurology
• Track resource utilization
  – What projects are using my antibody/mouse/database?
• Serve as a springboard
  – NIF ontologies, tools and data resources are used by many groups (>80,000 hits/month on NIF services)
  – NIF technologies and expertise jumpstart related efforts
    • One Mind for Research
15. An Overview of NIF
• Assembled the largest searchable collation of neuroscience data on the web
• The largest catalog of biomedical resources (data, tools, materials, services) available
• The largest ontology for neuroscience
• NIF search portal: simultaneous search over data, NIF catalog and biomedical literature
• Neurolex Wiki: a community wiki serving neuroscience concepts
• A unique technology platform
• Cross-neuroscience analytics
• A reservoir of cross-disciplinary biomedical data expertise
16. NIF services for data providers
• NIF ensures that all data are discoverable, accessible and understandable
  – If data are already in a database, NIF federates them
    • Aligns data to common framework
    • Makes them collectively searchable
    • Provides uniform data access services for linking resources
  – If data are not in a database:
    • NIF locates a suitable database within its federation and facilitates ingestion
    • If no database is available, NIF creates a reasonable structure using its database tools; stores data in available data repositories (currently UCSD CRBS/SDSC) and makes it available through the NIF portal
  – Assigns a URI for data identification
NIF uses manual, semi-automated and automated tools for ingestion and curation
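The "aligns data to common framework" and "collectively searchable" steps can be illustrated with a minimal sketch. Everything named here (the sources `source_a` and `source_b`, their field names, and the helper functions) is hypothetical and chosen for illustration; the slides do not describe how NIF implements federation.

```python
# Hypothetical per-source mappings from native field names to a common schema.
FIELD_MAPS = {
    "source_a": {"gene_symbol": "gene", "organism": "species"},
    "source_b": {"GeneName": "gene", "Species": "species"},
}


def align(source, record):
    """Rename a record's native fields to the common schema and tag its source."""
    mapping = FIELD_MAPS[source]
    aligned = {
        common: record[native]
        for native, common in mapping.items()
        if native in record
    }
    aligned["source"] = source
    return aligned


def search(records, **criteria):
    """Case-insensitive exact match on common-schema fields across all sources."""
    return [
        r for r in records
        if all(str(r.get(k, "")).lower() == str(v).lower()
               for k, v in criteria.items())
    ]


federated = [
    align("source_a", {"gene_symbol": "Fos", "organism": "mouse"}),
    align("source_b", {"GeneName": "FOS", "Species": "Mouse"}),
]
hits = search(federated, gene="fos", species="mouse")
```

Keeping the mapping as data rather than code means a newly registered source only needs a new entry in the mapping table, which is one plausible way to keep the cost low for resource providers.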
17. Registering a resource in NIF
NIF provides a set of tools and services for easy sharing of data and linking of data to articles, web sites etc.
What users are searching for:
– NIF makes it easy to add and manage resources through NIF
  • Need to respect resource and time constraints of resource providers
– Different levels of access
  • NIF Registry (basic)
  • NIF Site Map
  • NIF level 2
    – create web access and basic structure for resources without API
    – Utilizes DISCO tools developed at Yale
  • NIF level 3: Web service access, schema registration
18. NIF Registry
• NIF Registry: each resource gets its own URI and own Wiki page
  – Insert maps, Twitter feeds
• NIF site map: manage updates to your resource page
  – Utilizes DISCO protocol (Luis Marenco, Rixin Wang, Yale U)
  – NIF also consumes other sitemaps for bioscience, e.g., Biositemaps
19. The NeuroLex Wiki: A lexicon for neuroscience
• Semantic wiki tracking > 18,000 neuroscience concepts
• Built from and for NIF ontologies
• Supports integration of tools and widgets
20. A dynamic index for neuroscience
• Parts of rodent brain
• Parts of white matter
• Parts of human brain
21. A Semantically Enabled Search Engine
• NIF has developed a production technology platform for researchers to discover, share, access, analyze, and integrate neuroscience-relevant information
  – Semantically-enabled search engine and interface that customizes results for neuroscience
  – System that searches the “hidden web”, i.e., content not well served by search engines
  – Automated data harvesting technologies that produce dynamic indices of data content including databases, web pages, text, xml etc.
  – Easy to use tools to make products and data available
• NIF has developed a wealth of knowledge about data resources and data integration in the life sciences
22. NIF Data Federation
NIF provides access to the largest collection of neuroscience relevant data on the web, all from a single interface – already have surpassed year 4 cumulative targets
[Figure: Number of Federated Records (Millions) and Number of Federated Databases, Jun-08 to Apr-12; annotations: RDP, DISCO]
• Resource Registry: 4700 ...
• Antibodies: 935,000
• Brain connectivity: 66,000
• Animal models: 270,000
• Brain activation foci: 56,000
25. Making common neuroscience concepts computable: concept-based queries
• Search Google: GABAergic neuron
• Search NIF: GABAergic neuron
  – NIF automatically searches for types of GABAergic neurons
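A concept-based query of this kind can be sketched as ontology-driven query expansion: the search term is replaced by the term plus all of its subtypes. The small is-a hierarchy below is a hand-written illustration, not an excerpt from the NIF ontologies, and the function names are assumptions made for this example.

```python
from collections import deque

# Illustrative is-a hierarchy (child -> parent); not taken from NIFSTD.
SUBCLASS_OF = {
    "Purkinje cell": "GABAergic neuron",
    "basket cell": "GABAergic neuron",
    "cerebellar basket cell": "basket cell",
    "GABAergic neuron": "neuron",
}


def expand_concept(term, subclass_of=SUBCLASS_OF):
    """Return the term followed by all of its descendants (breadth-first)."""
    children = {}
    for child, parent in subclass_of.items():
        children.setdefault(parent, []).append(child)

    expanded, queue = [term], deque([term])
    while queue:
        for child in sorted(children.get(queue.popleft(), [])):
            if child not in expanded:
                expanded.append(child)
                queue.append(child)
    return expanded


def build_query(term):
    """Turn the expanded concept list into a single OR query string."""
    return " OR ".join(f'"{t}"' for t in expand_concept(term))


query = build_query("GABAergic neuron")
```

A plain keyword search matches only the literal string, whereas the expanded query also retrieves records annotated with a subtype such as "Purkinje cell" that never mention the parent term.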
26. “Search computing”
What genes are upregulated by drugs of abuse in the adult mouse?
• Morphine
• Increased expression
• Adult Mouse
Some concepts, e.g., age category, are quantitative but still must be interpreted in a global query system
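The point about quantitative concepts can be made concrete with a small sketch: a record stores an age in days, while the query asks for "adult", so the system needs a species-specific rule to connect the two. The thresholds and function names below are placeholders invented for illustration, not values used by NIF.

```python
# Hypothetical thresholds: postnatal day from which an animal counts as adult.
ADULT_FROM_DAYS = {"mouse": 60, "rat": 70}


def age_category(species, age_days):
    """Map a numeric age to a coarse category using a per-species threshold."""
    threshold = ADULT_FROM_DAYS.get(species.lower())
    if threshold is None:
        return "unknown"  # no rule registered for this species
    return "adult" if age_days >= threshold else "juvenile"


def matches_query(record, species, category):
    """True if the record's species and interpreted age category match the query."""
    return (
        record["species"].lower() == species.lower()
        and age_category(record["species"], record["age_days"]) == category
    )


record = {"species": "Mouse", "age_days": 90, "gene": "GeneA"}
is_hit = matches_query(record, "mouse", "adult")
```

The same number of days can fall into different categories for different species, which is why the interpretation has to live in the query system rather than in each source database.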
27. NIF STANDARD ONTOLOGIES (NIFSTD)
Bill Bug et al.
• Set of modular ontologies
  – Covering neuroscience relevant terminologies
  – Comprehensive: 50,000+ distinct concepts + synonyms
• Expressed in OWL-DL language
• Closely follows OBO community best practices
  – As long as they seem practical
• Avoids duplication of efforts
  – Standardized to the same upper level ontologies, e.g., Basic Formal Ontology (BFO), OBO Relations Ontology (OBO-RO), Phenotypical Qualities Ontology (PATO)
  – Relies on existing community ontologies, e.g., CHEBI, GO, PRO, OBI etc.
• Modules cover orthogonal domains, e.g., Brain Regions, Cells, Molecules, Subcellular parts, Diseases, Nervous system functions, etc.
28. Data Services for Users
Vocabulary
• NITRC (autocomplete)
• Neuroscience.com (annotate)
• INCF Atlasing tools
Data Summary (NIF Navigator)
• NIDA, Blueprint
• NeuroLex
Individual Data Sources
• DOMEO
• OneMind
• Eagle I
Current DISCO Services (LinkOut)
Planned
• PubMed
29. NIF LinkOut Broker: Connecting Resources
NIF inserted > 800,000 references to PubMed IDs
NIF inserts links between data and articles on behalf of data providers using NCBI’s LinkOut feature
30. Grabbing the long tail of small data
• Analysis of NIF shows multiple databases with similar scope and content
• Many contain partially overlapping data
• Data “flows” from one resource to the next
  – Data is reinterpreted, reanalyzed or added to
  – When does it become something else?
• Is duplication good or bad?
31. NIF Analytics: The Neuroscience Ecosystem
Where are the data?
[Figure: data sources by brain region; labels include Striatum, Brain, Hypothalamus, Olfactory bulb, Cerebral cortex]
NIF is in a unique position to answer questions about the neuroscience ecosystem
32. How much of the landscape do we have?
Query for “reference” brain structures and their parts in NIF Connectivity database
33. Embracing duplication: Data Mash ups
• ~300 PMIDs were common between Brede and SUMSdb
• Same information; value added
Same data - different aspects
34. Same data: different analysis
Chronic vs acute morphine in striatum
• Drug Related Gene database: extracted statements from figures, tables and supplementary data from published article
• Gemma: Reanalyzed microarray results from GEO using different algorithms
• Both provide results of increased or decreased expression as a function of experimental paradigm
  – 4 strains of mice
  – 3 conditions: chronic morphine, acute morphine, saline
Mined NIF for all references to GEO IDs: found small number where the same dataset was represented in two or more databases
http://www.chibi.ubc.ca/Gemma/home.html
35. How easy was it to compare?
• Gemma: Gene ID + Gene Symbol
• DRG: Gene name + Probe ID
• Gemma: Increased expression/decreased expression (NIF annotation standard)
• DRG: Increased expression/decreased expression (NIF annotation standard)
  – But... Gemma presented results relative to baseline chronic morphine; DRG with respect to saline, so direction of change is opposite in the 2 databases
• Analysis:
  – 1370 statements from Gemma regarding gene expression as a function of chronic morphine
  – 617 were consistent with DRG; → over half of the claims of the paper were not confirmed in this analysis
  – Results for 1 gene were opposite in DRG and Gemma
  – 45 did not have enough information provided in the paper to make a judgment
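The baseline problem described above can be sketched in a few lines: before two databases are compared, each direction of change has to be re-expressed against the same baseline. The data layout, gene names and function names are hypothetical, and the flip rule assumes a simple two-condition contrast (chronic morphine vs. saline); this is not the analysis code used for the slide.

```python
# Reversing the baseline of a two-condition contrast reverses the direction.
FLIP = {"increased": "decreased", "decreased": "increased"}


def normalize(direction, baseline, reference_baseline="saline"):
    """Express a direction of change relative to reference_baseline."""
    return direction if baseline == reference_baseline else FLIP[direction]


def compare(gemma, drg):
    """Count consistent, opposite and missing statements.

    gemma, drg: dict mapping gene -> (direction, baseline).
    """
    counts = {"consistent": 0, "opposite": 0, "missing": 0}
    for gene, (direction, baseline) in gemma.items():
        if gene not in drg:
            counts["missing"] += 1
            continue
        same = normalize(direction, baseline) == normalize(*drg[gene])
        counts["consistent" if same else "opposite"] += 1
    return counts


# Placeholder gene names; values are made up for illustration.
gemma = {
    "GeneA": ("decreased", "chronic morphine"),
    "GeneB": ("increased", "chronic morphine"),
    "GeneC": ("increased", "chronic morphine"),
}
drg = {
    "GeneA": ("increased", "saline"),
    "GeneB": ("increased", "saline"),
}
summary = compare(gemma, drg)
```

Without the normalization step, GeneA would be counted as a contradiction even though both sources report the same biological effect.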
36. A global view of data
Informatics should not be an afterthought
– You (and the machine) have to be able to find it
  • Accessible through the web
  • Annotations
– You have to be able to use it
  • Data type specified and in a usable form
– You have to know what the data mean
  – Some semantics
  – Context: Experimental metadata
  – Provenance: Where did the data come from?
Reporting neuroscience data within a consistent framework helps enormously
37. Competition, Cooperation, Coordination, Collaboration
• We live in a linked world: “Too Big to Know”
• Multiple efforts are underway simultaneously
  – Launched without knowledge of others
  – Mine is better / Not Invented Here
• Cooperation and coordination will allow us to move forward faster
  – NIF has tried to be a good citizen by sharing expertise, data, knowledge, tools
38. NIF team (past and present)
Maryann Martone, UCSD, Principal Investigator
Jeffrey Grethe, UCSD, Co Investigator
Amarnath Gupta, UCSD, Co Investigator
Anita Bandrowski, NIF Project Leader
Gordon Shepherd, Yale University
Perry Miller
Luis Marenco
Rixin Wang
David Van Essen, Washington University
Erin Reid
Paul Sternberg, Cal Tech
Arun Rangarajan
Hans Michael Muller
Yuling Li
Giorgio Ascoli, George Mason University
Sridevi Polavarum
Tim Clark, Harvard University
Paolo Ciccarese
Jonathan Pollock, NIH, Program Officer
Karen Skinner, NIH, Program Officer
Vadim Astakhov, Davis Banks, Bill Bug, Jonathan Cachat, Chris Condit, Mark Ellisman, Lee Hornbrook, Fahim Imam, Stephen Larson, Jennifer Lawrence, Cliff Lee, Larry Lui, Sarah Maynard, Binh Ngo, Andrea Arnaud Stagg, Xufei Qian, Willie Wong