Towards Incidental Collaboratories; Research Data Services

Research Data Services: Towards a Framework for Incidental Collaboratories

Anita de Waard
VP Research Data Collaborations, Elsevier RDS
Jericho, VT, USA

Brief bio:

• Background:
  – Low-temperature physics (Leiden & Moscow)
  – Joined Elsevier in 1988 as a publisher in solid-state physics
  – 1991: arXiv => publishers will go out of business very soon!
• 1997–now: Disruptive Technologies Director, focus on better representation of scientific knowledge:
  – Identifying key knowledge elements in articles (linguistics thesis)
  – Building claim-evidence networks (through collaborations)
  – Helping build communities to accelerate the rate of change (Force11)
• Starting 1/1/2013: VP Research Data Collaborations. Why?
  – Douglas Engelbart's thinking: connect minds!
  – My (non-biologist's) understanding of biology:

The big problem in biology:

Interspecies variability: a specimen is not a species
Gene expression variability: knowing genes is not knowing how they are expressed
Microbiome: an animal is an ecosystem
Systems biology: a whole is more than the sum of its parts

Reductionist science doesn't work for living systems!

(Image: http://en.wikipedia.org/wiki/File:Duck_of_Vaucanson.jpg)

Statistics to the rescue!

With enough observations, trends and anomalies can be detected:

• "Here we present resources from a population of 242 healthy adults sampled at 15 or 18 body sites up to three times, which have generated 5,177 microbial taxonomic profiles from 16S ribosomal RNA genes and over 3.5 terabases of metagenomic sequence so far."
  The Human Microbiome Project Consortium, Structure, function and diversity of the healthy human microbiome, Nature 486, 207–214 (14 June 2012), doi:10.1038/nature11234

• "The large sample size — 4,298 North Americans of European descent and 2,217 African Americans — has enabled the researchers to mine down into the human genome."
  Nidhi Subbaraman, Nature News, 28 November 2012, High-resolution sequencing study emphasizes importance of rare variants in disease.

• "A profile unique for a DNA sample source is obtained … a series of numbers are generated which can be used as a bar code for that DNA source. A registry of bar codes would make it easy to compare DNA samples"
  Roland M. Nardone, Ph.D., Eradication of Cross-Contaminated Cell Lines: A Call for Action, http://www.sivb.org/publicPolicy_Eradication.pdf
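The bar-code registry proposed in the Nardone quote can be sketched in a few lines. This is a minimal sketch, not an existing service: it assumes a profile is a mapping from locus name to allele repeat counts, and the locus names, `BarCodeRegistry`, `bar_code` and `shared_loci_fraction` are all hypothetical. Sorting loci and alleles gives every profile one canonical, hashable key, so identical samples land on the same registry entry.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical profile format: locus name -> allele repeat counts,
# e.g. {"LOCUS_A": (8, 12)}. Locus names are placeholders.
Profile = Dict[str, Tuple[int, ...]]
BarCode = Tuple[Tuple[str, Tuple[int, ...]], ...]


def bar_code(profile: Profile) -> BarCode:
    """Canonical 'bar code': loci sorted by name, alleles sorted within each locus."""
    return tuple(
        (locus, tuple(sorted(alleles))) for locus, alleles in sorted(profile.items())
    )


def shared_loci_fraction(a: Profile, b: Profile) -> float:
    """Fraction of loci typed in both profiles whose alleles agree exactly."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    same = sum(1 for locus in common if sorted(a[locus]) == sorted(b[locus]))
    return same / len(common)


class BarCodeRegistry:
    """In-memory registry mapping bar codes to the sample sources that produced them."""

    def __init__(self) -> None:
        self._sources: Dict[BarCode, List[str]] = defaultdict(list)

    def register(self, source: str, profile: Profile) -> None:
        self._sources[bar_code(profile)].append(source)

    def matches(self, profile: Profile) -> List[str]:
        """All registered sources with an identical bar code (empty list if none)."""
        return list(self._sources.get(bar_code(profile), []))


if __name__ == "__main__":
    registry = BarCodeRegistry()
    registry.register("cell line X (lab 1)", {"LOCUS_A": (8, 12), "LOCUS_B": (10, 10)})
    # Same profile with different locus and allele order: still a match.
    print(registry.matches({"LOCUS_B": (10, 10), "LOCUS_A": (12, 8)}))
    # prints ['cell line X (lab 1)']
```

Exact matching keeps the sketch simple; a real registry would also need a policy for partial matches, for which `shared_loci_fraction` is only a crude stand-in.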

Enable 'incidental collaboratories':

• Collect: store data at the level of the experiment:
  – Accessible through a single interface
  – Add enough metadata to know what was done/seen
• Connect: allow analyses over:
  – Similar experiment types
  – Experiments done with/on similar biological 'things':
    • Species, strains, systems, cells
    • Anatomical components (e.g. spleen, hypothalamus)
    • Antibodies, biomarkers, bioactive chemicals, etc.
• Keep:
  – Long-term preservation of data and software (Olive)
  – Fulfill Data Management Plan requirements
  – Allow gated access, if needed
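The Collect / Connect / Keep steps suggest a simple data model: one record per experiment, carrying enough metadata to find it again across labs. The sketch below is hypothetical (the `ExperimentRecord` fields, `connect` and `accessible` are illustrative names, not an RDS schema); it indexes records by experiment type and by biological 'thing', and models gated access as a per-record flag plus an explicit grant list.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class ExperimentRecord:
    """Collect: one experiment plus the metadata needed to reuse it (hypothetical schema)."""
    record_id: str
    lab: str
    experiment_type: str                             # e.g. "patch clamp"
    entities: Set[str] = field(default_factory=set)  # species, anatomy, antibodies, ...
    restricted: bool = False                         # Keep: gated access if needed


def connect(
    records: Iterable[ExperimentRecord],
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Connect: index record IDs by experiment type and by biological entity, across labs."""
    by_type: Dict[str, List[str]] = defaultdict(list)
    by_entity: Dict[str, List[str]] = defaultdict(list)
    for rec in records:
        by_type[rec.experiment_type].append(rec.record_id)
        for entity in sorted(rec.entities):
            by_entity[entity].append(rec.record_id)
    return dict(by_type), dict(by_entity)


def accessible(records: Iterable[ExperimentRecord], granted: Set[str]) -> List[ExperimentRecord]:
    """Keep: open records, plus restricted ones the user has explicitly been granted."""
    return [r for r in records if not r.restricted or r.record_id in granted]


if __name__ == "__main__":
    records = [
        ExperimentRecord("r1", "lab A", "patch clamp", {"mouse", "hypothalamus"}),
        ExperimentRecord("r2", "lab B", "imaging", {"mouse", "spleen"}, restricted=True),
    ]
    by_type, by_entity = connect(records)
    print(by_entity["mouse"])  # prints ['r1', 'r2']
```

The point of the index is that two labs that never talked to each other still end up under the same key ("mouse" here), which is the 'incidental' part of an incidental collaboratory.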

Problem: biological research is quite insular

• Biology is small: because objects/equipment are between 10^-5 and 10^2 m, you can work alone ('King' and 'subjects').
• Biology is messy: it doesn't happen behind a terminal.
• Biology is competitive: different people with similar skill sets are vying for the same grants.
• In summary: it does not inherently promote collaboration (vs., for instance, big physics or astronomy).

[Diagram: a lab's research activities: Prepare, Ponder, Observe, Analyze, Communicate]

Try to pop the 'lab bubble'!

Labs go from being information islands to being 'sensors in a network'.

[Diagram: several labs, each with Prepare, Observations, Think, Analyze and Communicate stages]

Some objections, and rebuttals:

• Objection: "But our lab notebooks are all on paper"
  – Rebuttal: Develop smartphone/tablet apps for data input
• Objection: "I need to see a direct benefit from something I spend my time on"
  – Rebuttal: Develop a 'data manipulation dashboard' for the PI to allow better access to the full experimental output of his/her lab
• Objection: "I am afraid other people might scoop my discoveries"
  – Rebuttal: Develop intra-lab data communication systems first, and allow timed/granular data export
• Objection: "I want things to be peer reviewed before I expose them"
  – Rebuttal: Allow reviewers access to the experimental database before publication (of data or paper)
• Objection: "I don't really trust anyone else's data – well, except for the guys I went to Grad School with…"
  – Rebuttal: Add a social networking component to this data repository, so you know who (the individual) created that data point
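Two of the rebuttals, timed/granular data export and knowing who created each data point, can be combined in one small rule: every data point carries its creator and an embargo date, and only points past their embargo leave the lab. This is a hypothetical sketch (`DataPoint`, `export_public` and the field names are illustrative, and the names and dates are made up), not a description of an Elsevier system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Union


@dataclass(frozen=True)
class DataPoint:
    value: float
    created_by: str      # provenance: the individual who created this data point
    embargo_until: date  # granular, timed release: private to the lab until this date


def export_public(points: List[DataPoint], today: date) -> List[Dict[str, Union[float, str]]]:
    """Export only data points whose embargo has passed, keeping their creator."""
    return [
        {"value": p.value, "created_by": p.created_by}
        for p in points
        if p.embargo_until <= today
    ]


if __name__ == "__main__":
    points = [
        DataPoint(1.7, "postdoc A", date(2013, 1, 1)),
        DataPoint(2.4, "student B", date(2014, 6, 1)),
    ]
    print(export_public(points, today=date(2013, 6, 1)))
    # prints [{'value': 1.7, 'created_by': 'postdoc A'}]
```

Putting the embargo on each data point rather than on the whole dataset is what makes the export granular: a lab can release early results while keeping a pending discovery private.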

Elsevier Research Data Services: Goals

1. Help increase the amount of data shared from the lab, enabling incidental collaboratories
2. Help increase the value of the data shared by increasing annotation, normalization and provenance, enabling enhanced interoperability
3. Help measure and deliver credit for shared data to the researchers, the institute, and the funding body, enabling more sustainable platforms
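Goal 2 depends on normalization: free-text annotations from different labs only become interoperable once they map to shared identifiers. A minimal sketch, assuming a small controlled vocabulary; the `TERM:000n` identifiers and synonym lists are placeholders, not terms from a real ontology, and `normalize` is a hypothetical helper.

```python
from typing import Dict, Optional, Set

# Hypothetical controlled vocabulary: placeholder IDs -> accepted synonyms (lowercase).
VOCABULARY: Dict[str, Set[str]] = {
    "TERM:0001": {"spleen", "splenic tissue"},
    "TERM:0002": {"hypothalamus", "hypothalamic region"},
}

# Invert once so each lookup is a single dictionary access: synonym -> ID.
SYNONYM_INDEX: Dict[str, str] = {
    synonym: term for term, synonyms in VOCABULARY.items() for synonym in synonyms
}


def normalize(label: str) -> Optional[str]:
    """Map a free-text annotation to its canonical ID, or None if it is not in the vocabulary."""
    # Lowercase and collapse runs of whitespace before looking the label up.
    return SYNONYM_INDEX.get(" ".join(label.lower().split()))


if __name__ == "__main__":
    for label in ["Spleen", "  splenic   TISSUE ", "liver"]:
        print(repr(label), "->", normalize(label))
```

Unknown labels return `None`, flagging them for manual curation rather than guessing a match.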

RDS Guiding Principles:

• In principle, all open data stays open, and URLs, front ends etc. stay where they are (i.e. with the repository)
• Collaboration is tailored to data repositories' unique needs/interests and follows a 'service model':
  – Aspects where collaboration is needed are discussed
  – A collaboration plan is drawn up using a Service-Level Agreement: agree on time, conditions, etc.
  – All communication, finance, IPR etc. is completely transparent at all times
• Very small (2–3 people) department; immediate communication; instant deployment of ideas

RDS Approach:

• Collaborate and build on relationships with data repositories (life science, earth science, others)
• Integrate with other content sources, if possible
• Build annotation and standardisation tools and processes to implement this
• Develop next-generation infrastructure solutions for back-end integration
• Explore creative revenue opportunities

NIF Antibody Registry:

Problem:
• 95 antibodies were identified in 8 papers
• 52 did not contain enough information to determine the antibody used
• Some provided details in another paper
• Failed to give species, vendor, catalog #
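The slide's numbers put the scale of the problem at 52 of 95 antibodies, roughly 55%, that could not be identified from the paper. A check of the kind Solution #1 implies can be sketched as a required-fields test on each antibody mention. The field names follow the slide (species, vendor, catalog #), but the dictionary format and helper names are assumptions, and "missing a field" is only a rough proxy for "not enough information to determine the antibody used".

```python
from typing import Dict, List

# Fields the slide lists as commonly missing; the record format itself is assumed.
REQUIRED_FIELDS = ("species", "vendor", "catalog_number")


def missing_fields(mention: Dict[str, str]) -> List[str]:
    """Required fields that are absent or empty in one antibody mention."""
    return [name for name in REQUIRED_FIELDS if not mention.get(name, "").strip()]


def unidentifiable_share(mentions: List[Dict[str, str]]) -> float:
    """Fraction of mentions missing at least one required field."""
    if not mentions:
        return 0.0
    incomplete = sum(1 for m in mentions if missing_fields(m))
    return incomplete / len(mentions)


if __name__ == "__main__":
    # Reproduce the slide's ratio with placeholder mentions: 52 incomplete out of 95.
    complete = {"species": "rabbit", "vendor": "vendor X", "catalog_number": "0000"}
    incomplete = {"species": "rabbit", "vendor": "", "catalog_number": ""}
    sample = [incomplete] * 52 + [complete] * 43
    print(round(unidentifiable_share(sample) * 100, 1))  # prints 54.7
```

A journal submission system could run `missing_fields` on each declared antibody and ask the author for whatever comes back, before the paper is published.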

Solution #1:
• Journals ask authors to provide the antibody catalog number
• Link to the NIF Registry from manufacturers'/vendors' sites

Solution #2:
• Pilot with a lab: let's start with the Urban Lab
• Get antibodies, and the messy bits, from the notebook into Nathan Urban's command center
• By providing:
  – 7" tablets
  – Links to Igor Pro
  – A dashboard UI

My questions to you:

• Thoughts on this approach:
  – In principle?
  – In practice?
• Do you see serious hurdles?
  – Are we overlapping with other initiatives; if so, are we complementary?
  – How does this connect to libraries/local repositories?
  – Are there sensitivities/pain points we are overlooking?
• Where to start:
  – How to collaborate?
  – Who to talk to – funding agencies, societies: who else?
  – Thoughts on data repositories/platforms to connect to?

Your questions to me?

a.dewaard@elsevier.com
http://elsatglabs.com/labs/anita/
http://www.slideshare.net/anitawaard

Thanks go to:

• Anita Bandrowski and Maryann Martone, NIF
• Nathan Urban, Shreejoy Tripathy, CMU
• David Marques, SVP RDS
