Towards Incidental Collaboratories; Research Data Services
1. Research Data Services: Towards a Framework for Incidental Collaboratories
Anita de Waard
VP Research Data Collaborations, Elsevier RDS
Jericho, VT, USA
2. Brief bio:
• Background:
– Low-temperature physics (Leiden & Moscow)
– Joined Elsevier in 1988 as publisher in solid state physics
– 1991: arXiv => publishers will go out of business very soon!
• 1997-now: Disruptive Technologies Director, focus on better representation of scientific knowledge:
– Identifying key knowledge elements in articles (linguistics thesis)
– Building claim-evidence networks (through collaborations)
– Helping build communities to accelerate rate of change (Force11)
• Starting 1/1/2013: VP Research Data Collaborations - why?
– Douglas Engelbart’s thinking: connect minds!
– My (non-biologist’s) understanding of biology:
3. The big problem in biology:
Interspecies variability: A specimen is not a species
Gene expression variability: Knowing genes is not knowing how they are expressed
Microbiome: An animal is an ecosystem
Systems biology: A whole is more than the sum of its parts
Reductionist science doesn’t work for living systems!
http://en.wikipedia.org/wiki/File:Duck_of_Vaucanson.jpg
4. Statistics to the rescue!
With enough observations, trends and anomalies can be detected:
• “Here we present resources from a population of 242 healthy adults sampled at 15 or 18 body sites up to three times, which have generated 5,177 microbial taxonomic profiles from 16S ribosomal RNA genes and over 3.5 terabases of metagenomic sequence so far.” The Human Microbiome Project Consortium, Structure, function and diversity of the healthy human microbiome, Nature 486, 207–214 (14 June 2012) doi:10.1038/nature11234
• “The large sample size — 4,298 North Americans of European descent and 2,217 African Americans — has enabled the researchers to mine down into the human genome.” Nidhi Subbaraman, Nature News, 28 November 2012, High-resolution sequencing study emphasizes importance of rare variants in disease.
• “A profile unique for a DNA sample source is obtained … a series of numbers are generated which can be used as a bar code for that DNA source. A registry of bar codes would make it easy to compare DNA samples” Roland M. Nardone, Ph.D., Eradication of Cross-Contaminated Cell Lines: A Call for Action, http://www.sivb.org/publicPolicy_Eradication.pdf
5. Enable ‘incidental collaboratories’:
• Collect: store data at the level of the experiment:
– Accessible through a single interface
– Add enough metadata to know what was done/seen
• Connect: allow analyses over:
– Similar experiment types
– Experiments done with/on similar biological ‘things’:
• Species, strains, systems, cells
• Anatomical components (e.g. spleen, hypothalamus)
• Antibodies, biomarkers, bioactive chemicals, etc.
• Keep:
– Long-term preservation of data and software (Olive)
– Fulfill Data Management Plan requirements
– Allow gated access, if needed
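The Collect and Connect steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any actual RDS system: the `ExperimentRecord` fields, the lab names, and the `incidental_links` helper are all hypothetical. The idea is that once each experiment carries metadata naming the biological 'things' it involved, labs that never coordinated can be linked through the entities they share.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ExperimentRecord:
    """Hypothetical record: one experiment plus enough metadata to know what was done."""
    record_id: str
    lab: str
    experiment_type: str  # e.g. "patch-clamp"
    entities: frozenset = field(default_factory=frozenset)  # species, tissues, antibodies...


def index_by_entity(records):
    """Map each biological 'thing' to the records that mention it."""
    index = defaultdict(list)
    for rec in records:
        for entity in rec.entities:
            index[entity].append(rec)
    return index


def incidental_links(records):
    """Return sorted (entity, lab_a, lab_b) triples for labs that share an entity."""
    links = set()
    for entity, recs in index_by_entity(records).items():
        labs = sorted({r.lab for r in recs})
        for i, lab_a in enumerate(labs):
            for lab_b in labs[i + 1:]:
                links.add((entity, lab_a, lab_b))
    return sorted(links)


# Illustrative data only; none of these records come from the talk.
records = [
    ExperimentRecord("r1", "Lab A", "patch-clamp",
                     frozenset({"Mus musculus", "hypothalamus"})),
    ExperimentRecord("r2", "Lab B", "immunostaining",
                     frozenset({"Mus musculus", "spleen"})),
    ExperimentRecord("r3", "Lab C", "immunostaining",
                     frozenset({"Rattus norvegicus", "spleen"})),
]

for entity, lab_a, lab_b in incidental_links(records):
    print(f"{lab_a} and {lab_b} both worked with: {entity}")
```

Here Lab A and Lab B are linked through the species, and Lab B and Lab C through the tissue, even though no lab set out to collaborate. A real system would need controlled vocabularies for the entity names, which this sketch assumes away.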
6. Problem: biological research is quite insular
• Biology is small: because objects/equipment are 10^-5 – 10^2 m, you can work alone (‘King’ and ‘subjects’).
• Biology is messy: it doesn’t happen behind a terminal.
• Biology is competitive: different people with similar skill sets, vying for the same grants.
• In summary: biology does not inherently promote collaboration (vs., for instance, big physics or astronomy).
[Diagram labels: Prepare, Ponder, Observe, Communicate, Analyze]
7. Try to pop the ‘lab bubble’!
Labs go from being information islands, to being ‘sensors in a network’.
[Diagram labels: Prepare, Analyze, Communicate (repeated for three labs); Observations; Think]
8. Some objections, and rebuttals:
• Objection: “But our lab notebooks are all on paper”
– Rebuttal: Develop smart phone/tablet apps for data input
• Objection: “I need to see a direct benefit from something I spend my time on”
– Rebuttal: Develop ‘data manipulation dashboard’ for PI to allow better access to full experimental output for his/her lab
• Objection: “I am afraid other people might scoop my discoveries”
– Rebuttal: Develop intra-lab data communication systems first and allow timed/granular data export
• Objection: “I want things to be peer reviewed before I expose them”
– Rebuttal: Allow reviewers access to experimental database before publication (of data or paper)
• Objection: “I don’t really trust anyone else’s data – well, except for the guys I went to Grad School with…”
– Rebuttal: Add a social networking component to this data repository so you know who (to the individual) created that data point.
9. Elsevier Research Data Services: Goals
1. Help increase the amount of data shared from the lab, enabling incidental collaboratories
2. Help increase the value of the data shared by increasing annotation, normalization, and provenance, enabling enhanced interoperability
3. Help measure and deliver credit for shared data to the researchers, the institute, and the funding body, enabling more sustainable platforms
10. RDS Guiding Principles:
• In principle, all open data stays open and URLs, front end etc. stay where they are (i.e. with repository)
• Collaboration is tailored to data repositories’ unique needs/interests and of a ‘service-model’ type:
– Aspects where collaboration is needed are discussed
– A collaboration plan is drawn up using a Service-Level Agreement: agree on time, conditions, etc.
– All communication, finance, IPR etc. is completely transparent at all times.
• Very small (2/3 people) department; immediate communication; instant deployment of ideas
11. RDS Approach:
• Collaborate and build on relationships with data repositories (life science, earth science, others)
• Integrate with other content sources, if possible
• Build annotation and standardisation tools and processes to implement this
• Develop next-generation infrastructure solutions for back-end integration
• Explore creative revenue opportunities
12. NIF Antibody Registry:
Problem:
• 95 antibodies were identified in 8 papers
• 52 did not contain enough information to determine the antibody used
• Some provided details in another paper
• Failed to give species, vendor, catalog #
Solution #1:
• Journals ask authors to provide antibody catalog nr
• Link to NIF Registry from manufacturers/vendors’ sites
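The kind of check a journal could run under Solution #1 might look like the sketch below. It is a hypothetical illustration, not the NIF Registry's or any journal's actual tooling: the field names, the `missing_fields` and `unidentifiable_fraction` helpers, and the sample citations are all assumptions. It flags antibody citations that lack the species, vendor, or catalog number named in the problem list. If the 52 on the slide is read as a share of the 95, that is roughly 55% of citations failing such a check.

```python
# Fields the slide says were often missing; the key names are assumptions.
REQUIRED_FIELDS = ("species", "vendor", "catalog_number")


def missing_fields(citation):
    """List the required fields that are absent or blank in one antibody citation."""
    return [
        name for name in REQUIRED_FIELDS
        if not str(citation.get(name) or "").strip()
    ]


def unidentifiable_fraction(citations):
    """Fraction of citations missing at least one required field (0.0 if empty)."""
    if not citations:
        return 0.0
    incomplete = sum(1 for c in citations if missing_fields(c))
    return incomplete / len(citations)


# Made-up citations for illustration; vendor and catalog values are placeholders.
citations = [
    {"species": "rabbit", "vendor": "ExampleVendor", "catalog_number": "EX-0001"},
    {"species": "rabbit", "vendor": "", "catalog_number": None},
    {"vendor": "ExampleVendor"},
]

for c in citations:
    gaps = missing_fields(c)
    print("complete" if not gaps else f"missing: {', '.join(gaps)}")

print(f"unidentifiable: {unidentifiable_fraction(citations):.0%}")
```

A submission system could run a check like this at upload time and ask the author to fill the gaps before review, rather than leaving readers to trace details through other papers.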
Solution #2:
• Pilot with a lab:
13. Let’s start with the Urban Lab
• Getting antibodies
• And messy bits
• From the notebook
• Into Nathan Urban’s command center
• By providing
– 7” Tablets
– Links to IgorPro
– A dashboard UI
14. My questions to you:
• Thoughts on this approach:
– In principle?
– In practice?
• Do you see serious hurdles:
– Are we overlapping with other initiatives; if so, are we complementary?
– How does this connect to libraries/local repositories?
– Are there sensitivities/pain points we are overlooking?
• Where to start:
– How to collaborate?
– Who to talk to – funding agencies, societies: who else?
– Thoughts on data repositories/platforms to connect to?
15. Your questions to me?
a.dewaard@elsevier.com
http://elsatglabs.com/labs/anita/
http://www.slideshare.net/anitawaard
Thanks go to:
• Anita Bandrowski and Maryann Martone, NIF
• Nathan Urban, Shreejoy Tripathy, CMU
• David Marques, SVP RDS