1. Making It Happen:
Sustainable Data Preservation and Use
March 19, 2013
Anita de Waard
VP Research Data Collaborations, Elsevier RDS
a.dewaard@elsevier.com
2. “What aspects/tools/capabilities/frameworks are related to this idea?”
• There are many different research databases, both generic
(Dryad, Dataverse, …) and specific (NIF, IEDA, PDB, …)
• There are many systems for creating/sharing workflows
(Taverna, MyExperiment, VisTrails, Workflow4Ever, etc.)
• There are many e-lab notebooks
(LabGuru, LabArchives, LaBlog, etc.)
• There are scores of projects, committees, standards, bodies, grants,
initiatives, and conferences for discussing and connecting all of this
(KEfED, Pegasus, PROV, RDA, Science Gateways, CODATA, BRDI, EarthCube, etc.)
• You can make a living out of this ;-)! (and many of us do…)
3. …but this is what scientists do:
Using antibodies and squishy bits, grad students experiment
and enter details into their lab notebook.
The PI then tries to make sense of this, and writes a paper.
End of story.
4. Why save research data?
A. Data Preservation:
– Preserve record of scientific process, provenance
– Enable reproducible research
B. Data Use:
– Use results obtained by others
– Do better science!
– Improve interdisciplinary work
C. Sustainable Models:
– Technology transfer; societal/industrial development
– Reward scientists for data creation (credit/attribution)
– Long-term archiving
5. Where The Data Goes Now:
> 50 M papers; 2 M scientists; 2 M papers/year
• A small portion of data (1-2%?) is stored in small, topic-focused data repositories:
– PDB: 88.3 k
– PetDB: 1.5 k
– SedDB: 0.6 k
– MiRB: 25 k
– TAIR: 72.1 k
• Some data (8%?) is stored in large, generic data repositories:
– Dryad: 7,631 files
– Dataverse: 0.6 M
– Datacite: 1.5 M
• The majority of data (90%?) is stored on local hard drives
6. Key Needs:
[Same data-storage diagram as slide 5, annotated with:]
• DEVELOP SUSTAINABLE MODELS
• INCREASE DATA PRESERVATION
7. Objections (and rebuttals) to data sharing:
Objection: “Our lab notebooks are all on paper – it’s how we do things”
Rebuttal: Graft tools closely on scientists’ daily practice.
Objection: “I need to see a direct benefit of any effort I put in.”
Rebuttal: Create tools that give better insight into one’s own and others’ results.
Objection: “I don’t really trust anyone else’s data – and don’t think they’ll trust mine”
Rebuttal: Create a social networking context and allow the data owner to provide granular access control.
Objection: “I am afraid other people might scoop my discoveries”
Rebuttal: The reward system moves from a competition to a ‘shared mission’.
9. …to shared experimental repositories:
Across labs, experiments: track reagents and how they are used
[Diagram: repeated experiment cycles labeled Prepare, Observations, Analyze, Communicate]
10. …to shared experimental repositories:
Compare outcome of interactions with these entities
[Diagram: repeated experiment cycles labeled Prepare, Observations, Analyze, Communicate]
11. …to shared experimental repositories:
Build a ‘virtual reagent spectrogram’ by comparing how different entities
interacted in different experiments
[Diagram: repeated experiment cycles labeled Think, Prepare, Observations, Analyze, Communicate]
12. Some examples:
• Grafting tools on workflow: create tailored
metadata collection tools on mini-tablets
in labs to replace paper notebooks
• Direct rewards: through ‘PI-Dashboard’:
allow immediate access/analysis of shared
data: new science!
• Data sharing rewards: Data Rescue Challenge:
collect and reward stories/practices of data
preservation/use in Earth/Lunar Science
• Improve data use: With NIF/Eagle-I: add
antibodies as key ‘entities’ to papers, link to AB repository
consortium
13. How do we make data use happen:
• We are creating repositories of shared experiments:
you are part of a greater whole!
• Collect and share stories and practices re. data use
and sustainable systems: “What gets to them?”
• Develop system of rewards for data sharing: enable
demonstrably better science!
• Work with grant agencies, repositories
(generic/specific, institutional, cross-national) to
integrate and annotate existing datasets and enable
cross-use
• Collectively pioneer long-term funding options;
support/develop ‘shared mission’ funding challenges