Stephen Abrams
University of California Curation Center
www.cdlib.org/uc3
Data Preservation 101
www.flickr.com/photos/infocux/8450190120
www.flickr.com/photos/adavey/4735763989
preservation is the
means to an end …
widespread data availability,
sharing, and (re)use
www.flickr.com/photos/puroticorico/503484670
good for science
• reproducibility, integrity
• enables collaboration and synergy
• minimizes needless duplication of effort
© Universal Pictures
“Papers with publicly available microarray data received
more citations than similar papers that did not make their
data available, even after controlling for many variables
known to influence citation rate”
good for scientists
• get credit for your work
• higher impact factor
… and you have to
(and should want to)
• funders require it
• journals require it
• disciplinary best practice (increasingly) expects it
“To do otherwise should come to be
regarded as scientific malpractice”
– Royal Society, 2014
what can I do?
www.flickr.com/photos/cristinacosta/4304968451
adopt the growing body of
good practices
10 aspirational goals ►
plan ahead
www.flickr.com/photos/wscullin/3770015203
10
implicit (non-)decisions
can have significant
consequences
dmptool.org
a data management plan
describes your intentions
during and after your
research project
www.flickr.com/photos/epublicist/3546059144
prefer formats that are …
standard rather than customized
open source rather than proprietary
commonly-used rather than obscure
self-describing rather than opaque
text rather than binary
9
be preservation-friendly from the start
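As a rough illustration of the "text" and "self-describing" preferences, the sketch below writes records as CSV text with a header row that names every column. The function name `to_csv_text` and the sample observations are hypothetical, chosen only for this example.

```python
import csv
import io


def to_csv_text(rows, fieldnames):
    """Serialize records as CSV text with a header row naming each column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()    # the header is what makes the file self-describing
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical observations, for illustration only.
rows = [
    {"site": "A", "date": "2014-05-01", "temp_c": 12.5},
    {"site": "B", "date": "2014-05-01", "temp_c": 14.0},
]
text = to_csv_text(rows, ["site", "date", "temp_c"])
print(text)
```

A file like this can be opened in any text editor or spreadsheet, and the unit is carried in the column name (`temp_c`) instead of being left to memory.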
assign an identifier
to your data
www.flickr.com/photos/erskinelibrary/4581870160
ezid.cdlib.org
datacite.org
8
DOIs provide unambiguous
reference, persistent access,
and citation metrics
[digital object identifier]
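To make "persistent access" concrete: a DOI can be turned into a link by placing it after the doi.org resolver address. The sketch below assumes the input is a DOI written in one of a few common forms; the helper name `doi_to_url` is hypothetical, and the sample DOI is the one cited at the end of this deck.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"

# Common ways a DOI is written in papers and reference lists.
KNOWN_PREFIXES = (
    "doi:",
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def doi_to_url(doi):
    """Turn a DOI in any of the forms above into a resolvable HTTPS URL."""
    doi = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the prefix/suffix slash.
    return DOI_RESOLVER + quote(doi, safe="/")


print(doi_to_url("doi:10.1371/journal.pcbi.1003542"))
```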
get an identifier for
yourself
orcid.org
www.flickr.com/photos/mumpfpuffel/2337520969
7
ORCIDs provide unambiguous
reference and citation metrics
[open researcher and contributor identifier]
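An ORCID iD is 16 characters long, and the last one is a check character computed with the ISO 7064 MOD 11-2 algorithm, which lets software catch most typing errors. The sketch below is one way to validate an iD before storing it; the function names are hypothetical, and the sample iD is the fictitious one used in ORCID's own documentation.

```python
def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check the 0000-0000-0000-000X shape and the trailing check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))
```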
describe and
document
what would you
want to know
about someone
else’s data?
www.flickr.com/photos/61423903@N06/7357608430
who?
what?
when?
where?
how?
why?
…?
6
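One lightweight way to answer these questions is a small metadata file stored next to the data. The sketch below assumes a flat JSON record with one field per question; the field names, the helper `describe_dataset`, and the sample values are all hypothetical, and a disciplinary metadata standard would be preferable where one exists.

```python
import json


def describe_dataset(who, what, when, where, how, why):
    """Bundle the basic descriptive questions into a JSON metadata record."""
    record = {
        "who": who,
        "what": what,
        "when": when,
        "where": where,
        "how": how,
        "why": why,
    }
    # Refuse to write a record that leaves a question unanswered.
    missing = [key for key, value in record.items() if not value]
    if missing:
        raise ValueError("missing: " + ", ".join(missing))
    return json.dumps(record, indent=2, sort_keys=True)


# Hypothetical values, for illustration only.
metadata = describe_dataset(
    who="A. Researcher",
    what="Hourly air temperature readings",
    when="2014-05-01/2014-05-31",
    where="Field site A",
    how="Logged by a calibrated sensor",
    why="Baseline for a seasonal study",
)
print(metadata)
```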
upload to a
repository
www.flickr.com/photos/teegardin/6094310934
re3data.org
databib.org
5
professional,
pro-active
management
datashare.ucsf.edu
merritt.cdlib.edu
replication
fixity
monitoring
media refresh
technology watch
disaster recovery/
business continuity
…
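Fixity means confirming that a file has not changed since it was deposited, usually by recomputing a checksum and comparing it with the value recorded earlier. The sketch below uses SHA-256 from Python's standard library; the choice of algorithm and the function names are assumptions for illustration, not a description of how any particular repository works.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, expected_hex):
    """Compare the current digest with the one recorded at deposit time."""
    return sha256_of_file(path) == expected_hex.lower()
```

Recording the digest when the file is first stored, then rerunning `fixity_ok` on a schedule, is the basic pattern; a mismatch signals corruption or an unrecorded change.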
use a license with
the most permissive
terms
www.flickr.com/photos/_elemenoh_/147966697
4
is best
is okay
custom data use
agreement should be avoided
publish
3
www.nature.com/sdata
esapubs.org/archive
www.flickr.com/photos/takomabibelot/3984413475
so your data is available to
collaborators, colleagues,
and community
cite yourself and
others
2
add data citations to your CV
and publications
track usage of your data
products through alt-metrics
www.flickr.com/photos/rob_stone/559595880
plumanalytics.com
altmetric.com
impactstory.org
preserve your code
1
everything just said about
data applies equally well
to code
www.flickr.com/photos/mwichary/3368836377
github.com sourceforge.com
plan
format
identify (your data)
identify (yourself)
describe
upload
license
publish
cite
code
data preservation 101
www.flickr.com/photos/santos/230060595
www.cdlib.org/uc3
uc3@ucop.edu
datapub.cdlib.org
for more information …
… also, a good paper to review:
Goodman, Pepe, Blocker, Borgman, Cranmer et al. (2014)
“Ten simple rules for the care and feeding of scientific data”
PLOS Computational Biology 10(4):e1003542,
doi:10.1371/journal.pcbi.1003542
… and ask your local librarian

Editor's Notes

  • #2 Davey, Detail - cuneiform inscription, https://www.flickr.com/photos/adavey/4735763989 http://www.flickr.com/photos/infocux/8450190120
  • #3 Richie Diesterheft, Stepping stones into the Japanese gardens, https://www.flickr.com/photos/puroticorico/503484670
  • #4 Universal Pictures, Bride of Frankenstein, 1935
  • #7 Cristina Costa, Question, https://www.flickr.com/photos/cristinacosta/4304968451
  • #8 Will Scullin, Blueprint, https://www.flickr.com/photos/wscullin/3770015203
  • #9 Will Scullin, Blueprint, https://www.flickr.com/photos/wscullin/3770015203
  • #10 Yoel Ben-Avraham, Square-peg-round-hole-21, https://www.flickr.com/photos/epublicist/3546059144
  • #11 Barcodes, https://www.flickr.com/photos/erskinelibrary/4581870160
  • #12 Tobias Wolter, AGB stamp, https://www.flickr.com/photos/mumpfpuffel/2337520969
  • #13 Colorful books stacked (blender), https://www.flickr.com/photos/61423903@N06/7357608430
  • #14 Ken Teegardin, Blue piggy bank with coins, https://www.flickr.com/photos/teegardin/6094310934
  • #15 Clint Chilcott, My old license plate, https://www.flickr.com/photos/_elemenoh_/147966697
  • #16 Penny black printing press in a British Library hallway, https://www.flickr.com/photos/takomabibelot/3984413475
  • #17 Rob Stone, Prom parking ticket, https://www.flickr.com/photos/rob_stone/559595880
  • #18 Marcin Wichary, 1403 printout, https://www.flickr.com/photos/mwichary/3368836377
  • #19 Summer preserves 1, https://www.flickr.com/photos/santos/230060595