Data preservation 101


10 simple steps towards effective preservation of research data

Published in: Science

  • Davey, Detail - cuneiform inscription
  • Richie Diesterheft, Stepping stones into the Japanese gardens
  • Universal Pictures, Bride of Frankenstein, 1935
  • Cristina Costa, Question
  • Will Scullin, Blueprint
  • Will Scullin, Blueprint
  • Yoel Ben-Avraham, Square-peg-round-hole-21
  • Barcodes
  • Tobias Wolter, AGB stamp
  • Colorful books stacked (blender)
  • Ken Teegardin, Blue piggy bank with coins
  • Clint Chilcott, My old license plate
  • Penny black printing press in a British Library hallway
  • Rob Stone, Prom parking ticket
  • Marcin Wichary, 1403 printout
  • Summer preserves 1
  • Data preservation 101

    1. Stephen Abrams, University of California Curation Center: Data Preservation 101
    2. preservation is the means to an end … widespread data availability, sharing, and (re)use
    3. good for science: reproducibility, integrity; enables collaboration and synergy; minimizes needless duplication of effort (image © Universal Pictures)
    4. good for scientists: get credit for your work; higher impact factor. “Papers with publicly available microarray data received more citations than similar papers that did not make their data available, even after controlling for many variables known to influence citation rate”
    5. … and you have to (and should want to): funders require it; journals require it; disciplinary best practice (increasingly) expects it. “To do otherwise should come to be regarded as scientific malpractice” – Royal Society, 2014
    6. what can I do? Adopt the growing body of good practices: 10 aspirational goals
    7. Goal 10: plan ahead. Implicit (non-)decisions can have significant consequences.
    8. Goal 10: plan ahead. A data management plan describes your intentions during and after your research project.
    9. Goal 9: be preservation-friendly from the start. Prefer formats that are standard rather than customized, open source rather than proprietary, commonly used rather than obscure, self-describing rather than opaque, and text rather than binary.
    10. Goal 8: assign an identifier to your data. DOIs (digital object identifiers) provide unambiguous reference, persistent access, and citation metrics.
    11. Goal 7: get an identifier for yourself. ORCIDs (open researcher and contributor identifiers) provide unambiguous reference and citation metrics.
    12. Goal 6: describe and document. What would you want to know about someone else’s data? Who? What? When? Where? How? Why? …?
    13. Goal 5: upload to a repository for professional, pro-active management: replication, fixity monitoring, media refresh, technology watch, disaster recovery/business continuity, …
    14. Goal 4: use a license with the most permissive terms. … is best; … is okay; a custom data use agreement should be avoided.
    15. Goal 3: publish, so your data is available to collaborators, colleagues, and community.
    16. Goal 2: cite yourself and others. Add data citations to your CV and publications; track usage of your data products through alt-metrics.
    17. Goal 1: preserve your code. Everything just said about data applies equally well to code.
    18. data preservation 101: plan, format, identify (your data), identify (yourself), describe, upload, license, publish, cite, code
    19. for more information …
    20. for more information … also, a good paper to review: Goodman, Pepe, Blocker, Borgman, Cranmer et al. (2014) “Ten simple rules for the care and feeding of scientific data”, PLOS Computational Biology 10(4):e1003542, doi:10.1371/journal.pcbi.1003542 … and ask your local librarian
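
The "fixity monitoring" that the slides list among a repository's services can be illustrated with a small sketch. It assumes that fixity is checked by recording a SHA-256 digest for every file in a dataset and re-computing the digests later; the slides do not say how any particular repository does this, and the function names (`sha256_of`, `build_manifest`, `check_fixity`) are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (as a relative POSIX path) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the current files under root with a previously stored manifest."""
    current = build_manifest(root)
    return {
        # Files recorded earlier that can no longer be found.
        "missing": sorted(set(manifest) - set(current)),
        # Files present now that were never recorded.
        "added": sorted(set(current) - set(manifest)),
        # Files whose content no longer matches the recorded digest.
        "changed": sorted(
            name
            for name in manifest.keys() & current.keys()
            if manifest[name] != current[name]
        ),
    }
```

In use, one would call `build_manifest` when the data is deposited, store the result outside the dataset directory (for example as JSON), and run `check_fixity` on a schedule; any non-empty list in the report signals that a replica should be consulted.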