The blessing and the curse: handshaking between general and specialist data repositories


Talk presented at the Genomic Standards Consortium 15 conference.

Published in: Technology

  • Specialized repository infrastructure exists for certain data types, e.g. DNA sequences and species occurrence data. But vast quantities of valuable and irreplaceable data comprise the long tail, much of it in idiosyncratically formatted spreadsheets and other nonstandardized files. An archive is needed not to replace existing repositories, but to provide a home for orphan data and to enable ALL the data underlying a publication to be archived.
  • Dryad was developed to fill the infrastructure gap for journals that wished to sincerely promote data archiving: an archive that could be used not only by authors producing certain types of data, or only by those authors most motivated to share, but by all the authors to whom the journal’s data policy would apply.

    1. The blessing and the curse: handshaking between general and specialist data repositories. Hilmar Lapp (NESCent), Todd Vision (UNC Chapel Hill). GSC 15 Conference, Bethesda, MD, April 22-24, 2013
    2. > 180 for biological sciences alone
    3. Which data goes where? Which is required?
    4. Addressing the long tail of orphan data. datasets belong to the long tail. Though less standardized, they can be rich in information content and have unique value. [Figure: volume versus rank frequency of data type, with regions labeled "Specialized repositories (e.g. GenBank, GBIF)" and "Orphan data". After Heidorn (2008).]
    5. General purpose repositories cater to long-tail data
    6. General purpose repositories cater to long-tail data
    7. And that’s aside from the proverbial Babel of data formats.
    8. Where does this leave the user?
    9. Where to deposit what, and how?
    10. Metadata has to be provisioned redundantly. [Form labels shown: "Enter Publication:", "Please enter your publication:", "Publication:", "Enter Publication:"]
    11. How to concisely link to the supporting data?
    12. Given the article, how do I find the data?
    13. Given a data record, how do I find related data?
    14. How do I assess quality and fitness for purpose?
    15. Lessons from Dryad/TreeBASE handshaking
    16. • The End: To make data archiving and reuse a standard part of scholarly communication.
        • The Means:
            • Integrate data archiving with the process of publication.
            • Make archiving easy and low burden for both authors and journals.
            • Give researchers incentives to archive their data.
            • Promote responsible data reuse.
            • Empower journals, societies & publishers in shared governance.
            • Ensure sustainability and long-term preservation.
            • Work with and support trusted, specialized disciplinary repositories.
        • The Scope:
            • Research data in sciences and medicine. (Early focus on evolution and ecology.)
            • Content must be complementary to existing disciplinary repositories.
            • Data must be associated with a vetted publication (article, thesis, book chapter, etc.).
            • Associated non-data content (e.g. software scripts, figures) where appropriate.
    17. Lessons learnt
        • Different priorities on deposit versus metadata richness may void benefits
        • Advantages of one-stop deposition and when to use it are not obvious to users
        • Custom-building handshaking protocols is not robust, doesn’t scale
    18. How to promote
        • Minimum metadata reporting standards?
        • Uptake of community specialist repositories?
        • Archival of all long-tail data?
        • Linking between repositories?
    19. Data, Metadata, Links; Data, Metadata, Links
    20. Standards for repository & web of data interoperability
    21. Standards for repository & web of data interoperability
    22. Promoting community rallying around standards?
    23. Promoting community rallying around standards?
    24. Repo: http://datadryad.org Blog: http://blog.datadryad.org Wiki: