The blessing and the curse: handshaking between general and specialist data repositories


Talk presented at the Genomic Standards Consortium 15 conference.


  • Specialized repository infrastructure exists for certain data types, e.g. DNA sequences and species occurrence data. But vast quantities of valuable and irreplaceable data make up the long tail, much of it in idiosyncratically formatted spreadsheets and other nonstandardized files. An archive is needed not to replace existing repositories, but to provide a home for orphan data and to enable ALL the data underlying a publication to be archived.
  • Dryad was developed to fill the infrastructure gap for journals that sincerely wished to promote data archiving: a repository that could be used not only by authors producing certain types of data, or by the authors most motivated to share, but by all the authors to whom the journal’s data policy would apply.

Transcript

  • 1. The blessing and the curse: handshaking between general and specialist data repositories. Hilmar Lapp (NESCent), Todd Vision (UNC Chapel Hill). GSC 15 Conference, Bethesda, MD, April 22-24, 2013
  • 2. > 180 for biological sciences alone
  • 3. Which data goes where? Which is required?
  • 4. Addressing the long tail of orphan data. [Figure: volume versus rank frequency of datatype, with specialized repositories (e.g. GenBank, GBIF) and orphan data marked. After Heidorn (2008) http://hdl.handle.net/2142/9127] Many datasets belong to the long tail. Though less standardized, they can be rich in information content and have unique value.
  • 5. General purpose repositories cater to long-tail data
  • 6. General purpose repositories cater to long-tail data
  • 7. And that’s aside from the proverbial Babel of data formats.
  • 8. Where does this leave the user?
  • 9. Where to deposit what, and how?
  • 10. [Form prompts: "Enter Publication:", "Please enter your publication:", "Publication:", "Enter Publication:"] Metadata has to be provisioned redundantly
  • 11. How to concisely link to the supporting data?
  • 12. Given the article, how do I find the data?
  • 13. Given a data record, how do I find related data?
  • 14. How do I assess quality and fitness for purpose?
  • 15. Lessons from Dryad/TreeBASE handshaking
  • 16.
    • The End: To make data archiving and reuse a standard part of scholarly communication.
    • The Means: Integrate data archiving with the process of publication. Make archiving easy and low burden for both authors and journals. Give researchers incentives to archive their data. Promote responsible data reuse. Empower journals, societies & publishers in shared governance. Ensure sustainability and long-term preservation. Work with and support trusted, specialized disciplinary repositories.
    • The Scope: Research data in sciences and medicine (early focus on evolution and ecology). Content must be complementary to existing disciplinary repositories. Data must be associated with a vetted publication (article, thesis, book chapter, etc.). Associated non-data content (e.g. software scripts, figures) where appropriate.
  • 17. Lessons learnt
    • Different priorities on deposit versus metadata richness may void benefits
    • Advantages of one-stop deposition and when to use it are not obvious to users
    • Custom-building handshaking protocols is not robust, doesn’t scale
  • 18. How to promote
    • Minimum metadata reporting standards?
    • Uptake of community specialist repositories?
    • Archival of all long-tail data?
    • Linking between repositories?
  • 19. [Diagram labels, shown twice: Data, Metadata, Links]
  • 20. Standards for repository & web of data interoperability
  • 21. Standards for repository & web of data interoperability
  • 22. Promoting community rallying around standards?
  • 23. Promoting community rallying around standards?
  • 24. Dryad
    • Repo: http://datadryad.org
    • Blog: http://blog.datadryad.org
    • Wiki: http://datadryad.org/wiki
    • Code: http://code.google.com/p/dryad
    • List: dryad-users@nescent.org
    • @datadryad