Specialized repository infrastructure exists for certain data types, e.g. DNA sequences and species occurrence data. But vast quantities of valuable and irreplaceable data make up the long tail, much of it in idiosyncratically formatted spreadsheets and other nonstandardized files. An archive is needed not to replace existing repositories, but to provide a home for orphan data and to enable ALL the data underlying a publication to be archived.
Dryad was developed to fill the infrastructure gap for journals that sincerely wished to promote data archiving: a repository that could be used not only by authors producing certain types of data, or by those most motivated to share, but by all the authors to whom the journal’s data policy would apply.
1. The blessing and the curse: handshaking between general and specialist data repositories. Hilmar Lapp (NESCent), Todd Vision (UNC Chapel Hill). GSC 15 Conference, Bethesda, MD, April 22-24, 2013
2. > 180 for biological sciences alone
3. Which data goes where? Which is required?
4. Addressing the long tail of orphan data. [Figure: volume versus rank frequency of data type, contrasting specialized repositories (e.g. GenBank, GBIF) with orphan data. After Heidorn (2008) http://hdl.handle.net/2142/9127] Many datasets belong to the long tail. Though less standardized, they can be rich in information content and have unique value.
5. General purpose repositories cater to long-tail data
6. General purpose repositories cater to long-tail data
7. And that’s aside from the proverbial Babel of data formats.
8. Where does this leave the user?
9. Where to deposit what, and how?
10. Metadata has to be provisioned redundantly. [Submission form prompts: "Enter Publication:", "Please enter your publication:", "Publication:", "Enter Publication:"]
11. How to concisely link to the supporting data?
12. Given the article, how do I find the data?
13. Given a data record, how do I find related data?
14. How do I assess quality and fitness for purpose?
15. Lessons from Dryad/TreeBASE handshaking
16.
• The End: To make data archiving and reuse a standard part of scholarly communication.
• The Means: Integrate data archiving with the process of publication. Make archiving easy and low burden for both authors and journals. Give researchers incentives to archive their data. Promote responsible data reuse. Empower journals, societies & publishers in shared governance. Ensure sustainability and long-term preservation. Work with and support trusted, specialized disciplinary repositories.
• The Scope: Research data in sciences and medicine (early focus on evolution and ecology). Content must be complementary to existing disciplinary repositories. Data must be associated with a vetted publication (article, thesis, book chapter, etc.). Associated non-data content (e.g. software scripts, figures) where appropriate.
17. Lessons learnt
• Different priorities on deposit versus metadata richness may void benefits
• Advantages of one-stop deposition and when to use it are not obvious to users
• Custom-building handshaking protocols is not robust, doesn’t scale
18. How to promote
• Minimum metadata reporting standards?
• Uptake of community specialist repositories?
• Archival of all long-tail data?
• Linking between repositories?
19. [Diagram labels: Data, Metadata, Links; Data, Metadata, Links]
20. Standards for repository & web of data interoperability
21. Standards for repository & web of data interoperability