The blessing and the curse: handshaking between general and specialist data repositories
Hilmar Lapp (NESCent), Todd Vision (UNC Chapel Hill)
GSC 15 Conference, Bethesda, MD
April 22-24, 2013
4. Addressing the long tail of orphan data
[Figure: long-tail distribution of data. Axes: Volume; Rank frequency of datatype. Labeled regions: Specialized repositories (e.g. GenBank, GBIF); Orphan data. After Heidorn (2008) http://hdl.handle.net/2142/9127]
Many datasets belong to the long tail. Though less standardized, they can be rich in information content and have unique value.
17. • The End
To make data archiving and reuse a standard part of scholarly communication.
• The Means
Integrate data archiving with the process of publication.
Make archiving easy and low burden for both authors and journals.
Give researchers incentives to archive their data.
Promote responsible data reuse.
Empower journals, societies & publishers in shared governance.
Ensure sustainability and long-term preservation.
Work with and support trusted, specialized disciplinary repositories.
• The Scope
Research data in the sciences and medicine (early focus on evolution and ecology).
Content must be complementary to existing disciplinary repositories.
Data must be associated with a vetted publication (article, thesis, book chapter, etc.).
Associated non-data content (e.g. software scripts, figures) is included where appropriate.
22. Lessons learnt
• Different priorities on deposit versus metadata richness may void benefits
• Advantages of one-stop deposition and when to use it are not obvious to users
• Custom-building handshaking protocols is not robust, doesn’t scale
23. How to promote
• Minimum metadata reporting standards?
• Uptake of community specialist repositories?
• Archival of all long-tail data?
• Linking between repositories?
Specialized repository infrastructure exists for certain data types, e.g. DNA sequences and species occurrence data. But vast quantities of valuable and irreplaceable data make up the long tail, much of it in idiosyncratically formatted spreadsheets and other nonstandardized files. An archive is needed not to replace existing repositories, but to provide a home for orphan data and to enable ALL the data underlying a publication to be archived.
Dryad was developed to fill the infrastructure gap for journals that sincerely wished to promote data archiving. It needed to be usable not only by authors producing certain types of data, or by those most motivated to share, but by all the authors to whom the journal’s data policy would apply.