The blessing and the curse: handshaking between general and specialist data repositories

Hilmar Lapp

Talk presented at the Genomic Standards Consortium 15 conference.

Hilmar Lapp (NESCent), Todd Vision (UNC Chapel Hill)
GSC 15 Conference, Bethesda, MD
April 22-24, 2013

Which data goes where?
Which is required?

Addressing the long tail of orphan data
[Figure, after Heidorn (2008) http://hdl.handle.net/2142/9127. Axis labels: Volume; Rank frequency of datatype. Region labels: Specialized repositories (e.g. GenBank, GBIF); Orphan data.]
Many datasets belong to the long tail. Though less standardized, they can be rich in information content and have unique value.

General purpose repositories cater to long-tail data

And that’s aside from the proverbial Babel of data formats.

Enter Publication:
Please enter your publication:
Publication:
Enter Publication:
Metadata has to be provisioned redundantly

How to concisely link to the supporting data?

Given the article, how do I find the data?

Given a data record, how do I find related data?

How do I assess quality and fitness for purpose?

• The End
  - To make data archiving and reuse a standard part of scholarly communication.
• The Means
  - Integrate data archiving with the process of publication.
  - Make archiving easy and low burden for both authors and journals.
  - Give researchers incentives to archive their data.
  - Promote responsible data reuse.
  - Empower journals, societies & publishers in shared governance.
  - Ensure sustainability and long-term preservation.
  - Work with and support trusted, specialized disciplinary repositories.
• The Scope
  - Research data in sciences and medicine. (Early focus on evolution and ecology).
  - Content must be complementary to existing disciplinary repositories.
  - Data must be associated with a vetted publication (article, thesis, book chapter, etc.)
  - Associated non-data content (e.g. software scripts, figures) where appropriate

Lessons learnt
• Different priorities on deposit versus metadata richness may void benefits
• Advantages of one-stop deposition and when to use it are not obvious to users
• Custom-building handshaking protocols is not robust and doesn’t scale
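
To make the idea of one-stop deposition concrete, here is a minimal sketch, not the actual Dryad/TreeBASE protocol. It assumes the author enters the publication metadata once and that each repository receives its own payload derived from that single record. All class, function, and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublicationMetadata:
    """Metadata the author enters once (field names are hypothetical)."""
    doi: str
    title: str
    authors: tuple


def to_general_repo(meta):
    """Map the shared record to a hypothetical general-repository payload."""
    return {
        "publication_doi": meta.doi,
        "title": meta.title,
        "creators": list(meta.authors),
    }


def to_specialist_repo(meta, data_type):
    """Map the same record to a hypothetical specialist-repository payload."""
    return {
        "citation": {"doi": meta.doi, "title": meta.title},
        "submitters": "; ".join(meta.authors),
        "data_type": data_type,
    }


def one_stop_deposit(meta, data_type):
    """Build both payloads from a single entry of the publication metadata."""
    return {
        "general": to_general_repo(meta),
        "specialist": to_specialist_repo(meta, data_type),
    }
```

Each mapping function here is hand-written for one repository, which is exactly the kind of custom coupling the last lesson warns about; a shared metadata standard would replace the per-repository mappers.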

How to promote
• Minimum metadata reporting standards?
• Uptake of community specialist repositories?
• Archival of all long-tail data?
• Linking between repositories?
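
As an illustration of what linking between repositories could enable, the following sketch assumes each repository exposes the identifiers its records are related to. A small index over those links then answers both "given the article, how do I find the data?" and "given a data record, how do I find related data?". The `LinkIndex` class and the identifier strings are hypothetical and are not part of any repository's API.

```python
from collections import defaultdict


class LinkIndex:
    """Undirected index of links between identifiers (articles, data records)."""

    def __init__(self):
        self._links = defaultdict(set)

    def add_link(self, a, b):
        """Record that identifiers a and b refer to related objects."""
        self._links[a].add(b)
        self._links[b].add(a)

    def related(self, identifier):
        """Return every identifier reachable from the given one via links."""
        seen = {identifier}
        stack = [identifier]
        while stack:
            current = stack.pop()
            for neighbour in self._links.get(current, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        seen.discard(identifier)
        return seen


index = LinkIndex()
index.add_link("article:A", "general-repo:pkg1")
index.add_link("general-repo:pkg1", "specialist-repo:rec7")
data_for_article = index.related("article:A")
```

Because the links are followed transitively, the article finds the specialist record even though only the general-repository package links to it directly.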

Standards for repository & web of data interoperability

Promoting community rallying around standards?

Repo: http://datadryad.org
Blog: http://blog.datadryad.org
Wiki: http://datadryad.org/wiki
Code: http://code.google.com/p/dryad
List: dryad-users@nescent.org
@datadryad
Dryad

2. > 180 for biological sciences alone

8. Where does this leave the user?

9. Where to deposit what, and how?

16. Lessons from Dryad/TreeBASE handshaking

24. [Diagram labels: Data, Metadata, Links (shown twice)]

Editor's Notes

Specialized repository infrastructure exists for certain data types, e.g. DNA sequences and species occurrence data. But vast quantities of valuable and irreplaceable data comprise the long tail, much of it in idiosyncratically formatted spreadsheets and other nonstandardized files. An archive is needed not to replace existing repositories, but to provide a home for orphan data and to enable ALL the data underlying a publication to be archived.
Dryad was developed to fill the infrastructure gap for journals that sincerely wished to promote data archiving. It was meant to be usable not only by authors producing certain types of data, or by those most motivated to share, but by all the authors to whom the journal’s data policy would apply.
