Managing, Sharing and Curating Your Research Data in a Digital Environment

Sonia Barbosa, Manager of Data Curation, Harvard Dataverse
Philip Durbin, Developer, Harvard Dataverse

https://www.economist.com/news/leaders/21721656-data-economy-demands-new-approach-antitrust-rules-worlds-most-valuable-resource

http://nobaproject.com/modules/the-replication-crisis-in-psychology

What have been the outcomes of open science
and mandates to share data?...

Researchers are sharing data
Huge increase in data deposits as sharing becomes the norm

Data are being shared, alongside the
research article!

Bi-directional linking of data and
research articles is taking place!
If I find your data, I can find your article.
If I find your article, I can find your data!

Self-curation repositories are being
developed to acquire and help
researchers publish their DATA!

1. Visibility: Studies have shown that open access content attracts more attention than non-open access content.
Increased citation and usage, greater public engagement
2. Make new discoveries: Open access data and papers accelerate the pace of scientific enquiry.
Faster impact, wider collaboration, increased interdisciplinary conversation
3. Comply with funder mandates: Open access is increasingly required by funders around the world.

https://www.fosteropenscience.eu/content/what-are-benefits-open-science

Key requirements for open data
● Availability
● Access
● Redistribution and reuse
© 2007 - 2018 SPARC, subject to a Creative Commons Attribution 4.0 International License

Connecting research articles to data...

Sünje Dallmeier-Tiessen (CERN) http://slideplayer.com/slide/5768687/

Article DOI <-> Dataset identifier
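
One way to make the article-to-data link machine-readable in both directions is through metadata: the dataset's DataCite record can carry a related identifier pointing at the article, and the article's record (often registered with Crossref rather than DataCite) can carry the inverse relation. The Python sketch below assumes the `relatedIdentifiers` field and the relation-type names of the DataCite Metadata Schema; the helper names are hypothetical and the DOIs use the DataCite test prefix 10.5072 purely for illustration.

```python
# Hypothetical helpers. Field names follow the DataCite Metadata Schema's
# relatedIdentifiers property; the DOIs used below are made up.

# Each DataCite relation type paired with its inverse.
INVERSE_RELATION = {
    "IsSupplementTo": "IsSupplementedBy",
    "IsSupplementedBy": "IsSupplementTo",
    "References": "IsReferencedBy",
    "IsReferencedBy": "References",
    "Cites": "IsCitedBy",
    "IsCitedBy": "Cites",
}


def link_dataset_to_article(dataset_doi: str, article_doi: str) -> dict:
    """Build a metadata fragment stating that the dataset supplements the article."""
    return {
        "doi": dataset_doi,
        "relatedIdentifiers": [
            {
                "relatedIdentifier": article_doi,
                "relatedIdentifierType": "DOI",
                "relationType": "IsSupplementTo",
            }
        ],
    }


def inverse_links(record: dict) -> list[dict]:
    """Derive the matching fragment for each related record, so links run both ways."""
    fragments = []
    for rel in record["relatedIdentifiers"]:
        fragments.append(
            {
                "doi": rel["relatedIdentifier"],
                "relatedIdentifiers": [
                    {
                        "relatedIdentifier": record["doi"],
                        "relatedIdentifierType": "DOI",
                        "relationType": INVERSE_RELATION[rel["relationType"]],
                    }
                ],
            }
        )
    return fragments


dataset = link_dataset_to_article("10.5072/FK2/DATASET", "10.5072/FK2/ARTICLE")
article = inverse_links(dataset)[0]
print(article["relatedIdentifiers"][0]["relationType"])  # IsSupplementedBy
```

Deriving the inverse from a fixed table keeps the two records consistent: if I find your data, its metadata names your article, and vice versa.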

Data discoverability and standard citation...

● FAIR Principles: https://www.force11.org/group/fairgroup/fairprinciples

● establish easier access to research data on the Internet
● increase acceptance of research data as legitimate, citable contributions to the scholarly record
● support data archiving that will permit results to be verified and re-purposed for future study.

DataCite
● Open Access standards for Datasets
● International in scope, including universities, research institutions, data governance agencies, government entities, etc.
● "DataCite is a leading global non-profit organisation that provides persistent identifiers (DOIs) for research data. Our goal is to help the research community locate, identify, and cite research data with confidence." (datacite.org)

Success stories in raising research visibility
with data sharing...

http://www.ornl.gov/sci/techresources/Human_Genome/graphics/99-1133r.jpg

Increasing data availability statement requirements by publishers...

The Scientific Community is Establishing Best Practices
for Data Publishing and Replication

DA-RT (Data Access and Research Transparency) Journal Policies
Goal: To increase transparency in social science
In 2016, the first group of DA-RT journals began posting new data sharing and transparency policies:
American Journal of Political Science's Guidelines for Preparing Replication Materials
American Political Science Review's DA-RT Guidelines
Conflict Management and Peace Science DA-RT guidelines
The Italian Political Science Review's Replication Policy and Policy for Datasets and Supplemental Files
State Politics and Policy Quarterly's Guidelines for Preparing Replication Policies


Authors Comply with Strong Data Policies

The aims of the Springer Nature data sharing policies...
These new policies and services aim to:
● improve author service and experience by standardising research data policies and
procedures between journals where appropriate
● improve reader service by providing more consistent links between publications and data
● improve editor and peer reviewer service by providing more consistent guidelines and support
for research data policies, and increased visibility of data in the peer-review process
● encourage publication of more open and reproducible research
● increase growth and innovation in research data sharing
● provide a dedicated Research Data Support helpdesk for Springer Nature authors and editors
http://blogs.nature.com/ofschemesandmemes/2016/07/05/promoting-research-data-sharing-at-springer-nature


Data management and curation challenges...

*IDC Energy Insights for Oil & Gas 2015-2017 report: (2015 Upstream Intelligence, IDC Energy Insights, McKinsey
and Company, Bain and Company)

5 Reasons Healthcare Data Is Unique and Difficult to Measure, by Dan LeSueur

Challenges include but are not limited to...
Meaningful data aggregation and analysis
Privacy and security demands
Missing integration of data sources and instruments
Complicated privacy laws (US and European)
Diverse stakeholders
Sandra Gesing, Center for Research Computing, University of Notre Dame (sandra.gesing@nd.edu). Science Gateways: Addressing Data Management Challenges. 7th National Data Service Consortium Workshop, Chicago, 13 April 2017.

Research data management solutions with
Dataverse...

Dataverse is an open source web application to share, preserve, cite, explore, and
analyze research data. It facilitates making data available to others, and allows you to
replicate others' work more easily. Researchers, data authors, publishers, data distributors,
and affiliated institutions all receive academic credit and web visibility.
https://dataverse.org/
Data Management Plan
Checklist for data management plans
Template for data management plans
http://best-practices.dataverse.org/data-management/index.html

Authors
Publication Date
Dataset title
Digital Object Identifier
Repository
Versioning
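
The citation elements above can be assembled mechanically. The Python sketch below assumes a layout modeled on the citations Dataverse displays (authors, year, title, DOI, repository, version); real Dataverse citations can include further elements such as a UNF, and the class name, field names, and sample values are hypothetical. 10.7910 is Harvard Dataverse's DOI prefix, but the suffix used here is invented.

```python
from dataclasses import dataclass


@dataclass
class DatasetCitation:
    """Dataset citation elements; field names are illustrative only."""

    authors: list[str]
    year: int
    title: str
    doi: str
    repository: str
    version: int

    def __str__(self) -> str:
        # Authors are joined with semicolons because each name already contains a comma.
        return (
            f'{"; ".join(self.authors)}, {self.year}, "{self.title}", '
            f"https://doi.org/{self.doi}, {self.repository}, V{self.version}"
        )


citation = DatasetCitation(
    authors=["Doe, Jane", "Roe, Richard"],
    year=2018,
    title="Replication Data for: An Example Study",
    doi="10.7910/DVN/EXAMPLE",
    repository="Harvard Dataverse",
    version=1,
)
print(citation)
```

Keeping the version in the citation matters: a reader who cites V1 should still be able to retrieve V1 after the authors publish V2.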

Code files required for data replication

Dataverse supports:
● Access and Sharing
● File Format Support
● Documentation, Metadata and Bibliographic Information
● Versioning

Dataverse facilitates data access by providing:
● descriptive and variable/question-level search;
● topical browsing;
● data extraction;
● re-formatting;
● online analysis.
Dataverse performs:
● archival format migration;
● metadata extraction;
● validity checks.
The Dataverse application’s “templating” feature can be used to keep information consistent across datasets.
The Dataverse repository automatically generates persistent identifiers and Universal Numeric Fingerprints (UNFs) for datasets; extracts and indexes variable descriptions, missing-value codes and labels; creates variable-level summary statistics; and facilitates open distribution of metadata in a variety of standard formats (DataCite, DDI 2.5, Dublin Core, VO Resource, and ISA-Tab) and protocols (OAI-PMH, SWORD).
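
OAI-PMH is how other services harvest that metadata. The Python sketch below builds a ListRecords request for Dublin Core (oai_dc) records and parses identifiers and titles from a response. It assumes a Dataverse installation serving OAI-PMH at /oai (the demo.dataverse.org base URL is an assumption), and the sample response is invented and abridged: real responses also carry responseDate, request, datestamps, and resumption tokens for paging.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for a harvester to fetch."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text: str) -> list[tuple[str, str]]:
    """Return (identifier, title) pairs from an oai_dc ListRecords response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        pairs.append((identifier, title))
    return pairs


# Invented, abridged response for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>doi:10.5072/FK2/EXAMPLE</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example Dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://demo.dataverse.org/oai"))
print(parse_records(SAMPLE))
```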

Data Sharing Has Many Acceptable Levels
- Different levels of openness in sharing data
- Verification of reproducibility
- "Replication data for…", "Data related to…"
- Public version of a dataset vs. restricted version

https://rin.lipi.go.id/dataverse/lipi

Management and curation of research data...

What is research data?
● Observational: data captured in real time that is usually unique and irreplaceable. For example,
remote sensing data, survey data, field recordings, sample data
● Experimental: data captured from lab equipment that is often reproducible, but can be expensive.
For example, gene sequences, chromatograms, magnetic field data
● Models or simulations: data generated from test models where the model and metadata may be
more important than output data from the model. For example, climate models, economic models
● Derived or compiled: resulting from processing or combining ‘raw’ data, often reproducible, but may
be expensive. For example, text and data mining, compiled databases, 3D models
● Reference or canonical: a static or organic conglomeration or collection of datasets, probably
published and curated. For example, gene sequence databanks, collection of letters or archive of
historical images
http://libguides.ucd.ie/data/researchdata

The purpose of research data management...
● To ensure research integrity and validation of results.
● To increase research efficiency.
● To facilitate data security and minimise the risk of data loss.
● To ensure wider dissemination and increased impact.
● To enable research continuity through secondary data use.
● To ensure compliance with a funding agency’s requirements.
http://libguides.ucd.ie/data/researchdata

IQSS and the Dataverse Project

● Mission: "...enabling bigger, better, faster, and more
collaborative social science"
● Integrations powered by APIs
● Current and future efforts
● Community
● Transparency at all project levels

Transparency: Roadmap
https://dataverse.org/goals-roadmap-and-releases

Transparency: Daily Standup
https://waffle.io/IQSS/dataverse

Integrations Powered by APIs
● Data Deposit API (SWORD)
● Search API
● Download API
● Native API
http://guides.dataverse.org/en/latest/api
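
As a rough illustration of the Search and Download APIs, the Python sketch below builds request URLs and reads dataset identifiers out of a search response. It assumes the /api/search endpoint with its q, type and per_page parameters, and the /api/access/datafile/{id} download endpoint that the guides list under the Data Access API. The server URL, file id, and sample response are hypothetical, and no network request is made.

```python
import json
from urllib.parse import urlencode


def search_url(server: str, query: str, kind: str = "dataset", per_page: int = 10) -> str:
    """Build a Search API request URL."""
    params = {"q": query, "type": kind, "per_page": per_page}
    return f"{server}/api/search?{urlencode(params)}"


def datafile_url(server: str, file_id: int) -> str:
    """Build a download URL for a file, given its numeric database id."""
    return f"{server}/api/access/datafile/{file_id}"


def dataset_ids(response_text: str) -> list[str]:
    """Pull dataset persistent identifiers out of a Search API JSON response."""
    payload = json.loads(response_text)
    return [
        item["global_id"]
        for item in payload["data"]["items"]
        if item.get("type") == "dataset"
    ]


# Invented, abridged response for illustration only.
SAMPLE = json.dumps({
    "status": "OK",
    "data": {
        "q": "climate",
        "total_count": 2,
        "items": [
            {"type": "dataset", "name": "Example", "global_id": "doi:10.5072/FK2/EXAMPLE"},
            {"type": "file", "name": "data.tab"},
        ],
    },
})

server = "https://demo.dataverse.org"
print(search_url(server, "climate"))
print(datafile_url(server, 42))
print(dataset_ids(SAMPLE))
```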

OJS Integration - Getting Data into Dataverse

OSF Integration - Getting Data into Dataverse

RSpace Integration - Getting Data into Dataverse

Deposit APIs: Ready for the Next Integration!
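
A new integration typically starts with the Data Deposit (SWORD v2) API's service document, which lists the collections a user may deposit into. The Python sketch below only prepares that request. It assumes the service-document path and the convention, used in the Dataverse guides' `curl -u $API_TOKEN:` examples, that the API token is sent as the HTTP Basic username with an empty password. The hostname and token are placeholders; sending the request would be `urllib.request.urlopen(req)`.

```python
import base64
import urllib.request


def service_document_request(hostname: str, api_token: str) -> urllib.request.Request:
    """Prepare (but do not send) a SWORD v2 service-document request."""
    url = f"https://{hostname}/dvn/api/data-deposit/v1.1/swordv2/service-document"
    # API token as the Basic-auth username, empty password ("token:").
    credentials = base64.b64encode(f"{api_token}:".encode("ascii")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {credentials}"})


req = service_document_request("demo.dataverse.org", "abc")  # placeholder token
print(req.full_url)
print(req.get_header("Authorization"))  # Basic YWJjOg==
```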

SHARE: Making Data Discoverable

Archivematica: Getting Data out of Dataverse
https://www.slideshare.net/datascienceiqss/bell-trimble-dataverse-community-meeting-2015-final-presentation

Current and Future Efforts
● Cloud
● Big Data
● Streaming Data
● File Hierarchy
● Embargo

Streaming Data
311 Boston API
App for regular processing:
● Citation
● Versioning
● File appending
● R scripts run at an interval defined by the researcher
● Authentication to the API (if needed)
● Boston makes APIs available for public works data
● So do many others!
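
The file-appending step can be sketched as follows. This is written in Python rather than the R scripts the slide mentions. It assumes, hypothetically, that each record returned by the city API carries a unique `case_id`: on each scheduled run, only records not already stored are appended, and the updated file would then be uploaded as a new dataset version. The field names are illustrative, not the Boston API's actual schema.

```python
import csv
import io


def append_new_records(existing_rows, incoming_rows, key="case_id"):
    """Return incoming rows whose key is not already stored, dropping repeats."""
    seen = {row[key] for row in existing_rows}
    fresh = []
    for row in incoming_rows:
        if row[key] not in seen:
            seen.add(row[key])
            fresh.append(row)
    return fresh


def to_csv(rows, fieldnames):
    """Serialize rows as CSV text (header included) for upload."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


stored = [{"case_id": "1", "status": "Open"}]
latest = [
    {"case_id": "1", "status": "Open"},
    {"case_id": "2", "status": "Closed"},
    {"case_id": "2", "status": "Closed"},
]
new_rows = append_new_records(stored, latest)
print(to_csv(stored + new_rows, ["case_id", "status"]))
```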

File Hierarchy (Folders, Directories)
https://github.com/IQSS/dataverse/issues/2249

LIPI's Dataverse Installation: RIN
● RIN is now part of the Dataverse community
● Open to all researchers in Indonesia
https://rin.lipi.go.id

Dataverse Community
● 60+ code contributors
● Hundreds of members of the Dataverse Community -
developers, researchers, librarians, data scientists
○ Dataverse Google Group
○ Dataverse Community Calls
○ Dataverse Community Meeting
https://groups.google.com/d/forum/dataverse-community

Dev Efforts from the Community
https://github.com/IQSS/dataverse/blob/develop/CONTRIBUTING.md

DATAVERSE COMMUNITY MEETING, 2018

Thank you! Comments? Questions?

References
Teplitzky, S. (2017). Open Data, [Open] Access: Linking Data Sharing and Article Sharing in the Earth
Sciences. Journal of Librarianship and Scholarly Communication, 5(General Issue), eP2150.
https://doi.org/10.7710/2162-3309.2150
Lee, D. J., & Stvilia, B. (2017). Practices of research data curation in institutional repositories: A qualitative view from repository staff. PLoS ONE, 12(3), e0173987. https://doi.org/10.1371/journal.pone.0173987
Drachen, T. M., et al. (2016). Sharing data increases citations. LIBER Quarterly, 26(2), 67–82. https://doi.org/10.18352/lq.10149
Smith, K. L., & Dickson, K. A. Open Access and the Future of Scholarly Communication: Policy and Infrastructure.

https://www.dataone.org/
https://www.datacite.org/
https://www.rd-alliance.org/open-data
https://www.oecd.org/sti/outlook/e-outlook/stipolicyprofiles/interactionsforinnovation/openscience.htm
https://www.nap.edu/read/5504/chapter/5#61
https://www.force11.org/group/fairgroup/fairprinciples
http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002235
https://obamawhitehouse.archives.gov/blog/2013/02/22/expanding-public-access-results-federally-funded-research
http://www.righttoresearch.org/learn/whyoa/index.shtml
https://www.dartstatement.org/
https://datascience.codata.org/articles/10.5334/dsj-2017-009/

https://www.nature.com/openresearch/about-open-access/benefits-for-authors/
https://www.dtls.nl/fair-data/fair-principles-explained
https://cos.io/our-services/top-guidelines/
https://www.cessda.eu/
http://library.harvard.edu/sites/default/files/HarvardPurdue_Workshop_full.pdf
https://www.fosteropenscience.eu/content/what-open-science-introduction
http://www.unesco.org/new/en/communication-and-information/portals-and-platforms/goap/open-science-movement/
http://dataconservancy.org/
http://sciencecommons.org/resources/readingroom/principles-for-open-science/
