SciELO International Conference 2018
Jonathan Crabtree
Director of Cyberinfrastructure
Odum Institute
Founded in 1924, the Odum Institute provides core research infrastructure for the social
sciences to support the research, teaching, and service mission of UNC. We define social
science broadly to include the health sciences, and we serve faculty and students from every
corner of UNC’s campus.
Home of the Lou Harris Data Center and the UNC Dataverse
An ongoing 12-year collaboration around repository solutions and tools
Partnering on projects to promote data sharing and publication
Leading efforts to promote Open and Reproducible Science
An open-source platform to share and archive data
Developed at Harvard’s Institute for Quantitative Social Science since 2006
Gives credit and control to data authors and producers
Builds a community to define standards and best practices and foster new
research in data sharing and research reproducibility
Has brought data publishing into the hands of data authors
๏ Data Citation with global persistent IDs:
  ๏ generate DOI automatically
  ๏ attribution to data authors and repository
  ๏ registration to DataCite
๏ Rich Metadata:
  ๏ citation metadata
  ๏ domain-specific descriptive metadata
  ๏ variable and file metadata (extracted automatically)
๏ Access and usage controls:
  ๏ open data as default, with CC0 waiver
  ๏ custom terms of use and licenses, when needed
  ๏ data can be restricted, but citation & metadata always publicly accessible
๏ APIs and standards:
  ๏ SWORD, OAI-PMH, native API to search and get data and metadata
  ๏ Dublin Core and DDI metadata standards
  ๏ PROV ontology standard to capture provenance of a dataset (coming soon)
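As a rough illustration of the "APIs and standards" item, the sketch below builds request URLs for the native Search API and for an OAI-PMH harvest. It is a minimal sketch under stated assumptions, not a reference client: the base URL is a hypothetical placeholder, the helper names are invented for this example, and the `/api/search` and `/oai` paths and the `oai_dc` / `oai_ddi` prefix names should be checked against the installation being queried (the standard OAI-PMH `ListMetadataFormats` verb reports which prefixes a server offers).

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical base URL (assumption): replace with a real Dataverse installation.
BASE_URL = "https://dataverse.example.edu"


def search_url(query: str, item_type: str = "dataset", per_page: int = 10) -> str:
    """Build a native Search API request URL (assumed path: /api/search)."""
    params = {"q": query, "type": item_type, "per_page": per_page}
    return f"{BASE_URL}/api/search?{urlencode(params)}"


def oai_list_records_url(metadata_prefix: str = "oai_dc",
                         set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL (assumed path: /oai)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional OAI-PMH set, used to harvest a subset of records.
        params["set"] = set_spec
    return f"{BASE_URL}/oai?{urlencode(params)}"


if __name__ == "__main__":
    # Search for datasets, then harvest Dublin Core and DDI records.
    print(search_url("election survey"))
    print(oai_list_records_url("oai_dc"))
    print(oai_list_records_url("oai_ddi"))
```

The functions only construct URLs, so they can be tried without network access; fetching them with any HTTP client returns JSON for the search call and XML for the OAI-PMH call.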
Has grown to 33 installations around the world
Thousands of scientific studies archived
Across many disciplines
Supporting many metadata standards
The Global Dataverse Community Consortium (GDCC) is dedicated to
providing international organization to existing Dataverse community
efforts, and will provide a collaborative venue for institutions to leverage
economies of scale in support of Dataverse repositories around the world.
http://DataverseCommunity.Global
But are these shared data reusable?
For this, we need well-documented, well-organized data and code, as well as
tools that facilitate replication and reuse
More than 50% of the top 50 journals in anthropology, economics, psychology, and
political science have data policies that either encourage or require authors to
share the data associated with the article.
Crosas, Gautier, Karcher, Kirilova, Otalora, Schwartz, 2018. Data Policies of highly-ranked social science journals
With funding from the Sloan Foundation, our organizations plan to address data reuse and reproducibility by:
– Improving curation through educational materials, a friendly user interface, and services
– Integrating replication tools with Dataverse repositories:
• Encapsulator to pack your data and code in a self-contained, documented capsule (IQSS Harvard)
• Code Ocean to easily run scientific code online (IQSS Harvard)
• CoRe2 to connect systems in order to streamline the verification workflow (Odum Institute)
The Confirmable Reproducible Research (CoRe2) Environment
Linking Tools to Promote Computational Reproducibility
Support for this research was provided by the
Alfred P. Sloan Foundation (2018-11121). The
views expressed here do not necessarily reflect
the views of the Foundation.
[Figure: Manuscript Publication & Data Curation + Verification workflow showing Author, Editor, Verifier, and Curator with numbered markers 1 to 4; journals shown: AJPS, State Politics & Policy Quarterly]
Given current constraints and the need for iterative review, data curation and
successful verification of a replication package for a single manuscript require
six hours of labor on average.
[Figure: Computation, Coordination, Administration; tools shown: binder, encapsulator]
Promote and support computational reproducibility by
integrating and streamlining manuscript publication and
data curation + verification workflows
● Facilitate access to and adoption of tools and platforms to support
scientific reproducibility
● Coordinate manuscript submission and data curation + verification
workflow processes across key stakeholders
● Promote the adoption of standards and best practices for data access and
transparency as part of normative research practice.
[Figure: Author, Editor, Verifier, and Curator shown with binder and encapsulator]
More Information at:
http://dataverse.org
http://dataversecommunity.global
http://www.odum.unc.edu
Merce Crosas at IQSS
https://scholar.harvard.edu/mercecrosas/home
Odum Co-PI Thu-Mai Christian and Visual Arts Specialist Kasha Ely

Jonathan David Crabtree - The Dataverse Community: Supporting Open Science and Reproducibility
