Data Publishing Models by Sünje Dallmeier-Tiessen


Data Publishing is becoming an integral part of scholarly communication today. Thus, it is indispensable to understand how data publishing works across disciplines. Are there best practices others can learn from or even data publishing standards? How do they impact interoperability in the Open Science landscape? The presentation will look at a range of examples, and the main building blocks of data publishing today. The work has been conducted as part of the RDA Data Publishing Workflows group.

Published in: Education

  1. 1. Data Publishing Models. Sünje Dallmeier-Tiessen, PhD (CERN, Harvard University), for the RDA-WDS Data Publishing Workflow Group. June 9th, 2015
  2. 2. Topics
       • What is data publishing
       • Why do we care about it (today)
       • Models in data publishing
       • Building blocks
       • Information gathered through trusted data publishing
       • Relevance and conclusions for today’s workshop
       This is work conducted by the RDA-WDS group on data publishing workflows, chaired in collaboration with Fiona Murphy and Theo Bloom.
  3. 3. Data Publishing … describes the process of making research data and other research objects available on the web so that they can be discovered and referred to in a unique and persistent way. At its best, data publishing takes place through dedicated data repositories and data journals and ensures that the published research objects are well documented, curated, archived for the long term, interoperable, citable and quality assured. Thus, they remain reusable and discoverable in the long term.
  4. 4. Examples
  5. 5. Analysis elements
       • Discipline, responsible units (i.e. their roles)
       • Function of workflow
       • PID assignment: DOI, ARK, etc.
       • Peer review of data (e.g. by researcher & editorial review)
       • Curatorial review of metadata (e.g. by institutional or subject repository?)
       • Technical review & checks (e.g. for data integrity at repository upon ingestion)
       • Formats covered
       • Persons/Roles involved, e.g. editor, publisher, data repository manager, etc.
       • Links to additional data products (data paper; review documents; other journal articles) or “stand-alone” product
       • Links to grants, usage of author PIDs
       • Discoverability: indexing of the data (if yes, where?)
       • Data citation facilitated
       • Data life cycle reference
       • Standards compliance
  6. 6. Repository’s perspective
  7. 7. Simplified generic repository workflow
       • Diagram labels: Producer; Data Deposit; Ingest; Quality Assurance; Data Management; LT Archiving; Dissemination; Access; Consumer/Reuse
       • Researcher with a central role during submission/deposition
       • Review/QA mainly internal through dedicated curation personnel
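The generic repository workflow on slide 7 can be sketched as a simple ordered pipeline. This is only an illustration: the stage names come from the slide, but the strictly linear ordering, the `Dataset` class and the `advance` and `is_accessible` helpers are assumptions made for this sketch and are not part of the presentation.

```python
from dataclasses import dataclass, field
from typing import List

# Stage names are taken from the slide; treating them as a strictly
# linear sequence is an assumption made for this sketch.
STAGES = [
    "data deposit",
    "ingest",
    "quality assurance",
    "data management",
    "long-term archiving",
    "dissemination",
    "access",
]


@dataclass
class Dataset:
    """Hypothetical record of a dataset moving through the repository."""
    title: str
    producer: str
    completed: List[str] = field(default_factory=list)


def advance(dataset: Dataset) -> str:
    """Move the dataset to the next stage and return that stage's name."""
    if len(dataset.completed) == len(STAGES):
        raise ValueError("dataset has already passed every stage")
    stage = STAGES[len(dataset.completed)]
    dataset.completed.append(stage)
    return stage


def is_accessible(dataset: Dataset) -> bool:
    """In this sketch, a consumer can reuse data only after all stages."""
    return dataset.completed == STAGES
```

A dataset created with `Dataset(title="Example", producer="Researcher")` would need `advance` called once per stage before `is_accessible` returns `True`.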
  8. 8. Disciplinary repository example (content provided by M. Stockhause)
       • Diagram labels: Producer; Data Deposit; Ingest; Quality Assurance (light); Data Management; LT Archiving; Dissemination; Access; Consumer (disciplinary); Ingest; Quality Assurance (detailed); Dissemination; Access; Consumer (interdisciplinary)
       • Project Repositories:
         – Data are published in a federated data infrastructure
         – Data are added and corrected
         – Poor documentation
         – Usually no data backup
         – Light-weight quality assurance against international and project standards
         – Tendency that the project data never become stable
         – Currently no PIDs assigned or reserved, but Handles planned
       • Long-term Archive:
         – Data are archived for the long term at a single location
         – Data are stable and curated
         – Detailed documentation
         – Data backup/redundancy
         – Quality assurance process is more detailed and includes a review
         – Data are a “snapshot” of the project data at a certain time
         – DOIs assigned to data collections
  9. 9. Lessons learnt and questions
       • Very diverse landscape
       • Discipline-specific and cross-discipline actions
       • Quality assurance a big topic in discipline-specific repositories
       • Widespread persistent identification
       • Data citation awareness
       • Challenge: Versioning
  10. 10. Publisher’s perspective
  11. 11. Simplified generic publisher workflow
       • Diagram labels: Producer; Article preparation; Data Submission; Article submission; Peer Review Process; Editing; Publishing; Data repositories; Consumer/Reuse
       • Researcher takes over several roles: submitter, reviewer, editor potentially?
       • Article/data container
       • Separate article and datasets
  12. 12. Example Workflows in Dataverse: Connect Data to Journals (slide by Eleni Castro)
       A. Journals include Dataverse as a Recommended Repository
       B. Authors Contribute Directly to a Journal’s Dataverse
       C. Automated Integration of Journal + Dataverse (e.g., OJS)
  13. 13. Example: Dryad repository integrated with journals (slide by T. Bloom)
  14. 14. Data publishing building blocks
       • Primary data entry with PID: repository entry; metadata; curation
       • Parallel data description: data paper or link to it; link to results paper
       • Linked and published quality assurance: curation, editing process; peer review; any kind of QA process
       • Additional visibility: push to ORCID, author pages, impact/reputation building tools; enable index (Data citation index, crawled by Google)
       • Basic published product
       • Add-ons: workflows for more documentation, QA, visibility
  15. 15. Trusted data publishing contains:
       • Standardized information about the data
         – Disciplinary standards
         – Basic common metadata sets
       • Distinct Roles, Workflows and Responsibilities
         – Authorship, Submission
         – Curation
         – Quality Assurance
         – Peer review
       • Persistent Identification
         – Permanent reference
         – Data citation
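The three groups on slide 15 can be read as a checklist that a published record either satisfies or not. The sketch below illustrates that reading only; the field names in `REQUIRED_FIELDS` and the `missing_elements` helper are hypothetical and do not correspond to any metadata schema named in the presentation.

```python
from typing import Dict, List

# Hypothetical field names, loosely mapped to the groups on the slide.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "standardized information": ["disciplinary_metadata", "common_metadata"],
    "roles and workflows": ["authors", "curator", "qa_process"],
    "persistent identification": ["pid"],
}


def missing_elements(record: Dict[str, object]) -> Dict[str, List[str]]:
    """Return, per group, the fields that are absent or empty in the record."""
    gaps: Dict[str, List[str]] = {}
    for group, names in REQUIRED_FIELDS.items():
        absent = [name for name in names if not record.get(name)]
        if absent:
            gaps[group] = absent
    return gaps
```

An empty result means every group in this sketch is covered; a non-empty result names the groups and fields that still need attention before the record could be called trusted in the slide's sense.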
  16. 16. Challenges
       • Interoperability challenges
         – Different metadata schemas
         – Rich vs. limited metadata
       • Discoverability challenges
         – E.g. no bi-directional linking
         – Usability challenges in aggregators
       • Metrics and accreditation
       • What information is needed for future reuse/remix/reproducibility
       • How can this information be exposed in a human- and machine-readable way
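The missing bi-directional linking mentioned on slide 16 is easy to state concretely: an article points to a dataset, but the dataset record does not point back. A minimal sketch of such a check follows, assuming links are available as a plain mapping from one identifier to the identifiers it references; the `one_way_links` function and the identifiers in the usage note are made up for illustration.

```python
from typing import Dict, List, Set, Tuple


def one_way_links(links: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return sorted (source, target) pairs where target does not link back."""
    missing: List[Tuple[str, str]] = []
    for source, targets in links.items():
        for target in targets:
            # An object with no entry in the mapping links to nothing.
            if source not in links.get(target, set()):
                missing.append((source, target))
    return sorted(missing)
```

For example, with `{"article:1": {"dataset:1"}, "dataset:1": set()}` the function reports the pair `("article:1", "dataset:1")`, because the dataset record carries no link back to the article.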
  17. 17. Thank you!
  18. 18. Data Publishing Workflows: Activities and processes in a digital environment that lead to the publication of research data and other research objects on the Web. These activities may be performed by humans or in an automated fashion. In contrast to the interim or final published products, workflows are the means to curate, document, peer review and thus ensure and enhance the value of the published product.