Varsha Khodiyar discussed data publishing and institutional repositories. Data papers allow data producers to receive credit and make data reuse easier by describing what was done to generate the data, how it was processed, where it is held, who did what and when, and the technical analyses supporting its quality. Unlike traditional articles, data papers do not test new scientific hypotheses. Scientific Data publishes each data paper as both a human-readable article and machine-accessible metadata. The journal recommends data repositories and allows authors to use their own institution's repository when submitting, by selecting 'DataCite DOI' as the repository name. Repositories are evaluated on recognition within their scientific community, long-term preservation plans, implementation of community reporting standards, stable identifiers for published datasets, and open access to data without unnecessary restrictions such as limits on commercial use.
1. Varsha Khodiyar, PhD
Data Curation Editor, Scientific Data
Nature Publishing Group
@varsha_khodiyar
@scientificdata
Twitter friendly talk
Data Publishing and Institutional Repositories
FORCE16: Libraries United in Opening New Scholarly Platforms
18th April 2016
4. Why data papers? - Data reuse is easier
“The Data Descriptor made it easier to use the data, for me it was critical that everything was there…all the technical details like voxel size.”
Professor Daniele Marinazzo
5. Comparison of data paper to traditional article
Traditional article sections absent from a data paper: Results, Discussion, Analysis & Conclusions
A data paper describes:
What was done to generate the data?
How was the data processed?
Where is the data?
Who did what and when?
Methods and technical analyses supporting the quality of the measurements.
Data papers do not contain tests of new scientific hypotheses.
6. Data papers at Scientific Data
Human-readable representation of study, i.e. article (HTML & PDF)
Human-readable component, i.e. article text (HTML & PDF)
Machine-accessible component, i.e. metadata (ISA format)
7. Scientific Data’s Repository List
Almost 80 recommended data repositories listed
1. Is there a public data-specific repository for your data?
2. If no public data-specific repository exists for your data, does your funder or institution mandate deposition to a particular repository?
www.nature.com/sdata/data-policies/repositories
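The two questions on this slide amount to a simple ordered decision. A minimal Python sketch, assuming each answer is already known; the function name is hypothetical, and the fallback string reflects the editor's note that guidance is provided when required, not a documented third step on the slide:

```python
from typing import Optional


def choose_repository(data_specific_repo: Optional[str],
                      mandated_repo: Optional[str]) -> str:
    """Hypothetical sketch of the repository-selection questions on slide 7.

    data_specific_repo: a suitable public, data-specific repository, or None
        if no such repository exists for the data.
    mandated_repo: a repository mandated by the funder or institution, or None.
    """
    # Question 1: prefer a public data-specific repository when one exists.
    if data_specific_repo:
        return data_specific_repo
    # Question 2: otherwise follow any funder or institutional mandate.
    if mandated_repo:
        return mandated_repo
    # Assumed fallback: the slide lists only two questions; the editor's notes
    # say the journal provides guidance on where to store data when required.
    return "ask Scientific Data for guidance"
```

For example, with no data-specific repository but an institutional mandate, `choose_repository(None, "ScholarsArchive@OSU")` returns `"ScholarsArchive@OSU"`.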
8. Authors can use their own institutional repository (IR) when submitting to Scientific Data
For institutional repositories (such as ScholarsArchive@OSU), select ‘DataCite DOI’ as Repository Name during submission
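Because the institutional-repository route relies on a DataCite DOI, it can help to check that a dataset identifier is at least DOI-shaped and turn it into a resolvable link. A minimal sketch, assuming a simplified pattern (the `10.` directory indicator, a numeric registrant code, a slash, a suffix) rather than the full DOI syntax; `doi_to_url` and the example DOI are hypothetical:

```python
import re

# Simplified DOI shape: "10." + 4-9 digit registrant code + "/" + suffix.
# Real DOI syntax is broader (e.g. dotted registrant codes); this is a sketch.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Return a resolvable https://doi.org/ link for a DOI string."""
    doi = doi.strip()
    # Accept DOIs that are already written as resolver links.
    if doi.lower().startswith(RESOLVER):
        doi = doi[len(RESOLVER):]
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    return RESOLVER + doi
```

For example, `doi_to_url("10.1234/example-dataset")` returns `"https://doi.org/10.1234/example-dataset"`.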
11. What do data journals require from a repository?
1. Recognized within their scientific community
2. Long-term data preservation plan
3. Implementation of community reporting standards
4. Stable identifiers for published datasets
5. Allow open access to data without unnecessary restrictions, e.g. no commercial use restrictions
Questionnaire for new repositories requesting listing:
http://www.nature.com/sdata/data-policies#repo-suggest
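The five requirements can be read as a yes/no checklist. A minimal sketch, assuming each criterion has already been judged true or false; the field names are hypothetical paraphrases of the slide, and actual listing decisions come from the questionnaire and editorial review, not from code like this:

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class RepositoryProfile:
    """Hypothetical yes/no answers for the five requirements on slide 11."""
    recognized_by_community: bool
    long_term_preservation_plan: bool
    implements_reporting_standards: bool
    stable_dataset_identifiers: bool
    open_access_without_restrictions: bool


def unmet_requirements(profile: RepositoryProfile) -> List[str]:
    """Return the names of requirements the repository does not meet,
    in the order they appear on the slide."""
    return [f.name for f in fields(profile) if not getattr(profile, f.name)]
```

For example, a repository that meets every criterion except open access, `unmet_requirements(RepositoryProfile(True, True, True, True, False))`, returns `['open_access_without_restrictions']`.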
12. Visit nature.com/sdata
Email scientificdata@nature.com
Tweet @ScientificData
Honorary Academic Editor
Susanna-Assunta Sansone
Managing Editor
Andrew L. Hufton
Data Curation Editor
Varsha K. Khodiyar
Advisory Panel and Editorial
Board including senior researchers,
funders, librarians and curators
Supported by
Editor's Notes
Scientific Data is an open-access, peer-reviewed publication for descriptions of scientifically valuable datasets. Our primary article-type, the Data Descriptor, is designed to make your data more discoverable, interpretable and reusable.
Biology example: Sequence data from this Data Descriptor was used to help develop and test this groundbreaking assembly algorithm published in Nature. Data publication does NOT in any way preclude collaboration and co-authorship; our observations so far suggest very much the opposite.
Daniele knew about the dataset before Chris’ paper was published, as Chris had shared it on Torrent Exchange. However, he did not access the data from there. He saw on Twitter that the Scientific Data paper had been published and then read the paper. Daniele said “I would never have collected this data myself, as it’s not my primary field of work”.
He said the Data Descriptor made it easier to use the data: “for me it was critical that everything was there [in the Data Descriptor]…all the technical details like voxel size.”
Data Descriptors are methodologically driven while traditional research articles are hypothesis driven.
Nature-titled journals have agreed that prior publication of a Data Descriptor will not compromise the novelty of new manuscript submissions as long as those manuscripts go substantially beyond a descriptive analysis of the data, and report important new scientific findings appropriate for the journal.
See full policy online: http://www.nature.com/sdata/for-authors/editorial-and-publishing-policies/#prior-pub
We currently list almost 80 repositories across the biological, medical, physical and social sciences.
When required, we provide guidance to authors on the best place to store their data.