This presentation was provided by Kathryn Funk of The U.S. National Library of Medicine (NIH), and Jeff Beck of The National Center for Biotechnology Information (NCBI), during the NISO Event "Open Access: The Role and Impact of Preprint Servers," held November 14 - 15, 2019.
Funk and Beck "Driving Use: Identifiers and Enhanced Metadata"
1. Driving Use of Preprints:
Identifiers and Enhanced Metadata
Jeffrey Beck – Program Head, Literature
Kathryn Funk – Program Manager, PubMed Central
2. NIH intends to maximize impact of interim
research products that are developed with NIH
funds.
National Institutes of Health, March 24, 2017
https://grants.nih.gov/grants/guide/notice-files/NOT-OD-17-050.html
3. Awardees are encouraged, not required,
to post preprints.
Applicants are not required to cite
preprints as part of their grant
applications.
Preprints do not fall under the NIH
access policy.
A few notes on
NIH’s Position
https://grants.nih.gov/grants/guide/notice-files/NOT-OD-17-050.html
4. Make preprint publicly
available
Acknowledge NIH
funding
Clearly state work was
not peer reviewed
Declare any competing
interests
Select CC-BY or public
domain license
Choose repository that
follows best practices
NIH Expectations
for Researchers
Who Post Preprints
6. NIH Guidelines for
Selecting Preprint Servers
https://grants.nih.gov/grants/guide/notice-files/NOT-OD-17-050.html
Image: Open Science Basics, CC0
https://book.fosteropenscience.eu/en/02OpenScienceBasics/02OpenResearchDataAndMaterials.html
7. PMC: NLM’s open
literature archive
Launched in February 2000
Digital counterpart to NLM’s print collection
Collaboration with subscription and open access journals as well as research funders
All content stored in archival standard XML format (JATS)
Repository for peer-reviewed accepted manuscripts supported by NIH since 2005
8. Europe
PMC
Operated by EBI
Currently includes preprint
citation and abstract
metadata sourced from
Crossref (bioRxiv, ChemRxiv,
PeerJ Preprints, and
F1000Res).
PubMed Central
Operated by NLM
Currently does not include
preprints
Considering pilot to ingest
preprints citing NIH support
PMC
International
Network
13. Preprints,
Servers, &
Publishers
• Preprint server as <journal-title>
• <publisher> metadata should identify organization
responsible for server
Who is providing the hosting and taking
responsibility for addressing concerns re:
plagiarism, competing interests, misconduct?
Who is taking responsibility for ensuring
hallmarks of reputable scholarly publishing are
rigorous and transparent?
16. Preprint Version Control
Recommend extending ICMJE recommendations regarding correcting
published articles to preprints:
Post a new version
with details of the
changes from the
original version
and the date(s) on
which the changes
were made.
Archive all prior
versions of the
article.
Previous electronic
versions should
prominently note
that there are
more recent
versions of the
article.
The citation should
be to the most
recent version.
http://www.icmje.org/recommendations/browse/publishing-and-editorial-issues/corrections-and-version-control.html
17. Persistent Identifiers and Versions
“Talking About Versions” (Crossref Document)
https://docs.google.com/document/d/13L29Euis2uruRb3LTypnKHuPTB86Um8jRqDV5h_CcXg/edit
Preprints are intellectually distinct
from the peer-reviewed paper
accepted for publication in a journal,
and as such should have a unique
Crossref currently recommends a new DOI for each version
of the preprint.
18. Article Stacks
Version 1
Version 2
Version 3
Version 4
Version 1
Version 2
Retraction
Updates and Corrections Retraction
• Each article (stack) has one ID
(DOI)
• Systems need to resolve the
article ID to the latest version
available
• All versions should be available
from every other version
• Each version should include the
change log from the previous
and the pub date of that version
and the original pub date of the
article.
A retraction is the end
of the line for this article!
19. Preprint Stacks
Version 1
Version 2
Version 3
Version 4
Version 1
Version 2
Withdrawal
Notice
Updates and Corrections Withdrawal
• Each preprint (stack) has one
ID (DOI)
• Systems need to resolve the
PUID to the latest version
available
• All versions should be available
from every other version
• Each version should include the
change log from the previous
and the pub date of that version
and the original pub date of the
article.
A withdrawal is the end
of the line for this
preprint!
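The stack model on this slide can be sketched as a small data structure. This is a minimal illustration only, assuming one ID per stack as the slide describes (it does not model the alternative of one DOI per version). The class names, field names, and the example ID are hypothetical and do not come from any NLM or Crossref system.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Version:
    number: int
    pub_date: date                # pub date of this version
    change_log: str = ""          # changes from the previous version
    is_withdrawal: bool = False   # a withdrawal notice ends the stack


@dataclass
class PreprintStack:
    stack_id: str                 # one ID for the whole stack (hypothetical)
    versions: List[Version] = field(default_factory=list)

    @property
    def withdrawn(self) -> bool:
        return bool(self.versions) and self.versions[-1].is_withdrawal

    @property
    def original_pub_date(self) -> Optional[date]:
        return self.versions[0].pub_date if self.versions else None

    def add_version(self, pub_date: date, change_log: str,
                    is_withdrawal: bool = False) -> Version:
        # A withdrawal is the end of the line: no further versions allowed.
        if self.withdrawn:
            raise ValueError("stack is withdrawn; no new versions allowed")
        version = Version(len(self.versions) + 1, pub_date,
                          change_log, is_withdrawal)
        self.versions.append(version)
        return version

    def resolve(self) -> Version:
        """Resolve the stack ID to the latest version available."""
        if not self.versions:
            raise LookupError("no versions posted")
        return self.versions[-1]

    def other_versions(self, number: int) -> List[int]:
        """Version numbers that should be linked from the given version."""
        return [v.number for v in self.versions if v.number != number]


# Example with a made-up identifier.
stack = PreprintStack("10.9999/hypothetical.0001")
stack.add_version(date(2019, 1, 10), "Initial posting")
stack.add_version(date(2019, 3, 2), "Corrected Figure 2")
latest = stack.resolve()
```

Keeping the original pub date on the stack and the change log on each version mirrors the slide's requirement that every version carry both.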
20. Ensuring Integrity and Transparency
Facilitate linking from preprint to version of record /
published version by including preprint PID/DOI in
article metadata. (NIH expectation)
Establish best practices for maintenance of the
scientific record, e.g., withdrawals or retractions to
deal with plagiarism or scientific misconduct. (NIH
expectation)
Build a common vocabulary for preprints, e.g.,
publishing vs. posting, that can enable more
productive discussions around standards.
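As a rough illustration of the first point, a discovery system could use the preprint PID carried in article metadata to link a preprint to its published version. The sketch below assumes article records are plain dictionaries with hypothetical `doi` and `preprint_pid` keys; these names are placeholders, not fields from JATS, Crossref, or PMC, and the DOIs are made up.

```python
from typing import Dict, Iterable, Optional


def index_by_preprint(articles: Iterable[dict]) -> Dict[str, str]:
    """Map each preprint PID to the DOI of the published article citing it."""
    links: Dict[str, str] = {}
    for record in articles:
        preprint_pid = record.get("preprint_pid")  # hypothetical field name
        if preprint_pid:
            # DOIs are case-insensitive, so normalize before indexing.
            links[preprint_pid.lower()] = record["doi"]
    return links


def published_version(preprint_pid: str,
                      links: Dict[str, str]) -> Optional[str]:
    """Return the published DOI for a preprint PID, or None if unlinked."""
    return links.get(preprint_pid.lower())


# Made-up records for illustration only.
articles = [
    {"doi": "10.9999/journal.0042", "preprint_pid": "10.9999/Preprint.0007"},
    {"doi": "10.9999/journal.0043"},  # no preprint declared
]
links = index_by_preprint(articles)
```

Without the preprint PID in the article metadata, the link would have to be inferred by matching titles and authors, which is less reliable.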
21. Thank You!
Special thanks to our colleagues at Europe
PMC who kindly shared their expertise and
experiences with preprints:
• Michele Ide-Smith
• Michael Parkins
• Jo McEntyre
… and NLM colleagues and leadership who
have supported our engagement with the
preprint community:
• Patricia Brennan
• Jerry Sheehan
• Jim Ostell
• Kim Pruitt
• Chris Kelly
This research was supported by the Intramural Research
Program of the NIH, National Library of Medicine.
Editor's Notes
Why are we interested in driving use of preprints?
In 2017, NIH began encouraging investigators to use interim research products, such as preprints, to speed the dissemination and enhance the rigor of their work.
For NIH researchers who opt to post preprints, NIH provides some general guidelines, which we’ve taken into account in our later recommendations where applicable.
With these expectations in mind, the question becomes how can NIH maximize impact and drive use of preprints?
One way NIH has approached this is by laying out clear guidelines for researchers to use in selecting preprint repositories. A number of these guidelines tie in directly to the recommendations we’ll be looking at around metadata and standards for supporting use. They include making content findable, accessible, interoperable, and reusable (FAIR), and encouraging researchers to post preprints to servers that make the content and metadata open and easy to access by machines and people. This access is both a function of permission (e.g., use of Creative Commons licenses) and technology (e.g., application programming interfaces).
Building on the FAIR principles and the NIH guidelines, another way to maximize impact and drive use is to facilitate discovery of preprints and ensure long-term preservation of and access to this content. To that end, NIH has looked to NLM to explore how we can leverage our existing literature databases, such as PubMed Central, for this purpose.
As NLM’s free full-text archive of biomedical and life sciences journal literature, PMC serves as a digital counterpart to NLM’s extensive print journal collection and has been an archive of NIH-funded peer-reviewed literature since 2005.
PMC stores all content in the archival XML standard for journal articles, JATS, and provides machine-readable metadata to PubMed to facilitate discovery of its contents.
PMC currently collects journal versions of record directly from publishers as well as peer-reviewed, accepted author manuscripts that fall under the public access policies of NIH and a number of other public and private research funders.
To help non-US funding organizations build national or regional repositories of funded research articles, NLM has established the PMC International Network, which includes Europe PMC. PMC International sites mirror content held in PMC, but may also supplement their collections with other materials of particular interest to their respective communities that may fall outside the scope of US PMC and the NLM Collection.
As a result of the freedom to supplement collections, Europe PMC began ingesting preprint citation and abstract metadata from Crossref in July 2018.
Conversely, at this time, US PMC has not started incorporating any preprint content, but is considering a pilot to archive the full text of preprints acknowledging NIH support in PMC and make them discoverable in PubMed.
For databases such as PMC and Europe PMC, facilitating the discoverability and use of preprints depends on the application and distribution of consistent metadata, much of which overlaps with the article-level metadata associated with journal articles.
To that end, in preparing for this presentation, we tried to see where we could leverage the existing journal article metadata model as defined by the Journal Article Tag Suite (JATS), and to identify the areas of overlap as well as potential points of divergence between articles and preprints.
There are of course the standard metadata elements for papers, such as a PID, title, and authors, that are key to the citation and discovery of research objects.
Additionally, the standardized capture of funding metadata is a critical element in allowing funders to track the impact of the research they support.
Funders are also invested in ensuring that conflicts of interest are disclosed, to protect the integrity of the research they support.
Most of the rest of the presentation will focus on these areas where the metadata for preprints may diverge from traditional articles in order to ensure the transparency and integrity of the scholarly record.
We found that metadata elements seem to diverge when it comes time to account for the less formal (for lack of a better term) nature of preprints. We needed to determine how to account for capturing server information, whether peer-review status needed to be captured or if there was an acceptable proxy to use, and maintenance of a potentially many-versioned publication record.
We also needed to sort through where the concept of ‘publisher’ fit into this picture.
The NIH guidelines for researchers note that preprint servers should have policies for addressing concerns regarding plagiarism, competing interests, and scientific misconduct, and for ensuring scholarly publishing practices are rigorous and transparent. In reviewing them, we decided the metadata should include the organization taking responsibility for posting and enforcing such policies.
We recommend that the preprint server be identified in <journal-title> and that the <publisher> field name the organization responsible for enforcing the policies and practices of the server.
Another NIH guideline or expectation is that preprints be transparent and clearly indicate preprint and/or ‘not peer reviewed’ status.
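A minimal sketch of what this recommendation might look like as a JATS-style fragment, built with Python's standard library. The nesting (`journal-title` inside `journal-title-group`, `publisher-name` inside `publisher`) reflects our reading of JATS; the fragment omits other elements a complete `journal-meta` would carry, and the function name is hypothetical. bioRxiv and its operator are used purely as an illustrative example.

```python
import xml.etree.ElementTree as ET


def build_journal_meta(server_name: str, responsible_org: str) -> ET.Element:
    """Build a partial <journal-meta> naming the server and its operator."""
    journal_meta = ET.Element("journal-meta")

    # The preprint server is recorded where a journal title would go.
    title_group = ET.SubElement(journal_meta, "journal-title-group")
    ET.SubElement(title_group, "journal-title").text = server_name

    # The publisher is the organization responsible for the server's policies.
    publisher = ET.SubElement(journal_meta, "publisher")
    ET.SubElement(publisher, "publisher-name").text = responsible_org

    return journal_meta


fragment = build_journal_meta("bioRxiv", "Cold Spring Harbor Laboratory")
xml_text = ET.tostring(fragment, encoding="unicode")
print(xml_text)
```

Separating the two values keeps the server name available for citation display while still recording who is accountable for handling concerns such as plagiarism or misconduct.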
We do not consider peer review status to be metadata (consistent with journal articles that are not peer reviewed); though we also do not want to rely solely on platform or server as a proxy for identifying preprints. In some cases, such as the publishing model used by F1000 Research, a paper could have a preprint status or be a peer reviewed article within the same publication at different points of time.
So for machine-readability and rendering of preprint status, there needs to be some sort of indicator that users and discovery systems can cue off of.
A preprint indicator should not be confused with @article-type, which should continue to reflect the type of paper being posted, i.e., original research, review, editorial, etc.
Perhaps the most significant challenge in capturing metadata for preprints is version control.