Photo credit: Alpha Stock Images CC BY SA 3.0
USDA Enterprise Data Management
USDA Public Access Policy
ARS OSQR Procedure
NIFA RFP, Terms and Conditions
Cooperative agreements and contracts
• Data Management Plan
• Data to be made public in trusted
repository within 30 months unless
private, proprietary, or sensitive
• Datasets to be cataloged at Ag Data
Commons with appropriate identifiers
PLOS ONE Data Availability:
20% Currently in Repositories
U41A: How Safe and Persistent Is Your Research?
AGU Fall Meeting, December 14, 2017
Kerry Kroffe, Director, Editorial Services, PLOS
“Enabling FAIR Data” initiative
• Journal will require all data
supporting the article be in a data
citation and described in the Data
• Editors and reviewers enforce
• Ensure NO data is in the
• Repository selected by author must
• Journal community adopts and
enforces FAIR principles
Citation: Stall, S. (2017), Enabling findable, accessible, interoperable, and
reusable data, Eos, 98, https://doi.org/10.1029/2018EO081907. Published on 15
Over half of top agricultural
journals encourage or require
n = 50
Where USDA researchers published in 2016
(thanks Jon Sears)
Researchers have few
options for open
submission in domain-
n = 235 (thanks Erin Antognoli)
Where ag researchers deposit data in 2016
• Discovery Interface
• Computational Tools
• Data Analytic Tools
Ag Data Commons
Data Producers Data Consumers
FAIR Data Principles
Catalog and repository
Currently all open data,
linked to literature
datasets and databases
11% of records have data
in our repository –
Ag Data Commons https://data.nal.usda.gov/
Harvesting metadata in DKAN
E.g. NCBI Bioprojects
USDA NAL Geodata
USFS Research Data Archive
E.g. Project Open Data,
Harvesting from distributed repositories
• Avoids duplication of submission effort
• More exposure = more impact
• Distributes costs for storage
• Keeps to specialized platforms for communities
• Usually lacks funding information
• Many lack DOIs
• Many lack methodological detail
• Challenging to match up with associated articles
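The gaps listed above (missing funding information, missing DOIs, thin methodological detail) could be flagged automatically at harvest time. The following is a minimal sketch, not the Ag Data Commons harvester: it assumes harvested records arrive as a Project Open Data style catalog (a `dataset` list with `title`, `identifier`, and `description` fields). The `funding` key, the 200-character threshold, and the sample records are hypothetical and chosen only for illustration.

```python
import re

# Loose pattern for a DOI anywhere in the identifier (assumption: good enough for triage).
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")

# Hypothetical threshold: descriptions shorter than this are flagged for curator review.
MIN_DESCRIPTION_CHARS = 200


def audit_records(catalog):
    """Map each dataset title to the list of gaps found in its metadata."""
    gaps = {}
    for record in catalog.get("dataset", []):
        problems = []
        if not DOI_PATTERN.search(str(record.get("identifier", ""))):
            problems.append("no DOI")
        # "funding" is a hypothetical key; the harvested schema may not carry it at all.
        if not record.get("funding"):
            problems.append("no funding information")
        if len(record.get("description", "")) < MIN_DESCRIPTION_CHARS:
            problems.append("thin description")
        if problems:
            gaps[record.get("title", "(untitled)")] = problems
    return gaps


# Made-up records for illustration only.
sample = {
    "dataset": [
        {
            "title": "Example soil survey",
            "identifier": "https://doi.org/10.0000/example-1",
            "description": "x" * 250,
            "funding": ["hypothetical award 123"],
        },
        {
            "title": "Example harvested record",
            "identifier": "local-42",
            "description": "Short.",
        },
    ]
}

for title, problems in audit_records(sample).items():
    print(title, "->", ", ".join(problems))
```

A report like this would only point curators at records that need follow-up with the data owner; it does not fix the metadata itself.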
Making data machine readable, linked
Promoting shared standards
CSV, API, DB, code
Ag Data Commons data.nal.usda.gov
Ag Data Commons
Data Management Plans
NOW REQUIRED BY MOST FUNDERS
NAL provides online resources & will
provide consultation on draft DMPs
click on DATA
How can Ag Data Commons help AgBioData
• Harvesting metadata?
• DOI service for subsets or entire versions of datasets?
• Compliance: linking data to grant and award numbers?
• Linking data to citations (re-use)?
• Collecting consistent documentation and API
• Transformation services?
USDA is in the process of implementing new requirements for public access to federally funded data, and Ag Data Commons is a big part of implementing that.
But even once we get past the gathering stage for all this diverse, scattered data, we want to be able to transform it into knowledge and translate it in ways that society can act on when making decisions.
More journals are requiring the data associated with their published papers to be open.
Top journals with ag content profiled (anything Jon wants to add about that?)
PLOS ONE, Scientific Reports, Frontiers in Plant Science, and Genome Announcements are top ag journals that require open data.
Note: not every journal has a policy regarding open data one way or the other
Whether the repositories are managed federally, by industry, or at universities, data should be managed in a place tailored to community needs
However, there should be a central catalog, and the data owners are best suited to describing their data in that central catalog
To be most useful and understandable we need rich metadata, but given the diversity of kinds of data, the level of detail can’t be as high as the specialized community repositories need.
NAL curators can help make sure the metadata is as good as possible.
The platform should add value, by making available APIs, providing broadly useful tools for working with the data, and extracting the knowledge from the data and connecting it to publications and grant information
Finally, our curators help programmers set up harvests.
Given the wide variety of kinds of data, distributed platforms don’t use consistent standards, so we can’t do a distributed search.
If they are using standards, there are inevitably dialects of standards.
Programmers don’t always understand the metadata, so metadata librarians help and communicate with data owners.
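One way to picture the "dialects of standards" problem is a crosswalk that maps each source's field names onto a common set and lists whatever it cannot map, so a metadata librarian can follow up with the data owner. This is a minimal sketch under stated assumptions: the dialect names, field names, and mappings below are hypothetical and do not describe any real repository's schema or the Ag Data Commons harvest configuration.

```python
# Hypothetical crosswalks: source field name -> common field name.
CROSSWALKS = {
    "dialect_a": {"title": "title", "description": "description", "keyword": "keywords"},
    "dialect_b": {"name": "title", "abstract": "description", "tags": "keywords"},
}


def to_common(record, dialect):
    """Translate one record into the common fields.

    Returns (common_record, unmapped_fields); unmapped fields are the ones
    a curator would need to review by hand.
    """
    mapping = CROSSWALKS[dialect]
    common = {}
    unmapped = []
    for field, value in record.items():
        if field in mapping:
            common[mapping[field]] = value
        else:
            unmapped.append(field)
    return common, sorted(unmapped)


# Made-up record for illustration only.
record_b = {"name": "Example dataset", "abstract": "An example.", "license": "CC0"}
common, needs_review = to_common(record_b, "dialect_b")
print(common)
print("needs curator review:", needs_review)
```

Keeping the mappings as plain data rather than code means a curator, not only a programmer, could read and correct them.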
We have a human-readable page with text descriptions, attached files, and structured metadata.
We also promote a variety of ways to make things machine readable and actionable.
How do we work with big data platforms?
Just a comment that we are working with the SCINet team to coordinate policies and plans for what to do with big data when it is ready for release.