Webinar presentation by Cyndy Parr and Erin Antognoli hosted by Hunger Solutions Institute (HSI) and Presidents United to Solve Hunger (PUSH) at Auburn University on April 25, 2019.
Cyndy Parr, biologist and technologist at the National Agricultural Library
1. Open Data and the Ag Data Commons
Presented by
Cyndy Parr & Erin Antognoli
April 25, 2019
2. Agenda
Open data
● Definition and basics
Ag Data Commons
● USDA research data catalog
● Open agricultural data
National Agricultural Library services
● Data dictionaries
● Data management plans
4. Open data policy history
2013 - Obama administration’s open data policy memo
Directed all federal agencies to publish their information as machine-readable data, using searchable, open formats
Required every agency to maintain a centralized Enterprise Data Inventory that lists all datasets
Mandated a centralized inventory for the whole government: the platform currently known as data.gov
2019 - OPEN Government Data Act becomes law
https://project-open-data.cio.gov/policy-memo/
https://www.congress.gov/bill/115th-congress/house-bill/4174/text
5. Public access policy history
2013 - “Holdren memo” issued by Office of Science and Technology Policy
2014 - USDA Implementation Plan approved
2016 - USDA Public Access Policy for Scholarly Publications approved
● CHORUS will provide access to many published articles
● Submission of accepted manuscripts to PubAg (pubag.nal.usda.gov) is imminent
2019 - Anticipate approval of USDA Public Access Policy for Digital Scientific Data
https://go.usa.gov/xmB9a https://go.usa.gov/xmB92
6. Open data is...
“...data that can be freely used, re-used and redistributed by anyone - subject
only, at most, to the requirement to attribute and sharealike.”
~ Open Data Handbook
Why is a clear definition of open data important?
Interoperability - different datasets should be able to work together
● Availability and access
● Re-use and redistribution
● Universal participation
http://opendatahandbook.org/guide/en/what-is-open-data/
7. Availability and Access
“The data must be available as a whole and at no more than a reasonable
reproduction cost, preferably by downloading over the internet. The data
must also be available in a convenient and modifiable form.”
http://opendatahandbook.org/guide/en/what-is-open-data/
8. Re-use and Redistribution
“The data must be provided under terms that permit re-use and redistribution including the intermixing with other datasets.”
http://opendatahandbook.org/guide/en/what-is-open-data/
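The "intermixing" requirement is easiest to see with two tables that share a key. The sketch below is a minimal illustration using Python's standard `csv` module; the column names (`site_id`, `yield_kg_ha`, `rainfall_mm`) and values are hypothetical, not drawn from any Ag Data Commons dataset.

```python
import csv
import io

# Hypothetical example data: two openly licensed tables sharing a "site_id" key.
YIELDS_CSV = """site_id,crop,yield_kg_ha
S1,maize,9500
S2,soybean,3100
S3,wheat,4200
"""

WEATHER_CSV = """site_id,rainfall_mm
S1,610
S2,540
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def inner_join(left, right, key):
    """Keep each left row that has a matching right row on `key`, merging their columns.

    Assumes `key` values are unique in `right`.
    """
    index = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = index.get(row[key])
        if match is not None:
            joined.append({**row, **match})
    return joined


combined = inner_join(read_rows(YIELDS_CSV), read_rows(WEATHER_CSV), "site_id")
for row in combined:
    print(row)
# S3 is dropped because the weather table has no row for it.
```

The join itself is trivial; what the definition protects is the right to combine the tables and redistribute the result.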
9. Universal Participation
“Everyone must be able to use, re-use and redistribute - there should be no
discrimination against fields of endeavour or against persons or groups. For
example, ‘non-commercial’ restrictions that would prevent ‘commercial’ use,
or restrictions of use for certain purposes (e.g. only in education), are not
allowed.”
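One practical consequence is that a dataset's license can be screened for the kinds of restriction the definition rules out. The sketch below assumes licenses are recorded as SPDX identifiers (an assumption; a catalog may store a URL or free text instead) and flags Creative Commons "NC" (non-commercial) and "ND" (no-derivatives) variants, which do not meet the open definition. The allow-list is illustrative, not exhaustive.

```python
# Illustrative allow-list of SPDX identifiers for licenses generally regarded as
# conformant with the Open Definition; extend it for your own catalog.
KNOWN_OPEN = {
    "CC0-1.0",
    "CC-BY-4.0",
    "CC-BY-SA-4.0",
    "PDDL-1.0",
    "ODC-By-1.0",
    "ODbL-1.0",
}


def check_license(spdx_id):
    """Return (is_open, reasons) for a license given as an SPDX identifier."""
    parts = spdx_id.strip().upper().split("-")
    reasons = []
    if "NC" in parts:
        reasons.append("non-commercial use restricted")
    if "ND" in parts:
        reasons.append("derivative works restricted")
    if not reasons and spdx_id.strip() not in KNOWN_OPEN:
        reasons.append("not on the known-open list; review manually")
    return (not reasons, reasons)


for license_id in ("CC-BY-4.0", "CC-BY-NC-4.0", "CC-BY-NC-ND-4.0"):
    print(license_id, check_license(license_id))
```

Strictly, "ND" is a re-use restriction rather than a participation one, but either kind disqualifies a license under the definition quoted above.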
10. FAIR principles reinforce open data
Findable
● Rich metadata
● Persistent identifiers
Accessible
● Fixity
● Data & metadata available to target audience
Interoperable
● Open formats
● Common metadata standards
● Controlled vocabularies
Reusable
● Usage license
● Provenance
● Community standards
FAIR Principles
https://www.force11.org/group/fairgroup/fairprinciples
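Fixity, listed under Accessible, means being able to show that a file has not changed since it was deposited. A common way to do that is to record a cryptographic checksum at deposit time and recompute it later. The sketch below uses SHA-256 from Python's `hashlib`; the choice of algorithm and the helper names are assumptions, not a description of how the Ag Data Commons stores checksums.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Compute a file's SHA-256 hex digest, reading it in chunks (1 MiB by default)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, recorded_hex):
    """True if the file's current digest matches the checksum recorded earlier."""
    return sha256_of(path) == recorded_hex.lower()


# Usage: record a checksum at deposit time, then re-check it later.
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(b"site_id,yield_kg_ha\nS1,9500\n")
    path = tmp.name

recorded = sha256_of(path)
print(verify_fixity(path, recorded))  # True
with open(path, "ab") as handle:
    handle.write(b"S2,3100\n")  # simulate an unexpected change to the file
print(verify_fixity(path, recorded))  # False
os.remove(path)
```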
12. The Ag Data Commons is...
● A catalog and data repository for open
agricultural research data
● The catalog for all USDA-funded research data
● Satisfies the federal open data requirements
● Satisfies the USDA public access requirements
https://data.nal.usda.gov/
13. Ag Data Commons collection policies
Ag-related data
● Many high-level categories, e.g. Agroecosystems & Environment, Agricultural Economics, Bioenergy, and Agricultural Products
USDA Funding
● USDA-funded data or data from USDA
researchers working on collaborative projects
DOI
● Assigned for locally
held resources
Version policy
https://data.nal.usda.gov/
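Because the Ag Data Commons assigns DOIs to locally held resources, a DOI is the stable identifier to cite and link to, and DOIs resolve through `https://doi.org/`. The sketch below normalizes a few common ways DOIs are written and builds the resolver URL. The pattern is one Crossref has published for matching most modern DOIs, so it may reject some unusual but valid ones; `10.5555/12345678` is used only as an example, not an Ag Data Commons dataset.

```python
import re

# Pattern Crossref has published for matching most modern DOIs (not every valid DOI).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)

RESOLVER_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(value):
    """Strip whitespace and a leading 'doi:' or resolver URL, leaving the bare DOI."""
    value = value.strip()
    for prefix in RESOLVER_PREFIXES:
        if value.lower().startswith(prefix):
            return value[len(prefix):]
    return value


def is_doi(value):
    """True if the value (after normalization) looks like a DOI."""
    return DOI_PATTERN.match(normalize_doi(value)) is not None


def doi_url(value):
    """Return the https://doi.org/ resolver URL for a DOI, or raise ValueError."""
    doi = normalize_doi(value)
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {value!r}")
    return "https://doi.org/" + doi


print(doi_url("doi:10.5555/12345678"))  # https://doi.org/10.5555/12345678
```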
14. Ag Data Commons features
Groups by project or affiliation
● Programs can request a tag to keep
all their data entries grouped
together
● One-level-deep data hierarchies supported (parent / child)
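To make "one level deep" concrete: an entry may have a parent, but that parent may not itself have a parent. The check below is a hypothetical sketch that represents entries as a mapping from entry ID to parent ID (`None` for top-level entries); it is not the Ag Data Commons data model, and the IDs are invented.

```python
def hierarchy_problems(parents):
    """Report entries that break a one-level (parent/child only) hierarchy.

    `parents` maps each entry ID to its parent's ID, or None for top-level entries.
    """
    problems = []
    for entry, parent in parents.items():
        if parent is None:
            continue  # top-level entry
        if parent not in parents:
            problems.append(f"{entry}: parent {parent} not found")
        elif parents[parent] is not None:
            problems.append(f"{entry}: parent {parent} is itself a child")
    return problems


# Hypothetical entries.
ok = {"project": None, "site-a": "project", "site-b": "project"}
too_deep = {"project": None, "site-a": "project", "plot-1": "site-a"}

print(hierarchy_problems(ok))        # []
print(hierarchy_problems(too_deep))  # ['plot-1: parent site-a is itself a child']
```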
ORCID integration
● Authors can link to their ORCID profiles to avoid name ambiguity
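An ORCID iD has 16 characters written as four hyphenated groups, and its last character is a check digit computed with the ISO 7064 MOD 11-2 algorithm (so it may be `X`). Validating the check digit before linking catches most typos. This is a minimal sketch; `0000-0002-1825-0097` is the example iD used in ORCID's documentation, not a real researcher.

```python
ORCID_URL_PREFIX = "https://orcid.org/"


def orcid_check_character(base_digits):
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(value):
    """Check length, digits, and the final check character of an ORCID iD."""
    compact = value.strip()
    if compact.startswith(ORCID_URL_PREFIX):
        compact = compact[len(ORCID_URL_PREFIX):]
    compact = compact.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_character(compact[:15]) == compact[15].upper()


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False (wrong check digit)
```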
Citations
● Specify a citation for your own
data
● Link to scholarly publications
or data papers / PubAg
● Link to other related data
content
https://data.nal.usda.gov/
15. Submission limitations
Data should have ties to USDA
● Funder, collaborator, or employer
File size: 20 GB maximum per file
● Larger size data storage pilot underway!
No executables allowed
● Executables can be cataloged with a pointer to
the software/code, but not deposited directly
https://data.nal.usda.gov/
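The two hard limits above can be checked locally before uploading. The sketch below flags files over 20 GB and files that look like executables. Reading 20 GB as 20,000,000,000 bytes, the extension list, and the POSIX execute-bit test are all assumptions made for illustration, not the Ag Data Commons' own rules, so confirm edge cases with the curators.

```python
import os
import stat
import tempfile

# Assumption: 20 GB read as decimal gigabytes, the more conservative reading.
MAX_BYTES = 20 * 10**9
# Hypothetical, non-exhaustive list of extensions treated as executables.
EXECUTABLE_EXTENSIONS = {".exe", ".msi", ".bat", ".com", ".dll"}


def presubmission_problems(path):
    """Return a list of reasons the file may be rejected (empty if none found)."""
    problems = []
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        problems.append(f"{size} bytes exceeds the per-file limit")
    extension = os.path.splitext(path)[1].lower()
    if extension in EXECUTABLE_EXTENSIONS:
        problems.append(f"extension {extension} looks like an executable")
    if os.name == "posix":
        mode = os.stat(path).st_mode
        if mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
            problems.append("file has an execute permission bit set")
    return problems


# Usage with two throwaway files (hypothetical names).
with tempfile.TemporaryDirectory() as folder:
    paths = [os.path.join(folder, name) for name in ("yields.csv", "installer.exe")]
    for path in paths:
        with open(path, "w") as handle:
            handle.write("placeholder\n")
    report = {os.path.basename(p): presubmission_problems(p) for p in paths}

print(report)
```

A flagged executable does not have to be abandoned: as the slide notes, it can be cataloged with a pointer to the software or code instead of being deposited.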
16. Submit ag-related data
Create an account
● https://data.nal.usda.gov/user/register
Data submission form
● Metadata entry
● Workflow tools
● Clone metadata
● Separate descriptions for each
resource file
Metadata - Project Open Data
● Open standard
● Formatted for ingest into
data.gov
● https://project-open-data.cio.gov/schema/
https://data.nal.usda.gov/
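For orientation, here is roughly what a single dataset record looks like in the Project Open Data metadata schema (v1.1) that data.gov harvests. This is a hand-written sketch, not an export from the Ag Data Commons: every value, including the `bureauCode` and `programCode` placeholders, is hypothetical, and the check covers only the fields the schema lists as required for federal datasets (required-if-applicable fields are not checked).

```python
import json

# Fields Project Open Data v1.1 treats as required for federal datasets
# (required-if-applicable fields such as license and spatial are not checked).
REQUIRED_FIELDS = (
    "title", "description", "keyword", "modified", "publisher",
    "contactPoint", "identifier", "accessLevel", "bureauCode", "programCode",
)
ACCESS_LEVELS = {"public", "restricted public", "non-public"}


def record_problems(dataset):
    """List missing required fields and an invalid accessLevel, if any."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if not dataset.get(field)]
    level = dataset.get("accessLevel")
    if level and level not in ACCESS_LEVELS:
        problems.append(f"unknown accessLevel {level!r}")
    return problems


# Hypothetical record; all values are placeholders.
dataset = {
    "@type": "dcat:Dataset",
    "title": "Example field trial yields",
    "description": "Hypothetical record used only to illustrate the schema.",
    "keyword": ["maize", "yield"],
    "modified": "2019-04-25",
    "publisher": {"@type": "org:Organization", "name": "Example Agency"},
    "contactPoint": {
        "@type": "vcard:Contact",
        "fn": "Example Curator",
        "hasEmail": "mailto:curator@example.org",
    },
    "identifier": "https://doi.org/10.5555/12345678",
    "accessLevel": "public",
    "bureauCode": ["000:00"],    # placeholder; real values come from OMB code lists
    "programCode": ["000:000"],  # placeholder
    "license": "https://creativecommons.org/publicdomain/zero/1.0/",
}

print(record_problems(dataset))  # []
print(json.dumps(dataset, indent=2))
```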
18. A data dictionary is...
… a collection of descriptions of the data objects or items in a
dataset or model for the benefit of programmers and others who
need to refer to them.
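In practice a data dictionary is often just a second table with one row per column of the data file. The sketch below shows that shape and a simple completeness check: every column in the data should be described, and every described column should exist. The column layout (`column_name`, `description`, `data_type`, `units`) is an assumption for illustration; NAL's blank template may use different headings.

```python
import csv
import io

# Hypothetical data file and its data dictionary.
DATA_CSV = "site_id,sample_date,yield_kg_ha,moisture_pct\nS1,2018-09-30,9500,15.5\n"

DICTIONARY_CSV = """column_name,description,data_type,units
site_id,Field site identifier,string,
sample_date,Harvest date (ISO 8601),date,
yield_kg_ha,Grain yield,number,kg/ha
"""


def dictionary_gaps(data_text, dictionary_text):
    """Return (undocumented, unused): data columns with no entry, and entries with no column."""
    header = next(csv.reader(io.StringIO(data_text)))
    described = [row["column_name"] for row in csv.DictReader(io.StringIO(dictionary_text))]
    undocumented = [col for col in header if col not in described]
    unused = [name for name in described if name not in header]
    return undocumented, unused


print(dictionary_gaps(DATA_CSV, DICTIONARY_CSV))  # (['moisture_pct'], [])
```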
19. Ag Data Commons supports data dictionaries
Encouraged as part of each catalog entry in the Ag Data Commons
● A special designation for data dictionary resources in the submission form
● CSV format preferred, other machine-readable formats accepted
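Since CSV is the preferred format, a first draft of a dictionary can be generated from the data file itself and then completed by hand. The sketch below infers a rough type for each column and records an example value, leaving descriptions blank. The type rules and output columns are assumptions, and this is not necessarily the generation method described in the Ag Data Commons submission manual.

```python
import csv
import io
from datetime import date


def infer_type(values):
    """Guess a simple type from the non-empty values in one column."""
    present = [v for v in values if v != ""]
    if not present:
        return "unknown"

    def all_parse(parser):
        try:
            for v in present:
                parser(v)
        except ValueError:
            return False
        return True

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "number"
    if all_parse(date.fromisoformat):
        return "date"
    return "string"


def draft_dictionary(data_text):
    """Build a draft data dictionary (as CSV text) with blank descriptions to fill in."""
    reader = csv.DictReader(io.StringIO(data_text))
    rows = list(reader)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["column_name", "description", "data_type", "example_value"])
    for name in reader.fieldnames or []:
        values = [row.get(name) or "" for row in rows]  # short rows yield None
        example = next((v for v in values if v != ""), "")
        writer.writerow([name, "", infer_type(values), example])
    return out.getvalue()


# Hypothetical data file.
DATA_CSV = (
    "site_id,sample_date,yield_kg_ha,moisture_pct\n"
    "S1,2018-09-30,9500,15.5\n"
    "S2,2018-10-02,3100,\n"
)
draft = draft_dictionary(DATA_CSV)
print(draft)
```

The inferred types are only a starting point; descriptions, units, and allowed values still need a person who knows the data.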
20. NAL offers data dictionary resources
Ag Data Commons submission manual
● Under the “About” tab at https://data.nal.usda.gov
● Instructions for automatic and manual generation
● Blank template
Data dictionary webinars
● National Agricultural Library YouTube channel
● Link under the Ag Data Commons “About” tab
Direct questions / advice / help
● NAL-ADC-Curator@ars.usda.gov
22. DMPs are required for USDA funding proposals
USDA funding proposals now require a DMP
NIFA DMPs follow a specific format: 2 pages with 5 sections*
● Expected data types
● Data formats (and standards)
● Data storage and preservation of access
● Data sharing, protection, and public access
● Roles and responsibilities
*Note: Other agencies or institutions may require a different format
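A quick self-check before sending a draft for review is to confirm that each of the five NIFA sections is present. The sketch below does naive, case-insensitive substring matching on section titles based on this slide's list; it assumes the draft uses those titles as headings, and it cannot check the two-page limit, which depends on layout. The sample draft text is hypothetical.

```python
NIFA_SECTIONS = (
    "Expected data types",
    "Data formats",
    "Data storage and preservation of access",
    "Data sharing, protection, and public access",
    "Roles and responsibilities",
)


def missing_sections(dmp_text):
    """Return the NIFA section titles that do not appear (case-insensitively) in the draft."""
    lowered = dmp_text.lower()
    return [title for title in NIFA_SECTIONS if title.lower() not in lowered]


# Hypothetical draft that forgets the last section.
draft = """
1. Expected Data Types: field yield measurements and weather observations.
2. Data Formats and Standards: CSV with a data dictionary.
3. Data Storage and Preservation of Access: deposited in the Ag Data Commons.
4. Data Sharing, Protection, and Public Access: released under CC0.
"""
print(missing_sections(draft))  # ['Roles and responsibilities']
```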
23. NAL assists with DMPs
USDA DMP guide
● https://www.nal.usda.gov/ks/guidelines-data-management-planning
NAL provides DMP draft review
● USDA researchers and collaborators can send their drafts to
NAL-ADC-Curator@ars.usda.gov for review
DMP Webinars
● National Agricultural Library YouTube channel
● Linked under the Ag Data Commons “About” tab
24. Other resources at NAL
Webinars
● Recordings available publicly on the NAL
YouTube channel
● Anyone may join future webinars: email NAL-ADC-Curator@ars.usda.gov to be added to the mailing list
Ag Data Commons site
● Submission manual, policy pages, etc., all
linked under the “About” tab
PubAg
● https://pubag.nal.usda.gov/
Knowledge Services website
● https://www.nal.usda.gov/ks
25. Summary
Open data
● Required for federal research
● Available and accessible for reuse and
redistribution
● FAIR principles - Findable, Accessible,
Interoperable, Reusable
Ag Data Commons
● USDA’s catalog for ag research data
● Agricultural data submissions
Guidelines and assistance at NAL
● Data dictionaries
● Data management plans