An update on public access activities at the National Agricultural Library and next steps, presented 11 January 2017 at the Earth Science Information Partners (ESIP) meeting in Bethesda, Maryland.
Public access to research results at USDA
1. Cynthia Parr @cydparr
US Department of Agriculture
National Agricultural Library
11 January 2017
Public access to research results
at USDA
Credit: Phenocam Swan Lake Research Farm, MN
2. The Story
We are
• committed to public access
• PubAg is well along
• Ag Data Commons in the process
of making USDA-funded data
accessible
3. The Story part 2
But public access isn’t good enough
Therefore
we are
1. enhancing our platform
2. establishing sound curation policies
and processes
3. promoting machine-readability and
data stories
4. seeking a sustainable business
model
5. PubAg https://pubag.nal.usda.gov
• Launched 2014
• Almost 50K full-text peer-reviewed articles
• More than 1 million citations for other papers
• Will collaborate with US Forest Service Treesearch
• Will expand beyond Agricultural Research Service full text
• Will cooperate with the CHORUS publisher consortium
• Will soon launch a redesign
6.
7. Transform agriculture to deliver a 20%
increase in quality production with 20%
lower environmental impact by 2025
-- USDA Agricultural Research Service
Public access is not enough
8. Goals for USDA digital scientific data
Who                            What
Researchers and funders        Compliance with public access
Agencies                       Compliance with open data
Researchers (data submitters)  Safe, citable place for data
Researchers (data users)       Find and use awesome data
10. DKAN http://nucivic.com/dkan/
PRO
• Open source community
• Drupal modules for basic
CMS functions
• Can feed Data.gov
• Basic metadata already
supported
CON
• Not designed for scientific
data or scientists
• No links to literature
• No Digital Object
Identifiers
• Doesn’t handle dataset
relationships
• Metadata inadequate for
compliance checking &
re-use
11. Use all this for some data intensive research
Ag Data Commons Pilot FY 2016
• Self-submission accounts (almost 100 now)
• More than 240 datasets (104 harvested)
• Distributed curation
• Links to PubAg, tagged with NAL thesaurus terms
• DataCite Digital Object Identifiers, ORCIDs, FundRef
• Methods metadata, data dictionaries for re-use
• Designed to feed Data.gov
12. 2. Sound curation policies and processes
• Who can submit?
• What do we accept?
• When do we assign DOIs?
• What embargo periods are okay?
• How much review of metadata
and data do we do?
• Who reviews metadata and data?
• How should data be organized?
• When do we offer a group a
“collection”?
• Must we host all the data?
• What can we automate?
• How do we make things more
machine-readable?
• When should datasets be versioned?
• How do we handle preservation?
• How much and what kind of data
storage do we need?
• How do we avoid licensing and
“ownership” confusion?
13. Research products
Include in the Ag Data Commons (or provide links)
• Raw data files and/or Processed data files
• Data dictionary or Readme
Do not submit with the data
• Manuscript
• Figures/tables from manuscript
14. Research products
Include as resources (resource can be URL pointer)
• Web database
• Software
• Source code/Scripts/Workflows
• User manuals
Do not submit with the data
• Presentations associated with the study
• News articles or press releases
• Related or cited data
19. To sum up
USDA is committed to public access
But public access isn’t good enough
Therefore
20. Acknowledgements
Cynthia.Parr@ars.usda.gov
Susan McCarthy, Ursula Pieper, Erin Antognoli, Jon
Sears, Qing Qu, Jeff Campbell, Jocelyn
McNamara, Melissa Lohrey, Don Gourley,
GovDelivery, Angry Cactus team
The PubAg team, especially Melanie Gardner
UMD: Kerry Huller, Adam Kriesberg, Meghna
Sarin, Candice Ho
Other students: Jaylen Nathwani
Editor's Notes
In the Moving Beyond Mandates: Progress Towards Public Access and What the Future Holds session, presentations should focus on providing an overview of progress we've made in our respective organizations in the area of public/open access to data/information and what we see in terms of next steps and/or a vision for the future of open/public data.
https://phenocam.sr.unh.edu/webcam/sites/arsmnswanlake1/
Is anybody familiar with this book? “Houston we have a narrative” by Randy Olson.
ARS grand challenge
How does data help you do this?
It will help to provide people access to publications and data BUT
It doesn’t speed up the process of helping scientists discover raw data that could be re-used in large-scale analyses, meta-analyses, or models,
it doesn’t help assess data quality or fitness for use
it doesn’t speed up integration with environmental data
Help researchers and funders demonstrate compliance with public access directives
Ensure that federal scientific data is in data.gov in compliance with open data directives
Provide researchers with a safe, citable place for their data
Help researchers find and use awesome data for future research
How are we making USDA-funded data open, discoverable, safe
To serve ARS researchers: a place to park their data safely, or point to it where it lives in some other trusted repository
Covers things like environmental measurements from the Long Term Agroecosystem Research initiative, genomics of livestock pests, and datasets related to modeling soil erosion, etc.
Drupal Knowledge Archive Network
To date
Plan to become a trusted repository and feed data to Data.gov, the US government’s primary inventory of open government data
We have a human readable page with some text descriptions, attached files, structured metadata
To
Systems are not yet mature, populated, linked, or sustainable. They need to be all of these things in order to support the revolutionary advances in food security science research that are needed
Planning to scale ADC up to hold huge numbers of well documented machine-readable datasets – but so far that’s a lot of human effort and we don't yet have the automated process we need.
For example, there are dozens if not hundreds of publicly available PDFs that have the results of performance trials of different varieties of corn and soybeans – but almost all of that old data is not machine readable. It may or may not be linked yet to vouchers in the National Plant Germplasm repository.
We are increasing our use of identifiers both in ADC and USFSC so that people could do this sort of automated work.
We also need to start by itemizing the DTMS ontology terms in the data.gov datasets, and get those captured in the Ag Data Commons
DTMS is a research effort tied to a synthesis center that may not have a lot of longevity – want to capture the results of their semantic work
But furthermore, we need to make sure that those ontology terms link to the existing agricultural thesauri and ontologies so that we can better link the data
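One small step toward the automation mentioned above could be a completeness check on dataset metadata before a curator looks at it. The sketch below is illustrative only: the field names, the required-field list, and the placeholder values are assumptions made for this example, not the Ag Data Commons schema or its actual review process.

```python
import json

# Hypothetical list of fields a curator might require; an assumption for
# illustration, not the real Ag Data Commons metadata schema.
REQUIRED_FIELDS = ("title", "identifier", "creator_orcid", "keywords", "data_dictionary")


def missing_fields(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Placeholder values only; the DOI and ORCID below are not real identifiers.
record = {
    "title": "Example variety trial results",
    "identifier": "doi:10.0000/example",
    "creator_orcid": "0000-0000-0000-0000",
    "keywords": ["soybeans", "variety trials"],
    "data_dictionary": "",  # left empty on purpose so the check flags it
}

# Report what a submitter would still need to supply.
print(missing_fields(record))

# Serialize the record so other systems can read it.
print(json.dumps(record, indent=2, sort_keys=True))
```

A check like this only confirms that fields are present. It says nothing about data quality or fitness for use, which would still need human review.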