Ag Data Commons for AgBioData

  1. Ag Data Commons Cynthia Parr USDA ARS National Agricultural Library A platform to harness the power of Digital Agriculture
  2. Agricultural Data (Gather) Agricultural Knowledge (Transform) Agricultural Decision-making and Action (Translate) Why Ag Data Commons? Federal directives: Public access to open, machine-readable data
  3. Photo credit: Alpha Stock Images CC BY SA 3.0 USDA Enterprise Data Management USDA Public Access Policy ARS OSQR Procedure NIFA RFP, Terms and Conditions Cooperative agreements and contracts • Data Management Plan • Data to be made public in trusted repository within 30 months unless private, proprietary, or sensitive • Datasets to be cataloged at Ag Data Commons with appropriate identifiers
  4. PLOS ONE Data Availability: 20% Currently in Repositories U41A: How Safe and Persistent Is Your Research? AGU Fall Meeting, December 14, 2017 Kerry Kroffe, Director, Editorial Services, PLOS “Enabling FAIR Data” initiative • Journal will require all data supporting the article be in a data citation and described in the Data Availability Statement • Editors and reviewers enforce policy • Ensure NO data is in the supplement • Repository selected by author must be FAIR-compliant • Journal community adopts and enforces FAIR principles Citation: Stall, S. (2017), Enabling findable, accessible, interoperable, and reusable data, Eos, 98, Published on 15 September 2017.
  5. Over half of top agricultural journals encourage or require open data (n = 50; where USDA researchers published in 2016; thanks Jon Sears). Chart values: 22%, 34%, 2%, 2%, 40%; legend: Required, Encouraged. Researchers have few options for open submission in domain-specific databases (n = 235; where ag researchers deposit data in 2016; thanks Erin Antognoli). Chart values: 17%, 78%, 5%; legend: Yes, No, Undetermined.
  6. The Concept • Discovery Interface • Catalog • APIs • Computational Tools • Data Analytic Tools Ag Data Commons Knowledge Base Data Producers Data Consumers •Publications •Patents •Grant Info. Federal Repository (I) University Repository (K) Industry Repository (N) Experiment Devices Farm Equipment UAVs, Sensors
  7. FAIR Data Principles Catalog and repository ecosystem Self-submission & harvesting Currently all open data, linked to literature Currently USDA-funded datasets and databases 11% of records have data in our repository – issuing DOIs Ag Data Commons
  8. Public interactive monthly platform statistics: Registered Users, Catalogued Datasets, Downloads, Citations
  9. Organizing datasets Photo credit: Anjuli_ayer CC-BY-NC-SA
  10. Ag Data Commons Topics NAL Thesaurus Terms
  11. ARS National Programs
  12. ARS National Program 301
  13. AgBioData program
  14. AgBioData program
  15. Harvesting metadata Photo: CC BY Tony Walmsley
  16. Harvesting metadata in DKAN. Sources, e.g. NCBI Bioprojects, USDA NAL Geodata, USFS Research Data Archive. Protocols, e.g. Project Open Data, CSW, OAI-PMH
  17. Harvesting from distributed repositories • Avoids duplication of submission effort • More exposure = more impact • Distributes costs for storage • Keeps to specialized platforms for communities • Usually lacks funding information • Many lack DOIs • Many lack methodological detail • Challenging to match up with associated articles
  18. Making data machine readable, linked Promoting shared standards JSON, RDF Data dictionary CSV, API, DB, code Ag Data Commons frictionlessdata.ioscience
  19. NAL Resources Ag Data Commons Data Management Plans NOW REQUIRED BY MOST FUNDERS NAL provides online resources & will provide consultation on draft DMPs click on DATA
  20. DISCUSSION How can Ag Data Commons help AgBioData? • Harvesting metadata? • DOI service for subsets or entire versions of datasets? • Compliance: linking data to grant and award numbers? • Linking data to citations (re-use)? • Discoverability? • Collecting consistent documentation and API information? • Transformation services? • Other?
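
Slides 16 and 17 describe harvesting metadata from distributed repositories and note that harvested records often lack DOIs and funding information. The following is a minimal sketch of that idea, not the Ag Data Commons or DKAN implementation. It assumes a catalog in the Project Open Data data.json layout (a top-level "dataset" list with "title", "identifier", "description", and "keyword"). The sample records, the placeholder DOI, and the "funding" key are hypothetical and invented for illustration.

```python
import json

# Hypothetical catalog in the Project Open Data "data.json" layout.
# The records, the placeholder DOI, and the "funding" key are invented
# for illustration; "funding" is not part of the Project Open Data schema.
SAMPLE_CATALOG = json.loads("""
{
  "dataset": [
    {
      "title": "Example soil moisture observations",
      "identifier": "https://doi.org/10.0000/example-1",
      "description": "Hourly sensor readings from a hypothetical field site.",
      "keyword": ["soil", "sensors"],
      "funding": "Example award 0000-00000"
    },
    {
      "title": "Example genome annotation",
      "identifier": "local-id-42",
      "description": "",
      "keyword": ["genomics"]
    }
  ]
}
""")


def is_doi(identifier):
    """Return True when the identifier looks like a DOI link or doi: string."""
    ident = identifier.lower()
    return ident.startswith("https://doi.org/") or ident.startswith("doi:")


def harvest(catalog):
    """Reduce each catalog entry to a small record and list its metadata gaps."""
    records = []
    for entry in catalog.get("dataset", []):
        identifier = entry.get("identifier", "")
        gaps = []
        if not is_doi(identifier):
            gaps.append("no DOI")
        if not entry.get("funding"):
            gaps.append("no funding information")
        if not entry.get("description"):
            gaps.append("no description")
        records.append({
            "title": entry.get("title", ""),
            "identifier": identifier,
            "keywords": entry.get("keyword", []),
            "gaps": gaps,
        })
    return records


harvested = harvest(SAMPLE_CATALOG)
for record in harvested:
    print(record["title"], "->", record["gaps"] or "complete")
```

A gap report like this is one way a curator could decide which harvested records need follow-up with the data owner before they are cataloged.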

Editor's Notes

  1. USDA is in the process of implementing new requirements for public access to federally funded data, and Ag Data Commons is a big part of that effort. But even once we get past the gathering stage for all this diverse, scattered data, we want to be able to transform it into knowledge and translate it in ways that society can act on in decision-making.
  2. More journals are requiring the data associated with their published papers to be open. Top journals with ag content were profiled (anything Jon wants to add about that?). PLOS ONE, Scientific Reports, Frontiers in Plant Science, and Genome Announcements are top ag journals that require open data. Note: not every journal has a policy regarding open data one way or the other.
  3. Whether the repositories are managed federally, by industry, or at universities, data should be managed in a place tailored to community needs. However, there should be a central catalog, and the data owners are best suited to describing their data in that central catalog. To be most useful and understandable, we need rich metadata, but given the diversity of kinds of data, it can't be as detailed as the specialized community repositories need. NAL curators can help make sure the metadata is as good as possible. The platform should add value by making APIs available, providing broadly useful tools for working with the data, and extracting the knowledge from the data and connecting it to publications and grant information.
  4. Finally, our curators help programmers set up harvests. Given the wide variety of kinds of data, distributed platforms don't use consistent standards, so we can't do a distributed search. Even if they are using standards, there are inevitably dialects of those standards. Where programmers don't understand the metadata, metadata librarians help and communicate with data owners.
  5. We have a human-readable page with some text descriptions, attached files, and structured metadata. We also promote a variety of ways to make things machine readable and actionable.
  6. How do we work with big data platforms? Just a comment that we are working with the SCINet team to coordinate policies and plans for what to do with big data when it is ready for release.
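
Slide 18 and note 5 mention data dictionaries as one way to make data machine readable. As a rough illustration, a data dictionary can be written as a list of named, typed fields (similar in spirit to a Frictionless Data table schema) and used to check a CSV file. This sketch uses only the Python standard library, not the frictionless package, and the field names, types, and sample rows are hypothetical.

```python
import csv
import io

# Hypothetical data dictionary: field names and types are invented
# for illustration and follow a simple {"fields": [...]} layout.
DATA_DICTIONARY = {
    "fields": [
        {"name": "plot_id", "type": "string"},
        {"name": "yield_kg_ha", "type": "number"},
        {"name": "harvest_year", "type": "integer"},
    ]
}

# Hypothetical CSV content; the second data row has a non-numeric yield.
SAMPLE_CSV = """plot_id,yield_kg_ha,harvest_year
A1,5120.5,2016
A2,not recorded,2016
"""

# Map each declared type to a Python callable that raises on bad input.
CASTS = {"string": str, "number": float, "integer": int}


def check_csv(text, dictionary):
    """Return a list of problems found when checking CSV text against the dictionary."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    expected = [field["name"] for field in dictionary["fields"]]
    if reader.fieldnames != expected:
        problems.append(f"header {reader.fieldnames} does not match {expected}")
        return problems
    # The header is line 1, so data rows start at line 2.
    for row_number, row in enumerate(reader, start=2):
        for field in dictionary["fields"]:
            value = row[field["name"]]
            try:
                CASTS[field["type"]](value)
            except (ValueError, TypeError):
                problems.append(
                    f"line {row_number}: {field['name']}={value!r} is not {field['type']}"
                )
    return problems


problems = check_csv(SAMPLE_CSV, DATA_DICTIONARY)
for problem in problems:
    print(problem)
```

Keeping the dictionary as plain data means the same description can be published alongside the CSV as JSON and read by both people and programs.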