This document discusses the Ag Data Commons, a proposed solution for aggregating and providing access to open agricultural research data. It would support public access mandates by hosting USDA and other agricultural data. The Ag Data Commons would provide both human and machine access to metadata and data. It would integrate existing databases and repositories and add value by standardizing metadata, assigning DOIs, and linking to related data and literature. The document considers options for the technical platform, focusing on standards for metadata, controlled vocabularies, and trusted data repository requirements.
Ag Data Commons: Adding Value to open agricultural research data | Cyndy Parr
A talk presented on 30 September 2013 at the Biodiversity Information Standards (Taxonomic Databases Working Group TDWG) annual meeting in Nairobi, Kenya
RDAP 16 Lightning: An Open Science Framework for Solving Institutional Challe... | ASIS&T
Research Data Access and Preservation Summit, 2016
Atlanta, GA
May 4-7, 2016
Lightning Rounds (Thursday, May 5)
Presenter:
Matthew Spitzer, Center for Open Science
Creating impact with accessible data in agriculture and nutrition: sharing da... | godanSec
Richard Finkers (Wageningen UR) presented at the 2nd International Workshop: Creating Impact with Open Data in Agriculture and Nutrition in The Hague, 11 September 2015.
RDAP 16 Lightning: Quantifying Needs for a University Research Repository Sys... | ASIS&T
Research Data Access and Preservation Summit, 2016
Atlanta, GA
May 4-7, 2016
Lightning Rounds (Thursday, May 5)
Presenter:
Ana Van Gulick, Carnegie Mellon University
Building data networks: exploring trust and interoperability between authoris... | Repository Fringe
Building data networks: exploring trust and interoperability between authors, repositories and journals. Varsha Khodiyar, Scientific Data; Neil Chue Hong, Journal of Open Research Software; Rachael Kotarski, DataCite; Peter McQuilton, BioSharing; Reza Salek, Metabolights. At Repository Fringe 2015.
Keynote presentation at 2020 NIH/NLM workshop on generalist repositories. Central themes include software as a richer pathway to data than articles, the development of new metrics for software (such as the CHAOSS framework), working with the technology companies through organizations like the Eclipse Foundation, and the importance of linked data. In particular, the concept of the "value line" as a means to map generalist repositories represents an important opportunity.
What role can publishers play in the open data ecosystem? | Varsha Khodiyar
Presentation at session 3 of the NIH workshop 'Role of Generalist Repositories to Enhance Data Discoverability and Reuse' on Feb 11th, at the NIH Main Campus.
Linking Scientific Metadata (presented at DC2010) | Jian Qin
Linked entity data in metadata records builds a foundation for semantic web. Even though metadata records contain rich entity data, there is no linking between associated entities such as persons, datasets, projects, publications, or organizations. We conducted a small experiment using the dataset collection from the Hubbard Brook Ecosystem Study (HBES), in which we converted the entities and their relationships into RDF triples and linked the URIs contained in RDF triples to the corresponding entities in the Ecological Metadata Language (EML) records. Through the transformation program written in XML Stylesheet Language (XSL), we turned a plain EML record display into an interlinked semantic web of ecological datasets. The experiment suggests a methodological feasibility in incorporating linked entity data into metadata records. The paper also argues for the need of changing the scientific as well as general metadata paradigm.
FAIR Data Management and FAIR Data Sharing | Merce Crosas
Presentation at the Critical Perspectives on the Practice of Digital Archaeology symposium: http://archaeology.harvard.edu/critical-perspectives-practice-digital-archaeology
A Linked Data Approach to Interoperability between Biomedical Resource Invento... | Trish Whetzel
Overview of Resource Representation Coordination efforts to coordinate the representation of resources from Biositemaps, eagle-i, and the Neuroscience Information Framework.
RDAP 15: “This is just for me”: Researchers on their data documentation pract... | ASIS&T
Research Data Access and Preservation Summit, 2015
Minneapolis, MN
April 22-23, 2015
Part of "Beyond metadata: Supporting non-standardized documentation to facilitate data reuse"
Sara Mannheimer, Data Management Librarian, Montana State University
Increased access to the data generated is fuelling increased consumption and accelerating the cycle of discovery. But the successful integration and re-use of heterogeneous data from multiple providers and scientific domains is a major challenge within academia and industry, often due to incomplete description of the study details or metadata about the study. Using the BioSharing, ISA Commons and the STATistics Ontology (STATO) projects as exemplar community efforts, in this breakout session we will discuss the evolving portfolio of community-based standards and methods for structuring and curating datasets, from experimental descriptions to the results of analysis.
http://www.methodsinecologyandevolution.org/view/0/events.html#Data_workshop
Preparing for data-intensive science across domains | Cyndy Parr
Presented at American Institute for Biological Sciences council meeting 8 December 2015. I focus on anecdotes from multiple domains on the kinds of skills and trajectories that empower scientists at multiple levels to become engaged in data-intensive science as data wranglers or tool-builders, even if they don't have lots of funding from NSF or NIH.
Big Data R&D Strategy - Ensure the long term sustainability, access, and deve... | Sky Bristol
Presentation on one of the strategic themes being considered for a U.S. Government Big Data R&D strategy - https://www.nitrd.gov/bigdata/rfi/02102014.aspx.
Talk given at the Data Visualisation and the Future of Academic Publishing event. https://www.eventbrite.com/e/data-visualisation-and-the-future-of-academic-publishing-tickets-25372801733?password=dataviz
An update on the latest BioSharing work; including work with ELIXIR and NIH BD2K, also our survey to assess user needs (530 replies) and the work on the recommender tool
re3data.org – Registry of Research Data Repositories | Heinz Pampel
Heinz Pampel | GFZ German Research Centre for Geosciences, LIS
Maxi Kindling | Humboldt-Universität zu Berlin, Berlin School of Library and Information Science
Frank Scholze | Karlsruhe Institute of Technology, KIT Library
RDA-Deutschland-Treffen 2015 | Potsdam, November 26, 2015
Researchers require infrastructures that ensure a maximum of accessibility, stability and reliability to facilitate working with and sharing of research data. Such infrastructures are being increasingly summarised under the term Research Data Repositories (RDR). The project re3data.org – Registry of Research Data Repositories – began to index research data repositories in 2012 and offers researchers, funding organisations, libraries and publishers an overview of the heterogeneous research data repository landscape. In December 2014 re3data.org listed more than 1,030 research data repositories, which are described in detail using the re3data.org schema (http://dx.doi.org/10.2312/re3.003). Information icons help researchers to easily identify an adequate repository for the storage and reuse of their data. This talk describes the heterogeneous RDR landscape and presents a typology of institutional, disciplinary, multidisciplinary and project-specific RDR. Further, it outlines the features of re3data.org and shows current developments for integration into data management planning tools and other services.
By the end of 2015 re3data.org and Databib (Purdue University, USA) will merge their services, which will then be managed under the auspices of DataCite. The aim of this merger is to reduce duplication of effort and to serve the research community better with a single, sustainable registry of research data repositories. The talk will present this organisational development as a best practice example for the development of international research information services.
Public access to research results at USDA | Cyndy Parr
An update on public access activities at the National Agricultural Library and next steps, presented 11 January 2017 at the Earth Science Information Partners (ESIP) meeting in Bethesda, Maryland.
Presentation about the agINFRA Germplasm Working Group (http://wiki.aginfra.eu/index.php/Germplasm_Working_Group). Presented during Session 1 of the 1st International e-Conference on Germplasm Data Interoperability (https://sites.google.com/site/germplasminteroperability/)
Presentation delivered in the context of the Agricultural Data Interoperability WG meeting, during the RDA 3rd Plenary Meeting in Dublin, Ireland. 26/3/2014.
The presentation is mostly focused on the work done by the agINFRA project towards proposing a methodology for the definition of Germplasm descriptors as RDF, based on the existing work of experts in the field and making use of the existing effort in this direction.
Being FAIR: FAIR data and model management SSBSS 2017 Summer School | Carole Goble
Lecture 1:
Being FAIR: FAIR data and model management
In recent years we have seen a change in expectations for the management of all the outcomes of research – that is the “assets” of data, models, codes, SOPs, workflows. The “FAIR” (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship [1] have proved to be an effective rallying-cry. Funding agencies expect data (and increasingly software) management retention and access plans. Journals are raising their expectations of the availability of data and codes for pre- and post- publication. The multi-component, multi-disciplinary nature of Systems and Synthetic Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
Our FAIRDOM project (http://www.fair-dom.org) supports Systems Biology research projects with their research data, methods and model management, with an emphasis on standards smuggled in by stealth and sensitivity to asset sharing and credit anxiety. The FAIRDOM Platform has been installed by over 30 labs or projects. Our public, centrally hosted Asset Commons, the FAIRDOMHub.org, supports the outcomes of 50+ projects.
Now established as a grassroots association, FAIRDOM has over 8 years of experience of practical asset sharing and data infrastructure at the researcher coal-face ranging across European programmes (SysMO and ERASysAPP ERANets), national initiatives (Germany's de.NBI and Systems Medicine of the Liver; Norway's Digital Life) and European Research Infrastructures (ISBE) as well as in PI's labs and Centres such as the SynBioChem Centre at Manchester.
In this talk I will explore how FAIRDOM has been designed to support Systems Biology projects and show examples of its configuration and use. I will also explore the technical and social challenges we face.
I will also refer to European efforts to support public archives for the life sciences. ELIXIR (http://www.elixir-europe.org/) is the European Research Infrastructure of 21 national nodes and a hub, funded by national agreements to coordinate and sustain key data repositories and archives for the Life Science community, improve access to them and related tools, support training and create a platform for dataset interoperability. As the Head of the ELIXIR-UK Node and co-lead of the ELIXIR Interoperability Platform I will show how this work relates to your projects.
[1] Wilkinson et al, The FAIR Guiding Principles for scientific data management and stewardship Scientific Data 3, doi:10.1038/sdata.2016.18
dkNET Office Hours - "Are You Ready for 2023: New NIH Data Management and Sha... | dkNET
For all proposals submitted on/after January 25 2023, NIH requires data sharing from all NIH-funded studies. Do you have appropriate data management practices and sharing plans in place to meet these requirements? Have questions or need some help? Join the dkNET office hours to learn about NIH’s policy (NOT-OD-21-013) and available resources that could help.
In our upcoming session on March 3, 2023, we are pleased to invite Dr. Jeffrey Grethe, dkNET co-PI and expert on Data Management and Sharing, Dr. Rebecca Rodriguez, Repository Program Director at NIDDK, Ms. Reaya Reuss, Chief of Staff to the Deputy Director at NIDDK, and the support team members from the NIDDK Central Repository. They will be available to answer any questions you may have.
*Previous Office Hours Slides and Recording: https://dknet.org/about/blog/2535
Upcoming Webinars Schedule: https://dknet.org/about/webinar
Data sharing promotes many goals of the NIH research endeavor. It is particularly important for unique data that cannot be readily replicated. Data sharing allows scientists to expedite the translation of research results into knowledge, products, and procedures to improve human health. Do you know what a data sharing plan should include? Are you aware of common practices and standards for data sharing? Do you know what services are available to help share your data responsibly? This workshop will begin to address these questions. Q&A will follow the presentation. Anyone interested in or planning to apply for NIH funding should attend. Note: The NIH data-sharing policy applies to applicants seeking $500,000 or more in direct costs in any year of the proposed research.
RDAP 15: Beyond Metadata: Leveraging the “README” to support disciplinary Doc... | ASIS&T
Research Data Access and Preservation Summit, 2015
Minneapolis, MN
April 22-23, 2015
Part of “Beyond metadata: Supporting non-standardized documentation to facilitate data reuse”
Webinar presentation by Cyndy Parr and Erin Antognoli hosted by Hunger Solutions Institute (HSI) and Presidents United to Solve Hunger (PUSH) at Auburn University on April 25, 2019.
Biodiversity informatics and the agricultural data landscape | Cyndy Parr
Introductory talk of a symposium on Agrobiodiversity informatics at the 2016 annual meeting of the Biodiversity Information Standards. Begins with an overview of the symposium and its speakers, and then launches into my talk.
iEvoBio Keynote: Frontiers of discovery with Encyclopedia of Life -- TRAITBANK | Cyndy Parr
Talk presented at iEvoBio 2014 conference in Raleigh, North Carolina. Though there's a similar title and overlap with the talk I posted last week, there is new material here especially geared towards an informatics crowd savvy in the tools and technology.
Frontiers of discovery with Encyclopedia of Life | Cyndy Parr
Presented at the National Museum of Natural History, Smithsonian Institution 18 June 2014
Describes, among other things, development of the TraitBank repository of species attributes, and the use of EOL and TraitBank in scientific research.
Practical interoperability across semantic stores of data for ecological, tax... | Cyndy Parr
Presented at the Biodiversity Information Standards (Taxonomic Databases Working Group) 2013 meeting in Florence, Italy on 31 October 2013. Essentially, an introduction to aspects of the back end of the new trait repository of Encyclopedia of Life.
Using and extending Darwin Core for structured attribute data | Cyndy Parr
Presented at the Biodiversity Information Standards (Taxonomic Databases Working Group) 2013 meeting in Florence, Italy on 29 October 2013. Essentially, an introduction to the new trait repository of Encyclopedia of Life.
A talk presented January 19, 2013 in the Indo-US Joint Workshop on Biodiversity Informatics at the Ashoka Trust for Research in Ecology and the Environment in Bangalore, India.
A talk presented January 20, 2013 in the Indo-US Joint Workshop on Biodiversity Informatics at the Ashoka Trust for Research in Ecology and the Environment in Bangalore, India.
A talk given at the Semantic Reasoning workshop held at the National Museum of Natural History September 6, 2012. The audience included computer scientists and biological scientists interested in using EOL for their research.
1. Cynthia Parr
US Department of Agriculture
National Agricultural Library
21 October 2015
Ag Data Commons: Adding value to open agricultural research data
Credit: Phenocam USDA-ARS Hawbecker Farm, PA
3. The challenge of agricultural data
• Broad subject areas
• Journals not integrated with repositories like Dryad
• Too many existing databases & web distribution points
• Lack of infrastructure for long-tail data
• Lack of a neutral, sustainable solution for long-term multi-institutional projects
4. A proposed solution: Ag Data Commons (prototyping FY 2015)
• Supports Public Access mandates
• Holds agricultural research data
• Primary audience: researchers
• Holds metadata for data held elsewhere
• Starting with USDA data but will broaden
• Both human and machine access
• Can include unpublished data that is ready for release
5. [Diagram: data flow from INGESTION to DISSEMINATION. Component labels: Grant management systems; Dataset Submission; Distributed repositories; Forest Service; Geospatial; Ag Data Commons Repository; Ag Data Commons Catalog; Organization & Curation; Thesaurus & Indexing; Search & Knowledge Discovery; Analytics & Tools; PubAg; Data.gov. Legend: Building, Adapting, Existing.]
6. Adding value
[Diagram labels: Metadata + data package; DOI; Links; Thesaurus tags; Idiosyncratic data dictionary; Search, services, compliance checking]
7. DKAN http://nucivic.com/dkan/
PRO
• Open source community
• Drupal modules for basic CMS functions
• Integrated CKAN catalog
• Feeds Data.gov
• Basic metadata already supported
CON
• Not designed for scientific data or scientists
• No links to literature
• No Digital Object Identifiers
• Doesn’t handle dataset relationships
• Metadata inadequate for compliance checking & re-use
• Lacks preservation
8. Metadata Standards
Core Metadata Schema
POD 1.1 (Project Open Data)
https://project-open-data.cio.gov/
Related Scientific Metadata & Data Standards (e.g.)
ISO 19115 (GIS Data, FGDC)
https://www.iso.org
EML (Ecological Metadata Language)
https://knb.ecoinformatics.org/#tools/eml
MiXS GSC (Genomic Standards Consortium)
http://gensc.org/projects/mixs-gsc-project/
Darwin Core (Biodiversity standards)
http://rs.tdwg.org/dwc/
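To make the core schema concrete, here is a minimal sketch of a dataset record using Project Open Data v1.1 field names, with a check for the fields that schema always requires. Every value in the record (title, identifier URL, contact) is invented for illustration, and the helper name missing_required is an assumption, not part of any Ag Data Commons or DKAN code.

```python
import json

# Fields the POD v1.1 schema marks as always required for a dataset.
# (US federal agencies must additionally supply bureauCode and programCode.)
REQUIRED_FIELDS = [
    "title", "description", "keyword", "modified",
    "publisher", "contactPoint", "identifier", "accessLevel",
]


def missing_required(record):
    """Return the required POD fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Hypothetical record: all values below are made up for this example.
record = {
    "@type": "dcat:Dataset",
    "title": "Example soil moisture observations",
    "description": "Hourly soil moisture readings from an example field site.",
    "keyword": ["soil moisture", "agriculture"],
    "modified": "2015-10-21",
    "publisher": {"@type": "org:Organization", "name": "Example Research Unit"},
    "contactPoint": {
        "@type": "vcard:Contact",
        "fn": "Example Curator",
        "hasEmail": "mailto:curator@example.org",
    },
    "identifier": "https://example.org/dataset/soil-moisture-001",
    "accessLevel": "public",
}

print(json.dumps(record, indent=2))
print("Missing required fields:", missing_required(record))
```

A record that passes this check is only minimally described; domain standards such as ISO 19115 or EML would carry the richer scientific detail.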
9. Controlled Vocabularies Relevant for Agriculture
• NALT – National Agricultural Library Thesaurus
http://agclass.nal.usda.gov
• GACS – Global Agricultural Concept Scheme
• Biological Taxonomy
• Gene Ontology (GO)
http://geneontology.org/
• Environment Ontology (EnvO), etc.
• Help create a semantic web
• SKOS (Simple Knowledge Organization System): W3C recommendation, or RDF
Credit: AIMS--FAO
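As a rough illustration of how a thesaurus term can be expressed in SKOS, the sketch below builds a few RDF triples for one concept and prints them as N-Triples. The SKOS and RDF namespace URIs are the real W3C ones; the concept URIs under example.org and the labels are hypothetical and do not come from NALT or GACS.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text, lang="en"):
    """Format a language-tagged literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_triples(concept, pref_label, broader=None, alt_labels=()):
    """Return (subject, predicate, object) triples describing one SKOS concept."""
    subject = uri(concept)
    triples = [
        (subject, uri(RDF_TYPE), uri(SKOS + "Concept")),
        (subject, uri(SKOS + "prefLabel"), literal(pref_label)),
    ]
    for label in alt_labels:
        triples.append((subject, uri(SKOS + "altLabel"), literal(label)))
    if broader is not None:
        triples.append((subject, uri(SKOS + "broader"), uri(broader)))
    return triples


def to_ntriples(triples):
    """Serialize triples one per line in N-Triples syntax."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hypothetical concept URIs, for illustration only.
maize = concept_triples(
    "http://example.org/thesaurus/maize",
    "maize",
    broader="http://example.org/thesaurus/cereals",
    alt_labels=["corn"],
)
print(to_ntriples(maize))
```

Tagging a dataset with a concept URI like this, rather than a free-text keyword, is what lets records from different systems be linked on the same term.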
Title
Ag Data Commons: adding value to open agricultural research data
Public access to results of federally-funded research is a new mandate for large departments of the United States government. Public access to scholarly literature from U.S. investments is straightforward, with policies and systems like PubMed Central and PubAg (http://pubag.nal.usda.gov) already implemented. However, research data release is a more complex undertaking. Agricultural researchers make their data available in a patchwork of locations, if they share it at all, and metadata and data formats are far from standardized. Many data types overlap with basic science domains that have standards (e.g. biodiversity, genomics, hydrology), but these standards have little in common with each other and are not tailored for agriculture. The U.S. Department of Agriculture's prototype system, the Ag Data Commons (http://data.nal.usda.gov), will meet the requirements of public access but should also go further to facilitate novel, data-intensive science. Aimed at researchers, Ag Data Commons uses DKAN, a Drupal-based catalog and repository (http://nucivic.com/dkan/), to enhance discoverability and access to well-curated resources (data files, databases, software) deposited in the system or held elsewhere. Core metadata fields are from Project Open Data v.1.1 (a requirement of the U.S. open data catalog at http://data.gov) but we added fields and features to support scholarly research. We issue DataCite Digital Object Identifiers (DOIs), accept author ORCIDs (http://orcid.org/), apply National Agricultural Library Thesaurus terms, and encourage citation of literature and linkage with related datasets and other online resources. While extremely detailed metadata are impractical given the breadth of agricultural domains, we can extract fields from sophisticated ISO 19115 geographic information metadata, and extended metadata files can be posted and will be indexed. We are piloting the harvest of distributed metadata records.
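A system that accepts author ORCIDs might sanity-check them on submission. The sketch below is one way to do that, assuming identifiers arrive in the bare hyphenated form; it uses the ISO 7064 MOD 11-2 check digit that ORCID defines for its identifiers. Whether Ag Data Commons performs such a check is not stated here, and the function names are illustrative.

```python
import re

# Bare ORCID iD form: four groups of four, last character a digit or X.
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Return True if the string is well formed and its check character matches."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


# 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
print(is_valid_orcid("0000-0002-1825-0097"))
```

A passing checksum only shows the identifier is well formed; it does not confirm that the iD is registered or belongs to the named author.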
Towards data integration and standardization, we are developing guidelines for machine-readable data dictionaries: manifests of data elements in datasets, not unlike Darwin Core Archives. We are exploring ways to enable basic interactive visualizations. Metadata are available in JSON (http://json.org/) and RDF (http://www.w3.org/RDF/), with dedicated feeds for publication links and (eventually) compliance checking. Many challenges remain before we can move from prototype to production. Among the challenges are how to provide easy API (application programming interface) access to elements in data files, how to interface with related systems (e.g. Dryad, DataONE, EcoInforma, iPlant), how to leverage methods metadata and semantics, how to better support provenance and impact tracking, and how to ease the pain of both working with and preserving big data for high performance computing.
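To show what a machine-readable data dictionary could enable, here is a small sketch that describes the columns of a dataset and checks a CSV file against that description. The dictionary layout (name, type, units, description), the column names, and the function name are all assumptions made for this example; the actual guidelines were still under development.

```python
import csv
import io

# Hypothetical data dictionary: one entry per column in the dataset.
DATA_DICTIONARY = [
    {"name": "site_id", "type": "string", "units": None,
     "description": "Identifier of the field site"},
    {"name": "yield_kg_ha", "type": "number", "units": "kg/ha",
     "description": "Harvested grain yield"},
]


def check_against_dictionary(csv_text, dictionary):
    """Return a list of problems found when comparing CSV text to a dictionary."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    expected = [entry["name"] for entry in dictionary]

    for name in expected:
        if name not in header:
            problems.append(f"missing column: {name}")
    for name in header:
        if name not in expected:
            problems.append(f"undocumented column: {name}")

    numeric = [e["name"] for e in dictionary
               if e["type"] == "number" and e["name"] in header]
    # Data rows start on line 2 because line 1 is the header.
    for row_number, row in enumerate(reader, start=2):
        for name in numeric:
            try:
                float(row[name])
            except (TypeError, ValueError):
                problems.append(f"row {row_number}: {name} is not numeric")
    return problems


sample = "site_id,yield_kg_ha\nA1,5120.5\nB2,4870\n"
print(check_against_dictionary(sample, DATA_DICTIONARY))
```

The same dictionary could also drive documentation or column-level API access, which is why a structured format is more useful than a free-text README for this purpose.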
This plan is in a learning and pilot phase now. Policies are being developed and should be available in the next fiscal year. New projects in 2016-2017 will be expected to be in full compliance with the policies, which means data management plans up front that result in publicly released scientific data. So we have a little time to work out the details and influence the policies. We can have conversations now on best practices that may guide the policy makers.
Dark blue: develop as part of Ag Data Commons
Light blue: enhance existing systems
Gray: already exist
Drupal Knowledge Archive Network
Phase II prototype: launching next week!
Data submission for outside personnel
Automate DOI submission
Support for compliance checking
Embargo support
Support for methods & software metadata