Parr ag datacommonsnal_brownbag

This document discusses the Ag Data Commons, a proposed solution for aggregating and providing access to open agricultural research data. It would support public access mandates by hosting USDA and other agricultural data. The Ag Data Commons would provide both human and machine access to metadata and data. It would integrate existing databases and repositories and add value by standardizing metadata, assigning DOIs, and linking to related data and literature. The document considers options for the technical platform, focusing on standards for metadata, controlled vocabularies, and trusted data repository requirements.

Cynthia Parr
US Department of Agriculture
National Agricultural Library
21 October 2015
Ag Data Commons
Adding value to
open agricultural research data
Credit: Phenocam USDA-ARS Hawbecker Farm, PA

Federal directives: Public access to
open, machine-readable data

The challenge of agricultural data
• Broad subject areas
• Journals not integrated with repositories like Dryad
• Too many existing databases & web distribution points
• Lack of infrastructure for long-tail data
• Lack of a neutral, sustainable solution for long-term multi-institutional projects

A proposed solution
Ag Data Commons Prototyping FY 2015
• Supports Public Access mandates
• Holds agricultural research data
• Primary audience: researchers
• Holds metadata for data held elsewhere
• Starting with USDA data but will broaden
• Both human and machine access
• Can include unpublished data that is ready for release

[Architecture diagram: data flows from INGESTION to DISSEMINATION. Components shown: Search & Knowledge Discovery; Thesaurus & Indexing; Ag Data Commons Repository; Organization & Curation; Grant management systems; PubAg; Dataset Submission; Analytics & Tools; Data.gov; Ag Data Commons Catalog; Distributed repositories; Forest Service; Geospatial. Legend: Building, Adapting, Existing.]

Adding value
• Metadata + data package
• DOI
• Links
• Thesaurus tags
• Idiosyncratic data dictionary
• Search, services, compliance checking

DKAN http://nucivic.com/dkan/
PRO
• Open source community
• Drupal modules for basic CMS functions
• Integrated CKAN catalog
• Feeds Data.gov
• Basic metadata already supported
CON
• Not designed for scientific data or scientists
• No links to literature
• No Digital Object Identifiers
• Doesn’t handle dataset relationships
• Metadata inadequate for compliance checking & re-use
• Lacks preservation

Metadata Standards
Core Metadata Schema
POD 1.1 (Project Open Data)
https://project-open-data.cio.gov/
Related Scientific Metadata & Data Standards (e.g.)
ISO 19115 (GIS Data, FGDC)
https://www.iso.org
EML (Ecological Metadata Language)
https://knb.ecoinformatics.org/#tools/eml
MiXS GSC (Genomic Standards Consortium)
http://gensc.org/projects/mixs-gsc-project/
Darwin Core (Biodiversity standards)
http://rs.tdwg.org/dwc/

Controlled Vocabularies
• NALT – National Agricultural Library Thesaurus
http://agclass.nal.usda.gov
• GACS – Global Agricultural Concept Scheme
• Biological Taxonomy
• Gene Ontology (GO)
http://geneontology.org/
• Environments Ontology EnvO, etc.
Relevant for Agriculture
• Help create a semantic web
• SKOS (Simple Knowledge Organization System): W3C recommendation, or RDF
Credit: AIMS--FAO
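To make the SKOS bullet above concrete, here is a minimal sketch of how one thesaurus concept could be expressed as RDF triples and written out in N-Triples syntax. It uses only the Python standard library. The concept URIs under example.org and both function names are hypothetical placeholders, not actual NALT or GACS identifiers or any tool used by the project.

```python
# Standard W3C namespaces for SKOS and for rdf:type.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def concept_triples(uri, pref_label, broader=None, lang="en"):
    """Return (subject, predicate, object) triples for one SKOS concept.

    A literal object is represented as a (text, language) tuple;
    any other object is treated as a URI.
    """
    triples = [
        (uri, RDF_TYPE, SKOS + "Concept"),
        (uri, SKOS + "prefLabel", (pref_label, lang)),
    ]
    if broader:
        triples.append((uri, SKOS + "broader", broader))
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples, one statement per line."""
    lines = []
    for subject, predicate, obj in triples:
        if isinstance(obj, tuple):
            text, lang = obj
            escaped = text.replace("\\", "\\\\").replace('"', '\\"')
            rendered = '"%s"@%s' % (escaped, lang)
        else:
            rendered = "<%s>" % obj
        lines.append("<%s> <%s> %s ." % (subject, predicate, rendered))
    return "\n".join(lines)


# Hypothetical concept URIs, for illustration only.
maize = concept_triples(
    "http://example.org/concept/maize",
    "maize",
    broader="http://example.org/concept/cereals",
)
print(to_ntriples(maize))
```

Tagging a dataset with a concept URI rather than a free-text keyword is what would let a catalog follow the "broader" link and surface related records.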

https://data.nal.usda.gov/
Launched
this week

Adding even more value
• Structured methods metadata
• Shared data dictionary
• Semantic data dictionary

Adding even more value
• Assist application launch
• Find related data
• Integrate/link related data
= help build the knowledge graph

ISO 16363
Trusted repository requirements
with Adam Kriesberg
and Ricky Punzalan
University of Maryland

Acknowledgements
Cynthia.Parr@ars.usda.gov
Susan McCarthy, NAL – KSD
Ursula Pieper, NAL – ISD
Qing Qu, NAL – KSD contractor
Jeff Campbell, NAL – KSD
Jaylen Nathwani, NAL – student intern
NüCivic, Angry Cactus Team
Jocelyn McNamara, NAL – KSD contractor
Kerry Huller – UMD graduate fellow
Erin Antognoli – UMD graduate fellow
Adam Kriesberg – UMD postdoctoral fellow


Editor's Notes

Title: Ag Data Commons: adding value to open agricultural research data

Public access to the results of federally funded research is a new mandate for large departments of the United States government. Public access to scholarly literature from U.S. investments is straightforward, with policies and systems like PubMed Central and PubAg (http://pubag.nal.usda.gov) already implemented. However, research data release is a more complex undertaking. Agricultural researchers make their data available in a patchwork of locations, if they share it at all, and metadata and data formats are far from standardized. Many data types overlap with basic science domains that have standards (e.g. biodiversity, genomics, hydrology), but those standards have little in common with each other and are not tailored for agriculture.

The U.S. Department of Agriculture's prototype system, the Ag Data Commons (http://data.nal.usda.gov), will meet the requirements of public access but should also go further to facilitate novel, data-intensive science. Aimed at researchers, Ag Data Commons uses DKAN, a Drupal-based catalog and repository (http://nucivic.com/dkan/), to enhance discoverability of and access to well-curated resources (data files, databases, software) deposited in the system or held elsewhere. Core metadata fields are from Project Open Data v1.1 (a requirement of the U.S. open data catalog at http://data.gov), but we added fields and features to support scholarly research. We issue DataCite Digital Object Identifiers (DOIs), accept author ORCIDs (http://orcid.org/), apply National Agricultural Library Thesaurus terms, and encourage citation of literature and linkage with related datasets and other online resources.

While extremely detailed metadata are impractical given the breadth of agricultural domains, we can extract fields from sophisticated ISO 19115 geographic information metadata, and extended metadata files can be posted and will be indexed. We are piloting the harvest of distributed metadata records.
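As a rough illustration of the "core fields plus scholarly extensions" idea described above, the sketch below builds one dataset record as a Python dictionary and checks it for the fields that Project Open Data v1.1 always requires. All values are placeholders, the DOI and ORCID are dummy identifiers, and the `authorOrcid` key and `missing_required` helper are hypothetical names chosen for this example. They are not the actual Ag Data Commons field names or code.

```python
import json

# Fields that the Project Open Data v1.1 schema marks as always required.
# Assumption: the federal-only fields bureauCode and programCode are left
# out to keep the sketch short.
POD_REQUIRED = (
    "title", "description", "keyword", "modified",
    "publisher", "contactPoint", "identifier", "accessLevel",
)


def missing_required(record):
    """Return the required POD fields that are absent or empty in record."""
    return [field for field in POD_REQUIRED if not record.get(field)]


# Hypothetical dataset record; every value below is a placeholder.
record = {
    "title": "Example phenocam image series",
    "description": "Placeholder description of an example dataset.",
    "keyword": ["phenology", "cameras"],  # could hold thesaurus terms
    "modified": "2015-10-21",
    "publisher": {"name": "Example Research Unit"},
    "contactPoint": {
        "fn": "Example Contact",
        "hasEmail": "mailto:contact@example.org",
    },
    "identifier": "https://doi.org/10.5555/example",  # dummy DOI
    "accessLevel": "public",
    # Hypothetical extension field for scholarly use, not part of POD 1.1.
    "authorOrcid": ["https://orcid.org/0000-0000-0000-0000"],
}

print(missing_required(record))  # prints []
serialized = json.dumps(record, indent=2)  # JSON form for a machine feed
```

Keeping the extension fields alongside the core ones means the same record can feed a Data.gov-style harvest while still carrying the identifiers that researchers need.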
Towards data integration and standardization, we are developing guidelines for machine-readable data dictionaries: manifests of the data elements in a dataset, not unlike Darwin Core Archives. We are exploring ways to enable basic interactive visualizations. Metadata are available in JSON (http://json.org/) and RDF (http://www.w3.org/RDF/), with dedicated feeds for publication links and (eventually) compliance checking.

Many challenges remain before we can move from prototype to production. Among them are how to provide easy API (application programming interface) access to elements in data files, how to interface with related systems (e.g. Dryad, DataONE, EcoInforma, iPlant), how to leverage methods metadata and semantics, how to better support provenance and impact tracking, and how to ease the pain of both working with and preserving big data for high-performance computing.
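One way to picture the machine-readable data dictionary mentioned above is as a list of column definitions that a program can compare against a data file. The sketch below is an assumption about what such a check might look like, not the guideline under development. The column names, the dictionary layout, and the `check_against_dictionary` function are all hypothetical.

```python
import csv
import io

# Hypothetical data dictionary: one entry per column in the data file.
DATA_DICTIONARY = [
    {"name": "site_id", "type": str, "units": None,
     "description": "Code for the field site"},
    {"name": "date", "type": str, "units": "ISO 8601",
     "description": "Observation date"},
    {"name": "yield_kg_ha", "type": float, "units": "kg/ha",
     "description": "Harvested yield"},
]


def check_against_dictionary(csv_text, dictionary):
    """Return a list of problems found when comparing CSV text to a dictionary."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    declared = [entry["name"] for entry in dictionary]
    header = reader.fieldnames or []

    # Columns the dictionary promises but the file lacks, and the reverse.
    for name in declared:
        if name not in header:
            problems.append("missing column: %s" % name)
    for name in header:
        if name not in declared:
            problems.append("undocumented column: %s" % name)

    # Values that cannot be parsed as the declared type.
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for entry in dictionary:
            value = row.get(entry["name"])
            if value in (None, ""):
                continue
            try:
                entry["type"](value)
            except ValueError:
                problems.append("row %d: %s=%r is not %s" % (
                    row_number, entry["name"], value, entry["type"].__name__))
    return problems


sample = "site_id,yield_kg_ha,notes\nA1,high,ok\n"
for problem in check_against_dictionary(sample, DATA_DICTIONARY):
    print(problem)
```

A check of this kind is also a plausible building block for the compliance checking mentioned above, since it turns a documentation file into something a submission form can enforce.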
This plan is in a learning and pilot phase now. Policies are being developed and should be available in the next fiscal year. New projects in 2016-2017 will be expected to be in full compliance with those policies, which means data management plans up front that result in publicly released scientific data. So we have a little time to work out the details and influence the policies. We can have conversations now on best practices that may guide the policy makers.
Dark blue: develop as part of Ag Data Commons. Light blue: enhance existing systems. Gray: already exist.
Drupal Knowledge Archive Network
Phase II prototype: launching next week!
• Data submission for outside personnel
• Automate DOI submission
• Support for compliance checking
• Embargo support
• Support for methods & software metadata
