Cynthia Parr
US Department of Agriculture
National Agricultural Library
21 October 2015
Ag Data Commons
Adding value to open agricultural research data
Credit: Phenocam USDA-ARS Hawbecker Farm, PA
Federal directives: Public access to open, machine-readable data
The challenge of agricultural data
• Broad subject areas
• Journals not integrated with repositories like Dryad
• Too many existing databases & web distribution points
• Lack of infrastructure for long-tail data
• Lack of a neutral, sustainable solution for long-term multi-institutional projects
Ag Data Commons Prototyping FY 2015
A proposed solution
• Supports Public Access mandates
• Holds agricultural research data
• Primary audience: researchers
• Holds metadata for data held elsewhere
• Starting with USDA data but will broaden
• Both human and machine access
• Can include unpublished data that is ready for release
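Holding metadata for data held elsewhere implies harvesting records from other catalogs. A minimal sketch, assuming the remote source publishes a Project Open Data v1.1 `data.json` catalog (a top-level object whose `dataset` array holds the records). The function names, the `_harvest_source` key, and the example records are hypothetical; this is not Ag Data Commons code.

```python
import json

def harvest_catalog(catalog_json: str, source: str) -> list[dict]:
    """Extract public dataset records from a POD-style data.json catalog.

    Each harvested record is tagged with a hypothetical "_harvest_source"
    key so the local catalog can point back to where the data actually live.
    """
    catalog = json.loads(catalog_json)
    harvested = []
    for record in catalog.get("dataset", []):
        # Only advertise records the publisher marked as public.
        if record.get("accessLevel") != "public":
            continue
        entry = dict(record)  # shallow copy; leave the parsed input untouched
        entry["_harvest_source"] = source
        harvested.append(entry)
    return harvested

def merge_by_identifier(existing: list[dict], incoming: list[dict]) -> list[dict]:
    """Merge harvested records; a newer harvest replaces an older record
    that has the same POD "identifier"."""
    merged = {rec["identifier"]: rec for rec in existing}
    for rec in incoming:
        merged[rec["identifier"]] = rec
    return list(merged.values())

# Hypothetical remote catalog with one public and one non-public record.
example = json.dumps({
    "conformsTo": "https://project-open-data.cio.gov/v1.1/schema",
    "dataset": [
        {"identifier": "example-001", "title": "Soil moisture plots", "accessLevel": "public"},
        {"identifier": "example-002", "title": "Internal QA notes", "accessLevel": "non-public"},
    ],
})
records = harvest_catalog(example, source="https://example.org/data.json")
print([r["identifier"] for r in records])  # ['example-001']
```

Keying the merge on `identifier` keeps re-harvests idempotent; a production harvester would also need to handle deletions and conflicting edits.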
Figure: Ag Data Commons system architecture, from INGESTION to DISSEMINATION. Components: grant management systems, dataset submission, distributed repositories, Forest Service, Geospatial, thesaurus & indexing, organization & curation, the Ag Data Commons repository, the Ag Data Commons catalog, search & knowledge discovery, analytics & tools, PubAg, and Data.gov. Legend: building, adapting, existing.
Adding value
Figure: a metadata + data package enriched with a DOI, links, thesaurus tags, and an idiosyncratic data dictionary, supporting search, services, and compliance checking.
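One way to picture the value-added layers on this slide is as fields attached to a dataset package. A minimal sketch with a hypothetical `DataPackage` class; the DOI check is syntax only, adapted from the regular expression Crossref has published for matching modern DOIs, and it does not confirm that a DOI is registered with DataCite. The DOI and title in the example are made up.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Syntax-only check adapted from Crossref's suggested pattern for modern DOIs;
# it does not confirm the DOI is actually registered.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)

@dataclass
class DataPackage:
    """Hypothetical model of a metadata + data package and its value-added layers."""
    title: str
    files: list[str]
    doi: Optional[str] = None
    links: list[str] = field(default_factory=list)           # related papers, datasets
    thesaurus_tags: list[str] = field(default_factory=list)  # e.g. thesaurus terms
    data_dictionary: dict[str, str] = field(default_factory=dict)  # column -> meaning

    def assign_doi(self, doi: str) -> None:
        if not DOI_PATTERN.match(doi):
            raise ValueError(f"not a syntactically valid DOI: {doi!r}")
        self.doi = doi

    def missing_value_adds(self) -> list[str]:
        """List which value-added layers are still empty."""
        checks = {
            "doi": self.doi is not None,
            "links": bool(self.links),
            "thesaurus_tags": bool(self.thesaurus_tags),
            "data_dictionary": bool(self.data_dictionary),
        }
        return [name for name, present in checks.items() if not present]

pkg = DataPackage(title="Example phenocam dataset", files=["images.zip"])
pkg.assign_doi("10.1234/example.5678")  # hypothetical DOI
pkg.thesaurus_tags.append("phenology")
print(pkg.missing_value_adds())  # ['links', 'data_dictionary']
```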
DKAN http://nucivic.com/dkan/
PRO
• Open source community
• Drupal modules for basic CMS functions
• Integrated CKAN catalog
• Feeds Data.gov
• Basic metadata already supported
CON
• Not designed for scientific data or scientists
• No links to literature
• No Digital Object Identifiers
• Doesn’t handle dataset relationships
• Metadata inadequate for compliance checking & re-use
• Lacks preservation
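Because DKAN integrates a CKAN catalog, machine clients can in principle query it through CKAN's Action API, for example `package_search`, whose JSON response has the shape `{"success": ..., "result": {"count": ..., "results": [...]}}`. A minimal sketch that builds such a request URL and reads a response. The host is a placeholder, the sample response is invented, no network call is made, and whether a particular DKAN site enables every CKAN action is an assumption to verify against that site.

```python
import json
from urllib.parse import urlencode

def package_search_url(base_url: str, query: str, rows: int = 10) -> str:
    """Build a CKAN Action API package_search URL (q and rows are
    standard package_search parameters)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"

def titles_from_response(body: str) -> list[str]:
    """Pull dataset titles out of a package_search JSON response."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise RuntimeError("CKAN action reported failure")
    return [pkg.get("title", "") for pkg in payload["result"]["results"]]

# Placeholder host; a real client would fetch this URL, e.g. with urllib.request.
url = package_search_url("https://catalog.example.org/", "soil moisture", rows=5)
print(url)
# https://catalog.example.org/api/3/action/package_search?q=soil+moisture&rows=5

# Invented response body in the package_search shape.
sample = '{"success": true, "result": {"count": 1, "results": [{"title": "Soil moisture plots"}]}}'
print(titles_from_response(sample))  # ['Soil moisture plots']
```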
Metadata Standards
Core Metadata Schema
POD 1.1 (Project Open Data)
https://project-open-data.cio.gov/
Related Scientific Metadata & Data Standards (e.g.)
ISO 19115 (GIS Data, FGDC)
https://www.iso.org
EML (Ecological Metadata Language)
https://knb.ecoinformatics.org/#tools/eml
MiXS GSC (Genomic Standards Consortium)
http://gensc.org/projects/mixs-gsc-project/
Darwin Core (Biodiversity standards)
http://rs.tdwg.org/dwc/
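"Compliance checking" against the POD 1.1 core schema can start with required fields. A minimal sketch: the field list is the set POD v1.1 marks as required for federal Dataset records (required-if-applicable fields such as `license` or `spatial` are skipped), and the code patterns follow the schema's `NNN:NN` / `NNN:NNN` forms. The record and its codes are placeholders, not real USDA values, and this is not the Ag Data Commons checker.

```python
import re

# Fields POD v1.1 requires for federal Dataset records
# ("required if applicable" fields are not checked here).
REQUIRED_FIELDS = (
    "title", "description", "keyword", "modified", "publisher",
    "contactPoint", "identifier", "accessLevel", "bureauCode", "programCode",
)
ACCESS_LEVELS = {"public", "restricted public", "non-public"}
BUREAU_CODE = re.compile(r"^[0-9]{3}:[0-9]{2}$")
PROGRAM_CODE = re.compile(r"^[0-9]{3}:[0-9]{3}$")

def pod_problems(record: dict) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    level = record.get("accessLevel")
    if level and level not in ACCESS_LEVELS:
        problems.append(f"invalid accessLevel {level!r}")
    for code in record.get("bureauCode", []):
        if not BUREAU_CODE.match(code):
            problems.append(f"malformed bureauCode {code!r}")
    for code in record.get("programCode", []):
        if not PROGRAM_CODE.match(code):
            problems.append(f"malformed programCode {code!r}")
    return problems

record = {
    "title": "Example plot-level yield data",
    "description": "Hypothetical record for illustration.",
    "keyword": ["yield", "agronomy"],
    "modified": "2015-10-21",
    "publisher": {"@type": "org:Organization", "name": "Example Agency"},
    "contactPoint": {"@type": "vcard:Contact", "fn": "Jane Doe",
                     "hasEmail": "mailto:jane.doe@example.org"},
    "identifier": "example-003",
    "accessLevel": "public",
    "bureauCode": ["123:45"],     # placeholder, not a real bureau code
    "programCode": ["123:4567"],  # deliberately malformed
}
print(pod_problems(record))  # ["malformed programCode '123:4567'"]
```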
Controlled Vocabularies
• NALT – National Agricultural Library Thesaurus
http://agclass.nal.usda.gov
   ◦ GACS – Global Agricultural Concept Scheme
• Biological Taxonomy
• Gene Ontology (GO)
http://geneontology.org/
• Environment Ontology (EnvO), etc.
Relevant for Agriculture
• Help create a semantic web
• SKOS (Simple Knowledge Organization System): W3C recommendation, or RDF
Credit: AIMS--FAO
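SKOS models a thesaurus entry as a `skos:Concept` with a preferred label, alternative labels, and links to broader concepts. A minimal stdlib sketch that serializes a concept as N-Triples and uses its labels to map free-text keywords to concept URIs (a very simple form of thesaurus tagging). The concept URIs are hypothetical placeholders, not real NALT identifiers, and exact label matching is a simplification of real indexing.

```python
from dataclasses import dataclass, field
from typing import Optional

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

@dataclass
class Concept:
    uri: str
    pref_label: str
    alt_labels: list[str] = field(default_factory=list)
    broader: Optional[str] = None  # URI of a broader concept, if any

def literal(text: str) -> str:
    # Escape backslashes and quotes for an English N-Triples string literal.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@en'

def to_ntriples(concept: Concept) -> list[str]:
    """Serialize one concept as N-Triples lines."""
    s = f"<{concept.uri}>"
    lines = [
        f"{s} <{RDF_TYPE}> <{SKOS}Concept> .",
        f"{s} <{SKOS}prefLabel> {literal(concept.pref_label)} .",
    ]
    lines += [f"{s} <{SKOS}altLabel> {literal(a)} ." for a in concept.alt_labels]
    if concept.broader:
        lines.append(f"{s} <{SKOS}broader> <{concept.broader}> .")
    return lines

def tag_keywords(keywords: list[str], concepts: list[Concept]) -> dict[str, str]:
    """Map free-text keywords to concept URIs by case-insensitive label match."""
    index = {}
    for c in concepts:
        for label in [c.pref_label, *c.alt_labels]:
            index[label.casefold()] = c.uri
    return {kw: index[kw.casefold()] for kw in keywords if kw.casefold() in index}

# Hypothetical URIs; real thesaurus concept URIs differ.
crops = Concept("https://example.org/thesaurus/crops", "crops")
maize = Concept("https://example.org/thesaurus/maize", "maize",
                alt_labels=["corn"], broader=crops.uri)

print("\n".join(to_ntriples(maize)))
print(tag_keywords(["Corn", "irrigation"], [crops, maize]))
# {'Corn': 'https://example.org/thesaurus/maize'}
```

Matching on `altLabel` as well as `prefLabel` is what lets a depositor's "corn" resolve to the same concept as "maize".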
https://data.nal.usda.gov/
Launched this week
Adding even more value
Structured methods metadata
Shared data dictionary
Semantic data dictionary
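A machine-readable data dictionary lists each data element (column) with its meaning and units; a semantic data dictionary also ties each element to a shared term URI. A minimal sketch with a hypothetical `Element` record: it checks a CSV header against its dictionary and aligns columns from two datasets that reference the same term. The column names and the term URI are illustrative assumptions, not a published Ag Data Commons format.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Element:
    """One row of a hypothetical machine-readable data dictionary."""
    column: str
    description: str
    unit: Optional[str] = None
    term_uri: Optional[str] = None  # filled in for a *semantic* data dictionary

def check_header(csv_text: str, dictionary: list[Element]) -> dict[str, list[str]]:
    """Compare a CSV header row against the dictionary's column names."""
    header = next(csv.reader(io.StringIO(csv_text)))
    documented = {e.column for e in dictionary}
    return {
        "undocumented": [c for c in header if c not in documented],
        "absent": sorted(documented - set(header)),
    }

def align_by_term(a: list[Element], b: list[Element]) -> dict[str, str]:
    """Pair columns from two datasets that point at the same term URI."""
    b_by_term = {e.term_uri: e.column for e in b if e.term_uri}
    return {e.column: b_by_term[e.term_uri] for e in a if e.term_uri in b_by_term}

TERM = "https://example.org/terms/soil_moisture"  # hypothetical term URI
dict_a = [Element("plot_id", "Plot identifier"),
          Element("vwc", "Volumetric water content", "m3/m3", TERM)]
dict_b = [Element("site", "Site code"),
          Element("soil_moist", "Soil moisture", "m3/m3", TERM)]

print(check_header("plot_id,vwc,notes\nA1,0.31,\n", dict_a))
# {'undocumented': ['notes'], 'absent': []}
print(align_by_term(dict_a, dict_b))  # {'vwc': 'soil_moist'}
```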
Adding even more value
Assist application launch
Find related data
Integrate/link related data
= help build the knowledge graph
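"Find related data" can be approximated once datasets carry thesaurus tags. A minimal sketch, assuming each dataset is reduced to a set of concept tags: it ranks other datasets by Jaccard similarity (shared tags divided by all tags) and emits the tags as (dataset, predicate, concept) edges, a tiny knowledge graph. The dataset IDs, tags, threshold, and the `hasSubject` predicate are all hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of tags two datasets have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def related(target: str, tags: dict[str, set[str]],
            threshold: float = 0.25) -> list[tuple[str, float]]:
    """Rank other datasets by tag overlap with `target`, best first."""
    scores = [
        (other, jaccard(tags[target], other_tags))
        for other, other_tags in tags.items()
        if other != target
    ]
    hits = [(name, round(score, 2)) for name, score in scores if score >= threshold]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))

def to_edges(tags: dict[str, set[str]]) -> list[tuple[str, str, str]]:
    """Emit (dataset, "hasSubject", concept) edges; the predicate is hypothetical."""
    return [(ds, "hasSubject", t) for ds, ts in sorted(tags.items()) for t in sorted(ts)]

tags = {
    "ds-yield": {"maize", "yield", "nitrogen"},
    "ds-soil": {"nitrogen", "soil"},
    "ds-weather": {"precipitation"},
    "ds-maize-n": {"maize", "nitrogen"},
}
print(related("ds-yield", tags))  # [('ds-maize-n', 0.67), ('ds-soil', 0.25)]
```

Tag overlap is only a starting point; shared semantic data dictionary terms or explicit dataset relationships would give stronger links.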
ISO 16363
Trusted repository requirements
with Adam Kriesberg and Ricky Punzalan, University of Maryland
Acknowledgements
Cynthia.Parr@ars.usda.gov
Susan McCarthy, NAL – KSD
Ursula Pieper, NAL – ISD
Qing Qu, NAL – KSD contractor
Jeff Campbell, NAL – KSD
Jaylen Nathwani, NAL – student intern
NüCivic, Angry Cactus Team
Jocelyn McNamara, NAL – KSD contractor
Kerry Huller – UMD graduate fellow
Erin Antognoli – UMD graduate fellow
Adam Kriesberg – UMD postdoctoral fellow

Editor's Notes

  1. Title: Ag Data Commons: adding value to open agricultural research data. Public access to the results of federally funded research is a new mandate for large departments of the United States government. Public access to scholarly literature from U.S. investments is straightforward, with policies and systems like PubMed Central and PubAg (http://pubag.nal.usda.gov) already implemented. Research data release, however, is a more complex undertaking. Agricultural researchers make their data available in a patchwork of locations, if they share it at all, and metadata and data formats are far from standardized. Many data types overlap with basic science domains that have their own standards (e.g. biodiversity, genomics, hydrology), but those standards have little in common with each other and are not tailored for agriculture. The U.S. Department of Agriculture's prototype system, the Ag Data Commons (http://data.nal.usda.gov), will meet the requirements of public access but should also go further to facilitate novel, data-intensive science. Aimed at researchers, the Ag Data Commons uses DKAN, a Drupal-based catalog and repository (http://nucivic.com/dkan/), to enhance discoverability of and access to well-curated resources (data files, databases, software) deposited in the system or held elsewhere. Core metadata fields come from Project Open Data v1.1 (a requirement of the U.S. open data catalog at http://data.gov), but we added fields and features to support scholarly research. We issue DataCite Digital Object Identifiers (DOIs), accept author ORCIDs (http://orcid.org/), apply National Agricultural Library Thesaurus terms, and encourage citation of literature and linkage with related datasets and other online resources. While extremely detailed metadata are impractical given the breadth of agricultural domains, we can extract fields from sophisticated ISO 19115 geographic information metadata, and extended metadata files can be posted and will be indexed.
We are piloting the harvest of distributed metadata records. Toward data integration and standardization, we are developing guidelines for machine-readable data dictionaries: manifests of the data elements in a dataset, not unlike Darwin Core Archives. We are exploring ways to enable basic interactive visualizations. Metadata are available in JSON (http://json.org/) and RDF (http://www.w3.org/RDF/), with dedicated feeds for publication links and (eventually) compliance checking. Many challenges remain before we can move from prototype to production, among them: how to provide easy API (application programming interface) access to elements in data files; how to interface with related systems (e.g. Dryad, DataONE, EcoInforma, iPlant); how to leverage methods metadata and semantics; how to better support provenance and impact tracking; and how to ease the pain of both working with and preserving big data for high-performance computing.
  2. This plan is now in a learning and pilot phase. Policies are being developed and should be available in the next fiscal year. New projects in 2016-2017 will be expected to comply fully with the policies, which means data management plans up front that result in publicly released scientific data according to policy. So we have a little time to work out the details and influence the policies, and we can have conversations now about best practices that may guide the policy makers.
  3. Dark blue: develop as part of Ag Data Commons. Light blue: enhance existing systems. Gray: already exist.
  4. Drupal Knowledge Archive Network
  5. Phase II prototype launching next week! Data submission for outside personnel; automated DOI submission; support for compliance checking; embargo support; support for methods & software metadata.
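The notes mention accepting author ORCIDs. An ORCID iD ends in a check character computed with ISO 7064 MOD 11-2, as ORCID documents, so a submission form can catch typing errors before an iD is stored. A minimal sketch; `0000-0002-1825-0097` is the sample iD from ORCID's own documentation. It validates form and checksum only and does not confirm that the iD is registered.

```python
import re

# Four groups of four characters; only the last character may be "X".
ORCID_FORMAT = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")

def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 ORCID digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid: str) -> bool:
    """Check the hyphenated 16-character form and its check character."""
    if not ORCID_FORMAT.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]

print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
```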