The Environmental Data Initiative
- Create . Package . Archive . Discover . Reuse -
Objectives
Recognize EDI’s unique history and future.
Understand EDI’s core mission and focus.
Become familiar with EDI’s infrastructure and services.
Be capable of answering the question: Why should I publish my data?
EDI and The National Science Foundation
The Environmental Data Initiative is an NSF-funded project, actively promoting
and enabling curation and re-use of environmental data. We assist researchers
from field stations, individual laboratories, and research projects of all sizes to
archive and publish their environmental data. EDI is committed to enabling data that
are Findable, Accessible, Interoperable, and Reusable (FAIR).
History
Almost 40 years of LTER data management experience
1980s
Repository Infrastructure
2010
DataONE Member Node
2015
Digital Object Identifiers Minted
EDI
We are standing on the shoulders of giants!
Mission
Support and accelerate the curation and archiving of
environmental data.
Follow the guiding principles for data stewardship
of the earth sciences community.
Commit to enabling Findable, Accessible,
Interoperable, Reusable (FAIR) data.
Mission
FAIR Guiding Principles:
● GO FAIR: https://www.go-fair.org/fair-principles/
● Enabling FAIR data project: http://www.copdess.org/enabling-fair-data-project/
● DataCite: https://datacite.org/index.html
Emphasis on rich metadata and machine-actionability (i.e., the capacity of
computational systems to find, access, interoperate, and reuse data with no
or minimal human intervention), because humans increasingly rely on
computational support to deal with data as a result of the increase in volume,
complexity, and creation speed of data.
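To make machine-actionability concrete, here is a minimal Python sketch of how a program might compose a metadata search request without a human using the data portal. The endpoint URL, the query parameters (`defType`, `q`, `fl`, `rows`), and the returned field names are assumptions that do not appear in these slides; check the current PASTA+ API documentation before relying on them. The helper `build_search_url` is hypothetical, and the sketch only builds the URL; it does not contact any server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed search endpoint (not stated in the slides); verify against the
# current PASTA+ API documentation.
SEARCH_ENDPOINT = "https://pasta.lternet.edu/package/search/eml"


def build_search_url(terms, fields=("packageid", "title", "doi"), rows=10):
    """Compose a keyword search URL (hypothetical helper for illustration)."""
    if not terms:
        raise ValueError("at least one search term is required")
    params = {
        "defType": "edismax",      # assumed query-parser setting
        "q": " ".join(terms),      # free-text keywords
        "fl": ",".join(fields),    # assumed names of metadata fields to return
        "rows": rows,              # maximum number of results
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


url = build_search_url(["soil", "moisture"])
print(url)
```

A script could pass the resulting URL to any HTTP client and parse the response. That is the sense in which rich, structured metadata lets software find data with little or no human intervention.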
Infrastructure
● A secure and certified data repository
● Data portal
● Contributed data packages (as of 5/20/2020)
○ 7538 (unique)
○ 20956 (all revisions)
● Total data packages (+ EcoTrends and Landsat):
○ 43904 (unique)
○ 72725 (all revisions)
● Hardware managed at University of New Mexico
Center for Advanced Research Computing
● Data content backed up to Amazon’s Glacier Cloud
Services
● Support for archiving data
● Data curation software tools
● Training and counsel on data archiving best practices
● Support for data synthesis projects
○ Community survey data
○ Meteorological and hydrological data
● For environmental scientists and research agencies
(e.g. Long Term Ecological Research (LTER),
Organization of Biological Field Stations (OBFS),
Long Term Research in Environmental Biology
(LTREB), Macrosystems Biology (MSB))
Outreach
● Data publishing training (webinars & workshops)
● Data management fellowship program
● Presence at conferences (ESA, AGU, ESIP,
OBFS)
○ Data help desk
○ Sessions
○ Posters
● Office hours
● Website
● Newsletter
● Twitter feed
● Slack channel
Why publish data?
https://xkcd.com/1909/
Why publish data?
Benefits
● Clarifies authorship of data
● Provenance, immutability
● Fulfill funding agencies’ and journals’ requirements
● Preservation of data and metadata
● Reuse of data to enable new science
● Saves time sharing data
● First class research object
Challenges
● Having to learn new concepts and tools
● Overcoming concerns related to data sharing
● Re-formatting data into publishable units
● Time investment in preparing data and detailed metadata
Accelerating data discovery
A few options:
● EDI Data Search
● DataONE Data Search
● Google Dataset Search
● Personal data catalog
Promoting data to first class research objects
● Digital Object Identifier (DOI)
● Linked to researcher IDs (e.g. ORCID)
● Citation and attribution
Summary
EDI evolved from the data management best practices and expertise of the US
LTER Network to serve the broader ecological and environmental science
community.
The core mission of EDI is to accelerate the data curation abilities of the ecological
and environmental sciences.
EDI provides a certified and trustworthy data repository and data curation services.
Publishing data in an open and accessible form creates new scientific
opportunities and meaningful attribution to authors.
Resources
Contact
Website
Data portal
Slack
Twitter
GitHub EDIorg
GitHub PASTA+

EDI Training Module 2: EDI Project

Editor's Notes

  • #4 National Science Foundation Sustained Availability of Biological Infrastructure (SABI) Core Program