ESS-DIVE AGU 12/14/2018
ESS-DIVE: Report on our First 18 Months
Deb Agarwal
Charuleka Varadharajan, Shreyas Cholia, Cory Snavely, Valerie Hendrix, Fianna O’Brien, Matt Jones, Chris Jones, Sara Studwell, Crystal Sherline, and Karen Whitenack
ESS-DIVE Mission
To preserve, expand access to, and improve usability of critical data generated through DOE-sponsored research of terrestrial and subsurface ecosystems in support of the DOE’s efforts to address some of society’s most pressing energy and environmental challenges.
ESS-DIVE Data Spans Bedrock to Canopy . . .
The ESS-DIVE Team
● Data scientists and software engineers
● Digital librarians
● Environmental scientists
ESS-DIVE Current Status
● CDIAC data converted to data packages
○ Conversion complete
○ DOIs sorted out
● 238 datasets in the archive (6,097 files)
● 4,200 downloads in Dec 2018
ESS Community Engagement
● Worked with the ESS Data Management Working Group on ESS-DIVE package metadata design
● Site visits to ORNL, CDIAC, OSTI, SLAC/Stanford, and PNNL
● 2017/18 ESS CI and AGU presentations
● Formed the Archive Partnership Board, which has met three times
● Site visit to LLNL and ESGF scheduled
Project Community Engagement and Implementation
● Community engagement
○ 2017 May – ESS CI and PI Meeting
○ 2017 July – Visit to ORNL and OSTI
○ 2017 December – Visit to Stanford/SLAC
○ 2018 March – Archive Partnership Board Meeting
○ 2018 May – ESS PI Meeting
○ 2018 July – Visit to PNNL
○ 2018 July – Archive Partnership Board Meeting
○ 2018 …
● Implementation
○ 2017 July – Project start
○ 2017 September – Old archive transferred
○ 2018 April – ESS-DIVE live
○ 2018 August – Join
○ 2018 December – Prototype API
ESS-DIVE Archiving Features
● Globus for large data uploads
Feature: User Services with Metacat
● Developed at NCEAS for the earth sciences community
● Robust support for relevant metadata standards, data deposit, and data auditing/management
● Integrates with DataONE
● Web UI and REST API
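Metacat can expose the DataONE Member Node REST API, in which an object's bytes are retrieved with `GET /object/{pid}` and its system metadata with `GET /meta/{pid}`. The sketch below, using only the Python standard library, builds those URLs and fetches a resource. The base URL `https://metacat.example.org/metacat/d1/mn/v2`, the identifier `doi:10.0000/example`, and the helper names are hypothetical placeholders, not ESS-DIVE's actual endpoint or client.

```python
from __future__ import annotations

from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical base URL for illustration only; a real deployment
# publishes its own Member Node endpoint.
MN_BASE = "https://metacat.example.org/metacat/d1/mn/v2"


def _endpoint(resource: str, pid: str, base: str) -> str:
    # Percent-encode every reserved character (including "/" and ":")
    # so a DOI-style identifier stays a single path segment.
    return f"{base.rstrip('/')}/{resource}/{quote(pid, safe='')}"


def object_url(pid: str, base: str = MN_BASE) -> str:
    """URL for MNRead.get: GET /object/{pid} returns the object's bytes."""
    return _endpoint("object", pid, base)


def meta_url(pid: str, base: str = MN_BASE) -> str:
    """URL for MNRead.getSystemMetadata: GET /meta/{pid} returns system metadata XML."""
    return _endpoint("meta", pid, base)


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download a resource; urllib raises HTTPError on 4xx/5xx responses."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

For example, `object_url("doi:10.0000/example")` yields `https://metacat.example.org/metacat/d1/mn/v2/object/doi%3A10.0000%2Fexample`. Encoding the identifier matters because DOIs contain `/`, which would otherwise be read as a path separator.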
Feature: Microservice Architecture at NERSC
● Built on the supported Spin platform for web and network services
● Integrated with NERSC storage systems, HPC, and network
● Docker enables rigorous version control over software and underlying systems
● Multiple instances are easy to “spin up” using isolated containers
Feature: DataONE network membership
● Robust protocol for data integrity and replication
● Geographic distribution ensures continuity of data preservation and access
● Federated search enhances discovery
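DataONE system metadata records a checksum and the algorithm used to compute it (for example MD5 or SHA-256) for each object, which is what lets nodes and replicas confirm that stored bytes have not changed. A minimal fixity check, assuming the system metadata XML has already been retrieved from a Member Node's `/meta/{pid}` endpoint, could look like the following; the function names are hypothetical and this is a sketch, not DataONE's own replication code.

```python
from __future__ import annotations

import hashlib
import xml.etree.ElementTree as ET


def _hashlib_name(d1_algorithm: str) -> str:
    # DataONE names such as "MD5", "SHA-1", "SHA-256" map to
    # hashlib names "md5", "sha1", "sha256".
    return d1_algorithm.lower().replace("-", "")


def expected_checksum(sysmeta_xml: str) -> tuple[str, str]:
    """Return (algorithm, value) from a system metadata document."""
    root = ET.fromstring(sysmeta_xml)
    for elem in root.iter():
        # Compare local names so the lookup works with or without a namespace.
        if elem.tag.rsplit("}", 1)[-1] == "checksum":
            return elem.get("algorithm"), elem.text.strip()
    raise ValueError("no checksum element found")


def verify(data: bytes, sysmeta_xml: str) -> bool:
    """True if the bytes hash to the checksum recorded in system metadata."""
    algorithm, value = expected_checksum(sysmeta_xml)
    digest = hashlib.new(_hashlib_name(algorithm), data).hexdigest()
    return digest.lower() == value.lower()


def replicas_consistent(replicas: list[bytes], sysmeta_xml: str) -> bool:
    """True only if every replica matches the authoritative checksum."""
    return all(verify(copy, sysmeta_xml) for copy in replicas)
```

A replica that fails `verify` would be a candidate for repair from a copy that still passes, which is the property that geographic distribution relies on.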
Features and Benefits Summary
● User Services based on Metacat archive management software and data tools ⇒ Quick start-up; robust functionality; metadata and tools grounded in the earth sciences community
● Microservice architecture on NERSC’s new Docker-based platform, Spin ⇒ Integrated with petascale file systems, HPC, and fast network; rigorous version-controlled software stack and system architecture
● DataONE network membership ⇒ Reliable data integrity and replication; geographic distribution protects against localized risks; broadened discovery through federation
Lessons Learned
● Converting data from a webpage-based presentation into archival data packages was time-consuming
● The microservices architecture has worked well, but performance challenges remain
● Adding cybersecurity checks on uploads has been challenging
Lessons Learned (continued)
● Users want an API for storing data and for bulk uploads
● It is difficult to build a long-term capability while standards keep shifting
○ Publisher as project or archive
○ Use of infix
● Incompatible assumptions cause interesting software challenges
○ DOI approach and versioning
○ Extensive manual curation
○ Data uploaded to CMS at first touch
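One common way to reconcile DOIs with dataset versioning is to mint a DOI per version and chain the versions using DataCite relation types (`IsNewVersionOf` / `IsPreviousVersionOf`). The sketch below shows that pattern on in-memory records whose keys follow DataCite's related-identifier fields; it is one possible approach, not necessarily the one ESS-DIVE adopted, and the `DatasetVersion` class, `link_versions` function, and `10.0000/...` DOIs are hypothetical placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetVersion:
    doi: str
    version: str
    # Related-identifier entries shaped like DataCite metadata fields.
    related: list[dict] = field(default_factory=list)


def _relation(doi: str, relation_type: str) -> dict:
    return {
        "relatedIdentifier": doi,
        "relatedIdentifierType": "DOI",
        "relationType": relation_type,
    }


def link_versions(versions: list[DatasetVersion]) -> None:
    """Chain consecutive versions (ordered oldest first) in both directions."""
    for older, newer in zip(versions, versions[1:]):
        newer.related.append(_relation(older.doi, "IsNewVersionOf"))
        older.related.append(_relation(newer.doi, "IsPreviousVersionOf"))
```

An alternative is a single "concept" DOI that points to every version with `HasVersion`; mixing the two conventions across systems is exactly the kind of incompatible assumption that complicates software.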
Acknowledgements
• DOE BER Data Management program within the Climate and Environmental Sciences Division – Funding
• National Center for Ecological Analysis and Synthesis (NCEAS) – Help getting up and running quickly
• DOE Office of Scientific and Technical Information (OSTI) – Transitioning DOIs from the prior archive
• DataCite – Consultations and transfer of DOIs
• National Energy Research Scientific Computing Center (NERSC) – Hosting the archive

Enabling Scalable Integration of Diverse Data

  • 1.
    ESS-DIVE AGU 12/14/2018 ESS-DIVE:Report on our First 18 Months Deb Agarwal Charuleka Varadharajan, Shreyas Cholia, Cory Snavely, Valerie Hendrix, Fianna O’Brien, Matt Jones, Chris Jones, Sara Studwell, Crystal Sherline, and Karen Whitenack
  • 2.
    ESS-DIVE AGU 12/14/2018 ESS-DIVEMission To preserve, expand access to, and improve usability of critical data generated through DOE-sponsored research of terrestrial and subsurface ecosystems in support of the DOE’s efforts to address some of society’s most pressing energy and environmental challenges. 2
  • 3.
    ESS-DIVE AGU 12/14/2018 ESS-DIVEData Spans Bedrock to Canopy . . . 3
  • 4.
    ESS-DIVE AGU 12/14/2018 TheESS-DIVE Team Data Scientists and Software engineers Digital Librarians Environmental Scientists
  • 5.
    ESS-DIVE AGU 12/14/2018 ESS-DIVECurrent Status ● CDIAC data -> data packages ○ Complete conversion ○ DOI’s sorted out ● 238 Datasets in the archive (6097 files) ● 4200 downloads in Dec 2018 5
  • 6.
    ESS-DIVE AGU 12/14/2018 ESSCommunity Engagement ● Worked with ESS Data Management Working Group in ESS-DIVE package metadata design ● Site visits to ORNL, CDIAC, OSTI, SLAC/Stanford, PNNL ● 2017/18 ESS CI and AGU presentation ● Formed Archive Partnership Board and had three meetings ● Site visit to LLNL and ESGF scheduled
  • 7.
    ESS-DIVE AGU 12/14/2018 ProjectCommunity Engagement and Implementation ●Community engagement ○2017 May – ESS CI and PI Meeting ○2017 July - Visit to ORNL and OSTI ○2017 Dec – Visit to Stanford/SLAC ○2018 March – Archive Partnership Board Meeting ○2018 May – ESS PI Meeting ○2018 July – Visit to PNNL ○2018 July – Archive Partnership Board Meeting ○2018 … ●Implementation ○2017 July – Project start ○2017 Sept. – Old archive transferred ○2018 April – ESS-DIVE live ○2018 August – Join ○2018 December – Prototype API July2017 September2017 April2018 July2018 July2018
  • 8.
    ESS-DIVE AGU 12/14/2018 ESS-DIVEArchiving Features ● Globus for large data uploads
  • 9.
    ESS-DIVE AGU 12/14/2018 Feature:User Services with Metacat ● Developed at NCEAS for earth sciences community ● Robust support for relevant metadata standards, data deposit, and data auditing / management ● Integrates with DataONE ● Web UI and REST API
  • 10.
    ESS-DIVE AGU 12/14/2018 Feature:Microservice Architecture at NERSC ● Built on supported Spin platform for web and network services ● Integrated with NERSC storage systems, HPC, and network ● Docker enables rigorous version control over software and underlying systems ● Multiple instances easy to “spin up” using isolated containers
  • 11.
    ESS-DIVE AGU 12/14/2018 Feature:DataONE network membership ● Robust protocol for data integrity and replication ● Geographic distribution ensures continuity of data preservation and access ● Federated search enhances discovery
  • 12.
    ESS-DIVE AGU 12/14/2018 Featuresand Benefits Summary User Services based on Metacat archive management software and data tools Microservice architecture on NERSC’s new Docker-based platform, Spin DataONE network membership Quick start-up; robust functionality; metadata and tools grounded in earth sciences community Integrated with petascale file systems, HPC, and fast network; rigorous version-controlled software stack and system architecture Reliable data integrity and replication; geographic distribution protects against localized risks; broadened discovery through federation ⇒ ⇒ ⇒
  • 13.
    ESS-DIVE AGU 12/14/2018 LessonsLearned ● Transforming data from a webpage- based approach to archiving took a lot of time ● Microservices architecture has worked well but there are still performance challenges ● Adding cybersecurity to the system to check uploads has been a challenge 13
  • 14.
    ESS-DIVE AGU 12/14/2018 LessonsLearned 2 ● Users want an API to store data and bulk upload ● Difficult to build a long-term capability in a shifting standards world ○ Publisher as project or archive ○ Use of infix ● Incompatible assumptions cause interesting software challenges ○ DOI approach and versioning ○ Extensive manual curation ○ Data uploaded to CMS at first touch 14
  • 15.
    ESS-DIVE AGU 12/14/2018 Acknowledgements •DOE BER Data Management program within the Climate and Environmental Science Division - Funding • National Center for Ecological Analysis and Synthesis (NCEAS) – Help getting up and running quickly • DOE Office of Scientific and Technical Information (OSTI) – Transition DOIs from prior archive • Datacite – Consultations and transfer of DOIs • National Energy Research Scientific Computing facility (NERSC) – Hosting archive 15