Shortened presentation title
2018 AGU Fall Meeting: IN34B-02
Data management services supported by
NCAR’s Research Data Archive
2018 AGU Fall Meeting
Presentation IN34B-02
12 December 2018
Thomas Cram and Doug Schuster
National Center for Atmospheric Research
Computational and Information Systems Laboratory (CISL)
Research Data Archive
About the RDA
• History
– Established 1960s
• Purpose
– Support climate & weather research at NCAR and
UCAR universities with reference datasets
• Collections
– Ocean & atmospheric observations, climate
reanalyses, operational NWP products
– 600+ datasets, 10M files, 2.2 PB
– 70+ datasets updated on daily to monthly schedules
• Free and open access
• FY 2017 data usage
– 13K unique users
– 2.2 PB delivered
rda.ucar.edu
RDA data access services
User services
Data reduction via delayed mode processing
Since FY 2012
• Data volume reduction 97%
FY 2017
• 50,000 requests processed
• 28 PB data accessed
• 695 TB output delivered
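The FY 2017 figures above can be related to the reduction percentage with a short calculation. This is a sketch only: it assumes "data volume reduction" means the fraction of accessed data that did not have to be delivered, and that 1 PB = 1000 TB. The function name `volume_reduction` is hypothetical and not part of any RDA tool.

```python
def volume_reduction(accessed_tb: float, delivered_tb: float) -> float:
    """Return the fraction of accessed volume that was not delivered to users."""
    if accessed_tb <= 0:
        raise ValueError("accessed volume must be positive")
    return 1.0 - delivered_tb / accessed_tb


TB_PER_PB = 1000  # assumption: decimal (SI) units

accessed_tb = 28 * TB_PER_PB   # 28 PB data accessed (FY 2017)
delivered_tb = 695             # 695 TB output delivered (FY 2017)

reduction = volume_reduction(accessed_tb, delivered_tb)
print(f"FY 2017 data volume reduction: {reduction:.1%}")
```

Under these assumptions the FY 2017 ratio comes out between 97% and 98%, in line with the 97% reduction reported since FY 2012.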
User services
Data transfer options
RDA-Globus integration
• Fully integrated into RDA web portal
• Shared endpoints to full RDA archive and custom products
• Secure, reliable, fast
• Reduces burden on web server
Lessons learned
• Metadata harvesting is critical
– Supports value-added services
– Validates file integrity during data ingest
• Essential to co-locate data with HPC on disk
– Supports NCAR HPC users and RDA value-added services
• Data access metrics provide limited information
– Provide indicators of the popularity of individual dataset collections and access mechanisms
– Do not necessarily measure “science impact”
– Need a programmatic way to harvest dataset citation metrics
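As an illustration of validating file integrity during ingest, one common approach is to compare a checksum computed on the archived copy against the value supplied with the file. The slides do not say how the RDA does this; the sketch below is a generic example using Python's standard `hashlib`, and the function names `file_checksum` and `verify_ingest` are hypothetical.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Compute a hex digest by streaming the file in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_ingest(path, expected_checksum, algorithm="sha256"):
    """Return True if the file on disk matches the checksum supplied at ingest."""
    return file_checksum(path, algorithm) == expected_checksum.strip().lower()
```

Streaming in chunks keeps memory use constant, which matters when individual archive files are large.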
Future directions
• Public cloud “Data Commons”
– NCAR data store co-located with flexible, community-accessible compute options
– Facilitate interdisciplinary discovery
– JupyterHub integration (e.g. Pangeo, pangeo.io)
• Leverage future Globus enhancements, e.g. automate
• Link to publications that cite datasets (www.scholix.org)
– Facilitate more relevant dataset search/discovery
– Build upon the research ideas of others who have cited a dataset
Contact us
rda.ucar.edu
rdahelp@ucar.edu
ncarrda.blogspot.com
@NCAR.RDA
@NCAR_RDA
