May 6, 2019 1
Improving Data Transfer and Delivery Using Globus
Recent Upgrades to the Atmospheric Radiation Measurement (ARM) Facility Data Center Architecture
GIRI PRAKASH, ZACH PRICE, JOSEPH OLATT, AND JITU KUMAR
ARM Data Center, Oak Ridge National Laboratory
GlobusWorld, May 1, 2019
palanisamyg@ornl.gov
ARM’s Vision
To provide a detailed & accurate description of the Earth's atmosphere in diverse climate regimes to resolve the uncertainties in climate and Earth system models, toward the development of sustainable solutions for the Nation's energy & environmental challenges.
ARM Data Flow – The Big Picture
[Figure: overview of the ARM data flow, with a data growth chart marking 1 PB.]
ARM Data – Disaster Recovery
Offsite Data Backup
ARM data files archived in the High Performance Storage System (HPSS) at ORNL are also copied to the HPSS at Argonne National Laboratory (ANL). The copies are made with the globus-url-copy program over the ESnet network that connects the two labs.
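A minimal sketch of how such a copy could be driven in batch, assuming a list of newly archived files is available: globus-url-copy's -f option reads one "sourceURL destURL" pair per line (lines starting with # are skipped), -cd creates missing destination directories, and -p sets the number of parallel data streams. The gsiftp:// hostnames, archive prefixes, stream count, and the helper names url_pair, build_batch, and command are hypothetical, not the ARM Data Center's actual configuration.

```python
"""Hypothetical sketch: build a globus-url-copy batch (-f) file that mirrors
newly archived ARM files from the ORNL HPSS to the ANL HPSS."""
import shlex
from pathlib import PurePosixPath

# Hypothetical GridFTP URL prefixes for the two archives (placeholders only).
SRC_PREFIX = "gsiftp://hpss-dtn.ornl.example/arm/archive"
DST_PREFIX = "gsiftp://hpss-dtn.anl.example/arm/backup"


def url_pair(relative_path: str) -> str:
    """Return one 'sourceURL destURL' line for a path relative to the archive root."""
    rel = PurePosixPath(relative_path)
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"expected a relative archive path, got {relative_path!r}")
    if any(ch.isspace() for ch in relative_path):
        # globus-url-copy expects such URLs in double quotes; this sketch just rejects them.
        raise ValueError(f"whitespace in path not handled: {relative_path!r}")
    return f"{SRC_PREFIX}/{rel} {DST_PREFIX}/{rel}"


def build_batch(new_files: list[str]) -> str:
    """Return the text of a batch file listing one URL pair per new file."""
    lines = ["# source URL  destination URL"]  # '#' lines are ignored by -f
    lines += [url_pair(rel) for rel in new_files]
    return "\n".join(lines) + "\n"


def command(batch_path: str, streams: int = 4) -> list[str]:
    """Return the globus-url-copy invocation that consumes the batch file."""
    # -cd: create destination directories; -p: parallel data streams; -f: read URL pairs.
    return ["globus-url-copy", "-cd", "-p", str(streams), "-f", batch_path]


if __name__ == "__main__":
    # Illustrative ARM-style file name (site + instrument + facility + data level).
    print(build_batch(["sgpmetE13.b1/sgpmetE13.b1.20190501.000000.cdf"]), end="")
    print(shlex.join(command("to_anl.txt")))
```

Generating the URL list separately from running the transfer keeps the selection of "new" files (for example, from an archive log) independent of the copy tool.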
■ Date copying to ANL started: 03-26-2018
■ Total size transferred: 188.03 TB
■ Total number of files transferred: 3,938,465
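The two totals above also give the average file size. Assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes; the slide does not state which convention it uses), the copy averages roughly 48 MB per file. The function name is illustrative.

```python
def average_file_size_mb(total_tb: float, n_files: int) -> float:
    """Mean file size in MB, assuming decimal units (1 TB = 1e12 B, 1 MB = 1e6 B)."""
    if n_files <= 0:
        raise ValueError("n_files must be positive")
    return total_tb * 1e12 / n_files / 1e6


# Totals from the slide: 188.03 TB in 3,938,465 files.
print(f"{average_file_size_mb(188.03, 3_938_465):.1f} MB per file")  # prints 47.7 MB per file
```

With millions of files of this size, per-file overhead can matter as much as raw bandwidth, which is why batching many files into one transfer session tends to pay off.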
Data Transfer and Staging to Facilitate Data Science
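Staging data onto an HPC file system with Globus comes down to submitting a transfer task. The sketch below, assuming the Globus Transfer API's JSON transfer document (a "transfer" object whose DATA list holds "transfer_item" entries), only builds that document; it does not authenticate or submit anything. Endpoint IDs, paths, the label, and the helper name staging_request are hypothetical placeholders, and submission_id would have to be obtained from the Transfer API beforehand. In practice the globus_sdk package's TransferData class assembles the same kind of document.

```python
"""Hypothetical sketch: build (not submit) a Globus Transfer API 'transfer'
document that stages selected ARM files into a user's HPC scratch area."""
import json
from pathlib import PurePosixPath


def staging_request(submission_id: str, source_endpoint: str,
                    destination_endpoint: str, files: list[str],
                    dest_dir: str, label: str) -> dict:
    """Return a transfer document copying each file into dest_dir under its own name."""
    items = [
        {
            "DATA_TYPE": "transfer_item",
            "source_path": src,
            "destination_path": str(PurePosixPath(dest_dir) / PurePosixPath(src).name),
            "recursive": False,  # individual files, not directories
        }
        for src in files
    ]
    return {
        "DATA_TYPE": "transfer",
        "submission_id": submission_id,   # obtained beforehand from the Transfer API
        "source_endpoint": source_endpoint,
        "destination_endpoint": destination_endpoint,
        "label": label,
        "verify_checksum": True,          # ask Globus to verify checksums after copying
        "DATA": items,
    }


if __name__ == "__main__":
    doc = staging_request(
        submission_id="SUBMISSION-ID-PLACEHOLDER",
        source_endpoint="ARM-ARCHIVE-ENDPOINT-ID",       # placeholder, not a real ID
        destination_endpoint="HPC-SCRATCH-ENDPOINT-ID",  # placeholder, not a real ID
        files=["/arm/sgpmetE13.b1/sgpmetE13.b1.20190501.000000.cdf"],
        dest_dir="/scratch/arm_user/sgpmet",
        label="ARM staging order",
    )
    print(json.dumps(doc, indent=2))
```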
Data Pipeline and Software Architecture
[Diagram: data pipeline (Data Processing, Storage & Data Model, Querying, Analytics, Scientific Users) alongside a three-tier software architecture.]
Frontend: JupyterLab (interface, visualization, and analytics output)
Analytic Server: Spark on the ARM HPC computing clusters
• Supports fast analysis of voluminous data
• Hides architectural complexities
• Stages data in HPC
Backend: relational database (metadata, order history) and NoSQL database (data from multiple instruments)
Dr. Bhargavi Krishna, Yuping Lu, and Dr. Jitu Kumar
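The relational part of the backend (metadata and order history) can be illustrated with a small in-memory SQLite schema. The table and column names, the delivery values, and the sample rows are hypothetical. The slide does not describe ARM's actual schema or which relational database product is used.

```python
"""Hypothetical sketch of a relational store for datastream metadata and
order history, using SQLite in memory."""
import sqlite3

SCHEMA = """
CREATE TABLE datastream (
    name       TEXT PRIMARY KEY,          -- e.g. 'sgpmetE13.b1'
    site       TEXT NOT NULL,
    instrument TEXT NOT NULL
);
CREATE TABLE data_order (
    order_id   INTEGER PRIMARY KEY,
    user_email TEXT NOT NULL,
    datastream TEXT NOT NULL REFERENCES datastream(name),
    start_date TEXT NOT NULL,             -- ISO 8601 dates sort correctly as text
    end_date   TEXT NOT NULL,
    delivery   TEXT NOT NULL CHECK (delivery IN ('http', 'globus'))
);
"""


def open_db() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    con.executescript(SCHEMA)
    return con


def deliveries_by_method(con: sqlite3.Connection, datastream: str) -> list[tuple[str, int]]:
    """Count past orders of one datastream, grouped by delivery method."""
    cur = con.execute(
        "SELECT delivery, COUNT(*) FROM data_order WHERE datastream = ? "
        "GROUP BY delivery ORDER BY delivery",
        (datastream,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    con = open_db()
    con.execute("INSERT INTO datastream VALUES ('sgpmetE13.b1', 'sgp', 'met')")
    con.executemany(
        "INSERT INTO data_order (user_email, datastream, start_date, end_date, delivery) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            ("a@example.org", "sgpmetE13.b1", "2019-01-01", "2019-01-31", "globus"),
            ("b@example.org", "sgpmetE13.b1", "2019-02-01", "2019-02-28", "http"),
            ("c@example.org", "sgpmetE13.b1", "2019-03-01", "2019-03-31", "globus"),
        ],
    )
    print(deliveries_by_method(con, "sgpmetE13.b1"))  # prints [('globus', 2), ('http', 1)]
```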
Data Retrieval, Packaging, and Delivery
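[Diagram: datastreams stored in HPSS and as an online copy feed retrieval, packaging, and delivery; one component is labeled "Future capability".]
Packaging
§ Merging
§ DQR (Data Quality Report) filtering
§ Conversion
Access and delivery
§ Discovery UI & web services
§ Live Data WS
§ NetCDF data extractions
§ Data staging order
Also provided
§ Link to data access
§ Data quality
§ Access to plots
§ DOI-based citation guidance
§ Publication request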
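The packaging steps listed above can be sketched for a single time-series variable: merge per-file records into one time-ordered series, drop samples that fall inside DQR intervals, and convert the result to CSV. Real ARM files are NetCDF and would be read with a NetCDF library. Here, plain Python lists stand in for them. The DQR categories, the half-open [start, end) interval rule, CSV as the conversion target, and all function names are assumptions for illustration.

```python
"""Hypothetical packaging sketch: merge -> DQR filter -> convert to CSV."""
import csv
import heapq
import io
from datetime import datetime

Record = tuple[datetime, float]        # (sample time, value)
Dqr = tuple[datetime, datetime, str]   # (start, end, quality category), assumed shape


def merge_files(*files: list[Record]) -> list[Record]:
    """Merge per-file records (each already time-sorted) into one time-ordered series."""
    return list(heapq.merge(*files, key=lambda rec: rec[0]))


def dqr_filter(records: list[Record], dqrs: list[Dqr],
               drop: tuple[str, ...] = ("incorrect", "suspect")) -> list[Record]:
    """Remove samples whose time lies in [start, end) of a DQR with a dropped category."""
    flagged = [(start, end) for start, end, category in dqrs if category in drop]
    return [rec for rec in records
            if not any(start <= rec[0] < end for start, end in flagged)]


def to_csv(records: list[Record], variable: str) -> str:
    """Convert the series to CSV text with an ISO 8601 time column."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["time", variable])
    for when, value in records:
        writer.writerow([when.isoformat(), value])
    return buf.getvalue()


if __name__ == "__main__":
    day1 = [(datetime(2019, 5, 1, 0, 0), 1.0), (datetime(2019, 5, 1, 0, 1), 2.0)]
    day2 = [(datetime(2019, 5, 1, 0, 2), 3.0)]
    merged = merge_files(day2, day1)
    dqrs = [(datetime(2019, 5, 1, 0, 1), datetime(2019, 5, 1, 0, 2), "incorrect")]
    print(to_csv(dqr_filter(merged, dqrs), "temp_mean"), end="")
```

heapq.merge assumes each input is already sorted, which matches per-file records that are written in time order.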
Data Discovery Tool
Data Discovery for LASSO
§ Built on a big data analysis platform (NoSQL)
§ Uses ARM HPC clusters for data processing
§ Provides an interactive web interface that lets users find simulations of interest by examining LES (large-eddy simulation) performance relative to selected ARM observations
§ Lets users visualize LASSO (LES ARM Symbiotic Simulation and Observation) data bundle diagnostics and skill scores on the fly using plots and tables
§ Offers Globus as a delivery option
Technologies: Cassandra, D3 & Node.js, Spark
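As a hypothetical illustration of finding simulations by examining LES performance relative to ARM observations, the sketch below scores each LES run against one observed series with two simple metrics, root-mean-square error and mean bias, and ranks the runs. The metrics, run names, and values are assumptions; the slide does not say which skill scores the LASSO tool computes.

```python
"""Hypothetical sketch: rank LES runs by RMSE against an observed series."""
import math


def _check(sim: list[float], obs: list[float]) -> None:
    if len(sim) != len(obs) or not obs:
        raise ValueError("series must be non-empty and of equal length")


def rmse(sim: list[float], obs: list[float]) -> float:
    """Root-mean-square error between a simulated and an observed series."""
    _check(sim, obs)
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def mean_bias(sim: list[float], obs: list[float]) -> float:
    """Mean of (simulated - observed); positive means the run is too high."""
    _check(sim, obs)
    return sum(s - o for s, o in zip(sim, obs)) / len(obs)


def rank_runs(runs: dict[str, list[float]], obs: list[float]) -> list[tuple[str, float, float]]:
    """Return (run name, RMSE, bias) tuples, best (lowest RMSE) first."""
    scored = [(name, rmse(series, obs), mean_bias(series, obs)) for name, series in runs.items()]
    return sorted(scored, key=lambda row: row[1])


if __name__ == "__main__":
    observed = [0.20, 0.35, 0.50, 0.40]       # made-up observed values
    runs = {
        "run_a": [0.25, 0.30, 0.55, 0.45],    # made-up simulated values
        "run_b": [0.10, 0.20, 0.30, 0.30],
    }
    for name, err, b in rank_runs(runs, observed):
        print(f"{name}: RMSE={err:.3f} bias={b:+.3f}")
```

RMSE alone hides the sign of the error, so reporting the bias next to it shows whether a well-ranked run is systematically high or low.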
Questions?
ARM Data Services
Giri Prakash
palanisamyg@ornl.gov
