1. Introduction to the HACC Simulation Data Portal
GlobusWorld 2019; Chicago, May 1, 2019
Katrin Heitmann (Argonne National Laboratory)
Based on: arXiv:1904.11966
2. Introduction
! In cosmology we study the origin, evolution, and make-up of the Universe
! Many unsolved questions:
○ What is the nature of dark energy and dark matter, which make up 95% of the energy-matter budget of our Universe?
○ What is the mass of the lightest particle in the Universe, the neutrino?
○ How can we learn more about the very first moments of the Universe?
! Upcoming cosmological surveys try to answer these questions and rely on detailed, complex simulations
○ Simulations are carried out and analyzed on the largest supercomputers available worldwide
○ Cosmological simulations generate large amounts of data (PBs) to capture the evolution of the Universe faithfully
○ Given the resources required for these simulations, it is crucial to share them with the community to enable the best possible science outcome
[Images: HACC/Galacticus/GalSim; Hubble Ultra Deep Field, NASA]
3. What is needed ...
A large-scale effort that provides easy access to a range of simulation products to the world’s cosmologists, as well as analysis capabilities to established survey collaborations
4. [Architecture diagram: public access to cosmological data and computational support for collaborations. In collaboration with Tom Uram, Mike Papka, Ian Foster.]
! Simulation and analysis run on Theta (10 PF) and Cooley under HPC allocations (e.g., INCITE, ALCC) and use O(50 PB) of storage in total; simulation and analysis job descriptions go through a job submission/adaptation layer
! Published datasets live on Petrel (O(1 PB); 100 TB to start) and reach the user community via the web portal, Globus Online, and community-specific clients
! Collaboration-installed web/data interfaces (LSST DM Butler, Jupyter, PDACS (Galaxy), DESCQA, visualization, databases, Globus, workflows) run on ALCF-hosted, collaboration-controlled resources (physical/virtual machine(s), Phoenix)
5.–9. [Builds of the architecture diagram from slide 4, added element by element. The build on slide 6 notes that the HPC-side storage is temporary: it expires with the allocation, and only collaborators on the project have direct access. The remaining builds add Petrel with its datasets, public access via the portal and Globus, and the collaboration-installed interfaces on Phoenix.]
10. What exists ...
• Petrel and Phoenix
• Simulations
• First version of web portal using Globus
11. ! Petrel: Data Management and Sharing Pilot, hosted at Argonne
! 1.7 PB parallel filesystem
! Embedded in Argonne’s 100+ Gbps network fabric to allow high-speed data transfers
! Web and API access via Globus (see the sketch below)
! Federated login
! Self-managed by PIs
! https://press3.mcs.anl.gov/petrel/
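A minimal sketch of the "web and API access via Globus" path, using the Globus Python SDK (globus-sdk). The client ID, the Petrel endpoint UUID, and the listing path below are placeholders for illustration, not the portal's real identifiers.

# Minimal sketch of Petrel API access via the Globus Python SDK (globus-sdk).
# CLIENT_ID and PETREL_ENDPOINT are placeholders, not the portal's real IDs.
import globus_sdk

CLIENT_ID = "REPLACE-WITH-YOUR-NATIVE-APP-CLIENT-ID"   # hypothetical
PETREL_ENDPOINT = "REPLACE-WITH-PETREL-ENDPOINT-UUID"  # hypothetical
TRANSFER_SCOPE = "urn:globus:auth:scope:transfer.api.globus.org:all"

# Federated login via Globus Auth: open the printed URL, log in with your
# institutional identity, and paste the authorization code back.
auth_client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
auth_client.oauth2_start_flow(requested_scopes=TRANSFER_SCOPE)
print("Please log in at:", auth_client.oauth2_get_authorize_url())
auth_code = input("Enter the authorization code: ").strip()
tokens = auth_client.oauth2_exchange_code_for_tokens(auth_code)
transfer_token = tokens.by_resource_server["transfer.api.globus.org"]["access_token"]

# Transfer client authenticated with the token obtained above.
tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(transfer_token)
)

# List the top level of the shared data collection (path is illustrative).
for entry in tc.operation_ls(PETREL_ENDPOINT, path="/"):
    print(entry["type"], entry["name"])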
12. ! Web portal for easy access to simulations
! Currently: ~82.5 TB in our project, covering three simulation projects
! Step 0: Register with Globus
! Step 1: Select simulation project
! Step 2: Select data products; information about data size is available
! Step 3: Transfer with Globus to an endpoint of your choice (a scripted version of this step is sketched below)
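Step 3 can also be scripted rather than driven from the portal page. This is a minimal sketch, assuming a TransferClient tc authenticated as in the Petrel example above; the endpoint UUIDs and paths are hypothetical placeholders standing in for the values shown on the portal's transfer page.

# Minimal sketch of Step 3: submit a recursive Globus transfer of a selected
# data product to an endpoint of your choice. All UUIDs and paths are placeholders.
import globus_sdk

def transfer_data_product(tc, source_endpoint, source_path,
                          dest_endpoint, dest_path):
    """Submit a recursive transfer and return the Globus task ID."""
    tdata = globus_sdk.TransferData(
        tc,
        source_endpoint,
        dest_endpoint,
        label="HACC data product download",
        sync_level="checksum",  # skip files already present and unchanged
    )
    tdata.add_item(source_path, dest_path, recursive=True)
    return tc.submit_transfer(tdata)["task_id"]

# Usage (identifiers are hypothetical):
# task_id = transfer_data_product(
#     tc,
#     source_endpoint="PORTAL-SOURCE-ENDPOINT-UUID",
#     source_path="/simulation_project/data_product/",
#     dest_endpoint="YOUR-ENDPOINT-UUID",
#     dest_path="/local/scratch/hacc/",
# )
# print("Track progress at https://app.globus.org/activity/" + task_id)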
15. “The purpose of computing is insight, not numbers”
- Richard Hamming