SlideShare a Scribd company logo
Advances in Scalable Adaptive
Mesh Refinement (AMR) at
Extreme Scales
Prof. Michael L Norman
Director, San Diego Supercomputer Center
University of California, San Diego
Developer: Dr. James Bordner
Supported by NSF grants SI2-SSE-1440709, PHY-1104819 and AST-0808184.
10/19/2017 M. L. Norman - HPC China 2017, Hefei 1
My intrepid partner in this journey
• James Bordner
• PhD CS UIUC, 1999
• C++ programmer
extraordinaire
• Enzo-P/Cello is entirely his
design and implementation
10/19/2017 M. L. Norman - Charm++ Workshop 2017 2
Motivation : Extreme Scale Numerical Cosmology
• Dark matter only N-body
simulations have crossed the
1012 particle threshold on the
world’s largest
supercomputers
• Hydrodynamic cosmology
applications are lagging behind
N-body simulations
• This is due to the lack of
extreme scale AMR
frameworks
1 trillion particle dark matter simulation on
IBM BG/Q, Habib et al. (2013)
10/19/2017 M. L. Norman - HPC China 2017, Hefei 3
Motivation: Hydrodynamic Cosmological
Simulations of the First Galaxies
10/19/2017 M. L. Norman - HPC China 2017, Hefei 4
Norman & O’Shea, NCSA Blue Waters
Enzo: AMR Hydrodynamic Cosmology Code
http://enzo-project.org
• Enzo code under
continuous development
since 1994
– First hydrodynamic
cosmological AMR code
– Hundreds of users
• Rich set of physics solvers
(hydro, N-body, radiation
transport, chemistry,…)
• Have done simulations with
1012 dynamic range and 42
levels
First Stars
First Galaxies Reionization
10/19/2017 M. L. Norman - HPC China 2017, Hefei 5
Extreme AMR: 1012 dynamic range from 42
levels of refinement
• Gordon Bell prize
finalist 2001
• Formation of the first
star in the universe
10/19/2017 M. L. Norman - HPC China 2017, Hefei 6
Bryan, Norman & Abel 2001
Enzo uses structured AMR (Berger & Colella 1989)
10/19/2017 M. L. Norman - HPC China 2arrier017, Hefei 7
• Unequal patch sizes
• Arbitrary patch locations
• Complex parent-child-sibling relationships
 barriers to scalability
Structured AMR produces
Norman & Bryan 1998
time
∆t
∆t/2 ∆t/2
∆t/4 ∆t/4 ∆t/4 ∆t/4
“W cycle”
Serialization over level
updates also limits
scalability and performance
10/19/2017 M. L. Norman - HPC China 2017, Hefei 8
Adopted Strategy
• Keep the best part of Enzo (numerical solvers) and
replace the AMR infrastructure
• Implement using modern OOP best practices for
modularity and extensibility
• Use the best available scalable AMR algorithm
• Move from bulk synchronous to data-driven
asynchronous execution model to support patch
adaptive timestepping
• Leverage parallel runtimes that support this execution
model, and have a path to exascale
• Make AMR software library application-independent so
others can use it
10/19/2017 M. L. Norman - HPC China 2017, Hefei 10
Software Architecture
10/19/2017 M. L. Norman - HPC China 2017, Hefei 11
Enzo numerical solvers (Enzo-P)
Forest-of-octrees AMR (Cello)
Charm++
Hardware (heterogeneous, hierarchical)
(C++)
10/19/2017 M. L. Norman - HPC China 2017, Hefei 13
Object-oriented design implements “separation of concerns”, enhancing extensibility,
maintainability, understandability
Forest (=Array) of Octrees
Burstedde, Wilcox, Gattas 2011
10/19/2017 M. L. Norman - HPC China 2017, Hefei 14
2 x 2 x 2 trees 6 x 2 x 2 trees
unrefined
tree
refined
tree
p4est weak scaling: mantle convection
Burstedde et al. (2010), Gordon Bell prize finalist paper
10/19/2017 M. L. Norman - HPC China 2017, Hefei 15
What makes it so scalable?
Fully distributed data structure; no parent-child
10/19/2017 M. L. Norman - HPC China 2017, Hefei 16
Burstedde, Wilcox, Gattas 2011
Charm++
10/19/2017 M. L. Norman - HPC China 2017, Hefei 17
(Laxmikant Kale et al. PPL/UIUC)
10/19/2017 M. L. Norman - HPC China 2017, Hefei 18
How does Cello use Charm++ to
implement FOT?
• A forest is array of octrees
of arbitrary size K x L x M
• An octree has leaf nodes
which are blocks (N x N x N)
• Each block is a chare (unit
of sequential work)
• The entire FOT is stored as a
chare array using a bit index
scheme
• Chare arrays are fully
distributed data structures
in Charm++
10/19/2017 M. L. Norman - HPC China 2017, Hefei 19
2 x 2 x 2 tree
N x N x N block
10/19/2017 M. L. Norman - HPC China 2017, Hefei
20
• Each leaf node of the tree is a block
• Each block is a chare
• The forest of trees is represented as a chare array
Demonstration of Enzo-P/Cello
10/19/2017 M. L. Norman - HPC China 2017, Hefei 22
PPM gas dynamics
Demonstration of Enzo-P/Cello
10/19/2017 M. L. Norman - HPC China 2017, Hefei 24
Quadtree AMR mesh – 5 levels
Effective resolution: 4096 x 2048 mesh
Every square is a 162 block, communicating and executing asynchronously with neighbors
WEAK SCALING TEST –
HOW BIG AN AMR MESH CAN WE DO?
10/19/2017 M. L. Norman - HPC China 2017, Hefei 29
A 3x5 array of blast waves
10/19/2017 M. L. Norman - HPC China 2017, Hefei 30
Total energy
Unit cell: 1 tree per core
201 blocks/tree, 323 cells/block
10/19/2017 M. L. Norman - Charm++ Workshop 2017 32
Weak scaling test: Alphabet Soup
array of supersonic blast waves mesh
N trees Np =
cores
Blocks/
Chares
Cells
13 1 201 6.6 M
23 8 1,608
33 27 5,427
43 64 12,864
53 125
63 216
83 512
103 1000 201,000
163 4096
243 13824
323 32768
403 64000 12.9M
483 110592 22.2M 0.7T
543 157464 31.6M 1.0T
643 262144 52.7M 1.7T
10/19/2017 M. L. Norman - Charm++ Workshop 2017 33
Largest
AMR
Simulation
in the
world
1.7 trillion
cells
262K cores
on NCSA
Blue Waters
M. L. Norman - Charm++ Workshop 2017 34
html10/19/2017
Cello fcns
Enzo-P solver
10/19/2017 M. L. Norman - Charm++ Workshop 2017 35
262k cores
Charm++
messaging
bottleneck
10/19/2017 M. L. Norman - Charm++ Workshop 2017 36
p4est
Toward Extreme Scale AMR
Hydrodynamic Cosmology
10/19/2017 M. L. Norman - HPC China 2017, Hefei 37
Scalable
hydrodynamics
Scalable N-body
dynamics
Scalable
gravity
10/19/2017 M. L. Norman - HPC China 2017, Hefei 38
Demonstration of Enzo-P/Cello
10/19/2017 M. L. Norman - HPC China 2017, Hefei 39
Tracer particles
10/19/2017 M. L. Norman - HPC China 2017, Hefei 40
110k cores
10/19/2017 M. L. Norman - SJTU Intel Briefing 41
10/19/2017 M. L. Norman - HPC China 2017, Hefei 42
10/19/2017 M. L. Norman - SJTU Intel Briefing 43
64k cores
10/19/2017 M. L. Norman - SJTU Intel Briefing 44
64k cores
Hydrodynamic Cosmology Scaling Tests
• Uniform grid only
• Weak and strong scaling
• 323 blocks/chare
• 1, 8, 64 chares/core
• 643 to 20483 meshes
• p=8 to 128k cores
10/19/2017 M. L. Norman - HPC China 2017, Hefei 45
Density projection in 5123 simulation
Scalable Hydrodynamic Cosmology
10/19/2017 M. L. Norman - SJTU Intel Briefing 46
64k cores
Scalable Hydrodynamic Cosmology
10/19/2017 M. L. Norman - SJTU Intel Briefing 47
64k cores
8k cores
Toward Extreme Scale AMR
Hydrodynamic Cosmology
10/19/2017 M. L. Norman - HPC China 2017, Hefei 48
Scalable
hydrodynamics
Scalable N-body
dynamics
Scalable
gravity
uniform
grid only
Path Forward
• Finish debugging scalable AMR gravity solver
• Implement block-local timestepping
• Experiment with Charm’s built-in DLB schemes
and dynamic execution
• Do a 1 trillion cell/particle hydro cosmology
simulation to demonstrate capability
• Publish!
10/19/2017 M. L. Norman - HPC China 2017, Hefei 57
Resources
• Project site: http://www.cello-project.org
• Source code: https://bitbucket.org/cello-project
• Tutorials: on project site
10/19/2017 M. L. Norman - HPC China 2017, Hefei 58
10/19/2017 M. L. Norman - HPC China 2017, Hefei 59

More Related Content

Similar to Scalable AMR - HPC China 2017, Hefei China

An Architecture for Implementing Private Local Automation Clouds Built by CPS
An Architecture for Implementing Private Local Automation Clouds Built by CPSAn Architecture for Implementing Private Local Automation Clouds Built by CPS
An Architecture for Implementing Private Local Automation Clouds Built by CPS
FAST-Lab. Factory Automation Systems and Technologies Laboratory, Tampere University of Technology
 
Programming Languages & Tools for Higher Performance & Productivity
Programming Languages & Tools for Higher Performance & ProductivityProgramming Languages & Tools for Higher Performance & Productivity
Programming Languages & Tools for Higher Performance & Productivity
Linaro
 
Processing Large ToF-SIMS Datasets For The Study Of Surface Segregation Of Po...
Processing Large ToF-SIMS Datasets For The Study Of Surface Segregation Of Po...Processing Large ToF-SIMS Datasets For The Study Of Surface Segregation Of Po...
Processing Large ToF-SIMS Datasets For The Study Of Surface Segregation Of Po...
Gustavo Ferraz Trindade
 
Processing Large ToF-SIMS Datasets
Processing Large ToF-SIMS DatasetsProcessing Large ToF-SIMS Datasets
Processing Large ToF-SIMS Datasets
Gustavo Ferraz Trindade
 
1Spatial: FME World Tour London: Digital surveying with FME
1Spatial: FME World Tour London: Digital surveying with FME1Spatial: FME World Tour London: Digital surveying with FME
1Spatial: FME World Tour London: Digital surveying with FME
1Spatial
 
F1041028_George_Chen_Resume_9_with_Publications_Training
F1041028_George_Chen_Resume_9_with_Publications_TrainingF1041028_George_Chen_Resume_9_with_Publications_Training
F1041028_George_Chen_Resume_9_with_Publications_TrainingWei-Su Chen
 
Session 1 - The Current Landscape of Big Data Benchmarks
Session 1 - The Current Landscape of Big Data BenchmarksSession 1 - The Current Landscape of Big Data Benchmarks
Session 1 - The Current Landscape of Big Data Benchmarks
DataBench
 
Yuki Oyama - Markov assignment for a pedestrian activity-based network design...
Yuki Oyama - Markov assignment for a pedestrian activity-based network design...Yuki Oyama - Markov assignment for a pedestrian activity-based network design...
Yuki Oyama - Markov assignment for a pedestrian activity-based network design...
Yuki Oyama
 
Update on the Mont-Blanc Project for ARM-based HPC
Update on the Mont-Blanc Project for ARM-based HPCUpdate on the Mont-Blanc Project for ARM-based HPC
Update on the Mont-Blanc Project for ARM-based HPC
inside-BigData.com
 
Recent developments in Deep Learning
Recent developments in Deep LearningRecent developments in Deep Learning
Recent developments in Deep Learning
Brahim HAMADICHAREF
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
inside-BigData.com
 
Cs231n 2017 lecture9 CNN Architecture
Cs231n 2017 lecture9 CNN ArchitectureCs231n 2017 lecture9 CNN Architecture
Cs231n 2017 lecture9 CNN Architecture
Yanbin Kong
 
C hi mad_phasefieldworkshop(1)
C hi mad_phasefieldworkshop(1)C hi mad_phasefieldworkshop(1)
C hi mad_phasefieldworkshop(1)
PFHub PFHub
 
Introduction to neural networks and Keras
Introduction to neural networks and KerasIntroduction to neural networks and Keras
Introduction to neural networks and Keras
Jie He
 
Cigos_carbonation.pptx
Cigos_carbonation.pptxCigos_carbonation.pptx
Cigos_carbonation.pptx
Tran Van Quan
 
P4 to OpenDataPlane Compiler - BUD17-304
P4 to OpenDataPlane Compiler - BUD17-304P4 to OpenDataPlane Compiler - BUD17-304
P4 to OpenDataPlane Compiler - BUD17-304
Linaro
 
Kernel Recipes 2017 - EBPF and XDP - Eric Leblond
Kernel Recipes 2017 - EBPF and XDP - Eric LeblondKernel Recipes 2017 - EBPF and XDP - Eric Leblond
Kernel Recipes 2017 - EBPF and XDP - Eric Leblond
Anne Nicolas
 
External Aerodynamic Optimization Using ANSYS Mesh Morphing
External Aerodynamic Optimization Using ANSYS Mesh MorphingExternal Aerodynamic Optimization Using ANSYS Mesh Morphing
External Aerodynamic Optimization Using ANSYS Mesh Morphing
Marco E. Biancolini
 
Collaborative editing (and more) in CERNBox
Collaborative editing (and more) in CERNBoxCollaborative editing (and more) in CERNBox
Collaborative editing (and more) in CERNBox
Giuseppe Lo Presti
 
Kafka Summit SF 2017 - Accelerating Particles to Explore the Mysteries of the...
Kafka Summit SF 2017 - Accelerating Particles to Explore the Mysteries of the...Kafka Summit SF 2017 - Accelerating Particles to Explore the Mysteries of the...
Kafka Summit SF 2017 - Accelerating Particles to Explore the Mysteries of the...
confluent
 

Similar to Scalable AMR - HPC China 2017, Hefei China (20)

An Architecture for Implementing Private Local Automation Clouds Built by CPS
An Architecture for Implementing Private Local Automation Clouds Built by CPSAn Architecture for Implementing Private Local Automation Clouds Built by CPS
An Architecture for Implementing Private Local Automation Clouds Built by CPS
 
Programming Languages & Tools for Higher Performance & Productivity
Programming Languages & Tools for Higher Performance & ProductivityProgramming Languages & Tools for Higher Performance & Productivity
Programming Languages & Tools for Higher Performance & Productivity
 
Processing Large ToF-SIMS Datasets For The Study Of Surface Segregation Of Po...
Processing Large ToF-SIMS Datasets For The Study Of Surface Segregation Of Po...Processing Large ToF-SIMS Datasets For The Study Of Surface Segregation Of Po...
Processing Large ToF-SIMS Datasets For The Study Of Surface Segregation Of Po...
 
Processing Large ToF-SIMS Datasets
Processing Large ToF-SIMS DatasetsProcessing Large ToF-SIMS Datasets
Processing Large ToF-SIMS Datasets
 
1Spatial: FME World Tour London: Digital surveying with FME
1Spatial: FME World Tour London: Digital surveying with FME1Spatial: FME World Tour London: Digital surveying with FME
1Spatial: FME World Tour London: Digital surveying with FME
 
F1041028_George_Chen_Resume_9_with_Publications_Training
F1041028_George_Chen_Resume_9_with_Publications_TrainingF1041028_George_Chen_Resume_9_with_Publications_Training
F1041028_George_Chen_Resume_9_with_Publications_Training
 
Session 1 - The Current Landscape of Big Data Benchmarks
Session 1 - The Current Landscape of Big Data BenchmarksSession 1 - The Current Landscape of Big Data Benchmarks
Session 1 - The Current Landscape of Big Data Benchmarks
 
Yuki Oyama - Markov assignment for a pedestrian activity-based network design...
Yuki Oyama - Markov assignment for a pedestrian activity-based network design...Yuki Oyama - Markov assignment for a pedestrian activity-based network design...
Yuki Oyama - Markov assignment for a pedestrian activity-based network design...
 
Update on the Mont-Blanc Project for ARM-based HPC
Update on the Mont-Blanc Project for ARM-based HPCUpdate on the Mont-Blanc Project for ARM-based HPC
Update on the Mont-Blanc Project for ARM-based HPC
 
Recent developments in Deep Learning
Recent developments in Deep LearningRecent developments in Deep Learning
Recent developments in Deep Learning
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
 
Cs231n 2017 lecture9 CNN Architecture
Cs231n 2017 lecture9 CNN ArchitectureCs231n 2017 lecture9 CNN Architecture
Cs231n 2017 lecture9 CNN Architecture
 
C hi mad_phasefieldworkshop(1)
C hi mad_phasefieldworkshop(1)C hi mad_phasefieldworkshop(1)
C hi mad_phasefieldworkshop(1)
 
Introduction to neural networks and Keras
Introduction to neural networks and KerasIntroduction to neural networks and Keras
Introduction to neural networks and Keras
 
Cigos_carbonation.pptx
Cigos_carbonation.pptxCigos_carbonation.pptx
Cigos_carbonation.pptx
 
P4 to OpenDataPlane Compiler - BUD17-304
P4 to OpenDataPlane Compiler - BUD17-304P4 to OpenDataPlane Compiler - BUD17-304
P4 to OpenDataPlane Compiler - BUD17-304
 
Kernel Recipes 2017 - EBPF and XDP - Eric Leblond
Kernel Recipes 2017 - EBPF and XDP - Eric LeblondKernel Recipes 2017 - EBPF and XDP - Eric Leblond
Kernel Recipes 2017 - EBPF and XDP - Eric Leblond
 
External Aerodynamic Optimization Using ANSYS Mesh Morphing
External Aerodynamic Optimization Using ANSYS Mesh MorphingExternal Aerodynamic Optimization Using ANSYS Mesh Morphing
External Aerodynamic Optimization Using ANSYS Mesh Morphing
 
Collaborative editing (and more) in CERNBox
Collaborative editing (and more) in CERNBoxCollaborative editing (and more) in CERNBox
Collaborative editing (and more) in CERNBox
 
Kafka Summit SF 2017 - Accelerating Particles to Explore the Mysteries of the...
Kafka Summit SF 2017 - Accelerating Particles to Explore the Mysteries of the...Kafka Summit SF 2017 - Accelerating Particles to Explore the Mysteries of the...
Kafka Summit SF 2017 - Accelerating Particles to Explore the Mysteries of the...
 

Recently uploaded

Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
Enterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptxEnterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptx
QuickwayInfoSystems3
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
timtebeek1
 
Nidhi Software Price. Fact , Costs, Tips
Nidhi Software Price. Fact , Costs, TipsNidhi Software Price. Fact , Costs, Tips
Nidhi Software Price. Fact , Costs, Tips
vrstrong314
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Mind IT Systems
 
Game Development with Unity3D (Game Development lecture 3)
Game Development  with Unity3D (Game Development lecture 3)Game Development  with Unity3D (Game Development lecture 3)
Game Development with Unity3D (Game Development lecture 3)
abdulrafaychaudhry
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
Aftab Hussain
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
XfilesPro
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
Boni García
 
Pro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp BookPro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp Book
abdulrafaychaudhry
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
Hornet Dynamics
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Łukasz Chruściel
 

Recently uploaded (20)

Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
Enterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptxEnterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptx
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
 
Nidhi Software Price. Fact , Costs, Tips
Nidhi Software Price. Fact , Costs, TipsNidhi Software Price. Fact , Costs, Tips
Nidhi Software Price. Fact , Costs, Tips
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
 
Game Development with Unity3D (Game Development lecture 3)
Game Development  with Unity3D (Game Development lecture 3)Game Development  with Unity3D (Game Development lecture 3)
Game Development with Unity3D (Game Development lecture 3)
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
 
Pro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp BookPro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp Book
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
 

Scalable AMR - HPC China 2017, Hefei China

  • 1. Advances in Scalable Adaptive Mesh Refinement (AMR) at Extreme Scales Prof. Michael L Norman Director, San Diego Supercomputer Center University of California, San Diego Developer: Dr. James Bordner Supported by NSF grants SI2-SSE-1440709, PHY-1104819 and AST-0808184. 10/19/2017 M. L. Norman - HPC China 2017, Hefei 1
  • 2. My intrepid partner in this journey • James Bordner • PhD CS UIUC, 1999 • C++ programmer extraordinaire • Enzo-P/Cello is entirely his design and implementation 10/19/2017 M. L. Norman - Charm++ Workshop 2017 2
  • 3. Motivation : Extreme Scale Numerical Cosmology • Dark matter only N-body simulations have crossed the 1012 particle threshold on the world’s largest supercomputers • Hydrodynamic cosmology applications are lagging behind N-body simulations • This is due to the lack of extreme scale AMR frameworks 1 trillion particle dark matter simulation on IBM BG/Q, Habib et al. (2013) 10/19/2017 M. L. Norman - HPC China 2017, Hefei 3
  • 4. Motivation: Hydrodynamic Cosmological Simulations of the First Galaxies 10/19/2017 M. L. Norman - HPC China 2017, Hefei 4 Norman & O’Shea, NCSA Blue Waters
  • 5. Enzo: AMR Hydrodynamic Cosmology Code http://enzo-project.org • Enzo code under continuous development since 1994 – First hydrodynamic cosmological AMR code – Hundreds of users • Rich set of physics solvers (hydro, N-body, radiation transport, chemistry,…) • Have done simulations with 1012 dynamic range and 42 levels First Stars First Galaxies Reionization 10/19/2017 M. L. Norman - HPC China 2017, Hefei 5
  • 6. Extreme AMR: 1012 dynamic range from 42 levels of refinement • Gordon Bell prize finalist 2001 • Formation of the first star in the universe 10/19/2017 M. L. Norman - HPC China 2017, Hefei 6 Bryan, Norman & Abel 2001
  • 7. Enzo uses structured AMR (Berger & Colella 1989) 10/19/2017 M. L. Norman - HPC China 2arrier017, Hefei 7 • Unequal patch sizes • Arbitrary patch locations • Complex parent-child-sibling relationships  barriers to scalability Structured AMR produces Norman & Bryan 1998
  • 8. time ∆t ∆t/2 ∆t/2 ∆t/4 ∆t/4 ∆t/4 ∆t/4 “W cycle” Serialization over level updates also limits scalability and performance 10/19/2017 M. L. Norman - HPC China 2017, Hefei 8
  • 9. Adopted Strategy • Keep the best part of Enzo (numerical solvers) and replace the AMR infrastructure • Implement using modern OOP best practices for modularity and extensibility • Use the best available scalable AMR algorithm • Move from bulk synchronous to data-driven asynchronous execution model to support patch adaptive timestepping • Leverage parallel runtimes that support this execution model, and have a path to exascale • Make AMR software library application-independent so others can use it 10/19/2017 M. L. Norman - HPC China 2017, Hefei 10
  • 10. Software Architecture 10/19/2017 M. L. Norman - HPC China 2017, Hefei 11 Enzo numerical solvers (Enzo-P) Forest-of-octrees AMR (Cello) Charm++ Hardware (heterogeneous, hierarchical)
  • 11. (C++) 10/19/2017 M. L. Norman - HPC China 2017, Hefei 13 Object-oriented design implements “separation of concerns”, enhancing extensibility, maintainability, understandability
  • 12. Forest (=Array) of Octrees Burstedde, Wilcox, Gattas 2011 10/19/2017 M. L. Norman - HPC China 2017, Hefei 14 2 x 2 x 2 trees 6 x 2 x 2 trees unrefined tree refined tree
  • 13. p4est weak scaling: mantle convection Burstedde et al. (2010), Gordon Bell prize finalist paper 10/19/2017 M. L. Norman - HPC China 2017, Hefei 15
  • 14. What makes it so scalable? Fully distributed data structure; no parent-child 10/19/2017 M. L. Norman - HPC China 2017, Hefei 16 Burstedde, Wilcox, Gattas 2011
  • 15. Charm++ 10/19/2017 M. L. Norman - HPC China 2017, Hefei 17 (Laxmikant Kale et al. PPL/UIUC)
  • 16. 10/19/2017 M. L. Norman - HPC China 2017, Hefei 18
  • 17. How does Cello use Charm++ to implement FOT? • A forest is array of octrees of arbitrary size K x L x M • An octree has leaf nodes which are blocks (N x N x N) • Each block is a chare (unit of sequential work) • The entire FOT is stored as a chare array using a bit index scheme • Chare arrays are fully distributed data structures in Charm++ 10/19/2017 M. L. Norman - HPC China 2017, Hefei 19 2 x 2 x 2 tree N x N x N block
  • 18. 10/19/2017 M. L. Norman - HPC China 2017, Hefei 20 • Each leaf node of the tree is a block • Each block is a chare • The forest of trees is represented as a chare array
  • 19. Demonstration of Enzo-P/Cello 10/19/2017 M. L. Norman - HPC China 2017, Hefei 22 PPM gas dynamics
  • 20. Demonstration of Enzo-P/Cello 10/19/2017 M. L. Norman - HPC China 2017, Hefei 24 Quadtree AMR mesh – 5 levels Effective resolution: 4096 x 2048 mesh Every square is a 162 block, communicating and executing asynchronously with neighbors
  • 21. WEAK SCALING TEST – HOW BIG AN AMR MESH CAN WE DO? 10/19/2017 M. L. Norman - HPC China 2017, Hefei 29
  • 22. A 3x5 array of blast waves 10/19/2017 M. L. Norman - HPC China 2017, Hefei 30 Total energy
  • 23. Unit cell: 1 tree per core 201 blocks/tree, 323 cells/block 10/19/2017 M. L. Norman - Charm++ Workshop 2017 32
  • 24. Weak scaling test: Alphabet Soup array of supersonic blast waves mesh N trees Np = cores Blocks/ Chares Cells 13 1 201 6.6 M 23 8 1,608 33 27 5,427 43 64 12,864 53 125 63 216 83 512 103 1000 201,000 163 4096 243 13824 323 32768 403 64000 12.9M 483 110592 22.2M 0.7T 543 157464 31.6M 1.0T 643 262144 52.7M 1.7T 10/19/2017 M. L. Norman - Charm++ Workshop 2017 33
  • 25. Largest AMR Simulation in the world 1.7 trillion cells 262K cores on NCSA Blue Waters M. L. Norman - Charm++ Workshop 2017 34 html10/19/2017
  • 26. Cello fcns Enzo-P solver 10/19/2017 M. L. Norman - Charm++ Workshop 2017 35 262k cores
  • 27. Charm++ messaging bottleneck 10/19/2017 M. L. Norman - Charm++ Workshop 2017 36 p4est
  • 28. Toward Extreme Scale AMR Hydrodynamic Cosmology 10/19/2017 M. L. Norman - HPC China 2017, Hefei 37 Scalable hydrodynamics Scalable N-body dynamics Scalable gravity
  • 29. 10/19/2017 M. L. Norman - HPC China 2017, Hefei 38
  • 30. Demonstration of Enzo-P/Cello 10/19/2017 M. L. Norman - HPC China 2017, Hefei 39 Tracer particles
  • 31. 10/19/2017 M. L. Norman - HPC China 2017, Hefei 40 110k cores
  • 32. 10/19/2017 M. L. Norman - SJTU Intel Briefing 41
  • 33. 10/19/2017 M. L. Norman - HPC China 2017, Hefei 42
  • 34. 10/19/2017 M. L. Norman - SJTU Intel Briefing 43 64k cores
  • 35. 10/19/2017 M. L. Norman - SJTU Intel Briefing 44 64k cores
  • 36. Hydrodynamic Cosmology Scaling Tests • Uniform grid only • Weak and strong scaling • 323 blocks/chare • 1, 8, 64 chares/core • 643 to 20483 meshes • p=8 to 128k cores 10/19/2017 M. L. Norman - HPC China 2017, Hefei 45 Density projection in 5123 simulation
  • 37. Scalable Hydrodynamic Cosmology 10/19/2017 M. L. Norman - SJTU Intel Briefing 46 64k cores
  • 38. Scalable Hydrodynamic Cosmology 10/19/2017 M. L. Norman - SJTU Intel Briefing 47 64k cores 8k cores
  • 39. Toward Extreme Scale AMR Hydrodynamic Cosmology 10/19/2017 M. L. Norman - HPC China 2017, Hefei 48 Scalable hydrodynamics Scalable N-body dynamics Scalable gravity uniform grid only
  • 40. Path Forward • Finish debugging scalable AMR gravity solver • Implement block-local timestepping • Experiment with Charm’s built-in DLB schemes and dynamic execution • Do a 1 trillion cell/particle hydro cosmology simulation to demonstrate capability • Publish! 10/19/2017 M. L. Norman - HPC China 2017, Hefei 57
  • 41. Resources • Project site: http://www.cello-project.org • Source code: https://bitbucket.org/cello-project • Tutorials: on project site 10/19/2017 M. L. Norman - HPC China 2017, Hefei 58
  • 42. 10/19/2017 M. L. Norman - HPC China 2017, Hefei 59