Modeling Systems at the end of Dennard Scaling
Future of Fluids: Big Data and Big Computation
Aviation Forum
Atlanta, Georgia
V. Balaji
NOAA/GFDL and Princeton University
28 June 2018
V. Balaji (balaji@princeton.edu) The Post-Dennard Era 28 June 2018 1 / 35
Outline
1 Earth system modeling
2 Hardware evolution at the end of Dennard scaling
The end of Dennard scaling
Specialized and commodity computing
Increased concurrency, slower arithmetic
Deep learning is an industry driver
3 Approaches to modeling post-Dennard
Uncertainty exploration
Use fewer bits
Generate low-dimensional representations from
higher-dimensional
4 Ideas and challenges
1 Earth system modeling
Atmospheric response to doubled CO2
Fig 5 from Manabe and Wetherald (1975), equilibrium response to
doubled CO2.
History of GFDL Computing
Courtesy Brian Gross, NOAA/GFDL.
NGGPS: Next-Generation Global Prediction System
FV3 dynamical core from GFDL for the next-generation forecast model
(target: 3 km non-hydrostatic in 10 years running at ∼ 200 d/d)
Passing the climate Turing test?
We may be able to simulate everything in great detail, but do we
understand how it works?
2 Hardware evolution at the end of Dennard scaling
Moore’s Law and End of Dennard scaling
Figure courtesy Moore 2011: Data processing in exascale-class
systems.
Processor concurrency: Intel Xeon-Phi.
Fine-grained thread concurrency: Nvidia GPU.
Top500 revisited
HPCG/HPL ratio is a measure of “percent of peak” (Dongarra and
Heroux 2013).
All recent HPC acquisitions in climate/weather have been on
conventional Intel Xeon (see Balaji et al 2017).
The inexorable triumph of commodity computing
From The Platform, Hemsoth (2015).
The "Navier-Stokes Computer" of 1986
“The Navier-Stokes computer (NSC)
has been developed for solving
problems in fluid mechanics involving
complex flow simulations that require
more speed and capacity than
provided by current and proposed
Class VI supercomputers. The
machine is a parallel processing
supercomputer with several new
architectural elements which can be
programmed to address a wide range
of problems meeting the following
criteria: (1) the problem is
numerically intensive, and (2) the
code makes use of long vectors.”
Nosenchuck and Littman (1986)
The Caltech "Cosmic Cube" (1986)
“Caltech is at its best blazing new trails; we are not the best place for
programmatic research that dots i’s and crosses t’s”. Geoffrey Fox,
pioneer of the Caltech Concurrent Computation Program, in 1986.
Beowulf clusters
Power-8 with NVLink
Figure courtesy IBM.
KNL Overview
Figure courtesy Intel.
Processors for Deep Learning
Deep learning is a neural-network (NN) approach with multiple hidden
layers. Figure courtesy NVidia.
Google TPU (Tensor Processing Unit)
Figure courtesy Google.
Google TPU (Tensor Processing Unit)
Hardware pipelining of steps in matrix-multiply. Figure courtesy
Google.
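One way to picture hardware pipelining of a matrix multiply is a systolic array, in which operands are handed between neighbouring multiply-accumulate cells on every clock tick. The sketch below simulates an output-stationary schedule in software. It is a generic illustration and not a description of Google's design; the function name and the cycle accounting are assumptions made for this example.

```python
def systolic_matmul(a, b):
    """Multiply a (n x k) by b (k x m) following a skewed systolic schedule.

    Cell (i, j) holds the accumulator for c[i][j]. Rows of `a` enter from the
    left and columns of `b` from the top, each delayed by one cycle per row or
    column, so operands a[i][s] and b[s][j] meet in cell (i, j) at cycle
    t = i + j + s.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    cycles = n + m + k - 2  # last operands meet at t = (n-1) + (m-1) + (k-1)
    for t in range(cycles):
        for i in range(n):
            for j in range(m):
                s = t - i - j  # which operand pair has reached this cell
                if 0 <= s < k:
                    c[i][j] += a[i][s] * b[s][j]
    return c, cycles


product, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product, "in", cycles, "cycles")
```

Every cell does at most one multiply-accumulate per cycle, so the whole array stays busy once the pipeline has filled; the cost is the fill and drain latency reflected in the cycle count.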
3 Approaches to modeling post-Dennard
No separation of "large" and "small" scales
Nastrom and Gage (1985).
Multi-model “skill scores”
Based on RMS error of surface temperature and precipitation. (Fig. 3
from Knutti et al, GRL, 2013).
Multi-model skill scores?
More complex models that show the same skill represent an
“advance”!
Model tuning
Model tuning or “calibration” consists of reducing overall model bias
(usually relative to 20th century climatology) by modifying parameters.
In principle, minimizing some cost function:
$$C(p_1, p_2, \ldots) = \sum_{i=1}^{N} \omega_i \left\| \phi_i - \phi_i^{\mathrm{obs}} \right\|$$
Usually the p must be chosen within some observed or theoretical
range $p_{\min} \le p \le p_{\max}$.
“Fudge factors” (applying known wrong values) are generally frowned
upon (see the Shackley et al 1999 discussion of the history of “flux
adjustments”). More on that later...
The choice of $\omega_i$ is part of the lab’s “culture”!
The choice of $\phi_i^{\mathrm{obs}}$ is also troublesome:
overlap between “tuning” metrics and “evaluation” metrics.
“Over-tuning”: remember “reality” is but one ensemble member!
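The cost function above can be sketched in a few lines. This is a minimal illustration, not any lab's tuning procedure: the two-parameter `toy_model`, the "observed" values, the weights, the parameter ranges and the brute-force grid search are all hypothetical, and the absolute value stands in for whatever misfit norm is actually chosen.

```python
from itertools import product


def cost(phi_model, phi_obs, weights):
    """Weighted misfit C = sum_i w_i * |phi_i - phi_i_obs|."""
    return sum(w * abs(m - o) for w, m, o in zip(weights, phi_model, phi_obs))


def toy_model(p1, p2):
    """Hypothetical stand-in for a climate model: parameters -> two metrics."""
    return [2.0 * p1 + p2, p1 - p2]


def tune(phi_obs, weights, bounds, steps=21):
    """Brute-force search on a grid restricted to p_min <= p <= p_max."""
    axes = [
        [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
        for lo, hi in bounds
    ]
    return min(product(*axes), key=lambda p: cost(toy_model(*p), phi_obs, weights))


phi_obs = [3.0, 0.0]               # made-up "observations"
bounds = [(0.0, 2.0), (0.0, 2.0)]  # made-up admissible parameter ranges

best_equal = tune(phi_obs, weights=[1.0, 1.0], bounds=bounds)
best_skewed = tune(phi_obs, weights=[1.0, 0.0], bounds=bounds)
print("equal weights:", best_equal)
print("second metric ignored:", best_skewed)
```

Changing the weights changes which parameter set is selected, even though the model and the observations are the same. That is the sense in which the choice of weights shapes the tuned model.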
Model choice: culture and constraints
GFDL models built on FMS. Goals: dec-cen, carbon cycle,
seasonal prediction, decadal predictability, TC climatology,
aerosol-cloud feedbacks, ozone climate, regional climate.
IITM (8 SYPD on 164p; 500 CHSY). Goals: DECK experiments,
monsoons under climate change.
IPSL: IPSLCM6-VLR (38 SYPD on 160p; 100 CHSY) to
IPSLCM6-LR (6 SYPD on 550p; 2200 CHSY). Goals: WCRP
grand challenge on clouds; dec-cen climate change; carbon cycle;
ozone climate; paleoclimate.
Strategies of model building (choices of $\omega_i$)
Thought experiment: if two different labs started at the same point in
Knutti’s genealogy, would they build the same model?
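The throughput figures quoted above are mutually consistent if CHSY (core-hours per simulated year) is taken to be the core count times 24 hours divided by SYPD (simulated years per day), in the spirit of the CPMIP metrics of Balaji et al 2017. The sketch below checks this. It assumes that "p" denotes the number of cores or processors used, and the function name is illustrative only.

```python
def core_hours_per_simulated_year(cores, sypd):
    """CHSY: core-hours consumed to simulate one model year."""
    if sypd <= 0:
        raise ValueError("SYPD must be positive")
    return cores * 24.0 / sypd


# (label, cores, SYPD, CHSY as quoted on the slide)
quoted = [
    ("IITM", 164, 8, 500),
    ("IPSLCM6-VLR", 160, 38, 100),
    ("IPSLCM6-LR", 550, 6, 2200),
]

for label, cores, sypd, chsy_quoted in quoted:
    chsy = core_hours_per_simulated_year(cores, sypd)
    print(f"{label}: computed {chsy:.0f} CHSY, quoted {chsy_quoted}")
```

The computed values agree with the quoted ones to within rounding, which suggests the slide's CHSY numbers follow this definition.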
Objective methods of tuning?
Neelin et al (2010) construct “metamodels” to aid in multi-parameter
optimization. Metamodel generation is expensive (as in deep learning),
and varies with cost function.
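One simple form of metamodel is a low-order polynomial fitted to a handful of expensive runs and then minimized in place of the full model. The sketch below is a one-parameter toy under that assumption and is not the construction used by Neelin et al; `expensive_model` is a hypothetical stand-in for a model run reduced to a single cost value.

```python
def expensive_model(p):
    """Hypothetical stand-in for a full model run returning one scalar cost."""
    return (p - 0.3) ** 2 + 0.1 * p ** 3


def fit_quadratic(ps, ys):
    """Return (a, b, c) of y = a*p**2 + b*p + c through three sample points."""
    (p0, p1, p2), (y0, y1, y2) = ps, ys
    # Newton divided differences
    d01 = (y1 - y0) / (p1 - p0)
    d12 = (y2 - y1) / (p2 - p1)
    a = (d12 - d01) / (p2 - p0)
    b = d01 - a * (p0 + p1)
    c = y0 - a * p0 ** 2 - b * p0
    return a, b, c


def metamodel_minimum(a, b, p_min, p_max):
    """Minimize the fitted quadratic on [p_min, p_max]."""
    if a > 0:
        return min(max(-b / (2 * a), p_min), p_max)
    # concave or linear fit: the minimum sits on one of the bounds
    at_min = a * p_min ** 2 + b * p_min
    at_max = a * p_max ** 2 + b * p_max
    return p_min if at_min <= at_max else p_max


samples = [0.0, 0.5, 1.0]  # three "expensive" runs: low, mid and high values
a, b, c = fit_quadratic(samples, [expensive_model(p) for p in samples])
p_star = metamodel_minimum(a, b, samples[0], samples[-1])
print(f"metamodel optimum near p = {p_star:.3f}")
```

The fit costs three model runs here, and a different cost function means a different set of sample values and a new fit, which is why metamodel generation varies with the cost function.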
Low precision arithmetic for Deep Learning
Figure 1 from Gupta et al (2015).
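Gupta et al (2015) studied training with low-precision fixed-point numbers and stochastic rounding. The sketch below is not their implementation; it only illustrates, under arbitrary assumed values for the bit width and the increment, why the rounding mode matters when many small updates are accumulated.

```python
import math
import random


def round_nearest(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits (a fixed-point grid)."""
    scale = 2 ** frac_bits
    return math.floor(x * scale + 0.5) / scale


def round_stochastic(x, frac_bits, rng):
    """Round x down or up, rounding up with probability equal to the remainder."""
    scale = 2 ** frac_bits
    scaled = x * scale
    lower = math.floor(scaled)
    if rng.random() < scaled - lower:
        lower += 1
    return lower / scale


rng = random.Random(0)
increment, frac_bits, n = 0.001, 8, 10_000  # grid spacing is 2**-8, about 0.0039

acc_nearest = acc_stochastic = 0.0
for _ in range(n):
    acc_nearest = round_nearest(acc_nearest + increment, frac_bits)
    acc_stochastic = round_stochastic(acc_stochastic + increment, frac_bits, rng)

print("exact sum:          ", n * increment)
print("round to nearest:   ", acc_nearest)
print("stochastic rounding:", acc_stochastic)
```

With round-to-nearest, every increment smaller than half the grid spacing is lost, so the accumulator never moves. Stochastic rounding is unbiased in expectation, so the accumulated value tracks the exact sum up to random noise.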
Low precision arithmetic for Deep Learning
Figure courtesy NVidia. Low-precision arithmetic.
Irreproducible Computing, Inexact Hardware
Figure 1 from Düben et al, Phil. Trans. A, 2016. Which bits can we
allow to be “inexactly” flipped? Lorenz 96 as canonical test case of
non-linearity and chaos.
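The question of which bits matter can be explored with a small Lorenz 96 experiment. The sketch below is a hypothetical fault model chosen for illustration, not the experimental design of Düben et al: one chosen bit of a single state variable is flipped after every time step, and the result is compared with a clean run. The forcing F = 8, 40 variables, the RK4 integrator and the step size are conventional choices assumed here.

```python
import struct


def flip_bit(x, bit):
    """Flip one bit of a float64 (0 = lowest mantissa bit, 63 = sign bit)."""
    (as_int,) = struct.unpack("<Q", struct.pack("<d", x))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped


def lorenz96_tendency(x, forcing=8.0):
    """dX_k/dt = (X_{k+1} - X_{k-2}) * X_{k-1} - X_k + F, with cyclic indices."""
    n = len(x)
    return [
        (x[(k + 1) % n] - x[k - 2]) * x[k - 1] - x[k] + forcing
        for k in range(n)
    ]


def rk4_step(x, dt, forcing=8.0):
    """Advance the state one step with classical fourth-order Runge-Kutta."""
    def shifted(base, slope, h):
        return [b + h * s for b, s in zip(base, slope)]

    k1 = lorenz96_tendency(x, forcing)
    k2 = lorenz96_tendency(shifted(x, k1, dt / 2), forcing)
    k3 = lorenz96_tendency(shifted(x, k2, dt / 2), forcing)
    k4 = lorenz96_tendency(shifted(x, k3, dt), forcing)
    return [
        xi + dt / 6 * (a + 2 * b + 2 * c + d)
        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
    ]


def max_divergence(bit, steps=200, dt=0.01, n=40):
    """Largest final gap between a clean run and one with `bit` of X_0 flipped."""
    clean = [8.0] * n
    clean[0] += 0.01  # small perturbation off the unstable fixed point
    faulty = list(clean)
    for _ in range(steps):
        clean = rk4_step(clean, dt)
        faulty = rk4_step(faulty, dt)
        faulty[0] = flip_bit(faulty[0], bit)  # injected fault, every step
    return max(abs(a - b) for a, b in zip(clean, faulty))


for bit in (0, 26, 51):
    print(f"flipping mantissa bit {bit}: max divergence {max_divergence(bit):.3e}")
```

Flips in the lowest mantissa bits perturb the state at the level of rounding error, while flips in the high mantissa bits act like a large forcing error. Because the system is chaotic, even the small perturbations grow if the run is extended, so the comparison depends on the length of the integration.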
Irreproducible Computing, Inexact Hardware
Figure 2 from Düben et al, Phil. Trans. A, 2016.
Generating parameterizations from CRMs and
super-parameterization
(Courtesy: S-J Lin, NOAA/GFDL).
(Courtesy: D. Randall, CSU; CMMAP).
Global-scale CRMs (e.g. the 7 km simulation on the left) and even
super-parameterization using embedded cloud models (right)
remain prohibitively expensive.
Use emulators (genetic programming or DL using GCM-resolution
predictors) to emulate columns of a cloud field.
Ideas and Challenges
No scale separation implies a catastrophic cascade of
dimensionality: we’re off by $10^{10}$ from required flops, Schneider et
al (2017).
Multiple “fit-for-purpose” cost functions depending on the question
asked.
Learning algorithms may play multiple roles:
Building emulators, fast surrogate models of low dimensionality.
Early detection of “viable” models
Other fields exploring same terrain face substantial difficulties: see
Frégnac (2017): “Big data and the industrialization of
neuroscience: A safe roadmap for understanding the brain?” See
also Jonas and Kording (2017): “Could a Neuroscientist
Understand a Microprocessor?”
In the face of the above, we must regard it as a success that we hold
the line on Manabe’s results despite a vast increase in
dimensionality!
Need unified modeling system across the model hierarchy.
What would future infrastructure look like?
A unified modeling infrastructure with:
≤∼1 SYPD models, “LES”, “DNS” for generating training data
∼10 SYPD comprehensive models for “doing science”, e.g. climate
sensitivity, detection-attribution, predictability, prediction, projection,
...
≥∼100-1000 SYPD fast approximate models for uncertainty
exploration
Massive re-engineering to speed up the 10 SYPD model by a factor of
a few will not be transformational (scientists will add to it to bring it
back to ∼10 SYPD)
A flexible open evaluation and testing framework where metrics
can be added with little effort (see e.g. Pangeo)
A system of composing cost functions at will and generating the
learnt models within a period attuned to human attention span
Bibliography
“Climate goals and computing the future of clouds”, Schneider et
al 2017.
“Climate Computing: The State of Play”, Balaji 2015.
“Big data and the industrialization of neuroscience: A safe
roadmap for understanding the brain?”, Frégnac 2017.
“The Art and Science of Climate Model Tuning”, Hourdin et al
2016.
“On the use of inexact, pruned hardware in atmospheric
modelling”, Düben et al 2014.
“CPMIP: measurements of real computational performance of
Earth system models in CMIP6”, Balaji et al 2017.

AIAA Future of Fluids 2018 Balaji

  • 1. Modeling Systems at the end of Dennard Scaling Future of Fluids: Big Data and Big Computation Aviation Forum Atlanta Georgia V. Balaji NOAA/GFDL and Princeton University 28 June 2018 V. Balaji (balaji@princeton.edu) The Post-Dennard Era 28 June 2018 1 / 35
  • 2. Outline
      1. Earth system modeling
      2. Hardware evolution at the end of Dennard scaling
          • The end of Dennard scaling
          • Specialized and commodity computing
          • Increased concurrency, slower arithmetic
          • Deep learning is an industry driver
      3. Approaches to modeling post-Dennard
          • Uncertainty exploration
          • Use fewer bits
          • Generate low-dimensional representations from higher-dimensional
      4. Ideas and challenges
  • 4. Atmospheric response to doubled CO2. Fig 5 from Manabe and Wetherald (1975), equilibrium response to doubled CO2.
  • 5. History of GFDL Computing. Courtesy Brian Gross, NOAA/GFDL.
  • 6. NGGPS: Next-Generation Global Prediction System. FV3 dynamical core from GFDL for the next-generation forecast model (target: 3 km non-hydrostatic in 10 years running at ∼ 200 d/d)
  • 7. Passing the climate Turing test? We may be able to simulate everything in great detail, but do we understand how it works?
  • 9. Moore’s Law and End of Dennard scaling. Figure courtesy Moore 2011: Data processing in exascale-class systems. Processor concurrency: Intel Xeon-Phi. Fine-grained thread concurrency: Nvidia GPU.
  • 10. Top500 revisited. HPCG/HPL ratio is a measure of “percent of peak” (Dongarra and Heroux 2013). All recent HPC acquisitions in climate/weather have been on conventional Intel Xeon (see Balaji et al 2017).
  • 11. The inexorable triumph of commodity computing. From The Platform, Hemsoth (2015).
  • 12. The "Navier-Stokes Computer" of 1986. “The Navier-Stokes computer (NSC) has been developed for solving problems in fluid mechanics involving complex flow simulations that require more speed and capacity than provided by current and proposed Class VI supercomputers. The machine is a parallel processing supercomputer with several new architectural elements which can be programmed to address a wide range of problems meeting the following criteria: (1) the problem is numerically intensive, and (2) the code makes use of long vectors.” Nosenchuck and Littman (1986)
  • 13. The Caltech "Cosmic Cube" (1986). “Caltech is at its best blazing new trails; we are not the best place for programmatic research that dots i’s and crosses t’s”. Geoffrey Fox, pioneer of the Caltech Concurrent Computation Program, in 1986.
  • 14. Beowulf clusters
  • 15. Power-8 with NVLink. Figure courtesy IBM.
  • 16. KNL Overview. Figure courtesy Intel.
  • 17. Processors for Deep Learning. Deep learning is a layered NN approach with hidden layers. Figure courtesy NVidia.
  • 18. Google TPU (Tensor Processing Unit). Figure courtesy Google.
  • 19. Google TPU (Tensor Processing Unit). Hardware pipelining of steps in matrix-multiply. Figure courtesy Google.
  • 21. No separation of "large" and "small" scales. Nastrom and Gage (1985).
  • 22. Multi-model “skill scores”. Based on RMS error of surface temperature and precipitation (Fig. 3 from Knutti et al, GRL, 2013).
  • 23. Multi-model skill scores? More complex models that show the same skill represent an “advance”!
  • 24. Model tuning
      • Model tuning or “calibration” consists of reducing overall model bias (usually relative to 20th century climatology) by modifying parameters. In principle, minimizing some cost function: $C(p_1, p_2, \ldots) = \sum_{i=1}^{N} \omega_i \lVert \phi_i - \phi_i^{\mathrm{obs}} \rVert$
      • Usually the $p$ must be chosen within some observed or theoretical range $p_{\min} \le p \le p_{\max}$.
      • “Fudge factors” (applying known wrong values) generally frowned upon (see Shackley et al 1999 discussion on history of “flux adjustments”. More on that later...)
      • The choice of $\omega_i$ is part of the lab’s “culture”!
      • The choice of $\phi_i^{\mathrm{obs}}$ is also troublesome: overlap between “tuning” metrics and “evaluation” metrics.
      • “Over-tuning”: remember “reality” is but one ensemble member!
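The tuning cost function on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the misfit is taken to be a Euclidean norm, out-of-range parameters are rejected by returning infinity, and `toy_model`, the weights, and the bounds are hypothetical stand-ins, not anything from the talk.

```python
import numpy as np

def tuning_cost(params, model, phi_obs, weights, p_min, p_max):
    """Weighted misfit C(p) = sum_i w_i * ||phi_i(p) - phi_obs_i||."""
    params = np.asarray(params, dtype=float)
    # Parameters must stay inside their observed or theoretical range.
    if np.any(params < p_min) or np.any(params > p_max):
        return np.inf
    phi = model(params)  # one simulated field per metric
    return sum(w * np.linalg.norm(f - o)
               for w, f, o in zip(weights, phi, phi_obs))

def toy_model(p):
    """Hypothetical stand-in for a climate model: two 'fields' from two parameters."""
    x = np.linspace(0.0, 1.0, 5)
    return [p[0] * x, p[1] * np.ones(5)]

# Synthetic "observations" generated from known parameters (assumption).
phi_obs = toy_model(np.array([2.0, 0.5]))
weights = [1.0, 2.0]            # the omega_i: a choice, not a given
p_min = np.array([0.0, 0.0])
p_max = np.array([5.0, 1.0])

for trial in ([2.0, 0.5], [3.0, 0.5], [6.0, 0.5]):
    print(trial, tuning_cost(trial, toy_model, phi_obs, weights, p_min, p_max))
```

Changing `weights` changes which parameter set looks "best", which is the sense in which the choice of weights reflects a lab's priorities.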
  • 25. Model choice: culture and constraints
      • GFDL models built on FMS. Goals: dec-cen, carbon cycle, seasonal prediction, decadal predictability, TC climatology, aerosol-cloud feedbacks, ozone climate, regional climate
      • IITM (8 SYPD on 164p; 500 CHSY). Goals: DECK experiments, monsoons under climate change.
      • IPSL: IPSLCM6-VLR (38 SYPD on 160p; 100 CHSY) to IPSLCM6-LR (6 SYPD on 550p; 2200 CHSY). Goals: WCRP grand challenge on clouds; dec-cen climate change; carbon cycle; ozone climate; paleoclimate
      • Strategies of model building (choices of $\omega_i$)
      • Thought experiment: if two different labs started at the same point in Knutti’s genealogy, would they build the same model?
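The SYPD and CHSY figures quoted on this slide are related by simple arithmetic. The sketch below assumes that "164p" means 164 processor cores and that CHSY is core-hours per simulated year, i.e. cores × 24 hours divided by simulated years per day; the function name `chsy` is made up for illustration.

```python
def chsy(cores, sypd):
    """Core-hours per simulated year: cores * 24 h / (simulated years per day)."""
    return cores * 24.0 / sypd

# (cores, SYPD) pairs as quoted on the slide.
configs = {
    "IITM": (164, 8.0),
    "IPSLCM6-VLR": (160, 38.0),
    "IPSLCM6-LR": (550, 6.0),
}

for name, (cores, sypd) in configs.items():
    print(f"{name}: {chsy(cores, sypd):.0f} CHSY")
```

Under these assumptions the results land close to the rounded values on the slide (500, 100 and 2200 CHSY).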
  • 26. Objective methods of tuning? Neelin et al (2010) construct “metamodels” to aid in multi-parameter optimization. Metamodel generation is expensive (as in deep learning), and varies with cost function.
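A metamodel in this sense is a cheap surrogate fitted to a small number of expensive model runs. The sketch below is not a reproduction of the cited method: it assumes a quadratic polynomial surrogate fitted by ordinary least squares, and `expensive_metric` is a hypothetical stand-in for a model-derived cost.

```python
import numpy as np

def quadratic_features(P):
    """Design matrix with columns 1, p_j, and p_j * p_k (j <= k)."""
    P = np.atleast_2d(P)
    n, d = P.shape
    cols = [np.ones(n)]
    cols += [P[:, j] for j in range(d)]
    cols += [P[:, j] * P[:, k] for j in range(d) for k in range(j, d)]
    return np.column_stack(cols)

def fit_metamodel(P, y):
    """Least-squares fit; returns a callable surrogate."""
    coef, *_ = np.linalg.lstsq(quadratic_features(P), y, rcond=None)
    return lambda Q: quadratic_features(Q) @ coef

def expensive_metric(P):
    """Hypothetical stand-in for a cost computed from full model runs."""
    P = np.atleast_2d(P)
    return 1.0 + (P[:, 0] - 0.3) ** 2 + 2.0 * (P[:, 1] - 0.7) ** 2

# A small "ensemble" of 20 runs at random parameter settings.
rng = np.random.default_rng(0)
P_train = rng.uniform(0.0, 1.0, size=(20, 2))
surrogate = fit_metamodel(P_train, expensive_metric(P_train))

# Scanning the surrogate is cheap, so a dense grid search is affordable.
axis = np.linspace(0.0, 1.0, 101)
grid = np.stack(np.meshgrid(axis, axis), axis=-1).reshape(-1, 2)
best = grid[np.argmin(surrogate(grid))]
print("surrogate minimum near", best)
```

The expense noted on the slide sits in generating the training runs; if the cost function changes, the targets `y` change and the surrogate must be refitted.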
  • 27. Low precision arithmetic for Deep Learning. Figure 1 from Gupta et al (2015).
  • 28. Low precision arithmetic for Deep Learning. Figure courtesy NVidia. Low-precision arithmetic.
  • 29. Irreproducible Computing, Inexact Hardware. Figure 1 from Düben et al, Phil. Trans. A, 2016. Which bits can we allow to be “inexactly” flipped? Lorenz 96 as canonical test case of non-linearity and chaos.
  • 30. Irreproducible Computing, Inexact Hardware. Figure 2 from Düben et al, Phil. Trans. A, 2016.
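To get a feel for how a chaotic system such as Lorenz 96 responds to reduced precision, one can integrate it in different floating-point formats. The sketch below is a stand-in for the inexact-hardware experiments, not a reproduction of them: it swaps in standard IEEE half and single precision rather than emulating bit flips, and the 40 variables, forcing F = 8, time step 0.01 and fourth-order Runge-Kutta scheme are conventional choices assumed here.

```python
import numpy as np

def l96_tendency(x, forcing):
    """dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F on a periodic ring."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing

def integrate(x0, n_steps, dt=0.01, forcing=8.0, dtype=np.float64):
    """Fourth-order Runge-Kutta, with all arithmetic carried out in `dtype`."""
    x = x0.astype(dtype)
    dt, forcing = dtype(dt), dtype(forcing)
    half, sixth, two = dtype(0.5), dtype(1.0 / 6.0), dtype(2.0)
    for _ in range(n_steps):
        k1 = l96_tendency(x, forcing)
        k2 = l96_tendency(x + half * dt * k1, forcing)
        k3 = l96_tendency(x + half * dt * k2, forcing)
        k4 = l96_tendency(x + dt * k3, forcing)
        x = x + sixth * dt * (k1 + two * k2 + two * k3 + k4)
    return x

# Standard start: uniform state at F with one perturbed variable.
x0 = np.full(40, 8.0)
x0[19] += 0.01

results = {np.dtype(t).name: integrate(x0, 500, dtype=t)
           for t in (np.float64, np.float32, np.float16)}

# Largest pointwise departure from the double-precision trajectory.
for name, x in results.items():
    diff = np.max(np.abs(x.astype(np.float64) - results["float64"]))
    print(f"{name}: max |x - x_float64| = {diff:.3g}")
```

Because the system is chaotic, pointwise differences between precisions grow with integration length; whether the statistics of the attractor survive is a separate question, and it is the one that matters for climate.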
  • 31. Generating parameterizations from CRMs and super-parameterization. (Courtesy: S-J Lin, NOAA/GFDL). (Courtesy: D. Randall, CSU; CMMAP). Global-scale CRMs (e.g. 7 km simulation on the left) and even super-parameterization using embedded cloud models (right) remain prohibitively expensive. Use emulators (genetic programming or DL using GCM-resolution predictors) to emulate columns of a cloud field.
  • 33. Ideas and Challenges
      • No scale separation implies a catastrophic cascade of dimensionality: we’re off by $10^{10}$ from required flops, Schneider et al (2017).
      • Multiple “fit-for-purpose” cost functions depending on the question asked.
      • Learning algorithms may play multiple roles:
          • Building emulators, fast surrogate models of low dimensionality.
          • Early detection of “viable” models
      • Other fields exploring same terrain face substantial difficulties: see Frégnac (2017): “Big data and the industrialization of neuroscience: A safe roadmap for understanding the brain?” See also Jonas and Kording (2017): “Could a Neuroscientist Understand a Microprocessor?”
      • In the face of the above, we must regard it a success that we hold the line on Manabe’s results despite a vast increase in dimensionality!
      • Need unified modeling system across the model hierarchy.
  • 34. What would future infrastructure look like?
      • A unified modeling infrastructure with:
          • ≤∼1 SYPD models, “LES”, “DNS” for generating training data
          • ∼10 SYPD comprehensive models for “doing science”, e.g. climate sensitivity, detection-attribution, predictability, prediction, projection, ...
          • ≥∼100-1000 SYPD fast approximate models for uncertainty exploration
      • Massive re-engineering to speed up the 10 SYPD model by a few X will not be transformational (scientists will add to it to bring it back to ∼10 SYPD)
      • A flexible open evaluation and testing framework where metrics can be added with little effort (see e.g. Pangeo)
      • A system of composing cost functions at will and generating the learnt models within a period attuned to human attention span
  • 35. Bibliography
      • “Climate goals and computing the future of clouds”, Schneider et al 2017.
      • “Climate Computing: The State of Play”, Balaji 2015.
      • “Big data and the industrialization of neuroscience: A safe roadmap for understanding the brain?”, Frégnac 2017.
      • “The Art and Science of Climate Model Tuning”, Hourdin et al 2016.
      • “On the use of inexact, pruned hardware in atmospheric modelling”, Düben et al 2014.
      • “CPMIP: measurements of real computational performance of Earth system models in CMIP6”, Balaji et al 2017.