AIAA Future of Fluids 2018 Balaji

Modeling Systems at the end of Dennard Scaling
Future of Fluids: Big Data and Big Computation
Aviation Forum
Atlanta Georgia
V. Balaji
NOAA/GFDL and Princeton University
28 June 2018
V. Balaji (balaji@princeton.edu), The Post-Dennard Era, 28 June 2018

Outline
1 Earth system modeling
2 Hardware evolution at the end of Dennard scaling
   The end of Dennard scaling
   Specialized and commodity computing
   Increased concurrency, slower arithmetic
   Deep learning is an industry driver
3 Approaches to modeling post-Dennard
   Uncertainty exploration
   Use fewer bits
   Generate low-dimensional representations from higher-dimensional
4 Ideas and challenges

Atmospheric response to doubled CO2
Fig 5 from Manabe and Wetherald (1975), equilibrium response to
doubled CO2.

History of GFDL Computing
Courtesy Brian Gross, NOAA/GFDL.

NGGPS: Next-Generation Global Prediction System
FV3 dynamical core from GFDL for the next-generation forecast model
(target: 3 km non-hydrostatic in 10 years running at ∼ 200 d/d)

Passing the climate Turing test?
We may be able to simulate everything in great detail, but do we
understand how it works?

Moore’s Law and End of Dennard scaling
Figure courtesy Moore 2011: Data processing in exascale-class
systems.
Processor concurrency: Intel Xeon-Phi.
Fine-grained thread concurrency: Nvidia GPU.

Top500 revisited
HPCG/HPL ratio is a measure of “percent of peak” (Dongarra and
Heroux 2013).
All recent HPC acquisitions in climate/weather have been on
conventional Intel Xeon (see Balaji et al 2017).

The inexorable triumph of commodity computing
From The Platform, Hemsoth (2015).

The "Navier-Stokes Computer" of 1986
“The Navier-Stokes computer (NSC) has been developed for solving
problems in fluid mechanics involving complex flow simulations that
require more speed and capacity than provided by current and proposed
Class VI supercomputers. The machine is a parallel processing
supercomputer with several new architectural elements which can be
programmed to address a wide range of problems meeting the following
criteria: (1) the problem is numerically intensive, and (2) the code
makes use of long vectors.”
Nosenchuck and Littman (1986)

The Caltech "Cosmic Cube" (1986)
“Caltech is at its best blazing new trails; we are not the best place for
programmatic research that dots i’s and crosses t’s”. Geoffrey Fox,
pioneer of the Caltech Concurrent Computation Program, in 1986.

Beowulf clusters

Power-8 with NVLink
Figure courtesy IBM.

KNL Overview
Figure courtesy Intel.

Processors for Deep Learning
Deep learning uses layered neural networks (NNs) with multiple hidden
layers. Figure courtesy NVidia.

Google TPU (Tensor Processing Unit)
Figure courtesy Google.

Google TPU (Tensor Processing Unit)
Hardware pipelining of steps in matrix-multiply. Figure courtesy
Google.

No separation of "large" and "small" scales
Nastrom and Gage (1985).

Multi-model “skill scores”
Based on RMS error of surface temperature and precipitation. (Fig. 3
from Knutti et al, GRL, 2013).

Multi-model skill scores?
More complex models that show the same skill represent an
“advance”!

Model tuning
Model tuning or “calibration” consists of reducing overall model bias
(usually relative to 20th century climatology) by modifying parameters.
In principle, minimizing some cost function:
$$ C(p_1, p_2, \ldots) = \sum_{i=1}^{N} \omega_i \, \lVert \phi_i - \phi_i^{\mathrm{obs}} \rVert $$
Usually each p must be chosen within some observed or theoretical
range p_min ≤ p ≤ p_max.
“Fudge factors” (applying known wrong values) are generally frowned
upon (see the Shackley et al 1999 discussion of the history of “flux
adjustments”; more on that later...)
The choice of ωi is part of the lab’s “culture”!
The choice of φ_i^obs is also troublesome:
overlap between “tuning” metrics and “evaluation” metrics.
“Over-tuning”: remember “reality” is but one ensemble member!
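
A minimal sketch of the cost function above, assuming each metric is a single scalar so that the mismatch reduces to an absolute difference. The function names, parameter name, metric names and all numbers are hypothetical and chosen only for illustration. It shows that the same two candidate models can rank differently once the weights change, which is the sense in which the choice of weights is a matter of "culture".

```python
def within_bounds(params, bounds):
    """Return True if every parameter lies in its allowed [p_min, p_max] range."""
    return all(lo <= params[name] <= hi for name, (lo, hi) in bounds.items())


def tuning_cost(simulated, observed, weights):
    """Weighted sum of absolute mismatches: sum_i w_i * |phi_i - phi_i_obs|."""
    return sum(
        weights[name] * abs(simulated[name] - observed[name])
        for name in observed
    )


# Hypothetical observed targets and two hypothetical candidate models.
observed = {"surface_temp_K": 288.0, "precip_mm_day": 2.7}
candidate_a = {"surface_temp_K": 288.5, "precip_mm_day": 3.3}
candidate_b = {"surface_temp_K": 289.2, "precip_mm_day": 2.8}

# Two hypothetical weightings: one favours temperature, one precipitation.
temp_heavy = {"surface_temp_K": 2.0, "precip_mm_day": 1.0}
precip_heavy = {"surface_temp_K": 0.5, "precip_mm_day": 4.0}

best_temp_heavy = min(
    ("a", "b"),
    key=lambda k: tuning_cost({"a": candidate_a, "b": candidate_b}[k], observed, temp_heavy),
)
best_precip_heavy = min(
    ("a", "b"),
    key=lambda k: tuning_cost({"a": candidate_a, "b": candidate_b}[k], observed, precip_heavy),
)

# A hypothetical tunable parameter and its allowed range.
bounds = {"entrainment_rate": (0.5, 2.0)}
params_ok = within_bounds({"entrainment_rate": 1.2}, bounds)
params_bad = within_bounds({"entrainment_rate": 2.5}, bounds)
```

With these made-up numbers, candidate "a" wins under the temperature-heavy weights and candidate "b" under the precipitation-heavy weights.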

Model choice: culture and constraints
GFDL models built on FMS. Goals: dec-cen, carbon cycle,
seasonal prediction, decadal predictability, TC climatology,
aerosol-cloud feedbacks, ozone climate, regional climate.
IITM (8 SYPD on 164p; 500 CHSY). Goals: DECK experiments,
monsoons under climate change.
IPSL: IPSLCM6-VLR (38 SYPD on 160p; 100 CHSY) to
IPSLCM6-LR (6 SYPD on 550p; 2200 CHSY). Goals: WCRP
grand challenge on clouds; dec-cen climate change; carbon cycle;
ozone climate; paleoclimate.
Strategies of model building (choices of ωi)
Thought experiment: if two different labs started at the same point in
Knutti’s genealogy, would they build the same model?
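
The throughput and cost figures quoted above are linked by simple arithmetic. The sketch below assumes the CPMIP definitions (Balaji et al 2017): SYPD is simulated years per wall-clock day, and CHSY is core-hours per simulated year, i.e. core count times the hours needed for one simulated year. The function name is hypothetical.

```python
def core_hours_per_simulated_year(cores, sypd):
    """CHSY = cores * (wall-clock hours needed to simulate one year)."""
    hours_per_simulated_year = 24.0 / sypd
    return cores * hours_per_simulated_year


# (cores, SYPD) pairs as quoted on the slide.
configs = {
    "IITM": (164, 8.0),
    "IPSLCM6-VLR": (160, 38.0),
    "IPSLCM6-LR": (550, 6.0),
}

chsy = {
    name: core_hours_per_simulated_year(cores, sypd)
    for name, (cores, sypd) in configs.items()
}
```

The computed values (about 492, 101 and 2200) agree with the rounded figures of 500, 100 and 2200 CHSY quoted above.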

Objective methods of tuning?
Neelin et al (2010) construct “metamodels” to aid in multi-parameter
optimization. Metamodel generation is expensive (as in deep learning),
and varies with cost function.
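
To make the metamodel idea concrete, here is a deliberately small sketch: a one-parameter quadratic surrogate fitted through three evaluations of an expensive cost, then minimized within the allowed range. It is not the procedure of Neelin et al (2010). The stand-in cost function, the parameter range and all names are assumptions made for illustration.

```python
def expensive_cost(p):
    """Stand-in for a full model run plus evaluation (hypothetical, smooth)."""
    return (p - 1.3) ** 2 + 0.5


def fit_quadratic(points):
    """Return (a, b, c) of a*p**2 + b*p + c through three (p, cost) points."""
    (x0, y0), (x1, y1), (x2, y2) = points
    # Lagrange interpolation, expanded into monomial coefficients.
    d0 = y0 / ((x0 - x1) * (x0 - x2))
    d1 = y1 / ((x1 - x0) * (x1 - x2))
    d2 = y2 / ((x2 - x0) * (x2 - x1))
    a = d0 + d1 + d2
    b = -(d0 * (x1 + x2) + d1 * (x0 + x2) + d2 * (x0 + x1))
    c = d0 * x1 * x2 + d1 * x0 * x2 + d2 * x0 * x1
    return a, b, c


def metamodel_minimum(a, b, c, p_min, p_max):
    """Minimize the fitted quadratic on [p_min, p_max]."""
    candidates = [p_min, p_max]
    if a > 0:
        vertex = -b / (2.0 * a)
        if p_min <= vertex <= p_max:
            candidates.append(vertex)
    return min(candidates, key=lambda p: a * p * p + b * p + c)


# Three "expensive" runs: both ends of the range and the midpoint.
p_min, p_max = 0.5, 2.0
samples = [(p, expensive_cost(p)) for p in (p_min, 0.5 * (p_min + p_max), p_max)]

a, b, c = fit_quadratic(samples)
p_best = metamodel_minimum(a, b, c, p_min, p_max)
```

Because the surrogate is fitted to one particular cost function, changing the weights or metrics means refitting it, which is one way to read the remark that the metamodel varies with the cost function.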

Low precision arithmetic for Deep Learning
Figure 1 from Gupta et al (2015).

Low precision arithmetic for Deep Learning
Figure courtesy NVidia. Low-precision arithmetic.
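
Gupta et al (2015) study training with limited-precision fixed-point numbers and stochastic rounding. The sketch below is a software illustration only, emulating a fixed-point grid inside ordinary Python floats. The function names, the 4 fractional bits, and the increment are assumptions. It shows why the rounding rule matters: with round-to-nearest, updates smaller than half the grid spacing are lost, while stochastic rounding preserves them on average.

```python
import math
import random


def round_nearest(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return math.floor(x * scale + 0.5) / scale


def round_stochastic(x, frac_bits, rng):
    """Round x down or up to the grid, rounding up with probability
    equal to the fractional remainder, so the result is unbiased."""
    scale = 1 << frac_bits
    scaled = x * scale
    lower = math.floor(scaled)
    if rng.random() < scaled - lower:
        lower += 1
    return lower / scale


def accumulate(increment, steps, rounder):
    """Add a small increment repeatedly, rounding after every addition."""
    total = 0.0
    for _ in range(steps):
        total = rounder(total + increment)
    return total


rng = random.Random(0)  # fixed seed so the run is repeatable
frac_bits = 4           # grid spacing 1/16 = 0.0625

# The exact sum of 1000 increments of 0.01 would be 10.0.
nearest_total = accumulate(0.01, 1000, lambda x: round_nearest(x, frac_bits))
stochastic_total = accumulate(
    0.01, 1000, lambda x: round_stochastic(x, frac_bits, rng)
)
```

Here `nearest_total` stays at zero because every increment is rounded away, whereas `stochastic_total` lands near the exact sum of 10, with some random scatter.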

Irreproducible Computing, Inexact Hardware
Figure 1 from Düben et al, Phil. Trans. A, 2016. Which bits can we
allow to be “inexactly” flipped? Lorenz 96 as canonical test case of
non-linearity and chaos.
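
As a rough software analogue of the question above, the sketch below integrates the Lorenz 96 system, dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F, twice from the same state, with one bit of one variable flipped in the second run. It does not reproduce the hardware emulation of Düben et al. The bit is flipped in the IEEE-754 double representation, and the system size, forcing, step size and run length are assumptions chosen for illustration.

```python
import struct


def flip_bit(value, bit):
    """Flip one bit of a 64-bit float (bit 0 = least significant mantissa bit)."""
    (as_int,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped


def tendency(x, forcing):
    """Lorenz 96 right-hand side with cyclic boundary conditions."""
    n = len(x)
    return [
        (x[(i + 1) % n] - x[i - 2]) * x[i - 1] - x[i] + forcing
        for i in range(n)
    ]


def rk4_step(x, dt, forcing):
    """One classical fourth-order Runge-Kutta step."""
    k1 = tendency(x, forcing)
    k2 = tendency([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)], forcing)
    k3 = tendency([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)], forcing)
    k4 = tendency([xi + dt * ki for xi, ki in zip(x, k3)], forcing)
    return [
        xi + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
    ]


def error_after_flip(bit, steps=20, n=8, forcing=8.0, dt=0.01):
    """Largest difference between a reference run and a run with one flipped bit."""
    reference = [forcing] * n
    reference[0] += 0.01  # small push off the fixed point x_i = F
    perturbed = list(reference)
    perturbed[0] = flip_bit(perturbed[0], bit)
    for _ in range(steps):
        reference = rk4_step(reference, dt, forcing)
        perturbed = rk4_step(perturbed, dt, forcing)
    return max(abs(r - p) for r, p in zip(reference, perturbed))


low_bit_error = error_after_flip(bit=0)    # least significant mantissa bit
high_bit_error = error_after_flip(bit=51)  # most significant mantissa bit
```

Over this short run, flipping the lowest mantissa bit leaves the two trajectories essentially indistinguishable, while flipping the highest mantissa bit changes the state by order one. In a chaotic system even the small difference would eventually grow, so the tolerable bits depend on the forecast horizon of interest.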

Irreproducible Computing, Inexact Hardware
Figure 2 from Düben et al, Phil. Trans. A, 2016.

Generating parameterizations from CRMs and super-parameterization
(Courtesy: S-J Lin, NOAA/GFDL.)
(Courtesy: D. Randall, CSU; CMMAP.)
Global-scale CRMs (e.g. 7 km simulation on the left) and even
super-parameterization using embedded cloud models (right)
remain prohibitively expensive.
Use emulators (genetic programming or DL using GCM-resolution
predictors) to emulate columns of a cloud field.

Ideas and Challenges
No scale separation implies a catastrophic cascade of
dimensionality: we’re off by a factor of 10^10 from the required
flops (Schneider et al 2017).
Multiple “fit-for-purpose” cost functions depending on the question
asked.
Learning algorithms may play multiple roles:
Building emulators, fast surrogate models of low dimensionality.
Early detection of “viable” models
Other fields exploring same terrain face substantial difficulties: see
Frégnac (2017): “Big data and the industrialization of
neuroscience: A safe roadmap for understanding the brain?” See
also Jonas and Kording (2017): “Could a Neuroscientist
Understand a Microprocessor?”
In the face of the above, we must regard it as a success that we hold
the line on Manabe’s results despite a vast increase in
dimensionality!
Need unified modeling system across the model hierarchy.

What would future infrastructure look like?
A unified modeling infrastructure with:
≤∼1 SYPD models, “LES”, “DNS” for generating training data
∼10 SYPD comprehensive models for “doing science” – e.g climate
sensitivity, detection-attribution, predictability, prediction, projection,
...
≥∼100-1000 SYPD fast approximate models for uncertainty
exploration
Massive re-engineering to speed up the 10 SYPD model by a few
X will not be transformational (scientists will add to it to bring it
back to ∼10 SYPD)
A flexible open evaluation and testing framework where metrics
can be added with little effort (see e.g. Pangeo)
A system of composing cost functions at will and generating the
learnt models within a period attuned to human attention span

Bibliography
“Climate goals and computing the future of clouds”, Schneider et
al 2017.
“Climate Computing: The State of Play”, Balaji 2015.
“Big data and the industrialization of neuroscience: A safe
roadmap for understanding the brain?”, Frégnac 2017.
“The Art and Science of Climate Model Tuning”, Hourdin et al
2016.
“On the use of inexact, pruned hardware in atmospheric
modelling”, Düben et al 2014.
“CPMIP: measurements of real computational performance of
Earth system models in CMIP6”, Balaji et al 2017.
