Opportunities with very large high resolution
climate model datasets
Extreme event attribution
Projections
Machine learning
Michael F. Wehner
Lawrence Berkeley National Laboratory
mfwehner@lbl.gov
US DOE Policy 411.2A
SUBJECT: SCIENTIFIC INTEGRITY
When expressing opinions on policy matters to the public and media,
research personnel must make it clear when they are expressing their
personal views, rather than those of the Department, the U.S.
Government, or their respective institutions. Public representation of
Government or DOE positions or policies must be cleared through
their program management to include DOE headquarters.
Icosahedral High Resolution
More modern discretizations
Resolution
1km
Cloud system resolving models
are a transformational change
25km
Upper limit of climate models
with cloud parameterizations
200km
Typical resolution of
IPCC AR4 models
Surface Altitude (feet)
Technology
Moore’s law is alive and well.
The largest computers continually get faster. And so do models.
1990 AMIP1: Many modeling groups required a calendar year to complete a 10-year
integration of a standalone atmospheric general circulation model. Typical grid
resolution was T21 or about 600km (64x32x10).
2017: I get ~1 simulated year per wall-clock day for the same calculation except at 25km
(1152x768x30).
This calculation used only 7680 processors on a 120,000 processor machine
• 5 million processor hours.
• 25 km grid cell
• Took about 3 months to complete in 2012. Typically, I get better throughput now.
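The comparison between the 1990 and 2017 calculations can be made concrete with a little arithmetic. The sketch below uses only the grid dimensions, throughput figures and processor count quoted above; the 365-day calendar year and the variable names are assumptions made for illustration.

```python
# Grid dimensions (longitude x latitude x levels) as quoted on the slide.
GRID_1990 = (64, 32, 10)      # T21, about 600 km
GRID_2017 = (1152, 768, 30)   # about 25 km


def grid_points(dims):
    """Total number of grid points for a (lon, lat, levels) tuple."""
    nlon, nlat, nlev = dims
    return nlon * nlat * nlev


# Throughput in simulated years per wall-clock day.
# 1990: a 10-year integration took about one calendar year (365 days assumed).
throughput_1990 = 10 / 365
# 2017: roughly one simulated year per wall-clock day.
throughput_2017 = 1.0

point_ratio = grid_points(GRID_2017) / grid_points(GRID_1990)
throughput_ratio = throughput_2017 / throughput_1990

# Processor-hours per simulated year at 7680 processors and 1 year/day.
proc_hours_per_sim_year = 7680 * 24

print(f"grid points, 1990: {grid_points(GRID_1990):,}")
print(f"grid points, 2017: {grid_points(GRID_2017):,}")
print(f"grid point ratio: {point_ratio:.0f}")
print(f"throughput ratio: {throughput_ratio:.1f}")
print(f"processor-hours per simulated year: {proc_hours_per_sim_year:,}")
```

The 2017 grid has about 1,296 times as many points as the 1990 grid, and the model still runs roughly 36 times faster in simulated years per day. This ignores the shorter time step that a finer grid usually needs, so it understates the real gain in work per wall-clock day.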
[Figures by Prabhat: three panels with storm-category legends (Tropical Storm and Cat1 to Cat5; Cat1 to Cat5; Cat4 and Cat5)]
Tropical Cyclone min pressure vs max wind speed
Total # TC / year
observations 87±8
cam5.1 84±9
Total # hurricanes / year
observations 49±7
cam5.1 52
Figures by Cheng-Ta Chen
The strongest hurricanes get more intense.
[Figure: average annual most intense tropical cyclone wind speed (m/s) at +0.85°C, +1.5°C, +2°C and +4.0°C]
Real storms can be tracked by hand. They happen in real time!
Tracking of simulated storms must be automated. There are too many to count.
Two approaches.
Traditional, “parametric” feature tracking based on conditions.
• Hurricanes: co-located vorticity maxima, pressure minima, warm cores.
• Extratropical cyclones: co-located vorticity maxima, pressure minima.
• Atmospheric rivers: precipitable water, integrated water transport, etc. (ARTMIP)
• Blocking, fronts, meso-scale convective systems.
Supervised Machine-learning
• Convolutional neural networks.
• Need to have a training data set.
Tracking storms
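The "parametric" approach for hurricanes can be sketched as a scan for grid cells that satisfy all the listed conditions at once. This is a minimal illustration and not the TECA algorithm: the function names, the thresholds, and the simplification that "co-located" means "in the same grid cell" are all assumptions. A real tracker would use a search radius and handle longitude wrap-around.

```python
def is_local_min(field, i, j):
    """True if field[i][j] is strictly lower than every neighbour inside the grid."""
    nrows, ncols = len(field), len(field[0])
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            ni, nj = i + di, j + dj
            if 0 <= ni < nrows and 0 <= nj < ncols:
                if field[ni][nj] <= field[i][j]:
                    return False
    return True


def detect_candidates(psl, vort, warm_core, max_psl, min_vort, min_warm):
    """Return (row, col) cells that are a pressure minimum with high
    vorticity and a warm-core anomaly in the same cell.

    All thresholds are hypothetical tuning parameters.
    """
    candidates = []
    for i in range(len(psl)):
        for j in range(len(psl[0])):
            if (psl[i][j] <= max_psl
                    and vort[i][j] >= min_vort
                    and warm_core[i][j] >= min_warm
                    and is_local_min(psl, i, j)):
                candidates.append((i, j))
    return candidates


# Synthetic 5x5 example with one storm-like cell in the centre.
psl = [[1010.0] * 5 for _ in range(5)]    # sea level pressure (hPa)
vort = [[1e-5] * 5 for _ in range(5)]     # relative vorticity (1/s)
warm = [[0.0] * 5 for _ in range(5)]      # warm-core temperature anomaly (K)
psl[2][2], vort[2][2], warm[2][2] = 980.0, 5e-4, 3.0

found = detect_candidates(psl, vort, warm,
                          max_psl=1000.0, min_vort=1e-4, min_warm=1.0)
print(found)
```

Dropping the warm-core test gives the extratropical cyclone variant listed above.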
Two steps:
1. Candidate detection
2. Continuity in time & space. (Stitching/tracking)
Toolkit for Extreme Climate Analysis
https://github.com/LBL-EESA/TECA
Highly parallel
(I routinely use 29200 processors for TC tracking)
TECA2: parallel parametric feature tracking
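The second step, continuity in time and space, can be illustrated with a greedy nearest-neighbour stitcher. This is a sketch under stated assumptions and not how TECA does it: the function name `stitch_tracks`, the `max_step` parameter, and the use of plain Euclidean distance in grid units (rather than great-circle distance) are all hypothetical choices.

```python
import math


def stitch_tracks(detections, max_step):
    """Link per-time-step candidates into tracks.

    detections: list with one entry per time step, each a list of (x, y).
    max_step:   largest allowed displacement between consecutive steps.
    Returns a list of tracks, each a list of (t, x, y) tuples.
    """
    tracks = []   # every track found so far
    active = []   # indices of tracks extended at the previous time step
    for t, points in enumerate(detections):
        unused = list(points)
        next_active = []
        for idx in active:
            _, px, py = tracks[idx][-1]
            best, best_d = None, max_step
            for p in unused:
                d = math.hypot(p[0] - px, p[1] - py)
                if d <= best_d:
                    best, best_d = p, d
            if best is not None:
                unused.remove(best)
                tracks[idx].append((t, best[0], best[1]))
                next_active.append(idx)
        # Any candidate not claimed by an existing track starts a new one.
        for p in unused:
            tracks.append([(t, p[0], p[1])])
            next_active.append(len(tracks) - 1)
        active = next_active
    return tracks


# Synthetic candidates at three time steps.
detections = [
    [(0, 0), (10, 10)],
    [(1, 0), (10, 11)],
    [(2, 1)],
]
tracks = stitch_tracks(detections, max_step=2.0)
for track in tracks:
    print(track)
```

In practice a minimum-duration filter would follow, so that short-lived candidates are discarded. Because each time step's detection is independent, the candidate step parallelizes well, which is consistent with the processor counts mentioned above.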
Tropical cyclone detection.
Movie courtesy of Burlen Loring
Can Deep Learning Work for Climate Science?
Similarities to Computer Vision
• Tasks:
– Pattern Classification
– Clustering
– Feature Learning
– Anomaly Detection
Differences
• Unique attributes of Climate Data
– Multi-channel / Multi-variate
– Spatio-temporal
– Statistics are likely different
Challenge: Multi-Variate Data
Task: Find Extreme Weather Patterns in a box
Supervised Learning
Training Input: Cropped, Centered, Multi-variate patches with Labels*
• Tropical Cyclone (TC)
• Atmospheric River (AR)
• Weather Front (WF)
• TC & AR labels are provided by TECA, which
implemented human-specified criteria
• WF is a hand crafted data set (5 FTE-years)
Output: Binary (Yes/No) on Test patches
• Is there a TC in the patch?
• Is there an AR in the patch?
• Is there a WF in the patch?
Currently, we have separate convolutional neural nets for these 3 storm types.
– Our goal is to have just one machine learning algorithm for all storms.
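The training input described above (cropped, centered, multi-variate patches with labels) could be assembled along these lines. This is a minimal sketch: `crop_patch`, `build_examples`, and the synthetic 8x8 fields are hypothetical, and only the variable names PSL and TMQ come from the slides.

```python
def crop_patch(fields, center, size):
    """Cut a patch of the given (height, width) around center (row, col).

    fields: dict mapping variable name to a 2D list on a common grid.
    Returns a dict of 2D patches, or None if the patch leaves the domain.
    """
    (ci, cj), (h, w) = center, size
    top, left = ci - h // 2, cj - w // 2
    first = next(iter(fields.values()))
    nrows, ncols = len(first), len(first[0])
    if top < 0 or left < 0 or top + h > nrows or left + w > ncols:
        return None
    return {name: [row[left:left + w] for row in grid[top:top + h]]
            for name, grid in fields.items()}


def build_examples(fields, labeled_centers, size):
    """Return (patch, label) pairs; label 1 = event present, 0 = absent."""
    examples = []
    for center, label in labeled_centers:
        patch = crop_patch(fields, center, size)
        if patch is not None:
            examples.append((patch, label))
    return examples


# Synthetic two-variable field on an 8x8 grid.
fields = {
    "PSL": [[i * 8 + j for j in range(8)] for i in range(8)],
    "TMQ": [[0.0] * 8 for _ in range(8)],
}
# One positive center inside the domain, one negative too close to the edge.
labeled_centers = [((4, 4), 1), ((0, 0), 0)]
examples = build_examples(fields, labeled_centers, size=(4, 4))
print(len(examples), examples[0][1], examples[0][0]["PSL"][0])
```

Each patch keeps all variables as separate channels, which is the multi-variate aspect that distinguishes this data from ordinary images.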
Machine learning Training Data

Classification      Image Dimension  Variables                                     Total Examples
                                                                                   (+ve)   (-ve)
Tropical Cyclone    32x32            PSL, UBOT, VBOT, TMQ, U850, V850, T200, T500  10000   10000
Atmospheric Rivers  148x224          TMQ, Land Sea mask                            6500    6800
Weather Fronts      27x60            T2m, Precip, PSL                              5600    6500
Supervised Classification Accuracy

                    Logistic        K-Nearest       Support Vector  Random          ConvNet
                    Regression      Neighbor        Machine         Forest
                    Train   Test    Train   Test    Train   Test    Train   Test    Train   Test
Tropical Cyclone    96.8    95.85   98.1    97.85   97.0    95.85   99.2    99.4    99.3    99.1
Atmospheric Rivers  81.97   82.65   79.7    81.7    81.6    83.0    87.9    88.4    90.5    90.0
Weather Fronts      84.9    89.8    72.46   76.45   84.35   90.2    80.97   87.5    88.7    89.4

Hyper-parameter optimization applied with Spearmint for all methods
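To make the accuracy comparison easier to read, the sketch below loads the numbers into a dictionary and reports the best method per event type by test accuracy. The values are transcribed from the table as (train, test) pairs; the pairing of numbers with methods is an assumption based on the column order, and the helper name is hypothetical.

```python
# (train, test) accuracy per method, transcribed from the table.
ACCURACY = {
    "Tropical Cyclone": {
        "Logistic Regression": (96.8, 95.85),
        "K-Nearest Neighbor": (98.1, 97.85),
        "Support Vector Machine": (97.0, 95.85),
        "Random Forest": (99.2, 99.4),
        "ConvNet": (99.3, 99.1),
    },
    "Atmospheric Rivers": {
        "Logistic Regression": (81.97, 82.65),
        "K-Nearest Neighbor": (79.7, 81.7),
        "Support Vector Machine": (81.6, 83.0),
        "Random Forest": (87.9, 88.4),
        "ConvNet": (90.5, 90.0),
    },
    "Weather Fronts": {
        "Logistic Regression": (84.9, 89.8),
        "K-Nearest Neighbor": (72.46, 76.45),
        "Support Vector Machine": (84.35, 90.2),
        "Random Forest": (80.97, 87.5),
        "ConvNet": (88.7, 89.4),
    },
}


def best_by_test(scores):
    """Name of the method with the highest test accuracy."""
    return max(scores, key=lambda method: scores[method][1])


best = {event: best_by_test(scores) for event, scores in ACCURACY.items()}
# How far the ConvNet's test accuracy falls short of the best method.
convnet_gap = {
    event: round(scores[best[event]][1] - scores["ConvNet"][1], 2)
    for event, scores in ACCURACY.items()
}

for event in ACCURACY:
    print(f"{event}: best = {best[event]}, ConvNet gap = {convnet_gap[event]}")
```

Read this way, the ConvNet is either the best method or within one point of it for all three event types.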
Weather Front Detection
Contributors: Jim Biard, Ken Kunkel, Evan Racah
Current status
Contact Prabhat about Machine Learning details
prabhat@lbl.gov
Hyper-Parameter Optimization
• Tuning #layers, #filters, learning rates, schedule is a black art
Performance and Scaling
• Current networks take days to train on O(10) GB datasets; we have O(10) TB
datasets on hand
Scarcity of Labeled Data
• Community needs to self-organize and run labeling campaigns
Interpretability and Visualization
• ‘Black Box’ classifier
Deep Learning is viable for Pattern Detection in Climate Data
• Supervised architectures can match TECA performance
• Open challenges in semi-supervised, unsupervised learning and
interpretability
• Need more ground truth catalogs and training data!
• When extreme weather
happens, the public wants to
know
– “Is this climate change?”
Extreme Event Attribution
• Not quite the correct question, better to ask:
– “How has the risk of this event changed because of climate change?”
Or
– “How did climate change affect the magnitude of this event?”
Extreme Event Attribution
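The first of these questions is commonly quantified by comparing the probability of exceeding an event threshold in a factual ensemble with that in a counterfactual one. The sketch below shows the risk ratio and the fraction of attributable risk (FAR = 1 - P_counterfactual / P_factual). The ensemble values and threshold are synthetic and chosen only for illustration; the function names are hypothetical.

```python
def exceedance_probability(samples, threshold):
    """Fraction of ensemble members at or above the threshold."""
    return sum(1 for x in samples if x >= threshold) / len(samples)


def risk_ratio(p_factual, p_counterfactual):
    """How many times more likely the event is in the factual world."""
    if p_counterfactual == 0:
        return float("inf")
    return p_factual / p_counterfactual


def fraction_attributable_risk(p_factual, p_counterfactual):
    """FAR = 1 - P_counterfactual / P_factual."""
    if p_factual == 0:
        raise ValueError("event never occurs in the factual ensemble")
    return 1.0 - p_counterfactual / p_factual


# Synthetic rainfall totals (arbitrary units) for two 10-member ensembles.
factual = [12, 18, 25, 31, 38, 44, 52, 57, 63, 70]
counterfactual = [10, 15, 21, 27, 33, 39, 45, 48, 55, 61]
threshold = 50

p1 = exceedance_probability(factual, threshold)
p0 = exceedance_probability(counterfactual, threshold)
print(f"P(factual) = {p1}, P(counterfactual) = {p0}")
print(f"risk ratio = {risk_ratio(p1, p0)}")
print(f"FAR = {fraction_attributable_risk(p1, p0)}")
```

With small ensembles these estimates are noisy, so in practice they are reported with uncertainty ranges rather than as single numbers.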
Severe floods occurred along
the Colorado Front Range
during the second week of
September 2013, impacting
several thousand people
and many homes, roads, and
businesses.
Lyons, CO
usatoday.com
• At least 10 deaths; 11,000 evacuated
• Nearly 19,000 homes damaged, and
over 1,500 destroyed, costing $2 bn
• Several highway bridges
damaged/destroyed, and rail lines
affected
South Platte River, CO
nytimes.com
The 2013 Colorado Floods
P Pall, C Patricola, M Wehner, D Stone, C Paciorek, W Collins. In press.
Colorado Floods September 2013
A more constrained numerical experiment
Step 1: NCEP RE-ANALYSIS. Use Sep 2013 weather from NCEP re-analysis, both under human and adjusted natural conditions…
Step 2: WRF MODEL. …to drive an ensemble of 100 regional model simulations (WRF 12km)…
Step 3: SOUTH PLATTE BASIN (CO). …then extract rain over South Platte basin.
Step 4: DISTRIBUTIONS OF ENSEMBLE RAINFALL. Simulations suggest a substantial human-induced influence on South Platte rainfall…
Step 5: INCREASE IN ODDS OF HEAVY RAINFALL. …with a best estimate of about a doubling in odds of heavy rainfall occurrence.
Legend: Human; Natural (adjusted T, u, v, RH, etc.)
Colorado Sep 2013 floods: Mechanistic approach
• We find a substantial shift in our rainfall distributions over the South Platte basin
(increase in mean of ~30%)
– beyond a thermodynamically induced increase (~7-14%/K), given ΔT = ~1.5-2K
• But the increase in precipitable water (~15%) appears broadly consistent with C-C
• The 30% increase is a result of increased cumulus convective energy
• Not a result of changes in larger scale dynamics or uplifting.
• The “storm that was” was more violent than the “storm that might have been”.
[Figure: 7-DAY RAINFALL]
P. Pall, et al. (2016) Diagnosing Anthropogenic Contributions to Heavy Colorado Rainfall in
September 2013. To appear in Weather and Climate Extremes.
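The thermodynamic scaling quoted above can be worked through numerically. The sketch below compounds a per-kelvin rate over the stated warming range. The 7%/K and 14%/K rates and the 1.5 to 2 K range are from the slide; the compounding form and the function name are assumptions for illustration.

```python
def scaled_increase(rate_per_k, delta_t):
    """Fractional increase from compounding rate_per_k over delta_t kelvin."""
    return (1.0 + rate_per_k) ** delta_t - 1.0


RATES = (0.07, 0.14)      # ~7 %/K (Clausius-Clapeyron) and ~14 %/K
WARMING = (1.5, 2.0)      # delta T range in kelvin

results = {
    (rate, dt): scaled_increase(rate, dt)
    for rate in RATES
    for dt in WARMING
}

for (rate, dt), inc in sorted(results.items()):
    print(f"{rate * 100:.0f} %/K over {dt} K -> {inc * 100:.1f} % increase")
```

At about 7%/K the expected increase for 1.5 to 2 K of warming is roughly 11 to 14%, close to the ~15% change in precipitable water. The ~30% increase in mean rainfall is matched only at the extreme corner of the quoted range (14%/K at 2 K).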
zarzycki@ucar.edu - University of Colorado, Boulder, CO, April 2016
Typhoon Haiyan
• Use VR-CESM in “forecast mode”
• ATM: GFS analysis
• OCN: NOAA OI
• Ensembles of 120 hr forecasts
Init: 12Z 11-04-2013
NOAA IR Obs: 11-07 21Z
111km: 11-07 21Z
8km: 11-07 21Z
Typhoon Haiyan
• Forecast pretty good!
• Little overall change in forecast track
Obs.
All-Hist
Nat-Hist
Present day storm (red) was slightly weaker than the counterfactual storm (blue)
Colder counterfactual SST alone (green) weakened the storm.
Counterfactual initial conditions alone (yellow) intensified the storm.
Changes in winds and shear had little effect.
Colder upper air temperature changes alone intensified the storm.
Lots of unanswered questions. CAM5 vs MIROC5?
Typhoon Haiyan
Video courtesy of Andreas Prein NCAR
Convective outbreak in May 2010
• Objective-based analysis allows the model to be evaluated on
the storm scale
Observation WRF 4 km
Hurricane Katrina
• Hindcast that was (red)
• Hindcast that might have been (blue)
• 3km WRF
• No detectable anthropogenic effect
on cyclone intensity in 2005
• Accumulated precipitation increases
at Clausius-Clapeyron rates.
• 3km WRF
Max wind speed
• End of 21st century (RCP8.5)
• But intensity increases in a much
warmer world
• 9 & 27km WRF
Max wind speed
• Not an ideal candidate
• Track is not as stable
to perturbations and
simulation start date
00 UTC 25 Oct 2005
18 UTC 24 Oct 2005
Superstorm
Sandy
Factual Counterfactual
Superstorm Sandy
No discernible change in intensity
But storm surge was worse because of sea level rise
(GFDL ran detailed storm surge calculations)
• Christina finds little anthropogenic effect on Hurricane Katrina in 2005
but an intensification if a similar storm occurs in 2100.
• Andreas finds more MCS events and that they move slower in a
warmer world. Maximum rain rates up to 40% more in 2100.
• Our project at LBNL estimates that 28 sustained petaflops are required
for a global 2km climate model.
• We provided a technology path forward based on consumer
electronics design practices (Eliminate waste with a reduced
instruction set.)
• Each hourly 2D variable would require 6TB/year and would need to
be written at 200MB/sec.
– But many variables are of interest so the total is a lot more than
this.
– Some but not all tasks would be better suited for in-line calculations.
Cloud system resolving models
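The storage figures above can be sanity-checked with a back-of-envelope estimate. The sketch below assumes a uniform 2 km cell spacing over a sphere of radius 6371 km, 4-byte values, and a 365-day year. None of these assumptions come from the slides, so the result should only be expected to agree with the quoted 6TB/year to within a factor of order one.

```python
import math

EARTH_RADIUS_KM = 6371.0   # assumed mean Earth radius


def cells_on_sphere(spacing_km):
    """Approximate number of cells of area spacing_km**2 covering the globe."""
    area = 4.0 * math.pi * EARTH_RADIUS_KM ** 2
    return area / spacing_km ** 2


def bytes_per_year(n_cells, bytes_per_value=4, steps_per_year=24 * 365):
    """Bytes needed to store one 2D variable at every step for a year."""
    return n_cells * bytes_per_value * steps_per_year


n_cells = cells_on_sphere(2.0)
volume_tb = bytes_per_year(n_cells) / 1e12

# Using the slide's own figures: time to write 6 TB at 200 MB/s.
write_seconds = 6e12 / 200e6
write_hours = write_seconds / 3600

print(f"cells at 2 km: {n_cells:.3g}")
print(f"hourly 2D variable: {volume_tb:.1f} TB per simulated year")
print(f"writing 6 TB at 200 MB/s takes {write_hours:.1f} hours")
```

The estimate comes out at about 4.5 TB per simulated year, the same order as the quoted 6TB. Writing 6 TB at 200 MB/s takes a little over 8 hours, so that write rate corresponds to a model that produces a simulated year in roughly that much wall-clock time.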
• Over 4PB of output from a single high-resolution global model is available now.
• Community Atmospheric Model (CAM5.1)
• 25km
Done now
• 5 realizations of a world that was (1996-2015)
• 5 realizations of a world that Paris COP21 wanted (2105-2115) 1.5K over
preindustrial
• 5 realizations of a world that is also not very likely (2105-2115) 2.0K over
preindustrial
• Done soon
• 5 realizations of a world that might have been (1996-2005)
• 5 realizations of a world that we currently are headed towards (2080-2100)
– RCP8.5 (3.5K over preindustrial)
Available data.
http://portal.nersc.gov/c20c/
C20C+ detection and attribution subproject
• As climate models get to finer resolution, higher frequency
data becomes more interesting, causing dataset sizes to
increase yet more.
• Better simulated storms.
• More realistic extreme weather.
• New questions can be asked. And answered!
• Supervised machine learning works great for finding things
we already know something about (i.e. storms).
• Can unsupervised machine learning reveal other climate
features?
• New classes of storms?
• New modes of variability?
Conclusions
New Journal!
Intended as a bridge between the statistics and climate/weather/ocean communities
http://advances-statistical-climatology-meteorology-oceanography.net/index.html
Contact me if you want some data!
Thank you!
mfwehner@lbl.gov

Program on Mathematical and Statistical Methods for Climate and the Earth System Opening Workshop, Computational and Mathematical Challenges in Climate Modeling - Michael Wehner, Aug 21, 2017
