Machine learning and climate and weather research

We present the current activities of the German Climate Computing Center (DKRZ) related to the application of machine learning and deep learning in fundamental weather and climate research. We follow the Nature article "Deep learning and process understanding for data-driven Earth system science" (https://www.nature.com/articles/s41586-019-0912-1), elaborate on the hybrid model in the article "Physics-guided Neural Networks (PGNN): An Application in Lake Temperature Modeling" (https://arxiv.org/abs/1710.11431), and explain the recent application of Nvidia image inpainting to the reconstruction of missing temperature data (Kadow et al. (2020), "Artificial Intelligence reconstructs missing Climate Information" (in review)).

Published in: Data & Analytics

  1. 1. Applying machine learning to address pressing issues of fundamental weather and climate research. Karsten Peters (peters@dkrz.de), Maria Moreno de Castro (moreno@dkrz.de)
  2. 2. The presentation follows this perspective article
  3. 3. The Earth System: complex (biology + chemistry + physics) and unique. Figure labels: molecule, cell, organ, organism, ecosystem, landscape, region, biome, globe. Slide courtesy of the author, Markus Reichstein
  4. 4. THE EARTH SYSTEM
  5. 5. THE EARTH SYSTEM The behavior is dominated by spatial and temporal relations. Main research focus: ○ seasonal meteorological predictions ○ forecasting extreme events: floods, fires,... ○ long-term climate predictions
  6. 6. It’s not like we haven’t got enough data at our hands… There’s observational data…
  7. 7. Example of observational data collection by remote sensing: the A-Train - 7 satellites flying in formation - In operation for ~18 years - Aqua collects about 89 GB of data/day
  8. 8. It’s not like we haven’t got enough data at our hands… There’s observational data … ...and model data.
  9. 9. Model data are the result of simulations: differential equations derived from physical models are solved numerically by discretizing the Earth and representing key processes with parameterizations
  10. 10. Model data: calculate physical processes, apply boundary conditions, and repeat for every model time step and for every point of the globe
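The loop described on this slide (calculate physical processes, apply boundary conditions, repeat for every time step and grid point) can be illustrated with a deliberately tiny toy model. The sketch below is not a climate model: it assumes a 1-D heat-diffusion equation as the only "physical process", an explicit Euler scheme, and fixed temperatures at both ends as boundary conditions. All names and parameter values are hypothetical.

```python
import numpy as np

def step(temp, alpha, dt, dx):
    """One explicit Euler step of 1-D heat diffusion (the toy 'physical process')."""
    new = temp.copy()
    # Update every interior grid point from its two neighbours.
    new[1:-1] = temp[1:-1] + alpha * dt / dx**2 * (
        temp[2:] - 2.0 * temp[1:-1] + temp[:-2]
    )
    return new

def apply_boundary_conditions(temp, left, right):
    """Hold both ends of the domain at prescribed temperatures."""
    temp[0], temp[-1] = left, right
    return temp

def run(n_points=21, n_steps=5000, alpha=1.0, dt=0.2, dx=1.0,
        left=30.0, right=-10.0):
    # alpha * dt / dx**2 = 0.2 keeps the explicit scheme stable (must be <= 0.5).
    temp = np.full(n_points, 10.0)              # initial condition
    temp = apply_boundary_conditions(temp, left, right)
    for _ in range(n_steps):                    # repeat for every model time step
        temp = step(temp, alpha, dt, dx)        # calculate physical processes
        temp = apply_boundary_conditions(temp, left, right)
    return temp

model_data = run()
print(model_data.round(2))
```

After enough steps the toy field relaxes to a straight line between the two boundary values, which is the steady state of the diffusion equation. Real models follow the same loop structure with many coupled processes on a 3-D grid.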
  11. 11. Multimodel analysis ©DKRZ/MPI-M Last report: 3.5PBytes Next report: ~ 30PBytes
  12. 12. Climate Models at the 1km scale are coming up ~650 GB of data per output time step
  13. 13. Mistral High Performance Computer - 6 years old (new in 2020) - 3.9 PFlops (#80 in Top500) - 52 PiBytes disk (#4 in IO500) Tape archive - >200 PiBytes - 5 PiBytes disk cache
  15. 15. Example of spatio-temporal relations: the prediction of fire occurrence and the estimation of burnt area and trace gas emissions depend on: ● instantaneous climatic drivers: temperature, humidity,... ● sources of ignition: humans, lightning,... ● state variables: available fuel,... ● moisture, terrain, wind speed and direction,...
  16. 16. Machine learning applications often do not directly and exhaustively account for spatio-temporal correlations. Deep learning is a promising approach. Example: convolutional networks (spatial) + recurrent networks (memory, sequence learning)
  17. 17. Examples of Deep Learning applications in Earth System Science. Slide courtesy of the author, Markus Reichstein
  18. 18. Deep learning challenges in Earth System science
  19. 19. ● Diverse sources of noise → poor signal-to-noise ratio Deep learning challenges in Earth System science
  20. 20. ● Diverse sources of noise → poor signal-to-noise ratio ● Inconsistencies Deep learning challenges in Earth System science
  21. 21. Fundamental laws of physics: conservation of energy and mass,... We must ensure that deep learning models do not allow negative densities, negative precipitation,... Noether's theorem explains why conservation laws exist (Wikipedia)
  22. 22. ● Diverse sources of noise → poor signal-to-noise ratio ● Inconsistencies → energy and mass conservation, density must be positive,... ● Extrapolation problem Deep learning challenges in Earth System science
  23. 23. Extrapolation problem: classification
  24. 24. Extrapolation problem: classification. The model should show that it is not certain when predicting in undersampled regions...
  25. 25. input data Extrapolation problem: regression
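One common way to make a regression model "show that it is not certain" outside the sampled region is to train an ensemble and look at the spread of its predictions. The sketch below is only an illustration of that idea, not the method used in the slides: it assumes synthetic training data on the interval [0, 1], bootstrap resampling, and cubic polynomial fits standing in for the learned model. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: noisy samples of a smooth function on [0, 1] only.
x_train = rng.uniform(0.0, 1.0, size=200)
y_train = np.sin(2.0 * np.pi * x_train) + rng.normal(0.0, 0.1, size=200)

def ensemble_predict(x_new, n_members=50, degree=3):
    """Mean and spread of an ensemble of polynomial fits on bootstrap resamples."""
    preds = []
    for _ in range(n_members):
        idx = rng.integers(0, x_train.size, size=x_train.size)  # bootstrap resample
        coeffs = np.polyfit(x_train[idx], y_train[idx], degree)
        preds.append(np.polyval(coeffs, x_new))
    preds = np.asarray(preds)
    return preds.mean(axis=0), preds.std(axis=0)

_, spread_inside = ensemble_predict(np.array([0.5]))   # well-sampled region
_, spread_outside = ensemble_predict(np.array([2.0]))  # far outside the training range

print("spread inside :", spread_inside[0])
print("spread outside:", spread_outside[0])
```

The ensemble members agree closely where training data exist and disagree strongly in the extrapolation region, so the spread can serve as a warning signal.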
  26. 26. Non-stationary system: data shift or concept drift ● training data are no longer representative if the system has changed ● the accuracy of the trained model decreases under data shift/concept drift
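A simple way to check whether new inputs still resemble the training data is to compare their distributions. As a minimal sketch, assuming a single scalar feature and synthetic temperatures, the two-sample Kolmogorov-Smirnov statistic (the largest gap between two empirical cumulative distribution functions) can flag a shift. The data and the helper name `ks_statistic` are hypothetical.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return np.max(np.abs(cdf_a - cdf_b))

rng = np.random.default_rng(1)
train = rng.normal(15.0, 2.0, size=2000)    # hypothetical temperatures seen in training
same = rng.normal(15.0, 2.0, size=2000)     # new data from the unchanged system
shifted = rng.normal(17.0, 2.0, size=2000)  # new data after a +2 degree shift

ks_same = ks_statistic(train, same)
ks_shifted = ks_statistic(train, shifted)
print("unchanged system:", ks_same)
print("shifted system  :", ks_shifted)
```

A statistic near zero suggests the training data are still representative; a large value suggests the model is being asked to predict under data shift. This only detects changes in the input distribution, not changes in the input-output relationship (concept drift).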
  27. 27. ● Diverse sources of noise → poor signal-to-noise ratio ● Inconsistencies → energy and mass conservation, density must be positive,... ● Extrapolation problem → system changes in time: data shift or concept drift Deep learning challenges in Earth System science
  28. 28. ● Beyond visible spectrum → different statistical properties, no i.i.d. sets ● 40 000 x 20 000 pixels for a regular 1 km global resolution ● Multiple scales Images Deep learning challenges in Earth System science
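To get a feeling for the image size quoted here, the arithmetic can be spelled out. The grid size comes from the slide (it matches the Earth's circumference of roughly 40 000 km at 1 km spacing); the 4-byte float32 storage and the 512-pixel tile size are assumptions chosen only for illustration.

```python
import math

nx, ny = 40_000, 20_000      # grid size quoted on the slide
n_pixels = nx * ny           # number of grid cells in one 2-D field

bytes_per_value = 4          # assumption: values stored as float32
field_gb = n_pixels * bytes_per_value / 1e9

tile = 512                   # hypothetical tile edge length for a CNN input
n_tiles = math.ceil(nx / tile) * math.ceil(ny / tile)

print(f"{n_pixels:,} pixels per field")
print(f"{field_gb:.1f} GB per float32 field")
print(f"{n_tiles:,} tiles of {tile} x {tile} pixels")
```

Under these assumptions a single 2-D variable at one time step already occupies about 3.2 GB and would have to be split into a few thousand tiles, far larger than typical computer-vision images.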
  29. 29. what is the scale of this? → ● Beyond visible spectrum → different statistical properties, no i.i.d. sets ● 40 000 x 20 000 pixels for a regular 1 km global resolution ● Multiple scales ● Scale invariant features Images Deep learning challenges in Earth System science
  30. 30. Scale invariant!
  31. 31. ● Beyond visible spectrum → different statistical properties, no i.i.d. sets ● 40 000 x 20 000 pixels for a regular 1 km global resolution ● Multiple scales ● Scale invariant features ● No ImageNet → and difficult to have, example: labelling clouds Images Deep learning challenges in Earth System science
  32. 32. ● Beyond visible spectrum → different statistical properties, no i.i.d. sets ● 40 000 x 20 000 pixels for a regular 1 km global resolution ● Multiple scales ● Scale invariant features ● No ImageNet → and difficult to have, example: labelling clouds ● Missing data → a solution: Christopher Kadow, the leader of the DKRZ machine learning research group, adapted the Nvidia technology for image inpainting Images Deep learning challenges in Earth System science
  33. 33. Image inpainting to reconstruct missing temperature observations. Figure panels: original data ('ground truth'), masked data (missing values), reconstruction by deep convolutional NN
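The slides do not describe the network internals. Nvidia's published inpainting work is built on partial convolutions, where each convolution uses only valid (unmasked) pixels, renormalises by how many were valid, and then shrinks the mask. The sketch below shows one such step in NumPy under strong simplifying assumptions: a fixed uniform 3 x 3 kernel instead of learned weights, no bias, and a single channel. The function name `partial_conv_mean` and the test field are hypothetical, and this is not the DKRZ implementation.

```python
import numpy as np

def partial_conv_mean(field, mask, k=3):
    """One partial-convolution step with a uniform k x k kernel.

    field: 2-D array; values at masked positions are ignored (may be NaN).
    mask:  2-D array, 1 where an observation exists, 0 where it is missing.
    Returns the convolved field and the updated mask.
    """
    pad = k // 2
    # Zero out missing values so they contribute nothing to the window sums.
    f = np.pad(np.where(mask == 1, field, 0.0), pad)
    m = np.pad(mask.astype(float), pad)
    out = np.zeros(field.shape, dtype=float)
    new_mask = np.zeros(field.shape, dtype=float)
    for i in range(field.shape[0]):
        for j in range(field.shape[1]):
            valid = m[i:i + k, j:j + k].sum()
            if valid > 0:
                # Renormalise by the number of valid pixels in the window.
                out[i, j] = f[i:i + k, j:j + k].sum() / valid
                new_mask[i, j] = 1.0   # this pixel now counts as valid
    return out, new_mask

# Hypothetical example: a constant 5-degree field with a 3 x 3 hole.
field = np.full((6, 6), 5.0)
mask = np.ones((6, 6))
mask[2:5, 2:5] = 0
field[2:5, 2:5] = np.nan

out1, mask1 = partial_conv_mean(field, mask)   # hole shrinks to its centre pixel
out2, mask2 = partial_conv_mean(out1, mask1)   # hole is closed
print(int(mask.sum()), int(mask1.sum()), int(mask2.sum()))
```

Each step fills the rim of the hole from its valid neighbours, so stacking layers closes progressively larger gaps. In a trained network the uniform kernel is replaced by learned filters.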
  34. 34. Hybrid models Physical models ML and DL models
  35. 35. Physical models ML and DL models Lightweighting/simplifying/speeding up physical models ● improve parametrizations ● analysis of model-observations mismatch ● emulation
  36. 36. Physical models ML and DL models Domain knowledge can guide/optimize purely data-driven methods ● design the architecture ● constrain the cost (or reward) function ● physically based data augmentation: expansion of the data set for undersampled regions
  37. 37. Example: lake simulations to predict temperature from depth measurements (feature: Depth (m); prediction: Temp (°C)). Physical model example: $\mathrm{Temp}_{d+1} = \mathrm{Temp}_{d} + \mathrm{sun} - \mathrm{wind} - \mathrm{upwelling}$, given that we measured $\mathrm{Temp}_{d=\mathrm{surface}} = 15\,^{\circ}\mathrm{C}$
  38. 38. Feature: Depth (m); prediction: Temp (°C). Physical model example: $\mathrm{Temp}_{d+1} = \mathrm{Temp}_{d} + \mathrm{sun} - \mathrm{wind} - \mathrm{upwelling}$, given that we measured $\mathrm{Temp}_{d=\mathrm{surface}} = 15\,^{\circ}\mathrm{C}$. Moderate model skill and, of course, zero inconsistency
  39. 39. Feature: Depth (m); prediction: Temp (°C). A neural network might allow negative densities! Better model skill, but the inconsistency spreads
  40. 40. Features: Depth (m), Density (g/L); prediction: Temp (°C). DATA AUGMENTATION: include new features that come from physical knowledge, then apply the NN
  41. 41. Features: Depth (m), Density (g/L); prediction: Temp (°C). DATA AUGMENTATION: include new features that come from physical knowledge, then apply the NN. Even better model skill and a bit less inconsistency, but it still spreads
  42. 42. Features: Depth (m), Density (g/L); prediction: Temp (°C). Physical model + NN + constrained loss function: denser water must be deeper
  43. 43. Features: Depth (m), Density (g/L); prediction: Temp (°C). Physical model + NN + constrained loss function: denser water must be deeper. Great model performance (~1°C less error) and totally consistent!
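The constraint "denser water must be deeper" can be written as an extra term in the loss. The sketch below follows the general idea of the physics-guided neural network paper cited in the description, but it is an illustration rather than the authors' code. It assumes an empirical temperature-to-density relation for fresh water (quoted here from memory of that paper, so treat the coefficients as an assumption), a hinge penalty on density inversions between neighbouring depths, and a hypothetical weight `lam`. Densities in kg/m^3 are numerically equal to the g/L used on the slide.

```python
import numpy as np

def water_density(temp_c):
    """Empirical fresh-water density (kg/m^3) from temperature (deg C).

    Assumed relation; the maximum density lies near 4 deg C.
    """
    return 1000.0 * (1.0 - (temp_c + 288.9414) * (temp_c - 3.9863) ** 2
                     / (508929.2 * (temp_c + 68.12963)))

def physics_penalty(temp_profile):
    """Mean density inversion along a profile ordered from surface to bottom."""
    rho = water_density(np.asarray(temp_profile, dtype=float))
    violation = rho[:-1] - rho[1:]       # positive where shallower water is denser
    return np.mean(np.maximum(violation, 0.0))

def pgnn_loss(temp_pred, temp_obs, lam=1.0):
    """Data misfit (MSE) plus weighted physical-inconsistency penalty."""
    temp_pred = np.asarray(temp_pred, dtype=float)
    temp_obs = np.asarray(temp_obs, dtype=float)
    mse = np.mean((temp_pred - temp_obs) ** 2)
    return mse + lam * physics_penalty(temp_pred)

consistent = [15.0, 12.0, 9.0, 6.0, 4.5]   # cools with depth: density increases
inverted = [6.0, 9.0, 12.0, 15.0]          # warms with depth: density decreases

print("penalty, consistent profile:", physics_penalty(consistent))
print("penalty, inverted profile  :", physics_penalty(inverted))
```

Because the penalty depends only on the predictions, it can also be evaluated on unlabelled depth profiles, which is one reason such constraints help in undersampled regions.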
  44. 44. References ● Earth System figure: https://karenbakker.org/the-climate-system/ ● Data cube image: Earth Syst. Dynam., 11, 201–234 (2020), https://doi.org/10.5194/esd-11-201-2020 ● Climate model image: A. Gettelman and R.B. Rood, Demystifying Climate Models, Earth Systems Data and Models 2, doi: 10.1007/978-3-662-48959-8_5 ● Multimodel figure: Michael Böttinger (DKRZ) and Jochem Marotzke (Max Planck Institute for Meteorology) ● Mistral picture: Michael Böttinger (DKRZ) ● Wildfire picture: https://pixnio.com/miscellaneous/fire-flames-pictures/aerial-ignition-interior-high-rates-of-spread-in-open-savannas ● Hockey stick figure: IPCC, https://www.ipcc.ch/report/ar3/wg1/ (Chapter 2) ● Scale invariance issue with chihuahua and dingo: Christian Staudt, http://clstaudt.me ● Scale-free image: http://paulbourke.net/fractals/googleearth/ ● Image inpainting: ○ Nvidia technology: https://www.nvidia.com/research/inpainting/ ○ Kadow et al. (2020), Artificial Intelligence reconstructs missing Climate Information (in review) ● Physics-guided neural networks: https://arxiv.org/pdf/1710.11431.pdf and https://towardsdatascience.com/physics-guided-neural-networks-pgnns-8fe9dbad9414
  45. 45. DKRZ Unit: Machine Learning as a Service ● Provide a knowledge base ● Bring prototypes to production ● Train, educate, and exchange
  46. 46. machinelearning-join@lists.dkrz.de
  47. 47. Extra slide: Artificial intelligence and machine learning activities at DKRZ. Summary of the main topics discussed in the kick-off workshop ● Applying machine learning to Earth System modelling ○ Hybrid approaches to (i) improve parametrizations and (ii) validate physical models ○ Increase the availability of training data via (i) data augmentation and (ii) labelling ○ Infer causality of the patterns found in observational data ● Technology ○ Support for Python, porting to GPUs, and larger memory ○ Machine learning libraries for NetCDF data handling ○ Adaptive learning integrated with physical models at run time on the HPC ○ Distributed training and execution ○ Portability between HPC centres ● Uncertainty and reproducibility ○ Performance metrics for (i) unsupervised learning and (ii) data shift/concept drift ○ Adoption of interpretable models, uncertainty quantification, and explainability methods ○ Sharing of training scripts and training data or trained models ● Community activities and capacity building: workshops, summer schools,...
  48. 48. Extra slide: Climate models ● physical models derived from first principles (mechanistic) ● used to simulate how the Earth’s climate changes in time (dynamical) ● written in the form of coupled differential equations ● solution depends on boundary and initial conditions ● running with different conditions allows scenarios to be created (see, for instance, RCP) ● solved numerically with long-lasting parallel runs ● calibrated and validated against observational data ● results are called model data. Figure labels: year; historical; RCP 2.6: the best-case scenario; RCP 8.5: business as usual. Example of a basic climate model including time (t) and space (x): $\mathrm{Temp}_{t+1,x+1} = \mathrm{Temp}_{t,x} + \mathrm{warming}_{t,x} - \mathrm{cooling}_{t,x}$, given that $\mathrm{Temp}_{t=0} = 25\,^{\circ}\mathrm{C}$
  49. 49. Extra slide: High Performance Computing Data Center. Ongoing efforts to reduce our carbon footprint: ● Power Usage Effectiveness close to 1 (PUE = 1.19) ● Cold aisle containment reduces CO2 emissions by 20% ● Hot air is recycled for heating nearby facilities ● Cooling water is recycled in our toilets ● Greener energy supplier possible
