McKinley, Galen: Physical knowledge to improve and extend machine learning pCO2 reconstructions

Physical knowledge to improve and extend machine learning pCO2 reconstructions
Galen A. McKinley
Valerie Bennington, Lucas Gloege, Amanda Fay
Columbia University, Earth and Environmental Science
Lamont-Doherty Earth Observatory
ICOS Science Conference
September 13, 2022

2
Uncertainties remain significant for the ocean carbon sink.
And observation-based products are temporally limited.
[Figure panels: Fluxes since 1850 (in GtCO2/yr); Ocean flux since 1960 (in GtCO2/yr)]
Friedlingstein et al., 2022

3
Limitations of observation-based reconstructions of pCO2
• Few observations in the 1980s and none prior
• Limited explainability of machine learning
• Existing knowledge about CO2 dynamics is often not incorporated into algorithms
[Figure, units µatm] Gloege et al., JAMES 2021

5
The impact of temperature on pCO2 is very well established

6
Method #1: pCO2-Residual
• Sea Surface Temperature has a known direct impact on ocean pCO2
• Biogeochemistry and physical processes drive the remaining variability
• By removing the temperature component, we focus the statistics on biogeochemical-physical impacts on pCO2
[Figure panels: Mean pCO2; pCO2-T]
Takahashi et al., 1993; Bennington et al. 2022, in review

7
Mean pCO2 + monthly SST
[Figure: pCO2-T (blue) and observed pCO2 (red)]
pCO2-Residual = pCO2 − pCO2-T
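
The decomposition on this slide can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: it assumes the commonly cited temperature sensitivity of about 4.23% per °C from Takahashi et al. (1993), and it assumes pCO2-T is built by scaling a long-term mean pCO2 by the monthly SST anomaly. The function names and sample numbers are hypothetical.

```python
import math

# Assumed temperature sensitivity of pCO2 (per deg C), the commonly cited
# value from Takahashi et al. (1993); it is not stated on the slide.
GAMMA_T = 0.0423


def pco2_t(mean_pco2, sst, mean_sst):
    """Temperature-driven pCO2 (uatm): mean pCO2 scaled by the SST anomaly."""
    return mean_pco2 * math.exp(GAMMA_T * (sst - mean_sst))


def pco2_residual(pco2_obs, mean_pco2, sst, mean_sst):
    """pCO2-Residual = observed pCO2 minus the temperature component."""
    return pco2_obs - pco2_t(mean_pco2, sst, mean_sst)


# Hypothetical values for one grid cell and month (illustrative only).
mean_pco2, mean_sst = 360.0, 18.0
sst, pco2_obs = 20.0, 370.0
residual = pco2_residual(pco2_obs, mean_pco2, sst, mean_sst)
```

With no SST anomaly, `pco2_t` returns the mean pCO2 unchanged, so the residual then reduces to the departure of the observation from the mean.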

8
pCO2-Residual values make physical sense and are approximately normally distributed

9
Reconstruct pCO2-Residual with XGBoost
pCO2-Residual = pCO2 − pCO2-T
1. XGB learns pCO2-Residual as a function of features
2. Combine with pCO2-T for final result
Input data (“Features”)
• Satellite data
  − Sea Surface Temp. (SST)
  − Chlorophyll-a (Chl-a)
• Monthly climatological
  − Mixed layer depth
  − Sea Surface Salinity (SSS)
• Location and time
  − Day of year (DOY)
  − Latitude, Longitude (n-vector)
• xCO2
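
The location and time features can be illustrated with a short sketch. It assumes one common n-vector convention (a 3-D unit vector from latitude and longitude) and a sine/cosine encoding of day of year; the slide only names "n-vector" and "DOY", so the exact transforms, the column order, and all function names here are assumptions for illustration.

```python
import math


def n_vector(lat_deg, lon_deg):
    """Map latitude/longitude to a 3-D unit vector (one common convention)."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )


def cyclic_doy(doy, days_in_year=365.0):
    """Encode day of year as (sin, cos) so late December sits next to early January."""
    angle = 2.0 * math.pi * doy / days_in_year
    return (math.sin(angle), math.cos(angle))


def feature_row(sst, chl, mld, sss, xco2, lat_deg, lon_deg, doy):
    """Assemble one feature vector in a fixed (hypothetical) column order."""
    return [sst, chl, mld, sss, xco2, *n_vector(lat_deg, lon_deg), *cyclic_doy(doy)]


# Hypothetical values for a single grid cell and day (illustrative only).
row = feature_row(sst=20.0, chl=0.3, mld=45.0, sss=35.1, xco2=400.0,
                  lat_deg=40.0, lon_deg=-179.9, doy=15)
```

The point of the n-vector is that longitudes just either side of the date line map to nearby feature values, which raw longitude in degrees does not provide.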

10
pCO2-Residual performs well against independent data
Bennington et al. 2022, in review

11
Flux timeseries, 1985-2019

12
Ocean models have long been used as the basis for the Global Carbon Budget.
Can these be used to support reconstruction of real ocean pCO2?

13
Method #2: Hybrid Data Physics
• Use GCB Hindcast models as a first guess (or “prior”)
• Calculate the difference between the model pCO2 and SOCAT data
• Apply XGB algorithm to reconstruct a full-field model correction
• Get final estimate of real-world pCO2 by adding this estimated correction at each point
Hindcast ocean biogeochemical models + SOCAT Data
Gloege et al. 2022, JAMES
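
The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the published implementation: the misfit is taken as observation minus model (so that adding it to the model moves the estimate toward the data), and a simple nearest-neighbour regressor stands in for XGBoost so the example has no dependencies. All names and numbers are hypothetical.

```python
def fit_nearest_neighbour(features, targets):
    """Stand-in regressor (NOT XGBoost): predict the target of the closest sample."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, f)) for f in features]
        return targets[dists.index(min(dists))]
    return predict


def hybrid_reconstruction(model_pco2, obs_pco2, features, fit=fit_nearest_neighbour):
    """Correct a model first guess with a learned model-observation misfit.

    model_pco2: model pCO2 at every grid point
    obs_pco2:   observed pCO2, or None where no observation exists
    features:   one feature vector per grid point
    """
    # 1. Misfit where observations exist (assumed sign: observation minus model).
    idx = [i for i, obs in enumerate(obs_pco2) if obs is not None]
    train_x = [features[i] for i in idx]
    train_y = [obs_pco2[i] - model_pco2[i] for i in idx]
    # 2. Learn the misfit as a function of the features.
    predict = fit(train_x, train_y)
    # 3. Add the estimated full-field correction to the model at each point.
    return [m + predict(x) for m, x in zip(model_pco2, features)]


# Hypothetical four-point example with observations at two points.
model = [350.0, 360.0, 370.0, 380.0]
obs = [355.0, None, None, 372.0]
feats = [(10.0,), (12.0,), (24.0,), (26.0,)]
estimate = hybrid_reconstruction(model, obs, feats)
```

Passing the regressor in as `fit` keeps the correction step separate from the choice of learning algorithm, so a gradient-boosted model could be substituted without changing the rest.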

14
Full-coverage misfit individually estimated for each model, month
[Figure: pCO2 misfit (µatm)] Gloege et al. 2022, JAMES

15
Climatological misfits are much larger than interannual
[Figure: Princeton Model, others similar] Bennington et al. 2022 GRL

16
Since climatological misfit dominates, how much skill is gained by applying only this as the correction, as opposed to an interannually varying one?
• HPDClimTest applies the 2000-2020 climatology of the model-observation misfit
Timeline: model period begins 1959; observations begin 1982; end 2020
LDEO-HPD = Model pCO2 + Interannual Misfit
HPD: Model pCO2 + Climatological Misfit
HPDClimTest = Model pCO2 + Climatological Misfit
Bennington et al. 2022 GRL
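
A climatological correction of this kind can be sketched as a per-calendar-month average of the misfit over 2000-2020, added back to the model in any year. This is a minimal sketch for a single grid point; the dictionary layout and function names are hypothetical, and the sample values are made up for illustration.

```python
from collections import defaultdict


def monthly_climatology(misfit, start_year=2000, end_year=2020):
    """Average misfit by calendar month over the climatology window.

    misfit: dict mapping (year, month) -> misfit (uatm) at one grid point
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for (year, month), value in misfit.items():
        if start_year <= year <= end_year:
            sums[month] += value
            counts[month] += 1
    return {month: sums[month] / counts[month] for month in sums}


def apply_climatological_correction(model_pco2, climatology):
    """Model pCO2 + climatological misfit, for any (year, month)."""
    return {
        (year, month): value + climatology[month]
        for (year, month), value in model_pco2.items()
    }


# Hypothetical January misfits in three years; 1995 is outside the window.
misfit = {(1995, 1): 20.0, (2000, 1): 4.0, (2010, 1): 6.0}
clim = monthly_climatology(misfit)
corrected = apply_climatological_correction({(1965, 1): 320.0}, clim)
```

Because the climatology depends only on calendar month, the same correction can be applied to model years with no observations at all.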

17
Most improvement over original models is climatological
Comparison data (1990-2020) not used in algorithm training: GLODAP and LDEO pCO2 (not in SOCAT)
Bennington et al. 2022 GRL

21
Apply climatological misfit to extend back to 1959
• HPDClimTest applies the 2000-2020 climatology of the model-observation misfit
• Since the climatological misfit dominates, apply it in the pre-observation period
Timeline: model period begins 1959; observations begin 1982; end 2020
1982-2020: LDEO-HPD = Model pCO2 + Interannual Misfit
1959-1981: LDEO-HPD = Model pCO2 + Climatological Misfit
HPDClimTest = Model pCO2 + Climatological Misfit
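
The extension back to 1959 amounts to choosing the correction by period. A minimal sketch, assuming the switch happens at 1982 as on the slide's timeline; the function name, dictionary layout, and numbers are hypothetical.

```python
OBS_START = 1982  # first year of the observed period on the slide's timeline


def ldeo_hpd(year, month, model_pco2, interannual_misfit, clim_misfit):
    """Model pCO2 plus the misfit appropriate to the period."""
    if year >= OBS_START:
        # Observed period: use the misfit estimated for that specific month.
        correction = interannual_misfit[(year, month)]
    else:
        # Pre-observation period: fall back to the monthly climatology.
        correction = clim_misfit[month]
    return model_pco2[(year, month)] + correction


# Hypothetical values for March at one grid point (illustrative only).
model_pco2 = {(1960, 3): 315.0, (1990, 3): 350.0}
interannual_misfit = {(1990, 3): -3.0}
clim_misfit = {3: 2.0}
early = ldeo_hpd(1960, 3, model_pco2, interannual_misfit, clim_misfit)
late = ldeo_hpd(1990, 3, model_pco2, interannual_misfit, clim_misfit)
```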

22
LDEO-HPD: Air-sea CO2 flux, 1959-2020

23
Conclusions
• Physical knowledge can be incorporated into machine learning algorithms, and leads to improved reconstruction skill
• pCO2-Residual
  − Focuses the statistics on the biogeochemical-physical component of pCO2
• LDEO-HPD
  − Uses a suite of hindcast ocean models as a prior, corrected with SOCAT data
  − Climatological correction is most impactful, supporting extension back to 1959

24
THANK YOU
mckinley@ldeo.columbia.edu

26
Flux timeseries, 1985-2019
Bennington et al. 2022, in review
