The retrieval algorithms in remote sensing generally involve complex physical forward models that are nonlinear and computationally expensive to evaluate. Statistical emulation provides a computationally cheap alternative that can be used to calibrate model parameters and to improve the computational efficiency of the retrieval algorithms. We introduce a framework that combines dimension reduction of the input and output spaces with Gaussian process emulation. Functional principal component analysis (FPCA) is chosen to reduce the output space of thousands of dimensions by orders of magnitude. In addition, instead of making restrictive assumptions regarding the correlation structure of the high-dimensional input space, we identify and exploit the most important directions of this space and thus construct a Gaussian process emulator with feasible computation. We will present preliminary results obtained from applying our method to OCO-2 data, and discuss how our framework can be generalized in distributed systems.
CLIM Program: Remote Sensing Workshop, Statistical Emulation with Dimension Reduction for Complex Physical Forward Models - Emily Kang, Feb 12, 2018
1. Statistical Emulation with Dimension Reduction for
Complex Physical Forward Models
Emily L. Kang
Department of Mathematical Sciences
University of Cincinnati
SAMSI-JPL Workshop “Remote Sensing, Uncertainty Quantification, and the
Theory of Data Systems” @Caltech, February 2018
This research is joint with Jon Hobbs (JPL), Alex Konomi (Univ. of Cincinnati), Pulong Ma
(Univ. of Cincinnati), and Anirban Mondal (Case Western Reserve Univ.), and Joon Jin Song
(Baylor Univ.)
Emulator Group February 2018 1 / 25
3. Introduction
Introduction
Complex physical models often require large resources in time and
memory to produce realistic results.
To characterize the impact of the uncertainties in the conditions or
the parameterization of these models (including computer models,
simulators), a sufficient number of simulations is required. However,
this can become extremely costly or even prohibitive.
One prevailing way to overcome this hurdle is to construct statistical
surrogates, namely emulators, to approximate the complex physical
models in a probabilistic way.
6. Introduction
Emulating the Forward Model in Remote Sensing
[Diagram: the state vector X passes through the forward model F(X, B) plus noise ε ∼ N(0, Σε) to produce the radiance Y; the retrieval algorithm R(Y, B, F) maps Y back to the estimate X̂, whose marginal distribution is compared with that of X.]
Radiative transfer (RT) computation within the forward model
F(X, B) is usually the rate-limiting step in many remote sensing
applications.
9. Introduction
Goal and Motivation
We would like to build a statistical emulator ˆF(X, B) to reproduce
radiances that does not compromise much accuracy while
enhancing computational efficiency.
We can employ this computationally more efficient emulator for any
subsequent purposes such as uncertainty propagation, sensitivity
analysis, and calibration.
The emulator can also potentially be used to speed up the retrieval
algorithm.
10. Introduction
Challenges
The outputs of F(X, B) are high-dimensional. In OCO-2, the outputs
are radiances at hundreds of wavelengths from three bands, the O2
band, the weak CO2 band, and the strong CO2 band.
Directly emulating the relationship between high-dimensional outputs
and inputs can be complicated, in terms of both computation and
modeling.
F(X, B) is a nonlinear complex function with high-dimensional inputs
X, called the state vector. In OCO-2, X is m-dimensional with
m = 62.
Theoretical results in the literature have shown that the goodness of
the approximation of a target function using a statistical emulator
such as a Gaussian process (GP) deteriorates as the dimension m
increases.
13. Method
Emulation with Dimension Reduction
We propose a framework to perform dimension reduction for both inputs
and outputs and then build a GP emulator on the low-dimensional spaces.
Step 1: Given a set of n available pairs of state vectors and soundings
(X1, Y1), . . . , (Xn, Yn), perform dimension reduction for X and Y:
Dimension reduction for Y: Performing functional principal component
analysis (FPCA) to obtain the functional principal component scores ξ
Dimension reduction for X: Using the active subspace (AS) approach
to project X to a low-dimensional vector S = PX
Step 2: Build a GP emulator using the low-dimensional pairs (Si , ξi ),
i = 1, . . . , n.
18. Method
The OCO-2 Radiance Data
[Figure: radiance spectra of 10 soundings plotted against wavelength index (0–1000) in three panels, the O2, weak CO2 (WCO2), and strong CO2 (SCO2) bands, with radiances on the order of 10^19 to 10^20.]
The wavelengths vary from sounding to sounding.
Many data are missing.
19. Method Dimension Reduction for Y
Define Yijk as the radiance at wavelength ωijk for the ith sounding and
jth spectral band, where i = 1, . . . , n, j = 1, 2, 3, and k = 1, . . . , Nij .
Here, Nij can vary across both soundings (indexed by i) and spectral
bands (indexed by j).
Since we have radiances at irregular wavelengths, we perform
functional principal component analysis (FPCA) on the radiances to
reduce dimension.
20. Method Dimension Reduction for Y
Dimension Reduction for Radiances
We model the radiances as realizations of a random function,
Yijk = µj(ωijk) + Σ_{l=1}^{∞} ξijl φjl(ωijk);
µj(·) denotes the mean function for the jth spectral band, j = 1, 2, 3;
φjl(·) denotes the lth eigenfunction for the jth spectral band,
corresponding to nonincreasing eigenvalues λjl;
{ξijl} are random variables with mean 0 and var(ξijl) = λjl.
{ξijl} are called the functional principal component (FPC) scores.
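Once the mean function, eigenfunctions, and scores are estimated, a band’s radiance curve can be rebuilt from the truncated expansion above. The sketch below is a minimal illustration of that reconstruction; the function name and the callable representation of µ and the φ’s are illustrative assumptions, not part of the OCO-2 pipeline.

```python
import numpy as np

def reconstruct_radiance(omega, mu_hat, phi_hat, scores):
    """Truncated Karhunen-Loeve reconstruction of one band's radiance
    curve: Y(omega) ~ mu(omega) + sum_l xi_l * phi_l(omega).

    omega   : (N,) wavelengths at which to evaluate the curve
    mu_hat  : callable, estimated mean function for the band
    phi_hat : list of callables, leading estimated eigenfunctions
    scores  : leading FPC scores xi_l for this sounding/band
    """
    y = mu_hat(omega).astype(float).copy()
    for xi, phi in zip(scores, phi_hat):
        y += xi * phi(omega)
    return y
```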
25. Method Dimension Reduction for Y
Computational Methods for FPCA
Various methods have been proposed for FPCA, including Ramsay
and Silverman (2005), Yao et al. (2005), and Li and Hsing (2010).
As in the previous work, we carry out FPCA for the jth spectral band:
Estimate the mean curve µj using a local linear smoother (Fan and
Gijbels, 1996);
Estimate the covariance function Gj(ω1, ω2) = Σ_l λjl φjl(ω1)φjl(ω2)
using a local linear smoother;
Obtain estimates of the eigenfunctions and eigenvalues, ˆφjl and ˆλjl;
Estimate the FPC score ξijl = ∫ (Yij(ω) − µj(ω))φjl(ω) dω by numerical
integration:
ˆξijl = Σ_{k=1}^{Nij} (Yijk − ˆµj(ωijk)) ˆφjl(ωijk) (ωijk − ωij(k−1)).
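The numerical-integration step for the FPC scores can be sketched as follows; the helper name `fpc_scores` and the callable forms of the estimated mean curve and eigenfunctions are hypothetical, and the first quadrature gap is set to zero since ωij0 is not defined in the sum above.

```python
import numpy as np

def fpc_scores(y, omega, mu_hat, phi_hat):
    """Riemann-sum estimate of the FPC scores on an irregular grid:
        xi_hat_l = sum_k (Y_k - mu(omega_k)) * phi_l(omega_k) * (omega_k - omega_{k-1}).

    y       : (N,) radiances for one sounding and band
    omega   : (N,) increasing wavelengths
    mu_hat  : callable, estimated mean curve
    phi_hat : list of callables, estimated eigenfunctions
    """
    # First gap omega_1 - omega_0 is undefined; set it to 0 (assumption).
    dw = np.diff(omega, prepend=omega[0])
    resid = y - mu_hat(omega)
    return np.array([np.sum(resid * phi(omega) * dw) for phi in phi_hat])
```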
30. Method Dimension Reduction for Y
For the jth spectral band, by choosing Kj principal components, we
transform the original Nij-dimensional outputs Yij = (Yij1, . . . , YijNij)′
to a Kj-dimensional output ξij = (ξij1, . . . , ξijKj)′, i = 1, . . . , n.
The number of principal components, Kj, is chosen using
leave-one-curve-out or K-fold cross-validation, and can differ
across the three bands.
Let ξi ≡ (ξi1′, ξi2′, ξi3′)′. It has K = Σ_{j=1}^{3} Kj elements, and is a
lower-dimensional vector compared to the original ith sounding
Yi ≡ (Yi1′, Yi2′, Yi3′)′, for i = 1, . . . , n.
Our preliminary results show that with Kj = 2 or 3, we can achieve a
nearly lossless representation, with at least 99% of the variation
preserved.
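When the choice is based on variance explained rather than cross-validation, the smallest Kj preserving a target fraction of the variation can be read directly off the estimated eigenvalues; a minimal sketch with the 99% threshold from above (the function name is an assumption):

```python
import numpy as np

def choose_num_components(eigvals, threshold=0.99):
    """Smallest K such that the leading K eigenvalues explain at least
    `threshold` of the total variation -- a simpler alternative to the
    cross-validation choices mentioned above."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # descending
    frac = np.cumsum(eigvals) / eigvals.sum()                  # cumulative fraction
    return int(np.searchsorted(frac, threshold) + 1)
```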
34. Method Dimension Reduction for X
Inputs X
The input of the forward model is an m-dimensional state vector
X. In OCO-2, m = 62.
An intuitive way to reduce the dimension is to apply principal
component analysis (PCA). However, PCA only considers the
correlation structure within X and ignores the role of the outputs Y.
We call R(X) ∈ Rd where d < m a sufficient dimension reduction if
p(Y|X) = p(Y|R(X)),
where p(Y|X) and p(Y|R(X)) are conditional probability density
functions with respect to X and R(X), respectively.
37. Method Dimension Reduction for X
Therefore, we want to detect the subspace Rd ⊂ Rm to which the
log-likelihood of Y, f, is most sensitive.
1. Compute the gradient ∇f ≡ ∇X f(X).
2. Compute the eigenvalue decomposition of the m × m matrix
C ≡ Eπ(X)[(∇f)(∇f)ᵀ]:
C = WΛWᵀ,
where W is an orthogonal matrix and Λ = diag{λ1, . . . , λm} is the
diagonal matrix of ordered eigenvalues with λ1 ≥ λ2 ≥ · · · ≥ λm ≥ 0.
3. Let W ≡ [W1, W2] and Λ ≡ blockdiag{Λ1, Λ2}, where Λ1 contains
the largest d eigenvalues and W1 contains the corresponding d
eigenvectors.
The column space of W1 is called the active subspace.
S = W1ᵀX is called the active variable.
U = W2ᵀX is called the inactive variable.
X = W1S + W2U.
41. Method Dimension Reduction for X
Some More Details
Given data (X1, Y1), . . . , (Xn, Yn), we can use a Monte Carlo
approximation to identify the active subspace by
calculating ∇fi ≡ ∇X f(Xi) for i = 1, . . . , n;
computing ˆC ≡ (1/n) Σ_{i=1}^{n} (∇fi)(∇fi)ᵀ.
The input space is then changed from the original Rm to the active
subspace within Rd.
The value of d can be determined based on the gap in the eigenvalues
and the ratio Σ_{j=1}^{d} λj / Σ_{j=1}^{m} λj.
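The Monte Carlo steps above can be sketched as follows, given sampled gradients of the log-likelihood (the function name and array layout are assumptions):

```python
import numpy as np

def active_subspace(grads, d):
    """Monte Carlo estimate of the active subspace from sampled
    gradients, following the steps above:
        C_hat = (1/n) * sum_i grad_i grad_i^T,   C_hat = W Lambda W^T.

    grads : (n, m) array whose rows are gradients of f at X_1, ..., X_n
    d     : number of active directions to retain
    Returns (W1, eigvals): W1 is (m, d), eigvals are descending.
    """
    n = grads.shape[0]
    C_hat = grads.T @ grads / n           # average of outer products
    eigvals, W = np.linalg.eigh(C_hat)    # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]     # reorder to descending
    eigvals, W = eigvals[order], W[:, order]
    return W[:, :d], eigvals

# Active variables are then S_i = W1.T @ X_i.
```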
44. Method Dimension Reduction for X
Results of Dimension Reduction of X based on PCA
Figure: Eigenvalues based on PCA
45. Method Dimension Reduction for X
Results of Dimension Reduction of X based on AS
Figure: Eigenvalues based on AS
46. Method Dimension Reduction for X
Results based on PCA, cont’d
Figure: Eigenvectors based on PCA
47. Method Dimension Reduction for X
Results based on AS, cont’d
Figure: Eigenvectors based on AS
48. Method GP Emulation
Gaussian Process (GP) Emulation
Recall that ξi = (ξi1′, ξi2′, ξi3′)′ denotes the K = Σ_{j=1}^{3} Kj FPC scores
for the ith sounding, i = 1, . . . , n.
The active variable corresponding to Xi is denoted by Si = W1ᵀXi.
Let B denote a b-dimensional vector of other parameters. For the
OCO-2 emulation problem, B includes the solar zenith angle, solar
azimuth angle, instrument zenith angle, and instrument azimuth angle.
We assume a GP model on ξ(S; B):
ξ(S; B) ∼ GP{µ, Σ(·, ·)},
where µ ∈ RK is the mean vector and
Σ(·, ·) : R(d+b) × R(d+b) → RK×K is the corresponding cross-covariance
matrix function.
We will begin with commonly used parametric cross-covariance
functions discussed in Genton and Kleiber (2015).
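A minimal numerical sketch of GP emulation on the reduced spaces is given below. It uses a squared-exponential kernel and treats the K FPC scores as independent output GPs, which is a simplifying assumption relative to the cross-covariance models of Genton and Kleiber (2015) proposed above; all names and the fixed hyperparameters are illustrative.

```python
import numpy as np

def gp_fit_predict(S_train, Xi_train, S_test, length_scale=1.0, noise=1e-6):
    """Posterior mean of a zero-mean GP emulator on the reduced spaces,
    with one independent squared-exponential GP per FPC score.

    S_train  : (n, d+b) reduced training inputs (active variables and B)
    Xi_train : (n, K) FPC scores for the training soundings
    S_test   : (q, d+b) inputs at which to emulate
    Returns a (q, K) array of predicted scores.
    """
    def sqexp(A, B):
        # squared-exponential kernel on rows of A and B
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / length_scale ** 2)

    K_nn = sqexp(S_train, S_train) + noise * np.eye(len(S_train))
    K_qn = sqexp(S_test, S_train)
    alpha = np.linalg.solve(K_nn, Xi_train)   # (n, K) weights
    return K_qn @ alpha                        # posterior mean
```

The predicted scores can then be mapped back to radiances through the truncated FPC expansion.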
52. Ongoing Work and Discussion
Ongoing Work
We will carry out empirical studies to investigate how well the
approximations (dimension reductions in both Y and X and the
emulator) retain most of the information about the input-output
behavior.
We will compare our method with other techniques in terms of
efficiency and accuracy.
We are also investigating the theoretical properties of the proposed
approach.
Our method can be applied to set up simulation studies for
uncertainty quantification of retrieval algorithms, sensitivity analysis
and calibration.
56. Ongoing Work and Discussion
Discussion
The methods developed for this emulation problem can also be applied
to the inverse problem in the retrieval algorithms.
When the emulator is substituted for the forward function in the
retrieval algorithm, the associated inverse problem can be solved much
more efficiently; the result can then warm start the original optimal
estimation procedure.
We will also apply the dimension reduction techniques directly to the
inverse problem, which could potentially improve computational
efficiency.
Incorporating spatial dependence: spatial X/spatial Y
Distributed computation: Model globally, fit locally
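The warm-start idea above can be sketched as follows. This is an illustrative stand-in, not the retrieval algorithm itself: the linear "emulator" map A, the quadratic prior, and the fixed step size are all hypothetical simplifications.

```python
# Hypothetical sketch of using a cheap emulator to warm start optimal
# estimation: minimize an emulator-based cost in the reduced FPC-score
# space, then hand the minimizer to the full retrieval as its starting
# point. The linear "emulator" and quadratic prior are stand-ins.
import numpy as np

rng = np.random.default_rng(1)
d, K = 5, 3
A = rng.normal(size=(K, d))            # stand-in for the fitted emulator map

def emulator(x):
    """Map a state vector x to predicted FPC scores (stand-in)."""
    return A @ x

x_true = rng.normal(size=d)
xi_obs = emulator(x_true)              # FPC scores of the observed radiance

x_prior = np.zeros(d)
lam = 1e-2                             # prior (regularization) weight

def cost(x):
    r = emulator(x) - xi_obs
    return 0.5 * r @ r + 0.5 * lam * (x - x_prior) @ (x - x_prior)

# Gradient descent on the emulator-based cost. Because the emulator is
# cheap, many iterations are affordable compared with evaluations of the
# physical forward model.
x = x_prior.copy()
for _ in range(5000):
    grad = A.T @ (A @ x - xi_obs) + lam * (x - x_prior)
    x -= 0.02 * grad

# `x` now serves as the warm start for the original optimal-estimation
# procedure, which still uses the full forward model.
```
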
Emulator Group February 2018 23 / 25
61. Acknowledgements
This research was partially supported by the National Science
Foundation under Grant DMS-1638521 to the Statistical and Applied
Mathematical Sciences Institute (SAMSI). Any opinions, findings, and
conclusions or recommendations expressed in this material do not
necessarily reflect the views of the National Science Foundation.
We would like to thank other members of the Working Group on
Remote Sensing in the SAMSI Program on Mathematical and
Statistical Methods for Climate and the Earth System (CLIM) for
helpful discussions and suggestions.
Emulator Group February 2018 24 / 25