Undergraduate Modeling Workshop - Hierarchical Models for Sparsely Sampled High-Dimensional LiDAR and Forest Variables, Andrew Finley, May 22, 2018

Hierarchical models for sparsely sampled
high-dimensional LiDAR and forest variables: An
interior Alaska FIA case study
Andrew Finley (Michigan State University)
Hans-Erik Andersen (Forest Service, Forest Inventory and Analysis)
Sudipto Banerjee (University of California, LA)
Bruce Cook (NASA Goddard Space Flight Center)
Doug Morton (NASA Goddard Space Flight Center)
SAMSI Undergraduate Modelling Workshop

Extraordinary opportunities to understand the spatial and temporal
complexity of environmental processes at broad scales.
Unprecedented investment to collect, develop, and distribute data and
tools to further large-scale and long-term science.
For example:
National Ecological Observatory Network (NEON)
designed to detect and enable forecasting of ecological change at
continental scales over time — NSF $434 million 30 year project
USDA Forest Service Forest Inventory and Analysis (FIA)
designed to monitor status and trends in forest land — since 1998
FIA measured 510,340 inventory plots across conterminous US
measuring 5,839,642 trees (now with 2+ repeated measurements)!
National Aeronautics and Space Administration (NASA)
Global Ecosystem Dynamics Investigation LiDAR (GEDI) — $95
million 5 year project

Key challenges in spatiotemporal environmental data analysis
Data sets often exhibit:
missingness and misalignment among outcomes
space- and time-varying impact of covariates
complex residual dependence structures
nonstationarity among multiple outcomes across locations
unknown time and perhaps space lags between outcomes and
covariates
Interest in modeling frameworks that:
incorporate many sources of space and time indexed data
accommodate structured residual dependence
propagate uncertainty through to predictions
scale to effectively exploit information in massive data sets

Joint NASA and FIA Forest Service initiative
Project goal: Design and implement an operational forest inventory in
Interior AK by extending sparse networks of ground samples with space
and airborne multi-sensor data.
Data products:
1. Complete coverage maps (e.g., 15×15 m resolution) of forest:
From inventory plots:
Above ground biomass (AGB; Mg/ha)
Basal area (BA; m²/ha)
Density (TPH; trees/ha)
From LiDAR:
Fractional cover (FC; %)
Canopy height (P95; m)
2. Pixel level prediction with uncertainty estimates
3. Biologically consistent relationships among predictions
4. Reporting with uncertainty for user defined areas
Fun read: www.wired.com/2014/12/alaska-laser-survey-3d-map

Tanana Inventory Unit (TIU)
Data:
1. LiDAR transects
50,000 km flight lines
25 TB data
∼43 million signals
{FC, P95}
2. Inventory plots
1,461 plots, ∼7 m radius
{AGB, BA, TPH}
3. Complete coverage
% forest canopy (TC)
Forest fire (FIRE)

Figure: G-LiHT sensor signal. Normalized intensity (x-axis) versus height above ground (m; y-axis), with the 5%, 25%, 50%, and 95% percentile heights marked (the 95% height is P95).

Figure: Forest fire within 20 years (FIRE), mapped by Easting (km) and Northing (km).

Key inferential challenges
incorporate many sources of spatially indexed data
address misalignment (missingness) among responses
accommodate and leverage residual spatial dependence
propagate parameter uncertainty through to predictions
deliver statistically valid probabilistic prediction of arbitrary areas
maintain observed covariance among multivariate predictions
Some anticipated extensions:
incorporate time-indexed observations
model nonstationarity among multiple responses across locations
estimate space- and time-varying impact of covariates

Hierarchical Gaussian process models
Say we observe $q$ outcomes at a generic location $\ell$ within domain $\mathcal{L}$. A
multivariate spatial regression:
$y_k(\ell) = x_k(\ell)^\top \beta_k + w_k(\ell) + e_k(\ell)$, for $k = 1, 2, \ldots, q$
$y_k(\ell)$ is the $k$th outcome at generic location $\ell$ (e.g., AGB, BA, TPH, FC, P95)
Mean: $x_k(\ell)$ includes an intercept, TC, and FIRE
Cov: $w(\ell) = (w_1(\ell), w_2(\ell), \ldots, w_q(\ell))^\top \sim MVGP(0, \Gamma_\theta(\cdot, \cdot))$ where
$\Gamma_\theta(\ell, \ell') = \{Cov(w_i(\ell), w_j(\ell'))\}$ for $i, j = 1, 2, \ldots, q$
Error: $e(\ell) = (e_1(\ell), e_2(\ell), \ldots, e_q(\ell))^\top \sim MVN(0, \Psi)$
For the TIU we must accommodate spatial misalignment (i.e., the $y_k$'s are only partially
observed at some locations); see, e.g., Gelfand et al. 2004, Finley et al. 2014.

Hierarchical Gaussian process models
Multivariate spatial regression model for TIU and similar settings
Figure: annotated model equation. Labels: forest variables and LiDAR variables (each with trend, spatial random effect, and spatial decay terms); Landsat, etc.; non-spatial error variance-covariance; spatial variance-covariance.

Spatiotemporal regression models
Start with a simple univariate regression:
$y(\ell) = x(\ell)^\top \beta + w(\ell) + e(\ell)$
Potentially very rich: understand spatially- and/or
temporally-varying impact of intercept or predictors on outcome
Produce maps for random effects: $\{w(\ell) : \ell \in \mathcal{L}\}$
$\mathcal{L}$ is spatial domain (e.g., $D \subset \mathbb{R}^d$) or spatiotemporal domain (e.g., $D \subset \mathbb{R}^d \times \mathbb{R}^+$)
Model-based predictions: $y(\ell_0) \mid \{y(\ell_1), y(\ell_2), \ldots, y(\ell_n)\}$

Gaussian spatiotemporal process
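$\{w(\ell) : \ell \in \mathcal{L}\} \sim GP(0, K_\theta(\cdot, \cdot))$ implies
$w = (w(\ell_1), w(\ell_2), \ldots, w(\ell_n))^\top \sim MVN(0, K_\theta)$
for every finite set of points $\ell_1, \ell_2, \ldots, \ell_n$.
$K_\theta = \{K_\theta(\ell_i, \ell_j)\}$ is a spatial variance-covariance matrix, where
$\theta = \{\sigma, \phi\}$
Stationary: $K_\theta(\ell, \ell') = K_\theta(\ell - \ell')$. Isotropy:
$K_\theta(\ell, \ell') = K_\theta(\|\ell - \ell'\|)$.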
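
A minimal sketch of how a stationary, isotropic covariance matrix $K_\theta$ could be built from a set of locations. The slides do not name a covariance function, so the exponential form $\sigma^2 \exp(-\phi \|\ell - \ell'\|)$ is an assumption here, as are the function name `exp_cov_matrix` and the treatment of the first parameter as a variance `sigma_sq`.

```python
import math

def exp_cov_matrix(coords, sigma_sq, phi):
    """Assumed isotropic exponential covariance:
    K[i][j] = sigma_sq * exp(-phi * ||l_i - l_j||)."""
    n = len(coords)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Isotropy: the entry depends only on the distance between locations.
            distance = math.dist(coords[i], coords[j])
            K[i][j] = sigma_sq * math.exp(-phi * distance)
    return K

# Three illustrative 2-D locations (hypothetical coordinates).
coords = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
K = exp_cov_matrix(coords, sigma_sq=2.0, phi=0.1)
```

The diagonal holds the variance and the off-diagonal entries decay with distance at a rate set by `phi`. Building the matrix needs all $n^2$ pairwise distances.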

Likelihood from (full rank) GP models
Assuming $\{w(\ell) : \ell \in \mathcal{L}\} \sim GP(0, K_\theta(\cdot, \cdot))$ implies
$w = (w(\ell_1), w(\ell_2), \ldots, w(\ell_n))^\top \sim MVN(0, K_\theta)$
Estimating process parameters from the likelihood involves:
$\log p(w) = \text{const} - \frac{1}{2} \log\det(K_\theta) - \frac{1}{2} w^\top K_\theta^{-1} w$
Bayesian inference: priors on $\theta$ and many Markov chain Monte Carlo
(MCMC) iterations
See, e.g., Finley et al. 2015 and Finley et al. 2017 for some coding tips.

Computation issues
Storage: $n^2$ pairwise distances to compute $K_\theta$
$K_\theta$ is dense; need to solve $K_\theta x = b$ and need $\det(K_\theta)$
This is best achieved using $chol(K_\theta) = LDL^\top$
Complexity: roughly $O(n^3)$ flops
Computationally infeasible for large datasets
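
The role of the Cholesky factor can be sketched as follows. This is an illustration rather than the authors' code: it uses the closely related $LL^\top$ form instead of $LDL^\top$, and the helper names (`cholesky`, `forward_solve`, `gp_log_density`) are hypothetical. One factorization supplies both the log-determinant and the quadratic form of a zero-mean multivariate normal log-density.

```python
import math

def cholesky(K):
    """Lower-triangular L with K = L L^T (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L z = b for lower-triangular L."""
    z = [0.0] * len(b)
    for i in range(len(b)):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    return z

def gp_log_density(w, K):
    """log N(w | 0, K), computed from a single Cholesky factorization."""
    n = len(w)
    L = cholesky(K)                      # the O(n^3) step
    z = forward_solve(L, w)              # z^T z equals w^T K^{-1} w
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    quad = sum(v * v for v in z)
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)

# Small illustrative covariance matrix and realization (hypothetical values).
K_small = [[4.0, 2.0], [2.0, 3.0]]
w_small = [1.0, -1.0]
log_p = gp_log_density(w_small, K_small)
```

The triple loop in `cholesky` is the cubic cost named on the slide. It has to be repeated whenever $\theta$ changes, which is every MCMC iteration in a Bayesian fit.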

Burgeoning literature on spatial big data
Low-rank models: (Wahba 1990; Higdon 2002; Kammann & Wand 2003;
Paciorek 2007; Rasmussen & Williams 2006; Stein 2007, 2008; Cressie &
Johannesson 2008; Banerjee et al. 2008, 2010; Gramacy & Lee 2008;
Finley et al. 2009; Sang et al. 2011, 2012; Lemos et al. 2011; Guhaniyogi
et al. 2011, 2013; Salazar et al. 2013; Katzfuss 2016)
Spectral approximations and composite likelihoods: (Fuentes 2007;
Paciorek 2007; Eidsvik et al. 2016)
Multi-resolution approaches: (Nychka, 2014; Johannesson et al. 2007;
Matsuo et al. 2010; Tzeng & Huang 2015; Katzfuss 2016)
Sparsity: (Solve $Ax = b$ by (i) sparse $A$, or (ii) sparse $A^{-1}$)
1. Covariance tapering (Furrer et al. 2006; Du et al. 2009; Kaufman et
al. 2009; Shaby and Ruppert 2013)
2. GMRFs to GPs: INLA (Rue et al. 2009; Lindgren et al. 2011)
3. LAGP (Gramacy et al. 2014; Gramacy & Apley 2015)
4. Nearest-neighbor Gaussian Process (NNGP) models (Datta et al.
2015, 2016; Finley et al. 2017)

Reduced (Low) rank models
$K_\theta \approx B_\theta K^*_\theta B_\theta^\top + D_\theta$
$B_\theta$ is $n \times r$ matrix of spatial basis functions, $r \ll n$
$K^*_\theta$ is $r \times r$ spatial covariance matrix
$D_\theta$ is either diagonal or sparse
Examples: Kernel projections, Splines, Predictive process, FRK,
spectral basis . . .
Computations exploit above structure: roughly $O(nr^2) \ll O(n^3)$ flops

Low-rank models: hierarchical approach
$N(w^* \mid 0, K^*_\theta) \times N(w \mid B_\theta w^*, D)$
$w$ is $n \times 1$ and $n$ is large
$w^*$ is $r \times 1$, where $r \ll n$; so $K^*_\theta$ is $r \times r$
$B_\theta$ is an $n \times r$ matrix of “basis” functions
$D$ is $n \times n$, but easy to invert (e.g., diagonal)
Derive $var(w)$ (or $var(w^* \mid y)$) in alternate ways to obtain
$(B_\theta K^*_\theta B_\theta^\top + D)^{-1} = D^{-1} - D^{-1} B_\theta (K^{*-1}_\theta + B_\theta^\top D^{-1} B_\theta)^{-1} B_\theta^\top D^{-1}$.
This is the famous Sherman-Woodbury-Morrison formula.
Modeling: specifying $w^*$ and $B_\theta$.
See Finley et al. 2015 for implementation details in spBayes R package
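
A small numerical check of the Sherman-Woodbury-Morrison identity, as a sketch under stated assumptions: $D$ is diagonal, $r = 2$ (so the $r \times r$ inverses have a closed form), and all matrices and helper names (`matmul`, `transpose`, `inv2`, `woodbury_inverse`) are made up for illustration. This is not how spBayes implements it.

```python
def matmul(X, Y):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(row) for row in zip(*X)]

def inv2(M):
    """Closed-form inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def woodbury_inverse(B, K_star, d):
    """(B K* B^T + D)^{-1} for D = diag(d) and r = 2, using only 2 x 2 inverses."""
    n = len(d)
    Dinv_B = [[B[i][j] / d[i] for j in range(2)] for i in range(n)]   # D^{-1} B
    K_star_inv = inv2(K_star)
    Bt_Dinv_B = matmul(transpose(B), Dinv_B)                          # B^T D^{-1} B
    inner = [[K_star_inv[i][j] + Bt_Dinv_B[i][j] for j in range(2)] for i in range(2)]
    correction = matmul(matmul(Dinv_B, inv2(inner)), transpose(Dinv_B))
    return [[(1.0 / d[i] if i == j else 0.0) - correction[i][j] for j in range(n)]
            for i in range(n)]

# Hypothetical n = 4, r = 2 example.
B = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.8]]
K_star = [[1.0, 0.3], [0.3, 1.0]]
d = [0.1, 0.2, 0.1, 0.3]
n = len(d)

low_rank = matmul(matmul(B, K_star), transpose(B))
K = [[low_rank[i][j] + (d[i] if i == j else 0.0) for j in range(n)] for i in range(n)]
K_inv = woodbury_inverse(B, K_star, d)

# K times its Woodbury inverse should be the identity up to rounding.
product = matmul(K, K_inv)
max_error = max(abs(product[i][j] - (1.0 if i == j else 0.0))
                for i in range(n) for j in range(n))
```

Only $r \times r$ systems and a diagonal matrix are inverted, which is where the $O(nr^2)$ cost comes from.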

Oversmoothing due to reduced-rank models
(a) True w (b) Full GP (c) PPGP 64 knots
Figure: Comparing full GP vs low-rank GP with 2500 locations. Figure (c)
exhibits oversmoothing by a low-rank process (predictive process with 64 knots)
See Stein 2014 for good reasons not to use reduced-rank spatial models

Simple method of introducing sparsity (e.g. graphical models)
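$p(w) = N(w \mid 0, K_\theta)$
$= p(w_1)\, p(w_2 \mid w_1)$
$\times\, p(w_3 \mid w_1, w_2)$
$\times\, p(w_4 \mid \cancel{w_1}, w_2, w_3)$
$\times\, p(w_5 \mid \cancel{w_1}, \cancel{w_2}, w_3, w_4)$
$\times\, p(w_6 \mid \cancel{w_1}, \cancel{w_2}, \cancel{w_3}, w_4, w_5)$
$\times\, p(w_7 \mid w_1, \cancel{w_2}, \cancel{w_3}, w_4, \cancel{w_5}, w_6)$.
(Struck-out variables are dropped from the conditioning sets.)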
We need to solve n − 1 linear systems of size at most m × m, where m is
the number of neighbors in the conditional set.

Sparse likelihood approximations (Vecchia, 1988; Stein et al., 2004)
With $w(\ell) \sim GP(0, K_\theta(\cdot, \cdot))$, write the joint density $p(w)$ as:
$N(w \mid 0, K_\theta) = \prod_{i=1}^{n} p(w(\ell_i) \mid w_{H(\ell_i)}) \approx \prod_{i=1}^{n} p(w(\ell_i) \mid w_{N(\ell_i)}) = N(w \mid 0, \tilde{K}_\theta)$,
where $N(\ell_i) \subseteq H(\ell_i)$.
Shrinkage: Choose $N(\ell_i)$ as the set of “m nearest-neighbors” among
$H(\ell_i)$. Theory: “Screening” effect of kriging.
$\tilde{K}_\theta^{-1}$ depends on $K_\theta$, but is sparser with at most $nm^2$ non-zero
entries
Extension to a GP (Datta et al., JASA, 2016) called the Nearest
Neighbor Gaussian Process (NNGP)
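
The nearest-neighbor approximation can be sketched as a sum of univariate conditional normal log-densities, each conditioning on at most $m$ earlier points. This is an illustration, not the spNNGP implementation. It assumes that the history set $H(\ell_i)$ is the set of points earlier in the given ordering and that the covariance is exponential; the names `exp_cov`, `solve`, and `vecchia_log_density` are hypothetical.

```python
import math

def exp_cov(a, b, sigma_sq=1.0, phi=1.0):
    """Assumed exponential covariance between two locations."""
    return sigma_sq * math.exp(-phi * math.dist(a, b))

def solve(A, b):
    """Solve a small system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def vecchia_log_density(w, coords, m, cov=exp_cov):
    """Sum over i of log p(w_i | w_N(i)), with N(i) the m nearest earlier points."""
    total = 0.0
    for i, loc in enumerate(coords):
        # History = indices 0..i-1; keep the (at most) m nearest of them.
        nbrs = sorted(range(i), key=lambda j: math.dist(loc, coords[j]))[:m]
        mean, var = 0.0, cov(loc, loc)
        if nbrs:
            C_nn = [[cov(coords[p], coords[q]) for q in nbrs] for p in nbrs]
            c_in = [cov(loc, coords[j]) for j in nbrs]
            weights = solve(C_nn, c_in)          # one system of size at most m x m
            mean = sum(wt * w[j] for wt, j in zip(weights, nbrs))
            var -= sum(wt * cj for wt, cj in zip(weights, c_in))
        total += -0.5 * (math.log(2.0 * math.pi * var) + (w[i] - mean) ** 2 / var)
    return total

# Ordered 1-D locations and a hypothetical realization.
coords_1d = [(0.0,), (0.7,), (1.5,), (2.1,), (3.4,)]
w_obs = [0.3, -0.2, 0.5, 1.0, -0.4]
exact = vecchia_log_density(w_obs, coords_1d, m=len(coords_1d) - 1)  # full history
approx = vecchia_log_density(w_obs, coords_1d, m=1)                  # one neighbor
```

With the full history the chain-rule factorization is exact. In this particular 1-D example the exponential covariance makes the process Markov, so `m=1` reproduces the exact value. In two dimensions the two would generally differ, and $m$ controls the trade-off.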

Figure: (a) True w, (b) Full GP, (c) PPGP 64 knots, (d) NNGP, m = 10, (e) NNGP, m = 20

Figure: Choice of m in NNGP models: Out-of-sample Root Mean Squared
Prediction Error (RMSPE) and mean width between the upper and lower 95%
posterior predictive credible intervals for a range of m (1 to 25) for the univariate
synthetic data analysis. NNGP values are plotted against m, with the full GP RMSPE and mean 95% CI width shown for reference.

Figure: Wall time required for one MCMC iteration by number of locations n
and m=10 nearest neighbors (both axes are on the log scale).

Concluding remarks: Storage and computation
Algorithms: Gibbs, RWM, HMC, VB, INLA; NNGP/HMC especially
promising
Model-based solution for spatial “BIG DATA”
Never needs to store the $n \times n$ distance matrix; store $n$ small $m \times m$
matrices, where $m$ is the number of nearest neighbors considered
and $m \ll n$, e.g., $m \approx 15$.
Total flop count per iteration is $O(nm^3)$, i.e., linear in $n$
Scalable to massive datasets because $m$ is small; you never need
more than a few neighbors.
Compare with reduced-rank models: $O(nm^3) \ll O(nr^2)$.
New R package spNNGP (on CRAN
https://cran.r-project.org/web/packages/spNNGP)

Tanana Valley initial run results
Initial analysis fit the multivariate spatial NNGP model (with
misalignment between inventory plots and LiDAR outcomes)
Model fit and prediction algorithms written in C with heavy use of
OpenMP for parallelization.
Outcome vector included: AGB, BA, TPH, FC, and P95
AGB, BA, and TPH measured on 1,461 forest inventory plots
FC and P95 measured on 5 million LiDAR pixels
We considered m=15 neighbors for NNGPs
Posterior inference was based on 25k post burn-in MCMC samples
Full GP covariance matrix Kθ would be 5,001,461×5,001,461!
NNGP run time was ∼12 hours (Intel 18 core machine)
Prediction for TIU takes ∼5 days to deliver pixel level posterior distributions.
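
A back-of-the-envelope comparison of storage. The location count and $m = 15$ are taken from the slide. The 8 bytes per double-precision value and the "n small m × m matrices" storage model are assumptions for illustration, not a description of the actual C implementation.

```python
# Locations: 1,461 inventory plots + 5 million LiDAR pixels (from the slide).
n = 1_461 + 5_000_000
m = 15                    # nearest neighbors used for the NNGP
bytes_per_double = 8      # assumed double-precision storage

# Dense covariance matrix: n x n entries.
dense_bytes = n * n * bytes_per_double

# NNGP: n small m x m matrices.
nngp_bytes = n * m * m * bytes_per_double

dense_tb = dense_bytes / 1e12
nngp_gb = nngp_bytes / 1e9
print(f"dense K: {dense_tb:.1f} TB, NNGP neighbor matrices: {nngp_gb:.1f} GB")
```

Under these assumptions the dense matrix needs roughly 200 TB, while the neighbor matrices need roughly 9 GB.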

Brief overview of parameter estimates
β’s
Parameter    50%      (2.5%, 97.5%)
AGB_0        0.86     (0.27, 1.46)
AGB_TC       0.10     (0.09, 0.11)
AGB_FIRE     0.41     (-0.23, 1.06)
BA_0         0.96     (0.66, 1.26)
BA_TC        0.04     (0.04, 0.05)
BA_FIRE      0.05     (-0.29, 0.39)
TPH_0        28.17    (24.21, 32.15)
TPH_TC       0.21     (0.16, 0.26)
TPH_FIRE     -0.77    (-5.02, 3.52)
FC_0         0.01     (-0.03, 0.06)
FC_TC        0.01     (0.01, 0.01)
FC_FIRE      -0.09    (-0.14, -0.05)
P95_0        -0.28    (-1.05, 0.49)
P95_TC       0.13     (0.12, 0.14)
P95_FIRE     -0.99    (-1.92, -0.06)
cor(Γ)
Parameter    50%      (2.5%, 97.5%)
AGB, BA      0.91     (0.91, 0.92)
AGB, TPH     0.09     (0.08, 0.09)
AGB, FC      0.37     (0.31, 0.41)
AGB, P95     0.84     (0.84, 0.85)
BA, TPH      0.09     (0.08, 0.09)
BA, FC       0.18     (0.11, 0.27)
BA, P95      0.73     (0.72, 0.74)
TPH, FC      0.38     (0.27, 0.44)
TPH, P95     -0.05    (-0.06, -0.04)
FC, P95      0.50     (0.45, 0.55)

NNGP m=15 model predicted AGB (posterior predicted mean)

NNGP m=25 model predicted AGB (posterior predicted SD)

Prototype for FIA/NASA TIU data products user interface
http://www.globalfiredata.org/temp/tanana.html

Thank you!
Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016) Hierarchical Nearest-Neighbor Gaussian process models for large
geostatistical datasets. Journal of the American Statistical Association, 111:800-812.
Datta, A., S. Banerjee, A.O. Finley, N.A.S. Hamm, and M. Schaap. (2016) Non-separable Dynamic Nearest Neighbor Gaussian
Process Models for Large Spatio-temporal Data with Application to Particulate Matter Analysis. Annals of Applied Statistics,
10:1286-1316.
Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016) On Nearest-Neighbor Gaussian Process Models for Massive Spatial
Data. WIREs Computational Statistics, 8:162-171.
Finley, A.O., S. Banerjee, Y. Zhou, and B.D. Cook. (2017) Joint hierarchical models for sparsely sampled high-dimensional LiDAR
and forest variables. Remote Sensing of Environment, 190:149-161.
Finley, A.O., A. Datta, B.C. Cook, D.C. Morton, H.E. Andersen, and S. Banerjee. (2017) Applying Nearest Neighbor Gaussian
Processes to massive spatial data sets: Forest canopy height prediction across Tanana Valley Alaska.
https://arxiv.org/abs/1702.00434
Finley, A.O., S. Banerjee, and A.E. Gelfand. (2015) spBayes for large univariate and multivariate point-referenced spatio-temporal
data models. Journal of Statistical Software, 63:1-28.
Heaton, M.J., A. Datta, A.O. Finley, R. Furrer, R. Guhaniyogi, F. Gerber, R.B. Gramacy, D. Hammerling, M. Katzfuss, F.
Lindgren, D.W. Nychka, F. Sun, and A. Zammit-Mangion. (2017) Methods for analyzing large spatial data: A review and
comparison. https://arxiv.org/abs/1710.05013
Other references provided upon request.

Concluding remarks: Comparisons
Are low-rank spatial models well and truly beaten?
Certainly do not seem to scale as nicely as NNGP
Have somewhat greater theoretical tractability (e.g. Bayesian
asymptotics)
Can be used to flexibly model smoothness
Can be constructed for other processes—e.g., Spatial Dirichlet
Predictive Process
Compare with scalable multi-resolution frameworks (Katzfuss, 2016)
Highly scalable meta-kriging frameworks (Guhaniyogi, 2016)
Future work: High-dimensional multivariate spatial-temporal variable
selection
