Hierarchical models for sparsely sampled
high-dimensional LiDAR and forest variables: An
interior Alaska FIA case study
Andrew Finley (Michigan State University)
Hans-Erik Andersen (Forest Service, Forest Inventory and Analysis)
Sudipto Banerjee (University of California, LA)
Bruce Cook (NASA Goddard Space Flight Center)
Doug Morton (NASA Goddard Space Flight Center)
SAMSI Undergraduate Modelling Workshop
Extraordinary opportunities to understand the spatial and temporal
complexity of environmental processes at broad scales.
Unprecedented investment to collect, develop, and distribute data and
tools to further large-scale and long-term science.
For example:
National Ecological Observatory Network (NEON)
designed to detect and enable forecasting of ecological change at
continental scales over time — NSF $434 million 30 year project
USDA Forest Service Forest Inventory and Analysis (FIA)
designed to monitor status and trends in forest land — since 1998
FIA measured 510,340 inventory plots across conterminous US
measuring 5,839,642 trees (now with 2+ repeated measurements)!
National Aeronautics and Space Administration (NASA)
Global Ecosystem Dynamics Investigation LiDAR (GEDI) — $95
million 5 year project
Key challenges in spatiotemporal environmental data analysis
Data sets often exhibit:
missingness and misalignment among outcomes
space- and time-varying impact of covariates
complex residual dependence structures
nonstationarity among multiple outcomes across locations
unknown time and perhaps space lags between outcomes and
covariates
Interest in modeling frameworks that:
incorporate many sources of space and time indexed data
accommodate structured residual dependence
propagate uncertainty through to predictions
scale to effectively exploit information in massive data sets
Joint NASA and FIA Forest Service initiative
Project goal: Design and implement an operational forest inventory in
Interior AK by extending sparse networks of ground samples with space
and airborne multi-sensor data.
Data products:
1. Complete coverage maps (e.g., 15×15 m resolution) of forest:
From inventory plots:
Above ground biomass (AGB; Mg/ha)
Basal area (BA; m²/ha)
Density (TPH; trees/ha)
From LiDAR:
Fractional cover (FC; %)
Canopy height (P95; m)
2. Pixel level prediction with uncertainty estimates
3. Biologically consistent relationships among predictions
4. Reporting with uncertainty for user defined areas
Fun read: www.wired.com/2014/12/alaska-laser-survey-3d-map
Tanana Inventory Unit (TIU)
Data:
1. LiDAR transects
50,000 km flight lines
25 TB data
∼43 million signals
{FC, P95}
2. Inventory plots
1,461 ∼7 m radius
{AGB, BA, TPH}
3. Complete coverage
% forest canopy (TC)
Forest fire (FIRE)
Figure: G-LiHT sensor signal, normalized intensity vs. height above ground (m), with the 5%, 25%, 50%, and 95% percentile heights marked (e.g., P95).
Figure: Forest fire within 20 years, mapped by easting (km) and northing (km).
Key inferential challenges
incorporate many sources of spatially indexed data
address misalignment (missingness) among responses
accommodate and leverage residual spatial dependence
propagate parameter uncertainty through to predictions
deliver statistically valid probabilistic prediction of arbitrary areas
maintain observed covariance among multivariate predictions
scale to effectively exploit information in massive data sets
Some anticipated extensions:
incorporate time-indexed observations
model nonstationarity among multiple responses across locations
estimate space- and time-varying impact of covariates
Hierarchical Gaussian process models
Say we observe $q$ outcomes at a given location $\ell$ within domain $L$. A
multivariate spatial regression:
$y_k(\ell) = x_k(\ell)^\top \beta_k + w_k(\ell) + e_k(\ell)$, for $k = 1, 2, \ldots, q$
$y_k(\ell)$ is the $k$th outcome at generic location $\ell$ (e.g., AGB, BA, TPH,
FC, P95)
Mean: $x_k(\ell)$ includes an intercept, TC, and FIRE
Cov: $w(\ell) = (w_1(\ell), w_2(\ell), \ldots, w_q(\ell))^\top \sim \mathrm{MVGP}(0, \Gamma_\theta(\cdot, \cdot))$ where
$\Gamma_\theta(\ell, \ell') = \{\mathrm{Cov}(w_i(\ell), w_j(\ell'))\}$ for $i, j = 1, 2, \ldots, q$
Error: $e(\ell) = (e_1(\ell), e_2(\ell), \ldots, e_q(\ell))^\top \sim \mathrm{MVN}(0, \Psi)$
For TIU we must accommodate spatial misalignment (i.e., the $y_k$'s are partially
observed at some locations); see, e.g., Gelfand et al. 2004, Finley et al.
2014.
Hierarchical Gaussian process models
Multivariate spatial regression model for TIU and similar settings
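Figure: annotated model equation. Labels: the forest variables and the LiDAR variables each have a trend, a spatial random effect, and a spatial decay parameter; covariates come from Landsat, etc.; the remaining terms are the spatial variance-covariance and the non-spatial error variance-covariance.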
Spatiotemporal regression models
Start with a simple univariate regression:
$y(\ell) = x(\ell)^\top \beta + w(\ell) + e(\ell)$
Potentially very rich: understand spatially- and/or
temporally-varying impact of intercept or predictors on outcome
Produce maps for random effects: $\{w(\ell) : \ell \in L\}$
$L$ is a spatial domain (e.g., $D \subset \mathbb{R}^d$) or a spatiotemporal domain (e.g.,
$D \subset \mathbb{R}^d \times \mathbb{R}^+$)
Model-based predictions: $y(\ell_0) \mid \{y(\ell_1), y(\ell_2), \ldots, y(\ell_n)\}$
Gaussian spatiotemporal process
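$\{w(\ell) : \ell \in L\} \sim \mathrm{GP}(0, K_\theta(\cdot, \cdot))$ implies
$w = (w(\ell_1), w(\ell_2), \ldots, w(\ell_n))^\top \sim \mathrm{MVN}(0, K_\theta)$
for every finite set of points $\ell_1, \ell_2, \ldots, \ell_n$.
$K_\theta = \{K_\theta(\ell_i, \ell_j)\}$ is a spatial variance-covariance matrix, where
$\theta = \{\sigma, \phi\}$
Stationary: $K_\theta(\ell, \ell') = K_\theta(\ell - \ell')$. Isotropy:
$K_\theta(\ell, \ell') = K_\theta(\|\ell - \ell'\|)$.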
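As a rough illustration of the finite-dimensional statement above, the sketch below builds a covariance matrix for a set of locations and draws one realization of w. The exponential covariance form, the parameter values, and the random locations are assumptions for illustration; the slides only name the parameters σ and φ.

```python
import numpy as np

def exp_cov(coords, sigma_sq, phi):
    """Isotropic exponential covariance (assumed form): sigma_sq * exp(-phi * distance)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return sigma_sq * np.exp(-phi * dist)

rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 1.0, size=(50, 2))  # hypothetical locations in the unit square
K = exp_cov(coords, sigma_sq=2.0, phi=3.0)    # hypothetical variance and decay values

# One realization w ~ MVN(0, K), using the Cholesky factor K = L L^T.
L_chol = np.linalg.cholesky(K)
w = L_chol @ rng.standard_normal(coords.shape[0])
```

Because the covariance depends on the locations only through their distance, this K is both stationary and isotropic in the sense defined above.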
Likelihood from (full rank) GP models
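Assuming $\{w(\ell) : \ell \in L\} \sim \mathrm{GP}(0, K_\theta(\cdot, \cdot))$ implies
$w = (w(\ell_1), w(\ell_2), \ldots, w(\ell_n))^\top \sim \mathrm{MVN}(0, K_\theta)$
Estimating process parameters from the likelihood involves:
$\log p(w) = \mathrm{const} - \frac{1}{2} \log\det(K_\theta) - \frac{1}{2} w^\top K_\theta^{-1} w$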
Bayesian inference: priors on θ and many Markov chain Monte Carlo
(MCMC) iterations
See, e.g., Finley et al. 2015 and Finley et al. 2017 for some coding tips.
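A minimal sketch of how the Gaussian log density can be evaluated through a Cholesky factorization, so that the log determinant and the quadratic form come from the same factor. The function name `gp_loglik`, the exponential covariance, and the test data are illustrative assumptions, not the authors' code.

```python
import numpy as np

def gp_loglik(w, K):
    """Log density of w ~ MVN(0, K), computed from the Cholesky factor of K."""
    n = w.shape[0]
    L = np.linalg.cholesky(K)                 # K = L L^T, roughly O(n^3) flops
    z = np.linalg.solve(L, w)                 # z = L^{-1} w, so z @ z = w^T K^{-1} w
    log_det = 2.0 * np.log(np.diag(L)).sum()  # log det(K) from the diagonal of L
    return -0.5 * (n * np.log(2.0 * np.pi) + log_det + z @ z)

# Hypothetical data: 100 random locations and an assumed exponential covariance.
rng = np.random.default_rng(0)
coords = rng.uniform(size=(100, 2))
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
K = 1.5 * np.exp(-4.0 * dist)
w = np.linalg.cholesky(K) @ rng.standard_normal(100)

ll = gp_loglik(w, K)
```

In an MCMC sampler this evaluation is repeated at every proposed θ, which is why the cubic cost of the factorization dominates for large n. A general solver is used here for brevity; a triangular solver would exploit the structure of L.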
Computation issues
Storage: $n^2$ pairwise distances to compute $K_\theta$
$K_\theta$ is dense; need to solve $K_\theta x = b$ and need $\det(K_\theta)$
This is best achieved using $\mathrm{chol}(K_\theta) = LDL^\top$
Complexity: roughly $O(n^3)$ flops
Computationally infeasible for large datasets
Burgeoning literature on spatial big data
Low-rank models: (Wahba 1990; Higdon 2002; Kamman & Wand 2003;
Paciorek 2007; Rasmussen & Williams 2006; Stein 2007, 2008; Cressie &
Johannesson 2008; Banerjee et al. 2008, 2010; Gramacy & Lee 2008;
Finley et al. 2009; Sang et al. 2011, 2012; Lemos et al. 2011; Guhaniyogi
et al. 2011, 2013; Salazar et al. 2013; Katzfuss 2016)
Spectral approximations and composite likelihoods: (Fuentes 2007;
Paciorek 2007; Eidsvik et al. 2016)
Multi-resolution approaches: (Nychka, 2014; Johannesson et al. 2007;
Matsuo et al. 2010; Tzeng & Huang 2015; Katzfuss 2016)
Sparsity: (Solve $Ax = b$ by (i) sparse $A$, or (ii) sparse $A^{-1}$)
1. Covariance tapering (Furrer et al. 2006; Du et al. 2009; Kaufman et
al. 2009; Shaby and Ruppert 2013)
2. GMRFs to GPs: INLA (Rue et al. 2009; Lindgren et al. 2011)
3. LAGP (Gramacy et al. 2014; Gramacy & Apley 2015)
4. Nearest-neighbor Gaussian Process (NNGP) models (Datta et al.
2015, 2016; Finley et al. 2017)
Reduced (Low) rank models
$K_\theta \approx B_\theta K_\theta^* B_\theta^\top + D_\theta$
$B_\theta$ is an $n \times r$ matrix of spatial basis functions, $r \ll n$
$K_\theta^*$ is an $r \times r$ spatial covariance matrix
$D_\theta$ is either diagonal or sparse
Examples: Kernel projections, Splines, Predictive process, FRK,
spectral basis . . .
Computations exploit above structure: roughly $O(nr^2) \ll O(n^3)$ flops
Low-rank models: hierarchical approach
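$N(w^* \mid 0, K_\theta^*) \times N(w \mid B_\theta w^*, D)$
$w$ is $n \times 1$ and $n$ is large
$w^*$ is $r \times 1$, where $r \ll n$; so $K_\theta^*$ is $r \times r$
$B_\theta$ is an $n \times r$ matrix of "basis" functions
$D$ is $n \times n$, but easy to invert (e.g., diagonal)
Derive $\mathrm{var}(w)$ (or $\mathrm{var}(w^* \mid y)$) in alternate ways to obtain
$(B_\theta K_\theta^* B_\theta^\top + D)^{-1} = D^{-1} - D^{-1} B_\theta (K_\theta^{*-1} + B_\theta^\top D^{-1} B_\theta)^{-1} B_\theta^\top D^{-1}$.
This is the famous Sherman-Woodbury-Morrison formula.
Modeling: specifying $w^*$ and $B_\theta$.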
See Finley et al. 2015 for implementation details in spBayes R package
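The identity above can be checked numerically. The sketch below is an illustration only: B, K* and the diagonal of D are random stand-ins, and the sizes n = 200 and r = 10 are arbitrary assumptions. Only r × r systems and a diagonal inverse are needed on the right-hand side of the identity.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 200, 10                                # hypothetical sizes with r << n
B = rng.standard_normal((n, r))               # stand-in for the basis matrix B_theta
A = rng.standard_normal((r, r))
K_star = A @ A.T + r * np.eye(r)              # symmetric positive definite r x r matrix
d = rng.uniform(0.5, 1.5, size=n)             # diagonal entries of D

def woodbury_inverse(B, K_star, d):
    """Inverse of B K* B^T + diag(d) using only r x r solves and a diagonal inverse."""
    D_inv_B = B / d[:, None]                          # D^{-1} B
    inner = np.linalg.inv(K_star) + B.T @ D_inv_B     # K*^{-1} + B^T D^{-1} B (r x r)
    return np.diag(1.0 / d) - D_inv_B @ np.linalg.solve(inner, D_inv_B.T)

# The dense n x n inverse is formed here only to check the identity.
direct = np.linalg.inv(B @ K_star @ B.T + np.diag(d))
fast = woodbury_inverse(B, K_star, d)
max_abs_err = np.abs(direct - fast).max()
```

In practice one would apply the right-hand side to a vector rather than form the full n × n inverse, which keeps the cost near O(nr²).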
Oversmoothing due to reduced-rank models
(a) True w (b) Full GP (c) PPGP 64 knots
Figure: Comparing full GP vs low-rank GP with 2500 locations. Figure (c)
exhibits oversmoothing by a low-rank process (predictive process with 64 knots)
See Stein 2014 for good reasons not to use reduced-rank spatial models
Simple method of introducing sparsity (e.g., graphical models)
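$p(w) = N(w \mid 0, K_\theta)$
$= p(w_1)\, p(w_2 \mid w_1)$
$\times\, p(w_3 \mid w_1, w_2)$
$\times\, p(w_4 \mid w_1, w_2, w_3)$
$\times\, p(w_5 \mid w_1, w_2, w_3, w_4)$
$\times\, p(w_6 \mid w_1, w_2, \ldots, w_5)$
$\times\, p(w_7 \mid w_1, w_2, \ldots, w_6)$.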
Simple method of introducing sparsity (e.g., graphical models)
$p(w) = N(w \mid 0, K_\theta)$
$\approx p(w_1)\, p(w_2 \mid w_1)$
$\times\, p(w_3 \mid w_1, w_2)$
$\times\, p(w_4 \mid w_2, w_3)$ (dropping $w_1$)
$\times\, p(w_5 \mid w_3, w_4)$ (dropping $w_1, w_2$)
$\times\, p(w_6 \mid w_4, w_5)$ (dropping $w_1, w_2, w_3$)
$\times\, p(w_7 \mid w_1, w_4, w_6)$ (dropping $w_2, w_3, w_5$).
We need to solve $n - 1$ linear systems of size at most $m \times m$, where $m$ is
the number of neighbors in the conditional set.
Sparse likelihood approximations (Vecchia, 1988; Stein et al., 2004)
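With $w(\ell) \sim \mathrm{GP}(0, K_\theta(\cdot, \cdot))$, write the joint density $p(w)$ as:
$N(w \mid 0, K_\theta) = \prod_{i=1}^{n} p(w(\ell_i) \mid w_{H(\ell_i)}) \approx \prod_{i=1}^{n} p(w(\ell_i) \mid w_{N(\ell_i)}) = N(w \mid 0, \tilde{K}_\theta)$,
where $H(\ell_i) = \{\ell_1, \ldots, \ell_{i-1}\}$ is the set of locations preceding $\ell_i$ in the ordering and $N(\ell_i) \subseteq H(\ell_i)$.
Shrinkage: Choose $N(\ell_i)$ as the set of "m nearest-neighbors" among
$H(\ell_i)$. Theory: "Screening" effect of kriging.
$\tilde{K}_\theta^{-1}$ depends on $K_\theta$, but is sparser with at most $nm^2$ non-zero
entries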
Extension to a GP (Datta et al., JASA, 2016) called the Nearest
Neighbor Gaussian Process (NNGP)
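The approximation above can be sketched as a sum of univariate conditional Gaussian log densities, each conditioning on at most m earlier neighbors. This is an illustration under stated assumptions, not the spNNGP implementation: the exponential covariance, its parameter values, the ordering by first coordinate, and the names `exp_cov` and `vecchia_loglik` are all hypothetical.

```python
import numpy as np

def exp_cov(a, b, sigma_sq=1.0, phi=6.0):
    """Assumed exponential covariance between two sets of locations."""
    dist = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))
    return sigma_sq * np.exp(-phi * dist)

def vecchia_loglik(w, coords, m):
    """Sum over i of log p(w_i | w_N(i)), with N(i) the (up to) m nearest earlier points."""
    total = 0.0
    for i in range(len(w)):
        prev = np.arange(i)                                    # history set H(l_i)
        d = np.linalg.norm(coords[prev] - coords[i], axis=1)
        nbrs = prev[np.argsort(d)[:m]]                         # neighbor set N(l_i)
        mean = 0.0
        var = exp_cov(coords[i:i + 1], coords[i:i + 1])[0, 0]
        if len(nbrs) > 0:
            K_nn = exp_cov(coords[nbrs], coords[nbrs])
            k_in = exp_cov(coords[i:i + 1], coords[nbrs])[0]
            b = np.linalg.solve(K_nn, k_in)                    # system of size at most m x m
            mean = b @ w[nbrs]
            var = var - b @ k_in
        total += -0.5 * (np.log(2.0 * np.pi * var) + (w[i] - mean) ** 2 / var)
    return total

rng = np.random.default_rng(2)
n = 150
coords = rng.uniform(size=(n, 2))
coords = coords[np.argsort(coords[:, 0])]     # assumed ordering: sort by first coordinate
K = exp_cov(coords, coords)
w = np.linalg.cholesky(K) @ rng.standard_normal(n)

# Full GP log density for comparison (needs the dense n x n matrix).
_, logdet = np.linalg.slogdet(K)
full_ll = -0.5 * (n * np.log(2.0 * np.pi) + logdet + w @ np.linalg.solve(K, w))

approx_ll = vecchia_loglik(w, coords, m=10)      # sparse approximation
exact_ll = vecchia_loglik(w, coords, m=n - 1)    # N(l_i) = H(l_i) recovers the full density
```

Setting m = n - 1 makes every neighbor set equal to the full history set, so the sum reproduces the full GP log density; smaller m trades accuracy for linear scaling in n.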
Figure panels: (a) True w, (b) Full GP, (c) PPGP 64 knots, (d) NNGP, m = 10, (e) NNGP, m = 20
Figure: Choice of m in NNGP models: out-of-sample root mean squared
prediction error (RMSPE) and mean width between the upper and lower 95%
posterior predictive credible intervals for m = 1, 2, ..., 25 in the univariate
synthetic data analysis. Series shown: NNGP RMSPE, NNGP mean 95% CI width,
full GP RMSPE, and full GP mean 95% CI width.
Figure: Wall time required for one MCMC iteration by number of locations n
and m=10 nearest neighbors (both axes are on the log scale).
Concluding remarks: Storage and computation
Algorithms: Gibbs, RWM, HMC, VB, INLA; NNGP/HMC especially
promising
Model-based solution for spatial “BIG DATA”
Never needs to store n × n distance matrix—store n small m × m
matrices, where m is the number of nearest neighbors considered
and m << n, e.g., m ≈ 15.
Total flop count per iteration is $O(nm^3)$, i.e., linear in $n$
Scalable to massive datasets because m is small—you never need
more than a few neighbors.
Compare with reduced-rank models: $O(nm^3) \ll O(nr^2)$.
New R package spNNGP (on CRAN
https://cran.r-project.org/web/packages/spNNGP)
Tanana Valley initial run results
Initial analysis fit the multivariate spatial NNGP model (with
misalignment between inventory plots and LiDAR outcomes)
Model fit and prediction algorithms written in C with heavy use of
OpenMP for parallelization.
Outcome vector included: AGB, BA, TPH, FC, and P95
AGB, BA, and TPH measured on 1,461 forest inventory plots
FC and P95 measured on 5 million LiDAR pixels
We considered m=15 neighbors for NNGPs
Posterior inference was based on 25k post burn-in MCMC samples
Full GP covariance matrix Kθ would be 5,001,461×5,001,461!
NNGP run time was ∼12 hours (Intel 18 core machine). Prediction for
TIU takes ∼5 days to deliver pixel level posterior distributions.
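To put the matrix size in perspective, a back-of-the-envelope calculation compares dense storage with the NNGP strategy of keeping n small m × m matrices. It assumes 8-byte double-precision entries and uses the slide's dimension of 5,001,461; the variable names are illustrative.

```python
n_plots = 1_461              # forest inventory plots
n_lidar = 5_000_000          # LiDAR pixels
n = n_plots + n_lidar        # 5,001,461, the matrix dimension quoted on the slide
m = 15                       # nearest neighbors used in the run
bytes_per_double = 8         # assumption: double-precision storage

dense_bytes = n * n * bytes_per_double      # one dense n x n covariance matrix
nngp_bytes = n * m * m * bytes_per_double   # n small m x m matrices

dense_tb = dense_bytes / 1e12               # about 200 TB
nngp_gb = nngp_bytes / 1e9                  # about 9 GB
```

Under these assumptions the dense matrix would need roughly 200 TB, while the neighbor-based storage is on the order of 9 GB.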
Brief overview of parameter estimates
β's
Parameter   50%     (2.5%, 97.5%)
AGB_0       0.86    (0.27, 1.46)
AGB_TC      0.10    (0.09, 0.11)
AGB_FIRE    0.41    (-0.23, 1.06)
BA_0        0.96    (0.66, 1.26)
BA_TC       0.04    (0.04, 0.05)
BA_FIRE     0.05    (-0.29, 0.39)
TPH_0       28.17   (24.21, 32.15)
TPH_TC      0.21    (0.16, 0.26)
TPH_FIRE    -0.77   (-5.02, 3.52)
FC_0        0.01    (-0.03, 0.06)
FC_TC       0.01    (0.01, 0.01)
FC_FIRE     -0.09   (-0.14, -0.05)
P95_0       -0.28   (-1.05, 0.49)
P95_TC      0.13    (0.12, 0.14)
P95_FIRE    -0.99   (-1.92, -0.06)
cor(Γ)
Parameter   50%     (2.5%, 97.5%)
AGB, BA     0.91    (0.91, 0.92)
AGB, TPH    0.09    (0.08, 0.09)
AGB, FC     0.37    (0.31, 0.41)
AGB, P95    0.84    (0.84, 0.85)
BA, TPH     0.09    (0.08, 0.09)
BA, FC      0.18    (0.11, 0.27)
BA, P95     0.73    (0.72, 0.74)
TPH, FC     0.38    (0.27, 0.44)
TPH, P95    -0.05   (-0.06, -0.04)
FC, P95     0.50    (0.45, 0.55)
NNGP m=15 model predicted AGB (posterior predicted mean)
NNGP m=25 model predicted AGB (posterior predicted SD)
Prototype for FIA/NASA TIU data products user interface
http://www.globalfiredata.org/temp/tanana.html
Thank You !
Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016) Hierarchical Nearest-Neighbor Gaussian process models for large
geostatistical datasets. Journal of the American Statistical Association, 111:800-812.
Datta, A., S. Banerjee, A.O. Finley, N.A.S. Hamm, and M. Schaap. (2016) Non-separable Dynamic Nearest Neighbor Gaussian
Process Models for Large Spatio-temporal Data with Application to Particulate Matter Analysis. Annals of Applied Statistics,
10:1286-1316.
Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016) On Nearest-Neighbor Gaussian Process Models for Massive Spatial
Data. WIREs Computational Statistics, 8:162-171.
Finley, A.O., S. Banerjee, Y. Zhou, B.D. Cook. (2017) Joint hierarchical models for sparsely sampled high-dimensional LiDAR
and forest variables. Remote Sensing of Environment, 190:149-161.
Finley, A.O., A. Datta, B.C. Cook, D.C. Morton, H.E. Andersen, and S. Banerjee (2017) Applying Nearest Neighbor Gaussian
Processes to massive spatial data sets: Forest canopy height prediction across Tanana Valley Alaska.
https://arxiv.org/abs/1702.00434
Finley, A.O., S. Banerjee, A.E. Gelfand. (2015) spBayes for large univariate and multivariate point-referenced spatio-temporal
data models. Journal of Statistical Software, 63:1-28.
Heaton, M.J., A. Datta, A.O. Finley, R. Furrer, R. Guhaniyogi, F. Gerber, R.B. Gramacy, D. Hammerling, M. Katzfuss, F.
Lindgren, D.W. Nychka, F. Sun, and A. Zammit-Mangion. (2017) Methods for analyzing large spatial data: A review and
comparison. https://arxiv.org/abs/1710.05013
Other references provided upon request.
Concluding remarks: Comparisons
Are low-rank spatial models well and truly beaten?
Certainly do not seem to scale as nicely as NNGP
Have somewhat greater theoretical tractability (e.g. Bayesian
asymptotics)
Can be used to flexibly model smoothness
Can be constructed for other processes—e.g., Spatial Dirichlet
Predictive Process
Compare with scalable multi-resolution frameworks (Katzfuss, 2016)
Highly scalable meta-kriging frameworks (Guhaniyogi, 2016)
Future work: High-dimensional multivariate spatial-temporal variable
selection

Undergraduate Modeling Workshop - Hierarchical Models for Sparsely Sampled High-Dimensional LiDAR and Forest Variables, Andrew Finley, May 22, 2018

  • 1.
    Hierarchical models forsparsely sampled high-dimensional LiDAR and forest variables: An interior Alaska FIA case study Andrew Finley (Michigan State University) Hans-Erik Andersen (Forest Service, Forest Inventory and Analysis) Sudipto Banerjee (University of California, LA) Bruce Cook (NASA Goddard Space Flight Center) Doug Morton (NASA Goddard Space Flight Center) SAMSI Undergraduate Modelling Workshop
  • 2.
    Extraordinary opportunities tounderstand the spatial and temporal complexity of environmental processes at broad scales. Unprecedented investment to collect, develop, and distribute data and tools to further large-scale and long-term science. For example: National Ecological Observatory Network (NEON) designed to detect and enable forecasting of ecological change at continental scales over time — NSF $434 million 30 year project USDA Forest Service Forest Inventory and Analysis (FIA) designed to monitor status and trends in forest land — since 1998 FIA measured 510,340 inventory plots across conterminous US measuring 5,839,642 trees (now with 2+ repeated measurements)! National Aeronautics and Space Administration (NASA) Global Ecosystem Dynamics Investigation LiDAR (GEDI) — $95 million 5 year project SAMSI Undergraduate Modelling Workshop
  • 3.
    Key challenges inspatiotemporal environmental data analysis Data sets often exhibit: missingness and misalignment among outcomes space- and time-varying impact of covariates complex residual dependence structures nonstationarity among multiple outcomes across locations unknown time and perhaps space lags between outcomes and covariates SAMSI Undergraduate Modelling Workshop
  • 4.
    Key challenges inspatiotemporal environmental data analysis Data sets often exhibit: missingness and misalignment among outcomes space- and time-varying impact of covariates complex residual dependence structures nonstationarity among multiple outcomes across locations unknown time and perhaps space lags between outcomes and covariates Interest in modeling frameworks that: incorporate many sources of space and time indexed data accommodate structured residual dependence propagate uncertainty through to predictions scale to effectively exploit information in massive data sets SAMSI Undergraduate Modelling Workshop
  • 5.
    Joint NASA andFIA Forest Service initiative Project goal: Design and implement an operational forest inventory in Interior AK by extending sparse networks of ground samples with space and airborne multi-sensor data. Data products: 1. Complete coverage maps (e.g., 15×15 m resolution) of forest: Above ground biomass (AGB; mg/ha) Basal area (BA; m2 /ha) Density (TPH; trees/ha)    From inventory plots Fractional cover (FC; %) Canopy height (P95; m) From LiDAR 2. Pixel level prediction with uncertainty estimates 3. Biologically consistent relationships among predictions 4. Reporting with uncertainty for user defined areas Fun read: www.wired.com/2014/12/alaska-laser-survey-3d-map SAMSI Undergraduate Modelling Workshop
  • 6.
    Tanana Inventory Unit(TIU) Data: 1. LiDAR transects 50,000 km flight lines 25 TB data ∼43 million signals {FC, P95} 2. Inventory plots 1,461 ∼7 m radius {AGB, BA, TPH} 3. Complete coverage % forest canopy (TC) Forest fire (FIRE) SAMSI Undergraduate Modelling Workshop
  • 7.
    Tanana Inventory Unit(TIU) Data: 1. LiDAR transects 50,000 km flight lines 25 TB data ∼43 million signals {FC, P95} 2. Inventory plots 1,461 ∼7 m radius {AGB, BA, TPH} 3. Complete coverage % forest canopy (TC) Forest fire (FIRE) Normalized Intensity HeightAboveGround(m) 0.00 0.01 0.02 0.03 0.04 0.05 05101520 5% 25% 50% 95% G−LiHT sensor signal Percentile heights (i.e., P95) SAMSI Undergraduate Modelling Workshop
  • 8.
    Tanana Inventory Unit(TIU) Data: 1. LiDAR transects 50,000 km flight lines 25 TB data ∼43 million signals {FC, P95} 2. Inventory plots 1,461 ∼7 m radius {AGB, BA, TPH} 3. Complete coverage % forest canopy (TC) Forest fire (FIRE) SAMSI Undergraduate Modelling Workshop
  • 9.
    Tanana Inventory Unit(TIU) Data: 1. LiDAR transects 50,000 km flight lines 25 TB data ∼43 million signals {FC, P95} 2. Inventory plots 1,461 ∼7 m radius {AGB, BA, TPH} 3. Complete coverage % forest canopy (TC) Forest fire (FIRE) Easting (km) Northing(km) 0 100 200 300 400 500 600 0100200300400500 Forest fire within 20 years SAMSI Undergraduate Modelling Workshop
  • 10.
    Key inferential challenges incorporatemany sources of spatially indexed data address misalignment (missingness) among responses accommodate and leverage residual spatial dependence propagate parameter uncertainty through to predictions deliver statistically valid probabilistic prediction of arbitrary areas maintain observed covariance among multivariate predictions scale to effectively exploit information in massive data sets SAMSI Undergraduate Modelling Workshop
  • 11.
    Key inferential challenges incorporatemany sources of spatially indexed data address misalignment (missingness) among responses accommodate and leverage residual spatial dependence propagate parameter uncertainty through to predictions deliver statistically valid probabilistic prediction of arbitrary areas maintain observed covariance among multivariate predictions scale to effectively exploit information in massive data sets Some anticipated extensions: incorporate time-indexed observations model nonstationarity among multiple responses across locations estimate space- and time-varying impact of covariates SAMSI Undergraduate Modelling Workshop
  • 12.
    Hierarchical Gaussian processmodels Say we observe q outcomes at a given location within domain L. A multivariate spatial regression: yk ( ) = xk ( ) βk + wk ( ) + ek ( ), for k = 1, 2, . . . , q yk ( ) is the kth outcome at generic location (e.g., AGB, BA, TPH, FC, P95) Mean: xk ( ) includes an intercept, TC, and FIRE Cov: w( ) = (w1( ), w2( ), . . . , wq( )) ∼ MVGP(0, Γθ(·, ·)) where Γθ( , ) = {Cov(wi ( ), wj ( ))} for i, j = 1, 2, . . . , q Error: e( ) = (e1( ), e2( ), . . . , eq( )) ∼ MVN(0, Ψ) TIU we must accommodate spatial misalignment (i.e., yk ’s are partially observed at some locations), see, e.g., Gelfand et al. 2004, Finley et al. 2014. Skip to results SAMSI Undergraduate Modelling Workshop
  • 13.
    Hierarchical Gaussian processmodels Multivariate spatial regression model for TIU and similar settings Forest variables Spatial random effect Trend Spatial decay LiDAR variables Trend Spatial random effect Spatial decay Landsat, etc. Non-spatial error variance-covariance Spatial variance-covariance SAMSI Undergraduate Modelling Workshop
  • 14.
    Hierarchical Gaussian processmodels Multivariate spatial regression model for TIU and similar settings Forest variables Spatial random effect Trend Spatial decay LiDAR variables Trend Spatial random effect Spatial decay Landsat, etc. Non-spatial error variance-covariance Spatial variance-covariance SAMSI Undergraduate Modelling Workshop
  • 15.
    Hierarchical Gaussian processmodels Multivariate spatial regression model for TIU and similar settings Forest variables Spatial random effect Trend Spatial decay LiDAR variables Trend Spatial random effect Spatial decay Landsat, etc. Non-spatial error variance-covariance Spatial variance-covariance SAMSI Undergraduate Modelling Workshop
  • 16.
    Hierarchical Gaussian processmodels Multivariate spatial regression model for TIU and similar settings Forest variables Spatial random effect Trend Spatial decay LiDAR variables Trend Spatial random effect Spatial decay Landsat, etc. Non-spatial error variance-covariance Spatial variance-covariance SAMSI Undergraduate Modelling Workshop
  • 17.
    Hierarchical Gaussian processmodels Multivariate spatial regression model for TIU and similar settings Forest variables Spatial random effect Trend Spatial decay LiDAR variables Trend Spatial random effect Spatial decay Landsat, etc. Non-spatial error variance-covariance Spatial variance-covariance SAMSI Undergraduate Modelling Workshop
  • 18.
    Spatiotemporal regression models Startwith a simple univariate regression: y( ) = x( ) β + w( ) + e( ) Potentially very rich: understand spatially- and/or temporally-varying impact of intercept or predictors on outcome Produce maps for random effects: {w( ) : ∈ L} L is spatial domain (e.g., D ⊂ d ) or spatiotemporal domain (e.g., D ⊂ d × + ) Model-based predictions: y( 0) | {y( 1), y( 2), . . . , y( n)} SAMSI Undergraduate Modelling Workshop
  • 19.
    Gaussian spatiotemporal process {w() : ∈ L} ∼ GP(0, Kθ(·, ·)) implies w = (w( 1), w( 2), . . . , w( n)) ∼ MVN(0, Kθ) for every finite set of points 1, 2, . . . , n. Kθ = {Kθ( i , j )} is a spatial variance-covariance matrix, where θ = {σ, φ} Stationary: Kθ( , ) = Kθ( − ). Isotropy: Kθ( , ) = Kθ( − ). SAMSI Undergraduate Modelling Workshop
  • 20.
    Likelihood from (fullrank) GP models Assuming {w( ) : ∈ L} ∼ GP(0, Kθ(·, ·)) implies w = (w( 1), w( 2), . . . , w( n)) ∼ MVN(0, Kθ) Estimating process parameters from the likelihood involves: p(w) ∝ − 1 2 log det(Kθ) − 1 2 w K−1 θ w Bayesian inference: priors on θ and many Markov chain Monte Carlo (MCMC) iterations See, e.g., Finley et al. 2015 and Finley et al. 2017 for some coding tips. SAMSI Undergraduate Modelling Workshop
  • 21.
    Computation issues Storage: n2 pairwisedistances to compute Kθ Kθ is dense; Need to solve Kθx = b and need det(Kθ) This is best achieved using chol(Kθ) = LDL Complexity: roughly O(n3 ) flops Computationally infeasible for large datasets SAMSI Undergraduate Modelling Workshop
  • 22.
    Burgeoning literature onspatial big data Low-rank models: (Wahba 1990; Higdon 2002; Kamman & Wand 2003; Paciorek 2007; Rasmussen & Williams 2006; Stein 2007, 2008; Cressie & Johannesson 2008; Banerjee et al. 2008, 2010; Gramacy & Lee 2008; Finley et al. 2009; Sang et al. 2011, 2012; Lemos et al. 2011; Guhaniyogi et al. 2011, 2013; Salazar et al. 2013; Katzfuss 2016) Spectral approximations and composite likelihoods: (Fuentes 2007; Paciorek 2007; Eidsvik et al. 2016) Multi-resolution approaches: (Nychka, 2014; Johannesson et al. 2007; Matsuo et al. 2010; Tzeng & Huang 2015; Katzfuss 2016) Sparsity: (Solve Ax = b by (i) sparse A, or (ii) sparse A−1 ) 1. Covariance tapering (Furrer et al. 2006; Du et al. 2009; Kaufman et al. 2009; Shaby and Ruppert 2013) 2. GMRFs to GPs: INLA (Rue et al. 2009; Lindgren et al. 2011) 3. LAGP (Gramacy et al. 2014; Gramacy & Apley 2015) 4. Nearest-neighbor Gaussian Process (NNGP) models (Datta et al. 2015, 2016; Finley et al. 2017) SAMSI Undergraduate Modelling Workshop
  • 23.
    Reduced (Low) rankmodels Kθ ≈ BθK∗ θ Bθ + Dθ Bθ is n × r matrix of spatial basis functions, r << n K∗ θ is r × r spatial covariance matrix Dθ is either diagonal or sparse Examples: Kernel projections, Splines, Predictive process, FRK, spectral basis . . . Computations exploit above structure: roughly O(nr2 ) << O(n3 ) flops SAMSI Undergraduate Modelling Workshop
  • 24.
    Low-rank models: hierarchicalapproach N(w∗ | 0, K∗ θ ) × N(w | Bθw∗ , D) w is n × 1 and n is large w∗ is r × 1, where r << n; so K∗ θ is r × r Bθ is n × r is a matrix of “basis” functions D is n × n, but easy to invert (e.g., diagonal) Derive var(w) (or var(w∗ | y)) in alternate ways to obtain (BθK∗ θ Bθ + D)−1 = D−1 − D−1 Bθ(K∗−1 θ + Bθ D−1 Bθ)−1 Bθ D−1 . This is the famous Sherman-Woodbury-Morrison formula. Modeling: specifying w∗ and Bθ. See Finley et al. 2015 for implementation details in spBayes R package SAMSI Undergraduate Modelling Workshop
  • 25.
    Oversmoothing due toreduced-rank models (a) True w (b) Full GP (c) PPGP 64 knots Figure: Comparing full GP vs low-rank GP with 2500 locations. Figure (c) exhibits oversmoothing by a low-rank process (predictive process with 64 knots) See Stein 2014 for good reasons not to use reduced-rank spatial models SAMSI Undergraduate Modelling Workshop
  • 26.
    Simple method ofintroducing sparsity (e.g., graphical models) p(w) = N(w | 0, Kθ) = p(w1)p(w2 | w1) × p(w3 | w1, w2) × p(w4 | w1, w2, w3) × p(w5 | w1, w2, w3, w4) × p(w6 | w1, w2, . . . , w5) × p(w7 | w1, w2, . . . , w6) . SAMSI Undergraduate Modelling Workshop
  • 27.
    Simple method ofintroducing sparsity (e.g. graphical models) p(w) = N(w | 0, Kθ) = p(w1)p(w2 | w1) × p(w3 | w1, w2) × p(w4 |¨¨w1, w2, w3) × p(w5 |¨¨w1,¨¨w2, w3, w4) × p(w6 |¨¨w1,¨¨w2,¨¨w3, w4, w5) × p(w7 | w1,¨¨w2,¨¨w3, w4,¨¨w5, w6) . We need to solve n − 1 linear systems of size at most m × m, where m is the number of neighbors in the conditional set. SAMSI Undergraduate Modelling Workshop
  • 28.
    Sparse likelihood approximations(Vecchia, 1988; Stein et al., 2004) With w( ) ∼ GP(0, Kθ(·)), write the joint density p(w) as: N(w | 0, Kθ) = n i=1 p(w( i ) | wH( i )) ≈ n i=1 p(w( i ) | wN( i )) = N(w | 0, ˜Kθ) . where N( i ) ⊆ H( i ). Shrinkage: Choose N( ) as the set of “m nearest-neighbors” among H( i ). Theory: “Screening” effect of kriging. ˜K−1 θ depends on Kθ, but is sparser with at most nm2 non-zero entries Extension to a GP (Datta et al., JASA, 2016) called the Nearest Neighbor Gaussian Process (NNGP) SAMSI Undergraduate Modelling Workshop
  • 29.
    (a) True w(b) Full GP (c) PPGP 64 knots (d) NNGP, m = 10 (e) NNGP, m = 20 SAMSI Undergraduate Modelling Workshop
  • 30.
    Figure: Choice of m in NNGP models: out-of-sample Root Mean Squared Prediction Error (RMSPE) and mean width between the upper and lower 95% posterior predictive credible intervals for a range of m for the univariate synthetic data analysis. Horizontal axis: m (1 to 25). Vertical axes: RMSPE and mean 95% CI width. Series: NNGP RMSPE, NNGP mean 95% CI width, Full GP RMSPE, Full GP mean 95% CI width.
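    The two diagnostics in the figure can be computed from posterior predictive draws at held-out locations. The sketch below assumes the point prediction is the posterior predictive mean and that intervals are equal-tailed; the function names and the toy arrays are hypothetical and are not results from the analysis.

```python
import numpy as np

def rmspe(y_holdout, pred_samples):
    """Root mean squared prediction error of the posterior predictive mean.

    pred_samples has shape (n_mcmc_samples, n_holdout).
    """
    pred_mean = pred_samples.mean(axis=0)
    return float(np.sqrt(np.mean((y_holdout - pred_mean) ** 2)))

def mean_ci_width(pred_samples, level=0.95):
    """Mean width of equal-tailed posterior predictive credible intervals."""
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(pred_samples, [alpha, 1.0 - alpha], axis=0)
    return float(np.mean(upper - lower))

# Toy inputs for illustration only.
y_toy = np.array([1.0, 2.0, 3.0])
samples_toy = np.tile(np.array([1.0, 2.0, 5.0]), (100, 1))  # degenerate draws

toy_rmspe = rmspe(y_toy, samples_toy)
toy_width = mean_ci_width(samples_toy)
```

    Repeating this for models fitted with different m gives the two curves that the figure compares against the full GP.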
  • 31.
    Figure: Wall time required for one MCMC iteration by number of locations n and m = 10 nearest neighbors (both axes are on the log scale).
  • 32.
    Concluding remarks: Storage and computation
    Algorithms: Gibbs, RWM, HMC, VB, INLA; NNGP/HMC especially promising
    Model-based solution for spatial "BIG DATA"
    Never needs to store the $n \times n$ distance matrix; instead store $n$ small $m \times m$ matrices, where $m$ is the number of nearest neighbors considered and $m \ll n$, e.g., $m \approx 15$.
    Total flop count per iteration is $O(nm^3)$, i.e., linear in $n$
    Scalable to massive datasets because $m$ is small: you never need more than a few neighbors.
    Compare with reduced-rank models: $O(nm^3) \ll O(nr^2)$.
    New R package spNNGP (on CRAN: https://cran.r-project.org/web/packages/spNNGP)
  • 33.
    Tanana Valley initial run results
    Initial analysis fit the multivariate spatial NNGP model (with misalignment between inventory plots and LiDAR outcomes)
    Skip TIU model
    Model fit and prediction algorithms written in C with heavy use of OpenMP for parallelization.
    Outcome vector included: AGB, BA, TPH, FC, and P95
    AGB, BA, and TPH measured on 1,461 forest inventory plots
    FC and P95 measured on 5 million LiDAR pixels
    We considered m = 15 neighbors for NNGPs
    Posterior inference was based on 25k post burn-in MCMC samples
    Full GP covariance matrix $K_\theta$ would be 5,001,461 × 5,001,461!
    NNGP run time was ~12 hours (18-core Intel machine)
    Prediction for TIU takes ~5 days to deliver pixel-level posterior distributions.
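    To put the matrix size on this slide in perspective, the following back-of-the-envelope sketch compares the memory needed for the full covariance matrix with the memory for n small m × m neighbor matrices. It assumes 8-byte double-precision storage and treats "5 million" as exactly 5,000,000 pixels; both are assumptions made for illustration.

```python
n_plots = 1_461          # forest inventory plots (from the slide)
n_pixels = 5_000_000     # LiDAR pixels, taking "5 million" as exact (assumption)
n = n_plots + n_pixels   # dimension of the full GP covariance matrix
m = 15                   # nearest neighbors used in the NNGP

bytes_per_double = 8     # assumed double-precision storage

full_bytes = n * n * bytes_per_double    # dense n x n matrix
nngp_bytes = n * m * m * bytes_per_double  # n matrices of size m x m

full_tb = full_bytes / 1e12
nngp_gb = nngp_bytes / 1e9

print(f"n = {n:,}")
print(f"full covariance matrix: about {full_tb:.0f} TB")
print(f"n small m x m matrices: about {nngp_gb:.1f} GB")
```

    Under these assumptions the dense matrix is on the order of hundreds of terabytes, while the neighbor matrices fit in the memory of a single workstation.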
  • 34.
    Brief overview of parameter estimates

    Regression coefficients (β's)
    Parameter    50%     (2.5%, 97.5%)
    AGB_0        0.86    (0.27, 1.46)
    AGB_TC       0.10    (0.09, 0.11)
    AGB_FIRE     0.41    (-0.23, 1.06)
    BA_0         0.96    (0.66, 1.26)
    BA_TC        0.04    (0.04, 0.05)
    BA_FIRE      0.05    (-0.29, 0.39)
    TPH_0        28.17   (24.21, 32.15)
    TPH_TC       0.21    (0.16, 0.26)
    TPH_FIRE     -0.77   (-5.02, 3.52)
    FC_0         0.01    (-0.03, 0.06)
    FC_TC        0.01    (0.01, 0.01)
    FC_FIRE      -0.09   (-0.14, -0.05)
    P95_0        -0.28   (-1.05, 0.49)
    P95_TC       0.13    (0.12, 0.14)
    P95_FIRE     -0.99   (-1.92, -0.06)

    cor(Γ)
    Parameter    50%     (2.5%, 97.5%)
    AGB, BA      0.91    (0.91, 0.92)
    AGB, TPH     0.09    (0.08, 0.09)
    AGB, FC      0.37    (0.31, 0.41)
    AGB, P95     0.84    (0.84, 0.85)
    BA, TPH      0.09    (0.08, 0.09)
    BA, FC       0.18    (0.11, 0.27)
    BA, P95      0.73    (0.72, 0.74)
    TPH, FC      0.38    (0.27, 0.44)
    TPH, P95     -0.05   (-0.06, -0.04)
    FC, P95      0.50    (0.45, 0.55)
  • 35.
    NNGP m=15 model predicted AGB (posterior predicted mean)
  • 36.
    NNGP m=25 model predicted AGB (posterior predicted SD)
  • 37.
    Prototype for FIA/NASA TIU data products user interface http://www.globalfiredata.org/temp/tanana.html
  • 38.
    Prototype for FIA/NASATIU data products user interface http://www.globalfiredata.org/temp/tanana.html SAMSI Undergraduate Modelling Workshop
  • 39.
    Thank you!
    Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016) Hierarchical Nearest-Neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, 111:800-812.
    Datta, A., S. Banerjee, A.O. Finley, N.A.S. Hamm, and M. Schaap. (2016) Non-separable Dynamic Nearest Neighbor Gaussian Process Models for Large Spatio-temporal Data with Application to Particulate Matter Analysis. Annals of Applied Statistics, 10:1286-1316.
    Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016) On Nearest-Neighbor Gaussian Process Models for Massive Spatial Data. WIREs Computational Statistics, 8:162-171.
    Finley, A.O., S. Banerjee, Y. Zhou, and B.D. Cook. (2017) Joint hierarchical models for sparsely sampled high-dimensional LiDAR and forest variables. Remote Sensing of Environment, 190:149-161.
    Finley, A.O., A. Datta, B.C. Cook, D.C. Morton, H.E. Andersen, and S. Banerjee. (2017) Applying Nearest Neighbor Gaussian Processes to massive spatial data sets: Forest canopy height prediction across Tanana Valley Alaska. https://arxiv.org/abs/1702.00434
    Finley, A.O., S. Banerjee, and A.E. Gelfand. (2015) spBayes for large univariate and multivariate point-referenced spatio-temporal data models. Journal of Statistical Software, 63:1-28.
    Heaton, M.J., A. Datta, A.O. Finley, R. Furrer, R. Guhaniyogi, F. Gerber, R.B. Gramacy, D. Hammerling, M. Katzfuss, F. Lindgren, D.W. Nychka, F. Sun, and A. Zammit-Mangion. (2017) Methods for analyzing large spatial data: A review and comparison. https://arxiv.org/abs/1710.05013
    Other references provided upon request.
  • 40.
    Concluding remarks: Comparisons
    Are low-rank spatial models well and truly beaten?
    Certainly do not seem to scale as nicely as NNGP
    Have somewhat greater theoretical tractability (e.g., Bayesian asymptotics)
    Can be used to flexibly model smoothness
    Can be constructed for other processes, e.g., Spatial Dirichlet Predictive Process
    Compare with scalable multi-resolution frameworks (Katzfuss, 2016)
    Highly scalable meta-kriging frameworks (Guhaniyogi, 2016)
    Future work: High-dimensional multivariate spatial-temporal variable selection