The effect of grid spacing on the spatial prediction of species abundances was estimated. Counts of the intertidal macrofauna species M. balthica were collected in the Dutch Wadden Sea on a 500 × 500 m grid. In the first step, the zero-inflated data were modelled without taking spatial dependence into account. The excess zeros were handled with a mixture model (Lambert, 1992), which separates the point mass at zero, modelled as a Bernoulli process, from the count component, modelled as a Poisson process. In the second step, spatial correlation in both processes was accounted for through a generalised linear geostatistical model (GLGM) (Diggle et al., 1998; Christensen, 2004). A Monte Carlo approximation to the likelihood function was obtained from MCMC simulations from the conditional distribution. In the third step, the two calibrated GLGMs were used to generate 100 pseudo-realities by conditional simulation from the original grid to the nodes of a fine prediction grid (100 × 100 m), supplemented with 1000 randomly selected validation points. The simulated pseudo-realities of the Bernoulli variable and the Poisson variable were combined into 100 pseudo-realities of a zero-inflated Poisson variable. In the fourth step, each simulated pseudo-reality was repeatedly sampled by grid sampling with varying spacing. Each sample was used to predict the study variable at the validation points by inverse distance weighted interpolation and to estimate the mean squared error (MSE). Averaging the MSEs over the pseudo-realities gave an estimate of the model expectation of the MSE. The results showed that decreasing the resolution of the sampling grid (upscaling) had a clear effect on the precision of the predictions. This has direct implications for decisions on sampling density in ecological monitoring programmes.
ISEC 2014 (International Statistical Ecology Conference)
1. Grid spacing and quality of spatially predicted species abundances
A case-study for zero-inflated spatial data
Olga Lyashevska* Dick Brus** Jaap van der Meer*
*Royal Netherlands Institute for Sea Research
Department of Marine Ecology
**Alterra, Wageningen University and Research Centre
olga.lyashevska@nioz.nl
July 2, 2014
Lyashevska et al, 2014 olga.lyashevska@nioz.nl July, 2 2014 1 / 16
5. Problem
Sampling is expensive, so it is important to evaluate sampling designs statistically before a monitoring network is implemented;
This has been done before (Bijleveld et al., 2012; Brus and de Gruijter, 2013), but
spatial empirical ecological data are typically zero-inflated,
and accounting for the spatial dependence of such data is not straightforward.
7. Aim
1. To work out a methodology for statistical evaluation of sampling
designs for zero-inflated spatially correlated count data;
2. To test the proposed methodology in a real-world case study.
14. Methodology
Postulate a statistical model of the spatial distribution of the variable;
Use prior data to calibrate the model;
Simulate a large number of pseudo-realities;
Sample each pseudo-reality repeatedly with the candidate sampling designs;
Predict the variable of interest at validation points;
Compute performance statistics;
Select the best design among the evaluated candidates.
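The steps listed above amount to a nested loop over candidate designs, pseudo-realities and repeated samples. The following is only a minimal sketch of that loop, not the authors' implementation: `evaluate_designs`, `systematic_sample` and `squared_error_of_mean` are hypothetical names, the pseudo-realities are toy 1-D transects of Gaussian noise, and the score is a stand-in for the prediction error used in the talk.

```python
import random
from statistics import mean


def evaluate_designs(realities, designs, sample_fn, score_fn, n_samples, rng):
    """Return {design: mean score over all pseudo-realities and repeated samples}."""
    results = {}
    for design in designs:
        scores = []
        for reality in realities:
            for _ in range(n_samples):
                sample = sample_fn(reality, design, rng)
                scores.append(score_fn(reality, sample))
        results[design] = mean(scores)
    return results


# Toy stand-ins (assumptions, for illustration only).
def systematic_sample(reality, step, rng):
    """Take every `step`-th value, starting at a random offset."""
    start = rng.randrange(step)
    return reality[start::step]


def squared_error_of_mean(reality, sample):
    """Squared difference between the sample mean and the true mean."""
    return (mean(sample) - mean(reality)) ** 2


rng = random.Random(1)
realities = [[rng.gauss(0.0, 1.0) for _ in range(200)] for _ in range(20)]
scores = evaluate_designs(
    realities, [2, 5, 10], systematic_sample, squared_error_of_mean,
    n_samples=5, rng=rng,
)
best = min(scores, key=scores.get)  # design with the lowest mean score
```

Picking the minimum score ignores sampling cost; in practice the choice would presumably weigh the two against each other.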
15. Case Study
Dutch Wadden Sea;
Area: 2483 km²;
Abundance of Baltic tellin (M. balthica);
Centrifuge tube (17.3 – 17.7 cm) to a depth of 25 cm
June–October 2010
16. Field data - Species Abundance
[Figure: histogram of species abundance (axes: species abundance, counts)]
90% of observations are zeros
max = 100 individuals
µ = 1.39 individuals
var = 24 individuals²
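These summary statistics can be set against what a plain Poisson model would imply. Under a Poisson distribution the variance equals the mean and the probability of a zero is exp(−µ). The short calculation below uses only the figures reported on the slide; the variable names are illustrative.

```python
import math

mean_count = 1.39              # sample mean reported on the slide
variance = 24.0                # sample variance reported on the slide
observed_zero_fraction = 0.90  # share of zero observations on the slide

# Under a plain Poisson model: variance == mean and P(Y = 0) = exp(-mean).
poisson_zero_fraction = math.exp(-mean_count)
dispersion_index = variance / mean_count

print(f"Poisson-implied zero fraction: {poisson_zero_fraction:.3f}")
print(f"Observed zero fraction:        {observed_zero_fraction:.3f}")
print(f"Variance-to-mean ratio:        {dispersion_index:.1f}")
```

A Poisson model with this mean would predict roughly a quarter of the observations to be zero, far fewer than the 90% observed, and the variance is about 17 times the mean. Both point towards zero inflation and overdispersion.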
18. Modelling of the spatial distribution
1. Calibrate zero-inflated Poisson mixture model (assuming independent
data);
2. Use fitted model to classify each zero either as a Bernoulli or a
Poisson zero;
3. Model the Bernoulli and Poisson variables separately (accounting for
spatial dependence).
19. Modelling of the spatial distribution
1. Zero inflated Poisson mixture model (Lambert, 1992);
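$$P(y \mid x) = \frac{\exp(-\mu)\,\mu^{y}}{y!} \qquad (1)$$
$$\operatorname{logit}(\psi) = \log\left(\frac{\psi}{1-\psi}\right) = x^{T}\beta \qquad (2)$$
$$P(Y = y) = \begin{cases} \psi + (1-\psi)\exp(-\mu), & y = 0 \\ (1-\psi)\,\dfrac{\exp(-\mu)\,\mu^{y}}{y!}, & y = 1, 2, 3, \ldots \end{cases} \qquad (3)$$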
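Equations (1) to (3) translate directly into a few lines of code. This is a minimal sketch; the function names `inv_logit` and `zip_pmf` and the example parameter values are assumptions made for illustration, not values from the case study.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link in equation (2): psi = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def zip_pmf(y, psi, mu):
    """Zero-inflated Poisson probability P(Y = y) from equation (3)."""
    poisson = math.exp(-mu) * mu ** y / math.factorial(y)  # equation (1)
    if y == 0:
        return psi + (1.0 - psi) * poisson
    return (1.0 - psi) * poisson


# Hypothetical parameter values, chosen only to exercise the functions.
psi = inv_logit(2.0)  # probability of a Bernoulli (structural) zero
mu = 3.0              # Poisson mean of the count component
total = sum(zip_pmf(y, psi, mu) for y in range(61))  # should be close to 1
```

Summing the probabilities over a long enough range of counts is a quick sanity check that the mixture is a proper distribution.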
20. Modelling of the spatial distribution
2. Bernoulli/Poisson zeros;
Compute the ratio of the probability of a Bernoulli zero to the total
probability of a zero;
$$\frac{\psi}{\psi + (1-\psi)\exp(-\mu)} \qquad (1)$$
Randomly allocate each zero to a Bernoulli zero or a Poisson zero.
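The allocation step can be sketched as follows, assuming that fitted values of ψ and µ are available for every observation. The function names and the label strings are hypothetical. Each observed zero is labelled a Bernoulli zero with the probability given by the ratio above, and a Poisson zero otherwise.

```python
import math
import random


def bernoulli_zero_probability(psi, mu):
    """Probability that an observed zero is a Bernoulli zero."""
    return psi / (psi + (1.0 - psi) * math.exp(-mu))


def allocate_zeros(counts, psi, mu, rng):
    """Label each observation; zeros are allocated at random between processes."""
    labels = []
    for y, p, m in zip(counts, psi, mu):
        if y > 0:
            labels.append("count")
        elif rng.random() < bernoulli_zero_probability(p, m):
            labels.append("bernoulli_zero")
        else:
            labels.append("poisson_zero")
    return labels


# Hypothetical fitted values, for illustration only.
rng = random.Random(42)
labels = allocate_zeros(
    counts=[0, 0, 3, 0],
    psi=[1.0, 0.0, 0.4, 0.8],
    mu=[2.0, 2.0, 2.0, 0.5],
    rng=rng,
)
```

With ψ = 1 a zero is always a Bernoulli zero and with ψ = 0 it is always a Poisson zero; intermediate values give a random split.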
21. Modelling of the spatial distribution
3. Bernoulli and Poisson variables are modelled separately by GLGM
(Diggle et al., 1998; Christensen, 2004)
GLGM is a GLM for dependent data (with a spatial random effect);
The transformed model parameters, logit(ψ) and log(µ), are modelled with a Gaussian random field:
$$S_1 = \operatorname{logit}(\psi) = x_1 \beta_1 + \varepsilon_1 \qquad (1)$$
$$S_2 = \log(\mu) = x_2 \beta_2 + \varepsilon_2 \qquad (2)$$
The model parameters are estimated by Monte Carlo maximum likelihood (MCML), which relies on Markov chain Monte Carlo simulation;
MCML is computationally prohibitive for large data sets.
24. Simulation of the pseudo-realities
Simulate signals S (linear combination of covariates and Gaussian noise) with the GLGM models for the Bernoulli and Poisson variables at the sampling locations (original grid);
Use sequential Gaussian simulation to simulate the signals on a very fine grid (100 m × 100 m) supplemented with validation points;
Combine pairwise the simulated fields of Bernoulli indicators and Poisson counts into pseudo-realities of zero-inflated Poisson counts;
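The final combination step can be sketched as below, assuming the two simulated signals are given as plain lists on the same locations. Several things here are assumptions: the names `draw_poisson` and `combine_signals`, the convention that a Bernoulli indicator of 1 marks a structural zero (consistent with ψ being the point mass at zero), and the use of Knuth's multiplication method for the Poisson draw, which is only suitable for modest means. Simulating the Gaussian random fields themselves is not shown.

```python
import math
import random


def draw_poisson(mu, rng):
    """Draw one Poisson(mu) count with Knuth's multiplication method."""
    limit = math.exp(-mu)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def combine_signals(s1, s2, rng):
    """Turn signals S1 = logit(psi) and S2 = log(mu) into zero-inflated counts."""
    counts = []
    for logit_psi, log_mu in zip(s1, s2):
        psi = 1.0 / (1.0 + math.exp(-logit_psi))  # inverse logit link
        mu = math.exp(log_mu)                      # inverse log link
        structural_zero = rng.random() < psi       # Bernoulli indicator
        count = draw_poisson(mu, rng)              # Poisson count
        counts.append(0 if structural_zero else count)
    return counts


# Hypothetical signal values at four locations, for illustration only.
rng = random.Random(7)
pseudo_reality = combine_signals(
    s1=[50.0, 50.0, -1.0, 0.5],
    s2=[1.0, 2.0, 0.0, 1.5],
    rng=rng,
)
```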
25. Simulated data vs Original
Figure: Simulated data, species occurrence
30. Grid spacing and Performance
Sample each pseudo-reality of zero-inflated Poisson data repeatedly
by grid-sampling with a given spacing;
Repeat it for all considered grid-spacings;
Predict values with IDW interpolation at validation points;
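Calculate the performance statistics, the mean squared error (MSE) and its mean over simulations and samples (MMSE):
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( Y(a_i) - \hat{Y}(a_i) \right)^2 \qquad (3)$$
$$\mathrm{MMSE} = \frac{1}{R\,S} \sum_{i=1}^{R} \sum_{j=1}^{S} \mathrm{MSE}_{ji} \qquad (4)$$
where a_i is the i-th validation point, N is the number of validation points, R the number of simulations and S the number of samples per simulation.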
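The prediction and scoring steps can be sketched as follows. The slides do not state the IDW power or neighbourhood, so the power of 2 and the use of all sample points are assumptions, as are the function names and the toy coordinates.

```python
import math


def idw_predict(sample_points, target, power=2.0):
    """Inverse distance weighted prediction at `target`.

    sample_points: list of ((x, y), value) pairs.
    """
    numerator = denominator = 0.0
    for (x, y), value in sample_points:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return float(value)  # target coincides with a sample point
        weight = distance ** -power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def mse(sample_points, validation_points, power=2.0):
    """Mean squared prediction error over the validation points, equation (3)."""
    errors = [
        (true_value - idw_predict(sample_points, location, power)) ** 2
        for location, true_value in validation_points
    ]
    return sum(errors) / len(errors)


def mmse(mse_values):
    """Mean of MSE over simulations (rows) and samples (columns), equation (4)."""
    flat = [value for row in mse_values for value in row]
    return sum(flat) / len(flat)


# Toy example: two sample points and one validation point halfway between them.
sample = [((0.0, 0.0), 0.0), ((2.0, 0.0), 2.0)]
prediction = idw_predict(sample, (1.0, 0.0))
error = mse(sample, [((1.0, 0.0), 3.0)])
overall = mmse([[1.0, 3.0], [2.0, 2.0]])
```

In the toy example the validation point is equidistant from both sample points, so the prediction is their plain average.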
36. Conclusions
Sampling design for zero-inflated spatial count data is evaluated;
A strong monotonic increase of the MMSE with grid spacing is observed;
MSEji varies strongly between simulations and samples, especially for
large grid spacings;
So numerous simulations and samples are needed for estimating
MMSE;
Spatial modelling of zero-inflated spatial data is laborious and
computer-intensive.
Is there an easier way: INLA?
37. Thanks!
Acknowledgements:
This work was done in the framework of the WaLTER (Wadden Sea Long-Term
Ecosystem Research) project (WP5)
www.walterproject.nl
38. References I
Bijleveld, A. I., van Gils, J. A., van der Meer, J., Dekinga, A., Kraan, C., van der Veer, H. W., and Piersma, T. (2012). Designing a benthic monitoring programme with multiple conflicting objectives. Methods in Ecology and Evolution, 3(3):526–536.
Brus, D. and de Gruijter, J. (2013). Effects of spatial pattern persistence on the performance of sampling designs for regional trend monitoring analyzed by simulation of space-time fields. Computers & Geosciences, 61:175–183.
Christensen, O. F. (2004). Monte Carlo maximum likelihood in model-based geostatistics. Journal of Computational and Graphical Statistics, 13(3):702–718.
Diggle, P. J., Tawn, J. A., and Moyeed, R. A. (1998). Model-based geostatistics. Journal of the Royal Statistical Society: Series C (Applied Statistics), 47(3):299–350.
Lambert, D. (1992). Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics, 34(1):1–14.