stxplore: Tools for spatio-temporal data exploration
Sevvandi Kandanaarachchi, Petra Kuhnert, Andrew Zammit-Mangion,
Christopher Wikle
MODSIM 2023
The story behind stxplore
Spatio-Temporal Statistics with R, by Christopher K. Wikle, Andrew Zammit-Mangion and Noel Cressie
Short course: An Introduction to Statistics for Spatio-Temporal Data (2017), Petra Kuhnert and Chris Wikle
R package stxplore (2022/2023)
Spatio-temporal data is everywhere
• Climate
• Disease
• Energy
• Satellite data
[Figures: Vegetation Index (NASA); Land surface temperature anomaly (NASA)]
Why is spatio-temporal exploration important?
• Quantities can change differently in latitude and in longitude
• Temporal changes differ from spatial changes
• Temporal and spatial scales differ
• Granularities differ
• Data is complex and diverse
• Generally, no two spatio-temporal datasets are the same
Before modelling we need to understand the data better
R packages to explore spatio-temporal data
• cubble (Sherry Zhang et al.)
  • A spatio-temporal data object, mainly for data wrangling
• Many plotting packages to plot geospatial data
  • ggmap
  • leaflet
  • tmap
  • …
stxplore: what’s different from other packages?
Features of stxplore
• We provide simple functions for initial analysis, ahead of spatio-temporal modelling
• Simple explorations
• Taking averages
• Different types of plots for spatio-temporal data
• Investigating covariances
• Variograms
• PCA-like computations for spatio-temporal data
stxplore can use
• Either dataframes
• Or stars objects (stacked raster data) from the R package stars
Example: explore aerosol data
• Aerosol data from December 2019 to December 2020
• Monthly data – from NASA
• Source: NASA Earth Observations (NEO), https://neo.gsfc.nasa.gov/
• The period covers the 2019/2020 bushfires
Initial explorations
Spatial Snapshots
[Figure: one panel per month from 2019-12-01 to 2020-12-01; axes x (120 to 180) and y (−60 to 0); colour scale z (50 to 250).]
Temporal variation
[Figure: Temporal Empirical Means. Value (0 to 250) against Time (Jan 2020 to Oct 2020); series: Observed and Average.]
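As a rough illustration of the averaging behind a temporal empirical means plot, the sketch below groups long-format records (one row per location and time, as in a dataframe) by time and averages over locations. It is plain Python, not stxplore code; the function name `temporal_means` and the toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def temporal_means(records):
    """Average the observed value over all locations for each time stamp.

    `records` is an iterable of (time, lon, lat, value) tuples, i.e. one
    row per location and time, like a long-format dataframe.
    """
    by_time = defaultdict(list)
    for time, _lon, _lat, value in records:
        by_time[time].append(value)
    # Sort by time so the result can be drawn as a time series.
    return {time: mean(values) for time, values in sorted(by_time.items())}


# Toy data (made up): two locations observed in two months.
records = [
    ("2019-12-01", 150.0, -35.0, 200.0),
    ("2019-12-01", 155.0, -30.0, 100.0),
    ("2020-01-01", 150.0, -35.0, 60.0),
    ("2020-01-01", 155.0, -30.0, 40.0),
]
means = temporal_means(records)
```

Plotting the individual values per time stamp alongside `means` gives the "Observed" and "Average" layers of such a plot.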
Temporal variation – Ridgeline plots
[Figure: ridgeline plot of Value (0 to 200) by Group (months from 2019-12-01 to 2020-11-01); colour scale z (0 to 250).]
• Uses ggridges under the hood
Hovmöller plots
[Figure: two panels. Longitude panel: Longitude (120 to 180) against Day (Jan 2020 to Jan 2021), colour scale z (20 to 50). Latitude panel: Latitude (−60 to 0) against Day (Jan 2020 to Jan 2021), colour scale z (20 to 60).]
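A Hovmöller plot shows time against one spatial coordinate, with the values averaged over the other coordinate. The sketch below is a minimal version of that averaging step in plain Python, assuming the coordinates already lie on a regular grid (real data usually needs binning first). The function name `hovmoller` and the toy values are hypothetical and do not reflect stxplore's interface.

```python
from collections import defaultdict
from statistics import mean


def hovmoller(records, keep="lat"):
    """Average values over one spatial dimension, keeping the other.

    `records` holds (time, lon, lat, value) tuples. With keep="lat" the
    values are averaged over longitude, giving one mean per (time, lat)
    pair; with keep="lon" they are averaged over latitude.
    """
    if keep not in ("lat", "lon"):
        raise ValueError("keep must be 'lat' or 'lon'")
    cells = defaultdict(list)
    for time, lon, lat, value in records:
        coord = lat if keep == "lat" else lon
        cells[(time, coord)].append(value)
    return {key: mean(values) for key, values in sorted(cells.items())}


# Toy data (made up) on a 2 x 2 grid, partly observed in the second month.
records = [
    ("2020-01-01", 150.0, -30.0, 10.0),
    ("2020-01-01", 160.0, -30.0, 30.0),
    ("2020-01-01", 150.0, -40.0, 50.0),
    ("2020-01-01", 160.0, -40.0, 70.0),
    ("2020-02-01", 150.0, -30.0, 20.0),
    ("2020-02-01", 160.0, -30.0, 40.0),
]
lat_hov = hovmoller(records, keep="lat")  # time against latitude
lon_hov = hovmoller(records, keep="lon")  # time against longitude
```

Each key in the result is one cell of the plot, and the averaged value sets its colour.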
Zooming in to the high aerosol region
Spatial Snapshots
[Figure: one panel per month from 2019-12-01 to 2020-12-01, zoomed in; axes x (150 to 170) and y (−40 to −20); colour scale z (50 to 250).]
PCA-like explorations
Empirical Orthogonal Functions
• Similar to PCA, applied to spatio-temporal data (a 3D data cube)
• Used for dimension reduction
• Input: a stack of rasters, components = 2
• Output:
• 1st spatial snapshot + 1st time series
• 2nd spatial snapshot + 2nd time series
• Multiplying each spatial snapshot by its time series, and adding the results, gives an approximation to the original data cube
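The idea in the bullets above can be sketched for a single component. The code below unrolls each raster into a flat list of cells, removes the mean at each cell, and finds the leading spatial pattern and its time series. It is an illustration under assumptions, not stxplore's implementation: it uses power iteration in plain Python where EOFs are normally computed with a singular value decomposition, and the names `leading_eof` and `reconstruct` are hypothetical.

```python
import math


def leading_eof(cube, iterations=200):
    """Return (eof, pc, mean_field) for the leading EOF of a raster stack.

    `cube` is a list of time slices, each a flat list of values over the
    same spatial cells (each raster unrolled into one row).
    """
    n_time, n_space = len(cube), len(cube[0])
    # Remove the temporal mean at each spatial cell.
    mean_field = [sum(row[j] for row in cube) / n_time for j in range(n_space)]
    anomalies = [[row[j] - mean_field[j] for j in range(n_space)] for row in cube]

    # Power iteration on X^T X: v <- X^T (X v), then rescale to unit length.
    eof = [1.0 / math.sqrt(n_space)] * n_space
    for _ in range(iterations):
        pc = [sum(a * v for a, v in zip(row, eof)) for row in anomalies]
        new = [sum(anomalies[t][j] * pc[t] for t in range(n_time))
               for j in range(n_space)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:  # no variation left to explain
            break
        eof = [x / norm for x in new]
    # Time series: projection of each time slice onto the spatial pattern.
    pc = [sum(a * v for a, v in zip(row, eof)) for row in anomalies]
    return eof, pc, mean_field


def reconstruct(eof, pc, mean_field):
    """Spatial pattern times time series, plus the mean that was removed."""
    return [[m + p * e for m, e in zip(mean_field, eof)] for p in pc]


# Toy cube (made up): 3 time steps, 3 cells, exactly one pattern plus a mean.
cube = [
    [7.0, 14.0, 24.0],
    [10.0, 20.0, 30.0],
    [13.0, 26.0, 36.0],
]
eof, pc, mean_field = leading_eof(cube)
approx = reconstruct(eof, pc, mean_field)
```

Because the toy cube contains a single pattern, `approx` recovers it up to rounding. Real data would need further components, and the sign of each pattern and its time series is arbitrary.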
[Figure: EOF_1 map, Longitude° (150 to 170) by Latitude° (−40 to −20), colour scale EOF1 (−0.16 to −0.04); PC Time series for EOF_1, Normalized Principal Component (−2 to 0) against Time (t).]
[Figure: EOF_2 map, Longitude° (150 to 170) by Latitude° (−40 to −20), colour scale EOF2 (−0.2 to 0.2); PC Time series for EOF_2, Normalized Principal Component (−2 to 2) against Time (t).]
The code
About stxplore
• Vignette: https://sevvandi.github.io/stxplore/index.html
• R package is on CRAN
• Thoughts and collaborations welcome!
We are hiring! – FairML Research
• CSIRO Postdoctoral Fellowship in Fairness Research in Machine Learning
• Salary Range: AU$92,624 to AU$101,459 pa
• plus up to 15.4% superannuation
• 3-year contract
• https://jobs.csiro.au/go/CERC-Postdoctoral-and-Engineering-Fellowships/7829300/
• Job will be advertised in August
Thank you!
