Quang-Huy Tran
Network Science Lab
Dept. of Artificial Intelligence
The Catholic University of Korea
E-mail: huytran1126@gmail.com
2024-06-28
Scalable Spatiotemporal Graph Neural
Networks
Andrea Cini et al.
AAAI-23: Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence
2
OUTLINE
• MOTIVATION
• INTRODUCTION
• METHODOLOGY
• EXPERIMENT & RESULT
• CONCLUSION
3
MOTIVATION
• Graph neural networks (GNNs) are gaining traction in many application fields.
o The need for architectures that scale to large graphs is becoming a pressing issue, and it is especially
challenging when dealing with discrete-time dynamical graphs.
Overview and Limitations of Previous Works
• Previous works focus on subsampling graphs to reduce computational requirements:
o Sampling node-level observations as if they were i.i.d. can break spatial dependencies in static
graphs and, in the dynamic case, temporal dependencies as well.
o Precomputing aggregated features over the graph allows spatial propagation to be factored out:
▪ The preprocessing step must account for temporal dependencies in addition to the graph topology.
▪ Hence this is not trivial in the large-scale spatiotemporal case.
4
INTRODUCTION
• Propose Scalable Graph Predictor (SGP)
framework for spatiotemporal time series:
o Exploits a novel encoding method based on
randomized recurrent components and scalable
GNN architectures.
o The spatiotemporal encoding is training-free: it
exploits a deep randomized recurrent neural network.
▪ Encodes the history of each sequence in a high-
dimensional vector embedding.
Contribution
▪ Uses powers of the graph adjacency matrix to build informative node representations of the spatiotemporal
dynamics at different scales.
o For the downstream task, a decoder maps the node representations into the desired output.
▪ Acts as a collection of filters localized at different spatiotemporal scales.
5
METHODOLOGY
Problem Definition
• Given the graph at time t as 𝐺𝑡 = ⟨𝑋𝑡, 𝑈𝑡, 𝑉, 𝐴𝑡⟩:
o 𝑋𝑡 ∈ ℝ^(𝑁×𝑑𝑥) is the node attribute matrix collecting the 𝑑𝑥-dimensional multivariate observations of 𝑁 interlinked sensors.
o 𝑈𝑡 ∈ ℝ^(𝑁×𝑑𝑢) is the matrix of exogenous variables (e.g., weather information related to a monitored area).
o 𝑉 ∈ ℝ^(𝑁×𝑑𝑣) collects additional, optional, static node attributes.
o 𝐴𝑡 ∈ ℝ^(𝑁×𝑁) is the adjacency matrix.
• Problem: predict the next 𝐻 observations given a window of 𝑊 past measurements.
• Echo-State Network (ESN): a recurrent neural network with random, fixed connections (see the sketch below).
o Idea: feed the input signal into a high-dimensional, randomized, non-linear reservoir, whose internal state can
be used as an embedding of the input dynamics.
o The reservoir extracts a rich pool of dynamics characterizing the system underlying the input time series.
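A minimal sketch of the echo-state idea, not the authors' implementation: a single leaky-integrator reservoir with random, fixed weights; all names and sizes below are illustrative assumptions.

```python
import torch

def run_reservoir(x, hidden_size=128, spectral_radius=0.9, leak=1.0, seed=0):
    """x: (T, d_in) input sequence -> (T, hidden_size) reservoir states."""
    torch.manual_seed(seed)
    d_in = x.size(-1)
    # Random, fixed (never trained) input and recurrent weights.
    W_in = torch.empty(hidden_size, d_in).uniform_(-1, 1)
    W = torch.empty(hidden_size, hidden_size).uniform_(-1, 1)
    # Rescale the recurrent matrix so its spectral radius is < 1 (echo-state property).
    W *= spectral_radius / torch.linalg.eigvals(W).abs().max()
    h = torch.zeros(hidden_size)
    states = []
    for x_t in x:                           # leaky-integrator state update
        h_new = torch.tanh(W_in @ x_t + W @ h)
        h = (1 - leak) * h + leak * h_new
        states.append(h)
    return torch.stack(states)              # embedding of the input dynamics
```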
6
METHODOLOGY
Main Architecture
• Hybrid encoder-decoder architecture
o Encoder: constructs representations of the time series observed at each node using a reservoir that accounts for
dynamics at different time scales.
▪ Representations are further processed to capture spatial dynamics through the graph structure.
▪ The final embedding is built by concatenating the representations obtained at each propagation step.
o Decoder (or readout): learns a separate weight matrix for each spatiotemporal scale, followed by an MLP.
7
METHODOLOGY
Spatiotemporal Encoder
• Deep Echo State Network: each node is encoded by a stack of L randomized
recurrent layers (see the sketch below).
o A hierarchical stack of reservoir layers extracts a rich pool of multi-scale temporal dynamics by changing
the discount factor γ_l at each layer.
o Obtain node-level temporal encodings ℎ𝑡^𝑖 for each node i and time step t.
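A rough sketch of the hierarchical reservoir, reusing run_reservoir from the sketch above; the per-layer discount schedule and sizes are illustrative assumptions, not the paper's settings.

```python
import torch

def deep_reservoir_encode(x, num_layers=3, hidden_size=64):
    """x: (T, d_in) one node's series -> (T, num_layers * hidden_size) encodings h_t^i."""
    layer_states, layer_input = [], x
    for l in range(num_layers):
        gamma_l = 1.0 / (2 ** l)             # assumed schedule: deeper layers integrate more slowly
        states = run_reservoir(layer_input, hidden_size=hidden_size,
                               leak=gamma_l, seed=l)
        layer_states.append(states)
        layer_input = states                 # feed states to the next reservoir layer
    # Multi-scale temporal encoding: layer-wise states concatenated along the feature axis.
    return torch.cat(layer_states, dim=-1)
```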
8
METHODOLOGY
Spatiotemporal Encoder
• Indicate by 𝐻𝑡 the encoding of the whole graph at time t.
• Use powers of a graph shift operator Ã to propagate and aggregate node representations
at different scales (see the sketch below).
o The features corresponding to each order k are computed recursively with K sparse matrix-matrix multiplications,
where Ã denotes a generic graph shift operator matching the sparsity
pattern of the graph adjacency matrix:
Ã = D⁻¹A for directed graphs, Ã = D^(−1/2) A D^(−1/2) for undirected graphs.
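A minimal sketch of the training-free spatial propagation: repeated application of the shift operator Ã to the node encodings, written with dense matrices for clarity (the paper relies on sparse matrix-matrix products); names are illustrative assumptions.

```python
import torch

def propagate(H_t, A, K=3, undirected=False):
    """H_t: (N, d) node encodings, A: (N, N) adjacency -> (N, (K+1)*d)."""
    deg = A.sum(dim=1).clamp(min=1.0)
    if undirected:                              # A_tilde = D^-1/2 A D^-1/2
        d = deg.pow(-0.5)
        A_tilde = d[:, None] * A * d[None, :]
    else:                                       # A_tilde = D^-1 A
        A_tilde = A / deg[:, None]
    reps, cur = [H_t], H_t
    for _ in range(K):
        cur = A_tilde @ cur                     # one further hop of propagation
        reps.append(cur)
    return torch.cat(reps, dim=-1)              # hop-0 .. hop-K representations side by side
```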
9
METHODOLOGY
Multi-Scale Decoder
• The input is 𝑆𝑡, the concatenation of the temporal representations and the propagated spatial representations.
• The first layer is designed with a sparse connectivity pattern to learn scale-specific representations (see the sketch below).
o The representations 𝑍𝑡 can be computed efficiently by exploiting grouped 1-d convolutions.
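A sketch of how such a scale-wise first layer can be realized with a grouped 1×1 convolution in PyTorch; sizes and names are illustrative assumptions, with channels assumed to be ordered scale by scale.

```python
import torch
import torch.nn as nn

num_scales = 4        # e.g. hops 0..K of the spatial propagation
d_scale = 64          # encoding size per scale
d_out_per_scale = 32

# groups=num_scales: each scale's block of channels is mixed only with itself,
# i.e. a separate weight matrix per spatiotemporal scale.
first_layer = nn.Conv1d(in_channels=num_scales * d_scale,
                        out_channels=num_scales * d_out_per_scale,
                        kernel_size=1, groups=num_scales)

S_t = torch.randn(1, num_scales * d_scale, 207)   # (batch, channels, N nodes)
Z_t = first_layer(S_t)                            # (1, num_scales * d_out_per_scale, 207)
```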
10
METHODOLOGY
Multi-Scale Decoder
o The obtained representations are fed into an MLP that predicts the H-step-ahead observations.
• Training and sampling (see the sketch below):
o The representations 𝑆𝑡 embed both the temporal and spatial relationships among observations over the
sensor network.
o Each sample 𝑠𝑡^𝑖 can be processed independently, since no further spatiotemporal information needs to
be collected.
o This allows training the decoder with SGD by uniformly and independently sampling mini-batches of data
points 𝑠𝑡^𝑖.
o This makes training scalable and drastically reduces the lower bound on the computational complexity.
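A sketch of the resulting training loop under the assumption that the encodings are already precomputed: mini-batches of (time, node) samples are drawn uniformly and fed to a small learned readout. Shapes, the readout architecture, and the loss are illustrative assumptions.

```python
import torch
import torch.nn as nn

T, N, d_s, H_horizon = 500, 207, 128, 12
S = torch.randn(T, N, d_s)          # precomputed, training-free encodings s_t^i
Y = torch.randn(T, N, H_horizon)    # H-step-ahead targets aligned with each s_t^i

readout = nn.Sequential(nn.Linear(d_s, 128), nn.ReLU(), nn.Linear(128, H_horizon))
opt = torch.optim.Adam(readout.parameters(), lr=1e-3)

for step in range(100):
    t_idx = torch.randint(0, T, (1024,))        # uniform over time steps
    n_idx = torch.randint(0, N, (1024,))        # uniform over nodes
    pred = readout(S[t_idx, n_idx])             # each sample is processed independently
    loss = nn.functional.l1_loss(pred, Y[t_idx, n_idx])   # MAE-style objective
    opt.zero_grad(); loss.backward(); opt.step()
```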
11
EXPERIMENT AND RESULT
EXPERIMENT SETTINGS
• Datasets:
o Medium scale: METR-LA, PEMS-BAY (traffic datasets).
o Large scale: PV-US, CER-En (energy datasets).
• Baselines:
o Deep learning: LSTM and FC-LSTM.
o Graph methods: DCRNN [1], Graph WaveNet (GWNet) [2], GatedGN (GGN) [3], and DynGESN [4].
[1] Li, Y., Yu, R., Shahabi, C., & Liu, Y. (2017). Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv preprint arXiv:1707.01926.
[2] Wu, Z., Pan, S., Long, G., Jiang, J., & Zhang, C. (2019). Graph wavenet for deep spatial-temporal graph modeling. arXiv preprint arXiv:1906.00121.
[3] Gao, J., & Ribeiro, B. (2022, June). On the equivalence between temporal and static equivariant graph representations. In International Conference on Machine Learning (pp. 7052-7076). PMLR.
[4] Micheli, A., & Tortorella, D. (2022). Discrete-time dynamic graph echo state networks. Neurocomputing, 496, 85-95.
• Metrics:
o MAE, MSE, MAPE.
12
EXPERIMENT AND RESULT
RESULT – Overall Performance in Medium-scale Datasets
13
EXPERIMENT AND RESULT
RESULT – Overall Performance in Large-scale Datasets
14
CONCLUSION
• Proposed SGP, a scalable architecture for graph-based spatiotemporal time series
forecasting.
o Performs well on medium-sized benchmarks while improving scalability on large sensor networks.
o While sampling in SGP largely reduces GPU memory usage, the entire preprocessed sequence can take
up a large portion of system memory, depending on the size of the reservoir.
o The preprocessed data can be stored on disk and loaded in batches during training, as is customary for large
datasets; the preprocessing itself can also be distributed.
• Future work:
o Explore a tighter integration of the spatial and temporal encoding components.
o Assess performance on even larger benchmarks.