Quang-Huy Tran
Network Science Lab
Dept. of Artificial Intelligence
The Catholic University of Korea
E-mail: huytran1126@gmail.com
2024-06-28
Scalable Spatiotemporal Graph Neural
Networks
Andrea Cini et al.
AAAI-23: Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence
2
OUTLINE
• MOTIVATION
• INTRODUCTION
• METHODOLOGY
• EXPERIMENT & RESULT
• CONCLUSION
3
MOTIVATION
• Graph neural networks (GNNs) are gaining traction in many application fields.
o The need for architectures that scale to large graphs is becoming a pressing issue, and it is especially
challenging when dealing with discrete-time dynamical graphs.
Overview and Limitations of Previous Works
• Previous works focus on subsampling graphs to reduce computational requirements:
o Sampling node-level observations as if they were i.i.d. can break spatial dependencies in static
graphs and, in the dynamic case, temporal dependencies as well.
o Precomputing aggregated features over the graph allows spatial propagation to be factored out:
▪ The preprocessing step must account for temporal dependencies in addition to the graph topology.
▪ Hence this is not trivial in the large-scale spatiotemporal case.
4
INTRODUCTION
• Propose Scalable Graph Predictor (SGP)
framework for spatiotemporal time series:
o Exploits a novel encoding method based on
randomized recurrent components and scalable
GNN architectures.
o The spatiotemporal encoding is training-free: it
exploits a deep randomized recurrent neural network.
▪ Encodes the history of each sequence in a high-
dimensional vector embedding.
Contribution
▪ Uses powers of the graph adjacency matrix to build informative node representations of the spatiotemporal
dynamics at different scales.
o For the downstream task, a decoder maps the node representations into the desired output.
▪ Acts as a collection of filters localized at different spatiotemporal scales.
5
METHODOLOGY
Problem Definition
• Given the graph at time t as 𝐺𝑡 = ⟨𝑋𝑡, 𝑈𝑡, 𝑉, 𝐴𝑡⟩:
o 𝑋𝑡 ∈ ℝ^(𝑁×𝑑𝑥) is the node attribute matrix collecting the 𝑑𝑥-dimensional multivariate observations of 𝑁 interlinked sensors.
o 𝑈𝑡 ∈ ℝ^(𝑁×𝑑𝑢) is the matrix of exogenous variables (e.g., weather information related to a monitored area).
o 𝑉 ∈ ℝ^(𝑁×𝑑𝑣) collects additional, optional, static node attributes.
o 𝐴𝑡 ∈ ℝ^(𝑁×𝑁) is the adjacency matrix.
• Problem: predict the next 𝐻 observations given a window of 𝑊 past measurements.
• Echo-State Network (ESN): a recurrent neural network with random, fixed connections (see the sketch below).
o Idea: feed the input signal into a high-dimensional, randomized, non-linear reservoir, whose internal state can
be used as an embedding of the input dynamics.
o The reservoir extracts a rich pool of dynamics characterizing the system underlying the input time series.
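A minimal sketch of the echo-state idea, not the authors' implementation: a single leaky-integrator reservoir with random, fixed weights; all names and sizes below are illustrative assumptions.

```python
import torch

def run_reservoir(x, hidden_size=128, spectral_radius=0.9, leak=1.0, seed=0):
    """x: (T, d_in) input sequence -> (T, hidden_size) reservoir states."""
    torch.manual_seed(seed)
    d_in = x.size(-1)
    # Random, fixed (never trained) input and recurrent weights.
    W_in = torch.empty(hidden_size, d_in).uniform_(-1, 1)
    W = torch.empty(hidden_size, hidden_size).uniform_(-1, 1)
    # Rescale the recurrent matrix so its spectral radius is < 1 (echo-state property).
    W *= spectral_radius / torch.linalg.eigvals(W).abs().max()
    h = torch.zeros(hidden_size)
    states = []
    for x_t in x:                           # leaky-integrator state update
        h_new = torch.tanh(W_in @ x_t + W @ h)
        h = (1 - leak) * h + leak * h_new
        states.append(h)
    return torch.stack(states)              # embedding of the input dynamics
```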
6
METHODOLOGY
Main Architecture
• Hybrid encoder-decoder architecture
o Encoder: constructs representations of the time series observed at each node using a reservoir that accounts for
dynamics at different time scales.
▪ Representations are further processed to capture spatial dynamics through the graph structure.
▪ The final embedding is built by concatenating the representations obtained at each propagation step.
o Decoder (or readout): learns a separate weight matrix for each spatiotemporal scale, followed by an MLP.
7
METHODOLOGY
Spatiotemporal Encoder
• Deep Echo State Network: each node is encoded by a stack of L randomized
recurrent layers (see the sketch below).
o A hierarchical stack of reservoir layers extracts a rich pool of multi-scale temporal dynamics by changing
the discount factor γ_l at each layer.
o Obtain node-level temporal encodings ℎ𝑡^𝑖 for each node i and time step t.
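A rough sketch of the hierarchical reservoir, reusing run_reservoir from the sketch above; the per-layer discount schedule and sizes are illustrative assumptions, not the paper's settings.

```python
import torch

def deep_reservoir_encode(x, num_layers=3, hidden_size=64):
    """x: (T, d_in) one node's series -> (T, num_layers * hidden_size) encodings h_t^i."""
    layer_states, layer_input = [], x
    for l in range(num_layers):
        gamma_l = 1.0 / (2 ** l)             # assumed schedule: deeper layers integrate more slowly
        states = run_reservoir(layer_input, hidden_size=hidden_size,
                               leak=gamma_l, seed=l)
        layer_states.append(states)
        layer_input = states                 # feed states to the next reservoir layer
    # Multi-scale temporal encoding: layer-wise states concatenated along the feature axis.
    return torch.cat(layer_states, dim=-1)
```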
8
METHODOLOGY
Spatiotemporal Encoder
• Indicate by 𝐻𝑡 the encoding of the whole graph at time t.
• Use powers of a graph shift operator Ã to propagate and aggregate node representations
at different scales (see the sketch below).
o The features corresponding to each order k are computed recursively with K sparse matrix-matrix multiplications,
where Ã denotes a generic graph shift operator matching the sparsity
pattern of the graph adjacency matrix:
Ã = D⁻¹A for directed graphs, Ã = D^(−1/2) A D^(−1/2) for undirected graphs.
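A minimal sketch of the training-free spatial propagation: repeated application of the shift operator Ã to the node encodings, written with dense matrices for clarity (the paper relies on sparse matrix-matrix products); names are illustrative assumptions.

```python
import torch

def propagate(H_t, A, K=3, undirected=False):
    """H_t: (N, d) node encodings, A: (N, N) adjacency -> (N, (K+1)*d)."""
    deg = A.sum(dim=1).clamp(min=1.0)
    if undirected:                              # A_tilde = D^-1/2 A D^-1/2
        d = deg.pow(-0.5)
        A_tilde = d[:, None] * A * d[None, :]
    else:                                       # A_tilde = D^-1 A
        A_tilde = A / deg[:, None]
    reps, cur = [H_t], H_t
    for _ in range(K):
        cur = A_tilde @ cur                     # one further hop of propagation
        reps.append(cur)
    return torch.cat(reps, dim=-1)              # hop-0 .. hop-K representations side by side
```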
9
METHODOLOGY
Multi-Scale Decoder
• The input is 𝑆𝑡, the concatenation of the temporal representations and the propagated spatial representations.
• The first layer is designed with a sparse connectivity pattern to learn scale-specific representations (see the sketch below).
o The representations 𝑍𝑡 can be computed efficiently by exploiting grouped 1-d convolutions.
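A sketch of how such a scale-wise first layer can be realized with a grouped 1×1 convolution in PyTorch; sizes and names are illustrative assumptions, with channels assumed to be ordered scale by scale.

```python
import torch
import torch.nn as nn

num_scales = 4        # e.g. hops 0..K of the spatial propagation
d_scale = 64          # encoding size per scale
d_out_per_scale = 32

# groups=num_scales: each scale's block of channels is mixed only with itself,
# i.e. a separate weight matrix per spatiotemporal scale.
first_layer = nn.Conv1d(in_channels=num_scales * d_scale,
                        out_channels=num_scales * d_out_per_scale,
                        kernel_size=1, groups=num_scales)

S_t = torch.randn(1, num_scales * d_scale, 207)   # (batch, channels, N nodes)
Z_t = first_layer(S_t)                            # (1, num_scales * d_out_per_scale, 207)
```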
10
METHODOLOGY
Multi-Scale Decoder
o The obtained representations are fed into an MLP that predicts the H-step-ahead observations.
• Training and sampling (see the sketch below):
o The representations 𝑆𝑡 embed both the temporal and spatial relationships among observations over the
sensor network.
o Each sample 𝑠𝑡^𝑖 can be processed independently, since no further spatiotemporal information needs to
be collected.
o This allows training the decoder with SGD by uniformly and independently sampling mini-batches of data
points 𝑠𝑡^𝑖.
o This makes training scalable and drastically reduces the lower bound on the computational complexity.
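A sketch of the resulting training loop under the assumption that the encodings are already precomputed: mini-batches of (time, node) samples are drawn uniformly and fed to a small learned readout. Shapes, the readout architecture, and the loss are illustrative assumptions.

```python
import torch
import torch.nn as nn

T, N, d_s, H_horizon = 500, 207, 128, 12
S = torch.randn(T, N, d_s)          # precomputed, training-free encodings s_t^i
Y = torch.randn(T, N, H_horizon)    # H-step-ahead targets aligned with each s_t^i

readout = nn.Sequential(nn.Linear(d_s, 128), nn.ReLU(), nn.Linear(128, H_horizon))
opt = torch.optim.Adam(readout.parameters(), lr=1e-3)

for step in range(100):
    t_idx = torch.randint(0, T, (1024,))        # uniform over time steps
    n_idx = torch.randint(0, N, (1024,))        # uniform over nodes
    pred = readout(S[t_idx, n_idx])             # each sample is processed independently
    loss = nn.functional.l1_loss(pred, Y[t_idx, n_idx])   # MAE-style objective
    opt.zero_grad(); loss.backward(); opt.step()
```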
11
EXPERIMENT AND RESULT
EXPERIMENT SETTINGS
• Datasets:
o Medium scale: METR-LA, PEMS-BAY (traffic datasets).
o Large scale: PV-US, CER-En (energy datasets).
• Baselines:
o Deep learning: LSTM and FC-LSTM.
o Graph methods: DCRNN [1], Graph WaveNet (GWNet) [2], GatedGN (GGN) [3], and DynGESN [4].
[1] Li, Y., Yu, R., Shahabi, C., & Liu, Y. (2017). Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv preprint arXiv:1707.01926.
[2] Wu, Z., Pan, S., Long, G., Jiang, J., & Zhang, C. (2019). Graph wavenet for deep spatial-temporal graph modeling. arXiv preprint arXiv:1906.00121.
[3] Gao, J., & Ribeiro, B. (2022, June). On the equivalence between temporal and static equivariant graph representations. In International Conference on Machine Learning (pp. 7052-7076). PMLR.
[4] Micheli, A., & Tortorella, D. (2022). Discrete-time dynamic graph echo state networks. Neurocomputing, 496, 85-95.
• Metrics:
o MAE, MSE, MAPE.
12
EXPERIMENT AND RESULT
RESULT – Overall Performance in Medium-scale Datasets
13
EXPERIMENT AND RESULT
RESULT – Overall Performance in Large-scale Datasets
14
CONCLUSION
• Proposed SGP, a scalable architecture for graph-based spatiotemporal time series
forecasting.
o Performs well on medium-sized benchmarks while improving scalability on large sensor networks.
o While sampling in SGP largely reduces GPU memory usage, the entire preprocessed sequence can take
up a large portion of system memory, depending on the size of the reservoir.
o The preprocessed data can be stored on disk and loaded in batches during training, as is customary for large
datasets; the preprocessing itself can also be distributed.
• Future work:
o Explore a tighter integration of the spatial and temporal encoding components.
o Assess performance on even larger benchmarks.