[20240422_LabSeminar_Huy]Taming_Effect.pptx
Quang-Huy Tran
Network Science Lab
Dept. of Artificial Intelligence
The Catholic University of Korea
E-mail: huytran1126@gmail.com
2024-04-22
Taming Local Effects in Graph-based
Spatiotemporal Forecasting
Andrea Cini et al.
NeurIPS 2023: Thirty-seventh Conference on Neural Information Processing Systems
MOTIVATION
• Graph-based methods are effective in forecasting collections of time series.
• Deep learning methods combine sequence-processing operators with message
passing.
o A single (global) inductive model is trained to predict any time series associated with a node.
o It is common practice to add node-specific (local) parameters that help the model identify each node:
→ Improved modeling of local effects and higher accuracy.
→ But inductive capabilities are compromised.
(Overview figure)
INTRODUCTION
• Understanding the interplay between globality and locality in graph-based models.
• A methodological framework for designing node-specific components effectively.
• Methods to ease transferability of the global components to new nodes/graphs.
PROBLEM FORMULATION
• We consider a set of 𝑁 correlated time series, where the 𝑖-th time series is associated with:
o an observation vector 𝑥_𝑡^𝑖 ∈ ℝ^{𝑑_𝑥} at each time step 𝑡;
o a vector of exogenous variables 𝑢_𝑡^𝑖 ∈ ℝ^{𝑑_𝑢} at each time step 𝑡.
Collections of time series
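The setup above can be sketched in code. This is a minimal illustration of the data layout only; all names and dimensions are toy choices, not taken from the paper's codebase.

```python
# Minimal sketch of the data layout: N correlated time series observed for T steps.
# Shapes follow the slide's notation: x_t^i in R^{d_x}, u_t^i in R^{d_u}.
import random

N, T, d_x, d_u = 4, 10, 2, 3  # nodes, time steps, observation dim, exogenous dim

# X[t][i] is the observation vector of series i at time t; U[t][i] its exogenous vector.
X = [[[random.random() for _ in range(d_x)] for _ in range(N)] for _ in range(T)]
U = [[[random.random() for _ in range(d_u)] for _ in range(N)] for _ in range(T)]

assert len(X) == T and len(X[0]) == N and len(X[0][0]) == d_x
assert len(U) == T and len(U[0]) == N and len(U[0][0]) == d_u
```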
PROBLEM FORMULATION
Relational Information
• Assume the existence of functional dependencies between the time series:
o i.e., forecasts for one time series can be improved by accounting for the past values of other time series.
• Model the pairwise relationships existing at time step 𝑡 with an adjacency matrix 𝐴 ∈ ℝ^{𝑁×𝑁}.
o 𝐴 can be asymmetric (directed graph).
• The dimension spanning the time series collection is called the spatial dimension.
PROBLEM FORMULATION
Time Series Forecasting
• Multi-step time series forecasting:
o Given a window of 𝑊 past observations 𝑋_{𝑡−𝑊:𝑡}, the goal is to predict the next 𝐻 observations 𝑋_{𝑡:𝑡+𝐻}:
𝑋̂_{𝑡:𝑡+𝐻} = 𝑓(𝑋_{𝑡−𝑊:𝑡})
• The deep learning approach to forecasting consists of training:
o a global neural network (NN), and/or
o local node-specific NNs.
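The windowed forecasting map above can be illustrated with a toy stand-in for 𝑓. Here the "model" is a naive persistence baseline (repeat the last observation for 𝐻 steps), assumed purely for illustration; a trained neural network would replace it.

```python
# Hedged sketch of multi-step forecasting: given a window X_{t-W:t}, predict X_{t:t+H}.
W, H = 4, 2

def forecast(window):
    """Map a length-W window to an H-step-ahead forecast (persistence baseline)."""
    assert len(window) == W
    return [window[-1] for _ in range(H)]  # repeat the last value H times

series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
t = 4
prediction = forecast(series[t - W:t])  # window = [1.0, 2.0, 3.0, 4.0]
# prediction == [4.0, 4.0]
```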
METHODOLOGY
Global and Local Forecasting
• Global forecasting model: its parameters are fitted to a whole group of time series.
o All learnable parameters are shared.
o More data available for training.
o Can be used in an inductive setting.
• Local forecasting model: specific to a single time series.
o Better captures series-specific dynamics.
o Often requires shorter input windows,
o or reduced model capacity.
METHODOLOGY
Relational inductive biases
• Both approaches share a drawback: dependencies across the time series are often discarded.
Message-passing neural networks for spatiotemporal GNNs: Spatiotemporal Message Passing (STMP)
• Embed relational information as an architectural bias into the processing.
• Graph Neural Networks (GNNs) provide the appropriate neural operators.
METHODOLOGY
Spatiotemporal message passing - Globality
• The cornerstone operator in STGNNs is the STMP layer, which updates each node state as
ℎ_𝑡^{𝑖,𝑙+1} = 𝜌^𝑙( ℎ_𝑡^{𝑖,𝑙}, AGGR_{𝑗∈𝒩(𝑖)} { 𝛾^𝑙( ℎ_𝑡^{𝑖,𝑙}, ℎ_𝑡^{𝑗,𝑙} ) } )
where 𝜌^𝑙 is the update function and 𝛾^𝑙 the message function of layer 𝑙.
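One message-passing step of this kind can be sketched in pure Python. The concrete choices of 𝛾 (elementwise difference) and 𝜌 (move toward the aggregated message) are toy assumptions standing in for the learned functions; only the generic message/aggregate/update structure mirrors the slide.

```python
# Illustrative sketch of one spatiotemporal message-passing (STMP) step.
def gamma(h_i, h_j):
    """Message function: elementwise difference between neighbor and node state."""
    return [hj - hi for hi, hj in zip(h_i, h_j)]

def rho(h_i, msg):
    """Update function: move the node state toward the aggregated message."""
    return [hi + 0.5 * m for hi, m in zip(h_i, msg)]

def stmp_layer(H, A):
    """One STMP layer. H: list of node states, A: adjacency matrix (A[i][j] != 0 means edge j -> i)."""
    H_next = []
    for i, h_i in enumerate(H):
        msg = [0.0] * len(h_i)  # aggregate messages from neighbors by summation
        for j, h_j in enumerate(H):
            if A[i][j]:
                msg = [a + b for a, b in zip(msg, gamma(h_i, h_j))]
        H_next.append(rho(h_i, msg))
    return H_next

H0 = [[0.0], [2.0]]
A = [[0, 1], [1, 0]]    # two mutually connected nodes
H1 = stmp_layer(H0, A)  # states move toward each other: [[1.0], [1.0]]
```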
METHODOLOGY
Globality and Locality in STGNNs
• Limitations of fully global models:
o Struggle to model local effects.
o Require large model capacity or impractically long windows.
• Hybrid global-local STGNNs with specialized local components:
o Capture node-level effects more efficiently than fully global models.
o Usually achieve higher forecasting accuracy empirically.
METHODOLOGY
Limits of global-local STGNNs
• Disadvantages of local components in a global STGNN:
o The model's inductive capabilities are compromised (hard to handle unseen time series).
o The number of learnable parameters can be much larger than in a fully global model.
METHODOLOGY
Learnable node embeddings
• Mitigate the drawbacks by using node embeddings, a table of learnable parameters with one vector per node:
o Fed into the global STGNN and learned end-to-end.
• Amortize the cost of specializing the model to each time series:
o A single vector per node is added to the model's parameters.
o The same vector can be reused in multiple components of the architecture.
• Transfer the learned model to a different set of 𝑁′ time series more easily:
o Only 𝑁′𝑑_𝑣 parameters need to be tuned, while the shared components stay fixed.
o The embedding space can be regularized to better fit the embeddings of new nodes.
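The amortization idea can be sketched as follows: a single shared (global) function takes each node's input together with that node's embedding, so on a new set of series only the embedding table would need tuning. The linear readout and all numbers are assumed toy values, not the paper's architecture.

```python
# Sketch of amortized node specialization with learnable embeddings.
d_v = 2  # embedding size

shared_weights = [0.5, -0.25]  # frozen global parameters (toy linear readout)

def global_model(x, v):
    """Shared model: the prediction depends on the input x and the node embedding v."""
    return x + sum(w * vi for w, vi in zip(shared_weights, v))

# One learnable embedding per node: N * d_v extra parameters in total.
embeddings = {0: [1.0, 0.0], 1: [0.0, 2.0]}

y0 = global_model(3.0, embeddings[0])  # 3.0 + 0.5 * 1.0 = 3.5
y1 = global_model(3.0, embeddings[1])  # 3.0 - 0.25 * 2.0 = 2.5
```

The same shared function produces node-specific outputs purely through the embedding input, which is what makes the global component transferable.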
METHODOLOGY
Structuring the Embedding space
• Two strategies for regularizing the embedding space:
o Variational: a smoother embedding space that enables interpolation.
Model each node embedding as a sample from a multivariate Gaussian, 𝑣^𝑖 ∼ 𝑁(𝜇^𝑖, 𝜎^𝑖), drawn with the reparameterization trick, where (𝜇^𝑖, 𝜎^𝑖) are learnable (local) parameters.
Regularize with 𝛽 𝐷_{𝐾𝐿}( 𝑁(𝜇^𝑖, 𝜎^𝑖) ∥ 𝑃 ), where 𝑃 = 𝑁(0, Ι) is the prior, 𝐷_{𝐾𝐿} the Kullback-Leibler divergence, and 𝛽 controls the regularization strength.
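The variational strategy can be sketched as below: sample each embedding via the reparameterization trick and penalize divergence from the standard-normal prior. The closed-form KL shown is the standard diagonal-Gaussian-to-𝑁(0, I) formula; the numbers are toy values.

```python
# Sketch of the variational embedding strategy: v ~ N(mu, diag(sigma^2)),
# regularized toward the prior N(0, I) with weight beta.
import math
import random

def sample_embedding(mu, sigma):
    """Reparameterized sample: v = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + s * random.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

def kl_to_standard_normal(mu, sigma):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return 0.5 * sum(s * s + m * m - 1.0 - math.log(s * s) for m, s in zip(mu, sigma))

mu, sigma, beta = [0.0, 0.0], [1.0, 1.0], 0.1
v = sample_embedding(mu, sigma)               # stochastic embedding for this node
reg = beta * kl_to_standard_normal(mu, sigma)  # 0.0 when (mu, sigma) matches the prior
```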
METHODOLOGY
Structuring the Embedding space
• Two strategies for regularizing the embedding space:
o Clustering: form clusters in the latent space to improve interpretability.
Add a matrix 𝐶 ∈ ℝ^{𝐾×𝑑_𝑣} of 𝐾 ≪ 𝑁 learnable centroids and a cluster assignment matrix 𝑆 ∈ ℝ^{𝑁×𝐾} containing node-cluster pair scores, where 𝜏 is a (temperature) hyperparameter.
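One plausible way to realize the scores 𝑆 and temperature 𝜏 is a row-wise softmax over 𝑆/𝜏, giving each node a soft assignment over the 𝐾 centroids; a regularizer could then pull each embedding toward its assigned centroid. The exact loss used in the paper may differ; this is an assumed illustration.

```python
# Sketch of soft cluster assignments from node-cluster scores S and temperature tau.
import math

def soft_assignments(scores, tau):
    """Row-wise softmax of S / tau: each row sums to 1 over the K clusters."""
    out = []
    for row in scores:
        exps = [math.exp(s / tau) for s in row]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out

S = [[2.0, 0.0],   # node 0 leans toward cluster 0
     [0.0, 2.0]]   # node 1 leans toward cluster 1
P = soft_assignments(S, tau=1.0)

assert all(abs(sum(row) - 1.0) < 1e-9 for row in P)   # valid distributions
assert P[0][0] > P[0][1] and P[1][1] > P[1][0]        # assignments follow the scores
```

Lowering 𝜏 sharpens the assignments toward hard clustering; raising it smooths them toward uniform.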
EXPERIMENT AND RESULT
EXPERIMENT
• Metric:
o Mean Absolute Error (MAE)
• Datasets:
o GPVAR(-L): synthetic data generated by a graph polynomial VAR process over a network of 20 communities.
o Traffic flow forecasting: METR-LA and PEMS-BAY; for transfer learning, the PEMS03, PEMS04, PEMS07, and PEMS08 datasets are used.
o Electric load forecasting: CER-E, a collection of energy consumption time series.
o Air quality monitoring: AQI, hourly measurements of the pollutant PM2.5 in China.
• Baselines:
o RNN: global univariate RNN sharing the same parameters across the time series.
o FC-RNN: multivariate RNN taking the time series as input as if they were a single multivariate one.
o LocalRNNs: local univariate RNNs with a different set of parameters for each time series.
o DCRNN [1]: recurrent time-then-space (T&S) model with the Diffusion Convolutional operator.
o AGCRN [2]: global-local T&S Adaptive Graph Convolutional Recurrent Network.
o GraphWaveNet [3]: deep T&S spatiotemporal convolutional network.
EXPERIMENT AND RESULT
EXPERIMENT
[1] Li, Y., Yu, R., Shahabi, C., & Liu, Y. (2017). Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv preprint arXiv:1707.01926.
[2] Bai, L., Yao, L., Li, C., Wang, X., & Wang, C. (2020). Adaptive graph convolutional recurrent network for traffic forecasting. Advances in neural information processing systems, 33, 17804-17815.
[3] Wu, Z., Pan, S., Long, G., Jiang, J., & Zhang, C. (2019). Graph wavenet for deep spatial-temporal graph modeling. arXiv preprint arXiv:1906.00121.
EXPERIMENT AND RESULT
RESULTS – Additional Experiments
• Analysis of local components:
o Performance of the reference architecture with and without local components.
CONCLUSION
• Investigate the impact of locality and globality in graph-based spatiotemporal
forecasting architectures.
• Propose a framework to explain empirical results associated with the use of trainable
node embeddings
o discuss different architectures and regularization techniques to account for local effects.
• The proposed methodologies are thoroughly empirically validated:
o effective in a transfer learning context.
• Future work can build on the results presented here and study alternative, even more transferable, methods to account for local effects.
Figure and table captions:
• Fig. 1a: results of the analysis, reporting the median load profile for each cluster; shaded areas correspond to quantiles in 10% increments.
• Compared models: TTS-IMP and TTS-AMP, together with FC-RNN (a multivariate RNN) and LocalRNNs (local univariate RNNs with a different set of parameters for each time series).
• STGNN variants: a global variant (without any local component) and global-local alternatives that insert node-specific components within the architecture.
• Each transfer-learning table shows results for the reference architectures w.r.t. different training set sizes (from 1 day to 2 weeks), considering settings where embeddings are fed to both encoder and decoder, or to the decoder only.