[20240415_LabSeminar_Huy]Deciphering Spatio-Temporal Graph Forecasting: A Causal Lens and Treatment.pptx
1. Quang-Huy Tran
Network Science Lab
Dept. of Artificial Intelligence
The Catholic University of Korea
E-mail: huytran1126@gmail.com
2024-04-15
Deciphering Spatio-Temporal Graph
Forecasting: A Causal Lens and Treatment
Yutong Xia et al.
NeurIPS 2023: Thirty-seventh Conference on Neural Information Processing Systems
3. 3
MOTIVATION
• A Spatio-Temporal Graph (STG) represents the spatial and temporal relationships
between nodes or entities, and is widely used in various fields (e.g., transportation,
environment, and epidemiology).
Spatio-Temporal Graph Forecasting
• STG forecasting has become crucial in the context of smart cities (e.g., informed
decision-making, sustainable environments).
4. 4
MOTIVATION
• STG data is subject to temporal dynamics.
o The data-generating distribution varies over time –
temporal out-of-distribution (OoD).
o $P_A(x) \neq P_B(x) \neq P_{test}(x)$ for different training periods A, B and the test period.
Spatio-Temporal Forecasting Challenges
• Dynamic spatial causation: most previous works
o rely on a distance-based adjacency matrix to perform message passing, or
o use an attention mechanism to compute dynamic spatial correlations between nodes,
o and thus overlook the ripple effects of causal relations between nodes.
5. 5
INTRODUCTION
• Concurrently tackled the temporal OoD issue and dynamic spatial causation via causal
treatments.
• Proposed a novel framework, the Causal Spatio-Temporal neural network (CaST):
o Presented a Structural Causal Model (SCM) to gain a deeper understanding of the data generation process
of STG data.
o Based on the SCM, the authors:
o utilize back-door adjustment to enhance the generalization capability on unseen data (temporal
OoD);
o apply front-door adjustment along with an edge-level convolution operator to effectively
capture the dynamic causation between nodes.
6. 6
METHODOLOGY
Problem Definition: Causal Lens
• Causal inference:
o investigates causal relationships between variables, ensuring stable and robust learning and
inference;
o in STG, it is commonly used to address the temporal OoD issue by learning disentangled seasonal-trend
representations or environment-specific representations.
• Denote the signal $X_t \in \mathbb{R}^{N \times D}$ of $N$ nodes at time step $t$, with $D$ features.
• Given the previous $T$ time steps, learn a function $\mathcal{F}(\cdot)$ that forecasts the next $S$ steps:
$X_{(t-T):t} \xrightarrow{\ \mathcal{F}(\cdot)\ } Y_{(t+1):(t+S)}$
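As a minimal illustration of the input/output shapes (the linear map below is only a placeholder for the learned model, not CaST itself; the sizes are assumed for illustration):

```python
import torch

N, D, T, S = 170, 1, 24, 24            # nodes, features, history length, horizon (illustrative values)

x_hist = torch.randn(T, N, D)          # X_{(t-T):t}: the past T steps of node signals
f = torch.nn.Linear(T * N * D, S * N)  # stand-in for the learned mapping F(.)
y_pred = f(x_hist.reshape(1, -1)).reshape(S, N)  # Y_{(t+1):(t+S)}: the next S steps
print(y_pred.shape)                    # torch.Size([24, 170])
```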
7. 7
METHODOLOGY
Structural Causal Model (SCM)
• Assume E and C are independent.
o X ← E → Y: temporal OoD can arise due to changes in external variables over time
(e.g., weather can affect traffic flow observations).
o X ← C → Y: X and Y are intrinsically affected by the surrounding spatial context, which comprises
both spurious and genuine causal components.
o X → Y: the primary goal.
o The causal relationship: $P(X, Y \mid E, C) = P(X \mid E, C)\, P(Y \mid X, E, C)$.
E: temporal environment.
C: spatial context.
X: historical node signals.
Y: future signals.
8. 8
METHODOLOGY
Structural Causal Model (SCM)
• E and C are confounding factors: they open the back-door paths X ← E → Y and X ← C → Y,
which bias the estimation of the causal effect X → Y.
9. 9
METHODOLOGY
Structural Causal Model (SCM)
• Back-door adjustment for E:
o X is affected by both E and C; to mitigate this, the back-door path through E is blocked.
o This removes E's confounding effect by adjusting over the (discretized) temporal environments,
as in the formula below.
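For reference, the textbook back-door adjustment, with the discretized environments playing the role of the stratification variable $e$ (the standard formula, not copied verbatim from the paper):

$$P(Y \mid do(X)) = \sum_{e} P(Y \mid X, E = e)\, P(E = e)$$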
10. 10
METHODOLOGY
Structural Causal Model (SCM)
• Front-door adjustment for C:
o Introduces a mediating variable X* between X and Y that mimics a more accurate
representation with the spurious parts of C excluded.
o This de-confounds C's spurious effect, following the standard front-door formula below.
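Correspondingly, the textbook front-door adjustment through the mediator X* (again the standard formula rather than the paper's exact notation):

$$P(Y \mid do(X)) = \sum_{x^{*}} P(x^{*} \mid X) \sum_{x'} P(Y \mid x^{*}, X = x')\, P(X = x')$$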
12. 12
METHODOLOGY
Backdoor Adjustment
• Two steps:
o separating the environment features from the
input data;
o discretizing the environments.
• Two main modules:
o an Environment Disentangler block;
o a learnable Environment Codebook + Representation Disentanglement.
13. 13
METHODOLOGY
Backdoor Adjustment
• Environment Disentangler block:
o EnvEncoder: a series of 1D convolutions, average pooling, and a linear
projection.
o EntEncoder: Fast Fourier Transform and a self-attention mechanism to
extract features from both the time and frequency domains.
• Environment Codebook: a latent embedding space e = {e_1, ..., e_K} (see the quantization sketch after this list).
o A nearest-neighbour lookup in the shared embedding space e identifies the closest
latent vector for each node's environment representation.
o The final environment representation is the corresponding closest discrete vector in e.
• Representation Disentanglement: entity representations should carry minimal mutual
information (MI) about the environment.
o Employ Mutual Information Neural Estimation via the Kullback-Leibler (KL) divergence.
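A minimal sketch of the nearest-neighbour codebook lookup (VQ-VAE-style quantization with a straight-through gradient; the PyTorch code, variable names, and sizes are assumptions for illustration, not the paper's implementation):

```python
import torch

K, d_e = 10, 32                          # assumed codebook size and embedding dimension
codebook = torch.nn.Embedding(K, d_e)    # learnable environment codebook e = {e_1, ..., e_K}

def quantize(env_repr: torch.Tensor):
    """env_repr: (N, d_e) per-node environment representations from the EnvEncoder."""
    dist = torch.cdist(env_repr, codebook.weight)   # (N, K) Euclidean distances to all codebook vectors
    idx = dist.argmin(dim=-1)                       # index of the closest latent vector per node
    quantized = codebook(idx)                       # final discrete environment representation
    # straight-through estimator: copy gradients from the quantized output back to the encoder
    return env_repr + (quantized - env_repr).detach(), idx
```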
14. 14
METHODOLOGY
Front-door Adjustment
• Construct boundary edge graph:
o First-order: maps pairs of nodes to edges.
o Second-order: maps pairs of edges to
triangles.
• Introducing Hodge-Laplacian (HL) Deconfounder:
o higher-order graph over edges and perform edge convolution to filter
edge signal
o Goal: capture the dynamic causal relations of nodes as well as position
embeddings to learn the nodes’ global location information.
o Then, use a linear transformation and Graph convolutional networks
(GCN) to create causal surrogate.
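A rough sketch of an edge-level convolution built from the Hodge 1-Laplacian (the boundary matrices B1, B2 and the single tanh filter step are illustrative assumptions, not the paper's exact operator):

```python
import numpy as np

def hodge_1_laplacian(B1: np.ndarray, B2: np.ndarray) -> np.ndarray:
    """B1: node-to-edge incidence (|V| x |E|); B2: edge-to-triangle incidence (|E| x |T|)."""
    # lower Laplacian (edges sharing a node) + upper Laplacian (edges sharing a triangle)
    return B1.T @ B1 + B2 @ B2.T

def edge_conv(L1: np.ndarray, edge_feats: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One edge-convolution step: filter edge signals (|E| x d) with the Hodge Laplacian."""
    return np.tanh(L1 @ edge_feats @ W)
```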
15. 15
METHODOLOGY
Loss Function - Optimization
• Mutual Information Regularization: thwarts the classifier's ability to discern the true labels.
o Ensures the classifier cannot determine the true corresponding environment from the
information in the hidden features.
• Environment Codebook: prediction loss plus codebook loss (an assumed VQ-style form is sketched below).
o α: balancing hyperparameter.
o sg[·]: stop-gradient operator.
• Overall loss function: combines the prediction, codebook, and mutual-information regularization terms.
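In VQ-VAE-style notation, the codebook objective would read as follows; this is an assumed form consistent with the slide's symbols (sg[·], α), not necessarily the paper's exact equation:

$$\mathcal{L} = \mathcal{L}_{\text{pred}} + \left\| \text{sg}[z] - e \right\|_2^2 + \alpha \left\| z - \text{sg}[e] \right\|_2^2$$

where $z$ is a node's environment representation from the encoder and $e$ its nearest codebook vector; the overall objective then adds the mutual-information regularization term.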
16. 16
EXPERIMENT AND RESULT
EXPERIMENT
• Measurement:
o Mean Absolute Errors (MAE) and Root Mean Squared Errors (RMSE).
• Dataset:
o PEMS08: traffic flow data collected by Caltrans PeMS (District 8) at a 5-minute interval.
o AIR-BJ and AIR-GZ: one-year PM2.5 readings collected from air quality monitoring stations in Beijing
and Guangzhou.
• Task:
o predict over the next 24 steps given the past 24 steps.
• Variants:
o CaST-ADP: using a self-adaptive adjacency matrix.
o CaST-GAT: using the graph attention mechanism for causal scoring.
17. 17
• Baseline:
o Historical Average (HA).
o Vector autoregression (VAR).
o DCRNN[1]: Diffusion Convolution Recurrent Neural Network.
o STSGCN[2]: Spatial temporal synchronous graph convolutional networks.
o ASTGCN[3]: Attention Spatial-Temporal graph convolutional networks.
o MTGNN[4]: Multivariate Time Series Graph Neural Network.
o AGCRN[5]: Adaptive Graph Convolutional Recurrent Neural Network.
o GMSDR[6]: Graph Multi-Step Dependency Relation.
o STGNCDE[7]: Spatio-temporal graph neural controlled differential equation.
EXPERIMENT AND RESULT
EXPERIMENT
[1] Li, Y.; Yu, R.; Shahabi, C.; and Liu, Y. 2018. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. In ICLR.
[2] Song, C.; Lin, Y.; Guo, S.; and Wan, H. 2020. Spatial-Temporal Synchronous Graph Convolutional Networks: A New Framework for Spatial-Temporal Network Data Forecasting. In AAAI.
[3] Guo, S.; Lin, Y.; Feng, N.; Song, C.; and Wan, H. 2019. Attention Based Spatial-Temporal Graph Convolutional Networks for Traffic Flow Forecasting. In AAAI.
[4] Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Chang, X.; and Zhang, C. 2020. Connecting the Dots: Multivariate Time Series Forecasting with Graph Neural Networks. In KDD.
[5] Bai, L.; Yao, L.; Li, C.; Wang, X.; and Wang, C. 2020. Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting. In NeurIPS.
[6] Liu, D.; Wang, J.; Shang, S.; and Han, P. 2022. MSDR: Multi-Step Dependency Relation Networks for Spatial Temporal Forecasting. In KDD.
[7] Choi, J.; Choi, H.; Hwang, J.; and Park, N. 2022. Graph Neural Controlled Differential Equations for Traffic Forecasting. In AAAI.
19. 19
EXPERIMENT AND RESULT
RESULT – Edge and Component Analysis
Fig: Performance comparison of different variants on the PEMS08 and England datasets.
• Effects of Edge Convolution
• Effects of Components and Visualization of Dynamic Spatial Causation
20. 20
EXPERIMENT AND RESULT
RESULT – Interpretation Analysis
• Analysis on Environmental Codebook
• Interpretation of Temporal Environments
21. 21
CONCLUSION
• Took a causal look at the STG forecasting problem:
o the temporal out-of-distribution (OoD) issue and dynamic spatial causation.
• Proposed a novel Causal Spatio-Temporal neural network (CaST).
• Utilized back-door and front-door adjustments to resolve these two challenges, respectively.
• Verified its effectiveness, generalizability, and interpretability through extensive
experiments on three datasets.
Editor's Notes
K: discrete space size / total number of environments.
Variants:
CaST-ADP: using a self-adaptive adjacency matrix.
CaST-GAT: using the graph attention mechanism for causal scoring.
Effects of Components:
w/o Env: excludes environment features for prediction.
w/o Ent: omits entity features for prediction.
w/o Edge: does not utilize the causal score to guide the spatial message passing.