Traffic Flow Forecasting with Spatial-Temporal
Graph Diffusion Network
AAAI Association for the Advancement of Artificial Intelligence, 2020
Xiyue Zhang, Chao Huang, Yong Xu, Lianghao Xia, Peng Dai, Liefeng Bo, Junbo Zhang, Yu Zheng
August 27, 2021
Presenter: Kyunghwan Mun
Contents
• Overview of the Paper
• Background and Motivation
• Introduction
• Methodology
• Experiment
• Conclusion and Discussion
Overview of the Paper
• The Framework of Spatial-Temporal Graph Diffusion Networks (ST-GDN)
• Temporal Hierarchy Modeling
• Traffic Dependency Learning with Global Context
• Region-wise Relation Learning with Graph Diffusion Paradigm
• Traffic Prediction Phase
Background and Motivation
• Focus on neighboring spatial correlations (adjacent regions)
▪ Ignore the global geographical contextual information
• Fail to encode the complex traffic transition regularities
▪ Time-dependent
▪ Multi-resolution
• Spatial-Temporal Graph Diffusion Network (ST-GDN)
▪ Hierarchically structured graph neural architecture
• The local region-wise geographical dependencies
• The spatial semantics from a global perspective
▪ Multi-scale attention network
• Multi-level temporal dynamics
Introduction
• Existing studies & Limitations
▪ Sequence Learning
• Model temporal effects
▪ Convolutional Neural Networks (CNN)
• Model correlations between adjacent regions
▪ Recurrent Neural Networks (RNN)
• Apply on temporal dimension
▪ Using each individual time resolution
• Hourly, Daily, Weekly
• To consider complex and multi-periodic attributes
▪ Ignore the cross-region inter-dependencies
• Do not exploit regions with similar urban functions
• (Ex) Shopping zone, Transportation hub
Introduction
• Spatial-Temporal Graph Diffusion Network (ST-GDN)
▪ Multi-scale self-attention network
• Multi-grained temporal dynamics
• Various time resolutions
• Aggregation layer
• Underlying dependencies across multi-level temporal dynamics
• Attentive graph diffusion paradigm
• Incorporate local-level spatially adjacent relations and global-level traffic patterns
Methodology
• Temporal Hierarchy Modeling
▪ Temporal Hierarchy Modeling
• Multi-scale self-attention network
• Map multi-level temporal signals into common latent representations
• Temporal resolution ($P$), traffic series ($x_{i,j}^{T_P}$), traffic series length ($T_P$)
• 𝑃 ∈ {hour, day, week}
• Three transformation matrices
• Query matrix
• Key matrix
• Value matrix
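$$\begin{bmatrix} Q \\ K \\ V \end{bmatrix} = E^P \begin{bmatrix} W_Q \\ W_K \\ W_V \end{bmatrix}, \qquad Y^P = \sigma\!\left(\frac{Q K^T}{\sqrt{d}}\right) V$$
where $\sigma$ is the softmax function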
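The query/key/value attention on this slide can be sketched in plain Python. This is a minimal illustration and not the authors' implementation: the toy embedding `E_P`, the identity projection matrices, and the helper names (`softmax`, `matmul`, `self_attention`) are all assumptions made for the example, and the three projections are applied separately rather than as one stacked matrix.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]

def self_attention(E, W_q, W_k, W_v):
    """Return (softmax(Q K^T / sqrt(d)) V, attention weights)."""
    Q, K, V = matmul(E, W_q), matmul(E, W_k), matmul(E, W_v)
    d = len(Q[0])  # dimension of the query/key vectors
    scores = matmul(Q, transpose(K))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, V), weights

# Toy input (made-up values): T_P = 3 time steps, embedding size 2.
E_P = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical projection weights
Y_P, attn = self_attention(E_P, identity, identity, identity)
```

Each row of `attn` sums to 1, so every time step's output in `Y_P` is a convex combination of the value vectors.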
Methodology
• Traffic Dependency Learning with Global Context - 1
▪ Traffic Dependency Learning with Global Context
• Attentive aggregation mechanism
• Capture both local & global traffic dependency
• Using multi-head attention mechanism
• To capture the region-wise relation semantics from different learning subspaces
Methodology
• Traffic Dependency Learning with Global Context - 2
▪ Attention coefficients ($w^{h}_{(i,j);(i',j')}$)
(Learned quantitative region-wise relevance score)
• Step 1. Concatenate two embedding vectors
• Step 2. Pass the concatenated vector through the LeakyReLU function
• Step 3. Calculate attention coefficients using Softmax function
• Step 4. Consider ∀ ℎ ∈ {1, … , 𝐻}
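Steps 1 to 4 can be sketched as follows. This is a hedged illustration in the style of graph attention, not the paper's exact formulation: the per-head weight vector `a`, the LeakyReLU slope of 0.2, and the toy embeddings are assumptions chosen for the example.

```python
import math

def leaky_relu(x, slope=0.2):
    """LeakyReLU with an assumed negative slope of 0.2."""
    return x if x > 0 else slope * x

def attention_coefficients(target, neighbors, a):
    """Relevance of each neighbor embedding to the target embedding (one head)."""
    scores = []
    for nb in neighbors:
        concat = target + nb                          # Step 1: concatenate embeddings
        raw = sum(w * v for w, v in zip(a, concat))   # project with weight vector a
        scores.append(leaky_relu(raw))                # Step 2: LeakyReLU
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]          # Step 3: softmax over neighbors
    total = sum(exps)
    return [e / total for e in exps]

def multi_head(target, neighbors, heads):
    """Step 4: repeat the computation for every head h in {1, ..., H}."""
    return [attention_coefficients(target, neighbors, a) for a in heads]

# Toy data (made up): one target region, three candidate regions, H = 2 heads.
target = [1.0, 0.0]
neighbors = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
heads = [[0.5, 0.5, 0.5, 0.5], [1.0, -1.0, 1.0, -1.0]]
coeffs = multi_head(target, neighbors, heads)
```

Because each head has its own weight vector, the same pair of regions can receive a different relevance score in each learning subspace.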
Methodology
• Traffic Dependency Learning with Global Context - 3
▪ Message aggregation over G (Region graph)
• Concatenate them for ∀ ℎ ∈ {1, … , 𝐻}
▪ Information aggregation
• $Z_{i,j}^{P}$: the aggregated feature embedding of $r_{i,j}$
Methodology
• Traffic Dependency Learning with Global Context - 4
▪ High-order Information Propagation
• High-order relation modeling
• Global-level representation of region $r_{i,j}$
$$Z_{i,j}^{P} = Z_{i,j}^{P,(l)} \oplus \cdots \oplus Z_{i,j}^{P,(L)}$$
where $\oplus$ is element-wise addition
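A small sketch of summing layer-wise embeddings with element-wise addition is given below. It assumes, purely for illustration, that the same row-normalized weight matrix is reused at every propagation layer; the function names and toy values are hypothetical.

```python
def propagate(Z, weights):
    """One aggregation layer: each region takes a weighted sum over all regions."""
    dim = len(Z[0])
    return [[sum(w * Z[j][k] for j, w in enumerate(row)) for k in range(dim)]
            for row in weights]

def high_order_embedding(Z0, weights, num_layers):
    """Add up the outputs of layers 1..num_layers element-wise."""
    layer, total = Z0, None
    for _ in range(num_layers):
        layer = propagate(layer, weights)
        if total is None:
            total = layer
        else:
            total = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(total, layer)]
    return total

# Toy data (made up): two regions, embedding size 2, uniform attention weights.
Z0 = [[1.0, 0.0], [0.0, 1.0]]
weights = [[0.5, 0.5], [0.5, 0.5]]
Z = high_order_embedding(Z0, weights, num_layers=2)
```

Stacking layers lets information from regions several hops away reach a region's final embedding, which is what the high-order relation modeling bullet refers to.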
Methodology
• Region-wise Spatial Relation Encoding -1
▪ Graph-structured diffusion network
• Region-wise relation graph ($G_s = (R_s, E_s, A)$)
• Consider $K$ neighboring regions of $r_{i,j}$
• Diffusion convolution operation
$D_O^{-1} A$ and $D_I^{-1} A^T$: bi-directional transition matrices
• Region representation
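The bi-directional transition matrices can be illustrated with a short sketch in the style of the diffusion convolution used in DCRNN. Whether ST-GDN uses exactly this parameterization is an assumption; the scalar weights `theta_fwd` and `theta_bwd`, the helper names, and the toy graph are hypothetical.

```python
def row_normalize(A):
    """Divide each row by its sum, i.e. D^{-1} A (all-zero rows stay zero)."""
    out = []
    for row in A:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out

def transpose(A):
    """Swap rows and columns."""
    return [list(col) for col in zip(*A)]

def matvec(M, x):
    """Multiply matrix M by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def diffusion_conv(A, x, theta_fwd, theta_bwd):
    """sum_k theta_fwd[k] (D_O^-1 A)^k x + theta_bwd[k] (D_I^-1 A^T)^k x."""
    P_fwd = row_normalize(A)             # D_O^{-1} A: out-degree normalized
    P_bwd = row_normalize(transpose(A))  # D_I^{-1} A^T: in-degree normalized
    out = [0.0] * len(x)
    x_f, x_b = list(x), list(x)
    for t_f, t_b in zip(theta_fwd, theta_bwd):
        out = [o + t_f * f + t_b * b for o, f, b in zip(out, x_f, x_b)]
        x_f, x_b = matvec(P_fwd, x_f), matvec(P_bwd, x_b)  # next diffusion step
    return out

# Toy directed cycle over three regions (made-up data): 0 -> 1 -> 2 -> 0.
A = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
x = [1.0, 2.0, 3.0]
out_fwd = diffusion_conv(A, x, [0.0, 1.0], [0.0, 0.0])  # one forward step only
out_bwd = diffusion_conv(A, x, [0.0, 0.0], [0.0, 1.0])  # one backward step only
```

On the toy cycle, the forward step gives each region the signal of the region it points to, and the backward step gives it the signal of the region pointing at it.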
Methodology
• Region-wise Spatial Relation Encoding -2
▪ Consider the multi-resolution traffic patterns
(i.e.) Hourly, Daily, Weekly
where $\circ$ is the Hadamard (element-wise) product
Methodology
• Traffic Prediction Phase
▪ External factors
• Meteorological conditions
• Weather conditions
• Temperature / ℃
• Wind speed / mph
• Map features into vectors $g_t$
• Projection over $g_t$ using multi-layer perceptron
▪ Concatenate embedding ($\Lambda_{i,j}$ and $g_t$)
▪ Loss function
Experiment
• Baseline models
▪ Traditional Time Series Prediction
• ARIMA, Support Vector Regression (SVR)
▪ Conventional Hybrid Learning
• Fuzzy+NN
▪ Recurrent Spatial-Temporal Prediction
• ST-RNN
• D-LSTM
▪ Convolution-based Network
• DeepST
• ST-ResNet
Experiment
• Baseline models
▪ Convolutional Recurrent Predictive
• DMVST-Net
• DCRNN
▪ Attentive Traffic Prediction
• STDN
▪ Graph Neural Networks
• ST-GCN
• ST-MGCN
• GMAN
▪ Deep Hybrid Traffic Flow Predictive
• UrbanFM
• ST-MetaNet
Experiment
• Dataset
▪ BJ-Taxi, NYC-Bike, NYC-Taxi + External Factors
• Time interval & Grid-based
▪ 30 minutes / 1 hour
• Metrics
▪ RMSE, MAPE
• Methods
▪ Inflow, Outflow for Prediction
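The two metrics can be computed as in the sketch below. The handling of zero-flow cells in MAPE (they are skipped) is an assumption made here to avoid division by zero; the slides do not say how the paper treats them, and the sample values are made up.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred, eps=1e-8):
    """Mean absolute percentage error over entries with non-zero ground truth."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if abs(t) > eps]
    return sum(abs((t - p) / t) for t, p in pairs) / len(pairs)

# Toy inflow values for four grid cells (made-up numbers).
y_true = [10.0, 20.0, 0.0, 40.0]
y_pred = [12.0, 18.0, 1.0, 40.0]
error_rmse = rmse(y_true, y_pred)
error_mape = mape(y_true, y_pred)
```

RMSE penalizes large absolute errors in busy regions, while MAPE is relative to the true flow, so the two metrics can rank models differently.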
Experiment
• Performance comparison (RMSE & MAPE)
Conclusion and Discussion
• Grid-based partition → Clustering?
▪ Density Peak Clustering (DPC)
(Ex) Coupled Layer-wise Convolutional Recurrent Neural Network (CCRNN)
▪ K-Means Clustering
▪ K-Medians Clustering
▪ Mean-Shift Clustering
▪ Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
▪ Etc.
• Adjacency Matrix
▪ Bi-directional transition matrices ($A$, $A^T$) + additional weights based on road network
▪ Existing studies use a fixed adjacency matrix
▪ But CCRNN uses different matrices in different layers
▪ But CCRNN generates its adjacency matrix (time-wise matrix & station-wise matrix)
→ Diffusion (ST-GDN) + Normalized + Clustering + Different adjacency matrices in different layers (CCRNN)?
[Paper] Ye, Junchen, et al. "Coupled layer-wise graph convolution for transportation demand prediction." arXiv preprint arXiv:2012.08080 (2020).
Conclusion and Discussion
• Combined clustering algorithm based on regions?
[Paper] Ren, Chunhua, et al. "Effective Density Peaks Clustering Algorithm Based on the Layered K-Nearest Neighbors and Subcluster Merging." IEEE Access 8 (2020): 123449-123468.
Thank you
Any questions?
