This document summarizes a research paper that proposes a deep learning framework called Spatial-Temporal Dynamic Network (STDN) for traffic prediction. STDN uses two key mechanisms: 1) A flow gating mechanism to explicitly model dynamic spatial similarity between locations based on traffic flow data. 2) A periodically shifted attention mechanism to capture long-term periodic dependency while accounting for temporal shifting in traffic patterns. The paper evaluates STDN on real-world taxi and bike-sharing datasets, finding it outperforms other state-of-the-art methods for traffic prediction.
The document analyzes how topography has affected urban sprawl in Beijing over 20 years from 1993 to 2012. It finds that the flat central and eastern parts of Beijing experienced the most development, while the mountainous western and northern areas saw slower growth. Urban areas and nighttime lights expanded fastest towards the flatter land, closely following the development of transportation networks. Topography is thus shown to be a major factor influencing the uneven pattern of Beijing's urbanization.
Deep Graph Convolutional Networks for Incident-Driven Traffic Speed Prediction
The document proposes a Deep Incident-Aware Graph Convolutional Network (DIGC-Net) model to predict traffic speeds based on incidents. DIGC-Net uses a graph convolutional network to capture spatial road features, an LSTM to capture temporal patterns, and a fully connected layer to capture periodic features. It also uses an RNN to model how past incidents affect future traffic and speeds. The model discovers critical incidents using anomalous degree and speed variation, extracts latent incident impact features using a classifier, and incorporates these along with spatio-temporal and periodic features for traffic speed prediction.
This document provides an overview of Bayesian inference and filtering techniques for multi-object filtering and multi-target tracking (MOFT). It describes the problems of state estimation in dynamic systems where the state is hidden but can be partially observed. It introduces Bayesian filtering approaches which construct the posterior probability density function of the state using noisy measurements. Specific filtering techniques covered include the Kalman filter, extended Kalman filter, unscented Kalman filter, and particle filter, which make different approximations to solve the filtering problem.
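The predict/update cycle behind the filters listed above can be shown with a minimal one-dimensional Kalman filter. This is an illustrative sketch, not code from the document: the constant-state model and the noise values q and r are assumptions chosen for the example.

```python
# Minimal 1-D Kalman filter: estimate a hidden state from noisy measurements.
# The process/measurement noise values (q, r) are illustrative assumptions.

def kalman_filter(measurements, q=1e-3, r=0.5, x0=0.0, p0=1.0):
    """Return posterior state estimates for a constant-state model."""
    x, p = x0, p0          # state estimate and its variance
    estimates = []
    for z in measurements:
        # Predict: constant-state model, so only the variance grows.
        p = p + q
        # Update: blend prediction and measurement via the Kalman gain.
        k = p / (p + r)    # Kalman gain
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates

# Noisy readings of a true value of 5.0; estimates move toward it.
est = kalman_filter([5.3, 4.8, 5.1, 4.9, 5.2, 5.0])
```

The extended, unscented, and particle filters generalize exactly this cycle to nonlinear, non-Gaussian models.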
DEEP LEARNING NEURAL NETWORK APPROACHES TO LAND USE-DEMOGRAPHIC-TEMPORAL BA...
Land use and transportation planning are interdependent, as well as being important factors in forecasting urban development. In recent years, predicting traffic based on land use, along with several other variables, has become a worthwhile area of study. In this paper, it is proposed that Deep Neural Network Regression (DNN-Regression) and Recurrent Neural Network (DNN-RNN) methods could be used to predict traffic. These methods used three key variables: land use, demographic and temporal data. The proposed methods were evaluated against other methods, using datasets collected from the City of Calgary, Canada. The proposed DNN-Regression focused on demographic and land use variables for traffic prediction. The study also predicted traffic temporally in the same geographical area by using DNN-RNN. The DNN-RNN used long short-term memory to predict traffic. Comparative experiments revealed that the proposed DNN-Regression and DNN-RNN models outperformed other methods.
Prediction of Traveller Information and Route Choice
This presentation is based on a research paper by Afzal Ahmed, Dong Ngoduy & David Watling (Institute for Transport Studies, University of Leeds, 34–40 University Road, Leeds LS2 9JT, UK). Published online: 10 Jun 2015.
Deep Multi-View Spatial-Temporal Network for Taxi Demand Prediction
This paper proposes DMVST-Net, a deep learning framework that uses three views (spatial, temporal, and semantic) to predict taxi demand. It uses a local CNN to model spatial relationships between nearby regions, an LSTM to model temporal patterns over time, and region embeddings to model semantic relationships between spatially distant but correlated regions. An experiment on a large Didi Chuxing taxi dataset showed DMVST-Net outperformed other methods at predicting future demand, demonstrating the benefit of jointly modeling spatial, temporal, and semantic relationships.
Traffic Prediction from Street Network Images
While considering the spatial and temporal features of traffic, capturing the impacts of various external factors on travel is an essential step towards achieving accurate traffic forecasting. However, existing studies seldom consider external factors or neglect the effect of the complex correlations among external factors on traffic. Intuitively, knowledge graphs can naturally describe these correlations. Since knowledge graphs and traffic networks are essentially heterogeneous networks, it is challenging to integrate the information in both networks. On this background, this study presents a knowledge representation-driven traffic forecasting method based on spatial-temporal graph convolutional networks.
Coupled Layer-wise Graph Convolution for Transportation Demand Prediction
This document summarizes a research paper that proposes a Coupled Layer-wise Convolutional Recurrent Neural Network (CCRNN) model for transportation demand prediction. CCRNN uses the following key techniques:
1. It generates adjacency matrices at each layer of the model to capture multi-level spatial dependencies, rather than using a fixed adjacency matrix.
2. It employs coupled layer-wise graph convolutions with different adjacency matrices at each layer to obtain representations at different levels of abstraction.
3. A multi-level attention mechanism aggregates representations from different layers.
4. A gated recurrent unit with graph convolutions models temporal dependencies over time.
The paper evaluates CCRNN on two real-world datasets.
An Effective Joint Prediction Model for Travel Demands and Traffic Flows
This document summarizes a research paper that presents DeepTP, a joint prediction model for travel demands and traffic flows. DeepTP uses four modules: 1) a future spatio-temporal encoding module, 2) a past traffic sequence encoding module, 3) a graph-based correlation encoding module, and 4) a final estimation module. It encodes three types of embeddings - past traffic data, region-level correlations, and temporal periodicity - to capture inter-traffic correlations, region-level similarities, and periodic patterns in demand and flow. The model was evaluated on real-world traffic datasets from two cities and was shown to outperform other baselines in joint demand and flow prediction.
The document proposes a novel deep learning framework called Spatio-Temporal Graph Convolutional Networks (STGCN) to tackle the time series prediction problem in traffic forecasting. STGCN uses graph convolutional layers to model spatial dependencies on a traffic network represented as a graph, and convolutional layers to model temporal dependencies. Experiments show STGCN outperforms state-of-the-art baselines by effectively capturing comprehensive spatio-temporal correlations through modeling multi-scale traffic networks.
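The spatial step used in STGCN-style models can be illustrated with a single graph-convolution pass. This is a simplified sketch, not the paper's implementation: learned weights and the nonlinearity are omitted, and the 4-node road graph and speed values are made-up examples.

```python
# One graph-convolution propagation step: H' = D^{-1} (A + I) H,
# i.e. each node averages features over itself and its neighbours.

def gcn_propagate(adj, feats):
    n = len(adj)
    # Add self-loops: A + I
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)]
             for i in range(n)]
    # Row-normalise and propagate: D^{-1} (A + I) H
    out = []
    for i in range(n):
        deg = sum(a_hat[i])
        row = [sum(a_hat[i][k] * feats[k][j] for k in range(n)) / deg
               for j in range(len(feats[0]))]
        out.append(row)
    return out

# Line graph 0-1-2-3 with one speed feature per road segment.
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
speeds = [[60.0], [30.0], [30.0], [60.0]]
smoothed = gcn_propagate(adj, speeds)  # -> [[45.0], [40.0], [40.0], [45.0]]
```

Stacking such layers with learned weight matrices, interleaved with temporal convolutions, gives the spatio-temporal blocks the summary describes.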
[20240318_LabSeminar_Huy] GSTNet: Global Spatial-Temporal Network for Traffic ...
GSTNet is a deep learning model for traffic flow prediction that incorporates spatial and temporal information. It contains multi-resolution temporal and global correlated spatial modules. The temporal module captures short and long-term patterns, while the spatial module considers both local and non-local correlations between locations. In experiments on Beijing transportation data, GSTNet achieved more accurate predictions compared to other methods and was able to capture both short and long-term dependencies in traffic flow.
This document describes a new spatial-temporal attention graph convolution network (STAGCN) for traffic forecasting. STAGCN contains three key components: 1) a graph learning layer that learns both a static graph capturing global spatial relationships and a dynamic graph capturing local spatial changes, 2) an adaptive graph convolution layer, and 3) a gated temporal attention module using causal-trend attention to model long-term temporal dependencies by focusing on important historical information. Experimental results on four traffic datasets show STAGCN achieves better prediction accuracy than existing methods.
This document summarizes and compares several deep learning models for traffic speed prediction:
- DCRNN was the first model to use GCN in the traffic domain, combining GCN with RNN. STGCN used a many-to-one architecture with GCN and CNN, applied 12 times to predict 12 time steps.
- ASTGCN used attention mechanisms with GCN and CNN, applying spatial and temporal attention. STSGCN simultaneously captured spatial and temporal features using a localized spatial-temporal graph with GCN.
- Graph-WaveNet used GCN and dilated CNN with an adaptive adjacency matrix. Additional experiments are needed to fairly evaluate models and ablate contributions of components.
The document proposes a method for traffic flow prediction using KNN and LSTM. KNN is used to select neighboring stations that are spatially correlated with the test station. A two-layer LSTM network then predicts traffic flow in the selected stations. The predictions are combined using weighted averaging to obtain the final prediction for the test station. An experiment on traffic data from Minnesota highways found the proposed method improved prediction accuracy over ARIMA, SVR, WNN, DBN-SVR, and LSTM-only models, achieving on average a 12.59% reduction in error.
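The neighbour-selection and weighted-averaging steps described above can be sketched as follows. This is an illustrative toy, not the paper's code: the station ids, correlation values, and per-station predictions are made-up inputs standing in for the KNN correlations and LSTM outputs.

```python
# Pick the k stations most correlated with the test station, then combine
# their (here: pre-made) predictions by correlation-weighted averaging.

def knn_weighted_prediction(correlations, predictions, k=2):
    """correlations/predictions: dicts keyed by station id."""
    # Select the k most spatially correlated neighbouring stations.
    neighbours = sorted(correlations, key=correlations.get, reverse=True)[:k]
    total = sum(correlations[s] for s in neighbours)
    # Weighted average of the selected stations' predictions.
    return sum(correlations[s] / total * predictions[s] for s in neighbours)

corr = {"S1": 0.9, "S2": 0.6, "S3": 0.2}     # illustrative correlations
preds = {"S1": 100.0, "S2": 80.0, "S3": 40.0}  # per-station LSTM outputs
flow = knn_weighted_prediction(corr, preds, k=2)  # (0.9*100 + 0.6*80) / 1.5
```

In the paper the per-station predictions come from the two-layer LSTM; here they are given directly to keep the averaging step visible.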
The document proposes a new framework called Spatial-Temporal Synchronous Graph Convolutional Networks (STSGCN) for spatial-temporal network data forecasting. STSGCN is able to capture complex localized spatial-temporal correlations through an elaborately designed spatial-temporal synchronous modeling mechanism. It constructs localized spatial-temporal graphs by connecting individual spatial graphs of adjacent time steps. STSGCN then uses a Spatial-Temporal Synchronous Graph Convolutional Module to directly capture the localized spatial-temporal correlations in these graphs. It also includes multiple modules to effectively model heterogeneities in different time periods. Extensive experiments on four real-world datasets demonstrate that STSGCN achieves state-of-the-art performance.
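The localized graph construction STSGCN relies on can be made concrete: three copies of the spatial adjacency (time steps t-1, t, t+1) are stacked into one 3N x 3N matrix, with each node additionally linked to itself at adjacent time steps. This is a structural sketch under that reading of the summary, not the paper's implementation.

```python
# Build a localized spatial-temporal adjacency over 3 consecutive steps.

def localized_st_graph(adj):
    n = len(adj)
    big = [[0] * (3 * n) for _ in range(3 * n)]
    for t in range(3):
        for i in range(n):
            for j in range(n):
                big[t * n + i][t * n + j] = adj[i][j]  # spatial edges per step
    for t in range(2):
        for i in range(n):
            # Temporal edges: the same node at consecutive time steps.
            big[t * n + i][(t + 1) * n + i] = 1
            big[(t + 1) * n + i][t * n + i] = 1
    return big

adj = [[0, 1], [1, 0]]        # two connected road nodes
st = localized_st_graph(adj)  # 6 x 6 localized spatial-temporal adjacency
```

A graph convolution on this matrix mixes spatial and temporal neighbours in one operation, which is the "synchronous" modeling the summary refers to.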
Traffic Flow Forecasting with Spatial-Temporal Graph Diffusion Network
The document summarizes the key aspects of the paper "Traffic Flow Forecasting with Spatial-Temporal Graph Diffusion Network". It introduces the ST-GDN framework which uses a hierarchical graph neural architecture to model temporal hierarchies and learn traffic dependencies within and across regions via graph diffusion. The methodology section explains the multi-scale temporal modeling, global context learning using attention mechanisms, and region-wise relation encoding on the graph. The experiment section compares ST-GDN to other baselines on taxi and bike traffic datasets, finding ST-GDN achieves better performance. The conclusion discusses potential extensions like using different clustering and adjacency matrix approaches.
Deep Learning Neural Network Approaches to Land Use-Demographic-Temporal bas...
This document describes research that uses deep learning neural network approaches to predict traffic based on land use, demographic, and temporal data. Specifically, it proposes using Deep Neural Network Regression (DNN-Regression) and Recurrent Neural Network (DNN-RNN) models. DNN-Regression focuses on demographic and land use variables to predict traffic, while DNN-RNN predicts traffic temporally in the same geographic area using long short-term memory. The models are evaluated using datasets from Calgary, Canada, and are found to outperform other existing approaches to traffic prediction.
[20240325_LabSeminar_Huy] Spatial-Temporal Fusion Graph Neural Networks for Tr...
This document outlines a spatial-temporal graph neural network model called STFGN for traffic flow forecasting. STFGN constructs a temporal graph using dynamic time warping to capture temporal dependencies beyond spatial proximity. It then builds a fusion graph combining the spatial, temporal, and temporal connectivity graphs. The model uses STFGN modules and gated dilated CNNs to learn local and global spatial-temporal patterns. Experiments on California traffic datasets show STFGN achieves state-of-the-art performance for traffic flow prediction.
The document proposes an Attention Temporal Graph Convolutional Network (A3T-GCN) model for traffic forecasting that aims to simultaneously capture global temporal dynamics and spatial correlations in traffic data. The A3T-GCN combines a Graph Convolutional Network (GCN) to learn spatial dependencies based on road network topology with a Gated Recurrent Unit (GRU) to learn temporal trends, and introduces an attention mechanism to adjust the importance of different time points and better predict traffic. Experimental results on real-world datasets demonstrate the effectiveness of the proposed A3T-GCN model.
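The attention re-weighting idea in A3T-GCN can be sketched in a few lines: score each historical time step, softmax the scores into weights, and form a context value as the weighted sum of per-step hidden states. In the model the scores come from a learned layer; here they are given directly, and the hidden states are scalars for readability.

```python
import math

# Softmax attention over time steps: recent-looking steps with high scores
# dominate the context used for the prediction.

def attention_context(hidden_states, scores):
    # Softmax over time steps.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context = attention-weighted sum of hidden states.
    context = sum(w * h for w, h in zip(weights, hidden_states))
    return context, weights

# Three GRU hidden states; the last step is scored as most important.
ctx, w = attention_context([10.0, 20.0, 30.0], [0.1, 0.1, 2.0])
```

The same mechanism with vector-valued hidden states and learned scoring gives the "importance of different time points" adjustment the summary describes.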
This document summarizes a survey of traffic data visualization techniques. It describes how traffic data is collected from sensors and organized. Common visualization methods are discussed like using lines and regions to show trajectories and aggregated spatial data, and space-time cubes to visualize trajectories over time. The document also outlines techniques for visualizing multiple attributes, analyzing patterns and clustering trajectories, monitoring traffic situations, and exploring data. Future work opportunities are developing systems for big data, situation awareness, and integrating different visualization views.
In this short presentation, we will provide some recent developments in the field of crowd monitoring, modelling and management. We will illustrate these by showing various projects that we are involved in, including the SmartStation project, and the different events organised in and around the city of Amsterdam (including the Europride, SAIL, etc.).
In the talk, we will discuss the different components of the system and the methods and technology involved. We focus on advanced data collection techniques, the use of social media data, data fusion, and the advanced macroscopic modelling required for this. We will also show examples of interventions that have been tested, showing how these systems are used in practice.
This document proposes a new graph neural network model called Spatio-Temporal Pivotal Graph Neural Networks (STPGNN) for traffic flow forecasting. STPGNN first identifies pivotal nodes in the traffic network that exhibit extensive connections using a scoring mechanism. It then constructs a pivotal graph and uses a novel pivotal graph convolution module to capture spatio-temporal dependencies centered around these pivotal nodes. Experiments on several traffic datasets show STPGNN achieves better forecasting accuracy compared to other methods while maintaining lower computational cost.
The document discusses approaches for providing accurate and robust traffic forecasts using empirical data. It describes two main types of empirical approaches: basic forecast approaches that use individual prediction models, and combined forecast approaches that combine different forecasts into a single prediction. Basic approaches can be parametric (e.g. linear regression) or non-parametric (e.g. neural networks), while combined approaches aim to improve accuracy by incorporating each model's unique strengths. The document provides an overview of various techniques within each category.
Prediction and Planning for Self-Driving at Waymo
ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst
Multipath: Multiple Probabilistic Anchor Trajectory Hypotheses For Behavior Prediction
VectorNet: Encoding HD Maps And Agent Dynamics From Vectorized Representation
TNT: Target-driven Trajectory Prediction
Large Scale Interactive Motion Forecasting For Autonomous Driving : The Waymo Open Motion Dataset
Identifying Driver Interactions Via Conditional Behavior Prediction
Peeking Into The Future: Predicting Future Person Activities And Locations In Videos
STINet: Spatio-temporal-interactive Network For Pedestrian Detection And Trajectory Prediction
This document summarizes research presented at recent Argument Mining workshops. It describes tasks from 2021 and 2022 on key point analysis and validity-novelty prediction. It also outlines papers on extracting argument components, predicting reasoning markers, and detecting argument boundaries. The document indicates that argument mining can analyze debates on social media and that legal corpora can help develop argument mining systems.
This document summarizes several papers being presented at the CHI 2023 conference related to human-AI collaboration, mental healthcare, comment visualization, and virtual reality. For human-AI collaboration, one paper studies the effects of AI code generators on novice programmers and another aims to bridge the abstraction gap between end users and large language models. A third paper investigates whether groups or individuals are better at AI-assisted decision making. For mental healthcare, two papers study user perceptions of recommender systems and interest in self-tracking technologies. For comment visualization, a paper examines selective avoidance of uncongenial facts. Three papers on virtual reality discuss opportunities for metaverse workspaces, embodying physics-aware avatars, and mapping finger motions
Coupled Layer-wise Graph Convolution for Transportation Demand Predictionivaderivader
This document summarizes a research paper that proposes a Coupled Layer-wise Convolutional Recurrent Neural Network (CCRNN) model for transportation demand prediction. CCRNN uses the following key techniques:
1. It generates adjacency matrices at each layer of the model to capture multi-level spatial dependencies, rather than using a fixed adjacency matrix.
2. It employs coupled layer-wise graph convolutions with different adjacency matrices at each layer to obtain representations at different levels of abstraction.
3. A multi-level attention mechanism aggregates representations from different layers.
4. A gated recurrent unit with graph convolutions models temporal dependencies over time.
The paper experiments with CCRNN on two
An effective joint prediction model for travel demands and traffic flowsivaderivader
This document summarizes a research paper that presents DeepTP, a joint prediction model for travel demands and traffic flows. DeepTP uses four modules: 1) a future spatio-temporal encoding module, 2) a past traffic sequence encoding module, 3) a graph-based correlation encoding module, and 4) a final estimation module. It encodes three types of embeddings - past traffic data, region-level correlations, and temporal periodicity - to capture inter-traffic correlations, region-level similarities, and periodic patterns in demand and flow. The model was evaluated on real-world traffic datasets from two cities and was shown to outperform other baselines in joint demand and flow prediction.
The document proposes a novel deep learning framework called Spatio-Temporal Graph Convolutional Networks (STGCN) to tackle the time series prediction problem in traffic forecasting. STGCN uses graph convolutional layers to model spatial dependencies on a traffic network represented as a graph, and convolutional layers to model temporal dependencies. Experiments show STGCN outperforms state-of-the-art baselines by effectively capturing comprehensive spatio-temporal correlations through modeling multi-scale traffic networks.
[20240318_LabSeminar_Huy]GSTNet: Global Spatial-Temporal Network for Traffic ...thanhdowork
GSTNet is a deep learning model for traffic flow prediction that incorporates spatial and temporal information. It contains multi-resolution temporal and global correlated spatial modules. The temporal module captures short and long-term patterns, while the spatial module considers both local and non-local correlations between locations. In experiments on Beijing transportation data, GSTNet achieved more accurate predictions compared to other methods and was able to capture both short and long-term dependencies in traffic flow.
This document describes a new spatial-temporal attention graph convolution network (STAGCN) for traffic forecasting. STAGCN contains three key components: 1) a graph learning layer that learns both a static graph capturing global spatial relationships and a dynamic graph capturing local spatial changes, 2) an adaptive graph convolution layer, and 3) a gated temporal attention module using causal-trend attention to model long-term temporal dependencies by focusing on important historical information. Experimental results on four traffic datasets show STAGCN achieves better prediction accuracy than existing methods.
This document summarizes and compares several deep learning models for traffic speed prediction:
- DCRNN was the first model to use GCN in the traffic domain, combining GCN with RNN. STGCN used a many-to-one architecture with GCN and CNN 12 times to predict 12 time sequences.
- ASTGCN used attention mechanisms with GCN and CNN, applying spatial and temporal attention. STSGCN simultaneously captured spatial and temporal features using a localized spatial-temporal graph with GCN.
- Graph-WaveNet used GCN and dilated CNN with an adaptive adjacency matrix. Additional experiments are needed to fairly evaluate models and ablate contributions of components.
The document proposes a method for traffic flow prediction using KNN and LSTM. KNN is used to select neighboring stations that are spatially correlated with the test station. A two-layer LSTM network then predicts traffic flow in the selected stations. The predictions are combined using weighted averaging to obtain the final prediction for the test station. An experiment on traffic data from Minnesota highways found the proposed method improved prediction accuracy over ARIMA, SVR, WNN, DBN-SVR, and LSTM-only models, achieving on average a 12.59% reduction in error.
The document proposes a new framework called Spatial-Temporal Synchronous Graph Convolutional Networks (STSGCN) for spatial-temporal network data forecasting. STSGCN is able to capture complex localized spatial-temporal correlations through an elaborately designed spatial-temporal synchronous modeling mechanism. It constructs localized spatial-temporal graphs by connecting individual spatial graphs of adjacent time steps. STSGCN then uses a Spatial-Temporal Synchronous Graph Convolutional Module to directly capture the localized spatial-temporal correlations in these graphs. It also includes multiple modules to effectively model heterogeneities in different time periods. Extensive experiments on four real-world datasets demonstrate that STSGCN achieves state-of-the-art performance.
Traffic Flow Forecasting with Spatial-Temporal Graph Diffusion Networkivaderivader
The document summarizes the key aspects of the paper "Traffic Flow Forecasting with Spatial-Temporal Graph Diffusion Network". It introduces the ST-GDN framework which uses a hierarchical graph neural architecture to model temporal hierarchies and learn traffic dependencies within and across regions via graph diffusion. The methodology section explains the multi-scale temporal modeling, global context learning using attention mechanisms, and region-wise relation encoding on the graph. The experiment section compares ST-GDN to other baselines on taxi and bike traffic datasets, finding ST-GDN achieves better performance. The conclusion discusses potential extensions like using different clustering and adjacency matrix approaches.
Deep Learning Neural Network Approaches to Land Use-demographic- Temporal bas...civejjour
This document describes research that uses deep learning neural network approaches to predict traffic based on land use, demographic, and temporal data. Specifically, it proposes using Deep Neural Network Regression (DNN-Regression) and Recurrent Neural Network (DNN-RNN) models. DNN-Regression focuses on demographic and land use variables to predict traffic, while DNN-RNN predicts traffic temporally in the same geographic area using long short-term memory. The models are evaluated using datasets from Calgary, Canada, and are found to outperform other existing approaches to traffic prediction.
[20240325_LabSeminar_Huy]Spatial-Temporal Fusion Graph Neural Networks for Tr...thanhdowork
This document outlines a spatial-temporal graph neural network model called STFGN for traffic flow forecasting. STFGN constructs a temporal graph using dynamic time warping to capture temporal dependencies beyond spatial proximity. It then builds a fusion graph combining the spatial, temporal, and temporal connectivity graphs. The model uses STFGN modules and gated dilated CNNs to learn local and global spatial-temporal patterns. Experiments on California traffic datasets show STFGN achieves state-of-the-art performance for traffic flow prediction.
The document proposes an Attention Temporal Graph Convolutional Network (A3T-GCN) model for traffic forecasting that aims to simultaneously capture global temporal dynamics and spatial correlations in traffic data. The A3T-GCN combines a Graph Convolutional Network (GCN) to learn spatial dependencies based on road network topology with a Gated Recurrent Unit (GRU) to learn temporal trends, and introduces an attention mechanism to adjust the importance of different time points and better predict traffic. Experimental results on real-world datasets demonstrate the effectiveness of the proposed A3T-GCN model.
This document summarizes a survey of traffic data visualization techniques. It describes how traffic data is collected from sensors and organized. Common visualization methods are discussed like using lines and regions to show trajectories and aggregated spatial data, and space-time cubes to visualize trajectories over time. The document also outlines techniques for visualizing multiple attributes, analyzing patterns and clustering trajectories, monitoring traffic situations, and exploring data. Future work opportunities are developing systems for big data, situation awareness, and integrating different visualization views.
In this short presentation, we will provide some recent developments in the field of crowd monitoring, modelling and management. We will illustrate these by showing various projects that we are involved in, including the SmartStation project, and the different events organised in and around the city of Amsterdam (including the Europride, SAIL, etc.).
In the talk, we will discuss the different components of the system and the methods and technology involved in these. We focus on advanced data collection techniques, the use of social media data, data fusion and the advanced macroscopic modelling required for this. Also, we will show examples of interventions that have been tested, showing how these systems are used in practise.
This document proposes a new graph neural network model called Spatio-Temporal Pivotal Graph Neural Networks (STPGNN) for traffic flow forecasting. STPGNN first identifies pivotal nodes in the traffic network that exhibit extensive connections using a scoring mechanism. It then constructs a pivotal graph and uses a novel pivotal graph convolution module to capture spatio-temporal dependencies centered around these pivotal nodes. Experiments on several traffic datasets show STPGNN achieves better forecasting accuracy compared to other methods while maintaining lower computational cost.
The document discusses approaches for providing accurate and robust traffic forecasts using empirical data. It describes two main types of empirical approaches: basic forecast approaches that use individual prediction models, and combined forecast approaches that combine different forecasts into a single prediction. Basic approaches can be parametric (e.g. linear regression) or non-parametric (e.g. neural networks), while combined approaches aim to improve accuracy by incorporating each model's unique strengths. The document provides an overview of various techniques within each category.
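As a concrete example of a combined forecast, a common scheme weights each basic model's prediction by the inverse of its recent error (a generic sketch; the document does not prescribe a specific combination rule):

```python
def combine_forecasts(forecasts, errors):
    """Inverse-error weighted average: models with a smaller
    recent error contribute more to the combined prediction."""
    weights = [1.0 / e for e in errors]
    return sum(w * f for w, f in zip(weights, forecasts)) / sum(weights)
```

With forecasts of 10 and 20 and equal errors the result is the plain mean, 15; tripling the second model's error pulls the combination toward the first model.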
Prediction and Planning for Self-Driving at Waymo (Yu Huang)
ChauffeurNet: Learning To Drive By Imitating The Best Synthesizing The Worst
Multipath: Multiple Probabilistic Anchor Trajectory Hypotheses For Behavior Prediction
VectorNet: Encoding HD Maps And Agent Dynamics From Vectorized Representation
TNT: Target-driven Trajectory Prediction
Large Scale Interactive Motion Forecasting For Autonomous Driving : The Waymo Open Motion Dataset
Identifying Driver Interactions Via Conditional Behavior Prediction
Peeking Into The Future: Predicting Future Person Activities And Locations In Videos
STINet: Spatio-temporal-interactive Network For Pedestrian Detection And Trajectory Prediction
This document summarizes research presented at recent Argument Mining workshops. It describes tasks from 2021 and 2022 on key point analysis and validity-novelty prediction. It also outlines papers on extracting argument components, predicting reasoning markers, and detecting argument boundaries. The document indicates that argument mining can analyze debates on social media and that legal corpora can help develop argument mining systems.
This document summarizes several papers being presented at the CHI 2023 conference related to human-AI collaboration, mental healthcare, comment visualization, and virtual reality. For human-AI collaboration, one paper studies the effects of AI code generators on novice programmers and another aims to bridge the abstraction gap between end users and large language models. A third paper investigates whether groups or individuals are better at AI-assisted decision making. For mental healthcare, two papers study user perceptions of recommender systems and interest in self-tracking technologies. For comment visualization, a paper examines selective avoidance of uncongenial facts. Three papers on virtual reality discuss opportunities for metaverse workspaces, embodying physics-aware avatars, and mapping finger motions
DDGK: Learning Graph Representations for Deep Divergence Graph Kernels (ivaderivader)
This document summarizes a research paper on learning graph representations for deep divergence graph kernels (DDGK). DDGK learns graph representations without supervision or domain knowledge by using a node-to-edges encoder and isomorphism attention. The isomorphism attention provides a bidirectional mapping between nodes in two graphs. DDGK then calculates a divergence score between the source and target graphs as a measure of their (dis)similarity. Experimental results showed DDGK produces representations competitive with other graph kernel baselines. The paper proposes several extensions, including different graph encoders and attention mechanisms, as well as improved regularization and scalability.
So Predictable! Continuous 3D Hand Trajectory Prediction in Virtual Reality (ivaderivader)
This document summarizes research on developing a hybrid kinematic regression model for continuous 3D hand trajectory prediction in virtual reality. The model combines classical kinematics with regression to predict hand movements up to 300ms into the future with low error. A user study was conducted to collect training data from structured pointing tasks and unstructured VR gameplay. The model was evaluated through cross-validation and testing on new users and activities, demonstrating good performance without retraining. Limitations and opportunities for future work are discussed, such as expanding the model to support two-handed interactions and other types of tasks.
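The classical-kinematics half of such a hybrid can be sketched as constant-acceleration extrapolation (an illustrative sketch, not the paper's learned model; the function name is an assumption):

```python
def predict_position(p, v, a, dt):
    """Extrapolate a 3-D position p with velocity v and acceleration a
    over a horizon dt, using p + v*dt + 0.5*a*dt^2 per axis."""
    return tuple(pi + vi * dt + 0.5 * ai * dt * dt
                 for pi, vi, ai in zip(p, v, a))
```

The regression component would then learn to correct the residual between this kinematic guess and the hand's true future position.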
Reinforcement Learning-based Placement of Charging Stations in Urban Road Net... (ivaderivader)
This document summarizes a research paper that proposes using reinforcement learning to optimize the placement of electric vehicle charging stations in urban road networks. It formulates the charging station placement task as a reinforcement learning problem to maximize a defined utility function. The paper details the implementation of a deep Q-learning agent that takes actions to create, increase, or relocate charging stations. It evaluates the reinforcement learning approach on real-world road networks and charging demand datasets from two German cities, finding it outperforms baseline greedy algorithms in maximizing utility while meeting constraints.
Prediction for Retrospection: Integrating Algorithmic Stress Prediction into ... (ivaderivader)
This document describes MindScope, an algorithm-assisted stress management system for college students. It collects behavioral data from students' smartphones to predict their stress levels and provides explanations of the predictions. A study was conducted with 36 students who used MindScope for 25 days. It found that providing algorithmic explanations for stress predictions helped reduce students' perceived stress levels over time. It also supported their stress management practices and self-reflection. However, the study was limited by its short duration and lack of a control group.
A Style-Based Generator Architecture for Generative Adversarial Networks (ivaderivader)
StyleGAN is a generative adversarial network that achieves disentangled and scalable image generation. It uses adaptive instance normalization (AdaIN) to modify feature statistics at different scales, allowing scale-specific image stylization. The generator is designed as a learned mapping from latent space to image space. Latent codes are fed into each layer and transformed through AdaIN to modify feature statistics. This disentangles high-level attributes like pose, hair, etc. and allows controllable image synthesis through interpolation in latent space.
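The AdaIN operation mentioned above replaces a feature map's mean and standard deviation with those derived from the style input. A minimal 1-D sketch (real StyleGAN applies this per channel on learned feature maps):

```python
import math

def adain(x, y, eps=1e-5):
    """Normalize x to zero mean / unit std, then rescale it to the
    mean and std of y (adaptive instance normalization)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / len(x) + eps)
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / len(y) + eps)
    return [sy * (v - mx) / sx + my for v in x]
```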
Neural Approximate Dynamic Programming for On-Demand Ride-Pooling (ivaderivader)
This document summarizes a research paper on using neural approximate dynamic programming (NeurADP) to solve the ride-pooling matching problem (RMP) of optimally assigning passenger requests to vehicles. The key contributions are developing NeurADP, which uses a neural network to approximate the value function within an ADP framework, and connecting it to reinforcement learning. This allows NeurADP to handle the complex RMP involving partially filled vehicles better than prior methods. The researchers test NeurADP on New York City taxi data and find that changing constraints like maximum wait time and vehicle capacity affects NeurADP's ability to serve more requests while avoiding myopic assignments.
Bad Breakdowns, Useful Seams, and Face Slapping: Analysis of VR Fails on YouTube (ivaderivader)
This document analyzes 233 VR fail videos from YouTube to understand types and causes of VR fails outside the lab. The analysis found that the most common types were excessive reaction (53%) and falling over (18%), while the top causes were fear (40%) and sensorimotor mismatch (26%). Spectators were often laughing or expressing concern. The analysis suggests design implications like changing play spaces, mechanics, and interactions to prevent fails while promoting seamful design and spectator engagement. Limitations include the corpus not being broadly representative and difficulty analyzing home videos.
Invertible Denoising Network: A Light Solution for Real Noise Removal (ivaderivader)
This document summarizes an academic paper on invertible denoising networks (InvDN), a new type of neural network designed for real image denoising. The key points are:
1) InvDN is the first paper to design invertible networks for real image denoising by utilizing two different distributions - one for the noisy image and one for the clean image.
2) InvDN shows state-of-the-art results on benchmark datasets by restoring images and generating new noisy images from the restored clean images.
3) InvDN works by splitting images into low and high frequency components in the frequency domain and handling each separately, allowing the disentanglement of clean image and noise information needed for
Traffic Demand Prediction Based Dynamic Transition Convolutional Neural Network (ivaderivader)
The document proposes a traffic demand prediction model based on a dynamic transition convolutional neural network. It defines a transition network using density-peak based clustering to identify virtual stations. A dynamic graph convolutional gated recurrent unit is developed to model the station-to-station transitions over time. Meteorological data is also incorporated as external features to improve the prediction performance. The model is evaluated on bike and taxi demand data from New York City and shows improved results over various baseline models.
MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training (ivaderivader)
The document introduces MusicBERT, a large-scale pre-trained Transformer model for symbolic music understanding. MusicBERT uses a novel encoding method called OctupleMIDI to efficiently encode music sequences into compact tokens. It also employs a bar-level masking strategy during pre-training. MusicBERT is evaluated on four music understanding tasks and achieves state-of-the-art results, demonstrating the effectiveness of its encoding method and pre-training approach.
Screen2Vec: Semantic Embedding of GUI Screens and GUI Components (ivaderivader)
This document presents Screen2Vec, a technique for generating semantic embeddings of GUI screens and components using text, visual design, layout patterns, and app metadata. It has a two-level architecture that learns embeddings at the component and screen levels. The technique is self-supervised and does not require labeled data. It was trained on the RICO dataset and can be used for tasks like nearest neighbor retrieval and representing mobile tasks. Sample results showed Screen2Vec generated more comprehensive representations compared to baselines using only text or layout.
Augmenting Decisions of Taxi Drivers through Reinforcement Learning for Impro... (ivaderivader)
The document summarizes research on using reinforcement learning to help taxi drivers maximize revenue by recommending optimal locations to drive to at different times and days. The methodology involves modeling driver decisions as activity graphs from taxi trajectory data and then using Monte Carlo reinforcement learning with states defined by zone, time, and day to learn an optimal policy. Experiments show the learned policy is able to achieve higher average revenues than top-earning drivers and common heuristics, demonstrating the potential of reinforcement learning to help taxi drivers.
Natural Language to Visualization by Neural Machine Translation (ivaderivader)
This document summarizes a method called ncNet that uses neural machine translation to translate natural language queries to Vega-Zero visualization specifications. Key points include:
- ncNet is a Transformer-based seq2seq model that can translate natural language queries about data into specifications for visualizations.
- It uses techniques like attention forcing and visualization-aware translation to generate valid Vega-Zero outputs from the queries.
- An evaluation compares ncNet to existing methods on a dataset of natural language queries paired with Vega-Zero specifications, finding it outperforms baselines.
CAKE: Sharing Slices of Confidential Data on Blockchain (Claudio Di Ciccio)
Presented at the CAiSE 2024 Forum, Intelligent Information Systems, June 6th, Limassol, Cyprus.
Synopsis: Cooperative information systems typically involve various entities in a collaborative process within a distributed environment. Blockchain technology offers a mechanism for automating such processes, even when only partial trust exists among participants. The data stored on the blockchain is replicated across all nodes in the network, ensuring accessibility to all participants. While this aspect facilitates traceability, integrity, and persistence, it poses challenges for adopting public blockchains in enterprise settings due to confidentiality issues. In this paper, we present a software tool named Control Access via Key Encryption (CAKE), designed to ensure data confidentiality in scenarios involving public blockchains. After outlining its core components and functionalities, we showcase the application of CAKE in the context of a real-world cyber-security project within the logistics domain.
Paper: https://doi.org/10.1007/978-3-031-61000-4_16
UiPath Test Automation using UiPath Test Suite series, part 6 (DianaGray10)
Welcome to the UiPath Test Automation using UiPath Test Suite series, part 6. In this session, we will cover test automation with generative AI and OpenAI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and OpenAI
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
How to Get CNIC Information System with Paksim Ga (danishmna97)
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
GraphRAG for Life Science to increase LLM accuracy (Tomaz Bratanic)
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
Threats to mobile devices are increasingly prevalent and growing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features trade security for convenience and capability. This best practices guide outlines steps users can take to better protect personal devices and information.
TrustArc Webinar - 2024 Global Privacy Survey (TrustArc)
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Monitoring and Managing Anomaly Detection on OpenShift (Tosin Akinosho)
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
Full-RAG: A modern architecture for hyper-personalization (Zilliz)
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer's life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence (IndexBug)
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
Generating privacy-protected synthetic data using Secludy and Milvus (Zilliz)
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Ocean Lotus Threat Actors Project by John Sitima 2024 (SitimaJohn)
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
What do a Lego brick and the XZ backdoor have in common? (Speck&Tech)
ABSTRACT: At first glance, a Lego brick and the XZ backdoor might appear to have in common only the fact that both are building blocks, or dependencies, of creative and software projects. In reality, a Lego brick and the XZ backdoor case share much more than that.
Join the presentation to dive into a story of interoperability, standards, and open formats, and then discuss the important role contributors play in a sustainable open source community.
BIO: An advocate of free software and of standard, open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia Association, where she was involved in several events, migrations, and training activities related to LibreOffice. Previously she worked on LibreOffice migrations and training courses for several public administrations and private organizations. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when she is not following her passion for computers and Geeko she cultivates her curiosity about astronomy (the origin of her nickname, deneb_alpha).
1. Revisiting Spatial-Temporal Similarity:
A Deep Learning Framework for Traffic Prediction
Association for the Advancement of Artificial Intelligence, 2019
Huaxiu Yao, Xianfeng Tang, Hua Wei, Guanjie Zheng, Zhenhui Li
January 22, 2021
Presenter: KyungHwan Moon
2. Contents
• Overview of the Paper
• Background and Motivation
• Introduction
• Spatial-Temporal Dynamic Network
• Experiment
• Conclusion and Discussion
3. Overview of the Paper
Local Spatial-Temporal Network
Local spatial dependency
Short-term Temporal Dependency
Spatial Dynamic Similarity:
Flow Gating Mechanism
Temporal Dynamic Similarity:
Periodically Shifted
Attention Mechanism
Joint Training
• Local vs. Non-Local
∎ Local: operates within the size of the kernel
(local operator)
∎ Non-Local: uses a non-local block over the entire image
area (non-local operator)
• Spatial & Temporal
• Static(Stationary) vs Dynamic
• Periodic & dependency
Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He. Carnegie Mellon University and Facebook AI Research, 2018
4. Background and Motivation
● Traffic prediction has drawn increasing attention in the AI research field
∎ The increasing availability of large-scale traffic data and its importance in the real world
● Existing works make strong assumptions
∎ Spatial dependence is stationary in time
∎ Temporal dynamics is strictly periodical
● However, in practice the spatial dependence could be dynamic
∎ The spatial dependencies between locations are dynamic
∎ The temporal dependency follows daily and weekly patterns,
but it is not strictly periodic because of dynamic temporal shifting
5. Background and Motivation
• Previous methods
∎ Traffic time series for each individual location
(Li et al. 2012; Moreira-Matias et al. 2013;)
(Shekhar and Williams 2008; Lippi, Bertini, and Frasconi 2013)
• Recent studies
∎ Taking into account spatial information
-Adding regularizations on model similarity for nearby locations
(Deng et al. 2016; Idé and Sugiyama 2011; Zheng and Ni 2013)
∎ Taking into account external context information
-Adding features of venue information, weather condition, and local events
(Wu, Wang, and Li 2016; Pan, Demiryurek, and Shahabi 2012; Tong et al. 2017)
※ These approaches do not capture the complex non-linear spatial-temporal dependency well
6. Background and Motivation
• Modeling the non-linear spatial dependency
∎ A heatmap image and Convolutional Neural Network(CNN)
(Zhang, Zheng, and Qi 2017; Zhang et al. 2016; Ma et al. 2017)
• Modeling the non-linear temporal dependency
∎ Recurrent Neural Network(RNN)-based framework
(Yu et al. 2017; Cui, Ke, and Wang 2016)
• Modeling both spatial and temporal dependencies
∎ Integrating CNN and Long Short-Term Memory(LSTM)
(Yao et al. 2018)
7. Background and Motivation
• Limitation (1)
∎ The spatial dependency between locations relies only on the similarity of historical traffic
(Zhang, Zheng, and Qi 2017; Yao et al. 2018)
∎ The model learns a static spatial dependency
※ The dependencies between locations could change over time
• Limitation (2)
∎ Many existing studies ignore the shifting of long-term periodic dependency
∎ Traffic data show a strong daily and weekly periodicity and the dependency based on
such periodicity can be useful for prediction
※ The traffic data are not strictly periodic
※ Consider the sequential dependency and the temporal shifting in the periodicity
8. Introduction
• Spatial-Temporal Dynamic Network (STDN)
∎ Based on a spatial-temporal neural network, which handles spatial and temporal
information via a local CNN and LSTM, respectively
A flow-gated local CNN
⦁Models the spatial dependency through the dynamic similarity
among locations, using traffic flow information
A periodically shifted attention mechanism
⦁Learns the long-term periodic dependency and captures
both long-term periodic information and temporal shifting
in the traffic sequence via an attention mechanism
LSTM handles the sequential dependency in a hierarchical way
10. Introduction
• Datasets
∎ Taxi data of New York City (NYC)
∎ Bike-sharing data of NYC
• Contribution Summarization
∎ A flow gating mechanism to explicitly model dynamic spatial similarity
-The gate controls information propagation among nearby locations
∎ A periodically shifted attention mechanism that takes long-term periodic information
and temporal shifting into account simultaneously
∎ Experiments on several real-world traffic datasets
-The results show that our model is consistently better than other state-of-the-art
methods
12. Spatial-Temporal Dynamic Network (STDN)
Periodically shifted attention mechanism (PSAM) captures the
long-term periodic dependency and temporal shifting. For each
day, we also use LSTM to capture the sequential information.
Figure 1: The architecture of STDN
(a)
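The idea behind PSAM can be sketched as attention over time slots shifted around the same period on previous days; the sketch below scores candidate slots by closeness to the query and takes a softmax-weighted average (an illustration with scalar features, not the paper's learned attention):

```python
import math

def shifted_attention(query, shifted_values):
    """Softmax attention over shifted time-slot values:
    slots similar to the query receive larger weights."""
    scores = [-abs(query - v) for v in shifted_values]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
    z = sum(exps)
    return sum((e / z) * v for e, v in zip(exps, shifted_values))
```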
13. Spatial-Temporal Dynamic Network (STDN)
The short-term temporal dependency is captured by one LSTM.
Figure 1: The architecture of STDN
(b)
14. Spatial-Temporal Dynamic Network (STDN)
The flow gating mechanism (FGM) tracks the dynamic spatial
similarity representation by controlling the spatial information
propagation; FC means fully connected layers and Conv means
several convolutional layers.
Figure 1: The architecture of STDN
(c)
15. Spatial-Temporal Dynamic Network (STDN)
A unified multi-task prediction component predicts two types of traffic
volumes (start volume and end volume) simultaneously.
Figure 1: The architecture of STDN
(d)
16. Spatial-Temporal Dynamic Network (STDN)
• Local Spatial-Temporal Network
∎ Combine local CNN and LSTM (Yao et al. 2018)
-To capture spatial and temporal sequential dependency
-To deal with spatial and short-term temporal dependency
∎ Integrate and predict start and end volumes
-To mutually reinforce the prediction of two types of
traffic volumes
17. Spatial-Temporal Dynamic Network (STDN)
• Local spatial dependency
∎ Convolutional neural network (CNN)
-To capture the spatial interactions
-Treating the entire city as an image and simply applying a CNN
may not achieve the best performance: including regions
with weak correlations to predict a target region actually
hurts the performance.
∎ Use the local CNN
-To model the spatial dependency
For each time interval t, the target region i is at the center of the image,
which is processed by K convolutional layers
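The local CNN input described above can be sketched as cropping an S x S neighborhood centered on the target region, zero-padding beyond the city boundary (an illustrative sketch; the function name and padding choice are assumptions):

```python
def local_patch(grid, i, j, s):
    """Crop the s x s neighborhood centered on cell (i, j) of a
    2-D city grid, filling cells outside the grid with zeros."""
    assert s % 2 == 1, "patch size must be odd so (i, j) is the center"
    r = s // 2
    patch = []
    for di in range(-r, r + 1):
        row = []
        for dj in range(-r, r + 1):
            ii, jj = i + di, j + dj
            inside = 0 <= ii < len(grid) and 0 <= jj < len(grid[0])
            row.append(grid[ii][jj] if inside else 0)
        patch.append(row)
    return patch
```

Each patch is then fed through the K convolutional layers, so the filters only ever see the target region and its near neighbors.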
18. Spatial-Temporal Dynamic Network (STDN)
• Short-term Temporal Dependency
∎ Long Short-Term Memory (LSTM)
-To address the exploding and vanishing gradient issues of
the traditional Recurrent Neural Network (RNN)
-To capture the temporal sequential dependency
-The original version of LSTM
(Hochreiter and Schmidhuber 1997)
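For reference, one step of the standard LSTM cell (Hochreiter and Schmidhuber 1997) can be written out with scalar weights (a didactic sketch; real implementations use weight matrices over vectors):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, W):
    """One scalar LSTM step. W maps each gate name ('i', 'f',
    'o', 'g') to (input weight, hidden weight, bias)."""
    i = sigmoid(W["i"][0] * x + W["i"][1] * h + W["i"][2])   # input gate
    f = sigmoid(W["f"][0] * x + W["f"][1] * h + W["f"][2])   # forget gate
    o = sigmoid(W["o"][0] * x + W["o"][1] * h + W["o"][2])   # output gate
    g = math.tanh(W["g"][0] * x + W["g"][1] * h + W["g"][2])  # candidate
    c_new = f * c + i * g          # gated update of the memory cell
    h_new = o * math.tanh(c_new)   # gated exposure of the cell state
    return h_new, c_new
```

The multiplicative gates are what let gradients flow through c over long sequences, avoiding the vanishing-gradient problem of plain RNNs.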
19. Spatial-Temporal Dynamic Network (STDN)
• Spatial Dynamic Similarity: Flow Gating Mechanism-(1)
∎ CNN vs Local CNN
CNN vs. Local CNN
⦁CNN handles the local structural similarity through
local connections and weight sharing
⦁In a local CNN, the spatial dependency relies on
the similarity of historical traffic volume
⦁This spatial dependency is stationary, which cannot
fully reflect the relation between the target region and its
neighbors
Traffic flow is a more direct way to represent interactions between regions
The relation between two regions is stronger (i.e., they are more
similar) if more flows exist between them
21. 20
Spatial-Temporal Dynamic Network (STDN)
• Spatial Dynamic Similarity: Flow Gating Mechanism-(2)
∎ Flow Gating Mechanism (FGM)
-To explicitly capture dynamic spatial dependency in the
hierarchy
-Construct a local spatial flow image to preserve the
spatial dependency of flows
Traffic flow falls into two categories:
⦁ Inflow: traffic departing from other locations and ending in
this region during the time interval
⦁ Outflow: traffic starting from this region toward other locations
23. 22
Spatial-Temporal Dynamic Network (STDN)
• Spatial Dynamic Similarity: Flow Gating Mechanism-(3)
∎ Flow Gating Mechanism (FGM)
-Stack the acquired flow matrices into a local flow image
-Use a CNN to model the spatial flow interactions between
regions
24. 23
Spatial-Temporal Dynamic Network (STDN)
• Spatial Dynamic Similarity: Flow Gating Mechanism-(4)
∎ Flow Gating Mechanism (FGM)
-Use a CNN to model the spatial flow interactions between
regions
-After K gated convolutional layers, a flatten layer
followed by a fully connected layer yields the flow-gated
spatial representation
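The gating step itself can be sketched as an elementwise sigmoid gate over the spatial feature maps; representing a feature map as a flat list is a simplification for illustration only:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def flow_gate(spatial_feat, flow_feat):
    """Gate spatial features by flow features, elementwise.

    spatial_feat: activations of a spatial conv layer (flat list)
    flow_feat: activations of the corresponding flow conv layer

    Strong flow between two regions pushes sigmoid toward 1 and lets
    the spatial activation pass; weak flow suppresses it toward 0.
    """
    return [s * sigmoid(f) for s, f in zip(spatial_feat, flow_feat)]
```

This is the sense in which the flow information "gates" the spatial representation at every convolutional layer.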
25. 24
Spatial-Temporal Dynamic Network (STDN)
• Temporal Dynamic Similarity
: Periodically Shifted Attention Mechanism (PSAM)-(1)
∎ The mechanisms above overlook long-term dependency
(e.g., periodicity), which is an important property of the
spatial-temporal prediction problem
(Zonoozi et al. 2018; Feng et al. 2018)
⦁ Temporal shifting between different days ⦁ Temporal shifting between different weeks
The temporal shifting of periodicity
Each time in these figures represents a time interval (e.g., 9:30am means 9:00-9:30am)
26. 25
Spatial-Temporal Dynamic Network (STDN)
• Temporal Dynamic Similarity
: Periodically Shifted Attention Mechanism (PSAM)-(2)
∎ Training an LSTM to handle long-term information is a
nontrivial task, since the increasing sequence length enlarges
the risk of gradient vanishing and thus significantly weakens
the effect of periodicity.
∎ Periodically Shifted Attention Mechanism (PSAM)
-To take long-term periodic information into
consideration
-To tackle the limitation that the periodicity is not strictly
daily or weekly
27. 26
Spatial-Temporal Dynamic Network (STDN)
• Temporal Dynamic Similarity
: Periodically Shifted Attention Mechanism (PSAM)-(2)
∎ Select |Q| time intervals from each day to tackle the
potential temporal shifting
-For example, if the predicted time is 9:00-9:30pm, we
select one hour before and after the predicted time
(i.e., 8:00-10:30pm, so |Q| = 5)
∎ Use an LSTM to preserve the sequential information
of each day
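Picking the shifted window can be sketched as follows; the 30-minute interval indexing and the default shift of two intervals (one hour) on each side are illustrative assumptions matching the example above:

```python
def shifted_window(pred_idx, shift=2):
    """Indices of the |Q| intervals around the predicted interval.

    pred_idx: index of the predicted 30-minute interval within a day
              (interval 0 starts at midnight, so 9:00-9:30pm is 42).
    shift: intervals selected on each side; shift=2 means one hour
           before and after, giving |Q| = 2*shift + 1 = 5.
    """
    return list(range(pred_idx - shift, pred_idx + shift + 1))
```

These |Q| intervals from each of the previous days are then fed through a per-day LSTM.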
28. 27
Spatial-Temporal Dynamic Network (STDN)
• Temporal Dynamic Similarity
: Periodically Shifted Attention Mechanism (PSAM)-(3)
∎ Adopt an attention mechanism to capture the temporal
shifting and obtain a weighted representation of each
previous day
-The representation of each previous day is a weighted sum
of the hidden states of its selected time intervals
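The attention pooling can be sketched as below. Using a plain dot product as the score function is an assumption for illustration; the paper learns its score function:

```python
import math

def softmax(scores):
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, query):
    """Weighted representation of one previous day.

    hidden_states: one hidden vector per selected interval in Q
    query: the short-term representation used to score each interval
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query))
              for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden_states))
            for d in range(dim)]
```

Because the weights are learned from the data, the model can put more mass on an interval that is slightly earlier or later than the predicted time, which is exactly how temporal shifting is absorbed.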
30. 29
Spatial-Temporal Dynamic Network (STDN)
• Joint Training-(1)
∎ Concatenate the short-term representation
and long-term representation
∎ Through a fully connected layer, obtain the final predicted
start and end traffic volumes for each region
31. 30
Spatial-Temporal Dynamic Network (STDN)
• Joint Training-(2)
∎ Predict the start and end traffic volumes
simultaneously
- Loss function
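A reconstruction of the loss the slide refers to, following the description of jointly predicting start (s) and end (e) volumes for every region i and interval t, with a weight λ balancing the two terms; the exact symbols are an assumption consistent with that description:

```latex
\mathcal{L} = \sum_{i}\sum_{t}
  \lambda \left( y^{s,i}_{t+1} - \hat{y}^{s,i}_{t+1} \right)^{2}
  + (1-\lambda) \left( y^{e,i}_{t+1} - \hat{y}^{e,i}_{t+1} \right)^{2}
```

Training on both terms at once lets the shared representation serve both prediction targets.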
33. 32
Experiment
• Results
∎ Effectiveness of Flow Gating Mechanism (FGM)
-LSTN : only short-term temporal dependency and
local spatial dependency
-LSTN-FI : uses traffic flow information as features
instead of the flow gating mechanism
-LSTN-FGM : represents the spatial dynamic similarity between
local neighborhoods via the flow gating mechanism,
without the periodically shifted attention mechanism
Performance
Comparison
Effectiveness of
Flow Gating Mechanism
Effectiveness of
Periodically Shifted
Attention Mechanism
34. 33
Experiment
• Results
∎ Evaluation of Flow Gating Mechanism (FGM)
-Root Mean Square Error (RMSE)
-Mean Absolute Percentage Error (MAPE)
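Both evaluation metrics are standard and can be stated precisely in code:

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Square Error over paired observations."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
                     / len(y_true))

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error (in %); assumes no zero targets,
    which matches the common practice of filtering out near-zero
    volumes before evaluation."""
    return 100.0 * sum(abs(t - p) / abs(t)
                       for t, p in zip(y_true, y_pred)) / len(y_true)
```

RMSE penalizes large errors quadratically, while MAPE normalizes each error by the true volume, so the two metrics can rank models differently.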
35. 34
Experiment
• Results
∎ Effectiveness of Periodically Shifted Attention Mechanism (PSAM)
-LSTN-L : extends LSTN by taking long-term sequential
information into consideration
-LSTN-SL : removes the periodically shifted attention
mechanism from STDN and does not include the
flow gating mechanism
-LSTN-PSAM : adds the periodically shifted attention mechanism
on LSTN-SL, i.e., only the flow gating mechanism
is removed
36. 35
Experiment
• Results
∎ Evaluation of Periodically Shifted Attention Mechanism
(PSAM)
-Root Mean Square Error (RMSE)
-Mean Absolute Percentage Error (MAPE)
37. 36
Experiment
• Conclusion and Discussion
∎ Investigate the proposed model on other spatial-temporal prediction problems
∎ Explain the model
(i.e., the feature importance for traffic prediction)
∎ Select appropriate hyperparameters in correspondence with the data
∎ Examine how much performance depends on the amount of data