Traffic Prediction from Street Network images.pptx

TRAFFIC PREDICTION
FROM STREET NETWORK
IMAGES/SHAPE FILES
USING DEEP LEARNING

Why this?
◦ Timely and accurate traffic forecasting is crucial for urban traffic control and guidance. Due to the high nonlinearity and complexity of traffic flow, traditional methods cannot satisfy the requirements of mid-and-long term prediction tasks and often neglect spatial and temporal dependencies.
◦ Traffic prediction is something we want to incorporate into our energy transition projects, such as the Sibyl project, which aims to help people across the world switch to electric vehicles by developing the charging network and making it optimal and affordable. We, as Shell, have already done this for Germany, the Netherlands, the UK, the US (ongoing) and Singapore (starting January 2022, which I am leading).
◦ Traffic prediction is challenging for several reasons: data availability first of all, then the fact that streets rarely form a neat grid structure, then data preparation, and finally choosing a suitable model for prediction. I have recently worked through these challenges myself, and we will discuss them today.

What is this?
◦ Transportation plays a vital role in everybody’s daily life. According to a survey in 2015, U.S. drivers spend about 48 minutes on average behind the wheel daily.
◦ Under this circumstance, accurate real-time forecast of traffic conditions is of paramount importance for road users,
private sectors and governments. Widely used transportation services, such as flow control, route planning, and
navigation, also rely heavily on a high-quality traffic condition evaluation.
◦ In general, multi-scale traffic forecasting is the premise and foundation of urban traffic control and guidance, and is also one of the main functions of the Intelligent Transportation System (ITS).
◦ In traffic studies, the fundamental variables of traffic flow, namely speed, volume, and density, are typically chosen as indicators to monitor the current status of traffic conditions and to predict the future.
◦ Based on the length of prediction, traffic forecasting is generally classified into two scales: short-term (5 to 30 min) and medium-and-long term (over 30 min).

Challenges:
◦ Most prevalent statistical approaches (for example, linear regression) are able to perform well
on short interval forecast. However, due to the uncertainty and complexity of traffic flow, those
methods are less effective for relatively long-term predictions.
◦ Previous studies on mid-and-long term traffic prediction can be roughly divided into two
categories: dynamical modeling and data-driven methods.
◦ Dynamical modeling uses mathematical tools (e.g. differential equations) and physical knowledge to formulate traffic problems by computational simulation [Vlahogianni, 2015].
◦ To achieve a steady state, the simulation process not only requires sophisticated systematic programming but also consumes massive computational power. Impractical assumptions and simplifications in the modeling also degrade the prediction accuracy. Therefore, with the rapid development of traffic data collection and storage techniques, a large group of researchers are shifting their attention to data-driven approaches.
◦ Classic statistical and machine learning models are two major representatives of data-driven
methods.
Chirantan Gupta, Data Scientist, Royal Dutch Shell

Challenges:
◦ In time series analysis, the autoregressive integrated moving average (ARIMA) model and its variants are among the most consolidated approaches based on classical statistics [Ahmed and Cook, 1979; Williams and Hoel, 2003]. However, this type of model is limited by the stationarity assumption on time sequences and fails to take the spatio-temporal correlation into account.
◦ Therefore, these approaches have constrained representability of highly nonlinear traffic flow.
Recently, classic statistical models have been vigorously challenged by machine learning
methods on traffic prediction tasks.
◦ Higher prediction accuracy and more complex data modeling can be achieved by these models,
such as k-nearest neighbors algorithm (KNN), support vector machine (SVM), and neural
networks (NN).

Recent Attempts
DEEP LEARNING APPROACHES:
◦ Deep learning approaches have been widely and successfully applied to various traffic tasks. Significant progress has been made in related work, for instance with the deep belief network (DBN) [Jia et al., 2016; Huang et al., 2014]. (For reference, I can share details on what a DBN is; they are in the auxiliary material, courtesy of a Medium blog post.)
◦ However, it is difficult for these dense networks to extract spatial and temporal features from the input
jointly. Moreover, within narrow constraints or even complete absence of spatial attributes, the
representative ability of these networks would be hindered seriously.
◦ To take full advantage of spatial features, some researchers use convolutional neural network (CNN)
to capture adjacent relations among the traffic network, along with employing recurrent neural
network (RNN) on time axis.
◦ By combining a long short-term memory (LSTM) network [Hochreiter and Schmidhuber, 1997] and a 1-D CNN, Wu and Tan [2016] presented CLTFP, a feature-level fused architecture for short-term traffic flow forecasting. Although it adopted a straightforward strategy, CLTFP still made the first attempt to align spatial and temporal regularities.
◦ Afterwards, Shi et al. [2015] proposed the convolutional LSTM, which is an extended fully connected LSTM (FC-LSTM) with embedded convolutional layers. (For more information on how RNNs and LSTMs work, I will discuss another article as auxiliary material for this discussion, courtesy of Colah, 2015.)

Shortcomings
◦ As noted, CLTFP made the first attempt to align spatial and temporal regularities, and Shi et al. [2015] followed with the convolutional LSTM, an extended fully connected LSTM (FC-LSTM) with embedded convolutional layers.
◦ However, the normal convolutional operation applied restricts the model to only process grid
structures (e.g. images, videos) rather than general domains.
◦ Meanwhile, recurrent networks for sequence learning require iterative training, which introduces
error accumulation by steps. Additionally, RNN-based networks (including LSTM) are widely
known to be difficult to train and computationally heavy.

Remedial Measures
◦ To overcome these issues, I propose several strategies to effectively model the temporal dynamics and spatial dependencies of traffic flow. To fully utilize spatial information, we model the traffic network as a general graph instead of treating it as separate pieces (e.g. grids or segments). To handle the inherent deficiencies of recurrent networks, we employ a fully convolutional structure on the time axis. Above all, we propose a novel deep learning architecture, the spatio-temporal graph convolutional network, for traffic forecasting tasks. This architecture comprises several spatio-temporal convolutional blocks, which combine graph convolutional layers [Defferrard et al., 2016] and convolutional sequence learning layers to model spatial and temporal dependencies.
NOVELTY:
◦ To the best of my knowledge, this is the first time purely convolutional structures are applied to extract spatio-temporal features simultaneously from graph-structured time series in a traffic study. We evaluate our proposed model on two real-world traffic datasets. Experiments show that our framework outperforms existing baselines in prediction tasks with multiple preset prediction lengths and network scales.

Traffic Prediction on Road Graphs
Traffic forecast is a typical time-series prediction problem, i.e. predicting the most likely traffic measurements (e.g. speed or traffic flow) in the next H time steps given the previous M traffic observations, as in the formula on the slide.
Formula for Traffic Prediction that I am attempting to use
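The formula itself did not survive as text. Assuming it is the forecasting objective stated by Yu et al. (2018), with $v_t \in \mathbb{R}^n$ the observation vector of n road segments at time step t (this notation is an assumption taken from that paper), it can be written as:

```latex
\hat{v}_{t+1}, \ldots, \hat{v}_{t+H}
  = \operatorname*{arg\,max}_{v_{t+1}, \ldots, v_{t+H}}
    \log P\left( v_{t+1}, \ldots, v_{t+H} \mid v_{t-M+1}, \ldots, v_t \right)
```

In words: choose the H future measurements that are most likely given the last M observations.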

Traffic Prediction on Road Graphs
◦ In this work, we define the traffic network on a graph and focus on structured traffic time series. The observation $v_t$ is not independent but linked by pairwise connections in the graph. Therefore, the data point $v_t$ can be regarded as a graph signal defined on an undirected (or directed) graph $\mathcal{G}$ with weights $w_{ij}$, as shown on the slide.
◦ At the t-th time step, in the graph $\mathcal{G}_t = (\mathcal{V}_t, \mathcal{E}, W)$, $\mathcal{V}_t$ is a finite set of vertices, corresponding to the observations from n monitor stations in a traffic network; $\mathcal{E}$ is a set of edges, indicating the connectedness between stations; and $W \in \mathbb{R}^{n \times n}$ denotes the weighted adjacency matrix of $\mathcal{G}_t$.

Spectral Convolution
◦ In a spectral graph convolution, we perform an eigendecomposition of the Laplacian matrix of the graph. This eigendecomposition helps us understand the underlying structure of the graph, with which we can identify clusters or sub-groups. The eigenvectors play the role of a Fourier basis, so the convolution is carried out in the Fourier (spectral) space. An analogy is PCA, where we understand the spread of the data through an eigendecomposition of the covariance matrix of the features. A key difference between the two lies in the eigenvalues: in the spectral setting the smaller eigenvalues of the Laplacian explain the structure of the graph better, whereas in PCA it is the larger eigenvalues that matter. ChebNet and GCN are commonly used deep learning architectures that use spectral convolution.
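A minimal NumPy sketch of the idea, assuming a symmetric weighted adjacency matrix and the normalized Laplacian $L = I_n - D^{-1/2} W D^{-1/2}$. The toy graph, the signal values, and the `low_pass` kernel are all made up for illustration; in a real model the kernel would be learned.

```python
import numpy as np

# Hypothetical 4-node road graph (a path 0-1-2-3); weights are illustrative.
W = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])

def normalized_laplacian(W):
    """Return L = I - D^{-1/2} W D^{-1/2} for a symmetric adjacency matrix."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)  # assumes every node has at least one neighbour
    return np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def spectral_filter(W, x, kernel):
    """Filter graph signal x with a kernel defined on the Laplacian eigenvalues."""
    L = normalized_laplacian(W)
    lam, U = np.linalg.eigh(L)        # eigenvalues ascending; columns of U form the graph Fourier basis
    x_hat = U.T @ x                   # graph Fourier transform of the signal
    return U @ (kernel(lam) * x_hat)  # scale each frequency, then transform back

x = np.array([60.0, 55.0, 30.0, 58.0])     # e.g. speeds at four stations
low_pass = lambda lam: np.exp(-2.0 * lam)  # illustrative kernel that damps large eigenvalues
y = spectral_filter(W, x, low_pass)
```

The small eigenvalues correspond to slowly varying patterns over the graph, which is why a kernel that keeps them acts as a smoother.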

Data Acquisition Challenges
◦ Most of the data for this work will have to be purchased from external vendors.
◦ I have already spoken with key people at Shell, such as Saptarshi Das, PhD, who has offered his full support for this work and for data acquisition.
◦ Beyond that, Shell purchases a lot of data from different paid sources, and we can use those datasets as well with permission from Shell.
◦ Data acquisition is still challenging, because open-source traffic data, even when available, is usually either a sample or too small for research work. Hence we need to reach out to external vendors like HERE who can help us acquire it.

Convolutions on Graphs
◦ A standard convolution for regular grids is clearly not applicable to general graphs. There are
two basic approaches currently exploring how to generalize CNNs to structured data forms.
One is to expand the spatial definition of a convolution [Niepert et al., 2016], and the other is
to manipulate in the spectral domain with graph Fourier transforms [Bruna et al., 2013].
◦ The former approach rearranges the vertices into certain grid forms which can be processed by normal convolutional operations. The latter introduces the spectral framework to apply convolutions in spectral domains, often called the spectral graph convolution.

What was the result?
◦ Several follow-up studies make the graph convolution more promising by reducing the computational complexity from $O(n^2)$ to linear [Defferrard et al., 2016; Kipf and Welling, 2016].

Continued..
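◦ $\Theta *_{\mathcal{G}} x \approx \theta_0 x + \theta_1 \left( \frac{2}{\lambda_{max}} L - I_n \right) x \approx \theta_0 x - \theta_1 \left( D^{-\frac{1}{2}} W D^{-\frac{1}{2}} \right) x$
◦ where $\theta_0$ and $\theta_1$ are two shared parameters of the kernel; the second step assumes $\lambda_{max} \approx 2$ and the normalized Laplacian $L = I_n - D^{-\frac{1}{2}} W D^{-\frac{1}{2}}$. In order to constrain parameters and stabilize numerical performance, $\theta_0$ and $\theta_1$ are replaced by a single parameter $\theta$ by letting $\theta = \theta_0 = -\theta_1$; W and D are renormalized by
◦ $\tilde{W} = W + I_n$ and $\tilde{D}_{ii} = \sum_j \tilde{W}_{ij}$, respectively.
◦ Then the graph convolution can be written as:
◦ $\Theta *_{\mathcal{G}} x = \theta \left( I_n + D^{-\frac{1}{2}} W D^{-\frac{1}{2}} \right) x = \theta \left( \tilde{D}^{-\frac{1}{2}} \tilde{W} \tilde{D}^{-\frac{1}{2}} \right) x$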
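The renormalized first-order form can be sketched in a few lines of NumPy. This is an illustration under the assumption of a non-negative weight matrix W and a single scalar parameter; the function name `first_order_gconv` and the toy values are hypothetical.

```python
import numpy as np

def first_order_gconv(W, x, theta):
    """Apply theta * D~^{-1/2} W~ D~^{-1/2} x with W~ = W + I_n."""
    n = W.shape[0]
    W_tilde = W + np.eye(n)              # add self-loops
    d_tilde = W_tilde.sum(axis=1)        # D~_ii = sum_j W~_ij
    d_inv_sqrt = 1.0 / np.sqrt(d_tilde)  # safe for non-negative W: every degree is at least 1
    A_hat = d_inv_sqrt[:, None] * W_tilde * d_inv_sqrt[None, :]
    return theta * (A_hat @ x)

# Toy example: three fully connected stations with unit weights.
W_tri = np.ones((3, 3)) - np.eye(3)
x_tri = np.array([3.0, 6.0, 9.0])
out_tri = first_order_gconv(W_tri, x_tri, theta=2.0)
```

On this fully connected toy graph every node ends up with theta times the average of all three values, which shows the neighbourhood-averaging effect of the operator.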

Why Laplace Transform?
◦ The purpose of the Laplace Transform is to transform ordinary differential equations (ODEs) into algebraic equations, which makes it easier to solve ODEs. However, the Laplace Transform gives one more than that: it also does provide qualitative information on the solution of the ODEs (the prime example is the famous final value theorem, $\lim_{t \to \infty} f(t) = \lim_{s \to 0} s F(s)$).
◦ Not all functions have a Fourier Transform. The Laplace Transform is a generalized Fourier Transform, since it allows one to
obtain transforms of functions that have no Fourier Transforms. Does your function f(t) grow exponentially with time? Then it
has no FT. No problem, just multiply it by a decaying exponential that decays faster than f grows, and you now have a function
that has a Fourier Transform! The FT of that new function is the LT (evaluated on a line on the complex plane parallel to the
imaginary axis)

Proposed Model
◦ Composed of several spatiotemporal convolutional blocks, each of which is formed as a “sandwich” structure with two gated
sequential convolution layers and one spatial graph convolution layer in between.
Architecture of spatio-temporal graph convolutional networks. The framework STGCN consists of two spatio-temporal convolutional blocks (ST-Conv blocks) and an output layer in the end. Each ST-Conv block contains two temporal gated convolution layers and one spatial graph convolution layer in the middle. The residual connection and bottleneck strategy are applied inside each block. The input $v_{t-M+1}, \ldots, v_t$ is uniformly processed by ST-Conv blocks to explore spatial and temporal dependencies coherently. Comprehensive features are integrated by an output layer to generate the final prediction $\hat{v}$.
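To make the "sandwich" concrete, here is a NumPy sketch of one ST-Conv block's forward pass. It is a simplification, not the code from the slides: it assumes a gated linear unit of the form P * sigmoid(Q) for the temporal layers, a single normalized adjacency `A_hat` for the spatial layer, and it omits the residual connections and normalization. All function names and the random weights are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_temporal_conv(X, Wt):
    """'Valid' 1-D convolution along time followed by a GLU gate.

    X:  [T, n, C_in] sequence of graph signals
    Wt: [Kt, C_in, 2 * C_out] kernel; the two output halves are P and Q
    returns [T - Kt + 1, n, C_out] = P * sigmoid(Q)
    """
    Kt, _, two_c = Wt.shape
    T = X.shape[0]
    out = np.stack([
        np.einsum("knc,kcd->nd", X[t:t + Kt], Wt) for t in range(T - Kt + 1)
    ])
    P, Q = out[..., : two_c // 2], out[..., two_c // 2:]
    return P * sigmoid(Q)

def spatial_graph_conv(X, A_hat, Ws):
    """Mix features across neighbouring nodes at every time step, then apply ReLU."""
    return np.maximum(np.einsum("ij,tjc,cd->tid", A_hat, X, Ws), 0.0)

def st_conv_block(X, A_hat, Wt1, Ws, Wt2):
    """Temporal -> spatial -> temporal sandwich."""
    h = gated_temporal_conv(X, Wt1)
    h = spatial_graph_conv(h, A_hat, Ws)
    return gated_temporal_conv(h, Wt2)

rng = np.random.default_rng(0)
M, n, Kt = 12, 5, 3
A_hat = np.eye(n)                             # placeholder for a normalized adjacency matrix
X = rng.normal(size=(M, n, 1))                # one input channel (speed)
Wt1 = rng.normal(size=(Kt, 1, 2 * 64)) * 0.1  # channels 64, 16, 64 as quoted for PeMSD7
Ws = rng.normal(size=(64, 16)) * 0.1
Wt2 = rng.normal(size=(Kt, 16, 2 * 64)) * 0.1
out = st_conv_block(X, A_hat, Wt1, Ws, Wt2)
```

Each temporal layer shortens the sequence by Kt - 1 steps, so one block maps M = 12 input steps to 8 output steps here.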

Graph CNNs for Extracting Spatial Features
◦ The traffic network generally organizes as a graph structure. It is natural and reasonable to formulate road networks as graphs
mathematically. However, previous studies neglect spatial attributes of traffic networks: the connectivity and globality of the
networks are overlooked, since they are split into multiple segments or grids. Even with 2-D convolutions on grids, it can only
capture the spatial locality roughly due to compromises of data modeling. Accordingly, in our model, the graph convolution is
employed directly on graph structured data to extract highly meaningful patterns and features in the space domain.
◦ Computing the kernel Θ in the graph convolution as described earlier can be expensive because of the $O(n^2)$ multiplications with the graph Fourier basis U.
◦ Can we improve it? Perhaps by using the Chebyshev polynomial approximation.

Chebyshev’s Polynomial
◦ To localize the filter and reduce the number of parameters, the kernel Θ can be restricted to a polynomial of Λ as $\Theta(\Lambda) = \sum_{k=0}^{K-1} \theta_k \Lambda^k$, where $\theta \in \mathbb{R}^K$ is a vector of polynomial coefficients. K is the kernel size of graph convolution, which determines the maximum radius of the convolution from central nodes. Traditionally, the Chebyshev polynomial $T_k(x)$ is used to approximate kernels as a truncated expansion of order K−1 as $\Theta(\Lambda) \approx \sum_{k=0}^{K-1} \theta_k T_k(\tilde{\Lambda})$ with rescaled $\tilde{\Lambda} = 2\Lambda / \lambda_{max} - I_n$ ($\lambda_{max}$ denotes the largest eigenvalue of L) [Hammond et al., 2011]. The graph convolution can then be rewritten as $\Theta *_{\mathcal{G}} x = \Theta(L) x \approx \sum_{k=0}^{K-1} \theta_k T_k(\tilde{L}) x$, where $\tilde{L} = 2L / \lambda_{max} - I_n$ is the scaled Laplacian. This can reduce the time complexity to $O(K|\mathcal{E}|)$, where $|\mathcal{E}|$ is the number of edges of the graph.
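The Chebyshev form can be evaluated with the recurrence $T_k(x) = 2x T_{k-1}(x) - T_{k-2}(x)$, so no eigenvectors are needed when applying the filter. The sketch below is a dense NumPy illustration with hypothetical names and toy values; it computes $\lambda_{max}$ exactly for clarity, whereas reaching the quoted $O(K|\mathcal{E}|)$ cost would require sparse matrices and a cheap bound on $\lambda_{max}$.

```python
import numpy as np

def scaled_laplacian(W):
    """L~ = 2 L / lambda_max - I_n, with L the normalized Laplacian."""
    n = W.shape[0]
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)  # assumes no isolated nodes
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(L).max()
    return 2.0 * L / lam_max - np.eye(n)

def cheb_gconv(W, x, theta):
    """Compute sum_k theta[k] * T_k(L~) x using the Chebyshev recurrence."""
    L_tilde = scaled_laplacian(W)
    t_prev, t_curr = x, L_tilde @ x  # T_0 x and T_1 x
    out = theta[0] * t_prev
    if len(theta) > 1:
        out = out + theta[1] * t_curr
    for k in range(2, len(theta)):
        t_prev, t_curr = t_curr, 2.0 * L_tilde @ t_curr - t_prev
        out = out + theta[k] * t_curr
    return out

# Toy path graph 0-1-2-3 and an illustrative kernel of size K = 3.
W_path = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
x_path = np.array([60.0, 55.0, 30.0, 58.0])
theta = np.array([1.0, 2.0, 3.0])
y_cheb = cheb_gconv(W_path, x_path, theta)
```

With K = 3 each output value depends only on nodes at most two hops away, which is the localization property mentioned above.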

Dataset Description
◦ Datasets: https://drive.google.com/drive/folders/10FOTa6HXPqX8Pf5WRoRwcFnW9BrNZEIX
- adj_mat.npy: weighted adjacency matrix with shape [num_road * num_road].
- node_values.npy: historical speed records with shape [len_seq * num_road] (len_seq = day_slot * num_dates).
- num_route: number of routes in your dataset.
◦ PeMSD7 was collected from the Caltrans Performance Measurement System (PeMS) in real time by over 39,000 sensor stations deployed across the major metropolitan areas of the California state highway system [Chen et al., 2001]. The dataset is aggregated into 5-minute intervals from 30-second data samples. We randomly select a medium and a large scale among the District 7 of California containing 228 and 1,026 stations, labeled as PeMSD7(M) and PeMSD7(L), respectively, as data sources.
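Given a speed matrix shaped [len_seq, num_road] like the one described for node_values.npy, one possible way to build training pairs of M past steps and H future steps is a sliding window. This is a sketch on synthetic data; `make_windows` is a hypothetical helper, and it does not handle windows that straddle two different days.

```python
import numpy as np

def make_windows(speeds, M, H):
    """Slice a [len_seq, num_road] array into inputs [N, M, num_road] and targets [N, H, num_road]."""
    len_seq = speeds.shape[0]
    N = len_seq - M - H + 1
    if N <= 0:
        raise ValueError("sequence is shorter than M + H")
    X = np.stack([speeds[i:i + M] for i in range(N)])
    Y = np.stack([speeds[i + M:i + M + H] for i in range(N)])
    return X, Y

# Synthetic stand-in for one day of data: 288 five-minute slots, 2 roads.
speeds = np.arange(288 * 2, dtype=float).reshape(288, 2)
X_win, Y_win = make_windows(speeds, M=12, H=3)
```

Here M = 12 and H = 3 correspond to 60 minutes of history and a 15-minute horizon at 5-minute resolution.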

Dataset Description

What is a .npy file?
◦ An NPY file stores a single NumPy array in NumPy's binary (NPY) file format. It is typically written from Python with the NumPy library. NPY files store all the information required to reconstruct the array on any computer, including its dtype and shape.
◦ How to open it?
◦ A sample example is shown on the slide (https://stackoverflow.com/questions/33885051/using-spyder-python-to-open-npy-file)
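The example on the slide is an image and is not reproduced here. As a stand-in, the following sketch writes a small array to a temporary .npy file and reads it back with `np.load`; the file name and the array contents are made up, and the real adj_mat.npy would be loaded the same way from its own path.

```python
import os
import tempfile
import numpy as np

# Stand-in for adj_mat.npy: the real file is not bundled here, so a small array is written first.
adj = np.array([[0.0, 0.8], [0.8, 0.0]])
path = os.path.join(tempfile.mkdtemp(), "adj_mat_demo.npy")  # hypothetical file name
np.save(path, adj)

loaded = np.load(path)
print(loaded.shape, loaded.dtype)  # shape and dtype are restored from the file header
```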

What does this code do?

Implementation Details: Code written by me
Datasets: https://drive.google.com/drive/folders/10FOTa6HXPqX8Pf5WRoRwcFnW9BrNZEIX
- adj_mat.npy
- node_values.npy

STGCN Block

STGCN Module
Forward

Results Obtained
Compared with FNN and ARIMA.
Conclusion: I obtained a better MAE. However, I did not use the original dataset, as it needed to be purchased; I used an open-source sample for my comparison.
Training loss: 0.12442259042730007
Validation loss: 0.13693484663963318
Validation MAE: 3.4758996963500977
These values are after 120 epochs. Beyond this point the test loss tends to increase again, although it may decline once more after reaching a peak; a longer run would show this. The number of epochs was initially set to 1000 with batch size 50, but I had to stop training after 120 epochs.
For comparison, the validation MAE on the same dataset is 4.02 with FNN and 5.86 with ARIMA, for a sample of PeMSD7(M) and PeMSD7(L) (Source: Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting by Yu et al., 2018).

Plot of the training and validation loss

Data Preprocessing(Yu et al. 2018)
◦ The standard time interval in the two datasets is set to 5 minutes. Thus, every node of the road graph contains 288 data points per day. Linear interpolation is used to fill missing values after data cleaning. In addition, the input data are normalized by the Z-score method.
◦ In PeMSD7, the adjacency matrix of the road graph is computed based on the distances among stations in the traffic network. The weighted adjacency matrix W is formed from these distances (the formula is shown on the slide).
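The formula for W is not preserved in the text. Assuming it is the thresholded Gaussian kernel described by Yu et al. (2018), $w_{ij} = \exp(-d_{ij}^2 / \sigma^2)$ if $i \neq j$ and $\exp(-d_{ij}^2 / \sigma^2) \geq \epsilon$, and 0 otherwise, with $\sigma^2 = 10$ and $\epsilon = 0.5$, a sketch together with Z-score normalization could look as follows. The distance values and their units are made up for illustration.

```python
import numpy as np

def z_score(x):
    """Normalize with mean and standard deviation (taken from the training data in practice)."""
    mean, std = x.mean(), x.std()
    return (x - mean) / std, mean, std

def weighted_adjacency(dist, sigma2=10.0, eps=0.5):
    """w_ij = exp(-d_ij^2 / sigma2) if i != j and that value >= eps, else 0."""
    w = np.exp(-dist ** 2 / sigma2)
    w[w < eps] = 0.0          # sparsify: drop weak (distant) connections
    np.fill_diagonal(w, 0.0)  # no self-loops
    return w

# Hypothetical pairwise station distances (units are an assumption).
dist = np.array([
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 2.0],
    [5.0, 2.0, 0.0],
])
W_adj = weighted_adjacency(dist)
speeds_norm, mu, sd = z_score(np.array([60.0, 55.0, 30.0, 58.0]))
```

The two thresholds control how quickly weights decay with distance and how sparse the resulting matrix is.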

Experimental Settings and claim(Yu et al. 2018)
◦ All experiments are compiled and tested on a Linux cluster (CPU: Intel(R) Xeon(R) CPU E5-2620
v4 @ 2.10GHz, GPU: NVIDIA GeForce GTX 1080). In order to eliminate atypical traffic, only
workday traffic data are adopted in our experiment [Li et al., 2015]. We execute grid search
strategy to locate the best parameters on validations. All the tests use 60 minutes as the
historical time window, a.k.a. 12 observed data points (M = 12) are used to forecast traffic
conditions in the next 15, 30, and 45 minutes (H = 3, 6, 9).
◦ In my case I did not have a GPU installed, so training was considerably slower.
◦ To measure and evaluate the performance of different methods, Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Root Mean Squared Error (RMSE) are adopted.
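The three metrics can be computed as below. This is a plain NumPy sketch with made-up numbers; the MAPE version assumes the ground truth contains no zeros.

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error."""
    return np.mean(np.abs(y - y_hat))

def mape(y, y_hat):
    """Mean absolute percentage error, in percent; assumes no zero entries in y."""
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

def rmse(y, y_hat):
    """Root mean squared error."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

y_true = np.array([50.0, 60.0, 40.0])  # illustrative speeds
y_pred = np.array([45.0, 63.0, 44.0])
scores = {"MAE": mae(y_true, y_pred), "MAPE": mape(y_true, y_pred), "RMSE": rmse(y_true, y_pred)}
```

RMSE penalizes large individual errors more than MAE does, which is why the two are usually reported together.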

Claim (Yu et al. 2018)
◦ The claim of this paper is that STGCN performs better than Historical Average (HA), Auto-Regressive Integrated Moving Average (ARIMA), Feed-Forward Neural Network (FNN), and Full-Connected LSTM (FC-LSTM) [Sutskever et al., 2014].
◦ For PeMSD7(M/L), the channels of the three layers in an ST-Conv block are 64, 16, and 64 respectively. Both the graph convolution kernel size K and the temporal convolution kernel size Kt are set to 3 in the model with the Chebyshev polynomial approximation, while K is set to 1 in the model STGCN(1st) with the 1st-order approximation.
◦ The models are trained by minimizing the mean square error using RMSprop for 50 epochs with a batch size of 50. The initial learning rate is $10^{-3}$ with a decay rate of 0.7 after every 5 epochs.
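The learning-rate schedule quoted above can be written as a step decay. The sketch assumes epochs are counted from 0 and that the first decay takes effect at epoch 5; the function name is hypothetical.

```python
def learning_rate(epoch, base_lr=1e-3, decay=0.7, step=5):
    """Step decay: multiply the rate by `decay` after every `step` epochs."""
    return base_lr * decay ** (epoch // step)

schedule = [learning_rate(e) for e in range(50)]  # one value per training epoch
```

Over 50 epochs the rate is reduced nine times, so the last epochs train with roughly 4% of the initial rate.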

Claim (Yu et al. 2018)
The proposed model achieves the best performance with statistical significance (two-tailed t-test, α = 0.01, P < 0.01) in all three evaluation metrics. Traditional statistical and machine learning methods may perform well for short-term forecasting, but their long-term predictions are not accurate because of error accumulation, memorization issues, and the absence of spatial information. The ARIMA model performs the worst due to its inability to handle complex spatio-temporal data. Deep learning approaches generally achieve better prediction results than traditional machine learning models. (CLAIM)
Benefits of Spatial Topology
Previous methods did not incorporate spatial topology and modeled the time series in a coarse-grained way. In contrast, by modeling the spatial topology of the sensors, the STGCN model achieves a significant improvement on short and mid-and-long term forecasting.

Results (Yu et al., 2018)
Claim is: STGCN captures the trend of rush hours more accurately than other methods, and it detects the ending of the rush hours earlier than others. Stemming from the efficient graph convolution and stacked temporal convolution structures, our model is capable of fast responding to the dynamic changes among the traffic network without over-reliance on historical average as most of recurrent networks do.

Conclusion and Future Work
◦ The paper (Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting by Yu et al., 2018) claims to propose a novel deep learning framework, STGCN, for traffic prediction, integrating graph convolution and gated temporal convolution through spatio-temporal convolutional blocks. Experiments show that the model outperforms other state-of-the-art methods on two real-world datasets, indicating its great potential for exploring spatio-temporal structures from the input. It also achieves faster training, easier convergence, and fewer parameters with flexibility and scalability. These features are quite promising and practical for scholarly development and large-scale industry deployment. In the future, the authors plan to further optimize the network structure and parameter settings. Moreover, the proposed framework can be applied to more general spatio-temporal structured sequence forecasting scenarios, such as the evolution of social networks and preference prediction in recommendation systems.

Another improvement suggested by me
◦ Instead of taking the entire graph, we can take only a selected part of it by drawing a 5 km drive-distance boundary around a chosen point, which gives a smaller graph and therefore a shorter computation time.
◦ For example, for the city of Hamburg, Germany, I was given a shape file containing traffic information for trucks and cars in both directions on different streets. After the data was cleansed with Alteryx, I used the Haversine distance to compute the distance between a given point (latitude and longitude) and the midpoints of the streets, and kept the streets within range, thus creating a polygon buffer, i.e. a smaller shape file.
◦ The distance computation was completed using the Haversine distance formula:
(Latitude, Longitude) -> 53.543425, 10.021088
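A minimal sketch of this filtering step, assuming a mean Earth radius of about 6371 km and a simple great-circle (not drive-time) cut-off. The centre is the point quoted above; the street midpoints and the helper names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_radius(center, midpoints, radius_km=5.0):
    """Keep only the midpoints that lie within radius_km of the centre."""
    return [p for p in midpoints if haversine_km(center[0], center[1], p[0], p[1]) <= radius_km]

center = (53.543425, 10.021088)                       # the point given on the slide
midpoints = [(53.5500, 10.0000), (53.6500, 10.2000)]  # hypothetical street midpoints
nearby = within_radius(center, midpoints)
```

Only streets whose midpoint passes this test would be kept in the reduced graph; a true drive-time boundary would need routing data instead of straight-line distance.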

My Results against the HERE API (not open source). You can see there is not much of a difference here.

How is Shell going to be benefitted?
◦ Shell is transitioning towards zero carbon emissions faster than most, if not all, oil, gas and energy giants across the globe. Accordingly, Shell has already started working at breakneck speed to promote and introduce electric vehicle charging stations across the globe, in places where we think they are most likely to be adopted quickly. We are introducing cheaper and faster charging networks in different advanced markets.
◦ We have already done so for China, Germany, the Netherlands, the UK, and the US, and I am leading the same work for Singapore.
◦ One of the major variables that we take into account is traffic flow, and predicting its future values in particular will make our work easier and more robust.
◦ We are interested not just in deploying these stations but also in maintaining them, so sudden increases or decreases in traffic flow matter to us. What we know today from various paid and unpaid data sources covers only the past few weeks or months, or is averaged over a certain period of time.
◦ Forecasting future traffic is still something that both the energy industry and academia are working on, and hence it is of very high importance for us at Shell.

References:
◦ I have implemented the paper: Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting, by Bing Yu, Haoteng Yin, Zhanxing Zhu (https://arxiv.org/abs/1709.04875v3)
◦ Traffic Graph Convolutional Recurrent Neural Network: A Deep Learning Framework for Network-Scale Traffic Learning and Forecasting, by Zhiyong Cui et al. (IEEE)
◦ Language Modeling with Gated Convolutional Networks, by Yann N. Dauphin et al. (https://arxiv.org/pdf/1612.08083v3.pdf)
◦ Simple Spectral Graph Convolution, by Hao Zhu and Piotr Koniusz (published as a conference paper at ICLR 2021)
◦ Traffic Flow Prediction Using Graph Convolution Neural Networks, by Anton Agafonov (10th International Conference on Information Science and Technology, Bath, United Kingdom, September 9-15, 2020)

References:
◦ Dynamic Spatial-Temporal Graph Convolutional Neural Networks for Traffic Forecasting, by Diao et al. (The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19), link: https://ojs.aaai.org/index.php/AAAI/article/download/3877/3755)
◦ Attention Based Spatial-Temporal Graph Convolutional Networks for Traffic Flow Forecasting, by Guo et al. (The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19), link: https://ojs.aaai.org/index.php/AAAI/article/download/3881/3759)
◦ Bike Flow Prediction with Multi-Graph Convolutional Networks, by Chai et al., 2018 (link: https://arxiv.org/pdf/1807.10934)

◦QUESTIONS?
