improvement of search strategy with knn approach for traffic state prediction

Improvement of Search Strategy With KNN
Approach for Traffic State Prediction
Midhun Xavier
CSE15P004

Contents
• Introduction
• Why pattern searching approach
• Why KNN
• Challenges
• Search strategy with KNN
• Single level search strategy
• Sequential search strategy
• Computation complexities
• Performance comparison
• Conclusion
• Reference
10/2/2016 CSE15P004 2

Introduction
• Future traffic state information is essential for maintaining successful
Intelligent Transportation Systems (ITS) deployments and can eventually
benefit society by mitigating congestion costs.
• For this reason, various data-driven algorithms aiming to increase
prediction reliability have been developed in the past:
• statistical linear models (e.g., linear regression, ARIMA family models, Kalman
filter),
• artificial intelligence based models (e.g., artificial neural networks),
• pattern searching methods (e.g., nearest neighbors methods).

Why pattern searching approach?
• The pattern searching approach has recently been receiving attention from
researchers due to the recent progress in “big data” related technologies.
• This strategy relies on historical data, which is practicable if a
sufficiently large database is available.
• One of the pioneering studies of the K-NN based approach was carried out
in 1991; it predicted traffic flow and occupancy with data
from the previous day.

Why KNN?
• With larger historical data, the K-NN has been found to outperform other
methods such as ARIMA and feed-forward neural networks.
• However, the local linear regression model and the time-varying linear
regression model have outperformed the K-NN in terms of accuracy.
• To improve the accuracy, the K-NN method is modified by incorporating a
hybrid state vector, a multivariate matching process, and optimal parametric
constants.

Challenges
• We have to increase the efficiency in making traffic state predictions, while
dealing with relatively large prediction ranges and reducing computational
efforts.
• This paper develops a novel sequential search strategy for the K-NN based
approach which reduces the searching space sequentially.
• The proposed sequential strategy is found to outperform the
conventional single-level search approach in terms of three prediction
measures: accuracy, efficiency, and stability.

Search Strategy With K-NN
• The K-NN extracts k neighbors for a given input set by measuring its similarity
to those neighbors, and the feature vector represents the description of a
traffic situation.
• Therefore, it is important to compose the feature vector with relevant
variables to recognize correct past records.
• The computational efficiency relies on the search process which is closely
linked with the dimension of the feature vector and the design of the search
strategy.

Single-Level Search Strategy
• The feature vector contains current acceleration and deceleration information
contributing to the identification of traffic dynamics.
• Traffic conditions of adjacent sensors are included in the feature vector by
incorporating speeds collected from upstream and downstream of a Vehicle
Detection System (VDS).

Feature Vector
• Fig. 1 provides a schematic description of the hypothetical highway network
containing N VDS sensors (red circles). The basic feature vector (FV) at a certain time
(ti) and location (Ln) contains Msi components (an Msi-by-1 matrix) as:
in which,
v(ti) = speed at time ti;
v′(ti) = acceleration rate at time ti;
Ln = nth VDS; Ln−1 = upstream VDS;
Ln+1 = downstream VDS.
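
A minimal Python sketch of how such a feature vector could be assembled. The component order on the slide was not preserved, so the layout below (three speeds followed by three accelerations, giving Msi = 6) is an assumption, as are the function names and the finite-difference estimate of the acceleration rate from consecutive 5-min speeds.

```python
def acceleration(v_now, v_prev, dt_hours):
    """Finite-difference acceleration rate in km/h per hour (assumed estimator)."""
    return (v_now - v_prev) / dt_hours


def feature_vector(speeds_now, speeds_prev, n, dt_hours=5 / 60):
    """Build a feature vector for sensor index n (matching L1..LN).

    speeds_now / speeds_prev map a sensor index to its speed (km/h) at
    time ti and at the previous time step. Indices n-1 and n+1 must
    exist, i.e. the boundary sensors are part of the input.
    """
    neighbors = (n - 1, n, n + 1)  # upstream, target, downstream
    v = [speeds_now[j] for j in neighbors]
    a = [acceleration(speeds_now[j], speeds_prev[j], dt_hours) for j in neighbors]
    return v + a  # assumed layout: 3 speeds, then 3 accelerations


# Hypothetical readings for sensors 0 (upstream), 1 (target), 2 (downstream).
now = {0: 100.0, 1: 90.0, 2: 60.0}
prev = {0: 100.0, 1: 95.0, 2: 70.0}
fv = feature_vector(now, prev, n=1)
```

Here the downstream sensor is both slow and decelerating, which is the kind of traffic dynamic the acceleration components are meant to expose.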

Section Feature Vector
• For the study network covered with N VDS sensors (for all Ln, 1 ≤ n ≤ N), the FVs
of all locations form an Msi-by-N matrix, the section feature vector (SFV), as:
in which,
L1−1 indicates the upstream VDS of the first VDS, and
LN+1 the downstream VDS of the last VDS.

Pseudo code of single level search

• At a certain location (nth VDS), the similarity between the current input and
the record from the dth day can be determined by calculating the
normalized Euclidean distance (which is unit-less) as:
in which,
SFVTa[, n] and SFVdyd[, n] indicate the nth column (= Ln) in the SFV of the target
and of the dth day, respectively; 1 ≤ d ≤ Dsi, 1 ≤ n ≤ N, and 1 ≤ m ≤ Msi.
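
The distance formula itself did not survive on this slide. As a rough illustration only, the sketch below computes a unit-less Euclidean distance between two SFV columns by dividing each component difference by a scale of the same unit. The choice of scale (for example, the historical range of each component) and all names are assumptions, not the authors' normalization.

```python
import math


def normalized_distance(target_col, history_col, scales):
    """Unit-less Euclidean distance between two SFV columns.

    Each component difference is divided by a scale with the same unit
    (assumed: a per-component spread such as the historical range), so
    that speeds and accelerations become comparable.
    """
    if not (len(target_col) == len(history_col) == len(scales)):
        raise ValueError("all inputs need the same number of components")
    return math.sqrt(
        sum(((t - h) / s) ** 2 for t, h, s in zip(target_col, history_col, scales))
    )


# Hypothetical columns: two speeds (km/h) and one acceleration rate.
target = [90.0, 60.0, -60.0]
day_d = [80.0, 60.0, -30.0]
scales = [100.0, 100.0, 300.0]
dist = normalized_distance(target, day_d, scales)
```

A smaller value means the historical day resembles the current input more closely at that location.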

• This process yields a separate similarity for each VDS location (Ln) and each
day (dyd) as

• Based on the similarity measurement, we select the ksi neighbors that yield
the minimum distances in the similarity vector for each location, and then
generate the future state by taking the average of the ksi neighbors for each
column.
• We call this approach a single-level search strategy because it searches for the
ksi historical nearest neighbors of the target traffic pattern with a single type
of feature vector in a single search process.
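
The pseudo code slide was not preserved, so the following is a hedged sketch of the single-level search as described above, not the authors' implementation. The data layout (lists indexed by day and location), the scale-based distance, and the function names are assumptions.

```python
import math


def distance(a, b, scales):
    """Scaled Euclidean distance (the normalization is an assumption)."""
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales)))


def single_level_predict(target_sfv, history_sfv, history_future, scales, k):
    """Predict one future speed per location.

    target_sfv[n]         feature column of location n (current input)
    history_sfv[d][n]     feature column of location n on day d
    history_future[d][n]  speed that followed on day d at location n
    """
    prediction = []
    for n, target_col in enumerate(target_sfv):
        # Similarity vector for this location: one distance per day.
        ranked = sorted(
            range(len(history_sfv)),
            key=lambda d: distance(target_col, history_sfv[d][n], scales),
        )
        nearest = ranked[:k]
        prediction.append(sum(history_future[d][n] for d in nearest) / len(nearest))
    return prediction


# Hypothetical data: one location, three days, [speed, acceleration] features.
target_sfv = [[90.0, -60.0]]
history_sfv = [[[88.0, -50.0]], [[40.0, 0.0]], [[92.0, -70.0]]]
history_future = [[80.0], [35.0], [84.0]]
pred = single_level_predict(target_sfv, history_sfv, history_future,
                            scales=[100.0, 300.0], k=2)
```

With k = 2, the congested day (index 1) is rejected and the prediction is the mean of the two similar days.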

Sequential Search Strategy
• This structure separates the attributes by search level: the speeds of the
nth VDS and its upstream and downstream VDS are used in Level I, and
their accelerations in Level II.
• The speed is considered in the first level as it is an intuitive representative
variable indicating the traffic state.
• In Level II, the acceleration is considered to distinguish the detailed
class of the current input and historical data among the observations
selected in the first stage (Level I).
• Searching for historical patterns is thus a sequential, multi-level
process.

• In the first level (Level I), search queries are made for the klvI historical patterns
that are most similar to the current observation, based on FVlvI,
from the whole set of data points (DlvI).
• Note that DlvI indicates the total number of days in the database, which is
equivalent to the size of the data that the single-level search strategy covers (DlvI
= Dsi = {dy1, dy2, . . . , dyDsi (or dyDlvI)}).

• Subsequently, in the second search process (Level II), we consider only the
historical observations (DlvII = {dy1, dy2, . . . , dyDlvII}) that are the output
of Level I.
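
The two levels described above can be sketched as follows. This is an illustrative simplification for a single location, with assumed names, an assumed scale-based distance, and hypothetical data. In particular, it does not pool Level I candidates across locations.

```python
import math


def distance(a, b, scales):
    """Scaled Euclidean distance (the normalization is an assumption)."""
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales)))


def k_nearest(target, candidates, features, scales, k):
    """Return the k day indices in `candidates` closest to `target`."""
    return sorted(candidates, key=lambda d: distance(target, features[d], scales))[:k]


def sequential_predict(target_speed, target_accel, hist_speed, hist_accel,
                       hist_future, speed_scales, accel_scales, k_lv1, k_lv2):
    all_days = range(len(hist_speed))  # Level I searches the whole database
    level1 = k_nearest(target_speed, all_days, hist_speed, speed_scales, k_lv1)
    # Level II only looks at the days that survived Level I.
    level2 = k_nearest(target_accel, level1, hist_accel, accel_scales, k_lv2)
    return sum(hist_future[d] for d in level2) / len(level2)


# Hypothetical data: four days, one speed and one acceleration feature each.
hist_speed = [[88.0], [40.0], [92.0], [91.0]]
hist_accel = [[-50.0], [-60.0], [10.0], [-65.0]]
hist_future = [80.0, 35.0, 95.0, 78.0]
pred = sequential_predict([90.0], [-60.0], hist_speed, hist_accel, hist_future,
                          speed_scales=[100.0], accel_scales=[300.0],
                          k_lv1=3, k_lv2=2)
```

Day 1 matches the target acceleration exactly but never reaches Level II because its speed is far from the current state. Day 2 passes Level I but is dropped in Level II because it was accelerating.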

Pseudo code for sequential search

Computation Complexities
• Compared with the single-level search structure, the computational
complexity is reduced for two main reasons:
• i) the similarity is computed with a smaller feature vector dimension
at each level;
• ii) the searching space in Level II is reduced, as long as the size inequality
holds:
in which,
size(SFVsi) = Msi ∗ N;
size(SFVlvI ) = MlvI ∗ N;
size(SFVlvII ) = MlvII ∗ N.

• Using the size of the input data set, the computational complexity of the
single-level search can be expressed as:
• The complexity of the sequential search grows linearly with the
complexity of each sub-level in the sequential process:

• The searching size inequality characterizes the complexity inequality between
the single-level search and the sequential search:
• To achieve this inequality, it is necessary to satisfy the following size
condition:
• This size condition is derived from the maximum possible number of days
that must be considered in Level II, which occurs if Level I yields mutually
exclusive klvI candidates for each location (Ln).
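
The expressions on these slides were lost, so the sketch below is only a rough operation count under stated assumptions: each search compares size(SFV) = M ∗ N components once per candidate day, the sequential cost is the sum of its two levels, and Level II sees at most klvI ∗ N days. N = 16 and 240 days come from the experimental setting of this study. The values of M and klvI are hypothetical.

```python
def search_cost(m, n, days):
    """Component comparisons: size(SFV) = m * n, evaluated once per day."""
    return m * n * days


def level2_days_upper_bound(k_lv1, n, total_days):
    """Worst case: every location contributes k_lv1 mutually exclusive days."""
    return min(k_lv1 * n, total_days)


N = 16                           # VDS stations in the study site
D_SI = 240                       # days in the historical database
M_SI, M_LV1, M_LV2 = 6, 3, 3     # assumed feature vector dimensions
K_LV1 = 5                        # assumed number of Level I neighbors

d_lv2 = level2_days_upper_bound(K_LV1, N, D_SI)
single = search_cost(M_SI, N, D_SI)
sequential = search_cost(M_LV1, N, D_SI) + search_cost(M_LV2, N, d_lv2)
```

Under these assumptions the sequential search is cheaper whenever Level II covers fewer days than the full database. If klvI ∗ N reached 240, the two counts would coincide.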

Experimental Setting
• The predictive performances of the two types of strategies are investigated
using VDS data (5-min speed) from the highway SR78-E (State Route 78
Eastbound) collected from its Performance Measurement System (PeMS).
• This site is chosen because of its high quality data and the availability of
various traffic situations.
• The study site has 16 VDS stations (N = 16) along a 25 km
stretch with 0.6∼2.9 km spacing, and isolated bottlenecks frequently
appear during the PM peak hour on typical weekdays.
• The historical data cover 240 typical weekdays spanning 1.5
years (January 2013∼July 2014).

Description of Experimental Setting

K?
• The number of neighbors (k) is usually determined with an empirical
procedure.
• As a preliminary study, RSS (Residual Sum of Squares) values are estimated
from the linear relation between the real values and the predicted values.
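
The slide does not detail the procedure, so the following is one possible reading, offered as an assumption: fit a least-squares line between real and predicted values for each candidate k, compute its RSS, and keep the k with the smallest RSS. The function names and the data are hypothetical.

```python
def linear_rss(real, predicted):
    """RSS of the least-squares line predicted = a + b * real.

    Assumes `real` is not constant (otherwise the slope is undefined).
    """
    n = len(real)
    mean_x = sum(real) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in real)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(real, predicted))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(real, predicted))


def pick_k(real, predictions_by_k):
    """Return the candidate k whose predictions give the smallest RSS."""
    return min(predictions_by_k, key=lambda k: linear_rss(real, predictions_by_k[k]))


# Hypothetical speeds (km/h) and predictions for two candidate values of k.
real = [10.0, 20.0, 30.0]
predictions_by_k = {5: [10.0, 25.0, 30.0], 10: [12.0, 22.0, 32.0]}
best_k = pick_k(real, predictions_by_k)
```

Note that an RSS around a fitted line measures scatter, not bias: the predictions for k = 10 are uniformly 2 km/h too high yet have zero RSS, so a criterion like this would normally be read alongside an error measure.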

Performance Comparison Between Single-Level and Sequential
Search Strategies
• The two strategies visually appear to be able to replicate the traffic state
transition by capturing the bottleneck location, the activation time, the
maximum queue length, and the duration of bottleneck.
• The prediction accuracy and efficiency have been quantitatively
measured.
• Using the indicators for the average error (MAPE (Mean Absolute
Percent Error) and RMSE (Root Mean Square Error)) between the real
data and the predicted state, the accuracies from the two strategies
have been compared.
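
For reference, the two error indicators can be computed as in this short sketch, which uses the standard definitions of MAPE and RMSE. The sample speeds are hypothetical.

```python
import math


def mape(real, predicted):
    """Mean Absolute Percent Error in percent; real values must be non-zero."""
    return 100.0 * sum(abs((r - p) / r) for r, p in zip(real, predicted)) / len(real)


def rmse(real, predicted):
    """Root Mean Square Error, in the unit of the data (km/h for speeds)."""
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(real, predicted)) / len(real))


real = [100.0, 50.0]        # observed speeds (km/h)
predicted = [90.0, 55.0]    # predicted speeds (km/h)
mape_value = mape(real, predicted)
rmse_value = rmse(real, predicted)
```

MAPE weights an error relative to the observed speed, so the same absolute error counts more in congested (low-speed) conditions, while RMSE penalizes large absolute errors.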

Example of speed contour generated from real data and
prediction results

Comparison of accuracy between the two strategies for the
whole day

Comparison of accuracy between the two strategies for the
whole day

Comparison of Accuracy between the two strategies for the
whole day

Improvement of pattern searching performance

Prediction error and Searching time for each strategy

Internal Improvement Between Level I and Level II in Sequential
Search Strategy
• Internally, from Level I to Level II, the prediction error decreased on average
from 5.09% to 3.83% for MAPE and from 5.12 to 3.88 km/h for RMSE.
• Besides the small improvements for the non-peak hour predictions, the
prediction error decreased by approximately 30% for the peak hour traffic
predictions.
• For the peak hour, MAPE and RMSE decreased on average from 8.84% to 6.13%
and from 14.29 to 9.91 km/h, respectively.
• Moreover, the smaller variance in Level II indicates that the performance has
stabilized: from 4.65% to 2.05% in MAPE and from 7.32 to 3.25 km/h in RMSE.

Internal hierarchical improvement across sub-levels in
sequential search strategy.

Prediction error and Searching time for each level in Sequential
search strategy

Increasing searching time according to the size of searching
space in Level II

Conclusion
• Compared with the conventional single-level search strategy, the sequential
structure in the K-NN based searching algorithm is found to achieve
higher efficiency and accuracy.
• Especially for peak hour traffic predictions, the proposed algorithm
significantly reduces prediction errors relative to the conventional strategy, while
also improving computational efficiency.
• The sequential selection process is mainly credited for the performance
improvements.

References
• H. Yeo, K. Jang, A. Skabardonis, and S. Kang, “Impact of traffic states on
freeway crash involvement rates,” Accident Anal. Prev., vol. 50, pp. 713–723,
Jan. 2013.
• B. L. Smith, B. M. Williams, and R. K. Oswald, “Comparison of parametric and
nonparametric models for traffic flow forecasting,” Transp. Res. Part C,
Emerging Technol., vol. 10, no. 4, pp. 303–321, Aug. 2002.
• J. W. C. van Lint, S. P. Hoogendoorn, and H. J. van Zuylen, “Accurate freeway
travel time prediction with state-space neural networks under missing data,”
Transp. Res. Part C, Emerging Technol., vol. 13, no. 5/6, pp. 347–369, Oct.–
Dec. 2005.

THANK YOU...
