Direct Perception for Congestion Scene Detection Using TensorFlow™

Artur Filipowicz
Princeton University
229 Sherrerd Hall, Princeton, NJ 08544
T: +01 732-593-9067
Email: arturf@princeton.edu
(corresponding author)

Jeremiah Liu
New Jersey Institute of Technology
607 Oak Hall, Newark, NJ 07103
T: +01 202-615-5905
Email: jeremiahliu@live.com

Joyoung Lee, Ph.D.
Assistant Professor
New Jersey Institute of Technology
264 Tiernan Hall, Newark, NJ 07103
T: +01 973-596-2475
Email: jo.y.lee@njit.edu

Word Count: 6,641 words = 3,641 + 3,000 (8 Figures + 4 Tables)

Paper submitted for consideration for presentation at the 96th TRB Annual Meeting in January 2017 and for publication in the Transportation Research Record
1 ABSTRACT
In this research, we examine a new approach to the problem of real-time traffic congestion detection based on single image analysis. We demonstrate the use of a convolutional neural network in this domain. With this learning model and the direct perception approach, transforming an image directly to a congestion indicator, we design a system which can detect congestion independently of location, time, and weather. We further demonstrate that the use of the Fast Fourier Transform and the wavelet transform can improve the accuracy of a convolutional neural network across multiple conditions in new locations.
2 INTRODUCTION
Traffic flow information is widely used in intelligent transportation systems to detect and manage traffic congestion. Collecting traffic data such as speed, count, and occupancy is a crucial part of estimating the flow of traffic. Loop detectors, traffic radars, and surveillance cameras are used to collect such data. Due to the inflexibility and high cost of deploying loop detectors and traffic radars, video-content-understanding techniques are gaining popularity for detecting the flow of traffic. However, video data is often affected by the external environment, bad weather (e.g., rain, snow, and fog), and undesirable illumination conditions (e.g., sun glare, darkness). As a result, a challenge in video-based detection is to properly interpret video images under such conditions.

Recently, new advances in the field of machine learning have allowed for the creation of more reliable computer vision algorithms. These advances include the development of larger models, known as Deep Learning (1), and the use of graphics processing units to speed up optimization of these models (2, 3). Deep Learning led to the successful use of convolutional neural networks to achieve high accuracy on many image (2, 4) and video recognition tasks (5). Furthermore, these models have been shown to generalize to new environments and conditions (6, 7).

In this research, we attempt to address the problem of real-time traffic congestion detection. Unlike most previous approaches, we apply ideas from deep learning, computer vision, and signal processing to explore the potential of using convolutional neural networks to detect congestion. By using a direct perception (8) approach, mapping a single image directly to a congestion indicator, we design a system which can detect congestion independently of location, time, and weather. We also demonstrate the benefit of using signal processing to pre-process images. To the best of our knowledge, this is the first study which looks at the performance of convolutional neural networks in this domain, and the first study which examines the applicability of a single model across multiple locations and conditions.
3 RELATED WORK
Research efforts in video processing for traffic surveillance and control date back to the mid-1970s. We find it useful to categorize these systems by the direct perception (8) and mediated perception (9) concepts discussed in (6).

The mediated perception approach involves multiple sub-components for scene classification. These components could include a vehicle counter and a vehicle speed detector. With mediated perception, the estimates of both of these detectors are combined to determine if congestion is present. Systems based on mediated perception are convenient to debug and fine-tune: we can trace errors back to individual components and attempt to improve the system component by component. However, mediated perception also adds unnecessary complexity to an already difficult task. There is no clear reason to believe that vehicle detection in images is any easier than directly perceiving congestion. Additionally, combining the outputs may introduce more parameters, making the system's performance harder to optimize. In contrast, the direct perception approach focuses on finding a transformation from a single image directly to a congestion indicator. This approach is potentially simpler, more generalizable, and computationally more efficient. The drawback is that it is more difficult to identify reasons for failure. Direct perception appears to be gaining some traction in more recent literature.
3.1 Mediated Perception
Most congestion scene classification techniques are based on mediated perception. One popular method used for congestion detection is traffic parameter extraction. Examples date back to (10), which describes the Autoscope system. The system involves two steps. The first step is to detect vehicles by defining an area of interest, placing detection lines along or across the roadway lanes, and then using a segmentor to estimate the pixel changes on the lines when a vehicle is present. The next step is to derive traffic parameters such as speed by analyzing sequential images. The average accuracy is 95% under ideal conditions. However, the false alarm rate increases significantly in certain situations, such as overlapping vehicles. Shadows also negatively affect the performance of the system. Cucchiara et al. (11) improved the algorithm presented in (10) by separating analysis techniques between day and night, achieving a vehicle tracking accuracy of 96.9% with 5.8% false negatives and 13.2% false positives during daytime, and an accuracy of 96% with 4.5% false negatives and 4.9% false positives at night.

Along a different line of research, (12) presents a way to measure traffic queue parameters by applying a motion detector based on the Fast Fourier Transform (FFT), followed by a vehicle detector based on edge detection. The method determines the length of the queue with 95% accuracy under lighting conditions that are constant between two frames. Similarly, (13) classified congestion scenes by extracting the overall crowd density and crowd speed, reporting 94.5% accuracy at a single location under daylight conditions. No evaluation at other locations or under night conditions was discussed. Background subtraction is convenient to implement; however, it is hard to extract a background model in conditions such as sudden light changes and heavy congestion.
Another approach to detecting congestion scenes was explained in (14), where Li et al. proposed a time-spatial image-based method and achieved a 93% detection rate for the congestion condition. According to the authors, the false estimations are due to low-contrast images, small vehicle blocks, and irregular lane conditions. There was no evaluation under other weather conditions such as rain. Palubinskas et al. (15) approach traffic congestion detection in sequences of optical images based on change detection, image processing, and the incorporation of a priori information such as a traffic model and road network. The accuracy of congestion detection was not reported.
3.2 Direct Perception
Many systems developed in the past depend on sequential images, edge detection, and thresholding. In most cases the objective is to determine more complicated traffic descriptors than a single indicator. Both false positive and false negative rates have to be very low to obtain a practical system, and most of this work is still carried out by human operators due to high false alarm rates. Recently, the direct perception approach has been gaining popularity due to better performance in adapting to various environments. For example, (16) utilized an efficient unsupervised feature learning method with density information encoded. Spherical k-means was employed to learn features, followed by a feature selection procedure to remove bad features. Locality-constrained linear coding (LLC) was used to map raw image patches to a new feature space, and an SVM classifier was trained for the classification. The proposed algorithm achieved 85% average accuracy. Evaluation of accuracy under different weather and light conditions was not performed. Recent work by Valipour et al. (7) is very relevant to our approach: they presented a parking stall vacancy detection algorithm based on deep convolutional neural networks and reported a misclassification rate as low as 5% under varying weather conditions.
4 METHODOLOGY
For this study, we collected 27,833 images from traffic surveillance cameras, labeled the data based on traffic, time, and weather conditions, applied four image processing transformations, and used the new TensorFlow library (17) to construct and train convolutional neural networks.
4.1 Dataset
We collected and labeled 27,833 images from surveillance cameras at 24 locations in New Jersey. The dataset contains a variety of traffic patterns and weather conditions grouped into four classes: day-clear, day-rain, night-clear, and night-rain. Figure 1 shows a sample image from each class. We divided the dataset randomly into training and testing sets by location, such that images from the same location do not appear in both sets; Tables 1 and 2 detail the exact division of examples, and a sketch of the split follows the tables. We used the training set for training and tuning, and the testing set to obtain the final accuracy.
TABLE 1 : Percent of examples of free flow and congestion in the test and training set
Condition Training Examples Test Examples
Free flow 59% 44%
Congestion 41% 56%
TABLE 2 : Number of examples of each condition in the dataset
Condition Training Examples Test Examples
Free flow Day Clear 3,098 1,100
Free flow Day Rain 3,131 1,057
Free flow Night Clear 2,960 1,200
Free flow Night Rain 2,122 599
Congestion Day Clear 2,443 1,852
Congestion Day Rain 1,395 2,287
Congestion Night Clear 2,460 562
Congestion Night Rain 1,326 241
Total 18,935 8,898
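
The location-based split can be sketched as follows. This is a minimal illustration under our own assumptions; the record layout and variable names are hypothetical and not from the original pipeline:

```python
import random

def split_by_location(records, test_fraction=0.3, seed=0):
    """Split (image_path, location_id, label) records so that no location
    contributes images to both the training and the test set."""
    locations = sorted({loc for _, loc, _ in records})
    random.Random(seed).shuffle(locations)
    n_test = max(1, int(len(locations) * test_fraction))
    test_locations = set(locations[:n_test])
    train = [r for r in records if r[1] not in test_locations]
    test = [r for r in records if r[1] in test_locations]
    return train, test
```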
FIGURE 1 : Sample images of all eight conditions in our dataset: free flow and congestion under day-clear, day-rain, night-clear, and night-rain conditions.
4.2 Image Processing
Before training, we scaled the images to 140x100 pixels and applied four different image processing transformations: grayscale, the Fast Fourier Transform (FFT), the wavelet transform (WT), and our mixture of the wavelet and Fast Fourier transforms. For this research, it is important to understand the output of these transformations at a high level; the low-level details and mathematical equations are not illustrative, and we therefore omit them.

In a grayscale image, as shown in Figure 2(a), each pixel represents the intensity of light at that location. The basic gray scaling of the images serves as a baseline, and it is necessary since most images are already black and white. With the FFT, in Figure 2(b), the image is represented as a collection of amplitudes and phases. For our study, we discard the phases and focus on the amplitudes. In this representation, amplitudes of high frequency components correspond to sharp changes in the image; when they are removed, the image becomes blurred. In Figure 2(b) the high frequency components are away from the center of the image. The wavelet transform (18, 19), at a high level, produces four representations of the image. One of the four is a compressed version of the original image, which we discard as it carries information similar to the grayscale image. The other three highlight vertical, horizontal, and diagonal edges in the image, as shown in Figure 2(c). Our WT+FFT transformation appends the WT output to the FFT output, as shown in Figure 2(d); the FFT output is also scaled such that the range (max - min) of the FFT matches the range of the WT. For a general introduction to the FFT, one can refer to (20), and for a deeper treatment to (21). In the discussion section we analyze how the properties of these transforms may be useful for congestion detection. After applying these transformations, we standardized the training set by subtracting the mean and dividing by the standard deviation of each pixel.
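
The pipeline can be sketched as below; this is our own minimal reading, not the original code. The choice of wavelet, the fftshift layout, and the zero-padding used to stack the WT output onto the FFT output are all assumptions the paper does not specify:

```python
import numpy as np
import pywt                      # PyWavelets
from PIL import Image

def grayscale(path):
    """Load an image, convert it to grayscale, and scale it to 140x100 pixels."""
    img = Image.open(path).convert("L").resize((140, 100))
    return np.asarray(img, dtype=np.float32)

def fft_amplitudes(gray):
    """FFT amplitude spectrum; the phases are discarded. fftshift places low
    frequencies at the center, so high frequencies lie away from it."""
    return np.abs(np.fft.fftshift(np.fft.fft2(gray)))

def wavelet_details(gray):
    """Single-level 2-D wavelet transform. The compressed approximation (cA)
    is discarded; the horizontal, vertical, and diagonal detail subbands are
    kept, tiled side by side (the wavelet and the layout are assumptions)."""
    cA, (cH, cV, cD) = pywt.dwt2(gray, "haar")
    return np.concatenate([cH, cV, cD], axis=1)

def wt_plus_fft(gray):
    """Append the WT output to the FFT output, scaling the FFT so that its
    range (max - min) matches the range of the WT output."""
    fft, wt = fft_amplitudes(gray), wavelet_details(gray)
    fft = fft * (np.ptp(wt) / np.ptp(fft))
    width = max(fft.shape[1], wt.shape[1])      # zero-pad to a common width
    pad = lambda a: np.pad(a, ((0, 0), (0, width - a.shape[1])))
    return np.concatenate([pad(fft), pad(wt)], axis=0)

def standardize(batch):
    """Standardize a set of transformed images by subtracting the mean and
    dividing by the standard deviation of each pixel position."""
    return (batch - batch.mean(axis=0)) / (batch.std(axis=0) + 1e-8)
```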
FIGURE 2 : Outputs of the four transformations: (a) grayscale, (b) FFT, (c) WT, (d) WT+FFT.
4.3 TensorFlow™
TensorFlow is a library for building and training gradient-based machine learning models. It provides a convenient interface for expressing machine learning algorithms by converting the described computations into a data flow graph (17). A data flow graph describes mathematical computation as a directed graph of nodes and edges. Nodes represent mathematical operations as well as data inputs and persistent variables. Edges describe the input/output relationships between nodes in the form of multidimensional data arrays. Computation can be distributed across multiple CPUs and GPUs (17).
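
A minimal illustration of the graph model (not code from this study), written against the TensorFlow 1.x-style graph API that was current at the time; under TensorFlow 2.x the same calls live under tf.compat.v1:

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Nodes: a data input (placeholder), a persistent variable, and an operation.
x = tf.placeholder(tf.float32, shape=[None, 3], name="input")
W = tf.Variable(tf.ones([3, 2]), name="weights")
y = tf.matmul(x, W, name="output")   # edges carry multidimensional arrays

# The graph only executes inside a session, which can place operations
# across the available CPUs and GPUs.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))
```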
4.4 Convolutional Neural Network
For our experiments we selected a convolutional neural network model. A convolutional neural network is a machine learning model which attempts to mimic the function of the human vision system. These networks tend to perform well on vision tasks because some of the parameters the network learns are arranged into filters. These filters are convolved with the input image; each filter is moved across the image and, at regular intervals, multiplied with the pixel values in that region. The effective result is that the network can match a particular pattern, represented by a filter, to all areas in an image. This allows the network to detect objects in different locations. With multiple layers of such filters, networks can detect objects even when the size of the object varies. These filters can serve as simple edge detectors or detect more complicated patterns such as facial features. Visualizations of filters can be found in (22).

We constructed a network with one convolution layer of eight 5x5 filters. This layer is followed by a max pooling layer and a normalization layer. Our model then has three fully connected layers with output sizes 389, 192, and 1. All layers use the ReLU activation function (2, 23), except for the last layer, which uses a softmax function. During training, we employ dropout, weight decay, and learning rate decay to avoid overfitting and improve results. Additionally, for grayscale images we apply single image whitening and the data augmentation techniques of randomly changing contrast and brightness.
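
One possible reading of this architecture in modern tf.keras is sketched below (the original used the 2016 graph API). The pooling size, normalization choice, optimizer, regularization rates, and augmentation ranges are assumptions; and since a softmax over a single output is degenerate, the sketch uses a sigmoid over the one congestion indicator:

```python
import tensorflow as tf

def build_model(input_shape=(100, 140, 1)):
    """One convolution layer of eight 5x5 filters, max pooling and
    normalization, then fully connected layers of sizes 389, 192, and 1."""
    inputs = tf.keras.Input(shape=input_shape)
    x = tf.keras.layers.Conv2D(8, 5, activation="relu", padding="same")(inputs)
    x = tf.keras.layers.MaxPooling2D(pool_size=2)(x)   # pool size assumed
    # Local response normalization, as in contemporary TensorFlow examples;
    # the paper does not specify which normalization layer was used.
    x = tf.keras.layers.Lambda(tf.nn.local_response_normalization)(x)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(389, activation="relu",
                              kernel_regularizer=tf.keras.regularizers.l2(1e-4))(x)
    x = tf.keras.layers.Dropout(0.5)(x)                # dropout rate assumed
    x = tf.keras.layers.Dense(192, activation="relu",
                              kernel_regularizer=tf.keras.regularizers.l2(1e-4))(x)
    out = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    return tf.keras.Model(inputs, out)

def augment_grayscale(img):
    """Grayscale-only augmentation from the text: whitening plus random
    contrast and brightness changes (parameter ranges assumed)."""
    img = tf.image.random_brightness(img, max_delta=0.2)
    img = tf.image.random_contrast(img, 0.8, 1.2)
    return tf.image.per_image_standardization(img)

model = build_model()
model.compile(
    optimizer=tf.keras.optimizers.SGD(                 # optimizer assumed
        learning_rate=tf.keras.optimizers.schedules.ExponentialDecay(
            0.01, decay_steps=1000, decay_rate=0.96)),
    loss="binary_crossentropy", metrics=["accuracy"])
```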
5 RESULTS
We trained the above model on each of the four transformations. Table 3 lists the accuracy on the test set for each transformation. While no transformation is superior in all conditions, FFT, WT, and WT+FFT outperform grayscale in most conditions. Since the accuracies for free flow and congestion in day-clear appear low for what seems to be an easy condition, we also trained a network on just those conditions. The performance of that network on the test set in those conditions is about 20% better, as summarized in Table 4.
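
Per-condition accuracies such as those in Table 3 can be computed by grouping test predictions by condition label; a small sketch with hypothetical array names:

```python
import numpy as np

def accuracy_by_condition(y_true, y_pred, conditions):
    """Mean accuracy per condition label (e.g., 'Free Flow Day Clear').
    y_true and y_pred are 0/1 arrays; conditions is a parallel string array."""
    correct = np.asarray(y_true) == np.asarray(y_pred)
    return {c: float(correct[np.asarray(conditions) == c].mean())
            for c in np.unique(conditions)}
```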
FIGURE 3 : Accuracy on the test set for each transformation: (a) grayscale, (b) FFT, (c) WT, (d) WT+FFT.
TABLE 3 : Accuracy on the test set
Condition Grayscale FFT WT WT+FFT
Free Flow Day Clear 0.637 0.970 0.650 0.650
Free Flow Day Rain 0.996 0.995 0.998 0.986
Free Flow Night Clear 0.678 0.965 1.000 0.820
Free Flow Night Rain 0.534 0.998 0.973 0.947
Congestion Day Clear 0.457 0.604 0.455 0.645
Congestion Day Rain 0.724 0.438 0.835 0.846
Congestion Night Clear 0.558 0.538 0.937 0.993
Congestion Night Rain 0.091 0.842 0.062 0.008
TABLE 4 : Accuracy for training and testing under day-clear condition
Condition Grayscale
Free Flow Day Clear 0.85
Congestion Day Clear 0.98
6 DISCUSSION
The accuracies in our results vary from very low to very high. This shows the general difficulty of the congestion recognition problem. A detection system may perform extremely well in one condition and location but not another. It is also apparent that a raw image, which is easy for a human to understand, may not be the best representation for a machine to detect congestion. Let us better understand these results in the context of what each transformation represents.

In a grayscale image, each pixel represents the intensity of light at that location. We could speculate that the amount of light in an image corresponds to the number of vehicle headlights or taillights. Since vehicles are often brighter than the pavement, their number could also correlate with the brightness of the image. However, as seen in Figure 5, under certain conditions such as rain at night, even free flow traffic can appear very bright.

In our test, a network trained on grayscale images functions better in rain and at night than during the day, suggesting that the network correlated its output with lights in the image. The network is confused by free flow day-clear images and by congestion at night in the rain. A possible explanation is that the free flow day images in the test set had a very light road surface and very few vehicles; they are bright for a different reason, yet bright enough for the network to think the image shows congestion. A similar reason might apply to the poor performance on night congestion during rain. A difficulty with this analysis is that the network uses a very complicated nonlinear decision function; thus, our idea of brightness is a qualitative observation.
FIGURE 4 : Filters learned from all conditions (a) and filters learned under the day-clear condition (b).
FIGURE 5 : Free flow and congestion conditions which both appear very bright.
To further explore this question, we trained our network on images from the training set which were taken on clear days. We then tested this network on day-clear images in the test set. The results improved dramatically from 65% and 45% to 85% and 98% accuracy on free flow and congestion respectively, as shown in Table 4. The filters of these two networks, shown in Figure 4, also appear very different. The filters for the network trained on all conditions look more like they detect sharp transitions between dark and light areas, while the filters for the network trained on just the day-clear condition look more uniform; they are mostly white or black. This suggests that for day conditions the network learned to look for open road, whereas when other conditions are present, a better solution is to look for sharp transitions such as headlights.

Since brightness is not a clear indicator of congestion, we decided to use transformations which depend more on edges and textures in an image. The idea to use other transformations comes from simulation results in (24) and the properties of the transforms.

The Fast Fourier Transform and the wavelet transform produce frequency representations of an image. In the FFT, high frequency components correspond to sharp changes in the image. There may be some correlation, which a network could learn, between high frequencies and the number of cars in an image. For example, an empty road has fewer sharp transitions than a road with many cars on it. Figure 6 shows the FFT output for a free flow and a congested image; the FFT of the congested image has vertical stripes in the high frequency regions. The network trained on FFT images performs much better than the network trained on grayscale images across all conditions except for congestion during day-rain and night-clear. Additionally, congestion day-clear and congestion night-clear have low accuracy for unknown reasons.
FIGURE 6 : The FFT output for a free flow image (left) and a congested image (right).
The wavelet transform, at a high level, produces four representations of the image. One of the four is a compressed version of the original image; the other three highlight vertical, horizontal, and diagonal edges, as in Figure 2(c). The wavelet transform can highlight vehicles as vertical edges. With this transformation, the network improved in two congestion conditions but suffered the same poor performance in free flow day-clear and congestion night-rain as the grayscale version. With respect to WT, we also discovered that horizontal shadows, such as the example in Figure 7, can look like vehicles.

In an attempt to get the best of FFT and WT, we combined the output of both transformations into a single input image, shown in Figure 2(d). This combination failed in the congestion night-rain condition. However, it performed better than grayscale images in all other conditions. It also overcame some of the shortcomings of FFT and WT alone while keeping high accuracy. This suggests that FFT and WT capture different and very important features for congestion detection.

It is important to note that some examples are difficult even for a human to distinguish. In Figure 8, raindrops on the camera make it difficult to see the traffic. The scattered light might make it appear that there are many vehicles in grayscale, or very few in WT or FFT, as the textures and edges get blurred. In many examples the images show congestion in one direction and free flow in the other, as in Figure 9. In our approach we label any image which has congestion anywhere as an example of congestion; however, a network could still become confused.
FIGURE 7 : Empty road with a horizontal shadow, which can look like a vehicle in the WT output.
FIGURE 8 : Raindrops on the camera can blur the detail of the image.
FIGURE 9 : In some images both congested and free flow lanes are visible, which can confuse a network.
7 CONCLUDING REMARKS
In this research, we attempt to address the problem of real-time traffic congestion detection. By using a direct perception approach, mapping a single image directly to a congestion indicator, we design a system which can detect congestion independently of location, time, and weather. We demonstrate that the use of FFT and WT with a convolutional neural network can produce high accuracy across multiple conditions in new locations.

These results are promising but still exploratory. Future research steps in this area include the creation of a larger dataset and the training of larger networks. This dataset should include tens of thousands to hundreds of thousands of images from 50 or more locations, which would allow for more thorough testing. Additionally, a convolutional neural network with more layers may improve results. Such a model would require more data and computational power than was available for this research. Considering the success of such models in other domains and the promising results in this study, the performance of a larger convolutional neural network would be interesting to investigate.
References

[1] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.

[2] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.

[3] Dave Steinkrau, Patrice Y. Simard, and Ian Buck. Using GPUs for machine learning algorithms. In Proceedings of the Eighth International Conference on Document Analysis and Recognition, pages 1115–1119. IEEE Computer Society, 2005.

[4] Shuiwang Ji, Wei Xu, Ming Yang, and Kai Yu. 3D convolutional neural networks for human action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1):221–231, 2013.

[5] Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, and Li Fei-Fei. Large-scale video classification with convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1725–1732, 2014.

[6] Chenyi Chen, Ari Seff, Alain Kornhauser, and Jianxiong Xiao. DeepDriving: Learning affordance for direct perception in autonomous driving. In Proceedings of the IEEE International Conference on Computer Vision, pages 2722–2730, 2015.

[7] Sepehr Valipour, Mennatullah Siam, Eleni Stroulia, and Martin Jagersand. Parking stall vacancy indicator system based on deep convolutional neural networks. arXiv preprint arXiv:1606.09367, 2016.

[8] James J. Gibson. The Ecological Approach to Visual Perception: Classic Edition. Psychology Press, 2014.

[9] Shimon Ullman. Against direct perception. Behavioral and Brain Sciences, 3(03):373–381, 1980.

[10] Panos G. Michalopoulos. Vehicle detection video through image processing: the Autoscope system. IEEE Transactions on Vehicular Technology, 40(1):21–29, 1991.

[11] Rita Cucchiara, Massimo Piccardi, and Paola Mello. Image analysis and rule-based reasoning for a traffic monitoring system. IEEE Transactions on Intelligent Transportation Systems, 1(2):119–130, 2000.

[12] M. Fathy and M. Y. Siyal. Real-time image processing approach to measure traffic queue parameters. IEE Proceedings - Vision, Image and Signal Processing, 142(5):297–303, 1995.

[13] Andrews Sobral, Luciano Oliveira, Leizer Schnitman, and Felippe De Souza. Highway traffic congestion classification using holistic properties.

[14] Li Li, Long Chen, Xiaofei Huang, and Jian Huang. A traffic congestion estimation approach from video using time-spatial imagery. In First International Conference on Intelligent Networks and Intelligent Systems (ICINIS '08), pages 465–469. IEEE, 2008.

[15] Gintautas Palubinskas, Franz Kurz, and Peter Reinartz. Detection of traffic congestion in optical remote sensing imagery. In IGARSS 2008 - 2008 IEEE International Geoscience and Remote Sensing Symposium, volume 2, pages II-426. IEEE, 2008.

[16] Yuan Yuan, Jia Wan, and Qi Wang. Congested scene classification via efficient unsupervised feature learning and density estimation. Pattern Recognition, 56:159–169, 2016.

[17] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.

[18] Arne Jensen and Anders la Cour-Harbo. Ripples in Mathematics: The Discrete Wavelet Transform. Springer Science & Business Media, 2001.

[19] Adrian S. Lewis and G. Knowles. Image compression using the 2-D wavelet transform. IEEE Transactions on Image Processing, 1(2):244–250, 1992.

[20] G. D. Bergland. A guided tour of the fast Fourier transform. IEEE Spectrum, 6(7):41–52, 1969.

[21] Henri J. Nussbaumer. Fast Fourier Transform and Convolution Algorithms, volume 2. Springer Science & Business Media, 2012.

[22] Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Y. Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 609–616. ACM, 2009.

[23] Vinod Nair and Geoffrey E. Hinton. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 807–814, 2010.

[24] Artur Filipowicz, Thee Chanyaswad, and S. Y. Kung. Filtering of frequency-transformed images for privacy-preserving face recognition. 2016. Submitted to MLSP 2016.

More Related Content

What's hot

Vehicle detection using background subtraction and clustering algorithms
Vehicle detection using background subtraction and clustering algorithmsVehicle detection using background subtraction and clustering algorithms
Vehicle detection using background subtraction and clustering algorithms
TELKOMNIKA JOURNAL
 
A multi sensor-information_fusion_method_based_on_factor_graph_for_integrated...
A multi sensor-information_fusion_method_based_on_factor_graph_for_integrated...A multi sensor-information_fusion_method_based_on_factor_graph_for_integrated...
A multi sensor-information_fusion_method_based_on_factor_graph_for_integrated...
Ashish Sharma
 
New approach to the identification of the easy expression recognition system ...
New approach to the identification of the easy expression recognition system ...New approach to the identification of the easy expression recognition system ...
New approach to the identification of the easy expression recognition system ...
TELKOMNIKA JOURNAL
 
Effective Object Detection and Background Subtraction by using M.O.I
Effective Object Detection and Background Subtraction by using M.O.IEffective Object Detection and Background Subtraction by using M.O.I
Effective Object Detection and Background Subtraction by using M.O.I
IJMTST Journal
 

What's hot (20)

Cnn acuracia remotesensing-08-00329
Cnn acuracia remotesensing-08-00329Cnn acuracia remotesensing-08-00329
Cnn acuracia remotesensing-08-00329
 
Vehicle detection using background subtraction and clustering algorithms
Vehicle detection using background subtraction and clustering algorithmsVehicle detection using background subtraction and clustering algorithms
Vehicle detection using background subtraction and clustering algorithms
 
Applying convolutional neural networks for limited-memory application
Applying convolutional neural networks for limited-memory applicationApplying convolutional neural networks for limited-memory application
Applying convolutional neural networks for limited-memory application
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 
A New Chaotic Map for Secure Transmission
A New Chaotic Map for Secure TransmissionA New Chaotic Map for Secure Transmission
A New Chaotic Map for Secure Transmission
 
A ROS IMPLEMENTATION OF THE MONO-SLAM ALGORITHM
A ROS IMPLEMENTATION OF THE MONO-SLAM ALGORITHMA ROS IMPLEMENTATION OF THE MONO-SLAM ALGORITHM
A ROS IMPLEMENTATION OF THE MONO-SLAM ALGORITHM
 
IEEE 2014 Matlab Projects
IEEE 2014 Matlab ProjectsIEEE 2014 Matlab Projects
IEEE 2014 Matlab Projects
 
IEEE 2014 Matlab Projects
IEEE 2014 Matlab ProjectsIEEE 2014 Matlab Projects
IEEE 2014 Matlab Projects
 
A multi sensor-information_fusion_method_based_on_factor_graph_for_integrated...
A multi sensor-information_fusion_method_based_on_factor_graph_for_integrated...A multi sensor-information_fusion_method_based_on_factor_graph_for_integrated...
A multi sensor-information_fusion_method_based_on_factor_graph_for_integrated...
 
IRJET Autonomous Simultaneous Localization and Mapping
IRJET  	  Autonomous Simultaneous Localization and MappingIRJET  	  Autonomous Simultaneous Localization and Mapping
IRJET Autonomous Simultaneous Localization and Mapping
 
AN ENHANCED CHAOTIC IMAGE ENCRYPTION
AN ENHANCED CHAOTIC IMAGE ENCRYPTIONAN ENHANCED CHAOTIC IMAGE ENCRYPTION
AN ENHANCED CHAOTIC IMAGE ENCRYPTION
 
IRJET- Design the Surveillance Algorithm and Motion Detection of Objects for ...
IRJET- Design the Surveillance Algorithm and Motion Detection of Objects for ...IRJET- Design the Surveillance Algorithm and Motion Detection of Objects for ...
IRJET- Design the Surveillance Algorithm and Motion Detection of Objects for ...
 
New approach to the identification of the easy expression recognition system ...
New approach to the identification of the easy expression recognition system ...New approach to the identification of the easy expression recognition system ...
New approach to the identification of the easy expression recognition system ...
 
CLUSTERING HYPERSPECTRAL DATA
CLUSTERING HYPERSPECTRAL DATACLUSTERING HYPERSPECTRAL DATA
CLUSTERING HYPERSPECTRAL DATA
 
Robust foreground modelling to segment and detect multiple moving objects in ...
Robust foreground modelling to segment and detect multiple moving objects in ...Robust foreground modelling to segment and detect multiple moving objects in ...
Robust foreground modelling to segment and detect multiple moving objects in ...
 
SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ...
SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ...SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ...
SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ...
 
Minimum image disortion of reversible data hiding
Minimum image disortion of reversible data hidingMinimum image disortion of reversible data hiding
Minimum image disortion of reversible data hiding
 
Function projective synchronization
Function projective synchronizationFunction projective synchronization
Function projective synchronization
 
An Efficient Approach for Multi-Target Tracking in Sensor Networks using Ant ...
An Efficient Approach for Multi-Target Tracking in Sensor Networks using Ant ...An Efficient Approach for Multi-Target Tracking in Sensor Networks using Ant ...
An Efficient Approach for Multi-Target Tracking in Sensor Networks using Ant ...
 
Effective Object Detection and Background Subtraction by using M.O.I
Effective Object Detection and Background Subtraction by using M.O.IEffective Object Detection and Background Subtraction by using M.O.I
Effective Object Detection and Background Subtraction by using M.O.I
 

Similar to Direct Perception for Congestion Scene Detection Using TensorFlow

Online video-based abnormal detection using highly motion techniques and stat...
Online video-based abnormal detection using highly motion techniques and stat...Online video-based abnormal detection using highly motion techniques and stat...
Online video-based abnormal detection using highly motion techniques and stat...
TELKOMNIKA JOURNAL
 
Hyperspectral Data Compression Using Spatial-Spectral Lossless Coding Technique
Hyperspectral Data Compression Using Spatial-Spectral Lossless Coding TechniqueHyperspectral Data Compression Using Spatial-Spectral Lossless Coding Technique
Hyperspectral Data Compression Using Spatial-Spectral Lossless Coding Technique
CSCJournals
 
Volkova_DICTA_robust_feature_based_visual_navigation
Volkova_DICTA_robust_feature_based_visual_navigationVolkova_DICTA_robust_feature_based_visual_navigation
Volkova_DICTA_robust_feature_based_visual_navigation
Anastasiia Volkova
 
Threshold adaptation and XOR accumulation algorithm for objects detection
Threshold adaptation and XOR accumulation algorithm for  objects detectionThreshold adaptation and XOR accumulation algorithm for  objects detection
Threshold adaptation and XOR accumulation algorithm for objects detection
IJECEIAES
 
Robust techniques for background subtraction in urban
Robust techniques for background subtraction in urbanRobust techniques for background subtraction in urban
Robust techniques for background subtraction in urban
taylor_1313
 
2013APRU_NO40-abstract-mobilePIV_YangYaoYu
2013APRU_NO40-abstract-mobilePIV_YangYaoYu2013APRU_NO40-abstract-mobilePIV_YangYaoYu
2013APRU_NO40-abstract-mobilePIV_YangYaoYu
Yao-Yu Yang
 

Similar to Direct Perception for Congestion Scene Detection Using TensorFlow (20)

Learning to Recognize Distance to Stop Signs Using the Virtual World of Grand...
Learning to Recognize Distance to Stop Signs Using the Virtual World of Grand...Learning to Recognize Distance to Stop Signs Using the Virtual World of Grand...
Learning to Recognize Distance to Stop Signs Using the Virtual World of Grand...
 
Online video-based abnormal detection using highly motion techniques and stat...
Online video-based abnormal detection using highly motion techniques and stat...Online video-based abnormal detection using highly motion techniques and stat...
Online video-based abnormal detection using highly motion techniques and stat...
 
A study on data fusion techniques used in multiple radar tracking
A study on data fusion techniques used in multiple radar trackingA study on data fusion techniques used in multiple radar tracking
A study on data fusion techniques used in multiple radar tracking
 
Hyperspectral Data Compression Using Spatial-Spectral Lossless Coding Technique
Hyperspectral Data Compression Using Spatial-Spectral Lossless Coding TechniqueHyperspectral Data Compression Using Spatial-Spectral Lossless Coding Technique
Hyperspectral Data Compression Using Spatial-Spectral Lossless Coding Technique
 
Volkova_DICTA_robust_feature_based_visual_navigation
Volkova_DICTA_robust_feature_based_visual_navigationVolkova_DICTA_robust_feature_based_visual_navigation
Volkova_DICTA_robust_feature_based_visual_navigation
 
Autonomous Abnormal Behaviour Detection Using Trajectory Analysis
Autonomous Abnormal Behaviour Detection Using Trajectory AnalysisAutonomous Abnormal Behaviour Detection Using Trajectory Analysis
Autonomous Abnormal Behaviour Detection Using Trajectory Analysis
 
Visual odometry _report
Visual odometry _reportVisual odometry _report
Visual odometry _report
 
IRJET- Real Time Implementation of Air Writing
IRJET- Real Time Implementation of  Air WritingIRJET- Real Time Implementation of  Air Writing
IRJET- Real Time Implementation of Air Writing
 
information-11-00583-v3.pdf
information-11-00583-v3.pdfinformation-11-00583-v3.pdf
information-11-00583-v3.pdf
 
CLEARMiner: Mining of Multitemporal Remote Sensing Images
CLEARMiner: Mining of Multitemporal Remote Sensing ImagesCLEARMiner: Mining of Multitemporal Remote Sensing Images
CLEARMiner: Mining of Multitemporal Remote Sensing Images
 
Ijcatr02011007
Ijcatr02011007Ijcatr02011007
Ijcatr02011007
 
Speed Determination of Moving Vehicles using Lucas- Kanade Algorithm
Speed Determination of Moving Vehicles using Lucas- Kanade AlgorithmSpeed Determination of Moving Vehicles using Lucas- Kanade Algorithm
Speed Determination of Moving Vehicles using Lucas- Kanade Algorithm
 
Automatism System Using Faster R-CNN and SVM
Automatism System Using Faster R-CNN and SVMAutomatism System Using Faster R-CNN and SVM
Automatism System Using Faster R-CNN and SVM
 
Vehicle counting without background modeling
Vehicle counting without background modelingVehicle counting without background modeling
Vehicle counting without background modeling
 
Threshold adaptation and XOR accumulation algorithm for objects detection
Threshold adaptation and XOR accumulation algorithm for  objects detectionThreshold adaptation and XOR accumulation algorithm for  objects detection
Threshold adaptation and XOR accumulation algorithm for objects detection
 
Robust techniques for background subtraction in urban
Robust techniques for background subtraction in urbanRobust techniques for background subtraction in urban
Robust techniques for background subtraction in urban
 
D018112429
D018112429D018112429
D018112429
 
Multi-channel microseismic signals classification with convolutional neural n...
Multi-channel microseismic signals classification with convolutional neural n...Multi-channel microseismic signals classification with convolutional neural n...
Multi-channel microseismic signals classification with convolutional neural n...
 
DEEP LEARNING BASED TARGET TRACKING AND CLASSIFICATION DIRECTLY IN COMPRESSIV...
DEEP LEARNING BASED TARGET TRACKING AND CLASSIFICATION DIRECTLY IN COMPRESSIV...DEEP LEARNING BASED TARGET TRACKING AND CLASSIFICATION DIRECTLY IN COMPRESSIV...
DEEP LEARNING BASED TARGET TRACKING AND CLASSIFICATION DIRECTLY IN COMPRESSIV...
 
2013APRU_NO40-abstract-mobilePIV_YangYaoYu
2013APRU_NO40-abstract-mobilePIV_YangYaoYu2013APRU_NO40-abstract-mobilePIV_YangYaoYu
2013APRU_NO40-abstract-mobilePIV_YangYaoYu
 

More from Artur Filipowicz

Incorporating Learning Strategies in Training of Deep Neural Networks for Au...
Incorporating Learning Strategies in Training of Deep Neural  Networks for Au...Incorporating Learning Strategies in Training of Deep Neural  Networks for Au...
Incorporating Learning Strategies in Training of Deep Neural Networks for Au...
Artur Filipowicz
 
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...
Artur Filipowicz
 
Video Games for Autonomous Driving
Video Games for Autonomous DrivingVideo Games for Autonomous Driving
Video Games for Autonomous Driving
Artur Filipowicz
 

More from Artur Filipowicz (9)

Smart Safety for Commercial Vehicles (ENG)
Smart Safety for Commercial Vehicles (ENG)Smart Safety for Commercial Vehicles (ENG)
Smart Safety for Commercial Vehicles (ENG)
 
Smart Safety for Commercial Vehicles (中文)
Smart Safety for Commercial Vehicles (中文)Smart Safety for Commercial Vehicles (中文)
Smart Safety for Commercial Vehicles (中文)
 
Incorporating Learning Strategies in Training of Deep Neural Networks for Au...
Incorporating Learning Strategies in Training of Deep Neural  Networks for Au...Incorporating Learning Strategies in Training of Deep Neural  Networks for Au...
Incorporating Learning Strategies in Training of Deep Neural Networks for Au...
 
Learning to Recognize Distance to Stop Signs Using the Virtual World of Grand...
Learning to Recognize Distance to Stop Signs Using the Virtual World of Grand...Learning to Recognize Distance to Stop Signs Using the Virtual World of Grand...
Learning to Recognize Distance to Stop Signs Using the Virtual World of Grand...
 
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...
 
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...
 
Filtering of Frequency Components for Privacy Preserving Facial Recognition
Filtering of Frequency Components for Privacy Preserving Facial RecognitionFiltering of Frequency Components for Privacy Preserving Facial Recognition
Filtering of Frequency Components for Privacy Preserving Facial Recognition
 
Desensitized RDCA Subspaces for Compressive Privacy in Machine Learning
Desensitized RDCA Subspaces for Compressive Privacy in Machine LearningDesensitized RDCA Subspaces for Compressive Privacy in Machine Learning
Desensitized RDCA Subspaces for Compressive Privacy in Machine Learning
 
Video Games for Autonomous Driving
Video Games for Autonomous DrivingVideo Games for Autonomous Driving
Video Games for Autonomous Driving
 

Recently uploaded

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Recently uploaded (20)

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Choreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringChoreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software Engineering
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Navigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern EnterpriseNavigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern Enterprise
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptx
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps Productivity
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cf
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using Ballerina
 

Direct Perception for Congestion Scene Detection Using TensorFlow

  • 1. Direct Perception for Congestion Scene Detection Using1 TensorFlow™2 Artur Filipowicz Princeton University 229 Sherrerd Hall, Princeton, NJ 08544 T: +01 732-593-9067 Email: arturf@princeton.edu (corresponding author) Jeremiah Liu New Jersery Institute of Technology 607 Oak Hall, Newark, NJ 07103 T: +01 202-615-5905 Email: jeremiahliu@live.com Joyoung Lee, Ph.D Assistant Professor New Jersery Institute of Technology 264 Tiernan Hall, Newark, NJ 07103 T: +01 973-596-2475 Email: jo.y.lee@njit.edu 3 Word Count: 6,641 words = 3,641 + 3,000 (8 Figures + 4 Tables)4 Paper submitted for consideration for presentation for the 96th TRB Annual Meeting in January5 2017 and for publication in the Transportation Research Record6 1
  • 2. A. Filipowicz, J. Liu, J. Lee 2 1 ABSTRACT1 In this research, we examine a new approach to the problem of real-time traffic congestion2 detection based on single image analysis. We demonstrate the of a convolutional neural network in3 this domain. With this learning model and the direct perception approach, transforming an image4 directly to a congestion indicator, we design a system which can detect congestion independently5 of location, time and weather. We further demonstrate that the use of the Fast Fourier Transform6 and wavelet transform can improve the accuracy of a convolutional neural network across multiple7 conditions in new locations.8
  • 3. A. Filipowicz, J. Liu, J. Lee 3 2 INTRODUCTION1 Traffic flow information is widely used in intelligent transportation systems to detect and2 manage traffic congestion. Collecting traffic data such as speed, count, and occupancy is one of3 the crucial parts in estimating the flow of traffic. Loop detectors, traffic radars, and surveillance4 cameras are used to collect such traffic data. However, due to the inflexibility and high cost of5 deploying loop detectors and traffic radars, video-content-understanding techniques are gaining6 popularity in detecting the flow of traffic. However, video data is often affected by the external7 environment, bad weather (e.g., rain, snow, and fog) and undesirable illumination (e.g., sun glare,8 darkness) conditions. As a result, a challenge in video-based detection is to properly interpret9 video images under such conditions.10 Recently, new advances in the field of machine learning allowed for the creation of more11 reliable computer vision algorithms. These advances include the development of larger models,12 known as Deep Learning (1), and the use of graphical processing units to speed up optimization of13 these models (2, 3). Deep Learning led to successful the use of convolutional neural networks to14 achieve high accuracy on many image (2, 4) and video recognition tasks (5). Furthermore, these15 models have been shown to generalize to new environments and conditions(6, 7).16 In this research, we attempt to address the problem of real-time traffic congestion detection.17 Unlike most previous approaches, we apply ideas from deep learning, computer vision, and signal18 processing to explore the potential of using convolutional neural networks to detect congestion. By19 using a direct perception (8) approach, mapping a single image directly to a congestion indicator,20 we design a system which can detect congestion independently of location, time and weather. We21 also demonstrate the benefit of using signal processing to pre-process images. To the best of our22 knowledge, this is the first study which looks at the performance of convolutional neural networks23 in this domain and the first study which examines the applicability of a single model across multiple24 locations and conditions.25 3 RELATED WORK26 Research efforts in video processing for traffic surveillance and control date back to the27 mid- 1970’s. We find it useful to categorize these systems into direct perception (8) and mediated28 perception (9) concepts discussed in (6).29 Mediated perception approach involves multiple sub-components for scene classification.30 These components could include a vehicles counter and a vehicle speed detector. With mediated31 perception, the estimates of both of these detectors are combined to determine if congestion is32 present. Systems based on mediated perception approaches are convenient for debugging and fine33 tuning. In such systems, we can trace errors back to individual components, and attempt to im-34 prove the system component by component. However, mediated perception also adds unnecessary35 complexity to an already difficult task. There is no clear reason to believe that vehicle detection in36 images is any easier than directly perceiving congestion. Additionally, the combining of outputs37 may introduce more parameters making the system’s performance harder to optimize. In contrast,38 direct perception approach focuses on finding a transformation from a single image directly to a39 congestion indicator. 
This approach is potentially simpler, more generalizable, and computation-40 ally more efficient. The drawback is that it is more difficult to identify reasons for failure. Direct41
  • 4. A. Filipowicz, J. Liu, J. Lee 4 perception appears to be gaining some traction in more recent literature.1 3.1 Mediated Perception2 Most of congestion scene classification techniques are based on mediated perception. One3 popular method used for congestion detection is traffic parameter extraction. Examples can be4 dated back to (10) describing the Autoscope system. The system involves two steps. First step is5 to detect vehicles. This is done by defining an area of interest and placing detection lines along6 or across the roadway lanes, then using a segmentor to estimate the pixel changes on the line7 when a vehicle is present. Next step is to derive traffic parameters such as speed by analyzing8 sequential images. The average accuracy is 95% under ideal conditions. However, the false alarm9 significantly increases under certain situations such as overlap of vehicles. Shadows also negatively10 affect the performance of the system. Cucchiara et al. (11) improved the algorithm presented in11 (10) by separating analysis techniques between day and night. A vehicle tracking accuracy of12 96.9% with 5.8% false negatives, 13.2% false positive during day time, and an accuracy of 96%13 with 4.5% false negative and 4.9% false positive during night time is achieved.14 Along a different line of research, (12) presents a way to measure traffic queue parameter15 by applying a motion detector based on Fast Fourier Transform (FFT) and followed with a vehicle16 detector based on edge detection. The result has a 95% accuracy on determining the length of the17 queue under constant lighting conditions between two frames. Similarly, (13) classified congestion18 scene by extracting the overall crowd density and crowd speed. The result shows 94.5% accuracy in19 single location under daylight condition. No evaluation at other locations and night light condition20 was discussed in this experiment. Background subtraction is convenient to implement. However,21 it is hard to extract a background model in conditions such as sudden light change and heavy22 congestion.23 Another approach to detect congestion scene was explained in (14), where Li et al. pro-24 posed a time-spatial image based method and achieved 93% detection rate for congestion condi-25 tion. According to the author, the false estimation results are due to low contrast image, small26 vehicle block and irregular lane condition. There was no evaluation under other weather condition27 such as rain. Palubinskas et al. (15) approaches traffic congestion detection in sequences of optical28 images based on change detection, image processing and incorporation of apriori information such29 as a traffic model and road network. The accuracy of congestion detection was not reported.30 3.2 Direct Perception31 Many systems developed in the past depend on sequential images, edge detection, and32 thresholding. In most cases the objective is to determine more complicated traffic descriptors than33 a single indicator. Both false positive and false negative rates have to be very low to acquire a34 practical system. Currently, most of the work is still carried out by human operators due to a35 high false alarm rate. Recently, a new direct perception approach is gaining popularity due to36 better performance in adapting to various environments. For example, (16) utilized an efficient37 unsupervised feature learning method with density information encoded. 
Spherical k-means was employed to learn features, followed by a feature selection procedure to remove bad features. Locality-constrained linear coding (LLC) was used to map raw image patches to a new feature space, and an SVM classifier was trained for classification. The proposed algorithm achieved 85% average accuracy.
Evaluation of accuracy under different weather and light conditions was not performed. Recent work done by (7) is very relevant to our approach. Valipour et al. (7) presented a parking stall vacancy detection algorithm based on deep convolutional neural networks. The result showed a misclassification rate as low as 5% under varying weather conditions.

4 METHODOLOGY

For this study, we collected over 26 thousand images from traffic surveillance cameras, labeled the data based on traffic, time, and weather conditions, applied 4 image processing transformations, and used the new TensorFlow library (17) to construct and train convolutional neural networks.

4.1 Dataset

We collected and labeled 27,833 images from surveillance cameras at 24 locations in New Jersey. The dataset contains a variety of traffic patterns and weather conditions grouped into four classes: day-clear, day-rain, night-clear, and night-rain. Figure 1 shows a sample image from each class. We divided the dataset randomly into training and testing sets by location, such that images from the same location do not appear in both sets; a sketch of this split appears after Table 2. Tables 1 and 2 detail the exact division of examples. We used the training set for training and tuning and the testing set to obtain the final accuracy.

TABLE 1 : Percent of examples of free flow and congestion in the training and test sets

  Condition    Training Examples   Test Examples
  Free flow    59%                 44%
  Congestion   41%                 56%

TABLE 2 : Number of examples of each condition in the dataset

  Condition                Training Examples   Test Examples
  Free Flow Day Clear      3,098               1,100
  Free Flow Day Rain       3,131               1,057
  Free Flow Night Clear    2,960               1,200
  Free Flow Night Rain     2,122               599
  Congestion Day Clear     2,443               1,852
  Congestion Day Rain      1,395               2,287
  Congestion Night Clear   2,460               562
  Congestion Night Rain    1,326               241
  Total                    18,935              8,898
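The location-disjoint split can be made concrete with a short sketch. This is a minimal illustration, not our actual pipeline; the field name location_id, the test fraction, and the fixed seed are assumptions introduced here.

```python
import random

def split_by_location(records, test_fraction=0.3, seed=42):
    """Assign whole camera locations to either the training or the test set,
    so that no location contributes images to both."""
    locations = sorted({r["location_id"] for r in records})
    random.Random(seed).shuffle(locations)
    n_test = max(1, int(len(locations) * test_fraction))
    test_locations = set(locations[:n_test])
    train = [r for r in records if r["location_id"] not in test_locations]
    test = [r for r in records if r["location_id"] in test_locations]
    return train, test
```

Splitting by location, rather than by individual image, is what allows the test accuracies below to be read as generalization to new camera sites.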
FIGURE 1 : Sample images of all 8 conditions in our dataset (free flow and congestion under day-clear, day-rain, night-clear, and night-rain).

4.2 Image Processing

Before training, we scaled the images to 140x100 pixels and applied 4 different image processing transformations: grayscale, the Fast Fourier Transform (FFT), the wavelet transform (WT), and our mixture of the wavelet and Fast Fourier Transforms (WT+FFT). For this research, it is important to understand the output of these transformations at a high level. The low-level details and mathematical equations are not illustrative, and we therefore omit them.

In a grayscale image, as shown in Figure 2(a), each pixel represents the intensity of light at that location. The basic gray scaling of the images serves as a baseline, and it is necessary because most images are already black and white. In the FFT, shown in Figure 2(b), the image is represented as a collection of amplitudes and phases. For our study, we discard the phases and focus on the amplitudes. In this representation, amplitudes of high-frequency components correspond to sharp changes in the image; when they are removed, the image becomes blurred. In Figure 2(b) the high-frequency components are away from the center of the image. The wavelet transform (18, 19), at a high level, produces four representations of the image. One of the four is a compressed version of the original image, which we discard as it carries information similar to the grayscale image. The other three highlight vertical, horizontal, and diagonal edges in the image, as shown in Figure 2(c). Our WT+FFT transformation appends the WT output to the FFT output, as shown in Figure 2(d). The FFT output is also scaled such that the range (max - min) of the FFT matches the range of the WT. For a general introduction to the FFT, one can refer to (20); for a deeper treatment, refer to (21). In the discussion section we analyze how the properties of these transforms may be useful for congestion detection. After applying these transformations, we standardized the training set by subtracting the mean and dividing by the standard deviation of each pixel.
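As an illustration, the transformations can be sketched in a few lines with NumPy and PyWavelets. The choice of these libraries and of the Haar wavelet is an assumption made for this sketch; the exact implementation details are omitted above.

```python
import numpy as np
import pywt

def fft_amplitude(gray):
    """Amplitude spectrum with low frequencies shifted to the center;
    the phases are discarded, as described above."""
    return np.abs(np.fft.fftshift(np.fft.fft2(gray)))

def wavelet_details(gray):
    """Single-level 2-D wavelet transform; the compressed approximation
    (cA) is discarded, keeping horizontal, vertical, and diagonal detail."""
    _cA, (cH, cV, cD) = pywt.dwt2(gray, "haar")
    return np.concatenate([cH, cV, cD], axis=1)

def wt_plus_fft(gray):
    """Append the WT detail bands to the FFT amplitudes after rescaling
    the FFT so its range (max - min) matches the range of the WT output."""
    wt = wavelet_details(gray)
    fft = fft_amplitude(gray)
    fft = fft * (np.ptp(wt) / np.ptp(fft))
    return np.concatenate([wt.ravel(), fft.ravel()])

def standardize(batch):
    """Per-pixel standardization over a stack of transformed images."""
    return (batch - batch.mean(axis=0)) / (batch.std(axis=0) + 1e-8)
```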
FIGURE 2 : Outputs of the four transformations: (a) grayscale, (b) FFT, (c) WT, (d) WT+FFT.
4.3 TensorFlow™

TensorFlow is a library used to build and train gradient-based machine learning models; it provides a convenient interface for expressing machine learning algorithms by converting the described computations into a data flow graph (17). Data flow graphs describe mathematical computation with a directed graph of nodes and edges. Nodes represent mathematical operations as well as data inputs and persistent variables. Edges describe the input/output relationships between nodes in the form of multidimensional data arrays. Computation can be distributed across multiple CPUs and GPUs (17).

4.4 Convolutional Neural Network

For our experiments we selected a convolutional neural network model. A convolutional neural network is a machine learning model which attempts to mimic the function of the human vision system. These networks tend to perform well on vision tasks because some of the parameters the network learns are arranged into filters. These filters are convolved with the input image: each filter is moved across the image and, at regular intervals, multiplied with the pixel values in that region. The effective result is that the network can match a particular pattern, represented by a filter, to all areas in an image. This allows the network to detect objects in different locations. With multiple layers of such filters, networks can detect objects even when the size of the object varies. These filters can serve as simple edge detectors or detect more complicated patterns such as facial features. Visualizations of filters can be found in (22).

We constructed a network with one convolution layer of 8 5x5 filters. This layer is followed by max pooling and normalization layers. Our model then has 3 fully connected layers with output sizes 389, 192, and 1. All layers use the ReLU activation function (2, 23), except for the last layer, which uses a softmax function. During training, we employ dropout, weight decay, and learning rate decay to avoid overfitting and improve results. Additionally, for grayscale images we apply single-image whitening and the data augmentation techniques of randomly changing contrast and brightness.
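A minimal sketch of this architecture expressed as a TensorFlow data flow graph, written against the TensorFlow 1.x-era API contemporary with (17), is shown below. The pooling and normalization parameters, the decay constants, and the use of a sigmoid cross-entropy loss over the single output (in place of the softmax mentioned above) are assumptions made for this sketch, not our exact training configuration.

```python
import tensorflow as tf

# Inputs: pre-processed images (here grayscale 140x100) and binary labels.
images = tf.placeholder(tf.float32, [None, 100, 140, 1])
labels = tf.placeholder(tf.float32, [None, 1])  # 1 = congestion, 0 = free flow
keep_prob = tf.placeholder(tf.float32)          # dropout keep probability

# One convolutional layer of 8 5x5 filters with ReLU activation.
conv = tf.layers.conv2d(images, filters=8, kernel_size=5,
                        padding="same", activation=tf.nn.relu)
# Max pooling and normalization layers (parameters assumed).
pool = tf.layers.max_pooling2d(conv, pool_size=2, strides=2)
norm = tf.nn.local_response_normalization(pool)

# Three fully connected layers with output sizes 389, 192, and 1.
flat = tf.layers.flatten(norm)
fc1 = tf.nn.dropout(tf.layers.dense(flat, 389, activation=tf.nn.relu), keep_prob)
fc2 = tf.nn.dropout(tf.layers.dense(fc1, 192, activation=tf.nn.relu), keep_prob)
logit = tf.layers.dense(fc2, 1)                 # congestion indicator

# Binary cross-entropy loss trained with a decaying learning rate.
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logit))
step = tf.train.get_or_create_global_step()
lr = tf.train.exponential_decay(0.1, step, decay_steps=1000, decay_rate=0.96)
train_op = tf.train.GradientDescentOptimizer(lr).minimize(loss, global_step=step)
```

Weight decay can be added as an L2 penalty on the layer kernels, and the grayscale augmentations correspond to tf.image.random_brightness and tf.image.random_contrast applied to training batches.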
5 RESULTS

We trained the above model on each of the four transformations. Table 3 lists the accuracy on the test set for each transformation; Figure 3 plots the same accuracies. While no transformation is superior in all conditions, FFT, WT, and WT+FFT outperform grayscale in most conditions. Since the accuracies for free flow and congestion in the day-clear condition appear low for what seems to be an easy condition, we also trained a network on just those conditions. The performance of that network on the test set in those conditions is about 20% better, as summarized in Table 4.

FIGURE 3 : Accuracy on the test set for (a) grayscale, (b) FFT, (c) WT, and (d) WT+FFT.
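The per-condition figures in Table 3 and Figure 3 are simple grouped accuracies over the test set; a minimal sketch, with hypothetical field names, is:

```python
from collections import defaultdict

def accuracy_by_condition(examples):
    """Accuracy per condition label (e.g. 'Congestion Night Rain');
    the field names (condition, label, prediction) are illustrative."""
    correct, total = defaultdict(int), defaultdict(int)
    for ex in examples:
        total[ex["condition"]] += 1
        correct[ex["condition"]] += int(ex["prediction"] == ex["label"])
    return {c: correct[c] / total[c] for c in total}
```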
TABLE 3 : Accuracy on the test set

  Condition                Grayscale   FFT     WT      WT+FFT
  Free Flow Day Clear      0.637       0.970   0.650   0.650
  Free Flow Day Rain       0.996       0.995   0.998   0.986
  Free Flow Night Clear    0.678       0.965   1.000   0.820
  Free Flow Night Rain     0.534       0.998   0.973   0.947
  Congestion Day Clear     0.457       0.604   0.455   0.645
  Congestion Day Rain      0.724       0.438   0.835   0.846
  Congestion Night Clear   0.558       0.538   0.937   0.993
  Congestion Night Rain    0.091       0.842   0.062   0.008

TABLE 4 : Accuracy for training and testing under the day-clear condition

  Condition                Grayscale
  Free Flow Day Clear      0.85
  Congestion Day Clear     0.98

6 DISCUSSION

The accuracies in our results vary from very low to very high. This shows the general difficulty of the congestion recognition problem: a detection system may perform extremely well in one condition and location but not another. It is also apparent that a raw image, which is easy for a human to understand, may not be the best representation for a machine to detect congestion. Let us examine these results in the context of what each transformation represents.

In a grayscale image, each pixel represents the intensity of light at that location. We could speculate that the amount of light in an image corresponds to the number of vehicle headlights or taillights. Perhaps vehicles being brighter than the pavement could also correlate their number with the brightness of the image. However, as seen in Figure 5, under certain conditions, such as rain at night, even free flow traffic can appear very bright.

In our tests, a network trained on grayscale images functions better in rain and at night than during the day, suggesting that the network correlated its output with lights in the image. The network is confused by free flow day-clear images and by congestion at night in rain. A possible explanation is that the free flow day images in the test set had very light road surfaces and very few vehicles: they are bright, but for a different reason, and bright enough for the network to conclude the image shows congestion. A similar reason might apply to the poor performance on night congestion during rain. A difficulty with this analysis is that the network uses a very complicated nonlinear decision function; thus, our notion of brightness is a qualitative observation.
FIGURE 4 : Filters learned from all conditions (a) and filters learned under the day-clear condition (b).

FIGURE 5 : Free flow and congestion conditions with a lot of brightness in the image.

To further explore this question, we trained our network on images from the training set which were taken on clear days. We then tested this network on day-clear images in the test set. The results improved dramatically, from 65% and 45% to 85% and 98% accuracy on free flow and congestion respectively, as shown in Table 4. The filters of these two networks, shown in Figure 4, appear very different as well. The filters for the network trained in all conditions look more like detectors of sharp transitions between dark and light areas, while the filters for the network trained in just the day-clear condition look more uniform: they are mostly white or black. This suggests that for day conditions, the network learned to look for open road. When other conditions are present, a more optimal solution is to look for sharp transitions such as headlights.

Since brightness is not a clear indicator of congestion, we decided to use transformations which depend more on the edges and textures in an image. The idea to use other transformations comes from simulation results in (24) and from the properties of the transforms.

The Fast Fourier Transform and wavelet transform produce a frequency representation of an image. In the FFT, high-frequency components correspond to sharp changes in the image.
There may be some correlation between high frequencies and the number of cars in an image which a network could learn. For example, an empty road has fewer sharp transitions than a road with many cars on it. Figure 6 shows the FFT output for a free flow and a congested image. The FFT of the congested image has vertical stripes in the high-frequency regions. The network trained on FFT images performs much better than the network trained on grayscale images across all conditions except for congestion during day-rain and night-clear. Additionally, congestion day-clear and congestion night-clear have low accuracy for unknown reasons.

FIGURE 6 : The FFT output for an image with congestion and an image with free flow.

The wavelet transform, at a high level, produces four representations of the image. One of the four is a compressed version of the original image. The other three highlight vertical, horizontal, and diagonal edges in the image, as in Figure 2(c). The wavelet transform can highlight vehicles as vertical edges. With this transformation, the network improved in two congestion conditions but suffered the same poor performance in free flow day-clear and congestion night-rain as the grayscale version. With respect to the WT, we also discovered that horizontal shadows, as in the example in Figure 7, can look like vehicles.

In an attempt to get the best of both the FFT and the WT, we combined the outputs of both transformations into a single input image, shown in Figure 2(d). This combination failed in the congestion night-rain condition. However, it performed better than grayscale images in all other conditions. It also overcame some of the shortcomings of the FFT and WT alone while keeping high accuracy. This suggests that the FFT and WT capture different and very important features for congestion detection.

It is important to note that some examples are difficult even for a human to distinguish. In Figure 8, rain drops on the camera make it difficult to see the traffic. The scattered light might make it appear that there are many vehicles in grayscale, or very few in the WT or FFT, as the textures and edges get blurred. In many examples the images show congestion in one direction and free flow in the other, as in Figure 9. In our approach we label any image which has congestion anywhere as an example of congestion. However, a network could still become confused.
FIGURE 7 : Empty road with a horizontal shadow which can look like a vehicle in the WT output.

FIGURE 8 : Rain drops on a camera can blur the detail of the image.

FIGURE 9 : In some images congested and free flow lanes are visible, which can confuse a network.
7 CONCLUDING REMARKS

In this research, we attempt to address the problem of real-time traffic congestion detection. By using a direct perception approach, mapping a single image directly to a congestion indicator, we design a system which can detect congestion independently of location, time, and weather. We demonstrate that the use of the FFT and WT with a convolutional neural network can produce high accuracy across multiple conditions in new locations.

These results are promising but still exploratory. Future research steps in this area include the creation of a larger dataset and the training of larger networks. This dataset should include tens of thousands to hundreds of thousands of images from 50 or more locations, which would allow for more thorough testing. Additionally, a convolutional neural network with more layers may improve results. Such a model would require more data and computational power than was available for this research. Considering the success of such models in other domains and the promising results in this study, the performance of a larger convolutional neural network would be interesting to investigate.

References

[1] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.

[2] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.

[3] Dave Steinkrau, Patrice Y. Simard, and Ian Buck. Using GPUs for machine learning algorithms. In Proceedings of the Eighth International Conference on Document Analysis and Recognition, pages 1115–1119. IEEE Computer Society, 2005.

[4] Shuiwang Ji, Wei Xu, Ming Yang, and Kai Yu. 3D convolutional neural networks for human action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1):221–231, 2013.

[5] Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, and Li Fei-Fei. Large-scale video classification with convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1725–1732, 2014.

[6] Chenyi Chen, Ari Seff, Alain Kornhauser, and Jianxiong Xiao. DeepDriving: Learning affordance for direct perception in autonomous driving. In Proceedings of the IEEE International Conference on Computer Vision, pages 2722–2730, 2015.

[7] Sepehr Valipour, Mennatullah Siam, Eleni Stroulia, and Martin Jagersand. Parking stall vacancy indicator system based on deep convolutional neural networks. arXiv preprint arXiv:1606.09367, 2016.

[8] James J. Gibson. The Ecological Approach to Visual Perception: Classic Edition. Psychology Press, 2014.
[9] Shimon Ullman. Against direct perception. Behavioral and Brain Sciences, 3(03):373–381, 1980.

[10] Panos G. Michalopoulos. Vehicle detection video through image processing: the Autoscope system. IEEE Transactions on Vehicular Technology, 40(1):21–29, 1991.

[11] Rita Cucchiara, Massimo Piccardi, and Paola Mello. Image analysis and rule-based reasoning for a traffic monitoring system. IEEE Transactions on Intelligent Transportation Systems, 1(2):119–130, 2000.

[12] M. Fathy and M. Y. Siyal. Real-time image processing approach to measure traffic queue parameters. IEE Proceedings - Vision, Image and Signal Processing, 142(5):297–303, 1995.

[13] Andrews Sobral, Luciano Oliveira, Leizer Schnitman, and Felippe De Souza. Highway traffic congestion classification using holistic properties.

[14] Li Li, Long Chen, Xiaofei Huang, and Jian Huang. A traffic congestion estimation approach from video using time-spatial imagery. In First International Conference on Intelligent Networks and Intelligent Systems (ICINIS '08), pages 465–469. IEEE, 2008.

[15] Gintautas Palubinskas, Franz Kurz, and Peter Reinartz. Detection of traffic congestion in optical remote sensing imagery. In IGARSS 2008 - IEEE International Geoscience and Remote Sensing Symposium, volume 2, pages II-426. IEEE, 2008.

[16] Yuan Yuan, Jia Wan, and Qi Wang. Congested scene classification via efficient unsupervised feature learning and density estimation. Pattern Recognition, 56:159–169, 2016.

[17] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.

[18] Arne Jensen and Anders la Cour-Harbo. Ripples in Mathematics: The Discrete Wavelet Transform. Springer Science & Business Media, 2001.

[19] Adrian S. Lewis and G. Knowles. Image compression using the 2-D wavelet transform. IEEE Transactions on Image Processing, 1(2):244–250, 1992.

[20] G. D. Bergland. A guided tour of the fast Fourier transform. IEEE Spectrum, 6(7):41–52, 1969.

[21] Henri J. Nussbaumer. Fast Fourier Transform and Convolution Algorithms, volume 2. Springer Science & Business Media, 2012.

[22] Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Y. Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 609–616. ACM, 2009.
[23] Vinod Nair and Geoffrey E. Hinton. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 807–814, 2010.

[24] Artur Filipowicz, Thee Chanyaswad, and S. Y. Kung. Filtering of frequency-transformed images for privacy-preserving face recognition. 2016. Submitted to MLSP 2016.