Udacity Self-Driving Car: Term-1 Bill Kromydas
Project 5: Vehicle Detection and Tracking January 14, 2018
Vehicle Detection and Tracking
Summary
The objective of this project is to build a vehicle detection system based on the video feed from a camera mounted on the front of a car. Image frames from the video are processed to detect and track nearby vehicles. The image processing pipeline from the Advanced Lane Finding project was integrated with the vehicle detection and tracking pipeline from this project to produce both lane markings and vehicle tracking boxes, as shown in the example image above. The processing pipeline for the vehicle detection and tracking system includes the following steps:
- Region of interest definition (search sector design)
- Sliding window search within the ROI
- Image feature extraction from sliding windows
- Processing image frame detections (accumulating image frame detections in a heatmap)
- Track initiation and maintenance (heatmap integration over multiple frames)
- Rendering bounding boxes to show vehicles in track
	
Feature Extraction
As suggested in the lecture notes, several feature types were used as input to a classifier for detecting vehicles: color histograms, spatial binning of color, and histograms of oriented gradients (HOG). All three feature types were extracted and concatenated to form a final feature vector for classification, as described in the lecture notes. I experimented with the HOG parameters by comparing classification performance on the test data, with consideration for performance through the entire pipeline as well. I finally settled on the values listed in the table below, which are close to the default values in the lecture notes with the exception of the number of orientation bins. I did not see a significant difference in detection and tracking performance when experimenting with various feature parameters, but in some cases the overall tracking behavior seemed most consistent with the parameter set below.
HOG Parameter       Value
Cells per block     2
HOG Channels        All
Pixels per Cell     8
Orientation Bins    13
	
The sequences of images below illustrate what the HOG features look like for five random images
from the data set for each class label (vehicles and non-vehicles).
Vehicles	
Non-Vehicles
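The three feature types described above can be sketched as follows. This is a minimal NumPy sketch with illustrative function names, not the project's actual code: the real pipeline would use cv2.resize for spatial binning and skimage.feature.hog (with the parameters from the table) for the HOG features, which are passed in here as a precomputed vector.

```python
import numpy as np

def bin_spatial(img, size=16):
    """Spatial binning of color: down-sample the image and flatten."""
    h, w, c = img.shape
    # Simple mean-pooling down-sample to size x size (cv2.resize in practice).
    pooled = img[:h - h % size, :w - w % size].reshape(
        size, h // size, size, w // size, c).mean(axis=(1, 3))
    return pooled.ravel()

def color_hist(img, nbins=32):
    """Per-channel color histograms, concatenated into one vector."""
    return np.concatenate([
        np.histogram(img[:, :, ch], bins=nbins, range=(0, 256))[0]
        for ch in range(img.shape[2])])

def extract_features(img, hog_features):
    """Concatenate spatial bins, color histograms, and HOG features."""
    return np.concatenate([bin_spatial(img), color_hist(img), hog_features])
```

The final feature vector is simply the concatenation of the three parts, so its length is fixed for a given parameter set, which is what the classifier requires.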
Classifier	
A support vector machine (SVM) classifier was used for the purpose of detecting cars. I experimented briefly with a non-linear SVM, but I found the results from a linear model to be quite good and therefore used the linear model for this project. The classifier was trained by splitting the dataset into training and test sets (80% and 20%, respectively). The lecture notes caution about training on time-series data in this fashion, where images are closely spaced in time; to mitigate this issue the data was shuffled before splitting it into training and test sets. To mitigate overfitting, cross-validation was used to tune the regularization parameter C. C was varied from 0.00001 to 1.0 in factors of 10, as shown in the code snippet below.
parameters = {'C': [0.00001, 0.0001, 0.001, 0.01, 0.1, 1]}
svr = LinearSVC()  # base linear SVM estimator
svc = GridSearchCV(svr, parameters)
A value of C = 0.0001 was the best value returned by the grid search above and was used for all the results presented in this report.
Region of Interest and Sliding Window Search
The process for vehicle detection begins with the definition of a region of interest (ROI) within which a sliding window search is performed. The region of interest is confined to the lower portion of the image frame near the road surface (y > 400 pixels). Since the apparent size of other vehicles depends on their location relative to the camera, the region of interest is subdivided into four overlapping search sectors of different scales and sizes within the overall ROI. Generally, window searches near the horizon are performed at a smaller (i.e., finer) scale than window searches closer to the camera. The figure below shows the progression of the four search bands developed for the sliding window search.
Sector [1]        Sector [1,2]
Sector [1,2,3]    Sector [1,2,3,4]
The table below shows the size and location of the search bands (and corresponding scales) within
which sliding windows are scanned. The dimensions of each search sector are constructed such that
an integer number of sliding windows fit within each search sector.
Search    Scale    Window       Overlapping        x-start    x-stop    y-start    y-stop
Sector             Size         windows (x × y)
1         0.875    56 x 56      48 x 4 = 192       380        1094      400        498
2         1.250    80 x 80      46 x 3 = 138       280        1200      420        540
3         1.500    96 x 96      38 x 3 = 114       200        1208      424        592
4         1.750    112 x 112    39 x 2 = 120       48         1256      436        660
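The sliding-window generation for a search sector can be sketched as below. The helper name is illustrative; the bounds and window size are taken from sector 1 in the table, and a 75% overlap means the window steps by one quarter of its width:

```python
def slide_window(x_start, x_stop, y_start, y_stop, size, overlap=0.75):
    """Generate (x1, y1, x2, y2) windows covering a search sector."""
    step = int(size * (1 - overlap))           # 56 * 0.25 = 14 px for sector 1
    windows = []
    for y in range(y_start, y_stop - size + 1, step):
        for x in range(x_start, x_stop - size + 1, step):
            windows.append((x, y, x + size, y + size))
    return windows

# Sector 1 from the table: 56x56 windows in x = [380, 1094], y = [400, 498]
sector1 = slide_window(380, 1094, 400, 498, size=56)
# 48 positions in x and 4 in y give the 192 windows listed in the table
```

The sector bounds were sized so that an integer number of windows fits exactly, which is why the counts in the table come out to whole products such as 48 x 4.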
A window overlap of 75% was used for all window regions within the ROI, which corresponds to cells_per_step = 2 (with 8×8-pixel cells and windows 8 cells across, stepping 2 cells retains 6/8 = 75% overlap). I tried other amounts of overlap, but the results were not as satisfying. One reason is that less overlap gives less granularity in identifying the location of the vehicle (all other things being equal). As far as I can tell, more than 75% overlap does not offer a significant advantage in detection and tracking performance, notwithstanding the runtime considerations.
The process used to develop these search sectors relied heavily on trial-and-error experimentation. The result I was looking for was robust detection and tracking without false positives and a fairly tight box around the target vehicles. Achieving these objectives required tuning the size and scale of the four search bands. Due to the scaling of the sectors, the windows of one sector do not necessarily line up with the windows of other sectors; however, I did not find this to be a relevant design consideration.
	
Vehicle Detection
Individual vehicle detections are achieved by sampling individual window patches of the image frame within each search sector. As described above, the region of interest is scanned using a sliding window approach with windows of various sizes, scales, and amounts of overlap. For each sliding window, features are extracted from the image patch to build a complete feature vector, which is then passed to the classifier to determine whether or not the window patch represents a vehicle. To illustrate this process, detections from frame 964 are shown below for each of the four search sectors. Each image shows all of the sliding window patches for which a detection of a vehicle was recorded. Note that for this image frame there were no detections in the fourth search sector.
Sector 1 Detections    Sector 2 Detections
Sector 3 Detections    Sector 4 Detections
For each successful detection of a vehicle we then construct a heat map by adding "heat" to every pixel in each window patch that recorded a positive detection. The resulting heat map represents an aggregated map that accumulates all the positive detections from all window patches across all search bands. If the heat map value exceeds a predetermined detection threshold, then we declare a vehicle detection in the current image frame.¹ However, since multiple objects (vehicles) may be in close proximity to each other, we need an approach to determine how many vehicles are actually present. As suggested in the lecture notes, the label() function from the scipy.ndimage.measurements package can be used to determine how many vehicles are represented by a given heat map. The label() function takes a heat map as input and outputs a labeled map along with the number of distinct objects (vehicles) in the map. The entire process above is repeated for each image frame. The image below on the left shows the heat map for frame 964, and the image on the right shows the corresponding tracking boxes for the same image frame shown above.
Frame 964 Heatmap    Frame 964 Track Boxes
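The heat-map accumulation, thresholding, and labeling steps can be sketched as below. This is a minimal sketch with an illustrative helper name and threshold; scipy.ndimage.label is the current import path for the label() function referenced above:

```python
import numpy as np
from scipy.ndimage import label

def heat_and_label(shape, detections, threshold):
    """Accumulate heat from detection windows, threshold, and label blobs."""
    heat = np.zeros(shape, dtype=np.int32)
    for x1, y1, x2, y2 in detections:      # windows as (x1, y1, x2, y2)
        heat[y1:y2, x1:x2] += 1            # add "heat" for every pixel
    heat[heat < threshold] = 0             # reject weak detections
    labels, n_vehicles = label(heat)       # distinct blobs = vehicles
    return labels, n_vehicles

# Two clusters of overlapping detections plus one isolated false positive
windows = [(380, 400, 436, 456), (390, 410, 446, 466),   # vehicle 1
           (800, 420, 896, 516), (810, 430, 906, 526),   # vehicle 2
           (50, 600, 106, 656)]                          # lone false positive
labels, n = heat_and_label((720, 1280), windows, threshold=2)
```

With a threshold of 2, the lone window never accumulates enough heat and is suppressed, while each cluster of overlapping windows survives as one labeled blob per vehicle.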
	
Vehicle Tracking and False Positives
Because any given heat map may contain a small percentage of false positive detections, the heat maps are accumulated over several consecutive image frames, with a detection threshold that requires multiple consecutive "hits" before a vehicle detection is declared. Since nearby vehicles traveling in the same direction do not move at an appreciable rate relative to the vehicle with the mounted camera, we would expect those vehicles to accumulate detections at a much higher rate than other (stationary) objects in the image frame that look like cars. Accumulating detections over multiple image frames is a straightforward approach that minimizes false positive detections.

¹ One difficulty that arises is setting the detection threshold, since its value depends on the number and size of overlapping windows (i.e., changing the size, shape, and scale of the search bands requires re-calibration of the detection threshold).
As detections are processed by the system, a heat map is built and maintained over several consecutive image frames, accumulating detections from all search sectors. At each image frame a detection threshold is applied to the current heat map. If the heat map contains values that exceed the threshold, a vehicle detection is declared for each labeled region in the thresholded heat map. The resulting labeled region(s) from the label() function are used to render a bounding box representing the car's location in the image frame. The image below shows the same image frame as in the previous examples, with the final rendering of the "tracked" cars and the car's lane.
Final Rendering of Lane and Tracking Boxes
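The multi-frame heat-map accumulation described above can be sketched as below. This is a minimal sketch: the class name, history length, and threshold are illustrative, not the project's tuned values.

```python
from collections import deque
import numpy as np

class HeatAccumulator:
    """Accumulate per-frame heat maps over a sliding window of frames."""
    def __init__(self, n_frames=8, threshold=10):
        self.history = deque(maxlen=n_frames)  # oldest frame drops off
        self.threshold = threshold
    def update(self, frame_heat):
        """Add one frame's heat map; return the thresholded running total."""
        self.history.append(frame_heat)
        total = np.sum(np.stack(self.history), axis=0)
        total[total < self.threshold] = 0      # suppress sporadic detections
        return total
```

A real vehicle produces heat in roughly the same place frame after frame, so its running total climbs past the threshold, while a one-off false positive never does.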
Formal tracking of cars from frame to frame is a more complex problem that would require additional algorithms to associate tracks in previous frames with detections in the current frame. As human observers we can visually follow the "tracks" from frame to frame, but the system as built does not formally track individual cars through the video stream; rather, it simply renders detections from frame to frame that exceed the required accumulated detection threshold. In other words, there is no association between image frames that ensures new detections belong to the correct track.
To illustrate this issue I have included three image frames from the project video output leading up to frame 964. The image on the left shows two cars in track at frame 746. As the white vehicle is occluded by the black car in the image frame, the tracks are "merged" into a single track on the black car at frame 833. In the last image a new track is established for the white vehicle in frame 964, but there is no logic that associates the initial track on the white car in frame 746 with the second instantiation of the newly spawned track on the white car in frame 964.
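A minimal sketch of the detection-to-track association logic the system lacks (and which appears later under future work) might look like the following. The greedy IoU-matching rule, function names, and threshold are my own illustrative choices, not part of the submitted project:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def associate(tracks, detections, min_iou=0.3):
    """Greedily match existing track boxes to current-frame detections.

    Returns {track_index: detection_index}. Unmatched detections would
    spawn new tracks; unmatched tracks would coast or be dropped.
    """
    matches, used = {}, set()
    for t, tbox in enumerate(tracks):
        best, best_iou = None, min_iou
        for d, dbox in enumerate(detections):
            if d in used:
                continue
            score = iou(tbox, dbox)
            if score > best_iou:
                best, best_iou = d, score
        if best is not None:
            matches[t] = best
            used.add(best)
    return matches
```

With such an association step, the track on the white car in frame 746 could be linked to its reappearance after the occlusion, rather than spawning an unrelated new track.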
Frame 746: Two Tracks    Frame 833: Occlusion    Frame 964: Two Tracks
Experiment	
I decided to try the system out on some actual video footage from my own car. The video was shot with a hand-held iPhone from inside my car. The image frame below was taken from the output video that I also submitted with my project; it shows three cars in track. The system failed to detect the car on the left over the 10-second video feed, and it lost track of the car on the right as it started to fall back. Not ideal results, but given the impromptu nature of the experiment with a hand-held camera I was still fairly pleased. This experiment highlighted the need for a more robust system that could successfully track vehicles from a wide range of video streams with a single set of tuning parameters.
	
	
Reflection		
Like the previous project, this project presented an opportunity to solve an interesting and challenging problem using a wide range of methods. In the interest of time I chose to implement the standard approaches recommended in the lecture notes. If I had more time to devote to this project I would have pursued the following work:
1. Development of a track rendering system that shows the recent history of tracking boxes as a visual aid.
2. Development of a simple detection-to-track association algorithm.
3. More in-depth experimentation with classifiers.
4. Data augmentation for more robust vehicle detection.
5. More efficient use of search bands to minimize the number of computations required.
6. Development of detection-level diagnostic output to facilitate search band design.
7. Development of adaptive thresholding techniques to improve robustness of the system across a range of environmental conditions.
System Output Submitted

Processing Pipeline Images          Video Output
HOG_Demo_Cars.png                   Project_video_track_out.mp4
HOG_Demo_Car_Features.png           my_car_track_out.mp4
HOG_Demo_NotCars.png
HOG_Demo_NotCar_Features.png
Frame_964_Search_Sector_01.png
Frame_964_Search_Sector_02.png
Frame_964_Search_Sector_03.png
Frame_964_Search_Sector_04.png
Frame_964_Heatmap.png
Frame_964_Tracking_Boxes.png
	
Source Code Files Submitted
Since I integrated the Advanced Lane Finding pipeline in this project, I include all files from both projects that were required for the integration.

Project 5 Files                        Description
vehicle_classifier.py                  Image classifier
vehicle_config.py                      Configuration parameters
Vehicle_demos.py                       Functions to create demo plots
vehicle_detect.py                      Main processing pipeline, which also includes the lane finding pipeline
vehicle_feature_extractor.py           Functions to extract image features
vehicle_plot_helpers.py                Plot helpers
vehicle_search.py                      Sliding window functions
vehicle_tracker_driver.py              Main driver code

Project 4 Files                        Description
lane_finder_driver.py                  Test driver
camera_calibration.py                  Camera calibration
filter_lane_lines.py                   Filter images
lane_line_helpers.py                   Convenience functions
find_lines_from_sliding_window.py      Sliding window search algorithm
find_lines_from_fit.py                 Lane tracking algorithm
Line.py                                Line class
Lane.py                                Lane class
params.py                              Tuning parameters
engineering chemistry power point presentation
 
Autodesk Construction Cloud (Autodesk Build).pptx
Autodesk Construction Cloud (Autodesk Build).pptxAutodesk Construction Cloud (Autodesk Build).pptx
Autodesk Construction Cloud (Autodesk Build).pptx
 

Udacity project: Vehicle detection and tracking

seemed to be the most consistent with the parameter set below.
HOG Parameter       Value
Cells per block     2
HOG Channels        All
Pixels per Cell     8
Orientation Bins    13

The sequences of images below illustrate what the HOG features look like for five random images from the data set for each class label (vehicles and non-vehicles).

[Figure: HOG feature visualizations for five random vehicle images and five random non-vehicle images]
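To make the feature pipeline concrete, here is a minimal numpy-only sketch of the spatial-binning and color-histogram pieces (the function names are my own; the HOG step, e.g. skimage.feature.hog with the parameters above, would be concatenated in as well):

```python
import numpy as np

def bin_spatial(img, size=(16, 16)):
    # Spatial binning of color: downsample the patch and flatten to 1-D.
    ys = np.linspace(0, img.shape[0] - 1, size[0]).astype(int)
    xs = np.linspace(0, img.shape[1] - 1, size[1]).astype(int)
    return img[np.ix_(ys, xs)].ravel()

def color_hist(img, nbins=32, bins_range=(0, 256)):
    # Color histograms: histogram each channel, then concatenate.
    hists = [np.histogram(img[:, :, c], bins=nbins, range=bins_range)[0]
             for c in range(img.shape[2])]
    return np.concatenate(hists)

def extract_features(img):
    # HOG features would be appended here to form the full feature vector.
    return np.concatenate([bin_spatial(img), color_hist(img)])

patch = np.zeros((64, 64, 3), dtype=np.uint8)   # dummy 64x64 training patch
feat = extract_features(patch)
print(len(feat))  # 16*16*3 + 32*3 = 864 (before HOG is appended)
```

In the full pipeline each concatenated vector is typically scaled (e.g. with a StandardScaler fit on the training set) before being fed to the classifier.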
Classifier

A support vector machine (SVM) classifier was used for the purpose of detecting cars. I experimented briefly with a non-linear SVM, but I found the results from a linear model to be quite good and therefore decided to use the linear model for this project. Training the classifier was performed by splitting the dataset into training and test sets (80% and 20% respectively). The lecture notes caution against training on time-series data in this fashion, where images are closely spaced in time. To mitigate the issues associated with training on closely spaced images, the data was shuffled before splitting it between training and test sets. In order to mitigate overfitting, cross validation was used to tune the regularization parameter ('C'). The value of C ranged from .00001 to 1.0 in factor-of-ten increments, as shown in the code snippet below.

    from sklearn.svm import LinearSVC
    from sklearn.model_selection import GridSearchCV

    parameters = {'C': [.00001, .0001, .001, .01, .1, 1]}
    svc = GridSearchCV(LinearSVC(), parameters)

A value of C = .0001 was the best value returned from the grid search above and was used for all the results presented in this report.

Region of Interest and Sliding Window Search

The process for vehicle detection begins with the definition of a region of interest (ROI) within which a sliding window search is performed. The region of interest is confined to the lower portion of the image frame near the road surface (y > 400 pixels). Since the apparent size of other vehicles depends on their location relative to the camera, the region of interest is subdivided into four overlapping search sectors of different scales and sizes within the overall ROI. Generally, window searches near the horizon are performed at a smaller (i.e., finer) scale than window searches closer to the camera. The figure below shows the progression of the four search bands that were developed for the sliding window search.
[Figure: cumulative search sector overlays — Sector [1], Sectors [1,2], Sectors [1,2,3], Sectors [1,2,3,4]]
The table below shows the size and location of the search bands (and corresponding scales) within which sliding windows are scanned. The dimensions of each search sector are constructed such that an integer number of sliding windows fits within each search sector.

Search Sector   Scale   Window Size   Overlapping Windows (x × y)   x-start   x-stop   y-start   y-stop
1               0.875   56 x 56       48 x 4 = 192                  380       1094     400       498
2               1.250   80 x 80       46 x 3 = 138                  280       1200     420       540
3               1.500   96 x 96       38 x 3 = 114                  200       1208     424       592
4               1.750   112 x 112     39 x 2 = 120                  48        1256     436       660

A window overlap of 75% was used for all window regions within the ROI, which corresponds to cells_per_step = 2. I tried other amounts of overlap, but the results were not as satisfying. One reason is that less overlap yields less granularity in locating the vehicle (all other things being equal). As far as I can tell, more than 75% overlap does not offer significant advantages in detection and tracking performance, notwithstanding the runtime considerations.

The process used to develop these search sectors relied heavily on trial-and-error experimentation. The result I was looking for was robust detection and tracking without false positives and a fairly tight box around the target vehicles. Achieving these objectives required tuning the size and scale of the four search bands. Due to the scaling of the sectors, the windows of one sector do not necessarily line up with the windows of other sectors; however, I did not see this as a relevant design consideration.

Vehicle Detection

Individual vehicle detections are achieved by sampling individual window patches on the image frame within each search sector. As described above, the region of interest is scanned using a sliding window approach with windows of various sizes, scales, and amounts of window overlap.
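The sliding-window enumeration just described can be sketched as follows (my own illustrative helper; the real pipeline rescales the image per sector rather than varying the window size, but the arithmetic is the same):

```python
def slide_windows(x_start, x_stop, y_start, y_stop, window=64, overlap=0.75):
    # Windows advance by window * (1 - overlap) pixels, e.g. 56 * 0.25 = 14.
    step = int(window * (1 - overlap))
    windows = []
    for y in range(y_start, y_stop - window + 1, step):
        for x in range(x_start, x_stop - window + 1, step):
            windows.append((x, y, x + window, y + window))
    return windows

# Sector 1 bounds from the table: 56 x 56 windows at 75% overlap.
wins = slide_windows(380, 1094, 400, 498, window=56, overlap=0.75)
print(len(wins))  # 48 x 4 = 192, matching the table
```

Each returned tuple is a (x1, y1, x2, y2) patch that is cropped, feature-extracted, and passed to the classifier.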
For each sliding window, features are extracted from the image patch to build a complete feature vector, which is then passed to the classifier to determine whether or not the window patch represents a vehicle. To illustrate this process, detections from frame 964 are shown below for each of the four search sectors. Each image shows all of the sliding window patches for which a detection of a vehicle was recorded. Note that for this image frame there were no detections in the fourth search sector.

[Figure: frame 964 detections for Sector 1 and Sector 2]
[Figure: frame 964 detections for Sector 3 and Sector 4]

For each successful detection of a vehicle, we then construct a heat map by adding "heat" for every pixel in each window patch that recorded a positive detection. The resulting heat map thus aggregates all of the positive detections from all window patches across all search bands. If the heat map value exceeds a predetermined detection threshold, then we declare a vehicle detection in the current image frame¹. However, since it is possible that multiple objects (vehicles) may be in close proximity to each other, we need an approach to determine how many vehicles are actually present. As suggested in the lecture notes, the label() function from the scipy.ndimage.measurements package can be used to determine how many vehicles are represented by a given heat map. The label() function takes a heat map as input and outputs a labeled map along with the number of distinct objects (vehicles) in the map. The entire process above can be repeated for each image frame. The image below to the left shows the heat map for frame 964, and the image to the right shows the corresponding tracking boxes for the same image frame.

[Figure: frame 964 heatmap (left) and frame 964 track boxes (right)]

Vehicle Tracking and False Positives

Because any given heat map may contain a small percentage of false positive detections, the heat maps are accumulated over several consecutive image frames, with a detection threshold applied that requires multiple consecutive "hits" in order to declare a vehicle detection.
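This multi-frame accumulation might be sketched as a rolling window of per-frame heat maps (the class name, frame count, and threshold below are illustrative, not the tuned values from the submission):

```python
from collections import deque
import numpy as np

class HeatmapAccumulator:
    # Sum the last n_frames per-frame heat maps and apply a threshold,
    # so a detection must persist across frames before it is declared.
    def __init__(self, shape, n_frames=3, threshold=3):
        self.history = deque(maxlen=n_frames)
        self.shape = shape
        self.threshold = threshold

    def update(self, boxes):
        # Add "heat" for every pixel inside each positive-detection window.
        frame_heat = np.zeros(self.shape, dtype=int)
        for x1, y1, x2, y2 in boxes:
            frame_heat[y1:y2, x1:x2] += 1
        self.history.append(frame_heat)
        total = np.sum(np.stack(self.history), axis=0)
        total[total < self.threshold] = 0   # reject transient detections
        return total

acc = HeatmapAccumulator((10, 10), n_frames=3, threshold=3)
for _ in range(3):
    heat = acc.update([(2, 2, 6, 6)])       # same box detected in 3 frames
print(int(heat.max()))  # 3: the persistent box survives the threshold
```

The label() function is then applied to the thresholded map to count the distinct regions and derive one bounding box per labeled region.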
¹ One difficulty that arises is setting the detection threshold, since its value depends on the number and size of the overlapping windows (i.e., changing the size, shape, and scale of the search bands requires re-calibration of the detection threshold).

Since nearby vehicles traveling in the same direction do not move at an appreciable rate relative to the vehicle with the mounted camera, we would expect those vehicles to accumulate detections at a much higher rate
than other (stationary) objects in the image frame that merely look like cars. Accumulating detections over multiple image frames is a straightforward approach that minimizes false positive detections. As detections are processed by the system, a heat map is built and maintained over several consecutive image frames to accumulate detections from all search sectors. At each image frame a detection threshold is applied to the current heat map. If the heat map contains values that exceed the threshold, then a vehicle detection is declared for each labeled region in the heat map (after the threshold is applied). The resulting labeled region(s) from the label() function are used to render a bounding box that represents the car's location in the image frame. The image below shows the same image frame as in the previous examples, with the final rendering of the "tracked" cars and the car's lane.

[Figure: final rendering of lane and tracking boxes]

The formal tracking of cars from frame to frame is a more complex problem that would require additional algorithms to associate tracks in previous frames with detections in the current frame. We can visually see the "tracks" from frame to frame as human observers, but the system as built does not formally track individual cars through the video stream; rather, it simply renders detections from frame to frame that exceed the required accumulated detection threshold. In other words, there is no association between image frames that ensures new detections belong to the correct track. To illustrate this issue I have included three image frames from the project video output leading up to frame 964. The image on the left shows two cars in track at frame 746. As the white vehicle is occluded by the black car in the image frame, the tracks are "merged" into a single track on the black car at frame 833.
In the last image, a new track is established for the white vehicle in frame 964, but there is no logic that associates the initial track on the white car in frame 746 with the second instantiation of the newly spawned track on the white car in frame 964.

[Figure: frame 746 (two tracks), frame 833 (occlusion), frame 964 (two tracks)]
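A detection-to-track association step is listed as future work in the Reflection section; for illustration only, a minimal greedy matcher based on bounding-box overlap (IoU) could look like this (entirely my own sketch, not part of the submitted code):

```python
def iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def associate(tracks, detections, min_iou=0.3):
    # Greedily assign each detection to the best-overlapping existing
    # track; unmatched detections would spawn new tracks instead.
    matches = {}
    for d_idx, det in enumerate(detections):
        scores = [iou(t, det) for t in tracks]
        if scores and max(scores) >= min_iou:
            matches[d_idx] = scores.index(max(scores))
    return matches

tracks = [(0, 0, 10, 10)]                   # box carried from the last frame
dets = [(1, 1, 11, 11), (50, 50, 60, 60)]   # current-frame detections
print(associate(tracks, dets))  # {0: 0}: first detection continues track 0
```

This alone would not carry the white car's identity through the occlusion at frame 833; that would additionally require coasting unmatched tracks for some number of frames before dropping them.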
Experiment

I decided to try the system out on some actual video footage from my own car. The video was shot with a hand-held iPhone inside the dash of my car. The image frame below was taken from the output video that I also submitted with my project. It shows three cars in track. The system failed to detect the car on the left over the 10-second video feed, and the system lost track of the car to the right as it started to fall back. These are not ideal results, but given the impromptu nature of the experiment with a hand-held camera I was still pretty pleased with them. This experiment highlighted the need for a more robust system that could successfully be used to track vehicles from a wide range of video streams with a single set of tuning parameters.

Reflection

Like the previous project, this project presented an opportunity to solve an interesting and challenging problem using a wide range of methods. In the interest of time I chose to implement all of the standard approaches recommended in the lecture notes. If I had more time to devote to this project I would have pursued the following work:

1. Development of a track rendering system that shows the recent history of tracking boxes as a visual aid.
2. Development of a simple detection-to-track association algorithm.
3. More in-depth experimentation with classifiers.
4. Data augmentation for more robust vehicle detection.
5. More efficient use of search bands to minimize the number of computations required.
6. Development of detection-level diagnostic output to facilitate search band design.
7. Development of adaptive thresholding techniques to improve the robustness of the system across a range of environmental conditions.
System Output Submitted

Processing Pipeline Images          Video Output
HOG_Demo_Cars.png                   Project_video_track_out.mp4
HOG_Demo_Car_Features.png           my_car_track_out.mp4
HOG_Demo_NotCars.png
HOG_Demo_NotCar_Features.png
Frame_964_Search_Sector_01.png
Frame_964_Search_Sector_02.png
Frame_964_Search_Sector_03.png
Frame_964_Search_Sector_04.png
Frame_964_Heatmap.png
Frame_964_Tracking_Boxes.png

Source Code Files Submitted

Since I integrated the Advanced Lane Finding pipeline in this project, I include all files from both projects that were required for the integration.

Project 5: Files                    Description
vehicle_classifier.py               Image classifier
vehicle_config.py                   Configuration parameters
Vehicle_demos.py                    Functions to create demo plots
vehicle_detect.py                   Main processing pipeline, which also includes the lane finding pipeline
vehicle_feature_extractor.py        Functions to extract image features
vehicle_plot_helpers.py             Plot helpers
vehicle_search.py                   Sliding window functions
vehicle_tracker_driver.py           Main driver code

Project 4: Files                    Description
lane_finder_driver.py               Test driver
camera_calibration.py               Camera calibration
filter_lane_lines.py                Filter images
lane_line_helpers.py                Convenience functions
find_lines_from_sliding_window.py   Sliding window search algorithm
find_lines_from_fit.py              Lane tracking algorithm
Line.py                             Line class
Lane.py                             Lane class
params.py                           Tuning parameters