This document describes an object detection and tracking algorithm using point feature matching. It detects objects in a scene by:
1. Detecting feature points in reference and scene images
2. Matching features to find putative correspondences between images
3. Estimating a geometric transform to determine the object's location and eliminate outliers
It then tracks detected objects across multiple frames. The algorithm is robust to scale and rotation changes but works best for objects with unique textures. Face detection and tracking is also demonstrated using skin tone as the tracked feature.
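The three steps above (detect features, match descriptors, estimate a transform while rejecting outliers) can be sketched with synthetic data. This is a minimal NumPy illustration, not the document's actual code: the descriptors are random vectors and the geometric transform is reduced to a pure translation estimated robustly with a median.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1 (simulated): feature points and descriptors in the reference image.
ref_pts = rng.uniform(0, 100, size=(20, 2))
ref_desc = rng.normal(size=(20, 32))

# The scene shows the same object shifted by a fixed translation;
# scene descriptors are noisy copies of the reference descriptors.
true_shift = np.array([15.0, -7.0])
scene_pts = ref_pts + true_shift
scene_desc = ref_desc + rng.normal(scale=0.01, size=ref_desc.shape)
# Add one spurious scene feature to act as clutter.
scene_pts = np.vstack([scene_pts, [[3.0, 3.0]]])
scene_desc = np.vstack([scene_desc, rng.normal(size=(1, 32))])

# Step 2: putative correspondences via nearest-neighbour descriptor matching.
dists = np.linalg.norm(ref_desc[:, None, :] - scene_desc[None, :, :], axis=2)
matches = dists.argmin(axis=1)          # best scene feature for each ref feature

# Step 3: estimate the translation robustly and flag inliers.
shifts = scene_pts[matches] - ref_pts
est_shift = np.median(shifts, axis=0)
inliers = np.linalg.norm(shifts - est_shift, axis=1) < 1.0
print(est_shift, int(inliers.sum()))
```

In a real pipeline the translation model would be replaced by a full geometric transform (e.g. a homography estimated with RANSAC), but the match-then-reject-outliers structure is the same.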
The document discusses Ibis, an open source Python data analysis framework that separates data analysis expression from execution. Ibis allows users to express data analysis tasks like feature engineering, grouping, joining and aggregation using an SQL-like language without specifying the execution engine. It can then execute the expressions by translating them to different backends like Pandas, Spark, BigQuery etc. based on the data source. The document provides an example expressing a complex pandas workflow involving grouping, rolling window aggregation and composite aggregation using Ibis expressions, separating the what from the how of computation.
The document summarizes Ibis, an open source Python framework for data analysis that provides a high-level language for expressing data transformations and analytics independently of backend execution engines like Pandas or Spark. It introduces Ibis by comparing different approaches to scaling data science workflows from Pandas on a single machine to distributed systems. It then demonstrates how to express the same data preparation and analysis logic in Ibis and have it execute on different backends like Pandas or PySpark without code changes. The document shows how Ibis compilers translate expressions to backend-specific code to enable seamless transition between backends.
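The "separate the what from the how" idea can be illustrated without Ibis itself. This toy sketch (hypothetical names, stdlib only) builds a small expression object and hands it to interchangeable backends, one interpreting it directly and one compiling it to SQL text:

```python
from dataclasses import dataclass

# A tiny expression: the "what" to compute, with no execution logic attached.
@dataclass
class GroupSum:
    key: str
    value: str

def run_python(expr, rows):
    """A plain-Python 'backend' that interprets the expression."""
    out = {}
    for row in rows:
        out[row[expr.key]] = out.get(row[expr.key], 0) + row[expr.value]
    return out

def compile_sql(expr, table):
    """A second 'backend' that translates the same expression to SQL text."""
    return (f"SELECT {expr.key}, SUM({expr.value}) "
            f"FROM {table} GROUP BY {expr.key}")

expr = GroupSum(key="city", value="sales")            # the "what"
rows = [{"city": "Oslo", "sales": 3}, {"city": "Oslo", "sales": 4},
        {"city": "Lima", "sales": 5}]
print(run_python(expr, rows))                         # one "how"
print(compile_sql(expr, "orders"))                    # another "how"
```

Ibis does the same thing at scale: the user builds expression trees, and per-backend compilers turn them into Pandas calls, Spark jobs, or SQL.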
The document provides an outline of a lecture on object-oriented JavaScript and inheritance. It covers key concepts like objects, properties, methods, object literals, constructor functions, and prototype inheritance. It discusses creating and accessing objects, adding and removing properties, passing objects by reference, comparing objects, and the prototype chain. Private and privileged methods and properties are explained. Different approaches to implementing inheritance like prototype chaining are also summarized.
This document provides an overview of best practices for writing responsible JavaScript code. It discusses topics like avoiding globals, using namespaces, modifying prototypes responsibly, factories for object creation, properly handling the this keyword, best practices for switch statements, equality comparisons, and the use of $ in libraries and applications. The document emphasizes writing clear, understandable code, avoiding ambiguity and potential bugs, and being considerate of other code on the page.
The document discusses the composite and iterator patterns. The composite pattern allows clients to treat individual objects and compositions of objects uniformly. It involves adding child management operations like add and remove to a base class. The iterator pattern provides a way to access elements of an aggregate object sequentially without exposing its underlying representation. It defines a common interface for iterating over elements. The document then provides an example of applying these patterns to iterate over animals on a farm.
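Applied to the farm example, a minimal Python sketch (hypothetical class names) combines both patterns: the composite lets pens contain animals or other pens, and the iterator lets clients walk the whole structure without knowing how it nests.

```python
class Animal:
    def __init__(self, name):
        self.name = name
    def __iter__(self):                 # a leaf "iterates" over itself
        yield self

class Pen:
    """Composite: a pen holds animals or other pens, treated uniformly."""
    def __init__(self):
        self.children = []
    def add(self, child):
        self.children.append(child)
    def remove(self, child):
        self.children.remove(child)
    def __iter__(self):                 # iterator over the whole subtree
        for child in self.children:
            yield from child

farm = Pen()
barn = Pen()
barn.add(Animal("cow"))
barn.add(Animal("horse"))
farm.add(barn)
farm.add(Animal("duck"))
# Clients iterate without caring whether an element is a leaf or a composite.
print([a.name for a in farm])           # ['cow', 'horse', 'duck']
```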
JSUG - Effective Java Puzzlers by Christoph Pickl
The document summarizes an agenda for a meeting on Effective Java Puzzlers. The agenda includes:
1. Discussing the books Effective Java and Java Puzzlers.
2. Having attendees solve puzzles individually.
3. Going through puzzles together and providing solutions and background information.
4. Sharing good advice on being more effective.
5. Providing a summary of the meeting.
Dependency Inversion and Dependency Injection in PHP (mtoppa)
The document discusses dependency injection and inversion in PHP. It defines dependency injection as a design pattern for implementing dependency inversion. Dependency inversion is a principle where high-level modules should not depend on low-level modules, but both should depend on abstractions. The document provides an example of applying dependency injection and inversion to a button and lamp class. It discusses benefits like loose coupling and testability. It also discusses different patterns for implementing dependency injection like constructor injection and using an injection container.
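The document's example is in PHP; here is the same button-and-lamp idea sketched in Python (illustrative names, assuming a `Switchable` abstraction). The high-level `Button` depends only on the abstraction, and the concrete `Lamp` is supplied via constructor injection:

```python
from abc import ABC, abstractmethod

class Switchable(ABC):
    """The abstraction that both high- and low-level modules depend on."""
    @abstractmethod
    def turn_on(self): ...

class Lamp(Switchable):
    def __init__(self):
        self.lit = False
    def turn_on(self):
        self.lit = True

class Button:
    # Constructor injection: the dependency is passed in, not created here,
    # so Button can be tested with any Switchable stand-in.
    def __init__(self, device: Switchable):
        self.device = device
    def press(self):
        self.device.turn_on()

lamp = Lamp()
Button(lamp).press()
print(lamp.lit)   # True
```

Swapping `Lamp` for a test double requires no change to `Button`, which is the loose-coupling and testability benefit the document describes.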
Oxygine 2D objects, events, debug and resources (corehard_by)
The talk covers creating, deleting, and interacting with game objects, as well as standard and customized smart pointers. It discusses which tools to use to obtain debug information about what is happening in the game, gives examples of game design patterns, and says a few words about interesting game resource editors: MapEditor.com and Astralax.ru.
This tutorial takes a trip through the kingdom of object recognition, combining computer vision knowledge with an image classifier.
Object detection has been undergoing rapid, revolutionary change in several fields of computer vision. Because it combines object classification with object recognition, it is one of the most challenging topics in the domain of machine learning and vision.
Transfer learning, active learning using tensorflow object detection api (설기 김)
Explains TensorFlow object detection, transfer learning, and active learning, with test results using actual data.
An introduction to a tool that makes it easy to use the object detection API is also included.
The full version with animations can be found in this link.
https://docs.google.com/presentation/d/11EVzICHMw3UbNRRaODxQVT0qhuHrzpMhBCnVZy4FrhI/edit?usp=sharing
The document provides an agenda and overview for an introduction to computer vision in .NET. The agenda includes discussing what digital images and computer vision are, demonstrating Hello OpenCV in .NET, covering convolution and edge detection, facial detection, facial detection with the Vonage Video API, and feature tracking and image projection. The document shares code examples and resources for working with computer vision libraries like OpenCV and Emgu CV in .NET.
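The convolution and edge-detection step in that agenda can be sketched in a few lines, using NumPy rather than .NET purely for illustration: a horizontal Sobel kernel applied (as cross-correlation, the way OpenCV's filter2D works) to a tiny image containing one vertical edge.

```python
import numpy as np

def filter2d(img, kernel):
    """Naive 'valid' 2D cross-correlation, as used by most image filters."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

# A 5x6 image: dark left half, bright right half -> one vertical edge.
img = np.zeros((5, 6))
img[:, 3:] = 1.0

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

edges = filter2d(img, sobel_x)
print(edges)   # nonzero only where the window straddles the edge
```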
DESIGNING A PERSISTENCE FRAMEWORK WITH PATTERNS.ppt (AntoJoseph36)
The document discusses designing a persistence framework using patterns. It describes persistent objects that survive process termination and storage mechanisms like object and relational databases. A persistence framework provides functions to store/retrieve objects and commit/rollback transactions. Key concepts covered include mapping objects to tables, object identity, materialization/dematerialization, caches, and transactions. Common patterns used in persistence frameworks are also described, such as representing objects as tables, the database mapper pattern, template method for caching, and the state pattern for transaction management.
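The object-identity and cache concepts can be sketched with a toy mapper (hypothetical names; a sketch, not a full framework). The identity map guarantees that materializing the same row twice yields the same in-memory object:

```python
class Person:
    def __init__(self, pid, name):
        self.pid, self.name = pid, name

class PersonMapper:
    """Maps rows <-> objects; an identity map caches materialized objects."""
    def __init__(self, table):
        self.table = table              # {pid: row-dict}, standing in for a DB
        self.identity_map = {}
    def find(self, pid):
        if pid not in self.identity_map:            # cache miss: materialize
            row = self.table[pid]
            self.identity_map[pid] = Person(pid, row["name"])
        return self.identity_map[pid]               # cache hit: same object

db = {1: {"name": "Ada"}}
mapper = PersonMapper(db)
a, b = mapper.find(1), mapper.find(1)
print(a is b)   # True: one row, one object
```

Without the identity map, two reads of row 1 would produce two distinct `Person` objects whose edits could silently diverge, which is exactly the problem object identity addresses.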
This document proposes a standard called AR-LEM for Augmented Reality Learning Experience Models. It describes models for activities, workplaces and augmentations that can be used to create augmented reality learning experiences. The activity model defines action steps, triggers, instructions and flow. The workplace model defines tangible people, places and things that can be activated or deactivated. It also describes how to attach augmentations like overlays, videos and 3D models to tangibles. The document provides examples of using these models to assemble an augmented reality activity for assembling a cabinet, including activating overlays on detected markers and transitioning between action steps.
This document provides an overview of the Scale-Invariant Feature Transform (SIFT) algorithm. It discusses the key steps in SIFT including constructing a scale space, approximating the Laplacian of Gaussian (LoG), finding keypoints, assigning orientations to keypoints, and generating SIFT descriptors. The document also provides examples of Matlab code implementations for several steps in the SIFT algorithm. Finally, it introduces the VLFeat open source library for computer vision that includes implementations of SIFT and other algorithms.
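The document's Matlab snippets are not reproduced here; instead, a minimal NumPy sketch of the first two SIFT steps on a 1-D signal: build a Gaussian scale space, then take differences of adjacent scales, which approximates the Laplacian of Gaussian used for keypoint detection.

```python
import numpy as np

def gaussian_blur1d(signal, sigma):
    """Blur by convolving with a sampled, normalized Gaussian kernel."""
    radius = int(3 * sigma) + 1
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    return np.convolve(signal, kernel, mode="same")

# An impulse stands in for an image feature.
signal = np.zeros(101)
signal[50] = 1.0

# Scale space: the same signal at increasing blur levels (factor ~1.6).
scales = [1.0, 1.6, 2.56]
octave = [gaussian_blur1d(signal, s) for s in scales]

# Difference of Gaussians: adjacent scales subtracted; this approximates
# the Laplacian of Gaussian cheaply.
dog = [octave[i + 1] - octave[i] for i in range(len(octave) - 1)]

# The DoG response is strongest at the feature's location.
print(int(np.argmax(np.abs(dog[0]))))   # 50
```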
Sceneform SDK in Practice - UA Mobile 2019 (UA Mobile)
Native engines that simplify working with 3D graphics on Android are a headache for many developers. While iOS has SceneKit, Android developers have been forced either to adopt cross-platform solutions (Unity3d, LibGDX, etc.) or to use home-grown or open-source solutions with limited functionality. The arrival of the Sceneform SDK not only solves this problem but also encapsulates work with the ArCore SDK, giving developers the ability to implement AR solutions natively without resorting to a third-party SDK.
The talk examines the practical capabilities and limitations of the Sceneform SDK for augmented reality (AR) projects. It also covers topics such as SLAM, the ArCore SDK, image-based marker recognition and tracking, Cloud Anchors, and existing alternatives on the market.
The document discusses building 2D and 3D games with Ruby using low-level APIs. It covers topics like building 2D games with Ruby SDL, building 3D games with Ruby OpenGL, and whether Ruby is a legitimate player in the gaming space. Examples provided include sample code for sprites, events, and extensions to interface with C libraries for graphics and game development.
asmt7/cosc_210_-_assignment_7_fall_15.doc
COSC 210 - Object Oriented Programming
Assignment 7
The objectives of this assignment are to:
1) Gain further understanding and experience with inheritance.
2) Gain understanding and experience with polymorphism.
3) Gain further understanding and experience with interfaces.
4) Gain understanding and experience with low level graphics.
5) Modify an existing program to meet new requirements applying concepts of objectives 1 through 4.
6) Gain experience with medium-size Java program.
7) Continue to practice good programming techniques.
AFTER YOU HAVE COMPLETED, create a zip file named [your name]Assignment7.zip containing your entire project. Upload the .zip file to Moodle. Print out all source files you created or modified. Include a screenshot of the editor with boxes, ellipses, lines, and images shown in the editor. Turn in all printouts.
COSC 210 – Fundamentals of Computer Science
Assignment 7 Problem Statement
Updated
On the tomcat drive, in folder cosc210, you will find a file named PainterStartup.zip. This file contains the source code for the start of a Painter program. In its current state, Painter can create boxes and text objects at given locations. Both boxes and text objects can be repositioned and resized using the mouse. The task is to add implementations for ellipse, line, image, and group objects to the program.
Instructions:
1) Add an ellipse object. An ellipse is very similar in implementation to the box, except that it renders an oval instead of a rectangle. The ellipse can be repositioned by dragging the object to a new location. The ellipse can be resized by first clicking over the ellipse to display grab handles and then dragging a grab handle to a new position. The grab handles are to be rendered at the same positions as the box's. Likewise, clicking anywhere in the smallest rectangle that encloses the ellipse performs selection.
2) Add a Line object. A Line is to be created by selecting the Line tool and then clicking and dragging over the canvas. The line is rendered from the point of the initial click to the mouse pointer. On releasing the mouse, the construction of the line object is complete. Have the Line object inherit from PtrDrawAbstractAreaObject; thus it will have only two grab handles.
A Line is selected by clicking anywhere over the line. Right now, clicking anywhere in the rectangular region holding the line selects it. To accomplish this task, override the isOver method in PtrDrawAbstractAreaObject.
Given below is a partial solution to determine if a mouse click position (the x and y parameters to the isOver method) is over a line:
double ratio = (double) getWidth() / (double) getHeight();
// The click is over the line when the (scaled) offset of the point from the
// line through the object's corners is within one pixel. Note that Math.abs
// must wrap the whole difference, not just the first term.
if (Math.abs((x - getX()) * ratio - (y - getY())) <= 1) {
    return true;
}
You need to modify this code when the y to x ratio is less than -1 or greater than 1. (Hint: Inverse the r ...
This document discusses a method for identifying thematic objects in videos. The method first extracts local feature points from each video frame using Harris corner detection. Descriptors are extracted around each corner point. The descriptors of a reference image are compared to the descriptors of video frames to find similarity and identify frames containing the same object. Identifying frequently appearing objects in this way helps with tasks like object search, tagging, and video summarization. The approach is able to identify objects even when there is partial occlusion or viewpoint variation between frames.
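The per-frame comparison step can be sketched with synthetic descriptors (NumPy; the dimensions, thresholds, and data are illustrative, not the paper's values): a frame is flagged as containing the object when enough reference descriptors find a close match in it.

```python
import numpy as np

rng = np.random.default_rng(1)

def frame_contains(ref_desc, frame_desc, thresh=0.5, min_matches=5):
    """Count reference descriptors that have a close match in the frame."""
    d = np.linalg.norm(ref_desc[:, None, :] - frame_desc[None, :, :], axis=2)
    return bool((d.min(axis=1) < thresh).sum() >= min_matches)

# Hypothetical descriptors around detected corner points (8-D for brevity).
ref = rng.normal(size=(10, 8))
frames = []
for k in range(6):
    clutter = rng.normal(size=(30, 8)) * 5
    if k % 2 == 0:   # the object appears in every other frame, perturbed
        clutter = np.vstack([clutter,
                             ref + rng.normal(scale=0.05, size=ref.shape)])
    frames.append(clutter)

hits = [frame_contains(ref, f) for f in frames]
print(hits)   # True only in the frames where the object was injected
```

Counting such hits across the whole video is what surfaces "thematic" (frequently appearing) objects; requiring only a subset of descriptors to match is what tolerates partial occlusion.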
This document provides an overview of using objects and scripting motion in Adobe Flash. It discusses creating ActionScript classes, placing objects on the stage programmatically and from the library, and drawing objects like circles with code. It then covers moving objects by updating their x and y coordinates, using event listeners like ENTER_FRAME to trigger movement, bouncing objects off edges by changing their direction, and detecting edge collisions more precisely by considering the object's size. The homework is to read a chapter on game elements and create a Flash file with an object moving across the stage using ActionScript.
This document summarizes a lecture on using objects and scripting motion in Adobe Flash. It discusses creating ActionScript classes, placing objects on the stage programmatically and from the library, drawing objects with code, exporting objects from the library for use in ActionScript, moving objects with code, using event listeners like ENTER_FRAME to trigger movement, detecting edges to make objects bounce, and making the bounce more realistic by detecting the edge of the object itself. It assigns homework of reading a chapter on game elements in ActionScript and creating a Flash file with an object that moves across the screen using scripted motion.
This document summarizes a lecture on using objects and scripting motion in Adobe Flash. It discusses creating ActionScript classes, placing objects on the stage programmatically and from the library, drawing objects with code, exporting objects from the library for use in ActionScript, moving objects with code, using event listeners like ENTER_FRAME to trigger movement, detecting edges to make objects bounce, and making the bounce more realistic by detecting the edge of the object itself. It assigns homework of reading a chapter on game elements in ActionScript and creating a Flash file with an object that moves across the screen using scripted motion.
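The bounce logic described above is language-neutral; here it is as a small Python sketch (standing in for the ActionScript, with made-up names): each tick of the ENTER_FRAME handler moves the object, and the edge test accounts for the object's own width rather than just its position.

```python
STAGE_WIDTH = 100
BALL_WIDTH = 10

x, vx = 0.0, 7.0          # the ball's left edge and its velocity per frame

def enter_frame():
    """One tick of the ENTER_FRAME handler: move, then bounce off the edges."""
    global x, vx
    x += vx
    # Consider the object's size: its right edge sits at x + BALL_WIDTH.
    if x + BALL_WIDTH > STAGE_WIDTH or x < 0:
        vx = -vx                                        # reverse direction
        x = max(0.0, min(x, STAGE_WIDTH - BALL_WIDTH))  # clamp onto the stage

positions = []
for _ in range(20):
    enter_frame()
    positions.append(x)

print(min(positions), max(positions))
```

Testing `x + BALL_WIDTH` against the stage width is the "more realistic" refinement the lecture describes: without it, the object would sink half-way off screen before bouncing.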
Building Production Ready Search Pipelines with Spark and Milvus (Zilliz)
Spark is a widely used ETL tool for processing, indexing, and ingesting data into the serving stack for search. Milvus is a production-ready open-source vector database. This talk shows how to use Spark to process unstructured data into vector representations and push the vectors to the Milvus vector database for search serving.
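The serving side of such a pipeline boils down to nearest-neighbour search over vectors. A minimal sketch of that core operation, with NumPy standing in for Milvus and random data standing in for the Spark-produced embeddings:

```python
import numpy as np

rng = np.random.default_rng(2)

# Pretend these unit-norm embeddings were produced by the Spark ETL stage.
corpus = rng.normal(size=(1000, 64))
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

def search(query, vectors, top_k=3):
    """Cosine-similarity top-k: the core of vector-database serving."""
    q = query / np.linalg.norm(query)
    scores = vectors @ q                 # cosine similarity (unit vectors)
    top = np.argsort(-scores)[:top_k]
    return top, scores[top]

# A query near corpus[42] should retrieve that entry first.
query = corpus[42] + rng.normal(scale=0.01, size=64)
ids, scores = search(query, corpus)
print(ids[0])   # 42
```

A real deployment replaces the brute-force matrix product with an approximate index (the part Milvus provides), but the interface, insert vectors then query top-k by similarity, is the same.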
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Infrastructure Challenges in Scaling RAG with Custom AI modelsZilliz
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
20240605 QFM017 Machine Intelligence Reading List May 2024
Lath
Object Detection and Tracking
Object Detection In A Cluttered Scene Using Point Feature Matching
This example shows how to detect a particular object in a cluttered scene, given a reference image
of the object.
Overview
Step 1: Read Images
Step 2: Detect Feature Points
Step 3: Extract Feature Descriptors
Step 4: Find Putative Point Matches
Step 5: Locate the Object in the Scene Using Putative Matches
Step 7: Detect Another Object
Overview
This example presents an algorithm for detecting a specific object based on finding point
correspondences between the reference and the target image. It can detect objects despite a
scale change or in-plane rotation. It is also robust to small amounts of out-of-plane rotation and
occlusion.
This method of object detection works best for objects that exhibit non-repeating texture patterns,
which give rise to unique feature matches. This technique is not likely to work well for uniformly
colored objects, or for objects containing repeating patterns. Note that this algorithm is designed
for detecting a specific object, for example, the elephant in the reference image, rather than any
elephant. For detecting objects of a particular category, such as people or faces, see
vision.PeopleDetector and vision.CascadeObjectDetector.
Step 1: Read Images
Read the reference image containing the object of interest.
boxImage = imread('stapleRemover.jpg');
figure; imshow(boxImage);
title('Image of a Box');
Object Detection and Tracking - MATLAB & Simuli... http://www.mathworks.in/help/vision/gs/object-dete...
1 of 34 04/24/2014 07:10 PM
Read the target image containing a cluttered scene.
sceneImage = imread('clutteredDesk.jpg');
figure; imshow(sceneImage);
title('Image of a Cluttered Scene');
Step 2: Detect Feature Points
Detect feature points in both images.
boxPoints = detectSURFFeatures(boxImage);
scenePoints = detectSURFFeatures(sceneImage);
Visualize the strongest feature points found in the reference image.
figure; imshow(boxImage);
title('100 Strongest Feature Points from Box Image');
hold on;
plot(boxPoints.selectStrongest(100));
Visualize the strongest feature points found in the target image.
figure; imshow(sceneImage);
title('300 Strongest Feature Points from Scene Image');
hold on;
plot(scenePoints.selectStrongest(300));
Step 3: Extract Feature Descriptors
Extract feature descriptors at the interest points in both images.
[boxFeatures, boxPoints] = extractFeatures(boxImage, boxPoints);
[sceneFeatures, scenePoints] = extractFeatures(sceneImage, scenePoints);
Step 4: Find Putative Point Matches
Match the features using their descriptors.
boxPairs = matchFeatures(boxFeatures, sceneFeatures);
Display putatively matched features.
matchedBoxPoints = boxPoints(boxPairs(:, 1), :);
matchedScenePoints = scenePoints(boxPairs(:, 2), :);
figure;
showMatchedFeatures(boxImage, sceneImage, matchedBoxPoints, ...
    matchedScenePoints, 'montage');
title('Putatively Matched Points (Including Outliers)');
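matchFeatures pairs each reference descriptor with its most similar scene descriptor. As a rough illustration (not the toolbox implementation), here is a minimal Python sketch of nearest-neighbor matching with a ratio test on SSD distance; the toy two-element descriptors and the 0.6 threshold are invented for the example.

```python
def match_features(desc_a, desc_b, max_ratio=0.6):
    """Nearest-neighbor matching with a ratio test on SSD distance."""
    def ssd(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))

    pairs = []
    for i, d in enumerate(desc_a):
        dists = sorted((ssd(d, e), j) for j, e in enumerate(desc_b))
        best, second = dists[0], dists[1]
        # Accept only if the best match is clearly better than the runner-up;
        # ambiguous descriptors (repeating patterns) are discarded.
        if best[0] <= max_ratio * second[0]:
            pairs.append((i, best[1]))
    return pairs

box = [(0.0, 1.0), (1.0, 0.0), (0.5, 0.5)]
scene = [(0.9, 0.1), (0.1, 0.9), (10.0, 10.0)]
print(match_features(box, scene))  # [(0, 1), (1, 0)]
```

The third reference descriptor is equidistant from two scene descriptors, so the ratio test rejects it, which is exactly why non-repeating textures match best.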
Step 5: Locate the Object in the Scene Using Putative Matches
estimateGeometricTransform calculates the transformation relating the matched points, while
eliminating outliers. This transformation allows us to localize the object in the scene.
[tform, inlierBoxPoints, inlierScenePoints] = ...
    estimateGeometricTransform(matchedBoxPoints, matchedScenePoints, 'affine');
Display the matching point pairs with the outliers removed.
figure;
showMatchedFeatures(boxImage, sceneImage, inlierBoxPoints, ...
inlierScenePoints, 'montage');
title('Matched Points (Inliers Only)');
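estimateGeometricTransform separates inliers from outliers by repeatedly fitting a model to random samples and keeping the fit with the most support, a RANSAC-style procedure. A deliberately simplified Python sketch follows: it fits only a 2-D translation instead of a full affine model, and the point lists, tolerance, and iteration count are illustrative.

```python
import random

def ransac_translation(src, dst, n_iter=200, tol=1.0, seed=0):
    """Toy RANSAC: fit a 2-D translation mapping src -> dst while ignoring
    outliers. (The toolbox fits a full affine model; translation needs only
    one correspondence per sample, which keeps the sketch short.)"""
    rng = random.Random(seed)
    best_t, best_inliers = None, []
    for _ in range(n_iter):
        i = rng.randrange(len(src))            # minimal sample: one pair
        tx = dst[i][0] - src[i][0]
        ty = dst[i][1] - src[i][1]
        # Count correspondences consistent with this candidate translation.
        inliers = [j for j, (s, d) in enumerate(zip(src, dst))
                   if abs(s[0] + tx - d[0]) <= tol and abs(s[1] + ty - d[1]) <= tol]
        if len(inliers) > len(best_inliers):
            best_t, best_inliers = (tx, ty), inliers
    return best_t, best_inliers

src = [(0, 0), (1, 0), (0, 1), (5, 5)]
dst = [(2, 3), (3, 3), (2, 4), (9, 1)]        # the last pair is an outlier
t, inl = ransac_translation(src, dst)
print(t, inl)  # (2, 3) [0, 1, 2]
```

The mismatched fourth pair never gathers support, so it is excluded from the inlier set, mirroring the outlier elimination above.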
Get the bounding polygon of the reference image.
boxPolygon = [1, 1;...                                 % top-left
              size(boxImage, 2), 1;...                 % top-right
              size(boxImage, 2), size(boxImage, 1);... % bottom-right
              1, size(boxImage, 1);...                 % bottom-left
              1, 1];                                   % top-left again to close the polygon
Transform the polygon into the coordinate system of the target image. The transformed polygon
indicates the location of the object in the scene.
newBoxPolygon = transformPointsForward(tform, boxPolygon);
Display the detected object.
figure; imshow(sceneImage);
hold on;
line(newBoxPolygon(:, 1), newBoxPolygon(:, 2), 'Color', 'y');
title('Detected Box');
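transformPointsForward applies the estimated matrix to each corner using MATLAB's row-vector convention, [x y 1] * T. A small Python sketch of that product; the pure-translation matrix below is a made-up example, not the tform computed above.

```python
def transform_points_forward(T, pts):
    """Apply a 3x3 affine matrix in row-vector convention ([x y 1] * T,
    as used by MATLAB's affine2d) to a list of (x, y) points."""
    out = []
    for x, y in pts:
        xn = x * T[0][0] + y * T[1][0] + T[2][0]
        yn = x * T[0][1] + y * T[1][1] + T[2][1]
        out.append((xn, yn))
    return out

# Hypothetical transform: pure translation by (10, 20).
T = [[1, 0, 0],
     [0, 1, 0],
     [10, 20, 1]]
print(transform_points_forward(T, [(1, 1), (4, 1)]))  # [(11, 21), (14, 21)]
```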
Step 7: Detect Another Object
Detect a second object by using the same steps as before.
Read an image containing the second object of interest.
elephantImage = imread('elephant.jpg');
figure; imshow(elephantImage);
title('Image of an Elephant');
Detect and visualize feature points.
elephantPoints = detectSURFFeatures(elephantImage);
figure; imshow(elephantImage);
hold on;
plot(elephantPoints.selectStrongest(100));
title('100 Strongest Feature Points from Elephant Image');
Display Both Objects
elephantPolygon = [1, 1;...                                           % top-left
                   size(elephantImage, 2), 1;...                      % top-right
                   size(elephantImage, 2), size(elephantImage, 1);... % bottom-right
                   1, size(elephantImage, 1);...                      % bottom-left
                   1, 1];                                             % top-left again to close the polygon
newElephantPolygon = transformPointsForward(tform, elephantPolygon);
figure;
imshow(sceneImage);
hold on;
line(newBoxPolygon(:, 1), newBoxPolygon(:, 2), 'Color', 'y');
line(newElephantPolygon(:, 1), newElephantPolygon(:, 2), 'Color', 'g');
title('Detected Elephant and Box');
Face Detection and Tracking Using CAMShift
This example shows how to automatically detect and track a face.
Introduction
Step 1: Detect a Face To Track
Step 2: Identify Facial Features To Track
Step 3: Track the Face
Summary
Reference
Introduction
Object detection and tracking are important in many computer vision applications including activity
recognition, automotive safety, and surveillance. In this example, you will develop a simple face
tracking system by dividing the tracking problem into three separate problems:
1. Detect a face to track
2. Identify facial features to track
3. Track the face
Step 1: Detect a Face To Track
Before you begin tracking a face, you need to first detect it. Use the vision.CascadeObjectDetector
to detect the location of a face in a video frame. The cascade object detector uses the Viola-Jones
detection algorithm and a trained classification model for detection. By default, the detector is
configured to detect faces, but it can be configured for other object types.
% Create a cascade detector object.
faceDetector = vision.CascadeObjectDetector();
% Read a video frame and run the detector.
videoFileReader = vision.VideoFileReader('visionface.avi');
videoFrame = step(videoFileReader);
bbox = step(faceDetector, videoFrame);
% Draw the returned bounding box around the detected face.
videoOut = insertObjectAnnotation(videoFrame,'rectangle',bbox,'Face');
figure, imshow(videoOut), title('Detected face');
You can use the cascade object detector to track a face across successive video frames. However,
when the face tilts or the person turns their head, you may lose tracking. This limitation is due to
the type of trained classification model used for detection. To avoid this issue, and because
performing face detection for every video frame is computationally intensive, this example uses a
simple facial feature for tracking.
Step 2: Identify Facial Features To Track
Once the face is located in the video, the next step is to identify a feature that will help you track
the face. For example, you can use the shape, texture, or color. Choose a feature that is unique to
the object and remains invariant even when the object moves.
In this example, you use skin tone as the feature to track. The skin tone provides a good deal of
contrast between the face and the background and does not change as the face rotates or moves.
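The rgb2hsv step that follows can be mimicked with Python's standard colorsys module; the pixel values below are invented for illustration.

```python
import colorsys

def hue_channel(rgb_image):
    """Return the Hue plane of an RGB image whose channel values lie in
    [0, 1], mirroring the rgb2hsv extraction used in this example."""
    return [[colorsys.rgb_to_hsv(r, g, b)[0] for (r, g, b) in row]
            for row in rgb_image]

# A 1x2 "frame": a reddish skin-tone pixel and a pure-blue background pixel.
frame = [[(0.9, 0.6, 0.5), (0.0, 0.0, 1.0)]]
hues = hue_channel(frame)
print([round(h, 3) for h in hues[0]])  # [0.042, 0.667]
```

The skin-tone pixel sits near hue 0 (red) while the background sits near 0.67 (blue), which is the contrast the tracker relies on.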
% Get the skin tone information by extracting the Hue from the video frame
% converted to the HSV color space.
[hueChannel,~,~] = rgb2hsv(videoFrame);

% Display the Hue Channel data and draw the bounding box around the face.
figure, imshow(hueChannel), title('Hue channel data');
rectangle('Position',bbox(1,:),'LineWidth',2,'EdgeColor',[1 1 0])
Step 3: Track the Face
With the skin tone selected as the feature to track, you can now use the
vision.HistogramBasedTracker for tracking. The histogram based tracker uses the CAMShift
algorithm, which provides the capability to track an object using a histogram of pixel values. In this
example, the Hue channel pixels are extracted from the nose region of the detected face. These
pixels are used to initialize the histogram for the tracker. The example tracks the object over
successive video frames using this histogram.
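The histogram idea can be sketched in a few lines of Python: build a hue histogram from the initialization region, then score ("back-project") each new pixel by its bin weight. The hue values and the 16-bin choice are illustrative, and this omits the window-shifting step that CAMShift performs on the back-projected image.

```python
def hue_histogram(hues, n_bins=16):
    """Normalized hue histogram of a region of interest - the model a
    CAMShift-style tracker compares against every new frame."""
    counts = [0] * n_bins
    for h in hues:
        counts[min(int(h * n_bins), n_bins - 1)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

def backproject(hues, hist):
    """Score each pixel by the weight of its hue bin; the tracker shifts
    its search window toward the mass of high-scoring pixels."""
    n_bins = len(hist)
    return [hist[min(int(h * n_bins), n_bins - 1)] for h in hues]

model = hue_histogram([0.02, 0.03, 0.05, 0.04])  # hues sampled from the nose region
scores = backproject([0.03, 0.50], model)        # a skin-like and a blue pixel
print(scores[0] > scores[1])  # True
```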
% Detect the nose within the face region. The nose provides a more accurate
% measure of the skin tone because it does not contain any background pixels.
noseDetector = vision.CascadeObjectDetector('Nose');
faceImage = imcrop(videoFrame,bbox(1,:));
noseBBox = step(noseDetector,faceImage);

% The nose bounding box is defined relative to the cropped face image.
% Adjust the nose bounding box so that it is relative to the original video frame.
noseBBox(1,1:2) = noseBBox(1,1:2) + bbox(1,1:2);
% Create a tracker object.
tracker = vision.HistogramBasedTracker;

% Initialize the tracker histogram using the Hue channel pixels from the nose.
initializeObject(tracker, hueChannel, noseBBox(1,:));

% Create a video player object for displaying video frames.
videoInfo = info(videoFileReader);
videoPlayer = vision.VideoPlayer('Position',[300 300 videoInfo.VideoSize+30]);

% Track the face over successive video frames until the video is finished.
while ~isDone(videoFileReader)
% Extract the next video frame
videoFrame = step(videoFileReader);
% RGB -> HSV
[hueChannel,~,~] = rgb2hsv(videoFrame);
% Track using the Hue channel data
bbox = step(tracker, hueChannel);
% Insert a bounding box around the object being tracked
videoOut = insertObjectAnnotation(videoFrame,'rectangle',bbox,'Face');

% Display the annotated video frame using the video player object
step(videoPlayer, videoOut);
end
% Release resources
release(videoFileReader);
release(videoPlayer);
Summary
In this example, you created a simple face tracking system that automatically detects and tracks a
single face. Try changing the input video and see if you are able to track a face. If you notice poor
tracking results, check the Hue channel data to see if there is enough contrast between the face
region and the background.
Reference
[1] G.R. Bradski "Real Time Face and Object Tracking as a Component of a Perceptual User
Interface", Proceedings of the 4th IEEE Workshop on Applications of Computer Vision, 1998.
[2] Viola, Paul A. and Jones, Michael J. "Rapid Object Detection using a Boosted Cascade of
Simple Features", IEEE CVPR, 2001.
Motion-Based Multiple Object Tracking
This example shows how to perform automatic detection and motion-based tracking of moving
objects in a video from a stationary camera.
Detection of moving objects and motion-based tracking are important components of many
computer vision applications, including activity recognition, traffic monitoring, and automotive safety.
The problem of motion-based object tracking can be divided into two parts:
1. detecting moving objects in each frame
2. associating the detections corresponding to the same object over time
The detection of moving objects uses a background subtraction algorithm based on Gaussian
mixture models. Morphological operations are applied to the resulting foreground mask to eliminate
noise. Finally, blob analysis detects groups of connected pixels, which are likely to correspond to
moving objects.
The association of detections to the same object is based solely on motion. The motion of each
track is estimated by a Kalman filter. The filter is used to predict the track's location in each frame,
and determine the likelihood of each detection being assigned to each track.
Track maintenance becomes an important aspect of this example. In any given frame, some
detections may be assigned to tracks, while other detections and tracks may remain
unassigned. The assigned tracks are updated using the corresponding detections. The unassigned
tracks are marked invisible. An unassigned detection begins a new track.
Each track keeps count of the number of consecutive frames where it remained unassigned. If the
count exceeds a specified threshold, the example assumes that the object left the field of view and
deletes the track.
This example is a function with the main body at the top and helper routines in the form of nested
functions below.
function multiObjectTracking()
% Create system objects used for reading video, detecting moving objects,
% and displaying the results.
obj = setupSystemObjects();

tracks = initializeTracks(); % Create an empty array of tracks.

nextId = 1; % ID of the next track
% Detect moving objects, and track them across video frames.
while ~isDone(obj.reader)
frame = readFrame();
[centroids, bboxes, mask] = detectObjects(frame);
predictNewLocationsOfTracks();
[assignments, unassignedTracks, unassignedDetections] = ...
    detectionToTrackAssignment();
updateAssignedTracks();
updateUnassignedTracks();
deleteLostTracks();
createNewTracks();
Predict New Locations of Existing Tracks
Assign Detections to Tracks
Update Assigned Tracks
Update Unassigned Tracks
Delete Lost Tracks
Create New Tracks
Display Tracking Results
Summary
Create System Objects
Create System objects used for reading the video frames, detecting foreground objects, and
displaying results.
function obj = setupSystemObjects()
    % Initialize Video I/O
    % Create objects for reading a video from a file, drawing the tracked
    % objects in each frame, and playing the video.

    % Create a video file reader.
    obj.reader = vision.VideoFileReader('atrium.avi');

    % Create two video players, one to display the video,
    % and one to display the foreground mask.
    obj.videoPlayer = vision.VideoPlayer('Position', [20, 400, 700, 400]);
    obj.maskPlayer = vision.VideoPlayer('Position', [740, 400, 700, 400]);

    % Create system objects for foreground detection and blob analysis.
    % The foreground detector is used to segment moving objects from
    % the background. It outputs a binary mask, where the pixel value
    % of 1 corresponds to the foreground and the value of 0 corresponds
    % to the background.
    obj.detector = vision.ForegroundDetector('NumGaussians', 3, ...
        'NumTrainingFrames', 40, 'MinimumBackgroundRatio', 0.7);

    % Connected groups of foreground pixels are likely to correspond to
    % moving objects. The blob analysis system object is used to find such
    % groups (called 'blobs' or 'connected components'), and compute their
    % characteristics, such as area, centroid, and the bounding box.
    obj.blobAnalyser = vision.BlobAnalysis('BoundingBoxOutputPort', true, ...
        'AreaOutputPort', true, 'CentroidOutputPort', true, ...
        'MinimumBlobArea', 400);
end
Initialize Tracks
The initializeTracks function creates an array of tracks, where each track is a structure
representing a moving object in the video. The purpose of the structure is to maintain the state of a
tracked object. The state consists of information used for detection to track assignment, track
termination, and display.
The structure contains the following fields:

id : the integer ID of the track
bbox : the current bounding box of the object; used for display
kalmanFilter : a Kalman filter object used for motion-based tracking
age : the number of frames since the track was first detected
totalVisibleCount : the total number of frames in which the track was detected (visible)
consecutiveInvisibleCount : the number of consecutive frames for which the track was not detected (invisible)
Noisy detections tend to result in short-lived tracks. For this reason, the example only displays an
object after it was tracked for some number of frames. This happens when totalVisibleCount
exceeds a specified threshold.
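The track structure translates naturally into other languages; here is a Python dataclass sketch of the same fields. The min_visible threshold of 8 is an arbitrary illustrative value, and kalman_filter is only a placeholder.

```python
from dataclasses import dataclass

@dataclass
class Track:
    """Per-object tracking state mirroring the struct fields above."""
    id: int
    bbox: tuple                           # (x, y, width, height), used for display
    kalman_filter: object = None          # motion model used for prediction
    age: int = 0                          # frames since the track was first detected
    total_visible_count: int = 0          # frames in which the track was detected
    consecutive_invisible_count: int = 0  # frames since it was last detected

    def is_displayable(self, min_visible=8):
        # Suppress short-lived tracks produced by noisy detections.
        return self.total_visible_count > min_visible

t = Track(id=1, bbox=(10, 20, 50, 80))
t.total_visible_count = 10
print(t.is_displayable())  # True
```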
When no detections are associated with a track for several consecutive frames, the example
assumes that the object has left the field of view and deletes the track. This happens when
consecutiveInvisibleCount exceeds a specified threshold. A track may also get deleted as noise if
it was tracked for a short time and marked invisible for most of the frames.
function tracks = initializeTracks()
% create an empty array of tracks
tracks = struct(...
'id', {}, ...
'bbox', {}, ...
'kalmanFilter', {}, ...
'age', {}, ...
'totalVisibleCount', {}, ...
'consecutiveInvisibleCount', {});
end
Read a Video Frame
Read the next video frame from the video file.
function frame = readFrame()
frame = obj.reader.step();
end
Detect Objects
The detectObjects function returns the centroids and the bounding boxes of the detected objects.
It also returns the binary mask, which has the same size as the input frame. Pixels with a value of 1
correspond to the foreground, and pixels with a value of 0 correspond to the background.
The function performs motion segmentation using the foreground detector. It then performs
morphological operations on the resulting binary mask to remove noisy pixels and to fill the holes in
the remaining blobs.
function [centroids, bboxes, mask] = detectObjects(frame)
    % Detect foreground.
    mask = obj.detector.step(frame);

    % Apply morphological operations to remove noise and fill in holes.
    mask = imopen(mask, strel('rectangle', [3,3]));
    mask = imclose(mask, strel('rectangle', [15, 15]));
    mask = imfill(mask, 'holes');

    % Perform blob analysis to find connected components.
    [~, centroids, bboxes] = obj.blobAnalyser.step(mask);
end
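The opening operation (imopen) is erosion followed by dilation. A pure-Python sketch on a binary grid with a 3x3 structuring element, using a toy 5x5 mask invented for the example, shows why opening removes isolated noise pixels while preserving larger blobs.

```python
def dilate(mask):
    """Binary dilation with a 3x3 square structuring element."""
    h, w = len(mask), len(mask[0])
    return [[int(any(mask[i + di][j + dj]
                     for di in (-1, 0, 1) for dj in (-1, 0, 1)
                     if 0 <= i + di < h and 0 <= j + dj < w))
             for j in range(w)] for i in range(h)]

def erode(mask):
    """Binary erosion: a pixel survives only if its whole 3x3 neighbourhood is set."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            out[i][j] = int(all(mask[i + di][j + dj]
                                for di in (-1, 0, 1) for dj in (-1, 0, 1)))
    return out

def binary_open(mask):
    # Opening = erosion then dilation: isolated pixels vanish, large blobs survive.
    return dilate(erode(mask))

noisy = [[1, 0, 0, 0, 0],       # lone noise pixel in the corner
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1],       # 3x3 blob: a plausible moving object
         [0, 0, 1, 1, 1],
         [0, 0, 0, 0, 0]]
cleaned = binary_open(noisy)
print(cleaned[0][0], cleaned[2][3])  # 0 1
```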
Predict New Locations of Existing Tracks
Use the Kalman filter to predict the centroid of each track in the current frame, and update its
bounding box accordingly.
function predictNewLocationsOfTracks()
    for i = 1:length(tracks)
        bbox = tracks(i).bbox;

        % Predict the current location of the track.
        predictedCentroid = predict(tracks(i).kalmanFilter);

        % Shift the bounding box so that its center is at
        % the predicted location.
        predictedCentroid = int32(predictedCentroid) - bbox(3:4) / 2;
        tracks(i).bbox = [predictedCentroid, bbox(3:4)];
    end
end
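The predict/correct cycle can be illustrated with a deliberately simplified scalar filter. This is a stand-in for vision.KalmanFilter, not its actual equations: the noise values, the scalar covariance, and the crude velocity update are all invented for the sketch.

```python
class ConstantVelocityKF:
    """Toy 1-D constant-velocity filter: predict from the motion model,
    then blend in the measurement by a gain that reflects uncertainty."""

    def __init__(self, x0, q=1.0, r=4.0):
        self.x, self.v = x0, 0.0   # state: position and velocity
        self.p = 10.0              # scalar stand-in for state covariance
        self.q, self.r = q, r      # process / measurement noise (illustrative)

    def predict(self):
        # Motion model: the object keeps moving at its current velocity.
        self.x += self.v
        self.p += self.q
        return self.x

    def correct(self, z):
        # Blend the prediction with the measurement z via the Kalman gain.
        k = self.p / (self.p + self.r)
        innovation = z - self.x
        self.x += k * innovation
        self.v += 0.5 * k * innovation   # crude velocity update (illustrative)
        self.p *= (1.0 - k)
        return self.x

kf = ConstantVelocityKF(x0=0.0)
for z in (2, 4, 6, 8):                   # object moving +2 per frame
    kf.predict()
    kf.correct(z)
print(round(kf.x, 2))  # 7.14 - the estimate closes in on the true position 8
```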
Assign Detections to Tracks
Assigning object detections in the current frame to existing tracks is done by minimizing cost. The
cost is defined as the negative log-likelihood of a detection corresponding to a track.
The algorithm involves two steps:
Step 1: Compute the cost of assigning every detection to each track using the distance method of
the vision.KalmanFilter System object. The cost takes into account the Euclidean distance
between the predicted centroid of the track and the centroid of the detection. It also includes the
confidence of the prediction, which is maintained by the Kalman filter. The results are stored in an
MxN matrix, where M is the number of tracks, and N is the number of detections.
Step 2: Solve the assignment problem represented by the cost matrix using the
assignDetectionsToTracks function. The function takes the cost matrix and the cost of not
assigning any detections to a track.
The value for the cost of not assigning a detection to a track depends on the range of values
returned by the distance method of the vision.KalmanFilter. This value must be tuned
experimentally. Setting it too low increases the likelihood of creating a new track, and may result in
track fragmentation. Setting it too high may result in a single track corresponding to a series of
separate moving objects.
The assignDetectionsToTracks function uses the Munkres' version of the Hungarian algorithm to
compute an assignment which minimizes the total cost. It returns an M x 2 matrix containing the
corresponding indices of assigned tracks and detections in its two columns. It also returns the
indices of tracks and detections that remained unassigned.
function [assignments, unassignedTracks, unassignedDetections] = ...
        detectionToTrackAssignment()
    nTracks = length(tracks);
    nDetections = size(centroids, 1);

    % Compute the cost of assigning each detection to each track.
    cost = zeros(nTracks, nDetections);
    for i = 1:nTracks
        cost(i, :) = distance(tracks(i).kalmanFilter, centroids);
    end

    % Solve the assignment problem.
    costOfNonAssignment = 20;
    [assignments, unassignedTracks, unassignedDetections] = ...
        assignDetectionsToTracks(cost, costOfNonAssignment);
end
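The cost-matrix-plus-penalty formulation can be illustrated in Python. This is a hedged sketch: it brute-forces the minimum-cost assignment (fine for a handful of tracks) rather than implementing Munkres' algorithm, and the `assign_detections_to_tracks` helper only mimics the interface of the MATLAB function.

```python
from itertools import permutations

def assign_detections_to_tracks(cost, cost_of_non_assignment):
    """Brute-force minimum-cost assignment. cost[i][j] is the cost of
    pairing track i with detection j; every unmatched track or
    detection pays cost_of_non_assignment instead."""
    n_tracks = len(cost)
    n_dets = len(cost[0]) if cost else 0
    best = (float("inf"), [])
    # Each track is matched to either a detection index or None.
    candidates = list(range(n_dets)) + [None] * n_tracks
    for perm in set(permutations(candidates, n_tracks)):
        total, pairs = 0.0, []
        for i, j in enumerate(perm):
            if j is None:
                total += cost_of_non_assignment
            else:
                total += cost[i][j]
                pairs.append((i, j))
        # Detections left unmatched also pay the penalty.
        total += (n_dets - len(pairs)) * cost_of_non_assignment
        if total < best[0]:
            best = (total, sorted(pairs))
    assignments = best[1]
    assigned_t = {a for a, _ in assignments}
    assigned_d = {b for _, b in assignments}
    unassigned_tracks = [i for i in range(n_tracks) if i not in assigned_t]
    unassigned_dets = [j for j in range(n_dets) if j not in assigned_d]
    return assignments, unassigned_tracks, unassigned_dets

# Two tracks, two detections: both pairs are cheap, so both match.
print(assign_detections_to_tracks([[5.0, 90.0], [80.0, 4.0]], 20))
# A single pairing costing 50 loses to two non-assignment penalties (40),
# illustrating why the penalty value must be tuned.
print(assign_detections_to_tracks([[50.0]], 20))
```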
Update Assigned Tracks
The updateAssignedTracks function updates each assigned track with the corresponding detection.
It calls the correct method of vision.KalmanFilter to correct the location estimate. Next, it stores
the new bounding box, and increases the age of the track and the total visible count by 1. Finally,
the function sets the invisible count to 0.
function updateAssignedTracks()
    numAssignedTracks = size(assignments, 1);
    for i = 1:numAssignedTracks
        trackIdx = assignments(i, 1);
        detectionIdx = assignments(i, 2);
        centroid = centroids(detectionIdx, :);
        bbox = bboxes(detectionIdx, :);

        % Correct the estimate of the object's location
        % using the new detection.
        correct(tracks(trackIdx).kalmanFilter, centroid);

        % Replace predicted bounding box with detected
        % bounding box.
        tracks(trackIdx).bbox = bbox;

        % Update track's age.
        tracks(trackIdx).age = tracks(trackIdx).age + 1;

        % Update visibility.
        tracks(trackIdx).totalVisibleCount = ...
            tracks(trackIdx).totalVisibleCount + 1;
        tracks(trackIdx).consecutiveInvisibleCount = 0;
    end
end
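As a rough illustration of the correct step, the Python sketch below blends the predicted centroid toward the measurement with a fixed gain. A real Kalman filter derives this gain from the state and measurement covariances, so treat `correct_track` as a simplification with assumed field names, not the toolbox behavior.

```python
def correct_track(track, detection_centroid, detection_bbox, gain=0.8):
    """Simplified fixed-gain 'correct' step: move the predicted
    centroid most of the way toward the measured one, replace the
    predicted bounding box with the detected box, and update the
    track's bookkeeping counters."""
    px, py = track["centroid"]
    mx, my = detection_centroid
    track["centroid"] = (px + gain * (mx - px), py + gain * (my - py))
    track["bbox"] = list(detection_bbox)
    track["age"] += 1
    track["total_visible_count"] += 1
    track["consecutive_invisible_count"] = 0

track = {"centroid": (10.0, 10.0), "bbox": [6, 6, 8, 8],
         "age": 3, "total_visible_count": 2,
         "consecutive_invisible_count": 1}
correct_track(track, (12.0, 10.0), [8, 6, 8, 8])
print(track["centroid"])  # corrected estimate lies near the measurement
```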
Update Unassigned Tracks
Mark each unassigned track as invisible, and increase its age by 1.
function updateUnassignedTracks()
    for i = 1:length(unassignedTracks)
        ind = unassignedTracks(i);
        tracks(ind).age = tracks(ind).age + 1;
        tracks(ind).consecutiveInvisibleCount = ...
            tracks(ind).consecutiveInvisibleCount + 1;
    end
end
Delete Lost Tracks
The deleteLostTracks function deletes tracks that have been invisible for too many consecutive
frames. It also deletes recently created tracks that have been invisible for too many frames overall.
function deleteLostTracks()
    if isempty(tracks)
        return;
    end

    invisibleForTooLong = 20;
    ageThreshold = 8;

    % Compute the fraction of the track's age for which it was visible.
    ages = [tracks(:).age];
    totalVisibleCounts = [tracks(:).totalVisibleCount];
    visibility = totalVisibleCounts ./ ages;

    % Find the indices of 'lost' tracks.
    lostInds = (ages < ageThreshold & visibility < 0.6) | ...
        [tracks(:).consecutiveInvisibleCount] >= invisibleForTooLong;

    % Delete lost tracks.
    tracks = tracks(~lostInds);
end
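The deletion rule (young and rarely visible, or invisible for too many consecutive frames) translates directly to Python. The dictionary field names below are assumptions mirroring the MATLAB struct.

```python
def delete_lost_tracks(tracks, invisible_for_too_long=20,
                       age_threshold=8, min_visibility=0.6):
    """Drop tracks that have gone invisible for too many consecutive
    frames, plus young tracks that were rarely visible (likely noise)."""
    def is_lost(t):
        # Fraction of the track's age for which it was visible.
        visibility = t["total_visible_count"] / t["age"]
        young_and_flaky = t["age"] < age_threshold and visibility < min_visibility
        return (young_and_flaky or
                t["consecutive_invisible_count"] >= invisible_for_too_long)
    return [t for t in tracks if not is_lost(t)]

tracks = [
    {"id": 1, "age": 4, "total_visible_count": 1,
     "consecutive_invisible_count": 3},   # young, visibility 0.25 -> lost
    {"id": 2, "age": 30, "total_visible_count": 25,
     "consecutive_invisible_count": 0},   # healthy -> kept
]
print([t["id"] for t in delete_lost_tracks(tracks)])  # [2]
```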
Create New Tracks
Create new tracks from unassigned detections. Assume that any unassigned detection is a start of
a new track. In practice, you can use other cues to eliminate noisy detections, such as size,
location, or appearance.
function createNewTracks()
    centroids = centroids(unassignedDetections, :);
    bboxes = bboxes(unassignedDetections, :);

    for i = 1:size(centroids, 1)
        centroid = centroids(i, :);
        bbox = bboxes(i, :);

        % Create a Kalman filter object.
        kalmanFilter = configureKalmanFilter('ConstantVelocity', ...
            centroid, [200, 50], [100, 25], 100);

        % Create a new track.
        newTrack = struct(...
            'id', nextId, ...
            'bbox', bbox, ...
            'kalmanFilter', kalmanFilter, ...
            'age', 1, ...
            'totalVisibleCount', 1, ...
            'consecutiveInvisibleCount', 0);

        % Add it to the array of tracks.
        tracks(end + 1) = newTrack;

        % Increment the next id.
        nextId = nextId + 1;
    end
end
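Track creation is straightforward bookkeeping; a Python equivalent might look like the sketch below. The field names again mirror the MATLAB struct, and the zero initial velocity is an assumption (the toolbox instead seeds a Kalman filter with an initial state estimate).

```python
def create_new_tracks(tracks, centroids, bboxes, unassigned, next_id):
    """Start a fresh track for every unassigned detection.
    Returns the updated next_id counter."""
    for i in unassigned:
        tracks.append({
            "id": next_id,
            "bbox": list(bboxes[i]),
            "centroid": tuple(centroids[i]),
            "velocity": (0.0, 0.0),   # no motion estimate yet
            "age": 1,
            "total_visible_count": 1,
            "consecutive_invisible_count": 0,
        })
        next_id += 1
    return next_id

tracks = []
next_id = create_new_tracks(tracks, [(5.0, 5.0)], [[1, 1, 8, 8]], [0], 1)
print(next_id, tracks[0]["id"])  # 2 1
```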
Display Tracking Results
The displayTrackingResults function draws a bounding box and label ID for each track on the
video frame and the foreground mask. It then displays the frame and the mask in their respective
video players.
function displayTrackingResults()
    % Convert the frame and the mask to uint8 RGB.
    frame = im2uint8(frame);
    mask = uint8(repmat(mask, [1, 1, 3])) .* 255;

    minVisibleCount = 8;
    if ~isempty(tracks)

        % Noisy detections tend to result in short-lived tracks.
        % Only display tracks that have been visible for more than
        % a minimum number of frames.
        reliableTrackInds = ...
            [tracks(:).totalVisibleCount] > minVisibleCount;
        reliableTracks = tracks(reliableTrackInds);

        % Display the objects. If an object has not been detected
        % in this frame, display its predicted bounding box.
        if ~isempty(reliableTracks)
            % Get bounding boxes.
            bboxes = cat(1, reliableTracks.bbox);

            % Get ids.
            ids = int32([reliableTracks(:).id]);

            % Create labels for objects indicating the ones for
            % which we display the predicted rather than the actual
            % location.
            labels = cellstr(int2str(ids'));
            predictedTrackInds = ...
                [reliableTracks(:).consecutiveInvisibleCount] > 0;
            isPredicted = cell(size(labels));
            isPredicted(predictedTrackInds) = {' predicted'};
            labels = strcat(labels, isPredicted);

            % Draw the objects on the frame.
            frame = insertObjectAnnotation(frame, 'rectangle', ...
                bboxes, labels);

            % Draw the objects on the mask.
            mask = insertObjectAnnotation(mask, 'rectangle', ...
                bboxes, labels);
        end
    end

    % Display the mask and the frame.
    obj.maskPlayer.step(mask);
    obj.videoPlayer.step(frame);
end
Summary
This example created a motion-based system for detecting and tracking multiple moving objects.
Try using a different video to see if you are able to detect and track objects. Try modifying the
parameters for the detection, assignment, and deletion steps.
The tracking in this example was solely based on motion with the assumption that all objects move
in a straight line with constant speed. When the motion of an object significantly deviates from this
model, the example may produce tracking errors. Notice the mistake in tracking the person labeled
#12, when he is occluded by the tree.
The likelihood of tracking errors can be reduced by using a more complex motion model, such as
constant acceleration, or by using multiple Kalman filters for every object. Also, you can incorporate
other cues for associating detections over time, such as size, shape, and color.
end
Detecting Cars Using Gaussian Mixture Models
This example shows how to detect and count cars in a video sequence using a foreground detector based on Gaussian mixture models (GMMs).
Introduction
Step 1 - Import Video and Initialize Foreground Detector
Step 2 - Detect Cars in an Initial Video Frame
Step 3 - Process the Rest of Video Frames
Introduction
Detecting and counting cars can be used to analyze traffic patterns. Detection is also a first step
prior to performing more sophisticated tasks such as tracking or categorization of vehicles by their
type.
This example shows how to use the foreground detector and blob analysis to detect and count
cars in a video sequence. It assumes that the camera is stationary. The example focuses on
detecting objects. To learn more about tracking objects, see the example titled Motion-Based
Multiple Object Tracking.
Step 1 - Import Video and Initialize Foreground Detector
Rather than immediately processing the entire video, the example starts by obtaining an initial
video frame in which the moving objects are segmented from the background. This helps to
gradually introduce the steps used to process the video.
The foreground detector requires a certain number of video frames in order to initialize the
Gaussian mixture model. This example uses the first 50 frames to initialize three Gaussian modes in
the mixture model.
foregroundDetector = vision.ForegroundDetector('NumGaussians', 3, ...
    'NumTrainingFrames', 50);

videoReader = vision.VideoFileReader('visiontraffic.avi');
for i = 1:150
    frame = step(videoReader); % read the next video frame
    foreground = step(foregroundDetector, frame);
end
After the training, the detector begins to output more reliable segmentation results. The two figures
below show one of the video frames and the foreground mask computed by the detector.
figure; imshow(frame); title('Video Frame');
figure; imshow(foreground); title('Foreground');
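To build intuition for what the foreground detector learns during its training frames, here is a deliberately simplified Python sketch. It maintains a single running Gaussian per pixel instead of a mixture of three, and the learning rate and threshold are illustrative values, not the toolbox defaults.

```python
import math

def update_background(mean, var, pixel, lr=0.05, k=2.5):
    """Single-Gaussian simplification of the GMM foreground detector:
    a pixel is foreground when it lies more than k standard deviations
    from the running per-pixel mean; the mean and variance then adapt
    via an exponential moving update."""
    is_fg = abs(pixel - mean) > k * math.sqrt(var)
    mean = (1 - lr) * mean + lr * pixel
    var = (1 - lr) * var + lr * (pixel - mean) ** 2
    return mean, max(var, 1e-4), is_fg

# "Training": 50 frames of a stable gray road pixel tighten the model.
mean, var = 0.5, 0.01
for _ in range(50):
    mean, var, _ = update_background(mean, var, 0.5)

# A bright car pixel now deviates strongly and is flagged foreground.
_, _, fg = update_background(mean, var, 0.95)
print(fg)  # True
```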
Step 2 - Detect Cars in an Initial Video Frame
The foreground segmentation process is not perfect and often includes undesirable noise. The
example uses morphological opening to remove the noise and to fill gaps in the detected objects.
se = strel('square', 3);
filteredForeground = imopen(foreground, se);
figure; imshow(filteredForeground); title('Clean Foreground');
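Morphological opening is erosion followed by dilation. A minimal Python sketch on a set of foreground pixel coordinates (the helper names are my own) shows how opening removes blobs smaller than the structuring element while preserving larger regions:

```python
def erode(pixels, se):
    """Keep a pixel only if every structuring-element offset is set."""
    return {(x, y) for (x, y) in pixels
            if all((x + dx, y + dy) in pixels for (dx, dy) in se)}

def dilate(pixels, se):
    """Set every structuring-element offset around each pixel."""
    return {(x + dx, y + dy) for (x, y) in pixels for (dx, dy) in se}

def opening(pixels, se):
    return dilate(erode(pixels, se), se)

# 3x3 square structuring element, origin at the center.
se = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]

# A solid 3x3 blob plus one isolated noise pixel.
blob = {(x, y) for x in range(3) for y in range(3)}
noise = {(10, 10)}
print(opening(blob | noise, se) == blob)  # True: noise gone, blob intact
```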
Next, we find bounding boxes of each connected component corresponding to a moving car by using the vision.BlobAnalysis object. The object further filters the detected foreground by rejecting blobs which contain fewer than 150 pixels.
blobAnalysis = vision.BlobAnalysis('BoundingBoxOutputPort', true, ...
    'AreaOutputPort', false, 'CentroidOutputPort', false, ...
    'MinimumBlobArea', 150);
bbox = step(blobAnalysis, filteredForeground);
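Blob analysis amounts to connected-component labeling plus an area filter. A pure-Python sketch (flood fill over a pixel-coordinate set; the `blob_bounding_boxes` helper is hypothetical) illustrates the 'MinimumBlobArea' idea:

```python
def blob_bounding_boxes(pixels, min_area=150):
    """Label 4-connected components in a set of foreground pixel
    coordinates and return [x, y, w, h] boxes for blobs whose pixel
    count meets min_area."""
    remaining = set(pixels)
    boxes = []
    while remaining:
        # Flood-fill one component starting from an arbitrary seed.
        stack = [remaining.pop()]
        comp = set(stack)
        while stack:
            x, y = stack.pop()
            for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    comp.add(nb)
                    stack.append(nb)
        if len(comp) >= min_area:
            xs = [p[0] for p in comp]
            ys = [p[1] for p in comp]
            boxes.append([min(xs), min(ys),
                          max(xs) - min(xs) + 1, max(ys) - min(ys) + 1])
    return boxes

# A 20x10 "car" blob (200 px) and a tiny 2x2 noise blob (4 px).
car = {(x, y) for x in range(20) for y in range(10)}
noise = {(50 + x, 50 + y) for x in range(2) for y in range(2)}
print(blob_bounding_boxes(car | noise, min_area=150))  # [[0, 0, 20, 10]]
```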
To highlight the detected cars, we draw green boxes around them.
result = insertShape(frame, 'Rectangle', bbox, 'Color', 'green');
The number of bounding boxes corresponds to the number of cars found in the video frame. We
display the number of found cars in the upper left corner of the processed video frame.
numCars = size(bbox, 1);
result = insertText(result, [10 10], numCars, 'BoxOpacity', 1, ...
    'FontSize', 14);
figure; imshow(result); title('Detected Cars');
Step 3 - Process the Rest of Video Frames
In the final step, we process the remaining video frames.
videoPlayer = vision.VideoPlayer('Name', 'Detected Cars');
videoPlayer.Position(3:4) = [650, 400]; % window size: [width, height]
se = strel('square', 3); % morphological filter for noise removal

while ~isDone(videoReader)

    frame = step(videoReader); % read the next video frame

    % Detect the foreground in the current video frame
    foreground = step(foregroundDetector, frame);

    % Use morphological opening to remove noise in the foreground
    filteredForeground = imopen(foreground, se);

    % Detect the connected components with the specified minimum area, and
    % compute their bounding boxes
    bbox = step(blobAnalysis, filteredForeground);

    % Draw bounding boxes around the detected cars
    result = insertShape(frame, 'Rectangle', bbox, 'Color', 'green');

    % Display the number of cars found in the video frame
    numCars = size(bbox, 1);
    result = insertText(result, [10 10], numCars, 'BoxOpacity', 1, ...
        'FontSize', 14);

    step(videoPlayer, result); % display the results
end

release(videoReader); % close the video file
release(videoReader); % close the video file
The output video displays the bounding boxes around the cars. It also displays the number of cars
in the upper left corner of the video.
Tracking Cars Using Optical Flow
This example shows how to detect and track cars in a video sequence using optical flow estimation.
Example Model
Tracking Cars Using Optical Flow Results
Available Example Versions
Example Model
The following figure shows the Tracking Cars Using Optical Flow model:
Tracking Cars Using Optical Flow Results
The model uses an optical flow estimation technique to estimate the motion vectors in each frame
of the video sequence. By thresholding and performing morphological closing on the motion
vectors, the model produces binary feature images. The model locates the cars in each binary
feature image using the Blob Analysis block. Then it uses the Draw Shapes block to draw a green
rectangle around the cars that pass beneath the white line. The counter in the upper left corner of
the Results window tracks the number of cars in the region of interest.
Available Example Versions
Windows® only: viptrafficof_win.mdl
Platform independent: viptrafficof_all.mdl
Windows-only example models might contain compressed multimedia files or To Video Display
blocks, both of which are only supported on Windows platforms. The To Video Display block
supports code generation, and its performance is optimized for Windows.