1 
Real Time Pedestrian Detection, 
Tracking and Distance Estimation 
Keywords: HOG, Lukas Kanade, Pinehole Camera, OpenCV 
# of slides : 30 
Omid.Asudeh@mavs.uta.edu 
University of Texas At Arlington
2 
Problem definition (Goal): 
In this project, given a stream of video, we want to 
detect people, track them, and find their distance in a 
real-time manner. 
Issues: 
● Pedestrian Detection 
● Pedestrian Tracking 
● Distance Detection
3 
People Detection 
What is HOG (Histograms of Oriented Gradients)?: 
Answer: It is a kind of feature 
descriptor. 
Then what is a feature descriptor? 
The goal of a feature descriptor is to generalize an object in 
a way that same objects (e.g. pedestrians) make as close as 
possible descriptors. That way the task of classification 
becomes easier.
4 
Objects Descriptors 
HOG 
Classifier (SVM) 
? 
No 
Yes Is Pedestrian 
Is Not Pedestrian 
Pedestrian detection Process
5 
Challenges of HOG and SVM Classifier: 
1. HOG is Computationally Complex 
2. It needs the whole body to be detected 
3. Classifiers are not exact (it may classify wrongly) 
HOG in Open CV : 
Input : an image 
Output : a list of rectangles each bounding a pedestrian 
Idea : Read a video stream frame by frame and do HOG 
on each frame and show the results 
Slow 
Lets try the idea! To make sure
6 
Feature point Tracking 
What Does a feature tracker do ? 
Answer: Suppose we are given two images. We have the 
position of some points on the first image. In the second 
image the scene has changed a little. The goal of the tracker 
is to find the position of the points in the second image. 
Benefit : It is not computationally complex
7 
Tracking in OpenCV: OpenCV has implemented the 
Lucas-Kanade (LK) traking algorithm 
Input: Image A, list of feature points in image A, Image B 
Output : List of tracked feature points in the Image B 
Idea : Forget about the HOG. Read the video stream 
frame by frame, do corner detection, track corners using 
LK algorithm from the current frame to the next in a loop. 
seems that you forgot about the pedestrian. Using this algorithm, the 
program will track all the feature points (corners) of the image and not 
specially those related to the people. 
Not that easy! 
Lets Not try the idea To make sure!
● HOG detects people but is not fast enough to be used in real-time applications 
● Tracking corners is a relatively fast operation but it is not specified to people 
8 
Idea : 
1)Read the video stream frame by frame 
2)Do people detection using HOG until finding people. 
3)Do corner detection for each people 
4)Track corners using LK from the current frame to the 
next. 
Good idea, but it still have some issues!! 
But Why??
9 
ImgA = Get Current frame 
People-list = 
Find People 
(HOG) in ImgA 
No People found? 
Feature-list + = 
findFeaturePoints(p) 
ImgB = 
Get Next frame 
1.Tracked-feature-list = 
Track (ImgA,ImgB,Feature-list) 
2.Feature-list = Tracked-feature-list 
3.ImgA = ImgB 
4.Show (Feature-list) 
Yes 
∀ p∈people−list 
Initial Idea 
Detection Tracking loop
10 
What if some feature points lost during tracking loop?
11 
ImgA = Get Current frame 
People-list = 
Find People 
(HOG) in ImgA 
No People found? 
Feature-list + = 
findFeaturePoints(p) 
ImgB = 
Get Next frame 
1.Tracked-feature-list = 
Track (ImgA,ImgB,Feature-list) 
2.Feature-list = Tracked-feature-list 
3.ImgA = ImgB 
4.Show (Feature-list) 
Yes 
∀ p∈people−list 
Developing Idea... 
# of tracked features 
<Threshold 
Detection Tracking loop 
yes 
No
12 
ImgA = Get Current frame 
People-list = 
Find People 
(HOG) in ImgA 
No People found? 
Feature-list + = 
findFeaturePoints(p) 
ImgB = 
Get Next frame 
1.Tracked-feature-list = 
Track (ImgA,ImgB,Feature-list) 
2.Feature-list = Tracked-feature-list 
3. ImgA = ImgB 
4.Show (Feature-list) 
Yes 
∀ p∈people−list 
Developing Idea... 
Detection 
# of tracked features 
<Threshold 
Tracking loop 
yes 
No 
This part is not clear, 
how do you deal with 
finding features of each 
person?
13 
Okay, Let's talk about more details 
What does HOG give us in openCV? 
HOG 
A list of rectangles
14 
What is a rectangle? <X,Y,W,H> 
So, HOG gives us : <(x1,y1,w1,h1);(x2,y2,w2,h2);(x3,y3,w3,h3),...>
15 
Crop each rectangle and do feature detection for it 
Then map features 
of each rectangle to 
the original image
16 
X = X1+x' 
Y = Y1+y' 
Coordinates in 
original image
17 
It seems that you tackled the complexity issue of 
HOG so far, and the algorithm can detect and 
track people relatively fast. Also it can track 
people even if their full body is not in the screen. 
Yes, I did. 
You know what? I think there is 
something badly wrong with your 
algorithm 
What??????
18 
Challenge : 
● HOG is not a perfect algorithm. It may consider 
wrong objects (like a tree ) as people. 
● The tree is not moving, so it rarely loses its 
feature points during tracking loop. 
● Therefore, the algorithm recur tracking 
loop infinite times (trap), and it will never 
go to the detection phase again!! 
You may say this is the end of 
the story, but there is always 
hope!
19 
● The wrongly recognized object, is not moving, so it 
rarely loses its feature points. 
This is the key! 
Idea: 
If the number of the feature points are not 
changing during a while, say in 200 frames, 
then the algorithm should suspect a trap state 
and break the tracking loop. 
Therefore, we can tackle this 
problem by adding a simple 
condition to the previous 
algorithm
20 
Dealing with wrong object detection challenge 
ImgA = Get Current frame 
People-list = 
Find People 
(HOG) in ImgA 
People found? 
No 
Feature-list + = 
findFeaturePoints(p) 
ImgB = 
Get Next frame 
1.Tracked-feature-list = 
Track (ImgA,ImgB,Feature-list) 
2.Feature-list = Tracked-feature-list 
3.ImgA = ImgB 
4.Show (Feature-list) 
Yes 
∀ p∈people−list 
(# of tracked features 
<Threshold) 
yes 
Detection Tracking loop 
No 
OR 
(# of features are not 
changing for a while )
21 
Distance recognition
22 
Problem definition: Using a pinhole camera, 
we want an strategy to estimate the distance 
of the pedestrians 
Challenge: that is impossible to get the distance of 
objects using just a pinhole camera!! 
Yes, but what if we have a piece of information? 
What kind of information are 
you talking about? 
Like the height of the camera from the ground!
23 
h 
Z 
f 
Image plane 
v0 
v It can be proved that: 
Z= 
f∗k v∗h 
v−v0 
Where f and k_v are the focal length and 
the pixel density(pixel/meter) of the 
camera.
24 
Challenge: We should find V which is the 
height of the base point (where feet of the 
pedestrian touch the ground) in the image 
plane 
But How to find the bottom 
point?
25 
h_i 
h_j 
Base line 
Let's define the bottom of the rectangle 
obtained from detection phase as the 
base line. 
Idea: the average height of the 
feature points from the base 
line remains unchanged in the 
video stream.
26 
i 
j 
d_i d_ j 
Detection Rectangle 
j' 
i' 
Base line Estimation 
The whole image while tracking 
Keep <d_i, d_j> 
V_ i' V_ j' 
V'_b 
V'_b = Average(V_ j'+d_ j , V_ I' +d_ i) 
Example for 2 points 
Goal
27 
Scaling 
The closer the pedestrian to the camera the more difference 
between the points. 
V_ i' V_ j' 
j' 
i' 
α=Average∀i, j ∈people[ p] 
V j'−Vi' 
di−d j 
Scaling factor: 
Vb=α∗V 'b
28 
Let's see some videos of the results 
of this work!
29 
References: 
● Learning opencv, 1st edition , O'Reilly Media, Inc. ©2008 
ISBN: 9780596516130, Dr. Gary Rost Bradski, Adrian Kaehler 
● http://en.wikipedia.org/wiki/Histogram_of_oriented_gradients 
● https://chrisjmccormick.wordpress.com/2013/05/09/hog-person-detector-tutorial/ 
● http://en.wikipedia.org/wiki/Lucas%E2%80%93Kanade_method
30 
Omid.Asudeh@mavs.uta.edu

Real time pedestrian detection, tracking, and distance estimation

  • 1.
    1 Real TimePedestrian Detection, Tracking and Distance Estimation Keywords: HOG, Lukas Kanade, Pinehole Camera, OpenCV # of slides : 30 Omid.Asudeh@mavs.uta.edu University of Texas At Arlington
  • 2.
    2 Problem definition(Goal): In this project, given a stream of video, we want to detect people, track them, and find their distance in a real-time manner. Issues: ● Pedestrian Detection ● Pedestrian Tracking ● Distance Detection
  • 3.
    3 People Detection What is HOG (Histograms of Oriented Gradients)?: Answer: It is a kind of feature descriptor. Then what is a feature descriptor? The goal of a feature descriptor is to generalize an object in a way that same objects (e.g. pedestrians) make as close as possible descriptors. That way the task of classification becomes easier.
  • 4.
    4 Objects Descriptors HOG Classifier (SVM) ? No Yes Is Pedestrian Is Not Pedestrian Pedestrian detection Process
  • 5.
    5 Challenges ofHOG and SVM Classifier: 1. HOG is Computationally Complex 2. It needs the whole body to be detected 3. Classifiers are not exact (it may classify wrongly) HOG in Open CV : Input : an image Output : a list of rectangles each bounding a pedestrian Idea : Read a video stream frame by frame and do HOG on each frame and show the results Slow Lets try the idea! To make sure
  • 6.
    6 Feature pointTracking What Does a feature tracker do ? Answer: Suppose we are given two images. We have the position of some points on the first image. In the second image the scene has changed a little. The goal of the tracker is to find the position of the points in the second image. Benefit : It is not computationally complex
  • 7.
    7 Tracking inOpenCV: OpenCV has implemented the Lucas-Kanade (LK) traking algorithm Input: Image A, list of feature points in image A, Image B Output : List of tracked feature points in the Image B Idea : Forget about the HOG. Read the video stream frame by frame, do corner detection, track corners using LK algorithm from the current frame to the next in a loop. seems that you forgot about the pedestrian. Using this algorithm, the program will track all the feature points (corners) of the image and not specially those related to the people. Not that easy! Lets Not try the idea To make sure!
  • 8.
    ● HOG detectspeople but is not fast enough to be used in real-time applications ● Tracking corners is a relatively fast operation but it is not specified to people 8 Idea : 1)Read the video stream frame by frame 2)Do people detection using HOG until finding people. 3)Do corner detection for each people 4)Track corners using LK from the current frame to the next. Good idea, but it still have some issues!! But Why??
  • 9.
    9 ImgA =Get Current frame People-list = Find People (HOG) in ImgA No People found? Feature-list + = findFeaturePoints(p) ImgB = Get Next frame 1.Tracked-feature-list = Track (ImgA,ImgB,Feature-list) 2.Feature-list = Tracked-feature-list 3.ImgA = ImgB 4.Show (Feature-list) Yes ∀ p∈people−list Initial Idea Detection Tracking loop
  • 10.
    10 What ifsome feature points lost during tracking loop?
  • 11.
    11 ImgA =Get Current frame People-list = Find People (HOG) in ImgA No People found? Feature-list + = findFeaturePoints(p) ImgB = Get Next frame 1.Tracked-feature-list = Track (ImgA,ImgB,Feature-list) 2.Feature-list = Tracked-feature-list 3.ImgA = ImgB 4.Show (Feature-list) Yes ∀ p∈people−list Developing Idea... # of tracked features <Threshold Detection Tracking loop yes No
  • 12.
    12 ImgA =Get Current frame People-list = Find People (HOG) in ImgA No People found? Feature-list + = findFeaturePoints(p) ImgB = Get Next frame 1.Tracked-feature-list = Track (ImgA,ImgB,Feature-list) 2.Feature-list = Tracked-feature-list 3. ImgA = ImgB 4.Show (Feature-list) Yes ∀ p∈people−list Developing Idea... Detection # of tracked features <Threshold Tracking loop yes No This part is not clear, how do you deal with finding features of each person?
  • 13.
    13 Okay, Let'stalk about more details What does HOG give us in openCV? HOG A list of rectangles
  • 14.
    14 What isa rectangle? <X,Y,W,H> So, HOG gives us : <(x1,y1,w1,h1);(x2,y2,w2,h2);(x3,y3,w3,h3),...>
  • 15.
    15 Crop eachrectangle and do feature detection for it Then map features of each rectangle to the original image
  • 16.
    16 X =X1+x' Y = Y1+y' Coordinates in original image
  • 17.
    17 It seemsthat you tackled the complexity issue of HOG so far, and the algorithm can detect and track people relatively fast. Also it can track people even if their full body is not in the screen. Yes, I did. You know what? I think there is something badly wrong with your algorithm What??????
  • 18.
    18 Challenge : ● HOG is not a perfect algorithm. It may consider wrong objects (like a tree ) as people. ● The tree is not moving, so it rarely loses its feature points during tracking loop. ● Therefore, the algorithm recur tracking loop infinite times (trap), and it will never go to the detection phase again!! You may say this is the end of the story, but there is always hope!
  • 19.
    19 ● Thewrongly recognized object, is not moving, so it rarely loses its feature points. This is the key! Idea: If the number of the feature points are not changing during a while, say in 200 frames, then the algorithm should suspect a trap state and break the tracking loop. Therefore, we can tackle this problem by adding a simple condition to the previous algorithm
  • 20.
    20 Dealing withwrong object detection challenge ImgA = Get Current frame People-list = Find People (HOG) in ImgA People found? No Feature-list + = findFeaturePoints(p) ImgB = Get Next frame 1.Tracked-feature-list = Track (ImgA,ImgB,Feature-list) 2.Feature-list = Tracked-feature-list 3.ImgA = ImgB 4.Show (Feature-list) Yes ∀ p∈people−list (# of tracked features <Threshold) yes Detection Tracking loop No OR (# of features are not changing for a while )
  • 21.
  • 22.
    22 Problem definition:Using a pinhole camera, we want an strategy to estimate the distance of the pedestrians Challenge: that is impossible to get the distance of objects using just a pinhole camera!! Yes, but what if we have a piece of information? What kind of information are you talking about? Like the height of the camera from the ground!
  • 23.
    23 h Z f Image plane v0 v It can be proved that: Z= f∗k v∗h v−v0 Where f and k_v are the focal length and the pixel density(pixel/meter) of the camera.
  • 24.
    24 Challenge: Weshould find V which is the height of the base point (where feet of the pedestrian touch the ground) in the image plane But How to find the bottom point?
  • 25.
    25 h_i h_j Base line Let's define the bottom of the rectangle obtained from detection phase as the base line. Idea: the average height of the feature points from the base line remains unchanged in the video stream.
  • 26.
    26 i j d_i d_ j Detection Rectangle j' i' Base line Estimation The whole image while tracking Keep <d_i, d_j> V_ i' V_ j' V'_b V'_b = Average(V_ j'+d_ j , V_ I' +d_ i) Example for 2 points Goal
  • 27.
    27 Scaling Thecloser the pedestrian to the camera the more difference between the points. V_ i' V_ j' j' i' α=Average∀i, j ∈people[ p] V j'−Vi' di−d j Scaling factor: Vb=α∗V 'b
  • 28.
    28 Let's seesome videos of the results of this work!
  • 29.
    29 References: ●Learning opencv, 1st edition , O'Reilly Media, Inc. ©2008 ISBN: 9780596516130, Dr. Gary Rost Bradski, Adrian Kaehler ● http://en.wikipedia.org/wiki/Histogram_of_oriented_gradients ● https://chrisjmccormick.wordpress.com/2013/05/09/hog-person-detector-tutorial/ ● http://en.wikipedia.org/wiki/Lucas%E2%80%93Kanade_method
  • 30.