1. Combining Motion Information with Appearance Information
and Utilizing Mutual Information Encoded
Among Image Features of the Same Object
for Better Detection Performance
Zhipeng Wang
Directed by: Prof. Katsushi Ikeuchi, Dr. Masataka Kagesawa, and Dr. Shintaro Ono
Computer Vision Laboratory, The University of Tokyo
(Japanese title: Combining motion information with appearance information,
and utilizing mutual information encoded among the image features of the same object,
for better detection performance)
2. Outline
Background and state-of-the-art detection
methods
Our work (goal, challenges, related work, method,
results, contribution, and conclusion)
Section 1, 2: utilizing motion information
Section 3, 4: utilizing mutual information
Conclusion
Publication list
3. Background
Object detection from images:
One of the basic human perceptual skills
Plays an important role in machine vision
4. Two primary categories
Sliding-window methods
Use classifiers to decide whether each sub-image
contains a target object
Hough transform based methods
Infer object locations and labels in a bottom-up
manner from detected image features
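As a toy illustration (ours, not from the thesis), the sliding-window category can be sketched in a few lines; the `score_fn` classifier here is just mean intensity, standing in for a real trained classifier such as HOG features with an SVM:

```python
import numpy as np

def sliding_window_detect(image, window, step, score_fn, threshold):
    """Slide a fixed-size window over the image and keep every window
    whose classifier score exceeds the threshold."""
    H, W = image.shape
    h, w = window
    detections = []
    for y in range(0, H - h + 1, step):
        for x in range(0, W - w + 1, step):
            score = score_fn(image[y:y + h, x:x + w])
            if score > threshold:
                detections.append((x, y, float(score)))
    return detections

# Toy usage: "detect" a bright 4x4 patch in a synthetic image.
img = np.zeros((16, 16))
img[4:8, 8:12] = 1.0
hits = sliding_window_detect(img, (4, 4), 2, lambda p: p.mean(), 0.9)
```

Real systems add multi-scale windows and non-maximum suppression on top of this loop.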
5. Better detection performance
Representative methods
Image features: HOG, SIFT
Classifiers
Machine learning methods: deep learning
Efficient search techniques
Branch and bound [1]
[1] C. H. Lampert, M. B. Blaschko, and T. Hofmann, "Efficient Subwindow Search: A
Branch and Bound Framework for Object Localization," IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 31, no. 12, pp. 2129-2142, Dec. 2009.
6. Our work towards
better detection performance
Combining motion information with
appearance information
Using mutual information encoded among the
features of the same object
By using extra information, we expect to
achieve better detection performance
7. Our work
Combining motion information with
appearance information
1. Common Fate Hough Transform
2. Emergency Telephone Indicator Detection in
Tunnel Environment
9. Goal and challenges
Determine where each object is in the
scene, and what its label is
Challenges:
Objects close to one another
Similar objects from different classes
Common Fate Hough Transform
10. Related work
Motion for object detection
Background subtraction
Optical flow[1]
Combination of appearance and motion
information
Mainly for tracking
[1] G. Brostow and R. Cipolla. Unsupervised bayesian detection of
independent motion in crowds. In CVPR, pages I: 594–601, 2006.
11. Hough transform
On the current image:
Detect keypoints and extract image features
Find the best matched codes from a trained
codebook
Each keypoint votes for
object center and label
B. Leibe and B. Schiele. Interleaved object
categorization and segmentation.
In BMVC, pages 759–768, 2003.
12. Hough transform
Discrete to continuous
Blurring each vote
Summing up all votes from
all object parts
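The vote blurring and summation can be sketched as follows (a numpy toy of ours; the Gaussian width `sigma` and the confidence-map size are arbitrary assumptions, not values from the thesis):

```python
import numpy as np

def hough_confidence(votes, shape, sigma=1.0):
    """Accumulate blurred votes: each keypoint vote (cx, cy, weight)
    adds a Gaussian centred at its voted object centre, so nearby
    votes from different object parts reinforce each other."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    conf = np.zeros(shape)
    for cx, cy, w in votes:
        conf += w * np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    return conf

# Two agreeing votes near (5, 5) beat one isolated weaker vote.
votes = [(5, 5, 1.0), (5, 6, 1.0), (12, 3, 0.5)]
conf = hough_confidence(votes, (16, 16))
peak = np.unravel_index(conf.argmax(), conf.shape)  # (y, x) of strongest centre
```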
13. The common fate principle
One of the Gestalt laws
Elements with the same motion tend to be
perceived as one unit
14. Common fate Hough transform
Motion analysis[1]
Keypoint tracking (KLT
tracker)
Clustering the
trajectories
[1] G. Brostow and R. Cipolla. Unsupervised bayesian detection of
independent motion in crowds. In CVPR, pages I: 594–601, 2006.
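A minimal sketch of the common-fate grouping, under assumptions of ours: trajectories are plain lists of (x, y) positions, and a greedy mean-velocity grouping stands in for KLT tracking plus the Bayesian trajectory clustering of Brostow and Cipolla:

```python
import numpy as np

def group_by_common_fate(trajectories, tol=1.0):
    """Group trajectories whose mean velocities are similar: elements
    moving together are perceived (and grouped) as one unit."""
    vels = [np.diff(np.asarray(t, dtype=float), axis=0).mean(axis=0)
            for t in trajectories]
    labels = [-1] * len(trajectories)
    centers = []  # representative velocity of each group
    for i, v in enumerate(vels):
        for g, c in enumerate(centers):
            if np.linalg.norm(v - c) < tol:
                labels[i] = g
                break
        else:
            labels[i] = len(centers)
            centers.append(v)
    return labels

# Two keypoints moving right and one moving down: two motion groups.
t_right1 = [(0, 0), (2, 0), (4, 0)]
t_right2 = [(5, 1), (7, 1), (9, 1)]
t_down = [(3, 3), (3, 6), (3, 9)]
labels = group_by_common_fate([t_right1, t_right2, t_down])
```

In the full method, each keypoint's vote is then weighted by the support it gains from its motion group.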
15. Common fate Hough transform
The weight of each vote is related to the
support it gains from the motion group
18. Common Fate Hough Transform
Inference
1. Find the highest peak in the confidence space
2. Exclude the object parts belonging to the
highest peak
3. If the Hough image is not empty, go to step 1.
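The three inference steps can be sketched as a greedy loop. This is our toy approximation: excluding a peak's object parts is modeled by clearing a square neighbourhood of the confidence map, and `min_conf` is an assumed stopping threshold:

```python
import numpy as np

def greedy_inference(conf, part_footprint, min_conf=0.5):
    """1. Find the highest peak. 2. Suppress the parts that belong to
    it (here: zero a square around the peak). 3. Repeat while any
    confidence above `min_conf` remains."""
    conf = conf.copy()
    detections = []
    while conf.max() > min_conf:
        y, x = np.unravel_index(conf.argmax(), conf.shape)
        detections.append((int(x), int(y), float(conf[y, x])))
        r = part_footprint
        conf[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1] = 0.0
    return detections

conf = np.zeros((10, 10))
conf[2, 2] = 1.0   # strong object hypothesis
conf[7, 7] = 0.8   # weaker second hypothesis
dets = greedy_inference(conf, part_footprint=2)
```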
38. Result comparison
[1] O. Barinova, V. Lempitsky, and P. Kohli. On detection of multiple object
instances using hough transforms. In CVPR, pages 2233–2240, 2010.
39. Contribution
A detection method with better detection
performance than the CVPR’10 method
A successful combination of motion and
appearance information for detection
A successful attempt to incorporate human
perception principles into a detection method
in computer vision
40. Conclusion
Motion information, effectively combined with
appearance information, largely improves
detection results
However, the method is not efficient, which
motivates the specialized method proposed next
41. 2. Detection of Emergency
Telephone Indicators by Fusion
of Motion Information with
Appearance Information
42. Goal
Emergency Telephone Indicator Detection
For vehicle positioning
Detecting emergency telephone indicators
Installed at known locations
Far-infrared cameras mounted on the vehicle's
roof
47. Pipeline
Verify keypoints
Intensity histogram
Building a mixture model
using k-means
Emergency Telephone Indicator
Detection
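As an illustrative sketch of the keypoint-verification model (our stand-in: plain 1-D intensities instead of the thesis's intensity-histogram features), building the k-means mixture might look like:

```python
import numpy as np

def kmeans_1d(values, k, iters=20, seed=0):
    """Plain 1-D k-means: alternate assigning each value to its
    nearest centre and recomputing centres as cluster means."""
    rng = np.random.default_rng(seed)
    vals = np.asarray(values, dtype=float)
    centers = vals[rng.choice(len(vals), size=k, replace=False)]
    for _ in range(iters):
        assign = np.abs(vals[:, None] - centers[None, :]).argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = vals[assign == j].mean()
    return np.sort(centers)

# Bright indicator pixels vs. dark tunnel background.
centers = kmeans_1d([10, 11, 12, 199, 200, 201], k=2)
```

A new keypoint can then be verified by its distance to the nearest learned centre.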
48. Pipeline
Cluster keypoints
Build a minimum spanning tree (Euclidean
distance)
Split the tree
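The MST clustering step can be sketched with Prim's algorithm plus a union-find over the short edges; `split_dist`, the edge length above which the tree is split, is an assumed parameter:

```python
import numpy as np

def mst_clusters(points, split_dist):
    """Build a Euclidean minimum spanning tree (Prim's algorithm),
    drop edges longer than `split_dist`, and return a component label
    per point: the components are the keypoint clusters."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    visited = np.zeros(n, dtype=bool)
    visited[0] = True
    best = dist[0].copy()            # cheapest connection to the tree so far
    parent = np.zeros(n, dtype=int)
    edges = []
    for _ in range(n - 1):
        masked = np.where(visited, np.inf, best)
        j = int(masked.argmin())
        edges.append((int(parent[j]), j, float(best[j])))
        visited[j] = True
        closer = dist[j] < best
        parent = np.where(closer, j, parent)
        best = np.where(closer, dist[j], best)
    # Union-find over the kept (short) edges only.
    root = list(range(n))
    def find(a):
        while root[a] != a:
            root[a] = root[root[a]]
            a = root[a]
        return a
    for a, b, d in edges:
        if d <= split_dist:
            root[find(a)] = find(b)
    return [find(i) for i in range(n)]

labels = mst_clusters([(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)], split_dist=2.0)
```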
49. Pipeline
Verify keypoint clusters by appearance
AdaBoost classifier
Trained from positive and negative examples
Intensity histogram
50. Pipeline
Keypoint cluster tracking
Assuming no occlusion
Connect each detection response to its
nearest trajectory (by appearance, scale,
and time gap)
51. Pipeline
Keypoint cluster verification by motion
Fit each trajectory as a straight line
Use the significance of the fit as the criterion
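The straight-line verification can be sketched with a least-squares fit per image coordinate; the RMS residual below is our stand-in for the significance measure, and `max_rms` is an assumed threshold:

```python
import numpy as np

def trajectory_is_linear(traj, max_rms=0.5):
    """Fit each image coordinate of the trajectory as a linear function
    of time; accept the track only if the RMS residual is small. An
    indicator seen from a forward-moving vehicle traces a nearly
    straight image trajectory, so large residuals reject false tracks."""
    pts = np.asarray(traj, dtype=float)
    t = np.arange(len(pts))
    mse = 0.0
    for dim in range(2):
        coef = np.polyfit(t, pts[:, dim], 1)
        resid = pts[:, dim] - np.polyval(coef, t)
        mse += np.mean(resid ** 2)
    return np.sqrt(mse) <= max_rms

straight = [(0, 0), (1, 2), (2, 4), (3, 6)]   # accepted
wiggly = [(0, 0), (1, 5), (2, 0), (3, 5)]     # rejected
```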
52. Benefits of the pipeline
The system runs in real time
The more time-consuming steps deal with fewer instances:
Original image: 10^5 pixels
Keypoint detection: 10^4 points
Keypoint verification: 10^3 keypoints
Keypoint clustering: 10^2 keypoints
Less than 10 keypoint clusters
55. Contribution
A specialized real-time detection method on
cluttered data
A successful combination of motion and
appearance information for detection
The significance of this research: a potentially
applicable solution for vehicle positioning in
tunnel environments
57. Future work
We have collected new data using a new camera
and plan to improve the pipeline on this newly
collected data
58. Two methods utilizing motion information have
been proposed
When motion information is not available,
we turn to the following work
59. Our work
Utilizing Mutual Information Encoded
Among Image Features of the Same Object
3. Pyramid Match Score for Object Detection
4. Detection of Object With In-plane Rotations
61. Motivation
Sliding-window methods:
Often ignore the positional information of each
image feature [1]
Hough-transform-based methods:
Ignore the overall information of each object
Intention of this method:
An efficient way to utilize both the positional
information of each image feature and the overall
information gained from the whole object
[1] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond Bags of Features: Spatial Pyramid
Matching for Recognizing Natural Scene Categories," in CVPR, vol. 2, pp. 2169-2178, 2006.
62. Pyramid Match
Pyramid Match is a method to find an approximate
best one-to-one match between two sets of points
A 2D example is given here
The real data is 14D:
Appearance information: 12D
Position information: 2D
63. Pyramid match
Divide the space from fine to coarse
If two points fall in the same grid cell under a
division, they are considered a match and excluded
64. Pyramid match
The distance between two matched points
Need not be calculated
It is assigned according to the size of the grid
cell at which the match is found
e.g., 1/4 × 2, 1/2 × 2, 1 × 2
65. Pyramid Match Score
Pyramid match between two sets
Each sub-image is considered as a 14D point set
(appearance: 12D, position: 2D)
A "super template" contains all 14D points of the
training images
66. Pyramid Match Score
Find the near-best match between each sub-image
of a test image and the "super template"
The match score (distance) is used as the
confidence that the sub-image contains a
target object
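A minimal pyramid-match sketch on 2-D points (the real data is 14-D). It returns a similarity score in which matches first found in finer cells contribute more weight, the inverse view of the grid-size distances described on the earlier slide; the level count and point range are our assumptions:

```python
import numpy as np

def pyramid_match_score(A, B, levels=3):
    """Approximate best one-to-one matching between point sets A and B:
    histogram-intersect on grids from fine (cell side 1) to coarse
    (cell side 2**levels); pairs newly matched at level l get weight
    2**-l. Points are assumed to lie in [0, 2**levels) per dimension."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    score, prev = 0.0, 0
    for l in range(levels + 1):
        cell = 2 ** l
        ha, hb = {}, {}
        for hist, pts in ((ha, A), (hb, B)):
            for p in pts:
                key = tuple((p // cell).astype(int))
                hist[key] = hist.get(key, 0) + 1
        inter = sum(min(c, hb.get(k, 0)) for k, c in ha.items())
        score += (inter - prev) * 2.0 ** (-l)  # only newly matched pairs
        prev = inter
    return score
```

An exact match scores 2.0 for two points; shifting one point so it matches only at a coarser level lowers the score.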
69. Comments
The results are not yet good enough
The method seems promising, and we will
improve it in future work
This mutual information can be exploited
further
71. About the related work
Existing methods either need training examples at
all rotation angles, which is not robust,
or are very time-consuming
72. Related work
Two neural networks deal with each
sub-image
One estimates the sub-image's
rotation angle
The sub-image is rotated so that
the potential face is upright
The other decides whether an
object exists
H. Rowley, S. Baluja, and T. Kanade. Rotation invariant neural
network-based face detection. In CVPR, pages 38–44, 1998.
73. Related work
Local feature methods
Infer rotation from the gradient direction of
SIFT features
Graph-based methods
Model the object as a graph and deform the graph
Detection of Object With In-plane Rotations
74. Idea
Suppose we have a codebook trained from
objects without in-plane rotations
75. Idea
With proof omitted:
Given two or more votes from different
keypoints, there exists one and only one rotation
angle that minimizes the difference of the
voted centers
If all votes are good estimates, this angle can
be expected to be a good estimate of the object's
rotation angle
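The angle-estimation idea can be sketched as a grid search over rotation angles, keeping the one that minimizes the spread of the voted centers; the function names and the synthetic setup below are ours:

```python
import numpy as np

def estimate_rotation(keypoints, offsets, n_angles=360):
    """Each keypoint at position p votes for centre p + R(theta) @ o,
    where o is its centre offset learned on upright training objects.
    Return the angle that minimises the total variance (spread) of
    the voted centres."""
    P = np.asarray(keypoints, dtype=float)
    O = np.asarray(offsets, dtype=float)
    best_theta, best_spread = 0.0, np.inf
    for theta in np.linspace(0, 2 * np.pi, n_angles, endpoint=False):
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])
        centers = P + O @ R.T               # rotate each learned offset
        spread = centers.var(axis=0).sum()  # votes agree -> small spread
        if spread < best_spread:
            best_theta, best_spread = theta, spread
    return best_theta

# Synthetic object rotated by 90 degrees about its centre.
true = np.pi / 2
c, s = np.cos(true), np.sin(true)
R = np.array([[c, -s], [s, c]])
offsets = np.array([(2.0, 0.0), (0.0, 3.0), (-2.0, -1.0)])
center = np.array([10.0, 10.0])
keypoints = center - offsets @ R.T  # parts placed around the rotated object
theta = estimate_rotation(keypoints, offsets)
```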
76. To do
Propose a method capable of detecting
objects with in-plane rotations
Evaluate the robustness of the proposed method
77. Conclusion
Motion information combined with
appearance information:
Distinguishes target objects from noisy objects
Enhances the detection rate
Mutual information encoded among the
image features of the same object:
We expect to obtain a method capable of utilizing
positional and overall information
We expect to obtain a method capable of detecting
objects with in-plane rotations
78. Publication list
[1] Zhipeng Wang, Jinshi Cui, Hongbin Zha, Masataka Kagesawa, and Katsushi Ikeuchi,
"Object detection by common fate Hough transform," in Proc. First Asian Conference
on Pattern Recognition (ACPR), pp. 613-617, Nov. 2011.
[2] Zhipeng Wang, Masataka Kagesawa, Shintaro Ono, Atsuhiko Banno, and Katsushi
Ikeuchi, "Emergency light detection in tunnel environment: An efficient method,"
in Proc. First Asian Conference on Pattern Recognition (ACPR), pp. 628-632, Nov. 2011.
[3] Zhipeng Wang, Jinshi Cui, Hongbin Zha, Masataka Kagesawa, Shintaro Ono, and
Katsushi Ikeuchi, "Detection by Motion-based Grouping of Object Parts,"
International Journal of ITS Research (submitted).
[4] Zhipeng Wang, Masataka Kagesawa, Shintaro Ono, Atsuhiko Banno, Takeshi Oishi,
and Katsushi Ikeuchi, "Detection of Emergency Telephone Indicators Using Infrared
Cameras for Vehicle Positioning in Tunnel Environment," ITS World Congress 2013
(submitted).