1. Combining Motion Information with Appearance Information
and Utilizing Mutual Information Encoded
Among Image Features of the Same Object
for Better Detection Performance
Zhipeng Wang
Directed by: Prof. Katsushi Ikeuchi, Dr. Masataka Kagesawa, and Dr. Shintaro Ono
Computer Vision Laboratory, The University of Tokyo
(Japanese title: Combining motion information with appearance information,
and utilizing mutual information encoded among the image features of the same object,
for better detection performance)
2. Outline
Background and state-of-the-art detection
methods
Our work (goal, challenges, related work, method,
results, contribution, and conclusion)
Section 1, 2: utilizing motion information
Section 3, 4: utilizing mutual information
Conclusion
Publication list
3. Background
Object detection from images:
One of the basic human perceptual skills
Plays an important role in machine vision
4. Two primary categories
Sliding-window methods
Use classifiers to decide whether each sub-image
contains a target object
Hough transform based methods
Infer object locations and labels in a bottom-up
manner from detected image features
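As a toy illustration (ours, not from the thesis), the sliding-window category can be sketched in a few lines; the `score_fn` classifier here is just mean intensity, standing in for a real trained classifier such as HOG features with an SVM:

```python
import numpy as np

def sliding_window_detect(image, window, step, score_fn, threshold):
    """Slide a fixed-size window over the image and keep every window
    whose classifier score exceeds the threshold."""
    H, W = image.shape
    h, w = window
    detections = []
    for y in range(0, H - h + 1, step):
        for x in range(0, W - w + 1, step):
            score = score_fn(image[y:y + h, x:x + w])
            if score > threshold:
                detections.append((x, y, float(score)))
    return detections

# Toy usage: "detect" a bright 4x4 patch in a synthetic image.
img = np.zeros((16, 16))
img[4:8, 8:12] = 1.0
hits = sliding_window_detect(img, (4, 4), 2, lambda p: p.mean(), 0.9)
```

Real systems add multi-scale windows and non-maximum suppression on top of this loop.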
5. Better detection performance
Representative methods
Image features: HOG, SIFT
Classifiers
Machine learning methods: deep learning
Efficient search techniques
Branch and bound [1]
[1] C. H. Lampert, M. B. Blaschko, and T. Hofmann, "Efficient Subwindow Search: A
Branch and Bound Framework for Object Localization," IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 31, no. 12, pp. 2129-2142, Dec. 2009.
6. Our work towards
better detection performance
Combining motion information with
appearance information
Using mutual information encoded among the
features of the same object
By using extra information, we expect to
achieve better detection performance
7. Our work
Combining motion information with
appearance information
1. Common Fate Hough Transform
2. Emergency Telephone Indicator Detection in
Tunnel Environment
9. Goal and challenges
Determine where each object is in the
scene, and what its label is
Challenges:
Objects close to one another
Similar objects from different classes
Common Fate Hough Transform
10. Related work
Motion for object detection
Background subtraction
Optical flow[1]
Combination of appearance and motion
information
Mainly for tracking
[1] G. Brostow and R. Cipolla. Unsupervised bayesian detection of
independent motion in crowds. In CVPR, pages I: 594–601, 2006.
11. Hough transform
On the current image:
Detect keypoints and extract image features
Find the best matched codes from a trained
codebook
Each keypoint votes for
object center and label
B. Leibe and B. Schiele. Interleaved object
categorization and segmentation.
In BMVC, pages 759–768, 2003.
12. Hough transform
Discrete to continuous
Blurring each vote
Summing up all votes from
all object parts
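The vote blurring and summation can be sketched as follows (a numpy toy of ours; the Gaussian width `sigma` and the confidence-map size are arbitrary assumptions, not values from the thesis):

```python
import numpy as np

def hough_confidence(votes, shape, sigma=1.0):
    """Accumulate blurred votes: each keypoint vote (cx, cy, weight)
    adds a Gaussian centred at its voted object centre, so nearby
    votes from different object parts reinforce each other."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    conf = np.zeros(shape)
    for cx, cy, w in votes:
        conf += w * np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    return conf

# Two agreeing votes near (5, 5) beat one isolated weaker vote.
votes = [(5, 5, 1.0), (5, 6, 1.0), (12, 3, 0.5)]
conf = hough_confidence(votes, (16, 16))
peak = np.unravel_index(conf.argmax(), conf.shape)  # (y, x) of strongest centre
```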
13. The common fate principle
One of the Gestalt laws
Elements with the same motion tend to be
perceived as one unit
14. Common fate Hough transform
Motion analysis[1]
Keypoint tracking (KLT
tracker)
Clustering the
trajectories
[1] G. Brostow and R. Cipolla. Unsupervised bayesian detection of
independent motion in crowds. In CVPR, pages I: 594–601, 2006.
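A minimal sketch of the common-fate grouping, under assumptions of ours: trajectories are plain lists of (x, y) positions, and a greedy mean-velocity grouping stands in for KLT tracking plus the Bayesian trajectory clustering of Brostow and Cipolla:

```python
import numpy as np

def group_by_common_fate(trajectories, tol=1.0):
    """Group trajectories whose mean velocities are similar: elements
    moving together are perceived (and grouped) as one unit."""
    vels = [np.diff(np.asarray(t, dtype=float), axis=0).mean(axis=0)
            for t in trajectories]
    labels = [-1] * len(trajectories)
    centers = []  # representative velocity of each group
    for i, v in enumerate(vels):
        for g, c in enumerate(centers):
            if np.linalg.norm(v - c) < tol:
                labels[i] = g
                break
        else:
            labels[i] = len(centers)
            centers.append(v)
    return labels

# Two keypoints moving right and one moving down: two motion groups.
t_right1 = [(0, 0), (2, 0), (4, 0)]
t_right2 = [(5, 1), (7, 1), (9, 1)]
t_down = [(3, 3), (3, 6), (3, 9)]
labels = group_by_common_fate([t_right1, t_right2, t_down])
```

In the full method, each keypoint's vote is then weighted by the support it gains from its motion group.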
15. Common fate Hough transform
The weight of each vote is related to the
support it gains from the motion group
18. Common Fate Hough Transform
Inference
1. Find the highest peak in the confidence space
2. Exclude the object parts belonging to the
highest peak
3. If the Hough image is not empty, go to step 1.
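The three inference steps can be sketched as a greedy loop. This is our toy approximation: excluding a peak's object parts is modeled by clearing a square neighbourhood of the confidence map, and `min_conf` is an assumed stopping threshold:

```python
import numpy as np

def greedy_inference(conf, part_footprint, min_conf=0.5):
    """1. Find the highest peak. 2. Suppress the parts that belong to
    it (here: zero a square around the peak). 3. Repeat while any
    confidence above `min_conf` remains."""
    conf = conf.copy()
    detections = []
    while conf.max() > min_conf:
        y, x = np.unravel_index(conf.argmax(), conf.shape)
        detections.append((int(x), int(y), float(conf[y, x])))
        r = part_footprint
        conf[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1] = 0.0
    return detections

conf = np.zeros((10, 10))
conf[2, 2] = 1.0   # strong object hypothesis
conf[7, 7] = 0.8   # weaker second hypothesis
dets = greedy_inference(conf, part_footprint=2)
```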
38. Result comparison
[1] O. Barinova, V. Lempitsky, and P. Kohli. On detection of multiple object
instances using hough transforms. In CVPR, pages 2233–2240, 2010.
39. Contribution
A detection method with better detection
performance than the CVPR’10 method
A successful combination of motion and
appearance information for detection
A successful attempt to incorporate human
perception principles into a detection method
in computer vision
40. Conclusion
Motion information, effectively combined with
appearance information, largely improves
detection results
However, the method is not efficient, which
motivates the specialized method proposed next
41. 2. Detection of Emergency
Telephone Indicators by Fusion
of Motion Information with
Appearance Information
42. Goal
Emergency Telephone Indicator Detection
For vehicle positioning
Detecting emergency telephone indicators
Installed at known locations
Far-infrared cameras mounted on the vehicle's
roof
47. Pipeline
Verify keypoints
Intensity histogram
Building a mixture model
using k-means
Emergency Telephone Indicator
Detection
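As an illustrative sketch of the keypoint-verification model (our stand-in: plain 1-D intensities instead of the thesis's intensity-histogram features), building the k-means mixture might look like:

```python
import numpy as np

def kmeans_1d(values, k, iters=20, seed=0):
    """Plain 1-D k-means: alternate assigning each value to its
    nearest centre and recomputing centres as cluster means."""
    rng = np.random.default_rng(seed)
    vals = np.asarray(values, dtype=float)
    centers = vals[rng.choice(len(vals), size=k, replace=False)]
    for _ in range(iters):
        assign = np.abs(vals[:, None] - centers[None, :]).argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = vals[assign == j].mean()
    return np.sort(centers)

# Bright indicator pixels vs. dark tunnel background.
centers = kmeans_1d([10, 11, 12, 199, 200, 201], k=2)
```

A new keypoint can then be verified by its distance to the nearest learned centre.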
48. Pipeline
Cluster keypoints
Build a minimum spanning tree (Euclidean
distance)
Split the tree
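The MST clustering step can be sketched with Prim's algorithm plus a union-find over the short edges; `split_dist`, the edge length above which the tree is split, is an assumed parameter:

```python
import numpy as np

def mst_clusters(points, split_dist):
    """Build a Euclidean minimum spanning tree (Prim's algorithm),
    drop edges longer than `split_dist`, and return a component label
    per point: the components are the keypoint clusters."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    visited = np.zeros(n, dtype=bool)
    visited[0] = True
    best = dist[0].copy()            # cheapest connection to the tree so far
    parent = np.zeros(n, dtype=int)
    edges = []
    for _ in range(n - 1):
        masked = np.where(visited, np.inf, best)
        j = int(masked.argmin())
        edges.append((int(parent[j]), j, float(best[j])))
        visited[j] = True
        closer = dist[j] < best
        parent = np.where(closer, j, parent)
        best = np.where(closer, dist[j], best)
    # Union-find over the kept (short) edges only.
    root = list(range(n))
    def find(a):
        while root[a] != a:
            root[a] = root[root[a]]
            a = root[a]
        return a
    for a, b, d in edges:
        if d <= split_dist:
            root[find(a)] = find(b)
    return [find(i) for i in range(n)]

labels = mst_clusters([(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)], split_dist=2.0)
```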
49. Pipeline
Verify keypoint clusters by appearance
AdaBoost classifier
Trained from positive and negative examples
Intensity histogram
50. Pipeline
Keypoint cluster tracking
Assuming no occlusion
Connect each detection response to its
nearest trajectory (by appearance, scale,
and time gap)
51. Pipeline
Keypoint cluster verification by motion
Fit each trajectory as a straight line
Use the significance of the fit as the criterion
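The straight-line verification can be sketched with a least-squares fit per image coordinate; the RMS residual below is our stand-in for the significance measure, and `max_rms` is an assumed threshold:

```python
import numpy as np

def trajectory_is_linear(traj, max_rms=0.5):
    """Fit each image coordinate of the trajectory as a linear function
    of time; accept the track only if the RMS residual is small. An
    indicator seen from a forward-moving vehicle traces a nearly
    straight image trajectory, so large residuals reject false tracks."""
    pts = np.asarray(traj, dtype=float)
    t = np.arange(len(pts))
    mse = 0.0
    for dim in range(2):
        coef = np.polyfit(t, pts[:, dim], 1)
        resid = pts[:, dim] - np.polyval(coef, t)
        mse += np.mean(resid ** 2)
    return np.sqrt(mse) <= max_rms

straight = [(0, 0), (1, 2), (2, 4), (3, 6)]   # accepted
wiggly = [(0, 0), (1, 5), (2, 0), (3, 5)]     # rejected
```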
52. Benefits of the pipeline
The system runs in real time
The more time-consuming steps deal with fewer instances:
Original image: 10^5 pixels
Keypoint detection: 10^4 points
Keypoint verification: 10^3 keypoints
Keypoint clustering: 10^2 keypoints
Less than 10 keypoint clusters
55. Contribution
A specialized real-time detection method on
cluttered data
A successful combination of motion and
appearance information for detection
The significance of this research: a potentially
applicable solution for vehicle positioning in
tunnel environments
57. Future work
We have collected new data using a new camera
and plan to improve the pipeline on this newly
collected data
58. Two methods utilizing motion information have
been proposed
When motion information is not available,
we turn to the following work
59. Our work
Utilizing Mutual Information Encoded
Among Image Features of the Same Object
3. Pyramid Match Score for Object Detection
4. Detection of Object With In-plane Rotations
61. Motivation
Sliding-window methods:
Often ignore the positional information of each
image feature [1]
Hough-transform-based methods:
Ignore the overall information of each object
Intention of this method:
An efficient way to utilize both the positional
information of each image feature and the overall
information gained from the whole object
[1] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond Bags of Features: Spatial Pyramid
Matching for Recognizing Natural Scene Categories," in CVPR, vol. 2, pp. 2169-2178, 2006.
62. Pyramid Match
Pyramid Match is a method to find an approximate
best one-to-one match between two sets of points
A 2D example is given here
The real data is 14D:
Appearance information: 12D
Position information: 2D
63. Pyramid match
Divide the space from fine to coarse
If two points fall in the same grid cell under a
division, they are considered a match and excluded
64. Pyramid match
The distance between two matched points
Need not be calculated
It is assigned according to the size of the grid
cell at which the match is found
e.g., 1/4 × 2, 1/2 × 2, 1 × 2
65. Pyramid Match Score
Pyramid match between two sets
Each sub-image is considered as a 14D point set
(appearance: 12D, position: 2D)
A "super template" contains all 14D points of the
training images
66. Pyramid Match Score
Find the near-best match between each sub-image
of a test image and the "super template"
The match score (distance) is used as the
confidence that the sub-image contains a
target object
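A minimal pyramid-match sketch on 2-D points (the real data is 14-D). It returns a similarity score in which matches first found in finer cells contribute more weight, the inverse view of the grid-size distances described on the earlier slide; the level count and point range are our assumptions:

```python
import numpy as np

def pyramid_match_score(A, B, levels=3):
    """Approximate best one-to-one matching between point sets A and B:
    histogram-intersect on grids from fine (cell side 1) to coarse
    (cell side 2**levels); pairs newly matched at level l get weight
    2**-l. Points are assumed to lie in [0, 2**levels) per dimension."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    score, prev = 0.0, 0
    for l in range(levels + 1):
        cell = 2 ** l
        ha, hb = {}, {}
        for hist, pts in ((ha, A), (hb, B)):
            for p in pts:
                key = tuple((p // cell).astype(int))
                hist[key] = hist.get(key, 0) + 1
        inter = sum(min(c, hb.get(k, 0)) for k, c in ha.items())
        score += (inter - prev) * 2.0 ** (-l)  # only newly matched pairs
        prev = inter
    return score
```

An exact match scores 2.0 for two points; shifting one point so it matches only at a coarser level lowers the score.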
69. Comments
The results are not yet good enough
The method seems promising, and we will
improve it in future work
This mutual information can be exploited
further
71. About the related work
Existing methods either need training examples at
all rotation angles, which is not robust,
or are very time-consuming
72. Related work
Two neural networks deal with each
sub-image
One estimates the sub-image's
rotation angle
The sub-image is rotated so that
the potential face is upright
The other decides whether an
object exists
H. Rowley, S. Baluja, and T. Kanade. Rotation invariant neural
network-based face detection. In CVPR, pages 38–44, 1998.
73. Related work
Local feature methods
Infer rotation from the gradient direction of
SIFT features
Graph-based methods
Model the object as a graph and deform the graph
Detection of Object With In-plane Rotations
74. Idea
Suppose we have a codebook trained from
objects without in-plane rotations
75. Idea
With proof omitted:
Given two or more votes from different
keypoints, there exists one and only one rotation
angle that minimizes the difference of the
voted centers
If all votes are good estimates, this angle can
be expected to be a good estimate of the object's
rotation angle
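The angle-estimation idea can be sketched as a grid search over rotation angles, keeping the one that minimizes the spread of the voted centers; the function names and the synthetic setup below are ours:

```python
import numpy as np

def estimate_rotation(keypoints, offsets, n_angles=360):
    """Each keypoint at position p votes for centre p + R(theta) @ o,
    where o is its centre offset learned on upright training objects.
    Return the angle that minimises the total variance (spread) of
    the voted centres."""
    P = np.asarray(keypoints, dtype=float)
    O = np.asarray(offsets, dtype=float)
    best_theta, best_spread = 0.0, np.inf
    for theta in np.linspace(0, 2 * np.pi, n_angles, endpoint=False):
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])
        centers = P + O @ R.T               # rotate each learned offset
        spread = centers.var(axis=0).sum()  # votes agree -> small spread
        if spread < best_spread:
            best_theta, best_spread = theta, spread
    return best_theta

# Synthetic object rotated by 90 degrees about its centre.
true = np.pi / 2
c, s = np.cos(true), np.sin(true)
R = np.array([[c, -s], [s, c]])
offsets = np.array([(2.0, 0.0), (0.0, 3.0), (-2.0, -1.0)])
center = np.array([10.0, 10.0])
keypoints = center - offsets @ R.T  # parts placed around the rotated object
theta = estimate_rotation(keypoints, offsets)
```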
76. To do
Propose a method capable of detecting
objects with in-plane rotations
Evaluate the robustness of the proposed method
77. Conclusion
Motion information combined with
appearance information:
Distinguishes target objects from noisy objects
Enhances the detection rate
Mutual information encoded among the
image features of the same object:
We expect to obtain a method capable of utilizing
positional and overall information
We expect to obtain a method capable of detecting
objects with in-plane rotations
78. Publication list
[1] Zhipeng Wang, Jinshi Cui, Hongbin Zha, Masataka Kagesawa, and Katsushi Ikeuchi,
"Object detection by common fate Hough transform," in Proc. First Asian Conference
on Pattern Recognition (ACPR), pp. 613-617, Nov. 2011.
[2] Zhipeng Wang, Masataka Kagesawa, Shintaro Ono, Atsuhiko Banno, and Katsushi
Ikeuchi, "Emergency light detection in tunnel environment: An efficient method,"
in Proc. First Asian Conference on Pattern Recognition (ACPR), pp. 628-632, Nov. 2011.
[3] Zhipeng Wang, Jinshi Cui, Hongbin Zha, Masataka Kagesawa, Shintaro Ono, and
Katsushi Ikeuchi, "Detection by Motion-based Grouping of Object Parts,"
International Journal of ITS Research (submitted).
[4] Zhipeng Wang, Masataka Kagesawa, Shintaro Ono, Atsuhiko Banno, Takeshi Oishi,
and Katsushi Ikeuchi, "Detection of Emergency Telephone Indicators Using Infrared
Cameras for Vehicle Positioning in Tunnel Environment," ITS World Congress 2013
(submitted).