1. Combining Motion Information with Appearance Information and Utilizing Mutual Information Encoded Among Image Features of the Same Object for Better Detection Performance. Zhipeng Wang. Directed by: Prof. Katsushi Ikeuchi, Dr. Masataka Kegesawa, and Dr. Shintaro Ono. Computer Vision Laboratory, The University of Tokyo. (Japanese title: Combining motion information with appearance information and utilizing mutual information encoded among image features of the same object, for better detection performance.)
2. Outline  Background and state-of-the-art detection methods  Our work (goal, challenges, related work, method, results, contribution, and conclusion)  Sections 1, 2: utilizing motion information  Sections 3, 4: utilizing mutual information  Conclusion  Publication list
3. Background  Object detection from images:  One of the basic perceptual skills in humans  Plays an important role in the machine vision area
4. Two primary categories  Sliding-window methods  Use classifiers to decide whether each sub-image contains a target object  Hough-transform-based methods  Infer object status in a bottom-up manner from detected image features
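To make the sliding-window idea concrete, here is a minimal sketch, assuming a generic scoring callable classify_window (hypothetical, not a classifier from these slides): the image is scanned with a fixed-size window and every sub-image is scored independently.

```python
import numpy as np

def sliding_window_detect(image, classify_window, win=(64, 64), stride=16, thresh=0.5):
    """Scan `image` with a fixed-size window and keep windows the classifier accepts.

    `classify_window` is a hypothetical callable returning a confidence in [0, 1].
    """
    H, W = image.shape[:2]
    wh, ww = win
    detections = []
    for y in range(0, H - wh + 1, stride):
        for x in range(0, W - ww + 1, stride):
            sub = image[y:y + wh, x:x + ww]
            score = classify_window(sub)
            if score >= thresh:
                detections.append((x, y, ww, wh, score))
    return detections
```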
5. Better detection performance  Representative methods  Image features: HOG, SIFT  Classifiers  Machine learning methods: deep learning  Efficient search techniques  Branch and bound [1]. [1] Lampert, C.H.; Blaschko, M.B.; Hofmann, T., "Efficient Subwindow Search: A Branch and Bound Framework for Object Localization," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 12, pp. 2129-2142, Dec. 2009.
6. Our work towards better detection performance  Combining motion information with appearance information  Using mutual information encoded among the features of the same object  By using extra information, we expect to achieve better detection performance
7. Our work  Combining motion information with appearance information  1. Common Fate Hough Transform  2. Emergency Telephone Indicator Detection in Tunnel Environment
8. 1. Common Fate Hough Transform
9. Goal and challenges  Where is each object in the scene, and what is its label  Challenges:  Near objects  Similar objects of different classes. Common Fate Hough Transform
10. Related work  Motion for object detection  Background subtraction  Optical flow [1]  Combination of appearance and motion information  Mainly for tracking. [1] G. Brostow and R. Cipolla. Unsupervised Bayesian detection of independent motion in crowds. In CVPR, pages I: 594–601, 2006.
11. Hough transform  On the current image:  Detect keypoints and extract image features  Find the best-matched codes from a trained codebook  Each keypoint votes for an object center and label. B. Leibe and B. Schiele. Interleaved object categorization and segmentation. In BMVC, pages 759–768, 2003.
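As a rough illustration of the voting step (a sketch under assumptions, not the implementation behind these slides; the codebook layout and feature extractor are invented for illustration), each keypoint descriptor is matched to its nearest codebook entry, and the offsets stored with that entry cast votes for the object center and class label.

```python
import numpy as np

def cast_votes(keypoints, descriptors, codebook):
    """Hough-style voting sketch.

    keypoints:   (N, 2) array of keypoint positions (x, y)
    descriptors: (N, D) array of local image features
    codebook:    list of dicts, each with a prototype descriptor 'code' (D,),
                 a class 'label', and 'offsets', an (M, 2) array of offsets from
                 the keypoint to the object center seen in training
                 (this layout is an assumption made for the sketch).
    Returns a list of (voted_center_xy, label, keypoint_index) tuples.
    """
    votes = []
    codes = np.stack([entry["code"] for entry in codebook])
    for i, (pt, desc) in enumerate(zip(keypoints, descriptors)):
        best = np.argmin(np.linalg.norm(codes - desc, axis=1))  # best-matched code
        entry = codebook[best]
        for offset in entry["offsets"]:
            votes.append((pt + offset, entry["label"], i))
    return votes
```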
12. Hough transform  Discrete to continuous  Blurring each vote  Summing up all votes from all object parts. Common Fate Hough Transform
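One common way to turn the discrete votes into a continuous confidence map is to place a small Gaussian at each voted center and sum the contributions; the sketch below does exactly that, with the per-vote weight and kernel width chosen arbitrarily for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def hough_confidence_map(votes, image_shape, sigma=3.0):
    """Accumulate votes into a blurred (continuous) Hough confidence map."""
    H, W = image_shape
    acc = np.zeros((H, W), dtype=np.float64)
    for (x, y), label, kp_idx in votes:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= xi < W and 0 <= yi < H:
            acc[yi, xi] += 1.0          # unit weight per vote (illustrative);
                                        # a separate map per label could be kept analogously
    return gaussian_filter(acc, sigma)  # blur each vote and sum all contributions
```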
13. The common fate principle  One of the Gestalt laws  Elements with the same motion tend to be perceived as one unit. Common Fate Hough Transform
14. Common fate Hough transform  Motion analysis [1]  Keypoint tracking (KLT tracker)  Clustering the trajectories. [1] G. Brostow and R. Cipolla. Unsupervised Bayesian detection of independent motion in crowds. In CVPR, pages I: 594–601, 2006.
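The motion analysis can be sketched with OpenCV's pyramidal Lucas-Kanade (KLT) tracker; the trajectory-clustering step is reduced here to a simple k-means over displacement vectors, which merely stands in for the unsupervised grouping of Brostow and Cipolla.

```python
import numpy as np
import cv2

def track_and_group(prev_gray, curr_gray, n_groups=3):
    """Track keypoints with the KLT tracker and group them by similar motion."""
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                  qualityLevel=0.01, minDistance=5)
    nxt, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    ok = status.ravel() == 1
    p0, p1 = pts[ok].reshape(-1, 2), nxt[ok].reshape(-1, 2)
    flow = (p1 - p0).astype(np.float32)            # per-keypoint displacement
    # Stand-in for trajectory clustering: k-means on the displacement vectors.
    _ret, labels, _centers = cv2.kmeans(
        flow, n_groups, None,
        (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 0.1),
        5, cv2.KMEANS_PP_CENTERS)
    return p1, labels.ravel()                      # tracked points and motion-group ids
```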
15. Common fate Hough transform  The weight of each vote is related to the support it gains from the motion group
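The slide does not spell out the exact weighting, so the snippet below shows one plausible, assumed scheme: a vote is up-weighted when many keypoints from the same motion group vote near the same center.

```python
import numpy as np

def weight_votes_by_motion_support(votes, motion_group, radius=10.0):
    """Assumed weighting: a vote's weight grows with the number of same-group
    votes that land within `radius` of its voted center.

    votes:        list of ((x, y), label, keypoint_index) tuples
    motion_group: mapping from keypoint_index to motion-group id
    """
    centers = np.array([v[0] for v in votes], dtype=np.float64)
    groups = np.array([motion_group[v[2]] for v in votes])
    weights = np.ones(len(votes))
    for i in range(len(votes)):
        same_group = groups == groups[i]
        near = np.linalg.norm(centers - centers[i], axis=1) <= radius
        weights[i] = np.count_nonzero(same_group & near)  # support from the motion group
    return weights
```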
16. Common fate Hough Transform
17. Common fate Hough transform
18. Common Fate Hough Transform  Inference  1. Find the highest peak in the confidence space  2. Exclude the object parts belonging to the highest peak  3. If the Hough image is not empty, go to 1.
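The greedy inference loop can be sketched as follows; the peak test, the exclusion rule, and the stopping threshold are simplified assumptions rather than the exact procedure in the thesis.

```python
import numpy as np

def greedy_inference(votes, image_shape, build_map, assign_radius=20.0, min_peak=1.0):
    """Sketch of the iterative inference: peak -> exclude supporting parts -> repeat.

    build_map(votes, image_shape) should return a 2-D confidence map
    (e.g., hough_confidence_map from the earlier sketch).
    """
    detections = []
    remaining = list(votes)
    while remaining:
        conf = build_map(remaining, image_shape)
        peak_val = conf.max()
        if peak_val < min_peak:                      # "the Hough image is empty"
            break
        py, px = np.unravel_index(np.argmax(conf), conf.shape)
        detections.append((px, py, peak_val))
        # Exclude the object parts (votes) that supported this peak.
        remaining = [v for v in remaining
                     if np.hypot(v[0][0] - px, v[0][1] - py) > assign_radius]
    return detections
```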
19–26. Inference (step-by-step illustration). Common Fate Hough Transform
27. Experimental results  Dataset  720×576  401 continuous frames  633 ground-truth bounding boxes of different classes on 79 frames. Common Fate Hough Transform
28–37. Results (shown on example frames). Common Fate Hough Transform
38. Result comparison. [1] O. Barinova, V. Lempitsky, and P. Kohli. On detection of multiple object instances using Hough transforms. In CVPR, pages 2233–2240, 2010.
39. Contribution  A detection method with better detection performance than the CVPR'10 method  A successful combination of motion and appearance information for detection  A successful attempt to incorporate a human-perception rule into a detection method in computer vision. Common Fate Hough Transform
40. Conclusion  Motion largely improves detection results when effectively combined with appearance information  The method is not efficient, so we next propose a more efficient, application-specific method (Section 2). Common Fate Hough Transform
41. 2. Detection of Emergency Telephone Indicators by Fusion of Motion Information with Appearance Information
42. Goal. Emergency Telephone Indicator Detection  For vehicle positioning  Detecting emergency telephone indicators  Installed at known locations  Infrared cameras (far-infrared) installed on the vehicle's roof
43. Challenges  Challenges:  Noisy objects  Real-time requirement. Emergency Telephone Indicator Detection
44. Method  A two-step method  Detect, verify, and cluster keypoints  Verify keypoint clusters by appearance and motion information. Emergency Telephone Indicator Detection
45. Pipeline: Keypoint Detection → Keypoint Verification → Keypoint Clustering → Keypoint Cluster Verification by Appearance → Keypoint Cluster Tracking → Keypoint Cluster Verification by Motion
46. Pipeline  Detect keypoints  Uniform sampling  Intensity thresholds: 160–190 (on a 0–255 scale). Emergency Telephone Indicator Detection
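A minimal sketch of this stage, assuming an 8-bit far-infrared frame: sample the image on a regular grid and keep the samples whose intensity falls inside the 160–190 band.

```python
import numpy as np

def detect_keypoints(frame, step=4, lo=160, hi=190):
    """Uniformly sample the frame and keep pixels in the target intensity band."""
    ys, xs = np.mgrid[0:frame.shape[0]:step, 0:frame.shape[1]:step]
    vals = frame[ys, xs]
    mask = (vals >= lo) & (vals <= hi)
    return np.column_stack([xs[mask], ys[mask]])   # (N, 2) keypoint coordinates (x, y)
```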
47. Pipeline  Verify keypoints  Intensity histogram  Building a mixture model using k-means. Emergency Telephone Indicator Detection
48. Pipeline  Cluster keypoints  Build a minimum spanning tree (Euclidean distance)  Split the tree. Emergency Telephone Indicator Detection
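The clustering stage can be sketched with SciPy: build a minimum spanning tree over the keypoints under Euclidean distance, then split it by removing edges longer than a threshold (the threshold value here is an assumption).

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

def cluster_keypoints(points, max_edge=15.0):
    """Cluster keypoints by building an MST and cutting its long edges."""
    dist = squareform(pdist(points))               # full Euclidean distance matrix
    mst = minimum_spanning_tree(dist).toarray()
    mst[mst > max_edge] = 0                        # split the tree at long edges
    n_clusters, labels = connected_components(mst != 0, directed=False)
    return labels                                   # cluster id per keypoint
```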
49. Pipeline  Verify keypoint clusters by appearance  AdaBoost classifier  Trained on positive and negative examples  Intensity histogram features. Emergency Telephone Indicator Detection
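Appearance verification can be sketched with scikit-learn's AdaBoostClassifier trained on intensity histograms of positive and negative example patches; the feature layout and hyper-parameters are assumptions, not the trained system from the slides.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def intensity_histogram(patch, bins=32):
    """Normalized grey-level histogram used as the appearance feature."""
    hist, _ = np.histogram(patch, bins=bins, range=(0, 255))
    return hist / max(hist.sum(), 1)

def train_appearance_verifier(pos_patches, neg_patches):
    """pos_patches / neg_patches: lists of 2-D grey-level image patches."""
    X = np.array([intensity_histogram(p) for p in pos_patches + neg_patches])
    y = np.array([1] * len(pos_patches) + [0] * len(neg_patches))
    return AdaBoostClassifier(n_estimators=50).fit(X, y)

def verify_cluster(clf, patch):
    """Accept the keypoint-cluster patch if the classifier labels it positive."""
    return clf.predict([intensity_histogram(patch)])[0] == 1
```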
50. Pipeline  Keypoint cluster tracking  No occlusion is assumed  Connect each detection response to its nearest trajectory (by appearance, scale, and time gap). Emergency Telephone Indicator Detection
51. Pipeline  Keypoint cluster verification by motion  Fit each trajectory with a straight line  Use the significance of the fit as the criterion. Emergency Telephone Indicator Detection
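Motion verification can be sketched by least-squares fitting a line to each trajectory and thresholding a goodness-of-fit measure; the R²-style criterion below is an assumed stand-in for the "significance of the fitting" mentioned on the slide.

```python
import numpy as np

def trajectory_is_linear(trajectory, min_r2=0.95, min_length=5):
    """Accept a trajectory if its (x, y) points are well explained by a straight line."""
    traj = np.asarray(trajectory, dtype=np.float64)
    if len(traj) < min_length:
        return False
    x, y = traj[:, 0], traj[:, 1]
    total = max(np.var(y), 1e-12)
    if np.ptp(x) < 1e-6:
        # Nearly vertical trajectory: a vertical line fits it (almost) perfectly.
        residual = np.var(x)
    else:
        a, b = np.polyfit(x, y, 1)                 # fit y = a*x + b
        residual = np.mean((y - (a * x + b)) ** 2)
    r2 = 1.0 - residual / total
    return r2 >= min_r2
```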
52. Benefits of the pipeline  The system is real-time  More time-consuming steps deal with fewer instances  Original image: 10^5 pixels  Keypoint detection: 10^4 points  Keypoint verification: 10^3 keypoints  Keypoint clustering: 10^2 keypoints  Fewer than 10 keypoint clusters. Emergency Telephone Indicator Detection
53. Experimental results  Real-time (34 fps)  On a laptop with an Intel Core2 Duo 2.8 GHz processor
54. Experimental results. Emergency Telephone Indicator Detection
55. Contribution  A specialized real-time detection method for cluttered data  A successful combination of motion and appearance information for detection  The significance of this research: a potentially applicable solution for vehicle positioning in tunnel environments. Emergency Telephone Indicator Detection
56. Conclusion  Motion information plays an important role in the detection process. Emergency Telephone Indicator Detection
57. Future work  We are collecting new data with a new camera and plan to improve the pipeline on the newly collected data. Emergency Telephone Indicator Detection
58.  Two methods utilizing motion information are proposed  When motion information is not available, we turn to methods that exploit the mutual information among image features (Sections 3 and 4)
59. Our work  Utilizing Mutual Information Encoded Among Image Features of the Same Object  3. Pyramid Match Score for Object Detection  4. Detection of Objects with In-plane Rotations
60. 3. Pyramid Match Score for Object Detection
61. Motivation  Sliding-window-based methods often ignore the positional information of each image feature (cf. [1])  Methods based on the Hough transform ignore the overall information of each object  Intention of this method: an efficient way to utilize the positional information of each image feature and the overall information gained from the whole object. [1] Lazebnik, S.; Schmid, C.; Ponce, J., "Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories," IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 2169-2178, 2006.
62. Pyramid Match  Pyramid Match is a method to find the best one-to-one match between two sets of points  A 2D example is given here  The real data are 14D  Appearance information: 12D  Position information: 2D
63. Pyramid match  Divide the space from fine to coarse  If two points fall into the same grid cell at some level, they are considered matched and are excluded from further matching
64. Pyramid match  The distance between two matched points does not need to be calculated  It is assigned according to the size of the grid cell at which the match is found (in the illustrated example: 1/4 × 2, 1/2 × 2, 1 × 2)
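A compact sketch of the matching mechanics in 2D (unit-square coordinates, three levels): points are binned on finer-to-coarser grids, new matches appearing at each level are counted, and each match is charged a distance equal to the cell size at which it first appears. This is a simplified histogram-intersection approximation in the spirit of the pyramid match kernel, not the exact scoring used in the slides.

```python
import numpy as np
from collections import Counter

def _grid_counts(points, cells):
    """Histogram of 2-D points over a cells x cells grid on the unit square."""
    idx = np.clip((np.asarray(points, dtype=float) * cells).astype(int), 0, cells - 1)
    return Counter(map(tuple, idx))

def pyramid_match_cost(set_a, set_b, levels=3):
    """Approximate matching cost between two point sets in [0, 1]^2.

    Level l uses 2 ** (levels - 1 - l) cells per side (fine -> coarse); two
    points sharing a cell are considered matched, and a match first found at
    level l is charged a distance equal to that level's cell size.
    """
    cost, matched_prev = 0.0, 0
    for l in range(levels):
        cells = 2 ** (levels - 1 - l)
        ca, cb = _grid_counts(set_a, cells), _grid_counts(set_b, cells)
        matched = sum(min(ca[k], cb[k]) for k in ca)      # histogram intersection
        cost += (matched - matched_prev) * (1.0 / cells)  # new matches pay the cell size
        matched_prev = matched
    return cost
```

The same idea extends to the 14-D points described on the next slides; the cost between a sub-image's point set and the "super template" can then serve as the detection confidence.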
65. Pyramid Match Score  Pyramid match between two sets  Each sub-image is considered as a 14D point set (appearance: 12, position: 2)  A "super template" contains all 14D points of the training images
66. Pyramid Match Score  Find a near-best match between each sub-image of a test image and the "super template"  The match score (distance) is used as the confidence that the sub-image contains a target object
67. Results on UIUC cars
68. Results: detection rate vs. precision curves (legend: 3, 10, 7, av, back)
69. Comments  The results are not yet good enough  The method seems promising, and we will improve it in future work  This mutual information can be exploited further
70. 4. Detection of Objects with In-plane Rotations
71. About the related work  Using training examples of all rotation directions is either not robust or very time-consuming
72. Related work  Two neural networks deal with each sub-image  One estimates the sub-image's rotation angle  The sub-image is rotated so that the potential face is upright  The other decides whether an object exists. H. Rowley, S. Baluja, and T. Kanade. Rotation invariant neural network-based face detection. In CVPR, pages 38–44, 1998.
73. Related work  Local-feature methods  Infer rotation from the gradient direction of SIFT features  Graph-based methods  Consider the object as a graph and transform the graph. Detection of Objects with In-plane Rotations
74. Idea  Suppose we have a codebook trained from objects without in-plane rotations. Detection of Objects with In-plane Rotations
75. Idea  With the proof omitted: for two or more votes from different keypoints, there exists one and only one rotation angle that minimizes the disagreement of the voted centers  If all votes are good estimates, this angle can be expected to be a good estimate of the object's rotation angle. Detection of Objects with In-plane Rotations
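The idea can be sketched numerically: each keypoint votes with an offset learned on upright objects, the offsets are rotated for a candidate in-plane angle before voting, and the angle that makes the voted centers agree most (smallest spread) is kept. The grid search below is an illustrative stand-in for the closed-form result whose proof the slide omits.

```python
import numpy as np

def estimate_rotation(keypoints, offsets,
                      angles=np.linspace(0, 2 * np.pi, 360, endpoint=False)):
    """Find the in-plane rotation that minimizes the spread of the voted centers.

    keypoints: (N, 2) keypoint positions
    offsets:   (N, 2) center offsets learned from upright (non-rotated) objects
    """
    keypoints = np.asarray(keypoints, dtype=float)
    offsets = np.asarray(offsets, dtype=float)
    best_angle, best_spread = None, np.inf
    for theta in angles:
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])
        centers = keypoints + offsets @ R.T        # votes under this rotation hypothesis
        spread = centers.var(axis=0).sum()         # disagreement of the voted centers
        if spread < best_spread:
            best_angle, best_spread = theta, spread
    return best_angle, best_spread
```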
76. To do  Propose a method capable of detecting objects with in-plane rotations  Evaluate the robustness of the approach. Detection of Objects with In-plane Rotations
77. Conclusion  Motion information combined with appearance information  Distinguishes target objects from noisy objects  Enhances the detection rate  Mutual information encoded among the image features of the same object  We expect to obtain a method capable of utilizing positional and overall information  We expect to obtain a method capable of detecting objects with in-plane rotations
78. Publication list  [1] Zhipeng Wang; Jinshi Cui; Hongbin Zha; Kegesawa, M.; Ikeuchi, K., "Object detection by common fate Hough transform," Pattern Recognition (ACPR), 2011 First Asian Conference on, pp. 613-617, Nov. 2011  [2] Zhipeng Wang; Kagesawa, M.; Ono, S.; Banno, A.; Ikeuchi, K., "Emergency light detection in tunnel environment: An efficient method," Pattern Recognition (ACPR), 2011 First Asian Conference on, pp. 628-632, Nov. 2011  [3] Zhipeng Wang; Jinshi Cui; Hongbin Zha; Masataka Kegesawa; Shintaro Ono; Katsushi Ikeuchi, "Detection by Motion-based Grouping of Object Parts," International Journal of ITS Research (submitted)  [4] Zhipeng Wang; Masataka Kegesawa; Shintaro Ono; Atsuhiko Banno; Takeshi Oishi; Katsushi Ikeuchi, "Detection of Emergency Telephone Indicators Using Infrared Cameras for Vehicle Positioning in Tunnel Environment," ITS World Congress 2013 (submitted)
79. Thank you very much!