Real-Time Computer Vision
Microsoft Computer Vision School
Vincent Lepetit - CVLab - EPFL (Lausanne, Switzerland)

demo

applications

• How the demo works (including Randomized Trees);
• More recent work.

Background
• 3D world to 2D images (projection matrix, internal parameters, external parameters, homography, ...);
• Robust estimation (non-linear least-squares, RANSAC, robust estimators, ...);
• Feature point matching (affine region detectors, SIFT, ...).
From the 3D World to a 2D Image
What is the relation between the 3D coordinates of a point M and its correspondent m in the image captured by the camera?

Perspective Projection
The image formation is modeled as a perspective projection, which is realistic for standard cameras: the rays passing through a 3D point M and its correspondent m in the image all intersect at a single point C, the camera center.

Expressing M in the Camera Coordinate System
Step 1: Express the coordinates of M in the camera coordinate system as Mcam. This transformation corresponds to a Euclidean displacement (a rotation plus a translation):
$M_{cam} = R M + T$
where R is a 3x3 rotation matrix and T is a 3-vector.
Homogeneous Coordinates
Let's replace M by the 4-vector homogeneous vector $\tilde{M}$: just add a 1 as the fourth coordinate. Now the Euclidean displacement can be expressed as a linear transformation instead of an affine one:
$M_{cam} = R M + T \;\rightarrow\; M_{cam} = (R \mid T)\, \tilde{M}$
where $(R \mid T)$ is a 3x4 matrix.
Projection
Computation of the coordinates of m in the image plane, from Mcam (expressed in the camera coordinate system). Simply use Thales' theorem:
$\frac{m_X}{f} = \frac{X}{Z} \;\rightarrow\; m_X = f \frac{X}{Z}$

From Projection to Image
Coordinates of m in pixels? With pixel scales $k_u, k_v$ and principal point $(u_0, v_0)$:
$m_X = f \frac{X}{Z},\quad m_Y = f \frac{Y}{Z}$
$m_u = u_0 + k_u m_X,\quad m_v = v_0 + k_v m_Y$

Putting the perspective projection and the transformation into pixel coordinates together, in matrix form:
$\begin{pmatrix} u \\ v \\ w \end{pmatrix} = \begin{pmatrix} k_u f & 0 & u_0 \\ 0 & k_v f & v_0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}$
where $(u, v, w)^T$ defines m in homogeneous coordinates:
$m_u = \frac{u}{w} = u_0 + k_u f \frac{X}{Z}, \quad m_v = \frac{v}{w} = v_0 + k_v f \frac{Y}{Z}$
The Full Transformation
The two transformations are chained to form the full transformation from a 3D point in the world coordinate system to its projection in the image:
$\begin{pmatrix} u \\ v \\ w \end{pmatrix} = \begin{pmatrix} k_u f & 0 & u_0 \\ 0 & k_v f & v_0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} R_{11} & R_{12} & R_{13} & T_1 \\ R_{21} & R_{22} & R_{23} & T_2 \\ R_{31} & R_{32} & R_{33} & T_3 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} = \begin{pmatrix} P_{11} & P_{12} & P_{13} & P_{14} \\ P_{21} & P_{22} & P_{23} & P_{24} \\ P_{31} & P_{32} & P_{33} & P_{34} \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$
The product of the internal calibration matrix and the external calibration matrix is a 3x4 matrix called the "projection matrix". The projection matrix is defined up to a scale factor. R, T, and the products $k_u f$ and $k_v f$ can be extracted from the projection matrix.
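The chained transformation above can be sketched numerically; all calibration values below (f, ku, kv, u0, v0, R, T) and the point M are made-up numbers for illustration:

```python
import numpy as np

# Sketch of the full transformation: internal calibration x external calibration.
f, ku, kv, u0, v0 = 800.0, 1.0, 1.0, 320.0, 240.0
K = np.array([[ku * f, 0.0, u0],
              [0.0, kv * f, v0],
              [0.0, 0.0, 1.0]])                  # internal calibration matrix

R = np.eye(3)                                    # external parameters:
T = np.array([0.0, 0.0, 5.0])                    # rotation and translation

P = K @ np.hstack([R, T.reshape(3, 1)])          # 3x4 projection matrix

M = np.array([0.5, -0.25, 0.0, 1.0])             # 3D point, homogeneous coords
u, v, w = P @ M
m = (u / w, v / w)                               # projection in pixels
```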
Homography
For a planar object (Z = 0 in the object's coordinate system), the projection reduces to a 3x3 homography:
$m = P \tilde{M} = [P_1\; P_2\; P_3\; P_4] \begin{pmatrix} X \\ Y \\ 0 \\ 1 \end{pmatrix} = [P_1\; P_2\; P_4] \begin{pmatrix} X \\ Y \\ 1 \end{pmatrix} = H_{3\times3}\, \tilde{m}$
Computing a Projection Matrix or a Homography from Point Correspondences, by Solving a Linear System
Given correspondences $m \leftrightarrow m'$ with $m' = H m$, $m = (u, v, 1)^T$, $m' = (u', v', 1)^T$, each correspondence gives two equations that are linear in the entries of H:
$\begin{pmatrix} u & v & 1 & 0 & 0 & 0 & -u u' & -v u' & -u' \\ 0 & 0 & 0 & u & v & 1 & -u v' & -v v' & -v' \end{pmatrix} \begin{pmatrix} H_{11} \\ H_{12} \\ H_{13} \\ H_{21} \\ H_{22} \\ H_{23} \\ H_{31} \\ H_{32} \\ H_{33} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$
Computing a Projection Matrix or a Homography from Point Correspondences with a Non-Linear Optimization
• Non-linear least-squares minimization of a physical, meaningful error (the reprojection error, in pixels):
$\min_{R,T} \sum_i \mathrm{dist}^2\!\left(P_{R,T} M_i,\, m_i\right)$ or $\min_{R,T} \sum_i \mathrm{dist}^2\!\left(H_{R,T}\, m_i,\, m'_i\right)$
• Minimization algorithms: Gauss-Newton or Levenberg-Marquardt (very efficient).
A Look at the Reprojection Error
(Figure: reprojection error for a 1D camera under 2D translation; true camera position at (0, 0); 100 3D points taken at random in [400; 1000] x [-500; +500].)

Gaussian Noise on the Projections
White cross: true camera position; black cross: global minimum of the objective function. In that case, the global minimum of the objective function is close to the true camera pose.
What if there are Outliers?
(Figure: points M1..M4 project to measures m1..m4; m2 is an incorrect measure, an outlier.)

Gaussian Noise on the Projections + 20% Outliers
White cross: true camera position; black cross: global minimum of the objective function. The global minimum is now far from the true camera pose.

What Happened?
Bayesian interpretation:
$\arg\min_{R,T} \sum_i \mathrm{dist}^2\!\left(P_{R,T} M_i,\, m_i\right) = \arg\max_{R,T} \prod_i \mathcal{N}\!\left(m_i;\, P_{R,T} M_i,\, \sigma I\right)$
The error on the 2D point locations $m_i$ is assumed to have a Gaussian (Normal) distribution with identical covariance matrices $\sigma I$, and to be independent. This assumption is violated when $m_i$ is an outlier.
Robust Estimation
Idea: replace the Normal distribution by a more suitable distribution, or equivalently replace the least-squares estimator by a robust estimator or "M-estimator":
$\arg\min_{R,T} \sum_i \mathrm{dist}^2\!\left(P_{R,T} M_i,\, m_i\right) \;\rightarrow\; \arg\min_{R,T} \sum_i \rho\!\left(\mathrm{dist}\!\left(P_{R,T} M_i,\, m_i\right)\right)$

Example of an M-estimator: the Tukey Estimator
$\rho(x) = \begin{cases} \frac{c^2}{6}\left(1 - \left(1 - \left(\frac{x}{c}\right)^2\right)^3\right) & \text{if } |x| \le c \\ \frac{c^2}{6} & \text{if } |x| > c \end{cases}$
The Tukey estimator assumes the measures follow a distribution that is a mixture of:
• a Normal distribution, for the inliers,
• a uniform distribution, for the outliers.
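A minimal implementation of the Tukey ρ above; the cutoff value c below is an arbitrary illustrative choice:

```python
import numpy as np

def tukey_rho(x, c=4.685):
    """Tukey estimator of the slide; quadratic-like near 0, constant beyond c."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= c,
                    c**2 / 6.0 * (1.0 - (1.0 - (x / c)**2)**3),  # |x| <= c
                    c**2 / 6.0)                                  # flat for outliers
```

Residuals larger than c all get the same constant cost, so a gross outlier cannot dominate the objective the way it does with least squares.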
(Figure: Normal distribution (inliers) + uniform distribution (outliers) = mixture; taking -log(.) of each gives the least-squares cost and the Tukey estimator respectively.)
Gaussian Noise on the Projections + 20% Outliers + Tukey Estimator
White cross: true camera position; black cross: global minimum of the objective function. The global minimum is very close to the true camera pose. BUT:
• local minima;
• the objective function is flat where all the correspondences are considered outliers.

Gaussian Noise on the Projections + 50% Outliers + Tukey Estimator
Even more local minima. Numerical optimization can get trapped in a local minimum.
RANSAC

How to Optimize?
Idea: sample the space of solutions (here, the camera pose space), then run a numerical optimization from the best sampled pose.
Problem: exhaustive regular sampling is too expensive in 6 dimensions. Can we do a smarter sampling?
RANSAC: RANdom SAmple Consensus
Line fitting: the "throw out the worst residual" heuristic can fail (example from the original paper [Fischler81]): a single outlier drags the final least-squares solution away from the ideal line.

As before, we could do a regular sampling of the hypothesis space, but that would not be optimal.

Idea: generate hypotheses from subsets of the measurements. If a subset contains no gross errors, the estimated parameters (the hypothesis) are close to the true ones. Take several subsets at random, retain the best one.
The quality of a hypothesis is evaluated by the number of measures that lie close enough to the predicted line; we need to choose a threshold T to decide if a measure is close enough. RANSAC returns the best hypothesis, i.e. the hypothesis with the largest number of inliers:
$\sum_i \begin{cases} 1 & \text{if } \mathrm{dist}(m_i, \mathrm{line}(p)) \le T \\ 0 & \text{if } \mathrm{dist}(m_i, \mathrm{line}(p)) > T \end{cases}$
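The line-fitting RANSAC loop above can be sketched as follows; the subset size (2 points), iteration count, and threshold T are illustrative choices:

```python
import random

def ransac_line(points, n_iters=200, T=1.0):
    """RANSAC sketch for the line-fitting example: hypotheses from random
    2-point subsets, scored by the number of points within threshold T."""
    best_inliers = []
    for _ in range(n_iters):
        (x1, y1), (x2, y2) = random.sample(points, 2)
        if (x1, y1) == (x2, y2):
            continue                     # degenerate sample
        # normalized line a*x + b*y + c = 0 through the two points
        a, b = y2 - y1, x1 - x2
        c = -(a * x1 + b * y1)
        norm = (a * a + b * b) ** 0.5
        a, b, c = a / norm, b / norm, c / norm
        inliers = [p for p in points if abs(a * p[0] + b * p[1] + c) <= T]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers       # best hypothesis so far
    return best_inliers
```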
RANSAC for Homographies
To apply RANSAC to homography estimation, we need a way to compute a homography from a subset of measurements: four correspondences give eight linear equations, enough to solve the linear system above for H. Since RANSAC only provides a solution estimated with a limited number of data, it must be followed by a robust minimization to refine the solution.
How to Get the Correspondences?
• Extract feature points / keypoints / regions (Harris corner detector, extrema of the Laplacian, affine region detectors, ...);
• Standard approach: match them based on Euclidean distances between descriptors such as SIFT, SURF, ...
Affine Region Detectors
(Figures: Hessian-Affine detector; MSER detector.)

Affine Normalization
Warp the two regions by $M_1^{1/2}$ and $M_2^{1/2}$; we still have to correct for the orientation!
Select Canonical Orientation
• Create a histogram of local gradient directions, computed over the image patch (binned over [0, 2π]);
• Each gradient contributes its norm, weighted by its distance to the patch center;
• Assign the canonical orientation at the peak of the smoothed histogram.
Description Vector
(Figure: which description vector should we compute for a patch, so that it can be compared with the database?)

SIFT Description Vector
Made of local histograms of gradients. In practice: 8 orientations x 4 x 4 histograms = a 128-dimensional vector, normalised to be robust to light changes.
Matching Regions
(Figure: each region's description vector is compared against the database of description vectors.)

Matching: Approximate Nearest Neighbour
Best-Bin-First: approximate nearest-neighbour search in a k-d tree.
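As a reference point for the approximate search, the exact brute-force version can be sketched as below; the distance-ratio test is one common way (an assumption here, not stated on the slide) to discard ambiguous matches:

```python
import numpy as np

def match_descriptors(desc1, desc2, ratio=0.8):
    """Exact brute-force nearest-neighbour matching on Euclidean distances
    (the Best-Bin-First k-d tree of the slide approximates this search),
    with a distance-ratio test to discard ambiguous matches."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        order = np.argsort(dists)
        nearest, second = order[0], order[1]
        if dists[nearest] < ratio * dists[second]:  # unambiguous match only
            matches.append((i, int(nearest)))
    return matches
```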
Keypoint Matching
The standard approach is a particular case of classification: the database search is a nearest-neighbor classification, with a pre-processing step to make the actual classification easier. Idea: let's try another classification method!

One Class per Keypoint
One class per keypoint: the set of the keypoint's possible appearances under various perspective, lighting, noise...

Training Phase / Run-Time
During training, patches from class 1, class 2, ... are used to build a classifier; at run-time, the classifier assigns one of these classes to each incoming patch.
Which Classifier?
We want a classifier that:
• can handle many classes;
• is very fast;
• has reasonable recognition performance (a very high recognition rate is not a necessary requirement).
Candidates:
• Randomized Trees [Amit & Geman, 1997];
• Random Forests [Breiman, 2001].
An (Ideal) Single Tree
Each internal node applies a binary test; each leaf stores a class number.

How to Build the Tree?
Given a training set S, the binary test at each node is found by minimizing the entropy after the test:
$\arg\min_{\text{test}} \frac{|S_{\text{left}}|}{|S|}\,\mathrm{Entropy}(S_{\text{left}}) + \frac{|S_{\text{right}}|}{|S|}\,\mathrm{Entropy}(S_{\text{right}})$
Problem: this quickly runs out of training samples for the deeper tests.
Idea: Use Several Sub-Optimal Trees
Each tree is trained with a random subset of the training set. The leaves contain the probabilities over the classes, computed from the training set.

Classification with Several Sub-Optimal Trees
The test sample is dropped into each tree, and the probabilities in the leaves it reaches are averaged (e.g. with three trees, $\frac{1}{3}(p_1 + p_2 + p_3)$).

Visual Interpretation
Each tree partitions the space in a different way and computes the probability of each class for each cell of the partition. Combining the trees gives a fine partition with a better estimate of the class probabilities.
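The scheme above (random subsets, leaf probabilities, averaging) can be sketched with a toy forest of depth-1 "stumps" on synthetic 2D data; everything here (the data, the stump structure, the subset sizes) is a simplified stand-in for the real patch classifier:

```python
import random

random.seed(1)

def train_stump(samples):
    """Pick a random test (coordinate <= threshold) and fill the two leaves
    with the class probabilities of the samples that fall into them."""
    axis = random.randint(0, 1)
    thresh = random.random()
    counts = {0: [0, 0], 1: [0, 0]}              # leaf -> per-class counts
    for point, label in samples:
        counts[int(point[axis] <= thresh)][label] += 1
    probs = {leaf: [n / (sum(c) or 1) for n in c] for leaf, c in counts.items()}
    return axis, thresh, probs

def classify(forest, point):
    """Drop the sample into every tree and average the leaf probabilities."""
    p0 = p1 = 0.0
    for axis, thresh, probs in forest:
        leaf = int(point[axis] <= thresh)
        p0 += probs[leaf][0]
        p1 += probs[leaf][1]
    return [p0 / len(forest), p1 / len(forest)]

# Synthetic training set: class 0 lives at x < 0.5, class 1 at x > 0.5.
data = [((random.random() * 0.5, random.random()), 0) for _ in range(50)] \
     + [((0.5 + random.random() * 0.5, random.random()), 1) for _ in range(50)]
forest = [train_stump(random.sample(data, 60)) for _ in range(30)]
```

Each stump alone is a poor classifier, but averaging thirty of them recovers the class boundary, which is the point of slides on sub-optimal trees.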
For Patches
Possible tests: compare the intensities of two pixels around the keypoint after Gaussian smoothing:
$f_i = \begin{cases} 1 & \text{if } \tilde{I}(m + dm_{i,1}) \le \tilde{I}(m + dm_{i,2}) \\ 0 & \text{otherwise} \end{cases}$
where $\tilde{I}$ is the image after Gaussian smoothing.
• Very efficient to compute;
• Invariant to light change by any increasing function.
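A sketch of this binary test; the repeated box blur below is a cheap stand-in for the Gaussian smoothing of the slide:

```python
import numpy as np

def smooth(image, iters=2):
    """Cheap stand-in for Gaussian smoothing: a repeated 3x3 box blur
    (two passes roughly approximate a small Gaussian kernel)."""
    img = image.astype(float)
    for _ in range(iters):
        padded = np.pad(img, 1, mode='edge')
        img = sum(padded[di:di + img.shape[0], dj:dj + img.shape[1]]
                  for di in range(3) for dj in range(3)) / 9.0
    return img

def binary_test(smoothed, m, dm1, dm2):
    """f = 1 if I~(m + dm1) <= I~(m + dm2), 0 otherwise."""
    p1 = (m[0] + dm1[0], m[1] + dm1[1])
    p2 = (m[0] + dm2[0], m[1] + dm2[1])
    return int(smoothed[p1] <= smoothed[p2])

image = np.zeros((16, 16))
image[:, 8:] = 255.0                     # dark left half, bright right half
smoothed = smooth(image)
```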
Results

Randomized Trees (and Random Ferns) applied to image patches are becoming a powerful tool for Computer Vision.

[Shotton et al, CVPR'11]
Used to infer body parts in the Kinect body tracking system. The tests rely on the depth map.
Tests in [Shotton et al, CVPR'11]
Classes are the body parts. The goal is to label each pixel with the label of the part it belongs to. Tests compare the depth of two pixels around the considered pixel m; the displacements are normalized by the depth of the considered pixel, for invariance:
$f_i(m) = \begin{cases} 1 & \text{if } \mathrm{depth}\!\left(m + \frac{dm_1}{\mathrm{depth}(m)}\right) \le \mathrm{depth}\!\left(m + \frac{dm_2}{\mathrm{depth}(m)}\right) \\ 0 & \text{otherwise} \end{cases}$

3D Pose Estimation
Mean-Shift is used to find the joint locations from the body parts.

Training
Most of the training data is synthetic: "Training 3 trees to depth 20 from 1 million images takes about 1 day on a 1000 core cluster" [Shotton et al, CVPR'11]

A Subtree
(Figure: average of the patches that reach each node.)
[Gall and Lempitsky, CVPR'09; Barinova et al, CVPR'10]
Hough Forests for Object Detection:
• Random Forests used to make each patch vote for the object centroid;
• The tests compare the output of filters and histograms-of-gradients between 2 pixels;
• The leaves contain the displacement toward the object center.
Each patch votes for the object centroid; the votes from all patches are accumulated, and the final detection is taken from the accumulated votes.

Tests used in [Gall and Lempitsky, CVPR'09]
$f_i(m) = \begin{cases} 1 & \text{if } \mathrm{channel}_i(m + dm_1) < \mathrm{channel}_i(m + dm_2) + \tau \\ 0 & \text{otherwise} \end{cases}$
Channels: the 3 color channels, absolute values of the first and second derivatives of the image, and 9 channels from HoG (Histograms-of-Gradients).
[Bosch et al, ICCV'07]
Image Classification using Random Forests and Ferns [Bosch et al, ICCV'07]: use a sliding window to detect objects. Much faster than SVMs, with similar recognition performance.

Tests:
$f_i(m) = \begin{cases} 1 & \text{if } n^T x_m + b \le 0 \\ 0 & \text{otherwise} \end{cases}$
n and b: random vector and scalar. $x_m$: vector computed from a Pyramidal Histogram-of-Gradients.
[Kalal et al, CVPR'10]
TLD (aka Predator), for Track, Learn, Detect:
• Random Ferns used to speed up detection;
• Tests: 2-bit binary patterns;
• Trained online: the distributions in the leaves are updated online, using the incoming images.
Random Ferns: A Simplified Tree-Like Classifier

For Keypoint Recognition, We Can Use Random Tests!
(Figure: comparison of the recognition rates for 200 keypoints, as a function of the number of trees: tests selected by minimizing entropy vs. tests with random locations.)
We Can Use Random Tests
• For a small number of classes, we can try several tests and retain the best one according to some criterion;
• For a large number of classes, purely random tests perform nearly as well.

Why It Is Interesting
• Building the trees takes no time (we still have to estimate the posterior probabilities);
• It scales to many classes.
The Tree Structure Is Not Needed
(Figure: the tests f1, f2, f3, ... can be applied in a fixed order, independently of the answers to the previous tests.)

We are looking for
$\arg\max_i P(C = c_i \mid \mathrm{patch})$
If the patch can be represented by a set of binary features $f_1, f_2, \ldots, f_n$, this becomes $\arg\max_i P(C = c_i \mid f_1, f_2, \ldots, f_n)$.
Training
(Figures: each training patch is dropped through the ferns; the 0/1 outcomes of the binary tests form a leaf index, and the counter of that leaf for the patch's class is incremented.)
Training Results
Normalize:
$\sum_{f_1, f_2, \ldots, f_n} P(f_1, f_2, \ldots, f_n \mid C = c_i) = 1$
(The leaves of a fern are indexed by the binary strings 000, 001, ..., 111.)

Recognition
The test outcomes of the input patch select one leaf per fern, and the probabilities stored in these leaves are combined.
Subtlety with Normalization
$p_{\mathrm{leaf},\mathrm{class}} = \frac{N_{\mathrm{samples}}(\mathrm{leaf}, \mathrm{class})}{N_{\mathrm{samples}}(\mathrm{class})}$
would give a probability of exactly zero to every (leaf, class) pair never seen during training, so a regularization term is added.

Influence of N_regularization
$p_{\mathrm{leaf},\mathrm{class}} = \frac{N_{\mathrm{samples}}(\mathrm{leaf}, \mathrm{class}) + N_{\mathrm{regularization}}}{N_{\mathrm{samples}}(\mathrm{class}) + N_{\mathrm{leaves}} \cdot N_{\mathrm{regularization}}}$
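A one-line sketch of the regularized probabilities; the $N_{\mathrm{leaves}} \cdot N_{\mathrm{regularization}}$ term in the denominator is what keeps the probabilities over the leaves of a fern summing to one:

```python
def leaf_probability(n_samples_leaf_class, n_samples_class, n_leaves, n_reg=1.0):
    """Regularized leaf probability: without n_reg, a (leaf, class) pair never
    seen during training would get probability exactly zero and veto the class."""
    return (n_samples_leaf_class + n_reg) / (n_samples_class + n_leaves * n_reg)
```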
Implementation of Feature Point Recognition with Ferns
Pseudocode (H classes, M ferns, S tests per fern; P_leaf stores the leaf log-probabilities):

    for (int i = 0; i < H; i++) P[i] = 0.;
    for (int k = 0; k < M; k++) {
      int index = 0;                       // leaf index of fern k
      for (int j = 0; j < S; j++)
        index = 2 * index + test(k, j);    // outcome of the j-th binary test
      for (int i = 0; i < H; i++)
        P[i] += P_leaf[k][index][i];       // accumulate log-probabilities
    }
    // return the class i with the largest P[i]
Ferns versus SIFT
(Figure: number of inliers for SIFT vs. number of inliers for Ferns; each point corresponds to one image.)

Randomized Trees vs Ferns
Different combination strategies: average (RT) / product (Ferns).
(Figure: influence of the number of classes on the recognition rate.)
Memory and Computation Time
• Recognition time grows linearly with the number of Trees/Ferns and the number of classes;
• Required memory grows linearly with the number of Trees/Ferns, the number of leaves, and the number of classes.
Influence of the Number of Ferns
(Figure: recognition rate vs. number of Ferns, combined with a product.)

Number of Ferns / Number of Leaves / Memory / Computation Time
(Table: trade-offs between the number of Ferns, the number of leaves, memory, and computation time.)
Conclusions on Randomized Trees and Ferns
• Simple to implement, Ferns even simpler;
• Both very fast, but dumb: they need a lot of training samples.
We now have correspondences between a reference image of the object and the input image. Some correspondences are correct, some are not.
Computing a Homography from Point Correspondences, by Solving a Linear System
With $m' = H m$, writing the projective equality explicitly:
$u' = \frac{H_{11} u + H_{12} v + H_{13}}{H_{31} u + H_{32} v + H_{33}}, \quad v' = \frac{H_{21} u + H_{22} v + H_{23}}{H_{31} u + H_{32} v + H_{33}}$
Multiplying out the denominators gives, for each correspondence, two equations that are linear in the entries of H.
How to Solve this Linear System?
$B X = 0_8$, with $X = [H_{11}, H_{12}, H_{13}, H_{21}, H_{22}, H_{23}, H_{31}, H_{32}, H_{33}]^T$
• X is the null eigenvector of B: in practice, take the eigenvector of $B^T B$ associated with the smallest eigenvalue, or equivalently the right singular vector of B with the smallest singular value (computed with the SVD).
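The null-eigenvector solution can be sketched with the SVD; the 4-correspondence setup below is illustrative:

```python
import numpy as np

def homography_from_points(src, dst):
    """DLT sketch: stack the 2n x 9 system B X = 0 from n >= 4 correspondences
    and take X as the right singular vector of B with smallest singular value."""
    rows = []
    for (u, v), (up, vp) in zip(src, dst):
        rows.append([u, v, 1, 0, 0, 0, -u * up, -v * up, -up])
        rows.append([0, 0, 0, u, v, 1, -u * vp, -v * vp, -vp])
    _, _, Vt = np.linalg.svd(np.array(rows, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]                   # fix the free scale factor

# Illustrative check: recover a known homography from 4 correspondences.
H_true = np.array([[2.0, 0.0, 1.0], [0.0, 2.0, -1.0], [0.0, 0.0, 1.0]])
src = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
dst = []
for (u, v) in src:
    w = H_true @ np.array([u, v, 1.0])
    dst.append((w[0] / w[2], w[1] / w[2]))
H = homography_from_points(src, dst)
```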
Computing a Homography from Point Correspondences with a Non-Linear Optimization
• Non-linear least-squares minimization of a physical, meaningful error (the reprojection error, in pixels);
• Minimization algorithms: Gauss-Newton or Levenberg-Marquardt (very efficient).
Numerical Optimization
Start from an initial guess p0 (p0 can be taken from the solution of the linear system) and iterate: p0, p1, p2, ...

General methods:
• Gradient Descent / Steepest Descent;
• Conjugate Gradient;
• ...
Non-linear least-squares methods: Gauss-Newton, Levenberg-Marquardt.
We want to find p that minimizes:
$E(\mathbf{p}) = \| f(\mathbf{p}) - \mathbf{b} \|^2$

Gradient Descent / Steepest Descent
$\mathbf{p}_{i+1} = \mathbf{p}_i - \lambda \nabla E(\mathbf{p}_i)$, with $E(\mathbf{p}_i) = \| f(\mathbf{p}_i) - \mathbf{b} \|^2$ and $\nabla E(\mathbf{p}_i) = 2 J^T \left(f(\mathbf{p}_i) - \mathbf{b}\right)$, where J is the Jacobian of f.
The Gauss-Newton and the Levenberg-Marquardt Algorithms
But first, the linear least-squares case: minimizing $E(\mathbf{p}) = \| A \mathbf{p} - \mathbf{b} \|^2$ has the closed-form solution $\mathbf{p} = (A^T A)^{-1} A^T \mathbf{b} = A^+ \mathbf{b}$, where $A^+$ is the pseudo-inverse of A.
Non-Linear Least-Squares: the Gauss-Newton Algorithm
Iteration steps: linearize f around the current estimate, $f(\mathbf{p}_i + \Delta) \approx f(\mathbf{p}_i) + J \Delta$, and solve the resulting linear least-squares problem:
$\Delta = J^+ \left(\mathbf{b} - f(\mathbf{p}_i)\right), \quad \mathbf{p}_{i+1} = \mathbf{p}_i + \Delta$
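The Gauss-Newton iteration can be sketched on a toy one-parameter problem; the exponential model and the values below are made up for illustration:

```python
import numpy as np

# Fit p in f_i(p) = exp(p * x_i) to measurements b by iterating
# p <- p + J^+ (b - f(p)), the Gauss-Newton update.
x = np.linspace(0.0, 1.0, 10)
p_true = 0.7
b = np.exp(p_true * x)                  # synthetic, noise-free measurements

p = 0.0                                 # initial guess p0
for _ in range(20):
    f = np.exp(p * x)                   # current model prediction f(p_i)
    J = (x * f).reshape(-1, 1)          # Jacobian df/dp (a single column)
    delta = np.linalg.pinv(J) @ (b - f) # Gauss-Newton step J^+ (b - f)
    p = p + delta[0]
```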
Non-Linear Least-Squares: the Levenberg-Marquardt Algorithm
In the Gauss-Newton algorithm, the update is $\Delta = (J^T J)^{-1} J^T \left(\mathbf{b} - f(\mathbf{p}_i)\right)$. Levenberg-Marquardt dampens it:
$\Delta = (J^T J + \lambda I)^{-1} J^T \left(\mathbf{b} - f(\mathbf{p}_i)\right)$
λ is adapted at each iteration: a small λ gives Gauss-Newton steps, a large λ gives small gradient-descent-like steps.
Another Way to Refine the Pose: Template Matching

Global region tracking by maximizing cross-correlation:
• Useful for objects difficult to model using local features;
• Accurate.
Lucas-Kanade Algorithm
$\min_{\mathbf{p}} \sum_j \left( W(I, \mathbf{p})[m_j] - T[m_j] \right)^2$
where T is the template and $W(I, \mathbf{p})$ the input image warped with the parameters p. Computing J and $J^+$ at each iteration is computationally expensive.
Inverse Compositional Algorithm [Baker et al, IJCV'03]
Instead of updating the parameters additively ($\mathbf{p}_i = \mathbf{p}_{i-1} + \Delta\mathbf{p}_i$), the update is composed with the current warp, which allows the Jacobian, computed on the template, to be precomputed once.
ESM (Efficient Second-order Method)
(1) $I = T + J_{p=0}\, dp + dp^T H_{p=0}\, dp$  [second-order Taylor expansion]
(2) $J_{p=dp} = J_{p=0} + 2\, dp^T H_{p=0}$
Combining (1) and (2): $I \approx T + \frac{1}{2}\left(J_{p=0} + J_{p=dp}\right) dp$, a second-order approximation without computing the Hessian.
BRIEF [ECCV'10]
A very fast feature point descriptor.

Remark
• Moving legacy code to new CPUs does not result in a speed-up anymore;
• Should consider the features of new platforms.

(Figure: the BRIEF descriptor: after Gaussian smoothing, pairwise intensity comparisons at sampled locations yield a binary string 1 0 ... 0 1.)
Integral Images
$\mathrm{IntegralImage}(u, v) = \sum_{i=1}^{u} \sum_{j=1}^{v} \mathrm{Image}(i, j)$

How to Use Integral Images
The sum of the pixel intensities over any rectangle can be computed from the four corner values of the integral image: bottom-right - top-right - bottom-left + top-left.
[Viola & Jones, IJCV'01]
Features computed in constant time.

Computing Integral Images
IntegralImage[u][v] = IntegralImage[u][v-1] + LineBuffer[u] + Image[u][v]
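A sketch of both the computation and the constant-time box sum, here written with NumPy cumulative sums rather than the line-buffer recurrence of the slide:

```python
import numpy as np

def integral_image(img):
    """II(u, v) = sum of Image(i, j) for i <= u, j <= v, padded with a zero
    row and column so box sums need no boundary tests."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, top, left, bottom, right):
    """Sum over [top, bottom] x [left, right] in constant time: four lookups,
    combined with the +/- signs of the corner picture on the slide."""
    return (ii[bottom + 1, right + 1] - ii[top, right + 1]
            - ii[bottom + 1, left] + ii[top, left])

img = np.arange(16).reshape(4, 4)        # small illustrative image
ii = integral_image(img)
```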
Evaluation
(Figures: evaluation results.)

Computation Speed
For BRIEF, most of the time is spent in Gaussian smoothing.
Matching Speed
distance(BRIEF descriptor 1, BRIEF descriptor 2) = Hamming distance(BRIEF descriptor 1, BRIEF descriptor 2) = number of set bits in (BRIEF descriptor 1 XOR BRIEF descriptor 2), which is very fast on modern CPUs.
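The XOR-and-bit-count distance can be sketched as follows; `bin(x).count("1")` stands in for the hardware population-count instructions:

```python
# Matching two BRIEF descriptors: pack the bit string into an integer, XOR,
# and count the set bits of the result.
def pack_bits(bits):
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

def hamming(d1, d2):
    return bin(d1 ^ d2).count("1")

a = pack_bits([1, 0, 1, 1])              # toy 4-bit descriptors
b = pack_bits([1, 1, 1, 0])
```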
(Figure: matching-speed comparison.)
Picking the Locations
Compared strategies for picking the test locations: uniform distribution, Gaussian distribution, ...

Rotation and Scale Invariance
Duplicate the descriptors: 18 rotations x 3 scales.
Code released under the GPL on the CVLab website.
DOT [CVPR'10]
A dense descriptor for object detection. Joint work with Stefan Hinterstoisser (TU Munich).

Object detection with a sliding window and template matching; the template matching uses a similarity measure made robust to small motions.

Initial Similarity Measure

Making the Similarity Measure Robust to Small Motions

Downsampling

Ignoring the Dependencies between the Regions

Lists of Dominant Orientations

Fast Computation with Bitwise Operations
(Example bit masks: 00010000, 00001100.)

Code available under the LGPL license at http://campar.in.tum.de/personal/hinterst/index/
New Method, LINE [PAMI, under revision]

Initial Similarity Measure

Making the Similarity Measure Robust to Small Motions

Avoiding Recomputing the max Operator
1. Spread the gradients.

2. Precompute response maps. Because
• we consider only a discrete set of gradient directions, and
• we do not consider the gradient norms,
the responses can be precomputed for each possible local set of orientations.

Optimized Version
1. The sets of orientations in the image regions are encoded with a binary representation;
2. The binary representation is used as an index into lookup tables with the precomputed responses for each location.

Avoiding Cache Misses
The response maps are re-arranged into linear memories.

Using the Linear Memories
The similarity measure can be computed for all the image locations by summing linear memories.

Advantage of Linearizing the Memory
(Figure: speed-up factor.)

DOT vs. LINE
(Figure: detection results for DOT and LINE.)
LINE-MOD [Hinterstoisser et al, ICCV'11]
Extension to the Kinect: the templates combine the image and the depth map.

thanks!
  1. 1. Real-Time Computer Vision Microsoft Computer Vision SchoolVincent Lepetit - CVLab - EPFL (Lausanne, Switzerland) 1
  2. 2. demo 2
  3. 3. applications ... 3
  4. 4. • How the demo works (including Randomized Trees);• More recent work. 4
  5. 5. Background• 3D world to 2D images (projection matrix, internal parameters, external parameters, homography, ...);• Robust estimation (non-linear least-squares, RANSAC, robust estimators, ...);• Feature point matching (affine region detectors, SIFT, ...). 5
  6. 6. From the 3D World to a 2D Image World coordinate system M mWhat is the relation between the 3D coordinates of a point M and itscorrespondent m in the image captured by the camera ? 6
  7. 7. Perspective Projection World coordinate system M m C Camera centerThe image formation is modeled as a perspective projection, which isrealistic for standard cameras:The rays passing through a 3D point M and its correspondent m in theimage all intersect at a single point C, the camera center. 7
  8. 8. Expressing M in the Camera Coordinates System World coordinate M system Mcam m Z C X Y Camera coordinate systemStep 1: Express the coordinates of M in the camera coordinates system as Mcam.This transformation corresponds to a Euclidean displacement (a rotation plus atranslation): Mcam = RM + Twhere:R is a 3x3 rotation matrix, andT is a 3- vector. 8
  9. 9. Homogeneous Coordinates ⎛ X ⎞ ⎛ X ⎞ ⎜ ⎟ ⎜ ⎟ Y World coordinate ˜ = ⎜ ⎟ M = ⎜ Y ⎟ → M ⎜ ⎟ ⎜ Z ⎟ system ⎝ Z ⎠ ⎜ ⎟ ⎝ 1 ⎠ Mcam m € Z C X Camera coordinate system Y ˜Lets replace M by the 4- homogeneous vector M :Just add a 1 as the fourth coordinate.Now, the Euclidean displacement can be expressed as an linear transformationinstead of an affine one: € ⎛ X cam ⎞ € ⎛ ⎛ X ⎞ X ⎞ ⎛ X cam ⎞ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ = RM + T → ⎜ Ycam ⎟ = R⎜ Y ⎟ + T → ⎜ Ycam ⎟ = (R | T)⎜ Y ⎟ → M ˜ M cam cam = (R | T)M ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ Z ⎟ ⎝ Z cam ⎠ ⎝ Z ⎠ ⎝ Z cam ⎠ ⎜ ⎟ (R | T ) is a 3x4 matrix. ⎝ 1 ⎠ 9
  10. 10. Projection Mcam m f Z Z Mcam C X Y Camera coordinate system mXComputation of the coordinates of m in the imageplane, from Mcam (expressed in the camera coordinatessystem): Simply use Thales theorem: f mX X X = → mX = f f Z Z mX X C 10
  11. 11. From Projection to ImageCoordinates of m in pixels ? 1 ku ) Image coordinate 1 pixel ) 1 kv system u € u0 v X Y mX = f , mY = f v0 m Z Z € f mu = u0 + kumX , mv = v 0 + k v mY C Camera coordinate € system € 11
  12. 12. Putting• the perspective projection and• the transformation into pixel coordinatestogether: X Y mX = f , mY = f Z Z mu = u0 + kumX , mv = v 0 + k v mY In matrix form : ⎛ u ⎞ ⎛ k u f 0 u0 ⎞⎛ X ⎞ ⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ v ⎟ = ⎜ 0 kv f v 0 ⎟⎜ Y ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎝ w ⎠ ⎝ 0 0 1 ⎠⎝ Z ⎠ ⎛ u ⎞ ⎧ u X ⎜ ⎟ ⎪mu = w = u0 + k u f Z v ⎟ defines m in homogeneous coordinates → ⎨ ⎪mv = v = v 0 + k v f Y ⎜ ⎜ ⎟ ⎝ w ⎠ ⎩ w Z 12
  13. 13. The Full Transformation The two transformations are chained to form the full transformation from a 3D point in the world coordinate system to its projection in the image: ⎛ X ⎞ ⎛ u ⎞ ⎛ k u f 0 u0 ⎞⎛ R11 R13 R13 T ⎞⎜ ⎟ 1 ⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ Y ⎟ ⎜ v ⎟ = ⎜ 0 kv f v 0 ⎟⎜R 21 R 22 R 23 T2 ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ Z ⎟ ⎝ w ⎠ ⎝ 0 0 1 ⎠⎝R 31 R 32 R 33 T3 ⎠⎜ ⎟ ⎝ 1 ⎠ ⎛ X ⎞ ⎛ P11 P12 P13 P14 ⎞⎜ ⎟ ⎜ ⎟⎜ Y ⎟ = ⎜ P21 P22 P23 P24 ⎟ ⎜ ⎟⎜ Z ⎟ ⎝ P31 P32 P33 P34 ⎠⎜ ⎟ ⎝ 1 ⎠ projection matrix€ The product of the internal calibration matrix and the external calibration matrix is a 3x4 matrix called the "projection matrix". The projection matrix is defined up to a scale factor. 13
  14. 14. The Full Transformation ⎛ X ⎞ ⎛ u ⎞ ⎛ k u f 0 u0 ⎞⎛ R11 R13 R13 T ⎞⎜ ⎟ 1 ⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ Y ⎟ ⎜ v ⎟ = ⎜ 0 kv f v 0 ⎟⎜R 21 R 22 R 23 T2 ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ Z ⎟ ⎝ w ⎠ ⎝ 0 0 1 ⎠⎝R 31 R 32 R 33 T3 ⎠⎜ ⎟ ⎝ 1 ⎠ ⎛ X ⎞ ⎛ P11 P12 P13 P14 ⎞⎜ ⎟ ⎜ ⎟⎜ Y ⎟ = ⎜ P21 P22 P23 P24 ⎟ ⎜ ⎟⎜ Z ⎟ ⎝ P31 P32 P33 P34 ⎠⎜ ⎟ ⎝ 1 ⎠ projection matrix€ R, T, and the products kuf and kvf can be extracted from the projection matrix. 14
  15. 15. Homography H3×3 M/m m     X X    Y    X Y m = PM = [P1 P2 P3 P4 ]   = [P1 P2 P3 P4 ]   Z   = [P1 P2 P4 ]  Y  0  1 1 1 = H3×3 m 15
  16. 16. Computing a Projection Matrix or a Homography from Point Correspondences by solving a linear system m m  m = Hm H11  H12 m = [u, v, 1] , m = [u , v , 1]   H13      H21  u v 1 0 0 0 uu vu u    = 0  H22  0 0 0 u v 1 uv vv v   0  H23   H31     H32  H33 16
  17. 17. Computing a Projection Matrix or a Homography fromPoint Correspondences with a non-linear optimization• Non-linear least-squares minimization: Minimization of a physical, meaningful error (reprojection error, in pixels) M m HR,T m PR,T m m m 2min 2 dist PR,T Mi , m i min dist HR,T mi , mi R,T R,T i i• Minimization algorithms: Gauss-Newton or Levenberg-Marquardt (very efficient). 17
  18. 18. A Look to the Reprojection Error 1D camera under 2DTrue camera position at translation (0, 0) reprojection error 100 3D points taken at randomly in [400;1000]x[-500;+500] 18
  19. 19. Gaussian Noise on the Projections White cross: true camera position; Black cross: global minimum of the objective function. In that case, the global minimum of the objective function is close to the truecamera pose. 19
  20. 20. What if there are Outliers ? M3 M1 M2 M4 m1 m3 m4 incorrect measure m2 (outlier) C 20
  21. 21. Gaussian Noise on the Projections + 20% outliersWhite cross: true camera position;Black cross: global minimum of the objective function.The global minimum is now far from the true camera pose. 21
  22. 22. What Happened ? Bayesian interpretation: 2 argmin i dist PR,T mi , mi R,T = argmax i N mi ; PR,T mi , σI R,T M3 The error on the 2D point locations mi is assumed tohave a Gaussian (Normal) distribution with identical M1covariance matrices σI, and independent; M2 M4 m1 This assumption is violated when mi is an outlier. m3 m 4 m2 C 22
  23. 23. Robust estimationIdea: Replace the Normal distribution by a more suitable distribution, or equivalently replace the least-squares estimator by a robust estimator or “M-estimator”: 2 argmin dist PR,T mi , mi i R,T → argmin i ρ dist PR,T mi , m i R,T 23
  24. 24. Example of an M-estimator: The Tukey Estimator c2 if |x| ≤ c ρ(x) = 6 (1 − (1 − (c) ) ) x 2 3 c2 if |x| c ρ(x) = 6 x2 ρ(x)The Tukey estimator assumes the measures follow adistribution that is a mixture of:• a Normal distribution, for the inliers,• a uniform distribution, for the outliers. 24
  25. 25. Normal distribution Uniform distribution (inliers) (outliers) Mixture + = -log(.) -log(.)Least-squares Tukey estimator 25
  26. 26. Gaussian Noise on the Projections + 20% outliers + Tukey estimatorWhite cross: true camera position;Black cross: global minimum of the object function.The global minimum is very close to the true camera pose.BUT:- local minimums;- the objective function is flat where all the correspondences are considered outliers. 26
  27. 27. Gaussian Noise on the Projections + 50% outliers + Tukey estimatorEven more local minimums.Numerical optimization can get trapped into a local minimum. 27
  28. 28. RANSAC 28
  29. 29. How to Optimize ?Idea: sampling the space of solutions (the camera pose space here): 29
  30. 30. How to Optimize ?Idea: sampling the space of solutions:+ Numerical Optimization from the best sampled pose.Problem: Exhaustive regular sampling is too expensive in 6 dimensions.Can we do a smarter sampling ? 30
  31. 31. RANSAC RANSAC: RANdom SAmple Consensus Line fitting: the Throwing Out the worst residual heuristics can fail (Example forthe original paper [Fischler81]): outlier final least-squares solution Ideal line 31
  32. 32. RANSACAs before, we could do a regular sampling, but would not be optimal: Ideal line 32
  33. 33. Idea: Generate hypotheses from subsets of the measurements. If a subset contains no gross errors, the estimated parameters (the hypothesis) are closedto the true ones. Take several subsets at random, retain the best one. Ideal line 33
  34. 34. The quality of a hypothesis is evaluated by the number of measures that lie closeenough to the predicted line. We need to choose a threshold (T) to decide if the measure is close enough. RANSAC returns the best hypothesis, ie the hypothesis with the largest number ofinliers. T ⎧1 if dist(mi ,line(p)) ≤ T ∑ ⎨0 if dist(mi ,line(p)) T i ⎩ € 34
  35. 35. RANSAC for Homographies To apply RANSAC to homography estimation, we need a way to compute ahomography from a subset of measurements:   H11  H12     H13     H21  u v 1 0 0 0 uu vu u    = 0  H22  0 0 0 u v 1 uv vv v   0  H23   H31     H32  H33 Since RANSAC only provides a solution estimated with a limited number of data, itmust be followed by a robust minimization to refine the solution. 35
  36. 36. How to Get the Correspondences ? m m • Extract Feature Points / Keypoints / Regions (Harris corner detector, extrema of Laplacian, affine region detectors, ...);• standard approach: Match them based on Euclidean distances between descriptors such as SIFT, SURF, ... 36
  37. 37. Affine Region DetectorsHessian-Affine detector MSER detector 37
  38. 38. Affine NormalizationWarp by M11/2 Warp by M21/2 We still have to correct for the orientation ! 38
  39. 39. Select Canonical Orientation• Create histogram of local gradient directions computed over the image patch;• Each gradient contributes for its norm, weighted by its distance to patch center;• Assign canonical orientation at peak of smoothed histogram. 0 2π 39
  40. 40. Select Canonical Orientation 40
  41. 41. Description Vector      .  ?  .   .  41
  42. 42. SIFT Description VectorMade of local histograms of gradients:      .   .   . In practice: 8 orientations x 4 x 4 histograms = 128 dimensions vector.Normalised to be robust to light changes. 42
  43. 43. Matching Regions       .   .   .   .  .  .            .   .   .   .   .   .  ?   .  .  .  m      .   .   .       .   .   .  43
  44. 44. Matching: Approximate Nearest NeighbourBest-Bin-First: Approximate nearest-neighbour search in k-d tree 44
  45. 45. Keypoint MatchingThe standard approach is a particular case of classification: Search in the Database Pre-processing Nearest neighbor Make the actual classification easier classification Idea: let’s try another classification method! 45
  46. 46. One Class per Keypoint One class per keypoint: the set of the keypoint’s possible appearancesunder various perspective, lighting, noise... class 1 class 2 46
  47. 47. Training phaseclass 1 Classifierclass 2... Run-Time class 1 Classifier 47
  48. 48. Which Classifier ? We want a classifier that:• can handle many classes;• is very fast;• has reasonable recognition performances (a very high recognition rate is not an necessary requirement). 48
  49. 49. Which Classifier ?• Randomized Trees [Amit Geman, 1997];• Random forests [Breiman, 2001]. 49
  50. 50. An (Ideal) Single Tree binary testbinary test binary test class # 50
  51. 51. How to Build the Tree ?training set binary test ? 51
  52. 52. training S set binary test ? found by minimizing the entropy after the test: |Sleft | |Sright | argmin |S| Entropy(Sleft ) + |S| Entropy(Sright ) test Sright Sleft 52
  53. 53. training S set binary testProblem: runs quickly out of training samples for thedeeper tests 53
  54. 54. Idea: Use Several Sub-Optimal TreesEach tree is trained with a random subset ofthe training set. 54
  55. 55. Idea: Use Several Sub-Optimal TreesThe leaves contain the probabilities over theclasses, computed from the training set. 55
  56. 56. Classification with Several Sub-Optimal TreesThe test sample is dropped into each tree, and theprobabilities in the leaves it reached are averaged: 1 3 ( + + )= 56
  57. 57. Visual InterpretationEach tree partitions the space in a different way andcompute the probability of each class for each cell ofthe partition: 57
  58. 58. Visual InterpretationCombining the trees gives a fine partition with abetter estimate of the class probabilities: 58
  59. 59. For Patches Possible tests: compare the intensities of two pixels around the keypoint after Gaussian smoothing:m + dmi,1 ˜ ˜ 1 if I(m + dmi,1 ) ≤ I(m + dmi,2 ) fi = 0 otherwise m ˜ m + dmi,2 I : image after Gaussian smoothing • Very efficient to compute; • Invariant to light change by any raising function. 59
  60. 60. Results 60
  61. 61. Randomized Trees (and Random Ferns) applied to imagepatches are becoming a powerful tool for Computer Vision. 61
  62. 62. [Shotton et al, CVPR’11]Used to infer body parts in the Kinect body tracking system.The tests rely on the depth map. 62
  63. 63. Tests in [Shotton et al, CVPR’11]Classes are the body parts. The goal is to label each pixel with thelabel of the part it belongs to.Tests compare the depth of two pixels around the considered pixel.The displacements are normalized by the depth of the consideredpixel for invariance: 1 if depth(m + depth(m) ) ≤ depth(m + depth(m) ) dm1 dm2 fi (m) = 0 otherwise m 63
  64. 64. 3D Pose EstimationMean-Shift is used to find the joint locations from the body parts. 64
  65. 65. TrainingMost of the training data is synthetic:“Training 3 trees to depth 20 from 1 million images takes about1 day on a 1000 core cluster” [Shotton et al, CVPR’11] 65
  66. 66. A SubtreeAverage of the patchesthat reach this node 66
  67. 67. [Gall and Lempitsky, CVPR’09; Barinova et al, CVPR’10] Hough Forests for Object Detection: • Random Forests used to make each patch vote for the object centroid; • The tests compare the output of filters and histograms-of-gradients between 2 pixels; • The leaves contain the displacement toward the object center. (figure: each patch votes for the object centroid; votes from 3 patches; accumulated votes from all patches; final detection) 67
  68. 68. Tests used in [Gall and Lempitsky, CVPR’09]: f_i(m) = 1 if channel_i(m + dm_1) < channel_i(m + dm_2) + τ, 0 otherwise. Channels: the 3 color channels, absolute values of the first and second derivatives of the image, and 9 channels from HoG (Histograms-of-Gradients). 68
  69. 69. [Bosch et al, ICCV’07] Image Classification using Random Forests and Ferns [Bosch et al, ICCV’07]. Uses a sliding window to detect objects. Much faster than SVMs, with similar recognition performance. 69
  70. 70. [Bosch et al, ICCV’07] Tests: f_i(m) = 1 if nᵀ x_m + b ≤ 0, 0 otherwise. n and b: random vector and scalar. x_m: vector computed from a Pyramidal Histogram-of-Gradients. 70
  71. 71. [Kalal et al, CVPR’10] TLD (aka Predator), for Track, Learn, Detect:• Random Ferns used to speed up detection;• Trained online: the distributions in the leaves are updated online, using the incoming images. 71
  72. 72. [Kalal et al, CVPR’10]• Tests: 2bit binary patterns• Trained online: the distributions in the leaves are updated online, using the incoming images. 72
  73. 73. Random Ferns:A Simplified Tree-Like Classifier 73
  74. 74. For Keypoint Recognition, We Can Use Random Tests! Comparison of the recognition rates for 200 keypoints (plot: recognition rate vs. number of trees, for tests selected by minimizing the entropy and for tests with random locations) 74
  75. 75. We can use random tests• For a small number of classes – we can try several tests, and – retain the best one according to some criterion. 75
  76. 76. We can use random tests• For a small number of classes – we can try several tests, and – retain the best one according to some criterion.• When the number of classes is large – any test does a decent job: 76
  77. 77. Why it is Interesting• Building the trees takes no time (we still have to estimate the posterior probabilities);• Allows incremental learning;• Simplifies the classifier structure. 77
  78. 78. The Tree Structure is not Needed 78
  79. 79. The Tree Structure is not Needed f1 f2 f3 79
  80. 80. The Tree Structure is not Needed f1 f2 f3The distributions can be expressed simply, as: Results of pixel comparisons (0 or 1) Class Label 80
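Concretely, the S binary test outcomes of one fern can be packed into an integer that directly indexes the stored distribution, with no tree traversal (a sketch; names are illustrative):

```c
/* A fern with S tests maps a patch to a leaf index in [0, 2^S):
   the pixel-comparison results (0 or 1) are concatenated into an integer. */
int fern_index(const int *test_results, int S) {
    int index = 0;
    for (int j = 0; j < S; j++)
        index = (index << 1) | (test_results[j] & 1);
    return index;
}
```

The index then addresses a flat table of per-class counters, which is all the "tree" structure that remains.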
  81. 81. We are looking for argmax_i P(C = c_i | patch). If the patch can be represented by a set of image features {f_i}: P(C = c_i | patch) = P(C = c_i | f_1, f_2, ..., f_N), which by Bayes' rule is proportional to P(f_1, f_2, ..., f_N | C = c_i). A complete representation of this joint distribution is infeasible. Naive Bayes ignores the correlations: P(f_1, ..., f_N | C = c_i) ≈ Π_k P(f_k | C = c_i). Compromise: group the features into small sets (the ferns) assumed independent of each other: P(f_1, ..., f_N | C = c_i) ≈ Π_k P(F_k | C = c_i). 81
  82. 82. Training 82
  83. 83. Training 83
  84. 84. Training (figure: the 0/1 test outcomes on a training patch) 84
  85. 85. Training (figure: the test outcomes form a bit string) 85
  86. 86. Training (figure: the bit string indexes a counter of the leaf histogram) 86
  87. 87. Training 87
  88. 88. Training 88
  89. 89. Training Results. Normalize: Σ_{(f_1, f_2, ..., f_n) ∈ {00...0, 00...1, ..., 11...1}} P(f_1, f_2, ..., f_n | C = c_i) = 1 89
  90. 90. Training Results. Normalize: Σ_{(f_1, f_2, ..., f_n) ∈ {00...0, 00...1, ..., 11...1}} P(f_1, f_2, ..., f_n | C = c_i) = 1 90
  91. 91. Recognition 91
  92. 92. Normalization. Normalize: Σ_{(f_1, f_2, ..., f_n) ∈ {00...0, 00...1, ..., 11...1}} P(f_1, f_2, ..., f_n | C = c_i) = 1 92
  93. 93. Subtlety with Normalization. p_{leaf, class} = Number of samples(leaf, class) / Number of samples(class) is too selective: Number of samples(leaf, class) can be 0 simply because the training set is finite. Instead we use: p_{leaf, class} = (Number of samples(leaf, class) + N_regularization) / (Number of samples(class) + Number of leaves × N_regularization). This can be done by simply initializing the counters to N_regularization instead of 0. 93
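The slide's regularized estimate in code form (a one-liner; `n_reg` is the slide's N_regularization):

```c
/* Regularized leaf posterior, as on the slide: adding N_regularization to
   every counter avoids zero probabilities due to the finite training set. */
double leaf_prob(int n_samples_leaf_class, int n_samples_class,
                 int n_leaves, double n_reg) {
    return (n_samples_leaf_class + n_reg) /
           (n_samples_class + n_leaves * n_reg);
}
```

The probabilities over the leaves of one fern still sum to 1 for each class.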
  94. 94. Influence of N_regularization: p_{leaf, class} = (Number of samples(leaf, class) + N_regularization) / (Number of samples(class) + Number of leaves × N_regularization) (plot: recognition rate vs. N_regularization, log scale) 94
  95. 95. Implementation of Feature Point Recognition with Ferns:
for (int i = 0; i < H; i++) P[i] = 0.;
for (int k = 0; k < M; k++) {
    int index = 0, *d = D + k * 2 * S;
    for (int j = 0; j < S; j++) {
        index <<= 1;
        if (*(K + d[0]) < *(K + d[1]))
            index++;
        d += 2;
    }
    p = PF + k * shift2 + index * shift1;
    for (int i = 0; i < H; i++) P[i] += p[i];
}
• Very simple to implement; • No need for orientation, perspective, or light correction. 95
  96. 96. Ferns versus SIFT (plot: number of inliers for SIFT vs. number of inliers for Ferns; each point corresponds to an image from a 1000-frame sequence). Ferns are much faster, sometimes more accurate, but SIFT does not need training. 96
  97. 97. Randomized Trees vs Ferns. Different combination strategies: average (RT) / product (Ferns). (plot: recognition rate vs. number of structures, for Ferns with product, RT (with random tests) with product, Ferns with average, and RT (with random tests) with average) Ferns are more discriminant but more sensitive to outliers. 97
  98. 98. Randomized Trees vs Ferns. Influence of the number of classes: (plot: recognition rate for Ferns with product and Ferns with average) 98
  99. 99. Memory and Computation Time. • Recognition time grows linearly with the number of Trees/Ferns and the number of classes. • Recognition time grows linearly with the depth of the Trees/Ferns (the logarithm of the number of leaves). • Memory grows linearly with the number of Trees/Ferns and the number of classes. • Memory grows exponentially with the depth of the Trees/Ferns. • Increasing the depth may result in overfitting. • Increasing the number of Trees/Ferns (usually) improves recognition. 99
  100. 100. Influence of the Number of Ferns (plot: recognition rate vs. number of structures, for Ferns and RT with product and average combinations). Increasing the number of Ferns/Trees improves the recognition rate, but increases the computation time and memory. 100
  101. 101. Number of Ferns / Number of Leaves / Memory / Computation Time (plots: recognition rate and computation time vs. fern size, for different numbers of Ferns) 101
  102. 102. Conclusions on Randomized Trees and Ferns. • Simple to implement, Ferns even simpler; • Both very fast, but dumb: they need a lot of training examples to learn; • Use a lot of memory to store the posterior distributions in the leaves. 102
  103. 103. We now have correspondences between a reference imageof the object and the input image:Some correspondences are correct, some are not.We can estimate the homography between the 2 images byapplying RANSAC on subsets of 4 correspondences. 103
  104. 104. Computing a Homography from Point Correspondences by solving a linear system: m' = H m, with m = [u, v, 1]ᵀ and m' = [ku', kv', k]ᵀ. Each correspondence gives two linear equations in the entries of H:
[u v 1 0 0 0 -uu' -vu' -u'] [H11 H12 H13 H21 H22 H23 H31 H32 H33]ᵀ = 0
[0 0 0 u v 1 -uv' -vv' -v'] [H11 H12 H13 H21 H22 H23 H31 H32 H33]ᵀ = 0 104
  105. 105. Computing a Homography from Point Correspondences by solving a linear system: m' = H m, with m = [u, v, 1]ᵀ and m' = [ku', kv', k]ᵀ, so that
u' = (H11 u + H12 v + H13) / (H31 u + H32 v + H33)
v' = (H21 u + H22 v + H23) / (H31 u + H32 v + H33) 105
  106. 106. Computing a Homography from Point Correspondences by solving a linear system. From
u' = (H11 u + H12 v + H13) / (H31 u + H32 v + H33)
v' = (H21 u + H22 v + H23) / (H31 u + H32 v + H33),
each correspondence gives two linear equations:
[u v 1 0 0 0 -uu' -vu' -u'] X = 0
[0 0 0 u v 1 -uv' -vv' -v'] X = 0.
Using four correspondences: B X = 0_8, with X = [H11, H12, H13, H21, H22, H23, H31, H32, H33]ᵀ 106
  107. 107. How to Solve this Linear System? B X = 0_8, with X = [H11, H12, H13, H21, H22, H23, H31, H32, H33]ᵀ. • X is the null vector of B. • In practice: the eigenvector of BᵀB corresponding to the smallest eigenvalue (equivalently, the singular vector of B with the smallest singular value). 107
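As an illustrative alternative to the null-vector formulation on the slide, if H33 is fixed to 1 (valid whenever H33 ≠ 0), the four correspondences give an ordinary 8×8 linear system. This sketch solves it with Gauss-Jordan elimination and partial pivoting (function name and layout are assumptions):

```c
#include <math.h>

/* DLT with H33 = 1: four correspondences (u,v) -> (u',v') give an 8x8
   system in the remaining entries of H. Returns 0 on a degenerate input. */
int homography_from_4pts(const double uv[4][2], const double upvp[4][2],
                         double H[9]) {
    double A[8][9];                       /* augmented system A x = rhs */
    for (int i = 0; i < 4; i++) {
        double u = uv[i][0], v = uv[i][1];
        double up = upvp[i][0], vp = upvp[i][1];
        double r1[9] = { u, v, 1, 0, 0, 0, -u * up, -v * up, up };
        double r2[9] = { 0, 0, 0, u, v, 1, -u * vp, -v * vp, vp };
        for (int j = 0; j < 9; j++) { A[2*i][j] = r1[j]; A[2*i+1][j] = r2[j]; }
    }
    for (int c = 0; c < 8; c++) {         /* Gauss-Jordan with pivoting */
        int piv = c;
        for (int r = c + 1; r < 8; r++)
            if (fabs(A[r][c]) > fabs(A[piv][c])) piv = r;
        if (fabs(A[piv][c]) < 1e-12) return 0;
        for (int j = 0; j < 9; j++) {
            double t = A[c][j]; A[c][j] = A[piv][j]; A[piv][j] = t;
        }
        for (int r = 0; r < 8; r++) {
            if (r == c) continue;
            double f = A[r][c] / A[c][c];
            for (int j = c; j < 9; j++) A[r][j] -= f * A[c][j];
        }
    }
    for (int c = 0; c < 8; c++) H[c] = A[c][8] / A[c][c];
    H[8] = 1.0;
    return 1;
}
```

The eigenvector formulation on the slide is more general (it also handles H33 = 0) and is what one would use in practice with an SVD routine.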
  108. 108. Computing a Homography from Point Correspondences with a non-linear optimization. • Non-linear least-squares minimization of a physical, meaningful error (the reprojection error, in pixels): min_{R,T} Σ_i dist²(H_{R,T} m_i, m'_i). • Minimization algorithms: Gauss-Newton or Levenberg-Marquardt (very efficient). 108
  109. 109. Numerical Optimization p2 p1 p0 Start from an initial guess p0: p0 can be taken randomly but should be as close as possible to the globalminimum: - pose computed at time t-1; - pose predicted from pose computed at time t-1 and a motion model; - ... 109
  110. 110. Numerical OptimizationGeneral methods:• Gradient descent / Steepest Descent;• Conjugate Gradient;• ...Non-linear Least-squares optimization:• Gauss-Newton;• Levenberg-Marquardt;• ... 110
  111. 111. Numerical Optimization. We want to find p that minimizes E(p) = Σ_i dist²(H_{R(p),T(p)} m_i, m'_i) = ‖f(p) − b‖², where • p is a vector of parameters that define the camera pose (translation vector + parameters of the rotation matrix); • b is a vector made of the measurements (here the m'_i); • f is the function that relates the camera pose to these measurements: f(p) = [u(H_{R(p),T(p)} m_1), v(H_{R(p),T(p)} m_1), ...]ᵀ and b = [u(m'_1), v(m'_1), ...]ᵀ. 111
  112. 112. Gradient Descent / Steepest Descent: p_{i+1} = p_i − λ ∇E(p_i), with E(p_i) = ‖f(p_i) − b‖² = (f(p_i) − b)ᵀ (f(p_i) − b), so ∇E(p_i) = 2 Jᵀ (f(p_i) − b), with J the Jacobian matrix of f computed at p_i. Weaknesses: • How to choose λ? • Needs a lot of iterations in long and narrow valleys. 112
  113. 113. The Gauss-Newton and the Levenberg-Marquardt algorithms. But first, the linear least-squares case: E(p) = ‖f(p) − b‖². If the function f is linear, i.e. f(p) = A p, then p can be estimated as p = A⁺ b, where A⁺ is the pseudo-inverse of A: A⁺ = (AᵀA)⁻¹Aᵀ. 113
  114. 114. Non-Linear Least-Squares: The Gauss-Newton algorithm. Iteration steps: p_{i+1} = p_i + Δ_i. Δ_i is chosen to minimize the residual ‖f(p_{i+1}) − b‖², and is computed by approximating f to first order:
Δ_i = argmin_Δ ‖f(p_i + Δ) − b‖²
    = argmin_Δ ‖f(p_i) + JΔ − b‖²   [first-order approximation: f(p_i + Δ) ≈ f(p_i) + JΔ]
    = argmin_Δ ‖ε_i + JΔ‖²          [ε_i = f(p_i) − b denotes the residual at iteration i]
Δ_i is the solution of JΔ = −ε_i in the least-squares sense: Δ_i = −J⁺ ε_i, where J⁺ is the pseudo-inverse of J. 114
  115. 115. Non-Linear Least-Squares: The Levenberg-Marquardt Algorithm. In the Gauss-Newton algorithm: Δ_i = −(JᵀJ)⁻¹ Jᵀ ε_i. In the Levenberg-Marquardt algorithm: Δ_i = −(JᵀJ + λI)⁻¹ Jᵀ ε_i. Levenberg-Marquardt algorithm: 0. Initialize λ with a small value: λ = 0.001. 1. Compute Δ_i and E(p_i + Δ_i). 2. If E(p_i + Δ_i) > E(p_i): λ ← 10 λ and go back to 1 [happens when the linear approximation of f is too coarse]. 3. If E(p_i + Δ_i) < E(p_i): λ ← λ / 10, p_{i+1} ← p_i + Δ_i, and go back to 1. Once converged, set λ ← 0 and continue up to convergence. 115
  116. 116. Non-Linear Least-Squares: the Levenberg-Marquardt Algorithm. Δ_i = −(JᵀJ + λI)⁻¹ Jᵀ ε_i. • When λ is small, LM behaves similarly to the Gauss-Newton algorithm. • When λ becomes large, LM behaves similarly to a steepest descent, which guarantees convergence. 116
  117. 117. Another Way to Refine the Pose: Template Matching 117
  118. 118. Global region tracking by minimizing cross-correlation:• Useful for objects difficult to model using local features;• Accurate. p Input Image I Template T 118
  119. 119. Lucas-Kanade Algorithm: min_p Σ_j (W(I, p)[m_j] − T[m_j])². Gauss-Newton step: Δ_i = J⁺ · ε_{p,I}, with J⁺ the pseudo-inverse of the Jacobian of W(I, p) evaluated at p and the m_j, and ε_{p,I} = (..., T[m_j] − W(I, p)[m_j], ...)ᵀ. 119
  120. 120. Lucas-Kanade Algorithm p p0 Template TComputing J and J+ is computationally expensive. 120
  121. 121. Inverse Compositional Algorithm [Baker et al. IJCV03] pi = pi-1 + dpi Input Image It dpi -pi-1 Template T + dpi = J p= 0 εp= 0,IJp=0 is a constant matrix and can therefore be precomputed ! 121
  122. 122. ESM (Efficient Second-order Method)
(1) I = T + J_{p=0} dp + dpᵀ H_{p=0} dp [second-order Taylor expansion]
(2) J_{p=dp} = J_{p=0} + 2 dpᵀ H_{p=0} [derivative of (1) with respect to p]
(3) dpᵀ H_{p=0} = ½ (J_{p=dp} − J_{p=0}) [from Equation (2)]
(4) I = T + [J_{p=0} + ½ (J_{p=dp} − J_{p=0})] dp [by injecting (3) into (1)]
(5) dp = [½ (J_{p=0} + J_{p=dp})]⁺ (I − T) [from Equation (4)]
Like Gauss-Newton, but with J_{p=0} replaced by ½ (J_{p=0} + J_{p=dp}). J_{p=dp} and a pseudo-inverse must be computed at each iteration, but far fewer iterations are needed. 122
  123. 123. BRIEF [ECCV’10]very fast feature point descriptor 123
  124. 124. Remark• Moving legacy code to new CPUs does not result in a speed-up anymore;• Should consider the features of new platforms: parallelism (multi-cores, GPU), locality, ... 124
  125. 125. (figure: pairwise pixel comparisons after Gaussian smoothing produce the descriptor bits 1, 1, 0, ..., 0, 1: the BRIEF descriptor) 125
  126. 126. Alternatively, the smoothing can be computed using integral images (same figure: comparisons producing the BRIEF descriptor bits) 126
  127. 127. Integral Images: IntegralImage(u, v) = Σ_{i=1..u} Σ_{j=1..v} Image(i, j) 127
  128. 128. How to Use Integral Images - - = + 128
  129. 129. [Viola Jones, IJCV’01]Features computed in constant time 129
  130. 130. Computing Integral ImagesIntegralImage[u][v] = IntegralImage[u][v-1] + LineBuffer[u] + Image[u][v] 130
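A common equivalent recurrence, sketched below with illustrative names (the slide's version achieves the same single pass with a running line buffer), plus the constant-time box sum that BRIEF and Viola-Jones rely on:

```c
/* Integral image via the standard recurrence
   II(u,v) = II(u-1,v) + II(u,v-1) - II(u-1,v-1) + I(u,v). */
void integral_image(const unsigned char *img, int w, int h, long *ii) {
    for (int v = 0; v < h; v++)
        for (int u = 0; u < w; u++) {
            long a = (u > 0)          ? ii[v * w + u - 1]       : 0;
            long b = (v > 0)          ? ii[(v - 1) * w + u]     : 0;
            long c = (u > 0 && v > 0) ? ii[(v - 1) * w + u - 1] : 0;
            ii[v * w + u] = a + b - c + img[v * w + u];
        }
}

/* Sum of the pixels in the rectangle [u0,u1] x [v0,v1], in constant time:
   the "subtract two sides, add back the corner" scheme of the slides. */
long box_sum(const long *ii, int w, int u0, int v0, int u1, int v1) {
    long A = (u0 > 0 && v0 > 0) ? ii[(v0 - 1) * w + u0 - 1] : 0;
    long B = (v0 > 0)           ? ii[(v0 - 1) * w + u1]     : 0;
    long C = (u0 > 0)           ? ii[v1 * w + u0 - 1]       : 0;
    return ii[v1 * w + u1] - B - C + A;
}
```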
  131. 131. Evaluation 131
  132. 132. Evaluation 132
  133. 133. Computation SpeedFor BRIEF, most of the time is spent in Gaussian smoothing. 133
  134. 134. Matching Speed: distance(BRIEF descriptor 1, BRIEF descriptor 2) = Hamming distance(BRIEF descriptor 1, BRIEF descriptor 2) = number of bits set to 1 in (BRIEF descriptor 1 xor BRIEF descriptor 2) = popcount(BRIEF descriptor 1 xor BRIEF descriptor 2). 10- to 15-fold speed increase on Intel's Bloomfield (SSE 4.2) and AMD's Phenom (SSE4a). 134
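This distance is a few lines of C; `__builtin_popcountll` is the GCC/Clang intrinsic that compiles to the POPCNT instruction on the CPUs mentioned above (the descriptor layout as 64-bit words is an illustrative assumption):

```c
#include <stdint.h>

/* Hamming distance between two BRIEF descriptors stored as n_words
   64-bit words: popcount of the XOR, summed over the words. */
int brief_distance(const uint64_t *d1, const uint64_t *d2, int n_words) {
    int dist = 0;
    for (int i = 0; i < n_words; i++)
        dist += __builtin_popcountll(d1[i] ^ d2[i]);
    return dist;
}
```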
  135. 135. Matching Speed 135
  136. 136. Picking the Locations (figure: candidate sampling strategies: uniform distribution; Gaussian distribution; Gaussian distribution for location and length; uniform distribution on polar coordinates; census-transform-like locations) 136
  137. 137. Picking the Locations (same sampling strategies) 137
  138. 138. Rotation and Scale Invariance 138
  139. 139. Rotation and Scale Invariance. Duplicate the Descriptors: 18 rotations × 3 scales 139
  140. 140. code released in GPL on CVLab website 140
  141. 141. DOT [CVPR’10] dense descriptor for object detectionJoint work with Stefan Hinterstoisser (TU Munich) 141
  142. 142. Object detection with a sliding window and template matching: template matching with an efficient representation of the images and the templates. 142
  143. 143. 143
  144. 144. 144
  145. 145. Initial Similarity Measure 145
  146. 146. Making the Similarity Measure Robust to Small Motions 146
  147. 147. Downsampling 147
  148. 148. Ignoring the Dependencies between the Regions... 148
  149. 149. Lists of Dominant Orientations 149
  150. 150. Fast Computation with Bitwise Operations00010000 00001100 150
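A sketch of what such bit strings enable (names are illustrative): with one bit per quantized gradient direction, testing whether a template orientation occurs among a region's dominant orientations is a single AND:

```c
#include <stdint.h>

/* Each region stores its list of dominant orientations as one bit per
   quantized direction; a template matches a region iff they share one. */
int region_matches(uint8_t template_mask, uint8_t image_mask) {
    return (template_mask & image_mask) != 0;
}

/* Illustrative similarity over all regions of a sliding-window location:
   the number of regions sharing at least one dominant orientation. */
int dot_similarity(const uint8_t *tmpl, const uint8_t *img, int n_regions) {
    int score = 0;
    for (int i = 0; i < n_regions; i++)
        score += region_matches(tmpl[i], img[i]);
    return score;
}
```

For the masks on the slide, 00010000 and 00001100 share no direction, so that region does not contribute to the score.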
  151. 151. Code available under LGPL license athttp://campar.in.tum.de/personal/hinterst/index/ 151
  152. 152. New Method, LINE[PAMI, under revision] 152
  153. 153. Initial Similarity Measure: E_Steger(I, O, c) = Σ_r cos(orientation(O, r) − orientation(I, c + r)) (the previous, DOT measure is shown for comparison) 153
  154. 154. Making the Similarity Measure Robust to Small Motions: E_Steger(I, O, c) = Σ_r cos(orientation(O, r) − orientation(I, c + r)) becomes E(I, O, c) = Σ_r max_{t ∈ region(c+r)} cos(orientation(O, r) − orientation(I, t)) 154
  155. 155. Avoiding Recomputing the max Operator: 1. spread the gradients 155
  156. 156. 2. precompute response maps. Because • we consider only a discrete set of gradient directions, and • we do not consider the gradient norms, we can precompute a response for each region in the image and each gradient direction in the template. 156
  157. 157. Optimized Version. 1. The sets of orientations in the image regions are encoded with a binary representation: 11010 157
  158. 158. Optimized Version2. The binary representation is used as an index to lookup tables withthe precomputed responses for each gradient direction in the template: 158
  159. 159. Avoiding Cache Misses. The response maps are re-arranged into linear memories: 159
  160. 160. Using the Linear Memories The similarity measure can be computed for all the image locations by summing linear memories, shifted by an offset that depends on the template. 160
  161. 161. Advantage of Linearizing the Memory Speed-up factor 161
  162. 162. DOT [CVPR’10]LINE 162
  163. 163. LINE-MOD [Hinterstoisser et al, ICCV’11]Extension to the Kinect: the templates combine the imageand the depth map. 163
  164. 164. thanks! 164
