
ICPRAM 2012

Oral presentation at ICPRAM 2012 of our paper entitled:
Object Recognition in Probabilistic 3-D Volumetric Scenes

  1. Object Recognition in Probabilistic 3-D Volumetric Scenes. Maria Isabel Restrepo, Brandon A. Mayer, Joseph L. Mundy
  2. Goal: Automated Scene Description. Maria Isabel Restrepo. February 7, 2012
  3. Goal: Automated Scene Description
  4. Related Work: 3-D Object Retrieval. [Figure: timeline (2009 to 2011) of excerpts from R. Toldo, U. Castellani, and A. Fusiello; A. M. Bronstein et al., Scale-Invariant HKS (SI-HKS) [Bronstein and Kokkinos 2010]; and J. Knopp et al., "Hough Transforms and 3D SURF for Robust Three Dimensional Classification".] The SI-HKS excerpt defines
      $$\mathbf{p}_{\mathrm{dif}}(x) = \bigl(\log K_{\alpha^{\tau_2}}(x,x) - \log K_{\alpha^{\tau_1}}(x,x), \ldots, \log K_{\alpha^{\tau_m}}(x,x) - \log K_{\alpha^{\tau_{m-1}}}(x,x)\bigr), \qquad \hat{\mathbf{p}}(x) = \bigl|(\mathcal{F}\mathbf{p}_{\mathrm{dif}}(x))(\omega_1, \ldots, \omega_n)\bigr|,$$
      where $\mathcal{F}$ is the discrete Fourier transform and $\omega_1, \ldots, \omega_n$ denote the frequencies at which the transformed vector is sampled. Taking differences of logarithms removes the scaling constant, and the Fourier transform converts the scale-space shift into a complex phase, which is removed by taking the absolute value; typically, a large m is used to make the representation insensitive to large scaling factors and edge effects. The 3D SURF excerpt (Fig. 2) shows a shape (a) voxelized into a cube grid of side length 256 (b), with detected 3D SURF features back-projected onto the shape (c) as spheres whose radius illustrates the feature scale.
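The SI-HKS construction quoted above reduces to three array operations. The following Python sketch is our own illustration, not Bronstein and Kokkinos's code: it assumes the heat kernel values $K_{\alpha^{\tau}}(x,x)$ have already been computed at logarithmically spaced times, treats $\omega_1, \ldots, \omega_n$ as simply the first n DFT bins, and uses a hypothetical helper name `si_hks`.

```python
import numpy as np


def si_hks(hks_values, n_freqs):
    """Scale-invariant heat kernel signature (illustrative sketch).

    hks_values: K_{alpha^tau_i}(x, x) sampled at tau_1..tau_m (all > 0).
    n_freqs:    number of DFT bins kept (assumed to be omega_1..omega_n).
    """
    h = np.asarray(hks_values, dtype=float)
    # Differences of logarithms cancel any multiplicative scaling constant.
    p_dif = np.diff(np.log(h))
    # The DFT turns a shift along tau into a complex phase factor...
    spectrum = np.fft.fft(p_dif)
    # ...which taking the magnitude discards.
    return np.abs(spectrum[:n_freqs])


if __name__ == "__main__":
    taus = np.linspace(0.0, 3.0, 16)
    h = np.exp(-taus) + 0.1
    # Multiplying the signature by a constant leaves the descriptor unchanged.
    print(np.allclose(si_hks(h, 5), si_hks(3.7 * h, 5)))  # prints True
```

Only a circular shift is removed exactly by the magnitude step, which is why the excerpt recommends a large m to limit edge effects.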
  5. Related Work: Scene Description in LIDAR. [Figure: excerpts from T. Korah, S. Medasani, and Y. Owechko (Nokia Research Center, Hollywood; HRL Labs, Malibu), 2011; A. Golovinskiy et al., 2009; and A. Patterson, P. Mordohai, et al., 2008, "Object Detection from Large-Scale 3D Datasets".] The Korah et al. excerpt segments urban LIDAR scans using a Strip Histogram Grid, which encodes the scene as a grid of vertical 3-D population histograms rising up from the locally detected ground; the authors processed almost a billion points spanning an area of 3.3 km² in less than an hour on a regular desktop. The remaining excerpts show precision-recall curves (classes such as Car, Post Short, Newspaper Box, Light Standard, and Traffic Light), a precision-recall curve for car detection (1221 cars), and a screenshot of automatically detected cars.
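To make the strip-histogram idea concrete, here is a deliberately simplified Python sketch of our own. The function name, square ground-plane cells, fixed-height bins, and the choice of each cell's lowest point as its "locally detected ground" are all assumptions for illustration, not details taken from Korah et al.

```python
import numpy as np


def strip_histogram_grid(points, cell_size, bin_height, num_bins):
    """Simplified strip histogram grid (hypothetical helper).

    points: (N, 3) array of x, y, z coordinates.
    Returns {(ix, iy): counts}, where counts[b] is the number of points whose
    height above the cell's local ground falls into bin b.
    """
    pts = np.asarray(points, dtype=float)
    # Index of the ground-plane (x, y) cell that each point falls in.
    cells = np.floor(pts[:, :2] / cell_size).astype(int)
    grid = {}
    for cell in np.unique(cells, axis=0):
        in_cell = np.all(cells == cell, axis=1)
        z = pts[in_cell, 2]
        ground = z.min()  # assumption: the lowest point is the local ground
        bins = ((z - ground) / bin_height).astype(int)
        bins = np.minimum(bins, num_bins - 1)  # clamp tall points into the top bin
        grid[(int(cell[0]), int(cell[1]))] = np.bincount(bins, minlength=num_bins)
    return grid


if __name__ == "__main__":
    pts = [[0.1, 0.1, 0.0], [0.2, 0.3, 0.5], [0.4, 0.4, 1.6], [1.5, 0.2, 2.0]]
    for key, hist in strip_histogram_grid(pts, 1.0, 1.0, 3).items():
        print(key, hist.tolist())
```

Each vertical histogram can then be compared with its neighbors to grow segments, which is the role the excerpt assigns to the grid.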
  6. Challenges of Multi-View Stereo
  7. Challenges of Multi-View Stereo: Scene Ambiguity
  8. Challenges of Multi-View Stereo: Scene Ambiguity
  9. Challenges of Multi-View Stereo: Scene Ambiguity; Scene Uncertainty [Figure panels (a) to (e)]
  10. Probabilistic 3-D Volumetric Model (PVM): a probabilistic representation of 3-D scenes based on volumetric units (voxels). [Figure labels: camera C, projection ray R_X, image I, pixel intensity I_X, voxel X, volume V, surface S, and the intensity model P(I_X | V = X′).] (Pollard and Mundy, 2007)
  11. Probabilistic 3-D Volumetric Modeling [Figure: same labels as slide 10]
  12. Probabilistic 3-D Volumetric Modeling: the surface probability is given by online Bayesian learning,
      $$P^{N+1}(X \in S) = P^{N}(X \in S)\,\frac{p^{N}(I_X^{N+1} \mid X \in S)}{p^{N}(I_X^{N+1})}$$
      [Figure: same labels as slide 10]
  13. Probabilistic 3-D Volumetric Modeling: Update Using Information Along a Projection Ray. The probability that a voxel contains the observed surface depends on how well the Gaussian mixture (1) at that voxel explains the intensity observed in image N+1 compared with the other voxels along the projection ray. The occupancy probabilities are updated as
      $$P^{N+1}(X \in S) = P^{N}(X \in S)\,\frac{p^{N}(I_X^{N+1} \mid X \in S)}{p^{N}(I_X^{N+1})} \qquad (3)$$
      $$= P^{N}(X \in S)\,\frac{\sum_{X' \in R_X} p^{N}(I_X^{N+1} \mid V = X')\,P^{N}(V = X' \mid X \in S)}{\sum_{X' \in R_X} p^{N}(I_X^{N+1} \mid V = X')\,P^{N}(V = X')} \qquad (4)$$
      A term-by-term explanation of Eq. (4):
      • $p^{N}(I_X^{N+1} \mid V = X')$ is computed using the mixture of Gaussians stored at voxel X′.
      • The probability that voxel X′ produces the color in the image is interpreted geometrically: a voxel produces the intensity seen in the image if it is a surface element and is not occluded by other voxels along the ray. Thus,
        $$P^{N}(V = X') = P^{N}(X' \in S)\,P^{N}(X' \text{ is not occluded}) \qquad (5)$$
        The probability that X′ is not occluded is the probability that all voxels between X′ and the sensor are empty.
      [Figure: same labels as slide 10]
  14. Probabilistic 3-D Volumetric Modeling: Every Voxel Contains Appearance Information. [Highlighted: the probability of the observed intensity, given that the voxel produced the color seen in the image.] The appearance at each voxel is modeled with a Gaussian mixture,
      $$p(I) = \sum_{k=1}^{3} \frac{w_k}{W}\,\frac{1}{\sqrt{2\pi\sigma_k^2}}\exp\!\left(-\frac{(I-\mu_k)^2}{2\sigma_k^2}\right) \qquad (1)$$
      where I is the grey-scale intensity (it may also be considered a vector with several color components); $\mu_k$, $\sigma_k$, and $w_k$ are the mean, standard deviation, and mixing weight of the k-th component; and W is the sum of $w_k$ over all k. The mixture parameters are learned using a modified online Expectation Maximization (EM) algorithm similar to that used in background modeling [45].
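Eq. (1) is the quantity each voxel contributes to the ray update. A minimal Python sketch that only evaluates the density, assuming grey-scale intensities; the function name is hypothetical, and the online EM learning of the parameters is not shown.

```python
import numpy as np


def mixture_likelihood(intensity, weights, means, sigmas):
    """Evaluate Eq. (1): p(I) = sum_k (w_k / W) N(I; mu_k, sigma_k^2).

    intensity may be a scalar or an array; weights, means, and sigmas hold
    one entry per mixture component (three in the paper).
    """
    w = np.asarray(weights, dtype=float)
    mu = np.asarray(means, dtype=float)
    s = np.asarray(sigmas, dtype=float)
    W = w.sum()  # normalizer: the sum of the mixing weights
    # Add a trailing axis so every intensity is compared with every component.
    I = np.asarray(intensity, dtype=float)[..., None]
    gauss = np.exp(-((I - mu) ** 2) / (2.0 * s**2)) / np.sqrt(2.0 * np.pi * s**2)
    return np.sum((w / W) * gauss, axis=-1)


if __name__ == "__main__":
    # A three-component model, evaluated at one intensity.
    print(float(mixture_likelihood(0.5, [2.0, 1.0, 1.0], [0.2, 0.5, 0.8], [0.05, 0.1, 0.2])))
```

Dividing by W means the weights need not be normalized in storage, matching the role of W in Eq. (1).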
  15. Probabilistic 3-D Volumetric Modeling. [Figure: enlarged excerpt of the update equations (3) to (5), highlighting the probability that the voxel produced the color seen in the image.] The excerpt completes the occlusion term, namely
      $$P^{N}(X' \text{ is not occluded}) = \prod_{X'' < X'} \bigl(1 - P^{N}(X'' \in S)\bigr) \qquad (6)$$
      where $X'' < X'$ ranges over the voxels between X′ and the sensor, and notes that the term $P^{N}(V = X' \mid X \in S)$ is computed analogously to $P^{N}(V = X')$.
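A compact way to see how Eqs. (3) to (6) interact is to update every voxel on one ray at once. This Python sketch is ours, not the authors' implementation. It assumes voxels are ordered from the camera outward and that the appearance likelihoods come from each voxel's mixture (Eq. (1)). Because the slide's description of $P^{N}(V = X' \mid X \in S)$ is cut off, it also assumes that conditioning on $X \in S$ makes X itself a surface, so voxels behind X cannot produce the pixel. Any contribution from outside the volume is ignored, matching the sum over $R_X$ in Eq. (4).

```python
import numpy as np


def update_ray(surface_prob, appearance_lik):
    """One online Bayesian update of P(X in S) for the voxels along a ray.

    surface_prob[i]:   P^N(X_i in S), voxels ordered from the camera outward.
    appearance_lik[i]: p^N(I_X^{N+1} | V = X_i), from voxel X_i's mixture.
    Returns P^{N+1}(X_i in S) for every voxel on the ray.
    """
    P = np.asarray(surface_prob, dtype=float)
    p = np.asarray(appearance_lik, dtype=float)
    # Eq. (6): X_i is unoccluded if every voxel in front of it is empty.
    vis = np.concatenate(([1.0], np.cumprod(1.0 - P)[:-1]))
    # Eq. (5): P^N(V = X_i) = P^N(X_i in S) * P^N(X_i not occluded).
    p_visible = P * vis
    # Denominator of Eq. (4): sum over the ray of p(I | V = X') P(V = X').
    contrib = p * p_visible
    denom = contrib.sum()
    # Numerator of Eq. (4) under the stated assumption: voxels in front of
    # X_i keep their usual visibility, X_i is seen with probability vis[i],
    # and voxels behind X_i are hidden by it.
    in_front = np.concatenate(([0.0], np.cumsum(contrib)[:-1]))
    numer = in_front + vis * p
    return P * numer / denom


if __name__ == "__main__":
    # The middle voxel's appearance model explains the pixel far better.
    print(np.round(update_ray([0.5, 0.5, 0.5], [0.1, 10.0, 0.1]), 3))
```

In the example, the middle voxel's surface probability rises to roughly 0.99, the voxel in front of it drops to about 0.02, and the voxel behind it stays near 0.5, because it is mostly hidden and therefore receives little evidence either way.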
  16. Spatial Optimization: Octree [Figure labels: empty space, surface]
  17. Spatial Optimization: Octree [Figure labels: empty space, surface]
  18. Spatial Optimization: Octree [Figure: appearance distributions, p(intensity) vs. intensity] (Crispell, Mundy and Taubin, 2011; Miller, Jain and Mundy, 2011)
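The octree slides are pictorial. As a rough illustration of why an octree saves space here, the following Python sketch refines a cell only where a toy surface probability exceeds a threshold, so large empty regions stay as single coarse leaves. The refinement rule, the threshold, and the names `Cell`, `refine`, and `count_leaves` are our assumptions, not the cited authors' scheme.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Cell:
    origin: Tuple[float, float, float]  # minimum corner of the cube
    size: float                         # edge length
    depth: int
    children: List["Cell"] = field(default_factory=list)


def refine(cell: Cell, surface_prob: Callable[[Cell], float],
           threshold: float, max_depth: int) -> Cell:
    """Split cells that likely contain surface; leave empty space coarse."""
    if cell.depth < max_depth and surface_prob(cell) > threshold:
        half = cell.size / 2.0
        x, y, z = cell.origin
        for dx in (0.0, half):
            for dy in (0.0, half):
                for dz in (0.0, half):
                    child = Cell((x + dx, y + dy, z + dz), half, cell.depth + 1)
                    cell.children.append(refine(child, surface_prob, threshold, max_depth))
    return cell


def count_leaves(cell: Cell) -> int:
    return 1 if not cell.children else sum(count_leaves(c) for c in cell.children)


if __name__ == "__main__":
    # Toy scene: a horizontal surface at z = 0.3 inside the unit cube.
    def plane_prob(c: Cell) -> float:
        return 1.0 if c.origin[2] <= 0.3 <= c.origin[2] + c.size else 0.0

    root = refine(Cell((0.0, 0.0, 0.0), 1.0, 0), plane_prob, 0.5, max_depth=3)
    print(count_leaves(root), "leaves vs.", 8 ** 3, "cells in a full grid")  # 148 vs. 512
```

Only the cells straddling the surface reach the finest level; in a real PVM each leaf would also carry its surface probability and appearance mixture.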
  19. Probabilistic 3-D Volumetric Modeling. Demo: https://vimeo.com/43729866
  20. Geometry and Appearance. Demo: https://vimeo.com/43690883 and https://vimeo.com/45322168
  21. Expected Appearance Volume Model (EVM): each voxel's expected appearance is $E(I_X \mid V = X')\,P(X' \in S)$.
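Because the mean of a Gaussian mixture is the weight-averaged mean of its components, the EVM value follows directly from the appearance model of Eq. (1) and the surface probability. A Python sketch, assuming the per-voxel mixture parameters and surface probabilities are available as dense arrays (a layout chosen for illustration, not the octree storage used in the paper):

```python
import numpy as np


def expected_appearance(weights, means, surface_prob):
    """EVM value per voxel: E(I_X | V = X') * P(X' in S).

    weights, means: (..., K) mixture parameters stored at each voxel.
    surface_prob:   (...) surface probability of each voxel.
    """
    w = np.asarray(weights, dtype=float)
    mu = np.asarray(means, dtype=float)
    # Mean of a Gaussian mixture = weighted average of its component means.
    expected_intensity = np.sum(w * mu, axis=-1) / np.sum(w, axis=-1)
    return expected_intensity * np.asarray(surface_prob, dtype=float)


if __name__ == "__main__":
    # Two voxels: a bright, likely surface and a darker, uncertain one.
    print(expected_appearance([[1, 1], [3, 1]], [[0.2, 0.8], [0.0, 1.0]], [1.0, 0.5]))
```

Weighting by the surface probability makes empty voxels contribute values near zero regardless of their stored appearance.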
  22. Object Categorization: Bag of Volumetric Words. Categories: parking lot, car, plane, building, house. Pipeline: input (EVM) → dense sampling → feature descriptor (PCA or Taylor) → volumetric vocabulary (k-means) → classifier (naive Bayes).
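In a bag-of-words model, each object is summarized by how often each vocabulary word appears in it; these are the counts $c_{ij}$ used by the naive Bayes classifier later in the talk. A minimal Python sketch, assuming hard nearest-center assignment under Euclidean distance (our choice for illustration; `word_counts` is a hypothetical name):

```python
import numpy as np


def word_counts(features, vocabulary):
    """Bag-of-volumetric-words histogram for one object.

    features:   (n, d) descriptors (e.g., PCA or Taylor features) sampled
                densely from the object's EVM neighborhoods.
    vocabulary: (k, d) cluster centers learned with k-means.
    Returns c, where c[j] counts how many features are closest to word j.
    """
    f = np.asarray(features, dtype=float)
    v = np.asarray(vocabulary, dtype=float)
    # Squared Euclidean distance from every feature to every word.
    d2 = ((f[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1)
    nearest = d2.argmin(axis=1)
    return np.bincount(nearest, minlength=len(v))


if __name__ == "__main__":
    print(word_counts([[1, 0], [0, 1], [9, 9]], [[0, 0], [10, 10]]).tolist())  # [2, 1]
```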
  23. Experiments: Data Collection. http://vision.lems.brown.edu/project_desc/Object-Recognition-in-Probabilistic-3D-Scenes
  24. Experiments: Train and Test Sites. Sites 1, 2, 3, 5, 6, 7, 8, 10, 11, 12, 16, 18, 21, 22, 23, 25, 26, and 27. http://vision.lems.brown.edu/project_desc/Object-Recognition-in-Probabilistic-3D-Scenes
  25. Experiments: The Input. Camera matrices were recovered using Bundler: Snavely, N., Seitz, S., and Szeliski, R. (2006). Photo tourism: exploring photo collections in 3D. ACM Transactions on Graphics.
  26. Feature Description. Global features: spherical harmonics (D. Saupe and D. V. Vranić, 2001); Zernike moments (M. Novotni and R. Klein, 2003). Local features: spin images (Johnson and Hebert, 1999); 3-D shape context (Frome et al., 2004); 3-D SURF (Knopp et al., 2010). [Figure: excerpts from the cited papers, including a multi-resolution spherical-harmonics representation (original, 162 harmonics, 242 harmonics), surface normals and spin images, and 3D SURF features detected on a voxelized shape.]
  27. Feature Formation: Volumetric Form of Voxel Neighborhoods → Vector Form of Voxel Neighborhoods; voxel values $E(I_X \mid V = X')\,P(X' \in S)$
  28. Feature Formation: Volumetric Form of Voxel Neighborhoods → Vector Form of Voxel Neighborhoods; voxel values $E(I_X \mid V = X')\,P(X' \in S)$
  29. Feature Formation: Volumetric Form of Voxel Neighborhoods → Vector Form of Voxel Neighborhoods; voxel values $E(I_X \mid V = X')\,P(X' \in S)$
  30. Feature Formation: Volumetric Form of Voxel Neighborhoods → Vector Form of Voxel Neighborhoods; voxel values $E(I_X \mid V = X')\,P(X' \in S)$
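Turning each volumetric neighborhood into a vector is a reshape of a sliding 3-D window. A Python sketch, assuming a dense regular grid (the paper stores the EVM in an octree) and only neighborhoods that lie fully inside the volume; `neighborhood_vectors` is a hypothetical name, and `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def neighborhood_vectors(volume, radius):
    """Flatten every full (2r+1)^3 neighborhood of a dense 3-D array into a row."""
    size = 2 * radius + 1
    # windows[i, j, k] is the cube volume[i:i+size, j:j+size, k:k+size].
    windows = sliding_window_view(np.asarray(volume, dtype=float), (size, size, size))
    return windows.reshape(-1, size**3)


if __name__ == "__main__":
    vol = np.arange(125, dtype=float).reshape(5, 5, 5)
    print(neighborhood_vectors(vol, 1).shape)  # (27, 27): 27 centers, 27 values each
```

Each row is the d-dimensional feature vector x that the PCA and Taylor descriptors operate on.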
  31. Feature Description: PCA Features. The principal axes are obtained by the eigenvalue decomposition (EVD) of the sample scatter matrix S. In the PCA space, every neighborhood (represented by a d-dimensional feature vector x) can be expressed exactly as $x = \bar{x} + \sum_{i=1}^{d} a_i e_i$, where the $e_i$ are the principal axes associated with the d eigenvalues and the $a_i$ are the corresponding coefficients. A k-dimensional (k < d) approximation of the neighborhoods is obtained by using the first k principal components, i.e., $\tilde{x} = \bar{x} + \sum_{i=1}^{k} a_i e_i$. Section V of the paper analyzes the reconstruction error of local neighborhoods, $|x - \tilde{x}|^2$, as a function of dimension and training-set size. The vector of projection coefficients in the PCA space is referred to as a PCA feature. [Figure: a neighborhood in d-dimensional space approximated in a 1-dimensional space, $x \approx \bar{x} + a_1 e_1$.]
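The PCA feature described above can be sketched in a few lines of Python. This is an illustration with hypothetical function names, assuming the training neighborhoods are the rows of a matrix X; it follows the slide's recipe (EVD of the scatter matrix, projection on the first k axes) rather than any particular library's PCA.

```python
import numpy as np


def fit_pca(X):
    """Principal axes from the EVD of the sample scatter matrix of the rows of X."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean
    scatter = centered.T @ centered              # sample scatter matrix S
    eigvals, eigvecs = np.linalg.eigh(scatter)   # S is symmetric; ascending order
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    return mean, eigvals[order], eigvecs[:, order]


def pca_feature(x, mean, axes, k):
    """Coefficients a_1..a_k of x on the first k principal axes (the PCA feature)."""
    return (np.asarray(x, dtype=float) - mean) @ axes[:, :k]


def reconstruct(a, mean, axes):
    """x~ = x_bar + sum_i a_i e_i, using as many axes as there are coefficients."""
    k = a.shape[-1]
    return mean + a @ axes[:, :k].T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))
    mean, _, axes = fit_pca(X)
    for k in (1, 3, 6):
        err = np.sum((X - reconstruct(pca_feature(X, mean, axes, k), mean, axes)) ** 2)
        print(k, round(float(err), 6))  # error shrinks as k grows; zero at k = d
```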
  32. Feature Description: Taylor Features. The computation of derivatives in the EVM can be expressed as a least-squares minimization of the energy
      $$E = \sum_{i=-n_i}^{n_i} \sum_{j=-n_j}^{n_j} \sum_{k=-n_k}^{n_k} \bigl(V(i,j,k) - \tilde{V}(i,j,k)\bigr)^2,$$
      where V is the volume of expected 3-D appearances and $\tilde{V}(i,j,k)$ is its Taylor series approximation at the point (i, j, k). Using the second-degree Taylor expansion about (0, 0, 0), the energy becomes
      $$E = \sum_{x} \Bigl(V(x) - V_0 - x^{T}G - \tfrac{1}{2!}\,x^{T}Hx\Bigr)^2,$$
      where $V_0$, G, and H are the zeroth derivative, the gradient vector, and the Hessian matrix of the volume of expected 3-D appearances about (0, 0, 0), respectively. The coefficients of the 3-D derivative operators are found by minimizing E with respect to the zeroth, first, and second derivatives; the computed derivative operators are then applied algebraically to neighborhoods in the EVM.
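Because E is quadratic in $V_0$, G, and H, its minimizer is a fixed linear map of the neighborhood values, which is one way to read "applied algebraically" above. This Python sketch assumes a cubic neighborhood ($n_i = n_j = n_k = r$) with unit voxel spacing; the function names are hypothetical.

```python
import numpy as np


def taylor_operators(radius):
    """Least-squares operators for V0, G and H on a (2r+1)^3 neighborhood.

    Returns a (10, (2r+1)^3) matrix whose rows give, in order:
    V0, Gx, Gy, Gz, Hxx, Hyy, Hzz, Hxy, Hxz, Hyz.
    """
    r = np.arange(-radius, radius + 1, dtype=float)
    x, y, z = (c.ravel() for c in np.meshgrid(r, r, r, indexing="ij"))
    # Columns follow V0 + x^T G + (1/2!) x^T H x with H symmetric, so each
    # off-diagonal term such as Hxy multiplies x*y once.
    A = np.stack([np.ones_like(x), x, y, z,
                  0.5 * x * x, 0.5 * y * y, 0.5 * z * z,
                  x * y, x * z, y * z], axis=1)
    # The minimizer of E is pinv(A) @ v, so pinv(A) is a fixed linear operator.
    return np.linalg.pinv(A)


def taylor_features(neighborhood, operators):
    """Apply the operators to one neighborhood indexed as [x, y, z]."""
    return operators @ np.asarray(neighborhood, dtype=float).ravel()


if __name__ == "__main__":
    r = np.arange(-1, 2, dtype=float)
    x, y, z = np.meshgrid(r, r, r, indexing="ij")
    V = 2 + 3 * x - y + 2 * x**2 + x * y  # V0=2, G=(3,-1,0), Hxx=4, Hxy=1
    print(np.round(taylor_features(V, taylor_operators(1)), 6))
```

The operator matrix is computed once per neighborhood size and then reused for every voxel, which is what makes the Taylor features cheap to extract.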
  33. Learning the Codebook. Learn a volumetric vocabulary using k-means clustering:
      ✤ Determine the best number of means: heuristically
      ✤ Convergence depends on initialization: P. S. Bradley and U. M. Fayyad, 1998
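The slide notes that k-means convergence depends on initialization and cites Bradley and Fayyad's refinement method. The Python sketch below does not implement that method; it substitutes a simpler, commonly used safeguard (several random restarts, keeping the result with the lowest within-cluster sum of squares). All names and defaults are illustrative.

```python
import numpy as np


def kmeans(X, k, n_init=5, n_iter=100, seed=0):
    """Lloyd's k-means with random restarts; returns (centers, labels)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    best = None
    for _restart in range(n_init):
        # Initialization: k distinct data points chosen at random.
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _step in range(n_iter):
            # Assignment step: nearest center by squared Euclidean distance.
            labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
            # Update step: move each center to the mean of its points
            # (an empty cluster keeps its previous center).
            new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                    else centers[j] for j in range(k)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
        sse = float(((X - centers[labels]) ** 2).sum())  # within-cluster sum of squares
        if best is None or sse < best[0]:
            best = (sse, centers, labels)
    return best[1], best[2]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(10, 0.1, (50, 2))])
    centers, _ = kmeans(X, 2)
    print(np.round(centers[np.argsort(centers[:, 0])], 2))
```

The number of means k still has to be chosen separately; the talk varies it from 2 to 100 in the experiments.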
  34. Vocabulary: Twenty Volumetric Words (PCA-based and Taylor-based)
  35. Learning Class Distributions. Let the vocabulary of 3-D expected appearance patterns be $V = \bigcup_{i=1}^{k} v_i$, where k is the number of cluster centers in the vocabulary, and let the set of all objects be $O = \bigcup_{l=1}^{N_c} O_l$, where $O_l$ contains the objects with class label l and $N_c$ is the number of classes. From the quantization step, a count $c_{ij}$ is obtained of the number of times cluster center $v_i$ occurs in object $o_j$. Using Bayes' formula, the a posteriori class probability is
      $$P(C_l \mid o_i) \propto P(o_i \mid C_l)\,P(C_l) \qquad (8)$$
      The likelihood of an object is the product of the likelihoods of the independent entries of the vocabulary, $P(v_j \mid C_l)$, which are estimated during learning. The full expression for the class posterior becomes
      $$P(C_l \mid o_i) \propto P(C_l) \prod_{j=1}^{k} P(v_j \mid C_l)^{c_{ji}} \qquad (9)$$
      $$\propto P(C_l) \prod_{j=1}^{k} \left( \frac{\sum_{m:\,o_m \in O_l} c_{jm}}{\sum_{n=1}^{k} \sum_{m:\,o_m \in O_l} c_{nm}} \right)^{c_{ji}} \qquad (10)$$
      For classification, the class label with the highest a posteriori probability is chosen.
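Eqs. (9) and (10) describe a multinomial naive Bayes classifier. The Python sketch below works in log space to avoid numerical underflow. The Laplace smoothing parameter `alpha` is our addition (Eq. (10) corresponds to alpha = 0, which would give log(0) for any word never seen in a class), and the class priors are assumed to be the training-set class frequencies; the function names are hypothetical.

```python
import numpy as np


def train_naive_bayes(counts, labels, n_classes, alpha=1.0):
    """Estimate log P(C_l) and log P(v_j | C_l) from word counts.

    counts: (n_objects, k); counts[i, j] = c_ji, occurrences of word v_j in object o_i.
    labels: class index l of each training object.
    alpha:  Laplace smoothing (our addition; must be > 0 in this sketch).
    """
    counts = np.asarray(counts, dtype=float)
    labels = np.asarray(labels)
    # Assumption: priors are the class frequencies in the training set.
    priors = np.array([np.mean(labels == cls) for cls in range(n_classes)])
    # Eq. (10): total count of each word over the objects of class l ...
    totals = np.array([counts[labels == cls].sum(axis=0) for cls in range(n_classes)])
    # ... normalized by all word counts of that class (plus smoothing).
    word_probs = (totals + alpha) / (totals + alpha).sum(axis=1, keepdims=True)
    return np.log(priors), np.log(word_probs)


def classify(counts, log_priors, log_word_probs):
    """Return the class with the highest log posterior, the log of Eq. (9)."""
    counts = np.asarray(counts, dtype=float)
    scores = counts @ log_word_probs.T + log_priors
    return scores.argmax(axis=-1)


if __name__ == "__main__":
    train = [[5, 1, 0], [4, 0, 1], [0, 5, 2], [1, 6, 1]]
    log_p, log_w = train_naive_bayes(train, [0, 0, 1, 1], n_classes=2)
    print(classify([[6, 0, 1], [0, 4, 1]], log_p, log_w).tolist())  # [0, 1]
```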
  36. Classification: Bayes Classifier. [Figure: excerpt of the class-posterior equations (8) to (10); the object is assigned the class label with the highest a posteriori probability.]
  37. Learning Class Distributions [Figure: excerpt of equations (8) to (10), annotated "Train"]
  38. Learning Class Distributions [Figure: excerpt of equations (8) to (10), annotated "Train" and "Test"]
  39. Results: PCA Classes (Buildings, Planes)
  40. Results: Taylor Classes (Buildings, Planes)
  41. Experiments: Number of Objects (18 probabilistic sites).
      Table 2: Number of objects in every category.
                 Planes   Cars   Houses   Buildings   Parking Lots
        Train      18      54      61        24            27
        Test       16      29      45        15            17
      Two measurements were used to evaluate classification performance: (i) classifier accuracy (i.e., the fraction of correctly classified objects) and (ii) the confusion matrix. During the classification experiments, the number of clusters in the codebook was varied from k = 2 to k = 100. Figure 4 of the paper presents classification accuracy as a function of the number of clusters. For both Taylor-based and PCA-based features, …
  42. Results: Classification Accuracy
  43. Results: Confusion Matrix. Fig. 9: Confusion matrix for a 20-keyword codebook of PCA-based features (a, left) and Taylor-based features (b, right). Columns give the true class.
      (a) PCA
                       Plane   House   Building   Car    Parking Lot
        Plane          0.86    0.02    0.00      0.03    0.00
        House          0.00    0.67    0.27      0.00    0.12
        Building       0.00    0.31    0.67      0.00    0.00
        Car            0.00    0.00    0.07      0.93    0.00
        Parking Lot    0.14    0.00    0.00      0.03    0.88
      (b) Taylor
                       Plane   House   Building   Car    Parking Lot
        Plane          0.86    0.02    0.00      0.03    0.00
        House          0.00    0.64    0.27      0.00    0.12
        Building       0.00    0.33    0.67      0.00    0.00
        Car            0.00    0.00    0.07      0.86    0.00
        Parking Lot    0.14    0.00    0.00      0.10    0.88
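Both evaluation measures from slide 41 are simple to compute from true and predicted labels. In Fig. 9 each column sums to approximately 1 (for example, the PCA plane column is 0.86 + 0.14), so this Python sketch normalizes each true-class column; the function names are hypothetical.

```python
import numpy as np


def accuracy(true, pred):
    """Fraction of correctly classified objects."""
    return float(np.mean(np.asarray(true) == np.asarray(pred)))


def confusion_matrix(true, pred, n_classes):
    """Entry [r, c] = fraction of objects of true class c assigned to class r.

    Each column sums to 1, matching the layout of Fig. 9.
    """
    m = np.zeros((n_classes, n_classes))
    for t, p in zip(true, pred):
        m[p, t] += 1
    return m / m.sum(axis=0, keepdims=True)


if __name__ == "__main__":
    true = [0, 0, 1, 1, 1]
    pred = [0, 1, 1, 1, 0]
    print(accuracy(true, pred))  # 0.6
    print(confusion_matrix(true, pred, 2))
```

Normalizing per true class keeps the matrix comparable across categories with very different test counts, such as the 15 test buildings and 45 test houses in Table 2.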
  44. Future Work
      ✴ Evaluate the effectiveness of the EVM by performing classification tasks on different underlying 3-D reconstruction algorithms.
      ✴ Evaluate the performance of additional feature descriptors.
      ✴ Explore algorithms for detection.
  45. Effectiveness of Probabilistic Volumetric Learning (Y. Furukawa and J. Ponce, 2010)
  46. Effectiveness of Probabilistic Volumetric Learning [Panels: Probabilistic 3-D Modeling; Threshold-Based 3-D Modeling]
  47. Effectiveness of Probabilistic Volumetric Learning
