  1. 1. CVPR 2004 TUTORIAL ON NONLINEAR MANIFOLDS IN COMPUTER VISIONAnuj Srivastava∗ , Washington Mio∗∗ and Xiuwen Liu∗∗∗ ∗ Department of Statistics ∗∗ Department of Mathematics ∗∗∗ Department of Computer Science Florida State University Sunday, June 27th , 1:00-5:00pm, Washington DC.
  2. 2. PROGRAM OVERVIEWPart I: Motivation (1:00 – 2:00pm)presented by Anuj Srivastava— Why should one study nonlinear manifolds?— How can differential geometry be useful in vision applications?— Examples, illustrations, and referencesPart II: Tools from Differential Geometry (2:00 – 3:30pm)presented by Washington Mio— Differentiable manifolds, definitions and examples— Tangent vectors and tangent spaces— Riemannian metrics, gradients, geodesics, exponential map— Integral curves and flows
  3. 3. Part III: Statistics on Nonlinear Manifolds (3:30 – 4:00pm)presented by Anuj Srivastava— Intrinsic and Extrinsic means— Covariances in tangent spaces— Probability distributions on manifolds— Estimation and testing on manifolds, with examplesPart IV: Algorithms and Applications in Computer Vision (4:00 – 5:00pm)presented by Xiuwen Liu— Statistics on a unit circle— Geodesic flows on shape spaces— Optimal linear bases for image-based recognition— Clustering on shape manifolds
  4. 4. NONLINEAR MANIFOLDS• What are nonlinear manifolds?: Nonlinear manifolds are manifolds that are not vector spaces, i.e. if x, y ∈ M, then ax + by may not be in M , for a pair a, b ∈ R [1, 14]. The usual Euclidean calculus may not apply. Conventional statistics may not apply. (More later in Part II)• When do we need them?: In applications where certain constraints make the underlying space nonlinear. For example, elements of Rn with unit norm constraint. Or, matrices with orthogonality constraint.• How do we deal with them?: Using differential geometry of the underlying manifold, one can usually derive most of the corresponding results from Euclidean spaces.
  5. 5. MOTIVATION – OUTLINE A large number of problems in computer vision can be studied as problems in optimization or inference on nonlinear manifolds. We will describe some examples: • Pose estimation and recognition of 3D objects from static 2D images. • Tracking of 3D objects using video sequences. • Component Analysis: Is PCA better than ICA? Or vice-versa? – Search for an optimal linear projection for reducing image size, before a statistical analysis. Non-negative matrix factorizations, sparse representations, etc. are other examples.
  6. 6. • Statistical study of shapes: shapes of 2D and 3D objects can be represented via landmarks, curves, or surfaces. • Learning image manifolds: estimation and learning of manifolds formed by images embedded in a bigger Euclidean space. • Structure from motion, shape from shading, etc.
  7. 7. PROBLEM 1: POSE ESTIMATIONWe are interested in recognizing known objects in observed images. Consider asimple sub-problem from “transformable templates”.Goal: Given a 3D model of an object (truck in the figure), and a noisy image I , how toestimate its pose (orientation) in that image?Mathematical Formulation:Let I α be a 3D mesh of the object (e.g. facial scan). Orientation of an object can be
  8. 8. represented by a 3 × 3 orthogonal matrix [6]. The space of all such matrices is: SO(3) = {O ∈ R3×3 |OT O = I3 , det(O) = 1} . Figure 1: A truck rendered at different orientations in SO(2).Notation: Let T (sI α ) denote an image of an object α taken from a pose s ∈ SO(3).T is a projection from R3 to R2 .Image Model: Let a simple image model be: I = T (sI α ) ⊕ clutter.
  9. 9. Problem: Given an image I, estimate s ∈ SO(3) according to: ŝα = argmax_{s∈SO(3)} P(I|s, α) (MLE). SO(3) is not a vector space. However, it is a Lie group! How to solve optimization problems on Lie groups? Given a function F : SO(n) → R, we need tools to solve optimization problems on SO(n). We need definitions of derivatives (gradients), Hessians, increments, etc. [10, 5, 3, 15]
  10. 10. GRADIENT-BASED OPTIMIZATION In R^n, one can seek a local optimum of a function F using a gradient process: dX(t)/dt = ∇F(X(t)), or its discrete implementation, for ε > 0 small, X(t + ε) = X(t) + ε ∇F(X(t)). Addition is not a valid operation in SO(3)! One has to use the differential geometry of SO(3) to construct gradient flows.
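The Euclidean update above cannot be used verbatim on SO(3). A minimal numpy/scipy sketch (ours, not from the tutorial) of what replaces it: project the ambient gradient of F onto the tangent space at X, which consists of matrices of the form XA with A skew-symmetric, and step along the manifold with the matrix exponential instead of adding.

    import numpy as np
    from scipy.linalg import expm

    def so3_gradient_step(X, egrad, eps=1e-2):
        """One gradient-ascent step for F on SO(3).

        X     : current rotation (3x3 orthogonal, det = +1)
        egrad : gradient of F in the ambient space R^{3x3}, evaluated at X
        eps   : step size
        """
        # Tangent projection: tangent vectors at X have the form X*A with A skew-symmetric.
        A = 0.5 * (X.T @ egrad - egrad.T @ X)
        # Move along the geodesic through X with velocity X*A (no addition needed).
        return X @ expm(eps * A)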
  11. 11. OBJECT RECOGNITION Goal: Given a noisy image I of an object, find the most probable object. Several strategies. One is to define the posterior probability: P(α|I) = P(α) ∫_S P(I|s, α) P(s|α) ds. S can include pose, illumination, motion, etc. S is the set of nuisance variables in recognition. Solve for: α̂ = argmax_α P(α|I). This requires tools for integration on a Lie group.
  12. 12. PROBLEM 2: OBJECT MOTION TRACKING Goal: Given a time series of sensor measurements (images, range measurements), estimate a moving object’s positions and orientations at each time. Representation: Rotation and translation are treated as elements of the special Euclidean group SE(3) = SO(3) ⋉ R^3. Consider the stochastic process {st ∈ SE(3), t = 1, 2, . . . , T}, and the goal is to estimate the time series {st} using observations. Solution: If st were in a Euclidean space, then the solution comes from filtering, smoothing, or prediction.
  13. 13. Classical Filtering: For Euclidean case,Use a state equation and an observation equation: State equation : st+1 = A(st ) + µt Observation equation : It+1 = B(st+1 ) + νtThen, find the mean and covariance associated with the posterior P (st |I1 , I2 , . . . , It ) .If A and B are linear functions, (and µt , νt are independent Gaussian), then Kalmanfiltering provides exact solution. Else, use Monte Carlo methods for approximatesolutions.
  14. 14. Nonlinear Manifolds: Kalman filtering does not apply, but Monte Carlo methods do! Use nonlinear filtering equations: Under the usual Markov assumptions, Predict: P(st+1|I[1:t]) = ∫_{st} P(st+1|st) P(st|I[1:t]) dst; Update: P(st+1|I[1:t+1]) = P(It+1|st+1) P(st+1|I[1:t]) / P(It+1|I[1:t]). Use a Monte Carlo method (condensation, jump-diffusion, etc.) to solve for the posterior mean, covariance, etc., at each time. We need tools to sample from probability distributions and to compute averages from these samples on nonlinear manifolds. Statistics on the circle: [9], statistics on SO(n): [4], statistics on Grassmann manifolds: [16]
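In practice the predict/update integrals are approximated with samples. A minimal sketch of one sequential importance resampling (particle filter) step; sample_transition and likelihood are hypothetical placeholders for the state and observation models, and the particles themselves may live on SE(3) or any other manifold.

    import numpy as np

    def particle_filter_step(particles, weights, observation,
                             sample_transition, likelihood):
        """One predict/update/resample step of a generic particle filter."""
        # Predict: push every particle through the state equation.
        particles = [sample_transition(s) for s in particles]
        # Update: reweight by the observation likelihood P(I_{t+1} | s_{t+1}).
        w = np.array([wi * likelihood(observation, s)
                      for wi, s in zip(weights, particles)])
        w /= w.sum()
        # Resample to avoid weight degeneracy.
        idx = np.random.choice(len(particles), size=len(particles), p=w)
        return [particles[i] for i in idx], np.full(len(particles), 1.0 / len(particles))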
  15. 15. PROBLEM 3: OPTIMAL LINEAR PROJECTIONS • Raw images are elements of a high-dimensional space. • One needs to reduce their dimension using a linear or a nonlinear method, before proceeding to a statistical analysis. • In view of their simplicity, linear projections are popular, e.g. PCA, ICA, FDA, etc. • Let X ∈ R^n be a random image (as a column vector), and U ∈ R^{n×d} (d << n) be an orthogonal matrix, that is, U^T U = I_d. Let u(X) = U^T X be a reduced representation of X.
  16. 16. • How to choose U in a particular application? For example, in recognition of people by their facial images, is PCA the best linear projection one can have? • In general, the solution is difficult! However, assume that we have a well-defined performance function F that evaluates the choice of U. For example, F is the recognition performance on the training data. (More in Part IV.) F can also be a measure of sparsity, or of independence of components. • How to solve for: Û = argmax_U F(U)? On what space should this optimization problem be solved? Representation: The set of all n × d orthogonal matrices forms a Stiefel manifold
  17. 17. Sn,d. In case F depends on the subspace but not on the basis chosen to represent the subspace, the search space is a Grassmann manifold Gn,d. These are quotient spaces: Stiefel: Sn,d = SO(n)/SO(n − d), and Grassmann: Gn,d = SO(n)/(SO(n − d) × SO(d)). Solution: Solve for Û = argmax_{U∈Sn,d} F(U) or argmax_{U∈Gn,d} F(U).
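A minimal sketch, under our own assumptions, of one ascent step for F on the Stiefel manifold Sn,d: project the ambient gradient onto the tangent space at U and retract back onto the manifold with a QR decomposition. This is one standard choice of update, not necessarily the rule used by the authors later in Part IV.

    import numpy as np

    def stiefel_gradient_step(U, egrad, eps=1e-2):
        """One ascent step for F on S_{n,d} = {U in R^{n x d} : U^T U = I_d}.

        U     : current point, n x d with orthonormal columns
        egrad : gradient of F in the ambient space R^{n x d} at U
        """
        # Project the ambient gradient onto the tangent space at U.
        sym = 0.5 * (U.T @ egrad + egrad.T @ U)
        xi = egrad - U @ sym                       # tangent vector at U
        # Retract U + eps*xi back onto the manifold (QR-based retraction).
        Q, R = np.linalg.qr(U + eps * xi)
        return Q * np.sign(np.diag(R))             # fix column signs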
  18. 18. Example: X denotes face images and u(X) is used for recognition. Let F be a recognition performance and choose U to maximize F. U is an element of a Grassmann manifold Gn,d [8]. (Figure: recognition rate (%) versus iterations.) The figure shows the evolution of F(Xt), where Xt is a stochastic optimization process on Gn,d. (More in Part IV)
  19. 19. PROBLEM 4: STATISTICAL SHAPE ANALYSIS (Figure: four example images with object boundaries.) We wish to use shapes of boundaries as features in object recognition. Consider shapes in R^2. The important issues are: how to represent shapes, how to compare them, and how to analyze them statistically?
  20. 20. PROCRUSTES APPROACH Representation: Let a shape be represented by k points on its boundary. A shape is an element x ∈ R^{2k} or z ∈ C^k [2, 13]. Shape Preserving Transformations: Under rigid rotation and translation, and uniform scaling, the shape remains unchanged. We need a quantity that uniquely represents a shape. Assuming known registration: • Remove translation by assuming (1/k) Σ_{i=1}^k zi = 0. All shapes are centered. • Remove scale by assuming ||z|| = 1, i.e. all shape vectors lie on a unit sphere. • Remove rotation in a pairwise fashion. For example, to align z^(2) to z^(1), replace it by e^{jθ̂} z^(2), where θ̂ = argmin_{θ∈SO(2)} ||z^(1) − e^{jθ} z^(2)||^2.
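A small numpy sketch of these three normalization steps in complex notation; the optimal rotation has the closed form θ̂ = arg(Σ_i z_i^(1) conj(z_i^(2))). The circle data at the end is a made-up example.

    import numpy as np

    def preshape(z):
        """Center and scale k landmarks given as complex numbers."""
        z = np.asarray(z, dtype=complex)
        z = z - z.mean()                  # remove translation
        return z / np.linalg.norm(z)      # remove scale

    def procrustes_align(z1, z2):
        """Rotate pre-shape z2 onto pre-shape z1; return aligned z2 and the residual."""
        w = np.vdot(z2, z1)               # sum_i z1_i * conj(z2_i)
        theta = np.angle(w)               # optimal rotation angle
        z2_rot = np.exp(1j * theta) * z2
        return z2_rot, np.linalg.norm(z1 - z2_rot)

    t = np.linspace(0, 2 * np.pi, 30, endpoint=False)
    z1 = preshape(np.exp(1j * t))                              # a circle
    z2 = preshape(3.0 * np.exp(1j * (t + 0.7)) + (2 + 1j))     # rotated, scaled, shifted copy
    print(procrustes_align(z1, z2)[1])                         # ~ 0: same shape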
  21. 21. PROCRUSTES APPROACH Contd. Pre-shape Space: C = {z ∈ C^k | (1/k) Σ_{i=1}^k zi = 0, ||z|| = 1}. Shape Space: S = C/SO(2). The shape space is a quotient space; all shapes that are within a planar rotation of each other are considered equivalent. How to quantify dissimilarities between two shapes? Construct a geodesic path between them on S, and compute the geodesic length. It serves as a shape metric.
  22. 22. ANOTHER APPROACH: GEOMETRY OF CLOSED CURVES Goal: To analyze shapes of continuous boundaries of objects as they appear in images [7]. Representation: • Denote shapes by their angle functions θ (with arc-length parametrization). Consider C = {θ ∈ L^2[0, 2π] | ∫_0^{2π} cos(θ(s)) ds = ∫_0^{2π} sin(θ(s)) ds = 0}. • Shape space S = C/S, where S includes the rotation (unit circle) and re-parametrization (unit circle) groups. Solution: Compute exponential flows or geodesics on S to quantify shape differences.
  23. 23. Example. (Figure: geodesic paths between the end shapes.) Geodesic lengths quantify shape differences, and help perform statistical analysis of shapes.
  24. 24. GEODESIC FLOWS ON SHAPE SPACES OF CURVES
  25. 25. PROBLEM 5: LEARNING IMAGE MANIFOLDS Let I(s, α) denote an image of object α at variables (pose, location, illumination, etc.) denoted by s. Let I^α = {I(s, α) | s ∈ S} be the collection of all such images of α. I^α is called the image manifold of α. (Figure: the image manifold sitting inside the image space.)
  26. 26. Complete image manifold: I = ∪_α I^α.
  27. 27. Goal: We are interested in estimating (or learning) I from the given observations. Representation: I is a low-dimensional manifold embedded in a large Euclidean space. We wish to represent individual images by their coefficients in low-dimensional subspaces. Solutions: [12] http://www.cse.msu.edu/~lawhiu/manifold/ 1. Isomap: Denote observed images as points on a discrete lattice. Approximate geodesic lengths on I by shortest graph lengths on this lattice. Perform dimension reduction, dimension estimation, clustering, etc. using this geodesic length [17]. 2. Locally-linear embedding [11]. 3. Pattern-theoretic approach.
  28. 28. CVPR TUTORIAL ON NONLINEAR MANIFOLDS IN COMPUTER VISION II. DIFFERENTIABLE MANIFOLDS Washington Mio, Anuj Srivastava and Xiuwen Liu (Illustrations by D. Badlyans) June 27th, 2004 – Washington, DC
  29. 29. WHY MANIFOLDS?Non-linearity underlies many mathematical structures and representations used incomputer vision. For example, many image and shape descriptors fall in thiscategory, as discussed in the introduction to this tutorial.On the other hand, differential calculus is one of the most useful tools employed inthe study of phenomena modeled on Euclidean n-space Rn . The study ofoptimization and inference problems, dynamics, and geometry on linear spacescan all be approached with the tools of calculus.
  30. 30. A natural question arises:What spaces can the techniques of differential calculus be extended to?Since derivative is a local notion, it is reasonable to expect that differential calculuscan be developed in spaces that locally “look like” Euclidean spaces. These spacesare known as differentiable manifolds.
  31. 31. Manifolds Underlying Data Points (Figure: point clouds in 2D and 3D that concentrate near underlying low-dimensional manifolds.)
  32. 32. For simplicity, we only consider manifolds in Euclidean spaces, but a more generalapproach may be taken. Also, most concepts to be discussed extend toinfinite-dimensional manifolds, whose relevance to shape and vision problemsalready has been indicated in the introduction.Our discussion will be somewhat informal. A few references to more complete andgeneral treatments: • W. Boothby, An Introduction to Differentiable Manifolds and Riemannian Geometry, Academic Press, 2002. • F. Warner, Foundations of Differential Geometry and Lie Groups, Graduate Texts in Mathematics, Springer-Verlag, 1983. • J. Lee, Introduction to Smooth Manifolds, Springer-Verlag, 2002.
  33. 33. How to make sense of “locally similar” to a Euclidean space? A map ϕ : U → R^m defined on an open region U ⊆ R^n, n ≤ m, is said to be a parameterization if: (i) ϕ is a smooth (i.e., infinitely differentiable), one-to-one mapping. (Figure: a region U ⊂ R^2 mapped by ϕ onto a surface patch V = ϕ(U).) This simply says that V = ϕ(U) is produced by bending and stretching the region U in a gentle, elastic manner, disallowing self-intersections.
  34. 34. (ii) The m × n Jacobian matrix J(x) = [∂ϕ_i/∂x_j (x)] has rank n, for every x ∈ U. Here, x = (x1, . . . , xn) and ϕ = (ϕ1, . . . , ϕm). (Figure: (a) ϕ(t) = (t^3, t^2); (b) ϕ(t, s) = (t^3, t^2, s).) This condition further ensures that V has no sharp bends, corners, peaks, or other singularities.
  35. 35. (iii) The inverse map x = ϕ^{−1} : V → U is continuous. This condition has to do with the fact that we should be able to recover U from V with a continuous deformation. In the illustration above, ϕ^{−1} is not continuous. One often refers to ϕ as a parameterization of the set V = ϕ(U), and to the inverse mapping x = ϕ^{−1} as a local chart, or a local coordinate system on V.
  36. 36. Definition of Differentiable Manifolds: A subspace M ⊆ R^m is an n-dimensional differentiable manifold if every point has a neighborhood that admits a parameterization defined on a region in R^n. More precisely, for every point x ∈ M, there are a neighborhood Vx ⊆ M of x, and a parameterization ϕ : U ⊆ R^n → R^m, with ϕ(U) = Vx.
  37. 37. Examples of Manifolds • The Space of Directions in R^n: A direction is determined by a unit vector. Thus, this space can be identified with the unit sphere S^{n−1} = {x ∈ R^n : ||x|| = 1}. Notice that S^{n−1} can be described as the level set F^{−1}(1) of the function F(x1, . . . , xn) = x1^2 + . . . + xn^2.
  38. 38. This implicit description of the sphere is a special case of the following general construction. Let F : R^m → R^k, k < m, be a smooth mapping, and let a ∈ R^k. If the k × m Jacobian matrix J(x) of F has rank k, for every x ∈ F^{−1}(a), then M = F^{−1}(a) ⊂ R^m is an (m − k)-dimensional manifold. When this condition on J is satisfied, a ∈ R^k is said to be a regular value of F. • The Graph of a Smooth Function: Let g : R^n → R be smooth. Then, M = {(x, y) ∈ R^{n+1} : y = g(x)} is an n-dimensional manifold. Here, F(x, y) = y − g(x) and M = F^{−1}(0).
  39. 39. • The Group O(n) of Orthogonal Matrices: Let M(n) ≅ R^{n^2} be the space of all n × n real matrices, and S(n) ≅ R^{n(n+1)/2} the subcollection of all symmetric matrices. Consider the map F : M(n) → S(n) given by F(A) = AA^t, where the superscript t indicates transposition. Orthogonal matrices are those satisfying AA^t = I. Hence, O(n) = F^{−1}(I). It can be shown that I is a regular value of F. Thus, the group of n × n orthogonal matrices is a manifold of dimension n^2 − n(n + 1)/2 = n(n − 1)/2. Remark. O(n) has two components formed by the orthogonal matrices with positive and negative determinants, resp. The positive component is denoted SO(n).
  40. 40. • Lines in R^n Through the Origin: A line in R^n through the origin is completely determined by its intersection with the unit sphere S^{n−1}. This intersection always consists of a pair of antipodal points {x, −x} ⊂ S^{n−1}. Thus, the space of lines through the origin can be modeled on the sphere S^{n−1} with antipodal points identified. This is an (n − 1)-dimensional manifold known as the real projective space RP^{n−1}.
  41. 41. • Grassmann Manifolds: The spaces G(n, k) of all k-planes through the origin in R^n, k ≤ n, generalize real projective spaces. G(n, k) is a manifold of dimension k(n − k) and is known as a Grassmann manifold. A k-plane in R^n determines an (n − k)-plane in R^n (namely, its orthogonal complement), and vice-versa. Thus, G(n, k) ≅ G(n, n − k).
  42. 42. What is a Differentiable Function? Let f : M^n ⊂ R^m → R be a function. Given u0 ∈ M, let ϕ : U → R^m be a parameterization such that ϕ(x0) = u0. The map f is said to be differentiable at the point u0 if the composite map f ◦ ϕ : U ⊂ R^n → R, given by f ◦ ϕ(x) = f(ϕ(x)), is differentiable at x0.
  43. 43. Tangent Vectors: A tangent vector to a manifold M^n ⊂ R^m at a point p ∈ M is a vector in R^m that can be realized as the velocity at p of a curve in M. More precisely, a vector v ∈ R^m is tangent to M at p if there is a smooth curve α : (−ε, ε) → M ⊆ R^m such that α(0) = p and α′(0) = v.
  44. 44. Tangent Spaces: The collection of all tangent vectors to an n-manifold M^n ⊂ R^m at a point p forms an n-dimensional real vector space denoted T_p M. If ϕ : U ⊂ R^n → M is a parameterization with ϕ(a) = p, then the set {∂ϕ/∂x1 (a), . . . , ∂ϕ/∂xn (a)} is a basis of T_p M.
  45. 45. Example: Let F : R^m → R be a differentiable function, and let a ∈ R be a regular value of F. The manifold M = F^{−1}(a) is (m − 1)-dimensional and the tangent space to M at x consists of all vectors perpendicular to the gradient vector ∇F(x); that is, T_x M = {v ∈ R^m : ⟨∇F(x), v⟩ = 0}. Here, ⟨ , ⟩ denotes the usual inner (or dot) product. • The Sphere: If F(x1, . . . , xn) = x1^2 + . . . + xn^2, the level set S^{n−1} = F^{−1}(1) is the (n − 1)-dimensional unit sphere. In this case, ∇F(x) = 2(x1, . . . , xn) = 2x. Thus, the tangent space to S^{n−1} at the point x consists of vectors orthogonal to x; i.e., T_x S^{n−1} = {v ∈ R^n : ⟨x, v⟩ = 0}.
  46. 46. Derivatives: Let f : M → R be a differentiable function and v ∈ T_p M a tangent vector to M at the point p. Realize v as the velocity vector at p of a parametric curve α : (−ε, ε) → M. The derivative of f at p along v is the derivative of f(α(t)) at t = 0; that is, df_p(v) = (f ◦ α)′(0). The derivatives of f at p ∈ M along vectors v ∈ T_p M can be assembled into a single linear map df_p : T_p M → R, v ↦ df_p(v), referred to as the derivative of f at p.
  47. 47. Differential Geometry on M: Given a manifold M^n ⊂ R^m, one can define geometric quantities such as the length of a curve in M, the area of a 2D region in M, and curvature. The length of a parametric curve α : [0, 1] → M is defined in the usual way, as follows. The velocity vector α′(t) is a tangent vector to M at the point α(t). The speed of α at t is given by ||α′(t)|| = ⟨α′(t), α′(t)⟩^{1/2}, and the length of α by L = ∫_0^1 ||α′(t)|| dt.
  48. 48. Geodesics: The study of geodesics is motivated, for example, by the question of finding the curve of minimal length connecting two points in a manifold M. This is important in applications because minimal-length curves can be used to define an intrinsic geodesic metric on M, which is useful in a variety of ways. In addition, geodesics provide interpolation and extrapolation techniques, and tools of statistical analysis on non-linear manifolds. In order to avoid further technicalities, think of geodesics as length-minimizing curves in M (locally, this is always the case). A more accurate way of viewing geodesics is as parametric curves with zero intrinsic acceleration. A curve α : I → M ⊂ R^m is a geodesic if and only if the acceleration vector α″(t) ∈ R^m is orthogonal to T_{α(t)} M, for every t ∈ I.
  49. 49. Numerical Calculation of Geodesics1. Geodesics with prescribed initial position and velocityFor many manifolds that arise in applications to computer vision, one can write thedifferential equation that governs geodesics explicitly, which can then be integratednumerically using classical methods.
  50. 50. 2. Geodesics between points a, b ∈MOne approach is to use a shooting strategy. The idea is to shoot a geodesic from ain a given direction v ∈ Ta M and follow it for unit time. The terminal pointexpa (v) ∈ M is known as the exponential of v . If we hit the target (that is,exp(v) = b), then the problem is solved.
  51. 51. Otherwise, we consider the miss function h(v) = ||exp_a(v) − b||^2, defined for v ∈ T_a M. The goal is to minimize (or equivalently, annihilate) the function h. This optimization problem on the linear space T_a M can be approached with standard gradient methods.
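On the unit sphere the exponential map has a closed form, so the shooting idea can be sketched directly. The crude finite-difference gradient descent on h below is only for illustration and is not the numerical scheme used by the authors.

    import numpy as np
    from scipy.linalg import null_space

    def sphere_exp(a, v):
        """Follow the geodesic on S^{n-1} from a with tangent velocity v for unit time."""
        nv = np.linalg.norm(v)
        return a if nv < 1e-12 else np.cos(nv) * a + np.sin(nv) * v / nv

    def shoot(a, b, iters=2000, eps=0.1, fd=1e-6):
        """Find v in T_a with exp_a(v) ~ b by descending the miss function h."""
        B = null_space(a[None, :])              # orthonormal basis of T_a S^{n-1}
        c = np.zeros(B.shape[1])                # coordinates of v in that basis
        h = lambda c: np.sum((sphere_exp(a, B @ c) - b) ** 2)
        for _ in range(iters):
            g = np.array([(h(c + fd * e) - h(c)) / fd for e in np.eye(len(c))])
            c -= eps * g
        return B @ c

    a, b = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
    v = shoot(a, b)
    print(np.linalg.norm(v))                    # ~ pi/2, the geodesic distance
    print(sphere_exp(a, v))                     # ~ b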
  52. 52. Optimization Problems on Manifolds: How can one find minima of functions defined on nonlinear manifolds algorithmically? How to carry out a gradient search? Example. Let x1, . . . , xk be sample points in a manifold M. How to make sense of the mean of these points? The arithmetic mean µ = (x1 + . . . + xk)/k used for data in linear spaces does not generalize to manifolds in an obvious manner. However, a simple calculation shows that µ can be viewed as the minimizer of the total variance function V(x) = (1/2) Σ_{i=1}^k ||x − xi||^2. This minimization problem can be posed in more general manifolds using the geodesic metric.
  53. 53. Given x1, . . . , xk ∈ M, consider the total variance function V(x) = (1/2) Σ_{i=1}^k d^2(x, xi), where d denotes geodesic distance. A Karcher mean of the sample is a local minimum µ ∈ M of V. Back to Optimization: Gradient search strategies used for functions on R^n can be adapted to find minima of functions f : M → R defined on nonlinear manifolds. For example, the calculation of Karcher means can be approached with this technique. We discuss gradients next.
  54. 54. What is the Gradient of a Function on M? The gradient of a function F : M → R at a point p ∈ M is the unique vector ∇_M F(p) ∈ T_p M such that dF_p(v) = ⟨∇_M F(p), v⟩, for every v ∈ T_p M. How can it be computed in practice? Oftentimes, F is defined not only on M, but on the larger space R^m in which M is embedded. In this case, one first calculates ∇F(p) = (∂F/∂x1, . . . , ∂F/∂xm) ∈ R^m. The orthogonal projection of this vector onto T_p M gives ∇_M F(p).
  55. 55. Gradient Search: At each p ∈ M, calculate ∇_M f(p) ∈ T_p M. The goal is to search for minima of f by integrating the negative gradient vector field −∇_M f on M. • Initialize the search at a point p ∈ M. • Update p by infinitesimally following the unique geodesic starting at p with initial velocity −∇_M f(p). • Iterate the procedure. Remark. The usual convergence issues associated with gradient methods on R^n arise and can be dealt with in a similar manner.
  56. 56. Calculation of Karcher Means: If {x1, . . . , xn} are sample points in a manifold M, the negative gradient of the total variance function V is −∇_M V(x) = v1(x) + . . . + vn(x), where vi(x) is the initial velocity of the geodesic that connects x to xi in unit time.
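A minimal sketch of this iteration on the unit sphere, where each v_i(x) is given by the spherical log map (the inverse of the exponential map used above).

    import numpy as np

    def sphere_exp(p, v):
        nv = np.linalg.norm(v)
        return p if nv < 1e-12 else np.cos(nv) * p + np.sin(nv) * v / nv

    def sphere_log(p, x):
        """Initial velocity of the unit-time geodesic from p to x on S^{n-1}."""
        w = x - np.dot(p, x) * p                               # tangential part of x
        nw = np.linalg.norm(w)
        theta = np.arccos(np.clip(np.dot(p, x), -1.0, 1.0))
        return np.zeros_like(p) if nw < 1e-12 else theta * w / nw

    def karcher_mean(points, iters=100, step=0.5):
        """Descend V by averaging the log-map velocities and following the exp map."""
        mu = points[0] / np.linalg.norm(points[0])
        for _ in range(iters):
            v = np.mean([sphere_log(mu, x) for x in points], axis=0)
            mu = sphere_exp(mu, step * v)
        return mu

    pts = [np.array([1.0, 0, 0]), np.array([0, 1.0, 0]), np.array([0, 0, 1.0])]
    print(karcher_mean(pts))                                   # ~ (1,1,1)/sqrt(3)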
  57. 57. Applications to the Analysis of Planar ShapesTo illustrate the techniques discussed, we apply them to the study of shapes of planarcontours. We consider two different approaches, namely: • The Procrustean Shape Analysis of Bookstein and Kendall. • The Parametric-Curves Model of Klassen, Srivastava, Mio and Joshi.In both cases, a shape is viewed as an element of a shape space. Geodesics areused to quantify shape similarities and dissimilarities, interpolate and extrapolateshapes, and develop a statistical theory of shapes.
  58. 58. Procrustean Shape Analysis: A contour is described by a finite, ordered sequence of landmark points in the plane, say, p1, . . . , pn. Contours that differ by translations, rotations and uniform scalings of the plane are to be viewed as having the same shape. Hence, shape representations should be insensitive to these transformations. If µ is the centroid of the given points, the vector x = (p1 − µ, . . . , pn − µ) ∈ R^{2n} is invariant to translations of the contour. This representation places the centroid at the origin. To account for uniform scaling, we normalize x to have unit length. Thus, we adopt the preliminary representation y = x/||x|| ∈ S^{2n−1}.
  59. 59. In this y-representation, rotational effects reduce to rotations about the origin. If y = (y1, . . . , yn), in complex notation, a θ-rotation of y ∈ S^{2n−1} about the origin is given by y → λy = (λy1, . . . , λyn), where λ = e^{jθ} and j = √−1. In other words, vectors y ∈ S^{2n−1} that differ by a unit complex scalar represent the same shape. Thus, the Procrustean shape space is the space obtained from S^{2n−1} by identifying points that differ by multiplication by a unit complex number. It is a manifold of dimension (2n − 2) known as the complex projective space CP(n − 1). Geodesics in CP(n − 1) can be used to study shapes quantitatively and develop a statistical theory of shapes. This will be discussed in more detail in Part III.
  60. 60. Examples of Procrustean Geodesics
  61. 61. Parametric-Curves Approach: As a preliminary representation, think of a planar shape as a parametric curve α : I → R^2 traversed with constant speed, where I = [0, 1]. To make the representation invariant to uniform scaling, fix the length of α to be 1. This is equivalent to assuming that ||α′(s)|| = 1, ∀s ∈ I. Since α is traversed with unit speed, we can write α′(s) = e^{jθ(s)}, where j = √−1. A function θ : I → R with this property is called an angle function for α. Angle functions are insensitive to translations, and the effect of a rotation is to add a constant to θ.
  62. 62. To obtain shape representations that are insensitive to rigid motions of the plane, we fix the average of θ to be, say, π. In other words, we choose representatives that satisfy the constraint ∫_0^1 θ(s) ds = π. We are only interested in closed curves. The closure condition α(1) − α(0) = ∫_0^1 α′(s) ds = ∫_0^1 e^{jθ(s)} ds = 0 is equivalent to the real conditions ∫_0^1 cos θ(s) ds = 0 and ∫_0^1 sin θ(s) ds = 0. These are nonlinear conditions on θ, which will make the geometry of the shape space we consider interesting.
  63. 63. Let C be the collection of all θ satisfying the three constraints above. We refer to an element of C as a pre-shape. • Why call θ ∈ C a pre-shape instead of a shape? This is because a shape may admit multiple representations in C due to the choices in the placement of the initial point (s = 0) on the curve. For each shape, there is a circle's worth of initial points. Thus, our shape space S is the quotient of C by the action of the re-parametrization group S^1; i.e., the space S = C/S^1, obtained by identifying all pre-shapes that differ by a reparameterization.
  64. 64. The manifold C and the shape space S are infinite dimensional. Although these spaces can be analyzed with the techniques we have discussed, we consider finite-dimensional approximations for implementation purposes. We discretize an angle function θ : I → R using a uniform sampling x = (x1, . . . , xm) ∈ R^m. The three conditions on θ that define C can be rephrased as: ∫_0^1 θ(s) ds = π ⟺ (1/m) Σ_{i=1}^m xi = π; ∫_0^1 cos θ(s) ds = 0 ⟺ Σ_{i=1}^m cos(xi) = 0; ∫_0^1 sin θ(s) ds = 0 ⟺ Σ_{i=1}^m sin(xi) = 0.
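As a small illustration of our own (not the authors' implementation), the following routine pushes an arbitrary x ∈ R^m onto the constraint set defined by these three discretized conditions, using a few minimum-norm Newton steps on the constraint residuals.

    import numpy as np

    def project_to_preshape(x, iters=10):
        """Enforce mean(x) = pi, sum cos(x) = 0, sum sin(x) = 0 by Newton iterations."""
        x = np.asarray(x, dtype=float).copy()
        m = len(x)
        for _ in range(iters):
            g = np.array([x.mean() - np.pi, np.cos(x).sum(), np.sin(x).sum()])
            J = np.vstack([np.full(m, 1.0 / m), -np.sin(x), np.cos(x)])  # 3 x m Jacobian
            x -= J.T @ np.linalg.solve(J @ J.T, g)    # minimum-norm Newton update
        return x

    theta = np.linspace(0, 2 * np.pi, 100, endpoint=False) + 0.3   # perturbed angle function
    x = project_to_preshape(theta)
    print(x.mean(), np.cos(x).sum(), np.sin(x).sum())              # ~ (pi, 0, 0)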
  65. 65. Thus, the finite-dimensional analogue of C is a manifold Cm ⊂ Rm of dimension(m − 3). Modulo 2π , a reparameterization of θ corresponds to a cyclic permutationof (x1 , . . . , xm ), followed by an adjustment to ensure that the first of the threeconditions is satisfied. The quotient space by reparameterizations is thefinite-dimensional version of the shape space S.Examples of Geodesics in S
  66. 66. (Figure: example geodesic paths between shapes in S.)
  67. 67. CVPR 2004 TUTORIAL ONNONLINEAR MANIFOLDS IN COMPUTER VISION III. STATISTICS ON MANIFOLDS Anuj Srivastava Washington Mio and Xiuwen Liu Sunday, June 27th , 1:00-5:00pm, Washington DC.
  68. 68. STATISTICS – OUTLINEWe are interested in extending usual techniques from statistics onto nonlinearmanifolds: learning, estimation, clustering, testing.We will cover the following topics: • Probability distributions on manifolds of interest. • Definitions and computations of intrinsic means and covariances. Extrinsic versus intrinsic analysis. • Sampling and Monte Carlo estimation on manifolds. Statistical bounds on estimation. • Learning and hypothesis testing on manifolds.
  69. 69. PROBABILITY DISTRIBUTIONS ON MANIFOLDS • Probability densities on R^n are defined with respect to the Lebesgue measure. Similarly, probability measures on a manifold M are defined with respect to an invariant measure, e.g. the Haar measure for Lie groups. • (Radon-Nikodym) Derivatives of probability measures with respect to underlying invariant measures provide probability densities. • Some examples of probability densities: – Unit circle S^1: Analog of the Gaussian density: f(θ; θ0, a) = exp(a cos(θ − θ0)) / (2π I0(a)) is the von Mises density. θ0 is the mean angle, and a is a concentration parameter that controls the variance. I0 is a modified Bessel function of zeroth order. (A small numerical check of this density appears after the next two examples.)
  70. 70. – Special Orthogonal group SO(n): f(O|A) = (1/Z) exp(trace(O A^T)) for some A ∈ R^{n×n}. Let A = ŌP be the polar decomposition of A; then Ō is the mean of f and P relates to the variance. – Grassmann manifold Gn,d: Let U ∈ Gn,d be represented by an n × d (tall-skinny) orthogonal matrix. f(U) = (1/Z) exp(trace(U^T A U)) for some symmetric, positive definite matrix A ∈ R^{n×n}. The SVD of A gives the mean and variance of f. A more general form is given later.
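Returning to the von Mises density on S^1 above, a quick numerical sanity check (a sketch using scipy's modified Bessel function i0) that the density integrates to one over a full turn:

    import numpy as np
    from scipy.special import i0

    def von_mises_pdf(theta, theta0, a):
        """von Mises density with mean angle theta0 and concentration a."""
        return np.exp(a * np.cos(theta - theta0)) / (2 * np.pi * i0(a))

    theta = np.linspace(0, 2 * np.pi, 10000)
    print(np.trapz(von_mises_pdf(theta, theta0=1.0, a=3.0), theta))   # ~ 1.0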
  71. 71. STATISTICS ON MANIFOLDS Definition and estimation of the mean on a manifold. For a Riemannian manifold M, let d(p1, p2) be the Riemannian distance between p1 and p2 in M. Also, let M be embedded inside R^n for some n, and let ||p1 − p2|| denote the Euclidean distance after embedding.
  72. 72. Mean under a probability density function f(p): • Intrinsic mean: p̂ = argmin_{p∈M} ∫_M d(p, u)^2 f(u) γ(du). • Extrinsic mean: p̂ = argmin_{p∈M} ∫_M ||p − u||^2 f(u) γ(du). This can also be viewed as computing the mean in R^n and then projecting it back to M. Extrinsic analysis implies embedding the manifold in a larger Euclidean space, computing the estimate there, and projecting the solution back onto the manifold. The solution may depend upon the choice of embedding.
  73. 73. EXAMPLE OF MEAN ESTIMATION ON A CIRCLE Given θ1, θ2, . . . , θn ∈ S^1. 1. Intrinsic Mean: d(θ1, θ2) = min{|θ1 − θ2|, |θ1 − θ2 + 2π|, |θ1 − θ2 − 2π|}. Then, find the mean θ̂ = argmin_θ Σ_{i=1}^n d(θ, θi)^2. Solve this problem numerically. 2. An Extrinsic Mean: Let zi = e^{jθi} ∈ C, i = 1, . . . , n. Then ẑ = (1/n) Σ_{i=1}^n zi, and set θ̂ = arg(ẑ).
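A small numpy sketch of both estimators for a sample of angles; the grid search for the intrinsic mean is a brute-force stand-in for "solve this problem numerically".

    import numpy as np

    def circle_dist(a, b):
        """Arc-length (geodesic) distance on S^1."""
        d = np.abs(a - b) % (2 * np.pi)
        return np.minimum(d, 2 * np.pi - d)

    def intrinsic_mean(thetas, grid=10000):
        """Minimize the sum of squared geodesic distances over a dense grid."""
        cand = np.linspace(0, 2 * np.pi, grid, endpoint=False)
        cost = [(circle_dist(c, thetas) ** 2).sum() for c in cand]
        return cand[int(np.argmin(cost))]

    def extrinsic_mean(thetas):
        """Average the points e^{j theta_i} in C and map back to the circle."""
        return np.angle(np.mean(np.exp(1j * thetas)))

    thetas = np.array([0.1, 0.3, 6.2])          # a sample clustered around 0 (mod 2*pi)
    print(intrinsic_mean(thetas), extrinsic_mean(thetas) % (2 * np.pi))   # both ~ 0.11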
  74. 74. STATISTICS ON MANIFOLDS Covariance under a probability density function f(p): • Let exp_p : T_p(M) → M be the exponential map, computed using the geodesics defined earlier. • Let p̂ be the mean, and T_p̂(M) be the tangent space at p̂. Then, let f̃ be the probability density induced on T_p̂(M) using the inverse of the exponential map. • Similarly one can compute higher moments.
  75. 75. Define the covariance Σ: Σ = ∫ x x^T f̃(x) dx ∈ R^{m×m}, where m = dim(M) and x = exp_p̂^{−1}(p). (Essentially, one computes the covariance matrix in the tangent space.) • A simple example of f̃ is a multivariate normal in the tangent space at p̂.
  76. 76. LEARNING PROBABILITY MODELS FROM OBSERVATIONS Example: Assume a multivariate normal model in the tangent space at the mean. Observed shapes: (Figure: a set of observed shapes.) Sample statistics of observed shapes: (Figure: the mean shape and the eigenvalues of the covariance matrix.)
  77. 77. LEARNING PROBABILITY MODELS FROM OBSERVATIONS Synthesis: (i) Generate samples of x, multivariate normal on the tangent space, and (ii) use exp_p̂(x) to generate samples on M. Examples: Samples from the multivariate normal model on the tangent shape space.
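A minimal end-to-end sketch of this learn-and-synthesize loop on S^2 with made-up data near the north pole; for simplicity the renormalized extrinsic mean plays the role of p̂.

    import numpy as np

    def sphere_exp(p, v):
        nv = np.linalg.norm(v)
        return p if nv < 1e-12 else np.cos(nv) * p + np.sin(nv) * v / nv

    def sphere_log(p, x):
        w = x - np.dot(p, x) * p
        nw = np.linalg.norm(w)
        th = np.arccos(np.clip(np.dot(p, x), -1.0, 1.0))
        return np.zeros_like(p) if nw < 1e-12 else th * w / nw

    rng = np.random.default_rng(0)
    north = np.array([0.0, 0.0, 1.0])
    pts = np.array([sphere_exp(north, rng.normal(scale=0.2, size=3) * [1, 1, 0])
                    for _ in range(200)])                      # toy observations on S^2

    p_hat = pts.mean(axis=0); p_hat /= np.linalg.norm(p_hat)   # base point (extrinsic mean)

    # Covariance in the tangent space T_{p_hat}(M) via the inverse exponential map.
    X = np.array([sphere_log(p_hat, x) for x in pts])
    Sigma = X.T @ X / len(X)

    # Synthesis: sample tangent vectors ~ N(0, Sigma) and map back with exp.
    samples = [sphere_exp(p_hat, rng.multivariate_normal(np.zeros(3), Sigma))
               for _ in range(5)]
    print(np.round(samples, 3))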
  78. 78. CVPR 2004 TUTORIAL ONNONLINEAR MANIFOLDS IN COMPUTER VISION IV. ALGORITHMS AND EXAMPLES Xiuwen Liu Anuj Srivastava and Washington Mio
  79. 79. Algorithms and Examples Here we focus on computational issues of implementing algorithms based on nonlinear manifolds. Live demonstrations will be given to show their feasibility, effectiveness and efficiency. We use two examples for illustration: • Optimal Component Analysis. • Computation of geodesics on shape manifolds and shape clustering.
  80. 80. Optimal Component Analysis: Optimal component analysis (OCA) is a framework to formulate computer vision problems based on linear representations on proper manifolds: • Define optimization criteria. • Identify the underlying manifold, such as a Grassmann or Stiefel manifold. • Design an optimization algorithm utilizing the geometric properties of the manifold. While there exist several standard component analysis techniques, a critical difference between OCA and these standard techniques is that OCA provides a general procedure, in that it can be used to solve different kinds of problems.
  81. 81. Commonly Used Linear Representation Manifolds
  Parametrization     | Manifold                               | Search space dimension | Examples
  Subspace            | Grassmann Gn,d                         | d(n − d)               | Optimal subspace / principal or minor components
  Orthonormal basis   | Stiefel Sn,d                           | d(n − (d+1)/2)         | Orthonormal filters
  Directions          | Direction manifold (d copies of Gn,1)  | d(n − 1)               | Optimal filters / independent components
  Vectors             | Euclidean                              | dn                     | Linear basis with weights
  82. 82. Optimal Linear Subspaces: When the performance does not depend on the choice of basis but only on the subspace, the underlying solution space is Grassmann. To be concrete, here we use recognition for illustration. Let there be C classes to be recognized from the images; each class has ktrain training images (denoted by Ic,1, . . . , Ic,ktrain) and kcr cross-validation images (denoted by Ic,1, . . . , Ic,kcr).
  83. 83. Optimal Linear Subspaces -cont. For recognition, we define the following criterion: ρ(Ic,i, U) = min_{c′≠c, j} D(Ic,i, Ic′,j; U) / [ (1/|τ(Ic,i, K)|) Σ_{j∈τ(Ic,i,K)} D(Ic,i, Ic,j; U) + ε0 ], (1) where τ(Ic,i, K) is the set of the K images closest to Ic,i within class c, D(I1, I2; U) = ||a(I1, U) − a(I2, U)||, and ε0 > 0 is a small number to avoid division by zero. Then, define R according to: R(U) = (1/(C kcr)) Σ_{c=1}^C Σ_{i=1}^{kcr} h(ρ(Ic,i, U) − 1), 0 ≤ R ≤ 1.0, (2) where h(·) is a monotonically increasing and bounded function. In our experiments, we have used h(x) = 1/(1 + exp(−2βx)), where β controls the degree of smoothness of R.
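A sketch of how ρ and R of Eqs. (1)-(2) can be evaluated for a given basis U. The dictionaries mapping a class label to its stack of image vectors are a hypothetical convenience for this sketch, not the authors' data layout.

    import numpy as np

    def R_performance(U, train, val, K=3, beta=1.0, eps0=1e-6):
        """Evaluate R(U) of Eq. (2) for an n x d basis U.

        train, val : dicts, class label c -> array of image vectors (rows of length n)
        """
        h = lambda x: 1.0 / (1.0 + np.exp(-2.0 * beta * x))
        a = lambda I: I @ U                                      # coefficients a(I, U)
        D = lambda I, B: np.linalg.norm(a(I)[None, :] - a(B), axis=1)
        total, count = 0.0, 0
        for c, imgs in val.items():
            for I in imgs:
                d_same = np.sort(D(I, train[c]))[:K]             # K closest within class c
                d_other = min(D(I, train[cp]).min() for cp in train if cp != c)
                rho = d_other / (d_same.mean() + eps0)           # Eq. (1)
                total += h(rho - 1.0); count += 1
        return total / count                                     # Eq. (2)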
  84. 84. Optimal Linear Subspaces -cont. Live Demonstration One: Recognition using part of the CMU PIE dataset. • C = 10. • n = 25 × 25. • d = 10. • Initial condition: ICA. • 54 s for 100 iterations. (Figure: recognition rate (%) versus iterations; run header: k=10, m=615, best 100.0000% (92.0290%), time 54.0759 s.)
  85. 85. Optimal Linear Subspaces -cont. Live Demonstration Two: Recognition using part of the CMU PIE dataset. • C = 10. • n = 25 × 25. • d = 10. • Initial condition: Axis. • 1 min for 100 iterations. (Figure: recognition rate (%) versus iterations; run header: k=10, m=615, best 100.0000% (99.0866%), time 59.4866 s.)
  86. 86. Optimal Linear Subspaces -cont. Live Demonstration Three: Recognition using part of the CMU PIE dataset. • C = 10. • n = 25 × 25. • d = 5. • Initial condition: Axis. • 25 s for 100 iterations. (Figure: recognition rate (%) versus iterations; run header: k=5, m=620, best 100.0000% (97.0251%), time 24.5866 s.)
  87. 87. Optimal Linear Subspaces -cont. Live Demonstration Four: Recognition using part of the CMU PIE dataset. • C = 10. • n = 25 × 25. • d = 3. • Initial condition: Axis. • 36.5 s for 200 iterations. (Figure: recognition rate (%) versus iterations; run header: k=3, m=622, best 100.0000% (92.1825%), time 36.5964 s.)
  88. 88. Optimal Linear Subspaces -cont. Offline Demonstration: Recognition using part of the CMU PIE dataset. • C = 66. • n = 25 × 25. • d = 10. • Initial condition: Axis. • 10 min for 100 iterations. (Figure: recognition rate (%) versus iterations; run header: k=10, m=615, best 99.1736% (78.5801%), time 638.4258 s.)
  89. 89. Optimal Linear Subspaces -cont. Offline Demonstration: Recognition using part of the CMU PIE dataset. • C = 66. • n = 100 × 100. • d = 10. • Initial condition: Axis. • 42934 s for 500 iterations. (Figure: recognition rate (%) versus iterations; run header: k=10, m=9990, best 99.8623% (92.8656%), time 42934.3984 s.)
  90. 90. Further Speedup TechniquesWe have developed techniques to further speed up the optimization process.A particularly effective technique is called hierarchical learning based on someheuristics to decompose the process into layers.
  91. 91. Hierarchical Learning (Figure: schematic of the image and the basis at levels 0, 1, . . . , L.)
  92. 92. Hierarchical Learning -cont. We have applied hierarchical learning on the CMU PIE and ORL datasets. We typically speed up the process by a factor of 30. (Figure: optimization time (s) versus level L.)
  93. 93. Optimal Linear Subspaces -cont. Some experimental results. PCA - black; ICA - red; FDA - green; OCA - blue. (Figure: recognition rate (%) versus dimension of the subspace, and recognition rate (%) versus number of training images.)
  94. 94. Other Performance FunctionsNote that this framework applies to any performance functions.To derive linear representations that are sparse and effective for recognition, we haveused a linear combination of sparseness and recognition performance as theperformance function.
  95. 95. Sparse Linear Representations for Recognition (Figure: recognition rate and sparseness versus iterations; distributions of the representation coefficients.)
  96. 96. Comparison of Sparseness and Recognition (Figure: recognition performance versus sparseness for the optimal representation, PCA, FDA, InfoMax, and FastICA.)
  97. 97. Discussion: Note that OCA provides a framework to formulate and solve problems. In computer vision, a central issue is generalization, which is determined by the performance function you use. The performance function we use here is loosely connected to the large-margin idea and often gives good generalization performance. (Figure: correct recognition (%) and difference (%) versus iterations.)
  98. 98. Geodesic Path Examples on the Shape Manifold (Figure: two example geodesic paths between shapes.)
  99. 99. Geodesic Computation and Shape ClusteringLive Demonstration Five: Geodesic Computation and Shape Clustering.Here we show how fast one can compute geodesics among shapes and then performclustering based on the pair-wise geodesic distances.For a small database of 25 shapes, the pair-wise geodesic (300) computation takes 13seconds in C and the clustering takes 10 seconds in Matlab (A C implementationwould improve by a factor 5-10).For a larger database of 300 shapes, the pair-wise geodesic (44850) computationtakes 600 seconds and the clustering takes 170 seconds in Matlab.
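A sketch of the clustering step given any routine that returns the geodesic length between two shapes; geodesic_dist below is a hypothetical placeholder for that computation, and average-linkage agglomerative clustering is just one reasonable choice (the clustering actually used in the demonstration may differ).

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    def cluster_shapes(shapes, geodesic_dist, n_clusters=5):
        """Cluster shapes from their pairwise geodesic distances."""
        n = len(shapes)
        D = np.zeros((n, n))
        for i in range(n):                      # n(n-1)/2 geodesic computations
            for j in range(i + 1, n):
                D[i, j] = D[j, i] = geodesic_dist(shapes[i], shapes[j])
        Z = linkage(squareform(D), method='average')
        return fcluster(Z, t=n_clusters, criterion='maxclust')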
  100. 100. Shape Clustering -cont.
  101. 101. Shape Clustering -cont.
  102. 102. Hierarchical Shape ClusteringWe have clustered 3000 shapes using this method. This figure shows the first fewlayers in the resulting shape hierarchy.
  103. 103. (Figure: example shape clusters, labeled A-F.)
  104. 104. (Figure: clusters A-F shown against the shape database.)
  105. 105. A Brief Summary of Part IV• By utilizing geometric structures of underlying manifolds, we have developed effective optimization algorithms that can be implemented efficiently.• Note that the computation here can be done offline.• We hope techniques along this line of research will be developed for many other computer vision problems.
  106. 106. THANK YOU!
  107. 107. References [1] W. M. Boothby. An Introduction to Differentiable Manifolds and Riemannian Geometry. Academic Press, Inc., 1986. [2] I. L. Dryden and K. V. Mardia. Statistical Shape Analysis. John Wiley & Sons, 1998. [3] A. Edelman, T. Arias, and S. T. Smith. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20(2):303–353, 1998. [4] U. Grenander, M. I. Miller, and A. Srivastava. Hilbert-Schmidt lower bounds for estimators on matrix Lie groups for ATR. IEEE Transactions on PAMI, 20(8):790–802, 1998. [5] H. Hendricks. A Cramer-Rao type lower bound for estimators with values in a manifold. Journal of Multivariate Analysis, 38:245–261, 1991. [6] K. Kanatani. Geometric Computation for Machine Vision. Clarendon Press, Oxford, 1993.
  108. 108. [7] E. Klassen, A. Srivastava, W. Mio, and S. Joshi. Analysis of planar shapes using geodesic paths on shape spaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(3):372–383, March 2004. [8] X. Liu, A. Srivastava, and K. Gallivan. Optimal linear representations of images for object recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(5):662–666, 2004. [9] K. V. Mardia. Statistics of Directional Data. Academic Press, 1972. [10] J. M. Oller and J. M. Corcuera. Intrinsic analysis of statistical estimation. Annals of Statistics, 23(5):1562–1581, 1995. [11] S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323–2326, 2000. [12] H. S. Seung and D. D. Lee. The manifold ways of perception. Science, 290:2268–2269, 2000. [13] C. G. Small. The Statistical Theory of Shape. Springer, 1996.
  109. 109. [14] M. Spivak. A Comprehensive Introduction to Differential Geometry, Vols. I & II. Publish or Perish, Inc., Berkeley, 1979. [15] A. Srivastava and E. Klassen. Monte Carlo extrinsic estimators for manifold-valued parameters. IEEE Transactions on Signal Processing, 50(2):299–308, February 2001. [16] A. Srivastava and E. Klassen. Bayesian, geometric subspace tracking. Advances in Applied Probability, 36(1):43–56, March 2004. [17] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319–2323, 2000.