1.
Generalized Principal Component Analysis Tutorial @ CVPR 2008 Yi Ma René Vidal ECE Department Center for Imaging Science University of Illinois Institute for Computational Medicine Urbana Champaign Johns Hopkins University
2.
Data segmentation and clustering • Given a set of points, separate them into multiple groups • Discriminative methods: learn boundary • Generative methods: learn mixture model, using, e.g. Expectation Maximization
3.
Dimensionality reduction and clustering • In many problems data is high-dimensional: can reduce dimensionality using, e.g. Principal Component Analysis • Image compression • Recognition – Faces (Eigenfaces) • Image segmentation – Intensity (black-white) – Texture
4.
Segmentation problems in dynamic vision • Segmentation of video and dynamic textures • Segmentation of rigid-body motions
5.
Segmentation problems in dynamic vision • Segmentation of rigid-body motions from dynamic textures
6.
Clustering data on non Euclidean spaces • Clustering data on non Euclidean spaces – Mixtures of linear spaces – Mixtures of algebraic varieties – Mixtures of Lie groups • “Chicken-and-egg” problems – Given segmentation, estimate models – Given models, segment the data – Initialization? • Need to combine – Algebra/geometry, dynamics and statistics
7.
Outline of the tutorial • Introduction (8.00-8.15) • Part I: Theory (8.15-9.45) – Basic GPCA theory and algorithms (8.15-9.00) – Advanced statistical methods for GPCA (9.00-9.45) • Questions (9.45-10.00) • Break (10.00-10.30) • Part II: Applications (10.30-12.00) – Applications to motion and video segmentation (10.30-11.15) – Applications to image representation & segmentation (11.15-12.00) • Questions (12.00-12.15)
8.
Part I: Theory • Introduction to GPCA (8.00-8.15) • Basic GPCA theory and algorithms (8.15-9.00) – Review of PCA and extensions – Introductory cases: line, plane and hyperplane segmentation – Segmentation of a known number of subspaces – Segmentation of an unknown number of subspaces • Advanced statistical and methods for GPCA (9.00-9.45) – Lossy coding of samples from a subspace – Minimum coding length principle for data segmentation – Agglomerative lossy coding for subspace clustering
9.
Part II: Applications in computer vision • Applications to motion & video segmentation (10.30-11.15) – 2-D and 3-D motion segmentation – Temporal video segmentation – Dynamic texture segmentation • Applications to image representation and segmentation (11.15-12.00) – Multi-scale hybrid linear models for sparse image representation – Hybrid linear models for image segmentation
12.
Part IGeneralized Principal Component Analysis René Vidal Center for Imaging Science Institute for Computational Medicine Johns Hopkins University
13.
Principal Component Analysis (PCA) • Given a set of points x1, x2, …, xN – Geometric PCA: find a subspace S passing through them – Statistical PCA: find projection directions that maximize the variance • Solution (Beltrami’1873, Jordan’1874, Hotelling’33, Eckart-Householder-Young’36) Basis for S • Applications: data compression, regression, computer vision (eigenfaces), pattern recognition, genomics
14.
Extensions of PCA • Higher order SVD (Tucker’66, Davis’02) • Independent Component Analysis (Common ‘94) • Probabilistic PCA (Tipping-Bishop ’99) – Identify subspace from noisy data – Gaussian noise: standard PCA – Noise in exponential family (Collins et al.’01) • Nonlinear dimensionality reduction – Multidimensional scaling (Torgerson’58) – Locally linear embedding (Roweis-Saul ’00) – Isomap (Tenenbaum ’00) • Nonlinear PCA (Scholkopf-Smola-Muller ’98) – Identify nonlinear manifold by applying PCA to data embedded in high-dimensional space • Principal Curves and Principal Geodesic Analysis (Hastie-Stuetzle’89, Tishbirany ‘92, Fletcher ‘04)
15.
Generalized Principal Component Analysis • Given a set of points lying in multiple subspaces, identify – The number of subspaces and their dimensions – A basis for each subspace – The segmentation of the data points • “Chicken-and-egg” problem – Given segmentation, estimate subspaces – Given subspaces, segment the data
16.
Prior work on subspace clustering • Iterative algorithms: – K-subspace (Ho et al. ’03), – RANSAC, subspace selection and growing (Leonardis et al. ’02) • Probabilistic approaches: learn the parameters of a mixture model using e.g. EM – Mixtures of PPCA: (Tipping-Bishop ‘99): – Multi-Stage Learning (Kanatani’04) • Initialization – Geometric approaches: 2 planes in R3 (Shizawa-Maze ’91) – Factorization approaches: independent subspaces of equal dimension (Boult-Brown ‘91, Costeira-Kanade ‘98, Kanatani ’01) – Spectral clustering based approaches: (Yan-Pollefeys’06)
17.
Basic ideas behind GPCA• Towards an analytic solution to subspace clustering – Can we estimate ALL models simultaneously using ALL data? – When can we do so analytically? In closed form? – Is there a formula for the number of models?• Will consider the most general case – Subspaces of unknown and possibly different dimensions – Subspaces may intersect arbitrarily (not only at the origin)• GPCA is an algebraic geometric approach to data segmentation – Number of subspaces = degree of a polynomial – Subspace basis = derivatives of a polynomial – Subspace clustering is algebraically equivalent to • Polynomial fitting • Polynomial differentiation
18.
Applications of GPCA in computer vision • Geometry – Vanishing points • Image compression • Segmentation – Intensity (black-white) – Texture – Motion (2-D, 3-D) – Video (host-guest) • Recognition – Faces (Eigenfaces) • Man - Woman – Human Gaits – Dynamic Textures • Water-bird • Biomedical imaging • Hybrid systems identification
19.
Introductory example: algebraic clustering in 1D • Number of groups?
20.
Introductory example: algebraic clustering in 1D • How to compute n, c, b’s? – Number of clusters – Cluster centers – Solution is unique if – Solution is closed form if
21.
Introductory example: algebraic clustering in 2D • What about dimension 2? • What about higher dimensions? – Complex numbers in higher dimensions? – How to find roots of a polynomial of quaternions? • Instead – Project data onto one or two dimensional space – Apply same algorithm to projected data
22.
Representing one subspace • One plane • One line • One subspace can be represented with – Set of linear equations – Set of polynomials of degree 1
23.
Representing n subspaces • Two planes • One plane and one line – Plane: – Line: De Morgan’s rule • A union of n subspaces can be represented with a set of homogeneous polynomials of degree n
24.
Fitting polynomials to data points • Polynomials can be written linearly in terms of the vector of coefficients by using polynomial embedding Veronese map • Coefficients of the polynomials can be computed from nullspace of embedded data – Solve using least squares – N = #data points
25.
Finding a basis for each subspace • Case of hyperplanes: – Only one polynomial – Number of subspaces – Basis are normal vectors Polynomial Factorization (GPCA-PFA) [CVPR 2003] • Find roots of polynomial of degree in one variable • Solve linear systems in variables • Solution obtained in closed form for • Problems – Computing roots may be sensitive to noise – The estimated polynomial may not perfectly factor with noisy – Cannot be applied to subspaces of different dimensions • Polynomials are estimated up to change of basis, hence they may not factor, even with perfect data
26.
Finding a basis for each subspace Polynomial Differentiation (GPCA-PDA) [CVPR’04] • To learn a mixture of subspaces we just need one positive example per class
27.
Choosing one point per subspace • With noise and outliers – Polynomials may not be a perfect union of subspaces – Normals can estimated correctly by choosing points optimally • Distance to closest subspace without knowing segmentation?
28.
GPCA for hyperplane segmentation • Coefficients of the polynomial can be computed from null space of embedded data matrix – Solve using least squares – N = #data points • Number of subspaces can be computed from the rank of embedded data matrix • Normal to the subspaces can be computed from the derivatives of the polynomial
29.
GPCA for subspaces of different dimensions • There are multiple polynomials fitting the data • The derivative of each polynomial gives a different normal vector • Can obtain a basis for the subspace by applying PCA to normal vectors
30.
GPCA for subspaces of different dimensions • Apply polynomial embedding to projected data • Obtain multiple subspace model by polynomial fitting – Solve to obtain – Need to know number of subspaces • Obtain bases & dimensions by polynomial differentiation • Optimally choose one point per subspace using distance
31.
An example• Given data lying in the union of the two subspaces• We can write the union as• Therefore, the union can be represented with the two polynomials
32.
An example• Can compute polynomials from• Can compute normals from
33.
Dealing with high-dimensional data • Minimum number of points – K = dimension of ambient space – n = number of subspaces Subspace 1 • In practice the dimension of each subspace ki is much smaller than K Subspace 2 – Number and dimension of the subspaces is preserved by a linear projection onto a subspace of dimension • Open problem: how to choose – Can remove outliers by robustly projection? fitting the subspace – PCA?
34.
GPCA with spectral clustering • Spectral clustering – Build a similarity matrix between pairs of points – Use eigenvectors to cluster data • How to define a similarity for subspaces? – Want points in the same subspace to be close – Want points in different subspace to be far • Use GPCA to get basis • Distance: subspace angles
35.
Comparison of PFA, PDA, K-sub, EM 18 PFA K−sub 16 PDA Error in the normals [degrees] EM 14 PDA+K−sub PDA+EM 12 PDA+K−sub+EM 10 8 6 4 2 0 0 1 2 3 4 5 Noise level [%]
36.
Dealing with outliers • GPCA with perfect data • GPCA with outliers • GPCA fails because PCA fails seek a robust estimate of Null(Ln ) where Ln = [ n (x1 ), . . . , n (xN )].
37.
Three approaches to tackle outliers • Probability-based: small-probability samples – Probability plots: [Healy 1968, Cox 1968] – PCs: [Rao 1964, Ganadesikan & Kettenring 1972] – M-estimators: [Huber 1981, Camplbell 1980] – Multivariate-trimming (MVT): [Ganadesikan & Kettenring 1972] • Influence-based: large influence on model parameters – Parameter difference with and without a sample: [Hampel et al. 1986, Critchley 1985] • Consensus-based: not consistent with models of high consensus. – Hough: [Ballard 1981, Lowe 1999] – RANSAC: [Fischler & Bolles 1981, Torr 1997] – Least Median Estimate (LME): [Rousseeuw 1984, Steward 1999]
40.
Robust GPCAComparison with RANSAC• Accuracy (q) (2,2,1) in 3 (r) (4,2,2,1) in 5 (s) (5,5,5) in 6• Speed Table: Average time of RANSAC and RGPCA with 24% outliers. Arrangement (2,2,1) in 3 (4,2,2,1) in 5 (5,5,5) in 6 RANSAC 44s 5.1min 3.4min MVT 46s 23min 8min Influence 3min 58min 146min
41.
Summary• GPCA: algorithm for clustering subspaces – Deals with unknown and possibly different dimensions – Deals with arbitrary intersections among the subspaces• Our approach is based on – Projecting data onto a low-dimensional subspace – Fitting polynomials to projected subspaces – Differentiating polynomials to obtain a basis• Applications in image processing and computer vision – Image segmentation: intensity and texture – Image compression – Face recognition under varying illumination
42.
For more information, Vision, Dynamics and Learning Lab @ Johns Hopkins University Thank You!
43.
Generalized Principal Component Analysis via Lossy Coding and Compression Yi Ma Image Formation & Processing Group, Beckman Decision & Control Group, Coordinated Science Lab. Electrical & Computer Engineering Department University of Illinois at Urbana-Champaign
44.
OUTLINEMOTIVATIONPROBLEM FORMULATION AND EXISTING APPROACHESSEGMENTATION VIA LOSSY DATA COMPRESSIONSIMULATIONS (AND EXPERIMENTS)CONCLUSIONS AND FUTURE DIRECTIONS
45.
MOTIVATION – Motion Segmentation in Computer Vision Goal: Given a sequence of images of multiple moving objects, determine: – 1. the number and types of motions (rigid-body, affine, linear, etc.) 2. the features that belong to the same motion. QuickTime™ and a Cinepak decompressorare needed to see this picture. The “chicken-and-egg” difficulty: – Knowing the segmentation, estimating the motions is easy; – Knowing the motions, segmenting the features is easy. A Unified Algebraic Approach to 2D and 3D Motion Segmentation, [Vidal-Ma, ECCV’
46.
MOTIVATION – Image SegmentationGoal: segment an image into multiple regions with homogeneous texture. feature s Computer Human Difficulty: A mixture of models of different dimensions or complexities.Multiscale Hybrid Linear Models for Lossy Image Representation, [Hong-Wright-Ma, TIP’
47.
MOTIVATION – Video Segmentation Goal: segmenting a video sequence into segments with “stationary” dynamics Model: different segments as outputs from different (linear) dynamical systems: QuickTime™ and a H.264 decompressor are needed to see this picture.Identification of Hybrid Linear Systems via Subspace Segmentation, [Huang-Wagner-Ma, C
48.
MOTIVATION – Massive Multivariate Mixed Data QuickTime™ and a BMP decompressor are needed to see this picture.Face database Hyperspectral images Articulate motions Hand written digits Microarrays
49.
SUBSPACE SEGMENTATION – Problem FormulationAssumption: the data are noisy samples from anarrangement of linear subspaces: noise-free samples noisy samples samples with outliers Difficulties: – the dimensions of the subspaces can be different – the data can be corrupted by noise or contaminated by outliers – the number and dimensions of subspaces may be unknown
50.
SUBSPACE SEGMENTATION – Statistical Approaches Assume that the data are i.i.d. samples from a mixture of probabilistic distributions: Solutions: • Expectation Maximization (EM) for the maximum-likelihood estimate [Dempster et. al.’77], e.g., Probabilistic PCA [Tipping-Bishop’99]: • K-Means for a minimax-like estimate [Forgy’65, Jancey’66, MacQueen’67], e.g., K-Subspaces [Ho and Kriegman’03]: Essentially iterate between data segmentation and model estimation.
51.
SUBSPACE SEGMENTATION – An Algebro-Geometric ApproachIdea: a union of linear subspaces is analgebraic set -- the zero set of a set of(homogeneous) polynomials:Solution:• Identify the set of polynomials of degree n that vanish on• Gradients of the vanishing polynomials are normals to the subspaces Complexity exponential in the dimension and number of subspaces.Generalized Principal Component Analysis, [Vidal-Ma-Sastry, IEEE Transactions PAMI’0
52.
SUBSPACE SEGMENTATION – An Information-Theoretic ApproachProblem: If the number/dimension of subspaces not given and data corruptedby noise and outliers, how to determine the optimal subspaces that fitSolutions: Model Selection Criteria? the data? – Minimum message length (MML) [Wallace-Boulton’68] – Minimum description length (MDL) [Rissanen’78] – Bayesian information criterion (BIC) – Akaike information criterion (AIC) [Akaike’77] – Geometric AIC [Kanatani’03], Robust AIC [Torr’98] Key idea (MDL): • a good balance between model complexity and data fidelity. • minimize the length of codes that describe the model and the data: with a quantization error optimal for the model.
53.
LOSSY DATA COMPRESSIONQuestions: – What is the “gain” or “loss” of segmenting or merging data? – How does tolerance of error affect segmentation results? Basic idea: whether the number of bits required to store “the whole is more than the sum of its parts”?
54.
LOSSY DATA COMPRESSION – Problem Formulation– A coding scheme maps a set of vectors to a sequence of bits, from which we can decode The coding length is denoted as:– Given a set of real-valued mixed data the optimal segmentationminimizes the overall coding length: where
55.
LOSSY DATA COMPRESSION – Coding Length for Multivariate DataTheorem.Given withis the number of bits needed to encode the data s.t.. A nearly optimal bound for even a small number of vectors drawn from a subspace or a Gaussian source. Segmentation of Multivariate Mixed Data, [Ma-Derksen-Hong-Wright, PAMI’
56.
LOSSY DATA COMPRESSION – Two Coding SchemesGoal: code s.t. a mean squared error Linear subspace Gaussian source
57.
LOSSY DATA COMPRESSION – Properties of the Coding Length1. Commutative Property: For high-dimensional data, computing the coding length only needs the kernel matrix:2. Asymptotic Property: At high SNR, this is the optimal rate distortion for a Gaussian source.3. Invariant Property: Harmonic Analysis is useful for data compression only when the data are non-Gaussian or nonlinear ……… so is segmentation!
58.
LOSSY DATA COMPRESSION – Why Segment? partitioning: sifting:
59.
LOSSY DATA COMPRESSION – Probabilistic Segmentation?Assign the ith point to the jth group with probabilityTheorem. The expected coding length of the segmented datais a concave function in Π over the domain of a convex polytope. Minima are reached at the vertexes of the polytope -- no probabilistic segmentation! Segmentation of Multivariate Mixed Data, [Ma-Derksen-Hong-Wright, PAMI’
60.
LOSSY DATA COMPRESSION – Segmentation & Channel CapacityA MIMO additive white Gaussian noise (AWGN) channelhas the capacity:If allowing probabilistic grouping of transmitters, the expectedcapacityis a concave function in Π over a convex polytope.Maximizing such a capacity is a convexproblem. On Coding and Segmentation of Multivariate Mixed Data, [Ma-Derksen-Hong-Wright, PAMI
61.
LOSSY DATA COMPRESSION – A Greedy (Agglomerative) AlgorithmObjective: minimizing the overall coding lengthInput: “Bottom-up” mergewhile true do choose two sets suchthatis minimal QuickTime™ and a if PNG decompressor are needed to see this picture. then else break endifendOutput: Segmentation of Multivariate Mixed Data via Lossy Coding and Compression, [Ma-Derksen-Hong-Wright, PAMI’07]
62.
SIMULATIONS – Mixture of Almost Degenerate Gaussians Noisy samples from two lines and one plane in <3 Given Data Segmentation Resultsε0 = 0.01ε0 = 0.08 Segmentation of Multivariate Mixed Data via Lossy Coding and Compression, [Ma-Derksen-Hong-Wright, PAMI’07]
63.
SIMULATIONS – “Phase Transition” #group v.s. distortion Rate v.s. distortion ε0 = 0.0 0.08 8 ice cubessteam Stability: the same segmentation water for ε across 3 magnitudes! 0.08 Segmentation of Multivariate Mixed Data via Lossy Coding and Compression, [Ma-Derksen-Hong-Wright, PAMI’07]
64.
SIMULATIONS – Comparison with EM100 x d uniformly distributed random samples from each subspace, corruptewith 4% noise. Classification rate averaged over 25 trials for each case. Segmentation of Multivariate Mixed Data via Lossy Coding and Compression, [Ma-Derksen-Hong-Wright, PAMI’07]
65.
SIMULATIONS – Comparison with EMSegmenting three degenerate or non-degenerate Gaussian clusters for 50 tria Segmentation of Multivariate Mixed Data via Lossy Coding and Compression, [Ma-Derksen-Hong-Wright, PAMI’07]
66.
SIMULATIONS – Robustness with Outliers 35.8% outliers 45.6% 71.5% 73.6% Segmentation of Multivariate Mixed Data via Lossy Coding and Compression, [Ma-Derksen-Hong-Wright, PAMI’07]
67.
SIMULATIONS – Affine Subspaces with Outliers 35.8% outliers 45.6% 66.2% 69.1% Segmentation of Multivariate Mixed Data via Lossy Coding and Compression, [Ma-Derksen-Hong-Wright, PAMI’07]
68.
SIMULATIONS – Piecewise-Linear Approximation of Manifolds Swiss roll Mobius strip Torus Klein bottle
69.
SIMULATIONS – Summary– The minimum coding length objective automatically addressesthe model selection issue: the optimal solution is very stable androbust.– The segmentation/merging is physically meaningful (measuredin bits). The results resemble phase transition in statistical physics.– The greedy algorithm is scalable (polynomial in both K and N)and converges well when ε is not too small w.r.t. the sampledensity.
70.
Clustering from a Classification PerspectiveAssumption: The training dataare drawn from a distributionGoal: Construct a classifiersuch that the misclassificationerrorreaches minimum.Solution: Knowing the distributions and, the optimal classifier is the maximum a posteriori (MAP)classifier:Difficulties: How to learn the two distributions from samples?(parametric, non-parametric, model selection, high-dimension, outliers…)
71.
MINIMUM INCREMENTAL CODING LENGTH – Problem FormulationIdeas: Using the lossy coding lengthas a surrogate for the Shannon lossless coding length w.r.t. truedistributions.Additional bits need to encode the testsample with the jth training set isClassification Criterion: Minimum Incremental Coding Length(MICL)
72.
MICL (“Michael”) – Asymptotic PropertiesTheorem: As the number of samples goes to infinity, the MICLcriterion converges with probability one to the following criterion:where ? is the “number of effectiveparameters” of the j-th model (class).Theorem: The MICL classifier converges to the above asymptotic format the rate of for some constant . Minimum Incremental Coding Length (MICL), [Wright and Ma et. a.., NIPS’07]
73.
SIMULATIONS – Interpolation and Extrapolation via MICL MICL SVM k-NN Minimum Incremental Coding Length (MICL), [Wright and Ma et. a.., NIPS’07]
74.
SIMULATIONS – Improvement over MAP and RDA [Friedman1989]Two Gaussians inR2 isotropic (left) anisotropic(right)(500 trials)Three Gaussians inRn dim = n dim = n/2 dim = 1(500 trials) Minimum Incremental Coding Length (MICL), [Wright and Ma et. a.., NIPS’07]
75.
SIMULATIONS – Local and Kernel MICLLocal MICL (LMICL): Applying MICL locally to the k-nearestneighbors of the test sample (frequencylist + Bayesianist).Kernel MICL (KMICL): Incorporating MICL with a nonlinear kernelnaturally through the identity (“kernelized” RDA): LMICL k- KMICL-RBF SVM-RBF NN Minimum Incremental Coding Length (MICL), [Wright and Ma et. a.., NIPS’07]
76.
CONCLUSIONS Assumptions: Data are in a high-dimensional space but have low-dimensional structures (subspaces or submanifolds). Compression => Clustering & Classification: – Minimum (incremental) coding length subject to distortion. – Asymptotically optimal clustering and classification. – Greedy clustering algorithm (bottom-up, agglomerative). – MICL corroborates MAP, RDA, k-NN, and kernel methods. Applications (Next Lectures): – Video segmentation, motion segmentation (Vidal) – Image representation & segmentation (Ma) – Others: microarray clustering, recognition of faces and handwritten digits (Ma)
77.
FUTURE DIRECTIONS Theory – More complex structures: manifolds, systems, random fields… – Regularization (ridge, lasso, banding etc.) – Sparse representation and subspace arrangements Computation – Global optimality (random techniques, convex optimization…) – Scalability: random sampling, approximation… Future Application Domains – Image/video/audio classification, indexing, and retrieval – Hyper-spectral images and videos – Biomedical images, microarrays – Autonomous navigation, surveillance, and 3D mapping – Identification of hybrid linear/nonlinear systems
78.
REFERENCES & ACKNOWLEGMENT References: – Segmentation of Multivariate Mixed Data via Lossy Data Compression, Yi Ma, Harm Derksen, Wei Hong, John Wright, PAMI, 2007. – Classification via Minimum Incremental Coding Length (MICL), John Wright et. al., NIPS, 2007. – Website: http://perception.csl.uiuc.edu/coding/home.htm People: – John Wright, PhD Student, ECE Department, University of Illinois – Prof. Harm Derksen, Mathematics Department, University of Michigan – Allen Yang (UC Berkeley) and Wei Hong (Texas Instruments R&D) – Zhoucheng Lin and Harry Shum, Microsoft Research Asia, China Funding: – ONR YIP N00014-05-1-0633 – NSF CAREER IIS-0347456, CCF-TF-0514955, CRS-EHS-0509151
79.
11/2003 “The whole is more than the sum of its parts.” -- Aristotle Questions, please?Yi Ma, CVPR 2008
80.
Part IIApplications of GPCA in Computer Vision René Vidal Center for Imaging Science Institute for Computational Medicine Johns Hopkins University
81.
Part II: Applications in computer vision • Applications to motion & video segmentation (10.30-11.15) – 2-D and 3-D motion segmentation – Temporal video segmentation – Dynamic texture segmentation • Applications to image representation and segmentation (11.15-12.00) – Multi-scale hybrid linear models for sparse image representation – Hybrid linear models for image segmentation
82.
Applications to motion and and video segmentation René Vidal Center for Imaging Science Institute for Computational Medicine Johns Hopkins University
83.
3-D motion segmentation problem • Given a set of point correspondences in multiple views, determine – Number of motion models – Motion model: affine, homography, fundamental matrix, trifocal tensor – Segmentation: model to which each pixel belongs • Mathematics of the problem depends on – Number of frames (2, 3, multiple) – Projection model (affine, perspective) – Motion model (affine, translational, homography, fundamental matrix, etc.) – 3-D structure (planar or not)
84.
Taxonomy of problems• 2-D Layered representation – Probabilistic approaches: Jepson-Black’93, Ayer-Sawhney’95, Darrel-Pentland’95, Weiss- Adelson’96, Weiss’97, Torr-Szeliski-Anandan’99 – Variational approaches: Cremers-Soatto ICCV’03 – Initialization: Wang-Adelson’94, Irani-Peleg’92, Shi-Malik‘98, Vidal-Singaraju’05-’06• Multiple rigid motions in two perspective views – Probabilistic approaches: Feng-Perona’98, Torr’98 – Particular cases: Izawa-Mase’92, Shashua-Levin’01, Sturm’02, – Multibody fundamental matrix: Wolf-Shashua CVPR’01, Vidal et al. ECCV’02, CVPR’03, IJCV’06 – Motions of different types: Vidal-Ma-ECCV’04, Rao-Ma-ICCV’05• Multiple rigid motions in three perspective views – Multibody trifocal tensor: Hartley-Vidal-CVPR’04• Multiple rigid motions in multiple affine views – Factorization-based: Costeira-Kanade’98, Gear’98, Wu et al.’01, Kanatani’ et al.’01-02-04 – Algebraic: Yan-Pollefeys-ECCV’06, Vidal-Hartley-CVPR’04• Multiple rigid motions in multiple perspective views – Schindler et al. ECCV’06, Li et al. CVPR’07
85.
A unified approach to motion segmentation • Estimation of multiple motion models equivalent to estimation of one multibody motion model chicken-and-egg – Eliminate feature clustering: multiplication – Estimate a single multibody motion model: polynomial fitting – Segment multibody motion model: polynomial differentiation
86.
A unified approach to motion segmentation • Applies to most motion models in computer vision • All motion models can be segmented algebraically by – Fitting multibody model: real or complex polynomial to all data – Fitting individual model: differentiate polynomial at a data point
87.
Segmentation of 3-D translational motions • Multiple epipoles (translation) • Epipolar constraint: plane in – Plane normal = epipoles – Data = epipolar lines • Multibody epipolar constraint • Epipoles are derivatives of at epipolar lines
89.
Single-body factorization Structure = 3D surface • Affine camera model – p = point – f = frame Motion = camera position and orientation • Motion of one rigid-body lives in a 4-D subspace (Boult and Brown ’91, Tomasi and Kanade ‘92) – P = #points – F = #frames
90.
Multi-body factorization • Given n rigid motions • Motion segmentation is obtained from – Leading singular vector of (Boult and Brown ’91) – Shape interaction matrix (Costeira & Kanade ’95, Gear ’94) – Number of motions (if fully-dimensional) • Motion subspaces need to be independent (Kanatani ’01)
91.
Multi-body factorization • Sensitive to noise – Kanatani (ICCV ’01): use model selection to scale Q – Wu et al. (CVPR’01): project data onto subspaces and iterate • Fails with partially dependent motions – Zelnik-Manor and Irani (CVPR’03) • Build similarity matrix from normalized Q • Apply spectral clustering to similarity matrix – Yan and Pollefeys (ECCV’06) • Local subspace estimation + spectral clustering – Kanatani (ECCV’04) • Assume degeneracy is known: pure translation in the image • Segment data by multi-stage optimization (multiple EM problems) • Cannot handle missing data – Gruber and Weiss (CVPR’04) • Expectation Maximization
92.
PowerFactorization+GPCA• A motion segmentation algorithm that – Is provably correct with perfect data – Handles both independent and degenerate motions – Handles both complete and incomplete data• Project trajectories onto a 5-D subspace of – Complete data: PCA or SVD – Incomplete data: PowerFactorization• Cluster projected subspaces using GPCA – Handles both independent and degenerate motions – Non-iterative: can be used to initialize EM
93.
Projection onto a 5-D subspace • Motion of one rigid-body lives in 4-D subspace of Motion 1 • By projecting onto a 5-D subspace of Motion 2 – Number and dimensions of subspaces are preserved – Motion segmentation is equivalent to clustering subspaces of dimension 2, 3 or 4 in – Minimum #frames = 3 (CK needs a minimum of 2n frames for n motions) • What projection to use? – Can remove outliers by robustly – PCA: 5 principal components fitting the 5-D subspace using Robust SVD (DeLaTorre-Black) – RPCA: with outliers
94.
Projection onto a 5-D subspace PowerFactorization algorithm: Given , factor it as • Complete data • Incomplete data – Given A solve for B – Orthonormalize B Linear problem – Given B solve for A – Iterate • It diverges in some cases • Converges to rank-r approximation with rate • Works well with up to 30% of missing data
95.
Motion segmentation using GPCA• Apply polynomial embedding to 5-D points Veronese map
96.
Hopkins 155 motion segmentation database • Collected 155 sequences – 120 with 2 motions – 35 with 3 motions • Types of sequences – Checkerboard sequences: mostly full dimensional and independent motions – Traffic sequences: mostly degenerate (linear, planar) and partially dependent motions – Articulated sequences: mostly full dimensional and partially dependent motions • Point correspondences – In few cases, provided by Kanatani & Pollefeys – In most cases, extracted semi-automatically with OpenCV
99.
Experimental results: missing data sequences • There is no clear correlation between amount of missing data and percentage of misclassification • This could be because convergence of PF depends more on “where” missing data is located than on “how much” missing data there is
100.
Conclusions • For two motions – Algebraic methods (GPCA and LSA) are more accurate than statistical methods (RANSAC and MSL) – LSA performs better on full and independent sequences, while GPCA performs better on degenerate and partially dependent – LSA is sensitive to dimension of projection: d=4n better than d=5 – MSL is very slow, RANSAC and GPCA are fast • For three motions – GPCA is not very accurate, but is very fast – MSL is the most accurate, but it is very slow – LSA is almost as accurate as MSL and almost as fast as GPCA
101.
Segmentation of Dynamic Textures René Vidal Center for Imaging Science Institute for Computational Medicine Johns Hopkins University
102.
Modeling a dynamic texture: fixed boundary • Examples of dynamic textures: • Model temporal evolution as the output of a linear dynamical system (LDS): Soatto et al. ‘01 dynamics zt+1 = Azt + vt images yt = Czt + wt appearance
103.
Segmenting non-moving dynamic textures • One dynamic texture lives in the observability subspace zt+1 = Azt + vt yt = Czt + wt • Multiple textures live in multiple subspaces water steam • Cluster the data using GPCA
106.
Level-set intensity-based segmentation • Chan-Vese energy functional • Implicit methods – Represent C as the zero level set of an implicit function , i.e. C = {(x, y) : (x, y) = 0} • Solution – The solution to the gradient descent algorithm for is given by – c1 and c2 are the mean intensities inside and outside the contour C.
107.
Dynamics & intensity-based energy • We represent the intensities of the pixels in the images as the output of a mixture of AR models of order p • We propose the following spatial-temporal extension of the Chan-Vese energy functional where
108.
Variational segmentation of dynamic textures • Given the ARX parameters, we can solve for the implicit function by solving the PDE • Given the implicit function , we can solve for the ARX parameters of the jth region by solving the linear system
109.
Variational segmentation of dynamic textures • Fixed boundary segmentation results and comparison Ocean-smoke Ocean-dynamics Ocean-appearance
110.
Variational segmentation of dynamic textures • Moving boundary segmentation results and comparison Ocean-fire
111.
Variational segmentation of dynamic textures • Results on a real sequence Raccoon on River
112.
Temporal video segmentation • Segmenting N=30 frames of a sequence containing n=3 scenes – Host – Guest – Both • Image intensities are output of linear system dynamics xt+1 = Axt +vt • y =C Apply GPCA totfit n=3 xt +wt images observability subspaces appearance
113.
Temporal video segmentation • Segmenting N=60 frames of a sequence containing n=3 scenes – Burning wheel – Burnt car with people – Burning car • Image intensities are output of linear system dynamics xt+1 = Axt +vt yt = Cxt +wt images • Apply GPCA to fit n=3 appearance observability subspaces
114.
Conclusions • Many problems in computer vision can be posed as subspace clustering problems – Temporal video segmentation – 2-D and 3-D motion segmentation – Dynamic texture segmentation – Nonrigid motion segmentation • These problems can be solved using GPCA: an algorithm for clustering subspaces – Deals with unknown and possibly different dimensions – Deals with arbitrary intersections among the subspaces • GPCA is based on – Projecting data onto a low-dimensional subspace – Recursively fitting polynomials to projected subspaces – Differentiating polynomials to obtain a basis
115.
For more information, Vision, Dynamics and Learning Lab @ Johns Hopkins University Thank You!
116.
Generalized Principal Component Analysisfor Image Representation & Segmentation Yi Ma Control & Decision, Coordinated Science Laboratory Image Formation & Processing Group, Beckman Department of Electrical & Computer Engineering University of Illinois at Urbana-Champaign
117.
INTRODUCTIONGPCA FOR LOSSY IMAGE REPRESENTATIONIMAGE SEGMENTATION VIA LOSSY COMPRESSIONOTHER APPLICATIONSCONCLUSIONS AND FUTURE DIRECTIONS
118.
Introduction – Image Representation via Linear Transformations better representations? pixel-based representation three matrixes of RGB-values a more compact linear transformation representation
120.
IntroductionAdaptive Bases (optimal if imagery data are uni-modal)- Karhunen-Loeve transform (KLT), also known as PCA (Pearson’1901, Hotelling’33, Jolliffe’86) stack adaptive bases
121.
Introduction – Principal Component Analysis (PCA)Dimensionality ReductionFind a low-dimensional representation (model) for high-dimensional data.Principal Component Analysis (Pearson’1901, Hotelling’1933, Eckart &Young’1936) or Karhunen-Loeve transform (KLT). Basis for S SVDVariations of PCA – Nonlinear Kernel PCA (Scholkopf-Smola-Muller’98) – Probabilistic PCA (Tipping-Bishop’99, Collins et.al’01) – Higher-Order SVD (HOSVD) (Tucker’66, Davis’02) – Independent Component Analysis (Hyvarinen-Karhunen-Oja’01)
122.
Hybrid Linear Models – Multi-Modal CharacteristicsDistribution of the first three principal components ofthe Baboon image: A clear multi-modal distribution
124.
Hybrid Linear Models – Versus Linear ModelsA single linear model Linear stackHybrid linear models Hybrid linear stack
125.
Hybrid Linear Models – Characteristics of Natural Images Multivariate Hybrid Hierarchical High-dimensio 1D 2D (multi-modal) (multi-scale) (vector-value Fourier (DCT) X X Wavelets X X Curvelets XRandom fields X X X PCA/KLT X X X VQ X X X XHybrid linear X X X X X We need a new & simple paradigm to effectively account for all these characteristics simultaneously.
126.
Hybrid Linear Models – Subspace Estimation and SegmentationHybrid Linear Models (or SubspaceArrangements) – the number of subspaces is unknown – the dimensions of the subspaces are unknown – the basis of the subspaces are unknown – the segmentation of the data points is unknown “Chicken-and-Egg” Coupling – Given segmentation, estimate subspaces – Given subspaces, segment the data
127.
Hybrid Linear Models – Recursive GPCA (an Example)
128.
Hybrid Linear Models – Effective DimensionModel Selection (for Noisy Data) Model complexity; Data fidelity; Number of subspaces Total Dimension Number of number of of each points in each points subspace subspace Model selection criterion: minimizing effective dimension subject to a given error tolerance (or PSNR)
129.
Hybrid Linear Models – Simulation Results (5% Noise)ED=3ED=2.0067ED=1.6717
130.
Hybrid Linear Models – Subspaces of the Barbara Image
131.
Hybrid Linear Models – Lossy Image Representation (Baboon) GPCA Original PCA (8x8) DCT (JPEG) Harr Wavelet GPCA (8x8)
132.
Multi-Scale Implementation – Algorithm DiagramDiagram for a level-3 implementation of hybrid linear modelsfor image representationMulti-Scale Hybrid Linear Models for Lossy Image Representation, [Hong-Wright-Ma, TIP
133.
Multi-Scale Implementation – The Baboon Image The Baboon image downsample by two twice segmentation of 2 by 2 blocksMulti-Scale Hybrid Linear Models for Lossy Image Representation, [Hong-Wright-Ma, TIP
134.
Multi-Scale Implementation – Comparison with Other Methods The Baboon imageMulti-Scale Hybrid Linear Models for Lossy Image Representation, [Hong-Wright-Ma, TIP
135.
Multi-Scale Implementation – Image ApproximationComparison with level-3 wavelet (7.5% coefficients) Level-3 bior-4.4 wavelets Level-3 hybrid linear model PSNR=23.94 PSNR=24.64Multi-Scale Hybrid Linear Models for Lossy Image Representation, [Hong-Wright-Ma, TIP
136.
Multi-Scale Implementation – Block Size Effect The Baboon image Some problems with the multi-scale hybrid linear model: 1. has minor block effect; 2. is computationally more costly (than Fourier, wavelets, PCA); 3. does not fully exploit spatial smoothness as wavelets.Multi-Scale Hybrid Linear Models for Lossy Image Representation, [Hong-Wright-Ma, TIP
137.
Multi-Scale Implementation – The Wavelet Domain The Baboon image HL LH HH segmentation at each scaleMulti-Scale Hybrid Linear Models for Lossy Image Representation, [Hong-Wright-Ma, TIP
138.
Multi-Scale Implementation – Wavelets v.s. Hybrid Linear Wavelets The Baboon image Advantages of the hybrid linear model in wavelet domain: 1. eliminates block effect; 2. is computationally less costly (than in the spatial domain); 3. achieves higher PSNR.Multi-Scale Hybrid Linear Models for Lossy Image Representation, [Hong-Wright-Ma, TIP
139.
Multi-Scale Implementation – Visual Comparison Comparison among several models (7.5% coefficients) Original Wavelets Image PSNR=23.94Hybrid model Hybrid model in spatial in wavelet domain domainPSNR=24.64 PSNR=24.88 Multi-Scale Hybrid Linear Models for Lossy Image Representation, [Hong-Wright-Ma, TIP
140.
Image Segmentation – via Lossy Data Compression stack QuickTime™ and a PNG decompressor are needed to see this picture.
141.
APPLICATIONS – Texture-Based Image Segmentation Naïve approach: – Take a 7x7 Gaussian window around every pixel. – Stack these windows as vectors. – Clustering the vectors using our algorithm.A few results: Segmentation of Multivariate Mixed Data via Lossy Coding and Compression, [Ma-Derksen-Hong-Wright, PAMI’07]
142.
APPLICATIONS – Distribution of Texture Features Question: why does such a simple algorithm work at all? Answer: Compression (MDL/MCL) is well suited to mid-level texture segmentation. Using a single representation (e.g. windows, filterbank responses) for textures different complexity ⇒ redundancy and degeneracy, which can be exploited fo clustering / compression. QuickTime™ and a TIFF (LZW) decompressorare needed to see this picture. Above: singular values of feature vectors from two different segments of the image at left.
143.
APPLICATIONS – Compression-based Texture Merging (CTM)Problem with the naïveapproach: QuickTime™ and a TIFF (LZW) decompressor QuickTime™ and a QuickTime™ and a are needed to see this picture.Strong edges, segment boundaries TIFF (LZW) decompressor TIFF (LZW) decompressor are needed to see this picture. are needed to see this picture.Solution:Low-level, edge-preserving over-segmentation into small homogeneousregions.Simple features: stacked Gaussian windows (7x7 in our experiments).Merge adjacent regions to minimize coding length (“compress” the features).
146.
APPLICATIONS – CTM: Quantitative Evaluation and ComparisonBerkeley Image Segmentation Database PRI: Probabilistic Rand Index [Pantofaru 2005] VoI: Variation of Information [Meila 2005] GCE: Global Consistency Error [Martin 2001] BDE: Boundary Displacement Error [Freixenet 2002]Unsupervised Segmentation of Natural Images via Lossy Data Compression, CVIU, 200
147.
Other Applications: Multiple Motion Segmentation (on Hopkins155) QuickTime™ and a QuickTime™ and a Cinepak decompressor Cinepak decompressor are needed to see this picture. are needed to see this picture.Two Motions: MSL 4.14%, LSA 3.45%, ALC 2.40%, and work with up to 25% outliers.Three Motions: MSL 8.32%, LSA 9.73%, ALC 6.26%. Shankar Rao, Roberton Tron, Rene Vidal, and Yi Ma, to appear in CVPR’08
148.
Other Applications – Clustering of Microarray Data Segmentation of Multivariate Mixed Data, [Ma-Derksen-Hong-Wright, PAMI’
149.
Other Applications – Clustering of Microarray Data Segmentation of Multivariate Mixed Data, [Ma-Derksen-Hong-Wright, PAMI’
150.
Other Applications – Supervised Classification Premises: Data lie on an arrangement of subspaces Unsupervised Clustering Supervised Classification – Generalized PCA – Sparse Representation
151.
Other Applications – Robust Face Recognition Robust Face Recognition via Sparse Representation, to appear in PAMI 2008
152.
Other Applications: Robust Motion Segmentation (on Hopkins155) Dealing with incomplete or mistracked features with dataset 80% corrupted! Shankar Rao, Roberto Tron, Rene Vidal, and Yi Ma, to appear in CVPR’08
153.
Three Measures of Sparsity: Bits, L_0 and L1-NormReason: High-dimensional data, like images, do have compact,compressible, sparse structures, in terms of their geometry,statistics, and semantics.
154.
Conclusions Most imagery data are high-dimensional, statistically or geometrically heterogeneous, and have multi-scale structures. Imagery data require hybrid models that can adaptively represent different subsets of the data with different (sparse) linear models. Mathematically, it is possible to estimate and segment hybrid (linear) models non-iteratively. GPCA offers one such method. Hybrid models lead to new paradigms, new principles, and new applications for image representation, compression, and segmentation.
155.
Future Directions Mathematical Theory – Subspace arrangements (algebraic properties). – Extension of GPCA to more complex algebraic varieties (e.g., hybrid multilinear, high-order tensors). – Representation & approximation of vector-valued functions. Computation & Algorithm Development – Efficiency, noise sensitivity, outlier elimination. – Other ways to combine with wavelets and curvelets. Applications to Other Data – Medical imaging (ultra-sonic, MRI, diffusion tensor…) – Satellite hyper-spectral imaging. – Audio, video, faces, and digits. – Sensor networks (location, temperature, pressure, RFID…) – Bioinformatics (gene expression data…)
156.
AcknowledgementPeople – Wei Hong, Allen Yang, John Wright, University of Illinois – Rene Vidal of Biomedical Engineering Dept., Johns Hopkins University – Kun Huang of Biomedical & Informatics Science Dept., Ohio- State UniversityFunding – Research Board, University of Illinois at Urbana-Champaign – National Science Foundation (NSF CAREER IIS-0347456) – Office of Naval Research (ONR YIP N000140510633) – National Science Foundation (NSF CRS-EHS0509151) – National Science Foundation (NSF CCF-TF0514955)
157.
Generalized Principal Component Analysis:Modeling and Segmentation of Multivariate MixedDataRene Vidal, Yi Ma, and Shankar SastrySpringer-Verlag, to appear Thank You!Yi Ma, CVPR 2008
A particular slide catching your eye?
Clipping is a handy way to collect important slides you want to go back to later.
Be the first to comment