The document describes a tutorial on generalized principal component analysis (GPCA) which presents a method for clustering data points that lie in multiple subspaces of unknown and possibly different dimensions. The tutorial covers the basic theory and algorithms of GPCA, including applications to problems in computer vision such as image and video segmentation, motion segmentation, and face recognition. Advanced statistical methods for robust GPCA in the presence of noise and outliers are also discussed.
Explores the type of structure learned by Convolutional Neural Networks, the applications where they're most valuable and a number of appropriate mental models for understanding deep learning.
This presentation is Part 2 of my September Lisp NYC presentation on Reinforcement Learning and Artificial Neural Nets. We will continue from where we left off by covering Convolutional Neural Nets (CNN) and Recurrent Neural Nets (RNN) in depth.
Time permitting I also plan on having a few slides on each of the following topics:
1. Generative Adversarial Networks (GANs)
2. Differentiable Neural Computers (DNCs)
3. Deep Reinforcement Learning (DRL)
Some code examples will be provided in Clojure.
After a very brief recap of Part 1 (ANN & RL), we will jump right into CNN and their appropriateness for image recognition. We will start by covering the convolution operator. We will then explain feature maps and pooling operations and then explain the LeNet 5 architecture. The MNIST data will be used to illustrate a fully functioning CNN.
Next we cover Recurrent Neural Nets in depth and describe how they have been used in Natural Language Processing. We will explain why gated networks and LSTM are used in practice.
Please note that some exposure or familiarity with Gradient Descent and Backpropagation will be assumed. These are covered in the first part of the talk for which both video and slides are available online.
A lot of material will be drawn from the new Deep Learning book by Goodfellow & Bengio, Michael Nielsen's online book on Neural Networks and Deep Learning, as well as several other online resources.
Bio
Pierre de Lacaze has over 20 years of industry experience with AI and Lisp-based technologies. He holds a Bachelor of Science in Applied Mathematics and a Master’s Degree in Computer Science.
https://www.linkedin.com/in/pierre-de-lacaze-b11026b/
Modern Convolutional Neural Network techniques for image segmentation (Gioele Ciaparrone)
Recently, Convolutional Neural Networks have been successfully applied to image segmentation tasks. Here we present some of the most recent techniques that increased the accuracy in such tasks. First we describe the Inception architecture and its evolution, which made it possible to increase the width and depth of the network without increasing the computational burden. We then show how to adapt classification networks into fully convolutional networks that perform pixel-wise classification for segmentation tasks. Finally, we introduce the hypercolumn technique, which further improves the state of the art on various fine-grained localization tasks.
This is the review of the 243rd paper in PR12, the TensorFlow Korea paper-reading group.
This paper is Designing Network Design Spaces from Facebook AI Research, known as RegNet.
When designing a CNN, are bottleneck layers really a good idea? Does performance keep improving as the number of layers grows? When the width and height of the activation map are halved (by stride 2 or pooling), the number of channels is usually doubled, but is that the best choice? Might it be better to have no bottleneck layers at all? Is there a magic number of layers that gives the best performance? When the activation size is halved, would tripling the channels work better than doubling them?
Rather than designing a single neural network well, this paper discusses how to design a good design space, that is, a space populated by good neural networks, from which good networks can then be found with techniques such as AutoML. It proposes starting from a design space with almost no constraints and narrowing it to a good design space with a human in the loop. The video below shows which design space produced RegNet, which outperforms EfficientNet, and whether some of the design choices we have taken for granted were in fact wrong.
Video link: https://youtu.be/bnbKQRae_u4
Paper link: https://arxiv.org/abs/2003.13678
In this presentation we discuss the convolution operation, the architecture of a convolutional neural network, and its different layers, such as pooling. This presentation draws heavily from A. Karpathy's Stanford course CS231n.
Vision plays the most important role in human perception, but it is limited to the visible band of the electromagnetic spectrum. This raises the need for radar imaging systems that can recover sources outside the human visual band. This paper presents a new algorithm for segmenting Synthetic Aperture Radar (SAR) images based on thresholding. Entropy-based image thresholding, an important concept in image processing, has received sustained interest in recent years. Pal (1996) proposed a cross-entropy thresholding method based on the Gaussian distribution for bi-modal images. Our method is derived from Pal's method: it segments images using cross-entropy thresholding based on the Gamma distribution and can handle both bi-modal and multimodal images. Tested on SAR images, it gave good results for both kinds of image; the results obtained are encouraging.
Image generation. Gaussian models for human faces, limits and relations with linear neural networks. Generative adversarial networks (GANs), generators, discriminators, adversarial loss and two-player games. Convolutional GAN and image arithmetic. Super-resolution. Nearest-neighbor, bilinear and bicubic interpolation. Image sharpening. Linear inverse problems, Tikhonov and Total-Variation regularization. Super-Resolution CNN, VDSR, Fast SRCNN, SRGAN, perceptual, adversarial and content losses. Style transfer: Gatys model, content loss and style loss.
AI&BigData Lab 2016. Александр Баев: Transfer learning - why, how and where. (GeeksLab Odessa)
4.6.16 AI&BigData Lab
Upcoming events: goo.gl/I2gJ4H
We will talk about one of the basic practical techniques for training neural networks: pretraining, finetuning and transfer learning. When to apply it, which models to use, where to get them and how to adapt them.
Blind Source Separation using Dictionary Learning (Davide Nardone)
The sparse decomposition of images and signals has found wide use in compression, noise removal and source separation. It represents a signal as a linear combination of a few elements of a redundant dictionary. The dictionary may be fixed (Fourier, wavelet, etc.) or learned from a set of samples. Algorithms that learn the dictionary apply to a broad class of signals and compress better than methods based on a fixed dictionary. Here we present a Compressed Sensing (CS) approach with an adaptive dictionary for solving a Determined Blind Source Separation (DBSS) problem. The method reformulates DBSS as a Sparse Coding (SC) problem. The algorithm consists of a few steps: mixing matrix estimation, sparse source separation and source reconstruction. A sparse mixture of the original source signals is used to estimate the mixing matrix, which is then used to reconstruct the source signals. A 'block signal representation' of the mixture greatly improves the computational efficiency of the 'mixing matrix estimation' and 'signal recovery' steps without a notable loss of separation accuracy. Experimental results compare the computation and separation performance of the method for different dictionaries, fixed or adaptive. Finally, a real case study from the field of Wireless Sensor Networks (WSN) is presented, in which a set of sensor nodes relays data to a multi-receiver node. Since several nodes transmit messages simultaneously, the mixture of information must be separated at the receiver, which amounts to solving a BSS problem.
Aggelos Katsaggelos, Professor and AT&T Chair, Northwestern University, Department of Electrical Engineering & Computer Science (IEEE/ SPIE Fellow, IEEE SPS DL), Sparse and Redundant Representations: Theory and Applications
Computer vision has been studied for more than 40 years. Because topics in vision and related fields (e.g., machine learning, signal processing, cognitive science) are increasingly diverse and develop rapidly, coming up with new research ideas is usually daunting for junior graduate students in this field. In this talk, I will present five methods for coming up with new research ideas. For each method, I will give several examples (i.e., existing works in the literature) to illustrate how the method works in practice.
This is a common sense talk and will not have complicated math equations and theories.
Note: The content of this talk is inspired by "Raskar Idea Hexagon" - Prof. Ramesh Raskar's talk on "How to come up with new Ideas".
To download the presentation slide with videos, please visit
http://jbhuang0604.blogspot.com/2010/05/how-to-come-up-with-new-research-ideas.html
For the video lecture (in Chinese), please visit
http://jbhuang0604.blogspot.com/2010/06/blog-post_14.html
ODSC India 2018: Topological space creation & Clustering at BigData scale (Kuldeep Jiwani)
Every dataset has an inherent natural geometry. We are generally influenced by how the world visually appears to us and apply the same flat Euclidean geometry to data. But the data geometry could be curved or have holes, and distances may not be definable in all cases. If we still impose Euclidean geometry on it, we may distort the data space and destroy the information it contains.
In the BigData world we regularly have to handle terabytes of data and extract meaningful information from it, which calls for many unsupervised machine learning techniques. Two important steps in this process are building a topological space that captures the natural geometry of the data and then clustering in that topological space to obtain meaningful clusters.
This talk will walk through "Data Geometry" discovery techniques, first analytically and then via applied machine learning methods, so that listeners can take away hands-on techniques for discovering the real geometry of their data. Attendees will be presented with various BigData techniques, along with Apache Spark code showing how to build data geometry over massive data lakes.
Super resolution in deep learning era - Jaejun Yoo (JaeJun Yoo)
Abstract:
Image restoration (IR) is one of the fundamental problems in low-level vision and includes denoising, deblurring, super-resolution, etc. In today's talk, I will focus on the super-resolution task. There are two main streams in super-resolution studies: traditional model-based optimization and discriminative learning methods based on deep learning. I will present the pros and cons of both and their recent developments in the research field. Finally, I will provide a mathematical view that explains both methods in a single holistic framework while achieving the best of both worlds, review related work, and summarize the problems that remain unsolved in the field.
Lecture 11 - KNN and Clustering, a lecture in subject module Statistical & Ma... (Maninda Edirisooriya)
This lesson covers the supervised ML technique K-Nearest Neighbor and unsupervised clustering techniques. It was one of the lectures of a full course I taught at the University of Moratuwa, Sri Lanka, in the second half of 2023.
Machine Learning Foundations for Professional Managers (Albert Y. C. Chen)
20180526@Taiwan AI Academy, Professional Managers Class.
Covering important concepts of classical machine learning, in preparation for deep learning topics to follow. Topics include regression (linear, polynomial, gaussian and sigmoid basis functions), dimension reduction (PCA, LDA, ISOMAP), clustering (K-means, GMM, Mean-Shift, DBSCAN, Spectral Clustering), classification (Naive Bayes, Logistic Regression, SVM, kNN, Decision Tree, Classifier Ensembles, Bagging, Boosting, Adaboost) and Semi-Supervised learning techniques. Emphasis on sampling, probability, curse of dimensionality, decision theory and classifier generalizability.
Multi-class Classification on Riemannian Manifolds for Video Surveillance (Diego Tosato)
In video surveillance, classifying visual data can be very hard because of the low resolution and the noise of the sensor data. In this paper, we propose a novel feature, the ARray of COvariances (ARCO), and a multi-class classification framework operating on Riemannian manifolds. ARCO is composed of a structure of covariance matrices of image features and is able to extract information from data at prohibitively low resolutions. The proposed classification framework instantiates a new multi-class boosting method that works on the manifold of symmetric positive definite d×d (covariance) matrices. As practical applications, we consider different surveillance tasks, such as head pose classification and pedestrian detection, and achieve state-of-the-art performance on standard datasets.
How Machine Learning Helps Organizations to Work More Efficiently? (Tuan Yang)
Data is growing day by day, and so is the cost of storing and handling it. Understanding the concepts of machine learning makes it possible to handle this volume of data and process it affordably.
The process involves building models with several kinds of algorithms. If a model is built precisely for a given task, organizations have a good chance of exploiting profitable opportunities and avoiding hidden risks.
Learn more about:
» Understanding Machine Learning Objectives.
» Data dimensions in Machine Learning.
» Fundamentals of Algorithms and Mapping from Input/Output.
» Parametric and Non-parametric Machine Learning Algorithms.
» Supervised, Unsupervised and Semi-Supervised Learning.
» Estimating Over-fitting and Under-fitting.
» Use Cases.
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In... (Dr. Vinod Kumar Kanvaria)
Exploiting Artificial Intelligence for Empowering Researchers and Faculty,
International FDP on Fundamentals of Research in Social Sciences
at Integral University, Lucknow, 06.06.2024
By Dr. Vinod Kumar Kanvaria
CVPR2008 tutorial generalized pca
1. Generalized Principal Component Analysis
Tutorial @ CVPR 2008
Yi Ma, ECE Department, University of Illinois, Urbana Champaign
René Vidal, Center for Imaging Science, Institute for Computational Medicine, Johns Hopkins University
2. Data segmentation and clustering
• Given a set of points, separate them into multiple groups
• Discriminative methods: learn boundary
• Generative methods: learn mixture model, using, e.g.
Expectation Maximization
3. Dimensionality reduction and clustering
• In many problems data is high-dimensional: can reduce
dimensionality using, e.g. Principal Component Analysis
• Image compression
• Recognition
– Faces (Eigenfaces)
• Image segmentation
– Intensity (black-white)
– Texture
4. Segmentation problems in dynamic vision
• Segmentation of video and dynamic textures
• Segmentation of rigid-body motions
5. Segmentation problems in dynamic vision
• Segmentation of rigid-body motions from dynamic textures
6. Clustering data on non Euclidean spaces
• Clustering data on non Euclidean spaces
– Mixtures of linear spaces
– Mixtures of algebraic varieties
– Mixtures of Lie groups
• “Chicken-and-egg” problems
– Given segmentation, estimate models
– Given models, segment the data
– Initialization?
• Need to combine
– Algebra/geometry, dynamics and statistics
7. Outline of the tutorial
• Introduction (8.00-8.15)
• Part I: Theory (8.15-9.45)
– Basic GPCA theory and algorithms (8.15-9.00)
– Advanced statistical methods for GPCA (9.00-9.45)
• Questions (9.45-10.00)
• Break (10.00-10.30)
• Part II: Applications (10.30-12.00)
– Applications to motion and video segmentation (10.30-11.15)
– Applications to image representation & segmentation (11.15-12.00)
• Questions (12.00-12.15)
8. Part I: Theory
• Introduction to GPCA (8.00-8.15)
• Basic GPCA theory and algorithms (8.15-9.00)
– Review of PCA and extensions
– Introductory cases: line, plane and hyperplane segmentation
– Segmentation of a known number of subspaces
– Segmentation of an unknown number of subspaces
• Advanced statistical methods for GPCA (9.00-9.45)
– Lossy coding of samples from a subspace
– Minimum coding length principle for data segmentation
– Agglomerative lossy coding for subspace clustering
9. Part II: Applications in computer vision
• Applications to motion & video segmentation (10.30-11.15)
– 2-D and 3-D motion segmentation
– Temporal video segmentation
– Dynamic texture segmentation
• Applications to image representation and segmentation (11.15-12.00)
– Multi-scale hybrid linear models for sparse image representation
– Hybrid linear models for image segmentation
12. Part I
Generalized Principal Component Analysis
René Vidal
Center for Imaging Science
Institute for Computational Medicine
Johns Hopkins University
13. Principal Component Analysis (PCA)
• Given a set of points x1, x2, …, xN
– Geometric PCA: find a subspace S passing through them
– Statistical PCA: find projection directions that maximize the variance
• Solution (Beltrami’1873, Jordan’1874, Hotelling’33, Eckart-Householder-Young’36)
Basis for S
• Applications: data compression, regression, computer
vision (eigenfaces), pattern recognition, genomics
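The classical solution cited on this slide is the singular value decomposition of the (centered) data matrix. A minimal NumPy sketch follows; the function name `pca_basis` and the synthetic plane are illustrative assumptions, not part of the tutorial.

```python
import numpy as np

def pca_basis(points, d):
    """Fit a d-dimensional subspace to the rows of `points` via the SVD."""
    mean = points.mean(axis=0)
    # Right singular vectors of the centered data are the principal directions.
    _, singular_values, vt = np.linalg.svd(points - mean, full_matrices=False)
    return mean, vt[:d].T, singular_values

# Synthetic data (assumed for illustration): 200 points on the plane
# x1 + x2 - x3 = 0 in R^3.
rng = np.random.default_rng(0)
plane_vectors = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
points = rng.normal(size=(200, 2)) @ plane_vectors

mean, basis, singular_values = pca_basis(points, d=2)
print(basis.shape)        # (3, 2): two orthonormal basis vectors for S
print(singular_values)    # the third singular value is numerically zero
```

The near-zero third singular value is what signals that the data are two-dimensional; with noisy data one would threshold the singular values instead.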
14. Extensions of PCA
• Higher order SVD (Tucker’66, Davis’02)
• Independent Component Analysis (Comon ’94)
• Probabilistic PCA (Tipping-Bishop ’99)
– Identify subspace from noisy data
– Gaussian noise: standard PCA
– Noise in exponential family (Collins et al.’01)
• Nonlinear dimensionality reduction
– Multidimensional scaling (Torgerson’58)
– Locally linear embedding (Roweis-Saul ’00)
– Isomap (Tenenbaum ’00)
• Nonlinear PCA (Scholkopf-Smola-Muller ’98)
– Identify nonlinear manifold by applying PCA to
data embedded in high-dimensional space
• Principal Curves and Principal Geodesic Analysis (Hastie-Stuetzle ’89, Tibshirani ’92, Fletcher ’04)
15. Generalized Principal Component Analysis
• Given a set of points lying in multiple subspaces, identify
– The number of subspaces and their dimensions
– A basis for each subspace
– The segmentation of the data points
• “Chicken-and-egg” problem
– Given segmentation, estimate subspaces
– Given subspaces, segment the data
16. Prior work on subspace clustering
• Iterative algorithms:
– K-subspace (Ho et al. ’03),
– RANSAC, subspace selection and growing (Leonardis et al. ’02)
• Probabilistic approaches: learn the parameters of a mixture
model using e.g. EM
– Mixtures of PPCA (Tipping-Bishop ’99)
– Multi-Stage Learning (Kanatani’04)
• Initialization
– Geometric approaches: 2 planes in R3 (Shizawa-Mase ’91)
– Factorization approaches: independent subspaces of equal
dimension (Boult-Brown ‘91, Costeira-Kanade ‘98, Kanatani ’01)
– Spectral clustering based approaches: (Yan-Pollefeys’06)
17. Basic ideas behind GPCA
• Towards an analytic solution to subspace clustering
– Can we estimate ALL models simultaneously using ALL data?
– When can we do so analytically? In closed form?
– Is there a formula for the number of models?
• Will consider the most general case
– Subspaces of unknown and possibly different dimensions
– Subspaces may intersect arbitrarily (not only at the origin)
• GPCA is an algebraic geometric approach to data segmentation
– Number of subspaces = degree of a polynomial
– Subspace basis = derivatives of a polynomial
– Subspace clustering is algebraically equivalent to
• Polynomial fitting
• Polynomial differentiation
18. Applications of GPCA in computer vision
• Geometry
– Vanishing points
• Image compression
• Segmentation
– Intensity (black-white)
– Texture
– Motion (2-D, 3-D)
– Video (host-guest)
• Recognition
– Faces (Eigenfaces)
• Man - Woman
– Human Gaits
– Dynamic Textures
• Water-bird
• Biomedical imaging
• Hybrid systems identification
20. Introductory example: algebraic clustering in 1D
• How to compute n, c, b’s?
– Number of clusters
– Cluster centers
– Solution is unique if
– Solution is closed form if
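The 1D case can be sketched in a few lines, assuming the usual algebraic formulation: every sample is a root of a monic polynomial p(x) = (x - b1)(x - b2)...(x - bn) whose coefficients c enter linearly, so c can be fitted by least squares and the centers b read off as the roots. The function name `cluster_1d`, the synthetic centers and the noise level are assumptions made for illustration.

```python
import numpy as np

def cluster_1d(x, n):
    """Algebraic clustering of scalars into n groups (a sketch)."""
    # Every sample is (nearly) a root of p(x) = x^n + c_1 x^(n-1) + ... + c_n,
    # which gives one linear equation in the unknown coefficients per sample.
    A = np.vander(x, n)                      # columns x^(n-1), ..., x^0
    c, *_ = np.linalg.lstsq(A, -x**n, rcond=None)
    # The cluster centers b_1, ..., b_n are the roots of the fitted polynomial.
    centers = np.sort(np.roots(np.concatenate(([1.0], c))).real)
    # Assign each sample to its nearest center.
    labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
    return centers, labels

rng = np.random.default_rng(1)
true_centers = np.array([-2.0, 0.5, 3.0])
true_labels = rng.integers(0, 3, size=90)
x = true_centers[true_labels] + 1e-3 * rng.normal(size=90)

centers, labels = cluster_1d(x, n=3)
print(centers)            # close to the three true centers
```

Here the number of clusters is passed in; the rank-based estimate of n that the slides mention later is not part of this sketch.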
21. Introductory example: algebraic clustering in 2D
• What about dimension 2?
• What about higher dimensions?
– Complex numbers in higher dimensions?
– How to find roots of a polynomial of quaternions?
• Instead
– Project data onto one or two dimensional space
– Apply same algorithm to projected data
22. Representing one subspace
• One plane
• One line
• One subspace can be represented with
– Set of linear equations
– Set of polynomials of degree 1
23. Representing n subspaces
• Two planes
• One plane and one line
– Plane:
– Line:
De Morgan’s rule
• A union of n subspaces can be represented with a set of
homogeneous polynomials of degree n
24. Fitting polynomials to data points
• Polynomials can be written linearly in terms of the vector of coefficients
by using polynomial embedding
Veronese map
• Coefficients of the polynomials can be computed from nullspace of
embedded data
– Solve using least squares
– N = #data points
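A minimal sketch of the fitting step, assuming the Veronese map of degree n simply lists all degree-n monomials of a point: the coefficient vector is then the right singular vector of the embedded data matrix with the smallest singular value. The helper names `veronese` and `fit_vanishing_polynomial` and the two-line data set are illustrative assumptions.

```python
import numpy as np
from itertools import combinations_with_replacement

def veronese(X, n):
    """Map each row of X (a point in R^K) to all of its degree-n monomials."""
    index_sets = list(combinations_with_replacement(range(X.shape[1]), n))
    V = np.stack([np.prod(X[:, list(idx)], axis=1) for idx in index_sets], axis=1)
    return V, index_sets

def fit_vanishing_polynomial(X, n):
    """Unit-norm coefficients c minimizing ||V c||: the least-squares null vector."""
    V, index_sets = veronese(X, n)
    c = np.linalg.svd(V)[2][-1]
    return c, index_sets

# Two lines through the origin in R^2: x1 = x2 and x1 + 2 x2 = 0.
t = np.linspace(-1.0, 1.0, 21)
X = np.vstack([np.outer(t, [1.0, 1.0]), np.outer(t, [2.0, -1.0])])

c, index_sets = fit_vanishing_polynomial(X, n=2)
print(index_sets)     # [(0, 0), (0, 1), (1, 1)], i.e. x1^2, x1*x2, x2^2
print(c / c[0])       # proportional to (x1 - x2)(x1 + 2 x2) = x1^2 + x1 x2 - 2 x2^2
```

The SVD is used rather than an explicit null-space solve so that the same code degrades gracefully to a least-squares fit when the data are noisy.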
25. Finding a basis for each subspace
• Case of hyperplanes:
– Only one polynomial
– Number of subspaces
– Basis vectors are the normal vectors
Polynomial Factorization (GPCA-PFA) [CVPR 2003]
• Find roots of polynomial of degree in one variable
• Solve linear systems in variables
• Solution obtained in closed form for
• Problems
– Computing roots may be sensitive to noise
– The estimated polynomial may not perfectly factor with noisy data
– Cannot be applied to subspaces of different dimensions
• Polynomials are estimated up to change of basis, hence they may not factor,
even with perfect data
26. Finding a basis for each subspace
Polynomial Differentiation (GPCA-PDA) [CVPR’04]
• To learn a mixture of subspaces we just need one positive
example per class
27. Choosing one point per subspace
• With noise and outliers
– Polynomials may not be a perfect union of subspaces
– Normals can be estimated correctly by choosing points optimally
• Distance to closest subspace without knowing
segmentation?
28. GPCA for hyperplane segmentation
• Coefficients of the polynomial can be computed from null
space of embedded data matrix
– Solve using least squares
– N = #data points
• Number of subspaces can be computed from the rank of
embedded data matrix
• Normal to the subspaces can be computed
from the derivatives of the polynomial
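The last step can be sketched as below, assuming the fitted polynomial is given by its coefficients over the degree-n monomials. The helper names `monomial_exponents`, `poly_gradient` and `normal_at` are hypothetical. The gradient of the polynomial, evaluated at a point lying on one hyperplane, is proportional to that hyperplane's normal.

```python
import numpy as np
from itertools import combinations_with_replacement

def monomial_exponents(K, n):
    """Exponent vectors (M x K) of all degree-n monomials in K variables."""
    exps = []
    for idx in combinations_with_replacement(range(K), n):
        e = np.zeros(K, dtype=int)
        for i in idx:
            e[i] += 1
        exps.append(e)
    return np.array(exps)

def poly_gradient(c, E, x):
    """Gradient at x of p(x) = sum_m c[m] * prod_k x[k] ** E[m, k]."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros(len(x))
    for k in range(len(x)):
        Ek = E.copy()
        # Differentiating x_k^e gives e * x_k^(e-1); terms with e = 0 vanish
        # because they are multiplied by E[:, k] = 0 below.
        Ek[:, k] = np.maximum(Ek[:, k] - 1, 0)
        grad[k] = np.sum(c * E[:, k] * np.prod(x ** Ek, axis=1))
    return grad

def normal_at(c, E, x):
    """Unit normal of the hyperplane through x, from the polynomial gradient."""
    g = poly_gradient(c, E, x)
    return g / np.linalg.norm(g)
```

As an example, p(x, y) = (x - y)(x + 2y) = x^2 + xy - 2y^2 has coefficients [1, 1, -2]; its gradient at (1, 1) is proportional to (1, -1) and at (2, -1) to (1, 2), the normals of the two lines.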
29. GPCA for subspaces of different dimensions
• There are multiple polynomials
fitting the data
• The derivative of each
polynomial gives a different
normal vector
• Can obtain a basis for the
subspace by applying PCA to
normal vectors
30. GPCA for subspaces of different dimensions
• Apply polynomial embedding to projected data
• Obtain multiple subspace model by polynomial fitting
– Solve to obtain
– Need to know number of subspaces
• Obtain bases & dimensions by polynomial differentiation
• Optimally choose one point per subspace using distance
31. An example
• Given data lying in the union
of the two subspaces
• We can write the union as
• Therefore, the union can be represented with the two
polynomials
32. An example
• Can compute polynomials from
• Can compute normals from
33. Dealing with high-dimensional data
• Minimum number of points
– K = dimension of ambient space
– n = number of subspaces
• In practice the dimension of each subspace ki is much smaller than K
– Number and dimension of the subspaces is preserved by a
linear projection onto a subspace of dimension
• Open problem: how to choose projection?
– PCA?
– Can remove outliers by robustly fitting the subspace
[Figure labels: Subspace 1, Subspace 2]
34. GPCA with spectral clustering
• Spectral clustering
– Build a similarity matrix between pairs of points
– Use eigenvectors to cluster data
• How to define a similarity for subspaces?
– Want points in the same subspace to be close
– Want points in different subspace to be far
• Use GPCA to get basis
• Distance: subspace angles
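Subspace angles can be computed from the singular values of the product of two orthonormal bases. The sketch below assumes each subspace is given by a basis matrix with independent columns; the similarity function is a hypothetical choice for illustration only, since the slide does not say which function of the angles is used.

```python
import numpy as np

def principal_angles(B1, B2):
    """Principal angles (radians, ascending) between span(B1) and span(B2)."""
    Q1, _ = np.linalg.qr(B1)   # orthonormal basis of the first subspace
    Q2, _ = np.linalg.qr(B2)   # orthonormal basis of the second subspace
    # Singular values of Q1^T Q2 are the cosines of the principal angles.
    s = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

def subspace_similarity(B1, B2):
    """Hypothetical similarity in (0, 1]: exp(-sum of squared sines)."""
    theta = principal_angles(B1, B2)
    return float(np.exp(-np.sum(np.sin(theta) ** 2)))
```

A similarity matrix built this way from the bases estimated at each data point could then be passed to any standard spectral clustering routine.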
35. Comparison of PFA, PDA, K-sub, EM
[Plot: error in the normals [degrees] versus noise level [%], from 0 to 5% noise, for PFA, K-sub, PDA, EM, PDA+K-sub, PDA+EM and PDA+K-sub+EM]
36. Dealing with outliers
• GPCA with perfect data
• GPCA with outliers
• GPCA fails because PCA fails ⇒ seek a robust estimate
of Null(L_n), where L_n = [ν_n(x_1), ..., ν_n(x_N)].
37. Three approaches to tackle outliers
• Probability-based: small-probability samples
– Probability plots: [Healy 1968, Cox 1968]
– PCs: [Rao 1964, Gnanadesikan & Kettenring 1972]
– M-estimators: [Huber 1981, Campbell 1980]
– Multivariate-trimming (MVT):
[Gnanadesikan & Kettenring 1972]
• Influence-based: large influence on model parameters
– Parameter difference with and without a sample:
[Hampel et al. 1986, Critchley 1985]
• Consensus-based: not consistent with models of high consensus.
– Hough: [Ballard 1981, Lowe 1999]
– RANSAC: [Fischler & Bolles 1981, Torr 1997]
– Least Median Estimate (LME):
[Rousseeuw 1984, Stewart 1999]
40. Robust GPCA
Comparison with RANSAC
• Accuracy
[Panels: (q) (2,2,1) in R^3, (r) (4,2,2,1) in R^5, (s) (5,5,5) in R^6]
• Speed
Table: Average time of RANSAC and RGPCA with 24% outliers.
Arrangement   (2,2,1) in R^3   (4,2,2,1) in R^5   (5,5,5) in R^6
RANSAC        44s              5.1min             3.4min
MVT           46s              23min              8min
Influence     3min             58min              146min
41. Summary
• GPCA: algorithm for clustering subspaces
– Deals with unknown and possibly different dimensions
– Deals with arbitrary intersections among the subspaces
• Our approach is based on
– Projecting data onto a low-dimensional subspace
– Fitting polynomials to projected subspaces
– Differentiating polynomials to obtain a basis
• Applications in image processing and computer vision
– Image segmentation: intensity and texture
– Image compression
– Face recognition under varying illumination
42. For more information,
Vision, Dynamics and Learning Lab
@
Johns Hopkins University
Thank You!
43. Generalized Principal Component Analysis
via Lossy Coding and Compression
Yi Ma
Image Formation & Processing Group, Beckman
Decision & Control Group, Coordinated Science
Lab.
Electrical & Computer Engineering Department
University of Illinois at Urbana-Champaign
45. MOTIVATION – Motion Segmentation in Computer Vision
Goal: Given a sequence of images of multiple moving objects, determine:
– 1. the number and types of motions (rigid-body, affine, linear, etc.)
2. the features that belong to the same motion.
The “chicken-and-egg” difficulty:
– Knowing the segmentation, estimating the motions is easy;
– Knowing the motions, segmenting the features is easy.
A Unified Algebraic Approach to 2D and 3D Motion Segmentation, [Vidal-Ma, ECCV’
46. MOTIVATION – Image Segmentation
Goal: segment an image into multiple regions with homogeneous texture.
[Figure labels: features; Computer; Human]
Difficulty: A mixture of models of different dimensions or
complexities.
Multiscale Hybrid Linear Models for Lossy Image Representation, [Hong-Wright-Ma, TIP’
47. MOTIVATION – Video Segmentation
Goal: segmenting a video sequence into segments with “stationary” dynamics
Model: different
segments as outputs
from different (linear)
dynamical systems:
Identification of Hybrid Linear Systems via Subspace Segmentation, [Huang-Wagner-Ma, C
48. MOTIVATION – Massive Multivariate Mixed Data
Face database Hyperspectral images Articulate motions
Hand written digits
Microarrays
49. SUBSPACE SEGMENTATION – Problem Formulation
Assumption: the data are noisy samples from an
arrangement of linear subspaces:
noise-free samples noisy samples samples with outliers
Difficulties:
– the dimensions of the subspaces can be different
– the data can be corrupted by noise or contaminated by outliers
– the number and dimensions of subspaces may be unknown
50. SUBSPACE SEGMENTATION – Statistical Approaches
Assume that the data
are i.i.d. samples from a mixture of
probabilistic distributions:
Solutions:
• Expectation Maximization (EM) for the maximum-likelihood estimate
[Dempster et. al.’77], e.g., Probabilistic PCA [Tipping-Bishop’99]:
• K-Means for a minimax-like estimate [Forgy’65, Jancey’66, MacQueen’67],
e.g., K-Subspaces [Ho and Kriegman’03]:
Essentially iterate between data segmentation and model estimation.
51. SUBSPACE SEGMENTATION – An Algebro-Geometric Approach
Idea: a union of linear subspaces is an
algebraic set -- the zero set of a set of
(homogeneous) polynomials:
Solution:
• Identify the set of polynomials of degree n that vanish on
• Gradients of the vanishing polynomials are normals to the
subspaces
Complexity exponential in the dimension and number of subspaces.
Generalized Principal Component Analysis, [Vidal-Ma-Sastry, IEEE Transactions PAMI’0
52. SUBSPACE SEGMENTATION – An Information-Theoretic Approach
Problem: If the number/dimension of subspaces is not given and the data are
corrupted by noise and outliers, how to determine the optimal subspaces that fit
the data?
Solutions: Model Selection Criteria?
– Minimum message length (MML) [Wallace-Boulton’68]
– Minimum description length (MDL) [Rissanen’78]
– Bayesian information criterion (BIC)
– Akaike information criterion (AIC) [Akaike’77]
– Geometric AIC [Kanatani’03], Robust AIC [Torr’98]
Key idea (MDL):
• a good balance between model complexity and data fidelity.
• minimize the length of codes that describe the model and the
data:
with a quantization error optimal for the model.
53. LOSSY DATA COMPRESSION
Questions:
– What is the “gain” or “loss” of segmenting or merging data?
– How does tolerance of error affect segmentation results?
Basic idea: whether the number of bits required to store “the
whole is more than the sum of its parts”?
54. LOSSY DATA COMPRESSION – Problem Formulation
– A coding scheme maps a set of vectors
to a sequence of bits, from which we can decode
The coding length is denoted as:
– Given a set of real-valued mixed data
the optimal segmentation
minimizes
the overall coding length:
where
55. LOSSY DATA COMPRESSION – Coding Length for Multivariate Data
Theorem.
Given with
is the number of bits needed to encode the data s.t.
.
A nearly optimal bound for even a small number
of vectors drawn from a subspace or a Gaussian
source.
Segmentation of Multivariate Mixed Data, [Ma-Derksen-Hong-Wright, PAMI’
56. LOSSY DATA COMPRESSION – Two Coding Schemes
Goal: code s.t. a mean squared error
Linear subspace Gaussian source
57. LOSSY DATA COMPRESSION – Properties of the Coding Length
1. Commutative Property:
For high-dimensional data, computing the coding length only needs
the kernel matrix:
2. Asymptotic Property:
At high SNR, this is the optimal rate distortion for a Gaussian source.
3. Invariant Property:
Harmonic Analysis is useful for data compression only when the data are
non-Gaussian or nonlinear ……… so is segmentation!
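The coding length and its commutative property can be sketched numerically. The formula used below is an assumption taken from the zero-mean form in the cited PAMI paper, L(V) = (N + K)/2 · log2 det(I + K/(ε² N) · V Vᵀ), with K the ambient dimension and N the number of vectors; the slide's own equation is not recoverable. Function names are hypothetical.

```python
import numpy as np

def coding_length(V, eps):
    """Bits needed to code the columns of V (K x N) up to distortion eps^2.

    Assumed zero-mean form: (N + K)/2 * log2 det(I + K/(eps^2 N) V V^T).
    """
    K, N = V.shape
    M = np.eye(K) + (K / (eps ** 2 * N)) * (V @ V.T)
    _, logdet = np.linalg.slogdet(M)          # natural log of the determinant
    return 0.5 * (N + K) * logdet / np.log(2.0)

def coding_length_kernel(V, eps):
    """Same quantity computed from the N x N kernel matrix V^T V."""
    K, N = V.shape
    M = np.eye(N) + (K / (eps ** 2 * N)) * (V.T @ V)
    _, logdet = np.linalg.slogdet(M)
    return 0.5 * (N + K) * logdet / np.log(2.0)
```

The two functions agree because det(I + a V Vᵀ) = det(I + a Vᵀ V), so for high-dimensional data with few samples the smaller N x N matrix can be used.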
59. LOSSY DATA COMPRESSION – Probabilistic Segmentation?
Assign the ith point to the jth group with probability
Theorem. The expected coding length of the segmented data
is a concave function in Π over the domain of a convex polytope.
Minima are reached at the vertexes of the
polytope -- no probabilistic
segmentation!
60. LOSSY DATA COMPRESSION – Segmentation & Channel Capacity
A MIMO additive white Gaussian noise (AWGN) channel
has the capacity:
If allowing probabilistic grouping of transmitters, the expected
capacity
is a concave function in Π over a convex polytope.
Maximizing such a capacity is a convex
problem.
On Coding and Segmentation of Multivariate Mixed Data, [Ma-Derksen-Hong-Wright, PAMI
61. LOSSY DATA COMPRESSION – A Greedy (Agglomerative) Algorithm
Objective: minimizing the overall coding length
Input: “Bottom-up” merge
while true do
    choose two sets such that is minimal
    if then
    else break
    endif
end
Output:
Segmentation of Multivariate Mixed Data via Lossy Coding and Compression, [Ma-Derksen-Hong-Wright, PAMI’07]
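The bottom-up merge can be sketched as follows. This is a naive illustration under stated assumptions, not the authors' implementation: it uses the zero-mean coding length (N + K)/2 · log2 det(I + K/(ε² N) · V Vᵀ) from the cited paper, charges each group of size n_i an extra n_i · (−log2(n_i / N)) bits for the membership labels, and recomputes every pairwise merge cost in each pass. All function names are hypothetical.

```python
import numpy as np

def coding_length(V, eps):
    """Assumed zero-mean lossy coding length of the columns of V (K x N)."""
    K, N = V.shape
    M = np.eye(K) + (K / (eps ** 2 * N)) * (V @ V.T)
    _, logdet = np.linalg.slogdet(M)
    return 0.5 * (N + K) * logdet / np.log(2.0)

def group_cost(X, idx, eps):
    """Bits for one group: its data plus the labels of its members."""
    N = X.shape[1]
    return coding_length(X[:, idx], eps) - len(idx) * np.log2(len(idx) / N)

def segmented_length(X, groups, eps):
    """Overall coding length of a segmentation (list of index lists)."""
    return sum(group_cost(X, g, eps) for g in groups)

def greedy_merge(X, eps):
    """Start from singletons; merge the pair that saves the most bits."""
    groups = [[i] for i in range(X.shape[1])]
    while len(groups) > 1:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                delta = (group_cost(X, groups[a] + groups[b], eps)
                         - group_cost(X, groups[a], eps)
                         - group_cost(X, groups[b], eps))
                if best is None or delta < best[0]:
                    best = (delta, a, b)
        delta, a, b = best
        if delta >= 0:          # no merge shortens the code: stop
            break
        merged = groups[a] + groups[b]
        groups = [g for k, g in enumerate(groups) if k not in (a, b)]
        groups.append(merged)
    return groups
```

Because a merge is accepted only when it strictly reduces the total, the overall coding length never increases from one pass to the next. The number of groups is not an input; it falls out of the choice of ε.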
62. SIMULATIONS – Mixture of Almost Degenerate Gaussians
Noisy samples from two lines and one plane in R^3
Given Data Segmentation Results
ε0 = 0.01
ε0 = 0.08
63. SIMULATIONS – “Phase Transition”
[Plots: #group vs. distortion and rate vs. distortion, with the regimes labeled “steam”, “water” and “ice cubes”; ε0 = 0.08 marked]
Stability: the same segmentation for ε across 3 magnitudes!
64. SIMULATIONS – Comparison with EM
100 x d uniformly distributed random samples from each subspace, corrupted
with 4% noise. Classification rate averaged over 25 trials for each case.
65. SIMULATIONS – Comparison with EM
Segmenting three degenerate or non-degenerate Gaussian clusters for 50 trials
66. SIMULATIONS – Robustness with Outliers
35.8% outliers 45.6%
71.5% 73.6%
67. SIMULATIONS – Affine Subspaces with Outliers
35.8% outliers 45.6%
66.2% 69.1%
69. SIMULATIONS – Summary
– The minimum coding length objective automatically addresses the
model selection issue: the optimal solution is very stable and robust.
– The segmentation/merging is physically meaningful (measured in bits).
The results resemble phase transition in statistical physics.
– The greedy algorithm is scalable (polynomial in both K and N) and
converges well when ε is not too small w.r.t. the sample density.
70. Clustering from a Classification Perspective
Assumption: The training data
are drawn from a distribution
Goal: Construct a classifier
such that the misclassification
error
reaches minimum.
Solution: Knowing the distributions and
, the optimal classifier is the maximum a posteriori (MAP)
classifier:
Difficulties: How to learn the two distributions from samples?
(parametric, non-parametric, model selection, high-dimension, outliers…)
71. MINIMUM INCREMENTAL CODING LENGTH – Problem Formulation
Ideas: Using the lossy coding length
as a surrogate for the Shannon lossless coding length w.r.t. true
distributions.
Additional bits need to encode the test
sample with the jth training set is
Classification Criterion: Minimum Incremental Coding Length
(MICL)
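The criterion can be sketched as follows, under stated assumptions: the extra bits for a test sample x under class j are taken to be L(X_j ∪ {x}) − L(X_j) plus −log2 of the class's share of the training set, and L is the zero-mean lossy coding length (N + K)/2 · log2 det(I + K/(ε² N) · V Vᵀ). The cited paper's exact form may differ (for example in how the class mean is handled), and the function names are hypothetical.

```python
import numpy as np

def coding_length(V, eps):
    """Assumed zero-mean lossy coding length of the columns of V (K x N)."""
    K, N = V.shape
    M = np.eye(K) + (K / (eps ** 2 * N)) * (V @ V.T)
    _, logdet = np.linalg.slogdet(M)
    return 0.5 * (N + K) * logdet / np.log(2.0)

def micl_classify(x, classes, eps):
    """Assign x to the class whose training set needs the fewest extra bits.

    classes: list of K x N_j arrays, one per class.
    Returns (index of the chosen class, list of incremental lengths).
    """
    n_total = sum(Xj.shape[1] for Xj in classes)
    costs = []
    for Xj in classes:
        with_x = np.column_stack([Xj, x])              # append x as a column
        extra = coding_length(with_x, eps) - coding_length(Xj, eps)
        label_bits = -np.log2(Xj.shape[1] / n_total)   # bits to code the label
        costs.append(extra + label_bits)
    return int(np.argmin(costs)), costs
```

For two classes sampled along the x-axis and the y-axis, a test point on the x-axis adds far fewer bits to the first class, because it does not open up a new direction in that class's data.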
72. MICL (“Michael”) – Asymptotic Properties
Theorem: As the number of samples goes to infinity, the MICL
criterion converges with probability one to the following criterion:
where
?
is the “number of effective
parameters” of the j-th model (class).
Theorem: The MICL classifier converges to the above asymptotic form
at the rate of for some constant .
Minimum Incremental Coding Length (MICL), [Wright and Ma et al., NIPS’07]
73. SIMULATIONS – Interpolation and Extrapolation via MICL
MICL
SVM
k-NN
74. SIMULATIONS – Improvement over MAP and RDA [Friedman1989]
Two Gaussians in
R2
isotropic (left)
anisotropic
(right)
(500 trials)
Three Gaussians in
Rn
dim = n
dim = n/2
dim = 1
(500 trials)
75. SIMULATIONS – Local and Kernel MICL
Local MICL (LMICL): Applying MICL locally to the k-nearest
neighbors of the test sample (frequentist + Bayesian).
Kernel MICL (KMICL): Incorporating MICL with a nonlinear kernel
naturally through the identity (“kernelized” RDA):
LMICL k- KMICL-RBF SVM-RBF
NN
76. CONCLUSIONS
Assumptions: Data are in a high-dimensional space but have
low-dimensional structures (subspaces or submanifolds).
Compression => Clustering & Classification:
– Minimum (incremental) coding length subject to distortion.
– Asymptotically optimal clustering and classification.
– Greedy clustering algorithm (bottom-up, agglomerative).
– MICL corroborates MAP, RDA, k-NN, and kernel methods.
Applications (Next Lectures):
– Video segmentation, motion segmentation (Vidal)
– Image representation & segmentation (Ma)
– Others: microarray clustering, recognition of faces and
handwritten digits (Ma)
77. FUTURE DIRECTIONS
Theory
– More complex structures: manifolds, systems, random
fields…
– Regularization (ridge, lasso, banding etc.)
– Sparse representation and subspace arrangements
Computation
– Global optimality (random techniques, convex
optimization…)
– Scalability: random sampling, approximation…
Future Application Domains
– Image/video/audio classification, indexing, and retrieval
– Hyper-spectral images and videos
– Biomedical images, microarrays
– Autonomous navigation, surveillance, and 3D mapping
– Identification of hybrid linear/nonlinear systems
78. REFERENCES & ACKNOWLEDGMENT
References:
– Segmentation of Multivariate Mixed Data via Lossy Data
Compression, Yi Ma, Harm Derksen, Wei Hong, John Wright,
PAMI, 2007.
– Classification via Minimum Incremental Coding Length (MICL),
John Wright et al., NIPS, 2007.
– Website: http://perception.csl.uiuc.edu/coding/home.htm
People:
– John Wright, PhD Student, ECE Department, University of Illinois
– Prof. Harm Derksen, Mathematics Department, University of
Michigan
– Allen Yang (UC Berkeley) and Wei Hong (Texas Instruments R&D)
– Zhoucheng Lin and Harry Shum, Microsoft Research Asia, China
Funding:
– ONR YIP N00014-05-1-0633
– NSF CAREER IIS-0347456, CCF-TF-0514955, CRS-EHS-0509151
79. 11/2003
“The whole is more than the sum of its
parts.”
--
Aristotle
Questions, please?
Yi Ma, CVPR 2008
80. Part II
Applications of GPCA in Computer Vision
René Vidal
Center for Imaging Science
Institute for Computational Medicine
Johns Hopkins University
81. Part II: Applications in computer vision
• Applications to motion & video segmentation (10.30-11.15)
– 2-D and 3-D motion segmentation
– Temporal video segmentation
– Dynamic texture segmentation
• Applications to image representation and segmentation
(11.15-12.00)
– Multi-scale hybrid linear models for sparse
image representation
– Hybrid linear models for image segmentation
82. Applications to motion and video
segmentation
René Vidal
Center for Imaging Science
Institute for Computational Medicine
Johns Hopkins University
83. 3-D motion segmentation problem
• Given a set of point correspondences in multiple views, determine
– Number of motion models
– Motion model: affine, homography, fundamental matrix, trifocal tensor
– Segmentation: model to which each pixel belongs
• Mathematics of the problem depends on
– Number of frames (2, 3, multiple)
– Projection model (affine, perspective)
– Motion model (affine, translational, homography, fundamental matrix, etc.)
– 3-D structure (planar or not)
84. Taxonomy of problems
• 2-D Layered representation
– Probabilistic approaches: Jepson-Black’93, Ayer-Sawhney’95, Darrel-Pentland’95, Weiss-
Adelson’96, Weiss’97, Torr-Szeliski-Anandan’99
– Variational approaches: Cremers-Soatto ICCV’03
– Initialization: Wang-Adelson’94, Irani-Peleg’92, Shi-Malik‘98, Vidal-Singaraju’05-’06
• Multiple rigid motions in two perspective views
– Probabilistic approaches: Feng-Perona’98, Torr’98
– Particular cases: Izawa-Mase’92, Shashua-Levin’01, Sturm’02,
– Multibody fundamental matrix: Wolf-Shashua CVPR’01, Vidal et al. ECCV’02, CVPR’03, IJCV’06
– Motions of different types: Vidal-Ma-ECCV’04, Rao-Ma-ICCV’05
• Multiple rigid motions in three perspective views
– Multibody trifocal tensor: Hartley-Vidal-CVPR’04
• Multiple rigid motions in multiple affine views
– Factorization-based: Costeira-Kanade’98, Gear’98, Wu et al.’01, Kanatani’ et al.’01-02-04
– Algebraic: Yan-Pollefeys-ECCV’06, Vidal-Hartley-CVPR’04
• Multiple rigid motions in multiple perspective views
– Schindler et al. ECCV’06, Li et al. CVPR’07
85. A unified approach to motion segmentation
• Estimation of multiple motion models equivalent to
estimation of one multibody motion model
chicken-and-egg
– Eliminate feature clustering: multiplication
– Estimate a single multibody motion model: polynomial fitting
– Segment multibody motion model: polynomial differentiation
86. A unified approach to motion segmentation
• Applies to most motion models in computer vision
• All motion models can be segmented algebraically by
– Fitting multibody model: real or complex polynomial to all data
– Fitting individual model: differentiate polynomial at a data point
87. Segmentation of 3-D translational motions
• Multiple epipoles (translation)
• Epipolar constraint: plane in
– Plane normal = epipoles
– Data = epipolar lines
• Multibody epipolar constraint
• Epipoles are derivatives of at epipolar lines
89. Single-body factorization
Structure = 3D surface
• Affine camera model
– p = point
– f = frame
Motion = camera position and orientation
• Motion of one rigid-body lives in a 4-D subspace
(Boult and Brown ’91,
Tomasi and Kanade ‘92)
– P = #points
– F = #frames
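The rank constraint can be checked numerically. The sketch below assumes each affine camera is a 2 x 4 matrix acting on homogeneous 3-D points, so that stacking the image coordinates of P points over F frames gives a 2F x P matrix; the random cameras and points are made up for illustration and `trajectory_matrix` is a hypothetical helper.

```python
import numpy as np

def trajectory_matrix(X, cameras):
    """Stack the affine projections of 3-D points into a 2F x P matrix.

    X: 3 x P array of points on one rigid body.
    cameras: list of F affine camera matrices, each 2 x 4.
    """
    Xh = np.vstack([X, np.ones((1, X.shape[1]))])   # homogeneous coordinates
    return np.vstack([A @ Xh for A in cameras])

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 20))                            # P = 20 points
cameras = [rng.standard_normal((2, 4)) for _ in range(5)]   # F = 5 frames
W = trajectory_matrix(X, cameras)
rank = np.linalg.matrix_rank(W)   # at most 4: W = (2F x 4) times (4 x P)
```

The rank is bounded by 4 because W factors into a 2F x 4 motion matrix times a 4 x P structure matrix, which is the factorization the slide refers to.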
90. Multi-body factorization
• Given n rigid motions
• Motion segmentation is obtained from
– Leading singular vector of (Boult and Brown ’91)
– Shape interaction matrix (Costeira & Kanade ’95, Gear ’94)
– Number of motions (if fully-dimensional)
• Motion subspaces need to be independent (Kanatani ’01)
91. Multi-body factorization
• Sensitive to noise
– Kanatani (ICCV ’01): use model selection to scale Q
– Wu et al. (CVPR’01): project data onto subspaces and iterate
• Fails with partially dependent motions
– Zelnik-Manor and Irani (CVPR’03)
• Build similarity matrix from normalized Q
• Apply spectral clustering to similarity matrix
– Yan and Pollefeys (ECCV’06)
• Local subspace estimation + spectral clustering
– Kanatani (ECCV’04)
• Assume degeneracy is known: pure translation in the image
• Segment data by multi-stage optimization (multiple EM problems)
• Cannot handle missing data
– Gruber and Weiss (CVPR’04)
• Expectation Maximization
92. PowerFactorization+GPCA
• A motion segmentation algorithm that
– Is provably correct with perfect data
– Handles both independent and degenerate motions
– Handles both complete and incomplete data
• Project trajectories onto a 5-D subspace of
– Complete data: PCA or SVD
– Incomplete data: PowerFactorization
• Cluster projected subspaces using GPCA
– Handles both independent and degenerate motions
– Non-iterative: can be used to initialize EM
93. Projection onto a 5-D subspace
• Motion of one rigid-body lives in a 4-D subspace of
• By projecting onto a 5-D subspace of
– Number and dimensions of subspaces are preserved
– Motion segmentation is equivalent to clustering
subspaces of dimension 2, 3 or 4 in
– Minimum #frames = 3
(CK needs a minimum of 2n frames for n motions)
• What projection to use?
– PCA: 5 principal components
– RPCA: with outliers
– Can remove outliers by robustly fitting the 5-D subspace using
Robust SVD (DeLaTorre-Black)
[Figure labels: Motion 1, Motion 2]
94. Projection onto a 5-D subspace
PowerFactorization algorithm: Given , factor it as
• Complete data
– Given A solve for B
– Orthonormalize B
– Given B solve for A
– Iterate
– Converges to rank-r approximation with rate
• Incomplete data
– Linear problem
– It diverges in some cases
– Works well with up to 30% of missing data
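The alternation for complete data can be sketched as below. This is a minimal illustration assuming the factorization W ≈ A Bᵀ with A of size m x r and B of size n x r; the function name, the random initialization and the fixed iteration count are assumptions. The incomplete-data variant, which would restrict each least-squares step to the observed entries, is not shown.

```python
import numpy as np

def power_factorization(W, r, iters=20, seed=0):
    """Rank-r factorization W ~ A @ B.T by alternating least squares."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((W.shape[0], r))     # random initial guess
    for _ in range(iters):
        # Given A, solve min_B ||W - A B^T||_F (a linear least-squares problem).
        B = np.linalg.lstsq(A, W, rcond=None)[0].T
        # Orthonormalize the columns of B.
        B, _ = np.linalg.qr(B)
        # Given orthonormal B, the least-squares solution for A is W B.
        A = W @ B
    return A, B
```

When W has exact rank r, a generic starting point already recovers W after the first pass; for a general matrix the iteration behaves like subspace (power) iteration on the row space.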
96. Hopkins 155 motion segmentation database
• Collected 155 sequences
– 120 with 2 motions
– 35 with 3 motions
• Types of sequences
– Checkerboard sequences: mostly full
dimensional and independent motions
– Traffic sequences: mostly degenerate (linear,
planar) and partially dependent motions
– Articulated sequences: mostly full dimensional
and partially dependent motions
• Point correspondences
– In few cases, provided by Kanatani & Pollefeys
– In most cases, extracted semi-automatically
with OpenCV
99. Experimental results: missing data sequences
• There is no clear correlation between amount of missing data and
percentage of misclassification
• This could be because convergence of PF depends more on “where”
missing data is located than on “how much” missing data there is
100. Conclusions
• For two motions
– Algebraic methods (GPCA and LSA) are more accurate than
statistical methods (RANSAC and MSL)
– LSA performs better on full and independent sequences, while
GPCA performs better on degenerate and partially dependent
– LSA is sensitive to dimension of projection: d=4n better than d=5
– MSL is very slow, RANSAC and GPCA are fast
• For three motions
– GPCA is not very accurate, but is very fast
– MSL is the most accurate, but it is very slow
– LSA is almost as accurate as MSL and almost as fast as GPCA
101. Segmentation of Dynamic Textures
René Vidal
Center for Imaging Science
Institute for Computational Medicine
Johns Hopkins University
102. Modeling a dynamic texture: fixed boundary
• Examples of dynamic textures:
• Model temporal evolution as the output of a linear
dynamical system (LDS): Soatto et al. ‘01
z_{t+1} = A z_t + v_t (dynamics)
y_t = C z_t + w_t (images y_t, appearance C)
103. Segmenting non-moving dynamic textures
• One dynamic texture lives in the observability subspace
z_{t+1} = A z_t + v_t
y_t = C z_t + w_t
• Multiple textures live in multiple subspaces
water
steam
• Cluster the data using GPCA
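The subspace claim can be illustrated on a toy noise-free system. The sketch assumes v_t = w_t = 0, a 3-dimensional state and 50 "pixels"; the matrices below are made up for illustration and `simulate_lds` is a hypothetical helper. Every output y_t = C z_t then lies in the column space of C, so the stacked outputs span a subspace whose dimension is at most the state dimension.

```python
import numpy as np

def simulate_lds(A, C, z0, T):
    """Noise-free outputs y_t = C z_t of z_{t+1} = A z_t, stacked as columns."""
    z = np.asarray(z0, dtype=float)
    Y = []
    for _ in range(T):
        Y.append(C @ z)
        z = A @ z
    return np.column_stack(Y)

theta = 0.3
A = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           0.9]])   # toy dynamics
C = np.random.default_rng(0).standard_normal((50, 3))  # toy appearance
Y = simulate_lds(A, C, [1.0, 0.0, 1.0], 40)            # 50 pixels x 40 frames
rank = np.linalg.matrix_rank(Y)                        # bounded by the state dimension
```

Pixels (or frames) from textures with different dynamics and appearance give data in different low-dimensional subspaces, which is what makes subspace clustering applicable.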
106. Level-set intensity-based segmentation
• Chan-Vese energy functional
• Implicit methods
– Represent C as the zero level set
of an implicit function φ, i.e.
C = {(x, y) : φ(x, y) = 0}
• Solution
– The solution to the gradient descent algorithm for φ is given by
– c1 and c2 are the mean intensities inside and outside the contour C.
107. Dynamics & intensity-based energy
• We represent the intensities of the pixels in the images as
the output of a mixture of AR models of order p
• We propose the following spatial-temporal extension of the
Chan-Vese energy functional
where
108. Variational segmentation of dynamic textures
• Given the ARX parameters, we can solve for the implicit
function by solving the PDE
• Given the implicit function , we can solve for the ARX
parameters of the jth region by solving the linear system
109. Variational segmentation of dynamic textures
• Fixed boundary segmentation results and comparison
Ocean-smoke Ocean-dynamics Ocean-appearance
110. Variational segmentation of dynamic textures
• Moving boundary segmentation results and comparison
Ocean-fire
112. Temporal video segmentation
• Segmenting N=30 frames of a
sequence containing n=3
scenes
– Host
– Guest
– Both
• Image intensities are output of linear system
x_{t+1} = A x_t + v_t (dynamics)
y_t = C x_t + w_t (images y_t, appearance C)
• Apply GPCA to fit n=3 observability subspaces
113. Temporal video segmentation
• Segmenting N=60 frames of a
sequence containing n=3
scenes
– Burning wheel
– Burnt car with people
– Burning car
• Image intensities are output of linear system
x_{t+1} = A x_t + v_t (dynamics)
y_t = C x_t + w_t (images y_t, appearance C)
• Apply GPCA to fit n=3 observability subspaces
114. Conclusions
• Many problems in computer vision can be posed as subspace
clustering problems
– Temporal video segmentation
– 2-D and 3-D motion segmentation
– Dynamic texture segmentation
– Nonrigid motion segmentation
• These problems can be solved using GPCA: an algorithm for clustering
subspaces
– Deals with unknown and possibly different dimensions
– Deals with arbitrary intersections among the subspaces
• GPCA is based on
– Projecting data onto a low-dimensional subspace
– Recursively fitting polynomials to projected subspaces
– Differentiating polynomials to obtain a basis
115. For more information,
Vision, Dynamics and Learning Lab
@
Johns Hopkins University
Thank You!
116. Generalized Principal Component Analysis
for Image Representation & Segmentation
Yi Ma
Control & Decision, Coordinated Science Laboratory
Image Formation & Processing Group, Beckman
Department of Electrical & Computer Engineering
University of Illinois at Urbana-Champaign
117. INTRODUCTION
GPCA FOR LOSSY IMAGE REPRESENTATION
IMAGE SEGMENTATION VIA LOSSY COMPRESSION
OTHER APPLICATIONS
CONCLUSIONS AND FUTURE DIRECTIONS
118. Introduction – Image Representation via Linear Transformations
better
representations?
pixel-based representation
three matrixes of RGB-values
a more compact
linear transformation representation
120. Introduction
Adaptive Bases (optimal if imagery data are uni-modal)
- Karhunen-Loeve transform (KLT), also known as PCA (Pearson’1901,
Hotelling’33, Jolliffe’86)
stack
adaptive bases
121. Introduction – Principal Component Analysis (PCA)
Dimensionality Reduction
Find a low-dimensional representation (model) for high-dimensional data.
Principal Component Analysis (Pearson’1901, Hotelling’1933, Eckart &
Young’1936) or Karhunen-Loeve transform (KLT).
Basis for S SVD
Variations of PCA
– Nonlinear Kernel PCA (Scholkopf-Smola-Muller’98)
– Probabilistic PCA (Tipping-Bishop’99, Collins et.al’01)
– Higher-Order SVD (HOSVD) (Tucker’66, Davis’02)
– Independent Component Analysis (Hyvarinen-Karhunen-Oja’01)
122. Hybrid Linear Models – Multi-Modal Characteristics
Distribution of the first three principal components of
the Baboon image: A clear multi-modal distribution
124. Hybrid Linear Models – Versus Linear Models
A single linear model
Linear
stack
Hybrid linear models
Hybrid linear
stack
125. Hybrid Linear Models – Characteristics of Natural Images
Multivariate Hybrid Hierarchical High-dimensional
1D 2D (multi-modal) (multi-scale) (vector-valued)
Fourier
(DCT) X X
Wavelets X
X
Curvelets X
Random fields X X X
PCA/KLT X X X
VQ X X X X
Hybrid linear X X X X X
We need a new & simple paradigm to effectively account for all
these characteristics simultaneously.
126. Hybrid Linear Models – Subspace Estimation and Segmentation
Hybrid Linear Models (or Subspace
Arrangements)
– the number of subspaces is
unknown
– the dimensions of the
subspaces are unknown
– the basis of the subspaces are
unknown
– the segmentation of the data
points is unknown
“Chicken-and-Egg” Coupling
– Given segmentation, estimate subspaces
– Given subspaces, segment the data
128. Hybrid Linear Models – Effective Dimension
Model Selection (for Noisy Data)
Model complexity;
Data fidelity;
Number of
subspaces
Total Dimension Number of
number of of each points in each
points subspace subspace
Model selection criterion: minimizing effective dimension
subject to a given error tolerance (or PSNR)
131. Hybrid Linear Models – Lossy Image Representation (Baboon)
GPCA
Original PCA (8x8)
DCT (JPEG)
Haar
Wavelet GPCA (8x8)
132. Multi-Scale Implementation – Algorithm Diagram
Diagram for a level-3 implementation of hybrid linear models
for image representation
Multi-Scale Hybrid Linear Models for Lossy Image Representation, [Hong-Wright-Ma, TIP
133. Multi-Scale Implementation – The Baboon Image
The Baboon image
downsample
by two twice
segmentation of
2 by 2 blocks
134. Multi-Scale Implementation – Comparison with Other Methods
The Baboon image
135. Multi-Scale Implementation – Image Approximation
Comparison with level-3 wavelet (7.5% coefficients)
Level-3 bior-4.4 wavelets Level-3 hybrid linear model
PSNR=23.94 PSNR=24.64
136. Multi-Scale Implementation – Block Size Effect
The Baboon image
Some problems with the multi-scale hybrid linear model:
1. has minor block effect;
2. is computationally more costly (than Fourier, wavelets, PCA);
3. does not fully exploit spatial smoothness as wavelets.
137. Multi-Scale Implementation – The Wavelet Domain
The Baboon image HL
LH HH
segmentation
at each scale
138. Multi-Scale Implementation – Wavelets v.s. Hybrid Linear Wavelets
The Baboon image
Advantages of the hybrid linear model in wavelet domain:
1. eliminates block effect;
2. is computationally less costly (than in the spatial domain);
3. achieves higher PSNR.
139. Multi-Scale Implementation – Visual Comparison
Comparison among several models (7.5% coefficients)
Original Wavelets
Image PSNR=23.94
Hybrid model Hybrid model
in spatial in wavelet
domain domain
PSNR=24.64 PSNR=24.88
140. Image Segmentation – via Lossy Data Compression
stack
141. APPLICATIONS – Texture-Based Image Segmentation
Naïve approach:
– Take a 7x7 Gaussian window around every
pixel.
– Stack these windows as vectors.
– Cluster the vectors using our algorithm.
A few results:
142. APPLICATIONS – Distribution of Texture Features
Question: why does such a simple algorithm work at all?
Answer: Compression (MDL/MCL) is well suited to mid-level texture
segmentation.
Using a single representation (e.g. windows, filterbank responses) for textures
of different complexity ⇒ redundancy and degeneracy, which can be exploited for
clustering / compression.
Above: singular values of feature vectors from two different
segments of the image at left.
143. APPLICATIONS – Compression-based Texture Merging (CTM)
Problem with the naïve approach:
Strong edges, segment boundaries
Solution:
Low-level, edge-preserving over-segmentation into small homogeneous
regions.
Simple features: stacked Gaussian windows (7x7 in our experiments).
Merge adjacent regions to minimize coding length (“compress” the features).
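The merge criterion can be sketched for a single pair of regions. This sketch assumes the lossy coding length from the cited Ma-Derksen-Hong-Wright paper for zero-mean data, L(V) = (m+n)/2 · log2 det(I + n/(ε²m) V Vᵀ), plus a label cost of −log2(m_i/m) bits per sample. The function names are hypothetical, and region adjacency and the over-segmentation step are not modeled.

```python
import numpy as np

def coding_length(V, eps):
    """Approximate bits needed to code the columns of V (n x m) up to distortion eps."""
    n, m = V.shape
    gram = np.eye(n) + (n / (eps ** 2 * m)) * (V @ V.T)
    _, logdet = np.linalg.slogdet(gram)  # natural-log determinant
    return 0.5 * (m + n) * logdet / np.log(2.0)

def segmented_length(groups, eps):
    """Total bits for coding each group separately, plus the group labels."""
    m_total = sum(g.shape[1] for g in groups)
    total = 0.0
    for g in groups:
        m_i = g.shape[1]
        total += coding_length(g, eps) + m_i * (-np.log2(m_i / m_total))
    return total

def should_merge(A, B, eps):
    """True if coding A and B together is shorter than coding them apart."""
    merged = np.hstack([A, B])
    return segmented_length([merged], eps) < segmented_length([A, B], eps)
```

The change in label cost when two groups merge does not depend on how many other samples exist, so comparing the pair in isolation gives the same decision as comparing full segmentations.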
144. APPLICATIONS – Hierarchical Image Segmentation via CTM
ε = 0.1 ε = 0.2 ε = 0.4
Lossy coding with varying distortion ε => hierarchy of
segmentations
146. APPLICATIONS – CTM: Quantitative Evaluation and Comparison
Berkeley Image Segmentation Database
PRI: Probabilistic Rand Index [Pantofaru 2005]
VoI: Variation of Information [Meila 2005]
GCE: Global Consistency Error [Martin 2001]
BDE: Boundary Displacement Error [Freixenet 2002]
Unsupervised Segmentation of Natural Images via Lossy Data Compression, CVIU, 200
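Two of the listed metrics can be sketched directly from label maps. This is an illustration only, assuming the standard definitions: PRI as the mean Rand index against several ground-truth segmentations, and VoI = 2H(A,B) − H(A) − H(B), here in bits (the log base is an assumption). The function names are hypothetical; GCE and BDE are not shown.

```python
import numpy as np

def contingency(a, b):
    """Count matrix: entry (i, j) is the number of pixels with label i in a and j in b."""
    a = np.asarray(a).ravel()
    b = np.asarray(b).ravel()
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(table, (ai, bi), 1)
    return table

def rand_index(a, b):
    """Fraction of pixel pairs on which the two segmentations agree."""
    n_ij = contingency(a, b)
    n = n_ij.sum()
    pairs = lambda x: (x * (x - 1) / 2.0).sum()
    total = n * (n - 1) / 2.0
    same_both = pairs(n_ij)
    same_a = pairs(n_ij.sum(axis=1))
    same_b = pairs(n_ij.sum(axis=0))
    return (total + 2 * same_both - same_a - same_b) / total

def probabilistic_rand_index(seg, ground_truths):
    """Mean Rand index of seg against each ground-truth labeling."""
    return float(np.mean([rand_index(seg, gt) for gt in ground_truths]))

def variation_of_information(a, b):
    """VoI in bits: 2 H(A,B) - H(A) - H(B)."""
    p = contingency(a, b)
    p = p / p.sum()
    def entropy(q):
        q = q[q > 0]
        return -(q * np.log2(q)).sum()
    return 2 * entropy(p.ravel()) - entropy(p.sum(axis=1)) - entropy(p.sum(axis=0))
```

Both functions ignore label names, so a segmentation and a relabeled copy of it score a Rand index of 1 and a VoI of 0.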
147. Other Applications: Multiple Motion Segmentation (on Hopkins155)
Two Motions: MSL 4.14%, LSA 3.45%, ALC 2.40% (works with up to 25% outliers).
Three Motions: MSL 8.32%, LSA 9.73%, ALC 6.26%.
Shankar Rao, Roberto Tron, Rene Vidal, and Yi Ma, to appear in CVPR’08
148. Other Applications – Clustering of Microarray Data
Segmentation of Multivariate Mixed Data, [Ma-Derksen-Hong-Wright, PAMI’
149. Other Applications – Clustering of Microarray Data
Segmentation of Multivariate Mixed Data, [Ma-Derksen-Hong-Wright, PAMI’
150. Other Applications – Supervised Classification
Premises: Data lie on an arrangement of subspaces.
Unsupervised Clustering
– Generalized PCA
Supervised Classification
– Sparse Representation
151. Other Applications – Robust Face Recognition
Robust Face Recognition via Sparse Representation, to appear in PAMI 2008
152. Other Applications: Robust Motion Segmentation (on Hopkins155)
Dealing with incomplete or mistracked features, with 80% of the dataset
corrupted!
Shankar Rao, Roberto Tron, Rene Vidal, and Yi Ma, to appear in CVPR’08
153. Three Measures of Sparsity: Bits, L0-Norm, and L1-Norm
Reason: High-dimensional data, like images, do have compact,
compressible, sparse structures, in terms of their geometry,
statistics, and semantics.
154. Conclusions
Most imagery data are high-dimensional, statistically or
geometrically heterogeneous, and have multi-scale
structures.
Imagery data require hybrid models that can adaptively
represent different subsets of the data with different
(sparse) linear models.
Mathematically, it is possible to estimate and segment
hybrid (linear) models non-iteratively. GPCA offers one such
method.
Hybrid models lead to new paradigms, new principles, and
new applications for image representation, compression,
and segmentation.
155. Future Directions
Mathematical Theory
– Subspace arrangements (algebraic properties).
– Extension of GPCA to more complex algebraic varieties (e.g.,
hybrid multilinear, high-order tensors).
– Representation & approximation of vector-valued functions.
Computation & Algorithm Development
– Efficiency, noise sensitivity, outlier elimination.
– Other ways to combine with wavelets and curvelets.
Applications to Other Data
– Medical imaging (ultra-sonic, MRI, diffusion tensor…)
– Satellite hyper-spectral imaging.
– Audio, video, faces, and digits.
– Sensor networks (location, temperature, pressure, RFID…)
– Bioinformatics (gene expression data…)
156. Acknowledgement
People
– Wei Hong, Allen Yang, John Wright, University of Illinois
– Rene Vidal of Biomedical Engineering Dept., Johns Hopkins
University
– Kun Huang of Biomedical & Informatics Science Dept., Ohio State
University
Funding
– Research Board, University of Illinois at Urbana-Champaign
– National Science Foundation (NSF CAREER IIS-0347456)
– Office of Naval Research (ONR YIP N000140510633)
– National Science Foundation (NSF CRS-EHS0509151)
– National Science Foundation (NSF CCF-TF0514955)
157. Generalized Principal Component Analysis:
Modeling and Segmentation of Multivariate Mixed
Data
Rene Vidal, Yi Ma, and Shankar Sastry
Springer-Verlag, to appear
Thank You!
Yi Ma, CVPR 2008