# Mixture Models for Image Analysis

Presented on May 30, 2012

Aristidis Likas, Associate Professor, and Christoforos Nikou, Assistant Professor, University of Ioannina, Department of Computer Science: Mixture Models for Image Analysis

## Mixture Models for Image Analysis: Presentation Transcript

• Mixture Models for Image Analysis. Aristidis Likas & Christophoros Nikou, IPAN Research Group, Department of Computer Science, University of Ioannina
• Collaborators: Nikolaos Galatsanos, Professor; Konstantinos Blekas, Assistant Professor; Dr. Costas Constantinopoulos, Researcher; George Sfikas, Ph.D. candidate; Demetrios Gerogiannis, Ph.D. candidate
• Outline
  – Mixture models and EM (GMM, SMM)
  – Bayesian GMMs
  – Image segmentation using mixture models: incremental Bayesian GMMs; spatially varying GMMs (SVMMs) with MRF priors; SVMMs and line processes
  – Image registration using mixture models
• Mixture Models
  – Probability density estimation: estimate the density model f(x) that generated a given dataset X = {x1, ..., xN}
  – Mixture model: $f(x) = \sum_{j=1}^{M} \pi_j \varphi_j(x; \theta_j)$, with $\pi_j \ge 0$ and $\sum_{j=1}^{M} \pi_j = 1$
  – M pdf components φj(x); mixing weights π1, π2, ..., πM (priors)
  – Gaussian Mixture Model (GMM): φj = N(μj, Σj) (a short evaluation sketch follows the list)
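As a concrete illustration of the mixture density above, here is a minimal NumPy/SciPy sketch that evaluates $f(x) = \sum_j \pi_j N(x; \mu_j, \Sigma_j)$; the component parameters are illustrative values, not taken from the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical 2-component GMM in 2-D (illustrative parameters only).
weights = np.array([0.4, 0.6])                         # mixing weights pi_j, sum to 1
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]   # component means mu_j
covs = [np.eye(2), np.diag([2.0, 0.5])]                # component covariances Sigma_j

def gmm_density(x, weights, means, covs):
    """Evaluate f(x) = sum_j pi_j * N(x; mu_j, Sigma_j) at the points x (n, d)."""
    return sum(w * multivariate_normal(m, c).pdf(x)
               for w, m, c in zip(weights, means, covs))

x = np.array([[0.5, 0.2], [2.8, 3.1]])
print(gmm_density(x, weights, means, covs))            # mixture density at two points
```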
• GMM (graphical model): mixing weights πj, hidden label variable z, observation x
• GMM examples: GMMs can be used for density estimation (like histograms) or for clustering. Cluster membership probability: $P(j \mid x^n) = \dfrac{\pi_j \varphi_j(x^n; \theta_j)}{f(x^n)} = \langle z_j^n \rangle$
• Mixture model training
  – Given a dataset X = {x1, ..., xN} and a GMM f(x; Θ)
  – Likelihood: $p(X; \Theta) = p(x_1, \ldots, x_N; \Theta) = \prod_{i=1}^{N} f(x_i; \Theta)$
  – GMM training: log-likelihood maximization, $\hat\Theta = \arg\max_{\Theta} \sum_{i=1}^{N} \ln p(x_i; \Theta)$
  – Expectation-Maximization (EM) algorithm: applicable when the posterior P(Z|X) can be computed
• EM for mixture models
  – E-step: compute the expectation of the hidden variables given the observations: $\langle z_j^n \rangle = P(j \mid x^n) = \dfrac{\pi_j \varphi(x^n \mid \theta_j)}{\sum_{p=1}^{K} \pi_p \varphi(x^n \mid \theta_p)}$
  – M-step: maximize the expected complete log-likelihood: $\Theta^{(t+1)} = \arg\max_{\Theta} Q(\Theta)$, where $Q(\Theta) = \langle \log p(X, Z; \Theta) \rangle_{P(Z \mid X)} = \sum_{n=1}^{N} \sum_{j=1}^{K} \langle z_j^n \rangle \big[ \log \pi_j + \log \varphi(x^n \mid \theta_j) \big]$
• EM for GMM (M-step); a code sketch follows the list
  – Mean: $\mu_j^{(t+1)} = \dfrac{\sum_{n=1}^{N} \langle z_j^n \rangle \, x^n}{\sum_{n=1}^{N} \langle z_j^n \rangle}$
  – Covariance: $\Sigma_j^{(t+1)} = \dfrac{\sum_{n=1}^{N} \langle z_j^n \rangle (x^n - \mu_j^{(t+1)})(x^n - \mu_j^{(t+1)})^T}{\sum_{n=1}^{N} \langle z_j^n \rangle}$
  – Mixing weights: $\pi_j^{(t+1)} = \dfrac{\sum_{n=1}^{N} \langle z_j^n \rangle}{N}$
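A compact NumPy sketch of the E- and M-steps above, assuming full covariances and a fixed number of iterations; the random initialization and the small regularization term are simplifications for brevity, not part of the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=50, seed=0):
    """Plain EM for a GMM: returns mixing weights, means, covariances, responsibilities."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    pi = np.full(K, 1.0 / K)
    mu = X[rng.choice(N, K, replace=False)]            # random data points as initial means
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(K)])
    for _ in range(n_iter):
        # E-step: responsibilities <z_j^n> = pi_j N(x_n; mu_j, Sigma_j) / sum_p ...
        R = np.column_stack([pi[j] * multivariate_normal(mu[j], Sigma[j]).pdf(X)
                             for j in range(K)])
        R /= R.sum(axis=1, keepdims=True)
        # M-step: the closed-form updates listed on the slide
        Nj = R.sum(axis=0)
        pi = Nj / N
        mu = (R.T @ X) / Nj[:, None]
        for j in range(K):
            diff = X - mu[j]
            Sigma[j] = (R[:, j, None] * diff).T @ diff / Nj[j] + 1e-6 * np.eye(d)
    return pi, mu, Sigma, R

# Usage on synthetic two-cluster data
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(4.0, 1.0, (200, 2))])
pi, mu, Sigma, R = em_gmm(X, K=2)
print(pi, mu)
```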
• Student's t-distribution
  – $St(x; \mu, \Sigma, \nu) = \dfrac{\Gamma\!\big(\tfrac{\nu + d}{2}\big)\, |\Sigma|^{-1/2}}{(\pi\nu)^{d/2}\, \Gamma\!\big(\tfrac{\nu}{2}\big)\, \big[ 1 + (x - \mu)^T \Sigma^{-1} (x - \mu)/\nu \big]^{\frac{\nu + d}{2}}}$
  – Mean μ, covariance matrix Σ, degrees of freedom ν
  – Bell-shaped + heavy-tailed (depending on ν)
  – Tends to a Gaussian for large ν
• The Student's t-distribution
• The Student's t-distribution as a hierarchical model: $u \sim \mathrm{Gamma}(\nu/2, \nu/2)$, $x \mid u \sim \mathcal{N}(\mu, \Sigma/u)$. x follows a Gaussian distribution whose covariance is scaled by a factor drawn from a Gamma distribution. ML parameter estimation uses the EM algorithm (u is treated as a hidden variable). A small sampling sketch follows.
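A small sketch of the Gaussian scale mixture view above: draw u from Gamma(ν/2, ν/2), then x from N(μ, Σ/u). Parameter values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, Sigma, nu = np.array([0.0, 0.0]), np.eye(2), 3.0     # illustrative parameters

def sample_student_t(n, mu, Sigma, nu, rng):
    """Draw n samples of x via the hierarchy u ~ Gamma(nu/2, nu/2), x|u ~ N(mu, Sigma/u)."""
    # NumPy's Gamma uses shape and *scale*, so a rate of nu/2 becomes scale 2/nu.
    u = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)
    z = rng.multivariate_normal(np.zeros(len(mu)), Sigma, size=n)
    return mu + z / np.sqrt(u)[:, None]                   # covariance scaled by 1/u

samples = sample_student_t(10000, mu, Sigma, nu, rng)
print(samples.mean(axis=0))                               # should be close to mu
```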
• The Student's t-distribution
• SMM: Student's t Mixture Models
  – Each component j follows St(μj, Σj, νj) (robust mixture)
  – Parameter estimation using EM; hidden variables: $u_j^n$ and $z_j^n$
  – E-step: $\langle z_j^n \rangle = \dfrac{\pi_j \varphi_j(x^n \mid \theta_j)}{\sum_{p=1}^{K} \pi_p \varphi_p(x^n \mid \theta_p)}$, $\quad \langle u_j^n \rangle = \dfrac{\nu_j^{(t)} + d}{\nu_j^{(t)} + (x^n - \mu_j^{(t)})^T (\Sigma_j^{(t)})^{-1} (x^n - \mu_j^{(t)})}$
• SMM training (M-step)
  – Mean: $\mu_j^{(t+1)} = \dfrac{\sum_{n=1}^{N} \langle u_j^n \rangle \langle z_j^n \rangle \, x^n}{\sum_{n=1}^{N} \langle u_j^n \rangle \langle z_j^n \rangle}$
  – Covariance: $\Sigma_j^{(t+1)} = \dfrac{\sum_{n=1}^{N} \langle u_j^n \rangle \langle z_j^n \rangle (x^n - \mu_j^{(t+1)})(x^n - \mu_j^{(t+1)})^T}{\sum_{n=1}^{N} \langle z_j^n \rangle}$
  – Mixing proportion: $\pi_j^{(t+1)} = \dfrac{\sum_{n=1}^{N} \langle z_j^n \rangle}{N}$
• EM for SMM (M-step), degrees of freedom: no closed-form update; $\nu_j^{(t+1)}$ is the root of
  $\log\!\Big(\dfrac{\nu_j}{2}\Big) - \psi\!\Big(\dfrac{\nu_j}{2}\Big) + 1 + \dfrac{\sum_{n=1}^{N} \langle z_j^n \rangle \big( \log\langle u_j^n \rangle - \langle u_j^n \rangle \big)}{\sum_{n=1}^{N} \langle z_j^n \rangle} + \psi\!\Big(\dfrac{\nu_j^{(t)} + d}{2}\Big) - \log\!\Big(\dfrac{\nu_j^{(t)} + d}{2}\Big) = 0$
  (a root-finding sketch follows)
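Since there is no closed-form update for ν, implementations typically find the root numerically. The sketch below brackets the root of the standard t-mixture degrees-of-freedom equation with SciPy's brentq; the expectations fed in are synthetic, illustrative values.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import digamma

def update_dof(z, u, nu_old, d):
    """Solve the degrees-of-freedom equation for one component by bracketing its root.
    z: responsibilities <z_j^n>, u: scale expectations <u_j^n>, nu_old: current nu, d: dimension."""
    w = z.sum()
    c = 1.0 + (z * (np.log(u) - u)).sum() / w \
        + digamma((nu_old + d) / 2.0) - np.log((nu_old + d) / 2.0)
    f = lambda nu: np.log(nu / 2.0) - digamma(nu / 2.0) + c
    return brentq(f, 1e-3, 1e3)          # the root is assumed to lie in this bracket

# Illustrative use with synthetic expectations
rng = np.random.default_rng(1)
z = rng.uniform(0.1, 1.0, size=500)
u = rng.gamma(2.0, 1.0, size=500)
print(update_dof(z, u, nu_old=5.0, d=2))
```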
• Mixture model training issues
  – EM local maxima (dependence on initialization)
  – Covariance singularities
  – How to select the number of components
  – SMM vs GMM: better results for data with outliers (robustness), but higher dependence on initialization (how to initialize νj?)
• EM Local Maxima
• Bayesian GMM
  – $f(x) = \sum_{j=1}^{M} \pi_j \varphi(x; \mu_j, T_j)$, $\quad \sum_{j=1}^{M} \pi_j = 1$
  – Typical approach: priors on all GMM parameters
  – $\mu_j \sim \mathcal{N}(m, S)$, $\quad p(\mu) = \prod_{j=1}^{M} p(\mu_j)$
  – $T_j \sim \mathrm{Wishart}(v, V)$, $\quad p(T) = \prod_{j=1}^{M} p(T_j)$, $\quad T_j = \Sigma_j^{-1}$
  – $\pi = (\pi_1, \ldots, \pi_M) \sim \mathrm{Dirichlet}(a_1, \ldots, a_M)$
• Bayesian GMM training
  – Parameters Θ become (hidden) RVs: H = {Z, Θ}
  – Objective: compute the posteriors P(Z|X), P(Θ|X) (intractable)
  – Approximations: sampling (RJMCMC), MAP approach, variational approach
  – MAP approximation: mode of the posterior P(Θ|X) (MAP-EM), $\Theta_{MAP} = \arg\max_{\Theta} \{ \log P(X \mid \Theta) + \log P(\Theta) \}$; then compute P(Z|X, Θ_MAP)
• Variational Inference (no parameters)
  – Computes an approximation q(H) of the true posterior P(H|X)
  – For any pdf q(H): $\ln p(X) = F(q) + KL\big( q(H) \,\|\, P(H \mid X) \big)$
  – Variational bound (F) maximization: $q^{\ast} = \arg\max_{q} F(q) = \arg\max_{q} \int q(H) \ln \dfrac{p(X, H)}{q(H)} \, dH$
  – Mean field approximation: $q(H) = \prod_{k} q(H_k)$ with $q(H_k) \propto \exp\big( \langle \ln p(X, H) \rangle_{\prod_{l \ne k} q(H_l)} \big)$ (a system of equations)
  – D. Tzikas, A. Likas, N. Galatsanos, IEEE Signal Processing Magazine, 2008
• Variational Inference (with parameters)
  – X data, H hidden RVs, Θ parameters
  – For any pdf q(H; Θ): $\ln p(X; \Theta) = F(q, \Theta) + KL\big( q(H; \Theta) \,\|\, p(H \mid X; \Theta) \big)$
  – Maximization of the variational bound: $F(q, \Theta) = \int q(H; \Theta) \ln \dfrac{p(X, H; \Theta)}{q(H; \Theta)} \, dH \le \ln p(X; \Theta)$
  – Variational EM: VE-step $q^{(t+1)} = \arg\max_{q} F(q, \Theta^{(t)})$; VM-step $\Theta^{(t+1)} = \arg\max_{\Theta} F(q^{(t+1)}, \Theta)$
• Bayesian GMM training
  – Bayesian GMMs (no parameters): mean field variational approximation; tackles the covariance singularity problem; requires specifying the parameters of the priors
  – Estimating the number of components: start with a large number of components and let the training process prune redundant components (πj = 0); however, a Dirichlet prior on πj prevents component pruning
• Bayesian GMM without a prior on π: the C-B method
  – Mixing weights πj are parameters (the Dirichlet prior is removed)
  – Training using variational EM
  – Start with a large number of components
  – Perform variational maximization of the marginal likelihood
  – Pruning of redundant components (πj = 0): only components that fit the data well are finally retained (see the sketch below)
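The pruning behaviour described above (start with many components and let redundant ones collapse to πj ≈ 0) can be reproduced with scikit-learn's variational BayesianGaussianMixture. This is a generic variational Bayesian GMM, not the authors' C-B algorithm; the data and prior value are illustrative.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Synthetic data with 3 true clusters; we deliberately start with many more components.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(200, 2)) for c in ([0, 0], [3, 0], [0, 3])])

bgmm = BayesianGaussianMixture(
    n_components=15,                               # intentionally too many
    weight_concentration_prior_type="dirichlet_distribution",
    weight_concentration_prior=1e-3,               # small value encourages pruning
    max_iter=500,
    random_state=0,
).fit(X)

# Components whose weight collapsed toward zero are effectively pruned.
print(np.round(bgmm.weights_, 3))
print("effective components:", np.sum(bgmm.weights_ > 0.01))
```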
• Bayesian GMM (C-B): results depend on
  – the number of initial components
  – the initialization of the components
  – the specification of the scale matrix V of the Wishart prior p(T)
• Incremental Bayesian GMM
  – Solution: incremental training using component splitting
  – Local scale matrix V: based on the variance of the component to be split
  – A modification of the Bayesian GMM is needed: divide the components into 'fixed' and 'free'; prior on the weights of 'fixed' components (retained); no prior on the weights of 'free' components (may be eliminated); pruning restricted to 'free' components
  – C. Constantinopoulos & A. Likas, IEEE Trans. on Neural Networks, 2007
• Incremental Bayesian GMM
• Incremental Bayesian GMM algorithm (a simplified code sketch follows the list)
  – Start with k = 1 component
  – At each step: select a component j; split component j into two subcomponents; set the scale matrix V according to Σj; apply variational EM treating the two subcomponents as free and the remaining components as fixed; either both subcomponents are retained and adjusted, or one of them is eliminated and the other recovers the original component (before the split)
  – Repeat until all components have been tested for splitting unsuccessfully
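A runnable but heavily simplified stand-in for the split loop above: components are split along their principal covariance axis and a split is accepted only if it improves the BIC of a plain scikit-learn GaussianMixture. The variational fixed/free machinery of the C-B procedure is not implemented here; this is only a greedy illustration of incremental splitting.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def split_means(gmm, j):
    """Split component j along the principal axis of its covariance (simplified split)."""
    w, v = np.linalg.eigh(gmm.covariances_[j])
    offset = np.sqrt(w[-1]) * v[:, -1]
    means = np.delete(gmm.means_, j, axis=0)
    return np.vstack([means, gmm.means_[j] - offset, gmm.means_[j] + offset])

def incremental_gmm(X, max_k=10):
    """Greedy incremental fitting: keep splitting while the BIC improves."""
    gmm = GaussianMixture(n_components=1).fit(X)
    while gmm.n_components < max_k:
        best = None
        for j in range(gmm.n_components):
            cand = GaussianMixture(n_components=gmm.n_components + 1,
                                   means_init=split_means(gmm, j)).fit(X)
            if best is None or cand.bic(X) < best.bic(X):
                best = cand
        if best.bic(X) >= gmm.bic(X):     # no split improves the model: stop
            return gmm
        gmm = best
    return gmm

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.4, size=(150, 2)) for c in ([0, 0], [3, 0], [0, 3])])
print(incremental_gmm(X).n_components)    # typically recovers 3 components
```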
• Mixture models for image modeling
  – Select a feature representation
  – Compute a feature vector per pixel to form the training set
  – Build a mixture model for the image using the training set
  – Applications: image retrieval + relevance feedback, image segmentation, image registration
• Mixture models for image segmentation (see the sketch below)
  – One cluster per mixture component
  – Assign pixels to clusters based on P(j|x)
  – Take into account spatial smoothness: neighbouring pixels are expected to have the same label
  – Simple way: add pixel coordinates to the feature vector
  – Bayesian way: impose MRF priors (SVMM)
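A minimal sketch of the "simple way" above: fit a GMM on per-pixel features augmented with scaled pixel coordinates and label each pixel by its most probable component. It assumes a grayscale image array and uses standard scikit-learn calls; the coordinate weight is an illustrative knob, not a value from the slides.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def segment_image(img, n_segments=4, coord_weight=0.1):
    """Segment a 2-D grayscale image by clustering (intensity, x, y) features with a GMM."""
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # Feature vector per pixel: intensity plus scaled coordinates (crude spatial smoothness).
    feats = np.column_stack([img.ravel(),
                             coord_weight * xx.ravel(),
                             coord_weight * yy.ravel()]).astype(float)
    gmm = GaussianMixture(n_components=n_segments, random_state=0).fit(feats)
    labels = gmm.predict(feats)             # assign each pixel to argmax_j P(j | x)
    return labels.reshape(h, w)

# Usage on a synthetic two-region image with noise
img = np.zeros((64, 64)); img[:, 32:] = 1.0
img += np.random.default_rng(0).normal(0, 0.1, img.shape)
print(np.unique(segment_image(img, n_segments=2)))
```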
• Incremental Bayesian GMM image segmentation (two examples): number of segments determined automatically
• Spatially varying mixtures (1)
  – $f(x^n \mid \Pi, \Theta) = \sum_{j=1}^{K} \pi_j^n \, \varphi(x^n \mid \theta_j), \quad n = 1, 2, \ldots, N$
  – $x^n$: image feature (e.g. pixel intensity)
  – $\pi_j^n$: contextual mixing proportions
  – $\varphi(x^n \mid \theta_j)$: Gaussian parameterized by $\theta_j = \{\mu_j, \Sigma_j\}$
  – $z_j^n$: data label, hidden variable
• Spatially varying mixtures (2)
  – Insight into the contextual mixing proportions: $\pi_j^n = p(z_j^n = 1 \mid x^n)$
  – Smoothness is enforced in the image by imposing a prior p(Π) on the probabilities of the pixel labels (contextual mixing proportions)
  – $L(\Pi, \Theta \mid X) = \sum_{n=1}^{N} \log f(x^n \mid \Pi, \Theta) + \log p(\Pi)$
• SV-GMM with Gibbs prior (1)
  – A typical constraint is the Gibbs prior: $p(\Pi) = \dfrac{1}{Z} e^{-U(\Pi)}$, $\quad U(\Pi) = \beta \sum_{n=1}^{N} V_{\mathcal{N}_n}(\Pi)$, $\quad V_{\mathcal{N}_n}(\Pi) = \sum_{j=1}^{K} \sum_{m \in \mathcal{N}_n} (\pi_j^n - \pi_j^m)^2$
  – β: smoothness weight
  – [K. Blekas, A. Likas, N. Galatsanos and I. Lagaris. IEEE Trans. Neural Networks, 2005]
• SV-GMM with Gibbs prior (2)
• SV-GMM with Gibbs prior (3)
  – E-step: equivalent to the GMM case
  – M-step: the contextual mixing proportions are solutions of a quadratic equation
  – Note that: 1) parameter β of the Gibbs prior must be determined beforehand; 2) the contextual mixing proportions are not constrained to be probability vectors, i.e. $0 \le \pi_j^n \le 1$ and $\sum_{j=1}^{K} \pi_j^n = 1$, $n = 1, 2, \ldots, N$, are not automatically satisfied
• SV-GMM with Gibbs prior (4)
  – To address these issues: 1) a class-adaptive Gauss-Markov random field prior; 2) projection of the probabilities onto the hyperplane $\sum_{j=1}^{K} \pi_j^n = 1$, $n = 1, 2, \ldots, N$ (another solution will be presented later on); a projection sketch follows below
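One common way to realize such a projection step is a Euclidean projection of each pixel's vector π^n onto the probability simplex (non-negative, summing to one). The sketch below implements the standard sorting-based projection; it is offered as an illustration, not as the exact projection used in the paper.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of a vector v onto the simplex {p : p >= 0, sum(p) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

# Example: an unconstrained M-step solution for one pixel, projected back
pi_n = np.array([0.7, 0.5, -0.1])
print(project_to_simplex(pi_n), project_to_simplex(pi_n).sum())   # non-negative, sums to 1
```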
• SV-GMM with Gauss-Markov prior (2)
  – One variance per cluster j = 1, 2, ..., K and per direction d = 0, 45, 90, 135 degrees:
  – $p(\Pi) \propto \prod_{d=1}^{D} \prod_{j=1}^{K} \dfrac{1}{\sigma_{j,d}^{N}} \exp\!\left( -\dfrac{\sum_{n=1}^{N} \sum_{m \in \mathcal{N}_n^d} (\pi_j^n - \pi_j^m)^2}{2\sigma_{j,d}^2} \right)$
  – [C. Nikou, N. Galatsanos and A. Likas. IEEE Trans. Image Processing, 2007]
• SV-GMM with Gauss-Markov prior (3)
• MAP estimation
  – The posterior probabilities (contextual mixing proportions) are the non-negative solutions of a second-degree equation whose coefficients involve the neighborhood size $|\mathcal{N}_n|$, the class- and direction-specific variances $\sigma_{j,d}^2$, the neighboring values $\pi_j^m$, $m \in \mathcal{N}_n^d$, and the posterior $\langle z_j^n \rangle$
  – There is always a non-negative solution
  – Projection onto the hyperplane: $\sum_{j=1}^{K} \pi_j^n = 1$, $n = 1, 2, \ldots, N$
• RGB image segmentation (1): original image; R-SNR = 2 dB, G-SNR = 4 dB, B-SNR = 3 dB
• RGB image segmentation (2): noise-free image segmentation, SVFMM vs. CA-SVFMM
• RGB image segmentation (3): degraded image segmentation, SVFMM vs. CA-SVFMM (β determined by trial and error)
• RGB image segmentation (4): shading effect on the cupola and the wall modeled by an SVFMM with a GMRF prior. Estimated βj (×10⁻³): cupola 128, sky 33, wall 119
• SV-GMM with DCM prior (1)
  – For pixel n, the class label is a multinomially distributed random variable: $p(z^n \mid \xi^n) = \dfrac{M!}{\prod_{j=1}^{K} z_j^n!} \prod_{j=1}^{K} (\xi_j^n)^{z_j^n}$, with $\xi_j^n \ge 0$, $\sum_{j=1}^{K} \xi_j^n = 1$, $n = 1, \ldots, N$
  – parameterized by the probability vector $\xi^n = (\xi_1^n, \xi_2^n, \ldots, \xi_K^n)^T$; the whole image is parameterized by $\Xi = \{\xi^1, \xi^2, \ldots, \xi^N\}$
• SV-GMM with DCM prior (2)
  – $p(z^n \mid \xi^n) = \dfrac{M!}{\prod_{j=1}^{K} z_j^n!} \prod_{j=1}^{K} (\xi_j^n)^{z_j^n}$, $\quad \xi_j^n \ge 0$, $\sum_{j=1}^{K} \xi_j^n = 1$, $n = 1, \ldots, N$
  – Generative model for the image: a multinomial distribution with K possible outcomes; class label j (j = 1, ..., K) appears with probability ξj; M realizations of the process; the distribution of the counts of a given class is binomial
• SV-GMM with DCM prior (3)
  – The Dirichlet distribution forms the conjugate prior for the multinomial distribution: the posterior $p(\xi \mid x)$ has the same functional form as the prior $p(\xi)$
  – $p(\xi \mid x) = \dfrac{p(x \mid \xi)\, p(\xi)}{\int p(x \mid \xi)\, p(\xi)\, d\xi}$
  – [C. Nikou, A. Likas and N. Galatsanos. IEEE Trans. Image Processing, 2010]
• SV-GMM with DCM prior (4)
  – It is natural to impose a Dirichlet prior on the parameters of the multinomial pdf: $p(\xi^n \mid a^n) = \dfrac{\Gamma\!\big(\sum_{j=1}^{K} a_j^n\big)}{\prod_{j=1}^{K} \Gamma(a_j^n)} \prod_{j=1}^{K} (\xi_j^n)^{a_j^n - 1}$, $\quad a_j^n > 0$, $n = 1, \ldots, N$, $j = 1, \ldots, K$
  – parameterized by the vector $a^n = (a_1^n, a_2^n, \ldots, a_K^n)^T$
• SV-GMM with DCM prior (5)
  – Marginalizing the parameters of the multinomial, $p(z^n \mid a^n) = \int p(z^n \mid \xi^n)\, p(\xi^n \mid a^n)\, d\xi^n$, $n = 1, 2, \ldots, N$, yields the Dirichlet compound multinomial distribution for the class labels:
  – $p(z^n \mid a^n) = \dfrac{M!\; \Gamma\!\big(\sum_{j=1}^{K} a_j^n\big)}{\Gamma\!\big(\sum_{j=1}^{K} (a_j^n + z_j^n)\big)} \prod_{j=1}^{K} \dfrac{\Gamma(a_j^n + z_j^n)}{z_j^n!\; \Gamma(a_j^n)}$, $\quad n = 1, \ldots, N$
• SV-GMM with DCM prior (6)
  – Image model: for a given pixel, its class j is determined by M = 1 realization of the process: $\xi_j^n = p(z_j^n = 1 \mid x^n) = 1$ and $\xi_m^n = p(z_m^n = 1 \mid x^n) = 0$ for $m \ne j$, $m = 1, 2, \ldots, K$
  – The DCM prior for the class label becomes: $p(z_j^n = 1 \mid a^n) = \dfrac{a_j^n}{\sum_{m=1}^{K} a_m^n}$, $\quad j = 1, \ldots, K$
• SV-GMM with DCM prior (7)
  – The model becomes spatially varying by imposing a GMRF prior on the parameters of the Dirichlet pdf:
  – $p(A) \propto \prod_{j=1}^{K} \dfrac{1}{\beta_j^{N}} \exp\!\left( -\dfrac{\sum_{n=1}^{N} \sum_{m \in \mathcal{N}_n} (a_j^n - a_j^m)^2}{2\beta_j^2} \right)$
  – [C. Nikou, A. Likas and N. Galatsanos. IEEE Trans. Image Processing, 2010]
• SV-GMM with DCM prior (8)
• MAP estimation Posterior probabilities are the non-negative solutions toQ   jam   n mNi  n 2   n j  a m   mN i j  n z j  j  j n n 2  0  ( a n )3     j   (a j )    (a j )  | N |  0a j n j | Nn | | Nn |     n     K  n j   am , n  1, 2,..., N n m 1 m j There is always a non-negative solution. No need for projection!
• Natural image segmentation (1)
  – Berkeley image database (300 images)
  – Ground truth: human segmentations
  – Features: MRF features, 7x7 windows x 3 components (147-dimensional vector); PCA on a single image; 8 principal components kept (a feature-extraction sketch follows the list)
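A minimal sketch of one way to realize the feature pipeline described above (7×7 color patches flattened to 147-dimensional vectors, reduced to 8 principal components by per-image PCA), using scikit-learn utilities on a synthetic RGB image; details such as patch handling at image borders are simplified.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_extraction.image import extract_patches_2d

def mrf_features(rgb_img, patch=7, n_components=8):
    """Extract 7x7x3 patches (147-d vectors) and reduce them with a per-image PCA."""
    patches = extract_patches_2d(rgb_img, (patch, patch))      # (n_patches, 7, 7, 3)
    flat = patches.reshape(len(patches), -1)                   # 147-dimensional vectors
    return PCA(n_components=n_components).fit_transform(flat)  # keep 8 principal components

rgb = np.random.default_rng(0).random((32, 32, 3))             # synthetic image
print(mrf_features(rgb).shape)                                  # (n_patches, 8)
```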
• Natural image segmentation (2)
• Natural image segmentation (3) MRF features
• Natural image segmentation (4) MRF features
• Natural image segmentation (6)
• Natural image segmentation (7)
• Natural image segmentation (8)
• Natural image segmentation (9)
• Results (K=5)
• Segmentation and recovery (1)
  – Berkeley image database
  – Additive white Gaussian noise, SNR between -4 dB and 12 dB
  – MRF features
  – Evaluation indices: PR (probabilistic Rand index), VI (variation of information), GCE (global consistency error), BDE (boundary displacement error)
• Segmentation and recovery (2) PR index (K=5)
• Line processes (1)
  – Image recovery: estimate a smooth function from noisy observations
  – Observations: d; function to be estimated: u
  – $\min_{u} \sum_{i} (d_i - u_i)^2 + \lambda \sum_{i} (u_i - u_{i-1})^2$ (data fidelity term + smoothness term)
  – Calculus of variations (Euler-Lagrange equations)
• Line processes (2)
  – In the presence of many edges (piecewise smoothness) the standard solution is not satisfactory. A line process is integrated:
  – $\min_{u, l} \sum_{i} (d_i - u_i)^2 + \lambda \sum_{i} (u_i - u_{i-1})^2 (1 - l_i) + \alpha \sum_{i} l_i$
  – $l_i = 0$: non-edge, include the smoothness term; $l_i = 1$: edge, add the penalty term
  – Many local minima (due to the simultaneous estimation of u and l); the calculus of variations cannot be applied; a minimization sketch follows below
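A small illustrative sketch of alternating minimization of the objective above on a 1-D signal: with u fixed, each l_i has a closed-form optimum (l_i = 1 exactly when λ(u_i − u_{i−1})² > α); with l fixed, u is improved by a few gradient steps. The parameter values and step sizes are illustrative, not taken from the slides.

```python
import numpy as np

def line_process_restore(d, lam=10.0, alpha=1.0, n_outer=20, n_inner=200, step=0.01):
    """Alternate minimization of sum (d-u)^2 + lam*sum (u_i-u_{i-1})^2 (1-l_i) + alpha*sum l_i."""
    u = d.copy()
    l = np.zeros(len(d))
    for _ in range(n_outer):
        # l-step: closed form, switch smoothness off where it costs more than alpha
        diff2 = np.diff(u, prepend=u[0]) ** 2
        l = (lam * diff2 > alpha).astype(float)
        # u-step: gradient descent on the (now quadratic) objective in u
        for _ in range(n_inner):
            grad = 2.0 * (u - d)
            fwd = np.diff(u, prepend=u[0])            # u_i - u_{i-1}
            grad += 2.0 * lam * (1.0 - l) * fwd
            grad[:-1] -= 2.0 * lam * ((1.0 - l) * fwd)[1:]
            u -= step * grad
    return u, l

# Noisy piecewise-constant signal: the line process should place an edge near the jump
rng = np.random.default_rng(0)
d = np.concatenate([np.zeros(50), np.ones(50)]) + rng.normal(0, 0.1, 100)
u, l = line_process_restore(d)
print(np.nonzero(l)[0])      # indices flagged as edges (expect one near index 50)
```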
• Line processes (3)
  – Milestones: [D. Geman and S. Geman 1984], [A. Blake and A. Zisserman 1988], [M. Black 1996]
  – Integration of a line process into an SV-GMM: a continuous line process model on the contextual mixing proportions; Gamma-distributed line process variables; line process parameters are automatically estimated from the data (EM and variational EM)
• GMM with line process (2): line process
• GMM with continuous line process (1)
  – Student's t prior on the local differences of the contextual mixing proportions: $\pi_j^n - \pi_j^k \sim St(0, \sigma_{jd}^2, \nu_{jd})$, for all $d$, $n$, $j$ and $k \in \gamma_d(n)$
  – Distinct priors on each image class and each neighborhood direction (horizontal, vertical)
• GMM with continuous line process (2)
  – Equivalently, at each pixel n: $\pi_j^n - \pi_j^k \sim \mathcal{N}(0, \sigma_{jd}^2 / u_j^{nk})$, $\quad u_j^{nk} \sim \mathrm{Gamma}(\nu_{jd}/2, \nu_{jd}/2)$, $\quad d$, $n$, $j$, $k \in \gamma_d(n)$
  – Joint distribution: $p(\Pi; \sigma^2, \nu) \propto \prod_{j=1}^{K} \prod_{n=1}^{N} \prod_{d=1}^{D} \prod_{k \in \gamma_d(n)} St(\pi_j^n \mid \pi_j^k; \sigma_{jd}^2, \nu_{jd})$
• GMM with continuous line process (3)
• GMM with continuous line process (4)
  – Description of the edge structure; a continuous generalization of a binary line process
  – Large $u_j^{nk}$: weak class variances (smoothness)
  – $u_j^{nk} \to 0$: uninformative prior (no smoothness), separation of class j from the remaining classes
  – [G. Sfikas, C. Nikou and N. Galatsanos. IEEE CVPR, 2008]
• Edges between segments (1)
• Edges between segments (2): horizontal and vertical differences for the sky, cupola and building classes
• Numerical results (1): Berkeley images, Rand index (RI)
• Image registration
  – Estimate the transformation TΘ mapping the coordinates of an image I1 to a target image I2: $I_2(x, y, z) \approx I_1\big(T_\Theta(x, y, z)\big)$
  – TΘ is described by a set of parameters Θ
• Image similarity measure: $E(\Theta) = \mathrm{dist}\big( I_1(T_\Theta(x, y, z)),\; I_2(x, y, z) \big)$
  – Single-modal images: quadratic error, correlation, Fourier transform, sign changes
  – Multimodal images: inter-image uniformity, mutual information (MI), normalized MI
• Fundamental hypothesis
  – Correspondence between uniform regions in the two images
  – Partitioning of the image to be registered (not necessarily into joint regions)
  – Projection of the partition onto the reference image
  – Evaluation of a distance between the overlapping regions: minimum at correct alignment; minimize the distance
• Distance between GMMs (1)
  – A straightforward approach would be:
  – $G_1(x \mid \Pi_1, \Theta_1) = \sum_{m=1}^{M} \pi_m^1 \varphi_m^1(x)$, $\quad G_2(x \mid \Pi_2, \Theta_2) = \sum_{n=1}^{N} \pi_n^2 \varphi_n^2(x)$
  – $E(G_1, G_2) = \sum_{m=1}^{M} \sum_{n=1}^{N} \pi_m^1 \pi_n^2 \, B(\varphi_m^1, \varphi_n^2)$, where B is the Bhattacharyya distance
• Distance between GMMs (2)
  – Knowing the correspondences allows the definition of: $E(G_1, G_2) = \sum_{k=1}^{K} \pi_k \, B\big( \varphi_{T_\Theta(k)}^1, \varphi_k^2 \big)$
  – $\varphi_{T_\Theta(k)}^1$: pixels of the transformed floating image overlapping with the k-th component of the reference image; $\varphi_k^2$: k-th component of the reference image (a closed-form Bhattacharyya sketch follows)
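For reference, the Bhattacharyya distance between two Gaussian components, as used in the GMM distances above, has a well-known closed form; the sketch below evaluates it for illustrative parameters.

```python
import numpy as np

def bhattacharyya_gaussians(mu1, Sigma1, mu2, Sigma2):
    """Closed-form Bhattacharyya distance between N(mu1, Sigma1) and N(mu2, Sigma2)."""
    Sigma = 0.5 * (Sigma1 + Sigma2)
    diff = mu1 - mu2
    term1 = 0.125 * diff @ np.linalg.solve(Sigma, diff)
    term2 = 0.5 * np.log(np.linalg.det(Sigma) /
                         np.sqrt(np.linalg.det(Sigma1) * np.linalg.det(Sigma2)))
    return term1 + term2

# Illustrative component pair
mu1, S1 = np.array([0.0, 0.0]), np.eye(2)
mu2, S2 = np.array([1.0, 1.0]), 2.0 * np.eye(2)
print(bhattacharyya_gaussians(mu1, S1, mu2, S2))
```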
• Energy function (1): for a set of transformation parameters Θ
  – Segment the image to be registered into K segments by a GMM (or SMM)
  – For each segment: project the pixels onto the reference image; compute the mean and covariance of the reference-image pixels under the projection mask
  – Evaluate the distance between the distributions
• Energy function (2)
  – Find the transformation parameters Θ: $\min_{\Theta} \sum_{k=1}^{K} \pi_k \, B\big( \varphi_{T_\Theta(k)}^1, \varphi_k^2 \big)$
  – Optimization by simplex, Powell's method or ICM (an optimization sketch follows)
  – [D. Gerogiannis, C. Nikou and A. Likas. Image and Vision Computing, 2009]
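A sketch of the simplex (Nelder-Mead) optimization mentioned above, applied to a toy translation-only registration problem; the energy here is a simple sum-of-squared-differences stand-in rather than the segment-wise Bhattacharyya energy of the slides, and the images are synthetic.

```python
import numpy as np
from scipy.ndimage import shift
from scipy.optimize import minimize

# Toy registration: recover a 2-D translation between a reference and a floating image.
yy, xx = np.mgrid[0:64, 0:64]
reference = np.exp(-(((xx - 32) ** 2 + (yy - 32) ** 2) / 100.0))   # smooth synthetic image
floating = shift(reference, (3.5, -2.0), order=1)                  # floating = shifted reference

def energy(theta):
    """Stand-in registration energy: SSD after applying the candidate translation."""
    warped = shift(floating, (-theta[0], -theta[1]), order=1)
    return np.mean((warped - reference) ** 2)

# Simplex (Nelder-Mead) optimization of the transformation parameters
res = minimize(energy, x0=np.zeros(2), method="Nelder-Mead")
print(res.x)        # expected to be close to (3.5, -2.0)
```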
• Convexity of the Bhattacharyya distance
• Registration error (Gaussian noise). MMBIA 2007, Rio de Janeiro, Brazil
• Registration error
• Registration of point clouds: the correspondence is unknown
  – Greedy distance between mixtures
  – Determine the correspondence (Hungarian algorithm); a correspondence sketch follows below
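A sketch of establishing component correspondences with the Hungarian algorithm: build a pairwise cost matrix between the components of the two mixtures and solve the assignment with SciPy's linear_sum_assignment. For brevity the cost here is the squared distance between component means; the Gaussian Bhattacharyya distance from the earlier sketch would be a natural alternative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_components(means1, means2):
    """Match components of two mixtures by minimizing total pairwise cost (Hungarian algorithm)."""
    # Cost matrix: squared distances between component means.
    cost = ((means1[:, None, :] - means2[None, :, :]) ** 2).sum(axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols))

means1 = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
means2 = np.array([[0.1, 3.1], [0.2, -0.1], [2.9, 0.2]])   # same clusters, permuted
print(match_components(means1, means2))                    # recovers the permutation
```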
• Experimental results: initial point set, 2% outliers + uniform noise
• Greedy distance
• Hungarian algorithm
• Conclusions
  – Application of mixture models to image segmentation and image registration
  – Other applications: image retrieval, visual tracking, ...