Mixture Models for Image Analysis
Aristidis Likas, Associate Professor, and Christoforos Nikou, Assistant Professor, Department of Computer Science, University of Ioannina



Presentation Transcript

  • Mixture Models for Image Analysis. Aristidis Likas & Christophoros Nikou, IPAN Research Group, Department of Computer Science, University of Ioannina
  • Collaborators: Nikolaos Galatsanos, Professor; Konstantinos Blekas, Assistant Professor; Dr. Costas Constantinopoulos, Researcher; George Sfikas, Ph.D. Candidate; Demetrios Gerogiannis, Ph.D. Candidate
  • Outline: Mixture Models and EM (GMM, SMM); Bayesian GMMs; image segmentation using mixture models (incremental Bayesian GMMs, spatially varying GMMs (SVMMs) with MRF priors, SVMMs and line processes); image registration using mixture models
  • Mixture Models. Probability density estimation: estimate the density function model f(x) that generated a given dataset X = {x1, …, xN}. A mixture model is $f(x) = \sum_{j=1}^{M} \pi_j \varphi_j(x; \theta_j)$ with $\pi_j \ge 0$ and $\sum_{j=1}^{M} \pi_j = 1$: M pdf components φj(x) and mixing weights π1, π2, …, πM (priors). Gaussian Mixture Model (GMM): φj = N(μj, Σj).
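As a concrete illustration of the mixture density above, here is a minimal NumPy/SciPy sketch (not part of the original slides; the component parameters are made up) that evaluates $f(x) = \sum_j \pi_j N(x; \mu_j, \Sigma_j)$ at a few query points.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_density(x, weights, means, covs):
    """Evaluate f(x) = sum_j pi_j N(x; mu_j, Sigma_j) at the points in x."""
    return sum(w * multivariate_normal.pdf(x, mean=m, cov=c)
               for w, m, c in zip(weights, means, covs))

# Illustrative 2-component GMM in 2-D (parameter values are made up).
weights = [0.6, 0.4]                      # pi_j >= 0, summing to 1
means   = [np.zeros(2), np.array([3.0, 3.0])]
covs    = [np.eye(2), 0.5 * np.eye(2)]

x = np.random.randn(5, 2)                 # five query points
print(gmm_density(x, weights, means, covs))
```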
  • GMM (graphical model): mixing weights πj, hidden variable z, observation x.
  • GMM examples. GMMs can be used for density estimation (like histograms) or for clustering. Cluster membership probability: $P(j \mid x^n) = \langle z_j^n \rangle = \dfrac{\pi_j \varphi_j(x^n; \theta_j)}{f(x^n)}$.
  • Mixture Model training. Given a dataset X = {x1, …, xN} and a GMM f(x; Θ), the likelihood is $p(X; \Theta) = p(x_1, \dots, x_N; \Theta) = \prod_{i=1}^{N} f(x_i; \Theta)$. GMM training: log-likelihood maximization, $\hat{\Theta} = \arg\max_{\Theta} \sum_{i=1}^{N} \ln p(x_i; \Theta)$. Expectation-Maximization (EM) algorithm: applicable when the posterior P(Z|X) can be computed.
  • EM for Mixture Models. E-step: compute the expectation of the hidden variables given the observations, $\langle z_j^n \rangle = P(j \mid x^n) = \dfrac{\pi_j \varphi(x^n \mid \theta_j)}{\sum_{p=1}^{K} \pi_p \varphi(x^n \mid \theta_p)}$. M-step: maximize the expected complete log-likelihood, $\Theta^{(t+1)} = \arg\max_{\Theta} Q(\Theta)$, where $Q(\Theta) = \langle \log p(X, Z; \Theta) \rangle_{P(Z \mid X)} = \sum_{n=1}^{N} \sum_{j=1}^{K} \langle z_j^n \rangle \left[ \log \pi_j + \log \varphi(x^n \mid \theta_j) \right]$.
  • EM for GMM (M-step). Mean: $\mu_j^{(t+1)} = \dfrac{\sum_{n=1}^{N} \langle z_j^n \rangle x^n}{\sum_{n=1}^{N} \langle z_j^n \rangle}$. Covariance: $\Sigma_j^{(t+1)} = \dfrac{\sum_{n=1}^{N} \langle z_j^n \rangle (x^n - \mu_j^{(t+1)})(x^n - \mu_j^{(t+1)})^T}{\sum_{n=1}^{N} \langle z_j^n \rangle}$. Mixing weights: $\pi_j^{(t+1)} = \dfrac{1}{N} \sum_{n=1}^{N} \langle z_j^n \rangle$.
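The E-step and M-step formulas above translate almost directly into code. The following is a NumPy/SciPy sketch of a single EM iteration for a GMM, assuming a data matrix X of shape (N, d); it is an illustration, not the authors' implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, weights, means, covs, reg=1e-6):
    """One EM iteration for a GMM: E-step responsibilities, then M-step updates."""
    N, d = X.shape
    K = len(weights)

    # E-step: <z_j^n> = pi_j * N(x^n; mu_j, Sigma_j) / f(x^n)
    resp = np.column_stack([w * multivariate_normal.pdf(X, m, c)
                            for w, m, c in zip(weights, means, covs)])
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: closed-form updates of the means, covariances and mixing weights.
    Nk = resp.sum(axis=0)                          # effective counts per component
    new_means = (resp.T @ X) / Nk[:, None]
    new_covs = []
    for j in range(K):
        diff = X - new_means[j]
        cov = (resp[:, j, None] * diff).T @ diff / Nk[j]
        new_covs.append(cov + reg * np.eye(d))     # small ridge against singularities
    new_weights = Nk / N
    return new_weights, new_means, new_covs
```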
  • Student's t-distribution: $St(x; \mu, \Sigma, \nu) = \dfrac{\Gamma\left(\frac{\nu + d}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right) (\nu \pi)^{d/2} |\Sigma|^{1/2}} \left[ 1 + \dfrac{(x - \mu)^T \Sigma^{-1} (x - \mu)}{\nu} \right]^{-\frac{\nu + d}{2}}$. Mean μ, covariance matrix Σ, degrees of freedom ν. Bell-shaped and heavy-tailed (depending on ν); tends to a Gaussian for large ν.
  • The Student's t-distribution
  • The Student's t-distribution as a hierarchical model: $u \sim \text{Gamma}(\nu/2, \nu/2)$, $x \mid u \sim N(\mu, \Sigma / u)$. x follows a Gaussian distribution whose covariance is scaled by a factor following a Gamma distribution. ML parameter estimation uses the EM algorithm (u is treated as a hidden variable).
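The hierarchical Gamma-Gaussian construction also gives a simple way to draw Student's t samples. A small sketch, with illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_student_t(mu, Sigma, nu, size=1000):
    """Draw x ~ St(mu, Sigma, nu) via u ~ Gamma(nu/2, nu/2), x | u ~ N(mu, Sigma/u)."""
    d = len(mu)
    u = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=size)   # scale factor, mean 1
    z = rng.multivariate_normal(np.zeros(d), Sigma, size=size)
    return mu + z / np.sqrt(u)[:, None]

# Heavy-tailed samples for small nu; nearly Gaussian for large nu.
x = sample_student_t(np.zeros(2), np.eye(2), nu=3.0, size=5000)
print(x.std(axis=0))
```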
  • The Student's t-distribution
  • SMM: Student's t Mixture Models. Each component j follows St(μj, Σj, νj) (robust mixture). Parameter estimation uses EM with hidden variables $u_j$ and $z_j$. E-step: $\langle z_j^n \rangle = \dfrac{\pi_j \varphi(x^n \mid \theta_j)}{\sum_{p=1}^{K} \pi_p \varphi(x^n \mid \theta_p)}$, $\langle u_j^n \rangle = \dfrac{\nu_j^{(t)} + d}{\nu_j^{(t)} + (x^n - \mu_j^{(t)})^T (\Sigma_j^{(t)})^{-1} (x^n - \mu_j^{(t)})}$.
  • SMM training (M-step). Mean: $\mu_j^{(t+1)} = \dfrac{\sum_{n=1}^{N} \langle u_j^n \rangle \langle z_j^n \rangle x^n}{\sum_{n=1}^{N} \langle u_j^n \rangle \langle z_j^n \rangle}$. Covariance: $\Sigma_j^{(t+1)} = \dfrac{\sum_{n=1}^{N} \langle z_j^n \rangle \langle u_j^n \rangle (x^n - \mu_j^{(t+1)})(x^n - \mu_j^{(t+1)})^T}{\sum_{n=1}^{N} \langle z_j^n \rangle}$. Mixing proportions: $\pi_j^{(t+1)} = \dfrac{1}{N} \sum_{n=1}^{N} \langle z_j^n \rangle$.
  • EM for SMM (M-step). Degrees of freedom: no closed-form update; $\nu_j^{(t+1)}$ is the root of $\log\left(\dfrac{\nu_j}{2}\right) - \psi\left(\dfrac{\nu_j}{2}\right) + 1 + \dfrac{\sum_{n=1}^{N} \langle z_j^n \rangle^{(t)} \left( \log \langle u_j^n \rangle^{(t)} - \langle u_j^n \rangle^{(t)} \right)}{\sum_{n=1}^{N} \langle z_j^n \rangle^{(t)}} + \psi\left(\dfrac{\nu_j^{(t)} + d}{2}\right) - \log\left(\dfrac{\nu_j^{(t)} + d}{2}\right) = 0$.
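Because the ν_j update has no closed form, it is typically found with a one-dimensional root finder. The sketch below uses SciPy's brentq on the equation above (the standard SMM update in the style of Peel & McLachlan); the responsibilities and scale weights are random placeholders standing in for real E-step outputs.

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

def update_dof(z_j, u_j, nu_old, d):
    """Solve the degrees-of-freedom equation for one SMM component.

    z_j, u_j: E-step quantities <z_j^n> and <u_j^n> (placeholders here),
    nu_old:   current nu_j, d: data dimension.
    """
    s = np.sum(z_j * (np.log(u_j) - u_j)) / np.sum(z_j)
    c = 1.0 + s + digamma((nu_old + d) / 2.0) - np.log((nu_old + d) / 2.0)

    def f(nu):
        return np.log(nu / 2.0) - digamma(nu / 2.0) + c

    return brentq(f, 1e-3, 1e3)   # nu_j^(t+1) bracketed in a wide interval

# Toy usage with made-up E-step outputs.
rng = np.random.default_rng(1)
z_j = rng.uniform(0.1, 1.0, size=200)
u_j = rng.gamma(2.0, 0.5, size=200)
print(update_dof(z_j, u_j, nu_old=5.0, d=2))
```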
  • Mixture model training issues: EM local maxima (dependence on initialization); covariance singularities; how to select the number of components. SMM vs. GMM: better results for data with outliers (robustness), but higher dependence on initialization (how to initialize νj?).
  • EM Local Maxima
  • Bayesian GMM: $f(x) = \sum_{j=1}^{M} \pi_j \varphi(x; \mu_j, T_j)$, $\sum_{j=1}^{M} \pi_j = 1$. Typical approach: priors on all GMM parameters. Means: $\mu_j \sim N(m, S)$, $p(\mu) = \prod_{j=1}^{M} p(\mu_j)$. Precisions: $T_j = \Sigma_j^{-1} \sim \text{Wishart}(v, V)$, $p(T) = \prod_{j=1}^{M} p(T_j)$. Weights: $\pi = (\pi_1, \dots, \pi_M) \sim \text{Dirichlet}(a_1, \dots, a_M)$.
  • Bayesian GMM training. Parameters Θ become (hidden) RVs: H = {Z, Θ}. Objective: compute the posteriors P(Z|X), P(Θ|X) (intractable). Approximations: sampling (RJMCMC), MAP approach, variational approach. MAP approximation: mode of the posterior P(Θ|X) (MAP-EM), $\Theta_{MAP} = \arg\max_{\Theta} \{ \log P(X \mid \Theta) + \log P(\Theta) \}$, then compute P(Z|X, Θ_MAP).
  • Variational Inference (no parameters). Computes an approximation q(H) of the true posterior P(H|X). For any pdf q(H): $\ln p(X) = F(q) + KL\left( q(H) \,\|\, P(H \mid X) \right)$. Variational bound (F) maximization: $q^* = \arg\max_q F(q) = \arg\max_q \int q(H) \ln \dfrac{p(X, H)}{q(H)} \, dH$. Mean-field approximation: $q(H) = \prod_k q(H_k)$ with $q(H_k) \propto \exp \langle \ln p(X, H) \rangle_{\prod_{l \ne k} q(H_l)}$, giving a system of coupled equations. [D. Tzikas, A. Likas, N. Galatsanos, IEEE Signal Processing Magazine, 2008]
  • Variational Inference (with parameters). X data, H hidden RVs, Θ parameters. For any pdf q(H; Θ): $\ln p(X; \Theta) = F(q, \Theta) + KL\left( q(H; \Theta) \,\|\, p(H \mid X; \Theta) \right)$. Maximization of the variational bound $F(q, \Theta) = \int q(H; \Theta) \ln \dfrac{p(X, H; \Theta)}{q(H; \Theta)} \, dH \le \ln p(X; \Theta)$. Variational EM. VE-step: $q^{(t+1)} = \arg\max_q F(q, \Theta^{(t)})$. VM-step: $\Theta^{(t+1)} = \arg\max_{\Theta} F(q^{(t+1)}, \Theta)$.
  • Bayesian GMM training. Bayesian GMMs (no parameters): mean-field variational approximation; tackles the covariance singularity problem; requires specifying the parameters of the priors. Estimating the number of components: start with a large number of components and let the training process prune redundant components (πj = 0); however, a Dirichlet prior on πj prevents component pruning.
  • Bayesian GMM without a prior on π. Mixing weights πj are treated as parameters (the Dirichlet prior is removed). Training uses variational EM (the C-B method): start with a large number of components, perform variational maximization of the marginal likelihood, and prune redundant components (πj = 0); only components that fit the data well are finally retained.
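For intuition about the "start large and let redundant components die" behaviour, scikit-learn's variational Bayesian GMM can be used. Note that it keeps a Dirichlet (or Dirichlet-process) prior on the weights, so it is not the C-B variant described above, only an illustration of weight-based pruning.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Toy data drawn from 3 true clusters.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(200, 2))
               for c in ([0, 0], [3, 0], [0, 3])])

# Start with many more components than needed; variational training
# drives the weights of redundant components toward zero.
bgmm = BayesianGaussianMixture(n_components=10,
                               weight_concentration_prior=1e-3,
                               max_iter=500, random_state=0).fit(X)
print(np.round(bgmm.weights_, 3))   # only a few components keep significant weight
```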
  • Bayesian GMM (C-B). The results of the C-B method depend on the number of initial components, the initialization of the components, and the specification of the scale matrix V of the Wishart prior p(T).
  • Incremental Bayesian GMM. Solution: incremental training using component splitting, with a local scale matrix V based on the variance of the component to be split. A modification of the Bayesian GMM is needed: divide the components into 'fixed' and 'free'; place a prior on the weights of 'fixed' components (retained) and no prior on the weights of 'free' components (which may be eliminated); pruning is restricted to the 'free' components. [C. Constantinopoulos & A. Likas, IEEE Trans. on Neural Networks, 2007]
  • Incremental Bayesian GMM
  • Incremental Bayesian GMM. Start with k = 1 component. At each step: select a component j; split component j into two subcomponents; set the scale matrix V proportional to Σj; apply variational EM treating the two subcomponents as free and the remaining components as fixed; either both subcomponents are retained and adjusted, or one of them is eliminated and the other recovers the original component (before the split). Repeat until all components have been tested for splitting unsuccessfully. A simplified sketch of such a loop is given below.
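The following is a simplified, runnable sketch of a split-and-test loop. It replaces the variational fixed/free acceptance criterion with a BIC comparison and uses scikit-learn's GaussianMixture, so it is a stand-in for the idea, not the Constantinopoulos-Likas algorithm itself.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def split_means(gmm, j):
    """Split component j along its principal axis to seed two subcomponents."""
    w, v = np.linalg.eigh(gmm.covariances_[j])
    offset = np.sqrt(w[-1]) * v[:, -1]
    means = [m for i, m in enumerate(gmm.means_) if i != j]
    return np.vstack(means + [gmm.means_[j] - offset, gmm.means_[j] + offset])

def incremental_gmm(X, max_components=10):
    """Greedy split-and-test loop in the spirit of the scheme above.
    Acceptance uses BIC instead of the variational fixed/free criterion."""
    best = GaussianMixture(n_components=1).fit(X)
    for k in range(1, max_components):
        j = int(np.argmax(best.weights_))                 # component to split (simple heuristic)
        cand = GaussianMixture(n_components=k + 1,
                               means_init=split_means(best, j)).fit(X)
        if cand.bic(X) >= best.bic(X):                    # split rejected: stop
            break
        best = cand
    return best

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.4, (150, 2)) for c in ([0, 0], [4, 0], [0, 4])])
print(incremental_gmm(X).n_components)    # typically 3 on this toy data
```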
  • Mixture Models for Image Modeling. Select a feature representation; compute a feature vector per pixel to form the training set; build a mixture model for the image using the training set. Applications: image retrieval with relevance feedback, image segmentation, image registration.
  • Mixture Models for Image Segmentation. One cluster per mixture component; assign pixels to clusters based on P(j|x). Take spatial smoothness into account: neighbouring pixels are expected to have the same label. Simple way: add the pixel coordinates to the feature vector. Bayesian way: impose MRF priors (SVMM). A minimal sketch of the simple way follows below.
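A minimal sketch of the "simple way": color features plus scaled pixel coordinates, clustered by a GMM and labeled by the maximum posterior P(j|x). The synthetic two-region image is only a stand-in for real data.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic RGB "image": two noisy color regions (stand-in for a real image).
rng = np.random.default_rng(0)
H, W = 64, 64
img = np.zeros((H, W, 3))
img[:, :W // 2] = [0.8, 0.2, 0.2]
img[:, W // 2:] = [0.2, 0.2, 0.8]
img += 0.05 * rng.standard_normal(img.shape)

# Feature vector per pixel: color plus scaled (y, x) coordinates for smoothness.
yy, xx = np.mgrid[0:H, 0:W]
coords = np.stack([yy / H, xx / W], axis=-1)
features = np.concatenate([img, 0.5 * coords], axis=-1).reshape(-1, 5)

# One cluster per mixture component; label each pixel by argmax_j P(j | x).
gmm = GaussianMixture(n_components=2, random_state=0).fit(features)
labels = gmm.predict(features).reshape(H, W)
print(np.bincount(labels.ravel()))   # roughly two equal-sized segments
```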
  • Incremental Bayesian GMM: image segmentation. The number of segments is determined automatically.
  • Incremental Bayesian GMM: image segmentation. The number of segments is determined automatically.
  • Spatially Varying mixtures (1): $f(x^n \mid \Pi, \Theta) = \sum_{j=1}^{K} \pi_j^n \varphi(x^n \mid \theta_j)$, n = 1, 2, …, N, where $x^n$ is the image feature (e.g. pixel intensity), $\pi_j^n$ are the contextual mixing proportions, $\varphi(x \mid \theta_j)$ is a Gaussian parameterized by $\theta_j = \{\mu_j, \Sigma_j\}$, and $z_j^n$ is the data label (hidden variable).
  • Spatially Varying mixtures (2). Insight into the contextual mixing proportions: $\pi_j^n \equiv p(z_j^n = 1 \mid x^n)$. Smoothness is enforced in the image by imposing a prior p(Π) on the probabilities of the pixel labels (the contextual mixing proportions): $L(\Pi \mid X, \Theta) = \sum_{n=1}^{N} \log f(x^n \mid \Pi, \Theta) + \log p(\Pi)$.
  • SV-GMM with Gibbs prior (1). A typical constraint is the Gibbs prior $p(\Pi) = \dfrac{1}{Z} e^{-U(\Pi)}$, $U(\Pi) = \beta \sum_{n=1}^{N} V_{N_n}(\Pi)$, $V_{N_n}(\Pi) = \sum_{j=1}^{K} \sum_{m \in N_n} (\pi_j^n - \pi_j^m)^2$, where β is the smoothness weight. [K. Blekas, A. Likas, N. Galatsanos and I. Lagaris, IEEE Trans. Neural Networks, 2005]
  • SV-GMM with Gibbs prior (2)
  • SV-GMM with Gibbs prior (3). E-step: equivalent to the GMM case. M-step: the contextual mixing proportions are solutions of a quadratic equation. Note that: 1) the parameter β of the Gibbs prior must be determined beforehand; 2) the contextual mixing proportions are not constrained to be probability vectors, i.e. to satisfy $0 \le \pi_j^n \le 1$ and $\sum_{j=1}^{K} \pi_j^n = 1$ for n = 1, 2, …, N.
  • SV-GMM with Gibbs prior (4). To address these issues: 1) a class-adaptive Gauss-Markov random field prior; 2) projection of the probabilities onto the hyperplane $\sum_{j=1}^{K} \pi_j^n = 1$, n = 1, 2, …, N (another solution will be presented later on).
  • SV-GMM with Gauss-Markov prior (2). One variance per cluster j = 1, 2, …, K and per direction d = 0, 45, 90, 135 degrees: $p(\Pi) \propto \prod_{d=1}^{D} \prod_{j=1}^{K} \dfrac{1}{\sigma_{j,d}^{N}} \exp\left( - \sum_{n=1}^{N} \sum_{m \in N_n^d} \dfrac{(\pi_j^n - \pi_j^m)^2}{2 \sigma_{j,d}^2} \right)$. [C. Nikou, N. Galatsanos and A. Likas, IEEE Trans. Image Processing, 2007]
  • SV-GMM with Gauss-Markov prior (3)
  • MAP estimation. The posterior probabilities (contextual mixing proportions) are the non-negative solutions of the second-degree equation $|N_n| \left( \sum_{d=1}^{D} \prod_{p \ne d} \sigma_{j,p}^2 \right) (\pi_j^n)^2 - \left( \sum_{d=1}^{D} \prod_{p \ne d} \sigma_{j,p}^2 \sum_{m \in N_n^d} \pi_j^m \right) \pi_j^n - \langle z_j^n \rangle \prod_{d=1}^{D} \sigma_{j,d}^2 = 0$. There is always a non-negative solution. Projection onto the hyperplane: $\sum_{j=1}^{K} \pi_j^n = 1$, n = 1, 2, …, N.
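The two computational steps of this M-step, taking the non-negative root of a per-pixel quadratic and projecting each pixel's vector of contextual mixing proportions onto the hyperplane, can be sketched as follows. The quadratic coefficients are passed in abstractly (in the actual model they come from the neighborhood sums above), so this is an illustration rather than the published implementation.

```python
import numpy as np

def nonnegative_root(a, b, c):
    """Non-negative root of a*pi^2 + b*pi + c = 0 (element-wise over pixels/classes)."""
    disc = np.sqrt(b ** 2 - 4 * a * c)
    roots = np.stack([(-b + disc) / (2 * a), (-b - disc) / (2 * a)])
    return np.max(np.where(roots >= 0, roots, 0.0), axis=0)

def project_to_hyperplane(pi):
    """Project each row of pi onto the hyperplane sum_j pi_j = 1."""
    K = pi.shape[1]
    return pi - (pi.sum(axis=1, keepdims=True) - 1.0) / K

# Toy usage: coefficients per (pixel, class) would come from the M-step sums.
rng = np.random.default_rng(0)
a = rng.uniform(1.0, 2.0, (5, 3))
b = rng.uniform(-1.0, 1.0, (5, 3))
c = -rng.uniform(0.1, 1.0, (5, 3))          # c <= 0 guarantees a real, non-negative root
pi = project_to_hyperplane(nonnegative_root(a, b, c))
print(pi.sum(axis=1))                        # each row now sums to 1
```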
  • RGB image segmentation (1). Original image; R-SNR = 2 dB, G-SNR = 4 dB, B-SNR = 3 dB.
  • RGB image segmentation (2). Noise-free image segmentation: SVFMM vs. CA-SVFMM.
  • RGB image segmentation (3). Degraded image segmentation: SVFMM vs. CA-SVFMM (β determined by trial and error).
  • RGB image segmentation (4). The shading effect on the cupola and the wall is modeled with an SVFMM with a GMRF prior. βj (×10^-3): cupola 128, sky 33, wall 119.
  • SV-GMM with DCM prior (1). For pixel n, the class label is a random variable, multinomially distributed: $p(z^n \mid \xi^n) = \dfrac{M!}{\prod_{j=1}^{K} z_j^n!} \prod_{j=1}^{K} (\xi_j^n)^{z_j^n}$, with $\xi_j^n \ge 0$, $\sum_{j=1}^{K} \xi_j^n = 1$, n = 1, …, N, parameterized by the probability vector $\xi^n = (\xi_1^n, \xi_2^n, \dots, \xi_K^n)^T$. The whole image is parameterized by $\Xi = (\xi^1, \xi^2, \dots, \xi^N)$.
  • SV-GMM with DCM prior (2). $p(z^n \mid \xi^n) = \dfrac{M!}{\prod_{j=1}^{K} z_j^n!} \prod_{j=1}^{K} (\xi_j^n)^{z_j^n}$, with $\xi_j^n \ge 0$, $\sum_{j=1}^{K} \xi_j^n = 1$, n = 1, …, N. Generative model for the image: a multinomial distribution with K possible outcomes; class label j (j = 1, …, K) appears with probability $\xi_j^n$; M realizations of the process; the distribution of the counts of a given class is binomial.
  • SV-GMM with DCM prior (3). The Dirichlet distribution is the conjugate prior for the multinomial distribution: the posterior $p(\xi \mid x)$ has the same functional form as the prior $p(\xi)$, $p(\xi \mid x) = \dfrac{p(x \mid \xi) p(\xi)}{\int p(x \mid \xi) p(\xi) \, d\xi}$. [C. Nikou, A. Likas and N. Galatsanos, IEEE Trans. Image Processing, 2010]
  • SV-GMM with DCM prior (4). It is natural to impose a Dirichlet prior on the parameters of the multinomial pdf: $p(\xi^n \mid a^n) = \dfrac{\Gamma\left( \sum_{j=1}^{K} a_j^n \right)}{\prod_{j=1}^{K} \Gamma(a_j^n)} \prod_{j=1}^{K} (\xi_j^n)^{a_j^n - 1}$, $a_j^n > 0$, n = 1, …, N, j = 1, …, K, parameterized by the vector $a^n = (a_1^n, a_2^n, \dots, a_K^n)^T$.
  • SV-GMM with DCM prior (5). Marginalizing the parameters of the multinomial, $p(z^n \mid a^n) = \int p(z^n \mid \xi^n) p(\xi^n \mid a^n) \, d\xi^n$, n = 1, 2, …, N, yields the Dirichlet compound multinomial distribution for the class labels: $p(z^n \mid a^n) = \dfrac{M! \, \Gamma\left( \sum_{j=1}^{K} a_j^n \right)}{\Gamma\left( M + \sum_{j=1}^{K} a_j^n \right)} \prod_{j=1}^{K} \dfrac{\Gamma(z_j^n + a_j^n)}{z_j^n! \, \Gamma(a_j^n)}$, n = 1, …, N.
  • SV-GMM with DCM prior (6). Image model: for a given pixel, its class j is determined by M = 1 realization of the process, i.e. $z_j^n = 1$ for the selected class and $z_m^n = 0$ for $m \ne j$, m = 1, 2, …, K. The DCM prior for the class label then becomes $p(z_j^n = 1 \mid a^n) = \dfrac{a_j^n}{\sum_{m=1}^{K} a_m^n}$, j = 1, …, K.
  • SV-GMM with DCM prior (7). The model becomes spatially varying by imposing a GMRF prior on the parameters of the Dirichlet pdf: $p(A) \propto \prod_{j=1}^{K} \dfrac{1}{\sigma_j^{N}} \exp\left( - \sum_{n=1}^{N} \sum_{m \in N_n} \dfrac{(a_j^n - a_j^m)^2}{2 \sigma_j^2} \right)$. [C. Nikou, A. Likas and N. Galatsanos, IEEE Trans. Image Processing, 2010]
  • SV-GMM with DCM prior (8)
  • MAP estimation Posterior probabilities are the non-negative solutions toQ   jam   n mNi  n 2   n j  a m   mN i j  n z j  j  j n n 2  0  ( a n )3     j   (a j )    (a j )  | N |  0a j n j | Nn | | Nn |     n     K  n j   am , n  1, 2,..., N n m 1 m j There is always a non-negative solution. No need for projection!
  • Natural image segmentation (1). Berkeley image database (300 images); ground truth: human segmentations. Features: MRF features (7×7 windows × 3 components, giving a 147-dimensional vector; PCA on a single image; 8 principal components kept).
  • Natural image segmentation (2)
  • Natural image segmentation (3) MRF features
  • Natural image segmentation (4) MRF features
  • Natural image segmentation (6)
  • Natural image segmentation (7)
  • Natural image segmentation (8)
  • Natural image segmentation (9)
  • Results (K=5)
  • Segmentation and recovery (1). Berkeley image database; additive white Gaussian noise with SNR between -4 dB and 12 dB; MRF features. Evaluation indices: PR (probabilistic Rand index), VI (variation of information), GCE (global consistency error), BDE (boundary displacement error).
  • Segmentation and recovery (2) PR index (K=5)
  • Line processes (1). Image recovery: estimate a smooth function from noisy observations. Observations: d; function to be estimated: u. $\min_u \sum_i \left[ (d_i - u_i)^2 + \lambda (u_i - u_{i-1})^2 \right]$: a data fidelity term plus a smoothness term. Solved by the calculus of variations (Euler-Lagrange equations).
  • Line processes (2). In the presence of many edges (piecewise smoothness) the standard solution is not satisfactory, and a line process is integrated: $\min_{u, l} \sum_i \left[ (d_i - u_i)^2 + \lambda (u_i - u_{i-1})^2 (1 - l_i) + \alpha l_i \right]$, where $l_i = 0$ (non-edge) includes the smoothness term and $l_i = 1$ (edge) adds the penalty α. There are many local minima (due to the simultaneous estimation of u and l), so the calculus of variations cannot be applied. A small alternating-minimization sketch follows below.
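A small 1-D sketch of this energy, minimized by alternating between u (a linear system, since the energy is quadratic in u for fixed l) and a binary line process l (each l_i set independently by comparing λ(u_i − u_{i-1})² with α). This illustrates the classical formulation above, not the GMM-based continuous line process of the next slides.

```python
import numpy as np

def restore_with_line_process(d, lam=5.0, alpha=0.5, iters=10):
    """Alternating minimization of
       sum_i (d_i - u_i)^2 + lam*(u_i - u_{i-1})^2*(1 - l_i) + alpha*l_i
    for a 1-D signal d, with a binary line process l."""
    n = len(d)
    D = np.diff(np.eye(n), axis=0)            # first-difference operator, (n-1) x n
    u, l = d.copy(), np.zeros(n - 1)
    for _ in range(iters):
        # u-step: quadratic in u given l  ->  (I + lam * D^T W D) u = d
        W = np.diag(1.0 - l)
        u = np.linalg.solve(np.eye(n) + lam * D.T @ W @ D, d)
        # l-step: each l_i independently; declare an edge where it is cheaper
        l = (lam * (D @ u) ** 2 > alpha).astype(float)
    return u, l

# Piecewise-constant signal plus noise; edges should appear at the jump.
rng = np.random.default_rng(0)
d = np.concatenate([np.zeros(50), 2.0 * np.ones(50)]) + 0.1 * rng.standard_normal(100)
u, l = restore_with_line_process(d)
print(np.flatnonzero(l))   # indices flagged as edges (around index 49/50)
```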
  • Line processes (3) Milestones  [D. Geman and S. Geman 1984],  [A. Blake and A. Zisserman 1988],  [M. Black 1996 ]. Integration of la line process into a SV-GMM.  Continuous line process model on the contextual mixing proportions.  Gamma distributed line process variables.  Line process parameters are automatically estimated from the data (EM and Variational EM).
  • GMM with line process (2): line process.
  • GMM with continuous line process (1). Student's-t prior on the local differences of the contextual mixing proportions: $\pi_j^n - \pi_j^k \sim St(0, \sigma_{jd}^2, \nu_{jd})$ for all d, n, j and $k \in \gamma_d(n)$. Distinct priors for each image class and each neighborhood direction (horizontal, vertical).
  • GMM with continuous line process (2). Equivalently, at each pixel n: $\pi_j^n - \pi_j^k \sim N(0, \sigma_{jd}^2 / u_j^{nk})$, $u_j^{nk} \sim \text{Gamma}(\nu_{jd}/2, \nu_{jd}/2)$, for d, n, j, $k \in \gamma_d(n)$. Joint distribution: $p(\Pi; \Sigma, \nu) = \prod_{j=1}^{K} \prod_{n=1}^{N} \prod_{d=1}^{D} \prod_{k \in \gamma_d(n)} St(\pi_j^n \mid \pi_j^k; \sigma_{jd}^2, \nu_{jd})$.
  • GMM with continuous line process (3)
  • GMM with continuous line process (4). Description of the edge structure: a continuous generalization of a binary line process. Large $u_j^{nk}$: weak class variance (smoothness). $u_j^{nk} \to 0$: uninformative prior (no smoothness), i.e. separation of class j from the remaining classes. [G. Sfikas, C. Nikou and N. Galatsanos, IEEE CVPR, 2008]
  • Edges between segments (1)
  • Edges between segments (2). Horizontal and vertical differences for the sky, cupola and building classes.
  • Numerical results (1). Berkeley images, Rand index (RI).
  • Image registration. Estimate the transformation $T_\Theta$ mapping the coordinates of an image $I_1$ to a target image $I_2$: $I_2(x, y, z) = I_1\left( T_\Theta(x, y, z) \right)$. $T_\Theta$ is described by a set of parameters Θ.
  • Image similarity measure $E(\Theta)$, comparing $I_1\left( T_\Theta(x, y, z) \right)$ with $I_2(x, y, z)$. Single-modal images: quadratic error, correlation, Fourier transform, sign changes. Multimodal images: inter-image uniformity, mutual information (MI), normalized MI.
  • Fundamental hypothesis: correspondence between uniform regions in the two images. Partition the image to be registered (not necessarily into joint regions); project the partition onto the reference image; evaluate a distance between the overlapping regions, which is minimal at correct alignment, and minimize that distance.
  • Distance between GMMs (1). A straightforward approach: with $G_1(x \mid \Pi_1, \Theta_1) = \sum_{m=1}^{M} \pi_m^1 \varphi_m^1(x)$ and $G_2(x \mid \Pi_2, \Theta_2) = \sum_{n=1}^{N} \pi_n^2 \varphi_n^2(x)$, define $E(G_1, G_2) = \sum_{m=1}^{M} \sum_{n=1}^{N} \pi_m^1 \pi_n^2 \, B(\varphi_m^1, \varphi_n^2)$, where B is the Bhattacharyya distance.
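The component-to-component term B above is the Bhattacharyya distance between two Gaussians, which has a closed form. A short NumPy sketch with illustrative parameter values:

```python
import numpy as np

def bhattacharyya_gaussians(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between N(mu1, cov1) and N(mu2, cov2)."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    maha = 0.125 * diff @ np.linalg.solve(cov, diff)
    logdet = 0.5 * np.log(np.linalg.det(cov) /
                          np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return maha + logdet

# The distance grows with mean separation and with covariance mismatch.
print(bhattacharyya_gaussians(np.zeros(2), np.eye(2),
                              np.array([1.0, 0.0]), 2.0 * np.eye(2)))
```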
  • Distance between GMMs (2). Knowing the correspondences allows the definition of $E(G_1, G_2) = \sum_{k=1}^{K} B\left( \varphi^1(T_\Theta, k), \varphi^2(k) \right)$, where the first argument is the distribution of the pixels of the transformed floating image overlapping with the k-th component of the reference image, and the second is the k-th component of the reference image.
  • Energy function (1). For a set of transformation parameters Θ: segment the image to be registered into K segments by a GMM (or SMM); for each segment, project its pixels onto the reference image and compute the mean and covariance of the reference image pixels under the projection mask; evaluate the distance between the distributions.
  • Energy function (2). Find the transformation parameters Θ: $\min_{\Theta} \sum_{k=1}^{K} B\left( \varphi^1(T_\Theta, k), \varphi^2(k) \right)$. Optimization by the simplex method, Powell's method or ICM. [D. Gerogiannis, C. Nikou and A. Likas, Image and Vision Computing, 2009]
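A hedged sketch of the outer optimization loop: a registration energy minimized with Powell's method via scipy.optimize.minimize. To keep it self-contained, the energy below compares single Gaussians fitted to a reference point set and to the rigidly transformed floating set, a one-segment stand-in for the K-segment Bhattacharyya energy above.

```python
import numpy as np
from scipy.optimize import minimize

def bhatt(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two Gaussians (closed form)."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    return (0.125 * diff @ np.linalg.solve(cov, diff)
            + 0.5 * np.log(np.linalg.det(cov)
                           / np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2))))

def transform(points, theta):
    """Rigid 2-D transform: theta = (tx, ty, angle in radians)."""
    tx, ty, a = theta
    R = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    return points @ R.T + np.array([tx, ty])

rng = np.random.default_rng(0)
reference = rng.normal(size=(300, 2)) @ np.array([[2.0, 0.3], [0.0, 1.0]])
floating = transform(reference, (1.5, -0.8, 0.4))      # known misalignment

def energy(theta):
    """Single-segment stand-in for the K-segment Bhattacharyya energy."""
    moved = transform(floating, theta)
    return bhatt(moved.mean(0), np.cov(moved.T), reference.mean(0), np.cov(reference.T))

result = minimize(energy, x0=np.zeros(3), method="Powell")
print(result.x)   # an estimate of the inverse misalignment (angle close to -0.4)
```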
  • Convexity of the Bhattacharyya distance
  • Registration error (Gaussian noise). MMBIA 2007, Rio de Janeiro, Brazil.
  • Registration error
  • Registration of point clouds. The correspondence is unknown: either use a greedy distance between the mixtures, or determine the correspondence explicitly (Hungarian algorithm).
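When the component correspondence between the two mixtures is unknown, it can be recovered with the Hungarian algorithm on a pairwise cost matrix, e.g. via SciPy's linear_sum_assignment. The costs below are random placeholders standing in for the pairwise Bhattacharyya distances between components.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Placeholder cost matrix: cost[m, n] would be the Bhattacharyya distance
# between component m of mixture G1 and component n of mixture G2.
rng = np.random.default_rng(0)
cost = rng.uniform(size=(5, 5))

rows, cols = linear_sum_assignment(cost)       # Hungarian algorithm
print(list(zip(rows, cols)), cost[rows, cols].sum())
```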
  • Experimental results: initial point set, 2% outliers + uniform noise.
  • Greedy distance
  • Hungarian algorithm
  • Conclusions. Application of mixture models to image segmentation and image registration. Other applications: image retrieval, visual tracking, and more.