k-MLE: 
A fast algorithm for learning statistical mixture models 
(arXiv:1203.5181) 
Frank NIELSEN 
Sony Computer Science Laboratories, Inc. 
28th March 2012 
International Conference on Acoustics, Speech, and Signal Processing 
ICASSP, Kyoto ICC 

Outline
► Background
  ► Statistical mixtures of exponential families (EFMMs)
  ► Legendre transform and mixture dual parameterizations
► Contributions
  ► k-MLE and its variants
  ► k-MLE initialization (k-MLE++)
  ► Summary

Exponential Family Mixture Models (EFMMs)
Generalize Gaussian & Rayleigh MMs to many common distributions.

m(x) = \sum_{i=1}^k w_i p_F(x; \theta_i), with \forall i, w_i > 0 and \sum_{i=1}^k w_i = 1

p_F(x; \theta) = e^{\langle t(x), \theta \rangle - F(\theta) + k(x)}

F: log-Laplace transform (log-partition, cumulant function):

\int_{x \in \mathcal{X}} p_F(x; \theta) dx = 1 \Rightarrow F(\theta) = \log \int_{x \in \mathcal{X}} e^{\langle t(x), \theta \rangle + k(x)} dx

\theta \in \Theta = \left\{ \theta \,\middle|\, \int_{x \in \mathcal{X}} e^{\langle t(x), \theta \rangle + k(x)} dx < \infty \right\}, the natural parameter space.

► d: dimension of the support \mathcal{X}.
► D: order of the family (= \dim \Theta). Sufficient statistic: t(x): \mathbb{R}^d \to \mathbb{R}^D.

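To make the canonical decomposition concrete, here is a minimal Python/NumPy sketch of my own (not from the talk) that evaluates an EFMM density from user-supplied (t, F, k), instantiated with the Poisson family, for which t(x) = x, \theta = \log\lambda, F(\theta) = e^\theta and k(x) = -\log x!:

```python
import numpy as np
from math import lgamma

def ef_density(x, theta, t, F, k):
    """p_F(x; theta) = exp(<t(x), theta> - F(theta) + k(x))."""
    return np.exp(np.dot(t(x), theta) - F(theta) + k(x))

def efmm_density(x, weights, thetas, t, F, k):
    """Mixture density m(x) = sum_i w_i p_F(x; theta_i)."""
    return sum(w * ef_density(x, th, t, F, k) for w, th in zip(weights, thetas))

# Poisson as an exponential family: theta = log(rate).
t = lambda x: x
F = lambda th: np.exp(th)
k = lambda x: -lgamma(x + 1)  # -log(x!)

# Two-component Poisson mixture with rates 2 and 7:
print(efmm_density(4, [0.3, 0.7], [np.log(2.0), np.log(7.0)], t, F, k))
```

Swapping in the (t, F, k) triples of the next two slides gives Rayleigh and Gaussian mixtures with no other change.
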
Statistical mixtures: Rayleigh MMs [7, 5]
IntraVascular UltraSound (IVUS) imaging.

Rayleigh distribution:
p(x; \sigma) = \frac{x}{\sigma^2} e^{-\frac{x^2}{2\sigma^2}}, \quad x \in \mathbb{R}^+ = \mathcal{X}
(Weibull with shape parameter k = 2)

d = 1 (univariate), D = 1 (order 1)
\theta = -\frac{1}{2\sigma^2}, \quad \Theta = (-\infty, 0)
F(\theta) = -\log(-2\theta)
t(x) = x^2
k(x) = \log x

Coronary plaques: fibrotic tissues, calcified tissues, lipidic tissues.
Rayleigh Mixture Models (RMMs): for segmentation and classification tasks.

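A quick numerical sanity check of my own that the (t, F, k) decomposition above reproduces the textbook Rayleigh density:

```python
import numpy as np

def rayleigh_pdf(x, sigma):
    return (x / sigma**2) * np.exp(-x**2 / (2 * sigma**2))

def rayleigh_ef_pdf(x, theta):
    # t(x) = x^2, F(theta) = -log(-2 theta), k(x) = log x
    return np.exp(x**2 * theta + np.log(-2 * theta) + np.log(x))

sigma = 1.5
theta = -1.0 / (2 * sigma**2)
x = np.linspace(0.1, 5.0, 50)
print(np.allclose(rayleigh_pdf(x, sigma), rayleigh_ef_pdf(x, theta)))  # True
```
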
Statistical mixtures: Gaussian MMs [3, 5]
Gaussian mixture models (GMMs).
Color image interpreted as a 5D xyRGB point set.

Gaussian distribution:
p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} \sqrt{|\Sigma|}} e^{-\frac{1}{2} D_{\Sigma^{-1}}(x - \mu, x - \mu)}
Squared Mahalanobis distance: D_Q(x, y) = (x - y)^T Q (x - y)

x \in \mathbb{R}^d = \mathcal{X}, d (multivariate), D = \frac{d(d+3)}{2} (order)
\theta = (\Sigma^{-1}\mu, \frac{1}{2}\Sigma^{-1}) = (\theta_v, \theta_M)
\Theta = \mathbb{R}^d \times S_{++}^d
F(\theta) = \frac{1}{4} \theta_v^T \theta_M^{-1} \theta_v - \frac{1}{2} \log |\theta_M| + \frac{d}{2} \log \pi
t(x) = (x, -x x^T)
k(x) = 0

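Likewise, a small check of my own that the natural parameters (\theta_v, \theta_M) and the cumulant F(\theta) above reproduce the Gaussian density:

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)  # squared Mahalanobis distance
    return np.exp(-0.5 * quad) / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma)))

def gaussian_ef_pdf(x, theta_v, theta_M):
    d = len(theta_v)
    F = (0.25 * theta_v @ np.linalg.solve(theta_M, theta_v)
         - 0.5 * np.log(np.linalg.det(theta_M)) + 0.5 * d * np.log(np.pi))
    inner = x @ theta_v - np.sum(np.outer(x, x) * theta_M)  # <t(x), theta>
    return np.exp(inner - F)  # k(x) = 0

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
theta_v = np.linalg.solve(Sigma, mu)   # Sigma^{-1} mu
theta_M = 0.5 * np.linalg.inv(Sigma)   # (1/2) Sigma^{-1}
x = np.array([0.5, 0.2])
print(np.isclose(gaussian_pdf(x, mu, Sigma), gaussian_ef_pdf(x, theta_v, theta_M)))
```
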
Sampling from a Gaussian Mixture Model (GMM)
To sample a variate x from a GMM:
► Choose a component l according to the weight distribution w_1, ..., w_k,
► Draw a variate x according to N(\mu_l, \Sigma_l).

Doubly stochastic process (a sketch follows below):
1. throw a (biased) die with k faces to choose the component:
l \sim \mathrm{Multinomial}(w_1, ..., w_k)
(the multinomial distribution also belongs to the exponential families.)
2. then draw at random a variate x from the l-th component:
x \sim \mathrm{Normal}(\mu_l, \Sigma_l)
x = \mu_l + Cz with the Cholesky factorization \Sigma_l = CC^T and z = [z_1 ... z_d]^T a
standard normal random vector: z_i = \sqrt{-2 \log U_1} \cos(2\pi U_2) (Box-Muller transform)

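A compact NumPy sketch of this doubly stochastic sampler (my own illustration; the function name is mine), using the library's standard normal generator in place of an explicit Box-Muller step:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_gmm(n, weights, mus, Sigmas):
    """Draw n variates from a GMM via the doubly stochastic process."""
    k, d = len(weights), len(mus[0])
    labels = rng.choice(k, size=n, p=weights)        # 1. biased k-faced die
    chols = [np.linalg.cholesky(S) for S in Sigmas]  # Sigma = C C^T
    z = rng.standard_normal((n, d))                  # 2. standard normal vectors
    return np.array([mus[l] + chols[l] @ z[i] for i, l in enumerate(labels)])

X = sample_gmm(1000, [0.4, 0.6],
               [np.zeros(2), np.array([3.0, 3.0])],
               [np.eye(2), np.array([[1.0, 0.5], [0.5, 1.0]])])
print(X.mean(axis=0))
```
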
Statistical mixtures: Generative models of data sets
GMM = feature descriptor for information retrieval (IR)
→ classification, matching, etc.
Increase dimension using color image patches.
Low-frequency information encoded into a compact statistical model.
Generative model → statistical image by GMM sampling.
(Figure: source image, fitted GMM, and a sample drawn from it.)

Distance between exponential families: Relative entropy
► Distance between features (e.g., GMMs)
► Kullback-Leibler divergence (cross-entropy minus entropy):

KL(P : Q) = \int p(x) \log \frac{p(x)}{q(x)} dx \geq 0
= \underbrace{\int p(x) \log \frac{1}{q(x)} dx}_{H^\times(P:Q)} - \underbrace{\int p(x) \log \frac{1}{p(x)} dx}_{H(p) = H^\times(P:P)}
= F(\theta_Q) - F(\theta_P) - \langle \theta_Q - \theta_P, \nabla F(\theta_P) \rangle
= B_F(\theta_Q : \theta_P)

Bregman divergence B_F defined for a strictly convex and differentiable function (up to some affine terms).
► Proof that KL(P : Q) = B_F(\theta_Q : \theta_P) follows from
X \sim E_F(\theta) \Longrightarrow E[t(X)] = \nabla F(\theta)

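The identity KL(P : Q) = B_F(\theta_Q : \theta_P) can be checked numerically; below is a sketch of my own for the Rayleigh family (F(\theta) = -\log(-2\theta), \nabla F(\theta) = -1/\theta), comparing a Monte Carlo estimate of the KL integral against the Bregman formula:

```python
import numpy as np

rng = np.random.default_rng(1)

F = lambda th: -np.log(-2 * th)  # Rayleigh cumulant function
gradF = lambda th: -1.0 / th     # eta = gradF(theta) = 2 sigma^2

def bregman(tq, tp):
    """B_F(theta_Q : theta_P) = F(tq) - F(tp) - <tq - tp, gradF(tp)>."""
    return F(tq) - F(tp) - (tq - tp) * gradF(tp)

sig_p, sig_q = 1.0, 2.0
tp, tq = -1 / (2 * sig_p**2), -1 / (2 * sig_q**2)

# Monte Carlo estimate of KL(P : Q) = E_P[log p(X) - log q(X)]:
x = rng.rayleigh(scale=sig_p, size=100000)
log_p = np.log(x / sig_p**2) - x**2 / (2 * sig_p**2)
log_q = np.log(x / sig_q**2) - x**2 / (2 * sig_q**2)
print(np.mean(log_p - log_q), bregman(tq, tp))  # both close to 0.636
```
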
Bregman divergence: Geometric interpretation
Potential function F, graph plot \mathcal{F}: (x, F(x)).
D_F(p : q) = F(p) - F(q) - \langle p - q, \nabla F(q) \rangle
(the vertical gap at p between the graph of F and its tangent hyperplane at q)

Convex duality: Legendre transformation
► For a strictly convex and differentiable function F: \mathcal{X} \to \mathbb{R}:
F^*(y) = \sup_{x \in \mathcal{X}} \underbrace{\{\langle y, x \rangle - F(x)\}}_{l_F(y; x)}
► Maximum obtained for y = \nabla F(x):
\nabla_x l_F(y; x) = y - \nabla F(x) = 0 \Rightarrow y = \nabla F(x)
► Maximum unique from convexity of F (\nabla^2 F \succ 0):
\nabla_x^2 l_F(y; x) = -\nabla^2 F(x) \prec 0
► Convex conjugates:
(F, \mathcal{X}) \Leftrightarrow (F^*, \mathcal{Y}), \quad \mathcal{Y} = \{\nabla F(x) \mid x \in \mathcal{X}\}

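As a concrete sketch (my own, assuming SciPy is available), the supremum can be computed numerically and compared with a known closed form: for F(x) = x \log x the conjugate is F^*(y) = e^{y-1}:

```python
import numpy as np
from scipy.optimize import minimize_scalar

F = lambda x: x * np.log(x)  # strictly convex on (0, +inf)

def conjugate(F, y):
    """F*(y) = sup_x { <y, x> - F(x) }, computed by bounded 1D minimization."""
    res = minimize_scalar(lambda x: F(x) - y * x, bounds=(1e-9, 1e6), method="bounded")
    return -res.fun

y = 0.7
print(conjugate(F, y), np.exp(y - 1))  # both close to 0.7408
```
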
Legendre duality ↔ Canonical divergence
► Convex conjugates have functional inverse gradients:
(\nabla F)^{-1} = \nabla F^*
\nabla F^* may require numerical approximation
(not always available in analytical closed form)
► Involution: (F^*)^* = F.
► Convex conjugate F^* expressed using (\nabla F)^{-1}:
F^*(y) = \langle (\nabla F)^{-1}(y), y \rangle - F((\nabla F)^{-1}(y))
► Fenchel-Young inequality at the heart of the canonical divergence:
F(x) + F^*(y) \geq \langle x, y \rangle
A_F(x : y) = A_{F^*}(y : x) = F(x) + F^*(y) - \langle x, y \rangle \geq 0

Dual Bregman divergences ↔ canonical divergence [6]

KL(P : Q) = E_P\left[\log \frac{p(x)}{q(x)}\right] \geq 0
= B_F(\theta_Q : \theta_P) = B_{F^*}(\eta_P : \eta_Q)
= F(\theta_Q) + F^*(\eta_P) - \langle \theta_Q, \eta_P \rangle
= A_F(\theta_Q : \eta_P) = A_{F^*}(\eta_P : \theta_Q)

with \theta_Q (natural parameterization) and \eta_P = E_P[t(X)] = \nabla F(\theta_P) (moment parameterization).

Exponential family mixtures: Dual parameterizations
A finite weighted point set \{(w_i, \theta_i)\}_{i=1}^k in a statistical manifold.
Many coordinate systems for computing (two canonical):
► usual \lambda-parameterization,
► natural \theta-parameterization and dual \eta-parameterization.

Original parameters: \lambda \in \Lambda.
Exponential family dual parameterization via the Legendre transform: (\Theta, F) \leftrightarrow (H, F^*)
\theta \in \Theta (natural parameters) \leftrightarrow \eta \in H (expectation parameters)
\eta = \nabla F(\theta), \quad \theta = \nabla F^*(\eta)

Maximum Likelihood Estimator (MLE)
Given n independently and identically distributed observations X = \{x_1, ..., x_n\},
the Maximum Likelihood Estimator
\hat\theta = \arg\max_{\theta \in \Theta} \prod_{i=1}^n p_F(x_i; \theta) = \arg\max_{\theta \in \Theta} e^{\sum_{i=1}^n \langle t(x_i), \theta \rangle - F(\theta) + k(x_i)}
is the unique maximum since \nabla^2 F \succ 0 (Hessian):
\nabla F(\hat\theta) = \frac{1}{n} \sum_{i=1}^n t(x_i)
MLE is consistent, efficient with asymptotic normal distribution:
\hat\theta \sim N\left(\theta, \frac{1}{n} I^{-1}(\theta)\right)
Fisher information matrix: I(\theta) = \mathrm{var}[t(X)] = \nabla^2 F(\theta)
MLE may be biased (e.g., normal distributions).

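In code, the MLE is just the average sufficient statistic; a sketch of my own for the Rayleigh family, where \eta = \nabla F(\theta) = -1/\theta = 2\sigma^2:

```python
import numpy as np

rng = np.random.default_rng(2)

# Rayleigh MLE via the average sufficient statistic t(x) = x^2:
# gradF(theta_hat) = eta_hat = (1/n) sum_i t(x_i), with eta = 2 sigma^2.
x = rng.rayleigh(scale=1.5, size=100000)
eta_hat = np.mean(x**2)              # moment-parameter MLE
sigma_hat = np.sqrt(eta_hat / 2)     # invert eta = 2 sigma^2
theta_hat = -1 / (2 * sigma_hat**2)  # back to the natural parameter
print(sigma_hat)  # close to the true scale 1.5
```
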
Duality Bregman ↔ Exponential families [2]
Legendre duality \eta = \nabla F(\theta) links:
► the exponential family p_F(x|\theta) with cumulant function F(\theta),
► the Bregman divergence B_{F^*}(x : \eta) with Bregman generator F^*(\eta).

An exponential family
p_F(x; \theta) = \exp(\langle t(x), \theta \rangle - F(\theta) + k(x))
has its log-density interpreted as a Bregman divergence:
\log p_F(x; \theta) = -B_{F^*}(t(x) : \eta) + F^*(t(x)) + k(x)

Exponential families ↔ Bregman divergences: Examples

Generator F(x)   Exponential family p_F(x|θ)   ↔  Dual Bregman divergence B_{F^*}
x^2              Spherical Gaussian            ↔  Squared loss
x log x          Multinomial                   ↔  Kullback-Leibler divergence
x log x - x      Poisson                       ↔  I-divergence
-log(-2x)        Rayleigh                      ↔  Itakura-Saito divergence
-log x           Geometric                     ↔  Itakura-Saito divergence
log |X|          Wishart                       ↔  log-det/Burg matrix div. [8]

Maximum likelihood estimator revisited
\hat\theta = \arg\max_\theta \prod_{i=1}^n p_F(x_i; \theta) = \arg\max_\theta \sum_{i=1}^n \log p_F(x_i; \theta)
= \arg\max_\theta \sum_{i=1}^n (\langle t(x_i), \theta \rangle - F(\theta) + k(x_i))
= \arg\max_\theta \sum_{i=1}^n -B_{F^*}(t(x_i) : \eta) + \underbrace{F^*(t(x_i)) + k(x_i)}_{\text{constant}}
\equiv \arg\min_\eta \sum_{i=1}^n B_{F^*}(t(x_i) : \eta)
Right-sided Bregman centroid = center of mass: \hat\eta = \frac{1}{n} \sum_{i=1}^n t(x_i).

Bregman batched Lloyd's k-means [2]
Extends Lloyd's k-means heuristic to Bregman divergences (see the sketch after this slide).
► Initialize distinct seeds: C_1 = P_1, ..., C_k = P_k
► Repeat until convergence:
  ► Assign each point P to its "closest" centroid (w.r.t. B_F(P : C)):
    \mathcal{C}_i = \{P \in \mathcal{P} \mid B_F(P : C_i) \leq B_F(P : C_j) \; \forall j \neq i\}
  ► Update cluster centroids by taking their centers of mass:
    C_i = \frac{1}{|\mathcal{C}_i|} \sum_{P \in \mathcal{C}_i} P.

Loss function
L_F(\mathcal{P} : \mathcal{C}) = \sum_{P \in \mathcal{P}} B_F(P : \mathcal{C}), \quad B_F(P : \mathcal{C}) = \min_{i \in \{1,...,k\}} B_F(P : C_i)
...monotonically decreases and converges to a local optimum.
(Extend to weighted point sets using barycenters.)

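A minimal NumPy sketch of my own of these Bregman Lloyd iterations (no empty-cluster guard); with F(y) = \|y\|^2 it degenerates to ordinary k-means:

```python
import numpy as np

def bregman_kmeans(Y, k, F, gradF, iters=100, seed=0):
    """Lloyd's k-means under the Bregman divergence B_F (minimal sketch).
    Y: (n, D) array of points; F and gradF act row-wise on (., D) arrays."""
    rng = np.random.default_rng(seed)
    C = Y[rng.choice(len(Y), size=k, replace=False)]  # distinct seeds
    for _ in range(iters):
        # argmin_c B_F(y : c); the F(y) term is constant per point and dropped
        cost = (-F(C)[None, :] - np.einsum('nd,kd->nk', Y, gradF(C))
                + np.sum(C * gradF(C), axis=1)[None, :])
        z = np.argmin(cost, axis=1)                                  # assignment
        newC = np.array([Y[z == j].mean(axis=0) for j in range(k)])  # centers of mass
        if np.allclose(newC, C):
            break
        C = newC
    return C, z

# F(y) = ||y||^2 gives the squared Euclidean loss, i.e., ordinary k-means:
F = lambda Y: np.sum(Y**2, axis=-1)
gradF = lambda Y: 2 * Y
rng = np.random.default_rng(5)
Y = np.vstack([rng.standard_normal((100, 2)), rng.standard_normal((100, 2)) + 4])
C, z = bregman_kmeans(Y, 2, F, gradF)
print(C)
```
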
k-MLE for EFMM ≡ Bregman Hard Clustering [4]
Bijection exponential families (distributions) ↔ Bregman distances:
\log p_F(x; \theta) = -B_{F^*}(t(x) : \eta) + F^*(t(x)) + k(x), \quad \eta = \nabla F(\theta)
Bregman k-MLE for EFMMs (F) = additively weighted Bregman hard k-means for F^* on the space \{y_i = t(x_i)\}_i.
Maximize the complete log-likelihood \log \prod_{i=1}^n \prod_{j=1}^k (w_j p_F(x_i | \theta_j))^{\delta_j(z_i)}:

\max_{\theta, w} \sum_{i=1}^n \sum_{j=1}^k \delta_j(z_i) (\log p_F(x_i | \theta_j) + \log w_j)
\equiv \min_{\eta, w} \sum_{i=1}^n \sum_{j=1}^k \delta_j(z_i) \Big( (B_{F^*}(t(x_i) : \eta_j) - \log w_j) \underbrace{- k(x_i) - F^*(t(x_i))}_{\text{constant}} \Big)
\equiv \min_{\eta, w} \sum_{i=1}^n \min_{j=1}^k \big( B_{F^*}(t(x_i) : \eta_j) - \log w_j \big)
(the inner \min is the \arg\min that gives the z_i's)

Complete average log-likelihood optimization
Minimize monotonically the complete average log-likelihood:
\frac{1}{n} \min_{\eta, w} \sum_{i=1}^n \min_{j=1}^k \big( B_{F^*}(t(x_i) : \eta_j) - \log w_j \big)
► 1. Constant weights → dual additive Bregman k-means:
\frac{1}{n} \min_{\eta} \sum_{i=1}^n \min_{j=1}^k \big( B_{F^*}(t(x_i) : \eta_j) - \log w_j \big)
► 2. Component moment parameters \eta fixed:
\min_w \sum_{i=1}^n \sum_{j=1}^k -\delta_j(z_i) \log w_j = \min_w \sum_{j=1}^k -\alpha_j \log w_j,
where \alpha_j = \frac{|C_j|}{n}. That is, minimize the cross-entropy:
\min_w H^\times(\alpha : w) \Rightarrow w = \alpha.
► Go to 1 until (local) convergence is met.

k-MLE-EFMM algorithm [4]
► 0. Initialization: \forall i \in \{1, ..., k\}, let w_i = \frac{1}{k} and \eta_i = t(x_i)
(initialization is further discussed later on).
► 1. Assignment:
\forall i \in \{1, ..., n\}, z_i = \arg\min_{j=1}^k B_{F^*}(t(x_i) : \eta_j) - \log w_j.
Let C_i = \{x_j \mid z_j = i\}, \forall i \in \{1, ..., k\} be the cluster partition: \mathcal{X} = \cup_{i=1}^k C_i.
► 2. Update the \eta-parameters:
\forall i \in \{1, ..., k\}, \eta_i = \frac{1}{|C_i|} \sum_{x \in C_i} t(x).
Go to step 1 unless local convergence of the complete likelihood is reached.
► 3. Update the mixture weights: \forall i \in \{1, ..., k\}, w_i = \frac{|C_i|}{n}.
Go to step 1 unless local convergence of the complete likelihood is reached.

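Instantiating steps 0-3 for a Rayleigh mixture, where the dual divergence B_{F^*} is the Itakura-Saito divergence (see the table above), gives the following sketch of my own (fixed iteration count, no empty-cluster guard):

```python
import numpy as np

def kmle_rayleigh(x, k, iters=50, seed=0):
    """k-MLE sketch for a Rayleigh mixture: y = t(x) = x^2, and
    B_{F*}(y : eta) = y/eta - log(y/eta) - 1 (Itakura-Saito)."""
    rng = np.random.default_rng(seed)
    y = x**2
    eta = y[rng.choice(len(y), size=k, replace=False)]  # step 0: eta_i = t(x_i)
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        ratio = y[:, None] / eta[None, :]
        z = np.argmin(ratio - np.log(ratio) - 1 - np.log(w)[None, :], axis=1)  # step 1
        eta = np.array([y[z == j].mean() for j in range(k)])  # step 2: cluster MLEs
        w = np.bincount(z, minlength=k) / len(y)              # step 3: weights
    return w, np.sqrt(eta / 2)  # eta = 2 sigma^2 for the Rayleigh family

rng = np.random.default_rng(3)
x = np.concatenate([rng.rayleigh(1.0, 500), rng.rayleigh(3.0, 500)])
print(kmle_rayleigh(x, 2))
```
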
k-MLE initialization
► Forgy's random seeds \eta_i = t(x_i) (when d = D),
► Bregman k-means (for F^* on \mathcal{Y}, then MLE on each cluster).
Usually D > d (e.g., multivariate Gaussians: D = \frac{d(d+3)}{2})
► Compute the global MLE \hat\eta = \frac{1}{n} \sum_{i=1}^n t(x_i)
(well-defined for n > D → \hat\theta \in \Theta)
► Consider the restricted exponential family F_{\hat\theta^{(d+1...D)}}(\theta^{(1...d)}): set \eta_i^{(1...d)} = t^{(1...d)}(x_i) and fix \theta_i^{(d+1...D)} = \hat\theta^{(d+1...D)}.
(e.g., for Gaussians, fix the global covariance matrix and let \mu_i = x_i)
► Improve initialization by applying Bregman k-means++ [1] for the convex conjugate of F_{\hat\theta^{(d+1...D)}}(\theta^{(1...d)})
→ k-MLE++ based on Bregman k-means++ (sketched below)

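The k-means++ part can be sketched generically: pick each new seed with probability proportional to the divergence to its closest seed already chosen. A sketch of my own (the Itakura-Saito instantiation matches the Rayleigh example above):

```python
import numpy as np

def bregman_kmeanspp_seeds(Y, k, div, seed=0):
    """Bregman k-means++ style seeding (sketch): each new seed is drawn with
    probability proportional to its divergence to the closest seed so far."""
    rng = np.random.default_rng(seed)
    seeds = [Y[rng.integers(len(Y))]]
    for _ in range(k - 1):
        d = np.min(np.stack([div(Y, c) for c in seeds]), axis=0)
        seeds.append(Y[rng.choice(len(Y), p=d / d.sum())])
    return np.array(seeds)

# Itakura-Saito divergence, the dual Bregman divergence of the Rayleigh family:
is_div = lambda Y, c: Y / c - np.log(Y / c) - 1
Y = np.random.default_rng(4).rayleigh(2.0, size=500) ** 2  # sufficient statistics
print(bregman_kmeanspp_seeds(Y, 3, is_div))
```
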
k-MLE variants using any Bregman k-means heuristic
► Any k-means optimization heuristic allows one to update the mixture \eta-parameters:
  ► Hartigan & Wang's greedy swap (after Lloyd convergence)
  ► Kanungo et al.'s swap ((9 + \epsilon)-approximation)
► Updating successively the mixture \eta and w parameters yields a Hard EM variant
(easily implemented by winner-take-all EM weight membership)

k-MLE for MVNs with the (\mu, \Sigma) parameters
► 0. Initialization:
  ► Calculate the global mean \bar\mu and global covariance matrix \bar\Sigma:
    \bar\mu = \frac{1}{n} \sum_{i=1}^n x_i, \quad \bar\Sigma = \frac{1}{n} \sum_{i=1}^n x_i x_i^T - \bar\mu \bar\mu^T
  ► \forall i \in \{1, ..., k\}, initialize the i-th seed as (\mu_i = x_i, \Sigma_i = \bar\Sigma).
► 1. Assignment: \forall i \in \{1, ..., n\},
  z_i = \arg\min_{j=1}^k M_{\Sigma_j^{-1}}(x_i - \mu_j, x_i - \mu_j) + \log |\Sigma_j| - 2 \log w_j
  with M_Q the squared Mahalanobis distance: M_Q(x, y) = (x - y)^T Q (x - y).
  Let C_i = \{x_j \mid z_j = i\}, \forall i \in \{1, ..., k\} be the cluster partition: \mathcal{X} = \cup_{i=1}^k C_i.
► 2. Update the parameters:
  \forall i \in \{1, ..., k\}, \mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x, \quad \Sigma_i = \frac{1}{|C_i|} \sum_{x \in C_i} x x^T - \mu_i \mu_i^T
  Go to step 1 unless local convergence.
► 3. Update the mixture weights: \forall i \in \{1, ..., k\}, w_i = \frac{|C_i|}{n}.
  Go to step 1 unless local convergence.

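A NumPy sketch of my own of this (\mu, \Sigma) k-MLE loop, folding steps 1-3 into a single iteration (the Hard EM flavor mentioned earlier) and omitting empty-cluster guards:

```python
import numpy as np

def kmle_mvn(X, k, iters=50, seed=0):
    """k-MLE sketch for Gaussian mixtures with (mu, Sigma) updates."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu_bar = X.mean(axis=0)
    Sigma_bar = X.T @ X / n - np.outer(mu_bar, mu_bar)    # global covariance
    mus = X[rng.choice(n, size=k, replace=False)].copy()  # mu_i = x_i
    Sigmas = np.array([Sigma_bar] * k)                    # Sigma_i = Sigma_bar
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        cost = np.empty((n, k))
        for j in range(k):  # Mahalanobis + log|Sigma_j| - 2 log w_j
            diff = X - mus[j]
            sol = np.linalg.solve(Sigmas[j], diff.T).T
            cost[:, j] = (np.einsum('nd,nd->n', diff, sol)
                          + np.log(np.linalg.det(Sigmas[j])) - 2 * np.log(w[j]))
        z = np.argmin(cost, axis=1)
        for j in range(k):  # per-cluster Gaussian MLEs
            Cj = X[z == j]
            mus[j] = Cj.mean(axis=0)
            Sigmas[j] = Cj.T @ Cj / len(Cj) - np.outer(mus[j], mus[j])
        w = np.bincount(z, minlength=k) / n
    return w, mus, Sigmas

rng = np.random.default_rng(6)
X = np.vstack([rng.standard_normal((200, 2)), rng.standard_normal((200, 2)) + 5])
w, mus, _ = kmle_mvn(X, 2)
print(w, mus)
```
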
Summary of contributions
► Hard k-MLE versus soft EM:
  ► k-MLE maximizes locally the complete likelihood
  ► EM maximizes the incomplete likelihood
► The component \eta-parameter update can be implemented using any Bregman k-means heuristic on the conjugate F^*
► Initialization can be performed using k-MLE++
► Indivisibility: robustness when identifying statistical mixture models? Which k?
\forall k \in \mathbb{N}, \quad N(\mu, \sigma^2) = \sum_{i=1}^k N\left(\frac{\mu}{k}, \frac{\sigma^2}{k}\right)
Simplifying mixtures from kernel density estimators is one fine-to-coarse solution. See:
Model centroids for the simplification of kernel density estimators, ICASSP 2012, March 29th.

References
[1] Marcel R. Ackermann and Johannes Blömer. Bregman clustering for separable instances. In Scandinavian Workshop on Algorithm Theory (SWAT), pages 212–223, 2010.
[2] Arindam Banerjee, Srujana Merugu, Inderjit S. Dhillon, and Joydeep Ghosh. Clustering with Bregman divergences. Journal of Machine Learning Research, 6:1705–1749, 2005.
[3] Vincent Garcia and Frank Nielsen. Simplification and hierarchical representations of mixtures of exponential families. Signal Processing (Elsevier), 90(12):3197–3212, 2010.
[4] Frank Nielsen. k-MLE: A fast algorithm for learning statistical mixture models. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE, 2012. Preliminary technical report on arXiv.
[5] Frank Nielsen and Vincent Garcia. Statistical exponential families: A digest with flash cards, 2009. arXiv:0911.4863.
[6] Frank Nielsen and Richard Nock. Entropies and cross-entropies of exponential families. In International Conference on Image Processing (ICIP), pages 3621–3624, 2010.
[7] Jose Seabra, Francesco Ciompi, Oriol Pujol, Josepa Mauri, Petia Radeva, and Joao Sanchez. Rayleigh mixture model for plaque characterization in intravascular ultrasound. IEEE Transactions on Biomedical Engineering, 58(5):1314–1324, 2011.
[8] Shijun Wang and Rong Jin. An information geometry approach for distance metric learning. Journal of Machine Learning Research, 5:591–598, 2009.

Anisotropic Voronoi diagram (for MVN MMs)
From the source color image (a), we build a 5D GMM with k = 32 components, and color each pixel with the mean color of the anisotropic Voronoi cell it belongs to (b).
Speed up the assignment step using Bregman ball trees or Bregman vantage point trees.

Expectation-maximization (EM) for EFMMs [2]
EM monotonically increases the expected complete likelihood (marginalizing over the hidden labels):
\sum_{i=1}^n \sum_{j=1}^k p(z_j \mid x_i, \theta) \log p(x_i, z_j \mid \theta)
Banerjee et al. [2] proved it amounts to a Bregman soft clustering.

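For contrast with the hard assignments of k-MLE, a Bregman soft clustering (EM) sketch of my own for the same Rayleigh mixture; the n-by-k responsibility matrix W is exactly the memory overhead charged to EM in the comparison below:

```python
import numpy as np

def em_rayleigh(x, k, iters=100, seed=0):
    """Bregman soft clustering (EM) sketch for a Rayleigh mixture."""
    rng = np.random.default_rng(seed)
    y = x**2
    eta = y[rng.choice(len(y), size=k, replace=False)]
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities p(j|x_i) ~ w_j exp(-B_{F*}(y_i : eta_j));
        # the terms F*(y_i) + k(x_i) are common to all j and cancel.
        ratio = y[:, None] / eta[None, :]
        W = w[None, :] * np.exp(-(ratio - np.log(ratio) - 1))
        W /= W.sum(axis=1, keepdims=True)  # n-by-k responsibility matrix
        # M-step: weighted right-sided Bregman centroids and weight updates
        eta = (W * y[:, None]).sum(axis=0) / W.sum(axis=0)
        w = W.mean(axis=0)
    return w, np.sqrt(eta / 2)

rng = np.random.default_rng(7)
x = np.concatenate([rng.rayleigh(1.0, 500), rng.rayleigh(3.0, 500)])
print(em_rayleigh(x, 2))
```
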
Comparisons: k-MLE vs. EM for EFMMs

            | k-MLE / Hard EM             | Soft EM (1977)
            | = Bregman hard clustering   | = Bregman soft clustering
Memory      | lighter                     | heavier (W matrix)
Speed       | lighter (VP-tree)           | heavier (all weights w_ij)
Convergence | always, finitely            | \infty, stopping criterion
Init.       | k-MLE++                     | k-means(++)
More Related Content

What's hot

Small updates of matrix functions used for network centrality
Small updates of matrix functions used for network centralitySmall updates of matrix functions used for network centrality
Small updates of matrix functions used for network centralityFrancesco Tudisco
 
ABC based on Wasserstein distances
ABC based on Wasserstein distancesABC based on Wasserstein distances
ABC based on Wasserstein distancesChristian Robert
 
Meta-learning and the ELBO
Meta-learning and the ELBOMeta-learning and the ELBO
Meta-learning and the ELBOYoonho Lee
 
Optimal L-shaped matrix reordering, aka graph's core-periphery
Optimal L-shaped matrix reordering, aka graph's core-peripheryOptimal L-shaped matrix reordering, aka graph's core-periphery
Optimal L-shaped matrix reordering, aka graph's core-peripheryFrancesco Tudisco
 
The dual geometry of Shannon information
The dual geometry of Shannon informationThe dual geometry of Shannon information
The dual geometry of Shannon informationFrank Nielsen
 
Nodal Domain Theorem for the p-Laplacian on Graphs and the Related Multiway C...
Nodal Domain Theorem for the p-Laplacian on Graphs and the Related Multiway C...Nodal Domain Theorem for the p-Laplacian on Graphs and the Related Multiway C...
Nodal Domain Theorem for the p-Laplacian on Graphs and the Related Multiway C...Francesco Tudisco
 
Clustering in Hilbert simplex geometry
Clustering in Hilbert simplex geometryClustering in Hilbert simplex geometry
Clustering in Hilbert simplex geometryFrank Nielsen
 
Slides: The dual Voronoi diagrams with respect to representational Bregman di...
Slides: The dual Voronoi diagrams with respect to representational Bregman di...Slides: The dual Voronoi diagrams with respect to representational Bregman di...
Slides: The dual Voronoi diagrams with respect to representational Bregman di...Frank Nielsen
 
Divergence clustering
Divergence clusteringDivergence clustering
Divergence clusteringFrank Nielsen
 
ABC with Wasserstein distances
ABC with Wasserstein distancesABC with Wasserstein distances
ABC with Wasserstein distancesChristian Robert
 
On learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihoodOn learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihoodFrank Nielsen
 
prior selection for mixture estimation
prior selection for mixture estimationprior selection for mixture estimation
prior selection for mixture estimationChristian Robert
 
A new Perron-Frobenius theorem for nonnegative tensors
A new Perron-Frobenius theorem for nonnegative tensorsA new Perron-Frobenius theorem for nonnegative tensors
A new Perron-Frobenius theorem for nonnegative tensorsFrancesco Tudisco
 
Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Valentin De Bortoli
 
Divergence center-based clustering and their applications
Divergence center-based clustering and their applicationsDivergence center-based clustering and their applications
Divergence center-based clustering and their applicationsFrank Nielsen
 
NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)Christian Robert
 
Core–periphery detection in networks with nonlinear Perron eigenvectors
Core–periphery detection in networks with nonlinear Perron eigenvectorsCore–periphery detection in networks with nonlinear Perron eigenvectors
Core–periphery detection in networks with nonlinear Perron eigenvectorsFrancesco Tudisco
 
Multiple estimators for Monte Carlo approximations
Multiple estimators for Monte Carlo approximationsMultiple estimators for Monte Carlo approximations
Multiple estimators for Monte Carlo approximationsChristian Robert
 
Yuri Boykov — Combinatorial optimization for higher-order segmentation functi...
Yuri Boykov — Combinatorial optimization for higher-order segmentation functi...Yuri Boykov — Combinatorial optimization for higher-order segmentation functi...
Yuri Boykov — Combinatorial optimization for higher-order segmentation functi...Yandex
 

What's hot (20)

Small updates of matrix functions used for network centrality
Small updates of matrix functions used for network centralitySmall updates of matrix functions used for network centrality
Small updates of matrix functions used for network centrality
 
ABC based on Wasserstein distances
ABC based on Wasserstein distancesABC based on Wasserstein distances
ABC based on Wasserstein distances
 
Meta-learning and the ELBO
Meta-learning and the ELBOMeta-learning and the ELBO
Meta-learning and the ELBO
 
Optimal L-shaped matrix reordering, aka graph's core-periphery
Optimal L-shaped matrix reordering, aka graph's core-peripheryOptimal L-shaped matrix reordering, aka graph's core-periphery
Optimal L-shaped matrix reordering, aka graph's core-periphery
 
The dual geometry of Shannon information
The dual geometry of Shannon informationThe dual geometry of Shannon information
The dual geometry of Shannon information
 
Nodal Domain Theorem for the p-Laplacian on Graphs and the Related Multiway C...
Nodal Domain Theorem for the p-Laplacian on Graphs and the Related Multiway C...Nodal Domain Theorem for the p-Laplacian on Graphs and the Related Multiway C...
Nodal Domain Theorem for the p-Laplacian on Graphs and the Related Multiway C...
 
Clustering in Hilbert simplex geometry
Clustering in Hilbert simplex geometryClustering in Hilbert simplex geometry
Clustering in Hilbert simplex geometry
 
Slides: The dual Voronoi diagrams with respect to representational Bregman di...
Slides: The dual Voronoi diagrams with respect to representational Bregman di...Slides: The dual Voronoi diagrams with respect to representational Bregman di...
Slides: The dual Voronoi diagrams with respect to representational Bregman di...
 
Divergence clustering
Divergence clusteringDivergence clustering
Divergence clustering
 
ABC with Wasserstein distances
ABC with Wasserstein distancesABC with Wasserstein distances
ABC with Wasserstein distances
 
On learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihoodOn learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihood
 
prior selection for mixture estimation
prior selection for mixture estimationprior selection for mixture estimation
prior selection for mixture estimation
 
A new Perron-Frobenius theorem for nonnegative tensors
A new Perron-Frobenius theorem for nonnegative tensorsA new Perron-Frobenius theorem for nonnegative tensors
A new Perron-Frobenius theorem for nonnegative tensors
 
Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...
 
MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...
MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...
MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...
 
Divergence center-based clustering and their applications
Divergence center-based clustering and their applicationsDivergence center-based clustering and their applications
Divergence center-based clustering and their applications
 
NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)
 
Core–periphery detection in networks with nonlinear Perron eigenvectors
Core–periphery detection in networks with nonlinear Perron eigenvectorsCore–periphery detection in networks with nonlinear Perron eigenvectors
Core–periphery detection in networks with nonlinear Perron eigenvectors
 
Multiple estimators for Monte Carlo approximations
Multiple estimators for Monte Carlo approximationsMultiple estimators for Monte Carlo approximations
Multiple estimators for Monte Carlo approximations
 
Yuri Boykov — Combinatorial optimization for higher-order segmentation functi...
Yuri Boykov — Combinatorial optimization for higher-order segmentation functi...Yuri Boykov — Combinatorial optimization for higher-order segmentation functi...
Yuri Boykov — Combinatorial optimization for higher-order segmentation functi...
 

Viewers also liked

A new implementation of k-MLE for mixture modelling of Wishart distributions
A new implementation of k-MLE for mixture modelling of Wishart distributionsA new implementation of k-MLE for mixture modelling of Wishart distributions
A new implementation of k-MLE for mixture modelling of Wishart distributionsFrank Nielsen
 
(slides 9) Visual Computing: Geometry, Graphics, and Vision
(slides 9) Visual Computing: Geometry, Graphics, and Vision(slides 9) Visual Computing: Geometry, Graphics, and Vision
(slides 9) Visual Computing: Geometry, Graphics, and VisionFrank Nielsen
 
(slides 8) Visual Computing: Geometry, Graphics, and Vision
(slides 8) Visual Computing: Geometry, Graphics, and Vision(slides 8) Visual Computing: Geometry, Graphics, and Vision
(slides 8) Visual Computing: Geometry, Graphics, and VisionFrank Nielsen
 
Fundamentals cig 4thdec
Fundamentals cig 4thdecFundamentals cig 4thdec
Fundamentals cig 4thdecFrank Nielsen
 
Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and ...
Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and ...Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and ...
Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and ...Frank Nielsen
 
On Clustering Histograms with k-Means by Using Mixed α-Divergences
 On Clustering Histograms with k-Means by Using Mixed α-Divergences On Clustering Histograms with k-Means by Using Mixed α-Divergences
On Clustering Histograms with k-Means by Using Mixed α-DivergencesFrank Nielsen
 
On approximating the Riemannian 1-center
On approximating the Riemannian 1-centerOn approximating the Riemannian 1-center
On approximating the Riemannian 1-centerFrank Nielsen
 
INF442: Traitement des données massives
INF442: Traitement des données massivesINF442: Traitement des données massives
INF442: Traitement des données massivesFrank Nielsen
 
Computational Information Geometry: A quick review (ICMS)
Computational Information Geometry: A quick review (ICMS)Computational Information Geometry: A quick review (ICMS)
Computational Information Geometry: A quick review (ICMS)Frank Nielsen
 
Traitement des données massives (INF442, A8)
Traitement des données massives (INF442, A8)Traitement des données massives (INF442, A8)
Traitement des données massives (INF442, A8)Frank Nielsen
 
On representing spherical videos (Frank Nielsen, CVPR 2001)
On representing spherical videos (Frank Nielsen, CVPR 2001)On representing spherical videos (Frank Nielsen, CVPR 2001)
On representing spherical videos (Frank Nielsen, CVPR 2001)Frank Nielsen
 
Classification with mixtures of curved Mahalanobis metrics
Classification with mixtures of curved Mahalanobis metricsClassification with mixtures of curved Mahalanobis metrics
Classification with mixtures of curved Mahalanobis metricsFrank Nielsen
 
Traitement des données massives (INF442, A5)
Traitement des données massives (INF442, A5)Traitement des données massives (INF442, A5)
Traitement des données massives (INF442, A5)Frank Nielsen
 
Traitement des données massives (INF442, A7)
Traitement des données massives (INF442, A7)Traitement des données massives (INF442, A7)
Traitement des données massives (INF442, A7)Frank Nielsen
 
Traitement massif des données 2016
Traitement massif des données 2016Traitement massif des données 2016
Traitement massif des données 2016Frank Nielsen
 
Traitement des données massives (INF442, A6)
Traitement des données massives (INF442, A6)Traitement des données massives (INF442, A6)
Traitement des données massives (INF442, A6)Frank Nielsen
 
Computational Information Geometry for Machine Learning
Computational Information Geometry for Machine LearningComputational Information Geometry for Machine Learning
Computational Information Geometry for Machine LearningFrank Nielsen
 

Viewers also liked (17)

A new implementation of k-MLE for mixture modelling of Wishart distributions
A new implementation of k-MLE for mixture modelling of Wishart distributionsA new implementation of k-MLE for mixture modelling of Wishart distributions
A new implementation of k-MLE for mixture modelling of Wishart distributions
 
(slides 9) Visual Computing: Geometry, Graphics, and Vision
(slides 9) Visual Computing: Geometry, Graphics, and Vision(slides 9) Visual Computing: Geometry, Graphics, and Vision
(slides 9) Visual Computing: Geometry, Graphics, and Vision
 
(slides 8) Visual Computing: Geometry, Graphics, and Vision
(slides 8) Visual Computing: Geometry, Graphics, and Vision(slides 8) Visual Computing: Geometry, Graphics, and Vision
(slides 8) Visual Computing: Geometry, Graphics, and Vision
 
Fundamentals cig 4thdec
Fundamentals cig 4thdecFundamentals cig 4thdec
Fundamentals cig 4thdec
 
Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and ...
Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and ...Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and ...
Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and ...
 
On Clustering Histograms with k-Means by Using Mixed α-Divergences
 On Clustering Histograms with k-Means by Using Mixed α-Divergences On Clustering Histograms with k-Means by Using Mixed α-Divergences
On Clustering Histograms with k-Means by Using Mixed α-Divergences
 
On approximating the Riemannian 1-center
On approximating the Riemannian 1-centerOn approximating the Riemannian 1-center
On approximating the Riemannian 1-center
 
INF442: Traitement des données massives
INF442: Traitement des données massivesINF442: Traitement des données massives
INF442: Traitement des données massives
 
Computational Information Geometry: A quick review (ICMS)
Computational Information Geometry: A quick review (ICMS)Computational Information Geometry: A quick review (ICMS)
Computational Information Geometry: A quick review (ICMS)
 
Traitement des données massives (INF442, A8)
Traitement des données massives (INF442, A8)Traitement des données massives (INF442, A8)
Traitement des données massives (INF442, A8)
 
On representing spherical videos (Frank Nielsen, CVPR 2001)
On representing spherical videos (Frank Nielsen, CVPR 2001)On representing spherical videos (Frank Nielsen, CVPR 2001)
On representing spherical videos (Frank Nielsen, CVPR 2001)
 
Classification with mixtures of curved Mahalanobis metrics
Classification with mixtures of curved Mahalanobis metricsClassification with mixtures of curved Mahalanobis metrics
Classification with mixtures of curved Mahalanobis metrics
 
Traitement des données massives (INF442, A5)
Traitement des données massives (INF442, A5)Traitement des données massives (INF442, A5)
Traitement des données massives (INF442, A5)
 
Traitement des données massives (INF442, A7)
Traitement des données massives (INF442, A7)Traitement des données massives (INF442, A7)
Traitement des données massives (INF442, A7)
 
Traitement massif des données 2016
Traitement massif des données 2016Traitement massif des données 2016
Traitement massif des données 2016
 
Traitement des données massives (INF442, A6)
Traitement des données massives (INF442, A6)Traitement des données massives (INF442, A6)
Traitement des données massives (INF442, A6)
 
Computational Information Geometry for Machine Learning
Computational Information Geometry for Machine LearningComputational Information Geometry for Machine Learning
Computational Information Geometry for Machine Learning
 

Similar to k-MLE: A fast algorithm for learning statistical mixture models

Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...
Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...
Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...Frank Nielsen
 
Slides: Hypothesis testing, information divergence and computational geometry
Slides: Hypothesis testing, information divergence and computational geometrySlides: Hypothesis testing, information divergence and computational geometry
Slides: Hypothesis testing, information divergence and computational geometryFrank Nielsen
 
Slides: Total Jensen divergences: Definition, Properties and k-Means++ Cluste...
Slides: Total Jensen divergences: Definition, Properties and k-Means++ Cluste...Slides: Total Jensen divergences: Definition, Properties and k-Means++ Cluste...
Slides: Total Jensen divergences: Definition, Properties and k-Means++ Cluste...Frank Nielsen
 
Computational Information Geometry on Matrix Manifolds (ICTP 2013)
Computational Information Geometry on Matrix Manifolds (ICTP 2013)Computational Information Geometry on Matrix Manifolds (ICTP 2013)
Computational Information Geometry on Matrix Manifolds (ICTP 2013)Frank Nielsen
 
Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9Daisuke Yoneoka
 
Slides: The Burbea-Rao and Bhattacharyya centroids
Slides: The Burbea-Rao and Bhattacharyya centroidsSlides: The Burbea-Rao and Bhattacharyya centroids
Slides: The Burbea-Rao and Bhattacharyya centroidsFrank Nielsen
 
Computational Tools and Techniques for Numerical Macro-Financial Modeling
Computational Tools and Techniques for Numerical Macro-Financial ModelingComputational Tools and Techniques for Numerical Macro-Financial Modeling
Computational Tools and Techniques for Numerical Macro-Financial ModelingVictor Zhorin
 
IVR - Chapter 1 - Introduction
IVR - Chapter 1 - IntroductionIVR - Chapter 1 - Introduction
IVR - Chapter 1 - IntroductionCharles Deledalle
 
Multilinear Twisted Paraproducts
Multilinear Twisted ParaproductsMultilinear Twisted Paraproducts
Multilinear Twisted ParaproductsVjekoslavKovac1
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking componentsChristian Robert
 
Multilinear singular integrals with entangled structure
Multilinear singular integrals with entangled structureMultilinear singular integrals with entangled structure
Multilinear singular integrals with entangled structureVjekoslavKovac1
 
On the smallest enclosing information disk
 On the smallest enclosing information disk On the smallest enclosing information disk
On the smallest enclosing information diskFrank Nielsen
 
Tensor Train data format for uncertainty quantification
Tensor Train data format for uncertainty quantificationTensor Train data format for uncertainty quantification
Tensor Train data format for uncertainty quantificationAlexander Litvinenko
 
THE CHORD GAP DIVERGENCE AND A GENERALIZATION OF THE BHATTACHARYYA DISTANCE
THE CHORD GAP DIVERGENCE AND A GENERALIZATION OF THE BHATTACHARYYA DISTANCETHE CHORD GAP DIVERGENCE AND A GENERALIZATION OF THE BHATTACHARYYA DISTANCE
THE CHORD GAP DIVERGENCE AND A GENERALIZATION OF THE BHATTACHARYYA DISTANCEFrank Nielsen
 
1531 fourier series- integrals and trans
1531 fourier series- integrals and trans1531 fourier series- integrals and trans
1531 fourier series- integrals and transDr Fereidoun Dejahang
 
Integration with kernel methods, Transported meshfree methods
Integration with kernel methods, Transported meshfree methodsIntegration with kernel methods, Transported meshfree methods
Integration with kernel methods, Transported meshfree methodsMercier Jean-Marc
 
Lecture4 kenrels functions_rkhs
Lecture4 kenrels functions_rkhsLecture4 kenrels functions_rkhs
Lecture4 kenrels functions_rkhsStéphane Canu
 
StatPhysPerspectives_AMALEA_Cetraro_AnnaCarbone.pdf
StatPhysPerspectives_AMALEA_Cetraro_AnnaCarbone.pdfStatPhysPerspectives_AMALEA_Cetraro_AnnaCarbone.pdf
StatPhysPerspectives_AMALEA_Cetraro_AnnaCarbone.pdfAnna Carbone
 

Similar to k-MLE: A fast algorithm for learning statistical mixture models (20)

Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...
Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...
Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...
 
Slides: Hypothesis testing, information divergence and computational geometry
Slides: Hypothesis testing, information divergence and computational geometrySlides: Hypothesis testing, information divergence and computational geometry
Slides: Hypothesis testing, information divergence and computational geometry
 
Slides: Total Jensen divergences: Definition, Properties and k-Means++ Cluste...
Slides: Total Jensen divergences: Definition, Properties and k-Means++ Cluste...Slides: Total Jensen divergences: Definition, Properties and k-Means++ Cluste...
Slides: Total Jensen divergences: Definition, Properties and k-Means++ Cluste...
 
Computational Information Geometry on Matrix Manifolds (ICTP 2013)
Computational Information Geometry on Matrix Manifolds (ICTP 2013)Computational Information Geometry on Matrix Manifolds (ICTP 2013)
Computational Information Geometry on Matrix Manifolds (ICTP 2013)
 
Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9
 
Slides: The Burbea-Rao and Bhattacharyya centroids
Slides: The Burbea-Rao and Bhattacharyya centroidsSlides: The Burbea-Rao and Bhattacharyya centroids
Slides: The Burbea-Rao and Bhattacharyya centroids
 
Computational Tools and Techniques for Numerical Macro-Financial Modeling
Computational Tools and Techniques for Numerical Macro-Financial ModelingComputational Tools and Techniques for Numerical Macro-Financial Modeling
Computational Tools and Techniques for Numerical Macro-Financial Modeling
 
IVR - Chapter 1 - Introduction
IVR - Chapter 1 - IntroductionIVR - Chapter 1 - Introduction
IVR - Chapter 1 - Introduction
 
Multilinear Twisted Paraproducts
Multilinear Twisted ParaproductsMultilinear Twisted Paraproducts
Multilinear Twisted Paraproducts
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking components
 
Multilinear singular integrals with entangled structure
Multilinear singular integrals with entangled structureMultilinear singular integrals with entangled structure
Multilinear singular integrals with entangled structure
 
Section5 stochastic
Section5 stochasticSection5 stochastic
Section5 stochastic
 
ma112011id535
ma112011id535ma112011id535
ma112011id535
 
On the smallest enclosing information disk
 On the smallest enclosing information disk On the smallest enclosing information disk
On the smallest enclosing information disk
 
Tensor Train data format for uncertainty quantification
Tensor Train data format for uncertainty quantificationTensor Train data format for uncertainty quantification
Tensor Train data format for uncertainty quantification
 
THE CHORD GAP DIVERGENCE AND A GENERALIZATION OF THE BHATTACHARYYA DISTANCE
THE CHORD GAP DIVERGENCE AND A GENERALIZATION OF THE BHATTACHARYYA DISTANCETHE CHORD GAP DIVERGENCE AND A GENERALIZATION OF THE BHATTACHARYYA DISTANCE
THE CHORD GAP DIVERGENCE AND A GENERALIZATION OF THE BHATTACHARYYA DISTANCE
 
1531 fourier series- integrals and trans
1531 fourier series- integrals and trans1531 fourier series- integrals and trans
1531 fourier series- integrals and trans
 
Integration with kernel methods, Transported meshfree methods
Integration with kernel methods, Transported meshfree methodsIntegration with kernel methods, Transported meshfree methods
Integration with kernel methods, Transported meshfree methods
 
Lecture4 kenrels functions_rkhs
Lecture4 kenrels functions_rkhsLecture4 kenrels functions_rkhs
Lecture4 kenrels functions_rkhs
 
StatPhysPerspectives_AMALEA_Cetraro_AnnaCarbone.pdf
StatPhysPerspectives_AMALEA_Cetraro_AnnaCarbone.pdfStatPhysPerspectives_AMALEA_Cetraro_AnnaCarbone.pdf
StatPhysPerspectives_AMALEA_Cetraro_AnnaCarbone.pdf
 

Recently uploaded

Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPirithiRaju
 
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfBUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfWildaNurAmalia2
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxBerniceCayabyab1
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringPrajakta Shinde
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsHajira Mahmood
 
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptxSulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptxnoordubaliya2003
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxSwapnil Therkar
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxmalonesandreagweneth
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxyaramohamed343013
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)riyaescorts54
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologycaarthichand2003
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentationtahreemzahra82
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRlizamodels9
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPirithiRaju
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPirithiRaju
 

Recently uploaded (20)

Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdf
 
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfBUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutions
 
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptxSulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technology
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 

k-MLE: A fast algorithm for learning statistical mixture models

  • 1. k-MLE: A fast algorithm for learning statistical mixture models (arXiv:1203.5181) Frank NIELSEN Sony Computer Science Laboratories, Inc. 28th March 2012 International Conference on Acoustics, Speech, and Signal Processing ICASSP, Kyoto ICC c 1997-2012 Frank Nielsen, Sony Computer Science Laboratories, Inc. 1/28
  • 2. Outline I Background I Statistical mixtures of exponential families (EFMMs) I Legendre transform and mixture dual parameterizations I Contributions I k-MLE and its variants I k-MLE initialization (k-MLE++) I Summary c 1997-2012 Frank Nielsen, Sony Computer Science Laboratories, Inc. 2/28
  • 3. Exponential Family Mixture Models (EFMMs) Generalize Gaussian & Rayleigh MMs to many common distributions. m(x) = Xk i=1 wipF (x; i ) with 8i wi 0, Pk i=1 wi = 1 pF (x; ) = eht(x),i−F()+k(x) F: log-Laplace transform (partition, cumulant function): Z x2X pF (x; )dx = 1 ) F() = log Z x2X eht(x),i+k(x)dx, 2 = | Z x2X eht(x),i+k(x)dx 1 the natural parameter space. I d: Dimension of the support X. I D: order of the family (= dim). Statistic: t(x) : Rd ! RD. c 1997-2012 Frank Nielsen, Sony Computer Science Laboratories, Inc. 3/28
  • 4. Statistical mixtures: Rayleigh MMs [7, 5] IntraVascular UltraSound (IVUS) imaging: Rayleigh distribution: p(x; ) = x 2 e− x2 22 x 2 R+ = X d = 1 (univariate) D = 1 (order 1) = − 1 22 = (−1, 0) F() = −log(−2) t(x) = x2 k(x) = log x (Weibull k = 2) Coronary plaques: fibrotic tissues, calcified tissues, lipidic tissues Rayleigh Mixture Models (RMMs): for segmentation and classification tasks c 1997-2012 Frank Nielsen, Sony Computer Science Laboratories, Inc. 4/28
  • 5. Statistical mixtures: Gaussian MMs [3, 5] Gaussian mixture models (GMMs). Color image interpreted as a 5D xyRGB point set. Gaussian distribution p(x; μ,): 1 (2) d 2p|| e−1 2D−1 (x−μ,x−μ) Squared Mahalanobis distance: DQ(x, y) = (x − y)TQ(x − y) x 2 Rd = X d (multivariate) D = d(d+3) 2 (order) = (−1μ, 1 2−1) = (v , M) = R × Sd+ + F() = 1 v −1 4 T M v − 1 2 log |M| + d 2 log t(x) = (x,−xxT ) k(x) = 0 c 1997-2012 Frank Nielsen, Sony Computer Science Laboratories, Inc. 5/28
  • 6. Sampling from a Gaussian Mixture Model (GMM) To sample a variate x from a GMM: I Choose a component l according to the weight distribution w1, ...,wk , I Draw a variate x according to N(μl ,l ). Doubly stochastic process: 1. throw a (biased) dice with k faces to choose the component: l Multinomial(w1, ...,wk ) (Multinomial distribution belongs also to the exponential families.) 2. then draw at random a variate x from the l -th component x Normal(μl ,l ) x = μ + Cz with Cholesky: = CCT and z = [z1 ... zd ]T standard normal random variate: zi = p−2 log U1 cos(2U2) c 1997-2012 Frank Nielsen, Sony Computer Science Laboratories, Inc. 6/28
  • 7. Statistical mixtures: Generative models of data sets GMM = feature descriptor for information retrieval (IR) ! classification, matching, etc. Increase dimension using color image patches. Low-frequency information encoded into compact statistical model. Generative model ! statistical image by GMM sampling. Source GMM Sample c 1997-2012 Frank Nielsen, Sony Computer Science Laboratories, Inc. 7/28
  • 8. Distance between exponential families: Relative entropy I Distance between features (e.g., GMMs) I Kullback-Leibler divergence (cross-entropy minus entropy): KL(P : Q) = Z p(x) log p(x) q(x) dx 0 = Z p(x) log 1 q(x) dx | {z } H×(P:Q) − Z p(x) log 1 p(x) dx | {z } H(p)=H×(P:P) = F(Q) − F(P) − hQ − P,rF(P)i = BF (Q : P) Bregman divergence BF defined for a strictly convex and differentiable function (up to some affine terms). I Proof KL(P : Q) = BF (Q : P) follows from X EF () =) E[t(X)] = rF() c 1997-2012 Frank Nielsen, Sony Computer Science Laboratories, Inc. 8/28
  • 9. Bregman divergence: Geometric interpretation Potential function F, graph plot F : (x, F(x)). DF (p : q) = F(p) − F(q) − hp − q,rF(q)i c 1997-2012 Frank Nielsen, Sony Computer Science Laboratories, Inc. 9/28
  • 10. Convex duality: Legendre transformation I For a strictly convex and differentiable function F : X ! R: F(y) = sup x2X{hy, xi − F(x) | {z } lF (y;x); } I Maximum obtained for y = rF(x): rx lF (y; x) = y − rF(x) = 0 ) y = rF(x) I Maximum unique from convexity of F (r2F 0): r2x lF (y; x) = −r2F(x) 0 I Convex conjugates: (F,X) , (F,Y), Y = {rF(x) | x 2 X} c 1997-2012 Frank Nielsen, Sony Computer Science Laboratories, Inc. 10/28
  • 11. Legendre duality Canonical divergence I Convex conjugates have functional inverse gradients rF−1 = rF rF may require numerical approximation (not always available in analytical closed-form) I Involution: (F) = F. I Convex conjugate F expressed using (rF)−1: F(y) = h(rF)−1(y), yi − F((rF)−1(y)) I Fenchel-Young inequality at the heart of canonical divergence: F(x) + F(y) hx, yi AF (x : y) = AF(y : x) = F(x) + F(y) − hx, yi 0 c 1997-2012 Frank Nielsen, Sony Computer Science Laboratories, Inc. 11/28
  • 12. Dual Bregman divergences canonical divergence [6] KL(P : Q) = EP log p(x) q(x) 0 = BF (Q : P) = BF(P : Q) = F(Q) + F(P) − hQ, Pi = AF (Q : P) = AF(P : Q) with Q (natural parameterization) and P = EP[t(X)] = rF(P) (moment parameterization). c 1997-2012 Frank Nielsen, Sony Computer Science Laboratories, Inc. 12/28
  • 13. Exponential family mixtures: Dual parameterizations A finite weighted point set {(wi , i )}ki =1 in a statistical manifold. Many coordinate systems for computing (two canonical): I usual -parameterization, I natural -parameterization and dual -parameterization. Original parameters 2 Exponential family dual parameterization Legendre transform (, F) $ (H, F) 2 2 H = rF() = rF() Natural parameters Expectation parameters c 1997-2012 Frank Nielsen, Sony Computer Science Laboratories, Inc. 13/28
  • 14. Maximum Likelihood Estimator (MLE) Given n identical and independently distributed observations X = {x1, ..., xn} Maximum Likelihood Estimator Yn ˆ= argmax2 i=1 pF (xi ; ) = argmax2ePn i=1ht(xi ),i−F()+k(xi ) is unique maximum since r2F 0 (Hessian): rF(ˆ) = 1 n Xn i=1 t(xi ) MLE is consistent, efficient with asymptotic normal distribution ˆ N , 1 n I−1() Fisher information matrix I () = var[t(X)] = r2F() MLE may be biased (eg, normal distributions). c 1997-2012 Frank Nielsen, Sony Computer Science Laboratories, Inc. 14/28
  • 15. Duality Bregman $ Exponential families [2] Bregman divergence: BF (x : ) Bregman generator: F() Cumulant function: F() Exponential family: pF (x|) Legendre duality = rF() An exponential family... pF (x; ) = exp(ht(x), i − F() + k(x)) has the log-density interpreted as a Bregman divergence: log pF (x; ) = −BF(t(x) : ) + F(t(x)) + k(x) c 1997-2012 Frank Nielsen, Sony Computer Science Laboratories, Inc. 15/28
  • 16. Exponential families , Bregman divergences: Examples F(x) pF (x|) , BF Generator Exponential Family , Dual Bregman divergence x2 Spherical Gaussian , Squared loss x log x Multinomial , Kullback-Leibler divergence x log x − x Poisson , I -divergence −log(−2x) Rayleigh , Itakura-Saito divergence −log x Geometric , Itakura-Saito divergence log |X| Wishart , log-det/Burg matrix div. [8] c 1997-2012 Frank Nielsen, Sony Computer Science Laboratories, Inc. 16/28
• 17. Maximum likelihood estimator revisited
\hat{\theta} = \arg\max_\theta \prod_{i=1}^n p_F(x_i; \theta) = \arg\max_\theta \sum_{i=1}^n \log p_F(x_i; \theta)
= \arg\max_\theta \sum_{i=1}^n (\langle t(x_i), \theta \rangle - F(\theta) + k(x_i))
= \arg\max_\theta \sum_{i=1}^n -B_{F^*}(t(x_i) : \eta) + \underbrace{F^*(t(x_i)) + k(x_i)}_{\text{constant}}
= \arg\min_\eta \sum_{i=1}^n B_{F^*}(t(x_i) : \eta)
Right-sided Bregman centroid = center of mass: \hat{\eta} = \frac{1}{n} \sum_{i=1}^n t(x_i).
© 1997-2012 Frank Nielsen, Sony Computer Science Laboratories, Inc. 17/28
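A quick numerical confirmation (our sketch, assuming SciPy) that the center of mass is the right-sided minimizer, here for the Itakura-Saito divergence, i.e. the dual divergence of the Rayleigh family:

```python
import numpy as np
from scipy.optimize import minimize_scalar

y = np.array([0.5, 1.2, 3.0, 0.8])   # points y_i = t(x_i) > 0

def IS(p, q):                        # Itakura-Saito = B_F for F(x) = -log x
    return p / q - np.log(p / q) - 1

best = minimize_scalar(lambda c: IS(y, c).sum(), bounds=(1e-6, 10.0),
                       method="bounded")
print(best.x, y.mean())              # numeric argmin matches the center of mass
```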
• 18. Bregman batched Lloyd's k-means [2]
Extends Lloyd's k-means heuristic to Bregman divergences.
▶ Initialize distinct seeds: C_1 = P_1, ..., C_k = P_k
▶ Repeat until convergence:
▶ Assign point P_i to its "closest" centroid (w.r.t. B_F(P_i : C)):
C_i = \{ P \in \mathcal{P} : B_F(P : C_i) \leq B_F(P : C_j) \ \forall j \neq i \}
▶ Update the cluster centroids by taking their centers of mass: C_i = \frac{1}{|C_i|} \sum_{P \in C_i} P.
The loss function
L_F(\mathcal{P} : C) = \sum_{P \in \mathcal{P}} B_F(P : C), \quad B_F(P : C) = \min_{i \in \{1,...,k\}} B_F(P : C_i)
monotonically decreases and converges to a local optimum.
(Extend to weighted point sets using barycenters.)
© 1997-2012 Frank Nielsen, Sony Computer Science Laboratories, Inc. 18/28
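A compact sketch of the loop above (the assumptions are ours: NumPy, Forgy seeding, squared Euclidean generator for the demo); any other Bregman divergence can be swapped in through the div argument:

```python
import numpy as np

def bregman_kmeans(P, k, div, iters=100, seed=0):
    """Batched Lloyd's k-means under a Bregman divergence div(P, c) -> (n,) costs."""
    rng = np.random.default_rng(seed)
    C = P[rng.choice(len(P), size=k, replace=False)]   # distinct seeds
    for _ in range(iters):
        # assignment: each point to its closest centroid w.r.t. B_F(P : C_i)
        D = np.stack([div(P, c) for c in C], axis=1)   # n x k cost matrix
        z = D.argmin(axis=1)
        # update: right-sided centroid = center of mass of each cluster
        newC = np.array([P[z == i].mean(axis=0) if np.any(z == i) else C[i]
                         for i in range(k)])
        if np.allclose(newC, C):                       # converged
            break
        C = newC
    return C, z

sqeucl = lambda P, c: np.sum((P - c)**2, axis=1)       # B_F for F(x) = ||x||^2
P = np.vstack([np.random.default_rng(1).normal(m, 0.3, (50, 2)) for m in (0, 3, 6)])
C, z = bregman_kmeans(P, 3, sqeucl)
print(C)
```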
• 19. k-MLE for EFMMs = Bregman hard clustering [4]
Bijection between exponential families (distributions) and Bregman distances:
\log p_F(x; \theta) = -B_{F^*}(t(x) : \eta) + F^*(t(x)) + k(x), \quad \eta = \nabla F(\theta)
Bregman k-MLE for EFMMs(F) = additively-weighted Bregman hard k-means for F^* in the \eta-space \{ y_i = t(x_i) \}_i.
Complete log-likelihood \log \prod_{i=1}^n \prod_{j=1}^k (w_j p_F(x_i | \theta_j))^{\delta_j(z_i)}:
= \max_{\theta, w} \sum_{i=1}^n \sum_{j=1}^k \delta_j(z_i) (\log p_F(x_i | \theta_j) + \log w_j)
= \min_{H, w} \sum_{i=1}^n \sum_{j=1}^k \delta_j(z_i) \left( (B_{F^*}(t(x_i) : \eta_j) - \log w_j) \underbrace{- k(x_i) - F^*(t(x_i))}_{\text{constant}} \right)
= \min_{\eta, w} \sum_{i=1}^n \min_{j=1}^{k} (B_{F^*}(t(x_i) : \eta_j) - \log w_j)
(the inner argmin is what gives the z_i's)
© 1997-2012 Frank Nielsen, Sony Computer Science Laboratories, Inc. 19/28
• 20. Complete average log-likelihood optimization
Minimize monotonically the complete average log-likelihood:
\frac{1}{n} \min_{H, w} \sum_{i=1}^n \min_{j=1}^{k} (B_{F^*}(t(x_i) : \eta_j) - \log w_j)
▶ 1. For constant weights, a dual additive Bregman k-means:
\frac{1}{n} \min_{H} \sum_{i=1}^n \min_{j=1}^{k} (B_{F^*}(t(x_i) : \eta_j) - \log w_j)
▶ 2. For fixed component moment parameters:
\min_w \sum_{i=1}^n \sum_{j=1}^k -\delta_j(z_i) \log w_j = \min_w \sum_{j=1}^k -\alpha_j \log w_j, \quad \alpha_j = \frac{|C_j|}{n}
That is, minimize the cross-entropy: \min_w H^{\times}(\alpha : w) \Rightarrow w = \alpha.
▶ Go to 1 until (local) convergence is met.
© 1997-2012 Frank Nielsen, Sony Computer Science Laboratories, Inc. 20/28
• 21. k-MLE-EFMM algorithm [4]
▶ 0. Initialization: \forall i \in \{1, ..., k\}, let w_i = \frac{1}{k} and \eta_i = t(x_i) (initialization is further discussed later on).
▶ 1. Assignment: \forall i \in \{1, ..., n\}, z_i = \arg\min_{j=1}^{k} B_{F^*}(t(x_i) : \eta_j) - \log w_j.
Let C_i = \{x_j : z_j = i\}, \forall i \in \{1, ..., k\}, be the cluster partition: X = \cup_{i=1}^k C_i.
▶ 2. Update the \eta-parameters: \forall i \in \{1, ..., k\}, \eta_i = \frac{1}{|C_i|} \sum_{x \in C_i} t(x).
Goto step 1 unless local convergence of the complete likelihood is reached.
▶ 3. Update the mixture weights: \forall i \in \{1, ..., k\}, w_i = \frac{|C_i|}{n}.
Goto step 1 unless local convergence of the complete likelihood is reached.
© 1997-2012 Frank Nielsen, Sony Computer Science Laboratories, Inc. 21/28
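Instantiating the algorithm for a Rayleigh mixture gives the sketch below (ours, assuming NumPy): t(x) = x^2, the dual divergence is Itakura-Saito (cf. the table on slide 16), and for brevity the weight update of step 3 is interleaved into every round rather than run only after step 2 has fully converged, a Hard-EM-style simplification:

```python
import numpy as np

def kmle_rayleigh(x, k, rounds=50, seed=0):
    """k-MLE for a Rayleigh mixture: IS Bregman hard k-means on y = t(x) = x^2."""
    rng = np.random.default_rng(seed)
    y = x**2                                             # sufficient statistics
    eta = y[rng.choice(len(y), size=k, replace=False)]   # eta_j = E[x^2] = 2 sigma_j^2
    w = np.full(k, 1.0 / k)
    for _ in range(rounds):
        # step 1: z_i = argmin_j B_{F*}(y_i : eta_j) - log w_j (Itakura-Saito here)
        D = y[:, None] / eta[None, :] - np.log(y[:, None] / eta[None, :]) - 1
        z = (D - np.log(w)[None, :]).argmin(axis=1)
        # step 2: per-cluster MLE = center of mass of the t(x_i)
        eta = np.array([y[z == j].mean() if np.any(z == j) else eta[j]
                        for j in range(k)])
        # step 3 (interleaved): mixture weights = cluster proportions
        w = np.bincount(z, minlength=k) / len(y)
    return np.sqrt(eta / 2), w                           # sigma_j, w_j

rng = np.random.default_rng(1)
x = np.concatenate([rng.rayleigh(1.0, 2000), rng.rayleigh(4.0, 2000)])
print(kmle_rayleigh(x, 2))   # scales close to (1, 4), weights close to (1/2, 1/2)
```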
• 22. k-MLE initialization
▶ Forgy's random seeding (d = D),
▶ Bregman k-means (for F^* on Y, with an MLE on each cluster).
Usually D > d (e.g., multivariate Gaussians: D = \frac{d(d+3)}{2})
▶ Compute the global MLE \hat{\eta} = \frac{1}{n} \sum_{i=1}^n t(x_i) (well-defined for n > D \Rightarrow \hat{\eta} \in H)
▶ Consider the restricted exponential family F_{\hat{\theta}^{(d+1...D)}}(\theta^{(1...d)}): set \eta_i = t^{(1...d)}(x_i) and \theta_i^{(d+1...D)} = \hat{\theta}^{(d+1...D)}.
(e.g., for Gaussians: fix the global covariance matrix and let \mu_i = x_i)
▶ Improve the initialization by applying Bregman k-means++ [1] to the convex conjugate of F_{\hat{\theta}^{(d+1...D)}}(\theta^{(1...d)})
k-MLE++ is based on Bregman k-means++.
© 1997-2012 Frank Nielsen, Sony Computer Science Laboratories, Inc. 22/28
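A sketch of the k-means++-style seeding step (ours; the precise Bregman k-means++ analysis is in [1]): each new seed is drawn with probability proportional to its divergence to the closest seed already chosen:

```python
import numpy as np

def bregman_kmeanspp_seed(y, k, div, seed=0):
    """k-means++-style seeding under a Bregman divergence div(y, s) -> (n,) costs."""
    rng = np.random.default_rng(seed)
    seeds = [y[rng.integers(len(y))]]            # first seed uniformly at random
    for _ in range(k - 1):
        # divergence of each point to its nearest already-chosen seed
        d = np.min(np.stack([div(y, s) for s in seeds]), axis=0)
        seeds.append(y[rng.choice(len(y), p=d / d.sum())])
    return np.array(seeds)

IS = lambda y, s: y / s - np.log(y / s) - 1      # dual divergence for Rayleigh
y = np.random.default_rng(2).rayleigh(2.0, 500)**2   # statistics y_i = t(x_i) = x_i^2
print(bregman_kmeanspp_seed(y, 3, IS))
```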
• 23. k-MLE variants using any Bregman k-means heuristic
▶ Any k-means optimization heuristic allows one to update the mixture \eta-parameters:
▶ Hartigan & Wong's greedy swap (after Lloyd convergence),
▶ Kanungo et al.'s swap ((9 + \epsilon)-approximation).
▶ Performing the mixture \eta- and w-parameter updates successively yields a Hard EM variant
(easily implemented by winner-take-all EM weight memberships).
© 1997-2012 Frank Nielsen, Sony Computer Science Laboratories, Inc. 23/28
• 24. k-MLE for MVNs with the (\mu, \Sigma) parameters
▶ 0. Initialization:
▶ Calculate the global mean \bar{\mu} and global covariance matrix \bar{\Sigma}:
\bar{\mu} = \frac{1}{n} \sum_{i=1}^n x_i, \quad \bar{\Sigma} = \frac{1}{n} \sum_{i=1}^n x_i x_i^T - \bar{\mu}\bar{\mu}^T
▶ \forall i \in \{1, ..., k\}, initialize the i-th seed as (\mu_i = x_i, \Sigma_i = \bar{\Sigma}).
▶ 1. Assignment: \forall i \in \{1, ..., n\},
z_i = \arg\min_{j=1}^{k} M_{\Sigma_j^{-1}}(x_i - \mu_j, x_i - \mu_j) + \log |\Sigma_j| - 2 \log w_j
with M_Q the squared Mahalanobis distance: M_Q(x, y) = (x - y)^T Q (x - y).
Let C_i = \{x_j : z_j = i\}, \forall i \in \{1, ..., k\}, be the cluster partition: X = \cup_{i=1}^k C_i.
▶ 2. Update the parameters: \forall i \in \{1, ..., k\},
\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x, \quad \Sigma_i = \frac{1}{|C_i|} \sum_{x \in C_i} x x^T - \mu_i \mu_i^T
Goto step 1 unless local convergence.
▶ 3. Update the mixture weights: \forall i \in \{1, ..., k\}, w_i = \frac{|C_i|}{n}.
Goto step 1 unless local convergence.
© 1997-2012 Frank Nielsen, Sony Computer Science Laboratories, Inc. 24/28
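A NumPy sketch of this (\mu, \Sigma) loop (ours; the small-cluster guard is an implementation choice of ours to keep \Sigma_j well-defined, and the weight update is interleaved per round):

```python
import numpy as np

def kmle_mvn(X, k, rounds=50, seed=0):
    """k-MLE for a Gaussian mixture, working directly in (mu, Sigma) form."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    Sig_bar = np.cov(X.T, bias=True)                   # global covariance
    mu = X[rng.choice(n, k, replace=False)]            # seeds mu_i = x_i
    Sig = np.array([Sig_bar] * k)                      # seeds Sigma_i = Sigma_bar
    w = np.full(k, 1.0 / k)
    for _ in range(rounds):
        # step 1: squared Mahalanobis + log|Sigma_j| - 2 log w_j
        cost = np.empty((n, k))
        for j in range(k):
            diff = X - mu[j]
            cost[:, j] = (np.einsum('ni,ij,nj->n', diff, np.linalg.inv(Sig[j]), diff)
                          + np.linalg.slogdet(Sig[j])[1] - 2 * np.log(w[j]))
        z = cost.argmin(axis=1)
        # step 2: per-cluster MLEs (guard tiny clusters so Sigma_j stays invertible)
        for j in range(k):
            Xj = X[z == j]
            if len(Xj) > d:
                mu[j] = Xj.mean(axis=0)
                Sig[j] = Xj.T @ Xj / len(Xj) - np.outer(mu[j], mu[j])
        # step 3: mixture weights
        w = np.bincount(z, minlength=k) / n
    return mu, Sig, w

X = np.vstack([np.random.default_rng(3).normal(m, 1.0, (300, 2)) for m in (0, 5)])
print(kmle_mvn(X, 2)[0])   # means close to (0, 0) and (5, 5)
```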
• 25. Summary of contributions
▶ Hard k-MLE versus soft EM:
▶ k-MLE maximizes locally the complete likelihood,
▶ EM maximizes locally the incomplete likelihood.
▶ The component parameter update can be implemented using any Bregman k-means heuristic on the conjugate F^*,
▶ Initialization can be performed using k-MLE++.
▶ Indivisibility: robustness when identifying statistical mixture models? Which k?
\forall k \in N, \quad N(\mu, \sigma^2) = \sum_{i=1}^k N\left( \frac{\mu}{k}, \frac{\sigma^2}{k} \right)
(a normal variate decomposes into a sum of k independent normal variates)
Simplifying mixtures from kernel density estimators is one fine-to-coarse solution.
See: Model centroids for the simplification of kernel density estimators, ICASSP 2012, March 29th.
© 1997-2012 Frank Nielsen, Sony Computer Science Laboratories, Inc. 25/28
• 26-28. References
[1] Marcel R. Ackermann and Johannes Blömer. Bregman clustering for separable instances. In Scandinavian Workshop on Algorithm Theory (SWAT), pages 212-223, 2010.
[2] Arindam Banerjee, Srujana Merugu, Inderjit S. Dhillon, and Joydeep Ghosh. Clustering with Bregman divergences. Journal of Machine Learning Research, 6:1705-1749, 2005.
[3] Vincent Garcia and Frank Nielsen. Simplification and hierarchical representations of mixtures of exponential families. Signal Processing (Elsevier), 90(12):3197-3212, 2010.
[4] Frank Nielsen. k-MLE: A fast algorithm for learning statistical mixture models. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE, 2012. Preliminary technical report on arXiv (arXiv:1203.5181).
[5] Frank Nielsen and Vincent Garcia. Statistical exponential families: A digest with flash cards, 2009. arXiv:0911.4863.
[6] Frank Nielsen and Richard Nock. Entropies and cross-entropies of exponential families. In International Conference on Image Processing (ICIP), pages 3621-3624, 2010.
[7] Jose Seabra, Francesco Ciompi, Oriol Pujol, Josepa Mauri, Petia Radeva, and Joao Sanchez. Rayleigh mixture model for plaque characterization in intravascular ultrasound. IEEE Transactions on Biomedical Engineering, 58(5):1314-1324, 2011.
[8] Shijun Wang and Rong Jin. An information geometry approach for distance metric learning. In International Conference on Artificial Intelligence and Statistics (AISTATS), JMLR W&CP 5, pages 591-598, 2009.
© 1997-2012 Frank Nielsen, Sony Computer Science Laboratories, Inc. 26/28
• 29. Anisotropic Voronoi diagram (for MVN MMs)
From the source color image (a), we build a 5D GMM with k = 32 components, and color each pixel with the mean color of the anisotropic Voronoi cell it belongs to (b).
[Figure: (a) source image, (b) anisotropic Voronoi rendering]
Speed up the assignment step using Bregman ball trees or Bregman vantage point trees.
© 1997-2012 Frank Nielsen, Sony Computer Science Laboratories, Inc. 26/28
• 30. Expectation-Maximization (EM) for EFMMs [2]
EM increases monotonically the expected complete log-likelihood (marginalizing over the hidden labels):
\sum_{i=1}^n \sum_{j=1}^k p(z_j | x_i, \theta) \log p(x_i, z_j | \theta)
Banerjee et al. [2] proved that this amounts to a Bregman soft clustering.
© 1997-2012 Frank Nielsen, Sony Computer Science Laboratories, Inc. 27/28
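For contrast with the hard assignments of k-MLE, a sketch (ours) of the corresponding Bregman soft clustering for the same Rayleigh mixture as before: responsibilities r_{ij} \propto w_j \exp(-B_{F^*}(y_i : \eta_j)) replace the winner-take-all memberships, since the terms F^*(y_i) + k(x_i) are common to all components and cancel in the normalization:

```python
import numpy as np

def em_rayleigh(x, k, rounds=100, seed=0):
    """Bregman soft clustering (EM) for a Rayleigh mixture on y = t(x) = x^2."""
    rng = np.random.default_rng(seed)
    y = x**2
    eta = y[rng.choice(len(y), size=k, replace=False)]
    w = np.full(k, 1.0 / k)
    for _ in range(rounds):
        # E-step: r_ij proportional to w_j exp(-B_{F*}(y_i : eta_j)), in log-space
        B = y[:, None] / eta[None, :] - np.log(y[:, None] / eta[None, :]) - 1
        logR = np.log(w)[None, :] - B
        logR -= logR.max(axis=1, keepdims=True)        # stabilize before exp
        R = np.exp(logR)
        R /= R.sum(axis=1, keepdims=True)
        # M-step: weighted centers of mass and average responsibilities
        eta = (R * y[:, None]).sum(axis=0) / R.sum(axis=0)
        w = R.mean(axis=0)
    return np.sqrt(eta / 2), w

rng = np.random.default_rng(4)
x = np.concatenate([rng.rayleigh(1.0, 2000), rng.rayleigh(4.0, 2000)])
print(em_rayleigh(x, 2))
```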
• 31. Comparisons: k-MLE vs. EM for EFMMs
                 k-MLE / Hard EM                  Soft EM (1977)
                 = Bregman hard clustering        = Bregman soft clustering
Memory           lighter                          heavier (n x k weight matrix W)
Speed            lighter (VP-trees)               heavier (all weights w_{ij})
Convergence      always, in finitely many steps   asymptotic, needs a stopping criterion
Initialization   k-MLE++                          k-means(++)
© 1997-2012 Frank Nielsen, Sony Computer Science Laboratories, Inc. 28/28