A new implementation of k-MLE for 
mixture modelling of Wishart distributions 
Christophe Saint-Jean Frank Nielsen 
Geometric Science of Information 2013 
August 28, 2013 - Mines Paris Tech
Application Context (1)
2/31
We are interested in clustering varying-length sets of multivariate observations of the same dimension p.

$$X_1 = \begin{pmatrix} 3.6 & 0.05 & 4.0 \\ 3.6 & 0.05 & 4.0 \\ 3.6 & 0.05 & 4.0 \end{pmatrix}, \;\ldots,\; X_N = \begin{pmatrix} 5.3 & 0.5 & 2.5 \\ 3.6 & 0.5 & 3.5 \\ 1.6 & 0.5 & 4.6 \\ 1.6 & 0.5 & 5.1 \\ 2.9 & 0.5 & 6.1 \end{pmatrix}$$

The sample mean is a good but not discriminative enough feature.
Second-order cross-product matrices ${}^tX_i X_i$ may capture some relations between the (column) variables.
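A minimal sketch of this feature extraction step, assuming NumPy and using illustrative values for two segments:

```python
import numpy as np

# Two motion segments with different numbers of rows but the same
# number of columns p = 3 (values are illustrative only).
X1 = np.array([[3.6, 0.05, 4.0]] * 3)
XN = np.array([[5.3, 0.5, 2.5], [3.6, 0.5, 3.5], [1.6, 0.5, 4.6],
               [1.6, 0.5, 5.1], [2.9, 0.5, 6.1]])

# Second-order cross-product matrix tXi Xi : a p x p PSD summary of Xi.
x1 = X1.T @ X1
xN = XN.T @ XN
print(x1.shape, xN.shape)   # both (3, 3)
```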
Application Context (2)
3/31
The problem is now the clustering of a set of p x p PSD matrices:

$$\mathcal{X} = \{\, x_1 = {}^tX_1 X_1,\ x_2 = {}^tX_2 X_2,\ \ldots,\ x_N = {}^tX_N X_N \,\}$$

Examples of applications: multispectral/DTI/radar imaging, motion retrieval systems, ...
Outline of this talk 
4/31 
1 MLE and Wishart Distribution 
Exponential Family and Maximum Likelihood Estimate
Wishart Distribution 
Two sub-families of the Wishart Distribution 
2 Mixture modeling with k-MLE 
Original k-MLE 
k-MLE for Wishart distributions 
Heuristics for the initialization 
3 Application to motion retrieval
Reminder : Exponential Family (EF)
5/31
An exponential family is a set of parametric probability distributions

$$EF = \{\, p(x;\theta) = p_F(x;\theta) = \exp\{\langle t(x), \theta\rangle + k(x) - F(\theta)\} \mid \theta \in \Theta \,\}$$

Terminology:
$\lambda$ source parameters.
$\theta$ natural parameters.
$t(x)$ sufficient statistic.
$k(x)$ auxiliary carrier measure.
$F(\theta)$ the log-normalizer: differentiable, strictly convex;
$\Theta = \{\theta \in \mathbb{R}^D \mid F(\theta) < +\infty\}$ is an open convex set.
Almost all commonly used distributions are EF members, with exceptions such as the uniform and Cauchy distributions.
Reminder : Maximum Likelihood Estimate (MLE)
6/31
The Maximum Likelihood Estimate principle is a very common approach for fitting the parameters of a distribution:

$$\hat\theta = \arg\max_{\theta} L(\theta;\mathcal{X}) = \arg\max_{\theta} \prod_{i=1}^{N} p(x_i;\theta) = \arg\min_{\theta} -\frac{1}{N}\sum_{i=1}^{N} \log p(x_i;\theta)$$

assuming a sample $\mathcal{X} = \{x_1, x_2, \ldots, x_N\}$ of i.i.d. observations.
The log-density has a convenient expression for EF members:

$$\log p_F(x;\theta) = \langle t(x), \theta\rangle + k(x) - F(\theta)$$

It follows that

$$\hat\theta = \arg\max_{\theta} \sum_{i=1}^{N} \log p_F(x_i;\theta) = \arg\max_{\theta} \left\langle \sum_{i=1}^{N} t(x_i),\, \theta \right\rangle - N F(\theta)$$
MLE with EF
7/31
Since F is a strictly convex, differentiable function, the MLE exists and is unique:

$$\nabla F(\hat\theta) = \frac{1}{N}\sum_{i=1}^{N} t(x_i)$$

Ideally, we have a closed form:

$$\hat\theta = \nabla F^{-1}\!\left( \frac{1}{N}\sum_{i=1}^{N} t(x_i) \right)$$

Otherwise, numerical methods including Newton-Raphson can be successfully applied.
Wishart Distribution
8/31
Definition (Central Wishart distribution)
The Wishart distribution characterizes empirical covariance matrices of zero-mean Gaussian samples:

$$W_d(X; n, S) = \frac{|X|^{\frac{n-d-1}{2}} \exp\left\{-\frac{1}{2}\mathrm{tr}(S^{-1}X)\right\}}{2^{\frac{nd}{2}}\, |S|^{\frac{n}{2}}\, \Gamma_d\!\left(\frac{n}{2}\right)}$$

where, for $x > 0$, $\Gamma_d(x) = \pi^{\frac{d(d-1)}{4}} \prod_{j=1}^{d} \Gamma\!\left(x - \frac{j-1}{2}\right)$ is the multivariate gamma function.
Remarks: $n > d - 1$, $E[X] = nS$.
It is the multivariate generalization of the chi-square distribution.
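As a quick sanity check of the density above, a small sketch assuming NumPy/SciPy (the helper name wishart_logpdf is ours); it can be compared against scipy.stats.wishart:

```python
import numpy as np
from scipy.special import multigammaln

def wishart_logpdf(X, n, S):
    """Log-density of the central Wishart W_d(X; n, S) as written above."""
    d = X.shape[0]
    _, logdet_X = np.linalg.slogdet(X)
    _, logdet_S = np.linalg.slogdet(S)
    return (0.5 * (n - d - 1) * logdet_X
            - 0.5 * np.trace(np.linalg.solve(S, X))
            - 0.5 * n * d * np.log(2.0)
            - 0.5 * n * logdet_S
            - multigammaln(0.5 * n, d))

# Example: compare against scipy.stats.wishart on a random PSD matrix.
from scipy.stats import wishart
A = np.random.randn(3, 3); S = A @ A.T + 3 * np.eye(3)
X = wishart.rvs(df=7, scale=S)
print(wishart_logpdf(X, 7, S), wishart.logpdf(X, df=7, scale=S))
```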
Wishart Distribution as an EF
9/31
It is an exponential family:

$$\log W_d(X; \theta_n, \theta_S) = \langle \theta_n, \log|X| \rangle_{\mathbb{R}} + \left\langle \theta_S, -\tfrac{1}{2}X \right\rangle_{HS} + k(X) - F(\theta_n, \theta_S)$$

with $k(X) = 0$ and

$$(\theta_n, \theta_S) = \left( \frac{n-d-1}{2},\; S^{-1} \right), \qquad t(X) = \left( \log|X|,\; -\tfrac{1}{2}X \right),$$

$$F(\theta_n, \theta_S) = \left( \theta_n + \frac{d+1}{2} \right)\left( d\log 2 - \log|\theta_S| \right) + \log \Gamma_d\!\left( \theta_n + \frac{d+1}{2} \right)$$
MLE for Wishart Distribution
10/31
In the case of the Wishart distribution, a closed form would be obtained by solving the following system:

$$\hat\theta = \nabla F^{-1}\!\left( \frac{1}{N}\sum_{i=1}^{N} t(x_i) \right) \iff \begin{cases} d\log 2 - \log|\theta_S| + \Psi_d\!\left( \theta_n + \frac{d+1}{2} \right) = \bar\eta_n \\[4pt] -\left( \theta_n + \frac{d+1}{2} \right) \theta_S^{-1} = \bar\eta_S \end{cases} \quad (1)$$

with $\bar\eta_n$ and $\bar\eta_S$ the (empirical) expectation parameters and $\Psi_d$ the derivative of $\log \Gamma_d$.
Unfortunately, no closed-form solution is known.
Two sub-families of the Wishart Distribution (1)
11/31
Case n fixed ($n = 2\theta_n + d + 1$):

$$F_n(\theta_S) = \frac{nd}{2}\log 2 - \frac{n}{2}\log|\theta_S| + \log\Gamma_d\!\left(\frac{n}{2}\right), \qquad k_n(X) = \frac{n-d-1}{2}\log|X|$$

Case S fixed ($\theta_S = S^{-1}$):

$$F_S(\theta_n) = \left( \theta_n + \frac{d+1}{2} \right)\log|2S| + \log\Gamma_d\!\left( \theta_n + \frac{d+1}{2} \right), \qquad k_S(X) = -\frac{1}{2}\mathrm{tr}(S^{-1}X)$$
Two sub-families of the Wishart Distribution (2)
12/31
Both are exponential families and their MLE equations are solvable!
Case n fixed:

$$-\frac{n}{2}\hat\theta_S^{-1} = \frac{1}{N}\sum_{i=1}^{N} -\frac{1}{2}X_i \implies \hat\theta_S = Nn\left( \sum_{i=1}^{N} X_i \right)^{-1} \quad (2)$$

Case S fixed:

$$\hat\theta_n = \Psi_d^{-1}\!\left( \frac{1}{N}\sum_{i=1}^{N} \log|X_i| - \log|2S| \right) - \frac{d+1}{2}, \qquad \hat\theta_n > 0 \quad (3)$$

with $\Psi_d^{-1}$ the functional reciprocal of $\Psi_d$.
An iterative estimator for the Wishart Distribution
13/31
Algorithm 1: An estimator for the parameters of the Wishart
Input: A sample $X_1, X_2, \ldots, X_N$ of $S_{++}^d$
Output: Final values of $\hat\theta_n$ and $\hat\theta_S$
Initialize $\hat\theta_n$ with some value $> 0$;
repeat
    Update $\hat\theta_S$ using Eq. 2 with $n = 2\hat\theta_n + d + 1$;
    Update $\hat\theta_n$ using Eq. 3 with $S$ the inverse matrix of $\hat\theta_S$;
until convergence of the likelihood;
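A compact Python sketch of Algorithm 1, assuming NumPy/SciPy; the names wishart_fit, psi_d and psi_d_inv are ours, and the bracketing used to invert $\Psi_d$ is a simplification:

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

def psi_d(x, d):
    # Derivative of log Gamma_d: Psi_d(x) = sum_{j=1..d} psi(x - (j-1)/2).
    return sum(digamma(x - 0.5 * j) for j in range(d))

def psi_d_inv(y, d):
    # Functional reciprocal of Psi_d, by bracketing then root finding.
    lo = 0.5 * (d - 1) + 1e-6
    hi = lo + 1.0
    while psi_d(hi, d) < y:
        hi *= 2.0
    return brentq(lambda x: psi_d(x, d) - y, lo, hi)

def wishart_fit(Xs, n_iter=50, tol=1e-8):
    """Alternating estimator (Algorithm 1) for (theta_n, theta_S)."""
    N, d = len(Xs), Xs[0].shape[0]
    sum_X = sum(Xs)
    mean_logdet = np.mean([np.linalg.slogdet(X)[1] for X in Xs])
    theta_n = 1.0                                    # some value > 0
    for _ in range(n_iter):
        n = 2.0 * theta_n + d + 1.0
        theta_S = N * n * np.linalg.inv(sum_X)       # Eq. (2)
        S = np.linalg.inv(theta_S)
        new_theta_n = psi_d_inv(mean_logdet - np.linalg.slogdet(2.0 * S)[1], d) \
                      - 0.5 * (d + 1.0)              # Eq. (3)
        if abs(new_theta_n - theta_n) < tol:
            theta_n = new_theta_n
            break
        theta_n = new_theta_n
    return theta_n, theta_S
```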
Questions and open problems
14/31
From a sample of Wishart matrices, the distribution parameters are recovered in a few iterations.
Major question: do we have an MLE? Probably ...
Minor question: what about sample size N = 1?
    Under-determined system
    Regularization by sampling around $X_1$
Mixture Models (MM)
15/31
An additive (finite) mixture is a flexible tool to model a more complex distribution m:

$$m(x) = \sum_{j=1}^{k} w_j\, p_j(x), \qquad 0 \leq w_j \leq 1, \qquad \sum_{j=1}^{k} w_j = 1$$

where the $p_j$ are the component distributions of the mixture and the $w_j$ the mixing proportions.
In our case, we consider each $p_j$ as a member of some parametric family (EF):

$$m(x; \Psi) = \sum_{j=1}^{k} w_j\, p_{F_j}(x; \theta_j) \qquad \text{with } \Psi = (w_1, w_2, \ldots, w_{k-1}, \theta_1, \theta_2, \ldots, \theta_k)$$

Expectation-Maximization is not fast enough [5] ...
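For concreteness, a minimal sketch of evaluating such a mixture density, assuming the component log-densities are available as callables already bound to their parameters (names are illustrative):

```python
import numpy as np

def mixture_pdf(x, weights, log_pdfs):
    """m(x; Psi) = sum_j w_j p_{F_j}(x; theta_j), with each log_pdfs[j]
    a callable returning log p_{F_j}(x; theta_j)."""
    return sum(w * np.exp(lp(x)) for w, lp in zip(weights, log_pdfs))
```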
Original k-MLE (primal form.) in one slide
16/31
Algorithm 2: k-MLE
Input: A sample $\mathcal{X} = \{x_1, x_2, \ldots, x_N\}$, $F_1, F_2, \ldots, F_k$ Bregman generators
Output: Estimate $\hat\Psi$ of the mixture parameters
A good initialization for $\Psi$ (see later);
repeat
    repeat
        foreach $x_i \in \mathcal{X}$ do $z_i = \arg\max_j \log \hat w_j\, p_{F_j}(x_i; \hat\theta_j)$;
        foreach $C_j := \{x_i \in \mathcal{X} \mid z_i = j\}$ do $\hat\theta_j = MLE_{F_j}(C_j)$;
    until convergence of the complete likelihood;
    Update mixing proportions: $\hat w_j = |C_j|/N$;
until further convergence of the complete likelihood;
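A schematic Python version of Algorithm 2, assuming the observations sit in a NumPy array X (e.g. of shape (N, d, d)) and that the caller supplies log_pdf(x, theta) and mle(cluster) for the chosen exponential family; the function name k_mle and the fixed iteration counts are our simplifications of the slide's convergence tests:

```python
import numpy as np

def k_mle(X, k, log_pdf, mle, n_outer=20, n_inner=50):
    """Sketch of Algorithm 2 (primal k-MLE) with a naive random initialization."""
    N = len(X)
    w = np.full(k, 1.0 / k)
    theta = [mle(X[np.random.choice(N, max(2, N // k), replace=False)])
             for _ in range(k)]
    for _ in range(n_outer):
        for _ in range(n_inner):
            # Assignment: maximize the weighted component log-likelihood.
            scores = np.array([[np.log(w[j]) + log_pdf(x, theta[j])
                                for j in range(k)] for x in X])
            z = scores.argmax(axis=1)
            # Update: per-cluster MLE (keep old parameters if a cluster empties).
            theta = [mle(X[z == j]) if np.any(z == j) else theta[j]
                     for j in range(k)]
        # Update mixing proportions.
        w = np.array([(z == j).mean() for j in range(k)])
    return w, theta, z
```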
k-MLE’s properties
17/31
Another formulation comes from the connection between EF and Bregman divergences [3]:

$$\log p_F(x;\theta) = -B_{F^*}(t(x) : \eta) + F^*(t(x)) + k(x)$$

where the Bregman divergence $B_{F^*}(\cdot : \cdot)$ is associated with the strictly convex and differentiable conjugate function $F^*$, and $\eta = \nabla F(\theta)$ is the expectation parameter.
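For reference, the (standard) definitions behind this identity, not restated on the slide: the Bregman divergence generated by a strictly convex, differentiable G, and the Legendre duality linking natural and expectation parameters:

$$B_G(p : q) = G(p) - G(q) - \langle p - q, \nabla G(q) \rangle$$

$$F^*(\eta) = \sup_{\theta \in \Theta} \left\{ \langle \theta, \eta \rangle - F(\theta) \right\}, \qquad \eta = \nabla F(\theta), \qquad \theta = \nabla F^*(\eta)$$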
Original k-MLE (dual form.) in one slide
18/31
Algorithm 3: k-MLE
Input: A sample $\mathcal{Y} = \{y_1 = t(x_1), y_2 = t(x_2), \ldots, y_N = t(x_N)\}$, $F_1^*, F_2^*, \ldots, F_k^*$ Bregman generators
Output: $\hat\Psi = (\hat w_1, \hat w_2, \ldots, \hat w_{k-1}, \hat\eta_1 = \nabla F(\hat\theta_1), \ldots, \hat\eta_k = \nabla F(\hat\theta_k))$
A good initialization for $\Psi$ (see later);
repeat
    repeat
        foreach $x_i \in \mathcal{X}$ do $z_i = \arg\min_j \left[ B_{F_j^*}(y_i : \hat\eta_j) - \log \hat w_j \right]$;
        foreach $C_j := \{x_i \in \mathcal{X} \mid z_i = j\}$ do $\hat\eta_j = \sum_{x_i \in C_j} y_i / |C_j|$;
    until convergence of the complete likelihood;
    Update mixing proportions: $\hat w_j = |C_j|/N$;
until further convergence of the complete likelihood;
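A schematic version of Algorithm 3, assuming the sufficient statistics are flattened into an array Y of shape (N, D) and that bregman(y, eta) computes $B_{F^*}(y : \eta)$ (our naming); the fixed loop bounds stand in for the convergence tests:

```python
import numpy as np

def k_mle_dual(Y, k, bregman, n_outer=20, n_inner=50):
    """Sketch of Algorithm 3 (dual k-MLE): in expectation parameters the
    per-cluster MLE is simply the average of the y_i."""
    N = len(Y)
    w = np.full(k, 1.0 / k)
    eta = Y[np.random.choice(N, k, replace=False)].copy()   # naive seeding
    for _ in range(n_outer):
        for _ in range(n_inner):
            cost = np.array([[bregman(y, eta[j]) - np.log(w[j])
                              for j in range(k)] for y in Y])
            z = cost.argmin(axis=1)
            eta = np.array([Y[z == j].mean(axis=0) if np.any(z == j) else eta[j]
                            for j in range(k)])
        w = np.array([(z == j).mean() for j in range(k)])
    return w, eta, z
```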
k-MLE for Wishart distributions
19/31
Practical considerations impose modifications of the algorithm:
During the assignment, empty clusters may appear (high-dimensional data makes this worse).
A possible solution is to consider Hartigan and Wong's strategy [6] instead of Lloyd's strategy:
    Optimally transfer one observation at a time.
    Update the parameters of the involved clusters.
    Stop when no transfer is possible.
This should guarantee non-empty clusters [7] but does not work when considering weighted clusters...
Get back to an old-school criterion: $|C_{z_i}| > 1$
Experimentally shown to perform better in high dimension than Lloyd's strategy.
k-MLE - Hartigan and Wong
20/31
Criterion for potential transfer (Max):

$$\frac{\log \hat w_{z_i}\, p_{F_{z_i}}(x_i; \hat\theta_{z_i})}{\log \hat w_{z_i^*}\, p_{F_{z_i^*}}(x_i; \hat\theta_{z_i^*})} \geq 1 \qquad \text{with } z_i^* = \arg\max_j \log \hat w_j\, p_{F_j}(x_i; \hat\theta_j)$$

Update rules:

$$\hat\theta_{z_i} = MLE_{F_{z_i}}(C_{z_i} \setminus \{x_i\}), \qquad \hat\theta_{z_i^*} = MLE_{F_{z_i^*}}(C_{z_i^*} \cup \{x_i\})$$

OR

Criterion for potential transfer (Min):

$$\frac{B_{F^*}(y_i : \eta_{z_i^*}) - \log w_{z_i^*}}{B_{F^*}(y_i : \eta_{z_i}) - \log w_{z_i}} \leq 1 \qquad \text{with } z_i^* = \arg\min_j \left( B_{F^*}(y_i : \eta_j) - \log w_j \right)$$

Update rules:

$$\eta_{z_i} = \frac{|C_{z_i}|\,\eta_{z_i} - y_i}{|C_{z_i}| - 1}, \qquad \eta_{z_i^*} = \frac{|C_{z_i^*}|\,\eta_{z_i^*} + y_i}{|C_{z_i^*}| + 1}$$
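A sketch of a single Hartigan-and-Wong style transfer in the dual (Min) form; bregman(y, eta) stands for $B_{F^*}(y : \eta)$ and all names are ours:

```python
import numpy as np

def try_transfer(i, z, eta, sizes, Y, w, bregman):
    """Transfer y_i to its best cluster if that improves the Min criterion
    and does not empty its current cluster; update centroids incrementally."""
    y, src = Y[i], z[i]
    cost = np.array([bregman(y, eta[j]) - np.log(w[j]) for j in range(len(w))])
    dst = int(cost.argmin())
    if dst == src or sizes[src] <= 1 or cost[dst] >= cost[src]:
        return False                      # no admissible / beneficial transfer
    # Incremental centroid updates (means of sufficient statistics).
    eta[src] = (sizes[src] * eta[src] - y) / (sizes[src] - 1)
    eta[dst] = (sizes[dst] * eta[dst] + y) / (sizes[dst] + 1)
    sizes[src] -= 1; sizes[dst] += 1
    z[i] = dst
    return True
```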
Towards a good initialization...
21/31
Classical initialization methods: random centers, random partition, furthest point (2-approximation), ...
A better approach is k-means++ [8]: sampling proportionally to the squared distance to the nearest center.
Fast and greedy approximation: $O(kN)$
Probabilistic guarantee of a good initialization:

$$OPT_F \leq \text{k-means}_F \leq O(\log k)\, OPT_F$$

The dual Bregman divergence $B_{F^*}$ may replace the squared distance.
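A small sketch of this seeding step, assuming a dual Bregman divergence callable bregman(y, c) (our name) in place of the squared Euclidean distance:

```python
import numpy as np

def bregman_kmeanspp(Y, k, bregman, rng=np.random.default_rng()):
    """k-means++ seeding where the squared distance is replaced by a Bregman
    divergence; Y has shape (N, D)."""
    N = len(Y)
    centers = [Y[rng.integers(N)]]                 # first seed, uniform
    for _ in range(k - 1):
        # Divergence of each point to its nearest already-chosen center.
        d = np.array([min(bregman(y, c) for c in centers) for y in Y])
        p = d / d.sum()
        centers.append(Y[rng.choice(N, p=p)])      # sample prop. to divergence
    return np.array(centers)
```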
Heuristic to avoid fixing k
22/31
K-means imposes fixing k, the number of clusters.
We propose on-the-fly cluster creation together with k-MLE++ (inspired by DP-k-means [9]):
    Create a cluster when there exist observations contributing too much to the loss function with the already selected centers.
It may overestimate the number of clusters...
Initialization with DP-k-MLE++
23/31
Algorithm 4: DP-k-MLE++
Input: A sample $y_1 = t(X_1), \ldots, y_N = t(X_N)$, $F$, $\lambda > 0$
Output: $C$, a subset of $\{y_1, \ldots, y_N\}$; $k$, the number of clusters
Choose the first seed $C = \{y_j\}$, for $j$ uniformly random in $\{1, 2, \ldots, N\}$;
repeat
    foreach $y_i$ do compute $p_i = B_F(y_i : C) / \sum_{i'=1}^{N} B_F(y_{i'} : C)$ where $B_F(y_i : C) = \min_{c \in C} B_F(y_i : c)$;
    if $\exists\, p_i > \lambda$ then
        Choose the next seed $s$ among $y_1, y_2, \ldots, y_N$ with probability $p_i$;
        Add the selected seed to $C$: $C = C \cup \{s\}$;
until all $p_i \leq \lambda$;
$k = |C|$;
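A matching sketch of DP-k-MLE++, with the same assumed bregman(y, c) callable and our own function name dp_kmle_pp; the threshold lam plays the role of $\lambda$:

```python
import numpy as np

def dp_kmle_pp(Y, bregman, lam, rng=np.random.default_rng()):
    """Keep adding seeds while some point carries more than a fraction `lam`
    of the total Bregman loss to the already selected centers."""
    N = len(Y)
    C = [Y[rng.integers(N)]]                       # first seed, uniform
    while True:
        d = np.array([min(bregman(y, c) for c in C) for y in Y])
        total = d.sum()
        if total == 0:                             # every point is a seed
            break
        p = d / total
        if np.all(p <= lam):                       # no point contributes too much
            break
        C.append(Y[rng.choice(N, p=p)])            # sample next seed with prob. p_i
    return np.array(C), len(C)
```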
Motion capture 
24/31 
Real dataset: 
Motion capture of contemporary dancers (15 sensors in 3d).

Application to motion retrieval (1)
25/31
Motion capture data can be viewed as matrices $X_i$ with different row sizes but the same column size d.
The idea is to describe each $X_i$ through the parameters $\hat\Psi_i$ of one mixture model.
Remark: the size of each sub-motion is known (and so is its n).
Mixture parameters can be viewed as a sparse representation of the local dynamics in $X_i$.
Application to motion retrieval (2)
26/31
Comparing two movements amounts to computing a dissimilarity measure between $\hat\Psi_i$ and $\hat\Psi_j$.
Remark 1: with DP-k-MLE++, the two mixtures would probably not have the same number of components.
Remark 2: when both mixtures have one component, a natural choice is

$$KL(W_d(\cdot; \hat\theta)\,\|\,W_d(\cdot; \hat\theta')) = B_F(\hat\theta' : \hat\theta) = B_{F^*}(\hat\eta : \hat\eta')$$

A closed form is always available!
No closed form exists for the KL divergence between general mixtures.
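A sketch of this single-component case, reusing the log-normalizer F of slide 9; kl_wishart and the helper names are ours, and parameters are given in natural coordinates $(\theta_n, \theta_S)$:

```python
import numpy as np
from scipy.special import multigammaln, digamma

def F_wishart(theta_n, theta_S):
    # Log-normalizer of the Wishart EF (slide 9).
    d = theta_S.shape[0]
    a = theta_n + 0.5 * (d + 1)
    return a * (d * np.log(2.0) - np.linalg.slogdet(theta_S)[1]) + multigammaln(a, d)

def grad_F_wishart(theta_n, theta_S):
    d = theta_S.shape[0]
    a = theta_n + 0.5 * (d + 1)
    gn = d * np.log(2.0) - np.linalg.slogdet(theta_S)[1] \
         + sum(digamma(a - 0.5 * j) for j in range(d))
    gS = -a * np.linalg.inv(theta_S)
    return gn, gS

def kl_wishart(theta, theta_prime):
    """KL(W_d(.;theta) || W_d(.;theta')) = B_F(theta' : theta)."""
    (tn, tS), (tn2, tS2) = theta, theta_prime
    gn, gS = grad_F_wishart(tn, tS)
    inner = (tn2 - tn) * gn + np.trace((tS2 - tS) @ gS)   # inner product on R x Sym(d)
    return F_wishart(tn2, tS2) - F_wishart(tn, tS) - inner
```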
Application to motion retrieval (3)
27/31
A possible solution is to use the CS divergence [10]:

$$CS(m : m') = -\log \frac{\int m(x)\, m'(x)\, dx}{\sqrt{\int m(x)^2\, dx \int m'(x)^2\, dx}}$$

It has an analytic formula since

$$\int m(x)\, m'(x)\, dx = \sum_{j=1}^{k} \sum_{j'=1}^{k'} w_j\, w'_{j'}\, \exp\left( F(\theta_j + \theta'_{j'}) - \left( F(\theta_j) + F(\theta'_{j'}) \right) \right)$$

Note that this expression is well defined since the natural parameter space $\Theta = \mathbb{R}_+ \times S_{++}^p$ is a convex cone.
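A sketch of the closed-form CS divergence between two Wishart mixtures given in natural parameters; all function names are ours and the log-normalizer is the F of slide 9:

```python
import numpy as np
from scipy.special import multigammaln

def F_wishart(theta_n, theta_S):
    d = theta_S.shape[0]
    a = theta_n + 0.5 * (d + 1)
    return a * (d * np.log(2.0) - np.linalg.slogdet(theta_S)[1]) + multigammaln(a, d)

def product_integral(ws, thetas, ws2, thetas2):
    """Closed form of  int m(x) m'(x) dx  for two Wishart mixtures,
    with thetas = [(theta_n, theta_S), ...] in natural parameters."""
    total = 0.0
    for w, (tn, tS) in zip(ws, thetas):
        for w2, (tn2, tS2) in zip(ws2, thetas2):
            total += w * w2 * np.exp(F_wishart(tn + tn2, tS + tS2)
                                     - (F_wishart(tn, tS) + F_wishart(tn2, tS2)))
    return total

def cs_divergence(ws, thetas, ws2, thetas2):
    """Cauchy-Schwarz divergence CS(m : m') built from the closed-form integrals."""
    cross = product_integral(ws, thetas, ws2, thetas2)
    return -np.log(cross / np.sqrt(product_integral(ws, thetas, ws, thetas)
                                   * product_integral(ws2, thetas2, ws2, thetas2)))
```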
Scientific code in Matlab™; today, an implementation in Python (based on pyMEF [2]).
Ongoing proof of concept (with Herranz F., Beurive A.).
Conclusions - Future works
29/31
Still some mathematical work to be done:
    Solve the MLE equations to get $\nabla F^* = (\nabla F)^{-1}$, then $F^*$.
    Characterize our estimator for the full Wishart distribution.
Complete and validate the prototype of the motion retrieval system.
Speeding up the algorithm: computational / numerical / algorithmic tricks.
A library for Bregman divergence learning?
Possible extensions:
    Reintroduce the mean vector in the model: Gaussian-Wishart.
    Online k-means, online k-MLE ...
References I
30/31
[1] Nielsen, F.: k-MLE: A fast algorithm for learning statistical mixture models. In: International Conference on Acoustics, Speech and Signal Processing. (2012) pp. 869-872
[2] Schwander, O., Nielsen, F.: pyMEF - A framework for Exponential Families in Python. In: Proceedings of the 2011 IEEE Workshop on Statistical Signal Processing
[3] Banerjee, A., Merugu, S., Dhillon, I.S., Ghosh, J.: Clustering with Bregman divergences. Journal of Machine Learning Research 6 (2005) 1705-1749
[4] Nielsen, F., Garcia, V.: Statistical exponential families: A digest with flash cards. http://arxiv.org/abs/0911.4863 (11 2009)
[5] Hidot, S., Saint-Jean, C.: An Expectation-Maximization algorithm for the Wishart mixture model: Application to movement clustering. Pattern Recognition Letters 31(14) (2010) 2318-2324

References II
31/31
[6] Hartigan, J.A., Wong, M.A.: Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society, Series C (Applied Statistics) 28(1) (1979) 100-108
[7] Telgarsky, M., Vattani, A.: Hartigan's method: k-means clustering without Voronoi. In: Proc. of the International Conference on Artificial Intelligence and Statistics (AISTATS). (2010) pp. 820-827
[8] Arthur, D., Vassilvitskii, S.: k-means++: The advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM Symposium on Discrete Algorithms. (2007) pp. 1027-1035
[9] Kulis, B., Jordan, M.I.: Revisiting k-means: New algorithms via Bayesian nonparametrics. In: International Conference on Machine Learning (ICML). (2012)
[10] Nielsen, F.: Closed-form information-theoretic divergences for statistical mixtures. In: International Conference on Pattern Recognition (ICPR). (2012) pp. 1723-1726