A Tutorial of the EM-algorithm and its Application to
Outlier Detection
Jaehyeong Ahn
Konkuk University
jayahn0104@gmail.com
September 9, 2020
Table of Contents
1 EM-algorithm: An Overview
2 Proof for EM-algorithm
Non-decreasing (Ascent Property)
Convergence
Local Maximum
3 An Example: Gaussian Mixture Model (GMM)
4 Application to Outlier Detection
Directly Used
Indirectly Used
5 Summary
6 References
EM-algorithm: An Overview
EM-algorithm: An Overview
Introduction
The EM-algorithm (Expectation-Maximization algorithm) is an
iterative procedure for computing the maximum likelihood estimator
(MLE) when only a subset of the data is available (i.e., when the model
depends on unobserved latent variables)
The first proper theoretical study of the algorithm was done by
Dempster, Laird, and Rubin (1977) [1]
The EM-algorithm is widely used in various research areas when
unobserved latent variables are included in the model
EM-algorithm: An Overview
Data
Y = (y, · · · , yN )T : Observed data
Model
Assume that Y is dependent on some unobserved latent variable Z
where Z = (z, · · · , zN )T
When Z is assumed as discrete random variable
rik = P(zi = k|Y, θ), k = , · · · , K
z∗
i = argmax
k
rik
When Z is assumed as continuous random variable
ri = fZ|Y,Θ(zi|Y, θ)
Log-likelihood
obs(θ; Y ) = log fY |Θ(Y |θ) = log fY,Z|Θ(Y, Z|θ) dz
Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 5 / 47
EM-algorithm: An Overview
Goal
By maximizing ℓ_obs(θ; Y) w.r.t. θ:
Find θ̂ which satisfies ∂_{θ_j} ℓ_obs(θ; Y) |_{θ=θ̂} = 0, for j = 1, · · · , J
Compute the estimated value r̂_ik = f_{Z|Y,Θ}(z_i | Y, θ̂)
Problem
The latent variable Z is not observable
It is difficult to compute the integral in ℓ_obs(θ; Y)
Thus the parameters cannot be estimated separately
Solution
Assume the latent variable Z is observed
Define the complete data
Maximize the complete log-likelihood
EM-algorithm: An Overview
Data
Y = (y, · · · , yN )T : Observed data
Z = (z, · · · , zN )T : Unobserved (latent) variable
It is assumed as observed
X = (Y, Z): Complete Data
Model
ri = fZ|Y,Θ(zi|Y, θ)
Complete log-likelihood
C(θ; X) = log fX|Θ(X|θ) = log fX|Θ(Y, Z|θ)
Log-likelihood for observed data
obs(θ; Y ) = log fY |Θ(Y |θ) = log fX|Θ(Y, Z|θ) dz
Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 7 / 47
EM-algorithm: An Overview
Estimation Idea
Maximize ℓ_C(θ; X) instead of maximizing ℓ_obs(θ; Y)
Since the parameters in ℓ_C(θ; X) can be decoupled
E-step
Compute the Expected Complete Log-Likelihood (ECLL) by taking the conditional expectation of ℓ_C(θ; X) given Y and the current value of the parameters θ^(s)
This step estimates the realizations of z (since the value of z is not observed)
M-step
Maximize the computed ECLL w.r.t. θ
Update the estimate of Θ to θ^(s+1), the maximizer of the current ECLL
Iterate this procedure until it converges
EM-algorithm: An Overview
Estimation
E-step
Take the conditional expectation of ℓ_C(θ; X) given Y and θ = θ^(s)
For θ^(0), set an initial guess
Compute Q(θ | θ^(s)) = E_{θ^(s)}[ ℓ_C(θ; X) | Y ]
M-step
Maximize Q(θ | θ^(s)) w.r.t. θ
Put θ^(s+1) = argmax_θ Q(θ | θ^(s))
Iterate until the following inequality is satisfied:
||θ^(s+1) − θ^(s)|| < ε, where ε denotes a sufficiently small value
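To make the loop concrete, here is a minimal sketch of the generic iteration in Python. It is not from the original slides: `e_step` and `m_step` are hypothetical callbacks standing in for the model-specific construction of Q(θ | θ^(s)) and its maximizer, and the parameters are assumed to be packed into a NumPy array so the stopping rule can use a norm.

```python
import numpy as np

def em(Y, theta0, e_step, m_step, eps=1e-6, max_iter=500):
    """Generic EM loop (sketch). e_step returns the expected sufficient
    statistics given the current parameters; m_step returns the parameter
    vector maximizing the resulting ECLL. Both are hypothetical callbacks."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        stats = e_step(Y, theta)        # E-step: build Q(theta | theta_s)
        theta_new = m_step(Y, stats)    # M-step: argmax_theta Q(theta | theta_s)
        if np.linalg.norm(theta_new - theta) < eps:  # ||theta^(s+1) - theta^(s)|| < eps
            return theta_new
        theta = theta_new
    return theta
```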
EM-algorithm: An Overview
Immediate Question
Does maximizing the sequence Q(θ | θ^(s)) lead to maximizing ℓ_obs(θ; Y)?
This question will be answered in the following slides (Proof for EM-algorithm) in 3 parts:
Non-decreasing / Convergence / Local maximum
Proof for EM-algorithm
Proof for EM-algorithm: Non-decreasing
Non-decreasing (Ascent Property)
Proposition 1.
The sequence ℓ_obs(θ^(s); Y) in the EM-algorithm is non-decreasing
Proof
We write X = (Y, Z) for the complete data
Then
f_{Z|Y,Θ}(Z | Y, θ) = f_{X|Θ}(Y, Z | θ) / f_{Y|Θ}(Y | θ)
Hence,
ℓ_obs(θ; Y) = log f_{Y|Θ}(Y | θ) = log f_{X|Θ}((Y, Z) | θ) − log f_{Z|Y,Θ}(Z | Y, θ)
Proof for EM-algorithm: Non-decreasing
Taking the conditional expectation given Y and Θ = θ^(s) on both sides yields
ℓ_obs(θ; Y) = E_{θ^(s)}[ ℓ_obs(θ; Y) | Y ]
= E_{θ^(s)}[ log f_{X|Θ}((Y, Z) | θ) | Y ] − E_{θ^(s)}[ log f_{Z|Y,Θ}(Z | Y, θ) | Y ]
= Q(θ | θ^(s)) − H(θ | θ^(s))
where
Q(θ | θ^(s)) = E_{θ^(s)}[ log f_{X|Θ}((Y, Z) | θ) | Y ]
H(θ | θ^(s)) = E_{θ^(s)}[ log f_{Z|Y,Θ}(Z | Y, θ) | Y ]
Then we have
ℓ_obs(θ^(s+1); Y) − ℓ_obs(θ^(s); Y) = [Q(θ^(s+1) | θ^(s)) − Q(θ^(s) | θ^(s))] − [H(θ^(s+1) | θ^(s)) − H(θ^(s) | θ^(s))]
Proof for EM-algorithm: Non-decreasing
Recall that
ℓ_obs(θ^(s+1); Y) − ℓ_obs(θ^(s); Y) = [Q(θ^(s+1) | θ^(s)) − Q(θ^(s) | θ^(s))]   (I)
− [H(θ^(s+1) | θ^(s)) − H(θ^(s) | θ^(s))]   (II)
(I) is non-negative
θ^(s+1) = argmax_θ Q(θ | θ^(s))
Hence Q(θ^(s+1) | θ^(s)) ≥ Q(θ^(s) | θ^(s))
Thus (I) ≥ 0
Proof for EM-algorithm: Non-decreasing
(II) is non-positive
Using Jensen's inequality for concave functions (log is concave)
Theorem 1. Jensen's inequality
Let f be a concave function, and let X be a random variable. Then
E[f(X)] ≤ f(E[X])
H(θ^(s+1) | θ^(s)) − H(θ^(s) | θ^(s)) = E_{θ^(s)}[ log ( f_{Z|Y,Θ}(Z | Y, θ^(s+1)) / f_{Z|Y,Θ}(Z | Y, θ^(s)) ) | Y ]
≤ log E_{θ^(s)}[ f_{Z|Y,Θ}(Z | Y, θ^(s+1)) / f_{Z|Y,Θ}(Z | Y, θ^(s)) | Y ]
= log ∫ ( f_{Z|Y,Θ}(z | Y, θ^(s+1)) / f_{Z|Y,Θ}(z | Y, θ^(s)) ) f_{Z|Y,Θ}(z | Y, θ^(s)) dz
= log 1 = 0
Hence H(θ^(s+1) | θ^(s)) ≤ H(θ^(s) | θ^(s))
Proof for EM-algorithm: Non-decreasing
Theorem 1. Jensen's inequality
Let f be a concave function, and let X be a random variable. Then
E[f(X)] ≤ f(E[X])
Figure: Jensen's inequality for a concave function [3]
Proof for EM-algorithm: Non-decreasing
Recall that
ℓ_obs(θ^(s+1); Y) − ℓ_obs(θ^(s); Y) = [Q(θ^(s+1) | θ^(s)) − Q(θ^(s) | θ^(s))]   (I)
− [H(θ^(s+1) | θ^(s)) − H(θ^(s) | θ^(s))]   (II)
We've proven that (I) ≥ 0 and (II) ≤ 0
This shows ℓ_obs(θ^(s+1); Y) − ℓ_obs(θ^(s); Y) ≥ 0
Thus the sequence ℓ_obs(θ^(s); Y) in the EM-algorithm is non-decreasing
Proof for EM-algorithm: Convergence
Convergence
We will show that the sequence θ^(s) converges to some θ^* with ℓ(θ^*; y) = ℓ^*, the limit of ℓ(θ^(s))
Assumptions
Ω is a subset of R^k
Ω_{θ_0} = {θ ∈ Ω : ℓ(θ; y) ≥ ℓ(θ_0; y)} is compact for any ℓ(θ_0; y) > −∞
ℓ(θ; y) is continuous in Ω and differentiable in the interior of Ω
Proof for EM-algorithm: Convergence
Theorem 2
Suppose that Q(θ | φ) is continuous in both θ and φ. Then all
limit points of any instance {θ^(s)} of the EM-algorithm are
stationary points, i.e. θ^* = argmax_θ Q(θ | θ^*), and ℓ(θ^(s); y)
converges monotonically to some value ℓ^* = ℓ(θ^*; y) for some
stationary point θ^*
Theorem 3
Assume the hypotheses of Theorem 2. Suppose in addition that
∂_θ Q(θ | φ) is continuous in θ and φ. Then θ^(s) converges to a
stationary point θ^* with ℓ(θ^*; y) = ℓ^*, the limit of ℓ(θ^(s)), if either
{θ : ℓ(θ; y) = ℓ^*} = {θ^*}, or
||θ^(s+1) − θ^(s)|| → 0 and {θ : ℓ(θ; y) = ℓ^*} is discrete
Proof for EM-algorithm: Local Maximum
Local Maximum
Recall that
ℓ_C(θ; X) = log f_{X|Θ}(X | θ) = log f_{Z|Y,Θ}(Z | Y, θ) + log f_{Y|Θ}(Y | θ)
Then
Q(θ | θ^(s)) = ∫ log f_{Z|Y,Θ}(z | Y, θ) f_{Z|Y,Θ}(z | Y, θ^(s)) dz + ℓ_obs(θ; Y)
Differentiating w.r.t. θ_j and setting the result equal to 0 in order to maximize Q gives
0 = ∂_{θ_j} Q(θ | θ^(s)) = ∫ [ ∂_{θ_j} f_{Z|Y,Θ}(z | Y, θ) / f_{Z|Y,Θ}(z | Y, θ) ] f_{Z|Y,Θ}(z | Y, θ^(s)) dz + ∂_{θ_j} ℓ_obs(θ; Y)
Proof for EM-algorithm: Local Maximum
Recall that
0 = ∂_{θ_j} Q(θ | θ^(s)) = ∫ [ ∂_{θ_j} f_{Z|Y,Θ}(z | Y, θ) / f_{Z|Y,Θ}(z | Y, θ) ] f_{Z|Y,Θ}(z | Y, θ^(s)) dz + ∂_{θ_j} ℓ_obs(θ; Y)
If θ^(s) → θ^*, then we have for θ^* that (with j = 1, · · · , J)
0 = ∂_{θ_j} Q(θ^* | θ^*)
= ∫ [ ∂_{θ_j} f_{Z|Y,Θ}(z | Y, θ^*) / f_{Z|Y,Θ}(z | Y, θ^*) ] f_{Z|Y,Θ}(z | Y, θ^*) dz + ∂_{θ_j} ℓ_obs(θ^*; Y)
= ∂_{θ_j} ∫ f_{Z|Y,Θ}(z | Y, θ^*) dz + ∂_{θ_j} ℓ_obs(θ^*; Y)
= ∂_{θ_j} ℓ_obs(θ^*; Y)
Since ∫ f_{Z|Y,Θ}(z | Y, θ^*) dz = 1, its derivative vanishes, so θ^* is a stationary point of ℓ_obs
An Example: Gaussian Mixture Model (GMM)
An Example: Gaussian Mixture Model (GMM)
Introduction
Mixture models make use of latent variables to model different
parameters for different groups (or clusters) of data points
For a point y_i, let the cluster to which that point belongs be labeled z_i, where z_i is latent (unobserved)
In this example, we will assume our observable features y_i are distributed as a Gaussian whose parameters depend on the cluster that y_i is associated with
An Example: Gaussian Mixture Model (GMM)
Figure: Gaussian Mixture Model Example [5]
An Example: Gaussian Mixture Model (GMM)
Data
Y = (y, · · · , yN )T : Observed data
∀i yi ∈ Rp
Z = (z, · · · , zN )T : Unobserved (latent) variable
Assume that Z is observed
∀i zi ∈ {, , · · · , K}
X = (Y, Z): Complete data
Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 25 / 47
An Example: Gaussian Mixture Model (GMM)
Distribution Assumption
z_i ∼ Mult(π), π ∈ R^K
y_i | z_i = k ∼ N_p(µ_k, Σ_k)
Model
r_ik := p(z_i = k | y_i, θ) = p(y_i | z_i = k, θ) p(z_i = k | θ) / Σ_{k'=1}^{K} p(y_i | z_i = k', θ) p(z_i = k' | θ)
z_i^* = argmax_k r_ik
θ denotes the full collection of parameters; θ = {π, µ, Σ}
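The generative story above can be simulated directly, which is a useful sanity check when fitting later. A small Python sketch (the specific values of π, µ_k, Σ_k are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.5, 0.3, 0.2])                      # mixing weights (illustrative)
mu = np.array([[0.0, 0.0], [4.0, 4.0], [-4.0, 4.0]])
Sigma = np.array([np.eye(2), 0.5 * np.eye(2), 2.0 * np.eye(2)])

N = 1000
z = rng.choice(len(pi), size=N, p=pi)               # z_i ~ Mult(pi)
Y = np.array([rng.multivariate_normal(mu[k], Sigma[k]) for k in z])  # y_i | z_i = k
```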
An Example: Gaussian Mixture Model (GMM)
Notation Simplification
Write z_i = k as
z_i = (z_i1, · · · , z_ik, · · · , z_iK)^T = (0, · · · , 1, · · · , 0)^T
where z_ij = I(j = k) ∈ {0, 1} for j = 1, · · · , K
Using this:
p(z_i | π) = Π_{k=1}^{K} π_k^{z_i^k}
p(y_i | z_i, θ) = Π_{k=1}^{K} N(µ_k, Σ_k)^{z_i^k} = Π_{k=1}^{K} φ(y_i | µ_k, Σ_k)^{z_i^k}
An Example: Gaussian Mixture Model (GMM)
Log-likelihood for observed data
ℓ_obs(θ; Y) = log f_{Y|Θ}(Y | θ)
= Σ_{i=1}^{N} log p(y_i | θ)
= Σ_{i=1}^{N} log Σ_{z_i} p(y_i, z_i | θ)
= Σ_{i=1}^{N} log [ Σ_{z_i ∈ Z} Π_{k=1}^{K} π_k^{z_i^k} N(µ_k, Σ_k)^{z_i^k} ]
This does not decouple the likelihood because the log cannot be
'pushed' inside the summation
An Example: Gaussian Mixture Model (GMM)
Complete log-likelihood
ℓ_C(θ; X) = log f_{X|Θ}(Y, Z | θ)
= log f_{Y|Z,Θ}(Y | Z, θ) + log f_{Z|Θ}(Z | θ)
= Σ_{i=1}^{N} [ log Π_{k=1}^{K} φ(y_i | µ_k, Σ_k)^{z_i^k} + log Π_{k=1}^{K} π_k^{z_i^k} ]
= Σ_{i=1}^{N} Σ_{k=1}^{K} [ z_i^k log φ(y_i | µ_k, Σ_k) + z_i^k log π_k ]
The parameters are now decoupled, since we can estimate π_k and (µ_k, Σ_k)
separately
An Example: Gaussian Mixture Model (GMM)
Estimation
E-step
Q(θ | θ^(s)) = E_{θ^(s)}[ Σ_{i=1}^{N} Σ_{k=1}^{K} ( z_i^k log φ(y_i | µ_k, Σ_k) + z_i^k log π_k ) | Y ]
= Σ_{i=1}^{N} Σ_{k=1}^{K} ( E_{θ^(s)}[z_i^k | Y] log φ(y_i | µ_k, Σ_k) + E_{θ^(s)}[z_i^k | Y] log π_k )
An Example: Gaussian Mixture Model (GMM)
E-step
Note that z_i^k | Y ∼ Bernoulli( p(z_i^k = 1 | Y, θ) )
Hence
r_ik^(s) := E_{θ^(s)}[z_i^k | Y] = p(z_i^k = 1 | Y, θ^(s))
= p(z_i^k = 1, y_i | θ^(s)) / Σ_{k'=1}^{K} p(z_i^{k'} = 1, y_i | θ^(s))
= p(y_i | z_i^k = 1, θ^(s)) p(z_i^k = 1 | θ^(s)) / Σ_{k'=1}^{K} p(y_i | z_i^{k'} = 1, θ^(s)) p(z_i^{k'} = 1 | θ^(s))
= φ(y_i | µ_k^(s), Σ_k^(s)) π_k^(s) / Σ_{k'=1}^{K} φ(y_i | µ_{k'}^(s), Σ_{k'}^(s)) π_{k'}^(s)
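As a concrete illustration, the responsibility update above fits in a few lines of NumPy/SciPy. This is a sketch rather than the presenter's code; `Y` is the N × p data matrix and `pi`, `mu`, `Sigma` hold the current values θ^(s) = (π^(s), µ^(s), Σ^(s)).

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(Y, pi, mu, Sigma):
    """GMM E-step: r[i, k] proportional to pi_k * phi(y_i | mu_k, Sigma_k),
    normalized over k, exactly as in the formula above."""
    N, K = Y.shape[0], len(pi)
    r = np.empty((N, K))
    for k in range(K):
        r[:, k] = pi[k] * multivariate_normal.pdf(Y, mean=mu[k], cov=Sigma[k])
    r /= r.sum(axis=1, keepdims=True) + 1e-300   # normalize; tiny constant guards underflow
    return r
```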
An Example: Gaussian Mixture Model (GMM)
M-step
Recall that
Q(θ | θ^(s)) = Σ_{i=1}^{N} Σ_{k=1}^{K} [ r_ik^(s) log φ(y_i | µ_k, Σ_k) + r_ik^(s) log π_k ]
Set θ^(s+1) = argmax_θ Q(θ | θ^(s)):
π_k^(s+1) = Σ_{i=1}^{N} r_ik^(s) / N
µ_k^(s+1) = Σ_{i=1}^{N} r_ik^(s) y_i / Σ_{i=1}^{N} r_ik^(s)
Σ_k^(s+1) = Σ_{i=1}^{N} r_ik^(s) (y_i − µ_k^(s+1))(y_i − µ_k^(s+1))^T / Σ_{i=1}^{N} r_ik^(s)
An Example: Gaussian Mixture Model (GMM)
Iterate until the following inequality is satisfied:
||θ^(s+1) − θ^(s)|| < ε
Let θ̂ = θ^(s)
What we get
θ̂ = (π̂, µ̂, Σ̂)
r̂_ik = p(z_i = k | Y, θ̂) = φ(y_i | µ̂_k, Σ̂_k) π̂_k / Σ_{k'=1}^{K} φ(y_i | µ̂_{k'}, Σ̂_{k'}) π̂_{k'}
ẑ_i = argmax_k r̂_ik
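Combining the M-step formulas with the stopping rule yields a complete, if minimal, GMM-EM fit. The sketch below reuses the `e_step` function from the E-step slide; the random-responsibility initialization and the small ridge added to each covariance are simplifications of this sketch, not part of the slides.

```python
import numpy as np

def m_step(Y, r):
    """GMM M-step: closed-form updates for pi_k, mu_k, Sigma_k."""
    N, p = Y.shape
    Nk = r.sum(axis=0)                        # effective cluster sizes sum_i r_ik
    pi = Nk / N
    mu = (r.T @ Y) / Nk[:, None]
    Sigma = np.empty((r.shape[1], p, p))
    for k in range(r.shape[1]):
        D = Y - mu[k]
        Sigma[k] = (r[:, k, None] * D).T @ D / Nk[k] + 1e-6 * np.eye(p)  # ridge for stability
    return pi, mu, Sigma

def fit_gmm(Y, K, eps=1e-5, max_iter=300, seed=0):
    """Alternate e_step / m_step until the means stop moving."""
    rng = np.random.default_rng(seed)
    r = rng.dirichlet(np.ones(K), size=Y.shape[0])   # random initial responsibilities
    pi, mu, Sigma = m_step(Y, r)
    for _ in range(max_iter):
        r = e_step(Y, pi, mu, Sigma)
        pi, mu_new, Sigma = m_step(Y, r)
        if np.linalg.norm(mu_new - mu) < eps:        # ||theta^(s+1) - theta^(s)|| < eps
            mu = mu_new
            break
        mu = mu_new
    return pi, mu, Sigma, r
```

The hard assignments ẑ_i = argmax_k r̂_ik are then simply `r.argmax(axis=1)`.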
An Example: Gaussian Mixture Model (GMM)
Figure: Gaussian Mixture Model Fitting Example [5]
Application to Outlier Detection
Application to Outlier Detection: Directly Used
Basic Idea
For cases in which the data may have many different clusters with different orientations
Assume a specific form of the generative model (e.g., a mixture of Gaussians)
Fit the model to the data (usually to normal behavior) by estimating the parameters with the EM-algorithm
Apply this model to the unseen (test) data and obtain estimates of the fit (joint) probabilities
Data points that fit the distribution well will have high fit (joint) probabilities, whereas anomalies (outliers) will have very low fit (joint) probabilities
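A minimal sketch of this pipeline, assuming the `fit_gmm` and `e_step` helpers from the GMM section (the simulation linked on the next slide is in R; Python is used here only for consistency with the earlier sketches):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_log_density(Y, pi, mu, Sigma):
    """Fit (joint) probability of each point under the fitted mixture,
    f(y_i) = sum_k pi_k * phi(y_i | mu_k, Sigma_k), on the log scale."""
    dens = np.zeros(Y.shape[0])
    for k in range(len(pi)):
        dens += pi[k] * multivariate_normal.pdf(Y, mean=mu[k], cov=Sigma[k])
    return np.log(dens)

def flag_outliers(Y_train, Y_test, K=3, alpha=0.05):
    """Fit on (mostly normal) training data, then declare the alpha fraction
    of test points with the lowest fit probability to be outliers."""
    pi, mu, Sigma, _ = fit_gmm(Y_train, K)
    scores = gmm_log_density(Y_test, pi, mu, Sigma)
    return scores < np.quantile(scores, alpha)   # True = declared outlier
```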
Application to Outlier Detection: Directly Used
Simulation in R
https://rpubs.com/JayAhn/650433
Application to Outlier Detection: Indirectly Used
Introduction
Interestingly, the EM-algorithm can also be used as a final step after many such outlier detection algorithms, to convert their scores into probabilities [7]
Converting the outlier scores into well-calibrated probability estimates is favorable for several reasons:
1 The probability estimates allow us to select the appropriate threshold for declaring outliers using a Bayesian risk model
2 The probability estimates obtained from individual models can be aggregated to build an ensemble outlier detection framework
Application to Outlier Detection: Indirectly Used
Motivation
Since the outlier detection problem mainly arises in an unsupervised learning environment, it is hard to select an appropriate threshold for declaring outliers
Every outlier detection model outputs outlier scores on a different scale, which makes constructing an outlier ensemble model difficult
Application to Outlier Detection: Indirectly Used
Outlier Score Distributions
(a) Outlier score distribution for normal examples; (b) outlier score distribution for outliers
Figure: Outlier Score Distributions [7]
Application to Outlier Detection: Indirectly Used
Basic Idea
Treat an outlier score as a univariate random variable
Assume the label of outlierness is an unobserved latent variable
Estimate the posterior probabilities for the latent variable with the EM-algorithm, in one of two ways:
1 Model the posterior probability for outlier scores using a sigmoid function
2 Model the score distribution as a mixture model (a mixture of an exponential and a Gaussian) and calculate the posterior probabilities via Bayes' rule
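Below is a minimal sketch of the second option (the mixture approach of [7]) on a vector of raw scores `s`. The component roles — exponential for normal points, Gaussian for outliers — and the quantile-based initialization are assumptions of this sketch, not prescriptions from the slides; scores are assumed non-negative, as the exponential component requires.

```python
import numpy as np

def calibrate_scores(s, n_iter=200):
    """EM for a two-component mixture on univariate outlier scores s:
    an exponential component (assumed to model normal points) and a
    Gaussian component (assumed to model outliers), as in [7].
    Returns posterior probabilities P(outlier | s_i) via Bayes' rule."""
    s = np.asarray(s, dtype=float)
    # crude initialization: treat the top 10% of scores as outliers
    alpha = 0.1                              # prior P(outlier)
    thr = np.quantile(s, 0.9)
    lam = 1.0 / s[s <= thr].mean()           # exponential rate
    mu, sd = s[s > thr].mean(), s[s > thr].std() + 1e-6
    for _ in range(n_iter):
        # E-step: posterior of the latent outlier label by Bayes' rule
        f_norm = (1 - alpha) * lam * np.exp(-lam * s)
        f_out = alpha * np.exp(-0.5 * ((s - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
        p = f_out / (f_out + f_norm + 1e-300)
        # M-step: weighted maximum-likelihood updates for each component
        alpha = p.mean()
        lam = (1 - p).sum() / (((1 - p) * s).sum() + 1e-300)
        mu = (p * s).sum() / (p.sum() + 1e-300)
        sd = np.sqrt((p * (s - mu) ** 2).sum() / (p.sum() + 1e-300)) + 1e-6
    return p
```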
Application to Outlier Detection: Indirectly Used
Bayesian Risk Model
The Bayesian risk model minimizes the overall risk associated with some cost function
For example, in the case of a two-class problem:
The Bayes decision rule for a given observation x is to decide ω_1 if:
(λ_{21} − λ_{11}) P(ω_1 | x) > (λ_{12} − λ_{22}) P(ω_2 | x)
where ω_1, ω_2 are the two classes and λ_{ij} is the cost of misclassifying ω_j as ω_i
Since P(ω_2 | x) = 1 − P(ω_1 | x), the preceding inequality suggests that the appropriate outlier threshold is automatically determined once the cost functions are known
In the case of a zero-one loss function, the threshold which minimizes the overall risk is 0.5, where
λ_{ij} = 0 if i = j, 1 if i ≠ j
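Rearranging the inequality with P(ω_2 | x) = 1 − P(ω_1 | x) gives an explicit threshold on the posterior, which a short helper makes concrete (the default costs encode the zero-one loss; any other values are purely illustrative):

```python
def bayes_threshold(l11=0.0, l12=1.0, l21=1.0, l22=0.0):
    """Threshold t on P(omega_1 | x): declare omega_1 (outlier) when the
    posterior exceeds t. l_ij is the cost of deciding omega_i when the
    truth is omega_j. Zero-one loss (the defaults) gives t = 0.5."""
    return (l12 - l22) / ((l12 - l22) + (l21 - l11))

# e.g., if missing an outlier costs 5x a false alarm, the threshold drops:
# bayes_threshold(l21=5.0)  -> 1/6 ~ 0.167
```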
Summary
Summary
These slides gave an overview of the EM-algorithm and its application to outlier detection
We checked the basic procedure of the EM-algorithm for estimating the parameters of a model that contains an unobserved latent variable
We also showed that the log-likelihood for the observed data is maximized by the EM-algorithm, through 3 parts: Non-decreasing / Convergence / Local Maximum
Furthermore, we saw that the EM-algorithm can be applied to outlier detection both directly and indirectly
References
[1] Dempster, Arthur P., Nan M. Laird, and Donald B. Rubin. "Maximum likelihood from incomplete data via the EM algorithm." Journal of the Royal Statistical Society: Series B (Methodological) 39, no. 1 (1977): 1-22.
[2] https://www.math.kth.se/matstat/gru/Statistical%20inference/Lecture8.pdf
[3] https://www.cs.cmu.edu/~epxing/Class/10708-17/notes-17/10708-scribe-lecture8.pdf
[4] http://www2.stat.duke.edu/~sayan/Sta613/2018/lec/emnotes.pdf
[5] Contributions to collaborative clustering and its potential applications on very high resolution satellite images
[6] Kriegel, Hans-Peter, Peer Kröger, Erich Schubert, and Arthur Zimek. "Interpreting and unifying outlier scores." In Proceedings of the 2011 SIAM International Conference on Data Mining, pp. 13-24. Society for Industrial and Applied Mathematics, 2011.
[7] Gao, Jing, and Pang-Ning Tan. "Converting output scores from outlier detection algorithms into probability estimates." In Sixth International Conference on Data Mining (ICDM'06), pp. 212-221. IEEE, 2006.
Q&A
More Related Content

What's hot

NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)Christian Robert
 
Inference for stochastic differential equations via approximate Bayesian comp...
Inference for stochastic differential equations via approximate Bayesian comp...Inference for stochastic differential equations via approximate Bayesian comp...
Inference for stochastic differential equations via approximate Bayesian comp...Umberto Picchini
 
Bayesian hybrid variable selection under generalized linear models
Bayesian hybrid variable selection under generalized linear modelsBayesian hybrid variable selection under generalized linear models
Bayesian hybrid variable selection under generalized linear modelsCaleb (Shiqiang) Jin
 
Likelihood-free Design: a discussion
Likelihood-free Design: a discussionLikelihood-free Design: a discussion
Likelihood-free Design: a discussionChristian Robert
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Pierre Jacob
 
k-MLE: A fast algorithm for learning statistical mixture models
k-MLE: A fast algorithm for learning statistical mixture modelsk-MLE: A fast algorithm for learning statistical mixture models
k-MLE: A fast algorithm for learning statistical mixture modelsFrank Nielsen
 
Approximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-LikelihoodsApproximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-LikelihoodsStefano Cabras
 
Application of interpolation and finite difference
Application of interpolation and finite differenceApplication of interpolation and finite difference
Application of interpolation and finite differenceManthan Chavda
 
Paper Introduction: Combinatorial Model and Bounds for Target Set Selection
Paper Introduction: Combinatorial Model and Bounds for Target Set SelectionPaper Introduction: Combinatorial Model and Bounds for Target Set Selection
Paper Introduction: Combinatorial Model and Bounds for Target Set SelectionYu Liu
 
Tensor train to solve stochastic PDEs
Tensor train to solve stochastic PDEsTensor train to solve stochastic PDEs
Tensor train to solve stochastic PDEsAlexander Litvinenko
 
New Insights and Perspectives on the Natural Gradient Method
New Insights and Perspectives on the Natural Gradient MethodNew Insights and Perspectives on the Natural Gradient Method
New Insights and Perspectives on the Natural Gradient MethodYoonho Lee
 
On First-Order Meta-Learning Algorithms
On First-Order Meta-Learning AlgorithmsOn First-Order Meta-Learning Algorithms
On First-Order Meta-Learning AlgorithmsYoonho Lee
 
2. polynomial interpolation
2. polynomial interpolation2. polynomial interpolation
2. polynomial interpolationEasyStudy3
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Pierre Jacob
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Pierre Jacob
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Pierre Jacob
 
Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013
Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013
Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013Christian Robert
 

What's hot (20)

Intro to ABC
Intro to ABCIntro to ABC
Intro to ABC
 
NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)
 
Inference for stochastic differential equations via approximate Bayesian comp...
Inference for stochastic differential equations via approximate Bayesian comp...Inference for stochastic differential equations via approximate Bayesian comp...
Inference for stochastic differential equations via approximate Bayesian comp...
 
Bayesian hybrid variable selection under generalized linear models
Bayesian hybrid variable selection under generalized linear modelsBayesian hybrid variable selection under generalized linear models
Bayesian hybrid variable selection under generalized linear models
 
Likelihood-free Design: a discussion
Likelihood-free Design: a discussionLikelihood-free Design: a discussion
Likelihood-free Design: a discussion
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...
 
k-MLE: A fast algorithm for learning statistical mixture models
k-MLE: A fast algorithm for learning statistical mixture modelsk-MLE: A fast algorithm for learning statistical mixture models
k-MLE: A fast algorithm for learning statistical mixture models
 
Approximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-LikelihoodsApproximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-Likelihoods
 
Application of interpolation and finite difference
Application of interpolation and finite differenceApplication of interpolation and finite difference
Application of interpolation and finite difference
 
Paper Introduction: Combinatorial Model and Bounds for Target Set Selection
Paper Introduction: Combinatorial Model and Bounds for Target Set SelectionPaper Introduction: Combinatorial Model and Bounds for Target Set Selection
Paper Introduction: Combinatorial Model and Bounds for Target Set Selection
 
Tensor train to solve stochastic PDEs
Tensor train to solve stochastic PDEsTensor train to solve stochastic PDEs
Tensor train to solve stochastic PDEs
 
New Insights and Perspectives on the Natural Gradient Method
New Insights and Perspectives on the Natural Gradient MethodNew Insights and Perspectives on the Natural Gradient Method
New Insights and Perspectives on the Natural Gradient Method
 
On First-Order Meta-Learning Algorithms
On First-Order Meta-Learning AlgorithmsOn First-Order Meta-Learning Algorithms
On First-Order Meta-Learning Algorithms
 
2. polynomial interpolation
2. polynomial interpolation2. polynomial interpolation
2. polynomial interpolation
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...
 
Madrid easy
Madrid easyMadrid easy
Madrid easy
 
Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013
Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013
Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013
 

Similar to A Tutorial of the EM-algorithm and Its Application to Outlier Detection

Conditional mixture model for modeling attributed dyadic data
Conditional mixture model for modeling attributed dyadic dataConditional mixture model for modeling attributed dyadic data
Conditional mixture model for modeling attributed dyadic dataLoc Nguyen
 
Runtime Analysis of Population-based Evolutionary Algorithms
Runtime Analysis of Population-based Evolutionary AlgorithmsRuntime Analysis of Population-based Evolutionary Algorithms
Runtime Analysis of Population-based Evolutionary AlgorithmsPer Kristian Lehre
 
Runtime Analysis of Population-based Evolutionary Algorithms
Runtime Analysis of Population-based Evolutionary AlgorithmsRuntime Analysis of Population-based Evolutionary Algorithms
Runtime Analysis of Population-based Evolutionary AlgorithmsPK Lehre
 
STOMA FULL SLIDE (probability of IISc bangalore)
STOMA FULL SLIDE (probability of IISc bangalore)STOMA FULL SLIDE (probability of IISc bangalore)
STOMA FULL SLIDE (probability of IISc bangalore)2010111
 
Finite mixture model with EM algorithm
Finite mixture model with EM algorithmFinite mixture model with EM algorithm
Finite mixture model with EM algorithmLoc Nguyen
 
Connection between inverse problems and uncertainty quantification problems
Connection between inverse problems and uncertainty quantification problemsConnection between inverse problems and uncertainty quantification problems
Connection between inverse problems and uncertainty quantification problemsAlexander Litvinenko
 
Handling missing data with expectation maximization algorithm
Handling missing data with expectation maximization algorithmHandling missing data with expectation maximization algorithm
Handling missing data with expectation maximization algorithmLoc Nguyen
 
On the Family of Concept Forming Operators in Polyadic FCA
On the Family of Concept Forming Operators in Polyadic FCAOn the Family of Concept Forming Operators in Polyadic FCA
On the Family of Concept Forming Operators in Polyadic FCADmitrii Ignatov
 
Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Valentin De Bortoli
 
[AAAI2021] Combinatorial Pure Exploration with Full-bandit or Partial Linear ...
[AAAI2021] Combinatorial Pure Exploration with Full-bandit or Partial Linear ...[AAAI2021] Combinatorial Pure Exploration with Full-bandit or Partial Linear ...
[AAAI2021] Combinatorial Pure Exploration with Full-bandit or Partial Linear ...Yuko Kuroki (黒木祐子)
 
2_GLMs_printable.pdf
2_GLMs_printable.pdf2_GLMs_printable.pdf
2_GLMs_printable.pdfElio Laureano
 
Conference poster 6
Conference poster 6Conference poster 6
Conference poster 6NTNU
 
Reinforcement Learning: Hidden Theory and New Super-Fast Algorithms
Reinforcement Learning: Hidden Theory and New Super-Fast AlgorithmsReinforcement Learning: Hidden Theory and New Super-Fast Algorithms
Reinforcement Learning: Hidden Theory and New Super-Fast AlgorithmsSean Meyn
 
Machine learning (9)
Machine learning (9)Machine learning (9)
Machine learning (9)NYversity
 
Stochastic optimization from mirror descent to recent algorithms
Stochastic optimization from mirror descent to recent algorithmsStochastic optimization from mirror descent to recent algorithms
Stochastic optimization from mirror descent to recent algorithmsSeonho Park
 

Similar to A Tutorial of the EM-algorithm and Its Application to Outlier Detection (20)

asymptotics of ABC
asymptotics of ABCasymptotics of ABC
asymptotics of ABC
 
Conditional mixture model for modeling attributed dyadic data
Conditional mixture model for modeling attributed dyadic dataConditional mixture model for modeling attributed dyadic data
Conditional mixture model for modeling attributed dyadic data
 
MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...
MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...
MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...
 
Runtime Analysis of Population-based Evolutionary Algorithms
Runtime Analysis of Population-based Evolutionary AlgorithmsRuntime Analysis of Population-based Evolutionary Algorithms
Runtime Analysis of Population-based Evolutionary Algorithms
 
Runtime Analysis of Population-based Evolutionary Algorithms
Runtime Analysis of Population-based Evolutionary AlgorithmsRuntime Analysis of Population-based Evolutionary Algorithms
Runtime Analysis of Population-based Evolutionary Algorithms
 
the ABC of ABC
the ABC of ABCthe ABC of ABC
the ABC of ABC
 
STOMA FULL SLIDE (probability of IISc bangalore)
STOMA FULL SLIDE (probability of IISc bangalore)STOMA FULL SLIDE (probability of IISc bangalore)
STOMA FULL SLIDE (probability of IISc bangalore)
 
Finite mixture model with EM algorithm
Finite mixture model with EM algorithmFinite mixture model with EM algorithm
Finite mixture model with EM algorithm
 
Connection between inverse problems and uncertainty quantification problems
Connection between inverse problems and uncertainty quantification problemsConnection between inverse problems and uncertainty quantification problems
Connection between inverse problems and uncertainty quantification problems
 
Handling missing data with expectation maximization algorithm
Handling missing data with expectation maximization algorithmHandling missing data with expectation maximization algorithm
Handling missing data with expectation maximization algorithm
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
On the Family of Concept Forming Operators in Polyadic FCA
On the Family of Concept Forming Operators in Polyadic FCAOn the Family of Concept Forming Operators in Polyadic FCA
On the Family of Concept Forming Operators in Polyadic FCA
 
Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...
 
Propensity albert
Propensity albertPropensity albert
Propensity albert
 
[AAAI2021] Combinatorial Pure Exploration with Full-bandit or Partial Linear ...
[AAAI2021] Combinatorial Pure Exploration with Full-bandit or Partial Linear ...[AAAI2021] Combinatorial Pure Exploration with Full-bandit or Partial Linear ...
[AAAI2021] Combinatorial Pure Exploration with Full-bandit or Partial Linear ...
 
2_GLMs_printable.pdf
2_GLMs_printable.pdf2_GLMs_printable.pdf
2_GLMs_printable.pdf
 
Conference poster 6
Conference poster 6Conference poster 6
Conference poster 6
 
Reinforcement Learning: Hidden Theory and New Super-Fast Algorithms
Reinforcement Learning: Hidden Theory and New Super-Fast AlgorithmsReinforcement Learning: Hidden Theory and New Super-Fast Algorithms
Reinforcement Learning: Hidden Theory and New Super-Fast Algorithms
 
Machine learning (9)
Machine learning (9)Machine learning (9)
Machine learning (9)
 
Stochastic optimization from mirror descent to recent algorithms
Stochastic optimization from mirror descent to recent algorithmsStochastic optimization from mirror descent to recent algorithms
Stochastic optimization from mirror descent to recent algorithms
 

More from Konkuk University, Korea (8)

(Slide)concentration effect
(Slide)concentration effect(Slide)concentration effect
(Slide)concentration effect
 
House price
House priceHouse price
House price
 
Titanic data analysis
Titanic data analysisTitanic data analysis
Titanic data analysis
 
Anomaly Detection: A Survey
Anomaly Detection: A SurveyAnomaly Detection: A Survey
Anomaly Detection: A Survey
 
Extended Isolation Forest
Extended Isolation Forest Extended Isolation Forest
Extended Isolation Forest
 
Isolation Forest
Isolation ForestIsolation Forest
Isolation Forest
 
Random Forest
Random ForestRandom Forest
Random Forest
 
Decision Tree
Decision Tree Decision Tree
Decision Tree
 

Recently uploaded

Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...gajnagarg
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangeThinkInnovation
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...HyderabadDolls
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...gajnagarg
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxronsairoathenadugay
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxchadhar227
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...gajnagarg
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...SOFTTECHHUB
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...Bertram Ludäscher
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...HyderabadDolls
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...HyderabadDolls
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdfkhraisr
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabiaahmedjiabur940
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareGraham Ware
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...kumargunjan9515
 

Recently uploaded (20)

Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 

A Tutorial of the EM-algorithm and Its Application to Outlier Detection

  • 1. A Tutorial of the EM-algorithm and its Application to Outlier Detection Jaehyeong Ahn Konkuk University jayahn0104@gmail.com September 9, 2020 Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 1 / 47
  • 2. Table of Contents 1 EM-algorithm: An Overview 2 Proof for EM-algorithm Non-decreasing (Ascent Property) Convergence Local Maximum 3 An Example: Gaussian Mixture Model (GMM) 4 Application to Outlier Detection Directly Used Indirectly Used 5 Summary 6 References Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 2 / 47
  • 3. EM-algorithm: An Overview EM-algorithm: An Overview Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 3 / 47
  • 4. EM-algorithm: An Overview Introduction The EM-algorithm (Expectation-Maximization algorithm) is an iterative procedure for computing the maximum likelihood estimator (MLE) when only a subset of the data is available (When the model depends on the unobserved latent variable) The first proper theoretical study of the algorithm was done by Dempster, Laird, and Rubin (1977) [1] The EM-algorithm is widely used in various research areas when unobserved latent variables are included in the model Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 4 / 47
  • 5. EM-algorithm: An Overview Data Y = (y, · · · , yN )T : Observed data Model Assume that Y is dependent on some unobserved latent variable Z where Z = (z, · · · , zN )T When Z is assumed as discrete random variable rik = P(zi = k|Y, θ), k = , · · · , K z∗ i = argmax k rik When Z is assumed as continuous random variable ri = fZ|Y,Θ(zi|Y, θ) Log-likelihood obs(θ; Y ) = log fY |Θ(Y |θ) = log fY,Z|Θ(Y, Z|θ) dz Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 5 / 47
  • 6. EM-algorithm: An Overview Goal By maximizing obs(θ; Y ) w.r.t. θ Find ˆθ which satisfies ∂θj obs(θ; Y )|θ=ˆθ =0, for j = , · · · , J Compute the estimated value ˆrik = fZ|Y , Θ(zi|Y, ˆθ) Problem The latent variable Z is not observable It is difficult to compute the integral in obs(θ; Y ) Thus the parameters can not be estimated separately Solution Assume the latent variable Z is observed Define the complete Data Maximize the complete log likelihood Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 6 / 47
  • 7. EM-algorithm: An Overview Data Y = (y, · · · , yN )T : Observed data Z = (z, · · · , zN )T : Unobserved (latent) variable It is assumed as observed X = (Y, Z): Complete Data Model ri = fZ|Y,Θ(zi|Y, θ) Complete log-likelihood C(θ; X) = log fX|Θ(X|θ) = log fX|Θ(Y, Z|θ) Log-likelihood for observed data obs(θ; Y ) = log fY |Θ(Y |θ) = log fX|Θ(Y, Z|θ) dz Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 7 / 47
  • 8. EM-algorithm: An Overview Estimation Idea Maximize C(θ; X) instead of maximizing obs(θ; Y ) Since the parameters in C(θ; X) can be decoupled E-step Compute the Expected Complete Log Likelihood (ECLL) Taking conditional expectation on C (θ; X) given Y and current value of parameters θ(s) This step estimates the realizations of z (since the value of z is not identified) M-step Maximize the computed ECLL w.r.t. θ Update the estimates of Θ by θ(s+) which maximize the current ECLL Iterate this procedure until it converges Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 8 / 47
  • 9. EM-algorithm: An Overview Estimation E-step Taking conditional expectation on C(θ; X) given Y and θ = θ(s) For θ() , set initial guess Compute Q(θ|θ(s)) = Eθ(s) [ C(θ; X)|Y ] M-step Maximize Q(θ|θ(s)) w.r.t. θ Put θ(s+) = argmax θ Q(θ|θ(s)) Iterate until it satisfies following inequality ||θ(s+) − θ(s) || < , where denotes the sufficiently small value Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 9 / 47
  • 10. EM-algorithm: An Overview Immediate Question Does maximizing the sequence Q(θ|θ(s)) leads to maximizing obs(θ; Y ) ? This question will be answered in following slides (Proof for EM-algorithm) with 3 parts: Non-decreasing / Convergence / Local maximum Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 10 / 47
  • 11. Proof for EM-algorithm Proof for EM-algorithm Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 11 / 47
  • 12. Proof for EM-algorithm: Non-decreasing Non-decreasing (Ascent Property) Proposition 1. The Sequence obs(θ(s); Y ) in the EM-algorithm is non-decreasing Proof We write X = (Y, Z) for the complete data Then fZ|Y,Θ(Z|Y, θ) = fX|Θ(Y,Z|θ) fY |Θ(Y |θ) Hence, obs(θ; Y ) = log fY |Θ(Y |θ) = log fX|Θ((Y, Z)|θ) − log fZ|Y,Θ(Z|Y, θ) Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 12 / 47
  • 13. Proof for EM-algorithm: Non-decreasing Taking conditional expectation given Y and Θ = θ(s) on both sides yields obs(θ; Y ) = Eθ(s) [ obs(θ; Y )|Y ] = Eθ(s) [log fX|Θ((Y, Z)|θ)|Y ] − Eθ(s) [log fZ|Y,Θ(Z|Y, θ)|Y ] = Q(θ|θ(s) ) − H(θ|θ(s) ) Where Q(θ|θ(s) ) = Eθ(s) [log fX|Θ((Y, Z)|θ)|Y ] H(θ|θ(s) ) = Eθ(s) [log fZ|Y,Θ(Z|Y, θ)|Y ] Then we have obs(θ(s+) ; Y ) − obs(θ(s) ; Y ) = Q(θ(s+) |θ(s) ) − Q(θ(s) |θ(s) ) − H(θ(s+) |θ(s) ) − H(θ(s) |θ(s) ) Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 13 / 47
  • 14. Proof for EM-algorithm: Non-decreasing Recall that obs(θ(s+) ; Y ) − obs(θ(s) ; Y ) = Q(θ(s+) |θ(s) ) − Q(θ(s) |θ(s) ) (I) − H(θ(s+) |θ(s) ) − H(θ(s) |θ(s) ) (II) (I) is non-negative θ(s+) = argmax θ Q(θ|θ(s) ) Hence Q(θ(s+) |θ(s) ) ≥ Q(θ(s) |θ(s) ) Thus (I) ≥ 0 Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 14 / 47
  • 15. Proof for EM-algorithm: Non-decreasing (II) is non-positive Using Jensen’s inequality for concave functions (log is concave) Theroem 1. Jensen’s inequality Let f be a concave function, and let X be a random variable. Then E[f(X)] ≤ f(EX) H(θ(s+) |θ(s) ) − H(θ(s) |θ(s) ) = Eθ(s) log fZ|Y,Θ(Z|Y, θ(s+) ) fZ|Y,Θ(Z|Y, θ(s)) |Y ≤ log Eθ(s) fZ|Y,Θ(Z|Y, θ(s+) ) fZ|Y,Θ(Z|Y, θ(s)) |Y = log fZ|Y,Θ(z|Y, θ(s+) ) fZ|Y,Θ(z|Y, θ(s)) fZ|Y,Θ(z|Y, θ(s) ) dz = log1 = 0 Hence H(θ(s+) |θ(s) ) ≤ H(θ(s) |θ(s) ) Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 15 / 47
  • 16. Proof for EM-algorithm: Non-decreasing Theroem 1. Jensen’s inequality Let f be a concave function, and let X be a random variable. Then E[f(X)] ≤ f(EX) Figure: Jensen’s inequality for concave function[3] Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 16 / 47
  • 17. Proof for EM-algorithm: Non-decreasing Recall that obs(θ(s+) ; Y ) − obs(θ(s) ; Y ) = Q(θ(s+) |θ(s) ) − Q(θ(s) |θ(s) ) (I) − H(θ(s+) |θ(s) ) − H(θ(s) |θ(s) ) (II) We’ve proven that (I) ≥ 0 and (II) ≤ 0 This shows obs(θ(s+) ; Y ) − obs(θ(s) ; Y ) ≥0 Thus the sequence obs(θ(s) ; Y ) in the EM-algorithm is non-decreasing Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 17 / 47
  • 18. Proof for EM-algorithm: Convergence Convergence We will show that the sequence θ(s) converges to some θ∗ with (θ∗; y) = ∗, the limit of (θ(s)) Assumption Ω is a subset of Rk Ωθ = {θ ∈ Ω : (θ; y) ≥ (θ; y)} is compact for any (θ; y) > −∞ (θ; x) is continuous and differentiable in the interior of Ω Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 18 / 47
  • 19. Proof for EM-algorithm: Convergence Theorem 2 Suppose that Q(θ|φ) is continuous in both θ and φ. Then all limit points of any instance {θ(s)} of the EM algorithm are stationary points, i.e. θ∗ = argmax θ Q(θ|θ∗), and (θ(s); y) converges monotonically to some value ∗ = (θ∗; y) for some stationary point θ∗ Theorem 3 Assume the hypothesis of Theorem 2. Suppose in addition that ∂θQ(θ|φ) is continuous in θ and φ. Then θ(s) converges to a stationary point θ∗ with (θ∗; y) = ∗, the limit of (θ(s)), if either {θ : (θ; y) = ∗} = {θ∗} or |θ(s+) − θ(s)| →0 and {θ : (θ; y) = ∗} is discrete Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 19 / 47
  • 20. Proof for EM-algorithm: Local Maximum Local Maximum Recall that C(θ; X) = log fX|Θ(X|θ) = log fZ|Y,Θ(Z|Y, θ) + log fY |Θ(Y |θ) Then Q(θ|θ(s) ) = log fZ|Y,Θ(z|Y, θ)fZ|Y,Θ(z|Y, θ(s) ) dz + obs(θ; Y ) Differentiating w.r.t. θj and putting equal to 0 in order to maximize Q gives 0 = ∂θj Q(θ|θ(s) ) = ∂θj fZ|Y,Θ(z|Y, θ) fZ|Y,Θ(z|Y, θ) fZ|Y,Θ(z|Y, θ(s) ) dz + ∂θj obs(θ, Y ) Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 20 / 47
  • 21. Proof for EM-algorithm: Local Maximum Recall that 0 = ∂θj Q(θ|θ(s) ) = ∂θj fZ|Y,Θ(z|Y, θ) fZ|Y,Θ(z|Y, θ) fZ|Y,Θ(z|Y, θ(s) ) dz + ∂θj obs(θ, Y ) If θ(s) → θ∗ then we have for θ∗ that (with j = 1, · · · , J) 0 = ∂θj Q(θ∗ |θ∗ ) = ∂θj fZ|Y,Θ(z|Y, θ∗) fZ|Y,Θ(z|Y, θ∗) fZ|Y,Θ(z|Y, θ∗ ) dz + ∂θj obs(θ∗ ; Y ) = ∂θj fZ|Y,Θ(z|Y, θ∗ ) dz + ∂θj obs(θ∗ ; Y ) = ∂θj obs(θ∗ ; Y ) Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 21 / 47
  • 22. An Example: Gaussian Mixture Model (GMM) An Example: Gaussian Mixture Model (GMM) Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 22 / 47
  • 23. An Example: Gaussian Mixture Model (GMM) Introduction Mixture models make use of latent variables to model different parameters for different groups (or clusters) of data points For a point yi, let the cluster to which that point belongs be labeled zi; where zi is latent, or unobserved In this example, we will assume our observable features yi to be distributed as a Gaussian, chosen based on the cluster that point yi is associated with Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 23 / 47
  • 24. An Example: Gaussian Mixture Model (GMM) Figure: Gaussian Mixture Model Example[5] Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 24 / 47
  • 25. An Example: Gaussian Mixture Model (GMM) Data Y = (y, · · · , yN )T : Observed data ∀i yi ∈ Rp Z = (z, · · · , zN )T : Unobserved (latent) variable Assume that Z is observed ∀i zi ∈ {, , · · · , K} X = (Y, Z): Complete data Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 25 / 47
  • 26. An Example: Gaussian Mixture Model (GMM) Distribution Assumption zi ∼ Mult(π), π ∈ RK yi|zi = k ∼ NP (µk, Σk) Model rik def = p(zi = k|yi, θ) = p(yi|zi = k, θ)p(zi = k|θ) K k= (p(yi|zi = k, θ)p(zi = k|θ)) z∗ i = argmax k rik θ denotes the general parameter; θ = {π, µ, Σ} Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 26 / 47
• 27. An Example: Gaussian Mixture Model (GMM)
Notation Simplification
Write z_i = k as z_i = (z_i^1, · · · , z_i^k, · · · , z_i^K)^T = (0, · · · , 1, · · · , 0)^T
    where z_i^j = I(j = k) ∈ {0, 1} for j = 1, · · · , K
Using this:
    p(z_i | π) = Π_{k=1}^K π_k^{z_i^k}
    p(y_i | z_i, θ) = Π_{k=1}^K N(µ_k, Σ_k)^{z_i^k} = Π_{k=1}^K φ(y_i | µ_k, Σ_k)^{z_i^k}
Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 27 / 47
• 28. An Example: Gaussian Mixture Model (GMM)
Log-likelihood for observed data
    ℓ_obs(θ; Y) = log f_Y|Θ(Y|θ) = Σ_{i=1}^N log p(y_i|θ)
                = Σ_{i=1}^N log Σ_{z_i} p(y_i, z_i|θ)
                = Σ_{i=1}^N log [ Σ_{z_i ∈ Z} Π_{k=1}^K π_k^{z_i^k} φ(y_i|µ_k, Σ_k)^{z_i^k} ]
This does not decouple the likelihood because the log cannot be "pushed" inside the summation (see the numerical sketch below)
Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 28 / 47
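Even though ℓ_obs does not decouple analytically, it is straightforward to evaluate numerically by summing the K components inside the log for each point. A minimal sketch, assuming mixture parameters in the same format as the sampling sketch above (the function name observed_loglik is ours):

    import numpy as np
    from scipy.special import logsumexp
    from scipy.stats import multivariate_normal

    def observed_loglik(y, pi, mus, Sigmas):
        # log p(y_i | theta) = log sum_k pi_k * phi(y_i | mu_k, Sigma_k),
        # evaluated stably in log-space with logsumexp
        log_terms = np.column_stack([
            np.log(pi[k]) + multivariate_normal.logpdf(y, mus[k], Sigmas[k])
            for k in range(len(pi))
        ])                                    # shape (N, K)
        return logsumexp(log_terms, axis=1).sum()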
• 29. An Example: Gaussian Mixture Model (GMM)
Complete log-likelihood
    ℓ_C(θ; X) = log f_X|Θ(Y, Z|θ) = log f_Y|Z,Θ(Y|Z, θ) + log f_Z|Θ(Z|θ)
              = Σ_{i=1}^N [ log Π_{k=1}^K φ(y_i|µ_k, Σ_k)^{z_i^k} + log Π_{k=1}^K π_k^{z_i^k} ]
              = Σ_{i=1}^N Σ_{k=1}^K [ z_i^k log φ(y_i|µ_k, Σ_k) + z_i^k log π_k ]
Parameters are now decoupled since we can estimate π_k and (µ_k, Σ_k) separately
Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 29 / 47
• 30. An Example: Gaussian Mixture Model (GMM)
Estimation
E-step
    Q(θ|θ^(s)) = E_{θ^(s)} [ Σ_{i=1}^N Σ_{k=1}^K z_i^k log φ(y_i|µ_k, Σ_k) + z_i^k log π_k | Y ]
               = Σ_{i=1}^N Σ_{k=1}^K [ E_{θ^(s)}[z_i^k|Y] log φ(y_i|µ_k, Σ_k) + E_{θ^(s)}[z_i^k|Y] log π_k ]
Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 30 / 47
• 31. An Example: Gaussian Mixture Model (GMM)
E-step
Note that z_i^k | Y ∼ Bernoulli(p(z_i^k = 1|Y, θ))
Hence (a sketch of this computation follows below)
    r_ik^(s) := E_{θ^(s)}[z_i^k|Y] = p(z_i^k = 1|Y, θ^(s))
              = p(z_i^k = 1, y_i|θ^(s)) / Σ_{k'=1}^K p(z_i^{k'} = 1, y_i|θ^(s))
              = p(y_i|z_i^k = 1, θ^(s)) p(z_i^k = 1|θ^(s)) / Σ_{k'=1}^K p(y_i|z_i^{k'} = 1, θ^(s)) p(z_i^{k'} = 1|θ^(s))
              = φ(y_i|µ_k^(s), Σ_k^(s)) π_k^(s) / Σ_{k'=1}^K φ(y_i|µ_{k'}^(s), Σ_{k'}^(s)) π_{k'}^(s)
Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 31 / 47
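A minimal sketch of the E-step in the same Python setting as above (e_step is our name; it returns the N × K matrix of responsibilities r_ik):

    import numpy as np
    from scipy.stats import multivariate_normal

    def e_step(y, pi, mus, Sigmas):
        # r[i, k] is proportional to pi_k * phi(y_i | mu_k, Sigma_k);
        # each row is normalized so that sum_k r[i, k] = 1
        weighted = np.column_stack([
            pi[k] * multivariate_normal.pdf(y, mus[k], Sigmas[k])
            for k in range(len(pi))
        ])
        return weighted / weighted.sum(axis=1, keepdims=True)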
• 32. An Example: Gaussian Mixture Model (GMM)
M-step
Recall that
    Q(θ|θ^(s)) = Σ_{i=1}^N Σ_{k=1}^K [ r_ik^(s) log φ(y_i|µ_k, Σ_k) + r_ik^(s) log π_k ]
Set θ^(s+1) = argmax_θ Q(θ|θ^(s)):
    π_k^(s+1) = Σ_{i=1}^N r_ik^(s) / N
    µ_k^(s+1) = Σ_{i=1}^N r_ik^(s) y_i / Σ_{i=1}^N r_ik^(s)
    Σ_k^(s+1) = Σ_{i=1}^N r_ik^(s) (y_i − µ_k^(s+1))(y_i − µ_k^(s+1))^T / Σ_{i=1}^N r_ik^(s)
Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 32 / 47
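And a corresponding sketch of the M-step, implementing the three closed-form updates above (m_step is our name):

    import numpy as np

    def m_step(y, r):
        # r has shape (N, K); Nk holds the effective cluster sizes sum_i r_ik
        N, K = r.shape
        Nk = r.sum(axis=0)
        pi = Nk / N                              # pi_k update
        mus = (r.T @ y) / Nk[:, None]            # mu_k update (responsibility-weighted mean)
        Sigmas = np.empty((K, y.shape[1], y.shape[1]))
        for k in range(K):
            d = y - mus[k]
            Sigmas[k] = (r[:, k, None] * d).T @ d / Nk[k]  # Sigma_k update (weighted covariance)
        return pi, mus, Sigmas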
• 33. An Example: Gaussian Mixture Model (GMM)
Iterate until it satisfies the following inequality, for a small tolerance ε > 0:
    ||θ^(s+1) − θ^(s)|| < ε
Let θ̂ = θ^(s)
What we get
    θ̂ = (π̂, µ̂, Σ̂)
    r̂_ik = p(z_i = k|Y, θ̂) = φ(y_i|µ̂_k, Σ̂_k) π̂_k / Σ_{k'=1}^K φ(y_i|µ̂_{k'}, Σ̂_{k'}) π̂_{k'}
    ẑ_i = argmax_k r̂_ik
Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 33 / 47
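Putting the pieces together, a minimal driver loop that alternates the e_step and m_step sketches above until the parameter change drops below the tolerance (tol plays the role of ε; the initialization is left to the caller and is an important practical choice):

    import numpy as np

    def fit_gmm(y, pi, mus, Sigmas, tol=1e-6, max_iter=500):
        # Alternate E- and M-steps until ||theta^(s+1) - theta^(s)|| < tol
        for _ in range(max_iter):
            r = e_step(y, pi, mus, Sigmas)
            pi_new, mus_new, Sigmas_new = m_step(y, r)
            delta = max(np.abs(pi_new - pi).max(),
                        np.abs(mus_new - mus).max(),
                        np.abs(Sigmas_new - Sigmas).max())
            pi, mus, Sigmas = pi_new, mus_new, Sigmas_new
            if delta < tol:
                break
        return pi, mus, Sigmas, e_step(y, pi, mus, Sigmas)

By the ascent property proven earlier, observed_loglik(y, pi, mus, Sigmas) should be non-decreasing across these iterations, which is a useful sanity check in practice.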
  • 34. An Example: Gaussian Mixture Model (GMM) Figure: Gaussian Mixture Model Fitting Example[5] Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 34 / 47
  • 35. Application to Outlier Detection Application to Outlier Detection Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 35 / 47
• 36. Application to Outlier Detection: Directly Used
Basic Idea
For cases in which the data may have many different clusters with different orientations:
    Assume a specific form of the generative model (e.g., a mixture of Gaussians)
    Fit the model to the data (usually to the normal behavior) by estimating its parameters with the EM-algorithm
    Apply this model to the unseen (test) data and obtain the fit (joint) probability of each point
    Data points that fit the distribution well will have high fit (joint) probabilities, whereas anomalies (outliers) will have very low fit (joint) probabilities (see the sketch below)
Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 36 / 47
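A minimal sketch of this direct use, reusing the GMM helpers above; the names (gmm_log_density, y_test, the fitted pi_hat, mus_hat, Sigmas_hat) and the 1% quantile cut-off are ours, for illustration only:

    import numpy as np
    from scipy.special import logsumexp
    from scipy.stats import multivariate_normal

    def gmm_log_density(y, pi, mus, Sigmas):
        # Fit (joint) log-probability log p(y_i | theta-hat) under the learned mixture;
        # low values indicate outliers
        log_terms = np.column_stack([
            np.log(pi[k]) + multivariate_normal.logpdf(y, mus[k], Sigmas[k])
            for k in range(len(pi))
        ])
        return logsumexp(log_terms, axis=1)

    # scores = gmm_log_density(y_test, pi_hat, mus_hat, Sigmas_hat)
    # outliers = scores < np.quantile(scores, 0.01)   # hypothetical threshold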
  • 37. Application to Outlier Detection: Directly Used Simulation in R https://rpubs.com/JayAhn/650433 Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 37 / 47
• 38. Application to Outlier Detection: Indirectly Used
Introduction
Interestingly, the EM-algorithm can also be used as a final step after many such outlier detection algorithms, converting their scores into probabilities [7]
Converting the outlier scores into well-calibrated probability estimates is preferable for several reasons:
    1 The probability estimates allow us to select an appropriate threshold for declaring outliers using a Bayesian risk model
    2 The probability estimates obtained from individual models can be aggregated to build an ensemble outlier detection framework
Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 38 / 47
• 39. Application to Outlier Detection: Indirectly Used
Motivation
Since outlier detection is mainly an unsupervised learning problem, it is hard to select an appropriate threshold for declaring outliers
Every outlier detection model outputs outlier scores on a different scale, which makes constructing an outlier ensemble model difficult
Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 39 / 47
• 40. Application to Outlier Detection: Indirectly Used
Outlier Score Distributions
Figure: Outlier Score Distributions [7]. (a) Outlier score distribution for normal examples; (b) outlier score distributions for outliers
Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 40 / 47
• 41. Application to Outlier Detection: Indirectly Used
Basic Idea
Treat an outlier score as a univariate random variable
Treat the label of outlierness as an unobserved latent variable
Estimate the posterior probabilities for the latent variable with the EM-algorithm, in one of two ways:
    1 Model the posterior probability for outlier scores using a sigmoid function
    2 Model the score distribution as a mixture model (a mixture of an exponential and a Gaussian) and calculate the posterior probabilities via Bayes' rule (a sketch of this option follows below)
Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 41 / 47
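A compact sketch of option 2 in the spirit of [7]: the scores of normal examples are modeled as Exponential, the scores of outliers as Gaussian, and EM is run directly on the one-dimensional scores s. The function name and the initialization choices below are our illustrative assumptions, not prescriptions from the paper:

    import numpy as np
    from scipy.stats import expon, norm

    def scores_to_probabilities(s, n_iter=100):
        alpha = 0.05                    # assumed initial fraction of outliers
        scale = s.mean()                # exponential scale for normal scores
        mu, sigma = s.max(), s.std()    # Gaussian component for outlier scores
        for _ in range(n_iter):
            # E-step: posterior probability that each score comes from the outlier component
            p_out = alpha * norm.pdf(s, mu, sigma)
            p_in = (1 - alpha) * expon.pdf(s, scale=scale)
            w = p_out / (p_out + p_in)
            # M-step: re-estimate the mixing weight and the component parameters
            alpha = w.mean()
            scale = ((1 - w) * s).sum() / (1 - w).sum()
            mu = (w * s).sum() / w.sum()
            sigma = np.sqrt((w * (s - mu) ** 2).sum() / w.sum())
        return w                        # calibrated outlier probabilities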
• 42. Application to Outlier Detection: Indirectly Used
Bayesian Risk Model
A Bayesian risk model minimizes the overall risk associated with some cost function
For example, in the case of a two-class problem, the Bayes decision rule for a given observation x is to decide ω_1 if:
    (λ_21 − λ_11) P(ω_1|x) > (λ_12 − λ_22) P(ω_2|x)
where ω_1, ω_2 are the two classes and λ_ij is the cost of misclassifying ω_j as ω_i
Since P(ω_2|x) = 1 − P(ω_1|x), the preceding inequality implies that the appropriate outlier threshold is automatically determined once the cost functions are known
In the case of a zero-one loss function, where λ_ij = 0 if i = j and 1 if i ≠ j, the threshold which minimizes the overall risk is 0.5 (see the sketch below)
Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 42 / 47
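Solving the inequality above for P(ω_1|x) gives the decision threshold directly; a tiny sketch (the function and argument names are ours, with ω_1 taken to be the outlier class):

    def bayes_threshold(c_fp, c_fn, c_tp=0.0, c_tn=0.0):
        # Decide "outlier" when P(omega_1 | x) exceeds this value.
        # c_fp = lambda_12 (normal declared outlier), c_fn = lambda_21 (outlier declared normal),
        # c_tp = lambda_11, c_tn = lambda_22
        return (c_fp - c_tn) / ((c_fp - c_tn) + (c_fn - c_tp))

    # Zero-one loss recovers the 0.5 threshold stated above:
    assert bayes_threshold(c_fp=1.0, c_fn=1.0) == 0.5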
  • 43. Summary Summary Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 43 / 47
• 44. Summary
These slides gave an overview of the EM-algorithm and its application to outlier detection
We reviewed the basic procedure of the EM-algorithm for estimating the parameters of a model that involves an unobserved latent variable
We also showed that the log-likelihood for the observed data is maximized by the EM-algorithm, in three parts: Non-decreasing / Convergence / Local Maximum
Furthermore, we saw that the EM-algorithm can be applied to outlier detection both directly and indirectly
Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 44 / 47
• 45. References
[1] Dempster, Arthur P., Nan M. Laird, and Donald B. Rubin. "Maximum likelihood from incomplete data via the EM algorithm." Journal of the Royal Statistical Society: Series B (Methodological) 39, no. 1 (1977): 1-22.
[2] https://www.math.kth.se/matstat/gru/Statistical%20inference/Lecture8.pdf
[3] https://www.cs.cmu.edu/~epxing/Class/10708-17/notes-17/10708-scribe-lecture8.pdf
[4] http://www2.stat.duke.edu/~sayan/Sta613/2018/lec/emnotes.pdf
[5] Contributions to collaborative clustering and its potential applications on very high resolution satellite images
Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 45 / 47
• 46. References
[6] Kriegel, Hans-Peter, Peer Kröger, Erich Schubert, and Arthur Zimek. "Interpreting and unifying outlier scores." In Proceedings of the 2011 SIAM International Conference on Data Mining, pp. 13-24. Society for Industrial and Applied Mathematics, 2011.
[7] Gao, Jing, and Pang-Ning Tan. "Converting output scores from outlier detection algorithms into probability estimates." In Sixth International Conference on Data Mining (ICDM'06), pp. 212-221. IEEE, 2006.
Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 46 / 47
  • 47. Q&A Q&A Jaehyeong Ahn (Konkuk Univ) EM-algorithm September 9, 2020 47 / 47