Tutorial on EM algorithm - Poster
Presentation
• Maximum likelihood estimation (MLE) is a popular method for parameter estimation in both applied
probability and statistics, but MLE cannot solve the problem of incomplete or hidden data because the
likelihood function cannot be maximized over data that are never observed. The expectation maximization
(EM) algorithm is a powerful mathematical tool for solving this problem whenever there is a relationship
between the hidden data and the observed data. Such a hinting relationship is specified either by a mapping
from hidden data to observed data or by a joint probability between hidden data and observed data (the
poster shows MLE, EM, and practical EM; "hidden info" marks the hinting relationship). A minimal sketch
of EM under such a mapping appears after this list.
• The essential idea of EM is to maximize the expectation of the likelihood function over the observed data,
based on the hinting relationship, instead of directly maximizing the likelihood function of the hidden data
(the poster shows the full EM, with proof, along with its two steps).
• An important application of EM is the (finite) mixture model, which in turn has developed in two directions:
the infinite mixture model and the semiparametric mixture model. In the semiparametric mixture model, in
particular, the component probability density functions are not parameterized. The semiparametric mixture
model is interesting and promising for applications in which the probabilistic components are hard to specify
(the poster shows the mixture models).
• I raise the question of whether it is possible to work backward and derive a semiparametric EM from the
semiparametric mixture model. I hope this question will open a new direction or extension for the EM
algorithm (the poster shows the question).
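As a concrete illustration of the first bullet's hinting relationship, here is a minimal Python sketch (all distributions, sample sizes, and starting values are illustrative assumptions, not taken from the poster) of EM when hidden data X ~ N(μ, 1) are observed only through the mapping Y = φ(X) = |X|, so that φ⁻¹(Y) = {Y, −Y}:

```python
import numpy as np

# Minimal sketch, assuming hidden X ~ N(mu, 1) observed only as Y = |X|,
# so the preimage set phi^{-1}(y) is {y, -y}.
rng = np.random.default_rng(0)
mu_true = 2.0
X_hidden = rng.normal(mu_true, 1.0, size=1000)  # never seen by the estimator
Y = np.abs(X_hidden)                            # observed data

def normal_pdf(x, mu):
    """Density of N(mu, 1) at x."""
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2.0 * np.pi)

mu = 0.5  # illustrative starting guess; EM recovers mu only up to sign
for _ in range(100):
    # E-step: posterior weight of the preimage X = +y versus X = -y,
    # i.e. the kernel k(X | Y, Theta) restricted to phi^{-1}(Y).
    w_pos = normal_pdf(Y, mu)
    w_neg = normal_pdf(-Y, mu)
    w = w_pos / (w_pos + w_neg)
    # M-step: maximizing Q over mu yields the mean of E[X | Y, mu].
    mu = np.mean(w * Y - (1.0 - w) * Y)

print(f"estimated mu = {mu:.3f}, true mu = {mu_true}")
```

Because the mapping discards the sign of X, the likelihood is symmetric in ±μ, so the starting guess determines which sign EM converges to.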
Maximum likelihood estimation (MLE), MAP
\[
\hat{\Theta} = \operatorname*{argmax}_{\Theta} \, l(\Theta) = \operatorname*{argmax}_{\Theta} \sum_{i=1}^{N} \log f(X_i \mid \Theta)
\]
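To make the MLE formula concrete, the following sketch maximizes the log-likelihood numerically for an exponential model; the model, data, and bounds are illustrative assumptions, and the closed-form estimate 1/mean(X) is printed only as a check:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative data from an exponential distribution with rate 3.
rng = np.random.default_rng(1)
X = rng.exponential(scale=1.0 / 3.0, size=500)

def neg_log_likelihood(rate):
    """-l(Theta) = -sum_i log f(X_i | Theta) for f(x | rate) = rate * exp(-rate * x)."""
    if rate <= 0:
        return np.inf
    return -np.sum(np.log(rate) - rate * X)

# argmax of l(Theta) = argmin of -l(Theta) over a bounded interval.
result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100.0), method="bounded")
print(f"numerical MLE = {result.x:.3f}, closed form = {1.0 / X.mean():.3f}")
```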
Expectation Maximization (EM)
\[
Q(\Theta' \mid \Theta) = \int_{\varphi^{-1}(Y)} k(X \mid Y, \Theta) \, \log f(X \mid \Theta') \, \mathrm{d}X
\]
Practical EM
\[
Q(\Theta' \mid \Theta) = \int_{X} f(X \mid Y, \Theta) \, \log f(X, Y \mid \Theta') \, \mathrm{d}X
\]
Full EM
\[
Q(\Theta' \mid \Theta) = \sum_{i=1}^{N} \int_{\varphi^{-1}(Y_i)} k(X_i \mid Y_i, \Theta) \, \log f(X_i \mid \Theta') \, \mathrm{d}X_i
\]
\[
Q(\Theta' \mid \Theta) = \sum_{i=1}^{N} \int_{X} f(X \mid Y_i, \Theta) \, \log f(X, Y_i \mid \Theta') \, \mathrm{d}X
\]
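When the hidden datum is a discrete mixture label z, the integrals above collapse to sums, and Q(Θ′ | Θ) can be evaluated directly. A minimal sketch for a two-component Gaussian mixture follows; all parameter values are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

# Observed data from an illustrative two-component Gaussian mixture.
rng = np.random.default_rng(2)
Y = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 700)])

def q_function(theta_new, theta_cur, Y):
    """Q(Theta' | Theta) = sum_i sum_z f(z | Y_i, Theta) log f(Y_i, z | Theta')."""
    pi_c, mu_c, sd_c = theta_cur
    pi_n, mu_n, sd_n = theta_new
    # Posterior responsibilities f(z | Y_i, Theta) under the current parameters.
    joint_cur = pi_c[None, :] * norm.pdf(Y[:, None], mu_c[None, :], sd_c[None, :])
    resp = joint_cur / joint_cur.sum(axis=1, keepdims=True)
    # Complete-data log-likelihood log f(Y_i, z | Theta') at the candidate.
    log_joint_new = np.log(pi_n)[None, :] + norm.logpdf(
        Y[:, None], mu_n[None, :], sd_n[None, :])
    return np.sum(resp * log_joint_new)

theta_cur = (np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0]))
theta_new = (np.array([0.3, 0.7]), np.array([-2.0, 3.0]), np.array([1.0, 1.0]))
print("Q(theta_new | theta_cur) =", q_function(theta_new, theta_cur, Y))
```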
[Poster diagram: "Observed data" feeds each of MLE, EM, and practical EM, with "Hidden info" marking where the hinting relationship enters; EM requires the mapping Y = φ(X), while practical EM requires the joint PDF f(X, Y | Θ); "Mixture model" branches into the infinite mixture model and the semiparametric mixture model, and "Semiparametric EM?" marks the open question of what a semiparametric EM would require.]
E-step: determine \(Q(\Theta \mid \Theta^{(t)})\).
M-step: \(\Theta^{(t+1)} = \operatorname*{argmax}_{\Theta} Q(\Theta \mid \Theta^{(t)})\).
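Iterating the two steps gives the usual EM loop. For a two-component one-dimensional Gaussian mixture the M-step argmax of Q has a closed form, as in this minimal sketch (data-generating parameters and starting values are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

# Illustrative observed data from a two-component Gaussian mixture.
rng = np.random.default_rng(3)
Y = np.concatenate([rng.normal(-2, 0.8, 400), rng.normal(3, 1.2, 600)])

pi = np.array([0.5, 0.5])   # mixing weights
mu = np.array([-1.0, 1.0])  # component means
sd = np.array([1.0, 1.0])   # component standard deviations

for _ in range(200):
    # E-step: responsibilities r[i, z] = f(z | Y_i, Theta^(t)).
    joint = pi[None, :] * norm.pdf(Y[:, None], mu[None, :], sd[None, :])
    r = joint / joint.sum(axis=1, keepdims=True)
    # M-step: closed-form argmax of Q(Theta | Theta^(t)).
    n_z = r.sum(axis=0)
    pi = n_z / len(Y)
    mu = (r * Y[:, None]).sum(axis=0) / n_z
    sd = np.sqrt((r * (Y[:, None] - mu[None, :]) ** 2).sum(axis=0) / n_z)

print("pi =", pi.round(3), "mu =", mu.round(3), "sd =", sd.round(3))
```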
Tutorial on EM algorithm – Loc Nguyen (ng_phloc@yahoo.com) – http://www.locnguyen.net
Thank you for your attention.
Conditional mixture model for modeling attributed dyadic data
16/09/2021