This is chapter 3, "Properties and convergence of EM algorithm", in my book “Tutorial on EM algorithm”; it focuses on the mathematical explanation of the convergence of the GEM algorithm given by DLR (Dempster, Laird, & Rubin, 1977, pp. 6-9).
The expectation maximization (EM) algorithm is a popular and powerful mathematical method for parameter estimation when both observed data and hidden data exist. Therefore, EM suits applications which aim to exploit latent aspects of heterogeneous data. This report focuses on the probabilistic finite mixture model, a popular and successful application of EM, which is fully explained in my book (Nguyen, Tutorial on EM algorithm, 2020, pp. 78-88). I also propose a special regression model associated with the mixture model in which missing values are acceptable.
This is chapter 4, “Variants of EM algorithm”, in my book “Tutorial on EM algorithm”, which focuses on EM variants. The main purpose of the expectation maximization (EM) algorithm, as well as the GEM algorithm, is to maximize the log-likelihood L(Θ) = log(g(Y|Θ)) of observed data Y by maximizing the conditional expectation Q(Θ’|Θ). This Q(Θ’|Θ) is fixed by the E-step. Therefore, most variants of the EM algorithm focus on how to maximize Q(Θ’|Θ) in the M-step more effectively so that EM is faster or more accurate.
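For concreteness, here is a minimal sketch of the E-step/M-step loop for a two-component univariate Gaussian mixture (my own illustration in Python, not code from the book): the E-step computes the responsibilities that define Q(Θ’|Θ), and the M-step maximizes Q in closed form.

import numpy as np

def normal_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def em_gaussian_mixture(x, n_iter=100):
    # EM for a 2-component 1-D Gaussian mixture; returns (weights, means, standard deviations).
    w = np.array([0.5, 0.5])
    mu = np.array([x.min(), x.max()], dtype=float)
    sd = np.array([x.std(), x.std()], dtype=float)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] = P(component k | x_i, current parameters)
        dens = w * normal_pdf(x[:, None], mu, sd)          # shape (n, 2)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: closed-form maximizer of Q(theta' | theta) for this model
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return w, mu, sd

# Demo on synthetic data drawn from two known components.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 700)])
print(em_gaussian_mixture(x))   # roughly recovers weights (0.3, 0.7), means (-2, 3), stds (1, 0.5)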
Handling missing data with expectation maximization algorithm (Loc Nguyen)
The expectation maximization (EM) algorithm is a powerful mathematical tool for estimating the parameters of statistical models in the case of incomplete or hidden data. EM assumes that there is a relationship between hidden data and observed data, which can be a joint distribution or a mapping function. This implies another, implicit relationship between parameter estimation and data imputation. If missing data, which contains missing values, is treated as hidden data, it is very natural to handle missing data with the EM algorithm. Handling missing data is not new research, but this report focuses on the theoretical basis, with detailed mathematical proofs, for filling in missing values with EM. Besides, the multinormal distribution and the multinomial distribution are the two sample statistical models considered for holding missing values.
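The central fact behind the E-step when the model is multinormal (stated here as a reminder; the report itself gives the detailed proofs) is the conditional distribution of the missing block given the observed block. Partition each sample as x = (x_o, x_m), with mean μ = (μ_o, μ_m) and covariance blocks Σ_oo, Σ_om, Σ_mo, Σ_mm; then
x_m | x_o ~ N( μ_m + Σ_mo Σ_oo^{-1} (x_o − μ_o),  Σ_mm − Σ_mo Σ_oo^{-1} Σ_om ),
and the E-step fills each missing block with this conditional mean, using the conditional covariance for the expected second moments.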
X01 Supervised learning problem linear regression one feature theorie (Marco Moldenhauer)
1. The document describes supervised learning problems, specifically linear regression with one feature. It defines key concepts like the hypothesis function, cost function, and gradient descent algorithm.
2. A data set with one input feature and one output is defined. The goal is to learn a linear function that maps the input to the output to best fit the training data.
3. The hypothesis function is defined as h(x) = θ0 + θ1x, where θ0 and θ1 are parameters to be estimated. Gradient descent is used to minimize the cost function and find the optimal θ values.
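A minimal sketch of this setup in Python (an illustration with made-up data, not code from the document): batch gradient descent on the mean-squared-error cost J(θ0, θ1) = (1/2m) Σ_i (h(x_i) − y_i)^2.

import numpy as np

def gradient_descent(x, y, alpha=0.1, n_iter=1000):
    # Fit h(x) = theta0 + theta1 * x by batch gradient descent on the MSE cost.
    theta0, theta1 = 0.0, 0.0
    m = len(x)
    for _ in range(n_iter):
        h = theta0 + theta1 * x          # current predictions
        grad0 = (h - y).sum() / m        # dJ/dtheta0
        grad1 = ((h - y) * x).sum() / m  # dJ/dtheta1
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

# Toy training set: y is roughly 1 + 2x plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 50)
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, 50)
print(gradient_descent(x, y))   # should be close to (1.0, 2.0)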
Stochastic Calculus, Summer 2014, July 22, Lecture 7 Con.docx (dessiechisomjj4)
Stochastic Calculus, Summer 2014, July 22
Lecture 7
Connection of the Stochastic Calculus and Partial Differential Equations
Reading for this lecture:
(1) [1] pp. 125-175
(2) [2] pp. 239-280
(3) Professor R. Kohn’s lecture notes PDE for Finance, in particular Lecture 1
http://www.math.nyu.edu/faculty/kohn/pde_finance.html
Today throughout the lecture we will be using the following lemma.
Lemma 1. Assume we are given a random variable X on (Ω, F, P) and a filtration (Ft)t≥0. Then E(X|Ft) is a martingale with respect to the filtration (Ft)t≥0.
Proof. The proof is very easy and follows from the tower property of the conditional expectation. ∎
Corollary 2. Let Xt be a Markov process and Ft be the natural filtration associated with this process. Then according to the above lemma, for any function V the process E(V (XT )|Ft) is a martingale, and applying the Markov property we get that E(V (XT )|Xt) is a martingale. In the following we often write E(V (XT )|Xt) as EXt=xV (XT ).
As we will see, this corollary together with Itô’s formula yields some powerful results on the connection of partial differential equations and stochastic calculus.
Expected value of payoff V (XT ). Assume that Xt is a stochastic process satisfying the following stochastic differential equation
dXt = a(t, Xt)dt + σ(t, Xt)dBt, (1)
or in the integral form
Xt − X0 = ∫_0^t a(s, Xs)ds + ∫_0^t σ(s, Xs)dBs. (2)
Let
u(t, x) = EXt=xV (XT ) (3)
be the expected value of some payoff V at maturity T > t given that Xt = x. Then u(t, x) solves
ut + a(t, x)ux + (1/2)(σ(t, x))^2 uxx = 0 for t < T, with u(T, x) = V (x). (4)
By Corollary 2 we conclude that u(t, x) defined by (3) is a martingale. Applying Itô’s lemma we obtain
du(t, Xt) = ut dt + ux dXt + (1/2) uxx (dXt)^2
= ut dt + ux (a dt + σ dBt) + (1/2) uxx σ^2 dt
= (ut + a ux + (1/2) σ^2 uxx) dt + σ ux dBt, (5)
Since u(t, x) is a martingale, the drift term must be zero and thus u(t, x) solves
ut + a ux + (1/2) σ^2 uxx = 0.
Substituting t = T in (3) we get that u(T, x) = EXT=x(V (XT )) = V (x).
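As a quick illustration (an example added here, not part of the original notes): take a ≡ 0 and constant σ with payoff V (x) = x^2. Given Xt = x, we have XT = x + σ(BT − Bt) ~ N(x, σ^2(T − t)), hence
u(t, x) = EXt=x(XT^2) = x^2 + σ^2(T − t),
and indeed ut = −σ^2 and uxx = 2, so ut + (1/2)σ^2 uxx = 0 with u(T, x) = x^2, as claimed in (4).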
Feynman-Kac formula. Suppose that we are interested in a suitably “discounted” final-time payoff of the form
u(t, x) = EXt=x( e^{−∫_t^T b(s,Xs)ds} V (XT ) ) (6)
for some specified function b(t, Xt). We will show that u then solves
ut + a(t, x)ux + (1/2) σ^2 uxx − b(t, x)u = 0 (7)
and final-time condition u(T, x) = V (x).
The fact that u(T, x) = V (x) is clear from the definition of function u. Therefore let us concentrate on the proof of (7). Our strategy is to apply Corollary 2 and thus we have to find some martingale involving u(t, x). For this reason let us consider
e^{−∫_0^t b(s,Xs)ds} u(t, x) = e^{−∫_0^t b(s,Xs)ds} EXt=x( e^{−∫_t^T b(s,Xs)ds} V (XT ) ) = EXt=x( e^{−∫_0^T b(s,Xs)ds} V (XT ) ). (8)
According to Corollary 2,
EXt=x( e^{−∫_0^T b(s,Xs)ds} V (XT ) )
is a martingale and thus e^{−∫_0^t b(s,Xs)ds} u(t, x) is a martingale. Applying Itô’s lemma ...
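As a numerical companion to the Feynman-Kac formula (a sketch added here for illustration; it is not part of the original notes, and the step counts and test case are arbitrary choices), the following Python snippet simulates (1) with the Euler-Maruyama scheme, approximates the discount integral by a Riemann sum along each path, and compares the Monte Carlo estimate of (6) with a known closed-form solution of (7).

import numpy as np

def feynman_kac_mc(x0, t0, T, a, sigma, b, V, n_paths=100_000, n_steps=100, seed=0):
    # Monte Carlo estimate of u(t0, x0) = E[ exp(-int_{t0}^T b(s, X_s) ds) * V(X_T) | X_{t0} = x0 ]
    # using the Euler-Maruyama scheme for dX = a(t, X) dt + sigma(t, X) dB.
    rng = np.random.default_rng(seed)
    dt = (T - t0) / n_steps
    x = np.full(n_paths, float(x0))
    integral_b = np.zeros(n_paths)            # Riemann-sum approximation of int b(s, X_s) ds
    t = t0
    for _ in range(n_steps):
        integral_b += b(t, x) * dt
        x = x + a(t, x) * dt + sigma(t, x) * rng.normal(0.0, np.sqrt(dt), n_paths)
        t += dt
    return np.mean(np.exp(-integral_b) * V(x))

# Test case: a = 0, sigma = 0.3, b = r = 0.05, V(x) = x^2, for which (7) has the
# closed-form solution u(t, x) = exp(-r(T - t)) * (x^2 + sigma^2 (T - t)).
r, sig, t0, T, x0 = 0.05, 0.3, 0.0, 1.0, 1.0
mc = feynman_kac_mc(x0, t0, T,
                    a=lambda t, x: 0.0 * x,
                    sigma=lambda t, x: sig + 0.0 * x,
                    b=lambda t, x: r + 0.0 * x,
                    V=lambda x: x ** 2)
exact = np.exp(-r * (T - t0)) * (x0 ** 2 + sig ** 2 * (T - t0))
print(mc, exact)    # the two numbers should agree to about two decimal places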
The integrating factor method is a fundamental technique for solving linear, first-order ordinary differential equations of the form y′(t) + p(t)y(t) = g(t). It involves multiplying the equation by an integrating factor μ(t) = e^{∫p(t)dt}, which turns the left side into the exact derivative (μ(t)y(t))′. Integrating both sides gives μ(t)y(t) = ∫μ(t)g(t)dt + C, so the general solution is y(t) = e^{−∫p(t)dt}(∫e^{∫p(t)dt}g(t)dt + C).
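A standard worked example (added for illustration): solve y′ + 2y = e^t. Here p(t) = 2 and g(t) = e^t, so μ(t) = e^{2t} and (e^{2t}y)′ = e^{3t}; integrating gives e^{2t}y = (1/3)e^{3t} + C, hence y(t) = (1/3)e^t + Ce^{−2t}.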
Expectation Maximization Algorithm with Combinatorial Assumption (Loc Nguyen)
The expectation maximization (EM) algorithm is a popular and powerful mathematical method for parameter estimation when both observed data and hidden data exist. The EM process depends on an implicit relationship between observed data and hidden data, which is specified by a mapping function in traditional EM and by a joint probability density function (PDF) in practical EM. However, the mapping function is vague and impractical, whereas the joint PDF is not easy to define because of the heterogeneity between observed data and hidden data. This research aims to improve the competency of EM by making it more feasible and easier to specify, which removes the vagueness. Therefore, the research proposes the assumption that observed data is a combination of hidden data, realized as an analytic function where data points are numerical. In other words, observed points are supposedly calculated from hidden points via a regression model. Mathematical computations and proofs indicate the feasibility and clarity of the proposed method, which can be considered an extension of EM.
The document discusses the divide and conquer algorithm design paradigm. It begins by defining divide and conquer as recursively breaking down a problem into smaller sub-problems, solving the sub-problems, and then combining the solutions to solve the original problem. Some examples of problems that can be solved using divide and conquer include binary search, quicksort, merge sort, and the fast Fourier transform algorithm. The document then discusses control abstraction, efficiency analysis, and uses divide and conquer to provide algorithms for large integer multiplication and merge sort. It concludes by defining the convex hull problem and providing an example input and output.
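For instance, a minimal merge sort in Python (my own sketch of the paradigm, not code from the summarized document): divide the list in half, sort each half recursively, then combine by merging.

def merge_sort(a):
    # Divide-and-conquer sort: split, recursively sort each half, then merge.
    if len(a) <= 1:                      # base case: already sorted
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # combine step: merge two sorted halves
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([5, 2, 9, 1, 5, 6]))    # [1, 2, 5, 5, 6, 9]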
This document discusses Hilbert-Schmidt n-tuples of operators on a Banach space. It presents two main results: 1) the Hypercyclicity Criterion, which provides conditions for an n-tuple of operators to be hypercyclic, and 2) conditions under which an n-tuple of unilateral weighted backward shifts is chaotic or has a non-trivial periodic point. It also references several other works studying properties of n-tuples and hypercyclic operators.
The document summarizes parameter estimation methods: 1) The method of moments estimates parameters by equating sample and population moments. 2) The maximum likelihood method estimates parameters by maximizing the likelihood function. 3) Mean-squared error compares estimators based on their variance and bias, preferring those with the smallest error.
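A tiny numerical illustration of the first two methods (my own example, not from the summarized document): for an exponential sample, matching the first moment and maximizing the likelihood both give the rate estimate 1/x̄.

import numpy as np

rng = np.random.default_rng(0)
sample = rng.exponential(scale=2.0, size=10_000)   # true rate lambda = 1/scale = 0.5
lam_mom = 1.0 / sample.mean()   # method of moments: solve E[X] = 1/lambda = sample mean
lam_mle = 1.0 / sample.mean()   # MLE: d/dlambda [n*log(lambda) - lambda*sum(x)] = 0 gives the same estimator
print(lam_mom, lam_mle)         # both close to 0.5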
The International Journal of Engineering and Science (The IJES)
The document discusses applying the Hansen-Bliek-Rohn method to solve the total least squares problem with interval data input. It begins with an introduction to total least squares and interval arithmetic. It then presents how to compute the mean and variance for statistical data expressed as intervals. Next, it discusses the general linear model for least squares and properties of the covariance matrix. It introduces using component-wise distance as a condition number for the weight matrix. In the following sections it will apply the Hansen-Bliek-Rohn method to a numerical example to solve the resulting interval linear system.
Conditional mixture model and its application for regression model (Loc Nguyen)
Expectation maximization (EM) algorithm is a powerful mathematical tool for estimating statistical parameters when a data sample contains a hidden part and an observed part. EM is applied to learn the finite mixture model, in which the whole distribution of the observed variable is a weighted sum of partial distributions. The coverage ratio of every partial distribution is specified by the probability of the hidden variable. An application of the mixture model is soft clustering, in which a cluster is modeled by the hidden variable, each data point can be assigned to more than one cluster, and the degree of such assignment is represented by the probability of the hidden variable. However, such probability in the traditional mixture model is simplified as a parameter, which can cause loss of valuable information. Therefore, in this research I propose a so-called conditional mixture model (CMM) in which the probability of the hidden variable is modeled as a full probabilistic density function (PDF) that owns an individual parameter. CMM aims to extend the mixture model. I also propose an application of CMM called adaptive regression model (ARM). The traditional regression model is effective when the data sample is scattered evenly. If data points are grouped into clusters, the regression model tries to learn a unified regression function which goes through all data points. Obviously, such a unified function is not effective for evaluating the response variable based on grouped data points. The concept “adaptive” in ARM means that ARM solves this ineffectiveness problem by first selecting the best cluster of data points and then evaluating the response variable within that best cluster. In other words, ARM reduces the estimation space of the regression model so as to gain high accuracy in calculation.
Keywords: expectation maximization (EM) algorithm, finite mixture model, conditional mixture model, regression model, adaptive regression model (ARM).
This talk considers parameter estimation in the two-component symmetric Gaussian mixtures in $d$ dimensions with $n$ independent samples. We show that, even in the absence of any separation between components, with high probability, the EM algorithm converges to an estimate in at most $O(\sqrt{n} \log n)$ iterations, which is within $O((d/n)^{1/4} (\log n)^{3/4})$ in Euclidean distance to the true parameter, provided that $n=\Omega(d \log^2 d)$. This is within a logarithmic factor of the minimax optimal rate of $(d/n)^{1/4}$. The proof relies on establishing (a) a non-linear contraction behavior of the population EM mapping and (b) concentration of the EM trajectory near the population version, to prove that random initialization works. This is in contrast to the previous analysis in Daskalakis, Tzamos, and Zampetakis (2017), which requires sample splitting and restarting the EM iteration after normalization, and Balakrishnan, Wainwright, and Yu (2017), which requires strong conditions on both the separation of the components and the quality of the initialization. Furthermore, we obtain asymptotically efficient estimation when the signal is stronger than the minimax rate.
1) The Fourier transform is useful for designing filters by allowing systems to be described in the frequency domain. Important properties include linearity, time shifts, differentiation, and convolution.
2) Convolution becomes simple multiplication in the frequency domain. To solve a differential/convolution equation using Fourier transforms, take the Fourier transform of the inputs, multiply them, and take the inverse Fourier transform of the result.
3) An example shows designing a low-pass filter by taking the inverse Fourier transform of a rectangular function, producing an ideal low-pass response without time-domain oscillations. Approximating this with a causal function provides some low-pass filtering characteristics.
Existence of Extremal Solutions of Second Order Initial Value Problems (ijtsrd)
In this paper existence of extremal solutions of second order initial value problems with discontinuous right hand side is obtained under certain monotonicity conditions and without assuming the existence of upper and lower solutions. Two basic differential inequalities corresponding to these initial value problems are obtained in the form of extremal solutions. And also we prove uniqueness of solutions of given initial value problems under certain conditions. A. Sreenivas, "Existence of Extremal Solutions of Second Order Initial Value Problems", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3, Issue-4, June 2019.
URL: https://www.ijtsrd.com/papers/ijtsrd25192.pdf
Paper URL: https://www.ijtsrd.com/mathemetics/other/25192/existence-of-extremal-solutions-of-second-order-initial-value-problems/a-sreenivas
PaperNo14-Habibi-IJMA-n-Tuples and Chaoticity (Mezban Habibi)
This document presents theorems and definitions related to n-tuples of operators on a Frechet space and conditions for chaoticity. It begins with definitions of key concepts such as the orbit of a vector under an n-tuple of operators and what it means for an n-tuple to be hypercyclic or for a vector to be periodic. The main results section presents two theorems, the first characterizing when an n-tuple satisfies the hypercyclicity criterion and the second proving conditions under which an n-tuple of weighted backward shifts is chaotic. The second theorem shows the equivalence of an n-tuple being chaotic, hypercyclic with a non-trivial periodic point, having a non-trivial periodic point, and a
This document discusses convolution and linear time-invariant (LTI) systems. It begins by defining the impulse response of a system and representing convolution using integral forms. Properties of convolution like time-invariance and linearity are described. The step response of LTI systems is introduced. Examples are provided to demonstrate calculating the convolution of inputs and impulse responses by breaking it into cases. Properties of LTI systems discussed include causality, stability, invertibility, and interconnecting multiple LTI systems.
Implementation of parallel randomized algorithm for skew-symmetric matrix game (Ajay Bidyarthy)
The document describes a parallel randomized algorithm for solving skew-symmetric matrix games. The algorithm finds an ε-optimal strategy x for the matrix A in time O(f(n)) with probability ≥ 1/2, where f(n) is a polynomial. The algorithm initializes vectors X and U, and then iteratively updates X and U based on a randomly selected index k until the stopping criterion of U/t ≤ ε is reached, guaranteeing ε-optimality of the output vector x. The algorithm is proven to halt within t = 4ε^{-2} ln(n) iterations with probability ≥ 1/2.
This document provides an overview of convolution, Fourier series, and the Fourier transform. It defines convolution as a mathematical operator that computes the overlap between two functions. Fourier series expresses periodic functions as an infinite sum of sines and cosines. The Fourier transform allows converting signals between the time and frequency domains. It describes how the discrete Fourier transform (DFT) represents a sampled signal as a sum of complex exponentials, and how the fast Fourier transform (FFT) efficiently computes the DFT. The document also introduces Fourier transform pairs and defines the delta function.
The document discusses recursion, which is a method for solving problems by breaking them down into smaller subproblems. It provides examples of recursive algorithms like summing a list of numbers, calculating factorials, and the Fibonacci sequence. It also covers recursive algorithm components like the base case and recursive call. Methods for analyzing recursive algorithms' running times are presented, including iteration, recursion trees, and the master theorem.
This document discusses convolution, which is a mathematical operation used to express the relationship between the input and output of a linear time-invariant (LTI) system. Convolution involves integrating the product of two signals after one is reversed and shifted. The impulse response of an LTI system is its output when given a unit impulse as input. The output of an LTI system for any input can be found using the convolution integral, which convolves the input signal with the impulse response. Convolution has several properties like commutativity, distributivity, associativity, and shift properties. Finding the convolution of two signals involves integrating their product after one is shifted in time.
This paper introduces the concept of an ∞-tuple of operators on a Banach space and conditions for such an ∞-tuple to satisfy the Hypercyclicity Criterion. The paper defines what it means for an ∞-tuple to be hereditarily hypercyclic and proves theorems characterizing hereditarily hypercyclic ∞-tuples. Specifically, it is shown that an ∞-tuple is hereditarily hypercyclic if and only if it satisfies a topological transitivity condition involving open sets. The results generalize previous work on hypercyclic n-tuples of operators to the case of ∞-tuples.
This document describes computing Fourier series and power spectra with MATLAB. It discusses:
1) Representing signals in the frequency domain using Fourier analysis instead of the time domain. Fourier analysis allows isolating certain frequency ranges.
2) Computing Fourier series coefficients involves representing a signal as a sum of sines and cosines with different frequencies, and using integral properties to solve for coefficients.
3) Examples are provided to demonstrate Fourier series reconstruction of simple signals like a sine wave and square wave. The square wave example is used to derive its Fourier series coefficients analytically.
4) Computing Fourier transforms of discrete data uses a discrete approximation to integrals via the trapezoidal rule. A Fast Fourier Transform algorithm improves efficiency
On the Fixed Point Extension Results in the Differential Systems of Ordinary ... (BRNSS Publication Hub)
This document summarizes research on extending the domain of fixed points for ordinary differential equations. It begins with definitions of fixed points and extendability. It then establishes several theorems on extending fixed points, including using Peano's theorem on existence and Picard-Lindelof theorem on uniqueness to extend fixed points over open connected domains where the vector field is continuous. The document proves that if a fixed point is bounded on its domain and the limits at the endpoints exist, then the fixed point can be extended to those endpoints. It concludes by discussing extending fixed points defined on intervals to the whole positive real line using boundedness conditions.
Adversarial Variational Autoencoders to extend and improve generative model -... (Loc Nguyen)
Generative artificial intelligence (GenAI) has been developing with many incredible achievements like ChatGPT and Bard. The deep generative model (DGM) is a branch of GenAI which is preeminent in generating raster data such as images and sound due to the strong points of the deep neural network (DNN) in inference and recognition. The built-in inference mechanism of the DNN, which simulates and aspires to the synaptic plasticity of the human neural network, fosters the generation ability of the DGM, which produces surprising results with the support of statistical flexibility. Two popular approaches in DGM are Variational Autoencoders (VAE) and the Generative Adversarial Network (GAN). Both VAE and GAN have their own strong points, although they share the same underlying statistical theory as well as incredible complexity via the hidden layers of the DNN, where the DNN becomes an effective encoding/decoding function without concrete specifications. In this research, I try to unify VAE and GAN into a consistent and consolidated model called Adversarial Variational Autoencoders (AVA), in which VAE and GAN complement each other; for instance, VAE is a good data generator, encoding data via the excellent idea of Kullback-Leibler divergence, and GAN is a significantly important method to assess the reliability of data as realistic or fake. In other words, AVA aims to improve the accuracy of generative models; besides, AVA extends the function of simple generative models. In methodology, this research focuses on a combination of applied mathematical concepts and skillful techniques of computer programming in order to implement and solve complicated problems as simply as possible.
Nghịch dân chủ luận (an overview of democracy and political institutions related to ...) (Loc Nguyen)
The universe has matter and antimatter; society has conflict and harmony, so that it develops and declines, then declines and develops. I rely on this to justify an article that is contrarian in nature, running against the current of the times, but readers will find for themselves the inseparable meaning of social forms. Moreover, this article does not go deeply into legal research; it only offers an overview of democracy and political institutions in relation to philosophy and religion, whereby the contribution of the article is the concept of a "provisional" ("nương tạm") judiciary that does not truly come from election and does not truly come from appointment.
A Novel Collaborative Filtering Algorithm by Bit Mining Frequent Itemsets (Loc Nguyen)
Collaborative filtering (CF) is a popular technique in recommendation study. Concretely, items which are recommended to user are determined by surveying her/his communities. There are two main CF approaches, which are memory-based and model-based. I propose a new CF model-based algorithm by mining frequent itemsets from rating database. Hence items which belong to frequent itemsets are recommended to user. My CF algorithm gives immediate response because the mining task is performed at offline process-mode. I also propose another so-called Roller algorithm for improving the process of mining frequent itemsets. Roller algorithm is implemented by heuristic assumption “The larger the support of an item is, the higher it’s likely that this item will occur in some frequent itemset”. It models upon doing white-wash task, which rolls a roller on a wall in such a way that is capable of picking frequent itemsets. Moreover I provide enhanced techniques such as bit representation, bit matching and bit mining in order to speed up recommendation process. These techniques take advantages of bitwise operations (AND, NOT) so as to reduce storage space and make algorithms run faster.
Simple image deconvolution based on reverse image convolution and backpropaga... (Loc Nguyen)
Deconvolution task is not important in convolutional neural network (CNN) because it is not imperative to recover convoluted image when convolutional layer is important to extract features. However, the deconvolution task is useful in some cases of inspecting and reflecting a convolutional filter as well as trying to improve a generated image when information loss is not serious with regard to trade-off of information loss and specific features such as edge detection and sharpening. This research proposes a duplicated and reverse process of recovering a filtered image. Firstly, source layer and target layer are reversed in accordance with traditional image convolution so as to train the convolutional filter. Secondly, the trained filter is reversed again to derive a deconvolutional operator for recovering the filtered image. The reverse process is associated with backpropagation algorithm which is most popular in learning neural network. Experimental results show that the proposed technique in this research is better to learn the filters that focus on discovering pixel differences. Therefore, the main contribution of this research is to inspect convolutional filters from data.
Technological Accessibility: Learning Platform Among Senior High School Students (Loc Nguyen)
The document discusses technological accessibility among senior high school students. It covers several topics:
- Returning to nature through technology and ensuring technology accessibility is a life skill strengthened by advantages while avoiding problems for youth.
- Solutions to the dilemma of technology accessibility include letting youth develop independently, using security controls, and encouraging compassion.
- Researching career is optional but following passion and balancing positive and negative emotions can contribute to success.
The document discusses how engineering and technology can be used for social impact. It makes three key points:
1. Technology stems from knowledge and science, but assessing the value of knowledge is difficult. However, the importance of technology is clear through its fruits. Education fosters gaining knowledge and leads to technological development.
2. While technology increases wealth, it also widens inequality gaps. However, technology can indirectly address inequality through education by providing universal access to learning and bringing universities to people everywhere.
3. Diversity among countries should be embraced, and late innovations building on existing technologies can be particularly impactful, as was the strategy of a racing champion who overtook opponents in the final round through close observation and
Harnessing Technology for Research Education (Loc Nguyen)
The document discusses harnessing technology for research and education. It notes some paradoxes of technology, such as how it can both increase inequality but also help solve it through education. It discusses the future of education being supported by distance learning tools, AI, and virtual/augmented reality. Open universities are seen as an intermediate step towards "home universities" and as a place where students can become lifelong learners and teachers. Understanding, emotion, and compassion are discussed as being more important than just remembering facts. The document ends by discussing connecting different subjects and technologies like a jigsaw puzzle.
Future of education with support of technology (Loc Nguyen)
The document discusses the future of education with the support of technology. It notes that while technology deepens inequality by benefiting the rich, it can indirectly address inequality through education by allowing anyone to access knowledge and study from anywhere. Emerging forms of educational support include distance learning using tools like Zoom, artificial intelligence assistants, and virtual/augmented reality. Blended teaching combining different methods is emphasized. Understanding is seen as more important than memorization, and education should foster understanding, emotional development, and compassion through nature-based activities. The document argues technology can help education promote love by connecting various topics through both fusion and by assembling them like a jigsaw puzzle.
The document discusses generative artificial intelligence (AI) and its applications, using the metaphor of a dragon's flight. It introduces digital generative AI models that can generate images, sounds, and motions. As an example, it describes a model that could generate possible orbits or movements for a dragon and tiger in an image along with a bamboo background. The document then discusses how generative AI could be applied creatively in education to generate new learning materials and methods. It argues that while technology and societies are changing rapidly, climate change is a unifying challenge that nations must work together to overcome.
Learning dyadic data and predicting unaccomplished co-occurrent values by mix... (Loc Nguyen)
Dyadic data which is also called co-occurrence data (COD) contains co-occurrences of objects. Searching for statistical models to represent dyadic data is necessary. Fortunately, finite mixture model is a solid statistical model to learn and make inference on dyadic data because mixture model is built smoothly and reliably by expectation maximization (EM) algorithm which is suitable to inherent spareness of dyadic data. This research summarizes mixture models for dyadic data. When each co-occurrence in dyadic data is associated with a value, there are many unaccomplished values because a lot of co-occurrences are inexistent. In this research, these unaccomplished values are estimated as mean (expectation) of random variable given partial probabilistic distributions inside dyadic mixture model.
Machine learning forks into three main branches: supervised learning, unsupervised learning, and reinforcement learning, where reinforcement learning holds much potential for artificial intelligence (AI) applications because it solves real problems by a progressive process in which possible solutions are improved and fine-tuned continuously. The progressive approach, which reflects the ability of adaptation, is appropriate to the real world, where most events occur and change continuously and unexpectedly. Moreover, data is getting too huge for supervised learning and unsupervised learning to draw valuable knowledge from such huge data at one time. Bayesian optimization (BO) models an optimization problem as a probabilistic form called a surrogate model and then directly maximizes an acquisition function created from such a surrogate model in order to maximize, implicitly and indirectly, the target function for finding the solution of the optimization problem. A popular surrogate model is the Gaussian process regression model. The process of maximizing the acquisition function is based on updating the posterior probability of the surrogate model repeatedly, which is improved after every iteration. Taking advantage of an acquisition function or utility function is also common in decision theory, but the semantic meaning behind BO is that BO solves problems by a progressive and adaptive approach via updating the surrogate model from a small piece of data at each time, according to the ideology of reinforcement learning. Undoubtedly, BO is a reinforcement learning algorithm with many potential applications, and thus it is surveyed in this research with attention to its mathematical ideas. Moreover, the solution of the optimization problem is important not only to applied mathematics but also to AI.
Support vector machine is a powerful machine learning method in data classification. Using it for applied researches is easy but comprehending it for further development requires a lot of efforts. This report is a tutorial on support vector machine with full of mathematical proofs and example, which help researchers to understand it by the fastest way from theory to practice. The report focuses on theory of optimization which is the base of support vector machine.
A Proposal of Two-step Autoregressive ModelLoc Nguyen
Autoregressive (AR) model and conditional autoregressive (CAR) model are specific regressive models in which independent variables and dependent variable imply the same object. They are powerful statistical tools to predict values based on correlation of time domain and space domain, which are useful in epidemiology analysis. In this research, I combine them by the simple way in which AR and CAR is estimated in two separate steps so as to cover time domain and space domain in spatial-temporal data analysis. Moreover, I integrate logistic model into CAR model, which aims to improve competence of autoregressive models.
Extreme bound analysis based on correlation coefficient for optimal regressio...Loc Nguyen
Regression analysis is an important tool in statistical analysis, in which there is a demand of discovering essential independent variables among many other ones, especially in case that there is a huge number of random variables. Extreme bound analysis is a powerful approach to extract such important variables called robust regressors. In this research, a so-called Regressive Expectation Maximization with RObust regressors (REMRO) algorithm is proposed as an alternative method beside other probabilistic methods for analyzing robust variables. By the different ideology from other probabilistic methods, REMRO searches for robust regressors forming optimal regression model and sorts them according to descending ordering given their fitness values determined by two proposed concepts of local correlation and global correlation. Local correlation represents sufficient explanatories to possible regressive models and global correlation reflects independence level and stand-alone capacity of regressors. Moreover, REMRO can resist incomplete data because it applies Regressive Expectation Maximization (REM) algorithm into filling missing values by estimated values based on ideology of expectation maximization (EM) algorithm. From experimental results, REMRO is more accurate for modeling numeric regressors than traditional probabilistic methods like Sala-I-Martin method but REMRO cannot be applied in case of nonnumeric regression model yet in this research.
The document proposes a "jagged stock investment" strategy where stocks are repeatedly purchased at intervals where the price is increasing, similar to rises and falls in a saw blade. It shows mathematically that this strategy can achieve equal or higher returns than bank depositing if the average growth rate and interest rate are the same. The strategy models purchasing a share and its replications over time periods where each replication is bought using the profits from the previous rise. It provides formulas to calculate the overall value and return on investment from this jagged stock investment approach.
This document presents a study on minima distribution and its application to explaining the convergence and convergence speed of optimization algorithms. The author proposes weaker conditions for the convergence and monotonicity properties of minima distribution compared to previous work. Specifically, the convergence condition requires the function τ(x) to be positive and non-minimizing everywhere except at minimizers of the target function f(x). The monotonicity condition requires f(x) and the derivative of τ(x) to be inversely proportional. The author then defines convergence speed as the derivative of the integral of f(x) with respect to the optimization iterations and shows it depends on the slope of the derivative of τ(x).
Local optimization with convex function is solved perfectly by traditional mathematical methods such as Newton-Raphson and gradient descent but it is not easy to solve the global optimization with arbitrary function although there are some purely mathematical approaches such as approximation, cutting plane, branch and bound, and interval method which can be impractical because of their complexity and high computation cost. Recently, some evolutional algorithms which are inspired from biological activities are proposed to solve the global optimization by acceptable heuristic level. Among them is particle swarm optimization (PSO) algorithm which is proved as an effective and feasible solution for global optimization in real applications. Although the ideology of PSO is not complicated, it derives many variants, which can make new researchers confused. Therefore, this tutorial focuses on describing, systemizing, and classifying PSO by succinct and straightforward way. Moreover, a combination of PSO and another evolutional algorithm as artificial bee colony (ABC) algorithm for improving PSO itself or solving other advanced problems are mentioned too.
Maximum likelihood estimation (MLE) is a popular method for parameter estimation in both applied probability and statistics but MLE cannot solve the problem of incomplete data or hidden data because it is impossible to maximize likelihood function from hidden data. Expectation maximum (EM) algorithm is a powerful mathematical tool for solving this problem if there is a relationship between hidden data and observed data. Such hinting relationship is specified by a mapping from hidden data to observed data or by a joint probability between hidden data and observed data (showing MLE, EM, and practical EM, hidden info implies the hinting relationship).
The essential ideology of EM is to maximize the expectation of likelihood function over observed data based on the hinting relationship instead of maximizing directly the likelihood function of hidden data (showing the full EM with proof along with two steps).
An important application of EM is (finite) mixture model which in turn is developed towards two trends such as infinite mixture model and semiparametric mixture model. Especially, in semiparametric mixture model, component probabilistic density functions are not parameterized. Semiparametric mixture model is interesting and potential for other applications where probabilistic components are not easy to be specified (showing mixture models).
I raise a question that whether it is possible to backward discover semiparametric EM from semiparametric mixture model. I hope that this question will open a new trend or new extension for EM algorithm (showing the question).
Maximum likelihood estimation (MLE) is a popular method for parameter estimation in both applied probability and statistics but MLE cannot solve the problem of incomplete data or hidden data because it is impossible to maximize likelihood function from hidden data. Expectation maximum (EM) algorithm is a powerful mathematical tool for solving this problem if there is a relationship between hidden data and observed data. Such hinting relationship is specified by a mapping from hidden data to observed data or by a joint probability between hidden data and observed data. In other words, the relationship helps us know hidden data by surveying observed data. The essential ideology of EM is to maximize the expectation of likelihood function over observed data based on the hinting relationship instead of maximizing directly the likelihood function of hidden data. Pioneers in EM algorithm proved its convergence. As a result, EM algorithm produces parameter estimators as well as MLE does. This tutorial aims to provide explanations of EM algorithm in order to help researchers comprehend it. Moreover some improvements of EM algorithm are also proposed in the tutorial such as combination of EM and third-order convergence Newton-Raphson process, combination of EM and gradient descent method, and combination of EM and particle swarm optimization (PSO) algorithm.
ESR spectroscopy in liquid food and beverages.pptxPRIYANKA PATEL
With increasing population, people need to rely on packaged food stuffs. Packaging of food materials requires the preservation of food. There are various methods for the treatment of food to preserve them and irradiation treatment of food is one of them. It is the most common and the most harmless method for the food preservation as it does not alter the necessary micronutrients of food materials. Although irradiated food doesn’t cause any harm to the human health but still the quality assessment of food is required to provide consumers with necessary information about the food. ESR spectroscopy is the most sophisticated way to investigate the quality of the food and the free radicals induced during the processing of the food. ESR spin trapping technique is useful for the detection of highly unstable radicals in the food. The antioxidant capability of liquid food and beverages in mainly performed by spin trapping technique.
Phenomics assisted breeding in crop improvementIshaGoswami9
As the population is increasing and will reach about 9 billion upto 2050. Also due to climate change, it is difficult to meet the food requirement of such a large population. Facing the challenges presented by resource shortages, climate
change, and increasing global population, crop yield and quality need to be improved in a sustainable way over the coming decades. Genetic improvement by breeding is the best way to increase crop productivity. With the rapid progression of functional
genomics, an increasing number of crop genomes have been sequenced and dozens of genes influencing key agronomic traits have been identified. However, current genome sequence information has not been adequately exploited for understanding
the complex characteristics of multiple gene, owing to a lack of crop phenotypic data. Efficient, automatic, and accurate technologies and platforms that can capture phenotypic data that can
be linked to genomics information for crop improvement at all growth stages have become as important as genotyping. Thus,
high-throughput phenotyping has become the major bottleneck restricting crop breeding. Plant phenomics has been defined as the high-throughput, accurate acquisition and analysis of multi-dimensional phenotypes
during crop growing stages at the organism level, including the cell, tissue, organ, individual plant, plot, and field levels. With the rapid development of novel sensors, imaging technology,
and analysis methods, numerous infrastructure platforms have been developed for phenotyping.
The debris of the ‘last major merger’ is dynamically youngSérgio Sacani
The Milky Way’s (MW) inner stellar halo contains an [Fe/H]-rich component with highly eccentric orbits, often referred to as the
‘last major merger.’ Hypotheses for the origin of this component include Gaia-Sausage/Enceladus (GSE), where the progenitor
collided with the MW proto-disc 8–11 Gyr ago, and the Virgo Radial Merger (VRM), where the progenitor collided with the
MW disc within the last 3 Gyr. These two scenarios make different predictions about observable structure in local phase space,
because the morphology of debris depends on how long it has had to phase mix. The recently identified phase-space folds in Gaia
DR3 have positive caustic velocities, making them fundamentally different than the phase-mixed chevrons found in simulations
at late times. Roughly 20 per cent of the stars in the prograde local stellar halo are associated with the observed caustics. Based
on a simple phase-mixing model, the observed number of caustics are consistent with a merger that occurred 1–2 Gyr ago.
We also compare the observed phase-space distribution to FIRE-2 Latte simulations of GSE-like mergers, using a quantitative
measurement of phase mixing (2D causticality). The observed local phase-space distribution best matches the simulated data
1–2 Gyr after collision, and certainly not later than 3 Gyr. This is further evidence that the progenitor of the ‘last major merger’
did not collide with the MW proto-disc at early times, as is thought for the GSE, but instead collided with the MW disc within
the last few Gyr, consistent with the body of work surrounding the VRM.
The technology uses reclaimed CO₂ as the dyeing medium in a closed loop process. When pressurized, CO₂ becomes supercritical (SC-CO₂). In this state CO₂ has a very high solvent power, allowing the dye to dissolve easily.
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...Sérgio Sacani
Context. With a mass exceeding several 104 M⊙ and a rich and dense population of massive stars, supermassive young star clusters
represent the most massive star-forming environment that is dominated by the feedback from massive stars and gravitational interactions
among stars.
Aims. In this paper we present the Extended Westerlund 1 and 2 Open Clusters Survey (EWOCS) project, which aims to investigate
the influence of the starburst environment on the formation of stars and planets, and on the evolution of both low and high mass stars.
The primary targets of this project are Westerlund 1 and 2, the closest supermassive star clusters to the Sun.
Methods. The project is based primarily on recent observations conducted with the Chandra and JWST observatories. Specifically,
the Chandra survey of Westerlund 1 consists of 36 new ACIS-I observations, nearly co-pointed, for a total exposure time of 1 Msec.
Additionally, we included 8 archival Chandra/ACIS-S observations. This paper presents the resulting catalog of X-ray sources within
and around Westerlund 1. Sources were detected by combining various existing methods, and photon extraction and source validation
were carried out using the ACIS-Extract software.
Results. The EWOCS X-ray catalog comprises 5963 validated sources out of the 9420 initially provided to ACIS-Extract, reaching a
photon flux threshold of approximately 2 × 10−8 photons cm−2
s
−1
. The X-ray sources exhibit a highly concentrated spatial distribution,
with 1075 sources located within the central 1 arcmin. We have successfully detected X-ray emissions from 126 out of the 166 known
massive stars of the cluster, and we have collected over 71 000 photons from the magnetar CXO J164710.20-455217.
hematic appreciation test is a psychological assessment tool used to measure an individual's appreciation and understanding of specific themes or topics. This test helps to evaluate an individual's ability to connect different ideas and concepts within a given theme, as well as their overall comprehension and interpretation skills. The results of the test can provide valuable insights into an individual's cognitive abilities, creativity, and critical thinking skills
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...University of Maribor
Slides from talk:
Aleš Zamuda: Remote Sensing and Computational, Evolutionary, Supercomputing, and Intelligent Systems.
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Inter-Society Networking Panel GRSS/MTT-S/CIS Panel Session: Promoting Connection and Cooperation
https://www.etran.rs/2024/en/home-english/
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...Leonel Morgado
Current descriptions of immersive learning cases are often difficult or impossible to compare. This is due to a myriad of different options on what details to include, which aspects are relevant, and on the descriptive approaches employed. Also, these aspects often combine very specific details with more general guidelines or indicate intents and rationales without clarifying their implementation. In this paper we provide a method to describe immersive learning cases that is structured to enable comparisons, yet flexible enough to allow researchers and practitioners to decide which aspects to include. This method leverages a taxonomy that classifies educational aspects at three levels (uses, practices, and strategies) and then utilizes two frameworks, the Immersive Learning Brain and the Immersion Cube, to enable a structured description and interpretation of immersive learning cases. The method is then demonstrated on a published immersive learning case on training for wind turbine maintenance using virtual reality. Applying the method results in a structured artifact, the Immersive Learning Case Sheet, that tags the case with its proximal uses, practices, and strategies, and refines the free text case description to ensure that matching details are included. This contribution is thus a case description method in support of future comparative research of immersive learning cases. We then discuss how the resulting description and interpretation can be leveraged to change immersion learning cases, by enriching them (considering low-effort changes or additions) or innovating (exploring more challenging avenues of transformation). The method holds significant promise to support better-grounded research in immersive learning.
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxMAGOTI ERNEST
Although Artemia has been known to man for centuries, its use as a food for the culture of larval organisms apparently began only in the 1930s, when several investigators found that it made an excellent food for newly hatched fish larvae (Litvinenko et al., 2023). As aquaculture developed in the 1960s and ‘70s, the use of Artemia also became more widespread, due both to its convenience and to its nutritional value for larval organisms (Arenas-Pardo et al., 2024). The fact that Artemia dormant cysts can be stored for long periods in cans, and then used as an off-the-shelf food requiring only 24 h of incubation makes them the most convenient, least labor-intensive, live food available for aquaculture (Sorgeloos & Roubach, 2021). The nutritional value of Artemia, especially for marine organisms, is not constant, but varies both geographically and temporally. During the last decade, however, both the causes of Artemia nutritional variability and methods to improve poorquality Artemia have been identified (Loufi et al., 2024).
Brine shrimp (Artemia spp.) are used in marine aquaculture worldwide. Annually, more than 2,000 metric tons of dry cysts are used for cultivation of fish, crustacean, and shellfish larva. Brine shrimp are important to aquaculture because newly hatched brine shrimp nauplii (larvae) provide a food source for many fish fry (Mozanzadeh et al., 2021). Culture and harvesting of brine shrimp eggs represents another aspect of the aquaculture industry. Nauplii and metanauplii of Artemia, commonly known as brine shrimp, play a crucial role in aquaculture due to their nutritional value and suitability as live feed for many aquatic species, particularly in larval stages (Sorgeloos & Roubach, 2021).
Authoring a personal GPT for your research and practice: How we created the Q...Leonel Morgado
Thematic analysis in qualitative research is a time-consuming and systematic task, typically done using teams. Team members must ground their activities on common understandings of the major concepts underlying the thematic analysis, and define criteria for its development. However, conceptual misunderstandings, equivocations, and lack of adherence to criteria are challenges to the quality and speed of this process. Given the distributed and uncertain nature of this process, we wondered if the tasks in thematic analysis could be supported by readily available artificial intelligence chatbots. Our early efforts point to potential benefits: not just saving time in the coding process but better adherence to criteria and grounding, by increasing triangulation between humans and artificial intelligence. This tutorial will provide a description and demonstration of the process we followed, as two academic researchers, to develop a custom ChatGPT to assist with qualitative coding in the thematic data analysis process of immersive learning accounts in a survey of the academic literature: QUAL-E Immersive Learning Thematic Analysis Helper. In the hands-on time, participants will try out QUAL-E and develop their ideas for their own qualitative coding ChatGPT. Participants that have the paid ChatGPT Plus subscription can create a draft of their assistants. The organizers will provide course materials and slide deck that participants will be able to utilize to continue development of their custom GPT. The paid subscription to ChatGPT Plus is not required to participate in this workshop, just for trying out personal GPTs during it.
8.Isolation of pure cultures and preservation of cultures.pdf
Tutorial on EM algorithm – Part 2
1. Tutorial on EM algorithm – Part 2
Prof. Dr. Loc Nguyen, PhD, PostDoc
Founder of Loc Nguyen’s Academic Network, Vietnam
Email: ng_phloc@yahoo.com
Homepage: www.locnguyen.net
EM Tutorial P2 - Loc Nguyen
30/05/2022 1
2. Abstract
This is chapter 2 of my book “Tutorial on EM algorithm”, which
focuses on the essential description of the EM algorithm.
30/05/2022 EM Tutorial P2 - Loc Nguyen 2
3. Table of contents
This report describes the EM algorithm in detail.
1. Traditional EM algorithm
2. Practical EM algorithm
3
EM Tutorial P2 - Loc Nguyen
30/05/2022
4. 1. Traditional EM algorithm
Expectation maximization (EM) algorithm has many iterations and each iteration has two steps: the expectation step (E-step) calculates the sufficient statistic of hidden data based on observed data and the current parameter, whereas the maximization step (M-step) re-estimates the parameter. When DLR proposed the EM algorithm (Dempster, Laird, & Rubin, 1977), they first considered the case in which the PDF f(X | Θ) of the hidden space belongs to the exponential family. The E-step and M-step at the tth iteration are described in table 2.1 (Dempster, Laird, & Rubin, 1977, p. 4), in which the current estimate is Θ(t), with note that f(X | Θ) belongs to the regular exponential family.
Table 2.1. E-step and M-step of EM algorithm given regular exponential PDF f(X|Θ)
EM algorithm stops if two successive estimates are equal, Θ* = Θ(t) = Θ(t+1), at some tth iteration. At that time we conclude that Θ* is the optimal estimate of the EM process. Please see table 1.2 for how to calculate E(τ(X) | Θ(t)) and E(τ(X) | Y, Θ(t)). As a convention, the estimate of the parameter Θ resulting from the EM process is denoted Θ* instead of Θ in order to emphasize that Θ* is the solution of the optimization problem.
30/05/2022 EM Tutorial P2 - Loc Nguyen 4
E-step: We calculate current value τ(t) of the sufficient statistic τ(X) from observed Y and current parameter Θ(t) according to equation 2.6:

τ(t) = E(τ(X) | Y, Θ(t))

M-step: Basing on τ(t), we determine the next parameter Θ(t+1) as solution of equation 2.3:

E(τ(X) | Θ) = τ(t)

Note, Θ(t+1) will become current parameter at the next iteration ((t+1)th iteration).
5. 1. Traditional EM algorithm
It is necessary to explain the E-step and M-step as well as the convergence of the EM algorithm. Essentially, the two steps aim to maximize the log-likelihood function of Θ, denoted L(Θ), with respect to the observation Y:

Θ* = argmax_Θ L(Θ)

where L(Θ) = log g(Y | Θ) and log(.) denotes the logarithm function. Therefore, the EM algorithm is an extension of the maximum likelihood estimation (MLE) method. In fact, let l(Θ) be the log-likelihood function of Θ with respect to X:

l(Θ) = log f(X | Θ) = log b(X) + Θ^T τ(X) − log a(Θ)   (2.1)

By referring to table 1.2, the first-order derivative of l(Θ) is:

dl(Θ)/dΘ = d log f(X | Θ)/dΘ = τ(X)^T − log′ a(Θ) = τ(X)^T − E(τ(X) | Θ)^T   (2.2)

We set the first-order derivative of l(Θ) to zero with the expectation that l(Θ) will be maximized. Therefore, the optimal estimate Θ* is a solution of the following equation, which is specified in the M-step:

E(τ(X) | Θ) = τ(X)
30/05/2022 EM Tutorial P2 - Loc Nguyen 5
6. 1. Traditional EM algorithm
The expression E(τ(X) | Θ) is a function of Θ but τ(X) is still dependent on X. Let τ(t) be the value of τ(X) at the tth iteration of the EM process; the candidate for the best estimate of Θ is the solution of equation 2.3 according to the M-step:

E(τ(X) | Θ) = τ(t)   (2.3)

where E(τ(X) | Θ) = ∫_X f(X | Θ) τ(X) dX. Thus, we will calculate τ(t) by maximizing the log-likelihood function L(Θ) given Y. Recall that maximizing L(Θ) is the ultimate purpose of EM algorithm:

Θ* = argmax_Θ L(Θ)

Where,

L(Θ) = log g(Y | Θ) = log ∫_{φ^{-1}(Y)} f(X | Θ) dX   (2.4)

Due to k(X | Y, Θ) = f(X | Θ) / g(Y | Θ), it implies:

L(Θ) = log g(Y | Θ) = log f(X | Θ) − log k(X | Y, Θ)

Because f(X | Θ) belongs to exponential family, we have:

f(X | Θ) = b(X) exp(Θ^T τ(X)) / a(Θ)
k(X | Y, Θ) = b(X) exp(Θ^T τ(X)) / a(Θ | Y)

The log-likelihood function L(Θ) is reduced as follows:

L(Θ) = −log a(Θ) + log a(Θ | Y)
30/05/2022 EM Tutorial P2 - Loc Nguyen 6
7. 1. Traditional EM algorithm
By referring to table 1.2, the first-order derivative of L(Θ) is:

dL(Θ)/dΘ = −log′ a(Θ) + log′ a(Θ | Y) = −E(τ(X) | Θ)^T + E(τ(X) | Y, Θ)^T   (2.5)

We set the first-order derivative of L(Θ) to be zero with expectation that L(Θ) will be maximized, as follows:

−E(τ(X) | Θ)^T + E(τ(X) | Y, Θ)^T = 0^T

It implies E(τ(X) | Θ) = E(τ(X) | Y, Θ). Let Θ(t) be the current estimate at some tth iteration of EM process. Derived from the equality above, the value τ(t) is calculated as seen in equation 2.6:

τ(t) = E(τ(X) | Y, Θ(t))   (2.6)

Where,

E(τ(X) | Y, Θ(t)) = ∫_{φ^{-1}(Y)} k(X | Y, Θ(t)) τ(X) dX

Equation 2.6 specifies the E-step of EM process. After t iterations we will obtain Θ* = Θ(t+1) = Θ(t) such that E(τ(X) | Y, Θ(t)) = E(τ(X) | Y, Θ*) = τ(t) = E(τ(X) | Θ*) = E(τ(X) | Θ(t+1)) when Θ(t+1) is solution of equation 2.3 (Dempster, Laird, & Rubin, 1977, p. 5). This means that Θ* is the optimal estimate of EM process because Θ* is solution of the equation:

E(τ(X) | Θ) = E(τ(X) | Y, Θ)

Thus, we conclude that Θ* is the optimal estimate of EM process:

Θ* = argmax_Θ L(Θ)
30/05/2022 EM Tutorial P2 - Loc Nguyen 7
8. 1. Traditional EM algorithm
The EM algorithm shown in table 2.1 is exact under the assumption that f(X|Θ) belongs to the regular exponential family. If f(X|Θ) is not regular, the maximal point (maximizer) of the log-likelihood function l(Θ) is not always the stationary point Θ* at which the first-order derivative of l(Θ) is zero, l′(Θ*) = 0. However, if f(X|Θ) belongs to the curved exponential family, the M-step of the EM algorithm shown in table 2.1 is modified as follows (Dempster, Laird, & Rubin, 1977, p. 5):

Θ(t+1) = argmax_{Θ∈Ω0} l(Θ) = argmax_{Θ∈Ω0} l(Θ | τ(t)) = argmax_{Θ∈Ω0} (Θ^T τ(t) − log a(Θ))   (2.7)

where τ(t) is calculated by equation 2.6 in the E-step. This means that, in a more general manner, the maximizer Θ(t+1) will be found by some method. Recall that if Θ lies in a curved sub-manifold Ω0 of Ω, where Ω is the domain of Θ, then f(X | Θ) belongs to the curved exponential family.

In general, given the exponential family, within the simple EM algorithm the E-step aims to calculate the current sufficient statistic τ(t) at which the log-likelihood function L(Θ(t)) gets maximal at the current Θ(t) given Y, whereas the M-step aims to maximize the log-likelihood function l(Θ) given τ(t), as seen in table 2.2. Note, in table 2.2, f(X|Θ) belongs to the curved exponential family but it is not necessarily regular. The next slide shows table 2.2.
30/05/2022 EM Tutorial P2 - Loc Nguyen 8
9. 1. Traditional EM algorithm
Table 2.2. E-step and M-step of EM algorithm given exponential PDF f(X|Θ)

E-step: Given observed Y and current Θ(t), the current value τ(t) of the sufficient statistic τ(X) is the value at which the log-likelihood function L(Θ(t)) gets maximal. Concretely, suppose Θ* is a maximizer of L(Θ) given Y, where L(Θ) is specified by equation 2.4:

Θ* = argmax_Θ L(Θ) = argmax_Θ L(Θ | Y)

Suppose Θ* is formulated as a function of τ(X), for instance Θ* = h(τ(X)), with note that Θ* is not evaluated because τ(X) is not evaluated. Thus, the equation Θ* = h(τ(X)) is only a symbolic formula. Let τ(t) be a value of τ(X) such that Θ(t) = h(τ(X)). This means τ(t) ∈ {τ(X) : Θ(t) = h(τ(X))}.

M-step: Basing on τ(t), the next parameter Θ(t+1) is determined as a maximizer of l(Θ) given τ(t), according to equation 2.7.

30/05/2022 EM Tutorial P2 - Loc Nguyen 9
10. 1. Traditional EM algorithm
EM algorithm stops if two successive estimates are equal, Θ* = Θ(t) = Θ(t+1), at some tth iteration. At that time, Θ* is the optimal estimate of the EM process, which is an optimizer of L(Θ) as Θ* = argmax_Θ L(Θ). Going back to example 1.1, at the tth iteration the sufficient statistics x1 and x2 are estimated as x1(t) and x2(t) based on the current parameter θ(t) in the E-step according to equation 2.6:

x1(t) + x2(t) = E(x1 + x2 | Y, θ(t)) = y1

because y1 = x1 + x2 is observed. The probability of y1 is p_y1 = 1/2 + θ/4 and y1 is the sum of x1 and x2, so let p_{x1|y1} be the conditional probability of x1 given y1 and p_{x2|y1} be the conditional probability of x2 given y1 such that:

p_{x1|y1} = P(x1, y1) / p_y1 = P(x1, y1) / (1/2 + θ/4)
p_{x2|y1} = P(x2, y1) / p_y1 = P(x2, y1) / (1/2 + θ/4)
p_{x1|y1} + p_{x2|y1} = 1

where P(x1, y1) and P(x2, y1) are the joint probabilities of (x1, y1) and (x2, y1), respectively. We can select P(x1, y1) = 1/2 and P(x2, y1) = θ/4, which implies:

x1(t) = E(x1 | Y, θ(t)) = y1 · (1/2) / (1/2 + θ(t)/4)
x2(t) = E(x2 | Y, θ(t)) = y1 · (θ(t)/4) / (1/2 + θ(t)/4)

such that x1(t) + x2(t) = y1.
30/05/2022 EM Tutorial P2 - Loc Nguyen 10
11. 1. Traditional EM algorithm
Note, we could alternatively select, for example, P(x1, y1) = P(x2, y1) = (1/2 + θ/4)/2, but fixing P(x1, y1) as 1/2 is better because the next estimate θ(t+1), derived below, depends only on x2(t). When y1 is observed as y1 = 125, we obtain:

x1(t) = 125 · (1/2) / (1/2 + θ(t)/4)
x2(t) = 125 · (θ(t)/4) / (1/2 + θ(t)/4)

Because y1 is observed, E(y1 | Y, θ(t)) = y1 = 125. Essentially, equation 2.3 specifying the M-step results from maximizing the log-likelihood function l(θ). This log-likelihood function is:

l(θ) = log f(X | θ) = log((x1 + x2 + x3 + x4 + x5)! / (x1! x2! x3! x4! x5!)) − (x1 + 2x2 + 2x3 + 2x4 + 2x5) log 2 + (x2 + x5) log θ + (x3 + x4) log(1 − θ)

The first-order derivative of log f(X | θ) is:

d log f(X | θ)/dθ = (x2 + x5 − (x2 + x3 + x4 + x5) θ) / (θ(1 − θ))

Because y2 = x3 = 18, y3 = x4 = 20, y4 = x5 = 34 and x2 is approximated by x2(t), we have:

∂ log f(X | θ)/∂θ = (x2(t) + 34 − (x2(t) + 72) θ) / (θ(1 − θ))

As a maximizer of log f(X | θ), the next estimate θ(t+1) is the solution of the equation ∂ log f(X | θ)/∂θ = 0. So we have:

θ(t+1) = (x2(t) + 34) / (x2(t) + 72)

where x2(t) = 125 · (θ(t)/4) / (1/2 + θ(t)/4).
30/05/2022 EM Tutorial P2 - Loc Nguyen 11
12. 1. Traditional EM algorithm
For example, given the initial θ(1) = 0.5, at the first iteration we have:

x2(1) = 125 · (θ(1)/4) / (1/2 + θ(1)/4) = (125 · 0.5/4) / (0.5 + 0.5/4) = 25

θ(2) = (x2(1) + 34) / (x2(1) + 72) = (25 + 34) / (25 + 72) = 0.6082

After five iterations we get the optimal estimate θ*:

θ* = θ(5) = θ(6) = 0.6268

Table 1.3 (Dempster, Laird, & Rubin, 1977, p. 3) shows the resulting estimation.
30/05/2022 EM Tutorial P2 - Loc Nguyen 12
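As an illustration (not part of the original slides), the E-step and M-step above can be run as a few lines of Python; the function name em_multinomial and the printing are only for demonstration, and the closed-form update follows equations 2.6 and 2.3 as derived on the previous slides.

```python
# A minimal sketch of example 1.1: y1 = 125 is split between x1 and x2 in the E-step,
# and theta is re-estimated in closed form in the M-step.
def em_multinomial(theta=0.5, y1=125, y2=18, y3=20, y4=34, iterations=6):
    for t in range(iterations):
        # E-step: conditional expectation of x2 given y1 and the current theta (equation 2.6)
        x2 = y1 * (theta / 4) / (1 / 2 + theta / 4)
        # M-step: maximizer of log f(X | theta), i.e. theta = (x2 + x5) / (x2 + x3 + x4 + x5)
        theta = (x2 + y4) / (x2 + y2 + y3 + y4)
        print(f"iteration {t + 1}: x2 = {x2:.4f}, theta = {theta:.4f}")
    return theta

em_multinomial()  # approaches theta* = 0.6268 as reported in table 1.3
```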
13. 1. Traditional EM algorithm
For further research, DLR gave a preeminent generalization of the EM algorithm (Dempster, Laird, & Rubin, 1977, pp. 6-11) in which f(X | Θ) specifies an arbitrary distribution. In other words, there is no requirement of the exponential family. They define the conditional expectation Q(Θ' | Θ) according to equation 2.8 (Dempster, Laird, & Rubin, 1977, p. 6):

Q(Θ' | Θ) = E(log f(X | Θ') | Y, Θ) = ∫_{φ^{-1}(Y)} k(X | Y, Θ) log f(X | Θ') dX   (2.8)

If X and Y are discrete, equation 2.8 can be re-written as follows:

Q(Θ' | Θ) = E(log f(X | Θ') | Y, Θ) = Σ_{X ∈ φ^{-1}(Y)} k(X | Y, Θ) log f(X | Θ')

The two steps of the generalized EM (GEM) algorithm aim to maximize Q(Θ | Θ(t)) at some tth iteration, as seen in table 2.3 (Dempster, Laird, & Rubin, 1977, p. 6).

Table 2.3. E-step and M-step of GEM algorithm

E-step: The expectation Q(Θ | Θ(t)) is determined based on the current parameter Θ(t) according to equation 2.8. Actually, Q(Θ | Θ(t)) is formulated as a function of Θ.
M-step: The next parameter Θ(t+1) is a maximizer of Q(Θ | Θ(t)) with respect to Θ. Note that Θ(t+1) will become the current parameter at the next iteration (the (t+1)th iteration).

30/05/2022 EM Tutorial P2 - Loc Nguyen 13
14. 1. Traditional EM algorithm
DLR proved that the GEM algorithm converges at some tth iteration. At that time, Θ* = Θ(t+1) = Θ(t) is the optimal estimate of the EM process, which is an optimizer of L(Θ):

Θ* = argmax_Θ L(Θ)

It is deduced from the E-step and M-step that Q(Θ | Θ(t)) is increased after every iteration. How to maximize Q(Θ | Θ(t)) is an optimization problem which depends on applications. For example, the estimate Θ(t+1) can be the solution of the equation created by setting the first-order derivative of Q(Θ | Θ(t)) with regard to Θ to zero, DQ(Θ | Θ(t)) = 0^T. If solving such an equation is too complex or impossible, some popular methods to solve the optimization problem are Newton-Raphson (Burden & Faires, 2011, pp. 67-71), gradient descent (Ta, 2014), and Lagrange duality (Wikipedia, Karush–Kuhn–Tucker conditions, 2014). Note, solving the equation DQ(Θ | Θ(t)) = 0^T may be incorrect in some cases; for instance, in theory, a Θ(t+1) such that DQ(Θ(t+1) | Θ(t)) = 0^T may be a saddle point (not a maximizer).

GEM algorithm still aims to maximize the log-likelihood function L(Θ) specified by equation 2.4, which is explained here. The derivation below also justifies the definition of Q(Θ' | Θ) in equation 2.8. Suppose the current parameter is Θ after some iteration. Next we must find the new estimate Θ* that maximizes the next log-likelihood function L(Θ'):

Θ* = argmax_{Θ'} L(Θ') = argmax_{Θ'} log g(Y | Θ')
30/05/2022 EM Tutorial P2 - Loc Nguyen 14
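The GEM loop of table 2.3 can be sketched generically as below. This is only an illustrative skeleton, not code from the book: q_function is a hypothetical model-specific routine returning Q(Θ' | Θ), and scipy.optimize.minimize is used merely as one possible numerical M-step when no closed form is available.

```python
# A minimal GEM skeleton (illustrative sketch under the stated assumptions).
import numpy as np
from scipy.optimize import minimize

def gem(q_function, theta_init, max_iterations=100, tolerance=1e-8):
    """q_function(theta_next, theta_current) is assumed to return Q(theta' | theta)."""
    theta = np.asarray(theta_init, dtype=float)
    for _ in range(max_iterations):
        # E-step: with the current parameter fixed, Q(theta' | theta) is a function of theta' only
        q_of_next = lambda theta_next: q_function(theta_next, theta)
        # M-step: maximize Q(theta' | theta), here numerically by minimizing its negative
        theta_next = minimize(lambda th: -q_of_next(th), theta).x
        # Stop when two successive estimates coincide (up to the tolerance)
        if np.max(np.abs(theta_next - theta)) < tolerance:
            return theta_next
        theta = theta_next
    return theta
```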
15. 1. Traditional EM algorithm
The next log-likelihood function L(Θ') is re-written as follows:

L(Θ') = log ∫_{φ^{-1}(Y)} f(X | Θ') dX = log ∫_{φ^{-1}(Y)} k(X | Y, Θ) · (f(X | Θ') / k(X | Y, Θ)) dX

Due to ∫_{φ^{-1}(Y)} k(X | Y, Θ) dX = 1, by applying Jensen's inequality (Sean, 2009, pp. 3-4) with the concavity of the logarithm function,

log ∫_x u(x) v(x) dx ≥ ∫_x u(x) log v(x) dx, where ∫_x u(x) dx = 1,

into L(Θ'), we have (Sean, 2009, p. 6):

L(Θ') ≥ ∫_{φ^{-1}(Y)} k(X | Y, Θ) log(f(X | Θ') / k(X | Y, Θ)) dX
= ∫_{φ^{-1}(Y)} k(X | Y, Θ) (log f(X | Θ') − log k(X | Y, Θ)) dX
= ∫_{φ^{-1}(Y)} k(X | Y, Θ) log f(X | Θ') dX − ∫_{φ^{-1}(Y)} k(X | Y, Θ) log k(X | Y, Θ) dX
= Q(Θ' | Θ) − H(Θ | Θ)

Where,

Q(Θ' | Θ) = ∫_{φ^{-1}(Y)} k(X | Y, Θ) log f(X | Θ') dX
H(Θ' | Θ) = ∫_{φ^{-1}(Y)} k(X | Y, Θ) log k(X | Y, Θ') dX
30/05/2022 EM Tutorial P2 - Loc Nguyen 15
16. 1. Traditional EM algorithm
The lower-bound of L(Θ') is defined as follows:

lb(Θ' | Θ) = Q(Θ' | Θ) – H(Θ | Θ)

Of course, we have:

L(Θ') ≥ lb(Θ' | Θ)

Suppose at some tth iteration, when the current parameter is Θ(t), the lower-bound of L(Θ) is re-written:

lb(Θ | Θ(t)) = Q(Θ | Θ(t)) – H(Θ(t) | Θ(t))

Of course, we have:

L(Θ) ≥ lb(Θ | Θ(t))

The lower bound lb(Θ | Θ(t)) has following property (Sean, 2009, p. 7):

lb(Θ(t) | Θ(t)) = Q(Θ(t) | Θ(t)) – H(Θ(t) | Θ(t)) = L(Θ(t))

Indeed, we have:

lb(Θ(t) | Θ(t)) = Q(Θ(t) | Θ(t)) − H(Θ(t) | Θ(t))
= ∫_{φ^{-1}(Y)} k(X | Y, Θ(t)) log f(X | Θ(t)) dX − ∫_{φ^{-1}(Y)} k(X | Y, Θ(t)) log k(X | Y, Θ(t)) dX
= ∫_{φ^{-1}(Y)} k(X | Y, Θ(t)) log(f(X | Θ(t)) / k(X | Y, Θ(t))) dX
= ∫_{φ^{-1}(Y)} k(X | Y, Θ(t)) log g(Y | Θ(t)) dX
= log g(Y | Θ(t)) · ∫_{φ^{-1}(Y)} k(X | Y, Θ(t)) dX
= log g(Y | Θ(t)) = L(Θ(t))
30/05/2022 EM Tutorial P2 - Loc Nguyen 16
17. 1. Traditional EM algorithm
Recall that the main purpose of the GEM algorithm is to maximize the log-likelihood L(Θ) = log g(Y | Θ) with observed data Y. However, it is too difficult to maximize log g(Y | Θ) directly because g(Y | Θ) is not well-defined when g(Y | Θ) is an integral of f(X | Θ) given a general mapping function. DLR solved this problem by an iterative process which is an instance of the GEM algorithm. The lower bound (Sean, 2009, pp. 7-8) of L(Θ) is maximized over many iterations of the iterative process so that L(Θ) is maximized finally. Such a lower bound is determined indirectly by the conditional expectation Q(Θ | Θ(t)), so maximizing Q(Θ | Θ(t)) is the same as maximizing the lower bound. Suppose Θ(t+1) is a maximizer of Q(Θ | Θ(t)) at the tth iteration, which is also a maximizer of the lower bound at the tth iteration:

Θ(t+1) = argmax_Θ lb(Θ | Θ(t)) = argmax_Θ Q(Θ | Θ(t))

Note, H(Θ(t) | Θ(t)) is constant with regard to Θ. The lower bound is increased after every iteration. As a result, the maximizer Θ* of the final lower bound after many iterations is expected to be a maximizer of L(Θ). Therefore, the two steps of GEM are interpreted with regard to the lower bound lb(Θ | Θ(t)) as seen in table 2.4.
30/05/2022 EM Tutorial P2 - Loc Nguyen 17
18. 1. Traditional EM algorithm
Table 2.4. An interpretation of GEM with lower bound

E-step: The lower bound lb(Θ | Θ(t)) is re-calculated based on Q(Θ | Θ(t)).
M-step: The next parameter Θ(t+1) is a maximizer of Q(Θ | Θ(t)), which is also a maximizer of lb(Θ | Θ(t)) because H(Θ(t) | Θ(t)) is constant:

Θ(t+1) = argmax_Θ lb(Θ | Θ(t)) = argmax_Θ Q(Θ | Θ(t))

Note that Θ(t+1) will become current parameter at the next iteration so that the lower bound is increased in the next iteration.

Because Q(Θ | Θ(t)) is defined fixedly in E-step, most variants of EM algorithm focus on how to maximize Q(Θ' | Θ) in M-step more effectively so that EM is faster or more accurate. Figure 2.1 (Borman, 2004, p. 7) shows the relationship between the log-likelihood function L(Θ) and its lower-bound lb(Θ | Θ(t)).

Figure 2.1. Relationship between the log-likelihood function and its lower-bound

Now ideology of GEM is explained in detail.

30/05/2022 EM Tutorial P2 - Loc Nguyen 18
19. 1. Traditional EM algorithm
The next section focuses on the convergence of the GEM algorithm proved by DLR (Dempster, Laird, & Rubin, 1977, pp. 7-10), but first we should discuss some features of Q(Θ' | Θ). In the special case of the exponential family, Q(Θ' | Θ) is modified by equation 2.9:

Q(Θ' | Θ) = E(log b(X) | Y, Θ) + Θ'^T τΘ − log a(Θ')   (2.9)

Where,

E(log b(X) | Y, Θ) = ∫_{φ^{-1}(Y)} k(X | Y, Θ) log b(X) dX
τΘ = E(τ(X) | Y, Θ) = ∫_{φ^{-1}(Y)} k(X | Y, Θ) τ(X) dX

Following is a proof of equation 2.9:

Q(Θ' | Θ) = E(log f(X | Θ') | Y, Θ) = ∫_{φ^{-1}(Y)} k(X | Y, Θ) log f(X | Θ') dX
= ∫_{φ^{-1}(Y)} k(X | Y, Θ) log(b(X) exp(Θ'^T τ(X)) / a(Θ')) dX
= ∫_{φ^{-1}(Y)} k(X | Y, Θ) (log b(X) + Θ'^T τ(X) − log a(Θ')) dX
= ∫_{φ^{-1}(Y)} k(X | Y, Θ) log b(X) dX + ∫_{φ^{-1}(Y)} k(X | Y, Θ) Θ'^T τ(X) dX − ∫_{φ^{-1}(Y)} k(X | Y, Θ) log a(Θ') dX
= E(log b(X) | Y, Θ) + Θ'^T ∫_{φ^{-1}(Y)} k(X | Y, Θ) τ(X) dX − log a(Θ')
= E(log b(X) | Y, Θ) + Θ'^T E(τ(X) | Y, Θ) − log a(Θ')

Because k(X | Y, Θ) belongs to the exponential family, the expectation E(τ(X) | Y, Θ) is a function of Θ, denoted τΘ. It implies:

Q(Θ' | Θ) = E(log b(X) | Y, Θ) + Θ'^T τΘ − log a(Θ') ∎
30/05/2022 EM Tutorial P2 - Loc Nguyen 19
20. 1. Traditional EM algorithm
If f(X|Θ) belongs to the regular exponential family, Q(Θ' | Θ) gets maximal at the stationary point at which its first-order derivative is zero. By referring to table 1.2, the first-order derivative of Q(Θ' | Θ) with regard to Θ' is:

dQ(Θ' | Θ)/dΘ' = τΘ^T − log′ a(Θ') = τΘ^T − E(τ(X) | Θ')^T

Let τ(t) be the value of τΘ at the tth iteration:

τ(t) = E(τ(X) | Y, Θ(t)) = ∫_{φ^{-1}(Y)} k(X | Y, Θ(t)) τ(X) dX

The equation above is indeed equation 2.6. The next parameter Θ(t+1) is determined at the M-step as the solution of the following equation:

dQ(Θ' | Θ)/dΘ' = τ(t)^T − E(τ(X) | Θ')^T = 0^T

This implies E(τ(X) | Θ') = τ(t), which is indeed equation 2.3. If f(X|Θ) belongs to the curved exponential family, Θ(t+1) is determined as follows:

Θ(t+1) = argmax_{Θ'} Q(Θ' | Θ) = argmax_{Θ'} (Θ'^T τ(t) − log a(Θ'))

The equation above is indeed equation 2.7. Therefore, GEM shown in table 2.3 degrades into EM shown in tables 2.1 and 2.2 if f(X|Θ) belongs to the exponential family. Of course, this recognition is trivial. Example 1.1 is also a good example for GEM because the multinomial distribution belongs to the exponential family and we then apply equation 2.7 to maximize Q(Θ' | Θ).
30/05/2022 EM Tutorial P2 - Loc Nguyen 20
21. 2. Practical EM algorithm
In practice, Y is observed as N particular observations Y1, Y2,…, YN. Let 𝒴 = {Y1, Y2,…, YN} be the observed sample of size N with note that all Yi (s) are mutually independent and identically distributed (iid). Given an observation Yi, there is an associated random variable Xi. All Xi (s) are iid and they are not existent in fact. Each Xi ∈ X is a random variable like X; of course, the domain of each Xi is X. Let 𝒳 = {X1, X2,…, XN} be the set of associated random variables. Because all Xi (s) are iid, the joint PDF of 𝒳 is determined as follows:

f(𝒳 | Θ) = f(X1, X2, …, XN | Θ) = Π_{i=1..N} f(Xi | Θ)

Because all Xi (s) are iid and each Yi is associated with Xi, the conditional joint PDF of 𝒳 given 𝒴 is determined as follows:

k(𝒳 | 𝒴, Θ) = k(X1, X2, …, XN | Y1, Y2, …, YN, Θ) = Π_{i=1..N} k(Xi | Y1, Y2, …, YN, Θ) = Π_{i=1..N} k(Xi | Yi, Θ)

The conditional expectation Q(Θ' | Θ) given samples 𝒳 and 𝒴 is determined as follows:

Q(Θ' | Θ) = ∫_{φ^{-1}(𝒴)} k(𝒳 | 𝒴, Θ) log f(𝒳 | Θ') d𝒳
= ∫_{φ^{-1}(Y1)} ∫_{φ^{-1}(Y2)} … ∫_{φ^{-1}(YN)} (Π_{j=1..N} k(Xj | Yj, Θ)) · log(Π_{i=1..N} f(Xi | Θ')) dXN … dX2 dX1
30/05/2022 EM Tutorial P2 - Loc Nguyen 21
25. 2. Practical EM algorithm
Like taking Riemann integral on ∫_X δ(X, Xi) k(Xi | Yi, Θ) log f(X | Θ') dX, we have:

∫_{φ^{-1}(Yi)} (∫_X δ(X, Xi) k(Xi | Yi, Θ) log f(X | Θ') dX) dXi = ∫_{φ^{-1}(Yi)} k(Xi | Yi, Θ) log f(Xi | Θ') dXi

As a result, the conditional expectation Q(Θ' | Θ) given an observed sample 𝒴 = {Y1, Y2,…, YN} and a set of associated random variables 𝒳 = {X1, X2,…, XN} is specified as follows:

Q(Θ' | Θ) = Σ_{i=1..N} ∫_{φ^{-1}(Yi)} k(Xi | Yi, Θ) log f(Xi | Θ') dXi

Note, all Xi (s) are iid and they are not existent in fact. Because all Xi are iid, let X be the random variable representing every Xi and the equation of Q(Θ' | Θ) is re-written according to equation 2.10:

Q(Θ' | Θ) = Σ_{i=1..N} ∫_{φ^{-1}(Yi)} k(X | Yi, Θ) log f(X | Θ') dX   (2.10)

The similar proof of equation 2.10 in case that Xi (s) are discrete is found in (Bilmes, 1998, p. 4). If X and all Yi (s) are discrete, equation 2.10 can be re-written as follows:

Q(Θ' | Θ) = Σ_{i=1..N} Σ_{X ∈ φ^{-1}(Yi)} k(X | Yi, Θ) log f(X | Θ')
30/05/2022 EM Tutorial P2 - Loc Nguyen 25
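For the discrete form of equation 2.10, the double-sum structure can be written down directly. The sketch below is only illustrative: sample, hidden_values, k, and f are hypothetical placeholders for the observed sample, the support of X, the conditional PDF k(X | Yi, Θ), and the complete-data PDF f(X | Θ').

```python
# Illustrative sketch of the discrete version of equation 2.10:
# Q(theta' | theta) = sum_i sum_X k(X | Yi, theta) * log f(X | theta')
import math

def q_discrete(theta_next, theta, sample, hidden_values, k, f):
    total = 0.0
    for y in sample:              # outer sum over the observed sample Y1, ..., YN
        for x in hidden_values:   # inner sum over the hidden values X in phi^{-1}(Yi)
            total += k(x, y, theta) * math.log(f(x, theta_next))
    return total
```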
26. 2. Practical EM algorithm
In case that f(X | Θ) and k(X | Yi, Θ) belong to exponential family, equation 2.10 becomes equation 2.11 with an observed sample 𝒴 = {Y1, Y2,…, YN}:

Q(Θ' | Θ) = Σ_{i=1..N} E(log b(X) | Yi, Θ) + Θ'^T Σ_{i=1..N} τΘ,Yi − N log a(Θ')   (2.11)

Where,

E(log b(X) | Yi, Θ) = ∫_{φ^{-1}(Yi)} k(X | Yi, Θ) log b(X) dX
τΘ,Yi = E(τ(X) | Yi, Θ) = ∫_{φ^{-1}(Yi)} k(X | Yi, Θ) τ(X) dX

Please combine equation 2.9 and equation 2.10 to comprehend how to derive equation 2.11. Note, τΘ,Yi is dependent on both Θ and Yi.
30/05/2022 EM Tutorial P2 - Loc Nguyen 26
27. 2. Practical EM algorithm
DLR (Dempster, Laird, & Rubin, 1977, p. 1) called X as complete data because the mapping φ:
X → Y is many-one function. There is another case that the complete space Z consists of hidden
space X and observed space Y with note that X and Y are separated. There is no explicit
mapping φ from X and Y but there exists a PDF of 𝑍 ∈ 𝒁 as the joint PDF of 𝑋 ∈ 𝑿 and 𝑌 ∈ 𝒀.
𝑓 𝑍 Θ = 𝑓 𝑋, 𝑌 Θ
In this case, the equation 2.8 is modified with the joint PDF f(X, Y | Θ). The PDF of Y becomes:
𝑓 𝑌 Θ =
𝑋
𝑓 𝑋, 𝑌 Θ d𝑋
The PDF f(Y|Θ) is equivalent to the PDF g(Y|Θ) mentioned in equation 1.34. Although there is
no explicit mapping from X to Y, the PDF of Y above implies an implicit mapping from Z to Y.
The conditional PDF of X given Z is specified according to Bayes’ rule as follows:
𝑓 𝑍 𝑌, Θ = 𝑓 𝑋, 𝑌 𝑌, Θ = 𝑓 𝑋 𝑌 𝑓 𝑌 𝑌 = 𝑓 𝑋 𝑌, Θ =
𝑓 𝑋, 𝑌 Θ
𝑓 𝑌 Θ
=
𝑓 𝑋, 𝑌 Θ
𝑋
𝑓 𝑋, 𝑌 Θ d𝑋
The conditional PDF f(X|Y, Θ) is equivalent to the conditional PDF k(X|Y, Θ) mentioned in
equation 1.35.
30/05/2022 EM Tutorial P2 - Loc Nguyen 27
28. 2. Practical EM algorithm
Equation 2.12 specifies the conditional expectation Q(Θ' | Θ) in case that there is no explicit mapping from X to Y but there exists the joint PDF of X and Y:

Q(Θ' | Θ) = ∫_X f(Z | Y, Θ) log f(Z | Θ') dX = ∫_X f(X | Y, Θ) log f(X, Y | Θ') dX   (2.12)

Note, X is separated from Y and the complete data Z = (X, Y) is composed of X and Y. For equation 2.12, the existence of the joint PDF f(X, Y | Θ) can be replaced by the existence of the conditional PDF f(Y | X, Θ) and the prior PDF f(X | Θ) due to:

f(X, Y | Θ) = f(Y | X, Θ) f(X | Θ)

In applied statistics, equation 2.8 is often replaced by equation 2.12 because specifying the joint PDF f(X, Y | Θ) is more practical than specifying the mapping φ: X → Y. However, equation 2.8 is more general than equation 2.12 because the requirement of the joint PDF for equation 2.12 is stricter than the requirement of the explicit mapping for equation 2.8. In case that X and Y are discrete, equation 2.12 becomes:

Q(Θ' | Θ) = Σ_X P(X | Y, Θ) log P(X, Y | Θ')

In case that X and Y are discrete, P(X, Y | Θ) is the joint probability of X and Y whereas P(X | Y, Θ) is the conditional probability of X given Y.
30/05/2022 EM Tutorial P2 - Loc Nguyen 28
29. 2. Practical EM algorithm
Equation 2.12 can be proved alternatively without knowledge related to complete data (Sean, 2009). This proof is like the proof of equation 2.8. In fact, given hidden space X, observed space Y, and a joint PDF f(X, Y | Θ), the likelihood function L(Θ') is re-defined here as log f(Y | Θ'). The maximizer is:

Θ* = argmax_{Θ'} L(Θ') = argmax_{Θ'} log f(Y | Θ')

Suppose the current parameter is Θ after some iteration. Next we must find the new estimate Θ* that maximizes the next log-likelihood function L(Θ'). The total probability of the observed data can be determined by marginalizing over the hidden data:

f(Y | Θ') = ∫_X f(X, Y | Θ') dX

The expansion of f(Y | Θ') is the total probability rule. The next log-likelihood function L(Θ') is re-written:

L(Θ') = log f(Y | Θ') = log ∫_X f(X, Y | Θ') dX = log ∫_X f(X | Y, Θ) · (f(X, Y | Θ') / f(X | Y, Θ)) dX

Because hidden X is the complete set of mutually exclusive values, the conditional density of X given Y and Θ integrates to 1, that is, ∫_X f(X | Y, Θ) dX = 1, where f(X | Y, Θ) = f(X, Y | Θ) / ∫_X f(X, Y | Θ) dX.
30/05/2022 EM Tutorial P2 - Loc Nguyen 29
30. 2. Practical EM algorithm
Applying Jensen's inequality (Sean, 2009, pp. 3-4) with the concavity of the logarithm function,

log ∫_x u(x) v(x) dx ≥ ∫_x u(x) log v(x) dx, where ∫_x u(x) dx = 1,

into L(Θ'), we have (Sean, 2009, p. 6):

L(Θ') ≥ ∫_X f(X | Y, Θ) log(f(X, Y | Θ') / f(X | Y, Θ)) dX
= ∫_X f(X | Y, Θ) (log f(X, Y | Θ') − log f(X | Y, Θ)) dX
= ∫_X f(X | Y, Θ) log f(X, Y | Θ') dX − ∫_X f(X | Y, Θ) log f(X | Y, Θ) dX
= Q(Θ' | Θ) − H(Θ | Θ)

Where,

Q(Θ' | Θ) = ∫_X f(X | Y, Θ) log f(X, Y | Θ') dX
H(Θ' | Θ) = ∫_X f(X | Y, Θ) log f(X | Y, Θ') dX

Obviously, the lower-bound of L(Θ') is lb(Θ' | Θ) = Q(Θ' | Θ) − H(Θ | Θ). As aforementioned, the lower-bound lb(Θ' | Θ) (Sean, 2009, pp. 7-8) is maximized over many iterations of the iterative process so that L(Θ') is maximized finally. Because H(Θ | Θ) is constant with regard to Θ', it is possible to eliminate H(Θ | Θ), so maximizing Q(Θ' | Θ) is the same as maximizing the lower bound. Finally, when GEM converges with Θ(t) = Θ(t+1) = Θ*, we have:

Θ* = argmax_{Θ'} lb(Θ' | Θ) = argmax_{Θ'} Q(Θ' | Θ)

This completes the proof. ∎
30/05/2022 EM Tutorial P2 - Loc Nguyen 30
31. 2. Practical EM algorithm
The mixture model mentioned in subsection 5.1 is a good example for GEM without an explicit mapping from X to Y. Another well-known example is the three-coin toss example (Collins & Barzilay, 2005), which applies GEM to estimating parameters of binomial distributions without an explicit mapping.

Example 2.1. There are three coins named coin 1, coin 2, and coin 3. Each coin has two sides: a head (H) side and a tail (T) side. Let the hidden random variable X represent coin 1, where X is binary (X = {H, T}). Let θ1 be the probability of coin 1 showing the head side:

θ1 = P(X = H)

Of course, we have:

P(X = T) = 1 – θ1

Let the observed random variable Y represent a sequence of tossing coin 2 or coin 3 three times. Such a sequence depends on first tossing coin 1. For instance, if coin 1 shows the head side (X = H), the sequence is the result of tossing coin 2 three times. Otherwise, if coin 1 shows the tail side (X = T), the sequence is the result of tossing coin 3 three times. For example, suppose the first toss of coin 1 results in X = H; then a possible result Y = HHT means that we toss coin 2 three times, obtaining head, head, and tail from coin 2. Obviously, X is hidden and Y is observed. In this example, we observe that

Y = HHT
30/05/2022 EM Tutorial P2 - Loc Nguyen 31
32. 2. Practical EM algorithm
Suppose Y conforms to a binomial distribution as follows:

P(Y | X) = θ2^h (1 − θ2)^t if X = H;  θ3^h (1 − θ3)^t if X = T

where θ2 and θ3 are the probabilities of coin 2 and coin 3 showing the head side, respectively. Note, h is the number of head sides from the trials of tossing coin 2 (if X = H) or coin 3 (if X = T). Similarly, t is the number of tail sides from the trials of tossing coin 2 (if X = H) or coin 3 (if X = T). The joint probability P(X, Y) is:

P(X, Y) = P(X) P(Y | X) = θ1 θ2^h (1 − θ2)^t if X = H;  (1 − θ1) θ3^h (1 − θ3)^t if X = T

In short, we need to estimate Θ = (θ1, θ2, θ3)^T from the observation Y = HHT by the discrete version of Q(Θ' | Θ). Given Y = HHT, we have h = 2 and t = 1. Thus, the probability P(Y | X) becomes:

P(Y | X) = P(Y = HHT | X) = θ2² (1 − θ2) if X = H;  θ3² (1 − θ3) if X = T

The joint probability P(X, Y) becomes:

P(X, Y) = θ1 θ2² (1 − θ2) if X = H;  (1 − θ1) θ3² (1 − θ3) if X = T

The probability of Y is calculated by marginalizing the joint probability over X:

P(Y) = P(X = H, Y) + P(X = T, Y) = θ1 θ2² (1 − θ2) + (1 − θ1) θ3² (1 − θ3)
30/05/2022 EM Tutorial P2 - Loc Nguyen 32
33. 2. Practical EM algorithm
The conditional probability of X given Y is determined as follows:

P(X = H | Y) = θ1 θ2² (1 − θ2) / (θ1 θ2² (1 − θ2) + (1 − θ1) θ3² (1 − θ3))
P(X = T | Y) = (1 − θ1) θ3² (1 − θ3) / (θ1 θ2² (1 − θ2) + (1 − θ1) θ3² (1 − θ3))

The discrete version of Q(Θ' | Θ) is determined as follows:

Q(Θ' | Θ) = Σ_X P(X | Y, Θ) log P(X, Y | Θ')
= P(X = H | Y, Θ) log P(X = H, Y | Θ') + P(X = T | Y, Θ) log P(X = T, Y | Θ')
= P(X = H | Y, Θ) (log θ1' + 2 log θ2' + log(1 − θ2')) + P(X = T | Y, Θ) (log(1 − θ1') + 2 log θ3' + log(1 − θ3'))

Note, Q(Θ' | Θ) is a function of Θ' = (θ1', θ2', θ3')^T. The next parameter Θ(t+1) = (θ1(t+1), θ2(t+1), θ3(t+1))^T is a maximizer of Q(Θ' | Θ) with regard to Θ', which is the solution of the equation created by setting the first-order derivative of Q(Θ' | Θ) to zero, with note that the current parameter is Θ(t) = Θ.
30/05/2022 EM Tutorial P2 - Loc Nguyen 33
34. 2. Practical EM algorithm
The first-order partial derivative of Q(Θ' | Θ) with regard to θ1' is:

∂Q(Θ' | Θ)/∂θ1' = [θ1 θ2² (1 − θ2) − θ1' (θ1 θ2² (1 − θ2) + (1 − θ1) θ3² (1 − θ3))] / [θ1' (1 − θ1') (θ1 θ2² (1 − θ2) + (1 − θ1) θ3² (1 − θ3))]

Setting this partial derivative ∂Q(Θ' | Θ)/∂θ1' to zero, we obtain:

θ1' = θ1 θ2² (1 − θ2) / (θ1 θ2² (1 − θ2) + (1 − θ1) θ3² (1 − θ3))

Therefore, in the M-step, given current parameter Θ(t) = (θ1(t), θ2(t), θ3(t))^T, the next partial parameter θ1(t+1) is calculated as follows:

θ1(t+1) = θ1(t) (θ2(t))² (1 − θ2(t)) / (θ1(t) (θ2(t))² (1 − θ2(t)) + (1 − θ1(t)) (θ3(t))² (1 − θ3(t)))

The first-order partial derivative of Q(Θ' | Θ) with regard to θ2' is:

∂Q(Θ' | Θ)/∂θ2' = [θ1 θ2² (1 − θ2) / (θ1 θ2² (1 − θ2) + (1 − θ1) θ3² (1 − θ3))] · (2 − 3θ2') / (θ2' (1 − θ2'))

Setting this partial derivative ∂Q(Θ' | Θ)/∂θ2' to zero, we obtain:

θ2' = 2/3
30/05/2022 EM Tutorial P2 - Loc Nguyen 34
35. 2. Practical EM algorithm
Therefore, in the M-step, given current parameter Θ(t) = (θ1(t), θ2(t), θ3(t))^T, the next partial parameter θ2(t+1) is fixed at θ2(t+1) = 2/3. The first-order partial derivative of Q(Θ' | Θ) with regard to θ3' is:

∂Q(Θ' | Θ)/∂θ3' = [(1 − θ1) θ3² (1 − θ3) / (θ1 θ2² (1 − θ2) + (1 − θ1) θ3² (1 − θ3))] · (2 − 3θ3') / (θ3' (1 − θ3'))

Setting this partial derivative ∂Q(Θ' | Θ)/∂θ3' to zero, we obtain θ3' = 2/3. Therefore, in the M-step, given current parameter Θ(t) = (θ1(t), θ2(t), θ3(t))^T, the next partial parameter θ3(t+1) is fixed at θ3(t+1) = 2/3. In short, in the M-step of some tth iteration, given current parameter Θ(t) = (θ1(t), θ2(t), θ3(t))^T, only θ1(t+1) is updated whereas both θ2(t+1) and θ3(t+1) are fixed with observation Y = HHT:

θ1(t+1) = θ1(t) (θ2(t))² (1 − θ2(t)) / (θ1(t) (θ2(t))² (1 − θ2(t)) + (1 − θ1(t)) (θ3(t))² (1 − θ3(t)))
θ2(t+1) = θ3(t+1) = 2/3
30/05/2022 EM Tutorial P2 - Loc Nguyen 35
36. 2. Practical EM algorithm
For instance, let Θ(1) = (θ1(1), θ2(1), θ3(1))^T be initialized arbitrarily as θ1(1) = θ2(1) = θ3(1) = 0.5. At the first iteration, we obtain:

θ1(2) = (0.5 · 0.5² · (1 − 0.5)) / (0.5 · 0.5² · (1 − 0.5) + (1 − 0.5) · 0.5² · (1 − 0.5)) = 0.5
θ2(2) = θ3(2) = 2/3

At the second iteration with current parameter Θ(2) = (θ1(2) = 0.5, θ2(2) = 2/3, θ3(2) = 2/3)^T, we obtain:

θ1(3) = (0.5 · (2/3)² · (1 − 2/3)) / (0.5 · (2/3)² · (1 − 2/3) + (1 − 0.5) · (2/3)² · (1 − 2/3)) = 0.5
θ2(3) = θ3(3) = 2/3

As a result, GEM inside this example converges at the second iteration with final estimate Θ(2) = Θ(3) = Θ* = (θ1* = 0.5, θ2* = 2/3, θ3* = 2/3)^T.
30/05/2022 EM Tutorial P2 - Loc Nguyen 36
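The three-coin updates can be checked with a short sketch (illustrative only, not the original slides' code); it reproduces the convergence to (0.5, 2/3, 2/3) for the symmetric initialization.

```python
# Sketch of the GEM updates for example 2.1 with the single observation Y = HHT.
def three_coin_gem(theta1=0.5, theta2=0.5, theta3=0.5, iterations=3):
    for t in range(iterations):
        # Posterior weights P(X = H | Y, Theta) and P(X = T | Y, Theta) used inside Q
        head = theta1 * theta2 ** 2 * (1 - theta2)
        tail = (1 - theta1) * theta3 ** 2 * (1 - theta3)
        # M-step: theta1 becomes the posterior of X = H; theta2 and theta3 are fixed at 2/3
        theta1, theta2, theta3 = head / (head + tail), 2 / 3, 2 / 3
        print(f"iteration {t + 1}: theta = ({theta1:.4f}, {theta2:.4f}, {theta3:.4f})")
    return theta1, theta2, theta3

three_coin_gem()  # stays at (0.5, 2/3, 2/3), matching the slide's result
```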
37. 2. Practical EM algorithm
In practice, suppose Y is observed as a sample 𝒴 = {Y1, Y2,…, YN} of size N, with note that all Yi (s) are mutually independent and identically distributed (iid). The observed sample 𝒴 is associated with a hidden set (latent set) 𝒳 = {X1, X2,…, XN} of size N. All Xi (s) are iid and they are not existent in fact. Let X ∈ X be the random variable representing every Xi; of course, the domain of X is X. Equation 2.13 specifies the conditional expectation Q(Θ' | Θ) given such 𝒴:

Q(Θ' | Θ) = Σ_{i=1..N} ∫_X f(X | Yi, Θ) log f(X, Yi | Θ') dX   (2.13)

Equation 2.13 is a variant of equation 2.10 in case that there is no explicit mapping between Xi and Yi but there exists the same joint PDF between Xi and Yi. Please see the proof of equation 2.10 to comprehend how to derive equation 2.13. If both X and Y are discrete, equation 2.13 becomes:

Q(Θ' | Θ) = Σ_{i=1..N} Σ_X P(X | Yi, Θ) log P(X, Yi | Θ')   (2.14)
30/05/2022 EM Tutorial P2 - Loc Nguyen 37
38. 2. Practical EM algorithm
If X is discrete and Y is continuous such that f(X, Y | Θ) = P(X | Θ) f(Y | X, Θ) then, according to the total probability rule, we have:

f(Y | Θ) = Σ_X P(X | Θ) f(Y | X, Θ)

Note, when only X is discrete, its PDF f(X | Θ) becomes the probability P(X | Θ). Therefore, equation 2.15 is a variant of equation 2.13, as follows:

Q(Θ' | Θ) = Σ_{i=1..N} Σ_X P(X | Yi, Θ) log(P(X | Θ') f(Yi | X, Θ'))   (2.15)

Where P(X | Yi, Θ) is determined by Bayes' rule, as follows:

P(X | Yi, Θ) = P(X | Θ) f(Yi | X, Θ) / Σ_X P(X | Θ) f(Yi | X, Θ)

Equation 2.15 is the base for estimating the probabilistic mixture model by EM algorithm, which will be described later in detail. Some other properties of GEM will be mentioned in next section.
30/05/2022 EM Tutorial P2 - Loc Nguyen 38
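As a preview of how equation 2.15 drives the mixture-model EM mentioned above, the sketch below evaluates Q(Θ' | Θ) for a hypothetical mixture of univariate Gaussian components; the component form, the parameter layout, and the function names are illustrative assumptions, not the book's implementation.

```python
# Illustrative sketch of equation 2.15 for a hypothetical Gaussian mixture:
# P(X | Yi, Theta) is computed by Bayes' rule and weights the complete-data log-likelihood.
import math

def normal_pdf(y, mean, std):
    return math.exp(-0.5 * ((y - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def q_mixture(theta_next, theta, sample):
    """Each parameter is (weights, means, stds); the hidden X indexes the components."""
    weights, means, stds = theta
    weights2, means2, stds2 = theta_next
    total = 0.0
    for y in sample:
        # Bayes' rule: P(X | Yi, Theta) is proportional to P(X | Theta) * f(Yi | X, Theta)
        joint = [w * normal_pdf(y, m, s) for w, m, s in zip(weights, means, stds)]
        evidence = sum(joint)
        for x, numerator in enumerate(joint):
            responsibility = numerator / evidence
            # log(P(X | Theta') * f(Yi | X, Theta')) as in equation 2.15
            total += responsibility * (math.log(weights2[x]) + math.log(normal_pdf(y, means2[x], stds2[x])))
    return total
```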
39. References
1. Bilmes, J. A. (1998). A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models. International Computer Science Institute, Department of Electrical Engineering and Computer Science. Berkeley: University of Washington. Retrieved from http://melodi.ee.washington.edu/people/bilmes/mypubs/bilmes1997-em.pdf
2. Borman, S. (2004). The Expectation Maximization Algorithm - A short tutorial. University of Notre Dame, Department of Electrical Engineering. South Bend, Indiana: Sean Borman's Home Page.
3. Burden, R. L., & Faires, D. J. (2011). Numerical Analysis (9th ed.). (M. Julet, Ed.) Brooks/Cole Cengage Learning.
4. Collins, M., & Barzilay, R. (2005). Advanced Natural Language Processing - The EM Algorithm. Massachusetts Institute of Technology, Electrical Engineering and Computer Science. MIT OpenCourseWare. Retrieved October 9, 2020, from https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-864-advanced-natural-language-processing-fall-2005/lecture-notes/lec5.pdf
5. Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. (M. Stone, Ed.) Journal of the Royal Statistical Society, Series B (Methodological), 39(1), 1-38.
6. Sean, B. (2009). The Expectation Maximization Algorithm - A short tutorial. University of Notre Dame, Indiana, Department of Electrical Engineering. Sean Borman's Homepage.
7. Ta, P. D. (2014). Numerical Analysis Lecture Notes. Vietnam Institute of Mathematics, Numerical Analysis and Scientific Computing. Hanoi: Vietnam Institute of Mathematics. Retrieved 2014.
8. Wikipedia. (2014, August 4). Karush–Kuhn–Tucker conditions. (Wikimedia Foundation) Retrieved November 16, 2014, from Wikipedia website: http://en.wikipedia.org/wiki/Karush–Kuhn–Tucker_conditions
30/05/2022 EM Tutorial P2 - Loc Nguyen 39
40. Thank you for listening
40
EM Tutorial P2 - Loc Nguyen
30/05/2022