Expectation maximization (EM) algorithm is a popular and powerful mathematical method for parameter estimation in case that there exist both observed data and hidden data. The EM process depends on an implicit relationship between observed data and hidden data which is specified by a mapping function in traditional EM and a joint probability density function (PDF) in practical EM. However, the mapping function is vague and impractical whereas the joint PDF is not easy to be defined because of heterogeneity between observed data and hidden data. The research aims to improve competency of EM by making it more feasible and easier to be specified, which removes the vagueness. Therefore, the research proposes an assumption that observed data is the combination of hidden data which is realized as an analytic function where data points are numerical. In other words, observed points are supposedly calculated from hidden points via regression model. Mathematical computations and proofs indicate feasibility and clearness of the proposed method which can be considered as an extension of EM.
Toeplitz Hermitian Positive Definite (THPD) matrices play an important role in signal processing and computer graphics and circular models, related to angular / periodic data, have vide applications in various walks of life. Visualizing a circular model through THPD matrix the required computations on THPD matrices using single bordering and double bordering are discussed. It can be seen that every tridiagonal THPD leads to Cardioid model.
IOSR Journal of Mathematics(IOSR-JM) is an open access international journal that provides rapid publication (within a month) of articles in all areas of mathemetics and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in mathematics. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Toeplitz Hermitian Positive Definite (THPD) matrices play an important role in signal processing and computer graphics and circular models, related to angular / periodic data, have vide applications in various walks of life. Visualizing a circular model through THPD matrix the required computations on THPD matrices using single bordering and double bordering are discussed. It can be seen that every tridiagonal THPD leads to Cardioid model.
IOSR Journal of Mathematics(IOSR-JM) is an open access international journal that provides rapid publication (within a month) of articles in all areas of mathemetics and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in mathematics. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Correlation measure for intuitionistic fuzzy multi setseSAT Journals
Abstract In this paper, the Correlation measure of Intuitionistic Fuzzy Multi sets (IFMS) is proposed. The concept of this Correlation measure of IFMS is the extension of Correlation measure of IFS. Using the Correlation of IFMS measure, the application of medical diagnosis and pattern recognition are presented. The new method also shows that the correlation measure of any two IFMS equals one if and only if the two IFMS are the same. Keywords: Intuitionistic fuzzy set, Intuitionistic Fuzzy Multi sets, Correlation measure.
The numerical solution of Huxley equation by the use of two finite difference methods is done. The first one is the explicit scheme and the second one is the Crank-Nicholson scheme. The comparison between the two methods showed that the explicit scheme is easier and has faster convergence while the Crank-Nicholson scheme is more accurate. In addition, the stability analysis using Fourier (von Neumann) method of two schemes is investigated. The resulting analysis showed that the first scheme
is conditionally stable if, r ≤ 2 − aβ∆t , ∆t ≤ 2(∆x)2 and the second
scheme is unconditionally stable.
Isoparametric bilinear quadrilateral element _ ppt presentationFilipe Giesteira
The theoretical formulation of an isoparametric element, from the Lagrange family. In addition, the MATLAB code of the FEM from the bilinear element with four nodes was also implemented.
Assignment developed in the scope of the finite element method course, lectured at FEUP (Faculdade de Engenharia da Universidade do Porto).
AGGREGATION OF OPINIONS FOR SYSTEM SELECTION USING APPROXIMATIONS OF FUZZY NU...mathsjournal
In this article we assume that experts express their view points by way of approximation of Triangular fuzzy numbers. We take the help of fuzzy set theory concept to model the situation and present a method to aggregate these approximations of triangular fuzzy numbers to obtain an overall approximation of triangular fuzzy number for each system and then linear ordering done before the best system is chosen. A comparison has been made between approximation of triangular fuzzy systems and the corresponding fuzzy triangular numbers systems. The notions like fuzziness and ambiguity for the approximation of triangular fuzzy numbers are also found.
AGGREGATION OF OPINIONS FOR SYSTEM SELECTION USING APPROXIMATIONS OF FUZZY NU...mathsjournal
In this article we assume that experts express their view points by way of approximation of Triangular
fuzzy numbers. We take the help of fuzzy set theory concept to model the situation and present a method to
aggregate these approximations of triangular fuzzy numbers to obtain an overall approximation of
triangular fuzzy number for each system and then linear ordering done before the best system is chosen. A
comparison has been made betweenapproximation of triangular fuzzy systems and the corresponding fuzzy
triangular numbers systems. The notions like fuzziness and ambiguity for the approximation of triangular
fuzzy numbers are also found.
AGGREGATION OF OPINIONS FOR SYSTEM SELECTION USING APPROXIMATIONS OF FUZZY NU...mathsjournal
In this article we assume that experts express their view points by way of approximation of Triangular
fuzzy numbers. We take the help of fuzzy set theory concept to model the situation and present a method to
aggregate these approximations of triangular fuzzy numbers to obtain an overall approximation of
triangular fuzzy number for each system and then linear ordering done before the best system is chosen. A
comparison has been made betweenapproximation of triangular fuzzy systems and the corresponding fuzzy
triangular numbers systems. The notions like fuzziness and ambiguity for the approximation of triangular
fuzzy numbers are also found.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
When Classifier Selection meets Information Theory: A Unifying ViewMohamed Farouk
Classifier selection aims to reduce the size of an
ensemble of classifiers in order to improve its efficiency and
classification accuracy. Recently an information-theoretic view
was presented for feature selection. It derives a space of possible
selection criteria and show that several feature selection criteria
in the literature are points within this continuous space. The
contribution of this paper is to export this information-theoretic
view to solve an open issue in ensemble learning which is
classifier selection. We investigated a couple of informationtheoretic
selection criteria that are used to rank classifiers.
On solving fuzzy delay differential equationusing bezier curves IJECEIAES
In this article, we plan to use Bezier curves method to solve linear fuzzy delay differential equations. A Bezier curves method is presented and modified to solve fuzzy delay problems taking the advantages of the fuzzy set theory properties. The approximate solution with different degrees is compared to the exact solution to confirm that the linear fuzzy delay differential equations process is accurate and efficient. Numerical example is explained and analyzed involved first order linear fuzzy delay differential equations to demonstrate these proper features of this proposed problem.
Handling missing data with expectation maximization algorithmLoc Nguyen
Expectation maximization (EM) algorithm is a powerful mathematical tool for estimating parameter of statistical models in case of incomplete data or hidden data. EM assumes that there is a relationship between hidden data and observed data, which can be a joint distribution or a mapping function. Therefore, this implies another implicit relationship between parameter estimation and data imputation. If missing data which contains missing values is considered as hidden data, it is very natural to handle missing data by EM algorithm. Handling missing data is not a new research but this report focuses on the theoretical base with detailed mathematical proofs for fulfilling missing values with EM. Besides, multinormal distribution and multinomial distribution are the two sample statistical models which are concerned to hold missing values.
Expectation maximization (EM) algorithm is a popular and powerful mathematical method for parameter estimation in case that there exist both observed data and hidden data. Therefore, EM is appropriate to applications which aim to exploit latent aspects under heterogeneous data. This report focuses on probabilistic finite mixture model which is a popular and successful application of EM, which is fully explained in my book (Nguyen, Tutorial on EM algorithm, 2020, pp. 78-88). I also proposed a special regression model associated with mixture model in which missing values are acceptable.
Conditional mixture model for modeling attributed dyadic dataLoc Nguyen
Dyadic data contains co-occurrences of objects, which is often modeled by finite mixture model which in turn is learned by expectation maximization (EM) algorithm. Objects in traditional dyadic data are identified by names, causing the drawback which is that it is impossible to extract implicit valuable knowledge under objects. In this research, I propose the so-called attributed dyadic data (ADD) in which each object has an informative attribute and each co-occurrence of two objects is associated with a value. ADD is flexible and covers most of structures / forms of dyadic data. Conditional mixture model (CMM), which is a variant of finite mixture model, is applied into learning ADD. Moreover, a significant feature of CMM is that any co-occurrence of two objects is based on some conditional variable. As a result, CMM can predict or estimate co-occurrent values based on regression model, which extends applications of ADD and CMM.
Conditional mixture model and its application for regression modelLoc Nguyen
Expectation maximization (EM) algorithm is a powerful mathematical tool for estimating statistical parameter when data sample contains hidden part and observed part. EM is applied to learn finite mixture model in which the whole distribution of observed variable is average sum of partial distributions. Coverage ratio of every partial distribution is specified by the probability of hidden variable. An application of mixture model is soft clustering in which cluster is modeled by hidden variable whereas each data point can be assigned to more than one cluster and degree of such assignment is represented by the probability of hidden variable. However, such probability in traditional mixture model is simplified as a parameter, which can cause loss of valuable information. Therefore, in this research I propose a so-called conditional mixture model (CMM) in which the probability of hidden variable is modeled as a full probabilistic density function (PDF) that owns individual parameter. CMM aims to extend mixture model. I also propose an application of CMM which is called adaptive regression model (ARM). Traditional regression model is effective when data sample is scattered equally. If data points are grouped into clusters, regression model tries to learn a unified regression function which goes through all data points. Obviously, such unified function is not effective to evaluate response variable based on grouped data points. The concept “adaptive” of ARM means that ARM solves the ineffectiveness problem by selecting the best cluster of data points firstly and then evaluating response variable within such best cluster. In orther words, ARM reduces estimation space of regression model so as to gain high accuracy in calculation.
Keywords: expectation maximization (EM) algorithm, finite mixture model, conditional mixture model, regression model, adaptive regression model (ARM).
This is chapter 4 “Variants of EM algorithm” in my book “Tutorial on EM algorithm”, which focuses on EM variants. The main purpose of expectation maximization (EM) algorithm, also GEM algorithm, is to maximize the log-likelihood L(Θ) = log(g(Y|Θ)) with observed data Y by maximizing the conditional expectation Q(Θ’|Θ). Such Q(Θ’|Θ) is defined fixedly in E-step. Therefore, most variants of EM algorithm focus on how to maximize Q(Θ’|Θ) in M-step more effectively so that EM is faster or more accurate.
-contraction and some fixed point results via modified !-distance mappings in t...IJECEIAES
In this Article, we introduce the notion of an -contraction, which is based on modified !-distance mappings, and employ this new definition to prove some fixed point result. Moreover, to highlight the significance of our work, we present an interesting example along with an application. '
International journal of engineering and mathematical modelling vol2 no1_2015_1IJEMM
Our efforts are mostly concentrated on improving the convergence rate of the numerical procedures both from the viewpoint of cost-efficiency and accuracy by handling the parametrization of the shape to be optimized. We employ nested parameterization supports of either shape, or shape deformation, and the classical process of degree elevation resulting in exact geometrical data transfer from coarse to fine representations. The algorithms mimick classical multigrid strategies and are found very effective in terms of convergence acceleration. In this paper, we analyse and demonstrate the efficiency of the two-level correction algorithm which is the basic block of a more general miltilevel strategy.
Correlation measure for intuitionistic fuzzy multi setseSAT Journals
Abstract In this paper, the Correlation measure of Intuitionistic Fuzzy Multi sets (IFMS) is proposed. The concept of this Correlation measure of IFMS is the extension of Correlation measure of IFS. Using the Correlation of IFMS measure, the application of medical diagnosis and pattern recognition are presented. The new method also shows that the correlation measure of any two IFMS equals one if and only if the two IFMS are the same. Keywords: Intuitionistic fuzzy set, Intuitionistic Fuzzy Multi sets, Correlation measure.
The numerical solution of Huxley equation by the use of two finite difference methods is done. The first one is the explicit scheme and the second one is the Crank-Nicholson scheme. The comparison between the two methods showed that the explicit scheme is easier and has faster convergence while the Crank-Nicholson scheme is more accurate. In addition, the stability analysis using Fourier (von Neumann) method of two schemes is investigated. The resulting analysis showed that the first scheme
is conditionally stable if, r ≤ 2 − aβ∆t , ∆t ≤ 2(∆x)2 and the second
scheme is unconditionally stable.
Isoparametric bilinear quadrilateral element _ ppt presentationFilipe Giesteira
The theoretical formulation of an isoparametric element, from the Lagrange family. In addition, the MATLAB code of the FEM from the bilinear element with four nodes was also implemented.
Assignment developed in the scope of the finite element method course, lectured at FEUP (Faculdade de Engenharia da Universidade do Porto).
AGGREGATION OF OPINIONS FOR SYSTEM SELECTION USING APPROXIMATIONS OF FUZZY NU...mathsjournal
In this article we assume that experts express their view points by way of approximation of Triangular fuzzy numbers. We take the help of fuzzy set theory concept to model the situation and present a method to aggregate these approximations of triangular fuzzy numbers to obtain an overall approximation of triangular fuzzy number for each system and then linear ordering done before the best system is chosen. A comparison has been made between approximation of triangular fuzzy systems and the corresponding fuzzy triangular numbers systems. The notions like fuzziness and ambiguity for the approximation of triangular fuzzy numbers are also found.
AGGREGATION OF OPINIONS FOR SYSTEM SELECTION USING APPROXIMATIONS OF FUZZY NU...mathsjournal
In this article we assume that experts express their view points by way of approximation of Triangular
fuzzy numbers. We take the help of fuzzy set theory concept to model the situation and present a method to
aggregate these approximations of triangular fuzzy numbers to obtain an overall approximation of
triangular fuzzy number for each system and then linear ordering done before the best system is chosen. A
comparison has been made betweenapproximation of triangular fuzzy systems and the corresponding fuzzy
triangular numbers systems. The notions like fuzziness and ambiguity for the approximation of triangular
fuzzy numbers are also found.
AGGREGATION OF OPINIONS FOR SYSTEM SELECTION USING APPROXIMATIONS OF FUZZY NU...mathsjournal
In this article we assume that experts express their view points by way of approximation of Triangular
fuzzy numbers. We take the help of fuzzy set theory concept to model the situation and present a method to
aggregate these approximations of triangular fuzzy numbers to obtain an overall approximation of
triangular fuzzy number for each system and then linear ordering done before the best system is chosen. A
comparison has been made betweenapproximation of triangular fuzzy systems and the corresponding fuzzy
triangular numbers systems. The notions like fuzziness and ambiguity for the approximation of triangular
fuzzy numbers are also found.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
When Classifier Selection meets Information Theory: A Unifying ViewMohamed Farouk
Classifier selection aims to reduce the size of an
ensemble of classifiers in order to improve its efficiency and
classification accuracy. Recently an information-theoretic view
was presented for feature selection. It derives a space of possible
selection criteria and show that several feature selection criteria
in the literature are points within this continuous space. The
contribution of this paper is to export this information-theoretic
view to solve an open issue in ensemble learning which is
classifier selection. We investigated a couple of informationtheoretic
selection criteria that are used to rank classifiers.
On solving fuzzy delay differential equationusing bezier curves IJECEIAES
In this article, we plan to use Bezier curves method to solve linear fuzzy delay differential equations. A Bezier curves method is presented and modified to solve fuzzy delay problems taking the advantages of the fuzzy set theory properties. The approximate solution with different degrees is compared to the exact solution to confirm that the linear fuzzy delay differential equations process is accurate and efficient. Numerical example is explained and analyzed involved first order linear fuzzy delay differential equations to demonstrate these proper features of this proposed problem.
Handling missing data with expectation maximization algorithmLoc Nguyen
Expectation maximization (EM) algorithm is a powerful mathematical tool for estimating parameter of statistical models in case of incomplete data or hidden data. EM assumes that there is a relationship between hidden data and observed data, which can be a joint distribution or a mapping function. Therefore, this implies another implicit relationship between parameter estimation and data imputation. If missing data which contains missing values is considered as hidden data, it is very natural to handle missing data by EM algorithm. Handling missing data is not a new research but this report focuses on the theoretical base with detailed mathematical proofs for fulfilling missing values with EM. Besides, multinormal distribution and multinomial distribution are the two sample statistical models which are concerned to hold missing values.
Expectation maximization (EM) algorithm is a popular and powerful mathematical method for parameter estimation in case that there exist both observed data and hidden data. Therefore, EM is appropriate to applications which aim to exploit latent aspects under heterogeneous data. This report focuses on probabilistic finite mixture model which is a popular and successful application of EM, which is fully explained in my book (Nguyen, Tutorial on EM algorithm, 2020, pp. 78-88). I also proposed a special regression model associated with mixture model in which missing values are acceptable.
Conditional mixture model for modeling attributed dyadic dataLoc Nguyen
Dyadic data contains co-occurrences of objects, which is often modeled by finite mixture model which in turn is learned by expectation maximization (EM) algorithm. Objects in traditional dyadic data are identified by names, causing the drawback which is that it is impossible to extract implicit valuable knowledge under objects. In this research, I propose the so-called attributed dyadic data (ADD) in which each object has an informative attribute and each co-occurrence of two objects is associated with a value. ADD is flexible and covers most of structures / forms of dyadic data. Conditional mixture model (CMM), which is a variant of finite mixture model, is applied into learning ADD. Moreover, a significant feature of CMM is that any co-occurrence of two objects is based on some conditional variable. As a result, CMM can predict or estimate co-occurrent values based on regression model, which extends applications of ADD and CMM.
Conditional mixture model and its application for regression modelLoc Nguyen
Expectation maximization (EM) algorithm is a powerful mathematical tool for estimating statistical parameter when data sample contains hidden part and observed part. EM is applied to learn finite mixture model in which the whole distribution of observed variable is average sum of partial distributions. Coverage ratio of every partial distribution is specified by the probability of hidden variable. An application of mixture model is soft clustering in which cluster is modeled by hidden variable whereas each data point can be assigned to more than one cluster and degree of such assignment is represented by the probability of hidden variable. However, such probability in traditional mixture model is simplified as a parameter, which can cause loss of valuable information. Therefore, in this research I propose a so-called conditional mixture model (CMM) in which the probability of hidden variable is modeled as a full probabilistic density function (PDF) that owns individual parameter. CMM aims to extend mixture model. I also propose an application of CMM which is called adaptive regression model (ARM). Traditional regression model is effective when data sample is scattered equally. If data points are grouped into clusters, regression model tries to learn a unified regression function which goes through all data points. Obviously, such unified function is not effective to evaluate response variable based on grouped data points. The concept “adaptive” of ARM means that ARM solves the ineffectiveness problem by selecting the best cluster of data points firstly and then evaluating response variable within such best cluster. In orther words, ARM reduces estimation space of regression model so as to gain high accuracy in calculation.
Keywords: expectation maximization (EM) algorithm, finite mixture model, conditional mixture model, regression model, adaptive regression model (ARM).
This is chapter 4 “Variants of EM algorithm” in my book “Tutorial on EM algorithm”, which focuses on EM variants. The main purpose of expectation maximization (EM) algorithm, also GEM algorithm, is to maximize the log-likelihood L(Θ) = log(g(Y|Θ)) with observed data Y by maximizing the conditional expectation Q(Θ’|Θ). Such Q(Θ’|Θ) is defined fixedly in E-step. Therefore, most variants of EM algorithm focus on how to maximize Q(Θ’|Θ) in M-step more effectively so that EM is faster or more accurate.
-contraction and some fixed point results via modified !-distance mappings in t...IJECEIAES
In this Article, we introduce the notion of an -contraction, which is based on modified !-distance mappings, and employ this new definition to prove some fixed point result. Moreover, to highlight the significance of our work, we present an interesting example along with an application. '
International journal of engineering and mathematical modelling vol2 no1_2015_1IJEMM
Our efforts are mostly concentrated on improving the convergence rate of the numerical procedures both from the viewpoint of cost-efficiency and accuracy by handling the parametrization of the shape to be optimized. We employ nested parameterization supports of either shape, or shape deformation, and the classical process of degree elevation resulting in exact geometrical data transfer from coarse to fine representations. The algorithms mimick classical multigrid strategies and are found very effective in terms of convergence acceleration. In this paper, we analyse and demonstrate the efficiency of the two-level correction algorithm which is the basic block of a more general miltilevel strategy.
Numerical Solution Of Delay Differential Equations Using The Adomian Decompos...theijes
Adomian Decomposition Method has been applied to obtain approximate solution to a wide class of ordinary and partial differential equation problems arising from Physics, Chemistry, Biology and Engineering. In this paper, a numerical solution of delay differential Equations (DDE) based on the Adomian Decomposition Method (ADM) is presented. The solutions obtained were without discretization nor linearization. Example problems were solved for demonstration. Keywords: Adomian Decomposition, Delay Differential Equations (DDE), Functional Equations , Method of Characteristic.
Symmetric quadratic tetration interpolation using forward and backward opera...IJECEIAES
The existed interpolation method, based on the second-order tetration polynomial, has the asymmetric property. The interpolation results, for each considering region, give individual characteristics. Although the interpolation performance has been better than the conventional methods, the symmetric property for signal interpolation is also necessary. In this paper, we propose the symmetric interpolation formulas derived from the second-order tetration polynomial. The combination of the forward and backward operations was employed to construct two types of the symmetric interpolation. Several resolutions of the fundamental signals were used to evaluate the signal reconstruction performance. The results show that the proposed interpolations can be used to reconstruct the fundamental signal and its peak signal to noise ratio (PSNR) is superior to the conventional interpolation methods, except the cubic spline interpolation for the sine wave signal. However, the visual results show that it has a small difference. Moreover, our proposed interpolations converge to the steady-state faster than the cubic spline interpolation. In addition, the option number increasing will reinforce their sensitivity.
On Application of the Fixed-Point Theorem to the Solution of Ordinary Differe...BRNSS Publication Hub
We know that a large number of problems in differential equations can be reduced to finding the solution x to an equation of the form Tx=y. The operator T maps a subset of a Banach space X into another Banach space Y and y is a known element of Y. If y=0 and Tx=Ux−x, for another operator U, the equation Tx=y is equivalent to the equation Ux=x. Naturally, to solve Ux=x, we must assume that the range R (U) and the domain D (U) have points in common. Points x for which Ux=x are called fixed points of the operator U. In this work, we state the main fixed-point theorems that are most widely used in the field of differential equations. These are the Banach contraction principle, the Schauder–Tychonoff theorem, and the Leray–Schauder theorem. We will only prove the first theorem and then proceed.
Learning dyadic data and predicting unaccomplished co-occurrent values by mix...Loc Nguyen
Dyadic data which is also called co-occurrence data (COD) contains co-occurrences of objects. Searching for statistical models to represent dyadic data is necessary. Fortunately, finite mixture model is a solid statistical model to learn and make inference on dyadic data because mixture model is built smoothly and reliably by expectation maximization (EM) algorithm which is suitable to inherent spareness of dyadic data. This research summarizes mixture models for dyadic data. When each co-occurrence in dyadic data is associated with a value, there are many unaccomplished values because a lot of co-occurrences are inexistent. In this research, these unaccomplished values are estimated as mean (expectation) of random variable given partial probabilistic distributions inside dyadic mixture model.
International Journal of Mathematics and Statistics Invention (IJMSI) inventionjournals
International Journal of Mathematics and Statistics Invention (IJMSI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJMSI publishes research articles and reviews within the whole field Mathematics and Statistics, new teaching methods, assessment, validation and the impact of new technologies and it will continue to provide information on the latest trends and developments in this ever-expanding subject. The publications of papers are selected through double peer reviewed to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
Low rank tensor approximation of probability density and characteristic funct...Alexander Litvinenko
Very often one has to deal with high-dimensional random variables (RVs). A high-dimensional RV can be described by its probability density (\pdf) and/or by the corresponding probability characteristic functions (\pcf), or by a function representation. Here the interest is mainly to compute characterisations like the entropy, or
relations between two distributions, like their Kullback-Leibler divergence, or more general measures such as $f$-divergences,
among others. These are all computed from the \pdf, which is often not available directly, and it is a computational challenge to even represent it in a numerically feasible fashion in case the dimension $d$ is even moderately large. It is an even stronger numerical challenge to then actually compute said characterisations in the high-dimensional case.
In this regard, in order to achieve a computationally feasible task, we propose to represent the density by a high order tensor product, and approximate this in a low-rank format.
Adversarial Variational Autoencoders to extend and improve generative model -...Loc Nguyen
Generative artificial intelligence (GenAI) has been developing with many incredible achievements like ChatGPT and Bard. Deep generative model (DGM) is a branch of GenAI, which is preeminent in generating raster data such as image and sound due to strong points of deep neural network (DNN) in inference and recognition. The built-in inference mechanism of DNN, which simulates and aims to synaptic plasticity of human neuron network, fosters generation ability of DGM which produces surprised results with support of statistical flexibility. Two popular approaches in DGM are Variational Autoencoders (VAE) and Generative Adversarial Network (GAN). Both VAE and GAN have their own strong points although they share and imply underline theory of statistics as well as incredible complex via hidden layers of DNN when DNN becomes effective encoding/decoding functions without concrete specifications. In this research, I try to unify VAE and GAN into a consistent and consolidated model called Adversarial Variational Autoencoders (AVA) in which VAE and GAN complement each other, for instance, VAE is a good data generator by encoding data via excellent ideology of Kullback-Leibler divergence and GAN is a significantly important method to assess reliability of data which is realistic or fake. In other words, AVA aims to improve accuracy of generative models, besides AVA extends function of simple generative models. In methodology this research focuses on combination of applied mathematical concepts and skillful techniques of computer programming in order to implement and solve complicated problems as simply as possible.
Nghịch dân chủ luận (tổng quan về dân chủ và thể chế chính trị liên quan đến ...Loc Nguyen
Vũ trụ có vật chất và phản vật chất, xã hội có xung đột và hữu hão để phát triển và suy tàn rồi suy tàn và phát triển. Tôi dựa vào đó để biện minh cho một bài viết có tính chất phản động nghịch chuyển thời cuộc nhưng bạn đọc sẽ tự tìm ra ý nghĩa bất ly của các hình thái xã hội. Ngoài ra bài viết này không đi sâu vào nghiên cứu pháp luật, chỉ đưa ra một cách nhìn tổng quan về dân chủ và thể chế chính trị liên quan đến triết học và tôn giáo, mà theo đó đóng góp của bài viết là khái niệm “nương tạm” của tư pháp không thật sự từ bầu cử và cũng không thật sự từ bổ nhiệm.
A Novel Collaborative Filtering Algorithm by Bit Mining Frequent ItemsetsLoc Nguyen
Collaborative filtering (CF) is a popular technique in recommendation study. Concretely, items which are recommended to user are determined by surveying her/his communities. There are two main CF approaches, which are memory-based and model-based. I propose a new CF model-based algorithm by mining frequent itemsets from rating database. Hence items which belong to frequent itemsets are recommended to user. My CF algorithm gives immediate response because the mining task is performed at offline process-mode. I also propose another so-called Roller algorithm for improving the process of mining frequent itemsets. Roller algorithm is implemented by heuristic assumption “The larger the support of an item is, the higher it’s likely that this item will occur in some frequent itemset”. It models upon doing white-wash task, which rolls a roller on a wall in such a way that is capable of picking frequent itemsets. Moreover I provide enhanced techniques such as bit representation, bit matching and bit mining in order to speed up recommendation process. These techniques take advantages of bitwise operations (AND, NOT) so as to reduce storage space and make algorithms run faster.
Simple image deconvolution based on reverse image convolution and backpropaga...Loc Nguyen
Deconvolution task is not important in convolutional neural network (CNN) because it is not imperative to recover convoluted image when convolutional layer is important to extract features. However, the deconvolution task is useful in some cases of inspecting and reflecting a convolutional filter as well as trying to improve a generated image when information loss is not serious with regard to trade-off of information loss and specific features such as edge detection and sharpening. This research proposes a duplicated and reverse process of recovering a filtered image. Firstly, source layer and target layer are reversed in accordance with traditional image convolution so as to train the convolutional filter. Secondly, the trained filter is reversed again to derive a deconvolutional operator for recovering the filtered image. The reverse process is associated with backpropagation algorithm which is most popular in learning neural network. Experimental results show that the proposed technique in this research is better to learn the filters that focus on discovering pixel differences. Therefore, the main contribution of this research is to inspect convolutional filters from data.
Adversarial Variational Autoencoders to extend and improve generative modelLoc Nguyen
Generative artificial intelligence (GenAI) has been developing with many incredible achievements like ChatGPT and Bard. Deep generative model (DGM) is a branch of GenAI, which is preeminent in generating raster data such as image and sound due to strong points of deep neural network (DNN) in inference and recognition. The built-in inference mechanism of DNN, which simulates and aims to synaptic plasticity of human neuron network, fosters generation ability of DGM which produces surprised results with support of statistical flexibility. Two popular approaches in DGM are Variational Autoencoders (VAE) and Generative Adversarial Network (GAN). Both VAE and GAN have their own strong points although they share and imply underline theory of statistics as well as incredible complex via hidden layers of DNN when DNN becomes effective encoding/decoding functions without concrete specifications. In this research, I try to unify VAE and GAN into a consistent and consolidated model called Adversarial Variational Autoencoders (AVA) in which VAE and GAN complement each other, for instance, VAE is good at generator by encoding data via excellent ideology of Kullback-Leibler divergence and GAN is a significantly important method to assess reliability of data which is realistic or fake. In other words, AVA aims to improve accuracy of generative models, besides AVA extends function of simple generative models. In methodology this research focuses on combination of applied mathematical concepts and skillful techniques of computer programming in order to implement and solve complicated problems as simply as possible.
Machine learning forks into three main branches such as supervised learning, unsupervised learning, and reinforcement learning where reinforcement learning is much potential to artificial intelligence (AI) applications because it solves real problems by progressive process in which possible solutions are improved and finetuned continuously. The progressive approach, which reflects ability of adaptation, is appropriate to the real world where most events occur and change continuously and unexpectedly. Moreover, data is getting too huge for supervised learning and unsupervised learning to draw valuable knowledge from such huge data at one time. Bayesian optimization (BO) models an optimization problem as a probabilistic form called surrogate model and then directly maximizes an acquisition function created from such surrogate model in order to maximize implicitly and indirectly the target function for finding out solution of the optimization problem. A popular surrogate model is Gaussian process regression model. The process of maximizing acquisition function is based on updating posterior probability of surrogate model repeatedly, which is improved after every iteration. Taking advantages of acquisition function or utility function is also common in decision theory but the semantic meaning behind BO is that BO solves problems by progressive and adaptive approach via updating surrogate model from a small piece of data at each time, according to ideology of reinforcement learning. Undoubtedly, BO is a reinforcement learning algorithm with many potential applications and thus it is surveyed in this research with attention to its mathematical ideas. Moreover, the solution of optimization problem is important to not only applied mathematics but also AI.
Support vector machine is a powerful machine learning method in data classification. Using it for applied researches is easy but comprehending it for further development requires a lot of efforts. This report is a tutorial on support vector machine with full of mathematical proofs and example, which help researchers to understand it by the fastest way from theory to practice. The report focuses on theory of optimization which is the base of support vector machine.
A Proposal of Two-step Autoregressive ModelLoc Nguyen
Autoregressive (AR) model and conditional autoregressive (CAR) model are specific regressive models in which independent variables and dependent variable imply the same object. They are powerful statistical tools to predict values based on correlation of time domain and space domain, which are useful in epidemiology analysis. In this research, I combine them by the simple way in which AR and CAR is estimated in two separate steps so as to cover time domain and space domain in spatial-temporal data analysis. Moreover, I integrate logistic model into CAR model, which aims to improve competence of autoregressive models.
Extreme bound analysis based on correlation coefficient for optimal regressio...Loc Nguyen
Regression analysis is an important tool in statistical analysis, in which there is a demand of discovering essential independent variables among many other ones, especially in case that there is a huge number of random variables. Extreme bound analysis is a powerful approach to extract such important variables called robust regressors. In this research, a so-called Regressive Expectation Maximization with RObust regressors (REMRO) algorithm is proposed as an alternative method beside other probabilistic methods for analyzing robust variables. By the different ideology from other probabilistic methods, REMRO searches for robust regressors forming optimal regression model and sorts them according to descending ordering given their fitness values determined by two proposed concepts of local correlation and global correlation. Local correlation represents sufficient explanatories to possible regressive models and global correlation reflects independence level and stand-alone capacity of regressors. Moreover, REMRO can resist incomplete data because it applies Regressive Expectation Maximization (REM) algorithm into filling missing values by estimated values based on ideology of expectation maximization (EM) algorithm. From experimental results, REMRO is more accurate for modeling numeric regressors than traditional probabilistic methods like Sala-I-Martin method but REMRO cannot be applied in case of nonnumeric regression model yet in this research.
There are many investment ways such as bank depositing, enterprise business, and stock investment. Bank depositing is a safe and easy way to invest and hence, it is known as reference tool to compare or make decision on other investment methods. Alternately, stock investment is preferred method with good feeling about its preeminence. However, according to mathematical model, stock investment and bank depositing have the same benefit if their growth rate and interest rate are the same. Therefore, I propose a so-called jagged stock investment (JSI) strategy in which the chain of buying stock in the given time interval is modeled as a saw with expectation that JSI strategy gets frequently profitable.
Global optimization is an imperative development of local optimization because there are many problems in artificial intelligence and machine learning requires highly acute solutions over entire domain. There are many methods to resolve the global optimization, which can be classified into three groups such as analytic methods (purely mathematical methods), probabilistic methods, and heuristic methods. Especially, heuristic methods like particle swarm optimization and ant bee colony attract researchers because their effective and practical techniques which are easy to be implemented by computer programming languages. However, these heuristic methods are lacking in theoretical mathematical fundamental. Fortunately, minima distribution establishes a strict mathematical relationship between optimized target function and its global minima. In this research, I try to study minima distribution and apply it into explaining convergence and convergence speed of optimization algorithms. Especially, weak conditions of convergence and monotonicity within minima distribution are drawn so as to be appropriate to practical optimization methods.
This is the chapter 3 "Properties and convergence of EM algorithm" in my book “Tutorial on EM algorithm”, which focuses on mathematical explanation of the convergence of GEM algorithm given by DLR (Dempster, Laird, & Rubin, 1977, pp. 6-9).
Local optimization with convex function is solved perfectly by traditional mathematical methods such as Newton-Raphson and gradient descent but it is not easy to solve the global optimization with arbitrary function although there are some purely mathematical approaches such as approximation, cutting plane, branch and bound, and interval method which can be impractical because of their complexity and high computation cost. Recently, some evolutional algorithms which are inspired from biological activities are proposed to solve the global optimization by acceptable heuristic level. Among them is particle swarm optimization (PSO) algorithm which is proved as an effective and feasible solution for global optimization in real applications. Although the ideology of PSO is not complicated, it derives many variants, which can make new researchers confused. Therefore, this tutorial focuses on describing, systemizing, and classifying PSO by succinct and straightforward way. Moreover, a combination of PSO and another evolutional algorithm as artificial bee colony (ABC) algorithm for improving PSO itself or solving other advanced problems are mentioned too.
Maximum likelihood estimation (MLE) is a popular method for parameter estimation in both applied probability and statistics but MLE cannot solve the problem of incomplete data or hidden data because it is impossible to maximize likelihood function from hidden data. Expectation maximum (EM) algorithm is a powerful mathematical tool for solving this problem if there is a relationship between hidden data and observed data. Such hinting relationship is specified by a mapping from hidden data to observed data or by a joint probability between hidden data and observed data (showing MLE, EM, and practical EM, hidden info implies the hinting relationship).
The essential ideology of EM is to maximize the expectation of likelihood function over observed data based on the hinting relationship instead of maximizing directly the likelihood function of hidden data (showing the full EM with proof along with two steps).
An important application of EM is (finite) mixture model which in turn is developed towards two trends such as infinite mixture model and semiparametric mixture model. Especially, in semiparametric mixture model, component probabilistic density functions are not parameterized. Semiparametric mixture model is interesting and potential for other applications where probabilistic components are not easy to be specified (showing mixture models).
I raise a question that whether it is possible to backward discover semiparametric EM from semiparametric mixture model. I hope that this question will open a new trend or new extension for EM algorithm (showing the question).
Maximum likelihood estimation (MLE) is a popular method for parameter estimation in both applied probability and statistics but MLE cannot solve the problem of incomplete data or hidden data because it is impossible to maximize likelihood function from hidden data. Expectation maximum (EM) algorithm is a powerful mathematical tool for solving this problem if there is a relationship between hidden data and observed data. Such hinting relationship is specified by a mapping from hidden data to observed data or by a joint probability between hidden data and observed data. In other words, the relationship helps us know hidden data by surveying observed data. The essential ideology of EM is to maximize the expectation of likelihood function over observed data based on the hinting relationship instead of maximizing directly the likelihood function of hidden data. Pioneers in EM algorithm proved its convergence. As a result, EM algorithm produces parameter estimators as well as MLE does. This tutorial aims to provide explanations of EM algorithm in order to help researchers comprehend it. Moreover some improvements of EM algorithm are also proposed in the tutorial such as combination of EM and third-order convergence Newton-Raphson process, combination of EM and gradient descent method, and combination of EM and particle swarm optimization (PSO) algorithm.
Richard's aventures in two entangled wonderlandsRichard Gill
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Multi-source connectivity as the driver of solar wind variability in the heli...Sérgio Sacani
The ambient solar wind that flls the heliosphere originates from multiple
sources in the solar corona and is highly structured. It is often described
as high-speed, relatively homogeneous, plasma streams from coronal
holes and slow-speed, highly variable, streams whose source regions are
under debate. A key goal of ESA/NASA’s Solar Orbiter mission is to identify
solar wind sources and understand what drives the complexity seen in the
heliosphere. By combining magnetic feld modelling and spectroscopic
techniques with high-resolution observations and measurements, we show
that the solar wind variability detected in situ by Solar Orbiter in March
2022 is driven by spatio-temporal changes in the magnetic connectivity to
multiple sources in the solar atmosphere. The magnetic feld footpoints
connected to the spacecraft moved from the boundaries of a coronal hole
to one active region (12961) and then across to another region (12957). This
is refected in the in situ measurements, which show the transition from fast
to highly Alfvénic then to slow solar wind that is disrupted by the arrival of
a coronal mass ejection. Our results describe solar wind variability at 0.5 au
but are applicable to near-Earth observatories.
Nutraceutical market, scope and growth: Herbal drug technologyLokesh Patil
As consumer awareness of health and wellness rises, the nutraceutical market—which includes goods like functional meals, drinks, and dietary supplements that provide health advantages beyond basic nutrition—is growing significantly. As healthcare expenses rise, the population ages, and people want natural and preventative health solutions more and more, this industry is increasing quickly. Further driving market expansion are product formulation innovations and the use of cutting-edge technology for customized nutrition. With its worldwide reach, the nutraceutical industry is expected to keep growing and provide significant chances for research and investment in a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
This pdf is about the Schizophrenia.
For more details visit on YouTube; @SELF-EXPLANATORY;
https://www.youtube.com/channel/UCAiarMZDNhe1A3Rnpr_WkzA/videos
Thanks...!
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...Scintica Instrumentation
Intravital microscopy (IVM) is a powerful tool utilized to study cellular behavior over time and space in vivo. Much of our understanding of cell biology has been accomplished using various in vitro and ex vivo methods; however, these studies do not necessarily reflect the natural dynamics of biological processes. Unlike traditional cell culture or fixed tissue imaging, IVM allows for the ultra-fast high-resolution imaging of cellular processes over time and space and were studied in its natural environment. Real-time visualization of biological processes in the context of an intact organism helps maintain physiological relevance and provide insights into the progression of disease, response to treatments or developmental processes.
In this webinar we give an overview of advanced applications of the IVM system in preclinical research. IVIM technology is a provider of all-in-one intravital microscopy systems and solutions optimized for in vivo imaging of live animal models at sub-micron resolution. The system’s unique features and user-friendly software enables researchers to probe fast dynamic biological processes such as immune cell tracking, cell-cell interaction as well as vascularization and tumor metastasis with exceptional detail. This webinar will also give an overview of IVM being utilized in drug development, offering a view into the intricate interaction between drugs/nanoparticles and tissues in vivo and allows for the evaluation of therapeutic intervention in a variety of tissues and organs. This interdisciplinary collaboration continues to drive the advancements of novel therapeutic strategies.
Expectation Maximization Algorithm with Combinatorial Assumption
1. Expectation Maximization Algorithm
with Combinatorial Assumption
Prof. Dr. Loc Nguyen, PostDoc
Loc Nguyen’s Academic Network, Vietnam
Email: ng_phloc@yahoo.com
Homepage: www.locnguyen.net
EM with CA - EcoSta2022 - Loc Nguyen
June 4 2022 1
2. Abstract
The expectation maximization (EM) algorithm is a popular and powerful mathematical method for parameter estimation when there exist both observed data and hidden data. The EM process depends on an implicit relationship between observed data and hidden data, which is specified by a mapping function in traditional EM and by a joint probability density function (PDF) in practical EM. However, the mapping function is vague and impractical, whereas the joint PDF is hard to define because of the heterogeneity between observed data and hidden data. This research aims to improve the competency of EM by making the relationship more feasible and easier to specify, which removes the vagueness. Therefore, the research proposes the assumption that observed data is a combination of hidden data, realized as an analytic function over numerical data points. In other words, observed points are assumed to be calculated from hidden points via a regression model. Mathematical computations and proofs indicate the feasibility and clarity of the proposed method, which can be considered an extension of EM.
3. Table of contents
1. Introduction
2. EM with combinatorial assumption
3. Case of scalar responsor
4. Without combinatorial assumption
5. Conclusions
4. 1. Introduction
The expectation maximization (EM) algorithm, developed by Dempster, Laird, and Rubin, called DLR (Dempster, Laird, & Rubin, 1977), is an extension of the maximum likelihood estimation (MLE) method for when there exist both hidden data X and observed data Y and there is an implicit mapping φ: X → Y such that φ⁻¹(Y) = {X ∈ 𝑿: φ(X) = Y}. Let f(X | Θ) be the probability density function (PDF) of random variable X and let g(Y | Θ) be the PDF of random variable Y, where Θ is the (vector) parameter:

$$g(Y \mid \Theta) = \int_{\varphi^{-1}(Y)} f(X \mid \Theta)\,\mathrm{d}X$$

The conditional PDF of X given Y, denoted k(X | Y, Θ), is specified as follows:

$$k(X \mid Y, \Theta) = \frac{f(X \mid \Theta)}{g(Y \mid \Theta)}$$
5. 1. Introduction
Given a sample 𝒴 = {Y1, Y2,…, YN} whose Yi are mutually independent and identically distributed (iid), EM runs many iterations and each iteration has two steps: the expectation step (E-step) determines the conditional expectation Q(Θ | Θ(t)) and the maximization step (M-step) re-estimates the parameter, as follows:
E-step: The expectation Q(Θ | Θ(t)) is determined based on the current parameter Θ(t) (Nguyen, 2020):

$$Q(\Theta \mid \Theta^{(t)}) = \sum_{i=1}^{N} \int_{\varphi^{-1}(Y_i)} k(X \mid Y_i, \Theta^{(t)}) \log f(X \mid \Theta)\,\mathrm{d}X \quad (1.1)$$

M-step: The next parameter Θ(t+1) is a maximizer of Q(Θ | Θ(t)) with respect to Θ. Note that Θ(t+1) becomes the current parameter at the next ((t+1)th) iteration.
EM converges at some tth iteration; at that time, Θ* = Θ(t+1) = Θ(t) is the optimal estimate of the EM process.
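As a rough illustration (not part of the original slides), the two-step process can be sketched in Python; the callables `e_step` and `m_step` are hypothetical placeholders for whatever a concrete model supplies, such as equation 1.1 and its maximizer:

```python
import numpy as np

def em(Y, theta0, e_step, m_step, max_iter=100, tol=1e-6):
    """Generic EM skeleton. `e_step` computes the expectations that
    Q(. | theta_t) needs; `m_step` returns the maximizer theta_(t+1).
    Both are hypothetical callables supplied by a concrete model."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        stats = e_step(Y, theta)                   # E-step under current theta^(t)
        theta_next = np.asarray(m_step(Y, stats))  # M-step: argmax of Q(. | theta^(t))
        if np.max(np.abs(theta_next - theta)) < tol:  # theta^(t+1) ~ theta^(t): converged
            return theta_next
        theta = theta_next
    return theta
```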
6. 1. Introduction
The expectation Q(Θ | Θ(t)) based on the mapping φ(X) represents the traditional EM proposed by DLR. Because it is too vague to specify the mapping φ(X), practical EM requires the joint PDF of X and Y, denoted f(X, Y | Θ), as a prerequisite for running EM, such that:

$$f(Y \mid \Theta) = \int_{X} f(X, Y \mid \Theta)\,\mathrm{d}X \quad \text{and} \quad f(X \mid Y, \Theta) = \frac{f(X, Y \mid \Theta)}{\int_{X} f(X, Y \mid \Theta)\,\mathrm{d}X}$$

The expectation Q(Θ | Θ(t)) becomes (Nguyen, 2020):

$$Q(\Theta \mid \Theta^{(t)}) = \sum_{i=1}^{N} \int_{X} f(X \mid Y_i, \Theta^{(t)}) \log f(X, Y_i \mid \Theta)\,\mathrm{d}X \quad (1.2)$$
7. 1. Introduction
• If X and Y are too different in context due to data heterogeneity and they are not independent, it is difficult to define the joint PDF f(X, Y | Θ). In general, it is not easy to specify either the mapping φ(X) in traditional EM or the joint PDF f(X, Y | Θ) in practical EM.
• Therefore, the purpose of this research is to extend the competency of EM by assuming that the observed data Y is a combination of X, called the combinatorial assumption (CA). In other words, Y can be calculated by an analytic function of X, which is more feasible than specifying the mapping φ(X) and easier than specifying the joint PDF f(X, Y).
8. 2. EM with combinatorial assumption
Given a sample 𝒴 = {Y1, Y2,…, YN} of size N in which all Yi are iid, suppose X = (x1, x2,…, xn)T, a vector of size n, is distributed normally with mean vector μ and covariance matrix Σ:

$$f(X \mid \mu, \Sigma) = (2\pi)^{-\frac{n}{2}} |\Sigma|^{-\frac{1}{2}} \exp\!\left(-\frac{1}{2}(X-\mu)^T \Sigma^{-1} (X-\mu)\right) \quad (2.1)$$

The n×1 mean μ and the n×n covariance matrix Σ are:

$$\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12}^2 & \cdots & \sigma_{1n}^2 \\ \sigma_{21}^2 & \sigma_2^2 & \cdots & \sigma_{2n}^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1}^2 & \sigma_{n2}^2 & \cdots & \sigma_n^2 \end{pmatrix}$$

Note, σij² is the covariance of xi and xj whereas σi² is the variance of xi. Let Y = (y1, y2,…, ym)T be the random variable representing every sample random variable Yi = (yi1, yi2,…, yim)T. Suppose Y is a combination of partial random variables (components) of X such that:

$$y_i = \alpha_{i0} + \sum_{j=1}^{n} \alpha_{ij} x_j$$
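As a quick sanity check of equation 2.1, the density can be evaluated directly and compared against SciPy's implementation; the numbers below are made-up values for a 2-dimensional hidden variable:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical 2-dimensional hidden variable X ~ N(mu, Sigma), as in equation 2.1.
mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
X = np.array([0.5, 0.5])

# Direct evaluation of equation 2.1 ...
n = len(mu)
d = X - mu
pdf_manual = (2 * np.pi)**(-n / 2) * np.linalg.det(Sigma)**(-0.5) \
             * np.exp(-0.5 * d @ np.linalg.solve(Sigma, d))

# ... agrees with SciPy's implementation.
assert np.isclose(pdf_manual, multivariate_normal(mu, Sigma).pdf(X))
```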
9. 2. EM with combinatorial assumption
As a generalization, the regressive coefficients are gathered into an intercept vector A0 and an m×n regressive matrix A as follows:

$$A_0 = \begin{pmatrix} \alpha_{10} \\ \alpha_{20} \\ \vdots \\ \alpha_{m0} \end{pmatrix}, \quad A = \begin{pmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1n} \\ \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha_{m1} & \alpha_{m2} & \cdots & \alpha_{mn} \end{pmatrix}$$

This implies the so-called vector regressive model:

$$Y = A_0 + AX \quad (2.2)$$

Let Ai be the ith row of A written as a column vector, so that the model holds row by row:

$$A_i = (\alpha_{i1}, \alpha_{i2}, \ldots, \alpha_{in})^T, \quad A = \begin{pmatrix} A_1^T \\ A_2^T \\ \vdots \\ A_m^T \end{pmatrix}, \quad y_i = \alpha_{i0} + A_i^T X$$

The regression function above embodies the combinatorial assumption (CA), in which Y is called the responsor, X is called the regressor, and A is called the regressive matrix. Therefore, the method proposed here is called the CA method or CA algorithm.
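A minimal numeric sketch of the combinatorial assumption (equation 2.2), with made-up sizes m = 2 and n = 3, confirms that the matrix form and the row-wise form y_i = α_i0 + A_i^T X agree:

```python
import numpy as np

# Hypothetical sizes: m = 2 observed components combined from n = 3 hidden ones.
A0 = np.array([0.1, -0.2])              # intercept vector (m x 1)
A = np.array([[1.0, 0.5, 0.0],          # regressive matrix (m x n)
              [0.0, 2.0, 1.0]])
X = np.array([0.3, 0.7, 1.1])           # one hidden point

Y = A0 + A @ X                          # combinatorial assumption, equation 2.2
# The row-wise form y_i = alpha_i0 + A_i^T X gives the same result:
assert np.allclose(Y, [A0[i] + A[i] @ X for i in range(len(A0))])
```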
10. 2. EM with combinatorial assumption
Suppose Y is distributed normally with mean A0 + AX and m×m covariance matrix S:

$$f(Y \mid X, A, S) = (2\pi)^{-\frac{m}{2}} |S|^{-\frac{1}{2}} \exp\!\left(-\frac{1}{2}(Y - A_0 - AX)^T S^{-1} (Y - A_0 - AX)\right) \quad (2.3)$$

The marginal PDF of Y is now defined with the support of the regression model:

$$f(Y \mid \Theta) = \int_{\mathbb{R}^n} f(X, Y \mid \Theta)\,\mathrm{d}X \triangleq \int_{\mathbb{R}^n} f(X \mid \mu, \Sigma)\, f(Y \mid X, A, S)\,\mathrm{d}X$$

where Θ = (μ, Σ, A, S)T is a compound parameter. Consequently, the expectation Q(Θ | Θ(t)) becomes:

$$Q(\Theta \mid \Theta^{(t)}) = \sum_{i=1}^{N} \int_{\mathbb{R}^n} f(X \mid Y_i, \Theta^{(t)}) \log\!\big(f(X \mid \mu, \Sigma)\, f(Y_i \mid X, A, S)\big)\,\mathrm{d}X \quad (2.4)$$

The conditional PDF f(X | Y, Θ) is addressed on the next slide.
11. 2. EM with combinatorial assumption
I approximate the PDF f(X | Y, Θ) as follows:

$$f(X \mid Y, \Theta) = \frac{f(X, Y \mid \Theta)}{f(Y \mid \Theta)} \cong \frac{1}{E\big(f_0(X, Y \mid \Theta)\big)} \exp\!\left(-\frac{1}{2}\Big[(X-\mu)^T \Sigma^{-1} (X-\mu) - 2(Y-A_0)^T S^{-1} A (X-\mu)\Big]\right)$$

where

$$f_0(X, Y \mid \Theta) \cong \exp\!\left(-\frac{1}{2}\Big[(X-\mu)^T \Sigma^{-1} (X-\mu) - 2(Y-A_0)^T S^{-1} A (X-\mu)\Big]\right)$$

Let k(Y | Θ) be the quantity that is constant with respect to X but is a function of Y with parameter Θ:

$$k(Y \mid \Theta) = \frac{1}{E\big(f_0(X, Y \mid \Theta)\big)} \quad (2.5)$$

The conditional PDF f(X | Y, Θ(t)) is specified (approximated) at the E-step of some tth iteration as follows:

$$f(X \mid Y, \Theta^{(t)}) \triangleq k(Y \mid \Theta^{(t)}) \exp\!\left(-\frac{1}{2}\Big[(X-\mu)^T \Sigma^{-1} (X-\mu) - 2(Y-A_0)^T S^{-1} A (X-\mu)\Big]\right) \quad (2.6)$$

At the M-step of the current tth iteration, Q(Θ | Θ(t)) is maximized by setting its partial derivatives with respect to Θ to zero, as detailed on the next slides.
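Assuming the reconstruction of equation 2.6 above, its exponent can be evaluated directly; this sketch computes the E-step density only up to the factor k(Y | Θ), and the function name `log_f0` is my own:

```python
import numpy as np

def log_f0(X, Y, mu, Sigma, A0, A, S):
    """Log of the unnormalized conditional density of equation 2.6:
    -(1/2) [ (X-mu)^T Sigma^{-1} (X-mu) - 2 (Y-A0)^T S^{-1} A (X-mu) ].
    A sketch for evaluating f(X | Y, Theta) up to the factor k(Y | Theta)."""
    d = X - mu
    quad = d @ np.linalg.solve(Sigma, d)           # (X-mu)^T Sigma^{-1} (X-mu)
    cross = (Y - A0) @ np.linalg.solve(S, A @ d)   # (Y-A0)^T S^{-1} A (X-mu)
    return -0.5 * (quad - 2.0 * cross)
```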
12. 2. EM with combinatorial assumption
The next parameter μ(t+1) at the M-step of some tth iteration that maximizes Q(Θ | Θ(t)) is the solution of the equation ∂Q(Θ | Θ(t))/∂μ = 0T, as follows:

$$\mu^{(t+1)} = \mu^{(t)} + \frac{1}{N}\sum_{i=1}^{N} E\big(X \mid Y_i, \Theta^{(t)}\big) \quad (2.7)$$

Similarly, the next parameter Σ(t+1) is the solution of the equation ∂Q(Θ | Θ(t))/∂Σ = 0, as follows:

$$\Sigma^{(t+1)} = \frac{1}{N}\sum_{i=1}^{N} E\big(XX^T \mid Y_i, \Theta^{(t)}\big) \quad (2.8)$$

The next parameter A0(t+1) is the solution of the equation ∂Q(Θ | Θ(t))/∂A0 = 0, as follows:

$$A_0^{(t+1)} = \frac{1}{N}\left(\sum_{i=1}^{N} Y_i - N A^{(t)} \mu^{(t)} - N A^{(t)}\big(\mu^{(t+1)} - \mu^{(t)}\big)\right) \quad (2.9)$$
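A sketch of the updates 2.7, 2.8, and 2.9 in Python, assuming the posterior moments E(X | Yi, Θ(t)) and E(XX^T | Yi, Θ(t)) have already been obtained in the E-step (they are passed in as arrays rather than computed here):

```python
import numpy as np

def ca_mstep_mu_sigma_a0(Y, EX, EXX, mu_t, A_t):
    """Sketch of M-step updates 2.7-2.9. Y is an (N, m) array of
    observations; EX is (N, n) holding E(X|Y_i, theta_t); EXX is
    (N, n, n) holding E(X X^T|Y_i, theta_t)."""
    N = len(Y)
    mu_next = mu_t + np.mean(EX, axis=0)          # equation 2.7
    Sigma_next = np.mean(EXX, axis=0)             # equation 2.8
    A0_next = (np.sum(Y, axis=0)
               - N * A_t @ mu_t
               - N * A_t @ (mu_next - mu_t)) / N  # equation 2.9
    return mu_next, Sigma_next, A0_next
```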
13. 2. EM with combinatorial assumption
Similarly, by setting ∂Q(Θ | Θ(t))/∂A = 0, the next parameter A(t+1) is the solution of n+1 equations, as follows:

$$A^{(t+1)} \triangleq \begin{cases} A\big(\mu^{(t)} + E(X \mid Y_i, \Theta^{(t)})\big) = \dfrac{1}{N}\displaystyle\sum_{i=1}^{N}\big(Y_i - A_0\big) \\[2ex] A\,E\big(x_j X \mid Y_i, \Theta^{(t)}\big) = \dfrac{1}{N}\displaystyle\sum_{i=1}^{N}\big(Y_i - A_0\big)\Big(E\big(x_j \mid Y_i, \Theta^{(t)}\big) + \mu_j^{(t)}\Big), \quad j = 1, \ldots, n \end{cases} \quad (2.10)$$

The next parameter S(t+1) is the solution of the equation ∂Q(Θ | Θ(t))/∂S = 0, as follows:

$$S^{(t+1)} = \frac{1}{N}\Bigg(A^{(t)} \mu\mu^T \big(A^{(t)}\big)^T + \sum_{i=1}^{N}\big(Y_i - A_0^{(t)} - 2A^{(t)}\mu\big)\big(Y_i - A_0^{(t)}\big)^T + \sum_{i=1}^{N}\Big(-2A^{(t)} E\big(X \mid Y_i, \Theta^{(t)}\big)\Big(\big(Y_i - A_0^{(t)}\big)^T - \mu^T \big(A^{(t)}\big)^T\Big) + A^{(t)} E\big(XX^T \mid Y_i, \Theta^{(t)}\big)\big(A^{(t)}\big)^T\Big)\Bigg) \quad (2.11)$$
14. 2. EM with combinatorial assumption
In general, the CA method is an EM process with two steps as follows:
E-step: Determine the conditional PDF f(X | Y, Θ(t)) specified by equation 2.6 based on the current parameter Θ(t):

$$f(X \mid Y, \Theta^{(t)}) \triangleq k(Y \mid \Theta^{(t)}) \exp\!\left(-\frac{1}{2}\Big[(X-\mu)^T \Sigma^{-1} (X-\mu) - 2(Y-A_0)^T S^{-1} A (X-\mu)\Big]\right)$$

M-step: Calculate the next parameters Θ(t+1) = (μ(t+1), Σ(t+1), A(t+1), S(t+1))T based on f(X | Y, Θ(t)) determined in the E-step, as specified by equations 2.7, 2.8, 2.9, 2.10, and 2.11:

$$\mu^{(t+1)} = \mu^{(t)} + \frac{1}{N}\sum_{i=1}^{N} E\big(X \mid Y_i, \Theta^{(t)}\big), \quad \Sigma^{(t+1)} = \frac{1}{N}\sum_{i=1}^{N} E\big(XX^T \mid Y_i, \Theta^{(t)}\big)$$

$$A_0^{(t+1)} = \frac{1}{N}\left(\sum_{i=1}^{N} Y_i - N A^{(t)} \mu^{(t)} - N A^{(t)}\big(\mu^{(t+1)} - \mu^{(t)}\big)\right)$$

$$A^{(t+1)} \triangleq \begin{cases} A\big(\mu^{(t)} + E(X \mid Y_i, \Theta^{(t)})\big) = \dfrac{1}{N}\displaystyle\sum_{i=1}^{N}\big(Y_i - A_0\big) \\[2ex] A\,E\big(x_j X \mid Y_i, \Theta^{(t)}\big) = \dfrac{1}{N}\displaystyle\sum_{i=1}^{N}\big(Y_i - A_0\big)\Big(E\big(x_j \mid Y_i, \Theta^{(t)}\big) + \mu_j^{(t)}\Big), \quad j = 1, \ldots, n \end{cases}$$

$$S^{(t+1)} = \frac{1}{N}\Bigg(A^{(t)} \mu\mu^T \big(A^{(t)}\big)^T + \sum_{i=1}^{N}\big(Y_i - A_0^{(t)} - 2A^{(t)}\mu\big)\big(Y_i - A_0^{(t)}\big)^T + \sum_{i=1}^{N}\Big(-2A^{(t)} E\big(X \mid Y_i, \Theta^{(t)}\big)\Big(\big(Y_i - A_0^{(t)}\big)^T - \mu^T \big(A^{(t)}\big)^T\Big) + A^{(t)} E\big(XX^T \mid Y_i, \Theta^{(t)}\big)\big(A^{(t)}\big)^T\Big)\Bigg)$$
15. 3. Case of scalar responsor
Given a sample 𝒴 = {Y1, Y2,…, YN} of size N in which all Yi are iid, equations 2.7, 2.8, 2.9, 2.10, and 2.11 for estimating the CA model become complicated although they are general. Here I simplify the CA model when every random vector variable Yi degrades into a random scalar variable yi, so that the conditional expectation Q(Θ | Θ(t)) becomes:

$$Q(\Theta \mid \Theta^{(t)}) = \sum_{i=1}^{N} \int_{\mathbb{R}^n} f(X \mid y_i, \Theta^{(t)}) \log\!\big(f(X \mid \mu, \Sigma)\, f(y_i \mid X, \alpha, \sigma^2)\big)\,\mathrm{d}X \quad (3.1)$$

The PDF f(X | μ, Σ) specified by equation 2.1 is unchanged, whereas the PDF f(y | X, α, σ²) is specified as follows:

$$f(y \mid X, \alpha, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{\big(y - \alpha_0 - \alpha^T X\big)^2}{2\sigma^2}\right) \quad (3.2)$$

The vector regressive model becomes the scalar regressive model:

$$y = \alpha_0 + \sum_{j=1}^{n} \alpha_j x_j = \alpha_0 + \alpha^T X \quad (3.3)$$

where the regressive coefficients are the full vector (α0, α1,…, αn)T, with the slope vector:

$$\alpha = (\alpha_1, \ldots, \alpha_n)^T \quad (3.4)$$
16. 3. Case of scalar responsor
I approximate the PDF f(X | y, Θ) as follows:

$$f(X \mid y, \Theta) = \frac{f(X, y \mid \Theta)}{f(y \mid \Theta)} \cong (2\pi)^{-\frac{n}{2}} |\Sigma|^{-\frac{1}{2}} \exp\!\left(-\frac{(y-\alpha_0)^2\, \alpha^T \Sigma \alpha}{2(\sigma^2)^2}\right) \exp\!\left(-\frac{1}{2}(X-\mu)^T \Sigma^{-1} (X-\mu) + \frac{(y-\alpha_0)\,\alpha^T (X-\mu)}{\sigma^2}\right)$$

Let k(y | Θ) be the quantity that is constant with respect to X but is a function of y with parameter Θ, defined as follows:

$$k(y \mid \Theta) = (2\pi)^{-\frac{n}{2}} |\Sigma|^{-\frac{1}{2}} \exp\!\left(-\frac{(y-\alpha_0)^2\, \alpha^T \Sigma \alpha}{2(\sigma^2)^2}\right) \quad (3.5)$$

The conditional PDF f(X | y, Θ) is specified at the E-step of some tth iteration as follows:

$$f(X \mid y, \Theta^{(t)}) \triangleq k(y \mid \Theta^{(t)}) \exp\!\left(-\frac{1}{2}\big(X-\mu^{(t)}\big)^T \big(\Sigma^{(t)}\big)^{-1} \big(X-\mu^{(t)}\big) + \frac{\big(y-\alpha_0^{(t)}\big)\big(\alpha^{(t)}\big)^T \big(X-\mu^{(t)}\big)}{(\sigma^2)^{(t)}}\right) \quad (3.6)$$

At the M-step of the current tth iteration, Q(Θ | Θ(t)) is maximized by setting its partial derivatives with respect to Θ to zero, as detailed on the next slides.
17. 3. Case of scalar responsor
The next parameter μ(t+1) is the solution of the equation ∂Q(Θ | Θ(t))/∂μ = 0T, as follows:

$$\mu^{(t+1)} = \mu^{(t)} + \sum_{i=1}^{N} \frac{\delta\big(y_i \mid \Theta^{(t)}\big)\big(y_i - \alpha_0^{(t)}\big)}{N (\sigma^2)^{(t)}}\, \Sigma^{(t)} \alpha^{(t)} \quad (3.7)$$

$$\delta(y \mid \Theta) = (2\pi)^{\frac{n}{2}} |\Sigma|^{\frac{1}{2}}\, k(y \mid \Theta) \exp\!\left(\frac{1}{2}\left(\frac{y-\alpha_0}{\sigma^2}\right)^{2} \alpha^T \Sigma^{(t)} \alpha\right) \quad (3.8)$$

The next parameter Σ(t+1) is the solution of the equation ∂Q(Θ | Θ(t))/∂Σ = 0, as follows:

$$\Sigma^{(t+1)} = \frac{1}{N}\sum_{i=1}^{N} \delta\big(y_i \mid \Theta^{(t)}\big)\left(\Sigma^{(t)} + \left(\frac{y_i - \alpha_0^{(t)}}{(\sigma^2)^{(t)}}\right)^{2} \Sigma^{(t)} \alpha^{(t)} \big(\alpha^{(t)}\big)^T \Sigma^{(t)}\right) \quad (3.9)$$

The next parameter α0(t+1) is the solution of the equation ∂Q(Θ | Θ(t))/∂α0 = 0, as follows:

$$\alpha_0^{(t+1)} = \frac{1}{N}\left(\sum_{i=1}^{N}\Big(y_i - \big(\alpha^{(t)}\big)^T \mu^{(t)}\Big) - N \big(\alpha^{(t)}\big)^T \big(\mu^{(t+1)} - \mu^{(t)}\big)\right) \quad (3.10)$$
18. 3. Case of scalar responsor
The next parameter α(t+1) is the solution of the equation ∂Q(Θ | Θ(t))/∂α = 0T, as follows:

$$\alpha^{(t+1)} = \Big(N \mu^{(t)} \big(\mu^{(t)}\big)^T + N \Sigma^{(t+1)} + 2N \mu^{(t)} \big(\mu^{(t+1)} - \mu^{(t)}\big)^T\Big)^{-1} \left(\mu^{(t)} \sum_{i=1}^{N}\big(y_i - \alpha_0^{(t)}\big) + \sum_{i=1}^{N}\big(y_i - \alpha_0^{(t)}\big)\, m^{(t+1)}\big(y_i \mid \Theta^{(t)}\big)\right) \quad (3.11)$$

$$m^{(t+1)}\big(y_i \mid \Theta^{(t)}\big) = \frac{\big(y_i - \alpha_0^{(t)}\big)\,\delta\big(y_i \mid \Theta^{(t)}\big)}{(\sigma^2)^{(t)}}\, \Sigma^{(t)} \alpha^{(t)} \quad (3.12)$$

The next parameter (σ²)(t+1) is the solution of the equation ∂Q(Θ | Θ(t))/∂(σ²) = 0, as follows:

$$(\sigma^2)^{(t+1)} = \Big(\big(\alpha^{(t)}\big)^T \mu^{(t)}\Big)^2 + \frac{1}{N}\sum_{i=1}^{N}\big(y_i - \alpha_0^{(t)}\big)^2 - \frac{2\big(\alpha^{(t)}\big)^T \mu^{(t)}}{N}\sum_{i=1}^{N}\big(y_i - \alpha_0^{(t)}\big) - \frac{2\big(\alpha^{(t)}\big)^T}{N}\sum_{i=1}^{N}\big(y_i - \alpha_0^{(t)}\big)\, m^{(t+1)}\big(y_i \mid \Theta^{(t)}\big) \quad (3.13)$$
19. 3. Case of scalar responsor
In general, the CA method in the case of a scalar responsor is an EM process with two steps as follows:
E-step: Calculate the quantities δ(yi | Θ(t)) specified by equation 3.8 based on the current parameter Θ(t):

$$\delta\big(y_i \mid \Theta^{(t)}\big) = (2\pi)^{\frac{n}{2}} \big|\Sigma^{(t)}\big|^{\frac{1}{2}}\, k\big(y_i \mid \Theta^{(t)}\big) \exp\!\left(\frac{1}{2}\left(\frac{y_i - \alpha_0^{(t)}}{(\sigma^2)^{(t)}}\right)^{2} \big(\alpha^{(t)}\big)^T \Sigma^{(t)} \alpha^{(t)}\right)$$

M-step: Calculate the next parameters Θ(t+1) = (μ(t+1), Σ(t+1), α(t+1), (σ²)(t+1))T based on the quantities δ(yi | Θ(t)) calculated in the E-step, as specified by equations 3.7, 3.9, 3.10, 3.11, and 3.13:

$$\mu^{(t+1)} = \mu^{(t)} + \sum_{i=1}^{N} \frac{\delta\big(y_i \mid \Theta^{(t)}\big)\big(y_i - \alpha_0^{(t)}\big)}{N (\sigma^2)^{(t)}}\, \Sigma^{(t)} \alpha^{(t)}$$

$$\Sigma^{(t+1)} = \frac{1}{N}\sum_{i=1}^{N} \delta\big(y_i \mid \Theta^{(t)}\big)\left(\Sigma^{(t)} + \left(\frac{y_i - \alpha_0^{(t)}}{(\sigma^2)^{(t)}}\right)^{2} \Sigma^{(t)} \alpha^{(t)} \big(\alpha^{(t)}\big)^T \Sigma^{(t)}\right)$$

$$\alpha_0^{(t+1)} = \frac{1}{N}\left(\sum_{i=1}^{N}\Big(y_i - \big(\alpha^{(t)}\big)^T \mu^{(t)}\Big) - N \big(\alpha^{(t)}\big)^T \big(\mu^{(t+1)} - \mu^{(t)}\big)\right)$$

$$\alpha^{(t+1)} = \Big(N \mu^{(t)} \big(\mu^{(t)}\big)^T + N \Sigma^{(t+1)} + 2N \mu^{(t)} \big(\mu^{(t+1)} - \mu^{(t)}\big)^T\Big)^{-1} \left(\mu^{(t)} \sum_{i=1}^{N}\big(y_i - \alpha_0^{(t)}\big) + \sum_{i=1}^{N}\big(y_i - \alpha_0^{(t)}\big)\, m^{(t+1)}\big(y_i \mid \Theta^{(t)}\big)\right)$$

$$(\sigma^2)^{(t+1)} = \Big(\big(\alpha^{(t)}\big)^T \mu^{(t)}\Big)^2 + \frac{1}{N}\sum_{i=1}^{N}\big(y_i - \alpha_0^{(t)}\big)^2 - \frac{2\big(\alpha^{(t)}\big)^T \mu^{(t)}}{N}\sum_{i=1}^{N}\big(y_i - \alpha_0^{(t)}\big) - \frac{2\big(\alpha^{(t)}\big)^T}{N}\sum_{i=1}^{N}\big(y_i - \alpha_0^{(t)}\big)\, m^{(t+1)}\big(y_i \mid \Theta^{(t)}\big)$$

where m(t+1)(yi | Θ(t)) is given by equation 3.12. In short, the estimation equations in the case of a scalar responsor are simpler and easier to compute than those of the general case.
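The scalar-responsor M-step lends itself to a compact sketch; here the δ(yi | Θ(t)) of equation 3.8 are assumed to be precomputed in the E-step, and only the updates 3.7, 3.9, and 3.10 are shown:

```python
import numpy as np

def ca_scalar_mstep(y, delta, mu_t, Sigma_t, alpha_t, alpha0_t, sigma2_t):
    """Sketch of the scalar-responsor M-step (equations 3.7, 3.9, 3.10),
    assuming the scalars delta[i] = delta(y_i | theta_t) of equation 3.8
    were computed in the E-step."""
    N = len(y)
    r = y - alpha0_t                         # residuals y_i - alpha0^(t)
    Sa = Sigma_t @ alpha_t                   # shared term Sigma^(t) alpha^(t)
    # Equation 3.7: mean update.
    mu_next = mu_t + (np.sum(delta * r) / (N * sigma2_t)) * Sa
    # Equation 3.9: covariance update.
    Sigma_next = np.zeros_like(Sigma_t)
    for i in range(N):
        Sigma_next += delta[i] * (Sigma_t + (r[i] / sigma2_t)**2 * np.outer(Sa, Sa))
    Sigma_next /= N
    # Equation 3.10: intercept update.
    alpha0_next = (np.sum(y - alpha_t @ mu_t)
                   - N * alpha_t @ (mu_next - mu_t)) / N
    return mu_next, Sigma_next, alpha0_next
```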
20. 4. Without combinatorial assumption
The combinatorial assumption (CA) is not always feasible when there is no clear relationship between observed data Y and hidden data X. In the most general case, when the regressive model is not supported, we need to specify the joint PDF f(X, Y | Θ) such that:

$$Q(\Theta \mid \Theta^{(t)}) = \sum_{i=1}^{N} \int_{X} f(X \mid Y_i, \Theta^{(t)}) \log f(X, Y_i \mid \Theta)\,\mathrm{d}X$$

Given that random variable Y represents every random variable Yi, suppose f(X, Y | Θ) is distributed normally as a multinormal distribution. Assuming that X and Y are mutually independent would be unrealistic. Thus, given that X = (x1, x2,…, xn)T is an n-dimensional vector and Y = (y1, y2,…, ym)T is an m-dimensional vector, let Z be the composite random variable

$$Z = \begin{pmatrix} X \\ Y \end{pmatrix} = (z_1, z_2, \ldots, z_n, z_{n+1}, \ldots, z_{n+m})^T$$

an (n+m)-dimensional vector where zi = xi if 1 ≤ i ≤ n and zn+j = yj if 1 ≤ j ≤ m.
21. 4. Without combinatorial assumption
Hence f(X, Y | Θ) can be denoted f(Z | Θ) with mean μ and covariance matrix Σ:

$$f(X, Y \mid \mu, \Sigma) = f\!\left(\begin{pmatrix} X \\ Y \end{pmatrix} \middle|\; \mu, \Sigma\right) = f(Z \mid \mu, \Sigma) = (2\pi)^{-\frac{n+m}{2}} |\Sigma|^{-\frac{1}{2}} \exp\!\left(-\frac{1}{2}\left(\begin{pmatrix} X \\ Y \end{pmatrix} - \mu\right)^{\!T} \Sigma^{-1} \left(\begin{pmatrix} X \\ Y \end{pmatrix} - \mu\right)\right) \quad (4.1)$$

So we have Θ = (μ, Σ)T, partitioned such that (Härdle & Simar, 2013, p. 156):

$$\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \quad (4.2)$$

Indices "1" and "2" refer to X and Y. The conditional PDF f(X | Y, Θ) is specified as (Härdle & Simar, 2013, p. 157):

$$f(X \mid Y, \Theta) = (2\pi)^{-\frac{n}{2}} \big|\Sigma_{1|2}\big|^{-\frac{1}{2}} \exp\!\left(-\frac{1}{2}\big(X - \mu_{1|2}(Y)\big)^T \Sigma_{1|2}^{-1} \big(X - \mu_{1|2}(Y)\big)\right) \quad (4.3)$$

where μ1|2(Y) is the conditional mean of X given Y, specified as follows (Härdle & Simar, 2013, p. 157):

$$\mu_{1|2}(Y) = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (Y - \mu_2) \quad (4.4)$$

The conditional mean μ1|2(Y) is the core of EM without CA. The conditional covariance matrix of X given Y is specified as follows (Härdle & Simar, 2013, p. 156):

$$\Sigma_{1|2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \quad (4.5)$$
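Equations 4.4 and 4.5 are the standard conditional-Gaussian formulas, so they can be coded directly; this sketch partitions μ and Σ exactly as in equation 4.2:

```python
import numpy as np

def conditional_gaussian(mu, Sigma, Y, n):
    """Conditional mean and covariance of X given Y (equations 4.4, 4.5)
    for Z = (X, Y) ~ N(mu, Sigma), with X occupying the first n coordinates."""
    mu1, mu2 = mu[:n], mu[n:]
    S11, S12 = Sigma[:n, :n], Sigma[:n, n:]
    S21, S22 = Sigma[n:, :n], Sigma[n:, n:]
    mu_cond = mu1 + S12 @ np.linalg.solve(S22, Y - mu2)   # equation 4.4
    Sigma_cond = S11 - S12 @ np.linalg.solve(S22, S21)    # equation 4.5
    return mu_cond, Sigma_cond
```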
22. 4. Without combinatorial assumption
When both f(X, Y | Θ) and f(X | Y, Θ) are specified, the expectation Q(Θ | Θ(t)) is totally determined. At the M-step of the current tth iteration, Q(Θ | Θ(t)) is maximized by setting its partial derivatives with respect to Θ = (μ, Σ)T to zero. The next parameter μ(t+1) is the solution of the equation ∂Q(Θ | Θ(t))/∂μ = 0T, as follows:

$$\mu^{(t+1)} = \frac{1}{N} \begin{pmatrix} \displaystyle\sum_{i=1}^{N} \mu_{1|2}^{(t)}(Y_i) \\[2ex] \displaystyle\sum_{i=1}^{N} Y_i \end{pmatrix} \quad (4.6)$$

Note, μ1|2(t)(Yi) denotes the conditional mean μ1|2(Yi) at the tth iteration. The next parameter Σ(t+1) is the solution of the equation ∂Q(Θ | Θ(t))/∂Σ = 0, as follows:

$$\Sigma^{(t+1)} = \mu^{(t)} \big(\mu^{(t)}\big)^T - \frac{2\mu^{(t)}}{N} \begin{pmatrix} \displaystyle\sum_{i=1}^{N} \mu_{1|2}^{(t)}(Y_i) \\[2ex] \displaystyle\sum_{i=1}^{N} Y_i \end{pmatrix}^{\!T} + \frac{1}{N}\sum_{i=1}^{N} \begin{pmatrix} \Sigma_{1|2} + \mu_{1|2}^{(t)}(Y_i)\,\mu_{1|2}^{(t)}(Y_i)^T & \mu_{1|2}^{(t)}(Y_i)\, Y_i^T \\ Y_i\, \mu_{1|2}^{(t)}(Y_i)^T & Y_i Y_i^T \end{pmatrix} \quad (4.7)$$
23. 4. Without combinatorial assumption
In general, if there is no relationship between observed data and hidden data, the EM process without CA has two steps as follows:
E-step: Determine the conditional means μ1|2(Yi) specified by equation 4.4 based on the current parameter Θ(t):

$$\mu_{1|2}(Y_i) = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (Y_i - \mu_2)$$

M-step: Calculate the next parameters Θ(t+1) = (μ(t+1), Σ(t+1))T based on μ1|2(Yi) determined in the E-step, as specified by equations 4.6 and 4.7.
Therefore, the estimation equations for EM without CA are much simpler, but the method is not as precise as the CA method when there is a clear regressive relationship between observed data and hidden data. Moreover, it is not easy to specify the joint PDF f(X, Y | Θ) in the case of data heterogeneity.
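A sketch of one iteration of EM without CA, limited to the conditional means of equation 4.4 and the mean update of equation 4.6; the covariance update of equation 4.7 would reuse the same conditional moments:

```python
import numpy as np

def em_no_ca_iteration(Ys, mu, Sigma, n):
    """One iteration of EM without CA. E-step: conditional means
    mu_{1|2}(Y_i) of equation 4.4. M-step: mean update of equation 4.6,
    stacking averaged conditional means over averaged observations."""
    mu1, mu2 = mu[:n], mu[n:]
    S12, S22 = Sigma[:n, n:], Sigma[n:, n:]
    # E-step: conditional mean of the hidden part given each observation.
    cond_means = [mu1 + S12 @ np.linalg.solve(S22, Y - mu2) for Y in Ys]
    # M-step (equation 4.6).
    mu_next = np.concatenate([np.mean(cond_means, axis=0),
                              np.mean(Ys, axis=0)])
    return mu_next, cond_means
```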
24. 5. Discussions and conclusions
Although the combinatorial assumption (CA) is a subjective assumption, it attains high usefulness and effectiveness whenever there is a strong regressive relationship between observed data and hidden data. The regression function may not be linear, but in some cases it is easy to convert nonlinear functions into a linear function, as follows:
• Logarithm function: $y = \alpha_0 + \sum_{j=1}^{n} \alpha_j \log x_j$
• Exponent function: $y = \exp\big(\alpha_0 + \sum_{j=1}^{n} \alpha_j x_j\big)$
• Product function: $y = \alpha_0 \prod_{j=1}^{n} x_j^{\alpha_j}$
Given the logarithm function, the transformation is y = α0 + αT U with uj = log(xj). For the exponent function, the transformation is v = α0 + αT X with v = log(y). For the product function, the transformation is v = β0 + αT U with v = log(y), uj = log(xj), and β0 = log(α0). The product case is illustrated numerically below.
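These linearizing transformations can be exercised numerically; the sketch below uses made-up data for the product-function case and recovers the coefficients by ordinary least squares on the log-transformed variables:

```python
import numpy as np

# Hypothetical data for the product case y = a0 * prod(x_j^a_j): taking logs
# yields the linear form v = b0 + a^T u with v = log(y), u_j = log(x_j),
# and b0 = log(a0), so ordinary least squares applies.
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 2.0, size=(200, 2))          # positive regressors
a0, a = 1.5, np.array([0.8, -0.3])
y = a0 * np.prod(X**a, axis=1)

U = np.log(X)
v = np.log(y)
design = np.column_stack([np.ones(len(v)), U])    # intercept column + log-regressors
coef, *_ = np.linalg.lstsq(design, v, rcond=None)
b0_hat, a_hat = coef[0], coef[1:]
print(np.exp(b0_hat), a_hat)                      # recovers ~a0 and ~a
```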
25. 5. Discussions and conclusions
Recall that the dimensions of X and Y are n and m, respectively. If m ≥ n, there is no information loss, which is the ideal case of the CA method. Otherwise, if m < n, important parameters such as μ and A vary with large amplitude because of the information loss of dimension reduction. This problem is called large-scale variation. Therefore, in practice there should be restrictions on μ and A, specified as constraints u(μ) ≤ 0 and v(A) ≤ 0; for example, one constraint is that the slope of the regressive hyperplane, specified by the normal vectors −Ai, varies within a predefined interval. The solution is to apply the Lagrange duality method to maximizing Q(Θ | Θ(t)), in which a Lagrange function l(Θ, κ, λ | Θ(t)) is defined as the sum of Q(Θ | Θ(t)) and the constraints u(μ) ≤ 0 and v(A) ≤ 0:

$$l(\Theta, \kappa, \lambda \mid \Theta^{(t)}) = Q(\Theta \mid \Theta^{(t)}) + \kappa\, u(\mu) + \lambda\, v(A)$$

Note that Q(Θ | Θ(t)) is maximized indirectly by maximizing the Lagrange function l(Θ, κ, λ | Θ(t)), where κ ≥ 0 and λ ≥ 0 are the Lagrange multipliers. Another simple trick to alleviate the large-scale variation of μ and A is to initialize appropriate values μ(0) and A(0) at the first iteration of the EM process.
26. 5. Discussions and conclusions
An application of CA is learning a Bayesian parameter. According to Bayesian statistics, the parameter Θ is considered a random variable. Given a sample D = {X1, X2,…, XN} whose observations Xi are iid, the posterior probability of Θ given D, denoted P(Θ | D, ξ), is calculated based on D and the prior probability of Θ, denoted P(Θ | ξ):

$$P(\Theta \mid D, \xi) = \frac{P(D \mid \Theta, \xi)\, P(\Theta \mid \xi)}{P(D \mid \xi)}$$

where ξ denotes background knowledge about Θ, which can be considered a sub-parameter, although we do not focus on learning ξ. Let X be the random variable representing all Xi. The most popular method to learn a Bayesian parameter is to use a binomial sample and the beta distribution. Here the CA method considers the parameter Θ as hidden data and the random variable X as observed data:

$$X = \alpha_0 + \alpha^T \Theta$$

Therefore, the posterior probability P(Θ | D, ξ) corresponds to the conditional PDF f(X | y, Θ) mentioned earlier. After the EM process finishes, f(X | y, Θ), and hence P(Θ | D, ξ), is obviously determined.
27. 5. Discussions and conclusions
In general, CA is an extension of EM whose strong points are feasibility and simplicity together with the clearness of its mathematical formulas. Especially, in the ideal case that the dimension of Y (y) is larger than or equal to the dimension of X, the CA method produces the best estimates because there is no information loss. However, its drawback is the large-scale variation in determining the important parameters μ and A when the dimension of Y (y) is smaller than the dimension of X. Therefore, in the future I will research the Lagrange duality method to maximize the expectation Q(Θ | Θ(t)) with constraints on μ and A.
28. References
1. Barazandeh, B., Ghafelebashi, A., Razaviyayn, M., & Sriharsha, R. (2021, May 12). Efficient Algorithms for Estimating the Parameters of Mixed Linear Regression Models. arXiv. Retrieved from https://arxiv.org/pdf/2105.05953
2. Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. (M. Stone, Ed.) Journal of the Royal Statistical Society, Series B (Methodological), 39(1), 1-38.
3. Faria, S., & Soromenho, G. (2009, February 11). Fitting mixtures of linear regressions. Journal of Statistical Computation and Simulation, 80(2), 201-225. doi:10.1080/00949650802590261
4. Griffith, D. A., & Smith, A. (2010). Some simplifications for the expectation-maximization (EM) algorithm: the linear regression model case. InterStat, 23. doi:10.1.1.613.8033
5. Härdle, W., & Simar, L. (2013). Applied Multivariate Statistical Analysis. Berlin, Germany: Research Data Center, School of Business and Economics, Humboldt University.
6. Kokic, P. (2002). The EM Algorithm for a Multivariate Regression Model: including its applications to a non-parametric regression model and a multivariate time series model. University of York, Department of Computer Science. York: University of York. Retrieved from https://www.cs.york.ac.uk/euredit/_temp/The%20Euredit%20Software/NAG%20Prototype%20platform/WorkingPaper4.pdf
7. Mathprof. (2013, March 22). Multidimensional Gaussian integral. (PlanetMath) Retrieved November 23, 2021, from PlanetMath: https://planetmath.org/multidimensionalgaussianintegral
8. Nguyen, L. (2020). Tutorial on EM algorithm. MDPI. Preprints. doi:10.20944/preprints201802.0131.v8
9. Saliba, G. (2016, July 21). Not understanding derivative of a matrix-matrix product. (Stack Overflow) Retrieved December 06, 2021, from Mathematics Stack Exchange: https://math.stackexchange.com/questions/1866757/not-understanding-derivative-of-a-matrix-matrix-product
10. Wikipedia. (2021, December 3). List of integrals of exponential functions. (Wikimedia Foundation) Retrieved December 6, 2021, from Wikipedia website: https://en.wikipedia.org/wiki/List_of_integrals_of_exponential_functions
11. Zhang, X., Deng, J., & Su, R. (2016). The EM algorithm for a linear regression model with application to a diabetes data. 2016 International Conference on Progress in Informatics and Computing (PIC). Shanghai: IEEE. doi:10.1109/PIC.2016.7949477
12. Zhang, Z., & Rockette, H. E. (2006, November 4). An EM algorithm for regression analysis with incomplete covariate information. Journal of Statistical Computation and Simulation. doi:10.1080/10629360600565202
29. Thank you for your attention