The goal of this mini-project is to implement belief propagation algorithms for posterior probability inference and most probable explanation (MPE) inference in a Bayesian network with binary-valued nodes, where the conditional probability table (CPT) for each random variable/node is given.
Tutorial on Belief Propagation in Bayesian Networks
Project-1: Belief Propagation
Anmol Dwivedi, RIN: 661982035
1 Introduction
The goal of Bayesian network inference is to evaluate the probability distribution over some set of variables, or to evaluate their most likely states, given the values of another set of variables. In general, there are four types of inference:
• Posterior probability inference
• Maximum a Posteriori (MAP) inference
• Most Probable Explanation (MPE) inference
• Model Likelihood (ML) inference
Assume a Bayesian network G, parameterized by Θ, over random variables X = {X_U, X_E}, where X_U denotes the set of unknown nodes and X_E denotes the set of evidence nodes that are realized.

The problem of posterior probability inference, or sum-product inference, deals with evaluating P(X_q | X_E), where X_q denotes the query variables such that X_q ⊆ X_U.

The problem of Maximum a Posteriori (MAP) inference, or max-sum inference, deals with finding X_q* = arg max_{X_q} P(X_q | X_E), i.e., the most likely configuration of the query variables given the evidence.

The problem of Most Probable Explanation (MPE) inference, or max-product inference, deals with finding X_U* = arg max_{X_U} P(X_U | X_E), i.e., the most likely configuration of all the unknown random variables given the evidence.

The problem of Model Likelihood (ML) inference is to evaluate the likelihood of the model, in terms of its structure and parameters, for a given set of observations, i.e., P(X_E | Θ, G). This is usually done to select the model, or Bayesian network, that best explains the observations X_E.

For example, for the Bayesian network given in Figure 1, assuming that the random variables F and X are realized, the unknown variables are X_U = {S, A, B, L, T, E}, the evidence variables are X_E = {X, F}, and the query variables satisfy X_q ⊆ X_U.
2 Problem Statement
The goal of this project is to implement belief propagation algorithms for posterior probability and most probable explanation (MPE) inference for the Bayesian network given in Figure 1, where each node is binary, assuming a value of 1 or 0, and the CPT for each node is given. Specifically, this project addresses two problems:
• First, implement the marginal posterior probability (sum-product) inference algorithm in MATLAB to compute P(X_q | F = 1, X = 0), where X_q is a query variable belonging to the set {S, A, B, L, T, E}.
• Second, implement the MPE (max-product) inference algorithm to compute (s*, a*, b*, l*, t*, e*) = arg max_{s,a,b,l,t,e} P(s, a, b, l, t, e | F = 1, X = 0).
Figure 1: The Bayesian network over the random variables A, S, L, T, E, X, B and F considered in this project.
3 Theory
The general elimination algorithm for inference allows us to compute a single marginal distribution, i.e., to perform inference for one random variable at a time. This means the algorithm must be run separately for each node when inference is required for multiple random variables, leading to additional complexity. Belief propagation is an algorithm for exact inference on directed graphs without loops. It allows us to compute all the marginals and conditionals by computing intermediate factors, commonly known as messages, that are shared across many elimination orderings. It efficiently computes all the marginals, especially in the special case of a singly connected graph. A DAG is singly connected if its underlying undirected graph is a tree, i.e., there is only one undirected path between any two nodes. A generalization of belief propagation to arbitrary graphs is known as the junction tree algorithm.
The idea behind belief propagation for tree-structured graphs can be summarized as follows. Since we are dealing with singly connected graphs, every node X in the DAG G divides the evidence E into upstream and downstream evidence. We denote by E+ the part of the evidence accessible through the parents of node X, which gives rise to the prior message that we denote by π; analogously, we denote by E− the diagnostic part of the evidence accessible through the children of node X, which gives rise to the likelihood message that we denote by λ. The key idea in this method is that the evidence is bisected into E+ and E− when viewed from the perspective of any arbitrary node in the DAG G.
Now in order to evaluate the probability of variable X, we have the following:
P(X | E) = P(X, E) / P(E) = P(X, E+, E−) / P(E+, E−) = α · P(X | E+) · P(E− | X, E+)   (1)
         = α · P(X | E+) · P(E− | X)   (2)
         = α · π(X) · λ(X)   (3)
         = Belief(X)   (4)
3.1 Sum-Product Inference
Here, π(X) denotes the prior message, i.e., the prior received from the parents of the RV X; λ(X) denotes the likelihood message, i.e., the likelihood received from the children of the RV X; and α is the normalization constant. The advantage of this approach is that each of π(X) and λ(X) can be computed via a local message-passing algorithm between the nodes of the DAG, as follows:
π(X) = P(X | E+) = Σ_{Parents(X)} P(X | Parents(X)) · P(Parents(X) | E+)   (5)
     = Σ_{Parents(X)} P(X | Parents(X)) · π_{Parents(X)}(X)   (6)
     = Σ_{P1, P2, ..., PN} P(X | P1, P2, ..., PN) · Π_{Pi} π_{Pi}(X)   (7)
and similarly,
λ(X) = P(E− | X) = λ_{Children(X)}(X)   (8)
     = Π_{Ci} λ_{Ci}(X)   (9)
Here the messages received by X from its parents are denoted by π_{Pi}(X), and the messages received by X from its children are denoted by λ_{Ci}(X).
π_{Pi}(X) = π(Pi) · Π_{C ∈ Children(Pi)\X} λ_C(Pi)   (10)

λ_{Ci}(X) = Σ_{Ci} λ(Ci) · Σ_{uk ∈ Parents(Ci)\X} P(Ci | X, u1, ..., uN) · Π_k π_{uk}(Ci)   (11)
Before presenting the algorithm, we list the initializations that must be carried out before message passing begins.
Algorithm 1 Sum-Product Algorithm [1]
1: Initialization for evidence nodes: ∀ Xi ∈ XE, if xi = ei then λ(xi) = 1 and π(xi) = 1; otherwise λ(xi) = 0 and π(xi) = 0
2: Initialization for nodes without parents: λ(xi) = 1, π(xi) = P(xi)
3: Initialization for nodes without children: λ(xi) = 1
4: while the messages keep changing do
5: Compute the π message from each parent, π_{Pi}(X), using (10)
6: Compute the total parent message π(X) using (5)
7: Compute the λ message from each child, λ_{Ci}(X), using (11)
8: Compute the total message λ(X) using (8)
9: Update the belief Belief(X) and normalize
10: end while
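As a concrete illustration of the initialization steps, the following minimal MATLAB sketch (written for this write-up and not part of the attached code, which stores messages as factor structs) sets up the π and λ vectors for the evidence F = 1, X = 0 and for the parentless nodes A, S, L of Figure 1, assuming the states are ordered [1, 0] as in the CPT storage convention of the Appendix.

% Minimal sketch of steps 1-3 of Algorithm 1 (states ordered [value 1, value 0]).
% Evidence nodes: indicator vectors on the observed state (step 1).
LAMBDA.F = [1 0];  PI.F = [1 0];   % F = 1 is observed
LAMBDA.X = [0 1];  PI.X = [0 1];   % X = 0 is observed

% Nodes without parents: pi equals the prior, lambda is uninformative (step 2).
PI.A = [0.6 0.4];  LAMBDA.A = [1 1];   % prior P(A) used in this project
PI.S = [0.8 0.2];  LAMBDA.S = [1 1];   % prior P(S)
PI.L = [0.8 0.2];  LAMBDA.L = [1 1];   % prior P(L)

% The remaining pi/lambda messages are then computed and refined iteratively
% via equations (5)-(11) until they no longer change.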
3.2 Max-Product Inference
Replacing the sum with the max operator, the equations for the max-product messages are given by:

π(X) = P(X | E+) = max_{P1, P2, ..., PN} P(X | P1, P2, ..., PN) · Π_{Pi} π_{Pi}(X)   (12)

λ(X) = P(E− | X) = Π_{Ci} λ_{Ci}(X)   (13)

π_{Pi}(X) = π(Pi) · Π_{C ∈ Children(Pi)\X} λ_C(Pi)   (14)

λ_{Ci}(X) = max_{Ci} λ(Ci) · max_{uk ∈ Parents(Ci)\X} P(Ci | X, u1, ..., uN) · Π_k π_{uk}(Ci)   (15)
Algorithm 2 Max-Product Algorithm [1]
1: Initialization for evidence nodes: ∀ Xi ∈ XE, if xi = ei then λ(xi) = 1 and π(xi) = 1; otherwise λ(xi) = 0 and π(xi) = 0
2: Initialization for nodes without parents: λ(xi) = 1, π(xi) = P(xi)
3: Initialization for nodes without children: λ(xi) = 1
4: while the messages keep changing do
5: Compute the π message from each parent, π_{Pi}(X), using (14)
6: Compute the total parent message π(X) using (12)
7: Compute the λ message from each child, λ_{Ci}(X), using (15)
8: Compute the total message λ(X) using (13)
9: Update the belief Belief(X) and find the best state of each node in G independently, by picking the state that maximizes its (max-product) marginal
10: end while
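To make the sum-to-max replacement concrete, the short MATLAB snippet below (illustrative only; the values are the P(T | A) table and the prior P(A) used in this project, laid out as an assumed rows-by-parent-state matrix) contrasts the parent-message combination of (5) with that of (12) for a single binary node.

% Sum-product vs. max-product combination of a parent message (illustration).
P_T_given_A = [0.8 0.2; 0.3 0.7];   % rows: A = 1, A = 0; columns: T = 1, T = 0
pi_A        = [0.6 0.4];            % prior message from parent A

pi_sum = pi_A * P_T_given_A;                 % eq. (5):  sum over parent states -> [0.60 0.40]
pi_max = max(P_T_given_A .* pi_A', [], 1);   % eq. (12): max over parent states -> [0.48 0.28]
% (implicit expansion needs MATLAB R2016b+; use bsxfun(@times, ...) on older releases)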
4 Experiments
The following are the simplified equations for the sum-product algorithm that I implemented for the DAG in Figure 1. In the attached code, I compute each of the messages in the following order to obtain the beliefs for the DAG.
π(T) = Σ_A P(T | A) · π(A)   (16)

π(E) = Σ_{L,T} P(E | L, T) · π(L) · π(T)   (17)

π(B) = Σ_S P(B | S) · π(S)   (18)

λ(B) = Σ_F λ(F) · Σ_E P(F | B, E) · π(E) · Σ_X P(X | E) · λ(X)   (19)

λ(E) = Σ_F λ(F) · Σ_B P(F | B, E) · π(B) · Σ_X P(X | E) · λ(X)   (20)
λ(T) = Σ_E λ(E) · Σ_L P(E | L, T) · π(L)   (21)

λ(S) = Σ_B P(B | S) · λ(B)   (22)

λ(L) = Σ_E λ(E) · Σ_T P(E | L, T) · π(T)   (23)

λ(A) = Σ_T λ(T) · P(T | A)   (24)
Replacing each sum by a max in these equations gives the corresponding max-product equations.
Simulation results are as follows:
Listing 1: Output for 'main.m'
"Marginal Posterior Probability given evidence P(A=1| F=1, X=0) is 0.56401"
"Marginal Posterior Probability given evidence P(A=0| F=1, X=0) is 0.43599"
"Marginal Posterior Probability given evidence P(S=1| F=1, X=0) is 0.83893"
"Marginal Posterior Probability given evidence P(S=0| F=1, X=0) is 0.16107"
"Marginal Posterior Probability given evidence P(L=1| F=1, X=0) is 0.75544"
"Marginal Posterior Probability given evidence P(L=0| F=1, X=0) is 0.24456"
"Marginal Posterior Probability given evidence P(T=1| F=1, X=0) is 0.52802"
"Marginal Posterior Probability given evidence P(T=0| F=1, X=0) is 0.47198"
"Marginal Posterior Probability given evidence P(E=1| F=1, X=0) is 0.43043"
"Marginal Posterior Probability given evidence P(E=0| F=1, X=0) is 0.56957"
"Marginal Posterior Probability given evidence P(B=1| F=1, X=0) is 0.76332"
"Marginal Posterior Probability given evidence P(B=0| F=1, X=0) is 0.23668"
Listing 2: Output for mainMaxProduct
"Random Variable A takes MPE assignment 0"
"Random Variable S takes MPE assignment 1"
"Random Variable L takes MPE assignment 1"
"Random Variable T takes MPE assignment 0"
"Random Variable E takes MPE assignment 0"
"Random Variable B takes MPE assignment 1"
5 Conclusion
This project performs MPE and posterior probability inference for the DAG in Figure 1 using the belief propagation algorithm in MATLAB. The main issue I encountered was figuring out how to store the probability values in an appropriate data structure so that the factor product can be computed efficiently and the relevant variables can then be marginalized out to complete inference. I also learnt about various algorithms for performing inference in DAGs.
6 Appendix
The idea behind my code is that I treat the conditional probability table values and the messages π(·), λ(·) as factors. I store these factors in a struct data structure in MATLAB such that each factor has fields for its name (stored in the field Var_Name), the number of values it takes, which is binary in this homework (stored in the field Cardinality), and the conditional probability values (stored in the field Probability_Values). To evaluate a message π(·) or λ(·), I break the operation into two steps: first I carry out the factor product, and then I marginalize out the variable needed to obtain that particular message.
To reproduce the results, run the "main.m" file under both folders, Sum Product and Max Product. All other files are functions that are called from the main.m file.
Listing 3: file 'main.m' under folder SumProduct

clc
clear all

% Dictionary for storing the Random Variable representation as their equivalent numerical values;
keys = {'A', 'S', 'L', 'T', 'E', 'X', 'B', 'F'};
value = {1, 2, 3, 4, 5, 6, 7, 8};
Let_to_Num = containers.Map(keys, value);
Num_to_Let = containers.Map(value, keys);

% Input the probability tables;
probab.input(1) = struct('Var_Name', [Let_to_Num('A')], 'Cardinality', [2], 'Probability_Values', [0.6 0.4]);
probab.input(2) = struct('Var_Name', [Let_to_Num('S')], 'Cardinality', [2], 'Probability_Values', [0.8 0.2]);
probab.input(3) = struct('Var_Name', [Let_to_Num('L')], 'Cardinality', [2], 'Probability_Values', [0.8 0.2]);
probab.input(4) = struct('Var_Name', [Let_to_Num('T'), Let_to_Num('A')], 'Cardinality', [2, 2], 'Probability_Values', [0.8 0.2 0.3 0.7]);
probab.input(5) = struct('Var_Name', [Let_to_Num('E'), Let_to_Num('L'), Let_to_Num('T')], 'Cardinality', [2, 2, 2], 'Probability_Values', [0.8 0.2 0.7 0.3 0.6 0.4 0.1 0.9]);
probab.input(6) = struct('Var_Name', [Let_to_Num('X'), Let_to_Num('E')], 'Cardinality', [2, 2], 'Probability_Values', [0.8 0.2 0.1 0.9]);

% ... (lines 17-67 of main.m are not included in this export; the listing
% resumes inside the computation of LAMBDA(T)) ...

temp_1 = Sum_Marginalization(temp, [Let_to_Num('L')]);
temp_2 = Product(temp_1, LAMBDA.E);
LAMBDA_E_TO_T = Sum_Marginalization(temp_2, [Let_to_Num('E')]);
LAMBDA.T = LAMBDA_E_TO_T;

% Computing LAMBDA(S)
temp = Product(probab.input(7), LAMBDA.B);
LAMBDA_B_TO_S = Sum_Marginalization(temp, [Let_to_Num('B')]);
LAMBDA.S = LAMBDA_B_TO_S;

% Computing LAMBDA(L)
temp = Product(probab.input(5), PI.T);
temp_1 = Sum_Marginalization(temp, [Let_to_Num('T')]);
temp_2 = Product(temp_1, LAMBDA.E);
LAMBDA_T_TO_E = Sum_Marginalization(temp_2, [Let_to_Num('E')]);
LAMBDA.L = LAMBDA_T_TO_E;

% Computing LAMBDA(A)
temp = Product(probab.input(4), LAMBDA.T);
LAMBDA_T_TO_A = Sum_Marginalization(temp, [Let_to_Num('T')]);
LAMBDA.A = LAMBDA_T_TO_A;

% Calculating Beliefs, Normalizing them in order to obtain the marginal probability given evidence
temp = Product(LAMBDA.A, PI.A);
A_1 = normalization(temp.Probability_Values);
display(strcat('Marginal Posterior Probability given evidence', " P(", Num_to_Let(1), '=1| F=1, X=0) is ', " ", string(A_1(1))))
display(strcat('Marginal Posterior Probability given evidence', " P(", Num_to_Let(1), '=0| F=1, X=0) is ', " ", string(A_1(2))))
fprintf('\n\n')

temp = Product(LAMBDA.S, PI.S);
S_2 = normalization(temp.Probability_Values);
display(strcat('Marginal Posterior Probability given evidence', " P(", Num_to_Let(2), '=1| F=1, X=0) is ', " ", string(S_2(1))))
display(strcat('Marginal Posterior Probability given evidence', " P(", Num_to_Let(2), '=0| F=1, X=0) is ', " ", string(S_2(2))))
fprintf('\n\n')

temp = Product(LAMBDA.L, PI.L);
L_3 = normalization(temp.Probability_Values);
display(strcat('Marginal Posterior Probability given evidence', " P(", Num_to_Let(3), '=1| F=1, X=0) is ', " ", string(L_3(1))))
display(strcat('Marginal Posterior Probability given evidence', " P(", Num_to_Let(3), '=0| F=1, X=0) is ', " ", string(L_3(2))))
fprintf('\n\n')

temp = Product(LAMBDA.T, PI.T);
T_4 = normalization(temp.Probability_Values);
display(strcat('Marginal Posterior Probability given evidence', " P(", Num_to_Let(4), '=1| F=1, X=0) is ', " ", string(T_4(1))))
display(strcat('Marginal Posterior Probability given evidence', " P(", Num_to_Let(4), '=0| F=1, X=0) is ', " ", string(T_4(2))))
fprintf('\n\n')

temp = Product(LAMBDA.E, PI.E);
E_5 = normalization(temp.Probability_Values);
display(strcat('Marginal Posterior Probability given evidence', " P(", Num_to_Let(5), '=1| F=1, X=0) is ', " ", string(E_5(1))))
display(strcat('Marginal Posterior Probability given evidence', " P(", Num_to_Let(5), '=0| F=1, X=0) is ', " ", string(E_5(2))))
fprintf('\n\n')

temp = Product(LAMBDA.B, PI.B);
B_7 = normalization(temp.Probability_Values);
display(strcat('Marginal Posterior Probability given evidence', " P(", Num_to_Let(7), '=1| F=1, X=0) is ', " ", string(B_7(1))))
display(strcat('Marginal Posterior Probability given evidence', " P(", Num_to_Let(7), '=0| F=1, X=0) is ', " ", string(B_7(2))))
fprintf('\n\n')
By default, I store the probability table values according to the following rule. Consider a conditional distribution over variables X1 | X2, X3, each of which takes the two values 0 and 1. The values are stored in the form below, which depends on the particular order of the input variables. You can cross-check how the values are fed in for the homework assignment at the start of the main.m file.
Listing 4: Illustration of storing CPD values in the field Probability_Values of the structure for each variable

% -+-----+-----+-----+----------------------+
%  | X_1 | X_2 | X_3 | factor(X_1, X_2, X_3)|
% -+-----+-----+-----+----------------------+
%  |  1  |  1  |  1  |    factor.val(1)     |
% -+-----+-----+-----+----------------------+
%  |  0  |  1  |  1  |    factor.val(2)     |
% -+-----+-----+-----+----------------------+
%  |  1  |  0  |  1  |    factor.val(3)     |
% -+-----+-----+-----+----------------------+
%  |  0  |  0  |  1  |    factor.val(4)     |
% -+-----+-----+-----+----------------------+
%  |  1  |  1  |  0  |    factor.val(5)     |
% -+-----+-----+-----+----------------------+
%  |  0  |  1  |  0  |    factor.val(6)     |
% -+-----+-----+-----+----------------------+
%  |  1  |  0  |  0  |    factor.val(7)     |
% -+-----+-----+-----+----------------------+
%  |  0  |  0  |  0  |    factor.val(8)     |
% -+-----+-----+-----+----------------------+
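For instance, applying this rule to the P(E | L, T) table entered in main.m (a check added here for illustration):

% Reading probab.input(5) with the storage rule above (X_1 = E, X_2 = L, X_3 = T):
P_E_given_LT = [0.8 0.2 0.7 0.3 0.6 0.4 0.1 0.9];
% Row 3 of the table is (E = 1, L = 0, T = 1), hence
P_E1_given_L0_T1 = P_E_given_LT(3);   % = 0.7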
Hence, in order to convert a given index i ∈ [1, 8] into an assignment for the RVs X1, X2 and X3, I write the following function, which takes the index and the length of the assignment as input and returns the assignments for the random variables as an array.
Listing 5: MATLAB function for conversion to assignment
function assignment = conv_to_assignment(index, length_of_assignment)
% Converts linear indices into assignments over binary variables,
% encoded as 1 (value "1") and 2 (value "0"), with the first variable
% changing fastest. Note: de2bi requires the Communications Toolbox.
assignment = [];
for i = 1:length(index)
    temp = flip(de2bi(index(i)-1, log(length_of_assignment)/log(2), 'left-msb')) + 1;
    assignment = [assignment; temp];
end

end
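For example, a quick check (assuming the Communications Toolbox is available for de2bi):

% Index 3 of a 3-variable binary factor corresponds to row 3 of the table above,
% i.e., (X_1, X_2, X_3) = (1, 0, 1), encoded as states (1, 2, 1):
conv_to_assignment(3, 8)   % returns [1 2 1]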
On the other hand, in order to convert an assignment for a set of random variables X1, X2, X3, etc. into an index i, I write the following function, which takes the assignment and its length as input and returns the index for that specific assignment.
Listing 6: MATLAB function for conversion to index
function index = conv_to_index(A, len)
% Given a matrix of assignments (one assignment per row, states in {1, 2},
% first variable changing fastest), returns the corresponding linear indices.
size_temp = size(A);
for j = 1:size_temp(1)
    temp = 0;
    for i = 1:size_temp(2)
        temp = temp + (A(j, i) - 1)*(2^(i-1));
    end
    index(j, 1) = temp + 1;
end
end
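A quick round-trip sanity check (illustrative):

% conv_to_index inverts conv_to_assignment for every index of a 3-variable factor:
conv_to_index(conv_to_assignment(3, 8), [2 2 2])   % returns 3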
This function just carries out normalization for an input vector.
Listing 7: MATLAB function to carry out normalization
function B = normalization(A)
% Normalizes a length-2 vector so that its entries sum to one.
B = zeros(2, 1);
TEMP_1 = A(1);
TEMP_2 = A(2);
B(1) = TEMP_1/(TEMP_1 + TEMP_2);
B(2) = TEMP_2/(TEMP_1 + TEMP_2);
return
end
This function carries out marginalization: given a factor A and a marginalization variable V, it sums V out of the variables present in factor A.
Listing 8: MATLAB function to carry out Sum Marginalization
function B = Sum_Marginalization(A, V)
% Sums the variables in V out of factor A and returns the reduced factor B.
[B.Var_Name, mapB] = setdiff(A.Var_Name, V);
B.Cardinality = A.Cardinality(mapB);
B.Probability_Values = zeros(1, prod(B.Cardinality));

assignments = conv_to_assignment(1:length(A.Probability_Values), prod(A.Cardinality));
indxB = conv_to_index(assignments(:, mapB), B.Cardinality);

for i = 1:prod(B.Cardinality)
    for j = 1:prod(A.Cardinality)
        if indxB(j) == i
            B.Probability_Values(i) = B.Probability_Values(i) + A.Probability_Values(j);
        end
    end
end

end
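For the max-product algorithm of Section 3.2, the summation in this helper is replaced by a maximization. A possible Max_Marginalization counterpart is sketched below (the function name and layout are my own for this report; the attached Max Product code may organize this differently):

function B = Max_Marginalization(A, V)
% Maximizes the variables in V out of factor A (max-product analogue of Sum_Marginalization).
[B.Var_Name, mapB] = setdiff(A.Var_Name, V);
B.Cardinality = A.Cardinality(mapB);
B.Probability_Values = zeros(1, prod(B.Cardinality));

assignments = conv_to_assignment(1:length(A.Probability_Values), prod(A.Cardinality));
indxB = conv_to_index(assignments(:, mapB), B.Cardinality);

for j = 1:prod(A.Cardinality)
    i = indxB(j);
    % Probabilities are nonnegative, so taking a running max against 0 is safe.
    B.Probability_Values(i) = max(B.Probability_Values(i), A.Probability_Values(j));
end
end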
This function just carries out the product between two input factors A and B.
Listing 9: MATLAB function to carry out the Product of two given probability tables
function C = Product(A, B)
% Computes the factor product of A and B over the union of their variable scopes.

C.Var_Name = union(A.Var_Name, B.Var_Name);

[~, mapA] = ismember(A.Var_Name, C.Var_Name);
[~, mapB] = ismember(B.Var_Name, C.Var_Name);

C.Cardinality = zeros(1, length(C.Var_Name));
C.Cardinality(mapA) = A.Cardinality;
C.Cardinality(mapB) = B.Cardinality;

assignments = conv_to_assignment(1:prod(C.Cardinality), prod(C.Cardinality)); % assignment table for each index of the desired .val representation

indxA = conv_to_index(assignments(:, mapA), A.Cardinality); % in order to access values as given in the A structure
indxB = conv_to_index(assignments(:, mapB), B.Cardinality);

for i = 1:prod(C.Cardinality)
    C.Probability_Values(i) = A.Probability_Values(indxA(i))*B.Probability_Values(indxB(i));
end

end
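As a usage illustration (not part of the attached code), the prior message π(T) of equation (16) can be obtained by combining Product and Sum_Marginalization on the factors defined in main.m:

% pi(T) = sum_A P(T|A) * pi(A), using the P(A) and P(T|A) tables from main.m.
% Variable codes follow the Let_to_Num map: A -> 1, T -> 4.
P_A  = struct('Var_Name', 1, 'Cardinality', 2, 'Probability_Values', [0.6 0.4]);                   % P(A)
P_TA = struct('Var_Name', [4 1], 'Cardinality', [2 2], 'Probability_Values', [0.8 0.2 0.3 0.7]);   % P(T|A)

PI_T = Sum_Marginalization(Product(P_TA, P_A), 1);   % sum variable 1 (= A) out
% PI_T.Probability_Values is [0.6 0.4], i.e., pi(T=1) = 0.6 and pi(T=0) = 0.4.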