Chapter 8
Reviewer : Sunwoo Kim
Christopher M. Bishop
Pattern Recognition and Machine Learning
Yonsei University
Department of Applied Statistics
Chapter 8. Probabilistic Graphical Models
2
Expressing probability in a simple way
Consider the following joint probability:
$$p(x_1)\,p(x_2|x_1)\,p(x_3|x_2,x_1)\,p(x_4|x_3,x_2,x_1)\cdots p(x_k|x_{k-1},\ldots,x_1)$$
What is it equal to?
The answer is $p(x_1, x_2, \ldots, x_k)$, by the chain rule of probability.
Isn't it really troublesome to write all these variables out this way? Instead, we can use a visualization tool,
which is called a probabilistic graphical model.
Node : Random variable
Edge : Probabilistic relationship
Directed graphical models : Graphs whose edges carry a direction (arrows).
- Good at capturing causal relationships (conditional terms)
Undirected graphical models : Graphs whose edges do not carry a direction.
- Good at expressing soft constraints between variables
Now, letโ€™s take an example!
Chapter 8.1. Bayesian Networks
3
Modeling joint probability
The basic idea of graphical models can be explained by this simple example!
$$p(a, b, c) = p(c|a, b)\,p(b|a)\,p(a)$$
Note that the right-hand side of the equation is no longer symmetric in $a$, $b$, $c$.
Arrow direction : from the conditioning variables to the conditioned random variable.
For the complicated model, such as
๐‘ ๐‘ฅ1, ๐‘ฅ2, โ€ฆ , ๐‘ฅ๐พ = ๐‘ ๐‘ฅ๐พ ๐‘ฅ1, โ€ฆ , ๐‘ฅ๐พโˆ’1 โ€ฆ ๐‘ ๐‘ฅ2 ๐‘ฅ1 ๐‘(๐‘ฅ1)
This is called a fully connected, since there is a link between every pair.
There are some much more complicated forms, like the left figure.
In general, for a graph with $K$ nodes, the joint distribution factorizes as
$$p(\mathbf{x}) = \prod_{k=1}^{K} p(x_k \mid \mathrm{pa}_k)$$
where $\mathrm{pa}_k$ denotes the parents of $x_k$.
Like the left figure, if there does not exist any directed cycle, such a
graph is called a directed acyclic graph (DAG).
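To make the factorization concrete, here is a minimal sketch (my own toy example, not from the book) of evaluating a joint probability through a DAG factorization; the tables and numbers are assumptions.

```python
import numpy as np

# Toy fully connected DAG on three binary variables:
# p(a, b, c) = p(c | a, b) p(b | a) p(a), matching the slide's factorization.
p_a = np.array([0.6, 0.4])              # p(a)
p_b_a = np.array([[0.7, 0.3],           # p(b | a=0)
                  [0.2, 0.8]])          # p(b | a=1)
p_c_ab = np.zeros((2, 2, 2))            # p(c | a, b), indexed [a, b, c]
p_c_ab[0, 0] = [0.9, 0.1]
p_c_ab[0, 1] = [0.4, 0.6]
p_c_ab[1, 0] = [0.5, 0.5]
p_c_ab[1, 1] = [0.1, 0.9]

def joint(a, b, c):
    """Evaluate p(a, b, c) through the factorization, one factor per node."""
    return p_a[a] * p_b_a[a, b] * p_c_ab[a, b, c]

# The factorized joint sums to one over all 2^3 configurations.
print(sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)))  # 1.0
```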
Chapter 8.1. Bayesian Networks
4
Polynomial regression
Let's think of Bayesian polynomial regression with $N$ independent observations, where we place a prior distribution on the parameter vector $\mathbf{w}$.
The joint distribution is the equation on the left, and the corresponding graphs are on the right (original form and simplified plate form):
$$p(\mathbf{t}, \mathbf{w}) = p(\mathbf{w}) \prod_{n=1}^{N} p(t_n | \mathbf{w})$$
Making the inputs and hyperparameters explicit (the prior parameter $\alpha$ and the noise variance $\sigma^2$), the equation becomes
$$p(\mathbf{t}, \mathbf{w} \mid \mathbf{x}, \alpha, \sigma^2) = p(\mathbf{w}|\alpha) \prod_{n=1}^{N} p(t_n | \mathbf{w}, x_n, \sigma^2)$$
Note that the blue box (a plate) represents $N$
observations: the nodes inside it are repeated $N$ times,
and their factors appear in the joint distribution as a
product!
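To make the plate concrete, here is a minimal sketch (my own illustration) of sampling from this generative model, assuming a Gaussian prior $p(\mathbf{w}|\alpha) = \mathcal{N}(\mathbf{0}, \alpha^{-1}I)$ and polynomial features; all the specific numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
N, degree = 10, 3
alpha, sigma2 = 2.0, 0.05          # prior precision and noise variance

# One draw of w from the prior p(w | alpha) = N(0, alpha^{-1} I).
w = rng.normal(0.0, alpha ** -0.5, size=degree + 1)

# The plate: N conditionally independent targets t_n ~ N(w^T phi(x_n), sigma2).
x = rng.uniform(-1, 1, size=N)
Phi = np.vander(x, degree + 1, increasing=True)   # polynomial features phi(x_n)
t = Phi @ w + rng.normal(0.0, np.sqrt(sigma2), size=N)

print(w)
print(t)
```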
Chapter 8.1. Bayesian Networks
5
Representation of Observed data
Data ๐’• might be observed, or not.
Left hand side expresses the general form, and the right one shows the case of observed data.
Suppose we are trying to predict a new data ๐‘ฅ!
Here, joint probability can be expressed by
To exactly generate predictive distribution, we needโ€ฆ
Remember some Laplace approximation and other
integral methods!
Chapter 8.1. Bayesian Networks
6
Generative models
In Chapter 11, we are going to cover sampling methods;
we often need to sample data from a distribution!
For example, from the joint probability $p(x_1, x_2, \ldots, x_K)$ of a directed graph, we can generate a sample $(x_1, \ldots, x_K)$.
Rather than sampling from the full joint directly, we can do it iteratively, starting from $x_1$ and sampling each node given its parents. This is called ancestral sampling.
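Here is a minimal ancestral-sampling sketch for a three-node model $p(a)p(b|a)p(c|a,b)$; the conditional tables are my own made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model p(a, b, c) = p(a) p(b|a) p(c|a,b), all variables binary.
p_a = np.array([0.6, 0.4])
p_b_a = np.array([[0.7, 0.3], [0.2, 0.8]])
p_c_ab = np.array([[[0.9, 0.1], [0.4, 0.6]],
                   [[0.5, 0.5], [0.1, 0.9]]])

def ancestral_sample():
    """Sample each node in topological order, conditioned on its parents."""
    a = rng.choice(2, p=p_a)
    b = rng.choice(2, p=p_b_a[a])
    c = rng.choice(2, p=p_c_ab[a, b])
    return a, b, c

print([ancestral_sample() for _ in range(5)])
```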
In an image model, for example, latent variables such as the object identity, position, and orientation can be thought of as generating the observed pixels.
Likewise, we often assume there is a latent variable beneath the observed
data and its distribution!
We may be able to interpret the hidden variable, as in the image example, but sometimes we can't.
Still, latent variables are useful for building complicated probability models out of simpler components!
Chapter 8.1. Bayesian Networks
7
Discrete variables
- Exponential family :
- Many famous distributions belong to the exponential family, and they form useful building blocks for constructing more complex probability distributions!
- If we choose such distributions for the parent and child nodes of a graph, we get many nice properties!
- Let's take a look!
Consider a discrete (multinomial) variable with $K$ states.
There is a constraint $\sum_k \mu_k = 1$, so it has $K - 1$ free parameters.
Let's extend this single-variable example to the two-variable case.
That is, we observe the event $x_{1k} = 1$ and $x_{2l} = 1$ with probability $\mu_{kl}$. Note that the two variables are not assumed independent: the joint probability is not just the product $\mu_k \mu_l$.
In this case, there exist $K^2 - 1$ parameters!
For the general case of $M$ variables, we have $K^M - 1$.
Can't we fix this exponential growth
problem??
Chapter 8.1. Bayesian Networks
8
Independence
We can fix it by assuming independence! Then the calculation gets much, much simpler!
$$p(X_1, X_2) = p(X_1)\,p(X_2)$$
In this case, we only need $2(K-1)$ parameters! In the general case of $M$ independent variables, we have $M(K-1)$.
Now, let's consider the special case of a chain.
We covered a similar model in stochastic processes!
That is, we assume each variable $x_i$ depends only on the immediately preceding variable, $x_{i-1}$.
Thus, the joint probability is
$$p(x_1, \ldots, x_M) = p(x_1)\,p(x_2|x_1)\,p(x_3|x_2) \cdots p(x_M|x_{M-1})$$
Graphically, the two-variable cases are shown as
$$p(X_1, X_2) = p(X_2|X_1)\,p(X_1) \ \text{(linked)} \qquad p(X_1, X_2) = p(X_2)\,p(X_1) \ \text{(unlinked)}$$
$X_1$ does not depend on any other variable, so it takes $K - 1$ parameters.
Each conditional distribution $p(x_i|x_{i-1})$ has $K - 1$ free parameters for each of the $K$ states of the conditioning variable, i.e. $K(K-1)$ per link. Thus,
we require $K - 1 + (M-1)K(K-1)$ parameters in this case,
which grows only linearly as $M$ increases!
Chain approach
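A quick numeric check of these parameter counts (my own sketch):

```python
def full_table(K, M):
    """Completely general joint over M variables with K states each."""
    return K ** M - 1

def independent(K, M):
    """Fully factorized model: M separate marginals."""
    return M * (K - 1)

def chain(K, M):
    """First-order chain: one marginal plus M-1 conditional tables."""
    return (K - 1) + (M - 1) * K * (K - 1)

for K, M in [(2, 10), (10, 5)]:
    print(K, M, full_table(K, M), independent(K, M), chain(K, M))
# e.g. K=2, M=10: 1023 vs 10 vs 19 parameters
```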
Chapter 8.1. Bayesian Networks
9
Bayesian approach
Let's again treat the parameters $\mu_k$ as random variables!
We use the chain model once again.
Since we are using a multinomial distribution, it is natural to set a Dirichlet distribution as the prior of $\boldsymbol{\mu}$ (its conjugate prior).
The priors can be separate $\boldsymbol{\mu}_i$ for each node, or a single shared $\boldsymbol{\mu}$ for all the conditional distributions!
Parameterized models
There is a much simpler way of modeling $p(y = 1|x_1, x_2, \ldots, x_M)$ for binary variables.
Using a parametric approach, with the logistic sigmoid $\sigma(a) = (1 + e^{-a})^{-1}$ acting on a linear combination of the parents, we get
$$p(y = 1 \mid x_1, \ldots, x_M) = \sigma\!\left(w_0 + \sum_{i=1}^{M} w_i x_i\right)$$
which contains only $M + 1$ parameters, growing linearly in $M$.
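A minimal sketch of this parameterized conditional; the weights are made-up numbers.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# M + 1 parameters: a bias w0 plus one weight per parent variable.
w0 = -1.0
w = np.array([0.8, -0.5, 1.2])          # M = 3 parents

def p_y_given_x(x):
    """p(y = 1 | x) = sigma(w0 + w^T x) for a binary parent vector x."""
    return sigmoid(w0 + w @ x)

print(p_y_given_x(np.array([1, 0, 1])))  # a single conditional probability
```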
Chapter 8.1. Bayesian Networks
10
Linear-Gaussian models
We can express a multivariate Gaussian with a graphical model.
Take an arbitrary directed acyclic graph, and assume each conditional $p(x_i|\mathrm{pa}_i)$ is Gaussian with mean linear in the parents:
$$p(x_i \mid \mathrm{pa}_i) = \mathcal{N}\!\left(x_i \,\Big|\, \sum_{j \in \mathrm{pa}_i} w_{ij} x_j + b_i,\; v_i\right)$$
Using this, the joint probability over the whole graph becomes
$$\ln p(\mathbf{x}) = -\sum_{i=1}^{D} \frac{1}{2 v_i}\left(x_i - \sum_{j \in \mathrm{pa}_i} w_{ij} x_j - b_i\right)^2 + \text{const}$$
This is a quadratic function of the components $x_i$, so the joint distribution is again a multivariate Gaussian!
This shows that if we assume each individual conditional probability in the
graphical model is Gaussian (with linear-in-parents mean), the entire joint
distribution is also a multivariate Gaussian!
The book does not say here how to estimate the values of $w_{ij}$; I don't have any idea how to get them either…
But if we assume we know the $\mathbf{w}$ and $\mathbf{b}$ values, we can compute the mean and covariance of the joint distribution!
Chapter 8.1. Bayesian Networks
11
Linear-Gaussian models
All these ideas connect to the
hierarchical Bayes model,
which places a prior on the prior's parameters;
such a prior is called a hyperprior!
Each variable $x_i$ can be written as
$$x_i = \sum_{j \in \mathrm{pa}_i} w_{ij} x_j + b_i + \sqrt{v_i}\,\epsilon_i$$
where the error term $\epsilon_i$ follows a standard Gaussian distribution.
Estimating the mean:
starting from a variable that does not depend on any other variable, such as $x_1$,
we can recursively compute the mean of every variable:
$$\mathbb{E}[x_i] = \sum_{j \in \mathrm{pa}_i} w_{ij}\,\mathbb{E}[x_j] + b_i$$
Likewise, we can compute the covariances by a similar recursion.
If all variables are independent (no links), we only need to estimate the
$b_i$ and $v_i$, i.e. $2D$ parameters.
In the case of a fully connected graph, we have to estimate a full covariance
matrix with $D(D+1)/2$ parameters.
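A minimal sketch (my own) of these mean and covariance recursions on a toy three-node chain $x_1 \to x_2 \to x_3$; the weights, biases, and variances are assumptions.

```python
import numpy as np

# Toy chain x1 -> x2 -> x3: pa(1) = {}, pa(2) = {1}, pa(3) = {2}.
parents = {0: [], 1: [0], 2: [1]}
w = {(1, 0): 0.5, (2, 1): -1.0}      # w[i, j]: weight of parent j in node i
b = np.array([1.0, 0.0, 2.0])        # biases b_i
v = np.array([1.0, 0.5, 0.2])        # conditional variances v_i
D = 3

# Mean recursion E[x_i] = sum_j w_ij E[x_j] + b_i, in topological order.
mean = np.zeros(D)
for i in range(D):
    mean[i] = b[i] + sum(w[i, j] * mean[j] for j in parents[i])

# Covariance recursion (PRML eq. 8.16), filled in topological order:
# cov[i, j] = sum_{k in pa(j)} w_jk cov[i, k] + delta_ij * v_j
cov = np.zeros((D, D))
for j in range(D):
    for i in range(j + 1):
        cov[i, j] = sum(w[j, k] * cov[i, k] for k in parents[j])
        if i == j:
            cov[i, j] += v[j]
        cov[j, i] = cov[i, j]

print(mean)   # [1.0, 0.5, 1.5]
print(cov)
```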
Chapter 8.2. Conditional Independence
12
Ideation
We have covered conditional independence in Mathematical Statistics I.
In this section, let's look at it in more detail.
$$p(a \mid b, c) = p(a \mid c)$$
This means $a$ is independent of $b$ once $c$ is given!
Furthermore, the joint probability $p(a, b) \neq p(a)\,p(b)$ in general, but conditioned on $c$ it factorizes: $p(a, b \mid c) = p(a \mid c)\,p(b \mid c)$.
Conditional independence is denoted by $a \perp\!\!\!\perp b \mid c$.
This is significant in various machine learning tasks. Let's take an example.
Tail-to-tail
Here $c$ is the common parent of $a$ and $b$. Consider the joint probability
$$p(a, b, c) = p(a \mid c)\,p(b \mid c)\,p(c)$$
If $c$ is unobserved, marginalizing it out does not give a factorized distribution:
$$p(a, b) = \sum_c p(a \mid c)\,p(b \mid c)\,p(c) \not\equiv p(a)\,p(b)$$
However, if $c$ is given, then $p(a, b \mid c) = p(a \mid c)\,p(b \mid c)$.
We say the 'conditioned node blocks the path from a to b.' (Observed)
Chapter 8.2. Conditional Independence
13
Head-to-tail
Here $c$ sits between $a$ and $b$ on the path. Consider the joint probability
$$p(a, b, c) = p(b \mid c)\,p(c \mid a)\,p(a)$$
If $c$ is unobserved, marginalizing it out does not give a factorized distribution:
$$p(a, b) = p(a) \sum_c p(b \mid c)\,p(c \mid a) = p(b \mid a)\,p(a) \not\equiv p(a)\,p(b)$$
However, if $c$ is given, then
$$p(a, b \mid c) = \frac{p(b \mid c)\,p(c \mid a)\,p(a)}{p(c)} = \frac{p(b \mid c)\,p(a, c)}{p(c)} = p(a \mid c)\,p(b \mid c)$$
Here again the 'conditioned node blocks the path from a to b.' (Observed)
Head-to-head
Now $c$ is a child of both $a$ and $b$, and it no longer appears as a conditioning term:
$$p(a, b, c) = p(a)\,p(b)\,p(c \mid a, b)$$
Marginalizing both sides over $c$ gives $p(a, b) = p(a)\,p(b)$: the unobserved head-to-head node leaves $a$ and $b$ independent.
However, if $c$ is given, then
$$p(a, b \mid c) = \frac{p(a)\,p(b)\,p(c \mid a, b)}{p(c)} \not\equiv p(a \mid c)\,p(b \mid c)$$
so conditional independence does not hold here: observing $c$ unblocks the path, the opposite of the previous two cases!
Chapter 8.2. Conditional Independence
14
General result summary
(Figure labels: parent node = ancestor, child node = descendant.)
Whether two variables are independent depends on
whether the node between them (and its descendants) is observed or not.
The details are summarized in the table below.
'Not blocked' means the path carries dependence; 'blocked' means the variables are independent, either by themselves or by conditioning.

                Middle node unobserved   Middle node observed
Tail-to-tail    Not blocked              Blocked
Head-to-tail    Not blocked              Blocked
Head-to-head    Blocked                  Not blocked
Chapter 8.2. Conditional Independence
15
Example of this approach
Three binary variables:
1. Battery B : {0, 1} (flat or charged)
2. Fuel F : {0, 1} (empty or full)
3. Gauge G : {0, 1} (reads empty or full)
Computing the posterior gives $p(F = 0 \mid G = 0) > p(F = 0)$.
This fits our intuition: seeing the gauge read empty raises the probability
that the tank really is empty, compared with the prior alone.
1. Now also observe that the battery is flat. Computing $p(F = 0 \mid G = 0, B = 0)$ gives a value
much smaller than $p(F = 0 \mid G = 0)$: the flat battery 'explains away' the empty gauge reading,
so the evidence that the tank is empty becomes weaker.
2. This means battery and fuel are not conditionally independent when the
state of the gauge is given (the head-to-head node G has been observed).
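A quick numeric check of this explaining-away effect. The priors and gauge table follow PRML's fuel-gauge example as I recall it, so treat the exact numbers as an assumption; the qualitative ordering is the point.

```python
# Priors: battery charged / tank full with probability 0.9 each.
pB = {1: 0.9, 0: 0.1}
pF = {1: 0.9, 0: 0.1}
# p(G=1 | B, F): the gauge reads 'full' most reliably when both are fine.
pG1 = {(1, 1): 0.8, (1, 0): 0.2, (0, 1): 0.2, (0, 0): 0.1}

def joint(b, f, g):
    pg1 = pG1[b, f]
    return pB[b] * pF[f] * (pg1 if g == 1 else 1 - pg1)

states = [(b, f, g) for b in (0, 1) for f in (0, 1) for g in (0, 1)]

def p_F0(**given):
    """p(F=0 | given) by brute-force enumeration over the 8 joint states."""
    match = lambda b, f, g: all(dict(b=b, f=f, g=g)[k] == v
                                for k, v in given.items())
    den = sum(joint(b, f, g) for b, f, g in states if match(b, f, g))
    num = sum(joint(b, f, g) for b, f, g in states if f == 0 and match(b, f, g))
    return num / den

print(p_F0())            # prior p(F=0) = 0.1
print(p_F0(g=0))         # ~0.257: the empty reading raises it
print(p_F0(g=0, b=0))    # ~0.111: the flat battery explains it away
```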
Chapter 8.2. Conditional Independence
16
D-separation
Can we identify whether a conditional independence relation $A \perp\!\!\!\perp B \mid C$ holds just by looking at the directed graph?
Consider all paths from any node in $A$ to any node in $B$. Any such path is blocked if it includes a node such that either
1. the arrows on the path meet either
- head-to-tail, or
- tail-to-tail at the node,
- and the node is in the set $C$; or
2. the arrows on the path meet
- head-to-head at the node,
- and neither the node, nor any of its descendants, is in the set $C$.
If all paths are blocked, $A$ is said to be d-separated from $B$ by $C$, and the joint distribution satisfies $A \perp\!\!\!\perp B \mid C$.
(First figure) Conditioning on $c$: the path from $a$ to $b$ is not blocked,
because node $e$ is head-to-head and its descendant $c$ is in the conditioning set
(and node $f$ is tail-to-tail but unobserved).
(Second figure) Conditioning on $f$: the path from $a$ to $b$ is blocked,
because node $f$ is tail-to-tail and
observed!
Chapter 8.2. Conditional Independence
17
I.I.D. (Independent and identically distributed)
Consider the joint probability of $N$ random samples drawn i.i.d. from a univariate Gaussian:
$$p(\mathcal{D} \mid \mu) = \prod_{n=1}^{N} p(x_n \mid \mu)$$
Here, note that the data points $x_n$ are conditionally independent
given $\mu$ (tail-to-tail at $\mu$). But once we integrate $\mu$ out,
$p(\mathcal{D}) = \int p(\mathcal{D} \mid \mu)\,p(\mu)\,d\mu$ no longer factorizes:
the data points are not marginally independent!
Furthermore, the Bayesian polynomial regression model is another example of this i.i.d. structure.
The graph shows that $\hat{t}$ and $\mathbf{t}$ are conditionally independent given $\mathbf{w}$!
This is pretty intuitive: once the model parameter $\mathbf{w}$ is given, the predictive distribution
is independent of the training data.
This is exactly what we originally intended!
Chapter 8.2. Conditional Independence
18
Naรฏve Bayes model
I brought a detailed explanation of naïve Bayes from Wikipedia!
As we all know, input features are generally not independent.
However, naïve Bayes treats the input features as conditionally independent given the class $C_k$: $p(\mathbf{x} \mid C_k) = \prod_i p(x_i \mid C_k)$.
This is useful when we model data consisting of both discrete and continuous features:
we can model the discrete ones as multinomial and the continuous ones as Gaussian!
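A minimal Gaussian naïve Bayes sketch under that conditional-independence assumption (toy data, my own illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: two classes, two continuous features, class-conditional Gaussians.
X0 = rng.normal([0, 0], 1.0, size=(50, 2))
X1 = rng.normal([2, 2], 1.0, size=(50, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)

# 'Training': per-class priors and per-feature Gaussian parameters.
priors, means, stds = {}, {}, {}
for k in (0, 1):
    Xk = X[y == k]
    priors[k] = len(Xk) / len(X)
    means[k], stds[k] = Xk.mean(axis=0), Xk.std(axis=0)

def log_gauss(x, m, s):
    return -0.5 * np.log(2 * np.pi * s**2) - (x - m)**2 / (2 * s**2)

def predict(x):
    """argmax_k log p(C_k) + sum_i log p(x_i | C_k): features independent given C_k."""
    scores = {k: np.log(priors[k]) + log_gauss(x, means[k], stds[k]).sum()
              for k in (0, 1)}
    return max(scores, key=scores.get)

print(predict(np.array([0.1, -0.2])), predict(np.array([1.9, 2.2])))  # 0 1
```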
Chapter 8.2. Conditional Independence
19
Role of graphical model
A specific directed graph represents a specific decomposition of a joint probability distribution into a product of conditional probabilities.
We can think of the d-separation theorem, and the graph itself, as a filter on distributions:
the graph admits exactly those distributions that respect its factorization and conditional independences,
so we can express the overall distribution in a much simpler form.
There is also a concept called the 'Markov blanket' (or Markov boundary), which helps us simplify the distribution too.
In the conditional $p(x_i \mid \mathbf{x}_{\{j \neq i\}})$, all factors that do not depend on $x_i$,
either through a conditioning term or a probability term, cancel
out. The remaining terms are the ones that depend on $x_i$:
they involve only the parents, the children, and the co-parents of $x_i$.
This set is the Markov blanket shown in the right-hand figure, the
minimal set of nodes that isolates $x_i$ from the rest of the
graph.
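A minimal helper (my own sketch) that reads a Markov blanket off a DAG stored as parent lists:

```python
def markov_blanket(node, parents):
    """Parents, children, and co-parents of `node` in a DAG given as {child: [parents]}."""
    children = [c for c, ps in parents.items() if node in ps]
    co_parents = {p for c in children for p in parents[c]} - {node}
    return set(parents[node]) | set(children) | co_parents

# Toy DAG: a -> c <- b, c -> d
parents = {'a': [], 'b': [], 'c': ['a', 'b'], 'd': ['c']}
print(markov_blanket('c', parents))  # {'a', 'b', 'd'}
print(markov_blanket('a', parents))  # {'c', 'b'}  (b is a co-parent through c)
```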
Chapter 8.3. Markov Random Fields
20
Conditional independence properties
We have covered directed networks. Now, let's take a look at 'undirected' ones!
One major complication in directed networks was the presence of head-to-head nodes.
We can simplify matters by using an undirected network!
To check whether $A \perp\!\!\!\perp B \mid C$ holds, find all paths that connect $A$ and $B$.
Here, the statement holds because, if we remove all nodes in the set $C$, there does
not exist any remaining path that connects set $A$ and set $B$.
** This is my personal idea:
for me, it was much easier to understand the overall idea by thinking of each connection between two nodes as just a direct 'probabilistic relationship',
forgetting the conditional terms for now.
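A minimal sketch of that separation test (remove the nodes in C, then look for any remaining path), on a toy adjacency-list graph of my own:

```python
from collections import deque

def separated(graph, A, B, C):
    """True if every path from A to B in the undirected graph passes through C."""
    blocked, seen = set(C), set(A) - set(C)
    queue = deque(seen)
    while queue:                       # BFS that never enters a blocked node
        u = queue.popleft()
        for v in graph[u]:
            if v not in blocked and v not in seen:
                seen.add(v)
                queue.append(v)
    return not (seen & set(B))         # no node of B reachable => separated

# Toy undirected graph: a - c - b and a - d - b
graph = {'a': ['c', 'd'], 'b': ['c', 'd'], 'c': ['a', 'b'], 'd': ['a', 'b']}
print(separated(graph, {'a'}, {'b'}, {'c'}))        # False: path via d remains
print(separated(graph, {'a'}, {'b'}, {'c', 'd'}))   # True: all paths blocked
```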
Chapter 8.3. Markov Random Fields
21
Factorization properties
Here, we are trying to model the joint probability in a more practical way.
Let's see how a general probability distribution can be expressed with an undirected graph!
Conditional probability was expressed by arrows in the directed model.
Here, we need the concept of a 'clique': a fully connected subset of nodes.
We can think of cliques as the building blocks of the joint probability.
Let's denote a clique by $C$ and the set of variables in that clique by $\mathbf{x}_C$.
Furthermore, we define an arbitrary non-negative potential function over the maximal cliques, $\psi_C(\mathbf{x}_C)$.
That is, the joint probability can be expressed as the product of the potential functions of the maximal cliques:
$$p(\mathbf{x}) = \frac{1}{Z} \prod_C \psi_C(\mathbf{x}_C), \qquad Z = \sum_{\mathbf{x}} \prod_C \psi_C(\mathbf{x}_C)$$
Since the product of potentials is not itself a normalized probability, we need the normalizing constant $Z$ (the partition function).
However, for $M$ discrete nodes with $K$ states each, evaluating $Z$ requires summing over $K^M$ configurations, which grows exponentially.
Fortunately, we don't need to normalize the probability all the time! (An example will be covered soon.)
One popular choice is the Boltzmann distribution, $\psi_C(\mathbf{x}_C) = \exp\{-E(\mathbf{x}_C)\}$, with energy function $E(\mathbf{x}_C)$.
Here, potential functions do not have a specific probabilistic interpretation; rather, we can set them according to our intuition and purpose.
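A tiny sketch (toy potentials of my own) of the clique-potential factorization and its brute-force partition function:

```python
import itertools
import numpy as np

# Chain of 3 binary variables with pairwise maximal cliques {x1,x2}, {x2,x3}.
# Boltzmann-style potentials psi(a, b) = exp(J * a * b) favor agreement.
J = 1.0
def psi(a, b):
    s = lambda t: 2 * t - 1            # map {0,1} -> {-1,+1}
    return np.exp(J * s(a) * s(b))

def unnormalized(x):
    return psi(x[0], x[1]) * psi(x[1], x[2])

# Brute-force partition function: K^M = 2^3 terms here, exponential in general.
Z = sum(unnormalized(x) for x in itertools.product((0, 1), repeat=3))

def p(x):
    return unnormalized(x) / Z

print(p((1, 1, 1)), p((0, 1, 0)))   # aligned states are more probable
print(sum(p(x) for x in itertools.product((0, 1), repeat=3)))  # 1.0
```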
Chapter 8.3. Markov Random Fields
22
Example. Image de-noising
Let the noise-free original image be $\mathbf{t}$, with individual pixels $t_i \in \{-1, +1\}$.
We observe a noisy image with pixels $y_i$, and we compute an estimated image with pixels $x_i$,
iteratively erasing the noise.
The energy function has three kinds of terms:
$$E(\mathbf{x}, \mathbf{y}) = h \sum_i x_i - \beta \sum_{\{i, j\}} x_i x_j - \eta \sum_i x_i y_i$$
- The $\beta$ term says adjacent pixels should have similar values.
- The $\eta$ term penalizes differences from the raw (noisy) data.
- Taking the scalars $h, \beta, \eta \geq 0$ is the common setting.
As the iterations go on,
the overall energy should decrease,
and the joint probability $p(\mathbf{x}, \mathbf{y}) = \frac{1}{Z} \exp\{-E(\mathbf{x}, \mathbf{y})\}$ should increase.
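A compact sketch of ICM (iterated conditional modes), the coordinate-wise scheme PRML applies to this energy; the image and parameter values here are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
h, beta, eta = 0.0, 1.0, 2.1

# Toy 'observation': a half-black, half-white image with 10% flipped pixels.
t = np.ones((20, 20))
t[:, :10] = -1
y = t * np.where(rng.random(t.shape) < 0.1, -1, 1)

def local_energy_diff(x, y, i, j):
    """E(x_ij=+1) - E(x_ij=-1): only terms touching pixel (i, j) matter."""
    nb = sum(x[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
             if 0 <= a < x.shape[0] and 0 <= b < x.shape[1])
    return 2 * (h - beta * nb - eta * y[i, j])

x = y.copy()                      # initialize the estimate at the noisy image
for _ in range(5):                # a few full ICM sweeps
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # Set the pixel to whichever state has lower energy.
            x[i, j] = -1 if local_energy_diff(x, y, i, j) > 0 else 1

print((x != t).mean())            # fraction of pixels still wrong
```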
Chapter 8.3. Markov Random Fields
23
Relation to directed graphs
We have now covered both kinds of graphical models.
Directed graphs are good for modeling conditional probabilities, while undirected graphs give an intuitive and practical approach.
Let's find the connection between them.
In graph (a), the factor $p(x_4|x_1, x_2, x_3)$ involves all four variables,
so in the undirected version they must all belong to a single clique. We therefore 'marry' the parents by adding links
between them and drop the arrows, as in (b). This process is called moralization, and the result is called the moral graph.
Chapter 8.4. Inference in Graphical Models
24
Inference in graphical models
Let's think about how to get $p(x_n)$ from the joint probability of $(x_1, x_2, \ldots, x_N)$.
Intuitively, in the discrete case, we can marginalize out all the other variables of the joint probability.
Some of them might be observed, and some might not.
As a simple example, consider how to get $p(x \mid y)$ from a two-node graph $x \to y$ (an example of computing a posterior).
Using $p(x, y) = p(x)\,p(y \mid x)$, we can re-express it through Bayes' theorem:
$$p(x \mid y) = \frac{p(y \mid x)\,p(x)}{p(y)}, \qquad p(y) = \sum_{x'} p(y \mid x')\,p(x')$$
That was a simple example. Let's now consider a more complicated one: a chain of $N$ discrete nodes with $K$ states each, with joint distribution
$$p(\mathbf{x}) = \frac{1}{Z}\,\psi_{1,2}(x_1, x_2)\,\psi_{2,3}(x_2, x_3) \cdots \psi_{N-1,N}(x_{N-1}, x_N)$$
Since this model is much simpler than a fully connected graph (which would need on the order of $K^N$ parameters), it contains only $(N-1)K^2$ parameters.
To get the marginal density $p(x_n)$, we can simply sum the joint over all the other variables:
$$p(x_n) = \sum_{x_1} \cdots \sum_{x_{n-1}} \sum_{x_{n+1}} \cdots \sum_{x_N} p(\mathbf{x})$$
Chapter 8.4. Inference in Graphical Models
25
Inference on Chain
Consider the chain example we have just seen.
The summation over the last variable $x_N$ involves only the factor $\psi_{N-1,N}$, so it can be performed first, producing a message:
$$\mu_\beta(x_{N-1}) = \sum_{x_N} \psi_{N-1,N}(x_{N-1}, x_N)$$
So, to get the marginal distribution of $x_n$, which sits somewhere in the middle, we have to pass messages in from both ends of the chain:
$$p(x_n) = \frac{1}{Z}\,\mu_\alpha(x_n)\,\mu_\beta(x_n)$$
Chapter 8.4. Inference in Graphical Models
26
Inference on Chain
Here, ๐œ‡๐›ผ and ๐œ‡๐›ฝ are as follows..
This process of marginalizing can be called as โ€˜message passingโ€™.
This kind of one-step dependent approach,
We call this โ€˜Markov chainโ€™.
Letโ€™s think we want to compute ๐’‘ ๐’™๐Ÿ , ๐’‘ ๐’™๐Ÿ , โ€ฆ , ๐’‘(๐’™๐‘ต) respectively.
Then we have complexity of N X ๐‘ ๐พ2
= ๐‘‚(๐‘2
๐พ2
). Which has quadratic complexity with respect to number of elements.
Is it efficient? Obviously not. Because the bottom-up (๐œ‡๐›ผ) in ๐‘(๐‘ฅ๐‘โˆ’1) and ๐‘(๐‘ฅ๐‘) only have a single term difference!
Thus, to compute overall algorithm much efficiently, we have to store calculated values for each step.
If there exist an observed data in the process,
We do not need to sum up that variable. We only need to compute it into the equation!
That is, ๐‘ฅ๐‘˜ = ๐‘ฅ๐‘˜
Marginal density of joint
probability can be expressed by
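A minimal sketch of these forward and backward messages on a chain with random potentials (the shapes and names are my assumptions), cross-checked against brute-force marginalization:

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)
N, K = 6, 3
# One K x K potential table psi[n] linking x_n and x_{n+1} (0-indexed).
psi = [rng.random((K, K)) + 0.1 for _ in range(N - 1)]

# Forward messages mu_alpha and backward messages mu_beta, stored for reuse.
alpha = [np.ones(K) for _ in range(N)]
for n in range(1, N):
    alpha[n] = psi[n - 1].T @ alpha[n - 1]      # sums over x_{n-1}
beta = [np.ones(K) for _ in range(N)]
for n in range(N - 2, -1, -1):
    beta[n] = psi[n] @ beta[n + 1]              # sums over x_{n+1}

# Every marginal comes from the stored messages, no recomputation.
marginals = [alpha[n] * beta[n] for n in range(N)]
marginals = [m / m.sum() for m in marginals]    # dividing by Z normalizes

# Cross-check one marginal against brute-force enumeration of all K^N states.
brute = np.zeros(K)
for x in itertools.product(range(K), repeat=N):
    brute[x[2]] += np.prod([psi[n][x[n], x[n + 1]] for n in range(N - 1)])
print(np.round(marginals[2], 3))
print(np.round(brute / brute.sum(), 3))         # should match
```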
Chapter 8.4. Inference in Graphical Models
27
Trees
We can perform similar message passing on 'trees'.
We have seen various decision trees in many undergraduate classes;
here the structure of a tree is the same, but each node corresponds to a random variable.
Thus, the details of the tree structure itself need not be covered (maybe..?).
One special thing to note is that in a basic (directed) tree, every node has at most one parent.
A directed graph in which some nodes have more than one parent, but with at most one path between any two nodes, is called a polytree (figure c).
Note that a tree structure does not contain any loop: there is exactly one path between any pair of nodes.
Chapter 8.4. Inference in Graphical Models
28
Factor graphs
Consider 'soo-neung' (the Korean college entrance exam).
What do we try to measure? A student's capability of understanding and comprehension, i.e. their intelligence.
Can we measure intelligence directly? Of course not; it is an object that lives in a latent dimension we cannot observe.
Thus, we use proxy measures, such as exam scores or IQ, which can reflect one's intelligence.
The exam score is the data, and the intelligence is a factor behind it.
Now, let's extend this idea to data and probability.
We believe the joint probability of the data can be expressed as a product of factors:
$$p(\mathbf{x}) = \prod_s f_s(\mathbf{x}_s)$$
where each factor $f_s$ is a function of a corresponding subset of variables $\mathbf{x}_s$.
In the graph, in addition to the original variable nodes, we add a node for each factor.
As you can see, a factor graph is a bipartite graph:
a graph whose nodes split into two disjoint sets,
with every edge connecting a node of one set to a node of the other.
(Bipartite graph figure from Wikipedia!)
Chapter 8.4. Inference in Graphical Models
29
Examples of factor graphs
Undirected graph : a single maximal clique can be represented by one factor node $f$,
or split into several factor nodes; both are valid factor graphs.
As we can see, one undirected network can be expressed as
many different factor graphs.
Directed graph : similarly, one factor can absorb the whole conditional structure,
or the factorization can be split into one factor per conditional distribution.
As we can see, one directed network can also be expressed as
many different factor graphs.
The factor graph of a tree is again a tree, as in the undirected case. Factors can also be placed between variables!
Chapter 8.4. Inference in Graphical Models
30
The sum-product algorithm
My major interest is Graph Neural Networks (GNNs),
and I think understanding the overall GNN architecture helps in understanding this algorithm.
A GNN passes information along edges and aggregates the necessary information at each node.
(Figure from Jure Leskovec's CS224W, my favorite prof.)
For now, we don't need to understand what those neural networks
are. Rather, please focus on the idea that 'we are aggregating
information!'
In our probabilistic graph, we aggregate information
using sums and products.
This is called belief propagation, which is also known as the sum-
product algorithm.
Chapter 8.4. Inference in Graphical Models
31
The sum-product algorithm
As you can see,
we merge the information coming from each neighboring sub-tree via a product of messages:
$$p(x) = \prod_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x)$$
Here, please check that the information from each sub-tree is not decomposed any further;
we just treat it as a single incoming message (a function of $x$ alone).
Chapter 8.4. Inference in Graphical Models
32
The sum-product algorithm
Aggregation over edges : product
Aggregation over the factor $f_s$ values : sum
Together these give the factor-to-variable message
$$\mu_{f \to x}(x) = \sum_{x_1, \ldots, x_M} f(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f) \setminus x} \mu_{x_m \to f}(x_m)$$
The link $\mu_{x \to f}(x)$ is defined on the following page!
Chapter 8.4. Inference in Graphical Models
33
The sum-product algorithm
Aggregation over edges : product
Here, we do not need to consider the factor values:
$$\mu_{x \to f}(x) = \prod_{l \in \mathrm{ne}(x) \setminus f} \mu_{f_l \to x}(x)$$
Note that this link connects
$x$ to $f$!
Chapter 8.4. Inference in Graphical Models
34
The sum-product algorithm
Note that there are two kinds of links:
1. From factor to variable, $\mu_{f \to x}$ : this contains a summation over the factor's other variables, with a product of incoming messages.
2. From variable to factor, $\mu_{x \to f}$ : this only contains a product of incoming messages.
At the leaf nodes, the recursion starts with $\mu_{x \to f}(x) = 1$ for a leaf variable node and $\mu_{f \to x}(x) = f(x)$ for a leaf factor node.
Suppose we are trying to get the marginal probability for every node in the graph.
Performing a fresh propagation for every node from scratch is very inefficient.
Note that the message values are independent of which node has been designated as the root.
Thus, we can save the messages and also run the passing in the reverse order: two sweeps (leaves to root, then root to leaves) give every marginal.
For the set of variables belonging to one factor, we can compute the joint marginal as
$$p(\mathbf{x}_s) \propto f_s(\mathbf{x}_s) \prod_{i \in \mathrm{ne}(f_s)} \mu_{x_i \to f_s}(x_i)$$
As we have seen, the link from variable to factor is a simple product of the incoming factor-to-variable messages.
Thus, we can make the entire process simpler by eliminating the
variable-to-factor links and working with factor-to-variable messages only.
Chapter 8.4. Inference in Graphical Models
35
Normalization
If we start from a directed graph, the factorization is intrinsically a product of conditional probabilities, so we don't need to compute a normalizing constant: $Z = 1$.
For an undirected graph, we need to compute the normalization constant $Z$ to get a probability.
An easy way to find $Z$ is to normalize any one unnormalized marginal $\tilde{p}(x_i)$. Once $\tilde{p}(x_i)$ has been computed, we get
$$p(x_i) = \frac{\tilde{p}(x_i)}{\sum_{x_i} \tilde{p}(x_i)}, \qquad Z = \sum_{x_i} \tilde{p}(x_i)$$
Let's understand the overall algorithm with a simple example!
Simple example of the sum-product algorithm
Our goal of computation: the marginals of a small chain-like factor graph.
Chapter 8.4. Inference in Graphical Models
36
Here, let ๐‘ฅ3 be the root!
Chapter 8.4. Inference in Graphical Models
37
Example of sum-product algorithm
Now, let's see it with a specific probability!
So far we have treated every variable as unobserved.
Now, let's assume some variables $\mathbf{v}$ in the set are observed, with values $\hat{\mathbf{v}}$.
Then, we simply multiply the joint probability by indicator functions for the observed data: $p(\mathbf{x}) \prod_i I(v_i, \hat{v}_i)$,
where the indicator gives 1 for $v_i = \hat{v}_i$ and 0 otherwise.
This means we are computing $p(\mathbf{h}, \mathbf{v} = \hat{\mathbf{v}})$, so the summations over the observed $v_i$ collapse to a single term and can be ignored.
(Actually, for the observed condition, I couldn't get some of it intuitively. Someone who understands this notion well may
explain it instead of me :( )
Chapter 8.4. Inference in Graphical Models
38
Max-Sum Algorithm
In sum-product, we computed marginals of the joint distribution $p(\mathbf{x})$ on a factor graph.
Here, we are going to find the setting of the variables that has the largest joint probability.
The problem is that we cannot obtain it from the individual marginals $p(x_i)$ naively. The example below tells why.
Computing the marginal distributions of the table gives
$$p(x = 0) = 0.6,\; p(x = 1) = 0.4, \qquad p(y = 0) = 0.7,\; p(y = 1) = 0.3$$
There is a difference between the marginal max and the joint max:
maximizing each marginal separately need not give the configuration that maximizes the joint.
Thus, we have to use the joint max,
$$\mathbf{x}^{\max} = \arg\max_{\mathbf{x}} p(\mathbf{x})$$
Here, the algorithm has the same structure as sum-product:
the message passing and the other mechanisms are the same,
with the summation simply replaced by maximization.
Furthermore, we take the monotonic function log for computational convenience: products become sums, hence the name max-sum!
Chapter 8.4. Inference in Graphical Models
39
Max-Sum Algorithm
Everything goes in a similar way to sum-product:
summation is replaced by maximization, and products by sums of logs:
$$\mu_{f \to x}(x) = \max_{x_1, \ldots, x_M}\left[\ln f(x, x_1, \ldots, x_M) + \sum_m \mu_{x_m \to f}(x_m)\right], \qquad \mu_{x \to f}(x) = \sum_{l \in \mathrm{ne}(x) \setminus f} \mu_{f_l \to x}(x)$$
The initial values transmitted from the leaves are $\mu_{x \to f}(x) = 0$ and $\mu_{f \to x}(x) = \ln f(x)$.
The maximum probability is computed at the root; the corresponding $\mathbf{x}$ values are recovered by
a different kind of message passing: back-tracking.
Unlike a common MLE problem, the maximizing variables live in a complicated joint structure,
so we use an iterative (sequential) method to recover the maximizing configuration!
Chapter 8.4. Inference in Graphical Models
40
Max-Sum Algorithm
Initial value!
Here, we track back from the maximizing value of $x_N$ to compute the previous $x_{N-1}$:
$$x_{n-1}^{\max} = \phi(x_n^{\max})$$
where $\phi(x_n)$ stores, for each state of $x_n$, the state of $x_{n-1}$ that achieved the maximum. We then move along the black line in the figure, back-propagating the maximized values!
For efficient calculation, we store these maximizing states during the forward pass,
since they can be reused instead of re-solving for the
other variables.
A major application of this algorithm is the Hidden Markov Model, where max-sum is known as the Viterbi algorithm!
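A minimal max-sum sketch on the same kind of chain (random log-potentials, names mine), with back-tracking to recover the joint argmax; the result is cross-checked by brute force.

```python
import itertools
import numpy as np

rng = np.random.default_rng(5)
N, K = 6, 3
log_psi = [np.log(rng.random((K, K)) + 0.1) for _ in range(N - 1)]

# Forward pass: msg[k] = best log-score of x_1..x_n ending with x_n = k.
msg = np.zeros(K)
phi = []                                  # phi[n][k]: best predecessor of state k
for n in range(N - 1):
    scores = msg[:, None] + log_psi[n]    # indexed [x_n, x_{n+1}]
    phi.append(scores.argmax(axis=0))     # store the maximizing x_n per x_{n+1}
    msg = scores.max(axis=0)

# Back-tracking: start from the best final state, follow stored argmaxes.
x = [int(msg.argmax())]
for n in range(N - 2, -1, -1):
    x.append(int(phi[n][x[-1]]))
x.reverse()

# Brute-force check over all K^N configurations.
best = max(itertools.product(range(K), repeat=N),
           key=lambda z: sum(log_psi[n][z[n], z[n + 1]] for n in range(N - 1)))
print(x, list(best))                      # the two should agree
```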

More Related Content

What's hot

ๅค‰ๅˆ†ใƒ™ใ‚คใ‚บๆณ•ใฎ่ชฌๆ˜Ž
ๅค‰ๅˆ†ใƒ™ใ‚คใ‚บๆณ•ใฎ่ชฌๆ˜Žๅค‰ๅˆ†ใƒ™ใ‚คใ‚บๆณ•ใฎ่ชฌๆ˜Ž
ๅค‰ๅˆ†ใƒ™ใ‚คใ‚บๆณ•ใฎ่ชฌๆ˜ŽHaruka Ozaki
ย 
[DL่ผช่ชญไผš]Set Transformer: A Framework for Attention-based Permutation-Invariant...
[DL่ผช่ชญไผš]Set Transformer: A Framework for Attention-based Permutation-Invariant...[DL่ผช่ชญไผš]Set Transformer: A Framework for Attention-based Permutation-Invariant...
[DL่ผช่ชญไผš]Set Transformer: A Framework for Attention-based Permutation-Invariant...Deep Learning JP
ย 
PRML่ผช่ชญ#7
PRML่ผช่ชญ#7PRML่ผช่ชญ#7
PRML่ผช่ชญ#7matsuolab
ย 
PRML 8.2 ๆกไปถไป˜ใ็‹ฌ็ซ‹ๆ€ง
PRML 8.2 ๆกไปถไป˜ใ็‹ฌ็ซ‹ๆ€งPRML 8.2 ๆกไปถไป˜ใ็‹ฌ็ซ‹ๆ€ง
PRML 8.2 ๆกไปถไป˜ใ็‹ฌ็ซ‹ๆ€งsleepy_yoshi
ย 
8.4 ใ‚ฐใƒฉใƒ•ใ‚ฃใ‚ซใƒซใƒขใƒ‡ใƒซใซใ‚ˆใ‚‹ๆŽจ่ซ–
8.4 ใ‚ฐใƒฉใƒ•ใ‚ฃใ‚ซใƒซใƒขใƒ‡ใƒซใซใ‚ˆใ‚‹ๆŽจ่ซ–8.4 ใ‚ฐใƒฉใƒ•ใ‚ฃใ‚ซใƒซใƒขใƒ‡ใƒซใซใ‚ˆใ‚‹ๆŽจ่ซ–
8.4 ใ‚ฐใƒฉใƒ•ใ‚ฃใ‚ซใƒซใƒขใƒ‡ใƒซใซใ‚ˆใ‚‹ๆŽจ่ซ–sleepy_yoshi
ย 
ใ‚ฐใƒฉใƒ•ใ‚ฃใ‚ซใƒซใƒขใƒ‡ใƒซๅ…ฅ้–€
ใ‚ฐใƒฉใƒ•ใ‚ฃใ‚ซใƒซใƒขใƒ‡ใƒซๅ…ฅ้–€ใ‚ฐใƒฉใƒ•ใ‚ฃใ‚ซใƒซใƒขใƒ‡ใƒซๅ…ฅ้–€
ใ‚ฐใƒฉใƒ•ใ‚ฃใ‚ซใƒซใƒขใƒ‡ใƒซๅ…ฅ้–€Kawamoto_Kazuhiko
ย 
PRML็ฌฌ๏ผ™็ซ ใ€Œๆททๅˆใƒขใƒ‡ใƒซใจEMใ€
PRML็ฌฌ๏ผ™็ซ ใ€Œๆททๅˆใƒขใƒ‡ใƒซใจEMใ€PRML็ฌฌ๏ผ™็ซ ใ€Œๆททๅˆใƒขใƒ‡ใƒซใจEMใ€
PRML็ฌฌ๏ผ™็ซ ใ€Œๆททๅˆใƒขใƒ‡ใƒซใจEMใ€Keisuke Sugawara
ย 
ใƒ™ใ‚คใ‚บๆŽจ่ซ–ใซใ‚ˆใ‚‹ๆฉŸๆขฐๅญฆ็ฟ’ๅ…ฅ้–€ใ€€็ฌฌ๏ผ”็ซ 
ใƒ™ใ‚คใ‚บๆŽจ่ซ–ใซใ‚ˆใ‚‹ๆฉŸๆขฐๅญฆ็ฟ’ๅ…ฅ้–€ใ€€็ฌฌ๏ผ”็ซ ใƒ™ใ‚คใ‚บๆŽจ่ซ–ใซใ‚ˆใ‚‹ๆฉŸๆขฐๅญฆ็ฟ’ๅ…ฅ้–€ใ€€็ฌฌ๏ผ”็ซ 
ใƒ™ใ‚คใ‚บๆŽจ่ซ–ใซใ‚ˆใ‚‹ๆฉŸๆขฐๅญฆ็ฟ’ๅ…ฅ้–€ใ€€็ฌฌ๏ผ”็ซ YosukeAkasaka
ย 
PRML่ผช่ชญ#9
PRML่ผช่ชญ#9PRML่ผช่ชญ#9
PRML่ผช่ชญ#9matsuolab
ย 
ใ‚ฐใƒฉใƒ•ใ‚ฃใ‚ซใƒซ Lasso ใ‚’็”จใ„ใŸ็•ฐๅธธๆคœ็Ÿฅ
ใ‚ฐใƒฉใƒ•ใ‚ฃใ‚ซใƒซ Lasso ใ‚’็”จใ„ใŸ็•ฐๅธธๆคœ็Ÿฅใ‚ฐใƒฉใƒ•ใ‚ฃใ‚ซใƒซ Lasso ใ‚’็”จใ„ใŸ็•ฐๅธธๆคœ็Ÿฅ
ใ‚ฐใƒฉใƒ•ใ‚ฃใ‚ซใƒซ Lasso ใ‚’็”จใ„ใŸ็•ฐๅธธๆคœ็ŸฅYuya Takashina
ย 
็ ”็ฉถๅฎคๅ†…PRMLๅ‹‰ๅผทไผš 11็ซ 2-4็ฏ€
็ ”็ฉถๅฎคๅ†…PRMLๅ‹‰ๅผทไผš 11็ซ 2-4็ฏ€็ ”็ฉถๅฎคๅ†…PRMLๅ‹‰ๅผทไผš 11็ซ 2-4็ฏ€
็ ”็ฉถๅฎคๅ†…PRMLๅ‹‰ๅผทไผš 11็ซ 2-4็ฏ€Koji Matsuda
ย 
่จˆ็ฎ—่ซ–็š„ๅญฆ็ฟ’็†่ซ–ๅ…ฅ้–€ -PACๅญฆ็ฟ’ใจใ‹VCๆฌกๅ…ƒใจใ‹-
่จˆ็ฎ—่ซ–็š„ๅญฆ็ฟ’็†่ซ–ๅ…ฅ้–€ -PACๅญฆ็ฟ’ใจใ‹VCๆฌกๅ…ƒใจใ‹-่จˆ็ฎ—่ซ–็š„ๅญฆ็ฟ’็†่ซ–ๅ…ฅ้–€ -PACๅญฆ็ฟ’ใจใ‹VCๆฌกๅ…ƒใจใ‹-
่จˆ็ฎ—่ซ–็š„ๅญฆ็ฟ’็†่ซ–ๅ…ฅ้–€ -PACๅญฆ็ฟ’ใจใ‹VCๆฌกๅ…ƒใจใ‹-sleepy_yoshi
ย 
2 4.devianceใจๅฐคๅบฆๆฏ”ๆคœๅฎš
2 4.devianceใจๅฐคๅบฆๆฏ”ๆคœๅฎš2 4.devianceใจๅฐคๅบฆๆฏ”ๆคœๅฎš
2 4.devianceใจๅฐคๅบฆๆฏ”ๆคœๅฎšlogics-of-blue
ย 
ใ€Œ็ตฑ่จˆ็š„ๅญฆ็ฟ’็†่ซ–ใ€็ฌฌ1็ซ 
ใ€Œ็ตฑ่จˆ็š„ๅญฆ็ฟ’็†่ซ–ใ€็ฌฌ1็ซ ใ€Œ็ตฑ่จˆ็š„ๅญฆ็ฟ’็†่ซ–ใ€็ฌฌ1็ซ 
ใ€Œ็ตฑ่จˆ็š„ๅญฆ็ฟ’็†่ซ–ใ€็ฌฌ1็ซ Kota Matsui
ย 
PRMLๅญฆ็ฟ’่€…ใ‹ใ‚‰ๅ…ฅใ‚‹ๆทฑๅฑค็”Ÿๆˆใƒขใƒ‡ใƒซๅ…ฅ้–€
PRMLๅญฆ็ฟ’่€…ใ‹ใ‚‰ๅ…ฅใ‚‹ๆทฑๅฑค็”Ÿๆˆใƒขใƒ‡ใƒซๅ…ฅ้–€PRMLๅญฆ็ฟ’่€…ใ‹ใ‚‰ๅ…ฅใ‚‹ๆทฑๅฑค็”Ÿๆˆใƒขใƒ‡ใƒซๅ…ฅ้–€
PRMLๅญฆ็ฟ’่€…ใ‹ใ‚‰ๅ…ฅใ‚‹ๆทฑๅฑค็”Ÿๆˆใƒขใƒ‡ใƒซๅ…ฅ้–€tmtm otm
ย 
ใƒ‘ใ‚ฟใƒผใƒณ่ช่ญ˜ใจๆฉŸๆขฐๅญฆ็ฟ’ ยง6.2 ใ‚ซใƒผใƒใƒซ้–ขๆ•ฐใฎๆง‹ๆˆ
ใƒ‘ใ‚ฟใƒผใƒณ่ช่ญ˜ใจๆฉŸๆขฐๅญฆ็ฟ’ ยง6.2 ใ‚ซใƒผใƒใƒซ้–ขๆ•ฐใฎๆง‹ๆˆใƒ‘ใ‚ฟใƒผใƒณ่ช่ญ˜ใจๆฉŸๆขฐๅญฆ็ฟ’ ยง6.2 ใ‚ซใƒผใƒใƒซ้–ขๆ•ฐใฎๆง‹ๆˆ
ใƒ‘ใ‚ฟใƒผใƒณ่ช่ญ˜ใจๆฉŸๆขฐๅญฆ็ฟ’ ยง6.2 ใ‚ซใƒผใƒใƒซ้–ขๆ•ฐใฎๆง‹ๆˆPrunus 1350
ย 
ใ‚ˆใ†ใ‚„ใๅˆ†ใ‹ใฃใŸ๏ผๆœ€ๅฐคๆŽจๅฎšใจใƒ™ใ‚คใ‚บๆŽจๅฎš
ใ‚ˆใ†ใ‚„ใๅˆ†ใ‹ใฃใŸ๏ผๆœ€ๅฐคๆŽจๅฎšใจใƒ™ใ‚คใ‚บๆŽจๅฎšใ‚ˆใ†ใ‚„ใๅˆ†ใ‹ใฃใŸ๏ผๆœ€ๅฐคๆŽจๅฎšใจใƒ™ใ‚คใ‚บๆŽจๅฎš
ใ‚ˆใ†ใ‚„ใๅˆ†ใ‹ใฃใŸ๏ผๆœ€ๅฐคๆŽจๅฎšใจใƒ™ใ‚คใ‚บๆŽจๅฎšAkira Masuda
ย 
PRML่ผช่ชญ#12
PRML่ผช่ชญ#12PRML่ผช่ชญ#12
PRML่ผช่ชญ#12matsuolab
ย 

What's hot (20)

ๅค‰ๅˆ†ใƒ™ใ‚คใ‚บๆณ•ใฎ่ชฌๆ˜Ž
ๅค‰ๅˆ†ใƒ™ใ‚คใ‚บๆณ•ใฎ่ชฌๆ˜Žๅค‰ๅˆ†ใƒ™ใ‚คใ‚บๆณ•ใฎ่ชฌๆ˜Ž
ๅค‰ๅˆ†ใƒ™ใ‚คใ‚บๆณ•ใฎ่ชฌๆ˜Ž
ย 
Prml 10 1
Prml 10 1Prml 10 1
Prml 10 1
ย 
[DL่ผช่ชญไผš]Set Transformer: A Framework for Attention-based Permutation-Invariant...
[DL่ผช่ชญไผš]Set Transformer: A Framework for Attention-based Permutation-Invariant...[DL่ผช่ชญไผš]Set Transformer: A Framework for Attention-based Permutation-Invariant...
[DL่ผช่ชญไผš]Set Transformer: A Framework for Attention-based Permutation-Invariant...
ย 
PRML่ผช่ชญ#7
PRML่ผช่ชญ#7PRML่ผช่ชญ#7
PRML่ผช่ชญ#7
ย 
PRML 8.2 ๆกไปถไป˜ใ็‹ฌ็ซ‹ๆ€ง
PRML 8.2 ๆกไปถไป˜ใ็‹ฌ็ซ‹ๆ€งPRML 8.2 ๆกไปถไป˜ใ็‹ฌ็ซ‹ๆ€ง
PRML 8.2 ๆกไปถไป˜ใ็‹ฌ็ซ‹ๆ€ง
ย 
8.4 ใ‚ฐใƒฉใƒ•ใ‚ฃใ‚ซใƒซใƒขใƒ‡ใƒซใซใ‚ˆใ‚‹ๆŽจ่ซ–
8.4 ใ‚ฐใƒฉใƒ•ใ‚ฃใ‚ซใƒซใƒขใƒ‡ใƒซใซใ‚ˆใ‚‹ๆŽจ่ซ–8.4 ใ‚ฐใƒฉใƒ•ใ‚ฃใ‚ซใƒซใƒขใƒ‡ใƒซใซใ‚ˆใ‚‹ๆŽจ่ซ–
8.4 ใ‚ฐใƒฉใƒ•ใ‚ฃใ‚ซใƒซใƒขใƒ‡ใƒซใซใ‚ˆใ‚‹ๆŽจ่ซ–
ย 
ใ‚ฐใƒฉใƒ•ใ‚ฃใ‚ซใƒซใƒขใƒ‡ใƒซๅ…ฅ้–€
ใ‚ฐใƒฉใƒ•ใ‚ฃใ‚ซใƒซใƒขใƒ‡ใƒซๅ…ฅ้–€ใ‚ฐใƒฉใƒ•ใ‚ฃใ‚ซใƒซใƒขใƒ‡ใƒซๅ…ฅ้–€
ใ‚ฐใƒฉใƒ•ใ‚ฃใ‚ซใƒซใƒขใƒ‡ใƒซๅ…ฅ้–€
ย 
PRML็ฌฌ๏ผ™็ซ ใ€Œๆททๅˆใƒขใƒ‡ใƒซใจEMใ€
PRML็ฌฌ๏ผ™็ซ ใ€Œๆททๅˆใƒขใƒ‡ใƒซใจEMใ€PRML็ฌฌ๏ผ™็ซ ใ€Œๆททๅˆใƒขใƒ‡ใƒซใจEMใ€
PRML็ฌฌ๏ผ™็ซ ใ€Œๆททๅˆใƒขใƒ‡ใƒซใจEMใ€
ย 
ใƒ™ใ‚คใ‚บๆŽจ่ซ–ใซใ‚ˆใ‚‹ๆฉŸๆขฐๅญฆ็ฟ’ๅ…ฅ้–€ใ€€็ฌฌ๏ผ”็ซ 
ใƒ™ใ‚คใ‚บๆŽจ่ซ–ใซใ‚ˆใ‚‹ๆฉŸๆขฐๅญฆ็ฟ’ๅ…ฅ้–€ใ€€็ฌฌ๏ผ”็ซ ใƒ™ใ‚คใ‚บๆŽจ่ซ–ใซใ‚ˆใ‚‹ๆฉŸๆขฐๅญฆ็ฟ’ๅ…ฅ้–€ใ€€็ฌฌ๏ผ”็ซ 
ใƒ™ใ‚คใ‚บๆŽจ่ซ–ใซใ‚ˆใ‚‹ๆฉŸๆขฐๅญฆ็ฟ’ๅ…ฅ้–€ใ€€็ฌฌ๏ผ”็ซ 
ย 
EMใ‚ขใƒซใ‚ดใƒชใ‚บใƒ 
EMใ‚ขใƒซใ‚ดใƒชใ‚บใƒ EMใ‚ขใƒซใ‚ดใƒชใ‚บใƒ 
EMใ‚ขใƒซใ‚ดใƒชใ‚บใƒ 
ย 
PRML่ผช่ชญ#9
PRML่ผช่ชญ#9PRML่ผช่ชญ#9
PRML่ผช่ชญ#9
ย 
ใ‚ฐใƒฉใƒ•ใ‚ฃใ‚ซใƒซ Lasso ใ‚’็”จใ„ใŸ็•ฐๅธธๆคœ็Ÿฅ
ใ‚ฐใƒฉใƒ•ใ‚ฃใ‚ซใƒซ Lasso ใ‚’็”จใ„ใŸ็•ฐๅธธๆคœ็Ÿฅใ‚ฐใƒฉใƒ•ใ‚ฃใ‚ซใƒซ Lasso ใ‚’็”จใ„ใŸ็•ฐๅธธๆคœ็Ÿฅ
ใ‚ฐใƒฉใƒ•ใ‚ฃใ‚ซใƒซ Lasso ใ‚’็”จใ„ใŸ็•ฐๅธธๆคœ็Ÿฅ
ย 
็ ”็ฉถๅฎคๅ†…PRMLๅ‹‰ๅผทไผš 11็ซ 2-4็ฏ€
็ ”็ฉถๅฎคๅ†…PRMLๅ‹‰ๅผทไผš 11็ซ 2-4็ฏ€็ ”็ฉถๅฎคๅ†…PRMLๅ‹‰ๅผทไผš 11็ซ 2-4็ฏ€
็ ”็ฉถๅฎคๅ†…PRMLๅ‹‰ๅผทไผš 11็ซ 2-4็ฏ€
ย 
่จˆ็ฎ—่ซ–็š„ๅญฆ็ฟ’็†่ซ–ๅ…ฅ้–€ -PACๅญฆ็ฟ’ใจใ‹VCๆฌกๅ…ƒใจใ‹-
่จˆ็ฎ—่ซ–็š„ๅญฆ็ฟ’็†่ซ–ๅ…ฅ้–€ -PACๅญฆ็ฟ’ใจใ‹VCๆฌกๅ…ƒใจใ‹-่จˆ็ฎ—่ซ–็š„ๅญฆ็ฟ’็†่ซ–ๅ…ฅ้–€ -PACๅญฆ็ฟ’ใจใ‹VCๆฌกๅ…ƒใจใ‹-
่จˆ็ฎ—่ซ–็š„ๅญฆ็ฟ’็†่ซ–ๅ…ฅ้–€ -PACๅญฆ็ฟ’ใจใ‹VCๆฌกๅ…ƒใจใ‹-
ย 
2 4.devianceใจๅฐคๅบฆๆฏ”ๆคœๅฎš
2 4.devianceใจๅฐคๅบฆๆฏ”ๆคœๅฎš2 4.devianceใจๅฐคๅบฆๆฏ”ๆคœๅฎš
2 4.devianceใจๅฐคๅบฆๆฏ”ๆคœๅฎš
ย 
ใ€Œ็ตฑ่จˆ็š„ๅญฆ็ฟ’็†่ซ–ใ€็ฌฌ1็ซ 
ใ€Œ็ตฑ่จˆ็š„ๅญฆ็ฟ’็†่ซ–ใ€็ฌฌ1็ซ ใ€Œ็ตฑ่จˆ็š„ๅญฆ็ฟ’็†่ซ–ใ€็ฌฌ1็ซ 
ใ€Œ็ตฑ่จˆ็š„ๅญฆ็ฟ’็†่ซ–ใ€็ฌฌ1็ซ 
ย 
PRMLๅญฆ็ฟ’่€…ใ‹ใ‚‰ๅ…ฅใ‚‹ๆทฑๅฑค็”Ÿๆˆใƒขใƒ‡ใƒซๅ…ฅ้–€
PRMLๅญฆ็ฟ’่€…ใ‹ใ‚‰ๅ…ฅใ‚‹ๆทฑๅฑค็”Ÿๆˆใƒขใƒ‡ใƒซๅ…ฅ้–€PRMLๅญฆ็ฟ’่€…ใ‹ใ‚‰ๅ…ฅใ‚‹ๆทฑๅฑค็”Ÿๆˆใƒขใƒ‡ใƒซๅ…ฅ้–€
PRMLๅญฆ็ฟ’่€…ใ‹ใ‚‰ๅ…ฅใ‚‹ๆทฑๅฑค็”Ÿๆˆใƒขใƒ‡ใƒซๅ…ฅ้–€
ย 
ใƒ‘ใ‚ฟใƒผใƒณ่ช่ญ˜ใจๆฉŸๆขฐๅญฆ็ฟ’ ยง6.2 ใ‚ซใƒผใƒใƒซ้–ขๆ•ฐใฎๆง‹ๆˆ
ใƒ‘ใ‚ฟใƒผใƒณ่ช่ญ˜ใจๆฉŸๆขฐๅญฆ็ฟ’ ยง6.2 ใ‚ซใƒผใƒใƒซ้–ขๆ•ฐใฎๆง‹ๆˆใƒ‘ใ‚ฟใƒผใƒณ่ช่ญ˜ใจๆฉŸๆขฐๅญฆ็ฟ’ ยง6.2 ใ‚ซใƒผใƒใƒซ้–ขๆ•ฐใฎๆง‹ๆˆ
ใƒ‘ใ‚ฟใƒผใƒณ่ช่ญ˜ใจๆฉŸๆขฐๅญฆ็ฟ’ ยง6.2 ใ‚ซใƒผใƒใƒซ้–ขๆ•ฐใฎๆง‹ๆˆ
ย 
ใ‚ˆใ†ใ‚„ใๅˆ†ใ‹ใฃใŸ๏ผๆœ€ๅฐคๆŽจๅฎšใจใƒ™ใ‚คใ‚บๆŽจๅฎš
ใ‚ˆใ†ใ‚„ใๅˆ†ใ‹ใฃใŸ๏ผๆœ€ๅฐคๆŽจๅฎšใจใƒ™ใ‚คใ‚บๆŽจๅฎšใ‚ˆใ†ใ‚„ใๅˆ†ใ‹ใฃใŸ๏ผๆœ€ๅฐคๆŽจๅฎšใจใƒ™ใ‚คใ‚บๆŽจๅฎš
ใ‚ˆใ†ใ‚„ใๅˆ†ใ‹ใฃใŸ๏ผๆœ€ๅฐคๆŽจๅฎšใจใƒ™ใ‚คใ‚บๆŽจๅฎš
ย 
PRML่ผช่ชญ#12
PRML่ผช่ชญ#12PRML่ผช่ชญ#12
PRML่ผช่ชญ#12
ย 

Similar to PRML Chapter 8

PRML Chapter 1
PRML Chapter 1PRML Chapter 1
PRML Chapter 1Sunwoo Kim
ย 
PRML Chapter 12
PRML Chapter 12PRML Chapter 12
PRML Chapter 12Sunwoo Kim
ย 
PRML Chapter 5
PRML Chapter 5PRML Chapter 5
PRML Chapter 5Sunwoo Kim
ย 
PRML Chapter 7
PRML Chapter 7PRML Chapter 7
PRML Chapter 7Sunwoo Kim
ย 
PRML Chapter 6
PRML Chapter 6PRML Chapter 6
PRML Chapter 6Sunwoo Kim
ย 
A PROBABILISTIC ALGORITHM OF COMPUTING THE POLYNOMIAL GREATEST COMMON DIVISOR...
A PROBABILISTIC ALGORITHM OF COMPUTING THE POLYNOMIAL GREATEST COMMON DIVISOR...A PROBABILISTIC ALGORITHM OF COMPUTING THE POLYNOMIAL GREATEST COMMON DIVISOR...
A PROBABILISTIC ALGORITHM OF COMPUTING THE POLYNOMIAL GREATEST COMMON DIVISOR...ijscmcj
ย 
PRML Chapter 3
PRML Chapter 3PRML Chapter 3
PRML Chapter 3Sunwoo Kim
ย 
PRML Chapter 4
PRML Chapter 4PRML Chapter 4
PRML Chapter 4Sunwoo Kim
ย 
PRML Chapter 2
PRML Chapter 2PRML Chapter 2
PRML Chapter 2Sunwoo Kim
ย 
Application of Graphic LASSO in Portfolio Optimization_Yixuan Chen & Mengxi J...
Application of Graphic LASSO in Portfolio Optimization_Yixuan Chen & Mengxi J...Application of Graphic LASSO in Portfolio Optimization_Yixuan Chen & Mengxi J...
Application of Graphic LASSO in Portfolio Optimization_Yixuan Chen & Mengxi J...Mengxi Jiang
ย 
Heteroscedasticity Remedial Measures.pptx
Heteroscedasticity Remedial Measures.pptxHeteroscedasticity Remedial Measures.pptx
Heteroscedasticity Remedial Measures.pptxPatilDevendra5
ย 
Get Multiple Regression Assignment Help
Get Multiple Regression Assignment Help Get Multiple Regression Assignment Help
Get Multiple Regression Assignment Help HelpWithAssignment.com
ย 
Heteroscedasticity Remedial Measures.pptx
Heteroscedasticity Remedial Measures.pptxHeteroscedasticity Remedial Measures.pptx
Heteroscedasticity Remedial Measures.pptxDevendraRavindraPati
ย 
PRML Chapter 11
PRML Chapter 11PRML Chapter 11
PRML Chapter 11Sunwoo Kim
ย 
Data classification sammer
Data classification sammer Data classification sammer
Data classification sammer Sammer Qader
ย 
Data Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
Data Science - Part XII - Ridge Regression, LASSO, and Elastic NetsData Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
Data Science - Part XII - Ridge Regression, LASSO, and Elastic NetsDerek Kane
ย 
"Automatic Variational Inference in Stan" NIPS2015_yomi2016-01-20
"Automatic Variational Inference in Stan" NIPS2015_yomi2016-01-20"Automatic Variational Inference in Stan" NIPS2015_yomi2016-01-20
"Automatic Variational Inference in Stan" NIPS2015_yomi2016-01-20Yuta Kashino
ย 

Similar to PRML Chapter 8 (20)

PRML Chapter 1
PRML Chapter 1PRML Chapter 1
PRML Chapter 1
ย 
PRML Chapter 12
PRML Chapter 12PRML Chapter 12
PRML Chapter 12
ย 
PRML Chapter 5
PRML Chapter 5PRML Chapter 5
PRML Chapter 5
ย 
PRML Chapter 7
PRML Chapter 7PRML Chapter 7
PRML Chapter 7
ย 
PRML Chapter 6
PRML Chapter 6PRML Chapter 6
PRML Chapter 6
ย 
A PROBABILISTIC ALGORITHM OF COMPUTING THE POLYNOMIAL GREATEST COMMON DIVISOR...
A PROBABILISTIC ALGORITHM OF COMPUTING THE POLYNOMIAL GREATEST COMMON DIVISOR...A PROBABILISTIC ALGORITHM OF COMPUTING THE POLYNOMIAL GREATEST COMMON DIVISOR...
A PROBABILISTIC ALGORITHM OF COMPUTING THE POLYNOMIAL GREATEST COMMON DIVISOR...
ย 
PRML Chapter 3
PRML Chapter 3PRML Chapter 3
PRML Chapter 3
ย 
PRML Chapter 4
PRML Chapter 4PRML Chapter 4
PRML Chapter 4
ย 
PRML Chapter 2
PRML Chapter 2PRML Chapter 2
PRML Chapter 2
ย 
Application of Graphic LASSO in Portfolio Optimization_Yixuan Chen & Mengxi J...
Application of Graphic LASSO in Portfolio Optimization_Yixuan Chen & Mengxi J...Application of Graphic LASSO in Portfolio Optimization_Yixuan Chen & Mengxi J...
Application of Graphic LASSO in Portfolio Optimization_Yixuan Chen & Mengxi J...
ย 
Heteroscedasticity Remedial Measures.pptx
Heteroscedasticity Remedial Measures.pptxHeteroscedasticity Remedial Measures.pptx
Heteroscedasticity Remedial Measures.pptx
ย 
Get Multiple Regression Assignment Help
Get Multiple Regression Assignment Help Get Multiple Regression Assignment Help
Get Multiple Regression Assignment Help
ย 
9057263
90572639057263
9057263
ย 
Heteroscedasticity Remedial Measures.pptx
Heteroscedasticity Remedial Measures.pptxHeteroscedasticity Remedial Measures.pptx
Heteroscedasticity Remedial Measures.pptx
ย 
JISA_Paper
JISA_PaperJISA_Paper
JISA_Paper
ย 
PRML Chapter 11
PRML Chapter 11PRML Chapter 11
PRML Chapter 11
ย 
2. diagnostics, collinearity, transformation, and missing data
2. diagnostics, collinearity, transformation, and missing data 2. diagnostics, collinearity, transformation, and missing data
2. diagnostics, collinearity, transformation, and missing data
ย 
Data classification sammer
Data classification sammer Data classification sammer
Data classification sammer
ย 
Data Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
Data Science - Part XII - Ridge Regression, LASSO, and Elastic NetsData Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
Data Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
ย 
"Automatic Variational Inference in Stan" NIPS2015_yomi2016-01-20
"Automatic Variational Inference in Stan" NIPS2015_yomi2016-01-20"Automatic Variational Inference in Stan" NIPS2015_yomi2016-01-20
"Automatic Variational Inference in Stan" NIPS2015_yomi2016-01-20
ย 

Recently uploaded

High Class Call Girls Noida Sector 39 Aarushi ๐Ÿ”8264348440๐Ÿ” Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi ๐Ÿ”8264348440๐Ÿ” Independent Escort...High Class Call Girls Noida Sector 39 Aarushi ๐Ÿ”8264348440๐Ÿ” Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi ๐Ÿ”8264348440๐Ÿ” Independent Escort...soniya singh
ย 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
ย 
ๅฎšๅˆถ่‹ฑๅ›ฝ็™ฝ้‡‘ๆฑ‰ๅคงๅญฆๆฏ•ไธš่ฏ๏ผˆUCBๆฏ•ไธš่ฏไนฆ๏ผ‰ ๆˆ็ปฉๅ•ๅŽŸ็‰ˆไธ€ๆฏ”ไธ€
ๅฎšๅˆถ่‹ฑๅ›ฝ็™ฝ้‡‘ๆฑ‰ๅคงๅญฆๆฏ•ไธš่ฏ๏ผˆUCBๆฏ•ไธš่ฏไนฆ๏ผ‰																			ๆˆ็ปฉๅ•ๅŽŸ็‰ˆไธ€ๆฏ”ไธ€ๅฎšๅˆถ่‹ฑๅ›ฝ็™ฝ้‡‘ๆฑ‰ๅคงๅญฆๆฏ•ไธš่ฏ๏ผˆUCBๆฏ•ไธš่ฏไนฆ๏ผ‰																			ๆˆ็ปฉๅ•ๅŽŸ็‰ˆไธ€ๆฏ”ไธ€
ๅฎšๅˆถ่‹ฑๅ›ฝ็™ฝ้‡‘ๆฑ‰ๅคงๅญฆๆฏ•ไธš่ฏ๏ผˆUCBๆฏ•ไธš่ฏไนฆ๏ผ‰ ๆˆ็ปฉๅ•ๅŽŸ็‰ˆไธ€ๆฏ”ไธ€ffjhghh
ย 
๊งโค Aerocity Call Girls Service Aerocity Delhi โค๊ง‚ 9999965857 โ˜Ž๏ธ Hard And Sexy ...
๊งโค Aerocity Call Girls Service Aerocity Delhi โค๊ง‚ 9999965857 โ˜Ž๏ธ Hard And Sexy ...๊งโค Aerocity Call Girls Service Aerocity Delhi โค๊ง‚ 9999965857 โ˜Ž๏ธ Hard And Sexy ...
๊งโค Aerocity Call Girls Service Aerocity Delhi โค๊ง‚ 9999965857 โ˜Ž๏ธ Hard And Sexy ...Call Girls In Delhi Whatsup 9873940964 Enjoy Unlimited Pleasure
ย 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
ย 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
ย 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
ย 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
ย 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
ย 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
ย 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxTanveerAhmed817946
ย 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
ย 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
ย 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
ย 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
ย 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
ย 
Full night ๐Ÿฅต Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy โœŒ๏ธo...
Full night ๐Ÿฅต Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy โœŒ๏ธo...Full night ๐Ÿฅต Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy โœŒ๏ธo...
Full night ๐Ÿฅต Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy โœŒ๏ธo...shivangimorya083
ย 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
ย 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
ย 

Recently uploaded (20)

High Class Call Girls Noida Sector 39 Aarushi ๐Ÿ”8264348440๐Ÿ” Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi ๐Ÿ”8264348440๐Ÿ” Independent Escort...High Class Call Girls Noida Sector 39 Aarushi ๐Ÿ”8264348440๐Ÿ” Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi ๐Ÿ”8264348440๐Ÿ” Independent Escort...
ย 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
ย 
ๅฎšๅˆถ่‹ฑๅ›ฝ็™ฝ้‡‘ๆฑ‰ๅคงๅญฆๆฏ•ไธš่ฏ๏ผˆUCBๆฏ•ไธš่ฏไนฆ๏ผ‰ ๆˆ็ปฉๅ•ๅŽŸ็‰ˆไธ€ๆฏ”ไธ€
ๅฎšๅˆถ่‹ฑๅ›ฝ็™ฝ้‡‘ๆฑ‰ๅคงๅญฆๆฏ•ไธš่ฏ๏ผˆUCBๆฏ•ไธš่ฏไนฆ๏ผ‰																			ๆˆ็ปฉๅ•ๅŽŸ็‰ˆไธ€ๆฏ”ไธ€ๅฎšๅˆถ่‹ฑๅ›ฝ็™ฝ้‡‘ๆฑ‰ๅคงๅญฆๆฏ•ไธš่ฏ๏ผˆUCBๆฏ•ไธš่ฏไนฆ๏ผ‰																			ๆˆ็ปฉๅ•ๅŽŸ็‰ˆไธ€ๆฏ”ไธ€
ๅฎšๅˆถ่‹ฑๅ›ฝ็™ฝ้‡‘ๆฑ‰ๅคงๅญฆๆฏ•ไธš่ฏ๏ผˆUCBๆฏ•ไธš่ฏไนฆ๏ผ‰ ๆˆ็ปฉๅ•ๅŽŸ็‰ˆไธ€ๆฏ”ไธ€
ย 
๊งโค Aerocity Call Girls Service Aerocity Delhi โค๊ง‚ 9999965857 โ˜Ž๏ธ Hard And Sexy ...
๊งโค Aerocity Call Girls Service Aerocity Delhi โค๊ง‚ 9999965857 โ˜Ž๏ธ Hard And Sexy ...๊งโค Aerocity Call Girls Service Aerocity Delhi โค๊ง‚ 9999965857 โ˜Ž๏ธ Hard And Sexy ...
๊งโค Aerocity Call Girls Service Aerocity Delhi โค๊ง‚ 9999965857 โ˜Ž๏ธ Hard And Sexy ...
ย 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
ย 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
ย 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
ย 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
ย 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
ย 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
ย 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
ย 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
ย 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
ย 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
ย 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
ย 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
ย 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
ย 
Full night ๐Ÿฅต Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy โœŒ๏ธo...
Full night ๐Ÿฅต Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy โœŒ๏ธo...Full night ๐Ÿฅต Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy โœŒ๏ธo...
Full night ๐Ÿฅต Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy โœŒ๏ธo...
ย 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
ย 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
ย 

PRML Chapter 8

  • 1. Chapter 8 Reviewer : Sunwoo Kim Christopher M. Bishop Pattern Recognition and Machine Learning Yonsei University Department of Applied Statistics
  • 2. Chapter 8. Probabilistic Graphical Models 2 Expressing probability in a simple way Consider following joint probability ๐‘(๐‘ฅ1)๐‘ ๐‘ฅ2 ๐‘ฅ1 ๐‘ ๐‘ฅ3 ๐‘ฅ2, ๐‘ฅ1 ๐‘ ๐‘ฅ4 ๐‘ฅ3, ๐‘ฅ2, ๐‘ฅ1 โ€ฆ ๐‘(๐‘ฅ๐‘˜|๐‘ฅ๐‘˜โˆ’1, โ€ฆ , ๐‘ฅ1) What is it equal to? Answer is ๐‘(๐‘ฅ1, ๐‘ฅ2, โ€ฆ , ๐‘ฅ๐‘˜) Isnโ€™t it really troublesome to write all these variables in such a way. Thus, we can think of easy way by using visualization tool, Which is called a probabilistic graphical model. Node : Random Variable Edge : Probabilistic relationships Directed graphical models : A graph which has a direction in edge. - Good in capturing casual relationships(conditional terms) Undirected graphical models : A graph which its edge does not carry direction. - Good in expressing soft constraints Now, letโ€™s take an example!
  • 3. Chapter 8.1. Bayesian Networks 3 Modeling joint probability Basic idea of graphical model can be explained by this simple example! ๐’‘ ๐’‚, ๐’ƒ, ๐’„ = ๐’‘ ๐’„ ๐’‚, ๐’ƒ ๐’‘ ๐’ƒ ๐’‚ ๐’‘(๐’‚) Note that right-hand side of an equation is not symmetric anymore. Arrow direction : From conditional term to random variable For the complicated model, such as ๐‘ ๐‘ฅ1, ๐‘ฅ2, โ€ฆ , ๐‘ฅ๐พ = ๐‘ ๐‘ฅ๐พ ๐‘ฅ1, โ€ฆ , ๐‘ฅ๐พโˆ’1 โ€ฆ ๐‘ ๐‘ฅ2 ๐‘ฅ1 ๐‘(๐‘ฅ1) This is called a fully connected, since there is a link between every pair. There are some much complicated form likeโ€ฆ In general, we can sayโ€ฆ Like left figure, if there does not exist any cycle, such network is called directed acyclic network.
  • 4. Chapter 8.1. Bayesian Networks 4 Polynomial regression Letโ€™s think of Bayesian polynomial regression with N-independent data. We assume a distribution of parameter ๐’˜. Overall equation can be left-hand side equation, and corresponding figures are right-hand side. Original form Simplified form Original equation Letโ€™s think of model with more parameters (param for prior and variance.) Note that this blue box indicates N number of observations, and they are expressed in a form of joint product!
  • 5. Chapter 8.1. Bayesian Networks 5 Representation of Observed data Data ๐’• might be observed, or not. Left hand side expresses the general form, and the right one shows the case of observed data. Suppose we are trying to predict a new data ๐‘ฅ! Here, joint probability can be expressed by To exactly generate predictive distribution, we needโ€ฆ Remember some Laplace approximation and other integral methods!
  • 6. Chapter 8.1. Bayesian Networks 6 Generative models In chapter 11, we are going to cover some sampling methods. We may need to sample data from a distribution! For example, from joint probability ๐‘(๐‘ฅ1, ๐‘ฅ2, โ€ฆ , ๐‘ฅ๐พ), we can generate ๐‘ฅ1, โ€ฆ , ๐‘ฅ๐พ Not only just using above full equation, we can perform it iteratively, starting from ๐‘ฅ1. In image, such as, Likewise, we consider there is a latent variable beneath the observed data & distribution! We may interpret hidden variable like in image, but sometimes we cant. Still, it is useful in modeling some complicated probability models!
  • 7. Chapter 8.1. Bayesian Networks 7 Discrete variables - Exponential family : - Many famous distributions are exponential family, and they form useful building blocks for constructing more complex probability! - If we choose such distributions as parent & child node of graph, we can get many nice properties! - Letโ€™s take a look! Consider multinomial distribution. There exist a constraint of ๐œ‡๐‘˜ = 1 Letโ€™s extend this univariate example to two-variables case. That is, we are observing event of ๐‘ฅ1๐‘˜ = 1 & ๐‘ฅ2๐‘™ = 1 / Note that they are not totally independent! Joint probability is not just product of ๐๐’Œ โˆ— ๐๐’ In this case, there exist ๐พ2 โˆ’ 1 number of parameters! For general case of ๐‘€ variables, we have ๐พ๐‘€ โˆ’ 1. Canโ€™t we figure this exponential growth problem??
  • 8. Chapter 8.1. Bayesian Networks 8 Independence We can figure it out by assuming independence! Then, calculation gets much! Much! simple! ๐‘ ๐‘‹1, ๐‘‹2 = ๐‘ ๐‘‹1 ๐‘(๐‘‹2) In this case, we are containing 2(๐พ โˆ’ 1) number of parameters! In general case of ๐‘€ variables, we have ๐พ(๐‘€ โˆ’ 1) Now, letโ€™s consider a special case of chain. We covered similar model in stochastic process! That is, we are assuming certain variable ๐‘ฅ(๐‘–) depends only its one previous step variable, ๐‘ฅ(๐‘–โˆ’1). Thus, joint probability can be p x๐‘, โ€ฆ , x1 = ๐‘ ๐‘ฅ๐‘ ๐‘ฅ๐‘โˆ’1 ๐‘ ๐‘ฅ๐‘โˆ’1 ๐‘ฅ๐‘ โ€ฆ ๐‘ ๐‘ฅ2 ๐‘ฅ1 ๐‘(๐‘ฅ1). Graphically, it can be shown as ๐‘ ๐‘‹1, ๐‘‹2 = ๐‘ ๐‘‹2|๐‘‹1 ๐‘(๐‘‹1) ๐‘ ๐‘‹1, ๐‘‹2 = ๐‘ ๐‘‹2 ๐‘(๐‘‹1) ๐‘‹1 does not depend on any variable, thus it takes ๐พ โˆ’ 1 cases. Since this is conditional, for each conditional term, there exist ๐พ โˆ’ 1 possible cases. Thus, We require ๐‘ฒ โˆ’ ๐Ÿ + ๐‘ด โˆ’ ๐Ÿ ๐‘ฒ(๐‘ฒ โˆ’ ๐Ÿ) number of parameters in this case! Which is a linear growth as ๐‘€ increases! Chain approach
  • 9. Chapter 8.1. Bayesian Networks 9 Bayesian approach Letโ€™s again consider ๐œ‡๐‘˜ as a random variable! We are using chain model once again. Here, we are using multinomial distribution, it is reasonable to set Dirichlet distribution as prior of ๐œ‡. ๐œ‡ can be used separately, or together! Parameterized models There is much simple way of modeling ๐‘(๐‘ฆ = 1|๐‘ฅ1, ๐‘ฅ2, โ€ฆ , ๐‘ฅ๐‘€). By using parametric approach, we can get following equation, and contains (๐‘€ + 1)
  • 10. Chapter 8.1. Bayesian Networks 10 Linear-Gaussian models We can express multivariate gaussian by graphical models. Letโ€™s think of arbitrary directed acyclic graph. We assume ๐‘(๐‘ฅ๐‘–|๐‘๐‘Ž๐‘–) follows gaussian, and its parameters are By using this, we can extend this idea to the joint probability, by Here, we can find that again this joint probability follows multivariate gaussian since it contains quadratic term of ๐‘ฅ๐‘–! This indicates if we assume individual conditional probability as gaussian in graphical model, entire joint probability also follows multivariate gaussian! But here, it is not written how to estimate the value of ๐’˜๐’Š๐’‹. I donโ€™t have any idea of how to get itโ€ฆ If we assume we know ๐ฐ & ๐› values, we can estimate mean and covariance of joint probability!
  • 11. Chapter 8.1. Bayesian Networks 11 Linear-Gaussian models All these idea can be connected to the Hierarchical bayes model! Which assumes the prior of prior, Which is called hyperprior! Here, error term ๐๐’Š follows gaussian distribution. Estimating Mean: Starting from variable which does not depend on other variables, such as ๐‘ฅ1, We can iteratively estimate mean value of other variables! Likewise, we can estimate covariance similarly. If every variables are independent, then we only need to estimate ๐‘๐‘– & ๐‘ฃ๐‘–, which contains 2๐ท number of parameters. In case of fully connected graph, we have to estimate full covariance matrix of ๐ท ๐ท+1 2 . Each variable ๐‘ฅ๐‘– can be written as
  • 12. Chapter 8.2. Conditional Independence 12 Ideation We have covered conditional independence in Mathematical statistics I. In this section, letโ€™s take a look at it more detail. ๐‘ ๐‘Ž ๐‘, ๐‘ = ๐‘(๐‘Ž|๐‘) This means, a is independent of b when c is given! Furthermore, joint probability of ๐‘ ๐‘Ž, ๐‘ โ‰  ๐‘ ๐‘Ž ๐‘(๐‘), but Conditional independence can be notated by This is significant in various machine learning tasks. Letโ€™s take an example. Tail-to-tail Still, c is conditionally given, and a & b are conditionally independent. Consider joint probability of ๐‘ ๐‘Ž, ๐‘, ๐‘ = ๐‘ ๐‘Ž ๐‘, ๐‘ ๐‘ ๐‘ ๐‘ ๐‘(๐‘) = ๐’‘ ๐’‚ ๐’„ ๐’‘ ๐’ƒ ๐’„ ๐’‘(๐’„) Even if we marginalize out c, that does not become (Unobserved) ๐‘ ๐‘Ž, ๐‘ = ๐‘ ๐‘ ๐‘Ž ๐‘ ๐‘ ๐‘ ๐‘ ๐‘(๐‘) โ‰ข ๐‘ ๐‘Ž ๐‘(๐‘) However, if ๐‘ is given, then we can make ๐‘ ๐‘Ž, ๐‘ ๐‘ = ๐‘ ๐‘Ž ๐‘ ๐‘(๐‘|๐‘) We call this โ€˜conditioned node blocks the path from a to b.โ€™ (Observed)
  • 13. Chapter 8.2. Conditional Independence 13 Head-to-tail Again, c sits in the conditional term. Consider the joint probability p(a, b, c) = p(b | a, c) p(c | a) p(a) = p(b | c) p(c | a) p(a). (Unobserved) Even if we marginalize out c, the result does not factorize: p(a, b) = p(a) Σ_c p(c | a) p(b | c) = p(a) p(b | a) ≢ p(a) p(b). (Observed) However, if c is given, then p(a, b | c) = p(b | c) p(c | a) p(a) / p(c) = p(b | c) p(a, c) / p(c) = p(a | c) p(b | c). Here again, the 'conditioned node blocks the path from a to b.' Head-to-head Now, c does not sit in any conditional term of a or b; instead both point into c: p(a, b, c) = p(a) p(b) p(c | a, b). Here, marginalizing both sides over c gives p(a, b) = p(a) p(b) — a and b are independent when c is unobserved! However, if c is given, then p(a, b | c) = p(a) p(b) p(c | a, b) / p(c) ≢ p(a | c) p(b | c). So with c observed, conditional independence does not hold — the opposite behavior of the other two cases!
  • 14. Chapter 8.2. Conditional Independence 14 General result summary Parent node (ancestor) / child node (descendant): whether the two end nodes of a path are independent depends on the type of the connecting node and on whether that node (or, for head-to-head, any of its descendants) is observed. Details are in the table below. 'Not blocked' means the path carries dependence; 'blocked' means the path is cut, so the end nodes are independent along it, either by themselves or by conditioning.

  Connection type | Unobserved  | Observed
  Tail-to-tail    | Not blocked | Blocked
  Head-to-tail    | Not blocked | Blocked
  Head-to-head    | Blocked     | Not blocked
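The head-to-head row is the surprising one, and it is easy to verify numerically. A minimal sketch (the CPT values below are made up) that checks both cells of that row by brute-force enumeration:

```python
import numpy as np

# Toy head-to-head network a -> c <- b with made-up binary CPTs.
rng = np.random.default_rng(0)
pa = np.array([0.3, 0.7])                  # p(a)
pb = np.array([0.6, 0.4])                  # p(b)
pc = rng.random((2, 2, 2))                 # p(c | a, b), then normalize over c
pc /= pc.sum(axis=2, keepdims=True)

joint = pa[:, None, None] * pb[None, :, None] * pc    # p(a, b, c)

pab = joint.sum(axis=2)                    # marginalize c out
print(np.allclose(pab, np.outer(pa, pb)))  # True: a, b independent (blocked)

pab_c0 = joint[:, :, 0] / joint[:, :, 0].sum()        # p(a, b | c = 0)
pa_c0, pb_c0 = pab_c0.sum(axis=1), pab_c0.sum(axis=0)
print(np.allclose(pab_c0, np.outer(pa_c0, pb_c0)))    # False: conditioning unblocks
```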
  • 15. Chapter 8.2. Conditional Independence 15 Example of this approach Three binary variables: 1. Battery B : {0, 1} 2. Fuel F : {0, 1} 3. Gauge G : {0, 1}, arranged head-to-head as B → G ← F. This indicates p(F = 0 | G = 0) > p(F = 0). This fits our intuition: the probability that the tank is empty should go up once the gauge reads empty. 1. Furthermore, p(F = 0 | G = 0, B = 0) < p(F = 0 | G = 0): once we know the battery is flat, the empty gauge reading is partly explained by the dead battery, so the evidence that the fuel tank is also empty becomes weaker. This effect is known as 'explaining away'. 2. This means battery and fuel are not conditionally independent once the state of the gauge is given.
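We can reproduce the effect numerically. The sketch below uses prior and gauge probabilities in the spirit of PRML's version of this example (treat the exact numbers as illustrative):

```python
# Explaining away in the battery(B)-fuel(F)-gauge(G) network.
pB1, pF1 = 0.9, 0.9                                    # p(B=1), p(F=1)
pG1 = {(1, 1): 0.8, (1, 0): 0.2, (0, 1): 0.2, (0, 0): 0.1}  # p(G=1 | B, F)

def joint(b, f, g):
    p = (pB1 if b else 1 - pB1) * (pF1 if f else 1 - pF1)
    return p * (pG1[(b, f)] if g else 1 - pG1[(b, f)])

pF0 = 1 - pF1
pF0_G0 = sum(joint(b, 0, 0) for b in (0, 1)) / \
         sum(joint(b, f, 0) for b in (0, 1) for f in (0, 1))
pF0_G0B0 = joint(0, 0, 0) / sum(joint(0, f, 0) for f in (0, 1))
print(pF0, pF0_G0, pF0_G0B0)   # ~0.100, ~0.257, ~0.111
```

The empty gauge raises p(F = 0) from 0.100 to 0.257, and additionally observing the flat battery lowers it back down to 0.111 — explaining away in action.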
  • 16. Chapter 8.2. Conditional Independence 16 D-separation Can we identify whether a relation A ⊥⊥ B | C holds just by looking at the directed graph? Let's think of all paths from A to B. Any such path is blocked if it includes a node such that either… 1. The arrows on the path meet either head-to-tail or tail-to-tail at the node, and the node is in the set C; or 2. The arrows on the path meet head-to-head at the node, and neither the node nor any of its descendants is in the set C. Here, if all paths are blocked, A is said to be d-separated from B by C, and the joint distribution satisfies A ⊥⊥ B | C. First and second example! Last example! In the first graph, the path from a to b is not blocked by conditioning on c, because node e is head-to-head and its descendant c is in the conditioning set. In the second graph, the path from a to b is blocked by f, because node f is tail-to-tail and observed!
  • 17. Chapter 8.2. Conditional Independence 17 I.I.D. (independent and identically distributed) Consider the joint probability of N random samples following an i.i.d. univariate Gaussian distribution: p(D | μ) = ∏_{n=1}^{N} p(x_n | μ). Here, note that the data x_n are conditionally independent given μ. However, the data do not stay independent once we integrate μ out — the shared parameter μ is a tail-to-tail node that couples them! Furthermore, the Bayesian polynomial regression model is another example of this i.i.d. data structure. It indicates that t_n and the new target t̂ are conditionally independent given w! This is pretty intuitive: once the model parameter w is given, the predictive distribution is independent of the training data. This is what we originally intended!
  • 18. Chapter 8.2. Conditional Independence 18 Naïve Bayes model I brought a detailed explanation of naïve Bayes from Wikipedia! As we all know, input features are generally not independent. However, we can treat the input features as conditionally independent given the class C_k! This is useful when we model data consisting of both discrete and continuous features: we can model the discrete ones with multinomial distributions and the continuous ones with Gaussians!
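A minimal Gaussian naïve Bayes sketch (the toy data are made up) showing how the conditional-independence assumption turns the class-conditional density into a product over features:

```python
import numpy as np

class GaussianNB:
    """Naive Bayes with p(x | C_k) = prod_i N(x_i | mu_ki, var_ki)."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.prior = np.array([(y == c).mean() for c in self.classes])
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        return self

    def predict(self, X):
        # log p(C_k) + sum_i log N(x_i | mu_ki, var_ki): the sum over features
        # is exactly the conditional-independence assumption.
        ll = -0.5 * (((X[:, None, :] - self.mu) ** 2) / self.var
                     + np.log(2 * np.pi * self.var)).sum(axis=2)
        return self.classes[np.argmax(np.log(self.prior) + ll, axis=1)]

X = np.array([[1.0, 2.0], [1.2, 1.8], [4.0, 5.0], [4.2, 5.1]])
y = np.array([0, 0, 1, 1])
print(GaussianNB().fit(X, y).predict(X))   # [0 0 1 1]
```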
  • 19. Chapter 8.2. Conditional Independence 19 Role of graphical model A specific directed graph represents a specific decomposition of a joint probability distribution into a product of conditional probabilities. We can therefore think of the d-separation theorem and the graph as a filter on distributions! That is, the graph lets us express the overall distribution in a much simpler form. There is also a notion called the 'Markov blanket' (or 'Markov boundary'), which helps us simplify the distribution as well. In the conditional p(x_i | x_{j ≠ i}), all factors that do not depend on x_i — in neither their conditioned variable nor their conditioning set — cancel between numerator and denominator. The remaining factors are exactly those that depend on x_i: the conditional of x_i itself and the conditionals of its children. The resulting set, shown in the right-hand figure — the parents, children, and co-parents of x_i — is the minimal set of nodes that isolates x_i from the rest of the graph.
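Reading a Markov blanket off a DAG is mechanical; a minimal sketch operating on a hypothetical parents table:

```python
def markov_blanket(parents, i):
    """Markov blanket of node i in a DAG given a {node: [parents]} dict:
    parents, children, and co-parents (other parents of i's children)."""
    children = [c for c, ps in parents.items() if i in ps]
    blanket = set(parents[i]) | set(children)
    for c in children:
        blanket |= set(parents[c])      # co-parents
    blanket.discard(i)
    return blanket

# hypothetical DAG: 0 -> 2 <- 1, 2 -> 3 <- 4
parents = {0: [], 1: [], 2: [0, 1], 3: [2, 4], 4: []}
print(markov_blanket(parents, 2))       # {0, 1, 3, 4}
```

Node 4 appears in the blanket of node 2 purely as a co-parent: it shares the child 3, which is the head-to-head effect from the previous slides.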
  • 20. Chapter 8.3. Markov Random Fields 20 Conditional independence properties We have covered directed networks. Now, let's take a look at 'undirected' ones! One major problem with directed networks was the subtle behavior of 'head-to-head' nodes. We can avoid this complication by using an undirected network! To check whether a conditional independence A ⊥⊥ B | C holds, find all paths that connect set A and set B. Here, the statement above holds because, once we remove all nodes in set C, there is no remaining path that connects set A and set B. ** This is my personal idea: for me, it was much easier to understand the overall concept by thinking of the connection between two nodes as just a 'probabilistic relationship', forgetting about the conditional term for now.
  • 21. Chapter 8.3. Markov Random Fields 21 Factorization properties Here, we try to model the joint probability in a more practical way! Let's see how a general probability can be expressed with an undirected graph. In the directed model, conditional probability was expressed by an arrow; here we need the concept of a 'clique'. A clique is a fully connected subgraph, and we can think of cliques as the building blocks of the joint probability. Let's denote a clique by C and the set of variables in that clique by X_C. We then define an arbitrary potential function over the maximal cliques, ψ_C(X_C) ≥ 0, and the joint probability is the product of these functions: p(X) = (1/Z) ∏_C ψ_C(X_C). Since this product need not be a proper probability, we need the normalizing constant Z = Σ_X ∏_C ψ_C(X_C). However, for M discrete nodes with K states each, this sum has K^M terms — exponential growth. Fortunately, we don't need to normalize the probability all the time! (An example will be covered soon.) One popular choice is the Boltzmann distribution with an energy function E(X_C): ψ_C(X_C) = exp(−E(X_C)). Here the potential functions have no specific probabilistic interpretation; rather, we can set them according to our intuition and purpose.
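To make the normalization issue concrete, here is a tiny MRF where the K^M-term sum for Z is still feasible (the pairwise potentials are illustrative Boltzmann factors):

```python
import itertools, math

# Tiny 3-node MRF on x_i in {-1, +1} with pairwise cliques (0,1) and (1,2)
# and potentials psi(x_i, x_j) = exp(beta * x_i * x_j).
beta = 1.0
cliques = [(0, 1), (1, 2)]

def unnorm(x):
    return math.exp(beta * sum(x[i] * x[j] for i, j in cliques))

states = list(itertools.product([-1, 1], repeat=3))
Z = sum(unnorm(x) for x in states)   # K^M terms: feasible only for tiny graphs
probs = {x: unnorm(x) / Z for x in states}
print(Z, sum(probs.values()))        # probabilities now sum to 1.0
```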
  • 22. Chapter 8.3. Markov Random Fields 22 Example. Image de-noising Let the observed noisy image be y, with individual pixels y_i ∈ {−1, +1}, and let x_i be the estimated noise-free pixels. We iteratively erase the noise from the image, using the energy E(x, y) = h Σ_i x_i − β Σ_{i,j} x_i x_j − η Σ_i x_i y_i. The β term says adjacent pixels should have similar values, and the η term penalizes the difference from the raw data. Scalars h, β, η ≥ 0 is the common setting. As the optimization goes on, the overall energy should decrease, and the joint probability p(x, y) ∝ exp(−E(x, y)) should increase.
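A compact ICM (iterated conditional modes) sketch for this energy — one possible optimizer matching the 'iteratively erase noise' description; all constants and the toy image are illustrative:

```python
import numpy as np

def icm_denoise(y, h=0.0, beta=1.0, eta=2.0, sweeps=5):
    """Greedy pixel-wise minimization of
    E(x, y) = h*sum(x) - beta*sum_neighbors(x_i x_j) - eta*sum(x*y)."""
    x = y.copy()
    H, W = y.shape
    for _ in range(sweeps):
        for i in range(H):
            for j in range(W):
                nb = sum(x[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                         if 0 <= a < H and 0 <= b < W)
                # pick the state of x_ij with the lower local energy
                x[i, j] = 1 if (-h + beta * nb + eta * y[i, j]) >= 0 else -1
    return x

rng = np.random.default_rng(0)
clean = np.ones((8, 8), dtype=int); clean[2:6, 2:6] = -1
noisy = clean * rng.choice([1, -1], size=clean.shape, p=[0.9, 0.1])
print((icm_denoise(noisy) == clean).mean())   # fraction of pixels recovered
```

Each pixel update can only lower (or keep) the energy, so the sweeps converge to a local minimum — exactly the 'energy decreases, probability increases' behavior described above.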
  • 23. Chapter 8.3. Markov Random Fields 23 Relation to directed graphs We have covered two kinds of graphical models: directed was good at modeling conditional probabilities, while undirected gave an intuitive and practical approach. Let's find the connection between them. From (a), the factor p(x_4 | x_1, x_2, x_3) involves every variable, so in the undirected graph all four must lie in a single clique. Thus we add links between all pairs of parents of each node ('marrying the parents') and drop the arrow directions, as in (b). This process is called moralization, and the resulting graph is called a moral graph.
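Moralization is mechanical; a minimal sketch operating on a parents table (the example DAG mirrors the four-node figure):

```python
def moralize(parents):
    """DAG -> moral graph: marry each node's parents, then drop directions."""
    edges = set()
    for child, ps in parents.items():
        for p in ps:
            edges.add(frozenset((p, child)))    # keep the original links
        for a in ps:
            for b in ps:
                if a != b:
                    edges.add(frozenset((a, b)))  # marry the parents
    return edges

# x4 has parents x1, x2, x3, as in figure (a)
parents = {1: [], 2: [], 3: [], 4: [1, 2, 3]}
print(sorted(tuple(sorted(e)) for e in moralize(parents)))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)] -- a single 4-clique
```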
  • 24. Chapter 8.4. Inference in Graphical Models 24 Inference by marginalization Let's think about how to get p(x_n) from the joint probability of (x_1, x_2, ..., x_N). Intuitively, for the discrete case, we can marginalize out all the other variables of the joint probability. Some of them might be observed, and some might not. As a simple example, consider how to get p(x | y) from the graph above (an example of a posterior): from p(x, y) = p(x) p(y | x), Bayes' theorem gives p(x | y) = p(y | x) p(x) / p(y). That was a simple example; let's now consider a more complicated one — a chain. Since this model is much simpler than a fully connected graph (which needs K^N parameters), it contains only (N − 1) K^2 parameters. To get the marginal density p(x_n), we can naively sum the joint over all the other variables.
  • 25. Chapter 8.4. Inference in Graphical Models 25 Inference on Chain Consider the chain example we have just seen. The summation over the last variable x_N only involves ψ_{N−1,N}, so it can be performed first, and each remaining summation can be pushed inside in turn. So, in order to get the marginal distribution of x_n, which sits somewhere in the middle, we accumulate contributions coming from both sides: p(x_n) = (1/Z) μ_α(x_n) μ_β(x_n).
  • 26. Chapter 8.4. Inference in Graphical Models 26 Inference on Chain Here, μ_α and μ_β are defined recursively: μ_α(x_n) = Σ_{x_{n−1}} ψ_{n−1,n}(x_{n−1}, x_n) μ_α(x_{n−1}) and μ_β(x_n) = Σ_{x_{n+1}} ψ_{n,n+1}(x_n, x_{n+1}) μ_β(x_{n+1}). This process of marginalizing can be called 'message passing'. This kind of one-step-dependent structure is what we call a 'Markov chain'. Suppose we want to compute p(x_1), p(x_2), ..., p(x_N) separately. Naively, that has complexity N × N K^2 = O(N^2 K^2), quadratic in the number of nodes. Is this efficient? Obviously not, because the forward messages (μ_α) needed for p(x_{n−1}) and for p(x_n) differ by only a single term! Thus, to run the overall algorithm efficiently, we store the messages computed at each step and reuse them. If there is an observed variable in the chain, we do not sum over it; we simply clamp it to its observed value, x_k = x̂_k, in the equations. The marginal over two neighboring variables of the joint probability can be expressed as p(x_{n−1}, x_n) = (1/Z) μ_α(x_{n−1}) ψ_{n−1,n}(x_{n−1}, x_n) μ_β(x_n).
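A runnable sketch of the stored forward/backward messages on a chain (the pairwise potentials are random, purely for illustration):

```python
import numpy as np

def chain_marginals(psis):
    """All node marginals of p(x) ∝ prod_n psi_n(x_n, x_{n+1}),
    where psis[n] is a (K, K) table between x_n and x_{n+1}."""
    N, K = len(psis) + 1, psis[0].shape[0]
    alpha = [np.ones(K)]                    # mu_alpha(x_1) = 1
    for n in range(N - 1):                  # forward pass
        alpha.append(psis[n].T @ alpha[-1])
    beta = [np.ones(K)]                     # mu_beta(x_N) = 1
    for n in reversed(range(N - 1)):        # backward pass
        beta.insert(0, psis[n] @ beta[0])
    marginals = []
    for n in range(N):
        m = alpha[n] * beta[n]
        marginals.append(m / m.sum())       # local normalization supplies 1/Z
    return marginals

rng = np.random.default_rng(0)
psis = [rng.random((3, 3)) for _ in range(4)]   # chain of 5 variables, K = 3
for m in chain_marginals(psis):
    print(m)
```

Both passes together cost O(N K^2), versus O(N^2 K^2) if each marginal were recomputed from scratch.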
  • 27. Chapter 8.4. Inference in Graphical Models 27 Trees We can perform similar message passing on 'trees'. We have seen various decision trees in many undergraduate classes; the structure of a tree is the same here, but each node corresponds to a random variable, so the details of tree structure need not be covered (maybe..?). One special thing to note is that in a basic (directed) tree, every node has at most one parent. A directed graph in which some nodes have more than one parent, but with still only one path between any pair of nodes (ignoring arrow directions), is called a polytree (figure c). Note that a tree structure does not contain any loops (there is never more than one route between two nodes).
  • 28. Chapter 8.4. Inference in Graphical Models 28 Factor graphs Consider 'soo-neung' (the Korean college entrance exam). What do we try to measure? A student's capability of understanding, comprehension — their intelligence. Can we measure the intelligence directly? Of course not. It is an object that lives in a latent dimension we cannot observe. Thus, we use proxy measures, such as exam scores or IQ, which can reflect one's intelligence. The exam score is data, and the intelligence behaves like a factor. Now, let's extend this idea to data and probability. We assume the joint probability of the data can be expressed as a product of factors, p(x) = ∏_s f_s(x_s), where each factor f_s is a function of a corresponding subset of variables x_s. In the graph, in addition to the original variable nodes, we add some factor nodes. As you can see, a factor graph is a bipartite graph: it has two disjoint sets of nodes, and every edge connects a node of one set to a node of the other. Bipartite G. Figure from Wikipedia!
  • 29. Chapter 8.4. Inference in Graphical Models 29 Examples of factor graph Undirected graph: the maximal clique can become a single factor variable f, or the same distribution can be expressed with the factor split into pieces! As we can see, one undirected network can be expressed as many different kinds of factor graphs. Directed graph: likewise, the conditionals can become a single maximal-clique factor f, or be split up! One directed network also corresponds to many different factor graphs. The factor graph of a tree is itself a tree, just like for the undirected network! Factors can even sit between two variables on a link!
  • 30. Chapter 8.4. Inference in Graphical Models 30 The sum-product algorithm My major interest is Graph Neural Networks (GNNs), and I think understanding the overall architecture of a GNN helps in understanding this section. A GNN passes its information along edges and aggregates the necessary information. From Jure Leskovec's CS224W (my favorite prof.)! For now, we don't need to understand what those neural networks are; rather, please focus on the idea that 'we are aggregating information!' In our probabilistic graph, we aggregate information using sums and products. This is called belief propagation, also known as the sum-product algorithm.
  • 31. Chapter 8.4. Inference in Graphical Models 31 The sum-product algorithm As you can see, we merge information via the product term. Here, please note that the information arriving from the subtrees behind each neighboring factor is not decomposed any further — we can treat each incoming message as a single fixed term.
  • 32. Chapter 8.4. Inference in Graphical Models 32 The sum-product algorithm Aggregation over edges: product. Aggregation over the factor's values: sum. This gives the factor-to-variable message μ_{f→x}(x) = Σ_{x_1} … Σ_{x_M} f(x, x_1, …, x_M) ∏_m μ_{x_m→f}(x_m). Here, the link μ_{x→f}(x) is defined on the following page!
  • 33. Chapter 8.4. Inference in Graphical Models 33 The sum-product algorithm Aggregation over edges: product. Here, we do not need to involve the factor values: the variable-to-factor message is just the product of the incoming messages from the other neighboring factors, μ_{x→f}(x) = ∏_{l ∈ ne(x)∖f} μ_{f_l→x}(x). Note that those incoming links connect f to x!
  • 34. Chapter 8.4. Inference in Graphical Models 34 The sum-product algorithm Note that there are two kinds of links: 1. From factor to variable, μ_{f→x}: a summation combined with a product. 2. From variable to factor, μ_{x→f}: a product only. When we start at the leaf nodes, a leaf variable node sends μ_{x→f}(x) = 1, and a leaf factor node sends μ_{f→x}(x) = f(x). Suppose we are trying to get the marginal probability for every node in the graph. Performing the propagation from scratch for every node is very inefficient. Note that the message passing is independent of which node has been designated as the root; thus, we can store the messages from one rootward pass and then send messages back out in the reverse order to compute the remaining values. For the joint over the set of variables attached to one factor, we can simply compute p(x_s) ∝ f_s(x_s) ∏_m μ_{x_m→f_s}(x_m). As we have seen, the link from variable to factor is a simple product of factor-to-variable messages; thus, we can make the entire process simpler by eliminating the variable-to-factor links.
  • 35. Chapter 8.4. Inference in Graphical Models 35 Normalization If we start from a directed graph, which is intrinsically a product of conditional probabilities, we do not need to compute the normalizing constant: Z = 1. For an undirected graph, we need the normalization constant Z to obtain a probability. An easy way to find Z is from any single unnormalized marginal: once the message-passing result p̃(x_i) is obtained, Z = Σ_{x_i} p̃(x_i) and p(x_i) = p̃(x_i)/Z. Let's understand the overall algorithm with a simple example! Simple example for the sum-product algorithm — our goal of computation! Here, let x_3 be the root!
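The example uses four variables with the factorization p̃(x) = f_a(x_1, x_2) f_b(x_2, x_3) f_c(x_2, x_4). A runnable sketch (random illustrative potentials) that computes p(x_2) by message passing and checks it against brute-force enumeration:

```python
import numpy as np

# Factors f_a(x1,x2), f_b(x2,x3), f_c(x2,x4); all variables binary.
rng = np.random.default_rng(1)
fa, fb, fc = (rng.random((2, 2)) for _ in range(3))

# Leaf variables send mu_{x->f} = 1, so the factor-to-variable messages
# into x2 are just sums over each factor's other variable.
mu_fa_x2 = fa.sum(axis=0)        # sum over x1
mu_fb_x2 = fb.sum(axis=1)        # sum over x3
mu_fc_x2 = fc.sum(axis=1)        # sum over x4

p_x2 = mu_fa_x2 * mu_fb_x2 * mu_fc_x2
p_x2 /= p_x2.sum()               # Z recovered by summing the marginal itself

# brute-force check against full enumeration of the joint
joint = np.einsum('ab,bc,bd->abcd', fa, fb, fc)
print(np.allclose(p_x2, joint.sum(axis=(0, 2, 3)) / joint.sum()))   # True
```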
  • 37. Chapter 8.4. Inference in Graphical Models 37 Example of sum-product algorithm Now, let's see it with a specific probability! So far we have considered every variable unobserved. Now, let's assume some of the variables in the set are observed. Then we can simply multiply the joint probability by an indicator function for each observed variable: p(x) ∏_i I(v_i, v̂_i), where the indicator gives 1 for v_i = v̂_i and 0 otherwise. This means we are computing p(h, v = v̂), so we can drop the summation over the observed v_i terms. (Actually, for the observed condition, I couldn't get some of the intuition — someone who understood this notion well may explain it instead of me!)
  • 38. Chapter 8.4. Inference in Graphical Models 38 Max-Sum Algorithm With sum-product, we found marginals of the joint distribution p(x) on a factor graph. Here, we want to find the setting of the variables that jointly has the largest probability. The problem is that we cannot get it from the naïve individual marginals p(x_i); the example below shows why. Computing the marginals gives p(x = 0) = 0.6, p(x = 1) = 0.4, p(y = 0) = 0.7, p(y = 1) = 0.3, so maximizing each marginal picks (x = 0, y = 0) — yet the joint maximum lies at a different configuration. There is a difference between the marginal max and the joint max; thus, we have to maximize the joint! The algorithm is otherwise the same as sum-product: message passing and all the other mechanisms carry over, with summation simply replaced by maximization. Furthermore, we apply the monotonic log function for computational convenience, which turns the products into sums (hence 'max-sum')!
  • 39. Chapter 8.4. Inference in Graphical Models 39 Max-Sum Algorithm Everything goes in a similar way — the summation is replaced by maximization! The leaves provide the initial values to transmit. The maximum probability can then be computed at the root, and the corresponding x values can be recovered. To obtain those maximizing values, we again use message passing of a different kind: unlike a common MLE problem, the variables here sit in a complicated joint structure, so we use an iterative (sequential) back-tracking method to get the estimate!
  • 40. Chapter 8.4. Inference in Graphical Models 40 Max-Sum Algorithm After setting the initial values, we track back from the maximizing value of x_N to compute the previous x_{N−1}, and then keep moving along the black lines to back-propagate the maximizing values! For an efficient calculation, we store, at each step, which state of the previous variable achieved the maximum, since these stored values are reused when decoding the other variables. An application of this model is the Hidden Markov Model, where max-sum is known as the Viterbi algorithm!
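A compact max-sum sketch on a chain — log-potentials, forward max-messages, then back-tracking through the stored argmax pointers (the potentials are random, purely for illustration):

```python
import numpy as np

def max_sum_chain(psis):
    """argmax_x of prod_n psi_n(x_n, x_{n+1}) on a chain,
    where psis[n] is a (K, K) table between x_n and x_{n+1}."""
    logs = [np.log(p) for p in psis]
    K = logs[0].shape[0]
    msg = np.zeros(K)                    # max-message arriving at x_1
    backptr = []
    for lp in logs:                      # forward pass: max replaces sum
        scores = msg[:, None] + lp       # scores[i, j] for x_n = i, x_{n+1} = j
        backptr.append(scores.argmax(axis=0))   # store the maximizing state
        msg = scores.max(axis=0)
    # back-track from the maximizing state of the last variable
    x = [int(msg.argmax())]
    for bp in reversed(backptr):
        x.insert(0, int(bp[x[0]]))
    return x, float(msg.max())           # configuration and max log-probability

rng = np.random.default_rng(0)
psis = [rng.random((3, 3)) for _ in range(4)]   # chain of 5 variables, K = 3
x_star, logp = max_sum_chain(psis)
print(x_star, np.exp(logp))
```

Applied to an HMM's transition and emission potentials, this forward-max-plus-backtrack loop is exactly the Viterbi decoder mentioned above.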