Artificial Intelligence
Belief Propagation and Junction Trees
Andres Mendez-Vazquez
March 28, 2016
Outline
1 Introduction
What do we want?
2 Belief Propagation
The Intuition
Inference on Trees
The Messages
The Implementation
3 Junction Trees
How do you build a Junction Tree?
Chordal Graph
Tree Graphs
Junction Tree Formal Definition
Algorithm For Building Junction Trees
Example
Moralize the DAG
Triangulate
Listing of Cliques
Potential Function
Propagating Information in a Junction Tree
Example
Now, the Full Propagation
Example of Propagation
Introduction
We will be looking at the following algorithms
Pearl's Belief Propagation Algorithm
Junction Tree Algorithm
Belief Propagation Algorithm
The algorithm was first proposed by Judea Pearl in 1982, who formulated it for trees; it was later extended to polytrees.
[Figure: an example polytree with nodes A, B, C, D, E, F, G, H, I]
Introduction
Something Notable
It has since been shown to be a useful approximate algorithm on general graphs.
Junction Tree Algorithm
The junction tree algorithm (also known as the "clique tree" algorithm) is a method used in machine learning to perform marginalization in general graphs.
It entails performing belief propagation on a modified graph, called a junction tree, obtained by eliminating cycles.
Example
The Message Passing Stuff
Thus
We can do the following
Pass information from below and from above to a certain node V.
Thus
We call those messages
π from above.
λ from below.
Inference on Trees
Recall
A rooted tree is a DAG
Now
Let (G, P) be a Bayesian network whose DAG is a tree.
Let a be a set of values of a subset A ⊂ V .
For simplicity
Imagine that each node has two children.
The general case can be inferred from it.
Then
Let DX be the subset of A
Containing all members of A that are in the subtree rooted at X,
including X if X ∈ A.
Let NX be the subset
Containing all members of A that are nondescendants of X.
This set includes X if X ∈ A.
Example
We have that A = NX ∪ DX
[Figure: a tree containing node X; the members of A inside X's subtree form DX, those among X's nondescendants form NX]
Thus
We have, for each value of x:

\begin{align*}
P(x|a) = P(x|d_X, n_X) &= \frac{P(d_X, n_X|x)\,P(x)}{P(d_X, n_X)}\\
&= \frac{P(d_X|x, n_X)\,P(n_X|x)\,P(x)}{P(d_X, n_X)}\\
&= \frac{P(d_X|x, n_X)\,P(n_X, x)\,P(x)}{P(x)\,P(d_X, n_X)}\\
&= \frac{P(d_X|x)\,P(x|n_X)\,P(n_X)}{P(d_X, n_X)} \quad \text{(by d-separation, if } X \notin A\text{)}\\
&= \frac{P(d_X|x)\,P(x|n_X)\,P(n_X)}{P(d_X|n_X)\,P(n_X)}
\end{align*}

Note: You still need to prove the case when X ∈ A.
Thus
We have, for each value of x:

\begin{align*}
P(x|a) &= \frac{P(d_X|x)\,P(x|n_X)}{P(d_X|n_X)}\\
&= \beta\,P(d_X|x)\,P(x|n_X)
\end{align*}

where β, the normalizing factor, is a constant not depending on x.
Now, we develop the messages
We want
λ(x) ∝ P(dX|x)
π(x) ∝ P(x|nX)
where ∝ means "proportional to".
Meaning
π(x) may not be equal to P(x|nX), but π(x) = k × P(x|nX) for some constant k.
Once we have those, we get
P(x|a) = α λ(x) π(x)
where α, the normalizing factor, is a constant not depending on x.
Developing λ(x)
We need
λ(x) ∝ P(dX|x)
Case 1: X ∈ A and X ∈ DX
Given the instantiation X = x̂, we have P(dX|x) = 0 for x ≠ x̂.
Thus, to achieve proportionality, we can set
λ(x̂) ≡ 1
λ(x) ≡ 0 for x ≠ x̂
Now
Case 2: X ∉ A and X is a leaf
Then dX = ∅ and
P(dX|x) = P(∅|x) = 1 for all values of x.
Thus, to achieve proportionality, we can set
λ(x) ≡ 1 for all values of x
Finally
Case 3: X ∉ A and X is a non-leaf
Let Y be X's left child and W be X's right child.
Since X ∉ A,
DX = DY ∪ DW
[Figure: node X with left child Y and right child W]
Thus
We have then

\begin{align*}
P(d_X|x) &= P(d_Y, d_W|x)\\
&= P(d_Y|x)\,P(d_W|x) \quad \text{(by d-separation at } X\text{)}\\
&= \sum_y P(d_Y, y|x)\,\sum_w P(d_W, w|x)\\
&= \sum_y P(y|x)\,P(d_Y|y)\,\sum_w P(w|x)\,P(d_W|w)\\
&\propto \sum_y P(y|x)\,\lambda(y)\,\sum_w P(w|x)\,\lambda(w)
\end{align*}

Thus, we can get proportionality by defining, for all values of x,
λY(x) = Σ_y P(y|x) λ(y)
λW(x) = Σ_w P(w|x) λ(w)
Thus
We have then
λ(x) = λY(x) λW(x) for all values of x
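The two defining sums translate directly into code. Here is a minimal Python sketch of the λ computations, assuming discrete variables and a CPT stored as cpt[y][x] = P(Y = y | X = x); the dictionary layout and function names are illustrative assumptions, not part of the original slides.

def lambda_message(cpt, lam_child, parent_values):
    # lambda_Y(x) = sum_y P(y|x) * lambda(y), for each value x of the parent
    return {x: sum(cpt[y][x] * lam_child[y] for y in cpt) for x in parent_values}

def lambda_value(child_messages, values):
    # lambda(x) = lambda_Y(x) * lambda_W(x): the product of the incoming
    # lambda messages over all of X's children
    out = {x: 1.0 for x in values}
    for msg in child_messages:
        for x in values:
            out[x] *= msg[x]
    return out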
Developing π(x)
We need
π(x) ∝ P(x|nX)
Case 1: X ∈ A and X ∈ NX
Given the instantiation X = x̂, we have:
P(x̂|nX) = P(x̂|x̂) = 1
P(x|nX) = P(x|x̂) = 0 for x ≠ x̂
Thus, to achieve proportionality, we can set
π(x̂) ≡ 1
π(x) ≡ 0 for x ≠ x̂
Now
Case 2: X ∉ A and X is the root
In this specific case, nX = ∅, the empty set of random variables.
Then
P(x|nX) = P(x|∅) = P(x) for all values of x.
Enforcing the proportionality, we get
π(x) ≡ P(x) for all values of x
Then
Case 3: X ∉ A and X is not the root
Without loss of generality, assume X is Z's right child and T is Z's left child.
Then, NX = NZ ∪ DT
[Figure: node Z with left child T and right child X]
Then
We have

\begin{align*}
P(x|n_X) &= \sum_z P(x|z)\,P(z|n_X)\\
&= \sum_z P(x|z)\,P(z|n_Z, d_T)\\
&= \sum_z P(x|z)\,\frac{P(z, n_Z, d_T)}{P(n_Z, d_T)}\\
&= \sum_z P(x|z)\,\frac{P(d_T, z|n_Z)\,P(n_Z)}{P(n_Z, d_T)}\\
&= \sum_z P(x|z)\,\frac{P(d_T|z, n_Z)\,P(z|n_Z)\,P(n_Z)}{P(n_Z, d_T)}\\
&= \sum_z P(x|z)\,\frac{P(d_T|z)\,P(z|n_Z)\,P(n_Z)}{P(n_Z, d_T)} \quad \text{(again by d-separation, at } Z\text{)}
\end{align*}
Last Step
We have

\begin{align*}
P(x|n_X) &= \sum_z P(x|z)\,\frac{P(z|n_Z)\,P(n_Z)\,P(d_T|z)}{P(n_Z, d_T)}\\
&= \gamma \sum_z P(x|z)\,\pi(z)\,\lambda_T(z)
\end{align*}

where γ = P(nZ)/P(nZ, dT).
Thus, we can achieve proportionality by setting
πX(z) ≡ π(z) λT(z)
and then
π(x) ≡ Σ_z P(x|z) πX(z) for all values of x
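The π computations admit the same treatment. A minimal Python sketch, under the same assumed dictionary layout as the λ sketch (cpt[x][z] = P(X = x | Z = z); names are illustrative):

def pi_message(pi_parent, sibling_lambda_msgs, parent_values):
    # pi_X(z) = pi(z) * lambda_T(z), taking the product over Z's
    # children other than X
    out = {z: pi_parent[z] for z in parent_values}
    for lam in sibling_lambda_msgs:
        for z in parent_values:
            out[z] *= lam[z]
    return out

def pi_value(cpt, pi_msg, values):
    # pi(x) = sum_z P(x|z) * pi_X(z)
    return {x: sum(cpt[x][z] * pi_msg[z] for z in pi_msg) for x in values}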
How do we implement this?
We require the following functions
initial_tree
update_tree
initial_tree has the following input and outputs
Input: ((G, P), A, a, P(x|a))
Output: After this call, A and a are both empty, making P(x|a) the prior probability of x.
Then, each time a variable V is instantiated to v̂, the routine update_tree is called
Input: ((G, P), A, a, V, v̂, P(x|a))
Output: After this call, V has been added to A, v̂ has been added to a, and for every value of x, P(x|a) has been updated to be the conditional probability of x given the new a.
Algorithm: Inference-in-trees
Problem
Given a Bayesian network whose DAG is a tree, determine the probabilities of the values of each node conditional on specified values of the nodes in some subset.
Input
A Bayesian network (G, P) whose DAG is a tree, where G = (V, E), and a set of values a of a subset A ⊆ V.
Output
The Bayesian network (G, P) updated according to the values in a. The λ and π values and messages, and P(x|a) for each X ∈ V, are considered part of the network.
Initializing the tree
void initial_tree
input: (Bayesian-network& (G, P) where G = (V, E), set-of-variables& A, set-of-variable-values& a)
1 A = ∅
2 a = ∅
3 for (each X ∈ V)
4   for (each value x of X)
5     λ(x) = 1 // Compute λ values.
6   for (the parent Z of X) // Does nothing if X is the root.
7     for (each value z of Z)
8       λX(z) = 1 // Compute λ messages.
9 for (each value r of the root R)
10   P(r|a) = P(r) // Compute P(r|a).
11   π(r) = P(r) // Compute R's π values.
12 for (each child X of R)
13   send_π_msg(R, X)
Updating the tree
void update_tree
Input: (Bayesian-network& (G, P) where G = (V, E), set-of-variables& A, set-of-variable-values& a, variable V, variable-value v̂)
1 A = A ∪ {V}, a = a ∪ {v̂}, λ(v̂) = 1, π(v̂) = 1, P(v̂|a) = 1 // Add V to A and instantiate V to v̂.
2 for (each value v ≠ v̂)
3   λ(v) = 0, π(v) = 0, P(v|a) = 0
4 if (V is not the root && V's parent Z ∉ A)
5   send_λ_msg(V, Z)
6 for (each child X of V such that X ∉ A)
7   send_π_msg(V, X)
Sending the λ message
void send_λ_msg(node Y, node X)
Note: For simplicity, (G, P) is not shown as input.
1 for (each value of x)
2   λY(x) = Σ_y P(y|x) λ(y) // Y sends X a λ message.
3   λ(x) = ∏_{U∈CH_X} λU(x) // Compute X's λ values.
4   P(x|a) = α λ(x) π(x) // Compute P(x|a).
5 normalize P(x|a)
6 if (X is not the root and X's parent Z ∉ A)
7   send_λ_msg(X, Z)
8 for (each child W of X such that W ≠ Y and W ∉ A)
9   send_π_msg(X, W)
Sending the π message
void send_π_msg(node Z, node X)
Note: For simplicity, (G, P) is not shown as input.
1 for (each value of z)
2   πX(z) = π(z) ∏_{Y∈CH_Z−{X}} λY(z) // Z sends X a π message.
3 for (each value of x)
4   π(x) = Σ_z P(x|z) πX(z) // Compute X's π values.
5   P(x|a) = α λ(x) π(x) // Compute P(x|a).
6 normalize P(x|a)
7 for (each child Y of X such that Y ∉ A)
8   send_π_msg(X, Y)
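Tying the three routines together, the following is a compact, runnable Python sketch of the whole procedure for a tree of discrete nodes. It follows the pseudocode above, but the TreeNode class, the CPT layout (cpt[x][z] = P(x|z), and cpt[x] = P(x) at the root), and the bookkeeping through a set A of instantiated nodes are assumptions made here for illustration, not part of the original listing.

class TreeNode:
    def __init__(self, name, values, cpt, parent=None):
        # For the root, cpt[x] = P(x); otherwise cpt[x][z] = P(X = x | parent = z).
        self.name, self.values, self.cpt = name, values, cpt
        self.parent, self.children = parent, []
        if parent:
            parent.children.append(self)
        self.lam, self.pi, self.posterior = {}, {}, {}
        self.lam_msg = {}   # lambda message this node sends to its parent

def nodes(root):
    yield root
    for c in root.children:
        yield from nodes(c)

def normalize(d):
    s = sum(d.values())
    return {k: v / s for k, v in d.items()}

def initial_tree(root, A):
    A.clear()  # A holds the instantiated nodes; evidence values live in lam/pi
    for X in nodes(root):
        X.lam = {x: 1.0 for x in X.values}
        if X.parent:
            X.lam_msg = {z: 1.0 for z in X.parent.values}
    root.pi = dict(root.cpt)          # pi(r) = P(r)
    root.posterior = dict(root.cpt)   # P(r|a) = P(r)
    for X in root.children:
        send_pi_msg(root, X, A)

def send_lambda_msg(Y, X, A):
    # Y sends X a lambda message: lambda_Y(x) = sum_y P(y|x) lambda(y)
    Y.lam_msg = {x: sum(Y.cpt[y][x] * Y.lam[y] for y in Y.values) for x in X.values}
    for x in X.values:                # lambda(x) = product of children's messages
        X.lam[x] = 1.0
        for U in X.children:
            X.lam[x] *= U.lam_msg[x]
    X.posterior = normalize({x: X.lam[x] * X.pi[x] for x in X.values})
    if X.parent and X.parent not in A:
        send_lambda_msg(X, X.parent, A)
    for W in X.children:
        if W is not Y and W not in A:
            send_pi_msg(X, W, A)

def send_pi_msg(Z, X, A):
    # Z sends X a pi message: pi_X(z) = pi(z) * prod over Z's other children
    pi_msg = {z: Z.pi[z] for z in Z.values}
    for Y in Z.children:
        if Y is not X:
            for z in Z.values:
                pi_msg[z] *= Y.lam_msg[z]
    X.pi = {x: sum(X.cpt[x][z] * pi_msg[z] for z in Z.values) for x in X.values}
    X.posterior = normalize({x: X.lam[x] * X.pi[x] for x in X.values})
    for Y in X.children:
        if Y not in A:
            send_pi_msg(X, Y, A)

def update_tree(V, v_hat, A):
    # Instantiate V = v_hat and propagate the change through the tree.
    A.add(V)
    V.lam = {v: float(v == v_hat) for v in V.values}
    V.pi = dict(V.lam)
    V.posterior = dict(V.lam)
    if V.parent and V.parent not in A:
        send_lambda_msg(V, V.parent, A)
    for X in V.children:
        if X not in A:
            send_pi_msg(V, X, A)

Running initial_tree on the example network of the following slides reproduces, up to rounding, the P(·|∅) values computed there by hand.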
Example of Tree Initialization
We have then
[Figure: the example tree network, root H with children B and L, and C a child of L, together with its conditional probability tables]
Calling initial_tree((G, P), A, a)
We have then
A=∅, a=∅
Compute λ values
λ(h1) = 1;λ(h2) = 1;
λ(b1) = 1; λ(b2) = 1;
λ(l1) = 1; λ(l2) = 1;
λ(c1) = 1; λ(c2) = 1;
Compute λ messages
λB(h1) = 1; λB(h2) = 1;
λL(h1) = 1; λL(h2) = 1;
λC (l1) = 1; λC (l2) = 1;
Calling initial_tree((G, P), A, a)
Compute P (h|∅)
P(h1|∅) = P(h1) = 0.2
P(h2|∅) = P(h2) = 0.8
Compute H’s π values
π(h1) = P(h1) = 0.2
π(h2) = P(h2) = 0.8
Send messages
send_π_msg(H, B)
send_π_msg(H, L)
The call send_π_msg(H, B)
H sends B a π message
πB(h1) = π(h1) λL(h1) = (0.2)(1) = 0.2
πB(h2) = π(h2) λL(h2) = (0.8)(1) = 0.8
Compute B's π values
π(b1) = P(b1|h1) πB(h1) + P(b1|h2) πB(h2) = (0.25)(0.2) + (0.05)(0.8) = 0.09
π(b2) = P(b2|h1) πB(h1) + P(b2|h2) πB(h2) = (0.75)(0.2) + (0.95)(0.8) = 0.91
The call send_π_msg(H, B)
Compute P(b|∅)
P(b1|∅) = α λ(b1) π(b1) = α(1)(0.09) = 0.09α
P(b2|∅) = α λ(b2) π(b2) = α(1)(0.91) = 0.91α
Then, normalize
P(b1|∅) = 0.09α / (0.09α + 0.91α) = 0.09
P(b2|∅) = 0.91α / (0.09α + 0.91α) = 0.91
The call send_π_msg(H, L)
H sends L a π message
πL(h1) = π(h1) λB(h1) = (0.2)(1) = 0.2
πL(h2) = π(h2) λB(h2) = (0.8)(1) = 0.8
Compute L's π values
π(l1) = P(l1|h1) πL(h1) + P(l1|h2) πL(h2) = (0.003)(0.2) + (0.00005)(0.8) = 0.00064
π(l2) = P(l2|h1) πL(h1) + P(l2|h2) πL(h2) = (0.997)(0.2) + (0.99995)(0.8) = 0.99936
Compute P(l|∅)
P(l1|∅) = α λ(l1) π(l1) = α(1)(0.00064) = 0.00064α
P(l2|∅) = α λ(l2) π(l2) = α(1)(0.99936) = 0.99936α
The call send_π_msg(H, L)
Then, normalize
P(l1|∅) = 0.00064α / (0.00064α + 0.99936α) = 0.00064
P(l2|∅) = 0.99936α / (0.00064α + 0.99936α) = 0.99936
The call send_π_msg(L, C)
L sends C a π message
πC(l1) = π(l1) = 0.00064
πC(l2) = π(l2) = 0.99936
Compute C's π values
π(c1) = P(c1|l1) πC(l1) + P(c1|l2) πC(l2) = (0.6)(0.00064) + (0.02)(0.99936) = 0.02037
π(c2) = P(c2|l1) πC(l1) + P(c2|l2) πC(l2) = (0.4)(0.00064) + (0.98)(0.99936) = 0.97963
The call send_π_msg(L, C)
Compute P(c|∅)
P(c1|∅) = α λ(c1) π(c1) = α(1)(0.02037) = 0.02037α
P(c2|∅) = α λ(c2) π(c2) = α(1)(0.97963) = 0.97963α
Normalize
P(c1|∅) = 0.02037α / (0.02037α + 0.97963α) = 0.02037
P(c2|∅) = 0.97963α / (0.02037α + 0.97963α) = 0.97963
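These numbers are easy to check mechanically. Before any evidence arrives, every λ value is 1, so P(·|∅) is just the normalized π value, and each π value is a one-step sum. A standalone Python sketch (the CPT entries are read off the computations above; the variable names are mine):

P_h = {'h1': 0.2, 'h2': 0.8}
P_b_h = {'b1': {'h1': 0.25, 'h2': 0.05}, 'b2': {'h1': 0.75, 'h2': 0.95}}
P_l_h = {'l1': {'h1': 0.003, 'h2': 0.00005}, 'l2': {'h1': 0.997, 'h2': 0.99995}}
P_c_l = {'c1': {'l1': 0.6, 'l2': 0.02}, 'c2': {'l1': 0.4, 'l2': 0.98}}

pi_b = {b: sum(P_b_h[b][h] * P_h[h] for h in P_h) for b in P_b_h}    # pi values of B
pi_l = {l: sum(P_l_h[l][h] * P_h[h] for h in P_h) for l in P_l_h}    # pi values of L
pi_c = {c: sum(P_c_l[c][l] * pi_l[l] for l in pi_l) for c in P_c_l}  # pi values of C

print(pi_b)  # ≈ {'b1': 0.09, 'b2': 0.91}
print(pi_l)  # ≈ {'l1': 0.00064, 'l2': 0.99936}
print(pi_c)  # ≈ {'c1': 0.02037, 'c2': 0.97963}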
Final Graph
We have then
[Figure: the final tree, root H with children B and L, and C a child of L, annotated with the probabilities computed above]
For the Generalization, Please Look at...
Look at pages 123-156 of
Richard E. Neapolitan (2003). Learning Bayesian Networks. Prentice Hall.
History
Invented in 1988
Invented by Lauritzen and Spiegelhalter (1988).
Something Notable
The general idea is that the propagation of evidence through the network can be carried out more efficiently by representing the joint probability distribution on an undirected graph called the junction tree (or join tree).
More in the Intuition
High-level Intuition
Computing marginals is straightforward in a tree structure.
Junction Tree Characteristics
The junction tree has the following characteristics
It is an undirected tree.
Its nodes are clusters of variables (i.e., sets of nodes from the original BN).
Given two clusters, C1 and C2, every node on the path between them contains their intersection C1 ∩ C2 (the running intersection property).
In addition
A separator, S, is associated with each edge and contains the variables in the intersection between neighboring nodes.
[Figure: a junction tree chain ABC — BCD — CDE, where the edge labels BC and CD are the separators S]
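The running intersection property in the third characteristic is mechanical to verify. A minimal Python sketch for a chain of clusters like the one in the figure (for a general junction tree one would check every pair of clusters along their unique tree path; the cluster sets are illustrative):

def has_running_intersection(chain):
    # For every pair of clusters, each cluster strictly between them on the
    # chain must contain their intersection.
    n = len(chain)
    return all(chain[i] & chain[j] <= chain[k]
               for i in range(n) for j in range(i + 1, n)
               for k in range(i + 1, j))

chain = [{'A', 'B', 'C'}, {'B', 'C', 'D'}, {'C', 'D', 'E'}]
print(has_running_intersection(chain))  # True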
Simplicial Node
Simplicial Node
In a graph G, a vertex v is called simplicial if and only if the subgraph of G induced by the vertex set {v} ∪ N(v) is a clique.
Here N(v) is the set of neighbors of v in the graph.
Example
Vertex 3 is simplicial, while vertex 4 is not.
[Figure: a four-vertex example graph on vertices 1, 2, 3, 4]
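Checking whether a vertex is simplicial is a one-liner over its neighborhood. A small Python sketch; since the figure's edges are not recoverable here, the adjacency sets below are a hypothetical graph chosen so that vertex 3 is simplicial and vertex 4 is not, matching the slide's claim:

from itertools import combinations

def is_simplicial(adj, v):
    # v is simplicial iff every pair of its neighbors is adjacent,
    # i.e., {v} ∪ N(v) induces a clique.
    return all(b in adj[a] for a, b in combinations(adj[v], 2))

adj = {1: {2, 3, 4}, 2: {1, 4}, 3: {1, 4}, 4: {1, 2, 3}}  # hypothetical graph
print(is_simplicial(adj, 3))  # True: its neighbors 1 and 4 are adjacent
print(is_simplicial(adj, 4))  # False: neighbors 2 and 3 are not adjacent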
Perfect Elimination Ordering
Definition
A graph G on n vertices is said to have a perfect elimination ordering if
and only if there is an ordering {v1, ..., vn} of G’s vertices, such that each
vi is simplicial in the subgraph induced by the vertices {v1, ..., vi}.
Chordal Graph
Definition
A chordal graph is one in which every cycle of four or more vertices has a chord, i.e., an edge that is not part of the cycle but connects two vertices of the cycle.
Definition
For any two vertices x, y ∈ G such that (x, y) ∉ E, an x-y separator is a set S ⊂ V such that the graph G − S has at least two disjoint connected components, one of which contains x and another of which contains y.
Chordal Graph
Theorem
For a graph G on n vertices, the following conditions are equivalent:
1 G has a perfect elimination ordering.
2 G is chordal.
3 If H is any induced subgraph of G and S is a vertex separator of H of
minimal size, S’s vertices induce a clique.
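The equivalence of conditions (1) and (2) yields a simple, if naive, chordality test: repeatedly delete a simplicial vertex; the graph is chordal exactly when this succeeds until no vertices remain. A minimal Python sketch (faster methods such as maximum cardinality search exist; the example graphs are illustrative):

from itertools import combinations

def is_chordal(adj):
    # Repeatedly eliminate a simplicial vertex; failing to find one
    # before the graph is empty means the graph is not chordal.
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    while adj:
        v = next((u for u in adj
                  if all(b in adj[a] for a, b in combinations(adj[u], 2))),
                 None)
        if v is None:
            return False
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
    return True

cycle = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(is_chordal(cycle))   # False: a 4-cycle has no chord
cycle[1].add(3); cycle[3].add(1)
print(is_chordal(cycle))   # True after adding the chord 1-3

The deletion order, reversed, is a perfect elimination ordering in the sense of the definition above.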
Maximal Clique
Definition
A maximal clique is a clique that cannot be extended by including one more adjacent vertex; that is, it is not a subset of a larger clique.
We have the following claims
1 A chordal graph with N vertices can have no more than N maximal cliques.
2 Given a chordal graph G = (V, E) with |V| = N, there exists an algorithm to find all the maximal cliques of G which takes no more than O(N^4) time.
Elimination Clique
Definition (Elimination Clique)
Given a chordal graph G and an elimination ordering for G which does not add any edges:
Suppose node i (assuming a labeling) is eliminated in some step of the elimination algorithm. Then the clique consisting of the node i together with its neighbors at that elimination step (which must be fully connected, since elimination adds no edges) is called an elimination clique.
Formally
Suppose node i is eliminated in the k-th step of the algorithm, and let G^(k) be the graph just before the k-th elimination step. Then the elimination clique is C_i = {i} ∪ N^(k)(i), where N^(k)(i) is the set of neighbors of i in the graph G^(k).
From This
Theorem
Given a chordal graph and an elimination ordering which does not add any edges, let C be the set of maximal cliques in the chordal graph, and let C_e = ∪_{i∈V} C_i be the set of elimination cliques obtained from this elimination ordering. Then C ⊆ C_e; in other words, every maximal clique is also an elimination clique for this particular ordering.
Something Notable
The theorem proves the two claims given earlier. First, it shows that a chordal graph cannot have more than N maximal cliques, since there are only N elimination cliques.
It is more
It also gives us an efficient algorithm for finding these N maximal cliques: simply go over each elimination clique and check whether it is maximal.
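The theorem suggests the algorithm directly: collect the N elimination cliques, then discard those contained in another. A minimal Python sketch, assuming a perfect elimination ordering that adds no fill-in edges is already known (the example graph and ordering are illustrative):

def elimination_cliques(adj, order):
    # C_i = {i} ∪ N^(k)(i): node i together with its neighbors in the
    # graph just before i is eliminated.
    adj = {v: set(ns) for v, ns in adj.items()}
    cliques = []
    for v in order:
        cliques.append({v} | adj[v])
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
    return cliques

def maximal_cliques(adj, order):
    # Every maximal clique is an elimination clique, so it suffices to
    # keep the elimination cliques not strictly contained in another.
    ce = elimination_cliques(adj, order)
    return [c for c in ce if not any(c < d for d in ce)]

# Triangle 1-2-3 with a pendant vertex 4 attached to 3:
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(maximal_cliques(adj, order=[4, 1, 2, 3]))  # [{3, 4}, {1, 2, 3}]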
Therefore
Even with a brute-force approach
It will not take more than |C_e|^2 × D = O(N^3) time, with D = max_{C∈C_e} |C|.
Because
Both the clique size and the number of elimination cliques are bounded by N.
Observation
The maximum clique problem, which is NP-hard on general graphs, is easy on chordal graphs.
We have the following definitions
Definition
The following are equivalent to the statement "G is a tree":
1 G is a connected, acyclic graph over N nodes.
2 G is a connected graph over N nodes with N − 1 edges.
3 G is a minimal connected graph over N nodes.
4 (Important) G is a graph over N nodes such that, for any two nodes i and j in G with i ≠ j, there is a unique path from i to j in G.
Theorem
For any graph G = (V, E), the following statements are equivalent:
1 G has a junction tree.
2 G is chordal.
Definition
Junction Tree
Given a graph G = (V, E), a graph G′ = (V′, E′) is said to be a Junction
Tree for G if and only if:
1 The nodes of G′ are the maximal cliques of G (i.e., G′ is a clique
graph of G).
2 G′ is a tree.
3 Running Intersection Property / Junction Tree Property: for each v ∈ V,
define G′_v to be the induced subgraph of G′ consisting of exactly those
nodes which correspond to maximal cliques of G that contain v. Then G′_v
must be a connected graph.
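To make the running intersection property concrete, here is a minimal sketch (the representation and helper name are our assumptions) that checks it by restricting a graph search to the cliques containing each variable; it returns True on one valid junction tree for the example developed later in these slides.

```python
from collections import defaultdict, deque

def has_running_intersection(cliques, tree_edges):
    """For every variable v, the cliques containing v must induce a
    connected subgraph of the tree."""
    adj = defaultdict(set)
    for c1, c2 in tree_edges:
        adj[c1].add(c2)
        adj[c2].add(c1)
    for v in set().union(*cliques):
        containing = {c for c in cliques if v in c}
        start = next(iter(containing))
        seen, frontier = {start}, deque([start])
        while frontier:                  # BFS restricted to `containing`
            for nbr in adj[frontier.popleft()] & containing:
                if nbr not in seen:
                    seen.add(nbr)
                    frontier.append(nbr)
        if seen != containing:
            return False
    return True

# One valid junction tree for the example built below.
BSL, BLE, LET, BEF, EX, AT = map(frozenset, ('BSL', 'BLE', 'LET', 'BEF', 'EX', 'AT'))
edges = [(BSL, BLE), (BLE, LET), (BLE, BEF), (BLE, EX), (LET, AT)]
print(has_running_intersection([BSL, BLE, LET, BEF, EX, AT], edges))  # True
```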
Step 1
Given a DAG G = (V, E) with |V| = N
Chordalize the graph using the elimination algorithm with an arbitrary
elimination ordering, if required.
For this, you can use the following greedy algorithm
Given a list of nodes, repeat until the list is empty:
1 Is the next vertex simplicial? If it is not, make it simplicial by
adding fill-in edges among its neighbors.
2 Remove it from the list.
Step 1
Another way
1 By the Moralization Procedure.
2 Triangulate the moral graph.
Moralization Procedure
1 Add edges between all pairs of nodes that have a common child.
2 Make all edges in the graph undirected.
Triangulate the moral graph
An undirected graph is triangulated if every cycle of length greater than 3
possesses a chord.
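A minimal sketch of both sub-steps, assuming the DAG is given as a dict mapping each node to the set of its parents (the helper names are ours, and the DAG structure is read off the example figure that follows):

```python
def moralize(parents):
    """Marry common parents and drop edge directions."""
    adj = {v: set() for v in parents}
    for child, pa in parents.items():
        for p in pa:                          # undirect parent-child edges
            adj[child].add(p)
            adj[p].add(child)
        pa = sorted(pa)
        for i in range(len(pa)):              # marry all pairs of parents
            for j in range(i + 1, len(pa)):
                adj[pa[i]].add(pa[j])
                adj[pa[j]].add(pa[i])
    return adj

def triangulate(adj, order):
    """Eliminate vertices in `order`; connecting the not-yet-eliminated
    neighbors of each vertex makes the result chordal."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    gone = set()
    for v in order:
        live = sorted(u for u in adj[v] if u not in gone)
        for i in range(len(live)):
            for j in range(i + 1, len(live)):
                adj[live[i]].add(live[j])     # fill-in edge
                adj[live[j]].add(live[i])
        gone.add(v)
    return adj

# The example DAG below (assumed arrows): A->T, S->L, S->B, {T,L}->E,
# E->X, {E,B}->F.
parents = {'A': set(), 'S': set(), 'T': {'A'}, 'L': {'S'}, 'B': {'S'},
           'E': {'T', 'L'}, 'X': {'E'}, 'F': {'E', 'B'}}
chordal = triangulate(moralize(parents), ['A', 'X', 'F', 'T', 'S', 'B', 'L', 'E'])
# The only fill-in edge is B-L, matching the triangulated graph shown below.
```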
Step 2
Find the maximal cliques in the chordal graph
List the N elimination cliques:
({v_N} ∪ N(v_N)) ∩ {v_1, ..., v_N}
({v_{N−1}} ∪ N(v_{N−1})) ∩ {v_1, ..., v_{N−1}}
· · ·
{v_1}
Note: by the earlier theorem, every maximal clique of the chordal
graph appears among these elimination cliques, so it suffices to keep
the ones that are maximal.
Step 3
Compute the separator sets for each pair of maximal cliques and
construct a weighted clique graph
For each pair of maximal cliques (Ci, Cj) in the graph
We check whether they possess any common variables.
If yes, we designate a separator set
between these two cliques as Sij = Ci ∩ Cj.
Then, from these separators, we build a clique graph
Nodes are the cliques.
Edges (Ci, Cj) are added with weight |Ci ∩ Cj| if |Ci ∩ Cj| > 0.
Step 3
This step can be implemented quickly in practice using a hash table
Running time: O(|C|^2 · D) = O(N^2 D)
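A minimal sketch of this step (the naive pairwise version rather than the hash-table one; the names are ours):

```python
from itertools import combinations

def weighted_clique_graph(cliques):
    """Edges (weight, Ci, Cj) for every pair with a nonempty separator."""
    return [(len(c1 & c2), c1, c2)
            for c1, c2 in combinations(cliques, 2) if c1 & c2]

cliques = [frozenset(c) for c in ('BSL', 'BLE', 'LET', 'BEF', 'EX', 'AT')]
edges = weighted_clique_graph(cliques)   # e.g. weight 2 for BSL-BLE
```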
Step 4
Compute a maximum-weight spanning tree on the weighted clique
graph to obtain a junction tree
You can use either Kruskal's or Prim's algorithm, adapted to maximum
weights, for this.
We will give Kruskal’s algorithm
For finding the maximum-weight spanning tree.
Step 4
Maximum-weight Kruskal’s algorithm
Initialize an edgeless graph T whose nodes are all the maximal cliques
in our chordal graph.
Then
We will add edges to T until it becomes a junction tree.
Sort the m edges ei in our clique graph from Step 3 by weight wi
We have e1, e2, ..., em with w1 ≥ w2 ≥ · · · ≥ wm.
Step 4
For i = 1, 2, ..., m
1 Add edge ei to T if it does not introduce a cycle.
2 If |C| − 1 edges have been added, quit.
Running time, given that |E| = O(|C|^2):
O(|C|^2 log |C|^2) = O(|C|^2 log |C|) = O(N^2 log N)
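A minimal sketch of the maximum-weight Kruskal step, reusing the `edges` list from the Step 3 sketch above (union-find detects cycles):

```python
def max_weight_spanning_tree(cliques, edges):
    parent = {c: c for c in cliques}

    def find(c):                          # union-find with path compression
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    tree = []
    for w, c1, c2 in sorted(edges, key=lambda e: -e[0]):
        r1, r2 = find(c1), find(c2)
        if r1 != r2:                      # no cycle: accept the edge
            parent[r1] = r2
            tree.append((c1, c2))
            if len(tree) == len(cliques) - 1:
                break
    return tree

junction_tree_edges = max_weight_spanning_tree(cliques, edges)
```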
How do you build a Junction Tree?
Given a General DAG
[Figure: the example DAG over the nodes S, B, L, F, E, T, X, A]
Build a Chordal Graph
Moral Graph – marry common parents and remove arrows.
[Figure: the moralized, undirected version of the same graph]
How do you build a Junction Tree?
Triangulate the moral graph
An undirected graph is triangulated if every cycle of length greater
than 3 possesses a chord.
[Figure: the triangulated moral graph over S, B, L, F, E, T, X, A]
Listing of Cliques
Identify the Cliques
A clique is a subset of nodes which is complete (i.e. there is an edge
between every pair of nodes) and maximal.
[Figure: the triangulated graph with its maximal cliques highlighted]
The maximal cliques are: {B,S,L}, {B,L,E}, {B,E,F}, {L,E,T}, {A,T}, {E,X}.
Build the Clique Graph
Clique Graph
Add an edge between Ci and Cj with weight |Ci ∩ Cj| whenever |Ci ∩ Cj| > 0.
[Figure: the weighted clique graph on BSL, BLE, LET, BEF, EX, AT, with
edge weights in {1, 2}]
Getting The Junction Tree
Run the maximum-weight Kruskal’s algorithm
[Figure: the weighted clique graph, with the edges selected by Kruskal’s
algorithm highlighted]
Getting The Junction Tree
Finally
[Figure: the resulting junction tree on the cliques BSL, BLE, LET, BEF, EX, AT]
Potential Representation for the Junction Tree
Something Notable
The joint probability distribution can now be represented in terms of
potential functions, φ, defined on each clique and each separator.
Thus
P(x) = \frac{\prod_{i=1}^{n} \phi_{C_i}(x_{c_i})}{\prod_{j=1}^{m} \phi_{S_j}(x_{s_j})}
where x = (x_{c_1}, ..., x_{c_n}), each variable x_{c_i} corresponds to a
clique, and each x_{s_j} corresponds to a separator.
The basic idea is to represent the probability distribution corresponding
to any graph as a product of clique potentials:
P(x) = \frac{1}{Z} \prod_{i=1}^{n} \phi_{C_i}(x_{c_i})
Then
Main idea
The idea is to transform one representation of the joint distribution
into another in which, for each clique c, the potential function gives
the marginal distribution for the variables in c, i.e.,
\phi_C(x_c) = P(x_c)
This will also apply to each separator, s.
Now, Initialization
To initialize the potential functions
1 Set all potentials to unity.
2 For each variable, xi, select one node in the junction tree (i.e., one
clique) containing both that variable and its parents, pa(xi), in the
original DAG.
3 Multiply that clique’s potential by P(xi|pa(xi)).
Then
For example, at the beginning we have φBSL = φBLF = φLX = 1.
[Figure: a small DAG over S, B, L, F, X and its junction tree chain
BSL – BLF – LX, shown after initialization]
Propagating Information in a Junction Tree
Passing information using the separators
Passing information from one clique C1 to another C2 via the separator
between them, S0, requires two steps.
First Step
Obtain a new potential for S0 by marginalizing out the variables in C1
that are not in S0:
\phi^{*}_{S_0} = \sum_{C_1 - S_0} \phi_{C_1}
Propagating Information in a Junction Tree
Second Step
Obtain a new potential for C2:
\phi^{*}_{C_2} = \phi_{C_2} \lambda_{S_0}
Where
\lambda_{S_0} = \frac{\phi^{*}_{S_0}}{\phi_{S_0}}
An Example
Consider a flow from the clique {B,S,L} to {B,L,F}
[Figure: the chain junction tree BSL –(BL)– BLF –(L)– LX]
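A minimal sketch of one such flow (the table representation, keyed by (variable, value) pairs in sorted variable order, is our assumption):

```python
from collections import defaultdict

def marginalize(phi, keep):
    """Sum out every variable not in `keep`."""
    out = defaultdict(float)
    for assignment, p in phi.items():
        out[tuple((v, x) for v, x in assignment if v in keep)] += p
    return dict(out)

def pass_flow(phi_C1, phi_S0, phi_C2, sep_vars):
    """One flow C1 -> C2: phi*_S0 = sum_{C1 - S0} phi_C1, then
    phi*_C2 = phi_C2 * (phi*_S0 / phi_S0)."""
    new_S0 = marginalize(phi_C1, sep_vars)
    new_C2 = {}
    for assignment, p in phi_C2.items():
        key = tuple((v, x) for v, x in assignment if v in sep_vars)
        new_C2[assignment] = p * new_S0[key] / phi_S0[key]
    return new_S0, new_C2
```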
An Example
Initial representation
φBSL = P(B|S) P(L|S) P(S):
            l1         l2
s1, b1   0.00015    0.04985
s1, b2   0.00045    0.14955
s2, b1   0.000002   0.039998
s2, b2   0.000038   0.759962

φBL = 1:
       l1   l2
b1      1    1
b2      1    1

φBLF = P(F|B, L):
           l1     l2
f1, b1   0.75   0.1
f1, b2   0.5    0.05
f2, b1   0.25   0.9
f2, b2   0.5    0.95
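These tables follow from CPTs that the slides never state explicitly; the values below are recovered arithmetically from the φBSL table itself (so treat them as reconstructions) and reproduce it as P(S) P(B|S) P(L|S):

```python
P_S = {'s1': 0.2, 's2': 0.8}
P_B = {('b1', 's1'): 0.25, ('b2', 's1'): 0.75,      # P(B | S)
       ('b1', 's2'): 0.05, ('b2', 's2'): 0.95}
P_L = {('l1', 's1'): 0.003,   ('l2', 's1'): 0.997,  # P(L | S)
       ('l1', 's2'): 0.00005, ('l2', 's2'): 0.99995}

phi_BSL = {(s, b, l): P_S[s] * P_B[(b, s)] * P_L[(l, s)]
           for s in P_S for b in ('b1', 'b2') for l in ('l1', 'l2')}
print(phi_BSL[('s1', 'b1', 'l1')])   # 0.00015 (up to float rounding)
```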
An Example
After the flow
φBSL is unchanged:
            l1         l2
s1, b1   0.00015    0.04985
s1, b2   0.00045    0.14955
s2, b1   0.000002   0.039998
s2, b2   0.000038   0.759962

φ*BL = Σ_S φBSL:
         l1         l2
b1   0.000152   0.089848
b2   0.000488   0.909512

φ*BLF = φBLF · λBL:
            l1          l2
f1, b1   0.000114   0.0089848
f1, b2   0.000244   0.0454756
f2, b1   0.000038   0.0808632
f2, b2   0.000244   0.8640364
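Continuing the sketches above, this flow can be reproduced with `pass_flow` (keys are (variable, value) pairs in sorted variable order; phi_BL starts at unity everywhere):

```python
phi_BSL_t = {(('B', b), ('L', l), ('S', s)): p
             for (s, b, l), p in phi_BSL.items()}
phi_BL = {(('B', b), ('L', l)): 1.0
          for b in ('b1', 'b2') for l in ('l1', 'l2')}
phi_BLF = {(('B', b), ('F', f), ('L', l)): p for (f, b, l), p in {
    ('f1', 'b1', 'l1'): 0.75, ('f1', 'b1', 'l2'): 0.10,
    ('f1', 'b2', 'l1'): 0.50, ('f1', 'b2', 'l2'): 0.05,
    ('f2', 'b1', 'l1'): 0.25, ('f2', 'b1', 'l2'): 0.90,
    ('f2', 'b2', 'l1'): 0.50, ('f2', 'b2', 'l2'): 0.95}.items()}

new_BL, new_BLF = pass_flow(phi_BSL_t, phi_BL, phi_BLF, {'B', 'L'})
print(new_BL[(('B', 'b1'), ('L', 'l1'))])   # ≈ 0.000152, as in the table
```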
Now Introduce Evidence
We have
A flow from the clique {B,S,L} to {B,L,F}, but this time we have the
information that Joe is a smoker, S = s1.
Incorporation of Evidence
φBSL = P(B|S) P(L|S) P(S), with the entries inconsistent with s1 set to 0:
            l1        l2
s1, b1   0.00015   0.04985
s1, b2   0.00045   0.14955
s2, b1      0         0
s2, b2      0         0

φBL = 1:
       l1   l2
b1      1    1
b2      1    1

φBLF = P(F|B, L):
           l1     l2
f1, b1   0.75   0.1
f1, b2   0.5    0.05
f2, b1   0.25   0.9
f2, b2   0.5    0.95
An Example
After the flow
φBSL is unchanged:
            l1        l2
s1, b1   0.00015   0.04985
s1, b2   0.00045   0.14955
s2, b1      0         0
s2, b2      0         0

φ*BL = Σ_S φBSL:
        l1        l2
b1   0.00015   0.04985
b2   0.00045   0.14955

φ*BLF = φBLF · λBL:
             l1          l2
f1, b1   0.0001125   0.004985
f1, b2   0.000225    0.0074775
f2, b1   0.0000375   0.044865
f2, b2   0.000225    0.1420725
The Full Propagation
Two-phase propagation (Jensen et al., 1990)
1 Select an arbitrary root clique, C0.
2 Collection phase – flows are passed from the periphery to C0.
3 Distribution phase – flows are passed from C0 to the periphery.
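A minimal sketch of that schedule (`send_flow` would be the `pass_flow` routine above applied to the potentials of the two cliques; `neighbors` maps each clique to its neighbors in the junction tree):

```python
def two_phase(root, neighbors, send_flow):
    """Collect flows toward `root`, then distribute them back out."""
    parent, order = {root: None}, [root]
    for c in order:                          # BFS from the root
        for n in neighbors[c]:
            if n not in parent:
                parent[n] = c
                order.append(n)
    for c in reversed(order):                # collection: leaves -> root
        if parent[c] is not None:
            send_flow(c, parent[c])
    for c in order:                          # distribution: root -> leaves
        if parent[c] is not None:
            send_flow(parent[c], c)
```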
Example
Distribution
[Figure: flows passed from the root clique outward to the leaves of the
junction tree]
Example
Collection
[Figure: flows passed from the leaf cliques inward to the root of the
junction tree]
The Full Propagation
After the two propagation phases have been carried out
The junction tree will be in equilibrium, with each clique containing
the joint probability distribution for the variables it contains.
Marginal probabilities for individual variables can then be obtained
from the cliques.
Now, some evidence E can be included before propagation
By selecting a clique for each variable for which evidence is available.
The potential for that clique is then set to 0 for any configuration
which differs from the evidence.
The Full Propagation
After propagation, the result will be
P(x, E) = \frac{\prod_{c \in C} \phi_c(x_c, E)}{\prod_{s \in S} \phi_s(x_s, E)}
After normalization
P(x|E) = \frac{\prod_{c \in C} \phi_c(x_c|E)}{\prod_{s \in S} \phi_s(x_s|E)}
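For instance, a minimal sketch using the evidence-flow table φ*BLF from the example above: its total mass is the probability of the evidence, and dividing through yields the posterior over {F, B, L}.

```python
phi_BLF_evidence = {
    ('f1', 'b1', 'l1'): 0.0001125, ('f1', 'b1', 'l2'): 0.004985,
    ('f1', 'b2', 'l1'): 0.000225,  ('f1', 'b2', 'l2'): 0.0074775,
    ('f2', 'b1', 'l1'): 0.0000375, ('f2', 'b1', 'l2'): 0.044865,
    ('f2', 'b2', 'l1'): 0.000225,  ('f2', 'b2', 'l2'): 0.1420725,
}
Z = sum(phi_BLF_evidence.values())                 # P(S = s1) = 0.2
posterior = {k: v / Z for k, v in phi_BLF_evidence.items()}  # P(F,B,L | s1)
```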

More Related Content

What's hot

23 Machine Learning Feature Generation
23 Machine Learning Feature Generation23 Machine Learning Feature Generation
23 Machine Learning Feature GenerationAndres Mendez-Vazquez
 
11 Machine Learning Important Issues in Machine Learning
11 Machine Learning Important Issues in Machine Learning11 Machine Learning Important Issues in Machine Learning
11 Machine Learning Important Issues in Machine LearningAndres Mendez-Vazquez
 
20 k-means, k-center, k-meoids and variations
20 k-means, k-center, k-meoids and variations20 k-means, k-center, k-meoids and variations
20 k-means, k-center, k-meoids and variationsAndres Mendez-Vazquez
 
Fuzzy logic andits Applications
Fuzzy logic andits ApplicationsFuzzy logic andits Applications
Fuzzy logic andits ApplicationsDrATAMILARASIMCA
 
Lecture 3 qualtifed rules of inference
Lecture 3 qualtifed rules of inferenceLecture 3 qualtifed rules of inference
Lecture 3 qualtifed rules of inferenceasimnawaz54
 
07 Machine Learning - Expectation Maximization
07 Machine Learning - Expectation Maximization07 Machine Learning - Expectation Maximization
07 Machine Learning - Expectation MaximizationAndres Mendez-Vazquez
 
An overview of Bayesian testing
An overview of Bayesian testingAn overview of Bayesian testing
An overview of Bayesian testingChristian Robert
 
27 Machine Learning Unsupervised Measure Properties
27 Machine Learning Unsupervised Measure Properties27 Machine Learning Unsupervised Measure Properties
27 Machine Learning Unsupervised Measure PropertiesAndres Mendez-Vazquez
 
Tutorial on testing at O'Bayes 2015, Valencià, June 1, 2015
Tutorial on testing at O'Bayes 2015, Valencià, June 1, 2015Tutorial on testing at O'Bayes 2015, Valencià, June 1, 2015
Tutorial on testing at O'Bayes 2015, Valencià, June 1, 2015Christian Robert
 
Machine Learning and Data Mining - Decision Trees
Machine Learning and Data Mining - Decision TreesMachine Learning and Data Mining - Decision Trees
Machine Learning and Data Mining - Decision Treeswebisslides
 
Can we estimate a constant?
Can we estimate a constant?Can we estimate a constant?
Can we estimate a constant?Christian Robert
 
Probability and Statistics
Probability and StatisticsProbability and Statistics
Probability and StatisticsMalik Sb
 
An Extension to the Zero-Inflated Generalized Power Series Distributions
An Extension to the Zero-Inflated Generalized Power Series DistributionsAn Extension to the Zero-Inflated Generalized Power Series Distributions
An Extension to the Zero-Inflated Generalized Power Series Distributionsinventionjournals
 
Matroid Basics
Matroid BasicsMatroid Basics
Matroid BasicsASPAK2014
 
MLHEP Lectures - day 1, basic track
MLHEP Lectures - day 1, basic trackMLHEP Lectures - day 1, basic track
MLHEP Lectures - day 1, basic trackarogozhnikov
 
EVEN GRACEFUL LABELLING OF A CLASS OF TREES
EVEN GRACEFUL LABELLING OF A CLASS OF TREESEVEN GRACEFUL LABELLING OF A CLASS OF TREES
EVEN GRACEFUL LABELLING OF A CLASS OF TREESFransiskeran
 
Rademacher Averages: Theory and Practice
Rademacher Averages: Theory and PracticeRademacher Averages: Theory and Practice
Rademacher Averages: Theory and PracticeTwo Sigma
 

What's hot (20)

23 Machine Learning Feature Generation
23 Machine Learning Feature Generation23 Machine Learning Feature Generation
23 Machine Learning Feature Generation
 
11 Machine Learning Important Issues in Machine Learning
11 Machine Learning Important Issues in Machine Learning11 Machine Learning Important Issues in Machine Learning
11 Machine Learning Important Issues in Machine Learning
 
20 k-means, k-center, k-meoids and variations
20 k-means, k-center, k-meoids and variations20 k-means, k-center, k-meoids and variations
20 k-means, k-center, k-meoids and variations
 
Lecture10 - Naïve Bayes
Lecture10 - Naïve BayesLecture10 - Naïve Bayes
Lecture10 - Naïve Bayes
 
Fuzzy logic andits Applications
Fuzzy logic andits ApplicationsFuzzy logic andits Applications
Fuzzy logic andits Applications
 
Lecture 3 qualtifed rules of inference
Lecture 3 qualtifed rules of inferenceLecture 3 qualtifed rules of inference
Lecture 3 qualtifed rules of inference
 
07 Machine Learning - Expectation Maximization
07 Machine Learning - Expectation Maximization07 Machine Learning - Expectation Maximization
07 Machine Learning - Expectation Maximization
 
An overview of Bayesian testing
An overview of Bayesian testingAn overview of Bayesian testing
An overview of Bayesian testing
 
27 Machine Learning Unsupervised Measure Properties
27 Machine Learning Unsupervised Measure Properties27 Machine Learning Unsupervised Measure Properties
27 Machine Learning Unsupervised Measure Properties
 
Tutorial on testing at O'Bayes 2015, Valencià, June 1, 2015
Tutorial on testing at O'Bayes 2015, Valencià, June 1, 2015Tutorial on testing at O'Bayes 2015, Valencià, June 1, 2015
Tutorial on testing at O'Bayes 2015, Valencià, June 1, 2015
 
Machine Learning and Data Mining - Decision Trees
Machine Learning and Data Mining - Decision TreesMachine Learning and Data Mining - Decision Trees
Machine Learning and Data Mining - Decision Trees
 
Can we estimate a constant?
Can we estimate a constant?Can we estimate a constant?
Can we estimate a constant?
 
Module 4 part_1
Module 4 part_1Module 4 part_1
Module 4 part_1
 
Probability and Statistics
Probability and StatisticsProbability and Statistics
Probability and Statistics
 
An Extension to the Zero-Inflated Generalized Power Series Distributions
An Extension to the Zero-Inflated Generalized Power Series DistributionsAn Extension to the Zero-Inflated Generalized Power Series Distributions
An Extension to the Zero-Inflated Generalized Power Series Distributions
 
Matroid Basics
Matroid BasicsMatroid Basics
Matroid Basics
 
MLHEP Lectures - day 1, basic track
MLHEP Lectures - day 1, basic trackMLHEP Lectures - day 1, basic track
MLHEP Lectures - day 1, basic track
 
EVEN GRACEFUL LABELLING OF A CLASS OF TREES
EVEN GRACEFUL LABELLING OF A CLASS OF TREESEVEN GRACEFUL LABELLING OF A CLASS OF TREES
EVEN GRACEFUL LABELLING OF A CLASS OF TREES
 
Astaño 4
Astaño 4Astaño 4
Astaño 4
 
Rademacher Averages: Theory and Practice
Rademacher Averages: Theory and PracticeRademacher Averages: Theory and Practice
Rademacher Averages: Theory and Practice
 

Viewers also liked

02 probabilistic inference in graphical models
02 probabilistic inference in graphical models02 probabilistic inference in graphical models
02 probabilistic inference in graphical modelszukun
 
Bayesian Networks with R and Hadoop
Bayesian Networks with R and HadoopBayesian Networks with R and Hadoop
Bayesian Networks with R and HadoopDataWorks Summit
 
World Cup Qualification Prediction - How it works
World Cup Qualification Prediction - How it worksWorld Cup Qualification Prediction - How it works
World Cup Qualification Prediction - How it worksBayesiaWC2010
 
Physics of Algorithms Talk
Physics of Algorithms TalkPhysics of Algorithms Talk
Physics of Algorithms Talkjasonj383
 
Efficient Belief Propagation in Depth Finding
Efficient Belief Propagation in Depth FindingEfficient Belief Propagation in Depth Finding
Efficient Belief Propagation in Depth FindingSamantha Luber
 
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache GiraphAvery Ching
 
A Movement Recognition Method using LBP
A Movement Recognition Method using LBPA Movement Recognition Method using LBP
A Movement Recognition Method using LBPZihui Li
 
SocNL: Bayesian Label Propagation with Confidence
SocNL: Bayesian Label Propagation with ConfidenceSocNL: Bayesian Label Propagation with Confidence
SocNL: Bayesian Label Propagation with ConfidenceYuto Yamaguchi
 
Peter LaBrash resume
Peter LaBrash resumePeter LaBrash resume
Peter LaBrash resumePeter LaBrash
 
Presentación Iriana Colina Seguridad industrial 2 semestre
Presentación Iriana Colina Seguridad industrial 2 semestrePresentación Iriana Colina Seguridad industrial 2 semestre
Presentación Iriana Colina Seguridad industrial 2 semestrealbanyta
 
áGuia azul gracilene pinto
áGuia azul gracilene pintoáGuia azul gracilene pinto
áGuia azul gracilene pintoLuzia Gabriele
 
Introducción a la computacón
Introducción a la computacónIntroducción a la computacón
Introducción a la computacónCabrera Miguel
 
How to prepare for class 12 board exams
How to prepare for class 12 board examsHow to prepare for class 12 board exams
How to prepare for class 12 board examsEdnexa
 
Digimon leonardo zec & ivan đopar 7.d
Digimon   leonardo zec & ivan đopar 7.dDigimon   leonardo zec & ivan đopar 7.d
Digimon leonardo zec & ivan đopar 7.dsilvijedevald
 
Cake Phpで簡単問い合わせフォームの作り方
Cake Phpで簡単問い合わせフォームの作り方Cake Phpで簡単問い合わせフォームの作り方
Cake Phpで簡単問い合わせフォームの作り方柴田 篤志
 

Viewers also liked (18)

02 probabilistic inference in graphical models
02 probabilistic inference in graphical models02 probabilistic inference in graphical models
02 probabilistic inference in graphical models
 
Bayesian Networks with R and Hadoop
Bayesian Networks with R and HadoopBayesian Networks with R and Hadoop
Bayesian Networks with R and Hadoop
 
Bn presentation
Bn presentationBn presentation
Bn presentation
 
World Cup Qualification Prediction - How it works
World Cup Qualification Prediction - How it worksWorld Cup Qualification Prediction - How it works
World Cup Qualification Prediction - How it works
 
C04922125
C04922125C04922125
C04922125
 
Physics of Algorithms Talk
Physics of Algorithms TalkPhysics of Algorithms Talk
Physics of Algorithms Talk
 
Efficient Belief Propagation in Depth Finding
Efficient Belief Propagation in Depth FindingEfficient Belief Propagation in Depth Finding
Efficient Belief Propagation in Depth Finding
 
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph
 
Bayes Belief Network
Bayes Belief NetworkBayes Belief Network
Bayes Belief Network
 
A Movement Recognition Method using LBP
A Movement Recognition Method using LBPA Movement Recognition Method using LBP
A Movement Recognition Method using LBP
 
SocNL: Bayesian Label Propagation with Confidence
SocNL: Bayesian Label Propagation with ConfidenceSocNL: Bayesian Label Propagation with Confidence
SocNL: Bayesian Label Propagation with Confidence
 
Peter LaBrash resume
Peter LaBrash resumePeter LaBrash resume
Peter LaBrash resume
 
Presentación Iriana Colina Seguridad industrial 2 semestre
Presentación Iriana Colina Seguridad industrial 2 semestrePresentación Iriana Colina Seguridad industrial 2 semestre
Presentación Iriana Colina Seguridad industrial 2 semestre
 
áGuia azul gracilene pinto
áGuia azul gracilene pintoáGuia azul gracilene pinto
áGuia azul gracilene pinto
 
Introducción a la computacón
Introducción a la computacónIntroducción a la computacón
Introducción a la computacón
 
How to prepare for class 12 board exams
How to prepare for class 12 board examsHow to prepare for class 12 board exams
How to prepare for class 12 board exams
 
Digimon leonardo zec & ivan đopar 7.d
Digimon   leonardo zec & ivan đopar 7.dDigimon   leonardo zec & ivan đopar 7.d
Digimon leonardo zec & ivan đopar 7.d
 
Cake Phpで簡単問い合わせフォームの作り方
Cake Phpで簡単問い合わせフォームの作り方Cake Phpで簡単問い合わせフォームの作り方
Cake Phpで簡単問い合わせフォームの作り方
 

Similar to Artificial Intelligence 06.3 Bayesian Networks - Belief Propagation - Junction Trees

Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9Daisuke Yoneoka
 
Distance Based Indexing
Distance Based IndexingDistance Based Indexing
Distance Based Indexingketan533
 
Applicationofpartialderivativeswithtwovariables 140225070102-phpapp01 (1)
Applicationofpartialderivativeswithtwovariables 140225070102-phpapp01 (1)Applicationofpartialderivativeswithtwovariables 140225070102-phpapp01 (1)
Applicationofpartialderivativeswithtwovariables 140225070102-phpapp01 (1)shreemadghodasra
 
Application of partial derivatives with two variables
Application of partial derivatives with two variablesApplication of partial derivatives with two variables
Application of partial derivatives with two variablesSagar Patel
 
A common fixed point theorem in cone metric spaces
A common fixed point theorem in cone metric spacesA common fixed point theorem in cone metric spaces
A common fixed point theorem in cone metric spacesAlexander Decker
 
The existence of common fixed point theorems of generalized contractive mappi...
The existence of common fixed point theorems of generalized contractive mappi...The existence of common fixed point theorems of generalized contractive mappi...
The existence of common fixed point theorems of generalized contractive mappi...Alexander Decker
 
A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...
A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...
A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...Joe Suzuki
 
A Short Course in Data Stream Mining
A Short Course in Data Stream MiningA Short Course in Data Stream Mining
A Short Course in Data Stream MiningAlbert Bifet
 
Distributionworkshop 2.pptx
Distributionworkshop 2.pptxDistributionworkshop 2.pptx
Distributionworkshop 2.pptxnagalakshmig4
 
Appendix to MLPI Lecture 2 - Monte Carlo Methods (Basics)
Appendix to MLPI Lecture 2 - Monte Carlo Methods (Basics)Appendix to MLPI Lecture 2 - Monte Carlo Methods (Basics)
Appendix to MLPI Lecture 2 - Monte Carlo Methods (Basics)Dahua Lin
 
A generalisation of the ratio-of-uniform algorithm
A generalisation of the ratio-of-uniform algorithmA generalisation of the ratio-of-uniform algorithm
A generalisation of the ratio-of-uniform algorithmChristian Robert
 
Poggi analytics - star - 1a
Poggi   analytics - star - 1aPoggi   analytics - star - 1a
Poggi analytics - star - 1aGaston Liberman
 
(C f)- weak contraction in cone metric spaces
(C f)- weak contraction in cone metric spaces(C f)- weak contraction in cone metric spaces
(C f)- weak contraction in cone metric spacesAlexander Decker
 

Similar to Artificial Intelligence 06.3 Bayesian Networks - Belief Propagation - Junction Trees (20)

Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9
 
Distance Based Indexing
Distance Based IndexingDistance Based Indexing
Distance Based Indexing
 
Applicationofpartialderivativeswithtwovariables 140225070102-phpapp01 (1)
Applicationofpartialderivativeswithtwovariables 140225070102-phpapp01 (1)Applicationofpartialderivativeswithtwovariables 140225070102-phpapp01 (1)
Applicationofpartialderivativeswithtwovariables 140225070102-phpapp01 (1)
 
Application of partial derivatives with two variables
Application of partial derivatives with two variablesApplication of partial derivatives with two variables
Application of partial derivatives with two variables
 
A common fixed point theorem in cone metric spaces
A common fixed point theorem in cone metric spacesA common fixed point theorem in cone metric spaces
A common fixed point theorem in cone metric spaces
 
Approximate Tree Kernels
Approximate Tree KernelsApproximate Tree Kernels
Approximate Tree Kernels
 
The existence of common fixed point theorems of generalized contractive mappi...
The existence of common fixed point theorems of generalized contractive mappi...The existence of common fixed point theorems of generalized contractive mappi...
The existence of common fixed point theorems of generalized contractive mappi...
 
Pmath 351 note
Pmath 351 notePmath 351 note
Pmath 351 note
 
i2ml3e-chap2.pptx
i2ml3e-chap2.pptxi2ml3e-chap2.pptx
i2ml3e-chap2.pptx
 
I2ml3e chap2
I2ml3e chap2I2ml3e chap2
I2ml3e chap2
 
A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...
A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...
A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...
 
A Short Course in Data Stream Mining
A Short Course in Data Stream MiningA Short Course in Data Stream Mining
A Short Course in Data Stream Mining
 
M03 nb-02
M03 nb-02M03 nb-02
M03 nb-02
 
Distributionworkshop 2.pptx
Distributionworkshop 2.pptxDistributionworkshop 2.pptx
Distributionworkshop 2.pptx
 
Appendix to MLPI Lecture 2 - Monte Carlo Methods (Basics)
Appendix to MLPI Lecture 2 - Monte Carlo Methods (Basics)Appendix to MLPI Lecture 2 - Monte Carlo Methods (Basics)
Appendix to MLPI Lecture 2 - Monte Carlo Methods (Basics)
 
8803-09-lec16.pdf
8803-09-lec16.pdf8803-09-lec16.pdf
8803-09-lec16.pdf
 
A generalisation of the ratio-of-uniform algorithm
A generalisation of the ratio-of-uniform algorithmA generalisation of the ratio-of-uniform algorithm
A generalisation of the ratio-of-uniform algorithm
 
Poggi analytics - star - 1a
Poggi   analytics - star - 1aPoggi   analytics - star - 1a
Poggi analytics - star - 1a
 
The integral
The integralThe integral
The integral
 
(C f)- weak contraction in cone metric spaces
(C f)- weak contraction in cone metric spaces(C f)- weak contraction in cone metric spaces
(C f)- weak contraction in cone metric spaces
 

More from Andres Mendez-Vazquez

01.04 orthonormal basis_eigen_vectors
01.04 orthonormal basis_eigen_vectors01.04 orthonormal basis_eigen_vectors
01.04 orthonormal basis_eigen_vectorsAndres Mendez-Vazquez
 
01.03 squared matrices_and_other_issues
01.03 squared matrices_and_other_issues01.03 squared matrices_and_other_issues
01.03 squared matrices_and_other_issuesAndres Mendez-Vazquez
 
05 backpropagation automatic_differentiation
05 backpropagation automatic_differentiation05 backpropagation automatic_differentiation
05 backpropagation automatic_differentiationAndres Mendez-Vazquez
 
01 Introduction to Neural Networks and Deep Learning
01 Introduction to Neural Networks and Deep Learning01 Introduction to Neural Networks and Deep Learning
01 Introduction to Neural Networks and Deep LearningAndres Mendez-Vazquez
 
25 introduction reinforcement_learning
25 introduction reinforcement_learning25 introduction reinforcement_learning
25 introduction reinforcement_learningAndres Mendez-Vazquez
 
Neural Networks and Deep Learning Syllabus
Neural Networks and Deep Learning SyllabusNeural Networks and Deep Learning Syllabus
Neural Networks and Deep Learning SyllabusAndres Mendez-Vazquez
 
Introduction to artificial_intelligence_syllabus
Introduction to artificial_intelligence_syllabusIntroduction to artificial_intelligence_syllabus
Introduction to artificial_intelligence_syllabusAndres Mendez-Vazquez
 
Ideas about a Bachelor in Machine Learning/Data Sciences
Ideas about a Bachelor in Machine Learning/Data SciencesIdeas about a Bachelor in Machine Learning/Data Sciences
Ideas about a Bachelor in Machine Learning/Data SciencesAndres Mendez-Vazquez
 
Introduction Mathematics Intelligent Systems Syllabus
Introduction Mathematics Intelligent Systems SyllabusIntroduction Mathematics Intelligent Systems Syllabus
Introduction Mathematics Intelligent Systems SyllabusAndres Mendez-Vazquez
 

More from Andres Mendez-Vazquez (20)

2.03 bayesian estimation
2.03 bayesian estimation2.03 bayesian estimation
2.03 bayesian estimation
 
05 linear transformations
05 linear transformations05 linear transformations
05 linear transformations
 
01.04 orthonormal basis_eigen_vectors
01.04 orthonormal basis_eigen_vectors01.04 orthonormal basis_eigen_vectors
01.04 orthonormal basis_eigen_vectors
 
01.03 squared matrices_and_other_issues
01.03 squared matrices_and_other_issues01.03 squared matrices_and_other_issues
01.03 squared matrices_and_other_issues
 
01.02 linear equations
01.02 linear equations01.02 linear equations
01.02 linear equations
 
01.01 vector spaces
01.01 vector spaces01.01 vector spaces
01.01 vector spaces
 
06 recurrent neural_networks
06 recurrent neural_networks06 recurrent neural_networks
06 recurrent neural_networks
 
05 backpropagation automatic_differentiation
05 backpropagation automatic_differentiation05 backpropagation automatic_differentiation
05 backpropagation automatic_differentiation
 
Zetta global
Zetta globalZetta global
Zetta global
 
01 Introduction to Neural Networks and Deep Learning
01 Introduction to Neural Networks and Deep Learning01 Introduction to Neural Networks and Deep Learning
01 Introduction to Neural Networks and Deep Learning
 
25 introduction reinforcement_learning
25 introduction reinforcement_learning25 introduction reinforcement_learning
25 introduction reinforcement_learning
 
Neural Networks and Deep Learning Syllabus
Neural Networks and Deep Learning SyllabusNeural Networks and Deep Learning Syllabus
Neural Networks and Deep Learning Syllabus
 
Introduction to artificial_intelligence_syllabus
Introduction to artificial_intelligence_syllabusIntroduction to artificial_intelligence_syllabus
Introduction to artificial_intelligence_syllabus
 
Ideas 09 22_2018
Ideas 09 22_2018Ideas 09 22_2018
Ideas 09 22_2018
 
Ideas about a Bachelor in Machine Learning/Data Sciences
Ideas about a Bachelor in Machine Learning/Data SciencesIdeas about a Bachelor in Machine Learning/Data Sciences
Ideas about a Bachelor in Machine Learning/Data Sciences
 
Analysis of Algorithms Syllabus
Analysis of Algorithms  SyllabusAnalysis of Algorithms  Syllabus
Analysis of Algorithms Syllabus
 
18.1 combining models
18.1 combining models18.1 combining models
18.1 combining models
 
17 vapnik chervonenkis dimension
17 vapnik chervonenkis dimension17 vapnik chervonenkis dimension
17 vapnik chervonenkis dimension
 
A basic introduction to learning
A basic introduction to learningA basic introduction to learning
A basic introduction to learning
 
Introduction Mathematics Intelligent Systems Syllabus
Introduction Mathematics Intelligent Systems SyllabusIntroduction Mathematics Intelligent Systems Syllabus
Introduction Mathematics Intelligent Systems Syllabus
 

Recently uploaded

Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...RajaP95
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxthe ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxhumanexperienceaaa
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 

Recently uploaded (20)

Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx

Artificial Intelligence 06.3 Bayesian Networks - Belief Propagation - Junction Trees

  • 1. Artificial Intelligence Belief Propagation and Junction Trees Andres Mendez-Vazquez March 28, 2016 1 / 102
  • 2. Outline 1 Introduction What do we want? 2 Belief Propagation The Intuition Inference on Trees The Messages The Implementation 3 Junction Trees How do you build a Junction Tree? Chordal Graph Tree Graphs Junction Tree Formal Definition Algorithm For Building Junction Trees Example Moralize the DAG Triangulate Listing of Cliques Potential Function Propagating Information in a Junction Tree Example Now, the Full Propagation Example of Propagation 2 / 102
  • 4. Introduction We will be looking at the following algorithms: Pearl's Belief Propagation Algorithm and the Junction Tree Algorithm. Belief Propagation Algorithm The algorithm was first proposed by Judea Pearl in 1982, who formulated it on trees; it was later extended to polytrees. 4 / 102
  • 7. Introduction Something Notable It has since been shown to be a useful approximate algorithm on general graphs. Junction Tree Algorithm The junction tree algorithm (also known as the "Clique Tree" algorithm) is a method used in machine learning to perform marginalization in general graphs. It entails running belief propagation on a modified graph, called a junction tree, obtained by eliminating cycles. 5 / 102
  • 10. Example The Message Passing Stuff 7 / 102
  • 11. Thus We can do the following To pass information from below and from above to a certain node V . Thus We call those messages π from above. λ from below. 8 / 102
  • 16. Inference on Trees Recall A rooted tree is a DAG Now Let (G, P) be a Bayesian network whose DAG is a tree. Let a be a set of values of a subset A ⊂ V . For simplicity Imagine that each node has two children. The general case can be inferred from it. 10 / 102
  • 21. Then Let DX be the subset of A containing all members of A that are in the subtree rooted at X (including X itself if X ∈ A). Let NX be the subset of A containing all members of A that are non-descendants of X (this set also includes X if X ∈ A). 11 / 102
  • 25. Example We have that A = NX ∪ DX (figure: a tree in which the subtree rooted at X is highlighted). 12 / 102
  • 26. Thus We have, for each value of x:
$$P(x|A) = P(x|d_X, n_X) = \frac{P(d_X, n_X|x)\,P(x)}{P(d_X, n_X)} = \frac{P(d_X|x, n_X)\,P(n_X|x)\,P(x)}{P(d_X, n_X)} = \frac{P(d_X|x, n_X)\,\frac{P(n_X, x)}{P(x)}\,P(x)}{P(d_X, n_X)} = \frac{P(d_X|x)\,P(x|n_X)\,P(n_X)}{P(d_X, n_X)} = \frac{P(d_X|x)\,P(x|n_X)\,P(n_X)}{P(d_X|n_X)\,P(n_X)}$$
Here the second-to-last step uses d-separation, assuming X ∉ A. Note: you need to prove the case X ∈ A separately. 13 / 102
  • 33. Thus We have, for each value of x:
$$P(x|A) = \frac{P(d_X|x)\,P(x|n_X)}{P(d_X|n_X)} = \beta\, P(d_X|x)\, P(x|n_X)$$
where β, the normalizing factor, is a constant not depending on x. 14 / 102
  • 35. Now, we develop the messages We want $\lambda(x) \propto P(d_X|x)$ and $\pi(x) \propto P(x|n_X)$, where $\propto$ means "proportional to": π(x) may not be equal to P(x|nX), but π(x) = k × P(x|nX) for some constant k. Once we have these, $P(x|a) = \alpha\,\lambda(x)\,\pi(x)$, where α, the normalizing factor, is a constant not depending on x. 16 / 102
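The slides only state these proportionalities, so a minimal sketch of the final combination step may help. It assumes the λ and π values of a node are stored as dictionaries keyed by the node's values; all names here are illustrative, not from the lecture's code:

def posterior(lam, pi):
    # P(x|a) = alpha * lambda(x) * pi(x), with alpha chosen so the result sums to 1.
    unnorm = {x: lam[x] * pi[x] for x in lam}
    alpha = 1.0 / sum(unnorm.values())
    return {x: alpha * p for x, p in unnorm.items()}

# A binary node with no evidence below it (all lambda values equal to 1):
print(posterior({"x1": 1.0, "x2": 1.0}, {"x1": 0.2, "x2": 0.8}))
# -> {'x1': 0.2, 'x2': 0.8}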
  • 41. Developing λ(x) We need $\lambda(x) \propto P(d_X|x)$. Case 1: X ∈ A and X ∈ DX. Given any $X = \hat{x}$, we have that $P(d_X|x) = 0$ for $x \neq \hat{x}$. Thus, to achieve proportionality, we can set $\lambda(\hat{x}) \equiv 1$ and $\lambda(x) \equiv 0$ for $x \neq \hat{x}$. 17 / 102
  • 45. Now Case 2: X ∉ A and X is a leaf. Then dX = ∅ and P(dX|x) = P(∅|x) = 1 for all values of x. Thus, to achieve proportionality, we can set λ(x) ≡ 1 for all values of x. 18 / 102
  • 47. Finally Case 3: X ∉ A and X is a non-leaf. Let Y be X's left child and W be X's right child. Since X ∉ A, DX = DY ∪ DW. 19 / 102
  • 49. Thus We have then
$$P(d_X|x) = P(d_Y, d_W|x) = P(d_Y|x)\,P(d_W|x)$$
because of the d-separation at X. Hence
$$P(d_X|x) = \left[\sum_y P(d_Y, y|x)\right]\left[\sum_w P(d_W, w|x)\right] = \left[\sum_y P(y|x)\,P(d_Y|y)\right]\left[\sum_w P(w|x)\,P(d_W|w)\right] \propto \left[\sum_y P(y|x)\,\lambda(y)\right]\left[\sum_w P(w|x)\,\lambda(w)\right]$$
Thus, we can get proportionality by defining, for all values of x,
$$\lambda_Y(x) = \sum_y P(y|x)\,\lambda(y), \qquad \lambda_W(x) = \sum_w P(w|x)\,\lambda(w)$$
20 / 102
  • 56. Thus We have then $\lambda(x) = \lambda_Y(x)\,\lambda_W(x)$ for all values x. 21 / 102
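The λ computations of Case 3 translate directly into code. The following sketch assumes the conditional probabilities are available as cpd[y][x] = P(y|x); the function and variable names are illustrative:

def lambda_msg(cpd, lam_child, x_values, y_values):
    # lambda_Y(x) = sum_y P(y|x) * lambda(y), for each value x of the parent X.
    return {x: sum(cpd[y][x] * lam_child[y] for y in y_values) for x in x_values}

def lambda_value(msgs, x_values):
    # lambda(x) = product over the children of their lambda messages lambda_U(x).
    out = {x: 1.0 for x in x_values}
    for m in msgs:
        for x in x_values:
            out[x] *= m[x]
    return out

# A binary child Y of a binary X, with lambda(y) = 1 for both values:
cpd = {"y1": {"x1": 0.7, "x2": 0.4}, "y2": {"x1": 0.3, "x2": 0.6}}
print(lambda_msg(cpd, {"y1": 1.0, "y2": 1.0}, ["x1", "x2"], ["y1", "y2"]))
# -> {'x1': 1.0, 'x2': 1.0}: a subtree with no instantiated node carries no information.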
  • 57. Developing π(x) We need $\pi(x) \propto P(x|n_X)$. Case 1: X ∈ A and X ∈ NX. Given any $X = \hat{x}$, we have $P(\hat{x}|n_X) = P(\hat{x}|\hat{x}) = 1$ and $P(x|n_X) = P(x|\hat{x}) = 0$ for $x \neq \hat{x}$. Thus, to achieve proportionality, we can set $\pi(\hat{x}) \equiv 1$ and $\pi(x) \equiv 0$ for $x \neq \hat{x}$. 22 / 102
  • 63. Now Case 2: X ∉ A and X is the root. In this specific case nX = ∅, the empty set of random variables. Then P(x|nX) = P(x|∅) = P(x) for all values of x. Enforcing the proportionality, we get π(x) ≡ P(x) for all values of x. 23 / 102
  • 66. Then Case 3: X ∉ A and X is not the root. Without loss of generality, assume X is Z's right child and T is Z's left child. Then, NX = NZ ∪ DT. 24 / 102
  • 68. Then We have
$$P(x|n_X) = \sum_z P(x|z)\,P(z|n_X) = \sum_z P(x|z)\,P(z|n_Z, d_T) = \sum_z P(x|z)\,\frac{P(z, n_Z, d_T)}{P(n_Z, d_T)} = \sum_z P(x|z)\,\frac{P(d_T, z|n_Z)\,P(n_Z)}{P(n_Z, d_T)} = \sum_z P(x|z)\,\frac{P(d_T|z, n_Z)\,P(z|n_Z)\,P(n_Z)}{P(n_Z, d_T)} = \sum_z P(x|z)\,\frac{P(d_T|z)\,P(z|n_Z)\,P(n_Z)}{P(n_Z, d_T)}$$
where the last step again uses the d-separation for z. 25 / 102
  • 74. Last Step We have
$$P(x|n_X) = \sum_z P(x|z)\,\frac{P(z|n_Z)\,P(n_Z)\,P(d_T|z)}{P(n_Z, d_T)} = \gamma \sum_z P(x|z)\,\pi(z)\,\lambda_T(z), \qquad \gamma = \frac{P(n_Z)}{P(n_Z, d_T)}$$
Thus, we can achieve proportionality by setting $\pi_X(z) \equiv \pi(z)\,\lambda_T(z)$ and then $\pi(x) \equiv \sum_z P(x|z)\,\pi_X(z)$ for all values of x. 26 / 102
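The π computations of Case 3 admit the same treatment. A sketch with illustrative names, written for the two-children setting used above, so that X has the single sibling T:

def pi_msg(pi_parent, lam_siblings, z_values):
    # pi_X(z) = pi(z) times the lambda messages of X's siblings
    # (here a list holding the single message lambda_T).
    out = {}
    for z in z_values:
        p = pi_parent[z]
        for m in lam_siblings:
            p *= m[z]
        out[z] = p
    return out

def pi_value(cpd, pi_x, x_values, z_values):
    # pi(x) = sum_z P(x|z) * pi_X(z), with cpd[x][z] = P(x|z).
    return {x: sum(cpd[x][z] * pi_x[z] for z in z_values) for x in x_values}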
  • 78. How do we implement this? We require the following functions: initial_tree and update_tree. initial_tree has the following input and output. Input: ((G, P), A, a, P(x|a)). Output: after this call, A and a are both empty, making P(x|a) the prior probability of x. Then, each time a variable V is instantiated for v̂, the routine update_tree is called. Input: ((G, P), A, a, V, v̂, P(x|a)). Output: after this call, V has been added to A, v̂ has been added to a, and, for every value of x, P(x|a) has been updated to be the conditional probability of x given the new a. 28 / 102
  • 81. Algorithm: Inference-in-trees Problem Given a Bayesian network whose DAG is a tree, determine the probabilities of the values of each node conditional on specified values of the nodes in some subset. Input Bayesian network (G, P) whose DAG is a tree, where G = (V , E), and a set of values a of a subset A ⊆ V. Output The Bayesian network (G, P) updated according to the values in a. The λ and π values and messages and P(x|a) for each X∈V are considered part of the network. 29 / 102
  • 84. Initializing the tree
void initial_tree (Bayesian-network& (G, P) where G = (V, E), set-of-variables& A, set-of-variable-values& a)
  A = ∅; a = ∅
  for (each X ∈ V)
    for (each value x of X)
      λ(x) = 1                        // Compute λ values.
    for (the parent Z of X)           // Does nothing if X is the root.
      for (each value z of Z)
        λX(z) = 1                     // Compute λ messages.
  for (each value r of the root R)
    P(r|a) = P(r)                     // Compute P(r|a).
    π(r) = P(r)                       // Compute R's π values.
  for (each child X of R)
    send_π_msg(R, X)
30 / 102
  • 90. Updating the tree
void update_tree (Bayesian-network& (G, P) where G = (V, E), set-of-variables& A, set-of-variable-values& a, variable V, variable-value v̂)
  A = A ∪ {V}; a = a ∪ {v̂}
  λ(v̂) = 1; π(v̂) = 1; P(v̂|a) = 1     // Add V to A and instantiate V to v̂.
  for (each value v ≠ v̂)
    λ(v) = 0; π(v) = 0; P(v|a) = 0
  if (V is not the root && V's parent Z ∉ A)
    send_λ_msg(V, Z)
  for (each child X of V such that X ∉ A)
    send_π_msg(V, X)
31 / 102
  • 96. Sending the λ message
void send_λ_msg(node Y, node X)       // For simplicity, (G, P) is not shown as input.
  for (each value of x)
    λY(x) = Σ_y P(y|x) λ(y)           // Y sends X a λ message.
    λ(x) = Π_{U ∈ CH_X} λU(x)         // Compute X's λ values.
    P(x|a) = α λ(x) π(x)              // Compute P(x|a).
  normalize P(x|a)
  if (X is not the root and X's parent Z ∉ A)
    send_λ_msg(X, Z)
  for (each child W of X such that W ≠ Y and W ∉ A)
    send_π_msg(X, W)
32 / 102
  • 100. Sending the π message
void send_π_msg(node Z, node X)       // For simplicity, (G, P) is not shown as input.
  for (each value of z)
    πX(z) = π(z) Π_{Y ∈ CH_Z − {X}} λY(z)   // Z sends X a π message.
  for (each value of x)
    π(x) = Σ_z P(x|z) πX(z)           // Compute X's π values.
    P(x|a) = α λ(x) π(x)              // Compute P(x|a).
  normalize P(x|a)
  for (each child Y of X such that Y ∉ A)
    send_π_msg(X, Y)
33 / 102
  • 103. Example of Tree Initialization We have then the following network (figure: H with children B and L, and C a child of L, together with its conditional probability tables). 34 / 102
  • 104. Calling initial_tree((G, P), A, a) We have then A = ∅, a = ∅. Compute λ values: λ(h1) = 1; λ(h2) = 1; λ(b1) = 1; λ(b2) = 1; λ(l1) = 1; λ(l2) = 1; λ(c1) = 1; λ(c2) = 1. Compute λ messages: λB(h1) = 1; λB(h2) = 1; λL(h1) = 1; λL(h2) = 1; λC(l1) = 1; λC(l2) = 1. 35 / 102
  • 112. Calling initial_tree((G, P), A, a) Compute P (h|∅) P(h1|∅) = P(h1) = 0.2 P(h2|∅) = P(h2) = 0.8 Compute H’s π values π(h1) = P(h1) = 0.2 π(h2) = P(h2) = 0.8 Send messages send_π_msg(H, B) send_π_msg(H, L) 36 / 102
  • 118. The call send_π_msg(H, B) H sends B a π message πB(h1) = π(h1)λL(h1) = 0.2 × 1 = 0.2 πB(h2) = π(h2)λL(h2) = 0.8 × 1 = 0.8 Compute B’s π values π (b1) = P (b1|h1) πB (h1) + P (b1|h2) πB (h2) = (0.25) (0.2) + (0.05) (0.8) = 0.09 π (b2) = P (b2|h1) πB (h1) + P (b2|h2) πB (h2) = (0.75) (0.2) + (0.95) (0.8) = 0.91 37 / 102
  • 124. The call send_π_msg(H, B) Compute P(b|∅): P(b1|∅) = αλ(b1)π(b1) = α(1)(0.09) = 0.09α and P(b2|∅) = αλ(b2)π(b2) = α(1)(0.91) = 0.91α. Then, normalize: P(b1|∅) = 0.09α / (0.09α + 0.91α) = 0.09 and P(b2|∅) = 0.91α / (0.09α + 0.91α) = 0.91. 38 / 102
  • 128. Send the call send_π_msg(H, L) H sends L a π message: πL(h1) = π(h1)λB(h1) = (0.2)(1) = 0.2 and πL(h2) = π(h2)λB(h2) = (0.8)(1) = 0.8. Compute L's π values: π(l1) = P(l1|h1)πL(h1) + P(l1|h2)πL(h2) = (0.003)(0.2) + (0.00005)(0.8) = 0.00064 and π(l2) = P(l2|h1)πL(h1) + P(l2|h2)πL(h2) = (0.997)(0.2) + (0.99995)(0.8) = 0.99936. Compute P(l|∅): P(l1|∅) = αλ(l1)π(l1) = α(1)(0.00064) = 0.00064α and P(l2|∅) = αλ(l2)π(l2) = α(1)(0.99936) = 0.99936α. 39 / 102
  • 134. Send the call send_π_msg(H, L) Then, normalize: P(l1|∅) = 0.00064α / (0.00064α + 0.99936α) = 0.00064 and P(l2|∅) = 0.99936α / (0.00064α + 0.99936α) = 0.99936. 40 / 102
  • 136. Send the call send_π_msg(L, C) L sends C a π message: πC(l1) = π(l1) = 0.00064 and πC(l2) = π(l2) = 0.99936. Compute C's π values: π(c1) = P(c1|l1)πC(l1) + P(c1|l2)πC(l2) = (0.6)(0.00064) + (0.02)(0.99936) = 0.02037 and π(c2) = P(c2|l1)πC(l1) + P(c2|l2)πC(l2) = (0.4)(0.00064) + (0.98)(0.99936) = 0.97963. 41 / 102
  • 139. Send the call send_π_msg(L, C) Compute P(c|∅): P(c1|∅) = αλ(c1)π(c1) = α(1)(0.02037) = 0.02037α and P(c2|∅) = αλ(c2)π(c2) = α(1)(0.97963) = 0.97963α. Normalize: P(c1|∅) = 0.02037α / (0.02037α + 0.97963α) = 0.02037 and P(c2|∅) = 0.97963α / (0.02037α + 0.97963α) = 0.97963. 42 / 102
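The numbers in this trace are easy to check mechanically. The following sketch hard-codes the conditional probability tables the example assumes (H with children B and L, and C a child of L) and pushes π values down from the root exactly as the initialization does; since no evidence has been entered, every λ value is 1 and P(x|∅) equals the propagated π value:

# Reproducing the initialization trace (illustrative code, not the lecture's).
P_h = {"h1": 0.2, "h2": 0.8}
P_b_given_h = {("b1", "h1"): 0.25, ("b1", "h2"): 0.05,
               ("b2", "h1"): 0.75, ("b2", "h2"): 0.95}
P_l_given_h = {("l1", "h1"): 0.003, ("l1", "h2"): 0.00005,
               ("l2", "h1"): 0.997, ("l2", "h2"): 0.99995}
P_c_given_l = {("c1", "l1"): 0.6, ("c1", "l2"): 0.02,
               ("c2", "l1"): 0.4, ("c2", "l2"): 0.98}

def push_pi(cpd, pi_parent, child_vals, parent_vals):
    # pi(x) = sum_z P(x|z) * pi_X(z); during initialization pi_X(z) = pi(z)
    # because every sibling lambda message equals 1.
    return {x: sum(cpd[(x, z)] * pi_parent[z] for z in parent_vals)
            for x in child_vals}

pi_b = push_pi(P_b_given_h, P_h, ["b1", "b2"], ["h1", "h2"])
pi_l = push_pi(P_l_given_h, P_h, ["l1", "l2"], ["h1", "h2"])
pi_c = push_pi(P_c_given_l, pi_l, ["c1", "c2"], ["l1", "l2"])
print(pi_b)  # -> {'b1': 0.09, 'b2': 0.91} (up to floating point)
print(pi_l)  # -> {'l1': 0.00064, 'l2': 0.99936}
print(pi_c)  # -> {'c1': 0.0203712, 'c2': 0.9796288}, i.e. 0.02037 and 0.97963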
  • 142. Final Graph We have then the fully initialized tree (figure: the network over H, B, L and C, annotated with the probabilities computed above). 43 / 102
  • 143. For the Generalization Please look at pages 123–156 of Richard E. Neapolitan. 2003. Learning Bayesian Networks. Prentice-Hall, Inc. 44 / 102
  • 144. History Invented in 1988 by Lauritzen and Spiegelhalter. Something Notable The general idea is that the propagation of evidence through the network can be carried out more efficiently by representing the joint probability distribution on an undirected graph called the junction tree (or join tree). 45 / 102
  • 146. More in the Intuition High-level Intuition Computing marginals is straightforward in a tree structure. 46 / 102
  • 147. Junction Tree Characteristics The junction tree has the following characteristics It is an undirected tree Its nodes are clusters of variables (i.e. from the original BN) Given two clusters, C1 and C2, every node on the path between them contains their intersection C1 ∩ C2 In addition A Separator, S, is associated with each edge and contains the variables in the intersection between neighboring nodes 47 / 102
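The defining property above (every cluster on the path between C1 and C2 contains C1 ∩ C2, the running intersection property) can be verified mechanically. A pure-Python sketch, run on the three-cluster chain ABC−BCD−CDE from the figure; the names are illustrative:

from itertools import combinations

def tree_path(adj, s, t, seen=None):
    # The unique path from s to t in a tree, found by depth-first search.
    seen = seen or {s}
    if s == t:
        return [s]
    for n in adj[s]:
        if n not in seen:
            sub = tree_path(adj, n, t, seen | {n})
            if sub:
                return [s] + sub
    return None

def is_junction_tree(clusters, adj):
    # Every cluster on the path between two clusters must contain their intersection.
    for c1, c2 in combinations(clusters, 2):
        inter = clusters[c1] & clusters[c2]
        if not all(inter <= clusters[c] for c in tree_path(adj, c1, c2)):
            return False
    return True

clusters = {0: {"A", "B", "C"}, 1: {"B", "C", "D"}, 2: {"C", "D", "E"}}
adj = {0: [1], 1: [0, 2], 2: [1]}
print(is_junction_tree(clusters, adj))  # -> True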
  • 153. Simplicial Node Simplicial Node In a graph G, a vertex v is called simplicial if and only if the subgraph of G induced by the vertex set {v} ∪ N(v) is a clique, where N(v) denotes the set of neighbors of v in the graph. 50 / 102
  • 154. Example Vertex 3 is simplicial, while 4 is not (figure: a four-vertex graph with vertices labeled 1–4). 51 / 102
  • 155. Perfect Elimination Ordering Definition A graph G on n vertices is said to have a perfect elimination ordering if and only if there is an ordering {v1, ..., vn} of G’s vertices, such that each vi is simplicial in the subgraph induced by the vertices {v1, ..., vi}. 52 / 102
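Both definitions are easy to operationalize. A sketch with the graph given as adjacency sets; the four-vertex graph below is a generic chordal example chosen for illustration, not necessarily the one in the slide's figure:

from itertools import combinations

def is_simplicial(adj, v, others):
    # v is simplicial within the subgraph induced by others ∪ {v} iff the
    # neighbors of v lying in `others` are pairwise adjacent.
    nbrs = adj[v] & others
    return all(b in adj[a] for a, b in combinations(nbrs, 2))

def is_perfect_elimination_ordering(adj, order):
    # Each v_i must be simplicial in the subgraph induced by {v_1, ..., v_i}.
    for i, v in enumerate(order):
        if not is_simplicial(adj, v, set(order[:i])):
            return False
    return True

# A 4-cycle 1-2-4-3-1 with the chord 2-3, which makes the graph chordal:
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
print(is_simplicial(adj, 1, {2, 3, 4}))                     # -> True
print(is_perfect_elimination_ordering(adj, [3, 2, 4, 1]))   # -> True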
  • 156. Chordal Graph Definition A Chordal Graph is one in which all cycles of four or more vertices have a chord, which is an edge that is not part of the cycle but connects two vertices of the cycle. Definition For any two vertices x, y ∈ G such that (x, y) ∉ E, an x−y separator is a set S ⊂ V such that the graph G − S has at least two disjoint connected components, one of which contains x and another of which contains y. 53 / 102
  • 158. Chordal Graph Theorem For a graph G on n vertices, the following conditions are equivalent: 1 G has a perfect elimination ordering. 2 G is chordal. 3 If H is any induced subgraph of G and S is a vertex separator of H of minimal size, S’s vertices induce a clique. 54 / 102
  • 162. Maximal Clique Definition A maximal clique is a clique that cannot be extended by including one more adjacent vertex, meaning it is not a subset of a larger clique. We have the following claims: 1. A chordal graph with N vertices can have no more than N maximal cliques. 2. Given a chordal graph G = (V, E) with |V| = N, there exists an algorithm to find all the maximal cliques of G which takes no more than $O(N^4)$ time. 55 / 102
  • 165. Elimination Clique Definition (Elimination Clique) Given a chordal graph G and an elimination ordering for G which does not add any edges, suppose node i (assuming a labeling) is eliminated in some step of the elimination algorithm. Then the clique consisting of node i along with its neighbors during the elimination step (which must be fully connected, since elimination does not add edges) is called an elimination clique. Formally, suppose node i is eliminated in the k-th step of the algorithm, and let $G^{(k)}$ be the graph just before the k-th elimination step. Then the clique is $C_i = \{i\} \cup N^{(k)}(i)$, where $N^{(k)}(i)$ is the set of neighbors of i in the graph $G^{(k)}$. 56 / 102
  • 167. From This Theorem Given a chordal graph and an elimination ordering which does not add any edges, let $\mathcal{C}$ be the set of maximal cliques in the chordal graph, and let $\mathcal{C}_e = \{C_i : i \in V\}$ be the set of elimination cliques obtained from this elimination ordering. Then $\mathcal{C} \subseteq \mathcal{C}_e$; in other words, every maximal clique is also an elimination clique for this particular ordering. Something Notable The theorem proves the two claims given earlier. First, it shows that a chordal graph cannot have more than N maximal cliques, since we have only N elimination cliques. Moreover, it gives us an efficient algorithm for finding these N maximal cliques: simply go over each elimination clique and check whether it is maximal. 57 / 102
Therefore
Even with a brute-force approach
It will not take more than |C_e|^2 × D = O(N^3) time, with D = max_{C∈C} |C|.
Because
Both the clique size and the number of elimination cliques are bounded by N.
Observation
The maximum clique problem, which is NP-hard on general graphs, is easy on chordal graphs.
58 / 102
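To make this concrete, here is a minimal Python sketch of the brute-force search just described. The graph representation (a dict of neighbour sets), the function names, and the assumption that `order` is a perfect elimination ordering are mine, not from the slides.

```python
def elimination_cliques(adj, order):
    """Collect C_i = {i} ∪ N^(k)(i) while eliminating the nodes in
    `order`. `adj` maps node -> set of neighbours; `order` is assumed
    to be a perfect elimination ordering (no fill-in edges needed)."""
    g = {v: set(nb) for v, nb in adj.items()}   # work on a copy
    cliques = []
    for v in order:
        cliques.append(frozenset({v} | g[v]))   # the elimination clique
        for u in g[v]:                          # remove v from the graph
            g[u].discard(v)
        del g[v]
    return cliques


def maximal_cliques(adj, order):
    """Brute-force filter: an elimination clique is maximal iff it is
    not a proper subset of another elimination clique (O(N^3) overall)."""
    ce = elimination_cliques(adj, order)
    return [c for c in ce if not any(c < d for d in ce)]
```

Filtering each of the N elimination cliques against all the others costs at most |C_e|^2 × D set-membership tests, matching the O(N^3) bound above.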
We have the following definitions
Definition
The following are equivalent to the statement "G is a tree":
1 G is a connected, acyclic graph over N nodes.
2 G is a connected graph over N nodes with N − 1 edges.
3 G is a minimal connected graph over N nodes.
4 (Important) G is a graph over N nodes such that for any two nodes i and j in G, with i ≠ j, there is a unique path from i to j in G.
Theorem
For any graph G = (V, E), the following statements are equivalent:
1 G has a junction tree.
2 G is chordal.
60 / 102
Definition
Junction Tree
Given a graph G = (V, E), a graph G' = (V', E') is said to be a junction tree for G iff:
1 The nodes of G' are the maximal cliques of G (i.e. G' is a clique graph of G).
2 G' is a tree.
3 Running Intersection Property / Junction Tree Property:
For each v ∈ V, define G'_v to be the induced subgraph of G' consisting of exactly those nodes which correspond to maximal cliques of G that contain v. Then G'_v must be a connected graph.
62 / 102
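The running intersection property is easy to test directly. Below is a small sketch (data layout and names are my assumptions): for every variable, restrict the tree to the cliques holding that variable and check that the restriction is connected.

```python
from collections import deque

def has_running_intersection(cliques, tree_edges):
    """For every variable v, the tree nodes whose cliques contain v must
    induce a connected subgraph. `cliques` is a list of sets and
    `tree_edges` contains index pairs into it; the pairs are assumed to
    already form a tree."""
    nbrs = {i: set() for i in range(len(cliques))}
    for i, j in tree_edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    for v in set().union(*cliques):
        holders = {i for i, c in enumerate(cliques) if v in c}
        start = next(iter(holders))
        seen, queue = {start}, deque([start])
        while queue:                        # BFS restricted to `holders`
            i = queue.popleft()
            for j in nbrs[i] & holders:
                if j not in seen:
                    seen.add(j)
                    queue.append(j)
        if seen != holders:
            return False
    return True
```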
Step 1
Given a DAG G = (V, E) with |V| = N
Chordalize the graph using the elimination algorithm with an arbitrary elimination ordering, if required.
For this, you can use the following greedy algorithm
Given a list of nodes:
1 If the vertex is not simplicial, make it simplicial by connecting all of its neighbors.
2 Then remove it from the list.
64 / 102
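A minimal sketch of this greedy chordalization, under the assumption that the graph is stored as a dict of neighbour sets; the names are illustrative.

```python
from itertools import combinations

def triangulate(adj, order):
    """Greedy chordalization: before removing each vertex, add the
    fill-in edges that make it simplicial (its remaining neighbours
    become a clique). Returns a chordal supergraph of `adj`."""
    g = {v: set(nb) for v, nb in adj.items()}        # shrinking copy
    chordal = {v: set(nb) for v, nb in adj.items()}  # original + fill-in
    for v in order:
        for a, b in combinations(g[v], 2):
            if b not in g[a]:                        # fill-in edge
                g[a].add(b); g[b].add(a)
                chordal[a].add(b); chordal[b].add(a)
        for u in g[v]:
            g[u].discard(v)
        del g[v]
    return chordal
```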
Step 1
Another way
1 Apply the Moralization Procedure.
2 Triangulate the moral graph.
Moralization Procedure
1 Add edges between all pairs of nodes that have a common child.
2 Make all edges in the graph undirected.
Triangulate the moral graph
An undirected graph is triangulated if every cycle of length greater than 3 possesses a chord.
65 / 102
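A compact sketch of moralization, under the assumption that the DAG is given as a mapping from each node to its parents; the names are illustrative.

```python
from itertools import combinations

def moralize(parents):
    """Moral graph of a DAG: connect ('marry') every pair of parents of
    a common child, then drop all edge directions. `parents` maps each
    node to the list of its parents."""
    adj = {v: set() for v in parents}
    for child, ps in parents.items():
        for p in ps:                        # undirected child-parent edges
            adj[child].add(p)
            adj.setdefault(p, set()).add(child)
        for a, b in combinations(ps, 2):    # marry common parents
            adj[a].add(b)
            adj[b].add(a)
    return adj
```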
Step 2
Find the maximal cliques in the chordal graph
List the N elimination cliques:

({v_N} ∪ N(v_N)) ∩ {v_1, ..., v_N}
({v_{N−1}} ∪ N(v_{N−1})) ∩ {v_1, ..., v_{N−1}}
· · ·
{v_1}

Note: if the graph is chordal this is not necessary, because all the cliques are maximal.
66 / 102
Step 3
Compute the separator sets for each pair of maximal cliques and construct a weighted clique graph
For each pair of maximal cliques (C_i, C_j) in the graph
We check whether they possess any common variables.
If yes, we designate a separator set
Between these two cliques: S_ij = C_i ∩ C_j.
Then, using these separators
We build a clique graph:
Nodes are the cliques.
Edges (C_i, C_j) are added with weight |C_i ∩ C_j| if |C_i ∩ C_j| > 0.
67 / 102
Step 3
This step can be implemented quickly in practice using a hash table
Running time: O(|C|^2 D) = O(N^2 D)
68 / 102
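A direct sketch of Step 3, without the hash-table optimization mentioned above; `cliques` is assumed to be a list of Python sets.

```python
def weighted_clique_graph(cliques):
    """One candidate edge per pair of maximal cliques with a non-empty
    intersection, weighted by the size of the separator C_i ∩ C_j."""
    edges = []
    for i in range(len(cliques)):
        for j in range(i + 1, len(cliques)):
            w = len(cliques[i] & cliques[j])
            if w > 0:
                edges.append((w, i, j))
    return edges
```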
Step 4
Compute a maximum-weight spanning tree on the weighted clique graph to obtain a junction tree
You can use for this Kruskal's or Prim's algorithm, adapted to maximum weight
We will give Kruskal's algorithm
For finding the maximum-weight spanning tree
69 / 102
Step 4
Maximal Kruskal's algorithm
Initialize an edgeless graph T whose nodes are all the maximal cliques in our chordal graph.
Then
We will add edges to T until it becomes a junction tree.
Sort the m edges e_i in our clique graph from Step 3 by weight w_i
We have e_1, e_2, ..., e_m with w_1 ≥ w_2 ≥ · · · ≥ w_m
70 / 102
Step 4
For i = 1, 2, ..., m
1 Add edge e_i to T if it does not introduce a cycle.
2 If |C| − 1 edges have been added, quit.
Running time, given that |E| = O(|C|^2):
O(|C|^2 log |C|^2) = O(|C|^2 log |C|) = O(N^2 log N)
71 / 102
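A sketch of this maximum-weight Kruskal variant, using a simple union-find for the cycle test; it consumes the (weight, i, j) triples produced by the Step 3 sketch above.

```python
def max_weight_spanning_tree(n, edges):
    """Kruskal on the clique graph, heaviest edges first; stops after
    n - 1 edges. `edges` holds (weight, i, j) triples, `n` = #cliques."""
    parent = list(range(n))

    def find(x):                          # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, i, j in sorted(edges, reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:                      # edge does not close a cycle
            parent[ri] = rj
            tree.append((i, j))
            if len(tree) == n - 1:
                break
    return tree
```

Ties between equal-weight edges mean the junction tree is generally not unique; any maximum-weight spanning tree of the clique graph of a chordal graph satisfies the running intersection property.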
How do you build a Junction Tree?
Given a General DAG
[Figure: DAG over the nodes S, B, L, F, E, T, X, A]
Build a Chordal Graph
Moral Graph – marry common parents and remove arrows.
[Figure: moral graph over the nodes S, B, L, F, E, T, X, A]
74 / 102
How do you build a Junction Tree?
Triangulate the moral graph
An undirected graph is triangulated if every cycle of length greater than 3 possesses a chord.
[Figure: triangulated moral graph over the nodes S, B, L, F, E, T, X, A]
76 / 102
Listing of Cliques
Identify the cliques
A clique is a subset of nodes which is complete (i.e. there is an edge between every pair of nodes) and maximal.
[Figure: triangulated graph with its maximal cliques highlighted]
The maximal cliques are: {B,S,L}, {B,L,E}, {B,E,F}, {L,E,T}, {A,T}, {E,X}
78 / 102
Build the Clique Graph
Clique Graph
Add an edge between C_i and C_j with weight |C_i ∩ C_j| whenever |C_i ∩ C_j| > 0.
[Figure: weighted clique graph over BSL, BLE, BEF, LET, AT, EX, with edge weights 1 and 2]
79 / 102
Getting The Junction Tree
Run the maximum-weight Kruskal's algorithm
[Figure: the same weighted clique graph, with the maximum-weight spanning tree edges selected]
80 / 102
Getting The Junction Tree
Finally
[Figure: the resulting junction tree over the cliques BSL, BLE, BEF, LET, AT, EX]
81 / 102
Potential Representation for the Junction Tree
Something Notable
The joint probability distribution can now be represented in terms of potential functions, φ, defined on each clique and each separator.
Thus

P(x) = ∏_{i=1}^{n} φ_{C_i}(x_{c_i}) / ∏_{j=1}^{m} φ_{S_j}(x_{s_j})

where x = (x_{c_1}, ..., x_{c_n}), each x_{c_i} corresponds to a clique, and each x_{s_j} corresponds to a separator.
The basic idea is to represent the probability distribution corresponding to any graph as a product of clique potentials

P(x) = (1/Z) ∏_{i=1}^{n} φ_{C_i}(x_{c_i})

83 / 102
Then
Main idea
The idea is to transform one representation of the joint distribution into another in which, for each clique c, the potential function gives the marginal distribution for the variables in c, i.e.

φ_C(x_c) = P(x_c)

This will also apply to each separator, s.
84 / 102
Now, Initialization
To initialize the potential functions
1 Set all potentials to unity.
2 For each variable x_i, select one node in the junction tree (i.e. one clique) containing both that variable and its parents, pa(x_i), in the original DAG.
3 Multiply that clique's potential by P(x_i | pa(x_i)).
85 / 102
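A small sketch of step 2 of this initialization. The containment test always succeeds because moralization makes every family {x_i} ∪ pa(x_i) a clique of the chordal graph, so it lies inside some maximal clique; names and data layout are my assumptions.

```python
def assign_families(cliques, families):
    """For each variable pick one clique covering {x_i} ∪ pa(x_i); that
    clique's potential is the one multiplied by P(x_i | pa(x_i)).
    `families` maps variable -> set of its family; `cliques` is a list
    of sets (the junction-tree nodes)."""
    chosen = {}
    for var, family in families.items():
        for idx, clique in enumerate(cliques):
            if family <= clique:          # family fully contained
                chosen[var] = idx
                break
        else:
            raise ValueError(f"no clique contains the family of {var}")
    return chosen
```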
Then
For example, we have at the beginning φ_BSL = φ_BLF = φ_LX = 1; then
[Figure: junction tree BSL — BLF — LX for the DAG on S, B, L, F, X, after initialization]
86 / 102
Propagating Information in a Junction Tree
Passing information using the separators
Passing information from one clique C1 to another C2 via the separator between them, S0, requires two steps.
First step
Obtain a new potential for S0 by marginalizing out the variables in C1 that are not in S0:

φ*_{S0} = Σ_{C1 − S0} φ_{C1}

88 / 102
Propagating Information in a Junction Tree
Second step
Obtain a new potential for C2:

φ*_{C2} = φ_{C2} λ_{S0}

where

λ_{S0} = φ*_{S0} / φ_{S0}

89 / 102
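Both steps fit in a few lines on dense potentials. Here is a sketch using numpy, where `margin_axes` names the axes of φ_C1 that are summed out and `bcast_shape` aligns the separator with φ_C2's axis order; all of these names are mine.

```python
import numpy as np

def pass_flow(phi_c1, phi_s, phi_c2, margin_axes, bcast_shape):
    """One flow C1 -> S0 -> C2 on dense numpy potentials.
    Step 1: marginalize phi_c1 down to the separator.
    Step 2: multiply phi_c2 by the ratio lambda = phi*_S0 / phi_S0.
    Assumes phi_s has no zero entries (true right after initialization,
    when all separator potentials are unity)."""
    new_s = phi_c1.sum(axis=margin_axes)             # first step
    lam = new_s / phi_s                              # update ratio
    return new_s, phi_c2 * lam.reshape(bcast_shape)  # second step
```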
An Example
Consider a flow from the clique {B,S,L} to {B,L,F}
[Figure: junction tree BSL —(BL)— BLF —(L)— LX, with separators BL and L]
90 / 102
An Example
Initial representation

φ_BSL = P(B|S) P(L|S) P(S):
            l1         l2
  s1, b1    0.00015    0.04985
  s1, b2    0.00045    0.14955
  s2, b1    0.000002   0.039998
  s2, b2    0.000038   0.759962

φ_BL = 1:
        l1   l2
  b1    1    1
  b2    1    1

φ_BLF = P(F|B,L):
            l1     l2
  f1, b1    0.75   0.1
  f1, b2    0.5    0.05
  f2, b1    0.25   0.9
  f2, b2    0.5    0.95

92 / 102
An Example
After the flow

φ_BSL = P(B|S) P(L|S) P(S)  (unchanged):
            l1         l2
  s1, b1    0.00015    0.04985
  s1, b2    0.00045    0.14955
  s2, b1    0.000002   0.039998
  s2, b2    0.000038   0.759962

φ_BL:
        l1         l2
  b1    0.000152   0.089848
  b2    0.000488   0.909512

φ_BLF:
            l1         l2
  f1, b1    0.000114   0.0089848
  f1, b2    0.000244   0.0454756
  f2, b1    0.000038   0.0808632
  f2, b2    0.000244   0.8640364

93 / 102
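As a check, the `pass_flow` sketch above reproduces these tables exactly; the axis orders (S, B, L) and (F, B, L) are my choice.

```python
phi_bsl = np.array([[[0.00015, 0.04985],      # s1: rows b1, b2
                     [0.00045, 0.14955]],
                    [[0.000002, 0.039998],    # s2
                     [0.000038, 0.759962]]])
phi_bl = np.ones((2, 2))                      # separator BL
phi_blf = np.array([[[0.75, 0.10],            # f1: rows b1, b2
                     [0.50, 0.05]],
                    [[0.25, 0.90],            # f2
                     [0.50, 0.95]]])

new_bl, new_blf = pass_flow(phi_bsl, phi_bl, phi_blf,
                            margin_axes=0, bcast_shape=(1, 2, 2))
print(new_bl)      # [[0.000152 0.089848]
                   #  [0.000488 0.909512]]
print(new_blf[0])  # f1 rows: [[0.000114 0.0089848]
                   #           [0.000244 0.0454756]]
```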
Now Introduce Evidence
We have
A flow from the clique {B,S,L} to {B,L,F}, but this time we have the information that Joe is a smoker, S = s1.
Incorporation of evidence

φ_BSL = P(B|S) P(L|S) P(S), with the s2 rows zeroed out:
            l1        l2
  s1, b1    0.00015   0.04985
  s1, b2    0.00045   0.14955
  s2, b1    0         0
  s2, b2    0         0

φ_BL = 1:
        l1   l2
  b1    1    1
  b2    1    1

φ_BLF = P(F|B,L):
            l1     l2
  f1, b1    0.75   0.1
  f1, b2    0.5    0.05
  f2, b1    0.25   0.9
  f2, b2    0.5    0.95

94 / 102
An Example
After the flow

φ_BSL (unchanged, with evidence entered):
            l1        l2
  s1, b1    0.00015   0.04985
  s1, b2    0.00045   0.14955
  s2, b1    0         0
  s2, b2    0         0

φ_BL:
        l1        l2
  b1    0.00015   0.04985
  b2    0.00045   0.14955

φ_BLF:
            l1          l2
  f1, b1    0.0001125   0.004985
  f1, b2    0.000225    0.0074775
  f2, b1    0.0000375   0.044865
  f2, b2    0.000225    0.1420725

95 / 102
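Entering evidence is just zeroing the disagreeing configurations before the flow; rerunning the same sketch from freshly initialized potentials reproduces the tables above.

```python
phi_bsl_ev = phi_bsl.copy()
phi_bsl_ev[1] = 0.0      # evidence S = s1: zero every s2 configuration
bl_ev, blf_ev = pass_flow(phi_bsl_ev, np.ones((2, 2)), phi_blf,
                          margin_axes=0, bcast_shape=(1, 2, 2))
print(bl_ev)      # [[0.00015 0.04985]
                  #  [0.00045 0.14955]]
print(blf_ev[0])  # f1 rows: [[0.0001125 0.004985 ]
                  #           [0.000225  0.0074775]]
```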
The Full Propagation
Two-phase propagation (Jensen et al., 1990)
1 Select an arbitrary clique, C0.
2 Collection phase – flows are passed from the periphery to C0.
3 Distribution phase – flows are passed from C0 to the periphery.
97 / 102
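A recursive sketch of this schedule; `send(src, dst)` stands for one two-step flow as defined earlier, and the adjacency layout is an assumption.

```python
def propagate(tree_nbrs, root, send):
    """Two-phase propagation: collect all flows towards `root`, then
    distribute them back out. `tree_nbrs` maps a clique id to the set
    of its neighbours in the junction tree."""
    def collect(c, parent):
        for nb in tree_nbrs[c]:
            if nb != parent:
                collect(nb, c)
                send(nb, c)           # periphery -> root
    def distribute(c, parent):
        for nb in tree_nbrs[c]:
            if nb != parent:
                send(c, nb)           # root -> periphery
                distribute(nb, c)
    collect(root, None)
    distribute(root, None)
```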
The Full Propagation
After the two propagation phases have been carried out
The junction tree will be in equilibrium, with each clique containing the joint probability distribution for the variables it contains.
Marginal probabilities for individual variables can then be obtained from the cliques.
Now, some evidence E can be included before propagation
By selecting a clique for each variable for which evidence is available.
The potential for that clique is then set to 0 for any configuration which differs from the evidence.
101 / 102
The Full Propagation
After propagation the result will be

P(x, E) = ∏_{c∈C} φ_c(x_c, E) / ∏_{s∈S} φ_s(x_s, E)

After normalization

P(x|E) = ∏_{c∈C} φ_c(x_c|E) / ∏_{s∈S} φ_s(x_s|E)

102 / 102