Bayesian Networks:
    Independencies and Inference




              Scott Davies and Andrew Moore


                                    Note to other teachers and users of these slides. Andrew and Scott would
                                    be delighted if you found this source material useful in giving your own
                                    lectures. Feel free to use these slides verbatim, or to modify them to fit
                                    your own needs. PowerPoint originals are available. If you make use of a
                                    significant portion of these slides in your own lecture, please include this
                                    message, or the following link to the source repository of Andrew’s
                                    tutorials: http://www.cs.cmu.edu/~awm/tutorials . Comments and
                                    corrections gratefully received.




What Independencies does a Bayes Net Model?

 • In order for a Bayesian network to model a
   probability distribution, the following must be true by
   definition:
      Each variable is conditionally independent of all its non-
      descendants in the graph given the value of all its parents.


 • This implies
            P(X1, …, Xn) = ∏ i=1..n P(Xi | parents(Xi))

 • But what else does it imply?
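
   As a quick illustration, here is a minimal Python sketch of the
   factorization above. The representation is an assumption made for the
   sketch: each node maps to (parents, CPT), where the CPT maps each tuple
   of parent values to P(node = 1 | parents).

   def joint_probability(network, assignment):
       # P(X1..Xn) as the product of P(Xi | parents(Xi)).
       p = 1.0
       for var, (parents, cpt) in network.items():
           parent_vals = tuple(assignment[q] for q in parents)
           p_true = cpt[parent_vals]
           p *= p_true if assignment[var] else 1.0 - p_true
       return p

   # Example: the two-node network Y -> X, all variables binary.
   net = {
       "Y": ((), {(): 0.3}),                   # P(Y=1) = 0.3
       "X": (("Y",), {(0,): 0.1, (1,): 0.8}),  # P(X=1 | Y)
   }
   print(joint_probability(net, {"Y": 1, "X": 0}))  # 0.3 * 0.2 = 0.06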




What Independencies does a Bayes Net Model?

 • Example: Given Y, does learning the value of Z tell us nothing new
   about X? I.e., is P(X|Y, Z) equal to P(X | Y)?

       [Diagram: the chain Z → Y → X]

   Yes. Since we know the value of all of X’s parents (namely, Y), and Z
   is not a descendant of X, X is conditionally independent of Z.

   Also, since independence is symmetric, P(Z|Y, X) = P(Z|Y).




 Quick proof that independence is symmetric

  • Assume: P(X|Y, Z) = P(X|Y)
  • Then:

    P(Z | X, Y) = P(X, Y | Z) P(Z) / P(X, Y)                     (Bayes’s Rule)

                = P(Y | Z) P(X | Y, Z) P(Z) / [P(X | Y) P(Y)]    (Chain Rule)

                = P(Y | Z) P(X | Y) P(Z) / [P(X | Y) P(Y)]       (By Assumption)

                = P(Y | Z) P(Z) / P(Y) = P(Z | Y)                (Bayes’s Rule)
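
   The proof can also be checked numerically. The Python sketch below uses
   arbitrary assumed probabilities to build a joint distribution in which
   P(X|Y, Z) = P(X|Y) holds by construction, then confirms P(Z|X, Y) = P(Z|Y):

   import itertools

   p_y = {0: 0.4, 1: 0.6}          # P(Y)
   p_x_given_y = {0: 0.2, 1: 0.7}  # P(X=1 | Y)
   p_z_given_y = {0: 0.5, 1: 0.9}  # P(Z=1 | Y)

   def joint(x, y, z):
       px = p_x_given_y[y] if x else 1 - p_x_given_y[y]
       pz = p_z_given_y[y] if z else 1 - p_z_given_y[y]
       return p_y[y] * px * pz

   def cond_z(z, given):  # P(Z=z | given), e.g. given={'x': 1, 'y': 0}
       num = den = 0.0
       for x, y, zz in itertools.product((0, 1), repeat=3):
           if all({'x': x, 'y': y}[k] == v for k, v in given.items()):
               den += joint(x, y, zz)
               if zz == z:
                   num += joint(x, y, zz)
       return num / den

   print(cond_z(1, {'x': 1, 'y': 0}))  # P(Z=1 | X=1, Y=0) = 0.5
   print(cond_z(1, {'y': 0}))          # P(Z=1 | Y=0)      = 0.5, as claimed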




What Independencies does a Bayes Net Model?

  • Let I<X,Y,Z> represent X and Z being conditionally
    independent given Y.

                     [Diagram: Y with children X and Z, i.e., X ← Y → Z]


  • I<X,Y,Z>? Yes, just as in previous example: All X’s
    parents given, and Z is not a descendant.




What Independencies does a Bayes Net Model?

                     [Diagram: Z → U, Z → V, U → X, V → X]

  • I<X,{U},Z>? No.
  • I<X,{U,V},Z>? Yes.
  • Maybe I<X, S, Z> iff S acts as a cutset between X and Z
    in an undirected version of the graph…?




Things get a little more confusing

                  [Diagram: X → Y ← Z]

 • X has no parents, so we know all its parents’ values trivially
 • Z is not a descendant of X
 • So, I<X,{},Z>, even though there’s an undirected path
   from X to Z through an unknown variable Y.
 • What if we do know the value of Y, though? Or one
   of its descendants?




The “Burglar Alarm” example

          [Diagram: Burglar → Alarm ← Earthquake, Alarm → Phone Call]
 • Your house has a twitchy burglar alarm that is also
   sometimes triggered by earthquakes.
 • Earth arguably doesn’t care whether your house is
   currently being burgled
 • While you are on vacation, one of your neighbors
   calls and tells you your home’s burglar alarm is
   ringing. Uh oh!




Things get a lot more confusing

          [Diagram: Burglar → Alarm ← Earthquake, Alarm → Phone Call]

• But now suppose you learn that there was a medium-sized
  earthquake in your neighborhood. Oh, whew! Probably not a
  burglar after all.
• Earthquake “explains away” the hypothetical burglar.
• But then it must not be the case that
  I<Burglar,{Phone Call}, Earthquake>, even though
  I<Burglar,{}, Earthquake>!
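
 To see the effect with concrete numbers, here is a small Python sketch.
 The CPT values are made-up assumptions chosen for illustration, not
 numbers from the original example:

 import itertools

 P_B, P_E = 0.01, 0.02
 P_A = {(0, 0): 0.001, (0, 1): 0.3, (1, 0): 0.9, (1, 1): 0.95}  # P(Alarm | B, E)
 P_C = {0: 0.0, 1: 0.9}                                         # P(Call | Alarm)

 def joint(b, e, a, c):
     p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
     p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
     p *= P_C[a] if c else 1 - P_C[a]
     return p

 def prob_burglar(**evidence):  # P(Burglar=1 | evidence) by enumeration
     num = den = 0.0
     for b, e, a, c in itertools.product((0, 1), repeat=4):
         vals = {'b': b, 'e': e, 'a': a, 'c': c}
         if all(vals[k] == v for k, v in evidence.items()):
             den += joint(b, e, a, c)
             num += joint(b, e, a, c) * b
     return num / den

 print(prob_burglar(c=1))       # ~0.57: the call makes a burglar likely
 print(prob_burglar(c=1, e=1))  # ~0.03: the earthquake explains it away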




d-separation to the rescue

• Fortunately, there is a relatively simple algorithm for
  determining whether two variables in a Bayesian
  network are conditionally independent: d-separation.

• Definition: X and Z are d-separated by a set of
  evidence variables E iff every undirected path from X
  to Z is “blocked”, where a path is “blocked” iff one
  or more of the following conditions is true: ...




A path is “blocked” when...

 • There exists a variable V on the path such that
    • it is in the evidence set E
    • the arcs putting V in the path are “tail-to-tail”

                        [Diagram: ← V →  (both arcs point away from V)]

 • Or, there exists a variable V on the path such that
    • it is in the evidence set E
     • the arcs putting V in the path are “tail-to-head”

                        [Diagram: → V →  (one arc in, one arc out)]

 • Or, ...




A path is “blocked” when… (the funky case)

 • … Or, there exists a variable V on the path such that
    • it is NOT in the evidence set E
    • neither are any of its descendants
    • the arcs putting V on the path are “head-to-head”


                        [Diagram: → V ←  (both arcs point into V)]




d-separation to the rescue, cont’d

• Theorem [Verma & Pearl, 1988]:
   • If a set of evidence variables E d-separates X and
     Z in a Bayesian network’s graph, then I<X, E, Z>.
• d-separation can be computed in linear time using a
  depth-first-search-like algorithm.
• Great! We now have a fast algorithm for
  automatically inferring whether learning the value of
  one variable might give us any additional hints about
  some other variable, given what we already know.
   • “Might”: Variables may actually be independent when they’re not d-
     separated, depending on the actual probabilities involved
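
 One standard way to implement this test is a “Bayes ball” style
 reachability search over (node, direction) pairs. The sketch below is
 one such implementation, not code from the slides; it assumes the
 network is given as a dict mapping each node to a list of its children,
 and that evidence is a set of node names:

 from collections import deque

 def d_separated(children, x, z, evidence):
     parents = {v: [] for v in children}
     for v, kids in children.items():
         for c in kids:
             parents[c].append(v)

     # Nodes with an observed descendant (for the head-to-head case).
     anc = set()
     stack = list(evidence)
     while stack:
         v = stack.pop()
         for p in parents[v]:
             if p not in anc:
                 anc.add(p)
                 stack.append(p)

     # Direction records how we entered a node:
     # 'up' = from a child, 'down' = from a parent.
     visited = set()
     queue = deque([(x, 'up')])
     while queue:
         v, direction = queue.popleft()
         if (v, direction) in visited:
             continue
         visited.add((v, direction))
         if v == z:
             return False  # found an active (unblocked) path
         if direction == 'up' and v not in evidence:
             for p in parents[v]:           # continue upward
                 queue.append((p, 'up'))
             for c in children[v]:          # tail-to-tail at v, v unobserved
                 queue.append((c, 'down'))
         elif direction == 'down':
             if v not in evidence:          # tail-to-head at v, v unobserved
                 for c in children[v]:
                     queue.append((c, 'down'))
             if v in evidence or v in anc:  # head-to-head, unblocked
                 for p in parents[v]:
                     queue.append((p, 'up'))
     return True  # every path blocked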




d-separation example


  [Diagram: a network over nodes A–J]

  • I<C, {}, D>?
  • I<C, {A}, D>?
  • I<C, {A, B}, D>?
  • I<C, {A, B, J}, D>?
  • I<C, {A, B, E, J}, D>?




Bayesian Network Inference

  • Inference: calculating P(X|Y) for some variables or
    sets of variables X and Y.
  • Inference in Bayesian networks is #P-hard!
    [Diagram: a boolean circuit with inputs I1 … I5, each given prior
    probability .5, reduces to a Bayesian network whose output node is O.]

    How many satisfying assignments does the circuit have? P(O) must be
    (#sat. assign.) × (.5 ^ #inputs), so computing P(O) counts them.
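
    A tiny sanity check of that correspondence, assuming a 2-input OR
    “circuit”:

    import itertools

    n = 2
    sat = sum(1 for bits in itertools.product((0, 1), repeat=n)
              if bits[0] or bits[1])
    print(sat, sat * 0.5 ** n)  # 3 satisfying assignments -> P(O=1) = 0.75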




 Bayesian Network Inference

  • But… inference is still tractable in some cases.
  • Let’s look at a special class of networks: trees / forests
    in which each node has at most one parent.




Decomposing the probabilities

• Suppose we want P(Xi | E) where E is some set of
  evidence variables.
• Let’s split E into two parts:
   • Ei- is the part consisting of assignments to variables in the
     subtree rooted at Xi
   • Ei+ is the rest of it






Decomposing the probabilities, cont’d

  P(Xi | E) = P(Xi | Ei-, Ei+)

            = P(Ei- | Xi, Ei+) P(Xi | Ei+) / P(Ei- | Ei+)   (Bayes’s Rule)

            = P(Ei- | Xi) P(Xi | Ei+) / P(Ei- | Ei+)        (Ei- ⊥ Ei+ given Xi)

            = α π(Xi) λ(Xi)

  Where:
  • α is a constant independent of Xi
  • π(Xi) = P(Xi | Ei+)
  • λ(Xi) = P(Ei- | Xi)




Using the decomposition for inference

• We can use this decomposition to do inference as
  follows. First, compute λ(Xi) = P(Ei-| Xi) for all Xi
  recursively, using the leaves of the tree as the base
  case.
• If Xi is a leaf:
   • If Xi is in E: λ(Xi) = 1 if Xi matches E, 0 otherwise
   • If Xi is not in E: Ei- is the null set, so
       P(Ei-| Xi) = 1 (constant)




Quick aside: “Virtual evidence”

• For theoretical simplicity, but without loss of
  generality, let’s assume that all variables in E (the
  evidence set) are leaves in the tree.
• Why can we do this WLOG:
   [Diagram: observing Xi directly is equivalent to adding a leaf child
   Xi’ of Xi and observing Xi’ instead, where P(Xi’ | Xi) = 1 if
   Xi’ = Xi, and 0 otherwise.]
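
   In code the trick just adds a deterministic leaf. A minimal sketch,
   assuming the network is a dict `children` mapping each node to a list
   of its children, with CPTs of the form cpt[c][i][j] = P(c = j |
   parent(c) = i); all of these names are assumptions for illustration:

   def add_virtual_evidence(children, cpt, node, values=(0, 1)):
       # Attach a deterministic child node_obs of `node`; observing
       # node_obs = v is then equivalent to observing node = v.
       leaf = node + "_obs"
       children[node].append(leaf)
       children[leaf] = []
       cpt[leaf] = {i: {j: 1.0 if j == i else 0.0 for j in values}
                    for i in values}
       return leaf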




Calculating λ(Xi) for non-leaves

• Suppose Xi has one child, Xc.     [Diagram: Xi → Xc]
• Then:

    λ(Xi) = P(Ei- | Xi) = Σj P(Ei-, Xc = j | Xi)

          = Σj P(Xc = j | Xi) P(Ei- | Xi, Xc = j)

          = Σj P(Xc = j | Xi) P(Ei- | Xc = j)

          = Σj P(Xc = j | Xi) λ(Xc = j)




Calculating λ(Xi) for non-leaves

• Now, suppose Xi has a set of children, C.
• Since Xi d-separates each of its subtrees, the
  contribution of each subtree to λ(Xi) is independent:

    λ(Xi) = P(Ei- | Xi) = ∏{Xj ∈ C} λj(Xi)

          = ∏{Xj ∈ C} [ ΣXj P(Xj | Xi) λ(Xj) ]
where λj(Xi) is the contribution to P(Ei-| Xi) of the part of
 the evidence lying in the subtree rooted at one of Xi’s
 children Xj.
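
 Putting the base case and the recursion together, here is a sketch for
 binary variables, using the same assumed representation as above (a dict
 `children` of node -> list of children, CPTs cpt[c][i][j] =
 P(c = j | parent(c) = i), and evidence placed only at leaves via the
 virtual-evidence trick):

 VALUES = (0, 1)

 def lam(node, children, cpt, evidence):
     # Returns lambda(node) as a dict: value -> P(E_node^- | node = value).
     if not children[node]:               # leaf base case
         if node in evidence:
             return {v: 1.0 if v == evidence[node] else 0.0 for v in VALUES}
         return {v: 1.0 for v in VALUES}  # E_i^- is empty
     result = {v: 1.0 for v in VALUES}
     for c in children[node]:
         lam_c = lam(c, children, cpt, evidence)
         # lambda_c(node) = sum_j P(c = j | node = v) * lambda(c = j)
         for v in VALUES:
             result[v] *= sum(cpt[c][v][j] * lam_c[j] for j in VALUES)
     return result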




We are now λ-happy

• So now we have a way to recursively compute all the
  λ(Xi)’s, starting from the root and using the leaves as
  the base case.
• If we want, we can think of each node in the network
  as an autonomous processor that passes a little “λ
  message” to its parent.
      [Diagram: each leaf passes a λ message up to its parent, which
      combines them and passes its own λ message up, and so on.]




The other half of the problem

• Remember, P(Xi|E) = απ(Xi)λ(Xi). Now that we have
  all the λ(Xi)’s, what about the π(Xi)’s?
       π(Xi) = P(Xi |Ei+).
• What about the root of the tree, Xr? In that case, Er+
  is the null set, so π(Xr) = P(Xr). No sweat. Since we
  also know λ(Xr), we can compute the final P(Xr).
• So for an arbitrary Xi with parent Xp, let’s inductively
  assume we know π(Xp) and/or P(Xp|E). How do we
  get π(Xi)?




Computing π(Xi)

    [Diagram: Xp → Xi]

    π(Xi) = P(Xi | Ei+) = Σj P(Xi, Xp = j | Ei+)

          = Σj P(Xi | Xp = j, Ei+) P(Xp = j | Ei+)

          = Σj P(Xi | Xp = j) P(Xp = j | Ei+)

          = Σj P(Xi | Xp = j) [ P(Xp = j | E) / λi(Xp = j) ]

          = Σj P(Xi | Xp = j) πi(Xp = j)

        Where πi(Xp) is defined as P(Xp | E) / λi(Xp)
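
 Continuing the same assumed representation (and VALUES from the λ sketch),
 here is one π step and the final normalization. Note the division assumes
 λi(Xp = j) > 0; zero entries need special handling in a real
 implementation:

 def pi_message(cpt_i, posterior_p, lam_i):
     # pi(X_i) = sum_j P(X_i | X_p = j) * [P(X_p = j | E) / lambda_i(X_p = j)]
     pi_p = {j: posterior_p[j] / lam_i[j] for j in VALUES}  # pi_i(X_p)
     return {v: sum(cpt_i[j][v] * pi_p[j] for j in VALUES) for v in VALUES}

 def posterior(pi_i, lam_i_node):
     # P(X_i | E) = alpha * pi(X_i) * lambda(X_i), normalized over values.
     unnorm = {v: pi_i[v] * lam_i_node[v] for v in VALUES}
     z = sum(unnorm.values())
     return {v: unnorm[v] / z for v in VALUES}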




We’re done. Yay!

• Thus we can compute all the π(Xi)’s, and, in turn, all
  the P(Xi|E)’s.
• Can think of nodes as autonomous processors passing
  λ and π messages to their neighbors

      [Diagram: each node passes λ messages up to its parent and
      π messages down to its children.]




Conjunctive queries

• What if we want, e.g., P(A, B | C) instead of just
  marginal distributions P(A | C) and P(B | C)?
• Just use the chain rule:
   • P(A, B | C) = P(A | C) P(B | A, C)
   • Each of the latter probabilities can be computed
     using the technique just discussed.
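
 For instance, assuming a hypothetical query(var, evidence) routine that
 returns P(var | evidence) as a dict over values using the λ/π machinery
 above:

 def conjunctive(a, a_val, b, b_val, evidence, query):
     p_a = query(a, evidence)[a_val]                # P(A | C)
     p_b = query(b, {**evidence, a: a_val})[b_val]  # P(B | A, C)
     return p_a * p_b                               # P(A, B | C)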




Polytrees

• Technique can be generalized to polytrees:
  undirected versions of the graphs are still trees, but
  nodes can have more than one parent




Dealing with cycles

• Can deal with undirected cycles in graph by
   • clustering variables together
      [Diagram: the cycle A → B, A → C, B → D, C → D becomes the chain
      A → BC → D when B and C are clustered into a single node BC.]

   • Conditioning: instantiate a cycle-cutting variable to each of its
     values (set to 0, set to 1) and solve each resulting cycle-free
     network




Join trees

• Arbitrary Bayesian network can be transformed via
  some evil graph-theoretic magic into a join tree in
  which a similar method can be employed.
   [Diagram: a multiply connected network over nodes A–G is transformed
   into a join tree whose nodes are clusters of variables such as ABC,
   BCD, and DF.]
 In the worst case the join tree nodes must take on exponentially many
 combinations of values, but the method often works well in practice.



