Bayesian Networks
        Unit 7 Approximate Inference
            in Bayesian Networks
                    Wang, Yuan-Kai, 王元凱
                      ykwang@mails.fju.edu.tw
                       http://www.ykwang.tw

       Department of Electrical Engineering, Fu Jen Univ.
                     輔仁大學電機工程系

                                 2006~2011

                      Reference this document as:
    Wang, Yuan-Kai, "Approximate Inference in Bayesian Networks,"
    Lecture Notes of Wang, Yuan-Kai, Fu Jen University, Taiwan, 2011.



                             Goal of This Unit
         • P(X|e) inference for Bayesian networks
         • Why approximate inference
               – Exact inference is too slow because of
                 exponential complexity
         • Using approximate approaches
               – Sampling methods
                    • Likelihood weighting sampling
                    • Markov Chain Monte Carlo sampling
               – Loopy belief propagation
               – Variational method




                              Related Units
         • Background
               – Probabilistic graphical model
               – Exact inference in BN
         • Next units
               – Probabilistic inference over time







                             Self-Study References
         • Chapter 14, Artificial Intelligence: A Modern
           Approach, 2nd ed., S. Russell & P. Norvig, Prentice
           Hall, 2003.
         • Inference in Bayesian networks, B. D'Ambrosio, AI
           Magazine, 1999.
         • Probabilistic inference in graphical models, M. I.
           Jordan & Y. Weiss.
         • An introduction to MCMC for machine learning,
           C. Andrieu, N. de Freitas, A. Doucet, & M. I. Jordan,
           Machine Learning, vol. 50, pp. 5-43, 2003.
         • Computational Statistics Handbook with Matlab,
           W. L. Martinez and A. R. Martinez, Chapman &
           Hall/CRC, 2002.
               – Chapter 3 Sampling Concepts
               – Chapter 4 Generating Random Variables



             Structure of Related Lecture Notes
         (Overview diagram: Problem → Structure → Data, illustrated with the
         burglary network B, E, A, J, M and its CPTs P(B), P(E), P(A|B,E),
         P(J|A), P(M|A).)
         • PGM Representation
               – Unit 5: BN
               – Unit 9: Hybrid BN
               – Units 10~15: Naïve Bayes, MRF, HMM, DBN, Kalman filter
         • Query Inference
               – Unit 6: Exact inference
               – Unit 7: Approximate inference
               – Unit 8: Temporal inference
         • Structure & Parameter Learning
               – Units 16~: MLE, EM




                                        Contents
          1.   Sampling
          2.   Random Number Generator
          3.   Stochastic Simulation
          4.   Markov Chain Monte Carlo
          5.   Loopy Belief Propagation
          6.   Variational Methods
          7.   Implementation
          8.   Summary
          9.   References





                              4 Steps of Inference
         • Step 1: Bayes' theorem
               P(X | E=e) = P(X, E=e) / P(E=e) ∝ P(X, E=e)
         • Step 2: Marginalization
               P(X, E=e) = Σ_{h∈H} P(X, E=e, H=h)
         • Step 3: Conditional independence
               = Σ_{h∈H} Π_{i=1~n} P(Xi | Pa(Xi))
         • Step 4: Product-sum computation (Enumeration)
               – Exact inference
               – Approximate inference



           Five Types of Queries in Inference
         • For a probabilistic graphical model G
         • Given a set of evidence E=e
         • Query the PGM with
              – P(e): Likelihood query
              – arg max P(e):
                Maximum likelihood query
              – P(X|e): Posterior belief query
              – arg max_x P(X=x|e): (single query variable)
                Maximum a posteriori (MAP) query
              – arg max_{x1,…,xk} P(X1=x1, …, Xk=xk|e):
                Most probable explanation (MPE) query


                             Approximate Inference
                              v.s. Exact Inference
         • Exact inference: P(X|E) = 0.71828
              – Gets the exact probability value
              – Uses inference steps derived from the
                probability rules
              – Needs exponential time complexity
         • Approximate inference: P(X|E) ≈ 0.71
              – Gets an approximate probability value
              – Uses sampling theory
              – Needs only polynomial time complexity;
                fast computation




                    Why Approximate Inference
         • Large treewidth
               – Large, highly connected graphical models
               – Treewidth may be large (>40) even in sparse
                 networks
         • In many applications, an approximation is
           sufficient
               – Example: P(X = x|e) = 0.3183098861
               – P(X = x|e) ≈ 0.3 may be a good enough
                 approximation
               – e.g., we take action only if P(X=x|e) > 0.5




                               1. Sampling
         • 1.1 What Is Sampling
         • 1.2 Sampling for Inference







                             Basic Idea of Sampling
         • Why sampling
                – Estimate values by random number
                  generation
         1. Sampling
                – = Random number generation
                – Draw N samples from a known distribution P,
                  i.e., generate N random numbers that follow P
         2. Estimation
                – Compute an approximate probability P̂ that
                  approximates the real posterior probability
                  P(X|E)




                             1.1 What Is Sampling
         • A very simple example with one random
           variable: coin toss
               – Tossing the coin gives head or tail
               – It is a Boolean R.V.
                  • Coin = head or tail
               – If the coin is unbiased, head and tail have
                 equal probability
                  • A prior probability distribution
                    P(Coin) = <0.5, 0.5>
                  • Uniform distribution
               – Assume we have a coin but do not
                 know whether it is biased



                             Sampling of Coin Toss
         • Sampling in this example
           = flipping the coin N times
               – e.g., N = 1000
               – One flip → one sample
               – Ideally: 500 heads, 500 tails
                    • P(head) = 500/1000 = 0.5
                      P(tail) = 500/1000 = 0.5
               – In practice, say: 501 heads, 499 tails
                    • P(head) = 501/1000 = 0.501
                      P(tail) = 499/1000 = 0.499
         • After the sampling,
               – We can estimate the probability distribution
               – and check whether the coin is biased



                    Sampling & Estimation (Math)
        • For a Boolean random variable X
              – P(X) is the prior distribution
                = <P(x), P(¬x)>
              – Use a sampling algorithm to generate N
                samples
              – Let N(x) be the number of samples in which x is
                true, and N(¬x) the number in which x is false
                        N(x)/N ≈ P̂(x),    N(¬x)/N ≈ P̂(¬x)
                        lim_{N→∞} N(x)/N = P(x),    lim_{N→∞} N(¬x)/N = P(¬x)



                      1.2 Sampling for Inference
         • Given a Bayesian network G including
           (X1, …, Xn)
               – We get a joint probability distribution
                 P(X1, …, Xn) = Π_i P(Xi | Pa(Xi))
         • For a query P(X|E=e)
               – P(X|e) = α Σ_h Π_i P(Xi | Pa(Xi))
               – It is hard to compute exactly
                    • Needs time exponential in the number of Xi
               – We will use sampling to approximate it





                    Compute P(X|e) by Sampling
         • Sampling (explained in Sections 2, 3, 4)
               – Generate N samples from
                 P(X1, …, Xn) = Π_i P(Xi | Pa(Xi))
         • Estimation
               – Use the N samples to estimate
                 P(X,e) ≈ N(X,e)/N
               – Use the N samples to estimate P(e) ≈ N(e)/N
               – Estimate P(X|e) by P(X,e) / P(e)






                    What Is Sampling Algorithm
         • The algorithm to
               – Generate samples from a known
                 probability distribution P
               – Estimate the approximate probability P̂







                    Various Sampling Algorithms
        • Stochastic simulation (Section 3)
              – Direct sampling
              – Rejection sampling
                    • Reject samples disagreeing with the evidence
              – Likelihood weighting
                    • Use the evidence to weight samples
        • Markov chain Monte Carlo (MCMC) (Section 4)
              – Sample from a stochastic process whose
                stationary distribution is the true posterior





              2. Random Number Generator

         • Very important for sampling algorithm
         • Introduce basic concepts related to
           sampling of Bayesian networks
         • Subsections
               – 2.1 Univariate
               – 2.2 Multivariate






              RNG In Programming Languages
         • Random number generator (RNG)
               – C/C++: rand()
               – Java: Math.random()
               – Matlab: rand()
         • Why should we discuss it?
               – These functions generate random numbers
                 with a uniform distribution
               – How do we generate
                  • Gaussian, …
                  • Multivariate, dependent random
                    variables
                  • Distributions with no closed form?



              Generate a Random Number (1/2)
         • Example in C
               – int i = rand();
               – Returns an integer in 0 ~ RAND_MAX (at least 32767)
         • Generate a random number
           between 1 and n (n < RAND_MAX)
               – int i = 1 + ( rand() % n );
               – (rand() % n) returns a number between 0
                 and n − 1
               – Adding 1 shifts the result to between 1
                 and n
               – This generates integers, not real numbers



              Generate a Random Number (2/2)
         • Ex: integer between 1 and 6
               – 1 + ( rand() % 6 )
         • Ex: real number between 0 and 1
               – double r = (double)rand() / RAND_MAX;
                 (the cast avoids integer division, which would
                 almost always yield 0)
         • Exercise
               – Real number between 10 and 20 (one possible
                 solution is sketched below)
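
         • One possible solution sketch for the exercise (ours, not
           from the notes), reusing the cast-to-double idiom above:

           #include <stdlib.h>

           /* A real number uniformly distributed in [10, 20). */
           double uniform_10_20(void)
           {
               double u = (double)rand() / ((double)RAND_MAX + 1.0); /* u in [0, 1) */
               return 10.0 + u * (20.0 - 10.0);
           }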






                      Generate
            Many Random Numbers Repeatedly
         • Using loop for repeated generation
               – for (int i=0; i<1000; i++)
                  { rand(); }
               – int i, j[1000];
                 for (i=0; i<1000; i++)
                  { j[i] = 1 + rand() % 6; }

                    rand() generates a number uniformly
                            Uniform distribution



               Why Generate Random Numbers
         • Simulate random behavior
         • Make random decision
         • Estimate some values







                Random Behavior/Decision (1/2)
         • Flip a coin for decision (Boolean)
               – Fair: each face has equal probability
               – int coin_face;
                 if (rand() > RAND_MAX/2)
                         coin_face = 1;
                    else coin_face = 0;
               – int coin_face;
                 coin_face = rand() % 2;
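
         • The same pattern extends to a biased coin; a minimal
           sketch (the helper name bernoulli is ours):

           #include <stdlib.h>

           /* Return 1 with probability p, 0 with probability 1-p. */
           int bernoulli(double p)
           {
               return ((double)rand() / RAND_MAX) < p;
           }

           /* e.g., a coin that lands heads 70% of the time: */
           /* int coin_face = bernoulli(0.7); */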






         Random Behavior/Decision (2/2)
       • Random decision among multiple choices
             – Discrete random variable
       • Ex: roll a die (uniform distribution)
             – Fair: each face has equal probability
       • int die_face;    // Random variable
         die_face = 1 + rand() % 6;   // faces 1..6







                                  Estimation
         • If we can simulate a random behavior
         • We can estimate some values
               – First, we repeat the random behavior
               – Then we estimate the value







                         Example: The Coin Toss
         • Flip the coin 1000 times to estimate the
           fairness of the coin
               – int coin_face;           // Random variable
                 int frequency[2] = {0, 0};
                 for (i=0; i<1000; i++)
                 { coin_face = rand() % 2;    // uniform over {0, 1}
                   frequency[coin_face]++;
                 }
         (The resulting frequency histogram over coin faces 0 and 1
         is approximately uniform.)



         Example : Area of Circle (Estimation)
         • double x, y;  // Two random variables
           int N=1000, NCircle=0;
           double Area;
           for (i=0; i<N; i++)
           { x = (double)rand() / RAND_MAX;   // x and y are independent,
             y = (double)rand() / RAND_MAX;   // each uniform in [0, 1]
             if ( (x*x + y*y) <= 1 )          // (x, y) is one sample
                NCircle = NCircle + 1;
           }
           Area = 4.0 * NCircle / N;  // estimates the area of the unit circle (π)



       Multiple Dependent Random Variables
         • Markov Chain: n random variables
                       X1               ...            Xk         ...                     Xn

         • Bayesian Networks: 5 random variables
                             Burglary            Earthquake


                                        Alarm                            What is a sample ?

                             John Calls          Mary Calls

                               Variables are dependent



                                      Sampling
         • It is to randomly generate a sample
               – For a random variable X or                                                Univariate
                 A set of random variables X1, …, Xn                                       Multivariate
                  • Boolean, Discrete, Continuous
                  • Multivariate
                        – Independent, dependent
               – According to a probability distribution P(X)
                  • Discrete X: Histogram
                  • Continuous X:
                        – Uniform, Gaussian, or
                        – Any distribution: Gaussian mixture models



                              Sub-Sections for
                             Generating a Sample
         • 2.1 Univariate
               – Uniform, Gaussian, Gaussian mixture
         • 2.2 Multivariate
               – Uniform
               – Gaussian
                    • Independent, dependent
               – Any distribution
                    • Gaussian mixture
                        – Independent, dependent
                    • Bayesian network




                             2.1 Univariate
         • For a random variable X
               – Boolean, discrete, continuous, hybrid
         • We know P(X) is
               – Uniform, Gaussian, Gaussian mixture
         • Generate a sample X according to P(X)







                             Uniform Generator
         • Every programming language provides
           a rand()/random() function that generates
           a uniformly distributed number
               – Integer within [0, MAX)
         • Sampling a Boolean uniform number
               – rand() % 2
         • Sampling a discrete uniform number
           within [0, d)
               – rand() % d
         • Sampling a continuous uniform number
               – Within [0, 1): rand() / (double)MAX
               – Within [a, b): a + (rand() / (double)MAX)*(b - a)



                    Example : Uniform Generator
         • x=rand(1,10000);
         • h=hist(x,20);
         • bar(h);
         (Bar chart of the 20-bin histogram: each bin holds roughly
         500 samples, i.e. an approximately flat histogram.)



                        Gaussian Generator (1/2)
         • Samples from a Gaussian can be obtained
           from uniformly distributed samples
         • There are functions in C/Java/Matlab to
           randomly generate a univariate
           Gaussian real number with (μ, σ) = (0, 1)
               – C: Numerical Recipes in C
               – Java: Random.nextGaussian()
               – Matlab: randn()
         • Suppose this function is called Gaussian()




                      Gaussian Generator (2/2)
         • Sampling a continuous Gaussian
           number with (μ, σ)
               – (Gaussian() * σ) + μ
         • Sampling a discrete Gaussian number
           with (μ, σ)? (One possible answer is sketched below.)
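
         • One plausible answer (our sketch, not from the notes): draw
           a continuous Gaussian number and round it to the nearest
           integer. Box–Muller is one standard way to build Gaussian()
           from two uniform draws:

           #include <math.h>
           #include <stdlib.h>
           #define PI 3.14159265358979323846

           /* Standard normal (mu=0, sigma=1) via the Box-Muller transform. */
           double Gaussian(void)
           {
               double u1 = ((double)rand() + 1.0) / ((double)RAND_MAX + 1.0); /* (0,1] */
               double u2 = ((double)rand() + 1.0) / ((double)RAND_MAX + 1.0);
               return sqrt(-2.0 * log(u1)) * cos(2.0 * PI * u2);
           }

           /* A discrete Gaussian sample: round a continuous N(mu,sigma) draw. */
           int discrete_gaussian(double mu, double sigma)
           {
               return (int)floor(mu + Gaussian() * sigma + 0.5);
           }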







           Example : Gaussian Generator (1/2)
         • Pseudo code
               – Assume Gaussian() is a pseudo function that
                 generates standard Gaussian numbers
               – double x[10000];
                 for (i=0; i<10000; i++)
                   x[i] = Gaussian();
               – for (i=0; i<10000; i++)
                   x[i] = μ + Gaussian() * σ;






           Example : Gaussian Generator (2/2)
         • Matlab
               – x=randn(1,10000);
               – h=hist(x,20);
               – bar(h);
               (Bar chart: a bell-shaped histogram peaking near
               the middle bins.)
         • Java
               – Random r = new Random();
                 double[] x = new double[10000];
                 for (i=0; i<10000; i++)
                   x[i] = r.nextGaussian();



              Gaussian Mixture Generator (1/2)
         • Random variable X with a Gaussian distribution
               – P(X) = N(X; μ, σ)
         • Random variable Y with a Gaussian
           mixture
               – P(Y) = Σm πm N(Y; μm, σm)







              Gaussian Mixture Generator (2/2)
         • Generate N samples of X
               – for (i=0; i<N; i++)
                   x[i] = (Gaussian() * σ) + μ;
         • Generate N samples of Y with a mixture
           of M Gaussians (a per-sample alternative is
           sketched below)
               – Each Gaussian m has weight πm and parameters μm, σm
               – for (m=0; m<M; m++)
                   for (i=0; i<N*πm; i++)
                      y[m][i] = (Gaussian() * σm) + μm;
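
         • The loop above assigns exactly N*πm samples to component m.
           An alternative sketch (ours) picks the component at random
           for every sample, which also works when N*πm is not an
           integer; Gaussian() is the standard-normal generator
           sketched earlier:

           #include <stdlib.h>

           double Gaussian(void);  /* standard-normal generator, as sketched earlier */

           /* One draw from a 1-D Gaussian mixture: pick component m with
              probability pi[m], then sample from N(mu[m], sigma[m]). */
           double sample_mixture(int M, const double pi[],
                                 const double mu[], const double sigma[])
           {
               double u = (double)rand() / RAND_MAX;  /* uniform in [0, 1] */
               double cum = 0.0;
               int m = M - 1;                         /* fallback for rounding */
               for (int i = 0; i < M; i++) {
                   cum += pi[i];
                   if (u <= cum) { m = i; break; }
               }
               return mu[m] + Gaussian() * sigma[m];
           }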




                    Example : Gaussian Mixture
                            Generator
         •   N=10000; pi1=0.8; pi2=0.2;
         •   mu1=0; mu2=15; sigma1=3; sigma2=5;
         •   x1 = mu1 + randn(1,N*pi1) * sigma1;
         •   x2 = mu2 + randn(1,N*pi2) * sigma2;
         •   x = [x1, x2];
         •   h=hist(x,50);
         •   bar(h);
         (Histogram: a tall mode near 0 and a smaller, wider mode near 15.)




                             2.2 Multivariate
         • For random variables X1,… ,Xn
               – Boolean, discrete, continuous, hybrid
         • We know P(X1,… ,Xn) is
               – Uniform, Gaussian, Gaussian mixture, any
                 distribution
         • Generate a sample (X1,… ,Xn) according
           to P(X1,… ,Xn)
               – Independent
               – Dependent




          Multivariate Boolean Uniform Generator
         • Boolean random variables X1,… ,Xn
         • int X[n]; // A sample
           for (i=0; i<n; i++)
             X[i] = rand() % 2;







          Multivariate Discrete Uniform Generator

         • Discrete random variables X1, …, Xn
               – Each with d discrete values: [0, d−1]
               – Each Xi is uniformly distributed
               – X1, …, Xn must be independent
         • int X[n]; // A sample
           for (i=0; i<n; i++)
             X[i] = rand() % d;





               Multivariate Gaussian Generator
                     - Independent (1/2)
         • Pseudo codes
         • For n random variables X = (X1, …, Xn)
               – Gaussian: N(X; μ, Σ)
                    • Mean vector: μ
                    • Covariance matrix: Σ = [σij]
         • X1, …, Xn are independent
               – σij = 0 for i ≠ j
         • Generating a sample of X
            = generating each Xi independently



               Multivariate Gaussian Generator
                     - Independent (2/2)
         • Generate a sample of X = (X1, …, Xn) with
           μi = 0, σii = 1, σij = 0 for i ≠ j
               – double X[n]; // a sample
                 for (i=0; i<n; i++)
                    X[i] = Gaussian();
         • Generate a sample of X = (X1, …, Xn) with
           μi ≠ 0, σii ≠ 1, σij = 0 for i ≠ j
               – double X[n]; // a sample
                 for (i=0; i<n; i++)
                    X[i] = μi + Gaussian() * sqrt(σii);  // σii is the variance of Xi




                             Example – Matlab (1/2)
         μX = (0, 0)ᵀ,  ΣX = [1 0; 0 1]

         mx=[0 0]';
         Cx=[1 0; 0 1];
         x1=-3:0.1:3;
         x2=-3:0.1:3;
         for i=1:length(x1),
           for j=1:length(x2),
             f(i,j)=(1/(2*pi*det(Cx)^(1/2)))*exp((-1/2)*...
               ([x1(i) x2(j)]-mx')*inv(Cx)*([x1(i);x2(j)]-mx));
           end
         end
         mesh(x1,x2,f)
         pause;
         contour(x1,x2,f)
         pause




                             Example – Matlab (2/2)
         • Randomly generate 1000 samples for
           μX = (0, 0)ᵀ,  ΣX = [1 0; 0 1]

             y1=randn(1,1000);
             y2=randn(1,1000);
             plot(y1,y2,'.');





               Multivariate Gaussian Generator
                      - Dependent (1/4)
        • For n random variables X = (X1, …, Xn)
             – Gaussian: N(X; μ, Σ)
                    • Mean vector: μ
                    • Covariance matrix: Σ = [σij]
                      – Σ is a positive definite matrix
                         • Symmetric, and all eigenvalues (pivots) > 0
                      – For a general matrix A: A = LDU
                         • L: lower triangular, U: upper triangular,
                           D: diagonal matrix of pivots
                      – For a symmetric matrix S: S = LDLᵀ
                      – For a positive definite matrix:
                        Σ = LDLᵀ = (L D^{1/2})(L D^{1/2})ᵀ = PPᵀ
                      – This is called the Cholesky decomposition
        • X1, …, Xn are dependent
             – σij ≠ 0


           Multivariate Gaussian Generator
                   - Dependent (2/4)
        • Generate a sample of X with μ, Σ
             – Perform the Cholesky decomposition of Σ
                    • Cholesky decomposition is the pivot decomposition
                      of a positive definite matrix
                    • Σ = PPᵀ, with P lower triangular
             – Generate independent Gaussian Y = (Y1, …, Yn)
               with μi = 0, σi = 1
             – X = PY + μ





               Multivariate Gaussian Generator
                      - Dependent (3/4)
         • Pseudo code to generate a sample of X
           with μ, Σ
               – Matrix Σ;
                 Vector μ;
                 Vector X(n), Y(n); // a sample

                 Matrix P = chol(Σ)';  // lower-triangular Cholesky factor
                                       // (Matlab's chol returns the upper factor)
                 for (i=0; i<n; i++) Y(i) = Gaussian();
                 X = P*Y + μ;
         (A plain-C version of both steps is sketched below.)
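
         • For readers without Matlab, a plain-C sketch of both steps
           (names are ours; the standard Cholesky–Banachiewicz
           recursion). Unlike Matlab's chol(), it returns the
           lower-triangular factor P with Σ = PPᵀ directly:

           #include <math.h>

           /* Lower-triangular Cholesky factor P of a symmetric positive
              definite n x n matrix S (row-major): S = P * P^T.
              Returns 0 on success, -1 if S is not positive definite. */
           int cholesky(int n, const double *S, double *P)
           {
               for (int i = 0; i < n; i++) {
                   for (int j = 0; j <= i; j++) {
                       double sum = S[i*n + j];
                       for (int k = 0; k < j; k++)
                           sum -= P[i*n + k] * P[j*n + k];
                       if (i == j) {
                           if (sum <= 0.0) return -1;
                           P[i*n + i] = sqrt(sum);
                       } else {
                           P[i*n + j] = sum / P[j*n + j];
                       }
                   }
                   for (int j = i + 1; j < n; j++) P[i*n + j] = 0.0;
               }
               return 0;
           }

           /* X = P*Y + mu, with P lower triangular. */
           void gaussian_transform(int n, const double *P, const double *Y,
                                   const double *mu, double *X)
           {
               for (int i = 0; i < n; i++) {
                   X[i] = mu[i];
                   for (int j = 0; j <= i; j++)
                       X[i] += P[i*n + j] * Y[j];
               }
           }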



               Multivariate Gaussian Generator
                      - Dependent (4/4)
        • Proof
              – For n random variables X = (X1, …, Xn) with μ, Σ
              – Generate n independent, zero-mean, unit-variance
                normal random variables
                Y = (Y1, …, Yn)ᵀ,  μY = (0, …, 0)ᵀ,  ΣY = I (the identity matrix)
              – Take X = PY + μ, where Σ = PPᵀ

          Cov(X) = E{(X − μ)(X − μ)ᵀ} = E{(PY)(PY)ᵀ} = E{P Y Yᵀ Pᵀ}
                 = P E{Y Yᵀ} Pᵀ = P Pᵀ = Σ



                             Example – Matlab (1/4)
      Assume
         μX = (0, 0)ᵀ
         ΣX = [1 1/2; 1/2 1],  P = [1 0; 1/2 √3/2]

       Matlab:
       mx=[0 0]';
       Cx=[1 1/2; 1/2 1];
       P=chol(Cx)';  % transpose: Matlab's chol returns the upper factor




                             Example – Matlab (2/4)
         • Randomly generate 1000 samples for
           μX = (0, 0)ᵀ,  ΣX = [1 1/2; 1/2 1]
         • mx=zeros(2,1000);
           y1=randn(1,1000);
           y2=randn(1,1000);
           y=[y1;y2];
           P=[1, 0; 1/2, sqrt(3)/2];
           x=P*y+mx;
           x1=x(1,:);
           x2=x(2,:);
           plot(x1,x2,'.');
           r=corrcoef(x1',x2');



                             Example – Matlab (3/4)
      Assume
         μX = (5, 5)ᵀ
         ΣX = [1 0.9; 0.9 1],  P = [1 0; 9/10 √19/10]

       Matlab:
        • mx=[5 5]';
        • Cx=[1 9/10; 9/10 1];
        • P=chol(Cx)';  % transpose: Matlab's chol returns the upper factor




                             Example – Matlab (4/4)
         • Randomly generate 1000 samples for
           μX = (5, 5)ᵀ,  ΣX = [1 0.9; 0.9 1]
         • mx=5*ones(2,1000);
           y1=randn(1,1000);
           y2=randn(1,1000);
           y=[y1;y2];
           P=[1, 0; 9/10, sqrt(19)/10];
           x=P*y+mx;
           x1=x(1,:);
           x2=x(2,:);
           plot(x1,x2,'.');
           r=corrcoef(x1',x2');


                    Multivariate Gaussian Mixture
                              Generator
       • Generate N samples of X with a mixture of M
         Gaussians (Matlab-like pseudo code)
             – for (m=0; m<M; m++)
               { Matrix P=chol(Σm)'; %lower Cholesky factor
                 for (i=0; i<N*πm; i++)
                  { %Generate n independent normally distributed
                    % R.V. (μ=0, σ=1), as a column vector
                    y = randn(n, 1)
                    % Transform y into x
                    x=P*y+μm
                  }
               }



                             Example – Matlab (1/4)
         • Combine the previous two Gaussians:
           π1 = 0.5, π2 = 0.5
           μ1 = (0, 0)ᵀ,  Σ1 = [1 1/2; 1/2 1]
           μ2 = (5, 5)ᵀ,  Σ2 = [1 0.9; 0.9 1]
         (Scatter plot of the samples: two overlapping elongated
         clusters, centered at (0,0) and (5,5).)




                             Example – Matlab (2/4)
         • pi1=0.5; pi2=0.5; N=2000;
           mx1=zeros(2,pi1*N); Cx1=[1 1/2; 1/2 1];
           P1=chol(Cx1)'; %P1=[1, 0; 1/2, sqrt(3)/2]
           y1_1=randn(1,pi1*N); y1_2=randn(1,pi1*N);
           y1=[y1_1;y1_2];
           x1=P1*y1+mx1; x1_1=x1(1,:); x1_2=x1(2,:);

           mx2=5*ones(2,pi2*N); Cx2=[1 9/10; 9/10 1];
           P2=chol(Cx2)'; %P2=[1, 0; 9/10, sqrt(19)/10]
           y2_1=randn(1,pi2*N); y2_2=randn(1,pi2*N);
           y2=[y2_1;y2_2];
           x2=P2*y2+mx2; x2_1=x2(1,:); x2_2=x2(2,:);

           z1=[x1_1,x2_1]; z2=[x1_2,x2_2];
           plot(z1,z2,'.');



                             Example – Matlab (3/4)
         • Combine the previous two Gaussians:
           π1 = 0.2, π2 = 0.8
           μ1 = (0, 0)ᵀ,  Σ1 = [1 1/2; 1/2 1]
           μ2 = (5, 5)ᵀ,  Σ2 = [1 0.9; 0.9 1]
         (Scatter plot: the cluster at (5,5) now holds most of the samples.)





                             Example – Matlab (4/4)
         • pi1=0.2; pi2=0.8; N=2000;
           mx1=zeros(2,pi1*N); Cx1=[1 1/2; 1/2 1];
           P1=chol(Cx1)'; %P1=[1, 0; 1/2, sqrt(3)/2]
           y1_1=randn(1,pi1*N); y1_2=randn(1,pi1*N);
           y1=[y1_1;y1_2];
           x1=P1*y1+mx1; x1_1=x1(1,:); x1_2=x1(2,:);

           mx2=5*ones(2,pi2*N); Cx2=[1 9/10; 9/10 1];
           P2=chol(Cx2)'; %P2=[1, 0; 9/10, sqrt(19)/10]
           y2_1=randn(1,pi2*N); y2_2=randn(1,pi2*N);
           y2=[y2_1;y2_2];
           x2=P2*y2+mx2; x2_1=x2(1,:); x2_2=x2(2,:);

           z1=[x1_1,x2_1]; z2=[x1_2,x2_2];
           plot(z1,z2,'.');



                                     Exercise
         • Write a program to randomly generate
           1000 samples of a 3-dimensional Gaussian
           with μ=(5,10,−3), Σ=(2,1,3;4,2,2;3,1,2)
               – Note: a covariance matrix must be symmetric
                 positive definite; check (and if necessary
                 symmetrize) Σ before applying the Cholesky
                 decomposition







                             Any Distribution
         • For random variables X1, …, Xn
               – Boolean, discrete, continuous, hybrid
         • We know P(X1, …, Xn) has no closed-form
           formula
               – Independent: P(X1, …, Xn) = P(X1) … P(Xn)
               – Dependent:
                 P(X1, …, Xn) = Π P(Xi | Parent(Xi))
         • Generate a sample (X1, …, Xn) according to
           P(X1, …, Xn)
               – Independent: generate each Xi from P(Xi)
               – Dependent: generate each Xi from P(Xi | Parent(Xi))
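
         • Every case above reduces to one primitive: drawing from a
           discrete distribution, e.g. the CPT row P(Xi | Parent(Xi))
           selected by the already-sampled parent values. A minimal
           inverse-CDF sketch in C (the function name is ours):

           #include <stdlib.h>

           /* Draw an index in [0, d) from a discrete distribution p[0..d-1]
              (nonnegative entries summing to 1) by inverting the cumulative sum. */
           int sample_discrete(int d, const double p[])
           {
               double u = (double)rand() / RAND_MAX;
               double cum = 0.0;
               for (int i = 0; i < d; i++) {
                   cum += p[i];
                   if (u <= cum) return i;
               }
               return d - 1;  /* guard against floating-point rounding */
           }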




               Two Boolean R.V.s - Independent
         • X1, X2 have distributions:
              – P(X1) = <0.67, 0.33>, P(X2) = <0.75, 0.25>
         • int X1, X2;
           for (i=0; i<1000; i++)
           { if (rand() > RAND_MAX/3)   // X1 = 1 with probability 2/3 ≈ 0.67
                X1 = 1;
             else X1 = 0;
             if (rand() > RAND_MAX/4)   // X2 = 1 with probability 3/4 = 0.75
                X2 = 1;
             else X2 = 0;
           }
         (Bar charts of P(X1) and P(X2) over the values 0 and 1.)



                    Two Boolean R.V.s - Dependent
         • X1, X2 have distributions:
            – P(X1) = <0.67, 0.33>
            – P(X2|X1=T) = <0.75, 0.25>, P(X2|X1=F) = <0.8, 0.2>
         • Generate a sample (x1, x2)
           if (rand() > RAND_MAX/3) x1 = 1;      // P(x1=1) = 2/3
           else x1 = 0;
           if (x1==1)
              if (rand() > RAND_MAX/4) x2 = 1;   // P(x2=1|x1=1) = 0.75
              else x2 = 0;
           else // x1==0
              if (rand() > RAND_MAX/5) x2 = 1;   // P(x2=1|x1=0) = 0.8
              else x2 = 0;




                             Markov Chain
         • Markov Chain: n random variables
                       X1      ...              Xk          ...                     Xn







                                 Bayesian Network
         • Example: 5 random variables
               Burglary              Earthquake


                             Alarm


               John Calls            Mary Calls








                       3. Stochastic Simulation
         • Also called
               – Monte Carlo Methods
               – Sampling Methods
         • Sub-sections
               – 3.1 Direct sampling
               – 3.2 Rejection sampling
               – 3.3 Likelihood weighting




                  3.1 Direct Sampling
         • Generate N samples randomly
         • For the inference P(X|E)
               – P(X|E) = P(X∧E) / P(E)
               – Get N(E) & N(X∧E) from the N
                 samples
                    • N(E): number of samples consistent with E
                    • N(X∧E): number of samples consistent with both X and E
               – P(E) ≈ N(E) / N,
                 P(X∧E) ≈ N(X∧E) / N
               – P(X|E) ≈ N(X∧E) / N(E)   (a counting sketch in C follows)
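
         • A counting sketch of this estimator in C (ours; it assumes
           the N samples are stored as rows of 4 boolean values, as in
           the sprinkler example below, and that the caller supplies
           the evidence and query tests):

           /* Estimate P(X | E) from N complete samples by counting.
              matches_e tests whether a sample agrees with the evidence E;
              matches_xe tests whether it agrees with both X and E. */
           double direct_estimate(int N, const int samples[][4],
                                  int (*matches_e)(const int *),
                                  int (*matches_xe)(const int *))
           {
               int Ne = 0, Nxe = 0;
               for (int i = 0; i < N; i++) {
                   if (matches_e(samples[i]))  Ne++;
                   if (matches_xe(samples[i])) Nxe++;
               }
               return Ne > 0 ? (double)Nxe / Ne : 0.0;  /* undefined if N(E) = 0 */
           }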




                     Example (1/4)
         • For the sprinkler network
               – Estimate P(r|¬w)
                 by direct sampling
               – 4 random variables
               – A sample =
                 (c, s, r, w)







                                    Example (2/4)
         • Generate 1000 samples
            Cloudy   Sprinkler   Rain   WetGrass
              T          T         T        F
              F          T         T        F
              F          F         T        T
              T          T         T        F
              T          T         T        F
             ...        ...       ...      ...
              F          T         T        F




                          Example (3/4)
         • P(r|¬w) = P(r,¬w) / P(¬w)
           N(¬w): number of samples with WetGrass=False
           N(r∧¬w): number of samples with Rain=True and WetGrass=False
           Estimate: N(r∧¬w) / N(¬w)
            Cloudy   Sprinkler   Rain   WetGrass
              T          T         T        F
              F          T         T        F
              F          F         T        T
              T          T         F        F
             ...        ...       ...      ...
              F          T         T        F



                                    Example (4/4)
         • P(R|¬w)
              – = P(R,¬w) / P(¬w)
              – = < P(r∧¬w)/P(¬w), P(¬r∧¬w)/P(¬w) >
            Cloudy   Sprinkler   Rain   WetGrass
              T          T         T        F
              F          T         T        F
              F          F         T        T
              T          T         F        F
             ...        ...       ...      ...
              F          T         T        F

               How to Generate a Sample
            for the Bayesian Network? (1/3)
         • The sprinkler Bayesian network
     A sample is an atomic event:
     (cloudy, sprinkler, rain, wetgrass)
     = (T, F, T, T)

       • Assume a sampling order:
         [ Cloudy, Sprinkler, Rain, WetGrass ]



                       How to Generate a Sample
                    for the Bayesian Network? (2/3)
         • int C, S, R, W;
           for (i=0; i<1000; i++)
           { if (rand() > RAND_MAX/2) C = T;
                else C = F;
             if (rand() > RAND_MAX/2) S = T;
                else S = F;
             if (rand() > RAND_MAX/2) R = T;
                else R = F;
             if (rand() > RAND_MAX/2) W = T;
                else W = F;
            }
         (Incorrect implementation: every variable is sampled from
         <0.5, 0.5>, ignoring the CPTs and the dependencies among
         the variables.)


                       How to Generate a Sample
                    for the Bayesian Network? (3/3)
       • int C, S, R, W;
         for (i=0; i<1000; i++)
         { if (rand() > RAND_MAX/2) C = T;
           else C = F;
           if (C==T)
               if (rand() > RAND_MAX*0.9)
                    S = T;
               else S = F;
           else // C==F
               if (rand() > RAND_MAX/2)
                    S = T;
               else S = F;
           ...
         }
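
        • Completing the pattern for all four variables gives one full
          prior sample per call. A sketch assuming the textbook
          sprinkler CPTs (the entries not shown on these slides, e.g.
          P(s|¬c)=0.5 and P(w|¬s,¬r)=0.0, are the standard Russell &
          Norvig values):

          #include <stdlib.h>

          static int flip(double p)                 /* 1 with probability p */
          {
              return ((double)rand() / RAND_MAX) < p;
          }

          /* Sample (C, S, R, W) in topological order. */
          void sample_sprinkler(int *C, int *S, int *R, int *W)
          {
              *C = flip(0.5);                       /* P(c) = 0.5            */
              *S = flip(*C ? 0.1 : 0.5);            /* P(s|c),  P(s|~c)      */
              *R = flip(*C ? 0.8 : 0.2);            /* P(r|c),  P(r|~c)      */
              if (*S && *R)      *W = flip(0.99);   /* P(w|s,r)              */
              else if (*S || *R) *W = flip(0.9);    /* P(w|s,~r) = P(w|~s,r) */
              else               *W = flip(0.0);    /* P(w|~s,~r)            */
          }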

                           An Example
                    Generating One Sample (1/8)
         • The sampling algorithm
1. Sample from P(Cloudy) = <0.5, 0.5>
   – Suppose it returns true
2. Sample from P(Sprinkler|Cloudy=true) = <0.1, 0.9>
   – Suppose it returns false
3. Sample from P(Rain|Cloudy=true) = <0.8, 0.2>
   – Suppose it returns true
4. Sample from P(WetGrass|Sprinkler=false, Rain=true) = <0.9, 0.1>
   – Suppose it returns true
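As a worked check (not on the original slide), the probability that prior sampling generates exactly this event is the product of the four conditionals used above: P(c) · P(¬s|c) · P(r|c) · P(w|¬s,r) = 0.5 × 0.9 × 0.8 × 0.9 = 0.324.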

                           An Example
                    Generating One Sample (2/8)
Samples:   C  S  R  W   (none generated yet)
(Figure: the sprinkler Bayesian network)

                           An Example
                    Generating One Sample (3/8)
Random sampling: Cloudy
Return: Cloudy = true
Samples:   C  S  R  W
           c

                           An Example
                    Generating One Sample (4/8)
Samples:   C  S  R  W
           c
Random sampling, given Cloudy = true:
1. Sprinkler
2. Rain

                           An Example
                    Generating One Sample (5/8)
Samples:   C  S  R  W
           c  ¬s
Random sampling: Sprinkler, given Cloudy = true
Return: Sprinkler = false

                           An Example
                    Generating One Sample (6/8)
Samples:   C  S  R  W
           c  ¬s  r
Random sampling: Rain, given Cloudy = true
Return: Rain = true

                           An Example
                    Generating One Sample (7/8)
Samples:   C  S  R  W
           c  ¬s  r
Random sampling: WetGrass, given Rain = true, Sprinkler = false

                           An Example
                    Generating One Sample (8/8)
Samples:   C  S  R  W
           c  ¬s  r  w
Random sampling: WetGrass, given Rain = true, Sprinkler = false
Return: WetGrass = true



                             The Algorithm (1/2)
         • To generate one sample
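The PRIOR-SAMPLE algorithm figure did not survive extraction. Below is a minimal C sketch of the procedure for the sprinkler network; the CPT values for Cloudy, Sprinkler, and Rain are those quoted on the nearby slides, while the full WetGrass CPT (0.99 / 0.90 / 0.90 / 0.00) is an assumption taken from the standard textbook version of this network, and the helper names are illustrative:

  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  /* Return 1 with probability p, using the C library rand(). */
  static int bernoulli(double p) {
      return rand() < p * ((double)RAND_MAX + 1.0);
  }

  /* One pass of prior sampling: sample each variable in
     topological order, conditioned on its sampled parents. */
  static void prior_sample(int *C, int *S, int *R, int *W) {
      *C = bernoulli(0.5);                      /* P(c) = 0.5            */
      *S = bernoulli(*C ? 0.1 : 0.5);           /* P(s|c), P(s|~c)       */
      *R = bernoulli(*C ? 0.8 : 0.2);           /* P(r|c), P(r|~c)       */
      if (*S && *R)      *W = bernoulli(0.99);  /* P(w|s,r), assumed     */
      else if (*S || *R) *W = bernoulli(0.90);  /* P(w|s,~r) = P(w|~s,r) */
      else               *W = bernoulli(0.00);  /* P(w|~s,~r), assumed   */
  }

  int main(void) {
      int C, S, R, W;
      srand((unsigned)time(NULL));
      prior_sample(&C, &S, &R, &W);
      printf("(C,S,R,W) = (%d,%d,%d,%d)\n", C, S, R, W);
      return 0;
  }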







                  The Algorithm (2/2)
• In the previous example
     – We got a sample [true, false, true, true] of the
       Bayesian network using the PRIOR-SAMPLE algorithm
• Sampling a Bayesian network
     – Repeat the sampling N times
     – We get N samples
• We can use the N samples to compute
  any query probability in the Bayesian network



                 How It Works (1/2)
• Why can any probability be answered from the samples?
     – The N samples effectively form a full joint
       distribution table (FJD)

  Samples:             FJD:
  C  S  R  W           C  S  R  W    P
  T  T  T  F           T  T  T  F   0.02
  F  T  T  F           F  T  T  F   0.13
  F  F  T  T           F  F  T  T   0.04
  T  T  F  F           T  T  F  F   0.15
  ...                  ...
  F  T  T  F



                             Why It Works (2/2)
• A sample is an atomic event (x1, ..., xn)
• P(x1, ..., xn) ≈ N(x1, ..., xn) / N
• Therefore, an FJD is generated from the N samples
• Note: N < 2^n (far fewer samples than the 2^n entries of the exact FJD)







                       Exercise: Direct Sampling
Query: What is the probability that a student studied, given that they pass the exam?
(Figure: network with smart and study as parents of prepared; smart, prepared, and fair as parents of pass)
p(smart) = .8    p(study) = .6    p(fair) = .9

p(prep|…)     smart   ¬smart
  study        .9      .7
  ¬study       .5      .1

p(pass|…)        smart           ¬smart
              prep   ¬prep    prep   ¬prep
  fair         .9     .7       .7     .2
  ¬fair        .1     .1       .1     .1



                    Problems of Direct Sampling
         • It needs to generate very many
           samples in order to obtain the
           approximate FJD
         • For a query of conditional
           probability P(X|e)
               – Can we just approximate the
                 conditional probability?
               – Yes, the following two algorithms will
                 do this



                             3.2 Rejection Sampling
• P̂(X|e) is estimated from samples agreeing with e
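The REJECTION-SAMPLING algorithm figure is missing here; the following minimal sketch estimates P(Rain | Sprinkler = true), assuming the prior_sample() routine from the earlier sketch (both are illustrations, not the unit's own code):

  /* Rejection sampling: draw prior samples, discard those that
     disagree with the evidence, and count Rain among the rest. */
  double rejection_rain_given_s(int num_samples) {
      int C, S, R, W, n_agree = 0, n_rain = 0;
      for (int i = 0; i < num_samples; i++) {
          prior_sample(&C, &S, &R, &W);
          if (!S) continue;              /* reject: disagrees with e */
          n_agree++;
          if (R) n_rain++;
      }
      /* Caller must check n_agree > 0; it may be 0 if P(e) is small. */
      return (double)n_rain / n_agree;
  }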







                                An Example
         • Estimate P(Rain|Sprinkler=true)
           using 100 samples
– 27 samples have Sprinkler = true
     – Of these, 8 have Rain = true and 19 have Rain = false
     ⇒ P̂(Rain|Sprinkler=true) = Normalize(<8, 19>) = <0.296, 0.704>
• Similar to a basic real-world empirical estimation procedure



                    Analysis of Rejection Sampling

  P̂(X|e) = N(X, e) / N(e) ≈ P(X, e) / P(e) = P(X|e)
• Hence rejection sampling returns consistent posterior estimates
• Problem: expensive if P(e) is small
     – P(e) drops off exponentially with the number of evidence
       variables! With 10 binary evidence variables whose observed
       values each match about half the time, only about one sample
       in 2^10 ≈ 1000 survives rejection.





               3.3 Likelihood Weighting
         • Avoids the inefficiency of rejection
           sampling
               – By generating only events consistent
                 with the evidence variables e
• Idea
     – Fix the evidence variables
     – Randomly sample only the hidden variables to generate a sample event
     – Weight each sample event by the likelihood it accords the evidence
        • Events have different weights



                  An Example (1/9)
         • Query P(Rain|sprinkler, wetgrass)







                             An Example (2/9)
1. Set the weight w = 1.0
2. Sample from P(Cloudy) = <0.5, 0.5>
   • Suppose it returns true
3. The evidence Sprinkler = true, so we set
   w = w × P(sprinkler|cloudy) = 1 × 0.1 = 0.1
4. Sample from P(Rain|cloudy) = <0.8, 0.2>
   • Suppose it returns true
5. The evidence WetGrass = true, so we set
   w = w × P(wetgrass|sprinkler, rain) = 0.1 × 0.99 = 0.099
⇒ A sample event (true, true, true, true) with weight 0.099



                             An Example (3/9)




                       =1.0




                             An Example (4/9)




                       =1.0




                             An Example (5/9)




                       =1.0




                             An Example (6/9)




                    =1.0  0.1



                             An Example (7/9)




                    =1.0  0.1



                             An Example (8/9)




                    =1.0  0.1



                             An Example (9/9)




          =1.0  0.1  0.99
           = 0.099




                   The Algorithm (1/2)
• The example generates a sample event (true, true, true, true)
  for the query P(Rain|sprinkler, wetgrass)
• Repeat the sampling N times
     – We get N sample events
     – Each event has a likelihood weight w
     – w1 = Σ of the weights of events with rain = true,
       w2 = Σ of the weights of events with rain = false
• P(Rain|sprinkler, wetgrass) = < w1/(w1+w2), w2/(w1+w2) >



                             The Algorithm (2/2)
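The LIKELIHOOD-WEIGHTING algorithm figure did not survive extraction. A minimal self-contained C sketch for the running query P(Rain | sprinkler = true, wetgrass = true), assuming the same CPT values as the earlier prior-sampling sketch:

  #include <stdio.h>
  #include <stdlib.h>

  static int bernoulli(double p) {           /* 1 with probability p */
      return rand() < p * ((double)RAND_MAX + 1.0);
  }

  /* One weighted sample: hidden variables (Cloudy, Rain) are sampled
     from their CPTs; evidence (Sprinkler=T, WetGrass=T) is fixed and
     contributes its likelihood to the weight. */
  static double weighted_sample_rain(int *rain) {
      double w = 1.0;
      int C = bernoulli(0.5);               /* sample Cloudy           */
      w *= C ? 0.1 : 0.5;                   /* evidence s: P(s|C)      */
      int R = bernoulli(C ? 0.8 : 0.2);     /* sample Rain             */
      w *= R ? 0.99 : 0.90;                 /* evidence w: P(w|s,R),
                                               assumed CPT values      */
      *rain = R;
      return w;
  }

  int main(void) {
      double w1 = 0.0, w2 = 0.0;            /* weight sums for r, ~r   */
      for (int i = 0; i < 100000; i++) {
          int R;
          double w = weighted_sample_rain(&R);
          if (R) w1 += w; else w2 += w;
      }
      printf("P(Rain|s,w) ~= <%.3f, %.3f>\n",
             w1 / (w1 + w2), w2 / (w1 + w2));
      return 0;
  }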







                    Exercise: Likelihood Weighting
Query: What is the probability that a student studied, given that they pass the exam?
(Figure: network with smart and study as parents of prepared; smart, prepared, and fair as parents of pass)
p(smart) = .8    p(study) = .6    p(fair) = .9

p(prep|…)     smart   ¬smart
  study        .9      .7
  ¬study       .5      .1

p(pass|…)        smart           ¬smart
              prep   ¬prep    prep   ¬prep
  fair         .9     .7       .7     .2
  ¬fair        .1     .1       .1     .1



                     Analysis (1/3)
• Why does the algorithm work for P(X|E=e)?
• Let the sampling probability of WEIGHTED-SAMPLE be S_WS
     – The evidence variables E are fixed at e
     – All the other variables: Z = {X} ∪ Y
     – The algorithm samples each variable in Z given its parent values:
       S_WS(z, e) = Π_{i=1..l} P(z_i | parents(Z_i))



                        Analysis (2/3)
• The likelihood weight w for a given sample (z, e) = (x, y, e) is
     w(z, e) = Π_{i=1..m} P(e_i | parents(E_i))
• The weighted probability of a sample (z, e) = (x, y, e) is
     S_WS(z, e) · w(z, e)
     = Π_{i=1..l} P(z_i | parents(Z_i)) · Π_{i=1..m} P(e_i | parents(E_i))
     = P(x, y, e)
  (by the chain rule: P(x_1, ..., x_n) = Π_{i=1..n} P(x_i | parents(X_i)))




                                   Analysis (3/3)
  P̂(x|e) = α Σ_y N_WS(x, y, e) · w(x, y, e)
          ≈ α' Σ_y S_WS(x, y, e) · w(x, y, e)
          = α' Σ_y P(x, y, e)
          = α' P(x, e) ∝ P(x|e)
So the algorithm works



                      Discussions
• Likelihood weighting is efficient because it uses all the samples generated
• However, its performance degrades as the number of evidence variables increases, because
     – Most samples will have very low weights, and
     – The weighted estimate will be dominated by the tiny fraction of
       samples that give the evidence more than an infinitesimal likelihood




                       4. Inference by MCMC
         • Key idea
               – Sampling process as a Markov Chain
                    • Next sample depends on the previous one
               – Approximate any posterior distribution
         • "State" of network
           = current assignment to all variables
         • Generate next state
               – by sampling one variable given Markov
                 blanket
         • Sample each variable in turn, keeping
           evidence fixed



                             The Markov Chain
• With Sprinkler = true, WetGrass = true, there are four states:
  the joint assignments of the non-evidence variables Cloudy and Rain
(Figure: the four states, with transitions between states that differ in one variable)



                       Markov Blanket Sampling
         • Markov blanket of Cloudy is
               – Sprinkler and Rain
         • Markov blanket of Rain is
               – Cloudy, Sprinkler, and WetGrass
• The probability given the Markov blanket is calculated as follows:
     – P(x'i | MB(Xi)) ∝ P(x'i | Parents(Xi))
       × Π_{Zj ∈ Children(Xi)} P(zj | Parents(Zj))



                             An Example (1/2)
         • Estimate P(Rain|sprinkler,wetgrass)
         • Loop for N times
               – Sample Cloudy or Rain given its
                 Markov blanket
         • Count number of times Rain=true
           and Rain=false in the samples






                             An Example (2/2)
         • E.g., visit 100 states
               – 31 have Rain=true,
               – 69 have Rain=false
         • P(Rain|sprinkler,wetgrass)
           = Normalize(<31, 69>)
           = <0.31, 0.69>






                             The Algorithm
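The MCMC algorithm figure is missing from this extraction. A minimal self-contained C sketch of Gibbs sampling for P(Rain | sprinkler = true, wetgrass = true) on the sprinkler network, under the same assumed CPTs as before; each step resamples Cloudy or Rain from its distribution given its Markov blanket:

  #include <stdio.h>
  #include <stdlib.h>

  static int bernoulli(double p) {           /* 1 with probability p */
      return rand() < p * ((double)RAND_MAX + 1.0);
  }

  static double gibbs_rain(int num_steps) {
      int C = 1, R = 1;                      /* arbitrary initial state */
      int n_rain = 0;
      for (int t = 0; t < num_steps; t++) {
          if (rand() % 2) {
              /* Resample Cloudy given its blanket {Sprinkler, Rain}:
                 P(C | s, R) is proportional to P(C) P(s|C) P(R|C). */
              double pt = 0.5 * 0.1 * (R ? 0.8 : 0.2);  /* C = true  */
              double pf = 0.5 * 0.5 * (R ? 0.2 : 0.8);  /* C = false */
              C = bernoulli(pt / (pt + pf));
          } else {
              /* Resample Rain given its blanket {Cloudy, S, W}:
                 P(R | C, s, w) is proportional to P(R|C) P(w|s,R). */
              double pt = (C ? 0.8 : 0.2) * 0.99;       /* R = true  */
              double pf = (C ? 0.2 : 0.8) * 0.90;       /* R = false */
              R = bernoulli(pt / (pt + pf));
          }
          if (R) n_rain++;                   /* count visited states */
      }
      return (double)n_rain / num_steps;
  }

  int main(void) {
      printf("P(rain|s,w) ~= %.3f\n", gibbs_rain(100000));
      return 0;
  }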







                              Why it works
         • Skipped
               – Details in pp. 517-518 in the AIMA 2e
                 textbook







                               Sub-Sections
         • 4.1 Markov chain theory
         • 4.2 Two MCMC sampling algorithms







            4.1 Markov Chain Theory
• Suppose X1, X2, … take some set of values
     – w.l.o.g., these values are 1, 2, ...
• A Markov chain is a process that corresponds to the network:
     X1 → X2 → X3 → … → Xn → …
• To quantify the chain, we need to specify
     – Initial probability: P(X1)
     – Transition probability: P(Xt+1|Xt)
• A Markov chain has a stationary transition probability:
  P(Xt+1|Xt) is the same for all times t



                             Irreducible Chains
• A state j is accessible from state i if there is an n
  such that P(Xn = j | X1 = i) > 0
     – There is a positive probability of reaching j from i
       after some number of steps
• A chain is irreducible if every state is accessible from every state




                             Ergodic Chains
• A state i is positively recurrent if there is a finite expected
  time to get back to state i after being in state i
     – If X has a finite number of states, it suffices that
       i is accessible from itself
• A chain is ergodic if it is irreducible and every state is
  positively recurrent





                              (A)periodic Chains
• A state i is periodic if there is an integer d > 1 such that
     P(Xn = i | X1 = i) = 0 whenever n is not divisible by d
• Intuition: state i can recur only every d steps
• A chain is aperiodic if it contains no periodic state






                             Stationary Probabilities
Thm:
• If a chain is ergodic and aperiodic, then the limit
     lim_{n→∞} P(Xn = j | X1 = i)
  exists and does not depend on i
• Moreover, let P*(X = j) = lim_{n→∞} P(Xn = j | X1 = i);
  then P*(X) is the unique probability distribution satisfying
     P*(X = j) = Σ_i P(Xt+1 = j | Xt = i) · P*(X = i)
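A worked example (illustrative numbers, not from the slides): for a two-state chain with P(Xt+1 = 2 | Xt = 1) = 0.3 and P(Xt+1 = 1 | Xt = 2) = 0.6, solving P*(1) = 0.7·P*(1) + 0.6·P*(2) together with P*(1) + P*(2) = 1 gives P*(1) = 2/3 and P*(2) = 1/3.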





                 Stationary Probabilities
         • The probability P*(X) is the stationary
           probability of the process
         • Regardless of the starting point, the
           process will converge to this probability

         • The rate of convergence depends on
           properties of the transition probability






                               Sampling from the
                             Stationary Probability
         • This theory suggests how to sample from
           the stationary probability:
               – Set X1 = i, for some random/arbitrary i
               – For t = 1, 2, …, n
                  • Sample a value xt+1 for Xt+1 from
                   P(Xt+1|Xt=xt)
               – return xn
         • If n is large enough, then this is a sample
           from P*(X)
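A minimal C sketch of this procedure for an assumed two-state chain (the same illustrative transition probabilities as the worked example above):

  #include <stdlib.h>

  static int bernoulli(double p) {           /* 1 with probability p */
      return rand() < p * ((double)RAND_MAX + 1.0);
  }

  /* Simulate the chain for n steps and return the final state,
     which for large n is approximately distributed as P*(X). */
  static int sample_stationary(int n) {
      int x = 1;                                    /* arbitrary start */
      for (int t = 1; t < n; t++) {
          if (x == 1) x = bernoulli(0.3) ? 2 : 1;   /* P(X=2|X=1)=0.3  */
          else        x = bernoulli(0.6) ? 1 : 2;   /* P(X=1|X=2)=0.6  */
      }
      return x;                              /* ~ P* = <2/3, 1/3> here */
  }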




                       Designing Markov Chains
         • How do we construct the right chain to
           sample from?
               – Ensuring aperiodicity and irreducibility is
                 usually easy

         • Problem is ensuring the desired
           stationary probability






                       Designing Markov Chains
Key tool (the detailed-balance criterion):
• If the transition probability satisfies
     P(Xt+1 = j | Xt = i) / P(Xt+1 = i | Xt = j) = Q(X = j) / Q(X = i)
  whenever P(Xt+1 = j | Xt = i) > 0,
  then P*(X) = Q(X)
• This gives a local criterion for checking that the chain
  will have the right stationary distribution





                             MCMC Methods
         • We can use these results to sample from
             P(X1,…,Xn|e)
         Idea:
         • Construct an ergodic & aperiodic
           Markov Chain such that
           P*(X1,…,Xn) = P(X1,…,Xn|e)
         • Simulate the chain n steps to get a
           sample





                             MCMC Methods
         Notes:
• The Markov chain variable Y takes as values the assignments
  to all variables that are consistent with the evidence:
     V(Y) = { (x1, ..., xn) ∈ V(X1) × … × V(Xn) | x1, ..., xn satisfy e }
• For simplicity, we will denote such a state using the
  vector of variables






                      4.2 Two MCMC Sampling
                            Algorithms
         • Gibbs Sampler
         • Metropolis-Hastings Sampler







                             Gibbs Sampler
• One of the simplest MCMC methods
• Each transition changes the state of one Xi
• The transition probability is defined by P itself,
  as a stochastic procedure:
     – Input: a state x1,…,xn
     – Choose i at random (uniform probability)
     – Sample x'i from P(Xi | x1, …, xi-1, xi+1, …, xn, e)
     – Let x'j = xj for all j ≠ i
     – Return x'1,…,x'n



                    Correctness of Gibbs Sampler
         • How do we show correctness?







                    Correctness of Gibbs Sampler
• By the chain rule,
     P(x1,…,xi-1, xi, xi+1,…,xn | e)
     = P(x1,…,xi-1, xi+1,…,xn | e) · P(xi | x1,…,xi-1, xi+1,…,xn, e)
• Thus, for the transition we get
     P(x1,…,xi-1, xi, xi+1,…,xn | e) / P(x1,…,xi-1, x'i, xi+1,…,xn | e)
     = P(xi | x1,…,xi-1, xi+1,…,xn, e) / P(x'i | x1,…,xi-1, xi+1,…,xn, e)
• Since we choose i from the same distribution at each stage,
  this procedure satisfies the ratio criterion



                             Gibbs Sampling for
                             Bayesian Network
         • Why is the Gibbs sampler “easy” in BNs?
         • Recall that the Markov blanket of a
           variable separates it from the other
           variables in the network
               – P(Xi | X1,…,Xi-1,Xi+1,…,Xn) = P(Xi |
                    Mbi )
         • This property allows us to use local
           computations to perform sampling in
           each transition



                               Gibbs Sampling in
                               Bayesian Networks
• How do we evaluate P(Xi | x1,…,xi-1, xi+1,…,xn)?
• Let Y1, …, Yk be the children of Xi
     – By the definition of Mbi, the parents of Yj are in Mbi ∪ {Xi}
• It is easy to show that
     P(xi | Mbi) = [ P(xi | Pai) · Π_j P(yj | pa_yj) ]
                   / [ Σ_{x'i} P(x'i | Pai) · Π_j P(yj | pa_yj) ]
  (in the denominator, each pa_yj has xi replaced by x'i)




                             Metropolis-Hastings
• More general than Gibbs (Gibbs is a special case of M-H)
• Uses an arbitrary proposal distribution q(x'|x) that makes
  the chain ergodic and aperiodic (e.g., uniform)
• The transition to x' happens with probability
     α(x'|x) = min(1, [P(x') q(x|x')] / [P(x) q(x'|x)])
• Useful when computing P(x) exactly is infeasible: only the
  ratio P(x')/P(x) is needed, so any normalizing constant cancels
• Requires: q(x'|x) = 0 implies P(x') = 0 or q(x|x') = 0
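A minimal sketch of one Metropolis-Hastings step over the four (Cloudy, Rain) states of the sprinkler example, with a uniform (symmetric) proposal so that q cancels in the ratio; unnormalized_p() is a hypothetical helper built from the assumed CPTs used in the earlier sketches:

  #include <stdlib.h>

  static int bernoulli(double p) {           /* 1 with probability p */
      return rand() < p * ((double)RAND_MAX + 1.0);
  }

  /* P(C, R, s, w) up to the normalizing constant P(s, w). */
  static double unnormalized_p(int C, int R) {
      return 0.5 * (C ? 0.1 : 0.5)                       /* P(C) P(s|C) */
                 * (C ? (R ? 0.8 : 0.2) : (R ? 0.2 : 0.8))  /* P(R|C)   */
                 * (R ? 0.99 : 0.90);                    /* P(w|s,R)    */
  }

  /* One M-H step: propose (C', R') uniformly, accept with
     probability min(1, P(x')/P(x)); q cancels by symmetry. */
  static void mh_step(int *C, int *R) {
      int Cp = rand() % 2, Rp = rand() % 2;
      double a = unnormalized_p(Cp, Rp) / unnormalized_p(*C, *R);
      if (a >= 1.0 || bernoulli(a)) { *C = Cp; *R = Rp; }  /* accept */
  }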



                             Sampling Strategy
• How do we collect the samples?
Strategy I:
• Run the chain M times, each for N steps
     – Each run starts from a different starting point
• Return the last state in each run
(Figure: M independent chains run in parallel)



                             Sampling Strategy
Strategy II:
• Run one chain for a long time
• After some "burn-in" period, sample points every fixed number of steps
(Figure: one long chain; after the burn-in period, M samples are taken at regular intervals)
         Fu Jen University      Department of Electrical Engineering              Wang, Yuan-Kai Copyright
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn
07 approximate inference in bn

More Related Content

What's hot

Probabilistic Reasoning
Probabilistic ReasoningProbabilistic Reasoning
Probabilistic ReasoningJunya Tanaka
 
KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...
KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...
KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...Simplilearn
 
Inference in Bayesian Networks
Inference in Bayesian NetworksInference in Bayesian Networks
Inference in Bayesian Networksguestfee8698
 
Lecture9 - Bayesian-Decision-Theory
Lecture9 - Bayesian-Decision-TheoryLecture9 - Bayesian-Decision-Theory
Lecture9 - Bayesian-Decision-TheoryAlbert Orriols-Puig
 
Variational Autoencoder
Variational AutoencoderVariational Autoencoder
Variational AutoencoderMark Chang
 
Genetic Algorithms - Artificial Intelligence
Genetic Algorithms - Artificial IntelligenceGenetic Algorithms - Artificial Intelligence
Genetic Algorithms - Artificial IntelligenceSahil Kumar
 
Independent Component Analysis
Independent Component AnalysisIndependent Component Analysis
Independent Component AnalysisTatsuya Yokota
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision treesKnoldus Inc.
 
Adaptive Resonance Theory
Adaptive Resonance TheoryAdaptive Resonance Theory
Adaptive Resonance TheoryNaveen Kumar
 
Pattern recognition and Machine Learning.
Pattern recognition and Machine Learning.Pattern recognition and Machine Learning.
Pattern recognition and Machine Learning.Rohit Kumar
 
Inference in First-Order Logic
Inference in First-Order Logic Inference in First-Order Logic
Inference in First-Order Logic Junya Tanaka
 

What's hot (20)

Uncertainty in AI
Uncertainty in AIUncertainty in AI
Uncertainty in AI
 
Bayesian network
Bayesian networkBayesian network
Bayesian network
 
03 Machine Learning Linear Algebra
03 Machine Learning Linear Algebra03 Machine Learning Linear Algebra
03 Machine Learning Linear Algebra
 
Probabilistic Reasoning
Probabilistic ReasoningProbabilistic Reasoning
Probabilistic Reasoning
 
Bayesian networks
Bayesian networksBayesian networks
Bayesian networks
 
Naive Bayes Presentation
Naive Bayes PresentationNaive Bayes Presentation
Naive Bayes Presentation
 
KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...
KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...
KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...
 
Multi Layer Network
Multi Layer NetworkMulti Layer Network
Multi Layer Network
 
Inference in Bayesian Networks
Inference in Bayesian NetworksInference in Bayesian Networks
Inference in Bayesian Networks
 
Bayesian networks
Bayesian networksBayesian networks
Bayesian networks
 
Lecture9 - Bayesian-Decision-Theory
Lecture9 - Bayesian-Decision-TheoryLecture9 - Bayesian-Decision-Theory
Lecture9 - Bayesian-Decision-Theory
 
Variational Autoencoder
Variational AutoencoderVariational Autoencoder
Variational Autoencoder
 
Genetic Algorithms - Artificial Intelligence
Genetic Algorithms - Artificial IntelligenceGenetic Algorithms - Artificial Intelligence
Genetic Algorithms - Artificial Intelligence
 
Independent Component Analysis
Independent Component AnalysisIndependent Component Analysis
Independent Component Analysis
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision trees
 
First order logic
First order logicFirst order logic
First order logic
 
Adaptive Resonance Theory
Adaptive Resonance TheoryAdaptive Resonance Theory
Adaptive Resonance Theory
 
Pattern recognition and Machine Learning.
Pattern recognition and Machine Learning.Pattern recognition and Machine Learning.
Pattern recognition and Machine Learning.
 
Inference in First-Order Logic
Inference in First-Order Logic Inference in First-Order Logic
Inference in First-Order Logic
 
Graph coloring using backtracking
Graph coloring using backtrackingGraph coloring using backtracking
Graph coloring using backtracking
 

Similar to 07 approximate inference in bn

Information processing with artificial spiking neural networks
Information processing with artificial spiking neural networksInformation processing with artificial spiking neural networks
Information processing with artificial spiking neural networksAdvanced-Concepts-Team
 
Learning Bayesian Networks
Learning Bayesian NetworksLearning Bayesian Networks
Learning Bayesian Networksguestfee8698
 
Subspace Identification
Subspace IdentificationSubspace Identification
Subspace Identificationaileencv
 
Bayesian probabilistic interference
Bayesian probabilistic interferenceBayesian probabilistic interference
Bayesian probabilistic interferencechauhankapil
 
Bayesian probabilistic interference
Bayesian probabilistic interferenceBayesian probabilistic interference
Bayesian probabilistic interferencechauhankapil
 
Feasibility of moment tensor inversion for a single-well microseismic data us...
Feasibility of moment tensor inversion for a single-well microseismic data us...Feasibility of moment tensor inversion for a single-well microseismic data us...
Feasibility of moment tensor inversion for a single-well microseismic data us...Oleg Ovcharenko
 
Data Mining: Concepts and techniques classification _chapter 9 :advanced methods
Data Mining: Concepts and techniques classification _chapter 9 :advanced methodsData Mining: Concepts and techniques classification _chapter 9 :advanced methods
Data Mining: Concepts and techniques classification _chapter 9 :advanced methodsSalah Amean
 
Csss2010 20100803-kanevski-lecture2
Csss2010 20100803-kanevski-lecture2Csss2010 20100803-kanevski-lecture2
Csss2010 20100803-kanevski-lecture2hasan_elektro
 
2016Nonlinear inversion of electrical resistivity imaging.pdf
2016Nonlinear inversion of electrical resistivity imaging.pdf2016Nonlinear inversion of electrical resistivity imaging.pdf
2016Nonlinear inversion of electrical resistivity imaging.pdfDUSABEMARIYA
 
Artificial neural networks
Artificial neural networksArtificial neural networks
Artificial neural networksstellajoseph
 

Similar to 07 approximate inference in bn (20)

06 exact inference in bn
06 exact inference in bn06 exact inference in bn
06 exact inference in bn
 
08 probabilistic inference over time
08 probabilistic inference over time08 probabilistic inference over time
08 probabilistic inference over time
 
05 probabilistic graphical models
05 probabilistic graphical models05 probabilistic graphical models
05 probabilistic graphical models
 
03 Uncertainty inference(discrete)
03 Uncertainty inference(discrete)03 Uncertainty inference(discrete)
03 Uncertainty inference(discrete)
 
02 Statistics review
02 Statistics review02 Statistics review
02 Statistics review
 
Information processing with artificial spiking neural networks
Information processing with artificial spiking neural networksInformation processing with artificial spiking neural networks
Information processing with artificial spiking neural networks
 
AI Lesson 29
AI Lesson 29AI Lesson 29
AI Lesson 29
 
Lesson 29
Lesson 29Lesson 29
Lesson 29
 
Learning Bayesian Networks
Learning Bayesian NetworksLearning Bayesian Networks
Learning Bayesian Networks
 
Dx25751756
Dx25751756Dx25751756
Dx25751756
 
Subspace Identification
Subspace IdentificationSubspace Identification
Subspace Identification
 
Bayesian probabilistic interference
Bayesian probabilistic interferenceBayesian probabilistic interference
Bayesian probabilistic interference
 
Bayesian probabilistic interference
Bayesian probabilistic interferenceBayesian probabilistic interference
Bayesian probabilistic interference
 
04 Uncertainty inference(continuous)
04 Uncertainty inference(continuous)04 Uncertainty inference(continuous)
04 Uncertainty inference(continuous)
 
Feasibility of moment tensor inversion for a single-well microseismic data us...
Feasibility of moment tensor inversion for a single-well microseismic data us...Feasibility of moment tensor inversion for a single-well microseismic data us...
Feasibility of moment tensor inversion for a single-well microseismic data us...
 
Data Mining: Concepts and techniques classification _chapter 9 :advanced methods
Data Mining: Concepts and techniques classification _chapter 9 :advanced methodsData Mining: Concepts and techniques classification _chapter 9 :advanced methods
Data Mining: Concepts and techniques classification _chapter 9 :advanced methods
 
Jovan-DPG-Poster
Jovan-DPG-PosterJovan-DPG-Poster
Jovan-DPG-Poster
 
Csss2010 20100803-kanevski-lecture2
Csss2010 20100803-kanevski-lecture2Csss2010 20100803-kanevski-lecture2
Csss2010 20100803-kanevski-lecture2
 
2016Nonlinear inversion of electrical resistivity imaging.pdf
2016Nonlinear inversion of electrical resistivity imaging.pdf2016Nonlinear inversion of electrical resistivity imaging.pdf
2016Nonlinear inversion of electrical resistivity imaging.pdf
 
Artificial neural networks
Artificial neural networksArtificial neural networks
Artificial neural networks
 

More from IEEE International Conference on Intelligent Information Hiding and Multimedia Signal Processing

More from IEEE International Conference on Intelligent Information Hiding and Multimedia Signal Processing (11)

Computer Vision in the Age of IoT
Computer Vision in the Age of IoTComputer Vision in the Age of IoT
Computer Vision in the Age of IoT
 
2014/07/17 Parallelize computer vision by GPGPU computing
2014/07/17 Parallelize computer vision by GPGPU computing2014/07/17 Parallelize computer vision by GPGPU computing
2014/07/17 Parallelize computer vision by GPGPU computing
 
Towards Embedded Computer Vision - New @ 2013
Towards Embedded Computer Vision - New @ 2013Towards Embedded Computer Vision - New @ 2013
Towards Embedded Computer Vision - New @ 2013
 
老師與教學助理的互動經驗分享 1010217
老師與教學助理的互動經驗分享 1010217老師與教學助理的互動經驗分享 1010217
老師與教學助理的互動經驗分享 1010217
 
Parallel Vision by GPGPU/CUDA
Parallel Vision by GPGPU/CUDAParallel Vision by GPGPU/CUDA
Parallel Vision by GPGPU/CUDA
 
Markov Random Field (MRF)
Markov Random Field (MRF)Markov Random Field (MRF)
Markov Random Field (MRF)
 
01 Probability review
01 Probability review01 Probability review
01 Probability review
 
Monocular Human Pose Estimation with Bayesian Networks
Monocular Human Pose Estimation with Bayesian NetworksMonocular Human Pose Estimation with Bayesian Networks
Monocular Human Pose Estimation with Bayesian Networks
 
Towards Embedded Computer Vision邁向嵌入式電腦視覺
Towards Embedded Computer Vision邁向嵌入式電腦視覺Towards Embedded Computer Vision邁向嵌入式電腦視覺
Towards Embedded Computer Vision邁向嵌入式電腦視覺
 
Intelligent Video Surveillance with Cloud Computing
Intelligent Video Surveillance with Cloud ComputingIntelligent Video Surveillance with Cloud Computing
Intelligent Video Surveillance with Cloud Computing
 
Intelligent Video Surveillance and Sousveillance
Intelligent Video Surveillance and SousveillanceIntelligent Video Surveillance and Sousveillance
Intelligent Video Surveillance and Sousveillance
 

Recently uploaded

Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxJiesonDelaCerna
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitolTechU
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
  • 5. Structure of Related Lecture Notes
[Diagram: the burglary network (B, E → A → J, M) with CPTs P(B), P(E), P(A|B,E), P(J|A), P(M|A), linking the lecture units: Representation (Unit 5: BN; Unit 9: Hybrid BN; Units 10~15: Naïve Bayes, MRF, HMM, DBN, Kalman filter), Structure/Parameter Learning from data (Units 16~: MLE, EM), and Query Inference (Unit 6: Exact inference; Unit 7: Approximate inference; Unit 8: Temporal inference).]
  • 6. Contents
– 1. Sampling (p. 11)
– 2. Random Number Generator (p. 20)
– 3. Stochastic Simulation (p. 70)
– 4. Markov Chain Monte Carlo (p. 113)
– 5. Loopy Belief Propagation (p. 145)
– 6. Variational Methods (p. 146)
– 7. Implementation (p. 147)
– 8. Summary (p. 148)
– 9. References (p. 151)
  • 7. 4 Steps of Inference
– Step 1 (Bayes theorem): $P(X \mid E{=}e) = \frac{P(X, E{=}e)}{P(E{=}e)} = \alpha\, P(X, E{=}e)$
– Step 2 (Marginalization): $= \alpha \sum_{h \in H} P(X, E{=}e, H{=}h)$
– Step 3 (Conditional independence): $= \alpha \sum_{h \in H} \prod_{i=1 \sim n} P(X_i \mid Pa(X_i))$
– Step 4 (Product-sum computation, i.e. enumeration): exact inference or approximate inference
  • 8. Five Types of Queries in Inference
– For a probabilistic graphical model G, given a set of evidence E=e, query the PGM with:
– P(e): likelihood query
– arg max P(e): maximum likelihood query
– P(X|e): posterior belief query
– $\arg\max_x P(X{=}x \mid e)$: maximum a posteriori (MAP) query (single query variable)
– $\arg\max_{x_1 \ldots x_k} P(X_1{=}x_1, \ldots, X_k{=}x_k \mid e)$: most probable explanation (MPE) query
  • 9. Approximate Inference vs. Exact Inference
– Exact inference: P(X|E) = 0.71828. Gets the exact probability value, using the inference steps derived from probabilistic formulas; needs exponential time complexity.
– Approximate inference: P(X|E) ≈ 0.71. Gets an approximate probability value, using sampling theory; needs only polynomial time complexity, i.e. fast computation.
  • 10. Why Approximate Inference
– Large treewidth: large, highly connected graphical models; treewidth may be large (>40) in sparse networks.
– In many applications an approximation is sufficient. Example: P(X=x|e) = 0.3183098861; maybe P(X=x|e) ≈ 0.3 is a good enough approximation, e.g. when we take action only if P(X=x|e) > 0.5.
  • 11. 1. Sampling
– 1.1 What Is Sampling
– 1.2 Sampling for Inference
  • 12. Basic Idea of Sampling
– Why sampling: estimate some values by random number generation.
– 1. Sampling → random number generation: draw N samples from a known distribution P, i.e. generate N random numbers from a known distribution S.
– 2. Estimation: compute an approximate probability $\hat{P}$, which approximates the real posterior probability P(X|E).
  • 13. 1.1 What Is Sampling
– A very simple example with a random variable: coin toss. Tossing the coin gives head or tail, so Coin is a Boolean R.V.: coin = head or tail.
– If it is an unbiased coin, head and tail have equal probability: a prior probability distribution P(Coin) = <0.5, 0.5>, a uniform distribution.
– Assume we have a coin but we do not know whether it is unbiased.
  • 14. Sampling of Coin Toss
– Sampling in this example = flipping the coin many times N, e.g. N=1000; one flip → one sample.
– Ideally: 500 heads, 500 tails, so P(head) = 500/1000 = 0.5 and P(tail) = 500/1000 = 0.5.
– Practically: 501 heads, 499 tails, so P(head) = 501/1000 = 0.501 and P(tail) = 499/1000 = 0.499.
– After the sampling we can estimate the probability distribution and check whether the coin is biased.
  • 15. Sampling & Estimation (Math)
– For a Boolean random variable X, P(X) is the prior distribution = <P(x), P(¬x)>.
– Use a sampling algorithm to generate N samples; let N(x) be the number of samples where x is true, and N(¬x) the number where x is false. Then
$\frac{N(x)}{N} = \hat{P}(x), \qquad \frac{N(\neg x)}{N} = \hat{P}(\neg x)$
$\lim_{N \to \infty} \frac{N(x)}{N} = P(x), \qquad \lim_{N \to \infty} \frac{N(\neg x)}{N} = P(\neg x)$
  • 16. 1.2 Sampling for Inference
– Given a Bayesian network G over (X1, …, Xn), we get a joint probability distribution P(X1, …, Xn) = Π P(Xi|Pa(Xi)).
– For a query P(X|E=e): P(X|e) = α Σ Π P(Xi | Parent(Xi)). It is hard to compute, needing time exponential in the number of Xi; we will try to use sampling to compute it.
  • 17. Compute P(X|e) by Sampling
– Sampling (explained in Sections 2, 3, 4): generate N samples of P(X1, …, Xn) = Π P(Xi|Pa(Xi)).
– Estimation: use the N samples to estimate P(X,e) ≈ N(X,e)/N and P(e) ≈ N(e)/N, then estimate P(X|e) by P(X,e) / P(e).
  • 18. What Is a Sampling Algorithm
– An algorithm to generate samples from a known probability distribution P and to estimate the approximate probability $\hat{P}$.
  • 19. Various Sampling Algorithms
– Stochastic simulation (Section 3): direct sampling; rejection sampling (reject samples disagreeing with evidence); likelihood weighting (use evidence to weight samples).
– Markov chain Monte Carlo (MCMC, Section 4): sample from a stochastic process whose stationary distribution is the true posterior.
  • 20. 2. Random Number Generator
– Very important for sampling algorithms; introduces basic concepts related to sampling of Bayesian networks.
– Subsections: 2.1 Univariate; 2.2 Multivariate.
  • 21. RNG in Programming Languages
– Random number generator (RNG): C/C++: rand(); Java: random(); Matlab: rand().
– Why should we discuss it? These functions generate random numbers with a uniform distribution. How do we generate: Gaussian, …; multivariate, dependent random variables; non-closed-form distributions?
  • 22. Generate a Random Number (1/2)
– Example in C: int i = rand(); returns 0 ~ RAND_MAX (32767); it generates integers.
– Generate a random number between 1 and n (n < 32767): int i = 1 + (rand() % n); here (rand() % n) returns a number between 0 and n-1, and adding 1 makes the random number lie between 1 and n. It generates integers, not real numbers.
  • 23. Generate a Random Number (2/2)
– Ex: integer between 1 and 6: 1 + (rand() % 6)
– Ex: real number between 0 and 1: double x = (double)rand() / RAND_MAX; (the cast is needed: rand() / RAND_MAX in integer arithmetic yields only 0 or 1)
– Exercise: real number between 10 and 20.
  • 24. Generate Many Random Numbers Repeatedly
– Using a loop for repeated generation:
    for (int i=0; i<1000; i++) { rand(); }
    int i, j[1000];
    for (i=0; i<1000; i++) { j[i] = 1 + rand() % 6; }
– rand() generates numbers uniformly: a uniform distribution.
  • 25. Why Generate Random Numbers
– Simulate random behavior; make random decisions; estimate some values.
  • 26. Random Behavior/Decision (1/2)
– Flip a coin for a decision (Boolean); fair: each face has equal probability.
    int coin_face;
    if (rand() > RAND_MAX/2) coin_face = 1; else coin_face = 0;
– Or simply:
    int coin_face;
    coin_face = rand() % 2;
  • 27. Random Behavior/Decision (2/2)
– Random decision among multiple choices: a discrete random variable with a uniform distribution. Ex: roll a die; fair: each face has equal probability.
    int die_face; // random variable
    die_face = 1 + rand() % 6; // faces 1..6, as on p. 23
  • 28. Estimation
– If we can simulate a random behavior, we can estimate some values: first repeat the random behavior, then estimate the value.
  • 29. Example: The Coin Toss
– Flip the coin 1000 times to estimate the fairness of the coin:
    int coin_face;          // random variable
    int frequency[2] = {0, 0};
    for (i=0; i<1000; i++) {
        coin_face = rand() % 2;
        frequency[coin_face]++;
    }
[Figure: frequency histogram over coin faces 0 and 1, roughly uniform.]
  • 30. Example: Area of Circle (Estimation)
– Monte Carlo estimate; x and y are independent random variables, and we call (x, y) a sample:
    double x, y;            // two random variables
    int N = 1000, NCircle = 0;
    double Area;
    for (i=0; i<N; i++) {
        x = (double)rand() / RAND_MAX;
        y = (double)rand() / RAND_MAX;
        if ((x*x + y*y) <= 1) NCircle = NCircle + 1;
    }
    Area = 4.0 * NCircle / N;   // floating-point division, not integer
  • 31. Multiple Dependent Random Variables
– Markov chain: n random variables X1 → … → Xk → … → Xn.
– Bayesian network: 5 random variables Burglary, Earthquake → Alarm → JohnCalls, MaryCalls. The variables are dependent; what is a sample here?
  • 32. Sampling
– Sampling is randomly generating a sample for a random variable X (univariate) or a set of random variables X1, …, Xn (multivariate); Boolean, discrete, or continuous; independent or dependent.
– According to a probability distribution P(X): discrete X: histogram; continuous X: uniform, Gaussian, or any distribution (e.g. Gaussian mixture models).
  • 33. Sub-Sections for Generating a Sample
– 2.1 Univariate: uniform, Gaussian, Gaussian mixture.
– 2.2 Multivariate: uniform; Gaussian (independent, dependent); any distribution (Gaussian mixture, independent or dependent; Bayesian network).
  • 34. 2.1 Univariate
– For a random variable X (Boolean, discrete, continuous, hybrid) where we know P(X) is uniform, Gaussian, or a Gaussian mixture: generate a sample of X according to P(X).
  • 35. Uniform Generator
– Every programming language provides a rand()/random() function to generate a uniformly distributed number: an integer within [0, MAX).
– Sampling a Boolean uniform number: rand() % 2
– Sampling a discrete uniform number within [0, d): rand() % d
– Sampling a continuous uniform number within [0, 1): (double)rand() / RAND_MAX; within [a, b): a + ((double)rand() / RAND_MAX) * (b - a)
  • 36. Example: Uniform Generator
    x=rand(1,10000);
    h=hist(x,20);
    bar(h);
[Figure: bar chart of the 20 histogram bins, all near 500 counts.]
  • 37. Gaussian Generator (1/2)
– Sampling a Gaussian can be obtained from the uniform distribution.
– There are functions in C/Java/Matlab to randomly generate a univariate Gaussian real number with (μ, σ) = (0, 1): C: Numerical Recipes in C; Java: Random.nextGaussian(); Matlab: randn(). Suppose this function is called Gaussian().
  • 38. Gaussian Generator (2/2)
– Sampling a continuous Gaussian number with (μ, σ): (Gaussian() * σ) + μ
– Sampling a discrete Gaussian number with (μ, σ)?
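Since Gaussian() is only assumed above, here is a minimal C sketch of one standard way to implement it, the Box-Muller transform, which turns two uniform draws into one N(0,1) draw (the function name Gaussian() follows the slides; everything else is our assumption):

    #include <stdlib.h>
    #include <math.h>

    /* Box-Muller: return one sample from N(0,1), using rand()
       as the underlying uniform generator. */
    double Gaussian(void)
    {
        const double PI = 3.14159265358979;
        double u1, u2;
        do {
            u1 = (double)rand() / RAND_MAX;  /* uniform in [0,1] */
        } while (u1 == 0.0);                 /* avoid log(0) */
        u2 = (double)rand() / RAND_MAX;
        return sqrt(-2.0 * log(u1)) * cos(2.0 * PI * u2);
    }

Then mu + Gaussian() * sigma gives a sample from N(mu, sigma^2), as used on the next slide.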
  • 39. Example: Gaussian Generator (1/2)
– Pseudo code, assuming Gaussian() is a pseudo function that generates standard Gaussian numbers:
    double x[10000];
    for (i=0; i<10000; i++) x[i] = Gaussian();
    for (i=0; i<10000; i++) x[i] = mu + Gaussian() * sigma;
  • 40. Example: Gaussian Generator (2/2)
– Matlab:
    x=randn(1,10000);
    h=hist(x,20);
    bar(h);
– Java:
    Random r = new Random();
    double[] x = new double[10000];
    for (i=0; i<10000; i++) x[i] = r.nextGaussian();
[Figure: bell-shaped histogram of the 10000 samples.]
  • 41. Gaussian Mixture Generator (1/2)
– Random variable X with a Gaussian: P(X) = N(X; μ, σ).
– Random variable Y with a Gaussian mixture: $P(Y) = \sum_m \pi_m N(Y; \mu_m, \sigma_m)$.
  • 42. Gaussian Mixture Generator (2/2)
– Generate N samples of X:
    for (i=0; i<N; i++) x[i] = (Gaussian() * sigma) + mu;
– Generate N samples of Y from a mixture of M Gaussians, where each Gaussian m has weight pi_m and parameters mu_m, sigma_m:
    for (m=0; m<M; m++)
        for (i=0; i<N*pi_m; i++)
            y[m][i] = (Gaussian() * sigma_m) + mu_m;
  • 43. Example: Gaussian Mixture Generator
    N=10000; pi1=0.8; pi2=0.2;
    mu1=0; mu2=15; sigma1=3; sigma2=5;
    x1 = mu1 + randn(1,N*pi1) * sigma1;
    x2 = mu2 + randn(1,N*pi2) * sigma2;
    x = [x1, x2];
    h=hist(x,50);
    bar(h);
[Figure: bimodal histogram with a tall mode near 0 and a smaller mode near 15.]
  • 44. 2.2 Multivariate
– For random variables X1, …, Xn (Boolean, discrete, continuous, hybrid) where we know P(X1, …, Xn) is uniform, Gaussian, Gaussian mixture, or any distribution: generate a sample (X1, …, Xn) according to P(X1, …, Xn); independent or dependent.
  • 45. Multivariate Boolean Uniform Generator
– Boolean random variables X1, …, Xn:
    int X[n]; // a sample
    for (i=0; i<n; i++) X[i] = rand() % 2;
  • 46. Multivariate Discrete Uniform Generator
– Discrete random variables X1, …, Xn, each with d discrete values in [0, d-1]; each Xi is uniformly distributed and X1, …, Xn must be independent:
    int X[n]; // a sample
    for (i=0; i<n; i++) X[i] = rand() % d;
  • 47. Multivariate Gaussian Generator - Independent (1/2)
– For n random variables X = (X1, …, Xn), Gaussian N(X; μ, Σ) with mean vector μ and covariance matrix Σ = [σij].
– X1, …, Xn independent means σij = 0 for i ≠ j, so generating a sample of X = generating each Xi independently.
  • 48. Multivariate Gaussian Generator - Independent (2/2)
– Generate a sample of X = (X1, …, Xn) with μi = 0, σii = 1, σij = 0 for i ≠ j:
    double X[n]; // a sample
    for (i=0; i<n; i++) X[i] = Gaussian();
– Generate a sample with μi ≠ 0, σii ≠ 1, σij = 0 for i ≠ j (σii is a variance, so scale by its square root):
    double X[n]; // a sample
    for (i=0; i<n; i++) X[i] = mu[i] + Gaussian() * sqrt(sigma[i][i]);
  • 49. Example – Matlab (1/2)
– Plot the density of a 2-D Gaussian with $\mu_X = (0,0)^T$, $\Sigma_X = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$:
    mx=[0 0]';
    Cx=[1 0; 0 1];
    x1=-3:0.1:3;
    x2=-3:0.1:3;
    for i=1:length(x1),
      for j=1:length(x2),
        f(i,j)=(1/(2*pi*det(Cx)^(1/2)))*exp((-1/2)*([x1(i) x2(j)]-mx')*inv(Cx)*([x1(i);x2(j)]-mx));
      end
    end
    mesh(x1,x2,f)
    pause;
    contour(x1,x2,f)
    pause
  • 50. Example – Matlab (2/2)
– Randomly generate 1000 samples for $\mu_X = (0,0)^T$, $\Sigma_X = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$:
    y1=randn(1,1000);
    y2=randn(1,1000);
    plot(y1,y2,'.');
  • 51. Multivariate Gaussian Generator - Dependent (1/4)
– For n random variables X = (X1, …, Xn), Gaussian N(X; μ, Σ) with mean vector μ and covariance matrix Σ = [σij]; X1, …, Xn dependent means σij ≠ 0.
– Σ is a positive definite matrix: symmetric, with all eigenvalues (pivots) > 0.
– For a general matrix A: A = LDU (L: lower triangular, U: upper triangular, D: diagonal matrix of pivots). For a symmetric matrix S: S = LDL^T. For a positive definite matrix: $\Sigma = LDL^T = (L\sqrt{D})(L\sqrt{D})^T = PP^T$. This is called the Cholesky decomposition.
  • 52. Multivariate Gaussian Generator - Dependent (2/4)
– To generate a sample of X with μ, Σ: perform the Cholesky decomposition of Σ (pivot decomposition for a positive definite matrix), Σ = PP^T; generate independent Gaussian Y = (Y1, …, Yn) with μi = 0, σi = 1; then X = PY + μ.
  • 53. Multivariate Gaussian Generator - Dependent (3/4)
– Pseudo code to generate a sample of X with μ, Σ:
    Matrix Sigma; Vector mu;
    Vector X(n), Y(n); // a sample
    Matrix P = chol(Sigma)'; // Cholesky decomposition; note Matlab's chol
                             // returns the upper-triangular R with R'R = Sigma,
                             // so transpose to get the lower factor P, PP' = Sigma
    for (i=0; i<n; i++) Y(i) = Gaussian();
    X = P*Y + mu;
  • 54. Multivariate Gaussian Generator - Dependent (4/4)
– Proof: for X = (X1, …, Xn) with μ, Σ, generate n independent, zero-mean, unit-variance normal random variables $Y = (Y_1, \ldots, Y_n)^T$, with $\mu_Y = (0, \ldots, 0)^T$ and $\Sigma_Y = I$.
– Take X = PY + μ, where Σ = PP^T. Then the covariance matrix of X is
$E[(X-\mu)(X-\mu)^T] = E[(PY)(PY)^T] = E[P Y Y^T P^T] = P\,E[YY^T]\,P^T = PP^T = \Sigma$
  • 55. Example – Matlab (1/4)
– Assume $\mu_X = (0,0)^T$, $\Sigma_X = \begin{bmatrix} 1 & 1/2 \\ 1/2 & 1 \end{bmatrix}$, so $P = \begin{bmatrix} 1 & 0 \\ 1/2 & \sqrt{3}/2 \end{bmatrix}$.
– Matlab:
    mx=[0 0]';
    Cx=[1 1/2; 1/2 1];
    P=chol(Cx)'; % transpose: chol returns the upper-triangular factor
  • 56. Example – Matlab (2/4)
– Randomly generate 1000 samples for that μ_X, Σ_X:
    mx=zeros(2,1000);
    y1=randn(1,1000);
    y2=randn(1,1000);
    y=[y1;y2];
    P=[1, 0; 1/2, sqrt(3)/2];
    x=P*y+mx;
    x1=x(1,:); x2=x(2,:);
    plot(x1,x2,'.');
    r=corrcoef(x1',x2');
  • 57. Example – Matlab (3/4)
– Assume $\mu_X = (5,5)^T$, $\Sigma_X = \begin{bmatrix} 1 & 0.9 \\ 0.9 & 1 \end{bmatrix}$, so $P = \begin{bmatrix} 1 & 0 \\ 9/10 & \sqrt{19}/10 \end{bmatrix}$.
– Matlab:
    mx=[5 5]';
    Cx=[1 9/10; 9/10 1];
    P=chol(Cx)'; % transpose: chol returns the upper-triangular factor
  • 58. Example – Matlab (4/4)
– Randomly generate 1000 samples for that μ_X, Σ_X:
    mx=5*ones(2,1000);
    y1=randn(1,1000);
    y2=randn(1,1000);
    y=[y1;y2];
    P=[1, 0; 9/10, sqrt(19)/10];
    x=P*y+mx;
    x1=x(1,:); x2=x(2,:);
    plot(x1,x2,'.');
    r=corrcoef(x1',x2');
  • 59. Multivariate Gaussian Mixture Generator
– Generate N samples of X from a mixture of M Gaussians (Matlab-like pseudo code):
    for (m=0; m<M; m++) {
        Matrix P = chol(Sigma_m)'; // Cholesky decomposition (lower factor)
        for (i=0; i<N*pi_m; i++) {
            // generate n independent normally distributed R.V.s (mu=0, sigma=1)
            y = randn(n, 1);
            // transform y into x
            x = P*y + mu_m;
        }
    }
  • 60. Example – Matlab (1/4)
– Combine the previous two Gaussians with π1 = 0.5, π2 = 0.5: $\mu_1 = (0,0)^T$, $\Sigma_1 = \begin{bmatrix} 1 & 1/2 \\ 1/2 & 1 \end{bmatrix}$; $\mu_2 = (5,5)^T$, $\Sigma_2 = \begin{bmatrix} 1 & 0.9 \\ 0.9 & 1 \end{bmatrix}$.
[Figure: scatter plot of the two equally weighted clusters, one centered at (0,0), one at (5,5).]
  • 61. Example – Matlab (2/4)
    pi1=0.5; pi2=0.5; N=2000;
    mx1=zeros(2,pi1*N);
    Cx1=[1 1/2; 1/2 1];
    P1=chol(Cx1)'; %P1=[1, 0; 1/2, sqrt(3)/2]
    y1_1=randn(1,pi1*N); y1_2=randn(1,pi1*N);
    y1=[y1_1;y1_2];
    x1=P1*y1+mx1;
    x1_1=x1(1,:); x1_2=x1(2,:);
    mx2=5*ones(2,pi2*N);
    Cx2=[1 9/10; 9/10 1];
    P2=chol(Cx2)'; %P2=[1, 0; 9/10, sqrt(19)/10]
    y2_1=randn(1,pi2*N); y2_2=randn(1,pi2*N);
    y2=[y2_1;y2_2];
    x2=P2*y2+mx2;
    x2_1=x2(1,:); x2_2=x2(2,:);
    z1=[x1_1,x2_1]; z2=[x1_2,x2_2];
    plot(z1,z2,'.');
  • 62. Example – Matlab (3/4)
– Combine the previous two Gaussians with π1 = 0.2, π2 = 0.8 (same μ1, Σ1, μ2, Σ2 as above).
[Figure: scatter plot; the cluster at (5,5) now dominates.]
  • 63. Example – Matlab (4/4)
    pi1=0.2; pi2=0.8; N=2000;
    (the rest of the code is identical to p. 61, just with these new mixture weights)
  • 64. Exercise
– Write a program to randomly generate 1000 samples of a 3-dimensional Gaussian with μ = (5, 10, -3), Σ = (2,1,3; 4,2,2; 3,1,2). (Check first that Σ is a valid covariance matrix, i.e. symmetric positive definite, before applying chol.)
  • 65. Any Distribution
– For random variables X1, …, Xn (Boolean, discrete, continuous, hybrid) where P(X1, …, Xn) has no closed-form formula:
– Independent: P(X1, …, Xn) = P(X1) … P(Xn); generate each Xi by P(Xi).
– Dependent: P(X1, …, Xn) = Π P(Xi | Parent(Xi)); generate each Xi by P(Xi | Parent(Xi)).
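When P(Xi) or P(Xi | Parent(Xi)) is given only as a table of probabilities, a common way to draw one value is roulette-wheel (inverse-CDF) sampling: walk the cumulative sum until it passes a uniform draw. A minimal C sketch (the function name sample_discrete is ours, for illustration):

    #include <stdlib.h>

    /* Draw one value in {0, ..., k-1} from a probability table p[0..k-1]. */
    int sample_discrete(const double p[], int k)
    {
        double u = (double)rand() / RAND_MAX;  /* uniform in [0,1] */
        double cum = 0.0;
        for (int i = 0; i < k; i++) {
            cum += p[i];
            if (u <= cum) return i;
        }
        return k - 1;  /* guard against floating-point round-off */
    }

For a dependent variable, the caller first selects the CPT row matching the already-sampled parent values, which is exactly what the two examples below do by hand.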
  • 66. Two Boolean R.V.s - Independent
– X1, X2 have distributions P(X1) = <0.67, 0.33> and P(X2) = <0.75, 0.25>:
    int X1, X2;
    for (i=0; i<1000; i++) {
        if (rand() > RAND_MAX/3) X1 = 1; else X1 = 0;
        if (rand() > RAND_MAX/4) X2 = 1; else X2 = 0;
    }
[Figure: bar charts of P(X1), with mass 0.67, and P(X2), with mass 0.75.]
  • 67. Two Boolean R.V.s - Dependent
– X1, X2 have distributions P(X1) = <0.67, 0.33>, P(X2|X1=T) = <0.75, 0.25>, P(X2|X1=F) = <0.8, 0.2>.
– Generate a sample (x1, x2):
    if (rand() > RAND_MAX/3) x1 = 1; else x1 = 0;
    if (x1==1) {
        if (rand() > RAND_MAX/4) x2 = 1; else x2 = 0;
    } else { // x1==0
        if (rand() > RAND_MAX/5) x2 = 1; else x2 = 0;
    }
  • 68. Markov Chain
– Markov chain: n random variables X1 → … → Xk → … → Xn.
  • 69. Bayesian Network
– Example: 5 random variables Burglary, Earthquake → Alarm → JohnCalls, MaryCalls.
  • 70. 3. Stochastic Simulation
– Also called Monte Carlo methods or sampling methods.
– Sub-sections: 3.1 Direct sampling; 3.2 Rejection sampling; 3.3 Likelihood weighting.
  • 71. 3.1 Direct Sampling
– Generate N samples randomly. For the inference P(X|E): P(X|E) = P(X∧E) / P(E).
– Get N(E) and N(X∧E) from the N samples, where N(E) = no. of samples with E and N(X∧E) = no. of samples with X and E. Then P(E) = N(E)/N and P(X∧E) = N(X∧E)/N, so P(X|E) = N(X∧E) / N(E).
  • 72. Example (1/4)
– For the sprinkler network: estimate P(w|r) by direct sampling; 4 random variables; a sample = (c, s, r, w).
  • 73. Example (2/4)
– Generate 1000 samples:
    Cloudy  Sprinkler  Rain  WetGrass
      T        T        T       F
      F        T        T       F
      F        F        T       T
      T        T        T       F
      T        T        T       F
     ...      ...      ...     ...
      F        T        T       F
  • 74. Example (3/4)
– P(r | ¬w) = P(r, ¬w) / P(¬w). Let N¬w = no. of samples with WetGrass=False, and Nr∧¬w = no. of samples with Rain=True and WetGrass=False; the estimate is Nr∧¬w / N¬w.
[Table: the same 1000 samples (C, S, R, W).]
  • 75. Example (4/4)
– P(R | ¬w) = P(R, ¬w) / P(¬w) = < P(r ∧ ¬w)/P(¬w), P(¬r ∧ ¬w)/P(¬w) >.
[Table: the same 1000 samples (C, S, R, W).]
  • 76. How to Generate a Sample for the Bayesian Network? (1/3)
– The sprinkler Bayesian network: a sample is an atomic event (cloudy, sprinkler, rain, wetgrass) = (T, F, T, T).
– Assume a sampling order: [Cloudy, Sprinkler, Rain, WetGrass].
  • 77. How to Generate a Sample for the Bayesian Network? (2/3)
– Incorrect implementation (ignores the CPTs and samples every variable as a fair coin):
    int C, S, R, W;
    for (i=0; i<1000; i++) {
        if (rand() > RAND_MAX/2) C = T; else C = F;
        if (rand() > RAND_MAX/2) S = T; else S = F;
        if (rand() > RAND_MAX/2) R = T; else R = F;
        if (rand() > RAND_MAX/2) W = T; else W = F;
    }
  • 78. How to Generate a Sample for the Bayesian Network? (3/3)
– Correct approach: sample each variable from its CPT given the already-sampled parents:
    int C, S, R, W;
    for (i=0; i<1000; i++) {
        if (rand() > RAND_MAX/2) C = T; else C = F;
        if (C==T) {
            if (rand() > RAND_MAX*0.9) S = T; else S = F;  // P(s|c)=0.1
        } else { // C==F
            if (rand() > RAND_MAX/2) S = T; else S = F;    // P(s|~c)=0.5
        }
        ...
    }
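Filling in the elided part, a self-contained C sketch of prior sampling for the whole sprinkler network, plus the direct-sampling estimate of P(r|¬w) from pp. 73-75. CPT values not stated on the slides (P(r|¬c)=0.2, P(w|s,¬r)=0.9, P(w|¬s,¬r)=0.0) are assumptions taken from the standard sprinkler network:

    #include <stdlib.h>

    /* One uniform draw in [0,1]. */
    static double U(void) { return (double)rand() / RAND_MAX; }

    /* Sample a Boolean that is true with probability p. */
    static int flip(double p) { return U() < p; }

    /* Prior-Sample for the sprinkler network, topological order C,S,R,W. */
    static void prior_sample(int *C, int *S, int *R, int *W)
    {
        *C = flip(0.5);
        *S = flip(*C ? 0.10 : 0.50);         /* P(s|c)=0.1,  P(s|~c)=0.5 */
        *R = flip(*C ? 0.80 : 0.20);         /* P(r|c)=0.8,  P(r|~c)=0.2 */
        if (*S && *R)      *W = flip(0.99);  /* P(w|s,r) = 0.99          */
        else if (*S || *R) *W = flip(0.90);  /* P(w|s,~r) = P(w|~s,r) = 0.9 */
        else               *W = flip(0.00);  /* P(w|~s,~r) = 0.0         */
    }

    /* Direct-sampling estimate of P(r | ~w) = N(r,~w) / N(~w). */
    double estimate_r_given_not_w(int N)
    {
        int C, S, R, W, n_nw = 0, n_r_nw = 0;
        for (int i = 0; i < N; i++) {
            prior_sample(&C, &S, &R, &W);
            if (!W) { n_nw++; if (R) n_r_nw++; }
        }
        return (double)n_r_nw / n_nw;
    }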
  • 79. An Example Generating One Sample (1/8)
– The sampling algorithm:
– 1. Sample from P(Cloudy) = <0.5, 0.5>; suppose it returns true.
– 2. Sample from P(Sprinkler|Cloudy=true) = <0.1, 0.9>; suppose it returns false.
– 3. Sample from P(Rain|Cloudy=true) = <0.8, 0.2>; suppose it returns true.
– 4. Sample from P(WetGrass|Sprinkler=false, Rain=true) = <0.9, 0.1>; suppose it returns true.
  • 80. An Example Generating One Sample (2/8)
– Start with an empty sample row (C, S, R, W).
  • 81. An Example Generating One Sample (3/8)
– Random sampling of Cloudy returns Cloudy=true; sample so far: (c).
  • 82. An Example Generating One Sample (4/8)
– Given Cloudy=true, next randomly sample 1. Sprinkler and 2. Rain.
  • 83. An Example Generating One Sample (5/8)
– Random sampling of Sprinkler given Cloudy=true returns Sprinkler=false; sample so far: (c, ¬s).
  • 84. An Example Generating One Sample (6/8)
– Random sampling of Rain given Cloudy=true returns Rain=true; sample so far: (c, ¬s, r).
  • 85. An Example Generating One Sample (7/8)
– Random sampling of WetGrass given Rain=true, Sprinkler=false.
  • 86. An Example Generating One Sample (8/8)
– Returns WetGrass=true; the completed sample: (c, ¬s, r, w).
  • 87. The Algorithm (1/2)
– To generate one sample. [Figure: Prior-Sample pseudocode, not captured in this transcript.]
  • 88. The Algorithm (2/2)
– In the previous example we got a sample [true, false, true, true] of the Bayesian network using Prior-Sample.
– The sampling of a Bayesian network: repeat the sampling N times to get N samples; we can then use the N samples to compute any query probability in the Bayesian network.
  • 89. How It Works (1/2)
– Why can any probability be answered from the sampling? The N samples actually form a full joint distribution table (FJD):
    Samples (C, S, R, W): (T,T,T,F), (F,T,T,F), (F,F,T,T), (T,T,F,F), …, (F,T,T,F)
    FJD: P(T,T,T,F) = 0.02, P(F,T,T,F) = 0.13, P(F,F,T,T) = 0.04, P(T,T,F,F) = 0.15, …
  • 90. Why It Works (2/2)
– A sample is an atomic event (x1, …, xn), and P(x1, …, xn) ≈ N(x1, …, xn) / N.
– Therefore an FJD is generated from the N samples. Note: N < 2^n.
  • 91. Exercise: Direct Sampling
– Query: what is the probability that a student studied, given that they pass the exam?
– Network: smart → prepared ← study; smart → pass ← prepared, fair.
– p(smart) = .8, p(study) = .6, p(fair) = .9
– p(prep | smart, study):
                 smart   ¬smart
      study       .9       .7
      ¬study      .5       .1
– p(pass | smart, prep, fair):
                 smart: prep  ¬prep    ¬smart: prep  ¬prep
      fair              .9     .7              .7     .2
      ¬fair             .1     .1              .1     .1
  • 92. Problems of Direct Sampling
– It needs to generate very many samples to obtain an approximate FJD.
– For a query of conditional probability P(X|e): can we just approximate the conditional probability? Yes, the following two algorithms do this.
  • 93. 3.2 Rejection Sampling
– $\hat{P}(X \mid e)$ is estimated from the samples agreeing with e. [Figure: Rejection-Sampling pseudocode, not captured in this transcript.]
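A minimal sketch of rejection sampling for the p. 94 query P(Rain | Sprinkler=true), reusing the hypothetical prior_sample() helper from the sketch after p. 78: samples disagreeing with the evidence are simply thrown away.

    /* Rejection sampling: keep only samples consistent with S = true. */
    double reject_rain_given_s(int N)
    {
        int C, S, R, W, kept = 0, rain = 0;
        for (int i = 0; i < N; i++) {
            prior_sample(&C, &S, &R, &W);
            if (!S) continue;            /* reject: disagrees with evidence */
            kept++;
            if (R) rain++;
        }
        return (double)rain / kept;      /* ~ P(rain | s); needs kept > 0 */
    }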
  • 94. An Example
– Estimate P(Rain|Sprinkler=true) using 100 samples: 27 samples have Sprinkler=true; of these, 8 have Rain=true and 19 have Rain=false.
– P(Rain|Sprinkler=true) = Normalize(<8, 19>) = <0.296, 0.704>.
– Similar to a basic real-world empirical estimation procedure.
  • 95. Analysis of Rejection Sampling
– $\hat{P}(X \mid e) = \frac{N(X, e)}{N(e)} \approx \frac{P(X, e)}{P(e)} = P(X \mid e)$
– Hence rejection sampling returns consistent posterior estimates.
– Problem: expensive if P(e) is small, and P(e) drops off exponentially with the number of evidence variables!
  • 96. 3.3 Likelihood Weighting
– Avoids the inefficiency of rejection sampling by generating only events consistent with the evidence variables e.
– Idea: fix the evidence variables and randomly sample only the hidden variables to generate a sample event; weight each sample event by the likelihood it accords the evidence. Events have different weights.
  • 97. An Example (1/9)
– Query P(Rain|sprinkler, wetgrass).
  • 98. An Example (2/9)
– 1. Set the weight w = 1.0.
– 2. Sample from P(Cloudy) = <0.5, 0.5>; suppose it returns true.
– 3. The evidence Sprinkler=true, so set w = w × P(sprinkler|cloudy) = 1 × 0.1 = 0.1.
– 4. Sample from P(Rain|cloudy) = <0.8, 0.2>; suppose it returns true.
– 5. The evidence WetGrass=true, so set w = w × P(wetgrass|sprinkler, rain) = 0.1 × 0.99 = 0.099.
– Result: a sample event (true, true, true, true) with weight 0.099.
  • 99.-105. An Example (3/9)-(9/9)
– [Figures: the sprinkler network at each step of the p. 98 walk-through; the weight evolves as w = 1.0, then w = 1.0 × 0.1, then w = 1.0 × 0.1 × 0.99 = 0.099.]
  • 106. The Algorithm (1/2)
– The example generates a sample event (true, true, true, true) for the query P(Rain|sprinkler, wetgrass).
– Repeat the sampling N times to get N sample events, each with a likelihood weight w. Let $w_1 = \sum_{rain=true} w$ and $w_2 = \sum_{rain=false} w$; then P(Rain|sprinkler, wetgrass) = < w1/(w1+w2), w2/(w1+w2) >.
  • 107. The Algorithm (2/2)
– [Figure: Likelihood-Weighting and Weighted-Sample pseudocode, not captured in this transcript.]
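A minimal C sketch of likelihood weighting for the running query P(Rain | sprinkler, wetgrass), matching the p. 98 walk-through. It reuses the hypothetical flip() helper and the CPT values from the earlier prior-sampling sketch (some of those values are our assumptions):

    /* Likelihood weighting: evidence S = W = true is clamped; only C and R
       are sampled, and each event is weighted by P(evidence | parents). */
    double lw_rain_given_s_w(int N)
    {
        double w_rain = 0.0, w_not = 0.0;
        for (int i = 0; i < N; i++) {
            double w = 1.0;
            int C = flip(0.5);              /* sample Cloudy from P(C)      */
            w *= C ? 0.10 : 0.50;           /* evidence S=true: P(s | C)    */
            int R = flip(C ? 0.80 : 0.20);  /* sample Rain from P(R | C)    */
            w *= R ? 0.99 : 0.90;           /* evidence W=true: P(w | s, R) */
            if (R) w_rain += w; else w_not += w;
        }
        return w_rain / (w_rain + w_not);   /* w1 / (w1 + w2), as on p. 106 */
    }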
  • 108. Exercise: Likelihood Weighting
– Same network and CPTs as the direct-sampling exercise on p. 91. Query: what is the probability that a student studied, given that they pass the exam?
  • 109. Analysis (1/3)
– Why does the algorithm work for P(X|E=e)? Let the sampling probability for Weighted-Sample be S_WS. The evidence variables E are fixed at e; all the other variables are Z = {X} ∪ Y. The algorithm samples each variable in Z given its parent values:
$S_{WS}(z, e) = \prod_{i=1}^{l} P(z_i \mid parents(Z_i))$
  • 110. Analysis (2/3)
– The likelihood weight w for a given sample (z, e) = (x, y, e) is
$w(z, e) = \prod_{i=1}^{m} P(e_i \mid parents(E_i))$
– The weighted probability of a sample (z, e) = (x, y, e) is
$S_{WS}(z, e)\, w(z, e) = \prod_{i=1}^{l} P(z_i \mid parents(Z_i)) \prod_{i=1}^{m} P(e_i \mid parents(E_i)) = P(x, y, e)$
since $P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid parents(X_i))$.
  • 111. Analysis (3/3)
$\hat{P}(x \mid e) = \alpha \sum_y N_{WS}(x, y, e)\, w(x, y, e) \approx \alpha' \sum_y S_{WS}(x, y, e)\, w(x, y, e) = \alpha' \sum_y P(x, y, e) = \alpha' P(x, e) = P(x \mid e)$
– So the algorithm works.
  • 112. Discussions
– Likelihood weighting is efficient because it uses all the samples generated.
– However, its performance degrades as the number of evidence variables increases: most samples will have very low weights, and the weighted estimate will be dominated by the tiny fraction of samples that accord more than an infinitesimal likelihood to the evidence.
  • 113. 4. Inference by MCMC
– Key idea: treat the sampling process as a Markov chain (the next sample depends on the previous one); it can approximate any posterior distribution.
– "State" of the network = current assignment to all variables. Generate the next state by sampling one variable given its Markov blanket; sample each variable in turn, keeping the evidence fixed.
  • 114. The Markov Chain
– With Sprinkler=true and WetGrass=true, there are four states. [Figure: the four (Cloudy, Rain) states and the transitions between them.]
  • 115. Markov Blanket Sampling
– The Markov blanket of Cloudy is {Sprinkler, Rain}; the Markov blanket of Rain is {Cloudy, Sprinkler, WetGrass}.
– The probability given the Markov blanket is calculated, up to normalization over x'_i, as
$P(x'_i \mid MB(X_i)) = \alpha\, P(x'_i \mid Parents(X_i)) \prod_{Z_j \in Children(X_i)} P(z_j \mid Parents(Z_j))$
  • 116. An Example (1/2)
– Estimate P(Rain|sprinkler, wetgrass): loop for N times, sampling Cloudy or Rain given its Markov blanket; count the number of times Rain=true and Rain=false in the samples.
  • 117. An Example (2/2)
– E.g., visit 100 states: 31 have Rain=true, 69 have Rain=false.
– P(Rain|sprinkler, wetgrass) = Normalize(<31, 69>) = <0.31, 0.69>.
  • 118. The Algorithm
– [Figure: MCMC-Ask (Gibbs sampling) pseudocode, not captured in this transcript.]
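A minimal C sketch of the MCMC loop for P(Rain | sprinkler, wetgrass): the state is (C, R) with S = W = true clamped, and each step resamples one variable from its Markov-blanket distribution as on p. 115. flip() and the CPT values are as in the earlier sketches, some of them our assumptions:

    /* Gibbs sampling: count how often Rain = true across visited states. */
    double gibbs_rain_given_s_w(int N)
    {
        int C = flip(0.5), R = flip(0.5);  /* arbitrary initial state */
        int rain = 0;
        for (int i = 0; i < N; i++) {
            /* Resample C from P(C | MB) proportional to P(C) P(s|C) P(R|C). */
            double pc  = 0.5 * 0.10 * (R ? 0.80 : 0.20);
            double pnc = 0.5 * 0.50 * (R ? 0.20 : 0.80);
            C = flip(pc / (pc + pnc));
            /* Resample R from P(R | MB) proportional to P(R|C) P(w|s,R). */
            double pr  = (C ? 0.80 : 0.20) * 0.99;
            double pnr = (C ? 0.20 : 0.80) * 0.90;
            R = flip(pr / (pr + pnr));
            if (R) rain++;
        }
        return (double)rain / N;  /* ~ P(rain | s, w), cf. <0.31, 0.69> on p. 117 */
    }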
  • 119. Why It Works
– Skipped; details on pp. 517-518 of the AIMA 2e textbook.
  • 120. Sub-Sections
– 4.1 Markov chain theory
– 4.2 Two MCMC sampling algorithms
  • 121. 4.1 Markov Chain Theory
– Suppose X1, X2, … take some set of values; w.l.o.g. these values are 1, 2, ….
– A Markov chain is a process that corresponds to the network X1 → X2 → X3 → … → Xn.
– To quantify the chain we need to specify the initial probability P(X1) and the transition probability P(Xt+1|Xt). A Markov chain has stationary transition probability: P(Xt+1|Xt) is the same for all times t.
  • 122. Irreducible Chains
– A state j is accessible from state i if there is an n such that P(Xn = j | X1 = i) > 0: there is a positive probability of reaching j from i after some number of steps.
– A chain is irreducible if every state is accessible from every state.
  • 123. Ergodic Chains
– A state i is positively recurrent if there is a finite expected time to get back to state i after being in state i. If X has a finite number of states, it suffices that i is accessible from itself.
– A chain is ergodic if it is irreducible and every state is positively recurrent.
  • 124. (A)periodic Chains
– A state i is periodic if there is an integer d > 1 such that P(Xn = i | X1 = i) = 0 whenever n is not divisible by d. Intuition: state i may occur only every d steps.
– A chain is aperiodic if it contains no periodic state.
  • 125. Stationary Probabilities
– Thm: if a chain is ergodic and aperiodic, then the limit $\lim_{n \to \infty} P(X_n \mid X_1 = i)$ exists and does not depend on i.
– Moreover, let $P^*(X = j) = \lim_{n \to \infty} P(X_n = j \mid X_1 = i)$; then P*(X) is the unique probability satisfying
$P^*(X = j) = \sum_i P(X_{t+1} = j \mid X_t = i)\, P^*(X = i)$
  • 126. Stationary Probabilities
– The probability P*(X) is the stationary probability of the process: regardless of the starting point, the process will converge to this probability. The rate of convergence depends on properties of the transition probability.
  • 127. Sampling from the Stationary Probability
– This theory suggests how to sample from the stationary probability:
– Set X1 = i, for some random/arbitrary i.
– For t = 1, 2, …, n: sample a value xt+1 for Xt+1 from P(Xt+1|Xt = xt).
– Return xn. If n is large enough, this is a sample from P*(X).
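The recipe above in a minimal C sketch for a finite-state chain, where T[i][j] = P(X_{t+1}=j | X_t=i): after enough steps the returned state is approximately distributed as P*(X). All names here are ours, for illustration:

    #define K 4   /* number of states, e.g. the four states on p. 114 */

    /* Run the chain n steps from a random initial state; return x_n. */
    int chain_sample(double T[K][K], int n)
    {
        int x = rand() % K;                   /* arbitrary X1 = i */
        for (int t = 0; t < n; t++) {
            double u = (double)rand() / RAND_MAX, cum = 0.0;
            int next = K - 1;
            for (int j = 0; j < K; j++) {     /* inverse-CDF over row x */
                cum += T[x][j];
                if (u <= cum) { next = j; break; }
            }
            x = next;
        }
        return x;
    }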
  • 128. Designing Markov Chains
– How do we construct the right chain to sample from? Ensuring aperiodicity and irreducibility is usually easy; the problem is ensuring the desired stationary probability.
  • 129. Designing Markov Chains
– Key tool: if the transition probability satisfies
$\frac{P(X_{t+1} = j \mid X_t = i)}{P(X_{t+1} = i \mid X_t = j)} = \frac{Q(X = j)}{Q(X = i)} \quad \text{whenever } P(X_{t+1} = j \mid X_t = i) > 0$
then P*(X) = Q(X).
– This gives a local criterion for checking that the chain will have the right stationary distribution.
  • 130. MCMC Methods
– We can use these results to sample from P(X1, …, Xn|e). Idea: construct an ergodic & aperiodic Markov chain such that P*(X1, …, Xn) = P(X1, …, Xn|e), then simulate the chain n steps to get a sample.
  • 131. MCMC Methods
– Notes: the Markov chain variable Y takes as values assignments to all variables that are consistent with the evidence:
$V(Y) = \{ (x_1, \ldots, x_n) \in V(X_1) \times \cdots \times V(X_n) \mid (x_1, \ldots, x_n) \text{ satisfies } e \}$
– For simplicity, we will denote such a state using the vector of variables.
  • 132. 4.2 Two MCMC Sampling Algorithms
– Gibbs sampler; Metropolis-Hastings sampler.
  • 133. Gibbs Sampler
– One of the simplest MCMC methods. Each transition changes the state of one Xi; the transition probability is defined by P itself as a stochastic procedure:
– Input: a state x1, …, xn.
– Choose i at random (uniform probability).
– Sample x'i from P(Xi | x1, …, xi-1, xi+1, …, xn, e).
– Let x'j = xj for all j ≠ i; return x'1, …, x'n.
  • 134. Correctness of Gibbs Sampler
– How do we show correctness?
  • 135. Correctness of Gibbs Sampler
– By the chain rule, P(x1, …, xi-1, xi, xi+1, …, xn | e) = P(x1, …, xi-1, xi+1, …, xn | e) P(xi | x1, …, xi-1, xi+1, …, xn, e).
– Thus we get the transition ratio
$\frac{P(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n \mid e)}{P(x_1, \ldots, x_{i-1}, x'_i, x_{i+1}, \ldots, x_n \mid e)} = \frac{P(x_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n, e)}{P(x'_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n, e)}$
– Since we choose i from the same distribution at each stage, this procedure satisfies the ratio criterion.
  • 136. Gibbs Sampling for Bayesian Networks
– Why is the Gibbs sampler "easy" in BNs? Recall that the Markov blanket of a variable separates it from the other variables in the network: P(Xi | X1, …, Xi-1, Xi+1, …, Xn) = P(Xi | Mb_i).
– This property allows us to use local computations to perform the sampling in each transition.
  • 137. Gibbs Sampling in Bayesian Networks
– How do we evaluate P(Xi | x1, …, xi-1, xi+1, …, xn)? Let Y1, …, Yk be the children of Xi; by the definition of Mb_i, the parents of each Yj are in Mb_i ∪ {Xi}. It is easy to show that
$P(x_i \mid Mb_i) = \frac{P(x_i \mid Pa_i) \prod_j P(y_j \mid pa_{y_j})}{\sum_{x'_i} P(x'_i \mid Pa_i) \prod_j P(y_j \mid pa_{y_j})}$
  • 138. Metropolis-Hastings
– More general than Gibbs (Gibbs is a special case of M-H).
– Proposal distribution: an arbitrary q(x'|x) that is ergodic and aperiodic (e.g., uniform).
– The transition to x' happens with probability $\alpha(x' \mid x) = \min\left(1, \frac{P(x')\, q(x \mid x')}{P(x)\, q(x' \mid x)}\right)$.
– Useful when computing P(x) exactly is infeasible, e.g. when P(x) is known only up to a normalizing constant.
– q(x'|x) = 0 implies P(x') = 0 or q(x|x') = 0.
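One Metropolis-Hastings step as a minimal C sketch, with a uniform (hence symmetric) proposal so the q terms cancel in the acceptance ratio; p(x) may be unnormalized, which is exactly why M-H helps when P(x) itself is infeasible to compute. The names are ours, for illustration:

    /* One M-H transition on states {0, ..., K-1}; p returns an
       unnormalized target probability. */
    int mh_step(int x, int K, double (*p)(int))
    {
        int xp = rand() % K;            /* propose x' from uniform q(.|x)  */
        double a = p(xp) / p(x);        /* = P(x')q(x|x') / (P(x)q(x'|x))  */
        if (a >= 1.0 || (double)rand() / RAND_MAX < a)
            return xp;                  /* accept the move */
        return x;                       /* reject: chain stays at x */
    }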
  • 139. Sampling Strategy
– How do we collect the samples? Strategy I: run the chain M times, each for N steps, with each run starting from a different state point; return the last state in each run (M chains).
  • 140. Sampling Strategy
– Strategy II: run one chain for a long time; after some "burn-in" period, sample a point every fixed number of steps (M samples from one chain).