07 Approximate Inference in Bayesian Networks
Presentation Transcript

  • Bayesian Networks Unit 7 Approximate Inference in Bayesian Networks Wang, Yuan-Kai, 王元凱 ykwang@mails.fju.edu.tw http://www.ykwang.tw Department of Electrical Engineering, Fu Jen Univ. 輔仁大學電機工程系 2006~2011 Reference this document as: Wang, Yuan-Kai, "Approximate Inference in Bayesian Networks," Lecture Notes of Wang, Yuan-Kai, Fu Jen University, Taiwan, 2011. Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 2 Goal of This Unit • P(X|e) inference for Bayesian networks • Why approximate inference – Exact inference is too slow because of exponential complexity • Using approximate approaches – Sampling methods • Likelihood weighting sampling • Markov Chain Monte Carlo sampling – Loopy belief propagation – Variational method Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 3 Related Units • Background – Probabilistic graphical model – Exact inference in BN • Next units – Probabilistic inference over time Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 4 Self-Study References • Chapter 14, Artificial Intelligence: A Modern Approach, 2nd ed., by S. Russell & P. Norvig, Prentice Hall, 2003. • Inference in Bayesian networks, B. D'Ambrosio, AI Magazine, 1999. • Probabilistic inference in graphical models, M. I. Jordan & Y. Weiss. • An introduction to MCMC for machine learning, C. Andrieu, N. de Freitas, A. Doucet, & M. I. Jordan, Machine Learning, vol. 50, pp. 5-43, 2003. • Computational Statistics Handbook with Matlab, W. L. Martinez and A. R. Martinez, Chapman & Hall/CRC, 2002 – Chapter 3 Sampling Concepts – Chapter 4 Generating Random Variables Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 5 Structure of Related Lecture Notes – Representation: Unit 5 (BN), Unit 9 (Hybrid BN), Units 10~15 (Naïve Bayes, MRF, HMM, DBN, Kalman filter) – Learning: structure learning from data; parameter learning, Units 16~ (MLE, EM) – Inference (query): Unit 6 (exact inference), Unit 7 (approximate inference), Unit 8 (temporal inference) [Diagram: the burglary network B, E → A → J, M with CPTs P(B), P(E), P(A|B,E), P(J|A), P(M|A)] Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 6 Contents 1. Sampling .......................................................... 11 2. Random Number Generator .......................... 20 3. Stochastic Simulation ……............................. 70 4. Markov Chain Monte Carlo .......................... 113 5. Loopy Belief Propagation …………………. 145 6. Variational Methods ………………………... 146 7. Implementation …………………………….. 147 8. Summary ……………………………………. 148 9. References …………………………………… 151 Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 7 4 Steps of Inference • Step 1: Bayes' theorem P(X | E=e) = P(X, E=e) / P(E=e) = α P(X, E=e) • Step 2: Marginalization = α Σ_{h∈H} P(X, E=e, H=h) • Step 3: Conditional independence = α Σ_{h∈H} Π_{i=1~n} P(Xi | Pa(Xi)) • Step 4: Product-sum computation (Enumeration) – Exact inference – Approximate inference Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 8 Five Types of Queries in Inference • For a probabilistic graphical model G • Given a set of evidence E=e • Query the PGM with – P(e) : Likelihood query – arg max P(e) : Maximum likelihood query – P(X|e) : Posterior belief query – arg max_x P(X=x|e) : (Single query variable) Maximum a posteriori (MAP) query – arg max_{x1,…,xk} P(X1=x1, …, Xk=xk|e) : Most probable explanation (MPE) query Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 9 Approximate Inference vs. Exact Inference • Exact inference: P(X|E) = 0.71828 – Get the exact probability value – Using the inference steps derived from probabilistic formulas – Needs exponential time complexity • Approximate inference: P(X|E) ≈ 0.71 – Get an approximate probability value – Using sampling theory – Needs only polynomial time complexity, fast computation Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 10 Why Approximate Inference • Large treewidth – Large, highly connected graphical models – Treewidth may be large (>40) even in sparse networks • In many applications, approximations are sufficient – Example: P(X = x|e) = 0.3183098861 – Maybe P(X = x|e) ≈ 0.3 is a good enough approximation – e.g., we take action only if P(X=x|e) > 0.5 Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 11 1. Sampling • 1.1 What Is Sampling • 1.2 Sampling for Inference Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 12 Basic Idea of Sampling • Why sampling – Estimate some values by random number generation 1. Sampling = random number generation – Draw N samples from a known distribution P – Generate N random numbers from a known distribution S 2. Estimation – Compute an approximate probability P̂, which approximates the real posterior probability P(X|E) Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 13 1.1 What Is Sampling • A very simple example with a random variable: coin toss – Tossing the coin, we get head or tail – It is a Boolean R.V. • coin = head or tail – If it is an unbiased coin, head and tail have equal probability • A prior probability distribution P(Coin) = <0.5, 0.5> • Uniform distribution – Assume we have a coin but we do not know whether it is unbiased Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 14 Sampling of Coin Toss • Sampling in this example = flipping the coin many times N – e.g., N=1000 times – One flip ⇒ one sample – Ideally, 500 heads, 500 tails • P(head) = 500/1000 = 0.5, P(tail) = 500/1000 = 0.5 – Practically, 501 heads, 499 tails • P(head) = 501/1000 = 0.501, P(tail) = 499/1000 = 0.499 • After the sampling, – We can estimate the probability distribution – Check if the coin is biased Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 15 Sampling & Estimation (Math) • For a Boolean random variable X – P(X) is the prior distribution = <P(x), P(¬x)> – Use a sampling algorithm to generate N samples – Say N(x) is the number of samples where x is true, N(¬x) where x is false: N(x)/N = P̂(x), N(¬x)/N = P̂(¬x), and lim_{N→∞} N(x)/N = P(x), lim_{N→∞} N(¬x)/N = P(¬x) Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 16 1.2 Sampling for Inference • Given a Bayesian network G including (X1, …, Xn) – We get a joint probability distribution P(X1, …, Xn) = Π_i P(Xi|Pa(Xi)) • For a query P(X|E=e) – P(X|e) = α Σ_h Π_i P(Xi | Pa(Xi)) – It is hard to compute • Needs time exponential in the number of Xi – We will try to use sampling to compute it Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 17 Compute P(X|e) by Sampling • Sampling (explained in Sections 2, 3, 4) – Generate N samples from P(X1, …, Xn) = Π P(Xi|Pa(Xi)) • Estimation – Use the N samples to estimate P(X,e) ≈ N(X,e)/N – Use the N samples to estimate P(e) ≈ N(e)/N – Estimate P(X|e) by P(X,e) / P(e) Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 18 What Is a Sampling Algorithm • The algorithm to – Generate samples from a known probability distribution P – Estimate the approximate probability P̂ Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 19 Various Sampling Algorithms • Stochastic simulation (Section 3) – Direct sampling – Rejection sampling • Reject samples disagreeing with the evidence – Likelihood weighting • Use evidence to weight samples • Markov chain Monte Carlo (MCMC) (Section 4) – Sample from a stochastic process whose stationary distribution is the true posterior Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 20 2. Random Number Generator • Very important for sampling algorithm • Introduce basic concepts related to sampling of Bayesian networks • Subsections – 2.1 Univariate – 2.2 Multivariate Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 21 RNG In Programming Languages • Random number generator (RNG) – C/C++: rand() – Java: random() – Matlab: rand() • Why should we discuss it? – They generate random numbers with uniform distribution – How to generate • Gaussian, … • Multivariate, dependent random variables • Non-closed-form distribution? Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 22 Generate a Random Number (1/2) • Examples in C – int i = rand(); – Return 0 ~ RAND_MAX (32767) – It generates integers • Generate a random number between 1 and n (n<32767) – int i = 1 + ( rand() % n ) – (rand() % n) returns a number between 0 and n - 1 – Add 1 to make random number between 1 and n – It generates integers, but not real numbers Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 23 Generate a Random Number (2/2) • Ex: integer between 1 and 6 – 1 + ( rand() % 6 ) • Ex: real number between 0 and 1 – double x = (double)rand() / RAND_MAX; // the cast avoids integer division • Exercise – Real number between 10 and 20 (see the sketch below) Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
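One possible solution to the exercise above, as a minimal C sketch (the variable names are illustrative):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void) {
        srand((unsigned)time(NULL));           /* seed the generator once */
        double u = (double)rand() / RAND_MAX;  /* real in [0, 1]; cast avoids integer division */
        double x = 10.0 + u * (20.0 - 10.0);   /* scale and shift into [10, 20] */
        printf("u = %f, x = %f\n", u, x);
        return 0;
    }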
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 24 Generate Many Random Numbers Repeatedly • Using a loop for repeated generation – for (int i=0; i<1000; i++) { rand(); } – int i, j[1000]; for (i=0; i<1000; i++) { j[i] = 1 + rand() % 6; } • rand() generates numbers uniformly (uniform distribution) Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 25 Why Generate Random Numbers • Simulate random behavior • Make random decision • Estimate some values Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 26 Random Behavior/Decision (1/2) • Flip a coin for decision (Boolean) – Fair: each face has equal probability – int coin_face; if (rand() > RAND_MAX/2) coin_face = 1; else coin_face = 0; – int coin_face; coin_face = rand() % 2; Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 27 Random Behavior/Decision (2/2) • Random decision among multiple choices – Discrete random variable (uniform distribution) • Ex: roll a die – Fair: each face has equal probability • int die_face; // random variable die_face = 1 + rand() % 6; // faces 1~6, as on p. 22 Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 28 Estimation • If we can simulate a random behavior • We can estimate some values – First, we repeat the random behavior – Then we estimate the value Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 29 Example: The Coin Toss • Flip the coin 1000 times to estimate the fairness of the coin – int coin_face; // random variable int frequency[2] = {0, 0}; for (i=0; i<1000; i++) { coin_face = rand() % 2; frequency[coin_face]++; } [Bar chart: frequency of coin face 0 vs. 1, roughly uniform] Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 30 Example: Area of Circle (Estimation) • double x, y; // two independent random variables int i, N = 1000, NCircle = 0; double Area; for (i=0; i<N; i++) { x = (double)rand() / RAND_MAX; y = (double)rand() / RAND_MAX; if ( (x*x + y*y) <= 1 ) NCircle = NCircle + 1; // we call (x, y) a sample } Area = 4.0 * NCircle / N; // estimates the area of the unit circle, π Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 31 Multiple Dependent Random Variables • Markov chain: n random variables X1 → … → Xk → … → Xn • Bayesian network: 5 random variables [Network: Burglary, Earthquake → Alarm → JohnCalls, MaryCalls] – What is a sample? – The variables are dependent Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 32 Sampling • Sampling randomly generates a sample – For a single random variable X (univariate) or a set of random variables X1, …, Xn (multivariate) • Boolean, discrete, or continuous • Multivariate: independent or dependent – According to a probability distribution P(X) • Discrete X: histogram • Continuous X: uniform, Gaussian, or any distribution (e.g., Gaussian mixture models) Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 33 Sub-Sections for Generating a Sample • 2.1 Univariate – Uniform, Gaussian, Gaussian mixture • 2.2 Multivariate – Uniform – Gaussian • Independent, dependent – Any distribution • Gaussian mixture – Independent, dependent • Bayesian network Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 34 2.1 Univariate • For a random variable X – Boolean, discrete, continuous, hybrid • We know P(X) is – Uniform, Gaussian, Gaussian mixture • Generate a sample X according to P(X) Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 35 Uniform Generator • Every programming language provides a rand()/random() function to generate a uniformly distributed number – Integer number within [0, MAX) • Sampling a Boolean uniform number – rand() % 2 • Sampling a discrete uniform number within [0, d) – rand() % d • Sampling a continuous uniform number – Within [0, 1): (double)rand() / MAX – Within [a, b): a + ((double)rand() / MAX) * (b - a) Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 36 Example: Uniform Generator • x=rand(1,10000); • h=hist(x,20); • bar(h); [Histogram: 20 bins of roughly equal height, ~500 each] Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 37 Gaussian Generator (1/2) • Gaussian samples can be obtained from the uniform distribution • There are functions in C/Java/Matlab to randomly generate a univariate Gaussian real number with (μ, σ) = (0, 1) – C: Numerical Recipes in C – Java: Random.nextGaussian() – Matlab: randn() • Suppose it is called Gaussian() Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 38 Gaussian Generator (2/2) • Sampling a continuous Gaussian number with (μ, σ) – (Gaussian() * σ) + μ • Sampling a discrete Gaussian number with (μ, σ)? Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
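The slides assume a standard-normal primitive Gaussian(); one common way to build it on top of rand() is the Box-Muller transform. A minimal sketch (the implementation details here are an assumption, not from the slides):

    #include <math.h>
    #include <stdlib.h>

    /* Standard normal sample (mu=0, sigma=1) via the Box-Muller transform,
       built from two independent uniforms in (0, 1]. */
    double Gaussian(void) {
        const double PI = 3.14159265358979323846;
        double u1 = (rand() + 1.0) / (RAND_MAX + 1.0);  /* in (0, 1], avoids log(0) */
        double u2 = (rand() + 1.0) / (RAND_MAX + 1.0);
        return sqrt(-2.0 * log(u1)) * cos(2.0 * PI * u2);
    }

    /* Gaussian with arbitrary (mu, sigma), as on this slide */
    double GaussianMuSigma(double mu, double sigma) {
        return mu + Gaussian() * sigma;
    }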
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 39 Example: Gaussian Generator (1/2) • Pseudo code – Assume Gaussian() is a pseudo function that generates standard Gaussian numbers – double x[10000]; for (i=0; i<10000; i++) x[i] = Gaussian(); – for (i=0; i<10000; i++) x[i] = μ + Gaussian() * σ; Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 40 Example: Gaussian Generator (2/2) • Matlab – x=randn(1,10000); h=hist(x,20); bar(h); [Histogram: bell-shaped curve] • Java – Random r = new Random(); double x[] = new double[10000]; for (int i=0; i<10000; i++) x[i] = r.nextGaussian(); Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 41 Gaussian Mixture Generator (1/2) • Random variable X with Gaussian – P(X) = N(X; μ, σ) • Random variable Y with Gaussian mixture – P(Y) = Σ_m π_m N(Y; μ_m, σ_m) Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 42 Gaussian Mixture Generator (2/2) • Generate N samples of X – for (i=0; i<N; i++) x[i] = (Gaussian() * σ) + μ; • Generate N samples of Y from a mixture of M Gaussians – Each Gaussian m has weight π_m and parameters μ_m, σ_m – for (m=0; m<M; m++) for (i=0; i<N*π_m; i++) y[m][i] = (Gaussian() * σ_m) + μ_m; Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 43 Example: Gaussian Mixture Generator • N=10000; pi1=0.8; pi2=0.2; • mu1=0; mu2=15; sigma1=3; sigma2=5; • x1 = mu1 + randn(1,N*pi1) * sigma1; • x2 = mu2 + randn(1,N*pi2) * sigma2; • x = [x1, x2]; • h=hist(x,50); • bar(h); [Histogram: a tall mode near 0 and a smaller, wider mode near 15] (a per-draw C alternative is sketched below) Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
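The slides generate N·π_m samples per component in one block; an equivalent per-draw scheme picks a component with probability π_m for each sample and then draws from that Gaussian. A hedged C sketch (Gaussian() as in the Box-Muller sketch above; all names are illustrative):

    #include <stdlib.h>

    double Gaussian(void);  /* standard normal primitive, e.g. Box-Muller (above) */

    /* One draw from a univariate mixture of M Gaussians:
       choose component m with probability pi[m], then sample N(mu[m], sigma[m]). */
    double SampleMixture(int M, const double pi[],
                         const double mu[], const double sigma[]) {
        double u = (double)rand() / RAND_MAX;  /* uniform in [0, 1] */
        double cum = 0.0;
        int m;
        for (m = 0; m < M - 1; m++) {          /* inverse CDF over component weights */
            cum += pi[m];
            if (u < cum) break;
        }
        return mu[m] + Gaussian() * sigma[m];
    }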
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 44 2.2 Multivariate • For random variables X1,… ,Xn – Boolean, discrete, continuous, hybrid • We know P(X1,… ,Xn) is – Uniform, Gaussian, Gaussian mixture, any distribution • Generate a sample (X1,… ,Xn) according to P(X1,… ,Xn) – Independent – Dependent Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 45 Multivariate Boolean Uniform Generator • Boolean random variables X1,… ,Xn • int X[n]; // A sample for (i=0; i<n; i++) X[i] = rand() % 2; Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 46 Multivariate Discrete Uniform Generator • Discrete random variables X1,…, Xn – Each with d discrete values: [0, d-1] – Each Xi is uniform distributed – X1,…, Xn must be independent • int X[n]; // A sample for (i=0; i<n; i++) X[i] = rand() % d; Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 47 Multivariate Gaussian Generator - Independent (1/2) • For n random variables X=(X1,…,Xn) – Gaussian: N(X; μ, Σ) • Mean vector: μ • Covariance matrix: Σ=[σij] • X1,…,Xn are independent – σij = 0 for i≠j • Generate a sample of X ⇒ generate each Xi independently Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 48 Multivariate Gaussian Generator - Independent (2/2) • Generate a sample of X=(X1,…,Xn) with μi=0, σii=1, σij=0 for i≠j – double X[n]; // a sample for (i=0; i<n; i++) X[i] = Gaussian(); • Generate a sample of X=(X1,…,Xn) with μi≠0, σii≠1, σij=0 for i≠j – double X[n]; // a sample for (i=0; i<n; i++) X[i] = μi + Gaussian() * sqrt(σii); // σii is a variance Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 49 Example – Matlab (1/2) • Plot the density for μX=(0,0)^T, ΣX=[1 0; 0 1]: mx=[0 0]; Cx=[1 0; 0 1]; x1=-3:0.1:3; x2=-3:0.1:3; for i=1:length(x1), for j=1:length(x2), f(i,j)=(1/(2*pi*det(Cx)^(1/2)))*exp((-1/2)*([x1(i) x2(j)]-mx)*inv(Cx)*([x1(i) x2(j)]-mx)'); end; end; mesh(x1,x2,f); pause; contour(x1,x2,f); pause Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 50 Example – Matlab (2/2) • Randomly generate 1000 samples for μX=(0,0)^T, ΣX=[1 0; 0 1]: y1=randn(1,1000); y2=randn(1,1000); plot(y1,y2,'.'); [Scatter plot: isotropic cloud around the origin] Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 51 Multivariate Gaussian Generator - Dependent (1/4) • For n random variables X=(X1,…,Xn) – Gaussian: N(X; μ, Σ) • Mean vector: μ • Covariance matrix: Σ=[σij] – Σ is a positive definite matrix • Symmetric and all eigenvalues (pivots) > 0 – For a general matrix A: A = LDU • L: lower triangular, U: upper triangular, D: diagonal matrix of pivots – For a symmetric matrix S: S = LDL^T – For a positive definite matrix: Σ = LDL^T = (L√D)(√D L^T) = PP^T, with P = L√D – This is called the Cholesky decomposition • X1,…,Xn are dependent – σij ≠ 0 Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 52 Multivariate Gaussian Generator - Dependent (2/4) • Generate a sample of X with μ, Σ – Perform the Cholesky decomposition of Σ • Cholesky decomposition is the pivot decomposition for a positive definite matrix • Σ = PP^T – Generate independent Gaussian Y=(Y1,…,Yn) with μi=0, σi=1 – X = PY + μ Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 53 Multivariate Gaussian Generator - Dependent (3/4) • Pseudo code to generate a sample of X with μ, Σ – Matrix Σ; Vector μ; Vector X(n), Y(n); // a sample Matrix P = chol(Σ); // Cholesky decomposition, lower-triangular factor for (i=0; i<n; i++) Y(i) = Gaussian(); X = P*Y + μ; (a concrete 2-D C sketch follows after the proof below) Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 54 Multivariate Gaussian Generator - Dependent (4/4) • Proof – For n random variables X=(X1,…,Xn) with μ, Σ – Generate n independent, zero-mean, unit-variance normal random variables Y=(Y1,…,Yn)^T with μY=(0,…,0)^T and ΣY=I – Take X = PY + μ, where Σ = PP^T ⇒ Covariance matrix of X = E[(X−μ)(X−μ)^T] = E[(PY)(PY)^T] = E[P Y Y^T P^T] = P E[Y Y^T] P^T = P P^T = Σ Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
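A concrete C sketch of this pseudo code for the two-dimensional case, with the lower-triangular Cholesky factor written out in closed form (for a 2x2 SPD covariance [s11 s12; s12 s22] it is P = [√s11, 0; s12/√s11, √(s22 − s12²/s11)]); Gaussian() is the standard-normal primitive from Section 2.1:

    #include <math.h>

    double Gaussian(void);  /* standard normal (see the Box-Muller sketch above) */

    /* One sample from a 2-D Gaussian with mean (mu1, mu2) and covariance
       [s11 s12; s12 s22], via X = P*Y + mu with P the lower Cholesky factor. */
    void SampleGaussian2D(double mu1, double mu2,
                          double s11, double s12, double s22,
                          double *x1, double *x2) {
        double p11 = sqrt(s11);               /* closed-form 2x2 Cholesky factor */
        double p21 = s12 / p11;
        double p22 = sqrt(s22 - p21 * p21);
        double y1 = Gaussian();               /* independent standard normals */
        double y2 = Gaussian();
        *x1 = mu1 + p11 * y1;                 /* X = P*Y + mu */
        *x2 = mu2 + p21 * y1 + p22 * y2;
    }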
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 55 Example – Matlab (1/4) Assume μX=(0,0)^T and ΣX=[1 1/2; 1/2 1], so the lower Cholesky factor is P=[1 0; 1/2 √3/2] Matlab: mx=[0 0]; Cx=[1 1/2; 1/2 1]; P=chol(Cx)'; % transpose: Matlab's chol returns the upper-triangular factor Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 56 Example – Matlab (2/4) • Randomly generate 1000 samples for μX=(0,0)^T, ΣX=[1 1/2; 1/2 1] • mx=zeros(2,1000); y1=randn(1,1000); y2=randn(1,1000); y=[y1;y2]; P=[1, 0; 1/2, sqrt(3)/2]; x=P*y+mx; x1=x(1,:); x2=x(2,:); plot(x1,x2,'.'); r=corrcoef(x1,x2); Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 57 Example – Matlab (3/4) Assume μX=(5,5)^T and ΣX=[1 0.9; 0.9 1], so the lower Cholesky factor is P=[1 0; 9/10 √19/10] Matlab: • mx=[5 5]; • Cx=[1 9/10; 9/10 1]; • P=chol(Cx)'; % transpose: lower-triangular factor Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 58 Example – Matlab (4/4) • Randomly generate 1000 samples for μX=(5,5)^T, ΣX=[1 0.9; 0.9 1] • mx=5*ones(2,1000); y1=randn(1,1000); y2=randn(1,1000); y=[y1;y2]; P=[1, 0; 9/10, sqrt(19)/10]; x=P*y+mx; x1=x(1,:); x2=x(2,:); plot(x1,x2,'.'); r=corrcoef(x1,x2); Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 59 Multivariate Gaussian Mixture Generator • Generate N samples of X from a mixture of M Gaussians (Matlab-like pseudo code) – for (m=0; m<M; m++) { Matrix P = chol(Σm)'; // Cholesky decomposition, lower factor for (i=0; i<N*πm; i++) { // generate n independent normally distributed R.V.s (μ=0, σ=1) y = randn(n, 1); // transform y into x x = P*y + μm; } } Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 60 Example – Matlab (1/4) • Combine the previous two Gaussians: π1=0.5, π2=0.5, μ1=(0,0)^T, Σ1=[1 1/2; 1/2 1], μ2=(5,5)^T, Σ2=[1 0.9; 0.9 1] [Scatter plot: two overlapping point clouds centered at (0,0) and (5,5)] Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 61 Example – Matlab (2/4) • pi1=0.5; pi2=0.5; N=2000; mx1=zeros(2,pi1*N); Cx1=[1 1/2; 1/2 1]; P1=chol(Cx1)'; % P1=[1, 0; 1/2, sqrt(3)/2]; y1_1=randn(1,pi1*N); y1_2=randn(1,pi1*N); y1=[y1_1;y1_2]; x1=P1*y1+mx1; x1_1=x1(1,:); x1_2=x1(2,:); mx2=5*ones(2,pi2*N); Cx2=[1 9/10; 9/10 1]; P2=chol(Cx2)'; % P2=[1, 0; 9/10, sqrt(19)/10]; y2_1=randn(1,pi2*N); y2_2=randn(1,pi2*N); y2=[y2_1;y2_2]; x2=P2*y2+mx2; x2_1=x2(1,:); x2_2=x2(2,:); z1=[x1_1,x2_1]; z2=[x1_2,x2_2]; plot(z1,z2,'.'); Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 62 Example – Matlab (3/4) • Combine the previous two Gaussians: π1=0.2, π2=0.8, μ1=(0,0)^T, Σ1=[1 1/2; 1/2 1], μ2=(5,5)^T, Σ2=[1 0.9; 0.9 1] [Scatter plot: a small cloud at (0,0) and a denser cloud at (5,5)] Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 63 Example – Matlab (4/4) • pi1=0.2; pi2=0.8; N=2000; mx1=zeros(2,pi1*N); Cx1=[1 1/2; 1/2 1]; P1=chol(Cx1)'; % P1=[1, 0; 1/2, sqrt(3)/2]; y1_1=randn(1,pi1*N); y1_2=randn(1,pi1*N); y1=[y1_1;y1_2]; x1=P1*y1+mx1; x1_1=x1(1,:); x1_2=x1(2,:); mx2=5*ones(2,pi2*N); Cx2=[1 9/10; 9/10 1]; P2=chol(Cx2)'; % P2=[1, 0; 9/10, sqrt(19)/10]; y2_1=randn(1,pi2*N); y2_2=randn(1,pi2*N); y2=[y2_1;y2_2]; x2=P2*y2+mx2; x2_1=x2(1,:); x2_2=x2(2,:); z1=[x1_1,x2_1]; z2=[x1_2,x2_2]; plot(z1,z2,'.'); Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 64 Exercise • Write a program to randomly generate 1000 samples of a 3-dimensional Gaussian with μ=(5,10,-3) and a symmetric positive-definite covariance, e.g. Σ=(2,1,1; 1,2,1; 1,1,2) Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 65 Any Distribution • For random variables X1,…,Xn – Boolean, discrete, continuous, hybrid • We know P(X1,…,Xn) has no closed-form formula – Independent: P(X1,…,Xn) = P(X1) … P(Xn) – Dependent: P(X1,…,Xn) = Π P(Xi | Parent(Xi)) • Generate a sample (X1,…,Xn) according to P(X1,…,Xn) – Independent: generate each Xi by P(Xi) – Dependent: generate each Xi by P(Xi | Parent(Xi)) Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 66 Two Boolean R.V.s - Independent • X1, X2 have distributions: – P(X1)=<0.67, 0.33>, P(X2)=<0.75, 0.25> • int X1, X2; for (i=0; i<1000; i++) { if (rand() > RAND_MAX/3) X1 = 1; else X1 = 0; // P(X1=1) = 2/3 ≈ 0.67 if (rand() > RAND_MAX/4) X2 = 1; else X2 = 0; // P(X2=1) = 3/4 = 0.75 } [Bar charts: P(X1) with mass 0.67 on one value; P(X2) with mass 0.75 on one value] Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 67 Two Boolean R.V.s - Dependent • X1, X2 have distributions : – P(X1)=<0.67, 0.33> – P(X2|X1=T)=<0.75,0.25>, P(X2|X1=F)=<0.8,0.2> • Generate a sample (x1, x2) if (rand() > RAND_MAX/3) x1 = 1; else x1 = 0; if (x1==1) if (rand() > RAND_MAX/4) x2 = 1; else x2 = 0; else // x1==0 if (rand() > RAND_MAX/5) x2 = 1; else x2 = 0; Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 68 Markov Chain • Markov chain: n random variables X1 → … → Xk → … → Xn Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 69 Bayesian Network • Example: 5 random variables [Network: Burglary, Earthquake → Alarm → JohnCalls, MaryCalls] Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 70 3. Stochastic Simulation • Also called – Monte Carlo Methods – Sampling Methods • Sub-sections – 3.1 Direct sampling – 3.2 Rejection sampling – 3.3 Likelihood weighting Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 71 3.1 Direct Sampling • Generate N samples randomly • For the inference P(X|E) – P(X|E)= P(X^E) / P(E) – Get N(E) & N(X^E) from the N samples • N(E) : No. of samples of E • N(X^E) : No. of samples of X and E – P(E) = N(E) / N, P(X^E) = N(X^E) / N – P(X|E) = N(X^E) / N(E) Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 72 Example (1/4) • For the sprinkler network – Estimate P(r|¬w) by direct sampling – 4 random variables – A sample = (c,s,r,w) Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 73 Example (2/4) • Generate 1000 samples
    Cloudy Sprinkler Rain WetGrass
    T T T F
    F T T F
    F F T T
    T T T F
    T T T F
    ... ... ... ...
    F T T F
Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 74 Example (3/4) • P(r|¬w) = P(r,¬w)/P(¬w) ≈ Nr∧¬w / N¬w – N¬w: No. of samples with WetGrass=false – Nr∧¬w: No. of samples with Rain=true & WetGrass=false
    Cloudy Sprinkler Rain WetGrass
    T T T F
    F T T F
    F F T T
    T T F F
    ... ... ... ...
    F T T F
Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 75 Example (4/4) • P(R|¬w) – = P(R,¬w)/P(¬w) – = < P(r∧¬w)/P(¬w), P(¬r∧¬w)/P(¬w) >
    Cloudy Sprinkler Rain WetGrass
    T T T F
    F T T F
    F F T T
    T T F F
    ... ... ... ...
    F T T F
Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 76 How to Generate a Sample for the Bayesian Network? (1/3) • The sprinkler Bayesian network – A sample is an atomic event: (cloudy, sprinkler, rain, wetgrass) = (T, F, T, T) • Assume a sampling order: [Cloudy, Sprinkler, Rain, WetGrass] Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 77 How to Generate a Sample for the Bayesian Network? (2/3) • int C, S, R, W; for (i=0; i<1000; i++) { if (rand() > RAND_MAX/2) C = T; else C = F; if (rand() > RAND_MAX/2) S = T; else S = F; if (rand() > RAND_MAX/2) R = T; else R = F; if (rand() > RAND_MAX/2) W = T; else W = F; } Incorrect Implementation Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 78 How to Generate a Sample for the Bayesian Network? (3/3) • int C, S, R, W; for (i=0; i<1000; i++) { if (rand() > RAND_MAX/2) C = T; else C = F; if (C==T) if (rand() > RAND_MAX*0.9) S = T; else S = F; else // C==F if (rand() > RAND_MAX/2) S = T; else S = F; ... } Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 79 An Example Generating One Sample (1/8) • The sampling algorithm 1.Sample from P(Cloudy)=<0.5, 0.5> – Suppose it returns true 2.Sample from P(Sprinkler|Cloudy=true)=<0.1,0.9> – Suppose it returns false 3.Sample from P(Rain|Cloudy=true)=<0.8,0.2> – Suppose it returns true 4.Sample from P(WetGrass|Sprinkler=false, Rain=true) = <0.9,0.1> – Suppose it returns true Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 80 An Example Generating One Sample (2/8) C S R W Samples: Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 81 An Example Generating One Sample (3/8) Random sampling: C S R W Cloudy Samples: c Return: Cloudy=true Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 82 An Example Generating One Sample (4/8) C S R W c Samples: Random sampling 1. Sprinkler 2. Rain Given Cloudy=true Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 83 An Example Generating One Sample (5/8) C S R W c s Samples: Random sampling Sprinkler Given Cloudy=true Return: Sprinkler=false Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 84 An Example Generating One Sample (6/8) C S R W c s r Samples: Random sampling Rain Given Cloudy=true Return: Rain=true Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 85 An Example Generating One Sample (7/8) C S R W c s r Samples: Random sampling WetGrass Given Rain=true, Sprinkler=false Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 86 An Example Generating One Sample (8/8) C S R W c s r w Samples: Random sampling WetGrass Given Rain=true, Sprinkler=false Return: WetGrass=true Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 87 The Algorithm (1/2) • To generate one sample (the PRIOR-SAMPLE procedure; a C sketch follows after the next slide) Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 88 The Algorithm (2/2) • In the previous example – We got a sample [true, false, true, true] of a Bayesian network using PRIOR-SAMPLE • The sampling of a Bayesian network – Repeat the sampling N times – We get N samples • We can use the N samples to compute any query probability in the Bayesian network Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
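The PRIOR-SAMPLE pseudocode did not survive the transcript. Below is a hedged C sketch for the sprinkler network: sample each variable in topological order from its CPT given the already-sampled parents. The CPT values follow the standard AIMA sprinkler network; only some of them appear explicitly on the surrounding slides:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Bernoulli draw: returns 1 with probability p */
    int flip(double p) {
        return ((double)rand() / RAND_MAX) < p;
    }

    /* PRIOR-SAMPLE for the sprinkler network */
    void prior_sample(int *c, int *s, int *r, int *w) {
        *c = flip(0.5);                        /* P(c) = 0.5            */
        *s = flip(*c ? 0.1 : 0.5);             /* P(s|c), P(s|~c)       */
        *r = flip(*c ? 0.8 : 0.2);             /* P(r|c), P(r|~c)       */
        if (*s && *r)      *w = flip(0.99);    /* P(w|s,r)              */
        else if (*s || *r) *w = flip(0.90);    /* P(w|s,~r) = P(w|~s,r) */
        else               *w = flip(0.00);    /* P(w|~s,~r)            */
    }

    int main(void) {
        srand((unsigned)time(NULL));
        int c, s, r, w;
        for (int i = 0; i < 5; i++) {          /* print a few samples */
            prior_sample(&c, &s, &r, &w);
            printf("(%d, %d, %d, %d)\n", c, s, r, w);
        }
        return 0;
    }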
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 89 Why It Works (1/2) • Why can any probability be answered from the sampling? – The N samples effectively form a full joint distribution table (FJD):
    Samples (C S R W):      FJD (C S R W → P):
    T T T F                 T T T F → 0.02
    F T T F                 F T T F → 0.13
    F F T T                 F F T T → 0.04
    T T F F                 T T F F → 0.15
    ... ... ... ...         ...
    F T T F
Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 90 Why It Works (2/2) • A sample is an atomic event (x1, ..., xn) • P(x1, ..., xn) ≈ N(x1, ..., xn) / N • Therefore, an FJD is generated from the N samples • Note: N < 2^n Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 91 Exercise: Direct Sampling • Network: smart → prepared ← study; prepared → pass ← fair • p(smart)=.8, p(study)=.6, p(fair)=.9 • Query: What is the probability that a student studied, given that they pass the exam?
    p(prep | smart, study):
      study:  smart .9, ¬smart .7
      ¬study: smart .5, ¬smart .1
    p(pass | smart, prep, fair):
      fair:  smart∧prep .9, smart∧¬prep .7, ¬smart∧prep .7, ¬smart∧¬prep .2
      ¬fair: .1 in all four cases
Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 92 Problems of Direct Sampling • It needs to generate very many samples in order to obtain the approximate FJD • For a query of conditional probability P(X|e) – Can we just approximate the conditional probability? – Yes, the following two algorithms will do this Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 93 3.2 Rejection Sampling • P̂(X|e) is estimated from the samples that agree with e (a C sketch follows after the example below) Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 94 An Example • Estimate P(Rain|Sprinkler=true) using 100 samples – 27 samples have Sprinkler=true – Of these, 8 have Rain=true and 19 have Rain=false ⇒ P̂(Rain|Sprinkler=true) = Normalize(<8,19>) = <0.296, 0.704> • Similar to a basic real-world empirical estimation procedure Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
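A hedged C sketch of rejection sampling for this example query, reusing prior_sample() from the sketch in Section 3.1: draw complete samples, discard those disagreeing with the evidence, and count the rest:

    void prior_sample(int *c, int *s, int *r, int *w);  /* Section 3.1 sketch */

    /* Rejection-sampling estimate of P(Rain = true | Sprinkler = true) */
    double rejection_rain_given_sprinkler(int N) {
        int c, s, r, w;
        int n_e = 0;    /* samples agreeing with the evidence s = true */
        int n_re = 0;   /* of those, samples with r = true */
        for (int i = 0; i < N; i++) {
            prior_sample(&c, &s, &r, &w);
            if (!s) continue;              /* reject: disagrees with evidence */
            n_e++;
            if (r) n_re++;
        }
        return (double)n_re / n_e;         /* N(X, e) / N(e) */
    }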
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 95 Analysis of Rejection Sampling • P̂(X|e) = N(X,e)/N(e) ≈ P(X,e)/P(e) = P(X|e) • Hence rejection sampling returns consistent posterior estimates • Problem: expensive if P(e) is small – P(e) drops off exponentially with the number of evidence variables! Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 96 3.3 Likelihood Weighting • Avoids the inefficiency of rejection sampling – By generating only events consistent with the evidence variables e • Idea – Fix the evidence variables, sample only the hidden variables (randomly generate a sample event) – Weight each sample event by the likelihood it accords the evidence • Events have different weights Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 97 An Example (1/9) • Query P(Rain|sprinkler, wetgrass) Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 98 An Example (2/9) 1. Set the weight ω = 1.0 2. Sample from P(Cloudy)=<0.5,0.5> • Suppose it returns true 3. The evidence is Sprinkler=true, so we set ω = ω × P(sprinkler|cloudy) = 1 × 0.1 = 0.1 4. Sample from P(Rain|cloudy)=<0.8,0.2> • Suppose it returns true 5. The evidence is WetGrass=true, so we set ω = ω × P(wetgrass|sprinkler,rain) = 0.1 × 0.99 = 0.099 ⇒ A sample event (true, true, true, true) with weight 0.099 Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 99 An Example (3/9) ω = 1.0 Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 100 An Example (4/9) ω = 1.0 Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 101 An Example (5/9) ω = 1.0 Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 102 An Example (6/9) ω = 1.0 × 0.1 Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 103 An Example (7/9) ω = 1.0 × 0.1 Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 104 An Example (8/9) ω = 1.0 × 0.1 Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 105 An Example (9/9) ω = 1.0 × 0.1 × 0.99 = 0.099 Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 106 The Algorithm (1/2) • The example generates a sample event (true, true, true, true) for the query P(Rain|sprinkler, wetgrass) • Repeat the sampling N times – We get N sample events – Each event has a likelihood weight ω – ω1 = Σ of weights with rain=true, ω2 = Σ of weights with rain=false • P(Rain|sprinkler, wetgrass) = < ω1/(ω1+ω2), ω2/(ω1+ω2) > Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 107 The Algorithm (2/2) (the LIKELIHOOD-WEIGHTING pseudocode; a C sketch follows below) Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
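The LIKELIHOOD-WEIGHTING pseudocode is also missing from the transcript; a hedged C sketch for the query P(Rain | sprinkler, wetgrass) follows, with the same assumed AIMA CPT values as the earlier sketches and flip() as the Bernoulli helper from Section 3.1:

    int flip(double p);  /* Bernoulli helper from the prior-sampling sketch */

    /* Likelihood weighting for P(Rain = true | Sprinkler = true, WetGrass = true):
       evidence stays fixed, hidden variables are sampled from their CPTs,
       and each event is weighted by the likelihood of the evidence. */
    double lw_rain_given_s_w(int N) {
        double w_true = 0.0, w_false = 0.0;
        for (int i = 0; i < N; i++) {
            double weight = 1.0;
            int c = flip(0.5);              /* sample hidden Cloudy               */
            weight *= c ? 0.1 : 0.5;        /* evidence Sprinkler = true          */
            int r = flip(c ? 0.8 : 0.2);    /* sample hidden Rain                 */
            weight *= r ? 0.99 : 0.90;      /* evidence WetGrass = true, s = true */
            if (r) w_true += weight; else w_false += weight;
        }
        return w_true / (w_true + w_false); /* normalized weighted estimate */
    }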
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 108 Exercise: Likelihood Weighting • Network: smart → prepared ← study; prepared → pass ← fair • p(smart)=.8, p(study)=.6, p(fair)=.9 • Query: What is the probability that a student studied, given that they pass the exam?
    p(prep | smart, study):
      study:  smart .9, ¬smart .7
      ¬study: smart .5, ¬smart .1
    p(pass | smart, prep, fair):
      fair:  smart∧prep .9, smart∧¬prep .7, ¬smart∧prep .7, ¬smart∧¬prep .2
      ¬fair: .1 in all four cases
Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 109 Analysis (1/3) • Why does the algorithm work for P(X|E=e)? • Let the sampling probability for WEIGHTED-SAMPLE be S_WS – The evidence variables E are fixed at e – All the other variables Z = {X} ∪ Y – The algorithm samples each variable in Z given its parent values: S_WS(z, e) = Π_{i=1}^{l} P(zi | parents(Zi)) Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 110 Analysis (2/3) • The likelihood weight w for a given sample (z, e) = (x, y, e) is w(z, e) = Π_{i=1}^{m} P(ei | parents(Ei)) • The weighted probability of a sample (z, e) = (x, y, e) is S_WS(z, e) · w(z, e) = Π_{i=1}^{l} P(zi | parents(Zi)) · Π_{i=1}^{m} P(ei | parents(Ei)) = P(x, y, e), because P(x1, …, xn) = Π_{i=1}^{n} P(xi | parents(Xi)) Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 111 Analysis (3/3) P̂(x|e) = α Σ_y N_WS(x, y, e) · w(x, y, e) ≈ α' Σ_y S_WS(x, y, e) · w(x, y, e) = α' Σ_y P(x, y, e) = α' P(x, e) = P(x|e) • So the algorithm works Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 112 Discussions • Likelihood weighting is efficient because it uses all the samples generated • However, it suffers a degradation in performance as the no. of evidence variables increases, because – Most samples will have very low weights, – The weighted estimate will be dominated by the tiny fraction of samples that have infinitesimal likelihood Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 113 4. Inference by MCMC • Key idea – Sampling process as a Markov Chain • Next sample depends on the previous one – Approximate any posterior distribution • "State" of network = current assignment to all variables • Generate next state – by sampling one variable given Markov blanket • Sample each variable in turn, keeping evidence fixed Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 114 The Markov Chain • With Sprinkler=true, WetGrass=true, there are four states: [State graph: the four joint assignments of (Cloudy, Rain), with transitions between them] Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 115 Markov Blanket Sampling • The Markov blanket of Cloudy is – Sprinkler and Rain • The Markov blanket of Rain is – Cloudy, Sprinkler, and WetGrass • The probability given the Markov blanket is calculated as follows – P(xi|MB(Xi)) ∝ P(xi|Parents(Xi)) × Π_{Zj∈Children(Xi)} P(zj|Parents(Zj)) Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 116 An Example (1/2) • Estimate P(Rain|sprinkler,wetgrass) • Loop for N times – Sample Cloudy or Rain given its Markov blanket • Count number of times Rain=true and Rain=false in the samples Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 117 An Example (2/2) • E.g., visit 100 states – 31 have Rain=true, – 69 have Rain=false • P(Rain|sprinkler,wetgrass) = Normalize(<31, 69>) = <0.31, 0.69> Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 118 The Algorithm (the MCMC-ASK pseudocode; a C sketch follows below) Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
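The MCMC-ASK pseudocode did not survive either. A hedged C sketch of Gibbs sampling for P(Rain | sprinkler, wetgrass): the state is (Cloudy, Rain), the evidence stays fixed, and each conditional comes from the Markov-blanket formula on p. 115 with the assumed AIMA CPT values (no burn-in, for simplicity):

    int flip(double p);  /* Bernoulli helper from the prior-sampling sketch */

    /* Gibbs estimate of P(Rain = true | Sprinkler = true, WetGrass = true) */
    double gibbs_rain_given_s_w(int N) {
        int c = flip(0.5), r = flip(0.5);  /* arbitrary initial state */
        int n_rain = 0;
        for (int i = 0; i < N; i++) {
            /* Resample Cloudy given its blanket (s = true, Rain = r):
               P(c | s, r) is proportional to P(c) P(s|c) P(r|c) */
            double pt = 0.5 * 0.1 * (r ? 0.8 : 0.2);
            double pf = 0.5 * 0.5 * (r ? 0.2 : 0.8);
            c = flip(pt / (pt + pf));
            /* Resample Rain given its blanket (Cloudy = c, s = true, w = true):
               P(r | c, s, w) is proportional to P(r|c) P(w|s,r) */
            pt = (c ? 0.8 : 0.2) * 0.99;
            pf = (c ? 0.2 : 0.8) * 0.90;
            r = flip(pt / (pt + pf));
            if (r) n_rain++;               /* count visited states with r = true */
        }
        return (double)n_rain / N;
    }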
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 119 Why it works • Skipped – Details in pp. 517-518 in the AIMA 2e textbook Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 120 Sub-Sections • 4.1 Markov chain theory • 4.2 Two MCMC sampling algorithms Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 121 4.1 Markov Chain Theory • Suppose X1, X2, … take some set of values – w.l.o.g., these values are 1, 2, ... • A Markov chain is a process that corresponds to the network X1 → X2 → X3 → … → Xn • To quantify the chain, we need to specify – Initial probability: P(X1) – Transition probability: P(Xt+1|Xt) • A Markov chain has a stationary transition probability: P(Xt+1|Xt) is the same for all times t Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 122 Irreducible Chains • A state j is accessible from state i if there is an n such that P(Xn = j | X1 = i) > 0 – There is a positive probability of reaching j from i after some number of steps • A chain is irreducible if every state is accessible from every state Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 123 Ergodic Chains • A state i is positively recurrent if there is a finite expected time to get back to state i after being in state i – If X has a finite number of states, it suffices that i is accessible from itself • A chain is ergodic if it is irreducible and every state is positively recurrent Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 124 (A)periodic Chains • A state i is periodic if there is an integer d > 1 such that P(Xn = i | X1 = i) = 0 whenever n is not divisible by d • Intuition: state i can only recur every d steps • A chain is aperiodic if it contains no periodic state Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 125 Stationary Probabilities Thm: • If a chain is ergodic and aperiodic, then the limit lim_{n→∞} P(Xn | X1 = i) exists and does not depend on i • Moreover, let P*(X = j) = lim_{n→∞} P(Xn = j | X1 = i); then P*(X) is the unique probability satisfying P*(X = j) = Σ_i P(Xt+1 = j | Xt = i) P*(X = i) Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 126 Stationary Probabilities • The probability P*(X) is the stationary probability of the process • Regardless of the starting point, the process will converge to this probability • The rate of convergence depends on properties of the transition probability Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 127 Sampling from the Stationary Probability • This theory suggests how to sample from the stationary probability: – Set X1 = i, for some random/arbitrary i – For t = 1, 2, …, n • Sample a value xt+1 for Xt+1 from P(Xt+1|Xt=xt) – return xn • If n is large enough, then this is a sample from P*(X) Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
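A tiny C demo of this procedure on a two-state chain (the transition probabilities are illustrative assumptions): run many independent chains for n steps and look at the empirical distribution of the final state, which approaches P*:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Two-state chain with P(1|0) = 0.3 and P(0|1) = 0.6; its stationary
       distribution is P*(1) = 0.3/(0.3+0.6) = 1/3 (illustrative numbers). */
    int main(void) {
        srand((unsigned)time(NULL));
        int n = 100, runs = 100000, ones = 0;
        for (int m = 0; m < runs; m++) {
            int x = rand() % 2;                 /* arbitrary starting state */
            for (int t = 0; t < n; t++) {
                double u = (double)rand() / RAND_MAX;
                x = x ? (u < 0.6 ? 0 : 1)       /* transition from state 1 */
                      : (u < 0.3 ? 1 : 0);      /* transition from state 0 */
            }
            ones += x;                          /* keep only the n-th state */
        }
        printf("P*(X=1) ~= %f (exact: 1/3)\n", (double)ones / runs);
        return 0;
    }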
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 128 Designing Markov Chains • How do we construct the right chain to sample from? – Ensuring aperiodicity and irreducibility is usually easy • Problem is ensuring the desired stationary probability Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 129 Designing Markov Chains Key tool: • If the transition probability satisfies P(Xt+1 = j | Xt = i) / P(Xt+1 = i | Xt = j) = Q(X = j) / Q(X = i) whenever P(Xt+1 = j | Xt = i) > 0, then P*(X) = Q(X) • This gives a local criterion for checking that the chain will have the right stationary distribution Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 130 MCMC Methods • We can use these results to sample from P(X1,…,Xn|e) Idea: • Construct an ergodic & aperiodic Markov Chain such that P*(X1,…,Xn) = P(X1,…,Xn|e) • Simulate the chain n steps to get a sample Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 131 MCMC Methods Notes: • The Markov chain variable Y takes as values assignments to all variables that are consistent with the evidence: V(Y) = { (x1,…,xn) ∈ V(X1) × … × V(Xn) | x1,…,xn satisfy e } • For simplicity, we will denote such a state using the vector of variables Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 132 4.2 Two MCMC Sampling Algorithms • Gibbs Sampler • Metropolis-Hastings Sampler Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 133 Gibbs Sampler • One of the simplest MCMC methods • Each transition changes the state of one Xi • The transition probability is defined by P itself as a stochastic procedure: – Input: a state x1,…,xn – Choose i at random (uniform probability) – Sample x'i from P(Xi|x1, …, xi-1, xi+1, …, xn, e) – Let x'j = xj for all j ≠ i – Return x'1,…,x'n Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 134 Correctness of Gibbs Sampler • How do we show correctness? Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 135 Correctness of Gibbs Sampler • By the chain rule, P(x1,…,xi-1, xi, xi+1,…,xn|e) = P(x1,…,xi-1, xi+1,…,xn|e) P(xi|x1,…,xi-1, xi+1,…,xn, e) • Thus, for the transition from xi to x'i we get P(x1,…,xi-1, xi, xi+1,…,xn|e) / P(x1,…,xi-1, x'i, xi+1,…,xn|e) = P(xi|x1,…,xi-1, xi+1,…,xn, e) / P(x'i|x1,…,xi-1, xi+1,…,xn, e) • Since we choose i from the same distribution at each stage, this procedure satisfies the ratio criterion Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 136 Gibbs Sampling for Bayesian Network • Why is the Gibbs sampler “easy” in BNs? • Recall that the Markov blanket of a variable separates it from the other variables in the network – P(Xi | X1,…,Xi-1,Xi+1,…,Xn) = P(Xi | Mbi ) • This property allows us to use local computations to perform sampling in each transition Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 137 Gibbs Sampling in Bayesian Networks • How do we evaluate P(Xi | x1,…,xi-1,xi+1,…,xn)? • Let Y1, …, Yk be the children of Xi – By definition of Mbi, the parents of Yj are in Mbi ∪ {Xi} • It is easy to show that P(xi | Mbi) = P(xi | Pai) Π_j P(yj | pa_yj) / Σ_{x'i} [ P(x'i | Pai) Π_j P(yj | pa_yj) ], where in the denominator Xi takes the value x'i inside pa_yj Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 138 Metropolis-Hastings • More general than Gibbs (Gibbs is a special case of M-H) • Proposal distribution: an arbitrary q(x'|x) that is ergodic and aperiodic (e.g., uniform) • Transition to x' happens with probability α(x'|x) = min(1, P(x')q(x|x') / (P(x)q(x'|x))) • Useful when P(x) is known only up to a normalizing constant • q(x'|x)=0 implies P(x')=0 or q(x|x')=0 Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
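A hedged C sketch of Metropolis-Hastings on a small discrete target with a uniform (symmetric) proposal, so the acceptance probability reduces to min(1, P(x')/P(x)); the target values are illustrative and only need to be known up to a normalizing constant:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static const double p[4] = {1.0, 2.0, 4.0, 1.0};  /* unnormalized target */

    int main(void) {
        srand((unsigned)time(NULL));
        long counts[4] = {0, 0, 0, 0};
        int x = 0;                                 /* arbitrary start */
        long T = 1000000;
        for (long t = 0; t < T; t++) {
            int xp = rand() % 4;                   /* uniform proposal q(x'|x) */
            double a = p[xp] / p[x];               /* q symmetric: target ratio only */
            if (((double)rand() / RAND_MAX) < a)   /* accept w.p. min(1, a) */
                x = xp;
            counts[x]++;
        }
        for (int i = 0; i < 4; i++)  /* should approach 1/8, 1/4, 1/2, 1/8 */
            printf("state %d: %f\n", i, (double)counts[i] / T);
        return 0;
    }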
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 139 Sampling Strategy • How do we collect the samples? Strategy I: • Run the chain M times, each for N steps – Each run starts from a different starting point • Return the last state in each run [Diagram: M independent chains] Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 140 Sampling Strategy Strategy II: • Run one chain for a long time • After some "burn in" period, sample points every fixed number of steps [Diagram: "burn in" period, then M samples taken from one chain] Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 141 Comparing Strategies Strategy I: – Better chance of “covering” the space of points especially if the chain is slow to reach stationarity – Have to perform “burn in” steps for each chain Strategy II: – Perform “burn in” only once – Samples might be correlated (although only weakly) Hybrid strategy: – Run several chains, sample few times each – Combines benefits of both strategies Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
  • Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 142 Short Summary - Approximate Inference • Monte Carlo (sampling) methods – Pro: simplicity of implementation and theoretical guarantee of convergence – Con: can be slow to converge, and their convergence can be hard to diagnose • Variational methods – Your presentation • Loopy belief propagation and generalized belief propagation – Your presentation Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
p. 143 Exercise: MCMC Sampling
• Query: what is the probability that a student studied, given that they pass the exam?
• Network: smart → prepared ← study; {smart, prepared, fair} → pass
• Priors: P(smart) = 0.8, P(study) = 0.6, P(fair) = 0.9
• P(prepared | smart, study):
               smart   ¬smart
    study       0.9     0.7
    ¬study      0.5     0.1
• P(pass | smart, prepared, fair):
                    smart           ¬smart
               prep   ¬prep    prep   ¬prep
    fair        0.9    0.7      0.7    0.2
    ¬fair       0.1    0.1      0.1    0.1
(a worked sketch follows)
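One way to attack the exercise, reusing the illustrative gibbs() sampler sketched earlier. The dictionary encoding of the network is an assumption of that sketch's layout; CPT rows are keyed by parent values in the order listed in parents[...].

```python
variables = ("smart", "study", "fair", "prepared", "pass")
parents = {
    "smart": (), "study": (), "fair": (),
    "prepared": ("smart", "study"),
    "pass": ("smart", "prepared", "fair"),
}
children = {
    "smart": ("prepared", "pass"), "study": ("prepared",),
    "fair": ("pass",), "prepared": ("pass",), "pass": (),
}
cpt = {
    "smart": {(): 0.8}, "study": {(): 0.6}, "fair": {(): 0.9},
    "prepared": {(True, True): 0.9, (True, False): 0.5,
                 (False, True): 0.7, (False, False): 0.1},
    "pass": {(True, True, True): 0.9, (True, False, True): 0.7,
             (False, True, True): 0.7, (False, False, True): 0.2,
             (True, True, False): 0.1, (True, False, False): 0.1,
             (False, True, False): 0.1, (False, False, False): 0.1},
}

# Estimate P(study = true | pass = true):
p = gibbs("study", {"pass": True}, variables, parents, children, cpt)
```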
p. 144 Main Computational Problems
1. It is difficult to tell whether convergence has been achieved
2. Sampling can be wasteful when the Markov blanket is large:
   – P(Xi | MB(Xi)) won't change much between sweeps (law of large numbers)
p. 145 5. Loopy Belief Propagation
• TBU
p. 146 6. Variational Methods
• TBU
p. 147 7. Implementation by PNL
Inference algorithms available in PNL and GeNIe:

  Algorithm              PNL                     GeNIe
  Enumeration            v (Naïve)
  Variable Elimination
  Belief Propagation     v (Pearl)               v (Polytree)
  Junction Tree          v                       v (Clustering)
  Direct Sampling                                v (Logic)
  Likelihood Sampling    v (LWSampling)          v (Likelihood sampling)
  MCMC Sampling          v (GibbsWithAnneal)     (Other 5 samplings)
p. 148 8. Summary
• Exact inference by variable elimination:
  – Polynomial time on polytrees
  – NP-hard on general graphs
  – Space cost is comparable to time cost, and both are very sensitive to topology
p. 149 Summary
• Approximate inference by LW (likelihood weighting) and MCMC:
  – LW does poorly when there is a lot of (downstream) evidence
  – LW and MCMC are generally insensitive to topology
  – Convergence can be very slow when probabilities are close to 1 or 0
  – Both can handle arbitrary combinations of discrete and continuous variables
p. 150 Summary
• What we know:
  – What a Bayesian network is
  – How to perform inference, given a Bayesian network
• However, we still need to know:
  – How to learn CPTs
  – How to build, or automatically learn, the structure of a Bayesian network from a given set of data
p. 151 9. References
• General introduction to probabilistic inference in BN:
  – B. D'Ambrosio, "Inference in Bayesian networks," AI Magazine, 1999.
  – M. I. Jordan & Y. Weiss, "Probabilistic inference in graphical models."
  – C. Andrieu, N. de Freitas, A. Doucet, & M. I. Jordan, "An introduction to MCMC for machine learning," Machine Learning, vol. 50, pp. 5-43, 2003.
p. 152 Recent Books
• R. E. Neapolitan, Learning Bayesian Networks, Prentice Hall, 2004.
• C. Borgelt and R. Kruse, Graphical Models: Methods for Data Analysis and Mining, Wiley, 2002.
• D. Edwards, Introduction to Graphical Modelling, 2nd ed., Springer, 2000.
• S. L. Lauritzen, Graphical Models, Oxford, 1996.
• M. I. Jordan (ed.), Learning in Graphical Models, MIT, 2001.
p. 153 Appendix
• Theoretical analysis of approximation error
p. 154 Types of Approximations: Absolute Error
• An estimate q of P(X=x | e) has absolute error ε if
  P(X=x | e) − ε ≤ q ≤ P(X=x | e) + ε
  or, equivalently,
  q − ε ≤ P(X=x | e) ≤ q + ε
(figure: the interval [q − ε, q + ε] around q on the probability scale from 0 to 1)
• Not always what we want, e.g., for ε = 0.001:
  – Unacceptable if P(X=x | e) = 0.0001
  – Overly precise if P(X=x | e) = 0.3
p. 155 Types of Approximations: Relative Error
• An estimate q of P(X=x | e) has relative error ε if
  P(X=x | e)(1 − ε) ≤ q ≤ P(X=x | e)(1 + ε)
  or, equivalently,
  q/(1 + ε) ≤ P(X=x | e) ≤ q/(1 − ε)
(figure: the interval [q/(1 + ε), q/(1 − ε)] around q on the probability scale from 0 to 1)
• Sensitivity of the approximation depends on the actual value of the desired result
p. 156 Complexity
• Recall that exact inference is NP-hard
• Is approximate inference any easier?
• Construction used in the exact-inference hardness proof:
  – Input: a 3-SAT problem φ
  – Output: a BN such that P(X = t) > 0 iff φ is satisfiable
p. 157 Complexity: Relative Error
• Suppose q is a relative-error estimate of P(X = t)
• If φ is not satisfiable, then
  0 = P(X = t)(1 − ε) ≤ q ≤ P(X = t)(1 + ε) = 0
  so q = 0; thus, if q > 0, then φ is satisfiable
• An immediate consequence:
  Thm: Given ε, finding an ε-relative-error approximation is NP-hard
p. 158 Complexity: Absolute Error
• Thm: If ε < 0.5, then finding an estimate of P(X=x | e) with absolute error ε is NP-hard
p. 159 Likelihood Weighting
• Can we ensure that all of our samples satisfy e?
• One simple solution:
  – When we need to sample a variable whose value is fixed by e, use the specified value
• For example, in the two-node network X → Y, suppose we know Y = 1:
  – Sample X from P(X)
  – Then take Y = 1
• Is this a sample from P(X, Y | Y = 1)?
p. 160 Likelihood Weighting
• Problem: these are samples of X from the prior P(X), not from P(X | Y = 1)
• Solution:
  – Penalize samples in which P(Y=1 | X) is small
• We now sample as follows:
  – Let x[i] be a sample from P(X)
  – Let w[i] = P(Y = 1 | X = x[i])
  P(X = x | Y = 1) ≈ Σi w[i] · 1{x[i] = x} / Σi w[i]
p. 161 Likelihood Weighting
• Why does this make sense?
• When N is large, we expect to get N·P(X = x) samples with x[i] = x
• Thus,
  Σ{i : x[i] = x} w[i] ≈ N·P(X = x)·P(Y = 1 | X = x) = N·P(X = x, Y = 1)
• When we normalize, we get an approximation of the conditional probability
p. 162–166 Likelihood Weighting: Example
• Network: Earthquake → {Radio, Alarm}, Burglary → Alarm, Alarm → Call
• CPTs: P(b) = 0.03, P(e) = 0.001
  P(a | b,e) = 0.98, P(a | b,¬e) = 0.7, P(a | ¬b,e) = 0.4, P(a | ¬b,¬e) = 0.01
  P(r | e) = 0.3, P(r | ¬e) = 0.001
  P(c | a) = 0.8, P(c | ¬a) = 0.05
• Evidence: A = a and R = r
• Generating one weighted sample (B, E, A, C, R), step by step:
  1. Sample B from P(B): value b
  2. Sample E from P(E): value e
  3. A is evidence, so fix A = a and multiply the weight by P(a | b, e): w = 0.6
  4. Sample C from P(C | a): value c
  5. R is evidence, so fix R = r and multiply the weight by P(r | e): w = 0.6 × 0.3
• The sample (b, e, a, c, r) is recorded together with its final weight
p. 167 Likelihood Weighting
• Let X1, …, Xn be an ordering of the variables consistent with the arc directions
• w ← 1
• for i = 1, …, n do
  – if Xi = xi has been observed
    • w ← w * P(Xi = xi | pai)
  – else
    • sample xi from P(Xi | pai)
• return x1, …, xn, and w
(a runnable sketch follows)
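A runnable sketch of the loop above for Boolean variables, reusing the illustrative parents/cpt layout from the earlier Gibbs sketch. The function names and the requirement that `order` lists variables parents-first are assumptions of this sketch.

```python
import random

def weighted_sample(order, evidence, parents, cpt):
    state, w = {}, 1.0
    for x in order:
        key = tuple(state[p] for p in parents[x])
        p_true = cpt[x][key]
        if x in evidence:                        # observed: weight, don't sample
            state[x] = evidence[x]
            w *= p_true if evidence[x] else 1.0 - p_true
        else:                                    # hidden: sample from P(Xi | pai)
            state[x] = random.random() < p_true
    return state, w

def likelihood_weighting(query, evidence, order, parents, cpt, n=10000):
    num = den = 0.0
    for _ in range(n):
        state, w = weighted_sample(order, evidence, parents, cpt)
        den += w
        if state[query]:
            num += w
    return num / den                             # estimate of P(query | evidence)

# Usage with the exercise network: estimate P(study | pass):
# likelihood_weighting("study", {"pass": True},
#                      ("smart", "study", "fair", "prepared", "pass"),
#                      parents, cpt)
```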
p. 168 Importance Sampling
• A method for evaluating the expectation of f under P(X), written ⟨f⟩P(X)
• Discrete case:   ⟨f⟩P(X) = Σx f(x) P(x)
• Continuous case: ⟨f⟩P(X) = ∫ f(x) P(x) dx
• If we could sample from P directly:
  ⟨f⟩P(X) ≈ (1/R) Σr f(x[r])
p. 169 Importance Sampling
• A general method for evaluating ⟨f⟩P(X) when we cannot sample from P(X)
• Idea: choose an approximating distribution Q(X) that we can sample from, and write
  ⟨f⟩P(X) = ∫ f(x) P(x) dx = ∫ f(x) [P(x)/Q(x)] Q(x) dx
• If we could generate samples from P(X):
  ⟨f⟩ ≈ (1/M) Σm f(x[m])
• Since we instead generate the samples from Q(X):
  ⟨f⟩P(X) ≈ (1/M) Σm f(x[m]) w(m),  where w(m) = P(x[m]) / Q(x[m])
p. 170 (Unnormalized) Importance Sampling
1. For m = 1:M
   – Sample x[m] from Q(X)
   – Calculate w(m) = P(x[m]) / Q(x[m])
2. Estimate the expectation of f(X) using
   ⟨f(x)⟩P(X) ≈ (1/M) Σm f(x[m]) w(m)
Requirements:
• P(x) > 0 ⇒ Q(x) > 0 (don't ignore possible scenarios)
• It is possible to calculate P(x) and Q(x) for a specific X = x
• It is possible to sample from Q(X)
(a sketch follows)
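A minimal sketch of unnormalized importance sampling: estimating E_P[f(X)] for a standard-normal target P using a wider normal proposal Q. The densities and the choice of f are purely illustrative assumptions.

```python
import math
import random

def p_pdf(x):                                    # target density P(x): N(0, 1)
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def q_pdf(x):                                    # proposal density Q(x): N(0, 2^2)
    return math.exp(-0.5 * (x / 2) ** 2) / (2 * math.sqrt(2 * math.pi))

def importance_sampling(f, n=100000):
    total = 0.0
    for _ in range(n):
        x = random.gauss(0.0, 2.0)               # sample x[m] ~ Q
        w = p_pdf(x) / q_pdf(x)                  # weight w(m) = P(x)/Q(x)
        total += f(x) * w
    return total / n                             # (1/M) sum_m f(x[m]) w(m)

# E_P[X^2] should come out near 1 (the variance of N(0, 1)).
print(importance_sampling(lambda x: x * x))
```

The wider proposal satisfies the requirement above: Q is positive wherever P is.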
p. 171 Normalized Importance Sampling
• Assume that we cannot evaluate P(X=x), but can evaluate P'(X=x) = α P(X=x) for an unknown constant α
  (e.g., in a Bayesian network we can evaluate the joint P(x, e) but not P(x | e))
• Define w'(X) = P'(X)/Q(X). We can then evaluate
  ⟨w'(X)⟩Q(X) = Σx Q(x) P'(x)/Q(x) = Σx P'(x) = α
• Therefore
  ⟨f(x)⟩P(X) = ∫ f(x) P(x) dx = (1/α) ∫ f(x) [P'(x)/Q(x)] Q(x) dx
             = ⟨f(X) w'(X)⟩Q(X) / ⟨w'(X)⟩Q(X)
• In the last step we simply replace α with ⟨w'(X)⟩Q(X) from the equation above
p. 172 Normalized Importance Sampling
• We can now estimate the expectation of f(X), as in unnormalized importance sampling, by sampling x[m] from Q(X) and computing
  ⟨f(x)⟩P(X) ≈ Σm f(x[m]) w'(m) / Σm w'(m)
  (hence the name "normalized"); a sketch follows
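A sketch of the normalized estimator for a BN posterior, reusing the illustrative weighted_sample() from the likelihood-weighting sketch: there Q is the prior restricted to the evidence and w' = P(x, e)/Q(x), so the self-normalized ratio below estimates E[f(X) | e]. All names are assumptions carried over from the earlier sketches.

```python
def posterior_expectation(f, evidence, order, parents, cpt, n=10000):
    num = den = 0.0
    for _ in range(n):
        state, w = weighted_sample(order, evidence, parents, cpt)
        num += f(state) * w                      # sum_m f(x[m]) w'(m)
        den += w                                 # sum_m w'(m)
    return num / den

# Usage: E[prepared | pass = true] = P(prepared | pass) on the exercise network:
# posterior_expectation(lambda s: float(s["prepared"]), {"pass": True},
#                       ("smart", "study", "fair", "prepared", "pass"),
#                       parents, cpt)
```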
p. 173 Importance Sampling Weaknesses
• It is important to choose a sampling distribution with heavy tails
  – so as not to "miss" large values of f
• Many-dimensional importance sampling:
  – The "typical set" of P may take a long time to find, unless Q is a good approximation to P
  – Weights can vary by factors exponential in N
• Similar weaknesses hold for likelihood weighting