# Jylee: Probabilistic Reasoning with Bayesian Networks

### Transcript of "Jylee probabilistic reasoning with bayesian networks"

1. **Probabilistic Reasoning in Bayesian Networks**. Jung-Yeol Lee, KAIST AIPR Lab., 17th June 2010.

2. **Contents**
   - Backgrounds
   - Bayesian Network
   - Semantics of Bayesian Network
   - D-Separation
   - Conditional Independence Relations
   - Probabilistic Inference in Bayesian Networks
   - Summary

3. **Backgrounds**
   - Bayes' rule. From the product rule $P(X \wedge Y) = P(X \mid Y)P(Y) = P(Y \mid X)P(X)$:
     $$P(Y \mid X) = \frac{P(X \mid Y)P(Y)}{P(X)} = \alpha\, P(X \mid Y)P(Y),$$
     where $\alpha$ is the normalization constant.
   - Combining evidence $e$:
     $$P(Y \mid X, e) = \frac{P(X \mid Y, e)\, P(Y \mid e)}{P(X \mid e)}$$
   - Conditional independence: $P(X, Y \mid Z) = P(X \mid Z)P(Y \mid Z)$ when $X \perp Y \mid Z$.

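The normalization step in Bayes' rule can be made concrete with a small numeric sketch. The prior and likelihood values below are invented for illustration, and the dict encoding is simply one convenient representation:

```python
# Hypothetical binary example: a prior P(Y), a likelihood P(x | Y) for one
# observed value x, and the posterior P(Y | x) = alpha * P(x | Y) * P(Y).
prior = {True: 0.01, False: 0.99}        # P(Y)
likelihood = {True: 0.90, False: 0.05}   # P(x | Y)

unnormalized = {y: likelihood[y] * prior[y] for y in prior}
alpha = 1.0 / sum(unnormalized.values())  # the normalization constant
posterior = {y: alpha * p for y, p in unnormalized.items()}

print(posterior[True])  # P(Y=true | x), ~0.154
```

Computing $\alpha$ by summing the unnormalized terms avoids ever needing $P(X)$ explicitly, which is exactly why the slide writes the rule with $\alpha$.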
4. **Bayesian Network**
   - Expresses causal relationships among random variables.
   - Directed acyclic graph:
     - Nodes $X_i$: random variables
     - Directed links: probabilistic relationships between variables
     - Acyclic: no links from any node back to any earlier node
   - A link from node $X$ to node $Y$ means $X \in Parents(Y)$.
   - Each node $X_i$ carries a conditional probability distribution $P(X_i \mid Parents(X_i))$: the effect of the parents on the node $X_i$.

5. **Example of Bayesian Network**: the burglary network. JohnCalls is directly influenced only by Alarm, so $P(J \mid M \wedge A \wedge E \wedge B) = P(J \mid A)$. Conditional probability tables:
   - $P(B) = 0.001$, $P(E) = 0.002$
   - $P(A \mid B, E)$: 0.95 for $(b, e)$, 0.94 for $(b, \neg e)$, 0.29 for $(\neg b, e)$, 0.001 for $(\neg b, \neg e)$
   - $P(J \mid A)$: 0.90 for $a$, 0.05 for $\neg a$
   - $P(M \mid A)$: 0.70 for $a$, 0.01 for $\neg a$

6. **Semantics of Bayesian Network**
   - Full joint probability distribution. Notation: $P(x_1, \ldots, x_n)$ abbreviates $P(X_1 = x_1 \wedge \cdots \wedge X_n = x_n)$:
     $$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid parents(X_i)),$$
     where $parents(X_i)$ denotes the specific values of the variables in $Parents(X_i)$.
   - Constructing Bayesian networks. By the chain rule, $P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid x_{i-1}, \ldots, x_1)$. For every variable $X_i$ in the network, $P(X_i \mid X_{i-1}, \ldots, X_1) = P(X_i \mid Parents(X_i))$, provided that $Parents(X_i) \subseteq \{X_{i-1}, \ldots, X_1\}$.
   - Correctness: choose parents for each node so that this property holds.

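As a sketch of this factorization, the burglary network's CPTs from slide 5 can be stored as plain dicts keyed by parent values (the encoding is my own choice, not anything prescribed by the slides); the chain-rule product then gives any full joint entry, e.g. $P(j, m, a, \neg b, \neg e) \approx 0.000628$:

```python
from itertools import product

# CPTs of the burglary network (values from slide 5); each table gives
# P(X = true | parent values).
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def joint(b, e, a, j, m):
    """P(b, e, a, j, m) as the product of P(x_i | parents(X_i))."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

# P(j, m, a, ~b, ~e): alarm sounded, both called, no burglary, no quake.
p = joint(False, False, True, True, True)
print(round(p, 6))  # ~0.000628

# Sanity check: the factored joint sums to 1 over all 2^5 assignments.
total = sum(joint(*v) for v in product((True, False), repeat=5))
```

The factorization needs only the ten CPT numbers above, rather than the $2^5 - 1$ entries of an explicit joint table.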
7. **Semantics of Bayesian Network (cont'd)**
   - Compactness: a locally structured system, in which each node interacts directly with only a bounded number of components. With at most $k$ parents per node, the complete network is specified by $n 2^k$ conditional probabilities.
   - Node ordering: add "root causes" first, then the variables they influence, and so on until reaching the "leaves" (nodes with no direct causal influence on the others).

8. **Three Examples of 3-Node Graphs: Tail-to-Tail Connection**
   - Node $c$ is said to be tail-to-tail: $P(a, b) = \sum_c P(a \mid c)P(b \mid c)P(c)$, which does not factorize in general, so $a \not\perp b \mid \emptyset$.
   - When node $c$ is observed,
     $$P(a, b \mid c) = \frac{P(a, b, c)}{P(c)} = P(a \mid c)P(b \mid c),$$
     so $a \perp b \mid c$: node $c$ blocks the path from $a$ to $b$, and variables $a$ and $b$ are conditionally independent.

9. **Three Examples of 3-Node Graphs: Head-to-Tail Connection**
   - Node $c$ is said to be head-to-tail: $P(a, b) = P(a)\sum_c P(c \mid a)P(b \mid c) = P(a)P(b \mid a)$, so in general $a \not\perp b \mid \emptyset$.
   - When node $c$ is observed,
     $$P(a, b \mid c) = \frac{P(a, b, c)}{P(c)} = \frac{P(a)P(c \mid a)P(b \mid c)}{P(c)} = P(a \mid c)P(b \mid c),$$
     so $a \perp b \mid c$: node $c$ blocks the path from $a$ to $b$, and variables $a$ and $b$ are conditionally independent.

10. **Three Examples of 3-Node Graphs: Head-to-Head Connection**
    - Node $c$ is said to be head-to-head: $P(a, b, c) = P(a)P(b)P(c \mid a, b)$, and summing over $c$ gives $P(a, b) = P(a)P(b)$, so $a \perp b \mid \emptyset$.
    - When $c$ is observed,
      $$P(a, b \mid c) = \frac{P(a, b, c)}{P(c)} = \frac{P(a)P(b)P(c \mid a, b)}{P(c)},$$
      which does not factorize in general, so $a \not\perp b \mid c$.
    - When node $c$ is unobserved, node $c$ blocks the path from $a$ to $b$, and variables $a$ and $b$ are independent.

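The head-to-head case is the counterintuitive one, so a numeric check helps. In a tiny $a \to c \leftarrow b$ network (all numbers below are invented for illustration), the marginal $P(a, b)$ factorizes but the conditional $P(a, b \mid c)$ does not:

```python
import itertools

# Head-to-head: a -> c <- b, with a and b marginally independent.
P_a = {True: 0.3, False: 0.7}
P_b = {True: 0.6, False: 0.4}
P_c = {(True, True): 0.9, (True, False): 0.5,
       (False, True): 0.5, (False, False): 0.1}  # P(c=true | a, b)

def joint(a, b, c):
    pc = P_c[(a, b)] if c else 1 - P_c[(a, b)]
    return P_a[a] * P_b[b] * pc

# Marginal: summing c out gives P(a, b) = P(a) P(b), i.e. independence.
P_ab = {(a, b): sum(joint(a, b, c) for c in (True, False))
        for a, b in itertools.product((True, False), repeat=2)}

# Conditional on c=true: P(a, b | c) generally != P(a | c) P(b | c).
Pc = sum(joint(a, b, True) for a, b in P_ab)
P_ab_c = {(a, b): joint(a, b, True) / Pc for a, b in P_ab}
P_a_c = sum(v for (a, b), v in P_ab_c.items() if a)
P_b_c = sum(v for (a, b), v in P_ab_c.items() if b)

print(P_ab[(True, True)], P_a[True] * P_b[True])  # equal (~0.18 each)
print(P_ab_c[(True, True)], P_a_c * P_b_c)        # clearly different
```

This is the "explaining away" effect: observing the common child $c$ couples its otherwise independent causes.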
11. **D-separation**
    - Let A, B, and C be arbitrary nonintersecting sets of nodes.
    - A path from A to B is blocked if it includes either
      - a head-to-tail or tail-to-tail node that is in C, or
      - a head-to-head node such that neither the node nor any of its descendants is in C.
    - A is d-separated from B by C if every possible path from A to B is blocked.
    - In the slide's two example graphs, $a \not\perp b \mid c$ (observing $c$, a descendant of the head-to-head node, unblocks the path) and $a \perp b \mid f$ (observing the tail-to-tail node $f$ blocks it).

12. **Conditional Independence Relations**
    - A node is conditionally independent of its non-descendants, given its parents.
    - A node is conditionally independent of all other nodes, given its Markov blanket (its parents, its children, and its children's other parents).
    - In general, d-separation is used for deciding independence.

13. **Probabilistic Inference in Bayesian Networks**
    - Notation: $X$ is the query variable; $\mathbf{E}$ is the set of evidence variables $E_1, \ldots, E_m$; $\mathbf{e}$ is a particular set of observed evidence values.
    - Goal: compute the posterior probability distribution $P(X \mid \mathbf{e})$.
    - Exact inference: inference by enumeration; the variable elimination algorithm.
    - Approximate inference: direct sampling methods; the Markov chain Monte Carlo (MCMC) algorithm.

14. **Exact Inference in Bayesian Networks: Inference by Enumeration**
    - $P(X \mid \mathbf{e}) = \alpha\, P(X, \mathbf{e}) = \alpha \sum_{\mathbf{y}} P(X, \mathbf{e}, \mathbf{y})$, where $\mathbf{y}$ ranges over the hidden variables.
    - Recall $P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid parents(X_i))$, so the answer is computed as sums of products of conditional probabilities.
    - In the burglary example:
      $$P(B \mid j, m) = \alpha\, P(B, j, m) = \alpha \sum_e \sum_a P(B, e, a, j, m)$$
      $$P(b \mid j, m) = \alpha \sum_e \sum_a P(b)P(e)P(a \mid b, e)P(j \mid a)P(m \mid a) = \alpha\, P(b) \sum_e P(e) \sum_a P(a \mid b, e)P(j \mid a)P(m \mid a)$$
    - $O(2^n)$ time complexity for $n$ Boolean variables.

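The enumeration above translates directly into Python. The dict-based CPT encoding and function names are my own framing, but the numbers are slide 5's and the query is the slide's $P(B \mid j, m)$:

```python
import itertools

# Burglary-network CPTs from slide 5; variables ordered topologically.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def joint(b, e, a, j, m):
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

def enumerate_b_given_jm():
    """P(B | j, m) = alpha * sum_e sum_a P(B, e, a, j, m)."""
    unnorm = {b: sum(joint(b, e, a, True, True)
                     for e, a in itertools.product((True, False), repeat=2))
              for b in (True, False)}
    alpha = 1.0 / sum(unnorm.values())
    return {b: alpha * p for b, p in unnorm.items()}

print(enumerate_b_given_jm()[True])  # ~0.284
```

Enumerating the two hidden variables here is cheap, but the nested sums are what the $O(2^n)$ bound on the slide is counting.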
15. **Exact Inference in Bayesian Networks: Variable Elimination Algorithm**
    - Eliminates the repeated calculations of enumeration. In
      $$P(B \mid j, m) = \alpha\, P(B) \sum_e P(E) \sum_a P(a \mid B, e)P(j \mid a)P(m \mid a),$$
      the products $P(j \mid a)P(m \mid a)$ are recomputed for every value of $e$.

16. **Exact Inference in Bayesian Networks: Variable Elimination Algorithm (cont'd)**
    - Evaluate in right-to-left order (bottom-up):
      $$P(B \mid j, m) = \alpha\, P(B) \sum_e P(E) \sum_a P(a \mid B, e)P(j \mid a)P(m \mid a)$$
    - Each part of the expression becomes a factor, e.g.
      $$\mathbf{f}_J(A) = \begin{pmatrix} P(j \mid a) \\ P(j \mid \neg a) \end{pmatrix}, \qquad \mathbf{f}_M(A) = \begin{pmatrix} P(m \mid a) \\ P(m \mid \neg a) \end{pmatrix}$$
    - Pointwise product:
      $$\mathbf{f}_{JM}(A) = \mathbf{f}_J(A) \times \mathbf{f}_M(A) = \begin{pmatrix} P(j \mid a)P(m \mid a) \\ P(j \mid \neg a)P(m \mid \neg a) \end{pmatrix}$$
    - Summing out variables:
      $$\mathbf{f}_{\bar{A}JM}(B, E) = \sum_a \mathbf{f}_A(a, B, E) \times \mathbf{f}_J(a) \times \mathbf{f}_M(a)$$
      $$\mathbf{f}_{\bar{E}\bar{A}JM}(B) = \sum_e \mathbf{f}_E(e) \times \mathbf{f}_{\bar{A}JM}(B, e)$$
      $$P(B \mid j, m) = \alpha\, \mathbf{f}_B(B) \times \mathbf{f}_{\bar{E}\bar{A}JM}(B)$$

17. **Exact Inference in Bayesian Networks: Variable Elimination Algorithm (cont'd)**
    - Repeatedly remove any leaf node that is not a query variable or an evidence variable.
    - In the burglary example, computing $P(J \mid B = true)$:
      $$P(J \mid b) = \alpha\, P(b) \sum_e P(e) \sum_a P(a \mid b, e)\, P(J \mid a) \sum_m P(m \mid a) = \alpha\, P(b) \sum_e P(e) \sum_a P(a \mid b, e)\, P(J \mid a),$$
      since $\sum_m P(m \mid a) = 1$: the leaf $M$ is irrelevant to the query and can be removed.
    - Time and space complexity are dominated by the size of the largest factor; in the worst case both are exponential.

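The two factor operations above, pointwise product and summing out, can be sketched with a factor represented as a dict from assignment tuples to values. This representation and the function names are my own, not from the slides:

```python
from itertools import product

def pointwise_product(vars1, f1, vars2, f2):
    """Combine two factors; shared variables must take the same value."""
    out_vars = vars1 + tuple(v for v in vars2 if v not in vars1)
    out = {}
    for assign in product((True, False), repeat=len(out_vars)):
        env = dict(zip(out_vars, assign))
        v1 = f1[tuple(env[v] for v in vars1)]
        v2 = f2[tuple(env[v] for v in vars2)]
        out[assign] = v1 * v2
    return out_vars, out

def sum_out(var, vars_, f):
    """Eliminate var from factor f by summing over its values."""
    keep = tuple(v for v in vars_ if v != var)
    out = {}
    for assign, val in f.items():
        key = tuple(a for v, a in zip(vars_, assign) if v != var)
        out[key] = out.get(key, 0.0) + val
    return keep, out

# f_J(A) and f_M(A) from the burglary example, then f_JM = f_J x f_M.
fJ = {(True,): 0.90, (False,): 0.05}
fM = {(True,): 0.70, (False,): 0.01}
vars_, fJM = pointwise_product(("A",), fJ, ("A",), fM)
print(fJM[(True,)])  # 0.9 * 0.7 = 0.63

vars2, fsum = sum_out("A", vars_, fJM)
print(fsum[()])  # ~0.6305
```

Eliminating a variable is then `sum_out` applied to the pointwise product of every factor that mentions it, working right to left through the expression as on slide 16.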
18. **Approximate Inference in Bayesian Networks: Direct Sampling Methods**
    - Generate samples from a known probability distribution, sampling each variable in topological order.

          function Prior-Sample(bn) returns an event sampled from the prior specified by bn
            inputs: bn, a Bayesian network specifying joint distribution P(X1, ..., Xn)
            x <- an event with n elements
            for i = 1 to n do
              x_i <- a random sample from P(X_i | parents(X_i))
            return x

    - Let $S_{PS}(x_1, \ldots, x_n)$ be the probability that Prior-Sample generates a specific event; then
      $$S_{PS}(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid parents(X_i)) = P(x_1, \ldots, x_n)$$
      $$\lim_{N \to \infty} \frac{N_{PS}(x_1, \ldots, x_n)}{N} = S_{PS}(x_1, \ldots, x_n) = P(x_1, \ldots, x_n) \quad \text{(consistent estimate)},$$
      where $N_{PS}(x_1, \ldots, x_n)$ is the frequency of the event $x_1, \ldots, x_n$ among $N$ samples.

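The Prior-Sample pseudocode translates almost line for line into Python. The list-of-(name, parents, CPT) encoding of the network is my own choice, with each CPT entry giving $P(X{=}\text{true} \mid parents)$:

```python
import random

# Prior-Sample for the burglary network: sample each variable in
# topological order from P(X_i | parents(X_i)).
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
network = [("B", (), {(): 0.001}), ("E", (), {(): 0.002}),
           ("A", ("B", "E"), P_A),
           ("J", ("A",), {(True,): 0.90, (False,): 0.05}),
           ("M", ("A",), {(True,): 0.70, (False,): 0.01})]

def prior_sample(bn, rng=random):
    x = {}
    for name, parents, cpt in bn:
        p_true = cpt[tuple(x[p] for p in parents)]
        x[name] = rng.random() < p_true
    return x

event = prior_sample(network)
print(event)  # a complete assignment, e.g. {'B': False, 'E': False, ...}
```

Because the network list is in topological order, every parent already has a value by the time its child is sampled.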
19. **Approximate Inference in Bayesian Networks: Rejection Sampling**
    - Reject samples that are inconsistent with the evidence, and estimate by counting how often $X = x$ occurs among the remaining samples:
      $$\hat{P}(X \mid \mathbf{e}) = \alpha\, N_{PS}(X, \mathbf{e}) = \frac{N_{PS}(X, \mathbf{e})}{N_{PS}(\mathbf{e})} \approx \frac{P(X, \mathbf{e})}{P(\mathbf{e})} = P(X \mid \mathbf{e}) \quad \text{(consistent estimate)}$$
    - The fraction of rejected samples grows exponentially as the number of evidence variables grows.

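A sketch of rejection sampling on the burglary network, reusing the same network encoding as the previous sketch (again my own choice): estimate $P(M \mid j)$ by discarding every sample in which JohnCalls is false:

```python
import random

# Rejection sampling: draw prior samples, keep only those consistent
# with the evidence, then count the query variable among the survivors.
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
network = [("B", (), {(): 0.001}), ("E", (), {(): 0.002}),
           ("A", ("B", "E"), P_A),
           ("J", ("A",), {(True,): 0.90, (False,): 0.05}),
           ("M", ("A",), {(True,): 0.70, (False,): 0.01})]

def prior_sample(bn, rng):
    x = {}
    for name, parents, cpt in bn:
        x[name] = rng.random() < cpt[tuple(x[p] for p in parents)]
    return x

def rejection_sample(query, evidence, bn, n, rng):
    counts = {True: 0, False: 0}
    for _ in range(n):
        x = prior_sample(bn, rng)
        if all(x[v] == val for v, val in evidence.items()):
            counts[x[query]] += 1   # consistent sample: count it
    total = counts[True] + counts[False]
    return counts[True] / total if total else float("nan")

rng = random.Random(42)
est = rejection_sample("M", {"J": True}, network, 100_000, rng)
print(est)  # ~0.04 = P(M=true | j)
```

Even with a single evidence variable, only roughly 5% of the samples survive here (since $P(j) \approx 0.05$), which illustrates the exponential-rejection weakness the slide points out.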
20. **Approximate Inference in Bayesian Networks: Likelihood Weighting**
    - Generates only events consistent with the evidence: fixes the values of the evidence variables $\mathbf{E}$ and samples only the remaining variables $X$ and $\mathbf{Y}$.

          function Likelihood-Weighting(X, e, bn, N) returns an estimate of P(X | e)
            local variables: W, a vector of weighted counts over X, initially zero
            for i = 1 to N do
              x, w <- Weighted-Sample(bn, e)
              W[x] <- W[x] + w, where x is the value of X in x
            return Normalize(W[X])

          function Weighted-Sample(bn, e) returns an event and a weight
            x <- an event with n elements; w <- 1
            for i = 1 to n do
              if X_i has a value x_i in e
                then w <- w * P(X_i = x_i | parents(X_i))
                else x_i <- a random sample from P(X_i | parents(X_i))
            return x, w

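The two functions above can be sketched in Python with the same hypothetical network encoding as the earlier sketches: evidence variables are fixed and folded into the weight, everything else is sampled from its CPT:

```python
import random

# Likelihood weighting for the burglary network, following the
# Weighted-Sample pseudocode: evidence contributes to the weight.
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
network = [("B", (), {(): 0.001}), ("E", (), {(): 0.002}),
           ("A", ("B", "E"), P_A),
           ("J", ("A",), {(True,): 0.90, (False,): 0.05}),
           ("M", ("A",), {(True,): 0.70, (False,): 0.01})]

def weighted_sample(bn, evidence, rng):
    x, w = {}, 1.0
    for name, parents, cpt in bn:
        p_true = cpt[tuple(x[p] for p in parents)]
        if name in evidence:
            x[name] = evidence[name]
            w *= p_true if x[name] else 1 - p_true
        else:
            x[name] = rng.random() < p_true
    return x, w

def likelihood_weighting(query, evidence, bn, n, rng):
    W = {True: 0.0, False: 0.0}
    for _ in range(n):
        x, w = weighted_sample(bn, evidence, rng)
        W[x[query]] += w
    return W[True] / (W[True] + W[False])

rng = random.Random(0)
est = likelihood_weighting("B", {"J": True, "M": True}, network, 200_000, rng)
print(est)  # ~0.284 = P(B=true | j, m)
```

No sample is ever rejected; instead, samples that fit the evidence poorly simply carry small weights.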
21. **Approximate Inference in Bayesian Networks: Likelihood Weighting (cont'd)**
    - Sampling distribution $S_{WS}$ of Weighted-Sample, with $\mathbf{Z} = \{X\} \cup \mathbf{Y}$:
      $$S_{WS}(\mathbf{z}, \mathbf{e}) = \prod_{i=1}^{l} P(z_i \mid parents(Z_i))$$
    - The likelihood weight:
      $$w(\mathbf{z}, \mathbf{e}) = \prod_{i=1}^{m} P(e_i \mid parents(E_i))$$
    - Weighted probability of a sample:
      $$S_{WS}(\mathbf{z}, \mathbf{e})\, w(\mathbf{z}, \mathbf{e}) = \prod_{i=1}^{l} P(z_i \mid parents(Z_i)) \prod_{i=1}^{m} P(e_i \mid parents(E_i)) = P(\mathbf{z}, \mathbf{e})$$

22. **Approximate Inference in Bayesian Networks: Markov Chain Monte Carlo Algorithm**
    - Generates each event by a random change to one of the nonevidence variables $Z_i$, with $Z_i$ conditioned on the current values of the variables in its Markov blanket.
    - A state specifies a value for every variable; the long-run fraction of time spent in each state converges to its posterior probability, yielding $P(X \mid \mathbf{e})$.

          function MCMC-Ask(X, e, bn, N) returns an estimate of P(X | e)
            local variables: N[X], a vector of counts over X, initially zero
              Z, the nonevidence variables in bn
              x, the current state of the network, initially copied from e
            initialize x with random values for the variables in Z
            for j = 1 to N do
              for each Z_i in Z do
                sample the value of Z_i in x from P(Z_i | mb(Z_i)),
                  given the values of mb(Z_i) in x
                N[x] <- N[x] + 1, where x is the value of X in x
            return Normalize(N[X])

23. **Approximate Inference in Bayesian Networks: Markov Chain Monte Carlo Algorithm (cont'd)**
    - The algorithm defines a Markov chain on the state space; let $q(\mathbf{x} \to \mathbf{x}')$ be the probability of a transition from state $\mathbf{x}$ to state $\mathbf{x}'$.
    - Consistency: let $\overline{x}_i$ be the values of all the hidden variables other than $X_i$; then
      $$q(\mathbf{x} \to \mathbf{x}') = q((x_i, \overline{x}_i) \to (x_i', \overline{x}_i)) = P(x_i' \mid \overline{x}_i, \mathbf{e}),$$
      called the Gibbs sampler.
    - The Markov chain has reached its stationary distribution if it satisfies detailed balance.

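MCMC-Ask with the Gibbs sampler can be sketched as follows: each nonevidence variable is resampled from $P(Z_i \mid mb(Z_i))$, which is proportional to $P(z_i \mid parents(Z_i)) \prod_{Y \in children(Z_i)} P(y \mid parents(Y))$. The network encoding and the explicit `children` table are my own framing:

```python
import random

# Gibbs sampling (MCMC-Ask) on the burglary network.
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
network = {"B": ((), {(): 0.001}), "E": ((), {(): 0.002}),
           "A": (("B", "E"), P_A),
           "J": (("A",), {(True,): 0.90, (False,): 0.05}),
           "M": (("A",), {(True,): 0.70, (False,): 0.01})}
children = {"B": ["A"], "E": ["A"], "A": ["J", "M"], "J": [], "M": []}

def cond_prob(var, value, x):
    """P(var = value | parents(var)) under the current state x."""
    parents, cpt = network[var]
    p_true = cpt[tuple(x[p] for p in parents)]
    return p_true if value else 1 - p_true

def gibbs(query, evidence, n, rng):
    nonevidence = [v for v in network if v not in evidence]
    x = dict(evidence)
    for v in nonevidence:                 # random initial state for Z
        x[v] = rng.random() < 0.5
    counts = {True: 0, False: 0}
    for _ in range(n):
        for z in nonevidence:
            # unnormalized P(z | markov blanket of z)
            p = {}
            for val in (True, False):
                x[z] = val
                p[val] = cond_prob(z, val, x)
                for c in children[z]:
                    p[val] *= cond_prob(c, x[c], x)
            x[z] = rng.random() < p[True] / (p[True] + p[False])
        counts[x[query]] += 1
    return counts[True] / n

rng = random.Random(1)
est = gibbs("B", {"J": True, "M": True}, 50_000, rng)
print(est)  # ~0.284 = P(B=true | j, m)
```

Only the Markov blanket of the resampled variable is consulted at each step, which is what makes Gibbs updates cheap even in large networks.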
24. **Summary**
    - Bayesian network: a directed acyclic graph expressing causal relationships.
    - Conditional independence: characterized by the d-separation property.
    - Inference in Bayesian networks:
      - Enumeration: intractable
      - Variable elimination: efficient, but sensitive to topology
      - Direct sampling: estimates posterior probabilities
      - MCMC: a powerful method for computing with probability models

25. **References**
    - [1] Stuart Russell et al., "Probabilistic Reasoning", Artificial Intelligence: A Modern Approach, Chapter 14, pp. 492-519.
    - [2] Eugene Charniak, "Bayesian Networks without Tears", 1991.
    - [3] C. Bishop, "Graphical Models", Pattern Recognition and Machine Learning, Chapter 8, pp. 359-418.

26. **Q&A**. Thank you.

27. **Appendix 1: Example of Bad Node Ordering**
    - Ordering the nodes as MaryCalls, JohnCalls, Alarm, Burglary, Earthquake yields a network with two more links and requires unnatural probability judgments.

28. **Appendix 2: Consistency of Likelihood Weighting**
    - From Likelihood-Weighting,
      $$\hat{P}(x \mid \mathbf{e}) = \alpha \sum_{\mathbf{y}} N_{WS}(x, \mathbf{y}, \mathbf{e})\, w(x, \mathbf{y}, \mathbf{e}) \approx \alpha' \sum_{\mathbf{y}} S_{WS}(x, \mathbf{y}, \mathbf{e})\, w(x, \mathbf{y}, \mathbf{e}) \quad \text{for large } N$$
      $$= \alpha' \sum_{\mathbf{y}} P(x, \mathbf{y}, \mathbf{e}) = \alpha'\, P(x, \mathbf{e}) = P(x \mid \mathbf{e}) \quad \text{(consistent estimate)}$$

29. **Appendix 3: State Distribution of MCMC**
    - Detailed balance: let $\pi_t(\mathbf{x})$ be the probability of the system being in state $\mathbf{x}$ at time $t$; detailed balance requires
      $$\pi(\mathbf{x})\, q(\mathbf{x} \to \mathbf{x}') = \pi(\mathbf{x}')\, q(\mathbf{x}' \to \mathbf{x}) \quad \text{for all } \mathbf{x}, \mathbf{x}'.$$
    - For the Gibbs sampler, $q(\mathbf{x} \to \mathbf{x}') = q((x_i, \overline{x}_i) \to (x_i', \overline{x}_i)) = P(x_i' \mid \overline{x}_i, \mathbf{e})$:
      $$\pi(\mathbf{x})\, q(\mathbf{x} \to \mathbf{x}') = P(\mathbf{x} \mid \mathbf{e})\, P(x_i' \mid \overline{x}_i, \mathbf{e}) = P(x_i \mid \overline{x}_i, \mathbf{e})\, P(\overline{x}_i \mid \mathbf{e})\, P(x_i' \mid \overline{x}_i, \mathbf{e}) \quad \text{(chain rule on } P(\mathbf{x} \mid \mathbf{e}))$$
      $$= P(x_i \mid \overline{x}_i, \mathbf{e})\, P(x_i', \overline{x}_i \mid \mathbf{e}) \quad \text{(backwards chain rule)} \;=\; q(\mathbf{x}' \to \mathbf{x})\, \pi(\mathbf{x}')$$
    - Stationary distribution if $\pi_t = \pi_{t+1}$:
      $$\pi_{t+1}(\mathbf{x}') = \sum_{\mathbf{x}} \pi_t(\mathbf{x})\, q(\mathbf{x} \to \mathbf{x}') = \sum_{\mathbf{x}} \pi(\mathbf{x}')\, q(\mathbf{x}' \to \mathbf{x}) = \pi(\mathbf{x}') \sum_{\mathbf{x}} q(\mathbf{x}' \to \mathbf{x}) = \pi(\mathbf{x}')$$
