1. Bayesian Networks
CSC 371: Spring 2012
2. Today’s Lecture
• Recap: Joint distribution, independence, marginal independence, conditional independence
• Bayesian networks
• Reading:
  – Sections 14.1–14.4 in AIMA [Russell & Norvig]
3. Marginal Independence
• Intuitively: if X ╨ Y, then
  – learning that Y=y does not change your belief in X
  – and this is true for all values y that Y could take
• For example, weather is marginally independent of the result of a coin toss
4. Marginal Independence
5. Conditional Independence
• Intuitively: if X ╨ Y | Z, then
  – learning that Y=y does not change your belief in X when we already know Z=z
  – and this is true for all values y that Y could take and all values z that Z could take
• For example, ExamGrade ╨ AssignmentGrade | UnderstoodMaterial
6. Conditional Independence
7. “…probability theory is more fundamentally concerned with the structure of reasoning and causation than with numbers.”
   Glenn Shafer and Judea Pearl, Introduction to Readings in Uncertain Reasoning, Morgan Kaufmann, 1990
8. Bayesian Network Motivation
• We want a representation and reasoning system that is based on conditional (and marginal) independence
  – Compact yet expressive representation
  – Efficient reasoning procedures
• Bayesian (Belief) Networks are such a representation
  – Named after Thomas Bayes (ca. 1702–1761)
  – Term coined in 1985 by Judea Pearl (1936– )
  – Their invention changed the primary focus of AI from logic to probability!
9. Bayesian Networks: Intuition
• A graphical representation for a joint probability distribution
  – Nodes are random variables
    • Can be assigned (observed) or unassigned (unobserved)
  – Arcs are interactions between nodes
    • Encode conditional independence
    • An arrow from one variable to another indicates direct influence
    • Directed arcs between nodes reflect dependence
  – A compact specification of full joint distributions
• Some informal examples (shown as small diagrams with nodes: Fire, Smoking At Sensor, Alarm; Understood Material, Assignment Grade, Exam Grade)
10. Example of a simple Bayesian network
Graph: A and B are the parents of C.
  p(A,B,C) = p(A) p(B) p(C|A,B)
• Probability model has a simple factored form
• Directed edges => direct dependence
• Absence of an edge => conditional independence
• Also known as belief networks, graphical models, causal networks
• Other formulations, e.g., undirected graphical models
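To make the factored form concrete, here is a minimal Python sketch (not from the slides) that evaluates p(A,B,C) = p(A) p(B) p(C|A,B) for binary variables; the table entries are made-up numbers used only for illustration:

```python
# Hypothetical CPTs for the network A -> C <- B with binary variables.
p_A = {True: 0.3, False: 0.7}              # p(A)
p_B = {True: 0.6, False: 0.4}              # p(B)
p_C_given_AB = {                           # p(C=True | A, B)
    (True, True): 0.9, (True, False): 0.5,
    (False, True): 0.4, (False, False): 0.1,
}

def joint(a, b, c):
    """p(A=a, B=b, C=c) computed from the three factors."""
    p_c = p_C_given_AB[(a, b)] if c else 1 - p_C_given_AB[(a, b)]
    return p_A[a] * p_B[b] * p_c

# Sanity check: the factored joint sums to 1 over all 8 assignments.
print(sum(joint(a, b, c)
          for a in (True, False) for b in (True, False) for c in (True, False)))
```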
11. Bayesian Networks: Definition
12. Bayesian Networks: Definition
• Discrete Bayesian networks:
  – Domain of each variable is finite
  – Conditional probability distribution is a conditional probability table
  – We will assume this discrete case
    • But everything we say about independence (marginal & conditional) carries over to the continuous case
13. Examples of 3-way Bayesian Networks
Graph: A, B, C with no edges.
Marginal Independence: p(A,B,C) = p(A) p(B) p(C)
14. Examples of 3-way Bayesian Networks
Graph: A is the parent of both B and C.
Conditionally independent effects: p(A,B,C) = p(B|A) p(C|A) p(A)
B and C are conditionally independent given A
e.g., A is a disease, and we model B and C as conditionally independent symptoms given A
15. Examples of 3-way Bayesian Networks
Graph: A and B are both parents of C.
Independent Causes: p(A,B,C) = p(C|A,B) p(A) p(B)
“Explaining away” effect: given C, observing A makes B less likely
e.g., earthquake/burglary/alarm example
A and B are (marginally) independent but become dependent once C is known
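The “explaining away” effect can be checked numerically. The sketch below (not from the slides) uses made-up CPT values in which C behaves roughly like a noisy OR of its two independent causes A and B:

```python
# Hypothetical CPTs for the v-structure A -> C <- B (all variables binary).
p_A1, p_B1 = 0.1, 0.1                       # P(A=1), P(B=1)
p_C1 = {(1, 1): 0.95, (1, 0): 0.90,         # P(C=1 | A, B)
        (0, 1): 0.90, (0, 0): 0.01}

def joint(a, b, c):
    """p(A,B,C) = p(C|A,B) p(A) p(B)."""
    pa = p_A1 if a else 1 - p_A1
    pb = p_B1 if b else 1 - p_B1
    pc = p_C1[(a, b)] if c else 1 - p_C1[(a, b)]
    return pc * pa * pb

# P(B=1 | C=1): observing the common effect raises belief in B (prior was 0.1).
num = sum(joint(a, 1, 1) for a in (0, 1))
den = sum(joint(a, b, 1) for a in (0, 1) for b in (0, 1))
print(num / den)                                             # ~0.50

# P(B=1 | C=1, A=1): once the other cause A is observed, B is "explained away".
print(joint(1, 1, 1) / sum(joint(1, b, 1) for b in (0, 1)))  # ~0.10
```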
16. Examples of 3-way Bayesian Networks
Graph: A -> B -> C (a chain).
Markov dependence: p(A,B,C) = p(C|B) p(B|A) p(A)
17. Example: Burglar Alarm
• I have a burglar alarm that is sometimes set off by minor earthquakes. My two neighbors, John and Mary, promised to call me at work if they hear the alarm
  – Example inference task: suppose Mary calls and John doesn’t call. What is the probability of a burglary?
• What are the random variables?
  – Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
  18. 18. Example 5 binary variables:  B = a burglary occurs at your house  E = an earthquake occurs at your house  A = the alarm goes off  J = John calls to report the alarm  M = Mary calls to report the alarm What is P(B | M, J) ? (for example)  We can use the full joint distribution to answer this question  Requires 25 = 32 probabilities Can we use prior domain knowledge to come up with a Bayesian network that requires fewer probabilities? What are the direct influence relationships?  A burglary can set the alarm off  An earthquake can set the alarm off  The alarm can cause Mary to call  The alarm can cause John to call
19. Example: Burglar Alarm
What are the model parameters?
20. Conditional probability distributions
• To specify the full joint distribution, we need to specify a conditional distribution for each node given its parents: P(X | Parents(X))
• Diagram: a node X with parents Z1, Z2, …, Zn, annotated with P(X | Z1, …, Zn)
21. Example: Burglar Alarm
22. The joint probability distribution
• For each node Xi, we know P(Xi | Parents(Xi))
• How do we get the full joint distribution P(X1, …, Xn)?
• Using the chain rule:
  P(X1, …, Xn) = ∏_{i=1..n} P(Xi | X1, …, X_{i−1}) = ∏_{i=1..n} P(Xi | Parents(Xi))
• For example, P(j, m, a, ¬b, ¬e) = P(¬b) P(¬e) P(a | ¬b, ¬e) P(j | a) P(m | a)
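The slides’ parameter tables were not captured in this transcript, so the sketch below fills them in with the standard values from the AIMA burglary example (Russell & Norvig); treat the numbers as an assumption. It evaluates the slide’s example entry via the factorization:

```python
# Assumed CPTs (standard AIMA burglary-network values, not taken from these slides).
P_B = 0.001                                   # P(Burglary)
P_E = 0.002                                   # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm=true | B, E)
P_J = {True: 0.90, False: 0.05}               # P(JohnCalls=true | Alarm)
P_M = {True: 0.70, False: 0.01}               # P(MaryCalls=true | Alarm)

def joint(b, e, a, j, m):
    """P(B,E,A,J,M) = P(B) P(E) P(A|B,E) P(J|A) P(M|A)."""
    pb = P_B if b else 1 - P_B
    pe = P_E if e else 1 - P_E
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return pb * pe * pa * pj * pm

# The slide's example entry: P(j, m, a, ¬b, ¬e)
print(joint(b=False, e=False, a=True, j=True, m=True))   # ≈ 0.00063
```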
23. Constructing a Bayesian Network: Step 1
• Order the variables in terms of causality (may be a partial order)
  e.g., {E, B} -> {A} -> {J, M}
• P(J, M, A, E, B)
  = P(J, M | A, E, B) P(A | E, B) P(E, B)
  ~ P(J, M | A) P(A | E, B) P(E) P(B)
  ~ P(J | A) P(M | A) P(A | E, B) P(E) P(B)
• These CI assumptions are reflected in the graph structure of the Bayesian network
24. Constructing this Bayesian Network: Step 2
• P(J, M, A, E, B) = P(J | A) P(M | A) P(A | E, B) P(E) P(B)
• There are 3 conditional probability tables (CPTs) to be determined: P(J | A), P(M | A), P(A | E, B)
  – Requiring 2 + 2 + 4 = 8 probabilities
• And 2 marginal probabilities P(E), P(B) -> 2 more probabilities
• 2 + 2 + 4 + 1 + 1 = 10 numbers (vs. 2^5 - 1 = 31)
• Where do these probabilities come from?
  – Expert knowledge
  – From data (relative frequency estimates)
  – Or a combination of both
25. Number of Probabilities in Bayesian Networks
• Consider n binary variables
• Unconstrained joint distribution requires O(2^n) probabilities
• If we have a Bayesian network with a maximum of k parents for any node, then we need O(n 2^k) probabilities
• Example
  – Full unconstrained joint distribution
    • n = 30: need about 10^9 probabilities for the full joint distribution
  – Bayesian network
    • n = 30, k = 4: need 480 probabilities
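The counting argument is easy to reproduce in a couple of lines (assuming binary variables, as on the slide):

```python
def full_joint_size(n):
    """Number of probabilities in an unconstrained joint over n binary variables."""
    return 2 ** n

def bayes_net_size(n, k):
    """Upper bound for a Bayesian network: each of the n nodes has at most k
    binary parents, so its CPT needs at most 2**k numbers."""
    return n * 2 ** k

print(full_joint_size(30))      # 1073741824, i.e. about 10^9
print(bayes_net_size(30, 4))    # 480
```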
26. Constructing Bayesian networks
1. Choose an ordering of variables X1, …, Xn
2. For i = 1 to n
  – add Xi to the network
  – select parents from X1, …, Xi−1 such that P(Xi | Parents(Xi)) = P(Xi | X1, …, Xi−1)
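A sketch of this procedure in Python is given below. The is_independent function is a hypothetical oracle standing in for the conditional-independence judgments a domain expert (or a statistical test) would supply, and the greedy dropping of unneeded predecessors is just one simple way to realize the parent-selection step:

```python
def construct_network(ordering, is_independent):
    """Pick a parent set for each variable, following the given ordering.

    ordering:       list of variables X1, ..., Xn
    is_independent: hypothetical oracle; is_independent(X, Y, given) says whether
                    also conditioning on Y leaves P(X | given) unchanged
    """
    parents = {}
    for i, X in enumerate(ordering):
        chosen = list(ordering[:i])          # start with all predecessors
        for Y in ordering[:i]:
            rest = [Z for Z in chosen if Z != Y]
            if is_independent(X, Y, rest):   # Y not needed given the rest
                chosen = rest
        parents[X] = chosen                  # so P(X | parents) = P(X | predecessors)
    return parents
```

With the causal ordering E, B, A, J, M and an oracle encoding the alarm domain this recovers the network of slide 23; with the ordering M, J, A, B, E it yields the less compact structure worked through on the next slides.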
27–34. Example (answers revealed one at a time across slides 27–34)
• Suppose we choose the ordering M, J, A, B, E
  P(J | M) = P(J)?                  No
  P(A | J, M) = P(A)?               No
  P(A | J, M) = P(A | J)?           No
  P(A | J, M) = P(A | M)?           No
  P(B | A, J, M) = P(B)?            No
  P(B | A, J, M) = P(B | A)?        Yes
  P(E | B, A, J, M) = P(E)?         No
  P(E | B, A, J, M) = P(E | A, B)?  Yes
35. Example contd.
• Deciding conditional independence is hard in noncausal directions
  – The causal direction seems much more natural
• Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed
36. A more realistic Bayes Network: Car diagnosis
• Initial observation: car won’t start
• Orange: “broken, so fix it” nodes
• Green: testable evidence
• Gray: “hidden variables” to ensure sparse structure and reduce parameters
37. The Bayesian Network from a different Variable Ordering
38. Given a graph, can we “read off” conditional independencies?
39. Are there wrong network structures?
• Some variable orderings yield more compact structures, some less compact
  – Compact ones are better
  – But all representations resulting from this process are correct
  – One extreme: the fully connected network is always correct but rarely the best choice
• How can a network structure be wrong?
  – If it misses directed edges that are required
40. Summary
• Bayesian networks provide a natural representation for (causally induced) conditional independence
• Topology + conditional probability tables
• Generally easy for domain experts to construct
41. Probabilistic inference
• A general scenario:
  – Query variables: X
  – Evidence (observed) variables: E = e
  – Unobserved variables: Y
• If we know the full joint distribution P(X, E, Y), how can we perform inference about X?
  P(X | E = e) = P(X, e) / P(e) ∝ Σ_y P(X, e, y)
• Problems
  – Full joint distributions are too large
  – Marginalizing out Y may involve too many summation terms
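As a concrete instance of the formula above, here is a minimal inference-by-enumeration sketch for the burglary network, reusing the assumed AIMA CPT values from the earlier chain-rule example (the slides’ own tables were not captured). It answers slide 17’s query, “Mary calls and John doesn’t call: what is the probability of a burglary?”:

```python
from itertools import product

# Assumed CPTs (standard AIMA burglary-network values, not from these slides).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def joint(b, e, a, j, m):
    """P(B,E,A,J,M) = P(B) P(E) P(A|B,E) P(J|A) P(M|A)."""
    pb = P_B if b else 1 - P_B
    pe = P_E if e else 1 - P_E
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return pb * pe * pa * pj * pm

def prob_burglary(j, m):
    """P(B=true | J=j, M=m), summing the hidden variables E and A out of the
    joint: P(X | e) ∝ Σ_y P(X, e, y)."""
    unnorm = {b: sum(joint(b, e, a, j, m)
                     for e, a in product((True, False), repeat=2))
              for b in (True, False)}
    return unnorm[True] / (unnorm[True] + unnorm[False])

print(prob_burglary(j=False, m=True))   # ≈ 0.007  (slide 17's query)
print(prob_burglary(j=True,  m=True))   # ≈ 0.284  (both neighbors call)
```

Enumerating every assignment of the hidden variables like this is exactly the “too many summation terms” problem noted on the slide.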
42. Conclusions…
• Full joint distributions are intractable to work with
  – Conditional independence assumptions allow us to model real-world phenomena with much simpler models
  – Bayesian networks are a systematic way to construct parsimonious structured distributions
• How do we do inference (reasoning) in Bayesian networks?
  – Systematic algorithms exist
  – Complexity depends on the structure of the graph