1. Bayesian Networks: A Brief Introduction
Adnan Masood
scis.nova.edu/~adnan | adnan@nova.edu
Doctoral Candidate, Nova Southeastern University
2. What is a Bayesian Network?
A Bayesian network (BN) is a graphical model for depicting probabilistic relationships among a set of variables.
- A BN encodes the conditional independence relationships between the variables in the graph structure.
- It provides a compact representation of the joint probability distribution over the variables.
- A problem domain is modeled by a list of variables X1, ..., Xn.
- Knowledge about the problem domain is represented by a joint probability P(X1, ..., Xn).
- Directed links represent direct causal influences.
- Each node has a conditional probability table quantifying the effects of its parents.
- There are no directed cycles.
3. A Bayesian Network consists of..
- A directed acyclic graph (DAG)
- A set of conditional probability tables, one for each node in the graph
(Figure: example DAG over nodes A, B, C, D)
4. So BN = (DAG, CPD)
- DAG: directed acyclic graph (the BN's structure)
  - Nodes: random variables (typically binary or discrete, but methods also exist to handle continuous variables)
  - Arcs: indicate probabilistic dependencies between nodes (the lack of a link signifies conditional independence)
- CPD: conditional probability distribution (the BN's parameters)
  - Conditional probabilities at each node, usually stored as a table (conditional probability table, or CPT)
5. So, what is a DAG?
- Directed acyclic graphs use only unidirectional arrows to show the direction of causation.
- Each node in the graph represents a random variable.
- DAGs follow the general graph conventions: a node A is a parent of another node B if there is an arrow from node A to node B.
- Informally, an arrow from node X to node Y means X has a direct influence on Y.
(Figure: example DAG over nodes A, B, C, D)
6. Where do all these numbers come from?
- There is a set of tables for each node in the network.
- Each node Xi has a conditional probability distribution P(Xi | Parents(Xi)) that quantifies the effect of the parents on the node.
- The parameters are the probabilities in these conditional probability tables (CPTs).
(Figure: example DAG over nodes A, B, C, D)
7. The infamous Burglary-Alarm Example
Structure: Burglary -> Alarm <- Earthquake; Alarm -> JohnCalls; Alarm -> MaryCalls

P(B) = 0.001        P(E) = 0.002

B E | P(A)
T T | 0.95
T F | 0.94
F T | 0.29
F F | 0.001

A | P(J)        A | P(M)
T | 0.90        T | 0.70
F | 0.05        F | 0.01
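These tables translate directly into code. A minimal sketch in pure Python (the variable names and dictionary layout are my own, not from the slides) that stores the CPTs and evaluates one entry of the full joint via the factorization the DAG encodes:

```python
# Burglary-Alarm network CPTs, copied from the slide above.
P_B = {True: 0.001, False: 0.999}            # P(Burglary)
P_E = {True: 0.002, False: 0.998}            # P(Earthquake)
P_A = {                                      # P(Alarm=T | Burglary, Earthquake)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}              # P(JohnCalls=T | Alarm)
P_M = {True: 0.70, False: 0.01}              # P(MaryCalls=T | Alarm)

def joint(b, e, a, j, m):
    """Full joint P(B, E, A, J, M) via the chain-rule factorization of the DAG."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

# e.g. P(B=F, E=F, A=T, J=T, M=T) = 0.999 * 0.998 * 0.001 * 0.9 * 0.7
print(joint(False, False, True, True, True))   # ~ 0.000628
```

Because the five CPTs fully determine the joint, summing `joint` over all 32 assignments yields 1.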
8. Cont.. Calculations on the belief network
Using the example A -> B -> {C, D} network from the earlier slides, suppose you want to calculate:
P(A = true, B = true, C = true, D = true)
= P(A = true) * P(B = true | A = true) * P(C = true | B = true) * P(D = true | B = true)
= (0.4) * (0.3) * (0.1) * (0.95)
= 0.0114
The individual numbers come from the conditional probability tables; the factorization itself comes from the graph structure.
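The product above is just the chain-rule factorization that the graph licenses. A quick sketch (the four factor values are taken from the slide; the generic A -> B -> {C, D} structure is assumed):

```python
# Factors read off the CPTs of the generic A -> B -> {C, D} network.
p_a = 0.4            # P(A = true)
p_b_given_a = 0.3    # P(B = true | A = true)
p_c_given_b = 0.1    # P(C = true | B = true)
p_d_given_b = 0.95   # P(D = true | B = true)

# The graph structure dictates which conditionals appear in the product:
# C and D each depend only on their parent B, and B only on its parent A.
joint_abcd = p_a * p_b_given_a * p_c_given_b * p_d_given_b
print(joint_abcd)    # ~ 0.0114
```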
9. So let's see how you can calculate P(John calls) if there was a burglary
Inference from cause to effect: given a burglary, what is P(J | B)?

First marginalize out Earthquake to get P(A | B):
P(A | B) = P(A | B, E) P(E) + P(A | B, ~E) P(~E)
         = (0.95)(0.002) + (0.94)(0.998)
         ~ 0.94

Then condition John's call on the alarm:
P(J | B) = P(J | A) P(A | B) + P(J | ~A) P(~A | B)
         = (0.9)(0.94) + (0.05)(0.06)
         ~ 0.85

We can similarly calculate P(M | B) = (0.7)(0.94) + (0.01)(0.06) ~ 0.66.
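The two steps can be checked numerically. A sketch in pure Python (names are mine; CPT values are from the Burglary-Alarm slide):

```python
# CPT entries from the Burglary-Alarm example.
P_E = 0.002                                                # P(Earthquake)
P_A_given_BE = {(True, True): 0.95, (True, False): 0.94}   # P(Alarm | B=T, E)

# Step 1: marginalize out Earthquake to get P(A | B=T).
p_a_given_b = (P_A_given_BE[(True, True)] * P_E
               + P_A_given_BE[(True, False)] * (1 - P_E))

# Step 2: condition each caller on the alarm state.
p_j_given_a, p_j_given_not_a = 0.90, 0.05
p_j_given_b = p_j_given_a * p_a_given_b + p_j_given_not_a * (1 - p_a_given_b)

p_m_given_a, p_m_given_not_a = 0.70, 0.01
p_m_given_b = p_m_given_a * p_a_given_b + p_m_given_not_a * (1 - p_a_given_b)

print(round(p_j_given_b, 3))   # ~ 0.849
print(round(p_m_given_b, 3))   # ~ 0.659
```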
10. Why Bayesian Networks?
- Bayesian probability represents a degree of belief in an event, while classical (frequentist) probability deals with the true or physical frequency of an event.
- Bayesian networks offer:
  - Handling of incomplete data sets
  - Learning about causal networks
  - A natural way to combine domain knowledge and data
  - An efficient and principled approach to avoiding the overfitting of data
11. What are Belief Computations?
- Belief revision: models explanatory/diagnostic tasks. Given evidence, what is the most likely hypothesis to explain the evidence? Also called abductive reasoning.
  - Example: given some evidence variables, find the state of all other variables that maximizes the probability. E.g., we know John calls but Mary does not; what is the most likely state? Only consider assignments where J = T and M = F, and maximize.
- Belief updating queries: given evidence, what is the probability of some other random variable occurring?
12. What is conditional independence?
The Markov condition says that, given its parents (P1, P2), a node (X) is conditionally independent of its non-descendants (ND1, ND2).
(Figure: node X with parents P1, P2, children C1, C2, and non-descendants ND1, ND2)
13. What is D-Separation?
- A variable a is d-separated from b by a set of variables E if there does not exist a d-connecting path between a and b, i.e., a path such that:
  - none of its linear (chain) or diverging nodes is in E, and
  - for each of its converging nodes, either the node itself or one of its descendants is in E.
- Intuition: the influence between a and b must propagate through a d-connecting path.
- If a and b are d-separated by E, then they are conditionally independent of each other given E:
  P(a, b | E) = P(a | E) * P(b | E)
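This independence claim can be checked numerically on the Burglary-Alarm network: JohnCalls and MaryCalls are d-separated by {Alarm}, because the only path between them diverges at Alarm, which is in the conditioning set. A sketch that enumerates the joint and verifies P(J, M | A) = P(J | A) P(M | A), with CPT values from the earlier slide:

```python
from itertools import product

# Burglary-Alarm CPTs.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def joint(b, e, a, j, m):
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

def prob(pred):
    """Sum the joint over all worlds (b, e, a, j, m) satisfying a predicate."""
    return sum(joint(*w) for w in product([True, False], repeat=5) if pred(*w))

p_a = prob(lambda b, e, a, j, m: a)
p_jm_given_a = prob(lambda b, e, a, j, m: a and j and m) / p_a
p_j_given_a = prob(lambda b, e, a, j, m: a and j) / p_a
p_m_given_a = prob(lambda b, e, a, j, m: a and m) / p_a

# d-separation by {Alarm} implies the factorization holds.
assert abs(p_jm_given_a - p_j_given_a * p_m_given_a) < 1e-12
```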
14. Construction of a Belief Network
Procedure for constructing a BN:
- Choose a set of variables describing the application domain.
- Choose an ordering of the variables.
- Start with an empty network and add variables to the network one by one according to the ordering.
- To add the i-th variable Xi:
  - Determine the subset pa(Xi) of the variables already in the network (X1, ..., Xi-1) such that P(Xi | X1, ..., Xi-1) = P(Xi | pa(Xi)) (domain knowledge is needed here).
  - Draw an arc from each variable in pa(Xi) to Xi.
15. What is Inference in BN?
- Using a Bayesian network to compute probabilities is called inference.
- In general, inference involves queries of the form P(X | E), where X is the query variable and E is the evidence.
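The simplest exact method, inference by enumeration, answers such queries directly from the full joint by summing out the hidden variables and normalizing. A minimal sketch on the Burglary-Alarm network for the query P(Burglary | JohnCalls = T, MaryCalls = T); the ~0.284 result matches the standard textbook answer for this network:

```python
from itertools import product

# Burglary-Alarm CPTs from the earlier slide.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def joint(b, e, a, j, m):
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

def query_burglary(j, m):
    """P(B=T | J=j, M=m): sum out hidden variables E and A, then normalize."""
    num = sum(joint(True, e, a, j, m) for e, a in product([True, False], repeat=2))
    den = num + sum(joint(False, e, a, j, m) for e, a in product([True, False], repeat=2))
    return num / den

print(round(query_burglary(True, True), 3))   # ~ 0.284
```

Enumeration is exponential in the number of hidden variables; practical systems use variable elimination or approximate methods instead, as the limitations slide notes.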
16. Representing causality in Bayesian Networks
- A causal Bayesian network, or simply a causal network, is a Bayesian network whose arcs are interpreted as indicating cause-effect relationships.
- To build a causal network:
  - Choose a set of variables that describes the domain.
  - Draw an arc to each variable from each of its direct causes (domain knowledge required).
(Figure: example network over Visit Africa, Tuberculosis, Smoking, Lung Cancer, Bronchitis, Tuberculosis-or-Lung-Cancer, X-Ray, Dyspnea)
17. Limitations of Bayesian Networks
- They typically require initial knowledge of many probabilities; the quality and extent of the prior knowledge play an important role.
- Exact inference carries a significant computational cost (it is an NP-hard task).
- Events not anticipated when the model was built are not accounted for.
18. Summary
- Bayesian methods provide a sound theory and framework for implementing classifiers.
- Bayesian networks are a natural way to represent conditional independence information: qualitative information in the links, quantitative information in the tables.
- Computing exact values is NP-hard; it is typical to make simplifying assumptions or use approximate methods.
- Many Bayesian tools and systems exist.
- Bayesian networks are an efficient and effective representation of the joint probability distribution of a set of random variables:
  - Efficient: local models; independence (d-separation)
  - Effective: algorithms take advantage of the structure to compute posterior probabilities, compute the most probable instantiation, and support decision making.
19. Bayesian Network Resources
- Repository: www.cs.huji.ac.il/labs/compbio/Repository/
- Software:
  - Infer.NET: http://research.microsoft.com/en-us/um/cambridge/projects/infernet/
  - GeNIe: genie.sis.pitt.edu
  - Hugin: www.hugin.com
  - SamIam: http://reasoning.cs.ucla.edu/samiam/
  - JavaBayes: www.cs.cmu.edu/~javabayes/Home/
  - Bayesware: www.bayesware.com
- BN info sites:
  - Bayesian Belief Network site (Russell Greiner): http://webdocs.cs.ualberta.ca/~greiner/bn.html
  - Summary of BN software and links to software sites (Kevin Murphy)
20. References and Further Reading
- Charniak, E. (1991). Bayesian Networks without Tears. http://www.cs.ubc.ca/~murphyk/Bayes/Charniak_91.pdf
- Russell, S. and Norvig, P. (1995). Artificial Intelligence: A Modern Approach. Prentice Hall.
- Weiss, S. and Kulikowski, C. (1991). Computer Systems That Learn. Morgan Kaufmann.
- Heckerman, D. (1996). A Tutorial on Learning with Bayesian Networks. Microsoft Technical Report MSR-TR-95-06.
- Internet resources on Bayesian networks and machine learning: http://www.cs.orst.edu/~wangxi/resource.html