Life and Work of Judea Pearl | Turing100@Persistent

Dr. Mukund Deshpande, Practice Head, Big Data, Persistent Systems Ltd., talks about the life and work of 2011 Turing Award recipient Judea Pearl.

Published in: Technology
  • Turing Award Lecture: “The Mechanization of Causal Inference: A ‘Mini Turing Test’ and Beyond,” http://www.youtube.com/watch?v=78EmmdfOcI8&feature=player_embedded

    1. 1. Life and Work of Judea Pearl, March 9, 2013. www.persistentsys.com © 2012 Persistent Systems Ltd
    2. 2. ACM A. M. Turing Award: Judea Pearl, United States, 2011
        Citation: “For fundamental contributions to artificial intelligence through the development of a calculus for probabilistic and causal reasoning.”
        • 35th Turing Award recipient
        • In the Turing Centennial Year
    3. 3. Quotes about Judea Pearl’s work
        • “Judea Pearl’s highly influential 1988 book (Probabilistic Reasoning in Intelligent Systems) brought probability and decision theory into AI.” From “AI becomes a science (1987–present)”, AIMA, Stuart Russell & Peter Norvig
        • “His accomplishments over the last 30 years have provided the theoretical basis for progress in artificial intelligence and led to extraordinary achievements in machine learning, and they have redefined the term ‘thinking machine’.” Vint Cerf, Turing Award recipient & President of ACM
        • “Before Pearl, most AI systems reasoned with Boolean logic – they understood true or false, but had a hard time with ‘maybe’.” Alfred Spector, VP Research at Google
    4. 4. Turing Test: the defining problem for AI
        “The computer passes the test if a human interrogator, after posing some written questions, cannot tell whether the written responses come from a person or not.” Alan Turing, Computing Machinery and Intelligence (1950)
        The computer needs to possess the following capabilities:
        • Natural Language Processing
        • Knowledge Representation
        • Automated Reasoning
        • Machine Learning
    5. 5. About Judea Pearl
    6. 6. Background
        • Born: 1936, Tel Aviv, Israel
        • Education:
            - BS, Technion, Israel, 1960
            - MS, Electronics, Newark College of Engineering, 1961
            - MS, Physics, Rutgers, 1965
            - PhD, Electrical Engineering, Polytechnic Institute of Brooklyn, 1965
        • Professional career:
            - Member of Technical Staff, RCA Research Laboratories, 1961–1965
            - Director, Electronic Memories, Inc., Hawthorne, 1966–1969
            - Faculty, University of California, Los Angeles, Computer Science Department, 1969 to date (Emeritus Faculty since 1994)
            - Director of Cognitive Systems Laboratory, UCLA (1978–)
    7. 7. Research Interests
        • Early research: magnetic and superconducting memories
        • Combinatorial search (A* search)
            - Heuristics: Intelligent Search Strategies for Computer Problem Solving
        • Probability & decision theory
            - Probabilistic Reasoning in Intelligent Systems
        • Causality & its applications in different domains
            - Causality: Models, Reasoning, and Inference
    8. 8. Daniel Pearl (1963–2002)
        • Journalist, musician
        • Wall Street Journal (South Asia Bureau Chief, 2002)
        • Kidnapped and murdered in Karachi, 2002
        • Daniel Pearl Foundation
            - Promotes cross-cultural understanding through journalism and music.
            - Formed by Ruth & Judea Pearl in 2002.
    9. 9. CAUSALITY: Understanding Cause & Effect
    10. 10. Thank You for Smoking!
    11. 11. Thank You for Smoking!
        • Nick Naylor
            - Works for the Academy of Tobacco Studies, a firm that promotes the benefits of cigarettes.
            - Evangelist for tobacco products.
        • Ortolan K. Finistirre
            - Senator from Vermont, anti-tobacco.
            - Wants to pass a resolution to put a skull and crossbones symbol on cigarette packs.
    12. 12. Does cigarette smoking cause lung cancer?
        • Cause: smoking cigarettes
        • Effect: lung cancer
        Does eating cheese lead to heart disease?
        • Cause: eating cheese
        • Effect: heart disease
    13. 13. Which of these are actually causal?
        • Eating high-protein food leads to weight loss.
        • Taking aspirin reduces the risk of heart attack.
        • Women’s empowerment reduces the population birth rate.
        • A bigger search button on a web page increases click-through.
        • Drinking milk with additives increases the height of kids.
        • Lower class size improves learning.
        • Carbon emissions cause global warming.
        • Reducing taxes increases job creation.
        • Lower interest rates lead to an improved economy.
        • Higher pay leads to reduced attrition.
    14. 14. Pearl’s Riddles of Causation
        • What patterns of experience convince people that a connection is causal?
        • What difference would it make if I told you that a certain connection is, or is not, causal?
    15. 15. Why study Cause & Effect (Causality)?
    16. 16. Why should Computer Scientists study Causality?
    17. 17. From a Pulitzer Prize–winning investigative reporter at The New York Times comes the explosive story of the rise of the processed food industry and its link to the emerging obesity epidemic. Michael Moss reveals how companies use salt, sugar, and fat to addict us and, more important, how we can fight back.
    18. 18. The Art and Science of Cause and Effect, Judea Pearl. Transcript of lecture given Thursday, October 29, 1996, UCLA 81st Faculty Research Lecture Series
    19. 19. Causality: A historical perspective
    20. 20. David Hume (1711–1776), Philosopher
        “Thus we remember to have seen that species of object we call FLAME, and to have felt that species of sensation we call HEAT. We likewise call to mind their constant conjunction in all past instances. Without any farther ceremony, we call the one CAUSE and the other EFFECT, and infer the existence of the one from that of the other.” David Hume, “A Treatise of Human Nature”
    21. 21. Correlation
    22. 22. Francis Galton (1822–1911) & Karl Pearson (1857–1936)
        • Study of inheritance of intelligence
        • Study of forearm & height measurements
        “Co-relation must be the consequence of the variations of the two organs being partly due to common causes.”
    23. 23. Correlation & Dependence
        • Correlation: a measure of the relationship between two mathematical variables or measured data values
        • Correlation coefficient
        • Pearson’s correlation coefficient
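As an illustration of the correlation coefficient named on this slide, here is a minimal sketch of Pearson's coefficient in plain Python. The function name `pearson_r` and the forearm/height measurements are hypothetical, made up for this example; they are not data from the talk.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (covariance times n) and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical measurements, invented for illustration only.
forearm_cm = [24.0, 25.5, 26.0, 27.5, 29.0]
height_cm = [160.0, 166.0, 170.0, 176.0, 183.0]
r = pearson_r(forearm_cm, height_cm)
print(f"r = {r:.4f}")
```

A value of r close to 1 only says the two measurements move together; by itself it says nothing about which one, if either, causes the other.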
    24. 24. Correlation is NOT Causation!
        • Be careful when inferring causation from correlation!
        • Correlation indicates the possibility of a predictive relationship.
        • Correlation is not a sufficient condition for causation.
        • Correlation or causation?
            - Did Avas cause the housing bubble?
            - Is the murder rate related to the height of a mountain range?
            - M. Night Shyamalan’s lack of …
          http://www.businessweek.com/magazine/correlation-or-causation-12012011-gfx.html
    25. 25. RANDOMIZED TRIAL
    26. 26. Sir Ronald Fisher
        • Randomized Controlled Trials
            - The only accepted way of proving causality.
            - First proposed by Charles Sanders Peirce, in education
            - Promoted and formalized by Sir Ronald Fisher
            - The Design of Experiments (1935)
    27. 27. Randomized Controlled Trials: Diagram
        • Four phases of an RCT in clinical trials
            - Enrollment
            - Intervention allocation
            - Follow-up
            - Data analysis
    28. 28. Randomized Controlled Trials on the Web
        Current search widget vs. proposed search widget: which one is better?
        Source: Ronny Kohavi, http://www.exp-platform.com/Pages/default.aspx
    29. 29. Run an Experiment (RCT) and Decide
        • Data-driven decision making
        • Also known as A/B testing
        • Google ran approximately 12,000 randomized experiments in 2009; 10% resulted in a change.
        • The web is ideal for running experiments and improving through them.
        • Very low cost of running an experiment on the web
        Source: Ronny Kohavi, http://www.exp-platform.com/Pages/default.aspx
    30. 30. Things to Keep in Mind
        • Randomization before allocation is critical.
        • Apart from the feature under test, the Control and Treatment groups should be exposed to identical conditions.
        • Use statistical significance tests on the results.
        • Use a large enough sample set.
        • Rule out random chance as the explanation for the result of a trial.
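The slide asks for a statistical significance test without naming one. One common choice for comparing click-through rates in an A/B test is the two-proportion z-test; the sketch below assumes that test, and the function name `two_proportion_z_test` and all counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.

    conv_a / n_a: conversions and visitors in the control group.
    conv_b / n_b: conversions and visitors in the treatment group.
    Returns (z statistic, two-sided p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 10,000 randomly assigned visitors per group.
z, p_value = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value says the observed difference would be unlikely if the two widgets performed the same. Because visitors were assigned at random, that difference can then be attributed to the widget rather than to who happened to see it.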
    31. 31. Challenges of Randomized Controlled Trials
        • In most cases, running an RCT is infeasible
            - Economics, anthropology, politics
            - In some cases it might be illegal!
        • Lack of deeper understanding
            - Understanding = how things work when taken apart!
        “Lack of language to express causal concepts explicitly is responsible for the poor scientific activity around Causality.” Judea Pearl
    32. 32. BEYOND RANDOMIZED TRIALS
    33. 33. Judea Pearl’s Contribution to Causality
        1. A representation for capturing relationships between different pieces of information and their causal links
            • Bayesian Networks
            • Capture …
        2. Algebra of Intervention
            • Do operator to capture explicit actions
            • Their relationship with probability
        Judea Pearl’s work gave language and notation to causality and brought it under the mathematical sciences.
    34. 34. Cigarette Smoking & Lung Cancer
        • A 1964 study finds that cigarette smokers have a higher chance of getting lung cancer.
        • The cigarette lobby suggests the presence of an unknown gene that causes both the urge for nicotine and cancer.
        • A study finds that people who visit bars have a higher chance of getting lung cancer.
        • Doctors find a relationship between tar deposits in the lungs and having lung cancer.
    35. 35. Factors Affecting Pneumonia (An Example). From: Aronsky, D. and Haug, P.J., Diagnosing community-acquired pneumonia with a Bayesian network, In: Proceedings of the Fall Symposium of the American Medical Informatics Association (1998) 632-636.
    36. 36. Challenge! How can I establish a causal relationship between smoking and lung cancer using this data?
        • P(cancer | smoking) ?? P(cancer)
        • P(cancer | smoking) > P(cancer)
        • P(cancer | smoking) = P(cancer)
    37. 37. Primer on Probability. Source: A Tutorial on Bayesian Networks, Weng-Keen Wong, Oregon State University
    38. 38. Probability Primer: Random Variables
        • A random variable is the basic element of probability.
        • It refers to an event whose outcome has some degree of uncertainty.
        • For example, the random variable A could be the event of getting heads on a coin flip.
    39. 39. Boolean Random Variables
        • We deal with the simplest type of random variables: Boolean ones.
        • They take the values true or false.
        • Think of the event as occurring or not occurring.
        • Examples (let A be a Boolean random variable):
            - A = Getting heads on a coin flip
            - A = It will rain today
            - A = There is a typo in these slides
    40. 40. Probabilities
        We will write P(A = true) to mean the probability that A = true.
        What is probability? It is the relative frequency with which an outcome would be obtained if the process were repeated a large number of times under similar conditions.*
        [Figure: two regions, P(A = true) and P(A = false); the sum of the red and blue areas is 1]
        * Ahem… there’s also the Bayesian definition, which says probability is your degree of belief in an outcome.
    41. 41. Conditional Probability
        • P(A = true | B = true) = out of all the outcomes in which B is true, how many also have A equal to true
        • Read this as: “Probability of A conditioned on B” or “Probability of A given B”
        Example: H = “Have a headache”, F = “Coming down with flu”
            P(H = true) = 1/10
            P(F = true) = 1/40
            P(H = true | F = true) = 1/2
        “Headaches are rare and flu is rarer, but if you’re coming down with flu there’s a 50-50 chance you’ll have a headache.”
        [Figure: overlapping regions P(F = true) and P(H = true)]
    42. 42. The Joint Probability Distribution
        • We will write P(A = true, B = true) to mean “the probability of A = true and B = true”.
        • Notice that:
          $P(H=\text{true} \mid F=\text{true}) = \frac{\text{Area of ``H and F'' region}}{\text{Area of ``F'' region}} = \frac{P(H=\text{true}, F=\text{true})}{P(F=\text{true})}$
        • In general, P(X | Y) = P(X, Y) / P(Y)
        [Figure: overlapping regions P(F = true) and P(H = true)]
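The headache and flu numbers from the previous slide can be pushed through the identity P(X | Y) = P(X, Y) / P(Y) with exact fractions. The sketch below first recovers the joint probability and then asks the reverse question, P(F | H), which the slides do not compute; the variable names are this example's own.

```python
from fractions import Fraction

# Numbers given on the slide.
p_h = Fraction(1, 10)          # P(H = true): have a headache
p_f = Fraction(1, 40)          # P(F = true): coming down with flu
p_h_given_f = Fraction(1, 2)   # P(H = true | F = true)

# Rearranging P(H | F) = P(H, F) / P(F) gives the joint probability.
p_h_and_f = p_h_given_f * p_f

# The same identity applied the other way round gives P(F | H).
p_f_given_h = p_h_and_f / p_h

print(p_h_and_f)    # 1/80
print(p_f_given_h)  # 1/8
```

So a headache raises the chance of flu from 1 in 40 to 1 in 8, which is still far from certain.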
    43. 43. The Joint Probability Distribution
        • Joint probabilities can be between any number of variables, e.g. P(A = true, B = true, C = true)
        • For each combination of variables, we need to say how probable that combination is
        • The probabilities of these combinations need to sum to 1

        A      B      C      P(A,B,C)
        false  false  false  0.1
        false  false  true   0.2
        false  true   false  0.05
        false  true   true   0.05
        true   false  false  0.3
        true   false  true   0.1
        true   true   false  0.05
        true   true   true   0.15
                             (sums to 1)
    44. 44. The Joint Probability Distribution
        • Once you have the joint probability distribution, you can calculate any probability involving A, B, and C.
        • Note: you may need to use marginalization and Bayes’ rule (neither of which is discussed in these slides).
        Examples of things you can compute:
        • P(A = true) = sum of P(A,B,C) in rows with A = true
        • P(A = true, B = true | C = true) = P(A = true, B = true, C = true) / P(C = true)
        [Table: joint distribution P(A,B,C), as on slide 43]
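The two example queries on this slide can be checked directly against the eight-row table. This is a minimal sketch; the dictionary layout and the helper name `prob` are choices made for this example.

```python
# Joint distribution P(A, B, C) from the slide, keyed by (A, B, C).
joint = {
    (False, False, False): 0.10,
    (False, False, True): 0.20,
    (False, True, False): 0.05,
    (False, True, True): 0.05,
    (True, False, False): 0.30,
    (True, False, True): 0.10,
    (True, True, False): 0.05,
    (True, True, True): 0.15,
}


def prob(predicate):
    """Sum the probability of every row for which predicate(a, b, c) holds."""
    return sum(p for (a, b, c), p in joint.items() if predicate(a, b, c))


# Marginalization: add up the rows with A = true.
p_a = prob(lambda a, b, c: a)

# Conditioning: P(A = true, B = true | C = true) = P(A, B, C all true) / P(C = true).
p_c = prob(lambda a, b, c: c)
p_ab_given_c = prob(lambda a, b, c: a and b and c) / p_c

print(round(p_a, 4), round(p_c, 4), round(p_ab_given_c, 4))
```

With these numbers, P(A = true) is 0.6, P(C = true) is 0.5, and P(A = true, B = true | C = true) is 0.15 / 0.5 = 0.3.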
    45. 45. The Problem with the Joint Distribution
        • Lots of entries in the table to fill up!
        • For k Boolean random variables, you need a table of size $2^k$.
        • How do we use fewer numbers? We need the concept of independence.
        [Table: joint distribution P(A,B,C), as on slide 43]
    46. 46. Independence
        Variables A and B are independent if any of the following hold:
        • P(A, B) = P(A) P(B)
        • P(A | B) = P(A)
        • P(B | A) = P(B)
        This says that knowing the outcome of A does not tell me anything new about the outcome of B.
    47. 47. Independence
        How is independence useful? Suppose you have n coin flips and you want to calculate the joint distribution P(C1, …, Cn).
        • If the coin flips are not independent, you need $2^n$ values in the table.
        • If the coin flips are independent, then
          $P(C_1, \ldots, C_n) = \prod_{i=1}^{n} P(C_i)$
          Each P(Ci) table has 2 entries and there are n of them, for a total of 2n values.
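The saving described on this slide can be illustrated by rebuilding a full joint table from per-coin marginals alone. The sketch assumes three independent, possibly biased coins; the biases and the function name `joint_from_marginals` are hypothetical.

```python
from itertools import product
from math import prod


def joint_from_marginals(p_heads):
    """Build the full joint table of independent coin flips.

    p_heads[i] is P(Ci = heads). Each joint entry is the product of the
    matching marginal probabilities, one factor per coin.
    """
    table = {}
    for outcome in product([True, False], repeat=len(p_heads)):
        table[outcome] = prod(
            p if heads else 1 - p for p, heads in zip(p_heads, outcome)
        )
    return table


# Hypothetical biases for three coins: only n numbers need to be specified.
p_heads = [0.5, 0.6, 0.9]
table = joint_from_marginals(p_heads)
print(len(table))                            # 2**3 = 8 joint entries
print(round(table[(True, True, True)], 4))   # 0.5 * 0.6 * 0.9
```

The n marginal tables (2n entries) are enough to regenerate all 2^n joint entries on demand, which is the point of the slide.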
    48. 48. Conditional Independence
        Variables A and B are conditionally independent given C if any of the following hold:
        • P(A, B | C) = P(A | C) P(B | C)
        • P(A | B, C) = P(A | C)
        • P(B | A, C) = P(B | C)
        Knowing C tells me everything about B. I don’t gain anything by knowing A (either because A doesn’t influence B or because knowing C provides all the information knowing A would give).
    49. 49. Bayesian Networks
    50. 50. A Bayesian Network
        A Bayesian network is made up of two things:
        1. A Directed Acyclic Graph
           [Figure: graph with edges A → B, B → C, B → D]
        2. A set of tables for each node in the graph

        A      P(A)
        false  0.6
        true   0.4

        A      B      P(B|A)
        false  false  0.01
        false  true   0.99
        true   false  0.7
        true   true   0.3

        B      C      P(C|B)
        false  false  0.4
        false  true   0.6
        true   false  0.9
        true   true   0.1

        B      D      P(D|B)
        false  false  0.02
        false  true   0.98
        true   false  0.05
        true   true   0.95
    51. 51. A Directed Acyclic Graph
        • Each node in the graph is a random variable.
        • A node X is a parent of another node Y if there is an arrow from node X to node Y, e.g. A is a parent of B.
        • Informally, an arrow from node X to node Y means X has a direct influence on Y.
        [Figure: graph with edges A → B, B → C, B → D]
    52. 52. A Set of Tables for Each Node
        • Each node Xi has a conditional probability distribution P(Xi | Parents(Xi)) that quantifies the effect of the parents on the node.
        • The parameters are the probabilities in these conditional probability tables (CPTs).
        [Figure: graph with edges A → B, B → C, B → D, with the tables P(A), P(B|A), P(C|B), and P(D|B) as on slide 50]
    53. 53. Bayesian Network for Cigarette Smoking
        [Figure: network over the nodes Mystery Gene, Cancer in Family, Smoking, Late Night Partying, Tar Deposits, and Lung Cancer; labelled tables include P(Smoking | Mystery Gene) and P(Tar | Smoking)]
    54. 54. Bayesian Networks
        Two important properties:
        1. Encodes the conditional independence relationships between the variables in the graph structure
        2. Is a compact representation of the joint probability distribution over the variables
    55. 55. Conditional Independence
        The Markov condition: given its parents (P1, P2), a node (X) is conditionally independent of its non-descendants (ND1, ND2).
        [Figure: graph with nodes P1, P2, ND1, X, ND2, C1, C2]
    56. 56. The Joint Probability Distribution
        Due to the Markov condition, we can compute the joint probability distribution over all the variables X1, …, Xn in the Bayesian net using the formula:
          $P(X_1 = x_1, \ldots, X_n = x_n) = \prod_{i=1}^{n} P(X_i = x_i \mid \text{Parents}(X_i))$
        where Parents(Xi) means the values of the parents of the node Xi with respect to the graph.
    57. 57. Using a Bayesian Network: Example
        Using the network in the example, suppose you want to calculate:
        P(A = true, B = true, C = true, D = true)
          = P(A = true) * P(B = true | A = true) * P(C = true | B = true) * P(D = true | B = true)
          = (0.4)*(0.3)*(0.1)*(0.95)
          = 0.0114
    58. 58. Using a Bayesian Network: Example (annotated)
        Using the network in the example, suppose you want to calculate:
        P(A = true, B = true, C = true, D = true)
          = P(A = true) * P(B = true | A = true) * P(C = true | B = true) * P(D = true | B = true)    (this factorization is from the graph structure)
          = (0.4)*(0.3)*(0.1)*(0.95)    (these numbers are from the conditional probability tables)
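The worked example can be reproduced in a few lines. The conditional probability tables below are transcribed from the example network (A → B, B → C, B → D); the nested-dictionary layout and the function name `joint` are choices made for this sketch.

```python
from itertools import product

# CPTs from the example network. Inner keys are the value of the child node.
P_A = {True: 0.4, False: 0.6}
P_B_GIVEN_A = {False: {False: 0.01, True: 0.99}, True: {False: 0.7, True: 0.3}}
P_C_GIVEN_B = {False: {False: 0.4, True: 0.6}, True: {False: 0.9, True: 0.1}}
P_D_GIVEN_B = {False: {False: 0.02, True: 0.98}, True: {False: 0.05, True: 0.95}}


def joint(a, b, c, d):
    """P(A=a, B=b, C=c, D=d) as the product of each node given its parents."""
    return P_A[a] * P_B_GIVEN_A[a][b] * P_C_GIVEN_B[b][c] * P_D_GIVEN_B[b][d]


p_all_true = joint(True, True, True, True)   # 0.4 * 0.3 * 0.1 * 0.95
total = sum(joint(*values) for values in product([True, False], repeat=4))

print(round(p_all_true, 4))
print(round(total, 4))
```

Summing the factorized joint over all 16 assignments gives 1, which is a quick sanity check that the four small tables do define a full joint distribution.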
    59. 59. Inference
        • Using a Bayesian network to compute probabilities is called inference.
        • In general, inference involves queries of the form P(X | E), where
            - X = the query variable(s)
            - E = the evidence variable(s)
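A query of the form P(X | E) can be answered on a small network by brute-force enumeration: sum the factorized joint over every assignment consistent with the evidence. The slides do not prescribe an inference algorithm, so enumeration is an assumption made here for simplicity, and `query` is a hypothetical helper. The tables are those of the example network (A → B, B → C, B → D).

```python
from itertools import product

P_A = {True: 0.4, False: 0.6}
P_B_GIVEN_A = {False: {False: 0.01, True: 0.99}, True: {False: 0.7, True: 0.3}}
P_C_GIVEN_B = {False: {False: 0.4, True: 0.6}, True: {False: 0.9, True: 0.1}}
P_D_GIVEN_B = {False: {False: 0.02, True: 0.98}, True: {False: 0.05, True: 0.95}}
VARIABLES = ("A", "B", "C", "D")


def joint(a, b, c, d):
    """Joint probability from the network factorization."""
    return P_A[a] * P_B_GIVEN_A[a][b] * P_C_GIVEN_B[b][c] * P_D_GIVEN_B[b][d]


def query(target, evidence):
    """P(target = true | evidence), where evidence maps variable name to value."""
    numerator = 0.0
    denominator = 0.0
    for values in product([True, False], repeat=len(VARIABLES)):
        world = dict(zip(VARIABLES, values))
        # Skip assignments that contradict the evidence.
        if any(world[name] != value for name, value in evidence.items()):
            continue
        p = joint(*values)
        denominator += p
        if world[target]:
            numerator += p
    return numerator / denominator


p_a_given_d = query("A", {"D": True})
print(round(p_a_given_d, 4))
```

Enumeration touches all 2^n assignments, so it is only practical for toy networks; that cost is what the later question about managing the complexity of exact inference refers to.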
    60. 60. Key Questions on Bayesian Networks
        • How do you build a Bayesian network?
        • How do you compute conditional probabilities based on data?
        • What about continuous variables?
        • Without data, how do you build a Bayesian network? Can you capture knowledge from your experience in the network?
        • Can you learn the structure from data?
        • How do you draw inferences using Bayesian networks?
        • How do you manage the computational complexity of the network for exact inference?
    61. 61. Algebra of Doing
    62. 62. Algebra of Doing
        • Available: Algebra of Seeing
            - What is the chance it rained if we see the grass is wet?
            - P(rain | wet) = P(wet | rain) P(rain) / P(wet)
        • Algebra of Doing
            - What is the chance that it rained if we make the grass wet?
            - P(rain | do(wet)) = P(rain)
        • Simplify the Bayesian network by explicitly capturing an intervention.
        • Causal conditional probabilities: P(x | do(y))
        • Calculus for moving from causal conditional probability to conditional probability.
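The difference between seeing and doing can be made concrete on a two-node network Rain → Wet. The probabilities below are hypothetical, chosen only for illustration. The intervention do(wet) is modelled by replacing the table P(wet | rain) with a fixed value, which cuts the arrow from Rain into Wet.

```python
# Hypothetical parameters for the two-node network Rain -> Wet.
P_RAIN = 0.2
P_WET_GIVEN_RAIN = {True: 0.9, False: 0.1}   # P(wet = true | rain)


def joint(rain, wet, do_wet=None):
    """P(rain, wet), optionally under the intervention do(Wet = do_wet).

    With do_wet set, Wet no longer listens to Rain: its table is replaced
    by a point mass on the forced value.
    """
    p_rain = P_RAIN if rain else 1 - P_RAIN
    if do_wet is None:
        p_wet_true = P_WET_GIVEN_RAIN[rain]
    else:
        p_wet_true = 1.0 if do_wet else 0.0
    p_wet = p_wet_true if wet else 1 - p_wet_true
    return p_rain * p_wet


def p_rain_given_wet(do_wet=None):
    """P(rain = true | wet = true) in the original or the intervened model."""
    numerator = joint(True, True, do_wet)
    return numerator / (numerator + joint(False, True, do_wet))


seeing = p_rain_given_wet()             # P(rain | wet), by Bayes' rule
doing = p_rain_given_wet(do_wet=True)   # P(rain | do(wet)) = P(rain)

print(round(seeing, 4), round(doing, 4))
```

Seeing wet grass raises the probability of rain well above its prior of 0.2, while wetting the grass ourselves leaves it at 0.2, matching P(rain | do(wet)) = P(rain) on the slide.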
    63. 63. Causal Conditional Probabilities
        • Borrowing ideas from Randomized Controlled Trials
        • Hypothetical world where …
        • Can we compute P(cancer | do(smoking))?
        • Allows us to override causal influences for that variable.
        [Figure: network over the nodes Gene, Smoking, Tar, and Cancer, with a “?” mark]
    64. 64. Using Causal Conditional Probabilities
        • Set up an intervention in Bayesian networks.
        • Override all other causal influences in the presence of an intervention.
        • Convert from do calculus to observational calculus.
        [Figure: network over the nodes Intervention, Smoking, Tar, and Cancer, labelled P(Tar | do(smoking))]
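One known way to convert a do expression into purely observational terms for a Gene/Smoking/Tar/Cancer network is Pearl's front-door adjustment. The slides do not name the formula they have in mind, so using it here is an assumption. The sketch also assumes the structure Gene → Smoking, Gene → Cancer, Smoking → Tar, Tar → Cancer with the gene unobserved, and every number is hypothetical. It checks that the front-door estimate, computed only from the observable distribution over (Smoking, Tar, Cancer), matches the answer obtained from the full model.

```python
from itertools import product

# Hypothetical full model, including the hidden gene.
P_G = 0.3
P_S_GIVEN_G = {True: 0.8, False: 0.2}     # P(smoking | gene)
P_T_GIVEN_S = {True: 0.9, False: 0.1}     # P(tar | smoking)
P_C_GIVEN_TG = {(True, True): 0.6, (True, False): 0.3,
                (False, True): 0.4, (False, False): 0.1}   # P(cancer | tar, gene)


def bern(p_true, value):
    """Probability that a Boolean variable with P(true) = p_true equals value."""
    return p_true if value else 1 - p_true


def full_joint(g, s, t, c):
    return (bern(P_G, g) * bern(P_S_GIVEN_G[g], s)
            * bern(P_T_GIVEN_S[s], t) * bern(P_C_GIVEN_TG[(t, g)], c))


# What an observer actually sees: (smoking, tar, cancer) with the gene summed out.
observed = {}
for g, s, t, c in product([True, False], repeat=4):
    observed[(s, t, c)] = observed.get((s, t, c), 0.0) + full_joint(g, s, t, c)


def p_obs(pred):
    return sum(p for (s, t, c), p in observed.items() if pred(s, t, c))


def front_door(s_do):
    """P(cancer | do(smoking = s_do)) from observational quantities only."""
    total = 0.0
    for t in (True, False):
        p_t_given_s = (p_obs(lambda s, tt, c: s == s_do and tt == t)
                       / p_obs(lambda s, tt, c: s == s_do))
        inner = 0.0
        for s2 in (True, False):
            p_c_given_ts2 = (p_obs(lambda s, tt, c: s == s2 and tt == t and c)
                             / p_obs(lambda s, tt, c: s == s2 and tt == t))
            inner += p_c_given_ts2 * p_obs(lambda s, tt, c: s == s2)
        total += p_t_given_s * inner
    return total


def truth(s_do):
    """Same quantity computed with access to the hidden gene."""
    return sum(bern(P_T_GIVEN_S[s_do], t) * bern(P_G, g) * P_C_GIVEN_TG[(t, g)]
               for t, g in product([True, False], repeat=2))


p_do = front_door(True)
p_see = p_obs(lambda s, t, c: s and c) / p_obs(lambda s, t, c: s)
print(round(p_do, 4), round(truth(True), 4), round(p_see, 4))
```

In this toy model P(cancer | smoking) overstates P(cancer | do(smoking)), because the hidden gene pushes both smoking and cancer; the front-door estimate removes that bias without ever observing the gene.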
    65. 65. Summary
    66. 66. To Summarize
        • Correlation is NOT causation.
        • Randomized Controlled Trials (RCTs) can establish causation.
        • Want more?
        • Study causality!
    67. 67. References
        • Causality: Models, Reasoning, and Inference, Judea Pearl, Second Edition
        • A Tutorial on Learning With Bayesian Networks, David Heckerman, Technical Report, Microsoft Research
        • Bayesian Networks without Tears, Eugene Charniak, AI Magazine, 1991
        • If Correlation Does Not Imply Causation, What Does?, Michael Nielsen, blog
    68. 68. Thank You. Persistent Systems Limited, www.persistentsys.com
