Lecture9 - Bayesian-Decision-Theory


Published in: Education, Technology

  1. 1. Introduction to Machine Learning. Lecture 9: Bayesian decision theory – An introduction. Albert Orriols i Puig, aorriols@salle.url.edu. Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull.
  2. 2. Recap of Lectures 5-8: Let’s start with data classification.
  3. 3. Recap of Lectures 5-8: We want to build decision trees. How can we automatically generate these types of trees? Decide which attribute to put in each node; decide a split point; rely on information theory. We also saw many other improvements.
  4. 4. Recap of Lectures 5-8: From kNN to CBR. (Figure panels: 15-NN and 1-NN.) Key aspects: the value of k; distance functions.
  5. 5. Today’s Agenda: Could we use probability to classify? Where it all began. Some anecdotes on the correct use of probabilities.
  6. 6. Why Bother about Prob.? The world is a very uncertain place. AI and ML have spent almost 40 years dealing with uncertain domains. Some researchers decided to employ ideas from probability to model concepts. Before saying more, let’s go to the beginning.
  7. 7. Meeting the Reverend Thomas Bayes. Two main works: Divine Benevolence, or an Attempt to Prove That the Principal End of the Divine Providence and Government is the Happiness of His Creatures (1731); An Introduction to the Doctrine of Fluxions, and a Defence of the Mathematicians Against the Objections of the Author of the Analyst (published anonymously in 1736). But we are especially interested in: Essay Towards Solving a Problem in the Doctrine of Chances (1764), which was actually published posthumously by Richard Price.
  8. 8. Where Did These Ideas Come From? Bayes built his theory upon several ideas. Immanuel Kant (1724-1804): Copernican revolution: our understanding of the external world had its foundations not merely in experience, but in both experience and a priori concepts, thus offering a non-empiricist critique of rationalist philosophy. Isaac Newton (1643-1727): universal gravitation and the three laws of motion, which dominated the scientific view of the physical universe for the next three centuries.
  9. 9. What Was Bayes’ Point? Bayesian probability: the notion of probability interpreted as partial belief rather than as frequency. Bayesian estimation: calculate the validity of a proposition on the basis of a prior estimate of its probability and new relevant evidence. E.g.: Before Bayes, forward probability: given a specified number of white and black balls in an urn, what is the probability of drawing a black ball? Bayes turned his attention to the converse problem: given that one or more balls have been drawn, what can be said about the number of white and black balls in the urn?
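The converse urn problem can be worked through numerically. The following is a minimal sketch, not something taken from the slides: the function name `urn_posterior`, the urn size of 4, the draw sequence, the uniform prior over urn compositions, and drawing with replacement are all assumptions made for illustration.

```python
from fractions import Fraction

def urn_posterior(n_balls, draws):
    """Posterior over the number of black balls in an urn of n_balls.

    draws is a string of 'B'/'W' outcomes, drawn WITH replacement.
    Assumes a uniform prior over 0..n_balls black balls (an illustrative
    assumption, not something the slide states).
    """
    prior = Fraction(1, n_balls + 1)
    unnormalized = []
    for k in range(n_balls + 1):          # hypothesis: k black balls
        p_black = Fraction(k, n_balls)
        likelihood = Fraction(1)
        for d in draws:                   # P(draws | k black balls)
            likelihood *= p_black if d == "B" else 1 - p_black
        unnormalized.append(likelihood * prior)
    evidence = sum(unnormalized)          # P(draws), summed over hypotheses
    return [u / evidence for u in unnormalized]

# Hypothetical observation: black, black, white from an urn of 4 balls.
posterior = urn_posterior(4, "BBW")
for k, p in enumerate(posterior):
    print(f"P({k} black | BBW) = {p}")
```

Urns that are all white or all black get posterior probability zero here, because each of them makes one of the observed draws impossible.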
  10. 10. Bayes’ Theorem: Outputs the most probable hypothesis h ∈ H, given the data D plus knowledge about the prior probabilities of the hypotheses in H. Terminology: P(h|D): probability that h holds given data D (posterior probability of h; confidence that h holds given D). P(h): prior probability of h (background knowledge we have that h is a correct hypothesis). P(D): prior probability that training data D will be observed. P(D|h): probability of observing D given that h holds. $P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)}$
  11. 11. Bayes’ Theorem: Given H, the space of possible hypotheses, the most probable hypothesis is the one that maximizes P(h|D): $h_{MAP} \equiv \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} \frac{P(D \mid h)\,P(h)}{P(D)} = \arg\max_{h \in H} P(D \mid h)\,P(h)$. The last step holds because P(D) does not depend on h.
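As a rough illustration of the MAP rule, the sketch below scores each hypothesis by P(D|h)P(h) and also normalizes by P(D), to show that the normalization does not change which hypothesis wins. The helper `map_hypothesis`, the two coin hypotheses, and their priors are hypothetical and chosen only for this example.

```python
def map_hypothesis(prior, likelihood):
    """Return (h_MAP, posterior) for dicts keyed by hypothesis.

    prior[h] is P(h); likelihood[h] is P(D | h) for the observed data D.
    """
    scores = {h: likelihood[h] * prior[h] for h in prior}  # P(D|h) P(h)
    evidence = sum(scores.values())                        # P(D)
    posterior = {h: s / evidence for h, s in scores.items()}
    h_map = max(scores, key=scores.get)                    # argmax, no P(D) needed
    return h_map, posterior

# Hypothetical setup: a coin is either fair or biased towards heads (0.9),
# and the data D is five heads in a row.
prior = {"fair": 0.9, "biased": 0.1}
likelihood = {"fair": 0.5 ** 5, "biased": 0.9 ** 5}

h_map, posterior = map_hypothesis(prior, likelihood)
print(h_map, posterior)
```

Even with a strong prior in favor of the fair coin, five heads in a row are enough for the biased hypothesis to become the MAP choice under these assumed numbers.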
  12. 12. Is the Pope the Pope? The chances that a randomly chosen human being is the Pope are about 1 in 6 billion. Benedict XVI is the Pope. What are the chances that Benedict XVI is human? (Beck-Bornholdt and Dubben, 1996) Analogy to syllogistic reasoning: 1 in 6 billion.
  13. 13. So, Is the Pope an Alien? Where is the trick? Probability of the data given a hypothesis: P(D|H). Probability of the hypothesis given the data: P(H|D). P(D|H) is different from P(H|D). So, is the Pope an alien? Probability of being an alien: P(Alien). Probability of being human: P(Human). Probability that the Pope is an alien: $P(\text{Alien} \mid \text{Pope}) = \frac{P(\text{Pope} \mid \text{Alien})\,P(\text{Alien})}{P(\text{Pope} \mid \text{Human})\,P(\text{Human}) + P(\text{Pope} \mid \text{Alien})\,P(\text{Alien})}$
  14. 14. So, Is the Pope an Alien? What’s missing? P(Pope|Alien), P(Human), and P(Alien). Considering low values of P(Alien) and P(Pope|Alien), and large values of P(Human), we could “probably” say that the Pope is not an alien!
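To see how the missing quantities drive the answer, here is a small sketch of the P(Alien | Pope) formula. Only the 1 in 6 billion figure for P(Pope | Human) comes from the slides; the values for P(Alien) and P(Pope | Alien) are invented placeholders, and the code assumes every being is either human or alien, so P(Human) = 1 - P(Alien).

```python
def p_alien_given_pope(p_alien, p_pope_given_alien, p_pope_given_human):
    """Bayes' theorem for P(Alien | Pope), assuming only humans and aliens exist."""
    p_human = 1.0 - p_alien
    numerator = p_pope_given_alien * p_alien
    evidence = p_pope_given_human * p_human + numerator   # P(Pope)
    return numerator / evidence

result = p_alien_given_pope(
    p_alien=1e-9,               # hypothetical: aliens are extremely rare
    p_pope_given_alien=1e-9,    # hypothetical: an alien is unlikely to be Pope
    p_pope_given_human=1 / 6e9, # from the slides: 1 in 6 billion
)
print(f"P(Alien | Pope) = {result:.3e}")
```

With these placeholder values the posterior stays tiny, which matches the slide's conclusion; different assumed inputs would of course give a different number.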
  15. 15. More examples: Monty Hall. Stick or switch?
  16. 16. Stick or Switch: I chose door number 3. Door 2 is uncovered and contains a sheep. They give me the chance to change the door. Should I? Use probability, not faith, to give an answer!
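  17. 17. Stick or Switch: I should switch! Under the standard rules (the host knows where the prize is and always opens a losing door), sticking wins with probability 1/3 and switching with probability 2/3.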
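The recommendation to switch can be checked with Bayes' theorem for the exact situation on the slide (door 3 chosen, door 2 opened). This is a sketch under the usual Monty Hall assumptions, which the slides do not spell out: the prize is placed uniformly at random, the host never opens the chosen door or the prize door, and the host picks uniformly when two doors are available. The function name `door_posterior` is hypothetical.

```python
from fractions import Fraction

def door_posterior(pick, opened, doors=(1, 2, 3)):
    """Posterior P(prize behind each door | player's pick, door the host opened)."""
    prior = Fraction(1, len(doors))       # prize placed uniformly at random
    joint = {}
    for prize in doors:
        # Doors the host is allowed to open if the prize is behind `prize`.
        openable = [d for d in doors if d != pick and d != prize]
        if opened in openable:
            p_open = Fraction(1, len(openable))
        else:
            p_open = Fraction(0)
        joint[prize] = p_open * prior     # P(opened | prize) P(prize)
    evidence = sum(joint.values())        # P(opened)
    return {prize: p / evidence for prize, p in joint.items()}

posterior = door_posterior(pick=3, opened=2)
for door, p in posterior.items():
    print(f"P(prize behind door {door}) = {p}")
```

Under these assumptions, the chosen door keeps its original 1/3 while the other unopened door collects the remaining 2/3.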
  18. 18. Yet Another Example: The Defendant’s Fallacy. The story of a murder: a suspect was caught; the DNA test was positive; the DNA test fails only 1 in 1 million times. So, my suspect must be guilty, right? More specifically, the suspect will be guilty with p = 0.999999. Agree?
  19. 19. The Defendant’s Fallacy: Where is the trick now? P(coincides | innocent) as opposed to P(innocent | coincides). P(coincides | innocent) is commonly misused as the probability of being innocent. P(innocent | coincides) is the probability of being innocent given that the test was positive! Does this really matter? Let’s assume a city of 10 million inhabitants. We apply the test to all 10 million inhabitants. How many of them will be positive? About 10.
  20. 20. The Defendant’s Fallacy: Two arguments. The prosecutor: there is a 0.000001 probability that the suspect is innocent. The defendant: in this city of 10M people, the probability of the suspect being innocent is approximately 90%. Who is right? The defendant. Proof for that? You do the math.
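One way to "do the math" is sketched below. It relies on assumptions the slides leave implicit: exactly one inhabitant is guilty, every inhabitant is equally likely to be the culprit before the test, the test is always positive for the guilty person, and the 1 in 1 million failure rate is the false-positive rate for innocent people. The function name is hypothetical.

```python
from fractions import Fraction

def p_innocent_given_positive(population, false_positive_rate):
    """P(innocent | test positive) with one guilty person in the population."""
    p_guilty = Fraction(1, population)          # uniform prior over inhabitants
    p_innocent = 1 - p_guilty
    p_pos_given_guilty = Fraction(1)            # assumed: test never misses the culprit
    p_pos_given_innocent = false_positive_rate
    evidence = (p_pos_given_innocent * p_innocent
                + p_pos_given_guilty * p_guilty)  # P(positive)
    return p_pos_given_innocent * p_innocent / evidence

population = 10_000_000
rate = Fraction(1, 1_000_000)

expected_false_positives = (population - 1) * rate
answer = p_innocent_given_positive(population, rate)

print(f"Expected innocent positives: {float(expected_false_positives):.2f}")
print(f"P(innocent | positive) = {float(answer):.4f}")
```

Under these assumptions about 10 innocent people test positive alongside the one guilty person, so a positive test alone leaves roughly a 10 in 11 chance of innocence, in line with the defendant's "approximately 90%".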
  21. 21. Next Class: How we can use these concepts in machine learning.
  22. 22. Introduction to Machine Learning. Lecture 9: Bayesian decision theory – An introduction. Albert Orriols i Puig, aorriols@salle.url.edu. Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull.