
# Introduction to Bayesian Statistics

Lecture slides Introduction to Bayesian Statistics

Published in: Education

### Introduction to Bayesian Statistics

1. Introduction to Bayesian Statistics: Machine Learning and Data Mining, Philipp Singer. CC image courtesy of user mattbuck007 on Flickr.
2. Conditional Probability
3. Conditional Probability
   - Probability of event A given that B is true
   - P(cough|cold) > P(cough)
   - Fundamental in probability theory
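A minimal sketch of the definition P(A|B) = P(A and B) / P(B), using the cough/cold example. All counts below are hypothetical and chosen only so that P(cough|cold) > P(cough) holds, as on the slide.

```python
from fractions import Fraction

# Hypothetical counts for 1,000 people (illustrative only, not from the slides).
total = 1000
with_cold = 100        # people who have a cold
with_cough = 150       # people who cough, with or without a cold
cold_and_cough = 80    # people who have a cold AND cough

p_cough = Fraction(with_cough, total)
p_cold = Fraction(with_cold, total)
p_cold_and_cough = Fraction(cold_and_cough, total)

# Conditional probability: P(cough | cold) = P(cough and cold) / P(cold)
p_cough_given_cold = p_cold_and_cough / p_cold

print(f"P(cough)      = {float(p_cough):.2f}")
print(f"P(cough|cold) = {float(p_cough_given_cold):.2f}")
```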
4. Before we start with Bayes ...
   - Another perspective on conditional probability
   - Conditional probability via growing trimmed trees
   - https://www.youtube.com/watch?v=Zxm4Xxvzohk
5. Bayes Theorem
6. Bayes Theorem
   - P(A|B) is the conditional probability of observing A given that B is true
   - P(B|A) is the conditional probability of observing B given that A is true
   - P(A) and P(B) are the probabilities of A and B without conditioning on each other
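The formula itself did not survive in this transcript. In the notation of the bullets above, the standard statement of Bayes' theorem is shown below, with the denominator expanded by the law of total probability (¬A denotes "not A"; the theorem requires P(B) > 0).

```latex
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)},
\qquad
P(B) = P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)
```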
7. Visualize Bayes Theorem (figure: all possible outcomes; some event). Source: https://oscarbonilla.com/2009/05/visualizing-bayes-theorem/
8. Visualize Bayes Theorem (figure: all people in study; people having cancer)
9. Visualize Bayes Theorem (figure: all people in study; people whose screening test is positive)
10. Visualize Bayes Theorem (figure: people having a positive screening test and cancer)
11. Visualize Bayes Theorem
    - Given the test is positive, what is the probability that said person has cancer?
12. Visualize Bayes Theorem
    - Given the test is positive, what is the probability that said person has cancer?
13. Visualize Bayes Theorem
    - Given that someone has cancer, what is the probability that said person had a positive test?
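The two questions above are different conditional probabilities. A small sketch with hypothetical screening numbers (prevalence, sensitivity and false positive rate are assumptions, not values from the slides) shows how far apart they can be.

```python
# Hypothetical screening numbers (assumptions for illustration only).
prevalence = 0.01           # P(cancer)
sensitivity = 0.90          # P(positive | cancer)
false_positive_rate = 0.09  # P(positive | no cancer)

# Law of total probability: P(positive)
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)

# Bayes' theorem: P(cancer | positive) = P(positive | cancer) P(cancer) / P(positive)
p_cancer_given_positive = sensitivity * prevalence / p_positive

print(f"P(positive)          = {p_positive:.4f}")
print(f"P(cancer | positive) = {p_cancer_given_positive:.4f}")
print(f"P(positive | cancer) = {sensitivity:.4f}")
```

With a rare condition, most positive tests come from the large healthy group, so P(cancer | positive) stays small even when P(positive | cancer) is high.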
14. Example: Fake coin
    - Two coins: one fair, one unfair
    - What is the probability of having the fair coin after flipping Heads?
    - CC image courtesy of user pagedooley on Flickr
15. Example: Fake coin (CC image courtesy of user pagedooley on Flickr)
16. Example: Fake coin (CC image courtesy of user pagedooley on Flickr)
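The slides do not say how biased the unfair coin is. A minimal sketch, assuming it is two-headed (always lands Heads) and that each coin is picked with probability 1/2; the function name `posterior_fair` and its defaults are hypothetical.

```python
from fractions import Fraction

def posterior_fair(prior_fair, p_heads_fair=Fraction(1, 2), p_heads_unfair=Fraction(1)):
    """P(fair | one Heads) via Bayes' theorem.

    Assumes (hypothetically) that the unfair coin is two-headed; pass a
    different p_heads_unfair to model another bias.
    """
    # Evidence P(Heads) by the law of total probability over the two coins
    evidence = p_heads_fair * prior_fair + p_heads_unfair * (1 - prior_fair)
    return p_heads_fair * prior_fair / evidence

print(posterior_fair(Fraction(1, 2)))  # prints 1/3
```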
17. Update of beliefs
    - Allows new evidence to update beliefs
    - The prior can itself be the posterior of a previous update
18. Example: Fake coin (CC image courtesy of user pagedooley on Flickr)
    - Belief update
    - What is the probability of the fair coin after we have already seen one Heads?
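A sketch of sequential updating where each posterior becomes the next prior. As before, it assumes a hypothetical two-headed unfair coin and a 1/2 prior on each coin; `P_HEADS` and `update` are illustrative names.

```python
from fractions import Fraction

# Hypothetical setup (not specified in the slides): a fair coin and a two-headed coin.
P_HEADS = {"fair": Fraction(1, 2), "unfair": Fraction(1)}

def update(belief, outcome):
    """One Bayesian update of P(coin) after observing 'H' or 'T'."""
    unnormalized = {}
    for coin, prior in belief.items():
        likelihood = P_HEADS[coin] if outcome == "H" else 1 - P_HEADS[coin]
        unnormalized[coin] = likelihood * prior
    evidence = sum(unnormalized.values())
    return {coin: w / evidence for coin, w in unnormalized.items()}

belief = {"fair": Fraction(1, 2), "unfair": Fraction(1, 2)}
history = []
for flip in "HH":
    belief = update(belief, flip)  # the posterior becomes the prior for the next flip
    history.append(belief["fair"])

print(history)  # [Fraction(1, 3), Fraction(1, 5)]
```

Under this assumption, a single Tails would settle the question immediately, since the two-headed coin cannot produce it.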
19. Bayesian Inference
20. xkcd comic. Source: https://xkcd.com/1132/
21. Bayesian Inference
    - Statistical inference of parameters (figure labels: parameters, data, additional knowledge)
22. Coin flip example
    - Flip a coin several times
    - Is it fair?
    - Let's use Bayesian inference
23. Binomial model
    - Probability p of flipping heads
    - Probability of flipping tails: 1-p
    - Binomial model
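The binomial model gives the likelihood of k heads in n flips for a given p. A short sketch, evaluated on the data used later in the slides (80 heads in 200 flips).

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n independent flips with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Likelihood of 80 heads in 200 flips for a few candidate values of p.
for p in (0.3, 0.4, 0.5):
    print(p, binomial_pmf(80, 200, p))
```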
24. Prior
    - Prior belief about the parameter(s)
    - Conjugate prior
      - The posterior is in the same distribution family as the prior
      - The Beta distribution is conjugate to the binomial
    - Beta prior
25. Beta distribution
    - Continuous probability distribution
    - Defined on the interval [0,1]
    - Two shape parameters: α and β
      - If ≥ 1, they can be interpreted as pseudo counts
      - α refers to flipping heads (β to tails)
26. Beta distribution
27. Beta distribution
28. Beta distribution
29. Beta distribution
30. Beta distribution
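The Beta plots are not preserved here. A sketch of the density and mean, computed in log space with `math.lgamma` to avoid overflow for large shape parameters; the two parameter pairs are the uniform and biased priors used later in the slides.

```python
from math import exp, lgamma, log

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1), computed in log space for stability."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)  # -log B(a, b)
    return exp(log_norm + (a - 1) * log(x) + (b - 1) * log(1 - x))

def beta_mean(a, b):
    return a / (a + b)

# Uniform prior Beta(1, 1) vs. the biased prior Beta(50, 10).
for a, b in [(1, 1), (50, 10)]:
    print(f"Beta({a},{b}): mean={beta_mean(a, b):.3f}, pdf(0.5)={beta_pdf(0.5, a, b):.4f}")
```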
31. Posterior
    - The posterior is also a Beta distribution
    - For the exact derivation: http://www.cs.cmu.edu/~10701/lecture/technote2_betabinomial.pdf
32. Posterior
    - Assume:
      - Binomial with p = 0.4
      - Uniform Beta prior: α=1 and β=1
      - 200 random variates from the binomial distribution (Heads=80)
      - Update the posterior
33. Posterior
    - Assume:
      - Binomial with p = 0.4
      - Biased Beta prior: α=50 and β=10
      - 200 random variates from the binomial distribution (Heads=80)
      - Update the posterior
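Because the Beta prior is conjugate, the update is just addition of counts: Beta(α, β) becomes Beta(α + heads, β + tails). A sketch applying this to both settings above (80 heads in 200 flips); `update_beta` is an illustrative name.

```python
def update_beta(alpha, beta, heads, n):
    """Conjugate update: Beta(alpha, beta) prior + binomial data -> Beta posterior."""
    return alpha + heads, beta + (n - heads)

n, heads = 200, 80  # data from the slides
for name, prior in [("uniform", (1, 1)), ("biased", (50, 10))]:
    a, b = update_beta(*prior, heads, n)
    print(f"{name:7s} prior Beta{prior} -> posterior Beta({a}, {b}), mean = {a / (a + b):.3f}")
```

The uniform prior yields Beta(81, 121), centred near the observed 0.4, while the biased prior yields Beta(130, 130), pulled all the way back to 0.5.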
34. Posterior
    - The posterior mean is a convex combination of the prior mean and the observed proportion of heads
    - The stronger our prior belief, the more data we need to overrule the prior
    - The weaker our prior belief, the more quickly the data overrules the prior
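For the Beta-binomial model the convex combination can be written exactly: the posterior mean (α+k)/(α+β+n) equals w·α/(α+β) + (1−w)·k/n with prior weight w = (α+β)/(α+β+n). A sketch checking this with exact fractions.

```python
from fractions import Fraction

def posterior_mean_decomposition(alpha, beta, heads, n):
    """Split the Beta-binomial posterior mean into prior-mean and data (MLE) parts."""
    prior_mean = Fraction(alpha, alpha + beta)
    mle = Fraction(heads, n)
    w = Fraction(alpha + beta, alpha + beta + n)  # weight on the prior mean
    posterior_mean = Fraction(alpha + heads, alpha + beta + n)
    # Sanity check of the convex-combination identity
    assert posterior_mean == w * prior_mean + (1 - w) * mle
    return w, posterior_mean

for prior in [(1, 1), (50, 10)]:
    w, mean = posterior_mean_decomposition(*prior, heads=80, n=200)
    print(prior, "prior weight", float(w), "posterior mean", float(mean))
```

The prior acts like α+β extra observations, so its weight shrinks as n grows.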
35. So is the coin fair?
    - Examine the posterior
      - 95% highest density interval (HDI)
      - ROPE [1]: region of practical equivalence for the null hypothesis
      - Fair coin: ROPE [0.45, 0.55]
    - 95% HDI: (0.33, 0.47)
    - The HDI overlaps the ROPE, so we cannot reject the null
    - With more samples (and a narrower HDI), we could
    - [1] Kruschke, John. Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. Academic Press, 2014.
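A sketch of the HDI + ROPE check, assuming the Beta(81, 121) posterior from the uniform prior. The HDI is approximated as the narrowest window covering 95% of Monte Carlo draws from `random.betavariate`; the decision rule follows Kruschke's scheme (reject if the HDI lies fully outside the ROPE, accept if fully inside, otherwise undecided). Function names are illustrative.

```python
import random

def hdi_from_samples(samples, mass=0.95):
    """Narrowest interval containing `mass` of the samples (Monte Carlo HDI)."""
    xs = sorted(samples)
    k = int(mass * len(xs))  # index span of each candidate window
    widths = [xs[i + k] - xs[i] for i in range(len(xs) - k)]
    i = min(range(len(widths)), key=widths.__getitem__)
    return xs[i], xs[i + k]

def rope_decision(hdi, rope):
    """Compare an HDI with a region of practical equivalence (ROPE)."""
    lo, hi = hdi
    if hi < rope[0] or lo > rope[1]:
        return "reject null"   # HDI entirely outside the ROPE
    if rope[0] <= lo and hi <= rope[1]:
        return "accept null"   # HDI entirely inside the ROPE
    return "undecided"

rng = random.Random(42)
samples = [rng.betavariate(81, 121) for _ in range(100_000)]  # posterior, uniform prior
hdi = hdi_from_samples(samples)
print("95% HDI:", tuple(round(x, 3) for x in hdi))
print(rope_decision(hdi, rope=(0.45, 0.55)))
```

The interval should land close to the (0.33, 0.47) on the slide, and since it overlaps [0.45, 0.55] the rule returns "undecided".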
36. Bayesian Model Comparison
    - Evidence (marginal likelihood): the parameters are marginalized out
    - Average of the likelihood, weighted by the prior
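For the Beta-binomial model the evidence has a closed form: p(D | M) = C(n, k) · B(α + k, β + n − k) / B(α, β), where B is the Beta function. A sketch in log space; under a uniform prior this reduces to 1/(n + 1).

```python
from math import comb, exp, lgamma, log

def log_beta_fn(a, b):
    """log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_evidence(heads, n, alpha, beta):
    """log p(D | M) for binomial data with p ~ Beta(alpha, beta) integrated out."""
    return (log(comb(n, heads))
            + log_beta_fn(alpha + heads, beta + n - heads)
            - log_beta_fn(alpha, beta))

# Under a uniform prior, every head count 0..n is equally likely: 1 / (n + 1).
print(exp(log_evidence(80, 200, 1, 1)))  # ≈ 1/201 ≈ 0.004975
```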
37. Bayesian Model Comparison
    - Bayes factors [1]: ratio of marginal likelihoods
    - Interpretation table by Kass & Raftery [1]
    - Bayes factor > 100 → decisive evidence against M2
    - [1] Kass, Robert E., and Adrian E. Raftery. "Bayes factors." Journal of the American Statistical Association 90.430 (1995): 773-795.
38. So is the coin fair?
    - Null hypothesis: fair coin (p = 0.5)
    - Alternative hypothesis: anything is possible, Beta(1,1)
    - Bayes factor
39. So is the coin fair?
    - n = 200, k = 80
    - Bayes factor
    - (Decent) preference for the alternative hypothesis
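A sketch of the Bayes factor BF₁₀ for this comparison, assuming the null fixes p = 0.5 (nothing to integrate out) and the alternative places a Beta(1,1) prior on p. Both evidences are computed in log space.

```python
from math import comb, exp, lgamma, log

def log_beta_fn(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_evidence_null(heads, n, p0=0.5):
    """log p(D | H0): the null fixes p = p0, so the evidence is the likelihood."""
    return log(comb(n, heads)) + heads * log(p0) + (n - heads) * log(1 - p0)

def log_evidence_beta(heads, n, alpha=1.0, beta=1.0):
    """log p(D | H1) with p ~ Beta(alpha, beta) integrated out."""
    return (log(comb(n, heads))
            + log_beta_fn(alpha + heads, beta + n - heads)
            - log_beta_fn(alpha, beta))

n, k = 200, 80
bf_10 = exp(log_evidence_beta(k, n) - log_evidence_null(k, n))
print(f"BF_10 = {bf_10:.2f}")
```

Under these assumptions the factor comes out a little under 5, inside the 3 to 20 band that Kass & Raftery call positive evidence, consistent with the "decent" preference on the slide.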
40. Other priors
    - Priors can encode hypotheses (theories)
    - Biased hypothesis: Beta(101,11)
    - Haldane prior: Beta(0.001, 0.001)
      - U-shaped
      - Puts high probability near p=1 or (1-p)=1
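Since each prior encodes a hypothesis, the priors can be compared by their evidence on the same data (80 heads in 200 flips). A sketch reusing the closed-form Beta-binomial evidence; differences of the printed log values are log Bayes factors.

```python
from math import comb, lgamma, log

def log_evidence_beta(heads, n, alpha, beta):
    """log marginal likelihood of binomial data under a Beta(alpha, beta) prior."""
    def log_b(a, b):
        return lgamma(a) + lgamma(b) - lgamma(a + b)
    return log(comb(n, heads)) + log_b(alpha + heads, beta + n - heads) - log_b(alpha, beta)

priors = {
    "uniform Beta(1,1)": (1, 1),
    "biased Beta(101,11)": (101, 11),
    "Haldane Beta(0.001,0.001)": (0.001, 0.001),
}
n, k = 200, 80
log_ev = {name: log_evidence_beta(k, n, a, b) for name, (a, b) in priors.items()}
for name, value in sorted(log_ev.items(), key=lambda item: -item[1]):
    print(f"{name:26s} log evidence = {value:9.2f}")
```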
41. Frequentist approach
    - So is the coin fair?
    - Binomial test with null p=0.5
      - One-tailed p-value: 0.0028
    - Chi² test
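For comparison, the one-tailed exact binomial test is the probability, under p = 0.5, of seeing 80 or fewer heads in 200 flips. A stdlib-only sketch; `binom_test_lower` is an illustrative name.

```python
from math import comb

def binom_test_lower(k, n, p0=0.5):
    """One-tailed exact binomial test: P(X <= k) under H0: p = p0."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k + 1))

p_value = binom_test_lower(80, 200)
print(f"one-tailed p-value = {p_value:.4f}")
```

This reproduces the 0.0028 reported on the slide.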
42. Posterior prediction
    - Posterior mean
    - With large data, it converges to the MLE
    - MAP: maximum a posteriori
      - Bayesian estimator
      - Uses the mode of the posterior
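For a Beta(a, b) posterior the mean is a/(a+b) and the mode (MAP) is (a−1)/(a+b−2) when a, b > 1. A sketch comparing both with the MLE; with a uniform prior the MAP coincides with the MLE, and the mean approaches it as data grows.

```python
def beta_posterior_summaries(alpha, beta, heads, n):
    """Posterior mean, MAP and MLE for a Beta prior and binomial data."""
    a, b = alpha + heads, beta + n - heads
    mean = a / (a + b)
    mode = (a - 1) / (a + b - 2)  # MAP estimate; valid when a > 1 and b > 1
    mle = heads / n
    return mean, mode, mle

print(beta_posterior_summaries(1, 1, 80, 200))      # uniform prior, slide data
print(beta_posterior_summaries(1, 1, 8000, 20000))  # same proportion, 100x more data
```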
43. Bayesian prediction
    - Posterior predictive distribution
    - Distribution of unobserved observations conditioned on observed data (train, test)
    - (figure label: Frequentist MLE)
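With a Beta(a, b) posterior, the predictive distribution of j heads in m new flips is beta-binomial: C(m, j) · B(a + j, b + m − j) / B(a, b). A sketch contrasting it with a frequentist plug-in binomial at the MLE; the number of future flips m = 10 is a hypothetical choice.

```python
from math import comb, exp, lgamma

def log_beta_fn(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def posterior_predictive(j, m, a, b):
    """P(j heads in m new flips | data) under a Beta(a, b) posterior (beta-binomial)."""
    return comb(m, j) * exp(log_beta_fn(a + j, b + m - j) - log_beta_fn(a, b))

def plug_in_mle(j, m, p_hat):
    """Plug-in prediction: binomial with p fixed at the MLE."""
    return comb(m, j) * p_hat**j * (1 - p_hat) ** (m - j)

a, b = 81, 121  # posterior after 80 heads in 200 flips, uniform prior
m = 10          # hypothetical number of future flips
bayes = [posterior_predictive(j, m, a, b) for j in range(m + 1)]
mle = [plug_in_mle(j, m, 80 / 200) for j in range(m + 1)]
print("P(next flip heads | data) =", posterior_predictive(1, 1, a, b))  # equals a / (a + b)
```

The predictive distribution averages over the remaining uncertainty in p, so it is slightly wider than the plug-in binomial.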
44. Alternative Bayesian Inference
    - The marginal likelihood is often not easy to evaluate
      - No analytical solution
      - Numerical integration is expensive
    - Alternatives
      - Monte Carlo integration
        - Markov Chain Monte Carlo (MCMC)
        - Gibbs sampling
        - Metropolis-Hastings algorithm
      - Laplace approximation
      - Variational Bayes
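A minimal random-walk Metropolis-Hastings sketch for the coin posterior (uniform prior, 80 heads in 200 flips). It only needs the unnormalized posterior, so the marginal likelihood never has to be computed; the step size, burn-in and sample count are hypothetical tuning choices. The conjugate answer 81/202 serves as a check.

```python
import math
import random

def log_posterior(p, heads=80, n=200, alpha=1.0, beta=1.0):
    """Unnormalized log posterior of p: binomial likelihood times Beta prior."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return (heads + alpha - 1) * math.log(p) + (n - heads + beta - 1) * math.log(1 - p)

def metropolis_hastings(n_samples=50_000, step=0.05, burn_in=5_000, seed=0):
    rng = random.Random(seed)
    p, current = 0.5, log_posterior(0.5)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = p + rng.gauss(0.0, step)  # symmetric random-walk proposal
        candidate = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio); 1 - random() avoids log(0).
        if math.log(1.0 - rng.random()) < candidate - current:
            p, current = proposal, candidate
        if i >= burn_in:
            samples.append(p)
    return samples

samples = metropolis_hastings()
print("MCMC posterior mean: ", sum(samples) / len(samples))
print("exact posterior mean:", 81 / 202)
```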
45. Bayesian (Machine) Learning
46. Bayesian Models
    - Example: Markov chain model
      - Dirichlet prior, categorical likelihood
    - Bayesian networks
    - Topic models (LDA)
    - Hierarchical Bayesian models
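For the Markov chain example, a symmetric Dirichlet prior on each row of the transition matrix is conjugate to the categorical likelihood, so the posterior row is Dirichlet(α + counts). A sketch returning the posterior mean transition probabilities; the toy sequence and α = 1 are hypothetical.

```python
from collections import defaultdict

def posterior_transition_matrix(sequence, states, alpha=1.0):
    """Posterior mean transition probabilities of a first-order Markov chain.

    Each row has an independent symmetric Dirichlet(alpha) prior; the posterior
    mean of entry (src, dst) is (alpha + count) / (K * alpha + row total).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for src, dst in zip(sequence, sequence[1:]):
        counts[src][dst] += 1
    k = len(states)
    matrix = {}
    for src in states:
        total = sum(counts[src].values())
        matrix[src] = {dst: (alpha + counts[src][dst]) / (k * alpha + total) for dst in states}
    return matrix

# Hypothetical toy sequence of visited states (not from the slides).
seq = ["A", "B", "A", "C", "A", "B", "B", "A"]
for src, row in posterior_transition_matrix(seq, states=["A", "B", "C"]).items():
    print(src, {dst: round(p, 3) for dst, p in row.items()})
```

Unseen transitions keep a non-zero probability thanks to the pseudo counts, the same smoothing effect as the Beta prior in the coin example.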
47. Generalized Linear Model
    - Multiple linear regression
    - Logistic regression
    - Bayesian ANOVA
48. Bayesian Statistical Tests
    - Alternatives to frequentist approaches
    - Bayesian correlation
    - Bayesian t-test
49. Questions? Philipp Singer, philipp.singer@gesis.org. Image credit: talk of Mike West: http://www2.stat.duke.edu/~mw/ABS04/Lecture_Slides/4.Stats_Regression.pdf