Introduction to Bayesian Statistics
Machine Learning and Data Mining
Philipp Singer
Lecture slides Introduction to Bayesian Statistics

Published in: Education


  1. Introduction to Bayesian Statistics
     Machine Learning and Data Mining
     Philipp Singer
     CC image courtesy of user mattbuck007 on Flickr
  2. Conditional Probability
  3. Conditional Probability
     ● Probability of event A given that B is true
     ● P(cough|cold) > P(cough)
     ● Fundamental in probability theory
  4. Before we start with Bayes ...
     ● Another perspective on conditional probability
     ● Conditional probability via growing trimmed trees
     ● https://www.youtube.com/watch?v=Zxm4Xxvzohk
  5. Bayes Theorem
  6. Bayes Theorem
     ● P(A|B) is the conditional probability of observing A given B is true
     ● P(B|A) is the conditional probability of observing B given A is true
     ● P(A) and P(B) are the probabilities of A and B without conditioning on each other
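The formula itself is not preserved in this transcript. Given the terms defined on the slide, it is presumably the standard statement of Bayes' theorem, shown here next to the definition of conditional probability it follows from (the requirement P(B) > 0 is added for completeness):

```latex
P(A \mid B) = \frac{P(A \cap B)}{P(B)}
\qquad\Longrightarrow\qquad
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}, \quad P(B) > 0
```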
  7. Visualize Bayes Theorem
     Source: https://oscarbonilla.com/2009/05/visualizing-bayes-theorem/
     [Figure labels: All possible outcomes; Some event]
  8. Visualize Bayes Theorem
     [Figure labels: All people in study; People having cancer]
  9. Visualize Bayes Theorem
     [Figure labels: All people in study; People where screening test is positive]
  10. Visualize Bayes Theorem
      [Figure label: People having positive screening test and cancer]
  11. Visualize Bayes Theorem
      ● Given the test is positive, what is the probability that said person has cancer?
  12. Visualize Bayes Theorem
      ● Given the test is positive, what is the probability that said person has cancer?
  13. Visualize Bayes Theorem
      ● Given that someone has cancer, what is the probability that said person had a positive test?
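The two questions above can be answered by counting regions of the diagram. The sketch below uses made-up counts (the slides give no numbers, so `total`, `cancer`, `positive` and `both` are purely hypothetical) to show that the two conditionals differ and that Bayes' theorem connects them.

```python
# Hypothetical counts for illustration only; the slides give no numbers.
total = 1000     # all people in study
cancer = 10      # people having cancer
positive = 99    # people where the screening test is positive
both = 9         # people having a positive screening test and cancer

# Each conditional divides the overlap by a different region.
p_cancer_given_positive = both / positive
p_positive_given_cancer = both / cancer

# Bayes' theorem recovers one conditional from the other via the marginals.
p_cancer = cancer / total
p_positive = positive / total
via_bayes = p_positive_given_cancer * p_cancer / p_positive

print(f"P(cancer | positive) = {p_cancer_given_positive:.3f}")
print(f"P(positive | cancer) = {p_positive_given_cancer:.3f}")
print(f"via Bayes' theorem   = {via_bayes:.3f}")
```

With these invented numbers a positive test still leaves the probability of cancer below 10%, even though 90% of the people with cancer test positive.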
  14. Example: Fake coin
      ● Two coins
        – One fair
        – One unfair
      ● What is the probability of having the fair coin after flipping Heads?
      CC image courtesy of user pagedooley on Flickr
  15. Example: Fake coin
  16. Example: Fake coin
  17. Update of beliefs
      ● Allows new evidence to update beliefs
      ● Prior can also be the posterior of a previous update
  18. Example: Fake coin
      ● Belief update
      ● What is the probability of having the fair coin after we have already seen one Heads?
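The worked numbers for the fake-coin example are not preserved here. As a sketch under two assumptions that the transcript does not state (each coin is picked with probability 1/2, and the unfair coin always lands Heads), the belief update could look like this, with the posterior of the first flip reused as the prior for the second:

```python
from fractions import Fraction

def update(prior_fair, p_heads_fair, p_heads_unfair):
    """One Bayes update of P(fair) after observing a single Heads."""
    numerator = p_heads_fair * prior_fair
    evidence = numerator + p_heads_unfair * (1 - prior_fair)
    return numerator / evidence

# Assumptions (not stated in the slides): each coin is picked with
# probability 1/2, and the unfair coin always lands Heads.
prior = Fraction(1, 2)
p_h_fair = Fraction(1, 2)
p_h_unfair = Fraction(1)

after_one = update(prior, p_h_fair, p_h_unfair)
after_two = update(after_one, p_h_fair, p_h_unfair)  # posterior reused as prior

print(after_one)  # 1/3
print(after_two)  # 1/5
```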
  19. Bayesian Inference
  20. Source: https://xkcd.com/1132/
  21. Bayesian Inference
      ● Statistical inference of parameters
      [Formula labels: Parameters; Data; Additional knowledge]
  22. Coin flip example
      ● Flip a coin several times
      ● Is it fair?
      ● Let's use Bayesian inference
  23. Binomial model
      ● Probability p of flipping heads
      ● Flipping tails: 1-p
      ● Binomial model
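The binomial model named above is usually written as follows, with k the number of heads in n flips (n and k are the symbols the coin example uses later on):

```latex
P(k \mid n, p) = \binom{n}{k}\, p^{k} (1 - p)^{n - k}
```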
  24. Prior
      ● Prior belief about parameter(s)
      ● Conjugate prior
        – Posterior of same distribution family as prior
        – Beta distribution conjugate to binomial
      ● Beta prior
  25. Beta distribution
      ● Continuous probability distribution
      ● Interval [0,1]
      ● Two shape parameters: α and β
        – If ≥ 1, interpret as pseudo-counts
        – α would refer to flipping heads
  26. Beta distribution
  27. Beta distribution
  28. Beta distribution
  29. Beta distribution
  30. Beta distribution
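The Beta distribution slides appear to have carried plots that are not preserved. As a rough stand-in, the sketch below evaluates the Beta density at a few points for several shape parameters. The parameter pairs are hypothetical choices meant to show typical shapes (flat, u-shaped, peaked, skewed), not necessarily the ones plotted in the talk.

```python
import math

def beta_pdf(p, alpha, beta):
    """Density of Beta(alpha, beta) at p, for 0 < p < 1."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(log_norm + (alpha - 1) * math.log(p) + (beta - 1) * math.log(1 - p))

# Hypothetical shape parameters, chosen only to show typical shapes.
shapes = [(1, 1), (0.5, 0.5), (5, 5), (50, 10)]
grid = [0.1, 0.3, 0.5, 0.7, 0.9]

for alpha, beta in shapes:
    row = "  ".join(f"{beta_pdf(p, alpha, beta):7.3f}" for p in grid)
    print(f"Beta({alpha}, {beta}): {row}")
```

Working with `math.lgamma` on the log scale avoids overflow in the normalizing constant when the shape parameters are large.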
  31. Posterior
      ● Posterior is also a Beta distribution
      ● For the exact derivation: http://www.cs.cmu.edu/~10701/lecture/technote2_betabinomial.pdf
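A compact version of the standard argument, written with f for densities and assuming k heads in n flips with a Beta(α, β) prior, multiplies likelihood and prior and reads off the resulting Beta kernel:

```latex
f(p \mid k, n) \;\propto\; p^{k}(1-p)^{n-k} \cdot p^{\alpha-1}(1-p)^{\beta-1}
 \;=\; p^{\alpha+k-1}(1-p)^{\beta+n-k-1}
\quad\Longrightarrow\quad
p \mid k, n \;\sim\; \mathrm{Beta}(\alpha + k,\ \beta + n - k)
```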
  32. Posterior
      ● Assume
        – Binomial p = 0.4
        – Uniform Beta prior: α=1 and β=1
        – 200 random variates from binomial distribution (Heads=80)
        – Update posterior
  33. Posterior
      ● Assume
        – Binomial p = 0.4
        – Biased Beta prior: α=50 and β=10
        – 200 random variates from binomial distribution (Heads=80)
        – Update posterior
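Both updates can be reproduced with the conjugate rule: add the heads to α and the tails to β. The helper name `beta_binomial_update` is made up for this sketch; the priors and data are the ones on the two slides.

```python
def beta_binomial_update(alpha, beta, heads, flips):
    """Return the Beta posterior parameters after observing the flips."""
    return alpha + heads, beta + (flips - heads)

heads, flips = 80, 200  # data from the slides

for name, (alpha, beta) in {"uniform": (1, 1), "biased": (50, 10)}.items():
    a_post, b_post = beta_binomial_update(alpha, beta, heads, flips)
    mean = a_post / (a_post + b_post)
    print(f"{name} prior Beta({alpha}, {beta}) -> posterior Beta({a_post}, {b_post}), mean {mean:.3f}")
```

The uniform prior gives Beta(81, 121) with a mean close to the data proportion 0.4, while the biased prior gives Beta(130, 130), whose mean of 0.5 is still pulled away from the data.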
  34. Posterior
      ● Convex combination of prior and data
      ● The stronger our prior belief, the more data we need to overrule the prior
      ● The less prior belief we have, the quicker the data overrules the prior
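For the Beta-binomial case, the convex combination can be made explicit through the posterior mean. The weight w is a symbol introduced here for illustration; it is the share of the prior pseudo-counts in the total count:

```latex
\mathbb{E}[p \mid k, n] = \frac{\alpha + k}{\alpha + \beta + n}
 = w \cdot \frac{\alpha}{\alpha + \beta} + (1 - w) \cdot \frac{k}{n},
\qquad
w = \frac{\alpha + \beta}{\alpha + \beta + n}
```

A large α + β keeps w close to 1 until n grows, which is the sense in which a strong prior needs more data to be overruled.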
  35. So is the coin fair?
      ● Examine posterior
        – 95% posterior density interval
        – ROPE [1]: Region of practical equivalence for null hypothesis
        – Fair coin: [0.45,0.55]
      ● 95% HDI: (0.33, 0.47)
      ● Cannot reject null: the HDI overlaps the ROPE
      ● With more samples we can
      [1] Kruschke, John. Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan. Academic Press, 2014.
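One way to approximate the interval above without a statistics library is a grid search over the posterior density. This is a minimal sketch, assuming the posterior is the Beta(81, 121) that results from the uniform prior and 80 heads in 200 flips. The function name `hdi_grid` and the grid size are choices made here, not part of the talk.

```python
import math

def beta_log_pdf(p, alpha, beta):
    """Log density of Beta(alpha, beta) at p, for 0 < p < 1."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return log_norm + (alpha - 1) * math.log(p) + (beta - 1) * math.log(1 - p)

def hdi_grid(alpha, beta, mass=0.95, points=20001):
    """Approximate highest-density interval of a unimodal Beta on a grid."""
    step = 1.0 / (points + 1)
    grid = [(i + 1) * step for i in range(points)]  # open interval (0, 1)
    weights = [math.exp(beta_log_pdf(p, alpha, beta)) * step for p in grid]
    total = sum(weights)
    # Add grid cells from most to least dense until `mass` is covered.
    order = sorted(range(points), key=lambda i: weights[i], reverse=True)
    covered, chosen = 0.0, []
    for i in order:
        chosen.append(i)
        covered += weights[i] / total
        if covered >= mass:
            break
    return grid[min(chosen)], grid[max(chosen)]

low, high = hdi_grid(81, 121)  # posterior under the uniform prior
rope = (0.45, 0.55)            # ROPE for a fair coin, from the slide
overlaps = low <= rope[1] and high >= rope[0]
print(f"95% HDI: ({low:.2f}, {high:.2f}), overlaps ROPE: {overlaps}")
```

The result should land close to the interval quoted on the slide. Because the upper end reaches into the ROPE, the null is not rejected under this decision rule.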
  36. Bayesian Model Comparison
      ● Parameters marginalized out
      ● Average of likelihood weighted by prior
      [Formula label: Evidence]
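The formula is not preserved in this transcript. The description matches the usual definition of the evidence (marginal likelihood) of a model M with parameters θ and data D, where the symbols are the conventional ones rather than necessarily those of the slide:

```latex
P(D \mid M) = \int P(D \mid \theta, M)\, P(\theta \mid M)\, d\theta
```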
  37. Bayesian Model Comparison
      ● Bayes factors [1]
      ● Ratio of marginal likelihoods
      ● Interpretation table by Kass & Raftery [1]
      ● >100 → decisive evidence against M2
      [1] Kass, Robert E., and Adrian E. Raftery. "Bayes factors." Journal of the American Statistical Association 90.430 (1995): 773-795.
  38. So is the coin fair?
      ● Null hypothesis
      ● Alternative hypothesis
        – Anything is possible
        – Beta(1,1)
      ● Bayes factor
  39. So is the coin fair?
      ● n = 200
      ● k = 80
      ● Bayes factor
      ● (Decent) preference for alt. hypothesis
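The numeric Bayes factor is not preserved in the transcript. A sketch of how it could be computed follows. It assumes the null hypothesis is the point value p = 0.5 (the transcript names no value here) and uses the Beta(1,1) alternative from the slide. The function names are hypothetical.

```python
import math

def log_beta_fn(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_evidence_point(k, n, p):
    """Log marginal likelihood when p is fixed (nothing to integrate out)."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1 - p)

def log_evidence_beta(k, n, alpha, beta):
    """Log marginal likelihood of a binomial with a Beta(alpha, beta) prior."""
    return (math.log(math.comb(n, k))
            + log_beta_fn(alpha + k, beta + n - k)
            - log_beta_fn(alpha, beta))

n, k = 200, 80
# Assumption: the null is the point hypothesis p = 0.5.
log_bf = log_evidence_beta(k, n, 1, 1) - log_evidence_point(k, n, 0.5)
bayes_factor = math.exp(log_bf)  # alternative over null
print(f"Bayes factor (alt / null): {bayes_factor:.2f}")
```

Under these assumptions the factor comes out at roughly 5 in favour of the alternative, which falls in the "positive" band of the Kass & Raftery table and fits the "(Decent) preference" noted on the slide. Passing other `alpha` and `beta` values to `log_evidence_beta` gives the evidence under a different Beta prior.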
  40. Other priors
      ● Priors can encode hypotheses (theories)
      ● Biased hypothesis: Beta(101,11)
      ● Haldane prior (the improper Beta(0,0)), approximated by Beta(0.001, 0.001)
        – u-shaped
        – high probability on p=1 or (1-p)=1
  41. Frequentist approach
      ● So is the coin fair?
      ● Binomial test with null p=0.5
        – one-tailed
        – 0.0028
      ● Chi² test
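The one-tailed binomial test can be reproduced with the standard library. This sketch assumes the data are the 80 heads in 200 flips used earlier and that the tail of interest is "80 or fewer heads". `binom_cdf` is a helper written for this example.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

p_value = binom_cdf(80, 200, 0.5)  # one-tailed: 80 or fewer heads in 200 flips
print(f"one-tailed p-value: {p_value:.4f}")
```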
  42. Posterior prediction
      ● Posterior mean
      ● With large data, converges to the MLE
      ● MAP: Maximum a posteriori
        – Bayesian estimator
        – uses the mode
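For the Beta-binomial coin model the three point estimates have closed forms. The sketch below compares them for the running example, assuming the uniform Beta(1,1) prior. The function name `estimators` is introduced here for illustration.

```python
def estimators(alpha, beta, k, n):
    """Point estimates of p for a Beta(alpha, beta) prior and k heads in n flips."""
    a_post, b_post = alpha + k, beta + n - k
    return {
        "MLE": k / n,
        "posterior mean": a_post / (a_post + b_post),
        # Mode of the Beta posterior; valid when a_post > 1 and b_post > 1.
        "MAP": (a_post - 1) / (a_post + b_post - 2),
    }

result = estimators(1, 1, 80, 200)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

With the uniform prior the MAP coincides with the MLE, and the posterior mean differs from both only in the third decimal place, which illustrates the convergence for large data.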
  43. Bayesian prediction
      ● Posterior predictive distribution
      ● Distribution of unobserved observations conditioned on observed data (train, test)
      [Formula label: Frequentist MLE]
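The formulas on this slide are not preserved. In conventional notation, with x̃ an unobserved observation and D the observed data (symbols chosen here), the posterior predictive distribution averages over the posterior instead of plugging in a single estimate. For the coin model with a Beta(α, β) prior this reduces to the posterior mean, whereas an MLE plug-in would use k/n:

```latex
p(\tilde{x} \mid D) = \int p(\tilde{x} \mid \theta)\, p(\theta \mid D)\, d\theta,
\qquad
P(\tilde{x} = \text{heads} \mid k, n) = \frac{\alpha + k}{\alpha + \beta + n}
\quad \text{vs.} \quad
\hat{p}_{\mathrm{MLE}} = \frac{k}{n}
```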
  44. Alternative Bayesian Inference
      ● Often marginal likelihood not easy to evaluate
        – No analytical solution
        – Numerical integration expensive
      ● Alternatives
        – Monte Carlo integration
          ● Markov Chain Monte Carlo (MCMC)
          ● Gibbs sampling
          ● Metropolis-Hastings algorithm
        – Laplace approximation
        – Variational Bayes
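To illustrate one of the listed alternatives, here is a minimal random-walk sampler for the coin posterior. It is a sketch, not the method used in the talk: the proposal is a symmetric Gaussian step, so this is the Metropolis special case of Metropolis-Hastings, and the step width, chain length, burn-in and seed are arbitrary choices. The coin model has an exact posterior, which makes it easy to check the sampler.

```python
import math
import random

def log_posterior(p, k, n, alpha, beta):
    """Unnormalized log posterior: binomial likelihood times Beta prior."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return (k + alpha - 1) * math.log(p) + (n - k + beta - 1) * math.log(1 - p)

def metropolis_hastings(k, n, alpha, beta, steps=20000, burn_in=2000, width=0.05, seed=0):
    """Draw samples of p with a symmetric Gaussian random-walk proposal."""
    rng = random.Random(seed)
    current = 0.5
    current_lp = log_posterior(current, k, n, alpha, beta)
    samples = []
    for step in range(steps):
        proposal = current + rng.gauss(0.0, width)
        proposal_lp = log_posterior(proposal, k, n, alpha, beta)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        if step >= burn_in:
            samples.append(current)
    return samples

samples = metropolis_hastings(k=80, n=200, alpha=1, beta=1)
sample_mean = sum(samples) / len(samples)
print(f"sampled posterior mean: {sample_mean:.3f} (exact: {81 / 202:.3f})")
```

Only the unnormalized posterior is needed, which is why samplers of this kind sidestep the marginal likelihood.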
  45. Bayesian (Machine) Learning
  46. Bayesian Models
      ● Example: Markov Chain Model
        – Dirichlet prior, Categorical Likelihood
      ● Bayesian networks
      ● Topic models (LDA)
      ● Hierarchical Bayesian models
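The Markov chain example pairs a Dirichlet prior with a categorical likelihood, which is the multi-outcome analogue of the Beta-binomial coin model: each row of the transition matrix gets its own Dirichlet posterior, obtained by adding observed transition counts to the prior pseudo-counts. The sketch below assumes a symmetric Dirichlet prior and uses an invented state sequence. The function name is hypothetical.

```python
from collections import Counter

def transition_posterior_means(sequence, states, pseudo_count=1.0):
    """Posterior mean transition probabilities under a symmetric Dirichlet prior."""
    counts = Counter(zip(sequence, sequence[1:]))  # observed transitions
    result = {}
    for src in states:
        # Dirichlet posterior parameters: prior pseudo-count plus observed count.
        row = {dst: counts[(src, dst)] + pseudo_count for dst in states}
        total = sum(row.values())
        result[src] = {dst: value / total for dst, value in row.items()}
    return result

sequence = list("AABABBBA")  # hypothetical observed state sequence
probs = transition_posterior_means(sequence, states=["A", "B"])
for src, row in probs.items():
    print(src, {dst: round(value, 3) for dst, value in row.items()})
```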
  47. Generalized Linear Model
      ● Multiple linear regression
      ● Logistic regression
      ● Bayesian ANOVA
  48. Bayesian Statistical Tests
      ● Alternatives to frequentist approaches
      ● Bayesian correlation
      ● Bayesian t-test
  49. Questions?
      Philipp Singer
      philipp.singer@gesis.org
      Image credit: talk of Mike West: http://www2.stat.duke.edu/~mw/ABS04/Lecture_Slides/4.Stats_Regression.pdf
