How is Bayesian Statistics Different? 
by Wayne Tai Lee
Goal 
● Clarify the difference between “classical” and 
Bayesian statistics 
● Lay out the pros and cons of each “attitude”
One sentence definition 
Bayesian statistics is a mathematical 
framework to update beliefs as you observe 
more data.
Bayesian Update in Movies 
Recall movies where a female character 
realizes her period is late?
Movie cliché: Am I pregnant? 
● What did I do in the past month? 
– Forms a prior belief of whether I am pregnant 
● The missing period 
– Data! 
● Belief is updated as more data is observed!
Bayesian terminology 
● Prior: your belief about pregnancy before 
seeing new data 
● Data: missing period 
● Posterior: your belief that is updated after 
seeing the data
How do we formalize this update? 
● Pregnant is an uncertain event with two 
outcomes: Yes or No 
● “Days delayed of period” is a data point 
– If (Pregnant = Yes), delayed ~ 30*9 days 
– If (Pregnant = No), it might come sooner
Mathematical framework 
● “Pregnant” is a random variable: 
– P(Pregnant = Yes) = X 
– P(Pregnant = No) = (1 - X) 
● “Days delayed of period” is another random 
variable! 
– P(days delay >= 7 days | Pregnant) = 1 
– P(days delay >= 7 days | Not Pregnant) = Y
Simplify 
● Start with the objective: 
Am I pregnant? 
i.e. P(Pregnant | Data)? 
● Note that all the numbers we know are of the form 
P( **** | Pregnant) or P( **** | Not Pregnant)
Conditional Probability! 
P(Pregnant | Data) 
= P(Data | Pregnant) P(Pregnant) / P(Data) 
Immediate implication: 
● If your prior says you cannot be pregnant, i.e. 
P(Pregnant) = 0, the posterior is also 0: no data 
can change your belief!
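To see why, set P(Pregnant) = 0 in the formula above. The numerator vanishes for any data that has nonzero probability:

```latex
\[
P(\text{Pregnant} \mid \text{Data})
  = \frac{P(\text{Data} \mid \text{Pregnant}) \cdot 0}{P(\text{Data})}
  = 0
  \quad \text{whenever } P(\text{Data}) > 0 .
\]
```

By the same argument applied to "Not Pregnant", a prior of exactly 1 cannot be moved either.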
“Bayes Rule” 
P(Pregnant | Data) 
= P(Data | Pregnant) P(Pregnant) / P(Data) 
= P(Data | Pregnant) P(Pregnant) / 
[ P(Data | Pregnant) P(Pregnant) + 
P(Data | Not Pregnant) P(Not Pregnant) ] 
Why add more terms? 
P(Data) is hard to compute directly, so split it into 
pieces we know!
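The expansion above can be sketched in Python. This is a minimal sketch for a binary hypothesis only; the function name `posterior` and its argument names are illustrative and do not come from the slides.

```python
def posterior(prior, lik_if_true, lik_if_false):
    """Bayes rule for a binary hypothesis H given some data.

    prior        -- P(H)
    lik_if_true  -- P(Data | H)
    lik_if_false -- P(Data | not H)
    """
    # Law of total probability: P(Data) split over "H" and "not H".
    evidence = lik_if_true * prior + lik_if_false * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("P(Data) is zero: the data is impossible under this model")
    return lik_if_true * prior / evidence
```

Here `prior` plays the role of P(Pregnant), and the two likelihood arguments play the roles of P(Data | Pregnant) and P(Data | Not Pregnant).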
P(Data): Big Issue for Bayesians 
● Pregnant is binary which made this realllllly 
easy 
● In general, a lot of “tricks” are trying to 
– solve for P(Data) 
    ● Belief propagation in graphical models 
– getting around it 
    ● Sampling: MCMC 
    ● Approximation: Variational Bayes
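As a rough illustration of the "getting around it" idea, the sketch below estimates the posterior by simulation and never writes down P(Data). It uses plain rejection sampling, which is much simpler than MCMC and is not MCMC itself. The numbers (prior 0.1, and 0.2 for a late period when not pregnant) and the function name are assumptions made for the example.

```python
import random

def simulate_posterior(prior, lik_if_pregnant, lik_if_not, n=200_000, seed=0):
    """Estimate P(Pregnant | late period) by simulation."""
    rng = random.Random(seed)
    late = 0               # simulated cases where the period is late
    late_and_pregnant = 0  # ...and the simulated person is pregnant
    for _ in range(n):
        pregnant = rng.random() < prior           # draw from the prior
        p_late = lik_if_pregnant if pregnant else lik_if_not
        if rng.random() < p_late:                 # draw the data
            # Keep only simulated cases that match the observed data.
            late += 1
            late_and_pregnant += pregnant
    return late_and_pregnant / late

# Hypothetical numbers, not from the slides.
estimate = simulate_posterior(prior=0.1, lik_if_pregnant=1.0, lik_if_not=0.2)
exact = 0.1 / (0.1 + 0.2 * 0.9)
print(f"simulated: {estimate:.3f}, exact: {exact:.3f}")
```

In this binary case the exact answer is available, so the simulation is only a check. The same keep-what-matches-the-data idea is what makes sampling attractive when P(Data) has no simple form.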
Back to the key question: 
P(Pregnant | Data) 
= P(Data | Pregnant) P(Pregnant) / 
[ P(Data | Pregnant) P(Pregnant) + 
P(Data | Not Pregnant) P(Not Pregnant) ] 
= 1 * X / [ 1 * X + Y * (1 - X) ]
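A worked example may help here. The values X = 0.1 and Y = 0.2 are hypothetical, chosen only to show the arithmetic:

```latex
\[
P(\text{Pregnant} \mid \text{Data})
  = \frac{1 \cdot X}{1 \cdot X + Y (1 - X)}
  = \frac{0.1}{0.1 + 0.2 \times 0.9}
  = \frac{0.1}{0.28}
  \approx 0.36
\]
```

With these numbers, a late period moves the belief from 10% to about 36%.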
Can add more data 
...almost for free! 
● Notice “Data” is quite general: 
– Can add pregnancy test strip results to further 
update beliefs! 
– Treat the previous posterior as the new prior, then 
update in the same way!
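The chained update can be sketched as follows. All numbers are hypothetical, including the assumed test strip rates (0.99 when pregnant, 0.05 when not), and the helper name `update` is illustrative.

```python
def update(prior, lik_if_pregnant, lik_if_not):
    """One Bayesian update for the binary question "Pregnant?"."""
    evidence = lik_if_pregnant * prior + lik_if_not * (1.0 - prior)
    return lik_if_pregnant * prior / evidence

# All numbers below are hypothetical.
prior = 0.10  # X: belief before any data

# Late period: probability 1 if pregnant, Y = 0.2 if not.
after_late = update(prior, 1.0, 0.20)

# Positive strip: the previous posterior is now the prior.
after_strip = update(after_late, 0.99, 0.05)

# Same answer if both pieces of data are used at once, assuming they are
# independent given the pregnancy status.
both_at_once = update(prior, 1.0 * 0.99, 0.20 * 0.05)

print(f"after late period: {after_late:.3f}")
print(f"after strip:       {after_strip:.3f}")
print(f"both at once:      {both_at_once:.3f}")
```

The match between the last two numbers relies on the conditional independence assumption stated in the comment; without it, the second update would need P(strip | late period, Pregnant) instead.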
So... what's the big deal? 
● Your belief matters a lot! 
– Your prior changes the outcome 
● Your prior and my prior may be different
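A small sketch of how much the prior matters: three hypothetical people see the same data (a late period, with an assumed Y = 0.2) but start from different priors. The formula is the binary posterior 1 * X / [ 1 * X + Y * (1 - X) ].

```python
def posterior(x, y):
    """P(Pregnant | late period) = 1*X / (1*X + Y*(1 - X))."""
    return x / (x + y * (1.0 - x))

y = 0.2                      # hypothetical P(late period | Not Pregnant)
priors = [0.01, 0.10, 0.50]  # three hypothetical people
posteriors = [posterior(x, y) for x in priors]

for x, p in zip(priors, posteriors):
    print(f"prior {x:.2f} -> posterior {p:.3f}")
```

Same data, same likelihoods, yet the conclusions range from "probably not" to "probably yes".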
What “could” a bad Frequentist 
Do? 
● Calculate the p-value for you, i.e. 
P(Period at least this late | Not Pregnant) 
● Declare that you're Pregnant if this is <= 5% 
● This declaration rule has a 5% false positive rate 
and some false negative rate 
● Issue: Not as relevant to you! The rates are for all 
the people using this procedure... not specific 
to your case!
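The "rates are for the procedure" point can be illustrated with a simulation. This sketch assumes, purely for illustration, that the delay for a non-pregnant person is exponentially distributed with a mean of 2 days; that model and all names are hypothetical.

```python
import math
import random

MEAN_DELAY = 2.0  # hypothetical mean delay (days) when not pregnant
ALPHA = 0.05      # the 5% threshold

def p_value(delay_days):
    """P(delay >= observed | Not Pregnant) under the assumed exponential model."""
    return math.exp(-delay_days / MEAN_DELAY)

def declare_pregnant(delay_days):
    return p_value(delay_days) <= ALPHA

rng = random.Random(0)
n = 100_000
# Everyone simulated here is NOT pregnant, so every declaration is a false positive.
false_positives = sum(
    declare_pregnant(rng.expovariate(1.0 / MEAN_DELAY)) for _ in range(n)
)
false_positive_rate = false_positives / n
print(f"false positive rate: {false_positive_rate:.3f}")
```

The rate lands near 5% because that is how the rule was built. Note that `p_value` takes only the observed delay: nothing about an individual's prior enters the calculation.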
“not as relevant”? 
● There's no consideration of your specific case 
– There was no P(Pregnant) in the p-value 
calculation 
– You could be really sure that you're not 
pregnant... that doesn't change the calculation!
What would a Frequentist say? 
● P(Pregnant) = 100% or 0% 
– Fixed but unknown 
– NOT random 
● …Not actually interested in a single event 
– Probabilities are defined for repeated events 
– Will not write down P(Pregnant | Data) 
– For your one case, anything could be true 
● Would say “Go talk to a doctor”
Key difference 
● “Attitude” 
– What can be a random variable? 
    ● Bayesian: Uncertain events 
    ● Frequentist: Repeatable events
Implications of this attitude 
● Bayesian: 
– Can incorporate prior knowledge easily 
– Can update beliefs easily 
– Can tackle a wider class of problems since 
probabilities are “beliefs” 
– Must specify a model 
– Your belief can be different from mine 
    ● Our answers will be different!
Implications of this attitude 
● Frequentist: 
– Probabilities are more objective 
– Harder to cheat 
– Has non-parametric methods 
– Focused on repeatable events 
– Prior knowledge is introduced in an ad hoc 
way 
– Usually needs lots of data
In the end... 
● Frequentists and Bayesians use the same rules 
of probability 
● Difference exists in set-up: “What is random?” 
– Bayesians: uncertainty in knowledge 
– Frequentists: intrinsic randomness
Take Home 
● Different problems should use different 
approaches! 
– Both schools are awesome! 
● Be aware of what you're using and be 
consistent!
