- 2. tl;dr
- 3. Bayes enables updating beliefs by (fancy) counting
- 5. Bayes’ Theorem - What is it good for?
- 8. Some Zynga-ish examples: Bayesian A/B testing (T2 subsidiary), Mixed Media Model (Cathy), Creative Health Monitoring (Ivy), Growth Scorecard
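The first bullet above, Bayesian A/B testing, can be sketched with a Beta-Binomial model. This is an illustrative toy, not the T2 implementation; the conversion counts and the flat Beta(1, 1) priors are assumptions:

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1,1) priors.

    With a Beta(1,1) prior and binomial data, the posterior for a
    conversion rate is Beta(1 + conversions, 1 + misses).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if b > a:
            wins += 1
    return wins / draws

# Made-up counts: B converts at ~6% vs A's ~4%, so P(B > A) should be high.
print(prob_b_beats_a(conv_a=40, n_a=1000, conv_b=60, n_b=1000))
```

The output is a direct probability statement ("B is better with probability p"), which is the kind of confidence metric the later "When to use Bayesian inference?" slide refers to.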
- 9. What to expect from this workshop What is it? Why is it true? When is it useful? How to use it? Specifically how to use it in my models Levels of Understanding
- 10. Who is Steve? “Steve is very shy and withdrawn, invariably helpful but with very little interest in people or in the world of reality. A meek and tidy soul, he has a need for order and structure, and a passion for detail.” Is Steve more likely to be a librarian or a farmer? 10% / 90%
- 11. Kahneman and Tversky posed this same question: given that description, is Steve more likely to be a librarian or a farmer?
- 16-17. Spoiler Alert: we are going to describe this in pictures, starting from the 20:1 farmer-to-librarian ratio.
- 18-23. Scale that ratio up to a sample of 210 people: 10 librarians and 200 farmers. If 40% of librarians fit the description of Steve, that’s 4 librarians; if 10% of farmers fit, that’s 20 farmers.
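The counting in these picture slides can be written directly as a few lines of Python (numbers taken from the slides):

```python
# Steve example: a sample of 210 people with 1 librarian per 20 farmers.
librarians, farmers = 10, 200
fit_librarians = int(0.40 * librarians)  # 40% of librarians fit the description
fit_farmers = int(0.10 * farmers)        # 10% of farmers fit the description

# Ratio: people fitting the evidence who are librarians, over all who fit it.
p_librarian_given_desc = fit_librarians / (fit_librarians + fit_farmers)
print(fit_librarians, fit_farmers, round(p_librarian_given_desc, 3))  # 4 20 0.167
```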
- 25-26. Heart of Bayes’ Theorem: P(Librarian given the evidence) = (possibilities fitting the evidence) / (all possibilities)
- 27. For each possibility, count the number of ways the evidence can happen, then make this ratio: P(Librarian given the evidence) = (ways fitting the evidence) / (all ways)
- 28. When to use Bayes’ Rule: we have a hypothesis (Steve is a librarian), we observe some evidence (the description of Steve), and we want P(H|E), the probability of the hypothesis given the evidence.
- 29. Heart of Bayes’ Theorem: P(Librarian given the description) = (possibilities fitting the evidence) / (all possibilities). How can we write this more mathematically?
- 30. Goal: P(H|E), the probability of the hypothesis given the evidence; here, P(Librarian given the description).
- 31-35. Goal: P(H|E). Prior → P(H) = 1/21. Likelihood → P(E|H) = 0.4 and P(E|¬H) = 0.1.
- 36-45. Count the people fitting the evidence in the sample of 210: librarians = 210 · P(H) · P(E|H) = 4, and farmers = 210 · P(¬H) · P(E|¬H) = 20 (with P(¬H) = 20/21). So P(H|E) = 4 / (4 + 20).
- 46-50. The sample size (210) cancels from top and bottom, leaving P(H|E) = P(H) P(E|H) / [P(H) P(E|H) + P(¬H) P(E|¬H)].
- 51-52. The denominator is exactly P(E), so Bayes’ Theorem reads: posterior P(H|E) = P(H) P(E|H) / P(E).
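The same arithmetic, done with the probability form of the theorem (values from the slides):

```python
# Bayes' Theorem applied to the Steve example.
p_h = 1 / 21        # prior: P(librarian)
p_not_h = 20 / 21   # P(farmer)
p_e_h = 0.4         # P(description | librarian)
p_e_not_h = 0.1     # P(description | farmer)

p_e = p_h * p_e_h + p_not_h * p_e_not_h  # evidence P(E)
posterior = p_h * p_e_h / p_e            # P(H|E)
print(round(posterior, 3))  # 0.167 -- same as 4 / (4 + 20)
```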
- 53-55. Bayes’ Theorem: P(H|E) = P(H) P(E|H) / P(E)
- 56. Final Note About Steve
- 57. Bayesian Data Analysis in 3 easy steps! 1. For each possible explanation of the data 2. Count all the ways data can happen 3. Explanations with more ways to produce the data are more plausible (make a ratio)
- 58. Urns and Marbles - really?
- 68. Count ways evidence can occur
- 75. Make the ratio
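The urn-and-marble exercise itself isn't preserved in these notes, so here is an illustrative stand-in with assumed contents: an urn of 4 marbles, each blue or white, and three observed draws with replacement. Count the ways each composition could produce the data, then make the ratio:

```python
# Illustrative urn example: 4 marbles, each blue ('B') or white ('W').
# Observed data: three draws with replacement came up B, W, B.
data = ['B', 'W', 'B']

ways = {}
for blue in range(5):               # candidate compositions: 0..4 blue marbles
    white = 4 - blue
    count = 1
    for draw in data:               # ways multiply across independent draws
        count *= blue if draw == 'B' else white
    ways[f"{blue} blue"] = count

total = sum(ways.values())
plausibility = {name: w / total for name, w in ways.items()}
for name, w in ways.items():
    print(name, w, round(plausibility[name], 3))
```

Compositions with more ways to produce the data (here "3 blue", with 3·1·3 = 9 ways) come out most plausible, which is exactly step 3 of the recipe above.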
- 81. Bayesian Updating in 3 easy steps! 1. Make a model for how observations can happen; do this for each possible explanation. 2. Count all the ways the data can happen under each explanation. 3. The relative values from step 2 give the relative plausibilities.
- 82. What proportion of the earth is covered by water? demo
- 83. 1. For each possible explanation of the data 2. Count all the ways data can happen 3. Make a ratio Get more Evidence and Update
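The three steps plus updating can be sketched as a grid approximation for the water-proportion demo; the toss sequence below is made up:

```python
# Grid approximation for "what proportion of the earth is water?".
# Each candidate proportion p is an explanation; the binomial term counts
# the (relative) ways the data can happen; normalizing makes the ratio.
grid = [i / 100 for i in range(101)]   # candidate proportions 0.00 .. 1.00
prior = [1.0] * len(grid)              # flat prior
data = "WLWWWLWLW"                     # made-up tosses: W = water, L = land
w, n = data.count("W"), len(data)

unnorm = [pr * (p ** w) * ((1 - p) ** (n - w)) for p, pr in zip(grid, prior)]
total = sum(unnorm)
posterior = [u / total for u in unnorm]

best = grid[posterior.index(max(posterior))]
print(best)  # posterior mode sits near w/n = 2/3
```

Getting more evidence and updating just means reusing `posterior` as the next round's `prior` before folding in the new tosses.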
- 84. Bayes enables updating beliefs by (fancy) counting
- 85. Bayes enables updating beliefs using MCMC
- 86. When to use Bayesian inference? ● Data is limited ○ Updating Bayesian estimates converge with modest data support ○ In large sample size limit, Bayesian and Frequentist converge ● Uncertainty is important ○ Bayesian estimates are distributions ○ Probabilities are conducive to confidence metrics ● Quantifiable prior beliefs ○ We know something about the world ahead of time ○ Easy to update if what we know changes
- 87. End of Session 1 - Questions?
- 89. Take-Two: Parachute Division. Strauss just announced that Take-Two is going to pivot its Zynga Mobile division to solely manufacturing parachute toys, called ‘Chutes. Our new mission is to “Connect the world through ‘Chutes”. Manufacturing has already started, and we have the first batch fresh off the assembly line. Uh oh! In the frenzy of the re-org, we forgot to characterize the aerodynamic performance of this toy, namely its drag coefficient, which is required by the World Aerial Toy Association. Legal says we are vulnerable to huge fines if we can’t provide the drag coefficient in our Terms of Service by the time we go live with our new toy. We have 1 hour to solve this problem! Our goal is to devise an experiment that uses a Bayesian approach to quickly and confidently estimate the ‘Chute’s drag coefficient.
- 90. Aerodynamics Overview. Luckily, Take-Two has a resident Rocket Scientist on retainer. Dr. Ryan has offered to give some background aerodynamics information to get us started. First, let’s draw a Free Body Diagram: Weight (W) acting down, Drag (D) acting up.
- 91. Aerodynamics Overview. Our manufacturer provided the weight measurement. Equation for drag force: D = ½ ρ V² A Cd, where ρ = density of air, A = reference area, V = velocity, and Cd = drag coefficient. What is the stable, unaccelerated state of a ‘Chute? When the drag force balances the weight, we are gliding at a terminal velocity. Let’s assume this happens fairly quickly after the initial drop.
- 92. Aerodynamics Overview. Once a ‘Chute is at terminal velocity, Drag = Weight, so we can solve for V: V = √(2W / (ρ A Cd)). Great, but how can we find Cd? What if we try to measure V through an experiment?
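A quick numeric check of the terminal-velocity relation; the weight, air density, area, and Cd values below are made-up placeholders, not real ‘Chute numbers:

```python
import math

def terminal_velocity(weight, rho, area, cd):
    """V such that drag (1/2 * rho * V^2 * A * Cd) equals weight."""
    return math.sqrt(2 * weight / (rho * area * cd))

# Placeholder numbers: 0.5 N toy, sea-level air density, 0.05 m^2 canopy.
v = terminal_velocity(weight=0.5, rho=1.225, area=0.05, cd=1.5)
print(round(v, 2))  # terminal velocity in m/s
```

Plugging V back into ½ ρ V² A Cd recovers the weight exactly, which is the balance condition the slide states.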
- 93. Drag Coefficient Experiment. Proposed experiment: drop from a known height and record the glide time. We know the height, the time, and the weight, so the only unknown is Cd. Frequentist approach: measure a bunch of distances and times, calculate Cd for each, and report the average value.
- 94. Bayesian Approach Our experiment generates data. Our unknown random variable 𝛉 is Cd. Likelihood: For a given value of Cd, what’s the probability of generating our data? We can use our physics model! Prior: Our best guess of the probability distribution of a ‘Chute’s drag coefficient. Posterior: Updated guess of Cd, updated given the set of data we collected.
- 95. Bayesian Approach. Likelihood: the probability of the recorded times given Cd, using our physics model (our experimental error recording time t is normally distributed*). Prior: provided by our expert consultant. Posterior: the PyMC3 package will solve for this, and will give us posterior distributions for Cd.
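The workshop notebook uses PyMC3; as a dependency-free sketch of the same model, here is a grid approximation. The drop height, noise level, prior parameters, and recorded times below are all assumptions, not workshop values:

```python
import math

# Assumed setup (all numbers made up): 2.0 m drop, 0.5 N toy, 0.05 m^2 area,
# timing noise sigma = 0.1 s, and t_model(Cd) = h / V with V = sqrt(2W/(rho*A*Cd)).
H, W_N, RHO, AREA, SIGMA = 2.0, 0.5, 1.225, 0.05, 0.1

def t_model(cd):
    v = math.sqrt(2 * W_N / (RHO * AREA * cd))
    return H / v

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

times = [0.62, 0.58, 0.65, 0.60, 0.63]  # made-up recorded freefall times (s)

grid = [0.5 + 0.01 * i for i in range(201)]        # candidate Cd values 0.5..2.5
prior = [normal_pdf(cd, 1.5, 0.5) for cd in grid]  # assumed "expert" prior

unnorm = []
for cd, pr in zip(grid, prior):
    like = 1.0
    for t in times:                     # likelihood of each timing measurement
        like *= normal_pdf(t, t_model(cd), SIGMA)
    unnorm.append(pr * like)

total = sum(unnorm)
posterior = [u / total for u in unnorm]
cd_mean = sum(cd * p for cd, p in zip(grid, posterior))
print(round(cd_mean, 3))  # posterior mean of Cd
```

PyMC3 does the same job with MCMC instead of a grid, which is what makes the approach scale beyond one parameter.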
- 96. Challenge. Goal: get Strauss and Legal a Bayesian estimate of Cd! Split up into groups of two, and conduct your experiment: 1. Clone this Databricks notebook: https://dbc-019b8f42-900e.cloud.databricks.com/?#notebook/4116061 2. Drop the ‘Chute at least 10 times from a known height and record its freefall time. Enter your data in the Databricks notebook. 3. Notice your prior distribution for Cd, given to you by your Expert Consultant. 4. Run PyMC3 to calculate the posterior estimate of Cd. Record the distribution’s mean and standard deviation. 5. Plot your posterior estimates of Cd after: a. no data (plot the prior), b. 1 data point, c. 5 data points, d. 10 data points. We will reconvene and discuss our results after 45 minutes!
- 97. Reference Material ● Bayesian parameter estimation examples: https://epubs.siam.org/doi/pdf/10.1137/100788604 https://cimec.org.ar/ojs/index.php/mc/article/download/5564/5542 https://arxiv.org/abs/2104.08621 ● A great Bayes intro w/ Regression examples: https://bayesball.github.io/BOOK/ https://www.bayesrulesbook.com/ ● Bayesian A/B testing POC @ Zynga: https://github-ca.corp.zynga.com/kryan/BayesRozesPOC
- 98. Appendix
- 107. MCMC
- 108. MCMC: you can evaluate the function, but you cannot draw samples from it directly.
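A minimal random-walk Metropolis sketch of that idea: we only ever evaluate the (log) density, never sample it directly. The standard-normal target is just for illustration:

```python
import math
import random

def metropolis(log_f, x0, steps=20000, scale=1.0, seed=0):
    """Random-walk Metropolis: sample from f given only log_f evaluations."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(steps):
        proposal = x + rng.gauss(0, scale)
        # Accept with probability min(1, f(proposal) / f(x)).
        if math.log(rng.random() + 1e-300) < log_f(proposal) - log_f(x):
            x = proposal
        samples.append(x)
    return samples

# Target: unnormalized standard normal. We pretend we can only evaluate it.
samples = metropolis(lambda x: -0.5 * x * x, x0=0.0)
mean = sum(samples) / len(samples)
print(round(mean, 2))  # should land near the target's mean of 0
```

Note that the chain never needs the normalizing constant of f, which is exactly why MCMC pairs well with the unnormalized posterior P(H) P(E|H).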
- 120. Demo: chi-feng.github.io
- 126. When to use Bayesian inference? ● When data is lacking ○ Frequentist: large error bars, poor estimates ○ By making more assumptions, we get to leverage more information ○ In large sample size limit, Bayesian and Frequentist approaches converge ● When you desire levels of belief over True/False statements ○ Bayesian approach still has distribution for parameter, even for large sample size ○ Bayesian interpretation of probability is more intuitive to people
- 127. Bayes’ Theorem
- 130. Bayes’ Theorem, annotated: posterior P(H|E), likelihood P(E|H), evidence P(E); E = observations, data, features; H = outcome, label.
- 131. Bayesian POV ● Experiment → prior notion + data = new (posterior) notion ● Unknown parameters → associated probabilities interpreted as “belief” in truth ● Probabilities → how well a proposition is supported by the data provided as evidence for it