- 1. Hello, is it Bayes you’re looking for?
- 3. Bayes’ Theorem — likelihood, evidence, posterior. Observations, data, features → outcome, label
- 4. Bayesian POV ● Experiment → prior notion + data = new (posterior) notion ● Unknown parameters → associated probabilities interpreted as “belief” in truth ● Probabilities → how well a proposition is supported by the data provided as evidence for it
- 6. McElreath, Richard. Statistical rethinking: A Bayesian course with examples in R and Stan. Chapman and Hall/CRC, 2018.
- 37. Globe of Forking Water. For each possible proportion of water, count the number of ways the data could happen. Must state how observations are generated
- 38. Toss The First
- 41. Observe W
- 42. Toss The Second: relative plausibility of L × relative plausibility of W = relative plausibility of LW
- 46. (1) No minimum sample size
- 48. (3) No point estimate (mean, mode). The distribution is the estimate. Always use the entire distribution
- 49. (4) No one true interval. Intervals communicate the shape of the posterior. [Figure: posterior density over proportion water, 50% interval]
- 50. (4) No one true interval. 95% is obvious superstition; nothing magical happens at the boundary. [Figure: posterior density over proportion water, 99% interval]
- 53. Computing the posterior: 1. Analytical approach (often impossible) 2. Grid approximation (very intensive) 3. Quadratic approximation (limited) 4. Markov chain Monte Carlo (intensive)
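Option 2 above, grid approximation, is easy to sketch for the globe-tossing model. A minimal Python sketch, assuming 6 waters observed in 9 tosses and a flat prior (the data values are illustrative, not taken from the slides):

```python
import math

import numpy as np

# Grid approximation of the posterior for the globe-tossing model.
# Assumed data: 6 waters (W) in 9 tosses; flat prior over the proportion p.
p_grid = np.linspace(0, 1, 1000)                    # candidate proportions of water
prior = np.ones_like(p_grid)                        # flat prior
likelihood = math.comb(9, 6) * p_grid**6 * (1 - p_grid)**3
posterior = likelihood * prior
posterior /= posterior.sum()                        # normalize to sum to 1

print(p_grid[np.argmax(posterior)])                 # posterior mode, near 6/9
```

Because the grid is finite, the whole posterior is carried around as an array, which matches the "use the entire distribution" advice above.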
- 54. MCMC. 29.1 The problems to be solved. Monte Carlo methods are computational techniques that make use of random numbers. The aims of Monte Carlo methods are to solve one or both of the following problems. Problem 1: to generate samples {x^(r)}, r = 1…R, from a given probability distribution P(x). Problem 2: to estimate expectations of functions under this distribution, for example Φ = ⟨φ(x)⟩ ≡ ∫ φ(x) P(x) d^N x. (29.3) The probability distribution P(x), which we call the target density, might be a distribution from statistical physics or a conditional distribution arising in data modelling, for example the posterior probability of a model’s parameters given some observed data. We will generally assume that x is an N-dimensional vector with real components x_n, but we will sometimes consider discrete spaces also.
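Problem 2 is what the samples from Problem 1 are for: the integral in (29.3) is replaced by an average over samples. A minimal sketch, assuming a standard normal target P(x) and φ(x) = x², whose exact expectation is 1:

```python
import random

# Monte Carlo estimate of Phi = <phi(x)> = ∫ phi(x) P(x) dx,
# assuming P(x) is standard normal and phi(x) = x^2 (exact answer: 1).
rng = random.Random(0)
R = 200_000
phi_hat = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(R)) / R
print(phi_hat)  # close to 1
```

Here the target is easy to sample from directly; MCMC is needed precisely when it is not.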
- 55. MCMC. You can evaluate the function, but you cannot draw samples from the function. Figure 29.1. (a) The function P*(x) = exp(0.4(x − 0.4)² − 0.08x⁴). How to draw samples from this density? (b) The function P*(x) evaluated at a discrete set of uniformly spaced points {x_i}. How to draw samples from this discrete distribution? Lake analogy: Z = ∫∫ dx dy P*(x) is the volume of the lake. You are provided with a satellite navigation system and a plumbline. Using the navigator, you can take your boat to any desired location x on the map; using the plumbline you can measure P*(x) at that point. You can also measure the plankton concentration there. Problem 1 is to draw 1 cm³ water samples at random from the lake, in such a way that each sample is equally likely to come from any point within the lake. Problem 2 is to find the average plankton concentration. These are difficult problems to solve because at the outset we know nothing about the depth P*(x). Perhaps much of the volume of the lake is contained in narrow, deep underwater canyons (Figure 29.3, a slice through a lake that includes some canyons), in which case, to correctly sample from the lake and correctly estimate Φ, our method must implicitly discover the canyons and find their volume relative to the rest of the lake.
- 66. MCMC strategies • Metropolis: granddaddy of them all • Metropolis–Hastings (MH): more general • Gibbs sampling (GS): efficient version of MH • Metropolis and Gibbs are “guess and check” strategies • Hamiltonian Monte Carlo (HMC) is fundamentally different • New methods being developed, but the future belongs to the gradient
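The “guess and check” character of Metropolis fits in a few lines. A minimal random-walk Metropolis sketch, assuming a standard normal target and an illustrative proposal width:

```python
import math
import random

# Minimal random-walk Metropolis sketch. Target: standard normal,
# so log P*(x) = -x^2 / 2 up to a constant. Proposal width is illustrative.
def metropolis(n_samples, width=1.0, seed=0):
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n_samples):
        proposal = x + rng.uniform(-width, width)           # guess
        log_ratio = 0.5 * (x * x - proposal * proposal)     # log P*(prop) - log P*(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):    # check
            x = proposal
        out.append(x)                                       # keep current state either way
    return out

samples = metropolis(50_000)
```

Note that only the unnormalized density P*(x) is needed, which is exactly the situation described in the MacKay excerpt above.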
- 67. chi-feng.github.io DEMO
- 68. Hamiltonian Monte Carlo • Problem with Gibbs sampling (GS) • High-dimensional spaces are concentrated • GS gets stuck, degenerates towards a random walk • Inefficient because it re-explores • Hamiltonian dynamics to the rescue • represent the parameter state as a particle • flick it around the frictionless log-posterior • record positions • no more “guess and check” • all proposals are good proposals. William Rowan Hamilton (1805–1865), commemorated on an Irish euro coin
- 69. Hamiltonian parable • King Monty’s kingdom is a narrow valley running north–south • The population distribution is inversely proportional to altitude • Algorithm: • Start driving randomly N or S at a random speed • The car speeds up as it goes downhill • The car slows as it goes uphill, and might turn around • Drive for a pre-specified duration, then stop • Repeat • Stopping positions will be proportional to population
- 71. [Figure: stopping position (south to north) versus time, 0–300, for the Hamiltonian parable]
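The parable maps directly onto HMC: position is the parameter, the random drive is a random momentum “flick”, and the terrain is the negative log-posterior. A minimal 1-D HMC sketch, assuming a standard normal target and illustrative (untuned) step size and path length:

```python
import math
import random

# Minimal 1-D Hamiltonian Monte Carlo sketch for a standard normal target.
# U(q) = -log P(q) = q^2 / 2 up to a constant, so grad U(q) = q.
def hmc_sample(n_samples, step=0.15, n_leapfrog=20, seed=0):
    rng = random.Random(seed)
    q, out = 0.0, []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)            # flick: draw a random momentum
        q_new, p_new = q, p
        # leapfrog integration of the Hamiltonian dynamics
        p_new -= 0.5 * step * q_new        # half step for momentum
        for _ in range(n_leapfrog - 1):
            q_new += step * p_new          # full step for position
            p_new -= step * q_new          # full step for momentum
        q_new += step * p_new
        p_new -= 0.5 * step * q_new
        # accept/reject on the change in total energy H = U + K
        h_old = 0.5 * q * q + 0.5 * p * p
        h_new = 0.5 * q_new * q_new + 0.5 * p_new * p_new
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            q = q_new
        out.append(q)
    return out

samples = hmc_sample(20_000)
```

Because the leapfrog integrator nearly conserves energy, almost every proposal is accepted, which is the sense in which “all proposals are good proposals”.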
- 73. When to use Bayesian inference? ● When data is lacking ○ Frequentist: large error bars, poor estimates ○ By making more assumptions, we get to leverage more information ○ In large sample size limit, Bayesian and Frequentist approaches converge ● When you desire levels of belief over True/False statements ○ Bayesian approach still has distribution for parameter, even for large sample size ○ Bayesian interpretation of probability is more intuitive to people

- At the end of the day we want a probability density function: proportion on the x-axis, density on the y-axis, and the area under the curve must equal 1.
- The way we want to answer those questions is through Bayesian data analysis. It is a humble approach to inference, because all it does is the following; every Bayesian model instantiates this approach. This water problem is complicated: each possible explanation of the data is a proportion, and there is an infinite number of ratios between 0 and 1.
- Only four marbles; some are blue and some are white. We don’t know how many of each kind. We sample with replacement. Given our observations, how can we make an inference about the contents of the bag?
- Assume the second possibility is true: not forever, just for right now.
- The second draw branches off from the first: 4 + 4 + 4 + 4 = 16 possibilities.
- OK, so why are we doing this? Because now we have a visual device that lets us count all the ways the data could have occurred, assuming the contents of the bag were 1 blue marble and 3 white marbles. We will look at the other possibilities next, but for now we stick to that assumption. Let’s trace that out.
- All the other possibilities are foreclosed because the first draw was blue.
- So let’s summarize, because that is the goal. We will assume each of these in turn and compare the counts; that is essentially Bayesian inference, though usually it is done with complicated integrals.
- A typical Bayesian analysis looks fancier and the calculations are automated, but this is the logic behind it.
- Why do this? Because it is unreasonably effective, and it is nothing more than logic: state assumptions; those assumptions have implications for the data and observations; assumptions that are more compatible with the observations are considered more plausible. To summarize, we have all the explanations here in a column.
- We give each explanation a parameter, or index, based on the proportion of blue marbles in that explanation. This will tie in with the water-on-Earth problem.
- The relative values of the counts are important, not the absolute values. The way to deal with the arbitrary combinatorics of data sets is to normalize: scale the counts so they sum to one.
- In any reasonably sized data set the numbers would get huge, so we need to normalize.
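The counting-and-normalizing logic from the notes above fits in a few lines. A sketch assuming the bag holds 4 marbles and the observed draws are blue, white, blue (the data sequence is an assumption for illustration):

```python
from fractions import Fraction

# Count the ways each bag composition could produce the observed draws
# (sampling with replacement), then normalize counts into plausibilities.
data = ["blue", "white", "blue"]
ways = []
for n_blue in range(5):                       # conjectures: 0..4 blue marbles
    count = 1
    for marble in data:
        count *= n_blue if marble == "blue" else 4 - n_blue
    ways.append(count)

total = sum(ways)                             # normalize so plausibilities sum to 1
plausibility = [Fraction(w, total) for w in ways]
print(ways)           # [0, 3, 8, 9, 0]
print(plausibility)   # [0, 3/20, 2/5, 9/20, 0]
```

Only the ratios between the counts matter; after normalizing, the same plausibilities would come out no matter how huge the raw counts grew.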