Learn how to implement Bayesian workflows using CmdStanPy (a Python interface for Stan). In this hands-on workshop, we will work with a very fun (surprise!) dataset and make predictions using Bayesian methods.
CmdStanPy allows Pythonistas to add the power of Bayesian inference to their toolkit via a small set of functions and objects designed to use minimal memory and parallelize computation. Given a dataset and a statistical model written as a Stan program, CmdStanPy compiles the model, runs Stan’s MCMC sampler (via CmdStan) to obtain a sample from the posterior, and assembles this sample as a NumPy ndarray or pandas DataFrame for downstream visualization and analysis.
Mitzi Morris is a member of the Stan development team and the developer of CmdStanPy ( https://mc-stan.org/about/team/ ).
She has worked as a software engineer in both academia and industry. She started out writing tools for Natural Language Processing in C and Java, then moved to genomics and biomedical informatics, where she built pipelines for high-throughput sequencing and electronic medical records data, all of which led to an increased interest in doing more and better statistics. She has been a Stan contributor since 2014 and joined the Stan team at Columbia in 2017.
Workshop on Bayesian Workflows with CmdStanPy by Mitzi Morris
1. Introduction to Bayesian Workflows with CmdStanPy
Mitzi Morris
Stan Development Team
Columbia University, New York NY
September 10, 2019
2. Talk Outline
• Audience survey
• A few words about Bayesian Data Analysis
• A few words about Stan and CmdStanPy
• Let’s do some Data Analysis!
3. Bayesian Data Analysis
• “Statistics is applied statistics and Bayesian data analysis is statistics using conditional probability” - Andrew Gelman
• “By Bayesian data analysis, we mean practical methods for making inferences from data using probability models for quantities we observe and about which we wish to learn.”
• “The essential characteristic of Bayesian methods is their explicit use of probability for quantifying uncertainty in inferences based on statistical analysis.”
- Gelman et al., Bayesian Data Analysis, 3rd edition, 2013
4. 2019 FIFA Women’s World Cup
We wish to learn WHO WILL WIN?
The Data
• Soccer Power Index (SPI) before the tournament - estimate of team rank going into the World Cup
• Final scores from all the matches through the quarter finals
5. Statistical Modeling Terminology
• y - data
• θ - parameters
• p(y, θ) - joint probability distribution of the data and parameters
• p(y | θ) - conditional probability of the data given the parameters
• if y is fixed, this is the likelihood function
• if θ is fixed, this is the sampling distribution
• p(θ | y) - posterior probability distribution - the probability of the parameters given the data
• p(θ) - prior probability distribution - the probability of the parameters before any data are observed
• p(ỹ | y) - posterior predictive distribution - the probability of new data (ỹ) conditioned on observed data (y)
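The likelihood-versus-sampling-distribution distinction above can be sketched in a few lines of Python; the data values (6 successes in 10 Bernoulli trials) are invented for illustration:

```python
import numpy as np

# p(y | theta) for n Bernoulli trials with k successes,
# dropping the constant that does not depend on theta.
def bernoulli_prob(k: int, n: int, theta: float) -> float:
    return theta**k * (1 - theta) ** (n - k)

# Fix the data (k = 6 successes in n = 10 trials) and vary theta:
# p(y | theta) read this way is the likelihood function.
thetas = np.linspace(0.01, 0.99, 99)
likelihood = [bernoulli_prob(6, 10, t) for t in thetas]

# The likelihood peaks at theta = k/n = 0.6.
best_theta = float(thetas[int(np.argmax(likelihood))])
```

Reading p(y | θ) with θ fixed instead (a probability over possible datasets y) gives the sampling distribution.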
6. Bayes’s Rule and how we use it
Relates the posterior probability to the joint probability:

p(θ | y) = p(y, θ) / p(y)        [def. of conditional probability]
         = p(y | θ) p(θ) / p(y)  [rewrite joint probability as conditional]

Because the factor p(y) doesn’t depend on θ and is constant for fixed y, it acts as a proportionality constant and can be omitted; therefore all we need to compute is:

p(θ | y) ∝ p(y | θ) p(θ)         [unnormalized posterior density]
The posterior is proportional to the prior times the likelihood
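For a model simple enough to evaluate on a grid, this proportionality can be checked directly. A minimal sketch, assuming a beta(1,1) prior and invented Bernoulli data (6 successes in 10 trials):

```python
import numpy as np

theta = np.linspace(0.001, 0.999, 999)     # grid over the parameter space
prior = np.ones_like(theta)                # beta(1,1) is uniform on (0,1)
likelihood = theta**6 * (1 - theta) ** 4   # p(y | theta), constant dropped
unnormalized = likelihood * prior          # p(y | theta) p(theta)

# Normalizing numerically plays the role of dividing by p(y).
posterior = unnormalized / unnormalized.sum()
post_mean = float((theta * posterior).sum())
# The analytic posterior here is beta(7, 5), whose mean is 7/12 ≈ 0.583.
```

Grid approximation only works in a few dimensions; for realistic models this is exactly where MCMC comes in.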
7. “quantifying uncertainty in inferences”
The posterior is proportional to the prior times the likelihood
p(θ|y) ∝ p(y|θ) p(θ)
• We can compute the mean, median, mode
of the posterior probability function.
• Quantiles of the posterior probability function
provide credible intervals.
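In code, credible intervals are just quantiles of posterior draws. A sketch using draws from a known beta(7, 5) distribution (an invented example) as a stand-in for MCMC output:

```python
import numpy as np

rng = np.random.default_rng(42)
draws = rng.beta(7, 5, size=100_000)   # pretend these are posterior draws

post_mean = float(draws.mean())        # point estimates from the draws
post_median = float(np.median(draws))
# central 95% credible interval: 2.5% and 97.5% quantiles
lo, hi = np.percentile(draws, [2.5, 97.5])
```

With real MCMC output the computation is identical; only the source of the draws changes.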
8. Bayesian Workflow
Simple workflow:
• (Data gathering, preliminary data analysis)
• Build the full joint probability model - use everything you know
about the world and the data
• Fit the model to the data (using Stan!)
• Evaluate the fit:
• how good is the fit?
• do the predictions make sense?
• how sensitive are the results to the modeling assumptions?
Full workflow - model expansion and model comparison - many
iterations of the simple workflow
9. Stan - the man, the language, the software
• Named after Stanislaw Ulam - originator of Monte Carlo (MC)
estimation techniques
• Probabilistic programming language
• Stan NUTS-HMC sampler - Markov Chain Monte Carlo
(MCMC) sampler
• Rich ecosystem of downstream analysis packages (but not enough in Python!)
• Open-source - https://github.com/stan-dev/stan
10. Stan Programming Language: example model bernoulli.stan
data {
  int<lower=0> N;
  int<lower=0,upper=1> y[N];
}
parameters {
  real<lower=0,upper=1> theta;
}
model {
  theta ~ beta(1,1);
  y ~ bernoulli(theta);
}
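Fitting bernoulli.stan from Python might look like the sketch below. The data values are invented, and the API shown (CmdStanModel, .sample(), .summary()) is from current CmdStanPy releases, which may differ slightly from the version demonstrated at the workshop; the try/except lets the sketch degrade gracefully when CmdStan is not installed.

```python
# Illustrative data: N Bernoulli trials (values invented for this sketch).
data = {"N": 10, "y": [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]}

try:
    from cmdstanpy import CmdStanModel

    model = CmdStanModel(stan_file="bernoulli.stan")  # compiles if needed
    fit = model.sample(data=data, chains=4)           # run the MCMC sampler
    print(fit.summary())                              # posterior summary, incl. theta
except Exception:
    pass  # cmdstanpy / CmdStan / bernoulli.stan not available here
```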
11. Monte Carlo Simulation: Calculate π
Computing π = 3.14... via simulation is the textbook application of
Monte Carlo methods.
• Generate points (x, y) uniformly at random within range (-1, 1)
• Calculate proportion within unit circle: x² + y² < 1
• Area of the square is 4
• Area of a circle is π r²
• Area of the unit circle is π
• Ratio of points inside circle to total points is π / 4
• π ≈ 4 × (points inside circle / total points)
12. Monte Carlo Simulation: Calculate π using Python
import numpy as np

def estimate_pi(n: int) -> float:
    xs = np.random.uniform(-1, 1, n)
    ys = np.random.uniform(-1, 1, n)
    # squared distance to the origin for each point
    dist_to_origin = [x**2 + y**2 for x, y in zip(xs, ys)]
    in_circle = sum(dist < 1 for dist in dist_to_origin)
    pi = float(4 * (in_circle / n))
    return pi
N           Pi estimate   elapsed time (s)
100         3.500         0.0008
10000       2.150         0.0300
1000000     3.139         3.2000
100000000   3.141         323.8000
13. Markov Chain Monte Carlo (MCMC)
• Standard MC estimation uses a set of independent, identically distributed (i.i.d.) draws according to probability function p(θ), e.g. np.random.uniform(-1,1,n).
• For models where the prior and likelihood are complex functions, we cannot compute this directly.
• A Markov Chain is a sequence of draws where the conditional probability of each draw depends only on the previous draw.
• Markov Chain Monte Carlo is a random sample of draws from a Markov Chain.
• This requires that the Markov Chain has converged to a stationary state.
• Warmup is the process of getting to convergence.
• If the chain has not converged, your sample is not valid.
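The ideas above - draws that depend only on the previous draw, warmup, convergence - can be seen in a toy Metropolis sampler, a much simpler MCMC algorithm than Stan's NUTS-HMC. The target here is a standard normal density known only up to a constant, mirroring the unnormalized posterior:

```python
import numpy as np

def log_target(theta: float) -> float:
    return -0.5 * theta**2  # log density, additive constant dropped

rng = np.random.default_rng(0)
theta, draws = 5.0, []      # deliberately bad starting point
for _ in range(25_000):
    # the proposal depends only on the current draw: a Markov Chain
    proposal = theta + rng.normal(0.0, 1.0)
    if np.log(rng.uniform()) < log_target(proposal) - log_target(theta):
        theta = proposal    # accept; otherwise keep the current draw
    draws.append(theta)

# Discard warmup: early draws reflect the starting point, not the target.
sample = np.array(draws[5_000:])
```

After warmup, the sample mean and standard deviation should be close to 0 and 1; the first few hundred draws, still wandering in from θ = 5, would bias both.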
14. Stan’s secret sauce: HMC-NUTS sampler
• Hamiltonian Monte Carlo - algorithm for efficient MCMC
sampling.
• Not actually secret: same algorithm used in PyMC3 and
Edward
• References and tutorials:
• Hoffman and Gelman, 2014
• Monnahan, 2016 - start here
• Stan User’s Guide
• Michael Betancourt tutorials and videos
15. CmdStanPy
• Designed to be lightweight
• minimal package dependencies
• minimal use of in-memory data structures
• good for production workflows
• Keeps up with latest Stan release
• BSD license
• Requirements:
• Python 3
• C++ toolchain (comes with Anaconda or Xcode; PR in progress for Windows installs)
16. Let’s do some data analysis!
Repository of models, data, and IPython notebooks:
• https://github.com/nyc-pyladies/2019-cmdstanpy-bayesian-workshop
Just the IPython notebook, run in Google Colab:
• http://bit.ly/2m7DUjP
17. Massive Thanks!
NYC PyLadies, especially:
• Nitya Mandyam
• Melissa Ferrari
• Felice Ho
NYC WiMLDS, especially:
• Reshama Shaikh
Paris WiMLDS, especially:
• Caroline Chavier
Everyone who asked a question - keep on questioning!