Learn how to implement Bayesian workflows using CmdStanPy (a Python interface for Stan). In this hands-on workshop, we will work with a very fun (surprise!) dataset and make predictions using Bayesian methods.
CmdStanPy allows Pythonistas to add the power of Bayesian inference to their toolkit via a small set of functions and objects designed to use minimal memory and parallelize computation. Given a dataset and a statistical model written as a Stan program, CmdStanPy compiles the model, runs Stan’s MCMC sampler (via CmdStan) to obtain a sample from the posterior, and assembles this sample as a NumPy ndarray or pandas DataFrame for downstream visualization and analysis.
Mitzi Morris is a member of the Stan development team and the developer of CmdStanPy ( https://mc-stan.org/about/team/ ).
She has worked as a software engineer in both academia and industry. She started out writing tools for Natural Language Processing in C and Java, then moved to genomics and biomedical informatics, where she built pipelines for high-throughput sequencing and electronic medical records data, all of which led to an increased interest in doing more and better statistics. She has been a Stan contributor since 2014 and joined the Stan team at Columbia in 2017.
Workshop on Bayesian Workflows with CmdStanPy by Mitzi Morris
1. Introduction to Bayesian Workflows with CmdStanPy
Mitzi Morris
Stan Development Team
Columbia University, New York NY
September 10, 2019
2. Talk Outline
• Audience survey
• A few words about Bayesian Data Analysis
• A few words about Stan and CmdStanPy
• Let’s do some Data Analysis!
3. Bayesian Data Analysis
• “Statistics is applied statistics and Bayesian data analysis is statistics using conditional probability” - Andrew Gelman
• “By Bayesian data analysis, we mean practical methods for making inferences from data using probability models for quantities we observe and about which we wish to learn.”
• “The essential characteristic of Bayesian methods is their explicit use of probability for quantifying uncertainty in inferences based on statistical analysis.”
- Gelman et al., Bayesian Data Analysis, 3rd edition, 2013
4. 2019 FIFA Women’s World Cup
We wish to learn WHO WILL WIN?
The Data
• Soccer Power Index (SPI) before the tournament - estimate of team rank going into the World Cup
• Final scores from all the matches through the quarter finals
5. Statistical Modeling Terminology
• y - data
• θ - parameters
• p(y, θ) - joint probability distribution of the data and parameters
• p(y | θ) - conditional probability of the data given the parameters
• if y is fixed, this is the likelihood function
• if θ is fixed, this is the sampling distribution
• p(θ | y) - posterior probability distribution - the probability of the parameters given the data
• p(θ) - prior probability distribution - the probability of the parameters before any data are observed
• p(ỹ | y) - posterior predictive distribution - the probability of new data (ỹ) conditioned on observed data (y)
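The likelihood-versus-sampling-distribution distinction above can be sketched in a few lines of Python; the data values (6 successes in 10 Bernoulli trials) are invented for illustration:

```python
import numpy as np

# p(y | theta) for n Bernoulli trials with k successes,
# dropping the constant that does not depend on theta.
def bernoulli_prob(k: int, n: int, theta: float) -> float:
    return theta**k * (1 - theta) ** (n - k)

# Fix the data (k = 6 successes in n = 10 trials) and vary theta:
# p(y | theta) read this way is the likelihood function.
thetas = np.linspace(0.01, 0.99, 99)
likelihood = [bernoulli_prob(6, 10, t) for t in thetas]

# The likelihood peaks at theta = k/n = 0.6.
best_theta = float(thetas[int(np.argmax(likelihood))])
```

Reading p(y | θ) with θ fixed instead (a probability over possible datasets y) gives the sampling distribution.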
6. Bayes’s Rule and how we use it
Relates the posterior probability to the joint probability:

p(θ | y) = p(y, θ) / p(y)        [def. of conditional probability]
         = p(y | θ) p(θ) / p(y)  [rewrite joint probability as conditional]

Because the factor p(y) doesn’t depend on θ and is constant for fixed y, it acts as a proportionality constant and can be omitted; therefore all we need to compute is:

p(θ | y) ∝ p(y | θ) p(θ)         [unnormalized posterior density]
The posterior is proportional to the prior times the likelihood
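For a model simple enough to evaluate on a grid, this proportionality can be checked directly. A minimal sketch, assuming a beta(1,1) prior and invented Bernoulli data (6 successes in 10 trials):

```python
import numpy as np

theta = np.linspace(0.001, 0.999, 999)     # grid over the parameter space
prior = np.ones_like(theta)                # beta(1,1) is uniform on (0,1)
likelihood = theta**6 * (1 - theta) ** 4   # p(y | theta), constant dropped
unnormalized = likelihood * prior          # p(y | theta) p(theta)

# Normalizing numerically plays the role of dividing by p(y).
posterior = unnormalized / unnormalized.sum()
post_mean = float((theta * posterior).sum())
# The analytic posterior here is beta(7, 5), whose mean is 7/12 ≈ 0.583.
```

Grid approximation only works in a few dimensions; for realistic models this is exactly where MCMC comes in.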
7. “quantifying uncertainty in inferences”
The posterior is proportional to the prior times the likelihood
p(θ|y) ∝ p(y|θ) p(θ)
• We can compute the mean, median, mode
of the posterior probability function.
• Quantiles of the posterior probability function
provide credible intervals.
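In code, credible intervals are just quantiles of posterior draws. A sketch using draws from a known beta(7, 5) distribution (an invented example) as a stand-in for MCMC output:

```python
import numpy as np

rng = np.random.default_rng(42)
draws = rng.beta(7, 5, size=100_000)   # pretend these are posterior draws

post_mean = float(draws.mean())        # point estimates from the draws
post_median = float(np.median(draws))
# central 95% credible interval: 2.5% and 97.5% quantiles
lo, hi = np.percentile(draws, [2.5, 97.5])
```

With real MCMC output the computation is identical; only the source of the draws changes.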
8. Bayesian Workflow
Simple workflow:
• (Data gathering, preliminary data analysis)
• Build the full joint probability model - use everything you know
about the world and the data
• Fit the model to the data (using Stan!)
• Evaluate the fit:
• how good is the fit?
• do the predictions make sense?
• how sensitive are the results to the modeling assumptions?
Full workflow - model expansion and model comparison - many
iterations of the simple workflow
9. Stan - the man, the language, the software
• Named after Stanislaw Ulam - originator of Monte Carlo (MC)
estimation techniques
• Probabilistic programming language
• Stan NUTS-HMC sampler - Markov Chain Monte Carlo
(MCMC) sampler
• Rich ecosystem of downstream analysis packages (but not enough in Python!)
• Open-source - https://github.com/stan-dev/stan
10. Stan Programming Language: example model bernoulli.stan
data {
  int<lower=0> N;
  int<lower=0,upper=1> y[N];
}
parameters {
  real<lower=0,upper=1> theta;
}
model {
  theta ~ beta(1,1);
  y ~ bernoulli(theta);
}
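Fitting bernoulli.stan from Python might look like the sketch below. The data values are invented, and the API shown (CmdStanModel, .sample(), .summary()) is from current CmdStanPy releases, which may differ slightly from the version demonstrated at the workshop; the try/except lets the sketch degrade gracefully when CmdStan is not installed.

```python
# Illustrative data: N Bernoulli trials (values invented for this sketch).
data = {"N": 10, "y": [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]}

try:
    from cmdstanpy import CmdStanModel

    model = CmdStanModel(stan_file="bernoulli.stan")  # compiles if needed
    fit = model.sample(data=data, chains=4)           # run the MCMC sampler
    print(fit.summary())                              # posterior summary, incl. theta
except Exception:
    pass  # cmdstanpy / CmdStan / bernoulli.stan not available here
```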
11. Monte Carlo Simulation: Calculate π
Computing π = 3.14... via simulation is the textbook application of
Monte Carlo methods.
• Generate points (x, y) uniformly at random within range (-1, 1)
• Calculate proportion within unit circle: x² + y² < 1
• Area of the square is 4
• Area of a circle is π r²
• Area of the unit circle is π
• Ratio of points inside circle to total points is π / 4
• π ≈ 4 × (points inside circle / total points)
12. Monte Carlo Simulation: Calculate π using Python
import numpy as np

def estimate_pi(n: int) -> float:
    xs = np.random.uniform(-1, 1, n)
    ys = np.random.uniform(-1, 1, n)
    # squared distance to the origin for each point
    dist_to_origin = [x**2 + y**2 for x, y in zip(xs, ys)]
    in_circle = sum(dist < 1 for dist in dist_to_origin)
    pi = float(4 * (in_circle / n))
    return pi
N           Pi estimate   elapsed time (s)
100         3.500         0.0008
10000       2.150         0.0300
1000000     3.139         3.2000
100000000   3.141         323.8000
13. Markov Chain Monte Carlo (MCMC)
• Standard MC estimation uses a set of independent, identically distributed (i.i.d.) draws according to probability function p(θ), e.g. np.random.uniform(-1,1,n).
• For models where the prior and likelihood are complex functions, we cannot compute this directly.
• A Markov Chain is a sequence of draws where the conditional probability of each draw depends only on the previous draw.
• Markov Chain Monte Carlo is a random sample of draws from a Markov Chain.
• This requires that the Markov Chain has converged to a stationary state.
• Warmup is the process of getting to convergence.
• If the chain has not converged, your sample is not valid.
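The ideas above - draws that depend only on the previous draw, warmup, convergence - can be seen in a toy Metropolis sampler, a much simpler MCMC algorithm than Stan's NUTS-HMC. The target here is a standard normal density known only up to a constant, mirroring the unnormalized posterior:

```python
import numpy as np

def log_target(theta: float) -> float:
    return -0.5 * theta**2  # log density, additive constant dropped

rng = np.random.default_rng(0)
theta, draws = 5.0, []      # deliberately bad starting point
for _ in range(25_000):
    # the proposal depends only on the current draw: a Markov Chain
    proposal = theta + rng.normal(0.0, 1.0)
    if np.log(rng.uniform()) < log_target(proposal) - log_target(theta):
        theta = proposal    # accept; otherwise keep the current draw
    draws.append(theta)

# Discard warmup: early draws reflect the starting point, not the target.
sample = np.array(draws[5_000:])
```

After warmup, the sample mean and standard deviation should be close to 0 and 1; the first few hundred draws, still wandering in from θ = 5, would bias both.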
14. Stan’s secret sauce: HMC-NUTS sampler
• Hamiltonian Monte Carlo - algorithm for efficient MCMC
sampling.
• Not actually secret: same algorithm used in PyMC3 and
Edward
• References and tutorials:
• Hoffman and Gelman, 2014
• Monnahan, 2016 - start here
• Stan User’s Guide
• Michael Betancourt tutorials and videos
15. CmdStanPy
• Designed to be lightweight
• minimal package dependencies
• minimal use of in-memory data structures
• good for production workflows
• Keeps up with latest Stan release
• BSD license
• Requirements:
• Python 3
• C++ toolchain (comes with Anaconda or Xcode; PR in progress for Windows installs)
16. Let’s do some data analysis!
Repository of models, data, and IPython notebooks:
• https://github.com/nyc-pyladies/2019-cmdstanpy-bayesian-workshop
Just the IPython notebook, run in Google Colab:
• http://bit.ly/2m7DUjP
17. Massive Thanks!
NYC PyLadies, especially:
• Nitya Mandyam
• Melissa Ferrari
• Felice Ho
NYC WiMLDS, especially:
• Reshama Shaikh
Paris WiMLDS, especially:
• Caroline Chavier
Everyone who asked a question - keep on questioning!