Introduction to Bayesian
Workflows with CmdStanPy
Mitzi Morris
Stan Development Team
Columbia University, New York NY
September 10, 2019
Talk Outline
• Audience survey
• A few words about Bayesian Data Analysis
• A few words about Stan and CmdStanPy
• Let’s do some Data Analysis!
Bayesian Data Analysis
• “Statistics is applied statistics and Bayesian data analysis is
statistics using conditional probability” - Andrew Gelman
• “By Bayesian data analysis, we mean practical methods for
making inferences from data using probability models for
quantities we observe and about which we wish to learn.”
• “The essential characteristic of Bayesian
methods is their explicit use of probability
for quantifying uncertainty in inferences
based on statistical analysis.”
- Gelman et al., Bayesian Data Analysis,
3rd edition, 2013
2019 FIFA Women’s World Cup
We wish to learn WHO WILL WIN?
The Data
• Soccer Power Index (SPI) before the tournament - estimate of
team rank going into the World Cup
• Final scores from all the matches through the quarterfinals
Statistical Modeling Terminology
• y - data
• θ - parameters
• p(y, θ) - joint probability distribution of the data and
parameters
• p(y | θ) - conditional probability of the data given the
parameters
• as a function of θ with y fixed, this is the likelihood function
• as a function of y with θ fixed, this is the sampling distribution
• p(θ | y) - posterior probability distribution - the probability
of the parameters given the data
• p(θ) - prior probability distribution - the probability of the
parameters before any data are observed
• p(ỹ | y) - posterior predictive distribution - the probability of
new data (ỹ) conditioned on observed data (y)
Bayes’s Rule and how we use it
Relates the posterior probability to the joint probability:
$$
\begin{aligned}
p(\theta \mid y) &= \frac{p(y, \theta)}{p(y)} && \text{[definition of conditional probability]} \\
&= \frac{p(y \mid \theta)\, p(\theta)}{p(y)} && \text{[rewrite the joint probability as a conditional]}
\end{aligned}
$$
Because the factor p(y) does not depend on θ and is constant for fixed y,
it acts as a proportionality constant and can be omitted.
Therefore all we need to compute is:
$$
p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta) \qquad \text{[unnormalized posterior density]}
$$
The posterior is proportional to the prior times the likelihood.
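A minimal sketch of this idea in Python (NumPy) shows that dropping p(y) loses nothing. It assumes the Bernoulli likelihood and uniform Beta(1, 1) prior of the example model bernoulli.stan. It also assumes a made-up data vector `y`, and the helper name `grid_posterior` is illustrative. The code evaluates likelihood × prior on a grid of θ values and normalizes numerically only at the end, so the normalizing sum plays the role of p(y).

```python
import numpy as np

def grid_posterior(y, n_grid=1001):
    """Approximate p(theta | y) on a grid for a Bernoulli likelihood
    and a uniform Beta(1, 1) prior (hypothetical helper)."""
    y = np.asarray(y)
    theta = np.linspace(0.0, 1.0, n_grid)
    prior = np.ones_like(theta)                 # Beta(1, 1) density is 1 on [0, 1]
    successes = y.sum()
    failures = y.size - successes
    likelihood = theta**successes * (1.0 - theta)**failures
    unnormalized = likelihood * prior           # p(y | theta) p(theta)
    # Dividing by the sum is the numerical stand-in for dividing by p(y).
    posterior = unnormalized / unnormalized.sum()
    return theta, posterior

# Hypothetical data: 10 Bernoulli trials with 2 ones.
y = [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]
theta, post = grid_posterior(y)
posterior_mean = float(np.sum(theta * post))
print(round(posterior_mean, 3))
```

For this prior and likelihood the exact posterior is Beta(1 + 2, 1 + 8), with mean 3/12 = 0.25, so the grid result can be checked by hand. Grids become impractical as the number of parameters grows, which is where MCMC comes in.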
“quantifying uncertainty in inferences”
The posterior is proportional to the prior times the likelihood
p(θ|y) ∝ p(y|θ) p(θ)
• We can compute the mean, median, and mode
of the posterior distribution.
• Quantiles of the posterior distribution provide credible intervals,
e.g. the 5% and 95% quantiles bound a 90% central credible interval.
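In practice a sampler such as Stan returns posterior draws rather than a formula, and these summaries are computed from the draws. This is a minimal NumPy sketch with one stated assumption: the draws are simulated directly from a Beta(3, 9) distribution as a stand-in for Stan output. That distribution is the posterior for 2 ones in 10 Bernoulli trials under a uniform prior. The function name `summarize` is illustrative. The mode is left out because estimating it from draws needs a density estimate.

```python
import numpy as np

def summarize(draws, interval=0.9):
    """Posterior mean, median, and central credible interval from draws."""
    draws = np.asarray(draws)
    tail = (1.0 - interval) / 2.0
    lower, upper = np.quantile(draws, [tail, 1.0 - tail])
    return {
        "mean": float(np.mean(draws)),
        "median": float(np.median(draws)),
        "lower": float(lower),
        "upper": float(upper),
    }

rng = np.random.default_rng(seed=1)
# Stand-in for MCMC output: independent draws from Beta(3, 9).
draws = rng.beta(3, 9, size=20_000)
print(summarize(draws))
```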
Bayesian Workflow
Simple workflow:
• (Data gathering, preliminary data analysis)
• Build the full joint probability model - use everything you know
about the world and the data
• Fit the model to the data (using Stan!)
• Evaluate the fit:
• how good is the fit?
• do the predictions make sense?
• how sensitive are the results to the modeling assumptions?
Full workflow: model expansion and model comparison, i.e. many
iterations of the simple workflow
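One common way to ask "do the predictions make sense?" is a posterior predictive check. You simulate replicated data ỹ from p(ỹ | y) and compare a summary statistic of the replicates with the same statistic of the observed data. This is a minimal NumPy sketch under stated assumptions: the Bernoulli example with hypothetical data, direct Beta(3, 9) draws standing in for Stan output, and the count of ones as the test statistic. The function name is illustrative. Proportions very close to 0 or 1 would suggest the model does not reproduce that feature of the data.

```python
import numpy as np

def posterior_predictive_pvalue(y, theta_draws, rng):
    """Share of replicated data sets, one per posterior draw of theta,
    whose count of ones is at least the observed count."""
    y = np.asarray(y)
    observed = int(y.sum())
    # Each replicated data set has y.size Bernoulli(theta) outcomes, so its
    # count of ones is Binomial(y.size, theta).
    replicated = rng.binomial(y.size, theta_draws)
    return float(np.mean(replicated >= observed))

rng = np.random.default_rng(seed=2)
y = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 1])    # hypothetical observed data
# Stand-in for Stan output: draws from the exact Beta(1 + 2, 1 + 8) posterior.
theta_draws = rng.beta(1 + y.sum(), 1 + y.size - y.sum(), size=4_000)
print(posterior_predictive_pvalue(y, theta_draws, rng))
```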
Stan - the man, the language, the software
• Named after Stanislaw Ulam - originator of Monte Carlo (MC)
estimation techniques
• Probabilistic programming language
• Stan’s NUTS-HMC sampler - a Markov chain Monte Carlo
(MCMC) sampler
• Rich ecosystem of downstream analysis packages (but not
enough in Python!)
• Open-source - https://github.com/stan-dev/stan
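Stan Programming Language: example model bernoulli.stan
data {
  int<lower=0> N;                    // number of observations
  // Array syntax of Stan >= 2.26; earlier releases wrote int<lower=0,upper=1> y[N];
  array[N] int<lower=0, upper=1> y;  // binary outcomes
}
parameters {
  real<lower=0, upper=1> theta;      // probability that y = 1
}
model {
  theta ~ beta(1, 1);                // uniform prior on theta
  y ~ bernoulli(theta);              // likelihood, vectorized over all N outcomes
}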
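Because the beta prior is conjugate to the Bernoulli likelihood, this model has a closed-form posterior: Beta(1 + number of ones, 1 + number of zeros). That makes it a handy check on sampler output. This is a minimal sketch using only the Python standard library. The data vector and function names are hypothetical.

```python
from fractions import Fraction

def beta_bernoulli_posterior(y, a=1, b=1):
    """Exact Beta posterior parameters for a Bernoulli likelihood and Beta(a, b) prior."""
    successes = sum(y)
    failures = len(y) - successes
    return a + successes, b + failures

def beta_mean_variance(a_post, b_post):
    """Exact mean and variance of a Beta(a_post, b_post) distribution as fractions."""
    total = a_post + b_post
    mean = Fraction(a_post, total)
    variance = Fraction(a_post * b_post, total**2 * (total + 1))
    return mean, variance

y = [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]             # hypothetical data: 2 ones out of 10
a_post, b_post = beta_bernoulli_posterior(y)
print(a_post, b_post, *beta_mean_variance(a_post, b_post))
```

For these data the posterior is Beta(3, 9), with mean 1/4. Draws of theta from Stan should therefore average close to 0.25.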
Monte Carlo Simulation: Calculate π
Computing π = 3.14... via simulation is the textbook application of
Monte Carlo methods.
• Generate points (x, y) uniformly at
random within the square (-1, 1) × (-1, 1)
• Calculate the proportion within the unit
circle: x² + y² < 1
• Area of the square is 4
• Area of a circle is πr²
• Area of the unit circle is π
• Ratio of points inside the circle
to total points is π/4
• π ≈ 4 × (points inside circle / total points)
Monte Carlo Simulation: Calculate π using Python
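import numpy as np

def estimate_pi(n: int) -> float:
    # Draw n points uniformly from the square (-1, 1) x (-1, 1).
    xs = np.random.uniform(-1, 1, n)
    ys = np.random.uniform(-1, 1, n)
    # Squared distance of each point from the origin.
    dist_to_origin = [x**2 + y**2 for x, y in zip(xs, ys)]
    # Count the points that land inside the unit circle.
    in_circle = sum(dist < 1 for dist in dist_to_origin)
    # The fraction inside estimates pi/4, so scale by 4.
    pi = float(4 * (in_circle / n))
    return pi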
N            π estimate   elapsed time
100          3.500        0.0008
10000        2.150        0.0300
1000000      3.139        3.2000
100000000    3.141        323.8000
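The π estimate is 4·p̂, where p̂ is the proportion of points inside the circle. Its Monte Carlo standard error is therefore 4·√(p̂(1 − p̂)/N), which shrinks like 1/√N, so each extra correct digit costs roughly 100 times as many points. This is a minimal vectorized sketch that reports the standard error alongside the estimate. The function name `estimate_pi_with_se` is illustrative. It uses NumPy's `Generator` API rather than the legacy `np.random.uniform` used above.

```python
import numpy as np

def estimate_pi_with_se(n, rng):
    """Monte Carlo estimate of pi and its standard error from n uniform points."""
    xs = rng.uniform(-1.0, 1.0, n)
    ys = rng.uniform(-1.0, 1.0, n)
    p_hat = np.mean(xs**2 + ys**2 < 1.0)        # proportion inside the unit circle
    estimate = 4.0 * p_hat
    std_error = 4.0 * np.sqrt(p_hat * (1.0 - p_hat) / n)
    return float(estimate), float(std_error)

rng = np.random.default_rng(seed=0)
for n in (100, 10_000, 1_000_000):
    est, se = estimate_pi_with_se(n, rng)
    print(f"N={n:>9,}  estimate={est:.4f}  standard error={se:.4f}")
```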
Markov Chain Monte Carlo (MCMC)
• Standard MC estimation uses a set of independent, identically
distributed (i.i.d.) draws from a probability distribution p(θ),
e.g. np.random.uniform(-1, 1, n).
• For models where the prior and likelihood are complex functions,
we cannot compute the posterior or draw from it directly.
• A Markov chain is a sequence of draws where the conditional
probability of each draw depends only on the previous draw.
• Markov chain Monte Carlo (MCMC) estimates quantities of interest
from the draws of a Markov chain whose stationary distribution is the posterior.
• This requires that the Markov chain has converged to its
stationary distribution.
• Warmup is the process of getting to convergence.
• If the chain has not converged, your sample is not valid.
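The Markov chain idea is easiest to see in random-walk Metropolis, a simpler algorithm than Stan's sampler, swapped in here purely for illustration. Each proposal depends only on the current state. Early draws are discarded as warmup, and the remaining draws are treated as a sample from the posterior. This is a minimal sketch targeting the Bernoulli posterior with a uniform prior and hypothetical data. The step size, draw counts, and function names are all assumptions.

```python
import numpy as np

def log_posterior(theta, successes, failures):
    """Unnormalized log posterior: Bernoulli likelihood; the Beta(1, 1) prior adds only a constant."""
    if not 0.0 < theta < 1.0:
        return -np.inf                                  # outside the support of theta
    return successes * np.log(theta) + failures * np.log1p(-theta)

def metropolis(y, n_draws, n_warmup, step, rng, theta0=0.5):
    successes = int(np.sum(y))
    failures = len(y) - successes
    theta = theta0
    current_lp = log_posterior(theta, successes, failures)
    draws = np.empty(n_draws)
    for i in range(n_warmup + n_draws):
        proposal = theta + step * rng.normal()          # symmetric random-walk proposal
        proposal_lp = log_posterior(proposal, successes, failures)
        # Accept with probability min(1, p(proposal | y) / p(theta | y)).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            theta, current_lp = proposal, proposal_lp
        if i >= n_warmup:
            draws[i - n_warmup] = theta                 # keep only post-warmup draws
    return draws

rng = np.random.default_rng(seed=42)
y = [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]                      # hypothetical data
draws = metropolis(y, n_draws=20_000, n_warmup=1_000, step=0.2, rng=rng)
print(draws.mean())
```

Consecutive draws are correlated, unlike the i.i.d. draws in the π example. More draws are therefore needed for the same precision.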
Stan’s secret sauce: HMC-NUTS sampler
• Hamiltonian Monte Carlo (HMC) - algorithm for efficient MCMC
sampling; NUTS, the No-U-Turn Sampler, is an adaptive variant of HMC.
• Not actually secret: same algorithm used in PyMC3 and
Edward
• References and tutorials:
• Hoffman and Gelman, 2014
• Monnahan, 2016 - start here
• Stan User’s Guide
• Michael Betancourt tutorials and videos
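This is a minimal sketch of plain HMC with a fixed number of leapfrog steps, to make "Hamiltonian Monte Carlo" concrete. It assumes a one-dimensional standard normal target, chosen because its gradient is trivial, and uses a hand-picked step size and number of steps. It is not NUTS: NUTS chooses the trajectory length adaptively, and Stan also tunes the step size during warmup. Both are omitted here.

```python
import numpy as np

def log_density(q):
    return -0.5 * q**2                                  # standard normal, up to a constant

def grad_log_density(q):
    return -q

def hmc(n_draws, step_size, n_leapfrog, rng, q0=0.0):
    draws = np.empty(n_draws)
    q = q0
    for i in range(n_draws):
        p = rng.normal()                                # fresh momentum each iteration
        q_new, p_new = q, p
        # Leapfrog integration: half momentum step, alternating full steps,
        # then a final half momentum step.
        p_new += 0.5 * step_size * grad_log_density(q_new)
        for k in range(n_leapfrog):
            q_new += step_size * p_new
            if k < n_leapfrog - 1:
                p_new += step_size * grad_log_density(q_new)
        p_new += 0.5 * step_size * grad_log_density(q_new)
        # Metropolis correction based on the Hamiltonian (potential + kinetic energy).
        current_h = -log_density(q) + 0.5 * p**2
        proposed_h = -log_density(q_new) + 0.5 * p_new**2
        if np.log(rng.uniform()) < current_h - proposed_h:
            q = q_new
        draws[i] = q
    return draws

rng = np.random.default_rng(seed=7)
draws = hmc(n_draws=5_000, step_size=0.2, n_leapfrog=8, rng=rng)
print(draws.mean(), draws.var())
```

The path length 8 × 0.2 = 1.6 is roughly a quarter period for a unit-variance normal, which gives nearly independent draws here. Removing the need to hand-tune this number of steps is the motivation for NUTS.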
CmdStanPy
• Designed to be lightweight
• minimal package dependencies
• minimal use of in-memory data structures
• good for production workflows
• Keeps up with latest Stan release
• BSD license
• Requirements:
• Python 3
• C++ compiler (comes with Anaconda or Xcode) (PR in progress for
Windows installs)
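This is a minimal sketch of fitting the bernoulli.stan program with CmdStanPy. It assumes a recent CmdStanPy release with CmdStan already installed, for example via `cmdstanpy.install_cmdstan()`. It also assumes the Stan file is saved as `bernoulli.stan` in the working directory and compiles with that CmdStan version. Recent CmdStan releases require the `array[N] int` declaration syntax. The data values are made up, and `make_bernoulli_data` and `fit_bernoulli` are hypothetical helper names, not part of CmdStanPy.

```python
def make_bernoulli_data(y):
    """Build the data dictionary expected by bernoulli.stan's data block."""
    y = [int(v) for v in y]
    if any(v not in (0, 1) for v in y):
        raise ValueError("y must contain only 0s and 1s")
    return {"N": len(y), "y": y}

def fit_bernoulli(stan_file="bernoulli.stan", y=(0, 1, 0, 0, 0, 0, 0, 0, 0, 1)):
    # Imported here so make_bernoulli_data can be used without CmdStanPy installed.
    from cmdstanpy import CmdStanModel

    model = CmdStanModel(stan_file=stan_file)   # compiles the program with CmdStan
    fit = model.sample(data=make_bernoulli_data(y), chains=4, seed=1234)
    print(fit.summary())                        # posterior summaries and diagnostics (pandas DataFrame)
    return fit.stan_variable("theta")           # NumPy array of posterior draws of theta
```

Calling `theta_draws = fit_bernoulli()` should give draws whose mean is close to 0.25 for this hypothetical data, since the exact posterior is Beta(3, 9).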
Let’s do some data analysis!
Repository of models, data, and IPython notebooks:
• https://github.com/nyc-pyladies/2019-cmdstanpy-bayesian-workshop
Just the IPython notebook, to run in Google Colab:
• http://bit.ly/2m7DUjP
Massive Thanks!
NYC PyLadies, especially:
• Nitya Mandyam
• Melissa Ferrari
• Felice Ho
NYC WiMLDS, especially:
• Reshama Shaikh
Paris WiMLDS, especially:
• Caroline Chavier
Everyone who asked a question - keep on questioning!