Data science workflow
Andrew Gelman
Dept of Statistics and Dept of Political Science
Columbia University, New York
PyData, New York, 28 Nov 2017
The (abridged) model in Stan
parameters {
  real b;
  real<lower=0> sigma_a;
  real<lower=0> sigma_y;
  vector[nteams] a;
}
model {
  a ~ normal(b*prior_score, sigma_a);
  sqrt_dif ~ normal(a[team1] - a[team2], sigma_y);
}
Fit the model
Inference for Stan model: worldcup_first_try.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
         mean se_mean   sd  25%  50%  75% n_eff Rhat
b        0.46    0.00 0.09 0.40 0.46 0.52  1039 1.00
sigma_a  0.14    0.00 0.07 0.09 0.13 0.19   203 1.01
sigma_y  0.42    0.00 0.05 0.38 0.42 0.46   956 1.00
a[1]     0.35    0.00 0.13 0.27 0.36 0.44  4000 1.00
a[2]     0.39    0.00 0.12 0.31 0.38 0.46  4000 1.00
a[3]     0.43    0.01 0.15 0.33 0.42 0.52   756 1.00
a[4]     0.20    0.01 0.16 0.11 0.22 0.31   966 1.00
a[5]     0.29    0.00 0.13 0.21 0.29 0.36  4000 1.00
. . .
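To make the two sampling statements of the abridged model concrete, here is a minimal Python sketch that draws fake data from them. It is an illustration, not the code used in the talk: the `prior_score` values and the `games` pairings are hypothetical, team indices are 0-based (Stan's are 1-based), and the parameter values are simply the posterior means printed above.

```python
import random

def simulate_worldcup(prior_score, games, b, sigma_a, sigma_y, seed=0):
    """Draw team abilities and game outcomes from the two model statements."""
    rng = random.Random(seed)
    # a[i] ~ normal(b * prior_score[i], sigma_a)
    a = [rng.gauss(b * s, sigma_a) for s in prior_score]
    # sqrt_dif[g] ~ normal(a[team1] - a[team2], sigma_y)
    sqrt_dif = [rng.gauss(a[t1] - a[t2], sigma_y) for t1, t2 in games]
    return a, sqrt_dif

# Hypothetical inputs, for illustration only
prior_score = [1.0, 0.5, 0.0, -0.5, -1.0]
games = [(0, 1), (2, 3), (0, 4), (1, 2)]   # (team1, team2) index pairs

a, sqrt_dif = simulate_worldcup(prior_score, games,
                                b=0.46, sigma_a=0.14, sigma_y=0.42)
for (t1, t2), d in zip(games, sqrt_dif):
    print(f"team {t1} vs team {t2}: simulated sqrt_dif = {d:.2f}")
```

Refitting the model to draws like these is one way to check whether the fitting code recovers known parameter values.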
Graph the estimates
Compare to model fit without prior rankings
Compare model to predictions
After finding and fixing a bug
Data on putts in pro golf
[Figure: probability of success (0 to 1) against distance from hole (0 to 20 feet), one point per distance. Point labels (successes/attempts), in the order extracted:]
1346/1443, 577/694, 337/455, 208/353, 149/272, 136/256, 111/240, 69/217, 67/200, 75/237, 52/202, 46/192, 54/174, 28/167, 27/201, 31/195, 33/191, 20/147, 24/152
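The point labels are raw counts, so the plotted success rates can be recomputed from them. A small sketch, assuming only that each label is successes/attempts; the standard error uses the usual binomial formula, which is an addition here rather than something shown on the slide.

```python
import math

# (successes, attempts) pairs copied from the point labels on the slide
putts = [
    (1346, 1443), (577, 694), (337, 455), (208, 353), (149, 272),
    (136, 256), (111, 240), (69, 217), (67, 200), (75, 237),
    (52, 202), (46, 192), (54, 174), (28, 167), (27, 201),
    (31, 195), (33, 191), (20, 147), (24, 152),
]

def rate_and_se(successes, attempts):
    """Observed proportion and its binomial standard error sqrt(p(1-p)/n)."""
    p = successes / attempts
    return p, math.sqrt(p * (1 - p) / attempts)

summary = [rate_and_se(y, n) for y, n in putts]
for (y, n), (p, se) in zip(putts, summary):
    print(f"{y:5d}/{n:<5d} rate={p:.3f} se={se:.3f}")
```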
What's the probability of making a golf putt?
[Figure: the putting data with a fitted curve. Axes: distance from hole (feet, 0 to 20); probability of success (0 to 1). Curve label: logistic regression, a = 2.2, b = −0.3.]
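The fitted curve can be evaluated directly from the two coefficients. This sketch assumes the usual parameterization, logit(p) = a + b*x with x in feet; the slide gives only the rounded values of a and b, so the numbers it prints are approximate.

```python
import math

def logistic_putt_prob(x, a=2.2, b=-0.3):
    """Pr(success) at distance x (feet) under logit(p) = a + b*x."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

for x in (0, 2, 5, 10, 20):
    print(x, round(logistic_putt_prob(x), 3))
```

Under this parameterization the curve stays at about 0.90 even at zero distance, one sign that the logistic form has no knowledge of the physical setup.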
Geometry-based model
[Diagram labels: x, R, r; horizontal axis marked −2σ, 0, 2σ.]
Stan code
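data {
  int J;        // number of distances
  int n[J];     // attempts at each distance
  real x[J];    // distance from hole (feet)
  int y[J];     // successes at each distance
  real r;       // ball radius (feet)
  real R;       // hole radius (feet)
}
parameters {
  real<lower=0> sigma;   // sd of the shot angle (radians)
}
model {
  real p[J];
  for (j in 1:J)
    p[j] = 2*Phi(asin((R-r)/x[j]) / sigma) - 1;
  y ~ binomial(n, p);
}
generated quantities {
  real sigma_degrees = sigma * 180 / pi();
}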
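The expression for p in the model block can be read as a normal probability. The following is a sketch of that reading, not a derivation taken from the slides: it assumes the angular error θ of a shot is normal with mean 0 and standard deviation σ, and that the putt goes in when |θ| is smaller than a threshold angle θ₀ set by the ball radius r, the hole radius R, and the distance x.

```latex
\theta \sim \mathrm{N}(0, \sigma^2), \qquad
\theta_0 = \sin^{-1}\!\left(\frac{R - r}{x}\right)

\Pr(\text{success}) = \Pr(|\theta| < \theta_0)
  = \Phi\!\left(\frac{\theta_0}{\sigma}\right) - \Phi\!\left(-\frac{\theta_0}{\sigma}\right)
  = 2\,\Phi\!\left(\frac{\sin^{-1}\!\big((R - r)/x\big)}{\sigma}\right) - 1
```

The last step uses the symmetry Φ(−z) = 1 − Φ(z), and the result matches the right-hand side of the Stan assignment term by term.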
Fit the model
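library("rstan")

# Columns used below: x (distance in feet), n (attempts), y (successes)
golf <- read.table("golf.txt", header=TRUE, skip=2)
x <- golf$x
y <- golf$y
n <- golf$n
J <- length(y)
r <- (1.68/2)/12   # ball radius in feet (diameter 1.68 inches)
R <- (4.25/2)/12   # hole radius in feet (diameter 4.25 inches)
fit1 <- stan("golf1.stan")   # J, n, x, y, r, R are taken from the workspace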
Check convergence
> print(fit1)
Inference for Stan model: golf1.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
              mean se_mean   sd  25%  50%  75% n_eff Rhat
sigma         0.03    0.00 0.00 0.03 0.03 0.03  1692    1
sigma_degrees 1.53    0.00 0.02 1.51 1.53 1.54  1692    1
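The two rows of the printout appear to be the same quantity in different units: `asin` returns radians, so `sigma` is on the radian scale, and `sigma_degrees` is presumably `sigma * 180 / pi`. A quick check of that assumption against the printed numbers:

```python
import math

sigma_degrees = 1.53                         # posterior mean from the printed fit
sigma_radians = math.radians(sigma_degrees)  # degrees * pi / 180

print(round(sigma_radians, 4))   # 0.0267
print(round(sigma_radians, 2))   # 0.03, matching the sigma row
```

This also explains why the `sigma` row shows a standard deviation of 0.00: at two decimal places the radian scale is too coarse to display the uncertainty.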
What's the probability of making a golf putt?
[Figure: the putting data with a fitted curve. Axes: distance from hole (feet, 0 to 20); probability of success (0 to 1). Curve label: geometry-based model, sigma = 1.5.]
Two models fit to the golf putting data
[Figure: the putting data with both fitted curves. Axes: distance from hole (feet, 0 to 20); probability of success (0 to 1). Curve labels: logistic regression, a = 2.2, b = −0.3; geometry-based model, sigma = 1.5.]
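The visual comparison can be put into numbers. The sketch below evaluates both fitted curves at each distance and reports the mean absolute error against the observed rates. Two assumptions are worth flagging: the distances are taken to be 2, 3, ..., 20 feet (the slide text lists only counts, not distances), and mean absolute error is a convenience chosen here, not a measure used in the talk.

```python
import math

# (successes, attempts) copied from the data slide
putts = [
    (1346, 1443), (577, 694), (337, 455), (208, 353), (149, 272),
    (136, 256), (111, 240), (69, 217), (67, 200), (75, 237),
    (52, 202), (46, 192), (54, 174), (28, 167), (27, 201),
    (31, 195), (33, 191), (20, 147), (24, 152),
]
distances = list(range(2, 21))   # ASSUMPTION: 2..20 feet, one per data point

r = (1.68 / 2) / 12   # ball radius in feet, as in the R script
R = (4.25 / 2) / 12   # hole radius in feet, as in the R script

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_prob(x, a=2.2, b=-0.3):
    """Logistic curve with the rounded coefficients from the slide."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

def geometry_prob(x, sigma=math.radians(1.53)):
    """Geometry-based curve; sigma is in radians."""
    return 2.0 * phi(math.asin((R - r) / x) / sigma) - 1.0

def mean_abs_error(model):
    errors = [abs(model(x) - y / n) for x, (y, n) in zip(distances, putts)]
    return sum(errors) / len(errors)

mae_logistic = mean_abs_error(logistic_prob)
mae_geometry = mean_abs_error(geometry_prob)
print(f"logistic: {mae_logistic:.3f}  geometry: {mae_geometry:.3f}")
```

Because the logistic coefficients are rounded on the slide, its error here is likely somewhat larger than that of the actual fit.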
Birthdays!
The published graphs show data from 30 days in the year
[Figure: decomposition of the relative number of births, 1970 to 1988 (vertical axes 60 to 120). Panels: Trends (slow trend, fast non-periodic component, mean); Day of week effect (Mon to Sun; curves for 1972, 1976, 1980, 1984, 1988); Seasonal effect (Jan to Dec; same years); Day of year effect (Jan to Dec), with labels at New Year, Valentine's Day, Leap day, April 1st, Memorial Day, Independence Day, Labor Day, Halloween, Thanksgiving, Christmas.]
[Figure: the same decomposition for 2000 to 2014 (vertical axes 60 to 120; curves for 2002, 2006, 2010, 2014). The Day of year effect panel adds a label at 9/11.]
[Figure: Day of year effect (Jan to Dec; vertical axis 50 to 120), with labels at New Year, Valentine's Day, Leap day, April 1st, Memorial Day, Independence Day, Labor Day, 9/11, Halloween, Thanksgiving, Christmas, and the 13th day of month.]
Xbox estimates, adjusting for demographics
Xbox estimates, adjusting for demographics and partisanship
Data from 2016
Some ideas in data science workflow
Data and information
Replication
Fake-data simulation (or statistical theory)
Comparing predictions to data
The network of models
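As an illustration of the fake-data simulation idea, the sketch below simulates putting outcomes from the geometry-based model with a known sigma and then checks that sigma can be recovered. Several things here are assumptions rather than the talk's method: the distances are taken to be 2 to 20 feet, the "true" sigma is a hypothetical value, and recovery uses a maximum-likelihood grid search in plain Python instead of fitting with Stan.

```python
import math
import random

r = (1.68 / 2) / 12              # ball radius in feet
R = (4.25 / 2) / 12              # hole radius in feet
distances = list(range(2, 21))   # ASSUMPTION: 2..20 feet
attempts = [1443, 694, 455, 353, 272, 256, 240, 217, 200, 237,
            202, 192, 174, 167, 201, 195, 191, 147, 152]

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def geometry_prob(x, sigma):
    return 2.0 * phi(math.asin((R - r) / x) / sigma) - 1.0

def simulate(sigma, rng):
    """Binomial draws of successes at each distance, given sigma."""
    fake = []
    for x, n in zip(distances, attempts):
        p = geometry_prob(x, sigma)
        fake.append(sum(rng.random() < p for _ in range(n)))
    return fake

def log_lik(sigma, successes):
    """Binomial log-likelihood, dropping terms that do not involve sigma."""
    total = 0.0
    for x, n, y in zip(distances, attempts, successes):
        # Clip to avoid log(0) when p rounds to exactly 0 or 1
        p = min(max(geometry_prob(x, sigma), 1e-12), 1 - 1e-12)
        total += y * math.log(p) + (n - y) * math.log(1 - p)
    return total

true_sigma = 0.03                                  # hypothetical, in radians
fake_y = simulate(true_sigma, random.Random(1))
grid = [0.005 + 0.0005 * i for i in range(191)]    # 0.005 to 0.100
sigma_hat = max(grid, key=lambda s: log_lik(s, fake_y))
print(f"true sigma = {true_sigma}, recovered = {sigma_hat:.4f}")
```

If the recovered value were far from the true one, that would point to a bug in the model code or the fitting step before any real data are involved.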

DanBrown980551
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 

Recently uploaded (20)

Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 

Data Science Workflow

  • 1. Data science workflow. Andrew Gelman, Dept of Statistics and Dept of Political Science, Columbia University, New York. PyData, New York, 28 Nov 2017
  • 4. The (abridged) model in Stan
        parameters {
          real b;
          real<lower=0> sigma_a;
          real<lower=0> sigma_y;
          vector[nteams] a;
        }
        model {
          a ~ normal(b*prior_score, sigma_a);
          sqrt_dif ~ normal(a[team1] - a[team2], sigma_y);
        }
  • 5. Fit the model
        Inference for Stan model: worldcup_first_try.
        4 chains, each with iter=2000; warmup=1000; thin=1;
        post-warmup draws per chain=1000, total post-warmup draws=4000.
                 mean se_mean   sd  25%  50%  75% n_eff Rhat
        b        0.46    0.00 0.09 0.40 0.46 0.52  1039 1.00
        sigma_a  0.14    0.00 0.07 0.09 0.13 0.19   203 1.01
        sigma_y  0.42    0.00 0.05 0.38 0.42 0.46   956 1.00
        a[1]     0.35    0.00 0.13 0.27 0.36 0.44  4000 1.00
        a[2]     0.39    0.00 0.12 0.31 0.38 0.46  4000 1.00
        a[3]     0.43    0.01 0.15 0.33 0.42 0.52   756 1.00
        a[4]     0.20    0.01 0.16 0.11 0.22 0.31   966 1.00
        a[5]     0.29    0.00 0.13 0.21 0.29 0.36  4000 1.00
        . . .
  • 7. Compare to model fit without prior rankings
  • 8. Compare model to predictions
  • 9. After finding and fixing a bug
  • 10. Data on putts in pro golf [Figure: probability of success (0 to 1) vs. distance from hole (0 to 20 feet). The 19 data points are labeled with putts made/attempted: 1346/1443, 577/694, 337/455, 208/353, 149/272, 136/256, 111/240, 69/217, 67/200, 75/237, 52/202, 46/192, 54/174, 28/167, 27/201, 31/195, 33/191, 20/147, 24/152]
  • 11. What's the probability of making a golf putt? [Figure: probability of success vs. distance from hole (feet), with fitted curve. Logistic regression, a = 2.2, b = −0.3]
  • 13. Stan code
        data {
          int J;
          int n[J];
          real x[J];
          int y[J];
          real r;
          real R;
        }
        parameters {
          real<lower=0> sigma;
        }
        model {
          real p[J];
          // x is an array, so compute p one distance at a time
          for (j in 1:J)
            p[j] = 2*Phi(asin((R-r)/x[j]) / sigma) - 1;
          y ~ binomial(n, p);
        }
  • 14. Fit the model
        library("rstan")  # provides stan()
        golf <- read.table("golf.txt", header=TRUE, skip=2)
        x <- golf$x
        y <- golf$y
        n <- golf$n
        J <- length(y)
        r <- (1.68/2)/12  # ball radius in feet
        R <- (4.25/2)/12  # hole radius in feet
        fit1 <- stan("golf1.stan")
  • 15. Check convergence
        > print(fit1)
        Inference for Stan model: golf1.
        4 chains, each with iter=2000; warmup=1000; thin=1;
        post-warmup draws per chain=1000, total post-warmup draws=4000.
                       mean se_mean   sd  25%  50%  75% n_eff Rhat
        sigma          0.03    0.00 0.00 0.03 0.03 0.03  1692    1
        sigma_degrees  1.53    0.00 0.02 1.51 1.53 1.54  1692    1
  • 16. What's the probability of making a golf putt? [Figure: probability of success vs. distance from hole (feet), with fitted curve. Geometry-based model, sigma = 1.5]
  • 17. Two models fit to the golf putting data [Figure: probability of success vs. distance from hole (feet), with both fitted curves. Logistic regression, a = 2.2, b = −0.3; geometry-based model, sigma = 1.5]
  • 19. The published graphs show data from 30 days in the year
  • 22. [Figure: relative number of births, 1970 to 1988, in four panels with vertical axes from 60 to 120. Trends (slow trend, fast non-periodic component, mean); day of week effect (Mon to Sun, for 1972, 1976, 1980, 1984, 1988); seasonal effect (Jan to Dec, same years); day of year effect, with labels for New year, Valentine's day, Leap day, April 1st, Memorial day, Independence day, Labor day, Halloween, Thanksgiving, Christmas]
  • 23. [Figure: relative number of births, 2000 to 2014, in four panels with vertical axes from 60 to 120. Day of week effect (Mon to Sun, for 2002, 2006, 2010, 2014); seasonal effect (Jan to Dec, same years); day of year effect, with labels for New year, Valentine's day, Leap day, April 1st, Memorial day, Independence day, Labor day, 9/11, Halloween, Thanksgiving, Christmas; trends (slow trend, fast non-periodic component, mean)]
  • 24. [Figure: day of year effect (Jan to Dec, vertical axis from 50 to 120), with labels for New year, Valentine's day, Leap day, April 1st, Memorial day, Independence day, Labor day, 9/11, Halloween, Thanksgiving, Christmas, and the 13th day of each month]
  • 30. Xbox estimates, adjusting for demographics
  • 31. Xbox estimates, adjusting for demographics and partisanship
  • 33. Some ideas in data science workflow
        • Data and information
        • Replication
        • Fake-data simulation (or statistical theory)
        • Comparing predictions to data
        • The network of models
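
As a rough illustration of "comparing predictions to data", the two golf models from slides 11 to 17 can be evaluated against the counts on slide 10. This is a minimal sketch in Python, not the code used in the talk. It assumes the 19 data points sit at distances of 2, 3, ..., 20 feet (the slide does not list them) and that the logistic curve is inverse-logit(a + b*x). It plugs in the rounded values a = 2.2, b = −0.3 and sigma_degrees = 1.53 from the slides instead of refitting anything, so the comparison is only approximate.

```python
import math

# Putts made and attempted, copied from the point labels on slide 10.
made = [1346, 577, 337, 208, 149, 136, 111, 69, 67, 75,
        52, 46, 54, 28, 27, 31, 33, 20, 24]
tried = [1443, 694, 455, 353, 272, 256, 240, 217, 200, 237,
         202, 192, 174, 167, 201, 195, 191, 147, 152]

# ASSUMPTION: one point per foot from 2 to 20 feet; not stated on the slide.
distance = list(range(2, 21))

r = (1.68 / 2) / 12          # ball radius in feet (slide 14)
R = (4.25 / 2) / 12          # hole radius in feet (slide 14)
sigma = math.radians(1.53)   # sigma_degrees from slide 15, in radians
a, b = 2.2, -0.3             # rounded logistic coefficients (slide 11)


def std_normal_cdf(z):
    """Standard normal CDF, the same quantity as Stan's Phi()."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_geometry(x):
    """Geometry-based model: 2*Phi(asin((R-r)/x) / sigma) - 1."""
    return 2.0 * std_normal_cdf(math.asin((R - r) / x) / sigma) - 1.0


def p_logistic(x):
    """ASSUMPTION: logistic regression parameterized as inv_logit(a + b*x)."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


def mean_abs_error(predicted, observed):
    """Average absolute gap between predicted and observed proportions."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


observed = [m / t for m, t in zip(made, tried)]
geometry = [p_geometry(x) for x in distance]
logistic = [p_logistic(x) for x in distance]

print("feet  observed  geometry  logistic")
for x, o, g, l in zip(distance, observed, geometry, logistic):
    print(f"{x:4d}  {o:8.3f}  {g:8.3f}  {l:8.3f}")
print("mean abs error, geometry:", round(mean_abs_error(geometry, observed), 3))
print("mean abs error, logistic:", round(mean_abs_error(logistic, observed), 3))
```

Printing the observed and predicted proportions side by side gives a text version of the overlay on slide 17. The mean absolute error is only a crude summary; the plot itself is the more informative check.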