5/23/2018 Introduction to Bayesian Inference
https://www.datascience.com/blog/introduction-to-bayesian-inference-learn-data-science-tutorials 1/22
Prerequisites
This post is an introduction to Bayesian probability and inference. We will discuss the intuition
behind these concepts, and provide some examples written in Python to help you get started. To
get the most out of this introduction, the reader should have a basic understanding of statistics
and probability, as well as some experience with Python. The examples use the
Python package pymc3.
Introduction to Bayesian Thinking
Introduction to Bayesian
Inference
Aaron Kramer 12.12.16

Bayesian inference is an extremely powerful set of tools for modeling any random variable, such
as the value of a regression parameter, a demographic statistic, a business KPI, or the part of
speech of a word. We provide our understanding of a problem and some data, and in return get
a quantitative measure of how certain we are of a particular fact. This approach to modeling
uncertainty is particularly useful when:
Data is limited
We're worried about overfitting
We have reason to believe that some facts are more likely than others, but that
information is not contained in the data we model on
We're interested in precisely knowing how likely certain facts are, as opposed
to just picking the most likely fact
The table below enumerates some applied tasks that exhibit these challenges, and describes
how Bayesian inference can be used to solve them. Don't worry if the Bayesian solutions are
foreign to you; they will make more sense as you read this post:
 
Typically, Bayesian inference is a term used as a counterpart to frequentist inference. This can be
confusing, as the lines drawn between the two approaches are blurry. The real distinction between
the Bayesian and frequentist schools is philosophical: they differ in how they interpret what
probability is. We'll focus on Bayesian concepts that are foreign to traditional frequentist
approaches and are actually used in applied work, specifically the prior and posterior
distributions.
Consider Bayes' theorem:
 p(A | B) = p(B | A) p(A) / p(B)
Think of A as some proposition about the world, and B as some data or evidence. For example, A
represents the proposition that it rained today, and B represents the evidence that the sidewalk
outside is wet:
 p(rain | wet) = p(wet | rain) p(rain) / p(wet)
p(rain | wet) asks, "What is the probability that it rained given that it is wet outside?" To evaluate
this question, let's walk through the right side of the equation. Before looking at the ground,
what is the probability that it rained, p(rain)? Think of this as the plausibility of an assumption
about the world. We then ask how likely the observation that it is wet outside is under that
assumption, p(wet | rain)? This procedure effectively updates our initial beliefs about a
proposition with some observation, yielding a final measure of the plausibility of rain, given the
evidence.
This procedure is the basis for Bayesian inference, where our initial beliefs are represented by
the prior distribution p(rain), and our final beliefs are represented by the posterior distribution
p(rain | wet). The denominator simply asks, "What is the total plausibility of the evidence?",
whereby we have to consider all assumptions to ensure that the posterior is a proper probability
distribution.
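To make the update concrete, here is a minimal sketch of the rain example in Python. The three input probabilities are made-up numbers chosen only for illustration; they are not from the original post. The helper name `posterior` is likewise hypothetical.

```python
def posterior(prior, likelihood, likelihood_if_false):
    """Bayes' theorem for a yes/no proposition A and evidence B.

    prior:               p(A)
    likelihood:          p(B | A)
    likelihood_if_false: p(B | not A)
    """
    # total plausibility of the evidence, summed over both assumptions:
    # p(B) = p(B | A) p(A) + p(B | not A) p(not A)
    evidence = likelihood * prior + likelihood_if_false * (1 - prior)
    return likelihood * prior / evidence


# illustrative (assumed) numbers, not taken from the post
p_rain = 0.2              # prior: plausibility of rain before looking outside
p_wet_given_rain = 0.9    # how likely a wet sidewalk is if it rained
p_wet_given_dry = 0.1     # how likely a wet sidewalk is if it did not rain

p_rain_given_wet = posterior(p_rain, p_wet_given_rain, p_wet_given_dry)
print(round(p_rain_given_wet, 3))
```

With these assumed inputs, seeing a wet sidewalk raises the plausibility of rain well above the prior of 0.2, which is the "updating" the text describes.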
Bayesians are uncertain about what is true (the value of a KPI, a regression coefficient, etc.), and
use data as evidence that certain facts are more likely than others. Prior distributions reflect our
beliefs before seeing any data, and posterior distributions reflect our beliefs after we have
considered all the evidence. To unpack what that means and how to leverage these concepts for
actual analysis, let's consider the example of evaluating new marketing campaigns.
Example: Evaluating New Marketing Campaigns Using Bayesian Inference 
Assume that we run an ecommerce platform for clothing and in order to bring people to our site,
we deploy several digital marketing campaigns. These campaigns feature various ad images and
captions, and are presented on a number of social networking websites. We want to present the
ads that are the most successful. For the sake of simplicity, we can assume that the most
successful campaign is the one that results in the highest click-through rate: the ads that are
most likely to be clicked if shown.
We introduce a new campaign called "facebook-yellow-dress," a campaign presented to
Facebook users featuring a yellow dress. The ad has been presented to 10 users so far, and 7 of
the users have clicked on it. We would like to estimate the probability that the next user will click
on the ad. 
By encoding a click as a success and a non-click as a failure, we're estimating the probability
θ that a given user will click on the ad. Naturally, we are going to use the campaign's historical
record as evidence. Because we are considering unordered draws of an event that can be either
0 or 1, we can infer the probability θ by considering the campaign's history as a sample from a
binomial distribution, with probability of success θ. Traditional approaches to inference consider
multiple values of θ and pick the value that is most aligned with the data. This is known as
maximum likelihood, because we're evaluating how likely our data is under various assumptions
and choosing the best assumption as true. More formally:
 argmax_θ p(X | θ), where X is the data we've observed.
Here, p(X | θ) is our likelihood function: if we fix the parameter θ, what is the probability of
observing the data we've seen? Let's look at the likelihood of various values of θ given the data
we have for facebook-yellow-dress:
import numpy as np
# scipy.misc.factorial has been removed from SciPy; the function lives in scipy.special
from scipy.special import factorial
import matplotlib.pyplot as plt
# Jupyter notebook magic; omit when running as a plain script
%matplotlib inline
plt.rcParams['figure.figsize'] = (16, 7)

def likelihood(theta, n, x):
    """
    likelihood function for a binomial distribution
    n: [int] the number of experiments
    x: [int] the number of successes
    theta: [float] the proposed probability of success
    """
    # binomial coefficient times theta^x (1 - theta)^(n - x)
    return ((factorial(n) / (factorial(x) * factorial(n - x)))
            * (theta ** x) * ((1 - theta) ** (n - x)))
# the number of impressions for our facebook-yellow-dress campaign
n_impressions = 10.
# the number of clicks for our facebook-yellow-dress campaign
n_clicks = 7.
# observed click-through rate
ctr = n_clicks / n_impressions
# candidate click-through rates from 0 to 0.99
possible_theta_values = [x / 100. for x in range(100)]
# evaluate the likelihood function for possible click-through rates
likelihoods = [likelihood(theta, n_impressions, n_clicks)
               for theta in possible_theta_values]
# pick the best theta
mle = possible_theta_values[np.argmax(likelihoods)]
# plot
f, ax = plt.subplots(1)
ax.plot(possible_theta_values, likelihoods)
ax.axvline(mle, linestyle="--")
ax.set_xlabel("Theta")
ax.set_ylabel("Likelihood")
ax.grid()
ax.set_title("Likelihood of Theta for New Campaign")
plt.show()
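As a cross-check on the grid search above, the maximum likelihood estimate of a binomial proportion also has a closed form: the number of successes divided by the number of trials. The sketch below compares the two. It uses `scipy.stats.binom.pmf` in place of the hand-written likelihood function, and the variable names are chosen here for illustration.

```python
import numpy as np
from scipy.stats import binom

n_impressions, n_clicks = 10, 7

# closed-form maximum likelihood estimate for a binomial proportion
closed_form_mle = n_clicks / n_impressions

# the same estimate recovered by evaluating the likelihood on a grid of theta values
grid = np.linspace(0.01, 0.99, 99)
grid_likelihoods = binom.pmf(n_clicks, n_impressions, grid)
grid_mle = grid[np.argmax(grid_likelihoods)]

print(closed_form_mle, grid_mle)
```

The grid search is only as precise as the grid spacing, which is why the two values are compared with a tolerance rather than for exact equality.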
Of the 10 people we showed the new ad to, 7 of them clicked on it. So naturally, our likelihood
function is telling us that the most likely value of theta is 0.7. However, some of our analysts are
skeptical. The performance of this campaign seems extremely high given how our other
campaigns have done historically. Let's overlay this likelihood function with the distribution of
click-through rates from our previous 100 campaigns:
plt.rcParams['figure.figsize'] = (16, 7)
import numpy as np
import pandas as pd

true_a = 11.5
true_b = 48.5
# number of marketing campaigns
N = 100
# randomly generate "true" click-through rate for each campaign
p = np.random.beta(true_a, true_b, size=N)
# randomly pick the number of impressions for each campaign
impressions = np.random.randint(1, 10000, size=N)
# sample number of clicks for each campaign
clicks = np.random.binomial(impressions, p).astype(float)
click_through_rates = clicks / impressions
# plot the histogram of previous click-through rates with the evidence
f, ax = plt.subplots(1)
ax.axvline(mle, linestyle="--")
ax.plot(possible_theta_values, likelihoods,
        label='Likelihood of Theta for New Campaign')
# range replaces the Python 2 xrange
zero_to_one = [j / 100. for j in range(100)]
counts, bins = np.histogram(click_through_rates, bins=zero_to_one)
# convert counts to a fraction of the 100 campaigns
counts = counts / 100.
ax.plot(bins[:-1], counts, alpha=.5,
        label='Frequency of Theta Historically')
ax.legend(loc='upper left')
ax.set_xlabel("Theta")
ax.grid()
ax.set_title("Evidence vs Historical Click Through Rates")
plt.show()

Clearly, the maximum likelihood method is giving us a value that is outside what we
would normally see. Perhaps our analysts are right to be skeptical; as the campaign
continues to run, its click-through rate could decrease. Alternatively, this campaign
could be truly outperforming all previous campaigns. We can't be sure. Ideally, we
would rely on other campaigns' history if we had no data from our new campaign. And
as we got more and more data, we would allow the new campaign data to speak for
itself.
The Prior Distribution
This skepticism corresponds to prior probability in Bayesian inference. Before considering any
data at all, we believe that certain values of θ are more likely than others, given what we know
about marketing campaigns. We believe, for instance, that p(θ = 0.2)>p(θ = 0.5), since none of our
previous campaigns have had click-through rates remotely close to 0.5. We express our prior
beliefs of θ with p(θ). Using historical campaigns to assess p(θ) is our choice as a researcher.
Generally, prior distributions can be chosen with many goals in mind:
Informative; empirical: We have some data from related experiments and
choose to leverage that data to inform our prior beliefs. Our prior beliefs will
impact our final assessment.
Informative; non-empirical: We have some inherent reason to prefer certain
values over others. For instance, if we want to regularize a regression to prevent
overfitting, we might set the prior distribution of our coefficients to have
decreasing probability as we move away from 0. Our prior beliefs will impact
our final assessment.
Informative; domain-knowledge: Though we do not have supporting data, we
know as domain experts that certain facts are more likely to be true than others.
Our prior beliefs will impact our final assessment.
Non-informative: Our prior beliefs will have little to no effect on our final
assessment. We want the data to speak for itself.
For our example, because we have related data and limited data on the new campaign, we will
use an informative, empirical prior. We will choose a beta distribution for our prior for θ. The
beta distribution is a 2 parameter (α, β) distribution that is often used as a prior for
the θ parameter of the binomial distribution. Because we want to use our previous campaigns as
the basis for our prior beliefs, we will determine α and β by fitting a beta distribution to our
historical click-through rates. Below, we fit the beta distribution and compare the estimated prior
distribution with previous click-through rates to ensure the two are properly aligned:
from scipy.stats import beta
# fit beta to previous CTRs; with floc and fscale fixed, the fit
# expects every rate to lie strictly between 0 and 1
prior_parameters = beta.fit(click_through_rates, floc=0, fscale=1)
# extract a, b from fit
prior_a, prior_b = prior_parameters[0:2]
# define prior distribution
prior_distribution = beta(prior_a, prior_b)
# draw samples from the prior
prior_samples = prior_distribution.rvs(10000)
# get histogram of samples
fit_counts, bins = np.histogram(prior_samples, zero_to_one)
# normalize histogram
fit_counts = fit_counts / float(fit_counts.sum())
# plot
f, ax = plt.subplots(1)
ax.plot(bins[:-1], fit_counts, label='Estimated Prior')
hist_ctr, bins = np.histogram(click_through_rates, zero_to_one)
hist_ctr = hist_ctr / float(hist_ctr.sum())
ax.plot(bins[:-1], hist_ctr, label='Previous Click Through Rates')
ax.legend()
ax.grid()
ax.set_title("Comparing Empirical Prior with Previous Click Through Rates")
plt.show()
We find that the best values of α and β are 11.5 and 48.5, respectively. The beta distribution with
these parameters does a good job capturing the click-through rates from our previous
campaigns, so we will use it as our prior. We will now update our prior beliefs with the data from
the facebook-yellow-dress campaign to form our posterior distribution.
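It can help to summarize what a Beta(11.5, 48.5) prior actually claims before updating it. The sketch below uses `scipy.stats.beta` to report the prior mean, a central 95% interval, and the prior probability of a click-through rate above 0.5. The variable names are illustrative, and the parameters are the rounded values quoted above rather than the exact output of a particular fit.

```python
from scipy.stats import beta

# prior with the rounded parameters quoted in the text
prior = beta(11.5, 48.5)

prior_mean = prior.mean()            # equals a / (a + b)
prior_interval = prior.interval(0.95)  # central 95% interval
prob_above_half = prior.sf(0.5)      # prior probability that theta > 0.5

print(prior_mean)
print(prior_interval)
print(prob_above_half)

# the earlier claim p(theta = 0.2) > p(theta = 0.5), read as a comparison of densities
print(prior.pdf(0.2) > prior.pdf(0.5))
```

A prior this concentrated around 0.19 is what makes the observed 0.7 click-through rate look surprising.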
The Posterior Distribution
After considering the 10 impressions of data we have for the facebook-yellow-dress campaign,
the posterior distribution of θ gives us plausibility of any click-through rate from 0 to 1.
The effect of our data, or our evidence, is provided by the likelihood function, p(X | θ). What we
are ultimately interested in is the plausibility of all proposed values of θ given our data, that is,
our posterior distribution p(θ | X). From the earlier section introducing Bayes' theorem, our
posterior distribution is given by the product of our likelihood function and our prior
distribution, divided by the probability of the data:
 p(θ | X) = p(X | θ) p(θ) / p(X)
Since p(X) is a constant, as it does not depend on θ, we can think of the posterior distribution as:
 p(θ | X) ∝ p(X | θ) p(θ)
We'll now demonstrate how to estimate p(θ|X) using PyMC.
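Before turning to PyMC, note that for a single parameter the statement "posterior is proportional to likelihood times prior" can be evaluated directly on a grid. This is a minimal sketch of that idea, not the method used in the rest of the post. It assumes the Beta(11.5, 48.5) prior and the 7 clicks from 10 impressions described above, and the grid resolution is an arbitrary choice.

```python
import numpy as np
from scipy.stats import beta, binom

# candidate click-through rates (end points excluded to stay inside (0, 1))
theta_grid = np.linspace(0.001, 0.999, 999)

prior = beta.pdf(theta_grid, 11.5, 48.5)       # p(theta)
likelihood = binom.pmf(7, 10, theta_grid)      # p(X | theta)

# unnormalized posterior: likelihood times prior
unnormalized = likelihood * prior

# dividing by the sum plays the role of p(X) on this discrete grid
posterior = unnormalized / unnormalized.sum()

posterior_mean = (theta_grid * posterior).sum()
print(posterior_mean)
```

Grid evaluation stops being practical once a model has more than a few parameters, which is one reason to use sampling methods instead.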
Doing Bayesian Inference with PyMC
Usually, the true posterior must be approximated with numerical methods. To see why, let's
return to the definition of the posterior distribution:
 p(θ | X) = p(X | θ) p(θ) / p(X)
The denominator p(X) is the total probability of observing our data under all possible values of θ.
A more descriptive representation of this quantity is given by:
 p(X) = ∫ p(X | θ) p(θ) dθ
which sums the probability of X over all values of θ. This integral usually does not have a
closed-form solution, so we need an approximation. One method of approximating our posterior is
Markov Chain Monte Carlo (MCMC), which generates samples in a way that mimics the
unknown distribution. We begin at a particular value and "propose" another value as a sample
according to a stochastic process. We may reject the proposal if the proposed value seems
unlikely, in which case we stay where we are and propose again. If we accept the proposal, we
move to the new value and propose another.
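The propose/accept/reject loop described above can be sketched as a random-walk Metropolis sampler. This is a simplified illustration, not the NUTS algorithm that PyMC uses later in the post. The function names, step size, starting value, seed, and burn-in length are all assumptions made for this example. The target is the campaign posterior with a Beta(11.5, 48.5) prior and 7 clicks from 10 impressions.

```python
import math
import numpy as np


def log_unnormalized_posterior(theta, clicks=7, impressions=10, a=11.5, b=48.5):
    """log of p(X | theta) * p(theta), dropping terms that do not depend on theta."""
    if theta <= 0.0 or theta >= 1.0:
        return -math.inf  # proposals outside (0, 1) are always rejected
    log_likelihood = clicks * math.log(theta) + (impressions - clicks) * math.log(1 - theta)
    log_prior = (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)
    return log_likelihood + log_prior


def metropolis(n_samples, start=0.5, step_size=0.1, seed=0):
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    current = start
    current_logp = log_unnormalized_posterior(current)
    for i in range(n_samples):
        # propose a new value near the current one
        proposal = current + rng.normal(0.0, step_size)
        proposal_logp = log_unnormalized_posterior(proposal)
        # accept with probability min(1, posterior ratio); p(X) cancels in the ratio
        accept_probability = math.exp(min(0.0, proposal_logp - current_logp))
        if rng.uniform() < accept_probability:
            current, current_logp = proposal, proposal_logp
        # on rejection the current value is recorded again
        samples[i] = current
    return samples


samples = metropolis(20000)
posterior_samples = samples[2000:]  # discard early samples taken before the chain settles
print(posterior_samples.mean())
```

Because only the ratio of posterior values matters, the sampler never needs the intractable denominator p(X).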
PyMC is a Python package for building arbitrary probability models and obtaining samples from
the posterior distributions of unknown variables given the model. In our example, we'll use
MCMC to obtain the samples.
The prototypical PyMC program has two components:
Define all variables, and how variables depend on each other
Run an algorithm to simulate a posterior distribution
Let's now obtain samples from the posterior. We select our prior as a Beta(11.5,48.5). Let's see
how observing 7 clicks from 10 impressions updates our beliefs:
 
import pymc3 as pm
import numpy as np

# create our data:
# clicks represents our successes. We observed 7 clicks.
clicks = np.array([n_clicks])
# impressions represents the number of trials. There were 10 impressions.
impressions = np.array([n_impressions])

with pm.Model() as model:
    # sets a context; all code in block "belongs" to the model object

    # our prior distribution, Beta(11.5, 48.5)
    theta_prior = pm.Beta('prior', 11.5, 48.5)
    # sampling distribution; our prior theta_prior will be updated with data
    observations = pm.Binomial('obs', n=impressions,
                               p=theta_prior,
                               observed=clicks)
    # find good starting values for the sampling algorithm:
    # maximum a posteriori values, or values that are most likely
    start = pm.find_MAP()
    # choose a particular MCMC algorithm
    step = pm.NUTS(state=start)
    # obtain samples
    trace = pm.sample(5000,
                      step,
                      start=start,
                      progressbar=True)

Let's walk through each line of code:
with pm.Model() as model:
pm.Model creates a PyMC model object. as model assigns it to the variable name "model", and
the with ... : syntax establishes a context manager. All PyMC objects created within the
context manager are added to the model object.
theta_prior = pm.Beta('prior', 11.5, 48.5)
theta_prior represents a random variable for click-through rates. It will serve as our prior
distribution for the parameter θ, the click-through rate of our facebook-yellow-dress campaign.
This random variable is generated from a beta distribution (pm.Beta); we name this random
variable "prior" and hardcode parameter values 11.5 and 48.5. We could have set the values of
these parameters as random variables as well, but we hardcode them here as they are known.
observations = pm.Binomial('obs',n = impressions , p = theta_prior
, observed = clicks)
This statement represents the likelihood of the data under the model. Again we define the
variable name and set parameter values with n and p. Note that for this variable, the parameter p
is assigned to a random variable, indicating that we are trying to model that variable. Lastly, we
provide observed instances of the variable (i.e. our data) with the observed keyword. Because
we have said this variable is observed, the model will not try to change its values.
start = pm.find_MAP()
step = pm.NUTS(state=start)
trace = pm.sample(5000, step, start=start, progressbar=True)
These three lines define how we are going to sample values from the posterior. pm.find_MAP()
will identify values of theta that are likely in the posterior, and will serve as the starting values for
our sampler.
pm.NUTS(state=start) will determine which sampler to use. The sampling algorithm defines
how we propose new samples given our current state. The proposals can be done completely
randomly, in which case we'll reject many samples, or we can propose samples more intelligently.
NUTS (short for the No-U-Turn Sampler) is an intelligent sampling algorithm. Other choices
include Metropolis-Hastings, Gibbs, and Slice sampling.
Lastly, pm.sample(5000, step, start=start, progressbar=True) will generate samples for
us using the sampling algorithm and starting values defined above.
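This particular model happens to be one where the sampler's output can be checked exactly. A beta prior is conjugate to the binomial likelihood, so the posterior is again a beta distribution with the clicks added to the first parameter and the non-clicks added to the second. The sketch below computes that exact posterior with `scipy.stats.beta`. It is offered as a sanity check alongside the PyMC samples, not as a replacement for them, and the variable names are illustrative.

```python
from scipy.stats import beta

prior_a, prior_b = 11.5, 48.5
clicks, impressions = 7, 10

# conjugate update: Beta(a + successes, b + failures)
posterior = beta(prior_a + clicks, prior_b + (impressions - clicks))

exact_mean = posterior.mean()
exact_interval = posterior.interval(0.95)  # central 95% interval for theta

print(exact_mean)
print(exact_interval)
```

The mean of the sampled trace should land close to `exact_mean`, with small differences due to Monte Carlo error. Most realistic models have no such shortcut, which is where MCMC earns its keep.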
Let's take the histogram of the samples obtained from PyMC to see what the most probable
values of θ are, compared with our prior distribution and the evidence (likelihood of our data for
each value of θ):
# plot the histogram of click-through rates
plt.rcParams['figure.figsize'] = (16, 7)
# get histogram of samples from posterior distribution of CTRs
posterior_counts, posterior_bins = np.histogram(trace['prior'],
                                                bins=zero_to_one)
# normalize histogram (dividing by the total count is an assumption;
# the source line is cut off at this point)
posterior_counts = posterior_counts / float(posterior_counts.sum())
# take the mean of the samples as most plausible value
most_plausible_theta = np.mean(trace['prior'])
# histogram of samples from prior distribution
prior_counts, bins = np.histogram(prior_samples, zero_to_one)
# normalize
prior_counts = prior_counts / float(prior_counts.sum())
# plot
f, ax = plt.subplots(1)
ax.plot(possible_theta_values, likelihoods, label='Evidence')
ax.plot(bins[:-1], prior_counts, alpha=.2,
        label='Prior Probability for Theta')
ax.plot(bins[:-1], posterior_counts,
        label='Posterior Probability for Theta')
ax.axvline(most_plausible_theta, linestyle="--", alpha=.2,
           label='Most Plausible Theta')
ax.legend(loc='upper left')
ax.set_xlabel("Theta")
ax.grid()
ax.set_title("Prior Distribution Updated with Some Evidence")
plt.show()

Now that we have a full distribution for the probability of various values of θ, we can take the
mean of the distribution as our most plausible value for θ, which is about 0.27.
The data has caused us to believe that the true click-through rate is higher than we originally
thought, but far lower than the 0.7 click-through rate observed so far from the
facebook-yellow-dress campaign. Why is this the case? Note how wide our likelihood function is;
it's telling us that there is a wide range of values of θ under which our data is likely. If the
range of values under which the data were plausible were narrower, then our posterior would have
shifted further. See what happens to the posterior if we observe a 0.7 click-through rate from
10, 100, 1,000, and 10,000 impressions:
import pymc3 as pm
import numpy as np

# create our data:
traces = {}
# re-estimate the posterior for increasing numbers of impressions,
# maintaining the observed CTR of 0.7
for ad_impressions in [10, 100, 1000, 10000]:
    clicks = np.array([ctr * ad_impressions])
    impressions = np.array([ad_impressions])
    with pm.Model() as model:
        theta_prior = pm.Beta('prior', 11.5, 48.5)
        observations = pm.Binomial('obs', n=impressions,
                                   p=theta_prior,
                                   observed=clicks)
        start = pm.find_MAP()
        step = pm.NUTS(state=start)
        trace = pm.sample(5000,
                          step,
                          start=start,
                          progressbar=True)
    traces[ad_impressions] = trace

f, ax = plt.subplots(1)
ax.plot(bins[:-1], prior_counts, alpha=.2, label='Prior Distribution')
for ad_impressions in [10, 100, 1000, 10000]:
    trace = traces[ad_impressions]
    # same bin edges as zero_to_one, so the counts line up with bins[:-1]
    posterior_counts, posterior_bins = np.histogram(
        trace['prior'], bins=[j / 100. for j in range(100)])
    posterior_counts = posterior_counts / float(len(trace))
    ax.plot(bins[:-1], posterior_counts,
            label='Posterior after %d Impressions' % ad_impressions)
ax.legend()
ax.set_xlabel("Theta")
ax.axvline(ctr, linestyle="--", alpha=.5)
ax.grid()
ax.set_ylabel("Probability of Theta")
ax.set_title("Posterior Shifts as Weight of Evidence Increases")
plt.show()
As we obtain more and more data, we are more certain that the 0.7 success rate is the true
success rate. Conditioning on more data as we update our prior, the likelihood function begins
to play a larger role in our ultimate assessment because the weight of the evidence gets
stronger. This would be particularly useful in practice if we wanted a continuous, fair assessment
of how our campaigns are performing without having to worry about overfitting to a small
sample.
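The shift toward 0.7 can also be seen without sampling. For a beta prior and binomial data the posterior mean is (α + clicks) / (α + β + impressions), so the prior behaves like α + β = 60 earlier "pseudo-impressions" that get outweighed as real impressions accumulate. The sketch below tabulates that mean for the four sample sizes. It is a back-of-the-envelope companion to the PyMC run above, and the names are illustrative.

```python
prior_a, prior_b = 11.5, 48.5
observed_ctr = 0.7

posterior_means = {}
for impressions in [10, 100, 1000, 10000]:
    clicks = round(observed_ctr * impressions)
    # conjugate update: add clicks to a, non-clicks to b
    post_a = prior_a + clicks
    post_b = prior_b + (impressions - clicks)
    posterior_means[impressions] = post_a / (post_a + post_b)
    print(impressions, round(posterior_means[impressions], 3))
```

With only 10 impressions the prior dominates; by 10,000 impressions the posterior mean sits very close to the observed rate.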
Many concepts are beyond the scope of this tutorial but are important for doing
Bayesian analysis successfully, such as how to choose a prior, which sampling algorithm to
choose, how to determine whether the sampler is giving us good samples, and how to check for
sampler convergence. Hopefully this tutorial inspires you to continue exploring the fascinating
world of Bayesian inference.
Additional Information 
Books:
Bayesian Data Analysis by Andrew Gelman 
The Bayesian Choice by Christian P. Robert 
Jaynes' Probability Theory
Software:
PyMC 
Stan 
Dimple 
Figaro
Hakaru 
Papers:
Overview of MCMC
Historical Discussion of Bayesian Probability
Other: 
Bayesian Methods for Hackers 
 
Photo by mattbuck. [CC BY-SA 3.0], via Wikimedia Commons
 
AARON KRAMER
Data analyst at DataScience. I'm into basketball, python, machine learning,
algorithms and economics.
© 2018 DataScience.com All Rights Reserved
Platform Solutions Resources Tools Company

More Related Content

What's hot

hands on machine learning Chapter 6&7 decision tree, ensemble and random forest
hands on machine learning Chapter 6&7 decision tree, ensemble and random foresthands on machine learning Chapter 6&7 decision tree, ensemble and random forest
hands on machine learning Chapter 6&7 decision tree, ensemble and random forestJaey Jeong
 
Variational Autoencoder
Variational AutoencoderVariational Autoencoder
Variational AutoencoderMark Chang
 
The Method Of Maximum Likelihood
The Method Of Maximum LikelihoodThe Method Of Maximum Likelihood
The Method Of Maximum LikelihoodMax Chipulu
 
Supervised learning
Supervised learningSupervised learning
Supervised learningankit_ppt
 
An introduction to Bayesian Statistics using Python
An introduction to Bayesian Statistics using PythonAn introduction to Bayesian Statistics using Python
An introduction to Bayesian Statistics using Pythonfreshdatabos
 
Introduction to R Graphics with ggplot2
Introduction to R Graphics with ggplot2Introduction to R Graphics with ggplot2
Introduction to R Graphics with ggplot2izahn
 
Bayesian Non-parametric Models for Data Science using PyMC
 Bayesian Non-parametric Models for Data Science using PyMC Bayesian Non-parametric Models for Data Science using PyMC
Bayesian Non-parametric Models for Data Science using PyMCMLReview
 
Lecture 18: Gaussian Mixture Models and Expectation Maximization
Lecture 18: Gaussian Mixture Models and Expectation MaximizationLecture 18: Gaussian Mixture Models and Expectation Maximization
Lecture 18: Gaussian Mixture Models and Expectation Maximizationbutest
 
What is bayesian statistics and how is it different?
What is bayesian statistics and how is it different?What is bayesian statistics and how is it different?
What is bayesian statistics and how is it different?Wayne Lee
 
POLYGLOT-NER: Massive Multilingual Named Entity Recognition
POLYGLOT-NER: Massive Multilingual Named Entity RecognitionPOLYGLOT-NER: Massive Multilingual Named Entity Recognition
POLYGLOT-NER: Massive Multilingual Named Entity RecognitionBryan Perozzi
 
R MarkdownとBeamerでプレゼンテーション資料作成
R MarkdownとBeamerでプレゼンテーション資料作成R MarkdownとBeamerでプレゼンテーション資料作成
R MarkdownとBeamerでプレゼンテーション資料作成Hiroki Itô
 
PRML輪読#12
PRML輪読#12PRML輪読#12
PRML輪読#12matsuolab
 
To explain or to predict
To explain or to predictTo explain or to predict
To explain or to predictGalit Shmueli
 
ベイズ統計入門
ベイズ統計入門ベイズ統計入門
ベイズ統計入門Miyoshi Yuya
 
アンサンブル木モデル解釈のためのモデル簡略化法
アンサンブル木モデル解釈のためのモデル簡略化法アンサンブル木モデル解釈のためのモデル簡略化法
アンサンブル木モデル解釈のためのモデル簡略化法Satoshi Hara
 
Naive Bayes Classifier
Naive Bayes ClassifierNaive Bayes Classifier
Naive Bayes ClassifierArunabha Saha
 

What's hot (20)

hands on machine learning Chapter 6&7 decision tree, ensemble and random forest
hands on machine learning Chapter 6&7 decision tree, ensemble and random foresthands on machine learning Chapter 6&7 decision tree, ensemble and random forest
hands on machine learning Chapter 6&7 decision tree, ensemble and random forest
 
Variational Autoencoder
Variational AutoencoderVariational Autoencoder
Variational Autoencoder
 
Introduction to Bayesian Inference

5/23/2018 Introduction to Bayesian Inference
https://www.datascience.com/blog/introduction-to-bayesian-inference-learn-data-science-tutorials

Aaron Kramer 12.12.16
USE CASES, LEARN DATA SCIENCE, PYTHON, DATA VISUALIZATION

Prerequisites

This post is an introduction to Bayesian probability and inference. We will discuss the intuition behind these concepts, and provide some examples written in Python to help you get started. To get the most out of this introduction, the reader should have a basic understanding of statistics and probability, as well as some experience with Python. The examples use the Python package pymc3.

Introduction to Bayesian Thinking
Bayesian inference is an extremely powerful set of tools for modeling any random variable, such as the value of a regression parameter, a demographic statistic, a business KPI, or the part of speech of a word. We provide our understanding of a problem and some data, and in return get a quantitative measure of how certain we are of a particular fact. This approach to modeling uncertainty is particularly useful when:

- Data is limited
- We're worried about overfitting
- We have reason to believe that some facts are more likely than others, but that information is not contained in the data we model on
- We're interested in precisely knowing how likely certain facts are, as opposed to just picking the most likely fact

The table below enumerates some applied tasks that exhibit these challenges, and describes how Bayesian inference can be used to solve them. Don't worry if the Bayesian solutions are foreign to you; they will make more sense as you read this post:
Typically, Bayesian inference is a term used as a counterpart to frequentist inference. This can be confusing, as the lines drawn between the two approaches are blurry. The real distinction between Bayesians and frequentists is philosophical: they differ in how they interpret what probability is. We'll focus on Bayesian concepts that are foreign to traditional frequentist approaches and are actually used in applied work, specifically the prior and posterior distributions.

Consider Bayes' theorem:

$$p(A \mid B) = \frac{p(B \mid A)\, p(A)}{p(B)}$$

Think of A as some proposition about the world, and B as some data or evidence. For example, A represents the proposition that it rained today, and B represents the evidence that the sidewalk outside is wet:

$$p(\text{rain} \mid \text{wet}) = \frac{p(\text{wet} \mid \text{rain})\, p(\text{rain})}{p(\text{wet})}$$
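As a numeric illustration of the rain and wet-sidewalk case, here is a minimal sketch of the update. The three input probabilities are made-up values chosen only for illustration, and the function name `posterior_probability` is hypothetical; none of them come from the original post.

```python
def posterior_probability(prior, likelihood, likelihood_if_false):
    """Bayes' theorem for a binary proposition A and evidence B.

    prior:               p(A), e.g. p(rain)
    likelihood:          p(B | A), e.g. p(wet | rain)
    likelihood_if_false: p(B | not A), e.g. p(wet | no rain)
    """
    # Total plausibility of the evidence, summed over both assumptions.
    evidence = likelihood * prior + likelihood_if_false * (1.0 - prior)
    return likelihood * prior / evidence


# Assumed, illustrative numbers (not from the original post).
p_rain = 0.2            # prior belief that it rained
p_wet_given_rain = 0.9  # sidewalk is usually wet after rain
p_wet_given_dry = 0.1   # sprinklers, spills, and so on

p_rain_given_wet = posterior_probability(p_rain, p_wet_given_rain, p_wet_given_dry)
print(round(p_rain_given_wet, 4))
```

Under these assumed inputs, seeing a wet sidewalk raises the plausibility of rain well above its prior value of 0.2, which is the kind of belief update the theorem formalizes.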
p(rain | wet) asks, "What is the probability that it rained given that it is wet outside?" To evaluate this question, let's walk through the right side of the equation. Before looking at the ground, what is the probability that it rained, p(rain)? Think of this as the plausibility of an assumption about the world. We then ask how likely the observation that it is wet outside is under that assumption, p(wet | rain). This procedure effectively updates our initial beliefs about a proposition with some observation, yielding a final measure of the plausibility of rain, given the evidence.

This procedure is the basis for Bayesian inference, where our initial beliefs are represented by the prior distribution p(rain), and our final beliefs are represented by the posterior distribution p(rain | wet). The denominator simply asks, "What is the total plausibility of the evidence?", whereby we have to consider all assumptions to ensure that the posterior is a proper probability distribution.

Bayesians are uncertain about what is true (the value of a KPI, a regression coefficient, etc.), and use data as evidence that certain facts are more likely than others. Prior distributions reflect our beliefs before seeing any data, and posterior distributions reflect our beliefs after we have considered all the evidence. To unpack what that means and how to leverage these concepts for actual analysis, let's consider the example of evaluating new marketing campaigns.

Example: Evaluating New Marketing Campaigns Using Bayesian Inference

Assume that we run an ecommerce platform for clothing and in order to bring people to our site, we deploy several digital marketing campaigns. These campaigns feature various ad images and captions, and are presented on a number of social networking websites.
We want to present the ads that are the most successful. For the sake of simplicity, we can assume that the most successful campaign is the one that results in the highest click-through rate: the ads that are most likely to be clicked if shown.

We introduce a new campaign called "facebook-yellow-dress," a campaign presented to Facebook users featuring a yellow dress. The ad has been presented to 10 users so far, and 7 of the users have clicked on it. We would like to estimate the probability that the next user will click on the ad.

By encoding a click as a success and a non-click as a failure, we're estimating the probability θ that a given user will click on the ad. Naturally, we are going to use the campaign's historical record as evidence. Because we are considering unordered draws of an event that can be either 0 or 1, we can infer the probability θ by considering the campaign's history as a sample from a binomial distribution, with probability of success θ.

Traditional approaches of inference consider multiple values of θ and pick the value that is most aligned with the data. This is known as maximum likelihood, because we're evaluating how likely our data is under various assumptions and choosing the best assumption as true. More formally:

$$\arg\max_{\theta}\; p(X \mid \theta)$$

where X is the data we've observed. Here, p(X | θ) is our likelihood function; if we fix the parameter θ, what is the probability of observing the data we've seen? Let's look at the likelihood of various values of θ given the data we have for facebook-yellow-dress:

    import numpy as np
    from scipy.special import factorial  # scipy.misc.factorial was removed from SciPy
    import matplotlib.pyplot as plt

    # %matplotlib inline  (Jupyter notebooks only)
    plt.rcParams['figure.figsize'] = (16, 7)


    def likelihood(theta, n, x):
        """
        likelihood function for a binomial distribution

        n: [int] the number of experiments
        x: [int] the number of successes
        theta: [float] the proposed probability of success
        """
        return (factorial(n) / (factorial(x) * factorial(n - x))) \
            * (theta ** x) * ((1 - theta) ** (n - x))


    # the number of impressions for our facebook-yellow-dress campaign
    n_impressions = 10.
    # the number of clicks for our facebook-yellow-dress campaign
    n_clicks = 7.
    # observed click through rate
    ctr = n_clicks / n_impressions
    # 0 to 1, all possible click through rates
    possible_theta_values = [x / 100. for x in range(100)]
    # evaluate the likelihood function for possible click through rates
    likelihoods = [likelihood(theta, n_impressions, n_clicks)
                   for theta in possible_theta_values]
    # pick the best theta
    mle = possible_theta_values[np.argmax(likelihoods)]

    # plot
    f, ax = plt.subplots(1)
    ax.plot(possible_theta_values, likelihoods)
    ax.axvline(mle, linestyle="--")
    ax.set_xlabel("Theta")
    ax.set_ylabel("Likelihood")
    ax.grid()
    ax.set_title("Likelihood of Theta for New Campaign")
    plt.show()
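As a cross-check on the hand-written likelihood function, the same grid search can be sketched with `scipy.stats.binom.pmf`, which computes the binomial probability directly. The variable names `grid_mle` and `analytic_mle` are introduced here for illustration; the closed-form maximum likelihood estimate for a binomial proportion is simply clicks divided by impressions.

```python
import numpy as np
from scipy.stats import binom

n_impressions = 10
n_clicks = 7

# Candidate click-through rates from 0 to 1 in steps of 0.01.
thetas = np.linspace(0.0, 1.0, 101)

# Probability of seeing 7 clicks in 10 impressions under each candidate theta.
likelihoods = binom.pmf(n_clicks, n_impressions, thetas)

# Grid-search estimate versus the closed-form binomial MLE, x / n.
grid_mle = thetas[np.argmax(likelihoods)]
analytic_mle = n_clicks / n_impressions

print(grid_mle, analytic_mle)
```

Both estimates should agree up to the grid resolution, which confirms that the grid search is doing nothing more than locating the observed click-through rate.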
Of the 10 people we showed the new ad to, 7 of them clicked on it. So naturally, our likelihood function is telling us that the most likely value of theta is 0.7. However, some of our analysts are skeptical. The performance of this campaign seems extremely high given how our other campaigns have done historically. Let's overlay this likelihood function with the distribution of click-through rates from our previous 100 campaigns:

    import numpy as np
    import pandas as pd

    plt.rcParams['figure.figsize'] = (16, 7)

    true_a = 11.5
    true_b = 48.5
    # number of marketing campaigns
    N = 100
    # randomly generate "true" click through rate for each campaign
    p = np.random.beta(true_a, true_b, size=N)
    # randomly pick the number of impressions for each campaign
    impressions = np.random.randint(1, 10000, size=N)
    # sample number of clicks for each campaign
    clicks = np.random.binomial(impressions, p).astype(float)
    click_through_rates = clicks / impressions

    # plot the histogram of previous click through rates with the evidence
    # of the new campaign
    f, ax = plt.subplots(1)
    ax.axvline(mle, linestyle="--")
    ax.plot(possible_theta_values, likelihoods)
    zero_to_one = [j / 100. for j in range(100)]
    counts, bins = np.histogram(click_through_rates, bins=zero_to_one)
    counts = counts / 100.
    ax.plot(bins[:-1], counts, alpha=.5)
    line1, line2, line3 = ax.lines
    ax.legend((line2, line3),
              ('Likelihood of Theta for New Campaign',
               'Frequency of Theta Historically'),
              loc='upper left')
    ax.set_xlabel("Theta")
    ax.grid()
    ax.set_title("Evidence vs Historical Click Through Rates")
    plt.show()

Clearly, the maximum likelihood method is giving us a value that is outside what we would normally see. Perhaps our analysts are right to be skeptical; as the campaign continues to run, its click-through rate could decrease. Alternatively, this campaign could be truly outperforming all previous campaigns. We can't be sure. Ideally, we would rely on other campaigns' history if we had no data from our new campaign. And as we got more and more data, we would allow the new campaign data to speak for itself.

The Prior Distribution

This skepticism corresponds to prior probability in Bayesian inference. Before considering any data at all, we believe that certain values of θ are more likely than others, given what we know about marketing campaigns. We believe, for instance, that p(θ = 0.2) > p(θ = 0.5), since none of our previous campaigns have had click-through rates remotely close to 0.5. We express our prior beliefs of θ with p(θ). Using historical campaigns to assess p(θ) is our choice as a researcher. Generally, prior distributions can be chosen with many goals in mind:

- Informative; empirical: We have some data from related experiments and choose to leverage that data to inform our prior beliefs. Our prior beliefs will impact our final assessment.
- Informative; non-empirical: We have some inherent reason to prefer certain values over others. For instance, if we want to regularize a regression to prevent overfitting, we might set the prior distribution of our coefficients to have decreasing probability as we move away from 0. Our prior beliefs will impact our final assessment.
- Informative; domain-knowledge: Though we do not have supporting data, we know as domain experts that certain facts are more likely to be true than others. Our prior beliefs will impact our final assessment.
- Non-informative: Our prior beliefs will have little to no effect on our final assessment. We want the data to speak for itself.

For our example, because we have related data and limited data on the new campaign, we will use an informative, empirical prior. We will choose a beta distribution for our prior for θ. The beta distribution is a two-parameter (α, β) distribution that is often used as a prior for the θ parameter of the binomial distribution.
Because we want to use our previous campaigns as the basis for our prior beliefs, we will determine α and β by fitting a beta distribution to our historical click-through rates. Below, we fit the beta distribution and compare the estimated prior distribution with previous click-through rates to ensure the two are properly aligned:
    from scipy.stats import beta

    # fit beta to previous CTRs
    prior_parameters = beta.fit(click_through_rates, floc=0, fscale=1)
    # extract a, b from fit
    prior_a, prior_b = prior_parameters[0:2]

    # define prior distribution
    prior_distribution = beta(prior_a, prior_b)
    # sample from prior
    prior_samples = prior_distribution.rvs(10000)
    # get histogram of samples
    fit_counts, bins = np.histogram(prior_samples, zero_to_one)
    # normalize histogram
    fit_counts = fit_counts / float(fit_counts.sum())

    # plot
    f, ax = plt.subplots(1)
    ax.plot(bins[:-1], fit_counts)

    hist_ctr, bins = np.histogram(click_through_rates, zero_to_one)
    hist_ctr = hist_ctr / float(hist_ctr.sum())
    ax.plot(bins[:-1], hist_ctr)

    estimated_prior, previous_click_through_rates = ax.lines
    ax.legend((estimated_prior, previous_click_through_rates),
              ('Estimated Prior', 'Previous Click Through Rates'))
    ax.grid()
    ax.set_title("Comparing Empirical Prior with Previous Click Through Rates")
    plt.show()
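A quick sanity check on a fitted beta prior is the method of moments, which recovers α and β from the sample mean and variance alone. The sketch below is an alternative to the maximum likelihood fit used in the post, not a replacement for it. The helper name `beta_method_of_moments` is hypothetical, and the data are simulated directly from a Beta(11.5, 48.5) with a fixed seed rather than taken from real campaigns.

```python
import numpy as np


def beta_method_of_moments(samples):
    """Estimate beta parameters (a, b) from the sample mean and variance."""
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean()
    var = samples.var(ddof=1)
    # For a beta distribution, mean * (1 - mean) / var - 1 equals a + b.
    total = mean * (1.0 - mean) / var - 1.0
    return mean * total, (1.0 - mean) * total


# Simulated click-through rates (an assumption for illustration only).
rng = np.random.default_rng(0)
simulated_ctrs = rng.beta(11.5, 48.5, size=200_000)

a_hat, b_hat = beta_method_of_moments(simulated_ctrs)
print(a_hat, b_hat)
```

With only 100 noisy campaign rates, as in the example, the two estimators can disagree noticeably, so a large gap between them is a hint that the prior is not well determined by the data.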
We find that the best values of α and β are 11.5 and 48.5, respectively. The beta distribution with these parameters does a good job capturing the click-through rates from our previous campaigns, so we will use it as our prior. We will now update our prior beliefs with the data from the facebook-yellow-dress campaign to form our posterior distribution.

The Posterior Distribution

After considering the 10 impressions of data we have for the facebook-yellow-dress campaign, the posterior distribution of θ gives us the plausibility of any click-through rate from 0 to 1.

The effect of our data, or our evidence, is provided by the likelihood function, p(X | θ). What we are ultimately interested in is the plausibility of all proposed values of θ given our data, that is, our posterior distribution p(θ | X). From the earlier section introducing Bayes' theorem, our posterior distribution is given by the product of our likelihood function and our prior distribution, divided by p(X):

$$p(\theta \mid X) = \frac{p(X \mid \theta)\, p(\theta)}{p(X)}$$

Since p(X) is a constant, as it does not depend on θ, we can think of the posterior distribution as:

$$p(\theta \mid X) \propto p(X \mid \theta)\, p(\theta)$$
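For this particular pairing of a beta prior with a binomial likelihood, the posterior happens to have a closed form: the beta distribution is conjugate to the binomial, so the posterior is again a beta whose parameters are the prior parameters plus the observed clicks and non-clicks. The sketch below works this out for the example's numbers. The variable names are chosen here for illustration, and this shortcut is a check on the sampling approach rather than the method the post itself uses.

```python
from scipy.stats import beta

# Prior fitted from historical campaigns, as stated in the text.
prior_a, prior_b = 11.5, 48.5
# Evidence from the facebook-yellow-dress campaign.
n_clicks, n_impressions = 7, 10

# Conjugate update: add successes to a, failures to b.
post_a = prior_a + n_clicks
post_b = prior_b + (n_impressions - n_clicks)

posterior = beta(post_a, post_b)
posterior_mean = posterior.mean()
# Central 95% interval of the posterior.
lower, upper = posterior.interval(0.95)

print(post_a, post_b, posterior_mean, lower, upper)
```

The resulting posterior mean sits between the prior mean (about 0.19) and the observed rate of 0.7, and much closer to the prior, because 10 impressions carry little weight against a prior worth roughly 60 pseudo-observations.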
  • 12. We'll now demonstrate how to estimate p(θ|X) using PyMC.

Doing Bayesian Inference with PyMC

Usually, the true posterior must be approximated with numerical methods. To see why, let's return to the definition of the posterior distribution:

    p(θ|X) = p(X|θ) p(θ) / p(X)

The denominator p(X) is the total probability of observing our data under all possible values of θ. A more descriptive representation of this quantity is given by:

    p(X) = ∫ p(X|θ) p(θ) dθ

which integrates the probability of X over all values of θ. This integral usually does not have a closed-form solution, so we need an approximation.

One method of approximating our posterior is Markov Chain Monte Carlo (MCMC), which generates samples in a way that mimics the unknown distribution. We begin at a particular value and "propose" another value as a sample according to a stochastic process. We may reject the sample if the proposed value seems unlikely and propose another. If we accept the proposal, we move to the new value and propose another.

PyMC is a Python package for building arbitrary probability models and obtaining samples from the posterior distributions of unknown variables given the model. In our example, we'll use MCMC to obtain the samples.

The prototypical PyMC program has two components:
Define all variables, and how variables depend on each other
Run an algorithm to simulate a posterior distribution
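The propose, accept, and reject loop described above can be sketched in a few lines of NumPy. This is a plain random-walk Metropolis sampler written for illustration, not the NUTS algorithm used with PyMC in this post. The function names, the proposal width, the starting value, and the burn-in length are all assumptions. Note that p(X) never appears: only the ratio of unnormalized posterior values matters, so the constant cancels.

```python
import math
import numpy as np

# Values taken from the text: Beta(11.5, 48.5) prior, 7 clicks in 10 impressions.
PRIOR_A, PRIOR_B = 11.5, 48.5
N_CLICKS, N_IMPRESSIONS = 7, 10


def log_unnormalized_posterior(theta):
    """log p(X|theta) + log p(theta), dropping constants that do not depend on theta."""
    if theta <= 0.0 or theta >= 1.0:
        return -math.inf  # zero probability outside (0, 1)
    log_likelihood = (N_CLICKS * math.log(theta)
                      + (N_IMPRESSIONS - N_CLICKS) * math.log(1.0 - theta))
    log_prior = ((PRIOR_A - 1.0) * math.log(theta)
                 + (PRIOR_B - 1.0) * math.log(1.0 - theta))
    return log_likelihood + log_prior


def metropolis(n_samples=20000, start=0.2, proposal_sd=0.05, seed=0):
    """Random-walk Metropolis: propose a nearby value, then accept or reject it."""
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    current = start
    current_logp = log_unnormalized_posterior(current)
    for i in range(n_samples):
        proposal = current + rng.normal(0.0, proposal_sd)
        proposal_logp = log_unnormalized_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, proposal_logp - current_logp))
        if rng.uniform() < accept_prob:
            current, current_logp = proposal, proposal_logp
        samples[i] = current  # on rejection the chain stays where it is
    return samples


samples = metropolis()
burned_in = samples[2000:]  # discard early draws taken before the chain settles
posterior_mean = burned_in.mean()
print("posterior mean of theta:", round(posterior_mean, 3))
```

Smarter samplers such as NUTS use gradient information to make proposals that are accepted more often, but the basic idea of moving through parameter space and keeping a record of visited values is the same.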
  • 13. Let's now obtain samples from the posterior. We select our prior as a Beta(11.5, 48.5). Let's see how observing 7 clicks from 10 impressions updates our beliefs:

    import pymc3 as pm
    import numpy as np

    # create our data
    clicks = np.array([n_clicks])   # our successes; we observed 7 clicks
    impressions = np.array([10])    # the number of trials; there were 10 impressions

    with pm.Model() as model:
        # sets a context; all code in the block "belongs" to the model object
        theta_prior = pm.Beta('prior', 11.5, 48.5)  # our prior distribution, Beta(11.5, 48.5)
        # sampling distribution of the data; theta_prior will be updated with the data
        observations = pm.Binomial('obs', n=impressions, p=theta_prior, observed=clicks)

        # find good starting values for the sampling algorithm: the maximum
        # a posteriori (MAP) values, i.e. the values that are most likely
        start = pm.find_MAP()
        step = pm.NUTS(state=start)  # choose a particular MCMC algorithm
        trace = pm.sample(5000, step, start=start, progressbar=True)  # obtain samples

Let's walk through each line of code:

    with pm.Model() as model:
  • 14. pm.Model creates a PyMC model object. as model assigns it to the variable name "model", and the with ... : syntax establishes a context manager. All PyMC objects created within the context manager are added to the model object.

    theta_prior = pm.Beta('prior', 11.5, 48.5)

theta_prior represents a random variable for click-through rates. It will serve as our prior distribution for the parameter θ, the click-through rate of our facebook-yellow-dress campaign. This random variable is generated from a beta distribution (pm.Beta); we name this random variable "prior" and hardcode the parameter values 11.5 and 48.5. We could have set the values of these parameters as random variables as well, but we hardcode them here as they are known.

    observations = pm.Binomial('obs', n=impressions, p=theta_prior, observed=clicks)

This statement represents the likelihood of the data under the model. Again we define the variable name and set parameter values with n and p. Note that for this variable, the parameter p is assigned to a random variable, indicating that we are trying to model that variable. Lastly, we provide observed instances of the variable (i.e. our data) with the observed keyword. Because we have said this variable is observed, the model will not try to change its values.

    start = pm.find_MAP()
    step = pm.NUTS(state=start)
    trace = pm.sample(5000, step, start=start, progressbar=True)

These three lines define how we are going to sample values from the posterior. pm.find_MAP() will identify values of theta that are likely in the posterior, and these will serve as the starting values for our sampler.
  • 15. pm.NUTS(state=start) selects the sampler to use. The sampling algorithm defines how we propose new samples given our current state. The proposals can be made completely at random, in which case we'll reject a lot of samples, or we can propose samples more intelligently. NUTS (short for the No-U-Turn Sampler) is an intelligent sampling algorithm. Other choices include Metropolis-Hastings, Gibbs, and slice sampling.

Lastly, pm.sample(5000, step, start=start, progressbar=True) will generate samples for us using the sampling algorithm and starting values defined above.

Let's take the histogram of the samples obtained from PyMC to see what the most probable values of θ are, compared with our prior distribution and the evidence (the likelihood of our data for each value of θ):

    # plot the histogram of click-through rates
    plt.rcParams['figure.figsize'] = (16, 7)

    # get histogram of samples from the posterior distribution of CTRs
    posterior_counts, posterior_bins = np.histogram(trace['prior'], bins=zero_to_one)
    # normalize the histogram
    posterior_counts = posterior_counts / float(posterior_counts.sum())
    # take the mean of the samples as the most plausible value
    most_plausible_theta = np.mean(trace['prior'])

    # histogram of samples from the prior distribution, normalized
    prior_counts, bins = np.histogram(prior_samples, zero_to_one)
    prior_counts = prior_counts / float(prior_counts.sum())

    # plot
    f, ax = plt.subplots(1)
    ax.plot(possible_theta_values, likelihoods)
    ax.plot(bins[:-1], prior_counts, alpha=.2)
    ax.plot(bins[:-1], posterior_counts)
    ax.axvline(most_plausible_theta, linestyle="--", alpha=.2)
    line1, line2, line3, line4 = ax.lines
    ax.legend((line1, line2, line3, line4),
              ('Evidence',
               'Prior Probability for Theta',
               'Posterior Probability for Theta',
               'Most Plausible Theta'),
              loc='upper left')
    ax.set_xlabel("Theta")
  • 16.
    ax.grid()
    ax.set_title("Prior Distribution Updated with Some Evidence")
    plt.show()

Now that we have a full distribution for the probability of various values of θ, we can take the mean of the distribution as our most plausible value for θ, which is about 0.27.

The data has caused us to believe that the true click-through rate is higher than we originally thought, but far lower than the 0.7 click-through rate observed so far from the facebook-yellow-dress campaign. Why is this the case? Note how wide our likelihood function is; it's telling us that there is a wide range of values of θ under which our data is likely. If the range of values under which the data were plausible were narrower, then our posterior would have shifted further. See what happens to the posterior if we observed a 0.7 click-through rate from 10, 100, 1,000, and 10,000 impressions:
  • 17.
    import pymc3 as pm
    import numpy as np

    ctr = 0.7  # observed click-through rate, held fixed

    # create our data:
    traces = {}
    for ad_impressions in [10, 100, 1000, 10000]:
        # maintain the observed CTR of 0.7 and re-estimate the posterior
        # for increasing numbers of impressions
        clicks = np.array([int(round(ctr * ad_impressions))])
        impressions = np.array([ad_impressions])
        with pm.Model() as model:
            theta_prior = pm.Beta('prior', 11.5, 48.5)
            observations = pm.Binomial('obs', n=impressions, p=theta_prior, observed=clicks)
            start = pm.find_MAP()
            step = pm.NUTS(state=start)
            trace = pm.sample(5000, step, start=start, progressbar=True)
        traces[ad_impressions] = trace

    f, ax = plt.subplots(1)
    ax.plot(bins[:-1], prior_counts, alpha=.2)
    for ad_impressions in [10, 100, 1000, 10000]:
        trace = traces[ad_impressions]
        # use the same bin edges as the prior histogram so the curves line up
        posterior_counts, posterior_bins = np.histogram(trace['prior'], bins=zero_to_one)
        posterior_counts = posterior_counts / float(len(trace))
        ax.plot(bins[:-1], posterior_counts)

    line0, line1, line2, line3, line4 = ax.lines
    ax.legend((line0, line1, line2, line3, line4),
              ('Prior Distribution',
               'Posterior after 10 Impressions',
               'Posterior after 100 Impressions',
               'Posterior after 1000 Impressions',
               'Posterior after 10000 Impressions'))
    ax.set_xlabel("Theta")
    ax.axvline(ctr, linestyle="--", alpha=.5)
    ax.grid()
    ax.set_ylabel("Probability of Theta")
    ax.set_title("Posterior Shifts as Weight of Evidence Increases")
    plt.show()
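The same trend can be cross-checked without sampling. The sketch below assumes the standard conjugate result for a beta prior with binomial data, which this post does not derive: after k clicks in n impressions, a Beta(a, b) prior becomes a Beta(a + k, b + n - k) posterior. The variable names and the choice of a 95% interval are illustrative assumptions.

```python
from scipy.stats import beta

# Values taken from the text: Beta(11.5, 48.5) prior and an observed CTR of 0.7.
prior_a, prior_b = 11.5, 48.5
ctr = 0.7

results = {}
for n_impressions in [10, 100, 1000, 10000]:
    n_clicks = int(round(ctr * n_impressions))
    # Conjugate update (assumed, not derived in the post):
    # add clicks to a, add non-clicks to b.
    posterior = beta(prior_a + n_clicks, prior_b + n_impressions - n_clicks)
    low, high = posterior.interval(0.95)  # central 95% interval
    results[n_impressions] = (posterior.mean(), low, high)
    print(n_impressions, round(posterior.mean(), 3), round(low, 3), round(high, 3))
```

With 10 impressions the posterior mean stays close to the prior, at roughly 0.26, in line with the value of about 0.27 estimated from the MCMC samples. As the number of impressions grows, the mean moves toward 0.7 and the interval narrows.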
  • 18. As we obtain more and more data, we become more certain that the 0.7 success rate is the true success rate. As we condition on more data, the likelihood function plays a larger role in our final assessment because the weight of the evidence grows. This would be particularly useful in practice if we wanted a continuous, fair assessment of how our campaigns are performing without having to worry about overfitting to a small sample.

Many concepts are beyond the scope of this tutorial but are important for doing Bayesian analysis successfully, such as how to choose a prior, which sampling algorithm to choose, how to determine whether the sampler is giving us good samples, and how to check for sampler convergence. Hopefully this tutorial inspires you to continue exploring the fascinating world of Bayesian inference.

Additional Information

Books:
  • 19.
Bayesian Data Analysis by Andrew Gelman
The Bayesian Choice by Christian P. Robert
Jaynes' Probability Theory

Software:
PyMC
Stan
Dimple
Figaro
Hakaru

Papers:
Overview of MCMC
Historical Discussion of Bayesian Probability

Other:
Bayesian Methods for Hackers

Photo by mattbuck. [CC BY-SA 3.0], via Wikimedia Commons
  • 20. AARON KRAMER
Data analyst at DataScience. I'm into basketball, python, machine learning, algorithms and economics.