Bayesian Learning
Steven L. Scott
In the last section, on conditional probability, we saw that Bayes’ rule can be written
p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta).
The distribution p(θ) is called the prior distribution, or just “the prior,” p(y|θ) is the likelihood function,
and p(θ|y) is the posterior distribution. The prior distribution describes one’s belief about the value of θ
before seeing y. The posterior distribution describes the same person’s belief about θ after seeing y. Bayes’
theorem describes the process of learning about θ when y is observed.
1 An example
Let’s look at Bayes’ rule through an example. Suppose a biased coin with success probability θ is indepen-
dently flipped 10 times, and 3 successes are observed. The data y = 3 arise from a binomial distribution
with n = 10 and p = θ, so the likelihood is
p(y = 3 \mid \theta) = \binom{10}{3} \theta^{3} (1 - \theta)^{7}. \qquad (1)
What should the prior distribution be? In an abstract problem like this, most people are comfortable
assuming that there is no reason to prefer any one legal value of θ to another, which would imply the uniform
prior: p(θ) = 1 for θ ∈ (0, 1), with p(θ) = 0 otherwise. This is a common strategy in practice. In the absence
of any “real” prior information about a parameter’s value (which is a typical situation), one strives to choose
a prior that is “nearly noninformative.” We will see below that this is not always possible, but it is a useful
guiding principle. The prior and likelihood for this example are shown in the first two panels of Figure 1.
Figure 1: Bayesian learning in the binomial example. (a) The prior density, (b) the likelihood, and (c) the posterior density, each plotted as a function of θ.
To find the posterior distribution we simply multiply the prior times the likelihood (which in this case
just gives the likelihood), and normalize so that the result integrates to 1. In this case the normalization
constant is proportional to a mathematical special function known as the “beta function”, and the resulting
distribution is a known distribution called the “beta distribution.” The density of the beta distribution with
parameters a and b is
p(\theta) = \frac{\Gamma(a + b)}{\Gamma(a)\,\Gamma(b)}\, \theta^{a-1} (1 - \theta)^{b-1}. \qquad (2)
If θ is a random variable with the density function in equation (2) then we say θ ∼ Be(a, b). If we ignore
factors other than θ and 1−θ we see that in our example a−1 = 3 and b−1 = 7, so our posterior distribution
must be Be(4, 8). This distribution is plotted in Figure 1(c). Because it is simply a renormalization of the
function in Figure 1(b), the two panels differ only in the axis labels.
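As a quick numerical check, one can verify this directly. The sketch below (in Python, assuming NumPy and SciPy are available; it is an illustration added here, not part of the original notes) normalizes the likelihood in equation (1) on a grid and compares the result with the Be(4, 8) density:

import numpy as np
from scipy import stats

n, y = 10, 3  # 3 successes in 10 flips

# Conjugate answer: a uniform Be(1, 1) prior plus the data gives Be(4, 8).
posterior = stats.beta(a=1 + y, b=1 + n - y)

# Brute force: normalize likelihood * prior on a grid of theta values.
theta = np.linspace(0.001, 0.999, 999)
unnormalized = stats.binom.pmf(y, n, theta)  # the uniform prior contributes 1
grid_density = unnormalized / (unnormalized.sum() * (theta[1] - theta[0]))

print(np.max(np.abs(grid_density - posterior.pdf(theta))))  # close to zero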
2 Conjugate priors
The uniform prior used in the previous section would be inappropriate if we actually had prior information
that θ was small. For example, if y counted conversions on a website, we might have historical information
about the distribution of conversion rates on similar sites. If we can describe our prior belief in the form of
a Be(a, b) distribution (i.e. if we can represent our prior beliefs by choosing specific numerical values of a
and b), then the posterior distribution after observing y successes out of n binomial trials is
\begin{aligned}
p(\theta \mid y) &\propto \underbrace{\binom{n}{y} \theta^{y} (1 - \theta)^{n-y}}_{\text{likelihood}} \;
\underbrace{\frac{\Gamma(a + b)}{\Gamma(a)\,\Gamma(b)}\, \theta^{a-1} (1 - \theta)^{b-1}}_{\text{prior}} \\
&\propto \theta^{y+a-1} (1 - \theta)^{n-y+b-1}. \qquad (3)
\end{aligned}
We move from the first line of equation (3) to the second by combining the exponents of θ and 1 − θ, and
ignoring factors that don’t depend on θ. We recognize the outcome as proportional to the Be(y+a, n−y+b)
distribution. Thus “Bayesian learning” in this example amounts to adding y to a and n − y to b. That’s a
helpful way of understanding the prior parameters: a and b represent “prior successes” and “prior failures.”
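In code, the whole learning step is one line. A minimal sketch (the helper name below is ours, invented for illustration):

def beta_binomial_update(a, b, y, n):
    """Conjugate update: add the successes to a and the failures to b."""
    return a + y, b + (n - y)

# The example from Section 1: uniform Be(1, 1) prior, 3 successes in 10 trials.
print(beta_binomial_update(1, 1, 3, 10))  # (4, 8)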
When Bayes’ rule combines a likelihood and a prior in such a way that the posterior is from the same
model family as the prior, the prior is said to be conjugate to the likelihood. Most models don’t have
conjugate priors, but many models in the exponential family do. A distribution is in the exponential family
if its log density is a linear function of some function of the data. That is, if its density can be written
p(y \mid \theta) = a(\theta)\, b(y)\, e^{c(\theta)\, d(y)}. \qquad (4)
Many of the famous “named” distributions are in the exponential family, including binomial, Poisson, ex-
ponential, and Gaussian. The Student's t distribution is an example of a "famous" distribution that is not in the exponential family.
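For example, the binomial density can be put in the form of equation (4):

p(y \mid \theta) = \binom{n}{y} \theta^{y} (1 - \theta)^{n-y}
= \underbrace{(1 - \theta)^{n}}_{a(\theta)} \,
\underbrace{\binom{n}{y}}_{b(y)} \,
\exp\Big\{ \underbrace{\log\tfrac{\theta}{1 - \theta}}_{c(\theta)} \,
\underbrace{y}_{d(y)} \Big\},

so d(y) = y and the sufficient statistic for a sequence of flips is the total number of successes.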
If a model is in the exponential family then it has sufficient statistics: $\sum_i d(y_i)$. You can find the
conjugate prior for an exponential family model by viewing equation (4) as a function of θ rather than y,
and renormalizing (assuming the integral with respect to θ is finite). This formulation makes it clear that
the parameters of the prior can be interpreted as sufficient statistics for the model, just as a and b can be
thought of as prior successes and failures in the binomial example.
A second example is the variance of a Gaussian model with known mean. Error terms in many models are
often assumed to be zero-mean Gaussian random variables, so this problem comes up frequently. Suppose
$y_i \sim N(0, \sigma^2)$, independently, and let $y = (y_1, \ldots, y_n)$. The likelihood function is

p(y \mid \sigma^2) = (2\pi)^{-n/2} \left( \frac{1}{\sigma^2} \right)^{n/2} \exp\left( -\frac{1}{2\sigma^2} \sum_i y_i^2 \right). \qquad (5)
Distribution                      Conjugate prior
------------------------------    ---------------
binomial                          beta
Poisson / exponential             gamma
normal mean (known variance)      normal
normal precision (known mean)     gamma

Table 1: Some models with conjugate priors.
The expression containing $1/\sigma^2$ in equation (5) looks like the kernel of the gamma distribution. We write $\theta \sim Ga(a, b)$ if

p(\theta \mid a, b) = \frac{b^a}{\Gamma(a)}\, \theta^{a-1} \exp(-b\theta). \qquad (6)
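A practical caution if you compute with this density: equation (6) parameterizes Ga(a, b) by a rate b, while scipy.stats.gamma expects a scale parameter, so scale = 1/b. A minimal sketch with illustrative values:

from scipy import stats

a, b = 3.0, 2.0  # shape a and rate b, as in equation (6)
dist = stats.gamma(a=a, scale=1.0 / b)
print(dist.mean())  # a / b = 1.5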
If one assumes the prior $1/\sigma^2 \sim Ga\!\left(\frac{df}{2}, \frac{ss}{2}\right)$ then Bayes' rule gives

\begin{aligned}
p(1/\sigma^2 \mid y) &\propto \underbrace{\left( \frac{1}{\sigma^2} \right)^{n/2} \exp\left( -\frac{1}{2\sigma^2} \sum_i y_i^2 \right)}_{\text{likelihood}} \;
\underbrace{\left( \frac{1}{\sigma^2} \right)^{\frac{df}{2} - 1} \exp\left( -\frac{ss}{2} \cdot \frac{1}{\sigma^2} \right)}_{\text{prior}} \\
&\propto \left( \frac{1}{\sigma^2} \right)^{\frac{n + df}{2} - 1} \exp\left( -\frac{1}{\sigma^2} \cdot \frac{ss + \sum_i y_i^2}{2} \right) \\
&\propto Ga\left( \frac{n + df}{2}, \; \frac{ss + \sum_i y_i^2}{2} \right). \qquad (7)
\end{aligned}
Notice how the parameters of the prior df and ss interact with the sufficient statistics of the model. One
can interpret df as a “prior sample size” and ss as a “prior sum of squares.”
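A minimal Python sketch of the update in equation (7) (the data are simulated, and the prior values df = 2 and ss = 8 are invented for illustration):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.normal(0.0, 2.0, size=50)  # zero-mean Gaussian data, true sigma = 2

df, ss = 2.0, 8.0  # prior "sample size" and prior "sum of squares"
a_post = (len(y) + df) / 2
b_post = (ss + np.sum(y**2)) / 2  # rate parameter of the gamma posterior

posterior = stats.gamma(a=a_post, scale=1.0 / b_post)
print(posterior.mean())  # posterior mean of 1/sigma^2; compare with 1/4 = 0.25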
It is important to stress that not all models have conjugate priors, and even when they do, conjugate
priors may not appropriately express certain types of prior knowledge. Yet when they exist, thinking about
prior distributions through the lens of conjugate priors can help you understand the information content of
the assumed prior.
3 Posteriors compromise between prior and likelihood
Conjugate priors allow us to mathematically study the relationship between prior and likelihood. In the
binomial example with a beta prior, the Be(a, b) distribution has mean π = a/(a + b) and variance π(1 −
π)/(ν + 1), where ν = a + b. It is clear from equation (3) that a acts like a prior number of successes and b
a prior number of failures. The mean of the posterior distribution Be(a + y, b + n − y) is thus
\tilde{\pi} = \frac{a + y}{\nu + n} = \frac{\nu}{\nu + n} \cdot \frac{a}{\nu} + \frac{n}{\nu + n} \cdot \frac{y}{n}. \qquad (8)
Equation (8) shows the posterior mean $\tilde{\pi}$ is a weighted average of the prior mean a/ν and the mean of the
data y/n. The weights in the average are proportional to ν and n, which are the total information content
in the prior and the data, respectively.
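A quick numerical confirmation of equation (8), with illustrative numbers:

a, b = 2.0, 18.0  # prior Be(2, 18): prior mean 0.10, nu = 20
y, n = 30, 100    # data mean y/n = 0.30
nu = a + b

posterior_mean = (a + y) / (nu + n)
weighted = (nu / (nu + n)) * (a / nu) + (n / (nu + n)) * (y / n)
print(posterior_mean, weighted)  # both 0.2666..., between 0.10 and 0.30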
The posterior variance is
\frac{\tilde{\pi}(1 - \tilde{\pi})}{n + \nu + 1}. \qquad (9)
The total amount of information in the posterior distribution is often measured by its precision, which is the reciprocal of its variance. The precision of Be(a + y, b + n − y) is

\frac{n}{\tilde{\pi}(1 - \tilde{\pi})} + \frac{\nu + 1}{\tilde{\pi}(1 - \tilde{\pi})},
which is the sum of the precision from the prior and from the data.
The results shown above are not specific to the binomial distribution. In the general setting, the posterior
mean is a precision weighted average of the mean from the data and the mean from the prior, while the
inverse of the posterior variance is the sum of the prior precision and the data precision. This fact helps us get
a sense of the relative importance of the prior versus the data in forming the posterior distribution.
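For instance, in the normal-mean case from Table 1 the same structure is easy to verify. A minimal sketch with made-up numbers:

mu0, tau0 = 0.0, 1.0            # prior mean and prior precision
ybar, n, sigma2 = 1.8, 25, 4.0  # sample mean, sample size, known variance
tau_data = n / sigma2           # data precision for the sample mean

tau_post = tau0 + tau_data                           # precisions add
mu_post = (tau0 * mu0 + tau_data * ybar) / tau_post  # precision-weighted mean
print(mu_post, tau_post)        # about 1.55 and 7.25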
4 How much should you worry about the prior?
People new to Bayesian reasoning are often concerned about “assuming the answer,” in the sense that their
choice of a prior distribution will unduly influence the posterior distribution. There is good news and bad
news on this front.
4.1 Likelihood dominates prior
First the good news. In regular models with moderate to large amounts of data, the data asymptotically
overwhelm the prior. Consider Figure 2, which compares a few different Beta prior distributions with the
same data, to see impact on the posterior. In panel (a) the data contain only 10 observations, so varying the
a and b parameters in the prior distribution by one or two units each represents an appreciable change in
the total available information. Panel2(b) shows the same analysis when there are 100 observations in the
data, so moving a prior parameter by one or two units doesn’t have a particularly big impact.
Figure 2: How the posterior distribution varies with the choice of prior. Each panel overlays the posteriors obtained from the priors Be(1, 1), Be(.5, .5), Be(2, .5), and Be(.5, 2). (a) 3 successes from 10 trials, (b) 30 successes from 100 trials.
Whatever prior you choose contains a fixed amount of information. If you imagine applying that prior
to larger and larger data sets, its influence will eventually vanish.
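The comparison in Figure 2 is easy to reproduce numerically. The sketch below (the four priors are those in the figure legends) prints the posterior mean under each prior:

priors = [(1, 1), (0.5, 0.5), (2, 0.5), (0.5, 2)]
for y, n in [(3, 10), (30, 100)]:
    means = [(a + y) / (a + b + n) for a, b in priors]
    print(n, [round(m, 3) for m in means])
# n = 10 : posterior means range from roughly 0.28 to 0.40
# n = 100: they cluster near 0.30; the prior barely matters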
4.2 Sometimes priors do strange things
Now for the bad news. Even though many models are insensitive to a poorly chosen prior, not all of them
are. If your model is based on means, standard deviations, and regression coefficients, then there is a good
chance that any “weak” prior that you choose will have minimal impact. If the model has lots of latent
variables and other weakly identified unknowns, then the prior is probably more influential. Because priors
can sometimes carry more influence than intended, researchers have spent a considerable amount of time
thinking about how best to represent “prior ignorance” using a default prior. Kass and Wasserman (1996)
ably summarize these efforts.
One issue that can come up is that the amount of information in a prior distribution can depend on
the scale on which one views a parameter. For example, suppose you place a uniform prior on θ, but then
the analysis calls for the distribution of z = log(θ/(1 − θ)). The Jacobian of this transformation implies
f(z) = θ(1 − θ), with θ = e^z/(1 + e^z), which is plotted (as a function of z) in Figure 3. The uniform prior on θ is clearly informative
for logit(θ).
Figure 3: The solid line shows the density of a uniform random variable on the logit scale, derived mathematically. The histogram is the logit transform of 10,000 uniform random deviates.
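The figure is straightforward to reproduce by simulation (a sketch using 10,000 draws, as in the caption):

import numpy as np

rng = np.random.default_rng(1)
theta = rng.uniform(size=10_000)
z = np.log(theta / (1 - theta))  # logit transform of uniform draws

# The implied density is theta * (1 - theta) with theta = e^z / (1 + e^z);
# it peaks at 0.25 when z = 0 (theta = 0.5), matching Figure 3.
hist, edges = np.histogram(z, bins=100, range=(-5, 5), density=True)
print(hist[50])  # roughly 0.25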
4.3 Should you worry about priors?
Sometimes you need to, and sometimes you don’t. Until you get enough experience to trust your intuition
about whether a prior is worth worrying about, it is prudent to try an analysis under a few different choices
of prior. You can vary the prior parameters among a few reasonable values, or you can experiment to see
just how extreme the prior would need to be to derail the analysis.
In their paper, Kass and Wasserman made the point that problems where weak priors can make a big
difference tend to be “hard” problems where there is not much information in the data, in which case a
non-Bayesian analysis wouldn’t be particularly compelling (or in some cases, wouldn’t be possible). If you
find that modest variations in the prior lead to different conclusions, then you're facing a hard problem. In that
case a practical strategy is to think about the scale on which you want to analyze your model, and choose
a prior that represents reasonable assumptions on that scale. State your assumptions up front, and present
the results under 2-3 other prior choices to show their impact. Then proceed with your chosen prior for the
rest of the analysis.