Bayesian inference is a statistical method that updates beliefs about hypotheses as evidence is observed. Unlike classical statistics, it treats the unknown parameters themselves as random variables, combining a prior distribution with the likelihood of the observed data via Bayes' theorem to compute the posterior distribution. The posterior represents the updated belief about the parameters given the data, and supports prediction, parameter estimation, and uncertainty quantification. Because it provides a rigorous framework for incorporating prior knowledge into statistical analysis, Bayesian inference is widely used in machine learning, data science, and scientific research for its meaningful and interpretable results.
2. Three students are constructing their prior belief about π, the proportion of Hamilton residents who support building a casino in Hamilton.
Prior distributions:
• Anna: prior mean 0.2 and prior standard deviation 0.08, expressed as a Beta(a, b) distribution.
• Bart: uniform prior, p(π) = 1 for 0 ≤ π ≤ 1, i.e. a Beta distribution with a = b = 1.
• Chris: a continuous prior with a trapezoidal shape.
3. Compute Anna's prior distribution (Beta distribution) and equivalent sample size.
• Mean π₀ = 0.2, standard deviation σ₀ = 0.08
• π₀ = a / (a + b) ⇒ 0.2 = a / (a + b) ⇒ b = 4a
• σ₀² = π₀(1 − π₀) / (a + b + 1) ⇒ 0.08² = 0.2(1 − 0.2) / (a + b + 1)
• 0.0064 = 0.16 / (5a + 1) ⇒ 5a + 1 = 25 ⇒ a = 4.8, b = 4a = 19.2
• Equivalent sample size: n_eq = a + b + 1 = 4.8 + 19.2 + 1 = 25
Prior distribution: Beta(a = 4.8, b = 19.2)
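The algebra above can be checked with a short script (a minimal sketch in Python, using the same mean and variance formulas for the Beta distribution):

```python
# Sketch: recover Anna's Beta(a, b) prior from her stated mean and
# standard deviation, using mean = a/(a+b) and
# variance = mean*(1-mean)/(a+b+1).
mean, sd = 0.2, 0.08

n_eq = mean * (1 - mean) / sd**2   # equivalent sample size a + b + 1
a = mean * (n_eq - 1)              # since a = mean * (a + b)
b = (1 - mean) * (n_eq - 1)

print(round(n_eq, 6), round(a, 6), round(b, 6))  # 25.0 4.8 19.2
```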
4. Bart's prior distribution (Beta distribution) and equivalent sample size.
• Uniform prior: a = b = 1
• Equivalent sample size: a + b + 1 = 3
Prior distribution: Beta(a = 1, b = 1)
6. Find g(π).
Fit a line Y = mπ + c on each of the three pieces, using points read off the trapezoid:
• 0 ≤ π ≤ 0.1: 1.0 = m(0.05) + 0 ⇒ m = 20, so Y = 20π
• 0.1 ≤ π ≤ 0.3: 2 = 0·π + c ⇒ c = 2, so Y = 2
• 0.3 ≤ π ≤ 0.5: m = −2.0/0.2 = −10 and 1.0 = (−10)(0.4) + c ⇒ c = 5, so Y = 5 − 10π
g(π) =
  20π      for 0 ≤ π ≤ 0.1
  2        for 0.1 ≤ π ≤ 0.3
  5 − 10π  for 0.3 ≤ π ≤ 0.5
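The piecewise definition translates directly into code (a minimal sketch in Python, with the function name g mirroring the notation above):

```python
# Sketch of Chris's trapezoidal prior g(pi), piecewise as derived above.
def g(p):
    if 0.0 <= p <= 0.1:
        return 20.0 * p          # rising edge
    if 0.1 < p <= 0.3:
        return 2.0               # flat top
    if 0.3 < p <= 0.5:
        return 5.0 - 10.0 * p    # falling edge
    return 0.0                   # zero outside [0, 0.5]

# The pieces agree at the breakpoints, so g is continuous:
print(g(0.1), g(0.3), g(0.5))  # 2.0 2.0 0.0
```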
7. Is g(π) a proper density? Does it have to be a proper prior to find the posterior distribution?
• To check whether g(π) is a proper density, we check that its integral over the whole parameter space equals 1.
• Here the integral is the area under the trapezoid, computed as two triangles and a rectangle:
• Area = (0.5 × 2 × 0.1) + (2 × 0.2) + (0.5 × 2 × 0.2) = 0.1 + 0.4 + 0.2 = 0.7 ≠ 1
• The area under the graph is not equal to 1, so g(π) is not a proper density.
• However, this is not a problem for finding the posterior: only the relative weights given by the shape of the prior are needed, since the normalizing constant cancels out in Bayes' theorem.
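The area calculation can also be verified numerically (a minimal Python sketch; g repeats the trapezoidal prior defined on the previous slide):

```python
# Sketch: integrate Chris's trapezoidal prior over [0, 0.5] with the
# trapezoid rule; the area should come out to 0.7, not 1.
def g(p):
    if 0.0 <= p <= 0.1:
        return 20.0 * p
    if 0.1 < p <= 0.3:
        return 2.0
    if 0.3 < p <= 0.5:
        return 5.0 - 10.0 * p
    return 0.0

N = 10000                  # grid chosen so the breakpoints land on grid points
h = 0.5 / N
area = (0.5 * (g(0.0) + g(0.5)) + sum(g(i * h) for i in range(1, N))) * h
print(round(area, 6))  # 0.7
```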
8. Posterior distribution of all three students
• The posterior distribution for each student is obtained using Bayes' theorem: Posterior ∝ Likelihood × Prior.
• Data: y = 26 successes out of n = 100 trials.
For Anna (a = 4.8, b = 19.2):
• Posterior_Anna ∝ Beta(y + a, (n − y) + b)
• P(π|y) ∝ P(π) × P(y|π)
• P(π|y) ∝ π^(4.8−1) (1 − π)^(19.2−1) × π^26 (1 − π)^(100−26)
• P(π|y) ∝ Beta(a = 30.8, b = 93.2)
For Bart (a = 1, b = 1), the same update gives Beta(a = 27, b = 75).
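For the Beta priors the update is conjugate and needs no integration (a minimal sketch in Python):

```python
# Sketch: Beta-binomial conjugate update. A Beta(a, b) prior with y
# successes in n trials gives a Beta(a + y, b + n - y) posterior.
y, n = 26, 100

def update(a, b):
    return a + y, b + (n - y)

print(update(4.8, 19.2))  # Anna -> approximately Beta(30.8, 93.2)
print(update(1, 1))       # Bart -> Beta(27, 75)
```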
10. Chris's posterior and its normalization
For Chris, Posterior_Chris ∝ Likelihood × g(π), piecewise:
1. For 0 ≤ π ≤ 0.1: Posterior(π|y) ∝ π^26 (1 − π)^(100−26) × (20π)
2. For 0.1 ≤ π ≤ 0.3: Posterior(π|y) ∝ π^26 (1 − π)^(100−26) × 2
3. For 0.3 ≤ π ≤ 0.5: Posterior(π|y) ∝ π^26 (1 − π)^(100−26) × (5 − 10π)
Now we need to integrate this unnormalized posterior over the entire parameter space (0 to 0.5) to normalize it. This is a numerical integration task that typically requires specialized software or programming libraries:
Normalized Posterior(π|y) = Posterior(π|y) / ∫₀^0.5 Posterior(π|y) dπ
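The numerical normalization can be sketched with a plain trapezoid rule (a minimal Python sketch; g repeats the trapezoidal prior defined earlier):

```python
# Sketch: normalize Chris's posterior numerically, since the trapezoidal
# prior is not conjugate to the binomial likelihood.
y, n = 26, 100

def g(p):                        # Chris's trapezoidal prior
    if 0.0 <= p <= 0.1:
        return 20.0 * p
    if 0.1 < p <= 0.3:
        return 2.0
    if 0.3 < p <= 0.5:
        return 5.0 - 10.0 * p
    return 0.0

def unnorm(p):                   # likelihood x prior, up to a constant
    return p**y * (1 - p)**(n - y) * g(p)

# Trapezoid rule over the support [0, 0.5] gives the normalizing constant.
N = 10000
h = 0.5 / N
Z = (0.5 * (unnorm(0.0) + unnorm(0.5))
     + sum(unnorm(i * h) for i in range(1, N))) * h

def posterior(p):
    return unnorm(p) / Z         # now integrates to 1 over [0, 0.5]
```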
11. Plots of prior distributions and posterior distributions
• We see that the three students end up with very similar posteriors, despite starting with priors having quite different shapes.
13. Plots of prior distributions and posterior distributions using R.
14. Conclusion
• Anna chooses a Beta prior matched to her prior mean and standard deviation.
• But Bart doesn't know the local feeling about casinos, so he uses a uniform prior.
• And Chris uses a trapezoidal prior.
• So the three prior distributions are quite different.
• But the posterior distributions are nearly the same for all three.
The plots help visualize how different prior beliefs influence the update process in Bayesian inference: with 100 observations, the likelihood dominates the priors and the posteriors converge.