PHIL 6334 - Probability/Statistics Lecture Notes 2:
Conditional Probabilities and Bayes’ theorem
Aris Spanos [Spring 2014]
1 The view from the (S, ℱ, P(·)) perspective

1.1 Conditional probability

Consider the probability set-up described by the probability space (S, ℱ, P(·)), where S is the set of all possible outcomes, ℱ the field of events of interest, and P(·) a probability set function assigning probabilities to events in ℱ.
For any two events A and B in ℱ the following formula for conditional probability holds:

    P(A|B) = P(A ∩ B)/P(B),   P(B) > 0.                                  (1)

This formula applies to the events A and B symmetrically, and thus:

    P(B|A) = P(A ∩ B)/P(A),   P(A) > 0.                                  (2)

Solving (1) and (2) for P(A ∩ B) yields the multiplication rule:

    P(A ∩ B) = P(A|B)·P(B) = P(B|A)·P(A).                                (3)

Substituting (3) into (1) yields the conditional probability:

    P(A|B) = P(B|A)·P(A)/P(B),   P(B) > 0.                               (4)

Example. Consider the random experiment of tossing a fair coin twice:

    S = {(HH), (HT), (TH), (TT)}.

Let the events of interest be:

    A = {(HH), (HT), (TH)},   P(A) = .75,
    B = {(HT), (TH), (TT)},   P(B) = .75.

The conditional probability of A given B takes the form:

    P(A|B) = P(A ∩ B)/P(B) = .5/.75 = 2/3,                               (5)

since P(A ∩ B) = P({(HT), (TH)}) = .5. Notice also that:

    P(B|A) = .5/.75 = 2/3  →  P(A ∩ B) = P(B|A)·P(A) = (.5/.75)(.75) = .5.

Now consider introducing a third event:

    C = {(HH), (HT), (TT)},   P(C) = .75.

What is the conditional probability of A given B and C?

    P(A|B ∩ C) = P(A ∩ [B ∩ C])/P(B ∩ C),   P(B ∩ C) > 0,

which, in light of the facts that:

    P(B ∩ C) = P(C|B)·P(B) > 0, since P(C|B) > 0 and P(B) > 0,
    P(A ∩ C) = P({(HH), (HT)}) = .5,   P(B ∩ C) = P({(HT), (TT)}) = .5,
    P(A ∩ B ∩ C) = P({(HT)}) = .25,

implies that:

    P(A|B ∩ C) = .25/.5 = 1/2 ≠ P(A|B) = 2/3.
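
As a quick check, the example can be reproduced by enumerating the sample space. The following minimal Python sketch represents events as sets of outcomes (the labels A, B, C follow the reconstruction above):

    from itertools import product
    from fractions import Fraction

    # Two tosses of a fair coin: each of the four outcomes has probability 1/4.
    outcomes = list(product("HT", repeat=2))
    prob = {s: Fraction(1, 4) for s in outcomes}

    def P(event):
        """Probability of an event, i.e. a set of outcomes."""
        return sum(prob[s] for s in event)

    def cond(X, Y):
        """Conditional probability P(X|Y) = P(X ∩ Y)/P(Y), assuming P(Y) > 0."""
        return P(X & Y) / P(Y)

    A = {("H", "H"), ("H", "T"), ("T", "H")}
    B = {("H", "T"), ("T", "H"), ("T", "T")}
    C = {("H", "H"), ("H", "T"), ("T", "T")}

    print(cond(A, B))      # 2/3, as in (5)
    print(cond(B, A))      # 2/3
    print(cond(A, B & C))  # 1/2, which differs from P(A|B) = 2/3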

1.2 Bayes' theorem from the (S, ℱ, P(·)) perspective

The conditional probability formula in (4) is transformed into an updating rule by interpreting the two events A and B as a hypothesis H and evidence e, to yield Bayes' formula:

    P(H|e) = P(e|H)·P(H)/P(e),   P(e) > 0,                               (6)

where:
(i) P(H|e) is interpreted as the posterior probability of H,
(ii) P(e|H) is interpreted as the likelihood of H,
(iii) P(H) is interpreted as the prior probability of H, and
(iv) P(e) is interpreted as the initial probability of the evidence e.

Remark 1: Viewed from the probability space (S, ℱ, P(·)) perspective, (6) makes mathematical sense only when the hypothesis H and the evidence e belong to the same field ℱ. This is potentially problematic because in empirical modeling H lives in Plato's world and e lives in the real world. Hence, (6) presumes that the two worlds can be easily merged in S, with H and e constituting overlapping events. However, Bayesians feel timid about introducing (H ∩ e) and assigning it a probability using Bayes' formula:

    P(H|e) = P(H ∩ e)/P(e),   P(e) > 0.

Instead, they replace P(H ∩ e) with P(e|H)·P(H), which, although mathematically equivalent, allows the terms P(e|H) and P(H) to be given more beguiling interpretations! These issues become more insidious when Bayes' formula is viewed from the statistical model perspective {f(x; θ), θ∈Θ, x∈ℝⁿ}.

The most problematic of the probabilistic assignments (i)-(iv) is P(e), because it is not obvious where that probability could come from. The Bayesians seek to address this conundrum by defining (iv) in terms of (ii)-(iii). In particular, they use H and not-H, denoted by ¬H (the "catch-all"), to define a partition of S:

    S = H ∪ ¬H,

and then use P(H ∪ ¬H) = P(H) + P(¬H) = P(S) = 1 to deduce the total probability rule:

    P(e) = P(H)·P(e|H) + P(¬H)·P(e|¬H).                                  (7)

This rule holds for any set of events (H₁, H₂, ..., Hₙ) that constitutes a partition of S, in the sense that:

    H₁ ∪ H₂ ∪ ... ∪ Hₙ = S,   Hᵢ ∩ Hⱼ = ∅ for any i ≠ j, i, j = 1, 2, ..., n,

in which case:

    P(e) = Σᵢ₌₁ⁿ P(Hᵢ)·P(e|Hᵢ).

The rule in (7) is often used to write Bayes' formula as:

    P(H|e) = P(e|H)·P(H) / [P(H)·P(e|H) + P(¬H)·P(e|¬H)],   P(e) > 0.    (8)
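
To make (7)-(8) concrete, here is a minimal numerical sketch in Python; the prior P(H) = 0.2 and the likelihoods P(e|H) = 0.9, P(e|¬H) = 0.3 are purely illustrative values, not taken from the notes:

    # Illustrative inputs: prior P(H) and likelihoods P(e|H), P(e|¬H).
    P_H = 0.2
    P_e_given_H = 0.9
    P_e_given_notH = 0.3

    # Total probability rule (7): P(e) = P(H)P(e|H) + P(¬H)P(e|¬H).
    P_e = P_H * P_e_given_H + (1 - P_H) * P_e_given_notH

    # Bayes' formula (8): P(H|e) = P(e|H)P(H) / P(e).
    P_H_given_e = P_e_given_H * P_H / P_e

    print(P_e)          # 0.42
    print(P_H_given_e)  # 0.428..., i.e. the prior 0.2 is revised upwards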

Remark 2: It is important to distinguish between the formula for conditional probabilities (4), which is totally noncontroversial, and Bayes' formula (8), which is controversial because:
(a) it assumes that a hypothesis H and evidence e are just overlapping events in the same field ℱ, and
(b) it invokes the total probability formula to assign a probability to e.

1.3 Bayesian Confirmation Theory

Bayesian confirmation theory relies on comparing the prior with the posterior probability of a hypothesis H:

    [i] Confirmation:      P(H|e) > P(H),
    [ii] Disconfirmation:  P(H|e) < P(H).

In case [i] evidence e confirms hypothesis H, and in case [ii] evidence e disconfirms hypothesis H.

The degree of confirmation is measured using some measure c(H, e) of the "degree to which e raises the probability of H". The most popular such Bayesian measures take c(H, e) to be one of:

    P(H|e) − P(H),        P(H|e) − P(H|¬e),
    P(e|H) − P(e),        P(e|H) − P(e|¬H),
    P(H|e)/P(H),          P(e|H)/P(e|¬H).

One can use any one of the above measures to argue that: according to measure c(H, e), evidence e favors hypothesis H₁ over H₀ iff:

    c(H₁, e) > c(H₀, e).

For instance, using the ratio measure c(H, e) = P(H|e)/P(H) in the case of two competing hypotheses H₀ and H₁:

    P(H₁|e)/P(H₁) > P(H₀|e)/P(H₀)  ⇔(Bayes)  P(e|H₁)/P(e) > P(e|H₀)/P(e)  ⇔  P(e|H₁)/P(e|H₀) > 1,

where P(e|H₁)/P(e|H₀) is the (Bayesian) likelihood ratio.

For comparison purposes, let us contrast this to the ratio of the posteriors:

    P(H₁|e)/P(H₀|e) = [P(e|H₁)·P(H₁)/P(e)] / [P(e|H₀)·P(H₀)/P(e)] = [P(e|H₁)·P(H₁)] / [P(e|H₀)·P(H₀)],

which is the product of the likelihood ratio P(e|H₁)/P(e|H₀) and the ratio of the priors P(H₁)/P(H₀).

Remark 3: It is important to note that the above measures are considered different only when they are not ordinally equivalent, i.e., when they do not give rise to the same ranking. This, however, raises serious questions about the appropriateness of such measures, since ordinal measures render the numerical differences within a given ranking uninterpretable: how can one interpret such differences as measuring the degree of confirmation?
